141 43 18MB
English Pages 465 [447] Year 2023
Advances in Mechanics and Mathematics 36
Zdeněk Dostál Tomáš Kozubek Marie Sadowská Vít Vondrák
Scalable Algorithms for Contact Problems Second Edition
Advances in Mechanics and Mathematics Volume 36
Series Editors David Gao, Deakin University, Victoria, Australia Tudor Ratiu, Shanghai Jiao Tong University, Minhang District, Shanghai, China
Editorial Board Members Anthony Bloch, University of Michigan, Ann Arbor, MI, USA John Gough, Aberystwyth University, Aberystwyth, UK Darryl D. Holm, Imperial College London, London, UK Peter Olver, University of Minnesota, Minneapolis, MN, USA Juan-Pablo Ortega, University of St. Gallen, St. Gallen, Switzerland Genevieve Raugel, CNRS and University-Paris Sud, Orsay Cedex, France Jan Philip Solovej, University of Copenhagen, Copenhagen, Denmark Michael Z. Zgurovsky, Igor Sikorsky Kyiv Polytechnic Institute, Kyiv, Ukraine Jun Zhang, University of Michigan, Ann Arbor, MI, USA Enrique Zuazua, Universidad Autónoma de Madrid and DeustoTech, Madrid, Spain
Mechanics and mathematics have been complementary partners since Newton’s time, and the history of science shows much evidence of the beneficial influence of these disciplines on each other. Driven by increasingly elaborate modern technological applications, the symbolic relationship between mathematics and mechanics is continually growing. Mechanics is understood here in the most general sense of the word, including - besides its standard interpretation - relevant physical, biological, and information-theoretical phenomena, such as electromagnetic, thermal, and quantum effects; bio- and nano-mechanics; information geometry; multiscale modelling; neural networks; and computation, optimization, and control of complex systems. Topics with multi-disciplinary range, such as duality, complementarity, and symmetry in sciences and mathematics, are of particular interest. Advances in Mechanics and Mathematics (AMMA) is a series intending to promote multidisciplinary study by providing a platform for the publication of interdisciplinary content with rapid dissemination of monographs, graduate texts, handbooks, and edited volumes on the state-of-the-art research in the broad area of modern mechanics and applied mathematics. All contributions are peer reviewed to guarantee the highest scientific standards. Monographs place an emphasis on creativity, novelty, and innovativeness in the field; handbooks and edited volumes provide comprehensive surveys of the state-of-the-art in particular subjects; graduate texts may feature a combination of exposition and electronic supplementary material. The series is addressed to applied mathematicians, engineers, and scientists, including advanced students at universities and in industry.
Zdenˇek Dostál • Tomáš Kozubek • Marie Sadowská • Vít Vondrák
Scalable Algorithms for Contact Problems Second Edition With Contributions by Tomáš Brzobohatý, David Horák, ˇ , Oldˇrich Vlach Lubomír Ríha
Zdenˇek Dostál National Supercomputer Center and Department of Applied Mathematics VŠB-Technical University of Ostrava Ostrava, Czech Republic
Tomáš Kozubek National Supercomputer Center and Department of Applied Mathematics VŠB-Technical University of Ostrava Ostrava, Czech Republic
Marie Sadowská Department of Applied Mathematics VŠB-Technical University of Ostrava Ostrava, Czech Republic
Vít Vondrák National Supercomputer Center and Department of Applied Mathematics VŠB-Technical University of Ostrava Ostrava, Czech Republic
ISSN 1571-8689 ISSN 1876-9896 (electronic) Advances in Mechanics and Mathematics ISBN 978-3-031-33579-2 ISBN 978-3-031-33580-8 (eBook) https://doi.org/10.1007/978-3-031-33580-8 1st edition: © Springer Science+Business Media LLC 2016 2nd edition: © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This book is published under the imprint Birkhäuser, www.birkhauser-science.com, by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland Paper in this product is recyclable.
To our families, colleagues, students, and friends
Preface
The practical interest in solving contact problems stems from the fundamental role of contact in the mechanics of solids and structures. Indeed, the contact of one body with another is a typical way the loads are delivered to a structure and the typical mechanism which supports structures to sustain the loads. Thus, we do not exaggerate too much when we say that the contact problems are at the heart of mechanical engineering. The contact problems are also interesting from the computational point of view. The equilibrium conditions of a system of bodies in mutual contact enhance a priori unknown boundary conditions, which make the contact problems strongly nonlinear. When some of the bodies are “floating,” then the boundary conditions admit rigid body motions, and a solution need not be unique. After the discretization, the contact problem can be reduced to a finite-dimensional problem, such as the minimization of a possibly nondifferentiable convex function in many variables (currently from tens of thousands to tens of billions) subject to linear or nonlinear constraints. Due to the floating bodies, the cost function can have a positive semidefinite quadratic part. Thus, the solution of large discretized multibody contact problems remains a challenging task that general algorithms can hardly solve. The primary purpose of this book is to present scalable algorithms for the solution to multibody contact problems of linear elasticity, including the problems with friction and dynamic contact problems. Let us recall that an algorithm is said to be numerically scalable if the cost of the solution increases nearly proportionally to the number of unknown variables, and it enjoys parallel scalability if the computational time can be reduced nearly proportionally to the number of processors. The algorithms that enjoy numerical scalability are, in a sense, optimal. The cost of the solution by such algorithms increases proportionally with the cost of duplicating the solution. Taking into account the above characterization of contact problems, it is rather surprising that such algorithms exist. vii
viii
Preface
Most of the results presented here were obtained during the last 30 years. Our development of scalable algorithms for contact problems is based on the following observations: • There are algorithms that can solve relevant quadratic programming and QCQP problems with asymptotically linear complexity. • Duality-based methods like FETI provide sufficiently small linear subspaces with the solution. • The coarse problem can be distributed between the primal and dual variables. • The projector to the space of rigid body motions can be used to precondition both the linear and nonlinear steps of solution algorithms. • The space decomposition used by the variants of FETI is an effective tool for solving multibody contact problems and opens a way to the massively parallel implementation of the algorithms. The development of scalable algorithms is challenging even when considering much simpler linear problems. For example, the computational cost of solving a system of linear equations arising from the discretization of the equilibrium conditions of an elastic body with prescribed displacements or traction on the boundary by direct sparse solvers increases typically with the square of the number of unknown nodal displacements. The first numerically scalable algorithms for linear problems of computational mechanics based on the concept of multigrid came into use only in the last quarter of the last century, and fully scalable algorithms based on the FETI (Finite Element Tearing and Interconnecting) methods were introduced by Farhat and Roux by the end of the twentieth century. The presentation of the algorithms in the book is complete in the sense that it starts from the formulation of contact problems, briefly describes their discretization and the properties of the discretized problems, provides the flowcharts of solution algorithms, and concludes with the analysis, numerical experiments, and implementation details. The book can thus serve as an introductory text for anybody interested in contact problems. Synopsis of the Book The book starts with a general introduction to contact problems of elasticity with the account of the main challenges posed by their numerical solution. The rest of the book is arranged into four parts, the first of which reviews some well-known facts on linear algebra, optimization, and analysis in the form that is useful in the following text. The second part concerns the algorithms for minimizing a quadratic function subject to linear equality constraints and/or convex separable inequality constraints. A unique feature of these algorithms is their rate of convergence in terms of bounds on the spectrum of the cost function’s Hessian. The description of the algorithms is organized in five chapters starting with a sepa-
Preface
ix
rate overview of two main ingredients, the conjugate gradient (CG) method (Chap. 5) for unconstrained optimization and the results on gradient projection (Chap. 6), in particular, on the decrease of the cost function along the projected-gradient path. Chapters 7 and 8 describe the MPGP (Modified Proportioning with Gradient Projections) algorithm for minimizing strictly convex quadratic functions subject to separable constraints and its adaptation MPRGP (Modified Proportioning with Reduced Gradient Projections) for bound-constrained problems. The result on the rate of convergence in terms of bounds on the spectrum of the Hessian matrix is given that guarantees a kind of optimality of the algorithms. Thus, the number of iterations necessary to get approximate solutions to problems with the spectrum of cost function’s Hessian contained in a given positive interval is uniformly bounded regardless of the dimension of the problem. Special attention is paid to solving the problems with elliptic constraints and coping with their potentially strong curvature, which can occur in the solution of contact problems with orthotropic friction. Chapter 9 combines the algorithms for solving problems with separable constraints and a variant of the augmented Lagrangian method. The resulting SMALSE (Semi-Monotonic Augmented Lagrangians for Separable and Equality constraints) algorithm uses an effective precision control in the inner loop to find the solution to relevant QCQP problems in a uniformly bounded number of cheap iterations. Apart from SMALSE, the specialized variants for solving bound- and equality-constrained quadratic programming problems (SMALBE) and QCQP problems, including elliptic constraints with a strong curvature (SMALSE-Mw), are considered. The most important results of the book are presented in the third part, including the scalable algorithms for solving multibody frictionless contact problems, contact problems with Tresca’s friction, and transient contact problems. Chapter 10 presents the basic ideas of the scalable algorithms in a simplified setting of multidomain scalar variational inequalities. Chapters 11–13 develop the ideas presented in Chap. 10 to the solution of multibody frictionless contact problems, contact problems with friction, and transient contact problems. For simplicity, the presentation is based on the node-to-node discretization of contact conditions. The presentation includes the variational formulation of the conditions of equilibrium, the finite element discretization, some implementation details, the dual formulation, the TFETI (Total Finite Element Tearing and Interconnecting) domain decomposition method, the proof of asymptotically linear complexity of the algorithms (numerical scalability), and numerical experiments. Chapter 14 extends the results of Chaps. 10 and 11 on solving the problems discretized by the boundary element methods in the framework of the TBETI (Total Boundary Element Tearing and Interconnecting) method. The main new features include the reduction of the continuous problem to the boundary,
x
Preface
the boundary variational formulation of the conditions of equilibrium, and the discretization, which is limited to the boundary. Chapter 15 describes how to extend the range of scalability of the TFETI algorithms, which is proportional to the number of subdomains, through the H-TFETI (hybrid TFETI) method. The basic idea is to join some groups of adjacent subdomains into clusters by the faces’ rigid body modes. As a result, the defects of the clusters and their subdomains are equal, and the dimension of the H-TFETI coarse problem, which is distributed between the primal and dual variables, is proportional to the number of clusters. The method proposed for linear problems complies well with the structure of contact problems. Chapter 16 extends the results of Chaps. 10–15 to the problems with the non-penetration conditions implemented by mortars. In particular, we show that the application of the mortars need not spoil the scalability of the algorithms. Chapter 17 describes preconditioning methods for constrained quadratic programming problems. In particular, we show that the reorthogonalizationbased preconditioning or the renormalization-based scaling can reduce or eliminate the effect of varying coefficients and that the performance of all the algorithms presented in Chap. 9 can be essentially improved by adaptive reorthogonalization, enhancing the information gathered from previous iterations. The last part begins with Chaps. 18 and 19 dealing with the extension of the optimality results to some applications, namely to contact shape optimization and contact problems with plasticity. Finally, the book is completed by Chap. 20 on massively parallel implementation and parallel scalability. The (weak) parallel scalability is demonstrated by solving an academic benchmark discretized by billions of nodal variables. However, the methods presented in the book can be used for solving much larger problems, as demonstrated by the results for a linear benchmark discretized by tens of billions of nodal variables. Ostrava, Czech Republic Pustá Polom, Czech Republic Ostrava, Czech Republic Velká Polom, Czech Republic October 2022
Zdeněk Dostál Tomáš Kozubek Marie Sadowská Vít Vondrák
New Features of the Second Edition
The book is about twenty percent longer. The most important new staff is a chapter on hybrid domain decomposition methods for contact problems (Chap. 15), and new sections bringing the latest improvements in the algorithms, in particular the fast reconstruction of displacements (Sect. 11.9) and the adaptive reorthogonalization of dual constraints, which speeds up the augmented Lagrangians iterations (Sects. 17.4 and 17.5.3). We have updated the essential part of the last chapter (Chap. 20) on parallel implementation and extended Chaps. 1 and 4. We included Notations and a List of Symbols. There are also new sections on stable orthogonalization (Sect. 2.8), angles of subspaces (Sect. 2.11), bounds on the Schur complements’ spectrum of stiffness matrices (Sects. 10.7 and 11.7), a benchmark for Coulomb orthotropic friction (Sect. 12.9.2), and bounds on the spectrum of the mass matrix (Sect. 13.6.1). Other sections are extended by a new staff on the Kronecker product (Sect. 2.1), CS-decomposition (Sect. 2.10), and details of discretization (Sects. 10.3 and 11.4).
xi
Acknowledgments
We have found many results presented in this book in cooperation with colleagues from outside Ostrava. Here we would like to express our thanks especially to Ana Friedlander and Mario Martínez for the early assessment of the efficiency of augmented Lagrangian methods, Sandra A. Santos and F.A.M. Gomes for their share in the early development of algorithms for solving variational inequalities, Olaf Steinbach for introducing us to the boundary element methods, Joachim Schöberl for sharing his original insight into the gradient projection methods, Jaroslav Haslinger for joint research in the problems with friction, and Charbel Farhat for drawing our attention to the practical aspects of our algorithms and inspiration to think twice about simple topics. Our thanks also go to our colleagues and students from the Faculty of Electrical Engineering and Computer Science of VŠB-Technical University of Ostrava. Jiří Bouchala offered his insight into the topics related to boundary integral equations, David Horák first implemented many variants of the algorithms that appear in this book, Marta Domorádová–Jarošová participated in the research of conjugate projectors, and Lukáš Pospíšil participated in the development of algorithms for orthotropic friction. Petr Horyl, Alexandros Markopoulos, and Tomáš Brzobohatý paved the way from academic benchmarks to real-world problems, and Petr Kovář helped with the graph theory. Radek Kučera adapted the algorithms for bound constrained QP to the solution of more general problems with separable constraints and carried out much joint work. Our special thanks go to Oldřich Vlach, who not only participated in the research related to mortars and composites with inclusions but also prepared many figures and participated in the experiments. The key figures in parallel implementation were Václav Hapla, David Horák, and Lubomír Říha. The book would be much worse without critical reading of its parts by Dalibor Lukáš, Kristina Motyčková, and other colleagues. We are also grateful to Renáta Plouharová for improving our English. We gratefully acknowledge the support of the IT4 Innovations Center of Excellence project, reg. no. CZ.1.05/1.1.00/02.0070 via the “Research and xiii
xiv
Acknowledgments
Development for Innovations” Operational Programme funded by the Structural Funds of the European Union. We would also like to thank those participating in the research leading to new results presented in the second edition, especially Tomáš Brzobohatý, David Horák, Jakub Kružík, Ondřej Meca, and Lubomír Říha for their work on hybrid FETI, David Horák for his work on the reconstruction formulae, and Oldřich Vlach for his deal in the research of adaptive reorthogonalization. Finally, we are grateful to Tomáš Brzobohatý and Lubomír Říha for their help with the last chapter and to Petr Vodstrčil and Dalibor Lukáš for the discussions on estimates.
Contents
1
Contact Problems and Their Solution . . . . . . . . . . . . . . . . . . . . .
1
1.1 1.2 1.3 1.4
Frictionless Contact Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Contact Problems with Friction . . . . . . . . . . . . . . . . . . . . . . . . . 3 Transient Contact Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Numerical Solution of Contact Problems . . . . . . . . . . . . . . . . . 6 1.4.1 Continuous Formulation . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4.2 Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.4.3 Solution and Scalable Algorithms . . . . . . . . . . . . . . . . 8 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 Part I Basic Concepts 2
Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.1 Vectors and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Matrices and Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Inverse and Generalized Inverse . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Direct Methods for Solving Linear Equations . . . . . . . . . . . . . 2.5 Schur Complement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7 Scalar Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8 Orthogonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.9 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.10 SVD and CS Decompositions . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.11 Angles of Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.12 Graphs, Walks, and Adjacency Matrices . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
15 18 19 21 23 25 27 29 30 31 35 38 39 xv
xvi
3
Contents
Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.1 3.2
Optimization Problems and Solutions . . . . . . . . . . . . . . . . . . . . Unconstrained Quadratic Programming . . . . . . . . . . . . . . . . . . 3.2.1 Quadratic Cost Functions . . . . . . . . . . . . . . . . . . . . . . . 3.2.2 Unconstrained Minimization of Quadratic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.1 Convex Quadratic Functions . . . . . . . . . . . . . . . . . . . . 3.3.2 Minimizers of Convex Function . . . . . . . . . . . . . . . . . . 3.3.3 Existence of Minimizers . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.4 Projections to Convex Sets . . . . . . . . . . . . . . . . . . . . . . 3.4 Equality Constrained Problems . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.1 Optimality Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.2 Existence and Uniqueness . . . . . . . . . . . . . . . . . . . . . . . 3.4.3 Sensitivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5 Inequality Constrained Problems . . . . . . . . . . . . . . . . . . . . . . . . 3.5.1 Optimality Conditions for Linear Constraints . . . . . . 3.5.2 Optimality Conditions for Bound Constrained Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5.3 Optimality Conditions for More General Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5.4 Existence and Uniqueness . . . . . . . . . . . . . . . . . . . . . . . 3.6 Equality and Inequality Constrained Problems . . . . . . . . . . . . 3.6.1 Optimality Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . 3.7 Duality for Quadratic Programming Problems . . . . . . . . . . . . 3.7.1 Uniqueness of a KKT Pair . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
41 42 42 43 45 45 46 47 48 50 52 54 56 57 58 59 60 61 62 63 64 68 71
Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.1 Sobolev Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Trace Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Variational Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
74 77 78 82
Part II Optimal QP and QCQP Algorithms 5
Conjugate Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 5.1 5.2 5.3 5.4 5.5
First Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conjugate Gradient Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rate of Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Preconditioned Conjugate Gradients . . . . . . . . . . . . . . . . . . . . . Convergence in the Presence of Rounding Errors . . . . . . . . . .
88 90 93 96 98
Contents
xvii
5.6 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 6
Gradient Projection for Separable Convex Sets . . . . . . . . . . . 101 6.1 Separable Convex Constraints and Projections . . . . . . . . . . . . 6.2 Conjugate Gradient Step Versus Gradient Projections . . . . . . 6.3 Quadratic Functions with Identity Hessian . . . . . . . . . . . . . . . 6.4 Subsymmetric Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5 Dominating Function and Decrease of the Cost Function . . . 6.6 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
MPGP for Separable QCQP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 7.1 7.2 7.3 7.4 7.5 7.6 7.7
Projected Gradient and KKT Conditions . . . . . . . . . . . . . . . . . Reduced Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Reduced Projected Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . MPGP Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rate of Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bound on Norm of Projected Gradient . . . . . . . . . . . . . . . . . . . Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.7.1 Projection Step with Feasible Half-Step . . . . . . . . . . . 7.7.2 MPGP Algorithm in More Detail . . . . . . . . . . . . . . . . 7.8 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
118 121 123 126 128 129 133 133 134 136 137
MPRGP for Bound Constrained QP . . . . . . . . . . . . . . . . . . . . . . 139 8.1 Specific Form of KKT Conditions . . . . . . . . . . . . . . . . . . . . . . . 8.2 MPRGP Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.3 Rate of Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4 Identification Lemma and Finite Termination . . . . . . . . . . . . . 8.5 Implementation of MPRGP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.6 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.7 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9
101 103 106 107 111 114 115
140 141 143 144 147 148 150 151
Solvers for Separable and Equality QP/QCQP Problems . . 155 9.1 9.2 9.3 9.4 9.5 9.6
KKT Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Penalty and Method of Multipliers . . . . . . . . . . . . . . . . . . . . . . SMALSE-M . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Inequalities Involving the Augmented Lagrangian . . . . . . . . . Monotonicity and Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . Boundedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
156 157 158 160 162 164
xviii
Contents
9.7 9.8 9.9 9.10
Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Optimality of the Outer Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . Optimality of the Inner Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . SMALBE for Bound and Equality Constrained QP Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.11 R-Linear Convergence of SMALBE-M . . . . . . . . . . . . . . . . . . . . 9.12 SMALSE-Mw . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.13 Solution of More General Problems . . . . . . . . . . . . . . . . . . . . . . 9.14 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.15 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
166 168 170 172 173 175 177 178 179 180
Part III Scalable Algorithms for Contact Problems 10 TFETI for Scalar Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8
Two Membranes in Unilateral Contact . . . . . . . . . . . . . . . . . . . Variational Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tearing and Interconnecting . . . . . . . . . . . . . . . . . . . . . . . . . . . . Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dual Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Natural Coarse Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Reducing the Analysis to Subdomains . . . . . . . . . . . . . . . . . . . H-h Bounds on the Schur Complements’ Spectrum . . . . . . . . 10.8.1 An Upper Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.8.2 A Lower Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.9 Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.10 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.11 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
186 187 189 191 194 196 197 199 200 202 203 206 207 208
11 Frictionless Contact Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 11.1 11.2 11.3 11.4 11.5 11.6 11.7 11.8 11.9 11.10
Linearized Non-penetration Conditions . . . . . . . . . . . . . . . . . . . Equilibrium of a System of Elastic Bodies in Contact . . . . . . Variational Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tearing and Interconnecting . . . . . . . . . . . . . . . . . . . . . . . . . . . . Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dual Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Preconditioning by Projectors to Rigid Body Modes . . . . . . . Stable Evaluation of K+ x by Using Fixing Nodes . . . . . . . . . . Fast Reconstruction of Displacements . . . . . . . . . . . . . . . . . . . . Reducing the Analysis to Subdomains . . . . . . . . . . . . . . . . . . .
214 215 218 222 223 227 230 232 234 237
Contents
11.11 H-h Bounds on the Schur Complement’s Spectrum . . . . . . . . 11.11.1 An Upper Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.11.2 A Lower Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.12 Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.13 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.13.1 Cantilever Beams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.13.2 Roller Bearings of Wind Generator . . . . . . . . . . . . . . . 11.14 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
xix
238 238 239 242 244 244 245 246 248
12 Contact Problems with Friction . . . . . . . . . . . . . . . . . . . . . . . . . . 251 12.1 12.2 12.3 12.4 12.5 12.6 12.7 12.8 12.9
Equilibrium of Bodies in Contact with Coulomb Friction . . . Variational Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tresca (Given) Isotropic Friction . . . . . . . . . . . . . . . . . . . . . . . . Orthotropic Friction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Domain Decomposition and Discretization . . . . . . . . . . . . . . . . Dual Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Preconditioning by Projectors to Rigid Body Modes . . . . . . . Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.9.1 Cantilever Beams with Isotropic Friction . . . . . . . . . . 12.9.2 Cantilever Beams with Coulomb Orthotropic Friction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.9.3 Yielding Clamp Connection . . . . . . . . . . . . . . . . . . . . . 12.10 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
252 253 255 257 258 260 262 264 265 265 267 269 270 271
13 Transient Contact Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275 13.1 13.2 13.3 13.4 13.5 13.6
Transient Multibody Frictionless Contact Problem . . . . . . . . . Variational Formulation and Domain Decomposition . . . . . . . Space Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Time Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dual Formulation of Time-Step Problems . . . . . . . . . . . . . . . . Bounds on the Spectrum of Relevant Matrices . . . . . . . . . . . . 13.6.1 Bounds on the Mass Matrix Spectrum . . . . . . . . . . . . 13.6.2 Bounds on the Dual Stiffness Matrix Spectrum . . . . 13.7 Preconditioning by Conjugate Projector . . . . . . . . . . . . . . . . . . 13.8 Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13.9 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13.9.1 Academic Benchmark . . . . . . . . . . . . . . . . . . . . . . . . . . . 13.9.2 Impact of Three Bodies . . . . . . . . . . . . . . . . . . . . . . . . . 13.10 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
276 278 279 281 282 284 284 285 286 291 292 292 294 295 296
xx
Contents
14 TBETI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299 14.1 14.2 14.3 14.4 14.5 14.6 14.7 14.8 14.9 14.10 14.11
Green’s Representation Formula for 2D Laplacian . . . . . . . . . Steklov-Poincaré Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Decomposed Boundary Variational Inequality . . . . . . . . . . . . . Boundary Discretization and TBETI . . . . . . . . . . . . . . . . . . . . . Operators of Elasticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Decomposed Contact Problem on Skeleton . . . . . . . . . . . . . . . TBETI Discretization of Contact Problem . . . . . . . . . . . . . . . . Dual Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bounds on the Dual Stiffness Matrix Spectrum . . . . . . . . . . . . Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.11.1 Academic Benchmark . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.11.2 Comparing TFETI and TBETI . . . . . . . . . . . . . . . . . . 14.11.3 Ball Bearing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.12 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
300 302 304 306 309 311 312 314 316 318 320 320 321 322 323 323
15 Hybrid TFETI and TBETI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 15.1 15.2 15.3
Hybrid TFETI for 2D Scalar Problems . . . . . . . . . . . . . . . . . . . Hybrid TFETI Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Orthogonal Rigid Body Modes . . . . . . . . . . . . . . . . . . . . . . . . . . 15.3.1 Rigid Body Modes of the Interiors of Faces . . . . . . . . 15.3.2 Rigid Body Modes of Subdomains . . . . . . . . . . . . . . . . 15.4 Joining Subdomains by the Face Rigid Body Modes . . . . . . . 15.4.1 Coupling Two Adjacent Faces . . . . . . . . . . . . . . . . . . . 15.4.2 Interconnecting General Clusters . . . . . . . . . . . . . . . . . 15.5 General Bounds on the Spectrum of Clusters . . . . . . . . . . . . . 15.6 Bounds on the Spectrum of Chained Clusters . . . . . . . . . . . . . 15.6.1 Angle Between the Kernel and Complement of Feasible Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15.6.2 The Norm of Face’s Rigid Body Modes . . . . . . . . . . . 15.6.3 The Norm of Subdomain’s Rigid Body Modes . . . . . 15.6.4 An Estimate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15.7 Bounds on the Spectrum of Cube Clusters . . . . . . . . . . . . . . . . 15.8 Hybrid TFETI and TBETI for Contact Problems . . . . . . . . . 15.9 Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15.10 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15.10.1 Scalar Variational Inequality and Clustering Effect . 15.10.2 Clumped Elastic Cube . . . . . . . . . . . . . . . . . . . . . . . . . . 15.10.3 Clumped Cube Over Obstacle . . . . . . . . . . . . . . . . . . . 15.11 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
326 328 331 331 332 334 334 336 337 339 339 340 341 342 345 348 350 351 352 352 353 354 355
Contents
xxi
16 Mortars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359 16.1 Variational Non-penetration Conditions . . . . . . . . . . . . . . . . . . 16.2 Variationally Consistent Discretization . . . . . . . . . . . . . . . . . . . 16.3 Conditioning of Mortar Non-penetration Matrix . . . . . . . . . . . 16.4 Combining Mortar Non-penetration with TFETI . . . . . . . . . . 16.5 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16.6 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
360 361 363 367 370 372 372
17 Preconditioning and Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375 17.1 17.2 17.3 17.4
Reorthogonalization-Based Preconditioning . . . . . . . . . . . . . . . Renormalization-Based Stiffness Scaling . . . . . . . . . . . . . . . . . . Lumped and Dirichlet Preconditioners in Face . . . . . . . . . . . . Adaptive Augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.4.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.4.2 Convergence of Adaptive SMALBE-M . . . . . . . . . . . . 17.5 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.5.1 Reorthogonalization—3D Heterogeneous Beam . . . . 17.5.2 Reorthogonalization—Contact Problem with Coulomb Friction . . . . . . . . . . . . . . . . . . . . . . . . . . 17.5.3 Adaptive Augmentation—3D Hertz Problem . . . . . . 17.6 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
376 378 380 381 381 385 386 386 387 389 391 391
Part IV Other Applications and Parallel Implementation 18 Contact with Plasticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397 18.1
Algebraic Formulation of Contact Problem for Elasto-Plastic Bodies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18.2 Semi-smooth Newton Method for Optimization Problem . . . 18.3 Algorithms for Elasto-Plasticity . . . . . . . . . . . . . . . . . . . . . . . . . 18.4 TFETI Method for Inner Problem and Benchmark . . . . . . . . 18.5 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18.6 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
398 399 400 401 401 403 403
19 Contact Shape Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405 19.1 19.2 19.3 19.4
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Discretized Minimum Compliance Problem . . . . . . . . . . . . . . . Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
405 406 407 410
xxii
Contents
19.5 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412 20 Massively Parallel Implementation . . . . . . . . . . . . . . . . . . . . . . . . 413 20.1 20.2 20.3 20.4
Parallel Loading and Decomposition of Meshes . . . . . . . . . . . . Interconnecting Clusters by Local Saddle Point Problems . . . When to Use TFETI and H-TFETI . . . . . . . . . . . . . . . . . . . . . Parallel Preprocessing of FETI Objects . . . . . . . . . . . . . . . . . . 20.4.1 Factorization of K . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20.4.2 Assembling of GGT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20.5 Factorization of GGT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20.6 Parallel H-TFETI Solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20.7 MatSol, PERMON, and ESPRESO Libraries . . . . . . . . . . . . . 20.7.1 MatSol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20.7.2 PERMON . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20.7.3 ESPRESO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20.8 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20.8.1 H-TFETI on Large Linear Problem . . . . . . . . . . . . . . 20.9 Two-Body Contact Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
414 416 419 421 422 423 424 425 426 426 427 427 428 428 429 431
Notation and List of Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433 All authors Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
Chapter 1
Contact Problems and Their Solution Zdeněk Dostál
We start our exposition with an informal overview of contact problems, including frictionless problems, problems with friction, and dynamic contact problems. We discuss their specific features, including some theoretical results concerning the existence and uniqueness of a solution, and mention some historical results and references. Next, we present the main tools for their practical solution and the ingredients of scalable algorithms. We assume that the strains and displacements are small, at least in one time step in the case of transient problems, so we can use linear elasticity to formulate the conditions of equilibrium and the equations of motion.
1.1 Frictionless Contact Problems We speak about frictionless contact problems whenever we can obtain an acceptable solution under the assumption that we can neglect the tangential forces on the contact interface. Many such problems arise in mechanical engineering whenever there is a need to predict the stress or deformation of the moving parts of machines or vehicles. We often assume that we can also neglect the inertial forces. The static frictionless contact problems are the most simple ones, so it is not surprising that the first numerical results were obtained just for them. The first publication dates back to 1881 when Heinrich Hertz published his paper “On the contact of elastic solids” [1]. Hertz observed that when two smooth bodies of a particular shape come into contact so that the contact area is much smaller than the characteristic radius of each body, then the Z. Dostál Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_1
1
2
Zdeněk Dostál
active contact interface is confined to a small region of a predictable shape. As a result, it is possible to simplify the equilibrium conditions near the contact interface so that they can be solved analytically. A variant of such a problem is in Fig. 1.1. Hertz’s results are still relevant for the design of bearings, gears, and other bodies when two smooth and nonconforming surfaces come into contact, the strains are small, and within elastic limits, the area of contact is much smaller than the characteristic radius of the bodies, and the friction can be neglected. A later overview of the analytical solutions of contact problems can be found in Johnson [2]. We use them to verify the results of our benchmarks.
g g
Fig. 1.1 Stacked lenses pressed together
As an example of the frictionless contact problem with a small contact interface that cannot be solved analytically, let us consider the problem to describe the deformation and contact pressure in the ball bearings depicted in Fig. 1.2. A new feature of this problem is the complicated interaction of several bodies, possibly without prescribed displacements, through the nonlinear non-penetration conditions. The bodies are of different shapes—we can easily recognize balls, rings, and cages. The balls are not fixed in their cages, so their stiffness matrices are singular. The prescribed forces cannot be arbitrary, as the balls can be in equilibrium only if the moment of the external forces on the balls is equal to zero, as in our case. The displacements and forces are typically given on the surfaces of bodies, but active contact interfaces are only known after the problem is solved. Moreover, the displacements of the balls would not be uniquely determined even if we replaced all inequality constraints that describe the non-penetration with the corresponding equalities describing the related bilateral constraints. A solution of the ball bearings and other frictionless contact problems with “floating” bodies exists if the external forces acting on each body are balanced (including the reaction forces). If the prescribed displacements of each body prevent its rigid body motion, then the solution exists and is necessarily unique. The equilibrium conditions of a general elastic body in contact with a frictionless rigid obstacle were formulated in 1933 by Antonio Signorini [3] as a boundary value problem with “ambiguous boundary conditions,” i.e., with
1 Contact Problems and Their Solution
3
Fig. 1.2 Ball bearings
Dirichlet or Neumann boundary conditions and a solution-dependent rule specifying which is valid. The first results concerning the existence and uniqueness of a solution to the class of Signorini problems date back to Gaetano Fichera, who published the paper “On the elastostatic problem of Signorini with ambiguous boundary conditions” [4] in 1964. Fichera observed that the variational formulation of the Signorini conditions describes the minimizer of an energy function. A later presentation of the existence results based on the coercivity of the energy function can be found, e.g., in Hlaváček et al. [5] or Kikuchi and Oden [6]. However, the most popular sufficient condition for the existence of a solution, the coercivity of the energy functional, does not hold for the bearing problems.
1.2 Contact Problems with Friction The contact problems with friction arise whenever we consider tangential forces on the contact interface. They are important in many areas of engineering, including locomotive wheel-rail or tire-road contact, braking systems, yielding clamp connections, etc.
F
Fig. 1.3 Leonardo da Vinci experiment with friction
4
Zdeněk Dostál
The effort to describe the friction forces phenomenologically dates back to Leonardo da Vinci. Carrying out the experiments like those depicted in Fig. 1.3, Leonardo observed that the area of contact interface has no effect on the friction and, if the load on the object is doubled, the friction is also doubled. The latter is a particular version of the most popular friction law named after Charles-Augustin de Coulomb, used in many applications. This law was formulated for quasi-static contact by Amontons in 1691 and extended to dynamic situations by Coulomb in 1781 [7]. The Coulomb (also Amontons-Coulomb) friction law claims that “strength due to friction is proportional to the compressive force.” The Coulomb friction law agrees in many cases with observations, but Coulomb himself noticed that “for large bodies friction does not follow exactly this law.” A more recent discussion of friction laws is in Wriggers [8].
R1 d
R2 Fig. 1.4 Yielding support
Fig. 1.5 Yielding clamp connection
A realistic example of a contact problem with friction is the analysis of the mine support yielding clamp connection depicted in Figs. 1.4 and 1.5. The support comprises several parts deformed by applied forces. As in the ball bearing example, some parts are without prescribed displacements. Therefore, the numerical algorithms for the solution must be prepared to deal with singular stiffness matrices of the discretized problem. A new difficulty is balancing the contact forces in the tangential direction. Other complications arise in the variational formulation due to the nondifferentiable dissipative term in the non-convex energy function. To solve the problem with friction, we have to identify the unknown contact interface and its unknown slip-stick parts. Unlike the frictionless case, the existence theory for the quasi-static contact problems with Coulomb’s friction is much weaker and does not guarantee a solution for many real problems. Moreover, a solution is unique only in very exceptional cases. Early results concerning the existence and uniqueness of the solution of the contact problems with friction can be found in Duvaut and Lions [9]. A more recent overview of the results concerning the existence
1 Contact Problems and Their Solution
5
and uniqueness of a solution to contact problems with friction can be found in the book by Eck et al. [10]. The most comprehensive theory has been developed for the Tresca (given) friction, which assumes that the contact area and the pressure force, which defines the slip-stick bound, are known a priori. Though this assumption is not realistic, it is helpful as a well-understood convex approximation of the Coulomb friction with strong theoretical results on the existence and uniqueness. Our book uses the Tresca friction as an auxiliary problem in the fixed-point algorithms for the Coulomb friction.
1.3 Transient Contact Problems The transient contact problems arise whenever we have to consider the inertia forces. Therefore, it is more realistic to consider dynamic models than those considered above. Many such problems occur in mechanical engineering, geophysics, or biomechanics when the bodies in contact move fast, like in car engines. However, providing a useful solution to realistic problems requires overcoming the difficulties specified above for the static problems and resolving the problems arising from the lack of smoothness of time derivatives, which puts a high demand on the construction of effective time discretization schemes. Moreover, since the solution of transient problems is typically reduced to a sequence of related static problems, it is natural to assume that their solution is much more time-consuming than related static problems. Other complications arise from the time-dependent coupling of the boundary points that come to contact.
Fig. 1.6 Crash test
An example of a real transient contact problem is the analysis of a crash test depicted in Fig. 1.6. Sophisticated time discretization is necessary to preserve energy. A reasonable time discretization requires the solutions of 3 5 .10 –.10 auxiliary static problems per second so that an efficient solver of the static problems is necessary. Moreover, it is required to identify a possible contact interface in each time step. The simplifications, as compared with
6
Zdeněk Dostál
the static problems, concern the regularization of the stiffness matrices of the “floating” bodies by the mass matrix and a good initial approximation of the contact interface from the previous time step. Despite much research, little is known about the problem solvability, so a numerical solution is typically obtained assuming it exists. Moreover, the solution is assumed to be sufficiently smooth so that its second time derivatives exist in some good sense and can be approximated by finite differences. The complications arise from the hyperbolic character of the equilibrium of inner forces. An overview of the results concerning the existence and uniqueness of a solution to transient contact problems can be found in the book by Eck et al. [10].
1.4 Numerical Solution of Contact Problems Given a system of elastic bodies in a reference configuration, traction, and boundary conditions, including contact conditions on a potential contact interface, the complete solution of a related contact problem comprises the displacements of the bodies and reaction forces on a contact interface. The stress and strain fields in the bodies can be evaluated additionally. A specific feature of contact problems is the strong nonlinearity in variables associated with an a priori unknown contact interface. To briefly describe how to resolve the challenges posed by the numerical solution to contact problems, we split the procedure into three stages: • A continuous formulation of the conditions of equilibrium and boundary conditions. • The discretization of the continuous problem. • The solution of the discretized problem.
1.4.1 Continuous Formulation The displacements of a system of elastic bodies subjected to a given traction satisfy the Lamé equations of equilibrium in the interior of each body and boundary conditions. The latter can include prescribed displacements (Dirichlet’s conditions), given traction (Neumann’s conditions), and contact conditions. Thus, the solution of contact problems amounts to solving a nonlinear elliptic boundary value problem with boundary conditions that include inequalities. A solution exists if all data are sufficiently “smooth” and satisfy some additional restrictive assumptions. A suitable general formulation of contact problems uses variational inequalities describing the equilibrium in “average.” The variational formulation simplifies the treatment of discontinuous coefficients and/or traction
1 Contact Problems and Their Solution
7
and helps formulate all contact problems, including those with additional nonlinearities, such as large deformations or plasticity. The variational formulation in suitable function space is mathematically sound as it enables results on the existence and uniqueness of a solution under sufficiently general conditions on prescribed data. Moreover, the displacements that satisfy the variational inequality describing the equilibrium of elastic bodies in frictionless contact minimize the energy functional subject to bounds on boundary displacements. The energy formulation introduces a structure that solution algorithms can exploit. If some additional assumptions are satisfied, it is possible to express the interior displacements in terms of the boundary displacements and their derivatives. Then the Lamé equations in the interior of bodies are replaced by the Calderon system of boundary integral equations, and both the variational inequality and the energy function are reduced to the boundary of bodies, which reduces the dimension of the problem by one. Though the reduced variational formulation is not as general as the full one, some problems can be solved more efficiently in the boundary variational formulation, including the exterior problems, the problems with cracks, or the contact shape optimization problems. In addition, the continuous formulation can include the decomposition of bodies into parts that can be interconnected by the equality constraints as in Fig. 1.7.
Fig. 1.7 Ball bearings’ decomposition
1.4.2 Discretization The discretization of contact problems is partly the same as that of linear problems—the finite element method (FEM) and the boundary element method (BEM) [11] are widely used. Let us mention that the surface BEM discretization is much simpler than the volume FEM discretization. This feature can be useful as the space discretization of huge problems can be
8
Zdeněk Dostál
time-consuming. If the continuous formulation does not include domain decomposition, then the decomposition can be combined with discretization. Though a direct linearization of the non-penetration conditions is possible and works well with matching discretizations, this approach is not supported by an approximation theory. For example, more sophisticated discretization is necessary when the irregularly discretized contact interface is large and curved. To resolve the problem, we can impose the contact conditions on average using a variational form and use the variationally consistent discretization by biorthogonal mortars introduced by Wohlmuth [12]. A more engineering presentation of the mortar discretization is in Laursen [13]. Under reasonable conditions, the biorthogonal mortars generate well-conditioned and nicely structured constraint matrices that comply with the scalability theory.
1.4.3 Solution and Scalable Algorithms The early algorithms based on FEM were developed in the late 1960s; see, e.g., Goodman et al. [14] or Chan and Tuba [15]. They typically imposed the contact conditions by the solution-dependent penalty enhanced into a special contact element. The convergence of early algorithms was not supported by theory. An early algorithm for problems with friction based on the minimization of energy was proposed by Panagiotopoulos [16]. Modern algorithms for contact problems usually reduce the solution to a series of auxiliary linear systems or to quadratic programming (QP) or QCQP (quadratically constrained quadratic programming) problems with a specific structure that can be effectively exploited in computations. For example, the high-precision solution of the linear systems used by Newton-type algorithms is typically obtained by direct solvers, which can now solve sparse linear systems up to millions of unknowns. However, the number of arithmetic operations required by direct methods increases with the square of the numbers of unknowns—too much to solve larger problems. Two ingredients are necessary to develop a theoretically supported algorithm for contact problems with asymptotically linear (optimal) complexity: • The procedure that reduces the problem to a well-conditioned sparse problem with a special structure. • An algorithm that can exploit favorable conditioning to find the solution in a uniformly bounded number of cheap iterations. It is not trivial to satisfy the above conditions, even for linear problems. The first numerically scalable algorithm for the latter was the multigrid proposed by Fedorenko [17] in 1961. However, the method was only fully appreciated after the faster computers cleared the way to solving larger problems. The multigrid now competes with the variants of the FETI (finite element
1 Contact Problems and Their Solution
9
tearing and interconnecting) domain decomposition method introduced by Farhat and Roux [18] in 1991. Both approaches exploit coarse spaces to generate well-conditioned systems, which can be solved efficiently by standard iterative solvers with the rate of convergence depending on the condition number of the energy function’s Hessian matrix. However, the methods differ in the way the coarse grids are generated. The auxiliary coarse problems used by the multigrid are typically generated by the hierarchy of discretizations of the original problem. In contrast, those used by FETI are generated by the domain decomposition as in Fig. 1.7. The coarse grids generated by the latter are much coarser and seem to comply better with modern supercomputers. The power of scalable linear solvers is demonstrated in Fig. 1.8 where is the solution of the 3D Laplace equation on the unit cube using varying discretizations and an H-TFETI domain decomposition solver. Notice that the time (about 10 min) and the number of iterations (29–31) are nearly constant. Since the cost of matrix-vector multiplications increases linearly with the number of variables, the algorithm’s computational complexity is approximately linear for the problems discretized by hundreds of billions of nodal variables. The cost of an inexact solution by a scalable solver is also proportional to the precision of the solution.
Fig. 1.8 Weak scalability of H-TFETI for linear system, ESPRESO software. The two bottom lines give the numbers of nodal variables (.×109 ) and clusters, respectively. Courtesy of Říha et al. [19]
The scalable solvers of linear equations can be used to implement the Newton steps in the search for the solution of the nonsmooth equations describing contact problems; see, e.g., the comprehensive paper by Wohlmuth [12].
10
Zdeněk Dostál
If there are no floating bodies, the nonsmooth Newton algorithm enjoys superlinear local convergence and can help implement additional nonlinearities such as plasticity. However, far from the solution, combining the Newton steps with a globalization strategy that slows down the rate of convergence is necessary. Alternatively, the scalable iterative linear solvers can be plugged into practical optimization algorithms to generate cheap approximations of the Newton steps and exploit the specific structure of discretized contact problems. Robust theoretically and/or experimentally supported scalable algorithms for contact problems combine the FETI domain decomposition methods introduced by Farhat and Roux [18] and the BETI domain decomposition methods introduced by Langer and Steinbach [11] with quadratic programming and QCQP algorithms. FETI and BETI are based on duality, transforming all inequality constraints into bound or separable quadratic constraints. As a result, the related optimization problems have a structure that specialized algorithms can exploit. Moreover, hybrid variants of FETI and BETI use the two-level coarse grids associated with subdomains and clusters and can effectively exploit the hierarchical structure and massively parallel capabilities of modern supercomputers. Recall that the clusters are groups of subdomains joined on the primal level by the corners and/or edge and/or face averages of rigid body motions.
References 1. Hertz, H.R.: On the contact of elastic solids. J. für die reine und angewandte Mathematik. 92, 156–171 (1981): In English translation: Hertz, H.: Miscellaneous Papers, pp. 146–183. Macmillan, New York (1896) 2. Johnson, K.L.: Contact Mechanics. Cambridge University Press, Cambridge (1985) 3. Signorini, A.: Sopra akune questioni di elastostatica. Atti della Societa Italiana per il Progresso delle Scienze 21(II), 143–148 (1933) 4. Fichera, G.: Problemi elastostatici con vincoli unilaterali: Il Problema di Signorini con ambique condizioni al contorno. Memorie della Accademia Nazionale dei Lincei, Serie VIII VII, 91–140 (1964) 5. Hlaváček, I., Haslinger, J., Nečas, J., Lovíšek, J.: Solution of Variational Inequalities in Mechanics. Springer, Berlin (1988) 6. Kikuchi, N., Oden, J.T.: Contact Problems in Elasticity. SIAM, Philadelphia (1988) 7. Coulomb, C.A.: Théorie des machines simples, en ayant égard au frottement de leurs parties et la raideur des cordages. Mémoirs Savants Étrangers X, 163–332 (1785)
1 Contact Problems and Their Solution
11
8. Wriggers, P.: Computational Contact Mechanics, Springer Nature, Switzerland (2019) 9. Duvaut, G., Lions, J.L.: Inequalities in Mechanics and Physiscs, Springer, Berlin (1976) 10. Eck, C., Jarůšek, J., Krbec, M.: Unilateral Contact Problems. Chapman & Hall/CRC, London (2005) 11. Langer, U., Steinbach, O.: Boundary element tearing and interconnecting methods. Computing 71, 205–228 (2003) 12. Wohlmuth, B.I.: Variationally consistent discretization scheme and numerical algorithms for contact problems. Acta Numer. 20, 569–734 (2011) 13. Laursen, T.A: Computational Contact and Impact Mechanics. Springer, Berlin (2003) 14. Goodman, R.E., Taylor R.L., Brekke T.L.: A model for the mechanics of jointed rock. J. Soil Mech. Found., Engrg. Div. ASCE 99, 637–660 (1968) 15. Chan, S.H., Tuba, I.S.: A finite element method for contact problems of solid bodies. Int. J. Mech. Sci. 13, 615–639 (1971) 16. Panagiotopoulos, P.D.: A Nonlinear programming approach to the unilateral contact and friction boundary value problem. Ingenieur-Archiv 44, 421–432 (1975) 17. Fedorenko, R.P.: A relaxation method for solving elliptic difference equations. Zhurnal Vychislitel’noj Matemaki i Matematicheskoj Fiziki 1, 922– 927 (1961). Also in USSR Computational Math. Math. Physics, I, 1092– 1096 (1962) 18. Farhat, C., Roux, F.-X.: A method of finite element tearing and interconnecting and its parallel solution algorithm. Int. J. Numer. Methods Eng. 32, 1205–1227 (1991) 19. Říha, L., Brzobahatý, T., Markopoulos, A., Meca, O.: ESPRESO – Highly Parallel Framework for Engineering Applications. http://numbox.it4i.cz
Part I Basic Concepts
Chapter 2
Linear Algebra Zdeněk Dostál and Tomáš Kozubek
The purpose of this chapter is to briefly review the notations, definitions, and results of linear algebra that are used in the rest of the book. There is no claim of completeness as the reader is assumed to be familiar with the basic concepts of college linear algebra, such as vector spaces, linear mappings, matrix decompositions, etc. More systematic exposition and additional material can be found in the books by Demmel [1], Laub [2], Golub and Van Loan [3], or Mayer [4].
2.1 Vectors and Matrices In this book, we work with n-dimensional arithmetic vectors .v ∈ Rn , where R denotes the set of real numbers. The only exception is Sect. 2.9, where vectors with complex entries are considered. We denote the i th component of an arithmetic vector .v ∈ Rn by .[v]i . Thus, .[v]i = vi if .v = [vi ] is defined by its components .vi . All the arithmetic vectors are considered by default to be column vectors. The relations between vectors .u, v ∈ Rn are defined componentwise. Thus, .u ≤ v is equivalent to .[u]i ≤ [v]i , .i = 1, . . . , n. We sometimes call the elements of .Rn points to indicate that the concepts of length and direction are not important.
.
Z. Dostál • T. Kozubek Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_2
15
16
Zdeněk Dostál and Tomáš Kozubek
The vector analog of .0 ∈ R is the zero vector .on ∈ Rn with all the entries equal to zero. When the dimension can be deduced from the context, possibly using the assumption that all expressions in our book are well defined, we often drop the subscript and write simply .o. Given vectors .v1 , . . . , vk ∈ Rn , the set .
Span {v1 , . . . , vk } = {v ∈ Rn : v = α1 v1 + · · · + αk vk , αi ∈ R}
is a vector space called the linear span of .v1 , . . . , vk . For example, .
Span {s1 , . . . , sn } = Rn ,
[si ]j = δij ,
i, j = 1, . . . , n,
where .δij denotes the Kronecker symbol defined by .δij = 1 for .i = j and δ = 0 for .i = j, is the standard basis of .Rn . We sometimes use the componentwise extensions of scalar functions to vectors. Thus, if .v ∈ Rn , then .v+ and .v− are the vectors with the i th components .max{[v]i , 0} and .min{[v]i , 0}, respectively. If .I is a nonempty subset of .{1, . . . , n} and .v ∈ Rn , then we denote by .[v]I or simply .vI the subvector of .v with components .[v]i , i ∈ I. Thus, if m .I has m elements, then .vI ∈ R , so we can refer to the components of .vI either by the global indices .i ∈ I or by the local indices. j ∈ {1, . . . , m}. We usually rely on the reader’s judgment to recognize the appropriate type of indexing. Similar to the related convention for vectors, the (i, j )th component of a matrix .A ∈ Rm×n is denoted by .[A]ij , so that .[A]ij = aij for .A = [aij ] which is defined by its entries .aij . A matrix .A ∈ Rm×n is called an (m, n)-matrix . The matrix analog of 0 is the zero matrix .Omn ∈ Rm×n with all the entries equal to zero. When the dimension is clear from the context, we often drop the subscripts and write simply .O. The matrix counterpart of .1 ∈ R in .Rn×n is the identity matrix .In = [δij ] of the order n. When the dimension may be deduced from the context, we often drop the subscripts and write simply .I. Thus, we can write . ij
A = IA = AI
.
for any matrix .A, having in mind that the order of .I on the left may be different from that on the right. A matrix .A is positive definite if .xT Ax > 0 for any .x = o, positive semidefinite if .xT Ax ≥ 0 for any .x, and indefinite if neither .A nor .−A is positive definite or semidefinite. We are especially interested in symmetric positive definite (SPD) or symmetric positive semidefinite (SPS) matrices. The matrices in our applications are often sparse in the sense that they have a small number of nonzero entries distributed in a pattern that can be exploited to the efficient implementation of matrix operations or to the storage reduction.
2 Linear Algebra
17
If .A ∈ Rm×n , .I ⊆ {1, . . . , m}, and .J ⊆ {1, . . . , n}, with .I and .J nonempty, we denote by .AIJ the submatrix of .A with the components .[A]ij , i ∈ I and j ∈ J . The local indexing of the entries of .AIJ is used whenever it is convenient in a similar way as the local indexing of subvectors which was introduced above. The full set of indices can be replaced by * so that .A = A∗∗ and .AI∗ denotes the submatrix of .A with the row indices belonging to .I. Occasionally, we simplify .AI = AI∗ . Sometimes, it is useful to rearrange the matrix operations into manipulations with submatrices of given matrices called blocks. A block matrix m×n .A ∈ R is defined by its blocks .Aij = AIi Jj , where .Ii and .Jj denote nonempty contiguous sets of indices of .{1, . . . , m} and .{1, . . . , n}, respectively. We can use the block structure to implement matrix operations only when the block structure of the involved matrices matches. The entries .Aij of a block matrix .A can be scalar multiples of a matrix .C, i.e., .Aij = bij C. Such matrix .A is called the Kronecker product .B ⊗ C of m×n .B = [bij ] and .C. If .B ∈ R , .C ∈ Rp×q , then ⎤ ⎡ b11 C · · · b1n C ⎢ .. .. .. ⎥ ∈ Rmp×nq . .A = B ⊗ C = ⎣ . . . ⎦ bm1 C · · · bmn C The block structure of the Kronecker product occurs in many applications and allows a deep insight into some special problems. We shall use it in the hybrid domain decomposition method analysis in Chap. 15. Relevant Kronecker product properties include (αB) ⊗ C = B ⊗ (αC) = α(B ⊗ C), (B ⊗ C)T = BT ⊗ CT , (B ⊗ C)(D ⊗ E) = (BD ⊗ CE), .
(B ⊗ C) ⊗ D = B ⊗ (C ⊗ D).
(2.1) (2.2) (2.3) (2.4)
The dimensions of the matrices in (2.3) must comply with the definitions of BD and .CE so that the latter products exist. The properties (2.1)–(2.4) can be verified directly by computations. For example, to check (2.3), notice that ⎤⎡ ⎤ ⎡ d11 E · · · d1p E b11 C · · · b1n C ⎢ .. .. .. ⎥ ⎢ .. .. .. ⎥ , .(B ⊗ C)(D ⊗ E) = ⎣ . . . ⎦⎣ . . . ⎦
.
bm1 C · · · bmn C
dn1 E · · · dnp E
so the block entry in ith row and jth column reads [(B ⊗ C)(D ⊗ E)]ij = (bi1 d1j + · · · + bin dnj )CE = [BD]ij CE.
.
If .u ∈ Rm and .v ∈ Rn , then .u ⊗ v ∈ Rmn . If .B ∈ Rm×m and .C ∈ Rn×n are square matrices and .Bu = βu and .Cv = βv, then we can use (2.1) and (2.3)
18
Zdeněk Dostál and Tomáš Kozubek
to get (B ⊗ C)(u ⊗ v) = βγ(u ⊗ v).
.
(2.5)
It follows that (B ⊗ In )(u ⊗ v) = β(u ⊗ v),
.
(Im ⊗ C)(u ⊗ v) = γ(u ⊗ v),
and .
(B ⊗ In + Im ⊗ C) (u ⊗ v) = (β + γ)(u ⊗ v).
(2.6)
More information can be found in the books by Golub and Van Loan [3, Chap. 12] or Horn and Johnson [5, Chap. 6].
2.2 Matrices and Mappings Each matrix .A ∈ Rm×n defines the mapping which assigns to each .x ∈ Rn the vector .Ax ∈ Rm . Two important subspaces associated with this mapping are its range or image space .ImA and its kernel or null space .KerA; they are defined by ImA = {Ax : x ∈ Rn }
and
.
KerA = {x ∈ Rn : Ax = o}.
The range of .A is the span of its columns. The rank and the defect of a matrix are defined as the dimensions of its image and kernel, respectively. If f is a mapping defined on .D ⊆ Rn and .Ω ⊆ D, then .f |Ω denotes the restriction of f to .Ω, that is, the mapping defined on .Ω which assigns to each m×n .x ∈ Ω the value .f (x). If .A ∈ R and V is a subspace of .Rn , we define .A|V as a restriction of the mapping associated with .A to V. The restriction T .A|V is said to be positive definite if .x Ax > 0 for .x ∈ V, x = o, and positive T semidefinite if .x Ax ≥ 0 for .x ∈ V . The mapping associated with .A is injective if .Ax = Ay implies .x = y. It is easy to check that the mapping associated with .A is injective if and only if .KerA = {o}. If .m = n, then .A is injective if and only if .ImA = Rn . A subspace .V ⊆ Rn which satisfies AV = {Ax : x ∈ V } ⊆ V
.
is an invariant subspace of .A. Obviously, A(ImA) ⊆ ImA,
.
so that .ImA is an invariant subspace of .A. A projector is a square matrix .P that satisfies P2 = P.
.
2 Linear Algebra
19
A vector .x ∈ ImP if and only if there is .y ∈ Rn such that .x = Py, so that Px = P(Py) = Py = x.
.
If .P is a projector, then also .Q = I − P and .PT are projectors as (I − P)2 = I − 2P + P2 = I − P
.
and
PT
2
T = P2 = PT .
Since for any .x ∈ Rn x = Px + (I − P)x,
.
it simply follows that .ImQ = KerP, Rn = ImP + KerP,
.
and
KerP ∩ ImP = {o}.
We say that .P is a projector onto .U = ImP along .V = KerP and .Q is a complementary projector onto V along U. The above relations may be rewritten as ImP ⊕ KerP = Rn .
.
(2.7)
Let .(π(1), . . . , π(n)) be a permutation of numbers .1, . . . , n. Then the mapping which assigns to each .v = [vi ] ∈ Rn a vector .[vπ(1) , . . . , vπ(n) ]T is associated with the permutation matrix P = [sπ(1) , . . . , sπ(n) ],
.
where .si denotes the i th column of the identity matrix .In . If .P is a permutation matrix, then PPT = PT P = I.
.
Notice that if .B is obtained from a matrix .A by reordering of the rows of .A, then there is a permutation matrix .P such that .B = PA. Similarly, if .B is a matrix obtained from .A by reordering of the columns of .A, then there is a permutation matrix .P such that .B = AP.
2.3 Inverse and Generalized Inverse If .A is a square full rank matrix, then there is the unique inverse matrix .A−1 such that .
AA−1 = A−1 A = I.
(2.8)
20
Zdeněk Dostál and Tomáš Kozubek
The mapping associated with .A−1 is inverse to that associated with .A. If .A−1 exists, we say that .A is nonsingular. A square matrix is singular if its inverse matrix does not exist. If .P is a permutation matrix, then .P is nonsingular and P−1 = PT .
.
If .A is a nonsingular matrix, then .A−1 b is the unique solution of .Ax = b. If .A is nonsingular, then we can transpose (2.8) to get (A−1 )T AT = AT (A−1 )T = I,
.
so that .
(AT )−1 = (A−1 )T .
(2.9)
It follows that if .A is symmetric, then .A−1 is symmetric. If .A ∈ Rn×n is positive definite, then also .A−1 is positive definite, as any vector .x = o can be expressed as .x = Ay, .y = o, and xT A−1 x = (Ay)T A−1 Ay = yT AT y = yT Ay > 0.
.
If .A and .B are nonsingular matrices, then it is easy to check that also .AB is nonsingular and (AB)−1 = B−1 A−1 .
.
If
HII HIJ .H = HJ I HJ J
A BT = B C
is an SPD block matrix, then we can directly evaluate .
H−1 =
A BT B C
−1 =
A−1 − A−1 BT S−1 BA−1 , − A−1 BT S−1 , −S−1 BA−1 , S−1
(2.10)
where .S = C − BA−1 BT denotes the Schur complement of .H with respect to .A. Thus, −1 = S−1 . . H (2.11) JJ If .A ∈ Rm×n and .b ∈ ImA, then we can express a solution of the system of linear equations .Ax = b by means of a left generalized inverse matrix + .A ∈ Rn×m which satisfies .AA+ A = A. Indeed, if .b ∈ ImA, then there is .y such that .b = Ay and .x = A+ b satisfies
2 Linear Algebra
21
Ax = AA+ b = AA+ Ay = Ay = b.
.
Thus, .A+ acts on the range of .A like the inverse matrix. If .A is a nonsingular square matrix, then obviously .A+ = A−1 . Moreover, if .A ∈ Rn×n and n×p .R ∈ R are such that .AR = O, then .A+ + RRT is also a left generalized inverse as T + A = AA+ A + ARRT A = A. .A A + RR If .A is a symmetric singular matrix, then there is a permutation matrix .P such that
T T B C .A = P P, C CB−1 CT where .B is a nonsingular matrix, the dimension of which is equal to the rank of .A. It may be verified directly that the matrix
−1 T O # T B .A =P (2.12) P O O is a left generalized inverse of .A. If .A is SPS, then .A# is also SPS. Notice that if .AR = O, then .A+ = A# + RRT is also an SPS generalized inverse.
2.4 Direct Methods for Solving Linear Equations An inverse matrix is a useful tool for theoretical developments but not for computations. It is often much more efficient to implement the multiplication of a vector by the inverse matrix by solving the related system of linear equations. We briefly recall the direct methods, which reduce solving of the original system of linear equations to solving of a system or systems of equations with triangular matrices. A matrix .L = [lij ] is lower triangular if .lij = 0 for .i < j. It is easy to solve a system .Lx = b with the nonsingular lower triangular matrix .L ∈ Rn . As there is only one unknown in the first equation, we can find it and then substitute it into the remaining equations to obtain a system with the same structure but with only .n − 1 remaining unknowns. Repeating the procedure, we can find all components of .x. A similar procedure, but starting from the last equation, can be applied to a system with the nonsingular upper triangular matrix .U = [uij ] with .uij = 0 for .i > j. The solution costs of a system with triangular matrices increase proportionally to the square of the number of unknowns. In particular, the solution of a system of linear equations with a diagonal matrix .D = [dij ], dij = 0
22
Zdeněk Dostál and Tomáš Kozubek
for .i = j, reduces to the solution of a sequence of linear equations with one unknown. If we are to solve the system of linear equations with a nonsingular matrix, we can systematically use equivalent transformations to modify the original system to that with an upper triangular matrix. It is well known that the solution of a system of linear equations does not change if we change the order of the equations, replace any equation by its nonzero multiple, or add a scalar multiple of one equation to another equation. The Gauss elimination for the solution of a system of linear equations with a nonsingular matrix thus consists of two steps: the forward reduction, which exploits equivalent transformations to reduce the original system to that with an upper triangular matrix, and the backward substitution, which solves the resulting system with the upper triangular matrix. Alternatively, we can use suitable matrix factorizations. For example, it is well known that any SPD matrix .A can be decomposed into the product .
A = LLT ,
(2.13)
where .L is a nonsingular lower triangular matrix with positive diagonal entries. Having the decomposition, we can evaluate z = A−1 x
.
by solving the systems Ly = x
.
and
LT z = y.
The factorization-based solvers may be especially useful when we are to solve several systems of equations with the same coefficients but different righthand sides. The method of evaluation of the factor .L is known as the Cholesky factorization. The Cholesky factor .L can be computed column by column. Suppose that
a11 aT1 l11 o .A = and L= . l1 L22 a1 A22 Substituting for .A and .L into (2.13) and comparing the corresponding terms immediately reveal that l
. 11
=
√
a11 ,
−1 l1 = l11 a1 ,
L22 LT22 = A22 − l1 lT1 .
(2.14)
This gives us the first column of .L, and the remaining factor .L22 is simply the Cholesky factor of the Schur complement .A22 − l1 lT1 which is known to be positive definite, so we can find its first column by the above procedure. The algorithm can be implemented to exploit a sparsity pattern of .A, e.g., when .A = [aij ] ∈ Rn×n is a band matrix with .aij = 0 for .|i − j| > b, b n.
2 Linear Algebra
23
If .A ∈ Rn×n is only positive semidefinite, it can happen that .a11 = 0. Then 0 ≤ xT Ax = yT A22 y + 2x1 aT1 y
.
T for any vector .x = x1 , yT . The inequality implies that .a1 = o, as otherwise we could take .y = −a1 and large .x1 to get yT A22 y + 2x1 aT1 y = aT1 A22 a1 − 2x1 a1 2 < 0.
.
Thus, for .A symmetric positive semidefinite and .a11 = 0, (2.14) reduces to l
. 11
= 0,
l1 = o,
L22 LT22 = A22 .
(2.15)
This simple modification assumes exact arithmetics. In the computer arithmetics, the decision whether .a11 is to be treated as zero depends on some small .ε > 0. Alternatively, it is possible to exploit some additional information. For example, any orthonormal basis of the kernel of a matrix can be used to identify the zero rows (and columns) of a Cholesky factor by means of the following lemma. Lemma 2.1 Let .A ∈ Rn×n denote an SPS matrix with the kernel spanned by the matrix .R ∈ Rn×d with orthonormal columns. Let I = {i1 , . . . , id },
.
1 ≤ i1 < i2 < · · · < id ≤ n,
denote a set of indices, and let .J = N − I, Then .
N = {1, 2, . . . , n}.
4 λmin (AJ J ) ≥ λmin (A) σmin (RI∗ ),
(2.16)
where .λmin (A) and .σmin (RJ ∗ ) denote the least nonzero eigenvalue of .A and the least singular value of .RJ ∗ , respectively. Proof See Dostál et al. [6].
.
2.5 Schur Complement Let .A and .b denote a square matrix and a vector, respectively,
AII AIJ bI .A = , b= . AJ I AJ J bJ If the block .AII of .A is nonsingular, then we can solve the system of linear equations
24
Zdeněk Dostál and Tomáš Kozubek
AII xI + AIJ xJ = bI AJ I x I + AJ J x J = bJ .
in two steps. First evaluate .xI from the first equation to get −1 xI = A−1 II bI − AII AIJ xJ
.
and then substitute the result into the second equation to get (AJ J − AJ I A−1 II AIJ )xJ = bJ .
.
The matrix S = AJ J − AJ I A−1 II AIJ
.
is called the Schur complement of the block .AII . Let us assume that .A is symmetric, so that .ATIJ = AJ I , let .AII be nonsingular, and let us consider all .x and .y such that AII x + AIJ y = o.
.
Denoting x = x(y) =
.
−A−1 II AIJ y,
x z= , y
we get
AII AIJ x zT Az = xT yT AJ I AJ J y
.
−1 T T = yT ATIJ A−1 II AIJ y − y AIJ AII AIJ y T + yT ATIJ A−1 II AIJ y + y AJ J y
= yT Sy. It follows that if .A is an SPS or SPD matrix, then the Schur complement of A with respect to arbitrary nonsingular diagonal submatrix of .A is also SPS or SPD, respectively. If .A is an SPS matrix, then it is easy to check that .S “inherits” its kernel from .A. Indeed, if .Ae = o, then
.
AII eI + AIJ eJ = o,
.
AJ I eI + AJ J eJ = o, and
2 Linear Algebra
25
SeJ =(AJ J − AJ I A−1 II AIJ )eJ =AJ J eJ − AJ I A−1 II (−AII eI ) =AJ J eJ + AJ I eI = o,
.
i.e., .eJ ∈ KerS. The Schur complement is a useful tool that can be used to reduce the problems related to the manipulation with singular matrices to those with smaller matrices. Alternatively, we can use the Cholesky decomposition of the nonsingular block .AII to decompose an SPS matrix .A into the product of triangular and nearly triangular factors. To specify the procedure in more detail, let AII = LII LTII .
.
Denoting L
. JI
= AJ I L−T II ,
we can check that .
A=
AII AIJ AII AJ J
S = AJ J − AJ I A−1 II AIJ ,
=
LII O LJ I I
LTII LTJ I . O S
(2.17)
If .S is a small matrix, we can use the decomposition (2.17) for the effective construction of a generalized inverse. We shall summarize the main observations in the following lemma. Lemma 2.2 Let .A denote an SPS block matrix as in (2.17), let .e ∈ KerA, and let .S denote the Schur complement matrix of the block .AII . Then e
. J
∈ KerS.
If .S+ denote any generalized inverse of .S, then the matrix .A+ defined by
−T
O L−1 LJ J − L−T LTIJ S+ + J J J J .A = (2.18) O S+ −LIJ L−1 JJ I is a generalized inverse of .A which satisfies .
[A+ ]J J = S+ .
(2.19)
2.6 Norms General concepts of size and distance in a vector space are expressed by norms. A norm on .Rn is a function which assigns to each .x ∈ Rn a number
26
Zdeněk Dostál and Tomáš Kozubek
IxI ∈ R in such a way that for any vectors .x, y ∈ Rn and any scalar .α ∈ R, the following three conditions are satisfied:
.
(i) .IxI ≥ 0, and .IxI = 0 if and only if .x = o. (ii) .Ix + yI ≤ IxI + IyI. (iii) .IαxI = |α| IxI. It is easy to check that the functions . x 2 = x21 + · · · + x2n and x ∞ = max{|x1 |, . . . , |xn |} are norms. They are called . 2 (Euclidean) and . ∞ norms, respectively. A matrix variant of the Euclidean norm is the Frobenius norm n m . A F = a2ij , A ∈ Rm×n . i=1 j=1
Given a norm defined on the domain and the range of a matrix .A, we can define the induced norm .IAI of .A by IAxI . x=o IxI
IAI = sup IAxI = sup
.
IxI=1
If .B = O, then IABI = sup
.
x=o
IABxI IABxI IBxI IAyI IBxI = sup ≤ sup sup . IxI Bx=o IBxI IxI y∈ImB; IyI x=o IxI y=o
It follows easily that the induced norm is submultiplicative, i.e., .
IABI ≤ IA|ImBI IBI ≤ IAI IBI.
(2.20)
If .A = [aij ] ∈ Rm×n and .x = [xi ] ∈ Rn , then Ax ∞ = max |
n
.
i=1,...,m
j=1
aij xj | ≤ max
i=1,...,m
n
|aij ||xj | ≤ x ∞ max
i=1,...,m
j=1
n
|aij |,
j=1
n that is, . A ∞ ≤ maxi=1,...,m j=1 |aij |. Since the last inequality turns into the equality for a vector .x with suitably chosen entries .xi ∈ {1, −1}, we have max . A ∞ =
i=1,...,m
n j=1
|aij |.
(2.21)
2 Linear Algebra
27
2.7 Scalar Products General concepts of length and angle in a vector space are introduced by means of a scalar product; it is the mapping which assigns to each couple n n .x, y ∈ R a number .(x, y) ∈ R in such a way that for any .x, y, z ∈ R and any scalar .α ∈ R, the following four conditions are satisfied: (i) (ii) (iii) (iv)
(x, y + z) = (x, y) + (x, z). (αx, y) = α(x, y). .(x, y) = (y, x). .(x, x) > 0 for .x = o.
.
.
The scalar product is an SPD form; see also Chap. 4. We often use the Euclidean scalar product or the Euclidean inner product which assigns to each couple of vectors .x, y ∈ Rn a number defined by (x, y) = xT y.
.
In more complicated expressions, we often denote the Euclidean scalar product in .R3 by dot, so that x · y = xT y.
.
If .A is an SPD matrix, then we can define the more general .A-scalar product on .Rn by (x, y)A = xT Ay.
.
We denote for any .x ∈ Rn its Euclidean norm and .A-norm by x = (x, x)1/2
.
and
1/2
x A = (x, x)A ,
respectively. It is easy to see that any norm induced by a scalar product satisfies the properties (i) and (iii) of the norm. The property (ii) follows from the CauchySchwarz inequality .
(x, y)2 ≤ x 2 y 2 ,
(2.22)
which is valid for any .x, y ∈ Rn and any scalar product. The bound is tight in the sense that the inequality becomes the equality when .x, y are dependent. The inequality (2.22) opens a way to introduce the concept of angle between vectors by (x, y) ≤ 1, α ∈ [−π/2, π/2]. . cos α = x y A pair of vectors .x and .y is orthogonal (with respect to a given scalar product) if
28
Zdeněk Dostál and Tomáš Kozubek .
(x, y) = 0.
The vectors .x and .y that are orthogonal in .A-scalar product are called A-conjugate or briefly conjugate. Two sets of vectors .E and .F are orthogonal (also stated “.E orthogonal to .F ”) if any .x ∈ E is orthogonal to any ⊥ .y ∈ F . The set .E of all the vectors of .Rn that are orthogonal to .E ⊆ Rn is a vector space called an orthogonal complement of .E . If .E ⊆ Rn , then .
.
Rn = Span E ⊕ E ⊥ .
A set of vectors .E is orthogonal if its elements are pairwise orthogonal, i.e., any .x ∈ E is orthogonal to any .y ∈ E , .y = x. A set of vectors .E is orthonormal if it is orthogonal and .(x, x) = 1 for any .x ∈ E . A square matrix .U is orthogonal if .UT U = I, that is, .U−1 = UT . Multiplication by an orthogonal matrix .U preserves both the angles between any two vectors and the Euclidean norm of any vector as (Ux)T Uy = xT UT Uy = xT y.
.
A matrix .P ∈ Rn×n is an orthogonal projector if .P is aprojector, i.e., .P = P, and .ImP is orthogonal to .KerP. The latter condition can be rewritten equivalently as 2
PT (I − P) = O.
.
It simply follows that PT = PT P = P,
.
so that orthogonal projectors are symmetric matrices and symmetric projectors are orthogonal projectors. If .P is an orthogonal projector, then .I − P is also an orthogonal projector as (I − P)2 = I − 2P + P2 = I − P and
.
(I − P)T P = (I − P)P = O.
If . U ⊆ Rn is spanned by m independent columns of .U ∈ Rn×m , then P = U(UT U)−1 UT
.
is an orthogonal projector as P2 = U(UT U)−1 UT U(UT U)−1 UT = P
.
and
PT = P.
Since any vector .x ∈ U may be written in the form .x = Uy and Px = U(UT U)−1 UT Uy = Uy = x,
.
2 Linear Algebra
29
it follows that .U = ImP. Observe that .UT U is nonsingular; since .UT Ux = o implies Ux 2 = xT (UT Ux) = 0,
.
it follows that .x = o by the assumption on the full column rank of .U.
2.8 Orthogonalization If .A = [a1 , . . . , am ] ∈ Rn×m is a full column rank matrix, then it is often useful to find an orthonormal basis .q1 , . . . , qm of .ImA. The classical Gram-Schmidt orthogonalization generates an orthogonal basis .u1 , . . . , um and orthonormal basis .q1 , . . . , qm of .ImA starting from u 1 = a1 ,
.
q1 = u1 −1 u1 ,
r11 = (q1 , a1 ) = u1 ,
a1 = r11 q1 .
(2.23)
Having .q1 , . . . , qk−1 , .k ≤ m, the algorithm looks for .qk in the form qk = uk −1 uk ,
uk = ak − r1k q1 − · · · − rk−1,k qk−1 .
.
(2.24)
Taking the scalar product of the right equation of (2.24) with .qi , we get . ik
r
= (qi , ak ),
i = 1, . . . k − 1,
(2.25)
r
k−1 2 . = uk = ak 2 − rik
(2.26)
and
. kk
i=1
If we denote by .R ∈ Rm×m the upper triangular matrix with the nonzero entries defined by (2.25) and (2.26), we get the QR decomposition QQT = I,
A = QR,
.
Q = [q1 , . . . , qm ].
It follows that we can obtain .R alternatively by the Cholesky factorization AT A = LLT ,
.
L = RT .
If the columns of .A are not strongly independent, then the application of the classical Gram-Schmidt orthogonalization can be seriously flawed by rounding errors. This effect can be reduced without increasing the cost by replacing (2.25) by
30
Zdeněk Dostál and Tomáš Kozubek
r
. ik
= (qi , ak −
i−1
rjk qj ),
i = 1, . . . k − 1.
(2.27)
j=1
Though (2.25) and (2.27) are identical in exact arithmetic, it has been proved that the application of modified Gram-Schmidt orthogonalization using (2.27) generates better .Q, i.e., the bound on the violation of orthogonality given by e = QT Q − I
.
is smaller for the modified Gram-Schmidt algorithm (see Giraud et al. [7]). Still, better orthogonality can be achieved by executing the Gram-Schmidt twice.
2.9 Eigenvalues and Eigenvectors Let .A ∈ Cn×n denote a square matrix with complex entries. If a vector .e ∈ Cn and a scalar .λ ∈ C satisfy .
Ae = λe,
(2.28)
then .e is said to be an eigenvector of .A associated with an eigenvalue .λ. A vector .e is an eigenvector of .A if and only if . Span {e} is an invariant subspace of .A; the restriction .A|Span{e} reduces to the multiplication by .λ. If .{e1 , . . . , ek } are eigenvectors of a symmetric matrix .A, then it is easy to check that .Span{e1 , . . . , ek } and .Span{e1 , . . . , ek }⊥ are invariant subspaces. The set of all eigenvalues of .A is called the spectrum of .A; we denote it by .σ(A). Obviously, .λ ∈ σ(A) if and only if .A − λI is singular, and .0 ∈ σ(A) if and only if .A is singular. If .λ = 0, .λ ∈ σ(A), then we can multiply (2.28) by −1 −1 .λ A to get .A−1 e = λ−1 e, so we can write σ(A−1 ) = σ −1 (A).
.
If .U ⊆ Cn is an invariant subspace of .A ∈ Cn×n , then we denote by .σ(A|U ) the eigenvalues of .A that correspond to the eigenvectors belonging to U. Even though it is in general difficult to evaluate the eigenvalues of a given matrix .A, it is still possible to get nontrivial information about .σ(A) without heavy computations. Useful information about the location of eigenvalues can be obtained by Gershgorin’s theorem, which guarantees that every eigenvalue of .A = [aij ] ∈ Cn×n is located in at least one of the n circular disks in the complex plane with the centers .aii and radii .ri = j=i |aij |. The eigenvalues of a real symmetric matrix are real. Since it is easy to check whether a matrix is symmetric, this gives us useful information about the location of eigenvalues.
2 Linear Algebra
31
Let .A ∈ Rn×n denote a real symmetric matrix, let .I = {1, . . . , n − 1}, and let .A1 = AII . Let .λ1 ≥ · · · ≥ λn and .λ11 ≥ · · · ≥ λ1n−1 denote the eigenvalues of .A and .A1 , respectively. Then by the Cauchy interlacing theorem .
λ1 ≥ λ11 ≥ λ2 ≥ λ12 ≥ · · · ≥ λ1n−1 ≥ λn .
(2.29)
There is a beautiful result by Smith [8] on the interlacing of the eigenvalues of .A and its Schur complements. Let .I = {1, . . . , m}, .J = {m + 1, . . . , n}, and let .λi (A), .i = 1, . . . , n, and .λj (S), .j = 1, . . . , n − m, denote the entries of the decreasing series of the eigenvalues of .A and of the Schur complement S = AJ J − AJ I A−1 II AJ I .
.
Then .
AII ≥ λi (S) ≥ λi+m (A), i = 1, . . . , n − m.
(2.30)
2.10 SVD and CS Decompositions If .A ∈ Rn×n is a symmetric matrix, then it is possible to find n orthonormal eigenvectors .e1 , . . . , en that form the basis of .Rn . Moreover, the corresponding eigenvalues are real. Denoting by .U = [e1 , . . . , en ] ∈ Rn×n an orthogonal matrix, the columns of which are the eigenvectors of .A, we may write the spectral decomposition of .A as .
A = UDUT ,
(2.31)
where .D = diag (λ1 , . . . , λn ) ∈ Rn×n is a diagonal matrix, the diagonal entries of which are the eigenvalues corresponding to the eigenvectors .e1 , . . . , en . Reordering the columns of .U, we can achieve that .λ1 ≥ · · · ≥ λn . The spectral decomposition reveals close relations between the properties of a symmetric matrix and its eigenvalues. Thus, a symmetric matrix is SPD if and only if all its eigenvalues are positive, and it is SPS if and only if they are nonnegative. The rank of a symmetric matrix is equal to the number of nonzero entries of .D. If .A is symmetric, then we can use the spectral decomposition (2.31) to check that for any nonzero .x .
λ1 = λmax ≥ x −2 xT Ax ≥ λmin = λn .
(2.32)
Thus, for any symmetric positive definite matrix .A .
−1 A = λmax , A−1 = λ−1 min , x A ≤ λmax x , x A−1 ≤ λmin x . (2.33)
32
Zdeněk Dostál and Tomáš Kozubek
The spectral condition number .κ(A) = A A−1 , which is a measure of departure from the identity, can be expressed for a real symmetric matrix by κ(A) = λmax /λmin .
.
If .A is a real symmetric matrix and f is a real function defined on .σ(A), we can use the spectral decomposition to define the scalar function by f (A) = Uf (D)UT ,
.
where .f (D) = diag (f (λ1 ), . . . , f (λn )). It is easy to check that if a is the identity function on .R defined by .a(x) = x, then .
a(A) = A,
and if f and g are real functions defined on .σ(A), then (f + g)(A) = f (A) + g(A)
.
and
(f · g)(A) = f (A)g(A).
Moreover, if .f (x) ≥ 0 for .x ∈ σ(A), then .f (A) is SPS, and if .f (x) > 0 for x ∈ σ(A), then .f (A) is SPD. For example, if .A is SPD, then
.
A = A1/2 A1/2 .
.
Obviously, .
σ(f (A)) = f (σ(A)),
(2.34)
and if .ei is an eigenvector corresponding to .λi ∈ σ(A), then it is also an eigenvector of .f (A) corresponding to .f (λi ). It follows easily that for any SPS matrix, .
ImA = ImA1/2
and
KerA = KerA1/2 .
(2.35)
A key to understanding nonsymmetric matrices is the singular value decomposition (SVD). If .B ∈ Rm×n , then SVD of .B is given by .
B = USVT ,
(2.36)
where .U ∈ Rm×m and .V ∈ Rn×n are orthogonal and .S ∈ Rm×n is a diagonal matrix with nonnegative diagonal entries .σ1 ≥ · · · ≥ σmin{m,n} = σmin called singular values of .B. If .A is not a full rank matrix, then it is often more convenient to use the reduced (thin) singular value decomposition (RSVD) .
T , B=U SV
(2.37)
∈ Rm×r and .V ∈ Rn×r are matrices with orthonormal columns, where .U r×r . is a nonsingular diagonal matrix with positive diagonal entries S ∈ R
2 Linear Algebra
33
σ ≥ · · · ≥ σr = σ min , and .r ≤ min{m, n} is the rank of .B. The matrices .U m are formed by the first r columns of .U and .V, respectively. If .x ∈ R , and .V then
. 1
−1 T x = (U T )(V T )(U Bx = U SV SV SU S VT x) = BBT y,
.
so that .
ImB = ImBBT .
(2.38)
T is RSVD, then If .B = U SV Im B = Im U,
.
⊥ Ker B = Im V .
It follows that ⊥
Im BT = (Ker B) .
.
(2.39)
The SVD reveals close relations between the properties of a matrix and its singular values. Thus, the rank of .B ∈ Rm×n is equal to the number of its nonzero singular values, B = BT = σ1 ,
(2.40)
σmin x ≤ Bx ≤ B x .
(2.41)
.
and for any vector .x ∈ Rn , .
Let .σ min denote the least nonzero singular value of .B ∈ Rm×n , let T T with .U ∈ Rn×r , ∈ Rm×r , V .x ∈ ImB , and consider the RSVD .B = U SV r×r r and and . S ∈ R . Then there is .y ∈ R such that .x = Vy T Vy = U Bx = U SV Sy = Sy ≥ σ min y .
.
Since = y , x = Vy
.
we conclude that .
σ min x ≤ Bx
.
σ min x ≤ BT x for
for
any x ∈ ImBT ,
(2.42)
any x ∈ ImB.
(2.43)
or, equivalently,
34
Zdeněk Dostál and Tomáš Kozubek
The SVD (2.36) can be used to introduce the Moore-Penrose generalized inverse of an .m × n matrix .B by .
B† = VS† UT ,
where .S† is the diagonal matrix with the entries .[S† ]ii = 0 if .σi = 0 and −1 † .[S ]ii = σi otherwise. It is easy to check that .
BB† B = USVT VS† UT USVT = USVT = B,
(2.44)
so that the Moore-Penrose generalized inverse is a generalized inverse. If .B is a full row rank matrix, then it may be checked directly that B† = BT (BBT )−1 .
.
If .B is a any matrix and .c ∈ ImB, then .xLS = B† c is a solution of the system of linear equations .Bx = c, i.e., BxLS = c.
.
Notice that .xLS ∈ ImBT , so that if .x is any other solution, then T .x = xLS + d, where .d ∈ KerB, .xLS d = 0, and .
xLS 2 ≤ xLS 2 + d 2 = x 2 .
(2.45)
The vector .xLS is called the least squares solution of .Bx = c. Obviously, .
−1 B† = σ min ,
(2.46)
where .σ min denotes the least nonzero singular value of .B, so that .
−1 xLS = B† c ≤ σ min c .
(2.47)
It can be verified directly that .
B†
T
† = BT .
There are close relations between the SVD decompositions of blocks of an orthogonal matrix
Q11 Q12 .Q = ∈ Rn×n , Q21 Q22 where .Q11 ∈ Rp×p , .Q22 ∈ Rq×q , and .p + q = n. They are revealed by the CS decomposition of .Q, which guarantees that there are orthogonal block diagonal matrices
2 Linear Algebra
35
U=
.
V=
U1 O O U2 V1 O O V2
∈ Rn×n ,
U1 ∈ Rp×p ,
U2 ∈ Rq×q ,
∈ Rn×n ,
V1 ∈ Rp×p ,
V2 ∈ Rq×q ,
such that ⎡
I
⎢ C ⎢ ⎢ O T .U QV = ⎢ ⎢O ⎢ ⎣ −S I
⎤
O
⎥ ⎥ I⎥ ⎥, ⎥ I ⎥ C ⎦ O S
(2.48)
where C = diag(c1 , · · · , cp ),
1 > c1 ≥ · · · ≥ cp > 0,
S = diag(s1 , · · · , sp ),
0 < s1 ≤ · · · ≤ sp < 1,
.
and C2 + S2 = I.
.
Some of the matrices .I, .C, and .S can be missing, and the marked off-diagonal zero matrices can be rectangular. The matrices .I and .O that appear in different positions can denote matrices of different dimensions. The rule can be derived from the fact that the rows and columns of an orthogonal matrix are orthogonal. Some additional explanations can be found in Golub and Van Loan [3, Chap. 2]; the proof is in Page and Saunders [9]. The CS decomposition shows that the blocks of a partitioned orthogonal matrix share singular vectors.
2.11 Angles of Subspaces In Sect. 2.7, we introduced the concept of angle between vectors in terms of their scalar product. A similar definition can be formulated for subspaces n .X , Y ⊂ R , but in this case we shall use a collection of angles starting from the angle between the nearest vectors .x ∈ X and .y ∈ Y . More formally, the canonical angle .γ1 between the subspaces .X and .Y of the dimensions p and q, .p ≤ q, is defined recurrently by .
and
cos(γ1 ) = xT1 y1 = max{xT y : x ∈ X , y ∈ Y , x = y = 1}
36
Zdeněk Dostál and Tomáš Kozubek .
cos(γk ) = xTk yk = max{xT y : x ∈ Xk−1 , y ∈ Yk−1 , x = y = 1},
where .k = 2, . . . , p and ⊥ .Xk = x ∈ Span {x1 , . . . , xk } ,
⊥
Yk = {y ∈ Span {y1 , . . . , yk }
.
To see the relation to SVD, let us denote by .X ∈ Rn×p and .Y ∈ Rn×q the full rank matrices with orthonormal columns such that X = Im X
.
and
Y = ImY.
Then .
cos(γ1 ) = xT1 y1 = max{ξ T XT Yη : ξ ∈ Rp , η ∈ Rq , ξ = η = 1} = ξ T1 XT Yη 1 ,
i.e., we can take .x1 = Xξ 1 , y1 = Yη 1 . Introducing SVD XT Y = USVT ,
.
S = diag(σ1 , . . . , σp ) ∈ Rp×q ,
and taking into account the extremal properties of singular values, we get .
cos(γ1 ) = ξ T1 XT Yη 1 = max{ξ T USVT η : ξ ∈ Rp , η ∈ Rq , ξ = η = 1} = σ1 .
Using similar arguments, we get .
cos(γk ) = xTk yk = ξ Tk XT Yη k = ξ Tk USVT η k = σk ,
k = 2, . . . , p,
and [x1 , . . . , xp ] = X [ξ 1 , . . . , ξ p ],
.
[y1 , . . . , yp ] = Y [η 1 , . . . , η p ].
The largest canonical angle .γp between two subspaces, the cosine of which corresponds to the smallest singular value .σp , incorporates the information about their distance. If applied to one- or two-dimensional subspaces associated with lines or planes in .R3 , .γp coincides with the notion of angle between a vector and a plane or between two planes. If .p = q, we can characterize the distance between .X and .Y also by the gap gap(X , Y ) = PX − PY ,
.
where .PX denotes the orthogonal projector onto the subspace .X . To see the relation between the gap and canonical angles, denote .X = ImX and .Y = ImY, and let
2 Linear Algebra
37
W = [X X⊥ ],
.
Z = [Y Y⊥ ]
denote orthogonal matrices. Then gap(X , Y ) = XXT − YYT = WT XXT − YYT Z T
X O XT Y⊥ T T . = T XX − YY [Y Y⊥ ] = T X⊥ X⊥ Y O
.
Since .WT Z is orthogonal, it follows that there are orthogonal matrices U1 , V1 ∈ Rp×p, U2 , V2 ∈ R(n−p)×(n−p) , and U = diag(U1 , U2 ), V = diag(V1 , V2 ),
.
such that the CS decomposition of .WT Z has the form ⎤ ⎡ I OO OOO ⎢O C O O S O⎥ ⎥ T ⎢ T
⎢O O O O O I⎥ X Y XT Y⊥ V1 O T T ⎥ = U1 OT ⎢ .U W ZV = . ⎢O O O I O O⎥ O V2 O U2 XT⊥ Y XT⊥ Y⊥ ⎥ ⎢ ⎣ O −S O O C O ⎦ O O I OOO (2.49) Comparing the blocks of the right matrix with its CS decomposition, we get that the nonzero canonical cosines of .XT Y are the same as those of .XT⊥ Y⊥ and XT Y⊥ = UT1 XT Y⊥ V2 = diag(O, S, I) = UT2 XT⊥ YV2 = XT⊥ Y .
.
Moreover, if .x ∈ Rp×p , . x = 1, then T
X
2 T
x T + X⊥ Yx 2 . .1 = XT [Y, Y⊥ ] o = X Yx ⊥ It follows that
2 XT Y⊥ = max XT Y⊥ x = 1 − min [XT Yx]
.
x=1
x=1
and gap(X , Y ) = XT Y⊥ =
.
1 − σp2 = sin(γp ).
(2.50)
See also the book by Golub and van Loan [3, Chap. 2] or Ipsen and Mayer [10].
38
Zdeněk Dostál and Tomáš Kozubek
2.12 Graphs, Walks, and Adjacency Matrices Let us recall that the vertices .Vi and the edges of the graph of the mesh of the triangulation .T = {τi } of a polyhedral domain .Ω are the nodes of the mesh and their adjacent couples .Vi Vj , respectively. Recall that the edges .Vi and .Vj are adjacent if there is an element .τk ∈ T such that .Vi Vj is the edge of .τk . The graph is fully described by the adjacency matrix with nonzero entries .dij equal to one if the nodes .Vi and .Vj are adjacent. Since the graph of a mesh is not oriented and does not contain loops, the adjacency matrix is symmetric, and its diagonal entries .dii are equal to zero. Let us also recall that the walk of length k in the mesh of .T is a sequence of distinct nodes .Vi1 , . . . , Vik such that the edges .Vij Vij+1 , .j = 1, 2, . . . , k − 1 belong to the graph of the mesh. Thus, d
= 1,
. ij ij+1
j = 1, . . . , k − 1.
The walk .(Vi1 , . . . , Vik ) starts at .Vi1 and ends at .Vik . We call a walk between nodes .Vi and .Vk an (i, k )-walk. We use the following well-known observation [11]. Lemma 2.3 Let .D be the adjacency matrix of the mesh of .T and .B = Dk . Then each entry .bij of .B gives the number of distinct (i, j)-walks of length k. Proof For .k = 1, the claim follows from the definition of .D. Suppose that for some .k ≥ 1, the entry .bij in .B = Dk gives the number of distinct (i, j )-walks of length k. Denoting .C = Dk+1 = BD, we get that the entries of .C are given by c =
n
. ij
bi dj ,
=1
where the number .bi gives the number of distinct .(i, )-walks of length k and d = 0 or .dj = 1. If a particular edge .V Vj is not in the mesh (graph), then .dj = 0 and .bi dj = 0. Thus, there is no (i, j )-walk of length .k + 1 with the last-but-one node . . On the other hand, if .d,j = 1, then we can prolong each .(i, )-walk to an (i, j )-walk. Thus, .cij gives the number of all distinct . (i, j )-walks of length .k + 1. . j
The following corollary follows easily from Lemma 2.3. Corollary 2.1 Let .D denote the adjacency matrix of a given mesh and .e = [ei ], . ei = 1, i = 1, 2, . . . , n. Then the number w(i, k) of distinct walks of length k starting at node i is given by w(i, k) = [Dk e]i .
.
2 Linear Algebra
39
If the mesh is approximately regular, we expect that more walks of length k originate from the nodes near the center of the mesh than from the vertices far from it. It simply follows that the node with index i which satisfies w(i, k) ≥ w(j, k), j = 1, 2, . . . , n,
.
for sufficiently large k is in a sense near to the center of the mesh.
References 1. Demmel, J.W.: Applied Numerical Linear Algebra. SIAM, Philadelphia (1997) 2. Laub, A.J.: Matrix Analysis for Scientists and Engineers. SIAM, Philadelphia (2005) 3. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. Johns Hopkins University Press, Baltimore (2013) 4. Mayer, C.: Matrix Analysis and Applied Linear Algebra. SIAM, Philadelphia (2000) 5. Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1994) 6. Dostál, Z., Kozubek, T., Markopoulos, A., Menšík, M.: Cholesky factorization of a positive semidefinite matrix with known kernel. Appl. Math. Comput. 217, 6067–6077 (2011) 7. Giraud, L., Langou, J., Rozložník, M., van den Eshof, J.: Rounding error analysis of the classical Gram–Schmidt orthogonalization process. Numer. Math. 101, 87–100 (2005) 8. Smith, R.L.: Some interlacing properties of Schur complement of a Hermitian matrix. Linear Algebra Appl. 177, 137–144 (1992) 9. Page, C.C., Saunders, M.: Toward a generalized singular value decomposition. SIAM J. Numer. Anal. 18, 398–405 (1981) 10. Ipsen, I.C., Mayer, C.D.: The angle between complementary subspaces. Amer. Math. Monthly 102 904–911 (1995) 11. Diestel, R.: Graph Theory. Springer, Heidelberg (2005)
Chapter 3
Optimization Zdeněk Dostál
In this chapter, we briefly review the results concerning the minimization of quadratic functions to the extent sufficient for understanding the algorithms described in Part II. Whenever possible, we use algebraic arguments that exploit the specific structure of these problems. The systematic exposition of optimization theory can be found in the books by Bertsekas [1]; Nocedal and Wright [2]; Conn et al. [3]; Bazaraa et al. [4]; or Griva et al. [5].
3.1 Optimization Problems and Solutions Optimization problems considered in this book are described by a cost (objective, target) function f defined on a subset .D ⊆ Rn and by a constraint set .Ω ⊆ D. The elements of .Ω are called feasible vectors. Important ingredients of scalable algorithms for frictionless contact problems are efficient solvers of quadratic programming (QP) problems with a quadratic cost function f and a constraint set .Ω described by linear equalities and inequalities. The solution of problems with friction requires effective algorithms for quadratically constrained quadratic programs (QCQP) with the constraints described also by separable quadratic inequalities. We look either for a solution .x ∈ Rn of the unconstrained minimization problem .
f (x) ≤ f (x), x ∈ Rn ,
(3.1)
Z. Dostál Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_3
41
42
Zdeněk Dostál
or for a solution .x ∈ Ω of the constrained minimization problem .
f (x) ≤ f (x), x ∈ Ω, Ω ⊂ Rn .
(3.2)
A solution of the minimization problem is called its minimizer or global minimizer. A nonzero vector .d ∈ Rn is a feasible direction of .Ω at a feasible point n .x if .x + εd ∈ Ω for all sufficiently small .ε > 0. A nonzero vector .d ∈ R is a recession direction, or simply a direction, of .Ω if for each .x ∈ Ω, x+αd ∈ Ω for all .α > 0.
3.2 Unconstrained Quadratic Programming Let us first recall some simple results which concern unconstrained quadratic programming.
3.2.1 Quadratic Cost Functions We consider the cost functions in the form .
f (x) =
1 T x Ax − bT x, 2
(3.3)
where .A ∈ Rn×n denotes a given SPS or SPD matrix of order n and .b ∈ Rn . If .x, d ∈ Rn , then using elementary computations and .A = AT , we get .
1 f (x + d) = f (x) + (Ax − b)T d + dT Ad. 2
(3.4)
The formula (3.4) is Taylor’s expansion of f at .x, so that the gradient of f at .x is given by .
∇f (x) = Ax − b
(3.5)
and the Hessian of f at .x is given by ∇2 f (x) = A.
.
Taylor’s expansion will be our simple but powerful tool in what follows. A vector .d is a decrease direction of f at .x if f (x + εd) < f (x)
.
3
Optimization
43
for all sufficiently small values of .ε > 0. Using Taylor’s expansion (3.4) in the form f (x + εd) = f (x) + ε(Ax − b)T d +
.
ε2 T d Ad, 2
we get that .d is a decrease direction if and only if (Ax − b)T d < 0.
.
3.2.2 Unconstrained Minimization of Quadratic Functions The following proposition gives algebraic conditions that are satisfied by the solutions of the unconstrained QP problem to find .
min f (x),
(3.6)
x∈Rn
where f is a quadratic function defined by (3.3). Proposition 3.1 Let the quadratic function f be defined by an SPS or SPD matrix .A ∈ Rn×n and .b ∈ Rn . Then the following statements hold: (i) A vector .x is a solution of the unconstrained minimization problem (3.6) if and only if .
∇f (x) = Ax − b = o.
(3.7)
(ii) The minimization problem (3.6) has a unique solution if and only if .A is SPD. Proof The proof is a simple corollary of Taylor’s expansion formula (3.4). . Remark 3.1 Condition (3.7) can be written as a variational equality (Ax)T (x − x) = bT (x − x),
.
x ∈ Rn .
Examining the gradient condition (3.7), we get that problem (3.6) has a solution if and only if .A is SPS and .
b ∈ ImA.
(3.8)
Denoting by .R a matrix the columns of which span Ker.A, we can rewrite (3.8) as .RT b = o. This condition has a simple interpretation: if a mechanical
44
Zdeněk Dostál
system is in equilibrium, the external forces must be orthogonal to the rigid body motions. If .b ∈ ImA, a solution of (3.6) is given by x = A+ b,
.
where .A+ is a left generalized inverse introduced in Sect. 2.3. After substituting into f and simple manipulations, we get .
1 min f (x) = − bT A+ b. 2
x∈Rn
(3.9)
In particular, if .A is positive definite, then .
1 min f (x) = − bT A−1 b. 2
x∈Rn
(3.10)
The above formulae can be used to develop useful estimates. Indeed, if (3.8) holds and .x ∈ Rn , we get 1 1 1 b2 f (x) ≥ − bT A+ b = − bT A† b ≥ − A† b2 = − , 2 2 2 2λmin
.
where .A† denotes the Moore-Penrose generalized inverse and .λmin denotes the least nonzero eigenvalue of .A. In particular, it follows that if .A is positive definite and .λmin denotes the least eigenvalue of .A, then for any .x ∈ Rn , .
1 1 b2 f (x) ≥ − bT A−1 b ≥ − A−1 b2 = − . 2 2 2λmin
(3.11)
If the dimension n of the unconstrained minimization problem (3.6) is large, then it can be too ambitious to look for a solution which satisfies the gradient condition (3.7) exactly. A natural idea is to consider the weaker condition .
∇f (x) ≤ ε
(3.12)
with a small .ε. If .x satisfies the latter condition with .ε sufficiently small and as A nonsingular, then .x is near the unique solution .x
.
.
= A−1 A (x − x ) = A−1 (Ax − b) ≤ A−1 ∇f (x). x − x
(3.13)
The typical “solution” returned by an iterative solver is just .x that satisfies (3.12).
3
Optimization
45
3.3 Convexity Intuitively, convexity is a property of the sets that contain with any two points their joining segment. More formally, a subset .Ω of .Rn is convex if for any .x and .y in .Ω and .α ∈ (0, 1), the vector .s = αx + (1 − α)y is also in .Ω. Let .x1 , . . . , xk be vectors of .Rn . If .α1 , . . . , αk are scalars such that αi ≥ 0,
.
i = 1, . . . , k,
k
αi = 1,
i=1
k then the vector .v = i=1 αi xi is said to be a convex combination of vectors .x1 , . . . , xk . The convex hull of .x1 , . . . , xk , denoted .Conv{x1 , . . . , xk }, is the set of all convex combinations of .x1 , . . . , xk .
3.3.1 Convex Quadratic Functions Given a convex set .Ω ∈ Rn , a mapping .h : Ω → R is said to be a convex function if its epigraph is convex, that is, if h (αx + (1 − α)y) ≤ αh(x) + (1 − α)h(y)
.
for all .x, y ∈ Ω and .α ∈ (0, 1), and it is strictly convex if .h αx + (1 − α)y < αh(x) + (1 − α)h(y) for all .x, y ∈ Ω, x = y, and .α ∈ (0, 1). The following proposition gives a characterization of convex functions. Proposition 3.2 Let V be a subspace of .Rn . The restriction .f |V of a quadratic function f with the Hessian matrix .A to V is convex if and only if .A|V is positive semidefinite, and .f |V is strictly convex if and only if .A|V is positive definite. Proof Let V be a subspace; let .x, y ∈ V , .α ∈ (0, 1), and .s = αx + (1 − α)y. Then by Taylor’s expansion (3.4) of f at .s 1 f (x) = f (s + (x − s)) = f (s) + ∇f (s)T (x − s) + (x − s)T A(x − s), 2 . 1 T f (y) = f (s + (y − s)) = f (s) + ∇f (s) (y − s) + (y − s)T A(y − s). 2 Multiplying the first equation by .α and the second equation by .1 − α and summing up, we get
46
Zdeněk Dostál
αf (x) + (1 − α)f (y) = f (s) + .
α (x − s)T A(x − s) 2 1−α + (y − s)T A(y − s). 2
(3.14)
It follows that if .A|V is positive semidefinite, then f |V is convex. Moreover, since .x = y is equivalent to .x = s and .y = s, it follows that if .A|V is positive definite, then f |V is strictly convex. Let us now assume that f |V is convex, let .z ∈ V and .α = 12 , and denote .x = 2z and .y = o. Then .s = z, x − s = z, y − s = −z, and substituting into (3.14) results in 1 f (s) + zT Az = αf (x) + (1 − α)f (y). 2
.
Since .z ∈ V is arbitrary and f |V is assumed to be convex, it follows that .
1 T z Az = αf (x) + (1 − α)f (y) − f (αx + (1 − α)y) ≥ 0. 2
Thus, .A|V is positive semidefinite. Moreover, if f |V is strictly convex, then A|V is positive definite. .
.
The strictly convex quadratic functions have a nice property that f (x) → ∞ when .x → ∞. The functions with this property are called coercive functions. More generally, a function .f : Rn → R is said to be coercive on .Ω ⊆ Rn if
.
f (x) → ∞
.
for
x → ∞, x ∈ Ω.
3.3.2 Minimizers of Convex Function Under the convexity assumptions, each local minimizer is a global minimizer. We shall formulate this result together with some observations concerning the set of solutions. Proposition 3.3 Let f and .Ω ⊆ Rn be a convex quadratic function defined by (3.3) and a closed convex set, respectively. Then the following statements hold: (i) Each local minimizer of f subject to .x ∈ Ω is a global minimizer of f subject to .x ∈ Ω. (ii) If .x and .y are two minimizers of f subject to .x ∈ Ω, then x − y ∈ KerA ∩ Span{b}⊥ .
.
(iii) If f is strictly convex on .Ω and .x and .y are two minimizers of f subject to .x ∈ Ω, then .x = y.
3
Optimization
47
Proof (i) Let .x ∈ Ω and .y ∈ Ω be local minimizers of f subject to .x ∈ Ω, f (x) < f (y). Denoting .yα = αx + (1 − α)y and using that f is convex, we get
.
f (yα ) = f (αx + (1 − α)y) ≤ αf (x) + (1 − α)f (y) < f (y)
.
for every .α ∈ (0, 1). Since y − yα = αy − x,
.
the inequality contradicts the assumption that .y is a local minimizer. (ii) Let .x and .y be global minimizers of f on .Ω. Then for any .α ∈ [0, 1], x + α(y − x) = (1 − α)x + αy ∈ Ω,
.
y + α(x − y) = (1 − α)y + αx ∈ Ω.
Moreover, using Taylor’s formula, we get α2 (y − x)T A(y − x), 0 ≤f x + α(y − x) − f (x) = α(Ax − b)T (y − x) + 2 . α2 0 ≤f y + α(x − y) − f (y) = α(Ay − b)T (x − y) + (x − y)T A(x − y). 2 Since the latter inequalities hold for arbitrarily small .α, it follows that (Ax − b)T (y − x) ≥ 0 and
.
(Ay − b)T (x − y) ≥ 0.
After summing up the latter inequalities and simple manipulations, we have .
− (x − y)T A(x − y) ≥ 0.
Since the convexity of f implies by Proposition 3.2 that .A is positive semidefinite, it follows that .x − y ∈ KerA. If .f (x) = f (y) and .x, y ∈ KerA, then bT (x − y) = f (x) − f (y) = 0,
.
i.e., .x − y ∈ Span{b}⊥ . (iii) Let f be strictly convex, and let .x, y ∈ Ω be different global minimizers of f on .Ω, so that .f (x) = f (y). Then .KerA = {o} and by (ii) .x − y = o. .
3.3.3 Existence of Minimizers Since quadratic functions are continuous, existence of at least one minimizer is guaranteed by the Weierstrass theorem provided .Ω is compact, that is,
48
Zdeněk Dostál
closed and bounded. The following standard results do not assume that .Ω is bounded. Proposition 3.4 Let f be a convex quadratic function, and let .Ω denote a closed convex set. Then the following statements hold: (i) If f is strictly convex, then there is a unique minimizer of f subject to .x ∈ Ω. (ii) If f is coercive on .Ω, then a global minimizer of f subject to .x ∈ Ω exists. (iii) If f is bounded from below on .Ω, then there is a global minimizer of f subject to .x ∈ Ω. Proof (i) If f is strictly convex, it follows by Proposition 3.2 that .A is SPD and .z = A−1 b is by Proposition 3.1 the unique minimizer of f on .Rn . Thus, for any .x ∈ Rn , f (x) ≥ f (z).
.
It follows that the infimum of .f (x) subject to .x ∈ Ω exists and there is a sequence of vectors .xk ∈ Ω such that .
lim f (xk ) = inf f (x).
k→∞
x∈Ω
The sequence .{xk } is bounded as f (xk ) − f (z) =
.
1 k λmin k (x − z)T A(xk − z) ≥ x − z2 , 2 2
where .λmin denotes the least eigenvalue of .A. It follows that .{xk } has at least one cluster point .x ∈ Ω. Since f is continuous, we get f (x) = inf f (x).
.
x∈Ω
The uniqueness follows by Proposition 3.3. (ii) The proof is similar to that of (i). See, e.g., Bertsekas [1, Proposition A.8]. . (iii) The statement is the well-known Frank-Wolfe theorem [7].
3.3.4 Projections to Convex Sets Let us define the projection .PΩ to the (closed) convex set .Ω ⊂ Rn as a ∈ Ω as in Fig. 3.1. mapping which assigns to each .x ∈ Rn its nearest vector .x The following proposition concerns the projection induced by the Euclidean scalar product. Proposition 3.5 Let .Ω ⊆ Rn be a nonempty closed convex set and .x ∈ Rn . ∈ Ω with the minimum Euclidean distance Then there is a unique point .x
3
Optimization
49
Fig. 3.1 Projection to the convex set
from .x, and for any .y ∈ Ω, .
)T (y − x ) ≤ 0. (x − x
(3.15)
Proof Since the proof is trivial for .x ∈ Ω, let us assume that .x ∈ / Ω is arbitrary but fixed and observe that the function f defined on .Rn by f (y) = x − y2 = yT y − 2yT x + x2
.
has the Hessian ∇2 f (y) = 2I.
.
The identity matrix being positive definite, it follows by Proposition 3.2 that ∈ Ω of .f (y) subject to f is strictly convex, so that the unique minimizer .x .y ∈ Ω exists by Proposition 3.4(i). If .y ∈ Ω and .α ∈ (0, 1), then by convexity of .Ω, + α(y − x ) ∈ Ω, (1 − α) x + αy = x
.
so that for any .x ∈ Rn , 2 ≤ x − x − α(y − x )2 . x − x
.
Using simple manipulations and the latter inequality, we get − α(y − x )2 = 2 − 2α(x − x )T (y − x ) x − x x − x2 + α2 y − x − α(y − x )2 ≤x − x
.
2 − 2α(x − x )T (y − x ). + α2 y − x Thus, )T (y − x ) ≤ α2 y − x 2 2α(x − x
.
for any .α ∈ (0, 1). To obtain (3.15), just divide the last inequality by .α, and . observe that .α may be arbitrarily small.
50
Zdeněk Dostál
Fig. 3.2 Projection .PΩ is nonexpansive
Using Proposition 3.5, it is not difficult to show that the mapping .PΩ which assigns to each .x ∈ Rn its projection to .Ω is nonexpansive as in Fig. 3.2. Corollary 3.1 Let .Ω ⊆ Rn be a nonempty closed convex set, and for any n ∈ Ω denote the projection of .x to .Ω. Then for any .x, y ∈ Ω, .x ∈ R , let .x .
≤ x − y. x−y
(3.16)
, y to .Ω satisfy Proof If .x, y ∈ R, then by Proposition 3.5, their projections .x )T (z − x ) ≤ 0 and (x − x
.
)T (z − y ) ≤ 0 (y − y
into the first inequality and .z = x into the for any .z ∈ Ω. Substituting .z = y second inequality and summing up, we get −y+y )T ( ) ≤ 0. (x − x y−x
.
After rearranging the entries and using the Schwarz inequality, we get 2 ≤ (x − y)T ( ) ≤ x − y , x−y x−y x−y
.
which proves (3.16).
.
3.4 Equality Constrained Problems We shall now consider the problems with the constraint set described by a set of linear equations. More formally, we shall look for .
min f (x),
x∈ΩE
(3.17)
3
Optimization
51
where f is a convex quadratic function defined by (3.3), .ΩE = {x ∈ Rn : Bx = c}, .B ∈ Rm×n , and .c ∈ ImB. We assume that .B = O is not a full column rank matrix, so that .KerB = {o}, but we admit dependent rows of .B. It is easy to check that .ΩE is a nonempty closed convex set. A feasible set .ΩE is a linear manifold of the form ΩE = x + KerB,
.
where .x is any vector which satisfies Bx = c.
.
Thus, a nonzero vector .d ∈ Rn is a feasible direction of .ΩE at any .x ∈ ΩE if and only if .d ∈ KerB, and .d is a recession direction of .ΩE if and only if .d ∈ KerB. Substituting .x = x+z, z ∈ KerB, we can reduce (3.17) to the minimization of .
fx (z) =
1 T T z Az − (b − Ax) z 2
(3.18)
over the subspace .KerB. Thus, we can assume, without loss of generality, that c = o in the definition of .ΩE . We shall occasionally use this assumption to simplify our exposition. A useful tool for the analysis of equality constrained problems is the Lagrangian function .L0 : Rn+m → R defined by
.
.
L0 (x, λ) = f (x) + λT (Bx − c) =
1 T x Ax − bT x + (Bx − c)T λ. 2
(3.19)
Obviously, ∇2xx L0 (x, λ) =∇2 f (x) = A,
(3.20)
∇x L0 (x, λ) =∇f (x) + BT λ = Ax − b + BT λ,
(3.21)
.
.
.
1 L0 (x + d, λ) =L0 (x, λ) + (Ax − b + BT λ)T d + dT Ad. 2
(3.22)
The Lagrangian function is defined in such a way that if considered as a function of .x, then its Hessian and its restriction to .ΩE are exactly those of f, but its gradient .∇x L0 (x, λ) varies depending on the choice of .λ. It simply follows that if f is convex, then .L0 is convex for any fixed .λ, and the global minimizer of .L0 with respect to .x also varies with .λ. We shall see that it is possible to give conditions on .A, .B, and .b such that with a suitable choice
52
Zdeněk Dostál
Fig. 3.3 Geometric illustration of the Lagrangian function
the solution of the constrained minimization problem (3.17) reduces λ = λ, to the unconstrained minimization of .L0 as in Fig. 3.3.
.
3.4.1 Optimality Conditions The main questions concerning the optimality and solvability conditions of (3.17) are answered by the next proposition. Proposition 3.6 Let the equality constrained problem (3.17) be defined by an SPS or SPD matrix .A ∈ Rn×n , a constraint matrix .B ∈ Rm×n with the column rank less than n, and vectors .b ∈ Rn and .c ∈ ImB. Then the following statements hold: (i) A vector .x ∈ ΩE is a solution of (3.17) if and only if .
(Ax − b)T d = 0
(3.23)
for any .d ∈ KerB. (ii) A vector .x ∈ ΩE is a solution of (3.17) if and only if there is a vector m .λ ∈ R such that .
Ax − b + BT λ = o.
(3.24)
Proof (i) Let .x be a solution of the equality constrained minimization problem (3.17), so that for any .d ∈ KerB and .α ∈ R, .
0 ≤ f (x + αd) − f (x) = α(Ax − b)T d +
α2 T d Ad. 2
(3.25)
For sufficiently small values of .α and .(Ax − b)T d = 0, the sign of the righthand side of (3.25) is determined by the sign of .α(Ax − b)T d. Since we can choose the sign of .α arbitrarily and the right-hand side of (3.42) is nonnegative, we conclude that (3.23) holds for any .d ∈ KerB.
3
Optimization
53
Let us now assume that (3.33) holds for a vector .x ∈ ΩE . Then f (x + d) − f (x) =
.
1 T d Ad ≥ 0 2
for any .d ∈ KerB, so that .x is a solution of (3.17). (ii) Let .x be a solution of (3.17), so that by (i) .x satisfies (3.23) for any T .d ∈ KerB. The latter condition is by (2.39) equivalent to .Ax − b ∈ ImB , so m that there is .λ ∈ R such that (3.24) holds. If there are .λ and .x ∈ ΩE such that (3.57) holds, then by Taylor’s expansion (3.22), f (x + d) − f (x) = L0 (x + d, λ) − L0 (x, λ) =
.
1 T d Ad ≥ 0 2
for any .d ∈ KerB, so .x is a solution of the equality constrained problem (3.17).
.
The conditions (ii) of Proposition 3.6 are known as the Karush-KuhnTucker (KKT) conditions for the solution of the equality constrained problem (3.17). If .x ∈ ΩE and .λ ∈ Rm satisfy (3.24), then .(x, λ) is called a KKT pair of problem (3.17). Its second component .λ is called the vector of La grange multipliers or simply the multiplier. We shall often use the notation .x to denote the components of a KKT pair that are uniquely determined. or .λ Proposition 3.6 has a simple geometrical interpretation. The condition (3.33) requires that the gradient of f at a solution .x is orthogonal to .KerB, the set of feasible directions of .ΩE , so that there is no feasible decrease direction as illustrated in Fig. 3.4. Since .d is by (2.39) orthogonal to .KerB if and only if .d ∈ ImBT , it follows that (3.23) is equivalent to the possibility to choose .λ so that .∇x L0 (x, λ) = o. If f is convex, then the latter condition is equivalent to the condition for the unconstrained minimizer of .L0 with respect to .x as illustrated in Fig. 3.5.
Fig. 3.4 Solvability condition (i)
Fig. 3.5 Solvability condition (ii)
54
Zdeněk Dostál
Notice that if f is convex, then the vector of Lagrange multipliers which is the component of a KKT pair modifies the linear term of the original problem in such a way that the solution of the unconstrained modified problem is exactly the same as the solution of the original constrained problem. In terms of mechanics, if the original problem describes the equilibrium of a constrained elastic body subject to traction, then the modified problem is unconstrained with the constraints replaced by the reaction forces.
3.4.2 Existence and Uniqueness Using the optimality conditions of Sect. 3.4.1, we can formulate the conditions that guarantee the existence or uniqueness of a solution of (3.17). Proposition 3.7 Let the equality constrained problem (3.17) be defined by an SPS or SPD matrix .A ∈ Rn×n ; a constraint matrix .B ∈ Rm×n , the column rank of which is less than n; and vectors .b ∈ Rn and c ∈ ImB. Let .R denote a matrix, the columns of which span .KerA. Then the following statements hold: (i) If .A is an SPS matrix, then problem (3.17) has a solution if and only if .
RT b ∈ Im(RT BT ).
(3.26)
(ii) If .A|KerB is positive definite, then problem (3.17) has a unique solution. (iii) If .(x, λ) and .(y, μ) are KKT couples for problem (3.17), then x − y ∈ KerA ∩ Span{b}⊥
.
and
λ − μ ∈ KerBT .
In particular, if problem (3.17) has a solution and KerBT = {o},
.
then there is a unique Lagrange multiplier .λ. Proof (i) Using Proposition 3.6(ii), we have that problem (3.17) has a solution if and only if there is .λ such that .b−BT λ ∈ ImA or, equivalently, that .b−BT λ is orthogonal to .KerA. The latter condition reads .RT b − RT BT λ = o and can be rewritten as (3.26). (ii) First observe that if .A|KerB is positive definite, then .f |KerB is strictly convex by Proposition 3.2 and it is easy to check that .f |ΩE is strictly convex. Since .ΩE is closed, convex, and nonempty, it follows by Proposition 3.4(i) that the equality constrained problem (3.17) has a unique solution. (iii) First observe that .KerB = {x − y : x, y ∈ ΩE } and that f is convex on .KerB by the assumption and Proposition 3.2. Thus, if .x and .y are any solutions of (3.17), then the left relation follows by Proposition 3.3(ii). The . rest follows by a simple analysis of the KKT conditions (3.57).
3
Optimization
55
If .B is not a full row rank matrix and .λ is a Lagrange multiplier for (3.17), then by Proposition 3.7(iii), any Lagrange multiplier .λ can be expressed in the form .
λ = λ + d, d ∈ KerBT .
(3.27)
The Lagrange multiplier .λLS which minimizes the Euclidean norm is called the least squares Lagrange multiplier ; it is a unique multiplier which belongs to .ImB. If .λ is a vector of Lagrange multipliers, then .λLS can be evaluated by .
T λLS = B† BT λ
(3.28)
and λ = λLS + d, d ∈ KerBT .
.
of (3.17) is by PropoIf .A is positive definite, then the unique solution .x sition 3.6 fully determined by the matrix equation x b A BT . (3.29) = , λ c B O which is known as the Karush-Kuhn-Tucker system, briefly KKT system or KKT conditions for the equality constrained problem (3.17). Proposition 3.6 does not require that the related KKT system is nonsingular, in agreement with observation that the solution of the equality constrained problem should not depend on the description of .ΩE .
Fig. 3.6 Minimization with perturbed constraints
56
Zdeněk Dostál
3.4.3 Sensitivity The Lagrange multipliers emerged in Proposition 3.6 as auxiliary variables which nobody had asked for, but which turned out to be useful in alternative formulations of the optimality conditions. However, it turns out that the Lagrange multipliers frequently have an interesting interpretation in specific practical contexts, as we have mentioned at the end of Sect. 3.4.1, where we briefly described their mechanical interpretation. Here, we show that if they are uniquely determined by the KKT conditions (3.29), then they are related to the rates of change of the optimal cost due to the violation of constraints. Let us assume that .A and .B are positive definite and full rank matrices, of the equality x, λ) respectively, so that there is a unique KKT couple .( m constrained problem (3.17). For .u ∈ R , let us consider also the perturbed problem .
min f (x)
Bx=c+u
as in Fig. 3.6. Its solution .x(u) and the corresponding vector of Lagrange multipliers .λ(u) are fully determined by the KKT conditions x(u) b A BT . = , λ(u) c+u B O so that −1 −1 −1 x(u) b b o A BT A BT A BT . = = + . λ(u) c+u c u B O B O B O satisfies First observe that .d(u) = x(u) − x Bd(u) = Bx(u) − B x = u,
.
to approximate the change of optimal so that we can use .∇f ( x) = −BT λ cost by T
T
Bd(u) = −λ u. T d(u) = −λ ∇f ( x)T d(u) = −(BT λ)
.
i can be used to approximate the change of the optimal It follows that .−[λ] cost due to the violation of the i th constraint by .[u]i . To give a more detailed analysis of the sensitivity of the optimal cost with respect to the violation of constraints, let us define for each .u ∈ Rm the primal function p(u) = f (x(u)) .
.
3
Optimization
57
= x(o) and using the explicit formula (2.10) to evaluate Observing that .x the inverse of the KKT system, we get + A−1 BT S−1 u, x(u) = x
.
where .S = BA−1 BT denotes the Schur complement matrix. Thus, = A−1 BT S−1 u, x(u) − x
.
so that p(u) − p(o) =f (x(u)) − f ( x) T 1 A x(u) − x + x(u) − x =∇f ( x)T x(u) − x . 2 1 T −1 T −1 =∇f ( x) A B S u + uT S−1 BA−1 BT S−1 u. 2 It follows that the gradient of the primal function p at .o is given by T ∇p(o) = ∇f ( x)T A−1 BT S−1 = S−1 BA−1 ∇f ( x).
.
we get Recalling that .∇f ( x) = −BT λ, .
= −λ. ∇p(o) = −S−1 BA−1 BT λ
(3.30)
The analysis shows that the decrease of the total differential of f outside .ΩE T (Bx − c). See also Fig. 3.3. The is compensated by the increase of .λ near .x are also called shadow prices after their interpretation in components of .λ economy.
3.5 Inequality Constrained Problems Let us now consider the problems .
min f (x),
x∈ΩI
ΩI = {x ∈ Rn : h(x) ≤ o},
(3.31)
where f is a quadratic function defined by (3.3) and the constraints are defined by continuously differentiable convex functions .hi (x) = [h(x)]i , i = 1, . . . , s, that satisfy .∇hi (x) = o when .hi (x) = 0. In our applications, .hi are either linear forms h (x) = bTi x − ci , ci ∈ R,
. i
or strictly convex separable quadratic functions, i.e.,
58
Zdeněk Dostál
h (x) = (xi − yi )T Hi (xi − yi ) − ci , xi , yi ∈ R2 , Hi SPD, ci > 0.
. i
We assume that .ΩI is nonempty. If the definition of .ΩI includes a quadratic inequality, we call (3.31) the QCQP (quadratically constrained quadratic program) problem. At any feasible point .x, we define the active set A (x) = {i ∈ {1, . . . , s} : hi (x) = 0}.
.
In particular, if .x is a solution of (3.31) with .h(x) = Bx − c, .B ∈ Rs×n , then each feasible direction of .ΩE = {x ∈ Rn : [Bx]A (x) = cA (x) } at .x is a feasible direction of .ΩI at .x. Using the arguments of Sect. 3.4.1, we get that .x is also a solution of the equality constrained problem .
min f (x), x∈ΩE
ΩE = {x ∈ Rn : [Bx]A (x) = cA (x) }.
(3.32)
Thus, (3.31) is a more difficult problem than the equality constrained problem (3.17) as its solution necessarily enhances the identification of .A (x).
3.5.1 Optimality Conditions for Linear Constraints We shall start our exposition with the following optimality conditions. Proposition 3.8 Let the inequality constrained problem (3.31) be defined by an SPS or SPD matrix .A ∈ Rn×n , the constraint matrix .B ∈ Rm×n , and the vectors .b, c ∈ Rn . Let .ΩI = ∅. Then the following statements hold: (i) . x ∈ ΩI is a solution of (3.31) if and only if .
(Ax − b)T d ≥ 0
(3.33)
for any feasible direction .d of .ΩI at .x. (ii) . x ∈ ΩI is a solution of (3.31) with linear inequality constraints if and only if there is .λ ∈ Rm such that .
λ ≥ o,
Ax − b + BT λ = o,
and
T
λ (Bx − c) = 0.
(3.34)
Proof (i) Let .x be a solution of the inequality constrained problem (3.31), and let .d denote a feasible direction of .ΩI at .x, so that the right-hand side of .
f (x + αd) − f (x) = α(Ax − b)T d +
α2 T d Ad 2
(3.35)
is nonnegative for all sufficiently small .α > 0. To prove (3.33), it is enough to take .α > 0 so small that the nonnegativity of the right-hand side of (3.35)
3
Optimization
59
implies that .
α(Ax − b)T d ≥ 0.
Let us assume that .x ∈ ΩI satisfies (3.33) and .x ∈ ΩI . Since .ΩI is convex, it follows that .d = x − x is a feasible direction of .ΩI at .x, so that, using Taylor’s expansion and the assumptions, we have 1 f (x) − f (x) = (Ax − b)T d + dT Ad ≥ 0. 2
.
(ii) Notice any solution .x of (3.31) solves (3.32), so that by Proposition 3.6(ii) there is .y such that Ax − b + BTA (x) y = cA (x) ,
.
and .y ≥ o by the arguments based on the sensitivity of the minimum in Sect. 3.4.3. To finish the proof, it is enough to define .λ as .y padded with zeros. If (3.34) holds and .x + d ∈ ΩI , then .Ax − b = −BT λ and .
1 f (x + d) − f (x) = (Ax − b)T d + dT Ad ≥ −λT Bd 2 = −λT (B(x + d) − c) ≥ o.
.
Remark 3.2 Condition (3.33) can be written as a variational inequality (Ax)T (x − x) ≥ bT (x − x),
.
x ∈ ΩI .
The conditions (3.34) are called the KKT conditions for inequality constraints. The last of these conditions is called the condition of complementarity.
3.5.2 Optimality Conditions for Bound Constrained Problems A special case of problem (3.31) is the bound constrained problem .
min f (x), ΩB = {x ∈ Rn : x ≥ },
x∈ΩB
(3.36)
where f is a quadratic function defined by (3.3) and . ∈ Rn . The optimality conditions for convex bound constrained problems can be written in a more convenient form.
60
Zdeněk Dostál
Proposition 3.9 Let f be a convex quadratic function defined by (3.3) with a positive semidefinite Hessian .A. Then .x ∈ ΩB solves (3.36) if and only if .
Ax − b ≥ o
and
(Ax − b)T (x − ) = 0.
(3.37)
Proof First observe that denoting .B = −In , c = −, and ΩI = {x ∈ Rn : Bx ≤ c},
.
the bound constrained problem (3.36) becomes the standard inequality constrained problem (3.31) with .ΩI = ΩB . Using Proposition 3.11, it follows that .x ∈ ΩB is the solution of (3.36) if and only if there is .λ ∈ Rn such that .
λ ≥ o,
Ax − b − Iλ = o,
and
λT (x − ) = 0.
(3.38)
We complete the proof by observing that (3.37) can be obtained from (3.38) and vice versa by substituting .λ = Ax − b. . In the proof, we have shown that .λ = ∇f (x) is a vector of Lagrange multipliers for the constraints .−x ≤ − or, equivalently, for .x ≥ . Notice that the conditions (3.37) require that none of the vectors .si is a feasible decrease direction of .ΩB at .x, where .si denotes a vector of the standard basis of .Rn formed by the columns of .In , .i ∈ A (x).
3.5.3 Optimality Conditions for More General Constraints If the constraints .hi that define (3.31) are nonlinear, it can happen that their first-order representation by means of the gradients .∇hi is not adequate. The reason is illustrated in Fig. 3.7, where the linear cone .LΩI (x) of .ΩI at .x on the boundary of .ΩI defined by LΩI (x) = {d ∈ Rn : dT ∇hi (x) ≤ 0, i ∈ A (x)}
.
comprises the whole line, while the tangent cone .TΩI (x) of .ΩI at .x on the boundary of .ΩI defined by TΩI (x) = {d ∈ Rn : d = lim di , x + αi di ∈ ΩI , lim αi = 0, αi > 0}
.
i→∞
i→∞
comprises only one point. To avoid such pathological situations, we shall assume that .LΩI (x) = TΩI (x). This is also called the Abadie constraint qualification (ACQ) [5]. Notice that linear constraints satisfy ACQ. The ACQ assumption reduces the analysis of the conditions of minima to the linear case, so that we can formulate the following proposition [6].
3
Optimization
61
Fig. 3.7 Example of .ΩI = {x}, .LΩI (x) = TΩI (x)
Proposition 3.10 Let the inequality constrained problem (3.31) be defined by an SPS or SPD matrix .A ∈ Rn×n and convex differentiable functions .hi , and let .ΩI satisfy ACQ. Then .x ∈ ΩI is a solution of (3.31) if and only if there is .λ ∈ Rm such that .
λ ≥ o,
Ax − b + ∇h(x)λ = o,
and
T
λ h(x) = 0.
(3.39)
Proof First notice that due to the definition of tangential cone, .x solves (3.31) if and only if f (x) =
.
min
x+TΩI (x)
f (x).
Since we assume .TΩI (x) = LΩI (x), the letter problem has the same solution as .
min
x+LΩI (x)
f (x).
Using Proposition 3.8, we get that .x ∈ ΩI solves (3.31) if and only if .x satisfies (3.39). .
3.5.4 Existence and Uniqueness In our discussion of the existence and uniqueness results for the inequality constrained QP problem (3.31), we restrict our attention to the following results that are useful in our applications. Proposition 3.11 Let the inequality constrained problem (3.31) be defined by convex functions .hi and f. Let .C denote the cone of recession directions of the nonempty feasible set .ΩI . Then the following statements hold:
62
Zdeněk Dostál
(i) If problem (3.31) has a solution, then .
dT b ≤ 0
f or
d ∈ C ∩ KerA.
(3.40)
(ii) If the constraints are linear, then (3.40) is sufficient for the existence of minima. (iii) If the constraints are linear and .(x, λ) and .(y, μ) are KKT couples for (3.31), then .
x − y ∈ KerA ∩ Span{b}⊥
and
λ − μ ∈ KerBT .
(3.41)
(iv) If .A is positive definite, then the inequality constrained minimization problem (3.31) has a unique solution. Proof (i) Let .x be a global solution of the inequality constrained minimization problem (3.31), and recall that .
f (x + αd) − f (x) = α(Ax − b)T d +
α2 T d Ad 2
(3.42)
for any .d ∈ Rn and .α ∈ R. Taking .d ∈ C ∩ KerA, (3.42) reduces to f (x + αd) − f (x) = −αbT d,
.
which is nonnegative for any .α ≥ 0 if and only if .bT d ≤ 0. (ii) See Dostál [8]. (iii) The first inclusion of (3.41) holds by Proposition 3.3(ii) for the solutions of any convex problem. The inclusion for multipliers follows by the KKT condition (3.34). (iv) If .A is SPD, then f is strictly convex by Proposition 3.2, so by Proposition 3.4, there is a unique minimizer of f subject to .x ∈ ΩI . .
3.6 Equality and Inequality Constrained Problems In the previous sections, we obtained the results concerning optimization problems with either equality or inequality constraints. Here, we extend these results to the optimization problems with both equality and inequality constraints. More formally, we look for .
min f (x),
x∈ΩIE
ΩIE = {x ∈ Rn : [h(x)]I ≤ oI , [h(x)]E = oE },
(3.43)
where f is a quadratic function with the SPS Hessian .A ∈ Rn×n and the linear term defined by .b ∈ Rn , .I , E are the disjunct sets of indices which
3
Optimization
63
decompose .{1, . . . , m}, and the equality and inequality constraints are defined, respectively, by linear and continuously differentiable convex functions .[h(x)]i = hi (x), .i = 1, . . . , m. We assume that .∇hi (x) = o when .hi (x) = 0 and that .Ωi = ∅. We are especially interested in linear equality and inequality constraints defined by linear forms or strictly convex separable quadratic functions. If we describe the conditions that define .ΩIE in components, we get ΩIE = {x ∈ Rn : hi (x) ≤ 0, i ∈ I , bTi x = ci , i ∈ E },
.
which makes sense even for .I = ∅ or .E = ∅; we consider the conditions which concern the empty set as always satisfied. For example, .E = ∅ gives ΩIE = {x ∈ Rn : hi (x) ≤ 0, i ∈ I },
.
and the kernel of an “empty” matrix is defined by KerBE ∗ = {x ∈ Rn : bTi x = 0, i ∈ E } = Rn .
.
If all constraints are linear, then .ΩI is defined by .B ∈ Rm×n and .c ∈ Rm , ⎡ T⎤ b1 cI BI ⎢ .. ⎥ .B = ⎣ . ⎦ = B E , c = cE , bTm and we get a QP variant of (3.43) .
min f (x),
x∈ΩIE
ΩIE = {x ∈ Rn : BI x ≤ cI , BE x = cE }.
(3.44)
3.6.1 Optimality Conditions First observe that any equality constraint .bTi x = ci , i ∈ E can be replaced by the couple of inequalities .bTi x ≤ ci and .−bTi x ≤ −ci . We can thus use our results obtained by the analysis of the inequality constrained problems in Sect. 3.5 to get similar results for general bound and equality constrained QP problem (3.43). Proposition 3.12 Let the quadratic function f be defined by the SPS matrix A and .b ∈ Rn . Let .ΩIE = ∅ be defined by
.
hE (x) = BE x − cE
.
and convex differential functions .hi , i ∈ I . Then the following statements hold: (i) .x ∈ ΩIE is a solution of (3.43) if and only if
64
Zdeněk Dostál .
dT ∇f (x) = (Ax − b)T d ≥ 0
(3.45)
for any feasible direction .d of .ΩIE at .x. (ii) If .ΩIE is defined by linear constraints (3.44), then .x ∈ ΩIE is a solution of (3.43) if and only if there is a vector .λ ∈ Rm such that .
λI ≥ o,
Ax − b + BT λ = o,
and
T
λI [Bx − c]I = 0.
(3.46)
(iii) If .ΩIE is a feasible set for the problem (3.43) and the constraints satisfy ACQ, then .x ∈ ΩIE is a solution of (3.43) if and only if there is a vector m .λ ∈ R such that .
λI ≥ o,
Ax − b + ∇h(x)λ = o,
T
and
λI hI (x) = 0.
(3.47)
Proof First observe that if .E = ∅, then the statements of the above proposition reduce to Propositions 3.8 and 3.10 and if .I = ∅, then they reduce to Proposition 3.6. Thus, we can assume in the rest of the proof that .I = ∅ and .E = ∅. As mentioned above, (3.43) may also be rewritten as .
min f (x), ΩI = {x ∈ Rn : [Bx]I ≤ cI , [Bx]E ≤ cE , −[Bx]E ≤ −cE },
x∈ΩI
(3.48) where .ΩI = ΩIE . Thus, the statement (i) is a special case of Proposition 3.8. Observing that we can always ignore one of the inequality constraints related to the same equality constraint, we easily get (ii) and (iii) from Proposi. tions 3.8 and 3.10, respectively. Remark 3.3 Condition (3.45) can be written as a variational inequality (Ax)T (x − x) ≥ bT (x − x),
.
x ∈ ΩIE .
3.7 Duality for Quadratic Programming Problems The duality associates each problem (3.43), which we shall also call primal problem, with a maximization problem in Lagrange multipliers that we shall call the dual problem. The solution of the dual problem is a Lagrange multiplier of the solution of the primal problem, so that having the solution of the dual problem, we can get the solution of the primal problem by solving an unconstrained problem. Here, we limit our attention to the QP problems (3.44), postponing the discussion of more general cases to specific applications. The cost function of the dual problem is the dual function .
Θ(λ) = infn L0 (x, λ). x∈R
(3.49)
3
Optimization
65
If A is SPD, then .L0 is a quadratic function with the SPD Hessian, and .Θ is a quadratic function in .λ which can be defined by an explicit formula. However, in our applications, it often happens that .A is only SPS, so the cost function f need not be bounded from below and .−∞ can be in the range of the dual function .Θ. We resolve this problem by keeping .Θ quadratic at the cost of introducing equality constraints. Proposition 3.13 Let matrices .A, B, vectors .b, c, and index sets .I , E be those of the definition of problem (3.44) with .A positive semidefinite and n×d .ΩIE = ∅. Let .R ∈ R be a full rank matrix such that .
ImR = KerA,
let .A+ denote an SPS generalized inverse of .A, and let .
1 1 Θ(λ) = − λT BA+ BT λ + λT (BA+ b − c) − bT A+ b. 2 2
(3.50)
Then the following statements hold: (i) If .(x, λ) is a KKT pair for (3.44), then .λ is a solution of .
ΩBE = {λ ∈ Rm : λI ≥ o, RT BT λ = RT b}.
max Θ(λ),
λ∈ΩBE
(3.51)
Moreover, there is .α ∈ Rd such that .(λ, α) is a KKT pair for problem (3.51) and .
x = A+ (b − BT λ) + Rα.
(3.52)
(ii) If .(λ, α) is a KKT pair for problem (3.51), then .x defined by (3.52) is a solution of the equality and inequality constrained problem (3.44). (iii) If .(x, λ) is a KKT pair for problem (3.44), then .
f (x) = Θ(λ).
(3.53)
Proof (i) Assume that .(x, λ) is a KKT pair for (3.44), so that .(x, λ) is by Proposition 3.12 a solution of λI ≥o,
(3.54)
∇x L0 (x, λ) =Ax − b + BT λ = o,
(3.55)
[∇λ L0 (x, λ)]I = [Bx − c]I ≤ o,
(3.56)
[∇λ L0 (x, λ)]E = [Bx − c]E = o,
(3.57)
.
.
.
.
66
Zdeněk Dostál .
λTI [Bx − c]I =0.
(3.58)
Notice that given a vector .λ ∈ Rm , we can express the condition b − BT λ ∈ ImA,
.
which guarantees solvability of (3.55) with respect to .x, conveniently as .
RT (BT λ − b) = o.
(3.59)
If the latter condition is satisfied, then we can use any symmetric left generalized inverse .A+ to find all solutions of (3.55) with respect to .x in the form x(λ, α) = A+ (b − BT λ) + Rα, α ∈ Rd ,
.
where d is the dimension of .KerA. After substituting for .x into (3.56)–(3.58), we get [ −BA+ BT λ + (BA+ b − c) + BRα]I ≤ o,
(3.60)
[ −BA+ BT λ + (BA+ b − c) + BRα]E = o,
(3.61)
λTI [ −BA+ BT λ + (BA+ b − c) + BRα]I = 0.
(3.62)
.
.
.
The formulae in (3.60)–(3.62) look like something that we have already seen. Indeed, introducing the vector of Lagrange multipliers .α for (3.59) and denoting Λ(λ, α) =Θ(λ) + αT (RT BT λ − RT b) 1 1 . = − λT BA+ BT λ + λT (BA+ b − c) − bT A+ b 2 2 + αT (RT BT λ − RT b), g = ∇λ Λ(λ, α) = −BA+ BT λ + (BA+ b − c) + BRα,
.
we can rewrite the relations (3.60)–(3.62) as .
gI ≤ o,
gE = o,
and
λTI gI = 0.
(3.63)
Comparing (3.63) with the KKT conditions for the bound and equality constrained problem, we conclude that (3.63) are the KKT conditions for .
max Θ(λ)
subject to
RT BT λ − RT b = o and
λI ≥ o.
(3.64)
3
Optimization
67
We have thus proved that if .(x, λ) solves (3.54)–(3.58), then .λ is a feasible vector for problem (3.64) which satisfies the related KKT conditions. Recalling that .A+ is by the assumption symmetric positive semidefinite, so that + T .BA B is also positive semidefinite, we conclude that .λ solves (3.51). Moreover, we have shown that any solution .x can be obtained in the form (3.52) with a KKT pair .(λ, α), where .α is a vector of the Lagrange multipliers for the equality constraints in (3.51). (ii) Let .(λ, α) be a KKT pair for problem (3.51), so that .(λ, α) satisfies (3.59)–(3.62) and .λI ≥ o. If we denote x = A+ (b − BT λ) + Rα,
.
we can use (3.60)–(3.62) to verify directly that .x is feasible and .(x, λ) satisfies the complementarity conditions, respectively. Finally, using (3.59), we get that there is .y ∈ Rn such that .b − BT λ = Ay. Thus, Ax − b + BT λ =A A+ (b − BT λ) + Rα − b + BT λ .
=AA+ Ay − b + BT λ = b − BT λ − b + BT λ = o,
which proves that .(x, λ) is a KKT pair for (3.44). (iii) Let .(x, λ) be a KKT pair for (3.44). Using the feasibility condition (3.57) and the complementarity condition (3.58), we get T
T
T
λ (Bx − c) = λE [Bx − c]E + λI [Bx − c]I = 0.
.
Hence, T
f (x) = f (x) + λ (Bx − c) = L0 (x, λ).
.
Next recall that if .(x, λ) is a KKT pair, then ∇x L0 (x, λ) = o.
.
Since .L0 is convex, the latter is the gradient condition for the unconstrained minimizer of .L0 with respect to .x; therefore, L0 (x, λ) = minn L0 (x, λ) = Θ(λ).
.
x∈R
Thus, f (x) = L0 (x, λ) = Θ(λ).
.
.
Since the constant term is not essential in our applications and we formulate our algorithms for minimization problems, we shall consider the function
68
Zdeněk Dostál .
1 1 θ(λ) = −Θ(λ) − bT A+ b = λT BA+ BT λ − λT (BA+ b − c), 2 2
(3.65)
so that .
arg
min θ(λ) = arg max Θ(λ).
λ∈ΩBE
λ∈ΩBE
3.7.1 Uniqueness of a KKT Pair We shall complete our exposition of duality by formulating the results concerning the uniqueness of the solution for the constrained dual problem .
min θ(λ),
λ∈ΩBE
ΩBE = {λ ∈ Rm : λI ≥ o, RT BT λ = RT b},
(3.66)
where .θ is defined by (3.65). Proposition 3.14 Let the matrices .A and .B, the vectors .b and .c, and the index sets .I and .E be those from the definition of problem (3.44) with .A positive semidefinite, .ΩIE = ∅, and .ΩBE = ∅. Let .R ∈ Rn×d be a full rank matrix such that ImR = KerA.
.
Then the following statements hold: (i) If . BT and .BR are full column rank matrices, then there is a unique of problem (3.66). solution .λ is a unique solution of the constrained dual problem (3.66), (ii) If .λ A = {i : [λ]i > 0} ∪ E ,
.
α) and .BA ∗ R is a full column rank matrix, then there is a unique triple .( x, λ, solves the consuch that .( x, λ) solves the primal problem (3.44) and .(λ, α) is known, then strained dual problem (3.66). If .λ − (BA ∗ A+ b − cA ) = (RT BTA ∗ BA ∗ R)−1 RT BTA ∗ BA ∗ A+ BT λ .α (3.67) and .
+ Rα. = A+ (b − BT λ) x
(3.68)
(iii) If .BT and . BE ∗ R are full column rank matrices, then there is a unique α) solves the primal problem (3.44) and .(λ, α) such that .( x, λ, x, λ) triple .( solves the constrained dual problem (3.66).
3
Optimization
69
Proof (i) Let .BT and .BR be full column rank matrices. To show that there is a unique solution of (3.66), we examine the Hessian .BA+ BT of .θ. Let T T + T .R B λ = o and .BA B λ = o. Using the definition of .R, it follows that T n .B λ ∈ ImA. Hence, there is .μ ∈ R such that BT λ = Aμ
.
and μT Aμ = μT AA+ Aμ = λT BA+ BT λ = 0.
.
Thus, .μ ∈ KerA and BT λ = Aμ = o.
.
Since we assume that .BT has independent columns, we conclude that .λ = o. We have thus proved that the restriction of .BA+ BT to .Ker(RT BT ) is positive definite, so that .θ|KerRT BT is by Proposition 3.4 strictly convex, and it is easy to check that it is strictly convex on U = {λ ∈ Rm : RT BT λ = RT b}.
.
Since .ΩBE = ∅ and .ΩBE ⊆ U , we have that .θ is strictly convex on .ΩBE , of (3.66). and it follows by Proposition 3.4 that there is a unique solution .λ be a unique solution of problem (3.66). Since the solution satisfies (ii) Let .λ such that the related KKT conditions, it follows that there is .α − (BA ∗ A+ b − cA ) − BA ∗ Rα = o. BA ∗ A+ BT λ
.
After multiplying on the left by .RT BTA ∗ and simple manipulations, we is unique due to the uniqueget (3.67). The inverse exists, and the solution .α ness of .λ and the assumption on the full column rank of .BA ∗ R. (iii) If .BT and .BE ∗ R are full column rank matrices, then .BR is also a full of problem (3.66) by column rank matrix. Hence, there is a unique solution .λ (i). Since .E ⊆ A and .BE ∗ R has independent columns, it follows that .BA ∗ R has also independent columns. Thus, we can use (ii) to finish the proof. . The reconstruction formula (3.67) can be modified in order to work whenever the dual problem has a solution .λ. The resulting formula obtained by the analysis of the related KKT conditions then reads T T + T T + T + . α = (R BA ∗ BA ∗ R) R BA ∗ BA ∗ A B λ − (BA ∗ A b − cA ) . (3.69) It is rather evident that if we have the Lagrange multiplier .α from the solution of the dual problem (3.51), then there is no reason to use formulae (3.69) or (3.67). Notice that their evaluation can be expensive as it requires assem-
70
Zdeněk Dostál
bling and solving a possibly large system of linear equations. However, we can have a solution to the modified dual problem, but not the corresponding Lagrange multiplier .α. For example, the effective solver can require a modification of the dual cost function (3.65) such as preconditioning or reg but changes the multiplier .α ularization that does not change the solution .λ of the dual problem. In this case, the formulae (3.69) or (3.67) can be helpful. If we get a multiplier .α corresponding to some modified problem, it is of the possible that it can be used to evaluate effectively the multiplier .α original problem [9]. Such procedure is problem dependent. The effective evaluation of the primal solution using the multiplier of a modified dual problem resulting from the application of the FETI methodology is described in Chaps. 10 and .
Fig. 3.8 Unique displacement
Fig. 3.9 Nonunique displacement
The theory concerning the existence and uniqueness of a solution can be illustrated on a problem to find the displacement .x of an elastic body depicted in Fig. 3.8 under traction .b. After the finite element discretization, we get a convex QP problem to find a minimum of the discrete quadratic energy function f subject to equality (prescribed displacements) and inequality (nonpenetration) constraints that enforce some limits on the movement of the body. We can distinguish the following cases: • If we assume that the body is fixed to the support so that it does not admit any rigid body motion, then f is strictly convex, and the displacements are unique. • If we assume that the body is fixed to the support only in normal direction as in Figs. 3.8 and 3.9, then the existence of a solution depends on the direction of the traction .b: – If the body is not allowed to penetrate the vertical obstacle and the traction .b corresponds to Fig. 3.8, then the vector of displacements exists and is unique. – If the horizontal component of .b is zero as in Fig. 3.9, then the solution still exists, but it is not unique.
References
71
– If the horizontal component of .b in Fig. 3.8 pointed from the obstacle, then the solution would not exist. – If the horizontal component of .b in Fig. 3.9 were not zero, then the solution would not exist. If the solution exists and the active constraints are represented by the relation BA ∗ x = oA
.
with independent rows, then the corresponding Laplace multipliers, i.e., the requires that the reaction forces, are unique. The condition .RT b = RT BT λ resulting forces are balanced in the directions of the rigid body motions and I ≥ o guarantees that the body is not glued to the obstacle. The condi.λ is necessary for the reconstruction of displacements. If tion .RT b = RT BT λ determine the components of .λ, then .λ is uniquely the reaction forces .BT λ T determined by the conditions of equilibrium. Notice that .B λ can be interpreted as reaction nodal forces and, unlike the Lagrange multipliers, is always uniquely determined by the conditions of equilibrium.
References 1. Bertsekas, D.P.: Nonlinear Optimization. Athena Scientific, Belmont (1999) 2. Nocedal, J., Wright, S.F.: Numerical Optimization. Springer, New York (2000) 3. Conn, A.R., Gould, N.I.M., Toint, Ph.L: Trust Region Methods. SIAM, Philadelphia (2000) 4. Bazaraa, M.S., Shetty, C.M., Sherali, H.D.: Nonlinear Programming, Theory and Algorithms, 2nd edn. Wiley, New York (1993) 5. Griva, I., Nash, S.G., Sofer, A.: Linear and Nonlinear Optimization. SIAM, Philadelphia (2009) 6. Dostál, Z., Beremlijski, P.: On convergence of inexact augmented Lagrangians for separable and equality convex QCQP problems without constraint qualification. Adv. Electr. Electron. Eng. 15, 215–222 (2017) 7. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Nav. Res. Logist. Q. 3, 95–110 (1956) 8. Dostál, Z.: On solvability of convex non-coercive quadratic programming problems. JOTA 143(2), 413–416 (2009) 9. Horák, D., Dostál, Z., Sojka, R.: On the efficient reconstruction of displacements for contact problems. Adv. Electr. Electron. Eng. 15(2), 237– 241 (2017)
Chapter 4
Analysis Marie Sadowská
We first give a brief presentation of essential Sobolev spaces on Lipschitz domains and their boundaries to the extent that is useful for the continuous formulation of the conditions of equilibrium of elastic bodies in contact and the analysis of solution algorithms. In particular, we recall the fundamental definitions of relevant spaces of functions, the results on equivalent norms, and some inequalities used in the proofs of scalability of presented algorithms, particularly in Chaps. 10, 11, and 14. These results are sufficient for a correct and sufficiently general variational formulation of the equilibrium conditions. We typically do not include proofs, but we try to refer to the relevant results in a suitable form. The results mentioned above enable us to eliminate the interior variables on a continuous level and reduce the contact problems to the boundary. The latter procedure is fundamental for the boundary element methods. The applications of functional analysis to both finite and boundary element approximation of elliptic problems, including equilibrium of elastic bodies, can be found in the book by Steinbach [1]. For more general results, we refer an interested reader to the books by Adams and Fournier [2] or McLean [3]. Finally, we review the relations between the semicoercive variational inequalities that provide an abstract framework for formulating contact problems and the conditions satisfied by the minimizers of the energy function. More information on variational inequalities can be found, e.g., in the books by Lions and Stampacchia [4], Glowinski [5], and Glowinski et al. [6].
M. Sadowská Department of Applied Mathematics, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_4
73
74
Marie Sadowská
4.1 Sobolev Spaces Let .Ω ⊂ Rd , .d = 2, 3, denote a nonempty bounded Lipschitz domain with a boundary .Γ. The space of all real functions that are Lebesgue measurable and quadratically integrable in .Ω is denoted as .L2 (Ω). We do not distinguish functions which differ on a set of zero measure. In .L2 (Ω), we define the scalar product 2 (u, v) = u v dΩ . (4.1) L (Ω) Ω
and the norm 1/2
uL2 (Ω) = (u, u)L2 (Ω) .
.
The space .L2 (Ω) with scalar product (4.1) is a Hilbert space. Note that there holds Hölder’s inequality . |u v| dΩ ≤ uL2 (Ω) vL2 (Ω) for all u, v ∈ L2 (Ω). Ω
By .C ∞ (Ω), we denote the space of all real functions with continuous derivatives of all orders with respect to all variables in .Ω that are continuously extendable to its closure .Ω. Its subset comprising functions with compact support in .Ω is denoted .C0∞ (Ω). Recall that the support .sup v of v is defined as a closure (in .Rd ) of the set of points x such that .v(x) = 0. Let us define the Sobolev space ∂u 1 .H (Ω) = u ∈ L2 (Ω) : ∈ L2 (Ω) for i = 1, . . . , d , ∂xi where the derivatives are uniquely defined by the relation ∂v ∂u . v dΩ = − udΩ, v ∈ C0∞ (Ω). ∂x ∂x i i Ω Ω Note that .H 1 (Ω) can be equivalently defined as the completion of ∞ . C (Ω), · H 1 (Ω) , where 1/2 uH 1 (Ω) = u2L2 (Ω) + |u|2H 1 (Ω)
.
with the seminorm
4 Analysis
75
|u|H 1 (Ω)
.
d
∂u
2
= ∇uL2 (Ω) =
∂xi 2
.
L (Ω)
i=1
It holds that .H 1 (Ω) is a Hilbert space with respect to the scalar product .
(u, v)H 1 (Ω) = (u, v)L2 (Ω) + (∇u, ∇v)L2 (Ω) ,
(4.2)
where (∇u, ∇v)L2 (Ω) =
d
.
i=1
Ω
∂u ∂v dΩ. ∂xi ∂xi
The following theorem introduces another useful norm on .H 1 (Ω) that is closely associated with the seminorm. Theorem 4.1 (Sobolev’s Norm Equivalence Theorem) Let f : H 1 (Ω) → R
.
denote a bounded linear functional, so that there is .C > 0 such that 0 ≤ |f (u)| ≤ CuH 1 (Ω)
.
u ∈ H 1 (Ω)
for any
and .f (u) = 0 for any nonzero constant function u. Then .uf = |f (u)|2 + ∇u2L2 (Ω) defines a norm equivalent to .uH 1 (Ω) .
Proof See the book by Steinbach [1, Theorem 2.6].
.
Important instances of f are
2
f (u) =
u dΓ
. Γ
and
2 u dΩ .
fΩ (u) =
Γ
Ω
A useful relation between the seminorm .|u|H 1 (Ω) and .uL2 (Ω) is the following special case of Poincaré inequality. Theorem 4.2 (Poincaré Inequality) Let .Ω ⊂ R2 denote a bounded domain with a Lipschitz boundary .Γ . Then there is .CP > 0 such that any .u ∈ H 1 (Ω) satisfies 2
u2L2 (Ω) ≤ CP
.
∇u2L2 (Ω) +
u dΓ Γ
.
(4.3)
76
Marie Sadowská
Proof See Steinbach’s book [1, inequality (2.12)]. A simple proof of the Poincaré inequality on a cube of arbitrary dimension for the functions with . zero traces on the boundary can be found, e.g., in Nečas [7]. The Poincaré inequality holds for more general domains with constants that depend on their shapes. Here, we have limited our attention to a particular case suitable for the analysis of our model problems. For a subset .ΓU of .Γ with .meas ΓU > 0, we define the Sobolev space 1 .H0 (Ω, ΓU ) as the completion of ∞ . C0 (Ω, ΓU ), · H 1 (Ω) , where .C0∞ (Ω, ΓU ) contains all functions from .C ∞ (Ω) vanishing on .ΓU . Note that .H01 (Ω, ΓU ) is a Hilbert space with respect to the scalar product (4.2). In addition, due to the Friedrichs theorem, the functional .| · |H 1 (Ω) represents on 1 .H0 (Ω, ΓU ) an equivalent norm to . · H 1 (Ω) . The following theorem defines the trace operator which gives sense to the concept of boundary value to the functions from .H 1 (Ω). Theorem 4.3 (Trace Theorem) There is a unique linear continuous mapping .
γ0 : H 1 (Ω) → L2 (Γ)
(4.4)
satisfying γ u = u| Γ
. 0
for all u ∈ C ∞ (Ω).
The function .γ0 u ∈ L2 (Γ) is called the trace of .u ∈ H 1 (Ω). Proof See, e.g., McLean [3].
.
Theorem 4.3 guarantees that there is .CT > 0 such that . (γ0 u)2 dΓ ≤ CT u2L2 (Ω) + ∇u2L2 (Ω) = CT u2L2 (Ω) + |u|2H 1 (Ω) . Γ
(4.5) Note that it can be shown that H01 (Ω, ΓU ) = {u ∈ H 1 (Ω) : γ0 u = 0 on ΓU }.
.
Remark 4.1 The generic constants .CP in Theorem 4.2 and .CT in (4.5) depend on the diameter .H(Ω), the shape of .Ω, and the shape of .Γ . However, there are rich classes of domains associated with the constants depending only on H. For example, Payne and Weinberger [8] proved that the Poincaré inequality (4.2) is satisfied with CP = CP (H) = H/π
.
4 Analysis
77
for any convex domain .Ω ⊂ Rd of the diameter H. Similar results were proved for the class of domains with the cone property and uniformly Lipschitz boundaries [9]. Exact bounds for elementary simplexes can be found in [10]. We shall call the class of such subdomains regular. More formally, the class .O of bounded domain subdomains is regular if there is a function .C(H) such that any subdomain .Ω ∈ O with the diameter .H = H(Ω) satisfies (4.3) and (4.5) with .CP ≤ C(H) and CT ≤ C(H). Intuitively, the boundary of regular domains should not be too saw-toothed, and the domain should neither be thin [11] nor include thin channels. For the analysis of model problems, we can do with the squares and cubes.
4.2 Trace Spaces Let us denote the trace space of .H 1 (Ω) by .H 1/2 (Γ), i.e., H 1/2 (Γ) = γ0 (H 1 (Ω)).
.
In .H 1/2 (Γ), we introduce the Sobolev-Slobodeckij scalar product (u(x) − u(y))(v(x) − v(y)) 2 (u, v) = (u, v) + dΓx dΓy . L (Γ) H 1/2 (Γ) d x − y Γ Γ (4.6) and the corresponding norm .
1/2 2 2 vH 1/2 (Γ) = vL2 (Γ) + |v|H 1/2 (Γ) ,
where .
(v(x) − v(y))2
2
|v|H 1/2 (Γ) =
Γ
Γ
d
x − y
dΓx dΓy .
Recall that .H 1/2 (Γ) is a Hilbert space with respect to the scalar product (4.6), that there is .c1 > 0 such that γ0 uH 1/2 (Γ) ≤ c1 uH 1 (Ω)
.
for all u ∈ H 1 (Ω),
and that there exist .c2 > 0 and an extension operator ε : H 1/2 (Γ) → H 1 (Ω)
.
such that γ εu = u
. 0
78
Marie Sadowská
and εuH 1 (Ω) ≤ c2 uH 1/2 (Γ)
.
for all u ∈ H 1/2 (Γ).
The dual space to .H 1/2 (Γ) with respect to the .L2 (Γ) scalar product is denoted by .H −1/2 (Γ) , and the norm in .H −1/2 (Γ) is given by .
wH −1/2 (Γ) =
sup 0=v∈H 1/2 (Γ)
| w, v | , vH 1/2 (Γ)
where . w, v = w(v) denotes the duality pairing. Notice that w is a bounded functional on .L2 (Γ), so that by the Riesz theorem, there is .w ∈ L2 (Γ) such that w(u) = (w, u)L2 (Γ) .
.
A following Green formula is a key tool in developing the boundary element method. We shall formulate it for the functions from the space 1 HΔ (Ω) = {v ∈ H 1 (Ω) : Δv ∈ L2 (Ω)}.
.
1 Theorem 4.4 (Green’s Theorem) Let .u ∈ HΔ (Ω) and .v ∈ H 1 (Ω). Then .
(∇u, ∇v)L2 (Ω) + (Δu, v)L2 (Ω) = γ1 u, γ0 v ,
(4.7)
1 where .γ1 : HΔ (Ω) → H −1/2 (Γ) is the interior conormal derivative defined by
.
γ1 u(x) =
lim
Ωy→x
Proof See, e.g., McLean [3].
∂ u(y), ∂nx
x ∈ Γ.
(4.8)
.
4.3 Variational Inequalities Let us review basic definitions and ideas concerning semi-elliptic variational inequalities. Let V denote a real Hilbert space with a scalar product .(·, ·)V and the induced norm .·V ; let .| · |V be a seminorm on V ; let .K ⊂ V be a closed, convex, and nonempty set; let .f ∈ V be a bounded linear functional on V ; and let a be a bilinear form on V.
4 Analysis
79
Definition 4.1 A bilinear form .a : V × V → R is said to be: • Bounded on .K if there is an .M > 0 such that .
|a(u, v)| ≤ M uV vV
for all u, v ∈ K .
• Symmetric on .K if a(u, v) = a(v, u)
.
for all u, v ∈ K .
• Elliptic on .K if there is an .α > 0 such that a(u, u) ≥ αu2V
.
for all u ∈ K .
• Semi-elliptic on .K if there is an .α > 0 such that a(u, u) ≥ α|u|2V
.
for all u ∈ K .
We shall also need a classification of functionals.
Definition 4.2 A functional .f : V → R is said to be: • Coercive on .K if .
v∈K vV → ∞
⇒ f (v) → ∞.
• Convex on .K if for all .u, v ∈ K and all .t ∈ (0, 1), f (tu + (1 − t)v) ≤ t f (u) + (1 − t) f (v).
.
Theorem 4.5 If a functional .f : V → R is continuous, coercive, and convex on .K , then there exists a solution of the minimization problem to find .u ∈ K such that f (u) = min {f (v) : v ∈ K } .
.
Proof Let .BR be a closed, origin-centered ball of radius .R > 0, i.e., BR = {v ∈ V : vV ≤ R}.
.
Then the coercivity of f on .K yields the existence of a large enough .R > 0 such that
80
Marie Sadowská
q = inf {f (v) : v ∈ K } = inf {f (v) : v ∈ K ∩ BR }.
.
Let us consider a sequence .{un } ⊂ K ∩ BR such that f (un ) → q.
.
Since .{un } is bounded, there is a subsequence .{unk } of .{un } and .u ∈ V satisfying u
. nk
u.
Let us recall that every closed convex set is weakly closed, i.e., .u ∈ K , and since every continuous convex functional is weakly lower semi-continuous on a closed convex set, we get f (u) ≤ lim inf f (unk ) = lim f (unk ) = q ≤ f (u).
.
.
We are concerned with the following variational inequality: find .u ∈ K such that .
a(u, v − u) ≥ f (v − u)
for all v ∈ K .
(4.9)
Note that if .K is a subspace of V, then (4.9) is equivalent to the variational equation to find .u ∈ K such that a(u, v) = f (v)
.
for all v ∈ K .
Theorem 4.6 (Generalization of the Lax-Milgram Theorem) If a is bounded and elliptic on V, then there exists a unique solution .u ∈ K of problem (4.9). Proof The proof can be found, e.g., in Glowinski [5], Stampacchia [12], and Glowinski et al. [6]. Let us prove the uniqueness of the solution. Assume that both .u1 and .u2 solve the variational inequality (4.9), so that for all .v ∈ K , we have a(u1 , v − u1 ) ≥ f (v − u1 )
.
and
a(u2 , v − u2 ) ≥ f (v − u2 ).
Substituting .u2 and .u1 for v in the first and second inequalities, respectively, and summing up the inequalities, we get a(u1 − u2 , u1 − u2 ) ≤ 0.
.
Since a is elliptic on V, there is an .α > 0 such that a(u1 − u2 , u1 − u2 ) ≥ αu1 − u2 2V .
.
4 Analysis
81
Thus, we get .u1 = u2 .
.
Let us define the energy functional .q : V → R by .
q(v) =
1 a(v, v) − f (v). 2
(4.10)
Proposition 4.1 If a is symmetric and semi-elliptic on V, then the energy functional q defined by (4.10) is convex on V. Proof Let us show that for all .u, v ∈ V and all .t ∈ (0, 1) q(tu + (1 − t)v) ≤ t q(u) + (1 − t)q(v).
.
Since f is linear on V, it is enough to prove that .v → a(v, v) is convex on V. Let .u, v ∈ V and .t ∈ (0, 1) be arbitrary. From the semi-ellipticity and symmetry of a on V, we obtain .
0 ≤ a(u − v, u − v) = a(u, u) − 2 a(u, v) + a(v, v).
(4.11)
Then, by (4.11), we get a(tu + (1 − t)v, tu + (1 − t)v) = = t2 a(u, u) + 2 t(1 − t) a(u, v) + (1 − t)2 a(v, v) . ≤ t2 a(u, u) + t(1 − t) [a(u, u) + a(v, v)] + (1 − t)2 a(v, v) = t a(u, u) + (1 − t) a(v, v),
which completes the proof.
.
Theorem 4.7 If a is bounded, symmetric, and semi-elliptic on V, then problem (4.9) is equivalent to the minimization problem to find .u ∈ K such that .
q(u) = min {q(v) : v ∈ K } .
(4.12)
Proof Suppose .u ∈ K is a solution of (4.9). Let us pick any .v ∈ K and put z = v − u ∈ V . Then
.
a(u, z) ≥ f (z)
.
and due to the symmetry and semi-ellipticity of a on V, 1 q(v) = q(z + u) = a(z + u, z + u) − f (z + u) 2 1 1 = a(z, z) + a(u, z) + a(u, u) − f (z) − f (u) 2 2 . 1 =q(u) + a(z, z) + (a(u, z) − f (z)) 2 ≥q(u).
82
Marie Sadowská
Conversely, assume that .u ∈ K minimizes the energy functional q on .K . If we pick an arbitrary .v ∈ K , then φ(t) = q((1 − t)u + tv) ≥ q(u) = φ(0)
.
for all t ∈ [0, 1].
Now let us take a closer look at the function .φ. The symmetry of a on V yields φ(t) =
.
1 1 (1 − t)2 a(u, u) + (1 − t) ta(u, v) + t2 a(v, v) − (1 − t)f (u) − tf (v) 2 2
for all .t ∈ [0, 1], and therefore, φ (0) = −a(u, u) + a(u, v) + f (u) − f (v) = a(u, v − u) − f (v − u).
. +
Since .φ+ (0) ≥ 0, the proof is finished.
.
Remark 4.2 Let us sketch the proof of Theorem 4.6 using an additional assumption of the symmetry of a on V. Due to Theorem 4.7, to prove the solvability, it is enough to show that the energy functional q defined by (4.10) gets its minimum on .K . By Proposition 4.1, we know that q is convex on V. Continuity of both a and f on V implies continuity of q on V, and, moreover, it can be easily shown that particularly due to ellipticity of a on V, the energy functional q is coercive on V. Thus, by Theorem 4.5, we get that problem (4.12) is solvable. Remark 4.3 If the assumptions of Proposition 4.1 are satisfied, q is convex on V. If, moreover, a is continuous on V, so is q. Thus, if we want to use Theorem 4.5 in order to obtain the solvability of minimization problem (4.12) (and the solvability of variational inequality (4.9)), it is enough to prove coercivity of q on .K .
References 1. Steinbach, O.: Numerical Approximation Mathods for Boundary Value Problems. Springer, New York (2008) 2. Adams, R.A., Fournier, J.J.F.: Sobolev Spaces, 2nd edn. Academic Press, New York (2003) 3. McLean, W.: Strongly Elliptic Systems and Boundary Integral Equations. Cambridge University Press, Cambridge (2000) 4. Lions, J.-L., Stampacchia, G.: Variational inequalities. Commun. Pure Appl. Math. 20, 493–519 (1967) 5. Glowinski, R.: Numerical Methods for Nonlinear Variational Problems. Springer Series in Computational Physics. Springer, Berlin (1984) 6. Glowinski, R., Lions, J.L., Tremolieres, R.: Numerical Analysis of Variational Inequalities. North-Holland, Amsterdam (1981)
4 Analysis
83
7. Nečas, J.: Direct Methods in the Theory of Elliptic Equations. Springer, Berlin (2012) 8. Payne, L.E., Weinberger, H.F.: An optimal Poincaré inequality for convex domains. Arch. Rat. Mech. Anal. 5, 286–292 (1960) 9. Nazarov, A.I., Repin, S.I.: Exact constants in Poincaré type inequalities for functions with zero mean boundary traces. Math. Methods Appl. Sci. 38, 3195–3207 (2015) 10. Boulkhemair, A., Chakib, A.: On the uniform Poincaré inequality. Commun. Partial Differ. Equ. 32, 1439–1447 (2007) 11. Nitsche, J.A.: On Korn’s second inequality. ESAIM Math. Model. Numer. Anal. 15(3), 237–248 (1981) 12. Stampacchia, G.: Formes bilineaires coercitives sur les ensembles convexes. C. r. hebd. séances Acad. sci. 258, 4413–4416 (1964)
Part II Optimal QP and QCQP Algorithms
Chapter 5
Conjugate Gradients Zdeněk Dostál
We begin our development of scalable algorithms for contact problems by the description of the conjugate gradient method for solving .
min f (x),
x∈Rn
(5.1)
where .f (x) = 12 xT Ax − xT b, .b is a given column n-vector, and .A is an .n × n SPD matrix. We are interested especially in the problems with n large and .A sparse and reasonably conditioned. As we have already seen in Sect. 3.2.2, problem (5.1) is equivalent to the solution of a system of linear equations .Ax = b, but our main goal here is the solution of auxiliary minimization problems generated by the algorithms for solving constrained QP problems arising from the discretization of contact problems. Here, we view the conjugate gradient (CG) method as an iterative method, which generates improving approximations to the solution of (5.1) at each step. The cost of one step of the CG method is dominated by the cost of the multiplication of a vector by the matrix .A, which is proportional to the number of nonzero entries of .A or its sparse representation. The memory requirements are typically dominated by the cost of the storage of .A. In our applications, the matrices come in the form of a product of sparse matrices so that the cost of matrix-vector multiplications and storage requirements increases proportionally to the number of unknown variables. The rate of convergence of the CG method depends on the distribution of the spectrum of .A and can be improved by a problem-dependent preconditioning. For the development of scalable algorithms for contact problems, it is important that there are well-established standard preconditioners for the solution of the problems arising from the application of domain decomposiZ. Dostál Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_5
87
88
Zdeněk Dostál
tion methods to the problems of elasticity that can reduce the conditioning of the Hessian matrix of the discretized elastic energy function.
5.1 First Observations The conjugate gradient method is based on simple observations. Let us start with examining the first one, namely, that it is possible to reduce the solution of (5.1) to the solution of a sequence of one-dimensional problems. Let .A ∈ Rn×n be an SPD matrix, and let us assume that there are nonzero n-vectors .p1 , . . . , pn such that (pi , pj )A = (pi )T Apj = 0 for i = j.
.
We call such vectors .A-conjugate or briefly conjugate. Specializing the arguments of Sect. 2.7, we get that .p1 , . . . , pn are independent. Thus, .p1 , . . . , pn form the basis of .Rn , and any .x ∈ Rn can be written in the form .
x = ξ1 p 1 + · · · + ξ n p n .
Substituting into f and using the conjugacy result in 1 2 1 T 1 1 2 n T n T 1 T n ξ (p ) Ap − ξ1 b p + · · · + ξ (p ) Ap − ξn b p f (x) = 2 1 2 n . =f (ξ1 p1 ) + · · · + f (ξn pn ). Thus, f ( x) = minn f (x) = min f (ξ1 p1 ) + · · · + min f (ξn pn ).
.
x∈R
ξ1 ∈R
ξn ∈R
We have thus decomposed the problem (5.1) into n one-dimensional problems. Since df ξpi . = ξi (pi )T Api − bT pi = 0, dξ ξi of (5.1) is given by the solution .x .
= ξ1 p 1 + · · · + ξ n p n , x
ξi = bT pi /(pi )T Api , i = 1, . . . , n.
(5.2)
The second observation concerns effective generation of a conjugate basis. Let us recall how to generate conjugate directions with the Gram-Schmidt procedure. Assuming that .p1 , . . . , pk are nonzero conjugate directions, .1 ≤ / Span{p1 , . . . , pk } to generate a new k < n, let us examine how to use .hk ∈ k+1 in the form member .p
5 Conjugate Gradients
89 .
pk+1 = hk + βk1 p1 + · · · + βkk pk .
(5.3)
Since .pk+1 should be conjugate to .p1 , . . . , pk , we get 0 =(pi )T Apk+1 = (pi )T Ahk + βk1 (pi )T Ap1 + · · · + βkk (pi )T Apk .
=(pi )T Ahk + βki (pi )T Api ,
i = 1, . . . , k.
Thus, .
βki = −
(pi )T Ahk , i = 1, . . . , k. (pi )T Api
(5.4)
Obviously, Span{p1 , . . . , pk+1 } = Span{p1 , . . . , pk , hk }.
.
Therefore, given any independent vectors .h0 , . . . , hk−1 , we can start from 1 0 .p = h and use (5.3) and (5.4) to construct a set of mutually .A-conjugate directions .p1 , . . . , pk such that Span{h0 , . . . , hi−1 } = Span{p1 , . . . , pi }, i = 1, . . . , k.
.
For .h0 , . . . , hk−1 arbitrary, the construction is increasingly expensive as it requires both the storage for the vectors .p1 , . . . , pk and heavy calculations including evaluation of .k(k + 1)/2 scalar products. However, it turns out that we can adapt the procedure so that it generates very efficiently the conjugate basis of the Krylov spaces Kk = Kk (A, g0 ) = Span{g0 , Ag0 , . . . , Ak−1 g0 }, k = 1, . . . , n,
.
with .g0 = Ax0 − b defined by a suitable initial vector .x0 and .K0 = {o}. The powerful method is again based on a few simple observations. First assume that . p1 , . . . , pi form a conjugate basis of .Ki , .i = 1, . . . , k, and observe that if .xk denotes the minimizer of f on .x0 + Kk , then by Proposition 3.6(i), the gradient .gk = ∇f (xk ) is orthogonal to the Krylov space .Kk , that is, .
(gk )T x = 0 for any x ∈ Kk .
In particular, if .gk = o, then .
gk ∈ / Kk .
Since .gk ∈ Kk+1 , we can use (5.3) with .hk = gk to expand any conjugate basis of .Kk to the conjugate basis of .Kk+1 . Obviously, Kk (A, g0 ) = Span{g0 , . . . , gk−1 }.
.
90
Zdeněk Dostál
Next observe that for any .x ∈ Kk−1 and .k ≥ 1, .
Ax ∈ Kk
or briefly .AKk−1 ⊆ Kk . Since .pi ∈ Ki ⊆ Kk−1 , i = 1, . . . , k − 1, we have .
(Api )T gk = (pi )T Agk = 0, i = 1, . . . , k − 1.
It follows that β
. ki
=−
(pi )T Agk = 0, i = 1, . . . , k − 1. (pi )T Api
Summing up, if we have a set of such conjugate vectors .p1 , . . . , pk that .
Span{p1 , . . . , pi } = Ki , i = 1, . . . k,
then the formula (5.3) applied to .p1 , . . . , pk and .hk = gk simplifies to .
pk+1 = gk + βk pk
(5.5)
with .
βk = βkk = −
(pk )T Agk . (pk )T Apk
(5.6)
Finally, observe that the orthogonality of .gk to .Span{p1 , . . . , pk } and (5.5) imply that .
pk+1 ≥ gk .
(5.7)
In particular, if .gk−1 = o, then .pk = o, so (5.6) is well defined provided k−1 .g = o.
5.2 Conjugate Gradient Method In the previous two sections, we have found that the conjugate directions can be used to reduce the minimization of any convex quadratic function to the solution of a sequence of one-dimensional problems and that the conjugate directions can be generated very efficiently. The famous conjugate gradient (CG) method just puts these two observations together. The algorithm starts from an initial guess .x0 , .g0 = Ax0 − b, and .p1 = g0 . If .xk−1 and .gk−1 are given, .k ≥ 1, it first checks if .xk−1 is the solution. If
5 Conjugate Gradients
91
not, then the algorithm generates .
xk = xk−1 − αk pk with αk = (gk−1 )T pk /(pk )T Apk
(5.8)
and gk = Axk − b = A xk−1 − αk pk − b = Axk−1 − b − αk Apk .
= gk−1 − αk Apk .
(5.9)
Finally, the new conjugate direction .pk+1 is generated by (5.5) and (5.6). The decision if .xk−1 is an acceptable solution is typically based on the value of .gk−1 , so the norm of the gradient must be evaluated at each step. It turns out that the norm can also be used to replace the scalar products involving the gradient in the definition of .αk and .βk . To find the formulae, let us replace k in (5.5) by .k − 1 and multiply the resulting identity by .(gk−1 )T . Using the orthogonality, we get .
(gk−1 )T pk = gk−1 2 + βk−1 (gk−1 )T pk−1 = gk−1 2 ,
(5.10)
so by (5.8), .
αk =
gk−1 2 . (pk )T Apk
(5.11)
To find an alternative formula for .βk , notice that .αk > 0 for .gk−1 = o and that by (5.9), .
Apk =
1 k−1 (g − gk ), αk
so that αk (gk )T Apk = (gk )T (gk−1 − gk ) = −gk 2
.
and .
βk = −
(pk )T Agk gk 2 gk 2 = = . (pk )T Apk αk (pk )T Apk gk−1 2
The complete CG method is presented as Algorithm 5.1.
(5.12)
92
Zdeněk Dostál
Algorithm 5.1 Conjugate gradient method (CG)
Each step of the CG method can be implemented with just one matrixvector multiplication. This multiplication by the Hessian matrix .A typically dominates the cost of the step. Only one generation of vectors .xk , pk , and .gk is typically stored, so the memory requirements are modest. Let us recall that the algorithm finds at each step the minimizer .xk of f on .x0 + Kk = x0 + Kk (A, g0 ) and expands the conjugate basis of .Kk to that of .Kk+1 provided .gk = o. Since the dimension of .Kk is less than or equal to k, it follows that for some .k ≤ n, .
Kk = Kk+1 .
Since .gk ∈ Kk+1 and .gk is orthogonal to .Kk , Algorithm 5.1 implemented in of (5.1) in at most n steps. We can the exact arithmetics finds the solution .x sum up the most important properties of Algorithm 5.1 into the following theorem. Theorem 5.1 Let .{xk } be generated by Algorithm 5.1 to find the solution of (5.1) starting from .x0 ∈ Rn . Then the algorithm is well defined, and .x . Moreover, the following statements hold for there is .k ≤ n such that .xk = x .i = 1, . . . , k:
It is usually sufficient to find .xk such that .gk is small. For example, given a small .ε > 0, we can consider .gk small if .
gk ≤ εb.
5 Conjugate Gradients
93
= xk is an approximate solution which satisfies Then .x ) ≤ εb, A( x−x
.
≤ ελmin (A)−1 , x−x
where .λmin (A) denotes the least eigenvalue of .A. It is easy to check that the solves the perturbed problem approximate solution .x .
min f(x),
x∈Rn
1 T x, f(x) = xT Ax − b 2
= b + gk . b
What is “small” depends on the problem solved. To keep our exposition general, we shall often not specify the test in what follows. Of course, .gk = o is always considered small.
5.3 Rate of Convergence of (5.1) in a number of Although the CG method finds the exact solution .x steps which does not exceed the dimension of the problem by Theorem 5.1, it turns out that it can often produce a sufficiently accurate approximation of .x in a smaller number of steps than required for exact termination. .x This observation suggests that the CG method may also be considered as an iterative method. Let us denote the solution error as .
e = e(x) = x − x
and observe that .
g( x) = A x − b = o.
It follows that ) = Aek , gk = Axk − b = Axk − A x = A(xk − x
.
so in particular Kk (A, g0 ) = Span{g0 , Ag0 , . . . , Ak−1 g0 } = Span{Ae0 , . . . , Ak e0 }.
.
We start our analysis of the solution error by using the Taylor expansion (3.4) to obtain the identity )) − f ( f (x) − f ( x) =f ( x + (x − x x) .
1 2A − f ( ) + x − x x) =f ( x) + g( x)T (x − x 2 1 1 2A = e2A . = x − x 2 2
94
Zdeněk Dostál
Combining the latter identity with Theorem 5.1, we get ek 2A =2 f (xk ) − f ( x) = min 2 (f (x) − f ( x)) x∈x0 +Kk (A,g0 )
.
=
min
x∈x0 +Kk (A,g0 )
2A = x − x
min
x∈x0 +Kk (A,g0 )
e(x)2A .
Since any .x ∈ x0 + Kk (A, g0 ) may be written in the form x = x0 + ξ1 g0 + ξ2 Ag0 + · · · + ξk Ak−1 g0 = x0 + ξ1 Ae0 + · · · + ξk Ak e0 ,
.
it follows that = e0 + ξ1 Ae0 + · · · + ξk Ak e0 = p(A)e0 , x−x
.
where p denotes the polynomial defined for any .x ∈ R by .
p(x) = 1 + ξ1 x + ξ2 x2 + · · · + ξk xk .
Thus, denoting by .P k the set of all k th-degree polynomials p which satisfy .p(0) = 1, we have .
ek 2A =
min
x∈x0 +Kk (A,g0 )
e(x)2A = min p(A)e0 2A .
(5.13)
p∈P k
We shall now derive a bound on the expression on the right-hand side of (5.13) that depends on the spectrum of .A but is independent of the initial error .e0 . Let a spectral decomposition of .A be written as .A = UDUT , where .U is an orthogonal matrix and .D = diag.(λ1 , . . . , λn ) is a diagonal matrix defined by the eigenvalues of .A. Since .A is assumed to be positive definite, the square root of .A is well defined by 1
.
1
A 2 = UD 2 UT .
Using .p(A) = Up(D)UT , it is also easy to check that 1
.
1
A 2 p(A) = p(A)A 2 .
Moreover, for any vector .v ∈ Rn , 1
1
1
1
1
v2A = vT Av = vT A 2 A 2 v = (A 2 v)T A 2 v = A 2 v2 .
.
Using the latter identities (5.13) and the properties of norms, we get 1
1
ek 2A = min p(A)e0 2A = min A 2 p(A)e0 2 = min p(A)A 2 e0 2 p∈P k
.
p∈P k
1
≤ min p(A)2 A 2 e0 2 = min p(D)2 e0 2A . p∈P k
Since
p∈P k
p∈P k
5 Conjugate Gradients
95 .
p(D) =
max
i∈{1,...,n}
|p(λi )|,
we can write .
ek A ≤ min
max
p∈P k i∈{1,...,n}
|p(λi )| e0 A .
(5.14)
The estimate (5.14) reduces the analysis of convergence of the CG method to the analysis of approximation of zero function on .σ(A) of .A by a k th-degree polynomial with the value one at origin. For example, if .σ(A) is clustered around a single point .ξ, then the minimization by the CG should be very effective because .|(1 − x/ξ)k | is small near .ξ. We shall use (5.14) to get a “global” estimate of the rate of convergence of the CG method in terms of the condition number of .A. Theorem 5.2 Let .{xk } be generated by Algorithm 5.1 to find the solution .x of (5.1) starting from .x0 ∈ Rn . Then the error ek = xk − x
.
satisfies k
κ(A) − 1 . e A ≤ 2
e0 A , κ(A) + 1 k
(5.15)
where .κ(A) denotes the spectral condition number of .A. Proof See, e.g., Axelsson [1] or Dostál [2].
.
The estimate (5.15) can be improved for some special distributions of the eigenvalues. For example, if .σ(A) is in a positive interval .[amin , amax ] except for m isolated eigenvalues .λ1 , . . . , λm , then we can use special polynomials k+m .p ∈ P of the form λ λ .p(λ) = 1− ... 1 − q(λ), q ∈ P k λ1 λm to get the estimate
.
e
k+m
√ k κ −1 A ≤ 2 √ e0 A , κ +1
(5.16)
where .κ = amax /amin . If the spectrum of .A satisfies .σ(A) ⊆ [amin , amax ] ∪ [amin + d, amax + d], .d > 0, then .
ek A ≤ 2
√ k κ−1 √ e0 A , κ+1
(5.17)
96
Zdeněk Dostál
where .κ = 4amax /amin approximates the effective condition number of .A. The proofs of the above bounds can be found in Axelsson [3] and Axelsson and Lindskøg [4].
5.4 Preconditioned Conjugate Gradients The analysis of the previous section shows that the rate of convergence of the CG algorithm depends on the distribution of the eigenvalues of the Hessian .A of f. In particular, we argued that CG converges very rapidly if the eigenvalues of .A are clustered around one point, i.e., if the condition number .κ(A) is close to one. We shall now show that we can reduce our minimization problem to this favorable case if we have a symmetric positive definite matrix .M such that .M−1 x can be easily evaluated for any .x and .M approximates .A in the sense that .M−1 A is close to the identity. First assume that .M is available in the form .
L LT , M=
so that .M−1 A is similar to . L−T and the latter matrix is close to the L−1 A identity. Then f (x) =
.
1 T T −1 −T T (L x) (L AL )(L x) − ( L−1 b)T ( LT x) 2
and we can replace our original problem (5.1) by the preconditioned problem to find .
min f¯(y),
(5.18)
y∈Rn
where we substituted .y = LT x and set 1 L−1 A L−T )y − ( L−1 b)T y. f¯(y) = yT ( 2
.
of the preconditioned problem (5.18) is related to the solution The solution .y of the original problem by x
.
= . x L−T y
.
If the CG algorithm is applied directly to the preconditioned problem (5.18) with a given .y0 , then the algorithm is initialized by y0 = LT x0 ,
.
¯0 = g L−1 A L−T y0 − L−1 b = L−1 g0 ,
the iterates are defined by
¯1 = g ¯0; and p
5 Conjugate Gradients
97
¯k, α ¯ k =¯ gk−1 2 /(¯ p k )T L−T p L−1 A
.
¯k, yk =yk−1 − α ¯k p ¯k, ¯ k =¯ gk−1 − α ¯k L−T p L−1 A g β¯k =¯ gk 2 /¯ gk−1 2 , ¯k. ¯ k+1 =¯ p gk + β¯k p
Substituting yk = LT xk ,
.
¯k = g L−1 gk ,
and
¯k = p L T pk ,
and denoting zk = L−T L−1 gk = M−1 gk ,
.
we obtain the preconditioned conjugate gradient algorithm (PCG) in the original variables. Algorithm 5.2 Preconditioned conjugate gradient method (PCG)
Notice that the PCG algorithm does not explicitly exploit the Cholesky factorization of the preconditioner .M. The pseudoresiduals .zk are typically obtained by solving .Mzk = gk . If .M is a good approximation of .A, then .zk is close to the error vector .ek . The rate of convergence of the PCG algorithm depends on the condition number of the Hessian of the transformed function −1 .f¯, i.e., on .κ(M A) = κ( L−1 A L−T ). Thus, the efficiency of the preconditioned CG method depends critically on the choice of a preconditioner, which should balance the cost of its application with the preconditioning effect. We refer interested readers to specialized books like Axelsson [1] or Saad [5] for more information. In our applications, we use the standard FETI preconditioners introduced by Farhat et al. [6].
98
Zdeněk Dostál
5.5 Convergence in the Presence of Rounding Errors The elegant mathematical theory presented above assumes implementation of the CG algorithm in exact arithmetic and captures well the performance of only a limited number of CG iterations in computer arithmetics. Since we use the CG method mainly for a low-precision approximation of wellconditioned auxiliary problems, we shall base our exposition on this theory in what follows. However, it is still useful to be aware of possible effects of rounding errors that accompany any computer implementation of the CG algorithm, especially for the solution of very large problems. It has been known since the introduction of the CG method that, when used in finite precision arithmetic, the vectors generated by these algorithms can seriously violate their theoretical properties. In particular, it has been observed that the evaluated gradients can lose their orthogonality after as small a number of iterations as 20 and that nearly dependent conjugate directions can be generated. In spite of these effects, it has been observed that the CG method still converges in finite precision arithmetic, but that the convergence is delayed [7, 8]. Undesirable effects of the rounding errors can be reduced by reorthogonalization. A simple analysis reveals that the full reorthogonalization of the gradients is costly and requires large memory. A key to an efficient implementation of the reorthogonalization is based on observation that accumulation of the rounding errors has a regular pattern, namely, that large perturbations of the generated vectors belong to the space generated by the eigenvectors of .A which can be approximated well by the vectors from the current Krylov space. This has led to the efficient implementation of the CG method based on the selective orthogonalization proposed by Parlett and Scott [9]. More details and information about the effects of rounding errors and implementation of the CG method in finite arithmetic can be found in the comprehensive review paper by Meurant and Strakoš [10].
5.6 Comments Since its introduction in the early 1950s by Hestenes and Stiefel [11], a lot of research related to the development of the CG method has been carried out, so that there are many references concerning this subject. We refer an interested reader to the textbooks and research monographs by Saad [5], van der Vorst [12], Greenbaum [13], and Axelsson [1] for more information. A comprehensive account of development of the CG method up to 1989 may be found in the paper by Golub and O’Leary [14]. Most of the research is concentrated on the development and analysis of preconditioners. Finding at each step the minimum over the subspace generated by all the previous search directions, the CG method exploits all information gathered
5 Conjugate Gradients
99
during the previous iterations. To use this feature in the algorithms for the solution of constrained problems, it is important to generate long uninterrupted sequences of the CG iterations. This strategy also supports exploitation of yet another unique feature of the CG method, namely, its self-preconditioning capabilities that were described by van der Sluis and van der Vorst [15]. The latter property can also be described in terms of the preconditioning by the conjugate projector. Indeed, if .Qk denotes the conjugate projector onto the conjugate complement V of .U = Span{p1 , . . . , pk }, then it is possible to give the bound on the rate of convergence of the CG method starting from T .xk+1 in terms of the spectral regular condition number .κk = κ(Qk AQk |V ) of T .Qk AQk |V and observe that .κk decreases with the increasing k. Recall that the spectral regular condition number .κ(A) of an SPS matrix .A is defined by .
κ(A) =
A , λmin (A)
where .λmin (A) denotes the least nonzero eigenvalue of .A.
References 1. Axelsson, O.: Iterative Solution Methods. Cambridge University Press, Cambridge (1994) 2. Dostál, Z.: Optimal Quadratic Programming Algorithms, with Applications to Variational Inequalities, 1st edn. Springer, New York (2009) 3. Axelsson, O.: A class of iterative methods for finite element equations. Comput. Methods Appl. Mech. Eng. 9, 127–137 (1976) 4. Axelsson, O., Lindskøg, G.: On the rate of convergence of the preconditioned conjugate gradient method. Numer. Math. 48, 499–523 (1986) 5. Saad, Y.: Iterative Methods for Large Linear Systems. SIAM, Philadelphia (2002) 6. Farhat, C., Mandel, J., Roux, F.-X.: Optimal convergence properties of the FETI domain decomposition method. Comput. Methods Appl. Mech. Eng. 115, 365–385 (1994) 7. Greenbaum, A.: Behavior of slightly perturbed Lanczos and conjugategradient recurrences. Linear Algebr. Appl. 113, 7–63 (1989) 8. Greenbaum, A., Strakoš, Z.: Predicting the Behavior of Finite Precision Lanczos and Conjugate Gradient Computations. SIAM J. Matrix Anal. Appl. 13, 121–137 (1992) 9. Parlett, B.N., Scott, D.S.: The Lanczos algorithm with selective orthogonalization. Math. Comput. 33, 217–238 (1979) 10. Meurant, G., Strakoš, Z.: The Lanczos and conjugate gradient algorithms in finite precision arithmetic. Acta Numer. 471–542 (2006)
100
Zdeněk Dostál
11. Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bureau Stand. 49, 409–436 (1952) 12. van der Vorst, H.: Iterative Krylov Methods for Large Linear Systems. Cambridge University Press, Cambridge (2003) 13. Greenbaum, A.: Iterative Methods for Solving Linear Systems. SIAM, Philadelphia (1997) 14. Golub, G.H., O’Leary, D.P.: Some history of the conjugate gradient and Lanczos methods. SIAM Review 31, 50–102 (1989) 15. van der Sluis, A., van der Vorst, H.A.: The rate of convergence of the conjugate gradients. Numer. Math. 48, 543–560 (1986)
Chapter 6
Gradient Projection for Separable Convex Sets Zdeněk Dostál
An important ingredient of our algorithms for solving QP and QCQP problems is the Euclidean projection on the convex set defined by separable convex constraints. To combine the gradient projection with the CG method effectively, it is necessary to have nontrivial bounds on the decrease of f along the projected gradient path in terms of bounds on the spectrum of its Hessian matrix .A. While such results are standard for the solution of unconstrained quadratic programming problems [1, 2], it seems that until recently there were no such results for inequality constrained problems. The standard results either provide the bounds on the contraction of the gradient projection [3] in the Euclidean norm or guarantee only some qualitative properties of convergence (see, e.g., Luo and Tseng [4]). Here, we present the results concerning the decrease of f along the projected gradient path in the extent that is necessary for the development of scalable algorithms for contact problems.
6.1 Separable Convex Constraints and Projections Our goal is to get insight into the effect of the projected gradient step for the problem to find .
min f (x), x∈Ω
(6.1)
where .f = 12 xT Ax − xT b, .A ∈ Rn×n denotes an SPS matrix, .b, x ∈ Rn ,
Z. Dostál Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_6
101
102
Zdeněk Dostál
x = [xT1 , . . . , xTs ]T ,
xi = [xi1 , . . . , xii ],
.
1 + · · · + s = n,
and Ω = Ω1 × · · · × Ωs
.
denotes a closed convex set defined by separable constraints .hi : Ri → R, .i = 1, . . . , s, i.e., Ωi = {xi ∈ Ri : hi (xi ) ≤ 0}, i = 1, . . . , s.
.
We assume that .b has the same block structure as .x, i.e., b = [bT1 , . . . , bTs ]T ,
.
bi ∈ Ri .
We are especially interested in the problems defined by the separable elliptic constraints .
hi (xi ) = (xi − yi )T Hi (xi − yi ) − di ,
xi , yi ∈ R2 ,
di > 0,
Hi SPD, (6.2)
that appear in the dual formulation of contact problems with Tresca friction or the bound constraints h (xi ) = i − xi ,
. i
i , xi ∈ R,
arising in the dual formulation of non-penetration conditions. In what follows, we denote by .PΩ the Euclidean projection to .Ω, so that PΩ (x) = arg min x − y.
.
y∈Ω
Since the constraints that define .Ω are separable, we can define .PΩ block-wise by .
PΩi (xi ) = arg min xi − y, y∈Ωi
T PΩ (x) = PΩ1 (x1 )T , . . . , PΩs (xs )T . (6.3)
The action of .PΩ is especially easy to calculate for the spherical or bound constraints. As illustrated by Fig. 6.1, the components of the projection .PΩB (x) of .x onto ΩB = {x ∈ Rn : xi ≥ i , i = 1, . . . , n}
.
are given by [PΩB (x)]i = max{i , xi }, i = 1, . . . , n.
.
6 Gradient Projection for Separable Convex Sets
103
Fig. 6.1 Euclidean projections onto convex sets
The gradient projection is an important ingredient of the minimization algorithms. A typical step of the gradient projection method is in Fig. 6.2.
Fig. 6.2 Gradient projection step
6.2 Conjugate Gradient Step Versus Gradient Projections Since the conjugate gradient is the best decrease direction which can be used to find the minimizer in a current Krylov space by Theorem 5.1, probably the first idea how to plug the projection into CG-based algorithms for (6.1) is to replace the conjugate gradient step by the projected conjugate gradient step xk+1 = PΩ (xk − αcg pk ).
.
However, if we examine Fig. 6.3, which depicts the 2D situation after the first conjugate gradient step for a bound constrained problem, we can see that though the second conjugate gradient step finds the unconstrained minimizer k k .x − αcg p , it can easily happen that f (xk ) < f (PΩ (xk − αcg pk )).
.
104
Zdeněk Dostál
Fig. 6.3 Poor performance of the projected conjugate gradient step
Fig. 6.3 even suggests that it is possible that for any .α > αf , k k k k .f (PΩ (x − αp )) > f PΩ (x − αf p ) . Though Fig. 6.3 need not capture the typical situation when a small number of components of .xk − αf pk are affected by .PΩ , we conclude that the nice properties of conjugate directions are guaranteed only in the feasible region. These observations comply with our discussion at the end of Sect. 5.3. On the other hand, since the gradient defines the direction of the steepest descent, it is natural to assume that for a small step length, the gradient perturbed by the projection .PΩ defines a decrease direction as in Fig. 6.4. We shall prove a quantitative refinement of this conjecture. In what follows, we restrict our attention to the analysis of the fixed step length gradient iteration .
xk+1 = PΩ (xk − αgk ),
where .gk = ∇f (xk ).
Fig. 6.4 Fixed step length gradient step
(6.4)
6 Gradient Projection for Separable Convex Sets
105
Which values of .α guarantee that the iterates defined by the fixed gradient in the Euclidean norm? projection step (6.4) approach the solution .x Proposition 6.1 Let .Ω denote a closed convex set, .x ∈ Ω, and .g = ∇f (x). Then for any .α > 0, .
≤ ηE x − x , PΩ (x − αg) − x
(6.5)
where .λmin , λmax are the extreme eigenvalues of .A and .
ηE = max{|1 − αλmin |, |1 − αλmax |}.
(6.6)
∈ Ω and the projected gradient at the solution satisfies .g P = o, Proof Since .x it follows that . PΩ ( x − α g) = x
.
Using that the projection .PΩ is nonexpansive by Corollary 3.1, the definition of gradient, the relations between the norm of a symmetric matrix and its spectrum (2.32), and the observation that if .λi are the eigenvalues of .A, then .1 − αλi are the eigenvalues of .I − αA (see also (2.34)), we get =PΩ (x − αg) − PΩ ( PΩ (x − αg) − x x − α g) ≤(x − αg) − ( x − α g) ) − α(g − g ) = (x − x ) − αA(x − x ) = (x − x . ) =(I − αA)(x − x . ≤ max{|1 − αλmin |, |1 − αλmax |}x − x
.
We call .ηE the coefficient of Euclidean contraction. If . α ∈ (0, 2A−1 ), then .ηE < 1. Using some elementary arguments, we get that .ηE is minimized by .
opt αE =
2 λmin + λmax
(6.7)
and .
opt ηE =
λmax − λmin κ−1 , = λmax + λmin κ+1
κ=
λmax . λmin
(6.8)
Notice that the estimate (6.5) does not guarantee any bound on the decrease of the cost function. We study this topic in Sect. 6.5.
106
Zdeněk Dostál
6.3 Quadratic Functions with Identity Hessian Which values of .α guarantee that the cost function f decreases in each iterate defined by the fixed gradient projection step (6.4)? How much does f decrease when the answer is positive? To answer these questions, it is useful to carry out some analysis for a special quadratic function .
F (x) =
1 T x x − cT x, 2
x ∈ Rn ,
(6.9)
which is defined by a fixed .c ∈ Rn , c = [ci ]. We shall also use .
F (x) =
n
Fi (xi ),
Fi (xi ) =
i=1
1 2 x − ci xi , 2 i
x = [xi ].
(6.10)
The Hessian and the gradient of F are expressed by .
∇2 F (x) = I
and
g = ∇F (x) = x − c, g = [gi ],
(6.11)
respectively. Thus, .c = x − g, and for any .z ∈ Rn , z − c2 = z2 − 2cT z + c2 = 2F (z) + c2 .
.
Since by Proposition 3.5 for any .z ∈ Ω z − c ≥ PΩ (c) − c,
.
we get that for any .z ∈ Ω, .
2F (z) = z − c2 − c2 ≥ PΩ (c) − c2 − c2 = 2F (PΩ (c)) = 2F (PΩ (x − g)) .
We have thus proved that if .y ∈ Ω, then, as illustrated in Fig. 6.5, . F PΩ (x − g) ≤ F (y).
(6.12)
(6.13)
We are especially interested in the analysis of F along the projected gradient path . p(x, α) = PΩ x − α∇F (x) , where .α ≥ 0 and .x ∈ Ω is fixed. A geometric illustration of the projected gradient path for the feasible set of a bound constrained problem is in Fig. 6.6.
6 Gradient Projection for Separable Convex Sets
107
Fig. 6.5 Minimizer of F in .Ω
Fig. 6.6 Projected gradient path
6.4 Subsymmetric Sets In the analysis of the rate of convergence in Chaps. 7 and 8, we shall use the following generalization of a property of the half-interval (see Dostál [5]). Definition 6.1 A closed convex set .Ω ⊆ Rn is subsymmetric if for any .x ∈ Ω, n .δ ∈ [0, 1], .c ∈ R , and .g = x − c, . F PΩ (x − (2 − δ)g) ≤ F PΩ (x − δg) , (6.14) where F is defined by (6.9). The condition which defines the subsymmetric set is illustrated in Fig. 6.7. Let us show that the half-interval is subsymmetric. Lemma 6.1 Let . ∈ R and .ΩB = [, ∞). Let F and g be defined by F (x) =
.
1 2 x − cx 2
and
g = x − c.
Then for any .δ ∈ [0, 1], . F PΩB (x − (2 − δ)g) ≤ F PΩB (x − δg) .
(6.15)
108
Zdeněk Dostál
Fig. 6.7 The condition which defines a subsymmetric set
Proof First assume that .x ≥ l is fixed and denote g = F (x) = x − c,
.
g(0) = 0,
g(α) = min{(x − )/α, g}, α = 0.
For convenience, let us define F PΩB (x − αg) = F (x) + Φ(α),
.
Φ(α) = −α g (α)g +
α2 2 ( g (α)) , α ≥ 0. 2
Moreover, using these definitions, it can be checked directly that .Φ is defined by ΦF (α) for α ∈ (−∞, ξ] ∩ [0, ∞) or g ≤ 0, .Φ(α) = ΦA (α) for α ∈ [ ξ , ∞) ∩ [0, ∞) and g > 0, where .ξ = ∞ if .g = 0 and .ξ = (x − )/g if .g = 0,
α2 1 .ΦF (α) = −α + g 2 , and ΦA (α) = −g(x − ) + (x − )2 . 2 2 See also Fig. 6.8.
Fig. 6.8 Graphs of .Φ for .ξ < 1 (left) and .ξ > 1 (right) when .g > 0
6 Gradient Projection for Separable Convex Sets
109
It follows that for any .α,
(2 − α)2 . ΦF (2 − α) = −(2 − α) + g 2 = ΦF (α), 2
(6.16)
and if .g ≤ 0, then .
Φ(α) = ΦF (α) = ΦF (2 − α) = Φ(2 − α).
Let us now assume that .g > 0 and denote .ξ = (x − )/g. Simple analysis shows that if .ξ ∈ [0, 1], then .Φ is nonincreasing on [0, 2] and (6.15) is satisfied for .α ∈ [0, 1]. To finish the proof of (6.15), notice that if .1 < ξ, then Φ(α) = ΦF (α),
.
α ∈ [0, 1],
Φ(α) ≤ ΦF (α),
α ∈ [1, 2],
so that we can use (6.16) to get that for .α ∈ [0, 1], Φ(2 − α) ≤ ΦF (2 − α) = ΦF (α) = Φ(α).
.
.
Bouchala and Vodstrčil managed to prove that also the ellipse is a subsymmetric set [6]. Lemma 6.2 Let .Ω ⊂ R2 be defined by an elliptic constraint Ω = {x ∈ R2 : h(x) ≤ 0},
.
h(x) = (x − y)T H(x − y) − d,
y ∈ R2 ,
d > 0,
where .H is SPD. Then .Ω is subsymmetric. Proof Clearly, we can assume that the ellipse .Ω is centered at the origin, i.e., 2 x22 x1 2 x1 .Ω = x= ∈R : 2 + 2 ≤1 , x2 a b where .a, b ∈ R and .a, b > 0. We prove the following three cases: 1. .x + αg ∈ Ω, x − αg ∈ Ω, 2. .x + αg ∈ Ω, x − αg ∈ / Ω, 3. .x + αg ∈ / Ω, x − αg ∈ / Ω. Ad 1. In this case, PΩ (x + αg) − x = αg = PΩ (x − αg) − x.
.
Ad 2. Since .[b cos t, a sin t] is the outer normal vector of the ellipse ∂Ω = {[a cos t, b sin t] ∈ R2 : t ∈ R}
.
at .[a cos t, b sin t], we can write .x + αg and .x − αg in the form
110
Zdeněk Dostál
x + αg = s[a cos t1 , b sin t1 ]T ,
.
x − αg = [a cos t2 , b sin t2 ]T + k[b cos t2 , a sin t2 ]T ,
(6.17)
where .0 ≤ s ≤ 1, k > 0, t1 , t2 ∈ R. It can be verified directly that 2 2 .PΩ (x + αg) − x − PΩ (x − αg) − x = abk 1 − s cos(t2 − t1 ) ≥ 0. Ad 3. Now we can write x + αg = [a cos t1 , b sin t1 ]T + s[b cos t1 , a sin t1 ]T ,
.
x − αg = [a cos t2 , b sin t2 ]T + k[b cos t2 , a sin t2 ]T ,
(6.18)
where .s, k > 0, and we can check that PΩ (x + αg) − x2 − PΩ (x − αg) − x2 = ab(k − s) 1 − cos(t1 − t2 ) .
.
It remains to prove that .k ≥ s. Due to the symmetry of the ellipse, we can assume .0 < t2 ≤ t1 < π. Now we consider such .t0 ∈ (t1 , π] that the line given by two points x + αg = [a cos t1 , b sin t1 ]T + s[b cos t1 , a sin t1 ]T
.
and
[a cos t0 , b sin t0 ]T
is a tangent to the ellipse. It is easy to calculate that ab 1 − cos(t0 − t1 ) ab 1 − cos(t0 − t2 ) .s = , k≥ 2 , b2 cos t0 cos t1 + a2 sin t0 sin t1 b cos t0 cos t2 + a2 sin t0 sin t2 and that the function
ab 1 − cos(t0 − t) .f (t) = b2 cos t0 cos t + a2 sin t0 sin t
has a nonpositive derivative (and therefore f is nonincreasing) on the interval [t2 , t1 ]. This implies .k ≥ s. .
.
Now we are ready to prove the main result of this section. Proposition 6.2 Let .Ω ⊂ Rn be defined as a direct product of ellipses and/or half-spaces. Then .Ω is subsymmetric, i.e., . F PΩ (x − (2 − δ)g) ≤ F PΩ (x − δg) . (6.19) Proof Let .
Ω = Ω1 × · · · × Ωs ,
(6.20)
where .Ωi is either a half-interval or an ellipse. If .s = 1, then the statement reduces to Lemma 6.1 or Lemma 6.2.
6 Gradient Projection for Separable Convex Sets
111
To prove the statement for .s > 1, first observe that for any .y ∈ R, [PΩ (y)]i = PΩi (yi ), i = 1, . . . , s.
.
It follows that .PΩ is separable and can be defined componentwise by the real functions Pi (y) = max{y, i }, i = 1, . . . , n.
.
Using the separable representation of F given by (6.10), we can define a separable representation of F in the form F (x) =
s
.
Fi (xi ), i = 1, . . . , s,
i=1
which complies with (6.20). To complete the proof, it is enough to use this representation and Lemmas 6.1 and 6.2 to get s F PΩ (x − (2 − δ)g) = Fi [PΩ (x − (2 − δ)g)]i i=1
= .
n
Fi PΩi (xi − (2 − δ)gi )
i=1
≤
n
Fi PΩi (xi − δgi )
i=1
=F PΩ (x − δg) .
.
6.5 Dominating Function and Decrease of the Cost Function Now we are ready to give an estimate of the decrease of the cost function f (x) =
.
1 T x Ax − bT x 2
along the projected gradient path. The idea of the proof is to replace f by a suitable quadratic function F which dominates f and has the Hessian equal to the identity matrix. Let us assume that .Ω is convex, .0 < δA ≤ 1, and let .x ∈ Ω be arbitrary but fixed, so that we can define a quadratic function
112
Zdeněk Dostál .
1 Fδ (y) = δf (y) + (y − x)T (I − δA)(y − x), 2
y ∈ Rn .
(6.21)
∇2 Fδ (y) = I.
(6.22)
It is defined so that .
∇Fδ (x) = δ∇f (x) = δg,
Fδ (x) = δf (x),
and
Moreover, for any .y ∈ Rn , δf (y) ≤ Fδ (y).
(6.23)
δf (PΩ (x − δg)) − δf ( x) ≤ Fδ (PΩ (x − δg)) − δf ( x)
(6.24)
.
It follows that .
and .
∇Fδ (y) = δ∇f (y) + (I − δA)(y − x) = y − (x − δg).
(6.25)
Using (6.13) and (6.22), we get that for any .z ∈ Ω, .
Fδ (PΩ (x − δg)) ≤ Fδ (z).
(6.26)
The following lemma is due to Schöberl [7, 8]. Lemma 6.3 Let .Ω be a closed convex set, let .λmin denote the smallest eigenvalue of .A, .g = ∇f (x), .x ∈ Ω, .δ ∈ (0, A−1 ], and let = arg min f (x) x
.
x∈Ω
denote a unique solution of (8.1). Then .
Fδ (PΩ (x − δg)) − δf ( x) ≤ δ(1 − δλmin ) (f (x) − f ( x)) .
Proof Let us denote [ x, x] = Conv{ x, x}
.
and
− x. d=x
Using (6.26), [ x, x] = {x + td : t ∈ [0, 1]} ⊆ Ω,
.
0 < λmin δ ≤ Aδ ≤ 1, and .λmin d2 ≤ dT Ad, we get
.
Fδ (PΩ (x − δg)) −δf ( x) = min{Fδ (y) − δf ( x) : y ∈ Ω}
.
x) : y ∈ [ x, x]} ≤ min{Fδ (y) − δf ( = min{Fδ (x + td) − δf (x + d) : t ∈ [0, 1]}
(6.27)
6 Gradient Projection for Separable Convex Sets
= min{δtdT g +
113
t2 δ d2 − δdT g − dT Ad : t ∈ [0, 1]} 2 2
1 δ ≤δ 2 λmin dT g + δ 2 λ2min d2 − δdT g − dT Ad 2 2 1 δ ≤δ 2 λmin dT g + δ 2 λmin dT Ad − δdT g − dT Ad 2 2 1 =δ(δλmin − 1)(dT g + dT Ad) 2 =δ(δλmin − 1) (f (x + d) − f (x)) x)) . =δ(1 − δλmin ) (f (x) − f (
.
Now we are ready to formulate and prove the main result of this section. deProposition 6.3 Let .Ω be a product of ellipses and/or half-spaces, let .x note the unique solution of (6.1), .g = ∇f (x), .x ∈ Ω, and let .λmin denote the smallest eigenvalue of .A. If . α ∈ (0, 2A−1 ], then .
f (PΩ (x − αg)) − f ( x) ≤ η (f (x) − f ( x)) ,
(6.28)
η = η(α) = 1 − α λmin
(6.29)
where .
is the cost function reduction coefficient and .α = min{α, 2A−1 − α}. Proof Let us first assume that .0 < αA ≤ 1, and let .x ∈ Ω be arbitrary but fixed, so that we can use Lemma 6.3 with .δ = α to get .
Fα (PΩ (x − αg)) − αf ( x) ≤ α(1 − αλmin ) (f (x) − f ( x)) .
(6.30)
In combination with (6.24), this proves (6.28) for .0 < α ≤ A−1 . To prove the statement for .α ∈ (A−1 , 2A−1 ], let us first assume that .A = 1, and let .α = 2 − δ, .δ ∈ (0, 1). Then .F1 dominates f and .
δF1 (y) ≤ δF1 (y) +
1−δ y − x2 = Fδ (y). 2
(6.31)
Thus, we can apply (6.23), Proposition 6.2, and the latter inequality to get δf PΩ (x − αg) ≤δF1 PΩ (x − αg) ≤ δF1 PΩ (x − δg) . ≤Fδ PΩ (x − δg) . Combining the latter inequalities with (6.30) for .α = δ, we get
114
Zdeněk Dostál
δf PΩ (x − αg) − δf ( x) ≤ δ(1 − δλmin ) (f (x) − f ( x) .
.
This proves the statement for .α ∈ (A−1 , 2A−1 ) and .A = 1. To finish the proof, apply the last inequality divided by .δ to the function .A−1 f , and . recall that f and .PΩ are continuous. The estimate (6.28) gives the best value η opt = 1 − κ(A)−1
.
for .α = A−1 with .κ(A) = AA−1 .
6.6 Comments While the contraction properties of the Euclidean projection have been known for a long time (see, e.g., Bertsekas [3]), it seems that the first results concerning the decrease of the cost function along the projected gradient path are due to Schöberl [7–9], who found the bound on the rate of convergence of the cost function in the energy norm for the gradient projection method with the fixed step length .α ∈ (0, A−1 ] in terms of bounds on the spectrum of the Hessian matrix .A. Later, Kučera [10] observed that the arguments provided by Schöberl are valid for any convex set. A successful application of the projections in the energy norm with a longer step length to the solution of contact problems by Farhat and Lesoinne in their FETI-DP code motivated the extension of the estimate for the bound constraints to .α ∈ [0, 2A−1 ] by Dostál [5]. The latter proof used a simple property of interval that was generalized to the concept of subsymmetric set. Bouchala and Vodstrčil in [6] and [11] extended the estimates to the spheres and ellipses by proving that they are subsymmetric. Let us mention that it is known that there are convex sets that are not subsymmetric [11], but no example of a convex set for which the estimates do not hold is known. The estimates of the decrease of quadratic cost function along the projected gradient path are important for the development of effective monotonic algorithms that combine Euclidean projections and conjugate gradients—see Chaps. 7 and 8. Let us recall that the linear rate of convergence of the cost function for the gradient projection method was proved earlier even for more general problems by Luo and Tseng [4], but they did not make any attempt to specify the constants. Very good experimental results were obtained by the so-called fast gra−1 dient methods that use a longer step length .α(xk ) ∈ (λ−1 max , λmin ) defined by various rules. Though these methods can guarantee neither the decrease of the cost function nor a faster rate of convergence, their potential should not be overlooked. The idea of the fast gradient algorithms originates in the pioneering paper by Barzilai and Borwein [12]. For the algorithms and con-
6 Gradient Projection for Separable Convex Sets
115
vergence results concerning bound constrained problems, see, e.g., Birgin et al. [13, 14], Dai and Fletcher [15], and Grippo et al. [16]. For the experimental results obtained by the spectral gradient method with a fall-back, see Pospíšil and Dostál [17].
References 1. Saad, Y.: Iterative Methods for Large Linear Systems. SIAM, Philadelphia (2002) 2. Dostál, Z.: Optimal Quadratic Programming Algorithms, with Applications to Variational Inequalities, 1st edn. Springer, New York (2009) 3. Bertsekas, D.P.: Nonlinear Optimization. Athena Scientific, Belmont (1999) 4. Luo, Z.Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Annal. Oper. Res. 46, 157–178 (1993) 5. Dostál, Z.: On the decrease of a quadratic function along the projectedgradient path. ETNA 31, 25–59 (2008) 6. Bouchala, J., Dostál, Z., Vodstrčil, P.: Separable spherical constraints and the decrease of a quadratic function in the gradient projection. JOTA 157, 132–140 (2013) 7. Schöberl, J.: Solving the Signorini problem on the basis of domain decomposition techniques. Computing 60(4), 323–344 (1998) 8. Dostál, Z., Schöberl, J.: Minimizing quadratic functions over nonnegative cone with the rate of convergence and finite termination. Comput. Optim. Appl. 30(1), 23–44 (2005) 9. Schöberl, J.: Efficient contact solvers based on domain decomposition techniques. Comput. Math. Appl. 42, 1217–1228 (2001) 10. Kučera, R.: Convergence rate of an optimization algorithm for minimizing quadratic functions with separable convex constraints. SIAM J. Optim. 19, 846–862 (2008) 11. Bouchala, J., Dostál, Z., Kozubek, T., Pospíšil, L., Vodstrčil, P.: On the solution of convex QPQC problems with elliptic and other separable constraints. Appl. Math. Comput. 247(15), 848–864 (2014) 12. Barzilai, J., Borwein, J.M.: Two point step size gradient methods. IMA J. Numer. Anal. 8, 141–148 (1988) 13. Birgin, E.G., Martínez, J.M., Raydan, M.: Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim. 10, 1196–1211 (2000) 14. Birgin, E.G., Raydan, M., Martínez, J.M.: Spectral projected gradient methods: review and perspectives. J. Stat. Softw. 60, 3 (2014)
116
Zdeněk Dostál
15. Dai, Y.H., Fletcher, R.: Projected Barzilai–Borwein methods for largescale box-constrained quadratic programming. Numer. Math. 100, 21–47 (2005) 16. Grippo, L., Lampariello, F., Lucidi, S.: A nonmonotone line search technique for Newton’s method. SIAM J. Numer. Anal. 23(4), 707–716 (1986) 17. Pospíšil, L., Dostál, Z.: The projected Barzilai–Borwein method with fall-back for strictly convex QCQP problems with separable constraints. Math Comput Simul 145, 79–89 (2018)
Chapter 7
MPGP for Separable QCQP Zdeněk Dostál
We are concerned with the special QCQP problems to find .
min f (x),
x∈ΩS
(7.1)
where .f (x) = 12 xT Ax − xT b, .b ∈ Rn , .A ∈ Rn×n is SPD, .
ΩS = Ω1 × · · · × Ωs , Ωi = {xi ∈ Rmi : hi (xi ) ≤ 0}, i ∈ S , S = {1, 2, . . . , s},
(7.2)
and .hi are convex functions. We are especially interested in the problems where .Ωi denotes a half-interval, i.e., mi = 1
.
and
hi (xi ) = i − xi , xi , i ∈ R,
or an ellipse, i.e., mi = 2, hi (xi ) = (xi − yi )T Hi (xi − yi ) − ci ,
.
xi , yi ∈ R2 , ci > 0, Hi SP D. To include the possibility that not all components of .x are constrained, we admit .hi (xi ) = −1. Problem (7.1) arises in the solution of transient contact problems or in the inner loop of TFETI-based algorithms for the solution of contact problems with possibly orthotropic friction. Here, we are interested in efficient algorithms for solving problems with large n and sparse and well-conditioned .A. Such algorithms should be able to return an approximate solution at the cost proportional to the dimension n Z. Dostál Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_7
117
118
Zdeněk Dostál
and to recognize an acceptable solution when it is found, as our goal here is also to solve the auxiliary problems generated by the algorithms for solving more general QCQP problems. Our choice is a variant of the active set strategy that we coined MPGP (modified proportioning with gradient projections). The algorithm uses conjugate gradients to solve auxiliary unconstrained problems with the precision controlled by the norm of violation of the Karush-Kuhn-Tucker conditions. The fixed step length gradient projections are used to change the active set.
7.1 Projected Gradient and KKT Conditions By Proposition 3.11, the solution to problem (7.1) always exists and is nec of (7.1) is fully essarily unique. As .ΩS satisfies ACQ, the unique solution .x determined by the KKT conditions (3.39), so that there is .λ ∈ Rs such that .
i + ∇hi ( g xi )λi = o,
hi ( xi )λi = 0,
λi ≥ 0,
hi ( xi ) ≤ 0, i = 1, . . . , s, (7.3)
= g( where we use the notation .g x). To link the violation of the KKT conditions (7.3) to the solution error, we begin with some notations. Let .S denote the set of all indices of the constraints so that S = {1, 2, . . . , s}.
.
For any .x ∈ Rn , we define the active set of .x by A (x) = {i ∈ S : hi (xi ) = 0}.
.
Its complement F (x) = {i ∈ S : hi (xi ) = 0}
.
is called a free set. For .x ∈ ΩS , we define the outer unit normal .n by ∇hi (xi )−1 ∇hi (xi ) for .ni = ni (x) = o for
i ∈ A (x), i ∈ F (x).
The components of the gradient that violate the KKT conditions (7.3) in the free set and the active set are called a free gradient .ϕ and a chopped gradient .β, respectively. They are defined by .
ϕi (x) = gi (x) for i ∈ F (x),
ϕi (x) = o for i ∈ A (x)
(7.4)
7 MPGP for Separable QCQP .
119
β i (x) = gi (x) − (nTi gi )− ni for i ∈ A (x), (7.5)
β i (x) = o for i ∈ F (x),
where we use the notation .gi = gi (x) and .
(nTi gi )− = min{nTi gi , 0}.
Thus, the KKT conditions (7.3) are satisfied if and only if the projected gradient gP (x) = ϕ(x) + β(x)
.
is equal to zero. Notice that .gP is not a continuous function of .x in .ΩS . If .hi (xi ) defines the bound constraint .xi ≥ i , then .ni = −1, and the projected gradient is defined by .
giP = gi−
for
i ∈ A (x),
giP = gi
for
i ∈ F (x).
(7.6)
Since .
giP (x) = gi (x),
i ∈ F (x),
and for any .i ∈ A (x), giP 2 =β i 2 = (gi − {nTi gi }− ni )T (gi − {nTi gi }− ni ) . 2 =gi 2 − {nTi gi }− = giT giP , we have .
gP 2 = gT gP ≤ g gP
(7.7)
gP ≤ g.
(7.8)
and .
We need yet another simple property of the projected gradient. Lemma 7.1 Let .x, y ∈ ΩS and .g = ∇f (x). Then .
gT (y − x) ≥ (gP )T (y − x).
Proof First observe that gT (y − x) = (g − gP )T (y − x) + (gP )T (y − x).
.
Using the definition of projected gradient, we get
(7.9)
120
Zdeněk Dostál
(g − gP )T (y − x) =
.
(gi − giP )T (yi − xi ) =
i∈S
(nTi gi )− nTi (yi − xi ).
i∈A (x)
To finish the proof, it is enough to observe that for .i ∈ A (x), nTi (yi − xi ) ≤ 0
.
due to the convexity of .ΩS .
.
The following lemma can be considered as a quantitative refinement of the KKT conditions. be the solution of (7.1), and let .gP (x) denote the projected Lemma 7.2 Let .x gradient at .x ∈ ΩS . Then . x − x 2A ≤ 2 f (x) − f ( x) ≤ gP (x)2A−1 ≤ λmin (A)−1 gP (x)2 . (7.10) , and .g denote the active set, the free set, and the gradient Proof Let .A, F at the solution, respectively. Observe that if .i ∈ Aand .xi ∈ Ωi , then, using the convexity of .hi , .
∇hi ( xi )
T
i ) ≤ hi (xi ) − hi ( (xi − x xi ) = hi (xi ) ≤ 0.
F = oF that It follows by the KKT conditions (7.3) and .g T (x − x ) = g .
iT (xi − x i ) = g
i∈A
i∈A
T i ) ≥ 0. −λi ∇hi ( xi ) (xi − x (7.11)
Thus, for any .x ∈ ΩS , 1 1 )T A(x − x 2A . T (x − x ) + (x − x ) ≥ x − x f (x) − f ( x) = g 2 2
.
This proves the left inequality of (7.10). To prove the right inequality, we can use Lemma 7.1 and simple manipulations to get for any .x ∈ ΩS 0 ≥2 f ( x) − f (x) = x − x2A + 2gT ( x − x) T ≥ x − x2A + 2 gP ( x − x) . T 1 T y Ay + gP y = −(gP )T A−1 gP . ≥2 minn y∈R 2 The right inequality of (7.10) now follows easily.
.
7 MPGP for Separable QCQP
121
7.2 Reduced Gradients The projected gradient .gP is a natural error measure as it provides information about the cost function in a feasible neighborhood. If applied to a contact problem, its components can be interpreted as unbalanced contact forces. However, experience shows that the projected gradient can be very sensitive to the curvature of the boundary. The sensitivity of the projected gradient to the curvature of the feasible domain is documented in Fig. 7.1 (see [7]), where we can observe stagnation of the norm of the projected gradient in the iterations for the solution of a variant of the two clumped beams’ benchmark with orthotropic friction resolved in Sect. 12.9.2. The feasible region was defined by the elliptic constraints with semi-major and semi-minor axes a and b satisfying .a/b = 10. Let us point out that the active set was well approximated in the early stage of computations and stayed fixed after iteration 154, so the discontinuity of the projected gradient did not cause the stagnation.
Fig. 7.1 Stagnation of g P (xk ) in convergent algorithm
The explanation is in Fig. 7.2, which shows that the projected gradient can be large near the solution that is on a strongly curved part of the boundary. If the curvature of the active constraints near the solution is strong as compared with the norm of the gradient, which is typical for the elliptic constraints near a vertex or for the spherical constraints with a small radius, then it is necessary to use more robust error measures, such as the reduced
α (x), which is defined for any .x ∈ ΩS and .α > 0 by
α = g gradient .g .
α = g
1 (x − PΩS (x − αg)) . α
(7.12)
122
Zdeněk Dostál
n
−gP gP
g
x
x
Ω
Fig. 7.2 Large projected gradient near the solution
α is a continuous function of It is easy to check that the reduced gradient .g x and .α. The following lemma is an alternative to Lemma 7.2.
.
α (x) denote the be the solution of (7.1), and let .g
α = g Lemma 7.3 Let .x reduced gradient at .x ∈ ΩS . Then .
≤ ν(α)
x − x gα ,
(7.13)
where .
ν(α) =
−1
(λmin (A)) for −1 α (2 − αA) for
0 < α ≤ 2(λmin (A) + A)−1 , (7.14) 2(λmin (A) + A)−1 ≤ α < 2A−1 .
Proof Using the properties of norm and Corollary 3.1, we get that for any α > 0,
.
≤PΩS (x − αg) − x + PΩS (x − αg) − x x − x ≤α
gα + PΩS (x − αg) − PΩS ( x − αg( x)) . . ≤α
gα + max{|1 − αλmin (A)|, |1 − αλmax (A)|}x − x
After simple manipulations, we get (7.13).
.
We shall also need the inequalities formulated in the following lemma. Lemma 7.4 Let .x ∈ ΩS , .α > 0, and let us denote .g = g(x) = Ax − b,
α (x), and .gP = gP (x). Then
=g g
. α
.
α ≤ gT gP = gP 2 ≤ g2 .
gα 2 ≤ gT g
Proof Using that for all .x ∈ ΩS and .y ∈ Rn , .
x − PΩS (y)
T
y − PΩS (y) ≤ 0
(7.15)
7 MPGP for Separable QCQP
123
by Proposition 3.5, we get for any .α > 0 and .y = x − αg T T x − αg − (x − α
gα ) ≤ 0. α
gα (−αg + α
gα ) = x − (x − α
gα )
.
After simple manipulations, we get the left inequality of (7.15). To prove the rest, notice that .x − gP is the projection of .x − g to the set ˆ ˆ =Ω ˆ1 × · · · × Ω ˆ s, Ω(x) =Ω
.
where ˆ i (yi ) ≤ 0}, ˆ i = {yi ∈ Ri : h Ω T ˆ . hi (yi ) = (yi − xi ) ∇hi (xi ) ˆ hi (yi ) = −1
for i ∈ A (x), for i ∈ F (x).
Since .x − α
gα is the projection of .x − αg to .ΩS , .x − αgP is the projection ˆ S , and .ΩS ⊆ Ω, ˆ we have .x − α
gα ∈ ΩS and of .x − αg to .Ω α2 g − gP 2 = (x − αg) − (x − αgP )2
.
α 2 . ≤ (x − αg) − (x − α
gα )2 = α2 g − g It follows that .
α +
α . − gT gP = −2gT gP + gP 2 ≤ −2gT g gα 2 ≤ −gT g
The rest follows by (7.7) and (7.8).
.
7.3 Reduced Projected Gradient Though the reduced gradient provides a nice and robust estimate of the solution error, closer analysis reveals that it provides a little information about the local behavior of f in .ΩS . For example, the inequality gT (y − x) ≥ (gP )T (y − x),
.
which is essential in the development of algorithms in Sect. 9.10, does not
α . hold when we replace the projected gradient .gP by the reduced gradient .g P gα (x) for .x ∈ ΩS . Moreover, there is no constant C such that .Cg (x) ≤
α (x) are different from the corThe reason is that the free components of .g responding components of .g(x), which defines the best linear approximation of the change of .f (x) near .x in terms of the variation of free variables. The remedy is the reduced projected gradient α
(x),
P (x) = ϕ(x) + β g
. α
124
Zdeněk Dostál
which is defined for each .x ∈ Ω by the free gradient .ϕ(x) and the reduced
α (x) with the components chopped gradient .β .
ϕi (x) =gi (x) for i ∈ F (x),
α (x) .β i
ϕi (x) = o for i ∈ A (x),
⎧ ⎨ o for i ∈ F (x), T = gi(x) for i ∈ A (x) and ni gi > 0, ⎩1 T α xi − PΩi (xi − αgi ) for i ∈ A (x) and ni gi ≤ 0.
(7.16)
(7.17)
P
α Observe that .g is not a continuous function of .x and the KKT conditions (7.3) are satisfied if and only if the reduced projected gradient is equal to zero. P
α and .g
α Comparing the definitions of .g and using (7.15), we get .
P
gα ≤
gα ≤ gP .
(7.18)
We can also formulate the following lemma. P P
α be the solution of (7.1), and let .g
α Lemma 7.5 Let .x =g (x) denote the reduced projected gradient at .x ∈ ΩS . Then P ≤ ν(α)
x − x gα ≤ ν(α)gP ,
.
(7.19)
where .ν(α) is defined in Lemma 7.3. Proof The inequalities are easy corollaries of Lemma 7.3 and (7.18). Lemma 7.6 Let the set .ΩS of problem (7.1) be defined by the convex func denote the solution tions .hi with continuous second derivatives, and let .x of (7.1). Then there are constants .ε > 0 and .C > 0 such that if .x ∈ Ω, ≤ ε, and .i ∈ S , then .x − x .
. giP (x) ≤ Cx − x
(7.20)
Proof It is enough to prove (7.20) for C dependent on i. Let us first consider the components associated with the free or weakly binding set of the solution Z = {i ∈ S : gi ( x) = o}.
.
Then ) ≤ Ax − x , gi (x) = gi (x) − gi ( x) = Ai∗ (x − x
.
where .Ai∗ denotes the block row of .A which corresponds to .gi , .i ∈ Z . Observing that .giP (x) ≤ gi (x) by (7.7), we get , giP (x) ≤ Cx − x
.
C = A.
7 MPGP for Separable QCQP
125
Let us now consider the components of .g(x) with indices in B = {i ∈ A (x) : giT ( x)ni ( x) < 0},
.
≤ ε0 , and notice that so there is .ε0 > 0 such that .gi (x) = o for .x − x gT ( x)∇hi ( xi ) = −giT ( x)∇hi ( xi ) < 0.
. i
≤ ε, It follows that there is .ε ∈ (0, ε0 ) such that if .x ∈ Rn satisfies .x − x then gT (x)∇hi (xi ) < 0,
. i
i ∈ B.
≤ ε by Moreover, the mapping .ψi : Rn → Ri defined for .x − x ψi (x) = gi (x) −
.
giT (x)∇hi (xi ) ∇hi (xi ) ∇hi (xi )2
is differentiable at .x and for .i ∈ B ψi (x) = gi (x) −
.
(giT (x)∇hi (xi ))− ∇hi (xi ) = giP (x). ∇hi (xi )2
It follows that we can use the classical results of calculus to get that for each ≤ ε implies component .i ∈ B, there is .C > 0 such that .x − x . ψi (x) − ψi ( x) ≤ Cx − x
.
≤ ε, To finish the proof, recall that .ψ( x) = o and that for .i ∈ B and .x − x gP (x) = ψi (x).
. i
.
Now we are ready to prove the main result of this section. Theorem 7.1 Let the feasible set .ΩS = Ω1 × · · · × Ωs of (7.1) be defined by the convex functions .hi : Ri → R, each with a continuous second derivative. Then there are .C > 0 and .ε > 0 such that for each .x ∈ ΩS and −1 .α ∈ (0, 2A ), .
P P
gα (x) ≤ gP (x) ≤ C
gα (x).
(7.21)
Proof Let .α ∈ (0, 2A−1 ) and .x ∈ ΩS be fixed, so we can simplify the notation by g = g(x),
.
P
P = g
α g (x),
and
gP = gP (x).
The left inequality of (7.21) has already been established—see (7.18). We shall prove the right inequality of (7.21) componentwise, observing that by
126
Zdeněk Dostál
the definitions, giP =
giP for
.
i ∈ F (x).
≤ ε, and If .i ∈ A (x) and .ε > 0, .C > 0 are those of Lemma 7.6, then .x − x by Lemmas 7.5 and 7.6, .
≤ ν(α)C
giP ≤ Cx − x gP ,
(7.22)
where .ν(α) is defined in Lemma 7.3. This proves the right inequality of (7.21) for .x near the solution. ≥ ε, observe that To prove (7.21) for .i ∈ A (x) and .x − x .
+ C1 . giP ≤ gi = (gi − gi ( x)) + gi ( x) ≤ Ax − x
(7.23)
Moreover, by Lemma 7.5, .
≤ ν(α)
x − x gP .
(7.24)
≥ ε, Thus, for .x − x + (C1 /ε)ε ≤ (A + (C1 /ε)) x − x giP ≤Ax − x
.
gP . ≤ (A + (C1 /ε)) ν(α)
(7.25)
The rest follows by the finite dimension argument.
.
7.4 MPGP Scheme The algorithm that we propose here exploits a user-defined constant .Γ > 0, a test which is used to decide when to change the face, and two types of steps. The conjugate gradient step is defined by .
xk+1 = xk − αcg pk+1 ,
T αcg = bT pk+1 / pk+1 Apk+1 ,
(7.26)
where .pk+1 is the conjugate gradient direction (see Sect. 5.2) constructed recurrently. The recurrence starts (or restarts) with .pk+1 = ϕ(xk ) whenever k .x is generated by the gradient projection step. If .xk is generated by the conjugate gradient step, then .pk+1 is given by .
pk+1 = ϕ(xk ) − γpk , γ =
The coefficient .αcg is chosen so that
ϕ(xk )T Apk . (pk )T Apk
(7.27)
7 MPGP for Separable QCQP
127
f (xk+1 ) = min{f xk − αpk+1 : α ∈ R} . = min{f (x) : x ∈ xr + Span pr+1 , . . . , pk+1 . It can be checked directly that .
1 ϕ(xk )4 . f (xk+1 ) ≤ f xk − αcg ϕ(xk ) = f (xk ) − 2 ϕ(xk )T Aϕ(xk )
(7.28)
The conjugate gradient steps are used to speed up the minimization in the working face WJ = {x : hi (xi ) = 0, i ∈ J },
.
J = A (xk ).
The gradient projection step is defined by the gradient projection k+1 .x (7.29) = PΩS xk − αg(xk ) = xk − α
gα with a fixed step length .α > 0. This step can both add and remove the indices from the current working set. If for a given .Γ > 0 the inequality .
β(xk ) ≤ Γϕ(xk )
(7.30)
holds, then we call the iterate .xk proportional. Test (7.30) is used to decide if the algorithm should continue exploration of the current working set by the conjugate gradients or change the working set by the gradient projection step. If .Γ = 1, then the iterate is proportional if the free gradient dominates the violation of the KKT conditions. Alternatively, test (7.30) can be written in the form .
2δgP (x)2 ≤ ϕ(xk )2
(7.31)
with δ=
.
1 , +2
2Γ2
δ ∈ (0, 1/2),
which is more convenient for the analysis of algorithms. It is easy to check that the tests (7.30) and (7.31) are equivalent. Now we are ready to describe the basic algorithm in the form which is convenient for the analysis. More details about the implementation and the choice of parameters can be found in Sect. 7.7.2.
128
Zdeněk Dostál
Algorithm 7.1 Modified proportioning with gradient projections (MPGP schema)
Remark 7.1 The gradient projection step in (ii) can be replaced by the expansion step k+1 .x = PΩ xk − αϕ(xk ) .
7.5 Rate of Convergence In this section, we give the bounds on the difference between the value of the cost function at the solution and the current iterate. Let us first examine the effect of the CG step. denote a unique soluLemma 7.7 Let .ΩS denote a closed convex set, let .x tion of (7.1), let .λmin denote the smallest eigenvalue of .A, let .{xk } denote the iterates generated by Algorithm 7.1 with .δ ∈ (0, 1/2), and let .xk+1 be generated by the CG step. Then k+1 .f x (7.32) − f ( x) ≤ η δA−1 f (xk ) − f ( x) , where η(ξ) = 1 − ξλmin .
.
Proof Let .xk+1 be generated by the CG step, so that .xk is proportional (7.31), and denote .α = δA−1 . Using (7.28), (7.31), (7.15), and simple manipulations, we get 1 ϕ(xk )4 f (xk+1 ) =f xk − αcg ϕ(xk ) = f (xk ) − 2 ϕ(xk )T Aϕ(xk ) 1 ≤f (xk ) − A−1 ϕ(xk )2 ≤ f (xk ) − αgT (xk )gP (xk ) 2 T ≤f (xk ) − α
gα (xk )g(xk )
.
α2 T k T
(x )A
g ≤f (xk ) − α
gα (xk )g(xk ) + gα (xk ) 2 α =f PΩ xk − αg(xk ) .
7 MPGP for Separable QCQP
129
After subtracting .f ( x) from the first and the last expression, we get k+1 . f (x ) − f ( x) ≤ f PΩ xk − δA−1 g(xk ) − f ( x). (7.33)
The rest follows by (6.28).
.
The main result reads as follows. denote a unique soluTheorem 7.2 Let .ΩS be a closed convex set, let .x tion of (7.1), let .λmin denote the smallest eigenvalue of .A, and let .{xk } be generated by Algorithm 7.1 with .x0 ∈ Ω, .α ∈ (0, 2A−1 ), and .δ ∈ (0, 1/2). Then for any .k ≥ 0, k+1 .f x (7.34) − f ( x) ≤ η f (xk ) − f ( x) , where .
η = η(δ, α) = 1 − δ α λmin ≤ 1 − δκ(A)−1 ,
α = min{α, 2A−1 − α}. (7.35)
Proof If .xk+1 is generated by the gradient projection step, then the (7.34) is satisfied by Proposition 6.3. If .xk+1 is generated by the CG step, then (7.34) . is satisfied by Lemma 7.7.
7.6 Bound on Norm of Projected Gradient To use the MPGP algorithm in the inner loops of other algorithms, we must be able to recognize when we are near the solution. However, there is a catch—though the latter can be tested by a norm of the projected gradient by Lemma 7.2, Theorem 7.2 does not guarantee that such test is positive near the solution. The projected gradient is not continuous and can be large near the solution as shown in Fig. 7.3. Here, we show that the situation is different for the subsequent iterates of MPGP. To see the difference, let us assume that .{xk } is generated by MPGP for the solution of (7.1), and let .k ≥ 1 be arbitrary but fixed. The main tool in our analysis is the linearized problem associated with .xk that reads .
min f (x)
subject to
ˆ k, x∈Ω
ˆk = Ω ˆ k1 × · · · × Ω ˆ ks , Ω
where ˆ i (xi ) ≤ 0} ˆ k = {xi ∈ Ri : h Ω i ˆ i (xi ) = (xi − xk )T ∇hi (xk ) .h i i ˆ i (xi ) = −1 h
for i ∈ S , for i ∈ A (xk ), for i ∈ F (xk ).
(7.36)
130
Zdeněk Dostál
ΩB g(x) gP (x) = o
gP (xk ) xk
x
Fig. 7.3 Large projected gradient near the solution
Comparing (7.36) with our original problem (7.1), we can see that the original constraints on .xi are omitted in (7.36) for .i ∈ F (xk ) and replaced by their linearized versions for .i ∈ A (xk ). Since .hi are convex by the assumptions, we get easily .
ˆ and ni = n ˆ i , i ∈ A (xk ). ΩS ⊆ Ω
(7.37)
Problem (7.36) is defined so that the iterate .xk , obtained from .xk−1 by MPGP for the solution of problem (7.1), can also be considered as an iterate for the solution of (7.36). We use the hat to distinguish the concepts related to problem (7.36) from those related to the original problem (7.1). ˆ k . For For example, .Aˆk (x) denotes the active set of .x ∈ Rn with respect to .Ω ˆα . The typographical reasons, we denote the reduced gradient for (7.36) by .g following relations are important in what follows. Lemma 7.8 Let .xk denote an iterate generated by the MPGP algorithm under the assumptions of Theorem 7.2, let problem (7.36) be associated with k ˆ P (xk ) and .g ˆα (xk ) denote the projected gradient and the reduced .x , and let .g gradient associated with problem (7.36), respectively. Then .
ˆ P (xk ) = g ˆα (xk ). gP (xk ) = g
(7.38)
k k ˆ = g ˆα (xk ), and Proof Let .i ∈ A (xk ), .n = n(x ), .g = g(x ), .α > 0, .g T .ni gi < 0. Using standard linear algebra and (7.37), we get
ˆ i. xki − PΩˆ i (xki − αgi ) = αˆ gi = αgi − α(nTi gi )− ni = αgi − α(ˆ nTi gi )− n
.
Thus, αˆ gi = αgiP = αˆ giP .
.
If .i ∈ F (xk ) or .nTi gi ≥ 0, then obviously
7 MPGP for Separable QCQP
131
ˆiP = g ˆi = gi . gP = g
. i
.
We shall also use the following lemma on the three subsequent iterations that is due to Kučera [1]. ˆ and satisfy Lemma 7.9 Let .ξ 0 , .ξ 1 , and .ξ 2 belong to .Ω 2 .f ξ (7.39) − f ( ξ) ≤ η f (ξ 1 ) − f ( ξ) ≤ η 2 f (ξ 0 ) − f ( ξ) , where . ξ denotes the solution of (7.36) and .η ∈ (0, 1). Then f (ξ 1 ) − f (ξ 2 ) ≤
.
1+η η f (ξ 0 ) − f (ξ 1 ) . 1−η
Proof We shall repeatedly apply (7.39). As f (ξ 0 ) − f (ξ 1 ) = f (ξ 0 ) − f ( ξ) − f (ξ 1 ) − f ( ξ) ξ) ≥(1 − η) f (ξ 0 ) − f ( . 1−η ≥ ξ) f (ξ 1 ) − f ( η and
f (ξ 1 ) + f (ξ 0 ) 2 − f (ξ ) f (ξ ) − f (ξ ) =2 2 f (ξ 1 ) + f (ξ 2 ) . ≤2 − f ( ξ) 2 ≤(1 + η) f (ξ 1 ) − f ( ξ) , 1
2
we get .
1 f (ξ 1 ) − f (ξ 2 ) ≤ f (ξ 1 ) − f ( ξ) 1+η η ≤ f (ξ 0 ) − f (ξ 1 ) . 1−η
This completes the proof.
.
Now we are ready to prove the main result of this section. Theorem 7.3 Let .{xk } denote the iterates generated by the MPGP algorithm under the assumptions of Theorem 7.2 with α ∈ (0, 2A−1 )
.
and
δ ∈ (0, 1/2).
132
Zdeněk Dostál
Then for any .k ≥ 1,
g (x ) ≤ a1 η f (x ) − f ( x) , P
.
k
2
k
0
2 1+η , a1 = α 1−η
(7.40)
where .α and .η = η(δ, α) are defined in Theorem 7.2. Proof As mentioned above, our main tool is the observation that given .k ≥ 1, we can consider the iterates .xk−1 and .xk as initial iterates for auxiliary problem (7.36). Let us first show that k . f (x ) − f ( (7.41) ξ) ≤ η f (xk−1 ) − f ( ξ) , where . ξ denotes a unique solution of (7.36). We shall consider separately two steps that can generate .xk . If .xk is generated by the conjugate gradient step for problem (7.1), then xki = xk−1 i
.
for
i ∈ A (xk−1 ).
Noticing that Aˆk (x) ⊆ A (x)
.
for any .x ∈ ΩS , we get .
ˆ k−1 ) ≤ β(xk−1 ) β(x
and
ˆ k−1 ) ≥ ϕ(xk−1 ). ϕ(x
It follows that .xk−1 is proportional also as an iterate for the solution of problem (7.36) and (7.41) holds true by Theorem 7.2. To prove (7.41) for .xk generated by the gradient projection step, notice ˆ k is defined in such a way that that .Ω k−1 k .x = PΩS x − αg(xk−1 ) = PΩˆ k xk−1 − αg(xk−1 ) . We conclude that (7.41) holds true. Let us define ξ 0 = xk−1 , ξ 1 = xk , and ξ 2 = PΩˆ k ξ 1 − αg(ξ 1 ) .
.
Using Theorem 7.2 and (7.41), we get 2 . f (ξ ) − f ( ξ) ≤ η(δ, α) f (ξ 1 ) − f ( ξ) ≤ η(δ, α)2 f (ξ 0 ) − f ( ξ) .
(7.42)
It follows that the assumptions of Lemma 7.9 are satisfied with .η = η(δ, α) and
7 MPGP for Separable QCQP
133
1+η η f (ξ 0 ) − f (ξ 1 ) 1−η 1+η = η f (xk−1 ) − f (xk ) 1−η 1+η η f (xk−1 ) − f ( ≤ x) 1−η 1+η k η f (x0 ) − f ( ≤ x) . 1−η
f (ξ 1 ) − f (ξ 2 ) ≤
.
Finally, using Lemma 7.8, relations (7.7), and simple manipulations, we get f ξ 1 − f ξ 2 =f ξ 1 − f PΩˆ ξ 1 − αg(ξ 1 ) α2 T 1 ˆ (ξ )Aˆ g gα (ξ 1 ) 2 α α2 T ≥αˆ gα Aˆ gα (ξ 1 )2 (ξ 1 )g(ξ 1 ) − 2 α2 T ˆα A) g =(α − (ξ 1 )g(ξ 1 ) 2 1 T ˆα = Aα(2A−1 − α) g (ξ 1 )g(ξ 1 ) 2 α P 1 T α T 1 ˆα (ξ )g(ξ 1 ) = ˆ (ξ ) g(ξ 1 ) ≥ g g 2 2 α α α P 1 2 g (ξ ) = gP (ξ 1 )2 = gP (xk )2 . = ˆ 2 2 2 T =αˆ gα (ξ 1 )g(ξ 1 ) −
.
To verify the last inequality, consider .α ∈ (0, A−1 ] and .α ∈ (A−1 , 2A−1 ) separately. Putting the last terms of the above chains of relations together, . we get (7.40).
7.7 Implementation In this section, we describe Algorithm 7.1 in the form that is convenient for implementation. We include also some modifications that may be used to improve its performance.
7.7.1 Projection Step with Feasible Half-Step To improve the efficiency of the projection step, we can use the trial conjugate gradient direction .pk which is generated before the projection step is invoked. We propose to generate first
134
Zdeněk Dostál 1
xk+ 2 = xk − αf pk
and
.
1
gk+ 2 = gk − αf Apk ,
where the feasible step length .αf for .pk is defined by .
and then define
αf = max{α : xk − αpk ∈ ΩS },
(7.43)
1 1 xk+1 = PΩS xk+ 2 − αg(xk+ 2 ) .
.
The half-step is illustrated in Fig. 7.4. Such modification does not require any additional matrix-vector multiplication, and estimate (7.32) remains valid as 1
f (xk+ 2 ) − f (xk ) ≤ 0
.
and
1 f (xk+1 ) − f ( x) ≤ηΓ (f (xk+ 2 ) − f (xk )) + f (xk ) − f ( x) . x) . ≤ηΓ f (xk ) − f (
Fig. 7.4 Feasible half-step
Since our analysis is based on the worst-case analysis, the implementation of the feasible half-step does not result in improving the error bounds.
7.7.2 MPGP Algorithm in More Detail Now we are ready to give the details of implementation of the MPGP algorithm which was briefly described in a form suitable for analysis as
7 MPGP for Separable QCQP
135
Algorithm 7.1. To preserve readability, we do not distinguish the generations of variables by indices unless it is convenient for further reference. Algorithm 7.2 MPGP with a feasible half-step
Our experience indicates that the performance of MPGP is not sensitive to .Γ as long as .Γ ≈ 1. Since .Γ = 1 minimizes the upper bound on the rate of convergence and guarantees that the CG steps reduce directly the larger of the two components of the projected gradient, we can expect good efficiency with this value. Recall that .Γ = 1 corresponds to .δ = 1/4. The choice of .α requires an estimate of .A. If we cannot exploit a specific structure of .A, then we can carry out a few iterations of the following power method.
136
Zdeněk Dostál
Algorithm 7.3 Power method for the estimate of .A
Alternatively, we can use the Lanczos method (see, e.g., Golub and van Loan [2]). We can conveniently enhance the Lanczos method into the first conjugate gradient loop of the MPRGP algorithm by defining .
qi = ϕ(xs+i )−1 ϕ(xs+i ),
i = 0, . . . , p,
where .ϕ(xs ) and .ϕ(xs+i ) are free gradients at the initial and the i th iterate in one CG loop, respectively. Then we can estimate .A by applying a suitable method for evaluation of the norm of the tridiagonal matrix [2] T = QT AQ,
.
Q = [q0 , . . . , qp ].
Though these methods typically give only a lower bound A on the norm of A, the choice like .α = 1.8A−1 is often sufficient in practice. The decrease of f can be achieved more reliably by initializing .α ≥ 2(bT Ab)−1 b2 and by inserting the following piece of code into the expansion step:
.
Algorithm 7.4 Modification of the steplength of the extension step
The modified algorithm can outperform that with .α = A−1 as longer steps in the early stage of computations can be effective for the fast identification of the active set of the solution. We observed a good performance opt which minimizes the with .α close to, but not greater than, .2A−1 , near .αE coefficient .ηE of the Euclidean contraction (6.8).
7.8 Comments The MPGP algorithm presented in this section is a variant of the algorithms which combine conjugate gradients with Euclidean projections. Such algorithms were developed first for solving the bound constrained QP problems. See Sect. 8.7 for more information. It seems that the first algorithm of this type for separable QCQP problems was presented by Kučera [3], who found the way how to adapt the MPRGP
7 MPGP for Separable QCQP
137
algorithm ([4]; see the next chapter), originally proposed for the solution of bound constrained QP problems, for solving more general problems. He proved the R-linear convergence of his KPRGP algorithm for the step length −1 .α ∈ (0, A ]. Later, he also proved the R-linear convergence of projected gradients for the step length .α ∈ (0, A−1 ] [1]. The MPGP algorithm presented here appeared in Dostál and Kozubek [5]. The estimate of the rate of convergence presented in Theorem 7.2 is better than that in [1] and guarantees the R-linear convergence of the projected gradient for .α ∈ (0, 2A−1 ]. The proof uses the estimates due to Bouchala et al. [6, 7]. Here, we provided the analysis of a monotonically decreasing algorithm with the rate of convergence in bounds on the spectrum of .A. However, we observed that its performance can be sometimes improved using some heuristic modifications, in particular those using longer step length or unfeasible steps, such as the conjugate gradient or Barzilai-Borwein step length for the solution of a related minimization problem in free variables or the heuristics proposed in [8]. See also Sect. 6.6. The performance of MPGP can also be improved by preconditioning. We have postponed the description of preconditioning to Sect. 8.6 in order to exploit the simplified setting of the bound constrained QP problem. The preconditioning in face improves the solution of auxiliary unconstrained problems, while the preconditioning by a conjugate projector improves the efficiency of all steps, including the nonlinear ones. The effective preconditioners are problem dependent. The preconditioners suitable for the solution of contact problems which comply with the FETI methodology are in Chap. 17. The preconditioning by the conjugate projector (deflation) is described in Sect. 13.7 in the context of transient contact problems.
References 1. Dostál, Z., Kučera, R.: An optimal algorithm for minimization of quadratic functions with bounded spectrum subject to separable convex inequality and linear equality constraints. SIAM J. Optim. 20(6), 2913–2938 (2010) 2. Golub, G.H., Van Loan, C.F.: Matrix Computations, 2nd edn. Johns Hopkins University Press, Baltimore (1989) 3. Kučera, R.: Convergence rate of an optimization algorithm for minimizing quadratic functions with separable convex constraints. SIAM J. Optim. 19, 846–862 (2008) 4. Dostál, Z., Schöberl, J.: Minimizing quadratic functions over non-negative cone with the rate of convergence and finite termination. Comput. Optim. Appl. 30(1), 23–44 (2005)
138
Zdeněk Dostál
5. Dostál, Z., Kozubek, T.: An optimal algorithm with superrelaxation for minimization of a quadratic function subject to separable constraints with applications. Math. Program. Ser. A 135, 195–220 (2012) 6. Bouchala, J., Dostál, Z., Vodstrčil, P.: Separable spherical constraints and the decrease of a quadratic function in the gradient projection. J. Optim. Theory Appl. 157, 132–140 (2013) 7. Bouchala, J., Dostál, Z., Kozubek, T., Pospíšil, L., Vodstrčil, P.: On the solution of convex QPQC problems with elliptic and other separable constraints. Appl. Math. Comput. 247(15), 848–864 (2014) 8. Dostál, Z.: Box constrained quadratic programming with proportioning and projections. SIAM J. Optim. 7(3), 871–887 (1997)
Chapter 8
MPRGP for Bound Constrained QP Zdeněk Dostál
We shall now be concerned with a special case of separable problem (7.1), the bound constrained problem, to find .
min f (x)
(8.1)
x∈ΩB
with ΩB = {x ∈ Rn : x ≥ },
.
f (x) =
1 T x Ax − xT b, 2
and .b given column n-vectors, and .A an .n × n SPD matrix. To include the possibility that not all components of .x are constrained, we admit .i = −∞. The problem (8.1) appears in the dual formulation of both static and dynamic contact problems without friction. There are two specific features of (8.1) that are not explicitly exploited by the MPGP algorithm of Chap. 7, namely, the possibility to move arbitrarily in the direction opposite to the chopped gradient and a simple observation that knowing the active constraints of the solution amounts to knowing the corresponding components of the solution. Here, we present a modification of MPGP that is able to exploit these features. The modified algorithm is a variant of the active set strategy that we coined MPRGP (modified proportioning with reduced gradient projections). The algorithm uses the conjugate gradients to solve the auxiliary unconstrained problems with the precision controlled by the norm of the dominating component of the projected
.
Z. Dostál Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_8
139
140
Zdeněk Dostál
gradient. The fixed step length reduced gradient projections and the optimal step length chopped gradient steps are used to expand and reduce the active set, respectively. It turns out that MPRGP has not only the R-linear rate of convergence in terms of the extreme eigenvalues of the Hessian matrix as MPGP of the previous chapter but also the finite termination property, even in the case that the solution is dual degenerate. We consider the finite termination property important, as it indicates that the algorithm does not suffer from undesirable oscillations often attributed to the active set-based algorithms and thus can better exploit the superconvergence properties of the conjugate gradient method for linear problems.
8.1 Specific Form of KKT Conditions Let us introduce special notations that enable us to simplify the form of the projected gradient (7.6) and of the KKT conditions which read .
gP (x) = o.
(8.2)
The KKT conditions at .x ∈ ΩB determine three subsets of the set N = {1, . . . , n} of all indices. The set of all indices for which .xi = i is called an active set of x. We denote it by .A (x), so
.
A (x) = {i ∈ N : xi = i }.
.
Its complement F (x) = {i ∈ N : xi = i }
.
and subsets B(x) = {i ∈ N : xi = i and gi > 0}, B0 (x) = {i ∈ N : xi = i and gi ≥ 0}
.
are called a free set, a binding set, and a weakly binding set, respectively. Using the subsets of .N , we can decompose .gP (x) into the free gradient .ϕ and the chopped gradient (Fig. 8.1) .β that are defined by .
ϕi (x) = gi (x) for i ∈ F (x), ϕi (x) = 0 for i ∈ A (x), βi (x) = gi− (x) for i ∈ A (x), βi (x) = 0 for i ∈ F (x),
where we have used the notation .gi− = min{gi , 0}. Thus, .
gP (x) = ϕ(x) + β(x).
8 MPRGP for Bound Constrained QP
141
ΩB −gP
gP =
g
g = gP =
g
−gP gP = o
−gP
β g
= gP
Fig. 8.1 Gradient splitting
8.2 MPRGP Algorithm Let us first recall that if the indices of the active constraints of the solution are known, then the corresponding components are known. There is a simple result which shows that if the norm of the chopped gradient is sufficiently larger than the norm of the free gradient, then it is possible to recognize some indices of active constraints that do not belong to the solution active set and use the components of the chopped gradient to reduce the active set [1, Lemma 5.4]. Notice that the latter can be done much more efficiently with bound constraints and then with more general constraints, as any step in the direction opposite to the chopped gradient is feasible. MPRGP enhances these observations by replacing the gradient projection step of MPGP which changes the active set by the free gradient projection with a fixed step length which expands the active set. The modified algorithm has been proved to preserve the R-linear rate of convergence of the cost function and to enjoy the finite termination property even for QP problems with dual degenerate solution. The MPRGP algorithm exploits a user-defined constant .Γ > 0, a test which is used to decide when to leave the face, and three types of steps. The conjugate gradient step, defined by .
xk+1 = xk − αcg pk+1 ,
(8.3)
is used in the same way as in the MPGP algorithm introduced in Sect. 7.4. The active set is expanded by the expansion step defined by the free gradient projection k+1 .x (8.4) = PΩB xk − αϕ(xk ) = max{, xk − αϕ(xk )} with a fixed step length .α. To describe it in the form suitable for analysis, α (x) let us recall that, for any .x ∈ ΩB and .α > 0, the reduced free gradient .ϕ is defined by the entries
142
Zdeněk Dostál .
ϕ i = ϕ i (x, α) = min{(xi − i )/α, ϕi }, i ∈ N = {1, . . . , n},
(8.5)
so that α (x). PΩB x − αϕ(x) = x − αϕ
.
Using this notation, we can write also α (x) + β(x) . . PΩB x − αg(x) = x − α ϕ
(8.6)
(8.7)
If the step length is equal to .α and the inequality α (xk )T ϕ(xk ) ||β(xk )||2 ≤ Γ2 ϕ
.
(8.8)
holds, then we call the iterate .xk strictly proportional. Test (8.8) is used to decide which components of the projected gradient .gP (xk ) should be reduced in the next step. Notice that the right-hand side of (8.8) blends the information about the free gradient and its part that can be used in the gradient projection step. It is possible to replace the free gradient by some other direction, e.g., .g− . We have made some experiments, but have not found much difference. The proportioning step is defined by .
xk+1 = xk − αcg β(xk )
(8.9)
with the step length αcg = arg min f xk − αβ(xk ) .
.
α>0
It has been shown in Sect. 5.1 that the CG step length .αcg that minimizes f (x−αd) for a given .d and .x can be evaluated using the gradient .g = g(x) = ∇f (x) at .x by
.
.
αcg = αcg (d) = dT g/dT Ad.
(8.10)
The purpose of the proportioning step is to remove the indices of the components of the gradient .g that violate the KKT conditions from the working set and to move far from the bounds. Note that if .xk ∈ ΩB , then .
xk+1 = xk − αcg β(xk ) ∈ ΩB .
Now we are ready to define the algorithm in the form that is convenient for analysis, postponing the discussion about implementation to the next section.
8 MPRGP for Bound Constrained QP
143
Algorithm 8.1 Modified proportioning with reduced gradient projections (MPRGP schema)
We call our algorithm modified proportioning to distinguish it from earlier algorithms introduced independently by Friedlander and Martínez with their collaborators [2] and Dostál [3].
8.3 Rate of Convergence The result on the rate of convergence of both the iterates generated by MPRGP and the projected gradient reads as follows. Theorem 8.1 Let .{xk } be generated by Algorithm 8.1 with .x0 ∈ ΩB , .Γ > 0, and .λmin denote a unique solution of (8.1) and and .α ∈ (0, 2A−1 ). Let .x the smallest eigenvalue of .A, respectively. Then for any .k ≥ 1, k+1 . f (x (8.11) ) − f ( x) ≤ηΓ f (xk ) − f ( x) ,
.
gP (xk+1 )2 ≤a1 ηΓk f (x0 ) − f ( x) ,
(8.12)
2A ≤2ηΓk f (x0 ) − f ( xk − x x) ,
(8.13)
.
where .
.
ηΓ =1 −
α λmin , 2 ϑ + ϑΓ
ϑ =2 max{αA, 1} ≤ 4,
= max{Γ, Γ−1 }, Γ α = min{α, 2A−1 − α},
(8.14)
(8.15)
144
Zdeněk Dostál
and .
a1 =
1 + ηΓ . 2 α(1 − ηΓ )
(8.16)
Proof The proof of (8.11) is technical and may be found in Domorádová et al. [4] or Dostál [1]. The proof of (8.12) is a simplified version of the proof of Theorem 7.3 based on estimate (8.11). Notice that Lemma 7.9 on three . iterates can be applied directly to .xk−1 , xk , and .xk+1 . Theorem 8.1 gives the best bound on the rate of convergence for = 1 in agreement with the heuristics that we should leave the face Γ = Γ when the chopped gradient dominates the violation of the Karush-KuhnTucker conditions. The formula for the best bound .ηΓopt which corresponds to −1 .Γ = 1 and .α = A reads .
.
ηΓopt = 1 − κ(A)−1 /4,
(8.17)
where .κ(A) denotes the spectral condition number of .A. The bound on the rate of convergence of the projected gradient given by (8.12) is rather poor. The reason is that it has been obtained by the worst-case analysis of a general couple of consecutive iterations and does not reflect the structure of a longer chain of the same type of iterations. Recall that Fig. 7.3 shows that no bound on .gP (xk ) can be obtained by the analysis of a single iteration!
8.4 Identification Lemma and Finite Termination Let us consider the conditions which guarantee that the MPRGP algorithm of (8.1) in a finite number of steps. Recall that such finds the solution .x algorithm is more likely to generate longer sequences of the conjugate gradient iterations. In this case, the reduction of the cost function values is bounded by the “global” estimate (5.15) and finally switches to the conjugate gradient method, so that it can exploit its nice self-acceleration property [5]. It is difficult to enhance these characteristics of the algorithm into the rate of convergence as they cannot be obtained by the analysis of just one step of the method. We first examine the finite termination of Algorithm 8.1 in a simpler case of (8.1) is not dual degenerate, i.e., the vector of Lagrange when the solution .x of the solution satisfies the strict complementarity condition multipliers .λ > 0 for .i ∈ A ( .λ x). The proof is based on simple geometrical observations and the arguments proposed by Moré and Toraldo [6]. For example, it is easy to see that the free sets of the iterates .xk soon contain the free set of
8 MPRGP for Bound Constrained QP
145
. The formal analysis of such observations is a subject of the the solution .x following identification lemma. Lemma 8.1 Let .{xk } be generated by Algorithm 8.1 with .x0 ∈ ΩB , .Γ > 0, and .α ∈ (0, 2A−1 ]. Then there is .k0 such that for .k ≥ k0 , .
F ( x) ⊆ F (xk ),
k )), F ( x) ⊆ F (xk − αϕ(x
and
B( x) ⊆ B(xk ), (8.18)
α (xk ) is defined by (8.5). k) = ϕ where .ϕ(x , Proof Since (8.18) is trivially satisfied when there is .k = k0 such that .xk = x for any .k ≥ 0. Let us denote we shall assume in what follows that .xk = x k k .xi = [x ]i and .x i = [ xk ]i , .i = 1, . . . , n. x) = ∅ and .B( x) = ∅, so that we can define Let us first assume that .F ( ε = min{ xi − i : i ∈ F ( x)} > 0 and
.
δ = min{gi ( x) : i ∈ B( x)} > 0.
by Theorem 8.1, there is .k0 such that for any Since .{xk } converges to .x .k ≥ k0 , ε for i ∈ F ( x), 4α
(8.19)
xki ≥i +
ε for i ∈ F ( x), 2
(8.20)
xki ≤i +
αδ for i ∈ B( x), 8
(8.21)
δ for i ∈ B( x). 2
(8.22)
gi (xk ) ≤
.
.
.
.
gi (xk ) ≥
In particular, for .k ≥ k0 , the first inclusion of (8.18) follows from (8.20), while x) the second inclusion follows from (8.19) and (8.20), as for .i ∈ F ( xk − αϕi (xk ) = xki − αgi (xk ) ≥ i +
. i
ε αε − > i . 2 4α
Let .k ≥ k0 and observe that, by (8.21) and (8.22), for any .i ∈ B( x) xk − αgi (xk ) ≤ i +
. i
αδ αδ − < i , 8 2
so that if some .xk+1 is generated by the expansion step (8.4), .k ≥ k0 , and i ∈ B( x), then
.
xk+1 = max{i , xki − αgi (xk )} = i .
. i
146
Zdeněk Dostál
It follows that if .k ≥ k0 and .xk+1 is generated by the expansion step, then k+1 .B(x ) ⊇ B( x). Moreover, using (8.22) and the definition of Algorithm 8.1, x) and .k ≥ k0 , then also we can directly verify that if .B(xk ) ⊇ B( k+1 .B(x ) ⊇ B( x). Thus, it remains to prove that there is .s ≥ k0 such that s .x is generated by the expansion step. Let us examine what can happen for .k ≥ k0 . First observe that we can never take the full CG step in the direction .pk = ϕ(xk ). The reason is that αcg (pk ) =
.
ϕ(xk )2 ϕ(xk )T g(xk ) α = ≥ A−1 ≥ , k T k k T k ϕ(x ) Aϕ(x ) ϕ(x ) Aϕ(x ) 2
so that for .i ∈ F (xk ) ∩ B( x), by (8.21) and (8.22), .
xki − αcg pki = xki − αcg gi (xk ) ≤ xki −
αδ α αδ gi (xk ) ≤ i + − < i . (8.23) 2 8 4
It follows by the definition of Algorithm 8.1 that if .xk , k ≥ k0 , is generated by the proportioning step, then the following trial conjugate gradient step is not feasible and .xk+1 is necessarily generated by the expansion step. To complete the proof, observe that Algorithm 8.1 can generate only a finite sequence of consecutive conjugate gradient iterates. Indeed, if there is neither proportioning step nor the expansion step for .k ≥ k0 , then it follows by the finite termination property of the conjugate gradient method and that there is .l ≤ n such that .ϕ(xk0 +l ) = o. Thus, either .xk0 +l = x k .B(x ) = B( x) for .k ≥ k0 + l by rule (i), or .xk0 +l is not strictly proportional, k +l+1 .x 0 is generated by the proportioning step, and .xk0 +l+2 is generated by x) = ∅ and the expansion step. This completes the proof, as the cases .F ( .B( x) = ∅ can be proved by the analysis of the above arguments. . k 0 Proposition 8.1 Let .{x−1} be generated by Algorithm 8.1 with .x ∈ ΩB , satisfy the condition of strict .Γ > 0, and .α ∈ 0, 2A . Let the solution .x i = i implies .gi ( x) > 0. Then there is .k ≥ 0 such complementarity, i.e., .x . that .xk = x
satisfies the condition of strict complementarity, then .A ( Proof If .x x) = B( x), and, by Lemma 8.1, there is .k0 ≥ 0 such that for .k ≥ k0 , we have k .F (x ) = F ( x) and .B(xk ) = B( x). Thus, for .k ≥ k0 , all .xk that satisfy k−1 = x .x are generated by the conjugate gradient steps, and, by the finite . termination property of CG, there is .k ≤ k0 + n such that .xk = x . Unfortunately, the discretization of contact problems with a smooth contact interface typically results in the QP problems with a dual degenerate or nearly dual degenerate solution. The reason is that there can be the couples of points on the boundary of contact interface that are in contact but do not press each other. The solution of (8.1) which does not satisfy the strict complementarity condition is in Fig. 8.2.
8 MPRGP for Bound Constrained QP
147
x1 gP1
ΩB
gP2 x2
x3 gP3 = o
x4 gP4
Fig. 8.2 Projected gradients near dual degenerate solution
A unique feature of MPRGP is that it preserves the finite termination property even in this case provided the balancing parameter .Γ is sufficiently large. The result is a subject of the following theorem (see [1, Theorem 5.21]). Theorem 8.2 Let .{xk } denote the sequence generated by Algorithm 8.1 with 0 . x ∈ ΩB , Γ≥3 κ(A) + 4 , and α ∈ (0, 2A−1 ]. (8.24) . Then there is .k ≥ 0 such that .xk = x Let us recall that the finite termination property of the MPRGP algorithm with a dual degenerate solution and .
α ∈ (0, A−1 ]
has been proved for Γ≥2
.
κ(A) + 1 .
For the details, see Dostál and Schöberl [7].
8.5 Implementation of MPRGP In this section, we describe Algorithm 8.1 in the form which is convenient for implementation. To improve the efficiency of expansion steps, we include the feasible half-step introduced in Sect. 7.7.1, which is now associated with the expansion step. Recall that it uses the trial conjugate gradient direction k .p which is generated before the expansion step is invoked. We propose to generate first 1
xk+ 2 = xk − αf pk
.
and
1
gk+ 2 = gk − αf Apk ,
where .αf denotes the feasible step length for .pk defined by
148
Zdeněk Dostál
αf = min {(xki − i )/pki , pki > 0},
.
i=1,...,n
and then define .
1 1 xk+1 = PΩS xk+ 2 − αϕ(xk+ 2 ) .
To preserve readability, we do not distinguish the generations of auxiliary vectors. The MPRGP algorithm with the feasible step reads as follows. Algorithm 8.2 Modified proportioning with reduced gradient projections (MPRGP)
For the choice of parameters, see Sect. 7.7.2.
8.6 Preconditioning The performance of the CG-based methods can be improved by preconditioning described in Sect. 5.4. However, the application of preconditioning requires some care, as the preconditioning transforms variables, turning the
8 MPRGP for Bound Constrained QP
149
bound constraints into more general inequality constraints. In this section, we present a basic strategy for the preconditioning of auxiliary linear problems. Probably, the most straightforward preconditioning strategy which preserves the bound constraints is the preconditioning applied to the diagonal block .AF F of the Hessian matrix .A in the conjugate gradient loop which minimizes the cost function f in the face defined by a free set .F . Such preconditioning requires that we are able to define for each diagonal block .AF F a regular matrix .M(F ) which satisfies the following two conditions. First, we require that .M(F ) approximates .AF F so that the convergence of the conjugate gradient method is significantly accelerated. The second condition requires that the solution of the system .
M(F )x = y
can be obtained easily. The preconditioners .M(F ) can be generated, e.g., by any of the methods described in Sect. 5.4. Though the performance of the algorithm can be considerably improved by the preconditioning, the preconditioning in face does not result in the improved bound on the rate of convergence. The reason is that such preconditioning affects only the feasible conjugate gradient steps, leaving the expansion and proportioning steps without any preconditioning. In probably the first application of preconditioning to the solution of bound constrained problems [8], O’Leary considered two simple methods which can be used to obtain the preconditioner for .AF F from the preconditioner .M which approximates .A, namely, M(F ) = MF F
.
and
M(F ) = LF F LTF F ,
where .L denotes the factor of the Cholesky factorization .M = LLT . It can be proved that whichever method of the preconditioning is used, the convergence bound for the conjugate gradient algorithm applied to the subproblems is at least as good as that of the conjugate gradient method applied to the original matrix [8]. To describe the MPRGP algorithm with the preconditioning in face, let us assume that we are given the preconditioner .M(F ) for each set of indices .F , and let us denote .Fk = F (xk ) and .Ak = A (xk ) for each vector .xk ∈ ΩB . To simplify the description of the algorithm, let .Mk denote the preconditioner corresponding to the face defined by .Fk padded with zeros so that .
[Mk ]F F = M(Fk ),
[Mk ]A A = O,
T
[Mk ]A F = [Mk ]F A = O,
and recall that .M†k denotes the Moore-Penrose inverse of .Mk defined by [M†k ]F F = M(Fk )−1 ,
.
[M†k ]A A = O,
[M†k ]A F = [M†k ]TF A = O.
150
Zdeněk Dostál
In particular, it follows that M†k g(xk ) = M†k ϕ(xk ).
.
The MPRGP algorithm with preconditioning in face reads as follows. Algorithm 8.3 MPRGP with preconditioning in face
8.7 Comments Since the conjugate gradient method was introduced in the celebrated paper by Hestenes and Stiefel [9] as a method for the solution of systems of linear equations, it seems that Polyak [10] was the first researcher who proposed to use the conjugate gradient method to minimize a quadratic cost function subject to bound constraints. Though Polyak assumed the auxiliary problems to be solved exactly, O’Leary [8] observed that this assumption can be replaced by refining the accuracy in the process of solution. In this way, she managed to reduce the number of iterations to about a half as compared with
8 MPRGP for Bound Constrained QP
151
the algorithm using the exact solution. In spite of this, the convergence of these algorithms was supported by the arguments which gave only exponential bound on the number of matrix-vector multiplications that are necessary to reach the solution. An important step forward was the development of algorithms with a rigorous convergence theory. On the basis of the results of Calamai and Moré [11], Moré and Toraldo [6] proposed an algorithm that also exploits the conjugate gradients and projections, but its convergence is driven by the gradient projections with the step length satisfying the sufficient decrease condition (see, e.g., Nocedal and Wright [12]). The step length is found, as in earlier algorithms, by possibly expensive backtracking. In spite of the iterative basis of their algorithm, the authors proved that their algorithm preserves the finite termination property of the original algorithm provided the solution satisfies the strict complementarity condition. Friedlander, Martínez, Dostál, and their collaborators combined this result with an inexact solution of auxiliary problems [2, 3, 13, 14]. The concept of proportioning algorithm as presented here was introduced by Dostál in [3]. The convergence of the proportioning algorithm was driven by the proportioning step, leaving more room for the heuristic implementation of projections as compared with Moré and Toraldo [6]. The heuristics for the implementation of the proportioning algorithm of Dostál [3] can be applied also to the MPRGP algorithm of Sect. 8.2. Comprehensive experiments and tests of heuristics can be found in Diniz-Ehrhardt et al. [15]. A common drawback of all abovementioned strategies is possible backtracking in search of the gradient projection step length and the lack of results on the rate of convergence. A key to further progress were the results by Schöberl [16, 17] and Dostál [18] on the decrease of the cost function along the projected gradient path (see also Sect. 6.6). It was observed by Dostál [19] that these results can be plugged into the proportioning algorithm in a way which preserves the rate of convergence. In our exposition of the MPRGP algorithm, we follow Dostál and Schöberl [7] and Dostál et al. [4]. See also the book [1]. Further improvements can be achieved by using step length modified by a second-order information; see, e.g., di Serafino et al. [21] or Dostál et al. [22]. The preconditioning in face was probably first considered by O’Leary [8]. The MPRGP with projector preconditioning is a key ingredient of the scalable algorithms for transient contact problems [20]. See also Chap. 13.
References 1. Dostál, Z.: Optimal Quadratic Programming Algorithms, with Applications to Variational Inequalities, 1st edn. Springer, New York (2009)
152
Zdeněk Dostál
2. Friedlander, A., Martínez, J.M.: On the maximization of a concave quadratic function with box constraints. SIAM J. Optim. 4, 177–192 (1994) 3. Dostál, Z.: Box constrained quadratic programming with proportioning and projections. SIAM J. Optim. 7(3), 871–887 (1997) 4. Dostál, Z., Domorádová, M., Sadowská, M.: Superrelaxation in minimizing quadratic functions subject to bound constraints. Comput. Optim. Appl. 48(1), 23–44 (2011) 5. van der Sluis, A., van der Vorst, H.A.: The rate of convergence of the conjugate gradients. Numer. Math. 48, 543–560 (1986) 6. Moré, J.J., Toraldo, G.: On the solution of large quadratic programming problems with bound constraints. SIAM J. Optim. 1, 93–113 (1991) 7. Dostál, Z., Schöberl, J.: Minimizing quadratic functions over nonnegative cone with the rate of convergence and finite termination. Comput. Optim. Appl. 30(1), 23–44 (2005) 8. O’Leary, D.P.: A generalised conjugate gradient algorithm for solving a class of quadratic programming problems. Linear Algebra Appl. 34, 371–399 (1980) 9. Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49, 409–436 (1952) 10. Polyak, B.T.: The conjugate gradient method in extremal problems. USSR Comput. Math. Math. Phys. 9, 94–112 (1969) 11. Calamai, P.H., Moré, J.J.: Projected gradient methods for linearly constrained problems. Math. Program. 39, 93–116 (1987) 12. Nocedal, J., Wright, S.F.: Numerical Optimization. Springer, New York (2000) 13. Friedlander, A., Martínez, J.M., Raydan, M.: A new method for large scale box constrained quadratic minimization problems. Optim. Methods Softw. 5, 57–74 (1995) 14. Bielschowski, R.H., Friedlander, A., Gomes, F.A.M., Martínez, J.M., Raydan, M.: An adaptive algorithm for bound constrained quadratic minimization. Invest. Oper. 7, 67–102 (1997) 15. Diniz-Ehrhardt, M.A., Gomes-Ruggiero, M.A., Santos, S.A.: Numerical analysis of the leaving-face criterion in bound-constrained quadratic minimization. Optim. Methods Softw. 15(1), 45–66 (2001) 16. Schöberl, J.: Solving the Signorini problem on the basis of domain decomposition techniques. Computing 60(4), 323–344 (1998) 17. Schöberl, J.: Efficient contact solvers based on domain decomposition techniques. Comput. Math. Appl. 42, 1217–1228 (2001) 18. Dostál, Z.: On the decrease of a quadratic function along the projectedgradient path. ETNA 31, 25–59 (2008) 19. Dostál, Z.: A proportioning based algorithm for bound constrained quadratic programming with the rate of convergence. Numer. Algorithms 34(2–4), 293–302 (2003)
8 MPRGP for Bound Constrained QP
153
20. Dostál, Z., Kozubek, T., Brzobohatý, T., Markopoulos, A., Vlach, O.: Scalable TFETI with optional preconditioning by conjugate projector for transient contact problems of elasticity. Comput. Methods Appl. Mech. Eng. 247–248, 37–50 (2012) 21. di Serafino, D., Ruggiero, V., Toraldo, G., Zanni, L.: On the steplength selection in gradient methods for unconstrained optimization. Appl. Math. Comput. 318, 176–195(2017) 22. Dostál, Z., Toraldo, G., Viola, M., Vlach, O.: Proportionality-Based Gradient Methods with Applications in Contact Mechanics. In: Kozubek, T. et al. (eds.) Procceedings of HPCSE 2017—High Performance Computing in Science and Engineering. LNCSE, vol. 11087, pp. 47–58. Springer, Cham (2018)
Chapter 9
Solvers for Separable and Equality QP/QCQP Problems Zdeněk Dostál and Tomáš Kozubek
We shall now use the results of our previous investigations to develop efficient algorithms for the minimization of strictly convex quadratic functions subject to possibly nonlinear convex separable inequality constraints and linear equality constraints .
min f (x),
x∈ΩSE
f (x) =
1 T x Ax − xT b, 2
(9.1)
where .A is an .n × n SPD matrix, .b ∈ Rn , .ΩSE = {x ∈ Rn : Bx = o and x ∈ ΩS }, .B ∈ Rm×n , .ΩS = {x ∈ Rn : hi (xi ) ≤ 0, i = 1, . . . , s}, and .hi are convex functions. We are especially interested in problems with bound and/or spherical and/or elliptic inequality constraints. We consider similar assumptions as in previous chapters. In particular, we assume .ΩSE = ∅ and admit dependent rows of .B. We assume that .B = O is not a full column rank matrix, so that .Ker B = {o}. Moreover, we assume that the constraints satisfy the Abadie constraint qualification introduced in Sect. 3.5.3, so that the solution can be characterized by the tangent cone, though we shall give some convergence results without this assumption. Observe that more general QP or QCQP programming problems can be reduced to (9.1) by duality, a suitable shift of variables, or by a modification of f. The main idea of the algorithms that we develop here is to treat both of the sets of constraints separately. This approach enables us to use the ingredients of the algorithms developed in the previous chapters, such as the precision control of auxiliary problems. We restrict our attention to the SMALSE-M (semimonotonic augmented Lagrangian algorithm for separable and equality constraints) algorithm, which will be proved to have important Z. Dostál • T. Kozubek Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_9
155
156
Zdeněk Dostál and Tomáš Kozubek
optimality properties. We shall discuss separately the variant of SMALSE-M called SMALBE-M for the solution of QP problems with bound and equality constraints. The SMALSE-M and SMALBE-M algorithms are the key tools for the solution of contact problems with and without friction.
9.1 KKT Conditions Since .ΩSE is closed and convex and f is assumed to be strictly convex, the solution of problem (9.1) exists and is necessarily unique by Proposition 3.4. The conditions that are satisfied by the solution of (9.1) can be formulated by means of the augmented Lagrangian L(x, λE , μ, ρ) =
.
s 1 T ρ x Ax − xT b + xT BT λE + Bx2 + μi hi (xi ), 2 2 i=1
the gradient of which reads ∇x L(x, λE , μ, ρ) = (A + ρBT B)x − b + BT λE +
s
.
μi ∇hi (xi ).
i=1
Using (3.47) and the Abadie constraint qualification, we get that a feasible vector .x ∈ ΩSE is a solution of (9.1) if and only if there are .λE ∈ Rm and .μ1 , . . . , μs such that .
∇x L(x, λE , μ, ρ) = o, μi ≥ 0, and μi hi (xi ) = 0, i = 1, . . . , s.
(9.2)
Having effective algorithms for the solution of QCQP problems with separable constraints, it is convenient to use explicitly the Lagrange multipliers only for the equality constraints, i.e., to use L(x, λE , ρ) =
.
1 T ρ x Ax − xT b + xT BT λE + Bx2 . 2 2
Denoting by .g = g(x, λE , ρ) the gradient of the reduced augmented Lagrangian, so that g = g(x, λE , ρ) = (A + ρBT B)x − b + BT λE ,
.
we get that .x ∈ ΩSE is a solution of (9.1) if and only if there is .λE ∈ Rm such that .
gP (x, λE , ρ) = o,
(9.3)
9 Solvers for Separable and Equality QP/QCQP Problems
157
f (x) = c
ΩE
xρ
fρ (x) = c x
ΩS
Fig. 9.1 The effect of the quadratic penalty
where .gP is the projected gradient defined in Sect. 7.1 for the auxiliary problem .
min L(x, λE , ρ),
x∈ΩS
ΩS = {x ∈ Rn : h(x) ≤ o}.
(9.4)
However, condition (9.3) is sensitive to the curvature of the boundary, so we shall consider also an alternative condition (see Theorem 7.3) .
P α (x, λE , ρ) = o g
(9.5)
P α with the reduced projected gradient .g (x, λE , ρ) of Sect. 7.3 and −1 .α ∈ (0, 2A ).
9.2 Penalty and Method of Multipliers Probably the most simple way how to exploit the algorithms of the previous chapters to the solution of (9.1) is to enhance the equality constraints into the objective function f by adding a suitable term which penalizes their violation. Thus, the solution of (9.1) can be approximated by the solution of .
min fρ (x),
x∈ΩS
ρ fρ (x) = f (x) + Bx2 . 2
(9.6)
ρ of (9.6) Intuitively, if the penalty parameter .ρ is large, then the solution .x can hardly be far from the solution of (9.1). Indeed, if .ρ were infinite, then the minimizer of .fρ would solve (9.1). Thus, it is natural to expect that ρ is a suitable if .ρ is sufficiently large, then the penalty approximation .x of (9.1). The effect of the penalty term is approximation to the solution .x illustrated in Fig. 9.1. Notice that the penalty approximation is typically near the feasible set, but does not belong to it. That is why the penalty method is also called the exterior penalty method.
158
Zdeněk Dostál and Tomáš Kozubek
Because of its simplicity and intuitive appeal, the penalty method is often used in computations. However, a good approximation of the solution may require a very large penalty parameter, which can complicate computer implementation. to (9.1) The remedy can be based on the observation that the solution .x solves also .
min L(x, λE , ρ)
x∈ΩS
(9.7)
with a suitable .λE ∈ Rm . The point is that having a solution .xρ of the penalized problem (9.7) with .ρ and .λE ∈ Rm , we can modify the linear term of L in such a way that the minimum of the modified cost function without the penalization term with respect to .x ∈ ΩS is achieved again at .xρ . The formula follows from .
min L(x, λE , ρ) = min L(x, λE + ρBxρ , 0).
x∈ΩS
x∈ΩS
Then we can find a better approximation by adding the penalization term to the modified cost function and look for the minimizer of .L(x, λ + ρBxρ , ρ). The result is the well-known classical augmented Lagrangian algorithm, also called the method of multipliers, which was proposed for general equality constraints by Hestenes [1] and Powell [2]. See also Bertsekas [3] or Glowinski and Le Tallec [4].
9.3 SMALSE-M The following algorithm is a modification of the algorithm proposed by Conn et al. [5] for the minimization of more general cost functions subject to bound and equality constraints in their LANCELOT package. The SMALSE-M algorithm presented here differs from that used in LANCELOT by the adaptive precision control introduced by Hager [6] and Dostál et al. [7], by using a fixed regularization parameter, and by the control of the parameter .Mk . Since we use only the multipliers for the equality constraints, we denote them by .λ, i.e., .λ = λE . The complete SMALSE-M algorithm reads as follows.
9 Solvers for Separable and Equality QP/QCQP Problems
159
Algorithm 9.1 Semimonotonic augmented Lagrangians for separable and equality constrained QCQP problems (SMALSE-M)
In Step 1, we can use any algorithm for minimizing strictly convex quadratic functions subject to separable constraints provided it guarantees the convergence of projected gradients to zero. Our optimality theory requires that the algorithm in the inner loop has the rate of convergence in terms of bounds on the spectrum of .A, such as the MPGP Algorithm 7.2 or MPRGP Algorithm 8.2. A stopping criterion should either follow Step 1 or be enhanced in Step 1. A natural choice is the relative precision
.
gP ≤ min{εg b, εe b},
(9.11)
prescribed by small parameters .εg > 0 and .εe > 0. The SMALSE algorithm requires four parameters .M0 , .β, .η, and .ρ. The algorithm is not very sensitive to the choice of .β and .η and adjusts properly the balancing parameter M. A larger value of .ρ increases the rate of convergence of SMALSE at the cost of slowing down the rate of convergence of the algorithm in the inner loop. Some hints concerning the initialization of the parameters and the implementation of SMALSE-M can be found in Sect. 9.14. The next lemma shows that Algorithm 9.1 is well defined, i.e., any algorithm for the solution of auxiliary problems in Step 1 that guarantees the convergence of projected gradients to zero generates either .xk which satisfies (9.5) in a finite number of steps or the iterates which converge to the solution of (9.1).
160
Zdeněk Dostál and Tomáš Kozubek
Lemma 9.1 Let .M > 0, λ ∈ Rm , .η > 0, and .ρ ≥ 0 be given. Let .{yk } ∈ ΩS denote any sequence such that = lim yk = arg min L(y, λ, ρ) y
.
y∈ΩS
k→∞
and let . gP (yk , λ, ρ) converge to zero vector. Then .{yk } either converges to of problem (9.1), or there is an index k such that the unique solution .x .
gP (yk , λ, ρ) ≤ min{M Byk , η}.
(9.12)
Proof If (9.12) does not hold for any k, then gP (yk , λ, ρ) > M Byk
.
for any k. Since .gP (yk , λ, ρ) converges to zero vector by the assumption, it y = o and follows that .Byk converges to zero. Thus, .B gP ( y, λ, ρ) = o.
.
satisfies the KKT conditions (9.2) and .y =x . It follows that .y
.
Remark 9.1 We can keep .Mk = M0 and replace the constant regularization parameter . by .k . In this case, we can update .k in Step 3 by ρ
. k+1
= ρk /β.
We do not elaborate this option here.
9.4 Inequalities Involving the Augmented Lagrangian In this section, we establish basic inequalities which relate the bound on the norm of the projected gradient .gP of the augmented Lagrangian L to the values of L. These inequalities will be the key ingredients in the proof of convergence and other analysis concerning Algorithm 9.1. Lemma 9.2 Let .x, y ∈ ΩS , λ ∈ Rm , ρ > 0, .η > 0, and .M > 0. Let .λmin = λ + ρBx. denote the least eigenvalue of .A and .λ (i) If .
gP (x, λ, ρ) ≤ M Bx,
(9.13)
then .
ρ) ≥ L(x, λ, ρ) + L(y, λ,
1 2
ρ−
M2 λmin
ρ Bx2 + By2 . 2
(9.14)
9 Solvers for Separable and Equality QP/QCQP Problems
161
(ii) If .
gP (x, λ, ρ) ≤ η,
(9.15)
then .
2 ρ) ≥ L(x, λ, ρ) + ρ Bx2 + ρ By2 − η . L(y, λ, 2 2 2λmin
(9.16)
(iii) If .z0 ∈ ΩSE and (9.15), then L(x, λ, ρ) ≤ f (z0 ) +
.
η2 . 2λmin
(9.17)
Proof Let us denote .δ = y − x and .Aρ = A + ρBT B and recall that by the assumptions and Lemma 7.1, gT (y − x) ≥ (gP )T (y − x).
.
Using ρ) = L(x, λ, ρ) + ρBx2 and g(x, λ, ρ) = g(x, λ, ρ) + ρBT Bx, L(x, λ,
.
we get ρ) =L(x, λ, ρ) + δ T g(x, λ, ρ) + 1 δ T Aρ δ L(y, λ, 2 1 T =L(x, λ, ρ) + δ g(x, λ, ρ) + δ T Aρ δ + ρδ T BT Bx + ρBx2 2 1 T P ≥L(x, λ, ρ) + δ g (x, λ, ρ) + δ T Aρ δ + ρδ T BT Bx + ρBx2 2 λmin ρ T P δ2 + Bδ2 + ρδ T BT Bx ≥L(x, λ, ρ) + δ g (x, λ, ρ) + 2 2 + ρBx2 .
.
Noticing that .
ρ ρ ρ ρ By2 = B(δ + x)2 = ρδ T BT Bx + Bδ2 + Bx2 , 2 2 2 2
we get
.
ρ) ≥L(x, λ, ρ) + δ T gP (x, λ, ρ) L(y, λ, ρ ρ λmin δ2 + Bx2 + By2 . + 2 2 2
Using (9.13) and simple manipulations then yields
(9.18)
162
Zdeněk Dostál and Tomáš Kozubek
ρ) ≥L(x, λ, ρ) − M δBx + λmin δ2 + ρ Bx2 + ρ By2 L(y, λ, 2 2 2 λmin M 2 Bx2 2 δ − M δBx + =L(x, λ, ρ) + 2 2λmin 2 2 ρ ρ M Bx + Bx2 + By2 − 2λmin 2 2 1 ρ M2 ≥L(x, λ, ρ) + Bx2 + By2 . ρ− 2 λmin 2
.
This proves (i). (ii) If we assume that (9.15) holds, then by (9.18), λmin ρ ρ δ2 + Bx2 + By2 2 2 2 ρ ρ η2 2 2 ≥L(x, λ, ρ) + Bx + By − . 2 2 2λmin
ρ) ≥L(x, λ, ρ) − δη + L(y, λ, .
This proves (ii). (iii) Let . z denote the solution of the auxiliary problem .
minimize L(z, λ, ρ) s.t. z ≥ ,
(9.19)
= let .z0 ∈ ΩSE so that .Bz0 = o, and let .δ z − x. If (9.15) holds, then T Aρ δ T g(x, λ, ρ) + 1 δ 0 ≥L( z, λ, ρ) − L(x, λ, ρ) = δ 2 . 2 T gP (x, λ, ρ) + 1 δ 2≥− η . ≥ −δη + 1 λmin δ T Aρ δ ≥δ 2 2 2λmin Since .L( z, λ, ρ) ≤ L(z0 , λ, ρ) = f (z0 ), we conclude that L(x, λ, ρ) ≤ L(x, λ, ρ) − L( z, λ, ρ) + f (z0 ) ≤ f (z0 ) +
.
η2 . 2λmin
.
9.5 Monotonicity and Feasibility Now we shall translate the results on the relations that are satisfied by the augmented Lagrangian into the relations concerning the iterates generated by SMALSE-M. Lemma 9.3 Let .{xk }, {λk }, and .Mk be generated by Algorithm 9.1 for solving (9.1) with .η > 0, 0 < β < 1, M0 > 0, ρ, and λ0 ∈ Rm . Let .λmin denote the least eigenvalue of the Hessian .A of the quadratic function f.
9 Solvers for Separable and Equality QP/QCQP Problems
163
(i) If .k > 0, .Mk = Mk+1 , and .
Mk ≤
(9.20)
λmin ρ,
then .
ρ L(xk+1 , λk+1 , ρ) ≥ L(xk , λk , ρ) + Bxk+1 2 . 2
(9.21)
(ii) For any .k ≥ 0, ρ L(xk+1 , λk+1 , ρ) ≥ L(xk , λk , ρ) + Bxk 2 2 ρ η2 + Bxk+1 2 − . 2 2λmin
.
(9.22)
(iii) For any .k ≥ 0 and .z0 ∈ ΩSE , .
L(xk , λk , ρ) ≤ f (z0 ) +
η2 . 2λmin
(9.23)
Proof In Lemma 9.2, let us substitute .x = xk , .λ = λk , .M = Mk , and k+1 = λk+1 . .y = x , so that inequalities (9.13) and (9.15) hold by (9.8) and .λ Thus, we get Lemma 9.3 from Lemma 9.2. If (9.20) holds, we can use (9.14) to get (9.21). Similarly, inequalities (9.22) and (9.23) can be obtained by the substitution into Lemma 9.2 (ii) and (iii), . respectively. Theorem 9.1 Let .{xk }, {λk }, and .{Mk } be generated by Algorithm 9.1 for the solution of (9.1) with .η > 0, .0 < β < 1, .M0 > 0, .ρ > 0, and .λ0 ∈ Rm . Let .λmin denote the least eigenvalue of the Hessian .A of the cost function f, and let .p ≥ 0 denote the smallest integer such that .ρ ≥ (β p M0 )2 /λmin . Then the following statements hold. (i) There is .k0 such that . min{M0 , β (9.24) ρλmin } ≤ Mk0 = Mk0 +1 = Mk0 +2 = · · · (ii) If .z0 ∈ ΩSE , then ∞
ρ η2 Bxk 2 ≤ f (z0 ) − L(x0 , λ0 , ρ) + (1 + p) . . 2 2λmin
(9.25)
k=1
(iii) .
lim gP (xk , λk , ρ) = o
k→0
and
lim Bxk = o.
k→0
(9.26)
Proof Let .p ≥ 0 denote the smallest integer such that .ρ ≥ (β p M0 )2 /λmin , and let .I ⊆ {1, 2, . . . } denote a possibly empty set of the indices .ki such that
164
Zdeněk Dostál and Tomáš Kozubek
Mki < Mki −1 . Using Lemma 9.3(i), .Mki = βMki −1 = β i M0 for .ki ∈ I , and p 2 p .ρ ≥ (β M0 ) /λmin , we conclude that there is no k such that .Mk < β M0 . Thus, .I has at most p elements, and (9.24) holds. By the definition of Step 3, if .k > 0, then either .k ∈ I and .
.
ρ Bxk 2 ≤ L(xk , λk , ρ) − L(xk−1 , λk−1 , ρ), 2
or .k ∈ I and by (9.22) ρ ρ ρ Bxk 2 ≤ Bxk−1 2 + Bxk 2 2 2 2 .
≤L(xk , λk , ρ) − L(xk−1 , λk−1 , ρ) +
η2 . 2λmin
Summing up the appropriate cases of the last two inequalities for .k = 1, . . . , j and taking into account that .I has at most p elements, we get j ρ .
k=1
2
Bxk 2 ≤ L(xj , λj , ρ) − L(x0 , λ0 , ρ) + p
η2 . 2λmin
(9.27)
To get (9.25), it is enough to replace .L(xj , λj , ρ) by the upper bound (9.23). The relations (9.26) are easy corollaries of (9.25) and the definition of . Step 1.
9.6 Boundedness The first step toward the proof of convergence of SMALSE-M is to show that xk are bounded.
.
Proposition 9.1 Let .{xk } and .{λk } be generated by Algorithm 9.1 for the solution of (9.1) with .η > 0, 0 < β < 1, M0 > 0, ρ > 0, and .λ0 ∈ Rm . For each .i ∈ {1, . . . , s}, let .Ii denote the indices of the components of .x associated with the argument of the constraint function .hi , so that .xi = [x]Ji , and let us define A(x) = ∪i∈A (x) Ii ,
.
(x) = N \ A(x), F
xF = xF(x) ,
xA = xA(x) .
Let the boundary of .ΩS be bounded, i.e., there is .C > 0 such that for any x ∈ ΩS ,
.
xA (x) 2 ≤ C.
.
Then .{xk } is bounded.
9 Solvers for Separable and Equality QP/QCQP Problems
165
Proof Since there are only a finite number of different subsets .F of the set of all indices .N = {1, . . . , n} and .{xk } is bounded if and only if .{xkF } is bounded for any .F ⊆ N , we can restrict our attention to the analysis of the (xk ) = F } that are defined by the nonempty infinite subsequences .{xkF : F subsets .F of .N . (xk ) = F } be infinite, and denote Let .F ⊆ N , .F = ∅, let .K = {k : F .
A = N \ F,
H = A + ρk BT B.
We get gk = g(xk , λk , ρ) = Hxk + BT λk − b
.
and .
HF F BT∗F B∗F O
xkF λk
k + bF − HF A xkA gF = B∗F xkF
.
(9.28)
Since for .k ∈ K B∗F xkF = Bxk − B∗A xA ,
.
k gF = gF (xk , λk , ρk ) ≤ gP (xk λk , ρk ),
and both .gP (xk , λk , ρ) and .Bxk converge to zero by the definition of k .x in Step 1 of Algorithm 9.1 and (9.25), the right-hand side of (9.28) is bounded. Since .HF F is nonsingular, it is easy to check that the matrix of the system (9.28) is nonsingular when .B∗F is a full row rank matrix. It simply follows that both .{xk } and .{λk } are bounded provided the matrix of (9.28) is nonsingular. If .B∗F is not a full row rank matrix, then its rank r satisfies .r < m, and by the RSVD formula (2.37), there are matrices U ∈ Rm×r , V ∈ Rn×r , Σ = diag(σ1 , . . . , σr ), σi > 0, UT U = I, VT V = I,
.
ˆ ∗F = such that .B∗F = UΣVT . Thus, we can define the full row rank matrix .B ΣVT that satisfies ˆ ∗F xF = ΣVT xF = UΣVT xF = B∗F xF B
.
ˆ = UT λ, so that for any vector .x. Let us assign to any .λ ∈ Rm the vector .λ ˆT λ ˆ = VΣUT λ = BT λ. B ∗F ∗F
.
Using the latter identity and (9.28), we get the system
k k ˆT xF gF + bF − HF A xA HF F B ∗F . = ˆ ∗F xk ˆ ∗F O ˆk B B λ F
(9.29)
166
Zdeněk Dostál and Tomáš Kozubek
with a nonsingular matrix. The right-hand side of (9.29) being bounded due ˆ ∗F xk = B∗F xk , we conclude that the set .{xk : F (xk ) = F } is to .B F F F . bounded. See also Dostál and Kučera [8] or Dostál and Kozubek [9].
9.7 Convergence Now we are ready to prove the main convergence results of this chapter. ( To describe them effectively, let .F = F x) denote the set of indices of (see the variables that are involved in the free set of a unique solution .x Proposition 9.1 for formal definition of .F ( x)), and let us call the solution .x regular if .B∗F is a full row rank matrix and range regular if .ImB = ImB∗F . It is easy to check that the regular or range regular solution of (9.1) satisfies the Abadie constraint qualification. Theorem 9.2 Let .{xk }, .{λk } be generated by Algorithm 9.1 for the solution of (9.1) with .η > 0, .β > 1, .M > 0, .ρ > 0, and .λ0 ∈ Rm . Then the following statements hold. of (9.1). (i) The sequence .{xk } converges to the solution .x of (9.1) is regular, then .{λk } converges to a uniquely (ii) If the solution .x E of Lagrange multipliers for the equality constraints determined vector .λ of (9.1). of (9.1) is range regular, then .{λk } converges to (iii) If the solution .x λ = λLS + (I − P)λ0 ,
.
where .P is the orthogonal projector onto .Im B = Im B∗F and .λLS is the least squares Lagrange multiplier for the equality constraints of the solution of (9.1). Proof (i) Since the iterates .xk are bounded due to Lemma 9.1, it follows that there is a cluster point .x of .{xk } and .K ⊆ N such that .
lim {xk }k∈K = x.
k→∞
Moreover, since .xk ∈ ΩS and by (9.26) .
lim Bxk = 0,
k→∞
it follows that .Bx = o, .x ∈ ΩSE , and .f (x) ≥ f ( x). To show that .x solves (9.1), let us denote − x, d=x
.
αk = max{α ∈ [0, 1] : xk + αd ∈ ΩS },
dk = αk d,
9 Solvers for Separable and Equality QP/QCQP Problems
167
so that Bd = Bdk = o,
.
lim dk = d,
k→∞
, lim xk + dk = x
k→∞
and L(xk + d, λk , ρ) − L(xk , λk , ρ) = f (xk + d) − f (xk ).
.
Moreover, if we denote ˆ k = arg min L(x, λk , ρ), x
.
x∈ΩS
then by Lemma 7.2 and the definition of .xk in Step 1 of SMALSE-M, 0 ≤ L(xk , λk , ρ) − L(ˆ xk , λk , ρ) ≤
.
1 M02 gP (xk , λk , ρ)2 ≤ Bxk 2 . 2λmin 2λmin
Using the above relations, we get 0 ≤L(xk + dk , λk , ρ) − L(ˆ xk , λk , ρ) .
=L(xk + dk , λk , ρ) − L(xk , λk , ρ) + L(xk , λk , ρ) − L(ˆ xk , λk , ρ) ≤f (xk + dk ) − f (xk ) +
M02 Bxk 2 . 2λmin
The continuity of f implies M02 .0 ≤ lim Bxk 2 = f ( f (xk + dk ) − f (xk ) + x) − f (x). k→∞ 2λmin of (9.1) being unique, it follows It follows that .x solves (9.1). The solution .x . that .xk converges to .x = x ( (ii) Let us denote .F = F x) and .H = A + ρBT B, so by the assumptions, such that there is a unique Lagrange multiplier .λ .
F = o. − b + BT λ] [H x
(9.30)
, there is .k1 such that Since we have just proved that .{xk } converges to .x F ⊆ F {xk } for .k ≥ k1 and
.
g (xk , λk , ρ) = HF ∗ xk − bF + BT∗F λk
. F
converges to zero. It follows that the sequence BT∗F λk = bF − HF ∗ xk + gF (xk , λk , ρ)
.
is bounded. Since the largest nonzero singular value .σ min of .B∗F satisfies
168
Zdeněk Dostál and Tomáš Kozubek
σ min λk ≤ BT∗F λk ,
.
it follows that there is a cluster point .λ of .λk . Moreover, .λ satisfies [H x − b + BT λ]F = o.
.
After comparing the last relation with (9.30) and using the assumptions, we and .λk converges to .λ. conclude that .λ = λ of (9.1) is only range regular, let .λ (iii) Let us assume that the solution .x denote any vector of Lagrange multipliers for (9.1), and let .Q = I − P denote the orthogonal projector onto .Ker BT = Ker BT∗F . Using .P + Q = I, .BT Q = O, and (2.43), we get .
BT∗F (λk − λ) = BT∗F (P + Q)(λk − λ) = BT∗F (Pλk − Pλ) ≥ σ min Pλk − Pλ.
Using the arguments from the proof of (ii), we get that the left-hand side of the above relation converges to zero, so .Pλk converges to .Pλ. Since λk = λ0 + ρBx0 + · · · + ρk Bxk
.
with .Bxk ∈ ImB, we get λk = (P + Q)λk = Qλ0 + Pλk .
.
Observing that .λ = λLS + Qλ0 is a Lagrange multiplier for (9.1) and .Pλ = λLS , we get
k 0 k 0 = Pλk − Pλ. .λ − λ = Qλ + Pλ − λLS + Qλ Since the right-hand side converges to zero, we conclude that .λk converges . to .λ, which completes the proof of (iii).
9.8 Optimality of the Outer Loop Theorem 9.1 suggests that it is possible to give an independent of .B upper bound on the number of outer iterations of Algorithm 9.1 (SMALSE-M) that are necessary to achieve a prescribed feasibility error for a class of problems like (9.1). To present explicitly this new feature of SMALSE-M, at least as compared to the related algorithms [7], let .T denote any set of indices, and let for any .t ∈ T be defined the problem .
minimize ft (x) s.t. x ∈ ΩtSE ,
ft (x) =
1 T x At x − bTt x, 2
(9.31)
9 Solvers for Separable and Equality QP/QCQP Problems
169
where ΩtSE = {x ∈ Rnt : Bt x = o and x ∈ ΩtS };
ΩtS = {hti (xi ) ≤ 0, i ∈ I t };
.
ht defines the bound, spherical, or elliptic constraints; .At ∈ Rnt ×nt denotes an SPD matrix; .Bt ∈ Rmt ×nt ; and .bt ∈ Rnt . Our optimality result reads as follows.
. i
Theorem 9.3 Let .{xkt }, {λkt }, and .{Mt,k } be generated by Algorithm 9.1 for (9.31) with 0 < ηt ≤ bt , 0 < β < 1, Mt,0 = M0 > 0, ρ > 0, and λ0t = o.
.
Let .o ∈ ΩtSE , and let there be an .amin > 0 such that the least eigenvalue .λmin (At ) of the Hessian .At of the quadratic function .ft satisfies λ
. min
(At ) ≥ amin ,
t∈T.
Then for each .ε > 0, there are indices .kt , .t ∈ T , such that .
kt ≤ a/ε2 + 1
(9.32)
and .xkt t is an approximate solution of (9.31) satisfying .
Bt xkt t ≤ εbt .
(9.33)
Proof First notice that for any index j, ∞
ρ ρ ρj min{Bt xit 2 : i = 1, . . . , j} ≤ Bt xit 2 ≤ Bt xit 2 . 2 2 2 i=1 i=1 j
.
(9.34)
Denoting by .Lt (x, λ, ρ) the augmented Lagrangian for problem (9.31), we get for any .x ∈ Rnt and .ρ ≥ 0 Lt (x, o, ρ) =
.
1 T 1 bt 2 x (At + ρBTt Bt )x − bTt x ≥ amin x2 − bt x ≥ − . 2 2 2amin
If we substitute this inequality and .z0 = o into (9.25) and use the assumption bt ≥ ηt , we get
.
∞ ρ .
i=1
2
Bt xit 2 ≤
bt 2 η2 (2 + p)bt 2 + (1 + p) ≤ , 2amin 2amin 2amin
(9.35)
where .p ≥ 0 denotes the smallest integer such that .ρ ≥ β 2p M02 /amin . Using (9.34) and (9.35), we get
170
Zdeněk Dostál and Tomáš Kozubek
.
ρj (2 + p) 2 min{Bt xit 2 : i = 1, . . . , j} ≤ ε bt 2 . 2 2amin ε2
Let us now denote .
a = (2 + p)/(amin ρ)
and take for j the least integer which satisfies .a/j ≤ ε2 , so that .
a/ε2 ≤ j ≤ a/ε2 + 1.
(9.36)
Denoting for any .t ∈ T k = arg min{Bt xit : i = 1, . . . , j},
. t
we can use (9.36) with simple manipulations to obtain Bt xkt t 2 = min{Bt xit 2 : i = 1, . . . , j} ≤
.
a 2 ε bt 2 ≤ ε2 bt 2 . jε2
.
9.9 Optimality of the Inner Loop We need the following simple lemma to prove the optimality of the inner loop (Step 1) implemented by the MPGP algorithm. Lemma 9.4 Let .{xk } and .{λk } be generated by Algorithm 9.1 for the solution of (9.1) with .η > 0, .0 < β < 1, .M > 0, .ρ > 0, and .λ0 ∈ Rm . Let .0 < amin ≤ λmin (A), where .λmin (A) denotes the least eigenvalue of .A. Then for any .k ≥ 0, .
L(xk , λk+1 , ρ) − L(xk+1 , λk+1 , ρ) ≤
η2 . 2amin
(9.37)
Proof Let us denote ˆ k+1 = arg min L(x, λk+1 , ρ). x
.
x∈ΩS
Using Lemma 7.2 and the definition of Step 1 of SMALSE-M, we get L(xk , λk+1 , ρ) − L(xk+1 , λk+1 , ρ) ≤L(xk , λk+1 , ρ) − L( xk+1 , λk+1 , ρ) .
≤
1 η2 gP 2 ≤ . 2amin 2amin
.
9 Solvers for Separable and Equality QP/QCQP Problems
171
Now we are ready to prove the main result of this chapter, the optimality of Algorithm 9.1 (SMALSE-M) in terms of matrix-vector multiplications, provided Step 1 is implemented by Algorithm 7.2 (MPGP) or any other, possibly more specialized algorithm for the solution of separable constraints with R-linear rate of convergence of the norm of the projected gradient, such as MPRGP described in Chap. 8. Theorem 9.4 Let .
0 < amin < amax ,
0 < cmax ,
and
ε>0
be given constants, and let the class of problems (9.31) satisfy .
amin ≤ λmin (At ) ≤ λmax (At ) ≤ amax and Bt ≤ cmax .
(9.38)
Let .{xkt }, {λkt }, and .{Mt,k } be generated by Algorithm 9.1 (SMALSE-M) for (9.31) with bt ≥ ηt > 0, 0 < β < 1, Mt,0 = M0 > 0, ρ > 0,
.
and
λ0t = o.
Let Step 1 of Algorithm 9.1 be implemented by Algorithm 7.2 (MPGP) with the parameters .Γ > 0 and .α ∈ (0, 2(amax + ρc2max )−1 ] to generate the iterates k,0 k,1 k,l .xt , xt , . . . , xt = xkt for the solution of (9.31) starting from .xk,0 = xk−1 t t −1 with .xt = o, where .l = lt,k is the first index satisfying .
k, k g P (xk,l t , λt , ρ) ≤ Mt, Bt xt .
(9.39)
Then Algorithm 9.1 generates an approximate solution .xkt t of any problem (9.31) which satisfies gP (xkt t , λkt t , ρ) ≤ εbt and Bt xkt t ≤ εbt
.
(9.40)
at .O(1) matrix-vector multiplications by the Hessian of the augmented Lagrangian for (9.31). Proof Let .t ∈ T be fixed, and let us denote by .Lt (x, λ, ρ) the augmented Lagrangian for problem (9.31). Then by (9.37) and the assumption .ηt ≤ bt , Lt (xk−1 , λkt , ρ) − Lt (xkt , λkt , ρ) ≤ t
.
ηt2 bt 2 ≤ . 2amin 2amin
Since the minimizer .xkt of .Lt (x, λkt , ρ) subject to .x ∈ ΩtS satisfies (9.8) and is a possible choice for .xkt , it follows that .
Lt (xk−1 , λkt , ρ) − Lt (xkt , λkt , ρ) ≤ t
bt 2 . 2amin
(9.41)
172
Zdeněk Dostál and Tomáš Kozubek
Using Theorem 7.3, we get that Algorithm 7.2 (MPGP) used to implement Step 1 of Algorithm 9.1 (SMALSE-M) starting from .xk,0 = xk−1 generates t t k,l .xt satisfying
bt 2 l k k−1 2 l gtP (xk,l , λkt , ρ) − Lt (xkt , λkt , ρ) ≤ a1 η , t , λt , ρ) ≤ a1 ηΓ Lt (xt 2amin Γ
.
where .a1 and .η = η(δ, α) < 1 are the constants specified for the class of problems (9.31) in Theorem 7.3. It simply follows by the inner stop rule (9.39) that the number l of the inner iterations in Step 1 is uniformly bounded by an index .lmax which satisfies a
. 1
bt 2 lmax 2 η ≤ Mt, ε2 bt 2 . max 2amin Γ
Thus, we have proved that Step 1 implemented by MPGP is completed in a uniformly bounded number of iterations of MPGP. Since each iteration of MPRGP requires at most two matrix-vector multiplications, it follows that Step 1 can be carried out for any instance of the class of problems (9.1) in a uniformly bounded number of matrix-vector multiplications. To finish the proof, it is enough to combine this result with Theorem 9.3, in particular to carry out the outer iterations until Mt,kt Bxkt ≤ εb
.
and
Bxkt ≤ εb.
.
9.10 SMALBE for Bound and Equality Constrained QP Problems We shall now consider a special case of problem (9.1), the minimization of a strictly convex quadratic function on a feasible set defined by the bound and linear equality constraints .
min f (x),
x∈ΩBE
ΩBE = {x ∈ Rn : Bx = o and
x ≥ },
(9.42)
where .f (x) = 12 xT Ax − xT b, .b ∈ Rn , .A is an .n × n SPD matrix, and m×n .B ∈ R . We consider the same assumptions as above, in particular, .ΩBE = ∅, .B = O, and .Ker B = {o}. We admit dependent rows of .B and . i = −∞. The complete algorithm that we call SMALBE-M (semimonotonic augmented Lagrangian algorithm for bound and equality constraints) reads as follows.
9 Solvers for Separable and Equality QP/QCQP Problems
173
Algorithm 9.2 Semi-monotonic augmented Lagrangians for bound and equality constrained QP problems (SMALBE-M)
In Step 1, we can use any algorithm for minimizing strictly convex quadratic functions subject to bound constraints as long as it guarantees the convergence of projected gradient to zero, such as the MPRGP algorithm of Chap. 8.
9.11 R-Linear Convergence of SMALBE-M Our main optimality result, Theorem 9.4, guarantees that the number of iterations .kt (ε) that are necessary to get an approximate solution of any problem from the class of problems (9.31) to the prescribed relative precision .ε is uniformly bounded by k(ε) = max{kt (ε) : t ∈ T }.
.
Notice that .k(ε) does not depend on the matrices .Bt which appear in the description of the feasible sets .Ωt . . . a feature which is not obvious from the standard analysis of the Uzawa-type methods even for linear problems. However, Theorem 9.3 yields only k(ε) ε−2 ,
.
i.e., we proved that there is .C > 0 independent of .Bt such that for any .ε > 0,
174
Zdeněk Dostál and Tomáš Kozubek
k(ε) ≤ Cε−2 ,
.
which is very pessimistic and has never been observed in practice. Here, we report a stronger result concerning the convergence of a particular class of problem (9.42) defined by the bound and equality constraints, namely, that SMALBE-M enjoys the R-linear convergence of feasibility errors in a later stage of computations, when the indices of the free and strongly active variables of the solution are identified, and show that there are .kt (ε) and .k t such that for sufficiently small .ε, k (ε) − k t | log(ε)|.
. t
Notice that the iterates can remain nonlinear in any stage of the solution procedure, so the convergence analysis cannot be reduced to that for the equality constraints. The result does not assume independent equality constraints and remains valid even when there are some zero multipliers for active bound constraints. We shall start with the following lemma which shows that the binding set = F ( x) and the free set .F x) are identified in a finite number of .B = B( steps. Lemma 9.5 Let the matrices .A, B and the vectors .b, be those from the definition of problem (9.42). Let .xk and .λk be generated by the SMALBE-M algorithm for the solution of (9.42) with the regularization parameter .ρ > 0. Then there is .k1 such that for .k ≥ k1 ,
.
= F ( F x) ⊆ F (xk )
and
= B( B x) ⊆ B(xk ).
(9.46)
Proof First observe that .gP (xk , λk , ρ) converges to zero by Theorem 9.3. If = ∅ or .F = ∅, then the statements of our lemma concerning these sets are .B valid trivially. Hence, we assume that they are both nonempty and denote }, δF = min{ x i − i : i ∈ F
.
δB = min{ gi : i ∈ B}.
, , there is .k such that for .k ≥ k and .i ∈ F Since .xk converges to .x 1 0, .0 < β < 1, .M0 > 0, .ρ > 0, and .λ0 ∈ Rm . of (9.42) be range regular, and let .k0 and .k1 be those of Let the solution .x Theorem 9.1 and Lemma 9.5, so that for .k = max{k0 , k1 }, Mk = Mk+1 = Mk+2 = · · ·
.
F ( x) = F (xk ) = F (xk+1 )
and
= F (xk+2 ) = · · · . Let .σ min denote the smallest nonzero singular value of .B∗F , let .H = A+ρBT B, and denote C 1 = Mk 0
.
κ(H) + 1 κ(Aρ ) + , λmin (H) σ min
C2 =
−1 min H Mk0 κ(Aρ ) + σ σ min
C=
2C2 . ρ
Then the following relations hold: (i) For any .k ≥ k, .
Bxk 2 ≤ (C + 1)
C C +1
k−k Bxk 2 .
(9.47)
(ii) For any .k ≥ k, .
2 ≤ C1 (C + 1) xk − x
C C +1
k−k Bxk 2 .
(9.48)
(iii) If .λ0 ∈ ImB, then for any .k ≥ k, .
λ − λLS ≤ C2 (C + 1) k
2
Proof See [10, Theorem 4.4].
C C +1
k−k Bxk 2 .
(9.49)
.
9.12 SMALSE-Mw The algorithm SMALSE-Mw that we develop here is a modification of SMALSE-M which can cope with a high sensitivity of the projected gradient when the curvature of the boundary of a feasible set is strong, as typically
176
Zdeněk Dostál and Tomáš Kozubek
happens when the inequality constraints are elliptic as in the algorithms for contact problems with orthotropic friction. The difficulties with the curvature P α (see Sect. 7.3). The are resolved by using the reduced projected gradient .g SMALSE-Mw algorithm reads as follows. Algorithm 9.3 Semimonotonic augmented Lagrangians for separable and equality constrained QCQP problems (SMALSE-Mw)
In Step 1, we can use any algorithm for minimizing the strictly convex quadratic function subject to separable constraints as long as it guarantees the convergence of reduced projected gradients to zero. Since Theorem 7.1 guarantees that for any .λk ∈ Rm and .ρ ≥ 0, there is a constant .C > 0 such that for any .x ∈ ΩS , .
P P ˜ gα (x, λk , ρ) ≤ gP (x, λk , ρ) ≤ C˜ gα (x, λk , ρ),
(9.53)
it follows that we can use the MPGP algorithm of Sect. 7.4. The next lemma shows that Algorithm 9.3 is well defined, that is, any algorithm for the solution of auxiliary problems required in Step 1 that guarantees the convergence of the reduced projected gradient to zero generates either a feasible .xk which satisfies (9.50) in a finite number of steps or approximations which converge to the solution of (9.1). Lemma 9.6 Let .M > 0, λ ∈ Rm , .α ∈ (0, 2(A + ρB2 )−1 ), .η > 0, and k .ρ ≥ 0 be given. Let .{y } ∈ ΩS denote any sequence such that = lim yk = arg min L(y, λ, ρ) x
.
k→∞
y∈ΩS
9 Solvers for Separable and Equality QP/QCQP Problems
177
P α and . g (yk , λ, ρ) converges to the zero vector. Then .{yk } either converges to of problem (9.1), or there is an index k such that the unique solution .x .
P gα (yk , λ, ρ) ≤ min{M Byk , η}.
(9.54)
P Proof If (9.54) does not hold for any k, then . gα (yk , λ, ρ) > M Byk for P k (yα , λ, ρ) converges to the zero vector by the assumption, it any k. Since .g y = o, and using the assumpfollows that .Byk converges to zero. Thus, .B tions and (9.5), we get
P ( y) = gP ( y, λ, ρ) = o. g
. α
satisfies the KKT conditions (9.5) and .y =x . It follows that .y
.
Inequality (9.53) guarantees that .xk satisfies .
gP (xk , λk , ρ) ≤ min{CMk Bxk , η},
(9.55)
so .xk can be considered as an iterate of SMALSE-M for problem (9.1). This is sufficient to guarantee the convergence of SMALSE-Mw. However, C in (9.55) depends on .λk , so the analysis used above is not sufficient to obtain optimality results for SMALSE-Mw. In spite of this, it turns out that SMALSE-Mw is an effective algorithm for the solution of problems with linear equality and separable inequality constraints with strong curvature, such as those arising in the solution of contact problems with orthotropic friction. Since .λk changes slowly, it is not surprising that we observe in our experiments a kind of optimal performance.
9.13 Solution of More General Problems If .A is SPS, but its restriction to the kernel of .B is SPD, then we can use a suitable penalization to reduce such problem to the strictly convex one. Using Lemma [11, Lemma 1.3], it is easy to see that there is . > 0 such that .A + BT B is positive definite, so that we can apply our SMALBE-M algorithm to the equivalent penalized problem .
min f0 (x),
x∈ΩBE
(9.56)
where f (x) =
.
1 T x (A + BT B)x − bT x. 2
If .A is an SPS matrix which is positive definite on the kernel of .B, which is typical for the dual formulation of the problems arising from the discretization
178
Zdeněk Dostál and Tomáš Kozubek
of contact problems, then we can use any . > 0. A good choice is ≈ A,
.
which provides a sufficiently strong regularization of .A and preserves the bounds on its nonzero eigenvalues.
9.14 Implementation Let us give here a few hints that can be helpful for an effective implementation of SMALSE-M (SMALBE-M) and SMALSE-Mw with the inner loop implemented by MPGP (MPRGP). Before applying the algorithms presented to the problems with a wellconditioned Hessian .A, we strongly recommend to rescale the equality constraints so that .A ≈ B. Taking into account the estimate of the rate of convergence in Theorem 9.5, it is also useful to orthonormalize or at least normalize the constraints. A stopping criterion should be added not only after Step 1 but also into the procedure which generates .xk in Step 1. In our experiments, we use ∇L(xk , λk , ρ) ≤ εg b and Bxk − c ≤ εf b.
.
(9.57)
The relative precisions .εf and .εg should be judiciously determined. We often use .ε = εg = εf . Our stopping criterion in the inner loop reads gP (yi , λk , ρ) ≤ min{Mk Byi − c, η} or (9.57),
.
so that the inner loop is interrupted when either the solution or a new iterate xk = yi is found. The parameter .η is used to define the initial bound on the feasibility error which is used to control the update of M. The algorithm does not seem to be sensitive with respect to .η; we use .η = 0.1b. The parameter .β is used to increase the precision control. Our experience indicates that .β = 0.2 is a reasonable choice. The regularization parameter .ρ should compromise the fast speed of the outer loop with large .ρ and a slow convergence of the algorithm in the inner loop. For the problems arising from the dual formulation of the conditions of equilibrium of contact problems, the choice .ρ ≈ A does not increase the upper bound on the spectrum of the Hessian of the augmented Lagrangian and seems to balance reasonably the speed of the inner and outer iterations. The basic strategy for the initialization of .M0 is based on the relation
.
Mk2 < ρλmin (A),
.
9 Solvers for Separable and Equality QP/QCQP Problems
179
which guarantees sufficient increase of the augmented Lagrangian. We use M02 ≈ 100ρλmin (A),
.
which allows fast early updates of the Lagrange multipliers. Notice that .Mk is adjusted automatically, so the performance of the algorithm is not sensitive to .M0 . If the Hessian .H = A + ρBBT of L is ill-conditioned and there is an approximation .M of .H that can be used as preconditioner, then we can use the preconditioning strategies introduced in Sect. 8.6. We report effective problem-dependent preconditioners for contact problems in Chap. 17.
9.15 Comments This chapter is based on the research the starting point of which was the algorithm introduced by Conn et al. [12]; they adapted the augmented Lagrangian method of Powell [2] and Hestenes [1] to the solution of problems with a nonlinear cost function subject to nonlinear equality constraints and bound constraints. Conn, Gould, and Toint worked with the increasing penalty parameter. They also proved that the potentially troublesome penalty parameter .ρk is bounded and the algorithm converges to a solution also with asymptotically exact solutions of auxiliary problems [12]. Moreover, they used their algorithm to develop the LANCELOT [5] package for the solution of more general nonlinear optimization problems. More references can be found in their comprehensive book on trust region methods [13]. An excellent reference for the application of augmented Lagrangian for more general problems can be found in Birgin and Martínez [14]. For the augmented Lagrangian method with filter, see Friedlander and Leyfer [15]. The SMALSE and SMALBE algorithms differ from the original algorithm in two points. The first one is the adaptive precision control introduced for bound and equality constrained problems by Dostál et al. [7]. These authors also proved basic convergence results for problems with a regular solution, including the linear convergence of both Lagrange multipliers and feasibility error for a large initial penalty parameter .ρ0 . The algorithm presented in [7] generated a forcing sequence for the increase of penalty parameter which was sufficient to get the convergence results. The next modification, the update rule for the parameter .ρk of the original SMALBE which enforced a sufficient monotonic increase of .L(xk , μk , ρk ), was first published by Dostál [16]. The convergence analysis included the optimality of the outer loop and the bound on the penalty parameter. The possibility to keep the regularization parameter fixed was first mentioned in the book [11]. The algorithm for the bound and equality constraints which keeps the regularization fixed was coined SMALBE-M. This innovation
180
Zdeněk Dostál and Tomáš Kozubek
was important for the experimental demonstration of numerical scalability of the algorithms for multibody contact problems. The analysis shows that it is possible to combine both strategies. The first optimality results for the bound and equality constrained problems were proved by Dostál and Horák for the penalty method [17, 18]. The optimality of SMALBE with the auxiliary problems solved by MPRGP was proved in Dostál [19]; the generalization of the results achieved earlier for the penalty method was based on a well-known observation that the basic augmented Lagrangian algorithm can be considered as a variant of the penalty method (see, e.g., Bertsekas [20, Sect. 4.4]). The generalization to the solution of problems with more general separable constraints was easy after the development of the algorithms discussed in Chap. 7. The presentation of SMALSE-M given here is based on our earlier work, Dostál and Kučera [8], and Dostál and Kozubek [9]. The SMALSE-Mw algorithm adapted for the elliptic constraints with strong excentricity appeared in Bouchala et al. [21]. Linear convergence for SMALBE-M has been proved in Dostál et al. [10]. If applied to the bound and equality constrained problem (9.42), all these algorithms generate identical iterates. Effective heuristic modifications can be found in the thesis of Hapla [22].
References 1. Hestenes, C.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 303–320 (1969) 2. Powell, M.J.D.: A Method for Nonlinear Constraints in Minimization Problems. Optimization, pp. 283–298. Academic, New York (1969) 3. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic, London (1982) 4. Glowinski, R., Le Tallec, P.: Augmented Lagrangian and OperatorSplitting Methods in Nonlinear Mechanics. SIAM, Philadelphia (1989) 5. Conn, A.R., Gould, N.I.M., Toint, PhL: LANCELOT: A FORTRAN Package for Large Scale Nonlinear Optimization (Release A). Springer Series in Computational Mathematics, vol. 17. Springer, New York (1992) 6. Hager, W.W.: Analysis and implementation of a dual algorithm for constraint optimization. J. Optim. Theory Appl. 79, 37–71 (1993) 7. Dostál, Z., Friedlander, A., Santos, S.A.: Augmented Lagrangians with adaptive precision control for quadratic programming with simple bounds and equality constraints. SIAM J. Optim. 13, 1120–1140 (2003) 8. Dostál, Z., Kučera, R.: An optimal algorithm for minimization of quadratic functions with bounded spectrum subject to separable convex inequality and linear equality constraints. SIAM J. Optim. 20(6), 2913–2938 (2010)
9 Solvers for Separable and Equality QP/QCQP Problems
181
9. Dostál, Z., Kozubek, T.: An optimal algorithm with superrelaxation for minimization of a quadratic function subject to separable constraints with applications. Math. Program. Ser. A 135, 195–220 (2012) 10. Dostál, Z., Brzobohatý, T., Horák, D., Kozubek, T., Vodstrčil, P.: On Rlinear convergence of semi-monotonic inexact augmented Lagrangians for bound and equality constrained quadratic programming problems with application. Comput. Math. Appl. 67(3), 515–526 (2014) 11. Dostál, Z.: Optimal Quadratic Programming Algorithms, with Applications to Variational Inequalities, 1st edn. Springer, New York (2009) 12. Conn, A.R., Gould, N.I.M., Toint, Ph.L.: A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J. Numer. Anal. 28, 545–572 (1991) 13. Conn, A.R., Gould, N.I.M., Toint, Ph.L.: Trust Region Methods. SIAM, Philadelphia (2000) 14. Birgin, E.M., Martínez, J.M.: Practical Augmented Lagrangian Method. SIAM, Philadelphia (2014) 15. Friedlander, M.P., Leyfer, S.: Global and finite termination of a twophase augmented Lagrangian filter method for general quadratic programs. SIAM J. Sci. Comput. 30(4), 1706–1729 (2008) 16. Dostál, Z.: Inexact semi-monotonic augmented Lagrangians with optimal feasibility convergence for quadratic programming with simple bounds and equality constraints. SIAM J. Numer. Anal. 43(1), 96–115 (2005) 17. Dostál, Z., Horák, D.: Scalable FETI with optimal dual penalty for a variational inequality. Numer. Linear Algebra Appl. 11(5–6), 455–472 (2004) 18. Dostál, Z., Horák, D.: Scalable FETI with optimal dual penalty for semicoercive variational inequalities. Contemp. Math. 329, 79–88 (2003) 19. Dostál, Z.: An optimal algorithm for bound and equality constrained quadratic programming problems with bounded spectrum. Computing 78, 311–328 (2006) 20. Bertsekas, D.P.: Nonlinear Optimization. Athena Scientific, Belmont (1999) 21. Bouchala, J., Dostál, Z., Kozubek, T., Pospíšil, L., Vodstrčil, P.: On the solution of convex QPQC problems with elliptic and other separable constraints. Appl. Math. Comput. 247(15), 848–864 (2014) 22. Hapla, V.: Massively parallel quadratic programming solvers with applications in mechanics. Ph.D. Thesis, FEECS, VŠB–Technical University of Ostrava, Ostrava (2016). https://dspace.vsb.cz/handle/10084/112271? locale-attribute=en
Part III Scalable Algorithms for Contact Problems
Chapter 10
TFETI for Scalar Problems Zdeněk Dostál and Tomáš Kozubek
We illustrate the ideas of scalable domain decomposition algorithms for contact problems by describing the solution of two scalar elliptic variational inequalities. The model problems proposed by Ivan Hlaváček deal with the equilibrium of two membranes with prescribed non-penetration conditions on the parts of their boundaries. The structure of our problems is similar to the multibody contact problems of elasticity. The scalar variational inequalities are of independent interest as they describe the steady-state solutions of problems arising in various fields of mathematical physics (see, e.g., Duvaut and Lions [1]). Our presentation uses the FETI (finite element tearing and interconnecting) method, which was proposed as a parallel solver for linear problems arising from the discretization of elliptic partial differential equations. The idea is to decompose the domain into non-overlapping subdomains joined by equality constraints. The decomposed and discretized problem is then reduced by duality to a small, relatively well-conditioned QP problem in Lagrange multipliers. Here, we introduce a variant of the FETI method called total FETI (TFETI), which differs from the original FETI in the way of implementing Dirichlet boundary conditions. While FETI assumes that the subdomains inherit their Dirichlet boundary conditions from the original problem, TFETI enforces them by Lagrange multipliers. Such an approach simplifies the implementation as the stiffness matrices of “floating” subdomains have a priori known kernels that do not depend on prescribed displacements. The procedure is combined with the preconditioning by the “natural coarse grid” of rigid body motions to get a uniformly bounded regular condition number of the Hessian matrix of the dual-energy function. The TFETI methodology is Z. Dostál • T. Kozubek Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_10
185
186
Zdeněk Dostál and Tomáš Kozubek
then used to reduce the continuous scalar variational inequality to the bound and equality constrained QP problem with a well-conditioned Hessian matrix. The resulting problem is solved by the SMALBE-M and MPRGP algorithms with asymptotically linear complexity.
10.1 Two Membranes in Unilateral Contact We shall reduce our analysis to two scalar model problems. Let Ω1 = (0, 1) × (0, 1)
.
and
Ω2 = (1, 2) × (0, 1)
denote open domains with the boundaries .Γ1 and .Γ2 , respectively. Let the parts .ΓiU , .ΓiF , and .ΓiC of .Γi , .i = 1, 2, be formed by the sides of .Ωi as in Fig. 10.1, so that ΓiC = ΓC = {(1, y) ∈ R2 : y ∈ (0, 1)}.
.
Let .Ω = Ω1 ∪ Ω2 , and let f :Ω→R
.
denote a given continuous function. Our goal is to find a sufficiently smooth u = (u1 , u2 ) satisfying
.
.
−Δui = f
in Ωi ,
ui = 0
on ΓiU ,
∂ui =0 ∂ni
on ΓiF ,
i = 1, 2, (10.1)
where .ni denotes the outer unit normal, together with the conditions .
u2 − u1 ≥ 0,
∂u2 ≥ 0, ∂n2
∂u2 2 (u − u1 ) = 0, ∂n2
and
∂u1 ∂u2 + =0 1 ∂n ∂n2 (10.2)
given on .ΓC = Γ1C = Γ2C . The solution u can be interpreted as the displacement of two membranes that are fixed on .ΓU , pressed down vertically by the traction of the density f, and pulled horizontally by the unit density traction along .ΓF and the part of .ΓC where .u1 < u2 . Moreover, the left edge of the right membrane is not allowed to penetrate below the right edge of the left membrane. We shall distinguish two cases. In the first case, both membranes are fixed on the outer edges as in Fig. 10.1 left, so that Γ1U = {(0, y) ∈ R2 : y ∈ [0, 1]}, Γ2U = {(2, y) ∈ R2 : y ∈ [0, 1]}.
.
10 TFETI for Scalar Problems
187
f
ΓU1
f f
Ω1
Ω 2 ΓU2
ΓU1
Ω1
f
Ω 2 ΓF2
Fig. 10.1 Coercive (left) and semicoercive (right) model problems
Since the Dirichlet conditions are prescribed on the parts .ΓiU , i = 1, 2, of the boundaries with a positive measure, it follows that a solution exists and is necessarily unique [2, 3]. In the second case, only the left membrane is fixed on the outer edge, and the right membrane has no prescribed vertical displacement as in Fig. 10.1 right, so that Γ1U = {(0, y) ∈ R2 : y ∈ [0, 1]}, Γ2U = ∅.
.
To guarantee the solvability and uniqueness, we assume . f dΩ < 0. Ω2
More details about this model problem may be found, e.g., in [4].
10.2 Variational Formulation To reduce the requirements on the smoothness of data and solutions of (10.1) and (10.2), let us reformulate the problem in a variational form, which requires that the relations are satisfied rather in average than point-wise. This approach opens a convenient way to the results concerning the existence and uniqueness of a solution and to the application of efficient QP solvers to the solution of discretized problems. The variational form of (10.1) and (10.2) uses the linear and bilinear forms defined on suitable Sobolev spaces. Let .H 1 (Ωi ), i = 1, 2, denote the Sobolev spaces of the first order in the space .L2 (Ωi ) of the functions defined on .Ωi , the squares of which are integrable in the sense of Lebesgue. Let i i 1 i i .V = v ∈ H (Ω ) : v = 0 on ΓiU denote the closed subspaces of .H 1 (Ωi ), i = 1, 2, and let 1 2 .V = V × V and K = (v 1 , v 2 ) ∈ V : v 2 − v 1 ≥ 0 denote the closed subspace and the closed convex subset of
on
ΓC
188
Zdeněk Dostál and Tomáš Kozubek
H = H 1 (Ω1 ) × H 1 (Ω2 ),
.
respectively. The relations on the boundaries are in terms of traces. On H we shall consider the .L2 scalar product (u, v) =
2
ui v i dΩ,
.
i=1
Ωi
the symmetric bilinear form
2
a(u, v) =
.
Ωi
i=1
∂ui ∂v i ∂ui ∂v i + ∂x1 ∂x1 ∂x2 ∂x2
dΩ,
and the linear form (v) = (f, v) =
2
f i v i dΩ,
.
i=1
f i = f |Ωi .
Ωi
To get variational conditions that are satisfied by each sufficiently smooth solution .u = (u1 , u2 ) of (10.1) and (10.2), let 1 1 1 2 1 2 .v = (v , v ) ∈ K ∩ C (Ω ) × C (Ω ) . Using the Green Theorem 4.4, the definition of .K , (10.1), and (10.2), we get .
∂u1 1
∂u2 2 − (Δu, v) = (∇u, ∇v) − ΓC ∂n 1 v dΓ − Γ 2 v dΓ C ∂n
∂u2 1 = a(u, v) + ΓC ∂n2 (v − v 2 ) dΓ ≤ a(u, v).
(10.3) (10.4)
For .x ∈ Ωi , we define .u(x) = ui (x). Since for any .v ∈ C 1 (Ω1 ) × C 1 (Ω2 ) .
− (Δu, v) = (f, v)
and a solution u satisfies .
ΓC
∂u2 1 (u − u2 ) dΓ = 0, ∂n2
we can use the assumption v − u ∈ C 1 (Ω1 ) × C 1 (Ω2 )
.
and (10.2) to get
10 TFETI for Scalar Problems
189
a(u, v − u) − (v − u) = − (Δu, v − u) +
.
ΓC
∂u2 2 (v − v 1 ) dΓ − (f, v − u) ∂n2
2
∂u (v 2 − v 1 ) dΓ ≥ 0. ∂n2
= ΓC
The latter condition, i.e., v ∈ K ∩ C 1 (Ω1 ) × C 1 (Ω2 ) ,
a(u, v − u) ≥ (v − u),
(10.5)
is by Theorem 4.7 just the condition which satisfies the minimizer of .
q(v) =
1 a(v, v) − (v) subject to v ∈ K ∩ C 1 (Ω1 ) × C 1 (Ω2 ) . 2
(10.6)
The classical formulation of the conditions of equilibrium does not describe realistic situations. For example, if f is discontinuous, then the description of equilibrium by (10.1) and (10.2) is not complete, but the problem to find .u ∈ K such that .
q(u) ≤ q(v),
v∈K,
(10.7)
is defined for any .f ∈ L2 (Ω1 ) × L2 (Ω2 ). Using the arguments based on the coercivity of q, it is possible to prove that (10.7) has a solution [2] and that any sufficiently smooth solution of (10.7) solves (10.1) and (10.2). Moreover, if f is piece-wise discontinuous, then it can be shown that the solution of variational problem (10.7) satisfies additional conditions of equilibrium [2].
10.3 Tearing and Interconnecting So far, we have used only the natural decomposition of the spatial domain Ω into .Ω1 and .Ω2 . To enable efficient application of domain decomposition methods, we decompose each .Ωi into .p = 1/H 2 square subdomains i1 ip .Ω , . . . , Ω with boundaries .Γi1 , . . . , Γip as in Fig. 10.2. We shall call H a decomposition parameter. The continuity of a global solution in .Ω1 and .Ω2 can be enforced by the “gluing” conditions .
uij = uik ,
(10.8)
.
∇u · n = − ∇u · n , ij
ij
ik
ik
(10.9)
which should be satisfied by the traces of .uij and .uik on Γij,ik = Γij ∩ Γik .
.
(10.10)
190
Zdeněk Dostál and Tomáš Kozubek
Ω 11
Ω 12
Ω 21
Ω 22
Ω 13
Ω 14
Ω 23
Ω 24
H
h
λ
λ
λ Fig. 10.2 Domain decomposition and discretization
Let us simplify the notation by assigning to each subdomain and its boundary a unique index .k ∈ {1, . . . , s = 2p}, Ωk = Ω ij ,
.
Γk = Γ ij ,
k = k(i, j) = (i − 1)p + j,
let ΓU ∩ Γ i ,
V = vi ∈ H 1 (Ωi ) : vi = 0 on
. i
i = 1, . . . s,
denote the closed subspaces of .H 1 (Ωi ), and let
.
VDD = V1 × · · · × Vs , Γij = Γi ∩ Γj KDD = {v ∈ VDD : vi − vj ≥ 0 on Γij ∩ ΓC , i > j; vi = vj on Γij − ΓC } .
The relations on the boundaries are again in terms of traces. On .VDD , we shall define the scalar product (u, v) =
s
ui vi dΩ,
.
i=1
Ωi
the symmetric bilinear form a(u, v) =
s
.
i=1
Ωi
∂ui ∂vi ∂ui ∂vi + ∂x1 ∂x1 ∂x2 ∂x2
dΩ,
and the linear form (v) = (f, v) =
s
f i vi dΩ,
.
i=1
Ωi
where .f i ∈ L2 (Ωi ) denotes the restriction of f to .Ωi . Observing that for any .v ∈ KDD and .i, j such that .Γij ∩ ΓC = ∅
10 TFETI for Scalar Problems
.
Γij
∂ui vi dΓ + ∂ni
Γij
191
∂uj vj dΓ = ∂nj
Γij
∂ui ∂uj + ∂ni ∂nj
vi dΓ = 0,
we get that relations (10.3)–(10.5) remain valid also for the decomposed problem and .u ∈ KDD solves .
a(u, v − u) ≥ (v − u),
v ∈ KDD .
(10.11)
In what follows, we shall consider the equivalent problem to find .u ∈ KDD such that .
q(u) ≤ q(v),
q(v) =
1 a(v, v) − (v), 2
v ∈ KDD .
(10.12)
10.4 Discretization To reduce (10.12) to a finite-dimensional problem, let us introduce on each subdomain .Ωi a regular grid with the step h as in Fig. 10.2, so that the grids match across the interfaces .Γij of the adjacent subdomains, index contiguously the nodes and entries of corresponding vectors in the subdomains, and define piece-wise continuous linear basis functions φi : Ωi → R,
.
φi (xk ) = δk ,
, k = 1, . . . , ni ,
as in Fig. 10.3, where .ni = (H/h + 1)2 denotes a number of nodes in .Ωi , .i = 1, . . . , s. We shall look for an approximate solution .uh in the trial space .Vh which is spanned by the basis functions, Vh = Vh1 × · · · × Vhs , .
Vhi = Span{φi : = 1, . . . , ni }. i
φ (x ) = 1
x
Fig. 10.3 Triangulation of the domain (left) and piece-wise linear continuous basis functions (right)
192
Zdeněk Dostál and Tomáš Kozubek
Decomposing .uh into the components which comply with the decomposition, i.e., u = (u1h , . . . , ush ), ni ui φi (x), uih (x) = . h
=1
we get a(uh , vh ) =
s
.
ai (uih , vhi ),
i=1
ai (uih , vhi ) = [Ki ]m =
ni ni
i ai (φi , φim )ui vm = uTi Ki vi ,
=1 m=1 ai (φi , φim ),
[ui ] = ui ,
(10.13)
[vi ] = vi .
Similarly, (uh ) =
s
.
(f i , uih ),
i=1
(f i , uih ) =
ni
(f i , ui φi ) = fiT ui ,
(10.14)
=1 i
[fi ] =(f , φi ). To simplify the following exposition, we introduce the notation for global objects, so that we can write ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ K1 O · · · O u1 f1 ⎢ O K2 · · · O ⎥ ⎢ ⎥ ⎢ .. ⎥ ⎢ .. ⎥ .K = ⎢ . . . ⎥ , u = ⎣ . ⎦ , f = ⎣ . ⎦ , s = 2p, ⎣ .. .. . . . .. ⎦ us fs O O · · · Ks and a(uh , uh ) = uT Ku,
.
(uh ) = f T u.
(10.15)
Using the above notation, we get the discretized version of problem (10.12) with auxiliary domain decomposition .
min
1 T u Ku − f T u 2
s.t.
BI u ≤ o and
BE u = o.
(10.16)
In (10.16), .K ∈ Rn×n , .n = sni , denotes a block diagonal SPS stiffness matrix; the full rank matrices .BI and .BE describe the discretized non-penetration
10 TFETI for Scalar Problems
193
i
j
k
l
i
j
i
Fig. 10.4 Three types of constraints
and interconnecting conditions, respectively; and .f represents the discrete analog of the linear term .(u). The rows of .BE and .BI are filled with zeros except 1 and .−1 in the positions that correspond to the nodes with the same coordinates on the subdomain interfaces, the nodes on the Dirichlet boundaries, and the nodes that can come into contact. We get three types of equality constraints as in Fig. 10.4. If .bi denotes a row of .BI or .BE , then .bi does not have more than four nonzero entries. The continuity of the solution in “wire baskets” (see Fig. 10.4 left) and on the subdomain interfaces (see Fig. 10.4 middle) and the Dirichlet boundary conditions (see Fig. 10.4 right) are enforced by the equalities u = uj , uk = u , ui + uj = uk + ul ;
. i
ui = uj ;
ui = 0;
respectively. Alternatively, they can be expressed by means of the vectors bij = (si − sj )T , bk = (sk − s )T , bijkl = (si + sj − sk − sl )T ; .
bij = (si − sj )T ; bi = sTi ;
where .si denotes the i th column of the identity matrix .In . The continuity of the solution across the subdomains’ interfaces (see Fig. 10.4 middle) is implemented by bij x = 0,
.
so that .bij x denotes the jump across the boundary. The non-penetration is enforced similarly. If i and j are the indices of matching nodes on .Γ1C and .Γ2C , respectively, then any feasible nodal displacements satisfy bij x ≤ 0.
.
The construction of matrices .BE and .BI guarantees that any couple of their rows is orthogonal. We can easily achieve by scaling that .B = [BTE , BTI ]T
194
Zdeněk Dostál and Tomáš Kozubek
satisfies BBT = I.
.
10.5 Dual Formulation Our next step is to simplify the problem using the duality theory; in particular, we replace the general inequality constraints BI u ≤ o
.
by the nonnegativity constraints. To this end, let us define the Lagrangian associated with problem (10.16) by .
L(u, λI , λE ) =
1 T u Ku − f T u + λTI BI u + λTE BE u, 2
(10.17)
where .λI and .λE are the Lagrange multipliers associated with the inequalities and equalities, respectively. Introducing the notation λI BI .λ = and B = , λE BE we can observe that .B ∈ Rm×n is a full rank matrix and write the Lagrangian as L(u, λ) =
.
1 T u Ku − f T u + λT Bu. 2
Thus, the solution satisfies the KKT conditions, including .
Ku − f + BT λ = o.
(10.18)
Equation (10.18) has a solution if and only if .
f − BT λ ∈ ImK,
(10.19)
which can be expressed more conveniently by means of a matrix .R, the columns of which span the null space of .K as .
RT (f − BT λ) = o.
(10.20)
The matrix .R can be formed directly so that each floating subdomain is assigned to a column of .R with ones in the positions of the nodal variables that belong to the subdomain and zeros elsewhere, so that
10 TFETI for Scalar Problems
195
R = diag(e, . . . , e) ∈ Rn×s ,
.
e = [1, . . . , 1]T ∈ Rni .
Notice that .RT BT is a full rank matrix. Indeed, if BRy = o,
.
then .Ry is a constant vector with all entries equal to zero due to the Dirichlet conditions. Now assume that .λ satisfies (10.19), so that we can evaluate .λ from (10.18) by means of any (left) generalized matrix .K+ which satisfies .
KK+ K = K.
(10.21)
It may be verified directly that if .u solves (10.18), then there is a vector .α such that u = K+ (f − BT λ) + Rα.
.
(10.22)
For the effective evaluation of the generalized inverse, we can use # K# = diag(K# 1 , . . . , Ks ),
.
where .K# i is defined in (2.12). Using Lemma 2.1, we can check that we can get a full rank submatrix .Ai of .Ki , which appears in the definition of .K# i , by deleting any row and corresponding column of .Ki . The best conditioning of # .Ki can be achieved by deleting those corresponding to a node near the center of .Ωi [5]. The action of .K# can be evaluated by the Cholesky decomposition. See Sect. 11.8 for more details and a faster alternative procedure. Using Proposition 3.13, we can find .λ by solving the minimization problem .
min θ(λ)
s.t.
λI ≥ o
and
RT (f − BT λ) = o,
(10.23)
where .
θ(λ) =
1 T λ BK+ BT λ − λT BK+ f . 2
(10.24)
Notice that .θ is obtained from the dual function .Θ defined by (3.50) by changing the signs and omitting the constant term. Once the solution .λ of (10.16) can be evaluated by (10.22) and of (10.23) is known, the solution .u the formula (3.67). We get .
T BR) −1 RT B T BK + (f − BT λ), = −(RT B α
(10.25)
= [B T , BT ]T , and the matrix .B I is formed by the rows .bi of .BI that where .B I E I characterized by correspond to the positive components of the solution .λ .λi > 0.
196
Zdeněk Dostál and Tomáš Kozubek
10.6 Natural Coarse Grid Even though problem (10.23) is much more suitable for computations than (10.16) and was used to effective solving of discretized variational inequalities [6], further improvement can be achieved using orthogonal projectors associated with the feasible set. Let us denote .
= BK+ f , F = BK+ BT , d = R T BT , G e = RT f ,
and let .T denote a regular matrix that defines the orthonormalization of the so that the matrix rows of .G G = TG
.
has orthonormal rows. After denoting e = T e,
.
problem (10.23) reads .
min
1 T λ Fλ − λT d 2
s.t.
λI ≥ o
and
Gλ = e.
(10.26)
Next we shall transform the minimization on the subset of the affine space to that on the subset of the vector space by looking for the solution of (10.26) in the form λ = ν + λ,
.
where
= e. Gλ
such that The following rather trivial lemma shows that we can even find .λ .λI ≥ o. = e. such that .λ I ≥ 0 and .Gλ Lemma 10.1 There is .λ Proof Since the feasible set of (10.26) is nonempty, we can take = arg min 1 λ 2 λ 2
.
subject to
λI ≥ o and
Gλ = e.
.
so that To carry out the transformation, denote .λ = ν + λ, .
1 T − Fλ) + 1λ −λ T d = 1 ν T Fν − ν T (d T Fλ λ Fλ − λT d 2 2 2
and problem (10.26) reduces to
10 TFETI for Scalar Problems .
197
1 I , d = d − Fλ, λ I ≥ o. min ν T Fν − ν T d s.t. Gν = o and ν I ≥ −λ 2 (10.27)
Our final step is based on the observation that the solution of (10.27) solves .
min
θρ (ν)
Gν = o and
I , ν I ≥ −λ
H = PFP + ρQ,
Q = GT G,
s.t.
(10.28)
where .ρ is a positive constant and .
θρ (ν) =
1 T ν H ν − ν T Pd, 2
P = I − Q. (10.29)
The matrices .P and .Q are the orthogonal projectors on the kernel of .G and the image space of .GT , respectively. The regularization term is introduced in order to enable the reference to the results on strictly convex QP problems. substitute the = ν + λ, To evaluate .u, use the solution .ν of (10.28) to get .λ result to (10.25) to get .α, and finally get .u by means of (10.22).
10.7 Reducing the Analysis to Subdomains We shall solve the QP problem (10.28) by SMALBE-M (Algorithm 9.2) with the inner loop implemented by MPRGP (Algorithm 8.2). These algorithms can solve the class of problems (10.28) arising from the discretization of (10.12) with varying discretization and decomposition parameters in a uniformly bounded number of iterations provided there are positive bounds on the spectrum of the Hessian .H of the cost function .θρ . In this section, we shall reduce the analysis of the spectrum of .H to that of the Schur complements .Si of the stiffness matrices .Ki of .Ωi with respect to the interior variables. First observe that Im.P and Im.Q are invariant subspaces of .H , Im.P+Im.Q = Rm as .P + Q = I, and for any .λ ∈ Rm , H Pλ = (PFP + ρQ)Pλ = PFPλ
.
and
H Qλ = (PFP + ρQ)Qλ = ρQλ.
It follows that σ(H |ImQ) = {ρ},
.
so it remains to find the bounds on σ(H |ImP) = σ(PFP|ImP).
.
198
Zdeněk Dostál and Tomáš Kozubek
The next lemma reduces the problem to the analysis of local Schur complements. Lemma 10.2 Let there be constants .0 < c < C such that for each .λ ∈ Rm , .
c λ 2 ≤ BT λ 2 ≤ C λ 2 .
(10.30)
Then for each .λ ∈ ImP, .
c
−1 −1 T 2 max Si
λ ≤ λ Fλ ≤ C min λmin (Si )
λ 2 , (10.31)
i=1,...,s
i=1,...,s
where .Si denotes the Schur complement of .Ki with respect to the indices of the interior nodes of .Ωi and .λmin (Si ) denotes the smallest nonzero eigenvalue of .Si . Proof Let .B denote a block matrix which complies with the structure of .K, so that ⎡ + ⎤⎡ T ⎤ B1 K1 O · · · O T ⎥ s ⎢ ⎢ O K+ ⎥ B · · · O 2 ⎢ ⎥⎢ 2 ⎥ + T T .F = BK B = [B1 , B2 , . . . , Bs ] ⎢ . . . Bi K+ . ⎥⎢ . ⎥ = i Bi . ⎣ .. .. . . .. ⎦ ⎣ .. ⎦ O O · · · K+ s
BTs
i=1
Since the columns of .Bi that correspond to the interior nodes of .Ωi are formed by zero vectors, we can renumber the variables to get .Bi = [Ci O] and .
c λ 2 ≤
s
CTi λ 2 ≤ C λ 2 .
i=1
Let us now denote by .Si the Schur complement of .Ki , .i = 1, . . . , s, with respect to the interior variables. Notice that it is well defined, as eliminating of the interior variables of any subdomain amounts to solving the Dirichlet problem which has a unique solution. Moreover, if we denote by .Bi the set of all indices which correspond to the variables on the boundary .Γi of .Ωi , + .B = B1 ∪· · ·∪Bs , and choose a generalized inverse .Si , we can use Lemma 2.2 to get F = BK+ BT =
s
.
T Bi K+ i Bi =
s
i=1
T + T Ci S+ i Ci = B∗B S B∗B ,
i=1
where + S+ = diag(S+ 1 , . . . , Ss ),
.
BBT = B∗B BT∗B .
For the analysis, we shall choose the Moore-Penrose generalized inverse .K† . Observing that for each .λ ∈ Rm , .BT Pλ ∈ ImK, .ImK = ImK† , and by
10 TFETI for Scalar Problems
199
Lemma 2.2 .BT∗B Pλ ∈ ImS† , we get λT PFPλ = λT PBK† BT Pλ = λT PB∗B S† BT∗B Pλ.
.
Using the assumptions, we get for any .λ ∈ ImP −1
.
T 2 −1 T 2 † T T 2 cλ−1 max (S) λ ≤ λmax (S) B λ ≤ λ B∗B S B∗B λ ≤ λmin (S) B λ
−1
≤ Cλmin (S) λ 2 .
To finish the proof, notice that λ
. max
(S) = max λmax (Si ) = max Si
i=1,...,s
i=1,...,s
and
λmin (S)
= min λmin (Si ).
.
i=1,...,s
Remark 10.1 Lemma 10.2 indicates that the orthonormalization of the rows of the constraint matrix .B is important for favorable conditioning of .H. We have thus reduced the problem to bound the spectrum .σ(H ) to the analysis of spectra .σ(Si ) of the Schur complements of the subdomains’ stiffness matrices with respect to their interiors.
10.8 H-h Bounds on the Schur Complements’ Spectrum Let .i ∈ {1, . . . , s} be fixed, let .Ki result from the discretization of a square subdomain .ΩH = Ωi of the side length H with the boundary .ΓH = Γi using the discretization parameter h, and let .SH,h = Si denote the Schur complement of the subdomain’s stiffness matrix .Ki with respect to its interior variables. Our goal is to give bounds on the spectrum of .SH,h in terms of H and h. First observe that by the Schur complement interlacing relations (2.30), λ
. min
(Ki ) ≤ λmin (Si )
and
λmax (Si ) ≤ λmax (Ki ) = K .
Since the eigenvalues of the discrete Laplacian on .Ωi , obtained by the central differences with step h [7, Sect. 1.1.4], are given by jπh kπh 4 2 2 + sin .λjk = sin , j, k = 0, . . ., H/h, h2 2(H + h) 2(H + h) the eigenvalues .λjk (Ki ) of .Ki satisfy .λjk (Ki ) = h2 λjk . Assuming small h, we get λ
. min
(SH,h )≥4 sin2
π 2 h2 πh ≈ , 2(H + h) (H + h)2
λmax (SH,h ) ≤ 8.
(10.32)
200
Zdeněk Dostál and Tomáš Kozubek
10.8.1 An Upper Bound The bounds on the spectrum of .K (10.32) are sufficient to guarantee that there is a constant C > 0 independent of h and H such that κ(SH,h ) ≤ C
.
provided H/h is uniformly bounded. However, a more refined analysis reveals that the bounds on the spectrum of Ki provide very pessimistic bounds on the spectrum of SH,h . Let us first show how to improve the upper bound using the relations between the Schur complement of Ki with respect to the boundary variables and discrete harmonic functions. Recall that uh ∈ Vhi (see Sect. 10.4) is a discrete harmonic function on Ωi if i .a (uh , vh ) = (∇uh , ∇vh )dΩ = 0 for all vh ∈ Vhi , vh |Γi = 0. Ωi
The following property provides an alternative characterization of harmonic functions. Lemma 10.3 Let uh ∈ Vhi denote a discrete harmonic function. Then .
ai (uh , uh ) ≤ ai (vh , vh )
(10.33)
for any vh ∈ Vhi such that vh |Γi = uh |Γi . Proof Let uh and vh satisfy the assumptions of the lemma and denote wh = v h − u h .
.
Then wh |Γi = 0 and ai (vh , vh ) = ai (uh + wh , uh + wh ) = ai (uh , uh ) + 2ai (uh , wh ) + ai (wh , wh )
.
= ai (uh , uh ) + ai (wh , wh ) ≥ ai (uh , uh ). To specify the coordinates of the discrete harmonic functions, let us decompose the set of the indices N = {1, . . . , n} of nodes in Ωi into the sets I and B corresponding to the nodes in the interior and on the boundary of Ωi , respectively. The decomposition of N imposes a block structure on Ki and the subdomain’s coordinate vectors, KII KIB uI vI .Ki = = [km ], u = = [u ], v = = [v ]. KIB KBB uB vB If uh is a discrete harmonic function and vh (xi ) = 0 for i ∈ B, i.e., vB = o, then
10 TFETI for Scalar Problems
201
ai (uh , vh ) = vT Ki u = vIT (KII uI + KIB uB ) = 0.
.
Since the above relations hold for any v such that vB = o, we get that the coordinate vector u of any discrete harmonic function uh is fully determined by uB and uI = −K−1 II KIB uB .
.
We shall call the coordinates of the discrete harmonic function uh defined by of uB , uB the harmonic extension u uB = .u . −K−1 II KIB uB It follows that T Ki u = uTB Si uB . ai (uh , uh ) = u
(10.34)
.
If u h denotes a discrete harmonic function defined by the coordinate vector with the boundary coordinates uB , then we can take the piece-wise linear u function vh defined by the coordinates uB .v = o and use (10.13) and Lemma 10.3 to get T .uB Si uB
= a ( Ki u =u uh , u h ) ≤ a (vh , vh ) = T
i
∇vh 2 dΩ.
i
(10.35)
Σ
Notice that the support Σ of vh is a strip of the width h along the boundary Γi of Ωi as in Fig. 10.5. A little exercise in the integration of linear functions yields T .uB Si uB ≤
∇vh 2 dΩ ≤ 3 uB 2 and λmax (Si ) ≤ 3. (10.36) Σ
Fig. 10.5 Support Σ of vh
202
Zdeněk Dostál and Tomáš Kozubek
10.8.2 A Lower Bound We can use the discrete harmonic functions to improve also the lower bound. Let us first recall that by the Poincaré inequality (4.3), there is .C1 > 0 such that any piece-wise linear function .uh defined on .ΩH = (0, H)2 satisfies 2 2 . uh = uh dΩH ≤ C1 (∇uh , ∇uh )dΩ = ∇uh 2 (10.37) ΩH
ΩH
provided (10.38)
uh dΓ = 0.
.
ΓH
Let us first assume that .H = 1, so that we can use (10.37) and Trace Theorem 4.3 to get that there are positive constants .C2 and .C3 such that . u2h dΓ ≤ C2 uh 2 + ∇uh 2 ≤ C3 ∇uh 2 = C3 (∇uh , ∇uh )dΩ. Γ1
Ω1
(10.39) If .H = 1/m, then define v (x) = uh (Hx),
. h
x ∈ Ω1 .
Using the substitution Hx = y,
.
we get . vh 2 dΓ = H −1 Γ1
y ∈ ΩH ,
u2h dΓ
(∇vh , ∇vh )dΩ =
and
ΓH
Ω1
(∇uh , ∇uh )dΩ. ΩH
Combining the last relations with (10.39), we get that there is .C > 0 independent of H and h such that any .uh ∈ Vhi with the trace orthogonal to constants satisfies −1 2 .H uh dΓ ≤ C (∇uh , ∇uh ) dΩ. (10.40) ΓH
ΩH
The inequality (10.40) is a key to the desired estimate of the lower bound on the spectrum of .Si . Let .u and .uB denote the coordinates of a piecewise linear harmonic function .uh and of its restriction to the boundary .ΓH , respectively. Let us assume that .uB ∈ ImS, i.e., uTB e = 0,
.
e = [1, . . ., 1]T .
10 TFETI for Scalar Problems
203
Using the definitions, (10.34), and the minimization property of harmonic functions, we get . uh dΓ = uh (xi )h = huTB e = 0,
ΓH
xi ∈ΓH
u2h dΓ ΓH
=
2
(uh (xi )) h = uB 2 h,
xi ∈ΓH
(∇uh , ∇uh )dΩ = ai (uh , uh ) = uT Ki u = uTB Si uB , ΩH
Thus, we can use (10.40) to get that there is .C > 0 independent of H and u such that
. h
hH −1 uB 2 ≤ CuTB Si uB .
.
(10.41)
We can summarize the results of this section into the form convenient for the proof of optimality. Proposition 10.1 Let H and h denote the decomposition and discretization parameter, respectively. Then there are positive constants c and C independent of h and H such that for each .λ ∈ ImSH,h , .
c
h
λ 2 ≤ λT SH,h λ ≤ C λ 2 . H
(10.42)
Proof See (10.36) and (10.41). The bounds on the Schur complement appeared in the analysis of domain decomposition methods in Bramble et al. [8, 9]. See also Brenner [10]. The proof of the bounds for more general domains and discretizations with the Dirichlet conditions prescribed on the part . of the boundary can be found in Pechstein [11], Corollary 1.61. Remark 10.2 The estimate was a key ingredient of the first optimality analysis of FETI for linear problems by Farhat et al. [12].
10.9 Optimality The following theorem is an easy corollary of Lemma 10.2 and Proposition 10.1. Theorem 10.1 Let .ρ > 0, and let .Hρ,H,h denote the Hessian of .θρ resulting from the decomposition and discretization of problem (10.7) with the parameters H and h. Then there are constants c and C independent of h and H such that for each .λ ∈ Rn ,
204
Zdeněk Dostál and Tomáš Kozubek
c λ 2 ≤ λT Hρ,H,h λ ≤ C
.
H
λ 2 . h
(10.43)
Proof Let .P, .Q, .F, and .H denote the matrices introduced in Sect. 10.6, so that P and .Q are orthogonal projectors on .ImF and .KerF, respectively. Assume that they were obtained using the parameters h and H, so that
.
H,H,h = H = PFP + Q.
.
Substituting (10.42) into Lemma 10.2, we get that there are .0 < c0 < C0 such that c λ 2 ≤ λT Fλ = λT PFPλ ≤ C0 λ 2 ,
. 0
λ ∈ ImF.
To finish the proof, denote .c = min{, c0 } and .C = max{, C0 }, and check that 2 2 2 ≤ λT PFPλ + Qλ 2 .c λ = c Pλ + Qλ
= λT H,H,h λ ≤ C Pλ 2 + Qλ 2 = C λ 2 for any .λ ∈ Rn .
.
To show that Algorithm 9.2 (SMALBE-M) with the inner loop implemented by Algorithm 8.2 (MPRGP) is optimal for solving a class of problems arising from the varying discretizations of a given variational inequality, let us introduce new notations which comply with that used in Part II. Let T = {(H, h) ∈ R2 : H ≤ 1, 0 < 2h ≤ H, and H/h ∈ N}
.
denote the set of indices, where .N denotes the set of all positive integers. Given a constant .C ≥ 2, we shall define a subset .TC of .T by TC = {(H, h) ∈ T : H/h ≤ C}.
.
Let . > 0 be fixed, and for any .t ∈ T , define .
At = PFP + Q, Bt = G,
bt = Pd, I It = −λ
by the vectors and matrices arising from the discretization of (10.12) with the parameters H and h, .t = (H, h), so that we get a class of problems .
min ft (λ) s.t. Bt λ = o and λI ≥ It ,
ft (λ) =
1 T λ At λ − bTt λ, 2
t ∈ TC . (10.44)
Using Lemma 10.1, we can achieve that .It ≤ o, and using .GGT = I, we obtain
10 TFETI for Scalar Problems
205 .
Bt = 1.
(10.45)
It follows by Proposition 10.1 that for any .C ≥ 2, there are constants aC
. max
> aC min > 0
such that .
C aC min ≤ λmin (At ) ≤ λmax (At ) ≤ amax
(10.46)
for any .t ∈ TC . As above, we denote by .λmin (At ) and .λmax (At ) the extreme eigenvalues of .At . Our optimality result then reads as follows. Theorem 10.2 Let .C ≥ 2, .ρ > 0, and .ε > 0 denote the given constants, and let .{λkt }, {μkt }, and .{Mt,k } be generated by Algorithm 9.2 (SMALBE-M) for (10.44) with
bt ≥ ηt > 0, 1 > β > 0, Mt,0 > 0, μ0t = o.
.
Let Step 1 of Algorithm 9.2 be implemented by means of Algorithm 8.2 (MPRGP) with parameters Γ>0
.
and
α ∈ (0, 2/aC max ),
so that it generates the iterates k,1 k,l k λk,0 t , λt , . . . , λt = λt
.
for the solution of (10.44) starting from .λk,0 = λk−1 with .λ−1 = o, where t t t .l = lt,k is the first index satisfying .
k,l k
gP (λk,l t , μt , ρ) ≤ Mt,k Bt λt
(10.47)
or .
k
gP (λk,l t , μt , ρ) ≤ ε bt
and
Bt λk,l t ≤ ε bt .
(10.48)
Then for any .t ∈ TC and problem (10.44), an approximate solution .λkt which satisfies (10.48) is generated at .O(1) matrix-vector multiplications by .At . Proof The class of problems satisfies all assumptions of Theorem 9.4 (i.e., . inequalities (10.45) and (10.46)). The rest follows by Theorem 9.4. Since the cost of matrix-vector multiplications by the Hessian .At is proportional to the number of dual variables, Theorem 10.2 proves the numerical scalability of SMALBE-M for (10.44) provided the bound constrained minimization in the inner loop is implemented through MPRGP. The parallel
206
Zdeněk Dostál and Tomáš Kozubek
scalability follows directly from the discussion at the end of Sect. 2.4. See also the next section.
10.10 Numerical Experiments In this section, we illustrate the numerical scalability of TFETI on the solution of model variational inequalities (10.1). The domain .Ω was first partitioned into identical squares with the side length H ∈ {1, 1/2, 1/4, 1/8, 1/16, 1/32}.
.
The square subdomains were then discretized by regular grids with the discretization parameter .h = H/128, so that the discretized problems have the primal dimension n ranging from 33,282 to 34,080,768. The computations were carried out with the recommended parameters, including M0 = 1,
.
ρ = At ,
Γ = 1,
and
εf = εe = ε = 10−4 .
Thus, the stopping criterion was
gtP (λk , μk , ρ) ≤ 10−4 bt and
.
Bt λk ≤ 10−4 bt .
The solutions for .H = 1/4 and .h = 1/4 are in Figs. 10.6 and 10.7. The results of computations are in Fig. 10.8. We can see that the numbers of matrixvector multiplications (on vertical axis) vary moderately in agreement with Theorem 10.2 up to a few millions of nodal variables. The reason for the increased number of iterations for large problems is not clear. The latter could have been caused by very fine discretization, unmatched by the discretization of any 3D problem, the increased number of zero active multiplicators, rounding errors, etc.
0 −0.1 −0.2 −0.3 −0.4 −0.5 1 0.8 0.6 −0.7 0.4 0.2 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 −0.6
Fig. 10.6 The solution of coercive problem
0 −0.2 −0.4 −0.6 −0.8 −1 −1.2 −1.4
1 0.5 0 0.2 0.4 0.6 0.8
1 1.2 1.4 1.6 1.8 2
0
Fig. 10.7 The solution of semicoercive problem
10 TFETI for Scalar Problems
207
nsub 2 250
8
32
128
nsub 512
2048
2
Coercive problem Semicoer. problem
80
8
32
128
512
2048
8.52
34.08
Coercive problem Semicoer. problem
nout
nHes
200 150
60 40
100 20
50 0
0 0.13
0.53
2.13
n
[106 ]
8.52
34.08
0.13
0.53
2.13
n [106 ]
Fig. 10.8 Numerical scalability for the coercive and semicoercive problems. Number of Hessian multiplications .nHes (left) and outer iterations .nout (right)
10.11 Comments The solvability, approximation, and classical numerical methods for elliptic boundary variational inequalities are discussed in the books by Glowinski [13]; Glowinski et al. [14]; and Hlaváček et al. [2]. The dual (reciprocal) formulation of boundary variational inequalities can be found, e.g., in the books by Duvaut and Lions [1] and Hlaváček et al. [2]. More problems described by variational inequalities can be found in Duvaut and Lions [1]. For scalar contact problems, see Sofonea and Matei [15] or Migorski et al. [16]. The first steps toward developing scalable algorithms for variational inequalities used multigrid methods. Hackbusch and Mittelmann [17] and Hoppe [18] used the multigrid to solve auxiliary linear problems and gave numerical evidence of the efficiency of their algorithms. Using the observations related to Mandel [19], Kornhuber [20] proved the convergence of his monotonic multigrid method. Later overview of multigrid solvers can be found in Gräser and Kornhuber [21]. Our presentation is based on the FETI method proposed by Farhat and Roux [22, 23]. A milestone in the development of domain decomposition algorithms was the proof of numerical scalability of FETI with preconditioning by the “natural coarse grid” by Farhat et al. [12]. TFETI was proposed for linear problems independently by Dostál et al. [24]. The same idea was introduced independently for the problems discretized by the boundary element method in the thesis by Of; he coined it all floating BETI [25, 26]. The idea was considered earlier by Park et al. [27]. Augmented Lagrangians were often used to implement conveniently active constraints; see, e.g., Glowinski and LeTallec [28] or Simo and Laursen [29]. The algorithm combining FETI with the augmented Lagrangians appeared in Dostál et al. [30]. See also Dostál et al. [4, 6]. The first result on the numerical
208
Zdeněk Dostál and Tomáš Kozubek
scalability of an algorithm for the solution of a variational inequality used an optimal penalty in dual FETI problem [31]. Dostál and Horák [32] provided experimental evidence of the scalability of the algorithm with the inner loop implemented by the proportioning [33]. The complete proof of optimality that we present here appeared in Dostál and Horák [34]. They used the original SMALBE algorithm, which kept M fixed and updated the regularization parameter .. Other results related to the scalability include Badea et al. [35], who proved the linear rate of convergence for an additive Schwarz domain decomposition method which assumes the exact solution of nonlinear subdomain problems. The variants of two-level FETI methods with preconditioning in face applied to the solution of auxiliary linear problems arising in the solution of the model variational inequality introduced in Sect. 10.1 can be found in Lee [36]. An attempt to make a FETI-DP (dual-primal) algorithm for the solution of variational inequalities with the subdomains interconnected in the corners scalable can be found in Dostál et al. [37–39], but the proof of scalability is flawed by error. Lee and Park [40] based the development of FETI-DP on the Fenchel duality and showed that this approach without modification is practical but does not result in a scalable algorithm. The solution of the discretized scalar variational inequalities by the hybrid TFETI method with the subdomains interconnected by edge averages are in [41, 42]. The proof of optimality is based on [43]. See also the comments in Chap. 15.
References 1. Duvaut, G., Lions, J.L.: Inequalities in Mechanics and Physics. Springer, Berlin (1976) 2. Hlaváček, I., Haslinger, J., Nečas, J., Lovíšek, J.: Solution of Variational Inequalities in Mechanics. Springer, Berlin (1988) 3. Glowinski, R.: Numerical Methods for Nonlinear Variational Problems. Springer Series in Computational Physics. Springer, Berlin (1984) 4. Dostál, Z., Gomes, F.A.M., Santos, S.A.: Solution of contact problems by FETI domain decomposition with natural coarse space projection. Comput. Methods Appl. Mech. Eng. 190(13–14), 1611–1627 (2000) 5. Bochev, P.: On the finite element solution of the pure Neumann problem. SIAM Rev. 47(1), 50–66 (2005) 6. Dostál, Z., Gomes, F.A.M., Santos, S.A.: Duality based domain decomposition with natural coarse space for variational inequalities. J. Comput. Appl. Math. 126(1–2), 397–415 (2000) 7. Marchuk, G.I.: Methods of Numerical Mathematics. Springer, Berlin (1982)
10 TFETI for Scalar Problems
209
8. Bramble, J.H., Pasciak, J.E., Schatz, A.H.: An iterative method for the solution of elliptic problems on regions partitioned into substructures. Math. Comput. 46, 361–369 (1986) 9. Bramble, J.H., Pasciak, J.E., Schatz, A.H.: The construction of preconditioners for elliptic problems by substructuring I. Math. Comput. 47, 103–134 (1986) 10. Brenner, S.C.: The condition number of the Schur complement in domain decomposition. Numer. Math. 83, 187–203 (1999) 11. Pechstein, C.: Finite and Boundary Element Tearing and Interconnecting Solvers for Multiscale Problems. Springer, Heidelberg (2013) 12. Farhat, C., Mandel, J., Roux, F.-X.: Optimal convergence properties of the FETI domain decomposition method. Comput. Methods Appl. Mech. Eng. 115, 365–385 (1994) 13. Glowinski, R.: Variational Inequalities. Springer, Berlin (1980) 14. Glowinski, R., Lions, J.L., Tremolieres, R.: Numerical Analysis of Variational Inequalities. North-Holland, Amsterdam (1981) 15. Sofonea, M., Matei, A.C.: Variational Inequalities with Applications. A Study of Antiplane Frictional Contact Problems. Springer, New York (2009) 16. Migorski, S., Ochal, A., Sofonea, M.: Modeling and analysis of an antiplane piezoelectric contact problem. Math. Models Methods Appl. Sci. 19, 1295–1324 (2009) 17. Hackbusch, W., Mittelmann, H.: On multigrid methods for variational inequalities. Numerische Mathematik 42, 65–76 (1983) 18. Hoppe, R.H.W.: Multigrid algorithms for variational inequalities. SIAM J. Numer. Anal. 24(5), 1046–1065 (1987) 19. Mandel, J.: Étude algébrique d’une méthode multigrille pour quelques problèmes de frontière libre. Comptes Rendus de l’Académie des Sci. Ser. I(298), 469–472 (1984) 20. Kornhuber, R.: Monotone Multigrid Methods for Nonlinear Variational Problems. Teubner, Stuttgart (1997) 21. Gräser, C., Kornhuber, R.: Multigrid methods for obstacle problems. J. Comput. Math. 27(1), 1–44 (2009) 22. Farhat, C., Roux, F.-X.: A method of finite element tearing and interconnecting and its parallel solution algorithm. Int. J. Numer. Methods Eng. 32, 1205–1227 (1991) 23. Farhat, C., Roux, F.-X.: An unconventional domain decomposition method for an efficient parallel solution of large-scale finite element systems. SIAM J. Sci. Comput. 13, 379–396 (1992) 24. Dostál, Z., Horák, D., Kučera, R.: Total FETI - an easier implementable variant of the FETI method for numerical solution of elliptic PDE. Commun. Numer. Methods Eng. 22, 1155–1162 (2006) 25. Of, G.: BETI - Gebietszerlegungsmethoden mit schnellen Randelementverfahren und Anwendungen. Ph.D. Thesis, University of Stuttgart (2006)
210
Zdeněk Dostál and Tomáš Kozubek
26. Of, G., Steinbach, O.: The all-floating boundary element tearing and interconnecting method. J. Numer. Math. 17(4), 277–298 (2009) 27. Park, K.C., Felippa, C.A., Gumaste, U.A.: A localized version of the method of Lagrange multipliers. Comput. Mech. 24, 476–490 (2000) 28. Glowinski, R., Le Tallec, P.: Augmented Lagrangian and OperatorSplitting Methods in Nonlinear Mechanics. SIAM, Philadelphia (1989) 29. Simo, J.C., Laursen, T.A.: An augmented Lagrangian treatment of contact problems involving friction. Comput. Struct. 42, 97–116 (1992) 30. Dostál, Z., Friedlander, A., Santos, S.A.: Solution of contact problems of elasticity by FETI domain decomposition. Domain decomposition methods 10. AMS, Providence. Contemp. Math. 218, 82–93 (1998) 31. Dostál, Z., Horák, D.: Scalable FETI with optimal dual penalty for a variational inequality. Numer. Linear Algebra Appl. 11(5–6), 455–472 (2004) 32. Dostál, Z., Horák, D.: Scalability and FETI based algorithm for large discretized variational inequalities. Math. Comput. Simul. 61(3–6), 347– 357 (2003) 33. Dostál, Z.: Box constrained quadratic programming with proportioning and projections. SIAM J. Optim. 7(3), 871–887 (1997) 34. Dostál, Z., Horák, D.: Theoretically supported scalable FETI for numerical solution of variational inequalities. SIAM J. Numer. Anal. 45(2), 500–513 (2007) 35. Badea, L., Tai, X.C., Wang, J.: Convergence rate analysis of a multiplicative Schwarz method for variational inequalities. SIAM J. Numer. Anal. 41(3), 1052–1073 (2003) 36. Lee, J.: Two domain decomposition methods for auxiliary linear problems for a multibody variational inequality. SIAM J. Sci. Computing. 35(3), 1350–1375 (2013) 37. Dostál, Z., Horák, D., Stefanica, D.: A scalable FETI-DP algorithm for a coercive variational inequality. IMACS J. Appl. Numer. Math. 54(3–4), 378–390 (2005) 38. Dostál, Z., Horák, D., Stefanica, D.: A scalable FETI-DP algorithm for semi-coercive variational inequalities. Comput. Methods Appl. Mech. Eng. 196(8), 1369–1379 (2007) 39. Dostál, Z., Horák, D., Stefanica, D.: A scalable FETI-DP algorithm with non-penetration mortar conditions on contact interface. J. Comput. Appl. Math. 231(2), 577–591 (2009) 40. Lee, C.-O., Park, J.: A dual-primal finite element tearing and interconnecting method for nonlinear variational inequalities utilizing linear local problems. Int. J. Numer. Methods Eng. 122(22), 6455–6475 (2021) 41. Dostál, Z., Brzobohatý, T., Horák, D., Kružík, J., Vlach, O.: Scalable Hybrid TFETI-DP Methods for Large Boundary Variational Inequalities, Proceedings of the Domain Decomposition Methods in Science and Engineering XXVI, Brenner, S.C. et al. (eds.). LNCSE, vol. 145, pp. 27–38, Springer, Heidelberg (2022)
10 TFETI for Scalar Problems
211
42. Dostál, Z., Horák, D., Kružík, J., Brzobohatý, T., Vlach, O.: Highly scalable hybrid domain decomposition method for the solution of huge scalar variational inequalities. Numer. Algorithms 91, 773–801 (2022) 43. Dostál, Z., Horák, D., Brzobohatý, T., Vodstrčil, P.: Bounds on the spectra of Schur complements of large H-TFETI-DP clusters for 2D Laplacian. Numer. Linear Algebra Appl. 28(2), e2344 (2021) 44. Smith, R.L.: Some interlacing properties of Schur complement of a Hermitian matrix. Linear Algeb. Appl. 177, 137–144 (1992)
Chapter 11
Frictionless Contact Problems Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
We shall extend the results introduced in the previous chapter to the solution of multibody contact problems of elasticity without friction. We shall restrict our attention to linear elasticity problems, i.e., we assume small deformations and linear stress-strain relations. Moreover, we shall be interested mainly in computationally challenging 3D problems. TFETI for the solution of frictionless contact problems is similar to TFETI for scalar variational inequalities in the previous chapter. Apart from more complicated formulae and kernel spaces, the main difference is in the discretization of linearized non-penetration conditions. Here, we shall restrict our attention to the most simple node-to-node non-penetration conditions, leaving the discussion of more sophisticated biorthogonal mortars to Chap. 16. The FETI-type domain decomposition methods comply well with the structure of multibody contact problems, the description of which enhances the decomposition into the subdomains defined by the bodies involved in the problem. Notice that if we decompose the bodies into subdomains, we can view the result as a new multibody problem to find the equilibrium of a system of bodies that are possibly interconnected and do not penetrate each other. The FETI methods treat each subdomain separately, an excellent asset for parallel implementation. Moreover, the algorithm efficiently treats the “floating” bodies whose Dirichlet boundary conditions admit a rigid body motion. A unique feature of FETI is the existence of a projector to the coarse space that contains the solution. The basic TFETI-based algorithms presented here can effectively use tens of thousands of cores to solve both coercive and semicoercive contact problems decomposed into tens of thousands of subdomains and discretized by billions of nodal variables. For still more nodal variables, the initialization Z. Dostál • T. Kozubek • V. Vondrák Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]; e-mail: [email protected]; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_11
213
214
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
of the solving procedure, in particular assembling the projectors, starts to dominate the costs. Some modifications suitable for emerging exascale technologies are described in Chaps. 15 and 20.
Fig. 11.1 Two-body contact problem
11.1 Linearized Non-penetration Conditions Let a system of bodies in a reference configuration occupy open bounded domains .Ω1 , . . . , Ωs ⊂ R3 with the Lipschitz boundaries .Γ1 , . . . , Γs . Suppose p q that some .Γp comprises a part .Γpq C ⊆ Γ that can get into contact with .Γ as in Fig. 11.1. We assume that .ΓpC is sufficiently smooth, so that there is a well-defined outer unit normal .np (x) at almost each point .x ∈ ΓpC . After the deformation, each point .x ∈ Ωp moves into yp (x) = x + up (x),
.
where .up = up (x) is the displacement vector which defines the deformation of .Ωp . The mapping .yp : Ωp → R3 is injective and continuous. The nonpenetration condition requires x ∈ ΓpC
.
⇒ x + up (x) ∈ / yq (Ωq ),
q ∈ {1, . . . , s},
p = q.
It is difficult to enhance the latter condition into an effective computational scheme, so we shall replace it by linearized relations. For each couple of indices pq .p, q which identifies .ΓC = ∅, we choose one index to identify the slave side of a possible contact interface. This choice defines the contact coupling set .S of all ordered couples of indices, the first component of which refers to the nonempty slave side of the corresponding contact interface. For each
11 Frictionless Contact Problems
215
(p, q) ∈ S , we then define a one-to-one continuous mapping
.
qp χpq : Γpq C → ΓC
.
qp q which assigns to each .x ∈ Γpq C a point of the master side .ΓC ⊆ ΓC that is assumed to be after the deformation near to .x, as in Fig. 11.2. The (strong) linearized non-penetration condition then reads
Fig. 11.2 Linearized non-penetration
.
up (x) − uq ◦ χpq (x) · np (x) ≤ χpq (x) − x · np (x), x ∈ Γpq C , (p, q) ∈ S , (11.1)
where .np is an approximation of the outer unit normal to .Γp after the deformation. The linearized condition is exact if .np = np (x) is orthogonal to .Γpq C in the deformed state and the vector .χpq (x) − x moves into the position which is parallel with .np . This observation can be used to develop an iterative improvement for enforcing the non-penetration condition. See also the discussions in the books by Kikuchi and Oden [2], Laursen [3], or Wriggers [4].
11.2 Equilibrium of a System of Elastic Bodies in Contact Having described the non-penetration condition, let us switch to the conditions of equilibrium of a system of bodies .Ω1 , . . . , Ωs . Let each .Γp , .p = 1, . . . , s, consists of three disjoint parts .ΓpU , .ΓpF , and .ΓpC , .Γp = ΓpU ∪ ΓpF ∪ ΓpC , and let the volume forces .f p : Ωp → R3 , the zero boundary displacements p p p p 3 .uΓ : ΓU → {o}, and the boundary traction .fΓ : ΓF → R be given. We admit p .ΓU = ∅, but in this case we assume some additional restrictions to guarantee that a solution exists. To enhance the contact with a rigid obstacle, we admit the bodies with a priori defined zero displacements. In this case, only the contact boundary of such bodies is relevant in our considerations.
216
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
Let us choose a contact coupling set .S , so that for each .(p, q) ∈ S , .Γpq C denotes the part of .ΓpC which can get into contact with .Γq , and let us define qp qp of .Γq a one-to-one continuous mapping .χpq : Γpq C → ΓC onto the part .Γ p which can come into contact with .Γ . Thus, p pr q Γ = Γ and Γ = Γrq C C C C. . (p,r)∈S
(r,q)∈S
Let .vp : Ωp → R3 , p = 1, . . . , s, denote a sufficiently smooth mapping, so that the related concepts are well defined, and denote Ω = Ω1 ∪ · · · ∪ Ωs .
v = (v1 , . . . , vs ),
.
Notice that if .x ∈ Ω, then there is a unique .p = p(x) such that .x ∈ Ωp , so we can define v(x) = vp(x) (x)
.
for
x ∈ Ω.
We assume that the small strain assumption is satisfied, so that the straindisplacement relations are defined for any .x ∈ Ω by the Cauchy small strain tensor ε(v)(x) = ε(v) =
.
1 ∇v + (∇v)T 2
with the components εij (v) =
.
1 2
∂vj ∂vi + ∂xi ∂xj
,
i, j = 1, 2, 3.
(11.2)
For simplicity, we assume that the bodies are made of an isotropic linear elastic material so that the constitutive equation for the Cauchy stress tensor .σ is given in terms of the fourth-order Hooke elasticity tensor .C by σ(v) = C : ε(v) = λ tr (ε(v)) Id + 2με(v),
.
(11.3)
where .λ > 0 and .μ > 0 are the Lamé parameters which are assumed to be constant in each subdomain .Ωp , .p = 1, . . . , s. The Lamé coefficients can be easily calculated by means of the Poisson ratio .ν and Young’s modulus E using λ=
.
Eν , (1 + ν)(1 − 2ν)
μ=
E , 2(1 + ν)
so the components of the elasticity tensor are given by ν E δij δk + δik δj , i, j, k, = 1, 2, 3. . Cijk = 1 + ν 1 − 2ν
(11.4)
11 Frictionless Contact Problems
217
The components of the stress tensor are given componentwise by 3
σ (v) =
Cijk εk (v),
. ij
i, j = 1, 2, 3.
k,=1
Using the above notations, we can write the linearized elastic equilibrium condition and the Dirichlet and Neumann boundary conditions for the displacement .u = (u1 , . . . , us ) as .
− div σ(u) = f
in Ω,
u = o in ΓpU , σ(up )np = fΓp on ΓpF , p
(11.5)
where .np denotes the outer unit normal to .Γp which we assume to exist almost everywhere. Moreover, we assume that all objects are sufficiently smooth so that the equations can be satisfied point-wise, postponing more realistic assumptions to the next section. The equations can be written componentwise, e.g., the first equation of (11.5) reads 3 ∂ σij (u) + fi = 0 in Ω, i = 1, 2, 3. ∂xj j=1
.
To complete the classical formulation of frictionless contact problems, we have to specify the boundary conditions on .ΓC . Assuming that .(p, q) ∈ S , we introduce the notation u − u ◦ χ · n = up (x) − uq ◦ χpq (x) · np (x), x ∈ Γpq C , pq p . pq g = χ (x) − x · n (x), x ∈ ΓC , and use (11.1) to get the non-penetration condition .
(u − u ◦ χ) · n ≤ g,
x ∈ Γpq C ,
(p, q) ∈ S .
(11.6)
The surface traction .λ on the slave side of the active contact interface .Γpq C and on the rest of .Γpq C is given by λ = λN = σ(up )np
.
and
λ=0
respectively. Since we assume that the contact is frictionless, the tangential component of .λ is zero, i.e., λ = (λ · np )np ,
.
and the linearized conditions of equilibrium read
218 .
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
λ · np ≥ 0 and (λ · np ) (up − uq ◦ χ)np − g = 0,
x ∈ Γpq C , (p, q) ∈ S . (11.7)
The second condition in (11.7) is called the complementarity condition. Newton’s law requires that the normal traction acting on the contacting surfaces is equal and opposite, so that (11.8)
.
The system of equations and inequalities (11.5)–(11.8) with the assumption that the tangential component of .λ is zero represents the classical formulation of multibody frictionless contact problems. Denoting by .λn and .[un ] the contact stress and the jump of the boundary displacements, respectively, i.e., p .λn = λ · n , [un ] = up − uq ◦ χ · np , x ∈ Γpq C , we can write the contact conditions briefly as .
[un ] ≤ g, λn ≥ 0, λn ([un ] − g) = 0, λ = λn np , x ∈ Γpq C , (p, q) ∈ S . (11.9)
11.3 Variational Formulation The classical formulation of contact problem (11.5) and (11.9) makes sense only when the solution complies with strong regularity assumptions which are not satisfied by the solution of realistic problems. For example, if a body is not homogeneous, the equilibrium on the material interface requires additional equations. The basic idea is to require that the equilibrium conditions are satisfied in some average. To formulate it more clearly, let us define the spaces 3 V p = v ∈ H 1 (Ωp ) : v = o on ΓpU , p = 1, . . . , s,
.
V = V 1 × · · · × V s,
and the convex set K = {v ∈ V : [vn ] ≤ g on Γpq C , (p, q) ∈ S } ,
.
where the relations should be interpreted in the sense of traces. Let us first assume that .up , vp ∈ V p are sufficiently smooth, so that we can define .
11 Frictionless Contact Problems
219
and the bilinear form
ap (up , vp ) =
σ (up ) : ε (vp ) dΩ.
.
Ωp
Using the symmetry of means of
, we can get an alternative expression for .ap by
3 ∂vjp 1 ∂vip p .σ (u ) : ε (v ) = σij (u ) + 2 i,j=1 ∂xj ∂xi 3 p ∂vjp 1 p ∂vi p = + σji (u ) σij (u ) 2 i,j=1 ∂xj ∂xi p
p
=
3 i,j=1
σij (up )
(11.10)
∂vip = σ(up ) : ∇vp . ∂xj
Let us assume that .u ∈ K is a sufficiently smooth solution of (11.5) and (11.9), and let .v ∈ V be sufficiently smooth, so that the Green formula is valid, i.e.,
(11.11)
.
After multiplying the first equation of (11.5) by .vp − up and integrating the result over .Ωp , we get
.
We can also use (11.10) and (11.11) to get
.
After comparing the latter two equations and summing up, we get
.
Using the boundary conditions, the boundary integrals can be modified to
.
220
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
Denoting (v) =
s
.
a(u, v) =
p=1 s
p (v),
p (v) = Γp F
fΓp · vp dΓ +
f p · vp dΩ Ωp
ap (up , vp ),
p=1
we can rewrite the above relations for the solution .u ∈ K and .v ∈ V as (11.12)
.
Moreover, assuming that .(p, q) ∈ S , .np = −nq ◦ χ, and .v ∈ K , we get
. σ(up )np · (vp − up )dΓ + σ(uq )nq · (vq − uq )dΓ Γpq C
Γqp C
=
Γpq C
=
Γpq C
=
Γpq C
= Γpq C
λ · up − vp + vq − uq ) ◦ χpq dΓ λn ([un ] − [vn ])dΓ λn ([un ] − g + g − [vn ])dΓ λn (g − [vn ])dΓ ≥ 0,
so any solution .u of the frictionless problem (11.5) and (11.9) satisfies the variational inequality a(u, v − u) ≥ (v − u),
.
v∈K.
(11.13)
Inequality (11.13) characterizes a minimizer of the quadratic function q(v) =
.
1 a(v, v) − (v) 2
defined on V (Theorem 4.7). Denoting .v = u + d, we can rewrite (11.13) as a(u, d) − (d) ≥ 0,
.
u+d∈K ,
so 1 q(u + d) − q(u) = a(u, d) − (d) + a(d, d) ≥ 0, 2
.
i.e.,
11 Frictionless Contact Problems .
221
u = arg min q(v).
(11.14)
v∈K
The inequality (11.13) and problem (11.14) are well defined for more general functions than the piece-wise continuously differentiable functions assumed in (11.5) and (11.9). Indeed, if .f p ∈ L2 (Ωp ) and .fΓp ∈ L2 (ΓpF ), then a and . can be evaluated with .v ∈ V provided the boundary relations concerning .vp are interpreted in the sense of traces. Moreover, it can be proved that if .u is a solution of such generalized problem, then it has a natural mechanical interpretation. It is well known that a minimizer of q on .K exists when q is coercive on .K , i.e., v∈K,
.
v → ∞
⇒
q(v) → ∞,
with the norm induced by the broken scalar product
(u, v) =
s
u · v dΩ.
.
p=1
Ωp
If .ΓpU = ∅ for some .p ∈ {1, . . . , s}, then the coercivity condition is satisfied if .
a(v, v) = 0
⇒
(v) < 0,
v∈K.
In this case, a solution exists, but it need not be unique.
Fig. 11.3 TFETI domain decomposition with subdomain renumbering
(11.15)
222
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
11.4 Tearing and Interconnecting Our next step is to reduce the contact problem into a number of “small” problems at the cost of introducing additional constraints. To this end, let us decompose each .Ωp into subdomains with sufficiently smooth boundaries as in Fig. 11.3, assign each subdomain a unique number, and introduce new interconnecting conditions on the artificial inter-subdomain boundaries. We denote by H the diameter of the largest subdomain, also called the decomposition parameter. In the early papers, the subdomains were required to have similar shape and size. The latter conditions indeed affect the performance of the algorithms, but they are not necessary for the optimality theory presented here. We decompose also the parts of the boundaries .ΓpU , ΓpF , and .ΓpC , .p = 1, . . . ,s, and introduce their numbering so that it complies with the decomposition of the subdomains. For the artificial inter-subdomain boundaries, we introduce a notation in analogy to that concerning the contact boundary, i.e., pq qp pq qp p .ΓG denotes the part of .Γ which is “glued” to .ΓG . Obviously, .ΓG = ΓG , and pq it is possible that .Γ = ∅. We shall also tear the boundary subdomains from the boundary and enforce the Dirichlet boundary conditions by the equality constraints. An auxiliary decomposition of the problem of Fig. 11.1 with renumbered subdomains is in Fig. 11.3. To enhance the gluing conditions (11.16)
.
. (11.17) into the variational formulation, we shall choose the contact coupling set .S and the test spaces 3 p VDD = v ∈ H 1 (Ωp ) : vp = o on ΓpU , p = 1, . . . , s, .
VDD =V 1 × · · · × V s , pq p q KDD = {v ∈ VDD : [vn ] ≤ g on Γpq C , (p, q) ∈ S ; v = v on ΓG , p, q = 1, . . . , s} ,
where the relations should be interpreted in the sense of traces. Recall that for .x ∈ Γpq C , .(p, q) ∈ S , and .v ∈ KDD , [vn ] = (vp − vq ◦ χpq ) · np ,
.
g = (χpq (x) − x) · np .
If .u is a classical solution of the decomposed problem which satisfies the gluing conditions (11.16) and (11.17), then for any .v ∈ KDD and .p, q = 1, . . . s,
.
11 Frictionless Contact Problems
223
It follows that the inequality (11.13) holds also for the forms defined for the decomposed problems. Denoting a(u, v) = .
(v) =
s
ap (up , vp ),
p=1 s
p=1
Γp F
fΓp
· (v − u ) dΓ + p
p
s
p=1
f p · (vp − up ) dΩ, Ωp
we conclude that any classical solution .u of the frictionless decomposed problem (11.5), (11.9), (11.16), and (11.17) satisfies the variational inequality .
a(u, v − u) ≥ (v − u),
v ∈ KDD ,
(11.18)
and by Theorem 4.7, .
q(u) ≤ q(v),
v ∈ KDD ,
q(v) =
1 a(v, v) − (v). 2
(11.19)
It can be proved that any sufficiently smooth solution of (11.18) or (11.19) is a classical solution of the frictionless contact problem.
11.5 Discretization Let us decompose each subdomain .Ωp into elements, e.g., tetrahedra, the shape of which is determined by the connection data and the positions of nodes (vertices) p .xj , j = 1, . . . , np . Let h denote the maximum of the elements’ diameters, also called the discretization parameter. We shall use h also as a label which identifies the set of all elements .Th (Ωp ). We assume that the elements are shape regular , i.e., there is a constant .cs > 0 independent of h such that the diameter .h(τ ) of each element .τ ∈ Th (Ωp ) and the radius .ρ(τ ) of the largest ball inscribed into .τ satisfy ρ(τ ) ≥ cs h(τ ),
.
and that the discretization is quasi-uniform, i.e., there is a constant .cd > 0 independent of h such that for any element .τ ∈ Th (Ωp ), h(τ ) ≥ cd h.
.
It follows that if the discretization is quasi-uniform with the shape regular elements, then there is a constant .cr > 0 such that for any .τ ∈ Th (Ωp ),
224
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
ρ(τ ) ≥ cr h.
(11.20)
.
We also assume that the subdomains consist of the unions of elements, i.e., the subdomain boundaries do not cut through any element and the grids are matching on the “gluing” interface of the subdomains. Here, we also assume that the grids match on the contact interface, i.e., the set of nodes on each pq onto the slave part .Γpq C of a contact interface is mapped by a bijection .χ qp set of nodes on the master part .ΓC of the contact interface, postponing the generalization to Chap. 16. On each element .τ ∈ Th (Ωp ), we define the linear shape function as the restriction of the linear combination of the basis functions Φpj,k : Ωp → R3 ,
.
j = 1, . . . , np ,
k = 1, 2, 3.
We assume that the basis functions are piece-wise linear with just one component equal to one in one of the nodes and remaining components and values in the other nodes equal to zero. More formally, .Φpj,k is defined by the nodal values as ⎤ ⎡ δ1k p p ⎣ δ2k ⎦ , Φp (xp ) = o, j = , , j = 1, . . . , np , k = 1, 2, 3, .Φ j,k (xj ) = j,k δ3k where .δik denotes the Kronecker delta. The assumptions on the discretization are sufficient to get a bound on the derivatives in terms of h. Lemma 11.1 Let the basis functions .Φpj,k be defined by a quasi-regular discretization with shape regular elements and a discretization parameter h. Then there is a positive constant .C1 such that for nearly all .x ∈ supp Φpj,k , ε(Φpj,k )(x)2F ≤ C1 h−2 ,
.
j = 1, . . . , np ,
k = 1, 2, 3,
where .supp Φpj,k denotes the support of .Φpj,k . Proof Let .τ ⊆ supp Φpj,k , and let .x denote the center of the largest ball of radius .ρ(τ ) inscribed into .τ . Then the distance d of the vertex .xpj from the plane defined by the remaining three vertices of .τ is greater than or equal to .ρ(τ ), so that by (11.20), ∂ Φpj,k (x) ≤ 1/d ≤ 1/ (τ ) ≤ C2 /h, ∂xk
.
See Fig. 11.4 for the illustration in 2D.
C2 = 1/cr .
.
11 Frictionless Contact Problems
225
xpj
h
x d
( )
Fig. 11.4 Height of the triangle and inscribed circle
In what follows, we shall use linear indexing to identify the basis functions by a unique index so that Φp = Φpj,k ,
= 3(j − 1) + k.
.
We shall look for an approximate solution .uh in the trial space .Vh which is spanned by the basis functions, .
Vh = Vh1 × · · · × Vhs , Vhp = Span{Φp , = 1, . . . , np },
np = 3np .
Let us decompose .uh into the components .uph which comply with the decomposition, i.e., uh = (u1h , . . . , ush ).
.
Using the coordinate vectors up = [up1 , . . . , upnp ]T ,
.
p = 1, . . . , s,
we can get .uph in the form p .u h
=
np
up Φp .
=1 p
After substituting into the forms .a , we get
226
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
a(uh , vh ) =
s
.
ap (uph , vhp ),
p=1
a
p
(uph , vhp )
=
[Kp ]m =
np np
p ap (Φp , Φpm )up vm = uTp Kp vp ,
(11.21)
=1 m=1 ap (Φp , Φpm ).
Similarly, (uh ) =
s
.
p (uph ),
p=1
p (uph ) = [fp ] =
np
up p (Φp ) = fpT up ,
(11.22)
=1 p (Φp ).
To simplify the following exposition, we introduce the notation for global objects, so that we can write ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ K1 O · · · O u1 f1 ⎢ O K2 · · · O ⎥ ⎢ ⎥ ⎢ .. ⎥ ⎢ .. ⎥ .K = ⎢ . . . ⎥, u = ⎣ . ⎦, f = ⎣ . ⎦, ⎣ .. .. . . . .. ⎦ us fs O O · · · Ks and a(uh , uh ) = uT Ku,
.
(uh ) = f T u.
(11.23)
The finite element approximation of (11.19) gives rise to the QP problem .
min
1 T u Ku − f T u 2
s.t. BI u ≤ cI
and
B E u = cE ,
(11.24)
where .K denotes an SPS block diagonal matrix of order n, .BI denotes an mI × n full rank matrix, .BE denotes an .mE × n full rank matrix, .f ∈ Rn , m m .cI ∈ R I , and .cE ∈ R E . We use the same notation for nodal displacements as we used for continuous displacements. We shall sometimes denote the nodal displacements by .uh , indicating that it was obtained by the discretization using the elements with the diameter less than or equal to h. The blocks .Kp of .K, which correspond to .Ωp , are SPS sparse matrices with known kernels, the rigid body modes. Since we consider 3D problems, the dimensions of the kernels of .Kp and .K are six and 6s, respectively. The vector .f describes the nodal forces arising from the volume forces and/or some other imposed traction. .
11 Frictionless Contact Problems
227
The matrix .BI ∈ RmI ×n and the vector .cI describe the linearized nonpenetration conditions. The rows .bk of .BI are formed by zeros and appropriately placed multiples of coordinates of an approximate outer unit normal on the slave side. If .np is an approximate outer normal vector at .xp ∈ ΓpC on the slave side and .xq = χ(xp ) is the corresponding node on the master side, then there is a row .bk∗ of .BI such that T
bk uh = (uph − uqh ) np ,
.
where .uph and .uqh denote the discretized displacements at .xp and .xq , respectively. The entry .ck of .cI describes the normal gap between some .xp and .xq , i.e., c = (xp − xq )T np .
. k
The matrix .BE ∈ RmE ×n enforces the prescribed zero displacements on the part of the boundary with imposed Dirichlet’s condition and the continuity of the displacements across the auxiliary interfaces. The continuity requires that .bi uh = ci = 0, where .bi are the rows of .BE with zero entries except 1 and .−1 at appropriate positions. Typically, .m = mI + mE is much smaller than n. If k subdomains have a joint node .x, i.e., .x ∈ Ωi1 ∩ · · · ∩ Ωik , then the interconnection of the subdomains at .x for 3D problems requires .3(k − 1) rows of .BE . Notice that the rows of .BE that are associated with different nodes are orthogonal, so that we can use effectively the Gram-Schmidt orthonormalization procedure just to the wire-basket rows to get .BE with orthonormal rows. Remark 11.1 We can achieve that the rows of .B = [BTE , BTI ]T are orthonormal provided each node is involved in at most one inequality. This is always possible for two bodies or any number of smooth bodies. To simplify the formulation of optimality results, we shall assume in what follows (except Chap. 16) that .
BBT = I.
(11.25)
In what follows, we assume that the discretized problem has a unique solution.
11.6 Dual Formulation Even though (11.24) is a standard convex QP problem, its formulation is not suitable for numerical solutions. The reasons are that .K is typically illconditioned and singular and the feasible set is in general so complex that projections onto it can hardly be effectively computed. Under these circumstances, it would be complicated to identify the active set of the solution quickly and find a fast algorithm for auxiliary linear problems.
228
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
The complications mentioned above can be essentially reduced by applying the duality theory of convex programming (see Sect. 3.7). The Lagrangian associated with problem (11.24) reads .
1 T u Ku − f T u + λTI (BI u − cI ) + λTE (BE u − cE ), (11.26) 2
L(u, λI , λE ) =
where .λI and .λE are the Lagrange multipliers associated with the inequalities and equalities, respectively. Introducing the notation λI BI cI .λ = , B= , and c = , λE BE cE we can write the Lagrangian briefly as 1 T u Ku − f T u + λT (Bu − c). 2
L(u, λ) =
.
Using Proposition 3.13, we get that (11.24) is equivalent to the saddle point problem .
= sup inf L(u, λ). L( u, λ) λI ≥o u
(11.27)
For a fixed .λ, the Lagrange function .L(·, λ) is convex in the first variable, and the minimizer .u of .L(·, λ) satisfies .
Ku − f + BT λ = o.
(11.28)
Equation (11.28) has a solution if and only if .
f − BT λ ∈ ImK,
(11.29)
which can be expressed more conveniently by means of a matrix .R, the columns of which span the null space of .K as .
RT (f − BT λ) = o.
(11.30)
The matrix .R may be formed directly, block by block, using any basis of the rigid body modes of the subdomains. In our case, each .Ωp is assigned six columns with the blocks ⎤ ⎡ 0 − zi y i 1 0 0 0 − xi 0 1 0 ⎦ . ⎣ zi 0 0 0 1 −yi xi and .O ∈ R3×6 associated with each node .Vi ∈ Ωp and .Vj ∈Ω / p , respectively. Using the Gram-Schmidt procedure, we can find .Ri such that
11 Frictionless Contact Problems
Im Ri = Ker Ki ,
.
229
R = diag(R1 , . . . , Rs ),
RT R = I.
Now assume that .λ satisfies (11.29), so that we can evaluate .λ from (11.28) by means of any (left) generalized matrix .K+ which satisfies .
KK+ K = K.
(11.31)
It may be verified directly that if .u solves (11.28), then there is a vector .α such that u = K+ (f − BT λ) + Rα.
.
(11.32)
The evaluation of the action of a generalized inverse which satisfies (11.31) is simplified by the block structure of .K. Using Lemma 2.1, the kernel of each stiffness matrix .Ki can be used to identify a nonsingular submatrix .KI I of the same rank as .Ki . The action of the left generalized inverse .K# i (2.12) can be implemented by Cholesky’s decomposition. Observe that # K# = diag(K# 1 , . . . , Ks ).
.
Alternatively, it is possible to use the fixing point strategy of Sect. 11.8. After substituting expression (11.32) into problem (11.27), changing the signs, and omitting the constant term, we get that .λ solves the minimization problem .
min Θ(λ)
s.t.
λI ≥ o and
RT (f − BT λ) = o,
(11.33)
where .
Θ(λ) =
1 T λ BK+ BT λ − λT (BK+ f − c). 2
(11.34)
Recall that using the same procedure as in Sect. 10.7, we can rewrite the formula for .BK+ BT in terms of Schur’s complements to get .
Θ(λ) =
1 T λ B∗B S+ BT∗B λ − λT (BK+ f − c), 2
S = diag(S1 , . . . , Ss ), (11.35)
where .B denotes the set of the column indices of .B corresponding to the boundary variables. α) of (11.33) is known, the solution .u of (11.24) Once the KKT pair .(λ, may be evaluated by (11.32). If .α is not known, then we can use (3.69) or the efficient procedure described in Sect. 11.9.
230
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
11.7 Preconditioning by Projectors to Rigid Body Modes As in Chap. 10, further improvement can be achieved by adapting some simple observations originating in Farhat et al. [11]. Let us first simplify the notations by denoting = BK+ f − c, F = F, = BK+ BT , d F d ˜ = F −1 d, . F = F −1 F, = R T BT , G e = RT f , and let .T denote a regular matrix which defines the orthonormalization of so that the matrix the rows of .G G = TG
.
has orthonormal rows, i.e., GGT = I.
.
A standard way to get .T is to use the Cholesky factorization G T = LLT G
.
and set T = L−1 .
.
After denoting e = T e,
.
problem (11.33) reads .
min
1 T ˜ λ Fλ − λT d 2
s.t.
λI ≥ o
and
Gλ = e.
(11.36)
For practical computation, we can use an approximate value of the norm F ≈ F.
.
can be obtained by substituting Notice that a good approximation of .F −1 the estimates of .Si into (10.31). The scaling of .F was not necessary in Sect. 10.6. As in Chap. 10, we shall transform the problem of minimization on the subset of an affine space to that on the subset of a vector space by means of which satisfies arbitrary .λ
11 Frictionless Contact Problems
231 .
= e. Gλ
(11.37)
we can look for the solution of (11.36) in the form .λ = μ + λ. Having .λ, Though a natural choice for .λ is the least squares solution = GT (GGT )−1 e = GT e, λ
.
the following lemma shows that if we solve a coercive problem (11.19), then such that .λ I = o. we can even find .λ Lemma 11.2 Let the problem (11.24) be obtained by the discretization of a coercive problem, i.e., let the prescribed displacements of each body .Ωp be sufficient to prevent its rigid body motion, and let .G = [GI , GE ]. Then .GE is a full rank matrix, and oI .λ = (11.38) GTE (GE GTE )−1 e = e. I = oI and .Gλ satisfies .λ Proof First observe that I , TG E ], G = [TG
.
T ξ = BE Rξ = o implies .ξ = o. Since the entries so it is enough to prove that .G E of .BE Rξ denote the jumps of .Rξ across the auxiliary interfaces or the violation of prescribed Dirichlet boundary conditions, it follows that .BE Rξ = o implies that .u = Rξ satisfies both the discretized Dirichlet conditions and the “gluing” conditions, but belongs to the kernel of .K. Thus, .ξ = o contradicts our assumption that the problem (11.24) was obtained by a correct discretization . of (11.19). based on Lemma 11.2 guarantees .o to Let us point out that the choice of .λ be a feasible vector of the homogenized problem. If the problem (11.19) is which solves semicoercive, we can achieve the same effect with .λ .
min
1 λ2 2
s.t.
Gλ = e and λI ≥ o.
To carry out the transformation, denote λ = ν + λ,
.
so that .
1 T ˜ − Fλ) + 1λ −λ T d ˜ = 1 ν T Fν − ν T (d ˜ T Fλ λ Fλ − λT d 2 2 2
and problem (11.36) reduces to
232
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
min θ(ν)
.
s.t.
Gν = o and
I ν I ≥ I = − λ
(11.39)
with θ(ν) =
.
1 T ν Fν − ν T d, 2
˜ − Fλ. d=d
Let us point out that the proof of Lemma 11.2 is the only place where we assume that our problem (11.24) is coercive. Our final step is based on the observation that problem (11.39) is equivalent to .
min θρ (ν)
s.t.
Gν = o and
ν I ≥ I ,
(11.40)
where .ρ is an arbitrary positive constant, θ (ν) =
. ρ
1 T ν (PFP + ρQ)ν − λT Pd, 2
Q = GT G,
and
P = I − Q.
Recall that .P and .Q denote the orthogonal projectors on the image space of GT and on the kernel of .G, respectively. The regularization term is introduced in order to simplify the reference to the results of QP that assume the nonsingular Hessian of f. Notice that . ≈ F provides a good regularization and contributes to the favorable distribution of the spectrum of .H = PFP + ρQ.
.
11.8 Stable Evaluation of K+ x by Using Fixing Nodes In Sect. 2.4, we described an effective method that can be used for the identification of a nonsingular submatrix .KRR of the local stiffness matrix .Ki that has its dimension equal to the rank of .Ki and used it to the effective evaluation of the action of .K+ . Here, we shall describe an alternative procedure that can be applied when .KRR is ill-conditioned. The application to the Schur complement .S is straightforward. To simplify the notation, let us assume that .K = Ki ∈ Rn×n . If we choose M nodes that are neither near each other nor placed near any line, .M ≥ 3, then the submatrix .KRR of .K defined by the set .R of remaining indices is “reasonably” nonsingular. The latter is not surprising, since .KRR is the stiffness matrix of the body that is fixed at the chosen nodes. Using the arguments of mechanics, we deduce that fixing of the chosen nodes prevents rigid displacements of the body. We call the M chosen nodes the fixing nodes and denote by .F the set of indices of the fixed displacements. We start with the reordering of .K to get T KRR KRF LRR LTF R LRR O T . K = PKP = (11.41) , = KF R KF F LF R I O S
11 Frictionless Contact Problems
233
where .LRR ∈ Rr×r is a lower factor of the Cholesky decomposition of .KRR , s×r .LF R ∈ R , .s = 3M , .LF R = KF R L−T RR , .P is a permutation matrix, and s×s with respect to .KRR defined .S ∈ R is the Schur complement matrix of .K by S = KF F − KF R K−1 RR KRF .
.
To find .P, we proceed in two steps. First we form a permutation matrix P1 to decompose .K into blocks KRR KRF T . P1 KP1 = (11.42) , KF R KF F
.
where .KRR is nonsingular and .KF F corresponds to the degrees of freedom of the M fixing nodes. Then we apply a suitable reordering algorithm on T .P1 KP1 to get a permutation matrix .P2 which leaves the block .KF F without changes and enables the sparse Cholesky decomposition of .KRR . Further, we decompose .PKPT as shown in (11.41) with .P = P2 P1 . To preserve the sparsity, we can use any sparse reordering algorithm. The choice depends on the way in which the sparse matrix is stored and on the problem geometry. Using Lemma 2.2, we get −T −T T + O L−1 + T LRR − LRR LF R S RR .K =P (11.43) P, O S+ −LF R L−1 RR I where .S+ ∈ Rs×s denotes a left generalized inverse of .S. Since s is small, we can substitute for .S+ the Moore-Penrose generalized inverse .S† ∈ Rs×s computed by SVD. Alternatively, we can use a sparse nonsingular generalized inverse. First observe that the eigenvectors of .S that correspond to zero are the traces of vectors from .Ker K on the fixing nodes. = o, then Indeed, if .Ke KRR eR + KRF eF = o,
.
KF R eR + KF F eF = o,
and .
−1 K FF − K FRK SeF = (K RR RF )eF = o.
(11.44)
Having the basis of the kernel of .S, we can define the orthogonal projector .
−1 T Q = RF ∗ RTF ∗ RF ∗ RF ∗
onto the kernel of .S and specify .S+ in (11.43) by S+ = (S + ρQ)
.
−1
= S† + ρ−1 Q,
ρ > 0.
234
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
We use .ρ ≈ K. To see that .S+ is a nonsingular left generalized inverse, notice that −1 + .SS S = S (S + ρQ) S = S S† + ρ−1 Q S = SS† S + ρ−1 SQS = S. Such approach can be considered as a variant of regularization [5]. To implement the abovementioned observations, an effective procedure for choosing uniformly distributed fixing nodes is necessary. Here, we describe a simple but effective method that combines a mesh partitioning algorithm with a method for finding a mesh center. The algorithm reads as follows. Algorithm 11.1 Algorithm for finding M uniformly distributed fixing nodes in the graph of the discretization
Step 1 can be carried out by any code for graph decompositions such as METIS [6]. The implementation of Step 3 can be based on Corollary 2.1. If the mesh is approximately regular, we can expect that more walks of length k originate from the nodes that are near the center of the mesh. It simply follows that the node with the index i which satisfies w(i, k) ≥ w(j, k), j = 1, 2, . . . , n,
.
for sufficiently large k is in a sense near to the center of the mesh and can be used to implement Step 3 of Algorithm 11.1. Recall that p = lim Dk e−1 Dk e
.
k→∞
is a unique nonnegative eigenvector corresponding to the largest eigenvalue of the mesh adjacency matrix .D. It is also known as the Perron vector [7]. It can be approximated by a few steps of the Lanczos method [8]. Thus, the index of the largest component of the approximate Perron vector of .D is a good approximation of the center of the graph of triangulation. See Brzobohatý et al. [9] for more details.
11.9 Fast Reconstruction of Displacements The solution of (11.40) can be obtained effectively by means of SMALBE-M and MPRGP. Both algorithms can exploit the favorable distribution of the spectrum of the Hessian of .θρ to get an approximation .ν k of solution .ν and a
11 Frictionless Contact Problems
235
corresponding approximation .μk of the Lagrange multiplier for the equality constraints .μ. We can get an approximation to the approximate solution .λ of the dual problem (11.33) by ≈ νk + λ λ=ν +λ
.
and use the formula (3.69) to get T BR) −1 RT B T + (f − BT λ) , α = (RT B c − BK
.
(11.45)
where = B
.
I B BE
= [B]F ,
c=
cI cE
= [c]F ,
I and . cI are formed by the rows of .BI and the components of .cI that and .B do not correspond to the active inequality constraints. We can substitute .λ and .α into (11.32) to get the displacements .
u = K+ (f − BT λ) + Rα.
(11.46)
Evaluating (11.45) requires assembling and solving of the system of linear T BR, which can be expensive when the dimenequations with the matrix .RT B sion of .ImR is large. Notice that the Lagrange multiplier .α for (11.33) is not equal to the Lagrange multiplier .μ for (11.40) when the cost function is preconditioned as suggested in Sect. 11.7, so there is a natural question whether it is possible to modify .μ to get .α and to evaluate the primal solution .u in a more effective way. Here, we show that we can get an approximation to .α by means of .μ and −1 T −1 c − BK+ (f − BT λ) = G Fλ G T G −d α= G G G
.
(11.47)
as proposed by Horák et al. [12]. If FETI is implemented as in Sect. 11.7, then G T = LLT G
.
is available, so .α can be evaluated cheaply using −T −1 −d = F L−T G(Fν − d). .α = L L G F(ν + λ) Recall that (11.45) was derived from + (f − BT λ) + BRα = BK = c. Bu
.
The procedure is based on the following proposition.
(11.48)
236
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
Proposition 11.1 Let .(ν, μ) denote the KKT pair for problem (11.40), and let .α be defined by (11.47). Then α = α − F L−T μ,
.
λ=ν +λ
(11.49)
and the primal solution .u can be evaluated by (11.46). Proof Let us start by recalling that the KKT pair .(ν, μ) of (11.40) satisfies gP (ν, μ, ) = o,
(11.50)
.
where .g is the gradient of the augmented Lagrangian defined by g (ν, μ, ) = (PFP + Q) ν − Pd + GT μ,
.
˜ − Fλ. d=d
If .F denotes the set of all indices i that do not correspond to the active inequality constraints, i.e., those of the equality constraints or those satisfying i , then .ν i > λ T . (PFP + Q) ν − Pd + G μ = o. F Noticing that Pν = ν
.
and
Qν = o,
we get .
P (Fν − d) + GT μ
F
= o.
and .d = d ˜ − Fλ, we can replace .ν in the above equation Using .ν = λ + λ by .λ to get ˜ + GT μ . P Fλ − d = o. (11.51) F
Let us multiply (11.46) on the left by .B and subtract .c from both sides to get T α − Fλ +G [Bu − c]F = BK+ (f − BT λ) − c + BRα F = d F . (11.52) ˜ − Fλ) + G T α = F (d = o. F
T on the left, we get Multiplying (11.47) by .G −1 T G T α = G c − BK+ (f − BT λ) = Q Fλ G T −d G G . ˜ . = F Q Fλ − d
(11.53)
11 Frictionless Contact Problems
237
Let us now rewrite (11.51)–(11.53) as T ˜ .− G μ = P Fλ − d , F F ˜ T α = F Fλ − d , G F F ˜ . T α = F Q Fλ − d G Taking into account that .P + Q = I, it follows that T L−T μ T α T α − F GT μ T α − F G . G = G = G F
F
(11.54) (11.55) (11.56)
F
.
Since we assume that the solution of the discretized problem is contact unique, which is possible only when the columns of . GT μ F are independent, we get α = α − F L−T μ.
.
.
11.10 Reducing the Analysis to Subdomains Recall that Im.P and Im.Q are invariant subspaces of .H , ImP + ImQ = Rm
.
as
P + Q = I,
and for any .λ ∈ R , m
H Pλ = (PFP + ρQ)Pλ = PFPλ
.
and
H Qλ = (PFP + ρQ)Qλ = ρQλ.
It follows that σ(H |ImQ) = {ρ},
.
so it remains to find the bounds on σ(H |ImP) = σ(PFP|ImP).
.
The next lemma reduces the problem to the analysis of local Schur complements. Lemma 11.3 Let there be constants .0 < c < C such that for each .λ ∈ Rm , .
Then for each .λ ∈ ImP,
cλ2 ≤ BT λ2 ≤ Cλ2 .
(11.57)
238
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
.
c
−1 −1 T 2 max Si λ ≤ λ Fλ ≤ C min λmin (Si ) λ2 , (11.58)
i=1,...,s
i=1,...,s
where .Sp denotes the Schur complement of .Kp with respect to the indices of the interior nodes of .Ωp and .λmin (Sp ) denotes the smallest nonzero eigenvalue of .Sp . Proof Replace the matrices in Lemma 10.2 by those defined above.
.
11.11 H-h Bounds on the Schur Complement’s Spectrum To prove that problem (11.40) can be solved by SMALBE-M and MPRGP algorithms with optimal complexity, it remains to show that the spectrum of the Hessian .H of the cost function .θ is uniformly bounded in terms of the decomposition and discretization parameters H and h, respectively. Due to Lemma 11.3, it is enough to give appropriate bounds on the spectrum of the Schur complements .Si .
11.11.1 An Upper Bound As in Sect. 10.8, let .Kp result from the discretization of a subdomain .ΩH = Ωp of the diameter H with the boundary .ΓH = Γ p using the discretization parameter h, and let .SH,h = Sp denote the Schur complement of the subdomain’s stiffness matrix .Kp with respect to its interior variables. Our goal is to give bounds on the spectrum of .SH,h in terms of H and h. In order to avoid a proliferation of constants, especially in some proofs and lemmas, we shall use alternatively the notation .A B (or .B A) introduced by Brenner [5] to represent the statement that there are constants .C1 and .C2 independent of h, H, .Δ, and other variables that appear on both sides such that .A ≤ C1 B (or .B ≤ C2 A). The notation .A ≈ B means that .A B and .B A. We also need the following standard estimate. Lemma 11.4 Let K= diag (K1 , . . . , Ks )
.
be the finite element stiffness matrix of Sect. 11.5 arising from a quasiuniform discretization of the subdomains .Ωp , .p = 1, . . . , s, using linear shape regular tetrahedral elements with the discretization parameter h. Then .
SH,h ≤ Kp h.
(11.59)
11 Frictionless Contact Problems
239
Proof First observe that we assume that the components .cpijkl of Hooke’s tensor of elasticity .Cp are bounded, so the energy density produced by the strain .ε(vp ) associated with the displacement .vp in the subdomain .Ωp satisfies 3
σ(vp ) : ε(vp ) ε(vp )2F =
.
ε2ij (vp ).
i,j=1
3 Let .Φp1 , . . . , Φpnp denote a finite element basis of the subspace .Vhp of . H 1 (Ωp ) that satisfies the assumptions, so that any .uph can be written in the form uph = ξ1 Φp1 + · · · + ξnp Φpnp ,
.
p = 1, . . . , s,
and using the assumptions and Lemma 11.1, we get ε(Φpi )2F h−2 .
.
p Thus, the entries .kij of .Kp satisfy
k p = ap (Φpi , Φpj ) =
. ij
ωip ∩ωjp
σ(Φpi ) : ε(Φpj )dω h3 h−2 = h,
where .ωip and .ωjp are supports of .Φpi and .Φpj , respectively. Taking into account that k p = 0 for ωip ∩ ωjp = ∅,
. ij
we get ξ T Kp ξ hξ2 .
.
To complete the proof, just notice that .K = diag (K1 , . . . , Ks ), and use the . Schur complement interlacing property (2.30).
11.11.2 A Lower Bound To estimate of the lower bound on the spectrum of the Schur complement S , we shall use the procedure developed in Sect. 10.8.2 with the role of the Poincaré inequality supplied by the Korn inequality. To formulate its side condition in a convenient form, we shall use the basis
. H,h
(r1 (x), . . . , r6 (x)),
.
x = [x1 , x2 , x3 ]T ,
of the kernel of the energy form .a(v, v). We define .rk as the columns of the matrix
240
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
⎤ 1 0 0 −x2 0 x3 0 ⎦, .R(x) = ⎣ 0 1 0 x1 −x3 0 0 1 0 x2 −x1 ⎡
rk (x) = R∗k (x),
k = 1, . . . , 6. (11.60)
Theorem 11.1 (Korn’s Inequality) Let .Ω ⊂ R3 denote a Lipschitz domain with boundary .Γ , and let .u ∈ (H 1 (Ω))3 , .u = [u1 , u2 , u3 ]T , satisfy 2
6
uT rk dΓ
.
k=1
(11.61)
= 0.
Γ
Then there is .C > 0 such that 3 .
ui 2H 1 (Ω) ≤ C
ε(u)2F dΩ.
ε (u) : ε (u)dΩ = C Ω
i=1
(11.62)
Ω
Proof See Steinbach [1, Theorems 4.17 and 2.6].
.
First notice that the right-hand side of (11.62) is dominated by the energy form a, as by (11.3) in any .x ∈ Ω σ(u) : ε(u) = (λ tr(ε(u))I3 + 2μ ε(u)) : ε(u)
.
2
= λ tr (ε(u)) + 2μ ε(u) : ε(u) ≥ 2μ ε(u) : ε(u), so that
ε(u) : ε(u)d Ω a(u, u).
.
(11.63)
Ω
Since ε(u) : ε(u) =
3
.
∇ui 2 ,
(11.64)
i=1
we can get a lower bound on the left-hand side of (11.62) by the procedure developed in Sect. 10.8.2. Let us specify this observation in the following lemma. Lemma 11.5 Let .u ∈ (H1 (Ω))3 satisfy (11.61), .u = [u1 , u2 , u3 ]T . Then for .i = 1, 2, 3,
. ui dΓ = 0 and u2i dΓ a(u, u). (11.65) Γ
Γ
Proof First notice that by the assumption .u satisfies (11.61), so that
T i . u r dΓ = ui dΓ = 0, i = 1, 2, 3. (11.66) Γ
Γ
11 Frictionless Contact Problems
241
Using Trace Theorem 4.3, in particular (4.3), and Poincaré’s inequality (4.3), we get
2 2 2 . ui dΓ ui L2 (Ω) + ∇ui L2 (Ω) ∇ui 2 dΩ, i = 1, 2, 3. Γ
Ω
Summing the terms on the left and right of the inequality (11.65), and using (11.63), we get 3
u2i dΓ
.
i=1
Γ
3
∇ui 2L2 (Ω) =
ε(u) : ε(u)dΩ ≤ a(u, u).
(11.67)
Ω
i=1
.
The relation (11.67) is valid for any fixed domain, in particular for a reference domain .Ω with the decomposition parameter .H = 1. If we consider a class of domains −1 .ΩH = {x : H x ∈ Ω}, we can use the transformation of variables as in Sect. 10.8.2 to get .
H −2
3
u2i dΓ H −1 Γ
i=1
3
∇ui 2L2 (Ω) ≤ H −1 a(u, u).
(11.68)
i=1
After recalling that for any piece-wise linear function .uh : Ω → R3 defined by the coordinate vector .u with the boundary components .uB
. uh 2 dΓ ≈ uB 2 h2 , Γ
we are ready to formulate the main result of this section, which is a variant of Proposition 10.1 for 3D linear elasticity. Proposition 11.2 Let .Ω denote a reference Lipschitz domain with the diameter equal to one, let .H > 0, and let .KH,h denote the stiffness matrix of .ΩH = {Hx : x ∈ Ω} obtained by the quasi-uniform discretization of .ΩH with the shape regular elements of diameters less than or equal to h. Let .SH,h denote the Schur complement of .KH,h with respect to the displacements of interior nodes. Let the constraint matrix .B satisfy (11.25). Then there are constants c and C independent of h and H such that .
c
h2 x2 ≤ xT SH,h x ≤ Chx2 , H
x ∈ Im SH,h .
(11.69)
Proof Let .uh : Ω → R3 denote a piece-wise linear harmonic function with coordinates .u ∈ R3n , and let .uB denote the boundary coordinates of .uh . Then
242
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
a(uh , uh ) = uT KH,h u = uTB SH,h uB .
.
The rest follows by (11.68) (multiplied by H) and Lemma 11.4.
(11.70)
.
11.12 Optimality The following theorem is now an easy corollary of Lemma 11.3 and Proposition 11.2. Theorem 11.2 Let .ρ > 0, and let .Hρ,H,h denote the Hessian of .θρ resulting from the decomposition and finite element discretization of problem (11.19). Let us assume that the problem is decomposed into subdomains with the diameter less than or equal to H, and let the subdomains be discretized by a quasi-uniform grid with shape regular linear finite elements using the discretization parameters h. Let (11.25) be true. Then there are constants c and C independent of h and H such that .
c
h λ2 ≤ λT Hρ,H,h λ ≤ Cλ2 , H
λ ∈ Rm .
(11.71)
Proof Substitute (11.69) into Lemma 11.3, and take into account the choice . of the regularization parameter .ρ and the scaling of .F. Remark 11.2 We consider a class of subdomains regular if their Korn’s inequalities are uniformly bounded. A variant of the theorem was a key ingredient of the first optimality analysis of FETI for linear problems by Farhat et al. [11]. To show that Algorithm 9.2 (SMALBE-M) with the inner loop implemented by Algorithm 8.2 (MPRGP) is optimal for the solution of a class of problems arising from varying discretizations of a given frictionless contact problem, let us introduce a new notation that complies with that used in the analysis of algorithms in Part II. Let . > 0 and .C ≥ 2 denote the given constants, and let TC = {(H, h) ∈ R2 : H/h ≤ C}
.
denote the set of indices. For any .t ∈ TC , let us define .
At = PFP + ρQ, bt = Pd, I , Bt = G, It = −λ
where the vectors and matrices are those arising from the discretization of (11.19) with .t = (H, h). We assume that the discretization satisfies the assumptions of Theorem 11.2 and . tI ≤ o. We get a class of problems
11 Frictionless Contact Problems .
243
min ft ( 0 ) s.t. Bt λ = o and λI ≥ It
(11.72)
with f (λ) =
. t
1 T λ At λ − bTt λ. 2
Using these definitions and .GGT = I, we obtain .
Bt = 1.
(11.73)
It follows by Theorem 11.2 that for any .C ≥ 2, there are constants aC
. max
> aC min > 0
such that .
C aC min ≤ αmin (At ) ≤ αmax (At ) ≤ amax
(11.74)
for any .t ∈ TC . In particular, it follows that the assumptions of Theorem 9.4 are satisfied for any set of indices .TC , .C ≥ 2, and we can formulate the main result of this chapter. Theorem 11.3 Let C ≥ 2,
.
ρ > 0,
and
ε>0
denote the given constants, and let {λkt },
.
{μkt },
and
{Mt,k }
be generated by Algorithm 9.2 (SMALBE-M) for (11.72) with bt ≥ ηt > 0, 1 > β > 0, Mt,0 = M0 > 0, ρ > 0, μ0t = o.
.
Let Step 1 of Algorithm 9.2 be implemented by means of Algorithm 8.2 (MPRGP) with parameters Γ > 0 and α ∈ (0, 2/aC max ),
.
so that it generates the iterates k,1 k,l k λk,0 t , λt , . . . , λt = λt
.
for the solution of (11.72) starting from .λk,0 = λk−1 with .λ−1 = o, where t t t .l = lt,k is the first index satisfying .
k,l k gP (λk,l t , μt , ρ) ≤ Mt,k Bt λt
(11.75)
244
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
or .
k gP (λk,l t , μt , ρ) ≤ εbt
and
Bt λk,l t ≤ εbt .
(11.76)
Then for any .t ∈ TC and problem (11.72), Algorithm 9.2 generates an approximate solution .λkt which satisfies (11.76) at .O(1) matrix-vector multiplications by the Hessian .At of .ft .
11.13 Numerical Experiments The algorithms presented here were implemented in several software packages (see Sect. 20.7) and tested on several academic benchmarks and real-world problems. Here, we give some results that illustrate their numerical scalability and effectiveness using MatSol [13], postponing the demonstration of parallel scalability to Chap. 20. All the computations were carried out with the parameters recommended in the description of the algorithms in Chaps. 7–9. The relative precision of the computations was .ε = 10−4 (see (9.40)).
11.13.1 Cantilever Beams Let us consider a 3D semicoercive contact problem of two cantilever beams of sizes .2 × 1 × 1 [m] in mutual contact without friction resolved in [14]. The beams are depicted in Fig. 11.5. Zero horizontal displacements were prescribed on the right face of the upper beam. The lower beam (shifted by 1 [m]) was fixed on its left face. The vertical traction .f = 20 [MPa] was prescribed on the upper and left faces of the upper beam. The problem was discretized with varying discretization and decomposition parameters h and H, respectively. For each h and H, the bodies were decomposed into .2/H × 1/H × 1/H subdomains discretized by hexahedral elements. We kept .H/h = 8, so that the assumptions of Theorem 11.3 were satisfied. The performance of algorithms is documented in the following graphs. The numbers .nout of the outer iterations of SMALBE-M and .nHes of the Hessian .F multiplications depending on the primal dimension n are depicted in Fig. 11.6. We can see stable numbers of both inner and outer iterations for n ranging from .431,244 to .11,643,588. The dual dimension of the problems ranged from .88,601 to .2,728,955. We conclude that the performance of the algorithm is in agreement with the theory. In Fig. 11.7, we depict the normal traction along the axis of the contact interface. Figure 11.7 shows that in the solution of the largest problem, most of .11,550 linear inequality constraints were active.
11 Frictionless Contact Problems
245
Fig. 11.5 Two beams’ benchmark and its domain decomposition
Fig. 11.6 Cantilever beams—numbers of matrix-vector multiplications by .F (left) and outer (SMALBE-M) iterations (right) 500 400 300 200 100 0
0
200
400
600
800
Fig. 11.7 Normal contact pressures along the line r (length of r is given in millimeters)
11.13.2 Roller Bearings of Wind Generator We have also tested our algorithms on real-world problems, including the stress analysis in the roller bearings of a wind generator depicted in Fig. 11.8 (see [14]). The problem comprises 73 bodies in mutual contact; only 1 is fixed in space.
246
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
Fig. 11.8 Frictionless roller bearing of wind generator
The solution to the problem discretized by 2,730,000/459,800 primal/dual variables and decomposed into 700 subdomains required 2364 matrix-vector multiplications, including outer iterations for exact non-penetration. The von Mises stress distribution is in Fig. 11.8 (right). Though the number of iterations is not small, the parallel scalability of the algorithm enables to obtain the solution in a reasonable time. Later improvements implemented by T. Brzobohatý reduced the number of matrix-vector multiplications to 837.
11.14 Comments A convenient presentation of the variational formulation, including the dual formulation, the results concerning the existence and uniqueness of a solution, the finite element approximation of the solution, and the standard iterative methods for the solution, can be found in the book by Kikuchi and Oden [2]. For the variational formulation and analysis, see also Hlaváček et al. [15]. The up-to-date engineering approach to the solution of contact problems can be found in Laursen [3] or Wriggers [4]. See Chap. 16 for the discussion of the combination of TFETI and mortar approximation of contact conditions. Probably the first theoretical results concerning the development of scalable algorithms for coercive contact problems were proved by Schöberl [16, 17]. Numerical evidence of scalability of a different approach combining FETI-DP with a Newton-type algorithm and preconditioning in face by standard FETI preconditioners for 3D contact problems was given in Duresseix and Farhat [18] and Avery et al. [19]. See also Dostál et al. [20]. Impressive applications of the above approach can be found in [21]. A stable implementation of TFETI requires a reliable evaluation of the action of a generalized inverse of the SPS stiffness matrix. The presentation in Sect. 11.6 combines the regularization proposed by Savenkov et al. [5] and Felippa and Park [22] with the application of LU and SVD decompositions proposed by Farhat and Géradin [23]. Our exposition uses the fixing nodes presented in Brzobohatý et al. [9]. See also Ph.D. thesis by Kabelíková [10].
11 Frictionless Contact Problems
247
It should be noted that the efforts to develop scalable solvers for coercive variational inequalities were not restricted to FETI. Optimal properties of multigrid methods for linear problems were exploited, e.g., by Kornhuber and Krause [24] and Kornhuber et al. [25], to give experimental evidence of the numerical scalability of an algorithm based on monotonic multigrid. However, as pointed out by Iontcheva and Vassilevski [26], the coarse grid used in the multigrid should avoid the constrained variables, so that its approximation properties are limited and not sufficient to support the proof of optimality of the nonlinear algorithm. Multigrid has also been used in the framework of the nonsmooth Newton methods, which turned out to be an especially effective tool for the solution of problems with complex nonlinearities (see Sect. 12.10). The augmented Lagrangians were often used in engineering algorithms to implement equality constraints as in Simo and Laursen [27] or Glowinski and Le Tallec [28]. It seems that the first application of the LANCELOT style [29] augmented Lagrangians (proposed for bound and general equality constraints) with adaptive precision control in combination with FETI to the solution of contact problems is in Dostál et al. [30–32]. Dostál et al. [33] presented experimental evidence of numerical scalability. The optimality was proved in [14]—the proof exploits the optimal properties of MPRGP [34] (see also Sect. 8.2), SMALBE-M (see [35, 36] or Sect. 9.8), and bounds on the spectrum of the dual stiffness matrices generated by TFETI (see [37] or Sect 11.11). The latter bounds were used for the first time by Farhat, Mandel, and Roux in the proof of scalability of linear FETI method [11]. Here, we partly follow [14]. The linear steps of MPRGP can be preconditioned by standard FETI preconditioners in face, i.e., the lumped preconditioner or the Dirichlet preconditioner [38]. However, our experience does not indicate the high efficiency of the modified algorithms for contact problems, partly because such approach transforms the bound constraints into more general inequality constraints and can hardly improve the performance of nonlinear steps. Much better results were achieved by implementing adaptive reorthogonalization described in Sect. 17.4. The negative effect of jumping coefficients can be eliminated or at least reduced by the reorthogonalization-based preconditioning or renormalization-based scaling that is presented in Chap. 17 with other improvements. The fast reconstruction formula proposed by Horák appeared in [39]. There is an interesting corollary of our theory. If we have a class of contact problems which involves the bodies that are discretized by quasi-uniform grids using shape regular elements, so that the regular part of their spectrum is in a given positive interval, then Theorem 11.3 implies that in spite of nonlinearity, there is a bound, independent of a number of the bodies, on the number of iterations that are necessary to approximate the solution to a given precision.
248
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
References 1. Steinbach, O.: Numerical Approximation Methods for Elliptic Boundary Value Problems. Springer, New York (2008) 2. Kikuchi, N., Oden, J.T.: Contact Problems in Elasticity. SIAM, Philadelphia (1988) 3. Laursen, T.: Computational Contact and Impact Mechanics. Springer, Berlin (2002) 4. Wriggers, P.: Contact Mechanics. Springer, Berlin (2005) 5. Savenkov, E., Andrä, H., Iliev, O.: An analysis of one regularization approach for solution of pure Neumann problem. Berichte des Faruenhofer ITWM, Nr. 137, Kaiserslautern (2008) 6. Karypis, G.: METIS—a family of programs for partitioning unstructured graphs and hypergraphs and computing fill-reducing orderings of sparse matrices. http://glaros.dtc.umn.edu/gkhome/views/metis 7. Berman, A., Plemmons, R.J.: Nonnegative Matrices in the Mathematical Sciences. SIAM, Philadelphia (1994) 8. Golub, G.H., Van Loan, C.F.: Matrix Computations, 2nd edn. Johns Hopkins University Press, Baltimore (1989) 9. Brzobohatý, T., Dostál, Z., Kozubek, T., Kovář, P., Markopoulos, A.: Cholesky decomposition with fixing nodes to stable computation of a generalized inverse of the stiffness matrix of a floating structure. Int. J. Numer. Methods Eng. 88(5), 493–509 (2011) 10. Kabelíková, P.: Implementation of Non-Overlapping Domain Decomposition Techniques for FETI Methods [accessible online]. Ph.D. Thesis, FEECS VŠB-TU Ostrava (2012). http://hdl.handle.net/10084/95077 11. Farhat, C., Mandel, J., Roux, F.-X.: Optimal convergence properties of the FETI domain decomposition method. Comput. Methods Appl. Mech. Eng. 115, 365–385 (1994) 12. Horák, D., Dostál, Z., Sojka, R.: On the efficient reconstruction of displacements in FETI methods for contact problems. Adv. Electr. Electron. Eng. 15(2), 237–241 (2017) 13. Kozubek, T., Markopoulos, A., Brzobohatý, T., Kučera, R., Vondrák, V., Dostál, Z.: MatSol–MATLAB efficient solvers for problems in engineering. The library is no longer updated and is replaced by the ESPRESO framework. http://numbox.it4i.cz 14. Dostál, Z., Kozubek, T., Vondrák, V., Brzobohatý, T., Markopoulos, A.: Scalable TFETI algorithm for the solution of multibody contact problems of elasticity. Int. J. Numer. Methods Eng. 82(11), 1384–1405 (2010) 15. Hlaváček, I., Haslinger, J., Nečas, J., Lovíšek, J.: Solution of Variational Inequalities in Mechanics. Springer, Berlin (1988) 16. Schöberl, J.: Solving the Signorini problem on the basis of domain decomposition techniques. Computing 60(4), 323–344 (1998)
11 Frictionless Contact Problems
249
17. Schöberl, J.: Efficient contact solvers based on domain decomposition techniques. Comput. Math. Appl. 42, 1217–1228 (2001) 18. Dureisseix, D., Farhat, C.: A numerically scalable domain decomposition method for solution of frictionless contact problems. Int. J. Numer. Methods Eng. 50(12), 2643–2666 (2001) 19. Avery, P., Rebel, G., Lesoinne, M., Farhat, C.: A numerically scalable dual-primal substructuring method for the solution of contact problems—part I: the frictionless case. Comput. Methods Appl. Mech. Eng. 193, 2403–2426 (2004) 20. Dostál, Z., Vondrák, V., Horák, D., Farhat, C., Avery, P.: Scalable FETI algorithms for frictionless contact problems. Lecture Notes in Computational Science and Engineering, vol. 60, pp. 263–270. Springer, Berlin (2008) 21. Avery, P., Farhat, C.: The FETI family of domain decomposition methods for inequality-constrained quadratic programming: application to contact problems with conforming and nonconforming interfaces. Comput. Methods Appl. Mech. Eng. 198, 1673–1683 (2009) 22. Felippa, C.A., Park, K.C.: The construction of free-free flexibility matrices for multilevel structural analysis. Comput. Methods Appl. Mech. Eng. 191, 2111–2140 (2002) 23. Farhat, C., Géradin, M.: On the general solution by a direct method of a large scale singular system of linear equations: application to the analysis of floating structures. Int. J. Numer. Methods Eng. 41, 675–696 (1998) 24. Kornhuber, R., Krause, R.: Adaptive multigrid methods for Signorini’s problem in linear elasticity. Comput. Vis. Sci. 4(1), 9–20 (2001) 25. Kornhuber, R., Krause, R., Sander, O., Deuflhard, P., Ertel, S.: A monotone multigrid solver for two body contact problems in biomechanics. Comput. Vis. Sci. 11, 3–15 (2008) 26. Iontcheva, A.H., Vassilevski, P.S.: Monotone multigrid methods based on element agglomeration coarsening away from the contact boundary for the Signorini’s problem. Numer. Linear Algebra Appl. 11(2–3), 189–204 (2004) 27. Simo, J.C., Laursen, T.A.: An augmented Lagrangian treatment of contact problems involving friction. Comput. Struct. 42, 97–116 (1992) 28. Glowinski, R., Le Tallec, P.: Augmented Lagrangian and OperatorSplitting Methods in Nonlinear Mechanics. SIAM, Philadelphia (1989) 29. Conn, A.R., Gould, N.I.M., Toint, Ph.L.: LANCELOT: A FORTRAN Package for Large Scale Nonlinear Optimization (Release A). No. 17 in Springer Series in Computational Mathematics. Springer, New York (1992) 30. Dostál, Z., Friedlander, A., Santos, S.A.: Solution of contact problems of elasticity by FETI domain decomposition. Domain Decomposition Methods 10. Contemporary Mathematics, vol. 218, 82–93. AMS, Providence (1998)
250
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
31. Dostál, Z., Gomes, F.A.M., Santos, S.A.: Duality based domain decomposition with natural coarse space for variational inequalities. J. Comput. Appl. Math. 126(1–2), 397–415 (2000) 32. Dostál, Z., Gomes, F.A.M., Santos, S.A.: Solution of contact problems by FETI domain decomposition with natural coarse space projection. Comput. Methods Appl. Mech. Eng. 190(13–14), 1611–1627 (2000) 33. Dostál, Z., Horák, D., Kučera, R., Vondrák, V., Haslinger, J., Dobiáš, J., Pták, S.: FETI based algorithms for contact problems: scalability, large displacements and 3D Coulomb friction. Comput. Methods Appl. Mech. Eng. 194(2–5), 395–409 (2005) 34. Dostál, Z., Schöberl, J.: Minimizing quadratic functions over nonnegative cone with the rate of convergence and finite termination. Comput. Optim. Appl. 30(1), 23–44 (2005) 35. Dostál, Z.: Inexact semi-monotonic augmented Lagrangians with optimal feasibility convergence for quadratic programming with simple bounds and equality constraints. SIAM J. Numer. Anal. 43(1), 96–115 (2005) 36. Dostál, Z.: Optimal Quadratic Programming Algorithms, with Applications to Variational Inequalities, 1st edn. Springer, New York (2009) 37. Dostál, Z., Horák, D., Kučera, R.: Total FETI - an easier implementable variant of the FETI method for numerical solution of elliptic PDE. Commun. Numer. Methods Eng. 22, 1155–1162 (2006) 38. Toselli, A., Widlund, O.B.: Domain Decomposition Methods – Algorithms and Theory. Springer Series on Computational Mathematics, vol. 34. Springer, Berlin (2005) 39. Horák, D., Dostál, Z., Sojka, R.: On the efficient reconstruction of displacements in FETI methods for contact problems. Adv. Electr. Electron. Eng. 15, 237–241 (2017)
Chapter 12
Contact Problems with Friction Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
Contact problems with friction are much more complicated than the frictionless problems considered so far. The problems start with the formulation of phenomenological friction laws. Unfortunately, the most popular friction law, the Coulomb law, makes the problem intrinsically non-convex. As a result, we have to face new challenges, particularly the non-uniqueness of a solution, nonlinear inequality constraints in dual formulation, and intractable minimization formulation. We limit our exposition to solving a simplified problem with a given (Tresca) friction to get strong optimality results. This friction model assumes that the normal pressure is a priori known on the contact interface. Such an assumption is unrealistic and violates the laws of physics, e.g., it admits positive contact pressure on the part of the contact interface that is not active in the solution, but it makes the problem well posed. Moreover, we can use Tresca’s friction in the inner loop of the fixed-point iterations to solve the problems with Coulomb’s friction. Though a satisfactory convergence theory does not support such an algorithm, it can often solve the contact problems with friction in a few outer iterations. The problems with Tresca’s friction minimize a convex nonsmooth cost function subject to a convex set. The switching to the dual formulation results in a convex QCQP (quadratically constrained quadratic programming) problem with linear non-penetration constraints and separable quadratic inequality constraints defined by the slip bounds. The structure of the problem complies well with TFETI. Surprisingly, the optimality results presented here are similar to those proved for frictionless problems.
Z. Dostál • T. Kozubek • V. Vondrák Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]; e-mail: [email protected]; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_12
251
252
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
The TFETI-based algorithms presented here can use tens of thousands of cores to effectively solve coercive and semicoercive problems decomposed into tens of thousands of subdomains and billions of nodal variables.
12.1 Equilibrium of Bodies in Contact with Coulomb Friction Let us consider a system of bodies .Ω1 , . . . , Ωs introduced in Sects. 11.1 and 11.2, which is illustrated in Fig. 11.1, and let us examine the effect of friction on the equilibrium in a point on the tangential plane of a smooth contact interface under the assumption that it complies with Coulomb’s law. Let us assume that the contact coupling set .S was specified, and let .(p, q) ∈ S . Using the notation introduced in Sect. 11.2 and assuming that the displacement u = (u1 , . . . , us ),
.
up : Ωp → R3 ,
is sufficiently smooth, we can write the non-penetration contact conditions and the conditions of equilibrium in the normal direction .n briefly as .
[un ] ≤ g,
λn ≥ 0,
λn ([un ] − g) = 0,
x ∈ Γpq C .
(12.1)
Recall that .λn denotes the normal traction on the slave face of the contact interface and .[un ] denotes the jump of the boundary displacements. To formulate the conditions of equilibrium in the tangential plane, let us introduce the notation λT = λ − λN ,
.
uN = un np ,
uT = u − uN ,
x ∈ Γpq C
and [uT ] = upT − uqT ◦ χpq ,
.
x ∈ Γpq C ,
(p, q) ∈ S .
Let .Φ > 0 denote the friction coefficient, which can depend on .x. In this case, we assume that it is bounded away from zero. The Coulomb friction law at .x ∈ Γpq C , .(p, q) ∈ S , can be written in the form if [un ] = g,
.
then
λn ≥ 0,
if [un ] = g and λT < Φλn , then uT = o, if λT = Φλn , then there is μ > 0 such that [uT ] = μ[λT ], where g = χpq − Id · np .
.
(12.2) (12.3) (12.4)
12 Contact Problems with Friction
253
The relations (12.1)–(12.4) with constitutive relations (11.4) and .
− div σ(u) = f in Ω, up = o on ΓpU , p
p
σ(u )n =
fΓp
on
ΓpF ,
(12.5) p = 1, . . . , s,
describe completely the kinematics and equilibrium of a system of elastic bodies in contact obeying the Coulomb friction law.
12.2 Variational Formulation The classical formulation of contact problems with friction (12.1)–(12.5) makes sense only when the solutions satisfy strong regularity assumptions similar to those required by the classical solutions of frictionless problems. The remedy is again a variational formulation which characterizes the solutions by local averages. We shall use the same tools as in Sect. 11.3, namely, the spaces 3 V p = v ∈ H 1 (Ωp ) : v = o on ΓpU , p = 1, . . . , s, .
V = V 1 × · · · × V s, the convex set K = {v ∈ V : [vn ] ≤ g on Γpq C , (p, q) ∈ S } ,
.
and the forms
(12.6)
.
(12.7)
.
The relations above should be understood in the sense of traces. Moreover, we assume that a contact coupling set .S has been chosen, and we shall use the short notation introduced in the previous section. The effect of friction will be enhanced by a nonlinear functional j which represents the virtual work of frictional forces j(u, v) = Φ|λn (u)| [vT ] dΓ, u, v ∈ V. . (12.8) (p,q)∈S
Γpq C
254
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
The functional j is nonlinear and non-differentiable, which is a source of complications. We shall start with identity (11.12) for the solution .u and .v ∈ V which reads
.
and notice that for any .v ∈ K , we get
.
λ(u) · ([u] − [v]) = λT (u) · ([uT ] − [vT ]) + λn (u) · ([un ] − [vn ]) = λT (u) · ([uT ] − [vT ]) + λn (u) · ([un ] − g + g − [vn ]) ≥ λT (u) · ([uT ] − [vT ]).
Combining the above inequalities, we get a(u, v − u) − (v − u) ≥ λT (u) · ([uT ] − [vT ]) dΓ, . (p,q)∈S
Γpq C
v∈K. (12.9)
Let us show that the classical solution .u ∈ K of the contact problem with Coulomb’s friction (12.1)–(12.5) satisfies .
a(u, v − u) − (v − u) + j(u, v) − j(u, u) ≥ 0,
v∈K.
(12.10)
Using (12.9) and definition (12.8) of j, we get a(u , v − u) − (v − u) + j(u, v) − j(u, u) λT (u) · ([uT ] − [vT ]) + Φ|λn (u)| [vT ] ≥ .
(p,q)∈S
Γpq C
(p,q)∈S
Γpq C
− Φ|λn (u)| [uT ] dΓ = λT (u) · ([uT ] − [vT ]) + Φ|λn (u)| ([vT ] − [uT ]) dΓ.
To show that the last expression is greater or equal to zero, let .(p, q) ∈ S be arbitrary but fixed; let Ψ = Φ|λn (u)|,
.
x ∈ Γpq C ,
denote the slip bound , and consider two cases.
12 Contact Problems with Friction
255
If .λT (u) < Ψ, then .[uT ] = o and λT (u) · ([uT ] − [vT ]) + Φ|λn (u)|([vT ] − [uT ]) = Ψ[vT ] − λT (u) · [vT ] ≥ 0.
.
If .λT (u) = Ψ, then by (12.4) [uT ] = μ[λT (u)], μ ≥ 0,
.
so [uT ] · [λT (u)] = [uT ][λT (u)]
.
and for .x ∈ Γpq C
.
λT (u) · ([uT ]−[vT ]) + Φ|λn (u)| ([vT ] − [uT ]) ≥Ψ([vT ] − [uT ]) + λT (u) · [uT ] − Ψ[vT ] =λT (u) · [uT ] − Ψ[uT ] =[uT ]([λT (u)] − Ψ) = 0.
We have thus proved that each classical solution .u satisfies (12.10). Choosing suitable test functions .v ∈ K , it is possible to show that each sufficiently smooth solution .u of (12.10) is a classical solution of (12.1)–(12.5). Though the problem (12.10) is well defined for more general functions than those considered above, its formulation in more general spaces faces considerable difficulties. For example, such formulation requires that the trace of .σ(x) on .ΓC is well defined, and the theoretical results concerning the existence and uniqueness of the solution require additional assumptions. Moreover, no true potential energy functional exists for the problem with friction. For more detailed discussion, see the books by Hlaváček et al. [1], Kikuchi and Oden [2], Eck et al. [3] and the comprehensive papers by Haslinger et al. [4] or Wohlmuth [5].
12.3 Tresca (Given) Isotropic Friction Let us consider a simplification of the variational inequality (12.10) that can be reduced to a minimization problem which can be solved effectively by methods similar to those discussed in Chap. 11. The basic idea is to assume that the normal traction λn on the contact interface is known. Though it is possible to contrive a physical situation which can be captured by such model, the main motivation is the possibility to use the simplified model in the fixed-point iterations for solving contact problems
256
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
with Coulomb’s friction. If we assume that λn is known, we get the problem to find a sufficiently smooth displacement field up : Ωp → R3 ,
u = (u1 , . . . , us ),
.
such that for any (p, q) ∈ S and x ∈ Γpq if [un ] = g, then λn ≥ 0, if [un ] = g and λT < Ψ,
(12.11) (12.12)
.
if λT = Ψ,
then
uT = o,
then there is μ > 0 such that
[uT ] = μ[λT ],
(12.13)
where Ψ ≥ 0 is a sufficiently smooth prescribed slip bound. After substituting into the friction functional j (12.8) and simplifying the notation, we get j(v) = Ψ[vT ] dΓ, . (12.14) p,q∈S
Γpq C
and the variational problem (12.10) reduces to the problem to find u ∈ K which satisfies .
a(u, v − u) − (v − u) + j(v) − j(u) ≥ 0,
v∈K.
(12.15)
Observing that Ψ(x)[vT (x)] =
.
max
τ ≤Ψ(x)
τ · [vT (x)],
we can write the non-differentiable term in (12.14) as j(v) = Ψ[v ] dΓ = max T . (p,q)∈S
Γpq C
(p,q)∈S
τ ≤Ψ(x) Γpq C
τ · [vT (x)] dΓ. (12.16)
For the development of scalable algorithms for solving contact problems with Tresca friction, it is important that any solution of the variational boundary value inequality (12.15) solves subject to v ∈ K , 1 f (v) = a(v, v) − (v) + j(v), 2
min f (v) .
(12.17)
where a and are defined in (12.6) and (12.7), respectively. To verify this claim, notice that if u, v ∈ K satisfy (12.15), then
12 Contact Problems with Friction
257
1 1 a(v, v) − (v) + j(v) − a(u, u) + (u) − j(u) 2 2 1 = a(v + u, v − u) − (v − u) + j(v) − j(u) 2 = a(u, v − u) − (v − u) + j(v) − j(u) 1 + a(v, v − u) − a(u, v − u) 2 1 ≥ a(v − u, v − u) ≥ 0. 2
f (v) − f (u) =
.
We conclude that any classical solution of the contact problem with given (Tresca) friction (12.1), (12.11)–(12.13), and (12.5) satisfies .
f (u) ≤ f (v),
v∈K.
(12.18)
Denoting by u(Ψ) a solution of (12.18), we can try to get a solution of the problem with Coulomb’s friction by the fixed-point iterations .
u0 =u(Ψ0 ), uk+1 =u Φ|λn (uk )| .
(12.19)
The procedure was proposed by Panagiotopoulos [6].
12.4 Orthotropic Friction Let us briefly describe orthotropic Coulomb friction, which is defined by the matrix Φ1 0 .Φ = , Φ1 , Φ2 > 0. 0 Φ2 The diagonal entries of .Φ describe the friction in two orthogonal tangential directions which are defined at each point of the contact interface by the unit vectors .t1 , t2 . The friction coefficients .Φ1 , Φ2 can depend on .x, but in this case we assume that they are bounded away from zero. The orthotropic Coulomb friction law at .x ∈ Γpq C , .(p, q) ∈ S , reads if [un ] = g,
.
λn ≥ 0,
then
−1
if [un ] = g and Φ −1
if Φ
λ T < λn ,
(12.20) then
(12.21)
uT = o,
λT = λn , then there is μ > 0 s.t.
−1
[uT ] = μ[Φ
λT ].
(12.22)
The variational formulation of the conditions of equilibrium for the orthotropic Coulomb friction is formally identical to (12.10) provided we use the dissipative term in the form
258
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
j(u, v) =
.
(p,q)∈S
Γpq C
|λn (u)| [ΦvT ] dΓ,
u, v ∈ V.
The same is true for the variational formulation of the conditions of equilibrium for orthotropic Tresca friction, which is formally identical to (12.15) and (12.1) provided we use the non-differentiable term in the form .j(v) = Λn [ΦvT ] dΓ, (p,q)∈S
Γpq C
where .Λn ≥ 0 now defines the predefined normal traction. The reasoning is the same as in Sect. 12.3. Observing that Λn [ΦvT ] = max τ · [ΦvT ] = max Φτ · [vT ] τ ≤Λn
.
=
max
Φ−1 Φτ ≤Λn
τ ≤Λn
Φτ · [vT ] =
max
Φ−1 τ ≤Λn
τ · [vT ],
we can replace the non-differentiable term in (12.14) by the bilinear term to get j(v) = Λ [Φv ] dΓ = max τ · [vT ] dΓ. n T . −1 (p,q)∈S
Γpq C
(p,q)∈S
Φ Γpq C
τ ≤Λn
(12.23) Denoting by .u(Λn ) a solution of (12.18) with j defined in (12.23), we can try to get a solution of the problem with orthotropic friction by the fixedpoint iterations (12.19).
12.5 Domain Decomposition and Discretization To simplify the exposition, we shall restrict our attention in what follows to the isotropic friction, leaving the modification to the interested reader. The problem (12.18) has the same feasible set as the frictionless problem (11.14) and differs only in the cost function. It turns out that we can use the domain decomposition in the same way as in Sect. 11.4. To this end, let us decompose each domain .Ωp into subdomains with sufficiently smooth boundaries, assign a unique number to each subdomain, decompose appropriately the parts of the boundaries, introduce their numbering as in Sect. 11.4, and introduce the notation
12 Contact Problems with Friction
259
p VDD = v ∈ H 1 (Ωp )3 : v = o on ΓpU , p = 1, . . . s, .
VDD = V 1 × · · · × V s , KDD = {v ∈ VDD : [vn ] ≤ g on Γpq C , (p, q) ∈ S ; vp = vq on Γpq G , p, q = 1, . . . , s} .
The relations have to be understood in the sense of traces. The variational decomposed problem with the Tresca (given) friction defined by a slip bound .Ψ reads .
find u ∈ KDD s.t. f (u) ≤ f (v),
v ∈ KDD ,
(12.24)
where f (v) =
.
1 a(v, v) − (v) + j(v). 2
To get the discretized problem (12.24), let us use a quasi-uniform discretization with shape regular elements as in Sect. 11.5 and apply the procedure described in Sect. 11.5 to the components of the discretized linear form ., the quadratic form a, the gap function g, and the matrices that describe the discretized feasible set .KDD . Thus we shall get the block diagonal stiffness matrix .K ∈ Rn×n , the vector of nodal forces .f ∈ Rn , the constraint matrix m m m .BI ∈ R I , the discretized gap function .cI ∈ R I , and the matrix .BE ∈ R I that describes the gluing and Dirichlet conditions. To define the discretized non-differentiable term j (12.14), we introduce 2×n .2mI × n matrix .T such that .mI blocks of which are .Ti = Ti (xi ) ∈ R are formed by appropriately placed orthonormal tangential vectors .t1 (xi ) and .t2 (xi ) so that the tangential component of the displacement .u is given by .Ti u. After applying numerical integration to (12.14), we get the discretized dissipative term in the form .
j(u) =
mc
Ψi Ti u,
(12.25)
i=1
where .Ψi is the slip bound associated with .xi . Using (12.16), we get .
j(u) =
mc i=1
Ψi Ti u =
mc i=1
max τ Ti Ti u,
τ i ≤Ψi
(12.26)
where .τ i ∈ R2 can be interpreted as a Lagrange multiplier. The right-hand side of (12.26) is differentiable. The discretized problem (12.24) reads .
find u ∈ K s.t. f (u) ≤ f (v),
v∈K,
(12.27)
260
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
where .
1 T v Kv − f T v + j(v), 2
f (v) =
K = {v ∈ Rn : BI u ≤ cI and BE u = o}. (12.28)
We assume that .BI , .BE , and .T are full rank matrices. Notice that TT BI = O.
.
If each node is involved in at most one inequality, then it is possible (see Remark 11.1) to achieve that ⎤ ⎡ BE T . BB = I, B = ⎣ BI ⎦ . (12.29) T
12.6 Dual Formulation The problem (12.27) is not suitable for numerical solution, even if we replace j in the cost function by (12.26). The reasons are that the stiffness matrix .K is typically large, ill-conditioned, and singular, and the feasible set is in general so complex that the projections onto it can hardly be effectively computed. Under the circumstances, it would be complicated to solve auxiliary problems efficiently and to identify the solution’s active set effectively. As in Chap. 11, the complications can be essentially reduced by applying the duality theory of convex programming (see, e.g., Bazaraa et al. [7]). In the dual formulation of problem (12.27), we use three types of Lagrange multipliers, namely, .λI ∈ RmI associated with the non-interpenetration condition, .λE ∈ RmE associated with the “gluing” and prescribed displacements, and τ = [τ T1 , τ T2 , . . . , τ TmT ]T ∈ R2mT ,
.
τ i ∈ R2 , i = 1, . . . , mT ,
which are used to smooth the non-differentiability. The Lagrangian associated with problem (12.27) reads L(u, λI , λE , τ ) = f (u) + τ T Tu + λTI (BI u − cI ) + λTE (BE u − cE ).
.
To simplify the notation, we denote .m = mE + mI + 2mT = mE + 3mI , ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ BE o λE B = ⎣ B I ⎦ , c = ⎣ cI ⎦ , .λ = ⎣ λI ⎦ , τ T o
12 Contact Problems with Friction
261
and Λ(Ψ) = {(λTE , λTI , τ T )T ∈ Rm : λI ≥ o, τi ≤ Ψi , i = 1, . . . , mT },
.
so that we can write the Lagrangian briefly as 1 T u Ku − f T u + λT (Bu − c). 2
L(u, λ) =
.
Using the convexity of the cost function and constraints, we can reformulate problem (12.27) by duality to get .
min sup u
L(u, λI , λE , τ ) = max min L(u, λI , λE , τ ). λ∈Λ(Ψ)
λ∈Λ(Ψ)
u
We conclude that problem (12.27) is equivalent to the saddle point problem .
= max min L(u, λ). L( u, λ) λ∈Λ(Ψ)
u
(12.30)
Recall that .B is a full rank matrix. For a fixed .λ, the Lagrange function L(·, λ) is convex in the first variable, and the minimizer .u of .L(·, λ) satisfies
.
.
Ku − f + BT λ = o.
(12.31)
Equation (12.31) has a solution if and only if .
f − BT λ ∈ ImK,
(12.32)
which can be expressed more conveniently by means of a matrix .R ∈ Rn×6s the columns of which span the null space of .K as RT (f − BT λ) = o.
.
The matrix .R can be formed directly. See Sect. 11.6 for details. Now assume that .λ satisfies (12.32) and denote by .K+ any matrix which satisfies .
KK+ K = K.
(12.33)
Let us note that the action of a generalized inverse which satisfies (12.33) can be evaluated at the cost comparable with that of Cholesky’s decomposition applied to the regularized .K (see Sect. 11.8). It can be verified directly that if .u solves (12.31), then there is a vector .α ∈ R6s such that .
u = K+ (f − BT λ) + Rα.
(12.34)
After substituting expression (12.34) into problem (12.30), changing the signs, and omitting the constant term, we get that .λ solves the minimization problem
262
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
min Θ(λ)
.
s.t.
λ ∈ Λ(Ψ)
Θ(λ) =
.
and
RT (f − BT λ) = o,
(12.35)
1 T λ BK+ BT λ − λT (BK+ f − c). 2
of (12.35) is known, the solution .u of (12.27) can be Once the solution .λ evaluated by (12.34) with + (f − BT λ)), T BR) −1 RT B T ( α = (RT B c − BK
.
and . c are formed by the rows of .B and .c corresponding to all equality where .B constraints and all free inequality constraints.
12.7 Preconditioning by Projectors to Rigid Body Modes Even though problem (12.35) is more suitable for computations than (12.27), further improvement can be achieved by adapting the observations from the previous chapters. Let us denote F = F, d = F −1 (BK+ f − c), e = RT f ,
= BK+ BT , F . F = F −1 F, T T G=R B ,
and let .U denote a regular matrix that defines the orthonormalization of the so that the matrix rows of .G G = UG
.
has orthonormal rows. After denoting e = U e,
.
problem (12.35) reads .
min
1 T λ Fλ − λT d 2
λ ∈ Λ(Ψ)
s.t.
and
Gλ = e.
(12.36)
Next we transform the problem of minimization on a subset of the affine which satisfies space to that on a subset of the vector space using .λ .
= e. Gλ
(12.37)
12 Contact Problems with Friction
263
similarly as we did in Sect. 11.7. In particular, we can use We can choose .λ .λ which solves .
min
1 λ2 2
λ ∈ Λ(Ψ)
s.t.
and
Gλ = e,
(12.38)
or, if the problem is coercive, then we can use Lemma 11.2 to get oI .λ = . GTE (GE GTE )−1 e are important for the proof of optimality as they The above choices of .λ guarantee that .o is a feasible vector of the modified problem. For practical computations, we can use the least squares solution of .Gλ = e given by = GT e. λ
.
we can look for the solution of (12.36) in the form Having .λ, λ = μ + λ.
.
and .Λ(Ψ) = Λ(Ψ) − λ, To carry out the transformation, denote .λ = μ + λ so that .
1 T − Fλ) + 1λ −λ T d = 1 μT Fμ − μT (d T Fλ λ Fλ − λT d 2 2 2
and problem (12.36) turns, after returning to the old notation, into .
min
1 T λ Fλ − λT d 2
s.t.
Gλ = o and
λ ∈ Λ(Ψ),
− Fλ. d=d (12.39)
Recall that we can achieve that .o ∈ Λ(Ψ). Our final step is based on the observation that problem (12.39) is equivalent to .
min θρ (λ)
s.t.
Gλ = o
and
λ ∈ Λ(Ψ),
(12.40)
where .ρ is an arbitrary positive constant, Q = GT G
.
and
P=I−Q
denote the orthogonal projectors on the image space of .GT and on the kernel of .G, respectively, and θ (λ) =
. ρ
1 T λ (PFP + ρQ)λ − λT Pd. 2
264
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
12.8 Optimality To show that Algorithm 9.1 (SMALSE-M) with the inner loop implemented by Algorithm 7.2 (MPGP) is optimal for the solution of a class of problems arising from varying discretizations of a given contact problem with Tresca friction, let us introduce a notation that complies with that used for the description of the algorithms in Chaps. 7 and 9. Let .C ≥ 2 and .ρ > 0 denote given constants, and let TC = {(H, h) ∈ R2 : H/h ≤ C}
.
denote the set of indices. For any .t ∈ TC , let us define .
At = PFP + ρQ, Bt = G,
bt = Pd, ΩtS = Λ(Ψ),
where the vectors and matrices are those arising from the discretization of (12.24) with the discretization and decomposition parameters H and h, .t = (H, h). We shall assume that the discretization satisfies the assumptions of Theorem 11.2, including (12.29), and that .o ∈ ΩtS . We get a class of problems .
min ft (λt ) s.t. Bt λt = o and λt ∈ ΩtS
(12.41)
with 1 T λ At λt − bTt λt . 2 t
f (λt ) =
. t
To see that the class of problems (12.41) satisfies the assumptions of Theorem 9.4, recall that .GGT = I, so .
Bt = 1.
(12.42)
C It follows by Theorem 11.2 that there are constants .aC max > amin > 0 such that .
C aC min ≤ αmin (At ) ≤ αmax (At ) ≤ amax
(12.43)
for any .t ∈ TC . Thus the assumptions of Theorem 9.4 are satisfied for any set of indices .TC , C > 2, and .ρ > 0. Using the arguments specified in the above discussion, we can state our main result: Theorem 12.1 Let .C ≥ 2, ρ > 0, and .ε > 0 denote given constants; let {λkt }, {μkt }, and .{Mt,k } be generated by Algorithm 9.1 (SMALSE-M) for problem (12.41) with parameters
.
bt ≥ ηt > 0, 0 < β < 1, Mt,0 = M0 > 0, ρ > 0, and μ0t = o.
.
12 Contact Problems with Friction
265
Let Step 1 of SMALSE-M be implemented by Algorithm 7.2 (MPGP) with the parameters Γ>0
.
and
α ∈ (0, 2/aC max ),
so that it generates the iterates k,1 k,l k λk,0 t , λt , . . . , λt = λt
.
for the solution of (12.41) starting from .λk,0 = λk−1 with .λ−1 = o, where t t t .l = lt,k is the first index satisfying k,l k gP (λk,l t , μt , ρ) ≤ Mt,k Ct λt
.
or .
k gP (λk,l t , μt , ρ) ≤ εbt
and
Ct λk,l t ≤ εbt .
(12.44)
Then for any .t ∈ TC and problem (12.41), Algorithm 9.1 generates an approximate solution .λkt which satisfies (12.44) at .O(1) matrix-vector multiplications by the Hessian .At of .ft .
12.9 Numerical Experiments Here we give some results that illustrate the performance of the TFETI-based algorithms implemented for the solution of contact problems with friction using their implementation in MatSol [8]. All the computations were carried out with the parameters recommended in Chaps. 7–9, i.e., .M0 = 1, .η = 0.1Pd, .ρ = 1 ≈ PFP, .Γ = 1, .β = 0.2, .λ0 = o, and .μ0 = o. The relative precision of the computations was .ε = 10−4 (see (9.40)).
12.9.1 Cantilever Beams with Isotropic Friction To demonstrate the efficiency of TFETI for the solution of contact problems with friction, we considered the two cantilever beams benchmark described in Sect. 11.13. The geometry of the benchmark is depicted in Fig. 11.5. Its material properties are defined in Sect. 11.13, the Tresca friction was defined by the friction coefficient .Φ = 0.1, and the slip bounds were defined as the product of normal traction from the solution of the frictionless problem of Sect. 11.13 and .Φ. The discretizations and the decompositions were defined by the discretization parameter h and the decomposition parameter H, respectively. For each
266
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
h and H, the bodies were decomposed into cubic subdomains and discretized by the regular mesh with hexahedral elements. We kept .H/h = 8, so that the assumptions of Theorem 12.1 were satisfied. The normal and tangential tractions along the axis of the contact interface are shown in Fig. 12.1 (see [9]). This figure also shows that in the solution of the largest problem, most of the linear and quadratic inequality constraints were active.
Fig. 12.1 Tresca (left) and Coulomb (right) friction—normal and contact pressures along r (length of r is given in milimeters)
Fig. 12.2 Tresca and Coulomb friction: numerical scalability of cantilever beams—matrixvector multiplications by .F (left) and outer (SMALSE-M) iterations (right)
The performance of the algorithms is documented in the above graphs. The numbers of outer iterations of SMALSE-M and the multiplications by the Hessian .F of the dual function for the problem with Tresca friction depending on the primal dimension n are depicted in Fig. 12.2. We can see stable numbers of outer iterations for n ranging from 431,244 to 11,643,588. The dual dimension of the problems ranged from 88,601 to 2,728,955. We conclude that the performance of the algorithm is in agreement with the theory. The problem is difficult because many nodes on the contact interface nearly touch each other, making the identification of the contact interface difficult.
12 Contact Problems with Friction
267
Similar results were obtained for the solution of the problem with the Coulomb friction defined by the friction coefficient .Φ = 0.1. The results are also given in the above graphs. The number of iterations is naturally greater.
12.9.2 Cantilever Beams with Coulomb Orthotropic Friction To test the effectiveness of the SMALSE-Mw algorithm (see Sect. 9.12) for the solution of the problem with the Coulomb orthotropic friction, we resolved the problem of two clumped cantilever beams subjected to the surface tractions .p1 and .p2 depicted in Fig. 12.3 (see [10]). The beams .1×1×2 [m], clumped on the left, were decomposed into cube subdomains with the side-length .H = 0.5 [m] and discretized by the regular matching grid with the steplength .h = 0.1 [m]. As a result, we got the problem with 32 cube subdomains, 32,928 primal and 9567 dual variables. p1
Ω1
Ψy
p2
Ψx
z
Ω2 x y Fig. 12.3 Two cantilever beams with orthotropic Coulomb friction
The domain .Ω1 is filled with the elastic material characterized by the Young modulus .E1 and the Poisson ratio .ν1 , the material constants of .Ω2 are .E2 and .ν2 , E1 = 70,000 [MPa],
E2 = 212,000 [MPa],
ν1 = 0.35,
ν2 = 0.277.
.
The friction coefficients in the horizontal directions x and y are Ψx = 0.5
.
and
Ψy = 0.8,
respectively. The beams are clumped on the left, and the boundary surface pressure is prescribed by
268
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
p1 = (0, 0, −60) [MPa],
.
p2 = (60, −30, 0) [MPa].
The problem was solved with the code developed in MatSol [8]. The development of the Euclidean norm of the reduced projected gradient and the normal pressure is in Fig. 12.4. To get an approximate solution with relative precision .10−4 required 129 iterations of MPGP and 12 outer iterations of SMALSE-Mw. Comparing these numbers with Fig. 12.2 suggests that the performance of SMALSE-Mw with MPGP is similar to that of SMALSE-M with MPGP.
Fig. 12.4 Performance of MPGP on the problem with Coulomb friction and TFETI domain decomposition (left) and normal contact pressure on the contact interface (right)
The components of the tangential pressure on the contact interface are in Figs. 12.5 and 12.6. 35 20
2000
2000
30
1500
1500
25
15 1000
z
z
1000
20 500
10
0 0 500 1000 1500 x 2000 0
5 1000 500 y
500 15 0 0
10 500 1000 1500 x 2000 0
1000 500
5 0
y
Fig. 12.5 Solution of the problem: tangential (x and y direction) contact pressure on the contact interface
The results of experiments validate the formulation of contact problems with friction. They suggest that SMALSE-Mw with the inner loop imple-
12 Contact Problems with Friction
269
mented by MPGP is an effective tool for solving large complex problems with linear equality constraints and the convex separable constraints with strong curvature if we can evaluate the corresponding Euclidean projection effectively.
1
x2 [m]
0.8 0.6 0.4 0.2 0 0
0.5
1
1.5
2
x1 [m]
Fig. 12.6 Tangential contact pressure
12.9.3 Yielding Clamp Connection We have tested our algorithms on the solution of many real-world problems. Here we consider contact with the Coulomb friction, where the coefficient of friction was .Φ = 0.5. The problem was decomposed into 250 subdomains using METIS (see Fig. 12.7) and discretized by 1,592,853 and 216,604 primal and dual variables, respectively. The total displacements for the Coulomb friction are depicted in Fig. 12.8 (see [9]). It required 1922 matrix-vector multiplications to find the solution.
Fig. 12.7 Yielding clamp connection—domain decomposition
270
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
Fig. 12.8 Coulomb friction: total displacement
12.10 Comments Early results concerning the formulation of contact problems with friction can be found in Duvaut and Lions [11]. An up-to-date overview of the theoretical results can be found in the book by Eck et al. [3]. See also Haslinger et al. [4]. A convenient presentation of the variational formulation, the results concerning the existence and uniqueness of a solution, the finite element approximation of the solution, and standard iterative methods for the solution of contact problems with friction are in the book by Kikuchi and Oden [2]. For useful theoretical results, see Hlaváček et al. [1] or Capatina [12]. A comprehensive presentation of the variationally consistent discretization of contact problems with friction using biorthogonal mortars (see also Chap. 16) can be found in the seminal paper by Wohlmuth [5]. This paper includes the approximation theory and solution algorithms based on the semi-smooth Newton method and multigrid combination. An up-to-date engineering approach to the solution of contact problems with friction can be found in the papers by Laursen [13] or Wriggers [14]. See also Zhang et al. [15]. Despite the difficulties associated with the formulation of contact problems with friction, some interesting results concerning the optimal complexity of algorithms for solving auxiliary problems were achieved through multigrid. Therefore, the first papers adapted the earlier approach of Kornhuber and Krause to extend the results on frictionless problems to the problems with friction. Their authors gave experimental evidence that it is possible to solve frictional two-body contact problems in 2D and 3D with multigrid efficiency (see, e.g., Wohlmuth and Krause [16], Krause [17], or Krause [18]). The semi-smooth Newton method [19] became popular due to its capability to capture several nonlinearities in one loop (see, e.g., Hager and Wohlmuth [20], Wohlmuth [5], or comments in Sect. 18.6). The method is closely related to the active set strategy [21]. The problems suitable for the application of semi-smooth Newton include a combination of contact with electrostatics (see, e.g., Migorski et al. [22] or Hüeber et al. [23]). The lack of global convergence theory was compensated by some globalization strategies (see, e.g., Ito and Kunisch [24] or [25]).
12 Contact Problems with Friction
271
The development of algorithms for contact problems with friction used from the early beginning the scheme proposed by Panagiotopoulos [6] that combines the outer fixed-point iterations for normal contact pressure with the solution of a problem with given friction in the inner loop. It was quite straightforward to combine the scheme with the domain decomposition algorithms; see Dostál and Vondrák [26] or Dostál et al. [27]). Though the convergence of the outer loop is supported by the theory only for very special cases [28], the method works well in practice. However, it turned out that there was little chance to develop the theory for contact problems with Coulomb friction as complete as that for the frictionless problems in the previous chapter. Thus the effort to develop scalable algorithms was limited to solving problems with given friction. After developing the scalable algorithms for the frictionless contact problems presented in the previous chapter, it was rather easy to adapt them for solving 2D problems with Tresca friction [9]. However, this was not the case for the 3D problems, which have circular or elliptic constraints arising in the dual formulation. The breakthrough came through the new algorithms with the rate of convergence that depends on bounds on the spectrum of the dual energy function Hessian (see comments in Chaps. 7–9). The optimality results presented in this chapter appeared first in Dostál et al. [29]. The latter paper is the result of a graduate development starting with the approximation of circles by the intersections of squares in Haslinger et al. [30] and the early results without the optimality theory (see Dostál et al. [27] or Haslinger and Kučera [31]). Important issues related to the implementation of orthotropic friction are resolved in Bouchala et al. [32] and Ph.D. Thesis by Pospíšil [33].
References 1. Hlaváček, I., Haslinger, J., Nečas, J., Lovíšek, J.: Solution of Variational Inequalities in Mechanics. Springer, Berlin (1988) 2. Kikuchi, N., Oden, J.T.: Contact Problems in Elasticity. SIAM, Philadelphia (1988) 3. Eck, C., Jarůšek, J., Krbec, M.: Unilateral Contact Problems. Chapman & Hall/CRC, London (2005) 4. Haslinger, J., Hlaváček, I., Nečas, J.: Handbook of Numerical Analysis. Numerical Methods for Unilateral Problems in Solid Mechanics, vol. IV, pp. 313–485. North-Holland, Amsterdam (1996) 5. Wohlmuth, B.I.: Variationally consistent discretization scheme and numerical algorithms for contact problems. Acta Numer. 569–734 (2011) 6. Panagiotopoulos, P.D.: A nonlinear programming approach to the unilateral contact and friction boundary value problem. Ing.-Archiv 44, 421–432 (1975)
272
Zdeněk Dostál, Tomáš Kozubek, and Vít Vondrák
7. Bazaraa, M.S., Shetty, C.M., Sherali, H.D.: Nonlinear Programming. Theory and Algorithms, 2nd edn. Wiley, New York (1993) 8. Kozubek, T., Markopoulos, A., Brzobohatý, T., Kučera, R., Vondrák, V., Dostál, Z.: MatSol–MATLAB efficient solvers for problems in engineering. The library is no longer updated and is replaced by the ESPRESO framework. http://numbox.it4i.cz 9. Dostál, Z., Kozubek, T., Markopoulos, A., Brzobohatý, T., Vondrák, V., Horyl, P.: Scalable TFETI algorithm for two dimensional multibody contact problems with friction. J. Comput. Appl. Math. 235(2), 403–418 (2010) 10. Bouchala, J., Dostál, Z., Kozubek, T., Pospíšil, L., Vodstrčil, P.: On the solution of convex QPQC problems with elliptic and other separable constraints. Appl. Math. Comput. 247(15), 848–864 (2014) 11. Duvaut, G., Lions, J.L.: Inequalities in Mechanics and Physics. Springer, Berlin (1976) 12. Capatina, A.: Variational Inequalities and Frictional Contact Problems. Springer, New York (2014) 13. Laursen, T.: Computational Contact and Impact Mechanics. Springer, Berlin (2002) 14. Wriggers, P.: Contact Mechanics. Springer, Berlin (2005) 15. Zhang, H.W., He, S.Y., Li, X.S., Wriggers, P..: A new algorithm for numerical solution of 3D elastoplastic contact problems with orthotropic friction. Comput. Mech. 34, 1–14 (2004) 16. Wohlmuth, B.I., Krause, R.: Monotone methods on nonmatching grids for nonlinear contact problems. SIAM J. Sci. Comput. 25, 324–347 (2003) 17. Krause, R.H.: On the Multiscale Solution of Constrained Minimization Problems. Domain Methods in Science and Engineering XVII. Lecture Notes in Computational Science and Engineering, vol. 60, pp. 93–104. Springer, Berlin (2008) 18. Krause, R.H.: A Nonsmooth multiscale method for solving frictional twobody contact problems in 2D and 3D with multigrid efficiency. SIAM J. Sci. Comput. 31(2), 1399–1423 (2009) 19. Qi, L., Sun, J.: A nonsmooth version of Newton’s method. Math. Progr. 58, 353–367 (1993) 20. Hager, C., Wohlmuth, B.I.: Nonlinear complementarity functions for plasticity problems with frictional contact. Comput. Methods Appl. Mech. Eng. 198, 3411–3427 (2009) 21. Hintermüller, M., Ito, K., Kunisch, K.: The primal-dual active set strategy as a semi-smooth Newton method. SIAM J. Optim. 13, 865–888 (2002) 22. Migorski, S., Ochal, A., Sofonea, M.: Modeling and analysis of an antiplane piezoelectric contact problem. Math. Models Methods Appl. Sci. 19, 1295–1324 (2009)
12 Contact Problems with Friction
273
23. Hüeber, S., Matei, A., Wohlmuth, B.I.: A contact problem for electroelastic materials. ZAMM J. Appl. Math. Mech. 93, 789–800 (2013) 24. Ito, K., Kunisch, K.: On a semi-smooth Newton method and its globalization. Math. Progr. Ser. A 118, 347–370 (2009) 25. Kučera, R., Motyčková, K., Markopoulos, A.: The R-linear convergence rate of an algorithm arising from the semi-smooth Newton method applied to 2D contact problems with friction. Comput. Optim. Appl. 61(2), 437–461 (2015) 26. Dostál, Z., Vondrák, V.: Duality based solution of contact problems with Coulomb friction. Arch. Mech. 49(3), 453–460 (1997) 27. Dostál, Z., Horák, D., Kučera, R., Vondrák, V., Haslinger, J., Dobiáš, J., Pták, S.: FETI based algorithms for contact problems: scalability, large displacements and 3D Coulomb friction. Comput. Methods Appl. Mech. Eng. 194(2–5), 395–409 (2005) 28. Nečas, J., Jarušek, J., Haslinger, J.: On the solution of the variational inequality to the Signorini problem with small friction. Bollettino dell’Unione Matematica Italiana 5(17–B), 796–811 (1980) 29. Dostál, Z., Kozubek, T., Markopoulos, A., Brzobohatý, T., Vondrák, V., Horyl, P.: Theoretically supported scalable TFETI algorithm for the solution of multibody 3D contact problems with friction. Comput. Methods Appl. Mech. Eng. 205, 110–120 (2012) 30. Haslinger, J., Kučera, R., Dostál, Z.: An algorithm for the numerical realization of 3D contact problems with Coulomb friction. J. Comput. Appl. Math. 164–165, 387–408 (2004) 31. Haslinger, J., Kučera, R.: T-FETI based algorithm for 3D contact problems with orthotropic friction. In: Stavroulakis, E. (ed.) Recent Advances in Contact Mechanics, Papers Collected at the 5th Contact Mechanics International Symposium (CMIS2009), Chania. LNACM vol. 56, pp. 131–149 (2013) 32. Bouchala, J., Dostál, Z., Kozubek, T., Pospíšil, L., Vodstrčil, P.: On the solution of convex QPQC problems with elliptic and other separable constraints. Appl. Math. Comput. 247(15), 848–864 (2014) 33. Pospíšil, L.: Development of Algorithms for Solving Minimizing Problems with Convex Quadratic Function on Special Convex Sets and Applications, Ph.D. Thesis, VŠB-TU Ostrava (2015). https://dspace.vsb. cz/handle/10084/110918?locale-attribute=en
Chapter 13
Transient Contact Problems Zdeněk Dostál and Tomáš Kozubek
Here, we adapt the methods presented in the previous chapters to the solution of contact problems which take inertia forces into account. The introduction of time is a source of many complications, even if we restrict our attention to 3D frictionless contact problems. Little is known about the solvability of continuous problems. Moreover, the solution cost depends on the number of time intervals required by time discretization. On the other hand, the problems are physically more realistic. The real-world problems are naturally time-dependent, and time discretization reduces the problem to a sequence of quasi-static problems with a good initial approximation. Therefore, it is possible to develop an explicit method that avoids solving auxiliary problems at the cost of many time steps [1]. In this chapter, we present an implicit method that uses a much smaller number of time steps at the cost of solving two well-conditioned bound-constrained QP problems at each time step. The chapter starts with a continuous problem, decomposition, and variational formulation. After the space and time discretization, we use duality to reduce the time-step problems of the implicit algorithm to well-conditioned bound-constrained quadratic programming problems in Lagrange multipliers. Finally, we use the FETI methodology to get bounds on the spectrum of effective stiffness and mass matrices that are sufficient for the proof of scalability of the algorithm. Numerical experiments confirm the scalability. The FETI domain decomposition methods comply well with the structure of auxiliary problems arising in the solution of transient contact problems. It turns out that the implementation of the time-step problems by TFETI is even simpler than that in Chap. 11 since the effective stiffness matrices of the “floating” subdomains are regularized by the mass matrices. To reduce the number of iterations, it is possible to use the preconditioning by a conjugate Z. Dostál • T. Kozubek Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_13
275
276
Zdeněk Dostál and Tomáš Kozubek
t =0 Ω1
t = t2 t = t1
Ω2 Fig. 13.1 Transient multibody contact problem at time t = 0, t = t1 , t = t2
projector (deflation), which improves the performance of both linear and nonlinear steps of the algorithms. The basic TFETI-based algorithms presented here are suitable for massively parallel implementation. There is no costly orthogonalization of dual equality constraints. For a reasonable number of time steps, we can use tens of thousands of cores to solve as large problems as reported in the previous chapters, i.e., tens of thousands of subdomains and billions of nodal variables. Some hints for massively parallel implementation are in Chaps. 15 and 20.
13.1 Transient Multibody Frictionless Contact Problem Let us consider a system of s homogenous isotropic elastic bodies, each of which occupies at time .t = 0 a reference configuration, bounded domains p 3 p .Ω ⊂ R , .p = 1, . . . , s, with a sufficiently smooth boundary .Γ . The mechanp ical properties of .Ω , which are assumed to be homogeneous, are defined by the Young modulus .E p , the Poisson ratio .ν p , and the density .ρp . The density defines the inertia forces. On each .Ωp , there is defined the vector of external body forces f p : Ωp ×[0, T ] → R3 .
.
We suppose that each .Γp consists of three disjoint parts .ΓpU , .ΓpF , and .ΓpC that do not change in time, .Γp = ΓpU ∪ ΓpF ∪ ΓpC . There are prescribed zero displacements and traction up = o on
.
ΓpU × [0, T ]
and
fΓp : ΓpF × [0, T ] → R3 ,
respectively. The part .ΓpC denotes the part of .Γp that can get into contact with other bodies in a time interval .[0, T ], .T > 0. We denote by .Γpq C the part of .ΓpC that can come in contact with .Ωq in the time interval .[0, T ]. We
13 Transient Contact Problems
277
admit .ΓpU = ∅ for any .p = 1, . . . , s, but we assume that all .ΓpC are sufficiently smooth so that for almost every .xp ∈ ΓpC , there is a unique external normal p p p .n = n (x ). See also Fig. 13.1. We assume that we can neglect the change p of .n in time. We denote s
Ω=
.
Ωp ,
ΓU =
p=1
s
ΓpU ,
ΓF =
p=1
s
ΓpF ,
p=1
ΓC =
s
ΓpC ,
p=1
and define f : Ω × [0, T ] → R3 ,
f | Ωp × [0, T ] = f p ,
fΓ : ΓF × [0, T ] → R3 ,
fΓ | ΓpF × [0, T ] = fΓp .
.
The main new features of frictionless transient contact problems, as compared with Sect. 11.2, apart from the time dimension, are the inertia forces and the initial displacements and velocities defined by .u0 : Ω → R3 and 3 .d0 : Ω → R , respectively. To describe the non-penetration on .Γpq C , we choose a contact coupling set .S , and for each .(p, q) ∈ S , we define a one-to-one continuous mapping qp χpq : Γpq C → ΓC
.
as in Sect. 11.2. The (strong) linearized non-penetration condition at .x ∈ Γpq C and time .t ∈ [0, T ] then reads p p pq p q pq . u (x) − u ◦ χ (x) · n (x) ≤ χ (x) − x · n (x). (13.1) Denoting the surface traction on the slave side of the active contact interface without friction by .
and using the notation of Sect. 11.2, we can write the complete contact conditions for .(p, q) ∈ S in the form .
[un ] ≤ g,
λn ≥ 0,
λn ([un ] − g) = 0,
(x, t) ∈ Γpq C × [0, T ].
(13.2)
The equations of motion and the constraints that should be satisfied by the displacements up : Ωp × [0, T ] → R3 , T > 0,
.
read
278
Zdeněk Dostál and Tomáš Kozubek
ρ¨ u − div σ (u) = f u=o
.
σ (u) n = fΓ λ = λn n p λn ≥ 0 [un ] ≤ g λn ([un ] − g) = 0 u(·, 0) = u0 ˙ 0) = d0 u(·,
in on
Ω ΓU
× [0, T ] , × [0, T ] ,
(13.3) (13.4)
on on on
ΓF Γpq C Γpq C
× [0, T ] , × [0, T ] , × [0, T ] ,
(p, q) ∈ S , (p, q) ∈ S ,
(13.5) (13.6) (13.7)
on on
Γpq C Γpq C
× [0, T ] , × [0, T ] ,
(p, q) ∈ S , (p, q) ∈ S ,
(13.8) (13.9)
in in
Ω, Ω.
(13.10) (13.11)
The relations (13.3)–(13.11) represent the general formulation of a transient (dynamic) multibody frictionless contact problem. The relations include Newton’s equation of motion (13.3), the classical boundary conditions (13.4)– (13.5), the contact conditions (13.6)–(13.9), and the initial conditions (13.10) and (13.11).
13.2 Variational Formulation and Domain Decomposition To get a formulation of transient contact problem (13.3)–(13.11) that is suitable for a finite element discretization in space, let us first define the test spaces (in the sense of traces) p V = V 1 × · · · × V s. .V = vp ∈ (H 1 (Ωp ))3 : vp = o on ΓpU , If we multiply the Newton equation of motion (13.3) by .v ∈ V , integrate the result over .Ω, and notice that (13.3)–(13.5) differ from the conditions of static equilibrium (11.5) only by the first term in (13.3); we can use procedures of Sect. 11.3 to get s
¨ p · vp dΩ + a (u, v) + p u
.
p=1
Ωp
(p,q)∈S
Γpq C
λn [vn ] dΓ = (v),
(13.12)
where a and . are defined in Sect. 11.3. For a sound variational formulation of the equations of motion (13.3)–(13.5), it is useful to admit .λn ∈ M + , where + pq μpq ∈ H −1/2 (Γpq .M = C ) : μ , [vn ] dΓ ≥ 0 for v ∈ V, [vn ] ≥ 0 . (p,q)∈S
In this case, we should replace the second sum in (13.12) by the duality pairing . λn , [vn ]. Using the notation
13 Transient Contact Problems
m(¨ u, v) =
279
s
¨ p · vp dΩ, p u
.
p=1
¨ , v ∈ V, u
Ωp
we can rewrite the variational equations (13.12) in the compact form .
m (¨ u, v) + a (u, v) + λn , v = (v).
v ∈ V.
(13.13)
To write the contact conditions (13.6)–(13.9) in a form which admits λ ∈ M + , which we shall use in Chap. 16, let us apply .μ ∈ M + to the non-penetration condition (13.8) and add the result to (13.9). After a simple manipulation, we get
. n
.
μ − λn , [un ] ≤ μ − λn , g,
μ ∈ M +.
(13.14)
Thus we got the problem to find for almost every .t ∈ [0, T ] a displacement u(· , t) ∈ V and .λn ∈ M + that satisfy (13.13), (13.14), (13.10), and (13.11). ¨ exists Moreover, we assume that the solution .u is sufficiently smooth so that .u in some reasonable sense and can be approximated by finite differences. A little is known about the solvability of transient problems, so we shall assume in what follows that a solution exists. For a complete discussion of solvability and variational formulation of dynamic problems, including problems with friction, see Wohlmuth [2] or Eck et al. [3]. The domain decomposition is similar to that described in Sect. 11.4. We tear each body from the part of the boundary with Dirichlet boundary conditions, decompose each body into subdomains, assign each subdomain a unique number, and introduce new “gluing” conditions on the artificial intersubdomain boundaries and the boundaries with imposed Dirichlet conditions. Misusing a little the notation, we denote the subdomains and their number again by .Ωp and s, respectively. For the artificial inter-subdomain boundaries, we introduce the notation analogously to the notation of the contact p q boundary so that .Γpq G denotes the part of .Γ that is interconnected with .Ω . pq qp Obviously .ΓG = ΓG . The interconnecting conditions require the continuity of the displacements and their normal derivatives across the inter-subdomain boundaries. An auxiliary decomposition of the problem of Fig. 13.1 with renumbered subdomains and artificial inter-subdomain boundaries is similar to that in Fig. 11.3. The procedure is essentially the same as that in Sect. 11.4.
.
13.3 Space Discretization To get the discretized variant of (13.13) and (13.6)–(13.11), we shall combine the finite element space discretization with the finite difference time discretization. For the space discretization, we define on each subdomain .Ωp
280
Zdeněk Dostál and Tomáš Kozubek
the basis vector functions Φpj : Ωp → R3 ,
.
j = 1, . . . , np .
As in Sect. 11.5, we assume that the basis functions are peace-wise linear with just one component equal to one in one of the nodes and zero other two components and the values in the other nodes. Furthermore, we assume that the basis functions comply with a quasi-uniform discretization of .Ωp into the shape regular tetrahedral finite elements. We denote by h the diameter of the largest element. We shall look for an approximate solution .uh (t) at time t in the trial space .Vh which is spanned by the basis functions, .
Vh = Vh1 × · · · × Vhs , Vhp ⊆ Rn , Vhp = Span{Φp , = 1, . . . , np }.
After decomposing .uh (t) into the subdomain components, we get uh (t) = [u1h (t), . . . , ush (t)]T ,
.
uph (t) =
np
up (t)Φp (x).
=1
Using the procedure described in Sect. 11.5, we get the block diagonal stiffness matrix .K = diag (K1 , . . . , Ks ) of the order n with the sparse SPS diagonal blocks .Kp that correspond to the subdomains .Ωp . The decomposition also induces the block structure of the vector .fh (t) ∈ Rn of the discretized form .. To enhance the inertia forces, we introduce the mass matrix ⎡ ⎤ M1 O · · · O ⎢ O M2 · · · O ⎥ ⎢ ⎥ p p .M = ⎢ . . . ⎥ , [Mp ]m = p (Φ , Φm )L2 (Ωp ) , , m = 1, . . . , np . ⎣ .. .. . . . .. ⎦ O O · · · Ms (13.15) After the decomposition and space discretization described above, we reduce the transient problem described by (13.13) and (13.6)–(13.11) to the problem to find .uh ∈ Vh × [0, T ] such that at the time t
13 Transient Contact Problems
281
Mu¨h + Kuh = f − BTI λTI − BTE λE , B I u h ≤ cI ,
.
(13.16) (13.17) (13.18) (13.19)
BE uh = cE = o, λI ≥ o, λT (Buh − c) = 0.
(13.20)
Here, we use the same notation (bold symbol) for the continuous displacements .uh ∈ V × [0, T ] and its vector representation .uh = uh (t) ∈ Rn at time t. The matrix BI ∈ RmI ×n and the vector cI ∈ RmI describe the linearized non-penetration conditions as in Sect. 11.5. Similarly the matrix BE ∈ RmE ×n and the vector cE ∈ RmE with the entries ci = 0 enforce the zero displacements on the part of the boundary with imposed Dirichlet conditions and the continuity of displacements across the auxiliary interfaces, respectively. Typically, both mI and mE are much smaller than n. We use the notation BI cI m×n .B = , c= ∈R ∈ Rm , m = mI + mE . BE cE We assume that B and c do not change in time t and the matrix B has orthonormal rows so that it satisfies (11.25) (see also Remark 11.1). We introduced the latter assumption to simplify the proof of optimality. Finally, let λI ∈ RmI and λE ∈ RmE denote the components of the vector of Lagrange multipliers λ = λt ∈ Rm at time t. The Lagrange multipliers λE and λI define the reaction forces on the artificial inter-subdomain and contact interfaces, respectively. We use the notation λI . λ = λt = . λE
13.4 Time Discretization For the time discretization, we use the contact-stabilized Newmark scheme introduced by Krause and Walloth [4] with the regular partition of the time interval .[0, T ], 0 = t0 < t1 < · · · < tnT = T,
tk = kΔ,
.
Let us simplify the notations to uk = uh (tk ).
.
Δ = T /nT ,
k = 0, . . . , nT .
282
Zdeněk Dostál and Tomáš Kozubek
The scheme assumes that the acceleration vector is split at time .tk into two ¨ con ¨ int and .u components .u k k related to the acceleration affected by the contact and other forces, respectively, .
¨k = u ¨ con ¨ int u k +u k ,
−1 ¨ int ¨ con u (fk − Kuk ) , and u = −M−1 BT λk . k =M k (13.21)
Denoting K = {u ∈ Rn : BI u ≤ cI and BE u = o},
.
and
= 4 M + K, K Δ2 we can write the solution algorithm in the following form. .
Algorithm 13.1 Contact-stabilized Newmark algorithm Step 0. {Initialization} = 42 M + K, .λ0 , .T > 0, .nT ∈ N, and .Δ = T /nT Set .u0 , .u˙ 0 , .K Δ for .k = 0, . . . , nT − 1 Step 1. {Predictor displacement} Find the minimizer .upred k+1 1 T .minu∈K u Mu − (Muk + ΔMu˙ k )T u 2 Step 2. {Contact-stabilized displacement} Find the minimizer .uk+1 and the multiplier .λk+1 for .minu∈K
1 T u Ku 2
−
4 Mupred k+1 Δ2
− Kuk + fk + fk+1 − BT λk
T u
Step 3. {Velocity update} 2 ˙ k+1 = u˙ k + Δ .u uk+1 − upred k+1 end for
introduced in Step 0 is called the effective stiffness matrix. The matrix .K is is nonsingular and that the condition number .κ(K) We shall see that .K favorable and decreases with the decreasing time step .Δ.
13.5 Dual Formulation of Time-Step Problems The cost of the iterates of Algorithm 13.1 is dominated by the cost of the execution of its Step 1 and Step 2, the implementation of which requires the solution of bound and equality constrained QP problems .
min f (u)
s.t.
B I u ≤ cI
and
B E u = cE
(13.22)
13 Transient Contact Problems
283
with the strictly convex quadratic function f (u) =
.
1 T u Hu − uT h. 2
The function f for the problem arising in Step 1 is defined by H = M and
.
h = Muk + ΔMu˙ k
and for that arising in Step 2 by H=K
.
and
4 T Mupred k+1 − Kuk + fk + fk+1 − B λk . Δ2
h=
Even though (13.22) is a standard QP problem, its form is not suitable for numerical solution due to general inequality constraints. This complication can be reduced by applying the duality theory of convex programming (see Sect. 3.7) as in Sect. 11.6. In the dual formulation of problem (13.22), we use the Lagrange multipliers with two block components, namely, .λI ∈ RmI associated with the non-penetration condition and .λE ∈ RmE associated with the “gluing” and prescribed displacements, so the Lagrangian associated with problem (13.22) reads .
L(u, λ) = f (u) + λT (Bu − c).
(13.23)
Using Proposition 3.13 (with .R = [o]), we get that .λ solves the strictly convex minimization problem .
min
1 T λ BH−1 BT λ − λT (BH−1 h − c) 2
s.t.
λI ≥ o.
(13.24)
of (13.24) is known, the solution .u of (13.22) may be Once the solution .λ evaluated from the KKT condition = o. H u − h + BT λ
.
Problem (13.24) is much more suitable for computations than (13.22) because we replaced the general inequality constraints in (13.22) with nonnegativity constraints. In the following two sections, we show that the favorable and the form of constraints distribution of the spectrum of both .M and .K are sufficient to implement the first two steps by using the standard MPRGP algorithm with asymptotically linear complexity.
284
Zdeněk Dostál and Tomáš Kozubek
13.6 Bounds on the Spectrum of Relevant Matrices Let us give bounds on the spectrum of the mass matrix and the dual-energy matrix in the form that is suitable for the proof of optimality of algorithms for the problems in Sect. 13.5.
13.6.1 Bounds on the Mass Matrix Spectrum First, recall that the mass matrix .M is block-diagonal so that we can restrict our analysis to one block that we denote .Mp . Notice that .Mp itself comprises three identical blocks generated by the basis functions for the displacements in the coordinate axes. In what follows, we shall use the notations introduced in Sect. 11.11.1. For simplicity, we assume .p = 1. Lemma 13.1 Let .M denote a mass matrix (13.15) resulting from a quasiuniform discretization of .Ω using shape regular elements with the discretization parameter h. Then λ
. min
(M) ≈ h3 ,
λmax (M) ≈ h3 ,
and
κ(M) ≈ 1.
(13.25)
Proof Let k denote a maximum number of the basis functions with the nonempty intersections of their supports. Then λ
. max
(Mp ) ≤ Mp ∞ ≤ max i
≤
np
|(Φpi , Φpj )L2 (Ωp ) |
j=1 p 2 k max Φi L2 (Ωp ) ≈ i
h3 .
To get a lower bound on .λmin (Mp ), recall that .Ωp is decomposed into the tetrahedral finite elements .τ ∈ Tp , so we can use element mass matrices .Mp,τ to get .Mp = Mp,τ . τ ∈Tp
Let .ep and .ep,τ denote the normalized eigenvector corresponding to .λmin (Mp ) and the restriction of .ep with the indices of nodal displacements of .τ , respectively. Then we can follow Fried [7] to get T .λmin (Mp ) = ep Mp ep = eTp,τ Mp,τ ep,τ τ ∈Tp
≥ min λmin (Mp,τ ) τ ∈Tp
τ ∈Tp
ep,τ 2 ≥ min λmin (Mp,τ )ep 2 . τ ∈Tp
13 Transient Contact Problems
285
Now we can use the beautiful inequality introduced by Wathen [6] (see also Kamenski et al. [8]) to get that for any vector .u of nodal displecements of .τ 1 T T u diag(Mp,τ )u, .u Mp,τ u ≥ 2 where .diag(Mp,τ ) denotes the diagonal matrix comprising the diagonal entries of .Mp,τ . Since the diagonal entries of .Mp,τ are proportional to .h3 , we get λ
. min
(Mp ) ≥ min uT Mp,min u ≥ min u=1
u=1
1 T u diag(Mp,min )u 2
1 ≥ min Φpi 2L2 (Ωp ) ≈ h3 , i 2 with .Mp,min denoting a local mass matrix corresponding to the least eigen. value of .Mp,τ among all .τ ∈ Tp .
13.6.2 Bounds on the Dual Stiffness Matrix Spectrum To simplify the notation, let us denote .
= BK −1 BT F
and
−1 h − c. d = BK
reads The problem (13.24) with .H = K .
min Θ(λ)
s.t. λ ∈ ΩB ,
(13.26)
ΩB = {λ ∈ Rm : λI ≥ o}.
(13.27)
where .
Θ(λ) =
1 T λ Fλ − λT d 2
and
Recall that by Lemma 13.1, the quadratic form defined by the mass matrix satisfies .
xT Mx ≈ h3 x2 ,
x ∈ Rn .
(13.28)
Lemma 13.2 Let the assumptions of Lemma 11.4 hold, .C > 0, and BT λ2 ≈ λ2 .
.
Then .
h2 Δ2 Δ2 T 2 λ λ λ2 . Fλ h3 (h2 + Δ2 ) h3
(13.29)
286
Zdeněk Dostál and Tomáš Kozubek
Proof Let .λ ∈ Rm and .μ = BT λ. Since by (13.28) μT M−1 μ ≈ h−3 μ2
.
and the smallest eigenvalue of .K is zero, we have = μT K −1 μ = μT λT Fλ .
K+
4 M Δ2
−1 μ
Δ2 T 2 Δ2 B λ ≈ 3 λ2 . 3 h h
Similarly, using Lemma 11.4, (13.28), and the assumptions, we get
= μT K −1 μ = μT λT Fλ .
K+
4 M Δ2
−1 μ
−1 −1 h + h3 Δ−2 μ2 = h−3 h−2 + Δ−2 λ2 .
After simple manipulations, we get (13.29).
.
The main result of this section is now an easy corollary of Lemma 13.2. Proposition 13.1 Let the assumptions of Lemma 11.4 hold, .C > 0, and BT λ2 ≈ λ2 .
.
be generated with the discretization parameters h and .Δ, .0 < Δ ≤ Ch. Let .F Then .
≈ 1. κ(F)
(13.30)
13.7 Preconditioning by Conjugate Projector Examining the proof of Lemma 13.2, we can see that if we use longer time steps, the effective stiffness matrix has very small eigenvalues which correspond to the eigenvectors that are near the kernel of .K. This observation was first exploited for linear problems by Farhat et al. [10] who used the conjugate projectors to the natural coarse grid to achieve scalability for the time step. Unfortunately, this idea cannot be applied to the full extent to the contact problems as we do not know a priori which boundary conditions affect the subdomains with nonempty contact interfaces. However, we can still define preconditioning by the trace of rigid body motions on artificial subdomain interfaces. To implement this observation, we use the preconditioning by a conjugate projector for partially constrained strictly convex quadratic programming problems that complies with the MPRGP algorithm for the so-
13 Transient Contact Problems
287
lution of strictly convex bound-constrained problems. Our approach is based on Domorádová and Dostál [11]. Even though the method presented here is similar to that used in Chap. 11, it is not identical, as here we work with the effective stiffness matrix which is nonsingular. The choice of projectors is motivated by the classical FETI results by Farhat et al. [12] and their application to the unconstrained transient problems by Farhat et al. [10]. To explain the motivation in more detail, let us denote by .R ∈ Rn×r the full rank matrix, the columns of which span the kernel of the stiffness matrix .K introduced in Sect. 13.2, i.e., KR = O
and
.
= 4 MR. KR Δ2
Since all the subdomains are floating, the matrix .R is known a priori. In the static case, we get the same minimization problem as in (13.22) but with .H = K and .h = f . In this case, .H is SPS, so by Proposition 3.13 the dual problem reads .
min
1 T λ Fλ − λT d 2
s.t.
λI ≥ o
and
Gλ = e,
(13.31)
where .
= BK+ f − c, d e = TRT f ,
F = BK+ BT , G = TRT BT ,
and .T and .K+ denote a regular matrix that defines the orthonormalization of the rows of .RT BT and an arbitrary generalized inverse to .K satisfying + .K = KK K, respectively. The final step in the static case is based on the observation that problem (13.31) is equivalent to .
min
1 T λ (PFP + ρQ)λ − λT Pd 2
s.t.
Gλ = o and
λ I ≥ I ,
(13.32)
where .ρ is an arbitrary positive constant, Q = GT G
.
and
P=I−Q
denote the orthogonal projectors onto the image space of .GT and onto the − Fλ, and . = −λ, where .λ is a particular kernel of .G, respectively, .d = d solution of the equation .Gλ = e used for the homogenization as in Sect. 11.6. The proof of optimality for static problems in Sect. 11.12 was based on can be considered a favorable distribution of the spectrum of .F|KerG. Since .F for a large .Δ as a perturbation of .F, it seems natural to assume that the reduction of iterations to .KerG guarantees also some preconditioning effect for .F.
288
Zdeněk Dostál and Tomáš Kozubek
To develop an efficient algorithm for (13.24) based on MPRGP, we have to find a subspace of .Rm that is near to .KerG and is invariant with respect to the Euclidean projection onto the feasible set. The latter condition implements the requirement for the convergence of multigrid formulated by Iontcheva and Vassilevski [13] that the coarse grid should be defined by the subspace which is kept away from the contact interface. A natural choice .U = ImUT is defined by the full rank matrix .U ∈ Rr×m , GE = RT BTE .
U = [O, GE ],
.
We implement this idea by means of the conjugate projectors .
T )−1 UF P = I − UT (UFU
and
T )−1 UF Q = UT (UFU
(13.33)
onto the subspaces V = ImP
U = ImQ = ImUT ,
and
.
respectively. It is easy to check directly that .P and .Q are conjugate projectors, i.e., P2 = P,
.
Q2 = Q,
and
= O, PT FQ
and that .
= PT FP = FP PT F
and
= QT FQ = FQ. QT F
(13.34)
From the definition of V and (13.34), we get immediately .
FV ) ⊆ FV. PT FP(
(13.35)
is an invariant subspace of .PT FP. The following simple lemma shows Thus .FV the vector .Pλ ∈ V is expansive. that the mapping which assigns each .λ ∈ FV Lemma 13.3 Let .P denote a conjugate projector onto a subspace V. Then for any .λ ∈ FV .
Pλ ≥ λ
(13.36)
.
). V = P(FV
(13.37)
and
Proof See [14]. Using the projector .Q, it is possible to solve the auxiliary problem .
1 T μ − dT UT μ, min Θ(ξ) = minr Θ(UT μ) = minr μT UFU μ∈R μ∈R 2 ξ∈U
.
13 Transient Contact Problems
289
with .Θ defined in (13.27). By the gradient argument, we get that the mini of .Θ over U is defined by mizer . ξ = UT μ T μ = Ud, UFU
(13.38)
T )−1 Ud = QF −1 d. ξ = UT (UFU
(13.39)
.
so that .
Thus we can find the minimum of .Θ over U whenever we are able to solve (13.38). We shall use the conjugate projectors .Q and .P to decompose the minimization problem (13.24) into the minimization on U and .V ∩ΩB , where .V = ImP and .ΩB is defined in (13.27). In particular, we shall use three observations. First, using Lemma 13.3, we get that the mapping which assigns to each a vector .Px ∈ V is an isomorphism. Second, using the definitions of .x ∈ FV the projectors .P and .Q, (13.34), and (13.39), we get .
F −1 d − d = QF F −1 d − d = QT d − d = −PT d. (13.40) ξ − d = FQ g0 = F
Since .
= Im(FP) = FV ImPT = Im(PT F)
(13.41)
. Finally, observe that it and .g0 ∈ ImPT by (13.40), we get that .g0 ∈ FV T follows by Lemma 13.3 that the restriction .P FP|FV is positive definite. For any vector .λ ∈ Rm , let .I ⊆ {1, . . . , m} denote the set of indices that correspond to .λI , so that .λI = λI . Due to our special choice of .U, we get that for any .λ ∈ Rm .
[Pλ]I = λI .
(13.42)
Moreover, for any .ξ ∈ U and .μ ∈ V , .[ξ + μ]I ≥ o if and only if .μI ≥ o. Using (13.39), (13.40), and (13.42), we get min Θ(λ) =
λ∈ΩB
min Θ(ξ + μ) = min Θ(ξ) +
= Θ( ξ) + .
ξ∈U
ξ∈U,μ∈V ξ+μ∈ΩB
min
μ∈V ∩ΩB
Θ(μ)
1 Θ(μ) = Θ( ξ) + min μT PT FPμ − dT Pμ μ∈V ∩ΩB μ∈FV 2 min
= Θ( ξ) + min
μ∈ FV μI ≥o
1 T T μ P FPμ + g 2
μI ≥o
0 T
μ.
Thus we have reduced our bound-constrained problem (13.24), after replacing μ by .λ, to the problem
.
290
Zdeněk Dostál and Tomáš Kozubek
.
min
λ∈ FV λI ≥o
T 1 T T λ P FPλ + g0 λ. 2
(13.43)
The following lemma shows that the above problem can be solved by the standard MPRGP algorithm without any change. Lemma 13.4 Let .λ1 , λ2 , . . . be generated by the MPRGP algorithm for the problem T 1 T T λ P FPλ + g0 λ (13.44) λI ≥o 2 starting from .λ0 = PΩB g0 , where .PΩB denotes the Euclidean projection to k = 0, 1, 2, . . . . the feasible set .ΩB introduced in (13.27). Then .λk ∈ FV, .
min
Proof See Domorádová and Dostál [11] or the book [15].
.
To discuss at least briefly the preconditioning effect of the restriction of , let .ϕmin and .ϕmax denote the extreme eigenvalues of .F. the iterates to .FV and .μ = 1. Then by Lemma 13.3 Let .μ ∈ FV .
2 μT PT FPμ ≥ ϕmin ≥ (Pμ)T F(Pμ)/Pμ = (Pμ)T F(Pμ)
(13.45)
≤ ϕmax . μT PT FPμ ≤ μT QT FQμ + μT PT FPμ = μT Fμ
(13.46)
and .
Thus, it is natural to assume that the preconditioning by the conjugate projector results in an improved rate of convergence. The preconditioning effect can be evaluated by means of the gap between the subspace defined by the and the subspace eigenvectors corresponding to the smallest eigenvalues of .F that is defined by the image space of .UT (see Dostál [14] or [15]). Let us stress that the unique feature of the preconditioning by a conjugate projector is that it preconditions not only linear steps but also the nonlinear gradient projection steps. Let us compare our analysis with Farhat et al. [10], who show that in the unconstrained case, the conjugate projector for large time steps approaches the orthogonal projector to the natural coarse grid (defined by the kernel of .G). This observation implies that the convergence properties of their algorithm for longer time steps are similar to the convergence of FETI1. Unfortunately, these arguments are not valid for our algorithm. The reason is that our coarse space is smaller. Indeed, if .λ = [λTI , .oT ]T , then obviously .λ ∈ KerU, but not necessarily .λ ∈ KerG. On the other hand, the size of .λI is typically rather small as compared with the size of .λ so that it is natural to assume that the preconditioning by the conjugate projector improves the rate of convergence significantly.
13 Transient Contact Problems
291
13.8 Optimality Now we are ready to show that MPRGP can be used to implement the time step of the implicit Newmark method with a uniformly bounded number of matrix–vector multiplications. Let us consider a class of problems arising from the finite element discretization of a given transient contact problem (13.13) and (13.6)–(13.11) with varying time steps .Δ > 0 and the decomposition and discretization parameters H and h, respectively. We assume that the space discretization satisfies the assumptions of Theorem 11.2. Given the constants .C1 , C2 > 0, we shall define TC1 C2 = {(H, h, Δ) ∈ R3 : 0 < Δ ≤ C1 h and
.
h ≤ C2 }
as the set of indices. For any .t ∈ TC1 ,C2 , we shall define = F, F
. t
dt = d,
It = I ,
by the entities generated with the parameters H, h, and .Δ, so that problem (13.24) is equivalent to the problem .
min
1 T λ Ft λt − dTt λt s.t. [λt ]It ≥ o. 2 t
(13.47)
The main result reads as follows. Theorem 13.1 Let .C1 , C2 , ε > 0, and .0 < c < 1 denote given constants. For each .t ∈ TC1 C2 , let .{λkt } be generated by Algorithm 8.2 (MPRGP) defined for the solution of problem (13.47) with Γ>0
.
t −1 ≤ αt < (2 − c)F t −1 . cF
and
t which satisfies Then MPRGP finds an approximate solution .λ .
t ) ≤ εdt gP (λ
(13.48)
t of the cost function. at .O(1) matrix–vector multiplications by the Hessian .F Proof We shall show that the contraction coefficient of MPRGP ηΓ,t =1 − .
t ) α t λmin (F , 2 ϑt + ϑt Γ
t , 1}, ϑt =2 max{αt F
= max{Γ, Γ−1 }, Γ t −1 − αt }, α t = min{αt , 2F
292
Zdeněk Dostál and Tomáš Kozubek
which was established in Theorem 8.1, is bounded away from 1. First recall that by Proposition 13.1 there is a constant .C ≥ 1 such that t ) ≤ C, κ(F
.
t ∈ TC 1 C 2 .
Since t −1 − αt } ≥ min{cF t −1 , (2 − c)F t −1 } α t = min{αt , 2F t −1 > 0, = cF
.
(13.49) (13.50)
it follows that t ) ≥ cF t −1 λmin (F t ) = cκ(F t )−1 ≥ c/C. α t λmin (F
.
Observing that .ϑt ≤ 4, we get η
. Γ,t
=1−
t ) α t λmin (F c ≤1− < 1. 2 2 ) ϑt + ϑt Γ 4C(1 + Γ
Thus we have found a nontrivial bound on .ηΓ,t which does not depend on H, h, Δ. The rest follows Theorem 8.1. .
.
Let us finish by recalling that the scalability of the solution algorithm requires that the ratio .H/h is uniformly bounded. The convergence rate can be improved using the preconditioning by the conjugate projector.
13.9 Numerical Experiments The algorithms presented here were implemented in software packages (see Sect. 20.7) and tested on academic benchmarks and real-world problems. Here, we give some results that illustrate their numerical scalability and effectiveness using MatSol [16], postponing the demonstration of parallel scalability to Chap. 20. All computations were carried out with the parameters recommended in the description of the algorithms in Chap. 8, i.e., .Γ = 1 and −1 . The relative precision of the computations was .10−4 . .α = 1.9F
13.9.1 Academic Benchmark We first tested the performance of the TFETI-based algorithms on 3D impact of the elastic bodies .Ω1 and .Ω2 depicted in Fig. 13.2 (see [17]). The top view of the bodies is .10 × 10 [mm], and the top face of the lower body .Ω1 and the bottom face of the upper body .Ω2 are described by the functions .φ1 and .φ2 , respectively, where
13 Transient Contact Problems
293
Ω2
G
Ω1
Fig. 13.2 Geometry with traces of domain decomposition
φ (x, y) = 5 sin 20 − 400 − x2 − y 2 , φ2 (x, y) = sin 20.22 − x2 − y 2 − 20.2 .
. 1
Material constants are defined by the Young modulus .E 1 = E 2 = 2.1 · 105 [MPa], the Poisson ratio .ν 1 = ν 2 = 0.3, and the density .ρ1 = ρ2 = 7.85 · 10−3 [g/mm.3 ]. The initial gap between the curved boxes is set to .g0 = 0.001 [mm]. The initial velocity of the upper body .Ω2 in the vertical direction is .v 2 = −1 [m/s]. The upper body is floating and the lower body is fixed along the bottom side. The linearized non-interpenetration condition was imposed on the contact interface. The space discretization was carried out with the brick elements using varying discretization and decomposition parameters h and H, respectively. We kept .H/h = 16. The number of subdomains ranged from 16 to 250, the number of dual variables ranged from 21,706 to 443,930, and the number of primal variables ranged from 196,608 to 3,072,000. For the time discretization, we used the contact-stabilized Newmark algorithm with the constant time step −3 .Δ = 3h · 10 [s] and solved the impact of bodies in the time interval .[0, 45Δ] [s]. The energy development is in Fig. 13.3 which shows that the algorithm preserves the energy as predicted by the theory. The development of contact pressure in the middle of .Γ2C is in Fig. 13.4. The performance of the algorithm is in Fig. 13.5.
294
Zdeněk Dostál and Tomáš Kozubek Total energy
3.5
Kinetic energy
Potencial energy
3 2.5 2 1.5 1 0.5 0 0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8 −5
Time τ [s]
x 10
Fig. 13.3 Energy conservation [ton .· mm2 · s−2 ] 350 300 250 200 150 100 50 0 0
τ
τ
1
0.2
0.4
0.6
0.8
1
1.2
1.4
Time τ [s]
2
1.6
1.8 −5 x 10
Fig. 13.4 Contact pressure in the middle of .Γ2C [MPa]
Fig. 13.5 Hessian multiplications for MPRGP and MPRGP-P, time steps .Δ2 = 10−5 (left) and .Δ1,h = 3h · 10−1 (right)
13.9.2 Impact of Three Bodies The second benchmark deals with the transient analysis of three elastic bodies in mutual contact. The bottle is a sample model of Autodesk Inventor (“bottle.ipt”). We prescribed the initial horizontal velocity .v 3 = −5 [m/s] on the sphere. The L-shaped body was fixed on the bottom. For the time dis-
13 Transient Contact Problems
295
cretization, we used constant time step .Δ = 10−3 [s] and solved the impact of bodies in time interval .[0, 150Δ] [s]. The solution (total displacement of the problem discretized by .1.2 · 105 primal and .8.5 · 103 dual variables and decomposed into 32 subdomains using METIS at time .t1 = 20Δ, .t2 = 40Δ, .t3 = 60Δ, .t4 = 80Δ, .t5 = 100Δ, .t6 = 120Δ [s]) is depicted in Fig. 13.6 (see [17]). The whole computations required some 300 and 30 matrix–vector multiplications per time step during the impact and separation, respectively.
Fig. 13.6 Impact of bodies in time
13.10 Comments Convenient presentation of variational formulation, finite element approximation, and standard solution methods can be found in the book by Kikuchi and Oden [18]. More on the formulation of continuous problems and the existence of solutions is in Eck et al. [19]. An up-to-date engineering approach to solving transient contact problems is given by Laursen [20] or Wriggers [21]. A conveniently applicable stable energy-conserving method, which is based on quadrature formulas and applicable to frictional contact problems, was proposed by Hager et al. [22]. The method is based on quadrature formulas. For its analysis, see Hager and Wohlmuth [23]. Important issues related to the time discretization and avoiding the nonphysical oscillations that result from the application of standard time discretization methods for unconstrained problems were proposed, e.g., by Chawla and Laursen [24]; Wohlmuth [2]; Bajer and Demkowicz [25, 26]; Khenous et al. [27, 28]; Deuflhard et al. [29]; and Kornhuber et al. [30]. A discussion of the advantages and disadvantages of the respective approaches can be found in [31]. The experimental evidence of optimal complexity of the time step for some academic problems has been reported by the methods discussed in the previous chapters, in particular in Chap. 11. Thus Krause and Walloth [4] documented that even some transient problems with friction can be solved very efficiently by the monotone multigrid method. See also Wohlmuth and Krause [32] and Kornhuber et al. [30]. However, the multigrid methods typically reduce the solution of auxiliary problems to a sequence of linear problems that are solved with optimal complexity, typically leaving the nonlinear
296
Zdeněk Dostál and Tomáš Kozubek
steps without theoretically supported preconditioning so that the theory does not guarantee the optimal performance of the overall procedure. Here we use the combination of standard finite element space discretization with contact-stabilized Newmark scheme that Krause and Walloth [4] introduced for the transient problems with friction. The results presented in this chapter appeared in Dostál et al. [17]. The preconditioning by conjugate projector (deflation) for linear problems was introduced independently by Marchuk and Kuznetsov [33], Nicolaides [34], and Dostál [14]). For recent references, see Gutnecht [35]. The procedure was adapted to the solution of inequality-constrained problems by Domorádová and Dostál [11]. The lack of an a priori known and sufficiently small subspace with the solution does not allow stronger theoretical results. Farhat et al. [10] used the deflation with the naturl coarse grid to achieve optimality of FETI for unconstrained transient problems. See Chap. 16 for the discussion of the combination of TFETI and the mortar approximation of contact conditions.
References 1. Zhao, X., Li, Z.: The solution of frictional wheel-rail rolling contact with a 3D transient finite element model: validation and error analysis. Wear 271(1–2), 444–452 (2011) 2. Wohlmuth, B.I.: Variationally consistent discretization scheme and numerical algorithms for contact problems. Acta Numer. 20, 569–734 (2011) 3. Jarošová, M., Klawonn, A., Rheinbach, O.: Projector preconditioning and transformation of basis in FETI-DP algorithms for contact problems. Math. Comput. Simul. 82(10), 1894–1907 (2012) 4. Krause, R., Walloth, M.: A time discretization scheme based on Rothe’s method for dynamical contact problems with friction. Comput. Methods Appl. Mech. Eng. 199, 1–19 (2009) 5. Brenner, S.C.: The condition number of the Schur complement. Numer. Math. 83, 187–203 (1999) 6. Wathen, A.J.: Realistic eigenvalue bounds for the Galerkin Mass Matrix. IMA J. Numer. Anal. 7, 449–457 (1987) 7. Fried, I.: Bounds on the extremal eigenvalues of the finite element stiffness and mass matrices and their spectral condition number. J. Sound Vib. 22, 407–418 (1972) 8. Kamenski, L., Huang, We., Xu, H.: Conditioning of finite element equations with arbitrary anisotropic meshes. Math. Comput. 83, 2187–2211 (2014) 9. Toselli, A., Widlund, O.B.: Domain Decomposition Methods— Algorithms and Theory. Springer Series on Computational Mathematics, vol. 34. Springer, Berlin (2005)
13 Transient Contact Problems
297
10. Farhat, C., Chen, J., Mandel, J.: A scalable Lagrange multiplier based domain decomposition method for time-dependent problems. Int. J. Numer. Methods Eng. 38, 3831–3853 (1995) 11. Domorádová, M., Dostál, Z.: Projector preconditioning for partially bound constrained quadratic optimization. Numer. Linear Algebra Appl. 14(10), 791–806 (2007) 12. Farhat, C., Mandel, J., Roux, F.-X.: Optimal convergence properties of the FETI domain decomposition method. Comput. Methods Appl. Mech. Eng. 115, 365–385 (1994) 13. Iontcheva, A.H., Vassilevski, P.S.: Monotone multigrid methods based on element agglomeration coarsening away from the contact boundary for the Signorini’s problem. Numer. Linear Algebra Appl. 11(2–3), 189–204 (2004) 14. Dostál, Z.: Conjugate gradient method with preconditioning by projector. Int. J. Comput. Math. 23, 315–323 (1988) 15. Dostál, Z.: Optimal Quadratic Programming Algorithms, with Applications to Variational Inequalities, 1st edn. Springer, New York (2009) 16. Kozubek, T., Markopoulos, A., Brzobohatý, T., Kučera, R., Vondrák, V., Dostál, Z.: MatSol–MATLAB efficient solvers for problems in engineering. The library is no longer updated and is replaced by the ESPRESO framework. http://numbox.it4i.cz 17. Dostál, Z., Kozubek, T., Brzobohatý, T., Markopoulos, A., Vlach, O.: Scalable TFETI with optional preconditioning by conjugate projector for transient contact problems of elasticity. Comput. Methods Appl. Mech. Eng. 247–248, 37–50 (2012) 18. Kikuchi, N., Oden, J.T.: Contact Problems in Elasticity. SIAM, Philadelphia (1988) 19. Eck, C., Jarůšek, J., Krbec, M.: Unilateral Contact Problems. Chapman & Hall/CRC, London (2005) 20. Laursen, T.: Computational Contact and Impact Mechanics. Springer, Berlin (2002) 21. Wriggers, P.: Contact Mechanics. Springer, Berlin (2005) 22. Hager, C., Hüeber, S., Wohlmuth, B.I.: A stable energy conserving approach for frictional contact problems based on quadrature formulas. Int. J. Numer. Methods Eng. 73, 205–225 (2008) 23. Hager, C., Wohlmuth, B.I.: Analysis of a space-time discretization for dynamic elasticity problems based on mass-free surface elements. SIAM J. Numer. Anal. 47(3), 1863–1885 (2009) 24. Chawla, V., Laursen, T.: Energy consistent algorithms for frictional contact problems. Int. J. Numer. Methods Eng. 42, 799–827 (1998) 25. Bajer, A., Demkowicz, L.: Conservative discretization of contact/impact problems for nearly rigid bodies. Comput. Methods Appl. Mech. Eng. 190, 1903–1924 (2001)
298
Zdeněk Dostál and Tomáš Kozubek
26. Bajer, A., Demkowicz, L.: Dynamic contact/impact problems, energy conservation, and planetary gear trains. Comput. Methods Appl. Mech. Eng. 191, 4159–4191 (2002) 27. Khenous, H., Laborde, P., Renard, Y.: On the discretization of contact problems in elastodynamics. Analysis and Simulation of Contact Problems. Lecture Notes in Applied and Computational Mechanics, vol. 27, pp. 31–38. Springer, Berlin (2006) 28. Khenous, H., Laborde, P., Renard, Y.: Mass redistribution method for finite element contact problems in elastodynamics. Eur. J. Mech. A/Solids 27, 918–932 (2008) 29. Deuflhard, P., Krause, R., Ertel, S.: A contact-stabilized newmark method for dynamical contact problems. Int. J. Numer. Methods Eng. 73(9), 1274–1290 (2008) 30. Kornhuber, R., Krause, R., Sander, O., Deuflhard, P., Ertel, S.: A monotone multigrid solver for two body contact problems in biomechanics. Comput. Vis. Sci. 11, 3–15 (2008) 31. Krause, R., Walloth, M.: Presentation and comparison of selected algorithms for dynamic contact based on the Newmark scheme. Appl. Numer. Math. 62(10), 1393–1410 (2012) 32. Wohlmuth, B.I., Krause, R.: Monotone methods on nonmatching grids for nonlinear contact problems. SIAM J. Sci. Comput. 25, 324–347 (2003) 33. Marchuk, G. I., Kuznetsov, Yu. A.: A generalized conjugate gradient method in a subspace and its applications. In Marchuk, G.I., Dymnikov, V.P. (eds.) Problems of Computational Mathematics and Mathematical Modelling. Mir, Moscow 10–32 (1985) 34. Nicolaides, R.A.: Deflation of conjugate gradients with applications to boundary value problems. SIAM J. Numer. Anal. 24, 355–365 (1987) 35. Gutnecht, M.H.: Spectral deflation in Krylov solvers: a theory of coordinate based methods. Electron. Trans. Numer. Anal. 39, 156–185 (2012)
Chapter 14
TBETI Marie Sadowská, Zdeněk Dostál, and Tomáš Kozubek
The optimality results presented in the previous four chapters used the equilibrium conditions in the domains decomposed into non-overlapping subdomains and interconnected by the equality constraints. We discretized the subdomains by the finite element method and then reduced the original problem to minimizing a quadratic function in Lagrange multipliers. Since the multipliers are associated with the subdomains’ boundaries, the procedure reduced the problem to the skeleton of the decomposition. There is a natural question of whether we can reduce the formulation of equilibrium conditions to the subdomains’ boundary on a continuous level. Such reduction promises a lower dimension of the primal discretized problem, simplified discretization, and higher precision of numerical solution. Moreover, a boundary formulation can help solve the problems that include a free boundary, as in the contact shape optimization, and effective treatment of the boundary value problems defined on unbounded domains. This chapter indicates how to reduce frictionless continuous contact problems to the subdomains’ boundaries. The basic observation is that the formulae for sufficiently rich sets of so-called fundamental solutions can be combined to satisfy the original problem’s boundary conditions and the interconnecting conditions on the artificial interface. Our development of scalable algorithms for contact problems is based on the BETI (Boundary Element Tearing and Interconnecting) method, which was introduced by Langer and Steinbach [1]. The main new feature of BETI is that it generates the counterparts of the Schur complements directly by the discretization of the conditions of equilibrium reduced to the boundary. We use a variant of BETI called TBETI, which is the boundary element M. Sadowská • Z. Dostál • T. Kozubek Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]; e-mail: [email protected]; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_14
299
300
Marie Sadowská, Zdeněk Dostál, and Tomáš Kozubek
adaptation of TFETI. Here we describe the main steps of reducing the contact problems to the boundary—first for scalar problems and then for elasticity.
14.1 Green’s Representation Formula for 2D Laplacian With explicit knowledge of the fundamental solution, we can reformulate the partial differential equation on the domain using the boundary integral equations. To recover the solution in the interior of the domain, one needs to know complete Cauchy data on the boundary. Let us first consider the 2D Poisson equation .
− Δu = f
(14.1)
in Ω,
where Ω ⊂ R2 denotes a bounded Lipschitz domain with a smooth boundary Γ, so that the normal derivative exists at each x ∈ Γ . Moreover, we shall assume that the diameter of Ω satisfies diam Ω = max x − y < 1.
.
x,y∈Ω
To simplify the presentation of basic ideas, let us also assume that the solution u of (14.1) satisfies u ∈ C ∞ (Ω). In this case, the abstract concepts that are necessary for the formulation of general statements and the description of algorithms reduce to explicit formulae. In particular, the standard interior trace operator γ0 (4.4) and the interior conormal derivative operator γ1 (4.8) satisfy on Γ γ u = u,
. 0
γ1 u =
∂u = ∇u · n ∂n
and the related bilinear forms read ∂u v dΓ, .γ0 u, v = uv dΓ, γ1 u, v = Γ Γ ∂n
v ∈ C ∞ R2 ,
where n = n(x) is the outer unit normal vector to Γ at x. A key ingredient of the boundary integral formulation of (14.1) is the fundamental solution U of the Laplacian in R2 U (x, y) = −
.
1 log x − y. 2π
If δy denotes Dirac’s distribution at y, then .
− Δx U (x, y) = δy
for y ∈ R2
14
TBETI
301
in the sense of distributions in R2 , i.e., .−Δx U (x, y), vR2 = − U (x, y)Δv(x) dx = v(y)
(14.2)
R2
for any y ∈ R2 and v ∈ C ∞ (R2 ) with a compact support. Considering y ∈ Γ, the latter formula holds true also for directional derivatives since ∂ ∂v U (x, y) , v =− (y). . −Δx ∂ny ∂ny R2 A key tool on the way to a boundary formulation is the following theorem. 1 Theorem 14.1 (Green’s Representation Formula) Let u ∈ HΔ (Ω) be such that −Δu = f on Ω. Then for any x ∈ Ω .u(x) = f (y) U (x, y) dΩy + γ1 u(y)U (x, y) dΓy Ω Γ − γ0 u(y) γ1,y U (x, y) dΓy . (14.3) Γ
Proof See, e.g., [2]. 1 HΔ (Ω),
Since all three summands of (14.3) belong to we can apply the operators γ0 , γ1 to the representation formula (14.3) to get [3] 1 γ0 u(x) = f (y) U (x, y) dΩy + γ1 u(y) U (x, y) dΓy 2 Ω Γ − γ0 u(y) γ1,y U (x, y) dΓy , Γ . 1 γ1 u(x) = γ1,x f (y)U (x, y) dΩy + γ1 u(y) γ1,x U (x, y) dΓy 2 Ω Γ − γ1,x γ0 u(y) γ1,y U (x, y) dΓy , Γ
x ∈ Γ . After introducing the standard operators (see [3–5]), we get Calderon’s system of integral equations valid on Γ
1
N0 f I −K V γ0 u γ0 u . (14.4) = 2 + 1 γ1 u γ1 u N1 f D 2I + K with the following integral operators defined for x ∈ Γ: single-layer potential operator .(V t)(x) = t(y) U (x, y) dΓy , Γ
V : H −1/2 (Γ) → H 1/2 (Γ),
302
Marie Sadowská, Zdeněk Dostál, and Tomáš Kozubek
double-layer potential operator .(Kh)(x) = h(y) γ1,y U (x, y) dΓy ,
K : H 1/2 (Γ) → H 1/2 (Γ),
Γ
adjoint double-layer potential operator .(K t)(x) = t(y) γ1,x U (x, y) dΓy ,
K : H −1/2 (Γ) → H −1/2 (Γ),
Γ
hypersingular integral operator .(Dh)(x) = −γ1,x h(y) γ1,y U (x, y) dΓy ,
D : H 1/2 (Γ) → H −1/2 (Γ),
Γ
and Newton’s potential operators (N0 f )(x) = f (y) U (x, y) dΩy , N0 : L2 (Ω) → H 1/2 (Γ), Ω . f (y) U (x, y) dΩy , N1 : L2 (Ω) → H −1/2 (Γ). (N1 f )(x) = γ1,x Ω
The mapping properties of the above integral operators are well known, in particular all are bounded, V is symmetric and elliptic (see, e.g., Steinbach [3], Theorem 6.23), and D is symmetric and semi-elliptic with the kernel R of the solutions of the Neumann homogeneous boundary value problem Δu = 0 in Ω, .
γ1 u = 0 on Γ.
14.2 Steklov-Poincaré Operator For a given .f ∈ L2 (Ω), observing that the assumption on the diameter of .Ω implies that the operator V is elliptic, we get from the first equation of (14.4) the Dirichlet–Neumann map 1 −1 I + K γ0 u − V −1 N0 f on Γ. . γ1 u = V (14.5) 2 We can also substitute the latter formula into the second equation of (14.4) to get another representation of the Dirichlet-Neumann map
14
TBETI
303
1 1 I + K V −1 I + K + D γ0 u 2 2 1 I + K V −1 N0 f + N1 − 2
γ1 u = .
(14.6)
on .Γ. The last equation can be expressed by means of two new operators, the Steklov-Poincaré operator defined by the equivalent representations 1 −1 I +K .S = V (14.7) 2 1 1 I + K V −1 I +K +D = (14.8) 2 2 and the Newton operator defined by 1 −1 I + K V −1 N0 . .N = V N0 = N1 − 2 The Eqs. (14.5) and (14.6) now read .
γ1 u = Sγ0 u − N f
on Γ.
(14.9)
If .g : Γ → R is sufficiently smooth, then the solution of .
− Δu = f γ0 u = g
in Ω, on Γ
can be obtained in the form u = ug + uf ,
.
where .ug and .uf solve .
−Δug = 0, − Δuf = f γ0 ug = g, γ0 uf = 0
in Ω, on Γ.
Moreover, (14.9) implies that for any .v ∈ H 1/2 (Γ) γ1 ug (x)v(x) dΓ . Sg, v = γ1 ug , v =
(14.10)
(14.11)
Γ
and
.
N f, v = −γ1 uf , v = −
γ1 uf (x)v(x) dΓ.
(14.12)
Γ
The properties of the Steklov-Poincaré operator are further summarized in the following theorem.
304
Marie Sadowská, Zdeněk Dostál, and Tomáš Kozubek
Theorem 14.2 The Steklov-Poincaré operator S : H 1/2 (Γ) → H −1/2 (Γ)
.
is symmetric, and there are .αD > 0 and .α > 0 such that .
2
Sv, v ≥ αD vH 1/2 (Γ)
for all v ∈ H 1/2 (Γ)/R
and .
2
Sv, v ≥ α vH 1/2 (Γ)
1/2
for all v ∈ H0 (Γ, ΓU ),
where .H 1/2 (Γ)/R is the space of all functions from .H 1/2 (Γ) that are orthogonal to R, 1/2
H0 (Γ, ΓU ) = {v ∈ H 1/2 (Γ) : v = 0 on ΓU },
.
and .ΓU is a part of the boundary .Γ with a positive measure. Proof See the books by McLean [2] or Steinbach [3, 6].
.
Let us note that the representation (14.7) together with a Galerkin discretization typically results in nonsymmetric stiffness matrices. However, since the symmetry of the stiffness matrix is essential in our further analysis, we consider only the symmetric representation (14.8).
14.3 Decomposed Boundary Variational Inequality Now we are ready to reduce the scalar variational inequality that was introduced in Sect. 10.3 to the boundaries of the subdomains. Let us recall that a solution of the variational inequality (10.11) defined on the union of the subdomains .Ωij requires to find .u ∈ KDD such that .
a(u, v − u) ≥ (v − u),
v ∈ KDD ,
(14.13)
where V ij = v ij ∈ H 1 (Ωij ) : γ0 v ij = 0 on
ΓiU ∩ Γij , i = 1, 2, j = 1, . . . , p,
VDD = V 11 × · · · × V 1p × V 21 × · · · × V 2p ,
. 1j KDD = v ∈ VDD : γ0 v 2i ≥ γ0 v 1j on Γ2i C ∩ ΓC and γ0 v ij = γ0 v ik on Γij ∩ Γik ,
14
TBETI
305
and (u, v) =
p 2
uij v ij dΩ,
.
i=1 j=1
a(u, v) =
Ωij
p 2 i=1 j=1
Ωij
∂uij ∂v ij ∂uij ∂v ij + ∂x1 ∂x1 ∂x2 ∂x2
dΩ, (14.14)
(v) = (f, v).
If u is a sufficiently smooth solution of (14.13), then .(u1 , u2 ) defined by ui = uij
for
.
x ∈ Ωij ,
i = 1, 2, j = 1, . . . , p,
is a classical solution of (10.1)–(10.2). To reduce the variational formulation of the decomposed problem to the skeleton Σ = Γ11 × · · · × Γ1p × Γ21 × · · · × Γ2p ,
.
let
Vbij = v ij ∈ H 1/2 (Γij ) : v ij = 0 on ΓiU ∩ Γij ,
.
i = 1, 2,
j = 1, . . . , p,
denote the closed subspaces of .H 1/2 (Γij ), and let b VDD = Vb11 × · · · × Vb1p × Vb21 × · · · × Vb2p ,
1j b b 2i . KDD = v ∈ VDD : v − v 1j ≥ 0 on Γ2i C ∩ ΓC and v ij = v ik on Γij ∩ Γik . b On .VDD we shall define a scalar product
(u, v) =
p 2
uij v ij dΓ,
.
i=1 j=1
Γij
a symmetric bilinear form ab (u, v) =
p 2
.
S ij uij , v ij ,
i=1 j=1
and a linear form b (v) =
p 2
.
i=1 j=1
N ij f ij , v ij ,
306
Marie Sadowská, Zdeněk Dostál, and Tomáš Kozubek
where the ordered couples of upper indices ij associate the objects with the subdomains .Ωij . Let us now assume that .u ∈ KDD is a solution of (14.13) and .v ∈ KDD . After substituting .Δu = −f in .Ω into Green’s identity (see Theorem 4.4), we obtain . ∇u∇v dΩ − f v dΩ = γ1 u, γ0 v , Ω
Ω
and using (14.9) to its right-hand side, we get .
a(u, v − u) − (v − u) = ab (γ0 u, γ0 (v − u)) − b (γ0 (v − u)) .
(14.15)
Thus if .u ∈ KDD is a solution of (14.13), then its reduction to the skeleton Σ satisfies
.
ab (γ0 u, γ0 (v − u)) ≥ b (γ0 (v − u)) ,
.
v ∈ KDD .
Moreover, if we define b q b : VDD → R,
.
q b (v) =
1 b a (v, v) − b (v), 2
we can use the same arguments as in Sect. 10.3 to get that any solution u ∈ KDD of (14.13) satisfies
.
.
q b (γ0 u) ≤ q b (v),
b v ∈ KDD .
(14.16)
Now we are ready to formulate the main result of this section. b Theorem 14.3 Let .f ∈ L2 (Ω1 ∪ Ω2 ) and .g ∈ KDD satisfy .
q b (g) ≤ q b (v),
b v ∈ KDD ,
(14.17)
and let .u = ug + uf , where .uf , ug are defined by (14.10) (with .Ω = Ωij , Γ = Γij , i = 1, 2, j = 1, . . . , p). Then u solves (14.13). Proof The proof follows the definitions and (14.15).
.
14.4 Boundary Discretization and TBETI The boundary element discretization of the variational inequality defined on the skeleton .Σ is very similar to the finite element discretization defined on .Ω. The only difference is that the basis functions .φ are defined on the boundaries of the subdomains .Ωij . Let us assume that each domain .Ωi is decomposed into .p = 1/H 2 square subdomains .Ωij , .i = 1, 2, .j = 1, . . . , p, as in Sect. 10.4, and decompose the
14
TBETI
307
boundary .Γij of .Ωij into .nτij line segments .τij of the length h. On each .τij , we can choose one or more nodes and use them to define the shape functions (see, e.g., Gaul et al. [7], Sect. 4.2.5) or the boundary element polynomial basis functions. For example, we can use the piece-wise constant functions .Ψij associated with the elements .τij or the linear basis functions .φij associated with the vertices of the elements which are depicted in Fig. 14.1. We denote the number of the basis functions by .nij .
Fig. 14.1 Piece-wise constant (left) and piece-wise linear continuous (right) basis function
We shall look for an approximate solution .uh in the trial space .Vh which is spanned by the basis functions, .
Vh = Vh11 × · · · × Vh1p × Vh21 × · · · × Vh2p , Vhij = Span{φij , = 1, . . . , nij }.
Decomposing .uh into components, i.e., 1p 2p 21 uh = (u11 h , . . . , uh , uh , . . . , uh ), .
uij h=
nij
ij uij φ ,
=1
we get Suh , uh = .
ij S ij uij h , uh
Similarly,
Sij
m
p 2 ij S ij uij h , uh ,
i=1 j=1 nij ij ij ij ij = uij S φ , φm um ,m=1 ij =S ij φij , φm ,
= (uij )T Sij uij ,
[uij ] = uij .
308
Marie Sadowská, Zdeněk Dostál, and Tomáš Kozubek
b (uh ) =
p 2
.
N ij f ij , uij h ,
i=1 j=1 nij ij ij ij ij b T ij N f , uh = N ij f ij , φij u = (fij ) u ,
b fij
=1
=N ij f ij , φij .
(14.18)
The effective evaluation of (14.18) can be found, e.g., in Steinbach [3]. Let us now assign each subdomain a unique index, and let us index contiguously the nodes and the entries of corresponding vectors in subdomains, so that we can write ⎡ b ⎤ ⎡ ⎤ ⎡ b⎤ S1 O · · · O u1 f1 ⎢ O Sb2 · · · O ⎥ ⎢ ⎥ ⎢ .. ⎥ ⎢ .. ⎥ b b .S = ⎢ . . . ⎥ , u = ⎣ . ⎦ , f = ⎣ . ⎦, s = 2p. ⎣ .. .. . . . .. ⎦ us fsb , O O · · · Sbs The SPS matrix .Sb ∈ Rn×n denotes a discrete analog of the Steklov-Poincaré operator that assigns each vector .u of nodal boundary displacements the vector of corresponding nodal forces. Our notation indicates that .Sb is closely related to the Schur complement .S introduced in Chap. 10. If we denote by .BbI and .BbE the full rank matrices which describe the discretized non-penetration and gluing conditions, respectively, we get the discretized version of problem (10.12) with auxiliary domain decomposition that reads .
min
1 T b u S u − (f b )T u 2
s.t.
BbI u ≤ o
and
BbE u = o.
(14.19)
In (14.19), the matrices .BbI and .BbE can be obtained from the matrices .BE and .BI used in (10.16) by deleting the columns which correspond to the inner nodes of the subdomains. In particular, we can easily achieve b
b T BI b B .B = I, Bb = . BbE The next steps are very similar to those described in Chap. 10 (details can be found also in [8]), so we shall switch to the main goal of this chapter, the development of a scalable BETI-based algorithm for solving the frictionless contact problems of elasticity.
14
TBETI
309
14.5 Operators of Elasticity To extend our exposition to frictionless contact problems in 3D, let us briefly recall some fundamental results concerning the boundary integral operators that can be used to reduce the conditions of equilibrium of a homogeneous elastic body to its boundary. Though most of these results are rather complicated, they can be obtained by the procedures indicated in Sect. 14.1 with Green’s formulae replaced by Betti’s formulae [2, 5]. The most important point is that the energy associated with a sufficiently smooth displacement .v which satisfies the conditions of equilibrium in the domain can be evaluated by means of the values of .v and its derivatives on the boundary as in our scalar benchmark (14.15). More details can be found, e.g., in the paper by Costabel [4] or in the books by Steinbach [3], Rjasanow and Steinbach [5], or McLean [2]. See also Dostál et al. [9]. Let .Ω ⊂ R3 be a bounded Lipschitz domain with the boundary .Γ which is filled with a homogeneous isotropic material, and consider the elliptic operator .L which assigns each sufficiently smooth .u : Ω → R3 a mapping 3 .L u : Ω → R by
.
where the stress tensor .σ satisfies, as in Sect. 11.2, Hooke’s law
.
Let us recall the standard interior trace and boundary traction operators γ :
. 0
H 1 (Ω)
3
3 → H 1/2 (Γ)
and
γ1 :
3 1 3 HL (Ω) → H −1/2 (Γ) ,
respectively, where .H 1/2 (Γ) denotes the trace space of .H 1 (Ω), .H −1/2 (Γ) is the dual space to .H 1/2 (Γ) with respect to the .L2 (Γ) scalar product, and 1 3 3 3 . . HL (Ω) = v ∈ H 1 (Ω) : L v ∈ L2 (Ω) 1 3 It is well known (see, e.g., [2–4]) that for any .u ∈ HL (Ω) , there exists the Dirichlet-Neumann map γ u = Sγ0 u − N L u
. 1
on Γ
with the Steklov-Poincaré operator .
S = (σI + K )V −1 (σI + K) + D :
H 1/2 (Γ)
3
3 → H −1/2 (Γ) , (14.20)
310
Marie Sadowská, Zdeněk Dostál, and Tomáš Kozubek
where .σ(x) = 1/2 for almost all .x ∈ Γ, and the Newton operator .
N = V −1 N0 :
L2 (Ω)
3
3 → H −1/2 (Γ) .
(14.21)
In (14.20) and (14.21), we use the single-layer potential operator V, the double-layer potential operator K, the adjoint double-layer potential operator .K , and the hypersingular integral operator D given for .x ∈ Γ and .i = 1, 2, 3 by 3 3 .(V t)i (x) = t(y) · Ui (x, y) dΓy , V : H −1/2 (Γ) → H 1/2 (Γ) , Γ
u(y) · γ1,y Ui (x, y) dΓy ,
(Ku)i (x) =
.
Γ
(K t)i (x) =
K:
H 1/2 (Γ)
3
3 → H 1/2 (Γ) ,
t(y) · γ1,x Ui (x, y) dΓy ,
.
Γ
K :
.
H −1/2 (Γ)
3
3 → H −1/2 (Γ) ,
(Du)i (x) = − γ1,x
u(y) · γ1,y Ui (x, y) dΓy ,
.
Γ
D:
.
H 1/2 (Γ)
3
3 → H −1/2 (Γ) ,
and the Newton potential operator .N0 given for .x ∈ Γ and .i = 1, 2, 3 by 3 3 .(N0 f )i (x) = f (y) · Ui (x, y) dΩy , N0 : L2 (Ω) → H 1/2 (Γ) , Ω
where .Ui are the components of the fundamental solution .U of .L (the socalled Kelvin’s tensor ),
.
Ui = [Ui1 , Ui2 , Ui3 ]T , i = 1, 2, 3, U = UT = [U1 , U2 , U3 ], δij (xi − yi )(xj − yj ) 1+ν (3 − 4ν) + , Uij (x, y) = 3 8πE(1 − ν) x − y x − y δij =[ I ]ij .
The mapping properties of the above integral operators are well known 3 [3, 4], in particular, the single-layer potential operator V is . H −1/2 (Γ) – elliptic, so its inverse exists. We shall need the following lemmas: Lemma 14.1 The Steklov-Poincaré operator S is linear, bounded, symmet 3 ric, and semi-elliptic on . H 1/2 (Γ) . Moreover, the kernel of S is equal to the space of linearized rigid body motions, i.e.,
14
TBETI
311
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 0 x3 ⎬ 0 0 −x2 ⎨ 1 ⎣ 0 ⎦ , ⎣ 1 ⎦ , ⎣ 0 ⎦ , ⎣ x1 ⎦ , ⎣ −x3 ⎦ , ⎣ 0 ⎦ . (14.22) . Ker S = Span ⎭ ⎩ 0 x2 −x1 0 0 1
Proof See the book by Steinbach [3].
.
3 Lemma 14.2 The Newton operator N is linear and bounded on . L2 (Ω) .
Proof See the book by Steinbach [3].
.
14.6 Decomposed Contact Problem on Skeleton Let us consider the variational formulation of the decomposed multibody contact problem to find .u ∈ KDD such that .
q(u) ≤ q(v),
v ∈ KDD ,
(14.23)
where q(v) =
.
1 a(v, v) − (v) 2
and .KDD were introduced in Sect. 11.4. Let us recall that
.
where .S denotes the contact coupling set and the relations have to be interpreted in the sense of traces. To describe the boundary analogue of (14.23), let us introduce Vbi = {v ∈ (H 1/2 (Γi ))3 : v = o on ΓiU }, b . VDD b KDD
=
Vb1
× ··· ×
i = 1, . . . , s,
Vbs ,
ij b i j = {v∈VDD : [vn ]≤g on Γij C , (i, j) ∈ S ; v = v on ΓG , i, j = 1, . . . , s}.
312
Marie Sadowská, Zdeněk Dostál, and Tomáš Kozubek
For any displacement b u = (u1 , . . . , us ) ∈ VDD ,
.
let us define the energy functional by q b (u) =
.
1 b a (u, u) − b (u), 2
b where .ab is a bilinear form on .VDD defined by b
a (u, v) =
.
s
S p up , vp
p=1 b and .b is a linear functional on .VDD given by
b (v) =
s
.
N p f p , vp + fΓpp , vp . F
p=1
Recall that 3 → H −1/2 (Γp ) 3 3 N p : L2 (Ωp ) → H −1/2 (Γp )
Sp :
.
H 1/2 (Γp )
3
and
denote the (local) Steklov-Poincaré and Newton operators, respectively. The boundary version of the decomposed problem (14.23) now reads: find b such that the displacement .u ∈ KDD .
b q b (u) ≤ q b (v) for v ∈ KDD .
(14.24)
Due to Lemmas 14.1 and 14.2, the bilinear form .ab is bounded, symmetric, b b , and the linear functional .b is bounded on .VDD . Thus .q b and elliptic on .VDD b is a bounded convex quadratic functional on .VDD , and problem (14.24) has a unique solution by Theorems 4.6 and 4.7.
14.7 TBETI Discretization of Contact Problem The boundary element discretization of contact problems defined on the skeleton of the decomposition (14.24) is again very similar to the FEM discretization of .Ω. The main new feature, as compared with Sect. 14.4, is that the basis functions are defined on 2D boundaries of subdomains .Ωp and are vector valued. After decomposing each .Γp into .nτp triangles .τip , .p = 1, . . . , s, p τ .i = 1, . . . , np , we can use, e.g., the constant basis functions .ψ i associated
14
TBETI
313
Fig. 14.2 Piece-wise constant (left) and linear continuous (right) basis functions on the surface
with .τip or the linear basis functions .φpj associated with the vertices of .τip (see Fig. 14.2). We denote the number of the basis functions by .np and look for an approximate solution .uh in the trial space
1 s .Vh = Vh × · · · × Vh , Vhp = Span φp1 , . . . , φpnp . After substituting into the forms .ab and .b , we get the matrix of the local discretized Steklov-Poincaré operators .Sbp and the blocks .fpb of the discretized traction b b = S p φp , φpm , fp = N p f p , φp + fΓpp , φp . . Sp m F
To evaluate the entries of the boundary element matrices, we have to approximate (possibly singular) double surface integrals. This can be carried out effectively, e.g., by the semi-analytical approach introduced by Rjasanow and Steinbach in [5], where the inner integral is calculated analytically and the outer one is approximated by using a suitable numerical scheme. See also Steinbach [3, Chap. 10]. If we denote by .BbI and .BbE the full rank matrices which describe the discretized non-penetration and gluing conditions, respectively, we get the discretized version of problem (14.24) with auxiliary domain decomposition in the form .
min
1 T b u S u − (f b )T u 2
s.t.
BbI u ≤ c and
BbE u = o,
where ⎡
Sb1 ⎢O ⎢ b .S = ⎢ . ⎣ ..
O Sb2 .. .
··· ··· .. .
⎤ O O⎥ ⎥ .. ⎥ , . ⎦
O O · · · Sbs
⎡
⎤ u1 ⎢ ⎥ u = ⎣ ... ⎦ , us
⎤ f1b ⎢ ⎥ f b = ⎣ ... ⎦. ⎡
fsb
(14.25)
314
Marie Sadowská, Zdeněk Dostál, and Tomáš Kozubek
The SPS matrix .Sb ∈ Rn×n denotes a discrete analogue of the SteklovPoincaré operator that assigns each vector .u of nodal boundary displacements a vector of corresponding nodal forces. Our notation indicates that b .S is closely related to the Schur complement .S introduced in Chap. 10. In (14.25), the matrices .BbI and .BbE can be obtained from the matrices .BE and .BI from (10.16) by deleting the columns the indices of which correspond to the inner nodes of the subdomains. Remark 14.1 Since the matrices .BbI and .BbE can be obtained from the matrices BI and .BE by deleting the columns whose indices correspond to the inner nodes of subdomains, it is always possible to achieve that the rows of .B are orthonormal provided each node is involved in at most one inequality. This is always possible for two bodies or any number of smooth bodies. To simplify the formulation of the optimality results, we shall assume in what follows (except Chap. 16) that
.
.
T Bb Bb = I.
(14.26)
See also Remark 11.1.
14.8 Dual Formulation Since problem (14.25) arising from the application of the TBETI method to the frictionless contact problem has the same structure as that arising from the application of the TFETI method in Chap. 11, we shall reduce our exposition to a brief overview of the basic steps—more details can be found in Chap. 11. Recall that the Lagrangian associated with problem (14.25) reads L(u, λI , λE ) =
.
1 T b u S u − uT f b + λTI (BbI u − c) + λTE BbE u, 2
where λI and λE are the Lagrange multipliers associated with the inequalities and equalities, respectively. Introducing the notation b
λI BI cI .λ = , Bb = , and c = , λE oE BbE we can write the Lagrangian briefly as L(u, λ) =
.
1 T b u S u − uT f b + λT (Bb u − c). 2
The next steps are the same as in Sect. 11.6. In particular, we get that if (x, λ) is a KKT couple of problem (14.25), then λ solves the minimization problem
14
TBETI .
315
min Θ(λ)
λI ≥ o
s.t.
(Rb )T (f b − (Bb )T λ) = o,
and
(14.27)
where Θ(λ) =
.
1 T b b + b T λ B (S ) (B ) λ − λT (Bb (Sb )+ f b − c) 2
and Rb denotes the matrix the columns of which span the kernel of Sb . The matrix Rb can be obtained from the matrix R which spans the kernel of the stiffness matrix K obtained by the volume finite element discretization by deleting the rows corresponding to the nodal variables in the interior of the subdomains. Let us denote "b , F = F " b = F −1 Bb (Sb )+ f b − c , d " e = (Rb )T f b .
"b = Bb (Sb )+ (Bb )T , F "b , . Fb = F −1 F b b T " G = (B R ) ,
Moreover, let T denote a regular matrix that defines the orthonormalization " so that the matrix of the rows of G " G = TG
.
has orthonormal rows. After denoting e = T" e,
.
problem (14.27) reads .
min
1 T b "b λ F λ − λT d 2
s.t.
λI ≥ o
and
Gλ = e.
(14.28)
Next we shall transform the problem of minimization on the subset of the affine space to that on the subset of a vector space by means of an arbitrary " which satisfies λ " = e. Gλ
.
If problem (14.28) is a dual problem arising from the discretization of a " so that coercive problem, then we can use Lemma 11.2 to get λ " I ≥ o. λ
.
" we get, after Looking for the solution of (14.28) in the form λ = μ + λ, substituting back λ for μ, the problem to find .
min
1 T b λ F λ − λ T db 2
s.t.
Gλ = o and
"I λ I ≥ I = − λ
(14.29)
316
Marie Sadowská, Zdeněk Dostál, and Tomáš Kozubek
with " b − Fb λ. " db = d
.
Our final step is based on the observation that problem (14.29) is equivalent to .
min
1 T b λ H λ − λT Pdb 2
s.t.
λI ≥ I ,
Gλ = o and
(14.30)
where ρ is an arbitrary positive constant and Hb = PFb P + ρQ,
Q = GT G,
.
and
P = I − Q.
Recall that P and Q denote the orthogonal projectors on the kernel of G and " I ≥ o, then o is a feasible vector on the image space of GT , respectively. If λ for problem (14.30).
14.9 Bounds on the Dual Stiffness Matrix Spectrum First observe that Im.P and Im.Q are invariant subspaces of .Hb and Im.P+Im.Q = Rm , as P+Q=I
.
and for any .λ ∈ Rm Hb Pλ = (PFb P + ρQ)Pλ = P(Fb Pλ) and Hb Qλ = (PFb P + ρQ)Qλ = ρQλ.
.
It follows that σ(Hb |ImQ) = {ρ},
.
so it remains to find the bounds on σ(Hb |ImP) = σ(Fb |ImP).
.
The following lemma reduces the problem to the analysis of the local boundary stiffness matrices. Lemma 14.3 Let there be constants .0 < c < C such that for each .λ ∈ Rm cλ2 ≤ (Bb )T λ2 ≤ Cλ2 .
.
Then for each .λ ∈ ImP
14
TBETI
317
c
.
−1 max
i=1,...,s
Sbi
T "b
λ ≤ λ F λ ≤ C 2
−1 min
i=1,...,s
λmin (Sbi )
λ2 ,
where .Sbi denotes the boundary element stiffness matrix associated with .Γi .
Proof See the proof of Lemma 10.2.
.
Remark 14.2 Lemma 14.3 indicates that the conditioning of .H can be improved by the orthonormalization of the rows of the constraint matrix .Bb . We have reduced the problem to bound the spectrum of .H to the analysis of the spectrum of the boundary stiffness matrices. The following result, which is due to Langer and Steinbach, is important in the analysis of the optimality of the presented algorithms. Lemma 14.4 Let H and h denote the diameter and the discretization parameter of a quasi-uniform discretization of a domain .Ω ⊆ R3 with shape regular linear elements. Assume that the elements for the discretization of .Γ are defined by the traces of those for the discretization of .Ω. Let .SbH,h and .SH,h denote the boundary stiffness matrix and the Schur complement of the stiffness matrix .K of a subdomain of .Ω with respect to its interior variables, respectively. Then .SbH,h and .SH,h are spectrally equivalent, i.e., there are constants c and C independent of h and H such that for each .x ∈ ImSH,h .
c xT SH,h x ≤ xT SbH,h x ≤ C xT SH,h x.
(14.31)
Proof See Langer and Steinbach [1].
.
Combining Proposition 11.2 and (14.31), we can get H-h bounds .
c
h2 x2 ≤ xT SbH,h x ≤ Chx2 , H
x ∈ Im SbH,h .
(14.32)
The following theorem is now an easy corollary of Lemma 14.3 and (14.32). Theorem 14.4 Let .ρ > 0, and let .Hbρ,H,h denote the Hessian of the cost function of problem (14.30) resulting from the quasi-uniform discretization of (14.24) using shape regular boundary elements with the parameters H and h. Let .Bb satisfy (14.26). Then there are constants c, C independent of H and h such that for each .x ∈ Rn c
.
h x2 ≤ xT Hbρ,H,h x ≤ Cx2 . H
Proof Substitute (14.32) into Lemma 14.3, and take into account the choice . of .ρ and the scaling of .F.
318
Marie Sadowská, Zdeněk Dostál, and Tomáš Kozubek
Remark 14.3 The finite element and boundary element Schur complements S and .Sb are discrete approximations of the same object, the SteklovPoincaré operator. Since the eigenvectors corresponding to small eigenvalues are smooth, they are well approximated by both the finite and boundary elements, so the best constant c in (14.31) satisfies .c ≈ 1. However, this reasoning is invalid for the eigenvectors corresponding to large eigenvalues. Table 14.1 indicates that the best constant C in (14.31) satisfies .C > 2 for the square subdomain discretized by a structured grid with step h so that we can deduce that the boundary element method generates better conditioned dual stiffness matrix .F and faster convergence of BETI than FETI.
.
.H
=1
.Sb H,h
.SH,h .λmin (Sb H,h ) .λmin (SH,h )
.h
= 1/4 1.3152 2.6051 0.3259 0.3372
.h
= 1/8 1.2606 2.7600 0.1699 0.1712
.h
= 1/16 1.2569 2.8097 0.0858 0.0859
.h
= 1/32 1.2569 2.8235 0.0430 0.0430
Table 14.1 Extreme eigenvalues of FEM/BEM Schur complements—discrete Laplacian
Moreover, for each trace .uh of a finite element discretization of .Ω, the discrete harmonic expansion’s energy in .Vh always dominates the energy of the harmonic function in .V ⊃ Vh . Since the traces of the rigid body motions in V and .Vh are identical, we have .ImSb = ImS, and for any .x ∈ ImS it holds xT Sb x ≤ xT Sx.
.
(14.33)
14.10 Optimality To show that Algorithm 9.2 (SMALBE-M) with the inner loop implemented by Algorithm 8.2 is optimal for the solution of the problem (or a class of problems) (14.30) arising from the varying discretizations of a given frictionless contact problem, let us introduce, as in Sect. 11.12, a new notation that complies with that used in the analysis of the algorithms in Part II. Let .ρ > 0 and .C ≥ 2 denote given constants, and let TC = {(H, h) ∈ R2 : H/h ≤ C}
.
denote the set of indices. For any .t ∈ TC , let us define the problem .
At = PFb P + ρQ, bt = Pdb , "I , Bt = G, It = −λ
where the vectors and matrices of (14.30) arising from varying discretizations of (14.24) with the parameters H and h, .t = (H, h). Using the procedure
14
TBETI
319
described above, we get for each .t ∈ TC the problem .
min ft (λt ) s.t. Bt λt = o and λt ≥ It
(14.34)
with f (λ) =
. t
1 T λ At λ − bTt λ. 2
We shall assume that the discretization satisfies the assumptions of Theorem 14.4 and that t ≤ o.
. I
Using .GGT = I, we obtain .
Bt = 1.
(14.35)
Moreover, it follows by Theorem 14.4 that for any .C ≥ 2 there are constants aC > aC min > 0 such that
. max
.
C aC min ≤ λmin (At ) ≤ λmax (At ) ≤ amax
(14.36)
for any .t ∈ TC . As above, we denote by .λmin (At ) and .λmax (At ) the extreme eigenvalues of .At . Our optimality result for the solution of the class of problems (14.34) arising from the boundary element discretization of the multibody contact problem (14.24) reads as follows. Theorem 14.5 Let .C ≥ 2, .ε > 0, and .ρ > 0 denote given constants, and let .{λkt }, .{μkt }, and .{Mt,k } be generated by Algorithm 9.2 (SMALBE-M) for the class of problems (14.34) with bt ≥ ηt > 0,
1 > β > 0,
.
Mt,0 = M0 > 0,
and
μ0t = o.
Let Step 1 of Algorithm 9.2 be implemented by means of Algorithm 8.2 (MPRGP) with parameters Γ>0
.
and
α ∈ (0, 2/aC max ),
so that it generates the iterates k,1 k,l k λk,0 t , λ t , . . . , λ t = λt
.
for the solution of (14.34) starting from .λk,0 = λk−1 with .λ−1 = o, where t t t .l = lt,k is the first index satisfying k,l k gP (λk,l t , μt , ρ) ≤ Mt,k Bt λt
.
or
320
Marie Sadowská, Zdeněk Dostál, and Tomáš Kozubek k gP (λk,l t , μt , ρ) ≤ εbt
.
and
Bt λk,l t ≤ εbt .
(14.37)
Then for any .t ∈ TC and problem (14.34), an approximate solution .λkt satisfying (14.37) is generated at .O(1) matrix–vector multiplications by the Hessian .At of .ft . Proof The class of problems satisfies all assumptions of Theorem 9.4 (i.e., .o is feasible and the inequalities (14.35) and (14.36) hold true) for the set of indices .TC . Thus to complete the proof, it is enough to apply Theorem 9.4. . Since the cost of a matrix-vector multiplication by the Hessian .At is proportional to the number of dual variables, Theorem 14.5 proves the numerical scalability of TBETI. The (week) parallel scalability is supported by the structure of .At .
14.11 Numerical Experiments The algorithms presented here were implemented in MatSol [10] and tested on a number of academic benchmarks and real-world problems; see, e.g., Sadowská et al. [11] or [12]. Here, we give some results that illustrate the numerical scalability and effectiveness of TBETI using the benchmark of Chap. 12. We also compare the precision of TFETI and TBETI on the solution of a variant of the Hertz problem with the known analytic solution. All the computations were carried out with the parameters recommended in the description of the algorithms in Chaps. 7–9. The relative precision of the computations was the same as in Chaps. 11 and 12, i.e., .10−4 for the academic benchmark and .10−6 for the real-world problems.
14.11.1 Academic Benchmark We consider the contact problems of two cantilever beams in contact with friction that was introduced in Sect. 12.9. The problems were decomposed and discretized with varying decomposition and discretization parameters h and H as in Sect. 12.9. We kept .H/h = 12, so that the assumptions of Theorem 14.5 are satisfied. The performance of the algorithms is documented in the following graphs. The numbers of outer iterations and the multiplications by the Hessian .F of the dual function depending on the primal dimension n and the number of subdomains .nsub are in Fig. 14.3, both for the problem with the Tresca and Coulomb friction and the friction coefficient .Φ = 0.1. We can see a stable number of outer iterations and mildly increasing number of inner iterations
14
TBETI
321
Fig. 14.3 Tresca and Coulomb friction, TBETI: numerical scalability of cantilever beams—matrix-vector multiplications by .Fb
for n ranging from 7224 to 2,458,765. The dual dimension of the problems ranged from 1859 to 1,261,493. We conclude that the performance of the algorithms based on TBETI is very similar to those based on TFETI and is in agreement with the theory. Notice that the primal dimension of the problem discretized by TBETI is typically much smaller than that discretized by TFETI with the same discretization parameter h.
14.11.2 Comparing TFETI and TBETI We consider a frictionless 3D Hertz problem depicted in Fig. 14.4, with the Young modulus .2.1 · 105 MPa and the Poisson ratio 0.3. The ball is loaded on the top by the force .F = −5000 [N]. The ANSYS discretization of the two bodies was decomposed by METIS into 1024 subdomains. The comparison of TFETI and TBETI in terms of computational times and the number of Hessian multiplications is given in Table 14.2.
G
Fig. 14.4 Model specification
322 Method TFETI TBETI
Marie Sadowská, Zdeněk Dostál, and Tomáš Kozubek Primal DOFs 4,088,832 1,849,344
Dual DOFs 926,435 926,435
Preprocessing 21 min 1 h 33 min
Solution time 1 h 49 min 1 h 30 min
Hessian mult. 593 667
Table 14.2 Numerical performance of TFETI and TBETI applied to the Hertz problem
Fig. 14.5 Correspondence of numerical Hertz contact pressures to the analytic solution
We can see that the time of computation required by the two methods is comparable, especially if we take into account that the preprocessing time can be reduced very efficiently on more processors than 24 used in the test. In Fig. 14.5 we can see a fine correspondence of the contact pressures computed by TFETI and TBETI with the analytical solution. We can see that TBETI returns a bit more precise results, which is not surprising, as TFETI uses the exact solutions in the interior of subdomains. The convergence criterion was the reduction of the norm of the projected gradient of the dual energy function by .10−6 .
14.11.3 Ball Bearing We solved also the contact problem of ball bearing depicted in Fig. 1.2. We imposed the Dirichlet boundary conditions on the outer ring and loaded the opposite part of the inner diameter with the force 4500 [N]. The discretized geometry was decomposed into 960 subdomains by METIS. Numerical comparison of TFETI and TBETI is in Table 14.3. For more information see [11, 12].
Method TFETI TBETI
Primal DOFs 1,759,782 1,071,759
Dual DOFs 493,018 493,018
Preprocessing 129 s 715 s
Solution time 2 h 5 min 1 h 5 min
Hessian mult. 3203 2757
Table 14.3 Numerical performance of TFETI and TBETI applied to the ball bearing problem
14
TBETI
323
14.12 Comments For a nice introductory course on the classical boundary element method which covers also some more advanced topics, see Gaul et al. [7]. For more formal exposition, see the books by Steinbach [3] or McLean [2]. Engineering applications can be found in the book edited by Langer et al. [13]. The introduction of BETI method by Langer and Steinbach [1] opened the way to the development of scalable domain decomposition-based algorithms for the solution of problems discretized by the boundary element method. A variant of BETI which we use here appeared first in the doctoral thesis by Of [14] as AF (all floating) BETI. See also Of and Steinbach [15]. Here we call it TBETI (Total BETI) as it is shorter and indicates a close relation to TFETI, which was developed independently at about the same time. The scalability results reported here were published first for a scalar model problem in Bouchala et al. [8] and then for 3D multibody elastic problems in Bouchala et al. [8] and Sadowská et al. [11]. The scalability of the algorithm adapted for the solution of problems with Tresca (given) friction is reported in Sadowská et al. [12]. We are not aware of other scalable methods for the solution of contact problems by the boundary element method. The performance of any algorithm which uses BEM can be improved by so-called fast methods (see, e.g., Rjasanow and Steinbach [5]). They include the hierarchical matrices introduced by Hackbusch [16] (see a comprehensive exposition in Bebendorf [17]), the Adaptive Cross Approximation (ACA) (see, e.g., Bebendorf and Rjasanow [18]), or Fast Multipole Method (FMM) (see, e.g., Greengard and Rokhlin [19, 20] or Of et al. [21]). These methods accelerate the evaluation of the matrices and the consequent matrix-vector multiplication and lead to asymptotically nearly linear space and time complexities.
References 1. Langer, U., Steinbach, O.: Boundary element tearing and interconnecting methods. Computing 71, 205–228 (2003) 2. McLean, W.: Strongly Elliptic Systems and Boundary Integral Equations. Cambridge University Press, Cambridge (2000) 3. Steinbach, O.: Numerical Approximation Methods for Elliptic Boundary Value Problems. Finite and Boundary Elements. Springer, New York (2008) 4. Costabel, M.: Boundary integral operators on Lipschitz domains: elementary results. SIAM J. Math. Anal. 19, 613–626 (1988) 5. Rjasanow, S., Steinbach, O.: The Fast Solution of Boundary Integral Equations. Mathematical and Analytical Techniques with Applications to Engineering. Springer, New York (2007)
324
Marie Sadowská, Zdeněk Dostál, and Tomáš Kozubek
6. Steinbach, O.: Stability Estimates for Hybrid Coupled Domain Decomposition Methods. Lecture Notes in Mathematics. Springer, Berlin (2003) 7. Gaul, L., Kögl, M., Wagner, M.: Boundary Elements for Engineers and Scientists. Springer, Berlin (2003) 8. Bouchala, J., Dostál, Z., Sadowská, M.: Theoretically supported scalable BETI method for variational inequalities. Computing 82, 53–75 (2008) 9. Dostál, Z., Friedlander, A., Santos, S.A., Malík, J.: Analysis of semicoercive contact problems using symmetric BEM and augmented Lagrangians. Eng. Anal. Bound. Elem. 18(3), 195–201 (1996) 10. Kozubek, T., Markopoulos, A., Brzobohatý, T., Kučera, R., Vondrák, V., Dostál, Z.: MatSol–MATLAB efficient solvers for problems in engineering. The library is no longer updated and is replaced by the ESPRESO framework. http://numbox.it4i.cz 11. Sadowská, M., Dostál, Z., Kozubek, T., Markopoulos, A., Bouchala, J.: Scalable total BETI based solver for 3D multibody frictionless contact problems in mechanical engineering. Eng. Anal. Bound. Elem. 35, 330– 341 (2011) 12. Sadowská, M., Dostál, Z., Kozubek, T., Markopoulos, A., Bouchala, J.: Engineering multibody contact problems solved by scalable TBETI. In: Fast Boundary Element Methods in Engineering and Industrial Applications, LNACM, vol. 63, pp. 241–269. Springer, Berlin (2012) 13. Langer, U., Schanz, M., Steinbach, O., Wendland, W.L. (eds.): Fast Boundary Element Methods in Engineering and Industrial Applications. Lecture Notes in Applied and Computational MechanicsSeries, vol. 63. Springer, Berlin (2012) 14. Of, G.: BETI - Gebietszerlegungsmethoden mit schnellen Randelementverfahren und Anwendungen. Ph.D. Thesis, University of Stuttgart (2006) 15. Of, G., Steinbach, O.: The all-floating boundary element tearing and interconnecting method. J. Numer. Math. 17(4), 277–298 (2009) 16. Hackbusch, W.: A sparse matrix arithmetic based on H-matrices. Part I: introduction to H-matrices. Computing 62, 89–108 (1999) 17. Bebendorf, M.: Hierarchical Matrices: A Means to Efficiently Solve Elliptic Boundary Value Problems. Lecture Notes in Computational Science and Engineering, vol. 63. Springer, Berlin (2008) 18. Bebendorf, M., Rjasanow, S.: Adaptive low-rank approximation of collocation matrices. Computing 70, 1–24 (2003) 19. Greengard, L.F., Rokhlin, V.: A fast algorithm for particle simulations. J. Comput. Phys. 73, 325–348 (1987) 20. Greengard, L.F., Rokhlin, V.: A new version of the fast multipole method for the Laplace equation in three dimensions. Acta Numerica 6, 229–269 (1997). Cambridge University Press 21. Of, G., Steinbach, O., Wendland, W.L.: Applications of a fast multipole Galerkin in boundary element method in linear elastostatics. Comput. Vis. Sci. 8, 201–209 (2005)
Chapter 15
Hybrid TFETI and TBETI Zdeněk Dostál and Marie Sadowská
The scope of scalability of original FETI and TFETI is limited by the dimension of the coarse problem, which is proportional to the number of subdomains. The coarse problem is typically solved by a direct solver—its cost is negligible for a few subdomains. However, it starts to dominate when the number of subdomains is large, currently tens of thousands of subdomains. To overcome the latter limitation, Klawonn and Rheinbach [1] proposed to enforce some constraints on the primal level by interconnecting the groups of subdomains into clusters. The defect of each cluster is then the same as that of each subdomain. Klawonn and Rheinbach coined the method hybrid FETI (see [1], also hybrid FETI-DP or H-FETI). We can apply the same strategy to BETI to get a hybrid BETI (H-BETI) method. This section describes a three-level domain decomposition method in a form suitable for analysis and sufficient for basic implementation, postponing the issues related to the effective parallel implementation to the last chapter. We limit our analysis to the uniformly discretized cube decomposed into cube subdomains and clusters. We show that the condition number of the Schur complement of a cube cluster comprising .m × m × m subdomains interconnected by the faces’ rigid body modes and discretized by the regular grid increases proportionally to m. The estimates are plugged into the analysis of unpreconditioned H-TFETI and H-TBETI (also H-TFETI-DP) methods and used to prove their numerical scalability for contact problems. The results of numerical experiments show the considerable scope of scalability of both H-TFETI and H-TBETI and indicate that this method can
Z. Dostál • M. Sadowská Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_15
325
326
Zdeněk Dostál and Marie Sadowská
be helpful for the solution of huge linear and contact problems. The latter is not surprising as the estimates suggest that the cost of iterations increases proportionally to at most .m, but the cost of the coarse problem decreases with .m6 . Moreover, the two-level structure of the coarse grids split between the primal and dual variables can be effectively exploited by the node-core design of the modern supercomputers’ hardware.
15.1 Hybrid TFETI for 2D Scalar Problems Let us first describe how to aggregate a small number of adjacent subdomains into clusters. The method works in the same way in 2D and 3D, so we can illustrate the procedure on 2D scalar problem formulated on .Ω = (0, 1)2 decomposed into four adjacent square subdomains .Ω1 , Ω2 , Ω3 , Ω4 ⊆ (0, 1)2 with the only common node (corner) x ∈ Ω1 ∩ Ω2 ∩ Ω3 ∩ Ω4
.
as in Fig. 15.1. Ω
Ω Ω1
Ω2 x
Ω3
Ω4
Fig. 15.1 Connecting four squares in the common corner
The main idea is to transform the nodal variables associated with the cluster = Ω1 × Ω2 × Ω3 × Ω4 .Ω by the expansion matrix .L obtained by replacing appropriate four columns of the identity matrix by their sum containing four ones. Feasible variables by .u of the cluster are related to the global variables .u u = L u,
.
of such cluster in global variables can be obtained and the stiffness matrix .K by = LT diag(K1 , K2 , K3 , K4 )L, .K
15 Hybrid TFETI and TBETI
327
where .Ki is the stiffness matrix of i-th subdomain. Moreover, the kernel .ImR of .K is only one dimensional, while the dimension of the kernel of K = diag(K1 , K2 , K3 , K4 )
.
is four. Notice that .ImL is the space of feasible vectors, i.e., those with the same displacements of the nodes with identical coordinates. More generally, we can decompose .Ω into .c · 4 square subdomains and connect them in the corresponding adjacent corners into c square clusters by means of the block diagonal expansion matrix L = diag(L1 , . . ., Lc ),
.
of .Ω and the constraint matrix .B in global so that the stiffness matrix .K variables read = EBL, = LT KL and B .K where .E is obtained from the identity matrix by deleting the rows that correspond to zero rows of .BL. In this way, we obtain the problem to find .
1 T λ min λT Fλ −d 2
s.t.
= o and Gλ
I , λI ≥ λ
(15.1)
=R T B T . G
(15.2)
where =B K +B T , F
.
= diag(K 1, . . . , K c ), K
and
In (15.1)–(15.2), .λ denotes the Lagrange multipliers enforcing continuity of the solution on the subdomains’ interfaces and Signorini’s conditions, and .R denotes the block-diagonal matrix the columns of which span the kernel of .K. In this way, we managed to reduce the number of dual constraints (i.e., the by the factor of four; however, the regular condition rows of the matrix .G) deteriorated to number of .F H H G T )−1 G; =I−G T (G .κ(PFP|ImP) ≤ C 1 + ln , P h h see Vodstrčil et al. [2]. It has been observed that it is better to interconnect the subdomains by edge averages [2]. The basic idea is a rather trivial observation that for any p .x ∈ R , p T .x e = xi , e = [1, . . . , 1] ∈ Rp , i=1
so if [c1 , . . . , cp−1 , e],
.
1 e = √ e, p
328
Zdeněk Dostál and Marie Sadowská
denote an orthonormal basis of .Rp , and then the last coordinate of a vector T p .x ∈ R in this basis is given by .xp = e x. To find the basis and transformation, let us denote by .T ∈ Rp×p an orthogonal matrix that can be obtained by the application of the Gram-Schmidt procedure to the columns of I e .T0 = ∈ Rp×p , −eT 1 starting from the last one. We get CT C = I,
T = [C, e] ,
.
CT e = o,
e = 1,
C ∈ Rp×(p−1) ,
e ∈ Rp , (15.3)
and if .x = Ty, y ∈ Rp , then 1 .yp = e x = √ xi . p i=1 p
T
15.2 Hybrid TFETI Problems Let us again consider a dual contact TFETI problem (11.33) min Θ(λ)
.
s.t.
λI ≥ o
and
RT (f − BT λ) = o,
(15.4)
where
1 T λ BK+ BT λ − λT (BK+ f − c). 2 We assume that problem (15.4) resulted from the discretization of (possibly multibody) contact problem decomposed into s floating subdomains .Ωi with boundaries .Γi , so that Θ(λ) =
.
K = diag(K1 , . . . , Ks ),
.
R = diag(R1 , . . . , Rs ).
Each local stiffness matrix .Ki is SPS with the kernel which is spanned by the matrix .Ri defined by the blocks corresponding to the nodes in .Ωi . The block corresponding to .xj ∈ Ωi , .xj = (xj , yj , zj ), reads ⎤ ⎡ 1 0 0 yj 0 −zj i zj 0 ⎦ , i = 1, . . . , s. .Rj = ⎣ 0 1 0 −xj (15.5) 0 0 0 1 −yj xj To simplify our analysis, we assume that the problem is discretized by a uniform grid with the steplength h and the domain is decomposed into s cube subdomains with the sidelength H as in Fig. 15.2.
15 Hybrid TFETI and TBETI
329
Hc H h
Fig. 15.2 Domain .Ω decomposed into .4 × 4 × 4 subdomains and .2 × 2 × 2 clusters
Let us now interconnect some neighboring subdomains (of the same body) into c cube clusters so that the defect of each cluster is equal to six. To avoid uninteresting complications, we assume that the clusters are identical cubes comprising .m × m × m subdomains. A simple example of such twolevel decomposition is in Fig. 15.2; it represents a cube decomposed into 64 cube subdomains interconnected into 8 cube clusters comprising .2 × 2 × 2 subdomains each. We shall consider the clusters interconnected by more general matrices than the expansion ones introduced in Sect. 15.1. In particular, we shall describe the matrices .Z with orthogonal columns with the range comprising the displacements with identical rigid body modes of the interiors of adjacent faces. We get the H-TFETI problem min Θ(λ)
.
s.t.
λI ≥ o
and
T ( T λ) = o, R f −B
(15.6)
where 1 + T K + Θ(λ) = λT B K B λ − λ T (B f − c), 2
.
= diag(K 1, . . . , K c ). K
and .K are defined by The matrices .B = EBZ, B
.
= ZT KZ, K
i = (Zi )T Ki Zi . K
We shall give the details in the following sections. In the rest of this chapter, we shall study the performance of H-TFETI by the analysis of the spectrum of b b )T , =B K +B T = B F S+ (B
.
S = diag( S1 , . . . , Sc ),
= [B b , O], B
i with respect to its where . Si , .i = 1, . . . , c, denotes the Schur complement of .K subdomains’ interior variables. We shall give more details on the H-TFETI
330
Zdeněk Dostál and Marie Sadowská
matrices later, but they are not necessary for the following lemma, which is a cluster variant of Lemma 11.3. Lemma 15.1 Let there be constants .0 < C1 < C2 such that for each .λ ∈ Rm T λ2 ≤ C2 λ2 . C1 λ2 ≤ B
.
i be joined so that the kernel of its Let the subdomains in each cluster .Ω i comprises only the restrictions of global rigid body modes. stiffness matrix .K Denote =G T (G G T )−1 G, =R T B T , = I − Q. .G Q P Then F P) ≤ κ(P
.
Si : i = 1, . . . , c} C2 max{ . i ) : i = 1, . . . , c} C1 min{λmin (S
Proof The proof follows the same lines as that of Lemma 10.2; it is enough to replace the subdomain’s Schur complements .Si by the cluster’s Schur com. plements . Si . In the rest of this chapter, we describe the interconnection of subdomains by the face rigid body modes and prove relevant theoretical results. We shall use the following notations for the parameters of the discretized H-TFETI problems reduced to the subdomains’ boundaries (Table 15.1): s
Number of subdomains in the domain
c
Number of clusters in the domain
.sc
Number of subdomains in a cluster
.Hc
Diameter of clusters
.
= H/h
.ne
=−1
Number of nodes in the interior of an edge
.nf
= ( −
Number of nodes in the interior of a face
.ns
62
.n
=
1)2
+2
= sc n s
Number of nodes on boundaries of all subdomains in a cluster Number of interconnected faces in a cluster
.ni .m
Number of nodes on the boundary of a subdomain
= Hc /H
Number of subdomains in one direction of a cluster
Table 15.1 Notation used throughout this chapter
15 Hybrid TFETI and TBETI
331
15.3 Orthogonal Rigid Body Modes F P to the Using Lemma 15.1, we can reduce the analysis of the spectrum of .P analysis of that of . Si . To this end, we shall need orthonormal bases of the rigid body modes of faces and subdomains.
15.3.1 Rigid Body Modes of the Interiors of Faces For each face, let us introduce a local Cartesian system of coordinates with the origin in the center and use the coordinates to implement lexicographic ordering of the nodes in the faces’ interiors. The outer normal that is parallel to one of the coordinate axes can identify each face. We shall do with three faces .F x , .F y , and .F z and their local coordinate vectors identified by the same indices. For example, the linearized rigid body modes of .F z are spanned by the columns of ⎤ ⎡ ⎤ ⎡ ef o o ef o o yfz o −zzf yfz o o z z zzf o ⎦ = ⎣ o ef o −xzf o o ⎦ , .Rf = ⎣ o ef o −xf (15.7) o o ef o −yfz xzf o o ef o −yfz xzf where .ef ∈ Rnf denotes a vector with all entries equal to 1 and .xzf , .yfz , and z .zf = o denote the coordinate vectors. To show that our choice of the coordinate system guarantees that the columns of .Rzf are orthogonal to each other, we shall exploit their tensor structure by means of the Kronecker product. Recall that if .u = [ui ] ∈ Rp and .v = [vj ] ∈ Rq are given vectors, then their Kronecker product .u ⊗ v is defined by ⎤ ⎡ u1 v ⎢ . ⎥ pq .u ⊗ v = ⎣ . ⎦ ∈ R , . up v and if .u, w ∈ R and .v, t ∈ R , then by (2.3) p
q
(u ⊗ v)T (w ⊗ t) = uT w ⊗ vT t = (uT w)(vT t).
.
(15.8)
Denoting by .ee ∈ Rne a vector with all the entries equal to one and noticing that the x-coordinates of the nodes of .F z with a fixed y-coordinate form the vector q = h[1 − k, 2 − k, . . ., −1, 0, 1, . . . , k − 2, k − 1]T ∈ Rne ,
.
we can write e = ee ⊗ ee ,
. f
xzf = ee ⊗ q,
and
yfz = q ⊗ ee .
332
Zdeněk Dostál and Marie Sadowská
Since .eTe q = 0, we can use (15.8) to get eT xzf = (ee ⊗ ee )T (ee ⊗ q) = (eTe ee )(eTe q) = 0,
. f
eTf yfz = (ee ⊗ ee )T (q ⊗ ee ) = (eTe q)(eTe ee ) = 0, (xzf )T yfz = (ee ⊗ q)T (q ⊗ ee ) = (eTe q)(qT ee ) = 0, ef 2 = ee ⊗ ee 2 = nf , xzf 2
(15.9)
= (ee ⊗ q) (ee ⊗ q) = ee q = ne q = T
2
2
2
yfz 2 .
(15.10)
We conclude that the columns of .Rzf are orthogonal. We finish by normalizing the columns of .Rzf to get ⎤ √1 yz ef o o o o f 2 z ⎥ ⎢ z 1 o o ⎦, .Rf = ⎣ o ef o − √ xf 2 z z o −yf xf o o ef ⎡
ef = xzf = yzf = 1.
z
The relation between .Rf and .Rzf is given by z
Rf = Rzf D−1/2 ,
.
D = diag(nf , nf , nf , bf , bf , bf ),
bf = ne q2 .
(15.11)
If we apply a similar procedure to the other faces, we get orthonormal representations of their rotations in the form ⎡ ⎤ ef o o yxf o −zxf x ⎢ x x √1 zx o ⎥ .Rf = ⎣ o ef o o ⎦ , ef = zf = yf = 1, 2 f x o o ef o − √12 yf o ⎤ ⎡ ef o o o o − √12 zyf y ⎥ ⎢ y y o ⎦ , ef = zyf = xyf = 1. Rf = ⎣ o ef o −xf zf y 1 √ x o o ef o o 2 f
15.3.2 Rigid Body Modes of Subdomains For each subdomain, let us introduce a local Cartesian system of coordinates with the origin in the subdomain’s center, let .es ∈ Rns denote a vector with unit entries, and let us reorder the coordinates of the nodes on their surface into three vectors .xs , .ys , .zs ∈ Rns . The linearized rigid body modes of each subdomain then read ⎤ ⎡ ys o −zs es o o zs o ⎦ . .Rs = ⎣ o es o −xs (15.12) o o es o −ys xs
15 Hybrid TFETI and TBETI
333
To show that our choice of the coordinate system guarantees that the columns of .Rs are orthogonal, let us denote = h[−k, . . ., −1, 0, 1, . . ., k]T ∈ Rne +2 , ee = [1, . . ., 1]T ∈ Rne +2 , q
.
and let us introduce notations for cube vectors e = ee ⊗ ee ⊗ ee , , xγ = ee ⊗ ee ⊗ q .xγ = ee ⊗ ee ⊗ q,
e = ee ⊗ ee ⊗ ee, ⊗ ee , yγ = ee ⊗ q .yγ = ee ⊗ q ⊗ ee ,
. γ
. γ
.
.
⊗ ee ⊗ ee, z =q z = q ⊗ ee ⊗ ee .
. γ . γ
Since the scalar product of the vectors associated with the subdomain’s boundary can be obtained as a difference between the scalar products of the cube vectors associated with the subdomain and its interior, we can use T .e q = 0 and (15.8) to get e eT xs = eγ T xγ − eTγ xγ
. s
) − (ee ⊗ ee ⊗ ee )T (ee ⊗ ee ⊗ q) = (ee ⊗ ee ⊗ ee)T (ee ⊗ ee ⊗ q ) − (eTe ee )(eTe ee )(eTe q) = 0, = (eeT ee)(eeT ee)(eeT q xTs ys = xγ T yγ − xTγ yγ )T (ee ⊗ q ⊗ ee) − (ee ⊗ ee ⊗ q)T (ee ⊗ q ⊗ ee ) = (ee ⊗ ee ⊗ q )( = (eeT ee)(eeT q qT ee) − (eTe ee )(eTe q)(qT ee ) = 0, xTs xs = xγ T xγ − xTγ xγ
.
)T (ee ⊗ ee ⊗ q ) − (ee ⊗ ee ⊗ q)T (ee ⊗ ee ⊗ qe ) = (ee ⊗ ee ⊗ q q2 − n2e q2 . = (ne + 2)2 Similarly, xTs zs = ysT zs = eTs ys = eTs zs = 0,
.
and xs 2 = ys 2 = zs 2 = (ne + 2)2 q2 − n2e q2 ,
.
es 2 = ns .
(15.13)
We conclude that the columns of .Rs are orthogonal. After normalizing the columns of .Rs , we get an orthonormal basis of linearized rigid body modes in the form ⎤ ⎡ √1 y √1 zs es o o o − s 2 2 ⎢ 1 √1 zs o ⎥ .Rs = ⎣ o es o − √ xs ⎦ , es = xs = ys = zs = 1. 2 2 1 1 o o es o − √ 2 y s √ 2 xs
334
Zdeněk Dostál and Marie Sadowská
The relation between .Rs and .Rs is given by Rs = Rs D−1/2 , Ds = diag(ns , ns , ns , bs , bs , bs ), bs = (ne + 2)2 q2 − n2e q2 . s (15.14)
.
15.4 Joining Subdomains by the Face Rigid Body Modes To describe interconnecting of face rigid body motions by the averages, we shall use the transformation of bases similar to that proposed by Klawonn and Widlund [3]. The basic idea is again a trivial observation that if .rf ∈ Rnf and n .uf ∈ R f denote a unit face rigid body mode and face displacement, respectively, then .uTf rf is the corresponding coordinate of .uf in any orthonormal basis containing .rf . If we introduce .rf into the basis of displacements of adjacent faces, we can enforce the same rigid body displacement by identifying the corresponding coordinates using the expansion mapping as in Sect. 15.1. More formally, let the columns of .T and .Rf ∈ R3nf ×6 denote orthonormal bases of the displacements and rigid body modes of the faces, T T T = C, Rf , CT C = I, Rf Rf = I, Rf C = O, C ∈ R3nf ×(3nf −6) .
.
If .uf = Tν f , .ν f ∈ R3nf , then the coordinates of the rigid body modes 6 .[ν f ]R ∈ R of .ν f , also called rigid body face averages, are given by T
[ν f ]R = [TT uf ]R = Rf uf .
.
15.4.1 Coupling Two Adjacent Faces Let us first consider a coupling of two adjacent subdomains .Ω1 and .Ω2 into a cluster by enforcing the same average rigid body modes of the interiors of adjacent faces. The rigid body modes of the interior of the face .Γij of .Ωi ij adjacent to .Ωj are spanned by the matrix .Rf ∈ R3nf ×6 with orthonormal columns that form the basis of the rigid body modes associated with the ij ji face. See also Fig. 15.3. Notice that .Rf = Rf as the coordinates of the nodes on adjacent faces are identical. On the interiors of faces .Γ12 and .Γ21 , we shall introduce the transformation matrices .T12 , T21 ∈ R3nf ×3nf , u12 = T12 ν 12 ,
.
u21 = T21 ν 21 ,
ij
Tij = [Cij , Rf ] ∈ R3nf ×3nf ,
15 Hybrid TFETI and TBETI
335 Ωi
Γji
Ωj
Γij Fig. 15.3 Gluing two subdomains by the face averages
so that we can define the transformations .T1 , .T2 ∈ R3ns ×3ns acting on the boundaries of .Ω1 and .Ω2 by I O O I O O I O I O 1 2 .T = T = = = 12 , 21 . O T12 O T21 O C12 Rf O C21 Rf The identity matrix is associated with the variables that are not affected by the coupling. Notice that one face can be connected to at most one adjacent face. The global transformation .T ∈ R3n×3n , where .n = sc ns , .sc = 2, is the number of nodes in the cluster, which then reads ⎤ ⎤ ⎡ ⎡ I O O OO O I O O O 1 12 ⎥ ⎢ O T12 O O ⎥ ⎢ T O ⎢ O C12 Rf O O O ⎥ ⎥ ⎢ . = .T = = ⎢ ⎣O O I O ⎦ ⎣O O O I O O ⎥ O T2 ⎦ 21 O O O T21 O O O O C21 Rf To get a basis of feasible displacements, i.e., the displacements with identical average rigid body modes on the interior of adjacent faces, we can use a normalized expansion matrix .L to sum the third and the last block columns and then put the result into the last column to get ⎡ ⎤ I O OO O √ 12 ⎥ 1 √ 12 ⎢ C O 1/ 2 R ⎢ O C12 O O 1/ 2 Rf ⎥ √ 21 = [C E], .Z = TL = ⎢ ⎥= 2 O ⎦ ⎣O O I O O C 1/ 2 R √ 21 21 O O O C 1/ 2 Rf where .Z ∈ R3n×(3n−6) denotes a basis of feasible displacements, .C ∈ ij ∈ R3ns ×6 denotes R3n×(3n−12) has orthonormal columns, .E ∈ R3n×6 , and .R ij .Rf padded with zeros, i.e., the feasible rigid body modes of the interior of adjacent faces. The normalized expansion matrix reads
336
Zdeněk Dostál and Marie Sadowská
⎡
I ⎢O ⎢ ⎢O .L = ⎢ ⎢O ⎢ ⎣O O
O I O O O O
O O O I O O
⎤ O O O √ O⎥ ⎥ O 1/ 2 I ⎥ ⎥ ∈ R3n×(3n−6) . O O⎥ ⎥ ⎦ I √O O 1/ 2 I
Notice that the feasible vectors can be described in a much simpler way using 12 √ R ⊥ 3n 3n×6 .ImZ = {u ∈ R : uT Z⊥ = o} = (ImZ⊥ ) , Z⊥ = 1/ 2 . 21 ∈ R −R
15.4.2 Interconnecting General Clusters The procedure described above can be generalized to specify the feasible vectors of any cluster interconnected by the averages of any set of adjacent faces. Using a proper numbering of variables by subdomains, in each subdomain setting first the variables that are not affected by the interconnecting and then those associated with the constraints ordered by faces, we get a matrix ⎤ ⎡ · · · ⎢ ij ⎥ √ ⎢· R ·⎥ 1 sc ⎥ .Z = C, E , C = diag(C , . . . , C ), E = 1/ 2 ⎢ ⎢ · · · ⎥ , (i, j) ∈ C , ⎣· R ji · ⎦ · · · where .C denotes a coupling set of the couples of indices that define the coupling of adjacent faces by averages, ij ∈ R3ns ×6 , E ∈ R3n×6ni , Z ∈ R3n×(3n−6ni ) , n = sc ns , R
.
and .ni denotes a number of coupled faces in the cluster. We assume that (i, j) ∈ C
.
=⇒ i < j.
The columns of .Z form an orthonormal basis of the subspace of feasible vectors. As above, we can describe the complement of feasible vectors conveniently by the matrix .Z⊥ ∈ R3n×6ni obtained from the block .E by changing the sign of the lower nonzero block components in each column, i.e.,
15 Hybrid TFETI and TBETI
337
⎡
· · ij ⎢· R √ ⎢ ⎢ .Z⊥ = 1/ 2 ⎢· · ⎣ · −R ji · ·
⎤ · ·⎥ ⎥ ·⎥ ⎥, ·⎦ ·
(i, j) ∈ C .
Notice that both .Z and .Z⊥ have orthonormal columns and that .U = [Z, Z⊥ ] is an orthonormal matrix.
15.5 General Bounds on the Spectrum of Clusters Let us consider a cluster obtained by interconnecting subdomains .Ω1 , . . . , Ωsc by face’s rigid body averages and denote 1 1 sc sc , R = diag R , . . . , R .S = diag S , . . . , S , and S = ZT SZ, where the columns of .R ∈ R3n×6sc form an orthonormal basis of the kernel of the unassembled Schur complement .S. While the dimension of .KerS is .6sc , we assume that the subdomains are joined in such a way that the kernel of S is spanned by six orthonormal columns the assembled Schur complement . of .R. through the orthogonal basis of the space We can obtain the matrix .R Rc = ImR ∩ ImZ
.
of feasible vectors of the kernel of .S. A suitable basis of .Rc is defined by the orthogonal columns of the matrix .Rc ∈ R3n×6sc comprising blocks (15.5) in the system of coordinates with the origin in the center of the cluster. If .Rc denotes the matrix obtained by the normalization of the columns of .Rc , we have T .Rc = ZZ Rc and SRc = O. If we define
= ZT Rc , R
.
we get
= SR SZT Rc = ZT SZZT Rc = ZT SRc = O.
.
has orthonormal columns, notice that To see that .R Rc = ZR
.
and
T T ZT ZR =R T R. I = Rc Rc = R
338
Zdeněk Dostál and Marie Sadowská
Since .ZT Z = I, it is rather trivial to see that the upper bound on the S is given by spectrum of . .λmax (15.15) S ≤ max Si , i = 1, . . . , sc . To get a lower bound on the nonzero eigenvalues of . S, let us denote T ζ = o , .U = ζ ∈ Im S : ζ = 1 = ζ ∈ R3n−6ni : ζ = 1 and R where .ni is a number of coupled faces in the cluster, so that we can write .
min ζ T Sζ = min ζ T ZT SZζ ≥ λmin (S) min PIm S Zζ2 . U
U
U
(15.16)
Let . R⊥ , R denote an orthogonal matrix such that R⊥ ∈ R3n×(3n−6sc ) ,
Im R⊥ = Im S,
.
Im R = Ker S,
R ∈ R3n×6sc .
We assume that .R was obtained by the orthogonalization of the columns of .R, so that 1
sc
R = diag(R , . . . , R ),
.
i
i
(R )T R = I,
i = 1, . . . , sc ,
and
T
.
min PIm S Zζ2 = min R⊥ Zζ2 . U
U
T
PIm S = R⊥ R⊥ , (15.17)
The relations (15.16) and (15.17) are valid for any coupling set .C . Let us now assume that .C0 is sufficiently rich to interconnect .sc subdomains into a cluster with the kernel 0 = ZT Rc , .R 0 where .Z0 denotes an instance of .Z associated with .C0 . Examining the structure of .Z, we can check that if .C0 ⊆ C , then Im Z0 ⊇ Im Z.
.
It follows that
T
.
T
min R⊥ Zζ2 ≥ min R⊥ Z0 ζ2 , U
U
(15.18)
and combining (15.16)–(15.18), we get min ζ T Sζ ≥ λmin (S) min R⊥ Z0 ζ2 . T
.
U
U
(15.19)
The inequalities (15.18) and (15.19) are useful for the proof of qualitative results on the bounds on the spectrum of general clusters. We shall use them in the following sections.
15 Hybrid TFETI and TBETI
339
15.6 Bounds on the Spectrum of Chained Clusters Let us now consider the coupling set denoted by .Cchain which is associated with gluing of a cluster by the face averages into a chain as in Fig. 15.4. More formally, .Cchain = (1, 2), (2, 3), . . . , (sc − 1, sc ) . Notice that many general clusters formed by cube subdomains can be chained by the face averages and the number .ni of coupled interfaces specified by .Cchain is given by .ni = sc − 1.
Fig. 15.4 Examples of chained clusters
15.6.1 Angle Between the Kernel and Complement of Feasible Vectors To estimate the right-hand side of (15.18) for .C0 = Cchain , let us simplify the notation by denoting .Z = Zchain till the end of this section and recall that the matrices .[Z, Z⊥ ] and .[R⊥ , R] are orthogonal, so RT⊥ RT⊥ Z RT⊥ Z⊥ .W = [Z Z⊥ ] = T T T R R Z R Z⊥ is also orthogonal. Using the CS decomposition (2.49), we get that the set of T singular values of .RT⊥ Z and .R Z⊥ in the interval .(0, 1) are identical. ExamT
ining the block structure of .R and .Z⊥ , we can check that T
R Z⊥ ∈ Rsc ×6ni = R6sc ×6(sc −1)
.
340
Zdeněk Dostál and Marie Sadowská T
is a full column rank matrix. Replacing .R Z⊥ by its slim SVD decomposition T
R Z⊥ = UΣVT ,
.
Σ = diag (σ1 , . . . , σ6sc −6 ) ∈ R(6sc −6)×(6sc −6) , σ1 ≥ · · · ≥ σ6sc −6 = σmin > 0,
we get by (15.19) .
2 min ζ T ζ2 . Sζ ≥ λmin (S) min UΣVT ζ2 = λmin (S) σmin U
U
(15.20)
y kh (k − 1)h
h −h
−kh
−h
h
x
−(k − 1)h −kh kh
Fig. 15.5 Discretization of the face .F z with step .h = H/(2k)
15.6.2 The Norm of Face’s Rigid Body Modes Let us now evaluate the norms of the orthogonal rigid body modes of the interiors of faces (15.7) in terms of the discretization and decomposition parameters h and H using H . .h = 2k We shall use the notation of Sect. 15.4.1 and start our analysis with the face z .F specified by the outer normal in the direction of the z-axis. We first substitute for .nf in 15.9 to get ef 2 = nf = (2k − 1)2 .
.
(15.21)
To get the norms of coordinate vectors associated with edges and faces, recall that by Fig. 15.5 we can write the coordinates of the interior nodes of z .F on horizontal line segments in terms of k as
15 Hybrid TFETI and TBETI
341
xhe = yev = q ∈ Rne ,
ne = 2k − 1.
.
Denoting c(k) = 12 + 22 + · · · + k 2 =
.
1 k(k + 1)(2k + 1), 6
we get xhe 2 = yev 2 = q2 = 2c(k − 1)h2 ,
.
and, after multiplying by the number of line segments in the face’s interior, we find that the norms of the nonzero vector components of the orthogonal rotations (15.7) are equal to .bf , b = bf (k, h) = xzf 2 = yfz 2 = (2k − 1)2c(k − 1)h2 =
. f
4 4 2 k h + O(k 3 )h2 . 3 (15.22)
Applying the same procedure to.F x and .F z , we find that the nonzero components of their rotations have the same norm as the corresponding components of .F z . Thus y y z x z x .x = xf = yf = yf = zf = z = bf (k, h). (15.23) f f
15.6.3 The Norm of Subdomain’s Rigid Body Modes To get the norms of the components of the subdomains’ rotations, we shall rewrite the right-hand side of the formulas xs 2 = ys 2 = zs 2 = (ne + 2)2 q2 − n2e q2
.
developed in Sect. 15.3.2 in terms of h and k. Using n = 2k − 1,
. e
q2 = 2c(k − 1)h2 = q2 = 2c(k)h2 =
1 k(k − 1)(2k − 1)h2 , 3
1 k(k + 1)(2k + 1)h2 , 3
we get b = xs 2 = ys 2 = zs 2 =
. s
40 4 2 k h + O(k 3 )h2 . 3
(15.24)
342
Zdeněk Dostál and Marie Sadowská
We complete this subsection by substituting for .ns in (15.13) to get es 2 = ns = 24k 2 +2.
(15.25)
.
15.6.4 An Estimate To evaluate explicitly .σmin of .RT Z⊥ , notice that ⎡ ⎤ 12 O · · · O R O ⎢ −R ⎥ 21 23 · · · O O R ⎢ ⎥ ⎢ ⎥ 32 · · · O √ ⎢ O −R O ⎥ ⎢ ⎥, .Z⊥ = 1/ 2 .. ⎢ ⎥ . ⎢ · ⎥ · · · ⎢ ⎥ sc −1,sc −2 sc −1,sc ⎦ ⎣ O O · · · −R R sc ,sc −1 O O ··· O −R and recall that 1
s
R = diag(R , . . . , R ),
.
i ij ∈ R3ns ×6 , R ,R
and
ij )T R ij = I. (R
Let us assume for simplicity that the chain is vertical, so that the subdomains Ωi and .Ωj are interconnected by the average rigid body modes of horizontal −z , .i = 1, . . . , sc − 1. It follows by (15.11), (15.21), and faces .Fiz and .Fi+1 (15.23) that
.
ji = Rz = Rz D−1/2 , ij = R R f f f
.
Df = diag(nf , nf , nf , bf , bf , bf ).
Similarly by (15.14) i
R = Rs = Rs D−1/2 , s
.
Ds = diag(ns , ns , ns , bs , bs , bs ).
i ij , reorder the rows of .Ri and .Rij into two To evaluate explicitly .(R )T R vertical blocks, the upper one corresponding to the interior nodes of the face of .Ωi adjacent to .Ωj and the lower block corresponding to the rest, i.e., z z Rf Rf i .R = Rs = . , Rij = O Rr
We get (Ri )T Rij = (Rzf )T Rzf = Df
.
and
ij = D−1/2 (Rz )T Rz D (R )T R s f f f i
.
−1/2
= D−1/2 Df . s 1/2
(15.26)
15 Hybrid TFETI and TBETI
343
ij does not depend on .i, j, so we can In particular, it follows that .(Rf )T R define ij
D = D−1/2 Df = diag(α, α, α, β, β, β), s nf bf α = α(k) = , β =β(k) = , ns bs 1/2
.
(15.27) (15.28)
to get ⎡
D ⎢ −D √ ⎢ ⎢ T .R Z⊥ = 1/ 2 ⎢ ⎢ · ⎣ O O
O O ··· O D O ··· O . · · .. · O O ··· D O O . . . −D
⎤ O O⎥ ⎥ ⎥ . ·⎥ ⎥ O⎦ D
It is convenient to rewrite the latter matrix through the Kronecker product recalled in Sect. 2.2. We get ⎤ ⎡ 1 0 0 ··· 0 0 ⎢ −1 1 0 · · · 0 0 ⎥ ⎥ ⎢ 1 ⎥ ⎢ T 3sc ×(3sc −3) .R Z⊥ = √ C ⊗ D ∈ R , C = ⎢ · · · . . . · · ⎥ ∈ Rsc ×(sc −1) , ⎥ ⎢ 2 ⎣ 0 0 0 · · · −1 1 ⎦ 0 0 0 . . . 0 −1 so that ⎤ 2 −1 0 · · · 0 0 ⎢ −1 2 −1 · · · 0 0 ⎥ ⎥ ⎢ 1 ⎥ ⎢ T T T CT C ⊗ D2 = 1/2 ⎢ · · · . . . · · ⎥ ⊗ D2 .(R Z⊥ ) R Z⊥ = ⎥ ⎢ 2 ⎣ 0 0 0 · · · 2 −1 ⎦ 0 0 0 · · · −1 2 ⎡
and σ 2 (RT Z⊥ ) =
. min
1 2 λmin (CT C)(λmin (D)) . 2
(15.29)
We can easily recognize that .CT C is 1D discrete Laplacian, the eigenvalues of which are well known [4, Exercise 7.2.18], in particular the smallest eigenvalue is given by π 2 T . .λmin (C C) = 4 sin (15.30) 2s
344
Zdeněk Dostál and Marie Sadowská
The entries .α(k) and .β(k) of .D2 are increasing, .α(k) ≥ β(k), and .
lim α(k)2 = 1/6,
and
k→∞
lim β(k)2 = 1/10.
(15.31)
k→∞
Using (15.20), (15.26), and (15.29)–(15.30), we can formulate the estimate for chain clusters. Theorem 15.1 Let . S denote the Schur complement of a chained cluster comprising .sc cube subdomains of .Ω with the side length H. Let the subdomains be discretized by the regular matching grid with the steplength h and interconnected by the edge’s average rigid body motions. Let .λmin (S) denote the smallest nonzero eigenvalue of subdomains’ Schur complement .S and k=
.
H . 2h
Then S ≥ S, .
λmin (S) ≥ λmin ( S) ≥ β(k)2 λmin (S) sin2
π 2sc
≈
1 λmin (S) 10
π 2sc
2 . (15.32)
The condition number of .Si is known to be bounded by .CH/h; see Proposition 11.2. Combining this observation with Theorem 15.1, we get a qualiS) for any cluster comprising cubes joined by average tative estimate on .κ( rigid body motions. We shall formulate this observation as a corollary. Corollary 15.1 Let . S denote the Schur complement of the cluster defined by a coupling set .C comprising .sc identical cube domains discretized by a regular grid with the parameters h and H and interconnected by average rigid body motions of adjacent faces. Let there be a chain coupling set .Cchain of all .sc subdomains such as .Cchain ⊆ C . Then .λmin ( S) satisfies (15.32) and there is a constant C independent of h, H, and .sc such that ≤ Cs2 H = Csc Hc . .κ(S) (15.33) c h h Remark 15.1 The concept of “chain” is closely related to that of “acceptable path” used in the analysis of the condition number of the systems of linear equations arising from the application of preconditioned FETI-DP methods [3].
15 Hybrid TFETI and TBETI
345
15.7 Bounds on the Spectrum of Cube Clusters The inequalities (15.32) and (15.33) give some upper bound on .κ( S) for quite general clusters, but it is rather pessimistic for more compact ones. For exS) of the cube ample, the estimates guarantee that the condition number .κ( cluster with a fixed diameter .Hc comprising .m × m × m subdomains does not increase more than .sc = m3 times. Here we shall carry out finer analysis to prove that for the cube clusters joined by the face averages, it is possible to replace .sc by m. For the clusters with a fixed diameter .Hc , it follows that the condition number deteriorates proportionally to m. Let us consider a cluster formed by .m×m×m cube .H ×H ×H subdomains joined by the face averages of the variables associated with the interiors of adjacent faces. Using the procedure described above, we shall reduce the problem to the analysis of the singular values of .RT Z⊥ , which is now a bit more complicated. The main idea is to reduce the problem to the analysis of vertical and horizontal chains using the tensor structure of .RT Z⊥ . Let us illustrate the procedure on the .2×2×2 cluster depicted in Fig. 15.6. The cluster comprises 8 cube subdomains numbered in the column order interconnected by the face averages along all 12 interfaces.
Fig. 15.6 Cluster comprising .2 × 2 × 2 subdomains: vertical chains, vertical slices, and cluster
The interconnection can be represented by the matrix ⎡ ⎤ COOO I O I O √ ⎢ O C O O −I O O I ⎥ T ⎥ .R Z⊥ = 1/ 2 ⎢ ⎣ O O C O O I −I O ⎦ ⊗ D, O O O C O −I O −I
where C=
.
1 , −1
(15.34)
D = diag(α, α, α, β, β, β).
The matrix .C corresponds to the gluing of a vertical chains, and the entries of .D are defined in (15.27) and (15.28). Using the Kronecker product, we can rewrite the first four columns of (15.34) in the form .I ⊗ I ⊗ C, the next pair of columns as .I ⊗ C ⊗ I, and the last two ones as .C ⊗ I ⊗ I to get
346
Zdeněk Dostál and Marie Sadowská
1 RT Z⊥ = √ I ⊗ I ⊗ C, I ⊗ C ⊗ I, C ⊗ I ⊗ I ⊗ D. 2
.
(15.35)
Using the properties of the Kronecker product (see Sect. 2.2), we get RT Z⊥ (RT Z⊥ )T =
.
1 I ⊗ I ⊗ CCT + I ⊗ CCT ⊗ I + CCT ⊗ I ⊗ I ⊗ D2 . 2
The expression in brackets is a generalized Kronecker sum. In general case, C ∈ Rm×(m−1)
.
and
I = Im .
The eigenvalues of a generalized Kronecker sum can be expressed by the sum of the eigenvalues .λi of .CCT , similarly as in (2.6). If .λi , .λj , .λk , and .ei , .ej , T .ek are the eigenvalues and eigenvectors of .CC , respectively, then (I ⊗ I ⊗ CCT + I ⊗ CCT ⊗ I + CCT ⊗ I ⊗ I) ⊗ (ei ⊗ ej ⊗ ek )
.
= ei ⊗ ej ⊗ CCT ek + ei ⊗ CCT ej ⊗ ek + CCT ei ⊗ ej ⊗ ek = ei ⊗ ej ⊗ λk ek + ei ⊗ λj ej ⊗ ek + λi ei ⊗ ej ⊗ ek = (λi + λj + λk )(ei ⊗ ej ⊗ ek ). It follows that σ(I ⊗ I ⊗ CCT + I ⊗ CCT ⊗ I + CCT ⊗ I ⊗ I)
.
= {λi + λj + λk : λi , λj , λk ∈ σ(CCT )} and σ (I ⊗ I ⊗ CCT + I ⊗ CCT ⊗ I + CCT ⊗ I ⊗ I) ⊗ D2
.
= {(λi + λj + λk )δ : λi , λj , λk ∈ σ(CCT ), δ ∈ σ(D2 )}. Since zero is obviously a simple eigenvalue of .CCT and the rest of the spectrum of .CCT equals to that of .CT C, we conclude that σ 2 (RT Z⊥ ) ≥
. min
π 1 λmin CCT β 2 = 2β 2 sin2 . 2 2m
(15.36)
The arguments presented above are valid for any m. Thus we have managed to reduce the estimate to that of the chain of the length m. We can thus formulate the main result of this section. denote the Schur complement Theorem 15.2 For each integer .m > 1, let .S of the .m × m × m cluster with the side length .Hc comprising .sc = m3 cube subdomains of the side length .H = Hc /m discretized by the regular grid with the step length h. Let .λmin (S) denote the smallest nonzero eigenvalue of the Schur complement .S of the subdomain matrices .Ki , .i = 1, . . . , sc with respect to the interior variables. Then
15 Hybrid TFETI and TBETI
347
S = λmax (S) ≥ λmax ( S),
.
π 1 ≈ λmin (S) 2m 4m2 and there is a constant C such that for any h, H, and m λ
. min
(S) ≥ λmin ( S) ≥ λmin (S)β(k)2 sin2
κ( S) ≤ C
.
Hm2 . h
(15.37) (15.38)
(15.39)
Proof The inequalities (15.37) and (15.38) were proved above. To get (15.39), notice that by (15.37) and (15.38) π .κ( S) ≤ κ(S)β −2 (k) sin−2 2m To finish the proof, recall that by Proposition 11.2 there is a constant C independent of H and h such that κ(S) ≤ C
.
H h
and .H = Hc /m, .Hc fixed.
.
To complete our analysis, we can combine Lemma 15.1 with Theorem 15.2 for to get a bound on the conditioning of H-TFETI dual stiffness matrix .F interconnecting of cube subdomains into cube clusters by the face averages. Corollary 15.2 Let the domain .Ω be discretized and decomposed by uniform grids with the parameters h and H into .c = m × m × m cube clusters with the edge length .Hc = mH. Let the clusters be connected by face averages. Then there is a constant C independent of h and .Hc such that for any h and m P) ≤ Cm κ(F|Im
.
Hc . h
(15.40)
Corollary 15.2 is sufficient to extend the scalability of FETI-based algorithms to H-TFETI. If the problem is linear, then the dual problem reduces to the solution of F Pλ =P d, .P (15.41) .P, and .F are those from (15.2). The problem (15.41) can be solved where .G, by a uniformly bounded number of conjugate gradient iterations provided m, h, and .Hc satisfy Hc ≤C .m h for a sufficiently large positive constant .C > 0.
348
Zdeněk Dostál and Marie Sadowská
15.8 Hybrid TFETI and TBETI for Contact Problems Let us now plug the above reasoning into solving contact problems. For simplicity, we consider homogeneous Dirichlet conditions, so we can start with the discretized contact TFETI problem in displacements (11.24) in the form .
min
1 T u Ku − f T u s.t. 2
B I u ≤ cI
and
BE u = o.
(15.42)
The nonzero columns of the constraint matrix .B are associated with boundary variables .ub , so we can write b b b BI BI O u .B = B , O = = . , u= BE ur BbE O Let us reformulate (15.42) in the variables .v associated with the basis of displacements of clusters interconnected by the face averages of the adjacent edges’ displacements. The interconnecting affects only the boundary variables, so that the transformations read b ZO v T u. .u = Zv, v= , Z = , and v = Z O I vr We get the transformed problem .
min
1 T v Kv − fT v 2
I v ≤ cI s.t. B
and
E v = o, B
(15.43)
where the rows and columns of the matrices can be reordered so that c) = Z T KZ, = diag(K 1, . . . , K K
.
= [Bb Z, O] = = [B b , O] = BZ B
I B E B
=
b O B I b O , B E
f = ZT f .
Notice that variables on the contact interface are not affected by the trans I = BI . cI are identical and .B formation so that the gaps .cI and . We shall switch to dual problem as in Sect. 11.6. Introducing the notation μI c .μ = , c= I , μE o the kernel of .K, we get that .μ solves the minimization and denoting by .R problem min Θ(μ)
.
s.t.
μI ≥ o and
T (f − B T μ) = o, R
(15.44)
15 Hybrid TFETI and TBETI
349
where
1 T + T K + μ B K B μ − μ T (B (15.45) f − c). 2 For further improvement, we shall follow Sect. 11.7. We simplify the notations by denoting Θ(μ) =
.
K +B T , F ≈ B = F −1 B K +B T , .F T e = TR f ,
K + d = F −1 (B f − c), T T G = TR B ,
where .T denotes a regular matrix which defines the orthonormalization of so that the rows of .G G T = I .G and problem (15.44) reads 1 T μ Fμ − μT d 2
min
.
s.t.
μI ≥ o and
= Gμ e.
(15.46)
Let us recall that b )T . = F −1 B K +B T = F −1 B b F S+ (B
.
In this chapter, we assembled . S from the subdomains’ Schur complements .Si of the finite element stiffness matrices .Ki with respect to interior variables. The finite element matrices .Ki and .Si were introduced in Sect. 11.5 and Sect. 11.6, respectively. However, we can obtain .Si also by the discretization of the boundary integral equations described in Sect. 14.7. It follows that we can simplify our exposition because the dual FETI and BETI problems have the same structure. As in Chaps. 10, 11, and 14, we shall transform the problem of minimization on the subset of an affine space to that on the subset of a vector space by means of an arbitrary .μ which satisfies Gμ = e.
.
We shall carry out the transformation by substituting μ = ν + μ.
.
After a simple manipulation, problem (15.46) reduces to min θ(ν)
.
s.t.
with θ(ν) =
.
= o and Gν
1 T ν Fν − ν T d, 2
ν I ≥ I = −μI λ. =d−F d
(15.47)
350
Zdeněk Dostál and Marie Sadowská
Our final step is based on the observation that problem (15.47) is equivalent to = o and ν I ≥ I , .min θ ρ (ν) s.t. Gν (15.48) where .ρ is an arbitrary positive constant, θ (ν) =
. ρ
1 T − ν T Pd, ν (PFP + ρP)ν 2
=G T G, Q
and
= I − Q. P
and .Q denote the orthogonal projectors on the image space Recall that .P T respectively. The regularization term is introof .G and on the kernel of .G, duced to simplify the reference to the results of QP that assume nonsingular Hessian of the cost function.
15.9 Optimality To show that Algorithm 9.2 (SMALBE-M) with the inner loop implemented by Algorithm 8.2 (MPRGP) is optimal for the solution of a class of problems arising from varying decompositions and discretizations of a given frictionless contact problem, let us introduce a new notation that complies with that used in the analysis of the algorithms in Part II. Let . > 0 denote a given constant, and for any .C ≥ 2, let us define the set of indices TC = {(m, Hc , h) ∈ N × R2 : mHc /h ≤ C}.
.
For any .t ∈ TC , let us define .
F P + ρQ, bt = P d, At = P t Bt = G,
I = I ,
where the vectors and matrices are from the definition of problem (15.47) with .t = (m, Hc , h). We assume that . tI ≤ o. We get a class of problems .
min ft s.t. Bt λ = o and λI ≥ It
(15.49)
with f (λ) =
. t
1 T λ At λ − bTt λ. 2
T G = I, we obtain Using these definitions and .G .
Bt = 1.
(15.50)
15 Hybrid TFETI and TBETI
351
It follows by Corollary 15.2 that for any .C ≥ 2 there are constants C 0 < aC min < amax
.
such that .
C aC min ≤ αmin (At ) ≤ αmax (At ) ≤ amax
(15.51)
for any .t ∈ TC . In particular, it follows that the assumptions of Theorem 9.4 are satisfied for any set of indices .TC , C ≥ 2, and we can formulate the main result of this chapter. Theorem 15.3 Let .C ≥ 2, ρ > 0, and .ε > 0 denote given constants, and let .{λkt }, {μkt }, and .{Mt,k } be generated by Algorithm 9.2 (SMALBE-M) for (15.49) with bt ≥ ηt > 0, 1 > β > 0, Mt,0 = M0 > 0, ρ > 0, μ0t = o.
.
Let Step 1 of Algorithm 9.2 be implemented by means of Algorithm 8.2 (MPRGP) with parameters Γ > 0 and α ∈ (0, 2/aC max ),
.
so that it generates the iterates k,1 k,l k λk,0 t , λt , . . . , λt = λt
.
for the solution of (15.49) starting from .λk,0 = λk−1 with .λ−1 = o, where t t t .l = lt,k is the first index satisfying .
k,l k gP (λk,l t , μt , ρ) ≤ Mt,k Bt λt
(15.52)
or .
k gP (λk,l t , μt , ρ) ≤ εbt
and
Bt λk,l t ≤ εbt .
(15.53)
Then for any .t ∈ TC and problem (11.72), Algorithm 9.2 generates an approximate solution .λkt which satisfies (11.76) at .O(1) matrix-vector multiplications by the Hessian .At of .ft .
15.10 Numerical Experiments We implemented H-TFETI into the PERMON package [5] developed at the Department of Applied Mathematics of the VŠB-Technical University of Ostrava and the Institute of Geonics AS CR Ostrava. We carried out the computations on the Salomon cluster, which consists of 1008 compute nodes; each
352
Zdeněk Dostál and Marie Sadowská
node contains 24 core Intel Xeon E5-2680v3 processors with 128 GB RAM, interconnected by 7D Enhanced hypercube InfiniBand. We carried out some numerical experiments to compare the performance of H-TFETI and TFETI and to get information about the effect of clustering. The experiments were limited to the semicoercive problem.
15.10.1 Scalar Variational Inequality and Clustering Effect We used the discretized semicoercive variational inequality introduced in Sect. 10.1 to examine the effect of clustering. We decomposed the domain of each membrane into 1024 square subdomains discretized by .100×100 degrees of freedom each. The subdomains were interconnected into .m × m clusters, .m ∈ {1, 2, 4, 8}, with .m = 1 corresponding to the standard TFETI method. The primal dimension of the resulting discretized problem was 20,480,000; 3169 inequalities enforced the non-penetration. We allocated each cluster one computational core. Cores Clusters 32 32 128 128 512 512 2048 2048
m 8 4 2 1
SMALBE-M iterations matrix.×vector time (s) 12 218 142.87 16 186 37.79 25 252 24.30 52 243 74.51
Table 15.2 Performance of H-TFETI with .m × m clusters
The results in Table 15.2 indicate a positive effect of clustering on the rate of convergence of the outer loop. At least a partial explanation was provided Jarošová et al. [6], who proved that we could interpret the interconnecting of subdomains into clusters as the preconditioning by conjugate projector (see Sect. 13.7). The clustering reduces the number of equality constraints 2 .m -times, so it is not surprising that the number of the outer iterations decreases with m. Stopping criteria of the relative residual in the conjugate gradient method were set to .ε = 10−4 .
15.10.2 Clumped Elastic Cube In this section, we compare the performance of H-TFETI with the standard and preconditioned TFETI on the elastic unit cube benchmark discretized by Q1 finite elements. The cube was fixed on the left and pressed down by its own weight. Both algorithms were implemented in the ESPRESO (ExaS-
15 Hybrid TFETI and TBETI
353
cale PaRallel FETI SOlver) package [7] developed at IT4Innovations, Czech National Supercomputing Center. Computations were performed on the Salomon cluster which consists of 1008 compute nodes. Each node contains 24 core Intel Xeon E5-2680v3 processors with 128 GB RAM, interconnected by 7D Enhanced hypercube InfiniBand. The unit domain was decomposed into .c = k × k × k clusters, .k ∈ {3, 7, 10, 13, 16}, each cluster comprising .sc = m × m × m square subdomains with the edge length .H = 1/(mk), .m = 3, interconnected by face averages and discretized by the regular grid with the discretization parameter h, .H/h = 100. The number of subdomains and unknowns ranged from 729 to 110,592 and from 15,000,633 to 2,275,651,584, respectively (see Table 15.3). Stopping criteria of the relative residual in the conjugate gradient method were set to .ε = 10−6 .
Clusters Subdomains 27 729 343 9261 1000 27,000 2197 59,319 4096 110,592
Unknowns 15,000,633 190,563,597 555,579,000 1,220,607,063 2,275,651,584
iter/time (s) iter/time (s) iter/time (s) H-TFETI TFETI DirTFETI 89/27.7 60/17.3 20/40.0 105/35.1 59/20.3 19/46.2 104/35.2 na na 104/59.8 na na 104/58.7 na na
Table 15.3 Billion clumped cube—unpreconditioned H-TFETI and TFETI and TFETI with Dirichlet preconditioner (DirTFETI), times include initiation, .m = 3
We can see that the number of iterations of the H-TFETI algorithm without preconditioning is bounded as in the TFETI algorithm in agreement with the theory. Rather surprisingly, H-TFETI turned out to be faster than DirTFETI in spite of some three-time larger number of iterations. Both the number of H-TFETI iterations and times are greater than that required by TFETI algorithm, but acceptable.
15.10.3 Clumped Cube Over Obstacle Our final benchmark is a clamped cube over a sinus-shaped obstacle as in Fig. 15.7, loaded by own weight, decomposed into .4×4×4 clusters, .H/h = 14, using the ESPRESO [7] implementation of H-TFETI for contact problems. We can see in Table 15.4 that TFETI needs a much smaller number of iterations, but H-TFETI is still faster due to 64 times smaller coarse space and better exploitation of the node-core memory organization. In general, if we use .m × m × m clusters, the hybrid strategy reduces the dimension and the cost of the coarse problem by .m3 and .m6 , respectively.
354
Zdeněk Dostál and Marie Sadowská Clusters Subdomains 64 512 1000
4096 72,900 656,100
Cores 192 1536 3000
Unknowns .×106 13 99 193
H-TFETI TFETI iter/s iter/s 169/23.9 117/24.9 208/30.2 152/115.1 206/42.6 173/279.9
Table 15.4 Clamped elastic cube over sinus-shaped obstacle, total times, .m = 4, = 14
.Hs /h
Fig. 15.7 Displacements of a clamped elastic cube over the sinus-shaped obstacle
15.11 Comments The origins of hybrid FETI methods can be traced to the observation that the original preconditioned FETI method is not scalable for the solution of fourth-order (shall) and plate problems. Farhat and his coworkers noticed that FETI could be made scalable by enforcing the displacements’ continuity at the subdomains’ cross-points. They introduced an additional set of multipliers and proposed to determine their proper values at each iteration by solving another coarse problem; see Farhat et al. [8, 9]. They called the method FETI-2. A little later, Farhat, with other coauthors, replaced the additional set of multipliers by enforcing the corner constraints on the primal level [10]. Finally, the new method, called FETI-DP (FETI dual primal), was combined with preconditioners and proved scalable; see Mandel and Tezaur [11], Klawonn and Widlund [3], and Tu [12]. The idea to enforce some constraints on the primal level to interconnect the subdomains’ groups into clusters is due to Klawonn and Rheinbach [1]. Klawonn and Rheinbach coined the method H-FETI (hybrid FETI). The defect of the clusters and their subdomains is then the same, so the coarse grid preparation cost decreases proportionally to .m6 , .m = Hc /Hs , as compared with TFETI. Furthermore, the authors provided some numerical experiments and theoretical arguments with small clusters and subdomains interconnected in corners indicating that the preconditioned hybrid FETI can be effective for solving large problems; see [1, 13, 14]. Some analysis and applications to the solution of scalar variational inequalities can be found in Lee [15]. She called her method FETI-FETI.
15 Hybrid TFETI and TBETI
355
Probably the first bounds on the conditioning of small clusters can be found in Vodstrčil et al. [2]. Recently Dostál et al. [16] gave bounds on the spectrum of Schur complements of square clusters connected by the edge averages and proved that the conditioning of the clusters comprising .m × m square subdomains connected by edge averages deteriorates proportionally to m. The results were extended to 3D in Dostál et al. [17] and applied to the development of scalable algorithms for the solution of scalar variational inequalities; see [18]. The analysis of 3D elasticity is in [23]. The theoretical results and numerical experiments indicate that H-TFETI without preconditioner is a competitive computational engine for solving large variational inequalities and huge linear problems. The TFETI algorithms have proven effective for solving variational inequalities discretized by billions of nodal variables using tens of thousands of cores. However, the results presented here suggest that hybrid TFETI can solve at least ten times larger problems using hundreds of thousands of cores. Effective implementation of H-TFETI for linear problems is described in Říha et al. [19]. The FETI-DP methods are also classified as three-level methods. A powerful multilevel FETI-DP linear solver can be found in Toivanen et al. [20]. The numerical experiments of Sect. 15.10 appeared first in [18, 22, 23]. The description of H-TFETI in this chapter uses the transformation of bases described by Klawonn and Rheinbach in [24].
References 1. Klawonn, A., Rheinbach, O.: A hybrid approach to 3-level FETI. Proc. Appl. Math. Mech. 8, 10841–10843 (2009) 2. Vodstrčil, P., Bouchala, J., Jarošová, M., Dostál, Z.: On conditioning of Schur complements of H-TFETI clusters for 2D problems governed by Laplacian. Appl. Math. 62(6), 699–718 (2017) 3. Klawonn, A., Widlund, O.B.: Dual–primal FETI method for linear elasticity, Comm. Pure Appl. Math. LIX, 1523–1572 (2006) 4. Mayer, C.D.: Matrix Analysis and Applied Linear Algebra. SIAM, Philadelphia (2000) 5. PERMON software, http://permon.vsb.cz 6. Jarošová, M., Klawonn, A., Rheinbach, O.: Projector preconditioning and transformation of basis in FETI-DP algorithms for contact problem. Math. Comput. Simulation, 82(10), 1894–1907 (2012) 7. Říha, L., Brzobahatý, T., Makopoulos, A., Meca, O.: ESPRESO—Highly Parallel Framework for Engineering Applications. http://numbox.it4i.cz 8. Farhat C., Chen P.S., Mandel J., Roux F.X.: The two-level FETI method–Part II: extension to shell problems, parallel implementation and performance results. Comput. Methods Appl. Mech. Eng. 155, 153– 180 (1998)
356
Zdeněk Dostál and Marie Sadowská
9. Farhat, C., Mandel, J.: The two-level FETI method for static and dynamic plate problems–Part I: an optimal iterative solver for biharmonic systems. Comput. Methods Appl. Mech. Eng. 155, 129–152 (1998) 10. Farhat, C., Lesoine, M., LeTallec, P., Pierson, K., Rixen D.: FETI-DP: a dual-primal uni ed FETI method–Part I: A faster alternative to the two-level FETI method. Int. J. Numer. Meth. Engng. 50, 1523–1544 (2001) 11. Mandel, J., Tezaur, R.: On the convergence of a dual-primal substructuring method. Numer. Math. 88, 543–558 (2001) 12. Tu, X.: Three-level BDDC in three dimensions. SIAM J. Sci. Comput. 29, 1759–1780 (2007) 13. Rheinbach, O.: Parallel iterative substructuring in structural mechanics. Arch. Comput. Methods. Eng. 16, 425–463 (2009) 14. Klawonn, A., Rheinbach, O.: Highly scalable parallel domain decomposition methods with an application to biomechanics. ZAMM - J. Appl. Math. Mech. 90(1), 5–32 (2010) 15. Lee, J.: Two domain decomposition methods for auxiliary linear problems. SIAM J. Sci. Comput. 35(3), A1350–A2375 (2013) 16. Dostál, Z., Horák, D., Brzobohatý, T., Vodstrčil, P.: Bounds on the spectra of Schur complements of large H-TFETI-DP clusters for 2D Laplacian. Numer. Linear Algebra Appl. 28(2), e2344 (2021) 17. Dostál, Z., Brzobohatý, T., Vlach, O.: Schur complement spectral bounds for large hybrid FETI-DP clusters and huge three-dimensional scalar problems. J. Numer. Math. 29(4), 289–306 (2021) 18. Dostál, Z., Horák, D., Kružík, Brzobohatý, T., Vlach, O.: Highly scalable hybrid domain decomposition method for the solution of huge scalar variational inequalities. Numer. Algorithms 91, 773–801 (2022) 19. Říha, L., Brzobohatý, T., Markopoulos, A., Meca, O., Kozubek, T.: Massively parallel hybrid total FETI (HTFETI) solver. In: Proceedings of the Platform for Advanced Scientific Computing Conference - PASC 16, Article No. 7, pp. 1–11. ACM, New York (2016) 20. A. Markopoulos, Říha, L., Brzobohatý, T., Meca, O., Kučera, R., Kozubek, T.: The HTFETI method variant gluing cluster subdomains by kernel matrices representing the rigid body motions. In: Bjorstad P. et al. (eds.) Proceedings of Domain Decomposition Methods in Science and Engineering XXIV, LNCSE, vol. 125, pp. 543–551. Springer Nature, Cham (2018) 21. Toivanen, J., Avery, P., Farhat, C.: A multilevel FETI-DP method and its performance for problems with billions of degrees of freedom. Int. J. Numer. Meth. Engng. 116, 661–682 (2001) 22. Dostál, Z., Brzobohatý, T., Vlach., O., Meca, O., Sadowská, M.: Hybrid TFETI domain decomposition with the clusters joined by faces’ rigid modes for solving huge 3D elastic problems, Computational Mechanics 71, 333–347 (2023)
15 Hybrid TFETI and TBETI
357
23. Dostál, Z., Brzobohatý, T., Horák, D., Kružík, J., Vlach, O.: Scalable hybrid TFETI-DP methods for large boundary variational inequalities. In: Brenner S.C. et al. (eds.) Proceedings of the Domain Decomposition Methods in Science and Engineering XXVI, LNCSE, vol. 145, pp. 27–38. Springer, Heidelberg (2022) 24. Klawonn, A., Rheinbach, O.: A parallel implementation of dual-primal FETI methods for three- dimensional linear elasticity using a transformation of basis. SIAM J. Sci. Comput. 28, 1886–1906 (2006)
Chapter 16
Mortars Zdeněk Dostál and Tomáš Kozubek
The theoretical results on numerical scalability of algorithms for contact problems presented in Chaps. 10–15 rely on the strong linear independence of the rows of matrices describing the node-to-node non-penetration conditions. However, such an approach has poor approximation properties and causes problems when the contact conditions are imposed on large curved interfaces or nonmatching grids. Such grids are inevitably present in the solution of transient contact problems. Similar to the description of the equilibrium conditions, we can overcome most of the drawbacks mentioned above by imposing the non-penetration conditions in average. Our primary tool will be the biorthogonal mortars enforcing the weak non-penetration conditions like (13.14). They were introduced by Wohlmuth [1], who also developed a strong approximation theory of variationally consistent discretization for contact problems (see Wohlmuth [2]) that naturally enhances nonmatching grids. Until recently, it was unclear whether we could plug the mortars into the FETI-based contact algorithms in a way that preserves the scalability of contact algorithms. The main challenge was posed by the mortar-constraint matrices that do not share a nice block-diagonal structure with the nodeto-node ones. Moreover, it was not clear how to orthogonalize effectively the rows corresponding to the inequality constraints against those enforcing the subdomains’ interconnection, especially those associated with the “wire baskets.” In this chapter, we briefly review the description of non-penetration conditions using biorthogonal mortars, prove that the constraint matrices arising
Z. Dostál • T. Kozubek Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_16
359
360
Zdeněk Dostál and Tomáš Kozubek
from the discretization by some biorthogonal bases are well-conditioned, and show that the procedure complies well with the FETI domain decomposition method. Finally, we validate theoretical results through numerical experiments.
16.1 Variational Non-penetration Conditions As in Sect. 11.1, let us consider a system of bodies which occupy in the reference configuration open bounded domains .Ω1 , . . . , Ωs ⊂ R3 with the Lipchitz boundaries .Γ1 , . . . , Γs and their parts .ΓiF , ΓiU , and .ΓiC . Suppose that p some nonempty .ΓpC comprises a part .Γpq C ⊆ ΓC that can get into contact with p q .Ω as in Fig. 11.1. We assume that .ΓC is sufficiently smooth so that there is a well-defined outer normal .np (x) at almost each .x ∈ ΓpC . From each couple of indices .{p, q} which identify .Γpq C = ∅, we choose one index to identify the slave side of a possible contact interface and define a contact coupling set .S . Thus if .(p, q) ∈ S , then .Γpq C = ∅ is the slave side of the contact interface that can come into contact with the master side .Γqp C . We shall first discuss the problems that can be explained without loss of generality on two bodies assuming that .S = {(1, 2)}, so that we can simplify 2 21 the notation to .Γ1C = Γ12 C and .ΓC = ΓC . We shall use the notation introduced in Sect. 11.2 to write the contact conditions briefly as .
[un ] ≤ g, λn ≥ 0, λn ([un ] − g) = 0, λ = λn n, x ∈ Γ1C ,
(16.1)
where .n = n(x) denotes an outer normal unit vector at .x ∈ Γ1C , .λ is the contact traction, [un ] = (u1 − u2 ◦ χ) · n
.
denotes the jump in normal displacements, .χ = χ12 : Γ1C →Γ2C denotes the one-to-one slave-master mapping and .g = g(x) = χ(x) − x · n(x) is the initial normal gap. The first of the conditions (16.1) defines the nonempty, closed, and convex subset .K of the Hilbert space by K = {v ∈ V : [vn ] ≤ g
.
V =V ×V , 1
2
1
on Γ1C }, 1
1
3
V = (H (Ω )) ,
(16.2) 2
1
2
3
V = (H (Ω )) .
The feasible set can be alternatively characterized through the dual cone. To describe the weak non-penetration conditions, let us denote
16 Mortars
.
361
W = H 1/2 Γ1C , M = H −1/2 (Γ1C ),
W + = {w ∈ W : w ≥ 0}, M + = {μ ∈ M : μ, w ≥ 0, w ∈ W + },
so that M denotes the dual space of the trace space W and .M + denotes the dual cone. The feasible set (with respect to linearized non-penetration conditions) can be characterized by K = {v ∈ V : μ, [vn ] Γ1C ≤ μ, g Γ1C for all μ ∈ M + }.
.
16.2 Variationally Consistent Discretization Let us assume for simplicity that the domains .Ω1 and .Ω2 are polyhedral, so that we can decompose them into two independent families .T k , .k = 1, 2, of tetrahedrons .ω such that .Ωk = ω, k ∈ {1, 2}. ω∈T k
Let .F k denote the set of contact faces .τ of the elements of .T k , so that .F k defines a two-dimensional surface triangularization of .ΓkC , k .ΓC = τ , k ∈ {1, 2}. τ ∈F k
The surface mesh on .Γ2C is mapped by .χ−1 onto .Γ1C , resulting in possibly non-matching meshes on the contact interface. Following Wohlmuth [2], we use the standard piece-wise linear conforming finite element basis functions for the normal displacements on .ΓkC and the dual non-conforming piece-wise linear finite element basis functions reproducing constants for normal surface traction on .Γ1C . For the finite element discretization of normal displacements, we can use the procedure described in Sect. 10.4. We shall use the notation .
3 Vhk = Span{φki : i ∈ P k } , k ∈ {1, 2},
Vh =Vh1 × Vh2 ,
.
Mh+ =
⎧ ⎨ ⎩
μ=
1 i∈PC
βi ψi1 , βi ∈ R+
(16.3)
⎫ ⎬ ⎭
,
where • .P k is the set of all vertices of .T k , • .PCk is the set of all vertices which belong to .ΓkC , .k ∈ {1, 2},
(16.4)
362
Zdeněk Dostál and Tomáš Kozubek
• .φki : ΓkC → R denotes the standard piece-wise linear nodal basis function associated with vertex .xi , • .ψi1 : Γ1C → R denotes the dual basis function associated with vertex .xi . In particular, for any .x ∈ ΓkC φki (x) = 1, k = 1, 2. .
(16.5)
i∈P k
The following properties of the dual basis functions .ψi1 are essential for our presentation. • The support of .ψi1 is local, i.e., .
supp ψi1 = supp φ1i ,
i ∈ PC1 .
• The basis functions .ψi1 and .φ1i are locally biorthogonal, i.e., φ1i ψj1 dΓ = δij φ1i dΓ, i, j ∈ PC1 , τ ∈ F 1 , . τ
(16.6)
(16.7)
τ
where .F 1 denotes the set of all contact faces on the slave side. The basis functions are required to enjoy the best approximation property and the uniform inf–sup condition that are essential for the development of the approximation theory [1]. Notice that no set of nonnegative basis functions satisfy (16.7), so + + .M h is not a subset of .M . Observing that we can write each .v ∈ Vh in the form
1 1 2 2 .v = ξi φ i , ξi φi , ξi1 , ξi2 ∈ R, i∈P 1
i∈P 2
we can write the non-penetration condition in more detail as ⎛ ⎞ ψi1 ⎝ ξj1 φ1j − ξj2 φ2j ◦ χ⎠ · n dΓ ≤ ψi1 g dΓ, i ∈ PC1 . . Γ1C
1 j∈PC
2 j∈PC
Γ1C
(16.8) Our approximation of .K reads Kh = {v ∈ Vh : μ, [vn ] Γ1C ≤ μ, g Γ1C for all μ ∈ Mh+ }.
.
To write (16.8) in a convenient matrix form, let us assign local indices 1, . . . , n1 and .n1 + 1, . . . , n1 + n2 to the vertices on the contact interface on the slave and master side, respectively, and assume that .n(xki ) = n(x) for k .i = 1, . . . , n1 , .k = 1, 2, and .x ∈ supp φi , so that (16.8) is equivalent to .
16 Mortars
363
BC x ≤ Σ−1 gC ,
.
BC = Σ−1 [D, −M] ,
x = [(x1 )T , (x2 )T ]T ,
where ⎤ d1 nT1 · · · oT ⎢ .. . . .. ⎥ , .D =⎣ . . . ⎦ T o · · · dn1 nTn1 ⎡
⎤ m11 nT1 · · · m1n2 nT1 ⎥ ⎢ .. .. .. .M =⎣ ⎦, . . . T T mn1 1 nn1 · · · mn1 n2 nn1
di = Γ1C
⎡
φ1i dΓ,
(16.9)
mij = Γ1C
ψi1 (φ2j ◦ χ) dΓ,
(16.10)
. g =[gi ],
gi = Γ1C
ψi1 g dΓ,
i = 1, . . . , n1 ,
(16.11)
and .Σ denotes the scaling matrix which normalizes the rows of .[D, −M], so that the diagonal entries of .BC BTC are equal to one, i.e., 2 2 . [Σ]ii = di +
n2
m2ij ,
i = 1, . . . , n1 .
(16.12)
j=1
Due to the block structure of .BC , we have .
BC BTC = Σ−2 D2 + Σ−1 MMT Σ−1 ,
(16.13)
where .D is diagonal SPD and .MMT is SPS. A useful discussion of implementation details and catches can be found in Popp et al. [3].
16.3 Conditioning of Mortar Non-penetration Matrix Now we are ready to give bounds on the squares of the extreme singular values of .BC , i.e, on .λmin (BC BTC ) and .λmax (BC BTC ). In order to simplify the notation, we omit the superscript 1 of the basis functions .ψi and introduce the following definition. Definition 16.1 The dual basis functions .ψi , ψj , i = j defined on .Γ1C are near if there is a finite element basis function .φ2 defined on .Γ2C such that supp (φ2 ◦ χ) ∩ supp ψi = ∅ and
.
supp (φ2 ◦ χ) ∩ supp ψj = ∅.
The cover number of a dual basis function .ψi is the number of dual basis functions that are near to .ψi . The cover number .k of the mortar discretization
364
Zdeněk Dostál and Tomáš Kozubek φ2j+2 ◦χ φ2j+1 ◦χ
φ2j+3 ◦χ
Γc2 (master)
supp (ψi ◦χ)
χ ψi
χ Γc1 (slave)
supp ψi
Fig. 16.1 Primal basis functions .φ2j on .Γc2 and dual basis functions .ψi on .Γc1
is the maximal cover number of the dual basis functions on .Γ1C . See also Fig. 16.1. The following bounds are due to Vlach et al. [4]. Theorem 16.1 Let .BC denote the matrix arising from the consistent mortar discretization of the conditions of non-penetration for our model problem discretized with the finite elements that satisfy (16.5), and let .k denote the cover number of the mortar discretization. Then
.
min
i=1,...,n1
2 ψi dΓ T 2 2 ≤ λmin (BC BC ) ≤ 1 ψi dΓ + Γ1 |ψi | dΓ Γ1C
Γ1C
(16.14)
C
and .
λmax (BC BTC ) ≤ 1 + k max
i=1,...,n1
Γ1C 2
Γ1C
ψi dΓ
2 |ψi | dΓ 2 . + Γ1 |ψi | dΓ
(16.15)
C
Proof First observe that −2 2 T .λmin BC BC = λmin Σ D + Σ−1 MMT Σ−1 ≥ λmin Σ−2 D2 and that .Σ−2 D2 is diagonal. Using (16.12) and (16.13), we get
Σ−2 D2
ii
=
d2i +
d2 ni 2 j=1
m2ij
.
= Γ1C
ψi dΓ n2 Γ1C
2 ψi dΓ
2
+
j=1
Γ1C
ψi (φ2j
◦ χ) dΓ
2 .
Noticing that the basis functions .φj satisfy (16.5), we can estimate the last sum in the denominator of the last term by
16 Mortars
365
n2 j=1
2 Γ1C
ψi (φ2j ◦ χ) dΓ
⎛ n2 ≤⎝ j=1
⎛ ⎝ =
.
Γ1C
n2 Γ1C j=1
= Γ1C
⎞2 ψi (φ2j ◦ χ) dΓ⎠ ⎞2 |ψi |(φ2j ◦ χ) dΓ⎠ 2
|ψi | dΓ
,
i = 1, . . . , n1 ,
which proves the left inequality of (16.14). To prove the right inequality of (16.14), just observe that for the vector .ei with the entries .δij , we have due to the normalization [BC BC T ]ii = eTi BC BTC ei = 1.
.
To prove the upper bound (16.15), we recall that the .∞ -norm of any square matrix dominates its Euclidean norm, i.e.,
T . λmax BC BC
≤
max
i=1,...,n1
n1 j=1
|bTi bj |
= 1 + max
i=1,...,n1
n1
|bTi bj |.
(16.16)
j=1,j=i
Our next goal will be the upper bound on .|bTi bj |. Using the Cauchy interlacing inequalities (2.29), we get that the eigenvalues of any submatrix ! 1 bTi bj .Tij = , i, j ∈ {1, . . . , n1 }, i = j, bTj bi 1 of .BC BTC satisfy λ
. min
(Tij ) ≥ λmin (BC BTC ).
Direct computations reveal the spectrum .σ(Tij ) of .Tij ; we get σ(Tij ) = {1 + |bTi bj |, 1 − |bTi bj |}.
.
Combining these observations with (16.14), we get
2 ψi dΓ 1 − |bTi bj | ≥λmin (BC BTC ) ≥ 2 2 ψ dΓ + Γ1 |ψi | dΓ Γ1C i C . 2 |ψi | dΓ Γ1C =1 − 2 2 , ψ dΓ + |ψ | dΓ i i Γ1 Γ1 Γ1C
C
C
366
Zdeněk Dostál and Tomáš Kozubek
Fig. 16.2 Illustration to Corollary 16.1
so that
.
2 |ψi | dΓ 2 2 . ψi dΓ + Γ1 |ψi | dΓ Γ1C
|bTi bj | ≤ Γ1C
(16.17)
C
To finish the proof, substitute (16.17) to (16.16) and observe that
n2 T 2 2 .bi bj = ψi (φk ◦ χ) dΓ ψj (φk ◦ χ) dΓ . k=1
Γ1C
Γ1C
It follows that if the dual functions .ψi and .ψj are not near, then bTi bj = 0. .
.
It is possible to evaluate the bounds directly for the linear elements so that we can formulate the following result. Corollary 16.1 Let .BC denote the matrix arising from the consistent mortar discretization of the conditions of non-penetration of two 3D bodies using the linear finite element basis functions reproducing constants. Then 64 1 (16.18) < ≤ λmin BC BTC ≤ BC 2 ≤ 1 + k. 7 425 Proof The proof of the corollary uses the observation that the relevant characteristics of the biorthogonal basis functions associated with linear elements do not depend on the geometry. The slave continuous basis functions .φ1i and the discontinuous dual basis functions .ψi for 2D problems (see Fig. 16.2) have endpoint values .{1, 0} and .{2, −1}, respectively. To evaluate the estimate (16.14), we need the ratio between the integrals of .ψi and .|ψi | over .Γ1C . A little analysis reveals that the ratio is 3/5 (see the gray areas in the right part of Fig. 16.2), and the corresponding value for the 3D problems is 8/19. Thus 2 ψ dΓ 1 i ΓC 1 64 . . 19 2 = 2 2 ≥ 425 1 + ψ dΓ + |ψ | dΓ 8 i i Γ1 Γ1 .
C
C
.
16 Mortars
367
Though the estimate (16.18) captures the qualitative features of the mortar constraint matrices’ conditioning, it is rather pessimistic. Table 16.1 provides information about the singular values of .BI for nonmatching regular discretizations of Hertz’s problem contact interface with varying discretization parameters. We can see nice numbers for small values of .hslave /hmaster , which are relevant from the point of view of the approximation theory. See also Vlach et al. [4]. .hslave /hmaster .λmin (BI BT I ) .λmax (BI BT I ) .κ(BI BT I )
1/6 0.51 15.50 30.40
1/3 0.52 5.72 11.40
2/3 0.57 2.01 3.55
1 0.75 1.32 1.77
3/2 0.89 1.14 1.28
3 0.82 1.30 1.58
6 0.87 1.20 1.38
Table 16.1 Conditioning of the mortar matrices .BI for 3D Hertz problem with nonmatching grids
16.4 Combining Mortar Non-penetration with TFETI To plug our results into TFETI or TBETI, let us decompose each body .Ω1 , Ω2 into the subdomains .Ω1i , Ω2j of the diameter less or equal to H in a way which complies with the triangulations .T k , .k ∈ {1, 2} and use the triangulation to discretize separately the subdomains. The procedure introduces several local degrees of freedom for the displacements associated with the nodes on the interfaces .Ωki ∩ Ωkj . To simplify the reference to the above discussions, let us first consider only the variables associated with the nodes on .ΓC and denote "C the variables associated with the problem without the decomby .xC and .x position considered above and the decomposed problem, respectively. Using a suitable numbering of both sets of variables, we can define the expansion matrix .LC ∈ RnC ×nC which assigns to each vector .xC ∈ RnC of global vari"C ∈ RnC of the variables associated with the decomposed ables the vector .x problem so that " C = LC xC , x
.
LC = diag(e1 , . . ., enC ),
ei = [1, . . . , 1]T ∈ Rsi .
Thus each component .xi associated with .si corners of subdomains of .Ωk is assigned the vector "i = xi ei ∈ Rsi . .x To test the non-penetration of the decomposed problem, let us introduce the matrices −1 −1 −1 Σ = diag(e1 −1 1 , . . . , enC 1 ) = diag(s1 , . . . , snC ), " " C = BC " LC = LC Σ, and B LT . .
C
368
Zdeněk Dostál and Tomáš Kozubek
We get .
LTC LC = Σ−1 ,
" LTC LC = I,
and
"C x " C = BC " B LTC LC xC = BC xC . (16.19)
To interconnect the subdomains as in Chap. 11, notice that the displace" ∈ ImLC if and only if .y "i = yi ei , where .y "i is a block of .y " induced ment .y by the block structure of .ImLC . It follows that .ImLC comprises the displacements which correspond to the feasible displacements of the interconnected " E of the “gluing” matrix subdomains of .Ωk , .k ∈ {1, 2}, so that the part .B acting on the variables associated with .ΓC satisfies # $ "1 O B E "E , B "E = .Im" LC = ImLC = KerB "2 . O B E
Since the kernel of any matrix is the orthogonal complement of the range of its transpose, we get that "T .Ker" LTC = ImB E and " "T = O LT B E
. C
and
"C B " T = BC " " T = O. B LTC B E E
(16.20)
Let us now return to the discretization of our two body contact problem, so that .x ∈ Rn denotes the vector of coordinates of .vh ∈ Vh . The global matrices describing the non-penetration of and interconnection of variables " C and .B "E that correspond to the nodes on .ΓC can be obtained by padding .B " R with orthonormal rows which enforce with zeros. If we add the matrices .B the interconnecting of the remaining variables and the Dirichlet conditions, we get the global constraint matrices # $ ! & % "E O B BI " .BI = BC O , BE = " R , B = BE , O B and by (16.20) .
"C B " T , I ). BBT = diag(BI BTI , BE BTE ) = diag(B C
(16.21)
Using (16.19) and assuming that .BC has .mC rows, we get for any .λ ∈ RmC " T λ 2 = λ T BC " B LTC " LC BTC λ = λ T BC ΣBTC λ = Σ1/2 BTC λ 2 , C
.
where we used Σ−1 = LTC LC .
.
" recall that .si = ei 1 deTo give bounds on the singular values of .B, notes the number of subdomains the boundaries of which contain the node associated with the variable .xi and denote
16 Mortars
369
s
. max
=
max
i=1,...,nC
ei 1 ≥ 1.
Then .
T T "T s−1/2 max BC λ ≤ BC λ ≤ BC λ .
(16.22)
Since the rows of .BC and .BE are normalized and orthonormalized, respectively, we can combine (16.21) and (16.22) to get ! λI −1 T 2 λE 2 ≤ BT λ 2 ≤ BTC λ I 2 + λ λE 2 , λ = .smax BC λ I + λ . λE Under the assumptions of Corollary 16.1, we get 1 .
7smax
λ2 . λ2 ≤ BT λ 2 ≤ (2 + k)λ λ
(16.23)
Now we are ready to formulate the main result on the conditioning of the constraints arising in the solution of frictionless contact problems by the TFETI method with the non-penetration enforced by biorthogonal mortars. Since our reasoning is valid for all contact interfaces of the multibody contact problems with smooth contact interfaces, we can formulate the main result in general form. Proposition 16.1 Let the multibody contact problem introduced in Sect. 11.4 be discretized by linear finite elements reproducing constants with the consistent mortar discretization of the conditions of non-penetration. Let ij ij k k .ΓC ∩ ΓC = ∅ and .χ(ΓC ) ∩ χ(ΓC ) = ∅ for any .(i, j), .(k, ) that belong to the coupling set .S . Then the constraint matrix .B can be formed in such a way that 1 .
7smax
< λmin BBT ≤ B2 ≤ 2 + k,
(16.24)
where .k denotes the cover number of the mortar discretization and .smax denotes the maximum number of the subdomains of one body that share a point on .ΓC . Proof First, notice that inequalities (16.23) are valid for any couple of bodies that can come into contact. It follows that if we denote by .BIij ∗ the block of rows of .B that enforce the gluing or non-penetration of bodies (or domains) on the interface .Γij C , .(i, j) ∈ S , then for each .(i, j) ∈ S 1 .
7smax
< λmin BIij ∗ BTIij ∗ ≤ BIij ∗ 2 ≤ 2 + k.
370
Zdeněk Dostál and Tomáš Kozubek
Moreover, the assumption guarantees that for each .(i, j), (k, ) ∈ S , .(i, j) = (k, ), we have BIij ∗ BTIk ∗ = O.
.
" R of the Since the blocks .BIij ∗ , .(i, j) ∈ S are orthogonal to the block .B orthonormal rows of .B that define the equality constraints which do not interfere with the contact interface, it follows that the estimates (16.23) are valid also for the multibody contact problems that satisfy the assumptions . of the proposition. Proposition 16.1 assumes that the contact interfaces are smooth and do not intersect. However, the assumptions can be satisfied by a modification of the discretization as indicated in Fig. 16.3.
Fig. 16.3 A contact interface with “wire baskets” that violates the assumptions of Proposition 16.1 (left) and its acceptable modification (right).
Proposition 16.1 shows that the matrix .F arising from the mortar discretization of the non-penetration conditions can be assembled so that it satisfies the assumptions of Theorem 11.2 which established the .H/h bound on the regular condition number .κ(PFP) with .B generated by means of nodeto-node constraints. It follows that Theorem 11.3 on the optimality of TFETI for the solution of frictionless problems with node-to-node non-penetration conditions is valid also for the non-penetration enforced by biorthogonal mortars.
16.5 Numerical Experiments We implemented mortar non-penetration conditions in MATLAB using MatSol numerical library [5] and examined the conditioning of both constraint and Schur complement matrices on 3D Hertz problem depicted in Fig. 16.4 (left) (it appeared in Vlach et al. [4]). The upper body .Ω2 is pressed
16 Mortars
371
down on the upper face .x3 = 10 by the boundary traction .(0, 0, −2 · 103 ) [MPa]. We used the material coefficients .ν = 0.3 and .E = 210 [GPa]. The symmetry conditions are imposed on the boundaries .x1 = 0 and .x2 = 0; the lower body is fixed vertically along the bottom face .x3 = −10.
Fig. 16.4 3d Herz example—setting (left); decomposition case (middle); von Mises stress (right)
We carried out the computations with each body either undecomposed or decomposed into .3×3×3 or .2×2×2 subdomains, depending on .hslave /hmaster . The decomposition for .hslave /hmaster = 2/3 is in Fig. 16.4 (middle). In the right part of Fig. 16.4, the distribution of von Mises stress in the deformed state is depicted. The numerical results were obtained by the combination of MPRGP and SMALBE with adaptive reorthogonalization. This algorithm is described in the next chapter, with the numerical experiments documenting the scalability and performance of the algorithms in more detail. The results concerning the conditioning of the contact constrain matrices are in Table 16.2. We can see that the decomposition deteriorates the conditioning of the dual Schur complement .F rather mildly, being affected more by the nonmatching decomposition, which is far from the one required by the theory of domain decomposition reported in Chap. 11. .hslave /hmaster 1 / 6 1/3 2/3 1 .λmin (BI BT ) 0.49 0.40 0.43 0.73 I .λmax (BI BT 19.2 7.00 2.30 1.32 I ) .κ(BI BT ) 42.8 17.5 5.36 1.82 I .λmin (F|KerG) .1.6 · 10−6 .2.4 · 10−6 .2.8 · 10−6 .3.8 · .λmax (F|KerG) .1.1 · 10−4 .1.1 · 10−4 .1.1 · 10−4 .1.1 · .κ(F|KerG) 72.8 46.7 40.7 29.9
10−6 10−4
3/2 0.86 1.16 1.35 .2.7 · 10−6 .9.9 · 10−5 37.2
3 0.83 1.31 1.57 .2.2 · 10−6 .9.9 · 10−5 44.0
6 0.88 1.20 1.37 .1.4 · 10−6 .9.9 · 10−5 72.7
Table 16.2 3D Hertz problem with domain decomposition [4]
The iteration counts are in Table 16.3. We can see that the number of outer iterations of SMALBE-M ranges from 9 to 14, while the number of matrix–
372
Zdeněk Dostál and Tomáš Kozubek
vector multiplications required by MPRGP in the inner loop of SMALBE ranges from 96 to 160. Notice that the implementation of a nonlinear step requires typically two matrix–vector multiplications. Taking into account that the dimension of the discretized problems ranged from 53,760 to 6,720,000, we conclude that the cost of the solution increases nearly proportionally to the dimension in agreement with the theory. example subdomains primal dof’s dual dof’s outer it. Hess mult.
×3 35 53,760 17,295 9 96 .2
×6 280 430,080 160,607 9 114 .4
×9 945 1,451,520 566,644 14 142 .6
× 12 2240 3,440,640 1,372,121 9 128 .8
× 15 4375 6,720,000 2,713,761 11 160 .10
Table 16.3 Scalability of 3D Hertz problem with nonmatching domain decomposition [13]
16.6 Comments Maday et al. [6] introduced mortars into the domain decomposition methods to develop practical tools for coupling variational discretizations of different types of nonoverlapping subdomains. A milestone in the development of mortar methods was the discovery of biorthogonal mortars by Wohlmuth [1]. The mortar approximation of contact conditions was used by many researchers, including Puso [7], Puso and Laursen [8], Wriggers [9], Dickopf and Krause [10], and Chernov et al. [11] and was enhanced into the efficient algorithms for contact problems as in Wohlmuth and Krause [12]. We base our exposition on the variationally consistent approximation of contact conditions by biorthogonal mortars introduced in the seminal paper by Wohlmuth [2]. See also Popp et al. [3]. The estimates presented here appeared in Vlach et al. [4]. Numerical experiments are from Dostál et al. [13, 14].
References 1. Wohlmuth, B.I.: Discretization Methods and Iterative Solvers Based on Domain Decomposition. Springer, Berlin (2001) 2. Wohlmuth, B.I.: Variationally consistent discretization scheme and numerical algorithms for contact problems. Acta Numer. 20, 569–734 (2011) 3. Popp, A., Seitza, A., Geeb, M.W., Wall, W.A.: Improved robustness and consistency of 3D contact algorithms based on a dual mortar approach. Comput. Methods Appl. Mech. Eng. 264, 67–80 (2013)
16 Mortars
373
4. Vlach, O., Dostál, Z., Kozubek, T.: On conditioning the constraints arising from variationally consistent discretization of contact problems and duality based solvers. Comput. Methods Appl. Math. 15(2), 221–231 (2015) 5. Kozubek, T., Markopoulos, A., Brzobohatý, T., Kučera, R., Vondrák, V., Dostál, Z.: MatSol–MATLAB efficient solvers for problems in engineering. The library is no longer updated and is replaced by the ESPRESO framework. http://numbox.it4i.cz 6. Maday, Y., Mavriplis, C., Patera, A.T.: Nonconforming mortar element methods: application to spectral discretizations. In: Chan, T. (ed.) Domain Decomposition Methods, pp. 392–418. SIAM, Philadelphia (1989) 7. Puso, M.: A 3D mortar method for solid mechanics. Int. J. Numer. Methods Eng. 59, 315–336 (2004) 8. Puso, M., Laursen, T.: A mortar segment-to-segment contact method for large deformation solid mechanics. Comput. Methods Appl. Mech. Eng. 193, 601–629 (2004) 9. Wriggers, P.: Contact Mechanics. Springer, Berlin (2005) 10. Dickopf, T., Krause, R.: Efficient simulation of multibody contact problems on contact geometries: a flexible decomposition approach using constrained optimization. Int. J. Numer. Methods Eng. 77(13), 1834–1862 (2009) 11. Chernov, A., Maischak, M., Stephan, E.P.: hp-mortar boundary element method for two-body contact problems with friction. Math. Methods Appl. Sci. 31, 2029–2054 (2008) 12. Wohlmuth, B.I., Krause, R.: Monotone methods on nonmatching grids for nonlinear contact problems. SIAM J. Sci. Comput. 25, 324–347 (2003) 13. Dostál, Z., Vlach, O., Brzobohatý, T.: Scalable TFETI based algorithm with adaptive augmentation for contact problems with variationally consistent discretization of contact conditions. Finite Elem. Anal. Des. 156, 34–43 (2019) 14. Dostál, Z., Vlach, O.: An accelerated augmented Lagrangian algorithm with adaptive orthogonalization strategy for bound and equality constrained quadratic programming and its application to large-scale contact problems of elasticity. J. Comput. Appl. Math. 394, 113565 (2021)
Chapter 17
Preconditioning and Scaling Zdeněk Dostál and Tomáš Kozubek
So far, the scalability results assumed that the bodies and subdomains involved in a problem are made of the same material and that the stiffness matrices of the subdomains are reasonably conditioned. If this is not the case, we can improve the convergence rate by preconditioning. Here we present some results that can improve the convergence of the algorithms for solving QP or QCQP problems arising from the application of FETI to the solution of contact problems. We are especially interested in the methods that improve the condition number of the Hessian of the cost function and preserve the structure of the inequality constraints so that they affect both the linear and nonlinear steps. We first consider the preconditioning and scaling methods that can reduce or eliminate the effect of varying coefficients. They use the structure of the matrices .F, B introduced in Chap. 11 to pass the favorable effect of the stiffness matrix’s .K scaling on the reduction of the upper bound on the conditioning of the dual energy function’s Hessian. We show that the method guarantees that the bounds on the regular condition number of the preconditioned problem are independent of the coefficients provided the subdomains are homogeneous and each node is involved in at most one inequality constraint. We show that preconditioning can be implemented to preserve the bound constraints and affect both linear and nonlinear steps of optimization algorithms. Then we discuss the adaptation of standard Dirichlet and lumped preconditioners to solving contact problems and their implementation into the preconditioning in face of the MPRGP and MPGP algorithms. These methods can reduce the condition number efficiently, but do not preserve the bound constraints. Z. Dostál • T. Kozubek Department of Applied Mathematics and IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_17
375
376
Zdeněk Dostál and Tomáš Kozubek
Finally, we show that enhancing the information on the free set of current iterates into the reorthogonalization of equality constraints can improve the performance of optimization algorithms.
17.1 Reorthogonalization-Based Preconditioning We shall start with the preconditioning of the matrix F = BT K+ B,
.
K = diag(K1 , . . ., Ks ),
arising from the application of FETI domain decomposition methods to the solution of frictionless contact problems with matching grids and homogeneous subdomains made of isotropic material, so that their stiffness matrices depend linearly on their modulus of elasticity. We assume that the matrices .B and .K were obtained by the procedure described in Chap. 11. Notice that is block the constraint matrix .B can be reordered so that the reordered .B diagonal with small blocks. The structure of .F, B and Lemma 10.2 suggests that we can try to improve the conditioning of .F by first improving the regular condition number of .K by the diagonal scaling to get = Δ−1/2 KΔ−1/2 , K
.
+ = Δ1/2 K+ Δ1/2 , K
Δ = diag(k11 , . . ., knn ),
and + Δ−1/2 BT , F = BK+ BT = BΔ−1/2 K
.
does not depend on the Young coefficients .Ei , i = 1, . . . , s, and so that .K then determine .T by attempting to make = TBΔ−1/2 B
.
as close to a matrix with orthonormal rows as possible. If each block of B contains at most one inequality, we can use .T arising from the properly ordered Gram–Schmidt orthogonalization of .BΔ−1/2 to get
.
.
B T = TBΔ−1/2 (TBΔ−1/2 )T = I. B
(17.1)
Notice that .B is a reordered block diagonal matrix; each block corresponds to a “wire basket” of the dimension .(p − 1) × p, where p is bounded by the number nodes in the wire basket multiplied by three. Thus it is not expensive to get .BΔ−1/2 (BΔ−1/2 )T , its Cholesky factorization LLT = BΔ−1 BT ,
.
17 Preconditioning and Scaling
377
and the action of T = L−1 .
.
The effect of reorthogonalization-based preconditioning is formulated in the following proposition. Proposition 17.1 Let the domain of a contact problem be decomposed into homogeneous subdomains of the diameter H and discretized by the shape regular elements with the diameter h. Let the discretization be quasi-uniform with shape regular elements, and let the subdomains be homogeneous and isotropic. Then the regular condition number .κ satisfies .
K +B T ) ≤ C H , κ(TFTT ) = κ(B h
(17.2)
where C is independent of H, h, and Young’s modulus. nor the conditioning of .B depends on Young’s modulus and Proof Neither .K T K +B T . .TFT = TBΔ−1/2 Δ1/2 K+ Δ1/2 Δ−1/2 BT TT = B
The rest follows by Theorem 11.2.
.
We can plug the reorthogonalization-based preconditioning into problem (11.40) min θ(λ)
.
s.t.
Gλ = o and
λ I ≥ I
by substituting λ = TT μ
.
to get .
min θ(TT μ)
s.t.
GTT μ = o and
[TT μ]I ≥ I .
(17.3)
Since the reorthogonalization-based preconditioning turns in general case the bound constraints into more general inequality constraints, it follows that we can use this preconditioning only as preconditioning in face. However, if each node is involved in at most one inequality constraint, as in the two has at body problem with a smooth contact interface, i.e., if each block of .B most one row corresponding to the inequality constraint, then it is possible to preserve the bound inequality constraints. It is just enough to start the with the rows corresponding to the orthogonalization of each block of .B equality constraints to get BE = TE O .B (17.4) , TI Σ BI
378
Zdeněk Dostál and Tomáš Kozubek
so that .
min θ(TT μ)
s.t.
GTT μ = o,
μI ≥ Σ−1 I ,
λ = TT μ,
(17.5)
where .Σ is a diagonal matrix with positive entries. Since such transformation preserves the bound constraints, it follows that the preconditioning by reorthogonalization can be applied to discretized dual problem (17.3) and improve the bounds on the rate of convergence of the algorithms like MPRGP, MPGP, SMALBE, or SMALSE introduced in Part II.
17.2 Renormalization-Based Stiffness Scaling Let us now consider a general bound and equality-constrained problem (11.36) arising from the discretization of a contact problem with nonsmooth interface or which admits nodes on the contact interface which belong to more than two bodies, so that the reorthogonalization-based preconditioning changes the bound constraints into more general inequality constraints (17.3). To improve the conditioning of the nonlinear steps, we can try to use .Δ to find a diagonal matrix Σ = diag(σ1 , . . . , σm )
.
with positive diagonal entries such that the regular condition number of the scaled matrix .ΣFΣ is smaller than that of .F. Such scaling can be considered as modification of the constraints to ΣI BI∗ v ≤ ΣI gI ,
.
ΣE BE∗ v = o,
and
Σ = diag(ΣI , ΣE ),
where .ΣI and .ΣE are the diagonal blocks of .Σ that correspond to .BI∗ and BE ∗ , respectively. As above, we shall first improve the regular condition number of .K by the diagonal scaling to get
.
= Δ−1/2 KΔ−1/2 K
.
and + Δ−1/2 BT F = BK+ BT = BΔ−1/2 K
.
and then determine .Σ by attempting to make = ΣBΔ−1/2 B
.
as close to a matrix with orthonormal rows as possible, i.e.,
17 Preconditioning and Scaling .
379
ΣBΔ−1/2 (ΣBΔ−1/2 )T ≈ I.
(17.6)
Probably the simplest idea is to require that (17.6) is satisfied by the diagonal entries. Denoting by .bi the i -th row of .B, we get for the diagonal entries .σi of .Σ . σi bi Δ−1/2 (σi bi Δ−1/2 )T ≈ 1 (17.7) and σi = 1/ bi Δ−1 bTi . Unfortunately, it turns out that the scaling is not sufficient to get an optimality result similar to Proposition 17.1, though numerical experiments indicate efficiency of the diagonal scaling. The problem is in the constraints which enforce the continuity of the solution in the corners of the subdomains with different coefficients. The following proposition on scalar problem partly explains both observations. Proposition 17.2 Let a 2D Poisson problem on square be decomposed into homogeneous subdomains and discretized by finite elements using regular matching grids. Let the continuity of subdomains be enforced across the faces by the equations .xi − xj = 0 and in the corners by the equations x + xj − xk − x = 0,
. i
xi − xj = 0,
and
xk − x = 0.
= ΣBΔ−1/2 and the vectors of Lagrange multiLet the constraint matrix .B m c and .λc ∈ Rmc associated pliers .λ ∈ R be decomposed into the blocks .B with the corner constraints and .Be and .λe ∈ Rme associated with the edge constraints, .m = mc + me , so that .
T λc + B T λe . T λ = B B c e
(17.8)
Then there is .0 < C1 ≤ 1 such that .
T λc 2 ≤ 2λc 2 C1 λc 2 ≤ B c
and
T λe 2 = λe 2 . B e
(17.9)
Moreover, there are subspaces .Vc ⊆ Rmc and .Wc ⊆ Rmc of the dimension .nc and .mc − nc , where .nc is the number of corners, such that .Rmc = Vc ⊕ Wc and for any .λc ∈ Vc and .μc ∈ Wc λTc μc = 0,
.
T λc 2 ≤ λc 2 , C1 λc 2 ≤ B c T μc 2 ≤ 2μc 2 . μc 2 ≤ B c
and (17.10)
ijk the .3 × 4 block of .B c associated with variables Proof Let us denote by .B c c. .vi , vj , vk , v which are interconnected by the corresponding three rows of .B 2, b 3 , we get 1, b Denoting the parts of these rows .b
380
Zdeněk Dostál and Tomáš Kozubek
⎤ 1 b ⎢ ⎥ = ⎣b 2⎦, b3 ⎡
ijk B c
.
⎤ T T b 1 b 1 2 b1 b3 ⎥ ijk )T = ⎢ ijk (B T b B 1 0 ⎦. ⎣b c c 1 2 T b b 0 1 1 3 ⎡
The direct computation using the properties of the eigenvalues of SPS ma ijk )T are given by ijk (B trices shows that the eigenvalues of .B c c T b T b T b 2 )2 +(b 3 )2 > 0, μ2 =1, μ3 =1+ (b 2 T 2 .μ1 = 1− (b 1 1 1 2 ) +(b1 b3 ) < 2. It follows that we can take for .C1 the smallest eigenvalue of the corner blocks. The rest is easy as the rows of .Be are orthonormal and orthogonal to the rows . of .Bc . Remark 17.1 Proposition 17.2 indicates that the effect of varying coefficients can be reduced by the scaling to a relatively small subspace which does not contain dominating components of the solution, so that it can improve the convergence of the iterative contact problems. Notice that the reorthogonalization-based scaling is sufficient to get rid of the effect of varying coefficients for 2D problems solved by TFETI.
17.3 Lumped and Dirichlet Preconditioners in Face There are several well-established preconditioners that can be used to improve the conditioning of the Hessian .PFP of the dual function. We can try to adapt them to improve the rate of convergence for the solution of auxiliary linear problems by preconditioning in face (see Sect. 8.6), but we shall see that the effect of adapted preconditioners is supported rather by the intuition than by the theory. The lumped preconditioner was introduced by Farhat and Roux in their seminal paper on FETI [1]. It is very cheap and often outperforms more sophisticated preconditioners. For the linear problems solved by TFETI, it is given by .
M = PBKBT P = PB∗B KBB BT∗B P,
(17.11)
where .B denotes the set of all indices of the nodal variables that are associated with the nodes on the boundary of the subdomains. The Dirichlet preconditioner was introduced by Farhat, Mandel, and Roux in their another important paper on FETI [2]. For the linear problems solved by TFETI, it is given by .
M = PB∗B SBT∗B P.
(17.12)
17 Preconditioning and Scaling
381
The adaptation of standard preconditioners for the solution of contact problems by TFETI is not obvious. Taking into account that the Hessian of the dual cost function .θ in Sect. 11.7 reads .PFP + Q, we can consider adapted lumped preconditioner in the form .
M = PF ∗ BKBT P∗F = PF ∗ B∗B KBB BT∗B P∗F + −1 QF F ,
(17.13)
where .F denotes the indices of the variables associated with the free variables in face. Similarly, we can consider adapted Dirichlet preconditioner in the form .
M = PF ∗ B∗B SBT∗B P∗F + −1 QF F .
(17.14)
Though the second term is not a projector any more, our experience showed that the lumped preconditioner can modestly reduce the cost of the solution [3, 4]. Let us mention that replacing .QF F by the projector onto the kernel of .BF B KBB BTF B seems possible only at a rather high cost. Moreover, the preconditioning in face does not improve the performance of the nonlinear steps, and its implementation requires some globalization strategy to grant the convergence. There are successful attempts to plug standard preconditioners into the FETI solvers for contact problems based on nonsmooth Newton methods starting from the FETI-C algorithm proposed by Farhat and Dureisseix [5]. The standard preconditioners were later enhanced also into FETI-DPC algorithms based on FETI-DP [6]. See also Avery and Farhat [7] and Dostál et al. [8].
17.4 Adaptive Augmentation Here we show that the performance of the SMALBE for the solution of (11.40) can be essentially improved by enhancing the information on the free set of current iterates into the reorthogonalization of equality constraints. The basic idea is closely related to the reorthogonalization-based preconditioning introduced above.
17.4.1 Algorithm Let us first recall that cost function of the problem (11.40) reads
382
Zdeněk Dostál and Tomáš Kozubek
1 T ν (PFP + ρQ)ν − μT Pd, 2 Q = GT G, P = I − Q, and GGT = I,
θ (ν) =
. ρ
where .ρ > 0 is not specified. We choose ρ = ρ0 ≈ λmin (PFP)
.
and denote by . the augmentation parameter, so that the augmented Lagrangian .L for (11.40) reads ρ 1 T ν (PFP + ρ0 Q)ν − μT Pd + GT μ + Qν2 2 2 = L(ν, μ, ), ρ = ρ0 + ρ.
L(ν, μ, ) =
.
A little analysis of the proof of convergence of SMALBE in Chap. 9 reveals that the augmentation affects directly only free variables, i.e., the effective augmentation in k-th step is carried out by means of .G∗F (ν k ) . Indeed, if k .F = F (ν ) denotes the set of indices of inactive constraints, i.e., .νi > i for .i ∈ F , then the quadratic term of the Lagrangian reduces to .
1 T ν F PF ∗ FP∗F + GT∗F G∗F ν F , 2
which can be too small and ill-conditioned to enforce effectively the equality constrains, especially if the number of active constraints is large. A remedy is to carry out the iterations with orthonormal constraints as long as the active set of the iterates strongly varies until we get a reasonable of the solution .ν . Denoting approximation .F of the free set .F G∗A ∈ Rm×nA ,
G = [G∗A , G∗F ],
.
G∗F ∈ Rm×nF ,
n = nA + nF ,
we can introduce a lower triangular matrix .L of the triangular decomposition and define = L−1 G, G
.
LLT = G∗F GT∗F
so that ∗F G T = I. G ∗F
.
Thus the quadratic term of the Lagrangian for the modified representation of equality constraints reads .
1 T ν (PF ∗ FP∗F + IF F ) ν F . 2 F
17 Preconditioning and Scaling
383
T G is obviously more effective than that The regularization term with .G T with the orthogonal projector .G G as long as the free set of .ν is near the free However, it provides effective regularization for any set .F which defines .G. n .ν ∈ R as ν T G∗F GT∗F ν ≤ ν T GGT ν= ν2 ,
.
ν ∈ Rm ,
implies 2 = ν T GT G∗F GT −1 Gν ≥ ν T GT GGT −1 Gν = Gν2 . Gν ∗F
.
(17.15)
To enhance these observations into SMALBE-M, recall that the solution of the problem (11.40) does not depend on the representation of the feasible associated set .Ω, so that we can improve the convergence by switching to .G k . with a current approximation .Fk = F (ν ) of the free set .F of the solution .ν Assume that the orthogonalization of .G∗Fk is carried out by means of the Cholesky factor T .L = Chol G∗Fk G∗F k of the decomposition of .G∗Fk GT∗Fk , so that LLT = G∗Fk GT∗Fk ,
.
we get the orthogonalized equality constraints in the form L−1 Gν = o.
.
The corresponding Lagrangian .LL−1 reads LL−1 (ν, m, ) =
.
1 T ρ ν PFPν − ν T Pd + ν T GT L−T m + L−1 Gν2 , 2 2
and the gradient .gL−1 of .LL−1 reads g
. L−1
(ν, m, ) = ∇ν LL−1 (ν, m, ) = PFPν − Pd + GT L−T m + GT L−T L−1 Gν.
Notice that g
. L−1
(ν, LT μ, 0) = g(ν, μ, 0),
so the Lagrange multiplier .m associated with .gL−1 is related to .μ by m = LT μ.
.
Thus to switch the representation of .Ω, evaluate
384
Zdeněk Dostál and Tomáš Kozubek
μk = L−T mk with the actual .L, evaluate new .L = Chol B∗,Fk BT∗,Fk , set .mk = LT μk in Step 1, and continue the iterations with a new Lagrangian until the active set of the iterates stabilizes. Since the transformations between .m and .μ are simple, it is possible to use only one set of multipliers. The procedure is invoked whenever the free set of an iterate differs significantly from that approximated in the beginning of the cycle and the active set is stabilized so that it changes only mildly. To describe the procedure explicitly, let .#F denote a number of elements of a free set .F . The modified SMALBE-M algorithms with an adaptive augmentation read as follows. .
Algorithm 17.1 Semi-monotonic augmented Lagrangians with adaptive augmentation for problem (11.40) (adaptive SMALBE-M) Step 0. {Initialization} Choose .η > 0, .0 < β < 1, .M0 > 0, . > ρ0 > 0, .μ0 ∈ Rm , .L = I, .k = 0 for .k = 0, 1, 2, . . . Step 1. {Inner iteration with adaptive precision control} Find .ν kI ≥ I such that P k T k .g −1 (ν , L μ , ) L
≤ min{Mk Gν k , η}
(17.16)
Step 2. {Updating the Lagrange multipliers} k+1 .μ
= μk + ( − 0 )L−T L−1 Gν k
Step 3. {Update M provided the increase of the Lagrangian is not sufficient} if .k > k and .
L(ν k , μk , ) < L(ν k−1 , μk−1 , ) +
− 0 Gν k 2 2
(17.17)
.Mk+1 = βMk else .Mk+1 = Mk end else if
Step 4. {Update .L} if .|#F (ν k ) − #F (ν k )| is large and .Chol(G∗F (ν k ) GT ) ∗F (ν k ) then
.L
is nonsingular
= Chol(G∗F (ν k ) GT ) ∗F (ν k )
.k = k + 1 end for
In our experiments, we got very good results even with one update of .L (see Sect. 17.5.3). The specific implementation of the update rule depends on many factors, including the cost of orthogonalization (obtaining .L), the
17 Preconditioning and Scaling
385
difficulty of the approximation of active and free sets, etc. Observe that we do not suppose that .L is defined by the exact free set of the solution. Remark 17.2 To simplify the presentation, we assumed that .G∗F is a full row rank matrix. The algorithm can be modified to accept dependent rows as follows: • If during the factorization appears zero pivot, remember the index, and skip the corresponding row and column. • Reorder the vectors and matrices so that the indices of skipped columns are at the end. • Use the factor in the form LO .L = , O I where . L is a factor of decomposed .G∗F GT∗G without the rows and columns corresponding to approximately zero pivots.
17.4.2 Convergence of Adaptive SMALBE-M The theoretical results concerning adaptive SMALBE-M are very similar to those concerning original SMALBE-M. To simplify the reference to the results of Chap. 9, let us introduce notation A = PFP + 0 Q,
.
and
B = G.
Theorem 17.1 Let .Mk , .{xk }, and .{λk } be generated by SMALBE-M for the solution of (9.1) with .M0 , η, > 0 > 0, .0 < β < 1, and .λ0 ∈ Rm . Let the solution be regular. Then the following statements hold. (i) If .λmin denote the smallest eigenvalue of .A, then .Mk ≥ min{M0 , β ( − 0 )λmin }. (ii) If .k > 0 and .Mk = Mk+1 , then L(xk , λk , ) ≥ L(xk−1 , λk−1 , ) +
.
− 0 Gν2 . 2
(17.18)
of (9.1). (iii) The sequence .{xk } converges to the solution .x of La(iv) The series .{λk } converges to the uniquely determined vector .λ grange multipliers of (9.1). Moreover, if . is sufficiently large, then the convergence of both the Lagrange multipliers and the feasibility error is linear.
386
Zdeněk Dostál and Tomáš Kozubek
Proof First notice that by (17.15) and (17.16) gLP−1 (xk , LT λk , ) ≤ min{Mk Bxk , η} ≤ min{Mk L−1 Bxk , η},
.
so the iterates .xk , .μk = LT λk can be considered as iterates of the standard SMALBE algorithm for the problem with equality constraints in the form L−1 Bx = o.
.
Thus we can apply Lemma 9.3(i) to get (i). To prove (ii), notice that it follows by Lemma 9.3(i) and (17.15) that LL−1 (xk , μk , 0) ≥ LL−1 (xk−1 , μk−1 , ) = LL−1 (xk−1 , μk−1 , 0) + L−1 Bxk 2 2 k−1 k−1 k 2 ≥ LL−1 (x ,μ , 0) + Bx . 2
.
Using LL−1 (xk , μk , 0) = LL−1 (xk , LT λk , 0) = L(xk , λk , 0),
.
we get
L(xk , λk , 0) ≥ L(xk−1 , λk−1 , 0) + Bxk 2 . (17.19) 2 This proves (ii). The remaining statements can be deduced from (ii) in the same way as in the proof of Theorem 9.2. .
Remark 17.3 The algorithm can be applied without any change to the problems with separable and equality constraints as (12.40) arising from the discretization of contact problems with friction.
17.5 Numerical Experiments The variants of scaling were incorporated into MatSol [9] and tested on 2D and 3D benchmarks.
17.5.1 Reorthogonalization—3D Heterogeneous Beam The domain of our first benchmark was a 3D beam of size .2×1×1 [m] divided into .6 × 3 × 3 subdomains made of two materials with Young’s modulus 3 3 .E1 = 10 and .E2 = α·10 , .α ∈ {1, 10, 100, 1000}. The Dirichlet boundary conditions were prescribed on .ΓD = {(0, y, z) : y ∈ 0, 1, z ∈ 0, 1}. Three different distributions of materials labeled by a, b, c that were used in our experiments are depicted in Fig. 17.1. The gray regions are filled with the material with Young’s modulus .E2 . The vertical traction 2000 [N] was applied
17 Preconditioning and Scaling
387
Fig. 17.1 3D heterogeneous beams—material distribution, .E2 grey Prec. .E2 /E1
1 10 100 1000
a .18.4 .176.0 .1760.0 17,600.0
.I b .18.4 .179.0 .1780.0 17,800.0
c .18.4 .121.0 .1200.0
12,000.0
.T(diag(.K)) a b c .9.89 .9.89 .9.89 .9.85 .10.1 .10.2 .9.85 .10.1 .10.4 .9.85 .10.1 .10.4
.Σ(diag K)
a .9.89 .53.1 .501.0 .5000.0
b
c
.9.89
.9.89
.60.7
.51.9
.609.0
.504.0
.6100.0
.5020.0
Table 17.2 3D beam—.κ(PFP) for varying material distributions and preconditioning
on the upper face of the beam. The discretization resulted in a linear problem with 32,152 primal and 1654 dual variables. The regular condition number of .PFP for the varying combinations of material distributions a, b, c, material scales .E2 /E1 ∈ {1, 10, 100, 1000}, and preconditioning by .I (no preconditioning), .T, or .Σ are summarized in Table 17.2. We can see that the reorthogonalization-based preconditioning eliminates the effect of heterogeneous coefficients in agreement with the theory, while the renormalization-based scaling reduces the effect of jumps in the coefficients, but is far from eliminating it.
17.5.2 Reorthogonalization—Contact Problem with Coulomb Friction The effect of reorthogonalization-based preconditioning on a problem with at most one inequality associated with one node is documented by the analysis of the matrix with circular insets. The friction on the contact interfaces is described by Coulomb’s law with the coefficient of friction .Φ = 0.3. The discretized problem was divided into subdomains in two ways using regular decomposition and METIS; see Fig. 17.2. The discretization and decomposition resulted in the problem with 9960 primal variables, 12(28) subdomains, 756(1308) dual variables, 36(84) equality constraints, and 384 box constraints. The problem was resolved with varying .E2 /E1 and the reorthogonalizationbased preconditioning .T associated with the diagonal entries of .K and .S denoted, respectively, by .T (diag K) and .T(diag S). The results obtained by a regular and METIS discretization are reported separately. Young’s modulus
388
Zdeněk Dostál and Tomáš Kozubek
Fig. 17.2 Material with circular insets—regular (left) and METIS (right) decomposition
Fig. 17.3 Material with circular insets—deformation and the von Mises stress for = 104
.E2 /E1
E2 of the insets was parametrized as .α times Young’s modulus .E1 of the matrix. The resulting von Mises stress and the deformation are depicted in Fig. 17.3.
.
.E2 /E1
1 10 100 1000 10,000
93 126 640 6260 62,500
/ / / / /
.I 112 129 230 371 588
.T(diag K)
93 62 56 55 55
/ / / / /
139 145 188 148 139
.T(diag S)
72 53 50 50 50
/ / / / /
128 144 169 128 132
Table 17.3 Material with circular insets—.κ(PFP) / iterations (regular grid)
.E2 /E1
1 10 100 1000 10,000
385 627 6230 62,300 623,000
/ / / / /
.I 203 230 345 712 972
.T(diag K)
340 303 295 295 294
/ / / / /
287 216 249 192 251
.T(diag S) 322 / 269 295 / 242 289 / 207 288 / 195 288 / 200
Table 17.4 Material with circular insets—.κ(PFP) / iterations (METIS discretization)
17 Preconditioning and Scaling
389
The effective condition number and the number of Hessian multiplications needed to achieve the relative error .10−4 are presented in Table 17.3 (regular grid) and Table 17.4 (METIS grid). The regular condition number has been improved in all cases. Moreover, the number of iterations is nearly constant for regular decomposition in agreement with the theory as there are no corners on the material interface. The computations with the METIS decomposition resulted in an increased number of Hessian multiplications. An obvious reason is the increased condition number due to the ill-conditioning of the constraint matrices.
17.5.3 Adaptive Augmentation—3D Hertz Problem We tested the performance of the adaptive augmentation on the solution of Hertz problem depicted in 16.4. This time, the bodies were decomposed into .2 × 1000 subdomains, each discretized by .63 brick trilinear elements. The discretized problem was defined by 2,058,000 nodal variables, 3721 inequality constraints describing the non-penetration, and 64,679 equality constraints describing the “gluing” of subdomains and discretized boundary conditions. The dimension of the dual bound and equality constrained problem was 718,440 with 12,000 equality constraints and 3721 bound constraints, 3225 of which were active at the solution. To check the effect of adaptive augmentation, we resolved our benchmark with the reorthogonalization invoked after each .Nreo steps starting always after the first iteration, .Nreo ∈ {1, 3, 5, ∞}, where .∞ stands for the original SMALBE-M algorithm without reorthogonalization. Thus the computations with step .Nreo = 5 invoked the reorthogonalization after the first, sixth, etc. iterations. However, we skipped the orthogonalization when there was no change of the active set. We carried out the computations to the relative precision .ε = 10−8 . .Nreo
Reorthogonalization steps Outer iterations Matrix .× vector
=1 5/ 5 8/10 91/121
= 3 .Nreo = 5 .Nreo = ∞ 3/4 2/3 0/0 10/13 10/13 64/121 117/161 119/158 333/557
.Nreo
Table 17.5 Effect of reorthogonalization .(ε = 10−5 /10−8 )
The effect of reorthogonalization on the number of outer and inner iterations that are necessary to achieve the relative precision .ε = 10−5 /10−8 is captured in Table 17.5 and in Figs. 17.4–17.7.
390
Zdeněk Dostál and Tomáš Kozubek
Fig. 17.4 (a) .Bx vs outer iterations
Fig. 17.5 (b) .gP /g0P vs outer iterations
Fig. 17.6 .Bx vs Hessian multiplications
Fig. 17.7 .gP /g0P vs Hessian multiplications
We can see that even very simple implementation of the reorthogonalization strategy can considerably improve the performance of the algorithm. We conclude that there is an important class of problems the solution time of which can be essentially reduced by the proposed reorthogonalization strategy. Some additional improvement can be achieved by implementation of a more sophisticated update strategy depending on the cost of reorthogonalization. To illustrate the performance of SMALSE-M with reorthogonalization, let us point out that after identifying the active set of the solution, the algorithm required 34 conjugate gradient iterations to reduce the free gradient by −3 .10 . We deduce that the solution of the related linear problem to the relative precision .10−8 would require some 90 conjugate iterations—nonlinearity was resolved at the cost smaller than the cost of two Newton steps. Notice that one nonlinear step of MPRGP requires two matrix-vector multiplications, and no FETI type preconditioner was applied.
17 Preconditioning and Scaling
391
17.6 Comments The reorthogonalization-based preconditioning and renormalization-based scaling presented here appeared in Dostál et al. [10]. In spite of its simplicity, the renormalization-based scaling is just one of the very few methods that are known to precondition the nonlinear steps that identify the contact interface (see also [11]). If there are no corners on the contact interface as in the problem solved in Sect. 17.3, we can enhance this preconditioning to get the rate of convergence independent of material coefficients. Otherwise we can use it as preconditioning in face (see Sect. 8.6). The reorthogonalization-based scaling has a similar effect as the superlumped preconditioning developed by Rixen and Farhat (see [12, 13]). See also Bhardway [14]. The results can be extended to the algorithms based on BETI (see Chap. 14). The scaling is efficient mainly for problems with homogeneous domains of varying stiffness, but can be useful, at least in some special cases, also for heterogenous domains [15]. If all the subdomains have the same material coefficient, the renormalizationbased scaling reduces to the multiplicity scaling. The limits of the preconditioning effect that can be achieved by the diagonal scaling are indicated by the results of Greenbaum [16], Ainsworth et al. [17], Forsythe and Strauss [18], van der Sluis [19], or Bank and Scott [20]. Adaptive augmentation was successfully used to speed up the convergence of the SMALBE-M algorithm for the contact problems with mortar discretization; see [21, 22].
References 1. Farhat, C., Roux, F.-X.: A method of finite element tearing and interconnecting and its parallel solution algorithm. Int. J. Numer. Methods Eng. 32, 1205–1227 (1991) 2. Farhat, C., Mandel, J., Roux, F.-X.: Optimal convergence properties of the FETI domain decomposition method. Comput. Methods Appl. Mech. Eng. 115, 365–385 (1994) 3. Dostál, Z., Gomes, F.A.M., Santos, S.A.: Solution of contact problems by FETI domain decomposition with natural coarse space projection. Comput. Methods Appl. Mech. Eng. 190(13–14), 1611–1627 (2000) 4. Dostál, Z., Gomes, F.A.M., Santos, S.A.: Duality based domain decomposition with natural coarse space for variational inequalities. J. Comput. Appl. Math. 126(1–2), 397–415 (2000) 5. Dureisseix, D., Farhat, C.: A numerically scalable domain decomposition method for the solution of frictionless contact problems. Int. J. Num. Methods Eng., 2001, 50 (12), 2643–2666 (2001)
392
Zdeněk Dostál and Tomáš Kozubek
6. Avery, P., Rebel, G., Lesoinne, M., Farhat, C.: A numerically scalable dual-primal substructuring method for the solution of contact problems—part I: the frictionless case. Comput. Methods Appl. Mech. Engrg. 193, 2403–2426 (2004) 7. Avery, P., Farhat, C.:The FETI family of domain decomposition methods for inequality-constrained quadratic programming: Application to contact problems with conforming and nonconforming interfaces. Comput. Methods Appl. Mech. Engrg. 198, 1673–1683 (2009) 8. Dostál, Z., Vondrák, V., Horák. D., Farhat, C., Avery, P.: Scalable FETI algorithms for frictionless contact problems. In: Domain Methods in Science and Engineering XVII, Langer, U. et al. (eds), Lecture Notes in Computational Science and Engineering (LNCSE) 60. Springer Berlin, 263–270 (2008) 9. Kozubek, T., Markopoulos, A., Brzobohatý, T., Kučera, R., Vondrák, V., Dostál, Z.: MatSol–MATLAB efficient solvers for problems in engineering. The library is no longer updated and is replaced by the ESPRESO framework. http://numbox.it4i.cz 10. Dostál, Z., Kozubek, T., Vlach, O., Brzobohatý, T.: Reorthogonalization based stiffness preconditioning in FETI algorithms with applications to variational inequalities. Numer. Linear Algebr. Appl. 22(6), 987–998 (2015) 11. Domorádová, M., Dostál, Z.: Projector preconditioning for partially bound constrained quadratic optimization. Numer. Linear Algebr. Appl. 14(10), 791–806 (2007) 12. Rixen, D., Farhat, C.: Preconditioning the FETI method problems with intra- and inter-subdomain coefficient jumps. In: Bjorstad, P., Espedal, M., Keyes, D. (eds.) Proceedings of the 9th International Conference on Domain Decomposition Methods, pp. 472–479. Bergen (1997) 13. Rixen, D., Farhat, C.: A simple and efficient extension of a class of substructure based preconditioners to heterogeneous structural structural mechanics problems. Int. J. Numer. Methods Eng. 44, 472–479 (1999) 14. Bhardwaj, M., Day, D., Farhat, C., Lesoinne, M., Pierson, K., Rixen, D.: Application of the FETI method to ASCI problems: scalability results on 1000 processors and discussion of highly heterogeneous problems. Int. J. Numer. Methods Eng. 47, 513–535 (2000) 15. Kraus, J.K., Margenov, S.: Robust Algebraic Multilevel Methods and Algorithms. Radon Series on Computational and Applied Mathematics. De Gruiter, Berlin (2009) 16. Greenbaum, A.: Diagonal scaling of the Laplacian as preconditioners for other elliptic differential operators. SIAM J. Matrix Anal. Appl. 13(3), 826–846 (1992) 17. Ainsworth, M., McLean, B., Tran, T.: Diagonal scaling of stiffness matrices in the Galerkin boundary element method. ANZIAM J. 42(1), 141–150 (2000)
17 Preconditioning and Scaling
393
18. Forsythe, G.E., Strauss, E.G.: On best conditioned matrices. Proc. AMS - Am. Math. Soc. 6, 340–345 (1955) 19. van der Sluis, A.: Condition numbers and equilibration of matrices. Numer. Math. 14(1), 14–23 (1969) 20. Bank, R., Scott, R.: On the conditioning of finite element equations with highly refined meshes. SIAM J. Numer. Anal. 26(6), 1383–1394 (1988) 21. Dostál, Z., Vlach, O., Brzobohatý, T.: Scalable TFETI based algorithm with adaptive augmentation for contact problems with variationally consistent discretization of contact conditions. Finite Elem. Anal. Des. 156, 34–43 (2019) 22. Dostál, Z., Vlach, O.: An accelerated augmented Lagrangian algorithm with adaptive orthogonalization strategy for bound and equality constrained quadratic programming and its application to large-scale contact problems of elasticity. J. Comput. Appl. Math. 394, 113565 (2021)
Part IV Other Applications and Parallel Implementation
Chapter 18
Contact with Plasticity Tomáš Kozubek
We extend the application of scalable algorithms to solving the frictionless contact problems of elasto-plastic bodies. We use the TFETI domain decomposition method for frictionless contact problems with elastic bodies introduced in Chap. 11. Let us recall that plasticity is a time-dependent model of constitutive relations, which considers the history of loading. The main difficulty is posed by the nonlinear elasto-plastic operator, which is generally nonsmooth. For simplicity, we shall restrict our attention to the two-body frictionless contact problem. We shall assume that the bodies occupy in a reference configuration the domains .Ω1 , Ω2 with the boundaries .Γ1 , Γ2 and their parts k k k k .ΓU , ΓF , and .ΓC , .k = 1, 2. The bodies are assumed to be fixed on .ΓU with a nonzero measure and can come in contact on .ΓkC . The load is represented by the surface traction on .ΓkF and the volume forces defined on .Ωk , .k = 1, 2. We shall assume that the constitutive relations are defined by associated elasto-plasticity with the von Mises plastic criterion and the linear isotropic hardening law (see, e.g., Han and Reddy [1] or de Souza et al. [2]). A more detailed description of the elasto-plastic initial value constitutive model can be found, for example, in [3]. The weak formulation of the corresponding elasto-plastic problem can be found in [1]. Here, we start directly with the discretized elasto-plastic problem. For the sake of simplicity, we confine ourselves to the one-step problem formulated in displacements. Its solution is approximated in the inner loop by the iterates
T. Kozubek IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_18
397
398
Tomáš Kozubek
minimizing convex and smooth functional on a convex set. The discretized inner problem is a QP optimization problem with simple equality and inequality constraints solvable at a cost nearly proportional to the number of variables.
18.1 Algebraic Formulation of Contact Problem for Elasto-Plastic Bodies Let us start with the discretized formulation of the problem that arises from the application of the TFETI domain decomposition. We assume that each domain .Ωk is decomposed into the subdomains Ωk1 , Ωk2 , . . . , Ωksk ,
.
k = 1, 2,
and that each subdomain is assigned one index, so that the displacements can be described by the vector .v ∈ Rn v = [v1T , . . . , vsT ]T ,
.
s = s1 + s2 .
We define the subspace .
V = {v ∈ Rn : BE v = o} ,
(18.1)
and the set of admissible displacement .
K = {v ∈ Rn : BE v = o, BI v ≤ cI } .
(18.2)
Here the matrix .BE ∈ RmE ×n enforces the gluing of the subdomains and their fixing along the parts .ΓkU , .k = 1, 2. The inequality constraint matrix m ×n .BI ∈ R I enforces the non-penetration condition on the contact interface. Notice that .K is convex and closed. The algebraic formulation of the contact elasto-plastic problem can be written as the following optimization problem [4]: .
Find u ∈ K
s.t. J(u) ≤ J(v)
for all v ∈ K ,
(18.3)
where .
J(v)=Ψ(v) − f T v,
v ∈ Rn .
(18.4)
Here, the vector .f ∈ Rn represents the load consisting of the volume and surface forces and the initial stress state. The functional .Ψ : Rn → R represents the inner energy; it is a potential to the nonlinear elasto-plastic operator n .F : R → Rn , i.e., .DΨ(v) = F(v), .v ∈ Rn . The function .F is generally nonsmooth but Lipschitz continuous. It enables us to define a generalized
18 Contact with Plasticity
399
derivative .K : Rn → Rn×n of .F in the sense of Clark, i.e., .K(v) ∈ ∂F(v), n .v ∈ R . Notice that .K(v) is symmetric, block diagonal, and sparse matrix. Problem (18.4) has a unique solution and can be equivalently written as the following variational inequality: .
Find u ∈ K
s.t. F(u)T (v − u) ≥ f T (v − u)
for all v ∈ K .
(18.5)
The properties of .F and .K can be found in Čermák and Sysala [3].
18.2 Semi-smooth Newton Method for Optimization Problem Problem (18.3) contains two nonlinearities—the non-quadratic functional J (due to .Ψ) and the non-penetration conditions included in the definition of the convex set .K . Using the semi-smooth Newton method, we shall approximate .Ψ by a quadratic functional using the Taylor expansion 1 Ψ(u) ≈ Ψ(uk ) + F(uk )T (u − uk ) + (u − uk )T K(uk )(u − uk ) 2
.
defined for a given approximation .uk ∈ K of the solution .u to the problem (18.3). Let us denote f = f − F(uk ),
Kk = K(uk ),
. k
and define Kk = K − uk = v ∈ Rn : BE v = o, BI v ≤ cI,k , cI,k =cI − BI uk ,
.
.
1 Jk (v)= vT Kk v − fkT v, 2
v ∈ Kk .
(18.6)
Then, the Newton step reads uk+1 = uk + δuk ,
.
uk+1 ∈ K ,
where .δuk ∈ Kk is a unique minimizer of .Jk on .Kk , .
Jk (δuk ) ≤ Jk (v)
for all v ∈ Kk ,
(18.7)
i.e., .δuk ∈ Kk solves .
Kk δuk
T
(v − δuk ) ≥ fkT (v − δuk )
∀v ∈ Kk .
(18.8)
400
Tomáš Kozubek
Notice that if we substitute .v = uk+1 ∈ K into (18.5) and .v = u − uk ∈ Kk into (18.8) and sum up, then we obtain the inequality .
T T K(uk )δuk (u − uk+1 ) ≥ F(u) − F(uk ) (u − uk+1 ),
which can be arranged into the form .(u
k+1
− u)T K(uk )(uk+1 − u) ≤ F(uk ) − F(u) − K(uk )(uk − u)
T
(u − uk+1 ).
Hence, one can derive the local quadratic convergence of the semi-smooth Newton method provided that .uk is sufficiently close to .u.
18.3 Algorithms for Elasto-Plasticity The discretized elasto-plastic problem can be solved by the following algorithm: Algorithm 18.1 Solution of discretized elasto-plastic problem
We solve the nonlinear system of Eq. (18.5) by the semi-smooth Newton method (see, e.g., [5]). The corresponding algorithm reads as follows: Algorithm 18.2 Semismooth Newton method
Here .εNewton > 0 is the relative stopping tolerance.
18 Contact with Plasticity
401
18.4 TFETI Method for Inner Problem and Benchmark The inner problem (18.7) has the same structure as the primal frictionless TFETI problem (11.24). Thus, we can switch to dual as in Sect. 11.6 to get the dual problem to (18.7) in the form .
min Θ(λ)
s.t.
λI ≥ o and
RTk (f − BT λ) = o,
(18.9)
where .
Θ(λ) =
1 T T + T λ BK+ k B λ − λ (BKk fk − ck ), 2
(18.10)
Rk is a full rank matrix such that .Im Rk =.Ker Kk , and BE cE o .B = c= = . BI cI,k cI,k
.
After the preconditioning and regularization of (18.10) by the TFETI projector .P to the rigid body modes and regularization by .ρQ, .ρ > 0, and .Q = I − P as in Sect. 11.7, we can solve the inner problem (18.7) in the same way as a contact problem with elastic bodies in Chap. 11, i.e., to use Algorithm 9.2 (SMALBE) to generate the iterations for the Lagrange multipliers for the equality constraints and to use Algorithm 8.2 (MPRGP) to solve the bound-constrained problems in the inner loop. Since the stiffness matrices .Kk , .k = 1, 2, . . . are spectrally equivalent to the elastic matrices .K that were introduced in Sect. 11.5 (see Čermák et al. [6] and Kienesberger et al. [7]), it is possible to get for the solution of problem (18.7) by TFETI similar optimality results as those developed in Sect. 11.12. Using the methodology described in Chap. 12, the algorithm can be modified to the solution of the elasto-plastic contact problem with friction. Due to the arguments mentioned above, we can formulate for the solution of such problems with friction by TFETI similar optimality results as those developed in Sect. 12.9. Of course, the optimality results are valid only for the inner loop.
18.5 Numerical Experiments We shall illustrate the performance of the algorithms on the solution of a variant of the two-beam academic benchmark introduced in Sect. 11.13. The size of the beams is .3 × 1 × 1 [m]. We use regular meshes generated in MatSol [8]. The Young modulus, the Poisson ratio, the initial yield stress for the von Mises criterion, and the hardening modulus are
402
Tomáš Kozubek
Fig. 18.1 von Mises stress distribution
Fig. 18.2 Total displacement
E = 210,000 [M P a], ν i = 0.29, σyi = 450 [M P a],
.
i = 10,000 [M P a], i = 1, 2. Hm
The vertical traction 150 [MPa] was prescribed on the top face of the upper beam. In addition, the initial stress (or plastic strain) state was set to zero. The problem discretized by 1,533,312/326,969 primal/dual variables and 384,000 finite elements was decomposed into 384 subdomains. The solution with 356,384 plastic elements required six Newton iterations, 67 SMALSE-M iterations, and 5375 multiplications by the Hessian of the dual function. The precision of the Newton method and inner loop was controlled by .
uk+1 − uk ≤ 10−4 uk+1 + uk
and by the relative precision of the inner loop .10−7 , respectively. The distribution of von Mises stress and total displacement for the finest mesh are in Figs. 18.1 and 18.2.
18 Contact with Plasticity
403
18.6 Comments The application of domain decomposition methods to the analysis of elastoplastic bodies was considered, e.g., by Yagawa et al. [9] or Carstensen [10]. Here we followed Čermák [4] and Čermák and Sysala [3]; there can also be found the solution of an elasto-plastic variant of the yielding clamp connection (see Fig. 1.6). The adaptation of TFETI to the solution of the inner loop is straightforward and can be especially useful for the solution of large multibody problems. The proposed method can be used or can be a part of the solution of another contact inelastic problems, e.g., the analysis of von Mises’ elasto-plastic bodies with isotropic hardening or loading with perfect plasticity [11, 12].
References 1. Han, W., Reddy, B.D.: Plasticity: Mathematical Theory and Numerical Analysis. Springer, New York (1999) 2. de Souza Neto, E.A., Perić, D., Owen, D.R.J.: Computational Methods for Plasticity, Theory and Applications. Wiley, West Sussex (2008) 3. Čermák, M., Sysala, S.: Total-FETI method for solving contact elastoplastic problems. LNCSE 98, 955–965 (2014) 4. Čermák, M.: Scalable algorithms for solving elasto-plastic problems. Ph.D. thesis, VŠB-TU Ostrava (2012) 5. Qi, L., Sun, J.: A non-smooth version of Newton’s method. Math. Program. 58, 353–367 (1993) 6. Čermák, M., Kozubek, T., Sysala, S., Valdman, J.: A TFETI domain decomposition solver for elasto-plastic problems. Appl. Math. Comput. 231, 634–653 (2014) 7. Kienesberger, J., Langer, U., Valdman, J.: On a robust multigridpreconditioned solver for incremental plasticity. In: Blaheta, R., Starý, J. (eds.) Proceedings of IMET 2004—Iterative Methods, Preconditioning & Nummerical PDE, pp. 84–87. Institute of Geonics AS CR (2004) 8. Kozubek, T., Markopoulos, A., Brzobohatý, T., Kučera, R., Vondrák, V., Dostál, Z.: MatSol–MATLAB efficient solvers for problems in engineering. The library is no longer updated and is replaced by the ESPRESO framework. http://numbox.it4i.cz 9. Yagawa, G., Soneda, N., Yoshimura, S.: A large scale finite element analysis using domain decomposition method on a parallel computer. Comput. Struct. 38(5–6), 615–625 (1991) 10. Carstensen, C.: Domain decomposition for a non-smooth convex minimization problem and its application to plasticity. Numer. Linear Algebr. Appl. 4(3), 177–190 (1997)
404
Tomáš Kozubek
11. Čermák, M., Haslinger, J., Kozubek, T., Sysala, S.: Discretization and numerical realization of contact problems for elastic-perfectly plastic bodies. PART II—numerical realization, limit analysis. ZAMM -. J. Appl. Math. Mech. 95(12), 1348–1371 (2015) 12. Sysala, S., Haslinger, J., Hlaváček, I., Čermák, M.: Discretization and numerical realization of contact problems for elastic-perfectly plastic bodies. PART I—discretization, limit analysis. ZAMM -. J. Appl. Math. Mech. 95(4), 333–353 (2015)
Chapter 19
Contact Shape Optimization Vít Vondrák
19.1 Introduction Contact shape optimization problems in 3D have a structure that can be effectively exploited by the TFETI-based methods introduced in Part III. The reason is that the state problem’s solution procedure initiation can be reused to solve several auxiliary contact problems that arise in each design step. Let us recall that the goal of contact shape optimization is to find the shape of a part of the contact boundary satisfying predefined constraints and minimizing a prescribed cost function. The unknown shape is defined by the design variables .α ∈ Dadm defined for .α from an admissible set p .Dadm ⊂ R . The cost function .J u(α), α typically depends explicitly not only on the design variables .α but also on the displacement .u(α), which corresponds to the solution of the associated contact problem. The latter problem is called the state problem. The state problem is a contact problem introduced in Part III, the specification of which depends on .α. Thus the i th body of the state problem which corresponds to .α occupies in the reference configuration the domain .Ωi (α), and the corresponding boundary traction i .fF (α) can depend on .α, etc. Denoting by .C (α) the set of the displacements .u(α), which satisfy the conditions of equilibrium and possibly some additional constraints specified for the problem, the shape optimization problem reads
V. Vondrák IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_19
405
406
Vít Vondrák
Find α ∈ Dadm
.
s.t.
J u(α), α ≤ J u(α), α , α ∈ Dadm , u(α) ∈ C (α).
Here, we use TFETI to minimize the compliance of a system of bodies in contact subject to box and volume constraints. The TFETI method is especially useful in sensitivity analysis, the bottleneck of the design optimization process. We use the semi-analytical method reducing the sensitivity analysis to a sequence of QP problems with the same Hessian matrix, so that the relatively expensive dual problem can be reused (see, e.g., Dostál et al. [1]).
f
Ω1
Ω2 (α )
Fig. 19.1 Basic notation of the contact problem
19.2 Discretized Minimum Compliance Problem Let us consider a variant of the Hertz 3D contact problems depicted in Fig. 19.1 and assume that the shape of the upper face of the lower body is controlled by the vector of design variables .α ∈ Dadm , .Dadm = [0, 1]p ⊂ Rp . Our goal is to change the shape of the lower body in order to minimize the compliance while preserving the volume of the bodies. The zero normal displacements are prescribed on the faces which share their boundaries with the axes. After choosing the design vector .α, we can decompose and discretize the problem in Fig. 19.1 as in Chap. 11. We get the energy functional in the form .
J(u, α) =
1 T u K(α)u − uT f (α), 2
(19.1)
where the stiffness matrix .K(α) and possibly the vector of external nodal forces .f (α) depend on .α.
19 Contact Shape Optimization
407
The matrix .BI and the vector .cI that describe the linearized condition of non-interpenetration also depend on .α, so that the solution .u(α) of the state problem with the bodies that occupy the domains .Ω1 and .Ω2 = Ω2 (α) can be found as the solution of the minimization problem .
s.t. u ∈ Ch (α),
min J(u, α)
(19.2)
where .
Ch (α) = {u: BI (α)u ≤ cI (α) and BE u = o}.
Denoting the solution of (19.2) by .u(α), we can define a function J (α) = J(u(α), α)
.
to get the contact shape optimization problem to find .
min J (α) = min
α∈Dadm
min J(u, α).
α∈Dadm u∈Ch (α)
(19.3)
The set of admissible design variables .Dadm defines all possible designs. It has been proved that the minimal compliance problem has at least one solution and that the function .J (α) is differentiable with respect to .α under natural assumptions (see, e.g., Haslinger and Neittanmäki [2]).
19.3 Sensitivity Analysis The goal of the sensitivity analysis is to find the influence of the design change on the solution of the state problem. The minimization of the cost function is then carried out by improving the initial design in a feasible decrease direction using .∇u(α) the columns of which are defined by the directional derivatives of the solutions of the state problem u (α, d) = lim
.
t→0+
u(α + td) − u(α) , t
where for .d are substituted the columns of the identity matrix .I. Here we shall briefly describe the semi-analytical sensitivity analysis. Some comparisons of the efficiency of alternative methods on practical problems in the context of domain decomposition can be found in Dostál et al. [1] and Vondrák et al. [3]. Let us recall the Lagrange function of the TFETI state problem (19.2) has the form .
L(u, λ, α) =
1 T u K(α)u − f T (α)u + λT B(α)u − c(α) , 2
(19.4)
408
Vít Vondrák
where ⎤ b1 (α) BI (α) ⎥ ⎢ .B(α) = = ⎣ ... ⎦ BE bm (α)
⎡
and .u and .λ depend on the vector of design variables .α, so that the KKT conditions for (19.2) read .
K(α)u − f (α) + BT (α)λ = o,
.
BI (α)u − cI (α) ≤ o,
BE u = o, λI ≥ o.
(19.5) (19.6)
To find the gradient of the cost function at the solution .u of the state problem (19.2), let us first consider the derivative .u (α, d) of .u = u(α) in a given direction .d at .α, and let .I and .E denote the indices of the equality and inequality constraints, respectively, .I ∪ E = {1, . . . , m}; let us denote by .A = A (u) the active set of .u,
.A = i ∈ {1, . . . , m}: bi u = ci , and let us decompose .A into the weakly active set .Aw and the strongly active set .As , Aw = {i ∈ I : λi = 0},
.
As = A \ Aw ,
where .λ = λ(α) denotes the vector of Lagrange multipliers associated with u. After the differentiation of the left equation of (19.5) in the direction .d, we get that .z = u (α, d) satisfies
.
.
K(α)z − f (α, d) + K (α, d)u + B (α, d)T λ = o,
(19.7)
so that .u (α, d) can be obtained by the solution of .
min Jα,d (z)
s.t. z ∈ D(α, d),
(19.8)
where J
. α,d
(z) =
1 T z K(α)z − zT f (α, d) + K (α, d)u + B (α, d)T λ 2
and D(α, d) = Dw (α, d) ∩ Ds (α, d), .
Dw (α, d) = {z ∈ Rn : bi (α)z ≤ ci (α, d) − bi (α, d) u(α) for i ∈ Aw }, Ds (α, d) = {z ∈ Rn : bi (α)z = ci (α, d) − bi (α, d) u(α) for i ∈ As }.
19 Contact Shape Optimization
409
Recall that .K (α, d), .f (α, d), .B (α, d), and .c (α, d) are the directional derivatives in the direction .d that can be simply evaluated. It has been proved (see, e.g., Haslinger and Mäkinen [4]) that the solution of (19.8) is the directional derivative .u (α, d) of the solution of the state problem (19.2). Let us now assume that .α, d are fixed and denote f = f (α, d) = f (α, d) − K (α, d)u − B (α, d)T λ, cw = cw (α, d) = cAw (α, d) − BAw (α, d)u, cs = cs (α, d) = cAs (α, d) − BAs (α, d)u, .
Bw = Bw (α) = BAw (α), Bs = Bs (α) = BAs (α), c = Bw , c= w , B Bs cs
where .Bw and .Bs are the matrices formed by the rows of .B corresponding to the weak and strong constraints, respectively, and similarly for the vectors .cw and .cs . Using the duality as in Sect. 11.6, we can get the dual problem .
min Φ(μ)
s.t. μw ≥ o and
where
= μ
.
T μ = RT (α) RT (α)B f,
μw μs
(19.9)
are the multipliers enforcing the weak and strong constraints and Φ(μ) =
.
1 T † T μ − μT BK † (α) μ BK (α)B fT − c . 2
Finally, the derivative .u (α, d) can be obtained by the procedure described in Sect. 3.7. Problem (19.9) with the bound and linear equality constraints is efficiently solvable by the combination of the SMALBE and MPRGP algorithms. Notice that the semi-analytical method for the sensitivity analysis for one design step requires the solution of p quadratic programming problems (19.9) with the same matrix + + + . K (α) = diag K1 (α), . . . , Ks (α) . Thus, we can reuse the factorization and the kernel of the matrix .K(α) from the solution of the state problem to the solution of the problems arising in the semi-analytical sensitivity analysis, and one design step requires only one factorization of the stiffness matrix. Moreover, if the domains are reasonably discretized so that the assumptions of Theorem 11.3 are satisfied, then the sensitivity analysis of a design step has asymptotically linear complexity. No-
410
Vít Vondrák
tice that the problems (19.9) for varying .di ∈ Rp , .i = 1, . . . , p are independent so that they can be solved in parallel.
19.4 Numerical Experiments We have tested the performance of the whole optimization procedure on the minimization of the compliance .
J (α) =
1 u(α)T K(α)u(α) − u(α)T f (α) 2
(19.10)
of the Hertz system defined above subject to the upper bound on the volume of the lower body. The compliance was controlled by the shape of the top side of the lower body that is parameterized by means of 3. × 3 = 9 design variables defined over this face. The design variables were allowed to move in vertical directions within the prescribed box constraints. The results of semianalytical sensitivity analysis were used in the inner loop of the sequential quadratic programming algorithm (SQP) (see e.g., Nocedal and Wright [5]). The distribution of the normal contact pressure of the initial and optimized system of bodies is in Figs. 19.2 and 19.3.
Fig. 19.2 Initial design—contact traction
The computations were carried out with varying discretization and decomposition parameters h and H, respectively. The results concerning the solution of the state problem are in Table 19.1. The relative precision of the computations was .10−6 . The optimized shape was observed after 121 design steps. The history of the decrease of the cost function is in Fig. 19.4. More details about this problem can be found in Vondrák et al. [3].
19 Contact Shape Optimization
411
Fig. 19.3 Final design—contact traction Number of subdomains Primal variables Dual variables Null space Hessian mults. Outer iterations
2 24,576 11,536 12 64 12
16 196,608 23,628 96 153 16
54 663,552 92,232 324 231 13
128 1,572,864 233,304 768 276 10
Table 19.1 State problem—performance for varying decomposition and discretization
−0.5 −1 −1.5 −2 −2.5 −3 −3.5 −4
x 107
0
20
40
60
80
100 120 140
Fig. 19.4 3D Hertz—minimization history
19.5 Comments The contact shape optimization problems have been studied by several authors, and many results are known. The approximation theory and the existence results related to this problem can be found in Haslinger and Neittanmäki [4] or Haslinger and Mäkinen [4]. These authors also present basic semi-analytical sensitivity analysis for the problems governed by a variational inequality, describe its implementation based on a variant of the SQP (sequential quadratic programming), and give numerical results for 2D problems. More engineering approach to constrained design sensitivity analysis is discussed also by Haug et al. [6]; for enhancing plasticity and friction,
412
Vít Vondrák
see Kim et al. [7]. Younsi et al. [8], who obtained a numerical solution of 3D problem, exploited the relation between the discretized and continuous problem by combining the multilevel approach with genetic algorithms. Their approach can use parallelization on the highest level but does not fully exploit the structure of the problem. The natural two-level structure of the contact shape optimization problem for minimization of weight was exploited by Herskovits et al. [9]. The solution of 3D contact shape optimization problems with friction can be found in Beremlijski et al. [10]. The presentation of this chapter is based on Dostál et al. [1] and Vondrák et al. [3]. Similar results as those presented here can be obtained with TBETI. Such approach would simplify remeshing as BETI uses surface grids on the boundary of the (sub)domains only.
References 1. Dostál, Z., Vondrák, V., Rasmussen, J.: FETI based semianalytic sensitivity analysis in contact shape optimization. In Fast Solution of Discretized Optimization Problems. International Series of Numerical Mathematics, pp. 98–106. Basel (2001) 2. Haslinger, J., Neittanmäki, P.: Finite Element Approximation for Optimal Shape, Material and Topology Design, 2nd edn. Wiley, New York (1996) 3. Vondrák, V., Kozubek, T., Markopoulos, A., Dostál, Z.: Parallel solution of 3D contact shape optimization problems based on Total FETI domain decomposition method. Struct. Multidiscip. Optim. 42(6), 955– 964 (2010) 4. Haslinger, J., Mäkinen, R.A.E.: Introduction to Shape Optimization. SIAM, Philadelphia (2002) 5. Nocedal, J., Wright, S.F.: Numerical Optimization. Springer, New York (2000) 6. Haug, E.J., Choi, K., Komkov, V.: Design Sensity Analysis of Structural Systems. Academic Press, New York (1986) 7. Kim, N.H., Choi, K., Chen, J.S.: Shape design sensitivity analysis and optimization of elasto-plasticity with frictional contact. AIAA J. 38, 1742–1753 (2000) 8. Younsi, R., Knopf-Lenoir, C., Selman, A.: Multi-mesh and adaptivity in 3D shape optimization. Comput. Struct. 61(6), 1125–1133 (1996) 9. J. Herskovits, A. Leontiev, G. Dias, G. Santos: Contact shape optimization: a bilevel programming approach. Struct. Multidiscip. Optim. 20(3), 214–221 (2000) 10. Beremlijski, P., Haslinger, J., Kočvara, M., Kučera, R., Outrata, J.: Shape optimization in three-dimensional contact problems with coulomb friction. SIAM J. Optim. 20(1), 416–444 (2009)
Chapter 20
Massively Parallel Implementation Tomáš Kozubek and Vít Vondrák
The FETI domain decomposition methods can effectively exploit the parallel facilities provided by modern supercomputers to solve huge problems, up to hundreds of billions of nodal variables. However, their practical implementation represents a nontrivial problem of independent research interest. Here we limit our attention to some basic strategies using MPI (Message Passing Interface) processes. Recall that MPI is a portable message-passing standard widely used for implementing parallel algorithms. We use it to implement parallel computations associated with the subdomains and/or clusters. The FETI methods appeared in the early 1990s when parallel computers did not have tens or hundreds of thousands of cores. An immediate goal was to use them to solve the problems discretized by a few million degrees of freedom. Thus it is not surprising that we face new problems, starting with the domain decomposition into hundreds of thousands of subdomains. For example, the cost of assembling or applying the projector to the “natural coarse grid,” which is nearly negligible for smaller problems, starts to significantly affect the solution cost when the dimension of the dual problem reaches tens of millions. Moreover, the emerging exascale technologies bring the increasing role of accelerators and communication costs, and the hierarchical organization of memory makes the cost of operations dependent on the distribution of arguments in memory. Thus it is essential to exploit up-to-date open source or commercial software as the effective implementation of some standard steps, such as the application of direct solvers, is highly nontrivial and dramatically affects the overall performance of algorithms. Here we first give some hints concerning the parallel loading and preprocessing of unstructured meshes created in available engineering software. T. Kozubek • V. Vondrák IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava, Ostrava, Czech Republic e-mail: [email protected]; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8_20
413
414
Tomáš Kozubek and Vít Vondrák
Then we discuss the parallel implementation of FETI-type algorithms. Finally, we discuss the effective assembling and application of the coarse space projector on complex supercomputer architectures by the hybrid TFETI method of Chap. 15 with its unique distribution of coarse grids between the primal and dual variables.
20.1 Parallel Loading and Decomposition of Meshes The solution to contact problems starts with the generation of a discretized numerical model including geometry, material properties, the definition of boundary and initial conditions, contact interfaces, finite/boundary element mesh, etc. We can get high-quality models by commercial tools producing sequential mesh databases, and open source tools convert external (typically sequential) database files into their (parallel) database structure. Typically, the user converts the external database sequentially into the native thirdparty format and then creates sequentially an initial decomposition that can be redistributed in parallel. Such a procedure dramatically slows down or inhibits the connection between tools that generate sequential data and parallel solvers. It is one of the factors contributing to the low exploitation of massively parallel supercomputers by the mainstream engineering community. However, the conversion of data describing huge contact problems into a form suitable for the algorithms developed in this book can also exploit parallel facilities. This chapter briefly describes an efficient parallel algorithm that can load and decompose a sequentially stored mesh database created by engineering tools. The algorithm can serve as a parallel converter for various formats. Moreover, the scalability tests demonstrate that the algorithm is effective as a direct loader and preprocessor of massively parallel solvers. The algorithm is implemented in the MESIO library [1]. The workflow comprises several steps. In the beginning, a mesh database is read and parsed. Since the parsed data are randomly scattered among MPI processes, we apply parallel sorting to get a known data distribution. Then we are ready to assign regions and make a spatial clusterization. This step assures the scalability of the following processing stage, which prepares a mesh for a parallel solver. Let us specify these steps in more detail. 1. Parsed data are randomly scattered among processes that do not share any information. Therefore, we must link everything together to address elements and nodes and assign regions (e.g., sets of elements corresponding to different materials). To do this, we balance nodes and elements among processes and sort them according to their identification numbers
20 Massively Parallel Implementation
415
to define both sets’ distribution via a region boundaries vector. Based on this vector, we can send region data to a process that holds a given node or element. 2. The scalability of parallel mesh processing depends on the uniform distribution of elements and the number of neighbors of each process. The preliminary sorting resolves the first requirement. Unfortunately, their identification numbers are usually randomly distributed across the mesh, so many processes are adjacent. Thus we have to cluster the elements to minimize the number of neighbors. This goal can be accomplished in two steps using some nontrivial mathematical tools. • The spatial clusterization uses the Hilbert space-filling curve. We use the discrete approximations of the Hilbert curves in three dimensions to index the nodes in a way that preserves locality [2] so that the Euclidean distance of two data points with the indices close to each other is small. • We first use the space-filling curve to index nodes and then use the indices to allocate elements to the processes. Each element asks for the space-filling curve index of its node stored on the closest process. The algorithm uses the indexing of the nodes by a space-filling curve. We use the space-filling curve also for the second parallel sort. The processes obtain spatially located nodes and elements by the second sorting according to space-filling curve indices. Finally, we use the distribution of space-filling curve indices to compute the processes’ neighbors and node’s duplicity, i.e., the information about all processes where a given node occurs. 3. The main priority of the previous step is not the decomposition quality. Instead, the spatial locality only assures reasonable times for the following processing, including a dual graph-based decomposition (we use the stateof-the-art tool ParMETIS). This type of decomposition is a better fit for FETI-based solvers’ requirements than geometric-based approaches. Effective implementation of the procedure described above requires efficient parallel sorting algorithms. In our approach, we use the histogram parallel sort [2]. The spatial locality is achieved using the Hilbert space-filling curve [3]. During the parallel sorting, data are exchanged according to the recursive halving algorithm [4]. Finally, we use ParMETIS [5] followed by METIS in the case of H-TFETI for the final decomposition of the mesh before calling a parallel solver. The performance of these algorithms is illustrated in Fig. 20.1.
416
Tomáš Kozubek and Vít Vondrák
Fig. 20.1 The parallel loading and decomposing of meshes in MESIO—jet engine benchmark (courtesy of T. Brzobohatý [1])
20.2 Interconnecting Clusters by Local Saddle Point Problems When we described the hybrid TFETI method in Chap. 15, we used bases of feasible displacements [6]. Though this approach is effective for understanding the method and formulation of the basic theoretical results, it is less flexible for implementation. Here we present an alternative approach that seems more flexible for implementation. It uses the modification of local saddle point matrices introduced by Mandel and Dohrmann [7] to implement FETI-DP for linear problems. Let us mention that the implementation of H-TFETI for contact problems is more complicated as the clusters’ stiffness matrices are positive semidefinite [8]. We illustrate the main features of the new implementation of the H-TFETI method on the 2D cantilever beam decomposed into four subdomains and two clusters as in Fig. 20.2.
Fig. 20.2 2D cantilever beam decomposed into four subdomains
After the discretization and domain decomposition, the resulting linear system reads
20 Massively Parallel Implementation
⎡
K1 ⎢O ⎢ ⎢O ⎢ .⎢O ⎢ ⎢ Bp,1 ⎢ ⎣O B1
O K2 O O Bp,2 O B2
O O K3 O O Bp,3 B3
O O O K4 O Bp,4 B4
BTp,1 BTp,2 O O O O O
417
O O BTp,3 BTp,4 O O O
BT1 BT2 BT3 BT4 O O O
⎤⎡ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
⎤ ⎡ ⎤ u1 f1 ⎢ u 2 ⎥ ⎢ f2 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ u 3 ⎥ ⎢ f3 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ u 4 ⎥ = ⎢ f4 ⎥ , ⎢ ⎥ ⎢ ⎥ ⎢ λp,1 ⎥ ⎢ o ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ λp,2 ⎦ ⎣ o ⎦ λ c
(20.1)
where .Bi and .Bp,i , .i = 1, . . . , 4, denote the blocks of the standard interconnecting constraints and selected primal constraints, respectively. The rows of .Bp,i are formed either by the copies of the rows of .Bi that should be enforced on a primal level or appropriately placed rigid body modes of adjacent edges (in 3D faces) or their parts. Thus, if the redundant rows of Bp,1 Bp,2 O O .Bp = O O Bp,3 Bp,4 with the multipliers .λp are omitted, the components of the primal solution remain the same. Let us permute (20.1) to get ⎤ ⎡ ⎤ ⎡ ⎤⎡ K1 O BTp,1 O O O BT1 f1 u1 ⎢ O K2 BTp,2 O O O BT2 ⎥ ⎢ u2 ⎥ ⎢ f2 ⎥ ⎥ ⎢ ⎥ ⎢ ⎥⎢ ⎢ Bp,1 Bp,2 O O O O O ⎥ ⎢ λp,1 ⎥ ⎢ o ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ O O K3 O BTp,3 BT3 ⎥ .⎢O (20.2) ⎢ ⎥ ⎢ u 3 ⎥ = ⎢ f3 ⎥ . ⎢ O O O O K 4 B T B T ⎥ ⎢ u 4 ⎥ ⎢ f4 ⎥ p,4 4 ⎥ ⎢ ⎥ ⎢ ⎥⎢ ⎣ O O O Bp,3 Bp,4 O O ⎦ ⎣ λp,2 ⎦ ⎣ o ⎦ λ c B1 B2 O B3 B4 O O For convenience, let us rewrite to the line partition as ⎡
1 O K ⎢
2 .⎣O K
1 B
2 B
(20.2) in the block form, which corresponds ⎤ ⎤ ⎡ ⎤
T ⎡ u B
1 f1 1 ⎥ T ⎦ ⎣ ⎣
B 2 ⎦ u 2 = f2 ⎦ . λ c O
(20.3)
i , .i = 1, 2, we also eliminate the subset of dual variables .λp,i , Eliminating .u i = 1, 2, related to the matrices .Bp,j . Thus the structure looks like a problem decomposed into two clusters—the first and second subdomains belong to the first cluster, and the third and fourth subdomains belong to the second cluster. The rigid body modes (translations and rotations) associated with the subdomains and clusters are depicted in Fig. 20.3.
1 and .K
2 can be interpreted as the cluster stiffness matrices Matrices .K
2 , respectively. Denoting .λ
= λ, .
with the kernels .R1 and .R c = c,
.
418
Tomáš Kozubek and Vít Vondrák
Fig. 20.3 Cluster of subdomains joined in corners and corresponding rigid body modes
.
2 ),
= diag(K
1, K K + T
F = BK B ,
=B
K
+
d f −
c,
= [B
1, B
2 ], B T T
G=R B ,
T
e=R f,
= diag(R
1, R
2 ), R
we get the Schur complement system
−G
T
F λ d . = ,
O
α −
e −G
(20.4)
which can be solved by the methods used by the classical FETI. The dimen G
T is six, i.e., less than 12 corresponding to FETI. sion of .G The primal variables and the multipliers corresponding to the primal constraints are eliminated by the implicit factorization. We shall illustrate the
1 , i.e.,
1x
1 = b procedure on the solution of the linear system .K K1:2 BTp,1:2 b x1 . (20.5) = , μ z Bp,1:2 O where K1:2 = diag(K1 , K2 ),
.
Bp,1:2 = [Bp,1 , Bp,2 ].
Although (20.5) can be interpreted as a FETI problem, it is much smaller, and we can solve it by elimination. The generalized Schur complement system reads as Fc,1:2 − GTc,1:2 μ dc,1:2 . (20.6) , = −ec,1:2 β −Gc,1:2 O where .
T Fc,1:2 = Bc,1:2 K+ Gc,1:2 = RT1:2 BTc,1:2 , dc,1:2 = Bc,1:2 K+ 1:2 Bc,1:2 , 1:2 b − z, R1:2 = diag (R1 , R2 ) . ec,1:2 = RT1:2 b,
20 Massively Parallel Implementation
1 , both systems (20.5), (20.6) are solved in three steps To obtain .x
−1 , β = S+ d c,1:2 c,1:2 ec,1:2 − Gc,1:2 Fc,1:2 −1 T . μ = Fc,1:2 dc,1:2 + Gc,1:2 β ,
T x 1 = K+ 1:2 b − Bc,1:2 μ + R1:2 β,
419
(20.7)
T where .Sc,1:2 = Gc,1:2 F−1 c,1:2 Gc,1:2 is a singular Schur complement matrix.
1 of .K
1 is the last object to be effectively evaluated. The The kernel .R
1R
1 = O can be written in the form orthogonality condition .K K1:2 BTc,1:2 O R1:2
1 = R1:2 H1:2 . H1:2 = . (20.8) , R O O O Bc,1:2 O
Assuming that the subdomain kernels .R1 and .R2 are known, it remains to determine .H1:2 . However, the first equation in (20.8) does not impose any condition onto .H1:2 , and the second equation gives .
Bc,1:2 R1:2 H1:2 = GTc,1:2 H1:2 = O.
(20.9)
It follows that .H1:2 is the kernel of .GTc,1:2 , which is not a full-column rank matrix due to the absence of Dirichlet boundary conditions in .Bc,1:2 . To get matrix .H1:2 efficiently, the temporary sparse matrix .Gc,1:2 GTc,1:2 is assembled and factorized by the routine for singular matrices. Such routine provides not only the factors but also the kernel of .Gc,1:2 GTc,1:2 , in this case, the matrix .H1:2 . For more details, see [10]. Preprocessing in H-TFETI starts in the same way as in FETI by preparing the factors of .Ki by a direct solver and kernels .Ri for each subdomain. Then one pair consisting of .Fc,j:k and .Sc,j:k is assembled and factorized on each cluster by a direct solver. The dimension of .Fc,1:2 is controlled by the number of Lagrange multipliers .λc,1 interconnecting the cluster subdomains. The dimension of .Sc,1:2 is given by the sum of the defects of all matrices .Ki belonging to the particular cluster.
20.3 When to Use TFETI and H-TFETI All FETI algorithms, including TFETI and H-TFETI, are more efficient when using subdomains with a small number of nodal displacements. The reasons include the following: • The subdomains with a smaller number of unknowns provide better conditioning and a faster rate of convergence of the iterative solver. • Reduced memory requirements associated with storing the Cholesky decomposition data. • Reduced processing time of forward and backward substitution.
420
Tomáš Kozubek and Vít Vondrák
• More subdomains offer the opportunity to split the computing into more processes associated with the subdomains. Since the convergence of H-TFETI is slower than that of TFETI with the same number of subdomains, it follows that we should prefer TFETI to H-TFETI as long as the cost of the coarse problem does not dominate the computations. For TFETI, the coarse problem dimension equals the number of subdomains multiplied by their defect, which equals six for 3D elasticity problems. For example, the ESPRESO library can use the TFETI method to solve the problems discretized by up to four billion nodal variables and decomposed into approximately 50,000 subdomains. Preprocessing of the coarse grid data, multiplication by the dual stiffness matrix .F, and the action of the projector to the coarse grid are computationally the most intensive parts of the solution process. The solution times would only be acceptable with the effective exploitation of parallel facilities. The implementation of parallel algorithms requires that the most computationally intensive parts of the algorithms are executed by parallel MPI processes associated with the subdomains and their faces. In our case, we use distributed approaches: • To assemble the coarse problem. • To calculate related inverse matrix. • To carry out the iterations. The parallelization of TFETI is associated with subdomains. Each subdomain is treated by one core so that the solution procedure can exploit up to 50,000 cores. In the case of H-TFETI, the coarse problem size is proportional to the number of clusters. In the ESPRESO implementation, MPI processes execute the computations with clusters in parallel. If the clusters are associated with nodes similar to TFETI with cores, then the coarse problem dimension is given by the number of compute nodes. It follows that the solver can use up to 50,000 nodes. The performance of TFETI and H-TFETI is illustrated in Fig. 20.4, where we can see the runtimes required for solving the clumped cantilever beam problem with a varying number of compute nodes and varying discretization and decomposition parameters of subdomains and clusters. We chose a linear problem to separate the effect of nonlinearity and the choice of parameters. Using TFETI with many nodal variables per subdomain opens the way to solving large problems. The figure shows that the scalability could be better, but it is acceptable for up to three billion primal unknowns. However, the solution time in this range is still shorter than with H-TFETI. We can compare two strategies: (i) For large problems, use one cluster per compute node with a large number of subdomains—up to several thousand.
20 Massively Parallel Implementation
421
(ii) For moderate problems, use one cluster per core with a small number of subdomains—up to a few hundred.
Fig. 20.4 The two strategies are illustrated on solving problems discretized by up to 3 billion nodal variables running on 512 compute nodes. The figure indicates weak scalability
We can see that strategy (i) enables the solution of huge problems at the cost of increased computation time. Suppose H-TFETI is used according to (ii), with a small number of subdomains per cluster (here 216) and a very small number of nodal variables per subdomain (here 1536). Then it is faster than TFETI, but 1.7 billion nodal variables limit the problem size. Therefore H-TFETI with large clusters is more suitable for larger problems. The practical implementation of the H-TFETI method requires more complicated hybrid parallelization using more complicated standard OpenMP. Moreover, the solver must efficiently treat many subdomains per core. This feature is fundamental if one needs to process hundreds of subdomains per core (strategy (ii)) or thousands of subdomains per node (strategy (i)). Moreover, this approach also significantly improves load balancing, enabling performing work stealing for the OpenMP runtime.
20.4 Parallel Preprocessing of FETI Objects The cost of TFETI iterations, either of the CG steps or the gradient projection step, is typically dominated by the multiplication of a vector .x by .S+ , the generalized inverse of the Schur complement of the stiffness matrix .K with respect to the indices of variables associated with the interior of the subdomains. Even though there are attempts to implement this step by iterative solvers, they seem to have little chance of succeeding due to the necessity to carry out this step with high precision. The same is valid for applying
422
Tomáš Kozubek and Vít Vondrák
the projector .P to the natural coarse grid, which requires the solution of the coarse problem with high precision and can dominate the solution cost. Effective execution of the iterations assumes that they use suitable objects arising from the preprocessing of input data. To implement the preprocessing of .K+ in parallel, we map the problem data to the hardware in the following way. First, the H-TFETI clusters are mapped to the compute nodes; the parallelization model for the distributed memory is used. In our case, we used MPI. Next, the subdomains inside the clusters are mapped to the cores of the particular compute node; therefore, the shared memory model is used. Our implementation allows multiple clusters associated with a single compute node but does not allow the processing of a single cluster on more than one compute node. Finally, TFETI subdomains are mapped directly to cores.
20.4.1 Factorization of K The implementation of .K+ x can be parallelized without any data transfers because of the block-diagonal structure of .K and its kernel .R, K = diag(K1 , . . . , Ks ),
.
R = diag(R1 , . . . , Rs ),
so that it is easy to achieve that each compute core deals with a local stiffness matrix .Ki and its kernel .Ki as in Fig. 20.5, left. If this is not possible, e.g., when the number of the subdomains is greater than the number of cores, we can allocate more blocks to one core as in Fig. 20.5, right.
Fig. 20.5 Parallel distribution of .K and .R matrices. One core-one subdomain (left) and one core-more subdomains (right)
Since the local stiffness matrices are singular, we use the stable factorization of .K by the fixing node strategy of Sect. 11.8. If we use TFETI, we get nonsingular Cholesky factors L = diag(L1 , . . . , Ls )
.
20 Massively Parallel Implementation
423
that can be used to execute efficiently .K+ x for .x ∈ ImK. A similar but more complex procedure exploiting the structure of clusters can be used to evaluate the factors of H-TFETI stiffness matrices. As mentioned above, a natural way to effectively exploit the massively parallel computers is to maximize the number of subdomains so that the sizes of the subdomain stiffness matrices are reduced, which accelerates both their factorization and subsequent forward/backward solves. Conversely, the negative effect is the increase of the dual and the coarse space dimensions. The coarse problem becomes a bottleneck when its dimension, equivalent to the dimension of Ker.K, attains some tens of thousands. The performance of the algorithms is also affected by the choice of the direct solver. The effect of selecting the direct solver on the performance of TFETI in the PERMON library (experiments with PETSc, MUMPS, and SuperLU on Cray XE6 machine HECToR) can be found in Hapla et al. [9].
20.4.2 Assembling of GGT The matrix .G is not block-diagonal, so the assembling and inverting .GGT represent a bottleneck of the FETI methods. The standard method assembles T .GG on the MPI process of rank zero, also called a master. In this approach, the master receives all .Gi matrices, where i denotes the subdomain’s index from all MPI processes. Then the master combines them into the global matrix .G, creates the transposed matrix .GT , and multiplies these two matrices to get .GGT . A more efficient approach assembles .GGT in parallel so that each MPI process assembles one stripe of .GGT only. The basic case is the TFETI with a single subdomain per MPI process. Suppose each subdomain/MPI process has its matrix Gi = RTi BT
.
with six rows corresponding to its six rigid body motions and the matrices Gj corresponding to the six rigid body motions of the adjacent subdomains. The block structure of .G induces the block structure of .GGT with the blocks defined by T T 6×6 .[GG ]ij = Gi Gj ∈ R .
.
The parallelly assembled stripes are exchanged with other processes using a recursive doubling algorithm with a logarithmic number of steps. Naturally, we can exploit the symmetry of .GGT to speed up the procedure. The pattern of .GGT is illustrated in Fig. 20.6. TFETI with multiple subdomains per MPI process. In this case the MPI process deals with a matrix .GI obtained by appending several subdomain matrices .Gi in a row-wise fashion and with adjacent matrices .Gj . The number
424
Tomáš Kozubek and Vít Vondrák
of rows of the .GI matrix is .D · 6, where D is the number of subdomains contributing to .GI . The assembling of .GGT is identical to the process described in the previous paragraph with the only difference: the matrices .GI are used instead of the subdomain matrices .Gi .
I matrix H-TFETI with one cluster per MPI process. In this case, a single .G per H-TFETI cluster/MPI process is assembled from the subdomain matrices Gjk = RTj BTk ,
.
where we assume that .R and .B have the block structure induced by the decomposition into the subdomains, i.e., R = diag(R1 , . . . , Rs ),
.
B = [B1 , . . . , Bs ].
The main difference in the case of H-TFETI is that the matrices are not appended but assembled in a more complicated way. If we use the basis transformation approach described in Chap. 15, the assembling of .GGT is similar to the procedures described above for TFETI, but the matrices are
I not appended but added. As a result, the number of rows of the matrix .G is equal to six. We refer to Říha et al. [8] for more details.
Fig. 20.6 A parallel assembling of the global TFETI coarse problem matrix .GGT from local submatrices .[GGT ]ij
20.5 Factorization of GGT The second part of the coarse problem preprocessing concerns the projector, which performs the multiplication of an inverse matrix .(GGT )−1 and a vector. Though .GGT is a band matrix, the task is not trivial as its sparsity pattern is complicated. In our ESPRESO software, we calculate the explicit inverse
20 Massively Parallel Implementation
425
(GGT )−1 to enable parallel processing of this operation. The inverse matrix is calculated in a distributed fashion to reduce the processing time. The process starts with a Cholesky decomposition of .GGT and calculates only one small part of .(GGT )−1 that is locally required by the projector
.
P = GT (GGT )−1 G.
.
This method efficiently parallelizes the most critical part of the inverse matrix calculation—the forward and backward substitution for every column of the identity matrix with the dimension equal to that of .GGT . The minor penalty of this method is that all MPI processes have to perform Cholesky decomposition. However, this operation is significantly less expensive than several thousands of executions of the forward and backward substitutions because .GGT is very sparse.
20.6 Parallel H-TFETI Solver The H-TFETI solver can be implemented using hybrid parallelization, which is well-suited for multi-socket and multicore compute nodes provided by the hardware of today’s supercomputers. The first level of parallelization is designed for the processing of the clusters. Each cluster is assigned to a single node; if necessary, multiple clusters can be processed per node. Next, the distributed memory parallelization is carried out using MPI. The essential part of this parallelization is the communication layer. This layer is identical whether the solver runs in the TFETI or H-TFETI mode and comprises: 1. The parallel coarse problem solution using the distributed inverse matrix, which merges two global communication operations (Gather and Scatter) into one (AllGather) 2. The optimized version of global gluing matrix multiplication (matrix .B for
1 for H-TFETI)—written as a stencil communication which TFETI and .B is fully scalable The stencil communication for simple decomposition into four subdomains is depicted in Figs. 20.7 and 20.8. In each iteration, the update of multipliers requires the exchange of data between the adjacent subdomains. In our implementation of TFETI, each MPI process modifies only the multipliers associated with a particular subdomain. In the implementation of H-TFETI, each MPI process modifies only the multipliers associated with a particular cluster. We call this operation vector compression. In the preprocessing stage, the local part of the interconnecting matrix is compressed by using the same approach (in this case, we remove the zero rows from the matrix), so that we can directly use the sparse matrix-vector multiplication on the compressed vectors.
426
Tomáš Kozubek and Vít Vondrák
B1
B3
B2
B4
c1 c2 c3 c4 ka kb kc kd
Fig. 20.7 Stencil communication: distribution of .B and .λ on cores .ci
c1 ka
kb
c2
c3 kc
c4 kd
Fig. 20.8 Stencil communication: necessary data exchange .ki between cores
20.7 MatSol, PERMON, and ESPRESO Libraries 20.7.1 MatSol The algorithms described in Parts II and III were developed, implemented, and tested in the MatSol library. The MatSol library served as a referential implementation of these algorithms and was the fundamental tool in our research. We used MATLAB Distributed Computing Server and Parallel Computing Toolbox to parallelize the algorithms. Hence, MatSol has full functionality to solve efficiently large problems of mechanics. The solution process starts from the model, either in the model database or converted to the model database from the standard commercial or noncommercial preprocessors such as ANSA, ANSYS, COMSOL, etc. The list of preprocessing tools is not limited, and additional tools can be plugged into the library by creating a good database converter. The preprocessing part depends on the problem solved. Users can carry out static or transient analysis and solve optimization problems, problems of linear and nonlinear elasticity, and contact problems. The finite or boundary element methods are used for discretization. The TFETI and TBETI domain decomposition methods are implemented. METIS and spectral methods are used to decompose the domains into subdomains. The solution process could run either in sequential or parallel mode. The solution algorithms are implemented, so the code is the same for both sequential and parallel modes.
20 Massively Parallel Implementation
427
MatSol library also includes tools for postprocessing. The results are then converted through the model database into the modeling tools for further postprocessing. Finally, let us note that the MatSol library represented a key tool in the early stages of the development of our solvers; however, it is no longer maintained or updated and is currently entirely replaced by the ESPRESO parallel framework.
20.7.2 PERMON PERMON (Parallel, Efficient, Robust, Modular, Object-oriented, Numerical) is a software package that supports the massively parallel solution of problems of constrained quadratic programming (QP). PERMON is based on PETSc and combines the TFETI method and QP algorithms. The core solver layer consists of the PermonQP package for QP and its PermonFLLOP extension for FETI. PermonQP supports the separation of QP problems, their transformations, and solvers. It contains all QP solvers described in this book. More information on PERMON is on its website: http://permon.vsb.cz.
20.7.3 ESPRESO ESPRESO is an ExaScale PaRallel FETI SOlver developed at the IT4Innovations National Supercomputing Center in Ostrava to provide a highly efficient parallel solver. It enhances the H-TFETI method, which is designed to run on massively parallel machines with thousands of compute nodes and hundreds of thousands of CPU cores. The algorithms can be seen as a multilevel FETI method designed to overcome the main bottleneck of the standard FETI methods, a large coarse problem, which arises when solving large problems decomposed into a large number of subdomains. The ESPRESO solver can use modern HPC accelerators to speed up the solver runtime. To achieve this, it uses a dense representation of sparse system matrices in the form of Schur complements. The main asset of using this implementation of the FETI solvers is the essential reduction of the iteration time. Instead of calling a solve routine of the sparse direct solver in every iteration, which is a sequential operation, the solver can use the dense matrix-vector multiplication (GEMV) routine. GEMV offers the parallelism required by modern accelerators and delivers up to a three to five times speedup, depending on the hardware configuration. More information on ESPRESO can be found on its website: https:// numbox.it4i.cz.
428
Tomáš Kozubek and Vít Vondrák
20.8 Numerical Experiments We conclude this book with two numerical experiments. The first demonstrates the performance of H-TFETI on linear problems; the second demonstrates the performance of TFETI on large academic benchmarks.
20.8.1 H-TFETI on Large Linear Problem
Time [s]
To eliminate the effect of nonlinearity of contact conditions, let us illustrate the power of H-TFETI on the analysis of an elastic cube without contact computed on Titan, a supercomputer at the Oak Ridge National Laboratory. It has 18,688 nodes, each containing a 16-core AMD Opteron 6274 CPU with 32 GB of DDR3 memory and an Nvidia Tesla K20X GPU with 6 GB GDDR5 memory. There are a total of 299,008 processor cores and a total of 693.6 TiB of CPU and GPU RAM. The results with an iterative solver accelerated by the lumped preconditioner are depicted in two graphs. Figure 20.9 shows a weak scalability test, including all relevant information. The number of degrees of freedom ranges from 30 million to 70 billion, and the number of nodes from 8 to 17,576.
Number of DOFs in bilions Number of nodes Fig. 20.9 H-TFETI—linear elasticity, weak scalability test
The strong scalability test is in Fig. 20.10. Eleven billion nodal variables discretized the undecomposed problem that was solved with a varying number of subdomains and cores. A comparison of unpreconditioned H-TFETI with preconditioned TFETI can be found in [11].
20 Massively Parallel Implementation
429
Solver runtime [s]
128
Real Linear
64
32
2400
4800
9600
19200
Number of compute nodes Fig. 20.10 H-TFETI—linear elasticity, 11 billion DOFs, strong scalability test
20.9 Two-Body Contact Problem For the demonstration of weak parallel scalability of the algorithms presented in Parts II and III, we used their implementation in the ESPRESO software (see Sect. 20.7.3). As a benchmark, we consider a frictionless rigid punch problem depicted in Fig. 20.11 left. The rigid punch is pressed against the elastic cube .1 × 1 × 1 [m], which is fixed at the bottom. The punch, which is defined in a reference configuration by the function z(x1 , x2 ) = 1.11 − 0.1 sin(πx1 ),
.
drops vertically by 0.11 [m]. The material properties are defined by Young’s modulus .E = 2.1 · 105 [MPa] and Poisson’s ratio .μ = 0.3. The deformed cube is in Fig. 20.11 right.
Fig. 20.11 Frictionless rigid punch problem (left) and deformed cube (right)
All computations were carried out on the Salomon cluster in the IT4Innovations National Supercomputing Center in Ostrava. The code was
430
Tomáš Kozubek and Vít Vondrák
compiled with Intel C++ Compiler 15.0.2 and linked against Intel MPI Library for Linux 4.1 and PETSc 3.5.3. Salomon consists of 1008 compute nodes, totaling 24,192 compute cores with 129 TB RAM and giving over 2 Pflop/s theoretical peak performance. Each node is a powerful x86-64 computer equipped with 24 cores with at least 128 GB RAM. The nodes are interconnected by the 7D Enhanced hypercube Infiniband network and equipped with the Intel Xeon E5-2680v3 processors. The Salomon cluster consists of 576 nodes without accelerators and 432 nodes equipped with the Intel Xeon Phi MIC accelerators. The computations were carried out with varying decompositions into 64, 1728, 8000, 21,952, and 64,000 subdomains. When it was necessary, several subdomains were assigned to a single core. The stopping criterion was defined by the relative precision of the projected gradient and the feasibility error equal to .10−4 (measured in the Euclidean norm; see (9.40)) and compared with the Euclidean norm of the dual linear term. The results of computations are summarized in Table 20.1. We can see that the number of iterations increases rather mildly in agreement with the theory. Recall that the gradient projection steps require two matrix-vector iterations. # of subdomains 64 1728 8000 21,952 64,000
n 5,719,872 154,436,544 714,984,000 1,961,916,096 5,719,872,000
Hessian mult. 73 83 117 142 149
iter. time [s] 258 311 364 418 497
Table 20.1 Results of the benchmark—cube contact linear elasticity problem
The results in Fig. 20.12 illustrate good numerical and parallel scalabilities.
Fig. 20.12 Cube benchmark. Numbers of matrix-vector multiplications by .F (left) and solution times (right)
20 Massively Parallel Implementation
431
References 1. Meca, O., Říha, L., Jansík, B., Brzobohatý, T.: Toward highly parallel loading of unstructured meshes. Adv. Eng. Softw. 166, 103100 (2022) 2. Kale, L.V., Krishnan, S.: A Comparison based parallel sorting algorithm. In: International Conference on Parallel Processing - ICPP’93, pp. 196– 200, (1993) 3. Sagan, H.: Hilbert’s Space-Filling Curve. Springer, New York (1994) 4. Thakur, R., Rabenseifner, R., Gropp, W.: Optimization of collective communication operations in MPICH. Int. J. High Perform. Comput. Appl. 19, 49–66 (2005) 5. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comp. 20, 359–392 (1998) 6. Klawonn, A., Rheinbach, O.: Highly scalable parallel domain decomposition methods with an application to biomechanics. ZAMM - J. Appl. Math. Mech. 90(1), 5–32 (2010) 7. Mandel, J., Dohrmann, C.R.: An algebraic theory for primal and dual substructuring methods by constraints. Appl. Numer. Math. 54, 167– 193(2005) 8. Říha, L., Brzobohatý, T., Markopoulos, A., Meca, O., Kozubek, T.: Massively parallel hybrid total FETI (HTFETI) solver. In: Proceedings of the Platform for Advanced Scientific Computing Conference, Article No. 7 (2016) 9. Hapla, V., Horák, D., Merta, M.: Use of direct solvers in TFETI massively parallel implementation. Lect. Notes Comput. Sci. 14, 192–205 (2013) 10. Markopoulos, A., Říha, L., Brzobohatý, T., Jirůtková, P., Kučera, R., Meca, O., Kozubek, T.: Treatment of singular matrices in the Hybrid total FETI method. In: Chang-Ock Lee et al. (eds.) Domain Decomposition Methods in Science and Engineering XXIII, LCNSE vol. 116, pp. 237–244. Springer, Cham (2017) 11. Dostál, Z., Brzobohatý, T., Vlach, O., Říha, L.: On the spectrum of Schur complements of 2D elastic clusters joined by rigid edge modes and hybrid domain decomposition. Numer. Math. 152(1), 41–66 (2022)
Notation and List of Symbols All authors
Abbreviations BEM BETI
Boundary element method Boundary element tearing and interconnecting (method) CG Conjugate gradient (method, algorithm, step, etc.) H-TFETI-DP Hybrid total FETI dual primal FEM Finite element method FETI Finite element tearing and interconnecting (method) KKT Karusch-Kuhn-Tucker (conditions, pair) MPGP Modified proportioning with gradient projections MPRGP Modified proportioning with reduced gradient projections PCG Preconditioned conjugate gradient (method, algorithm, step, etc.) QCQP Quadratically constrained quadratic programming (problem) QP Quadratic programming (problem) SMALBE-M Semi-monotonic augmented Lagrangian algorithm for bound and equality constraints SMALSE-M Semi-monotonic augmented Lagrangian algorithm for separable and equality constraints SPD Symmetric positive definite (matrix) SPS Symmetric positive semidefinite (matrix) SVD Singular value decomposition TBETI Total BETI TFETI Total FETI
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8
433
434
All authors
Accents, Operators, etc. Less or equal up to a constant factor ≈ Equal up to constant factors .⊕ Space summation .|| · ||1 .1 (Manhattan) norm .|| · ||2 .2 (Euclidean) norm .|| · ||∞ .∞ norm .|| · ||A .A-norm .|| · ||L2 (Ω) Induced norm in the space .L2 (Ω) .(x, y), .x · y Euclidean scalar product of the vectors .x and .y 2 .(u, v)L2 (Ω) Scalar product in the space .L (Ω) −1/2 .w, v Duality pairing of .w ∈ H (Γ) and .v ∈ H 1/2 (Γ) −1 .A Inverse of the matrix .A T .A Transpose of the matrix .A + # .A , A Left generalized inverse to the matrix .A † .A Moore-Penrose generalized inverse of the matrix .A .[v]i = vi i-th component of the vector .v .[Aij ] = aij .(i, j)-th component of the matrix .A .Ω Closure of the set .Ω .: Double dot product of tensors .∂u/∂xi Partial derivative of u w.r.t. .xi .∂u/∂n Normal derivative of u (i.e., directional derivative of u along the vector .n) ˙ .u First derivative of .u w.r.t. time ¨ .u Second derivative of .u w.r.t. time . .
Notation and List of Symbols
435
Greek Letters Kronecker symbol Dirac’s .δ-distribution at a point .y (1) Laplace’s operator (Laplacian), (2) time step when solving transient problems .∇f, ∇x f Gradient of the function f (w.r.t. x) 2 2 .∇ f, ∇xx f Hessian of the function f (w.r.t. x) .Φ Friction coefficient .γ0 Trace operator .γ1 Interior conormal derivative operator i .Γ Boundary of i-th subdomain .ΓU Dirichlet boundary .ΓF Neumann boundary .ΓC Contact boundary pq .ΓC Contact interface between .Ωp and .Ωq .κ(A) Spectral condition number of the matrix .A .κ(A) Regular condition number of the matrix .A .λ Vector of Lagrange multipliers .λE Lagrange multipliers corresponding to equality constraints .λI Lagrange multipliers corresponding to inequality constraints .λi i-th eigenvalue of a matrix .λn Normal contact stress .Ω Computational domain i .Ω i-th subdomain .ΩE Equality constraint set .ΩI Inequality constraint set .ΩIE Inequality and equality constraint set .ΩBE Bound and equality constraint set .σ(A) Spectrum of the matrix .A .σi i-th singular value of a matrix .Ψ Slip bound δ δ .Δ
. ij . y
436
All authors
Other Symbols in Alphabetical Order A, B, C, . . . x, y, z, . . .
. .
a(·, ·) A (x) .B(x) ∞ .C (Ω)
Font for denoting matrices Font for denoting vectors
Bilinear form Active set at the feasible point .x Binding set at the feasible point .x Space of infinitely differentiable functions on .Ω continuously extendable to .Ω ∞ .C0 (Ω) Space of functions from .C ∞ (Ω) with compact support in .Ω .Conv{x1 , x2 , . . . , xk } Convex hull of .x1 , x2 , . . . , xk D Hypersingular integral operator .diag(A1 , A2 , . . . , An ) Block diagonal matrix with n diagonal blocks .A1 , A2 , . . . , An .diam(S) Diameter of the set S .div Divergence operator .E Set of indices corresponding to equality constraints .F Set of indices of the fixed displacements p .f Load vector to .Ωp .f |Ω Restriction of f to .Ω .F (x) Free set at the feasible point .x g Gap function P .g Projected gradient h Mesh/discretization parameter H Decomposition parameter 1 .H (Ω) Sobolev space of the first order 1 .H0 (Ω) Closure of .C0∞ (Ω) w.r.t. .|| · ||H 1 (Ω) 1 .HΔ (Ω) Space of functions from .H 1 (Ω) with the Laplacian in 2 .L (Ω) 1/2 .H (Γ) Trace space of the boundary .Γ −1/2 .H (Γ) Dual space to .H 1/2 (Γ) .I Set of indices corresponding to inequality constraints .I, In Identity matrix (of the order n) .I, Id Identity operator .Id Identity tensor .Im A Image space (range) of the matrix .A K Double-layer potential operator . .
Notation and List of Symbols
K .Ker A 2 .L (Ω)
437
Adjoint double-layer potential operator Kernel (null space) of the matrix .A Space of measurable functions that are quadratically integrable in .Ω .L(x, λE , μ, ρ) Augmented Lagrangian for separable and equality constraints .L0 (x, λ) Lagrangian function .LΩI (x) Linear cone of .ΩI at x on the boundary of .ΩI .log x Natural logarithm of x p .n, n Outer unit normal to .Γ (to .Γp ) N Newton’s operator .N0 , N1 Newton’s potential operators .o, on Zero vector (of the dimension n) .O, Omn Zero matrix (of the order .m × n) .PΩ Euclidean projection to the set .Ω S Steklov-Poincaré operator .S Contact coupling set .Span{x1 , x2 , . . . , xk } Linear span of .x1 , x2 , . . . , xk .supp f Support of the function f .TΩI (x) Tangent cone of .ΩI at x on the boundary of .ΩI .tr(A) Trace of the matrix .A .U (x, y) Fundamental solution of the Laplacian V (1) Test space, (2) single-layer potential operator + .v Non-negative part of the vector .v − .v Non-positive part of the vector .v ⊥ .V Orthogonal complement to the space V .
Index
Symbols vectors, 88 .A-scalar product, 27 .A-conjugate
A Algorithm Cholesky factorization, 22 SPS matrices, 23 SPS matrices with known rank, 23 conjugate gradient (CG), 91 preconditioned (PCG), 97 contact stabilized Newmark, 282 for uniformly distributed fixing nodes, 234 MPGP, 128, 135 MPRGP, 143 MPRGP implementation, 148 MPRGP with preconditioning in face, 150 proportioning, 151 SMALBE-M, 172 SMALSE-M, 158 SMALSE-Mw, 176 Angle between subspaces, 35 between vectors, 27 canonical, 35 B Basis function boundary element, 307 dual, 361 finite element, 239 near, 363
Bilinear form bounded, 79 elliptic, 79 semi-elliptic, 79 symmetric, 79 C Calderon’s system of integral equations, 301 Cauchy interlacing theorem, 31 Cauchy–Schwarz inequality, 27 Cholesky factorization, 22 Coefficient of Euclidean contraction, 105 of friction, 252 Combination convex, 45 Condition number effective, 96 regular, 99 spectral, 32, 95, 144 Conditions complementarity, 59, 218 complementarity strict, 144 KKT for bound constraints, 140 for equality and inequality constraints, 64 for equality constraints, 53, 55 for inequality constraints, 59 Cone dual, 360 linear, 60 tangent, 60
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Dostál et al., Scalable Algorithms for Contact Problems, Advances in Mechanics and Mathematics 36, https://doi.org/10.1007/978-3-031-33580-8
439
440 Constraint qualification Abadie, 60 Constraints Bound, 102 separable, 102 Cover number of a dual basis function, 363 of the mortar discretization, 363 D Decomposition CS, 34 QR, 29 Decomposition parameter, 222 Derivative interior conormal, 78 Design variables, 405 Direction decrease, 42 feasible, 42 recession, 42 Dirichlet–Neumann map, 302, 309 Discrete harmonic function, 200 Discretization boundary element, 307 finite element, 191 finite elements, 223 mortar, 361 quasi-uniform, 223 variationally consistent, 361 displacement vector, 214 Dual basis functions, 361 finite elements, 361 problem, 64 constrained, 68 Dual function, 64 Duality pairing, 78 E Eigenvalue, 30 Cauchy interlacing theorem, 31 Gershgorin’s theorem, 30 Eigenvector, 30 Element boundary, 307 constant, 307 finite, 223 linear, 191, 307 shape regular, 223 Equation equilibrium, 217 motion, 277
Index Newton’s of motion, 278 variational, 80 Euclidean scalar product, 27 F Finite termination, 141 Fixing node, 232 Formula Green’s represenation, 301 Formulation classical of frictionless contact problem, 218 Friction Coulomb, 252 Coulomb orthotropic, 257 given, 256 Tresca, 102, 256 Function coercive, 46 convex, 45 cost, 41 discrete harmonic, 200 dual, 64 objective, 41 primal, 56 strictly convex, 45 target, 41 Functional coercive, 79 convex, 79 energy, 81 G Gauss elimination, 22 backward substitution, 22 forward reduction, 22 Gershgorin’s theorem, 30 Gradient chopped, 118, 140 free, 118, 140 reduced, 141 projected, 119, 140 reduced, 121 reduced chopped, 124 reduced projected, 123, 176 Gram–Schmidt algorithm, 88 classical, 29 modified, 30 Graph of triangulation adjacency matrix, 38 edges, 38 vertices, 38 walk of length k, 38 Green theorem, 78
Index H Hooke’s law, 216, 309 Hull convex, 45 hybrid BETI, 325 hybrid FETI, 325 I Inequality Cauchy–Schwarz, 27 Hölder, 74 Korn, 239 Poincaré, 75 variational, 220 Iterate propotional, 127 strictly proportional, 142 Iteration proportioning, 142 K KKT pair for equality constraints, 53 Korn inequality, 239 Kronecker product, 17 Kronecker symbol, 16 Krylov space, 89 L Lagrange multiplier, 53, 194 least square, 55 Lagrangian function, 51 Lamé coefficient, 216 LANCELOT, 158 M Manifold linear, 51 Master side, 215 Matrix, 16 adjacency, 38 band, 22 block, 17 defect, 18 diagonal, 22 effective stiffness, 282 expansion, 326 identity, 16 image space, 18 indefinite, 16 induced norm, 26 invariant subspace, 18 inverse, 19 kernel, 18
441 left generalized inverse, 20, 195, 229 lower triangular, 21 mass, 280 Moore–Penrose generalized inverse, 34 null space, 18 orthogonal, 28 orthogonal projector, 28 permutation, 19 positive definite, 16 positive semidefinite, 16 projector, 18 range, 18 rank, 18 restriction, 18 scalar function, 32 Schur complement, 20, 24 singular, 20 singular value, 32 sparse, 16 SPD, 16 spectral condition number, 32 spectral decomposition, 31 SPS, 16 square root, 32 submatrix, 17 upper triangular, 21 Method augmented Lagrangian, 158 BETI, 306 conjugate gradient (CG), 90 exterior penalty, 157 fast gradient, 114 FETI, 185 of multipliers, 158 semismooth Newton, 399 TBETI, 306 TFETI, 185 Total BETI, 306 Total FETI, 185 Minimizer global, 42 Moore–Penrose generalized inverse, 34 N Newton’s equation of motion, 278 Non-penetration linearized (strong), 215, 277 weak (variational), 361 Norm .2 , 26 .A-norm, 27 Euclidean, 27 Frobenius, 26
442 induced, 26 on .Rn , 25 Sobolev–Slobodeckij, 77 submultiplicativity, 26 O Operator double-layer potential, 302, 310 adjoint, 302, 310 hypersingular integral, 302, 310 Newton, 303, 310 Newton’s potential, 302, 310 single-layer potential, 301, 310 Steklov–Poincaré, 303, 309 trace, 76 P Parameter decomposition, 189, 203 discretization, 203, 223 Perron vector, 234 Poisson ratio, 216 Preconditioner conjugate projector, 287 Dirichlet, 380 lumped, 380 reorthogonalization-based, 376 Preconditioning by conjugate projector, 286 by natural coarse grid, 196 in face, 150 Problem bound and equality constrained QP, 172 bound-constrained QP, 139 constrained minimization, 42 contact elasto-plastic, 398 convex separable inequality and linear equality constraints, 155 decomposed, 191 dual, 64 inequality constrained QP, 139 preconditioned, 96 primal, 64 QCQP, 41, 58, 251 separable inequality constraints, 117 QP, 41 transient frictionless contact, 278 unconstrained minimization, 41 Projection Euclidean, 102 free gradient with the fixed steplength, 141 nonexpansive, 50 to convex set, 48
Index Projector, 18 orthogonal, 28, 197 Pseudoresidual, 97 Q QCQP, 117 separable constraints, 117 R Rate of convergence conjugate gradients, 93 Euclidean error of gradient projection, 105 MPGP cost function, 128 MPGP projected gradient, 129 of cost function in gradient projection, 112 SMALBE feasibility, 173 Relative precision, 159 S Scalability numerical of TFETI, 206 Scalar product, 27 .A-scalar product, 27 broken, 221 Euclidean (.Rn ), 27 Sobolev–Slobodeckij, 77 Scaling multiplicity, 391 renormalization-based, 378, 387, 389, 391 Scheme contact-stabilized Newmark, 281 Schur complement, 20, 57 Selective orthogonalization, 98 Set active, 58, 118, 140 binding, 140 compact, 48 constraint, 41 contact coupling, 214 convex, 45 free, 118, 140 subsymmetric, 107 weakly binding, 140 Shadow prices, 57 Shape optimization problem, 405 Singular value, 32 Singular value decomposition (SVD), 32 reduced (RSVD), 32 Skeleton of the decomposition, 299 Slave side, 214 Slip bound, 254
Index
443
Solution dual degenerate, 140, 144 fundamental, 300, 310 least square (LS), 34 range regular, 166 regular, 166 Solution error for the CG method, 93 Space 1 (Ω), 78 .HΔ .H 1 (Ω), 74 .H −1/2 (Γ), 78 .H 1/2 (Γ), 77 1 (Ω, Γ ), 76 .H0 U .L2 (Ω), 74 dual, 78 Krylov, 89 trace, 77 Spectral decomposition, 31 Spectrum, 30 State problem, 405 Steklov–Poincaré operator, 309 Step conjugate gradient, 126 expansion, 128, 141 gradient projection, 127 projected conjugate gradient, 104 proportioning, 142 Subdomain floating, 185 regular, 77
Hooke’s elasticity, 216 Kelvin’s, 310 Theorem Cauchy intelacing, 31 Generalized Lax–Milgram, 80 Gershgorin, 30 Green, 78 Green’s representation formula, 301 Lax–Milgram, 80 Riesz, 78 Weierstrass, 47 Trace, 76
T Taylor’s expansion, 42 Tensor Cauchy stress, 216 Cauchy’s small strain, 216
W Weierstrass theorem, 47
V Variational equation, 80 Variational inequality, 80 Vector, 15 .∞ -norm, 26 .A-conjugate, 28 .A-norm, 27 conjugate, 88 Euclidean norm, 27 feasible, 41 orthogonal, 27 orthonormal, 28 Perron, 234 span, 16 subvector, 16 Vector space linear span, 16 norm, 25 standard basis, 16
Y Young modulus, 216