Sergey I. Martynenko The Robust Multigrid Technique
Sergey I. Martynenko
The Robust Multigrid Technique | For Black-Box Software
Mathematics Subject Classification 2010 Primary: 65N55, 65Y05, 35Q30; Secondary: 65M08, 65M55 Author Dr. Sergey I. Martynenko Russian Academy of Sciences Institute of Problems of Chemical Physics Academician Semenov avenue 1 Chernogolovka Moscow region 142432 Russian Federation [email protected]
ISBN 978-3-11-053755-0 e-ISBN (PDF) 978-3-11-053926-4 e-ISBN (EPUB) 978-3-11-053762-8 Set-ISBN 978-3-11-053927-1 Library of Congress Cataloging-in-Publication Data A CIP catalog record for this book has been applied for at the Library of Congress. Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2017 Walter de Gruyter GmbH, Berlin/Boston Cover image: StudioM1/iStock/thinkstock Typesetting: le-tex publishing services GmbH, Leipzig Printing and binding: CPI books GmbH, Leck ♾ Printed on acid-free paper Printed in Germany www.degruyter.com
Preface
The fast development of computer technologies has led to a new theoretical research method: numerical experimentation. The triad of ‘model-algorithm-software’ forms the basis of this method [22]. A mathematical model typically consists of a system of (non)linear partial differential equations or integro-differential equations with boundary and initial conditions. These equations usually express the fundamental conservation laws of the basic physical quantities (energy, momentum, mass, etc.). A computational algorithm is a step-by-step procedure by means of which numerical solutions of the equations of the mathematical model are found. Since the 1990s, the main trend has been the development of black-box software for the solution of a large class of applied problems in the fields of heat transfer, fluid dynamics, elasticity, electromagnetism, and other scientific and engineering disciplines. We define software to be black-box if it does not require any additional input from the user apart from the physical problem specification consisting of the domain geometry, boundary and initial conditions, the enumeration of equations to be solved (heat conductivity equation, Navier–Stokes equations, Maxwell equations, etc.) and media. The user does not need to know anything about numerical methods, or high-performance and parallel computing. The main mathematical problems of black-box software development are: robust generation of computational grids in complex domains, and robust and highly parallel solvers of systems of (non)linear partial differential equations or integro-differential equations. In this book, we will focus on the second problem. To overcome the problem of robustness, the essential multigrid principle has been used in a single-grid solver.
If the computational grid is structured, it is possible to develop a pseudomultigrid approach for space-type problems with the least number of problem-dependent components, close-to-optimal complexity and high parallel efficiency. The proposed method is called the robust multigrid technique (RMT). The main purpose of this book is the description of the robust multigrid technique and the analysis of its convergence and parallelization. RMT has the same number of problem-dependent components as the damped Jacobi or Gauss–Seidel methods, but it can solve many (non)linear problems to within truncation error at a cost of CN log N arithmetic operations, where N is the number of unknowns and C is a constant that depends on the problem. Moreover, RMT is intended for application in black-box software. Another aim of this book is to relate the author’s experience in the use of RMT to the solution of multidimensional boundary value problems in a unified manner. The material is organized as follows: the first chapter is introductory in nature and consists of the mathematical principles of multigrid algorithms and numerical methods, which are hereafter used for solving the (initial) boundary value problems.
https://doi.org/10.1515/9783110539264-201
The second chapter gives a detailed description of RMT, results of convergence analysis and complexity, results of numerical experiments and a brief explanation of multigrid software. The third chapter describes parallel RMT and estimations of speed-up and efficiency of the parallel multigrid algorithms. The fourth chapter demonstrates applications of RMT for the numerical solution of the incompressible Navier–Stokes equations including the description of the main components of the multigrid algorithm and the original approach based on pressure decomposition. A decimal notation is used for numbering equations, theorems, tables, figures, etc. For example, the 11th numbered equation in Chapter 2 is referred to as (2.11). For the convenience of the reader, a list of frequently used symbols is given separately. Vectors are in bold italics. The book is of an applied nature and is intended for specialists in the field of computational mathematics and mathematical modeling, and also for software developers engaged in the simulation of physical and chemical processes for aircraft, missile and space technologies and for power, chemical, and other branches of engineering. The book will also be useful for students, graduate students, engineers, and multigrid practitioners. Recently, some results have been published in the books Robust Multigrid Technique [13] and Multigrid Technique: Theory and Applications (in Russian) [14]. It is expected that the readers are familiar with the fundamentals of computational mathematics and are acquainted with the material presented in the excellent book Multigrid [25].
Sergey Martynenko
Moscow
August 2017
Acknowledgments The author expresses his sincere gratitude to Professors R. P. Fedorenko (Keldysh Institute of Applied Mathematics of the Russian Academy of Sciences), V. S. Ryaben’kii (Keldysh Institute of Applied Mathematics of the Russian Academy of Sciences), S. V. Polyakov (Keldysh Institute of Applied Mathematics of the Russian Academy of Sciences), O. B. Arushanyan (Lomonosov Moscow State University), I. B. Badriyev (Kazan Federal University), V. T. Zhukov (Keldysh Institute of Applied Mathematics of the Russian Academy of Sciences), I. V. Stankevich (Bauman Moscow State Technical University), V. M. Volokhov (Institute of Problems of Chemical Physics of the Russian Academy of Sciences), P. D. Toktaliev (Institute of Problems of Chemical Physics of the Russian Academy of Sciences), Yu. M. Temis (Baranov Central Institute of Aviation Motors), L. S. Yanovskiy (Institute of Problems of Chemical Physics of the Russian Academy of Sciences), S. P. Kopyssov (Institute of Mechanics of Ural Branch of the Russian Academy of Sciences), and A. K. Novikov (Institute of Mechanics of Ural Branch of the Russian Academy of Sciences) for their interest in the given work and for a critical discussion of the results obtained. Thanks also go to Professor M. A. Olshanskii (University of Houston) for extremely helpful consultations on the theoretical fundamentals of multigrid methods. The author is very grateful to the professors of the Kazan Federal University who organized the International Conference Mesh Methods for Boundary Value Problems and Applications, where the main results presented in this book were discussed, and also expresses a special gratitude to Professor M. P. Galanin (Keldysh Institute of Applied Mathematics of the Russian Academy of Sciences) for the research support and for his help with the editing of the books [13, 14]. Finally, V. P. Kokorev, L. A. Klimenko, and V. P. 
Vorobyev also commented extensively on the manuscript and checked the use of the English language. I wish to give special thanks to A. V. Antonik for extremely careful and competent help in the preparation of the manuscript.
https://doi.org/10.1515/9783110539264-202
This activity is a part of the project Supercomputer simulation of physical and chemical processes in the high-speed direct-flow propulsion jet engine of the hypersonic aircraft on solid fuels, supported by the Russian Science Foundation (project number 1511-30012). The main part of this work has been performed at the Institute of Problems of Chemical Physics of the Russian Academy of Sciences (Chernogolovka, Moscow district, Russia).
Contents
Preface
Acknowledgments
1 Introduction to multigrid
1.1 Notations for vectors and matrices
1.2 Basic direct methods
1.3 Basic iterative methods
1.4 One-dimensional Poisson equation
1.5 Two-dimensional Poisson equation
1.6 Two-grid algorithm
1.7 Classical multigrid methods
1.8 Higher-order discretization
1.9 Nonlinear problems
2 Robust multigrid technique
2.1 Motivation and terminology
2.2 Analytic part of the technique
2.3 Computational part of the technique
2.3.1 Generation of the finest grid
2.3.2 Triple coarsening
2.3.3 Discretization on the multigrid structure
2.3.4 Multigrid iterations
2.4 Approximation of the boundary conditions
2.5 A numerical illustration
2.6 Convergence analysis and computational work
2.7 Numerical experiments
2.8 Unstructured grids
2.9 Remarks on multigrid software
2.10 Conclusions
3 Parallel multigrid methods
3.1 Algebraic approach to parallelization
3.2 Geometric approach to parallelization
3.3 Combined approach to parallelization
3.4 Parallel V-cycle
3.5 Conclusions
4 Applications of multigrid methods in computational fluid dynamics
4.1 Navier–Stokes equations
4.2 Staggered grids
4.3 Discretization of convection-diffusion equations
4.4 Resulting SLAE
4.5 Uzawa iteration
4.6 Vanka iteration
4.7 Simple multigrid algorithm for Navier–Stokes equations
4.8 Formal decomposition of pressure
4.8.1 Simplified Navier–Stokes equations
4.8.2 Pressure decomposition
4.8.3 Explicit schemes
4.8.4 Implicit schemes
4.9 Full multigrid algorithm
4.10 Conclusions
Bibliography
Index
Notation
A, B, C – square matrices (p. 1)
ℂ – set of complex numbers
D – diagonal matrices (p. 2, 11, 18, 162)
E – error of the numerical solution (p. 33, 38, 51, 54, 88, 102, 133, 156, 159)
Ē, Ẽ – efficiency of a parallel algorithm (p. 124)
Gv – set of grid points (p. 61)
Gf – set of grid points (p. 61)
$G_k^l$ – kth grid of level l (p. 65, 84)
I – identity matrix (p. 2)
L – lower triangular matrices (p. 2)
$L^+$ – the coarsest level (general case) (p. 64)
$L_2^+$ – the coarsest level (double coarsening) (p. 64)
$L_3^+$ – the coarsest level (triple coarsening) (p. 64)
$L_*^+$ – the coarsest level (combined coarsening) (p. 64)
M – multigrid iteration matrix (p. 37, 90, 92)
N – nonlinear operator (p. 145)
P – prolongation operator (p. 35, 86)
R – restriction operator (p. 35, 87)
ℝ – set of real numbers
Re – Reynolds number (p. 144)
S – iteration matrix (p. 10, 34, 90, 109)
S̄, S̃ – speed-up of a parallel algorithm (p. 124)
T – execution time (Chapters 1 and 3); temperature (Chapter 4)
U – upper triangular matrices (p. 2)
W – splitting matrix (p. 9, 34, 90)
W – algorithmic complexity (p. 6, 8, 30, 38, 57, 59)
∆ – Laplace operator (p. 145)
∇ – Hamilton operator (p. 145)
Ω – domain
∂Ω – domain boundary
$a_{ij}$, $b_{ij}$, $c_{ij}$ – entry of matrices A, B, C (p. 1)
c – coarse grid correction (p. 35, 60, 170)
cond(A) – condition number of the matrix A (p. 13)
d – dimension: d = 2, 3
det A – determinant of matrix A
h – mesh size (p. 15, 61, 66, 156)
$l^*$ – parallelization depth (p. 128, 134, 189)
p – the number of processors (Chapter 3); pressure (Chapter 4)
q – multigrid iteration counter
$q^*$ – extra multigrid iterations (p. 133)
t – time (Chapter 4)
u, υ – velocity components (Chapter 4)
x – vector
ε – parameter
λ(A) – eigenvalue of the matrix A (Chapter 1) (p. 3)
λ – heat conductivity coefficient (Chapters 2 and 4)
ν – smoothing iteration counter
ρ(A) – spectral radius of the matrix A (Chapter 1) (p. 3)
ρ – density (Chapter 4)
τ – relaxation parameters
ϖ – relaxation parameters
ω – parameter
$\varphi_{ij}$ – grid function
‖A‖ – matrix norm (p. 4)
⟨r⟩ – average of r
‖x‖ – vector norm (p. 4)
:= – equality in the sense of assignment

Abbreviations
ASG – auxiliary structured grid (p. 106)
FDM – finite difference method (p. 14)
FEM – finite element method (p. 14)
FVM – finite volume method (p. 14)
CMM – classical multigrid methods
OMA – optimized multigrid algorithm (p. 45)
OUG – original unstructured grid (p. 106)
RMA – robust multigrid algorithm (p. 45)
RMT – robust multigrid technique
SLAE – system of linear algebraic equations (p. 5)
SOR – successive overrelaxation method (p. 12)
TDMA – tridiagonal matrix algorithm (p. 9)

Subscripts
a – analytical
{⋅} – index mapping (p. 66)
x, y, z – relative to x, y, z direction
max – maximum
min – minimum
opt – optimum

Superscripts
T – transposition (p. 1)
h – relative to discrete quantity (p. 15)
(n) – relative to nth time layer
(q) – relative to qth multigrid iteration
(ν) – relative to νth smoothing iteration
(0) – starting guess
x, y, z – relative to x, y, z direction
1 Introduction to multigrid
We start with a short introduction to some computational algorithms used for solving boundary value problems on structured grids. In Sect. 1.1 we introduce some notation for vectors and matrices. Sections 1.2 and 1.3 present basic direct and iterative methods for solving systems of linear algebraic equations. One-dimensional (1D) and two-dimensional (2D) Poisson equations with Dirichlet boundary conditions, together with some information on the finite difference discretization, are discussed in Sects. 1.4 and 1.5. In Sect. 1.6 we take a first glance at a two-grid method, including its convergence analysis. Some facts on classical multigrid methods are listed in Sect. 1.7. Higher-order discretization of the boundary value problems is considered in Sect. 1.8. Finally, in Sect. 1.9 we mention some numerical methods for solving nonlinear problems.
1.1 Notations for vectors and matrices
Let ℝ and ℂ denote the sets of real and complex numbers, respectively. The space of all n-by-n real matrices will be denoted by $\mathbb{R}^{n\times n}$.
Definition 1.1.1. A square matrix has the same number of rows as columns:
\[
A \in \mathbb{R}^{n\times n} \;\Rightarrow\; A = \{a_{ij}\} =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix},
\qquad a_{ij} \in \mathbb{R}.
\]
If a capital letter is used to denote a matrix (A, B, C), then the corresponding lowercase letter with subscript ij refers to the (i, j) entry ($a_{ij}$, $b_{ij}$, $c_{ij}$). Basic matrix operations for $A, B, C \in \mathbb{R}^{n\times n}$ include
a) transposition: $B = A^T \Rightarrow b_{ij} = a_{ji}$,
b) addition: $C = A + B \Rightarrow c_{ij} = a_{ij} + b_{ij}$,
c) scalar–matrix multiplication ($\alpha \in \mathbb{R}$): $C = \alpha A \Rightarrow c_{ij} = \alpha a_{ij}$, and
d) matrix–matrix multiplication:
\[
C = AB \;\Rightarrow\; c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.
\]
https://doi.org/10.1515/9783110539264-001
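The most important classes of matrices are listed below.
∙ Diagonal matrices: $A = \{a_{ij} = 0,\ i \neq j\}$. We denote the diagonal matrices by D:
\[
D \in \mathbb{R}^{n\times n} \;\Rightarrow\; D =
\begin{pmatrix}
a_{11} & & & \\
 & a_{22} & & \\
 & & \ddots & \\
 & & & a_{nn}
\end{pmatrix}.
\]
∙ Identity matrix: $A = \{a_{ii} = 1;\ a_{ij} = 0,\ i \neq j\}$. We denote the identity matrix by I:
\[
I \in \mathbb{R}^{n\times n} \;\Rightarrow\; I =
\begin{pmatrix}
1 & & & \\
 & 1 & & \\
 & & \ddots & \\
 & & & 1
\end{pmatrix}.
\]
The identity matrix satisfies the equality AI = IA = A for every matrix A of size n.
∙ Upper triangular matrices: $A = \{a_{ij} = 0 \text{ for } i > j\}$. We denote the upper triangular matrices by U:
\[
U \in \mathbb{R}^{n\times n} \;\Rightarrow\; U =
\begin{pmatrix}
0 & a_{12} & \cdots & a_{1n} \\
 & 0 & \cdots & a_{2n} \\
 & & \ddots & \vdots \\
 & & & 0
\end{pmatrix}.
\]
∙ Lower triangular matrices: $A = \{a_{ij} = 0 \text{ for } i < j\}$. We denote the lower triangular matrices by L:
\[
L \in \mathbb{R}^{n\times n} \;\Rightarrow\; L =
\begin{pmatrix}
0 & & & \\
a_{21} & 0 & & \\
\vdots & \vdots & \ddots & \\
a_{n1} & a_{n2} & \cdots & 0
\end{pmatrix}.
\]
In the displayed forms of U and L the diagonal entries are zero as well, i.e. these matrices are strictly triangular.
∙ Tridiagonal matrices: $A = \{a_{ij} = 0 \text{ for any pair } i, j \text{ such that } |j - i| > 1\}$.
∙ Block tridiagonal matrices: generalize the tridiagonal matrix by replacing each nonzero entry with a square matrix.
∙ Permutation matrices: the columns of A are a permutation of the columns of the identity matrix.
∙ Symmetric matrices: $A = A^T$.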
The inverse of a matrix $A \in \mathbb{R}^{n\times n}$, when it exists, is a matrix C such that CA = AC = I. The inverse of A is denoted by $A^{-1}$. If $A^{-1}$ exists, then A is said to be nonsingular. Otherwise, we say A is singular. The inverse of a product is the reverse product of the inverses: $(AB)^{-1} = B^{-1}A^{-1}$.
Definition 1.1.2. A complex (real) scalar λ is called an eigenvalue of the square matrix A of size n if a nonzero vector $\upsilon \in \mathbb{C}^n$ exists such that $A\upsilon = \lambda\upsilon$. The vector υ is called an eigenvector of A associated with λ. The set of all the eigenvalues of A is called the spectrum of A and is denoted by σ(A).
It should be noted that a real matrix may not have any real eigenvalues, but if A is a real symmetric matrix then the eigenvalues of A are all real.
Definition 1.1.3. The maximum modulus of the eigenvalues is called the spectral radius and is denoted by ρ(A):
\[
\rho(A) = \max_{\lambda \in \sigma(A)} |\lambda|.
\]
Definition 1.1.4. A matrix $A \in \mathbb{R}^{n\times n}$ is convergent to zero if the sequence of matrices $A, A^2, A^3, \ldots$ converges to the null matrix, and is divergent otherwise.
Theorem 1.1.1. If $A \in \mathbb{R}^{n\times n}$, then A is convergent to zero if and only if ρ(A) < 1:
\[
\lim_{\nu\to\infty} A^{\nu} = 0 \;\Leftrightarrow\; \rho(A) < 1.
\]
Proof. For the proof, see [27].
Definition 1.1.5. Two matrices A and B are said to be similar if there is a nonsingular matrix X such that $A = XBX^{-1}$. The mapping B → A is called a similarity transformation.
If two matrices are similar then they have exactly the same eigenvalues.
Let $\mathbb{R}^n$ denote the vector space of real n-vectors:
\[
x \in \mathbb{R}^n \;\Leftrightarrow\; x =
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},
\qquad x_i \in \mathbb{R}.
\]
We refer to $x_i$ as the ith component of x. Basic vector operations for $\alpha \in \mathbb{R}$ and $x, y, z \in \mathbb{R}^n$ include:
a) scalar–vector multiplication: $z = \alpha x \Rightarrow z_i = \alpha x_i$, and
b) vector addition: $z = x + y \Rightarrow z_i = x_i + y_i$.
Definition 1.1.6. A vector norm on a vector space 𝕏 is a real-valued function x → ‖x‖ on 𝕏 that satisfies the following three conditions:
1. $\|x\| \ge 0$ for all $x \in \mathbb{X}$, and $\|x\| = 0$ if and only if x = 0.
2. $\|\alpha x\| = |\alpha|\,\|x\|$ for all $x \in \mathbb{X}$ and all $\alpha \in \mathbb{C}$.
3. $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in \mathbb{X}$.
The most commonly used vector norms in numerical linear algebra are special cases of the Hölder norms:
\[
\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.
\]
The cases p = 1, p = 2 and p = ∞ lead to the most important norms in practice:
\[
\|x\|_1 = \sum_{i=1}^{n} |x_i|, \qquad
\|x\|_2 = \left( \sum_{i=1}^{n} |x_i|^2 \right)^{1/2}, \qquad
\|x\|_\infty = \max_{i=1,\ldots,n} |x_i|.
\]
All vector norms on $\mathbb{R}^n$ are equivalent. Thus, if $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are two vector norms on $\mathbb{R}^n$, then there are positive constants $c_1$ and $c_2$ such that
\[
c_1 \|x\|_\alpha \le \|x\|_\beta \le c_2 \|x\|_\alpha
\]
holds for all $x \in \mathbb{R}^n$. For a matrix $A \in \mathbb{R}^{n\times n}$, we define the following special set of p-norms:
\[
\|A\|_p = \max_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p}.
\]
A fundamental property of the p-norms is that
\[
\|AB\|_p \le \|A\|_p \|B\|_p, \qquad \|Ax\|_p \le \|A\|_p \|x\|_p.
\]
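Matrix norms that satisfy the above property are sometimes called consistent. A result of consistency is that, for any square matrix A,
\[
\|A^k\|_p \le \|A\|_p^k.
\]
The most commonly used matrix norms for $A \in \mathbb{R}^{n\times n}$ are given below:
\[
\|A\|_1 = \max_{j=1,\ldots,n} \sum_{i=1}^{n} |a_{ij}|, \qquad
\|A\|_\infty = \max_{i=1,\ldots,n} \sum_{j=1}^{n} |a_{ij}|, \qquad
\|A\|_2 = \sqrt{\rho(A^T A)}.
\]
Thus $\|A\|_1$ is the maximum absolute column sum and $\|A\|_\infty$ is the maximum absolute row sum. The norm $\|A\|_2$ is often called the spectral norm. All matrix norms on $\mathbb{R}^{n\times n}$ are equivalent. Thus, if $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are two matrix norms on $\mathbb{R}^{n\times n}$, then there are positive constants $c_1$ and $c_2$ such that
\[
c_1 \|A\|_\alpha \le \|A\|_\beta \le c_2 \|A\|_\alpha
\]
holds for all $A \in \mathbb{R}^{n\times n}$. For this reason a subscript-free norm notation ‖ ⋅ ‖ is often used.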
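The norms and the spectral radius defined in this section can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the 2-by-2 matrix and the helper name `spectral_radius` are illustrative choices that do not come from the book.

```python
import numpy as np


def spectral_radius(A):
    """Maximum modulus of the eigenvalues of A (Definition 1.1.3)."""
    return np.max(np.abs(np.linalg.eigvals(A)))


# Illustrative matrix (an assumption, chosen so that rho(A) < 1).
A = np.array([[0.5, 0.4],
              [0.1, 0.3]])

norm_1 = np.linalg.norm(A, 1)         # maximum absolute column sum
norm_inf = np.linalg.norm(A, np.inf)  # maximum absolute row sum
norm_2 = np.linalg.norm(A, 2)         # spectral norm, sqrt(rho(A^T A))
rho = spectral_radius(A)

# Theorem 1.1.1: since rho(A) < 1, the powers of A tend to the null matrix.
A_power = np.linalg.matrix_power(A, 50)
```

For this matrix the spectral radius is smaller than each of the three norms, and the entries of `A_power` are negligibly small, which is consistent with Theorem 1.1.1.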
1.2 Basic direct methods
Let us seek the solution of the system of linear algebraic equations (SLAE)
\[
\left\{
\begin{array}{l}
a_{11}\varphi_1 + a_{12}\varphi_2 + a_{13}\varphi_3 + \cdots + a_{1n}\varphi_n = b_1, \\
a_{21}\varphi_1 + a_{22}\varphi_2 + a_{23}\varphi_3 + \cdots + a_{2n}\varphi_n = b_2, \\
a_{31}\varphi_1 + a_{32}\varphi_2 + a_{33}\varphi_3 + \cdots + a_{3n}\varphi_n = b_3, \\
\quad\vdots \\
a_{n1}\varphi_1 + a_{n2}\varphi_2 + a_{n3}\varphi_3 + \cdots + a_{nn}\varphi_n = b_n,
\end{array}
\right.
\tag{1.1}
\]
or
\[
\sum_{j=1}^{n} a_{ij}\varphi_j = b_i, \qquad 1 \le i \le n,
\]
where $\varphi_j$ (j = 1, 2, . . . , n) are the unknowns, $a_{ij}$ (i, j = 1, 2, . . . , n) are the coefficients and $b_i$ (i = 1, 2, . . . , n) are the nonhomogeneous terms. The first subscript i identifies the row (equation) and the second subscript j identifies the column (unknown) of the system; n is the number of unknowns. We write the system (1.1) in matrix notation as
\[
A\varphi = b,
\]
where
\[
A = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn}
\end{pmatrix},
\qquad
\varphi = \begin{pmatrix} \varphi_1 \\ \varphi_2 \\ \varphi_3 \\ \vdots \\ \varphi_n \end{pmatrix},
\qquad
b = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}.
\]
The matrix $A \in \mathbb{R}^{n\times n}$ is called the coefficient matrix of the system (1.1), b is the right-hand side vector, and φ is the vector of unknowns. For a vector φ, the vector of residuals r is defined as r = b − Aφ:
\[
r = \begin{pmatrix} r_1 \\ r_2 \\ r_3 \\ \vdots \\ r_n \end{pmatrix}
= \begin{pmatrix}
b_1 - a_{11}\varphi_1 - a_{12}\varphi_2 - a_{13}\varphi_3 - \cdots - a_{1n}\varphi_n \\
b_2 - a_{21}\varphi_1 - a_{22}\varphi_2 - a_{23}\varphi_3 - \cdots - a_{2n}\varphi_n \\
b_3 - a_{31}\varphi_1 - a_{32}\varphi_2 - a_{33}\varphi_3 - \cdots - a_{3n}\varphi_n \\
\vdots \\
b_n - a_{n1}\varphi_1 - a_{n2}\varphi_2 - a_{n3}\varphi_3 - \cdots - a_{nn}\varphi_n
\end{pmatrix}.
\]
Further suppose that the coefficient matrix A is nonsingular. Then the vector $\varphi \in \mathbb{R}^n$ (the exact solution) exists and is given by
\[
\varphi = A^{-1} b,
\]
where $A^{-1}$ is the inverse matrix of A. Computation of the inverse matrix $A^{-1}$ requires an impractical amount of work, therefore direct and iterative methods are used for solving the SLAE (1.1).
Let n be the number of unknowns in the SLAE considered. We define the algorithmic complexity W as the total number of arithmetic operations needed to solve the given SLAE by the selected algorithm, i.e. W = W(n) arithmetic operations. In modern applications the number of unknowns may vary from $n \sim 10^6$ up to $n \sim 10^{12}$ and larger, so special attention is paid to the development of computational algorithms with optimal complexity: $W_{\mathrm{opt}} = O(n)$ arithmetic operations. Although arithmetic operations differ in execution time, it can be assumed that the total execution time T is proportional to W, i.e. T ∼ W(n).
There are two classes of methods for solving SLAEs. In direct methods the exact solution of a SLAE (1.1) can be obtained by performing a finite number of arithmetic operations in the absence of round-off errors. Examples of such direct methods include Gaussian elimination, Gauss–Jordan elimination, the matrix inverse method and LU (lower upper) factorization. As a rule, the number of arithmetic operations needed to solve a SLAE (1.1) by direct methods is proportional to $n^3$, i.e. $W = O(n^3)$ arithmetic operations. Therefore direct methods are used for solving small-sized SLAEs.
Iterative methods achieve the solution asymptotically by an iterative procedure, starting from an initial guess. The computations continue until a sufficiently accurate approximation to the exact solution φ = A−1 b is obtained.
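Direct solutions utilize Gaussian elimination and its many variants. First, we assume that pivoting is unnecessary. Consider the following lower triangular system:
\[
\begin{pmatrix}
l_{11} & & & & \\
l_{21} & l_{22} & & & \\
l_{31} & l_{32} & l_{33} & & \\
\vdots & \vdots & \vdots & \ddots & \\
l_{n1} & l_{n2} & l_{n3} & \cdots & l_{nn}
\end{pmatrix}
\begin{pmatrix} \varphi_1 \\ \varphi_2 \\ \varphi_3 \\ \vdots \\ \varphi_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}.
\]
If $l_{ii} \neq 0$, then the unknowns can be determined sequentially:
\[
\varphi_1 = \frac{b_1}{l_{11}}, \qquad
\varphi_i = \frac{1}{l_{ii}} \left( b_i - \sum_{j=1}^{i-1} l_{ij}\varphi_j \right), \qquad i = 2, 3, \ldots, n.
\]
This algorithm is known as forward substitution. The analogous algorithm for an upper triangular system,
\[
\begin{pmatrix}
u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\
 & u_{22} & u_{23} & \cdots & u_{2n} \\
 & & u_{33} & \cdots & u_{3n} \\
 & & & \ddots & \vdots \\
 & & & & u_{nn}
\end{pmatrix}
\begin{pmatrix} \varphi_1 \\ \varphi_2 \\ \varphi_3 \\ \vdots \\ \varphi_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix},
\]
is called back substitution. If $u_{ii} \neq 0$, then the unknowns are determined sequentially:
\[
\varphi_n = \frac{b_n}{u_{nn}}, \tag{1.2a}
\]
\[
\varphi_i = \frac{1}{u_{ii}} \left( b_i - \sum_{j=i+1}^{n} u_{ij}\varphi_j \right), \qquad i = n-1, n-2, \ldots, 1. \tag{1.2b}
\]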
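The two substitution algorithms above can be sketched in a few lines of Python. This is a minimal illustration that assumes NumPy and nonzero diagonal entries; the function names and the 3-by-3 test systems are hypothetical and are not taken from the book.

```python
import numpy as np


def forward_substitution(L, b):
    """Solve L phi = b for a lower triangular L with nonzero diagonal."""
    n = len(b)
    phi = np.zeros(n)
    for i in range(n):
        # Subtract the contribution of the already computed unknowns.
        phi[i] = (b[i] - L[i, :i] @ phi[:i]) / L[i, i]
    return phi


def back_substitution(U, b):
    """Solve U phi = b for an upper triangular U, following (1.2a)-(1.2b)."""
    n = len(b)
    phi = np.zeros(n)
    for i in range(n - 1, -1, -1):
        phi[i] = (b[i] - U[i, i + 1:] @ phi[i + 1:]) / U[i, i]
    return phi


# Illustrative systems whose exact solution is (1, 2, 3).
L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 5.0, 6.0]])
U = L.T.copy()
phi_lower = forward_substitution(L, np.array([2.0, 7.0, 32.0]))
phi_upper = back_substitution(U, np.array([16.0, 21.0, 18.0]))
```

Each unknown costs one inner product of length at most n, so both loops need $O(n^2)$ arithmetic operations.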
The strategy of Gaussian elimination is to reduce a full linear system Aφ = b to a triangular system using elementary row operations. To understand the basic technique of direct elimination, it is helpful to apply the elimination method to the system (1.1).
Step 1: Subtracting the first equation multiplied by $a_{21}/a_{11}$ from the second equation, the first equation multiplied by $a_{31}/a_{11}$ from the third equation, . . . , and the first equation multiplied by $a_{n1}/a_{11}$ from the nth equation transforms the initial system (1.1) into
\[
\left\{
\begin{array}{r}
a_{11}^{(1)}\varphi_1 + a_{12}^{(1)}\varphi_2 + a_{13}^{(1)}\varphi_3 + \cdots + a_{1n}^{(1)}\varphi_n = b_1^{(1)}, \\
a_{22}^{(2)}\varphi_2 + a_{23}^{(2)}\varphi_3 + \cdots + a_{2n}^{(2)}\varphi_n = b_2^{(2)}, \\
a_{32}^{(2)}\varphi_2 + a_{33}^{(2)}\varphi_3 + \cdots + a_{3n}^{(2)}\varphi_n = b_3^{(2)}, \\
\vdots \qquad\qquad \\
a_{n2}^{(2)}\varphi_2 + a_{n3}^{(2)}\varphi_3 + \cdots + a_{nn}^{(2)}\varphi_n = b_n^{(2)}.
\end{array}
\right.
\]
Step 2: Subtracting the second equation multiplied by $a_{32}^{(2)}/a_{22}^{(2)}$ from the third equation, the second equation multiplied by $a_{42}^{(2)}/a_{22}^{(2)}$ from the fourth equation, . . . , and the second equation multiplied by $a_{n2}^{(2)}/a_{22}^{(2)}$ from the nth equation gives
\[
\left\{
\begin{array}{r}
a_{11}^{(1)}\varphi_1 + a_{12}^{(1)}\varphi_2 + a_{13}^{(1)}\varphi_3 + \cdots + a_{1n}^{(1)}\varphi_n = b_1^{(1)}, \\
a_{22}^{(2)}\varphi_2 + a_{23}^{(2)}\varphi_3 + \cdots + a_{2n}^{(2)}\varphi_n = b_2^{(2)}, \\
a_{33}^{(3)}\varphi_3 + \cdots + a_{3n}^{(3)}\varphi_n = b_3^{(3)}, \\
\vdots \qquad\qquad \\
a_{n3}^{(3)}\varphi_3 + \cdots + a_{nn}^{(3)}\varphi_n = b_n^{(3)}.
\end{array}
\right.
\]
The purpose of the kth stage of the elimination is to zero the elements below the diagonal in the kth column of $A^{(k)}$. This is accomplished by the operations
\[
a_{ij}^{(k+1)} = a_{ij}^{(k)} - m_{ik}\, a_{kj}^{(k)}, \qquad i, j = k+1, \ldots, n, \tag{1.3a}
\]
\[
b_i^{(k+1)} = b_i^{(k)} - m_{ik}\, b_k^{(k)}, \qquad i = k+1, \ldots, n, \tag{1.3b}
\]
where the multipliers are $m_{ik} = a_{ik}^{(k)} / a_{kk}^{(k)}$, i = k + 1, . . . , n. At the end of the (n − 1)th stage we have the upper triangular system $A^{(n)}\varphi = b^{(n)}$, which is solved by back substitution (1.2).
There are two problems with the method as described. First, the procedure fails if $a_{kk}^{(k)} = 0$. Second, if we are working in finite precision and some multiplier $m_{ik}$ is large, then there is a possible loss of significance. These observations are the motivation behind the strategy of partial or complete pivoting. The order of equations in a linear system can be interchanged without changing the solution; interchanging equations so that the entry of largest modulus in the kth column, on or below the diagonal, becomes the pivot $a_{kk}^{(k)}$ is called ‘partial pivoting’. Partial pivoting requires a very small fraction of the computational effort compared to the elimination calculations. ‘Full pivoting’ includes interchanging both equations and variables, and it is rarely applied in practical calculations because of its complexity. A much more detailed treatment of round-off error and pivoting in Gaussian elimination is given in [11].
To summarize the important applied aspects of Gaussian elimination:
∙ The total number of arithmetic operations done by the basic elimination algorithm for a system of n equations is $O(n^3)$. The back substitution takes $O(n^2)$ operations, i.e. the algorithmic complexity is $W = O(n^3)$ arithmetic operations. In the following, Gaussian elimination with partial pivoting will be used for solving SLAEs of small size (usually n ≤ 30).
∙ Gaussian elimination does not exploit the pattern of the coefficient matrix A. In the following, this fact will be widely used for constructing robust smoothers for multigrid algorithms (Chapters 2 and 4).
∙ Direct methods for SLAEs are more sensitive to round-off errors than iterative ones.
∙ Subprograms for different variants of Gaussian elimination are part of freely available software packages for the solution of linear algebra problems.
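The elimination stages (1.3) combined with partial pivoting and the back substitution (1.2) can be sketched as follows. This is an illustrative Python version that assumes NumPy and a dense matrix; the function name `gauss_solve` and the test system are hypothetical, and the code is not the author's implementation.

```python
import numpy as np


def gauss_solve(A, b):
    """Gaussian elimination with partial pivoting, then back substitution."""
    A = np.array(A, dtype=float)  # work on copies of the input data
    b = np.array(b, dtype=float)
    n = len(b)
    for k in range(n - 1):
        # Partial pivoting: pick the entry of largest modulus in column k.
        p = k + np.argmax(np.abs(A[k:, k]))
        if A[p, k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            A[[k, p]] = A[[p, k]]
            b[[k, p]] = b[[p, k]]
        # Stage k of the elimination, equations (1.3a) and (1.3b).
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    if A[n - 1, n - 1] == 0.0:
        raise ValueError("matrix is singular")
    # Back substitution (1.2) for the resulting upper triangular system.
    phi = np.zeros(n)
    for i in range(n - 1, -1, -1):
        phi[i] = (b[i] - A[i, i + 1:] @ phi[i + 1:]) / A[i, i]
    return phi


# Illustrative system with a zero leading entry; exact solution is (1, 2, 3).
A = np.array([[0.0, 2.0, 1.0],
              [1.0, 1.0, 1.0],
              [2.0, 1.0, 3.0]])
b = np.array([7.0, 6.0, 13.0])
phi = gauss_solve(A, b)
```

Without the row interchange the first stage would divide by $a_{11} = 0$; with partial pivoting the sketch recovers the solution.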
Linear systems of equations of tridiagonal type arise when solving problems in a wide variety of disciplines. The tridiagonal matrix algorithm (TDMA), also known as the Thomas algorithm, is a simplified form of Gaussian elimination that can be used to solve the tridiagonal system of equations
\[
\begin{pmatrix}
a_{11} & a_{12} & & & \\
a_{21} & a_{22} & a_{23} & & \\
 & a_{32} & a_{33} & \ddots & \\
 & & \ddots & \ddots & a_{n-1\,n} \\
 & & & a_{n\,n-1} & a_{nn}
\end{pmatrix}
\begin{pmatrix} \varphi_1 \\ \varphi_2 \\ \varphi_3 \\ \vdots \\ \varphi_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}.
\]
Zero entries of the coefficient matrix are not shown. Note that only one subdiagonal entry is to be eliminated in every row, namely $a_{i\,i-1}$, and that this affects only the diagonal entry and the right-hand side of that row. Consequently, the elimination procedure (1.3) for a tridiagonal matrix is given by
\[
a_{ii}^{(k+1)} = a_{ii}^{(k)} - m_i\, a_{i-1\,i}^{(k)}, \qquad i = 2, \ldots, n,
\]
\[
b_i^{(k+1)} = b_i^{(k)} - m_i\, b_{i-1}^{(k)}, \qquad i = 2, \ldots, n,
\]
where the multipliers are $m_i = a_{i\,i-1}^{(k)} / a_{i-1\,i-1}^{(k)}$, k = 1. Since the rows are processed sequentially for i = 2, . . . , n, the diagonal entry and right-hand side of row i − 1 that enter these formulas are the ones already updated in the previous step.
To summarize the important applied aspects of the TDMA:
∙ The TDMA without pivoting is applicable to matrices that are diagonally dominant, i.e.
\[
|a_{ii}| > |a_{i\,i-1}| + |a_{i\,i+1}|, \qquad i = 1, 2, \ldots, n,
\]
so pivoting is unnecessary.
∙ The TDMA has an optimal complexity because the total number of arithmetic operations for a tridiagonal system of n equations is W = O(n).
∙ There are different variants of the TDMA [11, 21, 27]; appropriate subprograms are part of freely available software.
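A minimal sketch of the TDMA is given below. It assumes NumPy and a diagonally dominant matrix stored as three one-dimensional arrays; the storage layout, the function name `tdma` and the 5-by-5 test system are illustrative assumptions that do not come from the book.

```python
import numpy as np


def tdma(sub, diag, sup, b):
    """Thomas algorithm for a tridiagonal system without pivoting.

    Assumed storage: sub[i] = a_{i,i-1} (sub[0] is unused),
    diag[i] = a_{ii}, sup[i] = a_{i,i+1} (sup[n-1] is unused).
    """
    n = len(diag)
    d = np.array(diag, dtype=float)  # copies, so the inputs stay untouched
    rhs = np.array(b, dtype=float)
    # Forward elimination: one multiplier per row, O(n) work in total.
    for i in range(1, n):
        m = sub[i] / d[i - 1]
        d[i] -= m * sup[i - 1]
        rhs[i] -= m * rhs[i - 1]
    # Back substitution for the resulting upper bidiagonal system.
    phi = np.zeros(n)
    phi[-1] = rhs[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        phi[i] = (rhs[i] - sup[i] * phi[i + 1]) / d[i]
    return phi


# Illustrative diagonally dominant system with the stencil (-1, 4, -1).
n = 5
sub = -np.ones(n)
diag = 4.0 * np.ones(n)
sup = -np.ones(n)
A_dense = np.diag(diag) + np.diag(sub[1:], -1) + np.diag(sup[:-1], 1)
phi_exact = np.arange(1.0, n + 1.0)
b = A_dense @ phi_exact
phi = tdma(sub, diag, sup, b)
```

The dense matrix `A_dense` is built only to generate a right-hand side with a known solution; the solver itself touches just the three diagonals.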
1.3 Basic iterative methods
In contrast to direct methods, iterative methods generally do not produce the exact answer after a finite number of steps but decrease the error by some fraction after each step. Iteration ceases when the error is less than a user-supplied threshold. The final error depends on how many iterations one does as well as on properties of the method and the linear system.
Simple iterative methods for a SLAE Aφ = b are given by
\[
W\, \frac{\varphi^{(\nu+1)} - \varphi^{(\nu)}}{\tau} = b - A\varphi^{(\nu)}, \qquad \nu = 0, 1, 2, \ldots, \tag{1.4}
\]
where $\varphi^{(0)}$ is a starting guess, W is a matrix and τ is a relaxation parameter. In this expression the superscript ν denotes the iteration counter, and the nonsingular matrix W is called a splitting matrix.
Definition 1.3.1. The residual after ν iterations is $r^{(\nu)} = b - A\varphi^{(\nu)}$.
Definition 1.3.2. The exact error after ν iterations is $\varphi - \varphi^{(\nu)}$.
Definition 1.3.3. The iteration error after ν iterations is $\varphi^{(\nu+1)} - \varphi^{(\nu)}$.
In general, a new approximation to the solution $\varphi^{(\nu+1)}$ is computed as follows:
Step 1: numerical solution of the SLAE
\[
W\delta^{(\nu+1)} = \tau\,(b - A\varphi^{(\nu)}) \;\Rightarrow\; \delta^{(\nu+1)} = \tau W^{-1}(b - A\varphi^{(\nu)}),
\]
where the solution $\delta^{(\nu+1)} = \varphi^{(\nu+1)} - \varphi^{(\nu)}$ is the iteration error after ν iterations.
Step 2: computation of a new approximation to the solution
\[
\varphi^{(\nu+1)} = \varphi^{(\nu)} + \delta^{(\nu+1)}.
\]
Let us rewrite the iteration (1.4) as
\[
\varphi^{(\nu+1)} = (I - \tau W^{-1}A)\,\varphi^{(\nu)} + \tau W^{-1} b = S\varphi^{(\nu)} + \tau W^{-1} b, \tag{1.5}
\]
where the matrix $S = I - \tau W^{-1}A$ is called the iteration matrix of the method (1.4). The iterative method (1.5) converges if
\[
\lim_{\nu\to\infty} \varphi^{(\nu)} = \varphi = A^{-1} b.
\]
To summarize general convergence results:

Theorem 1.3.1. Let S be a square matrix such that ρ(S) < 1. Then I − S is nonsingular and the iteration (1.5) converges for any b and $\varphi^{(0)}$. Conversely, if the iteration (1.5) converges for any b and any starting vector $\varphi^{(0)}$, then ρ(S) < 1.

Remark 1.3.1. Since it is expensive to compute the spectral radius of a matrix, sufficient conditions that guarantee convergence can be useful in practice. One such sufficient condition is obtained from the inequality $\rho(S) \le \|S\|$, which holds for any matrix norm.

Theorem 1.3.2. (A sufficient condition for convergence.) Let S be a square matrix such that ‖S‖ < 1 for some matrix norm ‖ ⋅ ‖. Then I − S is nonsingular and the iteration (1.5) converges for any b and starting vector $\varphi^{(0)}$.

Remark 1.3.2. If the above iteration (1.5) converges, its limit $\varphi = A^{-1}b$ satisfies $\varphi = S\varphi + \tau W^{-1}b$. Subtracting (1.5), we have
$$\varphi - \varphi^{(\nu+1)} = S(\varphi - \varphi^{(\nu)}) = S^{\nu+1}(\varphi - \varphi^{(0)}) . \tag{1.6}$$
This results in the estimate
$$\|\varphi - \varphi^{(\nu+1)}\| \le \|S\|^{\nu+1}\,\|\varphi - \varphi^{(0)}\| . \tag{1.7}$$
If ‖S‖ < 1, the norm of the exact error $\varphi - \varphi^{(\nu+1)}$ decreases monotonically as ν → ∞.
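The estimate (1.7) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the 3 × 3 matrix, the choice W = D (the diagonal of A), τ = 1 and the maximum norm are not taken from the text, and the helper names are hypothetical.

```python
def mat_vec(M, v):
    """Matrix-vector product for matrices stored as lists of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inf_norm_vec(v):
    return max(abs(x) for x in v)

def inf_norm_mat(M):
    """Maximum absolute row sum (matrix norm induced by the max norm)."""
    return max(sum(abs(m) for m in row) for row in M)

def iteration_matrix_diagonal_splitting(A, tau):
    """S = I - tau * W^{-1} A for the assumed splitting W = diag(A)."""
    n = len(A)
    return [[(1.0 if i == j else 0.0) - tau * A[i][j] / A[i][i]
             for j in range(n)] for i in range(n)]

# Illustrative diagonally dominant system with known solution (1, 2, 3).
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
phi_exact = [1.0, 2.0, 3.0]
b = mat_vec(A, phi_exact)
tau = 1.0

S = iteration_matrix_diagonal_splitting(A, tau)
norm_S = inf_norm_mat(S)
shift = [tau * b[i] / A[i][i] for i in range(3)]   # tau * W^{-1} b

phi = [0.0, 0.0, 0.0]                              # starting guess
e0 = inf_norm_vec([p - q for p, q in zip(phi_exact, phi)])
bound_holds = True
for nu in range(20):
    phi = [s + t for s, t in zip(mat_vec(S, phi), shift)]   # iteration (1.5)
    err = inf_norm_vec([p - q for p, q in zip(phi_exact, phi)])
    # Estimate (1.7), with a small allowance for round-off.
    if err > norm_S ** (nu + 1) * e0 + 1e-12:
        bound_holds = False
```

Here ‖S‖ is less than one, so Theorem 1.3.2 applies and the error bound shrinks geometrically with ν.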
A much more detailed convergence analysis is given in [21, 27]. The splitting matrix W and relaxation parameter τ define the iterative method. The matrix W should satisfy the following conflicting requirements:
1. Computing a new approximation $\varphi^{(\nu+1)}$ requires repeated solution of the SLAE $W\delta^{(\nu+1)} = \tilde b$, i.e. the splitting matrix W should be easy to invert (diagonal or triangular).
2. According to (1.7), the iteration matrix norm ‖S(τ, W)‖ should be minimized (i.e. $\tau^{-1}W \to A$) in order to minimize the number of iterations.

As a rule, a splitting matrix W that is cheaper to invert results in a significant increase in the number of iterations needed to reach a stopping criterion.

Let D, I, L and U be the diagonal, identity, lower triangular and upper triangular matrices defined in Sect. 1.1, so that A = D − L − U. Table 1.1 lists the matrices W, S, the right-hand side vector $\tilde b$ and the parameter (τ or ϖ) for the Jacobi, Gauss–Seidel and successive overrelaxation (SOR) methods written in the matrix form $\varphi^{(\nu+1)} = S\varphi^{(\nu)} + \tilde b$.

Let us consider a system of linear algebraic equations (SLAE)
$$\sum_{j=1}^{n} a_{ij}\varphi_j = b_i , \qquad i = 1, 2, \dots, n$$
with a nonsingular coefficient matrix. We assume that its diagonal entries $a_{ii}$ are all nonzero real numbers and n is the number of unknowns. Every equation can be formally solved for the unknown $\varphi_i$ multiplying the diagonal entry:
$$\varphi_i = \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1}^{i-1} a_{ij}\varphi_j - \sum_{j=i+1}^{n} a_{ij}\varphi_j\Big) ,$$
or
$$\varphi_i = \varphi_i + \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1}^{n} a_{ij}\varphi_j\Big) .$$
Tab. 1.1: Generalized matrix notation for the Jacobi, Gauss–Seidel and SOR methods

| Iterations | Parameter | W | S | b̃ |
|---|---|---|---|---|
| Damped Jacobi | τ ∈ (0, 1] | D | I − τD⁻¹A | τD⁻¹b |
| Gauss–Seidel | 1 | D − L | I − (D − L)⁻¹A | (D − L)⁻¹b |
| SOR | ϖ ∈ [1, 2) | D − ϖL | I − ϖ(D − ϖL)⁻¹A | ϖ(D − ϖL)⁻¹b |
Choosing a starting guess $\varphi_i^{(0)}$, we can rewrite this equation in the iterative form
$$\varphi_i^{(\nu+1)} = \varphi_i^{(\nu)} + \frac{\tau}{a_{ii}}\Big(b_i - \sum_{j=1}^{n} a_{ij}\varphi_j^{(\nu)}\Big) .$$
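The componentwise formula above translates almost literally into code. The following is a minimal sketch, assuming a dense matrix stored as a list of rows; the function name `damped_jacobi_sweep` and the test system are illustrative choices, not part of the text.

```python
def damped_jacobi_sweep(A, b, phi, tau=1.0):
    """One damped Jacobi iteration: every component uses only old values."""
    n = len(b)
    new_phi = []
    for i in range(n):
        residual_i = b[i] - sum(A[i][j] * phi[j] for j in range(n))
        new_phi.append(phi[i] + tau * residual_i / A[i][i])
    return new_phi

# Illustrative system whose exact solution is (1, 2, 3).
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]

phi = [0.0, 0.0, 0.0]          # starting guess
for _ in range(60):
    phi = damped_jacobi_sweep(A, b, phi, tau=0.8)
```

Because a sweep reads only values from the previous iteration, all components can be updated independently of each other.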
These iterations define the damped Jacobi iterative method, which will be used in Sect. 1.4 for theoretical convergence analysis and in Chapter 3 for constructing parallel algorithms. In the Gauss–Seidel method, the most recently computed values $\varphi_j^{(\nu+1)}$, j < i, are used as soon as they become available:
$$\varphi_i^{(\nu+1)} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1}^{i-1} a_{ij}\varphi_j^{(\nu+1)} - \sum_{j=i+1}^{n} a_{ij}\varphi_j^{(\nu)}\Big) .$$
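In code, the only difference from the Jacobi sweep is that the vector is overwritten in place, so components with j < i already hold their new values. A minimal sketch with an illustrative test system (the function name is hypothetical):

```python
def gauss_seidel_sweep(A, b, phi):
    """One Gauss-Seidel iteration; phi is updated in place."""
    n = len(b)
    for i in range(n):
        # phi[j] for j < i already holds the new iterate, for j > i the old one.
        s = sum(A[i][j] * phi[j] for j in range(n) if j != i)
        phi[i] = (b[i] - s) / A[i][i]
    return phi

# Illustrative system whose exact solution is (1, 2, 3).
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]

phi = [0.0, 0.0, 0.0]
for _ in range(30):
    gauss_seidel_sweep(A, b, phi)
```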
The Gauss–Seidel method will be used for solving a large class of (non)linear boundary value problems. It is possible to accelerate convergence of the Gauss–Seidel iterations by introducing an overrelaxation factor ϖ:
$$\varphi^{(\nu+1)} = \varpi\,\bar\varphi^{(\nu+1)} + (1 - \varpi)\,\varphi^{(\nu)} ,$$
where the Gauss–Seidel method
$$D\bar\varphi^{(\nu+1)} = L\varphi^{(\nu+1)} + U\varphi^{(\nu)} + b$$
is used for computation of the provisional iterate $\bar\varphi^{(\nu+1)}$. For ϖ = 1 the iterations are the Gauss–Seidel method, for 1 < ϖ < 2 the iterations are overrelaxed, and for ϖ < 1 the iterations are underrelaxed. The optimum value of ϖ depends on the coefficient matrix A. The iterations with 1 < ϖ < 2 are the successive overrelaxation method (SOR).

Theorem 1.3.3. The successive overrelaxation method (SOR) does not converge unless 0 < ϖ < 2.

An optimal value of the overrelaxation parameter ϖ can be found by theoretical methods only in the simplest cases [27]. We will consider this problem in Chapter 2. Underrelaxation is widely used for nonlinear problems (Sect. 1.9). From a practical point of view, it is desirable to construct a parameter-free iterative method that has a faster convergence rate than SOR. The goal of this book is the development of a robust iterative algorithm with close-to-optimal complexity and the least number of problem-dependent components.

When an iterative method is used to solve a linear system, we typically face the problem of choosing a good stopping criterion for the algorithm. All iterative methods produce a sequence of vectors $\varphi^{(\nu)}$ converging to the vector $\varphi = A^{-1}b$. A good stopping criterion should
– identify when the exact error $\|\varphi^{(\nu)} - \varphi\|$ is small enough to stop, and
– limit the maximum amount of time spent iterating.
Often a residual norm $\|r^{(\nu)}\| = \|b - A\varphi^{(\nu)}\|$ is available for some current approximation $\varphi^{(\nu)}$ to the solution φ, and an estimate of the exact error $\|\varphi - \varphi^{(\nu)}\|$ or the relative error $\|\varphi - \varphi^{(\nu)}\|/\|\varphi\|$ is desired. The following simple relation is helpful in this regard:
$$\frac{\|\varphi - \varphi^{(\nu)}\|}{\|\varphi\|} \le \operatorname{cond}(A)\,\frac{\|r^{(\nu)}\|}{\|b\|} , \tag{1.8}$$
where the quantity
$$\operatorname{cond}(A) = \|A^{-1}\|\,\|A\|$$
is called the condition number of the coefficient matrix A. An estimate of the condition number is needed in order to exploit relation (1.8). Estimates of cond(A) and other details of the perturbation analysis are given in [11, 21]. A problem with a low condition number is said to be well conditioned, while a problem with a high condition number is said to be ill conditioned. The condition number is a property of the problem.

Unfortunately, a stopping criterion depends on the problem to be solved, so no practical recommendations can be given for the general case. As a rule, the absence of a priori information about the problem forces the iterations to be stopped under the following criteria:
1) Criterion 1
$$\frac{\|r^{(\nu)}\|_2}{\|b\|_2} < \varepsilon_1 .$$
The criterion shows by what factor the length of the residual vector is smaller than the length of the right-hand side vector.
2) Criterion 2
$$\|\varphi^{(\nu)} - \varphi^{(\nu-1)}\|_\infty < \varepsilon_2 .$$
The criterion shows the maximum value of the iteration error.

Often, the user also limits the maximum number of iterations. Criteria 1 and 2 should be applied with caution to boundary value problems, because $\varepsilon_1$ and $\varepsilon_2$ depend on the mesh size h.

To summarize the important applied aspects of the iterative methods:
– Iterative methods are less sensitive to round-off errors than direct elimination methods.
– Any iterative method always contains at least one problem-dependent component: the stopping criterion.
– Convergence of iterative methods depends on the pattern of the coefficient matrix. For example, the above-mentioned Jacobi, Gauss–Seidel and SOR methods fail if any diagonal entry of the coefficient matrix is zero, $a_{ii} = 0$.
– Stopping criteria should be in the range of the discretization accuracy.
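The SOR update and the two stopping criteria can be combined as in the sketch below. It is only an illustration: requiring both Criterion 1 and Criterion 2 at once, the default tolerances, the value ϖ = 1.2 and the function name `sor_solve` are assumptions made here, not recommendations from the text.

```python
import math

def sor_solve(A, b, omega=1.2, eps1=1e-10, eps2=1e-10, max_iter=1000):
    """SOR iterations stopped by Criteria 1 and 2 or by an iteration limit."""
    n = len(b)
    phi = [0.0] * n
    norm_b = math.sqrt(sum(x * x for x in b))   # assumes b is not the zero vector
    for nu in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            s = sum(A[i][j] * phi[j] for j in range(n) if j != i)
            provisional = (b[i] - s) / A[i][i]           # Gauss-Seidel value
            new_value = omega * provisional + (1.0 - omega) * phi[i]
            max_change = max(max_change, abs(new_value - phi[i]))
            phi[i] = new_value
        r = [b[i] - sum(A[i][j] * phi[j] for j in range(n)) for i in range(n)]
        criterion_1 = math.sqrt(sum(x * x for x in r)) / norm_b < eps1
        criterion_2 = max_change < eps2
        if criterion_1 and criterion_2:
            return phi, nu
    return phi, max_iter

# Illustrative system whose exact solution is (1, 2, 3).
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
phi, iterations = sor_solve(A, b)
```

Setting `omega=1.0` reduces the loop to the Gauss–Seidel method, in agreement with Table 1.1.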
1.4 One-dimensional Poisson equation

Elliptic partial differential equations can generally be obtained from time-dependent problems by considering the so-called stationary case. In the stationary case the solution varies only with the spatial coordinates and not with time. The one-dimensional Poisson equation with Dirichlet boundary conditions (the so-called one-dimensional boundary value problem) is a second-order ordinary differential equation with appropriate boundary conditions
$$u''(x) = -f(x) , \qquad x \in (0, 1) , \tag{1.9a}$$
$$u(0) = 0 , \qquad u(1) = 0 . \tag{1.9b}$$
We suppose that the solution u(x) is sufficiently smooth on (0, 1) and continuous on [0, 1]. Of course, the above problem can be solved exactly by integrating both sides of the equation twice and choosing the two constants of integration according to the boundary conditions. Nevertheless, we solve this problem numerically with the finite difference method in order to understand details of the computational algorithms.

In order to obtain a well-posed problem, we generally prescribe so-called boundary conditions on the boundary (denoted by ∂Ω) of the domain Ω. There are three different types of boundary conditions:
1) Dirichlet boundary conditions. The function u is prescribed on the boundary ∂Ω, that is $u(w) = g_1(w)$, w ∈ ∂Ω ($g_1$ is a given function).
2) Neumann boundary conditions. The (normal) derivative of u is prescribed on the boundary ∂Ω, that is $u_n(w) = g_2(w)$, w ∈ ∂Ω ($g_2$ is a given function).
3) Robin boundary conditions. A linear combination of u and its normal derivative is prescribed on the boundary ∂Ω, that is $u_n(w) + p\,u(w) = g_3(w)$, w ∈ ∂Ω ($g_3$ is a given function and p is a given parameter).

An elliptic partial differential equation together with suitable boundary conditions constitutes an elliptic boundary value problem. The main types of numerical methods for solving such problems are as follows:
– finite difference methods (FDM) are the simplest to describe and the easiest to implement provided the domain has a reasonably simple shape;
– finite volume methods (FVM) are based on the integral form instead of the differential equation;
– finite element methods (FEM) are capable of handling very general domains; and
– spectral methods can achieve very high accuracy but are practical only for very simple domains.

Now we will discuss only the first of these methods. To discretize the differential equation, the domain Ω = [0, 1] is divided into N equal parts with the points
$$0 = x_0 < x_1 < x_2 < \dots < x_N = 1$$
where $x_i = ih$, i = 0, 1, . . . , N, with mesh size (or discretization parameter) h = 1/N. By the term uniform grid the following set of points is meant:
$$\Omega_h = \{x_i = ih , \; i = 0, 1, \dots, N\} .$$
In the following, the points $x_i$ will be called vertices of the grid. The finite difference approximation of u(x) is denoted as $u_i^h$, i.e.
$$u(x) : \Omega \to \mathbb{R} , \qquad u^h : \Omega_h \to \mathbb{R} , \qquad u(x_i) = u_i^h , \qquad i = 0, 1, \dots, N .$$
The finite difference method is based on approximation of the derivatives in the partial differential equation by linear combinations of function values at the grid points $x_i$. Finite difference approximations may be constructed in a variety of ways, but the use of the Taylor formula is probably the simplest for our purposes:
$$u(x) = \sum_{m=0}^{n} \frac{u^{(m)}(\bar x)}{m!}\,(x - \bar x)^m + R_n(x) , \qquad u^{(m)}(\bar x) = \frac{d^m u}{dx^m}\bigg|_{\bar x} , \tag{1.10}$$
where $R_n(x)$ is the remainder of the Taylor series
$$R_n(x) = \frac{u^{(n+1)}(\bar x + \theta(x - \bar x))}{(n+1)!}\,(x - \bar x)^{n+1} , \qquad \theta \in (0, 1) .$$
The first derivative at the grid point $x_i$ can be approximated on the uniform grid $\Omega_h$ by linear combinations of function values at the grid points $x_{i-1}$ and $x_{i+1}$. The Taylor formula (1.10) at the grid points $x = x_{i-1}$ and $x = x_{i+1}$ with $\bar x = x_i$ shows that
$$u(x_{i-1}) = u(x_i) - h\,u'(x_i) + \frac{h^2}{2}\,u''(x_i) - \frac{h^3}{6}\,u'''(x_i) + \cdots , \tag{1.11a}$$
$$u(x_{i+1}) = u(x_i) + h\,u'(x_i) + \frac{h^2}{2}\,u''(x_i) + \frac{h^3}{6}\,u'''(x_i) + \cdots , \tag{1.11b}$$
where $h = x_{i+1} - x_i = x_i - x_{i-1}$ is the mesh size of the grid $\Omega_h$. Multiplication of (1.11a) and (1.11b) by coefficients $a_1$ and $a_2$
$$a_1\big(u(x_{i-1}) - u(x_i)\big) = -a_1 h\,u'(x_i) + a_1\frac{h^2}{2}\,u''(x_i) - a_1\frac{h^3}{6}\,u'''(x_i) + \cdots ,$$
$$a_2\big(u(x_{i+1}) - u(x_i)\big) = +a_2 h\,u'(x_i) + a_2\frac{h^2}{2}\,u''(x_i) + a_2\frac{h^3}{6}\,u'''(x_i) + \cdots ,$$
and their summation gives the linear combination
$$a_1 u(x_{i-1}) - (a_1 + a_2)\,u(x_i) + a_2 u(x_{i+1}) = (-a_1 + a_2)\,h\,u'(x_i) + (a_1 + a_2)\frac{h^2}{2}\,u''(x_i) + (-a_1 + a_2)\frac{h^3}{6}\,u'''(x_i) + (a_1 + a_2)\frac{h^4}{24}\,u^{IV}(x_i) + \cdots . \tag{1.12}$$
The coefficients $a_1$ and $a_2$ are to be determined in such a way that this linear combination is indeed an approximation of the first derivative. In order that the right-hand side of equation (1.12) reduces to $u'(x_i)$ we require the unknown coefficients to satisfy the linear system
$$\begin{cases} (-a_1 + a_2)\,h = 1 , \\ a_1 + a_2 = 0 . \end{cases}$$
The system has the unique solution $-a_1 = a_2 = (2h)^{-1}$. This solution gives the first derivative approximation
$$\frac{u(x_{i+1}) - u(x_{i-1})}{2h} = u'(x_i) + \frac{h^2}{6}\,u'''(x_i) + \cdots .$$
Since $u(x_i) = u_i^h$, the approximation takes the final form
$$\frac{u_{i+1}^h - u_{i-1}^h}{2h} - u'(x_i) = \frac{h^2}{6}\,u'''(x_i) + \cdots . \tag{1.13}$$
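The second-order behaviour predicted by (1.13) is easy to observe numerically: halving h should reduce the error by a factor close to four. The sketch below uses u(x) = sin x at x = 1 purely as an illustrative test function; the function name is hypothetical.

```python
import math

def central_first_derivative(u, x, h):
    """Central difference (u(x + h) - u(x - h)) / (2h)."""
    return (u(x + h) - u(x - h)) / (2.0 * h)

x0 = 1.0
steps = (0.1, 0.05, 0.025)
errors = [abs(central_first_derivative(math.sin, x0, h) - math.cos(x0))
          for h in steps]

# Ratios of successive errors; values near 4 indicate O(h^2) accuracy.
ratios = [errors[k] / errors[k + 1] for k in range(len(errors) - 1)]

# Leading term of the truncation error in (1.13): (h^2 / 6) |u'''(x0)|,
# where u''' = -cos for the chosen test function.
predicted = steps[0] ** 2 / 6.0 * abs(math.cos(x0))
```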
The right-hand side of equation (1.13) is the error committed in terminating the series and is referred to as a truncation error. The truncation error can be defined as the difference between the derivative and its finite difference representation. For sufficiently smooth functions, i.e. ones that possess continuous higher-order derivatives, and sufficiently small h, the first term in the series can be used to characterize the order of magnitude of the error.

A finite difference approximation of the second derivative can be obtained from (1.12) in the same way. In this case equating the coefficients of $u'(x_i)$ and $u''(x_i)$ on the right-hand side of (1.12) to 0 and 1, respectively, leads to the system
$$\begin{cases} -a_1 + a_2 = 0 , \\ (a_1 + a_2)\,\dfrac{h^2}{2} = 1 . \end{cases}$$
The solution of the system, $a_1 = a_2 = h^{-2}$, gives the second derivative approximation
$$\frac{u(x_{i-1}) - 2u(x_i) + u(x_{i+1})}{h^2} = u''(x_i) + R(x_i) ,$$
where the remainder of the Taylor series takes the form
$$R(x_i) = \frac{2h^2}{4!}\,u^{IV}(x_i) + \frac{2h^4}{6!}\,u^{VI}(x_i) + \frac{2h^6}{8!}\,u^{VIII}(x_i) + \cdots . \tag{1.14}$$
The symbol n!, where n is a given positive integer, denotes the factorial (the product of n and all lesser positive integers). Since $u(x_i) = u_i^h$, the approximation can be rewritten as
$$\frac{u_{i-1}^h - 2u_i^h + u_{i+1}^h}{h^2} - u''(x_i) = \frac{h^2}{12}\,u^{IV}(x_i) + \cdots . \tag{1.15}$$
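The leading term of (1.15) can be verified in the same spirit. In the sketch below the test function u(x) = eˣ is an illustrative choice (all of its derivatives equal eˣ, which makes the predicted error easy to write down); the function name is hypothetical.

```python
import math

def central_second_derivative(u, x, h):
    """Three-point approximation (u(x - h) - 2u(x) + u(x + h)) / h^2."""
    return (u(x - h) - 2.0 * u(x) + u(x + h)) / (h * h)

x0 = 0.5
steps = (0.1, 0.05)
errors = [abs(central_second_derivative(math.exp, x0, h) - math.exp(x0))
          for h in steps]

ratio = errors[0] / errors[1]                          # close to 4 for O(h^2)
predicted = steps[0] ** 2 / 12.0 * math.exp(x0)        # (h^2 / 12) u^IV(x0)
```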
Finite difference representation of the second derivative makes it possible to approximate the differential equation (1.9a) by the algebraic equation
$$\frac{u_{i-1}^h - 2u_i^h + u_{i+1}^h}{h^2} = -f(x_i) , \qquad i = 1, 2, \dots, N - 1 . \tag{1.16}$$
In order to form the system of equations (SLAE), one has to eliminate the boundary values $u_0^h = 0$ and $u_N^h = 0$, which appear in (1.16) for i = 1 and i = N − 1. For the first point $x_1$ we have
$$\frac{u_0^h - 2u_1^h + u_2^h}{h^2} = -f(x_1) \;\Rightarrow\; \frac{-2u_1^h + u_2^h}{h^2} = -f(x_1) . \tag{1.17}$$
The similar equation for $x_{N-1}$ is
$$\frac{u_{N-2}^h - 2u_{N-1}^h}{h^2} = -f(x_{N-1}) . \tag{1.18}$$
Eqs. (1.16), (1.17) and (1.18) form the SLAE
$$\begin{aligned}
-2u_1^h + u_2^h &= -h^2 f(x_1) \\
u_1^h - 2u_2^h + u_3^h &= -h^2 f(x_2) \\
u_2^h - 2u_3^h + u_4^h &= -h^2 f(x_3) \\
\cdots\qquad &\;\;\cdots \\
u_{N-2}^h - 2u_{N-1}^h &= -h^2 f(x_{N-1}) .
\end{aligned}$$
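We rewrite the system in matrix notation with the tridiagonal coefficient matrix
$$\frac{1}{h^2}\begin{pmatrix} -2 & 1 & & & \\ 1 & -2 & 1 & & \\ & 1 & -2 & 1 & \\ & & \cdots & \cdots & \cdots \\ & & & 1 & -2 \end{pmatrix}\begin{pmatrix} u_1^h \\ u_2^h \\ u_3^h \\ \cdots \\ u_{N-1}^h \end{pmatrix} = \begin{pmatrix} -f(x_1) \\ -f(x_2) \\ -f(x_3) \\ \cdots \\ -f(x_{N-1}) \end{pmatrix} ,$$
or, after multiplying both sides by −1,
$$A\psi = b , \tag{1.19a}$$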
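where the coefficient matrix A, the vector of unknowns ψ and the right-hand side vector b are given by
$$A = \frac{1}{h^2}\begin{pmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & -1 & \\ & & \cdots & \cdots & \cdots \\ & & & -1 & 2 \end{pmatrix} , \qquad \psi = \begin{pmatrix} u_1^h \\ u_2^h \\ u_3^h \\ \cdots \\ u_{N-1}^h \end{pmatrix} , \qquad b = \begin{pmatrix} f(x_1) \\ f(x_2) \\ f(x_3) \\ \cdots \\ f(x_{N-1}) \end{pmatrix} . \tag{1.19b}$$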
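To make (1.19) concrete, the sketch below assembles the three diagonals of A and the vector b, and solves the system directly. The direct solver used here is the standard tridiagonal (Thomas) elimination, which is not discussed in the text and is swapped in only to obtain the discrete solution; the test problem f(x) = π² sin(πx), with exact solution u(x) = sin(πx), and all function names are illustrative assumptions.

```python
import math

def assemble_poisson_1d(N, f):
    """Diagonals of A and right-hand side b from (1.19b), mesh size h = 1/N."""
    h = 1.0 / N
    n = N - 1                          # number of interior unknowns
    lower = [-1.0 / h**2] * n          # sub-diagonal (lower[0] is unused)
    diag = [2.0 / h**2] * n
    upper = [-1.0 / h**2] * n          # super-diagonal (upper[-1] is unused)
    b = [f(i * h) for i in range(1, N)]
    return lower, diag, upper, b

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: forward elimination followed by back substitution."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def max_error(N):
    """Maximum nodal error for the test problem with u(x) = sin(pi x)."""
    f = lambda x: math.pi**2 * math.sin(math.pi * x)
    psi = solve_tridiagonal(*assemble_poisson_1d(N, f))
    h = 1.0 / N
    return max(abs(psi[i - 1] - math.sin(math.pi * i * h)) for i in range(1, N))

err_32 = max_error(32)
err_64 = max_error(64)
```

Doubling N should reduce the error by a factor close to four, in line with the truncation error (1.15).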
The coefficient matrix A is symmetric and positive definite. The system of algebraic equations that replaces the original differential problem is called a finite difference scheme or a discrete boundary value problem. The matrix form of the algebraic equations means that the discrete boundary conditions have been included in the SLAE. In other words, SLAE (1.19) represents the discrete Poisson equation with the eliminated Dirichlet boundary conditions.

The Jacobi iterations for SLAE Aψ = b (1.19) are written in the form (1.5) as
$$\psi^{(\nu+1)} = (I - \tau D^{-1}A)\,\psi^{(\nu)} + \tau D^{-1}b ,$$
where the matrix D consists of the diagonal entries of the coefficient matrix A: $D = (2/h^2)\,I \Rightarrow D^{-1} = (h^2/2)\,I$. The iteration matrix of the Jacobi method is
$$S = I - \frac{\tau h^2}{2}\,A , \tag{1.20}$$
and the iterative method becomes
$$\psi^{(\nu+1)} = S\psi^{(\nu)} + \frac{\tau h^2}{2}\,b . \tag{1.21}$$
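Applying (1.21) row by row gives a simple three-point update for each interior vertex. The sketch below uses it to count how many Jacobi iterations are needed on two grids; stopping on the maximum change between iterates, the tolerance, and the test problem f(x) = π² sin(πx) (exact solution sin(πx)) are assumptions made for illustration.

```python
import math

def jacobi_poisson_1d(N, f, tau=1.0, tol=1e-6, max_iter=200000):
    """Jacobi iterations (1.21) for the 1D Poisson problem, zero starting guess."""
    h = 1.0 / N
    u = [0.0] * (N + 1)                      # u[0] and u[N] hold the boundary zeros
    f_values = [f(i * h) for i in range(N + 1)]
    for iteration in range(1, max_iter + 1):
        new_u = u[:]
        for i in range(1, N):
            # Row i of S*psi + (tau*h^2/2)*b with A from (1.19b).
            new_u[i] = (u[i] + 0.5 * tau * (u[i - 1] - 2.0 * u[i] + u[i + 1])
                        + 0.5 * tau * h * h * f_values[i])
        change = max(abs(a - b) for a, b in zip(new_u, u))
        u = new_u
        if change < tol:
            return u, iteration
    return u, max_iter

f = lambda x: math.pi**2 * math.sin(math.pi * x)
u_8, it_8 = jacobi_poisson_1d(8, f)
u_16, it_16 = jacobi_poisson_1d(16, f)
err_16 = max(abs(u_16[i] - math.sin(math.pi * i / 16)) for i in range(17))
```

Doubling N more than triples the iteration count in this experiment, which illustrates why the Jacobi method alone is not an efficient solver on fine grids.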
The most important tasks are determination of the algorithmic complexity and accuracy of the numerical solution. First, we roughly estimate the number of arithmetic operations. Matrix equation (1.21) written as
$$(u_i^h)^{(\nu+1)} = (u_i^h)^{(\nu)} + \frac{\tau}{2}\Big((u_{i-1}^h)^{(\nu)} - 2(u_i^h)^{(\nu)} + (u_{i+1}^h)^{(\nu)}\Big) + \frac{\tau}{2}\,h^2 f(x_i) , \qquad i = 1, 2, \dots, N - 1 \tag{1.22}$$
shows that information about the boundary conditions spreads into the computational domain Ω by a distance h per Jacobi iteration. As a result, it is necessary to perform not less than N − 1 Jacobi iterations to spread the information over the domain Ω. Computation of a new iterate $u_i^h$, i = 1, 2, . . . , N − 1, requires several arithmetic operations per vertex (depending on the computer memory used). Therefore the computational cost of each Jacobi iteration is O(N) arithmetic operations and the total amount of computations needed for solving SLAE (1.19) is not less than O(N²) arithmetic operations, where N is the number of unknowns. Clearly, the Jacobi method does not have optimal algorithmic complexity. In practice, the Jacobi method requires even more effort than this lower bound suggests. Convergence of iterative methods depends on the spectrum of the iteration matrix (Definition 1.1.2). Following [22], we recall the main facts about the eigenfunctions and eigenvalues of the differential equation $u''(x) + \lambda u = 0$ ,
0< x 0. Now we can build the robust algorithm if the used iterative method is a smoother.
2.2 Analytic part of the technique

Fundamental conservation laws describe various physical processes (heat transfer, hydrodynamics, diffusion, etc.). Problems of mathematical physics are formulated in the form of (integro-)differential equations with initial and boundary conditions providing existence and uniqueness of the solution of the governing equations. Discretization of the (integro-)differential problems means transition from a continuous medium to its discrete model. The finite difference discretization of derivatives has been considered in Sect. 1.4. The finite volume method is a preferable approach for discretization of PDEs of a general type [15, 22, 23]. As a rule, the finite volume discretization is illustrated using the following boundary value problem:
$$\frac{d}{dx}\Big(\lambda(x)\,\frac{du}{dx}\Big) - g(x)\,u(x) = -f(x) ,$$
$$u(0) = \mu_0 , \qquad u(1) = \mu_1 ,$$
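0 0 is a temperature and the nonlinear term $\alpha u^2$ is similar to the quadratic nonlinearity in the convective terms of the Navier–Stokes equations. Σ-modification of the solution
$$u(x, y, z) = c(x, y, z) + \hat u(x, y, z)$$
leads to the Σ-modified form of (2.61):
$$\frac{\partial}{\partial x}\Big(\lambda^x(c + \hat u)\,\frac{\partial (c + \hat u)}{\partial x}\Big) + \frac{\partial}{\partial y}\Big(\lambda^y(c + \hat u)\,\frac{\partial (c + \hat u)}{\partial y}\Big) + \frac{\partial}{\partial z}\Big(\lambda^z(c + \hat u)\,\frac{\partial (c + \hat u)}{\partial z}\Big) - \alpha\,(c + \hat u)^2 = -f(x, y, z) .$$
Using the identity $(c + \hat u)^2 = c^2 + 2c\hat u + \hat u^2$, we rewrite the Σ-modified equation as
$$\frac{\partial}{\partial x}\Big(\lambda^x(c + \hat u)\,\frac{\partial c}{\partial x}\Big) + \frac{\partial}{\partial y}\Big(\lambda^y(c + \hat u)\,\frac{\partial c}{\partial y}\Big) + \frac{\partial}{\partial z}\Big(\lambda^z(c + \hat u)\,\frac{\partial c}{\partial z}\Big) - \alpha\,(c^2 + 2c\hat u) = R(x, y, z) , \tag{2.62a}$$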
96 | 2 Robust multigrid technique
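with the right-hand side function
$$R(x, y, z) = -f(x, y, z) - \frac{\partial}{\partial x}\Big(\lambda^x(c + \hat u)\,\frac{\partial \hat u}{\partial x}\Big) - \frac{\partial}{\partial y}\Big(\lambda^y(c + \hat u)\,\frac{\partial \hat u}{\partial y}\Big) - \frac{\partial}{\partial z}\Big(\lambda^z(c + \hat u)\,\frac{\partial \hat u}{\partial z}\Big) + \alpha\,\hat u^2 . \tag{2.62b}$$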
In the nonlinear case the differential operators on the left-hand side of equation (2.62a) and in the right-hand side function (2.62b) differ; however, for c = 0 equation (2.62a) reduces to R = 0 with R given by (2.62b), which coincides with the original equation (2.61) written for $\hat u$. Convergence of RMT means that c → 0 ⇒ R → 0 and the approximation to the solution $\hat u$ satisfies (2.61).

Let the domain Ω be the unit cube
$$\Omega = \{(x, y, z) \mid 0 < x < 1 ,\; 0 < y < 1 ,\; 0 < z < 1\} ,$$
and ∂Ω be the cube boundary. A uniform computational grid $G_1^0$ in the domain Ω is generated by partition of the cube sides into $N_x^0$, $N_y^0$ and $N_z^0$ parts:
$$G_1^0 = \{(x_i^v, y_j^v, z_k^v) \mid x_i^v = (i - 1)h_x ,\; i = 1, 2, \dots, N_x^0 + 1 ;\;\; y_j^v = (j - 1)h_y ,\; j = 1, 2, \dots, N_y^0 + 1 ;\;\; z_k^v = (k - 1)h_z ,\; k = 1, 2, \dots, N_z^0 + 1\} ,$$
where $h_x = 1/N_x^0$, $h_y = 1/N_y^0$ and $h_z = 1/N_z^0$ are the mesh sizes in the spatial directions x, y and z respectively. The points $(x_i^v, y_j^v, z_k^v)$ are vertices of the grid $G_1^0$, which allows exact approximation of the Dirichlet boundary conditions. As a result, the control volume $V_{ijk}^0$ on the finest grid $G_1^0$ is defined by
$$V_{ijk}^0 = \{(x, y, z) \mid x_{i-1}^f \le x \le x_i^f ,\; y_{j-1}^f \le y \le y_j^f ,\; z_{k-1}^f \le z \le z_k^f\} ,$$
and the control volume faces are located midway between the vertices:
$$x_i^f = (x_i^v + x_{i+1}^v)/2 , \qquad i = 1, 2, \dots, N_x^0 ,$$
$$y_j^f = (y_j^v + y_{j+1}^v)/2 , \qquad j = 1, 2, \dots, N_y^0 ,$$
$$z_k^f = (z_k^v + z_{k+1}^v)/2 , \qquad k = 1, 2, \dots, N_z^0 .$$
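The vertex and face coordinates above are the same in each spatial direction, so a one-directional sketch is enough to show the construction. The function names and the choice of four parts are illustrative; Python lists are 0-based, so list position i − 1 holds the quantity with index i in the text.

```python
def vertices(n_parts):
    """Vertex coordinates (i - 1) * h for i = 1, ..., n_parts + 1 on [0, 1]."""
    h = 1.0 / n_parts
    return [(i - 1) * h for i in range(1, n_parts + 2)]

def faces(vertex_coords):
    """Control volume faces located midway between neighbouring vertices."""
    return [0.5 * (vertex_coords[i] + vertex_coords[i + 1])
            for i in range(len(vertex_coords) - 1)]

xv = vertices(4)     # vertices in the x direction for a partition into 4 parts
xf = faces(xv)       # faces x^f_1, ..., x^f_4

# Control volume of the interior vertex with index i = 2 (coordinate 0.25):
# it extends from face x^f_1 to face x^f_2.
volume_i2 = (xf[0], xf[1])
```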
Assume that the multigrid structure is generated, i.e. index mappings of the vertices and the faces of all grids of all levels onto the finest grid $G_1^0$ are given. In the following, the general case $N_x^0 \ne N_y^0 \ne N_z^0 \Rightarrow h_x \ne h_y \ne h_z$ will be considered (Sect. 2.3). The control volume on some grid of level l is defined by
$$V_{\{ijk\}}^l = \{(x, y, z) \mid x_{\{i-1\}}^f \le x \le x_{\{i\}}^f ,\; y_{\{j-1\}}^f \le y \le y_{\{j\}}^f ,\; z_{\{k-1\}}^f \le z \le z_{\{k\}}^f\} .$$
2.7 Numerical experiments |
97
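Finite volume approximation of the Σ-modified boundary value problems has been discussed in Sect. 2.3. Integration of the Σ-modified equation (2.62) over the control volume $V_{\{ijk\}}^l$ leads to the discrete equation
$$\begin{aligned}
&\langle\lambda^x\rangle_{x_{\{i\}}^f}\frac{c^{\varphi}_{\{i+1jk\}} - c^{\varphi}_{\{ijk\}}}{h_x^2\,3^{2l_x}} - \langle\lambda^x\rangle_{x_{\{i-1\}}^f}\frac{c^{\varphi}_{\{ijk\}} - c^{\varphi}_{\{i-1jk\}}}{h_x^2\,3^{2l_x}} \\
&\quad + \langle\lambda^y\rangle_{y_{\{j\}}^f}\frac{c^{\varphi}_{\{ij+1k\}} - c^{\varphi}_{\{ijk\}}}{h_y^2\,3^{2l_y}} - \langle\lambda^y\rangle_{y_{\{j-1\}}^f}\frac{c^{\varphi}_{\{ijk\}} - c^{\varphi}_{\{ij-1k\}}}{h_y^2\,3^{2l_y}} \\
&\quad + \langle\lambda^z\rangle_{z_{\{k\}}^f}\frac{c^{\varphi}_{\{ijk+1\}} - c^{\varphi}_{\{ijk\}}}{h_z^2\,3^{2l_z}} - \langle\lambda^z\rangle_{z_{\{k-1\}}^f}\frac{c^{\varphi}_{\{ijk\}} - c^{\varphi}_{\{ijk-1\}}}{h_z^2\,3^{2l_z}} \\
&\quad - \frac{\alpha}{h_x h_y h_z\,3^{l_x+l_y+l_z}}\int_{x_{\{i-1\}}^f}^{x_{\{i\}}^f}\int_{y_{\{j-1\}}^f}^{y_{\{j\}}^f}\int_{z_{\{k-1\}}^f}^{z_{\{k\}}^f}(c^2 + 2c\hat u)\,dz\,dy\,dx = \langle R\rangle_{\{ijk\}} ,
\end{aligned}$$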
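where the grid functions $c^{\varphi}_{ijk}$ and $\hat\varphi_{ijk}$ are discrete analogues of the functions c(x, y, z) (coarse grid correction) and $\hat u(x, y, z)$ (approximation to the solution), and the coefficients $\langle\lambda^x\rangle$, $\langle\lambda^y\rangle$, $\langle\lambda^z\rangle$ and the right-hand side function $\langle R\rangle_{\{ijk\}}$ are defined by
$$\langle\lambda^x\rangle_{x^*} = \frac{1}{h_y h_z\,3^{l_y+l_z}}\int_{y_{\{j-1\}}^f}^{y_{\{j\}}^f}\int_{z_{\{k-1\}}^f}^{z_{\{k\}}^f}\lambda^x\big(c(x^*, y, z) + \hat u(x^*, y, z)\big)\,dz\,dy ,$$
$$\langle\lambda^y\rangle_{y^*} = \frac{1}{h_x h_z\,3^{l_x+l_z}}\int_{x_{\{i-1\}}^f}^{x_{\{i\}}^f}\int_{z_{\{k-1\}}^f}^{z_{\{k\}}^f}\lambda^y\big(c(x, y^*, z) + \hat u(x, y^*, z)\big)\,dz\,dx ,$$
$$\langle\lambda^z\rangle_{z^*} = \frac{1}{h_x h_y\,3^{l_x+l_y}}\int_{x_{\{i-1\}}^f}^{x_{\{i\}}^f}\int_{y_{\{j-1\}}^f}^{y_{\{j\}}^f}\lambda^z\big(c(x, y, z^*) + \hat u(x, y, z^*)\big)\,dy\,dx ,$$
$$\langle R\rangle_{\{ijk\}} = \frac{1}{h_x h_y h_z\,3^{l_x+l_y+l_z}}\int_{x_{\{i-1\}}^f}^{x_{\{i\}}^f}\int_{y_{\{j-1\}}^f}^{y_{\{j\}}^f}\int_{z_{\{k-1\}}^f}^{z_{\{k\}}^f}R(x, y, z)\,dz\,dy\,dx ,$$
and $x^* = x_{\{i-1\}}^f, x_{\{i\}}^f$, $y^* = y_{\{j-1\}}^f, y_{\{j\}}^f$ and $z^* = z_{\{k-1\}}^f, z_{\{k\}}^f$. The coefficients $\langle\lambda^x\rangle$, $\langle\lambda^y\rangle$ and $\langle\lambda^z\rangle$ are arithmetic averages of the coefficients $\lambda^x$, $\lambda^y$ and $\lambda^z$ over the faces of the control volume $V_{\{ijk\}}^l$, and the right-hand side function $\langle R\rangle_{\{ijk\}}$ is the arithmetic average of the residual R over the control volume $V_{\{ijk\}}^l$.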
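Assuming that the correction $c^{\varphi}_{\{ijk\}}$ is constant inside the control volume $V_{\{ijk\}}^l$, we have
$$\frac{\alpha}{h_x h_y h_z\,3^{l_x+l_y+l_z}}\int_{x_{\{i-1\}}^f}^{x_{\{i\}}^f}\int_{y_{\{j-1\}}^f}^{y_{\{j\}}^f}\int_{z_{\{k-1\}}^f}^{z_{\{k\}}^f}(c^2 + 2c\hat u)\,dz\,dy\,dx \approx \alpha\Big((c^{\varphi}_{\{ijk\}})^2 + 2c^{\varphi}_{\{ijk\}}\langle\hat u\rangle_{\{ijk\}}\Big) ,$$
where $\langle\hat u\rangle_{\{ijk\}}$ is the arithmetic average of the approximation to the solution $\hat u$ over the control volume $V_{\{ijk\}}^l$, i.e.
$$\langle\hat u\rangle_{\{ijk\}} = \frac{1}{h_x h_y h_z\,3^{l_x+l_y+l_z}}\int_{x_{\{i-1\}}^f}^{x_{\{i\}}^f}\int_{y_{\{j-1\}}^f}^{y_{\{j\}}^f}\int_{z_{\{k-1\}}^f}^{z_{\{k\}}^f}\hat u(x, y, z)\,dz\,dy\,dx .$$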
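Application of the Newton method (Sect. 1.9) for local linearization of the nonlinear term gives
$$(c^{\varphi}_{\{ijk\}})^2 \approx 2\bar c^{\varphi}_{\{ijk\}}\,c^{\varphi}_{\{ijk\}} - (\bar c^{\varphi}_{\{ijk\}})^2 ,$$
where $\bar c^{\varphi}_{\{ijk\}}$ is the previous iterate. The linearized approximation of the nonlinear term becomes
$$\frac{\alpha}{h_x h_y h_z\,3^{l_x+l_y+l_z}}\int_{x_{\{i-1\}}^f}^{x_{\{i\}}^f}\int_{y_{\{j-1\}}^f}^{y_{\{j\}}^f}\int_{z_{\{k-1\}}^f}^{z_{\{k\}}^f}(c^2 + 2c\hat u)\,dz\,dy\,dx \approx 2\alpha\big(\bar c^{\varphi}_{\{ijk\}} + \langle\hat u\rangle_{\{ijk\}}\big)\,c^{\varphi}_{\{ijk\}} - \alpha\,(\bar c^{\varphi}_{\{ijk\}})^2 .$$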
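The effect of this linearization can be seen on a scalar model problem. The sketch below is a deliberate simplification and not the discrete scheme itself: the whole diffusion stencil is replaced by a single hypothetical coefficient `a`, so the model equation is a·c + α(c² + 2c·u) = rhs, and the numerical values are illustrative. Replacing c² by 2c̄c − c̄² makes each step a linear equation for c.

```python
def linearized_solve(a, alpha, u_avg, rhs, c_bar):
    """Solve a*c + alpha*(2*c_bar*c - c_bar**2 + 2*c*u_avg) = rhs for c."""
    return (rhs + alpha * c_bar**2) / (a + 2.0 * alpha * (c_bar + u_avg))

# Illustrative data: choose the exact root first, then build the right-hand side.
a, alpha, u_avg = 2.0, 1.0, 1.0
c_exact = 0.5
rhs = a * c_exact + alpha * (c_exact**2 + 2.0 * c_exact * u_avg)

c_bar = 0.0                      # previous iterate (starting guess)
errors = []
for _ in range(6):
    c_bar = linearized_solve(a, alpha, u_avg, rhs, c_bar)
    errors.append(abs(c_bar - c_exact))
```

For this model the new error equals α(c − c̄)²/(a + 2α(c̄ + u)), where c is the exact root, so each error is proportional to the square of the previous one, which is the quadratic convergence expected from the Newton method.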
All diffusion terms of (2.61) with second-order derivatives are nonlinear because the coefficients $\lambda^x(u)$, $\lambda^y(u)$ and $\lambda^z(u)$ depend on the temperature u. As a rule, working media used in engineering applications have a complicated dependence of their thermophysical properties on temperature (and pressure). Formulating the equations of state can be a very difficult scientific problem, and the thermal properties are often computed by black-box software. In general, we will suppose that the analytical dependence of the coefficients $\lambda^x(u)$, $\lambda^y(u)$ and $\lambda^z(u)$ on u is unknown. The most commonly used approach for linearization of these terms is the method of lagging coefficients: the coefficients $\lambda^x(u)$, $\lambda^y(u)$ and $\lambda^z(u)$ are computed using the previous iterate of u. The coefficients $\langle\lambda^x\rangle$, $\langle\lambda^y\rangle$, $\langle\lambda^z\rangle$, $\langle\hat u\rangle_{\{ijk\}}$ and the right-hand side function $\langle R\rangle_{\{ijk\}}$ should be computed on each grid before the smoothing iterations. Coarse grid values of these coefficients and the function depend only on their approximation on the finest grid $G_1^0$ (Sect. 2.3). The coefficients $\lambda^x(u)$, $\lambda^y(u)$ and $\lambda^z(u)$ can be discontinuous functions (heat conductivity in a heterogeneous medium) and their approximation on the finest grid is
$$\langle\lambda^x\rangle_{x_{i-1}^f} = 2\,\frac{\lambda^x(c^{\varphi}_{i-1jk} + \hat\varphi_{i-1jk})\,\lambda^x(c^{\varphi}_{ijk} + \hat\varphi_{ijk})}{\lambda^x(c^{\varphi}_{i-1jk} + \hat\varphi_{i-1jk}) + \lambda^x(c^{\varphi}_{ijk} + \hat\varphi_{ijk})} , \tag{2.63a}$$
$$\langle\lambda^x\rangle_{x_{i}^f} = 2\,\frac{\lambda^x(c^{\varphi}_{i+1jk} + \hat\varphi_{i+1jk})\,\lambda^x(c^{\varphi}_{ijk} + \hat\varphi_{ijk})}{\lambda^x(c^{\varphi}_{i+1jk} + \hat\varphi_{i+1jk}) + \lambda^x(c^{\varphi}_{ijk} + \hat\varphi_{ijk})} , \tag{2.63b}$$
$$\langle\lambda^y\rangle_{y_{j-1}^f} = 2\,\frac{\lambda^y(c^{\varphi}_{ij-1k} + \hat\varphi_{ij-1k})\,\lambda^y(c^{\varphi}_{ijk} + \hat\varphi_{ijk})}{\lambda^y(c^{\varphi}_{ij-1k} + \hat\varphi_{ij-1k}) + \lambda^y(c^{\varphi}_{ijk} + \hat\varphi_{ijk})} , \tag{2.63c}$$
$$\langle\lambda^y\rangle_{y_{j}^f} = 2\,\frac{\lambda^y(c^{\varphi}_{ij+1k} + \hat\varphi_{ij+1k})\,\lambda^y(c^{\varphi}_{ijk} + \hat\varphi_{ijk})}{\lambda^y(c^{\varphi}_{ij+1k} + \hat\varphi_{ij+1k}) + \lambda^y(c^{\varphi}_{ijk} + \hat\varphi_{ijk})} , \tag{2.63d}$$
$$\langle\lambda^z\rangle_{z_{k-1}^f} = 2\,\frac{\lambda^z(c^{\varphi}_{ijk-1} + \hat\varphi_{ijk-1})\,\lambda^z(c^{\varphi}_{ijk} + \hat\varphi_{ijk})}{\lambda^z(c^{\varphi}_{ijk-1} + \hat\varphi_{ijk-1}) + \lambda^z(c^{\varphi}_{ijk} + \hat\varphi_{ijk})} , \tag{2.63e}$$
$$\langle\lambda^z\rangle_{z_{k}^f} = 2\,\frac{\lambda^z(c^{\varphi}_{ijk+1} + \hat\varphi_{ijk+1})\,\lambda^z(c^{\varphi}_{ijk} + \hat\varphi_{ijk})}{\lambda^z(c^{\varphi}_{ijk+1} + \hat\varphi_{ijk+1}) + \lambda^z(c^{\varphi}_{ijk} + \hat\varphi_{ijk})} . \tag{2.63f}$$
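Each formula in (2.63) is the harmonic mean of the coefficient evaluated at the two vertices adjacent to a face. A minimal sketch follows; the function name `face_coefficient` and the sample dependence λ(u) = 1 + u² are hypothetical and serve only to exercise the formula.

```python
def face_coefficient(lam, c_left, phi_left, c_right, phi_right):
    """Harmonic mean of lam(c + phi) over the two vertices sharing a face."""
    lam_left = lam(c_left + phi_left)
    lam_right = lam(c_right + phi_right)
    return 2.0 * lam_left * lam_right / (lam_left + lam_right)

# Hypothetical temperature dependence of the coefficient.
lam = lambda u: 1.0 + u * u

# Lagged correction c and approximation phi at two neighbouring vertices.
value = face_coefficient(lam, c_left=0.0, phi_left=1.0, c_right=1.0, phi_right=2.0)

# For a strongly discontinuous coefficient the harmonic mean stays close to
# twice the smaller of the two values.
jump = face_coefficient(lambda u: u, 0.0, 1.0, 0.0, 1000.0)
```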
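The approximation to the solution $\hat u$ averaged over the control volume $V_{\{ijk\}}^l$ is evaluated on the finest grid by the midpoint formula
$$\langle\hat u\rangle_{ijk} = \frac{1}{h_x h_y h_z}\int_{x_{i-1}^f}^{x_i^f}\int_{y_{j-1}^f}^{y_j^f}\int_{z_{k-1}^f}^{z_k^f}\hat u(x, y, z)\,dz\,dy\,dx \approx \hat\varphi_{ijk} . \tag{2.64}$$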
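Approximation of the right-hand side function $\langle R\rangle_{\{ijk\}}$ on the finest grid is the residual:
$$\begin{aligned}
\langle R\rangle_{ijk} &= \frac{1}{h_x h_y h_z}\int_{x_{i-1}^f}^{x_i^f}\int_{y_{j-1}^f}^{y_j^f}\int_{z_{k-1}^f}^{z_k^f}R(x, y, z)\,dz\,dy\,dx \\
&\approx -\langle\lambda^x\rangle_{x_i^f}\frac{\hat\varphi_{i+1jk} - \hat\varphi_{ijk}}{h_x^2} + \langle\lambda^x\rangle_{x_{i-1}^f}\frac{\hat\varphi_{ijk} - \hat\varphi_{i-1jk}}{h_x^2} \\
&\quad - \langle\lambda^y\rangle_{y_j^f}\frac{\hat\varphi_{ij+1k} - \hat\varphi_{ijk}}{h_y^2} + \langle\lambda^y\rangle_{y_{j-1}^f}\frac{\hat\varphi_{ijk} - \hat\varphi_{ij-1k}}{h_y^2} \\
&\quad - \langle\lambda^z\rangle_{z_k^f}\frac{\hat\varphi_{ijk+1} - \hat\varphi_{ijk}}{h_z^2} + \langle\lambda^z\rangle_{z_{k-1}^f}\frac{\hat\varphi_{ijk} - \hat\varphi_{ijk-1}}{h_z^2} - f(x_i^v, y_j^v, z_k^v) + \alpha\,\hat\varphi_{ijk}^2 .
\end{aligned} \tag{2.65}$$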
A starting guess $\hat\varphi_{ijk}^{(0)}$ should be given before the multigrid iterations. The known lagging correction (correction computed on the coarser level l + 1) is denoted as $c_{ijk}^{(l+1)}$, or
$$c_{ijk}^{(l+1)} = \begin{cases} c^{\varphi}_{ijk} , & 0 \le l < L_3^+ , \\ 0 , & l = L_3^+ \end{cases}$$
for the nonlinear smoothing iterations. Then approximation of the averaged coefficient $\langle\lambda^x\rangle_{x_{i-1}^f}$ (2.63a) on the finest grid becomes
$$\langle\lambda^x\rangle_{x_{i-1}^f} = 2\,\frac{\lambda^x(c_{i-1jk}^{(l+1)} + \hat\varphi_{i-1jk})\,\lambda^x(c_{ijk}^{(l+1)} + \hat\varphi_{ijk})}{\lambda^x(c_{i-1jk}^{(l+1)} + \hat\varphi_{i-1jk}) + \lambda^x(c_{ijk}^{(l+1)} + \hat\varphi_{ijk})} .$$
Since multigrid iterations of RMT start on the coarsest level ($l = L_3^+ \Rightarrow c_{ijk}^{(l+1)} = 0$), the first approximation of the coefficient $\langle\lambda^x\rangle_{x_{i-1}^f}$ (2.63a) on the finest grid reduces to
$$\langle\lambda^x\rangle_{x_{i-1}^f} = 2\,\frac{\lambda^x(\hat\varphi_{i-1jk})\,\lambda^x(\hat\varphi_{ijk})}{\lambda^x(\hat\varphi_{i-1jk}) + \lambda^x(\hat\varphi_{ijk})} .$$
The starting guess should be chosen carefully for nonlinear problems. For example, if $\lambda^x(u) = 1/u^a$ (a > 0), then the starting guess $u^{(0)} = 0$, which is often used in linear cases, is not admissible. The simplest way is to input a starting guess not only for the solution, but also for the coefficients such as $\langle\lambda^x\rangle_{x_{i-1}^f}$, for example
$$\langle\lambda^x\rangle_{x_{i-1}^f} = \begin{cases} \langle\lambda^x\rangle_{x_{i-1}^f}^{(0)} , & q = 0 \text{ and } l = L_3^+ , \\[2mm] 2\,\dfrac{\lambda^x(c_{i-1jk}^{(l+1)} + \hat\varphi_{i-1jk})\,\lambda^x(c_{ijk}^{(l+1)} + \hat\varphi_{ijk})}{\lambda^x(c_{i-1jk}^{(l+1)} + \hat\varphi_{i-1jk}) + \lambda^x(c_{ijk}^{(l+1)} + \hat\varphi_{ijk})} , & \text{otherwise} , \end{cases} \tag{2.66}$$
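where $\langle\lambda^x\rangle_{x_{i-1}^f}^{(0)}$ is a starting guess for the coefficient $\langle\lambda^x\rangle$ taken from physical considerations. Other coefficients (2.63b)–(2.63f) are computed in the same manner. Smoothing on each level l ($l = L_3^+, \dots, 0$) can be represented as the following operations:
1) computation of the coefficients (2.63), (2.64) and the right-hand side function (2.65) using the known lagging correction $c_{ijk}^{(l+1)}$ on the finest grid, where the coefficients $\langle\lambda^x\rangle$, $\langle\lambda^y\rangle$ and $\langle\lambda^z\rangle$ are redefined as in (2.66);
2) computation of the averaged values $\langle\lambda^x\rangle$, $\langle\lambda^y\rangle$, $\langle\lambda^z\rangle$, $\langle\hat u\rangle$ and $\langle R\rangle$ on all grids of level l;
3) reduction of the linearized scheme
$$\begin{aligned}
&\langle\lambda^x\rangle_{x_{\{i\}}^f}\frac{c^{\varphi}_{\{i+1jk\}} - c^{\varphi}_{\{ijk\}}}{h_x^2\,3^{2l_x}} - \langle\lambda^x\rangle_{x_{\{i-1\}}^f}\frac{c^{\varphi}_{\{ijk\}} - c^{\varphi}_{\{i-1jk\}}}{h_x^2\,3^{2l_x}} \\
&\quad + \langle\lambda^y\rangle_{y_{\{j\}}^f}\frac{c^{\varphi}_{\{ij+1k\}} - c^{\varphi}_{\{ijk\}}}{h_y^2\,3^{2l_y}} - \langle\lambda^y\rangle_{y_{\{j-1\}}^f}\frac{c^{\varphi}_{\{ijk\}} - c^{\varphi}_{\{ij-1k\}}}{h_y^2\,3^{2l_y}} \\
&\quad + \langle\lambda^z\rangle_{z_{\{k\}}^f}\frac{c^{\varphi}_{\{ijk+1\}} - c^{\varphi}_{\{ijk\}}}{h_z^2\,3^{2l_z}} - \langle\lambda^z\rangle_{z_{\{k-1\}}^f}\frac{c^{\varphi}_{\{ijk\}} - c^{\varphi}_{\{ijk-1\}}}{h_z^2\,3^{2l_z}} \\
&\quad - 2\alpha\big(\bar c^{\varphi}_{\{ijk\}} + \langle\hat u\rangle_{\{ijk\}}\big)\,c^{\varphi}_{\{ijk\}} + \alpha\,(\bar c^{\varphi}_{\{ijk\}})^2 = \langle R\rangle_{\{ijk\}}
\end{aligned} \tag{2.67}$$
to the general SLAE (2.46); and
4) (post)smoothing: a few smoothing iterations on all grids of this level l to reduce high-frequency error components.

After nonlinear smoothing on the finest grid (l = 0), the approximation to the solution is updated
$$\hat\varphi_{ijk} := \hat\varphi_{ijk} + c^{\varphi}_{ijk}$$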
(equality in the sense of assignment), and the correction is then reset to zero, $c^{\varphi}_{ijk} = 0$. In general, it is necessary to ensure convergence of the nonlinear iterations. In particular, the Newton method converges at a quadratic rate only if the initial guess is close enough to the sought solution (Sect. 1.9). Therefore, the underrelaxation
$$c^{\varphi}_{\{ijk\}} := \bar c^{\varphi}_{\{ijk\}} + \varpi\,\big(c^{\varphi}_{\{ijk\}} - \bar c^{\varphi}_{\{ijk\}}\big)$$
is often used for solving nonlinear equations, where $c^{\varphi}_{\{ijk\}}$ on the right-hand side is the updated coarse grid correction after the smoothing iteration, $\bar c^{\varphi}_{\{ijk\}}$ is the previous value of the correction and ϖ ∈ (0, 1] is the underrelaxation parameter.

Let $(c^{\varphi}_{\{ijk\}})^{(0)}$ be a starting guess and $(c^{\varphi}_{\{ijk\}})^{(1)}$ be an approximation to the solution of the SLAE $A_l^{(\nu)}(c^{\varphi})^{(\nu)} = b_l^{(\nu)}$ after the first nonlinear iteration with ϖ = 1. Convergence of the nonlinear smoothing iterations means
$$\|A_l^{(1)}(c^{\varphi})^{(1)} - b_l^{(1)}\| < \|A_l^{(0)}(c^{\varphi})^{(0)} - b_l^{(0)}\| .$$
Loss of monotone reduction of the residual vector norm will be regarded as divergence of the nonlinear smoothing iterations. In this case it is necessary to return to the starting guess $(c^{\varphi}_{\{ijk\}})^{(0)}$ and repeat the nonlinear smoothing iteration with a smaller value of the underrelaxation parameter ϖ. Then the condition of monotone convergence is checked again. Thus it is possible to determine a value of the parameter ϖ guaranteeing convergence of the nonlinear iterations, if such a value exists. However, computations with ϖ < 1 should be performed only until a sufficiently accurate approximation to the solution is obtained. Then computations continue with ϖ = 1 to use the high (quadratic) convergence rate of the Newton method.

Smoothing with smaller values of the parameter ϖ results in deterioration of the convergence rate, but smoothing with higher values of ϖ can lead to divergence of the nonlinear iterations. Therefore, the main difficulty in development of a robust algorithm for nonlinear problems is to find the optimal value of the underrelaxation parameter $\varpi_{opt}$ for nonlinear iterations. As a rule, proximity of the solutions on neighboring levels of RMT ensures convergence of Newton iterations on the finer levels ($l < L_3^+$). Besides, grids of the same level are nearly identical, so it is possible to assume that the optimal value $\varpi_{opt}$ is the same for all grids of this level. Therefore, we can perform computational experiments on a single grid of the level to obtain $\varpi_{opt}$ and use the obtained value on the other grids. Taking into account the number of grids forming this level, the extra computational work is negligible.

We use a block ordering of the unknowns to write the discrete equation (2.67) in matrix form. Block ordering was considered for 2D problems in Sect. 1.5. In general, the unknowns can be incorporated in blocks $c^{\varphi}_{\{i+\tilde i\; j+\tilde j\; k+\tilde k\}}$. The Gauss–Seidel method will be used as a smoother in RMT with the following orderings of the unknowns:
1) point ordering 1 × 1 × 1 (one unknown): $\tilde i = \tilde j = \tilde k = 0$;
2) block ordering 3 × 3 × 3 (27 unknowns): $\tilde i = -1, 0, 1$; $\tilde j = -1, 0, 1$; $\tilde k = -1, 0, 1$; and
3) block ordering 5 × 5 × 1 (25 unknowns): $\tilde i = -2, -1, 0, 1, 2$; $\tilde j = -2, -1, 0, 1, 2$; $\tilde k = 0$.
These blocks of unknowns are shown in Fig. 2.14.
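The underrelaxation strategy described earlier in this section (return to the previous value and repeat the iteration with a smaller ϖ whenever the residual norm fails to decrease) can be sketched on a scalar model. Everything specific below is an assumption made for illustration: the model equation arctan(c) = 0, for which the full Newton step diverges from c = 2, the rule of halving ϖ, the lower limit `omega_min`, and the function names.

```python
import math

def underrelaxed_iterations(step, residual_norm, c0, n_iter=8, omega_min=1e-3):
    """Accept c_bar + omega*(proposal - c_bar) only if the residual decreases."""
    c_bar = c0
    omegas = []
    for _ in range(n_iter):
        r_old = residual_norm(c_bar)
        if r_old == 0.0:
            break
        proposal = step(c_bar)               # result of one nonlinear iteration
        omega = 1.0                          # always try the full step first
        while True:
            c_new = c_bar + omega * (proposal - c_bar)
            if residual_norm(c_new) < r_old or omega < omega_min:
                break
            omega *= 0.5                     # assumed reduction rule: halve omega
        omegas.append(omega)
        c_bar = c_new
    return c_bar, omegas

# Model problem: solve atan(c) = 0 by the Newton method, c - g(c) / g'(c).
newton_step = lambda c: c - math.atan(c) * (1.0 + c * c)
residual = lambda c: abs(math.atan(c))

c_final, omegas = underrelaxed_iterations(newton_step, residual, c0=2.0)
```

In this run only the first iteration needs ϖ < 1; once the iterate is close enough to the solution, the full step ϖ = 1 is accepted and the fast Newton convergence is recovered.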
Fig. 2.14: Blocks of the unknowns: 3 × 3 × 3 (left) and 5 × 5 × 1 (right)
Let the computational domain Ω be a unit cube and $G_1^0$ be a uniform finest grid 101 × 101 × 101 ($h_x = h_y = h_z = 1/100$). The convergence rate of RMT is estimated by the average reduction factor of the residual (1.61) after five multigrid iterations (q = 5) with zero starting guess. The goal of the numerical experiments is to study the influence of the number of smoothing iterations (ν) and the ordering of the unknowns on the convergence rate of RMT. Other multigrid components are independent of the given problems.

Poisson equation: (2.61) with $\lambda^x = \lambda^y = \lambda^z = 1$, α = 0. Assume that the exact solution of the Poisson equation (2.61) is
$$u(x, y, z) = e^{x+y+z} . \tag{2.68}$$
Substitution of the exact solution (2.68) into the original equation (2.61) defines the right-hand side function f(x, y, z) and the boundary conditions. The error of the numerical solution $E^{(q)}$ is defined in the traditional manner:
$$E^{(q)} = \max_{ijk}\,\big|\hat\varphi_{ijk}^{(q)} - u(x_i^v, y_j^v, z_k^v)\big| .$$
Table 2.2 presents the convergence results; the execution time of the multigrid algorithm with four (post)smoothing Gauss–Seidel iterations (ν = 4) and point ordering of unknowns (1 × 1 × 1) is taken as a work unit (WU = 1). The obtained values $\rho_{MG}^{(q)} < 0.1$ (1.61) after five multigrid iterations (q = 5) demonstrate the high convergence rate of RMT.

Tab. 2.2: Convergence results in solving the Dirichlet problem for the Poisson equation

Blocks    ν    E^(q)             ρ_MG^(q)    WU
1×1×1     4    7.30 ⋅ 10^{-6}    0.084       1.00
1×1×1     5    7.30 ⋅ 10^{-6}    0.033       1.16
1×1×1     6    7.30 ⋅ 10^{-6}    0.023       1.35
3×3×3     3    7.30 ⋅ 10^{-6}    0.055       1.03
3×3×3     4    7.30 ⋅ 10^{-6}    0.033       1.32
3×3×3     5    7.30 ⋅ 10^{-6}    0.021       1.62

Anisotropic equation: (2.61) with $\lambda_x = \lambda_y = 1$, $\lambda_z = 0.1$, α = 0. The Gauss–Seidel method with the point (1 × 1 × 1) and block (3 × 3 × 3 and 5 × 5 × 1) orderings of unknowns is used as a smoother for this anisotropic equation with exact solution (2.68). Table 2.3 presents the convergence results. The smoothing effect of the Gauss–Seidel method with the point ordering of unknowns is very poor because pointwise relaxation has a smoothing effect only with respect to the 'strong coupling' in the operator [25]. Previously, similar results were obtained for 2D Dirichlet problems (Sect. 1.5). Block ordering 3 × 3 × 3 gives better results than point ordering, but the best convergence rate (i.e. the smallest value of $\rho_{MG}^{(q)}$) is obtained for the ordering of unknowns that takes the anisotropy into account ($\lambda_x = \lambda_y \gg \lambda_z$).

Tab. 2.3: Convergence results in solving the Dirichlet problem for the anisotropic equation
Blocks    ν    E^(q)             ρ_MG^(q)    WU
1×1×1     4    3.29 ⋅ 10^{-3}    0.752       1.00
1×1×1     5    1.44 ⋅ 10^{-3}    0.637       1.16
1×1×1     6    6.25 ⋅ 10^{-4}    0.538       1.35
3×3×3     3    1.77 ⋅ 10^{-4}    0.522       1.03
3×3×3     4    2.73 ⋅ 10^{-5}    0.333       1.32
3×3×3     5    9.90 ⋅ 10^{-6}    0.216       1.62
5×5×1     3    1.08 ⋅ 10^{-5}    0.268       0.81
5×5×1     4    7.87 ⋅ 10^{-6}    0.142       1.01
5×5×1     5    7.81 ⋅ 10^{-6}    0.081       1.21
However, in general, the choice of the block of unknowns ensuring the highest convergence rate is not as obvious as in the example considered. Often the coefficients $\lambda_x$, $\lambda_y$ and $\lambda_z$ differ only in subdomains of Ω; this difference can be caused by the physical nature of the modeled process and/or by the properties of the computational grid. In such cases, different blocks of unknowns as shown in Fig. 1.10 should be used depending on the entries of the coefficient matrix.

Equation with jumping coefficients: (2.61) with α = 0. Consider a heat conductivity problem in a domain consisting of dissimilar materials, e.g. a cubic subdomain $\tilde{\Omega}$ in Ω as shown in Fig. 2.15. In this case, the heat conductivity coefficient is defined by
$$\lambda_x(x, y, z) = \lambda_y(x, y, z) = \lambda_z(x, y, z) = \begin{cases} \lambda_i , & (x, y, z) \in \tilde{\Omega} , \\ \lambda_e , & (x, y, z) \notin \tilde{\Omega} . \end{cases}$$
Fig. 2.15: Geometry of the problem with jumping coefficients (labels in the figure: coefficients $\lambda_e$ and $\lambda_i$; lengths 1 and 1/3)
Assume that the problem has the exact solution (2.68) if $\lambda_i = \lambda_e = 1$. Table 2.4 presents the convergence results, where the Gauss–Seidel smoothers with the point and block orderings of the unknowns demonstrate almost the same convergence rate.

Tab. 2.4: Convergence results in solving the Dirichlet problem for the equation with jumping coefficients

Blocks    λe    λi         ν    ρ^(q)    WU
1×1×1     1     10^{+0}    6    0.023    1.92
1×1×1     1     10^{-1}    6    0.228    1.92
1×1×1     1     10^{-2}    6    0.198    1.92
1×1×1     1     10^{-3}    6    0.227    1.92
1×1×1     1     10^{-4}    6    0.273    1.92
1×1×1     1     10^{-5}    6    0.410    1.92
3×3×3     1     10^{+0}    5    0.021    2.05
3×3×3     1     10^{-1}    5    0.193    2.05
3×3×3     1     10^{-2}    5    0.170    2.05
3×3×3     1     10^{-3}    5    0.150    2.05
3×3×3     1     10^{-4}    5    0.149    2.05
3×3×3     1     10^{-5}    5    0.149    2.05
First nonlinear equation: (2.61) with α = 0. Assume that the coefficients $\lambda_x$, $\lambda_y$ and $\lambda_z$ are defined by
$$\lambda_x(u) = u^{-a} , \qquad \lambda_y(u) = u^{-b} , \qquad \lambda_z(u) = u^{-c} ,$$
where a, b and c are some positive constants. Multigrid iterations of RMT start with the zero starting guess $\hat{\varphi}^{(0)} = 0$ and the unit starting guess for the coefficients $(\lambda_x)^{(0)} = (\lambda_y)^{(0)} = (\lambda_z)^{(0)} = 1$, assuming that the analytical dependence of $\lambda_x(u)$, $\lambda_y(u)$ and $\lambda_z(u)$ is unknown. Table 2.5 presents the convergence results for a = 0.25, b = 0.50 and c = 0.75 and the exact solution (2.68).
Tab. 2.5: Convergence results in solving the Dirichlet problem for the first nonlinear equation

Blocks    a       b       c       ν    E^(q)             ρ^(q)    WU
1×1×1     0.25    0.50    0.75    4    9.00 ⋅ 10^{-6}    0.317    1.44
1×1×1     0.25    0.50    0.75    5    2.83 ⋅ 10^{-6}    0.169    1.77
1×1×1     0.25    0.50    0.75    6    2.40 ⋅ 10^{-6}    0.115    2.01
3×3×3     0.25    0.50    0.75    3    6.01 ⋅ 10^{-6}    0.256    1.55
3×3×3     0.25    0.50    0.75    4    2.27 ⋅ 10^{-6}    0.110    1.80
3×3×3     0.25    0.50    0.75    5    2.14 ⋅ 10^{-6}    0.075    2.06
Second nonlinear equation: (2.61) with $\lambda_x = \lambda_y = \lambda_z = 1$. Assume that the boundary value problem (2.61) has the exact solution (2.68). The nonlinear smoothing consists of external iterations of the Gauss–Seidel method and internal iterations of the Newton method (Sect. 1.9). Table 2.6 presents the convergence results, where the nonlinear smoothing iterations are performed without underrelaxation.

Tab. 2.6: Convergence results in solving the Dirichlet problem for the second nonlinear equation

Blocks    ν    α     E^(q)             ρ_MG^(q)    WU
1×1×1     4    0     7.30 ⋅ 10^{-6}    0.084       1.00
1×1×1     4    5     7.30 ⋅ 10^{-6}    0.043       1.06
1×1×1     4    10    7.30 ⋅ 10^{-6}    0.013       1.16
3×3×3     3    0     7.30 ⋅ 10^{-6}    0.055       1.03
3×3×3     3    5     7.30 ⋅ 10^{-6}    0.020       1.08
3×3×3     3    10    7.30 ⋅ 10^{-6}    0.006       1.14
All model problems are solved without optimization of the components of RMT. Close-to-optimal algorithmic complexity is achieved in all cases. However, it is necessary to adapt the ordering of the unknowns for anisotropic problems. The application of the block ordering of unknowns has the advantage that PDE systems (with features of saddle-point type) can be solved as a scalar Poisson-like equation. Application of RMT for solving the Navier–Stokes equations will be discussed in Chapter 4.
2.8 Unstructured grids

There are two different approaches to boundary-fitted grids. In the first, coordinate transformations are used to obtain simple (for example rectangular) domains, and correspondingly simple (rectangular) grids. Here the differential (and/or the discrete) equations are transformed to the new curvilinear coordinates. In the second approach, the computations are performed in the physical domain with the original (nontransformed) equations [25]. Here we concentrate on the second approach.

Unstructured grids are now widely used for discretization of boundary value problems in complicated domains. For a given unstructured grid, it is usually not difficult to define a sequence of finer grids, but it may be difficult to define a sequence of reasonably coarser grids [25]. We can construct a multigrid algorithm with problem-dependent components for solving linear boundary value problems on unstructured grids. As shown in Sect. 2.6, the original PDE can be rewritten in the following Σ-modified form: $Lc = f - L\hat{u}$. The Σ-modification can be considered as a variant of the defect correction (Sect. 1.9). In general, the left- and right-hand sides of the Σ-modified equation $Lc = f - L\hat{u}$ can be discretized not only with different approximation orders, but also by different discretization methods on computational grids of different types.

Assume that an unstructured grid has been generated for discretization of a boundary value problem. We can generate an auxiliary structured grid (ASG) for computation of the coarse grid correction. To illustrate exactly what the approach means, we consider the following example: the unstructured grid has been generated in the unit square for discretization of the right-hand side of the Σ-modified equation by the finite element method. The original unstructured grid (OUG) is formed by triangles with vertices $\Upsilon_k(x_k, y_k)$, k = 1, 2, ..., K (here K = 409) as shown in Fig. 2.16.
Fig. 2.16: Unstructured grid in the unit square
We generate ASGs for discretization of the Σ-modified boundary value problem by the finite volume method. For this purpose, we form an ordered set of the abscissas of the triangle vertices (without coincident vertices)
$$\tilde{\Upsilon}_x = \{ x_i \mid 0 = x_1 < x_2 < \dots < x_i < x_{i+1} < \dots < x_{\tilde{K}} = 1 \} , \qquad \tilde{K} \le K .$$
Here $\tilde{K} = 368$. After that we generate a uniform grid on the unit segment
$$\hat{\Upsilon}_x = \left\{ \hat{x}_i \;\middle|\; \hat{x}_i = \frac{i-1}{\tilde{K}-1} , \; i = 1, 2, \dots, \tilde{K} \right\} .$$
The control function $F : \hat{x}_i \to x_i$ maps the uniform grid $\hat{\Upsilon}_x$ onto the ordered set of the vertex abscissas $\tilde{\Upsilon}_x$ (Fig. 2.17). We require that the number of vertices on ASG be the same as (or slightly higher than) that on OUG. For this purpose, we generate another uniform grid
$$\bar{\Upsilon}_x = \left\{ \bar{x}_i \;\middle|\; \bar{x}_i = \frac{i-1}{\bar{K}-1} , \; i = 1, 2, \dots, \bar{K} \right\} , \qquad \bar{K} = \left[ \sqrt{\tilde{K}} \right] + 1 ,$$
where the square brackets denote the integer part; here $\bar{K} = 21$. It is possible to use a spline interpolant of the control function F for determination of the vertices of ASG. Figures 2.17 and 2.18 show the control function F and the nonuniform grid $\bar{\Upsilon}_x^{ASG}$.

Fig. 2.17: Control function F (horizontal axis $\hat{x}$, vertical axis $\tilde{x}$)

Fig. 2.18: Vertex abscissas of ASG (horizontal axis $\bar{x}$, vertical axis $\bar{x}_{ASG}$)

The Σ-modified boundary value problem $Lc = f - L\hat{u}$ can be discretized by different methods: the left-hand side $Lc$ is discretized on ASG by the finite volume method and the right-hand side $f - L\hat{u}$ is discretized on OUG by the finite element method. The iterative method can be written in the standard form. All quantities related to OUGs and ASGs carry the subscripts ∆ and ◻ respectively. Then the discrete analogue of the Σ-modified boundary value problem $Lc = f - L\hat{u}$ becomes
$$A_\square c^\varphi_\square = \prod_{\Delta\to\square} \left( b_\Delta - A_\Delta \hat{\varphi}^{(q)}_\Delta \right) , \tag{2.69}$$
where $b_\Delta - A_\Delta \hat{\varphi}^{(q)}_\Delta$ is a residual computed on OUG, $c^\varphi_\square$ is a correction computed on ASG, and $\prod_{\Delta\to\square}$ is a transfer operator transferring the residual from OUG to ASG (Fig. 2.19).
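The construction of the ASG vertex abscissas from the control function F, described above, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the name `asg_abscissas` is hypothetical, the vertex list is made-up sample data, and a piecewise-linear interpolant of F is used in place of the spline interpolant mentioned in the text.

```python
import math


def asg_abscissas(vertex_x):
    """Return ASG vertex abscissas obtained from the OUG vertex abscissas.

    Uses a piecewise-linear interpolant of the control function F
    (an assumption; the text suggests a spline interpolant).
    """
    xs = sorted(set(vertex_x))            # ordered set without coincident abscissas
    k_tilde = len(xs)                     # needs at least two distinct abscissas
    k_bar = math.isqrt(k_tilde) + 1       # integer part of the square root, plus one

    def control(x_hat):
        # F maps the uniform node (i-1)/(k_tilde-1) to xs[i-1];
        # values between the nodes are linearly interpolated.
        t = x_hat * (k_tilde - 1)
        i = min(int(t), k_tilde - 2)
        w = t - i
        return (1.0 - w) * xs[i] + w * xs[i + 1]

    return [control(i / (k_bar - 1)) for i in range(k_bar)]


# Made-up abscissas of unstructured-grid vertices in the unit segment.
sample = [0.0, 0.1, 0.1, 0.4, 0.5, 0.9, 1.0, 0.25, 0.75, 0.6]
asg = asg_abscissas(sample)
print(asg)
```

The resulting abscissas are clustered where the original vertices are clustered, which is the purpose of the control function.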
The computational algorithm based on the use of ASG for computation of the correction $c^\varphi_\square$ can be represented as the following operations:
1. presmoothing on OUG;
2. computation of the residual on OUG;
3. interpolation of the residual from OUG to ASG;
4. computation of the coarse grid correction on ASG;
5. interpolation of the coarse grid correction from ASG to OUG;
6. postsmoothing on OUG; and
7. convergence check; continue (go to item 1) if necessary.

By analogy with the two-grid algorithms discussed in Sect. 1.6, we have
$$\hat{\varphi}^{(q+1)}_\Delta = S_\Delta^{\grave{\nu}} d_\Delta A_\Delta S_\Delta^{\acute{\nu}} \hat{\varphi}^{(q)}_\Delta + \left[ I - S_\Delta^{\grave{\nu}} d_\Delta A_\Delta S_\Delta^{\acute{\nu}} \right] A_\Delta^{-1} b_\Delta , \tag{2.70}$$
where the matrix $d_\Delta$ is given by
$$d_\Delta = A_\Delta^{-1} - \prod_{\square\to\Delta} A_\square^{-1} \prod_{\Delta\to\square} ,$$
and $\prod_{\square\to\Delta}$ is a transfer operator transferring the correction from ASG to OUG (Fig. 2.20).

Fig. 2.19: Transfer operator $\prod_{\Delta\to\square}$

Fig. 2.20: Transfer operator $\prod_{\square\to\Delta}$
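The seven operations of the ASG-based two-grid iteration can be written as a short driver routine. The sketch below is only an illustration of the control flow: all callables (`residual`, `smooth`, `to_asg`, `solve_asg`, `to_oug`) are hypothetical placeholders supplied by the user, and the toy problem at the end uses identity transfers and an exact solve, which is not representative of a real OUG/ASG pair.

```python
def two_grid_with_asg(phi, residual, smooth, to_asg, solve_asg, to_oug,
                      tol=1e-12, max_iter=50):
    """Two-grid iteration with an auxiliary structured grid (illustrative)."""
    for iteration in range(1, max_iter + 1):
        phi = smooth(phi)                          # 1. presmoothing on OUG
        r = residual(phi)                          # 2. residual on OUG
        r_asg = to_asg(r)                          # 3. residual: OUG -> ASG
        c_asg = solve_asg(r_asg)                   # 4. correction on ASG
        c = to_oug(c_asg)                          # 5. correction: ASG -> OUG
        phi = [p + ci for p, ci in zip(phi, c)]
        phi = smooth(phi)                          # 6. postsmoothing on OUG
        if max(abs(v) for v in residual(phi)) < tol:   # 7. convergence check
            return phi, iteration
    return phi, max_iter


# Toy problem: diagonal system diag(2, 4) * phi = (2, 8).
diag, rhs = [2.0, 4.0], [2.0, 8.0]
solution, iterations = two_grid_with_asg(
    phi=[0.0, 0.0],
    residual=lambda p: [b - a * x for a, b, x in zip(diag, rhs, p)],
    smooth=lambda p: p,                            # no smoothing in the toy case
    to_asg=lambda r: r,                            # identity transfer operators
    solve_asg=lambda r: [ri / a for ri, a in zip(r, diag)],
    to_oug=lambda c: c,
)
print(solution, iterations)
```

Passing the components as functions keeps the driver independent of how the residual is discretized on OUG and how the correction is computed on ASG.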
Iterations of the two-grid algorithm (2.70) are similar to the iterations of the two-grid algorithm (1.48) discussed in Sect. 1.6. OUG and ASG are shown in Fig. 2.21. By analogy with (1.50) and (1.51), we formulate extra smoothing and approximation properties for the method (2.70):
1. Extra smoothing property: there exists a monotonically decreasing function $\eta(\nu) : \mathbb{R}_+ \to \mathbb{R}_+$ such that $\eta(\nu) \to 0$ for $\nu \to \infty$ and
$$\| A_\Delta S_\Delta^{\nu} \| \le \eta(\nu) \| A_\Delta \| .$$
2. Extra approximation property: there exists a constant $C_{A_\Delta} > 0$ such that
$$\left\| A_\Delta^{-1} - \prod_{\square\to\Delta} A_\square^{-1} \prod_{\Delta\to\square} \right\| \le C_{A_\Delta} \| A_\Delta \|^{-1} .$$
Extra smoothing and approximation properties have the same meaning as the basic ones in (1.50).

If the extra smoothing and approximation properties hold, then the two-grid iteration matrix norm can be estimated as
$$\| S_\Delta^{\grave{\nu}} d_\Delta A_\Delta S_\Delta^{\acute{\nu}} \|_2 \le \bar{C} \| d_\Delta A_\Delta S_\Delta^{\grave{\nu}+\acute{\nu}} \| \le \bar{C} \| A_\Delta S_\Delta^{\grave{\nu}+\acute{\nu}} \| \cdot \| d_\Delta \| \le \bar{C} C_{A_\Delta} \eta(\grave{\nu} + \acute{\nu}) ,$$
i.e. the norm of the exact error vector decreases monotonically after the two-grid iterations, independently of the discretization parameter.

Fig. 2.21: Original unstructured grid (OUG) and auxiliary structured grids (ASG)

Algorithmic complexity depends on the complexity of the iterative method used for solving SLAE (2.69) on ASG. If the number of vertices N is the same on OUG and ASG and the complexity of the iterative method for SLAE (2.69) is O(N) arithmetic operations, then the total algorithmic complexity of the two-grid iterations is W = O(N) arithmetic operations. Optimization of the multigrid algorithms so that their components satisfy the extra smoothing and approximation properties makes it possible to use the approach for solving boundary value problems on unstructured grids. This approach restricts the choice of the algorithm components, but not the properties of the computational grid.

Another approach restricts the properties of the computational grid, but not the choice of the algorithm components. For example, we can generate an unstructured grid so that the grid can be reconstructed into a boundary-fitted logically rectangular grid as shown in Fig. 2.22. This results in trivial intergrid operators
$$\prod_{\square\to\Delta} = \prod_{\Delta\to\square} = I .$$
The boundary-fitted logically rectangular grid with a finite volume discretization of the Σ-modified boundary value problems is used for determination of the coarse grid correction. Finite volume discretization of partial differential equations on such a grid has been discussed in [14, 15].
Fig. 2.22: Unstructured grid (top) and associated boundary-fitted (logically rectangular) grid with control volumes (CV) (bottom)
At present, the formalization of the generation of computational grids with the required properties in complicated domains is the most difficult problem for black-box software. If such a grid has been generated in the complicated domain, then a large class of boundary value problems can be solved by the single-grid Gauss–Seidel method with close-to-optimal complexity.
2.9 Remarks on multigrid software

First, we assume that the finest grid $G_1^0$ of size $(N_x^0 + 1) \times (N_y^0 + 1) \times (N_z^0 + 1)$ for solving 3D boundary value problems can be represented as a product of three 1D grids $G_{1x}^0$, $G_{1y}^0$ and $G_{1z}^0$ in the spatial directions x, y and z respectively:
$$G_1^0 = \{ (x^v_i, x^f_i; y^v_j, y^f_j; z^v_k, z^f_k) \mid (x^v_i, x^f_i) \in G_{1x}^0 , \; (y^v_j, y^f_j) \in G_{1y}^0 , \; (z^v_k, z^f_k) \in G_{1z}^0 \} ,$$
where
$$G_{1x}^0 = \{ x^v_i , \; i = 1, \dots, N_x^0 + 1 ; \; x^f_i , \; i = 1, \dots, N_x^0 \} ,$$
$$G_{1y}^0 = \{ y^v_j , \; j = 1, \dots, N_y^0 + 1 ; \; y^f_j , \; j = 1, \dots, N_y^0 \} ,$$
$$G_{1z}^0 = \{ z^v_k , \; k = 1, \dots, N_z^0 + 1 ; \; z^f_k , \; k = 1, \dots, N_z^0 \} .$$

Similar grids were discussed in Chapter 1. Six real arrays ($x^v_i$, $x^f_i$, $y^v_j$, $y^f_j$, $z^v_k$, $z^f_k$) are used for storage of the finest grid. Six integer arrays are used for storage of the index mapping. Consider the arrangement of the array for storage of the index mapping of the points $x^v$. This array is divided into $L_x^+ + 1$ segments in accordance with the number of levels in this spatial direction, i.e. $L_x^+$ coarse levels plus the finest grid (zero level). The serial number of the coarsest level is defined by (2.4). Each segment is divided into $3^l$, $l = 0, \dots, L_x^+$, subsegments in accordance with the number of grids forming level l. The serial number of an array element counted from the subsegment origin is the point index on the coarse grid (i.e. i in $x^v_{\{i\}}$). The array element with this serial number is the index of the corresponding point of the finest grid (i.e. {i} in $x^v_{\{i\}}$). Integer array-pointers are used for convenient work with this 1D array. These array-pointers contain information about the origin of each subsegment. The arrangement of the integer array MG_v is shown in Fig. 2.23. We introduce an integer variable Dir (Integer):
$$\mathrm{Dir} = \begin{cases} 1 , & \text{direction } x , \\ 2 , & \text{direction } y , \\ 3 , & \text{direction } z . \end{cases}$$
Fig. 2.23: Arrangement of integer array for storage of indexes of the points $x^v$ (the array MG_v(1,...) is divided into segments LevelX = 0, 1, 2 and subsegments GridsX = 1, 2, 3, ...; the pointers Poin_V(1,0,1), Poin_V(1,1,1), Poin_V(1,1,2), Poin_V(1,1,3), Poin_V(1,2,1) mark the subsegment origins)
The array-pointer takes the form Poin_V(Dir,Level,Grid), where the integer parameter Grid is the number of a grid of the level Level (Integer). Array MG_v(Dir,...) contains all information about the indexes of the points $x^v$ (Dir=1), $y^v$ (Dir=2) and $z^v$ (Dir=3). Array MG_f(Dir,...), intended for storage of the indexes of the points $x^f$ (Dir=1), $y^f$ (Dir=2) and $z^f$ (Dir=3), has the same arrangement. Computation of the index mapping of points involves copying the corresponding subsegment to the integer arrays TxV (for points $x^v$) and TxF (for points $x^f$) as shown in Fig. 2.24. The notation $x^v_{\{i\}}$ or $x^f_{\{i\}}$ means XV(TxV(i)) and XF(TxF(i)) in the computer implementation, where the arrays XV and XF are intended for storage of the points of the finest grid. The spatial location of the vertices is given by the coordinates $(x^v_{\{i\}}, y^v_{\{j\}}, z^v_{\{k\}})$ or (XV(TxV(i)),YV(TyV(j)),ZV(TzV(k))).
Fig. 2.24: Computation of the index mapping of points $x^v$ of the second grid of the first level (the subsegment LevelX = 1, GridsX = 2 of MG_v(1,1,...) is copied to TxV(...))
For boundary-fitted logically rectangular grids discussed in Sect. 2.8, the spatial location of the vertices is given by the coordinates $(x^v_{\{ijk\}}, y^v_{\{ijk\}}, z^v_{\{ijk\}})$ or (XV(TxV(i),TyV(j),TzV(k)), YV(TxV(i),TyV(j),TzV(k)), ZV(TxV(i),TyV(j),TzV(k))).
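The storage scheme for the index mapping (one flat integer array per direction, plus pointers to the origin of each subsegment) can be sketched as follows. This is a simplified illustration, not the book's implementation: the names `build_index_map` and `grid_indices` are hypothetical, fictitious points outside the domain are ignored, and grid number g of level l is assumed to contain the finest-grid points g, g + 3^l, g + 2·3^l, and so on.

```python
def build_index_map(n_points, n_levels):
    """Flat index array for one direction plus subsegment origins (illustrative)."""
    flat = []      # plays the role of MG_v(Dir,...) for a fixed direction
    origin = {}    # (level, grid) -> (offset into flat, number of points)
    for level in range(n_levels + 1):
        stride = 3 ** level
        for grid in range(1, stride + 1):
            # 1-based indexes of the finest-grid points that form this coarse grid
            indices = list(range(grid, n_points + 1, stride))
            origin[(level, grid)] = (len(flat), len(indices))
            flat.extend(indices)
    return flat, origin


def grid_indices(flat, origin, level, grid):
    """Copy one subsegment, as is done for the work array TxV."""
    start, count = origin[(level, grid)]
    return flat[start:start + count]


XV = [0.1 * i for i in range(10)]          # finest-grid coordinates; XV[m - 1] is point m
flat, origin = build_index_map(n_points=10, n_levels=2)
TxV = grid_indices(flat, origin, level=1, grid=2)
coords = [XV[m - 1] for m in TxV]          # analogue of XV(TxV(i))
print(TxV, coords)
```

Because every level contains each finest-grid point exactly once, each segment of the flat array has the same length as the finest grid.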
Now we consider approximation of the coefficients $\langle g \rangle_{\{i\}}$ (2.9), $\langle \lambda \rangle_{\{i\}}$ (2.11) and the right-hand side $\langle r \rangle_{\{i\}}$ (2.10) on the multigrid structure. As discussed in Sect. 2.3, the composite formula is used for their approximation on the finest grid, i.e. all coefficients and the right-hand side should be approximated on the finest grid. Since all control volumes on each coarse grid cover the computational domain, the computational effort for the coefficient approximation is the same for all grids. The number of computational grids of level l ($3^{dl}$) increases as a geometric progression with common ratio $3^d$ (d = 2, 3), so the amount of computations grows in the same geometric progression. However, using property 1 (p. 67) of the coarse grids and the additive property of the definite integral, we can propose a fast algorithm for approximation of the coefficients on the multigrid structure.

Consider approximation of the coefficient (2.9)
$$\langle g \rangle_{\{i\}} = \frac{1}{h 3^l} \int_{x^f_{\{i-1\}}}^{x^f_{\{i\}}} g(x) \, dx$$
in detail. We redefine the function g(x) assuming g(x) = 0 outside the domain Ω = [0, 1]:
$$\check{g}(x) = \begin{cases} 0 , & x \notin \Omega , \\ g(x) , & x \in \Omega . \end{cases}$$
We introduce a characteristic function $\Upsilon(x)$ as
$$\Upsilon(x) = \begin{cases} 0 , & x \notin \Omega , \\ 1 , & x \in \Omega . \end{cases}$$
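Fictitious points on all grids shown in Fig. 2.5 are intended only for this redefinition of the integrable function and for the definition of the characteristic function $\Upsilon(x)$. The coefficient $\langle g \rangle_{\{i\}}$ (2.9) can be rewritten as
$$\langle g \rangle_{\{i\}} = \frac{\langle \check{g} \rangle^l_{\{i\}}}{\Lambda^l_{\{i\}}} , \tag{2.71}$$
where the coefficients $\langle \check{g} \rangle^l_{\{i\}}$ and $\Lambda^l_{\{i\}}$ are defined by
$$\langle \check{g} \rangle^l_{\{i\}} = \frac{1}{h} \int_{x^f_{\{i-1\}}}^{x^f_{\{i\}}} \check{g}(x) \, dx \qquad \text{and} \qquad \Lambda^l_{\{i\}} = \frac{1}{h} \int_{x^f_{\{i-1\}}}^{x^f_{\{i\}}} \Upsilon(x) \, dx .$$
The parameter $\Lambda^l_{\{i\}}$ is the number of control volumes on the finest grid that form the real part of the given control volume $V_i$ (i.e. $V_i \cap \Omega$). For example, for grid $G_2^2$ shown in Fig. 2.6, we have $\Lambda^2_{\{1\}} = 7$, $\Lambda^2_{\{2\}} = 9$, etc. If the control volume is located fully in the domain Ω, then the coefficient $\Lambda^l_{\{i\}}$ can be computed analytically: $\Lambda^l_{\{i\}} = 3^l$ for $0 \le x^f_{\{i-1\}} < x^f_{\{i\}} \le 1$.

Approximation of the coefficient $\langle \check{g} \rangle^l_{\{i\}}$ on a grid of level $l^*$ ($0 < l^* \le L_3^+$) starts on the finest grid (l = 0), where the integral
$$\langle \check{g} \rangle^0_m = \frac{1}{h} \int_{x^f_{m-1}}^{x^f_m} \check{g}(x) \, dx \tag{2.72}$$
is computed by some numerical method, but the coefficient $\Lambda^0_m$ is computed analytically:
$$\Lambda^0_m = \frac{1}{h} \int_{x^f_{m-1}}^{x^f_m} \Upsilon(x) \, dx = \begin{cases} 0 , & x^v_m \notin \Omega , \\ 1 , & x^v_m \in \Omega . \end{cases}$$
Since each control volume on the coarse grids of level l is a union of three control volumes on the finer grids of level l − 1, the coefficients $\langle \check{g} \rangle^l_{\{i\}}$ and $\Lambda^l_{\{i\}}$ can be approximated by the following recurrence equations:
$$\begin{cases} \langle \check{g} \rangle^l_{\{i\}} = \langle \check{g} \rangle^{l-1}_{\{i\}-3^{l-1}} + \langle \check{g} \rangle^{l-1}_{\{i\}} + \langle \check{g} \rangle^{l-1}_{\{i\}+3^{l-1}} , \\ \Lambda^l_{\{i\}} = \Lambda^{l-1}_{\{i\}-3^{l-1}} + \Lambda^{l-1}_{\{i\}} + \Lambda^{l-1}_{\{i\}+3^{l-1}} , \end{cases} \qquad l = 1, \dots, l^* . \tag{2.73}$$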
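The recurrence (2.73) can be tried out on a simplified 1D model. The sketch below makes several assumptions that are not in the text: the finest grid consists of equal control volumes of width h = 1/n covering [0, 1] (no half-size boundary volumes), the finest-level integral (2.72) is approximated by the midpoint rule, and fictitious volumes outside the domain are represented by zero padding. The name `coarse_averages` is hypothetical.

```python
def coarse_averages(g, n_cells, n_levels):
    """Approximate <g> on all grids of level n_levels via the recurrence (2.73).

    Returns (averages, lam): for every finest-grid index m, the average of g
    over the level-n_levels control volume centred at m, and the number of
    real finest control volumes that form it.
    """
    h = 1.0 / n_cells
    pad = (3 ** n_levels - 1) // 2         # fictitious volumes on each side
    size = n_cells + 2 * pad
    g_check = [0.0] * size                 # finest-level values of <g-check>
    lam = [0] * size                       # finest-level values of Lambda
    for m in range(n_cells):
        g_check[pad + m] = g((m + 0.5) * h)    # midpoint rule for (1/h) * integral
        lam[pad + m] = 1

    def at(values, k):
        # Everything outside the stored range lies outside the domain: zero.
        return values[k] if 0 <= k < size else 0

    for level in range(1, n_levels + 1):
        s = 3 ** (level - 1)
        g_check = [at(g_check, k - s) + g_check[k] + at(g_check, k + s)
                   for k in range(size)]
        lam = [at(lam, k - s) + lam[k] + at(lam, k + s) for k in range(size)]

    averages = [g_check[pad + m] / lam[pad + m] for m in range(n_cells)]
    return averages, lam[pad:pad + n_cells]


avg_const, lam2 = coarse_averages(lambda x: 1.0, n_cells=27, n_levels=2)
avg_linear, _ = coarse_averages(lambda x: x, n_cells=27, n_levels=2)
print(lam2[0], lam2[13], avg_linear[13])
```

Each level costs one pass over the array, so the finest-level integrals are evaluated only once and are then reused by all coarser levels.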
Figure 2.25 illustrates the fast algorithm for approximation of the coefficient $\langle \check{g} \rangle^l_{\{i\}}$ on the second grid $G_1^2$ of the second level. The coefficient $\Lambda^l_{\{i\}}$ is approximated in the same manner. Finally, the coefficient $\langle g \rangle_{\{i\}}$ on this grid of level $l^*$ is given by (2.71). To simplify the coding, coefficients such as $\langle g \rangle_{\{i\}}$ should be approximated on all grids of the level $l^*$. The obtained results can be stored in a single array owing to the features of the coarse grids of RMT (property 3, p. 68).

Fig. 2.25: Approximation of the coefficient $\langle \check{g} \rangle^l_{\{i\}}$ on the grid $G_1^2$ (grids $G_1^0$, $G_1^1$, $G_1^2$ over the interval from −0.5 to 1.5; on the finest grid $\langle \check{g} \rangle^0_m = \frac{1}{h} \int_{x^f_{m-1}}^{x^f_m} g(x)\,dx$ between the domain boundaries and $\langle \check{g} \rangle^0_m = 0$ outside them)

Multiple integrals are computed by sequential integration. In general, it is necessary to express a d-dimensional integral as an iterated integral, which can then be evaluated by computing d single integrals. We use the function $f(x, y) = e^{x-y}$ for a 2D illustration with the uniform grid 2001 × 201 ($h_x = 1/N_x^0 = 1/2000$ and $h_y = 1/N_y^0 = 1/200$) and consider four cases (Fig. 2.26):

Fig. 2.26: Configuration of the control volumes on 2D grids (panels: Case A, Case B, Case C, Case D)

Case A: $(x^v_{\{i\}}, y^v_{\{j\}})$ are the vertices, $(x^f_{\{i\}}, y^f_{\{j\}})$ are the control volume faces, and the control volume is defined by
$$V_{\{ij\}} = \{ (x, y) \mid x^f_{\{i-1\}} \le x \le x^f_{\{i\}} , \; y^f_{\{j-1\}} \le y \le y^f_{\{j\}} \} .$$
The exact value
$$J_{\{ij\}} = \frac{1}{h_x h_y 3^{l_x + l_y}} \int_{x^f_{\{i-1\}}}^{x^f_{\{i\}}} \int_{y^f_{\{j-1\}}}^{y^f_{\{j\}}} f(x, y) \, dy \, dx$$
we redefine as
$$J^a_{\{ij\}} = - \frac{e^{x_b} - e^{x_a}}{x_b - x_a} \cdot \frac{e^{-y_b} - e^{-y_a}}{y_b - y_a} ,$$
where the limits of integration are
$$x_a = \max(x^f_1, x^f_{\{i-1\}}) , \quad x_b = \min(x^f_{N_x^0}, x^f_{\{i\}}) , \quad y_a = \max(y^f_1, y^f_{\{j-1\}}) , \quad y_b = \min(y^f_{N_y^0}, y^f_{\{j\}}) .$$
The error of the integral evaluation is defined by
$$E^l = \max_{\{ij\}} \left| J^a_{\{ij\}} - J^l_{\{ij\}} \right| ,$$
where $J^l_{\{ij\}}$ is the approximated value of the integral on all grids of level l. On the finest grid the integral is approximated by the midpoint formula
$$J_{ij} = \frac{1}{h_x h_y} \int_{x^f_{i-1}}^{x^f_i} \int_{y^f_{j-1}}^{y^f_j} f(x, y) \, dy \, dx \approx f(x^v_i, y^v_j) = \exp(x^v_i - y^v_j) .$$
Results of the computation are (ErrMAX = $E^l$):

LevelX = 5    LevelY = 3    ErrMAX = .2333594E-05
LevelX = 4    LevelY = 3    ErrMAX = .2417141E-05
LevelX = 3    LevelY = 3    ErrMAX = .2583452E-05
LevelX = 2    LevelY = 2    ErrMAX = .2870922E-05
LevelX = 1    LevelY = 1    ErrMAX = .2895241E-05
LevelX = 0    LevelY = 0    ErrMAX = .2946985E-05
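The finest-level error of the midpoint formula for $f(x, y) = e^{x-y}$ can be checked independently on a small grid. The sketch below is not the book's program: it assumes equal, cell-centred control volumes on the unit square (no half-size boundary volumes), uses much coarser grids than 2001 × 201, and the names `exact_average` and `midpoint_error` are hypothetical. It only confirms the expected second-order behaviour of the midpoint rule.

```python
import math


def exact_average(xa, xb, ya, yb):
    """Mean value of exp(x - y) over the rectangle [xa, xb] x [ya, yb]."""
    return (-(math.exp(xb) - math.exp(xa)) / (xb - xa)
            * (math.exp(-yb) - math.exp(-ya)) / (yb - ya))


def midpoint_error(nx, ny):
    """Maximum difference between the exact average and the midpoint value."""
    hx, hy = 1.0 / nx, 1.0 / ny
    err = 0.0
    for i in range(nx):
        for j in range(ny):
            xa, xb = i * hx, (i + 1) * hx
            ya, yb = j * hy, (j + 1) * hy
            approx = math.exp(0.5 * (xa + xb) - 0.5 * (ya + yb))
            err = max(err, abs(exact_average(xa, xb, ya, yb) - approx))
    return err


e_coarse = midpoint_error(20, 20)
e_fine = midpoint_error(40, 40)
print(e_coarse, e_fine, e_coarse / e_fine)
```

Halving both mesh sizes reduces the error by a factor close to four, as expected for a second-order formula.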
Case B: $(x^f_{\{i\}}, y^v_{\{j\}})$ are the vertices, $(x^v_{\{i\}}, y^f_{\{j\}})$ are the control volume faces, the control volume is defined by
$$V_{\{ij\}} = \{ (x, y) \mid x^v_{\{i\}} \le x \le x^v_{\{i+1\}} , \; y^f_{\{j-1\}} \le y \le y^f_{\{j\}} \} ,$$
and the limits of integration are
$$x_a = \max(x^v_1, x^v_{\{i\}}) , \quad x_b = \min(x^v_{N_x^0+1}, x^v_{\{i+1\}}) , \quad y_a = \max(y^f_1, y^f_{\{j-1\}}) , \quad y_b = \min(y^f_{N_y^0}, y^f_{\{j\}}) .$$
Results of the computations are (ErrMAX = $E^l$):

LevelX = 5    LevelY = 3    ErrMAX = .2330217E-05
LevelX = 4    LevelY = 3    ErrMAX = .2377266E-05
LevelX = 3    LevelY = 3    ErrMAX = .2534330E-05
LevelX = 2    LevelY = 2    ErrMAX = .2792995E-05
LevelX = 1    LevelY = 1    ErrMAX = .2922191E-05
LevelX = 0    LevelY = 0    ErrMAX = .2949126E-05
Case C: $(x^v_{\{i\}}, y^f_{\{j\}})$ are the vertices, $(x^f_{\{i\}}, y^v_{\{j\}})$ are the control volume faces, the control volume is defined by
$$V_{\{ij\}} = \{ (x, y) \mid x^f_{\{i-1\}} \le x \le x^f_{\{i\}} , \; y^v_{\{j\}} \le y \le y^v_{\{j+1\}} \} ,$$
and the limits of integration are
$$x_a = \max(x^f_1, x^f_{\{i-1\}}) , \quad x_b = \min(x^f_{N_x^0}, x^f_{\{i\}}) , \quad y_a = \max(y^v_1, y^v_{\{j\}}) , \quad y_b = \min(y^v_{N_y^0+1}, y^v_{\{j+1\}}) .$$
Results of the computations are (ErrMAX = $E^l$):

LevelX = 5    LevelY = 3    ErrMAX = .2447429E-05
LevelX = 4    LevelY = 3    ErrMAX = .2482297E-05
LevelX = 3    LevelY = 3    ErrMAX = .2546472E-05
LevelX = 2    LevelY = 2    ErrMAX = .2822611E-05
LevelX = 1    LevelY = 1    ErrMAX = .2941816E-05
LevelX = 0    LevelY = 0    ErrMAX = .2946985E-05
Case D: $(x^f_{\{i\}}, y^f_{\{j\}})$ are the vertices, $(x^v_{\{i\}}, y^v_{\{j\}})$ are the control volume faces, the control volume is defined by
$$V_{\{ij\}} = \{ (x, y) \mid x^v_{\{i\}} \le x \le x^v_{\{i+1\}} , \; y^v_{\{j\}} \le y \le y^v_{\{j+1\}} \} ,$$
and the limits of integration are
$$x_a = \max(x^v_1, x^v_{\{i\}}) , \quad x_b = \min(x^v_{N_x^0+1}, x^v_{\{i+1\}}) , \quad y_a = \max(y^v_1, y^v_{\{j\}}) , \quad y_b = \min(y^v_{N_y^0+1}, y^v_{\{j+1\}}) .$$
Results of the computations are (ErrMAX = $E^l$):

LevelX = 5    LevelY = 3    ErrMAX = .2465492E-05
LevelX = 4    LevelY = 3    ErrMAX = .2416924E-05
LevelX = 3    LevelY = 3    ErrMAX = .2630778E-05
LevelX = 2    LevelY = 2    ErrMAX = .2782695E-05
LevelX = 1    LevelY = 1    ErrMAX = .2914575E-05
LevelX = 0    LevelY = 0    ErrMAX = .2946355E-05
Complexity of the algorithm for approximation of integrals on the multigrid structure is O(N log N) or O(N log2 N) arithmetic operations depending on the computer memory used, where N is the number of grid points [14].
2.10 Conclusions

1. Application of the essential multigrid principle in the single-grid algorithm makes it possible to use the robust multigrid technique (RMT) for solving a large class of (non)linear problems on structured grids. Close-to-optimal complexity of RMT does not need optimization of the pseudomultigrid algorithm. The main advantage of RMT compared to classic multigrid is the least number of problem-dependent components. Results of the numerical experiments presented in Sect. 2.7 show that the variable parameters and components of RMT are:
a) the number of (post)smoothing iterations;
b) unknown ordering (for anisotropic problems);
c) underrelaxation parameter (for nonlinear problems); and
d) stopping criterion.
The number of multigrid iterations of RMT needed for solving the boundary value problems is independent of the number of unknowns N, but the computational cost of each multigrid iteration is O(N log N) arithmetic operations (Sect. 2.6). RMT has close-to-optimal complexity. The loss in computational work compared to classic multigrid (∼ log N arithmetic operations) is a result of the true robustness of RMT.
2. Absence of the coarse grids in RMT makes the task of the smoother the least demanding. The Gauss–Seidel smoother with block unknown ordering can be used for unified solution of a large class of applied problems, including nonlinear saddle point problems (Chapter 4).
3. Parallel RMT allows one to avoid load imbalance and communication overhead on very coarse grids (Chapter 3).
4. RMT gives a new approach to component optimization [14]. Assume that the algorithmic complexity W depends on some parameter $\omega_l$, i.e. $W(\omega_l)$. The optimization consists of determination of the optimal value $\omega_l^{opt}$, which minimizes the amount of computations in solving the given problem:
$$W(\omega_l) \to \min \quad \text{at} \quad \omega_l = \omega_l^{opt} , \qquad l = 0, 1, \dots, L_3^+ .$$
Since all grids of the same level are similar to each other, it is possible to assume that the optimal value of the parameter is the same for these grids. The optimal value $\omega_l^{opt}$ can be determined by experimental study of the algorithmic complexity W for $\omega_l$ on a single grid of the level. The obtained value of $\omega_l^{opt}$ can be used on the other grids of this level. If ϱ extra discrete problems for the optimization should be solved on each level l, the growth of the execution time δT caused by the optimization can be estimated as
$$\frac{\delta T}{T_{opt}} \approx \frac{\varrho}{L_3^+ + 1} \cdot \frac{1}{3^d - 1} ,$$
where $T_{opt}$ is the execution time of the optimized algorithm [13].
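The size of this overhead is easy to evaluate for concrete numbers. The snippet below simply codes the estimate for the relative growth of the execution time; the function name `optimization_overhead` and the sample values (ϱ = 10 trial problems per level, coarsest level number 5, d = 3) are illustrative assumptions, not data from the book.

```python
def optimization_overhead(rho, coarsest_level, d):
    """Relative growth of execution time caused by per-level optimization."""
    return rho / ((coarsest_level + 1) * (3 ** d - 1))


# Hypothetical example: 10 trial problems per level, 6 levels in total, 3D problem.
overhead = optimization_overhead(rho=10, coarsest_level=5, d=3)
print(f"{100 * overhead:.1f} %")
```

For these sample values the estimated overhead is below ten per cent, because each trial problem is solved on only one of the $3^{dl}$ grids of a level.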
3 Parallel multigrid methods

In this chapter, we will discuss parallel RMT. In Sect. 3.1 an algebraic approach to parallelization will be introduced for parallel smoothing on the fine levels. The algebraic approach is based on a decomposition of the given problem into a number of subproblems with an overlap. In Sect. 3.2 a geometric approach to parallelization will be introduced for parallel smoothing on the coarse levels. The geometric approach is based on a decomposition of the given problem into $3^{dl^*}$ subproblems without an overlap to avoid communication overhead on the coarse levels. In Sect. 3.3, we will discuss the combination of the algebraic and the geometric approaches for parallel RMT. Application of the combined approach in the parallel V-cycle is presented in Sect. 3.4. In Sect. 3.5, we resume the discussion on some practical aspects of parallel RMT.
3.1 Algebraic approach to parallelization

Currently, there is a large variety of software and hardware for parallel programming and each of them has its own advantages and disadvantages. Here, we restrict our attention to theoretical aspects such as the construction of parallel multigrid algorithms and estimates of the speed-up and parallel efficiency. We regard the concrete parallel architectures, the memory/cache organization, the operating system and the compiler as technical questions, which will not be discussed in detail in this chapter.

We start with consideration of an abstract model of a parallel computer that consists of a number of identical processors. Each processor has its own memory and can work (and be programmed) independently of all other processors. The processors can communicate and exchange data over some suitable interconnection, which is not further specified. The total number of processors will be denoted as p and each processor will be denoted as $p_i$.

There are two obvious reasons why a parallel algorithm and/or a parallel system may perform unsatisfactorily: load imbalance and communication overhead. Load imbalance means that some processors have to do much more work than most of the others. In this case, most of the processors have to wait for others to finish their computation before a data exchange can be carried out [25]. Communication overhead means that the communication and data transfer between the processors takes too much time compared to the effective computing time. This overhead may even lead to slowdown instead of speed-up when more and more processors are used [25].

There are two different types of approaches for the parallel treatment of PDEs:
1) Utilization of information on the computational grid for decomposition of the given problem into a number of subproblems without an overlap defines the geometric approach to parallelization. This approach does not depend on the smoothers and it can be used on the coarse levels to minimize load imbalance and communication overhead.
2) Utilization of information on the smoother for a decomposition of the given problem into a number of subproblems with an overlap defines the algebraic approach to parallelization. This approach does not depend on the computational grid and it can be used on the fine levels.

https://doi.org/10.1515/9783110539264-003
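To demonstrate the basic principles of the parallel treatment of discrete boundary value problems, we consider the SLAE
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix} \begin{pmatrix} \varphi_1 \\ \varphi_2 \\ \varphi_3 \\ \varphi_4 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix} \tag{3.1}$$
for illustration of the algebraic approach. The coefficient matrix A can be decomposed into four blocks (submatrices) as
$$A = \left( \begin{array}{cc|cc} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ \hline a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{array} \right) .$$
We form the splitting matrix W using the diagonal blocks of the coefficient matrix A:
$$W = \left( \begin{array}{cc|cc} a_{11} & a_{12} & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ \hline 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & a_{43} & a_{44} \end{array} \right) . \tag{3.2}$$
This splitting matrix W (3.2) defines the Jacobi method with the block ordering of unknowns. Iterative method (1.4) with τ = 1 and W (3.2) becomes
$$W \delta^{(\nu+1)} = b - A \varphi^{(\nu)} = r^{(\nu)} , \tag{3.3}$$
where $\delta^{(\nu+1)} = \varphi^{(\nu+1)} - \varphi^{(\nu)}$ is an iteration error after ν Jacobi iterations and $r^{(\nu)}$ is a vector of residuals
$$r^{(\nu)} = \begin{pmatrix} r_1^{(\nu)} \\ r_2^{(\nu)} \\ r_3^{(\nu)} \\ r_4^{(\nu)} \end{pmatrix} = \begin{pmatrix} b_1 - a_{11} \varphi_1^{(\nu)} - a_{12} \varphi_2^{(\nu)} - a_{13} \varphi_3^{(\nu)} - a_{14} \varphi_4^{(\nu)} \\ b_2 - a_{21} \varphi_1^{(\nu)} - a_{22} \varphi_2^{(\nu)} - a_{23} \varphi_3^{(\nu)} - a_{24} \varphi_4^{(\nu)} \\ b_3 - a_{31} \varphi_1^{(\nu)} - a_{32} \varphi_2^{(\nu)} - a_{33} \varphi_3^{(\nu)} - a_{34} \varphi_4^{(\nu)} \\ b_4 - a_{41} \varphi_1^{(\nu)} - a_{42} \varphi_2^{(\nu)} - a_{43} \varphi_3^{(\nu)} - a_{44} \varphi_4^{(\nu)} \end{pmatrix} .$$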
Assume that a parallel computer with two processors p1 and p2 is used for solving SLAE (3.1) by the Jacobi method (3.3). These processors p1 and p2 hold ‘upper’ and ‘lower’ parts of the SLAE:
122 | 3 Parallel multigrid methods
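a) Processor $p_1$
\[
A_{p_1} = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix} , \qquad
\varphi_{p_1}^{(\nu)} = \begin{pmatrix} \varphi_1^{(\nu)} \\ \varphi_2^{(\nu)} \\ \varphi_3^{(\nu)} \\ \varphi_4^{(\nu)} \end{pmatrix} , \qquad
b_{p_1} = \begin{pmatrix} b_1 \\ b_2 \\ 0 \\ 0 \end{pmatrix} .
\]
b) Processor $p_2$
\[
A_{p_2} = \begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{pmatrix} , \qquad
\varphi_{p_2}^{(\nu)} = \begin{pmatrix} \varphi_1^{(\nu)} \\ \varphi_2^{(\nu)} \\ \varphi_3^{(\nu)} \\ \varphi_4^{(\nu)} \end{pmatrix} , \qquad
b_{p_2} = \begin{pmatrix} 0 \\ 0 \\ b_3 \\ b_4 \end{pmatrix} .
\]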
For clarity, entries of the coefficient matrix $A$ and $b$, which are not stored on this processor, are marked by zero. This is the result of a decomposition of the given problem (3.1) into two subproblems
\[
A_{p_1}\varphi_{p_1} = b_{p_1} \qquad \text{and} \qquad A_{p_2}\varphi_{p_2} = b_{p_2} .
\]
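Both processors $p_1$ and $p_2$ hold the same vector of unknowns $\varphi_{p_1} = \varphi_{p_2}$; here it is result of the overlap. In the case of the distributed memory architectures, communication between the processors $p_1$ and $p_2$ is based on message passing. Parallel Jacobi iteration can be represented by the following operations:
1. data exchange between the processors $p_1$ and $p_2$: transfer of $\varphi_1^{(\nu)}$ and $\varphi_2^{(\nu)}$ from $p_1$ to $p_2$; transfer of $\varphi_3^{(\nu)}$ and $\varphi_4^{(\nu)}$ from $p_2$ to $p_1$;
2. parallel computation of the residual vector:
processor $p_1$
\[
\begin{aligned}
r_1^{(\nu)} &= b_1 - a_{11}\varphi_1^{(\nu)} - a_{12}\varphi_2^{(\nu)} - a_{13}\varphi_3^{(\nu)} - a_{14}\varphi_4^{(\nu)} , \\
r_2^{(\nu)} &= b_2 - a_{21}\varphi_1^{(\nu)} - a_{22}\varphi_2^{(\nu)} - a_{23}\varphi_3^{(\nu)} - a_{24}\varphi_4^{(\nu)} ,
\end{aligned}
\]
processor $p_2$
\[
\begin{aligned}
r_3^{(\nu)} &= b_3 - a_{31}\varphi_1^{(\nu)} - a_{32}\varphi_2^{(\nu)} - a_{33}\varphi_3^{(\nu)} - a_{34}\varphi_4^{(\nu)} , \\
r_4^{(\nu)} &= b_4 - a_{41}\varphi_1^{(\nu)} - a_{42}\varphi_2^{(\nu)} - a_{43}\varphi_3^{(\nu)} - a_{44}\varphi_4^{(\nu)} ,
\end{aligned}
\]
3. parallel solution of SLAE (3.3):
processor $p_1$
\[
\begin{cases}
a_{11}\delta_1^{(\nu+1)} + a_{12}\delta_2^{(\nu+1)} = r_1^{(\nu)} \\
a_{21}\delta_1^{(\nu+1)} + a_{22}\delta_2^{(\nu+1)} = r_2^{(\nu)}
\end{cases}
\;\Rightarrow\;
\begin{pmatrix} \delta_1^{(\nu+1)} \\ \delta_2^{(\nu+1)} \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^{-1}
\begin{pmatrix} r_1^{(\nu)} \\ r_2^{(\nu)} \end{pmatrix} ,
\]
processor $p_2$
\[
\begin{cases}
a_{33}\delta_3^{(\nu+1)} + a_{34}\delta_4^{(\nu+1)} = r_3^{(\nu)} \\
a_{43}\delta_3^{(\nu+1)} + a_{44}\delta_4^{(\nu+1)} = r_4^{(\nu)}
\end{cases}
\;\Rightarrow\;
\begin{pmatrix} \delta_3^{(\nu+1)} \\ \delta_4^{(\nu+1)} \end{pmatrix}
=
\begin{pmatrix} a_{33} & a_{34} \\ a_{43} & a_{44} \end{pmatrix}^{-1}
\begin{pmatrix} r_3^{(\nu)} \\ r_4^{(\nu)} \end{pmatrix} ,
\]
4. and parallel update of new approximation to the solution:
processor $p_1$
\[
\varphi_1^{(\nu+1)} = \varphi_1^{(\nu)} + \delta_1^{(\nu+1)} , \qquad
\varphi_2^{(\nu+1)} = \varphi_2^{(\nu)} + \delta_2^{(\nu+1)} ,
\]
processor $p_2$
\[
\varphi_3^{(\nu+1)} = \varphi_3^{(\nu)} + \delta_3^{(\nu+1)} , \qquad
\varphi_4^{(\nu+1)} = \varphi_4^{(\nu)} + \delta_4^{(\nu+1)} .
\]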
The next iteration starts from item 1, if necessary. The damped Jacobi method with block ordering of unknowns can be used as the smoother for parallel multigrid algorithms. The main problem of parallel algorithm development is minimization of load imbalance and data transfer. Parallel Jacobi methods for a parallel system consisting of four processors (p = 4) can be constructed in the same manner. For both cases (p = 2 and p = 4), it is possible to count up the number of arithmetic operations performed in parallel on different processors and data exchanged between the processors. It is obvious that the processors are more busy for p = 2. In general, advantages of the parallel algorithms over the sequential algorithms will decrease as the number of processors grows. Sometimes grid partitioning is used for geometric illustration of the Jacobi method with block ordering of unknowns. If grid applications are to be implemented on parallel computers, the original grid is split into p parts (subgrids), such that p available processors can jointly solve the underlying discrete problem. Each subgrid (and the corresponding ‘subproblem’, i.e. the equations and the unknowns located in the subgrid) is assigned to a different processor such that each processor is responsible for the computations in its part of the domain. The grid partitioning idea is widely independent of the particular boundary value problem to be solved and of the particular parallel architecture to be used [25]. The same parallel algorithm can be constructed without utilization of information on the computational grid; therefore in this book similar methods are regarded as the algebraic approach to parallelization.
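The four operations of the parallel Jacobi iteration can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's implementation: the two processors are emulated by a loop over two row sets, the data exchange is modelled by a shared list, and the function names `solve2` and `block_jacobi` as well as the test matrix are assumptions introduced here.

```python
def solve2(m, rhs):
    """Solve a 2x2 system m * x = rhs by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]


def block_jacobi(A, b, iterations):
    """Block Jacobi iteration (3.3) for a 4x4 SLAE with two 2x2 diagonal blocks."""
    phi = [0.0] * 4                        # initial guess, held by both processors
    owned = {"p1": (0, 1), "p2": (2, 3)}   # rows and unknowns of each emulated processor
    for _ in range(iterations):
        # Step 1: data exchange; here both processors simply read the shared list phi.
        delta = [0.0] * 4
        for rows in owned.values():        # the two passes do not depend on each other
            # Step 2: residuals of the rows stored on this processor.
            r = [b[i] - sum(A[i][j] * phi[j] for j in range(4)) for i in rows]
            # Step 3: local solve with the corresponding diagonal block of W (3.2).
            block = [[A[i][j] for j in rows] for i in rows]
            for i, di in zip(rows, solve2(block, r)):
                delta[i] = di
        # Step 4: update of the approximation to the solution.
        phi = [phi[i] + delta[i] for i in range(4)]
    return phi


# Hypothetical diagonally dominant test matrix with a known solution.
A = [[4.0, 1.0, 0.5, 0.0],
     [1.0, 4.0, 0.0, 0.5],
     [0.5, 0.0, 4.0, 1.0],
     [0.0, 0.5, 1.0, 4.0]]
exact = [1.0, 2.0, 3.0, 4.0]
b = [sum(A[i][j] * exact[j] for j in range(4)) for i in range(4)]
phi = block_jacobi(A, b, 50)
```

In a real distributed-memory code, step 1 would be a message exchange and each pass of the inner loop would run on its own processor; the arithmetic per processor is the same as in the sketch.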
Definition 3.1.1. The speed-up $\bar S$ and the efficiency $\bar E$ of a parallel algorithm are
\[
\bar S = p\bar E = \frac{T(1)}{T(p)} ,
\tag{3.4}
\]
where $T(1)$ is an execution time for a single processor and $T(p)$ is an execution time using $p$ processors.

In general, the execution time $T(p)$ can be represented as
\[
T(p) = \frac{1}{p}T(1) + T^*(p) ,
\]
where $T^*(p)$ is time depending on cost of interprocessor communication in parallel implementation. Let $T^*(p) = T(1)\varrho(p)$, where the function $\varrho(p)$ depends on parallel architecture and parallel algorithms. Then the speed-up $\bar S$ and the efficiency $\bar E$ (3.4) are
\[
\bar S = p\bar E = \frac{T(1)}{\dfrac{1}{p}T(1) + T^*(p)} = \frac{p}{1 + p\varrho(p)} .
\]
Assume that the function $\varrho(p)$ depends linearly on the number of processors $p$: $\varrho(p) \approx \zeta(p-1)$, where $\zeta$ is some constant depending on the parallel architectures ($p = 1 \Rightarrow \varrho(p) = 0$ for the sequential algorithm). Finally, we have
\[
\bar S = p\bar E \approx \frac{p}{1 + \zeta(p-1)p} .
\]
For a small number of processors ($1 \gg \zeta(p-1)p$), the parallel algorithm has very good parallel efficiency
\[
\bar S = p\bar E \approx p\,(1 - \zeta(p-1)p) \approx p .
\]
However, for a large number of processors ($1 \sim \zeta(p-1)p$), low parallel efficiency is expected. Measures $\bar S$ and $\bar E$ indicate parallelism of algorithm: if $\bar S \ll p$ or $\bar E \ll 1$ then this algorithm cannot be used for parallel computing, but if $\bar S \approx p$ or $\bar E \approx 1$ then additional research is needed.

Definition 3.1.2. The speed-up $\tilde S$ and the efficiency $\tilde E$ of a parallel algorithm over the best sequential algorithm is
\[
\tilde S = p\tilde E = \frac{\tilde T(1)}{T(p)} ,
\tag{3.5}
\]
where $\tilde T(1)$ is an execution time for a single processor of the fastest sequential algorithm and $T(p)$ is an execution time of the parallel algorithm on $p$ processors.

Measures of parallelism $\tilde S$ and $\tilde E$ are helpful for multigrid practice. We have observed in Sect. 2.1 that a numerical solution of 2D Poisson equation (1.29) requires
$W = O(N^2\log\varepsilon)$ arithmetic operations by the Jacobi method or the Gauss–Seidel method and $W = O(N\log\varepsilon)$ arithmetic operations by OMA, where $N$ denotes the total number of unknowns. It is possible to assume that $T(1) \sim W$, i.e.
\[
\tilde T(1) \approx C_{\mathrm{OMA}}\,N\log\varepsilon , \qquad
T(p) \approx \frac{1}{p}\,C_{\mathrm J}\,N^2\log\varepsilon + T^*(p) ,
\]
where $C_{\mathrm{OMA}}$ and $C_{\mathrm J}$ are some $N$-independent constants. Assuming full parallelism $T^*(p) = 0$, we have
\[
\tilde S = p\tilde E < \tilde S_{\max} = p\tilde E_{\max} \approx \frac{C_{\mathrm{OMA}}}{C_{\mathrm J}}\,\frac{p}{N} = O(p/N) .
\]
Thus Jacobi method has almost full parallelism: $\bar E \to 1$ for $p \ll N$, but execution time of the parallel Jacobi iterations will be worse than that of the best sequential OMA. Measures of parallelism $\tilde S$ and $\tilde E$ can detect high parallel efficiency for numerically inefficient methods and low parallel efficiency for numerically efficient ones. A general description of the common measures of parallelism is given in [18] and the references therein.

In addition to these common measures, we introduce the measures of parallel properties of smoothers:

Definition 3.1.3. The speed-up $S_l$ and efficiency $E_l$ of a parallel smoother is
\[
S_l = pE_l = \frac{T_l(1)}{T_l(p)} ,
\tag{3.6}
\]
where $T_l(1)$ is an execution time for a single processor of the smoother, $T_l(p)$ is an execution time using $p$ processors and $l = 0, 1, \ldots, L^+$ is serial number of the grid levels.

In general, we can assume that the smoothing procedures are joined together with other multigrid components, i.e. execution time of each sequential multigrid iteration is $T_0(1) + T_1(1) + \cdots + T_{L^+}(1)$. Then $\tilde S$ and $\tilde E$ depend on $S_l$ and $E_l$ and the functions $\tilde S = \tilde S(S_0, S_1, \ldots, S_{L^+})$ and $\tilde E = \tilde E(E_0, E_1, \ldots, E_{L^+})$ will be discussed in this chapter. Unfortunately, it is impossible to obtain accurate a priori estimations of $\tilde S$ and $\tilde E$ without taking into account all features of the parallel architecture. In the following, our goal is to estimate $\tilde S$ and $\tilde E$ using $S_l$ and $E_l$.

Assume that we construct a parallel multigrid algorithm to solve a boundary value problem. As a rule, multigrid coding starts with a subroutine for the parallel smoothing. Numerical experiments with the subroutine allow us to obtain values of the speed-up $S_l$ and the efficiency $E_l$ of this parallel smoother as a function of the number of vertices (unknowns) $N$ and processors $p$, and features of the computer architecture. After that it is possible to obtain accurate estimations of $\tilde S$ and $\tilde E$ using the obtained values $S_l(N, p)$ and $E_l(N, p)$ for the given parallel computer.
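The measures of Definitions 3.1.1 and 3.1.2 are easy to tabulate. The following sketch evaluates the model speed-up with the linear communication law $\varrho(p) \approx \zeta(p-1)$ and the upper bound of the parallel Jacobi method against the sequential OMA. The value of `zeta`, the ratio `c_ratio` standing for $C_{\mathrm{OMA}}/C_{\mathrm J}$ and the function names are hypothetical, chosen only for illustration.

```python
def speedup(p, zeta):
    """Model speed-up p / (1 + zeta*(p-1)*p) with rho(p) = zeta*(p-1)."""
    return p / (1.0 + zeta * (p - 1) * p)


def efficiency(p, zeta):
    """Model efficiency: speed-up divided by the number of processors."""
    return speedup(p, zeta) / p


def speedup_vs_best(p, n, c_ratio=1.0):
    """Upper bound (C_OMA / C_J) * p / N of parallel Jacobi against sequential OMA,
    obtained for full parallelism T*(p) = 0."""
    return c_ratio * p / n


zeta = 1.0e-3  # hypothetical architecture constant
for p in (1, 2, 8, 32, 128):
    print(p, round(speedup(p, zeta), 2), round(efficiency(p, zeta), 3))
```

With this value of `zeta` the model speed-up grows for small $p$ and then decreases again, which is the slowdown effect of communication overhead mentioned above; the bound `speedup_vs_best` stays far below one whenever $p \ll N$.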
First, we illustrate the main problem for construction of parallel multigrid algorithms using the V-cycle. Consider a 1D problem on the finest grid with 2049 vertices ($N_0 = 2048$). The numbers of vertices on the coarse levels $l$ ($l = 1, 2, \ldots, L_2^+$) are
\[
\begin{array}{ll}
l = 1; & N_1 + 1 = 1025 , \\
l = 2; & N_2 + 1 = 513 , \\
l = 3; & N_3 + 1 = 257 , \\
l = 4; & N_4 + 1 = 129 , \\
l = 5; & N_5 + 1 = 65 , \\
l = 6; & N_6 + 1 = 33 , \\
l = 7; & N_7 + 1 = 17 , \\
l = 8; & N_8 + 1 = 9 , \\
l = 9; & N_9 + 1 = 5 , \\
l = 10; & N_{10} + 1 = 3 .
\end{array}
\]
Here $L_2^+ = 10$ is serial number of the coarsest level generated by double coarsening. Let a parallel computer with $p = 32$ processors be used for parallelization of the multigrid iterations. It is clear that the number of vertices is less than the number of processors ($N_l + 1 < p$) for $l \ge 7$. Most of the processors will be idle and it is very difficult to obtain uniform load balance for the smoothing iterations on the coarse levels. On fine levels $N_l + 1 \ge p$ for $l < 7$, additional difficulties arise due to the fact that each processor has small discrete subproblems: $N_6/p \approx 1$, $N_5/p \approx 2$, $N_4/p \approx 4$, .... It reduces the ratio of the computational work performed by the processors to the amount of data for communication between different processors. As a result, the parallel algorithm loses its advantage in comparison with the sequential algorithm.

It is expected that the speed-up and the efficiency of a parallel smoother will be higher on the fine levels:
\[
S_l > S_{l+1} \quad \text{and} \quad E_l > E_{l+1} \quad \text{for} \quad N_l > N_{l+1}
\tag{3.7}
\]
with the same number of the processors $p$.

Since the data exchange between processors and the processor idleness in each parallel smoothing iteration leads to deterioration of the efficiency of a parallel algorithm on the coarse levels $l$ ($l^* \le l \le L_2^+$), the simplest approach is a sequential smoothing on these levels. We can estimate the speed-up and efficiency of a parallel V-cycle with sequential smoothing on the coarse levels, i.e. if $p - 1$ processors are idle for $l^* \le l \le L_2^+$.

Assume that the finest grid with $N_0 + 1 = 2^{d(L_2^+ + 1)} + 1$, $d = 2, 3$ vertices is used for parallel computations and the number of coarse grid vertices is $N_l + 1 = N_0 2^{-dl} + 1$, $l = 0, 1, \ldots, L_2^+$. If the same number of the same smoothing iterations is performed on all grids, we can assume that execution time needed for the sequential smoothing is proportional to the number of unknowns: $T_l(1) \sim N_l$ or $T_l(1) = T_0(1)\,2^{-dl}$, $l = 0, 1, \ldots, L_2^+$. Execution time of the sequential multigrid iteration of the V-cycle is
\[
T(1) = \sum_{l=0}^{L_2^+} T_l(1) = T_0(1)\sum_{l=0}^{L_2^+} 2^{-dl}
= T_0(1)\,\frac{1 - 2^{-d(L_2^+ + 1)}}{1 - 2^{-d}} \approx \frac{T_0(1)}{1 - 2^{-d}} .
\tag{3.8}
\]
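The level sizes of the 1D example and the geometric sum in (3.8) can be reproduced with a short script. It is a sketch under the stated assumptions only (double coarsening, $N_l = N_0 2^{-dl}$, and $T_l(1) = T_0(1) 2^{-dl}$); the helper names are introduced here for illustration.

```python
def level_sizes(n0, d, levels):
    """Return N_l = N_0 * 2**(-d*l) for l = 0, ..., levels."""
    return [n0 // 2 ** (d * l) for l in range(levels + 1)]


def sequential_time(d, levels, t0=1.0):
    """Exact sum T(1) = T_0(1) * sum_{l=0}^{levels} 2**(-d*l) from (3.8)."""
    return t0 * sum(2.0 ** (-d * l) for l in range(levels + 1))


# 1D example of the text: N_0 = 2048, coarsest level 10, p = 32 processors.
n = level_sizes(2048, 1, 10)
p = 32
# First level on which there are fewer vertices than processors.
first_starved = next(l for l, nl in enumerate(n) if nl + 1 < p)

for l, nl in enumerate(n):
    print(l, nl + 1, "idle processors" if nl + 1 < p else "")
```

For $d = 2$ or $d = 3$ the exact sum returned by `sequential_time` is already very close to the limit $T_0(1)/(1 - 2^{-d})$ after a few levels, which is why the approximation in (3.8) is harmless.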
Execution time of the parallel multigrid iteration of the V-cycle is
\[
T(p) = \sum_{l=0}^{l^*-1} T_l(p) + \sum_{l=l^*}^{L_2^+} T_l(1)
= \sum_{l=0}^{l^*-1} T_l(p) + T_0(1)\sum_{l=l^*}^{L_2^+} 2^{-dl} ,
\]
where the first term is execution time of the parallel smoothing iterations using $p$ processors on the fine levels $l$ ($0 \le l < l^*$) and the second term is execution time of the sequential smoothing iterations on the coarse levels $l$ ($l^* \le l \le L_2^+$). Using Definition 3.1.3 (3.6), we have
\[
T_l(p) = \frac{1}{p}\,\frac{T_l(1)}{E_l} = \frac{1}{p}\,\frac{2^{-dl}}{E_l}\,T_0(1) .
\]
Since
\[
\sum_{l=l^*}^{L_2^+} 2^{-dl} = 2^{-dl^*}\,\frac{1 - 2^{-d(L_2^+ - l^* + 1)}}{1 - 2^{-d}} \approx \frac{2^{-dl^*}}{1 - 2^{-d}} ,
\]
execution time of the parallel multigrid iteration of the V-cycle can be estimated as
\[
T(p) \approx T_0(1)\left(\frac{1}{p}\sum_{l=0}^{l^*-1}\frac{2^{-dl}}{E_l} + \frac{2^{-dl^*}}{1 - 2^{-d}}\right) .
\tag{3.9}
\]
Taking into account (3.8) and (3.9), the speed-up and efficiency of the parallel V-cycle (3.5) is estimated as
\[
\tilde S = p\tilde E = \frac{T(1)}{T(p)} \approx
\frac{1}{\dfrac{1 - 2^{-d}}{p}\displaystyle\sum_{l=0}^{l^*-1}\frac{2^{-dl}}{E_l} + 2^{-dl^*}} .
\tag{3.10}
\]
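Estimate (3.10) can be evaluated directly once the smoother efficiencies $E_l$ are known. The sketch below does this for given values; the efficiency profile in the example is hypothetical and merely respects the ordering (3.7), and the function name `vcycle_speedup` is introduced here.

```python
def vcycle_speedup(p, d, l_star, eff):
    """Estimate (3.10) of the V-cycle speed-up.

    eff[l] is the smoother efficiency E_l on the fine levels l = 0, ..., l_star - 1;
    the levels l >= l_star are smoothed sequentially.
    """
    parallel = (1.0 - 2.0 ** (-d)) / p * sum(
        2.0 ** (-d * l) / eff[l] for l in range(l_star)
    )
    sequential = 2.0 ** (-d * l_star)
    return 1.0 / (parallel + sequential)


# Hypothetical example: p = 32 processors, 3D problem, three parallel levels.
s = vcycle_speedup(32, 3, 3, [0.9, 0.8, 0.6])
print(round(s, 2), round(s / 32, 3))
```

The sequential term $2^{-dl^*}$ bounds the speed-up by $2^{dl^*}$ regardless of $p$, so switching to sequential smoothing too early (small $l^*$) is penalized much more strongly in 1D than in 3D.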
To obtain a rough estimation, we assume that $E_l < E_0$ (3.7). Since
\[
\sum_{l=0}^{l^*-1} 2^{-dl} \approx \frac{1}{1 - 2^{-d}}
\]
for $l^* \gg 1$ and using (3.10), we have $\tilde S = p\tilde E$
0, i.e. $s_2 > 0 \Rightarrow 2\varepsilon > h$. If $2\varepsilon < h$ then the numerical solution will be an oscillating function $(-1)^i|s_2^i|$ as opposed to the exact solution (4.15). In the limit case $\varepsilon \to 0$, it represents a serious mesh size limitation in the finest grid generation.

Let a first-order upwind discretization be used for discretization of the first derivative $\psi'$ on a uniform grid with a mesh size $h$
\[
\varepsilon\,\frac{\phi_{i-1} - 2\phi_i + \phi_{i+1}}{h^2} + \frac{\phi_{i+1} - \phi_i}{h} = 0 , \qquad
\phi_1 = 0 , \qquad \phi_{N+1} = 1 , \qquad h = 1/N .
\]
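The difference between the two discretizations can be observed on a coarse grid with $2\varepsilon < h$. The sketch below solves the upwind scheme written above and, for comparison, a central scheme; the central form $\varepsilon(\phi_{i-1} - 2\phi_i + \phi_{i+1})/h^2 + (\phi_{i+1} - \phi_{i-1})/(2h) = 0$ is an assumption made here, chosen because it leads to the condition $2\varepsilon > h$ quoted in the text. Function names and parameter values are illustrative.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system with constant coefficients."""
    n = len(rhs)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        pivot = diag - lower * c[i - 1]
        c[i] = upper / pivot
        d[i] = (rhs[i] - lower * d[i - 1]) / pivot
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_scheme(eps, n, scheme):
    """Return [phi_1, ..., phi_{N+1}] for the 'upwind' or 'central' scheme."""
    h = 1.0 / n
    if scheme == "upwind":
        lower, diag, upper = eps / h**2, -2 * eps / h**2 - 1 / h, eps / h**2 + 1 / h
    else:  # assumed central differencing of the first derivative
        lower, diag, upper = eps / h**2 - 0.5 / h, -2 * eps / h**2, eps / h**2 + 0.5 / h
    rhs = [0.0] * (n - 1)
    rhs[-1] = -upper * 1.0  # contribution of the boundary value phi_{N+1} = 1
    return [0.0] + solve_tridiagonal(lower, diag, upper, rhs) + [1.0]


eps, n = 0.01, 10            # 2*eps < h = 0.1
up = solve_scheme(eps, n, "upwind")
ce = solve_scheme(eps, n, "central")
print("upwind :", [round(v, 3) for v in up])
print("central:", [round(v, 3) for v in ce])
```

The upwind values increase monotonically, whereas the central values alternate around the exact profile, in line with the sign of the second characteristic root.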
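This finite difference scheme has the first approximation order. A general solution is given by (4.16), where
\[
s_2 = \frac{\varepsilon}{\varepsilon + h} .
\]
The obtained numerical solution $\phi_i$ will be a monotone function independently on $\varepsilon$ and $h$. However the first approximation order results in loss of the discretization accuracy. Using the Taylor formula (1.10), we have
\[
\begin{aligned}
\frac{\psi(x^{\mathrm v}_{i+1}) - \psi(x^{\mathrm v}_i)}{h} &= \psi'(x^{\mathrm v}_i) + \frac{h}{2}\,\psi''(x^{\mathrm v}_i) + O(h^2) , \\
\varepsilon\,\frac{\psi(x^{\mathrm v}_{i-1}) - 2\psi(x^{\mathrm v}_i) + \psi(x^{\mathrm v}_{i+1})}{h^2} &= \varepsilon\,\psi''(x^{\mathrm v}_i) + O(h^2) .
\end{aligned}
\]
Since $\phi_i = \psi(x^{\mathrm v}_i)$, summation of these equations gives
\[
\varepsilon\,\frac{\phi_{i-1} - 2\phi_i + \phi_{i+1}}{h^2} + \frac{\phi_{i+1} - \phi_i}{h}
= \left(\varepsilon + \frac{h}{2}\right)\psi''(x^{\mathrm v}_i) + \psi'(x^{\mathrm v}_i) + O(h^2) ,
\]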
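i.e. the upwind scheme approximates the boundary value problem with parameter $\varepsilon + h/2$ instead of $\varepsilon$. This results in increase of thickness of the boundary layer.

As a rule, nonuniform grids are used for approximation of the boundary value problems if their solution has the boundary layers. Consider the following model problem:
\[
(\theta\psi)' = \varepsilon\,\psi'' , \qquad \psi(0) = 0 , \qquad \psi(1) = 1 ,
\tag{4.17}
\]
where $\theta(x)$ is a sufficiently smooth function. If the boundary layer is located near boundary $x = 0$, we generate the nonuniform grid
\[
G = \{x^{\mathrm v}_i \mid 0 = x^{\mathrm v}_1 < x^{\mathrm v}_2 < \ldots < x^{\mathrm v}_{N+1} = 1\} ,
\]
so that
\[
x^{\mathrm v}_i - x^{\mathrm v}_{i-1} < x^{\mathrm v}_{i+1} - x^{\mathrm v}_i , \qquad i = 2, \ldots, N ,
\]
i.e. minimum mesh of the grid $G$ is near the boundary $x = 0$ for high resolution of the boundary layer. The control volume faces are defined as
\[
x^{\mathrm f}_i = \frac{1}{2}\,(x^{\mathrm v}_i + x^{\mathrm v}_{i+1}) , \qquad i = 1, \ldots, N .
\]
This means that the control volume $V_i$ is defined by $V_i = \{x \mid x^{\mathrm f}_{i-1} \le x \le x^{\mathrm f}_i\}$. Integration of Eq. (4.17) over the control volume $V_i$ gives the finite difference scheme
\[
\theta(x^{\mathrm f}_i)\,\phi_{x^{\mathrm f}_i} - \theta(x^{\mathrm f}_{i-1})\,\phi_{x^{\mathrm f}_{i-1}}
= \varepsilon\left(\frac{\phi_{i+1} - \phi_i}{x^{\mathrm v}_{i+1} - x^{\mathrm v}_i} - \frac{\phi_i - \phi_{i-1}}{x^{\mathrm v}_i - x^{\mathrm v}_{i-1}}\right) .
\tag{4.18}
\]
We can use the following interpolation:
\[
\phi_{x^{\mathrm f}_k} = (1 - \gamma_k)\,\phi_k + \gamma_k\,\phi_{k+1} , \qquad k = i-1,\ i ,
\tag{4.19}
\]
to express the values $\phi_{x^{\mathrm f}_k}$ on the control volume faces in terms of the discrete nodal values $\phi_k$ and $\phi_{k+1}$. Here $\gamma_k = 0$ or $\gamma_k = 1$ corresponds to the one-sided differencing, but $\gamma_k = 1/2$ corresponds to the central $O(h^2)$-discretization. Substituting (4.19) into (4.18), we have
\[
\theta(x^{\mathrm f}_i)\big((1 - \gamma_i)\phi_i + \gamma_i\phi_{i+1}\big) - \theta(x^{\mathrm f}_{i-1})\big((1 - \gamma_{i-1})\phi_{i-1} + \gamma_{i-1}\phi_i\big)
= \varepsilon\left(\frac{\phi_{i+1} - \phi_i}{x^{\mathrm v}_{i+1} - x^{\mathrm v}_i} - \frac{\phi_i - \phi_{i-1}}{x^{\mathrm v}_i - x^{\mathrm v}_{i-1}}\right) ,
\]
which can be rewritten as
\[
a_i\phi_{i-1} + b_i\phi_i + c_i\phi_{i+1} = 0 ,
\]
where
\[
\begin{aligned}
a_i &= \frac{\varepsilon}{x^{\mathrm v}_i - x^{\mathrm v}_{i-1}} + \theta(x^{\mathrm f}_{i-1})(1 - \gamma_{i-1}) , \\
b_i &= -\frac{\varepsilon}{x^{\mathrm v}_{i+1} - x^{\mathrm v}_i} - \frac{\varepsilon}{x^{\mathrm v}_i - x^{\mathrm v}_{i-1}} - \theta(x^{\mathrm f}_i)(1 - \gamma_i) + \theta(x^{\mathrm f}_{i-1})\gamma_{i-1} , \\
c_i &= \frac{\varepsilon}{x^{\mathrm v}_{i+1} - x^{\mathrm v}_i} - \theta(x^{\mathrm f}_i)\gamma_i .
\end{aligned}
\]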
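If $\theta(x^{\mathrm f}_i) > 0$, we require that $c_i \ge 0$ to obtain a monotone numerical solution:
\[
\frac{\varepsilon}{x^{\mathrm v}_{i+1} - x^{\mathrm v}_i} \ge \theta(x^{\mathrm f}_i)\gamma_i
\;\Rightarrow\;
\gamma_i \le \frac{1}{\theta(x^{\mathrm f}_i)}\,\frac{\varepsilon}{x^{\mathrm v}_{i+1} - x^{\mathrm v}_i} .
\]
Additionally, $\gamma_i = 1/2$ if
\[
\gamma_i \le \frac{1}{2} \le \frac{1}{\theta(x^{\mathrm f}_i)}\,\frac{\varepsilon}{x^{\mathrm v}_{i+1} - x^{\mathrm v}_i}
\]
because of higher-order discretization. The parameter $\gamma_i$ is defined by
\[
\gamma_i = \min\left(\frac{1}{2};\ \frac{1}{\theta(x^{\mathrm f}_i)}\,\frac{\varepsilon}{x^{\mathrm v}_{i+1} - x^{\mathrm v}_i}\right) .
\tag{4.20}
\]
If $\theta(x^{\mathrm f}_i) < 0$, we require that $a_{i+1} \ge 0$ to obtain a monotone numerical solution, and $\gamma_i = 1/2$ if
\[
\gamma_i \ge \frac{1}{2} \ge 1 - \frac{1}{|\theta(x^{\mathrm f}_i)|}\,\frac{\varepsilon}{x^{\mathrm v}_{i+1} - x^{\mathrm v}_i}
\]
because of higher-order discretization. The parameter $\gamma_i$ is defined by
\[
\gamma_i = \max\left(\frac{1}{2};\ 1 - \frac{1}{|\theta(x^{\mathrm f}_i)|}\,\frac{\varepsilon}{x^{\mathrm v}_{i+1} - x^{\mathrm v}_i}\right) .
\tag{4.21}
\]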
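The choice of the interpolation weight by (4.20) and (4.21) amounts to a single comparison per control volume face. A minimal sketch follows; the function name `gamma`, its arguments and the treatment of the case $\theta = 0$ are assumptions made for illustration and are not taken from the text.

```python
def gamma(theta_f, eps, dx):
    """Interpolation weight gamma_i on a face.

    theta_f : value of theta at the face x^f_i
    eps     : diffusion parameter epsilon
    dx      : distance x^v_{i+1} - x^v_i between the neighbouring vertices
    """
    if theta_f > 0.0:
        return min(0.5, eps / (theta_f * dx))             # (4.20)
    if theta_f < 0.0:
        return max(0.5, 1.0 - eps / (abs(theta_f) * dx))  # (4.21)
    return 0.5  # assumption: central weight when there is no convection


# Fine mesh inside the layer: central discretization is admissible.
print(gamma(1.0, 1e-3, 1e-4))
# Coarse mesh outside the layer: the weight moves towards one-sided differencing.
print(gamma(1.0, 1e-3, 0.1), gamma(-1.0, 1e-3, 0.1))
```

With these weights the off-diagonal coefficients stay non-negative: $c_i \ge 0$ for $\theta > 0$ and $a_{i+1} \ge 0$ for $\theta < 0$, which is the monotonicity requirement stated above.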
If the grid is generated so that inside the boundary layer the inequality
\[
\frac{1}{|\theta(x^{\mathrm f}_i)|}\,\frac{\varepsilon}{x^{\mathrm v}_{i+1} - x^{\mathrm v}_i} \ge \frac{1}{2}
\]
holds, then (4.20) and (4.21) give $\gamma_i = 1/2$ and the central $O(h^2)$-discretization takes place inside the layer (very thin subdomains of the sharp gradient of the solution). The first approximation order is expected outside the layer that ensures $O(h)$ accuracy in subdomains without sharp gradients. The local refinement approach makes it possible to obtain an accurate numerical solution using simple finite difference schemes. In general, it is difficult to generate such grids because we do not know in advance where the local refinements will occur.

Consider application of RMT for solving the convection-diffusion equation
\[
\frac{\partial(\bar u u)}{\partial x} + \frac{\partial(\bar\upsilon u)}{\partial y} + \frac{\partial(\bar w u)}{\partial z}
= \varepsilon\left[\frac{\partial}{\partial x}\left(\mu\frac{\partial u}{\partial x}\right) + \frac{\partial}{\partial y}\left(\mu\frac{\partial u}{\partial y}\right) + \frac{\partial}{\partial z}\left(\mu\frac{\partial u}{\partial z}\right)\right] + f(x, y, z) ,
\tag{4.22}
\]
where $\bar u$, $\bar\upsilon$, $\bar w$ and $\mu$ are differentiable functions. Assume that the domain is unit cube $\Omega = [0, 1]^3$ and a boundary layer is located near $x = 0$, $y = 0$ and $z = 0$ boundaries. We transform the coordinates to obtain the same domain $\tilde\Omega = [0, 1]^3$ and a uniform grid in it. The differential equations can be transformed to the new coordinates. In the given example, the transformation can be performed independently in each spatial direction. Let the coordinate $\tilde x$ in the computational domain $\tilde\Omega$ and the coordinate $x$ in the physical domain $\Omega$ be coupled as (direct transformation)
\[
\tilde x = 1 - \left(\ln\frac{\delta + 1}{\delta - 1}\right)^{-1}\ln\frac{\delta + 1 - x}{\delta - 1 + x} ,
\]
1 0 for $\tilde x \ge 0$. Coordinates $\tilde y$ and $\tilde z$ are defined in the same manner. Change of variables
\[
\frac{d}{dx} = \frac{d\tilde x}{dx}\frac{d}{d\tilde x} = \tilde x_x\frac{d}{d\tilde x} , \qquad
\frac{d}{dy} = \frac{d\tilde y}{dy}\frac{d}{d\tilde y} = \tilde y_y\frac{d}{d\tilde y} , \qquad
\frac{d}{dz} = \frac{d\tilde z}{dz}\frac{d}{d\tilde z} = \tilde z_z\frac{d}{d\tilde z} ,
\]
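transforms the original convection-diffusion equation (4.22) in the computational domain $\tilde\Omega$ as
\[
\tilde x_x\frac{\partial(\bar u u)}{\partial\tilde x} + \tilde y_y\frac{\partial(\bar\upsilon u)}{\partial\tilde y} + \tilde z_z\frac{\partial(\bar w u)}{\partial\tilde z}
= \varepsilon\left[\tilde x_x\frac{\partial}{\partial\tilde x}\left(\tilde x_x\mu\frac{\partial u}{\partial\tilde x}\right) + \tilde y_y\frac{\partial}{\partial\tilde y}\left(\tilde y_y\mu\frac{\partial u}{\partial\tilde y}\right) + \tilde z_z\frac{\partial}{\partial\tilde z}\left(\tilde z_z\mu\frac{\partial u}{\partial\tilde z}\right)\right] + f(\tilde x, \tilde y, \tilde z) ,
\]
or
\[
\frac{\partial(\tilde u u)}{\partial\tilde x} + \frac{\partial(\tilde\upsilon u)}{\partial\tilde y} + \frac{\partial(\tilde w u)}{\partial\tilde z}
= \varepsilon\left[\frac{\partial}{\partial\tilde x}\left(\tilde\mu_x\frac{\partial u}{\partial\tilde x}\right) + \frac{\partial}{\partial\tilde y}\left(\tilde\mu_y\frac{\partial u}{\partial\tilde y}\right) + \frac{\partial}{\partial\tilde z}\left(\tilde\mu_z\frac{\partial u}{\partial\tilde z}\right)\right] + \tilde f(\tilde x, \tilde y, \tilde z) ,
\tag{4.23}
\]
where
\[
\tilde u = \frac{\bar u}{\tilde y_y\tilde z_z} , \qquad
\tilde\upsilon = \frac{\bar\upsilon}{\tilde x_x\tilde z_z} , \qquad
\tilde w = \frac{\bar w}{\tilde x_x\tilde y_y} ,
\]
\[
\tilde\mu_x = \frac{\tilde x_x}{\tilde y_y\tilde z_z}\,\mu , \qquad
\tilde\mu_y = \frac{\tilde y_y}{\tilde x_x\tilde z_z}\,\mu , \qquad
\tilde\mu_z = \frac{\tilde z_z}{\tilde x_x\tilde y_y}\,\mu , \qquad
\tilde f = \frac{1}{\tilde x_x\tilde y_y\tilde z_z}\,f .
\]
In this case, Eqs. (4.22) and (4.23) coincide up to the functions $\bar u$, $\bar\upsilon$, $\bar w$, $\mu$ and $f$.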
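The direct transformation given above and its inverse can be sketched as follows. The inverse formula is not quoted from the text; it is obtained here by solving the direct formula for $x$, and the function names are illustrative.

```python
import math


def to_computational(x, delta):
    """Direct transformation: physical coordinate x -> computational coordinate."""
    big = math.log((delta + 1.0) / (delta - 1.0))
    return 1.0 - math.log((delta + 1.0 - x) / (delta - 1.0 + x)) / big


def to_physical(xt, delta):
    """Inverse transformation, derived by solving the direct formula for x."""
    q = ((delta + 1.0) / (delta - 1.0)) ** (1.0 - xt)
    return ((delta + 1.0) - (delta - 1.0) * q) / (1.0 + q)


# A uniform grid with 101 vertices in the computational domain is mapped to a
# nonuniform grid in the physical domain, refined towards x = 0.
delta = 1.001415
grid = [to_physical(i / 100.0, delta) for i in range(101)]
print(grid[1], grid[2] - grid[1], grid[100] - grid[99])
```

The parameter $\delta$ controls the refinement: the closer $\delta$ is to one, the more vertices are clustered near the boundary $x = 0$.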
Here the equation (4.23) in the computational domain $\tilde\Omega$ is similar to the original equation (4.22) in the physical domain $\Omega$. We introduce a function $\varphi$
\[
\varphi = \exp\left(-\frac{\alpha x + \beta y + \gamma z}{\varepsilon}\right) ,
\]
where $\alpha$, $\beta$ and $\gamma$ are some constants. We define the functions $\bar u$, $\bar\upsilon$ and $\bar w$ as
\[
\bar u = A(1 - \varphi) , \qquad \bar\upsilon = B(1 - \varphi) , \qquad \bar w = C(1 - \varphi) ,
\]
where $A$, $B$ and $C$ are some constants. These constants $\alpha$, $\beta$, $\gamma$, $A$, $B$ and $C$ are intended for simulation of particular cases. Assuming $\mu = 1$ and
\[
u = \bar u = A(1 - \varphi)
\tag{4.24}
\]
and taking into account (4.22), we have the right-hand side function $f$
\[
f(x, y, z) = \frac{A}{\varepsilon}\,\varphi\left[\alpha^2 + \beta^2 + \gamma^2 + 2(\alpha A + \beta B + \gamma C)(1 - \varphi)\right]
\tag{4.25}
\]
and the Dirichlet boundary conditions.

Finally, the Dirichlet boundary value problem for equation (4.22) with the right-hand side function (4.25) and exact solution (4.24) is posed in the physical domain $\Omega = [0, 1]^3$. Similarly, the boundary value problem can be reformulated in nonlinear form
\[
\frac{\partial(u^2)}{\partial x} + \frac{\partial(\bar\upsilon u)}{\partial y} + \frac{\partial(\bar w u)}{\partial z}
= \varepsilon\left[\frac{\partial}{\partial x}\left(\mu\frac{\partial u}{\partial x}\right) + \frac{\partial}{\partial y}\left(\mu\frac{\partial u}{\partial y}\right) + \frac{\partial}{\partial z}\left(\mu\frac{\partial u}{\partial z}\right)\right] + f(x, y, z) ,
\tag{4.26}
\]
with the same right-hand side function (4.25) and the same exact solution (4.24).

Consider a 1D problem with $A = 1$ and $\alpha = 1$. Assume that a uniform grid in the computational domain has $\tilde N + 1 = 101$ vertices (mesh size $\tilde h = 1/\tilde N = 1/100$) and value of the function $u(x) = 1 - \exp(-x/\varepsilon)$ on the distance $x_2$ from the nearest boundary $x_1 = 0$ is $u(x_2) = 0.1$, i.e. approximately ten vertices are located inside the boundary layer. This assumption makes it possible to define the direct mapping $\Omega \to \tilde\Omega$ and the inverse mapping $\tilde\Omega \to \Omega$, i.e. to determine the parameter $\delta$. Figures 4.4 and 4.5 represent the function $u(x) = 1 - \exp(-x/\varepsilon)$ and the nonuniform grid near the boundary $x_1 = 0$ for $\varepsilon = 10^{-3}$ ($\delta = 1.001415$) and $\varepsilon = 10^{-9}$ ($\delta = 1.000000000423$).

First, we use the single-grid Gauss–Seidel method with the block ($3 \times 3 \times 3$, Sect. 2.7) unknown ordering for solving Eqs. (4.22) and (4.26) with different $A$, $B$ and $C$ on the grid $101 \times 101 \times 101$ ($\tilde h_x = \tilde h_y = \tilde h_z = 1/100$), starting with zero initial guess. Error of the numerical solution of Eqs. (4.22) and (4.26) is defined in the traditional manner
\[
E^{(\nu)} = \max_{ijk}\,\big|u(x_i, y_j, z_k) - u^{(\nu)}_{ijk}\big| ,
\tag{4.27}
\]
Fig. 4.4: Function $u(x) = 1 - \exp(-x/\varepsilon)$ and nonuniform grid in the physical domain for $\varepsilon = 10^{-3}$ (plotted for $0 \le x \le 0.01$, $0 \le u \le 1$)

Fig. 4.5: Function $u(x) = 1 - \exp(-x/\varepsilon)$ and nonuniform grid in the physical domain for $\varepsilon = 10^{-9}$ (plotted for $0 \le x \le 1.6 \cdot 10^{-8}$, $0 \le u \le 1$)
where $u(x_i, y_j, z_k)$ and $u^{(\nu)}_{ijk}$ are the exact (4.24) and numerical solutions respectively. Here the superscript $\nu$ denotes the Gauss–Seidel iteration counter. Results of the numerical experiments are shown in Fig. 4.6.

Smoothing procedures for the multigrid algorithms should tend towards the direct method for the discrete boundary value problem for $\varepsilon \to 0$. Natural block ordering of unknowns allows one to obtain such a smoothing procedure for $A = B = C = 1$. As a result, reduction of the number of Gauss–Seidel iterations for $\varepsilon \to 0$ is expected. For the given unknown ordering, information about conditions on the boundaries $x = 0$, $y = 0$ and $z = 0$ spreads by one mesh size in each iteration. If $A = B = C = -1$, error of the numerical solution $E^{(\nu)}$ (4.27) can be reduced only after $\approx \tilde N$ iterations. As a rule, $\bar u$, $\bar\upsilon$ and $\bar w$ are alternating-sign functions and it is very difficult to choose the unknown ordering resulting in the fastest convergence of the Gauss–Seidel iterations for $\varepsilon \to 0$.

The algorithm for the nonlinear convection-diffusion equation (4.26) consists of external Gauss–Seidel iterations with the block unknown ordering and two internal Newton iterations (Sect. 1.9), i.e. the nonlinear algorithm is twice as expensive per iteration as the linear one. If $\varepsilon = 10^{-3}$, then the nonlinear problem (4.26) can be solved without an underrelaxation. Figure 4.7 shows that the nonlinear algorithm converges somewhat slower than the linear solver.
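The exact solution (4.24) and the right-hand side (4.25) used in these experiments can be checked numerically: substituting them into (4.22) with $\mu = 1$ should leave only a small finite-difference residual. The sketch below performs this check at one point with central differences; the function names, the step `h` and the parameter values are assumptions chosen for illustration (a moderate $\varepsilon$ is used so that the differences are well resolved).

```python
import math


def make_problem(eps, alpha, beta, gam, A, B, C):
    """Return phi, the exact solution u (4.24) and the right-hand side f (4.25)."""
    def phi(x, y, z):
        return math.exp(-(alpha * x + beta * y + gam * z) / eps)

    def u(x, y, z):
        return A * (1.0 - phi(x, y, z))

    def f(x, y, z):
        p = phi(x, y, z)
        return A / eps * p * (alpha**2 + beta**2 + gam**2
                              + 2.0 * (alpha * A + beta * B + gam * C) * (1.0 - p))

    return phi, u, f


def residual(eps, alpha, beta, gam, A, B, C, point, h=1.0e-3):
    """Central-difference residual of (4.22) with mu = 1 at the given point."""
    phi, u, f = make_problem(eps, alpha, beta, gam, A, B, C)
    velocity = (A, B, C)  # amplitudes of the convecting velocities
    convection = 0.0
    diffusion = 0.0
    for axis in range(3):
        plus, minus = list(point), list(point)
        plus[axis] += h
        minus[axis] -= h
        flux_plus = velocity[axis] * (1.0 - phi(*plus)) * u(*plus)
        flux_minus = velocity[axis] * (1.0 - phi(*minus)) * u(*minus)
        convection += (flux_plus - flux_minus) / (2.0 * h)
        diffusion += (u(*plus) - 2.0 * u(*point) + u(*minus)) / h**2
    return convection - eps * diffusion - f(*point)


res = residual(0.5, 1.0, 1.0, 1.0, 1.0, 2.0, -1.0, (0.3, 0.4, 0.2))
print(res)
```

Since $u = \bar u$, the same check covers the nonlinear form (4.26), whose first convective term $\partial(u^2)/\partial x$ coincides with $\partial(\bar u u)/\partial x$ on the exact solution.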
Fig. 4.6: The convergence history: reduction of error $E^{(\nu)}$ (4.27) in solving (4.22). The symbols are explained in Fig. 4.7. (Error $E^{(\nu)}$ on a logarithmic axis against the Gauss–Seidel iteration $\nu$, $0 \le \nu \le 200$.)

Fig. 4.7: The convergence history: reduction of error $E^{(\nu)}$ (4.27) in solving (4.26). (Error $E^{(\nu)}$ on a logarithmic axis against the Gauss–Seidel iteration $\nu$, $0 \le \nu \le 200$; curves for $\varepsilon = 10^{-3}, 10^{-5}, 10^{-7}, 10^{-9}$, each with $A = B = C = 1$ and $A = B = C = -1$.)
For multigrid methods we define error of the numerical solution of Eqs. (4.22) and (4.26) as
\[
E^{(q)} = \max_{ijk}\,\big|u(x_i, y_j, z_k) - u^{(q)}_{ijk}\big| ,
\tag{4.28}
\]
where $u(x_i, y_j, z_k)$ and $u^{(q)}_{ijk}$ are the exact (4.24) and numerical solutions respectively. Here the superscript $q$ denotes the multigrid iteration counter. The finest grid $101 \times 101 \times 101$ ($\tilde h_x = \tilde h_y = \tilde h_z = 1/100$) generates the four-level multigrid structure ($L_3^+ = 3$). The above-mentioned Gauss–Seidel method with the block $3 \times 3 \times 3$ unknown ordering is used as the smoothing procedure for RMT. One ($\nu = 1$) or two ($\nu = 2$) smoothing iterations are applied on all levels $l$ ($0 \le l < L_3^+$). Results of the numerical experiments are shown in Figs. 4.8 and 4.9.

Here we demonstrate only a formal application of RMT for solving the convection-diffusion equations. Convergence of the robust multigrid method should be $h$- and $\varepsilon$-independent [16].
Fig. 4.8: The convergence history: reduction of error $E^{(q)}$ (4.28) for solving (4.22). (Error $E^{(q)}$ on a logarithmic axis against the multigrid iteration $q$, $0 \le q \le 3$; curves for $\varepsilon = 10^{-3}$ with $\nu = 1$ and $\nu = 2$, each for $A = B = C = 1$ and $A = B = C = -1$.)

Fig. 4.9: The convergence history: reduction of error $E^{(q)}$ (4.28) for solving (4.22). (Error $E^{(q)}$ on a logarithmic axis against the multigrid iteration $q$, $0 \le q \le 3$; curves for $\varepsilon = 10^{-9}$ with $\nu = 1$ and $\nu = 2$, each for $A = B = C = 1$ and $A = B = C = -1$.)

4.4 Resulting SLAE

Assume that the nonlinear discrete analogues of the momentum equations (4.11) and (4.12) and energy equation (4.13) are linearized by some linearization technique, and interpolation (4.19) is applied for discretization of the convective terms. Using the computational stencils shown in Fig. 4.3, we rewrite the linearized analogues in the general form:
a) linearized discrete analogue of the X-momentum equation
\[
\begin{aligned}
&U^{W}_{ij}u^{(n+1)}_{i-1j} + U^{E}_{ij}u^{(n+1)}_{i+1j} + U^{S}_{ij}u^{(n+1)}_{ij-1} + U^{N}_{ij}u^{(n+1)}_{ij+1} + U^{P}_{ij}u^{(n+1)}_{ij} \\
&\quad + U^{WN}_{ij}\upsilon^{(n+1)}_{i-1j+1} + U^{EN}_{ij}\upsilon^{(n+1)}_{ij+1} + U^{WS}_{ij}\upsilon^{(n+1)}_{i-1j} + U^{ES}_{ij}\upsilon^{(n+1)}_{ij}
+ \frac{p^{(n+1)}_{ij} - p^{(n+1)}_{i-1j}}{h_x} = S^{u}_{ij} ,
\end{aligned}
\tag{4.29}
\]
b) linearized discrete analogue of the Y-momentum equation
\[
\begin{aligned}
&V^{W}_{ij}\upsilon^{(n+1)}_{i-1j} + V^{E}_{ij}\upsilon^{(n+1)}_{i+1j} + V^{S}_{ij}\upsilon^{(n+1)}_{ij-1} + V^{N}_{ij}\upsilon^{(n+1)}_{ij+1} + V^{P}_{ij}\upsilon^{(n+1)}_{ij} \\
&\quad + V^{WN}_{ij}u^{(n+1)}_{ij} + V^{EN}_{ij}u^{(n+1)}_{i+1j} + V^{WS}_{ij}u^{(n+1)}_{ij-1} + V^{ES}_{ij}u^{(n+1)}_{i+1j-1}
+ \frac{p^{(n+1)}_{ij} - p^{(n+1)}_{ij-1}}{h_y} = S^{\upsilon}_{ij} ,
\end{aligned}
\tag{4.30}
\]
c) linearized discrete analogue of the energy equation
\[
\begin{aligned}
&H^{W}_{ij}T^{(n+1)}_{i-1j} + H^{E}_{ij}T^{(n+1)}_{i+1j} + H^{S}_{ij}T^{(n+1)}_{ij-1} + H^{N}_{ij}T^{(n+1)}_{ij+1} + H^{P}_{ij}T^{(n+1)}_{ij} \\
&\quad + H^{w}_{ij}u^{(n+1)}_{ij} + H^{e}_{ij}u^{(n+1)}_{i+1j} + H^{s}_{ij}\upsilon^{(n+1)}_{ij} + H^{n}_{ij}\upsilon^{(n+1)}_{ij+1} = S^{T}_{ij} .
\end{aligned}
\tag{4.31}
\]
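First, we consider an isothermal fluid flow $T^{(n+1)}_{ij} = \mathrm{const}$ and use the point ordering of unknowns $u^{(n+1)}_{ij}$, $\upsilon^{(n+1)}_{ij}$ and $p^{(n+1)}_{ij}$ as shown in Fig. 1.6 (i.e. form vectors $u_h$, $\upsilon_h$ and $p_h$). Similarly we form the right-hand side vectors $s^u_h$ and $s^\upsilon_h$ of Eqs. (4.29) and (4.30). The resulting SLAE arising from (4.29), (4.30) and (4.10) can be written in matrix form
\[
\begin{pmatrix} A_h & B_h^T \\ B_h & 0 \end{pmatrix}
\begin{pmatrix} V_h \\ p_h \end{pmatrix}
=
\begin{pmatrix} s_h \\ 0 \end{pmatrix} , \qquad
V_h = \begin{pmatrix} u_h \\ \upsilon_h \end{pmatrix} , \qquad
s_h = \begin{pmatrix} s^u_h \\ s^\upsilon_h \end{pmatrix} .
\tag{4.32}
\]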
Previously, such a SLAE has been obtained from discretization and linearization of the Navier–Stokes equations written in operator form (4.8). Remember that only the square matrix A h is invertible (Sect. 4.1). For a general description see [2] and the references therein. In the following, we will analyze the most popular solution approaches for the resulting SLAE (4.32).
4.5 Uzawa iteration

The Uzawa iteration is based on use of the block pattern of the coefficient matrix of SLAE (4.32). The Uzawa iteration is defined as
\[
A_h V_h^{(\nu+1)} + B_h^T p_h^{(\nu)} = s_h ,
\tag{4.33a}
\]
\[
Q p_h^{(\nu+1)} = Q p_h^{(\nu)} + B_h V_h^{(\nu+1)} ,
\tag{4.33b}
\]
where $Q$ is some matrix and the superscript $\nu$ denotes the Uzawa iteration counter. Assuming that Eq. (4.33a) is solved exactly:
\[
V_h^{(\nu+1)} + A_h^{-1}B_h^T p_h^{(\nu)} = A_h^{-1}s_h
\]
and substituting $V_h^{(\nu+1)}$ into the second equation (4.33b), we have
\[
Q p_h^{(\nu+1)} = Q p_h^{(\nu)} + B_h\big(A_h^{-1}s_h - A_h^{-1}B_h^T p_h^{(\nu)}\big)
\]
or
\[
p_h^{(\nu+1)} = \big(I - Q^{-1}B_h A_h^{-1}B_h^T\big)p_h^{(\nu)} + Q^{-1}B_h A_h^{-1}s_h .
\]
Convergence of the Uzawa iteration means that the exact solution $p_h$ should satisfy
\[
p_h = \big(I - Q^{-1}B_h A_h^{-1}B_h^T\big)p_h + Q^{-1}B_h A_h^{-1}s_h .
\]
Then the exact errors after $\nu$ and $\nu + 1$ Uzawa iterations are coupled as
\[
p_h - p_h^{(\nu+1)} = (I - Q^{-1}S)\big(p_h - p_h^{(\nu)}\big) ,
\]
where the matrix $S = B_h A_h^{-1}B_h^T$ is Schur complement (4.9). It is clear that convergence rate of the Uzawa iterations depends strongly on the matrix $Q$. On the one hand, we try to get $Q \to S$: if $\|I - Q^{-1}S\| \le \omega < 1$, then the Uzawa iterations converge as
\[
\|p_h - p_h^{(\nu+1)}\| \le \omega\,\|p_h - p_h^{(\nu)}\| \le \omega^{\nu+1}\,\|p_h - p_h^{(0)}\| .
\]
On the other hand, the matrix $Q$ should be easy to invert, i.e. Eq. (4.33b) should be solved efficiently by some iterative method. Mostly the matrix $Q$ is defined as $B_h D_A^{-1}B_h^T$, where the diagonal matrix $D_A$ is formed by the diagonal entries of $A_h$.

One disadvantage of this approach is formulation of the boundary conditions for the pressure iteration error $p_h^{(\nu+1)} - p_h^{(\nu)}$. Assuming small time step $h_t \to 0$, we have
\[
A_h \approx h_t^{-1}I \;\Rightarrow\; A_h^{-1} \approx h_t I .
\]
It results in simplification of the matrix $Q$ definition
\[
S = B_h A_h^{-1}B_h^T \approx h_t B_h B_h^T = Q .
\]
Analogy with (4.6) shows that $B_h B_h^T = \Delta_h$. For $h_t \to 0$ Eq. (4.33b) becomes
\[
h_t\,\Delta_h\,\delta_p^{(\nu+1)} = B_h V_h^{(\nu+1)} ,
\tag{4.34}
\]
where $\delta_p^{(\nu+1)} = p_h^{(\nu+1)} - p_h^{(\nu)}$ is the pressure iteration error. In two dimensions Eq. (4.34) is a discrete analogue of the PDE
\[
\frac{\partial^2\delta_p^{(\nu+1)}}{\partial x^2} + \frac{\partial^2\delta_p^{(\nu+1)}}{\partial y^2}
= \frac{1}{h_t}\left(\frac{\partial u^{(\nu+1)}}{\partial x} + \frac{\partial\upsilon^{(\nu+1)}}{\partial y}\right) .
\tag{4.35}
\]
To determine the pressure correction $\delta_p$, it is necessary to pose a boundary value problem, i.e. to formulate the boundary conditions for Poisson equation (4.35). Since Eqs. (4.33b) and (4.35) have no physical meaning, formulation of the boundary conditions is a very difficult problem. Problems concerning the flow in the driven cavity and the flow between the parallel plates are used for illustration. Geometry of these problems and the boundary conditions for the velocities $u$ and $\upsilon$ are shown in Fig. 4.10.
Fig. 4.10: Problems concerning the flow in the driven cavity (A) and the flow between the parallel plates (B): geometry and boundary conditions. (Both panels show the unit square $0 \le x \le 1$, $0 \le y \le 1$. Panel A: $u(t,x,0) = 0$, $\upsilon(t,x,0) = 0$; $u(t,x,1) = U_w(t)$, $\upsilon(t,x,1) = 0$; $u(t,0,y) = 0$, $\upsilon(t,0,y) = 0$; $u(t,1,y) = 0$, $\upsilon(t,1,y) = 0$. Panel B: $u(t,x,0) = 0$, $\upsilon(t,x,0) = 0$; $u(t,x,1) = 0$, $\upsilon(t,x,1) = 0$; $u(t,0,y) = U_0(t,y)$, $\upsilon(t,0,y) = 0$; $\partial u/\partial x|_{x=1} = 0$, $\upsilon(t,1,y) = 0$.)
First, consider the problem of the flow in the driven cavity (Fig. 4.10A). Integration of (4.35) over the domain $\Omega = \{(x, y) \mid 0 < x < 1,\ 0 < y < 1\}$ gives the following solvability condition:
\[
\int_0^1 \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial x}\right|_{x=1} dy
- \int_0^1 \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial x}\right|_{x=0} dy
+ \int_0^1 \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial y}\right|_{y=1} dx
- \int_0^1 \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial y}\right|_{y=0} dx = 0 .
\]
As a rule, zero gradient boundary conditions for the pressure correction $\delta_p$ are given on solid walls, which means that the boundary conditions for Eq. (4.35) become
\[
\left.\frac{\partial\delta_p^{(\nu+1)}}{\partial x}\right|_{x=1}
= \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial x}\right|_{x=0}
= \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial y}\right|_{y=1}
= \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial y}\right|_{y=0} = 0 .
\]
Discrete analogues of these Neumann boundary conditions are used for (4.34). Numerical solution of the Neumann boundary value problem for the Poisson equation needs more computational effort than the Dirichlet boundary value problem.

Second, consider the flow between parallel plates (Fig. 4.10B). No-slip conditions are posed on the plates $y = 0$ and $y = 1$. Distribution of the velocity $u$ is given on the inlet boundary $x = 0$. Homogeneous Neumann boundary condition for the velocity $u$ and homogeneous Dirichlet boundary condition for the velocity $\upsilon$ are posed on the outlet boundary $x = 1$.
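Integration of (4.35) over the domain $\Omega = \{(x, y) \mid 0 < x < 1,\ 0 < y < 1\}$ gives the following solvability condition:
\[
\begin{aligned}
&\int_0^1 \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial x}\right|_{x=1} dy
- \int_0^1 \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial x}\right|_{x=0} dy
+ \int_0^1 \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial y}\right|_{y=1} dx
- \int_0^1 \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial y}\right|_{y=0} dx \\
&\qquad = \frac{1}{h_t}\int_0^1 u^{(\nu+1)}(t^{(n+1)}, 1, y)\,dy - \frac{1}{h_t}\int_0^1 u^{(\nu+1)}(t^{(n+1)}, 0, y)\,dy .
\end{aligned}
\tag{4.36}
\]
The right-hand side of (4.36) is an error of the mass flow rate between the outlet ($x = 1$) and inlet ($x = 0$) sections. For exact solution satisfying the continuity equation (4.1a), we have
\[
\int_0^1 u(t^{(n+1)}, x, y)\,dy = \int_0^1 u(t^{(n+1)}, 0, y)\,dy , \qquad 0 \le x \le 1 .
\]
Using (4.36) and assuming zero gradient boundary conditions for the pressure correction $\delta_p$ on the solid walls of the plates
\[
\int_0^1 \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial y}\right|_{y=1} dx
= \int_0^1 \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial y}\right|_{y=0} dx = 0 ,
\tag{4.37}
\]
we have the following solvability condition:
\[
\int_0^1 \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial x}\right|_{x=1} dy
- \int_0^1 \left.\frac{\partial\delta_p^{(\nu+1)}}{\partial x}\right|_{x=0} dy
= \frac{1}{h_t}\int_0^1 u^{(\nu+1)}(t^{(n+1)}, 1, y)\,dy - \frac{1}{h_t}\int_0^1 u(t^{(n+1)}, 0, y)\,dy .
\tag{4.38}
\]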
Inlet velocity $u(t^{(n+1)}, 0, y)$ is known. The solvability condition (4.38) shows that the boundary conditions for the pressure correction depend on error of the mass flow rate. Different approaches for the boundary condition formulation can be found in [15] and the references therein.

As a rule, the Uzawa iterations use underrelaxation
\[
p_h^{(\nu+1)} := p_h^{(\nu)} + \varpi\,\delta_p^{(\nu+1)}
\]
for partial compensation of the difference between sparse matrix $Q$ and (almost) dense matrix $S$.

Many classical iterative solvers (Uzawa iterations, SIMPLE and others) treat the momentum equations and a 'pressure equation' separately in an outer iteration. Disadvantages of these solvers are caused by decoupled solutions of PDEs, artificial boundary conditions, optimal choice of the matrix $Q$ and the underrelaxation parameter $\varpi$, and the inability to use similar approaches efficiently in black-box software. In order to illustrate the basic ideas of black-box software, we consider an iterative method with the least number of problem-dependent components for the resulting SLAE (4.32).
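The Uzawa iteration (4.33) with $Q = B_h D_A^{-1}B_h^T$ and underrelaxation can be sketched on the smallest possible saddle-point system: two velocity unknowns and one pressure unknown. The matrices below are hypothetical and have nothing to do with a discretized flow problem; the names `uzawa` and `solve2` and the parameter `relax` (standing for $\varpi$) are introduced here for illustration.

```python
def solve2(m, rhs):
    """Solve a 2x2 system m * x = rhs by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]


def uzawa(A, B, s, iterations, relax=1.0):
    """Uzawa iteration for [[A, B^T], [B, 0]] (V, p)^T = (s, 0)^T.

    A is 2x2, B is a single row of length 2, so the pressure is a scalar.
    """
    # Q = B D_A^{-1} B^T with D_A = diag(A); a scalar for one pressure unknown.
    q = B[0] ** 2 / A[0][0] + B[1] ** 2 / A[1][1]
    p = 0.0
    v = [0.0, 0.0]
    for _ in range(iterations):
        # (4.33a): solve A V = s - B^T p exactly.
        v = solve2(A, [s[0] - B[0] * p, s[1] - B[1] * p])
        # (4.33b) with underrelaxation: p := p + relax * Q^{-1} B V.
        delta_p = (B[0] * v[0] + B[1] * v[1]) / q
        p += relax * delta_p
    return v, p


A = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical momentum block
B = [1.0, 1.0]                 # hypothetical discrete divergence
s = [1.0, 2.0]
v, p = uzawa(A, B, s, 60)
print(v, p, B[0] * v[0] + B[1] * v[1])
```

For this example the Schur complement is $S = 5/11$ and $Q = 7/12$, so the error of the pressure is multiplied by $|1 - Q^{-1}S| = 17/77$ in each iteration, in agreement with the convergence estimate of this section.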
4.6 Vanka iteration

Our goal is to build a smoother for black-box multigrid. The smoother should have the least number of problem-dependent components and should be able to treat many problems, including SLAEs with zero diagonal entries of the coefficient matrix (4.8). Vanka iteration is a variant of the Gauss–Seidel method with a special block ordering of the unknowns [26]. The Gauss–Seidel method with block 2 × 2 unknown ordering was considered in Sect. 1.5 (Fig. 1.9). The Vanka method has a similar construction, but it can be applied to systems of PDEs.

Assume that a staggered grid has been generated for discretization of the Navier–Stokes equations (Fig. 4.2). Let the control volumes $V^*$ used for discretization of the continuity equation (4.1a) be numbered along lines i = const as shown in Fig. 4.11. For clarity, we omit the superscript (n + 1) for all unknowns in (4.29)–(4.31) and (4.10). Remember that this superscript refers to values obtained at the time step (n + 1). The superscript ν, denoting the Vanka iteration counter, will be used instead of (n + 1). We form a subvector (block) of the unknowns

$$
V = \left(u_{ij}^{(\nu+1)}\ \ u_{i+1j}^{(\nu+1)}\ \ \upsilon_{ij}^{(\nu+1)}\ \ \upsilon_{ij+1}^{(\nu+1)}\ \ p_{ij}^{(\nu+1)}\right)^T
$$

with the velocity components on the faces of the control volume $V^*_{ij}$ used for discretization of the continuity equation (4.1a), and the pressure in the center of this volume (Fig. 4.3C).
Fig. 4.11: Ordering of the control volumes for discretization of the continuity equation (4.1a) along lines i = const (in the 4 × 4 example shown, volumes 1–4 fill the first column from bottom to top, volumes 5–8 the second column, and so on up to 16)
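The ordering in Fig. 4.11 can be written as a one-line index map. The following is a minimal sketch, assuming 1-based indices i (column) and j (row); the function name `control_volume_number` is hypothetical and not taken from the book.

```python
def control_volume_number(i, j, ny):
    """1-based number of volume (i, j), counting bottom to top along each line i = const."""
    return (i - 1) * ny + j


# 4 x 4 example laid out as in the figure: top row first, bottom row last.
nx = ny = 4
layout = [[control_volume_number(i, j, ny) for i in range(1, nx + 1)]
          for j in range(ny, 0, -1)]
for row in layout:
    print(row)
```

The printed rows reproduce the numbers visible in the figure, so a Vanka sweep `do m = 1, M` visits the volumes column by column.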
Each control volume $V^*_{ij}$ results in a SLAE composed of the linearized discrete analogues of the momentum equations (4.29) and (4.30) and the continuity equation (4.10). We discuss this SLAE in detail here. Taking into account the ordering of the control volumes shown in Fig. 4.11, the linearized discrete analogues of the X-momentum equations (4.29) become

$$
\begin{aligned}
&U^{W}_{ij} u^{(\nu+1)}_{i-1j} + U^{E}_{ij} u^{(\nu+1)}_{i+1j} + U^{S}_{ij} u^{(\nu+1)}_{ij-1} + U^{N}_{ij} u^{(\nu)}_{ij+1} + U^{P}_{ij} u^{(\nu+1)}_{ij} \\
&\quad + U^{EN}_{ij} \upsilon^{(\nu+1)}_{ij+1} + U^{WS}_{ij} \upsilon^{(\nu+1)}_{i-1j} + U^{ES}_{ij} \upsilon^{(\nu+1)}_{ij} + U^{WN}_{ij} \upsilon^{(\nu+1)}_{i-1j+1}
+ \frac{p^{(\nu+1)}_{ij} - p^{(\nu+1)}_{i-1j}}{h_x} = S^{u}_{ij} .
\end{aligned}
$$
All unknowns forming the subvector (block) V are kept on the left-hand side of the equation, while the other terms are transferred to the right-hand side:

$$
U^{E}_{ij} u^{(\nu+1)}_{i+1j} + U^{P}_{ij} u^{(\nu+1)}_{ij} + U^{EN}_{ij} \upsilon^{(\nu+1)}_{ij+1} + U^{ES}_{ij} \upsilon^{(\nu+1)}_{ij} + h_x^{-1} p^{(\nu+1)}_{ij} = b_1 ,
\tag{4.39a}
$$

$$
b_1 = S^{u}_{ij} - U^{W}_{ij} u^{(\nu+1)}_{i-1j} - U^{S}_{ij} u^{(\nu+1)}_{ij-1} - U^{N}_{ij} u^{(\nu)}_{ij+1} - U^{WN}_{ij} \upsilon^{(\nu+1)}_{i-1j+1} - U^{WS}_{ij} \upsilon^{(\nu+1)}_{i-1j} + h_x^{-1} p^{(\nu+1)}_{i-1j} .
\tag{4.39b}
$$
Figure 4.12A shows the location of the computational stencil used for discretization of the X-momentum equations (4.1b) relative to the control volume $V^*_{ij}$ used for discretization of the continuity equation (4.1a). We then shift the stencil by $h_x$ in the x direction, i.e. formally replace i by i + 1 in (4.29) (Fig. 4.12B):

$$
\begin{aligned}
&U^{W}_{i+1j} u^{(\nu+1)}_{ij} + U^{E}_{i+1j} u^{(\nu+1)}_{i+2j} + U^{S}_{i+1j} u^{(\nu+1)}_{i+1j-1} + U^{N}_{i+1j} u^{(\nu)}_{i+1j+1} + U^{P}_{i+1j} u^{(\nu+1)}_{i+1j} \\
&\quad + U^{WN}_{i+1j} \upsilon^{(\nu+1)}_{ij+1} + U^{EN}_{i+1j} \upsilon^{(\nu+1)}_{i+1j+1} + U^{WS}_{i+1j} \upsilon^{(\nu+1)}_{ij} + U^{ES}_{i+1j} \upsilon^{(\nu+1)}_{i+1j}
+ \frac{p^{(\nu+1)}_{i+1j} - p^{(\nu+1)}_{ij}}{h_x} = S^{u}_{i+1j} .
\end{aligned}
$$
Again, all unknowns forming the subvector (block) V are kept on the left-hand side of the equation, while the other terms are transferred to the right-hand side:

$$
U^{W}_{i+1j} u^{(\nu+1)}_{ij} + U^{P}_{i+1j} u^{(\nu+1)}_{i+1j} + U^{WN}_{i+1j} \upsilon^{(\nu+1)}_{ij+1} + U^{WS}_{i+1j} \upsilon^{(\nu+1)}_{ij} - h_x^{-1} p^{(\nu+1)}_{ij} = b_2 ,
\tag{4.40a}
$$

$$
b_2 = S^{u}_{i+1j} - U^{E}_{i+1j} u^{(\nu+1)}_{i+2j} - U^{S}_{i+1j} u^{(\nu+1)}_{i+1j-1} - U^{N}_{i+1j} u^{(\nu)}_{i+1j+1} - U^{EN}_{i+1j} \upsilon^{(\nu+1)}_{i+1j+1} - U^{ES}_{i+1j} \upsilon^{(\nu+1)}_{i+1j} - h_x^{-1} p^{(\nu+1)}_{i+1j} .
\tag{4.40b}
$$
The linearized discrete analogues of the Y-momentum equations (4.30) are treated in the same manner (Fig. 4.12C):

$$
V^{WN}_{ij} u^{(\nu+1)}_{ij} + V^{EN}_{ij} u^{(\nu+1)}_{i+1j} + V^{P}_{ij} \upsilon^{(\nu+1)}_{ij} + V^{N}_{ij} \upsilon^{(\nu+1)}_{ij+1} + h_y^{-1} p^{(\nu+1)}_{ij} = b_3 ,
\tag{4.41a}
$$

$$
b_3 = S^{\upsilon}_{ij} - V^{W}_{ij} \upsilon^{(\nu+1)}_{i-1j} - V^{E}_{ij} \upsilon^{(\nu)}_{i+1j} - V^{S}_{ij} \upsilon^{(\nu+1)}_{ij-1} - V^{WS}_{ij} u^{(\nu+1)}_{ij-1} - V^{ES}_{ij} u^{(\nu+1)}_{i+1j-1} + h_y^{-1} p^{(\nu+1)}_{ij-1} .
\tag{4.41b}
$$
Fig. 4.12: Formation of a SLAE for the Vanka iterations (panels A–E)
We then shift the stencil by $h_y$ in the y direction, i.e. formally replace j by j + 1 in (4.30) (Fig. 4.12D):

$$
V^{WS}_{ij+1} u^{(\nu+1)}_{ij} + V^{ES}_{ij+1} u^{(\nu+1)}_{i+1j} + V^{S}_{ij+1} \upsilon^{(\nu+1)}_{ij} + V^{P}_{ij+1} \upsilon^{(\nu+1)}_{ij+1} - h_y^{-1} p^{(\nu+1)}_{ij} = b_4 ,
\tag{4.42a}
$$

$$
b_4 = S^{\upsilon}_{ij+1} - V^{W}_{ij+1} \upsilon^{(\nu+1)}_{i-1j+1} - V^{E}_{ij+1} \upsilon^{(\nu)}_{i+1j+1} - V^{N}_{ij+1} \upsilon^{(\nu)}_{ij+2} - V^{WN}_{ij+1} u^{(\nu+1)}_{ij+1} - V^{EN}_{ij+1} u^{(\nu)}_{i+1j+1} - h_y^{-1} p^{(\nu)}_{ij+1} .
\tag{4.42b}
$$
All equations (4.39a), (4.40a), (4.41a), (4.42a) and (4.10) should be combined into the resulting SLAE

$$
\begin{pmatrix}
U^{P}_{ij} & U^{E}_{ij} & U^{ES}_{ij} & U^{EN}_{ij} & h_x^{-1} \\
U^{W}_{i+1j} & U^{P}_{i+1j} & U^{WS}_{i+1j} & U^{WN}_{i+1j} & -h_x^{-1} \\
V^{WN}_{ij} & V^{EN}_{ij} & V^{P}_{ij} & V^{N}_{ij} & h_y^{-1} \\
V^{WS}_{ij+1} & V^{ES}_{ij+1} & V^{S}_{ij+1} & V^{P}_{ij+1} & -h_y^{-1} \\
-h_x^{-1} & h_x^{-1} & -h_y^{-1} & h_y^{-1} & 0
\end{pmatrix}
\begin{pmatrix}
u^{(\nu+1)}_{ij} \\ u^{(\nu+1)}_{i+1j} \\ \upsilon^{(\nu+1)}_{ij} \\ \upsilon^{(\nu+1)}_{ij+1} \\ p^{(\nu+1)}_{ij}
\end{pmatrix}
=
\begin{pmatrix}
b_1 \\ b_2 \\ b_3 \\ b_4 \\ 0
\end{pmatrix} ,
\tag{4.43}
$$
where the components of the right-hand side vector are given by (4.39b), (4.40b), (4.41b) and (4.42b). The last equation of this SLAE is a discrete analogue of the continuity equation (4.10). SLAE (4.43) can be solved by Gaussian elimination with partial pivoting (Sect. 1.2). Recall that this direct method does not rely on the pattern of the coefficient matrix, and partial pivoting makes it possible to solve a SLAE with zero diagonal entries. Since SLAE (4.43) is composed of the linearized discrete analogues of the momentum and continuity equations, we iterate the Gaussian elimination until convergence of the nonlinear iterations (Sect. 1.9). All approximations to the solution will satisfy the discrete analogue of the continuity equation (4.10) (the last equation in SLAE (4.43)). The underrelaxation

$$
\begin{aligned}
u^{(\nu+1)}_{kj} &:= \bar u^{(\nu+1)}_{kj} + \varpi_u \left(u^{(\nu+1)}_{kj} - \bar u^{(\nu+1)}_{kj}\right), & k &= i, i+1 , \\
\upsilon^{(\nu+1)}_{ik} &:= \bar\upsilon^{(\nu+1)}_{ik} + \varpi_\upsilon \left(\upsilon^{(\nu+1)}_{ik} - \bar\upsilon^{(\nu+1)}_{ik}\right), & k &= j, j+1 , \\
p^{(\nu+1)}_{ij} &:= \bar p^{(\nu+1)}_{ij} + \varpi_p \left(p^{(\nu+1)}_{ij} - \bar p^{(\nu+1)}_{ij}\right)
\end{aligned}
$$

is typically required, depending on the Reynolds number Re (4.3), for convergence of the nonlinear iterations. Here $\bar u_{kj}$, $\bar\upsilon_{ik}$ and $\bar p_{ij}$ are the known values from the previous nonlinear iteration. Usually the underrelaxation parameters $0.2 \le \varpi_u \approx \varpi_\upsilon \le 0.8$ and $\varpi_p = 1$ need to be used for convergence. There is no general rule for assigning optimum underrelaxation parameters, as values used in one case may not work properly in another case [15].

Let the SLAE (4.43) associated with the control volume m be abbreviated as $A_m V_m = b_m$, where the subscript m indicates the control volume ordering (e.g. Fig. 4.11). Nonlinear Vanka iterations can be represented as

    do m = 1, M
        Solve A_m V_m = b_m
    end do
where ‘Solve’ means solution of (4.43) by the hybrid method (Sect. 1.9).

To summarize the important applied aspects of the Vanka iteration: a SLAE such as (4.43) can be formed using a group of control volumes for discretization of the continuity equation. Since the convergence rate of multigrid methods depends on the grid aspect ratio (Sect. 2.5), the group of control volumes should take into account the anisotropy of the problem. For example, Fig. 4.13 shows a preferable group of control volumes for $h_x \ll h_y$.

Vanka iteration allows different systems of PDEs to be handled in a coupled manner. Any discretized PDE can be added to the SLAE (4.43). For example, the linearized discrete analogue of the energy equation becomes

$$
H^{w}_{ij} u^{(n+1)}_{ij} + H^{e}_{ij} u^{(n+1)}_{i+1j} + H^{s}_{ij} \upsilon^{(n+1)}_{ij} + H^{n}_{ij} \upsilon^{(n+1)}_{ij+1} + H^{P}_{ij} T^{(n+1)}_{ij} = b_5 ,
\tag{4.44a}
$$

$$
b_5 = S^{T}_{ij} - H^{W}_{ij} T^{(n+1)}_{i-1j} - H^{E}_{ij} T^{(n+1)}_{i+1j} - H^{S}_{ij} T^{(n+1)}_{ij-1} - H^{N}_{ij} T^{(n+1)}_{ij+1} .
\tag{4.44b}
$$
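Then the expanded¹ SLAE (4.43) with (4.44) is written as

$$
\begin{pmatrix}
U^{P}_{ij} & U^{E}_{ij} & U^{ES}_{ij} & U^{EN}_{ij} & h_x^{-1} & 0 \\
U^{W}_{i+1j} & U^{P}_{i+1j} & U^{WS}_{i+1j} & U^{WN}_{i+1j} & -h_x^{-1} & 0 \\
V^{WN}_{ij} & V^{EN}_{ij} & V^{P}_{ij} & V^{N}_{ij} & h_y^{-1} & 0 \\
V^{WS}_{ij+1} & V^{ES}_{ij+1} & V^{S}_{ij+1} & V^{P}_{ij+1} & -h_y^{-1} & 0 \\
-h_x^{-1} & h_x^{-1} & -h_y^{-1} & h_y^{-1} & 0 & 0 \\
H^{w}_{ij} & H^{e}_{ij} & H^{s}_{ij} & H^{n}_{ij} & 0 & H^{P}_{ij}
\end{pmatrix}
\begin{pmatrix}
u^{(\nu+1)}_{ij} \\ u^{(\nu+1)}_{i+1j} \\ \upsilon^{(\nu+1)}_{ij} \\ \upsilon^{(\nu+1)}_{ij+1} \\ p^{(\nu+1)}_{ij} \\ T^{(\nu+1)}_{ij}
\end{pmatrix}
=
\begin{pmatrix}
b_1 \\ b_2 \\ b_3 \\ b_4 \\ 0 \\ b_5
\end{pmatrix} .
$$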
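The local solve for one control volume can be sketched as follows. This is a minimal illustration, assuming made-up coefficient values (they are not taken from the book) placed in the layout of SLAE (4.43); the helper name `solve_pivoting` is hypothetical. It shows that Gaussian elimination with partial pivoting handles the zero diagonal entry in the continuity row, and that the underrelaxed velocities still satisfy the discrete continuity equation when the previous iterate did.

```python
def solve_pivoting(a, b):
    """Gaussian elimination with partial pivoting; inputs are not modified."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


hx = hy = 0.5
# Unknowns: (u_ij, u_i+1j, v_ij, v_ij+1, p_ij); last row is the continuity equation.
# The numerical values are illustrative assumptions only.
A = [
    [4.0, -1.0, 0.1, 0.1, 1 / hx],
    [-1.0, 4.0, 0.1, 0.1, -1 / hx],
    [0.1, 0.1, 4.0, -1.0, 1 / hy],
    [0.1, 0.1, -1.0, 4.0, -1 / hy],
    [-1 / hx, 1 / hx, -1 / hy, 1 / hy, 0.0],
]
b = [1.0, 2.0, 0.5, -0.5, 0.0]
V_new = solve_pivoting(A, b)

# Underrelaxation against the previous iterate, with omega_p = 1 for the pressure.
V_old = [0.0] * 5
omega = [0.5, 0.5, 0.5, 0.5, 1.0]
V = [vo + w * (vn - vo) for vo, vn, w in zip(V_old, V_new, omega)]

residual = max(abs(sum(A[r][c] * V_new[c] for c in range(5)) - b[r]) for r in range(5))
divergence = (V[1] - V[0]) / hx + (V[3] - V[2]) / hy
print(residual, divergence)
```

In a full smoother this solve would be repeated for every control volume m in the chosen ordering, with the right-hand side rebuilt from the latest neighbouring values.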
Vanka iteration (i.e. the Gauss–Seidel method) has the following problem-dependent components: the unknown ordering (for anisotropic problems), the optimum underrelaxation parameter (for nonlinear problems) and a stopping criterion. Single-grid Vanka iterations converge slowly, but have reasonable smoothing properties. A disadvantage of the method is that the Vanka iterations require the solution of SLAEs by Gaussian elimination. It should be remembered that the algorithmic complexity of Gaussian elimination is $W = O(N^3)$ arithmetic operations, where N is the number of unknowns (Sect. 1.2). This fact complicates practical application of this smoother. For example, the four control volumes shown in Fig. 4.13 result in a SLAE with a 17 × 17 coefficient matrix. If the mathematical model consists of the Navier–Stokes equations, the energy equation and two equations for turbulence description, then these four control volumes result in a SLAE with a 29 × 29 coefficient matrix. In the first case, the Vanka iteration is expected to be nearly $(29/17)^3 \approx 5$ times faster than in the second one. For the 3D Navier–Stokes equations, four control volumes result in a SLAE with a 25 × 25 coefficient matrix. Three additional PDEs expand the size of this SLAE to 37 × 37. For real-life problems where large systems of PDEs should be solved, this can lead to an iterative solution of a SLAE with a dense coefficient matrix. In general, the smoothing will become less efficient.

Fig. 4.13: Group of the control volumes for discretization of the continuity equation

¹ If the velocity is independent of the temperature, then the energy equation (4.2) can be solved in the decoupled manner, i.e. after solution of the Navier–Stokes equations (4.1).
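The block sizes quoted above can be reproduced by simple counting. The sketch below assumes that the four control volumes of Fig. 4.13 form a single row (an inference from the quoted sizes, not a statement from the book) and that every additional PDE contributes one unknown per control volume; the function name `block_size` is hypothetical.

```python
def block_size(n_volumes, dim, n_extra_pdes=0):
    """Unknowns in the local SLAE for a row of n_volumes continuity control volumes."""
    normal_faces = n_volumes + 1                  # velocity component along the row
    tangential_faces = 2 * n_volumes * (dim - 1)  # two faces per volume per other direction
    cell_values = n_volumes * (1 + n_extra_pdes)  # pressure plus extra scalar unknowns
    return normal_faces + tangential_faces + cell_values


sizes = {
    "2D Navier-Stokes": block_size(4, 2),
    "2D + energy + 2 turbulence": block_size(4, 2, 3),
    "3D Navier-Stokes": block_size(4, 3),
    "3D + 3 PDEs": block_size(4, 3, 3),
}
# Cost ratio of Gaussian elimination, O(N^3), between the two 2D cases.
cost_ratio = (sizes["2D + energy + 2 turbulence"] / sizes["2D Navier-Stokes"]) ** 3
print(sizes, round(cost_ratio, 2))
```

Under these assumptions the counts agree with the sizes 17, 29, 25 and 37 given in the text, and the cubic cost ratio is close to 5.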
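4.7 Simple multigrid algorithm for Navier–Stokes equations

Let us represent a solution of the Navier–Stokes equations (4.1) as a sum of two functions

$$
u(t, x, y) = c^{u}(t, x, y) + \hat u(t, x, y) , \tag{4.45a}
$$
$$
\upsilon(t, x, y) = c^{\upsilon}(t, x, y) + \hat\upsilon(t, x, y) , \tag{4.45b}
$$
$$
p(t, x, y) = c^{p}(t, x, y) + \hat p(t, x, y) . \tag{4.45c}
$$

Discrete analogues of the functions $c^{u}(t, x, y)$, $c^{\upsilon}(t, x, y)$ and $c^{p}(t, x, y)$ will be coarse grid corrections, but discrete analogues of the functions $\hat u(t, x, y)$, $\hat\upsilon(t, x, y)$ and $\hat p(t, x, y)$ will be approximations to the solution in the multigrid iterations of RMT. Substitution of (4.45) into (4.1) gives:

a) continuity equation

$$
\frac{\partial c^{u}}{\partial x} + \frac{\partial c^{\upsilon}}{\partial y} + \frac{\partial \hat u}{\partial x} + \frac{\partial \hat\upsilon}{\partial y} = 0 ,
$$

b) X-momentum equation

$$
\frac{\partial c^{u}}{\partial t} + \frac{\partial \hat u}{\partial t} + \frac{\partial (c^{u} + \hat u)^2}{\partial x} + \frac{\partial \big((c^{\upsilon} + \hat\upsilon)(c^{u} + \hat u)\big)}{\partial y}
= -\frac{\partial c^{p}}{\partial x} - \frac{\partial \hat p}{\partial x}
+ \frac{1}{\mathrm{Re}}\left(\frac{\partial^2 c^{u}}{\partial x^2} + \frac{\partial^2 c^{u}}{\partial y^2}\right)
+ \frac{1}{\mathrm{Re}}\left(\frac{\partial^2 \hat u}{\partial x^2} + \frac{\partial^2 \hat u}{\partial y^2}\right) ,
$$

c) Y-momentum equation

$$
\frac{\partial c^{\upsilon}}{\partial t} + \frac{\partial \hat\upsilon}{\partial t} + \frac{\partial \big((c^{u} + \hat u)(c^{\upsilon} + \hat\upsilon)\big)}{\partial x} + \frac{\partial (c^{\upsilon} + \hat\upsilon)^2}{\partial y}
= -\frac{\partial c^{p}}{\partial y} - \frac{\partial \hat p}{\partial y}
+ \frac{1}{\mathrm{Re}}\left(\frac{\partial^2 c^{\upsilon}}{\partial x^2} + \frac{\partial^2 c^{\upsilon}}{\partial y^2}\right)
+ \frac{1}{\mathrm{Re}}\left(\frac{\partial^2 \hat\upsilon}{\partial x^2} + \frac{\partial^2 \hat\upsilon}{\partial y^2}\right) .
$$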
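Separation of the functions leads to the Σ-modified Navier–Stokes equations:

a) continuity equation

$$
\frac{\partial c^{u}}{\partial x} + \frac{\partial c^{\upsilon}}{\partial y} = R^{*}(t, x, y) , \tag{4.46a}
$$

b) X-momentum equation

$$
\frac{\partial c^{u}}{\partial t} + \frac{\partial (c^{u})^2}{\partial x} + 2\frac{\partial (\hat u c^{u})}{\partial x} + \frac{\partial (c^{\upsilon} c^{u})}{\partial y} + \frac{\partial (\hat\upsilon c^{u})}{\partial y} + \frac{\partial (c^{\upsilon} \hat u)}{\partial y}
= -\frac{\partial c^{p}}{\partial x} + \frac{1}{\mathrm{Re}}\left(\frac{\partial^2 c^{u}}{\partial x^2} + \frac{\partial^2 c^{u}}{\partial y^2}\right) + R^{u}(t, x, y) , \tag{4.46b}
$$

c) Y-momentum equation

$$
\frac{\partial c^{\upsilon}}{\partial t} + \frac{\partial (c^{\upsilon} c^{u})}{\partial x} + \frac{\partial (\hat\upsilon c^{u})}{\partial x} + \frac{\partial (c^{\upsilon} \hat u)}{\partial x} + \frac{\partial (c^{\upsilon})^2}{\partial y} + 2\frac{\partial (\hat\upsilon c^{\upsilon})}{\partial y}
= -\frac{\partial c^{p}}{\partial y} + \frac{1}{\mathrm{Re}}\left(\frac{\partial^2 c^{\upsilon}}{\partial x^2} + \frac{\partial^2 c^{\upsilon}}{\partial y^2}\right) + R^{\upsilon}(t, x, y) , \tag{4.46c}
$$

where the source terms $R^{*}(t, x, y)$, $R^{u}(t, x, y)$ and $R^{\upsilon}(t, x, y)$ are given by

$$
R^{*}(t, x, y) = -\frac{\partial \hat u}{\partial x} - \frac{\partial \hat\upsilon}{\partial y} , \tag{4.47a}
$$
$$
R^{u}(t, x, y) = -\frac{\partial \hat u}{\partial t} - \frac{\partial (\hat u^2)}{\partial x} - \frac{\partial (\hat\upsilon \hat u)}{\partial y} - \frac{\partial \hat p}{\partial x} + \frac{1}{\mathrm{Re}}\left(\frac{\partial^2 \hat u}{\partial x^2} + \frac{\partial^2 \hat u}{\partial y^2}\right) , \tag{4.47b}
$$
$$
R^{\upsilon}(t, x, y) = -\frac{\partial \hat\upsilon}{\partial t} - \frac{\partial (\hat u \hat\upsilon)}{\partial x} - \frac{\partial (\hat\upsilon^2)}{\partial y} - \frac{\partial \hat p}{\partial y} + \frac{1}{\mathrm{Re}}\left(\frac{\partial^2 \hat\upsilon}{\partial x^2} + \frac{\partial^2 \hat\upsilon}{\partial y^2}\right) . \tag{4.47c}
$$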
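The source terms (4.47) coincide with the Navier–Stokes equations (4.1). Convergence of the multigrid iterations means that discrete analogues of the corrections tend to zero: $c^{u} \to 0$, $c^{\upsilon} \to 0$, $c^{p} \to 0$; therefore all source terms $R^{*}$, $R^{\upsilon}$ and $R^{u}$ tend to zero, and discrete analogues of $\hat u$, $\hat\upsilon$ and $\hat p$ will satisfy the Navier–Stokes equations (4.1).

Assume that a uniform grid with mesh sizes $h_x$ and $h_y$ is generated for the finite volume discretization of the Σ-modified Navier–Stokes equations (4.46). Consider discretization of the Σ-modified X-momentum equation (4.46b) on the multigrid structure in detail. Integration of (4.46b) over the control volume

$$
V^{u}_{ij} = \{(x, y) \mid x^{f}_{\{i-1\}} \le x \le x^{f}_{\{i\}} ,\ y^{v}_{\{j\}} \le y \le y^{v}_{\{j+1\}}\}
$$

gives the following discrete equation:

$$
\begin{aligned}
&\frac{(c^{u})^{(n+1)}_{\{ij\}} - (c^{u})^{(n)}_{\{ij\}}}{h_t}
+ \frac{(c^{u})^2\big|^{(n+1)}_{(x^{f}_{\{i\}},\,y^{f}_{\{j\}})} - (c^{u})^2\big|^{(n+1)}_{(x^{f}_{\{i-1\}},\,y^{f}_{\{j\}})}}{h_x 3^{l_x}} \\
&\quad + 2\,\frac{\langle \hat u\rangle^{x}_{\{ij\}}\, c^{u}\big|^{(n+1)}_{(x^{f}_{\{i\}},\,y^{f}_{\{j\}})} - \langle \hat u\rangle^{x}_{\{i-1j\}}\, c^{u}\big|^{(n+1)}_{(x^{f}_{\{i-1\}},\,y^{f}_{\{j\}})}}{h_x 3^{l_x}}
+ \frac{(c^{\upsilon} c^{u})\big|^{(n+1)}_{(x^{v}_{\{i\}},\,y^{v}_{\{j+1\}})} - (c^{\upsilon} c^{u})\big|^{(n+1)}_{(x^{v}_{\{i\}},\,y^{v}_{\{j\}})}}{h_y 3^{l_y}} \\
&\quad + \frac{\langle \hat\upsilon\rangle_{\{ij+1\}}\, c^{u}\big|^{(n+1)}_{(x^{v}_{\{i\}},\,y^{v}_{\{j+1\}})} - \langle \hat\upsilon\rangle_{\{ij\}}\, c^{u}\big|^{(n+1)}_{(x^{v}_{\{i\}},\,y^{v}_{\{j\}})}}{h_y 3^{l_y}}
+ \frac{\langle \hat u\rangle^{y}_{\{ij+1\}}\, c^{\upsilon}\big|^{(n+1)}_{(x^{v}_{\{i\}},\,y^{v}_{\{j+1\}})} - \langle \hat u\rangle^{y}_{\{ij\}}\, c^{\upsilon}\big|^{(n+1)}_{(x^{v}_{\{i\}},\,y^{v}_{\{j\}})}}{h_y 3^{l_y}} \\
&= \frac{1}{\mathrm{Re}}\,\frac{(c^{u})^{(n+1)}_{\{i-1j\}} - 2 (c^{u})^{(n+1)}_{\{ij\}} + (c^{u})^{(n+1)}_{\{i+1j\}}}{h_x^2 3^{2 l_x}}
+ \frac{1}{\mathrm{Re}}\,\frac{(c^{u})^{(n+1)}_{\{ij-1\}} - 2 (c^{u})^{(n+1)}_{\{ij\}} + (c^{u})^{(n+1)}_{\{ij+1\}}}{h_y^2 3^{2 l_y}} \\
&\quad - \frac{(c^{p})^{(n+1)}_{\{ij\}} - (c^{p})^{(n+1)}_{\{i-1j\}}}{h_x 3^{l_x}} + \langle R^{u}\rangle_{\{ij\}} ,
\end{aligned}
$$

where

$$
\langle \hat u\rangle^{x}_{\{kj\}} = \frac{1}{h_t h_y 3^{l_y}} \int_{t^{(n)}}^{t^{(n+1)}} \int_{y^{v}_{\{j\}}}^{y^{v}_{\{j+1\}}} \hat u(t, x^{f}_{\{k\}}, y)\, dy\, dt , \qquad k = i-1, i , \tag{4.48}
$$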
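is the average value of the function $\hat u$ over the west (k = i − 1) and east (k = i) faces of the control volume $V^{u}_{ij}$,

$$
\langle \hat u\rangle^{y}_{\{ik\}} = \frac{1}{h_t h_x 3^{l_x}} \int_{t^{(n)}}^{t^{(n+1)}} \int_{x^{f}_{\{i-1\}}}^{x^{f}_{\{i\}}} \hat u(t, x, y^{v}_{\{k\}})\, dx\, dt , \qquad k = j, j+1 , \tag{4.49}
$$

is the average value of the function $\hat u$ over the south (k = j) and north (k = j + 1) faces of the control volume $V^{u}_{ij}$, and

$$
\langle \hat\upsilon\rangle_{\{ik\}} = \frac{1}{h_t h_x 3^{l_x}} \int_{t^{(n)}}^{t^{(n+1)}} \int_{x^{f}_{\{i-1\}}}^{x^{f}_{\{i\}}} \hat\upsilon(t, x, y^{v}_{\{k\}})\, dx\, dt , \qquad k = j, j+1 , \tag{4.50}
$$

is the average value of the function $\hat\upsilon$ over the south (k = j) and north (k = j + 1) faces of the control volume $V^{u}_{ij}$. The source term $\langle R^{u}\rangle_{\{ij\}}$

$$
\langle R^{u}\rangle_{\{ij\}} = \frac{1}{h_t h_x h_y 3^{l_x + l_y}} \int_{t^{(n)}}^{t^{(n+1)}} \int_{x^{f}_{\{i-1\}}}^{x^{f}_{\{i\}}} \int_{y^{v}_{\{j\}}}^{y^{v}_{\{j+1\}}} R^{u}(t, x, y)\, dy\, dx\, dt , \tag{4.51}
$$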
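is an averaged residual over the control volume. The Σ-modification results in additional convective terms in the momentum equations (4.46b) and (4.46c) caused by the nonlinearity of the Navier–Stokes equations. Assume that the discrete equations have been solved exactly on the previous time step: $(c^{u})^{(n)}_{\{ij\}} = 0$. We will omit the superscript (n + 1) for the corrections, i.e. the corrections are defined only on the current time step. Linearization of this discrete equation using (4.14a) and (4.14b) gives

$$
\begin{aligned}
&\frac{c^{u}_{\{ij\}}}{h_t}
+ 2\,\frac{(\bar c^{u} c^{u})\big|_{(x^{f}_{\{i\}},\,y^{f}_{\{j\}})} - (\bar c^{u} c^{u})\big|_{(x^{f}_{\{i-1\}},\,y^{f}_{\{j\}})}}{h_x 3^{l_x}}
+ 2\,\frac{\langle \hat u\rangle^{x}_{\{ij\}}\, c^{u}\big|_{(x^{f}_{\{i\}},\,y^{f}_{\{j\}})} - \langle \hat u\rangle^{x}_{\{i-1j\}}\, c^{u}\big|_{(x^{f}_{\{i-1\}},\,y^{f}_{\{j\}})}}{h_x 3^{l_x}} \\
&\quad + \frac{(c^{\upsilon} \bar c^{u})\big|_{(x^{v}_{\{i\}},\,y^{v}_{\{j+1\}})} - (c^{\upsilon} \bar c^{u})\big|_{(x^{v}_{\{i\}},\,y^{v}_{\{j\}})}}{h_y 3^{l_y}}
+ \frac{(\bar c^{\upsilon} c^{u})\big|_{(x^{v}_{\{i\}},\,y^{v}_{\{j+1\}})} - (\bar c^{\upsilon} c^{u})\big|_{(x^{v}_{\{i\}},\,y^{v}_{\{j\}})}}{h_y 3^{l_y}} \\
&\quad + \frac{\langle \hat\upsilon\rangle_{\{ij+1\}}\, c^{u}\big|_{(x^{v}_{\{i\}},\,y^{v}_{\{j+1\}})} - \langle \hat\upsilon\rangle_{\{ij\}}\, c^{u}\big|_{(x^{v}_{\{i\}},\,y^{v}_{\{j\}})}}{h_y 3^{l_y}}
+ \frac{\langle \hat u\rangle^{y}_{\{ij+1\}}\, c^{\upsilon}\big|_{(x^{v}_{\{i\}},\,y^{v}_{\{j+1\}})} - \langle \hat u\rangle^{y}_{\{ij\}}\, c^{\upsilon}\big|_{(x^{v}_{\{i\}},\,y^{v}_{\{j\}})}}{h_y 3^{l_y}} \\
&= \frac{(\bar c^{u})^2\big|_{(x^{f}_{\{i\}},\,y^{f}_{\{j\}})} - (\bar c^{u})^2\big|_{(x^{f}_{\{i-1\}},\,y^{f}_{\{j\}})}}{h_x 3^{l_x}}
+ \frac{(\bar c^{\upsilon} \bar c^{u})\big|_{(x^{v}_{\{i\}},\,y^{v}_{\{j+1\}})} - (\bar c^{\upsilon} \bar c^{u})\big|_{(x^{v}_{\{i\}},\,y^{v}_{\{j\}})}}{h_y 3^{l_y}} \\
&\quad + \frac{1}{\mathrm{Re}}\,\frac{c^{u}_{\{i-1j\}} - 2 c^{u}_{\{ij\}} + c^{u}_{\{i+1j\}}}{h_x^2 3^{2 l_x}}
+ \frac{1}{\mathrm{Re}}\,\frac{c^{u}_{\{ij-1\}} - 2 c^{u}_{\{ij\}} + c^{u}_{\{ij+1\}}}{h_y^2 3^{2 l_y}}
- \frac{c^{p}_{\{ij\}} - c^{p}_{\{i-1j\}}}{h_x 3^{l_x}} + \langle R^{u}\rangle_{\{ij\}} .
\end{aligned}
$$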
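Corrections $\bar c^{u}$ and $\bar c^{\upsilon}$ on the control volume faces can be expressed in terms of the discrete nodal values using the following interpolation:

$$
\begin{aligned}
\bar c^{u}\big|_{(x^{f}_{\{i-1\}},\,y^{f}_{\{j\}})} &= \tfrac12\left(\bar c^{u}_{\{i-1j\}} + \bar c^{u}_{\{ij\}}\right) , &
\bar c^{u}\big|_{(x^{f}_{\{i\}},\,y^{f}_{\{j\}})} &= \tfrac12\left(\bar c^{u}_{\{ij\}} + \bar c^{u}_{\{i+1j\}}\right) , \\
\bar c^{\upsilon}\big|_{(x^{v}_{\{i\}},\,y^{v}_{\{j\}})} &= \tfrac12\left(\bar c^{\upsilon}_{\{ij\}} + \bar c^{\upsilon}_{\{ij-1\}}\right) , &
\bar c^{\upsilon}\big|_{(x^{v}_{\{i\}},\,y^{v}_{\{j+1\}})} &= \tfrac12\left(\bar c^{\upsilon}_{\{ij\}} + \bar c^{\upsilon}_{\{ij+1\}}\right) .
\end{aligned}
$$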
Using interpolation (4.19), the linearized discrete analogues of the Σ-modified continuity and momentum equations (4.46) can be written as (4.43). The coefficients $\langle \hat u\rangle^{x}_{\{kj\}}$ (4.48), $\langle \hat u\rangle^{y}_{\{ik\}}$ (4.49) and $\langle \hat\upsilon\rangle_{\{ik\}}$ (4.50) can be approximated on the finest grid by the midpoint formula (Sect. 2.5)

$$
\begin{aligned}
\langle \hat u\rangle^{x}_{kj} &= \tfrac12\left(\hat u_{kj} + \hat u_{k+1j}\right) , & k &= i-1, i , \\
\langle \hat u\rangle^{y}_{ik} &= \tfrac12\left(\hat u_{ik-1} + \hat u_{ik}\right) , & k &= j, j+1 , \\
\langle \hat\upsilon\rangle_{ik} &= \tfrac12\left(\hat\upsilon_{i-1k} + \hat\upsilon_{ik}\right) , & k &= j, j+1 .
\end{aligned}
$$

Remember that the coefficients $\langle \hat u\rangle^{x}_{ij}$, $\langle \hat u\rangle^{y}_{ij}$ and $\langle \hat\upsilon\rangle_{ij}$ computed on the finest grid define the coefficients $\langle \hat u\rangle^{x}_{\{ij\}}$, $\langle \hat u\rangle^{y}_{\{ij\}}$ and $\langle \hat\upsilon\rangle_{\{ij\}}$ on the coarse levels (Sect. 2.9). The approximation of the boundary conditions for the corrections $c^{u}$ and $c^{\upsilon}$ is given by (2.17)–(2.21).

The Σ-modification (4.45) can be considered as a variant of the defect correction (Sect. 1.8), therefore monotonicity, conservatism and accuracy of the numerical solution depend on the discretization of the source terms $\langle R^{*}\rangle_{ij}$, $\langle R^{u}\rangle_{ij}$ and $\langle R^{\upsilon}\rangle_{ij}$ on the finest grid. For example, the discretization of $\langle R^{u}\rangle_{ij}$ (4.51) on the finest grid

$$
\langle R^{u}\rangle_{ij} = R^{u}(t^{(n+1)}, x^{v}_i, y^{f}_j)
$$

gives the explicit scheme, but the discretization

$$
\langle R^{u}\rangle_{ij} = \frac12\left(R^{u}(t^{(n)}, x^{v}_i, y^{f}_j) + R^{u}(t^{(n+1)}, x^{v}_i, y^{f}_j)\right)
$$
gives the Crank–Nicolson scheme. Since all the corrections tend to zero as the multigrid iterations converge, the simple interpolation (4.19) can be used for discretization of the convective terms of the Σ-modified Navier–Stokes equations (4.46).

Consider a problem concerning stationary flow in a square chamber for illustration of multigrid convergence. Figure 4.14 represents the geometry of this problem, and boundary conditions for the velocity components are defined by

$$
u(0, y) = \begin{cases} 100\,y\,(0.2 - y) , & \text{for } y \le 0.2 , \\ 0 , & \text{for } y > 0.2 , \end{cases}
\qquad u(1, y) = u(x, 0) = u(x, 1) = 0 ,
$$

$$
\upsilon(x, 1) = \begin{cases} 100\,(x - 0.8)(1 - x) , & \text{for } x \ge 0.8 , \\ 0 , & \text{for } x < 0.8 , \end{cases}
\qquad \upsilon(0, y) = \upsilon(1, y) = \upsilon(x, 0) = 0 .
$$
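These boundary conditions are compatible with the continuity equation only if the inflow through the opening at x = 0 equals the outflow through the opening at y = 1. A minimal check, assuming a composite Simpson rule (exact for these quadratic profiles up to rounding); the function names are hypothetical:

```python
def u_inlet(y):
    """Inlet profile u(0, y) from the boundary conditions."""
    return 100.0 * y * (0.2 - y) if y <= 0.2 else 0.0


def v_outlet(x):
    """Outlet profile v(x, 1) from the boundary conditions."""
    return 100.0 * (x - 0.8) * (1.0 - x) if x >= 0.8 else 0.0


def simpson(f, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


inflow = simpson(u_inlet, 0.0, 0.2)    # mass flow rate entering at x = 0
outflow = simpson(v_outlet, 0.8, 1.0)  # mass flow rate leaving at y = 1
print(inflow, outflow)
```

Both integrals evaluate to 100 · 0.2³/6 = 2/15, so the prescribed velocities satisfy global mass conservation.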
Fig. 4.14: Geometry of a problem concerning stationary flow in a square chamber
Finest grids of 101 × 101 and 1001 × 1001 nodes are used for the flow simulation (Re = 100 and Re = 500). Table 4.1 presents the convergence results of RMT after five multigrid iterations (q = 5) of the sawtooth cycle (Fig. 2.7). Four Vanka iterations (ν = 4) are performed for the smoothing on all grids of level l ($0 \le l < L_3^{+}$). On the coarsest grids the discrete equations are solved by a direct method.
Tab. 4.1: Convergence of RMT in solving the problem of flow in a square cavity (Fig. 4.14)

grid           Re          ν    q    ρ_MG^(q)
101 × 101      100 (500)   4    5    0.132 (0.167)
1001 × 1001    100 (500)   4    5    0.151 (0.194)
An average reduction factor of the residual of the discrete continuity equation (1.61) is used as a measure of multigrid convergence. Results of the computations shown in Table 4.1 demonstrate the high convergence rate of RMT, which weakly depends on Re.
4.8 Formal decomposition of pressure

The nonlinear nature of the conservation equations dealt with here necessitates an iterative method. Starting with an initial guess, solutions are obtained by repeatedly applying a solution algorithm, with the solution at the end of an iteration used as the initial guess for the following iteration. It is desirable to take an accurate starting guess for iterative solutions of highly nonlinear PDEs. Such a starting guess guarantees that the coarse grid correction in the following multigrid iterations will be a very smooth function suitable for approximation on the coarse grids. It is simple to set the initial guess for solving transient problems: the solution $u^{(n)}_{ijk}$ found at time step n is the starting guess for the next time step n + 1, since $u^{(n+1)}_{ijk} = u^{(n)}_{ijk} + O(h_t)$, where $h_t$ is the time step.

In this section, we describe a numerical approach for fast computation of an accurate starting guess for the solution of the full Navier–Stokes equations. This approach can be used as additional smoothing in the multigrid iterations to obtain a smooth correction, which will be well approximated on the coarse grids. It can also be used in black-box software to improve the convergence of multigrid algorithms.
4.8.1 Simplified Navier–Stokes equations

In order to clarify our understanding of numerical algorithms for the incompressible Navier–Stokes equations, we briefly discuss the simplified Navier–Stokes equations in the thin-shear-layer approximation (also known as the ‘slender channel’ approximation) [4, 6, 24]. Consider a problem concerning flow between parallel plates (Fig. 4.10). If the length of the plates considerably exceeds the distance between them, then we can assume that the pressure changes only with the axial distance along the flow passage: $\partial p/\partial y = 0$ or p = p(x). It is very important to observe that for the plate flow, the mass flow rate across any plane perpendicular to the plate axis is constant in the absence of wall blowing or suction. Integration of the continuity equation (4.1a) over the flow passage (Fig. 4.10) gives the following conservation equation for the mass flow rate:

$$
\int_0^1 u(t, x, y)\, dy = \int_0^1 u(t, 0, y)\, dy . \tag{4.52}
$$

This assumption of a physical nature, $\partial p/\partial y = 0$ or p = p(x), makes it possible to simplify the Navier–Stokes equations by replacing the Y-momentum equation (4.1c) with the conservation equation for the mass flow rate (4.52). The Navier–Stokes equations in the slender channel approximation become

a) X-momentum and mass flow rate equations

$$
\begin{cases}
\dfrac{\partial u}{\partial t} + \dfrac{\partial (u^2)}{\partial x} + \dfrac{\partial (\upsilon u)}{\partial y} = -\dfrac{\partial p}{\partial x} + \dfrac{1}{\mathrm{Re}}\left(\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2}\right) , \\[8pt]
\displaystyle\int_0^1 u(t, x, y)\, dy = \int_0^1 u(t, 0, y)\, dy ,
\end{cases}
\tag{4.53}
$$
and b) continuity equation (4.1a). The brace means that these equations are solved in a coupled manner. After that, the velocity υ can be obtained from the continuity equation (4.1a).

Fig. 4.15: Block unknown ordering between parallel plates

Let a structured grid $(N_x + 1) \times (N_y + 1)$ ($h_x = 1/N_x$ and $h_y = 1/N_y$) be generated for solving the Navier–Stokes equations in the slender channel approximation. Using the block ordering of unknowns shown in Fig. 4.15, the linearized discrete analogue of the X-momentum and mass flow rate equations in the ith plane perpendicular to the plate axis X is written as

$$
\begin{cases}
a_{ij} u^{(n+1)}_{ij-1} + b_{ij} u^{(n+1)}_{ij} + c_{ij} u^{(n+1)}_{ij+1} = -p^{(n+1)}_{i} + d_{ij} , \\[6pt]
\displaystyle\sum_{j=1}^{N_y} u^{(n+1)}_{ij} = \frac{1}{h_y}\, G_0 ,
\end{cases}
\qquad i = 2, 3, \ldots, N_x + 1 ,
$$
where $G_0$ is the known inlet mass flow rate,

$$
G_0 = \int_0^1 u(t^{(n+1)}, 0, y)\, dy \approx h_y \sum_{j=1}^{N_y} u^{(n+1)}_{1j} .
$$

We omit the superscript (n + 1) denoting the current time step and rewrite the SLAE in matrix form

$$
\begin{pmatrix}
b_{i1} & c_{i1} & & & & 1 \\
a_{i2} & b_{i2} & c_{i2} & & & 1 \\
& a_{i3} & b_{i3} & c_{i3} & & 1 \\
& & a_{i4} & b_{i4} & \ddots & 1 \\
& & & \ddots & \ddots & \vdots \\
1 & 1 & 1 & 1 & \cdots & 0
\end{pmatrix}
\begin{pmatrix}
u_{i1} \\ u_{i2} \\ u_{i3} \\ u_{i4} \\ \vdots \\ p_{i}
\end{pmatrix}
=
\begin{pmatrix}
d_{i1} \\ d_{i2} \\ d_{i3} \\ d_{i4} \\ \vdots \\ h_y^{-1} G_0
\end{pmatrix} .
$$
The resulting SLAE is similar to SLAE (4.8), but the zero diagonal block of the coefficient matrix has the least size (1 × 1) since the pressure depends on a single spatial variable: p = p(x). This fact allows construction of very efficient algorithms for solving the simplified Navier–Stokes equations.

There are different algorithms for solving the resulting SLAE. The simplest and most efficient approach is a segregated solution method, in which the velocity and pressure are computed iteratively in a decoupled manner. First, a velocity approximation is determined from the momentum equations with fixed pressure gradients

$$
\begin{pmatrix}
b_{i1} & c_{i1} & & & & \\
a_{i2} & b_{i2} & c_{i2} & & & \\
& a_{i3} & b_{i3} & c_{i3} & & \\
& & a_{i4} & b_{i4} & c_{i4} & \\
& & & \ddots & \ddots & \ddots \\
& & & & a_{iN_y+1} & b_{iN_y+1}
\end{pmatrix}
\begin{pmatrix}
u_{i1} \\ u_{i2} \\ u_{i3} \\ u_{i4} \\ \vdots \\ u_{iN_y+1}
\end{pmatrix}
=
\begin{pmatrix}
d_{i1} - p_i \\ d_{i2} - p_i \\ d_{i3} - p_i \\ d_{i4} - p_i \\ \vdots \\ d_{iN_y+1} - p_i
\end{pmatrix} .
$$
Second, the pressure is corrected such that the velocity $u_{ij}$ satisfies the conservation equation of mass flow rate [6, 14, 24]. Since the velocity $u_{ij}$ depends on the pressure $p_i$, we form a defect equation (discrete analogue of (4.52))

$$
F(p_i) = h_y \sum_{j=1}^{N_y} u_{ij}(p_i) - G_0 ,
$$

and use this equation for computation of the pressure $p_i$ by the secant method [3]:

$$
p^{(k+1)}_i = p^{(k)}_i - \frac{p^{(k)}_i - p^{(k-1)}_i}{F_i\big(p^{(k)}_i\big) - F_i\big(p^{(k-1)}_i\big)}\, F_i\big(p^{(k)}_i\big) , \qquad k = 1, 2, \ldots .
$$
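The segregated procedure for one axial section can be sketched as follows. This is a minimal model, assuming constant illustrative coefficients a, b, c, d (not taken from the book) and the Thomas algorithm as the tridiagonal solver; the function names `thomas`, `defect` and `secant` are hypothetical.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system; a[0] and c[-1] are ignored."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for j in range(1, n):
        denom = b[j] - a[j] * cp[j - 1]
        cp[j] = c[j] / denom
        dp[j] = (d[j] - a[j] * dp[j - 1]) / denom
    u = [0.0] * n
    u[-1] = dp[-1]
    for j in range(n - 2, -1, -1):
        u[j] = dp[j] - cp[j] * u[j + 1]
    return u


def defect(p, a, b, c, d, hy, g0):
    """F(p) = hy * sum_j u_j(p) - G0, with u from the momentum equations at fixed p."""
    u = thomas(a, b, c, [dj - p for dj in d])
    return hy * sum(u) - g0


def secant(f, p0, p1, tol=1e-12, max_iter=20):
    """Secant iteration started from the two guesses p0 and p1."""
    f0, f1 = f(p0), f(p1)
    for _ in range(max_iter):
        if abs(f1) < tol or f1 == f0:
            break
        p0, p1, f0 = p1, p1 - (p1 - p0) / (f1 - f0) * f1, f1
        f1 = f(p1)
    return p1


n = 10
hy = 1.0 / n
a, b, c, d = [-1.0] * n, [2.5] * n, [-1.0] * n, [0.0] * n  # illustrative coefficients
g0 = 1.0                                                   # prescribed mass flow rate


def f(p):
    return defect(p, a, b, c, d, hy, g0)


p_i = secant(f, 0.0, -1.0)
print(p_i, f(p_i))
```

Because the model momentum equations are linear in the velocity, F is linear in p and a single secant update already reproduces the root up to rounding. With coefficients that depend on the velocity, a few more updates would be expected.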
The desired value of $p_i$ is a zero of the function $F(p_i)$, i.e. $p_i$ is varied iteratively until the mass flow constraint (4.52) is met. Remember that the secant method needs two starting guesses (Sect. 1.9). $F(p_i)$ is almost a linear function of $p_i$ in the neighborhood of its zero, and the secant method is an exact solver for linear problems. Therefore, only a few iterations of the secant method are needed for convergence up to discretization accuracy. This is important since each computation of the function $F(p_i)$ (i.e. of the velocity) in the decoupled iteration is expensive (especially for 3D problems).

Here, we summarize the results of the error analysis for the simplified Navier–Stokes equations using the problem concerning developed laminar flow of an incompressible Newtonian fluid induced by a pressure drop between the entrance and exit of parallel plates [14]. The flow is said to be hydrodynamically fully developed when the velocity distribution no longer changes with the axial distance along the flow passage. If L and H are the length of the plates and the distance between them (L ≫ H), then the velocity u is independent of the axial position x everywhere except near the entrance of the plates (x = 0). Let $\delta_p$ be some error of the pressure $p_i$ ($p_i \gg |\delta_p|$) in the ith axial section. This error $\delta_p$ results in a nonzero velocity υ on the solid wall (y = 1):

$$
\upsilon(x_i, 1) \approx \frac{2}{3}\, \frac{H}{L}\, \frac{\mathrm{Re}}{h_x^2}\, \delta_p ,
$$

i.e. the velocity υ satisfies the no-slip condition exactly if $\delta_p = 0$. In reality, if $H/L \approx h_x$, then $\mathrm{Re}/h_x \gg 1$ and a small error of the pressure $\delta_p$ can lead to a noticeable error of the velocity. This is the reason why, in general, it is necessary to reach discretization accuracy before the iterations are stopped.

The above-mentioned algorithm is the Gauss–Seidel method applied to the Navier–Stokes equations in the slender channel approximation. This solver does not have problem-dependent components, but convergence is often quite slow because of very fast computation of the pressure and very slow computation of the velocity. As a result, this iteration can only be used as a smoother in multigrid to obtain a smooth correction, which will be well approximated on the coarse grids.
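The sensitivity estimate for υ(x_i, 1) given in this subsection can be evaluated for sample numbers. The values below (H/L = h_x = 0.01, Re = 100, δ_p = 10⁻⁶) are illustrative assumptions, and the function name is hypothetical.

```python
def wall_velocity_error(delta_p, reynolds, hx, h_over_l):
    """Estimate v(x_i, 1) ~ (2/3) (H/L) (Re / hx^2) delta_p caused by a pressure error."""
    return 2.0 / 3.0 * h_over_l * reynolds / hx ** 2 * delta_p


delta_p = 1.0e-6
v_wall = wall_velocity_error(delta_p, reynolds=100.0, hx=0.01, h_over_l=0.01)
amplification = v_wall / delta_p
print(v_wall, amplification)
```

For these numbers the pressure error is amplified by a factor of (2/3) Re/h_x, i.e. by nearly four orders of magnitude, which illustrates why a loose stopping criterion for the pressure is not sufficient.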
4.8.2 Pressure decomposition

Solution of the simplified Navier–Stokes equations does not require much computational work, and in special cases of fluid flows it is close to the solution of the full Navier–Stokes equations. The simplification is based on the assumption that the pressure depends only on a single spatial variable. In general, the pressure depends on all spatial variables, but we can extract a part of the pressure and find it using very efficient methods (in the sense of the pressure computation) developed for the simplified Navier–Stokes equations.
For the given purpose, we transform the pressure by adding and subtracting terms $p^{x}(t, x)$, $p^{y}(t, y)$ and $p^{z}(t, z)$, which depend only on the single spatial variable x, y and z respectively:

$$
p(t, x, y, z) = p^{x}(t, x) + p^{y}(t, y) + p^{z}(t, z) + \big(-p^{x}(t, x) - p^{y}(t, y) - p^{z}(t, z) + p(t, x, y, z)\big) ,
$$

where the superscripts x, y and z denote the dependence on the spatial variables. We introduce a new term

$$
p^{xyz}(t, x, y, z) = -p^{x}(t, x) - p^{y}(t, y) - p^{z}(t, z) + p(t, x, y, z) ,
$$

and the pressure is represented as

$$
p(t, x, y, z) = p^{x}(t, x) + p^{y}(t, y) + p^{z}(t, z) + p^{xyz}(t, x, y, z) . \tag{4.54}
$$
Representation (4.54) is called the principle of the formal pressure decomposition [13, 14]. In the following, the terms $p^{x}(t, x)$, $p^{y}(t, y)$, $p^{z}(t, z)$ and $p^{xyz}(t, x, y, z)$ are called ‘one-dimensional pressure components’ and ‘multidimensional pressure component’ respectively.

The basic idea consists of fast computation of part of the pressure (i.e. the sum of the one-dimensional pressure components $p^{x} + p^{y} + p^{z}$) using numerical methods developed for the simplified Navier–Stokes equations. If the sum of the one-dimensional pressure components approximates the pressure well,

$$
p^{x}(t, x) + p^{y}(t, y) + p^{z}(t, z) \to p(t, x, y, z) ,
$$

then the multidimensional pressure component $p^{xyz}(t, x, y, z)$ tends to zero:

$$
p^{xyz}(t, x, y, z) \to 0 .
$$

This reduces the effort needed to determine the multidimensional pressure component.

Decomposition (4.54) is based on a formal mathematical transformation and it does not use assumptions of a physical nature. Since any function f(x, y, z) can be transformed similarly, this transformation is called the formal decomposition.

To summarize the main features of the formal pressure decomposition:

1. The individual terms in (4.54) do not have a physical meaning, but their sum does. Sometimes, for convenience of computation, one of the pressure ‘components’ is fixed at some point. For example, we can set $p(t, 0, 0, 0) = p_0 = \mathrm{const}$. Taking into account (4.54), we have

$$
p_0 = p(t, 0, 0, 0) = p^{x}(t, 0) + p^{y}(t, 0) + p^{z}(t, 0) + p^{xyz}(t, 0, 0, 0) .
$$

In other words, $p_0$ can be distributed arbitrarily between the terms $p^{x}(t, x)$, $p^{y}(t, y)$, $p^{z}(t, z)$ and $p^{xyz}(t, x, y, z)$, therefore the individual terms in (4.54) have no physical meaning. Sometimes it is preferable to assign

$$
p^{xyz}(t, 0, 0, 0) = p_0 , \qquad p^{x}(t, 0) = 0 , \qquad p^{y}(t, 0) = 0 , \qquad p^{z}(t, 0) = 0 .
$$

The quotes ‘’ in the pressure component notation (4.54) indicate the absence of physical meaning.

2. Decomposition (4.54) shows that the pressure can be represented as a sum of d (d = 2, 3) ‘one-dimensional pressure components’ and a single ‘multidimensional pressure component’. As a result, it is necessary to use d extra conditions for determination of the ‘one-dimensional pressure components’. Conservation equations describing the mass flow rate such as (4.52) are used as these conditions. These equations can be considered as a priori information about the problem solved.

3. Since all conservation equations describing the mass flow rate are integral forms of the continuity equation (4.1a), the ‘one-dimensional pressure components’ should be computed before determination of the ‘multidimensional pressure component’.

4. The pressure is decomposed as a sum of d + 1 terms, but each momentum equation has only two ‘pressure’ gradients. For example, the ‘pressure’ gradient in the X-momentum equation (4.1b) becomes

$$
\frac{\partial p}{\partial x} = \frac{\partial}{\partial x}\big(p^{x}(t, x) + p^{y}(t, y) + p^{z}(t, z) + p^{xyz}(t, x, y, z)\big) = \frac{\partial p^{x}}{\partial x} + \frac{\partial p^{xyz}}{\partial x} .
$$

Errors of the ‘one-dimensional pressure component’ computations do not affect each other.

5. Efficiency of this approach (in the sense of reduction of computational effort) depends on the flow type. The highest efficiency is expected for dominant gradients of any ‘one-dimensional pressure component’, i.e. for directed fluid flows, e.g. jets, flows in nozzles and channels, etc. The pressure decomposition is less efficient for simulation of a recirculating flow, e.g. flow in the driven cavity.

6. Velocity components and the corresponding ‘one-dimensional pressure components’ are computed only in a coupled manner.

7. Gradients of ‘one-dimensional pressure components’ are determined explicitly for the explicit schemes, and it is necessary to construct an auxiliary problem for the implicit schemes.
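The formal character of the decomposition can be illustrated on nodal data in 2D. The sketch below makes one of infinitely many admissible choices, namely p^x(x) = p(x, 0) − p_0 and p^y(y) = p(0, y) − p_0, which realizes the normalization p^x(0) = p^y(0) = 0; this choice, the test field and the function name `decompose` are assumptions made for illustration, not the way the book computes the components.

```python
import math


def decompose(p):
    """Split nodal values p[i][j] into px[i] + py[j] + pxy[i][j] (one admissible choice)."""
    nx, ny = len(p), len(p[0])
    p0 = p[0][0]
    px = [p[i][0] - p0 for i in range(nx)]  # px(0) = 0
    py = [p[0][j] - p0 for j in range(ny)]  # py(0) = 0
    pxy = [[p[i][j] - px[i] - py[j] for j in range(ny)] for i in range(nx)]
    return px, py, pxy


xs = [i / 10 for i in range(11)]
ys = [j / 10 for j in range(11)]
# Hypothetical pressure field: dominant axial gradient plus a weak 2D perturbation.
p = [[5.0 - 4.0 * x + 0.3 * y * y + 0.01 * math.sin(x * y) for y in ys] for x in xs]

px, py, pxy = decompose(p)
spread = max(max(row) for row in pxy) - min(min(row) for row in pxy)
print(pxy[0][0], spread)
```

For this directed-flow-like field the ‘multidimensional’ part varies by less than 0.01 around p_0 = 5, while the ‘one-dimensional’ parts carry almost all of the pressure variation, which is the situation in which the decomposition is expected to pay off.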
4.8 Formal decomposition of pressure | 181
4.8.3 Explicit schemes
One possible approach to solving the incompressible Navier–Stokes equations is the fractional-step (time-split) method, which relies on the ideas of operator splitting. The method was proposed by Chorin and Temam and has since become quite popular for time-accurate solution of the incompressible Navier–Stokes equations. First, we apply the principle of formal pressure decomposition to improve explicit schemes. We use the above-mentioned driven cavity benchmark problem and the three-stage splitting scheme:

Stage I:
$$\frac{V^{(n+1/2)} - V^{(n)}}{h_t} = -\left(V^{(n)} \cdot \nabla\right) V^{(n)} + Re^{-1} \Delta V^{(n)};$$
Stage II:
$$\Delta p = \frac{\nabla \cdot V^{(n+1/2)}}{h_t};$$
Stage III:
$$\frac{V^{(n+1)} - V^{(n+1/2)}}{h_t} = -\nabla p,$$
where $h_t$ is half of the time spacing and $V^{(n+1/2)}$ is an intermediate velocity. In explicit schemes, the velocity and pressure are computed in a decoupled manner. The pressure decomposition partially takes the velocity-pressure coupling into account. Integration of the continuity equation (4.1a) over the control volumes

$$V_1 = \{(x,y) \mid 0 \le x^* \le 1;\ 0 \le y \le 1\}, \qquad V_2 = \{(x,y) \mid 0 \le x \le 1;\ 0 \le y^* \le 1\},$$

shown in Fig. 4.16 gives two conservation equations for the mass flow rate:

$$\int_0^1 u(t,x,y)\,dy = 0, \tag{4.55}$$

$$\int_0^1 \upsilon(t,x,y)\,dx = 0. \tag{4.56}$$
Fig. 4.16: Driven cavity and the control volumes $V_1$ and $V_2$
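The step from the continuity equation to (4.55) can be written out explicitly. The following is a sketch under two assumptions that the text does not state in this form: $V_1$ is read as the part of the cavity to the left of the section $x = x^*$, and all walls, including the moving lid, are impermeable.

```latex
\begin{aligned}
0 &= \int_0^{x^*}\!\!\int_0^1
     \left(\frac{\partial u}{\partial x} + \frac{\partial \upsilon}{\partial y}\right) dy\,dx \\
  &= \int_0^1 \bigl[u(t,x^*,y) - u(t,0,y)\bigr]\,dy
   + \int_0^{x^*} \bigl[\upsilon(t,x,1) - \upsilon(t,x,0)\bigr]\,dx \\
  &= \int_0^1 u(t,x^*,y)\,dy ,
\end{aligned}
```

The last step uses $u(t,0,y) = 0$ on the left wall and $\upsilon = 0$ on the bottom wall and on the lid. Since $x^*$ is arbitrary, this is (4.55); the same argument applied to $V_2$ gives (4.56).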
Discrete analogues of the continuity equation (4.1a) and the mass flow rate equations (4.55) and (4.56) on staggered grids with $h_x = 1/N_x$ and $h_y = 1/N_y$ (Fig. 4.2) are given by

$$\frac{u_{i+1j}^{(n+1)} - u_{ij}^{(n+1)}}{h_x} + \frac{\upsilon_{ij+1}^{(n+1)} - \upsilon_{ij}^{(n+1)}}{h_y} = 0, \tag{4.57}$$

$$h_y \sum_{j=1}^{N_y} u_{ij}^{(k)} = 0, \qquad k = n,\ n + 1/2,\ n + 1, \tag{4.58}$$

$$h_x \sum_{i=1}^{N_x} \upsilon_{ij}^{(k)} = 0, \qquad k = n,\ n + 1/2,\ n + 1. \tag{4.59}$$
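The relation between (4.57) and (4.58)–(4.59) can be checked numerically. The sketch below is an illustration only: it assumes a staggered-grid layout with `u` on vertical cell faces and `v` on horizontal cell faces, and it builds a discretely divergence-free test field from a random stream function that vanishes on the walls. The helper names, the grid size and the test field are assumptions, not part of the original scheme.

```python
import numpy as np


def velocity_from_stream_function(psi, hx, hy):
    """Build staggered-grid velocities from vertex values of a stream function.

    psi has shape (Nx + 1, Ny + 1); u lives on vertical cell faces and
    v on horizontal cell faces.
    """
    u = (psi[:, 1:] - psi[:, :-1]) / hy   # shape (Nx + 1, Ny)
    v = -(psi[1:, :] - psi[:-1, :]) / hx  # shape (Nx, Ny + 1)
    return u, v


def discrete_divergence(u, v, hx, hy):
    """Left-hand side of the discrete continuity equation (4.57) in every cell."""
    return (u[1:, :] - u[:-1, :]) / hx + (v[:, 1:] - v[:, :-1]) / hy


Nx, Ny = 16, 12
hx, hy = 1.0 / Nx, 1.0 / Ny

# Hypothetical test field: a random stream function that vanishes on the
# walls, so that no fluid crosses the cavity boundary.
rng = np.random.default_rng(0)
psi = np.zeros((Nx + 1, Ny + 1))
psi[1:-1, 1:-1] = rng.standard_normal((Nx - 1, Ny - 1))

u, v = velocity_from_stream_function(psi, hx, hy)
div = discrete_divergence(u, v, hx, hy)

flow_x = hy * u.sum(axis=1)  # left-hand side of (4.58), one value per index i
flow_y = hx * v.sum(axis=0)  # left-hand side of (4.59), one value per index j

# Summing (4.57) over j telescopes the v-terms and leaves the change of flow_x.
summed = hy * div.sum(axis=1)
telescoped = (flow_x[1:] - flow_x[:-1]) / hx + (v[:, -1] - v[:, 0])
```

For this test field `div`, `flow_x` and `flow_y` vanish up to rounding, and `summed` agrees with `telescoped`, which is the discrete counterpart of integrating the continuity equation over a control volume.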
These mass flow rate equations are obtained by integrating the continuity equation (4.1a) over the control volumes $V_1$ and $V_2$ (Fig. 4.16); therefore the discrete analogues of the mass flow rate equations should be obtained by summing the discrete analogue of the continuity equation (4.57) over these volumes. Equations (4.55) and (4.56) will be considered as a priori information for the solution of the full Navier–Stokes equations. In the 2D case, the pressure is decomposed as

$$p(t,x,y) = p^x(t,x) + p^y(t,y) + p^{xy}(t,x,y), \tag{4.60}$$
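where the superscripts $x$, $y$ and $xy$ denote the dependence of the terms $p^x(t,x)$, $p^y(t,y)$ and $p^{xy}(t,x,y)$ on the spatial variables. To improve the explicit scheme, the discrete analogue of the X-momentum equation (4.1b) is rewritten as

$$\frac{u_{ij}^{(n+1/2)} - u_{ij}^{(n)}}{h_t} + \left(\frac{\partial (u^2)}{\partial x} + \frac{\partial (\upsilon u)}{\partial y}\right)_{ij}^{(n)} = -\left(\frac{\partial p^x}{\partial x}\right)_i^{(n+1/2)} + \left(\frac{1}{Re}\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right)\right)_{ij}^{(n)},$$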
i.e. at the intermediate time step $(n + 1/2)$ the velocity $u$ and the one-dimensional pressure component $p^x$ are determined in a coupled manner using the discrete analogue of the mass flow rate equation (4.58). For brevity, we rewrite the discrete analogue of the X-momentum equation as

$$\frac{u_{ij}^{(n+1/2)} - u_{ij}^{(n)}}{h_t} = -\left(\frac{\partial p^x}{\partial x}\right)_i^{(n+1/2)} + \Theta_{ij}, \tag{4.61}$$

where $\Theta_{ij}$ is the known grid function

$$\Theta_{ij} = \left(-\frac{\partial (u^2)}{\partial x} - \frac{\partial (\upsilon u)}{\partial y} + \frac{1}{Re}\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right)\right)_{ij}^{(n)}.$$
Our goal is to compute the velocity $u_{ij}^{(n+1/2)}$ using (4.61), but the one-dimensional pressure component $(p^x)_i^{(n+1/2)}$ should be found from the condition that the velocity $u_{ij}^{(n+1/2)}$ satisfies the discrete analogue of the mass flow rate equation (4.58). Multiplying (4.61) by $h_y$ and summing over $j$ gives

$$\frac{1}{h_t}\left(h_y \sum_{j=1}^{N_y} u_{ij}^{(n+1/2)} - h_y \sum_{j=1}^{N_y} u_{ij}^{(n)}\right) = -\sum_{j=1}^{N_y} h_y \left(\frac{\partial p^x}{\partial x}\right)_i^{(n+1/2)} + h_y \sum_{j=1}^{N_y} \Theta_{ij}.$$
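The left-hand side of this equation is zero due to (4.58). The first term on the right-hand side can be transformed as

$$\sum_{j=1}^{N_y} h_y \left(\frac{\partial p^x}{\partial x}\right)_i^{(n+1/2)} = \left(\frac{\partial p^x}{\partial x}\right)_i^{(n+1/2)} \sum_{j=1}^{N_y} h_y = \left(\frac{\partial p^x}{\partial x}\right)_i^{(n+1/2)},$$

since the gradient of $p^x$ is independent of $j$ and the sum of all $h_y$ is the nondimensional height of the cavity, i.e. 1. Then the gradient of $p^x$ is

$$\left(\frac{\partial p^x}{\partial x}\right)_i^{(n+1/2)} = h_y \sum_{j=1}^{N_y} \Theta_{ij}. \tag{4.62}$$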
Substituting (4.62) into (4.61), we obtain an equation for the intermediate velocity $u$:

$$\frac{u_{ij}^{(n+1/2)} - u_{ij}^{(n)}}{h_t} = -h_y \sum_{j=1}^{N_y} \Theta_{ij} + \Theta_{ij}. \tag{4.63}$$

Due to the correction $h_y \sum_{j=1}^{N_y} \Theta_{ij}$, the intermediate velocity $u$ satisfies the discrete analogue of the mass flow rate equation (4.58). The intermediate velocity $u$ and the one-dimensional pressure component $p^x$ are computed as follows:
1) computation of the function $h_y \sum_{j=1}^{N_y} \Theta_{ij}$;
2) the one-dimensional pressure component $p^x$ is the solution of the ordinary differential equation (4.62):

$$(p^x)_i^{(n+1/2)} = (p^x)_{i-1}^{(n+1/2)} + h_x h_y \sum_{j=1}^{N_y} \Theta_{ij}, \qquad (p^x)_1^{(n+1/2)} = 0;$$

3) the intermediate velocity $u$ is the solution of (4.63):

$$u_{ij}^{(n+1/2)} = u_{ij}^{(n)} - h_t h_y \sum_{j=1}^{N_y} \Theta_{ij} + h_t \Theta_{ij}.$$
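Steps 1)–3) can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the array layout (index `i` along $x$, index `j` along $y$), the function name `intermediate_u` and the random placeholder data for $u^{(n)}$ and $\Theta$ are assumptions. In a real solver $\Theta_{ij}$ would be assembled from the convective and viscous terms at time level $n$.

```python
import numpy as np


def intermediate_u(u_n, theta, ht, hx, hy):
    """Stage I for u with the one-dimensional pressure correction.

    u_n and theta have shape (Ni, Ny): index i along x, index j along y.
    Returns the intermediate velocity, the gradient of p^x and p^x itself.
    """
    # Step 1: the function h_y * sum_j Theta_ij, i.e. the gradient (4.62).
    dpx_dx = hy * theta.sum(axis=1)

    # Step 2: integrate the gradient along x, starting from p^x = 0.
    px = np.zeros_like(dpx_dx)
    px[1:] = hx * np.cumsum(dpx_dx[1:])

    # Step 3: explicit update (4.63).
    u_half = u_n + ht * (theta - dpx_dx[:, None])
    return u_half, dpx_dx, px


Ni, Ny = 9, 8
hx, hy, ht = 1.0 / (Ni - 1), 1.0 / Ny, 1.0e-3

rng = np.random.default_rng(1)
# Placeholder data (assumption): u_n is random but shifted so that it
# satisfies (4.58); theta stands in for the convective and viscous terms.
u_n = rng.standard_normal((Ni, Ny))
u_n -= u_n.mean(axis=1, keepdims=True)
theta = rng.standard_normal((Ni, Ny))

u_half, dpx_dx, px = intermediate_u(u_n, theta, ht, hx, hy)
flow_half = hy * u_half.sum(axis=1)

# The same update without the pressure term, as in the classical scheme.
flow_classic = hy * (u_n + ht * theta).sum(axis=1)
```

Here `flow_half` vanishes up to rounding for every `i`, whereas `flow_classic` does not, which is the point of the correction term. The computation stays explicit: no linear system is solved.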
The intermediate velocity $\upsilon$ and the one-dimensional pressure component $p^y$ are computed in the same manner. In the classical splitting scheme, the velocity components at the intermediate time step $(n + 1/2)$ (Stage I) do not depend on the pressure; therefore these components do not satisfy the discrete analogues of the mass flow rate equations (4.58) and (4.59). Pressure decomposition makes it possible to partially take into account the coupling between the intermediate velocity and part of the pressure (more precisely, the sum of the one-dimensional pressure components $p^x + p^y + p^z$); therefore the intermediate velocity components satisfy the mass flow rate equations (4.58) and (4.59). At the same time, the improved explicit scheme retains the explicit nature of the computations because the SLAE

$$\begin{cases} \dfrac{u_{ij}^{(n+1/2)} - u_{ij}^{(n)}}{h_t} = -\left(\dfrac{\partial p^x}{\partial x}\right)_i^{(n+1/2)} + \Theta_{ij} \\[2ex] h_y \displaystyle\sum_{j=1}^{N_y} u_{ij}^{(n+1/2)} = 0 \end{cases}$$

has been solved analytically. The second stage of the scheme, which requires solving the Poisson equation

$$\Delta p^{xy} = \frac{1}{h_t}\left(\frac{\partial u}{\partial x} + \frac{\partial \upsilon}{\partial y}\right)^{(n+1/2)},$$
is generally the most expensive to solve, i.e. the one-dimensional pressure components and the multidimensional pressure component are assigned to different time steps. Since the velocity components satisfy (4.58) and (4.59), the right-hand side of the Poisson equation (the residual of the continuity equation (4.1)) is expected to be smaller than the residual in the classical approach (i.e. without the pressure decomposition). We can hope that reducing the residual reduces the computational effort needed to solve the Poisson equation. The third stage of the improved explicit scheme is the determination of the velocity at the $(n + 1)$th time step using the multidimensional pressure component:

$$\frac{V^{(n+1)} - V^{(n+1/2)}}{h_t} = -\nabla p^{xy}.$$

Note that the velocity and part of the pressure (the sum of the one-dimensional pressure components $p^x + p^y$) are computed in a coupled manner (Stage I), but the velocity and the other part of the pressure (the multidimensional pressure component $p^{xy}$) are computed only in a decoupled manner (Stages II and III). For the numerical experiment, we define the speed of the lid of the $1 \times 1$ cavity as

$$U_w^{(n)} = \min\left(\frac{n}{100};\ 1\right),$$

where $n$ is the number of time steps. A uniform staggered grid $201 \times 201$ ($h = h_x = h_y = 1/200$) with $Re = 1000$ and $h_t = h/5$ is used for the experiment. The maximum lid speed $U_w^{(n)} = 1$ defines the Reynolds number ($Re$). The execution time ratio $T_m^{(n)}/T_c^{(n)}$ is taken as the criterion of efficiency, where $T_c^{(n)}$ and $T_m^{(n)}$ are the execution times for the classical and improved schemes. Figure 4.17 shows the reduction of the execution time for $n \le 200$;
the average value

$$\frac{1}{200} \sum_{n=1}^{200} T_m^{(n)}/T_c^{(n)} = 0.81$$

shows the minimum efficiency of the pressure decomposition approach, since it is obtained for simulation of a recirculating flow in the cavity. Other explicit schemes for the (in)compressible Navier–Stokes equations can be improved in the same way.
Fig. 4.17: Influence of the pressure decomposition on execution time for simulation of flow in the driven cavity (execution time ratio $T_m^{(n)}/T_c^{(n)}$ versus time step $n$)
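Stages II and III together act as a projection onto divergence-free fields. The sketch below illustrates this on a deliberately simplified setting that differs from the cavity problem: a doubly periodic square domain with a Fourier spectral discretization instead of a staggered finite-volume grid. The function names and the random intermediate velocity are assumptions made for the illustration only.

```python
import numpy as np


def wavenumbers(n, length=1.0):
    """Angular wavenumbers of an n x n periodic grid."""
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    return np.meshgrid(k, k, indexing="ij")


def spectral_divergence(u, v, length=1.0):
    """Divergence of a periodic field, returned in physical space."""
    kx, ky = wavenumbers(u.shape[0], length)
    div_hat = 1j * kx * np.fft.fft2(u) + 1j * ky * np.fft.fft2(v)
    return np.fft.ifft2(div_hat).real


def project(u_half, v_half, ht, length=1.0):
    """Stages II and III of the splitting scheme on a periodic domain."""
    kx, ky = wavenumbers(u_half.shape[0], length)
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0  # the mean pressure is arbitrary; avoid 0/0 for this mode

    u_hat, v_hat = np.fft.fft2(u_half), np.fft.fft2(v_half)
    div_hat = 1j * kx * u_hat + 1j * ky * v_hat

    # Stage II: Laplace(p) = div(V) / ht, i.e. -k^2 * p_hat = div_hat / ht.
    p_hat = -div_hat / (ht * k2)
    p_hat[0, 0] = 0.0

    # Stage III: V_new = V_half - ht * grad(p).
    u_new = np.fft.ifft2(u_hat - ht * 1j * kx * p_hat).real
    v_new = np.fft.ifft2(v_hat - ht * 1j * ky * p_hat).real
    return u_new, v_new, np.fft.ifft2(p_hat).real


n, ht = 31, 1.0e-3  # an odd n avoids the unpaired Nyquist mode
rng = np.random.default_rng(2)
u_half = rng.standard_normal((n, n))
v_half = rng.standard_normal((n, n))

u_new, v_new, p = project(u_half, v_half, ht)
div_before = np.max(np.abs(spectral_divergence(u_half, v_half)))
div_after = np.max(np.abs(spectral_divergence(u_new, v_new)))
```

In this setting `div_after` is at rounding level while `div_before` is of order one or larger. The right-hand side of the Poisson problem is the divergence of the intermediate velocity, so an intermediate velocity that already satisfies the mass flow rate equations starts the Poisson solve from a smaller residual, which is the effect measured in Fig. 4.17.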
4.8.4 Implicit schemes
Equations such as (4.62) for computing the gradients of the one-dimensional pressure components can be obtained only for explicit schemes. For implicit schemes, it is necessary to pose and solve an auxiliary problem to compute the part of the pressure $p^x + p^y + p^z$. To formulate the auxiliary problem, we replace the continuity equation (4.1) with $d$ ($= 2, 3$) conservation equations for the mass flow rate. For the driven cavity, the auxiliary problem becomes:
a) the X-momentum and the mass flow rate (4.55) equations

$$\begin{cases} \dfrac{\partial u}{\partial t} + \dfrac{\partial (u^2)}{\partial x} + \dfrac{\partial (\upsilon u)}{\partial y} = -\dfrac{\partial p^x}{\partial x} - \left[\dfrac{\partial p^{xy}}{\partial x}\right] + \dfrac{1}{Re}\left(\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2}\right) \\[2ex] \displaystyle\int_0^1 u(t,x,y)\,dy = 0 \end{cases}, \tag{4.64}$$

and
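b) the Y-momentum and the mass flow rate (4.56) equations

$$\begin{cases} \dfrac{\partial \upsilon}{\partial t} + \dfrac{\partial (u \upsilon)}{\partial x} + \dfrac{\partial (\upsilon^2)}{\partial y} = -\dfrac{\partial p^y}{\partial y} - \left[\dfrac{\partial p^{xy}}{\partial y}\right] + \dfrac{1}{Re}\left(\dfrac{\partial^2 \upsilon}{\partial x^2} + \dfrac{\partial^2 \upsilon}{\partial y^2}\right) \\[2ex] \displaystyle\int_0^1 \upsilon(t,x,y)\,dx = 0 \end{cases}, \tag{4.65}$$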
where the square brackets mean that the derivatives $(p^{xy})_x$ and $(p^{xy})_y$ are fixed (i.e. these gradients are computed using the value of $p^{xy}$ from the previous iteration of the iterative solution procedure), and the braces mean that the momentum and mass flow rate equations are solved in a coupled manner. It is easy to see that systems (4.64) and (4.65) are very similar to the simplified Navier–Stokes equations in the slender channel approximation (4.53), so the same numerical methods should be applied to solve them. The only difference is in the stopping criterion: in multigrid methods, to obtain an approximation to the solution of the full Navier–Stokes equations it is necessary to perform only a few smoothing iterations of the Gauss–Seidel method with block ordering of unknowns and with computation of the one-dimensional pressure components $p^x(x)$ and $p^y(y)$. Here we exploit the very fast convergence of the secant method. In multigrid methods, the auxiliary problem can be used for extra smoothing. Further, these one-dimensional pressure components are used in the original Navier–Stokes equations, where the momentum equations become

$$\frac{\partial u}{\partial t} + \frac{\partial (u^2)}{\partial x} + \frac{\partial (\upsilon u)}{\partial y} = -\left[\frac{dp^x}{dx}\right] - \frac{\partial p^{xy}}{\partial x} + \frac{1}{Re}\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right), \tag{4.66}$$

$$\frac{\partial \upsilon}{\partial t} + \frac{\partial (u \upsilon)}{\partial x} + \frac{\partial (\upsilon^2)}{\partial y} = -\left[\frac{dp^y}{dy}\right] - \frac{\partial p^{xy}}{\partial y} + \frac{1}{Re}\left(\frac{\partial^2 \upsilon}{\partial x^2} + \frac{\partial^2 \upsilon}{\partial y^2}\right), \tag{4.67}$$
where the square brackets mean that the gradients $(p^x)_x$ and $(p^y)_y$ have been computed in the auxiliary problem. We use the problem of stationary flow in the driven cavity to illustrate the proximity of the solution of the auxiliary problem to the solution of the full Navier–Stokes equations, starting with $u^{(0)} = 0$, $\upsilon^{(0)} = 0$, $p^{(0)} = 0$ and $Re = 100$ on the staggered grid $101 \times 101$ ($h_x = h_y = 1/100$). In this case, the velocity component $\upsilon = 0$ satisfies the Y-momentum equation and the decoupled iterations start with the equation

$$\frac{\partial (u^2)}{\partial x} = \frac{1}{Re}\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right). \tag{4.68}$$

The auxiliary problem becomes

$$\begin{cases} \dfrac{\partial (u^2)}{\partial x} = -\dfrac{dp^x}{dx} + \dfrac{1}{Re}\left(\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2}\right) \\[2ex] \displaystyle\int_0^1 u(x,y)\,dy = 0 \end{cases}. \tag{4.69}$$
Finally, both problems (4.68) and (4.69) are reduced to $Ax = b$. The stopping criterion is taken as

$$\frac{\|Ax - b\|}{\|b\|} < 10^{-7},$$

to compare the solutions of (4.68) and (4.69) with the solution of the full Navier–Stokes equations (4.1). Figure 4.18 compares the solution of the Navier–Stokes equations in the ‘velocity-pressure’ formulation (solid line) with the solution of the Navier–Stokes equations in the ‘stream function-vorticity’ formulation (+) [7] and with the solutions of (4.68) and (4.69) in the vertical section of the cavity ($x = 0.5$). The solution of (4.68) is close to the solution of the Navier–Stokes equations only in the near-lid subdomain ($y > 0.8$). In contrast, the solution of (4.69) is a good approximation to the solution of the Navier–Stokes equations in the whole domain. This is a result of the partial velocity-pressure coupling introduced by the pressure decomposition.

Fig. 4.18: Velocity $u(0.5, y)$ in the middle vertical section ($x = 1/2$) of the cavity for $Re = 100$ (curves: [7], final result, (4.68), (4.69))
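The relative-residual stopping criterion used above can be sketched for a generic system $Ax = b$. The example below is only an illustration: it uses point Gauss–Seidel iterations rather than the block ordering of the text, the Euclidean norm (the text does not specify the norm), and a small 1D Poisson matrix as a hypothetical stand-in for the linearized problems (4.68) and (4.69).

```python
import numpy as np


def gauss_seidel(A, b, tol=1.0e-7, max_iter=10_000):
    """Point Gauss-Seidel iterations stopped by the relative residual."""
    x = np.zeros_like(b, dtype=float)
    b_norm = np.linalg.norm(b)
    for iteration in range(1, max_iter + 1):
        for i in range(len(b)):
            # Use the already updated x[:i] and the old x[i + 1:].
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - sigma) / A[i, i]
        # Stopping criterion ||Ax - b|| / ||b|| < tol.
        if np.linalg.norm(A @ x - b) < tol * b_norm:
            return x, iteration
    raise RuntimeError("Gauss-Seidel did not reach the requested tolerance")


# Hypothetical model system: 1D Poisson matrix tridiag(-1, 2, -1).
n = 20
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

x, iterations = gauss_seidel(A, b)
relative_residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
```

Even for this small system the iteration count is in the hundreds, which is consistent with the remark in the summary below that the Gauss–Seidel method converges slowly as a solver and is better used as a smoother.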
To summarize the important applied aspects of the pressure decomposition approach:
– Employing the pressure decomposition is one way to obtain more accurate approximations to the solution of the (in)compressible Navier–Stokes equations. This method uses the formal pressure decomposition and integral forms of the continuity equation without any problem-dependent components. As a result, the pressure decomposition can be used to improve the robustness and accelerate the convergence of the basic algorithms for a large class of applied problems.
– This approach can be implemented in explicit form for explicit schemes, or in the form of solutions of the auxiliary problem for implicit schemes or for the stationary Navier–Stokes equations.
– Computation of more accurate approximations to solutions of the (in)compressible Navier–Stokes equations is based on the velocity-pressure formulation, the pressure decomposition, the Gauss–Seidel method with block ordering of unknowns and the secant method.
– The computational efficiency of the convergence acceleration caused by the pressure decomposition depends on the flow type; the best results are expected for simulation of stationary directed fluid flows.
– The iterative algorithm for solving the auxiliary problem is very efficient for computation of the one-dimensional pressure components due to the fast convergence of the secant method. On the other hand, this algorithm is not efficient for computation of the velocity due to the slow convergence of the Gauss–Seidel method. Since the Gauss–Seidel method is a smoother, the pressure decomposition approach can be used in black-box multigrid for extra smoothing.
– The pressure decomposition approach can be used for compressible flow simulation. Part of the problem is solved in the velocity-pressure formulation, but the remainder can be solved in the velocity-density formulation.
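The role of the secant method can be illustrated on a strongly simplified model of one section of the auxiliary problem. The assumptions are substantial and are not taken from the text: the convective terms are dropped, the flow in the section is treated as independent of $x$, so that $u(y)$ satisfies $(1/Re)\,u'' = dp^x/dx$ with $u(0) = 0$ and $u(1) = U_w$, and the pressure gradient is the single unknown chosen so that the mass flow rate through the section is zero. All function names are hypothetical.

```python
import numpy as np


def section_velocity(gradient, reynolds, lid_speed, n):
    """Solve (1/Re) u'' = gradient on [0, 1] with u(0) = 0, u(1) = lid_speed."""
    h = 1.0 / n
    # Second-difference matrix for the n - 1 interior nodes.
    A = (-2.0 * np.eye(n - 1) + np.eye(n - 1, k=1) + np.eye(n - 1, k=-1)) / h**2
    rhs = np.full(n - 1, reynolds * gradient)
    rhs[-1] -= lid_speed / h**2  # boundary value u(1) moved to the right-hand side
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(A, rhs)
    u[-1] = lid_speed
    return u


def mass_flow_rate(gradient, reynolds=100.0, lid_speed=1.0, n=100):
    """Trapezoidal approximation of the integral of u over the section."""
    u = section_velocity(gradient, reynolds, lid_speed, n)
    h = 1.0 / n
    return h * (0.5 * u[0] + u[1:-1].sum() + 0.5 * u[-1])


def secant(f, x0, x1, tol=1.0e-10, max_iter=50):
    """Secant iterations for f(x) = 0 started from x0 and x1."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) < tol:
            return x1
        x0, x1, f0 = x1, x1 - f1 * (x1 - x0) / (f1 - f0), f1
        f1 = f(x1)
    if abs(f1) < tol:
        return x1
    raise RuntimeError("secant method did not converge")


gradient = secant(mass_flow_rate, 0.0, 1.0)
residual_flow = mass_flow_rate(gradient)
```

In this linear model the mass flow rate depends linearly on the pressure gradient, so a single secant update already gives the root; the continuous problem has the solution $dp^x/dx = 6 U_w / Re$. In the nonlinear auxiliary problem several updates are needed, but the unknown in each section is still a single scalar, which is why the pressure part of the iteration is cheap compared with the velocity part.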
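4.9 Full multigrid algorithm
There are two different ways to incorporate the principle of formal pressure decomposition into multigrid algorithms. One of them, the coupled approach, uses the auxiliary problem for extra smoothing. For clarity, the problem of flow in a square cavity (Fig. 4.14) will be used below, so the auxiliary and main problems become:
1) Auxiliary problem
a) the X-momentum and the mass flow rate equations

$$\begin{cases} \dfrac{\partial (u^2)}{\partial x} + \dfrac{\partial (\upsilon u)}{\partial y} = -\dfrac{dp^x}{dx} - \left[\dfrac{\partial p^{xy}}{\partial x}\right] + \dfrac{1}{Re}\left(\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2}\right), \\[2ex] \displaystyle\int_0^1 u(x,y)\,dy = \int_0^1 u(0,y)\,dy - \int_0^x \upsilon(\xi,1)\,d\xi, \end{cases}$$

b) the Y-momentum and the mass flow rate equations

$$\begin{cases} \dfrac{\partial (u \upsilon)}{\partial x} + \dfrac{\partial (\upsilon^2)}{\partial y} = -\dfrac{dp^y}{dy} - \left[\dfrac{\partial p^{xy}}{\partial y}\right] + \dfrac{1}{Re}\left(\dfrac{\partial^2 \upsilon}{\partial x^2} + \dfrac{\partial^2 \upsilon}{\partial y^2}\right), \\[2ex] \displaystyle\int_0^1 \upsilon(x,y)\,dx = \int_0^y u(0,\xi)\,d\xi; \end{cases}$$

2) Main problem
a) the continuity equation

$$\frac{\partial u}{\partial x} + \frac{\partial \upsilon}{\partial y} = 0,$$

b) the X-momentum equation

$$\frac{\partial (u^2)}{\partial x} + \frac{\partial (\upsilon u)}{\partial y} = -\left[\frac{dp^x}{dx}\right] - \frac{\partial p^{xy}}{\partial x} + \frac{1}{Re}\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right),$$

c) the Y-momentum equation

$$\frac{\partial (u \upsilon)}{\partial x} + \frac{\partial (\upsilon^2)}{\partial y} = -\left[\frac{dp^y}{dy}\right] - \frac{\partial p^{xy}}{\partial y} + \frac{1}{Re}\left(\frac{\partial^2 \upsilon}{\partial x^2} + \frac{\partial^2 \upsilon}{\partial y^2}\right).$$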
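The right-hand sides of the two mass flow rate equations of the auxiliary problem follow from integrating the continuity equation over rectangles, as in Sect. 4.8.3. The derivation below is a sketch: the boundary conditions $\upsilon(x,0) = 0$ and $u(1,y) = 0$ are assumptions inferred from the form of the equations, since the geometry of Fig. 4.14 is not reproduced here.

```latex
\begin{aligned}
% continuity integrated over [0,x] x [0,1]
0 &= \int_0^1 \bigl[u(x,y) - u(0,y)\bigr]\,dy
   + \int_0^x \bigl[\upsilon(\xi,1) - \upsilon(\xi,0)\bigr]\,d\xi
 &&\Longrightarrow&
 \int_0^1 u(x,y)\,dy &= \int_0^1 u(0,y)\,dy - \int_0^x \upsilon(\xi,1)\,d\xi , \\
% continuity integrated over [0,1] x [0,y]
0 &= \int_0^1 \bigl[\upsilon(x,y) - \upsilon(x,0)\bigr]\,dx
   + \int_0^y \bigl[u(1,\xi) - u(0,\xi)\bigr]\,d\xi
 &&\Longrightarrow&
 \int_0^1 \upsilon(x,y)\,dx &= \int_0^y u(0,\xi)\,d\xi .
\end{aligned}
```

Under these assumptions the fluid enters through the side $x = 0$ and leaves through the side $y = 1$, so the mass flow rates are no longer zero as in the driven cavity but are still known a priori from the boundary data.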
The Σ-modification of these problems is very similar to that for the Navier–Stokes equations (Sect. 4.7). The smoothing procedure for the auxiliary problem consists of Gauss–Seidel iterations with line ordering of unknowns and the secant method for computation of the one-dimensional pressure components (Sect. 4.8). The smoothing procedure for the main problem consists of Vanka iterations (Sect. 4.6). These procedures will be marked by ∙ and ∘, respectively. Figure 4.19 shows the sequential multigrid cycle for solving the Navier–Stokes equations. As a rule, a few smoothing steps using the auxiliary problem on the finest grid are needed to obtain a sufficiently smooth coarse grid correction. Note that this smoothing is more expensive than the corresponding Vanka smoother.

Fig. 4.19: Sequential multigrid cycle for solving the Navier–Stokes equations (levels from 0, the finest grid, to $L_3^+$, the coarsest grids)

Parallelization of the multigrid algorithm with the pressure decomposition has, however, the disadvantage that the equations of the auxiliary problem should be solved sequentially. In this case, the parallel smoothing starts on the level $l^*$ (the parallelization depth of RMT, Sect. 3.2), where we have $3^{dl^*}$ independent discrete problems. The parallel cycle of RMT is shown in Fig. 4.20.

Fig. 4.20: Parallel multigrid cycle for solving the Navier–Stokes equations (levels from 0, the finest grid, through $l^*$ to $L_3^+$, the coarsest grids)

The second way to apply the pressure decomposition is to solve the auxiliary problem and to use its solution as a starting guess for the Navier–Stokes equations. We use this method only to demonstrate the proximity of the auxiliary and main problems. Figs. 4.21 and 4.22 show the stream functions computed from the solution of the auxiliary problem and of the Navier–Stokes equations. It is easy to see that the solution of the pressure-unlinked equations of the auxiliary problem
a) the X-momentum and the mass flow rate equations

$$\begin{cases} \dfrac{\partial (u^2)}{\partial x} + \dfrac{\partial (\upsilon u)}{\partial y} = -\dfrac{dp^x}{dx} + \dfrac{1}{Re}\left(\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2}\right) \\[2ex] \displaystyle\int_0^1 u(x,y)\,dy = \int_0^1 u(0,y)\,dy - \int_0^x \upsilon(\xi,1)\,d\xi \end{cases},$$

b) the Y-momentum and the mass flow rate equations

$$\begin{cases} \dfrac{\partial (u \upsilon)}{\partial x} + \dfrac{\partial (\upsilon^2)}{\partial y} = -\dfrac{dp^y}{dy} + \dfrac{1}{Re}\left(\dfrac{\partial^2 \upsilon}{\partial x^2} + \dfrac{\partial^2 \upsilon}{\partial y^2}\right) \\[2ex] \displaystyle\int_0^1 \upsilon(x,y)\,dx = \int_0^y u(0,\xi)\,d\xi \end{cases},$$

can give an accurate starting guess for the Navier–Stokes equations. However, the difference in pressure is noticeable, as shown in Figs. 4.23 and 4.24, since only part of the pressure (namely the sum of the one-dimensional pressure components $p^x + p^y$) is computed in the auxiliary problem. Figure 4.25 shows the multigrid convergence histories; $\|r_u\|_\infty$, $\|r_\upsilon\|_\infty$, $\|r_{u\upsilon}\|_\infty$, $\|r_u^*\|_\infty$ and $\|r_\upsilon^*\|_\infty$ are the norms of the residuals of the linearized X- and Y-momentum equations, the continuity equation and the mass flow rate equations. Four Vanka smoothing iteration steps are performed on all grids of level $l$ ($0 \le l < L_3^+$).
Fig. 4.21: Solution of the auxiliary problem: isolines of stream function (Re = 100, grid 101 × 101)

Fig. 4.22: Solution of the Navier–Stokes equations: isolines of stream function (Re = 100, grid 101 × 101)

Fig. 4.23: Solution of the auxiliary problem: isolines of pressure (Re = 100, grid 101 × 101)

Fig. 4.24: Solution of the Navier–Stokes equations: isolines of pressure (Re = 100, grid 101 × 101)

Fig. 4.25: The convergence history: reduction of the residual vector norms $\|r_u\|_\infty$, $\|r_\upsilon\|_\infty$, $\|r_{u\upsilon}\|_\infty$, $\|r_u^*\|_\infty$ and $\|r_\upsilon^*\|_\infty$ versus multigrid iteration (four panels: Re = 100 and Re = 500, each on grids 101 × 101 and 1001 × 1001; the first point of each history is labelled ‘auxiliary problem’)
4.10 Conclusions
1. RMT with the smoothing procedure based on the Gauss–Seidel method with block ordering of unknowns allows one to solve a large class of (non)linear boundary value problems in a unified manner (including problems of the saddle-point type).
2. For typical real-life applications (PDE systems with saddle-point features and nonlinear terms), an algorithm for computing an accurate approximation to the solution of the original problem is proposed. It is expected that the coarse grid corrections will be sufficiently smooth functions that are well approximated on coarse grids. This approach to real-life CFD applications is expected to be most effective for directed fluid flows.
Bibliography
[1] G. A. Baker and P. Graves-Morris. Padé Approximants. Encyclopedia of Mathematics and Its Applications. Addison-Wesley Publishing Company, 1981.
[2] M. Benzi, G. H. Golub, and J. Liesen. Numerical solution of saddle point problems. Acta Numerica, 14:1–137, 2006.
[3] W. R. Briley. Numerical method for predicting three-dimensional steady viscous flow in ducts. J. Comp. Phys., 14:8–28, 1974.
[4] T. Cebeci and P. Bradshaw. Physical and Computational Aspects of Convective Heat Transfer. Springer-Verlag, New York, 1988.
[5] R. P. Fedorenko. A relaxation method for solving elliptic difference equations. USSR Comp. Math. Math. Phys., 1:1092–1096, 1962.
[6] C. A. J. Fletcher. Computational Techniques for Fluid Dynamics, volume I: Fundamental and General Techniques. Springer-Verlag, Berlin, 1988.
[7] U. Ghia, K. N. Ghia, and C. T. Shin. High-Re solutions for incompressible flow using the Navier–Stokes equations and a multigrid method. J. Comp. Phys., 48:387–411, 1982.
[8] I. M. Glazman and Y. I. Lubich. Finite Dimensional Linear Analysis. Nauka, Moscow, 1969 (in Russian).
[9] W. Hackbusch. Multi-grid Methods and Applications. Springer, Berlin, Heidelberg, 1985.
[10] L. A. Hageman and D. M. Young. Applied Iterative Methods. International Series of Numerical Mathematics. Academic Press, New York, 1981.
[11] N. J. Higham. Accuracy and Stability of Numerical Algorithms. Second edition. SIAM Publications, Philadelphia, PA, 2002.
[12] S. I. Martynenko. Robust multigrid technique for solving partial differential equations on structured grids. Numerical Methods and Programming, 1:83–100, 2000.
[13] S. I. Martynenko. Robust Multigrid Technique. Keldysh Institute of Applied Mathematics, Moscow, 2013 (in Russian).
[14] S. I. Martynenko. Multigrid Technique: Theory and Applications. FIZMATLIT, Moscow, 2015 (in Russian).
[15] F. Moukalled, L. Mangani, and M. Darwish. The Finite Volume Method in Computational Fluid Dynamics. Springer International Publishing, 2016.
[16] M. A. Olshanskii. Lectures and Exercises in Multigrid Methods. FIZMATLIT, Moscow, 2005 (in Russian).
[17] M. A. Olshanskii and E. E. Tyrtyshnikov. Iterative Methods for Linear Systems: Theory and Applications. Philadelphia, Berlin, 2014.
[18] J. Ortega. Introduction to Parallel and Vector Solution of Linear Systems. Plenum Press, New York, 1988.
[19] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York, 1970.
[20] S. V. Patankar. Numerical Heat Transfer and Fluid Flow. McGraw-Hill, 1980.
[21] Y. Saad. Iterative Methods for Sparse Linear Systems. Second edition. SIAM Publications, Philadelphia, PA, 2003.
[22] A. A. Samarskii. The Theory of Difference Schemes. Marcel Dekker Inc., New York, 2001.
[23] A. A. Samarskii and A. V. Goolin. Numerical Methods. Nauka, Moscow, 1989 (in Russian).
[24] J. C. Tannehill, D. A. Anderson, and R. H. Pletcher. Computational Fluid Mechanics and Heat Transfer. Second edition. Taylor and Francis, 1997.
[25] U. Trottenberg, C. W. Oosterlee, and A. Schüller. Multigrid. Academic Press, Berlin, 2001.
[26] S. P. Vanka. Block-implicit multigrid solution of Navier–Stokes equations in primitive variables. J. Comp. Phys., 65:138–158, 1986.
[27] R. S. Varga. Matrix Iterative Analysis. Prentice-Hall, Englewood Cliffs, NJ, 1962.

https://doi.org/10.1515/9783110539264-203