Model Order Reduction and Applications: Cetraro, Italy 2021 (ISBN 3031295625, 9783031295621)

This book addresses the state of the art of reduced order methods for modelling and computational reduction of complex parametrised systems.


English Pages 240 [241] Year 2023


Table of contents :
Preface
Acknowledgement
Contents
List of Symbols
1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives
1.1 Introduction
1.2 Some Industrial Challenges: Why Do We Need Model Reduction?
1.2.1 Ship Design, Steering and Propulsion
1.2.1.1 Optimal Steering
1.2.1.2 Optimal Design and Control
1.2.2 Financial and Energy Markets, Trading
1.2.3 Medicine: Biomechanics, Fracture Healing, Flow Simulation, Optics
1.2.4 Lessons Learnt
1.3 The Reduced Basis Method (RBM)
1.3.1 Parameterized Linear PDEs
1.3.2 A Detailed Approximation: The "Truth"
1.3.3 Offline Training: The Reduced Problem
1.3.3.1 Offline-Online Decomposition
1.3.3.2 A Posteriori Error Estimation
1.3.3.3 Greedy Selection of the Reduced Basis
1.3.4 What Is the Benchmark?
1.3.5 Weak-Greedy Convergence
1.4 Guiding Examples
1.4.1 The Thermal Block
1.4.2 The Heat Equation
1.5 Beyond Elliptic and Parabolic Problems
1.5.1 Some More Guiding Examples
1.5.2 Ultraweak Formulations
1.5.3 Stable Ultraweak Petrov-Galerkin Discretization
1.5.4 The Ultraweak Reduced Model
1.5.5 Guiding Examples Revisited
1.5.5.1 The Linear Transport Problem
1.5.5.2 The Wave Equation
1.5.5.3 The Schrödinger Equation
1.5.6 Ultraweak "Truth" Discretization
1.5.6.1 Linear Transport
1.5.6.2 The Wave Equation
1.5.6.3 The Schrödinger Equation
1.5.6.4 Common Challenges
1.5.7 The Kolmogorov N-Width Again
1.5.7.1 Linear Transport
1.5.7.2 The Parameterized Wave Equation
1.5.7.3 Schrödinger Equation
1.5.7.4 Common Challenges
1.6 Numerical Aspects
1.6.1 "Truth" Approximation in Space and Time
1.6.1.1 The Parameterized Heat Equation
1.6.1.2 The Parameterized Wave Equation
1.6.2 Reduced Basis Method
1.6.2.1 Thermal Block/The Parameterized Heat Equation
1.6.2.2 The Parameterized Transport Equation
1.6.2.3 The Parameterized Wave Equation
1.7 Conclusions and Outlook
References
2 Inverse Problems: A Deterministic Approach Using Physics-Based Reduced Models
2.1 Introduction
2.2 Forward and Inverse Problems
2.3 Optimality Benchmarks for State Estimation
2.4 Optimal Affine Algorithms
2.4.1 Definition and Preliminary Remarks
2.4.2 Characterization of Affine Algorithms
2.4.3 A Practical Algorithm for Optimal Affine Recovery
2.4.3.1 Discretization and Truncation
2.4.3.2 Optimization Algorithms
2.4.3.3 Final Remark About the Primal-Dual Algorithm
2.5 Sensor Placement
2.5.1 A Collective OMP Algorithm
2.5.1.1 Description of the Algorithm
2.5.1.2 Convergence Analysis
2.5.2 A Worst Case OMP Algorithm
2.5.2.1 Description of the Algorithm
2.5.2.2 Convergence Analysis
2.5.3 Application to Point Evaluation
2.6 Joint Selection of Vn and Wm
2.6.1 Optimality Benchmark
2.6.2 A General Nested Greedy Algorithm
2.6.3 The Generalized Empirical Interpolation Method
2.7 A Piece-Wise Affine Algorithm to Reach the Benchmark Optimality
2.7.1 Optimality Benchmark Under Perturbations
2.7.2 Piecewise Affine Reduced Models
2.7.3 Constructing Admissible Reduced Model Families
2.7.4 Reduced Model Selection and Recovery Bounds
2.8 Bibliographical Remarks/Connections with Other Works
2.8.1 A Bit of History on the Use of Reduced Models to Solve Inverse Problems
2.8.2 For Further Reading
Appendix 1: Practical Computation of An, the Linear PBDW Algorithm
Appendix 2: Practical Computation of β(Vn, Wm)
References
3 Model Order Reduction for Optimal Control Problems
3.1 Outline
3.2 Introduction and Preliminaries
3.2.1 Motivation
3.2.1.1 Examples of PDE Systems
3.3 Lecture 1: The Model Order Reduction (MOR) Techniques
3.3.1 The Beginning of Snapshot POD with Sirovich in 1987
3.3.2 Marry POD with Adaptivity
3.4 Lecture 2: POD in Optimal Control
3.4.1 Cahn-Hilliard Optimization with Spatial Adaptivity
3.4.2 Other Algorithmic Developments
3.4.3 Certification: Error Estimates for Surrogate-Based Optimal Control
3.4.4 Data Quality in Surrogate Based Optimal Control
3.4.5 Test 1: Solution with Steep Gradient Towards Final Time
3.4.6 Test 2: Solution with Steep Gradient in the Middle of the Time Interval
3.4.7 Test 3: Control Constrained Problem
3.4.7.1 Conclusion
3.4.7.2 How Many Snapshots?
3.4.7.3 Where to Take Snapshots?
3.5 Lecture 3: A Fully Certified Reduced Basis Method for PDE Constrained Optimization
3.5.1 General Setting and Model Problem
3.6 The Reduced Problem and the Greedy Procedure
3.6.1 A Relative Error Bound
3.6.2 Convergence of the Method
3.6.3 Numerical Experiments
References
4 Machine Learning Methods for Reduced Order Modeling
4.1 Introduction
4.2 Mathematical Formulation
4.3 Reduced Order Modeling
4.3.1 Dynamic Mode Decomposition
4.3.2 Sparse Identification of Nonlinear Dynamics
4.3.3 Neural Networks
4.4 Discovery of Coordinates and Models for ROMs
4.5 Conclusions
References

Lecture Notes in Mathematics 2328 CIME Foundation Subseries

Michael Hinze · J. Nathan Kutz · Olga Mula · Karsten Urban

Model Order Reduction and Applications: Cetraro, Italy 2021. Maurizio Falcone · Gianluigi Rozza, Editors

Lecture Notes in Mathematics

C.I.M.E. Foundation Subseries Volume 2328

Editors-in-Chief: Jean-Michel Morel, Ecole Normale Supérieure Paris-Saclay, Paris, France; Bernard Teissier, IMJ-PRG, Paris, France. Series Editors: Karin Baur, University of Leeds, Leeds, UK; Michel Brion, UGA, Grenoble, France; Annette Huber, Albert Ludwig University, Freiburg, Germany; Davar Khoshnevisan, The University of Utah, Salt Lake City, UT, USA; Ioannis Kontoyiannis, University of Cambridge, Cambridge, UK; Angela Kunoth, University of Cologne, Cologne, Germany; Ariane Mézard, IMJ-PRG, Paris, France; Mark Podolskij, University of Luxembourg, Esch-sur-Alzette, Luxembourg; Mark Policott, Mathematics Institute, University of Warwick, Coventry, UK; Sylvia Serfaty, NYU Courant, New York, NY, USA; László Székelyhidi, Institute of Mathematics, Leipzig University, Leipzig, Germany; Gabriele Vezzosi, UniFI, Florence, Italy; Anna Wienhard, Ruprecht Karl University, Heidelberg, Germany



Authors: Michael Hinze, Mathematisches Institut, University of Koblenz and Landau, Koblenz, Germany; J. Nathan Kutz, Department of Applied Mathematics, University of Washington, Seattle, WA, USA; Olga Mula, Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands; Karsten Urban, Institut für Numerische Mathematik, Ulm University, Ulm, Baden-Württemberg, Germany

Editors: Maurizio Falcone, Dipartimento di Matematica Guido Castelnuovo, Università Roma La Sapienza, Rome, Italy; Gianluigi Rozza, Area Mathematics, SISSA mathLab, Scuola Internazionale Superiore di Studi Avanzati, Trieste, Italy

This work was supported by CIME, SISSA Trieste, Sapienza University of Rome and by Italian Ministry for University and Research within PRIN projects.

ISSN 0075-8434 ISSN 1617-9692 (electronic) Lecture Notes in Mathematics C.I.M.E. Foundation Subseries ISBN 978-3-031-29562-1 ISBN 978-3-031-29563-8 (eBook) https://doi.org/10.1007/978-3-031-29563-8 Mathematics Subject Classification: 35-xx, 65-xx © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

The book is dedicated to the memory of Maurizio Falcone, visionary gentleman and great applied mathematician

Preface

This book collects the contributions presented at the CIME Summer School held in Cetraro, Italy, from June 29, 2021 to July 3, 2021, on model order reduction and applications. The behaviour of processes in several fields like mechanical engineering, geophysics, seismic phenomena, and climate/weather prediction is usually modelled by dynamical systems. Such models often involve systems of nonlinear partial differential equations. Their approximation by classical discretisation techniques like finite difference and finite element methods leads to high-dimensional systems of ordinary differential equations or difference equations. The number of equations is typically very large and can easily reach a few million. Even after a linearisation, this system is therefore at best computationally very expensive, and often it is not feasible to simulate its evolution. The aim of model order reduction (MOR) is to develop reduced order models giving an accurate approximation of the dynamics of the underlying large-scale system, while keeping the reduction process itself computationally efficient and fast. Today's computational and experimental paradigms feature complex models along with disparate and, frequently, enormous data sets. This has motivated the development of theoretical and computational strategies for the construction of efficient and robust numerical algorithms that effectively resolve the important features and characteristics of these complex computational models, possibly in real time, when needed by the application. Resolving the underlying model is often application-specific and combines mathematical tasks like approximation, prediction, calibration, design, control, and optimisation. In fact, running simulations that fully account for the variability of the complexities of modern scientific models can be a very difficult task due to the curse of dimensionality, chaotic behaviour of dynamics, and/or overwhelming streams of informative data.

Given the above framework and motivation, the CIME School addressed the state of the art of reduced order methods for modelling and computational reduction of complex parametrised systems, governed by ordinary and/or partial differential equations, with a special emphasis on real time computing techniques and applications in various fields. The lecturers of the school and authors of the contributed chapters of this volume are internationally recognised experts in MOR and other related areas, and they present several points of view and techniques to solve demanding problems of increasing complexity. The school focused on theoretical investigation and applicative algorithm development for reduction in the complexity (the dimension, the degrees of freedom, the data) arising in these models. The four broad thrusts of the School, reflected in this volume, are: (1) mathematics of reduced order models, (2) algorithms for approximation and complexity reduction, (3) computational statistics and data-driven techniques, and (4) application-specific design. The particular topics include classical strategies such as parametric sensitivity analysis and best approximations, mature but active topics like principal component analysis and information-based complexity, and rising promising topics such as layered neural networks and high-dimensional statistics. The volume aims to attract researchers and PhD students working, or willing to work, on model order reduction, data-driven model calibration and simplification, computations and approximations in high dimensions, and data-based uncertainty quantification. It assembles complementary approaches with the goal of creating a productive cross-fertilisation and serving as a stronger and more structured link between several diverse research communities.

The structure of the volume is the following. After this introduction, Chap. 1 by Karsten Urban deals with the reduced basis method in space and time and underlines challenges, limits, and perspectives related to the methodology. Chapter 2 by Olga Mula is devoted to inverse problems and deterministic approaches using physics-based reduced models. Chapter 3 by Michael Hinze is focused on model order reduction for optimal control. Chapter 4 by Nathan Kutz is dedicated to machine learning methods for reduced order modelling.

Rome, Italy
Trieste, Italy
25 October 2022

Maurizio Falcone
Gianluigi Rozza


Acknowledgement We acknowledge the support and organisation provided by CIME, the contribution from the MIUR PRIN project "Innovative Numerical Methods for Evolutionary Partial Differential Equations and Applications", as well as support from the Mathematics Department of Sapienza University in Rome and from the Mathematics Area of SISSA, Scuola Internazionale Superiore di Studi Avanzati in Trieste.

*****

Just a few days after we closed this book, Maurizio passed away. We are all struck by this loss for the applied mathematics community. We dedicate this volume to his memory. We will miss you, Maurizio, visionary gentleman and great scientist. Thank you for all you did for the applied mathematics community at the national and international level. The CIME Foundation and Springer also support the dedication of this book to the memory of our colleague Maurizio.

Trieste, Italy
14 November 2022

Gianluigi Rozza, Elvira Mascolo, Olga Mula, Karsten Urban, Michael Hinze, J. Nathan Kutz, Paolo Salani

Contents

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives (Karsten Urban)
2 Inverse Problems: A Deterministic Approach Using Physics-Based Reduced Models (Olga Mula)
3 Model Order Reduction for Optimal Control Problems (Michael Hinze)
4 Machine Learning Methods for Reduced Order Modeling (J. Nathan Kutz)

List of Symbols

aμ  parameterized coercive bilinear form (= a(·, ·; μ))
α◦  lower bound for coercivity constant
αμ  coercivity constant
Aμ  affine symmetric and coercive operator
βμ  inf-sup constant
βμ◦  discrete inf-sup (LBB) constant
βδμ  truth inf-sup constant
β◦  lower bound for LBB constant
β  lower bound for inf-sup constant
bμ  parameterized bilinear form (= b(·, ·; μ))
bq  form in affine decomposition
Bμ  parameterized PDE operator
Bμ◦  parameterized PDE operator, classical form
Bμ◦,∗  adjoint of PPDE operator, classical form
Bμ∗  adjoint parameterized PDE operator
Bδμ  truth stiffness matrix
BN(μ)  reduced stiffness matrix
Bq  operator in affine decomposition
BSC  blade steering curve
C¹{t}(Ī)  {φ ∈ C¹(Ī) : φ(t) = φ̇(t) = 0}
C²2π  2π-periodic C²-functions
C̄  upper bound for continuity constant
Cμ  continuity constant
CN  Crank–Nicolson method
ΔN(μ)  error estimator
ΔδN(μ)  error estimator w.r.t. the truth
dN  Kolmogorov N-width
⟨·, ·⟩V′×V  dual pairing
γμ  continuity constant of coercive operator
H¹0(Ω)  Sobolev space with boundary conditions
H¹(0)(I; V)  Bochner–Sobolev space with initial condition
H¹(I; X)  Bochner–Sobolev space
H²{T}(I)  {ϑ ∈ H²(I) : ϑ(T) = ϑ̇(T) = 0}
L2(Ω)  Lebesgue space
L2(I; X)  Bochner space
L(X, Y)  space of linear mappings from X to Y
N  reduced dimension (N ≪ 𝒩)
𝒩  truth dimension
P  parameter set
Ptrain  parameter training set
rδμ  residual
Rδ,N(μ)  dual norm of reduced residual
σN(P)  maximal RB error
SN  sample set
su(μ)  supremizer
τ ⊗ φ  tensor product of functions
M ⊗ A  tensor (Kronecker) product of matrices
|||v|||μ  trial space norm in ultraweak form
U  trial space
U′  dual of trial space
Uδ  truth trial space
Uδμ  parameter-dependent truth trial space
uμ  exact solution
uδμ  truth approximation
uNμ  reduced approximation
UN  reduced trial space
UNμ  reduced parameter-dependent trial space
V  test space
Vμ  parameter-dependent test space
V′  dual of test space
Vδ  truth test space
Vδμ  parameter-dependent truth test space
𝕍  Bochner space (𝕍 := L2(I; V))
VN  reduced test space
VNμ  reduced parameter-dependent test space
VSP  Voith-Schneider® Propeller

Chapter 1

The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives Karsten Urban

Abstract The simulation and optimization of several real-world industrial problems involve parameters, e.g. unknown constants, design parameters, controls etc. In many cases, these parameters are subject to uncertainties. In many situations, the underlying problem has to be solved very often (“multi-query”) and/or extremely fast (“realtime”) and/or using restricted memory/CPU (“cold computing”). Moreover, the mathematical modeling yields complex systems in the sense that (1) each simulation is extremely costly, its CPU time may be in the order of several weeks; (2) we are confronted with evolutionary, time-dependent processes with long time horizons or time-periodic behaviors (which often requires long-time horizons in order to find the time-periodic solution). All problems rely on time-dependent parameterized partial differential equations (PPDEs); (3) the processes often involve transport and wave-type phenomena as well as complex coupling and nonlinearities. Without significant model reduction, one will not be able to tackle such problems. Moreover, there is a requirement in each of the above problems to ensure that the reduced simulations are certified in the sense that a reduced output comes along with a computable indicator which is a sharp upper bound of the error. The Reduced Basis Method (RBM) is a well-established method for Model Order Reduction of PPDEs. We recall the classical framework for well-posed linear problems and extend this setting towards time-dependent problems of heat, transport, wave and Schrödinger type. The question of optimal approximation rates is discussed and possible benefits of ultraweak variational space-time methods are described.

K. Urban () Institute for Numerical Mathematics, Ulm University, Ulm, Germany e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. Falcone, G. Rozza (eds.), Model Order Reduction and Applications, C.I.M.E. Foundation Subseries 2328, https://doi.org/10.1007/978-3-031-29563-8_1


1.1 Introduction Several phenomena in mechanical engineering, geophysics, seismic modeling, climate/weather prediction and other fields are modeled by time-dependent partial differential equations or dynamical systems. Moreover, such systems are often imbedded into optimization tasks and/or need to be solved within an embedded system of restricted CPU and/or memory (which is nowadays termed “cold computing”). Without significant reduction of the complexity of the underlying systems there is no hope to perform such highly complex simulations within such multiquery and/or realtime environments, even taking fast growing computing capacities into account. The aim of model order reduction (MOR) is to develop reduced order models giving an accurate approximation of the dynamics of the underlying large-scale system, by enabling the reduction process to be implemented as computationally efficient and fast. Clearly resolving the underlying model is often application-specific and combines mathematical tasks like approximation, control and optimization with prediction, calibration and design. Over the last two decades, we have been working on a whole variety of industrial problems. Those problems from quite different areas all posed specific challenges calling for MOR. Since all problems we have been confronted with, are modeled in terms of instationary parameterized partial differential equations (PPDEs), we have been choosing the Reduced Basis Method (RBM) as a well-studied method for model order reduction. However, in many specific cases, the existing welldeveloped mathematical theory was not fully sufficient but needed to be adjusted in an application-specific manner. Most problems, we have been working on, are nonlinear. However, even after a linearization, the arising systems are at best computationally very expensive, and often it is not feasible to simulate its evolution. Hence, we study the RBM in space and time with the aim of treating instationary PPDEs. We describe a general framework for well-posed linear PPDEs, which can also be extended to evolutionary problems by treating time as an extra variable. At least from a mathematical point of view, the question arises, how good a reduced model can be at best? How small can the error be given the maximal size of the reduced system (e.g. determined by the capacity of the cold computing device)? This question is usually answered by the decay of the Kolmogorov N-width dN , i.e., the error of the best-possible linear reduced system of size .N ∈ N. It is remarkable that .dN decays exponentially fast within this general framework. However, what happens if we leave the realm of elliptic and parabolic problems which are fully covered by the RBM setting? We consider three problems beyond that scope, namely the linear stationary and instationary transport problem, the wave equation and the Schrödinger equation. We show limitations both w.r.t. to the general variational framework and also concerning the decay of .dN . A possible way out is an ultraweak variational form, which has been considered recently. Perspectives, potential and limitations are shown.


The remainder of this paper is organized as follows. Section 1.2 contains brief descriptions of some of the industrial challenges that we have been working on; details can be found in the quoted literature. At the end of that section, we collect conclusions and lessons that we have learnt from our work on those problems with the specific emphasis on model reduction. We cannot address all relevant issues in this paper but concentrate on the specific challenges of instationary linear problems with a fixed (finite) number of deterministic parameters. We are fully aware that there are several papers going beyond that scope; we quote some of them. Moreover, in the scope of this article, we cannot describe how to actually solve the mentioned problems in terms of model reduction but refer to the quoted literature. Instead we concentrate in the fundamental mathematical questions arising from the consideration of the industrial challenges that we have been working on. In Sect. 1.3, we give a short review on the Reduced Basis Method (RBM) and focus on those aspects that are relevant here. In particular, we present the RBM for general linear well-posed problems, i.e., we only use inf-sup stability, neither symmetry nor coercivity. This also includes the generation of the RBM by a weak greedy algorithm and the analysis of the Kolmogorov N-width. Even though basic knowledge of RB methods is helpful for the understanding of this section (and also the subsequent ones), we try to give a self-contained description. However, we assume that the reader is familiar with variational formulations of elliptic PDEs and Sobolev spaces. Section 1.4 is devoted to the description of specific examples to which we detail the general setting in Sect. 1.3. We start by well-known problems, namely the thermal block (a coercive problem) and the parameterized heat equation. Both cases can be fully covered by the framework. For the parabolic problem, this amounts to consider a variational formulation in space and time. However, the industrial challenges are not restricted to elliptic and parabolic problems only. In Sect. 1.5 we thus go beyond that scope. Transport, wave and Schrödinger-type problems do not directly fit into the general framework. However, by considering appropriate ultraweak variational formulations of such problems, some results can be recovered, in particular w.r.t. stability and the derivation of a reduced model. The exponential decay of the Kolmogorov N-width, however, cannot be guaranteed in general. The numerical solution of the arising space-time discretized systems yields stiffness matrices which are sums of tensorproducts. The efficient numerical treatment is described in Sect. 1.6. We show some results that indicate that efficient methods from Numerical Linear Algebra allow a numerical solution which even outperforms standard time-stepping schemes concerning runtime and accuracy. We end with short conclusions and an outlook in Sect. 1.7. Finally, we would like to stress that this paper contains part of the work that we have been doing over the past years as well as some results from the literature which are relevant for this overview. Of course, there is a huge amount of work which is related to the presented material. This paper cannot aim at giving a complete survey on related work.


1.2 Some Industrial Challenges: Why Do We Need Model Reduction? Over the past two decades, we have been working on several industrial problems. We briefly sketch some of them and collect the main specific challenges we have been confronted with.

1.2.1 Ship Design, Steering and Propulsion In 1927 the ship’s propulsion system Voith-Schneider® Propeller (VSP), the only one of its kind in the world, was developed by Voith based on the basic idea of the Austrian engineer Ernst Leo Schneider (1894–1975). In Fig. 1.1 on the left-hand side two Voith-Schneider® Propeller s integrated in a tug boat are shown, the righthand side shows a sectional drawing of the VSP. The propeller allows thrust of any magnitude to be generated in any direction quickly, precisely and in a continuously variable manner. It combines propulsion and steering in a single unit. On the Voith-Schneider® Propeller, a rotor casing (1) which ends flush with the ship’s bottom is fitted with a number of axially parallel blades (2) and rotates about a vertical axis. To generate thrust, each of the propeller blades performs an oscillating motion about its own axis (similar to the motion of the tail fin of a fish). This is superimposed by a uniform rotary motion. Blade excursion determines the amount of thrust, while the phase angle of between .0◦ and .360◦ determines its direction. As a result, the same amount of thrust can be generated in any direction, making this the ideal variable-pitch propeller. Both variables—the magnitude and the direction of thrust—are controlled by a control rod (4), which is pulled by two servomotors (5) and a mechanical kinematic transmission (3). Due to its capabilities, the Voith-Schneider® Propeller has so far been utilized in water tractors, ferries, mine-hunters and special ships floating cranes or oceanographic research ships. For all those different purposes there are different propeller types, that differ in size, number of blades, shape of blades, blade steering and

Fig. 1.1 Voith-Schneider® propeller


many other construction details. All those parameters have great influence on the characteristics of the propeller and are possibly subject to optimization.

1.2.1.1

Optimal Steering

In [56], we described the optimization of the steering using Computational Fluid Dynamics (CFD) and numerical optimization. The steering of the blades, i.e., the oscillating motion mentioned above, can be described as a steering angle α in dependency of the rotor (1) position φ, see Fig. 1.2. We call μ ∈ C²2π, μ(φ) := α, the blade steering curve (BSC). This curve not only defines the magnitude and direction of thrust, it obviously also has enormous influence on the flow around the propeller and therefore on its efficiency. It was therefore chosen as the first design variable for optimization. The problem under investigation is thus to find an optimal BSC in the sense that the efficiency of the VSP, at one specific driving situation, is maximal for the determined BSC as compared with all others. The BSC μ determines the location of the propellers and thus the time-dependent domain Ωμ of water flowing around the Voith-Schneider® Propeller. For each (t, x) ∈ (0, 2π) × Ωμ(t) (we identify the rotation angle φ with the time variable t), we need to determine the velocity uμ and the pressure pμ as the solution of the incompressible Navier-Stokes equations

∂uμ/∂t + uμ · ∇uμ − ν Δuμ + (1/ρ) ∇pμ = f,   ∇ · uμ = 0   (1.1)

with homogeneous Dirichlet conditions on the blades, inflow conditions determined by the movement of the ship and periodic boundary conditions for the time (or rotation) variable. As usual, .ν denotes the kinematic viscosity, .ρ is the density and .f a given exterior force.

Fig. 1.2 Blade steering curve (BSC): blade steering angle α [°] (range −60 to 60) plotted over the phase angle φ [°] (range 0 to 360)


The propulsion efficiency η(μ) of the free-running propeller is defined as the ratio of the propulsion power (determined by the thrust force ks(μ)) and the delivered power (in terms of the torque kd(μ)), i.e., η(μ) := c · ks(μ)/kd(μ) with a constant c. Force and torque are determined as integrals over time and the surfaces of the blades of the normal and tangential components of the pressure pμ, respectively. Note that the surfaces of the blades are parameter-dependent due to their rotation given in terms of the BSC, which makes η(μ) highly nonlinear. Then, we need to solve the constrained optimization problem

η(μ) → max!   s.t. μ ∈ C²2π and (uμ, pμ) solves (1.1).

For more details, we refer to [56]. In order to solve this problem, we have used numerical optimization and Computational Fluid Dynamics (CFD), in particular the Vortex-Lattice Method for the optimization and Finite Volume Method for validation [60] to compute .uμ , .pμ from (1.1) for a given BSC .μ. For the optimization, we started by derivative-free optimization methods in [56] and improved this by a gradient-based optimization using automatic differentiation in [61]. We obtained a significant improvement of 4.8%. In terms of the subsequent discussion, we note that the blade steering curve serves as parameter (here a parameter function [62]) and the numerical optimization requires to solve the (time-dependent, time-periodic, nonlinear) CFD problem for many different BSCs. The CFD simulation can be seen as a “truth” discretization in the sense that the solution is sufficiently accurate (with engineering accuracy), but computationally costly. At the time we started with this project (2006), a single CFD calculation in 2d took about 2 weeks on a parallel cluster, 3d was out of reach to that time. This shows the need to reduce the complexity to as few calls to the “truth” problem as possible.
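To make the structure of such an optimization loop concrete, the following is a minimal Python sketch of the derivative-free variant. The routine run_cfd, standing in for one expensive "truth" CFD solve, and the Fourier parameterization of the BSC are hypothetical conventions of ours, not the actual implementation of [56]:

```python
import numpy as np
from scipy.optimize import minimize

def efficiency(coeffs, run_cfd, c=1.0):
    """eta(mu) = c * ks(mu) / kd(mu) for a BSC given by Fourier coefficients."""
    ks, kd = run_cfd(coeffs)          # one full time-periodic flow simulation
    return c * ks / kd

def optimize_bsc(run_cfd, n_modes=4):
    x0 = np.zeros(2 * n_modes)        # sine/cosine amplitudes of the 2pi-periodic BSC
    # Derivative-free search; every evaluation triggers a CFD run, which is
    # exactly why the number of calls to the "truth" solver must be kept small.
    res = minimize(lambda x: -efficiency(x, run_cfd), x0,
                   method="Nelder-Mead", options={"maxfev": 100})
    return res.x, -res.fun
```

The gradient-based variant of [61] replaces the Nelder-Mead search by automatic differentiation of the solver, but the outer structure (many calls to an expensive simulation) is the same, which is precisely what motivates model reduction.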

1.2.1.2

Optimal Design and Control

Once we had proven that numerical methods are able to yield significant improvement of VSPs behavior also in real-world environments, we considered optimal design problems starting from the shape of the blades and continuing towards the optimization of the full ship hull geometry under the constraint that the hull is able to carry a fixed number (typically 2–5) of VSPs [53, 76]. First of all, this required an extremely efficient numerical simulation of the rigid body motion with the efficiency of the BSC as output of interest. Hence, we developed an adjoint-based finite volume method for the time-dependent incompressible Navier-Stokes equations describing the flow around the ship [76]. The geometry of the ship served as parameter and required an efficient description in terms of splines [53]. The next issue was optimal allocation, namely given a ship with a number of Voith-Schneider® Propeller s (usually 3–5). Then, given the requirement of a certain maneuver of the ship, the optimal allocation of thrust and steering for each propeller has to be determined in order to maximize efficiency [15]. This problem is also


relevant in dynamic positioning for ships e.g. in an offshore environment for vessel supply ships. From the numerical point of view we are facing a realtime optimal control problem, where the desired control can be seen as parameter. The numerical solution of the full 3d, time-periodic, nonlinear CFD problem around the full ship (typically also with attachments such as the protection plate shown on the left in Fig. 1.1) and (up to five) propellers (i.e., with time-dependent domain) is still too complex to be solved in realtime. Moreover, such simulations are to be done onboard, i.e., on the ship as we cannot expect to have a secure and fast mobile connection to high performance computers on land. Thus, we have a cold computing situation, namely heavily restricted memory and CPU [68]. Finally, the actual weather conditions serve as input values. Those conditions are determined by measurements through onboard sensors, which are of course subject to uncertainties.

1.2.2 Financial and Energy Markets, Trading Numerical simulations are used extensively for financial markets, e.g. for trading, valuation and hedging. The most simple models (e.g. the standard Black-Scholes model) allow for a closed formula solution. The involved parameters (typically market parameters such as drift and volatility) can be observed on the market and the arising numerical models are calibrated. This is often done over night so that extremely fast numerical methods allow for high-speed trading in realtime. However, as the financial crises have shown, simple models are not capable of representing significant effects on the markets. This means that more sophisticated models combined with data-based modeling are required. The resulting complex (and often high-dimensional) systems call for highly efficient adaptive methods, e.g. [59]. However, even with such "truth" solvers at hand, model reduction is a must. Financial modeling is not only relevant on stock markets. In fact, several goods are nowadays traded and corresponding derivatives exist. One example is the European Union Emission Trading Scheme [78], another one the trading with renewable electricity [38, 39]. All these examples require the extremely fast solution of time-dependent problems involving a number of (market) parameters, some of them with uncertainties.

1.2.3 Medicine: Biomechanics, Fracture Healing, Flow Simulation, Optics Patient-specific medicine is a major goal of these days societies. One aspect of that is the determination of optimal healing strategies in case of bone fractures. In such cases, it is desirable to find exterior therapies (e.g. by fixators) avoiding operations.


The latter ones typically involve implanting nails which often need to be removed after healing, requiring another operation though. Hence, one seeks a forecast of the healing process given a patient-specific geometry (of the bones, the fracture and the tissue) and some therapy (a control). Thus, the numerical simulation typically requires to solve parameterized complex time-dependent problems [33, 69, 84], where the time horizons are pretty long compared to the required temporal resolution due to the stiffness of the systems. The numerical optimization for determining an optimal therapy requires repeated simulations in realtime [64]. In addition, often the models, parameters and data are subject to severe uncertainties as well as missing data [77], such that uncertainty quantification (UQ) is a must. However, UQ again requires several "truth" simulations with different parameter values. Healing of bone fractures can often be modeled by a combination of mechanical and diffusion processes. We have also worked on optimal therapies for nasal surgeries [84]. There, we were facing complex CFD problems involving severe transport and turbulent effects. Moreover, many medical therapies rely on imaging procedures to determine high resolution 3d data. Extremely accurate optical surfaces are required for that purpose. In [55], we introduced a numerical scheme for the representation, analysis and simulation of optical surfaces including the possibility for local corrections. The high resolution numerical simulation requires to solve the electromagnetic wave equation. Finally, slightly outside this scope, quantum physics is a field of enormous potential e.g. for information and imaging. Within the IQ.ST center for Integrated Quantum Science and Technology at Ulm University and the University of Stuttgart, we have been working on reduced models for the Schrödinger equation [47].

1.2.4 Lessons Learnt Let us summarize the specific needs, requirements and challenges that we can deduce from the description of the real-world problems we have been working on. Of course, we are fully aware that the literature is full of descriptions of other important real-world problems [5, 6, 12, 14, 20, 30, 36, 72], just to mention a few. We only report our experience without any claim of generality or specific relevance.

• All described problems involve parameters, e.g. unknown constants, design parameters, controls etc. In many cases, these parameters are subject to uncertainties, e.g. in terms of measurement errors.
• Real-world problems have to be solved very often ("multi-query") and/or extremely fast ("realtime") and/or using restricted memory/CPU ("cold computing").


• Typically, the mathematical modeling yields complex systems. In the above examples, "complex" means
  – each simulation is extremely costly, its CPU time may be in the order of several weeks;
  – we are confronted with evolutionary, time-dependent processes with long time horizons or time-periodic behaviors (which often requires long time horizons in order to find the time-periodic solution); all problems rely on time-dependent parameterized partial differential equations (PPDEs);
  – the processes often involve transport and wave-type phenomena as well as complex coupling and nonlinearities.

Without significant model reduction, we will not be able to tackle the above scope of problems. Moreover, there is a requirement in each of the above problems to ensure that the reduced simulations are certified in the sense that a reduced output comes along with a computable indicator which is a sharp upper bound of the error.

1.3 The Reduced Basis Method (RBM) The reduced basis method (RBM) is a well-established method for model order reduction of parameterized partial differential equations (PPDEs), see e.g. [8, 52, 67, 71] for nice and extensive surveys. We do not aim at giving another survey. Our purpose is to collect those facts that can contribute to tackle the above listed industrial problems. In particular, we will be focussing on time-dependent, noncoercive problems. The framework and the notation used from the very beginning will allow us to describe stationary as well as time-dependent problems.

1.3.1 Parameterized Linear PDEs We consider a parameterized operator equation (think of a parameterized PDE expressed in terms of a differential operator in variational form). To this end, let U and V be some Hilbert spaces (usually Sobolev spaces) such that we have two Gelfand triples

U → HU → U′,   V → HV → V′,

where U′ and V′ denote the dual spaces of U and V, respectively. These spaces arise from the variational formulation of the PPDE at hand. In case of elliptic problems, U and V as well as HU and HV coincide. Next, P ⊂ Rᵖ is assumed to be a compact parameter set and Bμ : U → V′ is some parameterized linear operator. The parameter set may also be infinite-dimensional, i.e., p = ∞, which would then be a set of sequences. In that case,


μ can be interpreted as a parameter function. For any parameter μ ∈ P we are given some data fμ ∈ V′ and are interested in the state uμ ∈ U such that

Bμ(uμ) = fμ   in V′.   (1.2)

(1.2)

Remark 1.1 In many applications, the state is not the main output of interest, but some quantity .sμ := μ (uμ ) for . μ : U → R. In order to streamline the presentation, we are not going to consider the case of output functions here, but refer to the above mentioned textbooks for the notion of adjoint problems, which can easily be adapted to the framework presented here. Since .V is a Hilbert space, (1.2) can equivalently be written as Bμ (uμ ), v V ×V = fμ (v) = fμ , v V ×V

.

for all v ∈ V,

(1.3)

where ⟨·, ·⟩V′×V denotes the dual pairing. Analogously, we introduce the linear form fμ = f(·; μ) ∈ V′ by f(v; μ) = fμ(v) for v ∈ V. We need to ensure that Bμ ∈ L(U, V′) is an isomorphism,¹ namely that the operator is bijective and bounded with bounded inverse (so that (1.2), or (1.3), is well-posed, see below). This means in particular that the trial (or solution) space U and the test space V need to be chosen appropriately. As mentioned above, we do not assume that trial and test spaces coincide, namely we go beyond elliptic problems from the very beginning on.

.

u ∈ U, v ∈ V.

We will say that the bilinear form .bμ is continuous (or bounded) if there exists a constant .Cμ > 0 such that |b(u, v; μ)| ≤ Cμ u U v V

.

for all u ∈ U and all v ∈ V.

On the other hand, we say that the bilinear form .bμ satisfies an inf-sup condition if there exists a constant .βμ > 0 (called the inf-sup constant) such that .

1 L (X , Y )

b(u, v; μ) ≥ βμ u U

v V v∈V sup

for all u ∈ U.

denotes the space of linear mappings from .X to .Y.

(1.4)

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

11

Remark 1.2 If .U = V and if .bμ is coercive, i.e., if there exists a constant .αμ > 0 such that .bμ (u, u) ≥ αμ u 2U for all .u ∈ U, then (1.4) is satisfied for .βμ = αμ . Definition 1.1 (a) We call .bμ uniformly continuous, if .bμ is continuous with .Cμ ≤ C < ∞ for some constant .C and all .μ ∈ P. (b) If there exists a constant .β > 0 such that .bμ satisfies an inf-sup condition with .βμ ≥ β for all .μ ∈ P, then the bilinear form .bμ is called uniformly inf-sup stable, .β is termed lower inf-sup bound. We are now going to recall a generalization of the Lax–Milgram theorem, which we note as a corollary below. The result clarifies conditions ensuring well-posedness of PPDEs. Theorem 1.1 (Banach–Neˇcas) Let .μ ∈ P be a parameter and .bμ : U × V → R be a continuous bilinear form. Then the following statements are equivalent: (i) For each .fμ ∈ V there exists a unique .uμ ∈ U such that b(uμ , v; μ) = f (v; μ) for all v ∈ V.

.

(1.5)

(ii) (a) The condition (1.4) holds, and (b) for all 0 = v. ∈ V there exists wμ ∈ U with b(wμ , v; μ) = 0.

(1.6)

Corollary 1.1 (Lax–Milgram) Let .μ ∈ P be a parameter and .aμ : U × U → R be a symmetric, continuous and coercive bilinear form. Then, for each .fμ ∈ U there exists a unique .uμ ∈ U such that .a(uμ , v; μ) = f (v; μ) for all .v ∈ U. From this theorem, we see the importance of the inf-sup condition (1.4) for the analysis of linear variational problems. As we shall see below, it also plays a key role for the stability of numerical approximations. For both purposes, analysis and discretization, we sometimes need those elements that realize the supremum in (1.4). Definition 1.2 Let .μ ∈ P and .0 = u ∈ U be given. An element .su (μ) ∈ V is called supremizer if bμ (u, v) . v∈V v V

su (μ) V = arg sup

.

Lemma 1.1 Let .μ ∈ P and .0 = u ∈ U. Then, the supremizer .su (μ) is the unique solution of .(su (μ), v)V = bμ (u, v) for all .v ∈ V.

12

K. Urban

Proof The variational problem has a unique solution due to the Lax–Milgram theorem. Moreover, .

b(u, v; μ) (su (μ), v)V = sup = su (μ) V ,

v V

v V v∈V v∈V sup

 

which proves the claim.

Having the supremizer at hand, we can reduce proving inf-sup stability to showing that . su (μ) V ≥ βμ u U .

1.3.2 A Detailed Approximation: The “Truth” The next ingredient is that we shall assume that we have a detailed simulation method at our disposal in the sense that we are able to approximate .uμ with any desired accuracy for a given (fixed) parameter .μ. Of course, such a simulation may be computationally costly, in particular for increasing accuracy. In particular, we can choose the accuracy in such a way that the approximate solution cannot be distinguished from the exact one and thus can be interpreted as “sufficiently exact”, which is the reason why this is called a “truth” approximation. This detailed or “truth” approximation is determined by a Petrov–Galerkin method2 based upon finite-dimensional subspaces U δ ⊂ U,

.

V δ ⊂ V,

(1.7)

where we assume for simplicity that .

dim(Uδ ) = dim(Vδ ) =: N ∈ N

“large”,

otherwise we would need to solve a least-squares system. We may think of .δ as a discretization parameter, e.g. a mesh size and .N = Nδ denotes the corresponding dimension, which we think of being large. Then, the “truth” approximation .uδμ ∈ Uδ is determined by solving b(uδμ , v δ ; μ) = fμ (v δ ) for all v δ ∈ Vδ .

.

(1.8)

It is always assumed that (1.8) is well-posed (see below) and that the solution .uδμ can be computed at .O(N) (i.e., linear) complexity for each fixed parameter .μ.

2 Of course also other discretizations such as finite differences, finite volumes, discontinuous Galerkin methods etc. are possible. We restrict ourselves to the Petrov–Galerkin method here since it matches the theory presented below in a natural manner.

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

13

Well-posedness of (1.8) can be deduced from Theorem 1.1 as long as the inf-sup condition (1.4) is satisfied for .Uδ and .Vδ with some positive constant .βμδ > 0. Note that at this point, it may happen that .βμδ → 0 as .δ → 0 (i.e., .N → ∞). However, in order to ensure stability, we will later require a uniform inf-sup condition w.r.t. .N, resp. .δ (i.e., the size of the discretization). Moreover, without further assumptions, there is no guarantee that .βμδ is positive uniformly in the parameter, which is quite important as we shall see next. Remark 1.3 Referring also to the list of symbols at the end of this article, some remarks on the different inf-sup constants might be useful for the understanding: 1. By .βμ , we denote the inf-sup constant of the PPDE, by .β a parameterindependent lower bound. For well-posedness of the PPDE we need .βμ > 0. 2. For the well-posedness of the truth problem, we need .βμδ > 0 for any fixed discretization parameter .δ. For stability as .δ → 0+, we need that .βμδ ≥ βμ◦ > 0, which is also called LBB condition. By .β ◦ we denote a parameter-independent lower bound for the LBB constant. Lemma 1.2 Let the conditions of Theorem 1.1 and the inf-sup condition (1.4) hold for some .βμ > 0. Then,

uμ − uδμ U ≤

.

1 δ

r  , βμ μ V

where .rμδ := fμ − Bμ uδμ denotes the residual. Proof For completeness, we give the very well-known and simple proof. We use the inf-sup condition (1.4) for the error .uμ − uδμ ∈ U, i.e.,

uμ − uδμ U ≤

.

=

b(uμ − uδμ , v; μ) fμ (v) − b(uδμ , v; μ) 1 1 sup = sup βμ v∈V

v V βμ v∈V

v V fμ − Bμ uδμ , v V ×V 1 1 δ sup =

r  , βμ v∈V

v V βμ μ V

which proves the claim.

 

We stress the fact (and this is also the reason why we included the well–known proof) that .βμ in Lemma 1.2 is the inf-sup constant of the continuous problem, i.e., the variational problem, not its Petrov–Galerkin discretization. We are now going to further analyze the stability of the “truth” discretization. Definition 1.3 The spaces .Uδ ⊂ U and .Vδ ⊂ V in (1.7) are said to satisfy a Ladyshenskaya–Babuška–Brezzi (LBB) condition with respect to the bilinear form

14

K. Urban

bμ : U × V → R, .μ ∈ P, if there exists a constant .βμ◦ > 0 such that

.

.

inf

sup

uδ ∈Uδ uδ ∈Vδ

b(uδ , uδ ; μ) ≥ βμ◦

uδ U uδ V

(1.9)

for all .δ. If there exists a .β ◦ > 0 such that .βμ◦ > β ◦ for all .μ ∈ P, we say that the uniform LBB condition holds. Remark 1.4 If .bμ : U × U → R is coercive with coercivity constant .αμ > 0 and Uδ ⊂ U finite-dimensional. Then, (1.9) holds with .βμ◦ = αμ . In other words, for Galerkin methods, the stability of discrete systems is automatically inherited from the coercivity of the problems. This is not true in general if the problem is only inf-sup-stable. This is why Definition 1.3 is needed.

.

The following result from [85] states that the Petrov–Galerkin approximation is—up to a problem-dependent constant—as good as the best approximation. It is thus a natural generalization of the famous Céa lemma, see Corollary 1.2 below. Theorem 1.2 (Xu and Zikatanov [85]) Let .μ ∈ P be a parameter and .bμ : U × V → R be a continuous bilinear form with continuity constant .0 < Cμ < ∞, for which the LBB condition (1.9) holds for some constant .βμ◦ > 0. Given .uμ ∈ U, let δ δ δ δ δ δ δ .uμ ∈ U be the solution of .bμ (uμ , v ) = bμ (uμ , v ) for all .v ∈ V . Then .uμ and δ .uμ satisfy the estimate

uμ − uδμ U ≤

.

Cμ inf uμ − wδ U . βμ◦ wδ ∈Uδ

(1.10)

If .bμ is uniformly continuous and uniformly LBB stable (Definition 1.1), we have

uμ − uδμ U ≤

.

C inf uμ − w δ U . β ◦ wδ ∈Uδ

(1.11)

As opposed to Lemma 1.2, the inf-sup constant in Theorem 1.2 is .βμ◦ , i.e., the discrete inf-sup (or LBB) constant in (1.9), see Remark 1.3 above. In the coercive Galerkin case, these constants coincide. Corollary 1.2 (Céa Lemma) Let the conditions of Corollary 1.1 (Lax–Milgram) hold, let .Uδ ⊂ U be finite-dimensional and denote by .uδμ ∈ Uδ the unique solution of .a(uδμ , v δ ; μ) = f (v δ ; μ) for all .v δ ∈ Uδ . Then,

uμ − uδμ U ≤

.

Cμ inf uμ − wδ U . αμ wδ ∈Uδ

(1.12)

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

15

1.3.3 Offline Training: The Reduced Problem For the ultimate goal of approximating .uμ for many parameters and/or in realtime, the detailed “truth” discretization is too costly. In fact, the dimension .N is usually too large so that its use in many-query and/or realtime environments is prohibitive. The required model reduction is now performed by using the “truth” approximation several (but hopefully not too many) times in an offline or training phase. This is done by appropriately selecting samples SN := {μ(1) , . . . , μ(N ) } ⊂ P, where N  N is (hopefully) small.3 For each sample, one determines a snapshot ξn := uδμ(n) ∈ U. The reduced N-dimensional trial space is then defined as4 U N := Span{ξ1 , . . . , ξN }.

.

(1.13)

Given some new parameter value μ ∈ P, one first constructs parameter-dependent test spaces by determining supremizers ηn (μ) := sξi (μ) (see su (μ) defined in Definition 1.2) and defines V N (μ) := Span{η1 (μ), . . . , ηN (μ)}.

.

(1.14)

N Then, a reduced approximation is determined by seeking uN μ ∈ U such that

Bμ (uN μ ), vN Vμ ×Vμ = fμ , vN Vμ ×Vμ

.

for all vN ∈ V N (μ).

(1.15)

In general, we can no longer hope to solve (1.15) in linear complexity, so that we need to assume that the complexity is O(N • ), where “•” denotes some power, typically • = 3 by using a direct solution method. Such an approach is only meaningful if N •  N and N • must be independent of N. If this is the case, we call the approximation online-efficient. The point of view is that uN μ is computed in an online stage for possibly many different parameter values μ (multi-query), extremely fast (realtime) and/or on cold computing devices, whereas the reduced model (i.e., the spaces U N and V N (μ)) are mainly determined offline in a training phase.

3 Even though the notation μ(n) might be seen as somewhat cumbersome, we recall that each parameter μ ∈ P ⊂ RP is a vector with components μ1 , . . . , μP . This is the reason why we use superscripts for denoting the N samples in RP . 4 In order to have a clear distinction between high- and low-dimensional spaces, we use calligraphic letters for the high-dimensional and normal symbols for the reduced ones.

16

K. Urban

1.3.3.1

Offline-Online Decomposition

For the reduced approximation to be online-efficient, the following property is crucial. Definition 1.4 (a) The parametric bilinear form bμ is called affine in the parameter if there exists a Qb ∈ N, functions ϑqb : P → R and continuous bilinear forms bq : U×V → R, 1 ≤ q ≤ Qb , such that b

b(u, v; μ) =

Q 

.

ϑqb (μ) bq (u, v)

q=1

for all μ ∈ P, u ∈ U and v ∈ V. (b) The parametric functional fμ is called affine in the parameter if there exists a Qf ∈ N, functions ϑf : P → R and linear functionals fq : V → R, 1 ≤ q ≤ Qf , such that f

f (v; μ) =

Q 

.

f

ϑq (μ) fq (v)

q=1

for all μ ∈ P and v ∈ V. (c) The parametric problem (1.5) is called affine in the parameter if both bμ and fμ are affine in the parameter. We note that one can always determine an approximate affine decomposition in the sense of the latter definition by means of the Empirical Interpolation Method (EIM) [7]. Hence, we may always assume that (1.5) is affine in the parameter. In order to compute the reduced approximation uN μ =

N 

.

αn (μ) ξn ,

n=1

we first need to determine the supremizers for the test functions. This can be done as follows. In the offline (training) phase, we (pre-)compute parameter-independent quantities ηn,q ∈ Vδ by solving (ηn,q , uδ )V = bq (ξn , uδ ),

.

uδ ∈ Vδ ,

q = 1, . . . , Qb ,

n = 1, . . . , N . (1.16)

Hence, ηn,q are the supremizers in the sense of Definition 1.2 for the parameterindependent bilinear forms bq , q = 1, . . . , Qb corresponding to the functions ξn , n = 1, . . . , N , but with the supremum taken over the discrete truth space Vδ

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

17

instead of the inifinite-dimensional test space V (which would in general not be computable). Then, it is easily seen that the supremizers ηn (μ) := sξi (μ) (see su (μ) defined in Definition 1.2) read b

Q 

ηn (μ) =

.

ϑqb (μ) ηn,q ,

n = 1, . . . , N .

q=1

However, we will never form ηn (μ) explicitly, since this would be an O(N)-process. Instead, we setup the linear system B N (μ) α N (μ) = f N (μ) for determining the unknown reduced expansion coefficients α N (μ) = (αn (μ))n=1,...,N as follows. We have b

[B N (μ)]n,n = b(ξn , ηn (μ); μ) =

Q 

.

ϑqb (μ) b(ξn , ηn,q ; μ)

q=1 b

=

Q  q,q  =1 f

[f N (μ)]n =

ϑqb (μ) ϑqb (μ) bq  (ξn , ηn,q ), b

Q  Q  q=1 q  =1

f

ϑq (μ) ϑqb (μ) fq (ηn,q  ),

and the quantities bq  (ξn , ηn,q ), fq (ηn,q  ) can be precomputed in the offline phase in an O(N) loop. Then, setting up B N (μ) and f N (μ) for a new parameter μ requires O(N 2 (Qb )2 ) and O(N Qf Qb ) operations, respectively, both independent of N, i.e., the approximation is online-efficient. The solution of the linear system can be done in at most O(N 3 ) operations, we derive again an online-efficient approximation.

1.3.3.2

A Posteriori Error Estimation

Both for the selection of the sample set .SN and for the certification of the reduced approximation, we need the availability of an error estimator .N (μ) in the sense that we can prove an estimate

$$\|u_\mu - u^N_\mu\|_U \le \Delta_N(\mu), \tag{1.17}$$

and $\Delta_N(\mu)$ needs to be online-efficient. Note that (1.17) results in an error control with respect to the exact solution $u_\mu$, which is not always possible. In those cases, one needs to resort to ensuring $\|u^\delta_\mu - u^N_\mu\|_U \le \Delta_N(\mu)$. Lemma 1.2 gives a possible choice for an error estimator, namely $(\beta_\mu)^{-1} \|f_\mu - B_\mu u^N_\mu\|_{V'}$. However, this quantity is in general not computable, since the dual norm on $V'$ requires computing the supremum over


$V$. If this is impossible, one often restricts the supremum to $V^\delta$ and gets that

$$\|u^\delta_\mu - u^N_\mu\|_U \le \Delta^\delta_N(\mu) := \frac{1}{\beta^\circ_\mu}\, \|f_\mu - B_\mu u^N_\mu\|_{(V^\delta)'} = \frac{1}{\beta^\circ_\mu} \sup_{v^\delta \in V^\delta} \frac{f_\mu(v^\delta) - b(u^N_\mu, v^\delta; \mu)}{\|v^\delta\|_V} =: \frac{1}{\beta^\circ_\mu}\, R^{\delta,N}(\mu),$$

which results in an error control w.r.t. the "truth" approximation $u^\delta_\mu$ (not the exact one) and involves the LBB constant.

Remark 1.5
(a) If the parametric problem is affine in the parameter, one can show that the residual is also affine in the parameter, see e.g. [44]. As a consequence, $R^{\delta,N}(\mu)$ is online-efficient by precomputing and storing the Riesz representations of the affine terms in the offline phase and then combining them for a new parameter $\mu$ in the online phase, see e.g. [52, 71].
(b) The computation of a lower bound for $\beta^\circ_\mu$ is online-efficient by the successive constraint method (SCM) [54].

We can continue the above estimate as follows:$^5$

$$\Delta^\delta_N(\mu) = \frac{1}{\beta^\circ_\mu} \sup_{v^\delta \in V^\delta} \frac{f_\mu(v^\delta) - b(u^N_\mu, v^\delta; \mu)}{\|v^\delta\|_V} = \frac{1}{\beta^\circ_\mu} \sup_{v^\delta \in V^\delta} \frac{b(u^\delta_\mu - u^N_\mu, v^\delta; \mu)}{\|v^\delta\|_V} \le \frac{C_\mu}{\beta^\circ_\mu}\, \|u^\delta_\mu - u^N_\mu\|_U,$$

which shows that $\Delta^\delta_N(\mu)$ is in fact a surrogate for the error w.r.t. the truth approximation, i.e.,

$$\|u^\delta_\mu - u^N_\mu\|_U \le \Delta^\delta_N(\mu) \le \frac{C_\mu}{\beta^\circ_\mu}\, \|u^\delta_\mu - u^N_\mu\|_U. \tag{1.18}$$

If $b_\mu$ is uniformly continuous and uniformly LBB stable, we obtain a uniform surrogate.

$^5$ Note that $\Delta^\delta_N(\mu)$ is the error estimator for the difference of the reduced and the truth solution, whereas $\Delta_N(\mu)$ bounds the error of the reduced approximation w.r.t. the exact solution of the PPDE.
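As a concrete illustration of Remark 1.5(a), the following sketch (Python/NumPy; all names are hypothetical) evaluates the dual residual norm online from a precomputed Gram matrix of Riesz representations. Here the affine residual is assumed to have terms with coefficients $\Theta_k(\mu)$, collecting the $\vartheta^f_q(\mu)$ and the products $\vartheta^b_q(\mu)\,\alpha_n(\mu)$.

```python
import numpy as np

# Offline (assumed given): G[k, kp] = (rhat_k, rhat_kp)_V, the Gram matrix of
# the Riesz representations of all affine residual terms in the truth space.
def residual_dual_norm(theta, G):
    """Online evaluation of ||f_mu - B_mu u^N_mu||_{(V^delta)'} (sketch).

    theta : coefficient vector Theta(mu) of the affine residual terms
            (functional coefficients and theta^b_q(mu) * alpha_n(mu) products).
    """
    val = theta @ G @ theta          # sum_{k,k'} Theta_k Theta_k' (rhat_k, rhat_k')_V
    return np.sqrt(max(val, 0.0))    # clip tiny negative round-off

# Delta^delta_N(mu) = residual_dual_norm(theta, G) / beta_lb(mu), where beta_lb
# is a lower bound for the inf-sup constant, e.g. obtained via the SCM.
```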

1.3.3.3 Greedy Selection of the Reduced Basis

So far, we do not know how to select the sample parameters for determining the snapshots that build the reduced trial space $U^N$ in (1.13). This is typically done by a greedy procedure in which the error is maximized over a training sample set $\mathcal{P}_{\mathrm{train}} \subset \mathcal{P}$; this is called the strong greedy method. However, usually the true error is not accessible and is replaced by a surrogate. The resulting scheme is called the weak greedy method, to be detailed next.

Weak Greedy Method
input: training sample $\mathcal{P}_{\mathrm{train}} \subseteq \mathcal{P}$, parameter $\gamma \in (0, 1]$, tolerance $\varepsilon > 0$
1: choose $\mu^{(1)} \in \mathcal{P}_{\mathrm{train}}$ such that for $\xi_1 := u^\delta_{\mu^{(1)}}$ it holds $\|\xi_1\|_U \ge \gamma \max_{\mu \in \mathcal{P}_{\mathrm{train}}} \|u^\delta_\mu\|_U$
2: initialize $S_1 \leftarrow \{\mu^{(1)}\}$, $U^1 := \mathrm{span}\{\xi_1\}$, $N := 1$
3: while true do
4:   if $\max_{\mu \in \mathcal{P}_{\mathrm{train}}} \Delta^\delta_N(\mu) \le \varepsilon$ then return
5:   $\mu^{(N+1)} \leftarrow \arg\max_{\mu \in \mathcal{P}_{\mathrm{train}}} \Delta^\delta_N(\mu)$
6:   compute the snapshot $\xi_{N+1} := u^\delta_{\mu^{(N+1)}}$
7:   compute the supremizers $\eta_{N+1,q}$, $q = 1, \ldots, Q_b$, by (1.16)
8:   $S_{N+1} \leftarrow S_N \cup \{\mu^{(N+1)}\}$, $U^{N+1} := \mathrm{span}\{U^N, \xi_{N+1}\}$
9:   $N \leftarrow N + 1$
10: end while
output: set of chosen parameters $S_N$, reduced trial space $U^N$, supremizers
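A minimal sketch of this loop in Python follows; truth_solve, error_surrogate and norm_U are hypothetical problem-specific callables standing in for the truth solver, the surrogate $\Delta^\delta_N$, and the $U$-norm (the supremizer update of line 7 is omitted for brevity).

```python
def weak_greedy(P_train, truth_solve, error_surrogate, tol, norm_U):
    """Sketch of the weak greedy method; all callables are problem-specific stubs.

    truth_solve(mu)        -> truth snapshot u^delta_mu (coefficient vector)
    error_surrogate(mu, B) -> Delta^delta_N(mu) for the current snapshot list B
    norm_U(u)              -> ||u||_U
    """
    # line 1 (idealized; computing all snapshots is prohibitive, see below)
    mu1 = max(P_train, key=lambda mu: norm_U(truth_solve(mu)))
    S, basis = [mu1], [truth_solve(mu1)]
    while True:
        errs = {mu: error_surrogate(mu, basis) for mu in P_train}
        mu_star = max(errs, key=errs.get)
        if errs[mu_star] <= tol:              # line 4: certified tolerance met
            return S, basis
        S.append(mu_star)                     # lines 5-9: enrich the reduced space
        basis.append(truth_solve(mu_star))
```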

We note that the condition in line 1 of the algorithm is more of a theoretical nature, as we shall see later in the convergence analysis of the algorithm. In practice, $\mu^{(1)}$ is often chosen arbitrarily. However, in order to satisfy the condition

$$\|\xi_1\|_U \ge \gamma \max_{\mu \in \mathcal{P}_{\mathrm{train}}} \|u^\delta_\mu\|_U,$$

one does not need to compute all snapshots $u^\delta_\mu$, $\mu \in \mathcal{P}_{\mathrm{train}}$, which would be prohibitive from an efficiency point of view. Since $\|u^\delta_\mu\|_U \le (\beta^\circ_\mu)^{-1} \|f_\mu\|_{V'}$, one can realize this requirement by ensuring that

$$\|\xi_1\|_U \ge \gamma \max_{\mu \in \mathcal{P}_{\mathrm{train}}} (\beta^\circ_\mu)^{-1} \|f_\mu\|_{V'},$$

which can often be guaranteed by corresponding a priori estimates for $\beta^\circ_\mu$ and $\|f_\mu\|_{V'}$. The choice of the parameter $\gamma$ will later be guided by the analysis. A possible choice is (see Definitions 1.1 and 1.3)

$$\gamma = \frac{\beta^\circ}{C}$$

if the LBB lower bound and the continuity constant are available. In our subsequent analysis, we will need a smaller value for $\gamma$. One might argue that the supremizers are only required for the final value of $N$. However, the computation of the surrogate $\Delta^\delta_N(\mu)$ in line 5 requires the reduced approximation $u^N_\mu$, for which the supremizers are needed. We will next present a convergence analysis for the weak greedy method and will compare it to the "best possible" RBM, namely the optimal benchmark.

1.3.4 What Is the Benchmark?

The Reduced Basis Method (RBM) as described above results in linear reduced models of hopefully small size $N \in \mathbb{N}$ to approximate the solution $u_\mu$ of a PPDE, or at least its "truth" approximation $u^\delta_\mu$. The question of course arises how "good" the reduced approximation $u^N_\mu$ is, i.e., how small the error

$$e^N_\mu := \|u^{(\delta)}_\mu - u^N_\mu\|_U$$

(or the error w.r.t. some quantity of interest other than the norm $\|\cdot\|_U$) is.$^6$ In other words, given some tolerance $\varepsilon > 0$, how small or large does the size of the reduced model need to be in order to ensure that $e^N_\mu \le \varepsilon$? Hence, we are interested in the decay of the error $e^N_\mu$ as $N$ increases.

The next issue is to fix the notion in which we describe the error w.r.t. the parameter. This might also depend on the specific problem at hand. In fact, if the parameter contains uncertainty, one might consider the error in expectation w.r.t. $\mu$ or the mean squared error. We are going to consider the worst-case error, guaranteeing a certified bound for any parameter value. This is reflected by the following notion.

Definition 1.5 Let $\mathcal{P} \subset \mathbb{R}^P$ be the parameter set and $N \in \mathbb{N}$. The quantity

$$d_N(\mathcal{P}) := \inf_{U_N \subseteq U;\ \dim(U_N) = N}\ \sup_{\mu \in \mathcal{P}}\ \inf_{w_N \in U_N} \|u_\mu - w_N\|_U$$

is called the Kolmogorov $N$-width.

$^6$ Usually, one considers the error w.r.t. the truth approximation $u^\delta_\mu$, since the error bound can typically only be computed in that case. There are some papers considering the error w.r.t. the exact solution $u_\mu$ of the PPDE. This is why we write $u^{(\delta)}_\mu$ here.


Obviously, the Kolmogorov $N$-width is the best possible error that we can achieve with a linear approximation for all parameters. It can easily be seen that determining or realizing a reduced model whose error equals the Kolmogorov $N$-width is NP-hard. However, the Kolmogorov $N$-width sets the benchmark: we are interested in the rate of decay of $d_N(\mathcal{P})$, i.e., how fast does $d_N(\mathcal{P})$ tend to zero as $N \to \infty$?

In order to answer this question, we start by determining the rate of convergence of the Kolmogorov $N$-width, i.e., the best possible RBM error. This rate has been investigated in [18] and [65] for symmetric, coercive problems, and we are going to slightly generalize the result from [65] to well-posed (i.e., inf-sup stable) problems.

Theorem 1.3 ([65, Thm. 3.1]) If $b_\mu$ is inf-sup stable and affine in the parameter, then there exist constants $c, C > 0$ such that

$$d_N(\mathcal{P}) \le C\, e^{-c N^{1/Q_b}}.$$

Proof One can follow the lines of the proof of [65, Thm. 3.1], where the result is proven for a coercive bilinear form $a_\mu$ instead of an inf-sup stable $b_\mu$. Denote by $B_q : U \to V'$ the operators induced by $b_q$, i.e.,

$$\langle B_q(u), v \rangle_{V' \times V} := b_q(u, v), \qquad u \in U,\ v \in V, \qquad q = 1, \ldots, Q_b.$$

Those operators are extended by complex linearity to continuous linear operators $B^{\mathbb{C}}_q : U^{\mathbb{C}} \to (V')^{\mathbb{C}} \cong (V^{\mathbb{C}})'$. Then, the bilinear mapping

$$\mathcal{B} : U^{\mathbb{C}} \times \mathbb{C}^{Q_b} \to (V^{\mathbb{C}})', \qquad \mathcal{B}(u, c) := \sum_{q=1}^{Q_b} c_q\, B^{\mathbb{C}}_q(u)$$

is holomorphic. Moreover, due to the inf-sup stability of $b_\mu$, the complex linear Fréchet derivative

$$\partial_u \mathcal{B}(u, c) = \sum_{q=1}^{Q_b} c_q\, B^{\mathbb{C}}_q$$

is invertible for each $c \in \{(\vartheta_1(\mu), \ldots, \vartheta_{Q_b}(\mu)) : \mu \in \mathcal{P}\} =: \hat{\mathcal{P}}$. Denoting $c(\mu) := (\vartheta_1(\mu), \ldots, \vartheta_{Q_b}(\mu))$, we have that $\mathcal{B}(u_\mu, c(\mu)) = B_\mu(u_\mu)$. Next, define

$$\hat{\Phi} : \hat{\mathcal{P}} \to U^{\mathbb{C}} \qquad \text{by} \qquad \hat{\Phi}(\vartheta_1(\mu), \ldots, \vartheta_{Q_b}(\mu)) := \Phi(\mu),$$

where $\Phi : \mathcal{P} \to U$ is defined by $\Phi(\mu) := u_\mu$ being the solution of (1.5). Then, the complex Banach space version of the implicit function theorem ensures that $\hat{\Phi}$ can be holomorphically extended to an open neighborhood $\hat{\mathcal{P}} \subset O \subseteq \mathbb{C}^{Q_b}$.

The remainder of the proof is exactly as the one of [65, Thm. 3.1]; we quote it for completeness.$^7$ Since $\hat{\mathcal{P}}$ is compact, there are finitely many $c_1, \ldots, c_M \in \hat{\mathcal{P}}$ and radii $r_1, \ldots, r_M$ such that

$$\hat{\mathcal{P}} \subset \bigcup_{m=1}^{M} D(c_m, r_m) \qquad \text{and} \qquad \bigcup_{m=1}^{M} D(c_m, 2r_m) \subseteq O,$$

where $D(c, r) := \{z \in \mathbb{C}^{Q_b} : |z_q - c_q| < r,\ 1 \le q \le Q_b\}$.

Holomorphy implies analyticity, thus for each $1 \le m \le M$ and each multi-index $\alpha \in \mathbb{N}_0^{Q_b}$ there are vectors $u_{m,\alpha} \in U^{\mathbb{C}}$ such that the power series $\hat{\Phi}(z) = \sum_\alpha (z - c_m)^\alpha\, u_{m,\alpha}$ is absolutely convergent for each $z \in D(c_m, 2r_m)$. In addition, $u_{m,\alpha} \in U$. Moreover, we have that

$$C := \max_{1 \le m \le M}\ \sup_{z \in D(c_m, r_m)} \sum_\alpha \big\| 2^{|\alpha|} (z - c_m)^\alpha\, u_{m,\alpha} \big\| < \infty.$$

Note that there are $\frac{(Q_b + K)!}{Q_b!\, K!} \le K^{Q_b}$ multi-indices $\alpha$ of length $Q_b$ and maximum degree $K$. Let $K_N := \lfloor (M^{-1} N)^{1/Q_b} \rfloor$, and define

$$U_N := \mathrm{span}\{u_{m,\alpha} \mid 1 \le m \le M,\ |\alpha| \le K_N\} \subseteq U.$$

Now, for an arbitrary $\mu \in \mathcal{P}$ we can approximate $\hat{\Phi}(\mu) = \hat{\Phi}(z)$, $z \in D(c_m, r_m)$, by the truncated power series $\hat{\Phi}_N(\mu) := \sum_{|\alpha| \le K_N} (z - c_m)^\alpha\, u_{m,\alpha} \in U_N$. We then obtain

$$\|\hat{\Phi}(\mu) - \hat{\Phi}_N(\mu)\| \le \sum_{|\alpha| \ge K_N + 1} \big\| 2^{-|\alpha|} \cdot 2^{|\alpha|} (z - c_m)^\alpha\, u_{m,\alpha} \big\| \le C\, 2^{-(K_N + 1)} \le C\, e^{-\ln(2)\, M^{-1/Q_b}\, N^{1/Q_b}},$$

which completes the proof. □

The above result shows that the best possible reduced model exhibits exponential convergence for inf-sup stable problems.
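Although computing $d_N(\mathcal{P})$ itself is NP-hard, its decay over a finite training set can be probed numerically via the singular values of a snapshot matrix: $\sigma_{N+1}$ bounds the worst-case column error of the best rank-$N$ approximation, a standard POD-type proxy (not the exact $N$-width). A minimal sketch, assuming a hypothetical truth solver truth_solve:

```python
import numpy as np

def nwidth_proxy(truth_solve, train_params):
    """Probe the decay of d_N over the training set via singular values.

    S holds one truth solution per column; for the best rank-N approximation
    of S, every column error is bounded by sigma_{N+1} (spectral-norm bound),
    giving a proxy for d_N restricted to the training set.
    """
    S = np.column_stack([truth_solve(mu) for mu in train_params])
    return np.linalg.svd(S, compute_uv=False)  # inspect the decay of sigma_N
```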

$^7$ In order to clearly indicate the fact that this proof (and also some others in the sequel of this paper) is basically literally quoted from the given references, we change the type of font.


1.3.5 Weak-Greedy Convergence

The reduced model is determined by applying the weak greedy algorithm described above. In particular, we obtain a reduced model determined by the RBM at hand with the error

$$\sigma_N(\mathcal{P}) := \sup_{\mu \in \mathcal{P}} \|u_\mu - u^N_\mu\|_U,$$

where $u^N_\mu$ is the reduced approximation.$^8$ Hence, we would like to relate the RBM error to the Kolmogorov $N$-width. Since the reduced model has been determined by the weak greedy method, the question at hand is whether the rate of the error of the weak greedy method is at least asymptotically (i.e., as $N \to \infty$) comparable to the Kolmogorov $N$-width. The paper [13] addresses this question for symmetric, coercive problems. We shall generalize the results in [13] to general well-posed (i.e., inf-sup stable) problems. The first statement, Theorem 1.4 below, addresses the case of algebraic decay of the Kolmogorov $N$-width, while the second one, Theorem 1.5, is concerned with exponential decay as in Theorem 1.3 above. Both statements show that the weak greedy decay asymptotically matches the Kolmogorov $N$-width. An assumption, which is partly hidden in the subsequent analysis, though, is that $\mathcal{P}_{\mathrm{train}} = \mathcal{P}$, i.e., we assume to train the weak greedy method with all parameter data. Of course, this is in general not realistic. For certain problems such as the thermal block (see below), one can identify optimal test sets (the "magic points" [18]), but in general one would incur an additional test error, which we neglect here. The proofs will be given later.

Theorem 1.4 ([13, Thm. 3.1]) Suppose that $d_1(\mathcal{P}) \le M$ and $d_N(\mathcal{P}) \le M N^{-\alpha}$, $N > 1$, for some $M > 0$ and $\alpha > 0$. Then,

$$\sigma_N(\mathcal{P}) \le C M N^{-\alpha}, \qquad N > 1, \tag{1.19}$$

with $C := q^{1/2} (4q)^\alpha$ and $q := \lceil 2^{\alpha+1} \gamma^{-1} \rceil^2$.

Theorem 1.5 ([13, Thm. 3.2]) Suppose that $d_N(\mathcal{P}) \le M e^{-a N^\alpha}$, $N > 1$, for some $M > 0$, $a > 0$ and $\alpha > 0$. Then,

$$\sigma_N(\mathcal{P}) \le C M e^{-c N^\beta}, \qquad N > 1, \tag{1.20}$$

for $\beta := \frac{\alpha}{\alpha+1}$ and $q := \lceil 2 \gamma^{-1} \theta^{-1} \rceil^2$, whenever $\theta \in (0, 1)$, as well as $c := \min\{|\log \theta|, (4q)^{-\alpha} a\}$, $C := \max\{e^{c m^\beta}, q^{1/2}\}$ and $m := \lceil (8q)^{\alpha+1} \rceil$.

$^8$ Here, we shall not distinguish between the exact solution $u_\mu$ and its "truth" approximation $u^\delta_\mu$.


Proof The proof is similar to the proof of Theorem 1.4 and completely based upon Lemma 1.4, see [13, Thm. 3.2]. □

Both theorems have been proven in [13] for symmetric and coercive problems. We claim that both results also hold for inf-sup stable problems. In order to see this, we need some preparations, which are quite similar to [13]. Let $U^N = \mathrm{span}\{\xi_1, \ldots, \xi_N\}$ and assume that $\{\xi^*_1, \ldots, \xi^*_N\}$ is an orthonormal basis of $U^N$, which has been obtained from the $\xi_i$ by a Gram-Schmidt procedure w.r.t. the inner product in $U$. Then, define the orthogonal projection $P_N : U \to U^N$ as usual by

$$P_N u := \sum_{i=1}^{N} (u, \xi^*_i)_U\, \xi^*_i, \qquad \text{so that} \qquad \|P_N u\|^2_U = \sum_{i=1}^{N} |(u, \xi^*_i)_U|^2.$$

We set

$$\sigma_N(u) := \|u - P_N u\|_U, \qquad \sigma_N := \sigma_N(\mathcal{P}) := \max_{\mu \in \mathcal{P}} \sigma_N(u^\delta_\mu).$$

In particular, we have

$$\xi_i = P_i\, \xi_i = \sum_{j=1}^{i} a_{i,j}\, \xi^*_j, \qquad \text{with} \quad a_{i,j} := (\xi_i, \xi^*_j)_U,$$

and we define the lower triangular infinite matrix

$$A := (a_{i,j})_{i,j=1}^{\infty}, \qquad \text{where} \quad a_{i,j} := 0 \ \text{for} \ j > i.$$

The analysis in [13] is done for the error $\|u_\mu - P_N u_\mu\|_U$ of the orthogonal projection and for the exact solution $u_\mu$ instead of the "truth" $u^\delta_\mu$. Within the RB context described above, however, we do not have $P_N u^\delta_\mu$ available, but the reduced approximation $u^N_\mu$ (the Petrov-Galerkin projection). Hence, we first need to relate the projection error $\|u^\delta_\mu - P_N u^\delta_\mu\|_U$ and the Petrov-Galerkin error $\|u^\delta_\mu - u^N_\mu\|_U$. In order to do so, we recall that the orthogonal projection is the best approximation, so that

$$\|u^\delta_\mu - P_N u^\delta_\mu\|_U = \inf_{w_N \in U^N} \|u^\delta_\mu - w_N\|_U \le \|u^\delta_\mu - u^N_\mu\|_U \le \frac{C}{\beta^\circ} \inf_{w_N \in U^N} \|u^\delta_\mu - w_N\|_U = \frac{C}{\beta^\circ}\, \|u^\delta_\mu - P_N u^\delta_\mu\|_U.$$

From (1.18) we know that

$$\Delta_N(\mu) \le \frac{C}{\beta^\circ}\, \|u^\delta_\mu - u^N_\mu\|_U \le \left( \frac{C}{\beta^\circ} \right)^2 \|u^\delta_\mu - P_N u^\delta_\mu\|_U,$$

so that we get

$$\left( \frac{\beta^\circ}{C} \right)^2 \Delta_N(\mu) \le \|u^\delta_\mu - P_N u^\delta_\mu\|_U \le \Delta_N(\mu). \tag{1.21}$$

Now, we set

$$\gamma := \left( \frac{\beta^\circ}{C} \right)^2 \in (0, 1] \tag{1.22}$$

and use this in the weak greedy algorithm described in Sect. 1.3.3 above.

Lemma 1.3 ([13]) With the above notation, we have for $\mathcal{P}_{\mathrm{train}} = \mathcal{P}$:
(i) It holds that $\gamma\, \sigma_n \le |a_{n,n}| \le \sigma_n$.
(ii) For every $m \ge n$ one has $\sum_{j=n}^{m} a^2_{m,j} \le \sigma^2_n$.

Proof In order to prove (i), first note that

$$\|\xi_n\|^2_U = \sum_{i=1}^{n} |(\xi_n, \xi^*_i)_U|^2 \qquad \text{and} \qquad \|P_{n-1} \xi_n\|^2_U = \sum_{i=1}^{n-1} |(\xi_n, \xi^*_i)_U|^2.$$

Then,

$$|a_{n,n}|^2 = |(\xi_n, \xi^*_n)_U|^2 = \|\xi_n\|^2_U - \|P_{n-1} \xi_n\|^2_U = \|\xi_n - P_{n-1} \xi_n\|^2_U = \sigma_{n-1}(\xi_n)^2,$$

which implies that $|a_{n,n}| \le \sigma_{n-1}$. On the other hand, by the greedy selection in line 5 of the weak greedy method and (1.21),

$$\gamma\, \sigma_{n-1} = \gamma \max_{\mu \in \mathcal{P}} \|u^\delta_\mu - P_{n-1} u^\delta_\mu\|_U \le \gamma \max_{\mu \in \mathcal{P}} \Delta_{n-1}(\mu) = \gamma\, \Delta_{n-1}(\mu^{(n)}) \le \|\xi_n - P_{n-1} \xi_n\|_U = |a_{n,n}|,$$

which proves (i). As for (ii), we have

$$\sum_{j=n}^{m} a^2_{m,j} = \sum_{j=n}^{m} |(\xi_m, \xi^*_j)_U|^2 = \|\xi_m - P_n \xi_m\|^2_U \le \max_{\mu \in \mathcal{P}} \|u^\delta_\mu - P_n u^\delta_\mu\|^2_U = \sigma^2_n,$$

which completes the proof. □

Lemma 1.4 ([13, Lemma 2.2]) Fix $0 < \theta < 1$ and set $q := \lceil 2 \gamma^{-1} \theta^{-1} \rceil^2$. If

$$\sigma_{n+qm} \ge \theta\, \sigma_n \tag{1.23}$$

for some $m, n \in \mathbb{N}$, then

$$\sigma_n(\mathcal{P}) \le q^{1/2}\, d_m(\mathcal{P}). \tag{1.24}$$

Proof The proof only uses the properties of the matrix $A$ proven in Lemma 1.3, so that we refer to [13, Lemma 2.2]. We quote the proof only for completeness of the presentation.

Without loss of generality we associate the infinite-dimensional Hilbert space $U$ with $\ell_2(\mathbb{N} \cup \{0\})$ and $\xi^*_j = e_j$, where $e_j$ is the vector with a one in the coordinate indexed by $j$ and all other coordinates zero, i.e., $(e_j)_i = \delta_{j,i}$. We consider the $(qm+1) \times (qm+1)$ submatrix $A'$ of $A$ given by the entries of $A$ in the rows and columns with indices in $\{n, \ldots, n+qm\}$. We denote by $g_j$, $j = 0, 1, \ldots, qm$, the rows of $A'$. These are each vectors in $\mathbb{R}^{qm+1}$ whose coordinates we index by $0, 1, \ldots, qm$. Let $Y$ be an $m$-dimensional space which realizes $d_m(\mathcal{P})$, and let $Y_m$ be the restriction of $Y$ to the coordinates in $\{n, n+1, \ldots, n+qm\}$. Then, for each $j$, the projection $y_j$ of $g_j$ onto $Y_m$ satisfies

$$\|g_j - y_j\| \le d_m := d_m(\mathcal{P}), \qquad j = 0, 1, \ldots, qm.$$

We can without loss of generality assume that $\dim(Y_m) = m$. Let $\varphi_1, \ldots, \varphi_m$ be an orthonormal basis for $Y_m$. Since each $\varphi_i = (\varphi_i(j))_{j=0}^{qm}$, $i = 1, \ldots, m$, has norm one, for at least one value of $j \in \{0, \ldots, qm\}$ we have

$$\sum_{i=1}^{m} |\varphi_i(j)|^2 \le q^{-1}. \tag{1.25}$$

We fix this value of $j$ and write $y_j = \sum_{i=1}^{m} \langle g_j, \varphi_i \rangle\, \varphi_i$. This means that the $j$-th coordinate $y_j(j)$ of $y_j$ satisfies

$$|y_j(j)| = \Big| \sum_{i=1}^{m} \langle g_j, \varphi_i \rangle\, \varphi_i(j) \Big| \le \Big( \sum_{i=1}^{m} |\langle g_j, \varphi_i \rangle|^2 \Big)^{1/2} \Big( \sum_{i=1}^{m} |\varphi_i(j)|^2 \Big)^{1/2} \le q^{-1/2}\, \|g_j\| \le q^{-1/2}\, \sigma_n,$$

where we have used (1.25) and property (ii) from Lemma 1.3. From (i) of the same lemma, (1.23), and the definition of $q$, we have

$$|g_j(j)| = |a_{n+j,n+j}| \ge \gamma\, \sigma_{n+j} \ge \gamma\, \theta\, \sigma_n \ge 2 q^{-1/2}\, \sigma_n.$$

It follows that

$$q^{-1/2}\, \sigma_n \le |g_j(j) - y_j(j)| \le \|g_j - y_j\| \le d_m,$$

which completes the proof of the lemma. □

Proof of Theorem 1.4 First, we choose $\theta = 2^{-\alpha}$. It follows that $q = \lceil 2 \gamma^{-1} \theta^{-1} \rceil^2$, which is the relation needed in Lemma 1.4. Next, we define $N_0 := 4q$. Note that $q \ge 9$ and $N_0 \ge 36$. Then, the definition of $C$ shows that (1.19) holds for $1 \le N \le N_0$, since

$$\sigma_N(\mathcal{P}) \le \sigma_0(\mathcal{P}) \le M \le M N_0^\alpha N^{-\alpha} \le C M N^{-\alpha}, \qquad N \le N_0.$$

We suppose now that (1.19) does not hold for some $N > N_0$ and draw a contradiction. To start with, we let $n$ be the smallest integer $n > N_0$ for which (1.19) does not hold. Then,

$$C M n^{-\alpha} < \sigma_n(\mathcal{P}). \tag{1.26}$$

We define $N$ as the smallest integer for which $N^\alpha n^{-\alpha} \ge \theta$, i.e., $N := \lceil \theta^{1/\alpha} n \rceil = \lceil n/2 \rceil$. We then have

$$\sigma_N(\mathcal{P}) \le C M N^{-\alpha} = C M n^{-\alpha}\, n^\alpha N^{-\alpha} \le \theta^{-1} \sigma_n(\mathcal{P}).$$

We now take this value of $N$ and take $m$ as the largest integer such that $N + qm \le n$, i.e., $m := \lfloor (n - N)/q \rfloor$. We can apply Lemma 1.4 and conclude that

$$\sigma_n(\mathcal{P}) \le \sigma_N(\mathcal{P}) \le q^{1/2}\, d_m(\mathcal{P}) \le q^{1/2} M m^{-\alpha} = \big( q^{1/2} n^\alpha m^{-\alpha} \big) M n^{-\alpha}. \tag{1.27}$$

Since $n > N_0 > 4q - 2$, it follows that

$$4qm = 4q \lfloor (n - N)/q \rfloor \ge 4(n - N - q + 1) = 4n - 4\lceil n/2 \rceil - 4q + 4 \ge 2n - 2 - 4q + 4 \ge n,$$

where we have used the fact that if $a$ and $b$ are positive integers, then $a \lfloor b/a \rfloor \ge b - a + 1$ and $a \lceil b/a \rceil \le b + a - 1$. Therefore, we have $\frac{n}{m} \le 4q$. Combining (1.26) with (1.27), we obtain

$$C < q^{1/2} (n/m)^\alpha \le q^{1/2} (4q)^\alpha.$$

This contradicts the definition of $C$. □


Proof of Theorem 1.5 The proof is similar to that of Theorem 1.4. By our definition of $C$ we find that (1.20) holds for $N \le N_0$, since we have

$$\sigma_N(\mathcal{P}) \le \sigma_0(\mathcal{P}) \le \sigma_0(\mathcal{P})\, e^{c N_0^\beta} e^{-c N^\beta} \le C M e^{-c N^\beta}, \qquad N \le N_0.$$

We suppose now that (1.20) does not hold for some $N > N_0$ and draw a contradiction. Let $n > N_0$ be the smallest integer for which

$$C M e^{-c n^\beta} < \sigma_n(\mathcal{P}).$$

We now let $m$ be any positive integer for which

$$e^{c (n - qm)^\beta} e^{-c n^\beta} \ge \theta \qquad \text{or, equivalently,} \qquad c \big( n^\beta - (n - qm)^\beta \big) \le |\ln \theta|. \tag{1.28}$$

Then, using that (1.20) holds for $n - qm$, (1.28) yields

$$\sigma_{n - qm}(\mathcal{P}) \le \theta^{-1} \sigma_n(\mathcal{P}).$$

Now we can apply Lemma 1.4 and conclude that

$$\sigma_n(\mathcal{P}) \le \sigma_{n - qm}(\mathcal{P}) \le q^{1/2}\, d_m(\mathcal{P}) \le q^{1/2} M e^{-a m^\alpha}.$$

We are left to show that there is a choice of $m$ satisfying (1.28) and such that

$$q^{1/2} M e^{-a m^\alpha} \le C M e^{-c n^\beta}, \tag{1.29}$$

since this will be the desired contradiction. Taking logarithms, (1.29) is equivalent to

$$\tfrac{1}{2} \ln q - \ln C \le a m^\alpha - c n^\beta. \tag{1.30}$$

Now, to show that there is an $m$ which satisfies both (1.30) and (1.28), we consider

$$m := \Big\lceil \frac{n^{1-\beta}}{2q} \Big\rceil.$$

Clearly $m \le \frac{n}{2q}$ and therefore $n - qm \ge n/2$. From the mean value theorem, we have for some $\zeta \in (n - qm, n)$

$$n^\beta - (n - qm)^\beta = \beta\, \zeta^{\beta - 1}\, qm \le qm\, \beta \Big( \frac{n}{2} \Big)^{\beta - 1} \le \beta\, \frac{n^{1-\beta}}{2} \Big( \frac{n}{2} \Big)^{\beta - 1} = 2^{-\beta} \beta \le 1.$$

Thus, (1.28) will be satisfied by the definition of $c$. Now let us check (1.30). We first remark that since $n > N_0$, we have $m \ge 4$ and therefore $m \ge \frac{n^{1-\beta}}{4q}$. From the definition of $c$, we thus have

$$a m^\alpha - c n^\beta \ge a \Big( \frac{n^{1-\beta}}{4q} \Big)^\alpha - c n^\beta = \big( a (4q)^{-\alpha} - c \big)\, n^\beta \ge 0.$$

Since by definition $C \ge q^{1/2}$, we have verified (1.30) and completed the proof. □

1.4 Guiding Examples

Now we have all ingredients for the Reduced Basis Method at hand, including a convergence analysis. Of course, the above framework is, intentionally, quite general, and we need to specify it for particular PPDEs. In this section, we start with two examples for which the theory presented above fits particularly well. Later, we shall also investigate examples where this is not the case. The first example is the most standard one, namely a stationary, symmetric and coercive PPDE.

Example 1.1 (The Thermal Block) Let $\Omega = \bigcup_{p=1}^{P} \Omega_p \subset \mathbb{R}^d$ with $\Omega_p \cap \Omega_{p'} = \emptyset$ for $p \ne p'$, and define

$$B_\mu u := -\sum_{p=1}^{P} \nabla \cdot (\mu_p\, \chi_p\, \nabla u), \qquad u|_{\partial\Omega} = 0,$$

where $\mu = (\mu_1, \ldots, \mu_P) \in \mathbb{R}^P$, $\mu_p \ge \mu_0 > 0$ for $p = 1, \ldots, P$, is the parameter vector and $\chi_p$ denotes the characteristic function of $\Omega_p$.

The second example is instationary, namely the parabolic heat equation. As we shall see below, using a space-time variational formulation allows us to apply the RBM (almost) in the same manner as for the thermal block.

Example 1.2 (The Parameterized Heat Equation) Let $\Omega \subset \mathbb{R}^d$ be some spatial domain and $I := (0, T)$ with $T > 0$ a time interval. Then, define

$$B_\mu u := \dot{u} + A_\mu u, \qquad u(0) = 0, \quad u|_{\partial\Omega} = 0,$$

where $A_\mu$ is an affine, symmetric and coercive operator, i.e., there exist $Q_a \in \mathbb{N}$, functions $\vartheta^a_q : \mathcal{P} \to \mathbb{R}$ and continuous coercive forms $a_q : V \times V \to \mathbb{R}$, $1 \le q \le Q_a$, such that $a_\mu : V \times V \to \mathbb{R}$ is defined by

$$a_\mu(\phi, \psi) \equiv a(\phi, \psi; \mu) = \sum_{q=1}^{Q_a} \vartheta^a_q(\mu)\, a_q(\phi, \psi) \tag{1.31}$$

for all $\mu \in \mathcal{P}$, $\phi, \psi \in V = H^1_0(\Omega)$, such that $a_\mu(\phi, \phi) \ge \alpha_\circ \|\phi\|^2_{H^1_0(\Omega)}$. The operator $A_\mu : L_2(I; V) \to L_2(I; V')$ is defined by

$$\langle A_\mu \phi, \psi \rangle_{L_2(I;V') \times L_2(I;V)} := \int_0^T a(\phi(t), \psi(t); \mu)\, dt.$$

The easiest example would be $A_\mu = -\mu \Delta$, $\mu \ge \mu_0 > 0$.

1.4.1 The Thermal Block

We start with Example 1.1, which can be treated by means of a standard Galerkin approach.

Variational Formulation The variational form of this problem is given by $U = V = H^1_0(\Omega)$ and

$$b(u, v; \mu) = \sum_{p=1}^{P} \mu_p \int_{\Omega_p} \nabla u(x) \cdot \nabla v(x)\, dx.$$

Obviously, this yields a symmetric problem, and $b(\cdot, \cdot; \mu)$ is even coercive thanks to the assumption that $\mu_p \ge \mu_0 > 0$ for all $p = 1, \ldots, P$, since

$$b(u, u; \mu) = \sum_{p=1}^{P} \mu_p \int_{\Omega_p} \nabla u(x) \cdot \nabla u(x)\, dx \ge \mu_0 \sum_{p=1}^{P} \int_{\Omega_p} |\nabla u(x)|^2\, dx = \mu_0\, |u|^2_{H^1(\Omega)},$$

which shows coercivity. The right-hand side reads, for a given $f_\mu \in H^{-1}(\Omega)$,

$$f(v; \mu) := \int_\Omega f_\mu(x)\, v(x)\, dx.$$

Then, the variational formulation amounts to finding $u_\mu \in U$ such that $b(u_\mu, v; \mu) = f(v; \mu)$ for all $v \in U$. The Lax-Milgram theorem ensures existence, uniqueness and stability of the variational problem for all parameter values $\mu$.

Stable "Truth" Discretization Since we have a symmetric and coercive problem at hand, we can simply use a conforming finite element discretization on some triangulation. Trial and test spaces coincide, and $U^\delta = V^\delta$ can e.g. be formed by piecewise linear, continuous finite elements vanishing at the boundary $\partial\Omega$. In that case, $\mathcal{N}$ is the number of interior nodes of the triangulation. The inf-sup constant in this case coincides with the coercivity constant, so that uniform stability is ensured with a lower bound of $\mu_0$. By some appropriate preconditioned conjugate gradient or multigrid solver, we can determine the "truth" approximation $u^\delta_\mu$ for any parameter in linear complexity, i.e., $O(\mathcal{N})$.
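A sketch of how the affine structure is exploited in practice follows (Python/SciPy; the per-subdomain stiffness matrices are assumed to come from any FE assembly, and all names are illustrative): the truth operator for a new $\mu$ is just a weighted sum of precomputed matrices.

```python
import scipy.sparse.linalg as spla

def solve_thermal_block(mu, A_blocks, f):
    """Truth solve for the thermal block; illustrative sketch.

    A_blocks : list of P sparse stiffness matrices A_p with
               (A_p)_{ij} = int_{Omega_p} grad(phi_j) . grad(phi_i) dx
    mu       : parameter vector (mu_1, ..., mu_P), mu_p >= mu_0 > 0
    f        : load vector
    """
    A_mu = sum(mu_p * A_p for mu_p, A_p in zip(mu, A_blocks))  # affine in mu
    # a direct sparse solve for brevity; CG/multigrid would realize O(N) cost
    return spla.spsolve(A_mu.tocsc(), f)
```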

1.4.2 The Heat Equation

The parabolic heat equation cannot be formulated by a symmetric coercive variational approach. The most standard method is probably a semi-discretization in space, followed by a time-marching scheme to solve the arising high-dimensional initial value problem. In that framework, model reduction can be done by the POD-Greedy method introduced in [45]. However, such a formulation does not fall into the setting formulated above. Hence, we follow a different path.

Variational Formulation As already said, there are several approaches to study well-posedness of the heat equation. Among them, we note the semi-variational approach and semigroup theory. However, these classical methods do not yield a variational formulation in the above setting. Hence, we follow [25, 74, 80, 81] to introduce a well-posed space-time variational formulation. This will lead us to Lebesgue–Bochner spaces for functions in time and space, to be introduced next.

Denote $V := H^1_0(\Omega)$ and recall that $V \hookrightarrow H := L_2(\Omega) \hookrightarrow V'$ with all embeddings being dense. Then, for any $X \in \{V, H, V'\}$, we define the Bochner spaces

$$L_2(I; X) := \Big\{ v : I \to X \,:\, \|v\|^2_{L_2(I;X)} := \int_I \|v(t)\|^2_X\, dt < \infty \Big\},$$

$$H^1(I; X) := \{ v \in L_2(I; X) : v_t \equiv \dot{v} \in L_2(I; X) \},$$

where the derivative w.r.t. time is to be understood in the weak (variational) sense. Then, we define the Sobolev space with left Dirichlet (i.e., initial) condition as

$$H^1_{(0)}(I; V') := \{ v \in H^1(I; V') : v(0) = 0 \}.$$

It is well-known that $H^1(I; V') \hookrightarrow C(\bar{I}; H)$, so that $v(0) \in H$ is well-defined. Next, denote the dual pairing of $V'$ and $V$ by $\langle \cdot, \cdot \rangle_{V' \times V}$; then we obtain the space-time dual pairing for $\mathcal{V} := L_2(I; V)$, $\mathcal{V}' = L_2(I; V')$ by

$$\langle w, v \rangle_{\mathcal{V}' \times \mathcal{V}} = \int_I \langle w(t), v(t) \rangle_{V' \times V}\, dt.$$

With these preparations at hand, we can now state the space-time variational formulation. Define

$$\mathcal{U} := H^1_{(0)}(I; V') \cap L_2(I; V), \qquad \mathcal{V} := L_2(I; V) \tag{1.32}$$

(i.e., $f_\mu \in L_2(I; V') = \mathcal{V}'$) and choose the norms

$$\|u\|^2_{\mathcal{U}} := \|\dot{u}\|^2_{L_2(I;V')} + \|u\|^2_{L_2(I;V)} + \|u(T)\|^2_H, \quad u \in \mathcal{U}, \qquad \|v\|_{\mathcal{V}} \equiv \|v\|_{L_2(I;V)}, \quad v \in \mathcal{V}.$$

Then, define $A_\mu : \mathcal{V} \to \mathcal{V}'$ by $\langle A_\mu u, v \rangle_{\mathcal{V}' \times \mathcal{V}} := \int_I a_\mu(u(t), v(t))\, dt$ and the parametric bilinear and linear forms by

$$b(u, v; \mu) := \int_I \langle u_t(t), v(t) \rangle_{V' \times V}\, dt + \int_I a_\mu(u(t), v(t))\, dt = \langle u_t + A_\mu u, v \rangle_{\mathcal{V}' \times \mathcal{V}}, \tag{1.33a}$$

$$f(v; \mu) := \int_I \langle f_\mu(t), v(t) \rangle_{V' \times V}\, dt = \langle f_\mu, v \rangle_{\mathcal{V}' \times \mathcal{V}}. \tag{1.33b}$$

Now we can investigate the well-posedness of the variational problem. To this end, we shall verify the conditions of the Banach–Nečas theorem (Theorem 1.1).

Proposition 1.1 Assume that the bilinear form $a_\mu$ is coercive and continuous such that $\alpha_\mu \le A_\mu \le \gamma_\mu$ with $0 < \alpha_\mu \le 1$. Then,

$$\inf_{u \in \mathcal{U}} \sup_{v \in \mathcal{V}} \frac{b(u, v; \mu)}{\|u\|_{\mathcal{U}}\, \|v\|_{\mathcal{V}}} \ge \alpha_\mu.$$


Proof Let $0 \ne u \in \mathcal{U}$. The supremizer $s_u(\mu) \in \mathcal{V}$ is determined by $(s_u(\mu), v)_{\mathcal{V}} = b(u, v; \mu) = \langle u_t + A_\mu u, v \rangle_{\mathcal{V}' \times \mathcal{V}}$. Hence,

$$\Big( \sup_{v \in \mathcal{V}} \frac{b(u, v; \mu)}{\|v\|_{\mathcal{V}}} \Big)^2 = \|s_u(\mu)\|^2_{\mathcal{V}} = \|u_t + A_\mu u\|^2_{\mathcal{V}'} = \|u_t\|^2_{\mathcal{V}'} + \|A_\mu u\|^2_{\mathcal{V}'} + 2\, (u_t, A_\mu u)_{\mathcal{V}'}.$$

Next, $\|A_\mu u\|_{\mathcal{V}'} \ge \alpha_\mu \|u\|_{\mathcal{V}}$ and

$$2\, (u_t, A_\mu u)_{\mathcal{V}'} = 2 \int_0^T (u_t(t), A_\mu u(t))_{V'}\, dt = 2 \int_0^T (A^{1/2}_\mu u_t(t), A^{1/2}_\mu u(t))_H\, dt = \int_0^T \frac{d}{dt}\, \|A^{1/2}_\mu u(t)\|^2_H\, dt \ge \alpha_\mu \int_0^T \frac{d}{dt}\, \|u(t)\|^2_H\, dt = \alpha_\mu\, \|u(T)\|^2_H,$$

so that $\|s_u(\mu)\|^2_{\mathcal{V}} \ge \|u_t\|^2_{\mathcal{V}'} + \alpha^2_\mu \|u\|^2_{\mathcal{V}} + \alpha_\mu \|u(T)\|^2_H \ge \alpha^2_\mu \|u\|^2_{\mathcal{U}}$, which completes the proof. □

Corollary 1.3 For $A_\mu \equiv A = -\Delta$ (i.e., for the heat equation) and $\|\cdot\|_V \equiv |\cdot|_{H^1(\Omega)}$ (the energy norm), the inf-sup constant is unity, i.e., $\beta_\mu \equiv \beta = 1$.

This latter result shows that the space-time variational method yields an optimally stable problem for the heat equation.

Remark 1.6 Using similar arguments, one can also bound the continuity constant, namely $b_\mu(u, v) \le \gamma_\mu \|u\|_{\mathcal{U}}\, \|v\|_{\mathcal{V}}$, which is also unity ($\gamma_\mu \equiv \gamma = 1$) in the setting of the latter corollary.

Theorem 1.6 Given $f_\mu \in L_2(I; V')$, there exists exactly one solution $u_\mu \in H^1(I; V') \cap L_2(I; V)$ of (1.5) with the choices (1.33).

Proof The proof can be found e.g. in [3, §8.6], see also [74] or [25]. In addition to Proposition 1.1, one needs to verify the conditions of Theorem 1.1. □

Stable "Truth" Discretization Recall the trial and test spaces defined in (1.32). We are now going to describe a uniformly stable Petrov-Galerkin discretization by spaces $\mathcal{U}^\delta$ and $\mathcal{V}^\delta$. As above, we restrict ourselves to the case $H = L_2(\Omega)$, $V = H^1_0(\Omega)$. Let $\mathcal{U}^\delta := S_{\Delta t} \otimes V_h$ and $\mathcal{V}^\delta := Q_{\Delta t} \otimes V_h$, where $S_{\Delta t}$ and $V_h$ are finite element spaces spanned by piecewise linear functions and $Q_{\Delta t}$ is a finite element space spanned by piecewise constant functions, with respect to triangulations $\mathcal{T}_h$ in space and

$$\mathcal{T}_{\Delta t} \equiv \{ t^{k-1} \equiv (k-1)\Delta t < t \le k\, \Delta t \equiv t^k,\ 1 \le k \le K \}$$

.

in time, .t := T /K, as appropriate. We have .St = Span{σ 1 , . . . , σ K }, where k is the (interpolating) hat function with respect to the nodes .t k−1 , .t k und .t k+1 .σ (truncated to .[0, T ] when .k = K), and also .Qt = Span{τ 1 , . . . , τ K }, where

34

K. Urban

τ k = χI k is the characteristic function on .I k := (t k−1 , t k ). Finally, take .Vh = space Span{φ1 , . . . , φnh } to be the nodal basis with respect to .Th .  K nh  n δ k k h δ Given functions .uδ = K k=1 =1 i=1 ui σ ⊗ φi ∈ U and .v = j =1 vj τ k ⊗ φj (with coefficients .uki and .vj , respectively), we obtain9

.



 δ  u˙ (t), v δ (t) V  ×V + aμ (uδ (t), v δ (t)) dt

bμ (u , v ) = δ

.

δ

I

=

K 

nh 

 j uik v (σ˙ k , τ )L2 (I ) (φi , φj )H + (σ k , τ )L2 (I ) aμ (φi , φj )

k, =1 i,j =1

= (uδ ) B δμ v δ , where10 space

B δμ := N time t ⊗ M h

.

space

+ M time t ⊗ Ah;μ

(1.34)

space

and .M h := [(φi , φj )L2 () ]i,j =1,...,nh , .M time := [(σ k , τ )L2 (I ) ]k, =1,...,K are t the respective mass matrices with respect to time and space, and the remaining space time := [(σ .N ˙ k , τ )L2 (I ) ]k, =1,...,K and .Ah;μ := [aμ (φi , φj )]i,j =1,...,nh are the t stiffness matrices. For our special choice of spaces we obtain, denoting by .δk, the discrete Kronecker delta, (σ˙ k , τ )L2 (I ) = δk, − δk+1, ,

.

bμ (uδ , τ ⊗ φj ) =

nh  

(σ k , τ )L2 (I ) =

(u i − u −1 )(φi , φj )H + i

i=1

t (δk, + δk+1, ), 2

 t ) aμ (φi , φj ) (ui + u −1 i 2

 space 1  space = t M h (u − u −1 ) + Ah;μ u −1/2 , t −1/2

where .u := (vi )i=1,...,nh , .ui := 12 (u i +u −1 ), and correspondingly for .u −1/2 . i Now we apply the trapezoidal rule to approximate the integral with respect to the time variable on the right-hand side, 

$$f_\mu(\tau^\ell \otimes \phi_j) = \int_0^T \langle f_\mu(t), \tau^\ell \otimes \phi_j \rangle_{V' \times V}\, dt \approx \frac{\Delta t}{2}\, \langle f_\mu(t^{\ell-1}) + f_\mu(t^\ell), \phi_j \rangle_{V' \times V} = \frac{\Delta t}{2}\, \big( f^{\ell-1}_\mu + f^\ell_\mu \big)_j =: \Delta t\, \big( f^{\ell - 1/2}_\mu \big)_j,$$

$^9$ $(\tau \otimes \phi)(t, x) := \tau(t)\, \phi(x)$.
$^{10}$ $(M \otimes A)_{(k,i),(\ell,j)} := M_{k,\ell}\, A_{i,j}$ denotes the Kronecker product of matrices.


where $f^\ell_\mu := (\langle f_\mu(t^\ell), \phi_j \rangle_{V' \times V})_{j=1,\ldots,n_h}$. Now we can write the (discrete) Petrov-Galerkin problem (1.8) as

$$\frac{1}{\Delta t}\, M^{\mathrm{space}}_h (u^\ell_\mu - u^{\ell-1}_\mu) + A^{\mathrm{space}}_h(\mu)\, u^{\ell - 1/2}_\mu = f^{\ell - 1/2}_\mu, \qquad u^0_\mu := 0, \tag{1.35}$$

which is exactly the Crank–Nicolson (CN) method. We can thus use the space-time variational formulation to derive error estimates for the Crank–Nicolson method.

LBB Stability The inf-sup stability of the above discretization has been investigated in [81], see also [1, 2] for several results concerning stable discretizations for space-time formulations of parabolic problems. To this end, it pays off to consider a slightly modified norm: given $u \in \mathcal{U}$, we set $\bar{u}^k := (\Delta t)^{-1} \int_{I^k} u(t)\, dt \in V$ and $\bar{u} := \sum_{k=1}^{K} \chi_{I^k} \otimes \bar{u}^k \in L_2(I; V)$, as well as

$$|||u|||^2_{\mathcal{U},\delta} := \|\dot{u}\|^2_{L_2(I;V')} + \|\bar{u}\|^2_{L_2(I;V)} + \|u(T)\|^2_H.$$

This averaging in time is in fact the "natural" norm for the analysis of the Crank–Nicolson method. For the inf-sup and continuity constants

$$\beta^\circ_\mu := \inf_{w^\delta \in \mathcal{U}^\delta} \sup_{v^\delta \in \mathcal{V}^\delta} \frac{b_\mu(w^\delta, v^\delta)}{|||w^\delta|||_{\mathcal{U},\delta}\, \|v^\delta\|_{\mathcal{V}}}, \qquad \gamma^\circ_\mu := \sup_{w^\delta \in \mathcal{U}^\delta} \sup_{v^\delta \in \mathcal{V}^\delta} \frac{b_\mu(w^\delta, v^\delta)}{|||w^\delta|||_{\mathcal{U},\delta}\, \|v^\delta\|_{\mathcal{V}}},$$

we have $\beta^\circ_\mu = \gamma^\circ_\mu = 1$, as long as $a_\mu(\cdot, \cdot)$ is assumed to be symmetric and the parameter-dependent energy norm is used in $V$, i.e., $\|\phi\|^2_V := a_\mu(\phi, \phi)$, $\phi \in V$ [81, Prop. 2.9]. There are several choices for LBB stable discretizations, see again [1, 2]. Of course, their dependency on the parameter $\mu$ depends on $a_\mu$. In a similar fashion as in Proposition 1.1 above, the inf-sup constant is bounded from below by a lower bound of the coercivity constant of $a_\mu$. For practical realizations, one may resort to the Successive Constraint Method [54] if concrete values for a lower bound of $\beta^\circ_\mu$ are needed. The above derivation shows that one does not need to solve a large-dimensional linear system (with the matrix $B^\delta_\mu$ in (1.34)) to obtain the "truth" solution; we can simply use the Crank–Nicolson form, which requires $O(KN)$ operations, where $K$ is the number of time steps.

36

K. Urban

supremizers as in (1.16). Next, we need an error estimator as in (1.17). We will tackle all these issues by considering ultraweak variational formulations, which turn out to be appropriate from a physical point of view. However, as we shall see, the arising framework does not satisfy the conditions of Theorem 1.3, which ensured that the Kolmogorov N-width decays exponentially fast. Hence, we will investigate this decay for problems going beyond the scope of elliptic and parabolic ones.

1.5.1 Some More Guiding Examples We start by describing the announced examples and collecting statements on the respective Kolmogorov N-width.

Example 1.3 (Linear Transport) Let . ⊂ Rd some spatial domain, .I := (0, T ) a time interval with .T > 0. Then, we consider the parameterized linear transport problem for Bμ u := u˙ + bμ · ∇u + cμ ,

.

u|− = gμ ,

where .bμ : I ×  → Rd , .cμ : I ×  → R and .gμ : − → R are time- and space-dependent coefficients, where the inflow boundary − := {(t, x) ∈ I × ∂ : bμ · n(x) < 0}

.

is assumed to be parameter-independent and .n denotes the outward normal of ∂.

.

Example 1.4 (The Wave Equation) Next, we consider the parameterized wave equation on .I × , where . ⊂ Rd is again some spatial domain and .I := (0, T ) with .T > 0 a time interval, which reads Bμ u := u¨ + Aμ u,

.

u(0) = u0 , u(0) ˙ = u1 ,

u|∂ = 0,

where .Aμ : H01 () → H −1 () is an affine symmetric and coercive operator as in the parabolic problem, see (1.31). If .Aμ = −μ2 , the parameter .μ indicates the wave speed.

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

37

Example 1.5 (Schrödinger Equation) We consider the inhomogeneous timedependent linear Schrödinger equation on a time interval .I := (0, T ), .T > 0, and a bounded spatial domain .  R3N (.N ≥ 1 being the number of particles) with a smooth boundary . := ∂, ⎧ ∂ 1 ⎪ i ψ(t, x) − x ψ(t, x) + μ (t, x) ψ(t, x) = g(t, x), ⎪ ⎪ ⎨ ∂t 2 .

⎪ ⎪ ⎪ ⎩

(t, x) ∈ I × ,

ψ(t, x) = 0,

(t, x) ∈ I × ,

ψ(0, x) = ψ0 ,

x ∈ , (1.36)

where .g : I ×  → C is an inhomogeneous right-hand side, .ψ0 some initial state and .μ : I × R3N → R a real-valued potential, which is affine, i.e., 

μ =

Q 

.

ϑq (μ) q .

q=1

We might also consider several other parameters here. For example, one could replace the Laplacian .x by some parameterized affine elliptic operator .Aμ or could also consider the initial value .ψ0 as a parameter. Our subsequent framework will cover all this, but for simplicity of exposition we restrict ourselves to the case of a parametric potential. We stress the fact that the presented theory can also be extended to the full space case, i.e., . = R3N , with some technical modifications though.

1.5.2 Ultraweak Formulations In all three cases described above there is a lack of a well-posed variational formulation in the sense of Theorem 1.1. In this section, we will derive a framework allowing for well-posed variational formulations for the above three examples. We start by considering the classical counterpart of the PPDE operator .Bμ denoted by .Bμ◦ and assume that it is a differential (or integral) operator on some domain .D ⊂ Rd , including stationary problems on .D =  and instationary ones on .D = (0, T ) × . The classical operator .Bμ◦ is defined pointwise on its classical domain .

dom(Bμ◦ ) = {w : D → R : Bμ◦ w ∈ C(D)}.

38

K. Urban

Its formal classical adjoint .Bμ◦,∗ of .Bμ◦ is defined by (u, Bμ◦,∗ v)L2 (D) = (Bμ◦ u, v)L2 (D)

.

u, v ∈ C0∞ (D).

(1.37)

The subsequent general construction goes back to [22], there for the parameterized transport equation. We shall describe the framework here in general terms that will allow us to apply it also to wave- and Schrödinger-type problems. We shall always pose the following assumption which then needs to be verified for the specific problem at hand. Assumption 1.1 Assume that: (B1) .Bμ◦,∗ is injective on .dom(Bμ◦,∗ ) which is dense in .L2 (D). (B2) The range .R(Bμ◦,∗ ) := Bμ◦,∗ [dom(Bμ◦,∗ )] is dense in .L2 (D). If Assumption 1.1 holds, we choose .U := L2 (D) and define |||v|||μ := Bμ◦,∗ v L2 (D) ,

.

which—under Assumption 1.1—is a norm on .dom(Bμ◦,∗ ) if (B1) holds. Then, we define the parameter-dependent test space by the completion Vμ := clos|||·|||μ {dom(Bμ◦,∗ )},

(1.38)

.

which is a Hilbert space with inner product .(w, v)μ := (Bμ∗ w, Bμ∗ v)L2 (D) and induced norm, where .Bμ∗ : Vμ → L2 (D) denotes the continuous extension of ◦,∗ ◦,∗ .Bμ from .dom(Bμ ) to .Vμ . Theorem 1.7 Let Assumption 1.1 hold and let .fμ ∈ L2 (D) be given. Then, the problem b(uμ , v; μ) := (uμ , Bμ∗ v)L2 (D) = (fμ , v)L2 (D)

.

∀v ∈ Vμ

(1.39)

admits a unique solution .uμ ∈ U = L2 (D) and it holds that .

inf sup

u∈U v∈Vμ

b(u, v; μ) b(u, v; μ) = sup sup = 1,

u U |||v|||μ u∈U v∈Vμ u U |||v|||μ

(1.40)

i.e., inf-sup and continuity constants are unity, or, equivalently, . Bμ L(L2 (D),Vμ ) =

Bμ∗ L(Vμ ,L2 (D)) = Bμ−1 L(Vμ ,L2 (D)) = Bμ−∗ L(L2 (D),Vμ ) = 1, where .Bμ−∗ := (Bμ∗ )−1 = (Bμ−1 )∗ : L2 (D) → Vμ .

Proof The bilinear form is bounded since by the Cauchy-Schwarz inequality we have .b(u, v; μ) = (uμ , Bμ∗ v)L2 (D) ≤ uμ L2 (D) Bμ∗ v L2 (D) =

uμ L2 (D) |||v|||μ , i.e., the continuity constant is unity. Next, given .0 = v ∈ Vμ , set .wμ := Bμ∗ v, then .b(wμ , v; μ) = |||v|||μ = 0. Finally, let .0 = u ∈ U. Then,

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

39

there exists a .v ∈ Vμ such that .u = Bμ∗ v, which shows (1.40). Then, Theorem 1.1 ensures well-posedness of (1.39), which completes the proof.   The above approach is called ultraweak since all involved partial derivatives are put on the test functions using integration by parts. This means that the solution is determined in .L2 , no additional regularity. The test spaces obviously play an important role. They are parameter-dependent, which was also the case above by forming them using supremizers. However, so far, the norm under which we form the topology of the test space was independent of the parameter. This is not the case here. In fact, we cannot expect that the norms .|||·|||μ are equivalent for different parameter values .μ, which means that also the spaces .Vμ differ for different values of .μ. Hence, we consider the parameter-independent test space ¯ := V



.

Vμ ,

μ∈P

|||v|||V¯ := sup |||v|||μ ,

(1.41)

μ∈P

¯ is dense in each .Vμ , which can be verified for the cases and assume that .V considered here, see e.g. [17] for the transport equation.

1.5.3 Stable Ultraweak Petrov-Galerkin Discretization The next step is to construct a stable Petrov-Galerkin discretization such that the LBB condition in Definition 1.3 holds. We did this by determining supremizers in (1.16). For the problems at hand here (transport, wave, Schrödinger), stabilization is particularly needed since the considered problems are inherently instable (which is the reason for introducing the ultraweak form). For the transport problem, a socalled double greedy method has been introduced in [24]. The idea is basically to introduce a second greedy loop in order to determine a test space in such a way that the arising discrete problem is LBB stable. Here, we are going to describe a different path, which was introduced in [17] for linear transport equations and which also applies to wave and Schrödinger¯ define type problems. The idea is to first choose a “truth” test space .Vδ ⊂ V, the parameter-dependent topology on it and then define the parameter-dependent trial space, namely the completion (see (1.38)) Vμδ := clos|||·|||μ (Vδ ) ⊂ Vμ ,

.

Uμδ := Bμ∗ (Vδμ ).

(1.42)

Then, the problem of finding some .uδμ ∈ Uδμ such that .bμ (uδμ , v δ ) = fμ (v δ ) for all δ δ .v ∈ Vμ can be equivalently be reformulated as find wμδ ∈ Vδμ :

.

a μ (wμδ , v δ ) := (Bμ∗ wμδ , Bμ∗ v δ )L2 (D) = fμ (v δ )

∀v δ ∈ Vδμ , (1.43)

40

K. Urban

which obviously is a symmetric and coercive problem (the normal equations) or a least-squares problem. Thus, problem (1.43) is well-posed and we identify the solution of the discretized “truth” problem as .uδμ := Bμ∗ wμδ . This reformulation will also be used for the implementation of the framework. From (1.43) we see that for the setup of the linear system for .wμδ the precise knowledge of the basis of .Uδ = B ∗ Vδ is not needed—only for the point-wise evaluation of .uδμ when e.g. visualizing the solution. This setting in (1.42) has -at least- two major consequences, namely 1. the test space .Vδμ is parameter-independent as a set with parameter-dependent topology and 2. the trial space .Uδμ is the opposite, i.e., parameter-dependent as a set with parameter-independent topology. However, there is a clear benefit, which is easily proven. Proposition 1.2 Let Assumption 1.1 hold. For trial and test spaces according to (1.42) it holds that .

inf sup

u∈Uδμ

v∈Vδμ

b(u, v; μ) b(u, v; μ) = sup sup = 1.

u U |||v|||μ u∈Uδ v∈Vδ u U |||v|||μ μ

(1.44)

μ

Recalling Theorem 1.2, this means that the Petrov-Galerkin approximation coincides with the best approximation. The convergence order of it can be controlled by the choice of the test space. Moreover, by Lemma 1.2 we obtain that the error of the Petrov-Galerkin approximation coincides with the dual norm of the residual. Hence, if we can compute the dual norm . rμδ V , we obtain a representation of the error. Remark 1.7 The methodology of choosing the discrete test space first and then defining the trial space as the image of the test space through the adjoint operator, is equivalent to the FOSLL.∗ method or its generalization, the DPG.∗ method [27, 57]. We also mention close connections to [26, 32, 34]. Concerning the Schrödinger equation, we mention [28]. However, the derivation presented here differs from those references.

1.5.4 The Ultraweak Reduced Model Next, we construct a reduced test space .V N ⊂ Vδ with dimension .N ∈ N constructed for instance via a greedy algorithm (see above). Then, for each .μ ∈ P we introduce the reduced discretization with test space .VμN := clos|||·|||μ (V N ) ⊂ Vδμ and trial space .UμN := Bμ∗ (VμN ) ⊂ Uδμ . The reduced problem then reads N N N ∗ N N N find uN μ ∈ Uμ : bμ (uμ , v ) = (uμ , Bμ v )L2 (D) = fμ (v )

.

∀v N ∈ VμN . (1.45)

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

41

As in the high-dimensional case, these pairs of spaces yield optimal inf-sup constants βμN :=

.

inf

sup

wN ∈UμN v N ∈V N μ

bμ (w N , v N ) =1

wN L2 (D) |||v N |||μ

for all μ ∈ P.

Hence, regardless of the choice of the “initial” reduced test space .V N , we get a perfectly stable numerical scheme without the need to stabilize. Note that this is a major difference to [24], where, due to a different strategy in finding discrete spaces, a stabilization procedure is necessary. Using a least-squares-type reformulation, we can first compute .wμN ∈ VμN such that (Bμ∗ wμN , Bμ∗ v N )L2 (D) = fμ (v N )

.

∀v N ∈ VμN ,

(1.46)

∗ N and then set .uN μ := Bμ wμ as the solution of (1.45).

Offline-/Online-Decomposition By employing the assumed affine parameter dependence of .Bμ∗ and .fμ as in Definition 1.4, the computation of .uN μ can be decomposed efficiently in an offline and online stage: Let .{viN : i = 1, . . . , N } be a basis of the parameter-independent test space .V N . In the offline stage, we precompute and store the following parameter-independent quantities: bq,i := Bq∗ viN ,

for q = 1, . . . , Qb , i = 1, . . . , N,

.

Aq1 ,q2 ;i,j := (bq1 ,i , bq2 ,j )L2 (D) , fq,i := f q (viN ),

for q1 , q2 = 1, . . . , Qb , i, j = 1, . . . , N, for q = 1, . . . , Qf , i = 1, . . . , N.

In the online stage, given a new parameter .μ ∈ P, we assemble for all .i, j = 1, . . . , N b

N .(Aμ )i,j

:=

(Bμ∗ viN , Bμ∗ vjN )L2 (D)

=

b

Q  Q 

q

q

ϑb 1 (μ) ϑb 2 (μ) Aq1 ,q2 ;i,j ,

q1 =1 q2 =1 Qf

N (f N μ )i := fμ (vi ) =



q

ϑf (μ) fq,i ,

q=1

 q q N with .ϑb i , .ϑf given in Definition 1.4. Next, we compute .wμN = N i=1 wi (μ) vi ∈ N N N V N as in (1.46) by solving the linear system .AN μ w μ = f μ of size N , where .w μ := N (wi (μ))i=1,...,N ∈ R . The reduced basis approximation is then determined as ∗ N uN μ := Bμ wμ =

N 

.

i=1

b

wi (μ) Bμ∗ viN =

Q N   i=1 q=1

q

wi (μ) ϑb (μ) bq,i .

42

K. Urban

Error Analysis for the Reduced Basis Approximation In the online stage, for a given (new) parameter .μ ∈ P we are interested in efficiently estimating the model error . uδμ − uN μ L2 (D) to assess the quality of the reduced solution. As already mentioned above, due to the choice of the reduced spaces, the reduced inf-sup and continuity constants are unity. This means that the error, the residual, and the error of best approximation coincide also in the reduced setting. To be more precise, defining for some .v ∈ L2 (D) the discrete residual .rμδ (v) ∈ (Vδμ ) as rμδ (v), w δ (Vδ ) ×Vδ := f (wδ ) − (v, Bμ∗ w δ )L2 (D) ,

.

μ

w δ ∈ Vδμ ,

μ

we have δ N

uδμ − uN μ L2 (D) = rμ (uμ ) (Vδ ) =

.

μ

inf

wN ∈UμN

uδμ − wN L2 (D) .

In principle, .rμδ (v) ∈ (Vδμ ) can be computed. However, due to the special choice of the parameter-dependent norm of .Vδμ , i.e., . w Vδ = Bμ∗ w L2 (D) , the μ

computation of the dual norm involves applying the inverse operator .(Bμ∗ )−1 and is thus as computationally expensive as solving the discrete problem. Therefore, the computation of . rμδ (uN μ ) (Vδμ ) is not offline-online decomposable, so that the residual is not online-efficient. As an alternative for the error estimation mainly in the online stage, we consider an online-efficient, but often non-rigorous hierarchical error estimator similar to the one proposed in [7]. Let .V N ⊂ V M ⊂ Vδ be nested reduced spaces with N ∗ N dimensions N and M, .N < M and denote for some .μ ∈ P by .uN μ ∈ Uμ := Bμ V , M M ∗ M the corresponding solutions of (1.45). Then, we can rewrite .uμ ∈ Uμ := Bμ V the model error of .uN μ as δ N M M δ

uN μ − uμ L2 (D) ≤ uμ − uμ L2 (D) + uμ − uμ L2 (D) .

.

δ Assuming that .V M is large enough such that . uM μ − uμ L2 (D) < ε  1, we can approximate the model error of .uN by δ N M N M

uN μ − uμ L2 (D) ≤ uμ − uμ L2 (D) + ε ≈ uμ − uμ L2 (D) ,

.

(1.47)

which can be computed efficiently also in the online stage. In practice, .V N and .V M can be generated by the strong greedy algorithm with different tolerances .εN and .εM  εN . Of course, this approximation to the model error is in general not reliable, since it depends on the quality of .V M . Reliable and rigorous variants of such an error estimator can be derived based on an appropriate saturation assumption, see [48]. Numerical investigations (for the linear transport problem) of the quality of the error estimator have been presented in [17].

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

43

1.5.5 Guiding Examples Revisited We come back to the examples presented in Sect. 1.5.1 at the beginning of this section. In particular, we need to check the validity of Assumption 1.1 in these cases in order to apply the above introduced general approach.

1.5.5.1

The Linear Transport Problem

We start by the linear transport problem in Example 1.3. In [22, Rem. 2.2] it was shown that Assumption 1.1 holds true provided that (i) .0 = bμ ∈ C 1 (; Rd ) and (ii) there exists a .κ > 0 such that cμ −

.

1 2

∇ · bμ ≥ κ > 0

in .

This result can easily be generalized to the time-space case. In fact, we can rewrite the operator in Example 1.3 as follows Bμ u = u˙ + bμ · ∇u + cμ = (1, bμ ) (u, ˙ ∇u) + cμ =:  bμ · ∇(t,x) u + cμ .

.

Then, the above condition reads cμ −

.

1 2

∇ · bμ = cμ −

1 2

∇ · bμ ≥ κ > 0

in I × ,

which means that the time-dependent case boils down to the same conditions for ensuring Assumption 1.1 as the stationary one. Further results in that direction can be found in [16, 17, 37].

1.5.5.2

The Wave Equation

At a first glance, one might think that an ultraweak formulation is not the best idea to get a stable variational formulation for the wave equation since the original operator is symmetric, but the ultraweak one is not. In fact, one might think that multiplying with a test function .v ∈ H01 (I × ) and perform integration by parts once for time and space variables might be a better idea. This would yield a symmetric bilinear form (for the case .Bμ ≡ B = u¨ − u) .

− (u, ˙ v) ˙ L2 (I ×) + (∇u, ∇v)L2 (I ×) .

However, we were not able to prove inf-sup stability and have even gained numerical evidence that this bilinear form is in fact not inf-sup stable.

44

K. Urban

In order to detail the ultraweak form, let us assume that .Aμ is a second-order elliptic operator. Then, the domain of the operator .Bμ◦ in the classical sense reads .

    1 ¯ ¯ , dom(Bμ◦ ) = C 2 (I ) ∩ C{0} (I ) × C 2 () ∩ C0 ()

¯ := {φ ∈ C() ¯ : φ|∂ = 0} models the homogeneous Dirichlet where .C0 () conditions, and 1 (I¯) := {φ ∈ C 1 (I¯) : φ(t) = φ(t) ˙ = 0} C{t}

.

for t ∈ [0, T ] = I¯.

¯ Since The range .R(Bμ◦ ) in the classical sense then reads .R(Bμ◦ ) = C(I¯ × ). (Bμ◦ u, v)L2 (I ×) = (u, Bμ◦ v)L2 (I ×)

.

for all u, v ∈ C0∞ (I × ),

the operator .Bμ◦ is self-adjoint—but with homogeneous terminal conditions .u(T ) = ¯ and u(T ˙ ) = 0 instead of initial conditions. This means that .R(Bμ◦,∗ ) = C(I¯ × ) .

   2  1 ¯ ¯ dom(Bμ◦,∗ ) = C 2 (I ) ∩ C{T } (I ) × C () ∩ C0 () .

Then, Assumption 1.1 has recently been verified in [50]. The corresponding proof is somewhat technical, since it relies on spectral properties of the wave operator and results for classical solutions of the wave equation. Hence, we do not report the details here and refer to [50].

1.5.5.3

The Schrödinger Equation

Also for the Schrödinger equation, there are several possible variational forms. We were only able to prove inf-sup stability for the ultraweak form. Hence, as above, we ¯ → C and perform multiply (1.36) with sufficiently smooth test functions .v : I¯ ×  integration by parts w.r.t. to both space and time, i.e., with .H := L2 (I × ; C)   (g, v)H =

g(t, x) v(t, x) dx dt

.

I



 

˙ x) v(t, x) dx dt ψ(t,

=i I

+



  I



 − 12 x ψ(t, x) + μ (t, x) ψ(t, x) v(t, x) dx dt

= i(ψ(T ), v(T ))H − i(ψ(0), v(0))H + (ψ, (i∂t − 12 x + μ )v)H = −i(ψ0 , v(0))H + (ψ, (i∂t − 12 x + μ )v)H

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

45

if .v(T ) = 0. Hence, we arrive at the ultraweak variational formulation of (1.36) with bμ (u, v) := (u, Bμ∗ v)H = (u, i ∂t v − 12 x v + μ v)H , .

.

f (v) := (g, v)H + i(ψ0 , v(0))H ,

(1.48a) (1.48b)

i.e., the right-hand side is independent of the parameter here. However, as already indicated in the formulation of this example, we might also view .ψ0 as a parameter and we could also replace the Laplacian .x by some parameterized affine elliptic operator .Aμ . We want to show that the choices (1.48) yield a well-posed problem of the form (1.5) if we define .U and .V according to the above ultraweak framework. In order to do so, we start by interpreting (1.36) for .ψ0 = 0 in the classical sense, i.e., Bμ◦ ψ(t, x) := i

.

∂ 1 ψ(t, x) − x ψ(t, x) + μ (t, x) ψ(t, x), ∂t 2

(t, x) ∈ T := I × , as well as .g ◦ (t, x) := g(t, x). Homogeneous initial and boundary conditions are associated to .Bμ◦ as essential boundary conditions. Then, (1.36) reads .Bμ◦ ψ = g ◦ in the space .C(T ; C), .T = I × . We denote by11 D(Bμ◦ ) := {ψ : T → C :Bμ◦ ψ ∈ C(T ; C); ψ(t, x) = 0 for (t, x) ∈ I × ;

.

ψ(0, x) = 0 for x ∈ } 1,2 ¯ C)] (T ; C) = [C 1 (I ; C) ∩ C{0} (I¯; C)] ⊗ [C 2 (; C) ∩ C0 (; =: C{0}, (1.49)

the classical domain of .Bμ◦ incorporating homogeneous initial and boundary condi¯ C). tions, where we assume that the parametric potential satisfies .μ ∈ C(I¯ × ; In (1.49) the superscripts denote the regularity in time and space, respectively, whereas the index .{0} refers to homogeneous temporal conditions for .t = 0 and . for homogeneous Dirichlet conditions on .. We will use a similar notation below also for .t = T . As in the case of the wave equation, we have .Bμ◦,∗ = Bμ◦ following the above lines. Next, we need to determine domain and range of .Bμ◦,∗ . To this end, we first note that due to the integration by parts w.r.t. time as above, the initial conditions in .B◦ are transformed to terminal conditions for .Bμ◦,∗ . In view of (1.49), we thus

11 .C (I¯; C) {0}

¯ C) := {φ ∈ C(; ¯ C) : φ(x) = 0, x ∈ }. := {θ ∈ C(I¯; C) : θ(0) = 0} and .C0 (;

46

K. Urban

obtain 1,2 ◦,∗ 1 2 ¯ ¯ D(B . μ ) = C{T }, (T ; C) = [C (I ; C) ∩ C{T } (I ; C)] ⊗ [C (; C) ∩ C0 (; C)]. (1.50)

The range .R(Bμ◦,∗ ) in the classical sense then reads .R(Bμ◦,∗ ) = C(T ; C), so that (B2) in Assumption 1.1 holds. Proposition 1.3 The formal adjoint .Bμ◦,∗ is injective on .D(Bμ◦,∗ ) given by (1.50). 1,2 ◦,∗ Proof Let .ψ ∈ C{T }, (T ) solve .Bμ ψ = 0. Then, for any .φ ∈ V and for .s ∈ I we have .i (∂t ψ(s), φ)L2 (;C) − 12 (∇x ψ(s), ∇x φ)L2 (;C) − μ (s)ψ(s), φ L (;C) = 0. 2 Then, we choose .ψ(s) as test function and obtain

i (∂s ψ(s), ψ(s))L2 (;C) −

(∇x ψ(s), ∇x ψ(s))L2 (;C)  − μ (s)ψ(s), ψ(s) L (;C) = 0.

.

1 2



2

(1.51)

  We note that . 12 (∇x ψ(s), ∇x ψ(s))L2 (;C) − μ (s)ψ(s), ψ(s) L (;C) ∈ R. Thus, 2 integrating over .[t, T ] for .t ∈ I and taking the imaginary part of (1.51) yields  0=

.

t

T

(∂s ψ(s), ψ(s))L2 (;C) ds =

1 2

 t

T

d

ψ(s) 2L2 (;C) ds ds

1 = − ψ(t) 2L2 (;C) , 2 since .ψ(T ) = 0, so that . ψ(t) L2 (;C) = 0 for all .t ∈ I , which proves the claim.

 

Now, we can put the above setting into the framework of Theorem 1.7 and obtain the desired statement. Theorem 1.8 Let .μ ∈ L∞ (T ; R) and .g ∈ H. Moreover, let .Vμ , .bμ (·, ·), and f (·) be defined as in (1.38) and (1.48), respectively. Then, the variational problem ∗ .bμ (u, v) = f (v) for all .v ∈ Vμ , admits a unique solution .uμ ∈ U = H and .

β := inf sup

.

u∈U v∈Vμ

bμ (u, v) bμ (u, v) = sup sup = 1.

u U |||v|||μ u∈U v∈Vμ u U |||v|||μ

(1.52)

1.5.6 Ultraweak “Truth” Discretization We are now going to detail possible discretizations (1.42) for the three guiding examples. First, we note that the wave and the Schrödinger equation are timedependent, the transport problem can be stationary and instationary.

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

1.5.6.1

47

Linear Transport

Let us describe a possible discretization from [17] for data .bμ and .cμ which are constant in time and space, i.e., .0 = bμ ∈ Rn and .cμ ∈ R. We refer again to [17] also for discretizations for non-constant data. Assume that we are given a finite  nT ¯ = nTδ T¯i . If .v δ ∈ Vδ is globally continuous and element mesh .Tδ = {Ti }i=1δ of . i=1 polynomial on each element .Ti , all terms of .Bμ∗ v δ are still polynomials of the same or lower order on .Ti , while the gradient terms yield discontinuities on the interelement boundaries. Denoting by .Vδ ⊂ V a conforming FE space with polynomial ¯ δ ⊂ L2 () the corresponding discontinuous FE space, i.e., order r on .Tδ and by .U .

Vδ := {v ∈ C 0 () : v|T ∈ Pr (T ) ∀T ∈ Tδ , v|+ = 0} ⊂ V, .

(1.53)

¯ δ := {u ∈ L2 () : u|T ∈ Pr (T ) ∀T ∈ Tδ } ⊂ L2 (), U

(1.54)

¯ δ and we can determine the solution .uδμ ∈ Uδμ in terms we have .Uδμ = Bμ∗ Vδ ⊂ U ¯ δ. of the standard (parameter-independent) nodal basis of .U ¯ δ in the nodal Let .B∗μ ∈ Rn¯ x ×ny be the matrix representation of .Bμ∗ : Vδ → U ¯ δ , meaning that the i-th column bases .(φ1 , . . . , φny ) of .Vδ and .(ψ1 , . . . , ψn¯ x ) of .U of .B∗ contains the coefficients of .Bμ∗ φi in the basis .(ψ1 , . . . , ψn¯ x ), i.e., .Bμ∗ φi = n¯ x μ ∗ j =1 [Bμ ]j,i ψj . Due to the form of the operator and the chosen spaces, the matrix ∗ .Bμ can be computed rather easily, see the example below. Then, the coefficient  ¯x ¯ δ can simply be computed vector .uμ = (u1 , . . . , un¯ x ) of .uδμ = ni=1 ui ψi ∈ U ny δ from the coefficient vector .wμ = (w1 , . . . , wny ) of .wμδ = i=1 wi φi ∈ V by ∗ .uμ = Bμ wμ . To solve the least-squares reformulation (1.43) of the discrete problem, we have to assemble the matrix corresponding to the bilinear form .a μ : Vδ × Vδ given by .a μ (wμδ , vμδ ) = (Bμ∗ wμδ , Bμ∗ vμδ )L2 (D) = (wμδ , vμδ )V , i.e., the .Vμ -inner product matrix of .Vδμ . One possibility for the assembly is to use the matrix .B∗μ : Denoting by n¯ ×n¯ x ¯ δ , i.e., .[M δ ]i,j = (ψi , ψj )L (D) , we see .M δ ∈ R x the .L2 -mass matrix of .U N U

N U

2

that for .Yμ := (B∗μ ) M N δ B∗μ ∈ Rny ×ny it holds .[Yμ ]i,j = (Bμ∗ φi , Bμ∗ φj )L2 (D) = U (φi , φj )Vμ . The solution procedure thus consists of the following steps: 1. 2. 3. 4.

Assemble .B∗μ and .Yμ Assemble the load vector .fμ ∈ Rny , .[fμ ]i := fμ (φi ), i = 1, . . . , ny Solve .Yμ wμ = fμ Compute .uμ = B∗μ wμ

Assembling the Matrices for Spaces on Rectangular Grids As a concrete example on how to assemble the matrices .B∗μ and .Yμ we recall from [17] the stationary case .D =  = (0, 1)n using a rectangular grid.

48

K. Urban

1D-Case We start with the one-dimensional case. Let thus . = (0, 1) and .bμ > 0. h Moreover, let .Th = {[(i − 1)h, ih)}ni=1 be the uniform one-dimensional grid with h,p ¯ h,p mesh size .h = 1/nh , fix a polynomial order .p ≥ 1, and define .V1D , .U 1D as in (1.53), (1.54). Let .(φ1 , . . . , φny ) and .(ψ1 , . . . , ψn¯ x ) be the respective nodal bases h,p n¯ x ×ny ¯ h,p be the matrix representation of the of .V and .U 1D . Moreover, let .I1D ∈ R 1D

¯ 1D in the respective nodal bases, i.e., the i-th column of embedding .Id : V1D → U h,p ¯ h,p in the basis .(ψ1 , . . . , ψn¯ x ), such .I1D contains the coefficients of .φi ∈ V U 1D ⊂ ny 1D n¯ x that for .u = I1D · w it holds . i=1 ui ψi = i=1 wi φi . Similarly, let .A1D ∈ Rn¯ x ×ny h,p h h  ¯ h,p be the matrix representation of the differentiation . d : V → U 1D , w &→ (w ) . h,p

h,p

dx

1D

Additionally, as above, we define .M1D ∈ Rn¯ x ×n¯ x , [M1D ]i,j = (ψi , ψj )L2 ((0,1)) as ¯ h,p the .L2 -mass matrix of .U 1D . For .p = 1, i.e., linear FE, and a standard choice of the nodal bases the matrices .I1D , .A1D , and .M1D read ⎛

I1D

.

1 ⎜0 ⎜ ⎜ := ⎜0 ⎜0 ⎝ .. .

0 1 1 0

⎞ ⎛ 0 ··· −1 1 ⎜−1 1 0 ⎟ ⎟ 1 ⎜ ⎜ 0 ⎟ ⎟ , A1D := · ⎜ 0 −1 ⎟ ⎜ 0 −1 h 1 ⎠ ⎝ .. .. . .

⎞ ⎛ 0 ··· 2 ⎜1 0 ⎟ ⎟ h ⎜ ⎜ 1 ⎟ ⎟ , M1D = · ⎜0 ⎟ ⎜0 6 1 ⎠ ⎝ .. .. . .

1 2 0 0

0 0 2 1

⎞ 0 ··· 0 ⎟ ⎟ 1 ⎟ ⎟. 2 ⎟ ⎠ .. .

With these three matrices we can then compose the matrices .B∗1D and .Y1D by B∗μ,1D := −bμ · A1D + cμ · I1D ,

.

Yμ,1D := (B∗μ,1D ) M1D B∗μ,1D .

nD-Case Next, we consider a rectangular domain of higher dimension, e.g., . = ¯ i, (0, 1)n , .n ≥ 2. We choose in each dimension one-dimensional FE spaces .Vi , U .i = 1, . . . , n as in (1.53), (1.54) separately, and use the tensor product of these &n &n i ¯δ ¯i spaces .Vδ := i=1 V , .U := i=1 U as FE spaces on the rectangular grid formed by a tensor product of all one-dimensional grids. The system matrices can then be assembled from Kronecker products of the one-dimensional matrices ¯ i , i = 1, . . . , n: We first assemble for .i = corresponding to the spaces .Vi , U ¯ i. 1, . . . , n the matrices .Ii1D and .Ai1D corresponding to the pair of spaces .Vi , .U Then, the matrix corresponding to the adjoint operator can be assembled by B∗μ := −

n 

.

i=1

(i−1) (i+1) (bμ )i I11D ⊗· · ·⊗I1D ⊗Ai1D ⊗I1D ⊗· · ·⊗In1D +cμ

n '

Ii1D ,

(1.55)

i=1

e.g. for .n = 2 we have .B∗μ,2D := −(bμ )1 (A11D ⊗ I21D ) − (bμ )2 (I11D ⊗ A21D ) + ¯ δ can be computed from the cμ (I11D ⊗ I21D ). Similarly, the mass matrix .MU¯ δ of .U & ¯ i , i = 1, . . . , n by .M δ := n Mi , one-dimensional mass matrices .Mi1D of .U i=1 ¯ 1D U

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

49

such that .Yμ := (B∗μ ) MU¯ δ B∗μ can also be directly assembled using the matrices i , Ai , Mi , .i = 1, . . . , n. .I 1D 1D 1D Remark 1.8 If the underlying domain is not of tensor product structure or if the mesh is unstructured, the linear system cannot be written in term of a tensor product, of course. However, there is a generic case in which the domain has (at least partly) tensor product structure, namely for instationary problems, i.e., .D = (0, T ) × . In this case, one could use separate finite element bases in time and space, so that the arising linear system in fact has tensor product structure, see also below.

1.5.6.2

The Wave Equation

The definition (1.38) is not well suited for a discretization. Hence, we are now going to further investigate .Vμ . First, recall the definition of .Bμ◦,∗ in (1.37) and note that .Vμ = clos|||·|||μ (dom(Bμ◦,∗ )) and .dom(B◦∗ ) is a tensor product space. Next, for .v(t, x) = ϑ(t) ϕ(x), we define a tensor product-type norm as    ¨ 2L (I ) + ϑ 2L (I ) ϕ 2L () + Aμ ϕ 2L () |||v|||2Vμ = |||ϑ ⊗ ϕ|||2Vμ := ϑ 2 2 2 2

.

=: |||ϑ|||2t |||ϕ|||2x , which mimics the tensor product graph norm, and set V◦μ := clos|||·|||Vμ (dom(Bμ◦,∗ ))   2   1 ¯ ¯ = clos|||·|||t C 2 (I ) ∩ C{T } (I ) × clos|||·|||x C () ∩ C0 ()  2  2 1 = H{T } (I ) × H () × H0 () ,

.

(1.56)

2 (I ) := {ϑ ∈ H 2 (I ) : ϑ(T ) = ϑ(T ˙ ) = 0} and assuming that where .H{T } ◦ 1 2 .dom(Aμ ) = H () × H () [11]. Again, it is readily seen that .Vμ ⊂ Vμ , but the 0 ◦ contrary is not true in general. In view of (1.56), .Vμ is a tensor product space which can be discretized in a straightforward manner. Hence, we look for a pair .Uδμ ⊂ U and .Vδμ ⊂ V◦μ satisfying (1.9) with a possibly large inf-sup lower bound .βμ◦ , i.e., close to unity.

Finite Elements in Time We start with the temporal discretization. We choose some integer .Nt > 1 and set .t := T /Nt . This results in a temporal “triangulation” k−1 Ttime ≡ (k − 1)t < t ≤ k t ≡ t k , 1 ≤ k ≤ Nt }. t ≡ {t

.

Then, we set 2 Rt := Span{1 , . . . , Nt } ⊂ H{T } (I ),

.

(1.57)

50

K. Urban

e.g. piecewise quadratic splines on .Ttime t with standard modification in terms of multiple knots at the right end point of .I¯ = [0, T ]. Discretization in Space For the space discretization, we choose any conformal finite element space Zh := Span{φ1 , . . . , φNh } ⊂ H01 () ∩ H 2 (),

.

(1.58)

e.g. piecewise quadratic finite elements with homogeneous Dirichlet boundary conditions. Test and Trial Space in Space and Time Then, we define the test space as Vδ := Rt ⊗ Zh ⊂ V◦ ⊂ V,

.

δ = (t, h),

(1.59)

= Span{ϕν := k ⊗ φi : k = 1, . . . , Nt , i = 1, . . . , Nh , ν = (k, i)}, which is a tensor product space of dimension .Nδ = Nt Nh . The trial space .Uδμ is constructed by applying the adjoint operator .Bμ∗ to each test basis function, i.e., for  .ν = ( , j ) ψν  := Bμ∗ (ϕ( ,j ) ) = Bμ∗ ( ⊗ φj ) = (∂tt + Aμ )( ⊗ φj )

.

= ¨ ⊗ φj +  ⊗ Aμ φj , i.e., .Uδμ := Bμ∗ (Vδ ) = Span{ψν  : ν  = 1, . . . , Nδ }. Since .Bμ∗ is an isomorphism of .Vμ onto .L2 (I ; L2 ()), the functions .ψ( ,j ) are in fact linearly independent. The Linear System To derive the stiffness matrix, we first use arbitrary spaces induced by .{ψν := σ ⊗ ξj : ν = 1, . . . , Nδ } for the trial and .{ϕν  = k ⊗ φi : ν  = 1, . . . , Nδ } for the test space. Using .[B δμ ]ν,ν  = [B δμ ]( ,j ),(k,i) we get [B δμ ]( ,j ),(k,i) = bμ (ψν , ϕν  ) = (ψν , Bμ∗ ϕν  )H = (σ ⊗ ξj , ¨ k ⊗ φi + k ⊗ Aμ φi )H

.

= (σ , ¨ k )L2 (I ) (ξj , φi )L2 () + (σ , k )L2 (I ) (ξj , Aμ φi )L2 () , (1.60) ˜ t ⊗ M ˜h+M ˜ t ⊗ N ˜ μ,h , where .[M ˜ t ] ,k := (σ , k )L2 (I ) , so that .B δμ = N k ˜ h ]j,i := (ξj , φi )L2 () , .[N ˜ t ] ,k := (σ , ¨ )L2 (I ) and .[N ˜ μ,h ]j,i := .[M (ξj , Aμ φi )L2 () . In the specific case .ψν = Bμ∗ (ϕν ), we get the representation [B δμ ]( ,j ),(k,i) = bμ (ψν , ϕν  ) = (ψν , Bμ∗ ϕν  )L2 (I ×) = (Bμ∗ ϕν , Bμ∗ ϕν  )L2 (I ×)

.

= (¨ ⊗ φj +  ⊗ Aμ φj , ¨ k ⊗ φi + k ⊗ Aμ φi )L2 (I ×) = (¨ , ¨ k )L2 (I ) (φj , φi )L2 () + ( , k )L2 (I ) (Aμ φj , Aμ φi )L2 () + (¨ , k )L2 (I ) (φj , Aμ φi )L2 () + ( , ¨ k )L2 (I ) (Aμ φj , φi )L2 () (1.61)

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

51

so that B δμ = Qt ⊗ M h + N t ⊗ N μ,h + N t ⊗ N μ,h + M t ⊗ Qμ,h ,

.

where the parameter-independent terms read .[Qt ] ,k := (¨ , ¨ k )L2 (I ) , k .[M t ] ,k := ( ,  )L2 (I ) , .[N t ] ,k := ( ¨ , k )L2 (I ) and .[M h ]j,i := (φj , φi )L2 () and the parameter-dependent matrices are given by [Qμ,h ]j,i := (Aμ φj , Aμ φi )L2 () ,

[N μ,h ]j,i := (Aμ φj , φi )L2 () .

.

We stress that .B δμ is symmetric and positive definite with the above assumptions on .Aμ . Finally, let us now detail the right-hand side. Recall from (1.39), that .fμ (v) = (fμ , v)H + u1 , v(0) − (u0 , v(0)) ˙ H . Hence, [f δμ ]ν = [f δμ ](k,i) = (fμ , ϕν )H + u1 , ϕν (0) V  ×V − (u0 , ϕ˙ν (0))H

.

= (fμ , k ⊗ φi )H + u1 , ϕν (0) V  ×V − (u0 , ϕ˙ν (0))H  T  k = fμ (t, x)  (t) φi (x) dx dt + [u1 (x) k (0) − u0 (x) ˙ k (0)]φi (x) dx. 0





Using appropriate quadrature formulae results in a numerical approximation, which we will again denote by .f δμ . Then, solving the linear system .B δμ uδμ = f δμ yields the expansion coefficients of the desired approximation .uδμ ∈ Uδμ as follows: Let δ .uμ = (uν )ν=1,...,Nδ , .ν = (k, i), then δ .uμ (t, x)

=

Nδ 

uν ψν (t, x) =

ν=1

Nh Nt  

uk,i σ k (x) ξi (x),

k=1 i=1

in the general case and for the special one, i.e., .ψν = Bμ∗ (ϕν ), uδμ (t, x) =

Nδ 

.

ν=1

1.5.6.3

uν ψν (t, x) =

Nh Nt  

  uk,i ¨ k (t) φi (x) + k (t) Aμ φi (x) .

k=1 i=1

The Schrödinger Equation

For the discretization of the Schrödinger equation, we use some pieces of the discretization reported in Sect. 1.5.6.2 above. Similar to (1.56) we note that 1 1 2 ◦ Vμ ⊆ H{T } (I ; C) ⊗ [H0 (; C) ∩ H (; C)] =: Vμ .

.

Next, we use .Rt from (1.57) and .Zh in (1.58).

(1.62)

52

K. Urban

Test and Trial Space in Space and Time Then, we define the test space as Vδ := Rt ⊗ Zh ⊂ V◦ ⊂ V,

.

δ = (t, h),

(1.63)

= SpanC {ϕν := k ⊗ φi : k = 1, . . . , Nt , i = 1, . . . , Nh , ν = (k, i)}  :=



(N t ,Nh )

cν ϕν : cν ∈ C ,

ν=(k,i)=(1,1)

which is a complex-valued tensor product space of dimension .Nδ = Nt Nh . The trial space .Uδμ is constructed by applying the adjoint operator .Bμ∗ to each test basis function, i.e., for .ν  = ( , j ) ψν  := Bμ∗ (ϕν  ) = Bμ∗ ( ⊗ φj ) = (i∂t − 12 x + μ )( ⊗ φj ),

.

i.e., .Uδ := Bμ∗ (Vδ ) = Span{ψν  : ν  = 1, . . . , Nδ }. Since .Bμ∗ is an isomorphism of .Vμ onto .H = L2 (I × ; C), the functions .ψν are in fact linearly independent. The Linear System To derive the stiffness matrix, we follow the same path as for the wave equation, namely we first use arbitrary spaces induced by .{ψν := σ ⊗ ξj : ν = 1, . . . , Nδ } for the trial and .{ϕν = k ⊗ φi : ν = 1, . . . , Nδ } for the test space. Using .[B δμ ]ν,ν  = [B δμ ]( ,j ),(k,i) we get in the “general” case [B δμ ]( ,j ),(k,i) = (ψν , Bμ∗ ϕν  )H = (σ ⊗ ξj , (i∂t − 12 x + μ )k ⊗ φi )H

.

= i(σ , ˙ k )L2 (I ) (ξj , φi )L2 () − 12 (σ , k )L2 (I ) (ξj , x φi )L2 () + (σ ⊗ ξj , μ [k ⊗ φi ])L2 (I ×)

(1.64)

In the specific (inf-sup optimal) case .ψν = Bμ∗ (ϕν ), we get the representation [B δμ ]( ,j ),(k,i) = bμ (ψν , ϕν  ) = (ψν , Bμ∗ ϕν  )H = (Bμ∗ ϕν , Bμ∗ ϕν  )H

.

= ((i∂t − 12 x + μ )( ⊗ φj ), (i∂t − 12 x + μ )(k ⊗ φi ))L2 (T ;C) Now, we need to take the rules for complex integration into account and get [B δμ ]( ,j ),(k,i) = (σ˙ , σ˙ k )L2 (I ) (φj , φi )L2 () + 14 (σ , σ k )L2 (I ) (x φj , x φi )L2 ()

.

− 12 (μ [σ ⊗ φj ], σ k ⊗ x φi )L2 (I ×) − 12 (σ ⊗ x φj , μ [σ k ⊗ φi ])L2 (I ×) + (μ [σ ⊗ φj ], μ [σ k ⊗ φi ])L2 (I ×) − 2i (σ˙ , σ k )L2 () (φk , x φi )L2 () + 2i (σ , σ˙ k )L2 () (x φk , φi )L2 () + i(μ [σ˙ ⊗ φj ], σ k ⊗ φi )L2 (I ×) − i(σ ⊗ φj , μ [σ˙ k ⊗ φi ])L2 (I ×) (1.65)

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

53

so that B δμ = At ⊗ M h + 14 M t ⊗ Ah − 12 (V δμ + (V δμ ) ) + W δμ ( ) + i − 12 N t ⊗ N h + 12 N t ⊗ N h + (U δμ − (U δμ ) ) =: Aδμ + i B δμ ,

.

(1.66) where [At ] ,k := (σ˙ , σ˙ k )L2 (I ) ,

[Ah ]j,i := (x φj , x φi )L2 () ,

[M t ] ,k := (σ , σ k )L2 (I ) ,

[M h ]j,i := (φj , φi )L2 () ,

[N t ] ,k := (σ˙ , σ k )L2 (I ) ,

[N h ]j,i := (x φj , φi )L2 () ,

.

as well as the parameter-dependent components [U δμ ]( ,j ),(k,i) := (μ [σ˙ ⊗ φj ], σ k ⊗ φi )L2 (I ×) ,

.

[V δμ ]( ,j ),(k,i) := (σ ⊗ x φj , μ [σ k ⊗ φi ])L2 (I ×) , [W δμ ]( ,j ),(k,i) := (μ [σ ⊗ φj ], μ [σ k ⊗ φi ])L2 (I ×) . The latter three matrices .U δμ , .V δμ and .W δμ involve the potential .μ and are thus, in general, not separable w.r.t. time and space variables. However, there are several methods to (at least) approximate .μ (t, x) in terms of a sum of separable functions. Hence, let us for simplicity assume that .μ is a tensor product, i.e., μ (t, x) = μ (t)

.

μ (x),

t ∈ I, x ∈ 

(1.67)

for functions .μ ∈ L∞ (I ; R) and . μ ∈ L∞ (; R), which we may assume to be affine in the parameter .μ in the sense of Definition 1.4. In that case, we can further detail the matrices .U δμ , .V δμ and .W δμ as follows [U δμ ]( ,j ),(k,i) = (μ σ˙ , σ k )L2 (I ) (

.

μ φj , φi )L2 () ,

[V δμ ]( ,j ),(k,i) = (σ , μ σ k )L2 (I ) (x φj , [W δμ ]( ,j ),(k,i) = (μ σ , μ σ k )L2 (I ) (

μ φi )L2 () ,

μ φj ,

μ φi )L2 () .

Finally, let us now detail the right-hand side, which is independent of the parameter here, i.e., [f δ ]ν = [f δ ](k,i) = g, k ⊗ φi V ×V + i(ψ0 , k (0) ⊗ φi )L2 (;C)   T = g(t, x) k (t) φi (x) dx dt + ik ψ0 (x) φi (x) dx.

.

0





54

K. Urban

Using appropriate quadrature formulae results in a numerical approximation, which we will again denote by .f δ . Then, solving the linear system .B δμ uδ = f δ yields the expansion coefficients of the desired approximation .uδμ ∈ Uδμ as follows: Let N δ .uμ = (uν )ν=1,...,Nδ ∈ C δ , .ν = (k, i), then uδμ (t, x) =

Nδ 

.

uν ψν (t, x) =

ν=1

Nh Nt  

uk,i σ k (x) ξi (x),

k=1 i=1

in the general case and for the special one, i.e., .ψν = Bμ∗ (ϕν ) under the assumption (1.67) uδμ (t, x) =

Nδ 

.

uν ψν (t, x)

ν=1

=

Nh Nt  

 uk,i i˙ k (t) φi (x) − 12 k (t) x φi (x) + μ (t)k (t)



μ (x)φ(x)

.

k=1 i=1

(1.68)

1.5.6.4

Common Challenges

We summarize the main observations from the above three examples. In space, we may use any finite element mesh with corresponding conformal finite elements. Since we start with the test functions, those elements typically need to be of second order at least, i.e., piecewise quadratic for wave and Schrödinger and piecewise linear for the linear transport problem. For the time-dependent problems there is a natural tensor product decomposition into space and time variables. If we choose a corresponding tensor product discretization, we obtain a linear system, where the stiffness matrix is a sum of tensor products. The sum is regular, while single components might be singular. This is a particular challenge for the numerical solution. We will come back to that point later.

1.5.7 The Kolmogorov N-Width Again The preceding section answers the question if we can construct a well-posed variational formulation for the challenging problems transport, wave and Schrödinger. The issue remains to determine the rate of decay of the Kolmogorov N-width. We collect some results in that direction.

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

1.5.7.1

55

Linear Transport

For transport-dominated problems, it is known that the decay of the Kolmogorov N -width is poor. We quote one respective result in that direction. Lemma 1.5 ([65]) For the univariate linear transport problem Bμ u := u˙ + μ ∂x u = 0,

.

t, x ∈ (0, 1),

u(0, x) = 0, u(t, 0) = 1

for .μ ∈ P = [0, 1], it holds that .dN (P) ≥ 12 N −1/2 . Of course, the framework in the latter result is a very simple one. This means, however, that we cannot expect a reasonable decay in more general situations. Transport problems are indeed tough for “standard” model reduction. In addition, note that the solution of the above transport problem is to be understood in the distributional sense.

1.5.7.2

The Parameterized Wave Equation

Also for the wave equation, the Kolmogorov N-width does not decay fast. Again, we quote one result in that direction. Lemma 1.6 ([42]) Let . := R, .I := R+ (Cauchy problem), .P = R+ . For Bμ u := u¨ μ − μ2 uμ = 0

.

(t, x) ∈ I := I × , . * 1, if x < 0, x ∈ , . uμ (0, x) = u0 (x) := −1, if x ≥ 0,

∂t uμ (0, x) = 0,

for

x ∈ ,

(1.69a) (1.69b) (1.69c)

it holds that . 14 N −1/2 ≤ dN (P) ≤ 12 (N − 1)−1/2 . The above initial value problem (1.69) has no classical solution, so that the statement holds for solutions in the distributional sense, which is known to be appropriate for hyperbolic problems.

1.5.7.3

Schrödinger Equation

The Kolmogorov N-width can be shown to behave in a similar manner as for transport and wave-type problems [47].

56

1.5.7.4

K. Urban

Common Challenges

We have seen that .dN (P) is to be expected to decay very slowly in all three cases. Let us come back to Theorem 1.3, where exponential decay of .dN (P) has been obtained for the reduced problem in (1.15). Why does this not generalize to the above problems? Recall, that in Theorem 1.3 we were seeking some .uμ ∈ U N and the test space N .V (μ) is built e.g. by supremizers. However, the ultraweak reduced framework in (1.45) does not fit into this setting. The reason is that both trial space .UμN and test space .VμN are parameter-dependent—the test space concerning the topology and the trial space as a set. Thus, we cannot apply Theorem 1.3 and we are facing the question: What is the rate of the error of the ultraweak reduced model? So far, we have only numerical indications, no theory yet. This is an open issue for future research. Finally, recall that all above statements of the decay of .dN (P) are asymptotic, i.e., as .N → ∞. This means that one can at least try to approximate a given problem for a given accuracy by a linear model reduction and observe how large the reduced dimension N in fact has to be chosen in order to reach the prescribed accuracy.

1.6 Numerical Aspects In this section, we collect some numerical observations both for realizing the “truth” approximation and also for model reduction.

1.6.1 “Truth” Approximation in Space and Time In all considered cases, we have described the derivation of a reduced system by using the Reduced Basis Method. The main difference in the numerical realization lies in the computation of the snapshots within the greedy algorithm. For the thermal block this amounts to solve a discretized elliptic problem allowing for standard fast numerical solvers such as a preconditioned cg-method or a multigrid solver. In the example of the parameterized heat equation, we obtained a linear system of equations .B δμ uδμ = f δμ , where the stiffness matrix .B δμ is a sum of tensor products, see (1.34). However, for specific choices of the discretization, we could show that this linear system is equivalent to the Crank–Nicolson time stepping scheme in (1.35). This means that the “truth” solution can be computed by a standard time-marching scheme combined with fast solvers for the spatial systems. Note however, that this is a very special situation for the heat equation in combination with a specific discretization [81]. In fact [1] shows that obtaining stable timemarching schemes arising from space-time discretizations is not straightforward

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

57

even for the parabolic heat equation. Hence, we cannot expect that stable spacetime discretizations for linear transport, wave and Schrödinger equations can be reformulated as time stepping schemes. We are not aware of corresponding examples. Hence, at least for those three examples, there is need for efficient numerical solvers for such tensor product systems. In this section, we report some recent observations from [50, 51] concerning these aspects. In all the cases described above, we obtain a (regular) linear system of the form B

δ δ .B μ uμ

=

f δμ

with

B δμ

=

P 

f

Dp,μ ⊗ Ap,μ ,

f δμ

=

p=1

P 

h ,μ ⊗ q ,μ ,

=1

(1.70) where all involved matrices are sparse and (at least some of) the .Ap,μ are s.p.d. If Dp,μ ∈ RK×K and .Ap,μ ∈ Rnh ×nh , the linear system in (1.70) is of dimension .N = Kn. Recall that .

(Dp,μ ⊗ Ap,μ )x = vec(Ap,μ XDp,μ ),

.

where vec stacks the columns of a given matrix one after the other, and .x = vec(X). We can thus rewrite the system .B δμ uδμ = f δμ in (1.70) as the linear matrix equation B

B δμ uδμ = f δμ

.



P 

f

Ap,μ Uμδ Dp,μ =

p=1

P 

q ,μ h ,μ , with uδμ = vec(Uμδ ).

=1

Matrix equations are receiving significant attention in the PDE context, due to the possibility of maintaining the structural properties of the discretized problem, while limiting memory consumptions; see [75]. Under certain hypotheses, a large variety of discretization methodologies such as finite differences, isogeometric analysis, spectral (element) methods, certain finite element methods as well as various parametric numerical schemes rely on tensor product spaces; see, e.g., [4, 21, 43, 58]. More recently, all-at-once time discretizations have shown an additional setting where tensor product approximations naturally arise; see, e.g., [66] and references therein. Among the various computational strategies discussed in the literature [75], here we focus on projection methods that reduce the original equation to a similar one, but of much smaller dimension.

1.6.1.1

The Parameterized Heat Equation

Let us now detail some numerical aspects for the heat equation. In particular, we shall compare the numerical solution of the fully discretized space-time system with a time-stepping scheme.

58

K. Urban

The Discretized Heat Equation The problem .B δμ uδμ = f δμ yields the following generalized Sylvester equation Mh Uμδ Dt + Aμ,h Uμδ Ct = Fμδ ,

.

with Fμδ := [q1 , . . . , qP ][hμ,1 , . . . , hμ,P ] . (1.71)

Note that .Aμ,h , .hμ,p are the only .μ-dependent terms due to the chosen form of the parameterized heat equation. The spatial stiffness .Aμ,h and mass matrices .Mh typically have significantly larger dimensions .Nh than the time discretization matrices .Dt , .Nt , i.e., .Nt  Nh . We therefore use a reduction method only for the space variables by projecting the problem onto an appropriate space. A matrix Galerkin orthogonality12 condition is then applied to obtain the solution: given .Vμm ∈ RNh ×km , .km  Nh , with orthonormal columns, we consider the approximation space range(.Vμm ) and seek .Yμm ∈ Rkm ×Nt such that .Uμδ,m := Vμm Yμm ≈ Uμδ and the residual .Rμm := Fμδ − (Mh Uμδ,m Dt + Aμ,h Uμδ,m Ct ) satisfies the Galerkin condition .Rμm ⊥ range(Vμm ). Imposing this orthogonality yields that m m .(Vμ ) Rμ = 0 is equivalent to (Vμm ) Fμδ Vμm − ((Vμm ) Mh Vμm )Yμm Dt − ((Vμm ) Aμ,h Vμm )Yμm Ct = 0.

.

The resulting problem is again a generalized Sylvester equation, but of much smaller size, therefore Schur-decomposition oriented methods can cheaply be used [75, Sec. 4.2], see [75] for a discussion on projection methods as well as their matrix and convergence properties. For selecting .Vμm , let .Fμ = Fμ,1 Fμ,2 with .Fμ,1 having full column rank. Given the properties of .Aμ,h , .Mh , we propose to employ the rational Krylov subspace −1 −1 RKm μ := range([Fμ,1 ,(Aμ,h − σ2 Mh ) Mh Fμ,1 , (Aμ,h − σ3 Mh ) Mh Fμ,1 ,

.

. . . , (Aμ,h − σm Mh )−1 Mh Fμ,1 ]), where the shifts .σs can be determined adaptively while the space is being generated; see [75] for a description and references. The obtained spaces are nested, .RKm μ ⊆ RKm+1 , therefore the space can be expanded if the approximation is not sufficiently μ good. To include a residual-based stopping criterion, the residual norm can be computed in a cheap manner, see, e.g., [29, 66] for the technical details. A Numerical Example We report results from a numerical example from [50] on the cube . = (−1, 1)3 with homogeneous Dirichlet boundary conditions and the time interval .I := (0, 10) with initial conditions .u(0, x, y, z) ≡ 0. We consider the non-parametric form .Aμ ≡ − in order to show the performance for a sample 12 At

this point, we are talking about discrete linear algebra problems, hence we have a Galerkin method in the Euclidean space, no longer a Petrov-Galerkin one.

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

59

Table 1.1 Results for different values of .Nh and .Nt . Memory allocations for RKSM and LRFGMRES+RKSM are given by .ρmem · (Nh + Nt ). For Crank–Nicolson we report only the computational timings .Nh

41,300

.Nt

300 500 700 347,361 300 500 700

RKSM Its .ρmem 13 14 13 14 13 14 14 15 14 15 14 15

.rank(Uδ )

9 9 9 9 9 7

Time (s) 25.96 30.46 28.17 820.17 828.34 826.93

LR-FGMRES+RKSM Its .ρmem .rank(Uδ ) Time (s) 4 74 10 82.89 4 75 11 83.93 4 86 11 89.99 4 78 9 2319.67 4 80 9 2384.39 4 97 9 2327.76

CN [Time (s)] Direct Iterative 123.43 59.10 143.71 78.01 153.38 93.03 14705.10 792.42 15215.47 1041.47 15917.52 1212.57

“truth” simulation (and thus omit the subscript .μ). The right-hand side is chosen as .f (t, x, y, z) := 10 sin(t)t cos( π2 x) cos( π2 y) cos( π2 z) and its discretized version is thus of low rank. For discretization in space, linear finite elements were chosen, leading to the discretized generalized Sylvester equation in (1.71). We compare the performance of the Galerkin method based upon rational Krylov spaces described above (denoted RKSM) with that of a low-rank version of preconditioned GMRES (denoted LR-FGMRES-RKSM). See, e.g., [79] for further insights about low-rank Krylov routines applied to linear matrix equations. The LRFGMRES-RKSM preconditioner is chosen as a fixed (five) number of iterations of the rational Krylov Galerkin method.13 The results are displayed in Table 1.1. Due to the 3D nature (in space) of the problem, the Crank–Nicolson (CN) method with a direct linear solver14 leads to an excessive workload compared with the allat-once approaches for all considered values of .Nh and .Nt , with the computational time growing with the number of time steps .Nt . The performance of the other methods is independent of the time discretization, and it only depends on the spatial component of the overall discrete operator. In fact, spatial mesh independence seems to also be achieved. The CN method is more competitive in terms of computational time when equipped with an iterative linear solver.15

1.6.1.2

The Parameterized Wave Equation

We are now going to report recent results from [51] concerning the numerical solution of the ultraweak space-time variational form of the wave equation.

13 Since

the preconditioner is a non-linear operator, a flexible variant of GMRES is used. LU factors of the CN coefficient matrix are computed once and for all at the beginning of the procedure. 15 We employ GMRES preconditioned with ILU (zero fill-in). The same solver is used for the RKSM basis construction. 14 The

60

K. Urban

The Discretized Wave Equation Recall from (1.61) the structure .B δμ = Qt ⊗ M h + N t ⊗ N μ,h + N t ⊗ N μ,h + M t ⊗ Qμ,h of the stiffness matrix. The above sum also involves singular matrices, namely .N t . Hence, preconditioning is strongly needed. To derive such strategies, we rewrite the linear system .B δμ uδμ = f δμ as a linear matrix equation, so as to exploit the structure of the Kronecker problem. Hence, the vector system is written as .Aδμ (U δμ ) = F δμ with Aδμ (U δμ ) = M h U δμ Qt + N μ,h U δμ N t + N μ,h U δμ N t + Qμ,h U M t ,

.

(1.72)

where .f δμ = vec(F δμ ) and the symmetry of some of the matrices has been exploited. Henning et al. [51] suggests to attack the original multi-term matrix equation directly. Thanks to the symmetry of .N μ,h we rewrite the matrix Eq. (1.72) as M h U δμ Qt + N μ,h U δμ (N t + N t ) + Qμ,h U δμ M t = F δμ ,

.

(1.73)

where we choose .F δμ to be of low rank, that is .F δμ = F δμ,1 (F δμ,2 ) .16 Consider two appropriately selected vector spaces .Vkμ , .Wkμ of dimensions much lower than .Nh , Nt , respectively, and let .V kμ , .W kμ be the matrices whose orthonormal columns span the two corresponding spaces. We look for a low rank approximation of .U δμ as .U kμ = V kμ Y kμ (W kμ ) . To determine .Y kμ we impose an orthogonality (Galerkin) condition on the residual R kμ := F δμ,1 (F δμ,2 ) − M h U kμ Qt − N h,μ U kμ (N t + N t ) − Qμ,h U kδ M t . (1.74)

.

with respect to the generated space pair .(V kμ , W kμ ). Using the matrix Euclidean inner product, this corresponds to imposing that .(V kμ ) R kμ W kμ = 0. Substituting k into this matrix equation, we obtain the following reduced matrix k .R μ and .U δ equation, of the same type as (1.73) but of much smaller size, ((V kμ ) M h V kμ )Y kμ (Qt W kμ ) + ((V kμ ) N μ,h V kμ )Y kμ ((W kμ ) (N t + N t )W kμ )

.

+ ((V kμ ) Qh,μ V kμ )Y kμ ((W kμ ) M t W kμ ) = ((V kμ ) F δμ,1 )((F δμ,2 ) W kμ ). The small dimensional matrix .Y kμ is thus obtained by solving the Kronecker form of this equation. The described Galerkin reduction strategy has been thorough exploited and analyzed for Sylvester equations, and more recently successfully applied to multi-term equations, see, e.g., [70]. The key problem-dependent ingredient is the choice of the spaces .Vkμ , .Wkμ , so that they well represent spectral

16 If

this assumption is not satisfied, we determine a corresponding low rank approximation.

1 The Reduced Basis Method in Space and Time: Challenges, Limits and Perspectives

61

information of the “left-hand” and “right-hand” matrices in (1.73). A well established choice is (a combination of) rational Krylov subspaces [75]. More precisely, for the spatial approximation we generate the growing space range(.V kμ ) as .

√ + k+1 = [V kμ , (Qμ,h + sk M h )−1 v kμ , (N μ,h + sk M h )−1 v kμ ], V μ

V 1μ = F 1μ ,

where .v kμ is the k-th column of .V kμ , so that .V k+1 μ is obtained by orthogonalizing the k+1 k+1 + + new columns inserted in .V μ . The matrix .V μ grows at most by two vectors at a time. For each k, the shift parameter .sk can be chosen either a-priori or dynamically, with the same sign as the spectrum of .Qμ,h (.N μ,h ). Here .sk is cheaply determined using the adaptive strategy in [29]. Since .N μ,h represents a second order operator, √ √ the factor . sk turned to be appropriate in .N μ,h + sk M h ; a specific computation of the parameter associated with .N μ,h can also be included, at low cost. Analogously, , + k+1 .W = [W kμ , (Qt + k M t )−1 wkμ , ((N t + N t ) + k M t )−1 w kμ ], μ W 1μ = F 2μ , where .wkμ is the k-th column of .W kμ , and .W k+1 is obtained by orthogonalizing the μ k+1 + μ . The choice of . k > 0 is made as for .sk . new columns inserted in .W A Numerical Example We report results from a numerical example from [51] on the cube . = (0, 1)3 with .Aμ ≡ −μ2 , .H = L2 (), .μ = 0 being the wave speed, 1 .V = H () and .I = (0, 1), i.e., .T = 1. The initial dilation .u0 is chosen in such a 0 way that the respective solutions have different regularity. If .U kμ denotes the current approximate solution computed at iteration k, the Galerkin method is stopped as soon as the backward error .Ekμ is smaller than .10−5 , where .Ekμ is defined as

Ekμ =

.

R kμ F

F μ F + U kμ F ( M h F Qt F + Qμ,h F M t F +2 N μ,h F N t F )

,

where .R kμ is the residual matrix defined in (1.74). The computation of .Ek simplifies thanks to the low-rank format of the involved quantities (for instance, .R kμ does not need to be explicitly formed to compute its norm). Henning et al. [51] compared the space-time method with the classical CrankNicolson time stepping scheme, in terms of approximation accuracy and CPU time. The .Nh × Nh linear systems involved in the time marching scheme are solved by means of the PCG method with tolerance .ε = 10−6 per step. The time stepping scheme is also used to compute the reference solutions. We chose 1024 time steps and 64 unknowns in every space dimension, resulting in .2.68 · 108 degrees of freedom. For the error calculation, we evaluated the solutions on a grid of 64 points

62

K. Urban

in every dimension and approximated the .L2 error through the .1.67 · 107 query points. The code is ran in Matlab and the B-spline implementation is based on [63].17 To explore the potential of the new ultraweak method on low-regularity solutions, we only concentrate on experiments with lower regularity solutions, in particular a solution which is continuous with discontinuous derivative (Case 1) and a discontinuous solution (Case 2). This is realized through the choice of .u0 . On the other hand, for smooth solutions the time-stepping method would be expected to be more accurate, due to its second-order convergence, compared to the ultraweak method, as long as the latter uses piecewise constant trial functions. The data are summarized as follows: We use tensor product spaces for the

.u0 (r)

.(1

Case 1 − √5 r) · 1r 0 and .γF > 0 are such that 2 .γG γF < 1/ L , and .θ ∈ [−1, +∞[ (it is generally set to .θ = 1 as in [29]).

2.4.3.3

Final Remark About the Primal-Dual Algorithm

Note that the proposed approach computes directly the optimal affine mapping (aff) rather than computing the optimal subspace .Vn = u¯ + Vn . This subspace is thus determined implicitly in view of Lemma 2.1, and we do not have any information about its dimension except that .1 ≤ dim Vn ≤ m.

2.5 Sensor Placement In Sect. 2.4 we have summarized a strategy to find an optimal affine reconstruction algorithm .A∗aff for a given observation space W . This algorithm is connected to an opt optimal affine subspace .Vn to use in the PBDW method although we note that opt our procedure does not yield an explicit characterization of .Vn and a further postprocessing would be necessary to find it in practice. In [10], we have considered the “reciprocal” problem, namely, for a given reduced model space .Vn with a good accuracy .εn , the question is how to guarantee a good reconstruction accuracy with a number of measurements .m ≥ n as small possible. In view of the error bound (2.16), one natural objective is to guarantee that .μ(Vn , Wm ) is maintained of moderate size. Note that taking .Wm = Vn would automatically give the minimal value .μ(Vn , Wm ) = 1 with .m = n. However, in a typical data acquisition scenario, the measurements that span the basis of .Wm are chosen from within a limited class. This is the case for example when placing m pointwise sensors at various locations within the physical domain ..

92

O. Mula

We model this restriction by asking that the .i are picked within a dictionary .D of .V , that is a set of linear functionals normalized according to

 V = 1,

.

 ∈ D,

which is complete in the sense that .(v) = 0 for all . ∈ D implies that .v = 0. With an abuse of notation, we identify .D with the subset of V that consists of all Riesz representers .ω of the above linear functionals .. With such an identification, .D is a set of functions normalized according to

ω = 1,

.

ω ∈ D,

such that the finite linear combinations of elements of .D are dense in V . Our task is therefore to pick .{ω1 , . . . , ωm } ∈ D in such a way that β(Vn , Wm ) ≥ β ∗ > 0,

.

(2.25)

for some prescribed .0 < β ∗ < 1, with m larger than n but as small as possible. In particular, we may introduce m∗ = m∗ (β ∗ , D, Vn ),

.

the minimal value of m such that there exists .{ω1 , . . . , ωm } ∈ D satisfying (2.25). In [10] the authors show two “extreme” results: • For any .Vn and .D, there exists .β ∗ > 0 such that .m∗ = n, that is, the infsup condition (2.25) holds with the minimal possible number of measurements. However this .β ∗ could be arbitrarily close to 0. • For any prescribed .β ∗ > 0 and any model space .Vn , there are instances of dictionaries .D such that .m∗ is arbitrarily large. The two above statements illustrate that the range of situations that can arise is very broad in full generality if one does not add extra assumptions on the nature of .Vn or on the nature of the dictionary .D. This motivates to analyse more concrete instances as we present next. It is possible to study certain relevant dictionaries for the particular space .V = H01 (), with inner product and norms 

u, v :=

∇u(x) · ∇v(x) dx

.



and

u := ∇u L2 () .

The considered dictionaries model local sensors, either as point evaluations or as local averages. In the first case, D = {x = δx : ∀x ∈ },

.

2 Inverse Problems: A Deterministic Approach Using Physics-Based Reduced Models

93

which requires that V is a reproducing kernel Hilbert space (RKHS) of functions defined on ., that is a Hilbert space that continuously embeds in .C(). Examples of such spaces are the Sobolev spaces .H s () for .s > d/2, possibly with additional boundary conditions. In the case of local averages, the linear functionals are of the form  .x,τ (u) = u(y)ϕτ (y − x)dy, 

where ϕτ (y) := τ −d ϕ

.

y  τ

,

for some fixed radial  function .ϕ compactly supported in the unit ball .B = {|x| ≤ 1} of .Rd and such that . ϕ = 1, and .τ > 0 representing the point spread. The dictionary in this case is D = {x,τ : ∀x ∈ }.

.

We could even consider an interval of values for .τ in .[τmin , τmax ] with .0 < τmin ≤ τmax , D = {x,τ : ∀(x, τ ) ∈  × [τmin , τmax ]}.

.

For the above cases of dictionaries, we provide upper estimates of .m∗ in the case of spaces .Vn that satisfy some inverse estimates, such as finite element or trigonometric polynomial spaces. In [10], the optimal value .m∗ is proved to be of the same order as n when the sensors are uniformly spaced. This a-priori analysis is not possible for more general spaces V . It is not possible either for subspaces .Vn such as reduced basis spaces, which are preferred to finite element spaces for model order reduction because the approximation error .εn of the manifold .M defined in (2.13) is expected to decay much faster in elliptic and parabolic problems. For such general spaces, we need a strategy to select the measurements. In practice, V is of finite but very large dimension and .D is of finite but very large cardinality M := #(D) >> 1.

.

For this reason, the exhaustive search of the set .{ω1 , . . . , ωm } ⊂ D maximizing β(Vn , Wm ) for a given .m > 1 is out of reach. One natural alternative is to rely on greedy algorithms where the .ωj are picked incrementally.

.

94

O. Mula

The starting point to the design of such algorithms is the observation that (2.25) is equivalent to having σm = σ (Vn , Wm ) :=

.

sup v∈Vn , v =1

v − PWm v ≤ σ ∗ ,

σ ∗ :=

 1 − (β ∗ )2 < 1. (2.26)

Therefore, our objective is to construct a space .Wm spanned by m elements from .D that captures all unit norm vectors of .Vn with the prescribed accuracy .σ ∗ < 1. This leads us to study and analyze algorithms which may be thought as generalization to the well-studied orthogonal matching pursuit algorithm (OMP), equivalent to the algorithms we study here when applied to the case .n = 1 with a unit norm vector .φ1 that generates .V1 . We refer to [30–33] for some references on classical results on greedy algorithms and the OMP strategy. In [10], the authors propose and analyzed two algorithms which are summarized in Sects. 2.5.1 and 2.5.2. In Sect. 2.5.3 the case of pointwise evaluations is discussed. The main result which is shown is that both algorithms always converge, ensuring that (2.25) holds for m sufficiently large, and we also give conditions on .D that allow us to a-priori estimate the minimal value of m where this happens. The main observation stemming from numerical experiments is the ability of the greedy algorithms to pick good points. In particular, in the case of dictionaries of point evaluations or local averages, we observe that the selection performed by the greedy algorithms is near optimal in simple 1D cases in the sense that it achieves (2.25) after a number of iterations which is proportional to n and which can be predicted in theory. Before finishing this section, let us outline the main differences and points of contact between the present approach and existing works in the literature. The problem of optimal placement of sensors, which corresponds to the particular setting where the linear functionals are point evaluations or local averages, has been extensively studied since the 1970s in control and systems theory. In this context, the state function to be estimated is the realization of a Gaussian stochastic process, typically obtained as the solution of a linear PDE with a white noise forcing term. The error is then measured in the mean square sense (2.7), rather than in the worst case performance sense (2.6) which is the point of view adopted in our work. The function to be minimized by the sensors locations is then the trace of the error covariance, while we target at minimizing the inverse inf-sup constant .μ(Vn , W ). See in particular [34] where the existence and characterization of the optimal sensor location is established in this stochastic setting. Continuous optimization algorithms have been proposed for computing the optimal sensor location, see e.g. [35–37]. One common feature with the present approach is that the criterion to be minimized by the optimal location is non-convex, which leads to potential difficulties when the number of sensors is large. This is the main motivation for introducing a greedy selection algorithm, which in addition allows us to consider more general dictionaries.

2 Inverse Problems: A Deterministic Approach Using Physics-Based Reduced Models

95

2.5.1 A Collective OMP Algorithm In this section we discuss a first numerical algorithm for the incremental selection of the spaces .Wm , inspired by the orthonormal matching pursuit (OMP) algorithm which is recalled below. More precisely, our algorithm may be viewed as applying the OMP algorithm for the collective approximation of the elements of an orthonormal basis of .Vn by linear combinations of m members of the dictionary. Our objective is to reach a bound (2.26) for the quantity .σm . Note that this quantity can also be written as σm = (I − PWm )|Vn L(Vn ,V ) ,

.

that is, .σm is the spectral norm of .I − PWm restricted to .Vn .

2.5.1.1

Description of the Algorithm

When .n = 1, there is only one unit vector .φ1 ∈ V1 up to a sign change. A commonly used strategy for approximating .φ1 by a small combination of elements from .D is to apply a greedy algorithm, the most prominent one being the orthogonal matching pursuit (OMP): we iteratively select ωk = arg max | ω, φ1 − PWk−1 φ1 |,

.

ω∈D

where .Wk−1 := span{ω1 , . . . , ωk−1 } and .W0 := {0}. In practice, one often relaxes the above maximization, by taking .ωk such that | ωk , φ1 − PWk−1 φ1 | ≥ κ max | ω, φ1 − PWk−1 φ1 |,

.

ω∈D

for some fixed .0 < κ < 1, for example .κ = 12 . This is known as the weak OMP algorithm, but we refer to it as OMP as well. It has been studied in [30, 32], see also [33] for a complete survey on greedy approximation. For a general value of n, one natural strategy is to define our greedy algorithm as follows: we iteratively select ωk = arg max

.

ω∈D

max

v∈Vn , v =1

| ω, v − PWk−1 v| = arg max PVn (ω − PWk−1 ω) . ω∈D

(2.27) Note that in the case .n = 1, we obtain the original OMP algorithm applied to .φ1 .

96

O. Mula

As to the implementation of this algorithm, we take .(φ1 , . . . , φn ) to be any orthonormal basis of .Vn . Then

PVn (ω − PWk−1 ω) 2 =

n 

.

| ω − PWk−1 ω, φi |2 =

n 

i=1

| φi − PWk−1 φi , ω|2

i=1

Therefore, at every step k, we have ωk = arg max

n 

.

ω∈D

| φi − PWk−1 φi , ω|2 ,

i=1

which amounts to a stepwise optimization of a similar nature as in the standard OMP. Note that, while the basis .(φ1 , . . . , φn ) is used for the implementation, the actual definition of the greedy selection algorithm is independent of the choice of this basis in view of (2.27). It only involves .Vn and the dictionary .D. Similar to OMP, we may weaken the algorithm by taking .ωk such that n  .

| φi − PWk−1 φi , ωk |2 ≥ κ 2 max ω∈D

i=1

n 

| φi − PWk−1 φi , ω|2 ,

i=1

for some fixed .0 < κ < 1. For such a basis, we introduce the residual quantity rm :=

n 

.

φi − PWm φi 2 .

i=1

This quantity allows us to control the validity of (2.25) since we have σm =

.

sup v∈Vn , v =1

v − PWm v =

n

sup

2 i=1 ci =1

n     1/2 ci (φi − PWm φi ) ≤ rm ,  i=1

and therefore (2.25) holds provided that .rm ≤ σ 2 = 1 − γ 2 .

2.5.1.2

Convergence Analysis

By analogy to the analysis of OMP provided in [30], we introduce for any . = (ψ1 , . . . , ψn ) ∈ V n the quantity

 1 (D) := inf

.

cω,i

 n   ω∈D

i=1

1/2 |cω,i |

2

: ψi =

 ω∈D

cω,i ω,

 i = 1, . . . , n ,

2 Inverse Problems: A Deterministic Approach Using Physics-Based Reduced Models

97

or equivalently, denoting .cω := {cω,i }ni=1 ,

 1 (D) := inf

.





cω 2 :  =

ω∈D



 cω ω .

ω∈D

This quantity is a norm on the subspace of .V n on which it is finite. Given that . = (φ1 , . . . , φn ) is any orthonormal basis of .Vn , we write J (Vn ) :=  1 (D) .

.

˜ = This quantity is indeed independent on the orthonormal basis .: if . ˜ ˜ ˜ (φ1 , . . . , φn ) is another orthonormal basis, we have . = U  where U is unitary. Therefore any representation . = ω∈D cω ω induces the representation .

˜ = 



c˜ω ω,

c˜ω = U cω ,

ω∈D

with the equality  .



c˜ω 2 =

ω∈D

cω 2 ,

ω∈D

˜ 1 (D) . so that .  1 (D) = 

One important observation is that if . = (φ1 , . . . , φn ) is an orthonormal basis of  .Vn and if . = c ω, one has ω ω∈D n=

n 

.

φi ≤

n   i=1 ω∈D

i=1

|cω,i | =



cω 1 ≤

ω∈D



n1/2 cω 2 .

ω∈D

Therefore, we always have J (Vn ) ≥ n1/2 .

.

Using the quantity .J (Vn ), we can generalize the result of [30] on the OMP algorithm in the following way. Theorem 2.2 Assuming that .J (Vn ) < ∞, the collective OMP algorithm satisfies rm ≤

.

J (Vn )2 (m + 1)−1 , κ2

m ≥ 0.

(2.28)

Remark 2.1 Note that the right side of (2.28), is always larger than .n(m + 1)−1 , which is consistent with the fact that .β(Vn , Wm ) = 0 if .m < n.

98

O. Mula

One natural strategy for selecting the measurement space .Wm is therefore to apply the above described greedy algorithm, until the first value .m ˜ = m(n) ˜ is met such that .β(Vn , Wm ) ≥ γ . According to (2.28), this value satisfies m(n) ≤

.

J (Vn )2 . κ 2σ 2

(2.29)

For a general dictionary .D and space .Vn we have no control on the quantity .J (Vn ) which could even be infinite, and therefore the above result does not guarantee that the above selection strategy eventually meets the target bound .β(Vn , Wm ) ≥ γ . In order to treat this case, we establish a perturbation result similar to that obtained in [32] for the standard OMP algorithm. Theorem 2.3 Let . = (φ1 , . . . , φn ) be an orthonormal basis of .Vn and . = (ψ1 , . . . , ψn ) ∈ V n be arbitrary. Then the application of the collective OMP algorithm on the space .Vn gives rm ≤ 4

.

 21 (D) κ2

(m + 1)−1 +  −  2 ,

where .  −  2 :=  −  2V n =

n

i=1 φi

m ≥ 1.

− ψi 2 .

As an immediate consequence of the above result, we obtain that the collective OMP converges for any space .Vn , even when .J (Vn ) is not finite. The next corollary shows that if .γ > 0, one has .β(Vn , Wm ) ≥ γ for m large enough. Corollary 2.1 For any n dimensional space .Vn , the application of the collective OMP algorithm on the space .Vn gives that .limm→+∞ rm = 0.

2.5.2 A Worst Case OMP Algorithm We present in this section a variant of the previous collective OMP algorithm first tested in [8], and then analyzed in [10]. In numerical experiments this variant performs better than the collective OMP algorithm, however its analysis is more delicate. In particular we do not obtain convergence bounds that are as good.

2.5.2.1

Description of the Algorithm

We first take   vk := argmax v − PWk−1 v : v ∈ Vn , v = 1 ,

.

2 Inverse Problems: A Deterministic Approach Using Physics-Based Reduced Models

99

the vector in the unit ball of .Vn that is less well captured by .Wk−1 and then define ωk by applying one step of OMP to this vector, that is

.

  | vk − PWk−1 vk , ωk | ≥ κmax | vk − PWk−1 vk , ω| : ω ∈ D ,

.

for some fixed .0 < κ < 1.

2.5.2.2

Convergence Analysis

The first result gives a convergence rate of .rm under the assumption that .J (Vn ) < ∞, similar to Theorem 2.2, however with a multiplicative constant that is inflated by .n2 . Theorem 2.4 Assuming that .J (Vn ) < ∞, the worst case OMP algorithm satisfies rm ≤

.

n2 J (Vn )2 (m + 1)−1 , κ2

m ≥ 0.

(2.30)

For the general case, we establish a perturbation result similar to Theorem 2.3, with again a multiplicative constant that depends on the dimension of .Vn . Theorem 2.5 Let . = (φ1 , . . . , φn ) be an orthonormal basis of .Vn and . = (ψ1 , . . . , ψn ) ∈ V n be arbitrary. Then the application of the worst case OMP algorithm on the space .Vn gives rm ≤ 4

.

n2  21 (D) κ2

(m + 1)−1 + n2  −  2 ,

where .  −  2 :=  −  2V n =

n

i=1 φi

m ≥ 1.

− ψi 2 .

By the exact same arguments as in the previous algorithm, we find that that the worst case OMP converges for any space .Vn , even when .J (Vn ) is not finite. Corollary 2.2 For any n dimensional space .Vn , the application of the worst case OMP algorithm on the space .Vn gives that .limm→+∞ rm = 0.

2.5.3 Application to Point Evaluation As a simple example, we consider a bounded univariate interval . = I and take V = H01 (I ) which is continuously embedded in .C(I ). Without loss of generality we take .I =]0, 1[. For every .x ∈]0, 1[, the Riesz representer of .δx is given by the solution of .ω = δx with zero boundary condition. Normalising this solution .ω it

.

100

O. Mula

with respect to the V norm, we obtain ωx (t) =

.

⎧ ⎨ √t (1−x) , x(1−x)

for t ≤ x

⎩ √(1−t)x , x(1−x)

for t > x.

For any set of m distinct points .0 < x1 < · · · < xm < 1, the associated measurement space .Wm = span{ωx1 , . . . , ωxm } coincides with the space of piecewise affine polynomials with nodes at .x1 , . . . , xm that vanish at the boundary. Denoting .x0 := 0 and .xm+1 := 1, we have Wm = {ω ∈ C0 ([0, 1]), ω|[xk ,xk+1 ] ∈ P1 , 0 ≤ k ≤ m, and ω(0) = ω(1) = 0}.

.

As an example for the space .Vn , let us consider the span of the Fourier basis (here orthonormalized in V ), √ 2 .φk := sin(kπ x), πk

1 ≤ k ≤ n.

(2.31)

Let us now estimate .m(n) in this example if we choose the points with the greedy algorithms that we have introduced. This boils down to estimate for .J (Vn ). In this simple case, J (Vn ) :=  1 (D) = inf







cx 2 dx :  =

.

x∈[0,1]

cx ωx dx x∈[0,1]

and we can derive .cx for every .x ∈ [0, 1] by differentiating twice the components of . since   . (x) = cy ωy (x) dy = − cy δy (x) dx = −cx . y∈[0,1]

y∈[0,1]

Thus, using the basis functions .φk defined by (2.31), we have 



J (Vn ) =

n 

.

x∈[0,1]

 = x∈[0,1]



1/2 |φk (x)2 |

dx

k=1 n 

1/2 2

2kπ| sin(kπ x)|

dx ∼ n3/2 .

k=1

Estimate (2.29) for the convergence of the collective OMP approach yields m(n) 

.

n3 , κ 2σ 2

2 Inverse Problems: A Deterministic Approach Using Physics-Based Reduced Models

101

while for the worst case OMP, estimate (2.30) gives m(n) 

.

n5 . κ 2σ 2

These bounds deviate from the optimal estimation due to the use of the HilbertSchmidt norm in the analysis. Numerical results reported in [10] reveal that the greedy algorithms actually behave much better in this case.

2.6 Joint Selection of Vn and Wm 2.6.1 Optimality Benchmark So far, we have studied linear and affine reconstruction algorithms which involve (aff) an affine reduced model space .Vn and an observation space .Wm . In Sect. 2.4 we have fixed the observation space, and we have discussed how to derive the optimal (aff) .Vn , which is directly connected to the optimal affine algorithm of the benchmark that we have introduced in (2.20). In Sect. 2.5 we have examined the “reciprocal” of this problem, namely the case where we fix .Vn and we select sensor measurements .ωi from a dictionary .D. The selection is done in order to build an observation space m that yields stable reconstructions in the sense of minimizing .Wm = span{ωi } i=1 .μ(Vn , Wm ) (or, equivalently, maximizing .β(Vn , Wm )). One can of course envision a combined approach in which we make a joint selection of .Vn and .Wm . Of course, the basis .{ωi }m i=1 spanning .Wm must be selected from a dictionary .D in order to account for the fact that we are working with sensor measurements. One way of defining the best performance that such a joint selection can deliver is given by the following extension of the benchmark (2.20). For a fixed .m ≥ 1, the optimal performance of the joint approach is ∗ Ewca,joint (M, m) =

.

min

min

n n {ωi }m i=1 ∈D A:span{ωi }i=1 →V A affine

Ewc (A, M),

(2.32)

for the case of affine algorithms. Of course, one can similarly define the best performance among all algorithms (affine and nonlinear) by removing the constraint that A is affine in the definition above, that is, ∗ Ewc,joint (M, m) =

.

min

min

n n {ωi }m i=1 ∈D A:span{ωi }i=1 →V A affine

Ewc (A, M).

(2.33)

102

O. Mula

2.6.2 A General Nested Greedy Algorithm Finding the optimal elements .{ωi∗ }m i=1 and the optimal algorithm A∗ : W ∗ → V ,

.

W ∗ := span{ωi∗ }m i=1

that meet (2.32) or (2.33) is a very difficult task, and, to best of the author’s knowledge, this question remains an open problem. There are however a number of practical algorithms that have been proposed in order to perform a satisfactory joint selection of .Vn and .Wm in the framework of affine reconstruction algorithms (see, e.g., [7, 8, 10]). All strategies are based on nested greedy algorithms, and they can be seen as variations of the following general algorithm. Assume that we have fixed a dictionary .D to select the sensors. Fix a minimal admissible value for the inf-sup stability .β > 0. For .n = 1, select u1 ∈ arg max u

.

u∈M

and set V1 := span{u1 }.

.

For the given .V1 , apply the OMP sensor selection strategy from Sect. 2.5.1 or its variant from Sect. 2.5.2. At every iteration .k ≥ 1 of the OMP, we pick an observation function .ωk1 . The iterations stop as soon as we reach a value .k = m(1) such that m(1)

β(V1 , span{ωk }k=1 ) ≥ β.

.

We then set m(1) O1 := {ωk }k=1 ,

.

and

Wm(1) := span{O1 }.

We next proceed by induction. At step .n > 1, assume that we have selected: • the set of functions .{u1 , . . . , un−1 } spanning .Vn−1 := span{u1 , . . . , un−1 }, • the set of observation functions .∪n−1 i=1 Oi spanning Wm(n−1) := span{∪n−1 i=1 Oi }.

.

We select the next function .un and the set .On of observation functions as follows. Consider the linear PBDW reconstruction algorithm .An−1 associated to the spaces .Vn−1 and .Wm(n−1) and find un ∈ arg max u − An−1 (PWm(n−1) u) .

.

u∈M

2 Inverse Problems: A Deterministic Approach Using Physics-Based Reduced Models

103

We next define Vn := span{ui }ni=1 = Vn−1 + span{un }.

.

If .β(Vn , Wm(n−1) ) ≥ β, the stability condition is satisfied at step n without needing to add any extra observation functions. As a consequence, we set .On = ∅. Then we define Wm(n) = span{∪ni=1 Oi } = Wm(n−1) ,

.

and go to step .n + 1. If .β(Vn , Wm(n−1) ) < β, we apply the OMP strategy for the constructed .Vn , taking .Wm(n−1) as the initial measurement space to which we have to add new dimensions. For example, in the case of the worst case OMP, we iteratively select for .k ≥ 1 ωkn = arg max PVn (ω − PWm(n−1) +span{ωi }k−1 ω) .

.

i=1

ω∈D

and we stop the iterations as soon as we reach a value .k = m(n) such that   β Vn , Wm(n−1) + span{ωi }ki=1 ≥ β.

.

Once this criterion is satisfied, we set On := {ωkn }m(n) k=1

.

and we finish iteration n by defining Wm(n) = span{∪ni=1 Oi }.

.

As a termination criterion for our algorithm, we can stop the outer iterations in n as soon as .

max u − An−1 (PWm(n−1) u) < ε u∈M

for a given prescribed tolerance .ε > 0. A straightforward application of the results proven in [26, 38] leads to the following result. It expresses the fact that the reconstruction error with the spaces .Vn and .Wm(n) decays at a comparable rate as the Kolmogorov n-width.

104

O. Mula

Theorem 2.6 Let .An be the linear PBDW algorithm associated to the spaces .Vn and .Wm(n) built with the nested greedy algorithm. Then, for .a, b, q ∈ R∗+ , ! .

dn (M) dn (M)

where .b˜ =

b b+1 ,

!

 n−q e



−anb

Ewc (M, An )

 n−q

Ewc (M, An )

˜ ,  e−an



and .a˜ depends on a and some other technical parameters.

2.6.3 The Generalized Empirical Interpolation Method Among the many variants that one can consider of the above joint selection strategy, one that has drawn particular attention is the so-called Generalized Empirical Interpolation Method (GEIM, [7, 38, 39]). In this method, at every step .n ≥ 1, we add only one observation function. The criterion to select it is close (but not entirely equivalent) to the one of making one single step of the worst case OMP of Sect. 2.5.2. This implies that we prescribe .m(n) = n for all .n ≥ 1, and the dimension of the reduced model .Vn is equal to the one of the observation space .Wn . One consequence of this construction is that one cannot guarantee that .β(Vn , Wn ) remains bounded away from 0. This is in contrast to the algorithm of Sect. 2.6.2. In practice, it has been observed that .β(Vn , Wn ) slowly decreases as .n → ∞ (see, e.g., [7, 39]) but there is no a priori analysis quantifying the rate of decay. The algorithm works as follows (see, e.g., [7]). For .n = 1, select u1 ∈ arg max u

.

u∈M

and set V1 := span{u1 }.

.

The first observation function is defined as ω1 ∈ arg max | ω, u1  |,

.

ω∈D

and we set W1 := span{ω1 }.

.

We then proceed by induction. At step .n > 1, assume that we have selected {u1 , . . . , un−1 } and .{ω1 , . . . , ωn−1 } which respectively span the subspaces .Vn−1 and .Wn−1 . We define .An−1 as the PBDW reconstruction algorithm associated to .Vn−1 .

2 Inverse Problems: A Deterministic Approach Using Physics-Based Reduced Models

105

and .Wn−1 . We choose un ∈ arg max u − An (PWn−1 u) ,

.

u∈M

and then select   ωn ∈ arg max | ω, un − An−1 (PWn−1 u) |.

.

ω∈D

We finally define Vn := Vn−1 + span{un },

.

and

Wn := Wn−1 + span{ωn },

and go the next step .n + 1. The method is called generalized interpolation because we have the interpolatory property that .i (v) = i (An (v)) for .i = 1, . . . , n. Also, for any .v ∈ Vn , .An (v) = v.

2.7 A Piece-Wise Affine Algorithm to Reach the Benchmark Optimality

In this section, we come back to the setting where we work with a fixed observation space $W$ and a fixed number $m$ of observations $z_i = \ell_i(u)$, $i = 1, \ldots, m$. Our goal is to discuss how to go beyond the linear/affine framework of Sects. 2.4 to 2.6, and how to build algorithms that can deliver a performance close to optimal. The simplicity of the plain PBDW method (2.14) and its above variants comes together with a fundamental limitation of performance: since the map $w \mapsto A_n(w)$ is linear or affine, the reconstruction necessarily belongs to an $m$- or $(m+1)$-dimensional space, and therefore the worst case performance is necessarily bounded from below by the Kolmogorov width $d_m(\mathcal{M})$ or $d_{m+1}(\mathcal{M})$. In other words, if we restrict ourselves to affine algorithms, we have

$$\min_{A: W \to V} E_{\mathrm{wc}}(A, \mathcal{M}) \;\leq\; d_{m+1}(\mathcal{M}) \;\leq\; \min_{\substack{A: W \to V \\ A \text{ affine}}} E_{\mathrm{wc}}(A, \mathcal{M}),$$

and affine algorithms will miss optimality especially in cases where

$$\min_{A: W \to V} E_{\mathrm{wc}}(A, \mathcal{M}) \;\ll\; d_{m+1}(\mathcal{M}).$$

This is expected to happen in elliptic problems with weak coercivity or in hyperbolic problems.


In view of this limitation, the principal objective of [12] is to develop nonlinear state estimation techniques which provably overcome the bottleneck of the Kolmogorov width $d_m(\mathcal{M})$. The next pages summarize the main ideas from this contribution. We will focus particularly on a nonlinear recovery method based on a family of affine reduced models $(V_k)_{k=1,\ldots,K}$. Each $V_k$ has dimension $n_k \leq m$ and serves as a local approximation to a portion $\mathcal{M}_k$ of the solution manifold. Applying the PBDW method with each such space results in a collection of state estimators $u^*_k$. Since the value $k$ for which the true state $u$ belongs to $\mathcal{M}_k$ is unknown, we introduce a model selection procedure in order to pick a value $k^*$, and define the resulting estimator $u^* = u^*_{k^*}$. We show that this estimator has performance comparable to optimal in a sense which we make precise later on, and which cannot be achieved by the standard linear/affine PBDW method due to the limitations described above.

Model selection is a classical topic of mathematical statistics [40], with representative techniques such as complexity penalization or cross-validation, in which the data are used to select a proper model. The approach that we present differs from these techniques in that it exploits (in the spirit of data assimilation) the PDE model which is available to us, by evaluating the distance to the manifold,

$$\mathrm{dist}(v, \mathcal{M}) = \min_{y \in Y} \|v - u(y)\|, \tag{2.34}$$

of the different estimators $v = u^*_k$ for $k = 1, \ldots, K$, and picking the value $k^*$ that minimizes it. In practice, the quantity (2.34) cannot be computed exactly, and we instead rely on a computable surrogate quantity $S(v, \mathcal{M})$ expressed in terms of the residual to the PDE. One typical instance where such a surrogate is available and easily computable is when the parametric PDE has the form of a linear operator equation

$$B(y)\,u = f(y),$$

where $B(y)$ is boundedly invertible from $V$ to $V'$, or more generally from $V$ to $Z'$ for a test space $Z$ different from $V$, uniformly over $y \in Y$. Then $S(v, \mathcal{M})$ is obtained by minimizing the residual

$$R(v, y) = \|B(y)v - f(y)\|_{Z'}$$

over $y \in Y$. In other words,

$$S(v, \mathcal{M}) = \min_{y \in Y} R(v, y).$$

This task is greatly facilitated in the case where the operators $B(y)$ and source terms $f(y)$ have affine dependence in $y$. One relevant example is the second order elliptic diffusion equation with affine diffusion coefficient,

$$-\mathrm{div}(a \nabla u) = f(y), \qquad a = a(x; y) = \bar{a}(x) + \sum_{j=1}^{d} y_j\, \psi_j(x).$$
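As an illustration, the surrogate can be evaluated cheaply once the affine terms are pre-assembled. The following is a minimal discrete sketch (hypothetical names; the Euclidean norm stands in for the dual norm $\|\cdot\|_{Z'}$, and the minimum over $Y$ is approximated by a finite sample of parameter values):

```python
import numpy as np

def residual_surrogate(v, B_terms, f_terms, Y_samples):
    """S(v, M) ~ min over a sample of Y of ||B(y) v - f(y)||, assuming the
    affine forms B(y) = B_0 + sum_j y_j B_j and f(y) = f_0 + sum_j y_j f_j."""
    Bv = np.stack([B @ v for B in B_terms])          # y-independent products
    F = np.stack(f_terms)
    best = np.inf
    for y in Y_samples:                              # e.g. a coarse grid on Y
        coeff = np.concatenate(([1.0], np.atleast_1d(y)))
        r = coeff @ Bv - coeff @ F                   # residual B(y) v - f(y)
        best = min(best, float(np.linalg.norm(r)))
    return best
```

Because the products $B_j v$ and the vectors $f_j$ are computed once, each parameter query costs only a few vector operations; this is exactly the benefit of the affine dependence noted above.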

2.7.1 Optimality Benchmark Under Perturbations

In order to present the piece-wise affine strategy and its performance, we need to enrich the notions of benchmark optimality introduced in Sect. 2.3. In that section, we introduced in (2.11) the quantity $\delta_0$, defined as

$$\delta_0 = \delta_0(\mathcal{M}, W) := \sup\{\mathrm{diam}(\mathcal{M}_w) : w \in W\} = \sup\{\|u - v\| : u, v \in \mathcal{M},\; u - v \in W^\perp\}.$$

We saw in (2.12) that $\delta_0$ can be related to the worst-case optimal performance $E^*_{\mathrm{wc}}(\mathcal{M})$ by the equivalence

$$\tfrac{1}{2}\delta_0 \leq E^*_{\mathrm{wc}}(\mathcal{M}) \leq \delta_0.$$

We next introduce a somewhat relaxed benchmark quantity to take into account the fact that computationally feasible algorithms usually introduce simplifications of the geometry of the manifold. In the case of the plain PBDW, the simplification is that the manifold is "replaced" by a linear or an affine subspace $V_n$, so that for most practical and theoretical purposes, $\mathcal{M}$ could be replaced by the cylinder $\mathcal{K}_n$ introduced in (2.15). As we will see later on, the relaxed benchmark will also allow us to take into account model error and measurement noise in the analysis.

In order to account for manifold simplification as well as model bias, for any given accuracy $\sigma > 0$, we introduce the $\sigma$-offset of $\mathcal{M}$,

$$\mathcal{M}^\sigma := \{v \in V : \mathrm{dist}(v, \mathcal{M}) \leq \sigma\} = \bigcup_{u \in \mathcal{M}} B(u, \sigma),$$

where $B(u, \sigma)$ is the ball of center $u$ and radius $\sigma$. Likewise, we introduce the set

$$\mathcal{M}^\sigma_w = \mathcal{M}^\sigma \cap (w + W^\perp),$$

which is a perturbed version of the set $\mathcal{M}_w$ introduced in (2.8) (note that this set still excludes uncertainties in $w$, but we will come to this in a moment).


[Fig. 2.3 Illustration of the optimal recovery benchmark on a manifold in the two-dimensional Euclidean space. Left: benchmark for the ideal scenario of a perfect model and noiseless observations. Right: how the benchmark is degraded by an abstract factor $\sigma$ associated to the model error and the observation noise.]

Our benchmark for the worst case error is now defined as

$$\delta_\sigma := \max_{w \in W} \mathrm{diam}(\mathcal{M}^\sigma_w) = \max\{\|u - v\| : u, v \in \mathcal{M}^\sigma,\; u - v \in W^\perp\}. \tag{2.35}$$

Figure 2.3a and b gives an illustration of $\delta_0$, $\delta_\sigma$ and the optimal scheme $A^*_{\mathrm{wc}}$ based on Chebyshev centers which was introduced in (2.10). To account for measurement noise, we introduce the quantity

$$\tilde{\delta}_\sigma := \max\{\|u - v\| : u, v \in \mathcal{M},\; \|P_W u - P_W v\| \leq \sigma\}.$$

The two quantities $\delta_\sigma$ and $\tilde{\delta}_\sigma$ are not equivalent; however, one has the framing

$$\delta_\sigma - 2\sigma \leq \tilde{\delta}_{2\sigma} \leq \delta_\sigma + 2\sigma.$$

In the following analysis of reconstruction methods, we use the quantity $\delta_\sigma$ as a benchmark which, in view of this last observation, also accounts for the lack of accuracy in the measurement of $P_W u$. Our objective is therefore to design an algorithm that, for a given tolerance $\sigma > 0$, recovers from the measurement $w = P_W u$ an approximation to $u$ with accuracy comparable to $\delta_\sigma$. Such an algorithm requires that we are able to capture the solution manifold up to some tolerance $\varepsilon \leq \sigma$ by some reduced model.


2.7.2 Piecewise Affine Reduced Models

Linear or affine reduced models, as used in the affine PBDW algorithm, are not suitable for approximating the solution manifold when the required tolerance $\varepsilon$ is too small. In particular, the dimension $n$ needed to ensure $\mathrm{dist}(\mathcal{M}, V_n) \leq \varepsilon$ may then exceed $m$, therefore making $\mu(V_n, W)$ infinite. One way out is to replace the single space $V_n$ by a family of affine spaces

$$V_k = \bar{u}_k + \bar{V}_k, \qquad k = 1, \ldots, K,$$

each of them having dimension $\dim(V_k) = n_k \leq m$, such that the manifold is well captured by the union of these spaces, in the sense that

$$\mathrm{dist}\Big(\mathcal{M},\, \bigcup_{k=1}^{K} V_k\Big) \leq \varepsilon$$

for some prescribed tolerance $\varepsilon > 0$. This is equivalent to saying that there exists a partition of the solution manifold,

$$\mathcal{M} = \bigcup_{k=1}^{K} \mathcal{M}_k,$$

such that we have local certified bounds

$$\mathrm{dist}(\mathcal{M}_k, V_k) \leq \varepsilon_k \leq \varepsilon, \qquad k = 1, \ldots, K.$$

We may thus think of the family $(V_k)_{k=1,\ldots,K}$ as a piecewise affine approximation to $\mathcal{M}$. We stress that, in contrast to the hierarchies $(V_n)_{n=0,\ldots,m}$ produced by reduced modeling algorithms, the spaces $V_k$ do not have dimension $k$ and are not nested. Most importantly, $K$ is not limited by $m$, while each $n_k$ is.

The objective of using a piecewise reduced model in the context of state estimation is to have a joint control on the local accuracies $\varepsilon_k$, as expressed by the covering condition above, and on the stability of the PBDW method when using any individual $V_k$. This means that, for some prescribed $\mu > 1$, we ask that

$$\mu_k = \mu(\bar{V}_k, W) \leq \mu, \qquad k = 1, \ldots, K. \tag{2.36}$$

According to (2.16), the worst case error bound over $\mathcal{M}_k$ when using the PBDW method with a space $V_k$ is given by the product $\mu_k \varepsilon_k$. This suggests to alternatively


require from the collection $(V_k)_{k=1,\ldots,K}$ that, for some prescribed $\sigma > 0$, one has

$$\sigma_k := \mu_k \varepsilon_k \leq \sigma, \qquad k = 1, \ldots, K. \tag{2.37}$$

This leads us to the following definitions.

Definition 2.2 The family $(V_k)_{k=1,\ldots,K}$ is $\sigma$-admissible if (2.37) holds. It is $(\varepsilon,\mu)$-admissible if the covering condition above and (2.36) are jointly satisfied.

Obviously, any $(\varepsilon,\mu)$-admissible family is $\sigma$-admissible with $\sigma := \mu\varepsilon$. In this sense, the notion of $(\varepsilon,\mu)$-admissibility is more restrictive than that of $\sigma$-admissibility. The benefit of the first notion is the uniform control on the size of $\mu$, which is critical in the presence of noise.

If $u \in \mathcal{M}$ is our unknown state and $w = P_W u$ is its observation, we may apply the PBDW method for the different $V_k$ in the given family, which yields a corresponding family of estimators

$$u^*_k = u^*_k(w) = \mathrm{argmin}\{\mathrm{dist}(v, V_k) : v \in w + W^\perp\}, \qquad k = 1, \ldots, K. \tag{2.38}$$

If $(V_k)_{k=1,\ldots,K}$ is $\sigma$-admissible, we find that the accuracy bound

$$\|u - u^*_k\| \leq \mu_k\, \mathrm{dist}(u, V_k) \leq \mu_k \varepsilon_k = \sigma_k \leq \sigma$$

holds whenever $u \in \mathcal{M}_k$. Therefore, if in addition to the observed data $w$ one had an oracle giving the information on which portion $\mathcal{M}_k$ of the manifold the unknown state sits, we could derive an estimator with worst case error

$$E_{\mathrm{wc}} \leq \sigma.$$

This information is, however, not available, and such a worst case error estimate cannot be hoped for, even up to an additional multiplicative constant. Indeed, as we shall see below, $\sigma$ can be made arbitrarily small by the user when building the family $(V_k)_{k=1,\ldots,K}$, while we know from (2.12) that the worst case error is bounded from below by $E^*_{\mathrm{wc}}(\mathcal{M}) \geq \frac{1}{2}\delta_0$, which could be non-zero. We will thus need to replace the ideal choice of $k$ by a model selection procedure based only on the data $w$, that is, a map

$$w \mapsto k^*(w),$$

leading to the choice of estimator $u^* = u^*_{k^*} = A_{k^*}(w)$. We shall prove further that such an estimator is able to achieve the accuracy

$$E_{\mathrm{wc}}(A_{k^*}, \mathcal{M}) \leq \delta_\sigma,$$


that is, the benchmark introduced in Sect. 2.3. Before discussing this model selection, we discuss the existence and construction of $\sigma$-admissible or $(\varepsilon,\mu)$-admissible families.

2.7.3 Constructing Admissible Reduced Model Families

For any arbitrary choice of $\varepsilon > 0$ and $\mu \geq 1$, the existence of an $(\varepsilon,\mu)$-admissible family results from the following observation: since the manifold $\mathcal{M}$ is a compact set of $V$, there exists a finite $\varepsilon$-cover of $\mathcal{M}$, that is, a family $\bar{u}_1, \ldots, \bar{u}_K \in V$ such that

$$\mathcal{M} \subset \bigcup_{k=1}^{K} B(\bar{u}_k, \varepsilon),$$

or equivalently, for all $v \in \mathcal{M}$, there exists a $k$ such that $\|v - \bar{u}_k\| \leq \varepsilon$. With such an $\varepsilon$-cover, we consider the family of trivial affine spaces defined by

$$V_k = \{\bar{u}_k\} = \bar{u}_k + \bar{V}_k, \qquad \bar{V}_k = \{0\},$$

thus with $n_k = 0$ for all $k$. The covering property implies that the accuracy condition holds. On the other hand, for the zero-dimensional space one has

$$\mu(\{0\}, W) = 1,$$

and therefore (2.36) also holds. The family $(V_k)_{k=1,\ldots,K}$ is therefore $(\varepsilon,\mu)$-admissible, and also $\sigma$-admissible with $\sigma = \varepsilon$.

This family is however not satisfactory for algorithmic purposes, for two main reasons. First, the manifold is not explicitly given to us, and the construction of the centers $\bar{u}_k$ is by no means trivial. Second, asking for an $\varepsilon$-cover would typically require that $K$ becomes extremely large as $\varepsilon$ goes to 0. For example, assuming that the parameter-to-solution map $y \mapsto u(y)$ has Lipschitz constant $L$,

$$\|u(y) - u(\tilde{y})\| \leq L\,|y - \tilde{y}|, \qquad y, \tilde{y} \in Y,$$

for some norm $|\cdot|$ of $\mathbb{R}^d$, then an $\varepsilon$-cover for $\mathcal{M}$ would be induced by an $L^{-1}\varepsilon$-cover for $Y$, which has cardinality $K$ growing like $\varepsilon^{-d}$ as $\varepsilon \to 0$. Having a family of moderate size $K$ is important for the estimation procedure, since we intend to apply the PBDW method for all $k = 1, \ldots, K$.

In order to construct $(\varepsilon,\mu)$-admissible or $\sigma$-admissible families of better controlled size, we need to split the manifold in a more economical manner than through an $\varepsilon$-cover, and use spaces $V_k$ of general dimensions $n_k \in \{0, \ldots, m\}$ for the various manifold portions $\mathcal{M}_k$. To this end, we combine standard constructions of


linear reduced model spaces with an iterative splitting procedure operating on the parameter domain $Y$. Let us mention that various ways of splitting the parameter domain have already been considered in order to produce local reduced bases having both controlled cardinality and prescribed accuracy [41–43]. However, these works are devoted to forward model reduction, according to the terminology that we introduced in Sect. 2.2. Here our goal is different, since we want to control both the accuracy $\varepsilon$ and the stability $\mu$ with respect to the measurement space $W$.

We describe the greedy algorithm for constructing $\sigma$-admissible families, and explain how it should be modified for $(\varepsilon,\mu)$-admissible families. For simplicity, we consider the case where $Y$ is a rectangular domain with sides parallel to the main axes, the extension to a more general bounded domain $Y$ being done by embedding it in such a hyper-rectangle. We are given a prescribed target value $\sigma > 0$, and the splitting procedure starts from $Y$. At step $j$, a disjoint partition of $Y$ into rectangles $(Y_k)_{k=1,\ldots,K_j}$ with sides parallel to the main axes has been generated. It induces a partition of $\mathcal{M}$ given by

$$\mathcal{M}_k := \{u(y) : y \in Y_k\}, \qquad k = 1, \ldots, K_j.$$

To each $k \in \{1, \ldots, K_j\}$ we associate a hierarchy of affine reduced basis spaces

$$V_{n,k} = \bar{u}_k + \bar{V}_{n,k}, \qquad n = 0, \ldots, m,$$

where $\bar{u}_k = u(\bar{y}_k)$, with $\bar{y}_k$ the center of the rectangle $Y_k$. The nested linear spaces

$$\bar{V}_{0,k} \subset \bar{V}_{1,k} \subset \cdots \subset \bar{V}_{m,k}, \qquad \dim(\bar{V}_{n,k}) = n,$$

are meant to approximate the translated portion of the manifold $\mathcal{M}_k - \bar{u}_k$. For example, they could be reduced basis spaces obtained by applying the greedy algorithm to $\mathcal{M}_k - \bar{u}_k$, or spaces resulting from local $n$-term polynomial approximations of $u(y)$ on the rectangle $Y_k$. Each space $V_{n,k}$ has a given accuracy bound and stability constant,

$$\mathrm{dist}(\mathcal{M}_k, V_{n,k}) \leq \varepsilon_{n,k} \qquad \text{and} \qquad \mu_{n,k} := \mu(\bar{V}_{n,k}, W).$$

We define the test quantity

$$\tau_k = \min_{n=0,\ldots,m} \mu_{n,k}\,\varepsilon_{n,k}.$$

If $\tau_k \leq \sigma$, the rectangle $Y_k$ is not split and becomes a member of the final partition. The affine space associated to $\mathcal{M}_k$ is

$$V_k = \bar{u}_k + \bar{V}_k,$$


where $\bar{V}_k = \bar{V}_{n,k}$ for the value of $n$ that minimizes $\mu_{n,k}\varepsilon_{n,k}$. The rectangles $Y_k$ with $\tau_k > \sigma$ are, on the other hand, split into a finite number of sub-rectangles in a way that we discuss below. This results in the new larger partition $(Y_k)_{k=1,\ldots,K_{j+1}}$, after relabelling the $Y_k$. The algorithm terminates at step $j$ as soon as $\tau_k \leq \sigma$ for all $k = 1, \ldots, K_j = K$, and the family $(V_k)_{k=1,\ldots,K}$ is then $\sigma$-admissible. In order to obtain an $(\varepsilon,\mu)$-admissible family, we simply modify the test quantity $\tau_k$ by defining it instead as

$$\tau_k := \min_{n=0,\ldots,m} \max\Big\{ \frac{\mu_{n,k}}{\mu},\, \frac{\varepsilon_{n,k}}{\varepsilon} \Big\},$$

and splitting the cells for which $\tau_k > 1$.

The splitting of one single rectangle $Y_k$ can be performed in various ways. When the parameter dimension $d$ is moderate, we may subdivide each side-length at the mid-point, resulting in $2^d$ sub-rectangles of equal size. This splitting becomes too costly as $d$ gets large, in which case it is preferable to make a choice of $i \in \{1, \ldots, d\}$ and subdivide $Y_k$ at the mid-point of the side-length in the $i$-th coordinate, resulting in only 2 sub-rectangles. In order to decide which coordinate to pick, we consider the $d$ possibilities and take the value of $i$ that minimizes the quantity

$$\tau_{k,i} = \max\{\tau^-_{k,i}, \tau^+_{k,i}\},$$

where $(\tau^-_{k,i}, \tau^+_{k,i})$ are the values of $\tau_k$ for the two sub-rectangles obtained by splitting along the $i$-th coordinate. In other words, we split in the direction that decreases $\tau_k$ most effectively. In order to be certain that all side-lengths are eventually split, we can mitigate the greedy choice of $i$ in the following way: if $Y_k$ has been generated by $l$ consecutive refinements, and therefore has volume $|Y_k| = 2^{-l}|Y|$, and if $l$ is even, we choose $i = (l/2 \bmod d)$. This means that at each even level we split in a cyclic manner through the coordinates $i \in \{1, \ldots, d\}$.

Using such elementary splitting rules, we are ensured that the algorithm must terminate. Indeed, we are guaranteed that for any $\eta > 0$, there exists a level $l = l(\eta)$ such that any rectangle $Y_k$ generated by $l$ consecutive refinements has side-length smaller than $2\eta$ in each direction. Since the parameter-to-solution map is assumed to be continuous, for any $\varepsilon > 0$ we can pick $\eta > 0$ such that

$$\|y - \tilde{y}\|_\infty \leq \eta \;\Longrightarrow\; \|u(y) - u(\tilde{y})\| \leq \varepsilon, \qquad y, \tilde{y} \in Y.$$

Applying this to $y \in Y_k$ and $\tilde{y} = \bar{y}_k$, we find that for $\bar{u}_k = u(\bar{y}_k)$,

$$\|u - \bar{u}_k\| \leq \varepsilon, \qquad u \in \mathcal{M}_k.$$

Therefore, for any rectangle $Y_k$ of generation $l$, the trivial affine space $V_k = \{\bar{u}_k\}$ has local accuracy $\varepsilon_k \leq \varepsilon$ and $\mu_k = \mu(\{0\}, W) = 1 \leq \mu$, which implies that such a rectangle would not be refined any further by the algorithm.
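A minimal sketch of this adaptive splitting loop is given below, under the assumption that a routine `tau(a, b)` is available which returns the test quantity $\tau_k = \min_n \mu_{n,k}\varepsilon_{n,k}$ for the cell $[a, b]$ (for example computed from a local reduced basis); all names are hypothetical, and `max_level` is only a safety guard since the continuity argument above already guarantees termination.

```python
import numpy as np

def build_sigma_admissible(lo, hi, tau, sigma, max_level=30):
    """Greedy construction of a sigma-admissible partition of Y = [lo, hi]."""
    cells = [(np.asarray(lo, float), np.asarray(hi, float), 0)]
    accepted = []
    while cells:
        a, b, lvl = cells.pop()
        if tau(a, b) <= sigma or lvl >= max_level:
            accepted.append((a, b))          # cell joins the final partition
            continue
        d = len(a)
        if lvl % 2 == 0:
            i = (lvl // 2) % d               # cyclic direction at even levels
        else:
            def child_tau(j):                # greedy direction otherwise
                mid = 0.5 * (a[j] + b[j])
                b1, a2 = b.copy(), a.copy()
                b1[j], a2[j] = mid, mid
                return max(tau(a, b1), tau(a2, b))
            i = min(range(d), key=child_tau)
        mid = 0.5 * (a[i] + b[i])
        b1, a2 = b.copy(), a.copy()
        b1[i], a2[i] = mid, mid
        cells += [(a, b1, lvl + 1), (a2, b, lvl + 1)]
    return accepted
```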


2.7.4 Reduced Model Selection and Recovery Bounds

We return to the problem of selecting an estimator within the family $(u^*_k)_{k=1,\ldots,K}$ defined by (2.38). In an idealized version, the selection procedure picks the value $k^*$ that minimizes the distance of $u^*_k$ to the solution manifold, that is,

$$k^* = \mathrm{argmin}\{\mathrm{dist}(u^*_k, \mathcal{M}) : k = 1, \ldots, K\}, \tag{2.39}$$

and takes for the final estimator

$$u^* = u^*(w) := A_{k^*}(w) = u^*_{k^*}(w). \tag{2.40}$$

Note that $k^*$ also depends on the observed data $w$. This estimation procedure is not realistic, since the computation of the distance of a known function $v$ to the manifold,

$$\mathrm{dist}(v, \mathcal{M}) = \min_{y \in Y} \|u(y) - v\|, \tag{2.41}$$

is a high-dimensional non-convex problem which requires exploring the whole solution manifold. A more realistic procedure is based on replacing this distance by a surrogate quantity $S(v, \mathcal{M})$ that is easily computable and satisfies a uniform equivalence

$$r\,\mathrm{dist}(v, \mathcal{M}) \leq S(v, \mathcal{M}) \leq R\,\mathrm{dist}(v, \mathcal{M}), \qquad v \in V,$$

for some constants $0 < r \leq R$. We then instead take for $k^*$ the value that minimizes this surrogate, that is,

$$k^* = \mathrm{argmin}\{S(u^*_k, \mathcal{M}) : k = 1, \ldots, K\}. \tag{2.42}$$
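The complete nonlinear recovery procedure then combines the local PBDW estimators with this surrogate-based selection. A minimal sketch, reusing the hypothetical `pbdw` and `residual_surrogate` helpers from the earlier sketches:

```python
import numpy as np

def piecewise_pbdw_estimate(u_obs, families, surrogate):
    """Nonlinear recovery by model selection, following (2.38) and (2.42):
    run the affine PBDW method on every local model and keep the candidate
    whose (surrogate) distance to the manifold is smallest.

    u_obs     : discrete state vector from which observations P_W u are taken
    families  : list of local affine models (u_bar, Vb, Wb), cf. Sect. 2.7.2
    surrogate : callable v -> S(v, M), e.g. residual_surrogate above
    """
    candidates = [u_bar + pbdw(u_obs - u_bar, Vb, Wb)    # affine PBDW (2.38)
                  for (u_bar, Vb, Wb) in families]
    scores = [surrogate(v) for v in candidates]          # S(u_k^*, M)
    k_star = int(np.argmin(scores))                      # selection (2.42)
    return candidates[k_star], k_star
```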

Before discussing the derivation of $S(v, \mathcal{M})$ in concrete cases, we establish a recovery bound in the absence of model bias and noise.

Theorem 2.7 Assume that the family $(V_k)_{k=1,\ldots,K}$ is $\sigma$-admissible for some $\sigma > 0$. Then the idealized estimator based on (2.39), (2.40) satisfies the worst case error estimate

$$E_{\mathrm{wc}}(A_{k^*}, \mathcal{M}) = \max_{u \in \mathcal{M}} \|u - u^*(P_W u)\| \leq \delta_\sigma,$$

where $\delta_\sigma$ is the benchmark quantity defined in (2.35). When using the estimator based on (2.42), the worst case error estimate is modified into

$$E_{\mathrm{wc}}(A_{k^*}, \mathcal{M}) \leq \delta_{\kappa\sigma}, \qquad \kappa = \frac{R}{r} > 1.$$


In the above result, we do not obtain the best possible accuracy attained by the different $u^*_k$, since we do not have an oracle providing the information on the best choice of $k$. We can show that this order of accuracy is attained in the particular case where the measurement map $P_W$ is injective on $\mathcal{M}$ (which implies $\delta_0 = 0$).

Theorem 2.8 Assume that $\delta_0 = 0$ and that

$$\mu(\mathcal{M}, W) = \frac{1}{2}\sup_{\sigma > 0} \frac{\delta_\sigma}{\sigma} < \infty.$$

Then, for any given state $u \in \mathcal{M}$ with observation $w = P_W u$, the estimator $u^*$ obtained by the model selection procedure (2.42) satisfies the oracle bound

$$\|u - u^*\| \leq C \min_{k=1,\ldots,K} \|u - u^*_k\|, \qquad C := 2\mu(\mathcal{M}, W)\,\kappa.$$

In particular, if $(V_k)_{k=1,\ldots,K}$ is $\sigma$-admissible, it satisfies

$$\|u - u^*\| \leq C\sigma.$$

The next theorem outlines how to incorporate model bias and noise into the recovery bound, provided that we have a control on the stability of the PBDW method through a uniform bound on $\mu_k$, which holds when we use $(\varepsilon,\mu)$-admissible families.

Theorem 2.9 Assume that the family $(V_k)_{k=1,\ldots,K}$ is $(\varepsilon,\mu)$-admissible for some $\varepsilon > 0$ and $\mu \geq 1$. If the observation is $w = P_W u + \eta$ with $\|\eta\| \leq \varepsilon_{\mathrm{noise}}$, and if the true state does not lie in $\mathcal{M}$ but satisfies $\mathrm{dist}(u, \mathcal{M}) \leq \varepsilon_{\mathrm{model}}$, then the estimator based on (2.42) satisfies the estimate

$$\|u - u^*(w)\| \leq \delta_{\kappa\rho} + \varepsilon_{\mathrm{noise}}, \qquad \rho := \mu(\varepsilon + \varepsilon_{\mathrm{noise}}) + (\mu + 1)\varepsilon_{\mathrm{model}}, \qquad \kappa = \frac{R}{r},$$

and the idealized estimator based on (2.39) satisfies a similar estimate with $\kappa = 1$.

2.8 Bibliographical Remarks/Connections with Other Works

2.8.1 A Bit of History on the Use of Reduced Models to Solve Inverse Problems

We often think of reduced order models only as a vehicle to speed up calculations in forward reduced modeling tasks, according to the terminology that we introduced in Sect. 2.3. However, reduced order models $V_n$ also play a very prominent role in the inverse problem approach that we have presented. They are the main vehicle


for building implementable reconstruction algorithms whose performance can be proven to be close to optimal.

In fact, the idea of using reduced models to solve inverse problems actually has a relatively long history. It can be traced back at least to the gappy POD method, first introduced in [44] by Everson and Sirovich. There, the authors address the problem of restoring a full image from partial pixel observations by using a least squares strategy involving a reconstruction on linear spaces obtained by PCA. The same strategy was then brought to other fields such as fluid and structural applications, see [45]. The introduction of a reduced model can be seen as an improvement over working with one single background function, as is done in methods such as 3D-VAR, see [46, 47]. In contrast to the present work and the PBDW method in general, the gappy POD method is formulated on the Euclidean space $V = \mathbb{R}^N$, with $N \in \mathbb{N}$ typically much larger than $m$ and $n$. It uses linear reduced models $V_n$ obtained by PCA, and measurement observations are typically point-wise vector entries, that is, $\omega_i = e_i$ with $e_i \in \mathbb{R}^N$ the $i$-th unit vector. For that particular choice of ambient space and reduced models, the linear PBDW method is very close to gappy POD. It is however not entirely equivalent, since PBDW contains a component in $W \cap V_n^\perp$ which is missing in gappy POD. In the case of a general Hilbert space, the linear PBDW method is connected to the Generalized Empirical Interpolation Method, as we have outlined in Sect. 2.6.3.

It is also interesting to note that the linear PBDW reconstruction algorithm (2.14) was proposed simultaneously in the field of model order reduction and by researchers seeking to build infinite dimensional generalizations of compressed sensing (see [48]). In the applications of this community, $V_n$ is usually chosen to be a "multi-purpose" basis such as the Fourier basis, as opposed to our currently envisaged applications in which $V_n$ is a subspace specifically tailored to approximate $\mathcal{M}$. However, the results that we have summarized here are general, and they remain valid also for these types of "multi-purpose" subspaces.

In the above landscape of methods, the piecewise affine extension of PBDW of Sect. 2.7 can be interpreted as a further generalization step which comes with optimal reconstruction guarantees. The strategy is based on an offline partitioning of the manifold $\mathcal{M}$ in which, for each element of the partition, we compute reduced models. We then decide with a data-driven approach which reduced model is the most appropriate for the reconstruction. The idea of partitioning the manifold and working with different reduced order models for each element of the partition is new for the purpose of addressing inverse problems. It has however been explored in works that focus on the forward modeling problem, see, e.g., [49–52]. For forward modeling, the piece-wise strategy enters the general topic of nonlinear forward model reduction, for which little is known in terms of performance guarantees. A first step towards a cohesive theory for nonlinear forward model reduction has recently been proposed in [43], in relation with the general concept of library widths [53].


2.8.2 For Further Reading

• Noise and physical model error: For readers interested in further aspects connected to noise, we refer to [54] for a study on optimal benchmarks with noise. Some algorithms that attempt to perform denoising have been presented in [55–58]. A contribution that aims to learn physical model corrections can be found in [59].

• Beyond the Hilbertian framework and piecewise affine approximations: The general framework of optimal recovery that we have introduced in Sect. 2.3 can be extended to general Banach spaces and to general nonlinear approximation spaces (see [60, 61]).

• GEIM and variants: The GEIM can also be formulated in general Banach spaces (see [38]). This justifies why GEIM is a generalization of the celebrated EIM originally introduced in [62] (see also, e.g., [63]): if we work with a manifold in the Banach space of continuous functions $V = C(\Omega)$ with the sup-norm

$$\|v\|_\infty := \sup_{x \in \Omega} |v(x)|, \qquad \forall v \in C(\Omega),$$

GEIM boils down to EIM when we use the dictionary composed of pointwise evaluations,

$$\mathcal{D} = \{\delta_x : x \in \Omega\}.$$

EIM and GEIM strongly interweave forward and inverse problems, since the exact same algorithm can be applied for both purposes. EIM was originally introduced to address forward model reduction of nonlinear PDEs. It can also be applied as a reconstruction algorithm as outlined in Sect. 2.6.3, and GEIM allows one to apply it in essentially any functional setting.

• Applications: Among the applicative problems that have been addressed with the present inverse problem approach, we can cite:
  – Acoustics problems: [8].
  – Biomedical problems: [64–66].
  – Air quality: [67].
  – Nuclear engineering: [57, 68].
  – Welding: [69].


Appendix 1: Practical Computation of $A_n$, the Linear PBDW Algorithm

Let $X$ and $Y$ be two finite dimensional subspaces of $V$ and let

$$P_{X|Y} : Y \to X, \qquad y \mapsto P_{X|Y}(y),$$

be the orthogonal projection onto $X$ restricted to $Y$. That is, for any $y \in Y$, $P_{X|Y}(y)$ is the unique element $x \in X$ such that

$$\langle y - x, \tilde{x} \rangle = 0, \qquad \forall \tilde{x} \in X.$$

Lemma 2.2 Let $W_m$ be an observation space and $V_n$ a reduced basis space of dimension $n \leq m$ such that $\beta(V_n, W_m) > 0$. Then the linear PBDW algorithm defined in (2.14) is given by

$$A_n(\omega) = \omega + v^*_{m,n} - P_{W_m} v^*_{m,n}, \qquad \text{with} \qquad v^*_{m,n} = \big(P_{V_n|W_m} P_{W_m|V_n}\big)^{-1} P_{V_n|W_m}(\omega). \tag{2.43}$$

Proof By formula (2.14), $A_n(\omega)$ is a minimizer of

$$\min_{u \in \omega + W_m^\perp} \mathrm{dist}(u, V_n)^2 = \min_{u \in \omega + W_m^\perp} \min_{v \in V_n} \|u - v\|^2 = \min_{v \in V_n} \min_{\eta \in W_m^\perp} \|\omega + \eta - v\|^2$$

$$= \min_{v \in V_n} \|\omega - v - P_{W_m^\perp}(\omega - v)\|^2 = \min_{v \in V_n} \|\omega - v + P_{W_m^\perp}(v)\|^2 = \min_{v \in V_n} \|\omega - P_{W_m}(v)\|^2.$$

The last minimization problem is a classical least squares optimization. Any minimizer $v^*_{m,n} \in V_n$ satisfies the normal equations

$$P^*_{W_m|V_n} P_{W_m|V_n}\, v^*_{m,n} = P^*_{W_m|V_n}\, \omega,$$

where $P^*_{W_m|V_n} : W_m \to V_n$ is the adjoint operator of $P_{W_m|V_n}$. Note that $\beta(V_n, W_m) = \min_{v \in V_n} \|P_{W_m|V_n} v\| / \|v\| > 0$ implies that $P_{W_m|V_n}$ is injective, so that $P^*_{W_m|V_n} P_{W_m|V_n}$ is invertible on $V_n$. Furthermore, since for any $\omega \in W_m$ and $v \in V_n$,

$$\langle P_{W_m|V_n} v, \omega \rangle = \langle v, \omega \rangle = \langle v, P_{V_n|W_m} \omega \rangle,$$

it follows that $P^*_{W_m|V_n} = P_{V_n|W_m}$, which finally yields that the unique solution of the least squares problem is

$$v^*_{m,n} = \big(P_{V_n|W_m} P_{W_m|V_n}\big)^{-1} P_{V_n|W_m}\, \omega.$$

Therefore $A_n(\omega) = \omega + \eta^*_{m,n} = \omega + v^*_{m,n} - P_{W_m} v^*_{m,n}$. □

Algebraic Formulation

The explicit expression (2.43) for $v^*_{m,n}$ allows us to easily derive its algebraic formulation. Let $F$ and $H$ be two finite-dimensional subspaces of the Hilbert space $V$, of dimensions $n$ and $m$ respectively, and let $\mathcal{F} = \{f_i\}_{i=1}^n$ and $\mathcal{H} = \{h_i\}_{i=1}^m$ be bases of these subspaces. The Gram matrix associated to $\mathcal{F}$ and $\mathcal{H}$ is

$$G(\mathcal{F}, \mathcal{H}) = \big(\langle f_i, h_j \rangle\big)_{1 \leq i \leq n,\; 1 \leq j \leq m}.$$

These matrices are useful to express the orthogonal projection $P_{F|H} : H \to F$ in the bases $\mathcal{F}$ and $\mathcal{H}$ in terms of the matrix

$$\mathbf{P}_{F|H} = G(\mathcal{F}, \mathcal{F})^{-1} G(\mathcal{F}, \mathcal{H}). \tag{2.44}$$

As a consequence, if $\mathcal{V}_n = \{v_i\}_{i=1}^n$ is a basis of the space $V_n$ and $\mathcal{W}_m = \{\omega_i\}_{i=1}^m$ is the basis of $W_m$ formed by the Riesz representers of the linear functionals $\{\ell_i\}_{i=1}^m$, the coefficients $\mathbf{v}^*_{m,n}$ of the function $v^*_{m,n}$ in the basis $\mathcal{V}_n$ are the solution to the normal equations

$$\mathbf{P}_{V_n|W_m}\,\mathbf{P}_{W_m|V_n}\,\mathbf{v}^*_{m,n} = \mathbf{P}_{V_n|W_m}\, G(\mathcal{W}_m, \mathcal{W}_m)^{-1}\,\mathbf{w},$$

where $\mathbf{w}$ is the vector of measurement observations,

$$\mathbf{w} = \big(\langle u, \omega_i \rangle\big)_{i=1}^m,$$

and, from formula (2.44),

$$\mathbf{P}_{V_n|W_m} = G(\mathcal{V}_n, \mathcal{V}_n)^{-1} G(\mathcal{V}_n, \mathcal{W}_m), \qquad \mathbf{P}_{W_m|V_n} = G(\mathcal{W}_m, \mathcal{W}_m)^{-1} G(\mathcal{W}_m, \mathcal{V}_n).$$

Usually $\mathbf{v}^*_{m,n}$ is computed with a QR decomposition or another suitable method. Once $\mathbf{v}^*_{m,n}$ is found, the vector of coefficients $\mathbf{u}^*_{m,n}$ of $A_n(\omega)$ easily follows.
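In matrix form, this computation takes only a few lines. The following sketch (a hypothetical helper; a plain linear solve stands in for the QR-based approach mentioned above) assembles the normal equations directly from the Gram matrices:

```python
import numpy as np

def pbdw_coefficients(G_VV, G_VW, G_WW, w):
    """Solve the PBDW normal equations in the algebraic form above.

    G_VV : G(V_n, V_n), shape (n, n)
    G_VW : G(V_n, W_m), shape (n, m)
    G_WW : G(W_m, W_m), shape (m, m)
    w    : observation vector (<u, omega_i>)_i, shape (m,)
    """
    P_VW = np.linalg.solve(G_VV, G_VW)        # matrix of P_{Vn|Wm}, cf. (2.44)
    P_WV = np.linalg.solve(G_WW, G_VW.T)      # matrix of P_{Wm|Vn}
    lhs = P_VW @ P_WV                         # normal-equations operator
    rhs = P_VW @ np.linalg.solve(G_WW, w)
    return np.linalg.solve(lhs, rhs)          # coefficients of v*_{m,n} in V_n
```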


Appendix 2: Practical Computation of $\beta(V_n, W_m)$

Let $V_n$ and $W_m$ be two linear subspaces of $V$ of dimensions $n$ and $m$ respectively, with $n \leq m$. The inf-sup constant between these spaces was defined in Eq. (2.17), and we recall it here:

$$\beta_n = \beta(V_n, W_m) := \min_{v \in V_n} \max_{w \in W_m} \frac{\langle v, w \rangle}{\|v\|\,\|w\|} = \min_{v \in V_n} \frac{\|P_{W_m} v\|}{\|v\|}.$$

The last equality comes from the fact that

$$\max_{w \in W_m} \frac{\langle v, w \rangle}{\|w\|} = \max_{w \in W_m} \frac{\langle P_{W_m} v, w \rangle}{\|w\|} = \|P_{W_m} v\|, \qquad \forall v \in V_n.$$

Let $\mathcal{V}_n = \{v_i\}_{i=1}^n$ be a basis of the space $V_n$ and let $\mathbf{c}$ be the coefficients of an element $v \in V_n$ in the basis $\mathcal{V}_n$. For any nonzero $v \in V_n$, we can thus write

$$\beta_n^2 = \min_{v \in V_n} \frac{\|P_{W_m} v\|_V^2}{\|v\|_V^2} = \min_{\mathbf{c} \in \mathbb{R}^n} \frac{\mathbf{c}^T M(\mathcal{V}_n, \mathcal{W}_m)\,\mathbf{c}}{\mathbf{c}^T G(\mathcal{V}_n, \mathcal{V}_n)\,\mathbf{c}}, \tag{2.45}$$

where

$$M(\mathcal{V}_n, \mathcal{W}_m) := \big(\langle P_{W_m} v_i, P_{W_m} v_j \rangle\big)_{1 \leq i,j \leq n}$$

is a symmetric matrix. Let us make a few remarks before giving an implementable expression for $M(\mathcal{V}_n, \mathcal{W}_m)$. First, note that the value of $\beta_n$ does not depend on the selected bases $\mathcal{V}_n$ and $\mathcal{W}_m$. For example, using a basis $\widetilde{\mathcal{V}}_n$ instead of $\mathcal{V}_n$ amounts to changing the variable $\mathbf{c}$ into $\widetilde{\mathbf{c}} = \mathbf{U}\mathbf{c}$ for an invertible matrix $\mathbf{U}$, and this does not affect the value of the minimum. Second, note that formula (2.45) shows that $\beta_n^2$ is the smallest eigenvalue of the generalized eigenvalue problem: find $(\lambda, \mathbf{c}) \in \mathbb{R} \times \mathbb{R}^n \setminus \{0\}$ such that

$$M(\mathcal{V}_n, \mathcal{W}_m)\,\mathbf{c} = \lambda\, G(\mathcal{V}_n, \mathcal{V}_n)\,\mathbf{c}.$$

Since $G(\mathcal{V}_n, \mathcal{V}_n)$ and $M(\mathcal{V}_n, \mathcal{W}_m)$ are symmetric positive definite, the eigenvalues $\lambda$ are positive, and having $\beta_n > 0$ is equivalent to the invertibility of $M(\mathcal{V}_n, \mathcal{W}_m)$. We can transform the generalized eigenvalue problem into a classical eigenvalue problem by multiplying by the inverse of $G(\mathcal{V}_n, \mathcal{V}_n)$. Also, remark that important simplifications occur when $\mathcal{V}_n$ and/or $\mathcal{W}_m$ are orthonormal bases, since in that case $G(\mathcal{V}_n, \mathcal{V}_n)$ and $G(\mathcal{W}_m, \mathcal{W}_m)$ become identity matrices.

We next give an explicit expression for $M(\mathcal{V}_n, \mathcal{W}_m)$. Since the coordinates in $\mathcal{V}_n$ of the $i$-th basis function $v_i$ are given by the $i$-th canonical vector $\mathbf{e}_i \in \mathbb{R}^n$, using formula (2.44) we deduce that the coordinates of $P_{W_m} v_i$ in $\mathcal{W}_m$ are given by

$$\mathbf{p}_i := \mathbf{P}_{W_m|V_n}\,\mathbf{e}_i = G(\mathcal{W}_m, \mathcal{W}_m)^{-1} G(\mathcal{W}_m, \mathcal{V}_n)\,\mathbf{e}_i, \qquad \forall i \in \{1, \ldots, n\}.$$

Therefore, for all $(i,j) \in \{1, \ldots, n\}^2$,

$$\langle P_{W_m} v_i, P_{W_m} v_j \rangle_V = \mathbf{p}_i^T\, G(\mathcal{W}_m, \mathcal{W}_m)\,\mathbf{p}_j = \mathbf{e}_i^T\, G^T(\mathcal{W}_m, \mathcal{V}_n)\, G^{-1}(\mathcal{W}_m, \mathcal{W}_m)\, G(\mathcal{W}_m, \mathcal{V}_n)\,\mathbf{e}_j,$$

and

$$M(\mathcal{V}_n, \mathcal{W}_m) = G^T(\mathcal{W}_m, \mathcal{V}_n)\, G^{-1}(\mathcal{W}_m, \mathcal{W}_m)\, G(\mathcal{W}_m, \mathcal{V}_n).$$

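In a discrete setting, this computation reduces to a few lines. The following sketch (hypothetical function name) solves the generalized symmetric eigenvalue problem above with SciPy and returns $\beta_n$ as the square root of the smallest eigenvalue:

```python
import numpy as np
from scipy.linalg import eigh

def inf_sup_constant(G_VV, G_VW, G_WW):
    """beta(Vn, Wm) from the Gram matrices, following (2.45):
    M = G(Wm,Vn)^T G(Wm,Wm)^{-1} G(Wm,Vn); the smallest generalized
    eigenvalue of M c = lambda G(Vn,Vn) c equals beta_n^2."""
    G_WV = G_VW.T
    M = G_WV.T @ np.linalg.solve(G_WW, G_WV)
    lam = eigh(M, G_VV, eigvals_only=True)      # generalized symmetric EVP
    return float(np.sqrt(max(lam.min(), 0.0)))  # clip tiny negative round-off
```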

References 1. H. Weyl, Über die asymptotische verteilung der eigenwerte. Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse 1911, 110–117 (1911) 2. C. Gordon, D.L. Webb, S. Wolpert, One cannot hear the shape of a drum. Bull. Am. Math. Soc. 27(1), 134–138 (1992) 3. A.M. Stuart, Inverse problems: a Bayesian perspective. Acta Numer. 19, 451–559 (2010) 4. M. Dashti, A.M. Stuart, The Bayesian Approach to Inverse Problems (Springer International Publishing, Cham, 2017), pp. 311–428 5. H.W. Engl, M. Hanke, A. Neubauer, Regularization of Inverse Problems, vol. 375 (Springer Science & Business Media, 1996) 6. M. Benning, M. Burger, Modern regularization methods for inverse problems. Acta Numer. 27, 1–111 (2018) 7. Y. Maday, O. Mula, A.T. Patera, M. Yano, The Generalized Empirical Interpolation Method: Stability theory on Hilbert spaces with an application to the Stokes equation. Computer Methods Appl. Mech. Eng. 287(0), 310–334 (2015) 8. Y. Maday, A.T. Patera, J.D. Penn, M. Yano, A parameterized-background data-weak approach to variational data assimilation: formulation, analysis, and application to acoustics. Int. J. Numer. Methods Eng. 102(5), 933–965 (2015) 9. P. Binev, A. Cohen, W. Dahmen, R. DeVore, G. Petrova, P. Wojtaszczyk, Data assimilation in reduced modeling. SIAM/ASA J. Uncertainty Quantif. 5(1), 1–29 (2017) 10. P. Binev, A. Cohen, O. Mula, J. Nichols, Greedy algorithms for optimal measurements selection in state estimation using reduced models. SIAM/ASA J. Uncertainty Quantif. 6(3), 1101–1126 (2018) 11. A. Cohen, W. Dahmen, R. DeVore, J. Fadili, O. Mula, J. Nichols, Optimal reduced model algorithms for data-based state estimation. SIAM J. Numer. Anal. 58(6), 3355–3381 (2020) 12. A. Cohen, W. Dahmen, O. Mula, J. Nichols, Nonlinear reduced models for state and parameter estimation. SIAM/ASA J. Uncertainty Quantif. 10(1), 227–267 (2022) 13. A. Ern, J.L. Guermond, Theory and Practice of Finite Elements, vol. 159 (Springer Science & Business Media, 2013)


14. R. J. LeVeque, Finite Volume Methods for Hyperbolic Problems, vol. 31 (Cambridge University Press, 2002) 15. M. Raissi, P. Perdikaris, G.E. Karniadakis, Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019) 16. C. Bernardi, Y. Maday, Spectral methods. Handbook Numer. Anal. 5, 209–485 (1997) 17. C. Canuto, M.Y. Hussaini, A. Quarteroni, Spectral Methods in Fluid Dynamics (Springer Science & Business Media, 2012) 18. A. Cohen, R. DeVore, Kolmogorov widths under holomorphic mappings. IMA J. Numer. Anal. 36(1), 1–12 (2016) 19. B. Bojanov, Optimal recovery of functions and integrals, in First European Congress of Mathematics (Springer, 1994), pp. 371–390 20. C.A. Micchelli, Th.J. Rivlin, A Survey of Optimal Recovery (Springer, 1977) 21. E. Novak, H. Wozniakowski, Tractability of Multivariate Problems, Volume I: Linear Information, vol. 2, no. (3) (European Mathematical Society, Zürich, 2008) 22. A. Cohen, R. DeVore, Approximation of high-dimensional parametric PDEs. Acta Numer. 24, 1–159 (2015) 23. A. Cohen, R. DeVore, C. Schwab, Analytic regularity and polynomial approximation of parametric and stochastic elliptic PDE’s. Anal. Appl. 09(01), 11–47 (2011) 24. A. Buffa, Y. Maday, A.T. Patera, C. Prud’homme, G. Turinici, A priori convergence of the greedy algorithm for the parametrized reduced basis method. ESAIM Math. Model. Numer. Anal. 46(3), 595–603 (2012) 25. G. Rozza, D.B.P. Huynh, A.T. Patera, Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations. Arch. Comput. Methods Eng. 15(3), 1 (2007) 26. P. Binev, A. Cohen, W. Dahmen, R. DeVore, G. Petrova, P. Wojtaszczyk, Convergence rates for greedy algorithms in reduced basis methods. SIAM J. Math. Anal. 43(3), 1457–1472 (2011) 27. R. DeVore, G. Petrova, P. Wojtaszczyk, Greedy algorithms for reduced bases in Banach spaces. Constr. Approx. 37(3), 455–466 (2013) 28. A. Cohen, W. Dahmen, R. DeVore, J. Nichols, Reduced basis greedy selection using random training sets. ESAIM Math. Model. Numer. Anal. 54(5), 1509–1524 (2020) 29. A. Chambolle, T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision 40(1), 120–145 (2011) 30. R.A. DeVore, V.N. Temlyakov, Some remarks on greedy algorithms. Adv. Comput. Math 5(1), 173–187 (1996) 31. J.A. Tropp, A.C. Gilbert, Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory 53(12), 4655–4666 (2007) 32. A.R. Barron, A. Cohen, W. Dahmen, R.A. DeVore, Approximation and learning by greedy algorithms. Ann. Stat. 36(1), 64–94 (2008) 33. V. Temlyakov, Greedy Approximation, vol. 20 (Cambridge University Press, 2011) 34. A. Bensoussan, Optimization of sensors’ location in a distributed filtering problem, in Stability of Stochastic Dynamical Systems (Springer, 1972), pp. 62–84 35. S.E. Aidarous, M.R. Gevers, M.J. Installe, Optimal sensors’ allocation strategies for a class of stochastic distributed systems. Int. J. Control 22(2), 197–213 (1975) 36. J.R. Cannon, R.E. Klein, Optimal selection of measurement locations in a conductor for approximate determination of temperature distributions. J. Dyn. Sys. Meas. Control 93(3), 193–199 (1971) 37. T.K. Yu, J.H. Seinfeld, Observability and optimal measurement location in linear distributed parameter systems. Int. J. Control 18(4), 785–799 (1973) 38. Y. Maday, O. 
Mula, G. Turinici, Convergence analysis of the generalized empirical interpolation method. SIAM J. Numer. Anal. 54(3), 1713–1731 (2016)


39. Y. Maday, O. Mula, A Generalized Empirical Interpolation Method: application of reduced basis techniques to data assimilation, in Analysis and Numerics of Partial Differential Equations, ed. by F. Brezzi, P. Colli Franzone, U. Gianazza, G. Gilardi, volume 4 of Springer INdAM Series (Springer Milan, 2013), pp. 221–235 40. P. Massart, Concentration inequalities and model selection: Ecole d’Eté de Probabilités de Saint-Flour XXXIII-2003 (Springer, 2007) 41. J.L. Eftang, A.T. Patera, E.M. Rønquist, An “hp” certified reduced basis method for parametrized elliptic partial differential equations. SIAM J. Sci. Comput. 32(6), 3170–3200 (2010) 42. Y. Maday, B. Stamm, Locally adaptive greedy approximations for anisotropic parameter reduced basis spaces. SIAM J. Sci. Comput. 35(6), A2417–A2441 (2013) 43. A. Bonito, A. Cohen, R. DeVore, D. Guignard, P. Jantsch, G. Petrova, Nonlinear methods for model reduction. ESAIM: Math. Model. Numer. Anal. 55(2), 507–531 (2021) 44. R. Everson, L. Sirovich, Karhunen–loeve procedure for gappy data. J. Opt. Soc. Am. (A) 12(8), 1657–1664 (1995) 45. K. Willcox, Unsteady flow sensing and estimation via the gappy proper orthogonal decomposition. Comput. Fluids 35(2), 208–226 (2006) 46. A.C. Lorenc, A global three-dimensional multivariate statistical interpolation scheme. Mon. Weather Rev. 109(4), 701–721 (1981) 47. A.C. Lorenc, Analysis methods for numerical weather prediction. Q. J. R. Meteorol. Soc. 112(474), 1177–1194 (1986) 48. B. Adcock, A.C. Hansen, C. Poon, Beyond consistent reconstructions: optimality and sharp bounds for generalized sampling, and application to the uniform resampling problem. SIAM J. Math. Anal. 45(5), 3132–3167 (2013) 49. D. Amsallem, M.J. Zahr, C. Farhat, Nonlinear model order reduction based on local reducedorder bases. Int. J. Numer. Methods Eng. 92(10), 891–916 (2012) 50. B. Peherstorfer, B. Butnau, K. Willcox, H.J. Bungart, Localized discrete empirical interpolation method. SIAM J. Sci. Comput. 36(1), A168–A192 (2014) 51. K. Carlberg, Adaptive h-refinement for reduced-order models. Int. J. Numer. Methods Eng. 102(5), 1192–1210 (2015) 52. D. Amsallem, B. Haasdonk, Pebl-rom: Projection-error based local reduced-order models. Adv. Model. Simul. Eng. Sci. 3(1), 1–25 (2016) 53. V.N. Temlyakov, Nonlinear Kolmogorov widths. Math. Notes 63, 785–795 (1998) 54. M. Ettehad, S. Foucart, Instances of computational optimal recovery: dealing with observation errors. SIAM/ASA J. Uncertainty Quantif. 9(4), 1438–1456 (2021) 55. Y. Maday, A.T. Patera, J.D. Penn, M. Yano, PBDW state estimation: noisy observations; configuration-adaptive background spaces, physical interpretations. ESAIM Proc. Surv. 50, 144–168 (2015) 56. T. Taddei, An adaptive parametrized-background data-weak approach to variational data assimilation. ESAIM Math. Model. Numer. Anal. 51(5), 1827–1858 (2017) 57. J.P. Argaud, B. Bouriquet, H. Gong, Y. Maday, O. Mula, Stabilization of (g)eim in presence of measurement noise: Application to nuclear reactor physics, in Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2016: Selected Papers from the ICOSAHOM conference, June 27-July 1, 2016, Rio de Janeiro, Brazil, ed. by M.L. Bittencourt, N.A. Dumont, J.S. Hesthaven (Springer International Publishing, Cham, 2017), pp. 133–145 58. H. Gong, Y. Maday, O. Mula, T. Taddei, PBDW method for state estimation: error analysis for noisy data and nonlinear formulation. arXiv e-prints, page arXiv:1906.00810, 6 (2019) 59. N. Aretz-Nellesen, M.A. Grepl, K. 
Veroy, 3d-var for parameterized partial differential equations: a certified reduced basis approach. Adv. Comput. Math. 45(5), 2369–2400 (2019) 60. R. DeVore, G. Petrova, P. Wojtaszczyk, Data assimilation and sampling in Banach spaces. Calcolo 54(3), 963–1007 (2017) 61. A. Cohen, M. Dolbeault, O. Mula, A. Somacal, Nonlinear approximation spaces for inverse problems. Anal. Appl. 21(1). https://doi.org/10.1142/S0219530522400140


62. M. Barrault, Y. Maday, N.C. Nguyen, A.T. Patera, An Empirical Interpolation Method: application to efficient reduced-basis discretization of partial differential equations. C. R. Acad. Sci. Paris Série I. 339, 667–672 (2004) 63. M.A. Grepl, Y. Maday, N.C. Nguyen, A.T. Patera, Efficient reduced-basis treatment of nonaffine and nonlinear partial differential equations. ESAIM Math. Model. Numer. Anal. 41(3), 575–605 (2007) 64. F. Galarce, D. Lombardi, O. Mula, Reconstructing haemodynamics quantities of interest from doppler ultrasound imaging. Int. J. Numer. Methods Biomedical Eng. 37, e3416 (2021) 65. F. Galarce, J.F. Gerbeau, D. Lombardi, O. Mula, Fast reconstruction of 3d blood flows from doppler ultrasound images and reduced models. Comput. Methods Appl. Mech. Eng. 375, 113559 (2021) 66. F. Galarce, D. Lombardi, O. Mula, State estimation with model reduction and shape variability. application to biomedical problems. SIAM J. Sci. Comput. 44(3), B805–B833 (2022) 67. J.K. Hammond, R. Chakir, F. Bourquin, Y. Maday, Pbdw: A non-intrusive reduced basis data assimilation method and its application to an urban dispersion modeling framework. Appl. Math. Model. 76, 1–25 (2019) 68. J.-P. Argaud, B. Bouriquet, F. de Caso, H. Gong, Y. Maday, O. Mula, Sensor placement in nuclear reactors based on the generalized empirical interpolation method. J. Comput. Phys. 363, 354–370 (2018) 69. P. Pereira Álvarez, P. Kerfriden, D. Ryckelynck, V. Robin, Real-time data assimilation in welding operations using thermal imaging and accelerated high-fidelity digital twinning. Mathematics 9(18), 2263 (2021)

Chapter 3

Model Order Reduction for Optimal Control Problems

Michael Hinze

Abstract These lecture notes comprise lectures on Model Order Reduction (MOR) for optimal control problems which were given by the author within the CIME summer school on Model Order Reduction and Applications in July 2021. The topics include:

• construction of reduced order models for nonlinear PDE systems, with special emphasis on the approximation of the nonlinearities with (D)EIM techniques;
• POD basis construction against the background of spatially adaptively generated snapshots;
• reduced basis approximations;
• error analysis for reduced models obtained with the MOR approaches;
• use of MOR models in optimization with PDE constraints, including numerical analysis with the variational discretization method, which is perfectly tailored to the use of MOR models for the state approximation;
• a novel snapshot location procedure for MOR in optimal control, including a priori and a posteriori error analysis;
• certification of RB models in parametrized optimal control, where the emphasis is on reliability and also effectivity of the RB approximation;
• adaptation of concepts from a posteriori finite element analysis for the construction of a sharp (up to a constant) error bound for the variables involved in the optimization process;
• sketch of RB convergence of the approach.

Material presented in these lecture notes resulted from joint work with many colleagues, in particular from collaborations with Konstantin Afanasiev, Oke Alff, Alessandro Alla, Carmen Gräßle, Christian Kahle, Denis Korolev, Martin Kunkel, Ulrich Matthes, Morten Vierling, and Stefan Volkwein.

M. Hinze, Mathematisches Institut, University of Koblenz, Koblenz, Germany. e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. M. Falcone, G. Rozza (eds.), Model Order Reduction and Applications, C.I.M.E. Foundation Subseries 2328, https://doi.org/10.1007/978-3-031-29563-8_3


3.1 Outline

The lecture series is composed of three lectures whose contents are summarized next.

Lecture 1: Model Order Reduction techniques.
• Proper Orthogonal Decomposition (POD)
• POD for time and/or parameter dependent PDEs
• Error estimates
• Treatment of nonlinearities → DEIM
• Marrying spatial adaptivity with POD
• Further aspects related to POD

Lecture 2: Optimization with MOR surrogate models.
• Basic approach in PDE constrained optimization
• Input dependence of the MOR model → MOR basis updates
• Snapshot choice in optimal control
• Numerical analysis of MOR in PDE constrained optimization
• Further aspects of MOR in applications

Lecture 3: A fully certified RBM for PDE constrained optimization.
• The reduced basis concept
• Model problem
• Greedy sampling
• Main results: error equivalence, convergence
• Numerical examples: the thermal block and Graetz flow

3.2 Introduction and Preliminaries

There are many applications where methods from model order reduction might be used to speed up the numerical solution process. These include

• PDE constrained optimization, also with surface PDEs,
• inverse problems,
• material optimization, and
• shape optimization problems.

Prototype applications with a large demand for reduced order modeling include simulation and optimization of permanent magnet synchronous machines [6, 9], industrial shape optimization [51, 56], diffuse interface approaches for atmosphere-ocean coupling [22], optimization of crystal growth processes [40, 46, 47], inverse modeling in medicine [2], and flow control [1, 12, 13, 32, 37, 38, 60], just to mention a few.


There exists a lot of literature in the field of model order reduction. Here we first mention the pioneering work of Sirovich [63–65], and also the recently published handbooks [15, 17, 18], the nice book [35], a recent contribution to linear-quadratic optimal control [30], as well as the overviews [16, 29] and the seminal contributions [44, 45].

3.2.1 Motivation

Let us assume that we have a validated mathematical model for a physical process (here a PDE system, say), and we intend to use this model to tailor and/or optimize the physical process with the help of numerical and computational methods. Here we assume that it is possible to influence the physical process through suitably adapting inputs and/or parameters of the model, which we may call the design variables of the physical process. This might be a computationally very expensive task due to the curse of dimensionality, since in an optimization process every function evaluation corresponds to a model evaluation at a given design, and many such evaluations might be needed to find a better, ideally optimal, design.

Let us now assume that the physical process is modeled by a (here abstract) PDE system. Then our design process can be considered as an abstract $\infty$-dimensional optimization problem with PDE constraints, which takes the form

$$(P) \qquad \min_{(y,u) \in W \times U_{ad}} J(y, u) \tag{3.1}$$

subject to

$$(PDE) \qquad \frac{\partial y}{\partial t} + Ay + \mathcal{G}(y) = \mathcal{B}u \ \text{ in } Z, \qquad y(0) = y_0 \ \text{ in } H. \tag{3.2}$$

Here, $u$ denotes the design, which we also refer to as the control variable. Moreover, $W$, $Z$ and $H$ are appropriate Banach and/or Hilbert spaces. The control operator $\mathcal{B}$ maps controls from a Banach space $U$ of controls to feasible right hand sides of the PDE, where $U_{ad} \subseteq U$ denotes the set of admissible controls, and $\mathcal{G}$ defines the nonlinearity of the system. In our applications we always assume that $(PDE)$ admits a (unique) solution for every right hand side, and that the solution operator $U \ni u \mapsto y(u) \in W$ has the properties which we require for the respective analysis.


A quick and accurate solution of problem $(P)$ is a central task, where one may aim at developing solution strategies which obey the rule

Effort of optimization = $K$ × Effort of simulation,

where $K$ denotes a small integer. A possible technique to achieve this goal uses appropriate surrogate models for the PDE system $(PDE)$.

3.2.1.1 Examples of PDE Systems

We may consider the problem of finding $y \in W := W(0,T) = \{v \in L^2(0,T;V) : v_t \in L^2(0,T;V^*)\}$ which solves

$$\frac{\partial y}{\partial t} + Ay + \mathcal{G}(y) = \mathcal{B}u \quad \text{in } Z \;(= L^2(0,T;V^*)), \tag{3.3}$$

$$y(0) = y_0 \quad \text{in } H. \tag{3.4}$$

3 Model Order Reduction for Optimal Control Problems

129

Spatial Discretization

DOF for full optimization DOF for Moving Horizon Approach

DOF for Moving Horizon combined with Model Reduction

DOF for Model Reduction Approach

Fig. 3.1 The DOF-diagram gives an impression of the problem size in a PDE constrained optimization problem governed by a time-dependent PDE. The time discretization corresponds to the horizontal axis, the spatial discretization to the vertical axis. The model order reduction approaches discussed in the present notes refer to dimension reduction related to the spatial discretization (lower right circle). Other approaches might also take into account model reduction w.r.t. the time discretization (moving horizon approaches)

μ = (μ1 , μ2 ) > 0 the elliptic PDE

.

.

− div (A(x; μ)∇y) = f in Ω,

y = 0 on ∂Ω,

where  A(x; μ) =

.

μ1 , x ∈ R, μ2 , x ∈ Ω \ R.

The aim now consists in finding a low-dimensional surrogate model which represents the (optimization problem with the) parameter dependent problem sufficiently well over the parameter domain. Here the reduced basis method is the reduction method of choice.

3.3 Lecture 1: The Model Order Reduction (MOR) Techniques Let us now consider a snapshot-based model order reduction approach like proper orthogonal decomposition (POD) together with the associated Galerkin procedure.

130

M. Hinze

The following diagram depicts the advantages and shortcomings of the according reduced order model (ROM) compared to the—here finite element—model of the truth solution. An essential ingredient for the construction of the ROM is the quality of the available data, which here is given by the set of .n ∈ N snapshots. Given a dynamical system the construction of the ROM is sketched next. For .f : [0, T ] × RN → RN let the dynamical system be given by y(t) ˙ = f (t, y(t)),

.

t ∈ (0, T ),

y(0) = y0 .

(3.5)

Without loss of generality we assume .n ≤ N, which is a typical situation in many practical applications, see also the Navier-Stokes example presented below. For the reduced order approximation we with .1 ≤ l ≤ n use the MOR Galerkin ansatz y(t) ≈ y  (t) =

 

.

ηi (t)ψi ,

i=1

where the modes .{ψi }i=1 are computed from information provided by the dynamical system through snapshots .y 1 , . . . , y n ∈ RN , where .y i ∼ y(ti ) for .i = 1, . . . , n with suitably chosen time instances .ti . One option consists in computing a singular value decomposition (SVD) according to   Y = y 1 y 2 . . . , y n = Ψ ΣΦ t ,

.

where .Ψ = [ψ1 , . . . , ψN ] and .Φ = [φ1 , . . . , φn ] ,   n and then to truncate Ψ  = [ψ1 , . . . , ψ ] and Φ  = [φ1 , . . . , φ ] ,   n

.

according to the prescribed information content info .∈ [0, 1] with 

2 σ i=1 i . = arg max s : ≤ info . trace(Y t Y ) s

(3.6)

Considering applications, it is important that this definition of the information content of the snapshot sets enables the successive computation of the left and right eigenvectors .ψi and .φi , since n  .

σi2 = trace(Y t Y ).

i=1

Once l is specified we use the Galerkin Ansatz y(t) ≈ y  (t) = Ψ  η(t)

.

3 Model Order Reduction for Optimal Control Problems

131

and consider the Galerkin approximation (Ψ  )t y˙l (t) = (Ψ  )t f (t, y l (t)),

.

t ∈ (0, T ),

y l (0) = (Ψ  )t y0

as reduced order or surrogate model for the dynamical system (3.5). Frequently, invoking the coefficient vector .η of the MOR Galerkin Ansatz we use the notation η(t) ˙ = f˜(t, η(t)) := (Ψ  )t f (t, Ψ  η(t)),

.

t ∈ (0, T ),

η(0) = η0 .

(3.7)

We have thus replaced the N-dimensional dynamical system (3.5) by the .dimensional surrogate dynamical system (3.7), where . · · · > λd > 0 denote the strictly positive eigenvalues of the correlation matrix K. For .l ≤ d let .Vl = ψ1 , . . . , ψl . Further set Yk :=

l 

.

i=1

ηi (tk )ψi .

138

M. Hinze

Then

.

δt 

n 

|Yi − y(ti )|2H ≤

i=1





time-discrete L2 −error

≤C

⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩

d  i=l+1



|y0 , ψi V |2 



projection error initial condition

+

d 1  λi δt 2 i=l+1   

+

δt 2 

⎫ ⎪ ⎪ ⎪ ⎪ ⎪ ⎬ ⎪ ⎪ ⎪ ⎭

time discretization error⎪ ⎪

not considered information content

(3.9) Let us note that • this result also extends to the case of distinguish time and snapshot grids, and • improvements of this estimate can be obtained by differently weighting the snapshots through e.g. including derivative information by time difference quotients of snapshots. The following writing and the numerical results for the linear heat equation are taken from1 [45]. The numerical results show the exponential decay of singular values in surrogate modeling of parabolic equations and also illustrate the approximation properties of POD surrogate models. From [45, Prop. 4.7] and the considerations above we conclude that for .y0 = 0 the POD approximation error .y  − y can be estimated in terms of the not modeled eigenvalues, i.e., 2

∞ 

2

y  − yL∞ (0,T ;H ) + y  − yL2 (0,T ;V ) ∼

.

λi

(3.10)

i=+1

if time-derivative information is included into the snapshot set. In order to check this error behaviour let us suppose that the eigenvalues .{λi }i∈N decay exponentially which is justified in case of the heat equation, say. This then motivates the ansatz λi = λ1 e−α(i−1)

.

for i ≥ 1,

(3.11)

1 Reprinted by permission from Springer Natutre: Springer, Computational Optimization and Applications 39, pages 319–345, Error estimates for abstract linear-quadratic optimal control problems using proper orthogonal decomposition (M. Hinze, S. Volkwein) 2022 Springer Nature Switzerland AG.

3 Model Order Reduction for Optimal Control Problems

139

where our first goal consists in determining the factor .α > 0 numerically.Let X denote either the space H or the space V . Notice that ∞ 2 − yL2 (0,T ;X) . 2 y +1 − yL2 (0,T ;X)

y 





λi

i=+1 ∞

= λi

i=+2



e−α(i−1)

i=+1 ∞

e−α(i−1)

i=+2

e−α

=

i=0



e−α

i

i = eα −1

i=0

Thus, we have 2

Q() = ln

.

y  − yL2 (0,T ;X) 2

y +1 − yL2 (0,T ;X)

∼ α,

(3.12)

and we may introduce the experimental order of decay (EOD) as EOD :=

.

 max

1 max

(3.13)

Q(k)

k=1

so that .EOD ≈ α. The following numerical run serves the purpose of determining the EOD for a POD approximation of the linear heat equation. To anticipate the discussion we note that it confirms the expected error behaviour (3.10). This is due to the fact that time-derivative information is included into the snapshot set, see (3.17).

Run 1 In this example we consider the linear heat equation on the time interval $(0,T)$ in a spatially three-dimensional geometry:
$$ y_t(t,x) - \Delta y(t,x) = 0 \qquad \text{for all } (t,x) \in Q = (0,T)\times\Omega, \tag{3.14a} $$
$$ \frac{\partial y}{\partial n}(t,x) = 0 \qquad \text{for all } (t,x) \in \Sigma_1 = (0,T)\times\Gamma_1, \tag{3.14b} $$
$$ \frac{\partial y}{\partial n}(t,x) = 100\,g(t,x) \qquad \text{for all } (t,x) \in \Sigma_2 = (0,T)\times\Gamma_2, \tag{3.14c} $$
$$ y(0,x) = 0 \qquad \text{for all } x = (x,y,z) \in \Omega \subset \mathbb{R}^3. \tag{3.14d} $$


The bounded domain $\Omega$ is given by a cylinder between the planes $z = 0$ and $z = 0.5$ with an annulus as floor space whose rotational axis is the $z$-axis. The inner radius is equal to 0.4 and the outer radius is chosen as 1.0, see Fig. 3.7. We suppose that $\partial\Omega = \Gamma_1 \cup \Gamma_2$, where the boundary $\Gamma_1$ is the upper annulus. Furthermore, let $T = 1$, and let the inhomogeneous boundary condition in (3.14c) be given by
$$ g(t,x) = \exp\Bigl(-\bigl[(x - 0.7\cos(2\pi t))^2 + (y - 0.7\sin(2\pi t))^2\bigr]\Bigr) \quad \text{for } (t,x) \in \Sigma_2, \tag{3.15} $$

see Fig. 3.8. Note that $g$ corresponds to a nozzle circling for $t \in [0,T]$ once around in counter-clockwise direction at a radius of 0.7. For fixed $t$, the function $g$ decays exponentially with the square of the distance from the current location of the nozzle. The 'triangulation' of the domain is created using an initial mesh obtained from the FEMLAB routine meshinit(fem,'Hmax',0.25) and refined by calling the FEMLAB routine meshrefine(fem). The final mesh consists of 2100 degrees of freedom. In the time direction, we partition the interval into $m = 499$ subintervals of equal length:
$$ t_j = j\,\Delta t, \qquad j = 0,\ldots,m, \qquad \Delta t = T/m. \tag{3.16} $$

First we solve (3.14) by using the implicit Euler method as time discretization, and piecewise linear, continuous finite elements (FE) for the discretization of the spatial variables. The resulting linear systems are solved by a preconditioned conjugate gradient method (MATLAB routine pcg.m) with an incomplete Cholesky factorization of the coefficient matrix (MATLAB routine cholinc.m) with drop tolerance $10^{-6}$ as preconditioner. The FE solve needs 48 s of run time, excluding the generation of the mesh, the precomputation of integrals and the incomplete Cholesky factorization. Here we have $H = L^2(\Omega)$ and $V = H^1(\Omega)$. Let $\{y^h(t_j)\}_{j=0}^m$ denote the FE solution to (3.14) at the time instances $t_j$, $j = 0,\ldots,m$. Then we approximate the time derivatives $y_t^h(t_j)$ by finite differences and set
$$ y_j = \begin{cases} y^h(t_{j-1}) & \text{for } 1 \le j \le m+1, \\[4pt] \dfrac{y^h(t_{j-m-1}) - y^h(t_{j-m-2})}{\Delta t} & \text{for } m+2 \le j \le 2m+1. \end{cases} \tag{3.17} $$
Next we introduce the symmetric, positive semi-definite $(2m+1) \times (2m+1)$ matrix $K^m$ with the elements $K_{ij}^m = \Delta t\,\langle y_j, y_i\rangle_V$ for $1 \le i,j \le 2m+1$.


In this example we have $K^m \in \mathbb{R}^{999\times 999}$. Let $\ell_{\max} = 15$ be chosen. For any $\ell = 1,\ldots,\ell_{\max}$ we compute the non-negative eigenvalues $\{\lambda_i^m\}_{i=1}^{\ell}$ and the corresponding eigenvectors $\{v_i^m\}_{i=1}^{\ell}$ of $K^m$ by using the MATLAB routine eigs.m. If $\lambda_\ell^m > 0$ holds, the POD basis $\{\psi_i^m\}_{i=1}^{\ell}$ of rank $\ell$ is given by
$$ \psi_i^m = \frac{1}{\sqrt{\lambda_i^m}}\sum_{j=1}^{2m+1} (v_i^m)_j\, y_j, \qquad i = 1,\ldots,\ell, $$
where $(v_i^m)_j$ denotes the $j$-th component of the $i$-th eigenvector of $K^m$ for $1 \le i \le \ell$ and $1 \le j \le 2m+1$. The superscript indicates that both the $\lambda_i^m$'s and the $\psi_i^m$'s depend on the time grid $\{t_j\}_{j=0}^m$. Notice that we have normalized all eigenvalues of $K^m$ so that $\sum_{i=1}^{2m+1}\lambda_i^m = 1$ holds. The decay of the first $\ell_{\max}$ eigenvalues is shown in Fig. 3.9 (left). Let
$$ E(\ell) = \sum_{i=1}^{\ell}\lambda_i \cdot 100\%. \tag{3.18} $$
Then, we find $E(1) \approx 47\%$, $E(2) \approx 68\%$, $E(3) \approx 86\%$, $E(4) \approx 96\%$, and $E(5) \approx 99\%$. Using the POD basis we discretize (3.14) in the spatial variables by a Galerkin scheme and apply the implicit Euler method for the time integration. In this way we obtain the POD solutions $y^\ell$ to (3.14) for $\ell = 1,\ldots,\ell_{\max}$. In Fig. 3.9 (right) the corresponding norms of the difference $y^\ell - y^h$ are presented for $\ell = 1,\ldots,\ell_{\max}$. Of course, both norms decay with respect to the number $\ell$ of POD basis functions. Due to $\|\varphi\|_{L^2(\Omega)} \le \|\varphi\|_{H^1(\Omega)}$ for every $\varphi \in H^1(\Omega)$, the $L^2(0,T;L^2(\Omega))$-norm of the difference is smaller than the corresponding $L^2(0,T;H^1(\Omega))$-norm. For the experimental order of decay we obtain $\mathrm{EOD} = 0.6683$ if we take the $L^2(0,T;L^2(\Omega))$-norm for the difference $y^\ell - y^h$, and $\mathrm{EOD} = 0.5726$ provided we measure the difference $y^\ell - y^h$ in the $L^2(0,T;H^1(\Omega))$-norm. As Fig. 3.9 (left) shows, the ansatz $\lambda_i = \lambda_1 e^{-\alpha(i-1)}$, $i = 1,\ldots,\ell_{\max}$, reflects the decay of the eigenvalues very well. In Table 3.1 we present the CPU times needed for the computations. As Table 3.1 shows, the POD solve needs significantly less CPU time than the FE solve. Computing the POD basis functions and the reduced-order model takes nearly 90 s. Of course, the reduced order model only pays off if the system (3.14) has to be solved several times for different data, say. In this case the POD basis functions are computed only once and the POD solve for (3.14) reduces the overall CPU time significantly.


To compute the POD basis we use the topology of the space $V = H^1(\Omega)$ and also include the difference quotients into the ensemble. In Table 3.2 the errors of the difference $y^\ell - y$ for $\ell = 15$ are compared for different POD strategies. It turns out that taking the $H^1$-topology and including the difference quotients leads to the smallest error in the computation. ♦
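The snapshot-based construction just described can be condensed into a few lines of Python. The sketch below is an illustration with assumed random data: the Euclidean inner product stands in for the $V$-inner product and a uniform step size is used, so it mirrors the structure of (3.17)-(3.18) rather than the FEMLAB computation itself.

import numpy as np

rng = np.random.default_rng(0)
N, m, dt, ell = 200, 49, 0.02, 5           # space dim, time steps, step size, POD rank
Yh = rng.standard_normal((N, m + 1))       # stand-in for the FE trajectory y^h(t_0), ..., y^h(t_m)

# Snapshot ensemble (3.17): states plus difference quotients, 2m+1 columns in total.
dq = (Yh[:, 1:] - Yh[:, :-1]) / dt
Y = np.hstack([Yh, dq])

# Correlation matrix K^m with entries dt * <y_j, y_i>; Euclidean product as stand-in for <., .>_V.
K = dt * (Y.T @ Y)
lam, V = np.linalg.eigh(K)                 # eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]             # reorder so that lam[0] >= lam[1] >= ...

# POD basis of rank ell by the method of snapshots: psi_i = lam_i^{-1/2} * sum_j (v_i)_j y_j.
Psi = Y @ V[:, :ell] / np.sqrt(lam[:ell])

# Normalized eigenvalues and information content E(ell), cf. (3.18).
lam_n = lam / lam.sum()
print(np.round(100 * lam_n[:ell].cumsum(), 1))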

Fig. 3.7 Run 1: Domain .Ω ⊂ R3

Fig. 3.8 Run 1: Inhomogeneity .g(t, x) at .t = 0.0, .t = 0.25, .t = 0.5


Fig. 3.9 Run 1: Decay of the eigenvalues (left) and norms (right)

Table 3.1 Run 1: CPU times for the FE and POD computations

Computing the FE mesh and matrices | 50.0 s
FE solve for (3.14) | 49.9 s
Computing 15 POD basis functions | 87.1 s
Computing the reduced-order model for (3.14), $\ell = 15$ | < 2.0 s
POD solve for (3.14), $\ell = 15$ | < 0.1 s

Table 3.2 Run 1: Comparison of the norms $\|y^h - y^\ell\|_{L^2(0,T;H^1(\Omega))}$ and $\|y^h - y^\ell\|_{L^2(0,T;L^2(\Omega))}$ with $\ell = 15$ for different POD strategies (i.e., choice of the topology and ensemble)

Topology | Ensemble | $\|y^h - y^\ell\|_{L^2(0,T;H^1(\Omega))}$ | $\|y^h - y^\ell\|_{L^2(0,T;L^2(\Omega))}$
$L^2(\Omega)$ | $\{y_j\}_{j=1}^{m+1}$ | 0.0413 | 0.0045
$H^1(\Omega)$ | $\{y_j\}_{j=1}^{m+1}$ | 0.0171 | 0.0015
$L^2(\Omega)$ | $\{y_j\}_{j=1}^{2m+1}$ | 0.0168 | 0.0015
$H^1(\Omega)$ | $\{y_j\}_{j=1}^{2m+1}$ | 0.0131 | 0.0013

It is worth noting that error analysis for POD reduced order models is a subject of current research, see e.g. [19, 55, 62]. Furthermore, POD is not restricted to parabolic equations. The following example is taken from [34], where the linear wave equation
$$ \langle\ddot{x}(t),\varphi\rangle_H + \langle D\dot{x}(t),\varphi\rangle_H + a(x(t),\varphi) = \langle f(t),\varphi\rangle_H \qquad \text{for all } \varphi \in V \text{ and } t \in [0,T], $$
$$ \langle x(0),\psi\rangle = \langle x_0,\psi\rangle_H, \qquad \langle\dot{x}(0),\psi\rangle = \langle\dot{x}_0,\psi\rangle_H \qquad \text{for all } \psi \in H, $$


is considered. Then POD based on the Newmark scheme delivers an error estimate of the form
$$ \Delta t\sum_{k=1}^{m}\bigl\|X^k - x(t_k)\bigr\|_H^2 \le C_I\Bigl( \bigl\|X_0 - P^l x(t_0)\bigr\|_H^2 + \bigl\|X_1 - P^l x(t_1)\bigr\|_H^2 + \Delta t\,\bigl\|\partial X_0 - P^l \dot{x}(t_0)\bigr\|_H^2 + \Delta t\,\bigl\|\partial X_1 - P^l \dot{x}(t_1)\bigr\|_H^2 + \Delta t^4 + \Bigl(\frac{1}{\Delta t} + \frac{1}{\Delta t^4} + 1\Bigr)\sum_{j=l+1}^{d}\lambda_j^I \Bigr), $$
where $\{t_k\}_{k=0}^m$ denotes a time grid on $[0,T]$ and $P^l$ denotes the orthogonal projection onto the linear space spanned by the first $l$ POD modes. For this kind of equations

• we in general only observe linear decay of the singular values,
• the critical dependence on $\Delta t$ can be avoided by including derivative information into the snapshot set.

For the decay of the eigenvalues in the absence of friction ($D = 0$) we obtain the result depicted in Fig. 3.10. In the presence of friction one obtains exponential decay.

Fig. 3.10 Wave equation: decay of eigenvalues without friction


Fig. 3.11 Decay of singular values for the phase field $\varphi$ (left) and the chemical potential $\mu$ (right) for $k = 2,\ldots,8$. For comparison also the decay of the singular values obtained with the smooth Double Well free energy is shown. One clearly sees that the decay rate depends on the smoothness of the free energy in the Cahn-Hilliard equation

Let us briefly mention some shortcomings of POD model order reduction for nonsmooth PDE systems. To illustrate this, consider the Cahn-Hilliard system
$$ \partial_t\varphi - m\Delta\mu + v\cdot\nabla\varphi = 0, \qquad -\sigma\varepsilon\Delta\varphi + \sigma\varepsilon^{-1}F'(\varphi) = \mu \tag{CH} $$
with the relaxed Double Obstacle free energy
$$ F(\varphi) = \frac{1}{2}\bigl(1 - \varphi^2\bigr) + \frac{s}{k}\bigl(\max(\varphi - 1, 0) + |\min(\varphi + 1, 0)|\bigr)^k, \qquad k \in \mathbb{N}. $$

Here, the power $k \in \mathbb{N}$ is a measure of the (non-)smoothness of the system. In Fig. 3.11 the results obtained in [3] are summarized. In this context the reduction of the nonlinearity is of great importance. In the previous example the Discrete Empirical Interpolation Method (DEIM) proposed in [20] is used. Let us sketch the idea of the method for the PDE system (3.3). POD projects the nonlinearity $G(y)$ in the PDE as follows:
$$ G^\ell(\alpha(t)) \equiv \underbrace{\psi^t}_{\ell\times N}\,\underbrace{G(\psi\alpha(t))}_{N\times 1}. $$
Here, $\psi$ is $N\times\ell$, with $N$ the dimension of the finite element space; $G$ has $N$ components, and the evaluation of each of its components may touch every component of its $N$-dimensional argument. This evaluation thus has complexity $\mathcal{O}(N\ell)$. The DEIM idea works as follows: approximate the nonlinear function $G(\psi\alpha(t))$ by projecting it onto a subspace that approximates the space generated by the nonlinear function and which is spanned by a basis of dimension $m \ll N$.
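A compact sketch may help to fix the DEIM idea. The following Python illustration works with assumed parametric snapshot data and an assumed componentwise nonlinearity; it selects the interpolation indices greedily from a basis of nonlinearity snapshots and then evaluates $G$ only at those $m$ components, in the spirit of [20].

import numpy as np

rng = np.random.default_rng(1)
N, m = 500, 10
G = np.tanh                                     # assumed componentwise nonlinearity

# Assumed parametric states; columns of S are snapshots of the nonlinearity G(y(mu)).
x = np.linspace(0.0, 1.0, N)
modes = np.stack([np.sin(np.pi * x), np.sin(2 * np.pi * x), np.sin(3 * np.pi * x)], axis=1)
S = G(modes @ rng.uniform(-2.0, 2.0, size=(3, 60)))
U = np.linalg.svd(S, full_matrices=False)[0][:, :m]   # POD basis of the nonlinearity

# Greedy DEIM selection of m interpolation indices.
idx = [int(np.argmax(np.abs(U[:, 0])))]
for i in range(1, m):
    c = np.linalg.solve(U[idx, :i], U[idx, i])  # interpolate the next basis vector at chosen points
    r = U[:, i] - U[:, :i] @ c                  # residual away from the chosen points
    idx.append(int(np.argmax(np.abs(r))))

# Online stage: G is evaluated at the m selected components only.
y = modes @ rng.uniform(-2.0, 2.0, 3)
g_deim = U @ np.linalg.solve(U[idx, :], G(y[idx]))
print(np.linalg.norm(g_deim - G(y)) / np.linalg.norm(G(y)))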

We now consider a linear-quadratic optimal control problem with tracking-type cost functional
$$ J(y,u) = \frac{1}{2}\|y - z\|_{L^2(0,T;H)}^2 + \frac{\alpha}{2}\|u\|_U^2, $$
where $\alpha > 0$ weighs the control cost and $z$ denotes the desired state. Controls are sought in the closed and convex set $U_{ad} \subseteq U$, and as above $y \equiv y(u)$ denotes the unique solution of the state equation associated with $u$. Then it is easy to argue that
$$ (P)\qquad \min_{(y,u)\in W\times U_{ad}} J(y,u) \quad \text{s.t. } (3.22) $$

admits a unique solution $(y,u) \in W\times U_{ad}$, which satisfies the optimality condition
$$ \langle\hat{J}'(u), v - u\rangle_{U^*,U} \ge 0 \quad \text{for all } v \in U_{ad}. $$
Here
$$ \hat{J}'(u) = \alpha u + B^* p(y(u)), \tag{3.23} $$
where the function $p$ solves the adjoint equation
$$ -\frac{d}{dt}(p(t),v)_H + a(v,p(t)) = (y - z, v)_H, \quad t \in [0,T],\ v \in V, \qquad (p(T),v)_H = 0, \quad v \in V. \tag{3.24} $$
It follows from the characterization of the orthogonal projection $P_{U_{ad}}: U \to U_{ad}$ that the variational inequality (3.23) is equivalent to the nonsmooth operator equation
$$ u = P_{U_{ad}}\!\Bigl(-\frac{1}{\alpha}B^* p(y(u))\Bigr). \tag{3.25} $$
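The projection formula (3.25) lends itself to a simple computational realization via a projected gradient iteration on $\hat{J}$. The following Python sketch applies this to a small assumed finite-dimensional linear-quadratic test problem (implicit Euler discretizations of state and adjoint, box constraints), so all operators are plain matrices; it illustrates the structure $\hat{J}'(u) = \alpha u + B^* p$ and the projection, and is not the PDE setting of this section.

import numpy as np

# Assumed small test problem: y' = A y + B u, y(0) = y0, tracking cost with box constraints.
n, nt, T, alpha = 4, 200, 1.0, 1.0
dt = T / nt
rng = np.random.default_rng(2)
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
y0 = np.ones(n)
z = np.ones((nt + 1, n))                      # desired state
ua, ub = -5.0, 5.0                            # box constraints defining U_ad

M = np.linalg.inv(np.eye(n) - dt * A)         # implicit Euler propagator

def solve_state(u):
    y = np.zeros((nt + 1, n)); y[0] = y0
    for k in range(nt):
        y[k + 1] = M @ (y[k] + dt * (B @ u[k]))
    return y

def solve_adjoint(y):
    p = np.zeros((nt + 1, n))                 # terminal condition p(T) = 0
    for k in range(nt, 0, -1):
        p[k - 1] = M.T @ (p[k] + dt * (y[k] - z[k]))
    return p

u = np.zeros((nt, 1))
for it in range(500):                         # projected gradient with a small fixed step
    p = solve_adjoint(solve_state(u))
    grad = alpha * u + p[:-1] @ B             # discrete analogue of J'(u) = alpha*u + B*p
    u_new = np.clip(u - 0.5 * grad, ua, ub)   # gradient step followed by projection, cf. (3.25)
    if np.linalg.norm(u_new - u) < 1e-10:
        break
    u = u_new
print(f"stopped after {it + 1} iterations, |grad| = {np.linalg.norm(grad):.2e}")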


For details on the facts presented above (and also below) we refer to [41]. For the POD approximation, choose now a POD subspace $V^l := \operatorname{span}\{\psi_1,\ldots,\psi_l\}$ of $V$ containing sufficient information in the sense of (3.6), and for a given control $u$ define the POD Galerkin semi-discretization $y^l \equiv y^l(u)$ of the state $y \equiv y(u)$ using the subspace $V^l$ according to
$$ \frac{d}{dt}\bigl(y^l(t),v\bigr)_H + a(y^l(t),v) = \langle (Bu)(t), v\rangle_{V,V^*}, \quad t \in [0,T],\ v \in V^l, \qquad (y^l(0),v)_H = (y_0,v)_H, \quad v \in V^l. \tag{3.26} $$
If needed, similarly define a POD Galerkin approximation $p^l \equiv p^l(y^l(u))$ of $p \equiv p(y(u))$ through
$$ -\frac{d}{dt}\bigl(p^l(t),v\bigr)_H + a(v,p^l(t)) = \bigl(y^l - z, v\bigr)_H, \quad t \in [0,T],\ v \in V^l, \qquad (p^l(T),v)_H = 0, \quad v \in V^l. \tag{3.27} $$

The variational discrete optimization problem with POD surrogate model then reads
$$ (\hat{P}^l)\qquad \min_{u\in U_{ad}}\hat{J}^l(u) := J(y^l(u), u), $$
which admits a unique solution $u^l \in U_{ad}$. This control satisfies the optimality condition
$$ \langle\hat{J}^{l\prime}(u^l), v - u^l\rangle_{U^*,U} \ge 0 \quad \text{for all } v \in U_{ad}, $$
where, similarly to above,
$$ \hat{J}^{l\prime}(u) = \alpha u + B^* p^l(y^l(u)) $$
holds, and where $p^l$ solves the adjoint Eq. (3.27). Moreover, the variational inequality is equivalent to the nonsmooth operator equation
$$ u^l = P_{U_{ad}}\!\Bigl(-\frac{1}{\alpha}B^* p^l(y^l(u^l))\Bigr). $$

Invoking the abstract error analysis provided in [41, Section 3] it is possible to prove

Theorem 3.1 Let $u, u^l$ denote the unique solutions of $(P)$ and $(\hat{P}^l)$, respectively. Then
$$ \alpha\|u - u^l\|_U^2 + \frac{1}{2}\|y - y^l\|_{L^2(H)}^2 \;\le\; \underbrace{\langle B^*(p(y(u)) - p^l(y(u))),\, u^l - u\rangle_{U^*,U}}_{\text{POD approximation error}} \; + \; \underbrace{\frac{1}{2}\|y - y^l(u)\|_{L^2(H)}^2}_{\text{POD approximation error}}. $$


This estimate in essence says that the error in the controls is determined by the POD approximation errors of the state .y(u) and the adjoint state .p(y(u)). Using the analysis of [45, 62] for POD approximations one qualitatively obtains

$$ \|u - u^l\|_U + \|y - y^l\|_{L^2(H)} \;\sim\; \|y_0 - P^l y_0\|_H + \sqrt{\sum_{k=l+1}^{\infty}\lambda_k} + \|y_t - P^l y_t\|_{L^2(0,T;V)} + \|p(y(u)) - P^l(p(y(u)))\|_W, \tag{3.28} $$
where $P^l$ denotes the projection onto the respective POD space and
$$ \sum_{k=l+1}^{\infty}\lambda_k $$
measures the information content not considered in the POD space. From this analysis one may draw the following conclusions.

• Get rid of $\|(y - P^l y)_t\|_{L^2(0,T;V)}^2$ → include derivative information into your snapshot set. This is common nowadays, since refined error analysis and also pointwise error analysis rely on including difference quotients into the snapshot set, see e.g. [55, 62].
• Get rid of $\|p - P^l p\|_{W(0,T)}^2$ → include adjoint information into your snapshot set. This is essential and also improves the quality of the optimization results, as the forthcoming numerical results demonstrate.
• Get rid of $\|y_0 - P^l y_0\|_H$ → add $y_0$ to the snapshot set. This is very easy to establish and there is no reason not to include the initial condition into the snapshot set.

These recommendations translate directly into the assembly of the snapshot matrix, as sketched below. However, the respective POD approximations in Theorem 3.1 rely on the knowledge of the optimal control $u$ and the optimal state $y(u)$, respectively. This is impractical, since snapshots of the state and the adjoint state for those data are in general not available. A remedy for this problem is presented in the next section.
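The following short Python sketch (with assumed state and adjoint trajectory arrays standing in for computed solutions) enriches the snapshot ensemble accordingly, i.e. by difference quotients, adjoint information and the initial condition.

import numpy as np

def enriched_snapshots(Y, P, y0, dt):
    # Y, P: state and adjoint trajectories of shape (N, m+1); y0: initial condition of shape (N,).
    dqY = (Y[:, 1:] - Y[:, :-1]) / dt      # state difference quotients
    dqP = (P[:, 1:] - P[:, :-1]) / dt      # adjoint difference quotients
    return np.hstack([Y, dqY, P, dqP, y0[:, None]])

rng = np.random.default_rng(3)
N, m, dt = 100, 20, 0.05
Y = rng.standard_normal((N, m + 1))        # stand-in state snapshots
P = rng.standard_normal((N, m + 1))        # stand-in adjoint snapshots
S = enriched_snapshots(Y, P, Y[:, 0], dt)
print(S.shape)                             # (100, 83): states, dq-states, adjoints, dq-adjoints, y0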

Run 2 Next we present [45, Run 2] to illustrate the above findings numerically. To begin with we set $V = H^1(\Omega)$, $H = L^2(\Omega)$, and $U = L^2(0,T)$. We choose the cost functional $J: W(0,T)\times U \to \mathbb{R}$ defined by
$$ J(y,u) = \frac{\alpha_1}{2}\int_0^T \|y(t) - z_1(t)\|_H^2\,dt + \frac{\alpha_2}{2}\|y(T) - z_2\|_H^2 + \frac{\sigma}{2}\|u\|_U^2, \tag{3.29} $$


where $z_1 \in L^2(0,T;H)$ and $z_2 \in H$ are given desired states and $\alpha_1, \alpha_2, \sigma$ are nonnegative parameters satisfying $\alpha_1 + \alpha_2 > 0$ and $\sigma > 0$. In what follows we use $\alpha_1 = 0$, $\alpha_2 = 1$, $\sigma = 0.01$, and $z_2 = 1$. The state $y \in V$ and the control $u \in L^2(0,T)$ solve
$$ y_t(t,x) - \Delta y(t,x) = 0 \qquad \text{for all } (t,x) \in Q = (0,T)\times\Omega, \tag{3.30a} $$
$$ \frac{\partial y}{\partial n}(t,x) = 0 \qquad \text{for all } (t,x) \in \Sigma_1 = (0,T)\times\Gamma_1, \tag{3.30b} $$
$$ \frac{\partial y}{\partial n}(t,x) = u(t)\,g(t,x) \qquad \text{for all } (t,x) \in \Sigma_2 = (0,T)\times\Gamma_2, \tag{3.30c} $$
$$ y(0,x) = 0 \qquad \text{for all } x = (x,y,z) \in \Omega \subset \mathbb{R}^3, \tag{3.30d} $$

where the bounded domain $\Omega$ as well as the partition $\Gamma = \Gamma_1 \cup \Gamma_2$ are the same as in Run 1 and the function $g$ is defined in (3.15). The FE mesh is generated by the same procedure as in Run 1. So, again, we have 2100 degrees of freedom. For the time grid we take $T = 1$ and $m = 249$. Altogether, we obtain 1,046,049 unknowns for the state $y$, the control $u$ and the adjoint variable $p$. The set $U_{ad}$ is given by $U_{ad} = \{u \in L^2(0,T) \mid u_a \le u \le u_b \text{ almost everywhere in } (0,T)\}$, where we choose $u_a = -10000$ and $u_b = 10000$. Due to the lower and upper bounds, the solution $\bar{u} \in L^2(0,T)$ to (3.20) is inactive, i.e. $u_a < \bar{u} < u_b$ holds in $(0,T)$. To solve (3.20) we apply the CG method with the relative stopping criterion $\varepsilon = 10^{-8}$ for the residual. Notice that a matrix application within each CG iteration requires a linearized state solve and an adjoint solve. It turns out that the discrete FE solution $\bar{u}^h$ is found after 29 CG iterations. The CPU time needed is 1560 s. We denote by $\{\bar{y}^h(t_j)\}_{j=0}^m$ and by $\{\bar{p}^h(t_j)\}_{j=0}^m$ the corresponding FE solutions of the state and adjoint equation, respectively. To compute the POD basis we compare three different snapshot ensembles, namely
$$ V_1 = \operatorname{span}\Bigl(\{\bar{y}^h(t_j)\}_{j=0}^{m} \cup \Bigl\{\tfrac{\bar{y}^h(t_j) - \bar{y}^h(t_{j-1})}{\Delta t}\Bigr\}_{j=1}^{m}\Bigr), \quad V_2 = \operatorname{span}\Bigl(\{\bar{p}^h(t_j)\}_{j=0}^{m} \cup \Bigl\{\tfrac{\bar{p}^h(t_j) - \bar{p}^h(t_{j-1})}{\Delta t}\Bigr\}_{j=1}^{m}\Bigr), \quad V_3 = V_1 \cup V_2, $$


and compute the three POD basis function sets $\{\psi_i^j\}_{i=1}^{\ell}$ corresponding to the snapshot ensembles $V_j$, $j = 1,2,3$. The CPU times needed for $\ell = 15$ are presented in Table 3.4. Since the numbers of members in the ensembles $V_1$ and $V_2$ are the same, the computational times for the POD bases nearly coincide, whereas the space $V_3$ contains twice as many members as $V_1$ or $V_2$. We compare the norms $\|\bar{y}^h - P^\ell\bar{y}^h\|_{L^2(0,T;X)}$ and $\|\bar{p}^h - P^\ell\bar{p}^h\|_{L^2(0,T;X)}$ with $X = H^1(\Omega)$ as well as $X = L^2(\Omega)$ for the three different POD bases, see Figs. 3.18 and 3.19. Due to the choice of the snapshot ensemble, the norm $\|\bar{y} - P^\ell\bar{y}\|_{L^2(0,T;X)}$ decays very slowly provided the POD basis $\{\psi_i^2\}_{i=1}^{\ell}$ is utilized, since only information about the adjoint variable is included in $V_2$. In $V_3$ the $y^h(t_j)$'s are included. Consequently, using the POD basis functions $\psi_1^3,\ldots,\psi_\ell^3$ we observe that the norm $\|\bar{y} - P^\ell\bar{y}\|_{L^2(0,T;X)}$ decays faster than for the POD basis $\{\psi_i^2\}_{i=1}^{\ell}$. Of course, the basis functions $\psi_1^1,\ldots,\psi_\ell^1$ contain only information about the state variable $\bar{y}^h$, so that the corresponding norm $\|\bar{y} - P^\ell\bar{y}\|_{L^2(0,T;X)}$ is smaller for each $\ell \in \{1,\ldots,15\}$ compared to the same norm using the POD bases $\{\psi_i^2\}_{i=1}^{\ell}$ or $\{\psi_i^3\}_{i=1}^{\ell}$, see Fig. 3.18. Analogous arguments can be used to explain the decay of the norm $\|\bar{p}^h - P^\ell\bar{p}^h\|_{L^2(0,T;X)}$ in Fig. 3.19 for the three different POD bases. In Fig. 3.20 we compare the FE solution $u^h$ with the POD Galerkin control $u^\ell$ for $\ell = 15$ and for the different POD bases $\{\psi_i^j\}_{i=1}^{\ell}$ corresponding to the snapshot ensembles $V_j$, $j = 1,2,3$. As is shown in Fig. 3.20, the Galerkin control computed with ensemble $V_3$ is close to the FE solution $u^h$. The suboptimal control $u^\ell$ obtained by using the POD basis $\{\psi_i^1\}_{i=1}^{\ell}$ leads to a larger error in the time interval $(0, 0.4)$, whereas the suboptimal control $u^\ell$ obtained by using the POD basis $\{\psi_i^2\}_{i=1}^{\ell}$ differs significantly from $u^h$ in the time interval $(0.8, 1)$. In Table 3.5 we also present the errors between the FE optimal solution and the POD suboptimal controls in the $L^2(0,T)$-norm for different $\ell$. From Table 3.5 we conclude that including adjoint information into the snapshot ensemble is essential to obtain good approximations of the controls. In fact, the optimality condition (3.25) directly relates the control and the adjoint variable. Let us mention that for $\ell = 15$ the norms $\|\bar{y}^h - \bar{y}^\ell\|_{L^2(0,T;H^1(\Omega))}$ are three times larger using the POD bases $\{\psi_i^j\}_{i=1}^{\ell}$, $j = 1,2$, than with the POD basis built from ensemble $V_3$. So it turns out that the use of ensemble $V_3$ leads to the best performance of the POD Galerkin control. This observation coincides with the estimates in [45, Theorem 4.5, Proposition 4.7], where the state and the adjoint variables have to be approximated well to get a small error for the difference between the optimal and the Galerkin control. ♦


Table 3.4 Run 2: CPU times in seconds for the computation of the three POD basis function sets $\{\psi_i^j\}_{i=1}^{\ell}$ corresponding to the snapshot ensembles $V_j$, $j = 1,2,3$

 | $V_1$ | $V_2$ | $V_3$
Computing the FE mesh and matrices | 38.4 | 38.4 | 38.4
FE solve for (3.20) | 1560.0 | 1560.0 | 1560.0
Computing 15 POD basis functions | 22.4 | 22.5 | 87.0
Computing the reduced-order model for (3.20), $\ell = 15$ | 0.7 | 0.7 | 0.7
POD solve for (3.20), $\ell = 15$ | 0.7 | 0.7 | 0.7


Fig. 3.18 Run 2: Decay of the norms $\|\bar{y}^h - P^\ell\bar{y}^h\|_{L^2(0,T;H^1(\Omega))}$ (left) and $\|\bar{y}^h - P^\ell\bar{y}^h\|_{L^2(0,T;L^2(\Omega))}$ (right) for the different POD bases $\{\psi_i^j\}_{i=1}^{\ell}$ corresponding to the snapshot ensembles $V_j$, $j = 1,2,3$


Fig. 3.19 Run 2: Decay of the norms $\|\bar{p}^h - P^\ell\bar{p}^h\|_{L^2(0,T;H^1(\Omega))}$ (left) and $\|\bar{p}^h - P^\ell\bar{p}^h\|_{L^2(0,T;L^2(\Omega))}$ (right) for the different POD bases $\{\psi_i^j\}_{i=1}^{\ell}$ corresponding to the snapshot ensembles $V_j$, $j = 1,2,3$


Fig. 3.20 Run 2: Comparison of the FE optimal control $u^h$ and the suboptimal controls $u^\ell$ for $\ell = 15$ (left), and of the differences between $u^h$ and $u^\ell$ (right), for the different POD bases $\{\psi_i^j\}_{i=1}^{\ell}$ corresponding to the snapshot ensembles $V_j$, $j = 1,2,3$

Table 3.5 Run 2: Norms $\|u^h - u^\ell\|_{L^2(0,T)}$ for the different POD bases $\{\psi_i^j\}_{i=1}^{\ell}$ corresponding to the snapshot ensembles $V_j$, $j = 1,2,3$

$\ell$ | $\|u^h - u^\ell\|$ for $\{\psi_i^1\}$ | $\|u^h - u^\ell\|$ for $\{\psi_i^2\}$ | $\|u^h - u^\ell\|$ for $\{\psi_i^3\}$
1 | 0.5100 | 0.5437 | 0.4672
3 | 0.3792 | 0.1200 | 0.1869
5 | 0.3506 | 0.0588 | 0.1201
7 | 0.3225 | 0.0584 | 0.0676
9 | 0.3031 | 0.0585 | 0.0566
11 | 0.2902 | 0.0585 | 0.0557
13 | 0.2057 | 0.0596 | 0.0555
15 | 0.1530 | 0.1282 | 0.0555

3.4.4 Data Quality in Surrogate Based Optimal Control

As already discussed above, the respective POD approximations in Theorem 3.1 rely on the knowledge of the optimal control $u$ and the optimal state $y(u)$, respectively. This is impractical, since snapshots of the state and the adjoint state for those data are in general not available. In the present section we give partial answers to the related complex of questions. For the quality of a data-driven surrogate model the quality of the data plays an essential role. In the context of optimal control with data-driven surrogate models, quality is related to

• the origin of the data. In the context of POD surrogate modeling this question is related to the location of the snapshots in time.
• the quality of the control which serves as input for the generation of the snapshots.


In what follows we use ideas developed in the works [7, 23], and use5 writings and numerical experiments in large parts from [8] to illustrate the relevance of data quality for surrogate-based optimal control results. Special thanks here to Alessandro Alla and Carmen Gräßle for their courtesy. Further related literature can be found in [25, 48, 54]. To begin with we consider the linear-quadratic optimal control problem
$$ (P)\qquad \min_{(y,u)\in Y\times U_{ad}} J(y,u) := \frac{1}{2}\|y - y_d\|_{L^2(\Omega_T)}^2 + \frac{\alpha}{2}\|u\|_{L^2(\Omega_T)}^2 $$
subject to
$$ \left.\begin{aligned} y_t(x,t) - \nu\Delta y(x,t) &= (Bu)(x,t) &&\text{in } \Omega_T \\ y(x,t) &= 0 &&\text{on } \Sigma_T \\ y(x,0) &= y_0(x) &&\text{in } \Omega \end{aligned}\right\} \tag{3.31} $$
and
$$ u \in U_{ad} := \{u \in L^2(0,T;\mathbb{R}^m) \mid u_a(t) \le u(t) \le u_b(t) \text{ f.a.a. } t \in [0,T]\}. $$

Here, $\Omega \subset \mathbb{R}^n$ denotes an open bounded domain with smooth boundary, $\Omega_T = \Omega\times(0,T]$, $\Sigma_T = \partial\Omega\times(0,T]$, $\alpha,\nu > 0$, $y_d, f \in L^2(\Omega_T)$, $y_0 \in H_0^1(\Omega)$, $B: L^2(0,T;\mathbb{R}^m) \to L^2(0,T;H^{-1}(\Omega))$ is the linear and bounded control operator, $u_a, u_b \in L^\infty(0,T)$ denote the control bounds, and $Y = W(0,T) = \{v \in L^2(0,T;H_0^1(\Omega)),\ v_t \in L^2(0,T;H^{-1}(\Omega))\}$ is the classical parabolic state space. In this setting problem $(P)$ admits a unique solution $(y,u)$ which, together with the unique adjoint state $p \in W(0,T)$, satisfies the optimality conditions
$$ \left.\begin{aligned} &\text{(SE):} && y_t - \nu\Delta y = Bu \text{ in } \Omega_T, \quad y = 0 \text{ on } \Sigma_T, \quad y(0) = y_0 \text{ in } \Omega \\ &\text{(AE):} && -p_t - \nu\Delta p = y - y_d \text{ in } \Omega_T, \quad p = 0 \text{ on } \Sigma_T, \quad p(T) = 0 \text{ in } \Omega \\ &\text{(VI):} && \langle\alpha u + B^* p,\, v - u\rangle_{U^*,U} \ge 0 \text{ for all } v \in U_{ad} \end{aligned}\right\} \tag{3.32} $$

It then follows from classical regularity theory for parabolic equations that $y_0 \in H_0^1(\Omega)$, $y_d \in L^2(\Omega_T)$ implies $p \in H^{2,1}(\Omega_T) := L^2(0,T;H^2(\Omega)\cap H_0^1(\Omega)) \cap H^1(0,T;L^2(\Omega))$, so that in the present setting the optimality system (SE)-(AE)-(VI) is equivalent to the following space-time elliptic system for the adjoint variable $p$, of second order in time and fourth order in space, if we in addition

5 Reprinted by permission from ESAIM: M2AN, A posteriori snapshot location for POD in optimal control of linear parabolic equations, Alessandro Alla, Carmen Grässle, and Michael Hinze, Volume 52, Number 5, September–October 2018, Page(s) 1847–1873, DOI https://doi.org/10.1051/m2an/201800.


require $y_d \in H^{2,1}(\Omega_T)$:
$$ \left.\begin{aligned} -p_{tt} + \Delta^2 p - BP_{U_{ad}}\!\Bigl(-\frac{1}{\alpha}B^* p\Bigr) &= -(y_d)_t + \Delta y_d &&\text{in } \Omega_T, \\ p(\cdot,T) &= 0 &&\text{in } \Omega, \\ p &= 0 &&\text{on } \Sigma_T, \\ \Delta p &= y_d &&\text{on } \Sigma_T, \\ (p_t + \Delta p)(0) &= y_d(0) - y_0 &&\text{in } \Omega. \end{aligned}\right\} \tag{3.33} $$

In [23, Theorem 3.1] the following a posteriori error analysis is presented for (3.33). Given a time grid $0 = t_0 < \ldots < t_n = T$, $\Delta t_j = t_j - t_{j-1}$, $I_j = [t_{j-1},t_j]$, together with the time-discrete spaces
$$ V_{t_k} = \{v \in H^{2,1}(\Omega_T) : v|_{I_j} \in P_1(I_j)\}, \qquad \bar{V}_{t_k} = V_{t_k}\cap H_0^{2,1}(\Omega_T), $$
the adjoint $p \in H_0^{2,1}$ together with its time-discrete Galerkin approximation $p_k \in \bar{V}_{t_k}$ satisfies
$$ \|p - p_k\|_{H^{2,1}(\Omega_T)}^2 \le C\cdot\eta_p^2, \tag{3.34} $$

where
$$ \eta_p^2 = \sum_j \Delta t_j^2 \int_{I_j}\Bigl\| -(y_d)_t + \Delta y_d + (p_k)_{tt} - BP_{U_{ad}}\!\Bigl(-\frac{1}{\alpha}B^* p_k\Bigr) - \Delta^2 p_k \Bigr\|_{L^2(\Omega)}^2 + \sum_j \int_{I_j} \|y_d - \Delta p_k\|_{L^2(\Gamma)}^2. $$

Let us note already here that in the case $U_{ad} \equiv U$ a similar analysis is applicable to the state $y$. The idea for the snapshot selection is now as follows: go for an adaptive time grid utilizing (3.34), based on a coarse spatial resolution of the adjoint $p$, and at the same time obtain an approximation of the optimal control, which is then used for the snapshot generation with the state equation. We are interested in controlling the error
$$ u - u_k^l = P_{U_{ad}}\!\Bigl(-\frac{1}{\alpha}B^* p\Bigr) - P_{U_{ad}}\!\Bigl(-\frac{1}{\alpha}B^* p_k^l\Bigr). $$
Controlling its size in essence allows to control the error $y(u) - y^l(u_k^l)$. There are several options to achieve this goal. Let us first mention an approach which


utilizes a priori error estimates. To begin with, we recall estimate (3.28):
$$ \|u - u_k^l\|_U + \|y - y_k^l\|_{L^2(H)} \;\sim\; \|y_0 - P^l y_0\|_H + \sqrt{\sum_{k=l+1}^{\infty}\lambda_k} + \|y_t - P^l y_t\|_{L^2(0,T;V)} + \|p(y(u)) - P^l(p(y(u)))\|_W. $$
By including $y_0$ into the snapshot set the first addend on the right hand side vanishes, as does the third addend if derivative information is included into the snapshot set. However, it remains difficult to deduce information on the snapshot locations from the remaining addends. From the error representation of Theorem 3.1 we deduce
$$ \|u - u_k^l\| + \|y - y_k^l\| \;\sim\; \|p(u) - \tilde{p}_k^l(u)\| + \|y(u) - \tilde{y}_k^l(u)\|. $$

Here, $\tilde{y}_k^l(u)$ denotes the POD approximation to $y(u)$, and $\tilde{p}_k^l(u)$ the POD approximation to $p(u)$, the adjoint state associated with the optimal control problem. Detailed error estimates for POD approximations are provided e.g. in [53]; for the state, the estimate in the discrete $L^2(H)$-norm takes the form
$$ \sum_{j=0}^{n}\beta_j\|y(t_j;u_k) - y_j^\ell(u_k)\|_H^2 \le \sum_{j=1}^{n}\Delta t_j^2\, C_y\Bigl((1 + c_p^2)\|y_{tt}(u_k)\|_{L^2(I_j,H)}^2 + \|y_t(u_k)\|_{L^2(I_j;V)}\Bigr) \tag{3.35a} $$
$$ \qquad + \sum_{j=1}^{n} C_y \sum_{i=\ell+1}^{d}\Bigl(|\langle\psi_i, y_0\rangle_V|^2 + \lambda_i\Bigr) \tag{3.35b} $$
$$ \qquad + \sum_{j=1}^{n}\sum_{i=\ell+1}^{d} C_y\,\lambda_i\,\Delta t_j^2, \tag{3.35c} $$

where $C_y > 0$ is a constant depending on $T$ but independent of the time grid $\{t_j\}_{j=0}^n$. Note that $y(t_j;u_k)$ is the continuous solution of (3.31) at the given time instances, related to the suboptimal control $u_k$. The temporal step size on the subinterval $[t_{j-1},t_j]$ is denoted by $\Delta t_j$. The positive weights $\beta_j$ are given by

$$ \beta_0 = \frac{\Delta t_1}{2}, \qquad \beta_j = \frac{\Delta t_j + \Delta t_{j+1}}{2} \;\text{ for } j = 1,\ldots,n-1, \qquad \beta_n = \frac{\Delta t_n}{2}. $$


The constant $c_p$ is an upper bound for the norm of the projection operator. In our setting we use $H = L^2(\Omega)$. A similar estimate can be carried out for the $H^1$-norm; we refer the interested reader to [52, 53]. For the adjoint error, the state $y$ in this estimate has to be replaced by the adjoint $p$ and its POD approximation $p^l$. Estimate (3.35) now provides a recipe for the refinement of the time grid in order to approximate the state $y$ within a prescribed tolerance. One option here consists in equidistributing the error contributions of the term (3.35a), while the number of modes has to be adapted to the time grid size according to the term (3.35c). Finally, the number $\ell$ of modes should be chosen such that the term in (3.35b) remains within the prescribed tolerance. Note, however, that in order to obtain a fully practical error indicator in this way the quantities

$$ \|y_{tt}(u_k)\|_{L^2(I_j,H)}^2 + \|y_t(u_k)\|_{L^2(I_j;V)} $$

have to be replaced by computable counterparts, compare [49]. The overall procedure is summarized in

Algorithm 2: Adaptive snapshot location for optimal control problems from [8]

Require: coarse spatial grid size $\Delta x$, fine spatial grid size $h$, maximal number of degrees of freedom (dof) for the adaptive time discretization, $T > 0$.
• Solve (3.33) adaptively w.r.t. time with spatial resolution $\Delta x$. → Obtain a time grid $\mathcal{T}$ and an approximation $p_{\Delta x}$ of the optimal adjoint state.
• Compute $u_{\Delta x} = P_{U_{ad}}\bigl(-\frac{1}{\alpha}B^* p_{\Delta x}\bigr)$.
• Solve (3.31) on $\mathcal{T}$ with spatial resolution $\Delta x$ corresponding to the control $u_{\Delta x}$.
• Refine the time interval $\mathcal{T}$ according to (3.35) and construct the time grid $\mathcal{T}_{\mathrm{new}}$.
• Generate state and adjoint snapshots by solving (SE) in (3.32) with right hand side $u_{\Delta x}$ and (AE), respectively, on $\mathcal{T}_{\mathrm{new}}$ with spatial resolution $h$. Generate time derivative adjoint snapshots with time finite differences of those adjoint snapshots.
• Compute a POD basis of rank $\ell$ and build the POD reduced order model (3.21) based on the state, adjoint state and time derivative adjoint state snapshots.
• Solve (3.21) with the time grid $\mathcal{T}_{\mathrm{new}}$ to obtain $u_k^\ell$.

Let us now motivate the error control concept of Algorithm 2 by presenting the respective elaborations from [8]. The starting point is the estimate
$$ \|u - u_k^l\| = \Bigl\|P_{U_{ad}}\!\Bigl(-\frac{1}{\alpha}B^* p\Bigr) - P_{U_{ad}}\!\Bigl(-\frac{1}{\alpha}B^* p_k^l\Bigr)\Bigr\| \le C\,\|p - p_k^l\|, $$
which follows from the linearity and boundedness of the control operator $B$ and the non-expansiveness of the orthogonal projection $P_{U_{ad}}$, and which allows to estimate the control error by the error in the adjoint approximation of the optimal control problem.


Let us motivate our approach by analyzing the error $\|p(u) - \tilde{p}_k^\ell(u_k^\ell)\|_{L^2(0,T,V)}$ between the optimal adjoint solution $p(u)$ satisfying (AE) in (3.32) associated with the optimal control $u$ for (3.20), i.e. $u = P_{U_{ad}}(-\frac{1}{\alpha}B^* p)$, and the POD reduced approximation $\tilde{p}_k^\ell(u_k^\ell)$, which is the time discrete solution to the POD-ROM for (AE) in (3.32) associated with the time discrete optimal control $u_k^\ell$ for (3.21), i.e. $y = y_k^\ell(u_k^\ell)$ in (AE) of (3.32). We denote by $V$ the space $V = H_0^1(\Omega)$ and by $H$ the space $L^2(\Omega)$. To ease notation, let us denote by $p_k(u_k)$ the time discrete weak adjoint solution of (3.33) associated with the control $u_k = P_{U_{ad}}(-\frac{1}{\alpha}B^* p_k)$, and by $\tilde{p}_k(u_k^\ell)$ the time discrete adjoint solution to (AE) in (3.32) with respect to the control $u_k^\ell$. Furthermore, $p_k(u_k^\ell)$ is the time discrete adjoint solution to (AE) in (3.32) with respect to the suboptimal control $u_k^\ell$, i.e. $y = y(u_k^\ell)$ in (AE) of (3.32). By $P^\ell: V \to V^\ell$ we denote the orthogonal POD projection operator
$$ P^\ell y := \sum_{i=1}^{\ell}\langle y,\psi_i\rangle_V\,\psi_i \qquad \text{for } y \in V. $$

Theorem 3.2 Suppose that $\ell > 0$, $p(u)$ is the solution of (AE) in (3.32) and $\tilde{p}_k^\ell(u_k^\ell)$ is the time discrete POD solution of (AE) in (3.32) with $y = y_k^\ell$. Let us also assume that
$$ \|p_k(u_k) - \tilde{p}_k(u_k)\|_{L^2(0,T;V)} \le \varepsilon. \tag{3.36} $$
Then, there exist $C_1, C_2, C_3 > 0$ such that
$$ \|p(u) - \tilde{p}_k^\ell(u_k^\ell)\|_{L^2(0,T,V)} \le C_1\eta + \frac{C_2}{\alpha}\bigl(\|\zeta_k\|_U + \|\zeta_k^\ell\|_U\bigr) + \sqrt{C_3\Bigl(\sum_{i=\ell+1}^{d}\lambda_i^k + \|y_k - y_k^\ell\|_{L^2(0,T,H)}^2\Bigr)}, \tag{3.37} $$
where the functions $\zeta_k$ and $\zeta_k^\ell$ are specified in the proof of the theorem.

Proof By the triangle inequality we get the following estimates for the $L^2(0,T;V)$-norm:

$$ \|p(u) - \tilde{p}_k^\ell(u_k^\ell)\| \le \underbrace{\|p(u) - p_k(u_k)\|}_{(1)} + \underbrace{\|p_k(u_k) - P^\ell p_k(u_k)\|}_{(2)} + \underbrace{\|P^\ell p_k(u_k) - P^\ell\tilde{p}_k(u_k^\ell)\|}_{(3)} + \underbrace{\|P^\ell\tilde{p}_k(u_k^\ell) - \tilde{p}_k^\ell(u_k^\ell)\|}_{(4)}. \tag{3.38} $$


The term (3.38)(1) can be estimated by (3.34) and concerns the snapshot generation. Thus, we can decide on a certain tolerance in order to have a prescribed error. The second term (2) in (3.38) is the POD projection error and can be estimated by the sum of the neglected eigenvalues. Then, we note that the third term (3.38)(3) can be estimated as follows:
$$ \|P^\ell p_k(u_k) - P^\ell\tilde{p}_k(u_k^\ell)\| \le \|P^\ell\|\,\|p_k(u_k) - \tilde{p}_k(u_k^\ell)\| \le C_2\|u_k - u_k^\ell\|_U, \tag{3.39} $$

where $\|P^\ell\| \le 1$ and $C_2 > 0$ is the constant referring to the Lipschitz continuity of $p_k$, independent of $k$ as in [59]. In Eq. (3.39) we make use of assumption (3.36). In order to control the quantity $\|u_k - u_k^\ell\|_U \le \|u_k - u\|_U + \|u - u_k^\ell\|_U$ we make use of the a posteriori error estimation of [66], which provides an upper bound for the error between the (unknown) optimal control and an arbitrary control $u_p$ (here $u_p = u_k$ and $u_p = u_k^\ell$) by
$$ \|u - u_p\|_U \le \frac{1}{\alpha}\|\zeta_p\|_U, $$
where $\alpha$ is the regularization parameter in the cost functional and $\zeta_p \in L^2(0,T;\mathbb{R}^m)$ is chosen such that
$$ \langle\alpha u_p - B^* p(u_p) + \zeta_p,\, u - u_p\rangle_U \ge 0 \qquad \text{for all } u \in U_{ad} $$

is satisfied. Finally, the last term (3.38)(4) can be estimated according to [45]; it involves the sum of the eigenvalues not considered, the first derivative of the time discrete adjoint variable, and the difference between the state and the POD state:
$$ \|P^\ell p_k(u_k^\ell) - p_k^\ell(u_k^\ell)\|^2 \le C_3\Bigl(\sum_{i=\ell+1}^{d}\lambda_i^k + \|\dot{p}_k(u_k^\ell) - P^\ell\dot{p}_k(u_k^\ell)\|_{L^2(0,T,V)}^2 + \|y_k(u_k^\ell) - y_k^\ell(u_k^\ell)\|_{L^2(0,T,H)}^2\Bigr), \tag{3.40} $$

for a constant $C_3 > 0$. We note that the sum of the neglected eigenvalues is sufficiently small provided that $\ell$ is large enough. Furthermore, the error estimation (3.40) depends on the time derivative $\dot{p}_k$. To avoid this dependence, we include time derivative information concerning the adjoint variable into the snapshot set, see [53]. ∎

Remark 3.1
1. We assume in (3.36) that the difference between $p_k(u_k)$ and $\tilde{p}_k(u_k)$ is small, since the continuous solution of (3.33) coincides with the solution of (AE) in (3.32).
2. We note that the estimates (3.37) and (3.40) involve the state variable, which is estimated as in (3.35).


The following numerical examples together with their description are taken from [8].

3.4.5 Test 1: Solution with Steep Gradient Towards Final Time

The data for this test example is inspired by Example 5.3 in [23], with the following choices: $\Omega = (0,1)$ and $[0,T] = [0,1]$. We set $U_{ad} = L^\infty(0,T;\mathbb{R}^m)$. The example is built such that the exact optimal solution $(\bar{y},\bar{u})$ of problem (3.19) with associated optimal adjoint state $\bar{p}$ is known:
$$ \bar{y}(x,t) = \sin(\pi x)\sin(\pi t), \qquad \bar{p}(x,t) = x(x-1)\Bigl(t - \frac{e^{(t-1)/\varepsilon} - e^{-1/\varepsilon}}{1 - e^{-1/\varepsilon}}\Bigr), $$
$$ \bar{u}(t) = -\frac{1}{\alpha}B^*\bar{p}(x,t) = -t + \frac{e^{(t-1)/\varepsilon} - e^{-1/\varepsilon}}{1 - e^{-1/\varepsilon}}, $$
with $m = 1$ and the control shape function $\chi(x) = x(x-1)$ for the operator $B$. This leads to the right hand side
$$ f(x,t) = \pi\sin(\pi x)\bigl(\cos(\pi t) + \pi\sin(\pi t)\bigr) + x(x-1)\Bigl(t - \frac{e^{(t-1)/\varepsilon} - e^{-1/\varepsilon}}{1 - e^{-1/\varepsilon}}\Bigr), $$
the desired state
$$ y_d(x,t) = \sin(\pi x)\sin(\pi t) + x(x-1)\Bigl(1 - \frac{e^{(t-1)/\varepsilon}\cdot 1/\varepsilon}{1 - e^{-1/\varepsilon}}\Bigr) + 2\Bigl(t - \frac{e^{(t-1)/\varepsilon} - e^{-1/\varepsilon}}{1 - e^{-1/\varepsilon}}\Bigr), $$

and the initial condition $y_0(x) = 0$. We choose the regularization parameter $\alpha = 1/30$. For small values of $\varepsilon$ (we use $\varepsilon = 10^{-4}$), the adjoint state $\bar{p}$ develops a layer towards $t = 1$, which can be seen in the left plots of Figs. 3.21 and 3.22. In this test run we focus on the influence of the time grid on the approximation quality of the POD solution. Therefore, we compare the use of two different types of time grids: an equidistant time grid characterized by the time increment $\Delta t = 1/n$, and a non-equidistant (adaptive) time grid characterized by $n + 1$ degrees of freedom (dof). We build the POD-ROM from the uncontrolled problem: we create the snapshot ensemble by determining the associated state $y(u_0)$ and adjoint state $p(u_0)$ corresponding to the control function $u_0 \equiv 0$, and we also include the initial condition $y_0$ and the time derivatives of the adjoint, $p_t(u_0)$, into our snapshot set, which is accomplished with time finite differences of the adjoint snapshots.
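To see why the time grid matters here, one can evaluate the analytic control on the two grid types. The following Python sketch is an illustration only: the graded grid is an assumed construction clustering points near $t = 1$, not the grid produced by Algorithm 2.

import numpy as np

eps = 1e-4

def u_bar(t):
    # Analytic optimal control of Test 1; boundary layer of width O(eps) at t = 1.
    return -t + (np.exp((t - 1.0) / eps) - np.exp(-1.0 / eps)) / (1.0 - np.exp(-1.0 / eps))

t_uni = np.linspace(0.0, 1.0, 21)                        # uniform grid, dof = 21
t_graded = 1.0 - (1.0 - np.linspace(0.0, 1.0, 21))**4    # assumed grading towards t = 1

t_ref = np.linspace(0.0, 1.0, 100001)                    # fine reference grid
for name, t in [("uniform", t_uni), ("graded", t_graded)]:
    # Error of piecewise linear interpolation of u_bar between the grid points.
    err = np.max(np.abs(np.interp(t_ref, t, u_bar(t)) - u_bar(t_ref)))
    print(f"{name:8s} grid, max interpolation error: {err:.3e}")

With the same number of points, the uniform grid steps over the layer entirely, while the graded grid reduces the interpolation error substantially, mirroring the behaviour of the adaptive grid in Figs. 3.21 and 3.22.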



Fig. 3.21 Test 1: Analytical optimal adjoint state $\bar{p}$ (left), POD adjoint solution $p^\ell$ utilizing an equidistant time grid with $\Delta t = 1/20$ (middle), POD adjoint solution $p^\ell$ utilizing an adaptive time grid with dof = 21 (right)


Fig. 3.22 Test 1: Contour lines of the analytical optimal adjoint state $\bar{p}$ (left), POD adjoint solution $p^\ell$ utilizing an equidistant time grid with $\Delta t = 1/20$ (middle), POD adjoint solution $p^\ell$ utilizing an adaptive time grid with dof = 21 (right)

We use $\ell = 1$ POD basis function. Although we would also have the possibility to use suboptimal snapshots corresponding to an approximation $u_{\Delta x}$ of the optimal control, here we want to emphasize the importance of the time grid. Nevertheless, in this example the quality of the POD solution does not really differ if we consider suboptimal or uncontrolled snapshots. First, we leave out the post-processing step 4 of Algorithm 2 and discuss its inclusion later. Figure 3.23 visualizes the space-time mesh of the numerical solution of (3.33) with the temporal residual-type a posteriori error estimate (3.34). The first grid in Fig. 3.23 corresponds to the choice of dof = 21 and $\Delta x = 1/100$, whereas the grid in the middle refers to using dof = 21 and $\Delta x = 1/5$. Both choices for the spatial discretization lead to exactly the same time grid, which displays fine time steps towards the end of the time horizon (where the layer in the optimal adjoint state is located), whereas at the beginning and in the middle of the time interval the time steps are larger. This clearly indicates that the resulting time adaptive grid is very insensitive to changes in the spatial resolution. For the sake of completeness, the equidistant grid with the same number of degrees of freedom is shown in the right plot of Fig. 3.23. Since the generation of the time adaptive grid as well as the approximation of the optimal solution is done in the offline part of POD-MOR, this process shall be performed quickly, which is why we pick $\Delta x = 1/5$ for step 1 in Algorithm 2.


Fig. 3.23 Test 1: Adaptive space-time grids with dof = 21 according to the strategy in [23] and $\Delta x = 1/100$ (left) and $\Delta x = 1/5$ (middle), respectively, and the equidistant grid with $\Delta t = 1/20$ (right)


Fig. 3.24 Test 1: Analytical optimal control $\bar{u}$ (top left), approximation $u_{\Delta x}$ of the optimal control gained by step 1 of Algorithm 2 (top middle), approximation of the optimal control utilizing OS-POD on a uniform time grid with $\Delta t = 1/20$ (top right); POD control utilizing a uniform time grid with $\Delta t = 1/20$ (bottom left), POD control utilizing an adaptive time grid with dof = 21 (bottom middle), approximation of the optimal control utilizing OS-POD on an adaptive time grid with dof = 21 (bottom right)

In the middle and right panels of Figs. 3.21 and 3.22 we show the surface and contour lines of the POD adjoint state with an equidistant and an adaptive time grid, respectively. The analytical control intensity $\bar{u}(t)$, the approximation $u_{\Delta x}$ of the optimal control computed in step 1 of Algorithm 2, as well as the POD controls utilizing a uniform and a time adaptive grid, respectively, are shown in Fig. 3.24. Table 3.6 summarizes the approximation quality of the POD solution depending on different time discretizations. The fineness of the time discretization (characterized by $\Delta t$ and dof, respectively) is chosen in such a way that the results of uniform and adaptive temporal discretization are comparable.


Table 3.6 Test 1: Absolute errors between the analytical optimal solution and the POD solution depending on the time discretization (equidistant: columns 1–4, adaptive: columns 5–8)

$\Delta t$ | $\varepsilon_{\mathrm{abs}}^y$ | $\varepsilon_{\mathrm{abs}}^u$ | $\varepsilon_{\mathrm{abs}}^p$ | dof | $\varepsilon_{\mathrm{abs}}^y$ | $\varepsilon_{\mathrm{abs}}^u$ | $\varepsilon_{\mathrm{abs}}^p$
1/20 | 1.51 · 10^{-2} | 1.98 · 10^{-1} | 3.62 · 10^{-2} | 21 | 5.19 · 10^{-2} | 5.34 · 10^{-2} | 9.63 · 10^{-3}
1/42 | 1.12 · 10^{-2} | 2.11 · 10^{-1} | 3.85 · 10^{-2} | 43 | 5.16 · 10^{-2} | 2.49 · 10^{-2} | 4.36 · 10^{-3}
1/61 | 1.08 · 10^{-2} | 2.14 · 10^{-1} | 3.92 · 10^{-2} | 62 | 5.16 · 10^{-2} | 2.33 · 10^{-2} | 4.07 · 10^{-3}
1/114 | 1.12 · 10^{-2} | 2.18 · 10^{-1} | 3.99 · 10^{-2} | 115 | 5.16 · 10^{-2} | 2.30 · 10^{-2} | 4.03 · 10^{-3}
1/6500 | 2.05 · 10^{-2} | 1.29 · 10^{-1} | 2.35 · 10^{-2} | – | – | – | –

Table 3.7 Test 1: CPU times for POD offline computations (snapshot generation and POD basis computation) depending on the time discretization (equidistant: columns 1–3, adaptive: columns 4–7)

$\Delta t$ | POD offline | POD online | dof | compute $\mathcal{T}$ | POD offline | POD online
1/20 | 0.12 s | 0.04 s | 21 | 2.3 s | 0.12 s | 0.04 s
1/42 | 0.13 s | 0.07 s | 43 | 4.2 s | 0.13 s | 0.07 s
1/61 | 0.14 s | 0.09 s | 62 | 5.5 s | 0.14 s | 0.09 s
1/114 | 0.18 s | 0.10 s | 115 | 8.9 s | 0.18 s | 0.10 s
1/6500 | 58.6 s | 4.74 s | – | – | – | –

The absolute errors between the analytical optimal state $\bar{y}$ and the POD solution $y^\ell$, defined by $\varepsilon_{\mathrm{abs}}^y := \|\bar{y} - y^\ell\|_{L^2(\Omega_T)}$, are listed in columns 2 and 6; the same applies to the errors in the control, $\varepsilon_{\mathrm{abs}}^u := \|\bar{u} - u^\ell\|_U$ (columns 3 and 7), and in the adjoint state, $\varepsilon_{\mathrm{abs}}^p := \|\bar{p} - p^\ell\|_{L^2(\Omega_T)}$ (columns 4 and 8). If we compare the results, we note that we gain one order of accuracy for the adjoint and control variable with the time adaptive grid. For the state variable, the use of an adaptive time grid leads to slightly worse results, which we will discuss later. Both for the full solution and the reduced order solution, three steepest descent iterations are needed. In order to achieve an accuracy in the control variable of order $10^{-2}$ and an accuracy in the adjoint state of order $10^{-3}$ with an equidistant time grid, we would need about $n = 20{,}000$ time steps (not listed in Table 3.6), which is not feasible with our machine. The largest number of time steps we are able to manage with our computing resources is $n = 6500$. With $n = 6500$ time steps, the accuracy in the control and adjoint state is one order worse than the results obtained with the time adaptive grid with only 21 degrees of freedom. Furthermore, the offline CPU time for snapshot generation and POD basis computation is 58.6 s. On the contrary, Algorithm 2 leads to a total CPU time of 2.46 s if we consider offline and online altogether (compare Table 3.7). This emphasizes that using an appropriate (non-equidistant) time grid for the adjoint variable is of particular importance in order to efficiently achieve POD controls of good quality. Furthermore, we compare the optimization of the full FE model to the optimization with the POD-MOR model obtained with our approach. First of all, Table 3.8 lists the errors between the FE solution with spatial discretization $h = 1/100$ and the analytical solution. It shows that we need more than about $n = 40{,}000$ equidistantly distributed time steps to achieve a value of the cost functional close to the analytical


Table 3.8 Absolute errors between the analytical optimal solution and the finite element solution of spatial resolution $h = 1/100$ depending on the resolution of the uniform time discretization

$\Delta t$ | $\varepsilon_{\mathrm{abs}}^y$ | $\varepsilon_{\mathrm{abs}}^u$ | $\varepsilon_{\mathrm{abs}}^p$ | $J$
1/20 | 1.2961 · 10^{-2} | 1.9898 · 10^{-1} | 3.6325 · 10^{-2} | 4.1652 · 10^{4}
1/42 | 6.8850 · 10^{-3} | 2.1144 · 10^{-1} | 3.8602 · 10^{-2} | 1.9834 · 10^{4}
1/61 | 5.1979 · 10^{-3} | 2.1528 · 10^{-1} | 3.9303 · 10^{-2} | 1.3656 · 10^{4}
1/114 | 3.5501 · 10^{-3} | 2.1939 · 10^{-1} | 4.0054 · 10^{-2} | 7.3078 · 10^{3}
1/6500 | 1.2343 · 10^{-3} | 1.3016 · 10^{-1} | 2.3716 · 10^{-2} | 1.4116 · 10^{2}
1/20000 | 5.0444 · 10^{-4} | 5.2073 · 10^{-2} | 9.3888 · 10^{-3} | 9.0788 · 10^{1}
1/40000 | 2.8080 · 10^{-4} | 2.7981 · 10^{-2} | 4.9088 · 10^{-3} | 8.5692 · 10^{1}

Table 3.9 Computational times (in seconds) of the full finite element solution compared with the computational times of the POD-MOR solution including all offline times

$\Delta t$ | $h$ | Full FE run | dof | compute $\mathcal{T}$ | POD offline | POD online | Speedup
1/6500 | 1/100 | 6.15 | 21 | 2.3 | 0.12 | 0.04 | 2.5
1/20000 | 1/100 | 16.81 | | | | | 6.8
1/40000 | 1/100 | 34.50 | | | | | 14.0
1/6500 | 1/1000 | 37.51 | 21 | 2.3 | 0.16 | 0.04 | 15.0
1/20000 | 1/1000 | 115.62 | | | | | 46.2
1/40000 | 1/1000 | 231.38 | | | | | 92.5

value, which is $J \approx 8.3988 \cdot 10^{1}$. Our POD-MOR approach with an adaptive time grid reaches this value with only dof = 115 (see Table 3.11). We note that we were only able to realize the computation for $n \ge 20000$ on a compute server for memory reasons. On top of that, we compare in Table 3.9 the computational times of the full FE optimization using a uniform time grid with those of the POD-MOR solution utilizing our approach. If we want to achieve an FE solution with similar accuracy as the POD solution on a time adaptive grid, a very large number of time steps is needed, which leads to high computational times, so that we get large speedup factors. Finally, we want to mention that solving (3.33) with the adaptive space-time approach of [23] takes 43.8 s for dof = 21 and $h = 1/100$, and 2111.5 s for dof = 21 and $h = 1/1000$ (compare Table 3.7). We note that our method has a speedup factor of 17.8 and 851, respectively. Table 3.10 contains the evaluation of each term in (3.37). The value $\eta_p^i$ ($\eta_p^b$) refers to the first (second) part in (3.34). For this test example, we note that the term $\eta_p^i$ influences the estimation. However, we observe that the better the semidiscrete adjoint state $p_{\Delta x}$ from step 1 of Algorithm 2 is, the better the POD adjoint solution will be. Since all summands of (3.37) can be estimated, Table 3.10 allows us to control the approximation of the POD adjoint state. The estimate (3.35) concerning the state variable will be investigated later on. Furthermore, a comparison of the values of the cost functional is given in Table 3.11. The aim of the optimization problem (3.19) is to minimize the quantity of interest $J(y,u)$. The analytical value of the cost functional at the optimal solution


Table 3.10 Test 1: Evaluation of each summand of the error estimation (3.37)

dof | $\varepsilon_{\mathrm{abs}}^p$ | $\eta_p^i$ | $\|\zeta_k\|_U + \|\zeta_k^\ell\|_U$ | $\eta_p^b$ | $\sum_{i=\ell+1}^{d}\lambda_i$
21 | 9.6343 · 10^{-3} | 4.9518 · 10^{0} | 1.6033 · 10^{-2} | 4.8031 · 10^{-4} | 3.3938 · 10^{-4}
43 | 4.3611 · 10^{-3} | 1.1976 · 10^{0} | 1.9200 · 10^{-2} | 5.0087 · 10^{-5} | 2.9454 · 10^{-4}
62 | 4.0691 · 10^{-3} | 7.2852 · 10^{-1} | 1.9707 · 10^{-2} | 2.9835 · 10^{-5} | 2.9212 · 10^{-4}
115 | 4.0340 · 10^{-3} | 3.4966 · 10^{-1} | 2.0191 · 10^{-2} | 1.4845 · 10^{-5} | 2.9090 · 10^{-4}

Table 3.11 Test 1: Value of the cost functional at the POD solution utilizing uniform and adaptive time discretization, respectively; analytical value $J \approx 8.3988 \cdot 10^{1}$

$\Delta t$ | $J(y^\ell, u^\ell)$ | dof | $J(y^\ell, u^\ell)$
1/20 | 4.1652 · 10^{4} | 21 | 8.7960 · 10^{1}
1/42 | 1.9834 · 10^{4} | 43 | 8.4252 · 10^{1}
1/61 | 1.3656 · 10^{4} | 62 | 8.4102 · 10^{1}
1/114 | 7.3078 · 10^{3} | 115 | 8.4034 · 10^{1}
1/6500 | 1.4116 · 10^{2} | – | –

is $J(\bar{y},\bar{u}) \approx 8.3988 \cdot 10^{1}$. Table 3.11 clearly points out that the use of a time adaptive grid is fundamental for solving the optimal control problem (3.19). The huge differences in the values of the cost functional are due to the strong increase of the desired state $y_d$ at the end of the time interval (see Fig. 3.25). Small time steps at the end of the time interval, as is the case in the time adaptive grid, lead to much more accurate results. Now, let us discuss the inclusion of step 4 in Algorithm 2. Since we went for an adaptive time grid regarding the adjoint variable, we cannot in general expect that the resulting time grid is a good time grid for the state variable. Table 3.6 confirms that a uniform time grid leads to better approximation results in the state variable than the time adaptive grid. In order to improve also the approximation quality in the state variable, we incorporate the error estimation (3.35) from [53] in a post-processing step after producing the time grid with the strategy of [23] and before starting the POD solution process. We define
$$ \eta_{\mathrm{POD}_j} := \Delta t_j^2\Bigl(\int_{I_j}\|y_{tt}^k\|_H^2 + \|y_t^k\|_V^2\Bigr), $$

where $y_t^k \approx y_t(t_k)$ and $y_{tt}^k \approx y_{tt}(t_k)$ are computed via finite difference approximations. We perform bisection on those time intervals $I_j$ where the quantity $\eta_{\mathrm{POD}_j}$ attains its maximum value, and repeat this $N_{\mathrm{refine}}$ times. This results in the time grid $\mathcal{T}_{\mathrm{new}}$. The improvement of the approximation quality in the state variable can be seen in Table 3.12. The more additional time instances we include according to (3.35), the better the approximation results get with respect to the state. Moreover, the approximation quality in the control and adjoint state is improved as well. The CPU times for this post-processing step are listed in Table 3.12.
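A possible realization of this refinement loop in Python reads as follows; it is a sketch under the assumption that a scalar profile standing in for the norms of the state time derivatives can be sampled on the current grid.

import numpy as np

def refine_grid(t, profile, n_refine):
    # Bisect the interval with the largest indicator eta_j = dt_j^2 * (|y_tt|^2 + |y_t|^2),
    # with the derivatives estimated by difference quotients of the sampled profile.
    t = np.asarray(t, dtype=float)
    for _ in range(n_refine):
        y = profile(t)
        dt = np.diff(t)
        yt = np.diff(y) / dt                    # first difference quotients
        ytt = np.zeros_like(dt)
        ytt[1:] = np.diff(yt) / dt[1:]          # second difference quotients
        eta = dt**2 * (ytt**2 + yt**2)          # interval-wise indicator, cf. (3.35)
        j = int(np.argmax(eta))
        t = np.insert(t, j + 1, 0.5 * (t[j] + t[j + 1]))   # bisection of I_j
    return t

# Example: a layer profile as in Test 1 concentrates the refinement near t = 1.
eps = 1e-2
grid = refine_grid(np.linspace(0.0, 1.0, 11), lambda t: np.exp((t - 1.0) / eps), 10)
print(np.round(grid, 4))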


Fig. 3.25 Test 1: Analytical optimal state $\bar{y}$ (top left), desired state $y_d$ (top right); POD state $y^\ell$ utilizing a uniform time grid with $\Delta t = 1/20$ (bottom left), POD state $y^\ell$ utilizing an adaptive time grid with dof = 21 (bottom right)

Table 3.12 Test 1: Improvement of the approximation quality concerning the state variable and the corresponding CPU times. The initial time grid $\mathcal{T}$ is computed with dof = 43

$N_{\mathrm{refine}}$ | $\varepsilon_{\mathrm{abs}}^y$ | $\varepsilon_{\mathrm{abs}}^u$ | $\varepsilon_{\mathrm{abs}}^p$ | CPU time
0 | 5.1874 · 10^{-2} | 5.3428 · 10^{-2} | 9.6343 · 10^{-3} | –
5 | 4.0058 · 10^{-2} | 2.1145 · 10^{-2} | 3.6378 · 10^{-3} | 0.2 s
10 | 3.0909 · 10^{-2} | 1.8396 · 10^{-2} | 3.0895 · 10^{-3} | 0.3 s
20 | 2.4759 · 10^{-2} | 1.7104 · 10^{-2} | 2.8210 · 10^{-3} | 0.4 s
30 | 2.3028 · 10^{-2} | 1.6971 · 10^{-2} | 2.7906 · 10^{-3} | 0.4 s

We note that the sum of the neglected eigenvalues $\sum_{i=2}^{d}\lambda_i$ is approximately zero and the second largest eigenvalue of the correlation matrix is of order $10^{-10}$, which makes the use of additional POD basis functions redundant. Likewise, in this particular example the choice of richer snapshots (even the optimal snapshots) does not bring significant improvements in the approximation quality of the POD solutions. So, this example shows that solely the use of an appropriate adaptive time mesh efficiently improves the accuracy of the POD solution. Finally, we compare the approximation $u_{\Delta x}$ of the optimal control from step 2 in Algorithm 2 with an approximation of the optimal control we get by performing OS-POD (optimality system POD, see e.g. [54] and Sect. 3.4.2) on a uniform time


Table 3.13 Test 1: Accuracy of the approximate control $u_{\Delta x}$ from Algorithm 2 in comparison with the approximate OS-POD control $u_{\mathrm{OSPOD}}$ on uniform (columns 2, 4) and adaptive (column 3) time grids

Error | $\Delta t = 1/20$ | dof = 21 | $\Delta t = 1/6500$
$\|u_{\mathrm{OSPOD}} - \bar{u}\|_U$ | 2.2988 · 10^{-1} | 9.2801 · 10^{-2} | 1.2896 · 10^{-1}
$\|u_{\Delta x} - \bar{u}\|_U$ | – | 1.1162 · 10^{-2} | –

Table 3.14 Test 1: Computational times for computing an approximation of the optimal control using steps 1–2 of Algorithm 2 (row 2) and using OS-POD for different temporal resolutions (rows 3–4)

Control | CPU time
$u_{\Delta x}$, dof = 21 | 2.3 s
$u_{\mathrm{OSPOD}}$, $\Delta t = 1/20$ | 0.37 s
$u_{\mathrm{OSPOD}}$, $\Delta t = 1/6500$ | 315.6 s

grid. In our runs for OS-POD, the snapshots are taken from the state, the adjoint state, the time derivative of the adjoint state and the initial condition $y_0$. We use $\ell = 1$ basis function and perform two gradient steps. The comparison of the controls $u_{\Delta x}$ (Fig. 3.24, top middle) and $u_{\mathrm{OSPOD}}$ on a uniform time grid (Fig. 3.24, top right) with the optimal control $\bar{u}$ (Fig. 3.24, top left) visualizes that $u_{\Delta x}$ is closer to the optimal solution than $u_{\mathrm{OSPOD}}$. We also combined OS-POD with the time adaptive grid (Fig. 3.24, bottom right). In this example, it turns out that the accuracy of the control variable is improved by a well-suited adaptive time grid. Tables 3.13 and 3.14 show the control error and the CPU time. As expected, a very large number of time steps for a uniform time discretization is needed. Indeed, the control computed on a fine equidistant grid with $n = 6500$ is less accurate than that on a coarse adaptive grid (compare Table 3.13). Finally, Table 3.14 shows the computational costs of the offline stage to compute a reference control with OS-POD and with our approach. It turns out that our approach is moderately more expensive than OS-POD on the coarse uniform time grid, but much cheaper than OS-POD on the fine uniform time grid. However, as already mentioned, our method in both cases delivers a better control approximation than OS-POD.

3.4.6 Test 2: Solution with Steep Gradient in the Middle of the Time Interval

Let $\Omega = (0,1)$ be the spatial domain and $[0,T] = [0,1]$ be the time interval. We choose $\varepsilon = 10^{-4}$ and $\alpha = 1$. To begin with, we consider an unconstrained optimal control problem and investigate the inclusion of control constraints separately in Test 3. We build the example in such a way that the analytical solution $(\bar{y},\bar{u})$



Fig. 3.26 Test 2: Analytical optimal adjoint state $\bar{p}$ (left), POD adjoint solution $p^\ell$ with $\ell = 4$ utilizing an equidistant time grid with $\Delta t = 1/40$ (middle), POD adjoint solution $p^\ell$ with $\ell = 4$ utilizing an adaptive time grid with dof = 41 (right)


Fig. 3.27 Test 2: Contour lines of the analytical optimal adjoint state $\bar{p}$ (left), POD adjoint solution $p^\ell$ with $\ell = 4$ utilizing an equidistant time grid with $\Delta t = 1/40$ (middle), POD adjoint solution $p^\ell$ with $\ell = 4$ utilizing an adaptive time grid with dof = 41 (right)

of (3.19) is given by:
$$ \bar{y}(x,t) = x^3(x-1)\,t, \qquad \bar{p}(x,t) = \sin(\pi x)\,\mathrm{atan}\Bigl(\frac{t - 0.5}{\varepsilon}\Bigr)(t - 1), $$
$$ \bar{u}_1(t) = \bar{u}_2(t) = -\frac{8}{\pi^3}\,\mathrm{atan}\Bigl(\frac{t - 0.5}{\varepsilon}\Bigr) - \frac{32}{\pi^2}\,(t - 1), $$
$$ \chi_1(x) = \max\bigl(0,\,1 - 16(x - 0.25)^2\bigr), \qquad \chi_2(x) = \max\bigl(0,\,1 - 16(x - 0.75)^2\bigr). $$

The desired state and the forcing term are chosen accordingly. Due to the arctangent term and the small value of $\varepsilon$, the adjoint state exhibits an interior layer with steep gradient at $t = 0.5$, which can be seen in the left panels of Figs. 3.26 and 3.27. The shape functions $\chi_1$ and $\chi_2$ are shown in Fig. 3.28 on the left. As in Test 1, we study the use of two different time grids: an equidistant time discretization and the time adaptive grid computed in step 1 of Algorithm 2 (see Fig. 3.29). Once again, we note that spatial and temporal discretization decouple when computing the time adaptive grid utilizing the a posteriori estimate (3.34), which enables us to use a large spatial grid size $\Delta x$ for solving the elliptic system and thus to keep the offline costs low.


Fig. 3.28 Test 2: Shape functions $\chi_1(x)$ and $\chi_2(x)$ (left), decay of the eigenvalues on a semilog scale (middle) and first POD basis function $\psi_1$ (right) utilizing a uniform time grid with $\Delta t = 1/40$


Fig. 3.29 Test 2: Adaptive space-time grids with dof = 41 according to the strategy in [23] and $\Delta x = 1/100$ (left) and $\Delta x = 1/5$ (middle), respectively, and the equidistant grid with $\Delta t = 1/40$ (right)

We choose state and adjoint snapshots as well as time derivative adjoint snapshots corresponding to $u_0 = 0$, and we also include the initial condition $y_0$ into our snapshot set. We take $\ell = 4$ POD modes. Later we will also try out different numbers of POD basis functions. The middle and right plots of Figs. 3.26 and 3.27 show the surface and contour lines of the POD adjoint solution utilizing an equidistant time grid (with $\Delta t = 1/40$) and utilizing the adaptive time grid (with dof = 41), respectively. Clearly, the equidistant time grid fails to capture the interior layer at $t = 1/2$ satisfactorily, whereas the POD adjoint state utilizing the adaptive time grid approximates the interior layer accurately. Unlike in Test 1, the adaptive time grid is also a suitable time grid for the state variable in this numerical test example. This can be seen visually when comparing the results for the POD state utilizing uniform discretization and utilizing the adaptive time grid with the analytical optimal state, Figs. 3.30 and 3.31. Table 3.15 summarizes the absolute errors between the analytical optimal solution and the POD solution for the state, control and adjoint state for all test runs with an equidistant and an adaptive time grid, respectively. If we compare the results of the numerical approximation, we note that the use of an adaptive time grid greatly improves the quality of the POD solution with respect to an equidistant grid. The exact optimal control intensities $\bar{u}_1(t)$ and $\bar{u}_2(t)$ as well as the POD solutions utilizing uniform and adaptive temporal discretization are illustrated in Fig. 3.32.



Fig. 3.30 Test 2: Analytical optimal state ȳ (left), POD solution y^ℓ with ℓ = 4 utilizing an equidistant time grid with Δt = 1/40 (middle), POD solution y^ℓ with ℓ = 4 utilizing an adaptive time grid with dof = 41 (right)


Fig. 3.31 Test 2: Contour lines of the analytical optimal state ȳ (left), POD solution y^ℓ with ℓ = 4 utilizing an equidistant time grid with Δt = 1/40 (middle), POD solution y^ℓ with ℓ = 4 utilizing an adaptive time grid with dof = 41 (right)

Table 3.15 Test 2: Absolute errors between the analytical optimal solution and the POD solution with ℓ = 4 depending on the time discretization (equidistant: columns 1–4, adaptive: columns 5–8)

Δt       ε_abs^y         ε_abs^u         ε_abs^p         | dof   ε_abs^y         ε_abs^u         ε_abs^p
1/20     5.08 · 10^-01   7.84 · 10^+00   3.54 · 10^+01   | 21    4.03 · 10^-02   5.41 · 10^-01   2.44 · 10^+00
1/40     2.62 · 10^-01   4.11 · 10^+00   1.85 · 10^+01   | 41    2.22 · 10^-04   5.35 · 10^-03   1.32 · 10^-02
1/68     1.56 · 10^-01   2.45 · 10^+00   1.11 · 10^+01   | 69    9.70 · 10^-05   4.57 · 10^-03   4.27 · 10^-03
1/134    7.87 · 10^-02   1.24 · 10^+00   5.59 · 10^+00   | 135   8.56 · 10^-05   4.49 · 10^-03   2.35 · 10^-03
1/6500   1.41 · 10^-04   4.92 · 10^-03   9.33 · 10^-03   | –     –               –               –

Another point of comparison is the evaluation of the cost functional. Since the exact optimal solution is known analytically, we can compute the exact value of the cost functional, which is J(ȳ, ū) = 1.0085 · 10³. As expected, the adaptive time grid enables us to approximate this value quite well when using dof = 135, see Table 3.16. In contrast, the result for a very fine temporal discretization with Δt = 1/10000 is still worse than the result for the adaptive time grid with only 41 degrees of freedom. Again, this emphasizes the importance of a suitable time grid. For computing the full solution and the reduced order solution, we need three gradient steps in each case. The CPU times for the test runs are summarized in Table 3.17. In order to achieve an accuracy in the control and adjoint variable of order 10⁻³, we need around n = 6500 time steps. In this case, the CPU time for the POD offline phase gets rather large (52.4 s). In contrast, computing a time-adaptive grid on which the snapshots are sampled and on which the POD-ROM simulation is performed makes computational sense.
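For the comparison in Table 3.16 to be meaningful, the cost functional has to be evaluated consistently on each (possibly nonuniform) time grid. A minimal sketch, assuming a tracking-type functional J(y, u) = ½∫‖y − y_d‖²_{L²} dt + (α/2)Σᵢ∫uᵢ(t)² dt (the precise definition is given by (3.19) earlier in this lecture; all names here are illustrative), could use the composite trapezoidal rule:

```python
import numpy as np

def trapz(f, x, axis=-1):
    """Composite trapezoidal rule on a (possibly nonuniform) grid x."""
    f = np.moveaxis(f, axis, -1)
    dx = np.diff(x)
    return np.sum(0.5 * dx * (f[..., 1:] + f[..., :-1]), axis=-1)

def cost_functional(t, x, y, y_d, u, alpha):
    """J = 1/2 int ||y - y_d||^2_{L2} dt + alpha/2 sum_i int u_i(t)^2 dt.

    y, y_d: arrays of shape (len(x), len(t)); u: shape (n_controls, len(t))."""
    misfit = trapz((y - y_d) ** 2, x, axis=0)   # spatial L2 norm squared per t
    return 0.5 * trapz(misfit, t) + 0.5 * alpha * trapz((u ** 2).sum(axis=0), t)
```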



Fig. 3.32 Test 2: Analytical control intensities ū₁(t) (top left) and ū₂(t) (bottom left), POD control utilizing an equidistant time grid with Δt = 1/40 and ℓ = 4 (middle), POD control utilizing an adaptive time grid with dof = 41 and ℓ = 4 (right)

Table 3.16 Test 2: Value of the cost functional with ℓ = 4, true value J ≈ 1.0085 · 10^+03

Δt        J(y^ℓ, u^ℓ)       | dof   J(y^ℓ, u^ℓ)
1/20      3.1225 · 10^+05   | 21    1.9553 · 10^+04
1/40      1.5619 · 10^+05   | 41    1.0274 · 10^+03
1/68      9.1901 · 10^+04   | 69    1.0065 · 10^+03
1/134     4.6655 · 10^+04   | 135   1.0082 · 10^+03
1/10000   1.0350 · 10^+03   | –     –

Table 3.17 Test 2: CPU times for POD offline computations (snapshot generation and POD basis computation) depending on the time discretization (equidistant: columns 1–3, adaptive: columns 4–7)

Δt       POD offline   POD online   | dof   Compute T   POD offline   POD online
1/20     0.16 s        0.07 s       | 21    1.89 s      0.16 s        0.07 s
1/40     0.17 s        0.08 s       | 41    5.32 s      0.17 s        0.08 s
1/68     0.17 s        0.10 s       | 69    7.81 s      0.17 s        0.10 s
1/134    0.19 s        0.17 s       | 135   12.71 s     0.19 s        0.17 s
1/6500   52.4 s        5.99 s       | –     –           –             –

For the sake of completeness we also study and compare the POD approximation for ℓ = 1 POD basis function. To begin, we note that the decay of the eigenvalues is shown in the middle panel of Fig. 3.28. As one can see, the use of more than ℓ = 4 POD basis functions does not lead to more accurate approximations. The first POD mode ψ₁ is depicted in the right panel of Fig. 3.28. Table 3.18 shows the absolute error


Table 3.18 Test 2: Absolute errors between the analytical optimal solution and the POD solution with ℓ = 1 depending on the time discretization (equidistant: columns 1–4, adaptive: columns 5–8)

Δt      ε_abs^y         ε_abs^u         ε_abs^p         | dof   ε_abs^y         ε_abs^u         ε_abs^p
1/20    5.06 · 10^-01   7.84 · 10^+00   3.54 · 10^+01   | 21    4.53 · 10^-02   5.41 · 10^-01   2.44 · 10^+00
1/40    2.62 · 10^-01   4.11 · 10^+00   1.85 · 10^+01   | 41    2.07 · 10^-02   5.35 · 10^-03   1.32 · 10^-02
1/68    1.57 · 10^-01   2.45 · 10^+00   1.11 · 10^+01   | 69    2.07 · 10^-02   4.57 · 10^-03   4.27 · 10^-03
1/134   8.11 · 10^-02   1.24 · 10^+00   5.59 · 10^+00   | 135   2.07 · 10^-02   4.49 · 10^-03   2.35 · 10^-03

between the analytical solution and the POD solution in the state, control and adjoint state for the uniform as well as the adaptive time discretization for ℓ = 1. We note that in the case of the uniform temporal discretization, the use of ℓ = 1 POD basis function leads to approximation results similar to those obtained with ℓ = 4 POD modes. In contrast, the adaptive time discretization with ℓ = 4 POD basis functions leads to more accurate approximation results for the state variable than using ℓ = 1 POD mode (compare Tables 3.18 and 3.15). However, even with only one POD mode, the time-adaptive grid gives very accurate results.

3.4.7 Test 3: Control Constrained Problem

In this test we add control constraints to the previous example. We set u_{1,a}(t) ≤ u₁(t) ≤ u_{1,b}(t) and u_{2,a}(t) ≤ u₂(t) ≤ u_{2,b}(t) for the time-dependent control intensities u₁(t) and u₂(t). The analytical value range for both controls is u₁(t), u₂(t) ∈ [−0.3479, 0.1700] for t ∈ [0, 1]. For each control intensity we choose different upper and lower bounds: we set u_{1,a}(t) = −100 (i.e. no restriction), u_{1,b}(t) = 0.1 and u_{2,a}(t) = −0.2, u_{2,b}(t) = 0. For the solution of the problems (3.20), (3.21) we use a projected gradient method. For both the full solution and the reduced order solution, 5 projected gradient steps are needed. The solution of the nonlinear, nonsmooth Eq. (3.33) can be obtained by a semismooth Newton method or by a Newton method utilizing a regularization of the projection formula, see [58]. In our numerical tests we compute the approximate solution to (3.33) with a fixed point iteration and initialize the method with the adjoint state corresponding to the control-unconstrained optimal control problem. In this way, only two iterations are needed for convergence. Convergence of the fixed point iteration can be argued for large enough values of α, see [43]. The analytical optimal solutions ū₁ and ū₂ are shown in the left plots of Fig. 3.33. For the POD basis computation, we use state, adjoint and time-derivative adjoint snapshots corresponding to the reference control u₀ = 0, and we also include the initial condition y₀ in our snapshot set. Figure 3.33 shows the POD controls using a uniform (middle panels) and an adaptive temporal discretization (right panels). We again emphasize how accurate our approximation with an adaptive time grid is in comparison to a uniform grid (see Table 3.19). We note that the inclusion of box constraints on the control functions leads in general to better approximation



Fig. 3.33 Test 3: Inclusion of box constraints for the control intensities: analytical control intensities ū₁(t) (top left) and ū₂(t) (bottom left), POD control utilizing an equidistant time grid with Δt = 1/40 and ℓ = 4 (middle), POD control utilizing an adaptive time grid with dof = 41 and ℓ = 4 (right)

Table 3.19 Test 3: Inclusion of box constraints for the control intensities: absolute errors between the analytical optimal solution and the POD solution with ℓ = 4 depending on the time discretization (equidistant: columns 1–4, adaptive: columns 5–8)

Δt       ε_abs^y         ε_abs^u         ε_abs^p         | dof   ε_abs^y         ε_abs^u         ε_abs^p
1/20     2.86 · 10^-01   5.72 · 10^+00   3.54 · 10^+01   | 21    2.27 · 10^-02   3.96 · 10^-01   2.44 · 10^+00
1/40     1.48 · 10^-01   3.00 · 10^+00   1.86 · 10^+01   | 41    2.95 · 10^-04   4.50 · 10^-03   1.32 · 10^-02
1/68     8.81 · 10^-02   1.79 · 10^+00   1.11 · 10^+01   | 69    2.13 · 10^-04   3.28 · 10^-03   4.26 · 10^-03
1/134    4.46 · 10^-02   9.05 · 10^-01   5.60 · 10^+00   | 135   2.13 · 10^-04   3.13 · 10^-03   2.35 · 10^-03
1/6500   2.60 · 10^-04   3.87 · 10^-03   9.33 · 10^-03   | –     –               –               –

results, compare Table 3.15 with Table 3.19. This is due to the fact that on the active sets the error between the analytical optimal controls and the POD solutions vanishes. The CPU times are listed in Table 3.20. As one can see, to achieve an accuracy of order 10⁻³ for the control and adjoint variable, n ≈ 6500 time steps are necessary with a uniform temporal discretization (compare Table 3.19). In this case, the POD simulation including the offline phase takes 55.88 s, whereas our approach with 69 degrees of freedom takes around 8.18 s. This gives a speedup factor of approximately 7.
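A single projected gradient step for the box-constrained problem is conceptually simple. The following minimal sketch is illustrative only; grad_J (the reduced, adjoint-based gradient αu + B*p), the step size and the bounds are placeholders for the corresponding quantities of this test, where 5 such steps sufficed:

```python
import numpy as np

def projected_gradient(u, grad_J, lo, hi, step, tol=1e-8, max_iter=50):
    """Projected gradient method for min J(u) subject to lo <= u <= hi.

    grad_J(u) returns the reduced gradient; lo/hi are the bounds u_{i,a},
    u_{i,b} sampled on the time grid. np.clip realizes the pointwise
    projection onto the box constraints."""
    for _ in range(max_iter):
        u_new = np.clip(u - step * grad_J(u), lo, hi)  # step + projection
        if np.max(np.abs(u_new - u)) < tol:
            return u_new
        u = u_new
    return u
```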


Table 3.20 Test 3: CPU times for POD offline computations (snapshot generation and POD basis computation) depending on the time discretization (equidistant: columns 1–3, adaptive: columns 4–7)

Δt       POD offline   POD online   | dof   Compute T   POD offline   POD online
1/20     0.16 s        0.06 s       | 21    1.09 s      0.16 s        0.06 s
1/40     0.17 s        0.12 s       | 41    5.31 s      0.17 s        0.12 s
1/68     0.20 s        0.14 s       | 69    7.84 s      0.20 s        0.14 s
1/134    0.25 s        0.22 s       | 135   13.09 s     0.25 s        0.22 s
1/6500   46.76 s       9.12 s       | –     –           –             –

3.4.7.1 Conclusion

• Snapshot location is important for a good approximation quality of the POD model.
• Generating the snapshots with time adaptivity improves the POD solution, and the approximation quality can be controlled by a posteriori estimates.
• In order to get good approximation results for both state and adjoint state, a postprocessing of the adaptive time grid for the snapshots might be necessary.

We have now seen how important data quality can be in surrogate-based PDE constrained optimization. The concept presented in the previous sections could also be combined with other strategies for the solution of optimal control problems:

• Embed Algorithm 2 (e.g. as an initialization step) into POD-MOR algorithms for optimal control problems, like TRPOD in (3.17) [10] (Fahl & Sachs), APOD in Algorithm 1 [1] (Afanasiev & Hinze), or OSPOD [54] (Kunisch & Volkwein).
• For nonlinear problems, use this method for the Newton system.

Of course, there are more adaptive concepts around which might also be used to construct high-quality data for surrogate models in optimal control. So one may ask the questions:

• How many snapshots should one take for the construction of the reduced order model?
• Where should one take those snapshots?

To answer these questions one may use a goal-oriented procedure following the approaches presented in e.g. [14].

3.4.7.2 How Many Snapshots?

→ Iterative goal-oriented procedure with the goal of resolving the cost functional with the reduced order model as well as possible.

• Goal: resolve J(y).
• Start on a coarse equidistant time grid and compute snapshots.
• Build the POD model and compute the state y_h and adjoint p_h of the reduced dynamics.
• Becker and Rannacher: J(y) − J(y_h) ≈ η(y_h, p_h).
• If η(y_h, p_h) > tol: double the number of snapshots (re-computation); see the sketch below.

Here, η(y_h, p_h) denotes an error estimator whose evaluation relies only on information from the reduced state y_h and the reduced adjoint state p_h.
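A possible realization of this doubling strategy, with solve_rom and estimate_eta as hypothetical callbacks standing in for the POD model construction and the evaluation of η(y_h, p_h):

```python
def choose_number_of_snapshots(n0, tol, solve_rom, estimate_eta, n_max=4096):
    """Goal-oriented snapshot selection: double the number of (equidistant)
    snapshots until the Becker-Rannacher-type estimate
    J(y) - J(y_h) ~ eta(y_h, p_h) drops below tol.

    solve_rom(n) is assumed to recompute the snapshots on an n-point grid,
    build the POD model and return the reduced state/adjoint pair."""
    n = n0
    while n <= n_max:
        y_h, p_h = solve_rom(n)
        if abs(estimate_eta(y_h, p_h)) <= tol:
            return n, y_h, p_h
        n *= 2  # re-computation on the refined grid
    raise RuntimeError("tolerance not reached within n_max snapshots")
```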

3.4.7.3 Where to Take Snapshots?

→ Time-step adaption via sensitivity of the POD model.

• Goal: optimal time grid for the system dynamics.
• Start on a coarse (equidistant) time grid and compute snapshots.
• Build the POD model and compute the state y_h and adjoint p_h of the reduced dynamics.
• Becker, Johnson, Rannacher: η(y_h, p_h) = Σ_{I_j} ρ_j^loc(y_h) ω_j^loc(p_h).
• New time grid: equidistribute the error contributions ρ_j^loc(y_h) ω_j^loc(p_h); see the sketch below.
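Equidistributing the local contributions over a new grid can be sketched as follows. This is an illustration under the assumption of positive indicators attached to the intervals I_j of the current grid, not the exact procedure of [23]:

```python
import numpy as np

def equidistribute(t, eta_loc, n_new):
    """Build a new time grid with n_new nodes that equidistributes the local
    error contributions eta_loc[j] ~ rho_j(y_h) * omega_j(p_h) attached to
    the intervals I_j = [t_j, t_{j+1}] of the current grid t (eta_loc > 0)."""
    # cumulative error distribution over the current intervals
    F = np.concatenate([[0.0], np.cumsum(eta_loc)])
    F /= F[-1]
    # place new nodes at equal increments of the cumulative error
    return np.interp(np.linspace(0.0, 1.0, n_new), F, t)
```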

3.5 Lecture 3: A Fully Certified Reduced Basis Method for PDE Constrained Optimization

Finally we present a fully certified Reduced Basis Method for PDE constrained optimization. The material is taken from⁶ [5], and the approach taken is very much inspired by Bader et al. [11], where also a discussion of related literature can be found. An excellent introduction to the analytical and theoretical concepts related to the reduced basis method is given in [33, 61]. The aim of the reduced basis method consists in constructing low-dimensional linear subspaces X_N ⊂ X with

    |M − X_N| := sup_{u∈M} inf_{v∈X_N} ‖u − v‖_X

being small, where M := {u(μ); μ ∈ P} is the nonlinear solution manifold of a parameter dependent problem, with u(μ) ∈ X for μ ∈ P ⊂ R^p denoting a parameter dependent solution; see Fig. 3.34 for a sketch of this concept. The construction is summarized in Algorithm 3, and in the present chapter it will be

6 Reprinted from Reduced Basis Methods-An Application to Variational Discretization of Parametrized Elliptic Optimal Control Problems, A.A. Ali, M. Hinze, SIAM Journal on Scientific Computing, 42 (1), A271–A291 (2020). Copyright 2020 Society for Industrial and Applied Mathematics. Reprinted with permission. All rights reserved.


Fig. 3.34 The RBM schematic: the method aims at approximating a nonlinear solution manifold of a parametric problem by low-dimensional linear spaces. Picture taken from [33] (courtesy of B. Haasdonk)

applied to approximate the solution manifold of a parameter dependent elliptic optimal control problem with control constraints.

Algorithm 3: The reduced basis method with greedy sampling for the construction of a finite-dimensional space X_N ⊆ X approximating the nonlinear solution manifold M ⊂ X. Here, P denotes the parameter domain and S_train ⊆ P a training set of parameters. Δ(X_N, μ) is an error estimator controlling the approximation error ‖u(μ) − u_N(μ)‖_X.

1. Choose S_train ⊂ P, an arbitrary μ¹ ∈ S_train, and ε_tol > 0
2. Set N = 1, X₁ := span{u(μ¹)}
3. while max_{μ∈S_train} Δ(X_N, μ) > ε_tol do
   • μ^{N+1} := arg max_{μ∈S_train} Δ(X_N, μ)
   • X_{N+1} := X_N ⊕ span{u(μ^{N+1})}
   • N ← N + 1
   end while
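A minimal sketch of this greedy sampling in Python follows; solve_truth and estimator are placeholders for the truth solver and the error estimator Δ(X_N, μ), and orthonormalization is realized by a simple Gram-Schmidt step:

```python
import numpy as np

def extend(basis, u, tol=1e-12):
    """Gram-Schmidt step: orthonormalize u against the current basis."""
    for v in basis:
        u = u - np.dot(v, u) * v
    nrm = np.linalg.norm(u)
    return basis if nrm < tol else basis + [u / nrm]

def greedy_rb(S_train, solve_truth, estimator, eps_tol, N_max=100):
    """Weak-greedy construction of X_N as in Algorithm 3."""
    basis = extend([], solve_truth(S_train[0]))      # arbitrary mu^1
    while len(basis) < N_max:
        err, mu_next = max(((estimator(basis, mu), mu) for mu in S_train),
                           key=lambda pair: pair[0])
        if err <= eps_tol:                           # tolerance reached
            break
        basis = extend(basis, solve_truth(mu_next))  # enrich X_N
    return basis
```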

3.5.1 General Setting and Model Problem

Let P ⊂ R^p, p ∈ N, be a compact set of parameters, and for a given parameter μ ∈ P we consider the variational discrete ([36]) control problem

(P)    min_{(u,y)∈U_ad×Y} J(u, y) := ½‖y − z‖²_{L²(Ω₀)} + (α/2)‖u‖²_U    (3.41)

subject to

    a(y, v; μ) = b(u, v; μ) + f(v; μ)  ∀ v ∈ Y.    (3.42)


Here (3.42) represents a finite element discretization of an elliptic PDE in a bounded domain Ω ⊂ R^d, d ∈ {1, 2, 3}, with boundary ∂Ω. Y denotes the space of piecewise linear and continuous finite elements, and we assume the approximation process is conforming. The space Y is equipped with the inner product (·, ·)_Y and the norm ‖·‖_Y := √(·, ·)_Y; in addition, there exist constants ρ₁, ρ₂ > 0 such that

    ρ₁‖y‖_{H¹(Ω)} ≤ ‖y‖_Y ≤ ρ₂‖y‖_{H¹(Ω)}  ∀ y ∈ Y,    (3.43)

with ‖·‖_{H¹(Ω)} being the norm of the classical Sobolev space H¹(Ω).

Remark 3.2 In our setting we work directly with the finite dimensional finite element space Y as the truth space, in the terminology of the reduced basis community. This space could also be replaced by the appropriate Hilbert space associated with (3.42). However, for a fully practical certification of the method the finite element error then additionally needs to be taken into account.

The controls are from a real Hilbert space U equipped with the inner product (·, ·)_U and the norm ‖·‖_U := √(·, ·)_U, and the set of admissible controls U_ad ⊆ U is assumed to be non-empty, closed and convex. We denote by Ω₀ ⊆ Ω an open subset, and by L²(Ω₀) the classical Lebesgue space endowed with the standard inner product (·, ·)_{L²(Ω₀)} and the norm ‖·‖_{L²(Ω₀)} := √(·, ·)_{L²(Ω₀)}. The desired state z ∈ L²(Ω₀) and the parameter α > 0 are given data. The parameter dependent bilinear form a(·, ·; μ) : Y × Y → R is continuous,

    γ(μ) := sup_{y,v∈Y\{0}} |a(y, v; μ)| / (‖y‖_Y ‖v‖_Y) ≤ γ₀ < ∞  ∀ μ ∈ P,

and coercive,

    β(μ) := inf_{y∈Y\{0}} a(y, y; μ) / ‖y‖²_Y ≥ β₀ > 0  ∀ μ ∈ P,

where γ₀ and β₀ are real numbers independent of μ. The parameter dependent bilinear form b(·, ·; μ) : U × Y → R is continuous,

    κ(μ) := sup_{(u,v)∈U×Y\{(0,0)}} |b(u, v; μ)| / (‖u‖_U ‖v‖_Y) ≤ κ₀ < ∞  ∀ μ ∈ P,

where κ₀ is a real number independent of μ. Finally, f(·; μ) ∈ Y* is a parameter dependent linear form, where Y* denotes the topological dual of Y with norm ‖·‖_{Y*} defined by

    ‖l(·; μ)‖_{Y*} := sup_{‖v‖_Y=1} l(v; μ)

for a given functional l(·; μ) ∈ Y* depending on the parameter μ. We assume that there exists a constant σ₀ independent of μ such that

    sup_{v∈Y\{0}} |f(v; μ)| / ‖v‖_Y ≤ σ₀ < ∞  ∀ μ ∈ P.

We find it convenient to introduce for the upcoming analysis the Riesz isomorphism R : Y* → Y, defined for a given f ∈ Y* as the unique element Rf ∈ Y such that

    f(v) = (Rf, v)_Y  ∀ v ∈ Y.
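In computations the Riesz isomorphism amounts to a linear solve with the Gram matrix of the Y-inner product, and this is also how the dual norms ‖·‖_{Y*} appearing below can be evaluated. A minimal sketch, where K and f are the usual finite element quantities and all names are illustrative:

```python
import numpy as np

def riesz_representer(K, f):
    """Coefficients of Rf in the FE basis: solve K z = f, where K is the
    Gram matrix of the Y-inner product (K_ij = (phi_j, phi_i)_Y) and f the
    vector of functional values f(phi_i)."""
    return np.linalg.solve(K, f)

def dual_norm(K, f):
    """||f||_{Y*} = ||Rf||_Y = sqrt(f^T K^{-1} f)."""
    z = riesz_representer(K, f)
    return np.sqrt(f @ z)
```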

Under the previous assumptions one can verify that the problem (P) admits a unique solution for every μ ∈ P. The corresponding first order necessary conditions, which are also sufficient in this case, are stated in the next result. For the proof see for instance [41, Chapter 3].

Theorem 3.3 Let u ∈ U_ad be the solution of (P) for a given μ ∈ P. Then there exist a state y ∈ Y and an adjoint state p ∈ Y such that

    a(y, v; μ) = b(u, v; μ) + f(v; μ)  ∀ v ∈ Y,    (3.44)
    a(v, p; μ) = (y − z, v)_{L²(Ω₀)}  ∀ v ∈ Y,    (3.45)
    b(v − u, p; μ) + α(u, v − u)_U ≥ 0  ∀ v ∈ U_ad.    (3.46)

The varying parameter μ in the state equation (3.42) may represent physical and/or geometric quantities, like diffusion or convection speed, or the width of the spatial domain Ω. Considering the problem (P) in the context of real-time or multi-query scenarios can be very costly when the dimension of the finite element space Y is very high. In this work we adopt the reduced basis method, see for instance [33], to obtain a surrogate for (P) that is considerably cheaper to solve and at the same time delivers an acceptable approximation of the solution of (P) at a given μ. To this end, we first define a reduced problem for (P) and establish a posteriori error estimators that predict the expected approximation error when using the reduced problem. Then we adapt the greedy procedure of Algorithm 3 to our setting in order to improve the approximation quality of the reduced problem. In the numerical example, with Ω = Ω₁ ∪ Ω₂, we consider

    a(y, v; μ) := μ ∫_{Ω₁} ∇y∇v dx + ∫_{Ω₂} ∇y∇v dx,    b(u, v; μ) = ∫_Ω uv dx,    and f ≡ 0.


3.6 The Reduced Problem and the Greedy Procedure

Let Y_N ⊂ Y be a finite dimensional subspace. We define the reduced counterpart of the problem (P) for a given μ ∈ P by

(P_N)    min_{(u,y_N)∈U_ad×Y_N} J(u, y_N) := ½‖y_N − z‖²_{L²(Ω₀)} + (α/2)‖u‖²_U    (3.47)

subject to

    a(y_N, v_N; μ) = b(u, v_N; μ) + f(v_N; μ)  ∀ v_N ∈ Y_N.    (3.48)

We point out that in (P_N) the controls are still sought in U_ad. In a similar way as for (P), one can show that (P_N) has a unique solution for a given μ, and that it satisfies the following optimality conditions, whose proof is along the lines of the corresponding result in [36] and is therefore omitted here.

Theorem 3.4 Let u_N ∈ U_ad be the solution of (P_N) for a given μ ∈ P. Then there exist a state y_N ∈ Y_N and an adjoint state p_N ∈ Y_N such that

    a(y_N, v_N; μ) = b(u_N, v_N; μ) + f(v_N; μ)  ∀ v_N ∈ Y_N,    (3.49)
    a(v_N, p_N; μ) = (y_N − z, v_N)_{L²(Ω₀)}  ∀ v_N ∈ Y_N,    (3.50)
    b(v − u_N, p_N; μ) + α(u_N, v − u_N)_U ≥ 0  ∀ v ∈ U_ad.    (3.51)

The space Y_N shall be constructed successively using the following greedy procedure. Here S_train ⊂ P is a finite subset, called a training set, which is

Algorithm 4: Greedy procedure for the construction of the space Y_N in problem (3.47).

1. Choose S_train ⊂ P, μ¹ ∈ S_train arbitrary, ε_tol > 0, and N_max ∈ N.
2. Set N = 1, Φ₁ := {y(μ¹), p(μ¹)}, and Y₁ := span(Φ₁).
3. while max_{μ∈S_train} Δ(Y_N, μ) > ε_tol and N ≤ N_max do
   • μ^{N+1} := arg max_{μ∈S_train} Δ(Y_N, μ)
   • Φ_{N+1} := Φ_N ∪ {y(μ^{N+1}), p(μ^{N+1})}
   • Y_{N+1} := span(Φ_{N+1})
   • N ← N + 1
   end while

assumed to be rich enough in parameters to represent P well. N_max is the maximum number of iterations, and ε_tol is a given error tolerance. In the iteration of index N, the pair {y(μ^N), p(μ^N)} consists of the optimal state and adjoint state corresponding to the problem (P) at μ^N, and Φ_N is the reduced basis, which is assumed


to be orthonormal. If it is not, one can apply an orthonormalization process like Gram–Schmidt; an orthonormal reduced basis guarantees algebraic stability when N increases, see [33]. The quantity Δ(Y_N, μ) is an estimator for the expected error in approximating the solution of (P) by that of (P_N) for a given μ when using the space Y_N. The maximum of Δ(Y_N, μ) over S_train is obtained by linear search. We note that during the while-loop in the previous algorithm one should test whether the dimension of Φ_{N+1} differs from that of Φ_N; if it does not, the while-loop should be terminated. One choice for Δ(Y_N, μ) could be

    Δ(Y_N, μ) := ‖u(μ) − u_N(μ)‖_U,

i.e. the error between the solution of (P) and that of (P_N). However, considering this choice in a linear search over a very large training set S_train is computationally too costly, since the solution of the high-dimensional problem (P) is needed. We now establish choices for Δ(Y_N, μ) that do not depend on the solution of (P), but use appropriate residual functionals instead. The construction idea is borrowed from the a posteriori finite element analysis for elliptic optimal control problems presented in [24, 50] and exploits the fact that in the a posteriori error estimation for variational discrete optimal control problems no residual for the control is needed, since the control is not discretized in this approach, see [31]. More specifically, it is possible to prove the error equivalence

    ‖u − u_h‖_U + ‖y − y_h‖_Y + ‖p − p_h‖_Y ∼ ‖y^h − y_h‖_Y + ‖p^h − p_h‖_Y,

where y, y_h, p, p_h and u, u_h denote the continuous and discrete optimal state, adjoint state and control, respectively, in the finite element approximation of the original optimal control problem; at the same time y_h, p_h are the Ritz projections of y^h, p^h. The idea now is twofold:

• prove this estimate in the reduced basis context,
• track the constants in the error equivalence.

In our approach this leads to the two-sided a posteriori error bound

    δ(μ) ≤ ‖u(μ) − u_N(μ)‖_U + ‖y(μ) − y_N(μ)‖_Y + ‖p(μ) − p_N(μ)‖_Y ≤ Δ(μ),    (3.52)

see [5, Theorem 4.2] for details. Here δ(μ), Δ(μ) depend only on the residuals r_y(·; μ), r_p(·; μ) ∈ Y* of the primal and adjoint equation, and (y(μ), p(μ), u(μ)), (y_N(μ), p_N(μ), u_N(μ)) denote the solutions to (3.44)–(3.46) and (3.49)–(3.51), respectively. The bounds are given by

    Δ(μ) ≡ Δ_uyp(μ) := c₁(μ)‖r_y(·; μ)‖_{Y*} + c₂(μ)‖r_p(·; μ)‖_{Y*},    (3.53)
    δ(μ) ≡ δ_uyp(μ) := c₃(μ)‖r_y(·; μ)‖_{Y*} + c₄(μ)‖r_p(·; μ)‖_{Y*},    (3.54)


where the constants are defined as

    c₁(μ) := (1/β(μ)) [ 1/(ρ₁√α) + (1 + 1/(ρ₁²β(μ))) ( κ(μ)/(β(μ)ρ₁√α) + 1 ) ],
    c₂(μ) := (1/β(μ)) ( κ(μ)/α + κ(μ)²/(β(μ)α) + κ(μ)²/(ρ₁²β²(μ)α) + 1 ),
    c₃(μ) := (1/(2γ(μ))) max( κ(μ)/β(μ), 1 )^{-1},
    c₄(μ) := (1/(2γ(μ))) max( 1/(ρ₁²β(μ)), 1 )^{-1},

and the residuals are given by

    r_y(v; μ) := b(u_N, v; μ) + f(v; μ) − a(y_N, v; μ)  ∀ v ∈ Y,  and
    r_p(v; μ) := (y_N − z, v)_{L²(Ω₀)} − a(v, p_N; μ)  ∀ v ∈ Y.
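Given the residual vectors in the finite element basis, the bounds (3.53)–(3.54) are cheap to evaluate with the dual-norm routine sketched after the Riesz isomorphism above; here the constants c₁(μ), ..., c₄(μ) are assumed to be supplied by a callback, and all names are illustrative:

```python
def error_bounds(mu, ry_vec, rp_vec, K, constants):
    """Evaluate delta(mu) and Delta(mu) from (3.53)-(3.54).

    ry_vec, rp_vec: residual functionals evaluated on the FE basis;
    K: Gram matrix of the Y-inner product; constants(mu) -> (c1, c2, c3, c4).
    Reuses dual_norm from the Riesz sketch above."""
    nry, nrp = dual_norm(K, ry_vec), dual_norm(K, rp_vec)
    c1, c2, c3, c4 = constants(mu)
    return c3 * nry + c4 * nrp, c1 * nry + c2 * nrp   # (delta, Delta)
```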

Among other choices, we use the upper bound Δ(μ) in the greedy procedure to construct the subspace Y_N. Compared to other approaches in the literature [11, 57], the approach presented here

• is fully certified in that it provides computable upper and lower bounds for the errors in the reduced basis approximation,
• achieves comparable errors with smaller subspaces, and
• involves no residual r_u in the construction of the estimators δ(μ), Δ(μ), so that overestimation caused by inaccurate approximation schemes is avoided, compare [31].

3.6.1 A Relative Error Bound

It is also possible to prove the relative error bound

    ‖u(μ) − u_N(μ)‖_U / ‖u(μ)‖_U ≤ 2Δ_u(μ) / ‖u_N(μ)‖_U,    (3.55)

provided that 2Δ_u(μ)/‖u_N(μ)‖_U ≤ 1, where

    Δ_u(μ) := √( (1/(ρ₁²αβ(μ)²)) ‖r_y(·; μ)‖²_{Y*} + (κ(μ)²/(α²β(μ)²)) ‖r_p(·; μ)‖²_{Y*} ),

see [5, Corollary 4.3]. This bound will also be used in the numerical examples.


3.6.2 Convergence of the Method

Let u(μ), u_N(μ) ∈ U_ad be the solutions of (P) and (P_N) corresponding to some given μ ∈ P. From [5, Theorems 5.5, 5.6] we have that the Lipschitz continuity estimates

    ‖u(μ₁) − u(μ₂)‖_U ≤ c ( ‖a(μ₂) − a(μ₁)‖_A + ‖b(μ₂) − b(μ₁)‖_B + ‖f(μ₂) − f(μ₁)‖_F )

and

    ‖u_N(μ₁) − u_N(μ₂)‖_U ≤ c ( ‖a(μ₂) − a(μ₁)‖_A + ‖b(μ₂) − b(μ₁)‖_B + ‖f(μ₂) − f(μ₁)‖_F )

hold with some c > 0 independent of μ₁, μ₂ ∈ P and N. Here a(μ) := a(·, ·; μ), b(μ) := b(·, ·; μ) and f(μ) := f(·; μ) for any μ ∈ P, and

    ‖a(·, ·; μ) − a(·, ·; ξ)‖_A := sup_{‖v‖_Y=1, ‖w‖_Y=1} |a(v, w; μ) − a(v, w; ξ)|,
    ‖b(·, ·; μ) − b(·, ·; ξ)‖_B := sup_{‖v‖_U=1, ‖w‖_Y=1} |b(v, w; μ) − b(v, w; ξ)|,  and
    ‖f(·; μ) − f(·; ξ)‖_F := sup_{‖v‖_Y=1} |f(v; μ) − f(v; ξ)|.

Moreover, from [5, Theorem 5.7], with

    h_N := max_{μ∈P} min_{μ'∈P_N} ‖μ − μ'‖

and the assumptions

    ‖a(μ₂) − a(μ₁)‖_A ≤ c‖μ₂ − μ₁‖^{q_a},
    ‖b(μ₂) − b(μ₁)‖_B ≤ c‖μ₂ − μ₁‖^{q_b},
    ‖f(μ₂) − f(μ₁)‖_F ≤ c‖μ₂ − μ₁‖^{q_f},

we have the error estimate

    ‖u_N(μ) − u(μ)‖_U ≤ c h_N^t,    (3.56)

where t := ½ min{q_a, q_b, q_f}, c > 0 is a constant independent of h_N and μ, and u(μ), u_N(μ) ∈ U_ad denote the solutions of (P) and (P_N), respectively.


3.6.3 Numerical Experiments

In this section we apply our theoretical findings to construct reduced surrogates numerically for two examples, namely a thermal block problem and a Graetz flow problem, which are taken from [11]. In particular, we discretize these two examples using variational discretization, then we build their reduced counterparts using the greedy procedure from Algorithm 4, where we use the bounds 2Δ_u(μ)/‖u_N‖_U and Δ_uyp(μ) from (3.55) and (3.53), respectively, for the estimator Δ(Y_N, μ). Finally, we compare the solutions of the reduced problems to their high-dimensional counterparts to assess the quality of the obtained reduced models.

Thermal Block We consider the control problem

    min_{(u,y)∈U_ad×Y} J(u, y) = ½‖y − z‖²_{L²(Ω₀)} + (α/2)‖u‖²_U

subject to

    μ ∫_{Ω₁} ∇y · ∇v dx + ∫_{Ω₂} ∇y · ∇v dx = ∫_Ω uv dx  ∀ v ∈ Y,

where

    Ω₁ := (0, 0.5) × (0, 1),  Ω₂ := (0.5, 1) × (0, 1),  Ω := Ω₁ ∪ Ω₂,
    Ω₀ := Ω,  z(x) = 1 in Ω,  U := L²(Ω),  (·, ·)_U := (·, ·)_{L²(Ω)},
    U_ad := {u ∈ L²(Ω) : u(x) ≥ u_a(x) a.e. x ∈ Ω},  u_a(x) := 2 + 2(x₁ − 0.5),
    μ ∈ P := [0.5, 3],  α = 10⁻²,

and the space Y ⊂ H¹₀(Ω) ∩ C(Ω̄) is the space of piecewise linear and continuous finite elements endowed with the inner product (·, ·)_Y := (∇·, ∇·)_{L²(Ω)}. The underlying PDE admits a homogeneous Dirichlet boundary condition on the boundary ∂Ω of the domain Ω. From the given data it is easy to see that (3.43) holds with ρ₂ = 1 and ρ₁ = 1/√(c_p² + 1), where c_p = 1/√2 is the Poincaré constant in the inequality

    ‖v‖_{L²(Ω)} ≤ c_p ‖∇v‖_{L²(Ω)}  ∀ v ∈ H¹₀(Ω).

Furthermore, we take κ̃(μ) = c_p and β̃(μ) = min(μ, 1).



Fig. 3.35 Thermal Block: the maximum of the relative error of the controls ‖u − u_N‖_{L²(Ω)}/‖u‖_{L²(Ω)} and of the corresponding upper bound 2Δ_u(μ)/‖u_N‖_{L²(Ω)} over S_test versus the greedy algorithm iterations N = 1, ..., 22

We use a uniform triangulation of Ω such that dim(Y) ≈ 8300. The solution of both the variational discrete control problem and the reduced control problem for a given parameter μ is obtained by solving the corresponding optimality conditions using a semismooth Newton method with the stopping criterion

    (1/α)‖p^(k) − p^(k+1)‖_{L²(Ω)} ≤ 10⁻¹¹,

where p^(k) is the adjoint variable at the k-th iteration. The reduced space Y_N for the considered problem was constructed employing the greedy procedure introduced in Algorithm 4 with the choices S_train := {s_j}_{j=1}^{100}, s_j := 0.5 × (3/0.5)^{(j−1)/99}, μ¹ := 0.5, ε_tol = 10⁻⁸ and N_max = 30. The numerical results for the choice Δ(Y_N, μ) := 2Δ_u(μ)/‖u_N‖_{L²(Ω)} are reported in Fig. 3.35. We observe that the algorithm terminated before reaching the prescribed tolerance ε_tol, namely after 22 iterations, as it could not enrich the reduced basis with any new linearly independent samples. To investigate the quality of the obtained reduced basis and the sharpness of the bound 2Δ_u(μ)/‖u_N‖_{L²(Ω)}, we compute the maximum of the relative error ‖u − u_N‖_{L²(Ω)}/‖u‖_{L²(Ω)} and of the corresponding bound 2Δ_u(μ)/‖u_N‖_{L²(Ω)} over the set S_test := {s_j}_{j=1}^{125}, s_j := 0.503 × (2.99/0.503)^{(j−1)/125}, for the greedy algorithm iterations N = 1, ..., 22.
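The logarithmically spaced training and test sets above are straightforward to generate; a short NumPy sketch reproducing the formulas for s_j verbatim:

```python
import numpy as np

# training set s_j = 0.5 * (3/0.5)^((j-1)/99), j = 1,...,100
S_train = np.geomspace(0.5, 3.0, num=100)

# test set s_j = 0.503 * (2.99/0.503)^((j-1)/125), j = 1,...,125
S_test = 0.503 * (2.99 / 0.503) ** (np.arange(125) / 125.0)
```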


We also report the comparison between the quantity

    √( ‖u − u_N‖²_U + (1/α)‖y − y_N‖²_{L²(Ω₀)} ) / ‖u‖_U

and the bound 2Δ_u(μ)/‖u_N‖_{L²(Ω)}, in order to see the effect of dropping the term (1/α)‖y − y_N‖²_{L²(Ω₀)} when establishing (3.55). We see that the error decays dramatically in the first nine iterations, namely it drops from 1 to slightly above 10⁻⁶; then the decay becomes very slow and the error almost stabilizes at 10⁻⁶ over the last four iterations. Notice that the term (1/α)‖y − y_N‖²_{L²(Ω₀)} is of relatively small size in comparison to ‖u − u_N‖²_{L²(Ω)}. This plot compares to Fig. 1(b) of [11]. We observe that four iterations of the greedy algorithm with our approach deliver the same error reduction as thirty iterations of the greedy algorithm in [11]. For the choice Δ(Y_N, μ) := Δ_uyp(μ) the results are reported in Fig. 3.36. We see that the dominant error is usually that of the state variable, while the error of the adjoint state has the smallest contribution.


Fig. 3.36 Thermal Block: the maximum of the upper bound Δ_uyp(μ) and of the sum ‖u − u_N‖_{L²(Ω)} + ‖y − y_N‖_Y + ‖p − p_N‖_Y over S_test versus the greedy algorithm iterations N = 1, ..., 22


Graetz Flow We consider the problem

    min_{(u,y)∈U_ad(μ₂)×Y(μ₂)} J(u, y) = ½‖y − z‖²_{L²(Ω₀(μ₂))} + (α/2)‖u‖²_{U(μ₂)}

subject to

    (1/μ₁) ∫_{Ω(μ₂)} ∇y · ∇v dx + ∫_{Ω(μ₂)} β(x) · ∇y v dx = ∫_{Ω(μ₂)} uv dx  ∀ v ∈ Y(μ₂),

with the data

    Ω(μ₂) := (0, 1.5 + μ₂) × (0, 1),  Ω₁(μ₂) := (0.2μ₂, 0.8μ₂) × (0.3, 0.7),
    Ω₂(μ₂) := (μ₂ + 0.2, μ₂ + 1.5) × (0.3, 0.7),  Ω₀(μ₂) := Ω₁(μ₂) ∪ Ω₂(μ₂),
    β(x) = (x₂(1 − x₂), 0)^T in Ω(μ₂),  z(x) = 0.5 in Ω₁(μ₂),  z(x) = 2 in Ω₂(μ₂),
    U(μ₂) := L²(Ω(μ₂)),  (·, ·)_{U(μ₂)} := (·, ·)_{L²(Ω(μ₂))},
    U_ad(μ₂) := {u ∈ L²(Ω(μ₂)) : u(x) ≥ u_a(x) a.e. x ∈ Ω(μ₂)},  u_a(x) := −0.5,
    (μ₁, μ₂) ∈ P := [5, 18] × [0.8, 1.2],  α = 10⁻²,

and Y(μ₂) ⊂ {v ∈ H¹(Ω(μ₂)) ∩ C(Ω(μ₂)) : v|_{Γ_D(μ₂)} = 1} is the space of piecewise linear and continuous finite elements. The underlying PDE has the homogeneous Neumann boundary condition ∂_η y|_{Γ_N(μ₂)} = 0 on the portion Γ_N(μ₂) of the boundary of the domain Ω(μ₂), and the Dirichlet boundary condition y|_{Γ_D(μ₂)} = 1 on the portion Γ_D(μ₂). An illustration of the domain Ω(μ₂) and the boundary segments Γ_D(μ₂) and Γ_N(μ₂) is given in Fig. 3.37. We introduce the lifting function ỹ(x) := 1 to handle the nonhomogeneous Dirichlet boundary condition, reformulate the problem over the reference domain Ω := Ω(μ₂^ref), and endow the state space Y := Y(μ₂^ref) with the inner product (·, ·)_Y given by

    (v, w)_Y := (1/μ₁^ref) ∫_Ω ∇w · ∇v dx + ½ ( ∫_Ω β(x) · ∇w v dx + ∫_Ω β(x) · ∇v w dx ),

where (μ₁^ref, μ₂^ref) = (5, 1). The control space U := U(μ₂^ref) is endowed with a parameter dependent inner product (·, ·)_{U(μ₂)} from the affine geometry transformation, see [61]. After transforming the problem over Ω we deduce that (3.43) holds with ρ₁ = max(μ₁^ref(1 + c_p), 1)^{−1/2}, where the constant c_p is from the Poincaré


Fig. 3.37 The domain .Ω(μ2 ) for the Graetz flow problem

inequality

    ∫_Ω v² dx ≤ c_p ∫_Ω |∇v|² dx  ∀ v ∈ H¹(Ω) with v|_{Γ_D(μ₂^ref)} = 0.

In addition, we take

    β̃(μ₁, μ₂) = min( μ₁^ref min( 1/(μ₁μ₂), μ₂/μ₁ ), 1 )  and  κ̃(μ₁, μ₂) = (1/ρ₁)(√μ₂ + 1).

The domain Ω is partitioned via a uniform triangulation such that dim(Y) ≈ 4900. The optimality conditions corresponding to the variational discrete control problem and the reduced control problem are solved using a semismooth Newton method with the stopping criterion

    (1/α)‖p^(k) − p^(k+1)‖_{U(μ₂)} ≤ 10⁻¹¹,

where p^(k) is the adjoint variable at the k-th iteration. The optimal controls and their active sets for the parameter values (μ₁, μ₂) = (5, 0.8), (18, 1.2) computed on the reference domain are presented in Fig. 3.38. The reduced basis for the space Y_N is constructed by applying Algorithm 4 with the choice S_train := {(s_j¹, s_k²)} for j, k = 1, ..., 30, where s_j¹ := 5 × (18/5)^{(j−1)/29} and s_k² := (0.4/29)(k − 1) + 0.8. Furthermore, we take μ¹ := (5, 0.8), ε_tol = 10⁻⁸ and N_max = 30. We start by investigating the estimator Δ(Y_N, μ) := 2Δ_u(μ)/‖u_N‖_{U(μ₂)}. The corresponding results are presented in Fig. 3.39. The algorithm terminated at N_max = 30 before reaching the tolerance ε_tol. To assess the quality of the resulting reduced basis and the sharpness of the bound Δ(Y_N, μ), we compare the maximum of the relative error ‖u − u_N‖_{U(μ₂)}/‖u‖_{U(μ₂)} to the bound 2Δ_u(μ)/‖u_N‖_{U(μ₂)}



Fig. 3.38 The optimal controls and their active sets (enclosed by the curves) for (μ₁, μ₂) = (5, 0.8) and (18, 1.2), computed on the reference domain Ω. (a) The optimal control for (μ₁, μ₂) = (5, 0.8). (b) The control active set boundary for (μ₁, μ₂) = (5, 0.8). (c) The optimal control for (μ₁, μ₂) = (18, 1.2). (d) The control active set boundary for (μ₁, μ₂) = (18, 1.2)


Fig. 3.39 The maximum of the relative error of the controls ‖u − u_N‖_{U(μ₂)}/‖u‖_{U(μ₂)} and of the upper bound 2Δ_u(μ)/‖u_N‖_{U(μ₂)} over S_test versus the greedy algorithm iterations N = 1, ..., 30


Fig. 3.40 The maximum of the upper bound Δ_uyp(μ) and of the sum ‖u − u_N‖_{L²(Ω)} + ‖y − y_N‖_Y + ‖p − p_N‖_Y over S_test versus the greedy algorithm iterations N = 1, ..., 30

computed over the test set S_test := {(s_j¹, s_k²)} for j = 1, ..., 10 and k = 1, ..., 5, where s_j¹ := 5.2 × (17.5/5.2)^{(j−1)/9} and s_k² := (0.35/4)(k − 1) + 0.82, for the greedy algorithm iterations N = 1, ..., 30. As in the previous example, we also investigate the contribution of the term (1/α)‖y − y_N‖²_{L²(Ω₀)}, which is again of relatively small size, as the plot indicates. The error decay is of moderate speed in comparison to the previous example. This may be because the current problem has more parameters, one of which stems from the geometry of the domain. Figure 3.39 compares to the results documented in Figure 3(b) of [11]. For this example, six iterations of the greedy algorithm with our approach deliver the same error reduction as thirty iterations of the greedy algorithm in [11]. The numerical results with the estimator Δ(Y_N, μ) := Δ_uyp(μ) are documented in Fig. 3.40. We see that after the 10th iteration the error of the state becomes the dominant one, while the adjoint state always contributes the smallest error. We remark that in all of the previous numerical experiments there is a gap between the bounds 2Δ_u(μ)/‖u_N‖_{L²(Ω)} and Δ_uyp(μ) and the corresponding errors, which is of a much larger size in the Graetz flow example in particular.


References

1. K. Afanasiev, M. Hinze, Adaptive control of a wake flow using proper orthogonal decomposition. Lecture Notes Pure Appl. Math. 216, 31–332 (2001)
2. A. Agosti, P. Ciarletta, H. Garcke, M. Hinze, Learning patient-specific parameters for a diffuse interface glioblastoma model from neuroimaging data. Math. Methods Appl. Sci. 43(15), 8945–8979 (2020)
3. O. Alff, Modellordnungsreduktion für das Cahn-Hilliard System. Master thesis, Universität Hamburg, Fachbereich Mathematik, 2015
4. O. Alff, Trust Region POD for Optimal Control of Cahn-Hilliard Systems. Master thesis, Universität Hamburg, Fachbereich Mathematik, 2018
5. A.A. Ali, M. Hinze, Reduced basis methods–an application to variational discretization of parametrized elliptic optimal control problems. SIAM J. Sci. Comput. 42(1), A271–A291 (2020)
6. A. Alla, M. Hinze, O. Lass, S. Ulbrich, Model order reduction approaches for the optimal design of permanent magnets in electro-magnetic machines. IFAC-PapersOnLine 48(1), 242–247 (2015)
7. A. Alla, C. Gräßle, M. Hinze, A residual based snapshot location strategy for POD in distributed optimal control of linear parabolic equations. IFAC-PapersOnLine 49(8), 13–18 (2016)
8. A. Alla, C. Gräßle, M. Hinze, A posteriori snapshot location for POD in optimal control of linear parabolic equations. ESAIM Math. Modell. Numer. Anal. 52(5), 1847–1873 (2018)
9. A. Alla, M. Hinze, P. Kolvenbach, O. Lass, S. Ulbrich, A certified model reduction approach for robust parameter optimization with PDE constraints. Adv. Comput. Math. 45(3), 1221–1250 (2019)
10. E. Arian, M. Fahl, E.W. Sachs, Trust-Region Proper Orthogonal Decomposition for Flow Control. Technical Report, Institute for Computer Applications in Science and Engineering, Hampton, 2000
11. E. Bader, M. Kärcher, M.A. Grepl, K. Veroy, Certified reduced basis methods for parametrized distributed elliptic optimal control problems with control constraints. SIAM J. Sci. Comput. 38(6), A3921–A3946 (2016)
12. F. Ballarin, G. Rozza, M. Strazzullo, Reduced order methods for parametric flow control problems and applications (2020). Preprint, arXiv:2011.12101
13. F. Ballarin, G. Rozza, M. Strazzullo, Reduced order methods for parametric flow control problems and applications (2020). Preprint, arXiv:2011.12101
14. R. Becker, R. Rannacher, An optimal control approach to a posteriori error estimation in finite element methods. Acta Numer. 10, 1–102 (2001)
15. P. Benner, S. Grivet-Talocia, A. Quarteroni, G. Rozza, W. Schilders, L.M. Silveira, Model order reduction: basic concepts and notation, in Model Order Reduction: System- and Data-Driven Methods and Algorithms, vol. 1 (De Gruyter, Berlin, 2021), pp. 1–14
16. P. Benner, E. Sachs, S. Volkwein, Model order reduction for PDE constrained optimization, in Trends in PDE Constrained Optimization (Springer, Berlin, 2014), pp. 303–326
17. P. Benner, W. Schilders, S. Grivet-Talocia, A. Quarteroni, G. Rozza, L.M. Silveira, Model Order Reduction: Snapshot-Based Methods and Algorithms, vol. 2 (De Gruyter, Berlin, 2020)
18. P. Benner, W. Schilders, S. Grivet-Talocia, A. Quarteroni, G. Rozza, L.M. Silveira, Model Order Reduction: Applications, vol. 3 (De Gruyter, Berlin, 2020)
19. D. Chapelle, A. Gariah, J. Sainte-Marie, Galerkin approximation with proper orthogonal decomposition: new error estimates and illustrative examples. ESAIM Math. Modell. Numer. Anal. 46(4), 731–757 (2012)
20. S. Chaturantabut, D.C. Sorensen, Nonlinear model reduction via discrete empirical interpolation. SIAM J. Sci. Comput. 32(5), 2737–2764 (2010)
21. Z. Drmac, S. Gugercin, A new selection operator for the discrete empirical interpolation method–improved a priori error bound and extensions. SIAM J. Sci. Comput. 38(2), A631–A648 (2016)
22. H. Garcke, M. Hinze, C. Kahle, Diffuse interface approaches in atmosphere and ocean–modeling and numerical implementation, in Energy Transfers in Atmosphere and Ocean (Springer, Berlin, 2019), pp. 287–307
23. W. Gong, M. Hinze, Z. Zhou, Space-time finite element approximation of parabolic optimal control problems. J. Numer. Math. 20(2), 111–146 (2012)
24. W. Gong, N. Yan, Adaptive finite element method for elliptic optimal control problems: convergence and optimality. Numer. Math. 135(4), 1121–1170 (2017)
25. C. Gräßle, M. Gubisch, S. Metzdorf, S. Rogg, S. Volkwein, POD basis updates for nonlinear PDE control. at-Automatisierungstechnik 65(5), 298–307 (2017)
26. C. Gräßle, M. Hinze, POD reduced-order modeling for evolution equations utilizing arbitrary finite element discretizations. Adv. Comput. Math. 44(6), 1941–1978 (2018)
27. C. Gräßle, M. Hinze, J. Lang, S. Ullmann, POD model order reduction with space-adapted snapshots for incompressible flows. Adv. Comput. Math. 45(5), 2401–2428 (2019)
28. C. Gräßle, M. Hinze, N. Scharmacher, POD for optimal control of the Cahn-Hilliard system using spatially adapted snapshots, in European Conference on Numerical Mathematics and Advanced Applications (2017), pp. 703–711
29. C. Gräßle, M. Hinze, S. Volkwein, Model Order Reduction by Proper Orthogonal Decomposition (De Gruyter, Berlin, 2020)
30. M. Gubisch, S. Volkwein, Proper orthogonal decomposition for linear-quadratic optimal control. Model Reduct. Approx. Theory Algorithms 5, 66 (2017)
31. A. Günther, M. Hinze, M.H. Tber, A posteriori error representations for elliptic optimal control problems with control and state constraints, in Constrained Optimization and Optimal Control for Partial Differential Equations (Springer, Berlin, 2012), pp. 303–317
32. M.D. Gunzburger, Perspectives in Flow Control and Optimization (SIAM, Philadelphia, 2002)
33. B. Haasdonk, Reduced basis methods for parametrized PDEs–a tutorial introduction for stationary and instationary problems. Model Reduct. Approx. Theory Algorithms 15, 65 (2017)
34. S. Herkt, M. Hinze, R. Pinnau, Convergence analysis of Galerkin POD for linear second order evolution equations. Electron. Trans. Numer. Anal. 40, 321–337 (2013)
35. J.S. Hesthaven, G. Rozza, B. Stamm, Certified Reduced Basis Methods for Parametrized Partial Differential Equations, vol. 590 (Springer, Berlin, 2016)
36. M. Hinze, A variational discretization concept in control constrained optimization: the linear-quadratic case. Comput. Optim. Appl. 30(1), 45–61 (2005)
37. M. Hinze, K. Kunisch, On suboptimal control strategies for the Navier-Stokes equations. ESAIM Proc. 4, 181–198 (1998)
38. M. Hinze, K. Kunisch, Three control methods for time-dependent fluid flow. Flow Turbul. Combust. 65(3), 273–298 (2000)
39. M. Hinze, M. Kunkel, U. Matthes, POD model order reduction of electrical networks with semiconductors modeled by the transient drift-diffusion equations, in Progress in Industrial Mathematics at ECMI 2010 (Springer, Berlin, 2012), pp. 161–168
40. M. Hinze, O. Pätzold, S. Ziegenbalg, Solidification of a GaAs melt–optimal control of the phase interface. J. Crystal Growth 311(8), 2501–2507 (2009)
41. M. Hinze, R. Pinnau, M. Ulbrich, S. Ulbrich, Optimization with PDE Constraints, vol. 23 (Springer Science+Business Media, New York, 2008)
42. M. Hinze, A. Rösch, Discretization of optimal control problems, in Constrained Optimization and Optimal Control for Partial Differential Equations (Springer, Berlin, 2012), pp. 391–430
43. M. Hinze, M. Vierling, The semi-smooth Newton method for variationally discretized control constrained elliptic optimal control problems; implementation, convergence and globalization. Optim. Methods Softw. 27(6), 933–950 (2012)
44. M. Hinze, S. Volkwein, Error estimates for abstract linear-quadratic optimal control problems using proper orthogonal decomposition. Comput. Optim. Appl. 39(3), 319–345 (2008)
45. M. Hinze, S. Volkwein, Proper orthogonal decomposition surrogate models for nonlinear dynamical systems: error estimates and suboptimal control, in Dimension Reduction of Large-Scale Systems (Springer, Berlin, 2005), pp. 261–306
46. M. Hinze, S. Ziegenbalg, Optimal control of the free boundary in a two-phase Stefan problem. J. Comput. Phys. 223(2), 657–684 (2007)
47. M. Hinze, S. Ziegenbalg, Optimal control of the free boundary in a two-phase Stefan problem with flow driven by convection. ZAMM Z. Angew. Math. Mech. 87(6), 430–448 (2007)
48. R.H. Hoppe, Z. Liu, Snapshot location by error equilibration in proper orthogonal decomposition for linear and semilinear parabolic partial differential equations. J. Numer. Math. 22(1), 1–32 (2014)
49. C. Johnson, Numerical Solution of Partial Differential Equations by the Finite Element Method (Courier Corporation, Chelmsford, 2012)
50. K. Kohls, A. Rösch, K.G. Siebert, A posteriori error analysis of optimal control problems with control constraints. SIAM J. Control Optim. 52(3), 1832–1861 (2014)
51. N. Kühl, P.M. Müller, A. Stück, M. Hinze, T. Rung, Decoupling of control and force objective in adjoint-based fluid dynamic shape optimization. AIAA J. 57(9), 4110–4114 (2019)
52. K. Kunisch, S. Volkwein, Galerkin proper orthogonal decomposition methods for a general equation in fluid dynamics. SIAM J. Numer. Anal. 40(2), 492–515 (2002)
53. K. Kunisch, S. Volkwein, Galerkin proper orthogonal decomposition methods for parabolic problems. Numer. Math. 90(1), 117–148 (2001)
54. K. Kunisch, S. Volkwein, Optimal snapshot location for computing POD basis functions. ESAIM Math. Modell. Numer. Anal. 44(3), 509–529 (2010)
55. S. Locke, J. Singler, New proper orthogonal decomposition approximation theory for PDE solution data. SIAM J. Numer. Anal. 58(6), 3251–3285 (2020)
56. P.M. Müller, N. Kühl, M. Siebenborn, K. Deckelnick, M. Hinze, T. Rung, A novel p-harmonic descent approach applied to fluid dynamic shape optimization (2021). Preprint, arXiv:2103.14735
57. F. Negri, G. Rozza, A. Manzoni, A. Quarteroni, Reduced basis method for parametrized elliptic optimal control problems. SIAM J. Sci. Comput. 35(5), A2316–A2340 (2013)
58. I. Neitzel, U. Prüfert, T. Slawig, A smooth regularization of the projection formula for constrained parabolic optimal control problems. Numer. Funct. Anal. Optim. 32(12), 1283–1315 (2011)
59. I. Neitzel, B. Vexler, A priori error estimates for space-time finite element discretization of semilinear parabolic optimal control problems. Numer. Math. 120(2), 345–386 (2012)
60. G. Rozza, Shape Design by Optimal Flow Control and Reduced Basis Techniques. Technical Report, EPFL, 2005
61. G. Rozza, D.B.P. Huynh, A.T. Patera, Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations. Arch. Comput. Methods Eng. 15(3), 229–275 (2008)
62. J.R. Singler, New POD error expressions, error bounds, and asymptotic results for reduced order models of parabolic PDEs. SIAM J. Numer. Anal. 52(2), 852–876 (2014)
63. L. Sirovich, Turbulence and the dynamics of coherent structures. I. Coherent structures. Q. Appl. Math. 45(3), 561–571 (1987)
64. L. Sirovich, Turbulence and the dynamics of coherent structures. II. Symmetries and transformations. Q. Appl. Math. 45(3), 573–582 (1987)
65. L. Sirovich, Turbulence and the dynamics of coherent structures. III. Dynamics and scaling. Q. Appl. Math. 45(3), 583–590 (1987)
66. F. Tröltzsch, S. Volkwein, POD a-posteriori error estimates for linear-quadratic optimal control problems. Comput. Optim. Appl. 44(1), 83 (2009)

Chapter 4
Machine Learning Methods for Reduced Order Modeling

J. Nathan Kutz

Abstract Reduced order models (ROMs) are critically enabling in many application areas where high-fidelity simulations are computationally expensive to generate. The ability to produce accurate, low-rank proxy models enables ROMs to transform the modeling and characterization of such systems. Data-driven algorithms have emerged as a viable alternative to projection-based ROMs. Indeed, there is a diversity of mathematical algorithms that can be used to produce data-driven ROMs, including (1) the dynamic mode decomposition, (2) sparse identification of nonlinear dynamics, and (3) neural networks. Each of these methods is highlighted here with a view towards producing proxy models that enable efficient computations of high-dimensional systems. Moreover, these methods can be used with direct measurement data, computational data, or both in generating stable ROMs.

4.1 Introduction

Modeling of physics-based systems is typically achieved by deriving governing equations for the system and simulating the system for various parameter regimes. Mature numerical methods for differential and partial differential equation systems [1] have empowered scientific computing to characterize many modern high-dimensional, complex dynamical systems, thus allowing for engineering design and control in a diversity of application areas. Despite great advancements in numerical algorithms, high-performance computing and hardware, there remain many application areas that are computationally intractable due to the high dimensionality of the underlying computations. Reduced order models (ROMs) provide a mathematical architecture for reducing the computational complexity of mathematical models

J. N. Kutz () Department of Applied Mathematics, University of Washington, Seattle, WA, USA Department of Electrical and Computer Engineering, University of Washington, Seattle, WA, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. Falcone, G. Rozza (eds.), Model Order Reduction and Applications, C.I.M.E. Foundation Subseries 2328, https://doi.org/10.1007/978-3-031-29563-8_4


in numerical simulations [2–5]. Fundamental to rendering simulations computationally tractable is the construction of a low-dimensional subspace on which the dynamics can be approximately embedded. With the emergence of machine learning techniques, there exists a diverse number of data-driven methods for learning both low-dimensional embeddings and evolution equations in such coordinates [1, 6]. In what follows, we consider some of the leading machine learning strategies that help represent physics-based systems by learning dynamics and embeddings jointly.

To frame the application of machine learning algorithms for ROMs, we consider physics-based systems modeled by partial differential equations (PDEs). PDEs model a diversity of spatio-temporal systems, including those found in the classical physics fields of fluid dynamics, electrodynamics, heat conduction, and quantum mechanics. Indeed, PDEs provide a canonical description of how a system evolves in space and time, due to the presence of partial derivatives which model the relationships between rates of change in time and space. Governing equations of physics-based systems simply provide a constraint, or restriction, on how these evolving quantities are related. We consider PDEs of the form [7]

    u̇ = N(u, x, t; β),    (4.1)

where u(x, t) is the variable of interest, or the state-space, which we are trying to model. The solution u(x, t) of the PDE depends upon the spatial variable x, which can be in 1D, 2D or 3D, and time t. Importantly, solutions can often depend on the parameters β, thus ultimately requiring a solution that can model parametric dependencies, i.e. u(x, t; β). In what follows, to illustrate the various mathematical methods, the parameters β are assumed to be fixed. Solutions of (4.1) are typically achieved through numerical simulation, unless N(·) is linear and constant-coefficient so that analytic solutions are available. Asymptotic and perturbation methods can also offer analytically tractable solution methods [8]. In many modern applications, discretization of the evolution for u(x, t) results in a high-dimensional system for which computations are prohibitively expensive. The goal of building ROMs is to generate a proxy model of (4.1) that renders tractable, inexpensive computations approximating the full simulations of (4.1).

Traditional ROMs produce a computational proxy of (4.1) by projecting the governing PDE into a new coordinate system. Coordinate transformations are one of the fundamental paradigms for producing solutions to PDEs [9]. ROMs produce coordinate transformations from the simulation data itself. Thus if the governing PDE (4.1) is discretized so that u(x, t) → u_k = u(t_k) ∈ R^n, then snapshots of the solution can be collected into the data matrix

    X = [u₁ u₂ ⋯ u_m],    (4.2)

where X ∈ C^{n×m}. An optimal coordinate system for ROMs is extracted from the data matrix X with a singular value decomposition (SVD) [10],

    X = ΨΣV*,    (4.3)

where Ψ ∈ C^{n×r}, Σ ∈ R^{r×r} and V ∈ C^{m×r}, and r is the number of modes extracted to represent the data. The SVD, which is more commonly known in the ROMs community as the proper orthogonal decomposition (POD) [2, 11], computes the dominant correlated activity of the data as a set of orthogonal modes. It is guaranteed to provide the best ℓ²-norm reconstruction of the data matrix X for a given number of modes r. Importantly, the r modes Ψ extracted from the data matrix are used to produce a separation-of-variables solution to the PDE [12],

    u = Ψ(x)a(t),    (4.4)

where the vector a = a(t) is determined by using this solution form and Galerkin projecting onto (4.1) [2]. Thus the projection-based ROM simply represents the governing PDE evolution (4.1) in the r-rank subspace spanned by Ψ. The projection-based ROM construction often produces a low-rank model for the dynamics of a(t) that can be unstable [13], i.e. the models produced generate solutions that rapidly go to infinity in time. Machine learning techniques offer a diversity of alternative methods for computing the time-dynamics in the low-rank subspace. The simplest architecture aims to train a deep neural network that maps the solution forward in time,

    a_{k+1} = f_θ(a_k),    (4.5)

where a_k = a(t_k) and f_θ represents a deep neural network whose weights and biases are determined by θ. A diversity of neural networks can be used to advance solutions, or to learn the flow map from time t to t + Δt [14, 15]. Indeed, deep learning algorithms provide a flexible framework for constructing a mapping between successive time steps. The typical ROM architecture constrains the dynamics to a subspace spanned by the POD modes; thus in the new coordinate system generated by projection onto the low-rank subspace, the snapshot matrix is now constructed from the a_k. These snapshots can be used to construct a time-stepping model using neural networks. Recently, Parish and Carlberg [16] and Regazzoni et al. [17] developed a suite of neural network based methods for learning time-stepping models for (4.5); moreover, Parish and Carlberg provide extensive comparisons between different neural network architectures along with traditional techniques for time-series modeling. In such models the neural networks (or time-series analysis methods) simply map an input a_k to an output a_{k+1}. Machine learning algorithms offer options beyond Galerkin-POD or deep learning of time-stepping in the time variable a(t). Thus instead of inserting (4.4) back into (4.1) or learning a flow map f_θ for (4.5), we can instead think about directly building a model for the dynamics of a(t). Thus we would like to construct a dynamical system


    da/dt = f(a, t),    (4.6)

where f(·) now prescribes the dynamics. Two highly successful options have emerged for producing a model for the dynamics: (1) the dynamic mode decomposition (DMD) [18] and (2) the sparse identification of nonlinear dynamics (SINDy) [19]. The DMD model for f(·) is assumed to be linear, so that

    da/dt ≈ La,    (4.7)

where L is a linear operator found by regression. Solutions are then trivial, as all that one requires is to find the eigenvalues and eigenvectors of L in order to provide a general solution by linear superposition. The SINDy method makes a different assumption, namely that the dynamics f(·) can be represented by only a few terms. In this case, the regression is to the form

    da/dt ≈ Θ(a)ξ,    (4.8)

(4.8)

where the columns of the matrix $\boldsymbol{\Theta}$ are candidate terms from which to construct a dynamical system and $\boldsymbol{\xi}$ contains the loading (or coefficient or weight) of each library term. SINDy assumes that $\boldsymbol{\xi}$ is a sparse vector, so that most of the library terms do not contribute to the dynamics. The regression is a simple $\mathbf{A}\mathbf{x} = \mathbf{b}$ solve for an overdetermined system that is regularized by sparsity, i.e. by the sparsity-promoting $\ell_0$ or $\ell_1$ norms. In addition to the diversity of methods for building a model for the time dynamics $\mathbf{a}(t)$, there also exists the possibility of using coordinates other than those defined by $\boldsymbol{\Psi}$. Moving beyond a linear subspace can provide improved coordinate systems for building dynamic models for $\mathbf{a}(t)$. Importantly, there exists the possibility of learning a coordinate system where, for instance, a linear DMD model or a parsimonious SINDy model can be more efficiently imposed. Thus we wish to learn a coordinate transformation

$$
\mathbf{z} = \mathbf{f}_\theta(\mathbf{u}) \qquad (4.9)
$$

where $\mathbf{z}$ is the new variable representing the state space $\mathbf{u}$ and $\mathbf{f}_\theta$ is the neural network coordinate transformation that defines the mapping. The goal is then to enforce a DMD or SINDy model in the new coordinate system:

$$
\frac{d\mathbf{z}}{dt} = \mathbf{L}\mathbf{z}, \qquad (4.10a)
$$
$$
\frac{d\mathbf{z}}{dt} = \boldsymbol{\Theta}\boldsymbol{\xi}. \qquad (4.10b)
$$

This allows us to find a coordinate system beyond the standard linear, low-rank subspace $\boldsymbol{\Psi}$, which can be advantageous for ROM construction. In what follows, we highlight the basic mathematical formulations allowing for a diversity of ROM approximations, but especially those that leverage DMD and SINDy in constructing advantageous ROMs.

4.2 Mathematical Formulation

The goal is to leverage principles of parsimony for ROMs, especially as they pertain to deep learning and neural network architectures, to learn physically interpretable and generalizable models of spatio-temporal systems from off-line and/or on-line streaming data. We consider systems in three critical scenarios for which a baseline physical model is known; however, partially known or unknown models can also be characterized within the methodologies developed. The discovery process begins with data acquisition; thus sensors are critical for empowering the physics learning process. Often what is measured by the sensors, $\mathbf{y}$, is not the correct variable $\mathbf{u}$ for building a parsimonious model representation. Thus it is important to first learn how to map the measurements $\mathbf{y}$ to a state space representation $\mathbf{u}$ where a model is constructed, i.e. a measurement model must be constructed. Once achieved, a parsimonious model for $\mathbf{u}$ can be constructed in a variety of ways, as detailed in subsequent sections of the manuscript. Importantly, many modern applications of data-driven modeling require that the measurement model and parsimonious model be constructed simultaneously. There are, of course, many nuances to this process, but we will primarily focus, at a first approximation, on learning spatio-temporal systems governed by partial differential equations which are specified by a nonlinear operator $\mathbf{N}(\cdot)$. Thus we seek to construct a ROM based on data $\mathbf{y} \in \mathbb{R}^m$ measured from a high-dimensional state $\mathbf{u} \in \mathbb{R}^n$, where $n \gg 1$ and $m \ll n$. Specifically,

$$
\dot{\mathbf{u}} = \mathbf{N}(\mathbf{u}, x, t, \boldsymbol{\beta}), \qquad (4.11a)
$$
$$
\mathbf{y}_k = \mathbf{h}(t_k, x, \mathbf{u}(t_k)) + \boldsymbol{\sigma} \qquad (4.11b)
$$

where the spatio-temporal dynamics are prescribed by $\mathbf{N}: X_1 \to \mathbb{R}^n$, the observation operator is $\mathbf{h}: X_1 \to \mathbb{R}^m$, and the observations $\mathbf{y}_k$ are measured at the times $t_k$ at spatial locations prescribed by $x$. Observations are compromised by measurement noise $\boldsymbol{\sigma}$, which is typically described by a probability distribution (e.g. a normal distribution $\boldsymbol{\sigma} \sim \mathcal{N}(\mu, \sigma)$). The dynamics is prescribed by a set of parameters $\boldsymbol{\beta}$. The goal is, given p measurements $\mathbf{y}_k$ arranged in the matrix $\mathbf{Y} = [\mathbf{y}_1\ \mathbf{y}_2\ \cdots\ \mathbf{y}_p] \in \mathbb{R}^{m\times p}$, to infer the dynamics $\mathbf{N}(\cdot)$ (or the unknown portion of the dynamics) with parametrization $\boldsymbol{\beta}$, the measurement operator $\mathbf{h}(\cdot)$, or a proxy model of the true system, so that tasks such as control, characterization, and forecasting can be accomplished. If $\mathbf{h}(\cdot)$ is not the identity and/or $\boldsymbol{\sigma}$ is not zero, then we are in the case of imperfect data. This problem can be one of online learning, where the update must occur in real-time and with no possibility of repeating the experiment. Thus model discovery is typically an ill-posed problem whose solution must be accomplished through judiciously chosen regularization. Solving the ill-posed problem (4.11a, 4.11b) is a fundamental scientific and mathematical challenge, since there are simply not enough constraints on the measurement model, dynamics and parametrization to achieve a unique solution. To date, it has only been accomplished in highly specialized settings, typically with full state measurements and high-quality (low-noise) data. Significant mathematical innovations have to be developed in order to make this a general and robust architecture, especially as regularization is required to make the problem well-posed. As sufficient data is acquired from sensors in time and space, the data-discovery pipeline then produces the flow

$$
\mathbf{y}\ \text{(measurements)} \to \mathbf{u}\ \text{(state space)} \to \mathbf{N}(\cdot)\ \text{(dynamics model)} \qquad (4.12)
$$

with two functions to discover, $\mathbf{h}$ and $\mathbf{N}$. Alternatively, one can also find a new coordinate system $\mathbf{z}$ in which to build the ROM, so that

$$
\mathbf{y}\ \text{(measurements)} \to \mathbf{u}\ \text{(state space)} \to \mathbf{z}\ \text{(new state space)} \to \mathbf{N}(\cdot)\ \text{(dynamic model)} \qquad (4.13)
$$

In addition to the discovery of a measurement model and the underlying dynamics, the dimension of the dynamics r must also be discovered and/or estimated in any data-driven architecture. Often the rank of an underlying subspace on which the physics can be projected can be estimated from the singular value decomposition [20]. However, hyper-parameter tuning is critical in refining the initial estimates of the rank r. Such hyper-parameter tuning, which is aimed at justifying the choice of rank through cross-validation, is critical to almost every aspect of training a successful machine learning model. Importantly, we wish to impose physicsbased regularization principles in order to make (4.11a, 4.11b) well-posed. In what follows, construction of a ROM will be the driving principle for regularization in model discovery.
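As a simple illustration of the initial rank estimate, the sketch below uses an assumed cumulative-energy heuristic (not a prescription from the text) to retain the smallest r whose singular values capture a prescribed fraction of the variance; cross-validation, as discussed above, should then refine the choice.

```python
import numpy as np

def estimate_rank(X, energy=0.99):
    """Smallest rank r whose singular values capture the given fraction of the
    total variance; a starting point to be refined by cross-validation."""
    svals = np.linalg.svd(X, compute_uv=False)
    cumulative = np.cumsum(svals**2) / np.sum(svals**2)
    return int(np.searchsorted(cumulative, energy) + 1)
```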

4.3 Reduced Order Modeling

Three mathematical architectures will be used in building ROMs from data: DMD, SINDy and neural networks. Each will be considered in the context of the mathematical construct (4.11a, 4.11b).


4.3.1 Dynamic Mode Decomposition

Dynamic mode decomposition originated in the fluid dynamics community. Introduced as an algorithm by Schmid [21, 22], it has rapidly become a commonly used data-driven analysis tool and the standard algorithm to approximate the Koopman operator from data [23]. In the context of fluid dynamics, DMD was used to identify spatio-temporal coherent fluid structures from high-dimensional time-series data. The DMD analysis offered an alternative to standard dimensionality reduction methods such as the proper orthogonal decomposition (POD), which highlighted low-rank features in fluid flows using the computationally efficient singular value decomposition (SVD) [1]. The advantage of using DMD over SVD is that the DMD modes are linear combinations of the SVD modes that have a common linear (exponential) behavior in time, given by oscillations at a fixed frequency with growth or decay. Specifically, DMD is a regression to solutions of the form

$$
\mathbf{X}(t) = \sum_{j=1}^{r} \boldsymbol{\phi}_j e^{\omega_j t}\, b_j = \boldsymbol{\Phi}\exp(\boldsymbol{\Omega}t)\,\mathbf{b}, \qquad (4.14)
$$

where $\mathbf{X}(t)$ is a rank-r approximation to a collection of state space measurements $\mathbf{x}_k = \mathbf{x}(t_k)$ $(k = 1, 2, \cdots, m)$. The algorithm regresses to values of the DMD eigenvalues $\omega_j$, the DMD modes $\boldsymbol{\phi}_j$ and their loadings $b_j$. The $\omega_j$ determine the temporal behavior of the system associated with a modal structure $\boldsymbol{\phi}_j$, thus giving a highly interpretable representation of the dynamics. Such a regression can also be learned from time-series data [24]. DMD may be thought of as a combination of SVD/POD in space with the Fourier transform in time, combining the strengths of each approach [18, 25]. DMD is modular due to its simple formulation in terms of linear algebra, resulting in innovations related to control [26], compression [27, 28], reduced-order modeling [29], and multi-resolution analysis [30, 31], among others. Because of its simplicity and interpretability, DMD has been applied to a wide range of diverse applications beyond fluid mechanics, including neuroscience [32], disease modeling [33], robotics [34, 35], video processing [36, 37], power grids [38, 39], financial markets [40], and plasma physics [41, 42]. The regression to (4.14) shows the immediate value of DMD for forecasting. Specifically, any time $t^*$ can be evaluated to produce an approximation to the state of the system, $\mathbf{X}(t^*)$. However, despite its introduction more than a decade ago, DMD is rarely used for forecasting and/or reconstruction of time-series data except in cases with high-quality (noise-free or nearly noise-free) data. Indeed, practitioners who work with DMD and noisy data know that the algorithm fails not only to produce a reasonable forecast, but also often fails in the task of reconstructing the time-series it was originally regressed to. Thus in the past decade, the value of DMD has largely been as an important diagnostic tool, as the DMD modes and frequencies are highly interpretable. Indeed, from Schmid's [21, 22] original work until now, DMD papers are primarily diagnostic in nature, with the key figures of any given paper being the DMD modes and eigenvalues. In cases where DMD is used on noise-free data, such as for producing reduced order models from high-fidelity numerical simulation data [29, 43], the DMD solution (4.14) can be used for reconstructing and forecasting accurate representations of the solution.

The algorithmic construction of the DMD method is best understood from the so-called exact DMD [44]. Indeed, exact DMD is simply a least-squares fitting procedure. Specifically, the DMD algorithm seeks a best-fit linear operator $\mathbf{A}$ that approximately advances the state of a system, $\mathbf{x} \in \mathbb{R}^n$, forward in time according to the linear dynamical system

$$
\mathbf{x}_{k+1} = \mathbf{A}\mathbf{x}_k, \qquad (4.15)
$$

where $\mathbf{x}_k = \mathbf{x}(k\Delta t)$, and $\Delta t$ denotes a fixed time step that is small enough to resolve the highest frequencies in the dynamics. Thus, the operator $\mathbf{A}$ is an approximation of the Koopman operator $\mathcal{K}$ restricted to a measurement subspace spanned by direct measurements of the state $\mathbf{x}$ [23]. In the original DMD formulation [22], uniform sampling in time was required, so that $t_k = k\Delta t$. The exact DMD algorithm [44] does not require uniform sampling. Rather, for each snapshot $\mathbf{u}(t_k)$ there is a corresponding snapshot $\mathbf{u}'(t_k)$ one time step $\Delta t$ in the future. These snapshots are arranged into two matrices: $\mathbf{X}$, which is given in (4.2), and $\mathbf{X}'$, which is the matrix (4.2) with all snapshots advanced by $\Delta t$. In terms of these matrices, the DMD regression (4.15) is

$$
\mathbf{X}' \approx \mathbf{A}\mathbf{X}. \qquad (4.16)
$$

The exact DMD is the best-fit, in a least-squares sense, operator $\mathbf{A}$ that approximately advances snapshot measurements forward in time. Specifically, it can be formulated as an optimization problem

$$
\mathbf{A} = \arg\min_{\mathbf{A}} \|\mathbf{X}' - \mathbf{A}\mathbf{X}\|_F = \mathbf{X}'\mathbf{X}^{\dagger} \qquad (4.17)
$$

where $\|\cdot\|_F$ is the Frobenius norm and $\dagger$ denotes the pseudo-inverse. The pseudo-inverse may be computed using the SVD of $\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^*$ as $\mathbf{X}^{\dagger} = \mathbf{V}\boldsymbol{\Sigma}^{-1}\mathbf{U}^*$. The matrices $\mathbf{U} \in \mathbb{C}^{n\times n}$ and $\mathbf{V} \in \mathbb{C}^{m\times m}$ are unitary, so that $\mathbf{U}^*\mathbf{U} = \mathbf{I}$ and $\mathbf{V}^*\mathbf{V} = \mathbf{I}$, where $*$ denotes the complex-conjugate transpose. The columns of $\mathbf{U}$ are known as POD modes. Often for high-dimensional data, DMD leverages low-rank structure by first projecting $\mathbf{A}$ onto the first r POD modes in $\mathbf{U}_r$ and approximating the pseudo-inverse using the rank-r SVD approximation $\mathbf{X} \approx \mathbf{U}_r\boldsymbol{\Sigma}_r\mathbf{V}_r^*$:

$$
\tilde{\mathbf{A}} = \mathbf{U}_r^*\mathbf{A}\mathbf{U}_r = \mathbf{U}_r^*\mathbf{X}'\mathbf{V}_r\boldsymbol{\Sigma}_r^{-1}. \qquad (4.18)
$$

The leading spectral decomposition of $\mathbf{A}$ may be approximated from the spectral decomposition of the much smaller $\tilde{\mathbf{A}}$:

$$
\tilde{\mathbf{A}}\mathbf{W} = \mathbf{W}\boldsymbol{\Lambda}. \qquad (4.19)
$$


The diagonal matrix $\boldsymbol{\Lambda}$ contains the DMD eigenvalues, which correspond to eigenvalues of the high-dimensional matrix $\mathbf{A}$. The columns of $\mathbf{W}$ are eigenvectors of $\tilde{\mathbf{A}}$, and provide a coordinate transformation that diagonalizes the matrix. These columns may be thought of as linear combinations of POD mode amplitudes that behave linearly with a single temporal pattern given by the corresponding eigenvalue $\lambda$. The eigenvectors of $\mathbf{A}$ are the DMD modes $\boldsymbol{\Phi}$, and they are reconstructed using the eigenvectors $\mathbf{W}$ of the reduced system and the time-shifted data matrix $\mathbf{X}'$:

$$
\boldsymbol{\Phi} = \mathbf{X}'\mathbf{V}\boldsymbol{\Sigma}^{-1}\mathbf{W}. \qquad (4.20)
$$

Tu et al. [44] proved that these DMD modes are eigenvectors of the full $\mathbf{A}$ matrix under certain conditions. As already shown in the introduction, the DMD decomposition allows for a reconstruction of the solution as (4.14). The amplitudes of each mode, $\mathbf{b}$, can be computed from

$$
\mathbf{b} = \boldsymbol{\Phi}^{\dagger}\mathbf{x}_1, \qquad (4.21)
$$

however, alternative and often better approaches are available to compute $\mathbf{b}$ [25, 45, 46]. Thus, the data matrix $\mathbf{X}$ may be reconstructed as

$$
\mathbf{X} \approx \boldsymbol{\Phi}\,\mathrm{diag}(\mathbf{b})\,\mathbf{T}(\boldsymbol{\omega})
= \begin{bmatrix} | & & | \\ \boldsymbol{\phi}_1 & \cdots & \boldsymbol{\phi}_r \\ | & & | \end{bmatrix}
\begin{bmatrix} b_1 & & \\ & \ddots & \\ & & b_r \end{bmatrix}
\begin{bmatrix} e^{\omega_1 t_1} & \cdots & e^{\omega_1 t_m} \\ \vdots & \ddots & \vdots \\ e^{\omega_r t_1} & \cdots & e^{\omega_r t_m} \end{bmatrix}. \qquad (4.22)
$$
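The steps (4.16) through (4.21) translate nearly line-for-line into code. The following is a minimal numpy sketch of exact DMD, assuming uniformly sampled snapshots with spacing dt; the function names and the least-squares computation of $\mathbf{b}$ are illustrative choices (better options for $\mathbf{b}$ are cited above [25, 45, 46]).

```python
import numpy as np

def exact_dmd(X, Xp, r):
    """Exact DMD following (4.16)-(4.21): X, Xp are the snapshot matrix and its
    time-shifted counterpart; r is the truncation rank."""
    U, S, Vh = np.linalg.svd(X, full_matrices=False)
    Ur, Sr, Vr = U[:, :r], S[:r], Vh[:r, :].conj().T
    Atilde = Ur.conj().T @ Xp @ Vr / Sr          # low-rank operator, eq. (4.18)
    lam, W = np.linalg.eig(Atilde)               # eigendecomposition, eq. (4.19)
    Phi = (Xp @ Vr / Sr) @ W                     # exact DMD modes, eq. (4.20)
    b = np.linalg.lstsq(Phi, X[:, 0], rcond=None)[0]   # amplitudes, eq. (4.21)
    return Phi, lam, b

def dmd_reconstruct(Phi, lam, b, dt, times):
    """Evaluate the DMD solution (4.14)/(4.22) at the requested times."""
    omega = np.log(lam) / dt                     # continuous-time eigenvalues
    T = np.exp(np.outer(omega, times))           # r x len(times) temporal matrix
    return Phi @ (b[:, None] * T)
```

Since dmd_reconstruct accepts arbitrary times, the same fit yields both reconstruction on the training window and forecasting beyond it.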

Bagheri [43] first highlighted that DMD is particularly sensitive to the effects of noisy data, with systematic biases introduced to the eigenvalue distribution [47–50]. As a result, a number of methods have been introduced to stabilize performance, including total least-squares DMD [50], forward-backward DMD [49], variational DMD [51], subspace DMD [52], time-delay embedded DMD [53–56] and robust DMD methods [46, 57]. However, the optimized DMD algorithm of Askham and Kutz [46], which uses a variable projection method for nonlinear least squares to compute the DMD for unevenly timed samples, provides the best performance of any algorithm currently available. This is not surprising given that it is constructed to optimally satisfy the DMD problem formulation. Specifically, the optimized DMD algorithm solves the exponential fitting problem directly:

$$
\arg\min_{\boldsymbol{\omega},\,\mathbf{b}} \|\mathbf{X} - \boldsymbol{\Phi}_{\mathbf{b}}\mathbf{T}(\boldsymbol{\omega})\|_F, \qquad (4.23)
$$

where $\boldsymbol{\Phi}_{\mathbf{b}} = \boldsymbol{\Phi}\,\mathrm{diag}(\mathbf{b})$. This has been shown to provide a superior decomposition due to its ability to optimally suppress bias and handle snapshots collected at arbitrary times. Moreover, it can be used with statistical bagging methods to produce BOP-DMD (bagging, optimized DMD) [58], which is perhaps the most stable variant of DMD. BOP-DMD also provides spatial and temporal uncertainty quantification. The disadvantage of optimized DMD is that one must solve a nonlinear optimization problem, which can often fail to converge.

The construction of a traditional ROM that is accurate and efficient is centered on the reduction (4.1). Thus once a low-rank subspace is computed from the SVD, the POD modes $\boldsymbol{\Psi}$ are used for projecting the dynamics. DMD allows us to use a data-driven, non-intrusive method to regress to a model for the temporal dynamics. Consider modification of (4.1) to the evolution dynamics

$$
\mathbf{u}_t = \mathbf{L}\mathbf{u} + \mathbf{N}(\mathbf{u}, \mathbf{u}_x, \mathbf{u}_{xx}, \cdots, x, t; \boldsymbol{\beta}) \qquad (4.24)
$$

where the linear and nonlinear parts of the evolution, denoted by $\mathbf{L}$ and $\mathbf{N}(\cdot)$ respectively, have been explicitly separated. The solution ansatz $\mathbf{u} = \boldsymbol{\Psi}\mathbf{a}$ yields the ROM

$$
\frac{d\mathbf{a}}{dt} = \boldsymbol{\Psi}^T\mathbf{L}\boldsymbol{\Psi}\mathbf{a} + \boldsymbol{\Psi}^T\mathbf{N}(\boldsymbol{\Psi}\mathbf{a}, \boldsymbol{\beta}). \qquad (4.25)
$$

Note that the linear operator in the reduced space, $\boldsymbol{\Psi}^T\mathbf{L}\boldsymbol{\Psi}$, is an $r\times r$ matrix which is easily computed. The nonlinear portion of the operator, $\boldsymbol{\Psi}^T\mathbf{N}(\boldsymbol{\Psi}\mathbf{a}, \boldsymbol{\beta})$, is more complicated, since it involves repeated evaluation of the operator as the solution $\mathbf{a}$, and consequently the high-dimensional state $\mathbf{u}$, is updated in time. One method for overcoming the difficulties introduced in evaluating the nonlinear term on the right-hand side is to introduce the DMD algorithm. DMD approximates a set of snapshots by a best-fit linear model. Thus the nonlinearity can be evaluated over snapshots and a linear model constructed to approximate its dynamics. Thus two matrices can be constructed:

$$
\mathbf{N}_- = \begin{bmatrix} | & | & & | \\ \mathbf{N}_1 & \mathbf{N}_2 & \cdots & \mathbf{N}_{m-1} \\ | & | & & | \end{bmatrix}
\quad \text{and} \quad
\mathbf{N}_+ = \begin{bmatrix} | & | & & | \\ \mathbf{N}_2 & \mathbf{N}_3 & \cdots & \mathbf{N}_{m} \\ | & | & & | \end{bmatrix} \qquad (4.26)
$$

where $\mathbf{N}_k$ is the evaluation of the nonlinear term $\mathbf{N}(\mathbf{u}, \mathbf{u}_x, \mathbf{u}_{xx}, \cdots, x, t; \boldsymbol{\beta})$ at $t = t_k$. The $\pm$ denotes the input ($-$) and output ($+$) respectively. This gives the training data necessary for regressing to a DMD model

$$
\mathbf{N}_+ = \mathbf{A}_N\mathbf{N}_-. \qquad (4.27)
$$

The governing equation (4.24) can then be approximated by

$$
\mathbf{u}_t \approx \mathbf{A}\mathbf{u} + \mathbf{A}_N\mathbf{u} = (\mathbf{A} + \mathbf{A}_N)\mathbf{u} \qquad (4.28)
$$


where the operator $\mathbf{L}$ has been replaced by the matrix $\mathbf{A}$. The dynamics is now completely linear, and solutions can be easily constructed from the eigenvalues and eigenvectors of the linear operator $\mathbf{A} + \mathbf{A}_N$.

In practice, the DMD algorithm exploits low-dimensional structure in building a ROM. Thus, instead of the approximate linear model (4.28), we wish to build a low-dimensional version. From snapshots (4.26) of the nonlinearity, the DMD algorithm can be used to approximate the dominant rank-r nonlinear contribution to the dynamics as

$$
\mathbf{N}(\mathbf{u}, \mathbf{u}_x, \mathbf{u}_{xx}, \cdots, x, t; \boldsymbol{\beta}) \approx \sum_{j=1}^{r} \boldsymbol{\phi}_j \exp(\omega_j t)\, b_j = \boldsymbol{\Phi}\exp(\boldsymbol{\Omega}t)\,\mathbf{b} \qquad (4.29)
$$

where $b_j$ determines the weighting of each mode. Here $\boldsymbol{\phi}_j$ is a DMD mode and $\omega_j$ is the corresponding DMD eigenvalue. This approximation can be used in (4.25) to produce the POD-DMD approximation

$$
\frac{d\mathbf{a}}{dt} = \boldsymbol{\Psi}^T\mathbf{L}\boldsymbol{\Psi}\mathbf{a} + \boldsymbol{\Psi}^T\boldsymbol{\Phi}\exp(\boldsymbol{\Omega}t)\,\mathbf{b}. \qquad (4.30)
$$

In this formulation, there are a number of advantageous features: (1) the nonlinearity is only evaluated once with the DMD algorithm (4.29), and (2) the products $\boldsymbol{\Psi}^T\mathbf{L}\boldsymbol{\Psi}$ and $\boldsymbol{\Psi}^T\boldsymbol{\Phi}$ are also only evaluated once, and both produce matrices that are low-rank, i.e. they are independent of the original high-dimensional system. Thus, with a one-time, up-front evaluation of two snapshot matrices to produce $\boldsymbol{\Psi}$ and $\boldsymbol{\Phi}$, the result is a computationally efficient ROM that requires no recourse to the original high-dimensional system. Alla and Kutz [29] integrated the DMD algorithm into the traditional ROM formalism to produce the POD-DMD model (4.30). The comparison of this computationally efficient ROM with traditional model reduction is shown in Fig. 4.1, where both the computational time and the error are evaluated. Once the DMD algorithm is used to produce an approximation of the nonlinear term, it can be used for producing future state predictions and a computationally efficient ROM. Indeed, its computational acceleration is quite remarkable in comparison to traditional methods. Moreover, the method is non-intrusive and does not require additional evaluation of the nonlinear term. The entire method can be used with randomized algorithms to speed up the low-rank evaluations even further [59]. Note that the computational performance boost comes at the expense of accuracy, as shown in Fig. 4.1. This is primarily because the POD modes used for standard ROMs, which are orthogonal by construction and guaranteed to be a best fit in $\ell_2$, are now replaced by DMD modes, which are no longer orthogonal [29].
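A sketch of this POD-DMD construction, reusing the exact_dmd helper from the earlier sketch and assuming the snapshot matrix X, the nonlinearity snapshots N and the discretized linear operator L are given, might look as follows; it evaluates the projections in (4.30) once and then integrates the r-dimensional system.

```python
import numpy as np
from scipy.integrate import solve_ivp

def pod_dmd_rom(X, N, L, r, dt):
    """POD-DMD ROM (4.30): project the linear part once, DMD the nonlinearity once.
    X: state snapshots (n x m); N: nonlinearity snapshots (n x m); L: n x n matrix."""
    Psi = np.linalg.svd(X, full_matrices=False)[0][:, :r]   # POD basis
    Phi, lam, b = exact_dmd(N[:, :-1], N[:, 1:], r)         # DMD of N, eq. (4.29)
    omega = np.log(lam) / dt                                # continuous-time eigenvalues
    PsiL = Psi.T @ L @ Psi                                  # r x r, computed once
    PsiPhi = Psi.T @ Phi                                    # r x r, computed once

    def rhs(t, a):                                          # right-hand side of (4.30)
        return np.real(PsiL @ a + PsiPhi @ (b * np.exp(omega * t)))

    a0 = Psi.T @ X[:, 0]
    t_final = dt * (X.shape[1] - 1)
    return solve_ivp(rhs, (0.0, t_final), a0, dense_output=True)
```

Note that the high-dimensional operators appear only in the one-time products PsiL and PsiPhi, which is the source of the speedup reported in Fig. 4.1.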


Fig. 4.1 Computation time and accuracy on a semi-linear parabolic equation (modified from Alla and Kutz [29]). Four methods are compared: the high-fidelity simulation of the governing equations (FULL), a Galerkin-POD reduction as given in (4.25) (POD), a Galerkin-POD reduction with the discrete empirical interpolation method (DEIM) for evaluating the nonlinearity (POD-DEIM), and the POD-DMD approximation (4.30). The left panel shows the computation time, which for POD-DMD is an order of magnitude faster than traditional POD-DEIM algorithms. The right panel shows the accuracy of the different methods for reproducing the high-fidelity simulations. POD-DMD loses some accuracy in comparison to Galerkin-POD methods because DMD modes are not orthogonal, and thus the error does not decrease as quickly as for the POD-based methods

4.3.2 Sparse Identification of Nonlinear Dynamics

In addition to a DMD model for the low-rank dynamics, the SINDy regression framework also allows one to build a model for the evolution of the temporal dynamics in the low-rank subspace. Discovery of governing equations plays a fundamental role in the development of physical theories, and in this case, we wish to discover the evolution dynamics of $\mathbf{a}(t)$ for constructing our ROM. With increasing computing power and data availability in recent years, there have been substantial efforts to identify governing equations directly from data [60–62]. There has been particular emphasis on parsimonious representations because they have the benefits of promoting interpretability and generalizing well to unknown data [19, 63–69]. The SINDy method was proposed in [19]; it leverages dictionary learning and sparse regression to model dynamical systems. This approach has been successful in modeling a diversity of applications, including in chemistry [70], optics [71], engineered systems [72], epidemiology [73], and plasma physics [74]. Furthermore, there have been a variety of modifications, including improved robustness to noise [75–77], generalizations to partial differential equations [78–80], multi-scale physics [31], and libraries of rational functions [81, 82]. Just like the BOP-DMD algorithm [58], recent Bayesian and ensemble methods make SINDy much more robust for model discovery for noisy systems and with little data [83, 84].

In the context of ROM modeling, the goal is now to discover a dynamic, parsimonious model of the evolution dynamics of a high-fidelity model embedded in a low-rank subspace. Recall that $\mathbf{u}(t) \approx \boldsymbol{\Psi}\mathbf{a}(t)$ for building a ROM. Although $\boldsymbol{\Psi}$ can be easily computed with the SVD, it is the evolution of $\mathbf{a}(t)$ that ultimately determines the temporal behavior of the system. Thus far, the temporal evolution has been computed via Galerkin projection and DMD. SINDy gives yet another alternative:

$$
\frac{d\mathbf{a}}{dt} = \mathbf{f}(\mathbf{a}) \qquad (4.31)
$$

where the right-hand-side function prescribing the evolution dynamics, $\mathbf{f}(\cdot)$, is unknown. SINDy provides a sparse regression framework to determine these dynamics. As in DMD, the snapshots of $\mathbf{a}(t)$ are collected into the matrix

$$
\mathbf{A} = \begin{bmatrix} | & | & & | \\ \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_{m} \\ | & | & & | \end{bmatrix}. \qquad (4.32)
$$

The SINDy regression framework is then formulated as

$$
\dot{\mathbf{A}} = \boldsymbol{\Theta}(\mathbf{A})\boldsymbol{\Xi}, \qquad (4.33)
$$

where each column $\boldsymbol{\xi}_k$ in $\boldsymbol{\Xi}$ is a vector of coefficients determining the active terms in the k-th row of (4.31). Leveraging parsimony provides a dynamical model using as few terms as possible in $\boldsymbol{\Xi}$. Such a model may be identified using a convex $\ell_1$-regularized sparse regression

$$
\boldsymbol{\xi}_k = \arg\min_{\boldsymbol{\xi}_k} \|\dot{\mathbf{a}}_k - \boldsymbol{\Theta}(\mathbf{A})\boldsymbol{\xi}_k\|_2 + \lambda\|\boldsymbol{\xi}_k\|_1. \qquad (4.34)
$$

Note that $\dot{\mathbf{a}}_k$ is the k-th column of $\dot{\mathbf{A}}$, and $\lambda$ is a sparsity-promoting regularization parameter. There are many variants of sparsity promotion that can be used [85–92], including the advocated sequential least-squares thresholding to select active terms [19], a minimal sketch of which follows.
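The sketch below is a minimal numpy rendering of that sequentially thresholded least-squares idea; the threshold and iteration count are illustrative hyperparameters, and the pySINDy package [106, 107] provides production implementations.

```python
import numpy as np

def stlsq(Theta, dAdt, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares for (4.33): solve dA/dt ~ Theta(A) Xi,
    zeroing coefficients below the threshold and refitting on the active terms."""
    Xi = np.linalg.lstsq(Theta, dAdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(Xi.shape[1]):             # refit each state equation
            active = ~small[:, k]
            if active.any():
                Xi[active, k] = np.linalg.lstsq(Theta[:, active], dAdt[:, k],
                                                rcond=None)[0]
    return Xi
```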

The SINDy-POD method provides a simple regression framework for discovering a parsimonious, and generally nonlinear, model for the evolution dynamics of the high-dimensional model in a low-dimensional subspace. As an example, consider the canonical problem of flow past a circular cylinder, modeled by the two-dimensional, incompressible Navier-Stokes equations [43]

$$
\nabla\cdot\mathbf{u} = 0, \qquad \partial_t\mathbf{u} + (\mathbf{u}\cdot\nabla)\mathbf{u} = -\nabla p + \frac{1}{Re}\Delta\mathbf{u} \qquad (4.35)
$$

where $\mathbf{u}$ is the two-component flow velocity field in 2D and $p$ is the pressure. For Reynolds numbers near the critical value $Re_c \approx 47$, the fluid flow past a cylinder undergoes a supercritical Hopf bifurcation, where the steady flow for $Re < Re_c$ transitions to unsteady vortex shedding [93]. The unfolding gives the celebrated Stuart-Landau ODE, which is essentially the Hopf normal form in complex coordinates. This has resulted in accurate and efficient reduced-order models for this system [94, 95].



Fig. 4.2 Application of the SINDy algorithm for ROM construction. High-dimensional data is used with the sparse identification of nonlinear dynamics (SINDy) [19] in order to produce a model for $\mathbf{a}(t)$. This procedure is modular, so that different techniques can be used for the feature extraction and regression steps. In this example of flow past a cylinder, SINDy discovers the model of Noack et al. [94]. Modified from Brunton et al. [19]

In Fig. 4.2, simulations at $Re = 100$ are considered. The SVD of the data matrix at this Reynolds number shows that three modes dominate the dynamics. As such, the first three columns of the matrix $\mathbf{V}$ are extracted and the SINDy regression (4.33) is performed. The discovered dynamical model is given by

$$
\dot{a}_1 = \mu a_1 - \omega a_2 + A a_1 a_3, \qquad (4.36a)
$$
$$
\dot{a}_2 = \omega a_1 + \mu a_2 + A a_2 a_3, \qquad (4.36b)
$$
$$
\dot{a}_3 = -\lambda(a_3 - a_1^2 - a_2^2), \qquad (4.36c)
$$

which is the same model found by Noack et al. [94] through a detailed asymptotic reduction of the flow dynamics. Thus the ROM evolution dynamics (4.36) represents a significantly different model than what is achieved via Galerkin-POD projection. This model is stable, and it also captures the correct supercritical Hopf bifurcation dynamics as a function of the Reynolds number. Thus SINDy-POD provides an improved ROM description of the dynamics.
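For a regression of this type, the open-source pySINDy package [106, 107] can be used directly on the POD coefficients; in the sketch below the synthetic trajectory, the threshold and the quadratic library are illustrative choices and do not reproduce the cylinder data of Fig. 4.2.

```python
import numpy as np
import pysindy as ps

# Placeholder for the first three POD coefficients over time (rows = time samples).
dt = 0.02
t = np.arange(0, 20, dt)
a = np.column_stack([np.cos(t), np.sin(t), 1 - np.exp(-t)])   # stand-in for data

model = ps.SINDy(
    optimizer=ps.STLSQ(threshold=0.05),               # sparsity knob (illustrative)
    feature_library=ps.PolynomialLibrary(degree=2),   # quadratic candidate library
)
model.fit(a, t=dt)
model.print()   # prints the discovered equations, e.g. the form of (4.36)
```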

4.3.3 Neural Networks

The emergence of machine learning is expanding the mathematical possibilities for the construction of accurate ROMs. As shown in the previous sections, the focus of traditional projection-based ROMs is on computing the low-dimensional subspace $\boldsymbol{\Psi}$ on which to project the governing equations. Recall that in constructing the low-dimensional subspace, the SVD is used on snapshots of high-fidelity simulation (or experimental) data, $\mathbf{X} \approx \tilde{\boldsymbol{\Psi}}\tilde{\boldsymbol{\Sigma}}\tilde{\mathbf{V}}^*$. The POD reduction technique uses only the single matrix $\boldsymbol{\Psi}$ in the reduction process. The temporal evolution in the reduced space is quantified by $\tilde{\boldsymbol{\Sigma}}\tilde{\mathbf{V}}^*$. This gives explicitly the evolution of each mode over the snapshots of $\mathbf{X}$, information which is not used in projection-based ROMs. Neural networks can then be used directly on the time-series data encoded in $\mathbf{V}$ to build a time-stepping algorithm for marching the solution forward in time. The motivation for using deep learning algorithms for time-stepping is the recognition that projection-based model reduction can often produce unstable iteration schemes [13]. A second important fact is that valuable temporal information in the low-dimensional space is summarily dismissed by the projection schemes, i.e. only the POD modes are retained for ROM construction. Neural networks aim to leverage this temporal information and in the process build efficient and stable time-stepping proxies. Recall that model reduction proceeds by projecting into the low-dimensional subspace spanned by $\boldsymbol{\Psi}$ so that

$$
\mathbf{u}(t) \approx \boldsymbol{\Psi}\mathbf{a}(t). \qquad (4.37)
$$

In the projection-based ROMs of the previous sections, the amplitude dynamics $\mathbf{a}(t)$ are constructed by Galerkin projection of the governing equations. With neural networks, the dynamics $\mathbf{a}(t)$ are approximated from the discrete time-series data encoded in $\mathbf{V}$. Specifically, this gives

$$
\mathbf{a}(t) \;\longrightarrow\; \begin{bmatrix} | & | & & | \\ \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_m \\ | & | & & | \end{bmatrix} = \tilde{\boldsymbol{\Sigma}}\tilde{\mathbf{V}}^* \qquad (4.38)
$$

over the m time snapshots of the original data matrix on which the ROM is to be constructed. Deep learning algorithms provide a flexible framework for constructing a mapping between successive time steps. As shown in Fig. 4.3, the typical ROM architecture constrains the dynamics to a subspace spanned by the POD modes $\boldsymbol{\Psi}$. Thus in the original coordinate system, the high-fidelity simulations of the governing equations for $\mathbf{u}$ are solved with a given numerical discretization scheme to produce a snapshot matrix $\mathbf{X}$ containing the $\mathbf{u}_k$. In the new coordinate system generated by projection to the subspace $\boldsymbol{\Psi}$, the snapshot matrix is constructed from the $\mathbf{a}_k$ as shown in (4.38). In traditional ROMs, the snapshot matrix (4.38) is not used; instead, snapshots of $\mathbf{a}_k$ are obtained by solving the Galerkin-projected model. However, the snapshot matrix (4.38) can be used to construct a time-stepping model using neural networks. Neural networks allow one to use the high-fidelity simulation data to train a mapping

$$
\mathbf{a}_{k+1} = \mathbf{f}_\theta(\mathbf{a}_k) \qquad (4.39)
$$


Fig. 4.3 Illustration of neural network integration with POD subspaces. The autoencoder structure projects the original high-dimensional state space data into a low-dimensional space via $\mathbf{u}(t) \approx \boldsymbol{\Psi}\mathbf{a}(t)$. As shown in the bottom left, the snapshots $\mathbf{u}_k$ are generated by high-fidelity numerical solutions of the governing equations $\mathbf{u}_t = \mathbf{N}(\mathbf{u}, \mathbf{u}_x, \mathbf{u}_{xx}, \cdots, x, t; \boldsymbol{\beta})$. In traditional ROMs, the snapshots $\mathbf{a}_k$ are constructed from Galerkin projection, as shown in the bottom right. Neural networks instead learn a mapping $\mathbf{a}_{k+1} = \mathbf{f}_\theta(\mathbf{a}_k)$ from the original, low-dimensional snapshot data. It should be noted that time-stepping Runge-Kutta schemes, for instance, are a form of feed-forward neural network, and are used to produce the original high-fidelity data snapshots $\mathbf{u}_k$ [96]

where $\mathbf{f}_\theta$ is a generic representation of a neural network, which is characterized by its structure, weights and biases. Neural networks can be costly to train. Indeed, they typically require a significant amount of data and long training periods in order to perform up to their potential. When comparing DMD, SINDy and neural networks, neural networks take the longest time to train, while DMD is rapid and data efficient. These trade-offs must often be considered when choosing between the three model reduction paradigms. As previously mentioned, Parish and Carlberg [16] and Regazzoni et al. [17] developed neural network models to learn the time-stepping of (4.39). In such models the neural networks (or time-series analysis methods) simply map an input ($\mathbf{a}_k$) to an output ($\mathbf{a}_{k+1}$). In its simplest form, the neural network training requires input-output pairs that can be generated from the snapshots $\mathbf{a}_k$. Thus two matrices can be constructed:

$$
\mathbf{A}_- = \begin{bmatrix} | & | & & | \\ \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_{m-1} \\ | & | & & | \end{bmatrix}
\quad \text{and} \quad
\mathbf{A}_+ = \begin{bmatrix} | & | & & | \\ \mathbf{a}_2 & \mathbf{a}_3 & \cdots & \mathbf{a}_{m} \\ | & | & & | \end{bmatrix} \qquad (4.40)
$$

where the $\pm$ denotes the input ($-$) and output ($+$) respectively. This gives the training data necessary for learning (optimizing) a neural network map

$$
\mathbf{A}_+ = \mathbf{f}_\theta(\mathbf{A}_-). \qquad (4.41)
$$
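A minimal PyTorch sketch of this training problem follows; the placeholder trajectory, network width, activation, learning rate and epoch count are all illustrative choices and not those of [16, 17].

```python
import numpy as np
import torch
import torch.nn as nn

# A_minus, A_plus as in (4.40); a random-walk placeholder stands in for POD data.
r, m = 3, 500
rng = np.random.default_rng(0)
A = np.cumsum(rng.standard_normal((r, m)), axis=1)
A_minus, A_plus = A[:, :-1], A[:, 1:]

f_theta = nn.Sequential(nn.Linear(r, 64), nn.SiLU(),
                        nn.Linear(64, 64), nn.SiLU(),
                        nn.Linear(64, r))

inputs = torch.tensor(A_minus.T, dtype=torch.float32)    # samples as rows
targets = torch.tensor(A_plus.T, dtype=torch.float32)
opt = torch.optim.Adam(f_theta.parameters(), lr=1e-3)

for epoch in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(f_theta(inputs), targets)   # fit (4.41)
    loss.backward()
    opt.step()

# Roll the learned map forward in time from the first coefficient vector.
with torch.no_grad():
    a = inputs[0]
    trajectory = [a]
    for _ in range(100):
        a = f_theta(a)
        trajectory.append(a)
```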

There are numerous neural network architectures that can learn the mapping $\mathbf{f}_\theta$. A simple feed-forward network has already been shown to be quite accurate in learning such a model. Further sophistication can improve accuracy and reduce data requirements for training. Regazzoni et al. [17] formulated the optimization of (4.41) in terms of maximum likelihood. Specifically, they considered the most suitable representation of the high-fidelity model in terms of simpler neural network models. They show that such neural network models can approximate the solution to within any required accuracy (limited by the accuracy of the training data, of course) simply by constructing them from the input-output pairs given by (4.41). Parish and Carlberg [16] provide an in-depth study of different neural network architectures that can be used for learning the time-steppers. They are especially focused on recurrent neural network (RNN) architectures, which have proven effective for the temporal sequences associated with language [97]. Their extensive comparisons show that long short-term memory (LSTM) [98] neural networks outperform other methods and provide substantial improvements over traditional time-series approaches such as autoregressive models. In addition to a baseline Gaussian process (GP) regression, they specifically compare time-stepping models that include the following: k-nearest neighbors (kNN), artificial neural networks (ANN), autoregressive with exogenous inputs (ARX), integrated ANN (ANN-I), latent ARX (LARX), RNN, LSTM and standard GP. Some models include recursive training (RT) and others do not (NRT). Their comparisons on a diversity of PDE models, which will not be detailed here, are evaluated on the fraction of variance unexplained (FVU). Figure 4.4 gives a representation of the extensive comparisons made between these methods for an advection-diffusion PDE model. The success of neural networks for learning time-stepping representations fits more broadly under the aegis of flow maps [99]

$$
\mathbf{u}_{k+1} = \mathbf{F}(\mathbf{u}_k). \qquad (4.42)
$$


Fig. 4.4 From Parish and Carlberg [16], a comparison of a diversity of error metrics and methods for constructing the mapping (4.39) for the advection-diffusion equations. Of all the models considered in the paper, the LSTM and RNN structures proved to be the most accurate models for time-stepping. The reader is encouraged to consult the original paper for the details of the underlying models, the error metrics displayed, and the training data used. Python code is available in the appendix of the original paper. (a) Normed state error $\delta_x$. (b) QoI error $\delta_s$

For neural networks, the flow map is approximated by the learned model (4.39) so that $\mathbf{F} = \mathbf{f}_\theta$. Qin et al. [14] and Liu et al. [15] have explored the construction of flow maps from neural networks as yet another modeling paradigm for advancing the solution in time without recourse to high-fidelity simulations. Such methods offer a broader framework for fast time-stepping algorithms, as no initial dimensionality reduction needs to be computed. In Qin et al. [14], the neural network model $\mathbf{f}_\theta$ is constructed with a residual network (ResNet) as the basic architecture for approximation. In addition to a one-step method, which is shown to be exact in temporal integration, recurrent and recursive ResNets are also constructed for multiple time steps. Their formulation is also in the weak form, where no derivative information is required in order to produce the time-stepping approximations. Several numerical examples are presented to demonstrate the performance of the methods. Like Parish and Carlberg [16] and Regazzoni et al. [17], the method is shown to be exceptionally accurate even in comparison with direct numerical integration, highlighting the universal approximation properties of $\mathbf{f}_\theta$. Liu et al. [15] leveraged the flow map approximation scheme to learn a multiscale time-stepping scheme. Specifically, one can learn flow maps for different characteristic timescales. Thus a given model

$$
\mathbf{a}_{k+\tau} = \mathbf{f}_{\theta_\tau}(\mathbf{a}_k) \qquad (4.43)
$$

can learn a flow map over a prescribed timescale $\tau$.

Fig. 4.5 Multiscale hierarchical time-stepping scheme (modified from Liu et al. [15]). Neural network representations of the time-steppers are constructed over three distinct time scales. The red model takes large steps (slow scale $\mathbf{f}_{\theta_1}$), leaving the finer time-stepping to the yellow (medium time scale $\mathbf{f}_{\theta_2}$) and blue (fast time scale $\mathbf{f}_{\theta_3}$) models. The dark path shows the sequence of maps from $\mathbf{u}_1$ to $\mathbf{u}_m$

If there exist distinct timescales in the data, for instance denoted by $t_1$, $t_2$ and $t_3$ with $t_1 \ll t_2 \ll t_3$ (slow, medium and fast times), then three models can be learned: $\mathbf{f}_{\theta_1}$, $\mathbf{f}_{\theta_2}$ and $\mathbf{f}_{\theta_3}$ for the slow, medium and fast times respectively. Figure 4.5 shows the hierarchical time-stepping (HiTS) scheme with three distinct timescales. The training data of a high-fidelity simulation, or a collection of experimental data, allow for the construction of flow maps which can then be used to efficiently forecast long times into the future. Specifically, one can use the flow map constructed on the slowest scale, $\mathbf{f}_{\theta_1}$, to march far into the future, while the medium and fast scales are then used to advance to the specific point in time. Thus a minimal number of steps is taken on the fast scale, and the work of forecasting long into the future is done by the slow and medium scales; a sketch of this composition is given below. The method is highly efficient and accurate. Figure 4.6 compares the HiTS scheme across a number of example problems, some of which are videos and music frames. Thus HiTS does not require governing equations, simply time-series data arranged into input-output pairs. The performance of such flow maps is remarkably robust, stable and accurate, even when compared to leading time-series neural networks such as LSTMs, echo state networks (ESN) and clockwork recurrent neural networks (CW-RNNs). This is especially true for long forecasts, in contrast to the small time-steps evaluated in the work of Parish and Carlberg [16]. Overall, the works of Parish and Carlberg [16], Regazzoni et al. [17], Qin et al. [14] and Liu et al. [15] exploit very simple training paradigms related to input-output pairings of temporal snapshot data as structured in (4.40). This provides a significant potential improvement for learning time-stepping proxies to Galerkin-projected models.
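A conceptual sketch of the hierarchical composition, assuming three trained flow maps and their time steps are given, is shown below; it illustrates the bookkeeping rather than the exact scheme of [15].

```python
def hierarchical_step(a0, t_target, f_slow, f_med, f_fast,
                      tau_slow, tau_med, tau_fast):
    """Advance a0 to t_target, covering most of the horizon with the slow map
    and using the fast map for only a minimal number of final steps."""
    a, t = a0, 0.0
    while t + tau_slow <= t_target:     # coarse strides across the horizon
        a, t = f_slow(a), t + tau_slow
    while t + tau_med <= t_target:      # refine with the medium-scale map
        a, t = f_med(a), t + tau_med
    while t + tau_fast <= t_target:     # finish with a few fast steps
        a, t = f_fast(a), t + tau_fast
    return a
```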

Fig. 4.6 Evaluation of different neural network architectures (columns) on each training sequence (rows) (from Liu et al. [15]). Key diagnostics are visualized for a diversity of examples, including music files and videos. The last frame of the reconstruction is visualized for the first, third and fourth examples, while the entire music score is visualized in the second example. Note the superior performance of the hierarchical time-stepping scheme in comparison with other modern neural network models such as LSTMs, echo state networks (ESN) and clockwork recurrent neural networks (CW-RNNs)



4.4 Discovery of Coordinates and Models for ROMs

As a final example of ROM construction, we consider an architecture capable of jointly and simultaneously learning coordinates and parsimonious dynamics. Specifically, Champion et al. [75] present a method (the SINDy autoencoder) for the simultaneous discovery of sparse dynamical models (SINDy) and of the coordinates (autoencoders) that enable these simple representations. The aim of the architecture is to combine the parsimony and interpretability of SINDy with the universal approximation capabilities of deep neural networks in order to discover an appropriate coordinate system in which to embed the dynamics. This can produce interpretable and generalizable models capable of extrapolation and forecasting, since the dynamical model is minimally parametrized. The architecture is shown in Fig. 4.7, where an autoencoder is used to embed the original data $\mathbf{x}$ into a new coordinate $\mathbf{z}$ amenable to a parsimonious representation. While in the original coordinate system a dynamical model may be dense in terms of functions of the original measurement coordinates $\mathbf{x}$, this method determines through an autoencoder a reduced coordinate system $\mathbf{z}(t) = \boldsymbol{\varphi}(\mathbf{x}(t)) \in \mathbb{R}^d$ ($d \ll n$)

Fig. 4.7 Schematic of the SINDy autoencoder method for simultaneous discovery of coordinates and parsimonious dynamics (from Champion et al. [75]). (a) An autoencoder architecture is used to discover intrinsic coordinates $\mathbf{z}$ from high-dimensional input data $\mathbf{x}$. The network consists of two components: an encoder $\boldsymbol{\varphi}(\mathbf{x})$, which maps the input data to the intrinsic coordinates $\mathbf{z}$, and a decoder $\boldsymbol{\psi}(\mathbf{z})$, which reconstructs $\mathbf{x}$ from the intrinsic coordinates. (b) A SINDy model captures the dynamics of the intrinsic coordinates. The active terms in the dynamics are identified by the nonzero elements in $\boldsymbol{\Xi}$, which are learned as part of the network training. The time derivatives of $\mathbf{z}$ are calculated using the derivatives of $\mathbf{x}$ and the gradient of the encoder $\boldsymbol{\varphi}$. The inset shows the pointwise loss function used to train the network. The loss function encourages the network to minimize both the autoencoder reconstruction error and the SINDy loss in $\mathbf{z}$ and $\mathbf{x}$. $L_1$ regularization on $\boldsymbol{\Xi}$ is also included to encourage parsimonious dynamics

where the following dynamical model holds:

$$
\frac{d\mathbf{z}(t)}{dt} = \mathbf{g}(\mathbf{z}(t)). \qquad (4.44)
$$

Specifically, a parsimonious description of the dynamics is sought, where $\mathbf{g}$ contains only a few active terms from a SINDy library. Thus in addition to a dynamical model, the method learns coordinate transformations $\boldsymbol{\varphi}, \boldsymbol{\psi}$ that map the measurements to intrinsic coordinates via $\mathbf{z} = \boldsymbol{\varphi}(\mathbf{x})$ (encoder) and back via $\mathbf{x} \approx \boldsymbol{\psi}(\mathbf{z})$ (decoder). The autoencoder is a flexible, feed-forward neural network that allows one to discover underlying low-dimensional coordinates in which to represent the data. Thus the layers of the autoencoder learn a latent representation, a new variable in which to express the data, in this case the dynamic evolution. The network is trained to output an approximate reconstruction of its input, and the restrictions placed on the network architecture (e.g. the type, number, and size of the hidden layers) characterize the intrinsic coordinates [97]. The autoencoder gives a nonlinear generalization of PCA [100]; thus it goes beyond the standard linear POD subspace description. Autoencoders can learn a low-dimensional representation in isolation, without the need to specify any other constraints. Without further specifications, the intrinsic coordinates learned have no particular meaning or interpretation. However, if additional constraints are imposed in the latent space, then additional structure and meaning can be imposed on the model. For the SINDy autoencoder, the network is required to learn coordinates associated with parsimonious dynamics. Thus it integrates the sparse regression framework of SINDy in the latent space, or intrinsic coordinates, $\mathbf{z}$. This constraint in the autoencoder provides a regularization framework whereby model discovery is achieved by constructing a library $\boldsymbol{\Theta}(\mathbf{z}) = [\boldsymbol{\theta}_1(\mathbf{z}), \boldsymbol{\theta}_2(\mathbf{z}), \ldots, \boldsymbol{\theta}_p(\mathbf{z})]$ of candidate basis functions, e.g. polynomials, and learning a sparse set of coefficients $\boldsymbol{\Xi} = [\boldsymbol{\xi}_1, \ldots, \boldsymbol{\xi}_d]$ that defines the dynamical system

$$
\frac{d\mathbf{z}(t)}{dt} = \mathbf{g}(\mathbf{z}(t)) = \boldsymbol{\Theta}(\mathbf{z}(t))\boldsymbol{\Xi}.
$$

As is typical of SINDy, the library is specified before training occurs, while the library loadings (coefficients) $\boldsymbol{\Xi}$ are learned along with the autoencoder weights during training (optimization). Importantly, the derivatives $\dot{\mathbf{x}}(t)$ of the original states are computed in order to pass them along to the encoder variables as $\dot{\mathbf{z}}(t) = \nabla_{\mathbf{x}}\boldsymbol{\varphi}(\mathbf{x}(t))\,\dot{\mathbf{x}}(t)$. This helps enforce accurate prediction of the dynamics by incorporating the loss function

$$
\mathcal{L}_{d\mathbf{z}/dt} = \left\| \nabla_{\mathbf{x}}\boldsymbol{\varphi}(\mathbf{x})\,\dot{\mathbf{x}} - \boldsymbol{\Theta}(\boldsymbol{\varphi}(\mathbf{x})^T)\boldsymbol{\Xi} \right\|_2^2. \qquad (4.45)
$$

This term uses both the typical SINDy regression and the gradient of the encoder to promote learning of a sparse dynamical model which accurately predicts the time derivatives of the encoder variables. An additional loss term requires that the SINDy predictions accurately reconstruct the time derivatives of the original data:

$$
\mathcal{L}_{d\mathbf{x}/dt} = \left\| \dot{\mathbf{x}} - \left(\nabla_{\mathbf{z}}\boldsymbol{\psi}(\boldsymbol{\varphi}(\mathbf{x}))\right)\left(\boldsymbol{\Theta}(\boldsymbol{\varphi}(\mathbf{x})^T)\boldsymbol{\Xi}\right) \right\|_2^2. \qquad (4.46)
$$

These loss terms (4.45) and (4.46) are added to the standard autoencoder loss

$$
\mathcal{L}_{\mathrm{recon}} = \left\| \mathbf{x} - \boldsymbol{\psi}(\boldsymbol{\varphi}(\mathbf{x})) \right\|_2^2,
$$

which ensures that the autoencoder accurately reconstructs the original input data. To help promote sparsity of the SINDy coefficients $\boldsymbol{\Xi}$, an $\ell_1$ regularization penalty $\mathcal{L}_{\mathrm{reg}}$ is included. This promotes a parsimonious model for the dynamics by selecting only a small number of active terms. The combination of the above four terms gives the overall loss function

$$
\mathcal{L}_{\mathrm{recon}} + \lambda_1 \mathcal{L}_{d\mathbf{x}/dt} + \lambda_2 \mathcal{L}_{d\mathbf{z}/dt} + \lambda_3 \mathcal{L}_{\mathrm{reg}},
$$

where the hyperparameters $\lambda_1, \lambda_2, \lambda_3$ determine the relative weighting of the terms in the loss function. The SINDy autoencoder is attractive since it does not force the ROM to operate in the subspace $\boldsymbol{\Psi}$. Rather, the autoencoder allows the discovery of a coordinate system in which a SINDy model can be expressed. This architecture can also be used to learn a linear Koopman embedding of the data [101, 102]. Moreover, the same method can be lifted to evaluate boundary value problems [103].
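Schematically, one evaluation of this combined loss could be organized as in the following PyTorch sketch; the encoder, decoder, library function and coefficient matrix are assumed given, and the use of Jacobian-vector products for the chain-rule terms is one possible implementation choice rather than necessarily that of [75].

```python
import torch

def sindy_ae_loss(x, x_dot, phi, psi, Theta, Xi, lam1, lam2, lam3):
    """One evaluation of the combined SINDy-autoencoder loss: phi/psi are the
    encoder/decoder modules, Theta builds the candidate library from z, and Xi
    is the (trainable) coefficient matrix."""
    z = phi(x)
    x_hat = psi(z)
    # Chain rule z_dot = grad_x phi(x) x_dot, computed as a Jacobian-vector product.
    z_dot = torch.autograd.functional.jvp(phi, x, x_dot, create_graph=True)[1]
    z_dot_sindy = Theta(z) @ Xi                    # latent SINDy model
    # Decode the latent prediction back to x_dot through the decoder Jacobian.
    x_dot_sindy = torch.autograd.functional.jvp(psi, z, z_dot_sindy,
                                                create_graph=True)[1]
    L_recon = ((x - x_hat) ** 2).sum(dim=1).mean()            # reconstruction loss
    L_dzdt = ((z_dot - z_dot_sindy) ** 2).sum(dim=1).mean()   # eq. (4.45)
    L_dxdt = ((x_dot - x_dot_sindy) ** 2).sum(dim=1).mean()   # eq. (4.46)
    L_reg = Xi.abs().sum()                                    # l1 penalty on Xi
    return L_recon + lam1 * L_dxdt + lam2 * L_dzdt + lam3 * L_reg
```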

4.5 Conclusions

Data-driven methods have emerged as an invaluable tool for aiding in the construction of ROMs. Indeed, there now exist many alternatives to the Galerkin-POD method, whose projection technique often produces unstable models [13]. Demonstrated here are three emerging paradigms for data-driven ROMs: (1) dynamic mode decomposition, (2) sparse identification of nonlinear dynamics, and (3) neural networks. In each case, the goal is to use these methods to construct a model for the dynamics of $\mathbf{a}(t)$. This is a data-driven construction, as opposed to the projection-based construction typical of ROMs [2].

Each of the data-driven constructions has advantages that can be leveraged by practitioners. The DMD method is perhaps the simplest, as it provides a regression to a best-fit linear model. The linear model, which models a Koopman operator [104], is advantageous since solutions can be easily represented as a linear combination of the eigenvalues and eigenfunctions of the constructed linear operator. The data requirements are minimal for the DMD approximation, and there exists open-source code, pyDMD [105], for producing DMD models. SINDy requires more data, but it allows for a nonlinear representation of the dynamic evolution. SINDy is advantageous since it produces parsimonious evolution dynamics for $\mathbf{a}(t)$ that are typically interpretable and amenable to analysis with tools from dynamical systems theory. It also has open-source software available, called pySINDy [106, 107].
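For reference, a DMD model is obtained from pyDMD in a few lines; in the sketch below the random snapshot matrix and the rank are illustrative placeholders.

```python
import numpy as np
from pydmd import DMD

X = np.random.default_rng(0).standard_normal((400, 100))   # placeholder snapshots

dmd = DMD(svd_rank=10)              # rank-10 model (illustrative choice)
dmd.fit(X)
modes, eigs = dmd.modes, dmd.eigs   # spatial modes and discrete-time eigenvalues
X_rec = dmd.reconstructed_data      # low-rank reconstruction of X
```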

If sufficient data is available, a diversity of deep learning algorithms can be used to produce neural networks that model the time evolution of $\mathbf{a}(t)$. Such algorithms have been shown to be successful in a diversity of application areas. Moreover, the deep learning can be structured, for instance, to learn multiscale physics. Overall, data-driven methods are providing significant improvements in capability for traditional ROMs. As these methods are developed further over the next decade, it is anticipated that ROMs will be substantially improved in terms of computational performance and accuracy. This has the potential to revolutionize digital twin technologies, as these methods can use computational or measurement data to construct proxy models that are accurate and efficient to simulate.

Acknowledgments The work of JNK was supported in part by the US National Science Foundation (NSF) AI Institute for Dynamical Systems (dynamicsai.org), grant 2112085.

References

1. J.N. Kutz, Data-Driven Modeling & Scientific Computation: Methods for Complex Systems & Big Data (Oxford University Press, 2013)
2. P. Benner, S. Gugercin, K. Willcox, A survey of projection-based model reduction methods for parametric dynamical systems. SIAM Rev. 57(4), 483–531 (2015)
3. A.C. Antoulas, Approximation of Large-Scale Dynamical Systems (SIAM, 2005)
4. A. Quarteroni, A. Manzoni, F. Negri, Reduced Basis Methods for Partial Differential Equations: An Introduction, vol. 92 (Springer, 2015)
5. J.S. Hesthaven, G. Rozza, B. Stamm et al., Certified Reduced Basis Methods for Parametrized Partial Differential Equations, vol. 590 (Springer, 2016)
6. S.L. Brunton, J.N. Kutz, K. Manohar, A.Y. Aravkin, K. Morgansen, J. Klemisch, N. Goebel, J. Buttrick, J. Poskin, A. Blom-Schieber et al., Data-driven aerospace engineering: Reframing the industry with machine learning. Preprint. arXiv:2008.10740 (2020)
7. R. Courant, D. Hilbert, Methods of Mathematical Physics: Partial Differential Equations (John Wiley & Sons, 2008)
8. J.N. Kutz, Advanced differential equations: asymptotics & perturbations. Preprint. arXiv:2012.14591 (2020)
9. J.P. Keener, Principles of Applied Mathematics: Transformation and Approximation (CRC Press, 2018)
10. J.N. Kutz, Data-Driven Modeling & Scientific Computation: Methods for Complex Systems & Big Data (Oxford University Press, 2013)
11. P. Holmes, J.L. Lumley, G. Berkooz, C.W. Rowley, Turbulence, Coherent Structures, Dynamical Systems and Symmetry (Cambridge University Press, 2012)
12. R. Haberman, Elementary Applied Partial Differential Equations, vol. 987 (Prentice Hall, Englewood Cliffs, 1983)
13. K. Carlberg, M. Barone, H. Antil, Galerkin v. least-squares Petrov–Galerkin projection in nonlinear model reduction. J. Comput. Phys. 330, 693–734 (2017)
14. T. Qin, K. Wu, D. Xiu, Data driven governing equations approximation using deep neural networks. J. Comput. Phys. 395, 620–635 (2019)


15. Y. Liu, J.N. Kutz, S.L. Brunton, Hierarchical deep learning of multiscale differential equation time-steppers. Preprint. arXiv:2008.09768 (2020)
16. E.J. Parish, K.T. Carlberg, Time-series machine-learning error models for approximate solutions to parameterized dynamical systems. Comput. Methods Appl. Mech. Eng. 365, 112990 (2020)
17. F. Regazzoni, L. Dede, A. Quarteroni, Machine learning for fast and reliable solution of time-dependent differential equations. J. Comput. Phys. 397, 108852 (2019)
18. J.N. Kutz, S.L. Brunton, B.W. Brunton, J.L. Proctor, Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems (SIAM, 2016)
19. S.L. Brunton, J.L. Proctor, J.N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. 113(15), 3932–3937 (2016)
20. S.L. Brunton, J.N. Kutz, Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control (Cambridge University Press, 2019)
21. P.J. Schmid, J. Sesterhenn, Dynamic mode decomposition of numerical and experimental data, in 61st Annual Meeting of the APS Division of Fluid Dynamics, November (American Physical Society, 2008)
22. P.J. Schmid, Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech. 656, 5–28 (2010)
23. C.W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, D.S. Henningson, Spectral analysis of nonlinear flows. J. Fluid Mech. 645, 115–127 (2009)
24. H. Lange, S.L. Brunton, N. Kutz, From Fourier to Koopman: Spectral methods for long-term time series prediction. Preprint. arXiv:2004.00574 (2020)
25. K.K. Chen, J.H. Tu, C.W. Rowley, Variants of dynamic mode decomposition: Boundary condition, Koopman, and Fourier analyses. J. Nonlinear Sci. 22(6), 887–915 (2012)
26. J.L. Proctor, S.L. Brunton, J.N. Kutz, Dynamic mode decomposition with control. SIAM J. Appl. Dynam. Syst. 15(1), 142–161 (2016)
27. S.L. Brunton, J.L. Proctor, J.H. Tu, J.N. Kutz, Compressed sensing and dynamic mode decomposition. J. Comput. Dyn. 2(2), 165–191 (2016)
28. N.B. Erichson, S. Voronin, S.L. Brunton, J.N. Kutz, Randomized matrix decompositions using R. J. Stat. Software 89(11), 1–48 (2019)
29. A. Alla, J.N. Kutz, Nonlinear model order reduction via dynamic mode decomposition. SIAM J. Sci. Comput. 39(5), B778–B796 (2017)
30. J.N. Kutz, X. Fu, S.L. Brunton, Multiresolution dynamic mode decomposition. SIAM J. Appl. Dyn. Syst. 15(2), 713–735 (2016)
31. K.P. Champion, S.L. Brunton, J.N. Kutz, Discovery of nonlinear multiscale systems: sampling strategies and embeddings. SIAM J. Appl. Dyn. Syst. 18(1), 312–333 (2019)
32. B.W. Brunton, L.A. Johnson, J.G. Ojemann, J.N. Kutz, Extracting spatial–temporal coherent patterns in large-scale neural recordings using dynamic mode decomposition. J. Neurosci. Methods 258, 1–15 (2016)
33. J.L. Proctor, P.A. Eckhoff, Discovering dynamic patterns from infectious disease data using dynamic mode decomposition. Int. Health 7(2), 139–145 (2015)
34. X. Tan, G. Mamakoukas, M. Castano, T. Murphey, Local Koopman operators for data-driven control of robotic systems, in Proceedings of "Robotics: Science and Systems 2019", Freiburg im Breisgau, June 22–26, 2019 (IEEE, 2019)
35. X. Tan, G. Mamakoukas, M. Castano, T. Murphey, Derivative-based Koopman operators for real-time control of robotic systems. Preprint. arXiv:2010.05778 (2020)
36. J. Grosek, J.N. Kutz, Dynamic mode decomposition for real-time background/foreground separation in video. Preprint. arXiv:1404.7592 (2014)
37. N.B. Erichson, S.L. Brunton, J.N. Kutz, Compressed dynamic mode decomposition for background modeling. J. Real-Time Image Process. 16, 1479–1492 (2019)
38. Y. Susuki, I. Mezić, T. Hikihara, Coherent dynamics and instability of power grids. repository.kulib.kyoto-u.ac.jp (2009)


39. Y. Susuki, I. Mezic, Nonlinear Koopman modes and coherency identification of coupled swing dynamics. IEEE Trans. Power Syst. 26(4), 1894–1904 (2011)
40. J. Mann, J.N. Kutz, Dynamic mode decomposition for financial trading strategies. Quantitative Finance 16(11), 1643–1655 (2016)
41. R. Taylor, J.N. Kutz, K. Morgan, B.A. Nelson, Dynamic mode decomposition for plasma diagnostics and validation. Rev. Sci. Instrum. 89(5), 053501 (2018)
42. A.A. Kaptanoglu, K.D. Morgan, C.J. Hansen, S.L. Brunton, Characterizing magnetized plasmas with dynamic mode decomposition. Phys. Plasmas 27, 032108 (2020)
43. S. Bagheri, Effects of weak noise on oscillating flows: Linking quality factor, Floquet modes, and Koopman spectrum. Phys. Fluids 26(9), 094104 (2014)
44. J.H. Tu, C.W. Rowley, D.M. Luchtenburg, S.L. Brunton, J.N. Kutz, On dynamic mode decomposition: theory and applications. J. Comput. Dyn. 1(2), 391–421 (2014)
45. M.R. Jovanović, P.J. Schmid, J.W. Nichols, Sparsity-promoting dynamic mode decomposition. Phys. Fluids 26(2), 024103 (2014)
46. T. Askham, J.N. Kutz, Variable projection methods for an optimized dynamic mode decomposition. SIAM J. Appl. Dyn. Syst. 17(1), 380–416 (2018)
47. D. Duke, J. Soria, D. Honnery, An error analysis of the dynamic mode decomposition. Exp. Fluids 52(2), 529–542 (2012)
48. S. Bagheri, Koopman-mode decomposition of the cylinder wake. J. Fluid Mech. 726, 596–623 (2013)
49. S.T.M. Dawson, M.S. Hemati, M.O. Williams, C.W. Rowley, Characterizing and correcting for the effect of sensor noise in the dynamic mode decomposition. Exp. Fluids 57(3), 1–19 (2016)
50. M.S. Hemati, C.W. Rowley, E.A. Deem, L.N. Cattafesta, De-biasing the dynamic mode decomposition for applied Koopman spectral analysis. Theor. Comput. Fluid Dyn. 31(4), 349–368 (2017)
51. O. Azencot, W. Yin, A. Bertozzi, Consistent dynamic mode decomposition. SIAM J. Appl. Dyn. Syst. 18(3), 1565–1585 (2019)
52. N. Takeishi, Y. Kawahara, T. Yairi, Subspace dynamic mode decomposition for stochastic Koopman analysis. Phys. Rev. E 96(3), 033310 (2017)
53. S.L. Brunton, B.W. Brunton, J.L. Proctor, E. Kaiser, J.N. Kutz, Chaos as an intermittently forced linear system. Nat. Commun. 8(19), 1–9 (2017)
54. H. Arbabi, I. Mezić, Ergodic theory, dynamic mode decomposition, and computation of spectral properties of the Koopman operator. SIAM J. Appl. Dyn. Syst. 16(4), 2096–2126 (2017)
55. M. Kamb, E. Kaiser, S.L. Brunton, J.N. Kutz, Time-delay observables for Koopman: theory and applications. SIAM J. Appl. Dyn. Syst. 19(2), 886–917 (2020)
56. S.M. Hirsh, S.M. Ichinaga, S.L. Brunton, J.N. Kutz, B.W. Brunton, Structured time-delay models for dynamical systems with connections to Frenet-Serret frame. Preprint. arXiv:2101.08344 (2021)
57. I. Scherl, B. Strom, J.K. Shang, O. Williams, B.L. Polagye, S.L. Brunton, Robust principal component analysis for particle image velocimetry. Phys. Rev. Fluids 5(054401), 10 (2020)
58. D. Sashidhar, J.N. Kutz, Bagging, optimized dynamic mode decomposition (BOP-DMD) for robust, stable forecasting with spatial and temporal uncertainty-quantification. Preprint. arXiv:2107.10878 (2021)
59. A. Alla, J.N. Kutz, Randomized model order reduction. Adv. Comput. Math. 45(3), 1251–1271 (2019)
60. J. Bongard, H. Lipson, Automated reverse engineering of nonlinear dynamical systems. Proc. Natl. Acad. Sci. 104(24), 9943–9948 (2007)
61. M. Schmidt, H. Lipson, Distilling free-form natural laws from experimental data. Science 324(5923), 81–85 (2009)
62. Y. Yang, M.A. Bhouri, P. Perdikaris, Bayesian differential programming for robust systems identification under uncertainty. Preprint. arXiv:2004.06843 (2020)


63. Z. Bai, T. Wimalajeewa, Z. Berger, G. Wang, M. Glauser, P.K. Varshney, Low-dimensional approach for reconstruction of airfoil data via compressive sensing. AIAA J. 53(4), 920–933 (2015)
64. S.L. Brunton, J.H. Tu, I. Bright, J.N. Kutz, Compressive sensing and low-rank libraries for classification of bifurcation regimes in nonlinear dynamical systems. SIAM J. Appl. Dyn. Syst. 13(4), 1716–1732 (2014)
65. A. Mackey, H. Schaeffer, S. Osher, On the compressive spectral method. Multiscale Model. Simul. 12(4), 1800–1827 (2014)
66. V. Ozoliņš, R. Lai, R. Caflisch, S. Osher, Compressed modes for variational problems in mathematics and physics. Proc. Natl. Acad. Sci. 110(46), 18368–18373 (2013)
67. J.L. Proctor, S.L. Brunton, B.W. Brunton, J.N. Kutz, Exploiting sparsity and equation-free architectures in complex systems. Eur. Phys. J. Spec. Top. 223(13), 2665–2684 (2014)
68. G. Tran, R. Ward, Exact recovery of chaotic systems from highly corrupted data. Multiscale Model. Simul. 15(3), 1108–1129 (2017)
69. W.-X. Wang, R. Yang, Y.-C. Lai, V. Kovanis, C. Grebogi, Predicting catastrophes in nonlinear dynamical systems by compressive sensing. Phys. Rev. Lett. 106(15), 154101 (2011)
70. M. Hoffmann, C. Fröhner, F. Noé, Reactive SINDy: Discovering governing reactions from concentration data. J. Chem. Phys. 150(2), 025101 (2019)
71. M. Sorokina, S. Sygletos, S. Turitsyn, Sparse identification for nonlinear optical communication systems: SINO method. Optics Exp. 24(26), 30433–30443 (2016)
72. S. Li, E. Kaiser, S. Laima, H. Li, S.L. Brunton, J.N. Kutz, Discovering time-varying aerodynamics of a prototype bridge by sparse identification of nonlinear dynamical systems. Phys. Rev. E 100(2), 022220 (2019)
73. J. Horrocks, C.T. Bauch, Algorithmic discovery of dynamic models from infectious disease data. Sci. Rep. 10(1), 1–18 (2020)
74. M. Dam, M. Brøns, J. Juul Rasmussen, V. Naulin, J.S. Hesthaven, Sparse identification of a predator-prey system from simulation data of a convection model. Phys. Plasmas 24(2), 022310 (2017)
75. K. Champion, B. Lusch, J.N. Kutz, S.L. Brunton, Data-driven discovery of coordinates and governing equations. Proc. Natl. Acad. Sci. 116(45), 22445–22451 (2019)
76. K. Champion, P. Zheng, A.Y. Aravkin, S.L. Brunton, J.N. Kutz, A unified sparse optimization framework to learn parsimonious physics-informed models from data. IEEE Access 8, 169259–169271 (2020)
77. K. Kaheman, S.L. Brunton, J.N. Kutz, Automatic differentiation to simultaneously identify nonlinear dynamics and extract noise probability distributions from data. Preprint. arXiv:2009.08810 (2020)
78. M. Raissi, G.E. Karniadakis, Hidden physics models: Machine learning of nonlinear partial differential equations. J. Comput. Phys. 357, 125–141 (2018)
79. S.H. Rudy, S.L. Brunton, J.L. Proctor, J.N. Kutz, Data-driven discovery of partial differential equations. Sci. Adv. 3(4), e1602614 (2017)
80. S. Rudy, A. Alla, S.L. Brunton, J.N. Kutz, Data-driven identification of parametric partial differential equations. SIAM J. Appl. Dyn. Syst. 18(2), 643–660 (2019)
81. N.M. Mangan, S.L. Brunton, J.L. Proctor, J.N. Kutz, Inferring biological networks by sparse identification of nonlinear dynamics. IEEE Trans. Molecular Biol. Multi-Scale Commun. 2(1), 52–63 (2016)
82. K. Kaheman, J.N. Kutz, S.L. Brunton, SINDy-PI: A robust algorithm for parallel implicit sparse identification of nonlinear dynamics. Preprint. arXiv:2004.02322 (2020)
83. S.M. Hirsh, D.A. Barajas-Solano, J.N. Kutz, Sparsifying priors for Bayesian uncertainty quantification in model discovery. Preprint. arXiv:2107.02107 (2021)
84. U. Fasel, J.N. Kutz, B.W. Brunton, S.L. Brunton, Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control. Preprint. arXiv:2111.10992 (2021)
85. R. Tibshirani, Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)

LECTURE NOTES IN MATHEMATICS

Editors in Chief: J.-M. Morel, B. Teissier

Editorial Policy

1. Lecture Notes aim to report new developments in all areas of mathematics and their applications – quickly, informally and at a high level. Mathematical texts analysing new developments in modelling and numerical simulation are welcome. Manuscripts should be reasonably self-contained and rounded off. Thus they may, and often will, present not only results of the author but also related work by other people. They may be based on specialised lecture courses. Furthermore, the manuscripts should provide sufficient motivation, examples and applications. This clearly distinguishes Lecture Notes from journal articles or technical reports, which normally are very concise. Articles intended for a journal but too long to be accepted by most journals usually do not have this "lecture notes" character. For similar reasons it is unusual for doctoral theses to be accepted for the Lecture Notes series, though habilitation theses may be appropriate.

2. Besides monographs, multi-author manuscripts resulting from SUMMER SCHOOLS or similar INTENSIVE COURSES are welcome, provided the objective was to present an active mathematical topic to an audience at the beginning or intermediate graduate level (a list of participants should be provided). The resulting manuscript should not be just a collection of course notes, but should require advance planning and coordination among the main lecturers. The subject matter should dictate the structure of the book. This structure should be motivated and explained in a scientific introduction, and the notation, references, index and formulation of results should be, if possible, unified by the editors. Each contribution should have an abstract and an introduction referring to the other contributions. In other words, more preparatory work must go into a multi-authored volume than simply assembling a disparate collection of papers communicated at the event.

3. Manuscripts should be submitted either online at www.editorialmanager.com/lnm to Springer's mathematics editorial in Heidelberg, or electronically to one of the series editors. Authors should be aware that incomplete or insufficiently close-to-final manuscripts almost always result in longer refereeing times and nevertheless unclear referees' recommendations, making further refereeing of a final draft necessary. The strict minimum amount of material that will be considered should include a detailed outline describing the planned contents of each chapter, a bibliography and several sample chapters. Parallel submission of a manuscript to another publisher while under consideration for LNM is not acceptable and can lead to rejection.

4. In general, monographs will be sent out to at least 2 external referees for evaluation. A final decision to publish can be made only on the basis of the complete manuscript; however, a refereeing process leading to a preliminary decision can be based on a pre-final or incomplete manuscript. Volume Editors of multi-author works are expected to arrange for the refereeing, to the usual scientific standards, of the individual contributions. If the resulting reports can be forwarded to the LNM Editorial Board, this is very helpful. If no reports are forwarded or if other questions remain unclear in respect of homogeneity etc., the series editors may wish to consult external referees for an overall evaluation of the volume.

5. Manuscripts should in general be submitted in English. Final manuscripts should contain at least 100 pages of mathematical text and should always include
– a table of contents;
– an informative introduction, with adequate motivation and perhaps some historical remarks: it should be accessible to a reader not intimately familiar with the topic treated;
– a subject index: as a rule this is genuinely helpful for the reader.
For evaluation purposes, manuscripts should be submitted as pdf files.

6. Careful preparation of the manuscripts will help keep production time short besides ensuring satisfactory appearance of the finished book in print and online. After acceptance of the manuscript, authors will be asked to prepare the final LaTeX source files (see LaTeX templates online: https://www.springer.com/gb/authors-editors/bookauthors-editors/manuscriptpreparation/5636) plus the corresponding pdf or zipped ps file. The LaTeX source files are essential for producing the full-text online version of the book (see http://link.springer.com/bookseries/304 for the existing online volumes of LNM). The technical production of a Lecture Notes volume takes approximately 12 weeks. Additional instructions, if necessary, are available on request from lnm@springer.com.

7. Authors receive a total of 30 free copies of their volume and free access to their book on SpringerLink, but no royalties. They are entitled to a discount of 33.3% on the price of Springer books purchased for their personal use, if ordering directly from Springer.

8. Commitment to publish is made by a Publishing Agreement; contributing authors of multi-author books are requested to sign a Consent to Publish form. Springer-Verlag registers the copyright for each volume. Authors are free to reuse material contained in their LNM volumes in later publications: a brief written (or e-mail) request for formal permission is sufficient.

Addresses:
Professor Jean-Michel Morel, CMLA, École Normale Supérieure de Cachan, France
E-mail: [email protected]
Professor Bernard Teissier, Equipe Géométrie et Dynamique, Institut de Mathématiques de Jussieu – Paris Rive Gauche, Paris, France
E-mail: [email protected]
Springer: Ute McCrory, Mathematics, Heidelberg, Germany
E-mail: [email protected]