English Pages 202 Year 2022
Ewaryst Rafajłowicz Optimal Input Signals for Parameter Estimation
Also of Interest Probability and Statistics. A Course for Physicists and Engineers Arak M. Mathai, Hans J. Haubold, 2017 ISBN 978-3-11-056253-8, e-ISBN 978-3-11-056254-5 (open access)
USCO and Quasicontinuous Mappings L’ubica Holá, Dušan Holý, Warren Moors, 2021 ISBN 978-3-11-075015-7, e-ISBN 978-3-11-075018-8
Nature’s Patterns and the Fractional Calculus Bruce J. West, 2017 ISBN 978-3-11-053411-5, e-ISBN 978-3-11-053513-6
Discrete-Time Approximations and Limit Theorems. In Applications to Financial Markets Yuliya Mishura, Kostiantyn Ralchenko, 2021 ISBN 978-3-11-065279-6, e-ISBN 978-3-11-065424-0
Ewaryst Rafajłowicz
Optimal Input Signals for Parameter Estimation
In Linear Systems with Spatio-Temporal Dynamics
Mathematics Subject Classification 2020 Primary: 62K05, 93C20, 35R30; Secondary: 35B30, 35E99 Author Prof. Dr. Ewaryst Rafajłowicz Wrocław University of Science and Technology Department of Control Systems and Mechatronics Wybrzeze Wyspianskiego 27 50-370 Wroclaw Poland ewaryst.rafajlowicz@pwr.edu.pl ORCID: 0000-0001-8469-2910
ISBN 978-3-11-035089-0 e-ISBN (PDF) 978-3-11-035104-0 e-ISBN (EPUB) 978-3-11-038334-8 Library of Congress Control Number: 2021951119 Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2022 Walter de Gruyter GmbH, Berlin/Boston Cover image: Ewaryst Rafajłowicz using the Wolfram’s Mathematica® Typesetting: VTeX UAB, Lithuania Printing and binding: CPI books GmbH, Leck www.degruyter.com
Preface
The aim of this book is to provide some results on selecting input signals acting on systems with spatio-temporal dynamics. These signals should be selected so as to ensure the most accurate estimates of the systems' parameters under given constraints. We confine our attention to systems described by linear partial differential equations (PDEs) and their Green's functions, since the theory for nonlinear equations is still relatively undeveloped. New optimality conditions are presented for systems of the elliptic, parabolic, and hyperbolic types. They are based on earlier results on optimal experiment design for regression functions, as well as on the frequency domain approach to selecting optimal input signals for systems described by linear ordinary differential equations (ODEs). Additionally, new necessary conditions for optimality are derived in the time domain for such systems. In all cases, computational algorithms are derived from the optimality conditions. Examples of their performance are provided, together with sometimes fascinating 3D plots. The results are derived under a number of simplifying assumptions that frequently allow us to obtain them in a compact form. They can be interpreted as lower bounds on the parameter estimation errors attainable in more complicated cases. In particular, we advocate new paradigms, namely that at the present state of engineering it is frequently possible to observe and actuate spatio-temporal systems at all points of the spatial domain they occupy. The reasons for attempting to estimate a system's parameters as accurately as possible are twofold. In many cases, these parameters can be interpreted as material constants, and their more exact values are of interest for many branches of science and engineering. The second reason is that PDEs are more and more frequently used as models for the identification and subsequent control of industrial processes.
The latter aspects are not covered in this book, but relevant references are given. As a measure of parameter estimation accuracy we select the determinant of the inverse of the Fisher information matrix (FIM), due to its important invariance properties. Extensions of the results to other optimality criteria are possible. In the final chapter we mention several open problems that may be of interest both for theory and for applications. The intended audience of this book includes postdoctoral researchers and PhD and graduate students of applied mathematics, mathematical statistics, and the many branches of engineering that apply spatio-temporal models.
Wrocław, January 2022
https://doi.org/10.1515/9783110351040-201
Ewaryst Rafajłowicz
Acknowledgements
I would like to express my sincere thanks to Professor Dariusz Uciński of the University of Zielona Góra, Poland, for his very careful and critical reading of the draft of the book. He also made many suggestions that clarified the statements of the results, improved them, or made them more precise. Many sincere thanks are also due to Professor Rainer Schwabe of the Otto von Guericke University, Magdeburg, Germany, for our long-lasting cooperation and for discussions on experiment design that were very illuminating for me. My views on experiment design problems have also been influenced by the MODA¹ community, and its meetings are gratefully acknowledged. I appreciate Dr. Wojciech Myszka's co-authorship of our early papers on experiment design and his help in correcting the manuscript of this book. I would like to express my sincere thanks to my family for their support and patience during the writing of this book. The help of Dr. Wojciech Rafajłowicz in proofreading and improving the drawings is also acknowledged. This monograph was granted honorary patronage by the Committee of Automatic Control and Robotics of the Polish Academy of Sciences.
¹ MODA (Model Oriented Data Analysis) is the acronym of a well-known conference on experiment design.
https://doi.org/10.1515/9783110351040-202
Contents
Preface
Acknowledgements
List of Tables
List of Figures
Frequently used acronyms
1 Introduction
1.1 Overview and motivations
1.2 Remarks on experiments and their role in gaining valuable knowledge
1.3 Experiments for parameter estimation and model building methodology
1.4 Our limitations
1.5 We need new paradigms
1.6 Bibliographical notes
1.7 Notational conventions
2 Optimal experiment design for linear regression models – a primer
2.1 The basic linear model
2.2 The LSM – some basic facts on its accuracy
2.3 Toward design optimality criteria
2.4 Approximate experiment designs
2.5 Optimality conditions – the Kiefer and Wolfowitz theorem
2.6 Case studies – sensor allocation
2.7 Why the D-optimality criterion?
2.8 Remarks on optimal experiment designs for nonlinear models
3 Numerical search for D-optimal designs – basic methods for a linear regression
3.1 Introductory considerations
3.2 Optimal allocation of observations
3.2.1 Why is the multiplicative weights update algorithm so fast?
3.3 The algorithm of Wynn and Fedorov
3.4 The combined algorithm
4 The product of experiment designs and its optimality
4.1 The product of designs
4.2 Multiplicative models
4.3 On the D-optimality of product designs for estimating multiplicative models
4.4 Examples
5 Optimal input signals for linear systems described by ODEs
5.1 Remarks on parameter estimation of systems described by ODEs
5.2 LTI systems – the frequency domain approach
5.3 Optimal input signals for ODE systems – the time domain approach
5.4 Safer input signals – time domain synthesis
6 Optimal excitations for systems described by elliptic equations
6.1 Introduction and notational conventions
6.2 Systems described by elliptic equations
6.3 Spectra of compact linear operators
6.4 Integral operators
6.5 Green's functions of elliptic operators
6.6 The dependence of Green's function on the parameters of elliptic operators
6.7 A special class of Green's functions
6.8 Eigenfunctions of operators of the elliptic type – more examples
6.9 Problem statement and D-optimality conditions for systems described by elliptic PDEs
6.10 A more exact characterization of the solution
7 Optimal input signals for DPS – time domain synthesis
7.1 Introduction and notational conventions
7.2 Assumptions on parabolic and hyperbolic PDEs
7.3 Assumptions on observations and the form of the Fisher information matrix
7.4 Spatio-temporal structure of the solution
7.5 Safer input signals for DPS
8 Input signal design for systems with spatio-temporal dynamics – frequency domain approach
8.1 Introduction and notational conventions
8.2 Assumptions
8.3 Problem statement
8.4 The equivalence theorem for input signals – the space-frequency domain approach
8.5 The structure of the D-optimal input signal
8.6 Insight into the structure of $S_{u^*}$ and examples
9 Final comments
9.1 Suppositions about extending the possible applicability of the results
9.2 What is left outside this book and open problems
9.3 Concluding remarks
Bibliography
Index
List of Tables
Table 2.1 Design of the experiment discussed in Example 8.
Table 2.2 Support points of D-optimal designs – polynomial regressions.
Table 2.3 Sensors allocation for a rod with insulated end points.
Table 2.4 Optimal sensors positions for a rod.
Table 2.5 The D-optimal sensors allocation found by a numerical search.
Table 3.1 Starting design in Example 15.
Table 3.2 A starting design in Example 16.
Table 3.3 Starting design in Example 17.
Table 3.4 Starting and final designs in Example 18.
Table 3.5 Initial and final designs in Example 19.
Table 3.6 Starting and final designs in Example 20.
Table 3.7 Initial and final designs in Example 21.
List of Figures
Figure 1.1 Aristotle's claim confronted with a simple experiment.
Figure 2.1 Comparison of model output variances of two experiments.
Figure 2.2 Attainable FIM plotted as vectors in Example 5 for p = 0.5.
Figure 2.3 Attainable FIM plotted as vectors in Example 5 for p = 0.65 (left panel) and p = 0.75 (right panel).
Figure 2.4 Prediction variances of two designs: D-optimal and nonoptimal one.
Figure 2.5 Prediction variance when the quadratic regression is estimated.
Figure 2.6 Prediction variance when a 9-th degree polynomial is estimated.
Figure 2.7 D-optimal designs for polynomials.
Figure 2.8 Prediction variance of 2D trigonometric regression.
Figure 2.9 Sensors' allocation for a rod with insulated end points.
Figure 3.1 Updated weights vs. iteration number in Example 15.
Figure 3.2 Updating weights – bivariate regression.
Figure 3.3 Prediction variance vs iteration number.
Figure 3.4 log(det(M(ξ_k))) vs. iteration number k obtained in Example 18.
Figure 3.5 Prediction variance – 4-th degree polynomial regression.
Figure 3.6 Prediction variance – trigonometric regression.
Figure 4.1 Optimal sensors placement – a thin plate.
Figure 5.1 The sensitivity to departures from the optimality.
Figure 5.2 Prediction variance in two parameters – 1st order model.
Figure 5.3 The loss of the design efficiency vs departures from nominal parameters.
Figure 5.4 The sum of sine waves can have large amplitudes.
Figure 5.5 Approximately optimal signal – vibrating system.
Figure 5.6 Approximately optimal signal – 1st order system.
Figure 6.1 Approximation error of the truncated Fourier expansion.
Figure 6.2 Selected eigenfunctions of the Laplace operator in a cylinder.
Figure 6.3 D-optimal input signal – two unknown parameters.
Figure 7.1 The impulse response sensitivity for a hyperbolic system.
Figure 7.2 Optimal amplitude for a hyperbolic system – one mode.
Figure 7.3 The space-time structure of the optimal input signal.
Figure 7.4 Time domain excitations of two modes – parabolic system.
Figure 7.5 The space-time structure – two modes – parabolic system.
Figure 7.6 D-optimal excitations for three modes – hyperbolic system.
Figure 7.7 The D-optimal space-time shapes – hyperbolic system.
Figure 8.1 Spatial distribution of heating.
Figure 8.2 The spatio-temporal structure of the optimal input signal.
Figure 8.3 The spatio-temporal structure – 4th order system.
Figure 9.1 Nonconvex domain.
Figure 9.2 Eigenvalues vs. the Laplacian parameter.
Figure 9.3 Eigenfunctions vs. the Laplacian parameter.
Figure 9.4 Difference of eigenfunctions of the Laplacian vs. parameter.
Frequently used acronyms
BC – boundary conditions
BLUE – best linear and unbiased estimator
DOE – design of experiment
DPS – distributed parameter system(s)
FIM – Fisher information matrix
G-M theorem – Gauss–Markov theorem
IC – initial conditions
K-F theorem – Kiefer–Wolfowitz equivalence theorem
LSE – least squares estimator
LSM – least squares method
LTI – linear, time invariant (systems)
MVUA – multiplicative weights update algorithm
MVUE – minimum variance, unbiased estimator
ODE – ordinary differential equation
PDE – partial differential equation
W-F algorithm – Wynn–Fedorov algorithm
https://doi.org/10.1515/9783110351040-203
1 Introduction
In this chapter, we justify writing this book and explain its scope. The starting point (Section 1.1) is a brief overview. As an interlude, in Section 1.2, we describe a very simple experiment. The role of more advanced experiments in parameter estimation and model building methodology is discussed in Section 1.3. A simple example of estimating a single unknown parameter in the heat conduction equation (Section 1.4) shows why we do not consider the problem of simultaneously selecting optimal input signals and sensor locations. In Section 1.5, we mention a large number of tools, techniques, and devices that motivate experiments in which observations are made, and inputs (actuators) influence systems with spatio-temporal dynamics, at every spatial point of a certain domain. The existing bibliography on experiment design is so extensive that it is impossible even to sketch it. Nevertheless, in Section 1.6 we provide bibliographic notes on topics directly related to our considerations. Further references dedicated to particular problems are provided in the relevant chapters. Finally, in the last section, the most frequently used notational conventions are summarized.
https://doi.org/10.1515/9783110351040-001
1.1 Overview and motivations
The main aim of this book is to discuss results on selecting optimal input signals for estimating parameters in systems with spatio-temporal dynamics. For historical reasons, such systems are also called distributed-parameter systems (DPSs), since their states, inputs, and possibly coefficients depend on spatial variables. They are usually described by partial differential equations (PDEs) equipped with boundary conditions (BCs). A sub-class of such systems, namely linear, time-invariant (LTI) systems, is the main focus of this book, since they can be described by their Green's functions, even when these functions are not explicitly known. The results for this class of systems (optimality conditions and algorithms) collected in Chapters 6–8 are mostly new. In Chapter 6, input signals are designed for estimating parameters in systems considered in a steady state, i. e., those in which transient responses have already vanished. In other words, only the spatial dynamics of such systems are taken into account, and we confine ourselves to systems described by PDEs of the elliptic type with constant coefficients. This is an intermediate step toward the results presented in Chapters 7 and 8. However, PDEs are rather complicated mathematical objects, and it is desirable to consider first (see Chapter 5) the problems of selecting input signals for estimating parameters of systems described by ordinary differential equations (ODEs). The reason is that the solutions of these problems appear as building blocks for characterizing the spatio-temporal structure of inputs acting on DPSs (see Chapters 7 and 8). Again, in Chapter 5 we confine our attention to LTI systems described by linear ODEs with constant parameters, by their transfer functions, or by the corresponding impulse responses.
The results presented in Chapter 5 in turn rely heavily on the optimality conditions and algorithms developed in the theory of optimal experiment design for estimating parameters of regression functions. For this reason, Chapter 2 contains an introduction to the classic theory of D-optimal experiment design. As already mentioned, this optimality criterion is selected in this book as the primary one, since D-optimal experiment designs can be easily transformed when the domain of experimentation is rescaled. Thus, it suffices to plan an experiment in, e. g., the unit cube and to transform it into the actual domain of experimentation. In particular, the seminal Kiefer and Wolfowitz theorem is provided as a prototype for deriving the optimality conditions for systems described by ODEs and PDEs. In Chapter 3, this theorem forms the basis for presenting the fundamental algorithm of Fedorov and Wynn for searching for D-optimal designs. In parallel, an algorithm for optimizing the weights of an experiment design is also described. Finally, both algorithms are combined, providing an efficient tool for searching for the frequencies of D-optimal input signals for dynamic and spatio-temporal dynamic systems.
Chapter 4 plays a special role in this book, serving as a bridge between the classic D-optimal experiment design theory and its generalizations to DPSs. It can be viewed as a collection of results on classic D-optimal designs on domains having the Cartesian product structure.
Moreover, the mathematical models defined over these domains are multiplicative with respect to the independent variables. These two conditions imply that D-optimal designs have a special structure: they are products of simpler designs, which are easier to compute. This conclusion is of general applicability. On the other hand, the examples provided in this chapter illustrate the fact that eigenfunctions of elliptic linear operators with constant coefficients frequently have a multiplicative structure, which is important for the rest of the book. In passing, we show that in these cases simple problems of D-optimal sensor allocation are analytically solvable.
The book concludes with Chapter 9, which contains informal discussions on the possible applicability of the results when the assumptions are not strictly fulfilled. Additionally, many related problems that are outside the scope of this book are listed. Some of them can be considered as open problems.
In more detail, our particular aims in Chapters 2–5 are the following:
– to provide an overview of classic results on optimal experiment design for estimating parameters of a regression function,
– to indicate how to use them for sensor allocation problems for estimating parameters in systems described by PDEs,
– to recall basic examples concerning eigenvalues of differential operators,
– to discuss problems concerning the selection of input signals for parameter estimation in systems described by ODEs,
– to provide basic facts on selecting optimal input signals for ODEs,
– to provide new results on the time domain synthesis of D-optimal input signals for LTI dynamic systems described by ODEs (this approach combines variational methods with results inspired by classic D-optimal designs for estimating a regression function).
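The overview of Chapter 3 mentions an algorithm that optimizes the weights of an experiment design. It can be illustrated with the classic multiplicative update, in which each candidate point's weight is multiplied by d(x_i)/m. Here d(x) = f(x)^T M^{-1}(ξ) f(x) and m is the number of parameters. The sketch below is a minimal illustration under our own assumptions: a quadratic regression f(x) = [1, x, x²], 21 equally spaced candidate points in [−1, 1], and a fixed iteration count. It is not necessarily the exact variant developed in the book. At a D-optimal design, the Kiefer–Wolfowitz theorem gives max d(x) = m.

```python
import numpy as np

def quad_regressors(x):
    """Regressors f(x) = [1, x, x^2] of a quadratic regression (illustrative model)."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x**2], axis=-1)

def multiplicative_weights(points, n_iter=5000):
    """Optimize design weights on fixed candidate points by w_i <- w_i * d(x_i) / m."""
    F = quad_regressors(points)              # (n, m) matrix of regressor values
    n, m = F.shape
    w = np.full(n, 1.0 / n)                  # uniform starting design
    for _ in range(n_iter):
        M = F.T @ (w[:, None] * F)           # information matrix M = sum_i w_i f(x_i) f(x_i)^T
        d = np.einsum("ij,jk,ik->i", F, np.linalg.inv(M), F)  # d(x_i) = f^T M^{-1} f
        w = w * d / m                        # sum_i w_i d(x_i) = m, so sum(w) stays 1
    return w, d

points = np.linspace(-1.0, 1.0, 21)
w, d = multiplicative_weights(points)
for x, wi in zip(points, w):
    if wi > 1e-3:                            # report only points that kept noticeable weight
        print(f"x = {x:+.1f}  weight = {wi:.4f}")
print("max d(x) =", d.max())                 # close to m = 3 near the D-optimum
```

For this model the weights should concentrate near 1/3 at x = −1, 0, 1, which is the known D-optimal design for quadratic regression on [−1, 1]. The update preserves the total weight because Σ w_i d(x_i) = trace(M^{-1}M) = m.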
In Chapters 6–8 our aims include:
– to indicate that new technologies of sensing and actuating in space and time can lead to new (sometimes easier) problems of input signal selection,
– to focus on input signals for parameter estimation in LTI systems with spatio-temporal dynamics,
– to illustrate the interplay between the spatial and time domain structure of optimal input signals,
– to summarize the results obtained on optimal input signals in the space-frequency domain.
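The product-design property described for Chapter 4 can be checked numerically in a small case. The sketch below assumes a hypothetical multiplicative model on [−1, 1]² with regressors f1(x) = [1, x] in x and f2(y) = [1, y, y²] in y. It takes the product of the two marginal D-optimal designs and verifies the Kiefer–Wolfowitz condition max d(x, y) = m = 6 on a grid.

```python
import numpy as np

def f1(x):
    """Regressors in x: a straight line (illustrative choice)."""
    return np.array([1.0, x])

def f2(y):
    """Regressors in y: a quadratic (illustrative choice)."""
    return np.array([1.0, y, y * y])

def f(x, y):
    """Multiplicative model: all products f1_i(x) * f2_j(y), i.e. a Kronecker product."""
    return np.kron(f1(x), f2(y))

# Marginal D-optimal designs on [-1, 1]: support points and weights
xs, wx = [-1.0, 1.0], [0.5, 0.5]
ys, wy = [-1.0, 0.0, 1.0], [1/3, 1/3, 1/3]

# Product design: every pair (x, y) with weight wx * wy
M = sum(a * b * np.outer(f(x, y), f(x, y))
        for x, a in zip(xs, wx) for y, b in zip(ys, wy))
M_inv = np.linalg.inv(M)
m = M.shape[0]                               # number of parameters (2 * 3 = 6)

# Kiefer-Wolfowitz check (on a grid): D-optimal iff max d(x, y) equals m
grid = np.linspace(-1.0, 1.0, 41)
d_max = max(f(x, y) @ M_inv @ f(x, y) for x in grid for y in grid)
print(m, d_max)
```

Here d(x, y) factorizes into d1(x)·d2(y), whose maxima are 2 and 3. That is why the product design attains exactly m.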
1.2 Remarks on experiments and their role in gaining valuable knowledge
Ancient civilizations made many important observations, constructed many impressive machines, and achieved technological breakthroughs, but, as far as we know, they mostly arrived at their designs experimentally rather than by rationally designing experiments. The first person who described how to run experiments was Galileo Galilei. His prescription was, roughly speaking and in modern vernacular, the following: "change one input variable at a time and observe its influence on an output variable." We now know that this recipe is not optimal and that in many cases it can lead to overlooking interactions between factors (inputs). However, at a very early stage of experimentation, it can still be useful for gaining intuition about the strength of the influence of particular input variables on the output. The first statistically planned experiments were published about 100 years ago (see the retrospective review [12]). The first paper cited in [12] concentrated on industrial experiments, and the second on agricultural science, in which a badly planned experiment can be repeated only in the following year. These works included, for example, experiment designs answering questions such as whether the yield of wheat A is higher than the yield of wheat B.
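The weakness of the one-factor-at-a-time prescription can be shown with a toy calculation. The sketch below assumes a hypothetical noise-free response y = 2 + x1 + x2 + 3·x1·x2 with coded inputs. Varying one factor at a time around a baseline leaves the interaction column of the model matrix identically zero, so the interaction cannot be estimated from those runs however many are made. A 2² factorial design recovers all four coefficients.

```python
import numpy as np

def response(x1, x2):
    """Hypothetical noise-free process with an interaction between the two inputs."""
    return 2.0 + 1.0 * x1 + 1.0 * x2 + 3.0 * x1 * x2

def model_matrix(points):
    """Columns: intercept, x1, x2 and the interaction x1 * x2."""
    return np.array([[1.0, a, b, a * b] for a, b in points])

# One factor at a time around the baseline (0, 0): x1 is varied alone, then x2 alone
ofat = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
# 2^2 factorial design: all combinations of the coded levels -1 and +1
factorial = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]

rank_ofat = np.linalg.matrix_rank(model_matrix(ofat))       # x1*x2 column is all zeros
rank_fact = np.linalg.matrix_rank(model_matrix(factorial))

y = np.array([response(a, b) for a, b in factorial])
coef = np.linalg.solve(model_matrix(factorial), y)          # intercept, x1, x2, x1*x2
print(rank_ofat, rank_fact)  # 3 4
print(coef)
```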
Figure 1.1: Experiment: count how many legs a fly has and compare your answer with Aristotle's claim that a fly has four legs. Note that the right panel image was not retouched in any way and the fly was not injured. It is simply the case that the hind pair of legs is not visible in this view.
An age-old question of philosophers was: which is more important for gaining valuable knowledge, theory or experiment? At present, the general opinion is that both theory and experiments are essential, together with a priori knowledge. A most intriguing example from the history of science illustrates the role of experiments. Aristotle, one of the founders of Western philosophy, claimed that a fly has four legs (see, e. g., [187]). The most striking fact is that Aristotle's authority was so great that his claim was verified "experimentally" only about 1500 years later. The reader may wish to conduct such an experiment independently (see Fig. 1.1¹). Inference from examples is not always safe, but in this case it is. From this perspective, our aim is to discuss selected aspects of the theory of experiment design, since a complete theory still does not exist and its construction will probably never be finished.
¹ The author expresses his thanks to Dr. Wojciech Rafajłowicz for the permission to reproduce images taken by him.
1.3 Experiments for parameter estimation and model building methodology
In this book we concentrate on experiment designs for estimating parameters in selected classes of models. To this end, we need an appropriately selected model. The idealized steps of model building methodology are the following:
– Select inputs and outputs of the system (which is no easy task: screening experiments [107], persistently exciting sequences [70, 153], and fractal signals [135] can be useful).
– Decide which kind of system behavior one wants to describe:
1. only steady-state (or static) behavior, when inputs are kept constant and outputs are measured after transient processes have died out (e. g., deflection of a beam at a selected point under constant load),
2. steady state in space, as above, but inputs and observations can be made at many spatial points,
3. dynamics in time, when inputs and outputs vary in time and a system has a "memory," i. e., the present output also depends on earlier input signal values (e. g., the same beam observed at a selected point, but the load varies in time and the beam is not "thin," while the output signal at the selected point is observed at each time instant),
4. dynamics in time and space – as above, but a system state is also observed at each spatial point and inputs can be applied at many (or all) spatial points (e. g., the same beam, but deflections are observed by a fast camera).
The next ingredient in model building methodology is to choose a mathematical model that is capable of reflecting the desired system behavior and then to estimate its parameters. To this end, one can distinguish the following steps:
– Select a class of models with unknown parameters. This is also not an easy task. One can apply the laws of physics and/or chemistry or a sufficiently rich class of functions or operators capable of covering the observed behavior of a real system. The following typical cases can be considered:
1. steady-state models – usually described by functions (e. g., polynomials, orthogonal series),
2. dynamic systems in time – usually modeled by ODEs or finite difference equations,
3. systems in the steady state in time and with spatial dynamics – most frequently described by PDEs of the elliptic type,
4. dynamics in space and time – Green's functions, PDEs, or operator equations in Hilbert spaces are in common use.
– Design an experiment. Its selection depends on the class of models selected above – this is the main topic of this book.
– Perform the experiment and make observations.
– Estimate the model parameters, validate the model, and return to model selection or experiment design, if necessary.
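For the design step, the rescaling property of D-optimality mentioned in Section 1.1 can be illustrated on a quadratic regression f(x) = [1, x, x²]. The sketch below is our own example, and the actual range [0, 10] is hypothetical. It plans the designs on [−1, 1] and maps them affinely onto the actual range. It then checks two things: the ratio of det M for two competing designs is unchanged, and the transformed design still satisfies the Kiefer–Wolfowitz condition max d(x) = m = 3.

```python
import numpy as np

def info_matrix(points, weights):
    """M(xi) = sum_i w_i f(x_i) f(x_i)^T with quadratic regressors f(x) = [1, x, x^2]."""
    F = np.vander(np.asarray(points, dtype=float), 3, increasing=True)
    return F.T @ (np.asarray(weights, dtype=float)[:, None] * F)

def to_domain(points, lo, hi):
    """Affine map of design points from the standard interval [-1, 1] onto [lo, hi]."""
    return lo + (np.asarray(points, dtype=float) + 1.0) * (hi - lo) / 2.0

opt_pts, opt_w = [-1.0, 0.0, 1.0], [1/3, 1/3, 1/3]                 # D-optimal on [-1, 1]
alt_pts, alt_w = [-1.0, -0.5, 0.5, 1.0], [0.25, 0.25, 0.25, 0.25]  # a competing design
lo, hi = 0.0, 10.0                                                   # hypothetical actual range

det = np.linalg.det
ratio_std = det(info_matrix(opt_pts, opt_w)) / det(info_matrix(alt_pts, alt_w))
ratio_act = (det(info_matrix(to_domain(opt_pts, lo, hi), opt_w))
             / det(info_matrix(to_domain(alt_pts, lo, hi), alt_w)))

# Kiefer-Wolfowitz check of the transformed design on [lo, hi]: max d(x) should equal m = 3
M_act = info_matrix(to_domain(opt_pts, lo, hi), opt_w)
grid = np.linspace(lo, hi, 101)
F_grid = np.vander(grid, 3, increasing=True)
d_max = np.max(np.einsum("ij,ji->i", F_grid, np.linalg.solve(M_act, F_grid.T)))
print(ratio_std, ratio_act, d_max)
```

The invariance holds because an affine change of x maps f(x) to A f(x) for a fixed nonsingular matrix A. Every det M is therefore multiplied by the same factor (det A)², and d(x) is unchanged.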
1.4 Our limitations
The following simple example explains why we are forced to confine our attention to the sub-class of experiments that focus mainly on selecting input signals.
Example 1. Consider the heat conduction equation for the temperature q(x, t) at spatial point x and time t > 0,
$$\frac{\partial q(x,t)}{\partial t} = a\,\frac{\partial^2 q(x,t)}{\partial x^2} + u(x,t), \qquad x \in \mathbb{R}. \tag{1.1}$$
Here (exceptionally in this book) x ∈ (−∞, ∞), u(x, t) is an input signal, and a > 0 is an unknown parameter. The solution of equation (1.1) is required to fulfill the initial condition q(x, 0) = 0, x ∈ ℝ, and the boundary conditions $\lim_{x\to\pm\infty} q(x,t) = 0$, t > 0. When u(x, t) = δ(x − κ) δ(t), where δ(⋅) denotes the Dirac delta function, the solution q(x, t) of (1.1) is called Green's function, which is further denoted as G(x, κ, t; a) (see Chapter 6 for a more detailed discussion of Green's functions and the Dirac delta function). Green's function of (1.1) is well known and is given by
$$G(x,\kappa,t;a) = \frac{1}{2\sqrt{\pi a t}}\,\exp\left(-\frac{(x-\kappa)^2}{4at}\right). \tag{1.2}$$
It is interpreted as the response of system (1.1) at point x and time t > 0 when the unit impulse is applied at point κ at time 0. Thus, the response of (1.1) to an input u(x, t) is given by
$$q(x,t;a) = \int_0^t \int_{-\infty}^{+\infty} G(x,\kappa,t-\tau;a)\,u(\kappa,\tau)\,d\kappa\,d\tau. \tag{1.3}$$
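Formula (1.2) can be sanity-checked numerically. The sketch below uses hypothetical values a = 0.5 and κ = 0 and checks two properties. First, the integral of G over x equals 1 for every t > 0, so the unit impulse of heat is conserved. Second, G satisfies ∂G/∂t = a ∂²G/∂x² at a point away from the impulse, with both derivatives replaced by central finite differences.

```python
import numpy as np

def green(x, kappa, t, a):
    """Green's function (1.2) of the heat equation (1.1) on the real line."""
    return np.exp(-(x - kappa) ** 2 / (4.0 * a * t)) / (2.0 * np.sqrt(np.pi * a * t))

a, kappa = 0.5, 0.0                   # hypothetical parameter value and impulse location
x = np.linspace(-20.0, 20.0, 40001)
dx = x[1] - x[0]

# (i) conservation of the unit impulse: the integral of G over x equals 1 for t > 0
masses = [np.sum(green(x, kappa, t, a)) * dx for t in (0.1, 1.0, 5.0)]

# (ii) G satisfies dG/dt = a d^2G/dx^2 away from the impulse (central differences)
x0, t0, h, k = 0.7, 1.0, 1e-3, 1e-5
dG_dt = (green(x0, kappa, t0 + k, a) - green(x0, kappa, t0 - k, a)) / (2.0 * k)
d2G_dx2 = (green(x0 + h, kappa, t0, a) - 2.0 * green(x0, kappa, t0, a)
           + green(x0 - h, kappa, t0, a)) / h**2
residual = dG_dt - a * d2G_dx2
print(masses, residual)
```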
This system has only one unknown parameter a, which is indicated by the semicolon separating a from the other arguments of q. Let us select $u(\kappa,t) = \delta(t)\,\exp[-(\kappa-\kappa_0)^2/s^2]/\sqrt{s}$, i. e., a Dirac delta impulse in time applied at point κ0 and its vicinity, with a Gaussian spatial weighting function whose width and height are controlled by s > 0. The response (1.3) to this signal is given by
$$q(x,t;a) = \frac{s\,\exp\left(-\frac{(\kappa_0-x)^2}{4at+s^2}\right)}{\sqrt{4ast+s^3}}, \tag{1.4}$$
while its sensitivity to changes of a has the form
$$\frac{\partial q(x,t;a)}{\partial a} = \frac{4st(\kappa_0-x)^2\,\exp\left(-\frac{(\kappa_0-x)^2}{4at+s^2}\right)}{(4at+s^2)^2\sqrt{4ast+s^3}} - \frac{2s^2t\,\exp\left(-\frac{(\kappa_0-x)^2}{4at+s^2}\right)}{(4ast+s^3)^{3/2}}. \tag{1.5}$$
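The closed forms (1.4) and (1.5) can be cross-checked numerically. Because the input is an impulse in time, (1.3) reduces to a single integral over κ of G against the spatial profile exp(−(κ − κ0)²/s²)/√s, which is the normalization that reproduces (1.4). The sketch below uses hypothetical parameter values. It compares that integral with (1.4), and a central difference in a with (1.5).

```python
import numpy as np

def green(x, kappa, t, a):
    """Green's function (1.2)."""
    return np.exp(-(x - kappa) ** 2 / (4.0 * a * t)) / (2.0 * np.sqrt(np.pi * a * t))

def q_closed(x, t, a, s, k0):
    """Closed-form response (1.4)."""
    return s * np.exp(-(k0 - x) ** 2 / (4*a*t + s**2)) / np.sqrt(4*a*s*t + s**3)

def dq_da_closed(x, t, a, s, k0):
    """Closed-form sensitivity (1.5)."""
    D = 4*a*t + s**2
    e = np.exp(-(k0 - x) ** 2 / D)
    return (4*s*t*(k0 - x)**2 * e / (D**2 * np.sqrt(4*a*s*t + s**3))
            - 2*s**2*t*e / (4*a*s*t + s**3) ** 1.5)

def q_numeric(x, t, a, s, k0):
    """(1.3) for u = delta(t) exp(-(kappa-k0)^2/s^2)/sqrt(s): only the kappa-integral remains."""
    kappa = np.linspace(k0 - 30.0, k0 + 30.0, 60001)
    u0 = np.exp(-(kappa - k0) ** 2 / s**2) / np.sqrt(s)
    return np.sum(green(x, kappa, t, a) * u0) * (kappa[1] - kappa[0])

a, s, k0, x, t = 0.3, 0.8, 1.0, 1.6, 0.5      # hypothetical values
err_q = abs(q_numeric(x, t, a, s, k0) - q_closed(x, t, a, s, k0))

h = 1e-6                                       # step of the central difference in a
fd = (q_closed(x, t, a + h, s, k0) - q_closed(x, t, a - h, s, k0)) / (2 * h)
err_dq = abs(fd - dq_da_closed(x, t, a, s, k0))
print(err_q, err_dq)
```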
As we shall see in subsequent chapters, when only one parameter is estimated it is expedient to maximize
$$\int_0^T \left[\frac{\partial q(x(t),t;a)}{\partial a}\right]^2 dt \tag{1.6}$$
with respect to the trajectory x(t), t ∈ [0, T], of a moving sensor, where T > 0 is the observation horizon. In general, this is a rather difficult task, but in this case it is easy to observe that the square of the right-hand side of (1.5) is maximized by x(t) = κ0 for each t ∈ [0, T], i. e., the optimal sensor position is constant in time. Indeed, writing $z = (\kappa_0 - x)^2/(4at + s^2)$, the right-hand side of (1.5) is, for fixed t, proportional to $e^{-z}(2z - 1)$, whose absolute value attains its maximum, equal to 1, at z = 0. Furthermore, this position coincides with the location of the spatial maximum of the input signal. In a more general case, when the heat source also moves, the optimal sensor position varies in time as well, and this problem is too difficult to solve in general.
Conclusion. In general, we should optimize the choice of input signals and sensor locations simultaneously, but this is still a difficult task. Later we shall optimize input signals only, assuming the most favorable case, in which we are able to observe the system state at each spatial point (or to approximate it with high precision). The motivation for this assumption is given in the next section. On the other hand, already in the early stages of research [126, 130] it was known that the application of moving sensors can be beneficial for the accuracy or computational aspects of parameter estimation. The monograph by D. Uciński [177], further papers of his research group [110, 111, 183, 184], and the monographs [79, 172] provide valuable results on sensor allocation and on the optimization of moving sensor and/or actuator trajectories.
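The claim that the squared sensitivity is maximized at x(t) = κ0 can be checked by brute force. The sketch below uses hypothetical values of a, s, κ0 and T. At each time instant it searches a grid of sensor positions. It then evaluates criterion (1.6) by the trapezoidal rule for a sensor fixed at κ0 and for one shifted by 0.5.

```python
import numpy as np

def dq_da(x, t, a, s, k0):
    """Sensitivity (1.5) of the response (1.4) with respect to a."""
    D = 4*a*t + s**2
    e = np.exp(-(k0 - x) ** 2 / D)
    return (4*s*t*(k0 - x)**2 * e / (D**2 * np.sqrt(4*a*s*t + s**3))
            - 2*s**2*t*e / (4*a*s*t + s**3) ** 1.5)

a, s, k0, T = 0.3, 0.8, 1.0, 2.0              # hypothetical values
xs = np.linspace(k0 - 3.0, k0 + 3.0, 6001)    # candidate sensor positions
ts = np.linspace(1e-3, T, 400)                # time instants in (0, T]

# Best sensor position at each time instant: argmax over x of the squared sensitivity
best_x = np.array([xs[np.argmax(dq_da(xs, t, a, s, k0) ** 2)] for t in ts])

def criterion(path):
    """Criterion (1.6) for a sensor trajectory x(t) = path(t), trapezoidal rule on ts."""
    v = dq_da(path(ts), ts, a, s, k0) ** 2
    return np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(ts))

J_center = criterion(lambda t: np.full_like(t, k0))        # sensor fixed at kappa_0
J_offset = criterion(lambda t: np.full_like(t, k0 + 0.5))  # sensor shifted by 0.5
print(np.max(np.abs(best_x - k0)), J_center, J_offset)
```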
1.5 We need new paradigms
Why do we need new paradigms for optimizing experiments for systems with spatio-temporal dynamics? Research in this area began in the 1950s, i. e., about 70 years ago. Since then, our abilities to sense and actuate spatio-temporal fields have expanded tremendously, owing to progress in technology and engineering. The classic paradigms were, roughly speaking, based on the following statements:
– observations of a system state can be made only at a finite number of spatial points, in particular, those located at the boundary of an object,
– actuators that influence our system can also act only at a finite number of spatial points – again, mainly those located at the boundary of an object.
These paradigms are still valid in many cases, and it is still worth conducting research according to them. Moreover, they still dominate the literature. On the other hand, our aim in this section is to convince the reader that it is possible to relax them or to replace them by new paradigms that provide more information about a system's state and more flexibility in stimulating the system. To this end, let us briefly mention selected new techniques and devices that have been developed in the last 70 years.
– Cameras and infra-red cameras provide information that can be considered as observations continuous in space. Indeed, for cameras working in the visible light spectrum, a resolution larger than 10 MPix is available even on our smartphones. Standard industrial cameras have slightly lower resolution, which is sufficient for most applications. For infra-red cameras, a resolution of 480 × 640 is available at a reasonable cost. Cameras working in the ultra-violet spectrum provide valuable images concerning, e. g., combustion processes and leaks around high-voltage devices.
Additionally, our abilities to sense have increased due to the following specialized devices.
– Specialized cameras for observing strain fields (by optical devices) are available.
– Mobile sensors, e. g., those mounted on cars, for monitoring spatial fields of temperatures and chemicals (CO2, O2, SO3, etc.) are in common use.
– Intelligent sensors of various kinds, including PVDF piezoelectric polymer sensors, are available.
– Sensor arrays of microelectromechanical systems (MEMSs), e. g., accelerometers, are so cheap that one can consider placing them densely.
– Wireless sensor networks for estimating spatial fields, based on different physical or chemical laws, are increasingly reliable and relatively cheap.
– Estimating velocity fields from a sequence of images by using optical flow algorithms has become a frequently used technique.
– Laser 3D scanners can provide information about the shapes of observed objects from a microscale (e. g., surface roughness) to a macroscale. In particular, we can collect data to estimate changes of shape and vibrations of buildings, bridges, and pillars.
– 3D images provided by computed tomography (CT) can be based on X-rays (the most popular), on impedance measurements, and on acoustic waves, known as ultrasonography (USG). Industrial CT devices are available, although they are not always cheap.
– Magnetic resonance imaging (MRI) and MRI sequences, known as functional MRI, are also available, but at present they are still rather expensive.
Summarizing, one can assume that in many cases we have observations at every spatial point of a certain domain. Most of the abovementioned sensors provide, in fact, digital observations, but they are so densely spread in space that our assumption is justified,
1.6 Bibliographical notes | 9
leading to essential simplifications of the theoretical considerations and to easily interpretable results. Furthermore, new tools for actuating systems with spatio-temporal dynamics have emerged. In particular: – High-energy lasers, acting as moving sources and having the ability to change their power almost instantaneously, are available. – Microwave (or tera-Hertz) heating (inside or on the boundary of 3D bodies) can be used. – One can apply shape-memory alloys, shape changing materials, e. g., piezoelectric bonds, acting as spatially distributed forces. – Magneto-rheological and electro-rheological dampers – which are able to change their stiffness almost immediately – are available and already used in practice, e. g., for damping vibrations in cars. – Peltier cells can be placed densely in space, providing controllable sources of heating or cooling. The above examples motivate the assumption that in many cases it is possible to apply spatially distributed actuators. Note that even more advanced control problems for PDEs are recently considered. Namely, measures on spatial domains are discussed as control signals in [26, 27]. Concluding, it is possible and useful to relax the classic paradigms and to allow for spatially distributed observations and actuators and to develop a theory for their use in high-quality parameter estimation of systems with spatio-temporal dynamics. This is our goal in subsequent chapters. Even if such a theory is still not always directly useful (realizable) in practice, it provides lower bounds on the achievable parameter estimation accuracy when our financial and engineering abilities of actuating and sensing will be growing.
1.6 Bibliographical notes
This book did not arise from scratch. It is based on many deep results from mathematics, applied mathematics, and mathematical statistics (including optimal experiment design theory), from engineering in general and control systems theory in particular, and from optimization algorithms, system identification, and sensor engineering. Many of these results are cited in the text, mainly where they are directly used. However, a large number of results were milestones in the theory of optimal experiment design for estimating parameters in PDEs and/or provided the methodological background of this book. For these reasons, we mention them in this section, grouping them by topic. In some cases, we try to explain why a particular stream of research is not directly used in this book. The following important topics and streams of research are either used in this book or closely related to it.
Systems described by PDEs
The theory of PDEs has such a long history that we have to confine ourselves to recently published monographs and textbooks, relying on their bibliographies. As far as possible, we confine ourselves to solutions of PDEs in the classical sense, in the hope that this widens the readership. In particular, we refer the reader to, e. g., [2, 61, 174] in this respect and to [42, 58] for elementary properties of Green's functions of linear PDEs, our basic tool. On the other hand, we are forced to invoke some facts from functional analysis, including Hilbert–Schmidt and trace-class operators. In this respect, we refer the reader to, e. g., [72, 151, 152] for basic facts.
At this point, it is worth explaining why we do not rely on the elegant theory of describing systems with spatio-temporal dynamics in terms of abstract semigroups. The main reason is to extend the potential readership. However, assuming that a system can be observed and actuated at every spatial point of a certain set Ω, it seems to be an easy exercise for more mathematically oriented readers to express our results in terms of special classes of semigroups and their eigenvalues and eigenfunctions. Again, we refer the reader to [16, 32, 33, 81, 82, 92, 93, 117, 194] for basic facts on expressing evolutionary PDEs in terms of semigroups and their control.
Green's function description of spatio-temporal systems
For the reasons explained above, instead of semigroups, our main tool for describing systems with spatio-temporal dynamics is the classic approach, namely, Green's functions. An additional motivation is that for LTI systems without spatial dependencies, the description in terms of Green's functions is very popular in many branches of engineering, including control systems and electrical and electronic engineering, under the common name of the system impulse response.
Optimization theory and variational calculus
For the purposes of this book we also need basic facts from classic variational calculus, as well as from classic (finite-dimensional) optimization theory and algorithms (see, e. g., [14, 20, 21, 59, 97]).
Mathematical statistics
The theory of optimal experiment design is based on estimation theory in general and on the Gauss–Markov theorem and the Cramér–Rao lower bound in particular (see, e. g., [114, 118, 149, 150]).
The theory of experiment design
The theory of optimal experiment design for estimating parameters of a regression function is the starting point of this book and, simultaneously, provides the methodological background for our investigations. Roughly speaking, the following two streams of research can be distinguished.
Optimal experiment design for a regression function
This stream of research has a long history, with origins in the 1950s. Here, we mention only the monographs [8, 38, 46, 77, 101, 113, 118, 120, 159] and [47, 164]. Further references are provided in Chapters 2–4.
Experiment design for nonparametric regression estimation
The theory of nonparametric regression estimation also has a long history (see, e. g., [29, 51, 57, 62, 64, 154]), but attempts to allocate observations were made later (see [103, 128, 144, 145, 155]).
Sensor allocation for parameter estimation in PDEs
The theory of sensor allocation and activation, and of selecting sensor trajectories for DPSs, runs parallel to the topics considered in this book. We refer the reader to the comprehensive monograph of Professor D. Uciński [177], which summarizes the results of his school and many other earlier results, up to the year 2005. Later, a large number of influential papers on these topics were published (see, e. g., [68, 73, 162] and the survey paper [5]). An additional stimulus for research in this direction was provided by the stream of papers on sensor networks (see, e. g., [111, 142, 178, 183, 184]), developed in the spirit of experiment design.
Sampling instants for parameter estimation in dynamic systems
The theory of experiment design was extended to selecting sampling instants in dynamic systems by Professor A. Atkinson and his coworkers and followers [9, 10, 23, 112, 179–181]. The theoretical results are illustrated by models of chemical reactions.
Sampling for estimating random processes and fields
The starting point in this direction was the paper [156]. Since the 1980s, this field of research has grown so fast that we mention only the more recent monographs [31, 104, 106] and the survey paper [102].
Optimal input signals for identifying ODE systems
The theory of optimal input signals for identifying LTI systems with lumped parameters, i. e., those that can be described by linear ODEs, finite-dimensional state-space equations, or transfer functions, is an important auxiliary ingredient of this book. In particular, we use the classic results on designing D-optimal input signals in the frequency domain that were developed and later summarized in [1, 53, 54, 99, 100, 195]. For results on the time-domain synthesis of optimal test signals for LTI systems, the reader is referred to [76, 88, 170]. Here, they are extended and used to derive optimality conditions for estimating parameters in PDEs. Additionally, we refer the reader to the class of methods developed by Professor H. Hjalmarsson and his coworkers on input signal design when the goal of identification is to obtain a good model for control purposes (see [6, 52, 63, 69] for selected approaches).
Optimal input signals for DPS parameter estimation
It seems that the importance of this topic is not fully recognized in the literature, as opposed to the parallel topic of identifying LTI systems with lumped parameters (see the citations under the previous topic). This fact is the main motivation for writing this book. We refer the reader to [131, 143] for earlier papers on this topic in which the optimal input is synthesized in the time domain. The history of designing input signals for identifying DPSs in the frequency domain can be traced back to [90, 121, 122, 124, 137]. The main difference between this book and the results presented in these papers lies in the assumptions concerning the availability of spatial observations. Namely, the papers cited above applied the classic paradigm of observing a spatial field at a finite number of spatial points only. Conversely, in this book, the new paradigm that we are (at least in many cases) able to collect observations at every point of a certain spatial domain leads to more explicit results. Even if this possibility is not available in the application at hand, it provides a lower bound on the achievable accuracy.
System identification for DPSs
Fortunately, the algorithms and the theory of parameter identification (or estimation) for systems described by PDEs are much better developed, independently of whether input signals are optimal or not. The literature in this field can be, roughly speaking, divided into two streams.
Constant parameters
This stream of research is the oldest one. It is mainly anchored in the engineering literature. We refer the reader to the survey papers [89, 115, 176] from the 1970s and 1980s for a summary of earlier results and to [50, 148, 175] for monographs. More recent research on DPS identification can be found in the following papers.
In [127] the sequential approach to the state estimation of DPSs is presented, where the term "sequential" refers to procedures with a random stopping time that guarantee a prespecified estimation accuracy. In [134] an algorithm of parameter estimation for DPSs is proposed that is based on the affine dependence of eigenvalues on unknown PDE parameters (see Chapter 6) and on a double application of the least squares method (see [196] for an application of this method in microelectronics). Attempts at decomposing such tasks for piecewise constant parameters can be traced back to [129]. In [191] an algorithm for identifying parameters appearing in boundary conditions is proposed. The space-time separation approach is developed in [94]. Recurrent trajectories and a learning algorithm are proposed in [40]. Interesting applications are discussed in [35, 39, 163]. A recent review, including nonlinear DPSs, is provided in [60].
Spatially varying parameters
Much more mathematically sophisticated results were developed in [17–19, 28, 30, 182] and related papers.
Ill-conditioning and computational aspects
From the computational point of view, parameter estimation tasks for DPSs are known to be ill-posed in the sense of Hadamard (oversensitive to input data). Tikhonov regularization and its later developments are of crucial importance for the accuracy and efficiency of parameter estimation algorithms. We refer the reader to [157] for highly developed numerical algorithms in this area.
Other topics concerning estimation for DPSs
In parallel with the contents of this book, topics concerning the state estimation of DPSs were considered. Formally, there is a link between parameter and state estimation problems. Namely, by treating unknown but constant parameters as additional system states, one can reduce the problem of their estimation to the problem of estimating the states of DPSs with an extended state space. For this reason, it is worth mentioning [84] as an early work on the state estimation of DPSs. Later, the problem of sensor allocation for state estimation was intensively studied in [4, 85–87, 123], among others. An example of testing DPS states can be found in [136]. Clearly, the theory and algorithms for identifying lumped-parameter systems (described by ODEs, finite difference input–output relationships, or state-space equations) also influence, as building blocks, the methods and algorithms for DPS identification and their accuracy. The literature in this field is so huge that we are forced to mention only selected monographs (see [43, 96, 168]).
1.7 Notational conventions
It was our intention to follow widespread notational conventions as far as possible. However, this book uses facts from a large number of subfields of mathematics, mathematical statistics, PDEs, and many branches of engineering. For this reason, it was impossible or inconvenient to avoid some departures from these conventions. Therefore, our aim in this section is to sketch the general notational conventions and possible departures. The time variable is a scalar, denoted by t. However, its range differs between chapters. In most cases the assumption t ∈ [0, T], 0 < T < ∞, is used, but in the chapters concerning the frequency domain approach we allow for t ∈ (−T, T), where the observation horizon T tends to ∞.
Spatial variables are usually denoted as x and sometimes as κ. However, in Chapters 2–4, $x \in \mathbb{R}^d$ is used as a column vector of input variables of a regression function or as spatial variables of systems in the steady state. Vectors of spatial variables are denoted as x and their elements by $x_1, x_2, \ldots$.
The transposition
In Chapters 1–4 we use the traditional notation "T" for the transposition of vectors and matrices, and tr[⋅] for the trace of a matrix. In Chapters 5–9 we change the notation: since T is traditionally reserved for the observation horizon, we are forced to use the notation "tr" for the transposition of a vector or a matrix. As a side issue, we shall use the full name trace for the trace of a matrix.
Finite-dimensional spaces
By $\mathbb{R}^d$ we denote the d-dimensional Euclidean space of column vectors, but we write $\mathbb{R}$ instead of $\mathbb{R}^1$.
Outputs
Output measurements are denoted by $y_i$, i = 1, 2, …, when regression functions are considered. For dynamic systems we also use the notations y(t) and Y(x, t) for outputs.
A system state
For systems with spatio-temporal dynamics we consistently use the notation q(x, t) to denote their states. For simplicity of exposition, we consider only systems with scalar states, thus $q(x, t) \in \mathbb{R}$. When it is necessary to display the dependence of the state on parameters, we use the notation $q(x, t, \bar a)$, or $q(x, t; \bar a)$ when it is necessary to emphasize that $\bar a$ is a vector of variables having a different interpretation than x and t. Here and later on, $\bar a \in \mathbb{R}^r$ is the vector of unknown parameters.
Vectors, matrices, and operators
For vectors we use lowercase letters, but in some cases the bar notation is used, e. g., for unknown parameters and for vectors related to $\bar a$, such as gradient vectors. However, it is convenient to use the notation $\nabla_a$ for the gradient with respect to $\bar a$. Matrices and operators are denoted by capital letters.
However, in many cases we use the notation $A_x$ or $A_x(\bar a)$ to distinguish differential operators. In some cases, mainly in examples, the dot notation, e. g., $\dot q(t)$, will be used for the derivative with respect to time, while $q'(x)$ denotes the derivative with respect to the spatial variable x. Alternatively, the notations $\frac{\partial}{\partial t}$ and $\frac{\partial}{\partial x_i}$ are used for partial derivatives. Integral operators are denoted by capital calligraphic letters, e. g., $\mathcal{G}$, while their kernels are denoted by capital letters with two or three arguments, e. g., $G(x, \kappa, \bar a)$. Exceptionally, in Chapter 4, we use boldface letters for subvectors, subsets, etc., since the structures of the regression functions considered there are complex. The Kronecker product of matrices is denoted by ⊗. The same symbol is used for the product of experiment designs, for the reasons explained in Chapter 4.
Spaces of functions
By $L_2(\Omega)$, $\Omega \subset \mathbb{R}^d$, we denote the space of square-integrable functions on Ω with the standard inner product, denoted as ⟨⋅, ⋅⟩. The space of continuous functions is denoted by C(Ω) and, analogously, C with superscripts denotes spaces of continuously differentiable functions, e. g., we write $C^k(\cdot)$ for the space of k-times continuously differentiable functions.
Conjugate complex numbers are marked with the superscript c, since ∗ is reserved for optimal solutions of optimization problems. The Fisher information matrix, abbreviated as FIM, is denoted by M. In Chapters 2–4 its argument is an experiment design ξ. In subsequent chapters its argument changes to an input signal u (Chapter 5) and to U when spatio-temporal signals are considered. In such cases the versions $M_T$ or $M_\Omega$ are also used. Vectors of unknown parameters are denoted by a in Chapters 1–4 when they appear in a regression function. However, in Chapters 5–9 the notation $\bar a$ is used, since unknown parameters are scattered as multipliers of differential operators. The number of unknown parameters is denoted by r ≥ 1, with the exception of Chapter 5, where we have r + 1 unknown parameters, including $a_0$ as the amplification of the input signal u. Greek letters appear with different meanings in different chapters, but we still try to use the most common notations, e. g., λ for eigenvalues.
2 Optimal experiment design for linear regression models – a primer
We shall start with linear models, since optimal design of experiments (DOE) for them:
– is mathematically rigorous,
– has a well-established methodological base,
– is a prerequisite for understanding DOE concepts for dynamical systems.
2.1 The basic linear model
Our basic model is linear (in unknown parameters), namely,
$$y(x) = a^T v(x) = \sum_{k=1}^{r} a^{(k)} v^{(k)}(x), \tag{2.1}$$
where T denotes transposition, while
– $a = [a^{(1)}, a^{(2)}, \ldots, a^{(r)}]^T$ is the vector of unknown parameters,
– $v(x) = [v^{(1)}(x), v^{(2)}(x), \ldots, v^{(r)}(x)]^T$ is the vector of given functions (possibly nonlinear in x; later we comment on their choice).
The main assumptions concerning (2.1) are the following:
A1 Model (2.1) correctly describes the input–output (I/O) relationship of our process when there are no measurement errors.
A2 Measurement errors $\varepsilon_i$, i = 1, 2, …, N, are random variables with zero mean and finite variance $\sigma^2$ (the same for all i, for simplicity). Furthermore, $\varepsilon_i$ and $\varepsilon_j$ are uncorrelated for i ≠ j, i, j = 1, 2, …, N.
A3 Errors are additive, i. e., the available observations $(x_i, y_i)$ are related as follows:
$$y_i = a^T v(x_i) + \varepsilon_i, \quad i = 1, 2, \ldots, N, \tag{2.2}$$
where a in (2.2) is treated as a vector of "true," but unknown, parameters. Its dimension r ≥ 1 is assumed to be known.
Remark 1. The assumption that the model structure, i. e., v(x), and its dimension r are known is typical for the classical experiment design theory that is recapitulated here. It provides solid methodological grounds for this elegant theory. However, both the founders of this theory and more contemporary researchers had in mind that these assumptions should be weakened in further studies. In particular, in the seminal paper [11], A. Atkinson and V. Fedorov proposed the T-optimality criterion for discriminating between rival models. This crucial topic was continued in [36, 116] and in [180].
https://doi.org/10.1515/9783110351040-002
Assumptions A1–A3 also hold for many blending (mixture) models. We do not discuss them here, since experiment designs for mixtures must fulfill the specific additional constraint that the components of each $x_i$ sum to one. We refer the reader to [41] for recent results in this direction and to the bibliography therein for earlier results. In parallel, topics concerning experiment designs for the so-called nonparametric approaches to regression estimation have been developed (see the previous chapter for the bibliography).
Under A1–A3, (2.2) is called the linear regression model (spanned by the vector v(x)) or just linear regression. One more assumption, which is frequently not stated explicitly, is that there are no errors in setting (or observing) the $x_i$'s, i. e., we are able to set them exactly. The choice of v(x) is at our disposal, but the functions $v^{(1)}(x), v^{(2)}(x), \ldots, v^{(r)}(x)$ have to be linearly independent in a certain domain $X \subset \mathbb{R}^s$, in which the observations $\{x_i\}_{i=1}^N$ are taken. The classic examples are listed below.
Linear model – in the narrow sense
A model which is linear in the unknown parameters $a^{(j)}$ and affine with respect to the inputs $x^{(j)}$,
$$\bar y(x) = a^{(1)} + a^{(2)} x^{(1)} + \cdots + a^{(s+1)} x^{(s)}, \tag{2.3}$$
is also called the linear model (for traditional reasons). It can be written in the standard form if we put $v^{(1)}(x) \equiv 1$, $v^{(2)}(x) = x^{(1)}, \ldots, v^{(r)}(x) = x^{(s)}$, where r = s + 1.
Polynomial model in one variable
Due to the Weierstrass approximation theorem, the polynomial model
$$\bar y(x) = a^{(1)} + a^{(2)} x + a^{(3)} x^2 + \cdots + a^{(r)} x^{r-1}, \quad x \in \mathbb{R}, \tag{2.4}$$
is frequently used. Defining $v(x) = [1, x, \ldots, x^{r-1}]^T$, one can write it in the standard form. It is applicable for r ≤ 5, say. Polynomials of higher degrees should rather be expressed in an orthogonal polynomial basis, since such representations are less sensitive to numerical errors when the parameters a are estimated.
Quadratic model in two variables
The simplest quadratic model in two variables has the form
$$\bar y(x) = a^{(1)} + a^{(2)} x^{(1)} + a^{(3)} x^{(2)} + a^{(4)} \big(x^{(1)}\big)^2 + a^{(5)} x^{(1)} x^{(2)} + a^{(6)} \big(x^{(2)}\big)^2.$$
It can be rewritten in the standard form by setting
$$v(x) = \big[1, x^{(1)}, x^{(2)}, \big(x^{(1)}\big)^2, x^{(1)} x^{(2)}, \big(x^{(2)}\big)^2\big]^T.$$
Trigonometric model
A trigonometric model in one variable is motivated by a truncated Fourier series. It is defined by setting
$$v(x) = [1, \sin(x), \cos(x), \sin(2x), \cos(2x), \ldots, \sin(nx), \cos(nx)]^T, \quad x \in [0, 2\pi],$$
for a finite integer n ≥ 1.
Other linear regression models
As v(x) one can also select:
– orthogonal polynomials (e. g., Legendre polynomials),
– Bernstein polynomials,
– cubic splines,
– kernels (as in RBF neural nets).
These examples demonstrate the flexibility of linear models: by selecting v(x) properly, one can cover a large class of models with one simple expression $a^T v(x)$.
Remarks:
– For simplicity of the formulas, we confine ourselves to single-output systems with a constant variance of measurement errors (the extensions are known).
– We will return to examples of multivariate (multi-input) models later.
From now on, x can be either one- or multi-dimensional.
Identifiability – a minimal requirement imposed on an experiment
Define the r × N matrix V as follows:
$$V = [v(x_1), v(x_2), \ldots, v(x_N)]. \tag{2.5}$$
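To make the example regressor vectors and the matrix V of (2.5) concrete, here is a minimal Python sketch using only the standard library. The function names `v_poly`, `v_quad2`, `v_trig`, `design_matrix`, and `response` are illustrative choices, not notation from the text, and V is stored as a list of its r rows.

```python
import math

def v_poly(x, r):
    """Polynomial regressors [1, x, ..., x^(r-1)] as in (2.4)."""
    return [x ** k for k in range(r)]

def v_quad2(x):
    """Quadratic model in two variables, x = (x1, x2)."""
    x1, x2 = x
    return [1.0, x1, x2, x1 ** 2, x1 * x2, x2 ** 2]

def v_trig(x, n):
    """Truncated Fourier basis [1, sin x, cos x, ..., sin nx, cos nx]."""
    out = [1.0]
    for k in range(1, n + 1):
        out += [math.sin(k * x), math.cos(k * x)]
    return out

def design_matrix(v, xs):
    """r x N matrix V = [v(x_1), ..., v(x_N)] of (2.5), stored row by row."""
    cols = [v(x) for x in xs]
    r = len(cols[0])
    return [[col[k] for col in cols] for k in range(r)]

def response(a, v, x):
    """Noise-free response a^T v(x) of model (2.1)."""
    return sum(ak * vk for ak, vk in zip(a, v(x)))

V = design_matrix(lambda x: v_poly(x, 3), [0.0, 0.5, 1.0])
```

For the quadratic polynomial and the points 0, 0.5, 1, the rows of V are [1, 1, 1], [0, 0.5, 1], and [0, 0.25, 1]; each column is one vector $v(x_i)$.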
Let us assume, for a while, that there are no errors in the observations. Then the vector of responses (outputs), defined as $\zeta = [\zeta_1, \zeta_2, \ldots, \zeta_N]^T$, has the form $\zeta = V^T a$. We say that the parameters a of a linear model are identifiable from observations at the points $x_1, x_2, \ldots, x_N$ if and only if distinct parameter vectors always produce distinct responses, i. e., for every pair $a_1, a_2 \in \mathbb{R}^r$,
$$\text{if } a_1 \neq a_2, \text{ then } \zeta_1 \neq \zeta_2, \quad \text{where } \zeta_1 = V^T a_1, \ \zeta_2 = V^T a_2. \tag{2.6}$$
Condition (2.6) does not depend on assumptions on measurement errors. If (2.6) does not hold, then one can find two vectors of parameters, $a_1, a_2$, say, such that $V^T a_1 = V^T a_2$ while $a_1 \neq a_2$, which means that $a_1$ and $a_2$ could not be distinguished from output observations, even without any observation errors.
Identifiability condition: the parameters a of the model $V^T a$ are identifiable if and only if the points $x_i$, i = 1, 2, …, N, are allocated in such a way that
$$\operatorname{rank}[V] = \operatorname{rank}[v(x_1), v(x_2), \ldots, v(x_N)] = r, \tag{2.7}$$
where rank[ ] is the rank of a matrix and r = dim(a) is the number of estimated parameters. Another way of verifying the identifiability condition is as follows: (2.7) holds if and only if
$$\det[V V^T] > 0, \tag{2.8}$$
where det[ ] denotes the determinant of a matrix. In fact, one of the criteria of design optimality is to maximize $\det[V V^T]$. A necessary condition for (2.7) is N ≥ r, because V is an r × N matrix.
Remark 2. The functions $v^{(1)}(x), v^{(2)}(x), \ldots, v^{(r)}(x)$ are linearly independent in the domain $X \subset \mathbb{R}^s$ if and only if the Gram matrix $G_r$ with the elements
$$\int_X v^{(k)}(x)\, v^{(l)}(x)\, dx, \quad k, l = 1, 2, \ldots, r, \tag{2.9}$$
is nonsingular, which holds if and only if
$$\det\Big[\int_X v(x)\, v^T(x)\, dx\Big] > 0. \tag{2.10}$$
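Both the identifiability test (2.8) and the Gram-matrix test (2.10) are easy to check numerically. The sketch below assumes the quadratic regressors $v(x) = [1, x, x^2]^T$ on X = [0, 1]; it computes $\det[V V^T]$ for two designs and approximates the Gram matrix by a midpoint rule. The tolerance `tol` that replaces the exact condition "> 0" and all helper names are our own illustrative choices.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(a[i][c]))
        if a[p][c] == 0.0:
            return 0.0  # singular matrix
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[c][c]
        for i in range(c + 1, n):
            f = a[i][c] / a[c][c]
            for j in range(c, n):
                a[i][j] -= f * a[c][j]
    return d

def v(x):
    return [1.0, x, x * x]  # assumed quadratic regressors, r = 3

def moment_sum(xs):
    """sum_i v(x_i) v(x_i)^T, which equals V V^T."""
    r = len(v(xs[0]))
    m = [[0.0] * r for _ in range(r)]
    for x in xs:
        vx = v(x)
        for k in range(r):
            for l in range(r):
                m[k][l] += vx[k] * vx[l]
    return m

def is_identifiable(xs, tol=1e-12):
    """Numerical version of (2.8): det[V V^T] > tol."""
    return det(moment_sum(xs)) > tol

def gram_matrix(a, b, n=2000):
    """Midpoint-rule approximation of the Gram matrix (2.9) on X = [a, b]."""
    h = (b - a) / n
    nodes = [a + (i + 0.5) * h for i in range(n)]
    return [[h * e for e in row] for row in moment_sum(nodes)]
```

For the design {0, 0.5, 1}, V is a 3 × 3 Vandermonde-type matrix with det V = 0.25, so $\det[V V^T] = 0.0625$. The design {0, 0, 1, 1} has N = 4 ≥ r = 3 but only two distinct points, so it is not identifiable. On [0, 1] the Gram matrix of 1, x, x² is the 3 × 3 Hilbert matrix, whose determinant is 1/2160.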
2.2 The LSM – some basic facts on its accuracy
In this section we provide some well-known basic facts concerning the estimation accuracy of the least squares method (LSM). They are fundamental for correctly stating problems of optimal design of experiments. In this section, by an experiment design we mean the sequence $\xi_N \stackrel{\text{def}}{=} \{x_1, x_2, \ldots, x_N\}$, where the $x_i$'s are selected from a certain prescribed domain $X \subset \mathbb{R}^s$.
We are in a chicken-and-egg situation. What should we select first: an estimation method or an experiment design adequate for it?
– Should an experiment design depend on the estimation method? Yes, because it should optimize the estimation accuracy.
– Perhaps it is better to look for a more accurate estimation method? Yes, but the accuracy of the method depends on the experiment design.
Fortunately, we can break this vicious circle by selecting the LSM, since for each fixed experiment design it is the most accurate method in the class of linear unbiased estimators (see Theorem 1 below). Thus, our plan of attack is the following:
1. Express the estimation accuracy of the LSM in terms of an experiment design.
2. Then optimize this accuracy with respect to "all possible designs."
LSM – the ordinary least squares version
The LSM is one of the oldest statistical methods. Hundreds of papers and books discuss the LSM in detail. Here, we collect only the basic facts that are needed later to form optimality criteria for experiment designs. Let A1–A3 hold and let observations $\{(x_i, y_i)\}_{i=1}^N$ be at our disposal, where the $x_i$'s are design points (at this stage, any reasonable points that ensure identifiability). The vector $\hat a \in \mathbb{R}^r$ at which the minimum of
$$Q(a) \stackrel{\text{def}}{=} \sum_{i=1}^N \big(y_i - a^T v(x_i)\big)^2 \tag{2.11}$$
with respect to $a \in \mathbb{R}^r$ is attained is called the least squares estimate (LSE). The minimization of (2.11) is frequently called the ordinary least squares (OLS) method, as opposed to somewhat more sophisticated methods like the weighted least squares (WLS) method. For simplicity of exposition, we confine ourselves to the OLS version, although most of the considerations in this section, as well as in this book, also apply to the WLS case. The function Q(a) is convex in $a \in \mathbb{R}^r$. If the identifiability condition (2.8) holds (which is assumed from now on), then it is also strictly convex, because its Hessian $2 V V^T$ is then positive definite. This implies that Q(a) has a unique minimizer in $\mathbb{R}^r$. In order to find it, we calculate the gradient of Q(a), equate it to zero, and obtain the following set of linear equations with respect to a:
$$M_N\, a = \sum_{i=1}^N y_i\, v(x_i), \tag{2.12}$$
where the r × r matrix $M_N$ is defined as
$$M_N = \sum_{i=1}^N v(x_i)\, v^T(x_i). \tag{2.13}$$
Here, (2.12) is called the system of normal equations. Note that $M_N = V V^T$, so by (2.8) this system is nonsingular. Its (unique) solution $\hat a$ is of the form
$$\hat a = \sum_{i=1}^N y_i\, M_N^{-1} v(x_i). \tag{2.14}$$
It is crucial for further considerations that $\hat a$ is a linear function of the outputs $\bar y \stackrel{\text{def}}{=} [y_1, \ldots, y_N]^T$.
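The normal equations (2.12)–(2.13) translate directly into code. The sketch below assumes $v(x) = [1, x, x^2]^T$ and data simulated according to (2.2) with Gaussian errors (normality is our assumption for the demo; A2 does not require it). It accumulates $M_N$ and the right-hand side and solves the system by Gaussian elimination, so $M_N^{-1}$ from (2.14) is never formed explicitly, in the spirit of Remark 3. It still forms $M_N$, which squares the condition number of V; for badly conditioned problems, an orthogonal-factorization least squares solver applied directly to $V^T$ (for example NumPy's `numpy.linalg.lstsq`) is preferable.

```python
import random

def solve(m, b):
    """Solve m z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [list(row) + [bi] for row, bi in zip(m, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(a[i][c]))
        a[c], a[p] = a[p], a[c]
        for i in range(c + 1, n):
            f = a[i][c] / a[c][c]
            for j in range(c, n + 1):
                a[i][j] -= f * a[c][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(a[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (a[i][n] - s) / a[i][i]
    return z

def v(x):
    return [1.0, x, x * x]  # assumed regressors, r = 3

def lse(xs, ys):
    """OLS estimate from the normal equations M_N a = sum_i y_i v(x_i)."""
    r = len(v(xs[0]))
    m = [[0.0] * r for _ in range(r)]
    rhs = [0.0] * r
    for x, y in zip(xs, ys):
        vx = v(x)
        for k in range(r):
            rhs[k] += y * vx[k]
            for l in range(r):
                m[k][l] += vx[k] * vx[l]
    return solve(m, rhs)

# Simulated observations y_i = a^T v(x_i) + eps_i with sigma = 0.1.
a_true = [1.0, -2.0, 0.5]
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
rng = random.Random(0)
ys = [sum(ak * vk for ak, vk in zip(a_true, v(x))) + rng.gauss(0.0, 0.1) for x in xs]
a_hat = lse(xs, ys)
```

Since `lse` is linear in `ys`, fitting $2\bar y_1 + 3\bar y_2$ gives $2\hat a_1 + 3\hat a_2$ up to rounding, which is the linearity property stated above.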
Remark 3. In numerical calculations of $\hat a$, the form (2.14) should be avoided. Instead, solve $M_N a = \sum_{i=1}^N y_i v(x_i)$ using carefully selected specialized software (this system of equations can be very badly conditioned, leading to large numerical errors).
Main properties of OLS estimators
According to A2, the observation errors $\varepsilon_i$ are random variables. Thus, according to A3, the $y_i$'s are also random (before they are observed). Consequently, $\hat a$ defined by (2.14) is a random vector, which is also called the LSE. We denote by E[ ] the expected value of a random variable (vector) with respect to all $\varepsilon_i$'s. Analogously, the variance and the covariance matrix are denoted by Var[ ] and Cov[ ], respectively. The following properties of the LSE are well known (see, e. g., [8, 150]).
1) The LSE is unbiased, i. e., $E(\hat a) = a$, independently of the unknown vector a.
2) The covariance matrix has the following form:
$$\operatorname{Cov}(\hat a) = \sigma^2 M_N^{-1} = \sigma^2 \Big[\sum_{i=1}^N v(x_i)\, v^T(x_i)\Big]^{-1}. \tag{2.15}$$
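Formula (2.15) can be checked by simulation. The sketch below assumes the straight-line model $v(x) = [1, x]^T$, the design {−1, 0, 1}, σ = 1, and Gaussian errors (A2 itself does not require normality). It repeats the experiment many times and compares the sample covariance of the estimates with $\sigma^2 M_N^{-1}$, which for this design equals diag(1/3, 1/2). The helper names and the number of replications are illustrative choices.

```python
import random

def fit_line(xs, ys):
    """OLS for y = a1 + a2 x via Cramer's rule on the 2x2 normal equations."""
    n, sx = len(xs), sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    d = n * sxx - sx * sx  # det(M_N); positive iff the design is identifiable
    return ((sxx * sy - sx * sxy) / d, (n * sxy - sx * sy) / d)

def theoretical_cov(xs, sigma):
    """sigma^2 M_N^{-1} from (2.15) for the straight-line model."""
    n, sx = len(xs), sum(xs)
    sxx = sum(x * x for x in xs)
    c = sigma ** 2 / (n * sxx - sx * sx)
    return [[c * sxx, -c * sx], [-c * sx, c * n]]

def empirical_cov(xs, a, sigma, reps, seed=0):
    """Sample covariance of the LSE over `reps` simulated experiments."""
    rng = random.Random(seed)
    est = []
    for _ in range(reps):
        ys = [a[0] + a[1] * x + rng.gauss(0.0, sigma) for x in xs]
        est.append(fit_line(xs, ys))
    mean = [sum(e[k] for e in est) / reps for k in range(2)]
    return [[sum((e[k] - mean[k]) * (e[l] - mean[l]) for e in est) / (reps - 1)
             for l in range(2)] for k in range(2)]

xs = [-1.0, 0.0, 1.0]
theory = theoretical_cov(xs, 1.0)  # diag(1/3, 1/2) up to rounding
sample = empirical_cov(xs, (0.5, 2.0), 1.0, reps=20000)
```

With 20000 replications, the sample entries typically agree with the theoretical ones to about two decimal places. The zero off-diagonal entry reflects the symmetry of the design about 0.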
Let us recall the interpretation of the matrix $\operatorname{Cov}(\hat a)$. Its diagonal elements are equal to the variances $\operatorname{Var}(\hat a^{(j)})$ of the estimators of $a^{(j)}$, j = 1, 2, …, r. Note that $\operatorname{Cov}(\hat a)$ is a symmetric matrix and its off-diagonal elements are directly proportional to the correlation coefficients of $\hat a^{(k)}$ and $\hat a^{(j)}$, k ≠ j, k, j = 1, 2, …, r, the proportionality constant being equal to $\big(\operatorname{Var}(\hat a^{(k)}) \cdot \operatorname{Var}(\hat a^{(j)})\big)^{1/2}$. As we shall see later, high correlations between $\hat a^{(k)}$ and $\hat a^{(j)}$ are not desirable.
Optimality of the LSE when competitors are linear and unbiased
We need a common basis for comparing different estimators of the vector a in the linear model, assuming that an experiment design $\xi_N$ is fixed. To this end, we have to specify a fair class of competing estimators of a, as well as a way of comparing them. As the class of competitors of the LSE, we consider the class ℒ of all estimators of a that are linear with respect to $\bar y$ and unbiased. Note that any linear estimator of a can be expressed as $L \bar y$, where L is a certain r × N matrix. Thus, one can regard ℒ as the class of all r × N matrices L such that $E(L \bar y) = a$ for every unknown a. We shall compare linear unbiased estimators according to a partial ordering of their covariance matrices that is introduced as follows.
Definition 1. Let A and B be r × r symmetric matrices. If the matrix A − B is positive semi-definite, then we write A − B ≥ 0 or A ≥ B. We write A > B if A − B is (strictly) positive definite.
This partial ordering is known as the Loewner ordering. Not every pair of r × r symmetric matrices is comparable in the above sense, since A − B is neither positive semi-definite nor negative semi-definite if it has both positive and negative eigenvalues. Thus, the symbol "≥" introduces only a partial ordering. It can be proved, however, that if $\xi_N$ is fixed, then the class ℒ contains an estimator whose covariance matrix is not greater, in this ordering, than the covariance matrix of any other member of ℒ. Thus, it makes sense to look for the linear unbiased estimator of a with the "smallest" covariance matrix. Having established a method of comparing the accuracy of linear unbiased estimators of a, we say that we are looking for the best linear unbiased estimator (BLUE) of a in the linear model. The following result (see, e. g., [8, 150]), widely known as the Gauss–Markov (G-M) theorem, provides the answer to this question.
Theorem 1 (Gauss–Markov). Let A1–A3 hold and let a design $\xi_N$ that ensures identifiability be fixed.
(A) The LSE $\hat a$ is the BLUE, i. e., it has the smallest covariance matrix (in the sense of Definition 1) in the class of all linear unbiased estimators.
(B) If, furthermore, the probability density function (p. d. f.) of the $\varepsilon_i$'s is Gaussian, then the LSE is the best in the class of all (including nonlinear) unbiased estimators of a.
Several remarks are in order concerning the G-M theorem.
1) It holds for every N ≥ r, i. e., it is a nonasymptotic result. This fact plays a crucial role in stating optimal DOE problems.
2) In part (A) there are no assumptions concerning any particular distribution of errors (only conditions A2 and A3 are imposed).
3) In case (B) the LSE coincides with the maximum likelihood estimator of a.
4) For the G-M theorem it is essential to compare the LSE with unbiased estimators only. For biased estimators, the covariance matrix is not an adequate measure of the estimation quality.
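Definition 1 and part (A) of Theorem 1 can be illustrated on a tiny example. The sketch below assumes the straight-line model, the design {−1, 0, 1}, and σ = 1. It compares the LSE, whose matrix is $L = M_N^{-1} V$, with a hypothetical competitor that uses only the two end-point observations. For 2 × 2 symmetric matrices, positive semi-definiteness is equivalent to nonnegativity of both diagonal entries and of the determinant, and this is the test used by the helper `loewner_compare` (our own name).

```python
def cov_linear(L, sigma=1.0):
    """Cov(L y) = sigma^2 L L^T for uncorrelated errors with common variance (A2)."""
    return [[sigma ** 2 * sum(p * q for p, q in zip(ri, rj)) for rj in L] for ri in L]

def is_unbiased(L, xs, tol=1e-12):
    """L y is unbiased for a in y = a1 + a2 x iff L V^T = I."""
    vs = [[1.0, x] for x in xs]
    for k in range(2):
        for l in range(2):
            s = sum(L[k][i] * vs[i][l] for i in range(len(xs)))
            if abs(s - (1.0 if k == l else 0.0)) > tol:
                return False
    return True

def is_psd_2x2(d, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff all its principal minors are nonnegative."""
    det = d[0][0] * d[1][1] - d[0][1] * d[1][0]
    return d[0][0] >= -tol and d[1][1] >= -tol and det >= -tol

def loewner_compare(a, b):
    """Return '>=' if A >= B, '<=' if A <= B, else 'incomparable'."""
    diff = [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]
    if is_psd_2x2(diff):
        return ">="
    if is_psd_2x2([[-e for e in row] for row in diff]):
        return "<="
    return "incomparable"

xs = [-1.0, 0.0, 1.0]
L_lse = [[1 / 3, 1 / 3, 1 / 3], [-0.5, 0.0, 0.5]]  # M_N^{-1} V for this design
L_ends = [[0.5, 0.0, 0.5], [-0.5, 0.0, 0.5]]       # uses y_1 and y_3 only
```

Both estimators are unbiased, and the competitor's covariance matrix diag(1/2, 1/2) exceeds the LSE's diag(1/3, 1/2) in the Loewner sense, as part (A) predicts. In contrast, diag(2, 1) and diag(1, 2) are incomparable, because their difference has eigenvalues of both signs.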
Prediction variance of the LSM

Having the LSE $\hat a$, it is natural to use $\hat y(x) \stackrel{\mathrm{def}}{=} \hat a^T v(x)$ as a predictor of $y(x)$, i. e., as an estimate of $a^T v(x)$. The following corollary of the G-M theorem concerns the accuracy of $\hat y(x)$.

Corollary 1. Let A1–A3 hold and let a design $\xi_N$ that ensures identifiability be fixed.
(A) Then, for each fixed $x$, $\hat a^T v(x)$ is an unbiased estimator of $a^T v(x)$. It has the smallest variance in the class of all linear unbiased estimators of $a^T v(x)$. This variance is given by
$$\mathrm{Var}(\hat y(x)) = v^T(x)\, \mathrm{Cov}(\hat a)\, v(x) = \sigma^2\, v^T(x)\, M_N^{-1}\, v(x). \tag{2.16}$$
24 | 2 Primer on classic optimal experiments
(B) If, additionally, the p. d. f. of the $\epsilon_i$'s is Gaussian, then the minimum variance property of $\hat a^T v(x)$ also holds in the class of all (including nonlinear) unbiased estimators of $a^T v(x)$.

Having this result, we know that the LSE provides the best way of estimating the output of a linear model (among linear unbiased estimators and, for Gaussian errors, among all unbiased ones). Thus, we can compare experiments not only by the corresponding covariance matrices, but also by the variances of the model predictions. These two approaches to comparing experiments are the basis for formulating the so-called design optimality criteria, as described in Section 2.3, but first we make several obvious remarks on improving the estimation accuracy.

What can be done before optimizing an experiment design?

More precisely, what can be done in order to increase the accuracy of estimating $a$ in a linear model before going into the details of optimal experiment design? Define a (discrete) experiment design as a sequence $\xi_N = \{x_1, x_2, \ldots, x_N\}$ of $N$ points, not necessarily distinct, for which the identifiability condition holds. Define the FIM as
$$M(\xi_N) = \sigma^{-2} \sum_{i=1}^{N} v(x_i)\, v^T(x_i) \tag{2.17}$$
and recall that for the LSE we have $\mathrm{Cov}(\hat a) = M^{-1}(\xi_N)$. Thus, before trying to optimize estimation accuracy measures with respect to $\xi_N$, one may consider other ways of "increasing" $M(\xi_N)$ in the sense of the partial ordering described previously. Considerations on this topic are, to some extent, informal, but they can be made rigorous. The analysis of (2.17) suggests the following approaches.

Extend the experiment domain X

As we shall see later, many points of designs that are optimal in a certain sense are placed at the boundary of $X$. Thus, extending $X$ can lead to a better design. This way of increasing the estimation accuracy is usually limited by at least three factors, namely:
– physical constraints, which do not allow us to increase forces, temperatures, pressures, etc., beyond some bounds,
– economic constraints, which can make relaxing the physical constraints inefficient in the sense that its cost grows faster than the estimation quality,
– our model is only an approximation of reality, and extending $X$ may lead to model inadequacy.

Decrease the variance of observation errors σ²

To this end:
– apply measurement devices of higher quality (a cost barrier may arise),
– repeat observations at the same point $x$.
We suggest not using the latter method as a panacea, since allocating the same number of observations to different points may lead to a better overall estimation accuracy. Such a decision should be taken in the process of optimizing the experiment.

Increase the number of observations

An increase of the number of observations is always desirable and should be made whenever additional observations are not too costly, but their proper allocation is again a problem of optimal experiment design.

Remark 4. The last statement is valid under our assumption that observation errors are uncorrelated. For correlated errors, a counterintuitive phenomenon of the estimation accuracy decreasing as the sample size grows may occur (see [106], where examples of this type are discussed).
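The effect of repeating observations and of reducing σ² can be read off (2.17) directly. The following minimal numpy sketch (the helper names `fim_exact` and `prediction_variance` are hypothetical, and the regressor function `v` is supplied by the user) builds $M(\xi_N)$ for an exact design with repeated points and evaluates $\mathrm{Var}(\hat y(x)) = v^T(x) M^{-1}(\xi_N) v(x)$, i.e., (2.16) with $\mathrm{Cov}(\hat a) = M^{-1}(\xi_N)$.

```python
import numpy as np

def fim_exact(points, v, sigma=1.0):
    """FIM (2.17): sigma^-2 * sum_i v(x_i) v(x_i)^T for an exact design (repeats allowed)."""
    V = np.array([v(x) for x in points], dtype=float)   # N x r matrix of regressors
    return V.T @ V / sigma**2

def prediction_variance(x, points, v, sigma=1.0):
    """Var(y_hat(x)) = v(x)^T M^{-1}(xi_N) v(x), using Cov(a_hat) = M^{-1}(xi_N)."""
    M = fim_exact(points, v, sigma)
    vx = np.asarray(v(x), dtype=float)
    return float(vx @ np.linalg.solve(M, vx))

# Linear regression a1 + a2 x with two observations at each end of [-1, 1].
v_lin = lambda x: [1.0, x]
design = [-1.0, -1.0, 1.0, 1.0]
M = fim_exact(design, v_lin)
cov = np.linalg.inv(M)
```

Doubling every observation (`design * 2`) halves the prediction variance, from 0.3125 to 0.15625 at x = 0.5, and halving σ divides it by four (0.078125); neither change alters where the observations are placed, which is the question left to optimal design.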
2.3 Toward design optimality criteria

From the previous section we know that $\mathrm{Cov}(\hat a) = M^{-1}(\xi_N)$ is the "best possible" for a given design $\xi_N$, but for different designs $\xi_N'$ and $\xi_N''$ the covariance matrices $M^{-1}(\xi_N')$ and $M^{-1}(\xi_N'')$ may not be comparable in the sense of the partial ordering of symmetric matrices.

Remark 5. This does not contradict the G-M theorem, because in its formulation the design $\xi_N$ is fixed and only the estimators vary.

From the previous section we also know that $\mathrm{Var}(\hat y(x)) = v^T(x) M^{-1}(\xi_N) v(x)$ is the smallest possible for a fixed $x$, but again, for different $\xi_N'$ and $\xi_N''$ the surfaces $v^T(x) M^{-1}(\xi_N') v(x)$ and $v^T(x) M^{-1}(\xi_N'') v(x)$ may not be comparable, in the sense that neither of them uniformly dominates the other.

Example 2. Consider a regression function spanned by $v(x) = [1, \cos(2\pi x)]^T$, $x \in [-1, 1]$, and two designs of experiments: (1) $\xi_N'$ consists of $N = 8$ observations at points evenly distributed between $-0.1$ and $0.7$, (2) $\xi_N''$ is supported at the points $x_1 = -0.1$ and $x_2 = 0.7$ with two observations at each of them, while the remaining four observations are made at $x_3 = 0$. The corresponding covariance matrices have the form
$$M^{-1}(\xi_N') = \frac{1}{8}\begin{bmatrix} 1.2 & -0.8 \\ -0.8 & 3.2 \end{bmatrix}, \qquad M^{-1}(\xi_N'') = \frac{1}{8}\begin{bmatrix} 2.32 & -2.10 \\ -2.10 & 3.37 \end{bmatrix}.$$
The eigenvalues of $8\,[M^{-1}(\xi_N') - M^{-1}(\xi_N'')]$ are $\lambda_1 = -2.03$ and $\lambda_2 = 0.75$. Since they have opposite signs, the matrices $M^{-1}(\xi_N')$ and $M^{-1}(\xi_N'')$ are not comparable in the sense of the partial ordering. Furthermore, the variance curves corresponding to these designs are also not comparable, as follows from Fig. 2.1.
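The Loewner comparison of Definition 1 reduces to checking the signs of the eigenvalues of $A - B$. A small sketch, assuming numpy and a hypothetical helper `loewner_compare`, applied to the bracketed matrices of Example 2, that is, to $8 M^{-1}(\xi_N')$ and $8 M^{-1}(\xi_N'')$:

```python
import numpy as np

def loewner_compare(A, B, tol=1e-10):
    """Compare symmetric matrices in the Loewner order via the eigenvalues of A - B."""
    eig = np.linalg.eigvalsh(np.asarray(A, float) - np.asarray(B, float))  # ascending
    if np.all(eig >= -tol):
        return "A >= B", eig          # A - B positive semi-definite
    if np.all(eig <= tol):
        return "A <= B", eig          # A - B negative semi-definite
    return "incomparable", eig        # eigenvalues of both signs

# Bracketed matrices of Example 2, i.e., 8 * M^{-1}(xi'_N) and 8 * M^{-1}(xi''_N).
C1 = np.array([[1.2, -0.8], [-0.8, 3.2]])
C2 = np.array([[2.32, -2.10], [-2.10, 3.37]])
verdict, eig = loewner_compare(C1, C2)
```

With the displayed entries the eigenvalues come out at about −2.03 and 0.74, matching the values above up to rounding; their opposite signs are what makes the two designs incomparable.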
Figure 2.1: The prediction variance curves (scaled by 8) obtained in Example 2.
As a conclusion from this example, we need a more universal way of comparing experiments. For this reason, the so-called design optimality criteria were proposed, which are either:
– scalar functions defined on $M^{-1}(\xi_N)$, or
– functionals of $\mathrm{Var}(\hat y(x))$.

Remark 6. In certain, rather rare, cases (e. g., for the regression $a^T x$), there exists a universally optimal design $\xi_N^{**}$ for which $M^{-1}(\xi_N^{**}) \leq M^{-1}(\xi_N)$ holds for every $\xi_N$.

Review of selected design optimality criteria

For the purposes of interpretation only, we tentatively assume that the errors $\epsilon_i$ are Gaussian. Fix $0 < \beta < 1$, which is called the confidence level (usually $\beta = 0.99$ or $0.95$). The confidence ellipsoid for the unknown model parameters, denoted by $\mathcal{E}(\xi_N)$, has the following form:
$$\mathcal{E}(\xi_N) = \{a \in \mathbb{R}^r : (a - \hat a)^T M(\xi_N)\, (a - \hat a) \leq c\}, \tag{2.18}$$
where c > 0 is a constant, which depends on N and β, but not on ξN . This ellipsoid is centered at â and it can be proved (see [149]) that it covers the vector of unknown parameters a with probability not less than β, provided that c is appropriately chosen. From the point of view of parameter estimation accuracy, it is desirable to have the confidence ellipsoid ℰ (ξN ) “as small as possible.” This intuitive requirement can be formalized in a number of ways, leading to different design optimality criteria.
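The coverage property of (2.18) can be checked by simulation. The sketch below assumes Gaussian errors with known σ = 1, a hypothetical straight-line model $v(x) = [1, x]^T$ observed at ten equidistant points, and hypothetical true parameters. Under these assumptions $(\hat a - a)^T M(\xi_N)(\hat a - a)$ has a chi-square distribution with $r = 2$ degrees of freedom, so $c = -2\ln(1 - \beta)$ gives coverage exactly β; when σ must be estimated, $c$ comes instead from an F distribution and then depends on $N$, as stated above.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 10)                 # hypothetical exact design, N = 10
V = np.column_stack([np.ones_like(x), x])      # rows v(x_i)^T with v(x) = [1, x]^T
a_true = np.array([0.5, -1.0])                 # hypothetical "unknown" parameters
sigma = 1.0                                    # assumed known
M = V.T @ V / sigma**2                         # FIM (2.17)
beta = 0.95
c = -2.0 * math.log(1.0 - beta)                # beta-quantile of chi-square with 2 d.o.f.

def ellipsoid_covers_truth():
    y = V @ a_true + sigma * rng.standard_normal(len(x))
    a_hat = np.linalg.solve(V.T @ V, V.T @ y)  # least-squares estimate
    d = a_true - a_hat
    return d @ M @ d <= c                      # is a_true inside E(xi_N)?

coverage = np.mean([ellipsoid_covers_truth() for _ in range(4000)])
```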
D-optimality criterion

According to this criterion, our aim is to minimize the volume of the confidence ellipsoid of the parameter estimates. As is known (see [149]), this volume is proportional (denoted as "∼") to the square root of the determinant of the covariance matrix, i. e.,
$$\mathrm{Vol}(\mathcal{E}(\xi_N)) \sim \left(\det[M^{-1}(\xi_N)]\right)^{1/2} = \left(\det[M(\xi_N)]\right)^{-1/2}. \tag{2.19}$$
In (2.19) we have expressed the volume of (2.18) (up to a proportionality factor) in two equivalent ways. As one can notice, minimizing the volume of the confidence ellipsoid is equivalent to minimizing $\det[M^{-1}(\xi_N)]$ or to maximizing $\det[M(\xi_N)]$ (or $\log\det[M(\xi_N)]$) with respect to $\xi_N$.

A-optimality criterion

According to the A-optimality criterion, we are looking for $\min_{\xi_N} r^{-1}\, \mathrm{tr}[M^{-1}(\xi_N)]$, where $\mathrm{tr}[A]$ stands for the trace of a symmetric matrix $A$. Let $\lambda_j[A]$ denote its eigenvalues. Then, one can express $r^{-1}\, \mathrm{tr}[M^{-1}(\xi_N)]$ as follows:
$$r^{-1}\, \mathrm{tr}[M^{-1}(\xi_N)] = r^{-1} \sum_{j=1}^{r} \lambda_j[M^{-1}(\xi_N)]. \tag{2.20}$$
Hence, minimizing $r^{-1}\, \mathrm{tr}[M^{-1}(\xi_N)]$ is equivalent to minimizing the mean squared length of the semi-axes of the confidence ellipsoid, since these squared lengths are proportional to the eigenvalues of $M^{-1}(\xi_N)$. Another interpretation: $r^{-1}\, \mathrm{tr}[M^{-1}(\xi_N)]$ is the averaged variance of the parameter estimators.

Other design optimality criteria in the parameter space

Generalizations of the above criteria are the following:
– E-optimality criterion: $\min_{\xi_N} \max_j \lambda_j[M^{-1}(\xi_N)]$, i. e., we minimize the longest axis of the confidence ellipsoid,
– L-optimality criterion: $\min_{\xi_N} r^{-1}\, \mathrm{tr}[W M^{-1}(\xi_N)]$, where $W$ is a nonnegative definite matrix of weights,
– $L_p$-optimality criterion: $\min_{\xi_N} \left(r^{-1}\, \mathrm{tr}[M^{-p}(\xi_N)]\right)^{1/p}$.

Note that the $L_p$-optimality criterion absorbs several others as special cases, namely:
1) for $p = 1$ we obtain the A-optimality criterion,
2) for $p \to 0^+$ the $L_p$ criterion converges to $(\det[M^{-1}(\xi_N)])^{1/r}$, i. e., to the D-optimality criterion,
3) for $p \to \infty$ the $L_p$ criterion converges to the E-optimality criterion.
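All the parameter-space criteria above are functions of the eigenvalues of $M^{-1}(\xi_N)$. A minimal sketch (numpy assumed; the helper name `criteria` is hypothetical) computes them from a given FIM and lets one watch the limits of the $L_p$ criterion numerically. It uses the normalized form $(r^{-1}\mathrm{tr}[M^{-p}])^{1/p}$, for which $p = 1$ gives exactly the A-criterion.

```python
import numpy as np

def criteria(M, p=None):
    """Parameter-space design criteria (all to be minimized) computed from the FIM M."""
    lam = np.linalg.eigvalsh(np.linalg.inv(M))      # eigenvalues of M^{-1}
    out = {
        "D": float(np.prod(lam)),                   # det M^{-1}
        "A": float(lam.mean()),                     # r^{-1} tr M^{-1}
        "E": float(lam.max()),                      # largest eigenvalue of M^{-1}
    }
    if p is not None:
        # eigenvalues of M^{-p} are lam**p, so this is (r^{-1} tr M^{-p})^{1/p}
        out["Lp"] = float(np.mean(lam ** p) ** (1.0 / p))
    return out

M = np.diag([1.0, 4.0])
for p in (1e-6, 1.0, 200.0):
    print(p, criteria(M, p)["Lp"])
```

For this M, D = 0.25, A = 0.625 and E = 1; the printed $L_p$ values move from about 0.5 = (det M^{-1})^{1/2} for tiny p, through 0.625 at p = 1, toward 1 for large p.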
Experiment design optimality criteria minimizing the prediction error

The above criteria are based on quantifying the accuracy of parameter estimates. One can also select designs that are best from the viewpoint of the prediction error, namely:
– G-optimality criterion: $\min_{\xi_N} \max_{x \in X} \mathrm{Var}(\hat y(x))$, where $\hat y(x) = \hat a^T v(x)$ is the prediction at $x$,
– Q-optimality criterion: $\min_{\xi_N} \int_X \mathrm{Var}(\hat y(x))\, dx$ (in fact, this criterion is equivalent to the L-optimality one with $W = \int_X v(x)\, v^T(x)\, dx$).

Our focus is on the D- and G-optimality criteria. The reason is not only that they are the most frequently used and best investigated, but also that they have important invariance properties, which are not shared by other criteria. We shall discuss these invariance properties later.

Even more general design optimality criteria

Fedorov introduced an even more general class of optimality criteria, called Φ-optimality. It includes all the above criteria defined in the parameter space as special cases, since only mild assumptions concerning the preservation of the Loewner ordering are imposed on them (see [120] for a detailed analysis of these criteria).

Optimal designs of experiments – simple examples

Below we provide simple examples of optimal experiment designs. The reasons for their optimality will become apparent later.
(1) Linear regression: $y = a_1 + a_2 x$, $x \in [-1, 1]$. D-optimal design (for even $N$):
– $N/2$ observations at $x_1 = -1$,
– $N/2$ observations at $x_2 = +1$.
(2) Quadratic regression: $y = a_1 + a_2 x + a_3 x^2$, $x \in [-1, 1]$.
(a) D-optimal design (for $N$ divisible by 3): put $N/3$ observations at each of $x_1 = -1$, $x_2 = 0$, $x_3 = +1$.
(b) A-optimal design (for $N$ divisible by 4): put $N/4$ observations at each of $x_1 = -1$ and $x_2 = +1$, and $N/2$ observations at $x_3 = 0$.

Discussion
– What about optimal designs for linear and quadratic regression when $N$ is not divisible by 2, 3, etc.?
– Up to now, we have considered exact (or discrete) optimal designs.
– In general, searching for discrete optimal designs is a difficult optimization problem that nowadays can be solved when $N$ is not too large and the number of inputs $s$ is moderate.
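The two quadratic-regression designs listed above can be recovered by a brute-force search over symmetric designs on {−1, 0, 1} with weight w at each end point and 1 − 2w at the centre. The sketch below (numpy assumed; the grid search and the name `quad_fim` are purely illustrative, not a recommended algorithm) maximizes det M and minimizes tr M^{-1} over w:

```python
import numpy as np

def quad_fim(w):
    """Normalized FIM for quadratic regression with weights w, 1 - 2w, w at -1, 0, 1."""
    pts, wts = [-1.0, 0.0, 1.0], [w, 1.0 - 2.0 * w, w]
    return sum(p * np.outer([1, x, x * x], [1, x, x * x]) for x, p in zip(pts, wts))

grid = np.linspace(0.01, 0.49, 481)                              # step 0.001
d_vals = [np.linalg.det(quad_fim(w)) for w in grid]              # D: maximize det M
a_vals = [np.trace(np.linalg.inv(quad_fim(w))) for w in grid]    # A: minimize tr M^{-1}
w_D = grid[int(np.argmax(d_vals))]
w_A = grid[int(np.argmin(a_vals))]
```

Here det M = 4w²(1 − 2w) is maximized at w = 1/3 (equal weights), while tr M^{-1} is minimized at w = 1/4, i.e., N/4, N/2, N/4 observations, in agreement with (a) and (b).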
2.4 Approximate experiment designs

In this section we introduce the important notion of approximate experiment designs and provide their basic properties.

Approximate designs of experiments – definition

To overcome these difficulties, the notion of approximate (or continuous) designs was introduced in the 1950s. Up to now, we have considered a sequence $\{x_1, x_2, \ldots, x_N\}$, possibly with repeated points, as a discrete (exact) experiment design. As the first step toward the notion of an approximate experiment design, let us rewrite an exact (discrete) design in the following equivalent way:
$$\begin{bmatrix} x_1 & x_2 & \ldots & x_m \\ p_1 & p_2 & \ldots & p_m \end{bmatrix}, \tag{2.21}$$
where $m \leq N$ is the number of distinct points among $x_1, x_2, \ldots, x_N$. Without loss of generality, we can assume that these distinct points have indices from 1 to $m$. By $n_i$ we denote the number of repetitions of $x_i$, $i = 1, 2, \ldots, m$, in the sequence $x_1, x_2, \ldots, x_N$. Clearly, $\sum_{i=1}^{m} n_i = N$. In (2.21), the $p_i$'s are frequencies defined as $p_i = n_i/N$, $i = 1, 2, \ldots, m$. Hence, we have
$$p_i \geq 0, \qquad \sum_{i=1}^{m} p_i = 1. \tag{2.22}$$
Note that when (2.21) and $N$ are given, one can easily recover the whole sequence $x_1, x_2, \ldots, x_N$ (up to ordering), because the $N \cdot p_i$'s are nonnegative integers. Approximate (or continuous) experiment designs are defined similarly to (2.21), with only one, but important, exception, namely, $N \cdot p_i$, $i = 1, 2, \ldots, m$, are no longer required to be integers (we allow, e. g., 3.33 observations at $x_1 = 1$). This time, having (2.21) and a selected $N$, we can "recover" a realizable experiment design only approximately, by attaching $\mathrm{Round}[N p_i]$ observations to $x_i$, $i = 1, 2, \ldots, m$. Note that the $\mathrm{Round}[\cdot]$ operation must be performed in such a way that $\sum_{i=1}^{m} \mathrm{Round}[N p_i] = N$. For large $N$ this approximation is quite good (and can be very good when properly done [120]). As we shall see later, the notion of approximate experiment design leads to a beautiful and useful theory and to much more efficient optimization algorithms.
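One simple way to implement a Round[·] operation whose results sum to N is largest-remainder rounding: take the integer parts of N p_i and hand out the missing observations to the points with the largest fractional parts. The sketch below (numpy assumed; `round_design` is a hypothetical name) is only this simple rule, not the efficient rounding procedure referred to in [120], and it assumes the weights already sum to one.

```python
import numpy as np

def round_design(weights, N):
    """Turn approximate weights p_i (summing to 1) into integer counts n_i with sum n_i = N.

    Largest-remainder rounding: floor(N * p_i) first, then the leftover observations
    go to the points with the largest fractional parts (ties broken by position).
    """
    target = N * np.asarray(weights, dtype=float)
    counts = np.floor(target).astype(int)
    leftover = N - counts.sum()
    order = np.argsort(-(target - counts), kind="stable")   # largest fractional parts first
    counts[order[:leftover]] += 1
    return counts

print(round_design([1/3, 1/3, 1/3], 10))   # 10 observations spread over three points
```

For three equal weights and N = 10 the counts are 4, 3, 3, i.e., the 3.33 observations per point of the approximate design cannot be matched exactly, which is the kind of discrepancy that becomes negligible as N grows.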
Definition 2. Let a compact (i. e., closed and bounded) set $X \subset \mathbb{R}^s$, interpreted as an experiment domain, be given. Then the table
$$\xi = \begin{bmatrix} x_1 & x_2 & \ldots & x_m \\ p_1 & p_2 & \ldots & p_m \end{bmatrix}, \qquad p_i \geq 0, \quad \sum_{i=1}^{m} p_i = 1, \tag{2.23}$$
such that $x_i \in X$, $i = 1, 2, \ldots, m$, is called an approximate design of experiment.

As in (2.21), for a given $p_i \in \mathbb{R}_+$, $\hat n_i = N \cdot p_i$ is interpreted as the approximate number of observations at $x_i$, assuming that we are allowed to perform $N \geq m$ experiments. Again, the $\hat n_i$'s are rounded in such a way that $\sum_{i=1}^{m} \hat n_i = N$. The points $x_1, x_2, \ldots, x_m$ are called the support of $\xi$, $\mathrm{supp}(\xi)$ in short; the $p_i$'s are called the weights of $\xi$. Note that the number of support points $m$ may be different for different $\xi$, but we do not display $m$ in the notation. Further on, we call (2.23) an experiment design, skipping the adjective "approximate" unless necessary. Note also that in older papers approximate experiment designs were called "continuous designs." The meaning was exactly the same as above; the adjective "continuous" referred only to the possibility of varying the $p_i$'s continuously in $[0, 1]$, without any other meaning of the notion of continuity.

The set of all experiment designs over X

The set of all approximate designs of the form (2.23) over a given compact set $X \subset \mathbb{R}^s$ will be denoted by $\Xi(X)$. Note that we allow $m$ to be different for different $\xi \in \Xi(X)$. For $0 \leq \alpha \leq 1$ and a pair of designs $\xi, \tau \in \Xi(X)$, define their convex combination
$$\xi(\alpha) = (1 - \alpha)\, \xi + \alpha\, \tau \tag{2.24}$$
by executing the following steps:
(1) form the support of $\xi(\alpha)$ as $\mathrm{supp}(\xi) \cup \mathrm{supp}(\tau)$,
(2) multiply the weights of $\xi$ and $\tau$ by $(1 - \alpha)$ and $\alpha$, respectively,
(3) associate the weights from step (2) with the corresponding support points obtained in step (1), summing up (if necessary) the weights of points in $\mathrm{supp}(\xi) \cap \mathrm{supp}(\tau)$.

Example 3 (Convex combination of designs). In order to illustrate the notion of the convex combination of designs, consider the following two designs:
$$\xi = \begin{bmatrix} -1 & 0 & 1 \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}, \qquad \tau = \begin{bmatrix} 0 & 1 \\ \frac{1}{3} & \frac{2}{3} \end{bmatrix}.$$
Then $\xi(\alpha) = (1 - \alpha)\, \xi + \alpha\, \tau$, $0 < \alpha < 1$, is the following:
$$\xi(\alpha) = \begin{bmatrix} -1 & 0 & 1 \\ \frac{1 - \alpha}{3} & \frac{1}{3} & \frac{1 + \alpha}{3} \end{bmatrix}.$$
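Steps (1)–(3) translate directly into code if a design is stored as a mapping from support points to weights. A minimal sketch (the dictionary representation and the name `combine` are our own choices, not the book's), using exact fractions and the designs of Example 3, with τ putting weight 1/3 at 0 and 2/3 at 1:

```python
from fractions import Fraction as F

def combine(xi, tau, alpha):
    """Convex combination (1 - alpha) * xi + alpha * tau of designs stored as {point: weight}."""
    out = {}
    for design, factor in ((xi, 1 - alpha), (tau, alpha)):
        for x, p in design.items():
            # steps (1)-(3): union of supports, scaled weights, merged on common points
            out[x] = out.get(x, 0) + factor * p
    return out

xi = {-1: F(1, 3), 0: F(1, 3), 1: F(1, 3)}
tau = {0: F(1, 3), 1: F(2, 3)}
half = combine(xi, tau, F(1, 2))   # weights (1 - a)/3, 1/3, (1 + a)/3 at a = 1/2
```

The same function covers Example 4: combining ξ with a one-point design {κ: 1} adds κ to the support with weight α, or merges the weights when κ is already a support point.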
Example 4 (Important special case). Let $\xi$ be as above, but let $\tau$ be a one-point design, i. e., all the observations are to be performed at a point $\kappa \in X$, say; formally, $\tau = \begin{bmatrix} \kappa \\ 1 \end{bmatrix}$. Then $\xi(\alpha) = (1 - \alpha)\, \xi + \alpha\, \tau$ has the form
$$\xi(\alpha) = \begin{bmatrix} -1 & 0 & 1 & \kappa \\ \frac{1 - \alpha}{3} & \frac{1 - \alpha}{3} & \frac{1 - \alpha}{3} & \alpha \end{bmatrix},$$
with the weights merged if $\kappa$ is one of $-1, 0, 1$. Further on, one-point designs, obtained by putting the unit weight at the design point $\kappa$, will be denoted by $\delta(\kappa)$.

The following result, although simple, is crucial for further considerations.

Proposition 1 (Convexity of Ξ(X)). The set $\Xi(X)$ is convex, i. e., for every $\xi, \tau \in \Xi(X)$ and $0 \leq \alpha \leq 1$ also $(1 - \alpha)\, \xi + \alpha\, \tau \in \Xi(X)$.

We already know that the inverse of the Fisher information matrix determines the accuracy of parameter estimates of a linear regression spanned by $v(x)$. For approximate designs it is convenient to introduce its normalized version.

Definition 3 (Normalized information matrix). The normalized information matrix for a design $\xi \in \Xi(X)$ is defined as follows:
$$M(\xi) = \sigma^{-2} \sum_{i=1}^{m} p_i\, v(x_i)\, v^T(x_i), \tag{2.25}$$
which is an $r \times r$ matrix, where $r = \dim(v(x))$. Note that $M(\xi)$ is symmetric and nonnegative definite. If it is positive definite, then $M^{-1}(\xi)$ is directly proportional to the covariance matrix of the parameter estimates, the proportionality factor being $1/N$. This interpretation is exact if the $N p_i$'s are integers and approximate otherwise. Later on, we set $\sigma = 1$ in (2.25), since this constant factor does not influence the optimal design.

Definition 4 (Attainable information matrices). For a given regression model, spanned by $v(x)$ in a design region $X$, we define the set $\mathcal{M}$ of all attainable information matrices as
$$\mathcal{M} = \{M(\xi) : \xi \in \Xi(X)\}. \tag{2.26}$$
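Because (2.25) is linear in the weights, mixing two designs as in (2.24) mixes their information matrices in the same proportions; this is the mechanism behind the convexity of $\mathcal{M}$ stated in Theorem 3 below. A short numerical check, assuming numpy, σ = 1, quadratic regression, and a hypothetical helper `nfim`:

```python
import numpy as np

def nfim(design, v):
    """Normalized information matrix (2.25) with sigma = 1; design is {point: weight}."""
    return sum(p * np.outer(v(x), v(x)) for x, p in design.items())

v = lambda x: np.array([1.0, x, x * x])        # quadratic regression, r = 3
xi = {-1.0: 1/3, 0.0: 1/3, 1.0: 1/3}
tau = {0.5: 1.0}                               # one-point design delta(0.5)
alpha = 0.3

# Convex combination of the designs, built point by point as in steps (1)-(3).
mixed = {x: (1 - alpha) * xi.get(x, 0.0) + alpha * tau.get(x, 0.0)
         for x in set(xi) | set(tau)}
lhs = nfim(mixed, v)                                   # M((1 - alpha) xi + alpha tau)
rhs = (1 - alpha) * nfim(xi, v) + alpha * nfim(tau, v)  # (1 - alpha) M(xi) + alpha M(tau)
```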
The following two results are of special importance in optimal experiment design (see [46, 113], or [8] for their proofs).

Theorem 2. Let $X \subset \mathbb{R}^s$ be a compact set. Let also the functions $v(x)$ spanning a regression function be continuous in $X$. Then $\mathcal{M}$ is compact in the space of all $r \times r$ matrices endowed with the topology induced by the Euclidean matrix scalar product $\langle A, B \rangle = \mathrm{tr}(A^T B)$.
a) For every $M' \in \mathcal{M}$ one can find a design $\xi \in \Xi(X)$ whose support comprises not more than $r(r + 1)/2 + 1$ points and such that $M' = M(\xi)$.
b) If, additionally, $M'$ is a boundary element of $\mathcal{M}$, then the support of $\xi$ contains at most $r(r + 1)/2$ points.

Theorem 3. $\mathcal{M}$ is a convex set, i. e., for $0 < \alpha < 1$ and $M_1, M_2 \in \mathcal{M}$ also $(1 - \alpha)\, M_1 + \alpha\, M_2 \in \mathcal{M}$.

The compactness and convexity of $\mathcal{M}$ make the theory of optimal designs particularly elegant. The compactness of $\mathcal{M}$ and the continuity of design criteria imply the existence of optimal designs. The convexity of $\mathcal{M}$ and the strict convexity of a selected design optimality criterion imply that necessary design optimality conditions are also sufficient.

Remark 7. From Theorem 2 it follows that it suffices to confine the search for optimal designs to those that have at most $r(r + 1)/2 + 1$ points in their support.

Remark 8. One can extend the definition of the normalized FIM to the following one:
$$\int_X v(x)\, v^T(x)\, \mu(dx), \tag{2.27}$$
where $\mu$ is a probability measure over $X$ (see [22]). However, this extension does not provide better experiment designs, since it can be shown (see [46]) that the set of normalized information matrices (2.27) attainable by varying $\mu$ coincides with $\mathcal{M}$ as defined by (2.26).

Example 5. In order to illustrate how complicated the geometry of attainable FIMs can be, let us consider a linear model spanned by $v(x) = [\sin(\pi x), \sin(2\pi x)]^T$, $x \in [0, 1]$, and a family of information matrices of the form
$$M_p = p\, v(x_1)\, v^T(x_1) + (1 - p)\, v(x_2)\, v^T(x_2). \tag{2.28}$$
For illustration purposes, the elements $m_{11}$, $\sqrt{2}\, m_{12}$, and $m_{22}$ of $M_p$ are stacked into a vector and plotted as a function of $x_1, x_2$ for fixed $p = 0.5$ (see Fig. 2.2). In this figure, each point corresponds to one FIM. Note that only a part of $\mathcal{M}$ is plotted in Fig. 2.2, which explains why the plotted set is not convex. Consider the same model, but this time with the weights $p = 0.65$ and $p = 0.75$ (see Fig. 2.3). The strong sensitivity of the shapes to $p$ visible in Fig. 2.3 is noteworthy.
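One reading of the √2 factor in Example 5 (our interpretation, not stated in the text): the vector $(m_{11}, \sqrt{2}\, m_{12}, m_{22})$ has the same Euclidean norm as the symmetric matrix $M_p$ in the Frobenius norm induced by $\langle A, B \rangle = \mathrm{tr}(A^T B)$ from Theorem 2, so distances between plotted points match distances between the matrices. A sketch generating the point cloud for p = 0.5 (numpy assumed; plotting omitted; `fim_point` is a hypothetical name):

```python
import numpy as np

def v(x):
    return np.array([np.sin(np.pi * x), np.sin(2 * np.pi * x)])

def fim_point(x1, x2, p):
    """Return M_p from (2.28) and its image (m11, sqrt(2) m12, m22) in R^3."""
    M = p * np.outer(v(x1), v(x1)) + (1 - p) * np.outer(v(x2), v(x2))
    return M, np.array([M[0, 0], np.sqrt(2.0) * M[0, 1], M[1, 1]])

grid = np.linspace(0.0, 1.0, 41)
cloud = np.array([fim_point(a, b, 0.5)[1] for a in grid for b in grid])   # 41 * 41 points
```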
Figure 2.2: Attainable FIM plotted as vectors in Example 5 for p = 0.5.
Figure 2.3: Attainable FIM plotted as vectors in Example 5 for p = 0.65 (left panel) and p = 0.75 (right panel).
2.5 Optimality conditions – the Kiefer and Wolfowitz theorem

The Kiefer–Wolfowitz theorem on the equivalence of approximate designs that are optimal with respect to the D- and G-optimality criteria (see [80]) was a milestone in the development of optimal experiment design.

Theorem 4 (Kiefer and Wolfowitz equivalence theorem). Assume that the functions $v(x)$ spanning a linear model are continuous in a compact domain of experiment $X \subset \mathbb{R}^s$ and that there exists $\xi \in \Xi(X)$ such that $\det[M(\xi)] > 0$. Then the following conditions are equivalent:
1. Design $\xi^\star$ is D-optimal.
2. Design $\xi^\star$ is G-optimal.
3. For $\xi^\star$ the following condition holds:
$$\max_{x \in X} \phi(\xi^\star, x) = r, \tag{2.29}$$
where $\phi(\xi, x) \stackrel{\mathrm{def}}{=} v^T(x)\, M^{-1}(\xi)\, v(x)$ is the normalized variance of the linear model prediction when a design $\xi \in \Xi(X)$ with $\det[M(\xi)] > 0$ is applied.
Additionally, the maximum in (2.29) is attained at all the points of the support of $\xi^\star$, i. e., for each point $x_1^\star, x_2^\star, \ldots, x_m^\star$ we have $\phi(\xi^\star, x_i^\star) = r$.

Later on, we refer to Theorem 4 as the K-W theorem. The equivalence theorem deserves several explanations.
– A D-optimal design $\xi^\star$ provides the smallest volume of the uncertainty ellipsoid and simultaneously minimizes the worst-case prediction variance, i. e., it is also G-optimal, and vice versa.
– Condition (2.29) allows us to check the D- and G-optimality of any design $\xi \in \Xi(X)$ with a nonsingular FIM and (in many cases) provides hints for finding the optimal design analytically.
– It serves as a building block for constructing numerical search procedures.

In the examples presented below, the D- and G-optimality of the designs can easily be verified using the K-W theorem.

Example 6. Consider the quadratic regression $a_1 + a_2 x + a_3 x^2$, $x \in [-1, 1]$. The design
$$\xi^* = \begin{bmatrix} -1 & 0 & 1 \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix} \tag{2.30}$$
is the D- and G-optimal one. Consider also the following nonoptimal design:
$$\xi_w = \begin{bmatrix} -1 & 0 & 1 \\ \frac{1}{5} & \frac{2}{5} & \frac{2}{5} \end{bmatrix}. \tag{2.31}$$
Figure 2.4: Prediction variances of two designs in Example 6. Thick line – D-optimal design (2.30). Thin line – prediction variance for nonoptimal design (2.31).
Figure 2.5: Prediction variance for x ∈ [−1, 1] when the quadratic regression is estimated by the family of designs described by (2.32). The curve in the bottom right panel corresponds to the optimal design.
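Condition (2.29) is easy to check numerically. The sketch below (numpy assumed, σ = 1, hypothetical helper `phi`) evaluates φ(ξ, x) on a grid for designs (2.30) and (2.31). When a design has exactly r support points, φ(ξ, x_i) = 1/p_i at each support point, so for (2.31) one gets φ(ξ_w, −1) = 5 instead of 3; this is the 66 % loss at x = −1 discussed below.

```python
import numpy as np

def v(x):
    return np.array([1.0, x, x * x])            # quadratic regression, r = 3

def phi(design, xs):
    """Normalized prediction variance phi(xi, x) = v(x)^T M^{-1}(xi) v(x) with sigma = 1."""
    M = sum(p * np.outer(v(x), v(x)) for x, p in design.items())
    Minv = np.linalg.inv(M)
    return np.array([v(x) @ Minv @ v(x) for x in xs])

xs = np.linspace(-1.0, 1.0, 2001)
xi_star = {-1.0: 1/3, 0.0: 1/3, 1.0: 1/3}        # design (2.30)
xi_w = {-1.0: 1/5, 0.0: 2/5, 1.0: 2/5}           # design (2.31)
phi_star, phi_w = phi(xi_star, xs), phi(xi_w, xs)
print(phi_star.max(), phi_w.max())               # K-W: the first maximum should equal r = 3
```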
Designs (2.30) and (2.31) have the same support, but design (2.31) has badly selected weights. Analysis of Fig. 2.4 reveals that at $x = -1$ the prediction variance of design (2.31) is about 66 % larger than that of the D- and G-optimal design.

Example 7. We continue our example of estimating the quadratic regression, but this time our aim is to check how much we lose when a design support point is improperly selected. This time, (2.30) is compared with the following family of designs:
$$\xi_w = \begin{bmatrix} -1 & x_{\mathrm{mid}} & 1 \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}, \tag{2.32}$$
which have proper weights and end points, while the middle point varies. As one can observe from Fig. 2.5, if $x_{\mathrm{mid}}$ is far from the optimal position, we can lose as much as a factor of 7 in efficiency. On the other hand, if $x_{\mathrm{mid}}$ is not so far from its optimal position, our losses are not so large (compare the maxima in the two lower panels).

Example 8. Consider the regression function spanned by $[1, x, \ldots, x^9]^T$ and the experiment design in Table 2.1. This design consists of 11 support points equidistantly placed in $[-1, 1]$, with additional weight placed at and near the ends of the interval. The corresponding prediction variance is plotted in Fig. 2.6. For the regression function spanned by $[1, x, \ldots, x^9]^T$ and a G-optimal design, the largest prediction variance should be equal to 10, while Fig. 2.6 shows that for the design in Table 2.1 the largest prediction variance exceeds 100, i. e., it is about 10 times larger. This indicates that designs based on equidistant support points are not a good choice for estimating higher-degree polynomial regressions. We describe the D- and G-optimal design for this case in Theorem 5.

Table 2.1: Design of the experiment discussed in Example 8 (the points −1 and 1 are listed twice; their weights add up).
supp: −1, −0.8, −0.6, −0.4, −0.2, 0, 0.2, 0.4, 0.6, 0.8, 1, −1, 0.95, 1, −0.9
pi: 0.00869318 (for each of the first 11 points), 0.286875, 0.0675, 0.45, 0.1
Figure 2.6: Plot of the prediction variance when a ninth-degree polynomial is estimated using the design in Table 2.1.
Theorem 5. Consider the polynomial regression of the form
$$\bar y(x) = \sum_{k=1}^{r} a^{(k)} x^{k-1}, \tag{2.33}$$
which is estimated from observations in $X = [-1, 1]$ with a constant variance. Then the D-optimal design is comprised of $r$ points, which are the roots of the following equation:
$$(1 - x^2)\, \frac{d P_{r-1}(x)}{dx} = 0, \tag{2.34}$$
where $P_k(x)$ is the Legendre polynomial of the $k$-th order. The weights of this design are equal to $1/r$.

The proof of this theorem can be found in [78]. This theorem characterizes D-optimal designs precisely enough that we are able to calculate them almost explicitly. In Table 2.2 one can find their support points for polynomials up to the ninth degree, and their distribution is shown in Fig. 2.7. Note that the support points group near the end points as the polynomial degree grows. This phenomenon is known as the arc-sine law when the degree of the polynomial regression grows to infinity.

Table 2.2: Support points of D-optimal designs – polynomial regressions (positive half of the support; the designs are symmetric about 0).
r − 1 = 2: 1, 0
r − 1 = 3: 1, 0.447
r − 1 = 4: 1, 0.654, 0
r − 1 = 5: 1, 0.765, 0.285
r − 1 = 6: 1, 0.830, 0.469, 0
r − 1 = 7: 1, 0.872, 0.592, 0.209
r − 1 = 8: 1, 0.899, 0.677, 0.363, 0
r − 1 = 9: 1, 0.919, 0.739, 0.478, 0.165

Figure 2.7: D-optimal designs for polynomials: the y-axis indicates the polynomial degree, the x-axis shows positions of the design support points.

Clearly, the K-W theorem also applies to multivariate models. Below, we provide two simple examples.

Example 9 (2D regression). This time we consider a bivariate linear regression function spanned by
$$[1, \sin(2\pi x^{(1)}), \sin(2\pi x^{(2)}), \sin(2\pi x^{(1)} + x^{(2)})]^T, \tag{2.35}$$
which is estimated from observations at ten points drawn at random from the uniform distribution on the unit square. The corresponding prediction variance is plotted in Fig. 2.8. Again, it is clear that such a design is not a good choice, because
Figure 2.8: The prediction variance of estimating a regression function spanned by (2.35) using ten points drawn from the uniform distribution.
the largest variance exceeds 10, while for the G-optimal design it would be 4. We shall return to experiment designs for estimating trigonometric regression functions in the next chapter.

Example 10 (Linear multivariate regression). Consider the simplest multivariate regression model:
$$\bar y(x) = a^{(0)} + \sum_{k=1}^{s} a^{(k)} x^{(k)} \tag{2.36}$$
on the $s$-dimensional cube $[-1, 1]^s = \underbrace{[-1, 1] \times \cdots \times [-1, 1]}_{s\ \text{times}}$, $s \geq 1$. Using the K-W theorem, it is easy to prove that the D- and G-optimal design is supported at all the vertices of $[-1, 1]^s$, attaching equal weights $1/2^s$ to each of them. This means that the classic full-factorial experiments (on two levels) are D- and G-optimal.
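Theorem 5 and Example 10 can both be checked against the K-W condition (2.29). The sketch below assumes numpy, including its `numpy.polynomial.legendre` module (`legder`, `legroots`); the helper names are hypothetical. It computes the support of Theorem 5 as the roots of $(1 - x^2)P'_{r-1}(x)$, compares it with Table 2.2, and evaluates max φ on a grid for the equal-weight designs of Theorem 5 (r = 10) and of Example 10 (s = 3, vertices of the cube).

```python
import itertools
import numpy as np
from numpy.polynomial import legendre as L

def poly_dopt_support(r):
    """Roots of (1 - x^2) P'_{r-1}(x), i.e., the D-optimal support of Theorem 5."""
    dP = L.legder(np.eye(r)[r - 1])           # Legendre-series coefficients of P'_{r-1}
    interior = np.real(L.legroots(dP))        # empty for r = 2
    return np.sort(np.concatenate(([-1.0, 1.0], interior)))

def max_phi(points, basis, grid):
    """max over grid of phi(xi, x) = v(x)^T M^{-1} v(x) for equal weights on `points`."""
    V = np.array([basis(x) for x in points])
    Minv = np.linalg.inv(V.T @ V / len(points))
    G = np.array([basis(x) for x in grid])
    return float(np.max(np.einsum("ij,jk,ik->i", G, Minv, G)))

# Theorem 5 with r = 10 (ninth-degree polynomial): K-W predicts max phi = r.
r = 10
support = poly_dopt_support(r)
mono = lambda x: np.array([x ** k for k in range(r)])
phi_poly = max_phi(support, mono, np.linspace(-1.0, 1.0, 4001))

# Example 10 with s = 3: vertices of [-1, 1]^3 and v(x) = [1, x1, x2, x3]^T, so r = 4.
vertices = list(itertools.product([-1.0, 1.0], repeat=3))
lin = lambda x: np.concatenate(([1.0], np.asarray(x)))
cube_grid = list(itertools.product(np.linspace(-1.0, 1.0, 9), repeat=3))
phi_cube = max_phi(vertices, lin, cube_grid)
```

For the cube the FIM of the vertex design is the identity, so φ(x) = 1 + |x|², whose maximum over the cube is 1 + s = r, attained exactly at the vertices.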
2.6 Case studies – sensor allocation

Consider the heat conduction equation in the steady state defined along a ring (thin torus):
$$A_x\, q(x) = -u(x), \tag{2.37}$$
where, for twice continuously differentiable functions $q(\cdot)$ on $[0, 1]$, the operator $A_x$ is defined as follows:
$$A_x\, q(x) \stackrel{\mathrm{def}}{=} a_1 \frac{d^2 q(x)}{dx^2} - a_2\, q(x), \qquad x \in (0, 1), \tag{2.38}$$
with the boundary conditions $q(0) = q(1)$, $q'(0) = q'(1)$.

Example 11. Problem statement: the distribution of heat sources $u(x)$ is a continuous function in $[0, 1]$, but it is unknown. Our aim is to estimate it as accurately as possible by selecting a D-optimal approximate experiment design for the following observations:
$$y_i = q(x_i) + \epsilon_i, \qquad i = 1, 2, \ldots, N, \tag{2.39}$$
with random errors satisfying $E[\epsilon_i] = 0$, $E[\epsilon_i^2] < \infty$, and $E[\epsilon_i \epsilon_j] = 0$ for $i \neq j$. The parameters $a_1 > 0$ and $a_2$ in this example are assumed to be known.
It is well known that $A_x$ has a countable number of real eigenvalues $\lambda_k$ and eigenfunctions $v_k(x)$ such that $A_x v_k = -\lambda_k v_k$ (see Chapter 6 for explanations). For the periodic boundary conditions above, they are of the form $v_0(x) \equiv 1$, $v_{1s}(x) = \sqrt{2} \sin(2\pi x)$, $v_{1c}(x) = \sqrt{2} \cos(2\pi x)$, $v_{2s}(x) = \sqrt{2} \sin(4\pi x)$, $v_{2c}(x) = \sqrt{2} \cos(4\pi x)$, $\ldots$, while $\lambda_0 = a_2$, $\lambda_{ks} = \lambda_{kc} = 4 a_1 \pi^2 k^2 + a_2$, $k = 1, 2, \ldots$. Note that the functions $v_{ks}(x)$ and $v_{kc}(x)$ form the Fourier basis, in which $u(x)$ can be represented. This leads to the conclusion that the response of system (2.37) has the form
$$q(x) = \alpha_0 \lambda_0^{-1} + \sum_{k=1}^{\hat K} \left( \alpha_k \lambda_{ks}^{-1} v_{ks}(x) + \beta_k \lambda_{kc}^{-1} v_{kc}(x) \right), \tag{2.40}$$
where the unknown $\alpha_k$'s and $\beta_k$'s are the Fourier coefficients of $u$:
$$\alpha_0 = \langle u, v_0 \rangle = \int_0^1 u(x)\, dx, \quad \alpha_k = \langle u, v_{ks} \rangle = \int_0^1 u(x)\, v_{ks}(x)\, dx, \quad \beta_k = \langle u, v_{kc} \rangle = \int_0^1 u(x)\, v_{kc}(x)\, dx.$$
In general, the series (2.40) should be considered for $\hat K \to \infty$, but in practice we truncate it, keeping only several terms, since $\lambda_{ks}^{-1}$ and $\lambda_{kc}^{-1}$ decay very quickly (as $O(k^{-2})$). Thus, later in this example, we assume that $\hat K$ is finite and known. Taking into account (2.40) and (2.39), we use the OLS method for estimating the $\alpha_k$'s and $\beta_k$'s. Denote these estimates by $\hat\alpha_k$ and $\hat\beta_k$. Then, a natural way of estimating $u$ is the following:
$$\hat u(x) = \hat\alpha_0 + \sum_{k=1}^{\hat K} \left( \hat\alpha_k v_{ks}(x) + \hat\beta_k v_{kc}(x) \right). \tag{2.41}$$
One can easily prove that the D-optimal (approximate) experiment design is of the following form: select an even $m$ such that $m \geq 2\hat K + 1$, calculate the design points
$$x_j = \frac{j - 1}{m}, \qquad j = 1, 2, \ldots, m, \tag{2.42}$$
and attach the weights $1/m$ to all of them. Thus, the D-optimal sensor allocation is equidistant along the length of the ring. This is intuitive, but not quite trivial. The FIM of the D-optimal design is diagonal; its diagonal entries are equal when the model is parametrized by the scaled coefficients $\alpha_0 \lambda_0^{-1}$, $\alpha_k \lambda_{ks}^{-1}$, $\beta_k \lambda_{kc}^{-1}$.

Example 12. Consider the same equation (2.37), but instead of a ring, we consider heat conduction in a rod, with the boundary conditions $q'(0) = 0$, $q'(1) = 0$ (zero heat flux at the ends). Then, the eigenvalues and eigenfunctions of $A_x$ are $\lambda_k = a_1 \pi^2 k^2 + a_2$ and $v_k(x) = \cos(k \pi x)$, $k = 0, 1, 2, \ldots$. The procedure for estimating $u$ is almost the same, but the D-optimal sensor allocation is not equidistant. Let $\tau_j$ be the roots of the equation
$$(1 - x^2)\, \frac{d P_{\hat K}(x)}{dx} = 0, \tag{2.43}$$
where $P_{\hat K}(x)$ is the Legendre polynomial of order $\hat K$. Then, the D-optimal design for estimating $u$ consists of $r = \hat K + 1$ points defined by the formula
$$x_j^\star = \arccos(\tau_j)/\pi, \qquad j = 1, 2, \ldots, r, \tag{2.44}$$
and attaches the same weight $1/r$ to each of them. The proof uses the relationship of cosine functions with the Chebyshev polynomials of the first kind, as well as the invariance of the D-optimal design with respect to a change of the domain and a change of parametrization [78]. Examples are shown in Table 2.3 for $\hat K = 4$ and $6$, while the allocation of sensors for $\hat K = 5$, 7, and 9 is sketched in Fig. 2.9.

Table 2.3: D-optimal sensor allocation for a rod with insulated end points.
K̂ = 4: 0, 0.27309, 0.5, 0.72691, 1
K̂ = 6: 0, 0.18834, 0.344614, 0.5, 0.655386, 0.81166, 1

Figure 2.9: D-optimal sensor allocation for a rod with insulated end points. Allocation according to (2.43)–(2.44) for K̂ = 5, 7, and 9.
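A sketch of both sensor-allocation recipes, assuming numpy, σ = 1 and K̂ = 4. For the ring it uses the period-one eigenfunctions 1, √2 sin(2πkx), √2 cos(2πkx) (the ones compatible with q(0) = q(1), q′(0) = q′(1)) and checks that the equidistant design (2.42) with an even m ≥ 2K̂ + 1 gives the identity FIM in the scaled parametrization. For the rod it evaluates (2.43)–(2.44) with numpy's Legendre utilities; the resulting positions agree with Table 2.3 to about three decimal places.

```python
import numpy as np
from numpy.polynomial import legendre as L

K = 4                                             # K-hat, number of harmonics kept

# Ring (Example 11): regressors 1, sqrt(2) sin(2 pi k x), sqrt(2) cos(2 pi k x), k = 1..K.
def ring_basis(x):
    ks = np.arange(1, K + 1)
    return np.concatenate(([1.0], np.sqrt(2) * np.sin(2 * np.pi * ks * x),
                           np.sqrt(2) * np.cos(2 * np.pi * ks * x)))

m = 2 * K + 2                                     # even and >= 2K + 1
xs = np.arange(m) / m                             # design points (2.42)
V = np.array([ring_basis(x) for x in xs])
M_ring = V.T @ V / m                              # equal weights 1/m, sigma = 1

# Rod (Example 12): tau_j are the roots of (1 - x^2) P'_K(x), x_j = arccos(tau_j) / pi.
tau = np.concatenate(([-1.0, 1.0], np.real(L.legroots(L.legder(np.eye(K + 1)[K])))))
x_rod = np.sort(np.arccos(np.clip(tau, -1.0, 1.0)) / np.pi)
```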
Discussion

As one can observe (see Table 2.4), the optimal sensor positions for $\hat K = 4$ are almost equidistant. The largest differences in sensor positions are about 0.02. This may appear negligible, but note that
$$\frac{\det[M_{\mathrm{unif}}]}{\det[M_{\mathrm{opt}}]} \approx 0.9.$$
Thus, we lose about 10 % of the design efficiency by using a very similar, but nonoptimal, sensor allocation, and the loss grows with $\hat K$.

Table 2.4: Comparison of the D-optimal sensor positions for K̂ = 4 modes with the equidistant ones for a rod with insulated end points.
opt.: 0, 0.27, 0.5, 0.727, 1
eqd.: 0, 0.25, 0.5, 0.75, 1
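The determinant ratio quoted in the Discussion can be reproduced under simple assumptions: equal weights, σ = 1, and the rod basis cos(kπx), k = 0, 1, ..., 4. In this sketch the optimal points come from (2.44) with the roots ±1, ±√(3/7), 0 of (1 − x²)P′₄(x); `slogdet` is used only for numerical robustness, and the helper names are hypothetical.

```python
import numpy as np

K = 4

def rod_basis(x):
    return np.cos(np.pi * np.arange(K + 1) * x)       # v_k(x) = cos(k pi x), k = 0..K

def log_det_fim(points):
    V = np.array([rod_basis(x) for x in points])
    return np.linalg.slogdet(V.T @ V / len(points))[1]  # equal weights, sigma = 1

tau = np.sqrt(3.0 / 7.0)                  # nonzero interior roots of P_4'(x) are +-sqrt(3/7)
x_opt = np.arccos([1.0, tau, 0.0, -tau, -1.0]) / np.pi
x_eqd = np.linspace(0.0, 1.0, K + 1)
ratio = np.exp(log_det_fim(x_eqd) - log_det_fim(x_opt))   # det[M_unif] / det[M_opt]
```

The ratio comes out at roughly 0.93, in line with the approximate value 0.9 given above.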
Example 13. We consider the same system (2.37), but this time with Dirichlet boundary conditions, i.e., q(0) = 0, q(1) = 0. Now the eigenfunctions are of the form $v_k(x) = \sin(k\pi x)$, k = 1, 2, …. Again, we are looking for the D-optimal design for estimating the first $\hat K$ coefficients in the Fourier expansion of the unknown u. This time we computed the D-optimal sensor allocation by a numerical search procedure for $\hat K = 5$ (see Table 2.5).

Table 2.5: The D-optimal sensor allocation found by a numerical search procedure for $\hat K = 5$.
x_i    0.14    0.32    0.5    0.68    0.86
p_i    0.2     0.2     0.2    0.2     0.2
Comparing Table 2.5 with the results of the two previous examples, we conclude that the D-optimal sensor allocation depends essentially on the boundary conditions imposed on a given system.
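A numerically found design such as Table 2.5 can be checked without repeating the search by means of the Kiefer–Wolfowitz theorem: for a D-optimal design, $v^T(x) M^{-1}(\xi) v(x)$ does not exceed $r = \hat K$ anywhere on [0, 1]. The sketch below, assuming unit error variance and the rounded positions of Table 2.5, evaluates this function on a fine grid; for a (nearly) D-optimal design its maximum should come out very close to 5.

```python
import numpy as np

K = 5
support = np.array([0.14, 0.32, 0.5, 0.68, 0.86])   # Table 2.5
weights = np.full(K, 0.2)

def v(x):
    """Dirichlet eigenfunctions sin(k*pi*x), k = 1..K, one row per x."""
    return np.sin(np.pi * np.outer(np.atleast_1d(x), np.arange(1, K + 1)))

V = v(support)
M = V.T @ (weights[:, None] * V)                      # information matrix, sigma^2 = 1
grid = np.linspace(0.0, 1.0, 2001)
G = v(grid)
phi = np.einsum("ij,ij->i", G @ np.linalg.inv(M), G)  # v(x)^T M^{-1} v(x) on the grid
print(f"max phi = {phi.max():.4f} at x = {grid[phi.argmax()]:.4f} (r = {K})")
```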
Example 14. Consider again the system $A_x q(x) = -u(x)$, $x \in (0, 1)$, where

$$ A_x q(x) = a_1 \frac{d^4 q(x)}{dx^4} - m\,\omega^2 q(x) \tag{2.45} $$

is a fourth-order operator with boundary conditions $q'(0) = q'(1) = 0$, $q'''(0) = q'''(1) = 0$. Such operators appear in mechanical systems describing the bending of a beam of (known) mass m under harmonic excitation of (known) frequency ω, assuming the Euler–Bernoulli model. For estimating the Fourier coefficients $\alpha_k$ of u(x) we use the OLS method based on observations (2.39). The system response, truncated to $\hat K$ terms, is again of the form

$$ q(x) = \sum_{k=0}^{\hat K} \alpha_k \lambda_k^{-1} v_k(x), \tag{2.46} $$

where the eigenfunctions are $v_k(x) = \cos(k\pi x)$, while the eigenvalues are of the form $\lambda_k = \pi^4 k^4 a_1 + m\,\omega^2$, since in this book the eigenvalues of partial differential operators are defined by $A_x v(x) = -\lambda v(x)$ (see Chapter 6 for more detailed explanations). Clearly, the D-optimal sensor allocation is the same as in Example 12. Note, however, that this time it is more difficult than in Example 12 to estimate the higher-order coefficients accurately, because $\lambda_k^{-1}$ decays much faster.
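Both claims of Example 14 follow from a short calculation. Write $\Lambda = \mathrm{diag}(\lambda_0, \ldots, \lambda_{\hat K})$, $v(x) = [v_0(x), \ldots, v_{\hat K}(x)]^T$, and $M(\xi) = \int v(x) v^T(x)\, \xi(dx)$ for the FIM of Example 12 (unit error variance assumed). The regressors in (2.46) are $\Lambda^{-1} v(x)$, hence for every design ξ

$$ M_\alpha(\xi) = \Lambda^{-1} M(\xi)\, \Lambda^{-1}, \qquad \det M_\alpha(\xi) = \frac{\det M(\xi)}{\prod_{k=0}^{\hat K} \lambda_k^{2}}, \qquad \left[ M_\alpha^{-1}(\xi) \right]_{kk} = \lambda_k^{2} \left[ M^{-1}(\xi) \right]_{kk}. $$

The constant factor does not change the maximizer of the determinant, so the D-optimal allocation is that of Example 12, while the variance of the estimate of $\alpha_k$ is inflated by $\lambda_k^2$, which grows like $k^8$.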
2.7 Why the D-optimality criterion?
In this section we explain why in the rest of this book we concentrate mainly on D-optimal designs. Recall that our main focus is on experiment design for estimating parameters (or eigenvalues) of systems with spatial or spatio-temporal dynamics. These parameters usually have a physical interpretation. Thus, in particular, optimal designs for estimating them should be invariant with respect to simple (almost trivial) changes of the physical units used to describe physical models (e.g., from grams to kilograms). The following properties of D- and G-optimal approximate designs justify our main focus on them.
1. D- and G-optimal designs are invariant under linear nonsingular reparametrizations (e.g., a change of scale). More precisely, when in a linear model $a^T v(x)$ the parametrization is changed according to $a = R\,b$, where R is an r × r nonsingular matrix, then the D- and G-optimal approximate design for estimating a remains the same for estimating $b \in \mathbb{R}^r$.
2. D- and G-optimal approximate designs are invariant under one-to-one transformations of the domain X. In particular, we can design experiments on a standard domain, e.g., $[-1, 1]^s$, and then rescale the coordinates to a larger cube.
3. D- and G-optimal approximate designs may not be unique, but all of them share the same FIM.
4. If a D-optimal (equivalently, G-optimal) design has exactly r points in its support, then all its weights are equal to 1/r.
An interesting interpretation of the D-optimality criterion can be found in [160]. The Kiefer–Wolfowitz theorem has been generalized to many other criteria, but these are not as transparent to interpret and do not necessarily share the above invariance properties. Properties (1) and (2) are of special importance for systems arising from the laws of physics and chemistry. An additional restriction that we impose on the considered design optimality criteria is that they lead to optimal experiment designs with a nonsingular FIM; otherwise, not all model parameters are identifiable. This excludes the so-called partial D-optimality ($D_s$-optimality) criteria (and many others) from our considerations.
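Property (1) is easy to check numerically. With $a = R\,b$ the regressors become $R^T v(x)$, so $M_b(\xi) = R^T M_a(\xi) R$ and $\det M_b(\xi) = (\det R)^2 \det M_a(\xi)$ for every design, while the variance function $v^T M^{-1} v$ does not change. The sketch below is a minimal check under illustrative assumptions: a hypothetical quadratic regression on [−1, 1], two arbitrary designs, and a random R (nonsingular with probability one).

```python
import numpy as np

def v(x):
    """Hypothetical regression functions v(x) = [1, x, x^2]^T, one row per x."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x**2], axis=1)

def info_matrix(F, w):
    """M(xi) = sum_i w_i f(x_i) f(x_i)^T, where the rows of F are f(x_i)^T."""
    return F.T @ (np.asarray(w)[:, None] * F)

def variance_fn(F, M):
    """Row-wise f(x)^T M^{-1} f(x)."""
    return np.einsum("ij,ij->i", F @ np.linalg.inv(M), F)

rng = np.random.default_rng(0)
R = rng.normal(size=(3, 3))                  # a = R b
x_grid = np.linspace(-1.0, 1.0, 201)
designs = [([-1.0, 0.0, 1.0], [1/3, 1/3, 1/3]),
           ([-1.0, -0.3, 0.5, 1.0], [0.1, 0.4, 0.3, 0.2])]

ratios = []
for pts, w in designs:
    M_a = info_matrix(v(pts), w)             # FIM for a
    M_b = info_matrix(v(pts) @ R, w)         # rows v(x)^T R, i.e. M_b = R^T M_a R
    ratios.append(np.linalg.det(M_b) / np.linalg.det(M_a))
    # the variance function (hence G-optimality) is the same in both parametrizations
    assert np.allclose(variance_fn(v(x_grid), M_a), variance_fn(v(x_grid) @ R, M_b))

print(ratios, np.linalg.det(R) ** 2)         # one common factor for both designs
```

Since the ratio of determinants is the same constant for all designs, the maximizer of the determinant does not depend on the parametrization.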
2.8 Remarks on optimal experiment designs for nonlinear models
Up to now, we have had a methodologically clear situation that can be summarized as follows.
1. A model of the I/O relationship of a certain system is exact in the sense that, if there were no errors in the input and output observations, the model would perfectly describe the system behavior for a certain, but unknown, vector of its parameters $a_0$.
2. The model output is linear with respect to the unknown parameters to be estimated, but it can be nonlinear with respect to its inputs.
3. The model inputs are observed without errors. Furthermore, we are able to select their values and to apply them without errors to the system under consideration.
4. The model output is corrupted by additive random errors that have zero mean values and finite variances (for simplicity the variances are assumed equal, but this assumption can be relaxed).
5. Random errors are uncorrelated (again for the sake of simplicity; this assumption can be relaxed, at least when the correlation structure is known).
6. Under the above assumptions, for a given experiment design, the LSEs of the parameters are the “best” ones in the sense that they are unbiased and have the “smallest” covariance matrix among all linear (in the output observations) unbiased estimators of the unknown parameters. More precisely, for a fixed experiment design, the difference between the covariance matrix of any linear unbiased estimator and that of the LSE is positive semidefinite.
7. In other words, for a fixed experiment design, the LSE is the BLUE of the parameters.
8. If, additionally, the random errors of the output observations are zero-mean and Gaussian, then, for a fixed experiment design, the LSE is not only the BLUE, but also the best among all unbiased estimators of the parameters (including estimators that are nonlinear with respect to the output observations), when the covariance matrices of the estimators are compared (see [149]).
Thus, for models that are linear in parameters and under the above assumptions, including the one that the experiment design is fixed, the LSM provides estimators that cannot be improved within this class under the mean squared error criterion. Hence, further improvements are possible only by selecting an experiment design. From the methodological point of view, it is clear that the situation becomes much more complicated when even one of the above assumptions is violated, in particular the crucial one: the linear dependence of the system output on the unknown parameters. If a system output depends nonlinearly on unknown parameters, we are standing on much less sure methodological ground. The reason is that we do not have a “best” estimation method for a finite number of observations and a fixed experiment design. Furthermore, in general, we are not able to compare estimation methods and experiment designs for a finite number of observations. To be more precise, consider the following model of observations:

$$ y_i = \eta(x_i, a_0) + \varepsilon_i, \qquad i = 1, 2, \ldots, N, \tag{2.47} $$
where the random errors $\varepsilon_i$ satisfy all the above assumptions, including the Gaussian distribution. A model $\eta(x_i, a)$ that is nonlinear in the parameters $a \in \mathbb{R}^r$ is exact in the sense that for a certain (unknown) vector $a_0$ the output observations obey (2.47). Additionally, we assume that for each fixed x the function $\eta(x, a)$ is continuously differentiable with respect to the elements of a in a certain vicinity of $a_0$ and that its derivatives are square integrable. Then, according to the well-known Cramér–Rao lower bound (CRLB), for any unbiased estimator $\tilde a_N$ of $a_0$ we have

$$ \mathrm{Cov}(\tilde a_N) \ge M^{-1}(\xi_N, a_0), \tag{2.48} $$

where the FIM is defined as follows:

$$ M(\xi_N, a_0) = \sigma^{-2} \left[ \sum_{i=1}^{N} \nabla_a \eta(x_i, a)\, \nabla_a \eta^T(x_i, a) \right]_{a = a_0}, \tag{2.49} $$
while the inequality in (2.48) is understood in the sense of the Loewner ordering. Much more advanced inequalities than (2.48) are known (see the important monographs [114, 118]), but in what follows we use the CRLB (2.48)–(2.49), as the simplest one, with a replaced by nominal parameter values. In the case of nonlinear parametrization, we rarely have unbiased estimators of $a_0$ at our disposal. However, we can use asymptotically unbiased estimators, the simplest one being the LSE $\hat a_N$, which solves the following optimization problem:

$$ \min_{a \in A} \sum_{i=1}^{N} \left( y_i - \eta(x_i, a) \right)^2, \tag{2.50} $$
where A is an admissible compact set containing $a_0$. In order to invoke the asymptotic results that support the use of the CRLB as the basis of an experiment design criterion, we need additional assumptions. The first of these concerns the $x_i$'s. The simplest possibility is to assume that the $x_i$'s are generated at random from a certain design measure with a density ξ that is positive over the whole design domain. Such designs are called randomized designs¹ in [118]. An alternative assumption, which can be imposed (see [118]), is to consider $\xi_N$ as a sequence of designs with properly selected, but fixed, support points. Such a sequence is assumed to converge to a design ξ in the sense that the corresponding probabilities converge in the usual sense. In both cases we assume that the design ξ has a nonsingular FIM. The second crucial assumption is estimability (known in the engineering literature as identifiability). Namely, the limiting design ξ is such that

$$ \text{if } \int \left[ \eta(x, a_I) - \eta(x, a_{II}) \right]^2 \xi(dx) = 0, \text{ then } a_I = a_{II}. \tag{2.51} $$
Under the above assumptions and additional assumptions concerning the differentiability of $\eta(x, a)$ (see [118], Chapter 3), the CRLB has the following advantage: as N → ∞, equality holds in (2.48) asymptotically (see [118]). Furthermore, as shown in [118], the sequence of parameter estimates is asymptotically normal with mean $a_0$ and the inverse of the FIM as the covariance matrix. In spite of these good properties, applying (2.48) leads to a fundamental difficulty: the bound depends on the vector $a_0$, which is unknown. Hence, it is not obvious how to use it as the basis for formulating an optimal experiment design criterion. This drawback of the CRLB in the nonlinear case has been known for decades. The following approaches have been proposed to overcome it (see also the extensive discussions in [47, 118, 186]).
Approach 1. In (2.49), use nominal values of the parameters, denoted further by $a_{\mathrm{nom}}$, instead of the unknown $a_0$. In the cases considered in subsequent chapters, this approach seems to be the most appealing, since for systems having a spatio-temporal structure the unknown parameters have an interpretation as physical constants, and their “typical” (nominal) values can be found in tables of physical constants. The role of an experiment is to obtain their values more precisely in the particular case at hand.

¹ Note that this term also has similar, but slightly different, meanings in older books on the design of experiments.

Approach 2. Apply the Bayesian approach, assuming that we have a priori knowledge about the unknown parameters that is expressible as their a priori p.d.f., denoted by $f_a(a)$. Then $f_a(a)$ is used for averaging the D-optimality criterion. In [186] (page 333), the following variants are discussed:
– the maximization of the ED-optimality criterion: $E_a[\det M(\xi, a)]$,
– the minimization of the EID-optimality criterion: $E_a[1/\det M(\xi, a)]$,
– the maximization of the ELD-optimality criterion: $E_a[\ln \det M(\xi, a)]$,
where $E_a[\cdot]$ denotes the expectation with respect to the p.d.f. $f_a$. In the above, variants of the D-optimality criterion are averaged. One may also average the CRLB $\mathrm{Cov}(\tilde a_N) \ge M^{-1}(\xi_N, a)$ with respect to $f_a$ and then impose the D-criterion. More pragmatic, i.e., easier to use, but rather difficult to interpret, is the following expression:

$$ \sigma^{-2} \int \left[ \sum_{i=1}^{N} \nabla_a \eta(x_i, a)\, \nabla_a \eta^T(x_i, a) \right] f_a(a)\, da, \tag{2.52} $$
followed by imposing the D-criterion.
Approach 3. Use an iterative approach, in which designing the experiment, performing it, and updating the estimates of $a_0$ are repeated until convergence (see [186] (page 332) for conditions under which convergence can be proved). This approach was initially described in [46].
Approach 4. Apply the max-min approach in the following form:

$$ \max_{\xi_N} \left\{ \min_{a} \det\left[ M(\xi_N, a) \right] \right\}. \tag{2.53} $$

This approach is very conservative and, at the same time, computationally very intensive.
Summarizing the above discussion, we select Approach 1 as the main one in the chapters that follow. In addition to its simplicity, interpretability, and good asymptotic properties, this approach has one more advantage: it is a building block for constructing solutions for more advanced approaches.
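To see why the choice of $a_{\mathrm{nom}}$ in Approach 1 matters, consider a hypothetical two-parameter model $\eta(x, a) = a_1 e^{-a_2 x}$ on X = [0, 5], which is not one of the systems studied in this book. The sketch below evaluates the FIM (2.49) at $a_{\mathrm{nom}}$ (σ = 1) for all equally weighted two-point designs on a grid and returns the best one. For this model a direct calculation gives the points x = 0 and x = 1/a_2 (when 1/a_2 ≤ 5), so the locally D-optimal design moves as the nominal value of a_2 changes.

```python
import numpy as np

def grad_eta(x, a):
    """Gradient w.r.t. a of the hypothetical model eta(x, a) = a[0] * exp(-a[1] * x)."""
    e = np.exp(-a[1] * x)
    return np.stack([e, -a[0] * x * e], axis=-1)     # shape (n, 2)

def locally_d_optimal_pair(a_nom, grid):
    """Best equally weighted two-point design on `grid` for the FIM (2.49) at a_nom, sigma = 1."""
    G = grad_eta(grid, a_nom)
    # For weights 1/2: det(0.5 g_i g_i^T + 0.5 g_j g_j^T) = 0.25 * det([g_i, g_j])**2
    cross = np.outer(G[:, 0], G[:, 1]) - np.outer(G[:, 1], G[:, 0])
    crit = 0.25 * cross**2
    i, j = np.unravel_index(np.argmax(crit), crit.shape)
    return tuple(sorted((float(grid[i]), float(grid[j]))))

grid = np.linspace(0.0, 5.0, 501)
for a_nom in ([1.0, 0.5], [1.0, 2.0]):
    print(a_nom, locally_d_optimal_pair(np.array(a_nom), grid))
```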
3 Numerical search for D-optimal designs – basic methods for a linear regression
In this chapter we describe classical methods of searching for optimal experiment designs. We concentrate on their versions dedicated to the D-optimality criterion, since they reveal the basic concepts. We start by motivating the construction of methods and algorithms dedicated to searching for optimal designs. Then a method of updating design weights without affecting the design support is discussed in detail. The core of this chapter is the method of Wynn and Fedorov, which allows one to search for both a design support and its weights. Finally, we sketch possible modifications of the Wynn–Fedorov algorithm.
3.1 Introductory considerations
One may ask why specialized methods and algorithms for searching for optimal experiment designs are constructed and studied at all. Why are general-purpose optimization algorithms not enough? Indeed, one can interpret the problem of searching for optimal designs as an optimization problem with the experiment sites $x_1, x_2, \ldots, x_m$ and nonnegative weights $p_1, p_2, \ldots, p_m$ as optimization variables, subject to the constraints $x_i \in X \subset \mathbb{R}^s$ and $\sum_{i=1}^{m} p_i = 1$, where X usually has a simple geometric shape. Thus, the difficulties with using general-purpose solvers do not lie in the constraints. The main difficulty lies in the dimensionality of the space of decision variables and in the highly nonlinear and multi-extremal criterion (goal) functions. To be more specific, consider a regression function of s = 5 independent variables, spanned by v(x) with r = 15 components (e.g., linear terms plus simple interactions). We know that we can confine the search for a D-optimal design to designs comprising not more than $m = r(r + 1)/2 + 1 = 121$ points. Then the number of optimization variables equals $s\,m = 605$ coordinates plus m = 121 weights, i.e., 726 (about 750) decision variables in our hypothetical example, and this number grows (roughly) as $r^2$. Multi-extremal global optimization problems with hundreds of optimization variables are, in general, intractable. Even a local search in $\mathbb{R}^{750}$ with constraints and a nonlinear goal function is, in general, beyond our possibilities. We are able to solve linear and quadratic optimization problems with 750 variables, but experiment design criteria are usually much more complicated. The second question one may ask is which of the following approaches to searching for optimal designs is more efficient:
1. searching for exact (discrete) designs, or
2. searching for the optimal approximate design and then rounding its weights in order to obtain a discrete design that is close to an exact one?
https://doi.org/10.1515/9783110351040-003
In our opinion, the first approach is applicable only for a small or moderate number of observations N, since the problem is known to be an NP-complete combinatorial optimization task. Note, however, that a highly elaborate branch-and-bound algorithm with an inventively applied simplicial decomposition allows one to search for discrete designs over thousands of sites [183]. For larger N we prefer the second approach because, as proved in [120], when the weights are rounded carefully, the loss in D-efficiency is rather small.
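A well-known careful rounding scheme is the efficient rounding procedure of Pukelsheim and Rieder; we sketch it here without claiming that it is the exact procedure analyzed in [120]. For an approximate design with l support points and weights $p_1, \ldots, p_l$ it starts from $n_i = \lceil (N - l/2)\, p_i \rceil$ and then adds or removes single observations until $\sum_i n_i = N$. The function name is ours, and all weights are assumed positive.

```python
import math

def efficient_rounding(p, N):
    """Round approximate-design weights p (all > 0, summing to 1) to counts n_i summing to N."""
    l = len(p)
    n = [math.ceil((N - l / 2) * pi) for pi in p]
    while sum(n) < N:                    # add an observation where n_j / p_j is smallest
        j = min(range(l), key=lambda i: n[i] / p[i])
        n[j] += 1
    while sum(n) > N:                    # remove one where (n_k - 1) / p_k is largest
        k = max(range(l), key=lambda i: (n[i] - 1) / p[i])
        n[k] -= 1
    return n

print(efficient_rounding([0.1, 0.3, 0.3, 0.3], 20))
print(efficient_rounding([1 / 3, 1 / 3, 1 / 3], 10))
```

Dividing the resulting counts by N gives an exact design whose weights are close to the approximate ones.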
3.2 Optimal allocation of observations
We start our tour of numerical search methods for optimal designs with the problem of optimizing only the weights of an experiment design, while its support, considered here as a set of candidate points that may or may not enter the final design, is fixed. In this section we concentrate on an algorithm of multiplicative improvement of weights. The algorithm of Wynn and Fedorov can also be used for optimizing weights only; we shall point out this possibility when discussing it. The reasons for this order of presentation are the following:
– Algorithms for optimizing weights are relatively simple, which makes them a good starting point for discussing design search methods, and they are still very useful, as explained below.
– In many problems the optimization of weights is sufficient to find an approximation to an optimal design. To this end, one covers the design region with a large set of candidate points and optimizes their weights until only a small part of them have weights essentially larger than zero; the rest of the initial set is not included in the final design. The success of this approach depends on how the set of candidate points is selected. If this set contains all the support points of the optimal design and the method of optimizing weights converges to the optimal weights, then one can find a nearly optimal design. This method is frequently applied in commercial packages as a reliable way of searching for optimal designs. If we are not sure where the support points of the optimal design can be situated, one can select a very fine grid and rely on the brute force of computers.
– The task of optimizing weights can be treated as a subtask in procedures that search for both design points and their weights. As we shall see later, such an approach is more efficient than adding design points and their weights one by one.
Problem statement
We assume that the set of experiment sites (a design support) $x_1, x_2, \ldots, x_m$, $m \ge 1$, $x_i \in X$, $i = 1, 2, \ldots, m$, has already been chosen. Attaching weights $p_i \ge 0$, $i = 1, 2, \ldots, m$, $\sum_{i=1}^{m} p_i = 1$, to these points, we obtain a design $\xi_m$, say, of the form

$$ \xi_m = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \\ p_1 & p_2 & \cdots & p_m \end{bmatrix}, \tag{3.1} $$

which differs from the designs considered elsewhere in this book in that here $x_1, x_2, \ldots, x_m$ are fixed; they are not treated as optimization variables directly. Clearly, if the optimization of the corresponding weights leads to $p_j = 0$, then we finally delete $x_j$ from the design support. We refer the reader to [66] for results on removing nonoptimal support points in D-optimal design algorithms. The class of all designs obtained in this way will be denoted by $\Xi_m$. The following properties of $\Xi_m$ are easy to verify:
1. $\Xi_m \subset \Xi(X)$,
2. $\Xi_m$ is a convex set, i.e., convex combinations of designs from $\Xi_m$ are also its elements,
3. the set of attainable information matrices

$$ \mathcal{M}(\Xi_m) \stackrel{\text{def}}{=} \{ M : M = M(\xi_m),\ \xi_m \in \Xi_m \} $$

is convex, closed, and bounded, provided that the vectors $v(x_i)$, $i = 1, 2, \ldots, m$, are bounded,
4. if the vectors $v(x_i)$, $i = 1, 2, \ldots, m$, $m \ge r$, contain r linearly independent ones, then $\mathcal{M}(\Xi_m)$ contains a positive definite matrix.
Now we are ready to formulate the problem of finding D-optimal weights. A design $\xi_m^*$ is D-optimal in the class $\Xi_m$ iff

$$ \sup_{\xi_m \in \Xi_m} \det(M(\xi_m)) = \det(M(\xi_m^*)). \tag{3.2} $$
In contrast to the general problem statement (see Chapter 2), here the number of support points m and their positions $x_i \in X$ are fixed, and the weights $p_i \ge 0$, $\sum_{i=1}^{m} p_i = 1$, are the only decision variables. The properties (1)–(4) listed above guarantee the existence of an optimal solution of (3.2).
Remark 9. We also note that

$$ \det M(\xi_m^*) \le \sup_{\xi \in \Xi(X)} \det(M(\xi)) \tag{3.3} $$

and the inequality is strict unless the set $\{x_1, x_2, \ldots, x_m\}$ contains all the support points of a design that is D-optimal in the class $\Xi(X)$. One can easily reformulate (3.2) for other optimality criteria.
Algorithm of multiplicative weights update
The algorithm for optimizing the weights of D-optimal designs has a long history (see [13, 48, 113, 118, 165, 171] and the bibliography cited therein). We present it in the form relevant to the D-optimality criterion. This algorithm deserves special attention here, not only because of its computational efficiency, but also because it is one of the rare cases in which the updates are multiplicative rather than the commonly used additive improvements. To motivate its construction, note that the Kiefer–Wolfowitz theorem also holds for the weights of D-optimal designs with a fixed support. In particular, if $\xi_m^*$ is D-optimal, then

$$ \max_{1 \le i \le m} v^T(x_i)\, M^{-1}(\xi_m^*)\, v(x_i) = r, \tag{3.4} $$

and the maximum in (3.4) is attained at the support points of the D-optimal design. In our case the support points of $\xi_m^*$ are exactly those points for which $p_i^* > 0$. Thus, at these points we have $v^T(x_i) M^{-1}(\xi_m^*) v(x_i) = r$ or, equivalently,

$$ \frac{1}{r}\, v^T(x_i)\, M^{-1}(\xi_m^*)\, v(x_i) = 1, \qquad i = 1, 2, \ldots, m^*, \tag{3.5} $$

where $m^* \le m$ is the number of points with positive weights (without loss of generality, we have assumed that these points have the first $m^*$ indices). Multiplying both sides by the D-optimal weights $p_i^*$, we obtain

$$ p_i^* = p_i^*\, \frac{\phi(\xi_m^*, x_i)}{r}, \qquad i = 1, 2, \ldots, m^*, \tag{3.6} $$

where $\phi(\xi_m^*, x_i) = v^T(x_i)\, M^{-1}(\xi_m^*)\, v(x_i)$.
Remark 10. Alternatives to (3.6) are also discussed in the literature. Their essence is to raise $\phi(\xi_m^*, x_i)$ to a power larger than 1 and to normalize the denominator appropriately. However, in this book we confine our attention to the classic version.
The above considerations suggest the following iterative method of improving weights (see the bibliography at the beginning of this section).
Multiplicative weights update algorithm (MWUA)
Step 1 Select initial weights $p_i^{(0)} > 0$, $i = 1, 2, \ldots, m$, i.e., nonzero at all points $x_i$, whose sum equals 1. According to our assumptions made in Section 3.2, the information matrix $M(\xi^{(0)})$ of the design $\xi^{(0)}$ with these weights is nonsingular.
Step 2 For n = 0, 1, … update the weights as follows:

$$ p_i^{(n+1)} = p_i^{(n)}\, \frac{\phi(\xi^{(n)}, x_i)}{r}, \qquad i = 1, 2, \ldots, m, \tag{3.7} $$
until the changes of the weights are sufficiently small.
Several remarks are in order concerning the above algorithm.
1. In each iteration all the weights are updated simultaneously; a weight can become zero only if $\phi(\xi^{(n)}, x_i) = 0$.
2. At every iteration the weights sum up to 1, since they summed up to 1 at iteration 0 and $\sum_{i=1}^{m} p_i^{(n)} \phi(\xi^{(n)}, x_i) = \mathrm{tr}[M^{-1}(\xi^{(n)}) M(\xi^{(n)})] = r$.
3. The weights corresponding to points where $\phi(\xi^{(n)}, x_i) > r$ are increased at the expense of the weights at points where $\phi(\xi^{(n)}, x_i) < r$, while the weights at points where $\phi(\xi^{(n)}, x_i) = r$ remain unchanged.
4. One can save computations by rejecting candidate points that are identified as not belonging to the support of the D-optimal design (optimized also with respect to its support points). The reader is referred to [66] for the appropriate test.
More detailed results on the convergence of the above algorithm and its versions for other design optimality criteria can be found in [192].
Numerical examples
To get an intuition of how the MWUA works, consider the following two examples.
Example 15. Consider the linear regression $v(x) = [1, x]^T$, X = [−1, 1]. The algorithm was run from the starting design $\xi^{(0)}$ shown in Table 3.1. We know that the D-optimal design with optimized support points and weights is concentrated at the points −1 and +1 with weights 1/2.

Table 3.1: Starting design in Example 15.
Support    Weight
−1         0.1
0          0.3
0.25       0.3
1          0.3

Figure 3.1: Updated weights vs. iteration number in Example 15.

Thus, the support of the optimal design is contained in the set of points listed in the first column of Table 3.1, and we can expect that the MWUA will be able to find the optimal weights. The process of updating weights in subsequent iterations is shown in Fig. 3.1. Note that in about 10 iterations the weights corresponding to the points −1 and +1 attained the level 0.5 (lines marked by triangles and dots), while the weights corresponding to the points 0 and 0.25 were reduced to zero (lines marked by squares and rotated squares).
Example 16. Consider a bivariate linear regression spanned by the functions $v(x) = [1, x^{(1)}, x^{(2)}, x^{(1)} x^{(2)}]^T$, defined on $X = [-1, 1]^2$. We look for a D-optimal design with support constrained to the points listed in the first column of Table 3.2. In fact, we know that the D-optimal design is concentrated at the vertices of $X = [-1, 1]^2$ with weights 1/4. The second column of Table 3.2 shows the starting weights, while Fig. 3.2 shows how the MWUA updates the weights (for selected weights).

Table 3.2: A starting design in Example 16.
Support     Weight
(−1, 1)     0.1
(1, 1)      0.1
(−1, −1)    0.1
(1, −1)     0.1
(0, 0)      0.3
(1, 0)      0.3

Figure 3.2: Updated weights vs. iteration number in bivariate regression (see Example 16).
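Both examples can be reproduced with a few lines of NumPy. The sketch below implements update (3.7) with the stopping rule “largest weight change below a tolerance”; the function name, the tolerance, and the iteration cap are our choices and not part of the description above.

```python
import numpy as np

def mwua_d_optimal(V, p0, tol=1e-10, max_iter=10_000):
    """Multiplicative weights update (3.7) for D-optimal weights on a fixed support.

    V  : (m, r) array whose i-th row is v(x_i)^T
    p0 : (m,) positive initial weights summing to 1, with M(xi^(0)) nonsingular
    Returns the final weights and the number of iterations performed.
    """
    p = np.asarray(p0, dtype=float).copy()
    r = V.shape[1]
    for n in range(1, max_iter + 1):
        M = V.T @ (p[:, None] * V)                            # M(xi^(n))
        phi = np.einsum("ij,ij->i", V @ np.linalg.inv(M), V)  # phi(xi^(n), x_i)
        p_new = p * phi / r                                   # (3.7); the sum stays equal to 1
        if np.max(np.abs(p_new - p)) < tol:
            return p_new, n
        p = p_new
    return p, max_iter

# Example 15: v(x) = [1, x]^T on the support of Table 3.1
x = np.array([-1.0, 0.0, 0.25, 1.0])
p15, it15 = mwua_d_optimal(np.column_stack([np.ones_like(x), x]), [0.1, 0.3, 0.3, 0.3])

# Example 16: v(x) = [1, x1, x2, x1*x2]^T on the support of Table 3.2
pts = np.array([[-1, 1], [1, 1], [-1, -1], [1, -1], [0, 0], [1, 0]], dtype=float)
V16 = np.column_stack([np.ones(len(pts)), pts[:, 0], pts[:, 1], pts[:, 0] * pts[:, 1]])
p16, it16 = mwua_d_optimal(V16, [0.1, 0.1, 0.1, 0.1, 0.3, 0.3])

print(np.round(p15, 4), it15)
print(np.round(p16, 4), it16)
```

The weights should approach [0.5, 0, 0, 0.5] in Example 15 and 1/4 at the four vertices in Example 16.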
3.2.1 Why is the multiplicative weights update algorithm so fast?
The proof of convergence of the MWUA can be found in [113]. In [192] conditions for monotonic convergence are established for a general class of multiplicative algorithms. It seems, however, that the reasons for the fast convergence observed in experiments are still worth investigating. The above examples illustrate the fact that the MWUA attains the optimal solution in a very small number of steps. The discussion presented below is a step toward understanding this phenomenon.
The following fact can easily be proved by direct calculations. Assume that m = r. It is easy to see that in this case the D-optimal solution of problem (3.2) attaches equal weights 1/r to each candidate point of the design support. It is, however, remarkable that the MWUA finds this optimal solution in one iteration, for every initial distribution of weights $p_i^{(0)} > 0$, $i = 1, 2, \ldots, r$.
Consider the following auxiliary optimization problem. We have a design

$$ \xi_d = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \\ p_1 & p_2 & \cdots & p_m \end{bmatrix} \tag{3.8} $$

with fixed support, and we would like to improve its weights as follows:

$$ p_i' = w_i\, p_i, \qquad i = 1, 2, \ldots, m, \tag{3.9} $$

where $p_i'$ is a new weight, while the nonnegative multipliers $w_i$ are to be found under the obvious constraint that the new weights also sum up to 1:

$$ \sum_{i=1}^{m} w_i\, p_i = 1. \tag{3.10} $$

The multipliers $w_i$ should be chosen so that $\ln\det(M(\xi_d'))$ is maximized, where

$$ \xi_d' = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \\ p_1 w_1 & p_2 w_2 & \cdots & p_m w_m \end{bmatrix}. \tag{3.11} $$
In order to take constraint (3.10) into account, define the Lagrange function

$$ L(\xi_d', \lambda) = \ln\det(M(\xi_d')) + \lambda \left( 1 - \sum_{i=1}^{m} w_i\, p_i \right), \tag{3.12} $$

where λ is the Lagrange multiplier determined so as to fulfill (3.10). If $w_i^*$, $i = 1, 2, \ldots, m$, maximize the Lagrange function, then its derivatives with respect to these variables vanish at the $w_i^*$'s. Direct calculations yield

$$ p_i\, v^T(x_i)\, M^{-1}(\tilde\xi_d)\, v(x_i) - \lambda\, p_i = 0, \qquad i = 1, 2, \ldots, m, \tag{3.13} $$

where

$$ \tilde\xi_d \stackrel{\text{def}}{=} \begin{bmatrix} x_1 & x_2 & \cdots & x_m \\ p_1 w_1^* & p_2 w_2^* & \cdots & p_m w_m^* \end{bmatrix}. \tag{3.14} $$

Multiplying each equality in (3.13) by $w_i^*$ and summing over i, we arrive at the conclusion that λ = r, because $\sum_{i=1}^{m} w_i^* p_i\, v^T(x_i) M^{-1}(\tilde\xi_d) v(x_i) = \mathrm{tr}[M^{-1}(\tilde\xi_d) M(\tilde\xi_d)] = r$ and $\sum_{i=1}^{m} w_i^* p_i = 1$. Substituting λ = r into the equalities (3.13) multiplied by $w_i^*$, we obtain the following equations for the optimal multipliers $w_i^*$:

$$ \frac{1}{r}\, w_i^* p_i\, v^T(x_i)\, M^{-1}(\tilde\xi_d)\, v(x_i) = w_i^* p_i, \qquad i = 1, 2, \ldots, m. \tag{3.15} $$

In (3.15) one can easily recognize the same equalities as in (3.6), but this time we know something more: the new weights are locally (one step ahead) optimal improvements of the old weights, which seems to be a partial explanation of the fast convergence of the MWUA.
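The one-iteration property for m = r stated at the beginning of this subsection follows from a short calculation. Let V be the nonsingular r × r matrix whose i-th row is $v^T(x_i)$, let $P = \mathrm{diag}(p_1^{(0)}, \ldots, p_r^{(0)})$, and let $e_i$ denote the i-th unit vector, so that $v(x_i) = V^T e_i$ and $M(\xi^{(0)}) = V^T P V$. Then

$$ \phi(\xi^{(0)}, x_i) = e_i^T V \left( V^T P V \right)^{-1} V^T e_i = e_i^T P^{-1} e_i = \frac{1}{p_i^{(0)}}, \qquad p_i^{(1)} = p_i^{(0)}\, \frac{\phi(\xi^{(0)}, x_i)}{r} = \frac{1}{r}, $$

i.e., the first update of (3.7) already produces the D-optimal equal weights, whatever the positive starting weights are.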
3.3 The algorithm of Wynn and Fedorov
The algorithm described in this section was proposed by Fedorov (see [46] and the bibliography therein) and by Wynn [190], in forms that differ in the choice of the step length. It was a milestone in the area of searching for optimal experiment designs. Although about five decades have passed since its invention, this algorithm still serves as a basis for constructing numerical methods in this area.
Motivations
We shall present the main idea of the Wynn–Fedorov (W-F) algorithm for the D-optimality criterion $\Phi_D(\cdot) \stackrel{\text{def}}{=} \ln\det M(\cdot)$, since the considerations for other criteria are similar.
Let $\xi_0$ be a design that one wishes to improve by constructing a design $\xi_1$ for which $\Phi_D(\xi_1) > \Phi_D(\xi_0)$. To this end, consider the family of designs of the form

$$ (1 - \alpha)\,\xi_0 + \alpha\,\xi_{imp}, \qquad 0 \le \alpha \le 1, $$

where $\xi_{imp}$ is a certain admissible design that can improve the quality of $\xi_0$. Substituted into the D-optimality criterion, designs from this family define the function $\Phi_D[(1 - \alpha)\xi_0 + \alpha\,\xi_{imp}]$ of one variable α. Its derivative with respect to α, calculated at α = 0+, has the form

$$ \frac{d}{d\alpha}\, \Phi_D[(1 - \alpha)\xi_0 + \alpha\,\xi_{imp}] \Big|_{\alpha = 0+} = -r + \mathrm{tr}[M^{-1}(\xi_0)\, M(\xi_{imp})]. \tag{3.16} $$
In the derivation of (3.16) the following identity is crucial. Let B(α) be an r × r matrix that is differentiable with respect to α and has a positive determinant in a certain vicinity of $\alpha_0$, say. Then

$$ \frac{d \ln\det[B(\alpha)]}{d\alpha} = \mathrm{tr}\left[ B^{-1}(\alpha)\, \frac{d B(\alpha)}{d\alpha} \right]. $$

Applied to $B(\alpha) = (1 - \alpha) M(\xi_0) + \alpha M(\xi_{imp})$, whose derivative is $M(\xi_{imp}) - M(\xi_0)$, this identity gives (3.16) at α = 0+. The right-hand side of (3.16) tells us how fast $\Phi_D$ changes in the vicinity of $\xi_0$ in the direction of $\xi_{imp}$. Let us try to maximize this quantity with respect to $\xi_{imp}$. In fact, it suffices to maximize

$$ \mathrm{tr}[M^{-1}(\xi_0)\, M(\xi_{imp})] = \sum_{i=1}^{m} p_i'\, v^T(x_i')\, M^{-1}(\xi_0)\, v(x_i'), \tag{3.17} $$

where $x_i'$ and $p_i'$ are the support points and weights of $\xi_{imp}$, respectively, and m ≥ 1 is also a decision variable. Taking into account that $p_i' \ge 0$, $v^T(x_i') M^{-1}(\xi_0) v(x_i') \ge 0$, and $\sum_{i=1}^{m} p_i' = 1$, we arrive at the following conclusion.
Lemma 1. The design $\xi_{imp}$ that maximizes the directional derivative (3.16) of $\Phi_D((1 - \alpha)\xi_0 + \alpha\,\xi_{imp})$ at $\xi_0$ is a one-point design concentrated at a maximizer of $v^T(x) M^{-1}(\xi_0) v(x)$ over x ∈ X.
To finish the proof of this fact it suffices to note that

$$ \sum_{i=1}^{m} p_i'\, v^T(x_i')\, M^{-1}(\xi_0)\, v(x_i') \le \max_{x \in X} v^T(x)\, M^{-1}(\xi_0)\, v(x) \tag{3.18} $$
and equality holds if the weight equal to 1 is attached to a point where the maximum of the right-hand side expression is attained (the maximum exists, since X is a compact set and the components of v(x) are continuous according to our previous assumptions). We shall denote such a point by $x_{imp}$. Note that a (global) maximizer of $v^T(x)\, M^{-1}(\xi_0)\, v(x)$ may not be unique. Mixing the one-point design at $x_{imp}$ with $\xi_0$ provides locally the fastest growth of $\Phi_D$. The idea sketched above is the main step in the algorithm of Wynn and Fedorov, which is described below. As will be clear later, the search for $x_{imp}$ is the most time-consuming task of the W-F algorithm. Note, however, that we have gained a lot, since now we are searching for the global maximum of a function of x, whose dimension is of the order of 1–20, instead of hundreds. The price for this gain is that we have to repeat the search for $x_{imp}$ in each iteration. In fact, it is not necessary to search for the exact global maximum of $v^T(x)\, M^{-1}(\xi_0)\, v(x)$, since it suffices to find a "satisfactory" improvement in order to build a convergent algorithm. Nevertheless, searching for the global maximum at each iteration is a challenging task. In the context of experiment design, attempts in this direction can be found in [133, 166]. One can also apply general-purpose algorithms of global optimization with constraints (see [108] for a survey of differential evolution methods and [146] for the differential evolution algorithm with a population filter to handle constraints).

Description of the algorithm

Initial data for the W-F algorithm consist of the following ingredients.
1. A design domain, which is a compact set $X \subset \mathbb{R}^s$. Note that X can also be a finite set, e.g., a grid.
2. A vector v(x) of r ≥ 2 continuous and linearly independent functions on X. The case r = 1 is excluded from our considerations, since then it suffices to use a one-point design, placed at a point where $v^2(x)$ attains its maximum in X.
3. A starting design $\xi_0 \in \Xi(X)$, which can be arbitrary, but its information matrix $M(\xi_0)$ has to be nonsingular.
4. ϵ > 0, which determines the accuracy of searching for the optimal design.

Recall that for a design ξ ∈ Ξ(X) with nonsingular information matrix, the prediction variance ϕ(ξ, x) is given by
$$\phi(\xi, x) = v^T(x)\, M^{-1}(\xi)\, v(x), \qquad (3.19)$$
provided that the variance of errors is σ² = 1.
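For a design with finite support, the quantities in (3.19) are easy to evaluate. The following NumPy sketch (the function names are our own, not taken from any library) builds $M(\xi) = \sum_i p_i v(x_i) v^T(x_i)$ and evaluates $\phi(\xi, x)$, using a linear solve instead of an explicit inverse. As a check it uses the D-optimal design for quadratic regression on [−1, 1] (points ±1 and 0, weights 1/3), for which the maximum of ϕ over [−1, 1] should equal r = 3.

```python
import numpy as np

def info_matrix(points, weights, v):
    """M(xi) = sum_i p_i v(x_i) v(x_i)^T for a design with finite support."""
    V = np.array([v(x) for x in points])                  # rows v(x_i)^T, shape (n, r)
    return V.T @ (np.asarray(weights, dtype=float)[:, None] * V)

def prediction_variance(points, weights, v, x):
    """phi(xi, x) = v(x)^T M^{-1}(xi) v(x), cf. (3.19) with sigma^2 = 1."""
    M = info_matrix(points, weights, v)
    vx = v(x)
    return float(vx @ np.linalg.solve(M, vx))             # solve instead of inverting M

def v(x):
    """Quadratic regression: v(x) = [1, x, x^2]^T, so r = 3."""
    return np.array([1.0, x, x * x])

pts, w = [-1.0, 0.0, 1.0], [1 / 3, 1 / 3, 1 / 3]          # D-optimal design on [-1, 1]
grid = np.linspace(-1.0, 1.0, 201)
phi = [prediction_variance(pts, w, v, x) for x in grid]
print(max(phi))    # 3.0 up to rounding, i.e. r
```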
Algorithm of Wynn and Fedorov

Step 1. Select $\xi_0$ and set the iteration number k = 0.

Step 2. For design $\xi_k$ calculate
$$\phi(\xi_k, x) = v^T(x)\, M^{-1}(\xi_k)\, v(x). \qquad (3.20)$$
We shall comment on the matrix inversion in (3.20) later.

Step 3. Find the point
$$x_k = \arg\max_{x \in X} \phi(\xi_k, x), \qquad (3.21)$$
where arg max means that $x_k$ is an argument of the function $\phi(\xi_k, \cdot)$ at which its global maximum is attained.

Step 4. If the condition
$$\phi(\xi_k, x_k)/r < 1 + \epsilon \qquad (3.22)$$
holds, then $\xi_k$ is a sufficiently accurate approximation of the D-optimal design and the algorithm is stopped, returning $\xi_k$ and $\phi(\xi_k, x_k)$. Otherwise, a new design is composed as described below.

Step 5. Improve design $\xi_k$ as follows:
$$\xi_{k+1} = (1 - \alpha_k)\, \xi_k + \alpha_k\, \delta(x_k), \qquad (3.23)$$
where $\delta(x_k)$ is the one-point design at $x_k$, which enters the mixture with the weight $\alpha_k$ calculated as
$$\alpha_k = \frac{\phi(\xi_k, x_k) - r}{(\phi(\xi_k, x_k) - 1)\, r}. \qquad (3.24)$$
We shall comment on this choice later.

Step 6. Replace k by k + 1 and repeat the calculations, starting from Step 2.

It can be proved (see [46, 113, 189, 190]) that if we omit checking condition (3.22), then the above algorithm generates an infinite sequence of designs such that
$$\lim_{k \to \infty} \det[M(\xi_k)] = \det[M(\xi^*)]. \qquad (3.25)$$
In other words, the algorithm generates a sequence of designs whose quality measure converges to the quality of the D-optimal design. One can prove (see [46]) that the above algorithm is convergent for any sequence $0 < \alpha_k < 1$ for which the following conditions hold:
$$\sum_{k=0}^{\infty} \alpha_k = \infty, \qquad \sum_{k=0}^{\infty} \alpha_k^2 < \infty. \qquad (3.26)$$
In particular, one can select $\alpha_k = 1/(k + 2)$, k = 0, 1, 2, … (this keeps $\alpha_0 < 1$). The advantage of the choice (3.24) is that it maximizes (locally, at each iteration) $\det M[(1 - \alpha)\, \xi_k + \alpha\, \delta(x_k)]$ with respect to α. Let us also note that the W-F algorithm applies directly as a tool for optimizing weights when candidate points are given. To this end it suffices to replace X by a finite set $X_d = \{x_i,\ i = 1, 2, \dots, d\}$, say, and to treat the maximization of $\phi(\xi_k, x)$ as $\max_{x_i \in X_d} \phi(\xi_k, x_i)$.
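Steps 1–6, run on a finite candidate set $X_d$ as just described, can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions (a 201-point grid on [−1, 1], candidates stored as rows $v^T(x_i)$ of a matrix F, the inversion in (3.20) replaced by a linear solve, and a cap on the number of iterations); it is not the implementation used for the results reported in this section. The usage part starts from the design of Table 3.3.

```python
import numpy as np

def wynn_fedorov(F, p0, eps=0.01, max_iter=50_000):
    """Sketch of the W-F algorithm on a finite candidate set X_d.

    F  : (d, r) array; row i is v(x_i)^T for the candidate point x_i.
    p0 : (d,) initial weights whose information matrix is nonsingular.
    Returns (weights, max_i phi(xi_k, x_i), iteration counter k)."""
    p = np.asarray(p0, dtype=float).copy()
    r = F.shape[1]
    for k in range(max_iter + 1):
        M = F.T @ (p[:, None] * F)                                  # M(xi_k)
        phi = np.einsum("ij,ji->i", F, np.linalg.solve(M, F.T))     # (3.20) on X_d
        i_k = int(np.argmax(phi))                                   # Step 3, (3.21)
        if phi[i_k] / r < 1.0 + eps or k == max_iter:               # Step 4, (3.22)
            return p, phi[i_k], k
        alpha = (phi[i_k] - r) / ((phi[i_k] - 1.0) * r)             # (3.24)
        p = (1.0 - alpha) * p                                       # Step 5, (3.23)
        p[i_k] += alpha

# Quadratic regression v(x) = [1, x, x^2]^T on a grid of candidate points.
grid = np.linspace(-1.0, 1.0, 201)
F = np.column_stack([np.ones_like(grid), grid, grid**2])
p0 = np.zeros_like(grid)
for x0 in (-0.4, 0.7, 0.2):                      # starting design of Table 3.3
    p0[np.argmin(np.abs(grid - x0))] = 1.0 / 3.0
p, phi_max, n_iter = wynn_fedorov(F, p0, eps=0.01)
print(n_iter, phi_max, grid[p > 0.1])            # heavy points should lie near -1, 0, 1
```

Storing the design as a weight vector over the whole candidate set keeps the bookkeeping trivial; points that were never selected simply keep weight zero.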
Examples of running the W-F algorithm

The aim of the examples presented below is to gain some experience with how the W-F algorithm behaves in simple cases. Before going into details, we have to point out the most important and the most difficult issue connected with the use of the W-F algorithm. Namely, in Step 3 we have to find a global maximum of $\phi(\xi_k, x)$ over x ∈ X and the point(s) $x_k$ where the maximum is attained. As pointed out in the introduction to this section, we gain a lot by using the W-F algorithm, but we still have to solve a global optimization problem in a dim(x)-dimensional space. In this section we confine ourselves to one-dimensional problems, i.e., dim(x) = 1, allowing for r = dim(v(x)) of a moderate size. In the examples below we provide plots of $\phi(\xi_k, x_k)$ versus the iteration number k. They seem to be more informative than plots of $\det(M(\xi_k))$, since $\phi(\xi_k, x_k) - r$ informs us how far we are from attaining the stopping condition.

Example 17. Consider a quadratic regression spanned by $v(x) = [1, x, x^2]^T$, x ∈ [−1, 1]. As we already know, the D-optimal design is concentrated at the points ±1 and 0 with weights 1/3. The starting design for the W-F algorithm is shown in Table 3.3. Our aim is to illustrate how the algorithm behaves in the very first iterations.

Table 3.3: Starting design in Example 17.

| Support | Weights |
|---|---|
| −0.4 | 0.333 |
| 0.7 | 0.333 |
| 0.2 | 0.333 |
Figure 3.3: $\phi(\xi_k, x)$ as a function of x for iterations 1–6, obtained in Example 17.
The plots in Fig. 3.3 illustrate the changes of $\phi(\xi_k, x)$ as a function of x ∈ [−1, 1] and of the iteration number k. As one can notice, the position of the global maximum of $\phi(\xi_k, x)$ alternates between the points 1 and −1 in the first six iterations.

Example 18. The W-F algorithm was run to find a rough approximation to the D-optimal design for a regression function spanned by
$$v(x) = [1, x, x^2, x^3, x^4]^T, \qquad x \in [-1, 1].$$
A starting design is shown in Table 3.4 (left panel). Calculations were stopped after 12 iterations, which took 0.125 sec, and the resulting design is shown in Table 3.4 (right panel). We know that the D-optimal design is concentrated at ±1, ±0.65 and 0 (more precisely, at ±1, $\pm\sqrt{3/7} \approx \pm 0.655$ and 0, with equal weights 1/5). One can note that the design obtained after only 12 iterations of the W-F algorithm already contains points at, or close to, all the support points of the D-optimal design, plus the support points of the (intentionally) bad starting design. This remarkable property of the W-F algorithm will be exploited later.
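The D-optimal design quoted in Example 18 can be checked against the Kiefer–Wolfowitz condition. The sketch below assumes the classical D-optimal design for quartic regression on [−1, 1], with equal weights 1/5 at ±1, ±√(3/7) and 0 (the zeros of $(1 - x^2)P_4'(x)$), and evaluates ϕ on a fine grid; for a D-optimal design the maximum should equal r = 5 and be attained at the support points.

```python
import numpy as np

# Assumed D-optimal design for v(x) = [1, x, x^2, x^3, x^4]^T on [-1, 1].
support = np.array([-1.0, -np.sqrt(3 / 7), 0.0, np.sqrt(3 / 7), 1.0])
weights = np.full(5, 0.2)

def V(x):
    """Rows v(x)^T = [1, x, x^2, x^3, x^4] for scalar or array x."""
    return np.vander(np.atleast_1d(x), 5, increasing=True)

M = V(support).T @ (weights[:, None] * V(support))
grid = np.linspace(-1.0, 1.0, 2001)
Vg = V(grid)
phi = np.einsum("ij,ji->i", Vg, np.linalg.solve(M, Vg.T))   # phi(xi, x) on the grid
print(phi.max())    # equals r = 5 up to rounding (attained at -1, 0, 1 on this grid)
```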
Table 3.4: Starting design (left panel) and final design (right panel) obtained in Example 18.

Left panel (starting design):

| Support | Weights |
|---|---|
| −0.8 | 0.2 |
| −0.3 | 0.2 |
| 0.2 | 0.3 |
| 0.8 | 0.1 |
| −0.1 | 0.2 |

Right panel (final design):

| Support | Weights |
|---|---|
| −0.8 | 0.070769 |
| −0.3 | 0.070769 |
| 0.2 | 0.106153 |
| 0.8 | 0.035384 |
| −0.1 | 0.070769 |
| 1 | 0.181785 |
| −1 | 0.185442 |
| −0.66 | 0.124298 |
| 0.67 | 0.086364 |
| 0.66 | 0.068266 |
Figure 3.4: $\log(\det(M(\xi_k)))$ vs. iteration number k obtained in Example 18.
In Fig. 3.4 the logarithm of the determinant of the information matrices corresponding to designs subsequently generated by the W-F algorithm is shown.
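The growth of $\log\det M(\xi_k)$ seen in Fig. 3.4 is driven by the step length (3.24). With $d = \phi(\xi_k, x_k)$, the matrix determinant lemma gives $\log\det M((1-\alpha)\xi_k + \alpha\delta(x_k)) = \log\det M(\xi_k) + (r-1)\log(1-\alpha) + \log(1 + \alpha(d-1))$, a concave function of α whose maximizer is exactly (3.24). The short check below is our own sketch: it uses the starting design of Table 3.3 and compares (3.24) with a brute-force search over α.

```python
import numpy as np

def v(x):
    """Quadratic regression, r = 3."""
    return np.array([1.0, x, x * x])

r = 3
pts, w = [-0.4, 0.7, 0.2], [1 / 3, 1 / 3, 1 / 3]      # starting design of Table 3.3
M = sum(wi * np.outer(v(x), v(x)) for x, wi in zip(pts, w))

grid = np.linspace(-1.0, 1.0, 201)
phi = np.array([v(x) @ np.linalg.solve(M, v(x)) for x in grid])
x_k, d = grid[np.argmax(phi)], phi.max()              # Step 3 on the grid

def log_det_after_step(alpha):
    """log det M((1 - alpha) xi + alpha delta(x_k))."""
    return np.linalg.slogdet((1 - alpha) * M + alpha * np.outer(v(x_k), v(x_k)))[1]

alpha_star = (d - r) / ((d - 1) * r)                  # step length (3.24)
alphas = np.linspace(0.0, 0.99, 9901)                 # brute-force search, step 1e-4
alpha_grid = alphas[np.argmax([log_det_after_step(a) for a in alphas])]
print(x_k, alpha_star, alpha_grid)                    # the two alphas agree to ~1e-4
```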
Remarks on the W-F algorithm and its improvements

The above examples and many other simulation experiments performed by the author support the following conclusions on the typical behavior of the W-F algorithm.
1. The W-F algorithm very quickly (usually in 10–20 iterations) locates the support points of the D-optimal design. At this stage the growth of $\det(M(\xi_k))$ is rapid. Simultaneously, $\max_{x \in X} \phi(\xi_k, x)$ rapidly decreases to a vicinity of r.
2. In further iterations a slow process of updating weights can be observed. The weights corresponding to the D-optimal support points attain levels close to their proper values in about 100 iterations. The decrease to zero of the weights corresponding to the points which are not in the optimal design is rather slow and takes thousands of iterations if the required accuracy is high. At this stage the growth of $\det(M(\xi_k))$ and the decrease of $\max_{x \in X} \phi(\xi_k, x)$ saturate.
3. After the iterations of the W-F algorithm are stopped, the final design frequently contains clusters of support points, which are close to the support points of the optimal design. The sum of the weights of the points from each cluster is close to the optimal weight, provided that the number of iterations was sufficiently large.
The above behavior is typical for first-order optimization techniques, which are based on the steepest descent (ascent) in the gradient direction. The best local direction of search is described in Lemma 1, while the locally best step length is given by (3.24). Below we list modifications of the W-F algorithm, which were proposed in order to improve its performance or to generalize it to other criteria.
1. One can avoid groups of very close points by applying a clustering algorithm (e.g., k-means) at iterations which are close to stopping the algorithm (when $\max_{x \in X} \phi(\xi_k, x)$ is close to r). Then, each cluster is replaced by a new support point placed at the cluster's center, and the weights of the points which form the same cluster are added.
2. Instead of adding one new design point in each iteration, it has been proposed to add simultaneously all the points at which
$$\max_{x \in X} \phi(\xi_k, x) \qquad (3.27)$$
is attained. This modification reduces the number of iterations, but finding all the global maxima is a rather time-consuming task, especially in multi-dimensional cases.
3. Delete from a current design $\xi_k$ a point at which
$$\min_{x \in \mathrm{supp}(\xi_k)} \phi(\xi_k, x) \qquad (3.28)$$
is attained, where $\mathrm{supp}(\xi_k)$ denotes the support of design $\xi_k$. Note that the task (3.28) is much easier than (3.27), since the former requires only comparisons of a finite number of earlier calculated quantities $\phi(\xi_k, x_i)$, $x_i \in \mathrm{supp}(\xi_k)$.
4. Reject from design $\xi_k$ all the points with weights less than a pre-specified threshold δ > 0. The level δ should depend on the number of points which are expected in the final design, but for practical purposes one can safely set $\delta = 10^{-6}$.

In cases 3 and 4 above one should take care of the weights of the rejected points, e.g., by dividing them equally between the points which remain in the current design. Rejecting points from a design can lead to throwing out a point which is in fact a support point of the D-optimal design. In such a case the algorithm introduces such a point again in the next iteration or after several iterations. In some cases this can lead to an unpleasant loop of rejecting and introducing the same point. One can reduce this danger by using at least one of the following tools.
– Slowly decrease the rejection level δ as the number of iterations increases.
– Use the inequality from [66], which allows one to verify that a given point is not in the support of the D-optimal design, although the support itself is unknown.
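Modifications 1 and 4 above amount to simple post-processing of a design. The sketch below is illustrative only: for a one-dimensional design it replaces k-means by a sorted sweep that merges neighbouring points closer than a tolerance, places each merged point at the weighted centre of its cluster (our assumption; the text only says the cluster's center), and shares the weight of pruned points equally among the remaining ones, as suggested for cases 3 and 4. The thresholds in the usage part are hypothetical and chosen only to show the effect on the final design of Table 3.4.

```python
import numpy as np

def merge_close_points(points, weights, tol):
    """Merge neighbouring points of a 1-D design that are closer than tol.

    Each cluster becomes one point at its weighted centre; its weights are added."""
    order = np.argsort(points)
    pts = np.asarray(points, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    clusters = [[0]]
    for i in range(1, len(pts)):
        if pts[i] - pts[clusters[-1][-1]] <= tol:      # close to the previous point
            clusters[-1].append(i)
        else:
            clusters.append([i])
    new_w = np.array([w[c].sum() for c in clusters])
    new_pts = np.array([pts[c] @ w[c] / w[c].sum() for c in clusters])
    return new_pts, new_w

def prune_small_weights(points, weights, delta=1e-6):
    """Drop points with weight < delta; share their weight equally among the rest."""
    pts = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    keep = w >= delta
    return pts[keep], w[keep] + w[~keep].sum() / keep.sum()

# Final design of Table 3.4 (right panel).
pts = [-0.8, -0.3, 0.2, 0.8, -0.1, 1.0, -1.0, -0.66, 0.67, 0.66]
w = [0.070769, 0.070769, 0.106153, 0.035384, 0.070769,
     0.181785, 0.185442, 0.124298, 0.086364, 0.068266]
m_pts, m_w = merge_close_points(pts, w, tol=0.02)          # 0.66 and 0.67 are merged
q_pts, q_w = prune_small_weights(m_pts, m_w, delta=0.05)   # x = 0.8 is dropped
print(np.round(q_pts, 4), np.round(q_w, 4))
```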
Generalizations of the W-F algorithm to cover other criteria and optimal experiment design problems with singular information matrices were also investigated. Furthermore, second-order design search procedures have been proposed. They are based on the second derivatives of the design criterion and their implementation is rather sophisticated. Their main advantage lies in speeding up weight updates, since – as we have seen earlier – the W-F algorithm finds the support of the optimal design rather quickly.
3.4 The combined algorithm

The features of the W-F method and the properties of the MWUA suggest that these two approaches are complementary. The W-F algorithm quickly identifies the support points of the D-optimal design, while the MWUA provides an efficient way of improving weights. Therefore, below we suggest a combined algorithm, which uses both the W-F method and the MWUA. The idea of such a combination has been independently reinvented many times in different variants (see, e.g., [132] and its more advanced version [133], as well as [193] for D-optimal designs, [119] for a wide class of concave and differentiable criteria, and [49] for A-optimal designs). The reader is also referred to [65] for a general approach that includes many known algorithms as special cases. Examples of the performance of the combined algorithm are summarized at the end of this section.
Description of the combined algorithm

The combined algorithm for searching for D-optimal designs contains two loops. The outer one is based on the W-F algorithm. At each outer iteration the weights of the current design are additionally updated by the MWUA, which runs in the inner loop. In more detail, the combined algorithm runs as follows.

Combined algorithm

Step 1. Select $\xi_0$ and set the iteration number k = 0.

Step 2. For design $\xi_k$ set
$$\phi(\xi_k, x) = v^T(x)\, M^{-1}(\xi_k)\, v(x). \qquad (3.29)$$
We shall comment on the matrix inversion in (3.29) later.
Step 3. Find the point
$$x_k = \arg\max_{x \in X} \phi(\xi_k, x), \qquad (3.30)$$
where arg max means that $x_k$ is an argument of the function $\phi(\xi_k, \cdot)$ at which its global maximum is attained.

Step 4. If the condition
$$\phi(\xi_k, x_k)/r < 1 + \epsilon \qquad (3.31)$$
holds, then $\xi_k$ is a sufficiently accurate approximation of the D-optimal design and the algorithm is stopped, returning $\xi_k$ and $\phi(\xi_k, x_k)$. Otherwise, a new design is composed as described below.

Step 5. Form a new design $\xi_k^{(0)}$ as follows:
$$\xi_k^{(0)} = (1 - \alpha_k)\, \xi_k + \alpha_k\, \delta(x_k), \qquad (3.32)$$
where $\delta(x_k)$ is the one-point design at $x_k$, which enters the mixture with the weight $\alpha_k$ calculated as
$$\alpha_k = \frac{\phi(\xi_k, x_k) - r}{(\phi(\xi_k, x_k) - 1)\, r}. \qquad (3.33)$$
Step 6. Denote by $X_d^{(k)}$ the support of $\xi_k^{(0)}$ and by $p_i^{(0)}$, $i = 1, 2, \dots, \#(X_d^{(k)})$, the corresponding weights. For l = 0, 1, 2, … calculate
$$p_i^{(l+1)} = p_i^{(l)} \cdot \phi(\xi_k^{(l)}, x_i)/r, \qquad i = 1, 2, \dots, \#(X_d^{(k)}), \qquad (3.34)$$
until $\max_i |p_i^{(l+1)} - p_i^{(l)}| \le \varepsilon$, where ε > 0 is a given accuracy, while $\xi_k^{(l)}$ is the design with support $X_d^{(k)}$ and the corresponding weights $p_i^{(l)}$. Denote by $\xi_k^{(*)}$ the design with the support $X_d^{(k)}$ and with the weights $p_i^{(*)}$, $i = 1, 2, \dots, \#(X_d^{(k)})$, at which iterations (3.34) were stopped.

Step 7. Set $\xi_{k+1} = \xi_k^{(*)}$, put k := k + 1, and go to Step 2.

As mentioned earlier, both the W-F method and the MWUA have their counterparts suitable for criteria other than D-optimality. Combining them appropriately, we obtain an analog of the above algorithm. Simulations performed by the author (not reported here) indicate that the combined approach is fruitful also for other criteria.

Examples

The first two examples presented in this section correspond exactly to the examples described in Section 3.3, for comparison purposes. Correspondence means the same model, the same initial design, the same final accuracy, and the same computer. The third example illustrates the efficiency of the combined method in a somewhat more difficult case, when a regression is a combination of polynomial and trigonometric terms and the D-optimal design is unknown.

Example 19. Consider a quadratic regression spanned by $v(x) = [1, x, x^2]^T$, x ∈ [−1, 1]. Initial and final designs are shown in Table 3.5. This example illustrates the fact that a large number of iterations is needed in order to compute the weights with high accuracy, but such accuracy is very rarely needed in practice. Indeed, about 100 iterations were needed to reduce the criterion from 0.0955 to 0.0954. The overall execution time to obtain the results in Table 3.5 was equal to 0.125 sec.

Table 3.5: Initial (left table) and final (right table) design in Example 19.

Initial design:

| Points | $p_i$ |
|---|---|
| −0.4 | 0.333 |
| 0.7 | 0.333 |
| 0.2 | 0.333 |

Final design:

| Points | $p_i$ |
|---|---|
| −1 | 0.333516 |
| 1 | 0.333511 |
| $5 \cdot 10^{-17}$ | 0.332973 |
Example 20. Initial and final designs for a fourth-order polynomial regression $v(x) = [1, x, x^2, x^3, x^4]^T$, x ∈ [−1, 1], are shown in Table 3.6. The decrease of the largest prediction variance versus the iteration number is shown in Fig. 3.5. The prescribed accuracy ϵ = 0.01 was attained after 0.375 sec.

Table 3.6: Starting (left table) and final (right table) designs in Example 20.

Starting design:

| Points | $p_i$ |
|---|---|
| −0.8 | 0.2 |
| −0.3 | 0.2 |
| 0.2 | 0.3 |
| 0.8 | 0.1 |
| −0.1 | 0.2 |

Final design:

| Points | $p_i$ |
|---|---|
| 1 | 0.20004 |
| −1 | 0.20004 |
| 0.65 | 0.20004 |
| −0.65 | 0.199841 |
| −0.005 | 0.20004 |
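A compact sketch of the combined algorithm on a finite candidate set is given below. The assumptions are ours: a 201-point grid on [−1, 1], a linear solve in place of the explicit inversion in (3.29), and caps on the numbers of outer and inner iterations. The MWUA step (3.34) keeps the weights summing to one, because $\sum_i p_i\,\phi(\xi, x_i) = \mathrm{tr}(M^{-1}(\xi) M(\xi)) = r$. The usage part starts from the design of Table 3.6; iteration counts and timings need not match those reported above.

```python
import numpy as np

def phi_on(F, p):
    """phi(xi, x_i) = v(x_i)^T M^{-1}(xi) v(x_i) for every candidate row of F."""
    M = F.T @ (p[:, None] * F)
    return np.einsum("ij,ji->i", F, np.linalg.solve(M, F.T))

def combined_algorithm(F, p0, eps=0.01, inner_tol=1e-6, max_outer=1000, max_inner=1000):
    """W-F outer loop with an MWUA inner loop, on a finite candidate set.

    Returns (weights, max_i phi(xi_k, x_i), outer iteration counter k)."""
    p = np.asarray(p0, dtype=float).copy()
    r = F.shape[1]
    for k in range(max_outer + 1):
        phi = phi_on(F, p)                                   # Step 2
        i_k = int(np.argmax(phi))                            # Step 3
        if phi[i_k] / r < 1.0 + eps or k == max_outer:       # Step 4
            return p, phi[i_k], k
        alpha = (phi[i_k] - r) / ((phi[i_k] - 1.0) * r)      # Step 5, (3.33)
        p = (1.0 - alpha) * p
        p[i_k] += alpha
        supp = p > 0.0                                       # X_d^(k)
        for _ in range(max_inner):                           # Step 6, (3.34)
            p_new = np.zeros_like(p)
            p_new[supp] = p[supp] * phi_on(F, p)[supp] / r
            done = np.max(np.abs(p_new - p)) <= inner_tol
            p = p_new
            if done:
                break

grid = np.linspace(-1.0, 1.0, 201)
F = np.vander(grid, 5, increasing=True)                      # v(x) = [1, x, ..., x^4]^T
p0 = np.zeros_like(grid)
for x0, w0 in zip((-0.8, -0.3, 0.2, 0.8, -0.1), (0.2, 0.2, 0.3, 0.1, 0.2)):
    p0[np.argmin(np.abs(grid - x0))] = w0                    # starting design of Table 3.6
p, phi_max, n_outer = combined_algorithm(F, p0, eps=0.01)
print(n_outer, phi_max / 5, grid[p > 0.05])
```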
Figure 3.5: $\max_{x \in X} \phi(\xi_k, x)$ vs. iteration number k obtained in Example 20.

Example 21. Consider a linear regression spanned by
$$v(x) = [1, x, x^2, \sin(\pi x), \cos(\pi x)]^T, \qquad x \in [-1, 1].$$
The accuracy level was set to ϵ = 0.006, which was attained by the combined algorithm in 7 sec. Table 3.7 shows the initial and final designs, while Fig. 3.6 shows the decrease of $\max_{x \in X} \phi(\xi_k, x)$.

Table 3.7: Initial (left table) and final (right table) designs in Example 21.

Initial design:

| Points | $p_i$ |
|---|---|
| −0.8 | 0.1 |
| 0.15 | 0.1 |
| −0.3 | 0.2 |
| 0.2 | 0.3 |
| 0.8 | 0.1 |
| 0.9 | 0.1 |

Final design:

| Points | $p_i$ |
|---|---|
| −1 | 0.202092 |
| 1 | 0.202095 |
| 0.59 | 0.201895 |
| −0.61 | 0.2019 |
| −0.01375 | 0.192018 |
Figure 3.6: $\max_{x \in X} \phi(\xi_k, x)$ vs. iteration number k obtained in Example 21.
The above examples illustrate the efficiency of the combined algorithm. We shall refer to its variants when D-optimal input signals are sought in the frequency or spatio-frequency domains.
4 The product of experiment designs and its optimality

In this chapter we introduce the notion of the product of experiment designs. For the sake of simplicity, it is introduced for designs with finite supports, but it has a natural extension to experiment designs considered as probability measures. The role of product designs as possibly optimal experiments was first appreciated by Kono [83] and Hoel [71]. Later, the classes of regression functions and of criteria for which product designs are optimal were extended in [37, 38, 95, 120, 138–140, 159, 161, 188]. In this book, we put an emphasis on this class of experiments, since in many cases the responses of systems with spatio-temporal dynamics have a tensor product structure, which makes product designs well suited for optimal spatial sensor allocation.
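4.1 The product of designs

Let us assume that the sub-vectors
$$\mathbf{x}^{(1)} = [x^{(1)}, \dots, x^{(s_1)}]^T \quad \text{and} \quad \mathbf{x}^{(2)} = [x^{(s_1+1)}, \dots, x^{(s)}]^T$$
are parts of the input vector x. In other words, $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ – when concatenated – form the vector x. Sub-vectors $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ will be called blocks of variables, also in the case when they contain single elements. Let us suppose that an experiment design is attached to each block of input variables. Namely,
$$\xi^{(1)} = \begin{bmatrix} \mathbf{x}^{(1)}_1, & \mathbf{x}^{(1)}_2, & \dots, & \mathbf{x}^{(1)}_{K_1} \\ p^{(1)}_1, & p^{(1)}_2, & \dots, & p^{(1)}_{K_1} \end{bmatrix}, \qquad \xi^{(2)} = \begin{bmatrix} \mathbf{x}^{(2)}_1, & \mathbf{x}^{(2)}_2, & \dots, & \mathbf{x}^{(2)}_{K_2} \\ p^{(2)}_1, & p^{(2)}_2, & \dots, & p^{(2)}_{K_2} \end{bmatrix}.$$
Their support points lie in $\mathbb{R}^{s_1}$ and $\mathbb{R}^{s_2}$, respectively. Additionally, $s_1 + s_2 = s$ and the following conditions hold:
$$p^{(1)}_i \ge 0, \quad \sum_{i=1}^{K_1} p^{(1)}_i = 1, \qquad p^{(2)}_j \ge 0, \quad \sum_{j=1}^{K_2} p^{(2)}_j = 1. \qquad (4.1)$$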
Using the boldface notation for blocks of variables is an exception to our notational conventions; it is used in this section only, in order to make it easier to distinguish sub-models and the corresponding designs of experiments.

Definition 5. By the product of the designs $\xi^{(1)}$ and $\xi^{(2)}$ we mean a design which is supported at $K_1 \cdot K_2$ points and is constructed as follows:
– the points $(\mathbf{x}^{(1)}_i, \mathbf{x}^{(2)}_j)$ form the support points of the product design,
– the corresponding weights, denoted as $p_{ij}$, are given by $p_{ij} = p^{(1)}_i\, p^{(2)}_j$, $i = 1, 2, \dots, K_1$, $j = 1, 2, \dots, K_2$.

https://doi.org/10.1515/9783110351040-004
Definition 6. By a (continuous) product design of experiments that is supported at a finite set of points, we mean a design which arises as the product of designs $\xi^{(1)}$ and $\xi^{(2)}$. It is further denoted as $\xi^{(1)} \otimes \xi^{(2)}$ and called the product design. Designs $\xi^{(1)}$ and $\xi^{(2)}$ are further called partial designs of $\xi^{(1)} \otimes \xi^{(2)}$. Note that the partial design $\xi^{(1)}$ can be interpreted as the part of design $\xi^{(1)} \otimes \xi^{(2)}$ that arises by fixing the variables in $\mathbf{x}^{(2)}$. Note also that if we consider the class of all product designs of the form
$$\{\xi^{(1)} \otimes \xi^{(2)} :\ \xi^{(1)} \in \Xi(X_1),\ \xi^{(2)} \in \Xi(X_2)\}, \qquad (4.2)$$
then it is clear that the class of all product designs on $X_1 \times X_2$ does not cover the set $\Xi(X_1 \times X_2)$ of all continuous designs defined on $X_1 \times X_2$. Fortunately, later we shall prove that the class (4.2) is sufficiently broad to cover D-optimal designs for a large class of multivariate regression functions. Note that the arguments of ⊗ for experiment designs cannot – formally – be exchanged, i.e., $\xi^{(1)} \otimes \xi^{(2)} \ne \xi^{(2)} \otimes \xi^{(1)}$. However, these two designs differ only by the numbering of input variables, which makes them equivalent in practice. Additionally, the operation ⊗ for experiment designs has the following important property: for the product of three experiment designs, namely, $\xi^{(1)} \in \Xi(X_1)$, $\xi^{(2)} \in \Xi(X_2)$, $\xi^{(3)} \in \Xi(X_3)$, the following relationship holds:
$$(\xi^{(1)} \otimes \xi^{(2)}) \otimes \xi^{(3)} = \xi^{(1)} \otimes (\xi^{(2)} \otimes \xi^{(3)}). \qquad (4.3)$$
This associativity property allows us to define the product of any finite number of experiment designs for blocks of variables as follows:
$$\bigotimes_{i=1}^{N} \xi^{(i)} = \Big[\bigotimes_{i=1}^{N-1} \xi^{(i)}\Big] \otimes \xi^{(N)}, \qquad N = 2, 3, \dots. \qquad (4.4)$$
Example 22. Consider three designs $\xi^{(j)}$, j = 1, 2, 3. Each of them is supported at the points ±1 with weights 1/2. Then, their product $\xi^{(1)} \otimes \xi^{(2)} \otimes \xi^{(3)}$ is supported at all the vertices of the cube $[-1, 1]^3$ with weights 1/8.
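Definitions 5 and 6 and the recursion (4.4) translate directly into a few lines of Python. In this sketch (the function name is ours) a design is a pair (points, weights), block points may be scalars or tuples, and the product is built as a left fold, as in (4.4). The usage part reproduces Example 22 and checks the associativity (4.3) on it.

```python
import numpy as np

def product_design(*designs):
    """Product of finite designs, cf. Definitions 5-6 and the left fold (4.4).

    Support points of the product are concatenated blocks (tuples);
    weights are products of the weights of the partial designs."""
    points, weights = [()], [1.0]
    for pts, w in designs:
        points = [p + tuple(np.atleast_1d(q).tolist()) for p in points for q in pts]
        weights = [a * b for a in weights for b in w]
    return points, np.array(weights)

xi = ([-1.0, 1.0], [0.5, 0.5])                     # partial design of Example 22
pts, w = product_design(xi, xi, xi)
print(len(pts), w.min(), w.max())                  # 8 0.125 0.125

lhs = product_design(product_design(xi, xi), xi)   # (xi x xi) x xi
rhs = product_design(xi, product_design(xi, xi))   # xi x (xi x xi)
print(lhs[0] == rhs[0] and np.allclose(lhs[1], rhs[1]))   # True, cf. (4.3)
```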
4.2 Multiplicative models

For two column vectors $b_1 \in \mathbb{R}^{n_1}$, $b_2 \in \mathbb{R}^{n_2}$, say, we shall denote by $b_1 \otimes b_2$ their Kronecker product, which is also a column vector with $n_1 n_2$ elements. The usage of the same symbol, namely, ⊗, for the Kronecker product of vectors and matrices as well as for the product of experiment designs is intentional. Let us suppose that the vector x of input variables is split into two sub-vectors $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$. Consider the following model:
$$\bar{y}(x) = a^T \cdot [v_1(\mathbf{x}^{(1)}) \otimes v_2(\mathbf{x}^{(2)})], \qquad (4.5)$$
where a is a column vector of unknown parameters and $\dim(a) = \dim(v_1(\mathbf{x}^{(1)})) \cdot \dim(v_2(\mathbf{x}^{(2)}))$. In (4.5), $v_1(\mathbf{x}^{(1)})$ and $v_2(\mathbf{x}^{(2)})$ are given vectors of linearly independent functions on $X_1$ and $X_2$, respectively. The experiment design problem for estimating a in (4.5) is considered in the set $X_1 \times X_2$, where $\mathbf{x}^{(1)} \in X_1$, $\mathbf{x}^{(2)} \in X_2$.

Definition 7. By partial models of (4.5) we mean the following functions: (1) $\alpha^T v_1(\mathbf{x}^{(1)})$, (2) $\beta^T v_2(\mathbf{x}^{(2)})$, which are defined on $X_1$ and $X_2$, respectively. Above, $\alpha \in \mathbb{R}^{\dim(v_1(\mathbf{x}^{(1)}))}$ and $\beta \in \mathbb{R}^{\dim(v_2(\mathbf{x}^{(2)}))}$ stand for constant vectors of parameters.

Partial models can be considered quite formally, but – on the other hand – one can consider $\alpha^T v_1(\mathbf{x}^{(1)})$ as the overall model output when $\mathbf{x}^{(2)}$ is fixed.

Example 23. Consider the following model with three input variables:
$$\bar{y}(x) = a^T \left\{ \begin{bmatrix} 1 \\ x^{(1)} \end{bmatrix} \otimes \begin{bmatrix} 1 \\ x^{(2)} \end{bmatrix} \otimes \begin{bmatrix} 1 \\ x^{(3)} \end{bmatrix} \right\}. \qquad (4.6)$$
One can split it into sub-models in the following three ways. The first one is to distinguish two sub-models, namely, those spanned by
$$v_1(\mathbf{x}^{(1)}) = \begin{bmatrix} 1 \\ x^{(1)} \end{bmatrix} \otimes \begin{bmatrix} 1 \\ x^{(2)} \end{bmatrix} \quad \text{and} \quad v_2(\mathbf{x}^{(2)}) = \begin{bmatrix} 1 \\ x^{(3)} \end{bmatrix},$$
where $\mathbf{x}^{(1)}$ consists of $x^{(1)}$ and $x^{(2)}$, while $\mathbf{x}^{(2)} = x^{(3)}$. The second way is similar to the one above, but this time $x^{(2)}$ and $x^{(3)}$ are grouped. The third way is to distinguish the following three sub-models:
$$v_1(\mathbf{x}^{(1)}) = \begin{bmatrix} 1 \\ x^{(1)} \end{bmatrix}, \quad v_2(\mathbf{x}^{(2)}) = \begin{bmatrix} 1 \\ x^{(2)} \end{bmatrix}, \quad \text{and} \quad v_3(\mathbf{x}^{(3)}) = \begin{bmatrix} 1 \\ x^{(3)} \end{bmatrix},$$
where $\mathbf{x}^{(j)} = x^{(j)}$, j = 1, 2, 3. The third way of selecting sub-models is the most convenient from the viewpoint of its usage. This will be clear from the results presented in the subsequent sections.
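The three ways of splitting model (4.6) lead to the same vector of regressors because the Kronecker product of vectors is associative. The NumPy check below uses an arbitrary (hypothetical) input point; it also shows the ordering of the eight regressors, which fixes the ordering of the components of a.

```python
import numpy as np

def block(x):
    """Regressor [1, x]^T of a single input variable."""
    return np.array([1.0, x])

x1, x2, x3 = 0.3, -0.7, 0.5                                   # arbitrary input point

first = np.kron(np.kron(block(x1), block(x2)), block(x3))     # blocks (x1, x2) | x3
second = np.kron(block(x1), np.kron(block(x2), block(x3)))    # blocks x1 | (x2, x3)
print(np.allclose(first, second))                             # True
# Ordering: [1, x3, x2, x2*x3, x1, x1*x3, x1*x2, x1*x2*x3]
print(first)
```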
4.3 On the D-optimality of product designs for estimating multiplicative models

Consider the multiplicative model
$$\bar{y}(x) = a^T [v_1(\mathbf{x}^{(1)}) \otimes v_2(\mathbf{x}^{(2)})]. \qquad (4.7)$$
We use it to demonstrate how one can compose D-optimal designs for its sub-models in domains $X_1$ and $X_2$ into the D-optimal design in $X_1 \times X_2$ for estimating a in (4.7). To this end, perform the following steps.

The algorithm of composing a D-optimal design

Step 1. Calculate the D-optimal design $\hat{\xi}^{(1)} \in \Xi(X_1)$ for the partial model $\alpha^T v_1(\mathbf{x}^{(1)})$, i.e., solve the following problem:
$$\max_{\xi^{(1)} \in \Xi(X_1)} \det M^{(1)}(\xi^{(1)}) = \det M^{(1)}(\hat{\xi}^{(1)}), \qquad (4.8)$$
where for a design $\xi^{(1)} \in \Xi(X_1)$ of the form
$$\xi^{(1)} = \begin{bmatrix} \mathbf{x}^{(1)}_1, & \mathbf{x}^{(1)}_2, & \dots, & \mathbf{x}^{(1)}_{m_1} \\ p^{(1)}_1, & p^{(1)}_2, & \dots, & p^{(1)}_{m_1} \end{bmatrix} \qquad (4.9)$$
the FIM is given by the formula
$$M^{(1)}(\xi^{(1)}) = \sum_{i=1}^{m_1} p^{(1)}_i\, v_1(\mathbf{x}^{(1)}_i)\, \big(v_1(\mathbf{x}^{(1)}_i)\big)^T. \qquad (4.10)$$

Step 2. Calculate the D-optimal design $\hat{\xi}^{(2)} \in \Xi(X_2)$ for the partial model $\beta^T v_2(\mathbf{x}^{(2)})$, i.e., solve the problem analogous to (4.8) with the superscript "1" replaced by "2."

Step 3. Compose the product design
$$\xi^{\star} = \hat{\xi}^{(1)} \otimes \hat{\xi}^{(2)}. \qquad (4.11)$$

Clearly, the above algorithm can be used for more than two sub-models. For calculating $\hat{\xi}^{(1)}$ and $\hat{\xi}^{(2)}$ one can use either analytical or numerical methods. Before proving the D-optimality of (4.11), it is expedient to state the following two lemmas.

Lemma 2. For arbitrary $\xi^{(1)} \in \Xi(X_1)$, $\xi^{(2)} \in \Xi(X_2)$ we have
$$M(\xi^{(1)} \otimes \xi^{(2)}) = M^{(1)}(\xi^{(1)}) \otimes M^{(2)}(\xi^{(2)}). \qquad (4.12)$$
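Proof. Observe that for model (4.5) and design $\xi^{(1)} \otimes \xi^{(2)}$ the FIM has the following form:
$$M(\xi^{(1)} \otimes \xi^{(2)}) = \int_{X_1}\!\int_{X_2} V(\mathbf{x}^{(1)}, \mathbf{x}^{(2)})\, d\mu(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}), \qquad (4.13)$$
where
$$V(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}) \stackrel{\text{def}}{=} [v_1(\mathbf{x}^{(1)}) \otimes v_2(\mathbf{x}^{(2)})] \cdot [v_1(\mathbf{x}^{(1)}) \otimes v_2(\mathbf{x}^{(2)})]^T, \qquad (4.14)$$
while $\mu(\mathbf{x}^{(1)}, \mathbf{x}^{(2)})$ is the probability measure corresponding to $\xi^{(1)} \otimes \xi^{(2)}$. Note that the special structure of this experiment design leads to the conclusion that the measure μ can be expressed as follows:
$$\mu(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}) = \mu^{(1)}(\mathbf{x}^{(1)}) \cdot \mu^{(2)}(\mathbf{x}^{(2)}), \qquad (4.15)$$
where $\mu^{(1)}(\mathbf{x}^{(1)})$ and $\mu^{(2)}(\mathbf{x}^{(2)})$ are the probability measures corresponding to $\xi^{(1)}$ and $\xi^{(2)}$. We need the well-known property of the Kronecker product of matrices, namely,
$$(A \otimes B)(C \otimes D) = (A C) \otimes (B D), \qquad (4.16)$$
which holds for arbitrary matrices A, B, C, D having appropriate dimensions. Using this fact, together with $(A \otimes B)^T = A^T \otimes B^T$, we can express (4.14) as follows:
$$V(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}) = \big([v_1(\mathbf{x}^{(1)})] \cdot [v_1(\mathbf{x}^{(1)})]^T\big) \otimes \big([v_2(\mathbf{x}^{(2)})] \cdot [v_2(\mathbf{x}^{(2)})]^T\big). \qquad (4.17)$$
This expression and (4.15) allow us to write $M(\xi^{(1)} \otimes \xi^{(2)}) = M^{(1)}(\xi^{(1)}) \otimes M^{(2)}(\xi^{(2)})$, where
$$M^{(j)}(\xi^{(j)}) = \int_{X_j} [v_j(\mathbf{x}^{(j)})] \cdot [v_j(\mathbf{x}^{(j)})]^T\, d\mu^{(j)}(\mathbf{x}^{(j)}), \qquad j = 1, 2, \qquad (4.18)$$
which finishes the proof.

Lemma 3. For arbitrary $\xi^{(1)} \in \Xi(X_1)$, $\xi^{(2)} \in \Xi(X_2)$ with nonsingular FIMs we have
$$[v_1(\mathbf{x}^{(1)}) \otimes v_2(\mathbf{x}^{(2)})]^T M^{-1}(\xi^{(1)} \otimes \xi^{(2)})\, [v_1(\mathbf{x}^{(1)}) \otimes v_2(\mathbf{x}^{(2)})] = v_1^T(\mathbf{x}^{(1)}) [M^{(1)}(\xi^{(1)})]^{-1} v_1(\mathbf{x}^{(1)}) \cdot v_2^T(\mathbf{x}^{(2)}) [M^{(2)}(\xi^{(2)})]^{-1} v_2(\mathbf{x}^{(2)}). \qquad (4.19)$$

Proof. It is known that the inverse of the Kronecker product of two nonsingular matrices is the Kronecker product of their inverses. This fact and (4.12) imply
$$M^{-1}(\xi^{(1)} \otimes \xi^{(2)}) = [M^{(1)}(\xi^{(1)})]^{-1} \otimes [M^{(2)}(\xi^{(2)})]^{-1}. \qquad (4.20)$$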
Now, the result follows by substituting (4.20) into the left-hand side of (4.19) and by applying (4.16) twice.

Theorem 6. Let us assume that for the partial models there exist designs in $X_1$ and in $X_2$, respectively, for which the corresponding FIMs are nonsingular. Then, the experiment design $\xi^{\star}$, which is obtained by the above algorithm, is D-optimal for estimating a in (4.7) when the design domain is $X_1 \times X_2$.

Proof. Designs $\hat{\xi}^{(1)}$ and $\hat{\xi}^{(2)}$ are D-optimal for the partial models. Thus, by the Kiefer–Wolfowitz theorem, the following conditions hold for them:
$$\sup_{\mathbf{x}^{(1)} \in X_1} v_1^T(\mathbf{x}^{(1)}) [M^{(1)}(\hat{\xi}^{(1)})]^{-1} v_1(\mathbf{x}^{(1)}) = r_1, \qquad (4.21)$$
$$\sup_{\mathbf{x}^{(2)} \in X_2} v_2^T(\mathbf{x}^{(2)}) [M^{(2)}(\hat{\xi}^{(2)})]^{-1} v_2(\mathbf{x}^{(2)}) = r_2, \qquad (4.22)$$
where $r_1 = \dim[v_1(\mathbf{x}^{(1)})]$, $r_2 = \dim[v_2(\mathbf{x}^{(2)})]$. On the other hand, for the product design $\xi^{\star} = \hat{\xi}^{(1)} \otimes \hat{\xi}^{(2)}$ the FIM has the following form:
$$M(\xi^{\star}) = M(\hat{\xi}^{(1)} \otimes \hat{\xi}^{(2)}) = M^{(1)}(\hat{\xi}^{(1)}) \otimes M^{(2)}(\hat{\xi}^{(2)}) \qquad (4.23)$$
(see Lemma 2). Being D-optimal for the partial models, the matrices $M^{(1)}(\hat{\xi}^{(1)})$ and $M^{(2)}(\hat{\xi}^{(2)})$ are nonsingular. Hence, $M^{(1)}(\hat{\xi}^{(1)}) \otimes M^{(2)}(\hat{\xi}^{(2)})$ is also nonsingular and, according to the well-known property of the Kronecker product of matrices, for $M^{-1}(\xi^{\star})$ we have
$$[M^{(1)}(\hat{\xi}^{(1)}) \otimes M^{(2)}(\hat{\xi}^{(2)})]^{-1} = [M^{(1)}(\hat{\xi}^{(1)})]^{-1} \otimes [M^{(2)}(\hat{\xi}^{(2)})]^{-1}. \qquad (4.24)$$
Using this fact, we obtain (see Lemma 3), with $v(x) = v_1(\mathbf{x}^{(1)}) \otimes v_2(\mathbf{x}^{(2)})$,
$$\sup_{x \in X_1 \times X_2} v^T(x) M^{-1}(\xi^{\star}) v(x) = \sup_{\mathbf{x}^{(1)} \in X_1} v_1^T(\mathbf{x}^{(1)}) [M^{(1)}(\hat{\xi}^{(1)})]^{-1} v_1(\mathbf{x}^{(1)}) \cdot \sup_{\mathbf{x}^{(2)} \in X_2} v_2^T(\mathbf{x}^{(2)}) [M^{(2)}(\hat{\xi}^{(2)})]^{-1} v_2(\mathbf{x}^{(2)}) = r_1 r_2 = r. \qquad (4.25)$$
(The supremum of the product splits into the product of the suprema, since both factors are nonnegative and depend on different blocks of variables.) Having this result, it suffices to invoke the Kiefer–Wolfowitz theorem once again, this time for the overall model, in order to infer that $\xi^{\star}$ is D-optimal for estimating the r-dimensional vector of parameters a.

Example 24. Consider the following bivariate model:
$$\bar{y}(x) = a^T \left\{ \begin{bmatrix} 1 \\ x^{(1)} \end{bmatrix} \otimes \begin{bmatrix} 1 \\ x^{(2)} \end{bmatrix} \right\}, \qquad (4.26)$$
which can be rewritten in the classic way as follows: $\bar{y}(x) = a^{(1)} + a^{(2)} x^{(2)} + a^{(3)} x^{(1)} + a^{(4)} x^{(1)} x^{(2)}$ (the ordering of the components of a follows the Kronecker product). We are looking for a D-optimal design for estimating the four-dimensional vector a in the domain [−1, 1] × [−1, 1]. The partial model for the first input variable has the form $\alpha^{(1)} + \alpha^{(2)} x^{(1)}$. The D-optimal design for this sub-model is well known, namely,
$$\hat{\xi}^{(1)} = \begin{bmatrix} -1 & 1 \\ 1/2 & 1/2 \end{bmatrix}.$$
Due to the symmetry, the second sub-model and the D-optimal design for its estimation are almost the same (only the change of superscripts is required). Applying Theorem 6, we obtain that the design
$$\xi^{\star} = \begin{bmatrix} (-1, -1) & (-1, 1) & (1, -1) & (1, 1) \\ 1/4 & 1/4 & 1/4 & 1/4 \end{bmatrix} \qquad (4.27)$$
is D-optimal for estimating the parameters in model (4.26). This fact has been well known for a long time, but by using Theorem 6 it can be inferred in an elementary way. Note that design (4.27) is also D-optimal for estimating the parameters in the following model: $\bar{y}(x) = a^{(1)} + a^{(2)} x^{(1)} + a^{(3)} x^{(2)}$. It turns out that this is a more general phenomenon [161]. This example immediately generalizes to regression functions of more than two input variables, in particular, to the one described in Example 23. In general, in such cases D-optimal designs are supported at all the vertices of the cube $[-1, 1]^s$ with equal weights.

Several remarks are in order concerning the above algorithm and Theorem 6.

Remark 11. D-optimal product designs are sometimes criticized, since they may contain a large number of design support points. Indeed, in the case when we have K sub-models spanned by $[1, x^{(j)}]^T$, j = 1, 2, …, K, on [−1, 1], the D-optimal design obtained from Theorem 6 is supported at the same points as the full factorial experiment, with weights $2^{-K}$. On the other hand, the calculation of the design $\xi^*$ may be useful even in such extreme cases, since having $\xi^*$ we also have the D-optimal FIM $M(\xi^*)$, which is unique. This is of special importance, because having $M^* \stackrel{\text{def}}{=} M(\xi^*)$ at our disposal, we can search for designs having the same information matrix that is attainable by a smaller number of support points. This reasoning is based on the following facts: the FIM of D-optimal designs is unique, but it can be attained by designs having different support points and the corresponding weights. The method of searching for such designs goes through solving the following system of (nonlinear) equations: $M^* = \sum_j p_j\, v(x_j)\, v^T(x_j)$ with respect to the $x_j$'s and $p_j \ge 0$'s, $\sum_j p_j = 1$. This problem can be simpler than the problem of a general search for the D-optimal design.

Remark 12. Equality (4.24) is important not only for the proof of Theorem 6, since it holds for any product design. This means that we can save computations when calculating the parameter estimates by the LSQ method. Indeed, the FIM for the overall model is r × r, where $r = r_1 r_2$, while the FIMs for the partial models are of dimensions $r_1 \times r_1$ and $r_2 \times r_2$, respectively. These facts allow us to use a computation-saving algorithm for calculating the LSE (see [141] and [44, 45] for advanced numerical techniques).

Remark 13. Theorem 6 can be generalized to a wide class of other design optimality criteria, including A-optimality and $L_p$-optimality (see [138] for a longer list of design optimality criteria and for sufficient conditions for a general class of criteria that admit the product designs as the optimal solutions).
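Remark 12 can be illustrated for the simplest product design: equal weights on a full grid. Then the regression matrix is $F = F_1 \otimes F_2$, and if the coefficients are arranged row-wise in an $r_1 \times r_2$ matrix A, the normal equations reduce to $A = (F_1^T F_1)^{-1} F_1^T Y F_2 (F_2^T F_2)^{-1}$, so only $r_1 \times r_1$ and $r_2 \times r_2$ systems are solved. The sketch below uses synthetic data for a quadratic-by-linear model (all numbers hypothetical) and compares this with the direct r × r solution; it is one simple computation-saving scheme, not necessarily the one described in [141] or [44, 45].

```python
import numpy as np

rng = np.random.default_rng(0)
t1 = np.linspace(-1.0, 1.0, 7)                # levels of x^(1) (hypothetical)
t2 = np.linspace(-1.0, 1.0, 5)                # levels of x^(2) (hypothetical)
F1 = np.vander(t1, 3, increasing=True)        # rows v1^T = [1, x, x^2], r1 = 3
F2 = np.vander(t2, 2, increasing=True)        # rows v2^T = [1, x],      r2 = 2
A_true = rng.normal(size=(3, 2))              # a = A_true.ravel() (row-major)
Y = F1 @ A_true @ F2.T + 0.01 * rng.normal(size=(7, 5))   # Y[i, j] at (t1[i], t2[j])

# Direct LSE: full problem with F = F1 kron F2 (r = r1 * r2 = 6 unknowns).
F = np.kron(F1, F2)
a_direct = np.linalg.lstsq(F, Y.ravel(), rcond=None)[0]

# Structured LSE: F^T F = (F1^T F1) kron (F2^T F2), cf. Lemma 2 and (4.24).
G1, G2 = F1.T @ F1, F2.T @ F2
B = np.linalg.solve(G1, F1.T @ Y @ F2)        # G1^{-1} F1^T Y F2
A_hat = np.linalg.solve(G2, B.T).T            # B G2^{-1} (G2 is symmetric)

print(np.allclose(F.T @ F, np.kron(G1, G2)))  # True
print(np.allclose(a_direct, A_hat.ravel()))   # True
```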
4.4 Examples Example 25. Consider deflections q(x(1) , x(2) ) at spatial points (x (1) , x (2) ) of a thin plate that are invoked by a distributed load u(x(1) , x (2) ), which is unknown. They are linked by the following equation: Ax q(x (1) , x(2) ) = −u(x (1) , x(2) ), 2
(x(1) , x (2) ) ∈ (0, 1) × (0, 1),
(4.28)
2
where the operator Ax = 𝜕 (x𝜕(1) )2 + 𝜕 (x𝜕(2) )2 is equipped with appropriate boundary conditions. Eigenfunctions of Ax have the form vkj (x(1) , x(2) ) = ϕk (x(1) ) ψj (x (2) ),
k, j = 1, 2, . . . ,
(4.29) 2
2
where ϕk (x(1) )’s and ψj (x(2) )’s are eigenfunctions of operators 𝜕 (x𝜕(1) )2 and 𝜕 (x𝜕(2) )2 , respectively, with appropriate boundary conditions. Note that eigenfunctions (4.29), truncated to K1 in the x (1) -direction and to K2 in the x(2) -direction, have the Kronecker product structure. Namely, they can be written ̄ (1) ) ⊗ ψ(x ̄ (2) ), where ϕ̄ is the column vector of the first K eigenfunctions ϕ (x (1) ), as ϕ(x 1 k while ψ̄ is the column vector of the first K2 eigenfunctions ψj (x (2) ). Thus, selecting K = K1 K2 , the (approximate) solution of (4.28) has the form K̂
q(x(1) , x(2) ) = − ∑ αkj vkj (x(1) , x (2) )/λkj , k, j=1
(4.30)
where λkj ’s are eigenvalues, Ax vkj = −λkj vkj , while αkj = ⟨u, vkj ⟩ are unknown coefficients of u. In other words, the solution (4.30) has a special structure, for which it was proved (see Theorem 6 and [138]) that the D-optimal experiment design can be composed from D-optimal designs for ϕk (x(1) )’s and ψj (x(2) )’s. Namely, design support points are the Cartesian products of designs for ϕk (x(1) )’s and ψj (x (2) ), while weights are the products of the corresponding weights of univariate designs.
4.4 Examples | 77
A particular design will depend on K̂ and on the boundary conditions. For the sake of simplicity, consider the case of clumped edges of the plate, i. e., q(0, x(2) ) = q(1, x(2) ) = 0,
for x (2) ∈ [0, 1],
q(x (1) , 0) = q(x(1) , 1) = 0,
for x (1) ∈ [0, 1].
(4.31)
Then, for the eigenfunctions we have
$$\phi_k(x^{(1)}) = \sqrt{2}\, \sin(\pi k x^{(1)}), \qquad \psi_j(x^{(2)}) = \sqrt{2}\, \sin(\pi j x^{(2)}), \tag{4.32}$$
$$v_{kj}(x^{(1)}, x^{(2)}) = 2\, \sin(\pi k x^{(1)})\, \sin(\pi j x^{(2)}), \qquad k, j = 1, 2, \dots, \hat K. \tag{4.33}$$
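The univariate designs used in this example can be reproduced numerically. The sketch below rests on one assumption that is not taken from the text: substituting $z = \cos(\pi x)$ turns the span of $\sin(\pi k x)$, $k = 1, \dots, K$, into $\sqrt{1 - z^2}$ times polynomials of degree $K - 1$, and a classical result on weighted polynomial regression (used here without proof) then gives a D-optimal design with equal weights at the zeros of the Legendre polynomial $P_K$. The code checks this through the Kiefer–Wolfowitz condition $\max_x d(x) = K$ and forms the product design, whose information matrix should equal the Kronecker product of the univariate ones. The helper name `phi_rows` is ours.

```python
import numpy as np

K = 5  # K_1 = K_2 = 5 eigenfunctions per direction, as in Example 25

def phi_rows(x, K):
    """Rows sqrt(2) sin(pi k x), k = 1..K, one row per point in x."""
    k = np.arange(1, K + 1)
    return np.sqrt(2.0) * np.sin(np.pi * np.outer(np.atleast_1d(x), k))

# Assumed univariate design: zeros of Legendre P_K mapped back by x = arccos(z) / pi
z, _ = np.polynomial.legendre.leggauss(K)
support_1d = np.sort(np.arccos(z) / np.pi)
weights_1d = np.full(K, 1.0 / K)

# Information matrix M = sum_i w_i f(x_i) f(x_i)^T and variance function d(x) = f^T M^{-1} f
F = phi_rows(support_1d, K)
M = F.T @ (weights_1d[:, None] * F)
grid = np.linspace(0.0, 1.0, 2001)
G = phi_rows(grid, K)
d = np.einsum("ij,jk,ik->i", G, np.linalg.inv(M), G)

# Product design on the plate: Cartesian product of supports, products of weights
X1, X2 = np.meshgrid(support_1d, support_1d, indexing="ij")
support_2d = np.column_stack([X1.ravel(), X2.ravel()])
weights_2d = np.outer(weights_1d, weights_1d).ravel()

# Regression vectors of the 2D model are Kronecker products of the univariate ones
F2 = np.array([np.kron(phi_rows(a, K)[0], phi_rows(b, K)[0]) for a, b in support_2d])
M2 = F2.T @ (weights_2d[:, None] * F2)

print(np.round(support_1d, 2))         # compare with the vector quoted in the text
print(d.max())                         # Kiefer-Wolfowitz check: close to K
print(np.allclose(M2, np.kron(M, M)))  # product structure of the 2D information matrix
```

Rounded to two decimals, the support should agree with the vector $[0.14, 0.32, 0.5, 0.68, 0.86]^T$ used for Fig. 4.1, and all 25 product weights equal 0.04.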
Let us select $K_1 = K_2 = \hat K = 5$. Then, the partial designs for $x^{(1)}$ and $x^{(2)}$ are those shown in Table 2.5. Thus, according to the result cited above, the D-optimal sensor allocation consists of 25 support points obtained as the Cartesian product of the vector $[0.14, 0.32, 0.5, 0.68, 0.86]^T$ with itself (see Fig. 4.1). All the weights are equal to $0.04 = 1/25$. The following result shows that the special structure of the solution (4.30) arises in many cases.
Figure 4.1: Optimal placement of sensors for estimating the load of a rectangular thin plate with clamped edges.
It is easy to prove that if v(x) is an eigenfunction of operator Ax with eigenvalue λ and w(y) is an eigenfunction of operator By with eigenvalue γ, then the operator Ax + By has the eigenfunction v(x) w(y) with the eigenvalue λ + γ.
This result obviously generalizes to more than two dimensions.

Example 26 (Allocation of sensors for estimating eigenvalues). Up to now, we have assumed that a distributed load u is unknown, while the eigenvalues $\lambda_k$, $k = 1, 2, \dots$, of the operator $A_x$ (defined by (2.38) or the one considered in Example 14) are known. In this example we consider the converse case, namely, u is known, but the $\lambda_k$’s are unknown, and we would like to design a sensor allocation which is D-optimal for estimating them. The structure of the solution of (2.38) and (2.45) is the same and reads as follows:
$$q(x) = \sum_{k=1}^{\hat K} \frac{\langle u, v_k\rangle}{\lambda_k}\, v_k(x). \tag{4.34}$$
In the following chapters we reveal conditions under which the expansion (4.34) is valid. When u is known, we also know the $\langle u, v_k\rangle$’s. Thus, if the $\lambda_k$’s are unknown, one can estimate them from the observations $y_i = q(x_i) + \epsilon_i$, $i = 1, 2, \dots, N$, as follows: estimate the $\alpha_k$’s by minimizing
$$\sum_{i=1}^{N} \Big( y_i - \sum_{k=1}^{\hat K} \alpha_k \langle u, v_k\rangle\, v_k(x_i) \Big)^2 \tag{4.35}$$
using standard OLS software. Denote by $\hat\alpha_k$’s the minimizers of (4.35). Then, one can estimate $\lambda_k$ as $\hat\lambda_k = 1/\hat\alpha_k$. The D-optimal sensor allocation for estimating the $\alpha_k$’s is the same as in the above examples. Does this mean that these sensor allocations are also D-optimal for estimating the $\lambda_k$’s? Strictly speaking, in a general case the answer is negative, but in our special case it is positive, because the transformations $\alpha_k = 1/\lambda_k$ are specific: each $\alpha_k$ depends only on $\lambda_k$, so the Jacobian of the transformation is diagonal and, evaluated at the nominal $\lambda_k$’s, it does not depend on the sensor allocation. Consequently, the determinants of the information matrices for the $\lambda_k$’s and for the $\alpha_k$’s differ only by a design-independent factor. Summarizing, we can repeat all the previous examples on D-optimal sensor allocation in order to obtain D-optimal estimates of the eigenvalues of $A_x$.

Remark 14 (On estimating parameters). In our previous Examples 11–13 we had $\lambda_k = a_1 k^2 \pi^2 + a_2$. Such a linear dependence of the eigenvalues of $A_x$ on its parameters appears to be a more general phenomenon (see Chapter 6) that can be used for estimating the parameters $a_1$ and $a_2$ quite easily and reliably. Indeed, having the estimates $\hat\lambda_k$, one can estimate $a_1, a_2$, again using OLS and standard software. Thus, we use OLS repeatedly:
– firstly, to estimate the $\alpha_k$’s, which provide the $\hat\lambda_k$’s, and
– secondly, to obtain estimates of $a_1, a_2$ by applying OLS to
$$\sum_{k=1}^{\hat K} \big(\hat\lambda_k - (a_1 k^2 \pi^2 + a_2)\big)^2.$$
This repeated least squares procedure was proposed in [134].
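A minimal sketch of the repeated least squares procedure of Example 26 and Remark 14. All numerical values are hypothetical: we take one-dimensional eigenfunctions $v_k(x) = \sqrt{2}\sin(\pi k x)$, eigenvalues of the form $\lambda_k = a_1 k^2\pi^2 + a_2$ as in Remark 14, the known load $u(x) = x$ (so that $\langle u, v_k\rangle = \sqrt{2}(-1)^{k+1}/(\pi k) \neq 0$ for every k), and uniformly spaced sensors rather than a D-optimal allocation. `numpy.linalg.lstsq` stands in for “standard OLS software”, and `two_stage_ls` is an illustrative name.

```python
import numpy as np

def two_stage_ls(x, y, c, K):
    """Repeated least squares: alpha_k by OLS on (4.35), lambda_k = 1/alpha_k, then (a1, a2) by OLS.

    x, y : sensor positions and observations y_i = q(x_i) + eps_i
    c    : known coefficients <u, v_k>, k = 1..K
    """
    k = np.arange(1, K + 1)
    V = np.sqrt(2.0) * np.sin(np.pi * np.outer(x, k))    # v_k(x_i)
    X = V * c                                            # regressors <u, v_k> v_k(x_i)
    alpha_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # stage 1
    lam_hat = 1.0 / alpha_hat
    A = np.column_stack([(k * np.pi) ** 2, np.ones(K)])  # lambda_k = a1 k^2 pi^2 + a2
    a_hat, *_ = np.linalg.lstsq(A, lam_hat, rcond=None)  # stage 2
    return lam_hat, a_hat

# Hypothetical setting for illustration only
K, a1, a2 = 5, 0.5, 2.0
k = np.arange(1, K + 1)
c = np.sqrt(2.0) * (-1.0) ** (k + 1) / (np.pi * k)       # <u, v_k> for u(x) = x
lam = a1 * (k * np.pi) ** 2 + a2
x = np.linspace(0.0, 1.0, 201)                          # uniform sensors (not D-optimal)
q = (np.sqrt(2.0) * np.sin(np.pi * np.outer(x, k))) @ (c / lam)   # expansion (4.34)

rng = np.random.default_rng(0)
y = q + 1e-7 * rng.standard_normal(x.size)              # small observation noise
lam_hat, a_hat = two_stage_ls(x, y, c, K)
print(lam_hat, a_hat)
```

With noise-free data both stages recover $a_1, a_2$ exactly. With noisy data, $\delta\lambda_k \approx \lambda_k^2\, \delta\alpha_k$, so the inversion in the first stage amplifies errors most for the largest k, which illustrates why an accurate first stage matters.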
5 Optimal input signals for linear systems described by ODEs

As the next step toward optimal input signals for estimating parameters of PDEs, we consider the problem of input signal optimization for systems described by linear ODEs. We shall confine our attention to LTI systems. Two main approaches can be distinguished, namely:
(a) classical – the frequency domain approach,
(b) more contemporary and more difficult – the time domain approach.
We shall discuss them in the subsequent sections, while their usefulness for selecting optimal input signals for parameter estimation in systems with spatio-temporal dynamics will be described in further chapters. As an introduction to this chapter, we provide references to methods of estimating parameters of systems described by ODEs, since the aim of selecting optimal input signals is to optimize the accuracy of their estimation. An additional point to be emphasized is the interplay between the selection of D-optimal input signals and the degree of ill-conditioning of the numerical procedure used for finding the estimates.

Remark 15. In this chapter the number of estimated parameters is equal to (r + 1), as follows from the structure of the ODEs. We emphasize this fact, since in the subsequent chapters the number of estimated parameters is r. This change of notation should be taken into account when applying the results of this chapter later on.
5.1 Remarks on parameter estimation of systems described by ODEs

As mentioned in the Introduction, this book is focused on selecting an optimal input signal for parameter estimation, but not on parameter estimation as such. Therefore, the aim of this section is limited to stating the problem of estimating parameters of systems described by ODEs and to providing references to specialized monographs on system identification. In particular, we mention the monographs authored by Eykhoff [43], Kalaba and Spingarn [170], Söderström and Stoica [168], Ljung [96], Schitkovsky [157], and Schoukens and Pintelon [158], although this list is not exhaustive. The methods mentioned in this section do not require that input signals be optimal in any sense. Nevertheless, these signals should ensure the identifiability of the parameters (see Chapter 2). The condition usually imposed on the input signal is that it be persistently exciting (see [70, 153] and the above-mentioned monographs). https://doi.org/10.1515/9783110351040-005
According to the convention of this book, we concentrate on estimating the parameters of LTI systems, assuming that the model (system) structure is correct and known. Therefore, we omit many topics important for system identification, including model and noise structure determination and validation. There are also many results on selecting the cheapest input signals for system identification [24]. We also omit the interesting results on test signals designed for identification of Hammerstein and Wiener systems [34, 98]. These results will be useful in the future as building blocks for the design of input signals for nonlinear DPS identification. Consider an LTI system described by the ODE
$$\frac{d^r y(t)}{dt^r} + a_{r-1} \frac{d^{r-1} y(t)}{dt^{r-1}} + \dots + a_0\, y(t) = a_r\, u(t), \qquad t \in [0, T], \tag{5.1}$$
where T is the horizon of observations. Additionally, it is assumed that either the initial conditions of (5.1) are known or they are included in the set of unknown parameters. In (5.1), y(t) denotes the output, which can be observed with random errors. The solution $y(t; \bar a)$ of (5.1) depends on the vector $\bar a = [a_0, a_1, \dots, a_r]^{tr}$ of unknown parameters, where tr denotes transposition. Derivatives of u(t) (up to the order (r − 1)) are also allowed in (5.1), but we skip this generalization for the sake of simplicity.
Let us assume that for a certain $\bar a^0$, which is unknown, our system is sufficiently accurately described by (5.1). We also assume that the following observations are available:
$$y_i = y(t_i; \bar a^0) + \varepsilon_i, \qquad i = 1, 2, \dots, n, \tag{5.2}$$
where the $\varepsilon_i$’s are random errors having zero means and the same finite variance. In (5.2), the $t_i$’s denote the time instants (equidistant or not) at which the observations were taken. Then, a solution of the minimization problem
$$\min_{\bar a} \sum_{i=1}^{n} \big( y_i - y(t_i; \bar a) \big)^2 \tag{5.3}$$
provides estimates $\hat{\bar a}$ of $\bar a^0$ that are called the LSEs. Note that these estimates are nonlinear in the $y_i$’s. If we additionally assume that the $\varepsilon_i$’s are Gaussian and the estimability (identifiability) conditions hold, then the solution of (5.3) provides the maximum likelihood estimate of $\bar a^0$. For equidistant $t_i$’s, one can adopt the results from [118], recalled in Section 2.8, concerning the CRLB and its asymptotic attainability by nonlinear LSEs, as well as the asymptotic normality with the inverse of the FIM as their covariance matrix.
Thus, also in this chapter we assume that the FIM serves as the basis for forming a criterion for selecting optimal input signals. The monographs [157, 170] emphasize important numerical aspects of computing parameter estimates such as those in (5.3). Taking into account that $y(t_i; \bar a)$ depends nonlinearly on $\bar a$, this is of importance also from our point of view since, as mentioned earlier, an optimal experiment design usually leads to better conditioned problems of minimizing the sum of squared errors.
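A minimal sketch of (5.2)–(5.3) for a hypothetical first-order instance of (5.1), namely $\dot y + a_0 y = a_1 u$ (the case r = 1) with a unit-step input and zero initial condition, for which $y(t; \bar a) = (a_1/a_0)(1 - e^{-a_0 t})$ is available in closed form. SciPy’s `least_squares` is used as a generic nonlinear least squares solver; the true parameters, noise level, and time grid are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def model(t, a):
    """Step response of dy/dt + a0 y = a1 u with u = 1 and y(0) = 0."""
    a0, a1 = a
    return (a1 / a0) * (1.0 - np.exp(-a0 * t))

a_true = np.array([2.0, 3.0])                 # hypothetical "true" a^0
sigma = 0.01                                  # hypothetical noise standard deviation
t = np.linspace(0.05, 5.0, 100)               # observation instants t_i
rng = np.random.default_rng(1)
y = model(t, a_true) + sigma * rng.standard_normal(t.size)   # observations (5.2)

# Nonlinear LSE (5.3): minimize sum_i (y_i - y(t_i; a))^2, keeping a0 > 0
fit = least_squares(lambda a: y - model(t, a), x0=[1.0, 1.0],
                    bounds=([1e-6, -np.inf], [np.inf, np.inf]))
a_hat = fit.x

# Gauss-Newton approximation of the FIM built from the Jacobian at the solution
J = fit.jac
fim = J.T @ J / sigma**2
cov_approx = np.linalg.inv(fim)               # approximate covariance of a_hat
print(a_hat, np.sqrt(np.diag(cov_approx)))
```

The matrix $J^{tr}J/\sigma^2$ evaluated at $\hat{\bar a}$ is a finite-sample counterpart of the FIM used throughout this chapter; how well conditioned it is depends on the chosen input signal.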
5.2 LTI systems – the frequency domain approach

As already mentioned, the first results on selecting D-optimal signals for estimating parameters in LTI systems can be traced back to the early 1970s (see the pioneering papers and monographs by Mehra [99, 100], Goodwin and Payne [53], and Zarrop [195]). We summarize them here for the reader’s convenience. As a prerequisite for the following subsections, basic knowledge of stationary and ergodic stochastic processes in LTI systems is desirable (see, e.g., [109] for basic facts).
Assumptions

Again, consider an LTI system described by an ODE, but this time it is convenient to have $t \in [-T, T]$, where 2T is the horizon of observations:
$$\frac{d^r y(t)}{dt^r} + a_{r-1} \frac{d^{r-1} y(t)}{dt^{r-1}} + \dots + a_0\, y(t) = a_r\, u(t), \qquad t \in [-T, T]. \tag{5.4}$$
Additionally, it is assumed that the influence of the initial conditions from the remote past has already vanished. In order to obtain something more than simulations, we need a number of simplifying assumptions.
– We confine our attention to LTI systems, described either by ODEs with constant parameters or by the corresponding impulse response of the system (Green’s function).
– Observations are made at every instant of time (continuously in time); the theory also applies to equidistant discrete-time observations, but the formulas are less transparent.
– The observation errors are uncorrelated (a known correlation structure can be absorbed, but again, the formulas are less transparent).
– The horizon of observations T is long (theoretically, infinite), so that the influence of the initial conditions is negligible (they can be assumed to be zero).
– Input signals are generated in an open loop, i.e., there is no feedback between observations and input signals.
The last assumption is a source of controversy. For this reason we stress that the internal structure of the system can be very complicated and may contain internal feedbacks between the system states. The only requirement imposed by the last assumption is that there is no direct link between the observed outputs and the input signals. There is an important stream of papers on generating input signals for system identification (see the bibliography at the beginning of this chapter) in which this assumption is not imposed. It is also known that when constraints are imposed on the power of the output signal, admitting feedback from the outputs to the input may be beneficial in terms of system identification accuracy. In this section we concentrate on the case of input signal power constraints, which appear when we estimate parameters of LTI systems having a direct physical interpretation. One more argument for considering the frequency domain approach is that it retains the elegance and many properties of the Kiefer–Wolfowitz approach to optimal experiment design for regression estimation (see Section 2.5). As mentioned above, later on in this book we assume that observations are made at every time instant or, even if the observed signals are sampled, that the sampling rate is so high that we can consider them as made continuously in time. Due to this assumption the formulas that follow are much more transparent. Furthermore, we can use the calculus of variations and strong results concerning eigenfunctions and eigenvalues of integral operators with symmetric kernels. We assume that the observations available for estimating the unknown parameters are the following:
$$Y(t) = y(t; \bar a) + \varepsilon(t), \qquad t \in [-T, T], \tag{5.5}$$
where ε(t) is a zero-mean, uncorrelated, Gaussian “white noise” with unit intensity. More precisely, ε(t) is implicitly defined by
$$dW(t) = \varepsilon(t)\, dt, \tag{5.6}$$
where W(t) is the standardized Wiener process. In other words, $E[\varepsilon(t)\, \varepsilon(\tau)] = \delta(t - \tau)$, where δ is the Dirac delta function. We do not lose generality by assuming that ε(t) has unit intensity, since the intensity appears only as a multiplier of the FIM and does not influence the shape of the optimal input signal.
It can be shown (see [53]) that the FIM $M_T(u)$ for estimating $\bar a$ from (5.5) has the form
$$M_T(u) = \int_{-T}^{T} \nabla_a y(t; \bar a)\, \big(\nabla_a y(t; \bar a)\big)^{tr}\, dt, \tag{5.7}$$
where $\nabla_a y(t; \bar a)$ is the gradient of $y(t; \bar a)$ with respect to $\bar a$.
Remark 16. The elements of $\nabla_a y(t; \bar a)$ are called the sensitivities of $y(t; \bar a)$ with respect to the components of $\bar a$. They play a role similar to that of the vectors v(x) in the previous chapters. Partial derivatives of $y(t; \bar a)$ with respect to the $a^{(j)}$’s can be calculated in many ways, as explained in the chapters that follow. In particular, under not too restrictive assumptions, one can formally differentiate (5.4) with respect to $a^{(j)}$ in order to obtain an ODE for $\partial y(t; \bar a)/\partial a^{(j)}$ that has the solution of (5.4) as a formal input.
From the Cramér–Rao inequality (see [118, 149]) we know that for any unbiased estimator $\tilde a$ of $\bar a$ we have
$$\mathrm{cov}(\tilde a) \ge M_T^{-1}(u), \tag{5.8}$$
which is understood in the sense that the matrix $\mathrm{cov}(\tilde a) - M_T^{-1}(u)$ is nonnegative definite. When T → ∞, the LSE is asymptotically unbiased and equality in (5.8) is attained (see Section 2.8). We refer the reader to [114] for more exact inequalities when the system response depends nonlinearly on the unknown parameters. Thus, it is meaningful to minimize statistically interpretable functionals of $M_T^{-1}(u)$ with respect to u(⋅) under certain constraints. Such problems are difficult, since
$$M_T(u) = \int_{-T}^{T}\!\int_{-T}^{T}\!\int_{-T}^{T} \bar k(\tau; \bar a)\, \bar k^{tr}(\nu; \bar a)\, u(t - \tau)\, u(t - \nu)\, d\tau\, d\nu\, dt, \tag{5.9}$$
where
$$\bar k(t; \bar a) \stackrel{def}{=} \nabla_a g(t; \bar a), \tag{5.10}$$
while $g(t; \bar a)$ is the impulse response (Green’s function) of (5.4). Note that $g(t; \bar a) = 0$ for t < 0. The classic approach to solving such problems is to admit u(⋅) to be second-order stationary random processes with finite average power and to allow T → ∞. This leads to
considering the normalized FIM of the form
$$M(R_u) \stackrel{def}{=} \lim_{T\to\infty} \frac{1}{2T}\, M_T(u) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \bar k(\tau; \bar a)\, \bar k^{tr}(\nu; \bar a)\, R_u(\tau - \nu)\, d\tau\, d\nu, \tag{5.11}$$
where
$$R_u(\tau) \stackrel{def}{=} \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} u(t)\, u(t + \tau)\, dt \tag{5.12}$$
is the autocorrelation function of the stochastic process u(⋅). Above and in further chapters, the stochastic independence of u(⋅) and ε(⋅) is postulated. Denote by ℱ(⋅) the Fourier transform of the function in parentheses. More precisely, among the many definitions of the Fourier transform (which differ in how the constant multiplier is split between the direct and the inverse transform), we select the following version, which is the one most frequently used in automatic control:
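$$\mathcal F(x(\cdot))(j\omega) = X(j\omega) = \int_{-\infty}^{\infty} x(t)\, \exp(-j\omega t)\, dt, \tag{5.13}$$
$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(j\omega)\, \exp(j\omega t)\, d\omega. \tag{5.14}$$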
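Then, by the Parseval theorem, we can express the FIM (5.11) as a functional of the input spectral density $S_u(\cdot)$ as follows:
$$M(S_u) = \frac{1}{2\pi} \int_{-\infty}^{\infty} K(j\omega; \bar a)\, K^{tr}(-j\omega; \bar a)\, S_u(\omega)\, d\omega, \tag{5.15}$$
where the input spectral density and the vector K are defined by
$$S_u(\omega) = \mathcal F(R_u(\cdot)), \qquad K(j\omega; \bar a) = \mathcal F(\bar k(\cdot; \bar a)). \tag{5.16}$$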
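In order to justify (5.15), consider the following sequence of equalities for a typical element $m_{ij}(S_u)$ of $M(S_u)$:
$$m_{ij}(S_u) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} k_i(\tau)\, k_j(\nu)\, R_u(\tau - \nu)\, d\tau\, d\nu = \int_{-\infty}^{\infty} k_j(\nu) \Big[ \int_{-\infty}^{\infty} k_i(\tau)\, R_u(\tau - \nu)\, d\tau \Big] d\nu. \tag{5.17}$$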
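Noting that, by the symmetry of $R_u$, the inner integral in (5.17) is the convolution $(k_i * R_u)(\cdot)$, so that
$$\mathcal F\Big\{ \int_{-\infty}^{\infty} k_i(\tau)\, R_u(\tau - \cdot)\, d\tau \Big\} = K_i(j\omega)\, S_u(\omega), \tag{5.18}$$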
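we further obtain
$$\begin{aligned}
m_{ij}(S_u) &= \int_{-\infty}^{\infty} k_j(\nu)\, \mathcal F^{-1}\{K_i(j\omega)\, S_u(\omega)\}(\nu)\, d\nu \\
&= \int_{-\infty}^{\infty} k_j(\nu) \Big[ \frac{1}{2\pi} \int_{-\infty}^{\infty} K_i(j\omega)\, S_u(\omega)\, \exp(j\omega\nu)\, d\omega \Big] d\nu \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} K_i(j\omega)\, S_u(\omega) \Big[ \int_{-\infty}^{\infty} k_j(\nu)\, \exp(j\omega\nu)\, d\nu \Big] d\omega \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} K_i(j\omega)\, K_j(-j\omega)\, S_u(\omega)\, d\omega.
\end{aligned} \tag{5.19}$$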
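The equality between the time-domain FIM (5.11) and its frequency-domain form (5.15) can be checked numerically. The sketch assumes a hypothetical instance: the first-order system $\dot y + a_0 y = a_1 u$ with nominal values $a_0 = 1$, $a_1 = 2$, driven by $u(t) = \sqrt{2}\cos(\omega_0 t)$, whose time-averaged autocorrelation (5.12) is $R_u(\tau) = \cos(\omega_0 \tau)$. Then $S_u$ consists of two lines of mass π at $\pm\omega_0$, and (5.15) reduces to $\mathrm{Re}\{K(j\omega_0) K^{tr}(-j\omega_0)\}$. The sensitivities are obtained, as suggested in Remark 16, by differentiating the ODE with respect to the parameters, and the time average is taken over [0, T] for a long horizon T, so that the zero initial conditions matter little.

```python
import numpy as np
from scipy.integrate import solve_ivp

a0, a1, w0 = 1.0, 2.0, 0.8   # hypothetical nominal parameters and input frequency
T = 300.0                    # long horizon: initial transients contribute O(1/T)

def u(t):
    # unit average power, R_u(tau) = cos(w0 * tau)
    return np.sqrt(2.0) * np.cos(w0 * t)

def rhs(t, z):
    y, s0, s1 = z[:3]
    dy = -a0 * y + a1 * u(t)     # system: dy/dt + a0 y = a1 u
    ds0 = -a0 * s0 - y           # sensitivity dy/da0 (ODE differentiated w.r.t. a0)
    ds1 = -a0 * s1 + u(t)        # sensitivity dy/da1 (ODE differentiated w.r.t. a1)
    # running integrals of the entries of s s^tr, cf. (5.7)
    return [dy, ds0, ds1, s0 * s0, s0 * s1, s1 * s1]

sol = solve_ivp(rhs, (0.0, T), np.zeros(6), rtol=1e-8, atol=1e-10)
m00, m01, m11 = sol.y[3:, -1] / T
M_time = np.array([[m00, m01], [m01, m11]])

# Frequency-domain counterpart (5.15) for spectral lines of mass pi at +-w0
K = np.array([-a1 / (a0 + 1j * w0) ** 2, 1.0 / (a0 + 1j * w0)])
M_freq = np.real(np.outer(K, K.conj()))

print(M_time)
print(M_freq)
```

The two matrices should agree up to an error of order 1/T caused by the transients and by incomplete periods of the oscillating terms.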
Summarizing, $M(S_u)$, i.e., the normalized FIM for a long observation horizon (T → ∞), can be expressed as a matrix which is linear with respect to the spectral density $S_u(\omega)$ of the input stochastic process u. Note that the symmetry of the autocorrelation function, $R_u(\tau) = R_u(-\tau)$ for τ ∈ R, implies that $S_u(\omega)$ is a real-valued and symmetric function. This fact, in turn, implies that the elements of $M(S_u)$ are real-valued.

The class of admissible $S_u$

Attempting to maximize functionals of $M(S_u)$, we need to impose certain constraints on $S_u$. The reasons are both practical and theoretical, since without them any reasonable functional of $M(S_u)$ would be unbounded. On the other hand, we attempt to impose constraints that are as weak as possible, because we would like to investigate the attainable parameter estimation accuracy, while other practical constraints can be imposed later. As the main constraint, we bound the average power of the input signal u(⋅). By the Parseval equality, it can be expressed in the frequency domain as follows:
$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} u^2(t)\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_u(\omega)\, d\omega. \tag{5.20}$$
This expression should be bounded from above. Additionally, the largest available (or admissible) frequency, $\omega_{max} > 0$, say, should be finite. Thus, the class 𝒮 of all admissible spectral densities of input signals is defined as follows:
$$\mathcal S = \Big\{ S_u : \frac{1}{2\pi} \int_{-\omega_{max}}^{\omega_{max}} S_u(\omega)\, d\omega \le 1, \ \ S_u(\omega) = 0 \text{ for all } |\omega| > \omega_{max} \Big\}. \tag{5.21}$$
One may ask why the average power of input signals is bounded by 1 in (5.21) instead of by an arbitrary positive value. The answer is that the D-optimality criterion scales nicely, without changing the shape of the optimal solution. Additionally, we already know from previous chapters that equality in (5.21) holds for the optimal solution.

Some properties of the class of admissible $S_u$

Convexity. 𝒮 is a convex set, i.e., for all $0 \le \alpha \le 1$ we have $S_u^{(1)}, S_u^{(2)} \in \mathcal S \Rightarrow \alpha S_u^{(1)} + (1 - \alpha) S_u^{(2)} \in \mathcal S$.

Important special case. Let $u(t) = A\cos(\omega_0 t + \phi)$, where $\omega_0$ is given, while ϕ is a random variable uniformly distributed in [0, 2π]. Then, for $A^2 \le 1$,
$$S_u(\omega) = \frac{\pi A^2}{2} \big[ \delta(\omega - \omega_0) + \delta(\omega + \omega_0) \big], \qquad S_u \in \mathcal S, \tag{5.22}$$
where δ(⋅) is the Dirac delta. Several remarks are in order concerning the class of admissible $S_u$.
1. The case $\omega_0 = 0$ is interpreted as u(t) = const.
2. There is no need to fear Dirac delta distributions, since $\delta(\omega - \omega_0)$ appears only under integrals and we use only the fact that $\int f(\omega)\, \delta(\omega - \omega_0)\, d\omega = f(\omega_0)$. Remark: Dirac delta distributions can be avoided by using the Stieltjes integral and the cumulative spectral density representation.
3. Clearly, the spectral lines of $u(t) = A_1\cos(\omega_1 t + \phi_1) + A_2\cos(\omega_2 t + \phi_2)$ are located at the frequencies $\pm\omega_1$ and $\pm\omega_2$.
In order to make the definition of 𝒮 more precise, consider two cases:
(a) Let $S_u(\omega)$ be a nonnegative, symmetric, and integrable function, which is zero for $|\omega| > \omega_{max}$. Then its cumulative spectral density representation, denoted further by $\mathbb S_u$, has the form $\mathbb S_u(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\omega} S_u(\nu)\, d\nu$, and the condition on the mean power in (5.21) reads as $\mathbb S_u(\omega_{max}) = 1$. In such cases one can interpret the integrals
$$\int_{-\infty}^{\infty} \Upsilon(\omega)\, S_u(\omega)\, d\omega \tag{5.23}$$
as Riemann integrals, assuming that the function ϒ(ω) is Riemann integrable.
(b) In the definition of 𝒮 we allow $\mathbb S_u(\omega)$ to be any nonnegative function such that:
(Cums1) $\mathbb S_u$ is nondecreasing,
(Cums2) $\mathbb S_u(\omega)$ is nonnegative, $\mathbb S_u(\omega) = 0$ for $\omega < -\omega_{max}$, while $\mathbb S_u(\omega) = 1$ for $\omega > \omega_{max}$,
(Cums3) $\mathbb S_u(\omega) = 1 - \mathbb S_u(-\omega)$ for all ω.
Note that the class of $\mathbb S_u$ for which (Cums1)–(Cums3) hold admits functions with discontinuities. In particular, $\mathbb S_u(\omega) = \frac12\big(\mathbb 1(\omega + \omega_0) + \mathbb 1(\omega - \omega_0)\big)$ is allowed, where $\mathbb 1(\omega)$ is the Heaviside unit step function. Note that in this example we have
$$\int_{-\infty}^{\infty} \Upsilon(\omega)\, d\mathbb S_u(\omega) = \frac12 \big( \Upsilon(-\omega_0) + \Upsilon(\omega_0) \big), \tag{5.24}$$
where the integral is understood in the Stieltjes sense. Example (5.24) shows that we are able to cope with spectral densities that contain Dirac delta functions without referring to the theory of distributions. For theoretical purposes, further in this book we define 𝒮 as the class of functions for which (Cums1)–(Cums3) hold, and integrals with them are understood in the Stieltjes sense. However, we believe that the classic notation is more intuitive, and for this reason we keep the notation used in (5.21) in the rest of this book.

Example 27. Consider the system
$$\dot y(t) + a\, y(t) = u(t),$$
where a is unknown. Then, $g(t; a) = \exp(-a t)$ for t > 0 and zero otherwise. Its sensitivity to changes of a has the form
$$k(t; a) = -t \exp(-a t), \quad t > 0, \qquad K(j\omega; a) = \mathcal F(k(\cdot; a)) = -(a + j\omega)^{-2}.$$
For $u(t) = \cos(\omega_0 t + \phi)$ with ϕ uniformly distributed in [0, 2π], the information matrix is the following scalar function of $\omega_0$:
$$M(S_u) = \frac12\, (a^2 + \omega_0^2)^{-2}.$$
Figure 5.1: The sensitivity of the information of the first-order ODE for departures from the optimal input signal cos(ω0 t + ϕ) for ω0 = 0 toward larger frequencies, depending on unknown parameter a.
Its shape is depicted in Fig. 5.1. It attains its maximum for $\omega_0^* = 0$, which is interpreted as a constant signal. Note that the optimal frequency $\omega_0^*$ does not depend on the unknown parameter a, but this is a rare case. Note that in [53] (Example 6.4.3) a seemingly similar example is considered. Namely, in our notation,
$$a\, \dot y(t) + y(t) = u(t), \tag{5.25}$$
where a is an unknown parameter. Then, the optimal input signal is a single harmonic, but with frequency $\omega_0^* = 1/a$.

Problem statement and the optimality conditions

Given system (5.4) of known order r and unknown parameters $\bar a$, find $S_u^* \in \mathcal S$ such that
$$\max_{S_u \in \mathcal S} \ln\det[M(S_u)] = \ln\det[M(S_u^*)]. \tag{5.26}$$
In the problem statement we use the D-optimality criterion, which is equivalent to minimizing the volume of the uncertainty ellipsoid of the estimated parameters. However, all the results apply to the class of L-optimality criteria, $\mathrm{trace}[W M^{-1}(S_u)]$, and to the $L_p$-class of criteria, $(\mathrm{trace}[M^{-p}(S_u)])^{1/p}$, p > 0.

The curse of a priori knowledge

As in optimum experiment design for nonlinear (in parameters) regression estimation (see Section 2.8), here also the optimal $S_u$ depends on the unknown $\bar a$. Adopting Bellman’s terminology, we can call this difficulty the curse of a priori knowledge, since for designing the optimal input signal for estimating $\bar a$ we need to know it. In the literature a number of ways of circumventing this difficulty have been proposed. The most common are the following (see also Section 2.8).
1. Use “nominal” parameter values for $\bar a$, i.e., typical (standard) values, frequently available as physical constants.
2. Apply the “worst case” analysis, i.e., the minimax approach, trying to find the values of $\bar a$ that are the least favorable from the viewpoint of an optimal choice of input signals. This approach is computationally the most difficult and it is considered to be too conservative.
3. Use the Bayesian approach, broadly discussed in Section 2.8. The difficulty does not vanish, since a priori knowledge of the distribution of $\bar a$ is rather rare.
4. Apply the “adaptive” approach (see [46]) of subsequent estimation and planning stages.
Later on, we use the approach based on “nominal” parameter values $\bar a$ as the simplest one, but the results are also relevant as the first stage of an “adaptive” approach.

Existence of the solution

If for fixed $\bar a$ the mapping
$$\omega \mapsto K(j\omega; \bar a)\, K^{tr}(-j\omega; \bar a) \tag{5.27}$$
is continuous for $\omega \in [-\omega_{max}, \omega_{max}]$, then the set ℳ of all attainable FIMs, defined as
$$\mathcal M \stackrel{def}{=} \{ M(S_u) : S_u \in \mathcal S \},$$
is convex and compact. The convexity of ℳ follows directly from the convexity of 𝒮 and the linearity of $M(S_u)$ with respect to $S_u$. The continuity of (5.27) over a finite and closed interval, together with the bounds on $S_u \in \mathcal S$, implies that ℳ is bounded. The proof that ℳ is closed is more subtle, but it can be done following the guidelines of the corresponding proof in [46].
Remark 17. To explain why the result from [46] can be invoked to prove that ℳ is closed, note that functions for which (Cums1)–(Cums3) hold can be interpreted as cumulative distribution functions of random variables. Thus, their derivatives, if they exist, have the same properties as the experiment designs considered in the previous chapters.
As a consequence of these facts and of the strict concavity of the function log det[⋅] over the set of all symmetric, positive definite matrices, we obtain the following result.
Corollary 2. There exists a unique $M^* \in \mathcal M$ that solves (5.26). Note that $M^*$ can be achieved by several different $S_u^* \in \mathcal S$.
Important properties of $S_u^*$

We start from a general remark concerning a canonical realization of $S_u$. For each $S_u \in \mathcal S$ there exists $S_u^d \in \mathcal S$ such that $M(S_u) = M(S_u^d)$ and $S_u^d$ is a mixture of not more than $r_1(r_1 + 1)/2 + 1$ sinusoids, where $r_1 \stackrel{def}{=} \dim(\bar a) = r + 1$. This is a direct consequence of the Carathéodory theorem. In particular, if $S_u^* \in \mathcal S$ is D-optimal, then $M(S_u^*)$ is on the boundary of ℳ, and it can be realized by a sum of not more than $r_1(r_1 + 1)/2$ sinusoids with amplitudes $A_1^*, A_2^*, \dots$ and frequencies $\omega_1^*, \omega_2^*, \dots$, and it suffices to find them. Thus, problem (5.26), originally formulated in an infinite-dimensional space containing 𝒮, is reduced to a problem in a finite-dimensional space.
Remark 18. A further reduction of the optimization problem is possible since, due to the symmetry of the spectral densities $S_u$, the FIM (5.15) can be expressed as
$$M(S_u) = \frac{1}{\pi} \int_0^{\omega_{max}} \mathrm{Re}\{ K(j\omega; \bar a)\, K^{tr}(-j\omega; \bar a) \}\, S_u(\omega)\, d\omega \tag{5.28}$$
and the maximization of log det[M(S_u)] can be confined to $S_u$ that are nonnegative and such that
$$\frac{1}{\pi} \int_0^{\omega_{max}} S_u(\omega)\, d\omega = 1. \tag{5.29}$$
This convention is used in the monograph [53]. Here, we keep the symmetric version of $S_u$ as more convenient for manipulating formulas, but for constructing numerical algorithms it is desirable to use (5.28) and (5.29) in order to avoid the symmetrization of partial results at each iteration.

The equivalence theorem for D-optimal input signals

A crucial hint for finding or numerically searching for D-optimal input signals is provided by the analog of the Kiefer–Wolfowitz theorem (see [53] and the bibliography cited therein).
Theorem 7 (The optimality condition). Assuming continuity of the mapping (5.27), $S_u^* \in \mathcal S$ is D-optimal if and only if
$$\sup_{\omega \in [-\omega_{max}, \omega_{max}]} \varphi(\omega, S_u^*) = r + 1,$$
where, for $S_u$ with $\det[M(S_u)] > 0$, the function φ is defined as follows:
$$\varphi(\omega, S_u) \stackrel{def}{=} K^{tr}(-j\omega; \bar a)\, M^{-1}(S_u)\, K(j\omega; \bar a).$$
Furthermore, for every $\omega^*$ in the support of $S_u^*$ we have
$$\varphi(\omega^*, S_u^*) = r + 1. \tag{5.30}$$
Several remarks are in order concerning the above theorem.
– The function $\varphi(\omega, S_u)$ can be interpreted in terms of the asymptotic variance of the prediction provided by the model $y(t, \hat{\bar a})$, where $\hat{\bar a}$ is obtained by the LSQ method. Here we omit the discussion of the relationships of (5.30) to G-optimality, but they are almost as tight as in the Kiefer–Wolfowitz theorem (see Chapter 2).
– Condition (5.30) allows us to check the optimality of a given $S_u$ rather than to infer what it looks like.
– Condition (5.30) is a powerful tool for constructing algorithms for searching for $S_u^*$ in the spirit of the Wynn–Fedorov method and its further refinements.
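Condition (5.30) suggests a simple vertex-direction algorithm of the Wynn–Fedorov type. The sketch below is an illustration under stated assumptions, not the algorithm of [53]: it uses the one-sided form (5.28)–(5.29), discretizes $(0, \omega_{max}]$ into a grid, represents a design by weights $w_i$ summing to one (weight $w_i$ being the power of a harmonic $\sqrt{2 w_i}\cos(\omega_i t + \phi_i)$), and uses Wynn’s classical step length $1/(n+2)$. As the model we take, for illustration, the two-parameter system $\dot y + a_0 y = a_1 u$ (the system of Example 28) with hypothetical nominal values $a_0 = 1$, $a_1 = 2$ and $\omega_{max} = 3$.

```python
import numpy as np

a0, a1, omega_max = 1.0, 2.0, 3.0              # hypothetical nominal values
omega = np.linspace(0.01, omega_max, 300)      # candidate frequencies in (0, omega_max]
K = np.stack([-a1 / (a0 + 1j * omega) ** 2, 1.0 / (a0 + 1j * omega)], axis=1)
A = np.real(K[:, :, None] * K.conj()[:, None, :])   # atoms Re{K K^H}, shape (n, 2, 2)
p = A.shape[1]                                      # number of parameters, r + 1

def fim(w):
    """Discrete version of (5.28): M = sum_i w_i Re{K(j w_i) K^tr(-j w_i)}."""
    return np.einsum("i,ijk->jk", w, A)

def phi(w):
    """phi(omega_i, S_u) = trace(M^{-1} A_i) = K^H M^{-1} K on the grid."""
    Minv = np.linalg.inv(fim(w))
    return np.einsum("jk,ikj->i", Minv, A)

w = np.full(omega.size, 1.0 / omega.size)      # nonsingular starting design
for n in range(20000):
    d = phi(w)
    i_star = np.argmax(d)                      # frequency where phi is largest
    if d[i_star] - p < 1e-4:                   # (5.30) satisfied up to a tolerance
        break
    alpha = 1.0 / (n + 2)                      # Wynn's step length
    w *= 1.0 - alpha
    w[i_star] += alpha

det_ratio = np.linalg.det(fim(w)) / (27 * a1**2 / (256 * a0**6))  # compare with (5.32)
mean_freq = np.sum(w * omega)
phi_max = phi(w).max()
print(det_ratio, mean_freq, phi_max)
```

For this model the mass should concentrate around $a_0/\sqrt 3 \approx 0.577$, the single D-optimal frequency found analytically in Example 28. The simple step $1/(n+2)$ converges slowly; line-search variants of the Fedorov step are a common refinement.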
Example 28. As an example of applying the above theorem directly, consider a system with two unknown parameters,
$$\dot y(t) + a_0\, y(t) = a_1\, u(t). \tag{5.31}$$
For a suitably chosen $\omega_0$, namely $\omega_0 = a_0/\sqrt 3$, we shall prove the D-optimality of the input signal $u(t) = \sqrt 2\cos(\omega_0 t + \phi)$, with ϕ uniformly distributed in [0, 2π], which has the autocorrelation function $R_u(\tau) = \cos(\omega_0 \tau)$. Note that for the corresponding $S_u$ we have
$$\frac{1}{2\pi} \int_{-\infty}^{\infty} S_u(\omega)\, d\omega = 1.$$
For the system (5.31) we have $g(t; \bar a) = a_1 \exp(-a_0 t)$ for t ≥ 0 and zero otherwise. Consequently, $\bar k(t; \bar a) = [-a_1 t \exp(-a_0 t),\ \exp(-a_0 t)]^{tr}$ and
$$K(j\omega) = \Big[ \frac{-a_1}{(a_0 + j\omega)^2},\ \frac{1}{a_0 + j\omega} \Big]^{tr}.$$
The corresponding normalized FIM has the following elements:
$$m_{11} = \frac{a_1^2}{(a_0^2 + \omega_0^2)^2}, \qquad m_{12} = m_{21} = -\frac{a_0 a_1}{(a_0^2 + \omega_0^2)^2}, \qquad m_{22} = \frac{1}{a_0^2 + \omega_0^2}.$$
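Its determinant depends on $\omega_0$ in the following way:
$$\det(M) = m_{11} m_{22} - m_{12}^2 = \frac{a_1^2\, \omega_0^2}{(a_0^2 + \omega_0^2)^4},$$
and, since setting its derivative with respect to $\omega_0^2$ to zero gives $a_0^2 + \omega_0^2 = 4\omega_0^2$, it attains its maximum, which equals
$$\frac{27\, a_1^2}{256\, a_0^6}, \qquad \text{for } \omega_0 = a_0/\sqrt 3. \tag{5.32}$$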
Such a direct proof is rarely possible. Thus, it is useful to also demonstrate the applicability of Theorem 7. To this end, it suffices to calculate the maximum of the expression
$$\varphi(\omega, S_u^*) = \frac{(a_0^2 + \omega_0^2)^2\, (\omega_0^2 + \omega^2)}{\omega_0^2\, (a_0^2 + \omega^2)^2} \tag{5.33}$$
with respect to ω and to check that $\max_\omega \varphi(\omega, S_u^*) = 2$, attained for $\omega^* = \omega_0 = a_0/\sqrt 3$. This can be done analytically (e.g., using Mathematica; see also Fig. 5.2).
Figure 5.2: Prediction variance in two parameters for the first-order model.
One can ask how much we lose if, instead of the D-optimal frequency $\omega^* = a_0/\sqrt 3$, we use the frequency $a_f/\sqrt 3$, where $a_f$ is a nominal value of $a_0$. To this end, let us set $b = a_f/a_0$. Denote by $S_u^f$ the spectral density of the harmonic signal with $\omega = a_f/\sqrt 3$. Then, the relative efficiency has the form
$$\mathrm{Eff} = \frac{\det[M(S_u^f)]}{\det[M(S_u^*)]} = \frac{256\, b^2}{(3 + b^2)^4}. \tag{5.34}$$
From Fig. 5.3 it follows that if the nominal parameter is selected with a relative error of at most about 30 %, i.e., $0.7 \le b = a_f/a_0 \le 1.3$, then the loss of efficiency is acceptable, namely, Eff is about 85 % or more. Conclusion: even if our knowledge of a nominal parameter is largely uncertain (or vague), it pays to design the experiment. The important conclusion from this example is also that, in the framework of the theory of input signal design based on spectral densities, it may happen that the optimal spectral density has fewer harmonics than the number of estimated parameters (in this example, two parameters and an optimal input signal with only one harmonic). This is possible because a single harmonic contributes both the real and the imaginary parts of $K(j\omega_0)$, i.e., a rank-two term $\mathrm{Re}\{K(j\omega_0) K^{tr}(-j\omega_0)\}$, to the FIM. This phenomenon is in contrast to the theory of D-optimal experiment designs for estimating regression parameters (see the previous chapters). We remark that it also appeared in [53] (Example 6.4.4) for the first-order system with two unknown parameters, but the parametrization was different from ours.
Figure 5.3: The loss of design efficiency as a function of the ratio b = af /a0 .
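The calculations of Example 28 can be cross-checked numerically. The sketch below uses hypothetical nominal values $a_0 = 1$, $a_1 = 2$, evaluates the normalized FIM of a single harmonic with $R_u(\tau) = \cos(\omega_0\tau)$ on a fine frequency grid, and compares the grid maximizer of the determinant, the value of $\varphi$ from Theorem 7, and the efficiency (5.34) with the closed-form expressions. The helper names are ours.

```python
import numpy as np

a0, a1 = 1.0, 2.0   # hypothetical nominal values

def K(w):
    """K(jw) = [-a1/(a0 + jw)^2, 1/(a0 + jw)]^tr for the system (5.31)."""
    return np.array([-a1 / (a0 + 1j * w) ** 2, 1.0 / (a0 + 1j * w)])

def fim(w0):
    """Normalized FIM (5.15) for R_u(tau) = cos(w0 tau): Re{K(jw0) K^tr(-jw0)}."""
    k = K(w0)
    return np.real(np.outer(k, k.conj()))

def phi(w, M):
    """K^tr(-jw) M^{-1} K(jw), cf. Theorem 7 and (5.33)."""
    k = K(w)
    return np.real(k.conj() @ np.linalg.solve(M, k))

grid = np.linspace(0.01, 3.0, 30001)
dets = np.array([np.linalg.det(fim(w)) for w in grid])
w_best = grid[np.argmax(dets)]                 # grid maximizer of det M

w_star = a0 / np.sqrt(3.0)
det_star = 27 * a1**2 / (256 * a0**6)          # closed form (5.32)
M_star = fim(w_star)
phis = np.array([phi(w, M_star) for w in grid])

def eff(b):
    """Relative efficiency of the frequency b * w_star, to be compared with (5.34)."""
    return np.linalg.det(fim(b * w_star)) / det_star

print(w_best, dets.max(), det_star, phis.max(), eff(0.7), eff(1.3))
```

The grid maximizer should lie within the grid spacing of $a_0/\sqrt 3$, the maximum of $\varphi$ should equal $2 = r + 1$, and `eff(b)` should reproduce $256\, b^2/(3 + b^2)^4$.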
A generalization to unbounded power input signals

Intuition gained up to now suggests that if we allow less restrictive (“wild”) input signals, then we can expect better results in terms of estimation accuracy. For this reason it is reasonable to consider the class of stochastic input signals such that, for a certain integer $k \ge 0$, the limits
$$\lim_{T\to\infty} \frac{u(T)}{T^{2k+1}} = 0, \qquad \lim_{T\to\infty} \frac{1}{T^{2k+1}} \int_{-T}^{T} u(t)\, u(t + \tau)\, dt < \infty \quad \forall \tau \tag{5.35}$$
exist, where the convergence is understood in the probability sense. Then, in [125] it was proved that D-optimal input signals for estimating parameters in LTI systems have the form $t^k \times$ (a finite linear combination of sinusoids). Note that for k = 0 we obtain the previous results as a special case. Additionally, an algorithm based on the Wynn–Fedorov method can be used for finding the frequencies of the aforementioned sinusoids. The above results assume that our system is able to “survive” such input signals. If so, then the estimation accuracy is much better.

A generalization to multi-output systems

One can generalize the above results to multi-output systems, in particular, to those that arise by sampling more than one output field. The necessary and sufficient optimality conditions in these cases can be found in [91]. In this paper one can also find remarks on computational algorithms relevant to this case. In general, they are more difficult, since selecting the proper outputs to be measured requires one to solve a sequence of one-dimensional problems.
5.3 Optimal input signals for ODE systems – the time domain approach

In this section we again consider optimal input signals for estimating parameters in LTI systems described by ODEs, but now we discuss a time domain approach. It is based on the variational approach. The reader is referred to [169] for its contemporary presentation dedicated to control theory. In the appendix of [169] one can also find the definitions and basic properties of the Gateaux differential and the Fréchet derivative. One may ask why we need a time domain theory of optimal input signals for parameter estimation in finite-dimensional LTI systems. The reasons are the following.
– Frequency domain (FD) theory is elegant, but it requires a long observation horizon (T → ∞), which is not always possible in practice.
– In FD theory we can easily impose constraints only on the energy (power) of a signal, but not, e.g., on its amplitude.
– A sum of sinusoids can take unexpectedly large values (see Fig. 5.4).
In this section we consider deterministic (nonstochastic) input signals of finite energy, defined on a finite time interval. Namely, we admit the following class of input signals:
$$\mathcal U \stackrel{def}{=} \Big\{ u : \int_0^T u^2(t)\, dt \le 1 \Big\}.$$
Figure 5.4: The sum of six sine waves, each of amplitude 1, has a maximum greater than 4. The expected cancellations do not occur.
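The effect shown in Fig. 5.4 is easy to reproduce numerically. The sketch below assumes, purely for illustration, the frequencies 1, 2, ..., 6 with zero phases (the figure may use other values) and evaluates the peak of the sum on a fine grid over one period; for this choice the peak exceeds 4, although each wave is bounded by 1.

```python
import numpy as np

def sine_sum_peak(n_waves=6, n_grid=200_001):
    """Peak of sum_{k=1}^{n} sin(k t) over one period [0, 2*pi].

    Assumption (hypothetical, not from the book): frequencies 1, 2, ..., n
    with zero phases and unit amplitudes.
    """
    t = np.linspace(0.0, 2.0 * np.pi, n_grid)
    k = np.arange(1, n_waves + 1)[:, None]      # shape (n, 1) for broadcasting
    s = np.sin(k * t).sum(axis=0)               # sum over the n waves
    return s.max()

peak = sine_sum_peak()
print(f"peak of six unit-amplitude sines: {peak:.3f}")
```

An amplitude constraint would therefore have to be imposed directly in the time domain, which is one of the motivations listed above.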
5.3 ODE systems – the time domain approach | 95
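For such signals, the expression for the FIM, as a functional of $u(\cdot)$, has the following form:
$$M_T(u) = \underbrace{\int_0^T\!\!\int_0^T H(\tau,\nu;\bar a)\,u(\tau)\,u(\nu)\,d\tau\,d\nu}_{\text{bilinear}},$$
where
$$H(\tau,\nu;\bar a) \stackrel{\text{def}}{=} \int_0^T \bar k(t-\tau;\bar a)\,\bar k^{\mathrm{tr}}(t-\nu;\bar a)\,dt, \qquad (5.36)$$
while $\bar k(t;\bar a)$ is the $(r+1)\times 1$ vector of the sensitivities
$$\bar k(t;\bar a) \stackrel{\text{def}}{=} \nabla_{\bar a}\, g(t;\bar a). \qquad (5.37)$$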
In the above formulas, $g(t;\bar a)$ is the impulse response (Green's function) of ODE (5.1). Note also that $g(t;\bar a) = 0$ for $t < 0$. Thus, assuming zero initial conditions in (5.1), its solution can be expressed as
$$y(t) = \int_0^t g(t-\tau;\bar a)\,u(\tau)\,d\tau = \int_0^T g(t-\tau;\bar a)\,u(\tau)\,d\tau. \qquad (5.38)$$
Similarly, the property $g(t;\bar a) = 0$ for $t < 0$ implies that also $\bar k(t;\bar a) = 0$ for $t < 0$. Hence, we can also use $\int_0^T$ instead of $\int_0^t$ in the convolutions involving $\bar k(t;\bar a)$.
Problem statement. Find $u^*(\cdot) \in \mathcal U$ for which
$$\max_{u\in\mathcal U}\ \log\det[M_T(u)] \qquad (5.39)$$
is attained.
Note that for arbitrary $\varsigma > 0$ we have
$$M_T(\varsigma u) = \varsigma^2 M_T(u). \qquad (5.40)$$
A similar property is known under the name of a positively homogeneous criterion (see page 114 in [118]). Strictly speaking, the D-optimality criterion $\log\det[\cdot]$ is not positively homogeneous, but the criterion $(\det[\cdot])^{1/(r+1)}$, which is a monotone transformation of it, fulfills this requirement. Thus, we may safely use the $\log\det[\cdot]$ criterion in the rest of this book. Furthermore, by (5.40), $\log\det[M_T(\varsigma u)] = \log\det[M_T(u)] + 2(r+1)\log\varsigma$ increases with $\varsigma$. Hence, it suffices to look for D-optimal input signals such that $\int_0^T u^2(t)\,dt = 1$. The class of all such functions will be denoted as $\mathcal U_0$.
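The bilinear form above can be checked on a grid. The sketch below is a minimal illustration, assuming one unknown parameter ($r+1 = 1$) and the sensitivity $k(t) = -t\exp(-a t)$ of the first-order system of Example 30; it discretizes $M_T(u)$ by the midpoint rule in two ways (through the output sensitivity and through the kernel $H$ of (5.36)) and checks the homogeneity property (5.40). The helper name `fim_two_ways` is hypothetical.

```python
import numpy as np

def fim_two_ways(u, T, a):
    """Midpoint-rule discretization of M_T(u) for one parameter (r + 1 = 1).

    Assumed sensitivity (that of Example 30): k(t) = -t exp(-a t) for t >= 0,
    k(t) = 0 for t < 0. u holds input values at the n cell midpoints of [0, T].
    """
    n = u.size
    h = T / n
    t = (np.arange(n) + 0.5) * h
    lag = t[:, None] - t[None, :]                  # t_i - tau_j
    lp = np.clip(lag, 0.0, None)                   # avoids exp overflow for t < tau
    K = np.where(lag >= 0.0, -lp * np.exp(-a * lp), 0.0)
    # (a) output sensitivity s(t) = int k(t - tau) u(tau) dtau, then M = int s^2 dt
    s = K @ u * h
    m_direct = np.sum(s**2) * h
    # (b) kernel (5.36): H(tau, nu) = int k(t - tau) k(t - nu) dt,
    #     then the bilinear form M = int int H(tau, nu) u(tau) u(nu)
    H = K.T @ K * h
    m_kernel = u @ H @ u * h**2
    return m_direct, m_kernel

T, a, n = 2.0, 1.0, 400
u = np.ones(n) / np.sqrt(T)                        # unit energy: int u^2 dt = 1
m_direct, m_kernel = fim_two_ways(u, T, a)
m_scaled, _ = fim_two_ways(3.0 * u, T, a)          # (5.40) with varsigma = 3
print(m_direct, m_kernel, m_scaled / m_direct)
```

Both routes give the same number, and tripling the input multiplies $M_T$ by nine, as (5.40) predicts.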
D-optimality conditions by the variational approach
For $\gamma \in \mathbb R$ being the Lagrange multiplier, define
$$L(u,\gamma) = \log\det[M_T(u)] - \gamma\Big(\int_0^T u^2(t)\,dt - 1\Big).$$
As the auxiliary problem we consider the maximization of $L(u,\gamma)$ over $u \in \mathcal U$ for a certain $\gamma$. Let $u^* \in \mathcal U \cap C_0(0,T)$ be a solution of the auxiliary problem and let $u_\epsilon(t) = u^*(t) + \epsilon f(t)$, where $f \in C_0(0,T)$ is arbitrary. Then, for the Gateaux differential of $L$ we obtain
$$\frac{\partial L(u_\epsilon,\gamma)}{\partial\epsilon}\Big|_{\epsilon=0} = 2\int_0^T f(\nu)\Big[\int_0^T \ker(\tau,\nu,u^*)\,u^*(\tau)\,d\tau - \gamma\,u^*(\nu)\Big]\,d\nu = 0, \qquad (5.41)$$
where, for $u \in \mathcal U$, we define the kernel
$$\ker(\tau,\nu,u) \stackrel{\text{def}}{=} \operatorname{trace}[M_T^{-1}(u)\,H(\tau,\nu;\bar a)], \qquad (5.42)$$
and its symmetry was used in calculating $\frac{\partial L(u_\epsilon,\gamma)}{\partial\epsilon}\big|_{\epsilon=0} = 0$. Equality (5.41) holds for every $f \in C_0(0,T)$. Thus, from the fundamental lemma of the calculus of variations we infer that the expression in brackets in (5.41) must be zero for every $\nu \in [0,T]$, which implies the following.
Corollary 3. Input signal $u^*$ is an eigenfunction of the following integral equation:
$$\int_0^T \ker(\tau,\nu,u^*)\,u^*(\tau)\,d\tau = \gamma\,u^*(\nu), \qquad (5.43)$$
which is normalized as follows:
$$\int_0^T (u^*(t))^2\,dt = 1. \qquad (5.44)$$
This fact has been known for a long time (see [170]), and it was also noted that a solution of (5.43) exists only for certain values of $\gamma$. Below, we provide results that characterize this solution in more detail and in the spirit of the theory of optimal experiment design. Note that (5.43) is nonlinear with respect to $u^*$. Consider a family of associated linear eigenvalue problems: for an arbitrary but fixed $u \in \mathcal U$ for which $M_T^{-1}(u)$ exists, we look for eigenfunctions $\phi(\nu,u)$ and eigenvalues $\mu(u)$ (depending on $u$) such that
$$\int_0^T \ker(\tau,\nu,u)\,\phi(\nu,u)\,d\nu = \mu(u)\,\phi(\tau,u). \qquad (5.45)$$
It is expedient to express kernel (5.42) as follows:
$$\ker(\tau,\nu,u) = \int_0^T \bar k^{\mathrm{tr}}(t-\tau;\bar a)\,M_T^{-1}(u)\,\bar k(t-\nu;\bar a)\,dt. \qquad (5.46)$$
Corollary 4. If $M_T^{-1}(u)$ exists, then the kernel has the following properties.
(Ker1) It is symmetric: $\ker(\tau,\nu,u) = \ker(\nu,\tau,u)$, $\nu, \tau \in [0,T]$.
(Ker2) If all the elements of $\bar k(\cdot;\bar a)$ are square integrable on $[0,T]$, then also $\int_0^T \ker(t,t,u)\,dt < \infty$.
(Ker3) The kernel is nonnegative definite, i.e., for every $\tilde u \in L_2(0,T)$ we have
$$\int_0^T\!\!\int_0^T \ker(\tau,\nu,u)\,\tilde u(\tau)\,\tilde u(\nu)\,d\tau\,d\nu \ge 0. \qquad (5.47)$$
Remark 19. The assumption in (Ker2) that the elements of $\bar k(\cdot;\bar a)$ are square integrable on $[0,T]$ is immediately fulfilled for LTI systems that can be expressed in the standard state-space form, since then $\bar k(t;\bar a)$ is continuous for $t \in [0,T]$. The continuity of $\bar k(\cdot;\bar a)$, in turn, implies the continuity of (5.46) in $[0,T]^2$.
Proof. From (5.46) it follows that
$$\ker(\zeta,\zeta,u) \le \lambda_{\max}(M_T^{-1}(u))\int_0^T \bar k^{\mathrm{tr}}(t-\zeta;\bar a)\,\bar k(t-\zeta;\bar a)\,dt, \qquad (5.48)$$
where $\lambda_{\max}(\cdot)$ is the largest eigenvalue of the matrix in parentheses. Now, (Ker2) follows from the assumed integrability of $\bar k^{\mathrm{tr}}(t;\bar a)\,\bar k(t;\bar a)$. To prove (Ker3), note that
$$\int_0^T\!\!\int_0^T \ker(\tau,\nu,u)\,\tilde u(\tau)\,\tilde u(\nu)\,d\tau\,d\nu = \int_0^T \tilde k^{\mathrm{tr}}(t)\,M_T^{-1}(u)\,\tilde k(t)\,dt, \qquad (5.49)$$
where $\tilde k(t) \stackrel{\text{def}}{=} \int_0^T \bar k(t-\nu;\bar a)\,\tilde u(\nu)\,d\nu$. Now, (Ker3) follows from the fact that $M_T^{-1}(u)$ is positive definite.
Having Corollary 4 at our disposal, we can invoke the well-known results (see, e.g., [173]) concerning eigenvalues and eigenfunctions of an integral operator with a continuous, symmetric, and nonnegative definite kernel. In our case, these eigenfunctions and eigenvalues depend on a fixed $u$, which is reflected in our notation.
Corollary 5 (Implications of the Mercer theorem). For fixed $u \in L_2(0,T)$ with nonsingular $M_T(u)$, the eigenvalue problem (5.45) has the following properties.
(Ker4) The number of eigenfunctions $\phi_k(\cdot,u)$ and corresponding nonzero eigenvalues $\mu_k(u)$ is countable, $k = 1, 2, \dots$. It can be finite for degenerate kernels, and then eigenfunctions corresponding to $\mu_k(u) = 0$ can be selected in such a way that jointly they form an orthogonal basis of $L_2(0,T)$.
(Ker5) The eigenvalues are real and nonnegative. Without loss of generality, we can order them as follows: $\mu_{\max}(u) = \mu_1(u) \ge \mu_2(u) \ge \dots \ge 0$, including possible multiple eigenvalues in this ordering.
(Ker6) The corresponding eigenfunctions $\phi_k(\cdot,u)$, $k = 1, 2, \dots$, are orthogonal and complete in $L_2(0,T)$. They can be normalized in such a way that $\int_0^T \phi_k^2(t,u)\,dt = 1$. Furthermore, the kernel can be expressed in the following form:
$$\ker(\tau,\nu,u) = \sum_{k=1}^\infty \mu_k(u)\,\phi_k(\tau,u)\,\phi_k(\nu,u). \qquad (5.50)$$
Corollary 6. For every $u \in \mathcal U_0$ for which $M_T(u)$ is nonsingular we have $\max_{k=1,2,\dots}\mu_k(u) \ge r+1$.
Proof. From the completeness and orthogonality of $\phi_k(\cdot,u)$, $k = 1, 2, \dots$, we have the following expansion (convergent in the $L_2(0,T)$-norm):
$$u(t) = \sum_{k=1}^\infty u_k\,\phi_k(t,u), \qquad (5.51)$$
where $u_k \stackrel{\text{def}}{=} \int_0^T u(t)\,\phi_k(t,u)\,dt$. The Parseval equality yields
$$\sum_{k=1}^\infty u_k^2 = 1. \qquad (5.52)$$
On the other hand, we have
$$r+1 = \operatorname{trace}[M_T(u)\,M_T^{-1}(u)] = \int_0^T\!\!\int_0^T \ker(\tau,\nu,u)\,u(\tau)\,u(\nu)\,d\tau\,d\nu = \sum_{k=1}^\infty \mu_k(u)\,u_k^2 \le \max_{k=1,2,\dots}\mu_k(u), \qquad (5.53)$$
where in the last equality the Mercer theorem was applied, while the inequality follows from (5.52).
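Corollaries 4 and 6 can be checked on a Nyström (midpoint) discretization of (5.46). The sketch below assumes a hypothetical two-parameter model $g(t;a,b) = b\exp(-a t)$, $t \ge 0$ (so $r+1 = 2$, with sensitivities $-b\,t\exp(-a t)$ and $\exp(-a t)$), draws a random unit-energy input, and verifies the symmetry (Ker1), nonnegative definiteness (Ker3), the identity in (5.53), and the bound of Corollary 6. The function name `kernel_matrix` is illustrative only.

```python
import numpy as np

def kernel_matrix(u, T, a, b):
    """Nystrom (midpoint) matrix of ker(tau, nu, u) from (5.46).

    Hypothetical model g(t; a, b) = b exp(-a t), t >= 0, so r + 1 = 2.
    Returns Kmat (Kmat[j, l] ~ ker(tau_j, nu_l, u)), the FIM M and the step h.
    """
    n = u.size
    h = T / n
    t = (np.arange(n) + 0.5) * h
    lag = t[:, None] - t[None, :]
    lp = np.clip(lag, 0.0, None)
    causal = lag >= 0.0
    # sensitivities k_p(t_i - tau_j), p = a, b; shape (2, n, n)
    k = np.stack([np.where(causal, -b * lp * np.exp(-a * lp), 0.0),
                  np.where(causal, np.exp(-a * lp), 0.0)])
    s = np.einsum("pij,j->pi", k, u) * h          # output sensitivities s_p(t_i)
    M = s @ s.T * h                                # discretized FIM
    Minv = np.linalg.inv(M)
    # ker(tau_j, nu_l) = sum_i k(t_i - tau_j)^T M^{-1} k(t_i - nu_l) h
    Kmat = np.einsum("pij,pq,qil->jl", k, Minv, k, optimize=True) * h
    return Kmat, M, h

rng = np.random.default_rng(0)
T, a, b, n = 2.0, 1.0, 1.0, 200
u = rng.standard_normal(n)
u /= np.sqrt(np.sum(u**2) * T / n)                 # unit energy, i.e. u in U_0
Kmat, M, h = kernel_matrix(u, T, a, b)
mu = np.linalg.eigvalsh(Kmat * h)                  # eigenvalues of (5.45), ascending
quad = u @ Kmat @ u * h**2                         # int int ker u u, equals r + 1
print(quad, mu[-1])
```

The quadratic form equals $r+1 = 2$ regardless of the input, and the largest eigenvalue is not smaller than that, in agreement with Corollary 6.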
Analysis of the above partial results leads to the following.
Theorem 8. If $u^* \in \mathcal U_0$ is a solution of the optimization problem (5.39) with $M_T(u^*)$ nonsingular, then the following condition holds: $\gamma = r+1$ is one of the eigenvalues of (5.43), and $u^*$ is the eigenfunction of (5.43) that corresponds to this particular eigenvalue.
Proof. If $u^* \in \mathcal U_0$ is D-optimal, then after multiplying both sides of (5.43) by $u^*(\nu)$, integrating over $[0,T]$, and using (5.44), we obtain
$$\gamma = \int_0^T\!\!\int_0^T \ker(\tau,\nu,u^*)\,u^*(\tau)\,u^*(\nu)\,d\tau\,d\nu = r+1, \qquad (5.54)$$
where the last equality follows by reading (5.53) from right to left. This, together with Corollary 3, finishes the proof.
We leave as a conjecture the statement that $u^*$ corresponds to the largest eigenvalue of (5.43), since its detailed proof would be rather long. Theorem 8 provides conditions that are necessary for the optimality of $u^*$. For a discussion of when these conditions, considered for a relatively short observation horizon $T > 0$, are also sufficient for the optimality of $u^*$, see [143].
Example 29. Consider the system $\ddot y(t) + 2\zeta\omega_0\,\dot y(t) + \omega_0^2\,y(t) = \omega_0^2\,u(t)$ with known resonance frequency $\omega_0$ and $y(0) = 0$, $\dot y(0) = 0$. The unknown damping parameter $0 < \zeta < 1$ is to be estimated, but it is known a priori that its value is low (the underdamped case). The impulse response of this system has the form
$$g(t) = \frac{\omega_0}{\sqrt{1-\zeta^2}}\,\exp[-\zeta\omega_0 t]\,\sin(\omega_d t), \quad t > 0, \qquad (5.55)$$
where $\omega_d \stackrel{\text{def}}{=} \omega_0\sqrt{1-\zeta^2}$. For $\zeta$ near zero we neglect terms of the order $\zeta^2$, and (5.55) can be approximated by
$$g(t) \approx \omega_0\,\exp[-\zeta\omega_0 t]\,\sin(\omega_0 t), \quad t > 0. \qquad (5.56)$$
The sensitivity of (5.56) with respect to $\zeta$ has the form
$$k(t;\zeta) = -t\,\omega_0^2\,\exp(-\zeta\omega_0 t)\,\sin(\omega_0 t), \quad t > 0.$$
The eigenfunction corresponding to the largest eigenvalue of $H(\tau,\nu;\zeta)$ was calculated numerically (by the Nyström discretization) for $T = 2.5$ and the grid step size $0.005$. In these calculations we used the fact that, for one estimated parameter, $M_T(u^*)$ is an unknown scalar that does not influence the shape of the eigenfunction corresponding to the largest eigenvalue.
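The Nyström computation described above can be sketched as follows. The horizon $T = 2.5$ and the step $0.005$ follow the text; the values of $\omega_0$ and $\zeta$ are not given in the example, so $\omega_0 = 2\pi$ and $\zeta = 0.05$ below are hypothetical. The sensitivity is obtained by differentiating (5.56) with respect to $\zeta$, that is, $k(t) = -t\,\omega_0^2\exp(-\zeta\omega_0 t)\sin(\omega_0 t)$. Since $\ker = H/M_T(u)$ for one parameter, the top eigenvector of the matrix of $H$ gives the shape of the signal.

```python
import numpy as np

def example29_eigenfunction(omega0=2.0 * np.pi, zeta=0.05, T=2.5, h=0.005):
    """Nystrom approximation of the top eigenfunction of H(tau, nu) in Example 29.

    omega0 and zeta are hypothetical values; T and h follow the text.
    Sensitivity: k(t) = -t omega0^2 exp(-zeta omega0 t) sin(omega0 t), t > 0.
    """
    n = int(round(T / h))
    t = (np.arange(n) + 0.5) * h                 # midpoint grid on [0, T]
    lag = t[:, None] - t[None, :]
    lp = np.clip(lag, 0.0, None)
    K = np.where(lag > 0.0,
                 -lp * omega0**2 * np.exp(-zeta * omega0 * lp) * np.sin(omega0 * lp),
                 0.0)
    H = K.T @ K * h                              # H(tau_j, nu_l) from (5.36)
    mu, V = np.linalg.eigh(H * h)                # symmetric; eigenvalues ascending
    u = V[:, -1] / np.sqrt(h)                    # unit energy: sum(u^2) h = 1
    if u[np.argmax(np.abs(u))] < 0:              # eigenvectors have an arbitrary sign
        u = -u
    return t, u, mu[-1], H

t, u, mu, H = example29_eigenfunction()
print(t.size, mu)
```

Plotting `u` against `t` gives a discrete counterpart of Fig. 5.5 for the chosen parameter values.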
Figure 5.5: Approximation of the optimal input signal for the second-order vibrating system with known resonant frequency ω0 and unknown, but low damping (see Example 29).
The result, i.e., the approximation of the optimal input signal for estimating $\zeta$, is shown in Fig. 5.5. As one can observe, this signal grows rapidly so as to compensate for the influence of the low damping coefficient $\zeta$.
Advantages of the variational approach
From the Gateaux differential it is easy (in our case) to obtain the Frechet derivative of $L$ with respect to $u$, which is further denoted by $L_u(\gamma,u)$. Indeed,
$$L_u(\gamma,u)(\nu) = 2\int_0^T \ker(\tau,\nu,u)\,u(\tau)\,d\tau - 2\gamma\,u(\nu). \qquad (5.57)$$
Having $L_u$, one can construct numerical methods of searching for optimal input signals of a gradient-like form. Since (5.39) is a maximization problem, the step is taken along $+L_u$:
$$u_{p+1}(t) = u_p(t) + \delta_p\,L_u(r+1,u_p)(t), \quad t \in [0,T], \qquad (5.58)$$
where the step length $\delta_p \to 0$ is selected either as a sequence for which $\sum_{p=1}^\infty \delta_p = \infty$ holds or by optimizing the step length in the current search direction. In practice, (5.58) is calculated on a uniform grid. It is also possible to calculate the second Frechet derivative of functional $L$ and to derive a quasi-Newton algorithm, but the calculations are too complicated to be considered here.
Example 30. Consider a very simple system:
$$\dot y(t) + a\,y(t) = u(t), \quad t > 0, \qquad y(0) = 0,$$
where $a > 0$ is unknown. Then, for $t \ge 0$ its impulse response has the form
$$g(t;a) = \exp(-a\,t), \quad t \ge 0, \qquad (5.59)$$
and zero otherwise. Its sensitivity to $a$ is given by
$$\nabla_a g(t;a) = -t\,\exp(-a\,t), \quad t \ge 0,$$
and zero otherwise. Using advanced systems for symbolic calculations one can obtain an expression for $\ker(\tau,\nu;a)$, but it is too complicated for further derivations. For small $T > 0$ it can be approximated as follows:
$$\ker(\tau,\nu;a) \sim T\,\big[(T-\nu)\exp(-a(T-\nu))\big]\,\big[(T-\tau)\exp(-a(T-\tau))\big], \qquad (5.60)$$
where $\sim$ stands for “proportional to.” It is clear that the eigenfunction $\phi^*(\tau)$ can be approximated by $(T-\tau)\exp(-a(T-\tau))$. Thus, the approximation of the D-optimal input signal has the form
$$u^*(t) = s^{-1}\,(T-t)\,\exp(-a(T-t)), \quad t > 0,$$
where $s$ is a constant (dependent on $a$ and $T$) which ensures that $u^*$ has unit energy. Its shape is shown in Fig. 5.6 (solid line) for $a = 1$. Note, however, that it is a rather aggressive input signal. Hence, it is reasonable to apply it only on a short interval $(0,T)$.
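The normalizing constant $s$ has a closed form, since $s^2 = \int_0^T \sigma^2 e^{-2a\sigma}\,d\sigma$ after the substitution $\sigma = T - t$. The sketch below samples the approximate input on a midpoint grid, computes $s$ numerically, and cross-checks it against that closed form. Only $a = 1$ comes from the text; the horizon $T = 2$ and the grid size are illustrative choices.

```python
import numpy as np

def example30_input(a=1.0, T=2.0, n=2000):
    """Approximate D-optimal input of Example 30 on a midpoint grid.

    u*(t) = (T - t) exp(-a (T - t)) / s, with s chosen so that int_0^T u*^2 dt = 1.
    T and n are illustrative; the text fixes only a = 1 (Fig. 5.6).
    """
    h = T / n
    t = (np.arange(n) + 0.5) * h
    shape = (T - t) * np.exp(-a * (T - t))
    s = np.sqrt(np.sum(shape**2) * h)              # midpoint rule for the energy

    # closed form: int_0^T sigma^2 exp(-2 a sigma) d sigma = P(0) - exp(-2 a T) P(T)
    def P(x):
        return x**2 / (2 * a) + x / (2 * a**2) + 1 / (4 * a**3)

    s_exact = np.sqrt(P(0.0) - np.exp(-2 * a * T) * P(T))
    return t, shape / s, s, s_exact

t, u_star, s, s_exact = example30_input()
print(s, s_exact)
```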
Figure 5.6: Approximation of the optimal input signal for the first-order system (see Example 30).
This is an expected conclusion, since larger a > 0 corresponds to faster response decay, which needs to be compensated by the input signal. On the other hand, it is not always safe to apply such input signals (see the next section).
The reader may ask why here the optimal input signal is rapidly increasing, while in the corresponding example in [170] it is decreasing. The reason is simple: in (5.59) we have $a > 0$, while in the cited monograph $a$ was negative (in our convention). In other words, in [170] an unstable first-order system was considered, while in our case the system is stable; in particular, its impulse response decays over time.
The variational approach also reveals the expected behavior of D-optimal input signals. To this end, it is expedient to distinguish the following two cases: (a) $T > 0$ is relatively small, (b) $T$ is rather large. In case (a), by “$T$ is relatively small” we mean that $T$ is so small that the following approximation of (5.46) is sufficiently accurate:
$$\ker(\tau,\nu,u) \approx T\,\bar k^{\mathrm{tr}}(T-\tau;\bar a)\,M_T^{-1}(u)\,\bar k(T-\nu;\bar a). \qquad (5.61)$$
Note that in this case $\ker(\cdot\cdot\cdot)$ can be expressed as a finite sum of the elements of $M_T^{-1/2}(u)\,\bar k(T-\nu;\bar a)$ multiplied by the elements of $M_T^{-1/2}(u)\,\bar k(T-\tau;\bar a)$. In other words, $\ker(\tau,\nu,u)$ is approximately a degenerate kernel (see, e.g., [173] for the theory of integral equations with such kernels). As is known, the eigenfunctions of integral equations with degenerate kernels are linear combinations of the kernel components. In our case, this means that the eigenfunction corresponding to the largest eigenvalue in Theorem 8 is also a linear combination of the elements of $\bar k(T-t;\bar a)$. Thus, if the elements of $\bar k(t;\bar a)$ decrease exponentially, we expect that $u^*(t)$ will have components that increase exponentially. In case (b), kernel (5.46) can be approximated, as the simplest choice, by the Riemann sum
$$\ker(\tau,\nu,u) \approx \Delta T\sum_{i=1}^{i_{\max}} \bar k^{\mathrm{tr}}(T_i-\tau;\bar a)\,M_T^{-1}(u)\,\bar k(T_i-\nu;\bar a), \qquad (5.62)$$
where the $T_i$'s form an equidistant grid in $[0,T]$ with step size $\Delta T$. As in case (a), one can notice that the approximating kernel is a degenerate one. This time, it is spanned by the elements of $\bar k(T_i-t;\bar a)$, $i = 1, 2, \dots, i_{\max}$, and the conclusion is similar to that in case (a). Namely, the eigenfunctions will be exponentially growing, provided that the elements of $\bar k(t;\bar a)$ are decreasing. The details concerning calculations of eigenfunctions of integral operators with degenerate kernels are provided in Chapter 6 for multivariable functions.
5.4 Safer input signals | 103
5.4 Safer input signals – time domain synthesis
The main message of this section is the following: by also adding output constraints, we can gain a lot in the safety of an input signal while losing relatively little in estimation accuracy. Typical output constraints include the following possibilities:
$$\int_0^T y^2(t)\,dt \le e_2, \qquad y(t) = \int_0^t g(t-\tau;\bar a)\,u(\tau)\,d\tau, \qquad (5.63)$$
$$\int_0^T \dot y^2(t)\,dt \le e_3, \qquad (5.64)$$
$$\int_0^T \ddot y^2(t)\,dt \le e_4, \qquad (5.65)$$
where the positive $e_i$'s denote prescribed constants. Later, we consider only (5.63), i.e., we look for $u^{**} \in L_2(0,T)$ that maximizes $\log[\det(M_T(u))]$ over $u \in L_2(0,T)$ such that $\|u\| \le 1$ and (5.63) holds.
Necessary optimality conditions for safer input signals
Note that when replacing $u$ by $\varsigma u$ in (5.63) for arbitrary $\varsigma > 1$, we have to replace $\|y\|^2$ by the larger $\varsigma^2\|y\|^2$. This fact and (5.40) imply that if $u^{**}$ solves our problem, then only one of the following two cases is possible.
Case I. $\|u^{**}\| = 1$ and $\|y^{**}\|^2 \le e_2$, where $y^{**}$ is the response of the system in (5.63) to $u^{**}$.
Case II. $\|u^{**}\| < 1$ and $\|y^{**}\|^2 = e_2$, since $\|u^{**}\| < 1$ together with $\|y^{**}\|^2 < e_2$ contradicts the assumption that $u^{**}$ is optimal ($\varsigma u^{**}$ for a certain $\varsigma > 1$ would be better).
One can check whether the conditions in Case I hold by solving the problem considered in the previous section, i.e., the one in which constraint (5.63) is tentatively neglected. Denote its solution, obtained by Theorem 8 or using (5.58), by $u^*$. If $u^*$ fulfills constraint (5.63), then our problem is solved by setting $u^{**} = u^*$. If the conditions in Case I do not hold, the following problem, corresponding to Case II, remains to be solved. Find $u^{**} \in L_2(0,T)$ such that
$$\max_u\ \log\det[M_T(u)] \qquad (5.66)$$
is attained,
where the maximization is done over all $u$ for which the conditions
$$\int_0^T y^2(t)\,dt = e_2, \qquad y(t) = \int_0^t g(t-\tau;\bar a)\,u(\tau)\,d\tau \qquad (5.67)$$
hold. Using the variational calculus we obtain the following result. If $u^{**}$ with $M_T(u^{**})$ nonsingular maximizes
$$\mathcal L(u,\gamma) = \log[\det(M_T(u))] - \gamma\Big(\int_0^T y^2(t)\,dt - e_2\Big),$$
then $u^{**}$ is an eigenfunction of the following integral equation:
$$\int_0^T \ker(\tau,\nu,u^{**})\,u^{**}(\tau)\,d\tau = \gamma\int_0^T G(\tau,\nu)\,u^{**}(\tau)\,d\tau, \quad \nu \in [0,T], \qquad (5.68)$$
for a certain $\gamma \ne 0$, where
$$G(\tau,\nu) \stackrel{\text{def}}{=} \int_0^T g(t-\tau;\bar a)\,g(t-\nu;\bar a)\,dt.$$
This eigenfunction has a norm that is less than or equal to one. Observe that kernel $G(\tau,\nu)$ is symmetric and positive definite, while kernel $\ker(\tau,\nu,\cdot)$ is symmetric and nonnegative definite (see Corollary 4). Thus, under additional technical assumptions (see [74, 75]), these two kernels allow for simultaneous diagonalization, which means that eigenfunctions in (5.68) exist. Note also that the version of (5.68) discretized by the Nyström method leads to the well-known problem of the simultaneous diagonalization of a pair of symmetric matrices, one of which is positive definite. The Frechet derivative of $\mathcal L(u,\gamma)$ has the following form: for $\nu \in [0,T]$,
$$\mathcal L_u(\gamma,u)(\nu) = 2\int_0^T \ker(\tau,\nu,u)\,u(\tau)\,d\tau - 2\gamma\int_0^T G(\tau,\nu)\,u(\tau)\,d\tau. \qquad (5.69)$$
Algorithms for calculating safer input signals on [0, T] are modifications of those sketched previously.
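One possible form of such a modification is sketched below; it is not the author's implementation. On a midpoint grid, the step direction is the discretized $L_u$ of (5.57) or $\mathcal L_u$ of (5.69), taken with the plus sign because (5.39) and (5.66) are maximization problems. Choosing $\gamma = (r+1)/\text{level}$ makes the direction orthogonal to $u$ (by (5.53)), and after each step the signal is rescaled back onto the constraint; a backtracking halving of the step (our choice, one way of "optimizing the step length") keeps $\log\det M_T$ increasing. The two-parameter model $g(t;a,b) = b\exp(-a t)$ and all helper names are hypothetical.

```python
import numpy as np

def setup(T=2.0, a=1.0, b=1.0, n=150):
    """Hypothetical model g(t; a, b) = b exp(-a t), t >= 0 (r + 1 = 2)."""
    h = T / n
    t = (np.arange(n) + 0.5) * h
    lag = t[:, None] - t[None, :]
    lp = np.clip(lag, 0.0, None)
    causal = lag >= 0.0
    k = np.stack([np.where(causal, -b * lp * np.exp(-a * lp), 0.0),   # dg/da
                  np.where(causal, np.exp(-a * lp), 0.0)])             # dg/db
    g = np.where(causal, b * np.exp(-a * lp), 0.0)                    # g(t_i - tau_j)
    return h, k, g

def logdet_and_kernel(u, h, k):
    """log det M_T(u) and the Nystrom matrix of ker(tau, nu, u) from (5.46)."""
    s = np.einsum("pij,j->pi", k, u) * h
    M = s @ s.T * h
    Kmat = np.einsum("pij,pq,qil->jl", k, np.linalg.inv(M), k, optimize=True) * h
    return np.linalg.slogdet(M)[1], Kmat

def ascent(u, h, k, B, level, iters=40, delta=1.0):
    """Gradient-like ascent with backtracking under u^T B u h = level.

    B = I gives int u^2 dt = level (problem (5.39), step (5.58));
    B = Gmat * h gives int y^2 dt = level (problem (5.66), derivative (5.69)).
    """
    r1 = k.shape[0]                                      # r + 1

    def quad(v):
        return v @ (B @ v) * h

    u = u * np.sqrt(level / quad(u))
    f, Kmat = logdet_and_kernel(u, h, k)
    gamma = r1 / level                                   # step orthogonal to u, cf. (5.53)
    for _ in range(iters):
        d = 2.0 * (Kmat @ u * h - gamma * (B @ u))       # discretized L_u
        step = delta
        while step > 1e-10:
            v = u + step * d
            v = v * np.sqrt(level / quad(v))             # rescale onto the constraint
            fv, Kv = logdet_and_kernel(v, h, k)
            if fv > f:                                   # accept only improvements
                u, f, Kmat = v, fv, Kv
                break
            step /= 2.0
    return u, f

h, k, g = setup()
n = g.shape[0]
u0 = np.ones(n)
f0 = logdet_and_kernel(u0 / np.sqrt(np.sum(u0**2) * h), h, k)[0]
u_star, f_star = ascent(u0, h, k, np.eye(n), 1.0)        # energy constraint only

B = (g.T @ g * h) * h                                    # Nystrom matrix of G, times h
y_energy = u_star @ (B @ u_star) * h                     # int y^2 dt for u*
e2 = 0.5 * y_energy                                      # tighter bound: Case II
u_safe, f_safe = ascent(u_star, h, k, B, e2)
f_start = logdet_and_kernel(u_star * np.sqrt(e2 / y_energy), h, k)[0]
print(f0, f_star, f_start, f_safe)
```

Setting `B` to the identity recovers the unconstrained-output scheme, so both problems share one routine.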
6 Optimal excitations for systems described by elliptic equations
https://doi.org/10.1515/9783110351040-006
6.1 Introduction and notational conventions
In this chapter we introduce notational conventions for spatial variables and give a very brief summary of the known facts about elliptic operators (and their inverses) that are useful in the remainder of this book. We try as much as possible to stay within the framework of classical solutions of systems described (in their steady state) by elliptic PDEs or their one-dimensional counterparts, covered by the Sturm–Liouville theory. Then, we prove some facts about the dependence of a certain subclass of eigenvalues of elliptic operators on their parameters. This dependence is linear (or affine), which is important for the design of an experiment. Finally, we provide some results about the D-optimality of spatial input signals (spatial excitations) that are the most informative from the viewpoint of the estimation accuracy of unknown parameters.
From now on, the components of vector $x$ will be denoted as $x_1, x_2, \dots, x_s$ and interpreted as spatial variables. This change of notational convention should not lead to any misunderstandings, since we shall not consider sequences of spatial variables. On the other hand, $\frac{\partial^2}{\partial x_1^2}$ looks much better than $\frac{\partial^2}{\partial (x^{(1)})^2}$.
When one spatial variable is considered, we shall write $\frac{d\,q(x)}{dx}$ or even $q'(x)$. In subsequent chapters, where spatio-temporal dynamics is considered, we will also use the dot notation, i.e., $\dot q(x,t)$ for $\frac{\partial q(x,t)}{\partial t}$.
Slightly abusing the notation, we write $q(x)$ (or $q(x;\bar a)$) to denote a system state at spatial point $x$. Thus, we shall write $q(x_1,x_2)$ (or $q(x_1,x_2;\bar a)$) when we want to point out spatial coordinates, since $q([x_1,x_2])$ would not look natural. The same notation is used for a spatial actuating field (input signal), denoted as $u(x)$. We shall also write $q(\cdot)$ and $u(\cdot)$, or even more simply $q$ and $u$, when we refer to the whole function defined on a certain domain of $\mathbb R^s$.
6.2 Systems described by elliptic equations
Consider the following differential operator:
$$A_x q(x) = \sum_{i=1}^s\sum_{j=1}^s \alpha_{ij}\,\frac{\partial^2 q(x)}{\partial x_i\,\partial x_j}, \quad x \in \Omega, \qquad (6.1)$$
where $\Omega \subset \mathbb R^s$ is a bounded open domain with smooth boundary $\Gamma$. The matrix of constant coefficients $\alpha_{ij}$ is symmetric and positive definite. Operator $A_x$ is equipped with one of the following boundary conditions:
106 | 6 Experiments for elliptic equations
Dirichlet's boundary condition: $q(x) = 0$ for $x \in \Gamma$,
Neumann's boundary condition: $\frac{\partial q(x)}{\partial \vec n} = 0$ for $x \in \Gamma$, where $\vec n$ is the outer normal vector to $\Gamma$ and $\frac{\partial q(x)}{\partial \vec n} = \vec n^{\mathrm{tr}}\,\mathrm{grad}_x\,q(x)$ is the outer normal derivative on $\Gamma$,
Robin's boundary condition: a linear combination of these two.
Note that one may impose one kind of boundary condition on a part of $\Gamma$ and other kinds on other parts of $\Gamma$. One also has to specify a class of sufficiently smooth functions, denoted further by $\mathcal D$, that can appear as the argument of $A_x$. Further in this book, we assume that the definition of $\mathcal D$ also includes the specification of the boundary conditions; $\mathcal D$ is then called the domain of $A_x$. We shall also assume that $\mathcal D$ is a subspace of the space of all square integrable functions on $\Omega$, denoted further as $L_2(\Omega)$. Simultaneously with $A_x$, we shall consider the following boundary value problem: for a certain known function $u(x)$, $x \in \Omega$, find $q(\cdot) \in \mathcal D$ such that
$$A_x q(x) = -u(x), \quad x \in \Omega, \qquad (6.2)$$
with the boundary condition “hidden” in the definition of $\mathcal D$.
Remark 20. We consider only zero boundary conditions, since for linear PDEs nonzero boundary excitations can be “moved” to $u$ by subtracting from $q$ a known function, $q_b$, say, for which the nonzero boundary conditions hold. Then, for $q_0 = q - q_b$ we have $A_x q_0 = -u - A_x q_b$, where homogeneous boundary conditions hold for $q_0$.
As an important tool for finding the representation of a solution of (6.2), we also consider the following eigenvalue problem for $A_x$: find a nonzero function $v(\cdot) \in \mathcal D$ ($v(x) \not\equiv 0$ in $\Omega$) and a number $\lambda$ (possibly complex, in general) such that
$$A_x v(x) = -\lambda\,v(x), \quad x \in \Omega. \qquad (6.3)$$
Nontrivial solutions $v(\cdot) \in \mathcal D$ of (6.3) are called the eigenfunctions of $A_x$, while the corresponding $\lambda$'s are called the eigenvalues of $A_x$. Note that each eigenfunction $v$ has a corresponding eigenvalue, but $A_x$ may have multiple eigenvalues, i.e., the same $\lambda$ may correspond to several linearly independent eigenfunctions. For many theoretical reasons it is customary to consider the so-called weak solutions of (6.2). There are also practical reasons to consider them, namely, weak solutions allow for less smooth solutions of (6.2). As an introduction to this notion, consider the following simple example.
Example 31. For a one-dimensional case, formally define $A_x$ as follows:
$$A_x q(x) = q''(x), \quad x \in \Omega \stackrel{\text{def}}{=} (0,\pi), \qquad (6.4)$$
6.2 Systems described by elliptic equations | 107
with Dirichlet boundary conditions $q(0) = q(\pi) = 0$. Here, $\Gamma$ consists of two points, namely $0$ and $\pi$. The classical solution of (6.2) with this operator means that we are looking for $q$ such that
$$q''(x) = -u(x), \quad x \in (0,\pi), \quad\text{and}\quad q(0) = q(\pi) = 0. \qquad (6.5)$$
These expressions are meaningful only if $q$ is twice continuously differentiable and $u(\cdot)$ is at least a continuous function, which allows us to compare the values of $q''(x)$ and $u(x)$ at every point of $[0,\pi]$ so as to establish whether they are equal or not. On the other hand, at least the continuity of $q$ is required in order to check whether the boundary conditions $q(0) = q(\pi) = 0$ hold or not. Hence, an area for defining weak solutions of (6.5) is somewhere between the class $C^2(0,\pi)$ of twice continuously differentiable functions (since $q''(x)$ has to be meaningful for establishing its value at $x$) and the class $C^0(0,\pi)$ of continuous functions on $[0,\pi]$ (since the continuity of $q$ is needed for checking the boundary conditions).
Denote by $C_0^\infty(0,\pi)$ the space of all infinitely differentiable functions with compact supports in $(0,\pi)$, which implies that $\phi(0) = \phi(\pi) = 0$. In order to motivate intuitively the notion of the weak solution, let us first assume that $q(x)$ is the classical solution of (6.5), which is sometimes also called the strong solution. Let $\phi(\cdot)$ be an arbitrary function from $C_0^\infty(0,\pi)$. We multiply both sides of (6.5) by $\phi(\cdot)$ and then integrate them over $[0,\pi]$. After integrating by parts we obtain
$$\int_0^\pi q'(x)\,\phi'(x)\,dx = \int_0^\pi u(x)\,\phi(x)\,dx, \qquad (6.6)$$
where the boundary terms vanished since $\phi(0) = \phi(\pi) = 0$. If, for a certain $q(\cdot)$ which is differentiable with an integrable derivative $q'(\cdot)$ on $[0,\pi]$, equality (6.6) holds for every $\phi(\cdot) \in C_0^\infty(0,\pi)$, then $q(\cdot)$ is called the weak solution of (6.5). Note that the classical solution is simultaneously the weak solution of (6.5), but not necessarily conversely, since the weak solution can have only one derivative, instead of the two required for the strong solution. However, if the weak solution is sufficiently smooth, then it is also the classical solution. In the same vein, eigenfunctions of $A_x$ in the weak sense are defined, i.e., if there exist a nontrivial, differentiable $v(\cdot)$ (having an integrable derivative) and $\lambda$ such that for every $\phi(\cdot) \in C_0^\infty(0,\pi)$ we have
$$\int_0^\pi v'(x)\,\phi'(x)\,dx = \lambda\int_0^\pi v(x)\,\phi(x)\,dx, \qquad (6.7)$$
then $v(\cdot)$ is an eigenfunction of (6.4) in the weak sense, while $\lambda$ is the corresponding eigenvalue.
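The weak forms (6.6) and (6.7) are exactly what a Galerkin (finite element) method discretizes. The sketch below uses piecewise-linear hat functions at interior nodes as test functions; this is an illustrative choice, since hat functions vanish at $0$ and $\pi$ like the $\phi$'s above but are only continuous, not in $C_0^\infty$. It also approximates $\int u\,\phi\,dx$ by $h\,u(x_i)$. For $u(x) = \sin x$ the exact solution of (6.5) is $q(x) = \sin x$, and the weak eigenvalues of (6.7) should approach $k^2$.

```python
import numpy as np

def weak_dirichlet_1d(u, n=100, L=np.pi):
    """Piecewise-linear Galerkin solution of the weak problem (6.6) on (0, L).

    Assumptions (illustrative): hat functions at the n - 1 interior nodes are
    the test functions, and int u phi dx is approximated by h * u(x_i).
    Returns interior nodes, nodal values of q, stiffness and mass matrices.
    """
    h = L / n
    x = np.linspace(0.0, L, n + 1)[1:-1]             # interior nodes
    m = x.size
    off = np.eye(m, k=1) + np.eye(m, k=-1)
    S = (2.0 * np.eye(m) - off) / h                  # int phi_i' phi_j' dx
    Mm = h * (4.0 * np.eye(m) + off) / 6.0           # int phi_i phi_j dx
    q = np.linalg.solve(S, h * u(x))                 # discrete version of (6.6)
    return x, q, S, Mm

x, q, S, Mm = weak_dirichlet_1d(np.sin)              # exact solution: q(x) = sin(x)
err = np.max(np.abs(q - np.sin(x)))
# Weak eigenproblem (6.7): S v = lambda Mm v, reduced via Cholesky (Mm = C C^T).
Cinv = np.linalg.inv(np.linalg.cholesky(Mm))
lam = np.linalg.eigvalsh(Cinv @ S @ Cinv.T)          # close to 1, 4, 9, ... (k^2)
print(err, lam[:3])
```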
The notion of weak solutions of PDEs generalizes to multivariable cases by using Green's formula (see, e.g., [152]). It can be proved (see [152], Theorem 3.1) that the weak solution of (6.2) exists and is unique. The proof is based on a direct application of the Riesz representation theorem. Thus, we can define the inverse of $A_x$, denoted further by $A_x^{-1}$.
Fortunately, later in this book we can confine ourselves to the classical (strong) solutions of PDEs. The reasons are implied by the following facts (see [152], Corollaries 3.4 and 3.5).
Proposition 2. If $v(\cdot)$ is an eigenfunction of $A_x$ in the weak sense, then it is also an eigenfunction in the strong (classical) sense. Furthermore, $v(\cdot) \in C^\infty(\Omega)$.
Proposition 3. If $u(\cdot) \in C^\infty(\Omega)$ and $q(\cdot)$ is the weak solution of $A_x q(\cdot) = -u(\cdot)$, then $q(\cdot)$ is also from the space $C^\infty(\Omega)$, and hence, it is also the classical (strong) solution.
Proposition 4. Operator $A_x$ with the symmetric and positive definite matrix of constant coefficients $\alpha_{ij}$ has the following properties:
(P1) there exists a finite or countably infinite set of eigenfunctions of $A_x$, denoted further as $v_k(\cdot)$, $k = 1, 2, \dots$,
(P2) the $v_k(\cdot) \in C^\infty(\Omega)$ form an orthonormal¹ basis of $L_2(\Omega)$,
(P3) the corresponding eigenvalues $\lambda_k$ are real and positive, and $\lim_{k\to\infty}\lambda_k = \infty$.
For the proof see [152] (Corollary 4.15). It is based on the fact that the inverse operator $A_x^{-1}$ is compact and self-adjoint. We shall recapitulate properties of such operators in the next section.
Let us note that if the boundary of $\Omega$ is infinitely many times differentiable, then the conclusions of Propositions 2, 3, and 4 can be strengthened, namely, one can replace $C^\infty(\Omega)$ by $C^\infty(\bar\Omega)$, where $\bar\Omega$ is the closure of $\Omega$ (see Chapter 9.8 in [25]).
As declared earlier, our aim is to find $u(\cdot)$ which is in some sense optimal for estimating the unknown parameters of $A_x$. As we shall see later, the optimal input $u(\cdot)$ can be expressed as a finite linear combination of eigenfunctions of $A_x$. Thus, according to Proposition 2, the optimal $u(\cdot)$ is from $C^\infty(\Omega)$. This, in turn, implies (by invoking Proposition 3) that $q(\cdot)$, corresponding to the optimal input signal, is also the classical solution, since it is from $C^\infty(\Omega)$. Thus, we do not lose generality by restricting ourselves to classical solutions.
1 That is, orthogonal with the norms equal to 1.
6.3 Spectra of compact linear operators | 109
6.3 Spectra of compact linear operators
In this section we summarize the well-known properties of linear, compact, and self-adjoint operators with the aim of applying them to $A_x^{-1}$ and of expressing $A_x^{-1}$ by using the corresponding Green's function. As the basic Hilbert space, we select $L_2(\Omega)$ with the standard inner product $\langle f, h\rangle = \int_\Omega f(x)\,h(x)\,dx$ and the corresponding norm $\|f\|^2 = \int_\Omega f^2(x)\,dx$. The theory summarized below also applies to separable Hilbert spaces other than $L_2(\Omega)$, but for our purposes it suffices to stay within the $L_2(\Omega)$ framework. Later on, a generic linear operator on $L_2(\Omega)$ will be denoted as $\mathcal G$; it is bounded iff there exists a finite constant $\nu_{\max} \in \mathbb R_+$ such that $\|\mathcal G f\| \le \nu_{\max}\|f\|$ for all $f \in L_2(\Omega)$. A subclass of linear and bounded operators on $L_2(\Omega)$ is the class of compact operators (see, e.g., [72], page 238), which have the property that $\mathcal G$ is the limit, in the operator norm, of a sequence of finite-rank operators (i.e., operators having finite-dimensional ranges). Another subclass of linear and bounded operators on $L_2(\Omega)$, which is of importance for us, consists of symmetric operators, i.e., for all $f, h \in L_2(\Omega)$ we have $\langle\mathcal G f, h\rangle = \langle f, \mathcal G h\rangle$. For linear bounded operators, if $\mathcal G$ is symmetric, then it is also self-adjoint. The eigenvalue problem for $\mathcal G$ reads as follows: find a nontrivial $v \in L_2(\Omega)$, i.e., $v(x) \ne 0$ for $x$'s from a certain subset of $\Omega$ having a positive measure (length, area, volume), and a number $\nu$ such that
$$\mathcal G\,v(\cdot) = \nu\,v(\cdot). \qquad (6.8)$$
For compact and self-adjoint operators we can formulate the Hilbert–Schmidt theorem (intentionally restricted here to $L_2(\Omega)$; see [152], Theorem 4.14), which provides a deep characterization of eigenfunctions and eigenvalues.
Theorem 9 (Hilbert–Schmidt theorem, restricted to $L_2(\Omega)$). Let $\mathcal G$ be a compact and self-adjoint operator on $L_2(\Omega)$. Then, there exists a finite or countably infinite orthonormal sequence $v_k \in L_2(\Omega)$, $k = 1, 2, \dots$, of eigenfunctions of $\mathcal G$ and the corresponding nonzero eigenvalues $\nu_k$, $k = 1, 2, \dots$, which have the following properties:
(G1) the orthonormal sequence $v_k \in L_2(\Omega)$, $k = 1, 2, \dots$, forms a basis of $L_2(\Omega)$, i.e., every $f \in L_2(\Omega)$ is represented as follows:
$$f(\cdot) = \sum_{k=1}^\infty \langle f, v_k\rangle\,v_k(\cdot), \qquad (6.9)$$
where the convergence is in the standard norm of $L_2(\Omega)$,
(G2) for all $f \in L_2(\Omega)$
$$\mathcal G f(\cdot) = \sum_{k=1}^\infty \nu_k\,\langle f, v_k\rangle\,v_k(\cdot). \qquad (6.10)$$
Definition 8 (Hilbert–Schmidt operators). A bounded linear operator $\mathcal G$ is called a Hilbert–Schmidt operator on $L_2(\Omega)$ if there exists an orthonormal basis $e_k(\cdot)$, $k = 1, 2, \dots$, of $L_2(\Omega)$ such that
$$\sum_{k=1}^\infty \|\mathcal G e_k(\cdot)\|^2 < \infty. \qquad (6.11)$$
The following result (see [72], Theorem 9.21) is of special importance for our purposes.
Theorem 10. If $\mathcal G$ is a Hilbert–Schmidt operator, then $\mathcal G$ is a compact operator.
It can be shown that if (6.11) holds for one basis, then this condition also holds for all other bases of $L_2(\Omega)$. Thus, in particular, one can take the eigenfunctions $\{v_k(\cdot)\}$ of $\mathcal G$ as the basis $\{e_k(\cdot)\}$, and then condition (6.11) is equivalent to the following one:
$$\sum_{k=1}^\infty \nu_k^2 < \infty. \qquad (6.12)$$
As we shall see in the following sections, this condition holds for all operators 𝒢 considered in this book which are the inverse operators to the elliptic operators. Thus, such 𝒢 ’s are Hilbert–Schmidt operators and, by Theorem 10, they are compact, which implies that we can use the strong properties of {vk (⋅)}’s and {νk }’s described by Theorem 9.
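Condition (6.12) can be illustrated with the inverse of the operator from Example 31. Its Green's function on $(0,\pi)$ with Dirichlet conditions is $G(x,\kappa) = \min(x,\kappa)\,(\pi - \max(x,\kappa))/\pi$, and its eigenvalues are $\nu_k = 1/k^2$ with eigenfunctions proportional to $\sin(kx)$, so $\sum_k \nu_k^2 = \pi^4/90$. The sketch below compares this sum with a midpoint-rule value of $\int\!\!\int G^2$ (the quantity appearing in (6.14) below) and computes Nyström approximations of the leading eigenvalues; the grid size is an illustrative choice.

```python
import numpy as np

def green_dirichlet(x, kappa, L=np.pi):
    """Green's function of q'' = -u on (0, L) with q(0) = q(L) = 0 (Example 31)."""
    lo = np.minimum(x, kappa)
    hi = np.maximum(x, kappa)
    return lo * (L - hi) / L

n = 600
h = np.pi / n
x = (np.arange(n) + 0.5) * h                          # midpoint grid on (0, pi)
G = green_dirichlet(x[:, None], x[None, :])
hs_norm2 = np.sum(G**2) * h * h                       # int int G^2 dx dkappa
nu = np.linalg.eigvalsh(G * h)[::-1]                  # Nystrom eigenvalues, descending
print(hs_norm2, np.pi**4 / 90)                        # both approximate sum_k nu_k^2
print(nu[:3])                                         # should be close to 1, 1/4, 1/9
```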
6.4 Integral operators
Consider the following class of linear integral operators, defined for $f(\cdot) \in L_2(\Omega)$:
$$(\mathcal G f)(x) = \int_\Omega G(x,\kappa)\,f(\kappa)\,d\kappa, \quad x \in \Omega, \qquad (6.13)$$
where $G : \Omega\times\Omega \to \mathbb R$ is a given function, usually called the kernel of $\mathcal G$. It can be proved (see, e.g., [72], page 231) that (6.13) is a Hilbert–Schmidt (hence, also compact) operator if and only if the following condition holds:
$$\int_\Omega\!\int_\Omega |G(x,\kappa)|^2\,dx\,d\kappa < \infty. \qquad (6.14)$$
If, in addition to (6.14), $G$ is symmetric, i.e., $G(x,\kappa) = G(\kappa,x)$ for $x, \kappa \in \Omega$, then from Theorem 9 we infer that there exists a set of eigenfunctions $\{v_k\}$ such that
$$\int_\Omega G(x,\kappa)\,v_k(\kappa)\,d\kappa = \nu_k\,v_k(x), \quad x \in \Omega,\ k = 1, 2, \dots, \qquad (6.15)$$
6.4 Integral operators |
111
which form a basis of $L_2(\Omega)$. Additionally, we can infer from the Mercer theorem (see, e.g., [72], page 231), already mentioned above, that kernel $G$ has the following representation:
$$G(x,\kappa) = \sum_{k=1}^\infty \nu_k\,v_k(x)\,v_k(\kappa). \qquad (6.16)$$
Note, however, that the convergence in (6.16) may not be pointwise. In general, it is only convergence in the norm of $L_2(\Omega\times\Omega)$, i.e.,
$$\lim_{n\to\infty}\int_\Omega\!\int_\Omega \Big|G(x,\kappa) - \sum_{k=1}^n \nu_k\,v_k(x)\,v_k(\kappa)\Big|^2\,dx\,d\kappa = 0. \qquad (6.17)$$
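The $L_2(\Omega\times\Omega)$ convergence in (6.17) can be observed numerically for the Green's function of Example 31, whose eigenpairs are known in closed form: $\nu_k = 1/k^2$, $v_k(x) = \sqrt{2/\pi}\,\sin(kx)$. The sketch below evaluates the discretized error of the truncated expansion (6.16) for a few truncation levels; in the continuous setting this error equals $\big(\sum_{k>n}k^{-4}\big)^{1/2}$, which is printed for comparison. The grid size is an illustrative choice.

```python
import numpy as np

L, n = np.pi, 400
h = L / n
x = (np.arange(n) + 0.5) * h
X, Y = np.meshgrid(x, x, indexing="ij")
G = np.minimum(X, Y) * (L - np.maximum(X, Y)) / L     # Green's function of Example 31

def mercer_error(n_terms):
    """Discretized L2(Omega x Omega) error in (6.17) after n_terms terms of (6.16).

    Uses the known eigenpairs nu_k = 1/k^2, v_k(x) = sqrt(2/pi) sin(k x).
    """
    k = np.arange(1, n_terms + 1)
    V = np.sqrt(2.0 / L) * np.sin(np.outer(x, k))     # shape (n, n_terms)
    approx = (V / k**2) @ V.T                          # sum_k nu_k v_k(x) v_k(kappa)
    return np.sqrt(np.sum((G - approx) ** 2) * h * h)

errors = [mercer_error(m) for m in (1, 5, 20)]
# continuous counterpart: sqrt(sum_{k > m} 1/k^4)
tails = [np.sqrt(np.sum(1.0 / np.arange(m + 1, 100_000, dtype=float) ** 4))
         for m in (1, 5, 20)]
print(errors, tails)
```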
Later on in this book we shall frequently meet integral equations with so-called degenerate kernels having the following specific form:
$$G(x,\kappa) = \sum_{k=1}^n g_k(x)\,g_k(\kappa), \quad x, \kappa \in \Omega, \qquad (6.18)$$
where $n \ge 1$ is finite and fixed and $g_k \in L_2(\Omega)$, $k = 1, 2, \dots, n$, are given functions, linearly independent in $\Omega$. Clearly, (6.18) is symmetric and condition (6.14) holds. Thus, we can invoke Theorem 9. In order to reveal the structure of the $v_k$'s and $\nu_k$'s when $G$ is given by (6.18), we substitute (6.18) into the left-hand side of
$$\int_\Omega G(x,\kappa)\,v_k(\kappa)\,d\kappa = \nu_k\,v_k(x), \qquad (6.19)$$
written for $v_j(\cdot)$, $j = 1, 2, \dots, n$. This yields
$$\int_\Omega G(x,\kappa)\,v_j(\kappa)\,d\kappa = \sum_{k=1}^n g_k(x)\,\beta_{kj}, \quad j = 1, 2, \dots, n, \qquad (6.20)$$
where $\beta_{kj} \stackrel{\text{def}}{=} \int_\Omega g_k(\kappa)\,v_j(\kappa)\,d\kappa$, $k, j = 1, 2, \dots, n$. If $v_j(\cdot)$ is an eigenfunction, then for the right-hand side of (6.20) we must have
$$\sum_{k=1}^n g_k(x)\,\beta_{kj} = \nu_j\,v_j(x), \quad j = 1, 2, \dots, n,\ x \in \Omega, \qquad (6.21)$$
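for certain $\nu_j$'s. Thus, each eigenfunction of the operator with kernel (6.18) that corresponds to a nonzero eigenvalue is a linear combination of the $g_k(\cdot)$'s. In other words, if $v(\cdot)$ is such an eigenfunction, then $v(x) = \bar w^{\mathrm{tr}} \bar g(x)$ for a certain $n \times 1$ vector $\bar w$, where $\bar g(x)$ is the $n \times 1$ vector of the $g_k$'s, and
$$\int_\Omega \bar g^{\mathrm{tr}}(x)\, \bar g(\kappa)\, \bar g^{\mathrm{tr}}(\kappa)\, \bar w\, d\kappa = \nu\, \bar g^{\mathrm{tr}}(x)\, \bar w, \tag{6.22}$$
since (6.18) can be rewritten as $G(x, \kappa) = \bar g^{\mathrm{tr}}(x)\, \bar g(\kappa)$. In a more compact form, (6.22) reads as follows:
$$\bar g^{\mathrm{tr}}(x)\, \mathbf{G}\, \bar w = \nu\, \bar g^{\mathrm{tr}}(x)\, \bar w, \tag{6.23}$$
where $\mathbf{G} \stackrel{\text{def}}{=} \int_\Omega \bar g(\kappa)\, \bar g^{\mathrm{tr}}(\kappa)\, d\kappa$ is an $n \times n$ symmetric and positive definite matrix, due to the linear independence of the $g_k$'s. Multiplying both sides of (6.23) (from the left) by $\bar g(x)$ and integrating with respect to $x$, we obtain $\mathbf{G}^2 \bar w = \nu\, \mathbf{G}\, \bar w$ or, equivalently, since $\mathbf{G}$ is nonsingular, $\mathbf{G}\, \bar w = \nu\, \bar w$. Summarizing, we have shown that if $v(x) = \bar g^{\mathrm{tr}}(x)\, \bar w$ is an eigenfunction of (6.19) with kernel (6.18) and eigenvalue $\nu$, then necessarily $\bar w$ is an eigenvector of $\mathbf{G}$, i. e.,
$$\mathbf{G}\, \bar w = \nu\, \bar w, \tag{6.24}$$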
corresponding to the eigenvalue $\nu$. The converse statement also holds, since it suffices to reread the above formulas in reverse order. Note that $\mathbf{G}$ has $n$ eigenvectors $\bar w_k$, say, $k = 1, 2, \dots, n$, which are orthogonal when the corresponding eigenvalues $\nu_k$ are different. If $\mathbf{G}$ has multiple eigenvalues, then the corresponding eigenvectors can be orthogonalized by the Gram–Schmidt procedure. Thus, later on, we can assume that all the $\bar w_k$'s are orthogonal and their norms in $\mathbb{R}^n$ equal 1. The eigenvalues $\nu_k$ in (6.24) are real and positive, since $\mathbf{G}$ is symmetric and positive definite. These facts allow us to show that the corresponding functions $\bar w_k^{\mathrm{tr}} \bar g(x)$, $k = 1, 2, \dots, n$, are also orthogonal. Indeed,
$$\int_\Omega \big(\bar w_k^{\mathrm{tr}} \bar g(x)\big)\big(\bar w_j^{\mathrm{tr}} \bar g(x)\big)\, dx = \bar w_k^{\mathrm{tr}}\, \mathbf{G}\, \bar w_j = \nu_j\, \bar w_k^{\mathrm{tr}} \bar w_j = \nu_j\, \delta_{kj}, \tag{6.25}$$
where $\delta_{kj}$ is the Kronecker delta symbol ($\delta_{kj} = 0$ if $k \ne j$ and 1 otherwise). Consequently, their squared norms equal $\nu_k$, and the normalized eigenfunctions are $v_k(x) = \nu_k^{-1/2}\, \bar w_k^{\mathrm{tr}} \bar g(x)$.
Corollary 7. An integral operator with degenerate kernel (6.18) and square integrable, linearly independent $g_k(\cdot)$, $k = 1, 2, \dots, n$, has the following properties:
(DG1) it has exactly $n$ orthonormal eigenfunctions corresponding to nonzero eigenvalues, of the form $v_k(x) = \nu_k^{-1/2}\, \bar g^{\mathrm{tr}}(x)\, \bar w_k$, $k = 1, 2, \dots, n$, where the $\bar w_k$'s are the eigenvectors of matrix $\mathbf{G}$;
(DG2) the corresponding eigenvalues $\nu_k$ are real and positive.
We have already seen examples of such operators in the previous chapter, where $\Omega = (0, T)$. Further examples of integral operators are provided by Green's functions, which are kernels of the inverses of elliptic operators, as discussed in the next section.
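The construction behind Corollary 7 can be checked numerically. The sketch below is a minimal illustration, assuming the hypothetical choice $g_1(x) = 1$, $g_2(x) = x$, $g_3(x) = x^2$ on $\Omega = (0, 1)$ and midpoint-rule quadrature (neither comes from the text). It forms the matrix $\int_\Omega \bar g(\kappa)\, \bar g^{\mathrm{tr}}(\kappa)\, d\kappa$ of (6.23), diagonalizes it, builds the functions $\nu_k^{-1/2}\, \bar w_k^{\mathrm{tr}} \bar g(x)$, which by (6.25) have unit norm, and verifies that they are orthonormal eigenfunctions of the discretized operator with kernel (6.18).

```python
import numpy as np

# Hypothetical choice of n = 3 linearly independent functions on Omega = (0, 1).
def g_vec(x):
    """Values of g_1(x) = 1, g_2(x) = x, g_3(x) = x**2 stacked as rows, shape (3, len(x))."""
    x = np.asarray(x, dtype=float)
    return np.vstack([np.ones_like(x), x, x**2])

m = 1000
x = (np.arange(m) + 0.5) / m           # midpoint-rule nodes on (0, 1)
w = np.full(m, 1.0 / m)                # midpoint-rule weights

g = g_vec(x)                           # shape (n, m)
G_mat = (g * w) @ g.T                  # matrix  ∫ g(κ) g(κ)^tr dκ, shape (n, n)

nu, W = np.linalg.eigh(G_mat)          # eigenvalues nu_k (ascending), orthonormal eigenvectors w_k

# Normalized eigenfunctions v_k(x) = nu_k^{-1/2} w_k^tr g(x), one per row.
V = (W.T @ g) / np.sqrt(nu)[:, None]

# Discretized degenerate kernel G(x, κ) = g(x)^tr g(κ) applied to each v_k.
K = g.T @ g                            # shape (m, m)
KV = K @ (V.T * w[:, None])            # column k approximates ∫ G(x, κ) v_k(κ) dκ
eig_residual = np.max(np.abs(KV - V.T * nu))   # eigen-equation (6.15): G v_k = nu_k v_k
gram_v = (V * w) @ V.T                 # should be the identity matrix (orthonormality)

print("nu =", nu)
print("max eigen-equation residual:", eig_residual)
```

With the same quadrature used for the matrix and for the operator, both checks hold up to rounding errors, which mirrors the algebraic argument in (6.22)–(6.25).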
6.5 Green's functions of elliptic operators
Equipped with this additional knowledge, let us return to the problem considered in Section 6.2, namely, how to represent solutions of elliptic equations $A_x q(x) = -u(x)$, $x \in \Omega$, with the boundary conditions explained in Section 6.2. To do this, we introduce, quite formally, the notion of Green's function $G(x, \kappa)$, which is defined as the solution of the following equation:
$$A_x G(x, \kappa) = -\delta(x - \kappa), \quad x \in \Omega, \tag{6.26}$$
with zero boundary conditions, where $\kappa \in \Omega$ plays the role of a parameter. In (6.26), $\delta(x)$ is the Dirac delta, which is a distribution in the sense of Schwartz. However, for our purposes, it suffices to consider the Dirac delta as a pseudo-function $\delta(x)$, which is zero for all $x$, with the exception of the point $x = 0$, where it is infinite. Using $\delta(x)$ in this way is safe, since we apply only one of its properties, namely,
$$\int_\Omega f(x)\, \delta(x - \kappa)\, dx = f(\kappa), \quad \kappa \in \Omega, \tag{6.27}$$
assuming that $f$ is a continuous function in $\Omega$.
Remark 21. In theoretical mechanics $G(x, \kappa)$ has a direct interpretation as the deflection of a string or a plate at point $x$ that is induced by a unit force acting permanently at point $\kappa$.
For self-adjoint elliptic operators (e. g., those considered in this book), Green's function is symmetric, i. e., $G(x, \kappa) = G(\kappa, x)$, $x, \kappa \in \Omega$. The importance of Green's function lies in the fact that, if we know the Green's function $G(x, \kappa)$ of operator $A_x$, then one can express the solution of $A_x q(x) = -u(x)$ as follows:
$$q(x) = \int_\Omega G(x, \kappa)\, u(\kappa)\, d\kappa, \quad x \in \Omega. \tag{6.28}$$
One can check that (6.28) is a formal solution of $A_x q(x) = -u(x)$ by applying $A_x$ to both sides of (6.28) and assuming that the order of differentiation by $A_x$ and integration in (6.28) can be interchanged. Using (6.26) and (6.27), this yields
$$A_x q(x) = \int_\Omega A_x G(x, \kappa)\, u(\kappa)\, d\kappa = -\int_\Omega \delta(x - \kappa)\, u(\kappa)\, d\kappa = -u(x) \tag{6.29}$$
for all $x \in \Omega$, which is exactly $A_x q(x) = -u(x)$.
Summarizing, if $G$ is square integrable in $\Omega \times \Omega$, then (6.28) defines the Hilbert–Schmidt operator $q(x) = (\mathcal{G} u)(x)$. Note that $\mathcal{G}$ is self-adjoint, due to the symmetry of Green's function $G$. Formally, $\mathcal{G}$ is properly defined for any $u \in L_2(\Omega)$; however, the expression $(\mathcal{G} u)(x)$ can be considered as the solution of $A_x q(x) = -u(x)$ only when $\mathcal{G} u$ belongs to the domain $\mathcal{D}$ of $A_x$. In other words, $\mathcal{G}$ is the integral representation of $(-A_x)^{-1}$, the inverse of $-A_x$ (the minus sign stems from the form $A_x q = -u$ of the equation). This implies that $\mathcal{G}$, $A_x$, and $(-A_x)^{-1}$ have the same orthogonal set of eigenfunctions $v_k(\cdot)$, $k = 1, 2, \dots$, which is complete (forms a basis) in $L_2(\Omega)$. Consequently, the eigenvalues $\nu_k$ of $\mathcal{G}$ and the eigenvalues $\lambda_k$ of $A_x$, defined by $A_x v_k = -\lambda_k v_k$, are related as follows: $\lambda_k = 1/\nu_k$, $k = 1, 2, \dots$. Thus, according to the Mercer theorem (see Section 6.4), we can express Green's function of $A_x$ as follows:
$$G(x, \kappa) = \sum_{k=1}^{\infty} \lambda_k^{-1}\, v_k(x)\, v_k(\kappa). \tag{6.30}$$
This fact, together with (6.28), provides the following representation of the solution of $A_x q = -u$:
$$q(x) = \sum_{k=1}^{\infty} \lambda_k^{-1}\, \langle u, v_k \rangle\, v_k(x), \tag{6.31}$$
where the coefficients are $\langle u, v_k \rangle = \int_\Omega u(\kappa)\, v_k(\kappa)\, d\kappa$, $k = 1, 2, \dots$. In general, for second-order elliptic operators with $\lambda_k^{-1} = O(k^{-2})$, we can guarantee that the series (6.30) is convergent in $L_2(\Omega \times \Omega)$ and, consequently, that (6.31) is convergent in $L_2(\Omega)$. In many particular cases uniform convergence can also be proved.
Example 32. We continue Example 31 (see also Section 10.3 in [72]). Define $(A_x q)(x) = q''(x)$, $x \in (0, \pi)$, with boundary conditions $q(0) = q(\pi) = 0$. The domain $\mathcal{D}$ of $A_x$ is defined as follows: $\mathcal{D} = \{q \in C^2([0, \pi]) : q(0) = q(\pi) = 0\}$. Consider the following equation:
$$q''(x) = -u(x), \quad x \in (0, \pi), \tag{6.32}$$
where $u \in C([0, \pi])$ is given and we are looking for $q \in \mathcal{D}$ that solves (6.32). It can be proved (see Section 10.3 in [72]) that the operator $\mathcal{G}$ defined by
$$q(x) = \int_0^\pi G(x, \kappa)\, u(\kappa)\, d\kappa \tag{6.33}$$
is the inverse of $-A_x$ (i. e., it solves (6.32)), where Green's function $G$ can be explicitly described: it grows linearly, starting from 0, as $x$ changes from 0 to $\kappa$, and then decreases linearly as $x$ changes from $\kappa$ to $\pi$; namely, $G(x, \kappa) = \min(x, \kappa)\,\big(\pi - \max(x, \kappa)\big)/\pi$. Note that its derivative with respect to $x$ has a discontinuity at $x = \kappa$, corresponding to the point at which the Dirac delta acts. One can directly verify that in this case $v_k(x) = \sqrt{2/\pi}\, \sin(k x)$ and $\lambda_k = k^2$, $k = 1, 2, \dots$, are the eigenfunctions and the eigenvalues of $A_x$, respectively. Thus, Green's function has the following representation:
$$G(x, \kappa) = \frac{2}{\pi} \sum_{k=1}^{\infty} k^{-2} \sin(k x)\, \sin(k \kappa). \tag{6.34}$$
Accordingly, the solution of (6.32) has the form
$$q(x) = \sqrt{2/\pi} \sum_{k=1}^{\infty} k^{-2} c_k \sin(k x), \tag{6.35}$$
where $c_k = \sqrt{2/\pi} \int_0^\pi u(\kappa) \sin(k \kappa)\, d\kappa$, $k = 1, 2, \dots$. Note that, in spite of the rapidly decreasing coefficients $k^{-2}$, the series in (6.34) does not converge as rapidly as one might expect near the cusp point. The reason is the discontinuity of the derivative of $G(x, \kappa)$ at $x = \kappa$. The rate of convergence of (6.35) is much better if $u(\cdot)$ is smooth enough. For example, when $u(x) = \pi^2/4 - (x - \pi/2)^2$, one can calculate the exact solution $q(x) = \frac{1}{12}(x^4 - 2\pi x^3 + \pi^3 x)$ and the coefficients
$$c_k = \frac{\sqrt{2}\,\big(2 - 2\cos(\pi k) - \pi k \sin(\pi k)\big)}{\sqrt{\pi}\, k^3} = \sqrt{\frac{2}{\pi}}\, \frac{2\,\big(1 - (-1)^k\big)}{k^3}.$$
When series (6.35) truncated to five terms is used, the maximum absolute difference between this approximate solution and $q(x)$ is about $2 \times 10^{-4}$. The whole plot of the approximation error is shown in Fig. 6.1.
Figure 6.1: The approximation error vs. x when the Fourier representation of the smooth solution of (6.32) is truncated to five terms (see Example 32 for details).
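The numbers quoted for Example 32 can be reproduced with a few lines of NumPy. This is a sketch only: it uses the closed-form coefficients $c_k$ given above (note that $u(x) = \pi^2/4 - (x - \pi/2)^2 = \pi x - x^2$), truncates (6.35) after five terms, and measures the maximum error on a uniform grid of 2001 points, which is our assumption rather than the grid used for Fig. 6.1.

```python
import numpy as np

def c_coeff(k):
    """c_k = sqrt(2/pi) * ∫_0^pi (pi*x - x**2) sin(k x) dx = sqrt(2/pi) * 2 (1 - (-1)^k) / k^3."""
    return np.sqrt(2.0 / np.pi) * 2.0 * (1.0 - (-1.0) ** k) / k**3

def q_truncated(x, n_terms):
    """Series (6.35) truncated to n_terms terms: sqrt(2/pi) * sum_k k^-2 c_k sin(k x)."""
    k = np.arange(1, n_terms + 1)
    return np.sqrt(2.0 / np.pi) * np.sin(np.outer(x, k)) @ (c_coeff(k) / k**2)

def q_exact(x):
    """Exact solution of q'' = -(pi*x - x**2), q(0) = q(pi) = 0."""
    return (x**4 - 2.0 * np.pi * x**3 + np.pi**3 * x) / 12.0

x = np.linspace(0.0, np.pi, 2001)
err = q_truncated(x, 5) - q_exact(x)
max_err = np.max(np.abs(err))
print(f"max |error| with five terms: {max_err:.2e}")
```

Since $c_k = 0$ for even $k$, five terms contain only $k = 1, 3, 5$, and the first neglected term is of order $7^{-5}$; the computed maximum error should therefore be of the order $10^{-4}$, in line with the value reported in the text.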
More examples of Green’s functions will be provided in the next section, because for our purposes the dependence of Green’s functions on unknown parameters of elliptic operators is crucial.
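The slow convergence of (6.34) near the cusp can be observed by comparing it with the piecewise-linear closed form of $G$ from Example 32. The sketch below is our illustration (the evaluation points $\kappa = 1$, $x = 2.5$ and the 1000-term truncation are arbitrary choices): it compares the truncated series with the closed form at $x = \kappa$ and away from it, and also evaluates (6.33) by the trapezoidal rule for $u(x) = \pi x - x^2$, which should reproduce the exact solution $q(x)$ of Example 32.

```python
import numpy as np

def green_exact(x, kappa):
    """Closed-form Green's function of q'' = -u on (0, pi), q(0) = q(pi) = 0 (a 'tent')."""
    lo = np.minimum(x, kappa)
    hi = np.maximum(x, kappa)
    return lo * (np.pi - hi) / np.pi

def green_series(x, kappa, n_terms):
    """Series (6.34) truncated after n_terms terms (scalar x and kappa)."""
    k = np.arange(1, n_terms + 1)
    return (2.0 / np.pi) * np.sum(np.sin(k * x) * np.sin(k * kappa) / k**2)

kappa = 1.0
n_terms = 1000
err_cusp = abs(green_series(kappa, kappa, n_terms) - green_exact(kappa, kappa))
err_away = abs(green_series(2.5, kappa, n_terms) - green_exact(2.5, kappa))

# Formula (6.33) evaluated with the trapezoidal rule for u(x) = pi*x - x**2.
nodes = np.linspace(0.0, np.pi, 4001)
weights = np.full(nodes.size, nodes[1] - nodes[0])
weights[[0, -1]] *= 0.5
xs = np.array([0.5, 1.5, 2.5])
q_quad = green_exact(xs[:, None], nodes[None, :]) @ (weights * (np.pi * nodes - nodes**2))
q_ref = (xs**4 - 2.0 * np.pi * xs**3 + np.pi**3 * xs) / 12.0

print(f"series error at the cusp x = kappa: {err_cusp:.2e}")
print(f"series error away from the cusp:    {err_away:.2e}")
print("max |q_quad - q_ref| =", np.max(np.abs(q_quad - q_ref)))
```

At $x = \kappa$ the neglected terms are all nonnegative and their sum behaves like $1/(\pi n)$ for $n$ retained terms, whereas away from the cusp they oscillate and largely cancel.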
6.6 The dependence of Green's function on the parameters of elliptic operators
In this section we discuss the dependence of Green's function on (possibly unknown) parameters of elliptic operators. To some extent, we repeat the results provided in Section 6.5, but this time we put the emphasis on the dependence of the eigenvalues of partial differential operators on their (constant) parameters, since it implies the dependence of the solutions (responses) of PDEs (systems) on them. For this purpose, it is convenient to change the notation. Namely, instead of a matrix of parameters $\alpha_{ij}$, $i, j = 1, 2, \dots, s$, we arrange the unknown parameters into the column vector $\bar a$, which contains $r$ unknown parameters as its elements. In this vein, we consider elliptic operators $A_x$ of the form
$$A_x(\bar a)\, q(x) = \sum_{i=1}^{r} a_i\, P_x^{(i)} q(x), \tag{6.36}$$
where $P_x^{(i)}$, $i = 1, \dots, r$, are differential operators with respect to the spatial variables $x \in \Omega \subset \mathbb{R}^s$. These operators are defined on a set $\mathcal{D}$ of functions defined on $\Omega$ that are sufficiently many times continuously differentiable in $\Omega$. We assume $\mathcal{D}$ to be a dense subspace of the space $L_2(\Omega)$. Boundary conditions are defined on the set $\Gamma$, which is the boundary of $\Omega$. These conditions are included in the definition of $\mathcal{D}$; $\Omega$ is assumed to be a bounded open set with piecewise smooth boundary $\Gamma$. Later on, we denote by $A \subset \mathbb{R}^r$ the set of parameters $\bar a$ which ensures that (6.36) is a properly defined (e. g., elliptic) operator. As an example, consider the operator
$$A_x(\bar a)\, q(x) = a_1 \frac{d^2 q(x)}{dx^2} - a_2\, q(x), \quad x \in \Omega \stackrel{\text{def}}{=} (0, 1). \tag{6.37}$$
It is defined on the subspace $\mathcal{D} \subset C^2(0, 1)$ of twice continuously differentiable functions such that the boundary conditions $q(0) = q(1) = 0$ hold. Clearly, in this case, $P_x^{(1)} q(x) = \frac{d^2 q(x)}{dx^2}$ and $P_x^{(2)}$ is minus the identity operator, while $a_1 > 0$ and $a_2 > 0$ are constant parameters, i. e., $A = \{\bar a : a_1 > 0, a_2 > 0\}$. The second example is the Laplace operator in two dimensions:
$$A_x(\bar a)\, q(x) = a_1 \frac{\partial^2 q(x)}{\partial x_1^2} + a_2 \frac{\partial^2 q(x)}{\partial x_2^2}, \quad x \in \Omega \stackrel{\text{def}}{=} (0, 1)^2. \tag{6.38}$$
It is defined on the subspace $\mathcal{D} \subset C^2(\Omega)$ of twice continuously differentiable functions with homogeneous boundary conditions of the Dirichlet type:
$$q(x_1, 0) = 0, \quad q(x_1, 1) = 0, \quad q(0, x_2) = 0, \quad q(1, x_2) = 0. \tag{6.39}$$
Here, $P_x^{(1)} q(x) = \frac{\partial^2 q(x)}{\partial x_1^2}$ and $P_x^{(2)} q(x) = \frac{\partial^2 q(x)}{\partial x_2^2}$, while $a_1 > 0$ and $a_2 > 0$ are constant parameters. Thus, $A = \{\bar a : a_1 > 0, a_2 > 0\}$. Later on in this book the following assumptions are made concerning $A_x$.
(A1) $A_x(\bar a)$ is symmetric, i. e., for all $f, g \in \mathcal{D}$ and $\bar a \in A$ we have $\langle A_x(\bar a) f, g \rangle = \langle f, A_x(\bar a) g \rangle$, which is relatively easy to verify. It is also assumed to be self-adjoint, which is more difficult to verify, since it requires us to establish whether the adjoint operator has the same domain as $A_x$.
(A2) $-A_x(\bar a)$ is positive definite, i. e., for all $f \in \mathcal{D}$ and $\bar a \in A$, if $f \ne 0$, then $\langle f, -A_x(\bar a) f \rangle > 0$.
Remark 22. Note that assumption (A2) pertains to the operator $-A_x(\bar a)$, i. e., $A_x(\bar a)$ with a minus sign. This convention is maintained throughout the book, as it is convenient for our purposes. We emphasize this fact since it is more common to include the minus sign in the definition of the operator itself. On the other hand, throughout the whole book, eigenfunctions $v(x) \not\equiv 0$ and eigenvalues $\lambda$ of $A_x(\bar a)$ are defined by
$$A_x(\bar a)\, v(x) = -\lambda\, v(x), \quad x \in \Omega,$$
which implies that the eigenvalues of $A_x(\bar a)$ have the same sign as if the minus sign were hidden in the definition of the operator. With these conventions, $A_x(\bar a)$ is still called an operator of the elliptic type.
(A3) For every $\bar a \in A$ there exists an inverse operator $\mathcal{G}_{\bar a}$ of $-A_x(\bar a)$, which is a bounded linear operator of the Hilbert–Schmidt type that has the following representation:
$$(\mathcal{G}_{\bar a} f)(x) = \int_\Omega G(x, \kappa, \bar a)\, f(\kappa)\, d\kappa, \quad x \in \Omega, \tag{6.40}$$
where $G(x, \kappa, \bar a)$ is Green's function of $A_x(\bar a)$.
(A4) For every $\bar a \in A$, Green's function $G(x, \kappa, \bar a)$ is symmetric and square integrable in $\Omega \times \Omega$. Operator $\mathcal{G}_{\bar a}$ has eigenfunctions $v_1(x), v_2(x), \dots$ that form an orthonormal basis for $L_2(\Omega)$, and they are the same as the eigenfunctions of $A_x(\bar a)$. The corresponding eigenvalues of $\mathcal{G}_{\bar a}$ are real and positive. For every $\bar a \in A$, the solution of $A_x(\bar a)\, q = -u$ can be expressed as
$$q(x) = \int_\Omega G(x, \kappa, \bar a)\, u(\kappa)\, d\kappa. \tag{6.41}$$
It is known (see Section 6.5) that (A1)–(A4) hold for elliptic operators.
Consider the eigenvalue problem for $A_x(\bar a)$. Namely, a nontrivial $v \in \mathcal{D}$ is called an eigenfunction of $A_x(\bar a)$ if there exists an (in general, complex) number $\lambda$ such that
$$A_x(\bar a)\, v(x) = -\lambda\, v(x), \quad x \in \Omega. \tag{6.42}$$
It is also well known (see, e. g., [15, 25]) that under (A1)–(A4) the sequence of eigenfunctions $v_1(x), v_2(x), \dots$ and the sequence of the corresponding eigenvalues $\lambda_1, \lambda_2, \dots$ have the following properties.
(E1) The sequence of eigenfunctions $v_1(x), v_2(x), \dots$ is countably infinite.
(E2) The sequence of the corresponding eigenvalues $\lambda_1, \lambda_2, \dots$ is also countably infinite.
(E3) The number of eigenfunctions corresponding to one eigenvalue is finite.
(E4) Eigenfunctions corresponding to different eigenvalues are orthogonal, i. e., if $\lambda_i \ne \lambda_j$, then $\langle v_i, v_j \rangle = 0$.
(E5) According to (E3) and (E4), the whole set of eigenfunctions can be orthogonalized. Later on, we assume that the orthogonalization has been carried out and (after a possible renumbering) the result is again denoted as $v_1(x), v_2(x), \dots$. Eigenfunctions can be normalized so that their norm $\|v_k\| = \sqrt{\langle v_k, v_k \rangle} = 1$. It is convenient to assume that the orthogonal sequence has also been normalized, i. e., the sequence $v_1(x), v_2(x), \dots$ is orthonormal. To keep the correspondence between eigenfunction and eigenvalue pairs, multiple eigenvalues are repeated in the sequence $\lambda_1, \lambda_2, \dots$ an appropriate number of times.
(E6) The sequence $v_k$, $k = 1, 2, \dots$, forms a complete orthonormal basis of $L_2(\Omega)$, i. e., every $f \in L_2(\Omega)$ can be expressed as
$$f = \sum_{k=1}^{\infty} \langle f, v_k \rangle\, v_k, \tag{6.43}$$
where the convergence in (6.43) is understood in the standard $L_2(\Omega)$-norm.
(E7) The eigenvalues $\lambda_1, \lambda_2, \dots$ are real and positive. They can be ordered in such a way that $\lambda_1 \le \lambda_2 \le \cdots$, which is further assumed.
6.7 A special class of Green's functions
In general, both $v_1(x), v_2(x), \dots$ and $\lambda_1, \lambda_2, \dots$ depend on $\bar a$. It is important to point out a subclass of operators $A_x(\bar a)$ and the corresponding Green's functions that have the following properties:
(I1) The eigenfunctions $v_1(x), v_2(x), \dots$ of $A_x(\bar a)$ do not depend on $\bar a$.
(I2) If (I1) holds, then the eigenvalues $\lambda_k(\bar a)$ of $A_x(\bar a)$ are linear functions of $\bar a$, i. e.,
$$\lambda_k(\bar a) = \bar h_k^{\mathrm{tr}} \bar a, \quad A_x(\bar a)\, v_k = -\lambda_k(\bar a)\, v_k, \tag{6.44}$$
for certain (known) vectors $\bar h_k$, $k = 1, 2, \dots$, which do not depend on $\bar a$. We also admit the case when $\lambda_k(\bar a)$ is an affine function of $\bar a$, i. e., $\lambda_k(\bar a) = \bar h_k^{\mathrm{tr}} \bar a + \varpi_k$, where $\varpi_k$ is a sequence of known constants.
In order to convince ourselves that the class of operators $A_x(\bar a)$ for which (I1) and (I2) hold is sufficiently large, we provide easily verifiable sufficient conditions for them.
Lemma 4. If a certain $v_k$ does not depend on $\bar a$ and is an eigenfunction of all operators $P_x^{(i)}$, $i = 1, 2, \dots, r$, then it is also an eigenfunction of $A_x(\bar a)$, which does not depend on $\bar a$. The converse statement also holds.
Proof. Sufficiency follows by direct inspection. If $v_k$ is an eigenfunction of $A_x(\bar a)$ and does not depend on $\bar a$, then $\partial v_k(x)/\partial a_j = 0$, $j = 1, 2, \dots, r$. Thus, differentiating both sides of $A_x(\bar a)\, v_k = -\lambda_k(\bar a)\, v_k$ with respect to $a_j$ gives $P_x^{(j)} v_k = -\frac{\partial \lambda_k(\bar a)}{\partial a_j}\, v_k$, $j = 1, 2, \dots, r$, which proves the necessity.
As one basic example, we mention the operator defined by (6.37), with boundary conditions $q(0) = q(1) = 0$. In this case $v_k = \sqrt{2}\, \sin(k \pi x)$ is an eigenfunction of $A_x(\bar a)$ that does not depend on $a_1$ and $a_2$. The corresponding eigenvalues have the form $\lambda_k(\bar a) = a_1 \pi^2 k^2 + a_2$, and they are linear in $\bar a$, as predicted by the following lemma.
Lemma 5. Under the assumptions of Lemma 4, each $\lambda_k(\bar a) = \bar h_k^{\mathrm{tr}} \bar a$ is linear in $\bar a$, and the elements $h_k^{(i)}$ of $\bar h_k$ are eigenvalues of $P_x^{(i)}$, i. e., $P_x^{(i)} v_k(x) = -h_k^{(i)} v_k(x)$, $i = 1, 2, \dots, r$.
Proof. The proof follows by direct inspection.
Remark 23. In some cases it happens that, under assumptions very similar to those in Lemma 5, the eigenvalues are affine with respect to $\bar a$, i. e., $\lambda_k(\bar a) = \bar h_k^{\mathrm{tr}} \bar a + \varpi_k$, where $\varpi_k$ is a sequence of known constants that do not depend on $\bar a$. To illustrate this point, consider the following operator:
$$A_x(\bar a)\, q(x) = \sum_{i=1}^{r} a_i\, P_x^{(i)} q(x) + \sum_{i=r+1}^{r_{\max}} \beta_i\, P_x^{(i)} q(x), \tag{6.45}$$
where the $\beta_i$'s are known parameters and $P_x^{(i)}$, $i = r+1, r+2, \dots, r_{\max}$, have the same eigenfunctions $v_k$ as $P_x^{(i)}$, $i = 1, 2, \dots, r$. Then, $\varpi_k = \sum_{i=r+1}^{r_{\max}} \beta_i\, \varsigma_k^{(i)}$, where $\varsigma_k^{(i)}$ are the eigenvalues in $P_x^{(i)} v_k(x) = -\varsigma_k^{(i)} v_k(x)$, $i = r+1, r+2, \dots, r_{\max}$. Later on in this book we shall not make this distinction, keeping (6.36) as the basic form. The reason is that in both cases $\nabla_a \lambda_k(\bar a) = \bar h_k$, which is what matters for DOE.
It is expedient to reveal the structure of the eigenfunctions when there are several spatial variables. For the sake of simplicity, we state the following result for two variables, since the generalization is obvious. A second route to easy generalizations is to replace $\Omega = (0, 1)^2$ by a Cartesian product of domains having other shapes (see the example of the cylindrical domain in the next section).
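Lemmas 4 and 5 can be illustrated on a finite-difference discretization of operator (6.37). The sketch below assumes a standard three-point approximation of $d^2/dx^2$ with 400 interior nodes and hypothetical parameter values; it checks that the eigenvectors do not change when $\bar a$ changes, that the eigenvalues depend linearly on $\bar a$, and that the lowest ones approach $a_1 \pi^2 k^2 + a_2$.

```python
import numpy as np

def operator_637(a1, a2, n=400):
    """Finite-difference matrix of A_x(a) q = a1 q'' - a2 q on (0, 1) with q(0) = q(1) = 0."""
    h = 1.0 / (n + 1)
    d2 = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
          + np.diag(np.ones(n - 1), -1)) / h**2
    return a1 * d2 - a2 * np.eye(n)

def lowest_modes(a1, a2, n_modes=5):
    """Smallest eigenvalues lambda_k and eigenvectors of -A, i.e. A v = -lambda v."""
    lam, vecs = np.linalg.eigh(-operator_637(a1, a2))   # eigenvalues in ascending order
    return lam[:n_modes], vecs[:, :n_modes]

lam_a, V_a = lowest_modes(1.0, 2.0)      # hypothetical parameter values
lam_b, V_b = lowest_modes(3.0, 0.5)
lam_mid, _ = lowest_modes(2.0, 1.25)     # midpoint of the two parameter vectors

k = np.arange(1, 6)
print("discrete eigenvalues:", lam_a)
print("a1 pi^2 k^2 + a2:    ", 1.0 * np.pi**2 * k**2 + 2.0)
print("same eigenvectors:", np.allclose(np.abs(V_a.T @ V_b), np.eye(5), atol=1e-8))
```

In the discrete setting the linearity holds exactly, because the matrix equals $a_1 D_2 - a_2 I$ and both terms share the eigenvectors of $D_2$ (the discrete sine vectors); only the values $\pi^2 k^2$ are approximated.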
Lemma 6. Consider the following operator, defined on $\Omega = (0, 1)^2$:
$$A_x(\bar a)\, q(x_1, x_2) = a_1 P_{x_1}^{(1)} q(x_1, x_2) + a_2 P_{x_2}^{(2)} q(x_1, x_2), \quad (x_1, x_2) \in \Omega, \tag{6.46}$$
where $P_{x_1}^{(1)}$ and $P_{x_2}^{(2)}$ are partial differential operators with respect to $x_1$ only and to $x_2$ only, respectively. Operator (6.46) is equipped with boundary conditions that pertain to $x_1$ and $x_2$ separately. Consider also the following two eigenvalue problems:
$$P_{x_1}^{(1)} \phi(x_1) = -h^{(1)} \phi(x_1), \quad x_1 \in (0, 1), \tag{6.47}$$
$$P_{x_2}^{(2)} \psi(x_2) = -h^{(2)} \psi(x_2), \quad x_2 \in (0, 1), \tag{6.48}$$
equipped with the boundary conditions resulting from those imposed on $A_x(\bar a)$. Let us also assume that (6.47) and (6.48) have properties (E1)–(E7). Denote their eigenfunctions and the corresponding eigenvalues by $\phi_k(x_1)$, $h_k^{(1)}$ and $\psi_k(x_2)$, $h_k^{(2)}$, respectively. Then we have:
(A) The eigenfunctions of $A_x(\bar a)$ are the products of all pairs $\phi_{k'}(x_1)\, \psi_{k''}(x_2)$, $k', k'' = 1, 2, \dots$. In other words, the eigenfunctions of $A_x(\bar a)$ have the tensor product structure, namely, they can be expressed as $\bar\phi(x_1) \otimes \bar\psi(x_2)$, where $\bar\phi(x_1)$ is the (possibly infinite) column vector composed of all the $\phi_k(x_1)$'s, and analogously for $\bar\psi(x_2)$.
(B) The eigenvalues of $A_x(\bar a)$ corresponding to $\phi_{k'}(x_1)\, \psi_{k''}(x_2)$ are linear in $\bar a$ and have the following form: $a_1 h_{k'}^{(1)} + a_2 h_{k''}^{(2)}$, $k', k'' = 1, 2, \dots$.
Proof. It suffices to observe that $P_{x_1}^{(1)} \big(\phi_{k'}(x_1)\, \psi_{k''}(x_2)\big) = -h_{k'}^{(1)} \phi_{k'}(x_1)\, \psi_{k''}(x_2)$ and $P_{x_2}^{(2)} \big(\phi_{k'}(x_1)\, \psi_{k''}(x_2)\big) = -h_{k''}^{(2)} \phi_{k'}(x_1)\, \psi_{k''}(x_2)$. Part (B) follows from Lemma 5, since each $\phi_{k'}(x_1)\, \psi_{k''}(x_2)$ is an eigenfunction of both operators $P_{x_1}^{(1)}$ and $P_{x_2}^{(2)}$.
Example 33. As a simple application of Lemma 6, consider the eigenvalue problem for the Laplace operator (6.38) with the Dirichlet boundary conditions (6.39). In this case, $P_{x_1}^{(1)} \phi(x_1) = \frac{\partial^2 \phi(x_1)}{\partial x_1^2}$ with boundary conditions $\phi(0) = \phi(1) = 0$. The eigenfunctions of this operator are well known: $\phi_{k'}(x_1) = \sqrt{2}\, \sin(k' \pi x_1)$. Similarly, for the second variable we have $P_{x_2}^{(2)} \psi(x_2) = \frac{\partial^2 \psi(x_2)}{\partial x_2^2}$ with boundary conditions $\psi(0) = \psi(1) = 0$ and $\psi_{k''}(x_2) = \sqrt{2}\, \sin(k'' \pi x_2)$. Thus, according to Lemma 6, the eigenfunctions and eigenvalues of (6.38) are the following: $2 \sin(k' \pi x_1) \sin(k'' \pi x_2)$ and $\pi^2 \big(a_1 (k')^2 + a_2 (k'')^2\big)$, $k', k'' = 1, 2, \dots$.
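Lemma 6 has a direct matrix analogue: on a tensor-product grid, the discretized operator is a Kronecker sum. The sketch below is an illustration under assumed values ($n = 30$ nodes per direction, $a_1 = 1$, $a_2 = 2.5$) for the operator (6.38) with boundary conditions (6.39). It verifies part (B) of Lemma 6 (every eigenvalue is $a_1 h^{(1)}_{k'} + a_2 h^{(2)}_{k''}$), part (A) for one product eigenvector, and the factor $\pi^2$ in the eigenvalues of Example 33.

```python
import numpy as np

n = 30                                     # interior grid points per direction (hypothetical)
h = 1.0 / (n + 1)
d2 = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
      + np.diag(np.ones(n - 1), -1)) / h**2        # 1D Dirichlet second difference
I = np.eye(n)
a1, a2 = 1.0, 2.5                          # hypothetical parameter values

# Discrete version of (6.38): grid values q(x1_i, x2_j) stacked with index i*n + j.
A = a1 * np.kron(d2, I) + a2 * np.kron(I, d2)

mu, phi = np.linalg.eigh(-d2)              # 1D eigenpairs: -d2 phi_k = mu_k phi_k
lam_2d = np.sort(np.linalg.eigvalsh(-A))
lam_sums = np.sort((a1 * mu[:, None] + a2 * mu[None, :]).ravel())   # part (B) of Lemma 6

# Part (A) of Lemma 6: a tensor product of 1D eigenvectors is an eigenvector of A.
v = np.kron(phi[:, 1], phi[:, 2])          # k' = 2, k'' = 3
residual = np.max(np.abs(A @ v + (a1 * mu[1] + a2 * mu[2]) * v))

print("lowest 2D eigenvalue:", lam_2d[0], " vs pi^2 (a1 + a2):", np.pi**2 * (a1 + a2))
print("tensor-product residual:", residual)
```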
6.8 Eigenfunctions of operators of the elliptic type – more examples
In this section we provide more examples of eigenfunctions of PDEs of the elliptic type. We sketch them formally, without going into detail. In particular, our aim is to illustrate cases when eigenfunctions do not depend on $\bar a$ and/or cases when eigenfunctions of a multivariable operator are products of the eigenfunctions of operators that depend on a part of the spatial variables only.
Example 34 (1D example). Consider the formal operator $A_x(\bar a)\, q(x) = a_1 q''(x) - a_2 q(x)$, $x \in (0, \pi)$. Then, $P_x^{(1)} q(x) = q''(x)$ and $P_x^{(2)}$ is minus the identity operator.
Case (a): When the boundary conditions $q(0) = q(\pi) = 0$ are imposed, $v_k(x) = \sqrt{2/\pi}\, \sin(k x)$, $k = 1, 2, \dots$, are eigenfunctions of both $P_x^{(1)}$ and $P_x^{(2)}$. Thus, according to Lemmas 4 and 5, we conclude that $\sqrt{2/\pi}\, \sin(k x)$ are eigenfunctions of $A_x(\bar a)$ that do not depend on $\bar a$. Additionally, $\lambda_k(\bar a) = a_1 k^2 + a_2$, $k = 1, 2, \dots$, are linear functions of $\bar a$.
Case (b): When the boundary conditions $q'(0) = q'(\pi) = 0$ are imposed, analogously, $v_0(x) = 1/\sqrt{\pi}$ and $v_k(x) = \sqrt{2/\pi}\, \cos(k x)$, $k = 1, 2, \dots$, are eigenfunctions of both $P_x^{(1)}$ and $P_x^{(2)}$, and again they are eigenfunctions of $A_x(\bar a)$ that do not depend on $\bar a$, with $\lambda_k(\bar a) = a_1 k^2 + a_2$. Note that in this case we start the numbering of eigenfunctions from $k = 0$, the reason being that the first eigenfunction is a constant function over $x \in (0, \pi)$.
Case (c): When we impose the so-called circular² (or periodic) boundary conditions $q(0) = q(\pi)$ and $q'(0) = q'(\pi)$, the constant $1/\sqrt{\pi}$ and the pairs of eigenfunctions $\sqrt{2/\pi}\, \sin(2k x)$ and $\sqrt{2/\pi}\, \cos(2k x)$ do not depend on $\bar a$, and the same eigenvalue $4 a_1 k^2 + a_2$, $k = 1, 2, \dots$, corresponds to each pair (the period of these functions must divide $\pi$).
Example 35 (1D example, partly negative). Let the formal operator $A_x(\bar a)\, q(x)$ be the same, but this time we consider the following mixed boundary conditions:
$$q(0) = 0, \quad a_0\, q(\pi) + q'(\pi) = 0, \tag{6.49}$$
where $a_0 > 0$ is a constant parameter. It is well known (see, e. g., [3], Example 2.17) that the (not normalized) eigenfunctions and the corresponding eigenvalues have the following form:
$$\sin(\alpha_k x/\pi), \quad \lambda_k = a_1 (\alpha_k/\pi)^2 + a_2, \quad k = 1, 2, \dots, \tag{6.50}$$
where the $\alpha_k$'s are the solutions of the following equation:
$$\tan(\alpha) = -\alpha/(\pi a_0), \tag{6.51}$$
which has infinitely many positive roots, growing to infinity. One can visualize them by drawing the function $\tan(\alpha)$ together with the line $-\alpha/(\pi a_0)$ on the same plot.
² Such boundary conditions arise naturally when, e. g., a steady-state temperature distribution along a thin torus is considered.
The conclusions are the following:
(1) When the parameter $a_0 > 0$ is known, the eigenfunctions do not depend on the unknown parameters in $A_x(\bar a)$, the eigenvalues are linear in $a_1$ and $a_2$, and we will be able to obtain more exact characterizations of D-optimal input signals for estimating $a_1$ and $a_2$.
(2) When the parameter $a_0 > 0$ is unknown and has to be estimated together with $a_1$ and $a_2$, then the eigenfunctions of $A_x(\bar a)$ depend on $a_0$ (through (6.51)) and the eigenvalues are not linear in the triple $a_0$, $a_1$, and $a_2$. In such a case, only general characterizations of the D-optimal input signal are possible.
Example 36 (2D, three parameters). Let us return to the Laplace operator on the unit square with one more term, i. e.,
$$A_x(\bar a)\, q(x) = a_1 \frac{\partial^2 q(x)}{\partial x_1^2} + a_2 \frac{\partial^2 q(x)}{\partial x_2^2} - a_3\, q(x), \quad x \in \Omega \stackrel{\text{def}}{=} (0, 1)^2, \tag{6.52}$$
with the Dirichlet boundary conditions on the edges $x_2 = 0$ and $x_2 = 1$ and the Neumann boundary conditions on the edges $x_1 = 0$ and $x_1 = 1$:
$$q(x_1, 0) = 0, \quad q(x_1, 1) = 0, \quad \frac{\partial q(x_1, x_2)}{\partial x_1}\Big|_{x_1 = 0} = 0, \quad \frac{\partial q(x_1, x_2)}{\partial x_1}\Big|_{x_1 = 1} = 0. \tag{6.53}$$
Here, $P_x^{(1)} q(x) = \frac{\partial^2 q(x)}{\partial x_1^2}$, $P_x^{(2)} q(x) = \frac{\partial^2 q(x)}{\partial x_2^2}$, and $P_x^{(3)}$ is minus the identity operator, while $a_1 > 0$, $a_2 > 0$, and $a_3 > 0$ are constant parameters. For $P_x^{(1)}$ we have the Neumann boundary conditions; thus, its eigenfunctions are the constant $1$ and $\sqrt{2}\, \cos(k' \pi x_1)$, $k' = 1, 2, \dots$. For $P_x^{(2)}$ we have the Dirichlet boundary conditions; thus, its eigenfunctions are $\sqrt{2}\, \sin(k'' \pi x_2)$, $k'' = 1, 2, \dots$. Now, applying Lemmas 4–6, we conclude that operator (6.52) with boundary conditions (6.53) has the following eigenfunctions and eigenvalues: for $k' = 1, 2, \dots$ and $k'' = 1, 2, \dots$,
$$2 \sin(k'' \pi x_2) \cos(k' \pi x_1), \quad a_1 (\pi k')^2 + a_2 (\pi k'')^2 + a_3, \tag{6.54}$$
and for $k' = 0$ and $k'' = 1, 2, \dots$, we have $\sqrt{2}\, \sin(k'' \pi x_2)$ and the eigenvalues $a_2 (\pi k'')^2 + a_3$. This example shows that, due to Lemmas 4–6, one can "compose" eigenfunctions and eigenvalues of more complicated linear operators from the eigenfunctions and eigenvalues of their parts. Clearly, the Laplace operator in the three-dimensional unit cube has eigenfunctions that are products of one-dimensional trigonometric eigenfunctions: the constant and sin and/or cos, depending on the boundary conditions.
Example 37 (A cylindrical domain). In this example we do not follow the general notational conventions of this book, giving priority to widespread conventions. Consider the following operator:
$$a_1 \left( \frac{\partial^2 q(r, z)}{\partial r^2} + r^{-1} \frac{\partial q(r, z)}{\partial r} \right) + a_2 \frac{\partial^2 q(r, z)}{\partial z^2}, \quad 0 < r < 1,\ 0 < z < \pi, \tag{6.55}$$
with the following boundary conditions:
$$|q(0, z)| < \infty, \quad q(1, z) = 0, \quad \frac{\partial q(r, z)}{\partial z}\Big|_{z = 0} = 0, \quad \frac{\partial q(r, z)}{\partial z}\Big|_{z = \pi} = 0. \tag{6.56}$$
Note that the operator $P_{r,z}^{(1)} q(r, z) \stackrel{\text{def}}{=} \frac{\partial^2 q(r, z)}{\partial r^2} + r^{-1} \frac{\partial q(r, z)}{\partial r}$ arises naturally when a radially symmetric membrane with clamped edges is considered. Its eigenfunctions are well known (see, e. g., [7], page 200) to be expressed by the Bessel function of order zero and of the first kind, commonly denoted as $J_0(r)$. Denote by $\varrho_k$, $k = 1, 2, \dots$, the positive zeros of $J_0(r)$. Then,
$$\phi_k(r) \stackrel{\text{def}}{=} \sqrt{2}\, \frac{J_0(\varrho_k r)}{|J_1(\varrho_k)|}, \quad k = 1, 2, \dots,$$
where $J_1(r)$ is the Bessel function of the first kind of order one. These functions provide an orthonormal basis of the Hilbert space of functions with $\langle f, f \rangle_c < \infty$, equipped with the scalar product $\langle f, g \rangle_c \stackrel{\text{def}}{=} \int_0^1 r\, f(r)\, g(r)\, dr$. The corresponding eigenvalues are $\varrho_k^2$, $k = 1, 2, \dots$.
Concerning the second operator, $P_{r,z}^{(2)} q(r, z) \stackrel{\text{def}}{=} \frac{\partial^2 q(r, z)}{\partial z^2}$ with homogeneous boundary conditions of the Neumann type, we already know that its eigenfunctions are $1/\sqrt{\pi}$ (with eigenvalue 0) and $\sqrt{2/\pi}\, \cos(k z)$ with eigenvalues $k^2$, $k = 1, 2, \dots$.
Figure 6.2: Selected eigenfunctions of the Laplace operator in the cylinder (unit disc) × [0, π] in r- and z-coordinates, assuming circular symmetry. J0 1 is shorthand notation for $J_0(\varrho_1 r)$, where $J_0(r)$ is the Bessel function, while $\varrho_1$ is the first root of $J_0(r) = 0$. Analogously, J0 2 stands for $J_0(\varrho_2 r)$, where $\varrho_2$ is the second root of $J_0(r) = 0$ (see explanations in the text).
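The orthonormality of the functions $\phi_k$ of Example 37 with respect to $\langle \cdot, \cdot \rangle_c$ can be checked numerically. To keep this sketch NumPy-only, we evaluate $J_n$ from Bessel's integral $J_n(x) = \frac{1}{\pi}\int_0^\pi \cos(n\theta - x\sin\theta)\, d\theta$ with the periodic trapezoidal rule and locate the zeros $\varrho_k$ by bisection; in practice one would rather call a special-function library (e.g., SciPy's `scipy.special` module). The helper names `bessel_j` and `j0_zeros` are ours.

```python
import numpy as np

_THETA = 2.0 * np.pi * np.arange(128) / 128       # equispaced nodes on [0, 2*pi)

def bessel_j(order, x):
    """J_order(x) = (1/(2 pi)) ∫_0^{2 pi} cos(order*t - x*sin t) dt, computed with the
    periodic trapezoidal rule (very accurate here for |x| up to about 50)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.cos(order * _THETA[None, :] - x[:, None] * np.sin(_THETA)[None, :]).mean(axis=1)

def j0_zeros(count):
    """First `count` positive zeros of J_0: sign-change bracketing, then bisection."""
    grid = np.linspace(0.1, 3.2 * count + 3.0, 4000)
    vals = bessel_j(0, grid)
    idx = np.nonzero(np.sign(vals[:-1]) != np.sign(vals[1:]))[0][:count]
    zeros = []
    for i in idx:
        lo, hi, f_lo = grid[i], grid[i + 1], vals[i]
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            f_mid = bessel_j(0, mid)[0]
            if np.sign(f_mid) == np.sign(f_lo):
                lo, f_lo = mid, f_mid
            else:
                hi = mid
        zeros.append(0.5 * (lo + hi))
    return np.array(zeros)

rho = j0_zeros(4)
r = np.linspace(0.0, 1.0, 4001)
wts = np.full(r.size, r[1] - r[0])
wts[[0, -1]] *= 0.5                                # trapezoidal weights on [0, 1]
# phi_k(r) = sqrt(2) J_0(rho_k r) / |J_1(rho_k)|, one row per k.
phi = np.array([np.sqrt(2.0) * bessel_j(0, rk * r) / abs(bessel_j(1, rk)[0]) for rk in rho])
gram = (phi * (wts * r)) @ phi.T                   # <phi_j, phi_k>_c = ∫_0^1 r phi_j phi_k dr
print("zeros of J_0:", rho)
print("max |gram - I|:", np.max(np.abs(gram - np.eye(4))))
```

The weight $r$ in the quadrature is essential: without it the functions $\phi_k$ are not orthogonal, which is why the scalar product $\langle \cdot, \cdot \rangle_c$ is used in Example 37.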
Thus, the eigenfunctions and eigenvalues of operator (6.55) are the following: $\sqrt{2/\pi}\, \phi_{k'}(r) \cos(k'' z)$ (with $\sqrt{2/\pi}$ replaced by $1/\sqrt{\pi}$ for $k'' = 0$) and $a_1 \varrho_{k'}^2 + a_2 (k'')^2$, respectively, $k' = 1, 2, \dots$, $k'' = 0, 1, \dots$. In Fig. 6.2, two selected eigenfunctions are plotted.
The Sturm–Liouville theory generalizes to fourth-order operators (see [7], Chapter 6).
Example 38 (Fourth-order 1D operator). The operator
$$A_x(\bar a)\, q(x) = a_1 \frac{\partial^4 q(x)}{\partial x^4} + a_2 \frac{\partial^2 q(x)}{\partial x^2} - a_3\, q(x), \quad x \in (0, \pi), \tag{6.57}$$
with boundary conditions
$$q(0) = q(\pi) = 0, \quad \frac{\partial^2 q(x)}{\partial x^2}\Big|_{x = 0} = \frac{\partial^2 q(x)}{\partial x^2}\Big|_{x = \pi} = 0, \tag{6.58}$$
describes flexural deflections of a simply supported (hinged) beam. It is easy to verify that $\sqrt{2/\pi}\, \sin(k x)$ and $-a_1 k^4 + a_2 k^2 + a_3$, $k = 1, 2, \dots$, are its eigenfunctions and eigenvalues, respectively.
Example 39 (Fourth-order 2D operator). We consider a more complicated operator in $\Omega = (0, \pi)^2$, namely,
$$A_x(\bar a)\, q(x) = a_1 \frac{\partial^4 q(x)}{\partial x_1^4} + a_2 \frac{\partial^2 q(x)}{\partial x_1^2} + a_3 \frac{\partial^4 q(x)}{\partial x_2^4} + a_4 \frac{\partial^2 q(x)}{\partial x_2^2} - a_5\, q(x), \tag{6.59}$$
with boundary conditions
$$q(0, x_2) = q(\pi, x_2) = 0, \quad \frac{\partial^2 q(x)}{\partial x_1^2}\Big|_{x_1 = 0} = \frac{\partial^2 q(x)}{\partial x_1^2}\Big|_{x_1 = \pi} = 0, \tag{6.60}$$
$$q(x_1, 0) = q(x_1, \pi) = 0, \quad \frac{\partial^2 q(x)}{\partial x_2^2}\Big|_{x_2 = 0} = \frac{\partial^2 q(x)}{\partial x_2^2}\Big|_{x_2 = \pi} = 0. \tag{6.61}$$
Having Lemmas 4–6 and the results of Example 38 at our disposal, we can immediately derive the eigenfunctions and eigenvalues of operator (6.59) with (6.60) and (6.61) as its boundary conditions, namely, $(2/\pi) \sin(k' x_1) \sin(k'' x_2)$ and $-a_1 (k')^4 + a_2 (k')^2 - a_3 (k'')^4 + a_4 (k'')^2 + a_5$, $k', k'' = 1, 2, \dots$.
6.9 Problem statement and D-optimality conditions for systems described by elliptic PDEs
Consider a system described by Green's function $G(x, \kappa, \bar a)$ that corresponds to a certain PDE of the elliptic type. Thus, the system state $q(x, \bar a)$ depends on its input $u(x)$ (spatial excitation) as follows:
$$q(x, \bar a) = \int_\Omega G(x, \kappa, \bar a)\, u(\kappa)\, d\kappa. \tag{6.62}$$
We assume that the observations $Y(x)$ of $q(x, \bar a)$ have the form
$$Y(x) = q(x, \bar a_0) + \epsilon(x), \quad x \in \Omega, \tag{6.63}$$
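where $\epsilon(x)$ represents the measurement error at point $x$.
Assumption 1 (Random field of errors). Assume that $\epsilon(x)$ is a Gaussian random field with zero mean, $E[\epsilon(x)] = 0$, and $E[\epsilon(x)\, \epsilon(\kappa)] = \sigma^2 \delta(x - \kappa)$, $x, \kappa \in \Omega$, where $\delta(\cdot)$ is the Dirac delta. Later, we set $\sigma^2 = 1$ for the sake of simplicity.
Under these assumptions, the Fisher information matrix (FIM) for estimating $\bar a$, denoted further as $M_\Omega(u)$, has the following form:
$$M_\Omega(u) = \int_\Omega \nabla_a q(x, \bar a)\, \nabla_a^{\mathrm{tr}} q(x, \bar a)\, dx \,\Big|_{\bar a = \bar a_0}. \tag{6.64}$$
From (6.62) it follows that
$$\nabla_a q(x, \bar a) = \int_\Omega \bar K(x, \kappa, \bar a)\, u(\kappa)\, d\kappa, \tag{6.65}$$
where $\bar K(x, \kappa, \bar a) \stackrel{\text{def}}{=} \nabla_a G(x, \kappa, \bar a)$. Hence,
$$M_\Omega(u) = \int_\Omega \int_\Omega \int_\Omega \bar K(x, \kappa, \bar a)\, \bar K^{\mathrm{tr}}(x, \kappa', \bar a)\, u(\kappa)\, u(\kappa')\, dx\, d\kappa\, d\kappa' \,\Big|_{\bar a = \bar a_0}. \tag{6.66}$$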
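Formula (6.66) can be evaluated directly by quadrature when Green's function is known in closed form. As a sketch, we take operator (6.37) with hypothetical nominal values $\bar a_0 = (1, 1)$, for which $G(x, \kappa, \bar a) = \sinh(\omega x_<)\sinh(\omega(1 - x_>))/(a_1 \omega \sinh\omega)$ with $\omega = \sqrt{a_2/a_1}$, $x_< = \min(x, \kappa)$, $x_> = \max(x, \kappa)$ (this closed form is our addition; it solves $A_x G = -\delta$ with $G(0, \kappa) = G(1, \kappa) = 0$). The sensitivity kernel $\bar K = \nabla_a G$ is approximated by central differences, the input is $u = (v_1 + v_2)/\sqrt{2}$ with $v_k = \sqrt{2}\sin(k\pi x)$, and the result is compared with the eigen-expansion form (6.77) of the FIM derived in Section 6.10.

```python
import numpy as np

def green_637(x, kappa, a1, a2):
    """Closed-form Green's function of a1 q'' - a2 q = -u on (0, 1), q(0) = q(1) = 0."""
    om = np.sqrt(a2 / a1)
    lo, hi = np.minimum(x, kappa), np.maximum(x, kappa)
    return np.sinh(om * lo) * np.sinh(om * (1.0 - hi)) / (a1 * om * np.sinh(om))

m = 801
s = np.linspace(0.0, 1.0, m)                    # common grid for x and kappa
w = np.full(m, s[1] - s[0])
w[[0, -1]] *= 0.5                               # trapezoidal weights
X, KAP = np.meshgrid(s, s, indexing="ij")       # X varies along rows, KAP along columns

a0 = np.array([1.0, 1.0])                       # hypothetical nominal parameters
u = np.sin(np.pi * s) + np.sin(2.0 * np.pi * s) # (v_1 + v_2)/sqrt(2), unit L2 norm

# K(x, kappa, a) = grad_a G by central differences; then (6.65): grad_a q(x) = ∫ K u dkappa.
step = 1e-6
grad_q = np.empty((2, m))
for i in range(2):
    e = np.zeros(2)
    e[i] = step
    dG = (green_637(X, KAP, *(a0 + e)) - green_637(X, KAP, *(a0 - e))) / (2.0 * step)
    grad_q[i] = dG @ (w * u)

M_quad = (grad_q * w) @ grad_q.T                # (6.66): integrate over x

# The same matrix from the eigen-expansion: h_k = (pi^2 k^2, 1), lambda_k = a1 pi^2 k^2 + a2,
# and u_k = <u, v_k> = 1/sqrt(2) for k = 1, 2 (zero otherwise).
M_series = np.zeros((2, 2))
for k in (1, 2):
    h_k = np.array([np.pi**2 * k**2, 1.0])
    lam_k = a0[0] * np.pi**2 * k**2 + a0[1]
    M_series += np.outer(h_k, h_k) * lam_k**-4 * 0.5

print(M_quad)
print(M_series)
```

The triple integral in (6.66) is organized as one inner integral in $\kappa$ per parameter followed by an outer integral in $x$, which avoids forming the kernel $\bar K \bar K^{\mathrm{tr}}$ on a three-dimensional grid.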
Now, the problem is to find $u^*$ for which
$$\max_{u \in \mathcal{U}_\Omega} \det[M_\Omega(u)] \tag{6.67}$$
is attained, where $\mathcal{U}_\Omega$ is the set of functions such that $\int_\Omega u^2(x)\, dx \le 1$. Necessary optimality conditions can be obtained by using variational calculus and some properties of the FIM. They are derived in almost the same way as in Chapter 5. To formulate them, let us define the following kernel:
$$\mathrm{ker}_\Omega(\kappa, \kappa', u) = \int_\Omega \bar K^{\mathrm{tr}}(x, \kappa', \bar a)\, M_\Omega^{-1}(u)\, \bar K(x, \kappa, \bar a)\, dx, \tag{6.68}$$
assuming that $M_\Omega^{-1}(u)$ exists. This kernel has the same properties as those listed in Corollaries 4 and 5. The necessary optimality condition is also parallel to Theorem 8. It is formulated as follows.
Theorem 11. If $u^* \in \mathcal{U}_\Omega$, providing an invertible $M_\Omega(u^*)$, is the optimal solution of problem (6.67), then $\int_\Omega (u^*(x))^2\, dx = 1$ and the following conditions hold:
(a) $u^*$ is an eigenfunction of
$$\int_\Omega \mathrm{ker}_\Omega(\kappa, \kappa', u)\, u(\kappa')\, d\kappa' = \gamma\, u(\kappa). \tag{6.69}$$
(b) The auxiliary eigenvalue problem
$$\int_\Omega \mathrm{ker}_\Omega(\kappa, \kappa', u^*)\, \phi_k(\kappa', u^*)\, d\kappa' = \mu_k(u^*)\, \phi_k(\kappa, u^*) \tag{6.70}$$
has at most a countable set of real and nonnegative eigenvalues $\mu_k(u^*)$; $u^*$ is the eigenfunction of (6.70) that corresponds to the largest eigenvalue $\mu_{\max}(u^*)$, and
$$\inf_u \max_{k = 1, 2, \dots} \mu_k(u) = r, \tag{6.71}$$
where the infimum is taken over all $u$ such that $\int_\Omega u^2(x)\, dx = 1$. Additionally,
$$\max_{k = 1, 2, \dots} \mu_k(u^*) = r. \tag{6.72}$$
Using this result, one can derive a numerical procedure for computing $u^*$, parallel to that described by (5.58).
6.10 A more exact characterization of the solution
Let us assume the following.
Assumption 2 (About the eigenfunctions). Green's function $G$ of a system with unknown parameters $\bar a$ corresponds to a linear operator $A_x(\bar a)$ which is self-adjoint and has eigenfunctions $v_k$, $k = 1, 2, \dots$, independent of $\bar a$. Furthermore, the eigenfunctions are orthonormal and form a complete set in $L_2(\Omega)$. It is also further assumed that the eigenvalues $\lambda_k(\bar a)$ of $A_x(\bar a)$ are differentiable with respect to $\bar a$, and we define
$$\bar h_k = \nabla_{\bar a} \lambda_k(\bar a), \quad k = 1, 2, \dots. \tag{6.73}$$
In general, the $\bar h_k$'s may depend on $\bar a$, but it is useful to point out the following cases.
Assumption 3 (About eigenvalues – optional). The eigenvalues $\lambda_k(\bar a)$ of $A_x(\bar a)$ are linear or affine functions of $\bar a$, i. e., $\lambda_k(\bar a) = \bar h_k^{\mathrm{tr}} \bar a + \varpi_k$ for a certain sequence of vectors $\bar h_k \in \mathbb{R}^r$, $k = 1, 2, \dots$, and constants $\varpi_k$ that do not depend on $\bar a$. See also Remark 23.
Assumption 2 is imposed later in the book. Assumption 3 is tagged as optional in the following sense. If it does not hold, all the results presented in this chapter remain true, but the resulting optimal input signal, in general, depends on the unknown parameters and its application requires the use of nominal parameter values. If, additionally, Assumption 3 holds, then in many cases, displayed later in this chapter, the optimal input signal can be designed without any a priori knowledge about the unknown parameters. This is why in Section 6.7 the emphasis was put on sufficient conditions for the affine dependence of $\lambda_k(\bar a)$ on $\bar a$.
Under Assumption 2, Green's function can be expressed as
$$G(x, \kappa, \bar a) = \sum_{k=1}^{\infty} \lambda_k^{-1}(\bar a)\, v_k(x)\, v_k(\kappa). \tag{6.74}$$
Note, however, that for second-order elliptic operators the eigenvalues are of the order $k^2$. For fourth-order operators the eigenvalues grow even faster. Thus, $\lambda_k^{-1}(\bar a)$ decays as quickly as $k^{-2}$ and we can safely truncate the series at the $k_{\max}$-th term, $k_{\max} > r$. It will become clear later that it is reasonable to keep $k_{\max} \ge r(r+1)/2 + 1$ eigenfunctions. When selecting $k_{\max}$ one should also take into account that the $v_k$'s are usually more oscillatory for larger k, which may cause difficulties with their practical realization. Further, we take
$$G(x, \kappa, \bar a) = \sum_{k=1}^{k_{\max}} \lambda_k^{-1}(\bar a)\, v_k(x)\, v_k(\kappa) \tag{6.75}$$
as a sufficiently accurate approximation of the system Green's function. For the vector of sensitivities we have
$$\bar K(x, \kappa, \bar a) = \nabla_{\bar a} G(x, \kappa, \bar a) = -\sum_{k=1}^{k_{\max}} \bar h_k\, \lambda_k^{-2}(\bar a)\, v_k(x)\, v_k(\kappa). \tag{6.76}$$
The orthonormality of the $v_k$'s leads to the following expression for the FIM:
$$M_\Omega(u) = \sum_{k=1}^{k_{\max}} \bar h_k\, \bar h_k^{\mathrm{tr}}\, \lambda_k^{-4}(\bar a)\, u_k^2, \tag{6.77}$$
where $u_k \stackrel{\mathrm{def}}{=} \int_\Omega u(x)\, v_k(x)\, dx$, $k = 1, 2, \dots, k_{\max}$. Note also that by the Parseval equality the constraint $\int_\Omega u^2(x)\, dx \le 1$ can be expressed as follows:
$$\sum_{k=1}^{k_{\max}} u_k^2 \le 1. \tag{6.78}$$
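Formula (6.77) can be checked numerically: the FIM computed from the modal coefficients $u_k$ should coincide with the FIM obtained by integrating the squared sensitivity of the response over Ω. The sketch below is a minimal illustration under stated assumptions: it uses NumPy, the operator of Example 43 ($a_1 q'' - a_2 q$ on (0, 1) with Dirichlet conditions), and hypothetical helper names (`modal_data`, `fim_modal`, `fim_quadrature`) that do not come from the book.

```python
import numpy as np

def modal_data(a1, a2, kmax):
    """Eigen-data of the (assumed) Example 43 operator a1 q'' - a2 q on (0, 1), Dirichlet conditions."""
    k = np.arange(1, kmax + 1)
    lam = a1 * np.pi**2 * k**2 + a2                          # lambda_k(a)
    h = np.stack([np.pi**2 * k**2, np.ones(kmax)], axis=1)   # row k is h_k = grad_a lambda_k
    return lam, h

def fim_modal(u, lam, h):
    """FIM (6.77): sum_k h_k h_k^T lambda_k^{-4} u_k^2."""
    w = u**2 / lam**4
    return h.T @ (w[:, None] * h)

def fim_quadrature(u, lam, h, n=4001):
    """Same FIM from the sensitivity kernel (6.76) applied to u, integrated over x (trapezoid rule)."""
    x = np.linspace(0.0, 1.0, n)
    k = np.arange(1, len(lam) + 1)
    v = np.sqrt(2.0) * np.sin(np.pi * np.outer(k, x))        # v_k(x), shape (kmax, n)
    # grad_a q(x) = int K(x, kappa) u(kappa) dkappa = -sum_k h_k lambda_k^{-2} u_k v_k(x)
    s = -(h * (u / lam**2)[:, None]).T @ v                    # shape (r, n)
    wq = np.full(n, x[1] - x[0])
    wq[[0, -1]] /= 2                                          # trapezoid weights
    return (s * wq) @ s.T
```

For example, with `lam, h = modal_data(1.0, 0.5, 3)` and modal coefficients `u = np.array([0.6, 0.8, 0.0])` (which satisfy (6.78) with equality), `fim_modal` and `fim_quadrature` agree to rounding error, which is exactly the orthonormality argument used to derive (6.77).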
Note that under Assumption 2 the variational problem (6.67) has been reduced to the maximization of the determinant of (6.77) under constraint (6.78). Note also that the latter maximization problem has the same structure as the problem of selecting D-optimal weights in D-optimal experiment design discussed in Chapter 2.
Theorem 12 (The equivalence theorem). Consider the problem of D-optimal design for system (6.62) under Assumption 2. Then, the input signal
$$u^*(x) = \sum_{k=1}^{k_{\max}} u_k^*\, v_k(x) \tag{6.79}$$
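is D-optimal if and only if the $u_k^*$'s have the form $u_k^* = \sqrt{p_k^*}$, $k = 1, 2, \dots, k_{\max}$, where the $p_k^*$'s are the solution of the following problem:
$$\max_{p_k \ge 0,\ \sum_{k=1}^{k_{\max}} p_k = 1}\ \det\Big[\sum_{k=1}^{k_{\max}} \bar h_k\, \bar h_k^{\mathrm{tr}}\, \lambda_k^{-4}(\bar a)\, p_k\Big], \tag{6.80}$$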
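which in turn is equivalent to the problem of finding the $p_k^*$'s for which the following condition holds:
$$\min_{p_k \ge 0}\ \max_{k=1,2,\dots,k_{\max}}\ \bar h_k^{\mathrm{tr}} \Big[\sum_{j=1}^{k_{\max}} \bar h_j\, \bar h_j^{\mathrm{tr}}\, \lambda_j^{-4}(\bar a)\, p_j\Big]^{-1} \bar h_k\, \lambda_k^{-4}(\bar a) = r, \tag{6.81}$$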
where the minimization is carried out over all nonnegative $p_k$'s summing up to 1. Furthermore, (6.81) is equivalent to the following condition:
$$\max_{k=1,2,\dots,k_{\max}}\ \bar h_k^{\mathrm{tr}} \Big[\sum_{j=1}^{k_{\max}} \bar h_j\, \bar h_j^{\mathrm{tr}}\, \lambda_j^{-4}(\bar a)\, p_j^*\Big]^{-1} \bar h_k\, \lambda_k^{-4}(\bar a) = r. \tag{6.82}$$
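Condition (6.82) gives a direct numerical test of D-optimality for a candidate set of weights $p_k$. The sketch below is a minimal illustration, assuming NumPy; the function names are hypothetical. It evaluates the left-hand side of (6.82) for every mode and compares its maximum with r.

```python
import numpy as np

def fim_weights(p, lam, h):
    """M(p) = sum_k h_k h_k^T lambda_k^{-4} p_k, the matrix inside the determinant in (6.80)."""
    w = p / lam**4
    return h.T @ (w[:, None] * h)

def equivalence_function(p, lam, h):
    """mu_k = h_k^T M(p)^{-1} h_k lambda_k^{-4}, the left-hand side of (6.81)-(6.82) for each k."""
    Minv = np.linalg.inv(fim_weights(p, lam, h))
    return np.einsum("ki,ij,kj->k", h, Minv, h) / lam**4

def is_d_optimal(p, lam, h, tol=1e-9):
    """Check (6.82): max_k mu_k must not exceed r = number of parameters."""
    r = h.shape[1]
    return equivalence_function(p, lam, h).max() <= r + tol
```

Since $\sum_k p_k \mu_k = \mathrm{trace}(M^{-1}M) = r$ for any nonsingular design, the maximum of the $\mu_k$'s can never be below r, which is why (6.82) is stated as an equality.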
Note that the coefficients $u_k^*$ depend on $\bar a$. Formally, one should set $\bar a = \bar a_0$, but $\bar a_0$ is unknown and in practice we replace it by its nominal value (see the discussion in Chapter 2).
Proof. Theorem 12 is a direct consequence of the Kiefer–Wolfowitz theorem (see Chapter 2), and its proof could be omitted. However, it is instructive to prove the necessity of condition (6.82) using Theorem 11.
To this end, let us note that under Assumption 2, due to (6.76), kernel (6.68) has the following form:
$$\mathrm{ker}_\Omega(\kappa, \kappa', u) = \sum_{k=1}^{k_{\max}} \lambda_k^{-4}(\bar a)\, \bar h_k^{\mathrm{tr}} M_\Omega^{-1}(u)\, \bar h_k\, v_k(\kappa)\, v_k(\kappa'). \tag{6.83}$$
Multiplying both sides of (6.83) by $v_j(\kappa')$, integrating over Ω, and using the orthonormality of the $v_k$'s, it is easy to verify that the $v_k$'s are also eigenfunctions of the integral equation with kernel $\mathrm{ker}_\Omega$. The corresponding eigenvalues have the following form:
$$\mu_k(u) = \lambda_k^{-4}(\bar a)\, \bar h_k^{\mathrm{tr}} M_\Omega^{-1}(u)\, \bar h_k, \quad k = 1, 2, \dots, k_{\max}. \tag{6.84}$$
Thus, if $u^*$ is D-optimal, then (6.72) holds, which, by (6.84), implies (6.82).
It is worth mentioning that the above theorem provides the exact solution of the DOE problem, since here we do not have the additional requirement that the numbers of replications be integers. One nice feature of this result is that we can directly apply the well-known and fast algorithms for optimizing the weights of a D-optimal design when its support is given (see Chapter 3). By the Carathéodory theorem we know that it is possible to select at most $r(r+1)/2$ nonzero $u_k^*$'s, which means that at most $r(r+1)/2$ eigenmodes of the system have to be actuated.
Corollary 8 (One estimated parameter). Under the assumptions of Theorem 12, when only one parameter a is unknown (r = 1), $u^*(x)$ has the following properties:
(a) $u^*(x) = v_{k^*}(x)$, $x \in \Omega$, where
$$k^* = \arg\max_{1 \le k \le k_{\max}} \big[h_k^2\, \lambda_k^{-4}(a)\big].$$
In other words, $u_{k^*} = 1$ and $u_k^* = 0$ for $k \ne k^*$, $k = 1, 2, \dots, k_{\max}$.
(b) If the sequence
$$h_k^2\, \lambda_k^{-4}(a), \quad k = 1, 2, \dots, k_{\max}, \tag{6.85}$$
is decreasing, then $u^*(x) = v_1(x)$.
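For one parameter, Corollary 8 reduces to an arg max over the scores $h_k^2\lambda_k^{-4}(a)$. The following sketch, assuming NumPy and using hypothetical helper names, computes $k^*$ and also builds the eigen-data of the two-dimensional Laplace operator (as in Example 42 below), ordered by $i^2 + j^2$, to show the tie between the pairs (1, 2) and (2, 1).

```python
import numpy as np

def corollary8_mode(h, lam):
    """Return the 1-based index k* maximizing h_k^2 lambda_k^{-4} (r = 1), and the scores."""
    score = np.asarray(h, float)**2 / np.asarray(lam, float)**4
    return int(np.argmax(score)) + 1, score

def laplace_2d_modes(a, n):
    """Eigen-data of a * Laplacian on (0, 1)^2 with Dirichlet conditions, sorted by i^2 + j^2."""
    i, j = np.meshgrid(np.arange(1, n + 1), np.arange(1, n + 1), indexing="ij")
    s = (i**2 + j**2).ravel()
    order = np.argsort(s, kind="stable")
    pairs = np.stack([i.ravel(), j.ravel()], axis=1)[order]
    return pairs, a * np.pi**2 * s[order], np.pi**2 * s[order]   # pairs, lambda_k, h_k
```

With `lam = a * np.pi**2 * k**2` and `h = np.pi**2 * k**2` the scores are proportional to $k^{-4}$, so `corollary8_mode` returns 1; for the two-dimensional case the maximum sits at the pair (1, 1), while the next two scores coincide.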
Example 40. Consider the operator $A_x(a)\, q(x) = a\, \frac{d^2 q(x)}{dx^2}$, $x \in (0, 1)$, $q(0) = q(1) = 0$, defined in $C^2(0, 1)$, where $a > 0$ is an unknown parameter. Then, $v_k(x) = \sqrt{2}\sin(k\pi x)$, $\lambda_k(a) = a\pi^2 k^2$, $h_k = \pi^2 k^2$, and $h_k^2 \lambda_k^{-4}(a) = (a^4\pi^4 k^4)^{-1}$, $k = 1, 2, \dots, k_{\max}$, is strictly decreasing. Thus, according to case (b) of Corollary 8, $u^*(x) = \sqrt{2}\sin(\pi x)$ for estimating a in the system $a\, q''(x) = -u(x)$ with the Dirichlet boundary conditions.
Example 41. Consider a similar operator $A_x(a)\, q(x) = \frac{d^2 q(x)}{dx^2} - a\, q(x)$, $x \in (0, 1)$, $q(0) = q(1) = 0$, defined in $C^2(0, 1)$, where $a > 0$ is an unknown parameter. Then, again, $v_k(x) = \sqrt{2}\sin(k\pi x)$ and $\lambda_k(a) = \pi^2 k^2 + a$. According to (6.73), $h_k = 1$, which is constant for all k. Thus,
$$h_k^2\, \lambda_k^{-4}(a) = (\pi^2 k^2 + a)^{-4}, \quad k = 1, 2, \dots, k_{\max}, \tag{6.86}$$
is decreasing and, according to Corollary 8, we obtain $u^*(x) = \sqrt{2}\sin(\pi x)$.
Example 42. Consider the Laplace operator in two dimensions with one unknown parameter,
$$A_x(\bar a)\, q(x) = a\Big[\frac{\partial^2 q(x)}{\partial x_1^2} + \frac{\partial^2 q(x)}{\partial x_2^2}\Big], \quad x \in (0, 1)^2, \tag{6.87}$$
and with homogeneous boundary conditions of the Dirichlet type,
$$q(x_1, 0) = 0, \quad q(x_1, 1) = 0, \quad q(0, x_2) = 0, \quad q(1, x_2) = 0. \tag{6.88}$$
Its eigenvalues and eigenfunctions are
$$a\pi^2(i^2 + j^2), \qquad 2\sin(i\pi x_1)\sin(j\pi x_2),$$
respectively, $i, j = 1, 2, \dots$. They can be lexicographically ordered into sequences $\lambda_k(a)$ and $v_k(x)$, respectively, $k = 1, 2, \dots$. In the same way we can order $h_k = d\lambda_k(a)/da$, each of which equals one of the numbers $\pi^2(i^2 + j^2)$. Note that the sequence (6.85) in Corollary 8 is not strictly decreasing (e.g., for i = 1, j = 2 and for i = 2, j = 1 we have $i^2 + j^2 = 5$, but these pairs must have different k's after ordering). However, it is easy to verify that also in this case the maximum in Corollary 8 is attained for the k that corresponds to the pair i = 1, j = 1. Indeed, this is the only pair that provides $i^2 + j^2 = 2$, while for the rest of them $i^2 + j^2 > 2$, and both the eigenvalues and the sensitivities grow linearly with $i^2 + j^2$, so that $h_k^2\lambda_k^{-4}(a) = [a^4\pi^4(i^2 + j^2)^2]^{-1}$ decreases. Thus, the D-optimal input for estimating a has the form $u^*(x) = 2\sin(\pi x_1)\sin(\pi x_2)$.
Example 43 (Two unknown parameters). Consider the operator
$$A_x(\bar a)\, q(x) = a_1\, \frac{d^2 q(x)}{dx^2} - a_2\, q(x), \quad x \in (0, 1), \quad q(0) = q(1) = 0,$$
defined in $C^2(0, 1)$, where $a_1 > 0$ and $a_2 > 0$ are unknown parameters. Then, again, $v_k(x) = \sqrt{2}\sin(k\pi x)$, but this time $\lambda_k(\bar a) = a_1\pi^2 k^2 + a_2$ and $\bar h_k = [\pi^2 k^2, 1]^{\mathrm{tr}}$. In this case we have to invoke Theorem 12. Note that the FIM is the sum of the matrices
$$\begin{bmatrix} \pi^4 k^4 & \pi^2 k^2 \\ \pi^2 k^2 & 1 \end{bmatrix}$$
with appropriate weights $p_k/(a_1\pi^2 k^2 + a_2)^4$, $k = 1, 2, \dots, k_{\max} = 3$. Selecting only the first two modes as candidates for spanning the optimal solution, we obtain the following
expression for the determinant of the FIM:
$$\frac{9\pi^4\, p_1 p_2}{(a_1\pi^2 + a_2)^4\, (4a_1\pi^2 + a_2)^4}, \tag{6.89}$$
where $p_1 + p_2 = 1$. Clearly, (6.89) is maximized by $p_1 = p_2 = 1/2$. By tedious calculations, using, e.g., Mathematica, one can verify that this solution is D-optimal for $a_1 \ge 1$ and $0 \le a_2 \le \pi^2 - \pi - 1/2$, since
$$\mu_k(u^*) = \frac{2}{9(\pi^2 a_1 k^2 + a_2)^4}\Big[\pi^8 a_1^4(257k^4 - 520k^2 + 272) + 4\pi^6 a_1^3 a_2(65k^4 - 136k^2 + 80) + 6\pi^4 a_1^2 a_2^2(17k^4 - 40k^2 + 32) + 4\pi^2 a_1 a_2^3(5k^4 - 16k^2 + 20) + a_2^4(2k^4 - 10k^2 + 17)\Big] \tag{6.90}$$
attains a maximum equal to 2 for k = 1 and k = 2, while for k = 3 we have $\mu_3(u^*) < 2$ in the specified range of $a_2$. Summarizing, the D-optimal input in this case has the form $u^*(x) = \sin(\pi x) + \sin(2\pi x)$ (see Fig. 6.3). Note that the weights are $1/\sqrt{2}$, while the normalizing constants of the eigenfunctions are $\sqrt{2}$.
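The closed form (6.90) can be cross-checked against the direct definition (6.84) of $\mu_k$ for the two-mode design $p_1 = p_2 = 1/2$. A minimal sketch, assuming NumPy; `mu_closed_form` and `mu_direct` are hypothetical names introduced only for this check.

```python
import numpy as np

def mu_closed_form(k, a1, a2):
    """Right-hand side of (6.90)."""
    pi = np.pi
    bracket = (pi**8 * a1**4 * (257 * k**4 - 520 * k**2 + 272)
               + 4 * pi**6 * a1**3 * a2 * (65 * k**4 - 136 * k**2 + 80)
               + 6 * pi**4 * a1**2 * a2**2 * (17 * k**4 - 40 * k**2 + 32)
               + 4 * pi**2 * a1 * a2**3 * (5 * k**4 - 16 * k**2 + 20)
               + a2**4 * (2 * k**4 - 10 * k**2 + 17))
    return 2 * bracket / (9 * (pi**2 * a1 * k**2 + a2)**4)

def mu_direct(k, a1, a2):
    """(6.84): lambda_k^{-4} h_k^T M^{-1} h_k with weights 1/2 on modes 1 and 2."""
    lam = lambda m: a1 * np.pi**2 * m**2 + a2
    h = lambda m: np.array([np.pi**2 * m**2, 1.0])
    M = sum(0.5 * np.outer(h(m), h(m)) / lam(m)**4 for m in (1, 2))
    return h(k) @ np.linalg.solve(M, h(k)) / lam(k)**4
```

For k = 1 and k = 2 the bracket in (6.90) collapses to $9(\pi^2 a_1 + a_2)^4$ and $9(4\pi^2 a_1 + a_2)^4$, respectively, so both functions return exactly 2 there, while for $a_1 = 1$ and $a_2$ in the stated range the value at k = 3 stays below 2.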
Figure 6.3: The D-optimal input signal derived in Example 43.
Observe that if the optimal weights have to be found by a numerical search, the product-type algorithms (sketched in Chapter 3) are the first choice, since the support is known a priori. This example and many other cases that are not displayed here suggest the following:
Open problem
Propose conditions on the eigenvalues and on the sensitivities $\bar h_k$ of $A_x(\bar a)$ that are sufficient for the D-optimal input $u^*(x)$ to be a linear combination of only the r first (or main) eigenfunctions of $A_x(\bar a)$.
Justification
This problem seems to be of importance for applications and for a deeper understanding of DOE for ill-posed inverse problems for operators of the elliptic type. From the theory of D-optimal designs (see, e.g., [8, 46]) it follows that when the number of estimated parameters r is equal to the number of functions spanning a system response, the D-optimal experiment design consists of r points with weights 1/r. Translated to our case, this leads to the following hypothesis: the D-optimal input is of the form
$$u^*(x) = \frac{1}{\sqrt{r}} \sum_{k=1}^{r} v_k(x). \tag{6.91}$$
By Corollary 8 this hypothesis is true for r = 1, under the assumption that $h_k^2\lambda_k^{-4}(a)$, $k = 1, 2, \dots, k_{\max}$, is strictly decreasing. Note that the FIM is the sum of the following rank one matrices:
$$\bar h_k\, \bar h_k^{\mathrm{tr}}\, \lambda_k^{-4}(\bar a). \tag{6.92}$$
For second-order elliptic operators the largest elements of these matrices are of the order $k^4$, while the $\lambda_k^{-4}(\bar a)$'s are of the order $k^{-8}$, which yields the order $k^{-4}$. Thus, one may expect that the modes higher than r will not be present in the optimal solution. In the case of fourth-order operators we even have a ratio of the order $k^{-8}$. Even if the conjecture sketched above is not true in general, in practice it is still useful to consider minimum support designs (consisting of only the r first eigenfunctions, in order), because they are easier to implement and still highly efficient from the point of view of estimation accuracy.
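When the support is fixed in advance, the weights $p_k$ in (6.80) can be found by a product-type (multiplicative) iteration, and the result can be compared with the minimum support design (6.91). The sketch below is a minimal illustration, assuming NumPy and one standard multiplicative update, $p_k \leftarrow p_k\,\mu_k(p)/r$; it is not necessarily the exact variant sketched in Chapter 3, and `multiplicative_weights` is a hypothetical name.

```python
import numpy as np

def multiplicative_weights(lam, h, iters=500):
    """Product-type update p_k <- p_k * mu_k(p) / r on a fixed candidate set of modes.

    f_k = lambda_k^{-2} h_k, so that f_k f_k^T = h_k h_k^T lambda_k^{-4} as in (6.80).
    Since sum_k p_k mu_k(p) = trace(M^{-1} M) = r, each update keeps sum(p) = 1.
    """
    f = h / lam[:, None]**2
    r = h.shape[1]
    p = np.full(len(lam), 1.0 / len(lam))      # start from uniform weights

    def mu_of(p):
        M = f.T @ (p[:, None] * f)
        return np.einsum("ki,ij,kj->k", f, np.linalg.inv(M), f)

    for _ in range(iters):
        p = p * mu_of(p) / r
    return p, mu_of(p)
```

For the operator of Example 43 with $a_1 = 1$, $a_2 = 0$ and $k_{\max} = 3$ candidate modes, the iteration drives the weight of the third mode to zero and returns approximately $p = (1/2, 1/2, 0)$, i.e., the minimum support design (6.91) with r = 2.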
7 Optimal input signals for DPS – time domain synthesis
7.1 Introduction and notational conventions
In this chapter we discuss the problem of designing D-optimal input signals for estimating parameters in systems with spatio-temporal dynamics, but this time we allow for a finite or even short observation horizon. As the main tools for obtaining D-optimality conditions we use variational calculus and eigenfunction expansions.
The chapter is organized as follows. We first outline the mathematical descriptions of the two main classes of systems with spatio-temporal dynamics, namely, systems of the parabolic and hyperbolic types, with emphasis on their representations in terms of eigenfunction expansions. Then, we state the problem of finding D-optimal input signals for such systems, and D-optimality conditions are derived in two cases:
– when constraints are imposed only on the energy of input signals, which leads to “aggressive” (rapidly growing) inputs,
– when additional constraints are imposed on the energy of the output signal, which leads to less “aggressive” input signals that are safer in practice.
Finally, we provide examples of D-optimal input signals.
Slightly abusing the notation, we shall write $q(x, t)$ for the system state at spatial point $x \in \mathbb{R}^s$ at time $t \ge 0$, or $q(x, t, \bar a)$ when we want to emphasize its dependence on the vector of unknown parameters $\bar a \in \mathbb{R}^r$. In the same vein, we shall denote by $G(x, \kappa, t)$ (or $G(x, \kappa, t, \bar a)$) the Green's function of the system under consideration.
We shall maintain all the conventions used in the previous chapter concerning spatial variables $x = [x_1, x_2, \dots, x_s]^{\mathrm{tr}}$ and derivatives with respect to them. In particular, when only one spatial variable is considered in the examples that follow, we shall use x instead of $x_1$ and $q'(x, t, \bar a)$ for $\partial q(x, t, \bar a)/\partial x$. Similarly, we shall write $\dot f(t, \dots)$ for $d f(t, \dots)/dt$ for a differentiable f. For continuously differentiable f we use the shorthand notation $\dot f(0, \dots)$ for $d f(t, \dots)/dt\,|_{t=0}$.
We shall denote by $U(x, t)$ an input signal at spatial point x and time t. We use the capital letter U to indicate that it is a function of two groups of variables, as opposed to $u(t)$ or $u(x)$, which were used previously and are still used in this chapter.
7.2 Assumptions on parabolic and hyperbolic PDEs
The following main classes of systems with spatio-temporal dynamics are considered.
https://doi.org/10.1515/9783110351040-007
Systems described by PDEs of the parabolic type
$$\frac{\partial q(x, t)}{\partial t} = A_x(\bar a)\, q(x, t) + U(x, t), \quad x \in \Omega,\ t \in (0, T), \tag{7.1}$$
with the initial condition $q(x, 0) = q_0(x)$ for $x \in \Omega$.
Systems described by PDEs of the hyperbolic type
$$\frac{\partial^2 q(x, t)}{\partial t^2} + \mu\, \frac{\partial q(x, t)}{\partial t} = A_x(\bar a)\, q(x, t) + U(x, t), \quad x \in \Omega,\ t \in (0, T), \tag{7.2}$$
with the initial conditions $q(x, 0) = q_0(x)$ and $\partial q(x, t)/\partial t\,|_{t=0} = q_1(x)$ for $x \in \Omega$.
As in the previous chapter, it is tacitly assumed that the boundary conditions are included in the definition of operator $A_x(\bar a)$. The following assumptions are made concerning equations (7.1) and (7.2).
(PH1) All the assumptions made in the previous chapter concerning operator $A_x(\bar a)$ hold, including the assumption on the independence of its eigenfunctions of $\bar a$ and (optionally) the assumption on the linear (or affine) dependence of the eigenvalues on $\bar a$.
(PH2) For simplicity of formulas, the initial conditions $q_0(x)$ (respectively, $q_0(x)$ and $q_1(x)$) are assumed to be zero¹ in Ω.
(PH3) For sufficiently smooth² $U(x, t)$ on $\Omega \times [0, T]$ the solution q of (7.1) (respectively, (7.2)) in the classical sense exists and is unique. This means that $q \in C^{1,2}([0, T], \Omega)$, i.e., q is continuously differentiable with respect to $t \in [0, T]$ and twice continuously differentiable with respect to $x \in \Omega$, and (7.1) is fulfilled by q at each point of $(0, T] \times \Omega$ together with the initial and boundary conditions. Analogously, for the solution q of (7.2) we have $q \in C^{2,2}([0, T], \Omega)$.
1 One can also consider the initial conditions as decision variables influencing the estimation accuracy. However, manipulating them is not always possible. On the other hand, when we cannot manipulate them and they are not zero, their influence on the estimation accuracy manifests itself only through the presence of an additional, constant matrix in the FIM. Its presence does not substantially influence the results that follow. Note that if the initial conditions are nonzero, then they have to agree with the boundary conditions imposed on $A_x$.
2 As in the case of systems described by elliptic equations, by assuming the smoothness of U we do not lose attainable estimation accuracy, since, as will be shown later in this chapter, the optimal solution is in the class of functions of the form $\sum_{k=1}^{k_{\max}} u_k(t)\, v_k(x)$. The smoothness of the $v_k$'s was already noted in the previous chapter, while the $u_k(t)$'s are eigenfunctions (or their combinations) of integral operators with smooth kernels.
(PH4) The solution of (7.1) (respectively, (7.2)) with zero initial conditions can be expressed with the aid of Green's function $G(x, \kappa, t, \bar a)$:
$$q(x, t, \bar a) = \int_0^T\!\!\int_\Omega G(x, \kappa, t - \tau, \bar a)\, U(\kappa, \tau)\, d\kappa\, d\tau \tag{7.3}$$
for $t \in (0, T)$, $x \in \Omega$, where G is the solution of (7.1) (respectively, (7.2)) when $U(x, t)$ is replaced by $\delta(t)\,\delta(x - \kappa)$, and $\delta(t)$ is the Dirac delta with respect to the time variable.
Remark 24. Note that $G(x, \kappa, t, \bar a) = 0$ for $t < 0$, which allows us to use integrals over $[0, T]$ instead of $[0, t]$ in (7.3) and later on in similar cases.
In order to reveal the structure of the solution of (7.1), we formally look for it in the following form:
$$q(x, t, \bar a) = \sum_{k=1}^{\infty} y_k(t, \bar a)\, v_k(x), \tag{7.4}$$
where the $v_k$'s are eigenfunctions of $A_x(\bar a)$, which are independent of $\bar a$ (see the previous chapter for conditions and examples), while $y_k(t, \bar a)$, $k = 1, 2, \dots$, are functions to be specified in such a way that (7.4) solves (7.1). Under (PH1)–(PH3), the $v_k$'s provide an orthonormal basis in $L_2(\Omega)$. Thus, we can express U as follows:
$$U(x, t) = \sum_{k=1}^{\infty} u_k(t)\, v_k(x), \tag{7.5}$$
where $u_k(t) \stackrel{\mathrm{def}}{=} \int_\Omega U(x, t)\, v_k(x)\, dx$, $k = 1, 2, \dots$. Now, we substitute (7.4) and (7.5) into (7.1), multiply both sides of (7.1) by $v_j(\cdot)$, and then integrate the result over Ω, which yields
$$\dot y_j(t, \bar a) = -\lambda_j(\bar a)\, y_j(t, \bar a) + u_j(t), \quad j = 1, 2, \dots, \tag{7.6}$$
with initial conditions $y_j(0, \bar a) = 0$, since zero initial conditions were assumed for q. In deriving (7.6) we have used the fact that $A_x(\bar a)\, v_j(\cdot) = -\lambda_j(\bar a)\, v_j(\cdot)$. Summarizing, the solution of (7.1) is represented by (7.4) with time-varying coefficients that are solutions of (7.6). Repeating the procedure for (7.2), we arrive at the conclusion that its solution is represented by (7.4) with time-varying coefficients that are solutions of the following ODEs:
$$\ddot y_j(t, \bar a) + \mu\, \dot y_j(t, \bar a) = -\lambda_j(\bar a)\, y_j(t, \bar a) + u_j(t), \quad j = 1, 2, \dots, \tag{7.7}$$
with initial conditions $y_j(0, \bar a) = 0$, $\dot y_j(0, \bar a) = 0$, $j = 1, 2, \dots$.
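The modal reduction turns each PDE into a family of decoupled scalar ODEs, (7.6) or (7.7), one per spatial mode. The sketch below, assuming NumPy, integrates a single mode with a hand-written classical Runge–Kutta scheme and compares it with the known step responses for $u_j(t) \equiv 1$; the value $\lambda_j = \pi^2$, the damping μ = 3, and the function names are illustrative assumptions, not data from the book.

```python
import numpy as np

def rk4(f, y0, t):
    """Classical fourth-order Runge-Kutta on the fixed grid t."""
    y = np.zeros((len(t), len(y0)))
    y[0] = y0
    for n in range(len(t) - 1):
        dt = t[n + 1] - t[n]
        k1 = f(t[n], y[n])
        k2 = f(t[n] + dt / 2, y[n] + dt / 2 * k1)
        k3 = f(t[n] + dt / 2, y[n] + dt / 2 * k2)
        k4 = f(t[n + 1], y[n] + dt * k3)
        y[n + 1] = y[n] + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

def modal_parabolic(lam_j, u_j, t):
    """Mode j of (7.6): dy/dt = -lambda_j y + u_j(t), y(0) = 0."""
    return rk4(lambda s, y: np.array([-lam_j * y[0] + u_j(s)]), [0.0], t)[:, 0]

def modal_hyperbolic(lam_j, mu, u_j, t):
    """Mode j of (7.7) written as a first-order system in (y, dy/dt), zero initial conditions."""
    f = lambda s, z: np.array([z[1], -mu * z[1] - lam_j * z[0] + u_j(s)])
    return rk4(f, [0.0, 0.0], t)[:, 0]
```

For a constant input the parabolic mode follows $y(t) = (1 - e^{-\lambda t})/\lambda$, and the underdamped hyperbolic mode follows $y(t) = [1 - e^{-\mu t/2}(\cos\omega t + \tfrac{\mu}{2\omega}\sin\omega t)]/\lambda$ with $\omega = \sqrt{\lambda - \mu^2/4}$; the state $q(x, t)$ is then assembled from (7.4).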
In the same vein, we look for Green's function in the form
$$G(x, \kappa, t, \bar a) = \sum_{k=1}^{\infty} I_k(t)\, v_k(x)\, v_k(\kappa). \tag{7.8}$$
Substituting (7.8) into (7.1) (respectively, (7.2)), inserting $\delta(x - \kappa)\,\delta(t)$ as $U(x, t)$, multiplying both sides by $v_j(\cdot)$, and integrating over Ω, we obtain the following equations for the $I_k(t)$'s:
in the parabolic case,
$$\dot I_k(t; \bar a) = -\lambda_k(\bar a)\, I_k(t; \bar a) + \delta(t), \quad I_k(0) = 0, \tag{7.9}$$
in the hyperbolic case,
$$\ddot I_k(t; \bar a) + \mu\, \dot I_k(t; \bar a) = -\lambda_k(\bar a)\, I_k(t; \bar a) + \delta(t), \quad I_k(0) = 0,\ \dot I_k(0) = 0. \tag{7.10}$$
As one can observe, the $I_k(t)$'s can be interpreted as impulse responses of the spatial modes. We also need the sensitivities of G, and hence also the gradients of the $I_k(t)$'s, with respect to $\bar a$. To this end, it is expedient to formulate the following simple lemmas.
Lemma 7. Let the following differential equation hold for $I(t, \bar a)$:
$$\dot I(t, \bar a) = -\lambda(\bar a)\, I(t, \bar a) + \delta(t), \quad t \in (0, T), \quad I(0) = 0, \tag{7.11}$$
where $\lambda(\bar a) = \bar h^{\mathrm{tr}}\bar a$ for a certain $\bar h \in \mathbb{R}^r$ that is independent of $\bar a$. Then, $I(t, \bar a)$ is differentiable with respect to the elements of $\bar a$ and its gradient, denoted further by $\bar s(t) = \nabla_{\bar a} I(t, \bar a)$, is the solution of the following set of equations:
$$\dot{\bar s}(t) = -\lambda(\bar a)\, \bar s(t) - \bar h\, I(t), \quad \bar s(0) = 0. \tag{7.12}$$
The solution of (7.12) has a specific structure, namely, $\bar s(t) = \bar h\, \rho(t)$, where $\rho(t)$ is a real-valued function for which the following equation holds:
$$\dot\rho(t) = -\lambda(\bar a)\, \rho(t) - I(t), \quad \rho(0) = 0, \quad t \in (0, T]. \tag{7.13}$$
Proof. Equation (7.12) follows by differentiating (7.11) with respect to the elements of $\bar a$. The structure $\bar s(t) = \bar h\, \rho(t)$ results from the linearity of (7.12) with respect to $\bar h$ and from the fact that (7.12) is a set of independent ODEs that differ only by the amplification factor multiplying $I(t)$.
Lemma 8. Let the following differential equation hold for $I(t, \bar a)$:
$$\ddot I(t, \bar a) + \mu\, \dot I(t, \bar a) = -\lambda(\bar a)\, I(t, \bar a) + \delta(t), \quad t \in (0, T), \quad I(0) = 0,\ \dot I(0) = 0, \tag{7.14}$$
where $\lambda(\bar a) = \bar h^{\mathrm{tr}}\bar a$ for a certain $\bar h \in \mathbb{R}^r$ that is independent of $\bar a$. Then, $I(t, \bar a)$ is differentiable with respect to the elements of $\bar a$ and its gradient, denoted further by $\bar s(t) = \nabla_{\bar a} I(t, \bar a)$, is the solution of the following set of equations:
$$\ddot{\bar s}(t) + \mu\, \dot{\bar s}(t) = -\lambda(\bar a)\, \bar s(t) - \bar h\, I(t), \quad \bar s(0) = 0,\ \dot{\bar s}(0) = 0. \tag{7.15}$$
The solution of (7.15) has a specific structure, namely, $\bar s(t) = \bar h\, \rho(t)$, where $\rho(t)$ is a real-valued function for which the following equation holds:
$$\ddot\rho(t) + \mu\, \dot\rho(t) = -\lambda(\bar a)\, \rho(t) - I(t), \quad \rho(0) = 0,\ \dot\rho(0) = 0, \quad t \in (0, T]. \tag{7.16}$$
We need these two lemmas in the next section for calculating the FIMs.
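Lemmas 7 and 8 say, in effect, that $\rho(t) = \partial I(t)/\partial\lambda$ and $\nabla_{\bar a} I = \bar h\,\rho$. For the parabolic mode and for an underdamped hyperbolic mode ($\lambda > \mu^2/4$, an assumption of this sketch) both functions have closed forms, which the sketch below, assuming NumPy and hypothetical function names, checks against a finite difference in λ and against the ODE (7.13).

```python
import numpy as np

def parabolic_I_rho(t, lam):
    """Impulse response of (7.11) and rho of (7.13): I = exp(-lam t), rho = dI/dlam = -t exp(-lam t)."""
    I = np.exp(-lam * t)
    return I, -t * I

def hyperbolic_I_rho(t, lam, mu):
    """Underdamped solutions of (7.14) and (7.16) (assumes lam > mu^2 / 4), with rho = dI/dlam."""
    sig = mu / 2.0
    om = np.sqrt(lam - sig**2)
    env = np.exp(-sig * t)
    I = env * np.sin(om * t) / om
    rho = env * (om * t * np.cos(om * t) - np.sin(om * t)) / (2.0 * om**3)
    return I, rho
```

A central difference of I in λ reproduces ρ in both cases, and for the parabolic mode the residual $\dot\rho + \lambda\rho + I$ computed with `np.gradient(rho, t, edge_order=2)` is at the level of the discretization error, in agreement with (7.13).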
7.3 Assumptions on observations and the form of the Fisher information matrix
The observations (exactly or approximately “continuous” in Ω) of the system
$$q(x, t, \bar a) = \int_0^T\!\!\int_\Omega G(x, \kappa, t - \tau, \bar a)\, U(\kappa, \tau)\, d\kappa\, d\tau \tag{7.17}$$
for estimating $\bar a$ have the following form:
$$Y(x, t) = q(x, t; \bar a) + \varepsilon(x, t), \quad x \in \Omega, \tag{7.18}$$
where $\varepsilon(x, t)$, $t \in (0, T)$, is a zero-mean, covariance stationary Gaussian process, uncorrelated in space and time, with variance $\sigma^2 = 1$. We shall assume that the observation horizon T is finite. We also consider the case when it is relatively small.
The Fisher information matrix
Under these assumptions, the Fisher information matrix has the form
$$M_T(U) = \int_\Omega\int_0^T \nabla_a q(x, t; \bar a)\, \nabla_a^{\mathrm{tr}} q(x, t; \bar a)\, dt\, dx \tag{7.19}$$
and the problem of finding the D-optimal spatio-temporal input signal reads as follows. Find $U^*$ such that $\det[M_T^{-1}(U)]$ is minimized over all $U \in L_2(\Omega \times (0, T))$ with constrained energy, i.e.,
$$\int_\Omega\int_0^T U^2(x, t)\, dt\, dx \le 1. \tag{7.20}$$
The orthogonality of the eigenfunctions $v_k$ of $A_x$ makes this problem tractable. For $u_k(t) = \langle U(\cdot, t), v_k \rangle$, constraint (7.20) can be transformed as follows:
$$\sum_{k=1}^{\infty}\int_0^T u_k^2(t)\, dt \le 1. \tag{7.21}$$
Again, by the orthogonality of the $v_k$'s, we can express the FIM as follows:
$$M_T(U) = \sum_{k=1}^{\infty}\int_0^T \nabla_a y_k(t; \bar a)\, \nabla_a^{\mathrm{tr}} y_k(t; \bar a)\, dt. \tag{7.22}$$
Partial modal decomposition of the D-optimality problem
Denoting by $I_k(t; \bar a)$ the impulse response of the k-th mode of the system, we obtain
$$\nabla_a y_k(t; \bar a) = \int_0^T \nabla_a I_k(t - \tau; \bar a)\, u_k(\tau)\, d\tau \tag{7.23}$$
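(see Remark 24 for the integration interval). Thus, the FIM can be expressed as follows:
$$M_T(U) = \sum_{k=1}^{\infty}\int_0^T\!\!\int_0^T H_k(\tau, \nu; \bar a)\, u_k(\tau)\, u_k(\nu)\, d\tau\, d\nu,$$
where
$$H_k(\tau, \nu; \bar a) \stackrel{\mathrm{def}}{=} \int_0^T \nabla_a I_k(t - \tau; \bar a)\, \nabla_a^{\mathrm{tr}} I_k(t - \nu; \bar a)\, dt.$$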
Observe that $H_k^{\mathrm{tr}}(\tau, \nu; \bar a) = H_k(\nu, \tau; \bar a)$.
Remark 25. The lower integration limit in the above expression is $\max(\tau, \nu)$, but $I_k(t; \bar a) = 0$ for $t < 0$, so here and later on we can take 0 as the lower limit.
Selecting a D-optimal U is more difficult here than for ODEs, since we cannot select one mode and optimize its input signal. The reason is that $H_k(\tau, \nu; \bar a)$ is a rank one matrix, as explained below. In fact, we have to optimize the excitations of several modes simultaneously. Using Lemmas 7 and 8, we justify the following:
$$H_k(\tau, \nu; \bar a) = \bar h_k\, \bar h_k^{\mathrm{tr}}\, c_k(\tau, \nu; \bar a). \tag{7.24}$$
In the above formulas, the $c_k$'s are defined as follows:
$$c_k(\tau, \nu; \bar a) \stackrel{\mathrm{def}}{=} \int_0^T \rho_k(t - \tau; \bar a)\, \rho_k(t - \nu; \bar a)\, dt, \tag{7.25}$$
$$\nabla_a I_k(t; \bar a) = \bar h_k\, \rho_k(t; \bar a), \tag{7.26}$$
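where, in the parabolic case,
$$\dot\rho_k(t; \bar a) = -\lambda_k(\bar a)\, \rho_k(t; \bar a) - I_k(t), \qquad \dot I_k(t; \bar a) = -\lambda_k(\bar a)\, I_k(t; \bar a) + \delta(t), \tag{7.27}$$
and, in the hyperbolic case,
$$\ddot\rho_k(t; \bar a) + \mu\, \dot\rho_k(t; \bar a) = -\lambda_k(\bar a)\, \rho_k(t; \bar a) - I_k(t), \qquad \ddot I_k(t; \bar a) + \mu\, \dot I_k(t; \bar a) = -\lambda_k(\bar a)\, I_k(t; \bar a) + \delta(t). \tag{7.28}$$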
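The factorization (7.24) can be made concrete by discretizing the kernel $c_k$ of (7.25) and assembling the FIM mode by mode. The sketch below is a minimal illustration, assuming NumPy, a rectangle rule in t, parabolic modes with $\rho_k(s) = -s\,e^{-\lambda_k s}$, and the eigen-data of Example 45 below ($\lambda_k = a_1\pi^2 k^2 + a_2$, nominal $a_1 = a_2 = 0.1$); the function names are hypothetical.

```python
import numpy as np

def kernel_c(rho, t):
    """Discretized c_k(tau, nu) of (7.25) on the grid t (rectangle rule in t).

    rho: callable rho(s) for s >= 0; causal, so it is set to 0 for s < 0 (Remark 25).
    """
    dt = t[1] - t[0]
    S = t[:, None] - t[None, :]                      # S[m, i] = t_m - tau_i
    R = np.where(S >= 0, rho(np.maximum(S, 0.0)), 0.0)
    return dt * R.T @ R                              # C[i, j] ~ int rho(t - tau_i) rho(t - nu_j) dt

def fim_modal_inputs(us, hs, rhos, t):
    """M_T(U) = sum_k h_k h_k^T * int int c_k(tau, nu) u_k(tau) u_k(nu) dtau dnu, cf. (7.24)."""
    dt = t[1] - t[0]
    r = len(hs[0])
    M = np.zeros((r, r))
    for u, h, rho in zip(us, hs, rhos):
        C = kernel_c(rho, t)
        M += np.outer(h, h) * (dt * dt * u @ C @ u)  # each mode adds a rank one term
    return M
```

If only the first mode is excited, the resulting 2 × 2 FIM is rank one (numerically singular), whereas exciting the first two modes makes it nonsingular, which is the content of Corollary 9 for $\dim(\bar a) = 2$.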
Note that the $\rho_k(t; \bar a)$'s are causal and Remark 25 is relevant for the integral in (7.25). We call our approach only a partial modal decomposition, because we cannot select one mode and optimize its input signal. The fact that $H_k(\tau, \nu; \bar a) = \bar h_k\, \bar h_k^{\mathrm{tr}}\, c_k(\tau, \nu; \bar a)$ is a rank one matrix has the following implication.
Corollary 9. A necessary condition for $M_T(U)$ to be nonsingular is that at least $\dim(\bar a)$ of the functions $u_k(t) = \langle U(\cdot, t), v_k \rangle$, $k = 1, 2, \dots$, take nonzero values on subintervals of $(0, T)$ of nonzero length.
The analysis of the structure of $M_T(U)$ reveals that we can split the search for a D-optimal input signal into the following steps:
1. proper spreading of the energy of input signals between modes,
2. the selection of their excitations.
To this end, define $\tilde u_k(t) = u_k(t)/\sqrt{\alpha_k}$, where $\alpha_k \stackrel{\mathrm{def}}{=} \|u_k\|^2$.
Corollary 10. The problem of finding the D-optimal input signal is equivalent to the maximization of the determinant of the FIM
$$M_T(U) = \sum_{k=1}^{\infty}\int_0^T\!\!\int_0^T \alpha_k\, H_k(\tau, \nu, \bar a)\, \tilde u_k(\tau)\, \tilde u_k(\nu)\, d\tau\, d\nu \tag{7.29}$$
with respect to $\alpha_k \ge 0$ and $\tilde u_k(\cdot)$, $k = 1, 2, \dots$, under the conditions $\|\tilde u_k(\cdot)\|^2 = 1$ and $\sum_{k=1}^{\infty}\alpha_k \le 1$.
Denote by $\alpha_k^*$ and $u_k^*(\cdot)$, $k = 1, 2, \dots$, the solution of the above problem. Then, the solution of the initial problem can be expressed as follows:
$$U^*(x, t) = \sum_{k=1}^{\infty} \sqrt{\alpha_k^*}\, u_k^*(t)\, v_k(x). \tag{7.30}$$
As in the previous chapters, we know that for the optimal solution we have $\sum_{k=1}^{\infty}\alpha_k^* = 1$.
Toward D-optimality conditions
Now, our aim is to provide optimality conditions for $\alpha_k^*$ and $u_k^*$, $k = 1, 2, \dots$. To this end, assume that $U^*$ is the optimal solution and note that $\sqrt{\alpha_j^*}\, u_j^*(t) = \langle U^*(\cdot, t), v_j \rangle$. Define also
$$\mathrm{ker}_j(\tau, \nu, U^*) \stackrel{\mathrm{def}}{=} \mathrm{trace}\big[M_T^{-1}(U^*)\, H_j(\tau, \nu, \bar a)\big] = \zeta_j(U^*, T)\, c_j(\tau, \nu, \bar a), \tag{7.31}$$
where $\zeta_j(U^*, T) \stackrel{\mathrm{def}}{=} \bar h_j^{\mathrm{tr}} M_T^{-1}(U^*)\, \bar h_j$, while $c_j(\tau, \nu, \bar a)$ is defined as in (7.25). The kernels $\mathrm{ker}_j$ are symmetric and positive definite.
Corollary 11. If $U^*$ is optimal and $M_T(U^*)$ is nonsingular, then the excitations $u_j^*$ of modes $j = 1, 2, \dots$ are eigenfunctions of the following integral equations:
$$\zeta_j(U^*, T)\int_0^T c_j(\tau, \nu, \bar a)\, u_j^*(\tau)\, d\tau = \gamma\, u_j^*(\nu), \quad j = 1, 2, \dots. \tag{7.32}$$
Note that the eigenvalue γ is the same in all these equations.
Proof. Let us perturb each j-th mode of $U^*(x, t)$ separately, i.e., consider the family $U_{j,\epsilon}(x, t) = U^*(x, t) + \epsilon\, f_j(t)\, v_j(x)$ of perturbations, where $f_j \in C_0(0, T)$, $j = 1, 2, \dots$, are arbitrary. Define the Lagrange function
$$\mathcal{L}(U, \gamma) = \log\det(M_T(U)) - \gamma\Big(\int_0^T\!\!\int_\Omega U^2(x, t)\, dx\, dt - 1\Big) = \log\det(M_T(U)) - \gamma\Big(\sum_{k=1}^{\infty}\int_0^T u_k^2(t)\, dt - 1\Big). \tag{7.33}$$
Substituting each $U_{j,\epsilon}$ into $\mathcal{L}(U, \gamma)$, differentiating with respect to ϵ, and then setting ϵ = 0, we arrive at (7.32) in the same way as in Chapter 5 for LTI systems. It remains to calculate γ in (7.32) and to recognize which eigenfunctions are associated with it.
Lemma 9. The Lagrange coefficient γ that appears in Corollary 11 is equal to r, i.e., to the number of estimated parameters.
Proof. The result follows from the following equalities:
$$\begin{aligned} r &= \mathrm{trace}\big(M_T^{-1}(U^*)\, M_T(U^*)\big) = \sum_{k=1}^{\infty}\alpha_k^*\int_0^T\!\!\int_0^T \underbrace{\mathrm{trace}\big(M_T^{-1}(U^*)\, H_k(\tau, \nu)\big)}_{\mathrm{ker}_k(\tau, \nu, U^*)}\, u_k^*(\tau)\, u_k^*(\nu)\, d\tau\, d\nu \\ &= \gamma\sum_{k=1}^{\infty}\alpha_k^*\int_0^T \big(u_k^*(\nu)\big)^2\, d\nu = \gamma\sum_{k=1}^{\infty}\alpha_k^* = \gamma, \end{aligned} \tag{7.34}$$
where in the second row the right-hand side of (7.32) is substituted in place of the integral with respect to τ. Additionally, the equality $\sum_{k=1}^{\infty}\alpha_k^* = 1$ was used at the last stage.
Thus, the eigenfunctions corresponding to γ = r, further denoted as $\phi^{(k)}_{\max}(t)$, must coincide with $u_k^*(t)$. Recall that in Chapter 5 the number of estimated parameters was equal to (r + 1), which is the equivalent of the r parameters used in this chapter. Summarizing, we obtain the following result.
Corollary 12. If $U^*$ is the D-optimal input signal, then it can be expressed in the modal form (7.30) with $u_k^*(t) = \phi^{(k)}_{\max}(t)$ and the $\alpha_k^*$'s selected in such a way that they maximize, over all $\alpha_k \ge 0$ with $\sum_{k=1}^{\infty}\alpha_k = 1$, the following criterion:
$$\max_{\alpha_1, \alpha_2, \dots}\ \log\det\Big[r\sum_{k=1}^{\infty}\frac{\alpha_k}{\zeta_k(U^*, T)}\, \bar h_k\, \bar h_k^{\mathrm{tr}}\Big]. \tag{7.35}$$
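Proof. It suffices to observe that for $U^*$ in (7.29) we have
$$\int_0^T\!\!\int_0^T H_k(\tau, \nu, \bar a)\, \tilde u_k^*(\tau)\, \tilde u_k^*(\nu)\, d\tau\, d\nu = \frac{\bar h_k\, \bar h_k^{\mathrm{tr}}}{\zeta_k(U^*, T)}\int_0^T\!\!\int_0^T \zeta_k(U^*, T)\, c_k(\tau, \nu, \bar a)\, \tilde u_k^*(\tau)\, \tilde u_k^*(\nu)\, d\tau\, d\nu = r\, \frac{\bar h_k\, \bar h_k^{\mathrm{tr}}}{\zeta_k(U^*, T)}, \tag{7.36}$$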
since the squared norm of $\tilde u_k^*(\tau)$ is 1. The multiplier r in (7.35) does not influence the best $\alpha_k$'s; it is kept to have the proper scaling of the FIM.
This result almost completely characterizes the spatio-temporal structure of the optimal input signal. The only ingredients that are not explicitly stated are the $\alpha_k^*$'s, which indicate the allocation of energy between modes. For the fixed structure of $U^*$, as indicated in (7.30), one can formally repeat the proof of the Kiefer–Wolfowitz theorem to get D-optimality conditions for the $\alpha_k^*$'s (see Chapter 2).
7.4 Spatio-temporal structure of the solution
In this section we state the D-optimality conditions and reveal the spatio-temporal structure of the solution.
Theorem 13. If $U^*$ is the D-optimal input signal for parameter estimation in (7.17), then it can be expressed as follows:
$$U^*(x, t) = \sum_{k=1}^{L}\sqrt{\alpha_k^*}\, v_k(x)\, \phi_k^*(t). \tag{7.37}$$
If this signal is D-optimal, the weights $\alpha_k^* \ge 0$, $\sum_{k=1}^{L}\alpha_k^* = 1$, and the functions $\phi_k^*(t)$, $\int_0^T(\phi_k^*(t))^2\, dt = 1$, have to be selected so that for the corresponding eigenvalues $\mu_*^{(k)}(U^*)$ of the integral operators with the kernels $\mathrm{ker}_k^*(\tau, \nu) = \mathrm{trace}[M_T^{-1}(U^*)\, H_k(\tau, \nu; \bar a)]$ we simultaneously have
$$\mu_*^{(k)}(U^*) = r = \dim(\bar a), \quad k = 1, 2, \dots, L, \tag{7.38}$$
and the $\alpha_k^*$'s solve problem (7.35).
Several remarks are in order concerning the above results.
– Theorem 13 reveals the structure of the D-optimal signal. Namely, each spatial mode is excited by its own time-domain signal.
– This theorem allows for checking the D-optimality of a signal that was “guessed” by the experimenter, similarly as in the Kiefer–Wolfowitz theorem (see Chapter 2).
– The excitations of spatial modes are eigenfunctions of the operators with kernels $c_k(\tau, \nu; \bar a)$. These eigenfunctions can in principle be calculated, since they depend only on our system, assuming that the nominal values $\bar a_0$ are used instead of $\bar a$ when the $\lambda_k$'s are computed.
– Furthermore, the $\alpha_k^*$'s can be calculated by solving the D-optimal design problem (e.g., by the Wynn–Fedorov algorithm or its further extensions).
– Theorem 13 is valid for arbitrary T > 0.
If experimental conditions allow (or force) T to be “small,” then we can formulate a simple algorithm for approximating $U^*(x, t)$. By “small” we mean that the following approximation is
sufficiently accurate:
$$c_k(\tau, \nu; \bar a) = \int_0^T \rho_k(t - \tau; \bar a)\, \rho_k(t - \nu; \bar a)\, dt \approx T\, \rho_k(T - \tau; \bar a)\, \rho_k(T - \nu; \bar a). \tag{7.39}$$
Applying (7.39), the algorithm for calculating the (approximate) D-optimal input signal when T is “small” runs as follows.
Algorithm 1 (Approximate D-optimal input signals).
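Step 1. Set $\phi_k^*(t) \stackrel{\mathrm{def}}{=} n_k^{-1}\rho_k(T - t; \bar a)$, where $n_k = \big[\int_0^T \rho_k^2(T - t; \bar a)\, dt\big]^{1/2}$.
Step 2. Solve the following standard D-optimal experiment design problem: find $\alpha_1^*, \alpha_2^*, \dots$ for which
$$\max_{\alpha_1, \alpha_2, \dots}\ \det\Big[\sum_{k=1}^{L}\frac{\alpha_k\, n_k}{\zeta_k(\breve U, T)}\, \bar h_k\, \bar h_k^{\mathrm{tr}}\Big]$$
is attained, under the constraints $\alpha_1 \ge 0, \alpha_2 \ge 0, \dots$, $\sum_{k=1}^{L}\alpha_k = 1$, where
$$\breve U(x, t) = \sum_{k=1}^{L}\sqrt{\alpha_k}\, v_k(x)\, \rho_k(T - t; \bar a)/n_k.$$
Step 3. Form the approximation $\tilde U(x, t)$ of the D-optimal input signal $U^*(x, t)$ as follows:
$$\tilde U(x, t) = \sum_{k=1}^{L}\sqrt{\alpha_k^*}\, v_k(x)\, \rho_k(T - t; \bar a)/n_k.$$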
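Steps 1 and 3 of Algorithm 1 are purely constructive, so they can be sketched directly. The code below is a minimal illustration, assuming NumPy, parabolic modes with $\rho_k(s) = -s\,e^{-\lambda_k s}$, the eigen-data of Example 45 ($\lambda_k = a_1\pi^2k^2 + a_2$ with nominal $a_1 = a_2 = 0.1$), a horizon T = 1 that is not given in the text, and placeholder weights $\alpha = (1/2, 1/2)$ in place of the solution of Step 2. The function names are hypothetical.

```python
import numpy as np

def approx_excitation(rho, T, n=2001):
    """Step 1: phi*(t) = rho(T - t) / n_k, normalized in L2(0, T) by the trapezoid rule."""
    t = np.linspace(0.0, T, n)
    phi = rho(T - t)                                  # sensitivity run backward in time
    w = np.full(n, t[1] - t[0])
    w[[0, -1]] /= 2                                   # trapezoid weights
    nk = np.sqrt(np.sum(w * phi**2))
    return t, phi / nk, nk

def assemble_input(alphas, phis, x):
    """Step 3: U(x, t) = sum_k sqrt(alpha_k) v_k(x) phi_k(t), with v_k(x) = sqrt(2) sin(k pi x)."""
    U = 0.0
    for k, (a, phi) in enumerate(zip(alphas, phis), start=1):
        U = U + np.sqrt(a) * np.outer(np.sqrt(2.0) * np.sin(k * np.pi * x), phi)
    return U                                          # shape (len(x), len(t))
```

Usage: `res = [approx_excitation(lambda s, l=l: -s * np.exp(-l * s), T=1.0) for l in lam]` followed by `assemble_input([0.5, 0.5], [r[1] for r in res], x)`. Because the $v_k$'s are orthonormal and each excitation has unit norm, the resulting signal has unit energy in the sense of (7.20).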
Several remarks are relevant when the above algorithm is analyzed. 1. The D-optimal input signal consists of not more than L spatial modes, each being excited by its own signal varying in time. 2. The excitation of each spatial mode is based on its sensitivity function, but running backward in time. Note that for stable systems the sensitivity functions ρk (t; a)̄ are decreasing in time. Thus, approximate D-optimal excitations, as running backward in time, are rapidly growing. 3. Hence, allowing for “small” T is not only for mathematical convenience, but also in order to prevent a system from possible destruction. ̄ rapidly decrease with k. Thus, only several first of them are 4. Observe that ρk (t; a)’s informative at all. 5. When T is allowed to be larger, one can approximate c(τ, ν, a)̄ by a quadrature formula, which leads to solving the eigenvalue–eigenfunction problems for u∗k (t)
144 | 7 Optimal inputs for PDEs in time domain as the problems with degenerated kernels (exactly in the same way as described for LTI ODE systems in Chapter 5). Example 44 (Hyperbolic case). Consider the following damped vibrating system: 𝜕2 q(x, t) 𝜕 q(x, t) 𝜕2 q(x, t) + μ + a = U(x, t), 𝜕t 𝜕 t2 𝜕 x2 x ∈ (0, 1), q(0, t) = q(1, t) = 0. Let us assume that only one parameter, namely, a, is unknown. The following plots are calculated for nominal values: μ = 3, a = 33. As the experiment horizon we take T = 6. According to Step 1 of the above algorithm, if T is “small,” one can take the signal from the right panel of Fig. 7.1 (appropriately normalized) backward in time, as an approximation of the D-optimal input signal for estimating a. This signal is shown in Fig. 7.2 (left panel). Note that the experiment horizon T = 6 is not very “small.” It was intentionally selected in this way in order to compare it with a more accurately calculated optimal amplitude, which – by Corollary 11 – is the eigenfunction corresponding to the ̄ According to Corollargest eigenvalue of the integral equation with kernel c1 (τ, ν, a). lary 11, we have to numerically approximate ϕmax (t). To this end, we approximate the
Figure 7.1: The impulse response and its sensitivity to a for the hyperbolic system in Example 44 – the first mode.
Figure 7.2: An approximate optimal input signal (left panel) and its more exact (numerical) approximation (right panel) in Example 44 – the first mode.
7.4 Spatio-temporal structure of the solution | 145
Figure 7.3: The space-time structure of the optimal input signal for the hyperbolic system. In this case r = 1 – one mode in space is sufficient (see Example 44).
integral eigenfunction problem in Corollary 11 by its algebraic counterpart, using the Nyström method with a time step of 0.1. As one can note from the right panel of Fig. 7.2, the difference between the more exact approximation (dots) and the approximation used in the above algorithm is small. Thus, according to the above results, the D-optimal spatio-temporal input signal excites only the first spatial mode, $\sqrt2\sin(\pi x)$. Its spatio-temporal structure is shown in Fig. 7.3.
Example 45 (Parabolic case). Consider the following system of the parabolic type:
$$\frac{\partial q(x,t)}{\partial t} = a_1\,\frac{\partial^2 q(x,t)}{\partial x^2} - a_2\,q(x,t) + U(x,t), \qquad x\in(0,1), \qquad q(0,t) = q(1,t) = 0,$$
with two unknown parameters $a_1$ and $a_2$, i.e., $r = 2$. In this case the D-optimal input signal formally consists of at most three spatial modes: $\sqrt2\sin(k\pi x)$, $k = 1, 2, 3$. By the result presented in Example 43, it suffices to excite the first two modes. The eigenvalues have the form $\lambda_k(\bar a) = a_1\pi^2k^2 + a_2$. For further calculations we take $a_1 = a_2 = 0.1$ as the nominal values of the unknown parameters. First, we approximately calculate the D-optimal input signals for each mode. To this end, we calculate the sensitivities $\rho_k(t)$ for the parabolic case. Then $\rho_k(T-t)$, $k = 1, 2$, are taken as the excitations of the modes (see Fig. 7.4). Finally, the approximation of the D-optimal input signal is formed as $\rho_1(T-t)\sin(\pi x) + \rho_2(T-t)\sin(2\pi x)$. Its shape is shown in Fig. 7.5.
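The comparison between $\rho(T-t)$ and the Nyström solution can be reproduced in a simpler setting. The sketch below takes a single parabolic mode with $\lambda_1 = 0.1\pi^2 + 0.1$, the nominal values of Example 45. The sensitivity of $I(t) = e^{-\lambda_1 t}$ with respect to $\lambda_1$ is $\rho(t) = -t\,e^{-\lambda_1 t}$. The horizon $T = 6$ and the time step 0.1 are borrowed from Example 44. The kernel is discretized by the midpoint rule, and its leading eigenvector is compared with the normalized $\rho(T-t)$. We assume that the kernel of Corollary 11 has the form $c(\tau,\nu) = \int_{\max(\tau,\nu)}^{T}\rho(t-\tau)\,\rho(t-\nu)\,dt$, i.e., that it arises from the squared $L^2$ norm of the sensitivity response. All function names are hypothetical, and this is only an illustrative sketch in Python, not the author's code.

```python
import math

def rho(t, lam):
    """Sensitivity d/d(lambda) of the modal impulse response exp(-lambda*t) * 1(t)."""
    return -t * math.exp(-lam * t) if t >= 0 else 0.0

def nystrom_matrix(lam, T, h):
    """Midpoint-rule (Nystrom) discretization of the assumed kernel
    c(tau, nu) = int_{max(tau, nu)}^T rho(t - tau) rho(t - nu) dt."""
    grid = [h * (i + 0.5) for i in range(int(round(T / h)))]
    # A[m][i] = rho(t_m - tau_i) vanishes for t_m < tau_i (causality), which
    # reproduces the lower limit max(tau, nu) of the inner integral.
    A = [[rho(t - tau, lam) for tau in grid] for t in grid]
    n = len(grid)
    K = [[h * h * sum(A[m][i] * A[m][j] for m in range(n)) for j in range(n)]
         for i in range(n)]
    return grid, K

def mat_vec(K, v):
    return [sum(k_ij * v_j for k_ij, v_j in zip(row, v)) for row in K]

def leading_eigenvector(K, iters=300):
    """Power iteration; K = h^2 A^T A is positive semidefinite with nonnegative entries."""
    v = [1.0] * len(K)
    for _ in range(iters):
        w = mat_vec(K, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def rayleigh(K, v):
    return sum(a * b for a, b in zip(v, mat_vec(K, v))) / sum(a * a for a in v)

lam1 = 0.1 * math.pi ** 2 + 0.1      # lambda_1 for a1 = a2 = 0.1 (Example 45)
T, h = 6.0, 0.1                      # horizon and time step borrowed from Example 44
grid, K = nystrom_matrix(lam1, T, h)
phi = leading_eigenvector(K)

approx = [rho(T - t, lam1) for t in grid]        # the cheap approximation rho(T - t)
norm = math.sqrt(sum(a * a for a in approx))
approx = [a / norm for a in approx]
if sum(p * a for p, a in zip(phi, approx)) < 0:  # eigenvectors are defined up to sign
    phi = [-p for p in phi]

cos_sim = sum(p * a for p, a in zip(phi, approx))
print(f"cosine similarity of phi_max and rho(T - t): {cos_sim:.4f}")
print(f"Rayleigh quotients: {rayleigh(K, phi):.4e} (phi_max) vs {rayleigh(K, approx):.4e}")
```

Varying `T` and `lam1` in this sketch shows when the simple rule $\rho(T-t)$ is adequate. The Rayleigh quotient of the leading eigenvector is, by construction, never smaller than that of the approximation.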
Figure 7.4: Approximately D-optimal time domain excitations for two modes – the dashed curve represents the second mode (see Example 45).
Figure 7.5: The space-time structure of the first two modes of the optimal input signal, but the influence of the second one is almost invisible (see Example 45).
Example 46 (Hyperbolic case – thin plate). Consider vibrations of a thin, rectangular plate described by the hyperbolic equation
$$\frac{\partial^2 q(x,t,\bar a)}{\partial t^2} = A_x(\bar a)\,q(x,t,\bar a) + U(x,t), \qquad x\in(0,1)^2,\ t > 0, \tag{7.40}$$
with two opposite edges clamped and the other two free. Then, $A_x$ is of the form
$$A_x(\bar a)\,q(x) = a_1\,\frac{\partial^2 q(x)}{\partial x_1^2} + a_2\,\frac{\partial^2 q(x)}{\partial x_2^2} - a_3\,q(x), \qquad x\in\Omega \stackrel{\text{def}}{=} (0,1)^2, \tag{7.41}$$
and with the Dirichlet boundary condition on two opposite edges and the Neumann boundary condition on the other two:
$$q(x_1,0) = 0, \quad q(x_1,1) = 0, \quad \left.\frac{\partial q(x_1,x_2)}{\partial x_1}\right|_{x_1=0} = 0, \quad \left.\frac{\partial q(x_1,x_2)}{\partial x_1}\right|_{x_1=1} = 0. \tag{7.42}$$
We have already met this operator in Example 36, and we know that its eigenfunctions and eigenvalues are of the following form. For $k' = 1, 2, \ldots$ and $k'' = 1, 2, \ldots$, we have
$$2\sin(k''\pi x_1)\cos(k'\pi x_2), \qquad a_1(\pi k')^2 + a_2(\pi k'')^2 + a_3, \tag{7.43}$$
and for $k' = 0$ and $k'' = 1, 2, \ldots$, we have $\sqrt2\sin(k''\pi x_1)$ and $a_2(\pi k'')^2 + a_3$, respectively. Operator (7.41) has three unknown parameters. As their nominal values we take $a_1 = a_2 = a_3 = 1$. In order to estimate their values more precisely, we apply Algorithm 1. We decide to excite the first three spatial modes with equal energies. Their excitations vs. time are shown in Fig. 7.6 (right panel). They were obtained by reading backward in time ($t$ replaced by $T-t$) the following sensitivity function:
$$\rho(t,\lambda) \stackrel{\text{def}}{=} -\,\mathbf{1}(t)\,\frac{e^{-\sqrt{-\lambda}\,t}\left(2\sqrt{-\lambda}\,t + e^{2\sqrt{-\lambda}\,t}\left(2\sqrt{-\lambda}\,t - 2\right) + 2\right)}{8(-\lambda)^{3/2}}, \tag{7.44}$$
where $\lambda$ is set to the values 2, 3, and 6, which correspond to the following modes: $\sin(\pi x_1)$, $\sin(\pi x_1)\cos(\pi x_2)$, and $\sin(2\pi x_1)\cos(\pi x_2)$. The overall (approximate) D-optimal input signal (for $T = 6$) is given by
$$\frac{2}{\sqrt3}\Big[\rho(T-t,2)\sin(\pi x_1) + \rho(T-t,3)\sin(\pi x_1)\cos(\pi x_2) + \rho(T-t,6)\sin(2\pi x_1)\cos(\pi x_2)\Big]. \tag{7.45}$$
Its shapes for $t\in(0.5, 6)$, sampled with step 0.5, are plotted in Fig. 7.7. Animating (7.45) with, e.g., a 10 times higher sampling rate results in quite a pleasing video sequence.
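Formula (7.44) is written in terms of $\sqrt{-\lambda}$, which is imaginary for the positive values $\lambda = 2, 3, 6$ used here. Substituting $\sqrt{-\lambda} = j\sqrt\lambda$ reduces it to the real expression $\rho(t,\lambda) = \mathbf 1(t)\,[\sqrt\lambda\,t\cos(\sqrt\lambda\,t) - \sin(\sqrt\lambda\,t)]/(2\lambda^{3/2})$. This is the derivative with respect to $\lambda$ of $\sin(\sqrt\lambda\,t)/\sqrt\lambda$, the impulse response of an undamped mode $\ddot I = -\lambda I + \delta(t)$. The Python sketch below does three things:
– evaluates (7.44) literally in complex arithmetic,
– compares it with the real form and with a finite-difference derivative,
– assembles the input (7.45).

The function names, test points and sampling point are illustrative choices.

```python
import cmath
import math

def rho_744(t, lam):
    """Formula (7.44) evaluated literally in complex arithmetic; 1(t) is the unit step."""
    if t < 0:
        return 0.0
    s = cmath.sqrt(-lam)                 # purely imaginary for lam > 0
    num = cmath.exp(-s * t) * (2 * s * t + cmath.exp(2 * s * t) * (2 * s * t - 2) + 2)
    return (-num / (8 * s ** 3)).real    # (-lam)^(3/2) = s^3 on the principal branch

def rho_real(t, lam):
    """Real form: d/d(lam) of sin(sqrt(lam) t)/sqrt(lam), the undamped modal response."""
    if t < 0:
        return 0.0
    w = math.sqrt(lam)
    return (w * t * math.cos(w * t) - math.sin(w * t)) / (2 * w ** 3)

MODES = [  # (lambda, spatial shape) as listed after (7.44)
    (2.0, lambda x1, x2: math.sin(math.pi * x1)),
    (3.0, lambda x1, x2: math.sin(math.pi * x1) * math.cos(math.pi * x2)),
    (6.0, lambda x1, x2: math.sin(2 * math.pi * x1) * math.cos(math.pi * x2)),
]

def u_plate(x1, x2, t, T=6.0):
    """Approximate D-optimal plate input (7.45): three modes with equal energies."""
    return 2 / math.sqrt(3) * sum(rho_real(T - t, lam) * shape(x1, x2)
                                  for lam, shape in MODES)

print(u_plate(0.3, 0.6, 2.0))
```

The real form avoids complex arithmetic entirely, which is convenient when (7.45) is sampled on a fine space-time grid for plotting or animation.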
Figure 7.6: Impulse responses of the first three modes of a thin plate (left panel) and approximately D-optimal excitations for three modes (see Example 46).
Figure 7.7: The D-optimal space-time shapes of an input signal for a thin plate – sampled in time (see Example 46).
7.5 Safer input signals for DPS
If $U^*(x,t)$ is too aggressive, one can undertake at least the following two actions:
– use a shorter observation horizon $T$,
– impose constraints on the energy of the system state, as explained below.
Imposing constraints on the system state
In order to obtain safer input signals, we impose the constraint
$$\int_\Omega\int_0^T q^2(x,t,\bar a)\,dt\,dx = 1 \tag{7.46}$$
7.5 Safer input signals for DPS | 149
as the only one, for the sake of simplicity. The rest of the problem statement remains the same as in the previous section. Its solution is further denoted as $U^\star$. Concerning this solution, we follow the results presented in Section 5.4, as well as those obtained in the previous section. These lead to the following conclusions.
– The spatio-temporal structure of $U^\star(x,t)$ remains the same as in (7.30), i.e.,
$$U^\star(x,t) = \sum_{k=1}^\infty\sqrt{\alpha_k^\star}\,u_k^\star(t)\,v_k(x).$$
– The necessary changes can be summarized as follows:
1. We have to rederive Corollary 11 to conclude that the optimal $u_j^\star$'s are eigenfunctions of the following integral equations (all with the same eigenvalue $\gamma$), for $j = 1, 2, \ldots$:
$$\zeta_j(U^*,T)\int_0^T c_j(\tau,\nu,\bar a)\,u_j^\star(\tau)\,d\tau = \gamma\int_0^T u_j^\star(\tau)\,G_j(\tau,\nu,\bar a)\,d\tau, \tag{7.48}$$
where $G_j(\tau,\nu,\bar a) = \int_0^T I_j(t-\tau,\bar a)\,I_j(t-\nu,\bar a)\,dt$, while $I_j(t,\bar a)$ is the impulse response of the $j$-th mode.
2. We conjecture that $\gamma = r$ and that it is the largest eigenvalue of all the equations (7.48).
Now, it remains to apply the algorithm for ODEs to calculate $u_k^\star(t)$, $k = 1, 2, \ldots$, and to solve the D-optimal experiment design problem for the $\alpha_k^\star$'s.
8 Input signal design for systems with spatio-temporal dynamics – frequency domain approach
8.1 Introduction and notational conventions
In this chapter we discuss the problem of D-optimal input signals for estimating parameters in systems with spatio-temporal dynamics. A distinguishing feature of the approach presented here is that we allow for a long (theoretically, infinite) observation horizon, which makes it possible to apply the frequency domain approach. The results presented here generalize the classic frequency domain approach (see Chapter 5 for the bibliography) to LTI systems with spatio-temporal dynamics. They also generalize the author's papers on optimal input signals for estimating parameters in systems described by PDEs (see [121, 122]). Due to the orthogonality of the eigenfunctions of elliptic operators, the problem of D-optimal input signal selection splits into the problems of selecting frequency-dependent excitations of each spatial mode. It is still not quite trivial, however, since we have to allocate the energies of these excitations between spatial modes by solving an auxiliary D-optimal experiment design problem. This is the main result of this chapter. Additionally, we reveal the spatio-temporal structure of input signals in the frequency domain.
8.2 Assumptions
Taking into account the previous justifications, we consider a class of systems with spatio-temporal dynamics described by
$$q(x,t) = \int_0^t\int_\Omega G(x,\kappa,t-\tau,\bar a)\,U(\kappa,\tau)\,d\kappa\,d\tau, \tag{8.1}$$
where
– $q(x,t)$ is the system state at spatial point $x\in\mathbb R^d$ and time $t$,
– $U(\kappa,t)$ is an input signal at spatial point $\kappa\in\Omega\subseteq\mathbb R^d$ and time $t$.
In this chapter we concentrate on selecting spatio-temporal inputs, i.e., changes are admitted at each $\kappa\in\Omega$ and time $t$. We assume an infinite horizon of observing and actuating the system. This is the main difference between this chapter and the previous one. Moreover, the results are qualitatively different. The reason is that we have to ensure that the FIM is finite, which requires admitting signals with finite power. In the previous chapter it was sufficient to put a bound on the energy of input signals.
https://doi.org/10.1515/9783110351040-008
152 | 8 Inputs for DPS in frequency domain
Above, $G(x,\kappa_0,t,\bar a)$ is the Green's function of our system, i.e., its response to $\delta(\kappa-\kappa_0)\,\delta(t)$, where $\bar a$ is the vector of unknown parameters to be estimated from observations.
1. It is not necessary to know the Green's function explicitly in order to apply the results that follow. It suffices to approximate its sensitivity function numerically for nominal parameter values.
2. There are important classes of DPS for which the Green's function can be expressed in terms of eigenfunctions of a spatial operator (also known as modal expansions). We discussed this aspect in the two previous chapters.
Remark 26. Consider input signals acting on boundaries, $U(\kappa,t) = b(\kappa)\,u(t)$, where $b(\kappa)$ is determined by boundary conditions, and pointwise source(s), $U(\kappa,t) = \delta(\kappa-\kappa_0)\,u(t)$. The results for them are similar to those for lumped-parameter systems, and we shall not discuss them further.
Remark 27. It is possible to obtain the D-optimality conditions for all systems described in terms of $G(x,\kappa,t,\bar a)$, as well as to derive computational algorithms. They are expressed by means of eigenvalues of an integral operator whose kernel is spanned by $\nabla_a G(x,\kappa,t,\bar a)$. We omit these results and concentrate mainly on the cases when the eigenfunctions of $A_x(\bar a)$ do not depend on $\bar a$ (see Chapter 6).
As in the previous chapter, we consider systems described by PDEs of the parabolic type (7.1) and the hyperbolic type (7.2). The main difference is that we do not specify the initial conditions for $t = 0$. Instead, we impose zero initial conditions in the remote past of the interval $(-T,T)$ as $T\to\infty$. The observations and the action of the control signal $U(x,t)$ are also considered in this expanding interval (see also Section 5.2).
We maintain all the assumptions made in the previous chapter, with obvious changes resulting from the assumption about the infinite observation horizon.
8.3 Problem statement
Under these assumptions the Green's function is of the form
$$G(x,\kappa,t,\bar a) = \sum_{k=1}^\infty v_k(x)\,v_k(\kappa)\,I_k(t,\bar a), \tag{8.2}$$
where the $I_k$'s are the impulse responses of the $k$-th modes, which have the form:
8.3 Problem statement | 153
in the parabolic case,
$$\dot I_k(t,\bar a) = -\lambda_k(\bar a)\,I_k(t,\bar a) + \delta(t), \tag{8.3}$$
and in the hyperbolic case,
$$\ddot I_k(t,\bar a) + \mu\,\dot I_k(t,\bar a) = -\lambda_k(\bar a)\,I_k(t,\bar a) + \delta(t). \tag{8.4}$$
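Let us assume that the observations of the system
$$q(x,t,\bar a) = \int_\Omega\int_{-\infty}^t G(x,\kappa,t-\tau,\bar a)\,U(\kappa,\tau)\,d\tau\,d\kappa \tag{8.5}$$
used for estimating $\bar a$ are “continuous” in space, i.e., available for each $x\in\Omega$, and have the form
$$Y(x,t) = q(x,t,\bar a) + \varepsilon(x,t), \qquad x\in\Omega,\ t\in\mathbb R, \tag{8.6}$$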
where $\varepsilon(x,t)$ is a zero-mean, covariance-stationary Gaussian random field, uncorrelated in space and time.
Admissible input signals
We admit a relatively large class of input signals $U(x,t)$. Namely, for each $x\in\Omega$ considered as a parameter, $U(x,t)$ as a function of $t\in\mathbb R$ is a realization of a zero-mean, stationary and ergodic stochastic process (see, e.g., [109] for the definitions and basic properties). For such processes the cross-covariance function between two processes $U(x,\cdot)$ and $U(\kappa,\cdot)$, denoted further as $R_u(x,\kappa,\tau)$, is well defined. It can be calculated as follows:
$$R_u(x,\kappa,\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}U(x,t)\,U(\kappa,t+\tau)\,dt, \tag{8.7}$$
where the convergence here and in the subsequent formulas is understood in the mean square sense with respect to the time variable for each $x,\kappa\in\Omega$. Note that $R_u(x,\kappa,-\tau) = R_u(\kappa,x,\tau)$. Furthermore,
$$R_u(x,x,0) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}U(x,t)^2\,dt \tag{8.8}$$
can be interpreted as the mean power of the input signal at spatial point $x\in\Omega$. Thus, the overall power, denoted as $\mathcal P_\Omega(U)$, is given by
$$\mathcal P_\Omega(U) = \int_\Omega R_u(x,x,0)\,dx. \tag{8.9}$$
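Definitions (8.7)–(8.9) can be checked numerically under illustrative assumptions:
– $\Omega = (0,1)$ with orthonormal modes $v_k(x) = \sqrt2\sin(k\pi x)$,
– an input that is a sum of three modal sine waves with distinct frequencies and fixed phases. The values of $\sigma_k$, $\omega_k$ and $\theta_k$ below are hypothetical.

For such a signal the limit in (8.7) equals $\sum_k \sigma_k v_k(x)v_k(\kappa)\cos(\omega_k\tau)/2$. By orthonormality, therefore, $\mathcal P_\Omega(U) = \sum_k\sigma_k/2$. The Python sketch replaces the limit $T\to\infty$ by a window of ten common periods. On that window the midpoint rule reproduces these values up to rounding.

```python
import math

# Illustrative modal input on Omega = (0, 1):
# U(x, t) = sum_k sqrt(sigma_k) v_k(x) sin(omega_k t + theta_k)
SIGMA = [0.5, 0.3, 0.2]     # modal powers (hypothetical)
OMEGA = [1.0, 2.0, 3.0]     # distinct frequencies (hypothetical)
THETA = [0.1, 1.3, -2.0]    # fixed phases (hypothetical)

def v(k, x):
    """Orthonormal Dirichlet eigenfunctions on (0, 1)."""
    return math.sqrt(2) * math.sin(k * math.pi * x)

def U(x, t):
    return sum(math.sqrt(s) * v(k, x) * math.sin(w * t + th)
               for k, (s, w, th) in enumerate(zip(SIGMA, OMEGA, THETA), start=1))

def cross_cov(x, kappa, tau, T=10 * math.pi, n=1000):
    """Finite-window version of (8.7), midpoint rule on (-T, T)."""
    dt = 2 * T / n
    ts = (-T + (i + 0.5) * dt for i in range(n))
    return sum(U(x, t) * U(kappa, t + tau) for t in ts) * dt / (2 * T)

def total_power(m=32):
    """(8.8)-(8.9): integrate R_u(x, x, 0) over Omega with the midpoint rule."""
    return sum(cross_cov((i + 0.5) / m, (i + 0.5) / m, 0.0) for i in range(m)) / m

def cross_cov_theory(x, kappa, tau):
    """Limit value: sum_k sigma_k v_k(x) v_k(kappa) cos(omega_k tau) / 2."""
    return sum(s * v(k, x) * v(k, kappa) * math.cos(w * tau) / 2
               for k, (s, w) in enumerate(zip(SIGMA, OMEGA), start=1))

print(total_power(), sum(SIGMA) / 2)
```

The same helper also illustrates the symmetry $R_u(x,\kappa,-\tau) = R_u(\kappa,x,\tau)$ noted after (8.7).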
For two processes $U(x,\cdot)$ and $U(\kappa,\cdot)$ we define their cross-spectral density function, denoted as $S_u(x,\kappa,j\omega)$, by the Fourier transform (with respect to time only)
$$S_u(x,\kappa,j\omega) = \mathcal F_\tau[R_u(x,\kappa,\tau)], \tag{8.10}$$
where $j\omega$ is the argument of this transform. Because $R_u(x,\kappa,-\tau) = R_u(\kappa,x,\tau)$, we have $S_u(x,\kappa,j\omega) = S_u^c(\kappa,x,j\omega) = S_u(\kappa,x,-j\omega)$, where $^c$ denotes the complex conjugate. These imply that $S_u(x,x,j\omega)$ is real-valued. By the Parseval equality we also have
$$\mathcal P_\Omega(U) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_\Omega S_u(x,x,j\omega)\,dx\,d\omega, \tag{8.11}$$
which allows us to impose the constraint on $\mathcal P_\Omega(U)$ in terms of $S_u$. For practical reasons we admit only spectral densities such that
$$S_u(x,\kappa,j\omega) = 0 \quad\text{for } \omega\notin[-\omega_{\max},\omega_{\max}],\ x,\kappa\in\Omega, \tag{8.12}$$
where $0 < \omega_{\max} < \infty$ is the largest admissible frequency in the input signal. Its choice may depend on technical, economic, or safety reasons. A salient implication of this assumption is that the set of all attainable information matrices generated by such $S_u$'s is closed and bounded. The proof of this statement goes along the same lines as for systems described by linear ODEs (see Chapter 5). We define the class of admissible input spectral densities $\mathcal S_\Omega$ as the set of all $S_u$ for which condition (8.12) holds and simultaneously
$$\int_{-\infty}^{\infty}\int_\Omega S_u(x,x,j\omega)\,dx\,d\omega \le 1. \tag{8.13}$$
The FIM in terms of $S_u$
Our starting point is expressing the FIM in the spatio-temporal domain. The assumptions concerning $\varepsilon(x,t)$ lead to the following conclusion. For the observations in $(-T,T)$ and in $\Omega$ the FIM has the following form:
$$M_T(U) = \int_{-T}^{T}\int_\Omega\nabla_a q(x,t,\bar a)\,\nabla_a^{tr}q(x,t,\bar a)\,dx\,dt, \tag{8.14}$$
where, by differentiation of (8.5), we have
$$\nabla_a q(x,t,\bar a) = \int_\Omega\int_{-\infty}^{\infty}\nabla_a G(x,\kappa,t-\tau,\bar a)\,U(\kappa,\tau)\,d\tau\,d\kappa. \tag{8.15}$$
Note that the upper integration limit with respect to the time variable is changed from $t$ to $\infty$. This is valid since $\nabla_a G$ is causal. Let us substitute (8.15) into (8.14) and consider the normalized FIM $\frac{1}{2T}M_T(U)$. Changing the order of integration in (8.14), we conclude that, for $T\to\infty$, the matrix $\frac{1}{2T}M_T(U)$ depends linearly on $R_u$. The next step is to substitute $R_u = \mathcal F_\tau^{-1}[S_u]$ and to rearrange the integrals appropriately. This shows that the m.s. limit $\lim_{T\to\infty}\frac{1}{2T}M_T(U)$ is equal to
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\int_\Omega\int_\Omega\int_\Omega F(x,\kappa',j\omega)\,F^{tr}(x,\kappa'',-j\omega)\,S_u(\kappa',\kappa'',j\omega)\,dx\,d\kappa'\,d\kappa''\,d\omega, \tag{8.16}$$
where
$$F(x,\kappa,j\omega) \stackrel{\text{def}}{=} \mathcal F_t[\nabla_a G(x,\kappa,t,\bar a)] = \sum_{k=1}^\infty v_k(x)\,v_k(\kappa)\,\bar W_k(j\omega), \tag{8.17}$$
with
$$\bar W_k(j\omega) \stackrel{\text{def}}{=} \mathcal F_t(\nabla_a I_k(t,\bar a)), \tag{8.18}$$
and $\mathcal F_t$ stands for the Fourier transform with respect to the time variable. We call (8.16) the averaged FIM and denote it by $M(S_u)$. The above results show that
$$M(S_u) = \lim_{T\to\infty}\frac{1}{2T}M_T(U),$$
where the convergence is in the mean square sense with respect to time. For $S_u\in\mathcal S_\Omega$ this limit exists, provided that (8.7) holds and the system under consideration is asymptotically stable.
Problem statement
Our aim is to select the spectral density $S_u^*$ of the input signal for which $\max_{S_u}\det[M(S_u)]$ is attained over the class $\mathcal S_\Omega$.
Note that (8.13) is the constraint on the mean (with respect to time and space) averaged power of $U(x,t)$. Formally, we are looking for cross-spectral densities between each pair of points $x,\kappa\in\Omega$. The orthogonality of the $v_k$'s and the convexity of $\mathcal S_\Omega$, however, help us to solve the problem. Consider the set $\mathcal M_\Omega$ of all attainable FIMs,
$$\mathcal M_\Omega \stackrel{\text{def}}{=} \{M(S_u) : S_u\in\mathcal S_\Omega\}.$$
It is convex and compact under suitable assumptions concerning $F(x,\kappa,j\omega)$, namely, its continuity for $x,\kappa\in\Omega$ and $\omega\in[-\omega_{\max},\omega_{\max}]$.
The complete class theorem
The set $\mathcal S_\Omega$ is rather rich. The following theorem allows for an essential reduction of the search for optimal solutions, not only for the D-optimality criterion, but also for other criteria.
Theorem 14. Consider the subset of $\mathcal S_\Omega$ consisting of densities of the following form:
$$\tilde S_u(x,\kappa,j\omega) = \sum_{k=1}^\infty v_k(x)\,v_k(\kappa)\,\tilde s_k(\omega), \tag{8.19}$$
where $\tilde s_k(\omega)$, $k = 1, 2, \ldots$, are spectral densities of certain univariate signals such that
$$\sum_{k=1}^\infty\int\tilde s_k(\omega)\,d\omega \le 1. \tag{8.20}$$
Then, for each $S_u\in\mathcal S_\Omega$ there exists $\tilde S_u\in\mathcal S_\Omega$ of the form (8.19), (8.20) such that their FIMs are the same, i.e., $M(S_u) = M(\tilde S_u)$.
Note that for densities of the form (8.19) we have
$$\tilde S_u(x,\kappa,j\omega) = \tilde S_u(\kappa,x,j\omega), \qquad \tilde S_u(x,\kappa,j\omega) = \tilde S_u(\kappa,x,-j\omega),$$
and the $\tilde s_k(\omega)$'s have real values.
Proof. Let $S_u\in\mathcal S_\Omega$ be arbitrary, but fixed. As a step toward the proof, consider the following fact. By the orthogonality of the $v_k$'s, substituting (8.17) into (8.16), we obtain
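$$M(S_u) = \sum_{k=1}^\infty\int_{-\infty}^{\infty}\bar W_k(j\omega)\,\bar W_k^{tr}(-j\omega)\,s_k(\omega)\,d\omega, \tag{8.21}$$
where
$$s_k(\omega) \stackrel{\text{def}}{=} \int_\Omega\int_\Omega v_k(x)\,v_k(\kappa)\,S_u(x,\kappa,j\omega)\,dx\,d\kappa. \tag{8.22}$$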
8.4 The equivalence theorem – the FD approach
| 157
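The constraint on the averaged power of $S_u$ translates into
$$\sum_{k=1}^\infty\int_{-\infty}^{\infty}s_k(\omega)\,d\omega \le 1. \tag{8.23}$$
Hence, in order to obtain $\tilde S_u\in\mathcal S_\Omega$ with the same FIM as for $S_u$, it suffices to select $\tilde s_k(\omega)$ in (8.19) as the right-hand side of (8.22), $k = 1, 2, \ldots$. Note that the elements of FIM (8.21) are real numbers. The $s_k(\omega)$'s are real-valued and even in $\omega$, by the symmetry properties of $S_u$ listed after (8.10). Moreover, $\bar W_k(-j\omega)$ is the complex conjugate of $\bar W_k(j\omega)$, since $\nabla_a I_k$ is real. Hence, the imaginary part of the integrand in (8.21) is an odd function of $\omega$ and integrates to zero.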
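The key step of the proof, replacing an arbitrary $S_u$ by its modal projections (8.22), can be checked numerically at one fixed frequency. The Python sketch below assumes $\Omega = (0,1)$, orthonormal modes $v_k(x) = \sqrt2\sin(k\pi x)$, and midpoint quadrature. The spatial kernel $e^{-|x-\kappa|}$ is a hypothetical positive semidefinite cross-spectrum, not an example from the book. For a density of the form (8.19), the projections return the modal weights. For the general kernel, they are nonnegative, and their partial sums do not exceed $\int_\Omega S_u(x,x,j\omega)\,dx$. This is in line with the passage from the power constraint to (8.23).

```python
import math

M_GRID = 100                                   # midpoint nodes on Omega = (0, 1)
XS = [(i + 0.5) / M_GRID for i in range(M_GRID)]

def v(k, x):
    """Orthonormal Dirichlet modes on (0, 1) (an assumption of this sketch)."""
    return math.sqrt(2) * math.sin(k * math.pi * x)

def project(S, k):
    """Midpoint-rule version of (8.22) at one fixed frequency:
    s_k = int int v_k(x) v_k(kappa) S(x, kappa) dx dkappa."""
    vk = [v(k, x) for x in XS]
    total = sum(vk[i] * sum(vk[j] * S[i][j] for j in range(M_GRID)) for i in range(M_GRID))
    return total / M_GRID ** 2

# (a) A density already of the modal form (8.19), frozen at one frequency.
C = {1: 0.6, 2: 0.3, 3: 0.1}
S_MODAL = [[sum(c * v(k, x) * v(k, y) for k, c in C.items()) for y in XS] for x in XS]
recovered = [project(S_MODAL, k) for k in range(1, 5)]

# (b) A non-modal, positive semidefinite spatial cross-spectrum (hypothetical).
S_EXP = [[math.exp(-abs(x - y)) for y in XS] for x in XS]
s_exp = [project(S_EXP, k) for k in range(1, 31)]
trace = sum(S_EXP[i][i] for i in range(M_GRID)) / M_GRID   # int S(x, x) dx = 1

print([round(s, 6) for s in recovered])
print(sum(s_exp), "<=", trace)
```

The midpoint grid is used because sampled sine modes stay exactly orthogonal on it. Case (a) is therefore reproduced to rounding accuracy.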
8.4 The equivalence theorem for input signals – the space-frequency domain approach
Due to Theorem 14, in particular, due to (8.21) and (8.23), the optimization problem of finding the D-optimal spectral density of the input signal is only slightly more difficult than for ODEs.
Theorem 15. Assume that all the elements of $\bar W_k(j\omega)$ are continuous functions of $\omega\in[-\omega_{\max},\omega_{\max}]$, $k = 1, 2, \ldots$.
(1) The D-optimal spectral density of the input signal can be represented in the following form:
$$S_u^*(x,\kappa,j\omega) = \sum_{k=1}^\infty s_k^*(\omega)\,v_k(x)\,v_k(\kappa), \tag{8.24}$$
where
$$\sum_{k=1}^\infty\int_{-\omega_{\max}}^{\omega_{\max}}s_k^*(\omega)\,d\omega = 1. \tag{8.25}$$
(2) It is D-optimal if and only if
$$\max_k\sup_\omega\,\bar W_k^{tr}(-j\omega)\,M^{-1}(S_u^*)\,\bar W_k(j\omega) = r, \tag{8.26}$$
where $r = \dim(\bar a)$, $k = 1, 2, \ldots$ and $\omega\in[-\omega_{\max},\omega_{\max}]$.
Proof. The representation (8.24) is a direct consequence of the previous theorem, while (8.25) follows from the fact that $\det[M(\alpha S_u)] > \det[M(S_u)]$ for $\alpha > 1$.
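To prove (2), let us note (see also Chapter 5) the following. For arbitrary $S_u\in\mathcal S_\Omega$ for which (8.23) holds and $M(S_u)$ is a nonsingular FIM, we have
$$\begin{aligned} r &= \operatorname{trace}[M(S_u)\,M^{-1}(S_u)] = \sum_{k=1}^\infty\int_{-\omega_{\max}}^{\omega_{\max}}\bar W_k^{tr}(-j\omega)\,M^{-1}(S_u)\,\bar W_k(j\omega)\,s_k(\omega)\,d\omega\\ &\le \sum_{k=1}^\infty\sup_\omega\big[\bar W_k^{tr}(-j\omega)\,M^{-1}(S_u)\,\bar W_k(j\omega)\big]\int_{-\omega_{\max}}^{\omega_{\max}}s_k(\omega)\,d\omega\\ &\le \max_k\sup_\omega\,\bar W_k^{tr}(-j\omega)\,M^{-1}(S_u)\,\bar W_k(j\omega), \end{aligned}\tag{8.27}$$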
since all the summands $\bar W_k^{tr}(-j\omega)\,M^{-1}(S_u)\,\bar W_k(j\omega)$ are nonnegative and condition (8.23) holds. On the other hand, let $S_u^*$ be D-optimal and let $\alpha\in(0,1)$. Consider the family of spectral densities of the form
$$S_u^{(\alpha)} = (1-\alpha)\,S_u^* + \alpha\,v_l(x)\,v_l(\kappa)\,\big[\delta(\omega-\omega_0) + \delta(\omega+\omega_0)\big]/2, \tag{8.28}$$
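where $\omega_0\in[-\omega_{\max},\omega_{\max}]$ and $l = 1, 2, \ldots$ are arbitrary. D-optimality of $S_u^*$ implies that
$$\left.\frac{d}{d\alpha}\log\det M(S_u^{(\alpha)})\right|_{\alpha=0} \le 0, \tag{8.29}$$
and on differentiation we obtain
$$\left.\frac{d}{d\alpha}\log\det M(S_u^{(\alpha)})\right|_{\alpha=0} = \bar W_l^{tr}(-j\omega_0)\,M^{-1}(S_u^*)\,\bar W_l(j\omega_0) - r \le 0. \tag{8.30}$$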
Note that (8.30) holds for any $\omega_0\in[-\omega_{\max},\omega_{\max}]$ and $l = 1, 2, \ldots$. Thus, it also holds for the supremum. Combining this fact with (8.27) proves the necessity of (2). Its sufficiency follows from the strict concavity of $\log\det[\cdot]$ on $\mathcal M_\Omega$.
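For the reader's convenience, the differentiation leading to (8.30) can be spelled out. By (8.21), $M(S_u)$ is linear in $S_u$. The derivative then follows from Jacobi's formula $\frac{d}{d\alpha}\log\det(A+\alpha B)\big|_{\alpha=0} = \operatorname{trace}(A^{-1}B)$, a standard identity for nonsingular $A$:
$$M(S_u^{(\alpha)}) = (1-\alpha)\,M(S_u^*) + \alpha\,m_l(\omega_0), \qquad m_l(\omega_0) = \tfrac12\big[\bar W_l(j\omega_0)\,\bar W_l^{tr}(-j\omega_0) + \bar W_l(-j\omega_0)\,\bar W_l^{tr}(j\omega_0)\big],$$
$$\left.\frac{d}{d\alpha}\log\det M(S_u^{(\alpha)})\right|_{\alpha=0} = \operatorname{trace}\big[M^{-1}(S_u^*)\,\big(m_l(\omega_0) - M(S_u^*)\big)\big] = \operatorname{trace}\big[M^{-1}(S_u^*)\,m_l(\omega_0)\big] - r,$$
$$\operatorname{trace}\big[M^{-1}(S_u^*)\,m_l(\omega_0)\big] = \tfrac12\big[\bar W_l^{tr}(-j\omega_0)\,M^{-1}(S_u^*)\,\bar W_l(j\omega_0) + \bar W_l^{tr}(j\omega_0)\,M^{-1}(S_u^*)\,\bar W_l(-j\omega_0)\big] = \bar W_l^{tr}(-j\omega_0)\,M^{-1}(S_u^*)\,\bar W_l(j\omega_0).$$
The last equality holds because the two scalar quadratic forms are transposes of each other and $M^{-1}(S_u^*)$ is symmetric.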
8.5 The structure of the D-optimal input signal
Let us recall that under our assumptions we have
$$\lambda_k(\bar a) = \bar h_k^{tr}\,\bar a, \qquad k = 1, 2, \ldots, \tag{8.31}$$
$$\bar W_k(j\omega) = \mathcal F_t(\nabla_a I_k(t,\bar a)), \qquad \nabla_a I_k(t,\bar a) = \bar h_k\,\rho_k(t,\bar a). \tag{8.32}$$
For the parabolic case we have
$$\dot I_k(t;\bar a) = -\lambda_k(\bar a)\,I_k(t;\bar a) + \delta(t). \tag{8.33}$$
8.5 The structure of the D-optimal input signal | 159
Thus, $\bar W_k(j\omega) = \nabla_a\,(j\omega + \lambda_k(\bar a))^{-1}$, which yields
$$\bar W_k(j\omega) = \Phi_k(j\omega)\,\bar h_k, \tag{8.34}$$
$$\Phi_k(j\omega) = -\big(j\omega + \lambda_k(\bar a)\big)^{-2}. \tag{8.35}$$
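Analogously, for the hyperbolic case we have
$$\ddot I_k(t;\bar a) + \mu\,\dot I_k(t;\bar a) = -\lambda_k(\bar a)\,I_k(t;\bar a) + \delta(t). \tag{8.36}$$
Thus,
$$\bar W_k(j\omega) = \Psi_k(j\omega)\,\bar h_k, \tag{8.37}$$
where
$$\Psi_k(j\omega) = -\big(-\omega^2 + j\omega\mu + \lambda_k(\bar a)\big)^{-2}. \tag{8.38}$$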
Note that in both cases the $\bar W_k(j\omega)$'s have the same structure, namely, a vector multiplied by a scalar-valued function. Define the following frequencies for the parabolic and the hyperbolic case, respectively (the over-braced expression refers to the hyperbolic case):
$$\omega_k^* \stackrel{\text{def}}{=} \arg\max_{\omega\in[-\omega_{\max},\omega_{\max}]}\overbrace{|\Phi_k(j\omega)|^2}^{|\Psi_k(j\omega)|^2}, \qquad k = 1, 2, \ldots. \tag{8.39}$$
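Definition (8.39) can be evaluated by a plain grid search over $[-\omega_{\max},\omega_{\max}]$, using the transfer-function forms (8.35) and (8.38). In the Python sketch below, the values $\lambda_k = 10$, $\mu = 1$, $\omega_{\max} = 10$ and the grid density are illustrative assumptions. The search returns $\omega = 0$ in the parabolic case. In the hyperbolic case it returns $\pm\sqrt{\lambda_k - \mu^2/2}$, in agreement with Corollaries 13 and 14 later in this chapter.

```python
def phi_sq(omega, lam):
    """|Phi_k(j omega)|^2 from (8.35) (parabolic case)."""
    return abs((1j * omega + lam) ** -2) ** 2

def psi_sq(omega, lam, mu):
    """|Psi_k(j omega)|^2 from (8.38) (hyperbolic case)."""
    return abs((-omega ** 2 + 1j * omega * mu + lam) ** -2) ** 2

def argmax_on_grid(f, omega_max, n=20001):
    """Plain grid search for (8.39) over [-omega_max, omega_max]."""
    grid = [-omega_max + 2 * omega_max * i / (n - 1) for i in range(n)]
    return max(grid, key=f)

lam, mu, omega_max = 10.0, 1.0, 10.0     # illustrative values of lambda_k, mu, omega_max
w_par = argmax_on_grid(lambda w: phi_sq(w, lam), omega_max)
w_hyp = argmax_on_grid(lambda w: psi_sq(w, lam, mu), omega_max)
print(w_par, abs(w_hyp))
```

Since $|\Psi_k|^2$ is even in $\omega$, either sign of the hyperbolic maximizer may be returned. Only $|\omega_k^*|$ matters for the input signal.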
The following result provides insight into the structure of $S_u^*$.
Theorem 16. Consider the spectral density of the following form:
$$S_u^*(x,\kappa,j\omega) = \sum_{k=1}^\infty\underbrace{v_k(x)\,v_k(\kappa)}_{\text{single spatial mode}}\cdot\,\Theta_k(\omega), \tag{8.40}$$
where
$$\Theta_k(\omega) \stackrel{\text{def}}{=} \underbrace{\frac{\sigma_k^*}{2}\big[\delta(\omega-\omega_k^*) + \delta(\omega+\omega_k^*)\big]}_{\text{single sine wave}}. \tag{8.41}$$
Let the $\omega_k^*$'s be selected as in (8.39), and let the $\sigma_k^*$'s be selected as the solution of the optimization problem
$$\max_{\sigma_1,\sigma_2,\ldots}\det\Big(\sum_{k=1}^\infty\sigma_k\,\overbrace{|\Phi_k(j\omega_k^*)|^2}^{|\Psi_k(j\omega_k^*)|^2}\,\bar h_k\,\bar h_k^{tr}\Big), \tag{8.42}$$
under the constraints $\sum_{k=1}^\infty\sigma_k\le 1$, $\sigma_k > 0$, $k = 1, 2, \ldots$. Then $S_u^*(x,\kappa,j\omega)$ given by (8.40) is the D-optimal input spectral density function for the parabolic case (the over-brace version corresponds to the hyperbolic case).
Proof. If the $\sigma_k^*$'s solve problem (8.42), then, by the Kiefer–Wolfowitz theorem, we have
$$\max_{k=1,2,\ldots}\Big[\overbrace{|\Phi_k(j\omega_k^*)|^2}^{|\Psi_k(j\omega_k^*)|^2}\,\bar h_k^{tr}\,M^{-1}(S_u^*)\,\bar h_k\Big] = r \tag{8.43}$$
or, equivalently,
$$\max_{k=1,2,\ldots}\max_\omega\Big[\overbrace{|\Phi_k(j\omega)|^2}^{|\Psi_k(j\omega)|^2}\,\bar h_k^{tr}\,M^{-1}(S_u^*)\,\bar h_k\Big] = r, \tag{8.44}$$
where $\omega\in[-\omega_{\max},\omega_{\max}]$. Now, it suffices to observe that (8.44) is the same as condition (8.26) for the D-optimality of $S_u^*$ in Theorem 15, taking into account the formulas for the $\bar W_k$'s provided at the beginning of this section.
The algorithm for calculating $S_u^*$ when the $v_k$'s are known
1. Find the optimal frequencies for exciting each mode by solving the following optimization problems:
$$\omega_k^* = \arg\sup_\omega\overbrace{|\Phi_k(j\omega)|^2}^{|\Psi_k(j\omega)|^2}, \qquad k = 1, 2, \ldots, R, \tag{8.45}$$
where $R = r(r+1)/2$.
2. Calculate the vectors of weights $\bar h_k$, $k = 1, 2, \ldots, R$. This step is easy when the $v_k$'s are known.
3. Solve problem (8.42), which provides the optimal energies $\sigma_1^*, \sigma_2^*, \ldots, \sigma_R^*$ of the modes.
4. Form the D-optimal input signal:
$$u^*(x,t) = \sum_{k=1}^R\sqrt{\sigma_k^*}\,v_k(x)\,\sin(\omega_k^*t + \theta_k), \tag{8.46}$$
where $\theta_k$, $k = 1, 2, \ldots, R$, are independent random variables, uniformly distributed in $[-\pi,\pi]$.
Remarks on calculating $S_u^*$ when the $v_k$'s are known
8.6 The structure of Su∗ | 161
1. Problem (8.42) is the standard D-optimal experiment design task, in which only the weights are optimized. It can be solved using the following algorithms:
– the MWUA, when $R$ is not very large (the typical case),
– the Wynn–Fedorov algorithm with later modifications, when $R$ is large.
Note that here we do not have difficulties with rounding, because the $\sigma_k$'s can be any real numbers.
2. It is worth explaining why we consider only the first $R$ eigenfunctions, instead of trying to select them from the whole infinite sequence. Let us assume that the eigenvalues are sorted as $\lambda_1\le\lambda_2\le\cdots$. Then we can be sure that the first $R$ eigenfunctions are the most informative. For second-order elliptic operators we have $\lambda_k = O(k^2)$, which implies $|\Phi_k(j\omega)|^2 = O(k^{-8})$. The largest elements of the $\bar h_k\,\bar h_k^{tr}$ matrices are of order $O(k^4)$, finally yielding $O(k^{-4})$. For fourth-order operators the information drops even faster.
3. When the $v_k$'s are not known explicitly, they can be found numerically using efficient procedures available, e.g., in MATLAB.
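Steps 1–4 can be sketched for the parabolic system of Example 45 ($r = 2$, $R = 3$, $a_1 = a_2 = 0.1$, $\bar h_k = [\pi^2k^2, 1]^{tr}$). Step 3 is solved by the classic multiplicative weight update $\sigma_k\leftarrow\sigma_k d_k/r$, with $d_k = |\Phi_k(j\omega_k^*)|^2\,\bar h_k^{tr}M^{-1}\bar h_k$. We assume that this is the kind of scheme the abbreviation MWUA refers to. The code is an illustrative Python sketch, not the author's implementation. The final check $\max_k d_k\approx r$ is the Kiefer–Wolfowitz condition (8.43). With these values the third mode receives (numerically) zero weight. This agrees with the statement in Example 45 that exciting the first two modes suffices, and with the decay of information described in remark 2 above.

```python
import math

def information_matrix(sigma, vecs):
    """M = sum_k sigma_k g_k g_k^T for 2-vectors g_k (r = 2)."""
    m11 = sum(s * g[0] * g[0] for s, g in zip(sigma, vecs))
    m12 = sum(s * g[0] * g[1] for s, g in zip(sigma, vecs))
    m22 = sum(s * g[1] * g[1] for s, g in zip(sigma, vecs))
    return m11, m12, m22

def variances(sigma, vecs):
    """d_k = g_k^T M^{-1} g_k, using the explicit 2x2 inverse."""
    m11, m12, m22 = information_matrix(sigma, vecs)
    det = m11 * m22 - m12 * m12
    return [(m22 * g[0] ** 2 - 2 * m12 * g[0] * g[1] + m11 * g[1] ** 2) / det for g in vecs]

def multiplicative_weights(vecs, r=2, iters=500):
    """Multiplicative update sigma_k <- sigma_k d_k / r for D-optimal weights.
    The weights keep summing to 1 because sum_k sigma_k d_k = trace(M^{-1} M) = r."""
    sigma = [1.0 / len(vecs)] * len(vecs)
    for _ in range(iters):
        sigma = [s * d / r for s, d in zip(sigma, variances(sigma, vecs))]
    return sigma

# Parabolic system of Example 45: lambda_k = a1 pi^2 k^2 + a2, a1 = a2 = 0.1, r = 2.
a1, a2, R = 0.1, 0.1, 3                                       # R = r (r + 1) / 2
lams = [a1 * (math.pi * k) ** 2 + a2 for k in range(1, R + 1)]
h = [((math.pi * k) ** 2, 1.0) for k in range(1, R + 1)]      # h_k = grad_a lambda_k
# Step 1 (parabolic case): omega*_k = 0, so |Phi_k(0)|^2 = lambda_k^(-4).
gain = [lam ** -4 for lam in lams]
# g_k g_k^T = |Phi_k(0)|^2 h_k h_k^T, as in the objective of (8.42).
g = [(math.sqrt(c) * hk[0], math.sqrt(c) * hk[1]) for c, hk in zip(gain, h)]
sigma = multiplicative_weights(g)
d = variances(sigma, g)
print([round(s, 4) for s in sigma], round(max(d), 6))

def u_star(x):
    """Step 4, parabolic case (8.47): time-constant input with v_k = sqrt(2) sin(k pi x)."""
    return sum(math.sqrt(s) * math.sqrt(2) * math.sin(k * math.pi * x)
               for k, s in enumerate(sigma, start=1))
```

For $r > 2$ the explicit 2×2 inverse would be replaced by a general linear solver. The update rule itself is unchanged.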
8.6 Insight into the structure of $S_u^*$ and examples
In order to obtain deeper insight into the structure of $S_u^*$, we discuss the cases of parabolic and hyperbolic systems separately.
Corollary 13. For systems of the parabolic type all $\omega_k^* = 0$. Thus, the D-optimal input is constant in time, i.e.,
$$u(x,t) = \sum_{k=1}^R\sqrt{\sigma_k^*}\,v_k(x), \tag{8.47}$$
where the eigenfunctions depend on the type of boundary conditions and on the shape of $\Omega$. Note that the D-optimal input is constant in time regardless of the shape of $\Omega$.
Example 47. Consider the parabolic case of a heated plate. The results of D-optimal heating are shown in the left panel of Fig. 8.1 for boundary conditions (BCs) of the first kind. The right panel corresponds to mixed BCs of the first and second kind.
Corollary 14. For systems of the hyperbolic type,
$$\frac{\partial^2 q(x,t)}{\partial t^2} + \mu\,\frac{\partial q(x,t)}{\partial t} = A_x(\bar a)\,q(x,t) + u(x,t), \qquad x\in\Omega,\ t\in(0,T), \tag{8.48}$$
Figure 8.1: Spatial distribution of heating (constant in time).
where $\mu > 0$ is the damping coefficient, the D-optimal input has the following form:
$$u(x,t) = \sum_{k=1}^R\sqrt{\sigma_k^*}\,v_k(x)\,\sin(\omega_k^*t + \theta_k), \tag{8.49}$$
where the optimal frequencies are given by the following formulas:
$$(\omega_k^*)^2 = \begin{cases}\lambda_k(\bar a) - \mu^2/2, & \text{if } \lambda_k(\bar a) - \mu^2/2 > 0,\\ 0, & \text{otherwise}.\end{cases} \tag{8.50}$$
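For completeness, (8.50) can be obtained from (8.38) by a short calculation. It is sketched below under the assumption that $\sqrt{\lambda_k(\bar a) - \mu^2/2}\le\omega_{\max}$ whenever the square root is real:
$$|\Psi_k(j\omega)|^2 = \frac{1}{\big[(\lambda_k(\bar a) - \omega^2)^2 + \mu^2\omega^2\big]^2},$$
so maximizing $|\Psi_k(j\omega)|^2$ is equivalent to minimizing $g(s) = (\lambda_k(\bar a) - s)^2 + \mu^2 s$ over $s = \omega^2\ge 0$. Since $g$ is convex and
$$g'(s) = -2\big(\lambda_k(\bar a) - s\big) + \mu^2 = 0 \iff s = \lambda_k(\bar a) - \frac{\mu^2}{2},$$
the minimizer is $s^* = \lambda_k(\bar a) - \mu^2/2$ when this number is positive, and $s^* = 0$ otherwise, which is (8.50). The same $s^*$ maximizes the amplitude response $|(\lambda_k(\bar a) - \omega^2 + j\omega\mu)^{-1}|$, i.e., it gives the resonance peak of the damped mode. This is why tuning the input to the observed resonance recovers $\omega_k^*$.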
The eigenfunctions $v_k$ depend on the type of boundary conditions and on the shape of $\Omega$, but not on time $t$. Fig. 8.2 illustrates how complicated spatio-temporal shapes of a D-optimal input signal can arise for hyperbolic systems, even when only two modes are considered. We remark that if $\mu\to 0$, then $\omega_k^*\to\sqrt{\lambda_k(\bar a)}$, which are the resonant frequencies of this system. The optimal frequencies $(\omega_k^*)^2 = \lambda_k(\bar a) - \mu^2/2$ are unknown. In many cases, however, they can be easily determined experimentally by tuning the input signal to the resonant frequency, even if the damping factor $\mu$ is relatively large.
Example 48. We continue Example 43, but this time the elliptic operator is a part of the following system of the hyperbolic type:
$$\frac{\partial^2 q(x,t,\bar a)}{\partial t^2} = a_1\,\frac{\partial^2 q(x,t,\bar a)}{\partial x^2} - a_2\,q(x,t,\bar a) + U(x,t), \qquad x\in(0,1), \tag{8.51}$$
with boundary conditions
$$q(0,t,\bar a) = q(1,t,\bar a) = 0. \tag{8.52}$$
Figure 8.2: The spatio-temporal structure of the optimal input signal for two modes in Example 48.
According to the assumptions in this chapter, we do not specify initial conditions, expecting that their influence has died out in the remote past. We already know that $v_k(x) = \sqrt2\sin(k\pi x)$, $\lambda_k(\bar a) = a_1\pi^2k^2 + a_2$, and $\bar h_k = [k^2\pi^2, 1]^{tr}$. Selecting the nominal values of the parameters as $a_1 = a_2 = 1$ and invoking Corollary 14, we can calculate the frequencies $\omega_1^* = \sqrt2$ and $\omega_2^* = \sqrt5$. We consider only two modes; thus, $\sigma_1^* = \sigma_2^* = 1/2$, and the overall input signal has the form
$$U^*(x,t) = \sqrt{0.5}\,\big[\sin(\pi x)\sin(\sqrt2\,t + \theta_1) + \sin(2\pi x)\sin(\sqrt5\,t + \theta_2)\big],$$
where $\theta_1$ and $\theta_2$ can be selected at random, uniformly in $[0, 2\pi]$ (see Fig. 8.2 for the spatio-temporal shape of this signal).
Example 49. This time, we continue Example 39 from Chapter 6. Consider flexural vibrations of the clamped beam: for $x\in(0,\pi)$,
$$\frac{\partial^2 q(x,t,\bar a)}{\partial t^2} = -a_1\,\frac{\partial^4 q(x,t,\bar a)}{\partial x^4} + a_2\,\frac{\partial^2 q(x,t,\bar a)}{\partial x^2} - a_3\,q(x,t,\bar a) + U(x,t) \tag{8.53}$$
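with boundary conditions
$$q(0,t,\bar a) = q(\pi,t,\bar a) = 0, \qquad \left.\frac{\partial^2 q(x,t,\bar a)}{\partial x^2}\right|_{x=0} = \left.\frac{\partial^2 q(x,t,\bar a)}{\partial x^2}\right|_{x=\pi} = 0. \tag{8.54}$$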
The eigenfunctions and the eigenvalues corresponding to the spatial differential operator in (8.53) are of the form $\sqrt{2/\pi}\,\sin(kx)$ and $a_1k^4 + a_2k^2 + a_3$, respectively, $k = 1, 2, \ldots$. Referring to Corollary 14 and setting $a_1 = a_2 = a_3 = 1$ as the nominal values of the unknown parameters, we obtain the following frequencies of modal excitations: $\omega_1^* = \sqrt3$, $\omega_2^* = \sqrt{21}$, $\omega_3^* = \sqrt{91}$. We consider only three modes; thus, $\sigma_1^* = \sigma_2^* = \sigma_3^* = 1/3$, which leads to the following D-optimal input signal representation:
$$U^*(x,t) = \frac{1}{\sqrt{3\pi}}\Big[\sin(\sqrt3\,t + \theta_1)\sin(x) + \sin(\sqrt{21}\,t + \theta_2)\sin(2x) + \sin(\sqrt{91}\,t + \theta_3)\sin(3x)\Big]. \tag{8.55}$$
Its shape is shown in Fig. 8.3.
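Example 49 can be reproduced in a few lines. The Python sketch below evaluates the eigenvalues and applies (8.50) with $\mu = 0$, since (8.53) has no damping term. It then assembles the modal sum. The function names, the random seed and the phase-drawing routine are illustrative choices. The amplitude factor here follows directly from the orthonormal modes $\sqrt{2/\pi}\sin(kx)$ and $\sigma_k = 1/3$, i.e., it equals $\sqrt{2/(3\pi)}$. A constant factor only rescales the input power, not its spatio-temporal shape.

```python
import math
import random

def beam_eigenvalue(k, a1=1.0, a2=1.0, a3=1.0):
    """lambda_k = a1 k^4 + a2 k^2 + a3 for the modes sqrt(2/pi) sin(k x) on (0, pi)."""
    return a1 * k ** 4 + a2 * k ** 2 + a3

def optimal_frequency(lam, mu=0.0):
    """Formula (8.50); mu = 0 because (8.53) contains no damping term."""
    s = lam - mu ** 2 / 2
    return math.sqrt(s) if s > 0 else 0.0

N_MODES = 3
omegas = [optimal_frequency(beam_eigenvalue(k)) for k in range(1, N_MODES + 1)]
sigmas = [1.0 / N_MODES] * N_MODES                  # equal energies, as in Example 49
rng = random.Random(0)                              # seeded only for reproducibility
thetas = [rng.uniform(0.0, 2 * math.pi) for _ in range(N_MODES)]

def u_beam(x, t):
    """Sum of sqrt(sigma_k) v_k(x) sin(omega_k t + theta_k), v_k = sqrt(2/pi) sin(k x)."""
    return sum(math.sqrt(s) * math.sqrt(2 / math.pi) * math.sin(k * x) * math.sin(w * t + th)
               for k, (s, w, th) in enumerate(zip(sigmas, omegas, thetas), start=1))

print([round(w ** 2, 9) for w in omegas])
```

Sampling `u_beam` on a grid in $x\in(0,\pi)$ and $t$ produces plots of the kind shown in Fig. 8.3.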
Figure 8.3: The shape of a spatio-temporal structure of the optimal input signal for three modes in Example 49.
All the above results carry over smoothly to the following cases:
– when $A_x(\bar a)$ has multiple eigenvalues,
– L- and $L_p$-optimality criteria (the equivalence theorems look different, but the spatio-temporal structures are similar).
Some of the above results can be generalized to LTI systems with unknown, spatially varying coefficients, provided that these coefficients can be approximated by finite linear combinations of known functions.
9 Final comments
In this chapter we present some thoughts on possible extensions of the results presented in Chapters 6–8. In the first section, some suppositions on relaxing the main assumptions made in these chapters are discussed. They are supported by simulation experiments. Then, in Section 9.2, we list the topics that are left outside the scope of this book. Most of them are also open problems. Finally, we briefly indicate the possible limitations of using optimal input signals directly in practice.
9.1 Suppositions about extending the possible applicability of the results
Our aim in this section is to discuss possible extensions of the results contained in this book. Some of them are documented by simulations, while others are stated as suppositions based on general experience. Suppose that $\Omega$ is not expressible as the Cartesian product of lower-dimensional sets corresponding to the spatial variables. Then, strictly speaking, the eigenfunctions of the elliptic operator may depend on the unknown parameters, as illustrated by the following example. On the other hand, this and other numerical examples conducted by the author (not provided here) indicate that the first several eigenfunctions depend only weakly on the parameters. They also indicate that the corresponding eigenvalues depend almost linearly on them. Thus, the results presented in the previous chapters are still applicable, with the status of “approximate” or “sub-optimal” solutions.
Figure 9.1: Nonconvex domain used in Example 50.
https://doi.org/10.1515/9783110351040-009
Example 50. Consider the domain $\Omega$ shown in Fig. 9.1. In this domain we define the operator
$$A_x(a)\,q(x) = \frac{\partial^2 q(x)}{\partial x_1^2} + a\,\frac{\partial^2 q(x)}{\partial x_2^2}, \qquad x\in\Omega, \tag{9.1}$$
with homogeneous boundary conditions of the Dirichlet type, where $a > 0$ is an unknown parameter. The first six eigenvalues and eigenfunctions of this operator were calculated numerically for $a$ ranging from 1 to 4.5. The dependence of the eigenvalues on $a$ is plotted in Fig. 9.2. As one can observe, the first four eigenvalues depend linearly on $a$ over the whole range of its changes. The fifth and sixth eigenvalues are almost linear in $a\in[1,3]$. For larger values of $a$ the departures from linearity are visible, but it seems that linear approximations are still acceptable.
166 | 9 Final comments
Figure 9.2: Eigenvalues vs. the Laplacian parameter in Example 50.
Figure 9.3: Eigenfunctions vs. the Laplacian parameter in the nonconvex domain (see Example 50).
9.2 What is left outside this book? |
167
Figure 9.4: Differences of eigenfunctions vs. the Laplacian parameter.
The first six eigenfunctions of the operator (9.1) with $a = 1$, together with the corresponding eigenvalues, are shown in Fig. 9.3. They are numbered in lexicographical order. The same six eigenfunctions were calculated for $a = 2$. They are not displayed here, since the differences are almost invisible. Instead, the absolute differences between the eigenfunctions for $a = 1$ and $a = 2$ are shown in Fig. 9.4. As one can notice, the differences for the first three eigenfunctions are rather small. The conclusion from this example and other similar simulations is the following: the results contained in Chapters 6–8 can also be used, as approximate solutions, for domains that are not Cartesian products.
9.2 What is left outside this book and open problems
Spatially varying coefficients
Deep results on identifying spatially varying coefficients in PDEs are mentioned in the Introduction. Some of the results contained in this book can be extended to this class of systems, assuming that the spatial variability of these coefficients can be parametrized by a finite set of unknown, but constant, parameters.
PDEs with time-varying coefficients
Undoubtedly, the frequency domain approaches developed in this book cannot be directly extended to such systems. Formally, one can apply variational calculus to systems with time-varying coefficients. Thus, they can be considered in the same way as in Chapter 7, but it seems safer not to apply our results without further investigation.
Quasi-linear PDEs
Systems described by quasi-linear PDEs are frequently regarded as “easy” generalizations of linear PDEs to the nonlinear case. For our problems the situation is different, since the coefficients that contain the unknown parameters depend on the system state, which is time-varying. Thus, the difficulties mentioned above for time-varying coefficients remain in force.
Nonlinear PDEs
The class of nonlinear PDEs is too wide to be discussed precisely here. Certainly, the results based on the frequency domain approach cannot be applied directly. Even linearization does not help in general, since it leads to a linear PDE that generally has time-varying coefficients.
In addition to the remarks made above, the following problems deserve further research.
1. Simultaneous selection of input signals and sensor positions or moving sensor trajectories is still an open problem. References to preliminary results in these directions are collected in the Introduction (at the end of Section 1.4).
2. For simplicity, we assumed observations at every time instant, whereas in practice the data are sampled. The sampling rate is therefore a further factor to be included in the design of the experiment. One may also consider the benefits of sampling instants that are not equidistant. This topic, mentioned in the Introduction, deserves further attention, in particular in order to take into account the highest frequencies of the input and output signals.
3. We confined our attention to open loop excitation signals. This means that the internal structure of a system can be complicated and may include many closed loops, but a direct closed loop between observations of the system state (or output) and the input excitations is excluded. Open loop signals are preferable when: (a) our aim is to estimate parameters of PDEs treated as material constants; (b) we have constraints on the power of the input signals.
However, as we have already mentioned, closed loop experiments can be preferable when there are constraints on the output power. It is worth considering similar approaches to DPS identification. We emphasize once again that in this book we excluded only direct (closed loop) connections between the observed signals and the input signals; the internal structure of a system to be identified can contain (many) feedback connections.
4. Our main topic is D-optimal input signals for DPS identification. Algorithms for efficient system identification are left outside this book; references to these topics are provided in the Introduction.
5. In this book we concentrated mainly on the D-optimality criterion. In most cases, the results can be carried over to the A-, L-, and Lp-optimality criteria, but not to all criteria; T-optimality is a noteworthy exception (see the references in the Introduction).
6. We confined our attention to LTI systems with spatio-temporal dynamics. Extensions of the results to (sub)classes of nonlinear systems would be desirable. Important results on selecting input signals for parameter estimation in finite-dimensional systems of the Wiener and Hammerstein types have already been obtained (see the references in the Introduction). Results on nonparametric or semiparametric identification of such systems and of other sandwich-like ones can be found in [55, 56, 67, 105, 167, 185]. As a next step, one may consider spatio-temporal extensions of Hammerstein and Wiener systems and the selection of input signals for their accurate identification.
7. A further step toward extending the established results would be nonparametric estimation of spatio-temporal systems, in which one must estimate not only constant parameters but also parameters that are functions of other (e.g., spatial) variables.
8. Note that we have stayed within the class of systems with spatio-temporal dynamics that can be described by PDEs with derivatives of positive integer orders. However, in recent years the theory of PDEs and systems with fractional derivatives has been extensively developed. It would be of interest to develop a theory of optimal input signals for estimating parameters in such classes of systems. As possible candidates for optimal input signals, one can consider fractional Brownian motions, which turned out to be useful in determining whether an I/O relationship exists (see [135]).
9. In this monograph, one can observe a tendency that the “wilder” the admissible input signals, the better the attainable parameter estimation accuracy.
So far, we have excluded input signals containing Dirac delta functions from consideration, since they allow for an instantaneous change of the system state (see [147] for a possible application in laser-based additive manufacturing, where a laser is able to change the system state almost instantaneously). However, if such input signals are allowed, one can expect a further gain in parameter estimation accuracy.
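The criteria named in item 5 are all functionals of the information matrix M of the estimated parameters. The following minimal sketch uses their standard textbook definitions: log det M is maximized (D-optimality), while trace M⁻¹ (A), trace(L M⁻¹) for a nonnegative definite weight matrix L (L), and (trace(M⁻ᵖ)/m)^(1/p) (Lp) are minimized. The function names and the two 2 × 2 matrices are hypothetical and merely stand in for the information matrices produced by two candidate input signals. T-optimality is omitted, since it is a criterion for discriminating between rival models rather than a function of M alone.

```python
import numpy as np


def d_criterion(M):
    """log det M (to be maximized); slogdet avoids overflow for large matrices."""
    sign, logdet = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("the information matrix must be positive definite")
    return logdet


def a_criterion(M):
    """trace(M^-1) (to be minimized), proportional to the sum of asymptotic variances."""
    return float(np.trace(np.linalg.inv(M)))


def l_criterion(M, L):
    """trace(L M^-1) (to be minimized) for a nonnegative definite weight matrix L."""
    return float(np.trace(L @ np.linalg.inv(M)))


def lp_criterion(M, p):
    """(trace(M^-p) / m)^(1/p) (to be minimized), p >= 1; p = 1 gives trace(M^-1) / m,
    and for growing p it approaches the largest eigenvalue of M^-1."""
    eig = np.linalg.eigvalsh(M)
    return float(np.mean(eig ** (-float(p))) ** (1.0 / p))


# Two hypothetical 2 x 2 information matrices, e.g. for two candidate inputs.
M1 = np.diag([4.0, 1.0])
M2 = np.array([[2.0, 0.5], [0.5, 2.0]])
for name, M in (("M1", M1), ("M2", M2)):
    print(name, "log det:", round(d_criterion(M), 4),
          "trace M^-1:", round(a_criterion(M), 4),
          "L2-criterion:", round(lp_criterion(M, 2), 4))
```

For these matrices det M1 = 4 > det M2 = 3.75, so the D-criterion prefers M1, whereas trace M1⁻¹ = 1.25 > trace M2⁻¹ ≈ 1.067, so the A-criterion prefers M2: different criteria need not rank candidate inputs in the same way.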
9.3 Concluding remarks
Under several simplifying assumptions, we were able to establish the pleasing result that D-optimal input signals for estimating parameters in systems described by PDEs are sums of excitations, each of which is a product of a natural mode in space and a sinusoid with a natural system frequency in time. Similar results were also provided in the time domain. In both cases, computational algorithms are provided, and many examples document their usefulness.
One may well ask whether these results are directly applicable in practice. Note that our results rely on the assumption that the mathematical model is correct and only a finite number of parameters is unknown. The second factor to be taken into account is that we imposed only a limited number of constraints, which led to rather aggressive input signals that are potentially dangerous to the system to be identified. Thus, such signals can be applied directly only in laboratory experiments under perfectly controlled conditions. Otherwise, it is reasonable to use a mixture of these signals with wideband, persistently exciting signals, so that an inadequate model can be detected. For safety reasons, one should also consider imposing additional constraints on the input and/or output signals. On the other hand, by imposing only minimal constraints on the input signal energy or power, we were able to obtain explicit (or almost explicit) results that can serve as lower bounds on the achievable estimation accuracy under the favorable assumption that observations and excitations are available at every point of the spatial domain. As partially demonstrated in this book, the proposed methodology of obtaining D-optimality conditions and D-optimal input signals can, with additional effort, be extended to include other constraints that may arise in practice.
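The structure of the D-optimal signals summarized above (a sum of products of natural modes in space and sinusoids at natural frequencies in time) and the suggested mixture with a wideband component can be illustrated as follows. This is only a sketch under stated assumptions: it takes, hypothetically, a string on [0, 1] with fixed ends, whose modes are sin(kπx) and whose natural frequencies are ω_k = ckπ; the relative weights 0.5, 0.3, 0.2, the power 0.9 of the modal part and the 10 % share of white noise are illustrative numbers, not outputs of the algorithms in this book. modal_input and mean_power are hypothetical helper names.

```python
import numpy as np


def modal_input(x, t, weights, c=1.0, power=1.0):
    """Hypothetical input u(x, t) = sum_k A_k sin(k pi x) cos(omega_k t), omega_k = c k pi
    (modes and natural frequencies of a string on [0, 1] with fixed ends).
    Since int_0^1 sin^2(k pi x) dx = 1/2 and the time average of cos^2 is 1/2,
    each term contributes A_k^2 / 4 to the mean power; amplitudes are scaled
    so that the total equals `power`. Returns an array of shape (len(x), len(t))."""
    w = np.asarray(weights, dtype=float)
    amps = np.sqrt(4.0 * power * w / w.sum())
    X, T = np.meshgrid(x, t, indexing="ij")
    u = np.zeros_like(X)
    for k, A_k in enumerate(amps, start=1):
        u += A_k * np.sin(k * np.pi * X) * np.cos(c * k * np.pi * T)
    return u


def mean_power(u, x):
    """Time average of the trapezoidal approximation of int_0^1 u(x, t)^2 dx."""
    dx = x[1] - x[0]
    sq = u ** 2
    return float(np.mean(dx * (sq.sum(axis=0) - 0.5 * (sq[0] + sq[-1]))))


x = np.linspace(0.0, 1.0, 201)
t = np.linspace(0.0, 2.0, 400, endpoint=False)  # one common period when c = 1
u_opt = modal_input(x, t, weights=[0.5, 0.3, 0.2], power=0.9)

# Wideband component (white noise on the grid) scaled to a hypothetical power of 0.1.
rng = np.random.default_rng(0)
noise = rng.standard_normal(u_opt.shape)
noise *= np.sqrt(0.1 / mean_power(noise, x))
u_mix = u_opt + noise
print("power of modal part:", round(mean_power(u_opt, x), 6))
print("power of mixture:   ", round(mean_power(u_mix, x), 3))
```

The power of the mixture is close to, but not exactly, 1.0, because the random cross term between the two components does not vanish on a finite grid.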
Bibliography
[1] Sergey Abrashov, Rachid Malti, Xavier Moreau, Mathieu Moze, François Aioun, and Franck Guillemard. Optimal input design for continuous-time system identification. Communications in Nonlinear Science and Numerical Simulation, 60:92–99, 2018.
[2] Kuzman Adzievski and Abul Hasan Siddiqi. Introduction to Partial Differential Equations for Scientists and Engineers Using Mathematica. Chapman and Hall/CRC, 2016.
[3] Mohammed A. Al-Gwaiz. Sturm–Liouville Theory and its Applications, volume 7. Springer, 2008.
[4] Jorge E. Alaña and Constantinos Theodoropoulos. Optimal spatial sampling scheme for parameter estimation of nonlinear distributed parameter systems. Computers & Chemical Engineering, 45:38–49, 2012.
[5] Alen Alexanderian. Optimal experimental design for infinite-dimensional Bayesian inverse problems governed by PDEs: a review. Inverse Problems, 37:043001, 2021.
[6] Mariette Annergren, Christian A. Larsson, Håkan Hjalmarsson, Xavier Bombois, and Bo Wahlberg. Application-oriented input design in system identification: optimal input design for control. IEEE Control Systems Magazine, 37(2):31–56, 2017.
[7] Nakhlé H. Asmar. Partial Differential Equations with Fourier Series and Boundary Value Problems. Courier Dover Publications, 2016.
[8] Anthony Atkinson, Alexander Donev, and Randall Tobias. Optimum Experimental Designs, with SAS, volume 34. Oxford University Press, 2007.
[9] Anthony C. Atkinson and Barbara Bogacka. Compound D- and Ds-optimum designs for determining the order of a chemical reaction. Technometrics, 39(4):347–356, 1997.
[10] Anthony C. Atkinson and Barbara Bogacka. Compound and other optimum designs for systems of nonlinear differential equations arising in chemical kinetics. Chemometrics and Intelligent Laboratory Systems, 61(1):17–33, 2002.
[11] Anthony C. Atkinson and Valerij V. Fedorov. The design of experiments for discriminating between two rival models. Biometrika, 62(1):57–70, 1975.
[12] Anthony C. Atkinson and R. A. Bailey. One hundred years of the design of experiments on and off the pages of Biometrika. Biometrika, 88(1):53–97, 2001.
[13] Corwin L. Atwood. Convergent design sequences, for sufficiently regular optimality criteria. The Annals of Statistics, 4(6):1124–1138, 1976.
[14] Alampallam V. Balakrishnan. Introduction to Optimization Theory in a Hilbert Space, volume 42. Springer Science & Business Media, 2012.
[15] Alampallam V. Balakrishnan. Applications of Mathematics: Applied Functional Analysis, volume 3. Springer Science & Business Media, 2012.
[16] Thomas H. Banks. A Functional Analysis Framework for Modeling, Estimation and Control in Science and Engineering. Chapman and Hall/CRC, 2012.
[17] Thomas H. Banks, James Crowley, and Karl Kunisch. Cubic spline approximation techniques for parameter estimation in distributed systems. IEEE Transactions on Automatic Control, 28(7):773–786, 1983.
[18] Thomas H. Banks and Karl Kunisch. Estimation Techniques for Distributed Parameter Systems. Springer Science & Business Media, 2012.
[19] Thomas H. Banks and P. Daniel Lamm. Estimation of variable coefficients in parabolic distributed systems. IEEE Transactions on Automatic Control, 30(4):386–398, 1985.
[20] Dimitri P. Bertsekas. Convex Optimization Theory. Athena Scientific, Belmont, 2009.
[21] Dimitri P. Bertsekas. Convex Optimization Algorithms. Athena Scientific, Belmont, 2015.
[22] Patrick Billingsley. Convergence of Probability Measures. John Wiley & Sons, 2013.
https://doi.org/10.1515/9783110351040-010
[23] Barbara Bogacka, Maciej Patan, Patrick J. Johnson, Kuresh Youdim, and Anthony C. Atkinson. Optimum design of experiments for enzyme inhibition kinetic models. Journal of Biopharmaceutical Statistics, 21(3):555–572, 2011.
[24] Xavier Bombois, Gérard Scorletti, Michel Gevers, Paul M. J. Van den Hof, and Roland Hildebrand. Least costly identification experiment for control. Automatica, 42(10):1651–1662, 2006.
[25] Haim Brezis. Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer Science & Business Media, 2010.
[26] Eduardo Casas and Karl Kunisch. Parabolic control problems in space-time measure spaces. ESAIM: Control, Optimisation and Calculus of Variations, 22(2):355–370, 2016.
[27] Eduardo Casas and Karl Kunisch. Optimal control of the two-dimensional stationary Navier–Stokes equations with measure valued controls. SIAM Journal on Control and Optimization, 57(2):1328–1354, 2019.
[28] Guy Chavent and Karl Kunisch. The output least squares identifiability of the diffusion coefficient from an H1-observation in a 2-D elliptic equation. ESAIM: Control, Optimisation and Calculus of Variations, 8:423–440, 2002.
[29] Kuang F. Cheng and Pi E. Lin. Nonparametric estimation of a regression function. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 57:223–233, 1981.
[30] Christian Clason, Karl Kunisch, and Philip Trautmann. Optimal control of the principal coefficient in a scalar wave equation. Applied Mathematics & Optimization, 84(3):2889–2921, 2021.
[31] Noel Cressie and Christopher K. Wikle. Statistics for Spatio-Temporal Data. John Wiley & Sons, 2015.
[32] Ruth F. Curtain and Hans Zwart. An Introduction to Infinite-Dimensional Linear Systems Theory, volume 21. Springer Science & Business Media, 2012.
[33] Giuseppe Da Prato and Jerzy Zabczyk. Stochastic Equations in Infinite Dimensions. Cambridge University Press, 2014.
[34] Alexander De Cock, Michel Gevers, and Johan Schoukens. D-optimal input design for nonlinear FIR-type systems: a dispersion-based approach. Automatica, 73:88–100, 2016.
[35] Domenico De Tommasi, D. Ferri, G. C. Marano, and G. Puglisi. Material parameters identification and experimental validation of damage models for rubberlike materials. European Polymer Journal, 78:302–313, 2016.
[36] Holger Dette, Viatcheslav B. Melas, Petr Shpilev, et al. T-optimal designs for discrimination between two polynomial models. The Annals of Statistics, 40(1):188–205, 2012.
[37] Holger Dette and Ingo Röder. Optimal product designs for multivariate regression with missing terms. Scandinavian Journal of Statistics, 23(2):195–208, 1996.
[38] Holger Dette and William J. Studden. The Theory of Canonical Moments with Applications in Statistics, Probability, and Analysis, volume 338. John Wiley & Sons, 1997.
[39] Jie Dong, Qiang Wang, Mengyuan Wang, and Kaixiang Peng. Data-driven quality monitoring techniques for distributed parameter systems with application to hot-rolled strip laminar cooling process. IEEE Access, 6:16646–16654, 2018.
[40] Xunde Dong, Cong Wang, Qigui Yang, and Wenjie Si. System identification of distributed parameter system with recurrent trajectory via deterministic learning and interpolation. Nonlinear Dynamics, 95(1):73–86, 2019.
[41] Belmiro P. M. Duarte, Anthony C. Atkinson, José F. O. Granjo, and Nuno M. C. Oliveira. Optimal design of mixture experiments for general blending models. Chemometrics and Intelligent Laboratory Systems, 217:104400, 2021.
[42] Dean G. Duffy. Green’s Functions with Applications. Chapman and Hall/CRC, 2015.
[43] Peter Eykhoff. System Identification: Parameter and State Estimation, 1974.
[44] Donald W. Fausett and Charles T. Fulton. Large least squares problems involving Kronecker products. SIAM Journal on Matrix Analysis and Applications, 15(1):219–227, 1994.
[45] Donald W. Fausett and Hany Hashish. Overview of QR methods for large least squares problems involving Kronecker products. In Proceedings of the Third International Colloquium on Numerical Analysis, pages 71–80. De Gruyter, 2020.
[46] Valerii V. Fedorov. Theory of Optimal Experiments. Elsevier, 1972.
[47] Valerii V. Fedorov and Sergei L. Leonov. Optimal Design for Nonlinear Response Models. CRC Press, 2013.
[48] Johan Fellman. On the allocation of linear observations. Commentationes Physico-Mathematicae, 44(2–3):27–78, 1974.
[49] Lenka Filová and Radoslav Harman. Cocktail algorithm for A-optimal designs. PROBASTAT 2015, 7:24, 2015.
[50] D. Franke. Adaptive and robustness properties of certain bilinear distributed parameter control systems. In Distributed Parameter Control Systems: Theory and Application, chapter 6, pages 179–212, 1982.
[51] Th. Gasser and H. G. Müller. Kernel estimation of regression functions. In Th. Gasser and M. Rosenblatt, editors, Smoothing Techniques for Curve Estimation, pages 23–68. Springer, Heidelberg, 1979.
[52] Laszlo Gerencser, Håkan Hjalmarsson, and Lirong Huang. Adaptive experiment design for LTI systems. IEEE Transactions on Automatic Control, 62(5):2390–2405, May 2017.
[53] Graham Goodwin and Robert Payne. Dynamic System Identification: Experiment Design and Data Analysis. Academic Press, 1977.
[54] Graham Goodwin, Martin Zarrop, and Robert Payne. Coupled design of test signals, sampling intervals, and filters for system identification. IEEE Transactions on Automatic Control, 19(6):748–752, 1974.
[55] Włodzimierz Greblicki and Mirosław Pawlak. Identification of discrete Hammerstein systems using kernel regression estimates. IEEE Transactions on Automatic Control, 31(1):74–77, 1986.
[56] Włodzimierz Greblicki and Mirosław Pawlak. Nonparametric System Identification, volume 1. Cambridge University Press, Cambridge, 2008.
[57] Włodzimierz Greblicki, Mirosław Pawlak, and Adam Krzyżak. Distribution-free pointwise consistency of kernel regression estimate. The Annals of Statistics, 12:1570–1575, 1984.
[58] Michael D. Greenberg. Applications of Green’s Functions in Science and Engineering. Courier Dover Publications, 2015.
[59] John Gregory. Constrained Optimization in the Calculus of Variations and Optimal Control Theory. Chapman and Hall/CRC, 2018.
[60] Saurav Gupta, Ajit Kumar Sahoo, and Upendra Kumar Sahoo. Volterra and Wiener model based temporally and spatio-temporally coupled nonlinear system identification: a synthesized review. IETE Technical Review, 38(3):303–327, 2021.
[61] Karl E. Gustafson. Introduction to Partial Differential Equations and Hilbert Space Methods. Courier Corporation, 2012.
[62] László Györfi, Michael Kohler, Adam Krzyżak, and Harro Walk. A Distribution-Free Theory of Nonparametric Regression, volume 1. Springer, 2002.
[63] Per Hägg, Christian A. Larsson, Afrooz Ebadat, Bo Wahlberg, and Håkan Hjalmarsson. Input signal generation for constrained multiple-input multiple-output systems. In The 19th IFAC World Congress, Cape Town, South Africa, 2014.
[64] Peter Hall. Integrated square error properties of kernel estimators of regression functions. The Annals of Statistics, 12:241–260, 1984.
[65] Radoslav Harman, Lenka Filová, and Peter Richtárik. A randomized exchange algorithm for computing optimal approximate designs of experiments. Journal of the American Statistical Association, 115(529):348–361, 2020.
[66] Radoslav Harman and Luc Pronzato. Improvements on removing nonoptimal support points in D-optimum design algorithms. Statistics & Probability Letters, 77(1):90–94, 2007.
[67] Zygmunt Hasiewicz, Mirosław Pawlak, and Przemysław Śliwiński. Nonparametric identification of nonlinearities in block-oriented systems by orthogonal wavelets with compact support. IEEE Transactions on Circuits and Systems I: Regular Papers, 52(2):427–442, 2005.
[68] Roland Herzog, Ilka Riedel, and Dariusz Uciński. Optimal sensor placement for joint parameter and state estimation problems in large-scale dynamical systems with applications to thermo-mechanics. Optimization and Engineering, 19(3):591–627, 2018.
[69] Håkan Hjalmarsson and Henrik Jansson. Closed loop experiment design for linear time invariant dynamical systems via LMIs. Automatica, 44(3):623–636, 2008.
[70] Håkan Hjalmarsson, Jonas Mårtensson, and Bo Wahlberg. On some robustness issues in input design. IFAC Proceedings Volumes, 39(1):511–516, 2006.
[71] Paul G. Hoel. Minimax designs in two dimensional regression. The Annals of Mathematical Statistics, 36(4):1097–1106, 1965.
[72] John K. Hunter and Bruno Nachtergaele. Applied Analysis. World Scientific Publishing Company, 2001.
[73] Siddharth Joshi and Stephen Boyd. Sensor selection via convex optimization. IEEE Transactions on Signal Processing, 57(2):451–462, 2008.
[74] T. Kadota. Examples of optimum detection of Gaussian signals and interpretation of white noise. IEEE Transactions on Information Theory, 14(5):725–734, 1968.
[75] T. T. Kadota. Integral equation for simultaneous diagonalization of two covariance kernels. The Bell System Technical Journal, 46(5):883–892, 1967.
[76] Robert E. Kalaba and Karl Spingarn. Optimal input system identification for nonlinear dynamic systems. Journal of Optimization Theory and Applications, 21(1):91–102, 1977.
[77] Samuel Karlin and William Studden. Tchebycheff Systems: With Applications in Analysis and Statistics. Interscience, New York, 1966.
[78] Samuel Karlin and William J. Studden. Tchebycheff Systems: With Applications in Analysis and Statistics (translated into Russian). Nauka, Moscow, 1976.
[79] Alexander Y. Khapalov. Mobile Point Sensors and Actuators in the Controllability Theory of Partial Differential Equations. Springer, 2017.
[80] Jack Kiefer and Jacob Wolfowitz. The equivalence of two extremum problems. Canadian Journal of Mathematics, 12(5):363–365, 1960.
[81] Jerzy Klamka. Approximate controllability of second order dynamical systems. Applied Mathematics and Computer Science, 2(1):135–146, 1992.
[82] Jerzy Klamka. Controllability of dynamical systems. A survey. Bulletin of the Polish Academy of Sciences: Technical Sciences, 61(2):335–342, 2013.
[83] Kazumasa Kono. Optimum design for quadratic regression on k-cube. Memoirs of the Faculty of Science, Kyushu University. Series A, Mathematics, 16(2):114–122, 1962.
[84] Józef Korbicz. State estimation in discrete-time distributed parameter systems under incomplete priori information about the system. Kybernetika, 21(6):470–479, 1985.
[85] Józef Korbicz and Dariusz Uciński. Sensor allocation for state and parameter estimation of distributed systems. In Discrete Structural Optimization, pages 178–189. Springer, 1994.
[86] Józef Korbicz, Marcin Witczak, and Vicenc Puig. LMI-based strategies for designing observers and unknown input observers for non-linear discrete-time systems. Bulletin of the Polish Academy of Sciences: Technical Sciences, 55(1):31–42, 2007.
[87] Józef Korbicz, M. Z. Zgurovsky, and A. N. Novikov. Suboptimal sensors location in the state estimation problem for stochastic non-linear distributed parameter systems. International Journal of Systems Science, 19(9):1871–1882, 1988.
[88] Andrzej Krolikowski and Peter Eykhoff. Input signal design for system identification: a comparative analysis. IFAC Proceedings Volumes, 18(5):915–920, 1985.
[89] Carlos S. Kubrusly. Distributed parameter system identification: a survey. International Journal of Control, 26(4):509–535, 1977.
[90] B. Kuszta and N. K. Sinha. Design of optimal input signals for the identification of distributed parameter systems. International Journal of Systems Science, 9(1):1–7, 1978.
[91] Tom Lahmer and Ewaryst Rafajłowicz. On the optimality of harmonic excitation as input signals for the characterization of parameters in coupled piezoelectric and poroelastic problems. Mechanical Systems and Signal Processing, 90:399–418, 2017.
[92] Irena Lasiecka. Mathematical Control of Coupled PDEs, volume 75. SIAM, 2002.
[93] Irena Lasiecka and Roberto Triggiani. Control Theory for Partial Differential Equations: Volume 1, Abstract Parabolic Systems: Continuous and Approximation Theories, volume 1. Cambridge University Press, 2000.
[94] Han-Xiong Li and Chenkun Qi. Modeling of distributed parameter systems for applications – a synthesized review from time–space separation. Journal of Process Control, 20(8):891–901, 2010.
[95] Yong B. Lim and W. J. Studden. Efficient Ds-optimal designs for multivariate polynomial regression on the q-cube. The Annals of Statistics, 16(3):1225–1240, 1988.
[96] Lennart Ljung. System Identification: Theory for the User. 11. Print. Prentice-Hall, Upper Saddle River, NJ, 2009.
[97] David G. Luenberger. Optimization by Vector Space Methods. John Wiley & Sons, 1997.
[98] Kaushik Mahata, Johan Schoukens, and Alexander De Cock. Information matrix and D-optimal design with Gaussian inputs for Wiener model identification. Automatica, 69:65–77, 2016.
[99] Raman Mehra. Optimal input signals for parameter estimation in dynamic systems – survey and new results. IEEE Transactions on Automatic Control, 19(6):753–768, 1974.
[100] Raman Mehra. Optimal inputs for linear system identification. IEEE Transactions on Automatic Control, 19(3):192–200, 1974.
[101] Viatcheslav Melas. Functional Approach to Optimal Experimental Design, volume 184. Springer Science & Business Media, 2006.
[102] Monidipa Das and Soumya K. Ghosh. Data-driven approaches for spatio-temporal analysis: a survey of the state-of-the-arts. Journal of Computer Science and Technology, 35:665–696, 2020.
[103] Hans-Georg Müller. Optimal designs for nonparametric kernel regression. Statistics & Probability Letters, 2(5):285–290, 1984.
[104] Werner G. Müller. Collecting Spatial Data: Optimum Design of Experiments for Random Fields. Springer Science & Business Media, 2007.
[105] Grzegorz Mzyk. Combined Parametric–Nonparametric Identification of Block-Oriented Systems, volume 238. Springer, 2014.
[106] Wolfgang Näther. Effective Observation of Random Fields, volume 72. Collets, 1985.
[107] J. H. O’Geran and H. P. Wynn. Multi-stage step-wise group screening. Communications in Statistics – Theory and Methods, 21(11):3291–3308, 1992.
[108] Karol R. Opara and Jarosław Arabas. Differential evolution: a survey of theoretical analyses. Swarm and Evolutionary Computation, 44:546–558, 2019.
[109] Athanasios Papoulis. Probability, Random Variables and Stochastic Processes. McGraw-Hill, New York, 1989.
[110] Maciej Patan. Distributed scheduling of sensor networks for identification of spatio-temporal processes. International Journal of Applied Mathematics and Computer Science, 22(2):299–311, 2012.
[111] Maciej Patan. Optimal Sensor Networks Scheduling in Identification of Distributed Parameter Systems, volume 425. Springer Science & Business Media, 2012.
[112] Maciej Patan and Barbara Bogacka. Optimum group designs for random-effects nonlinear dynamic processes. Chemometrics and Intelligent Laboratory Systems, 101(2):73–86, 2010.
[113] Andrej Pázman. Foundations of Optimum Experimental Design, volume 14. Springer, 1986.
[114] Andrej Pázman. Nonlinear Statistical Models, volume 254. Springer Science & Business Media, 2013.
[115] Michael P. Polis and Raymond E. Goodson. Parameter identification in distributed systems: a synthesizing overview. Proceedings of the IEEE, 64(1):45–61, 1976.
[116] A. Ponce de Leon and Anthony C. Atkinson. Optimum experimental design for discriminating between two rival models in the presence of prior information. Biometrika, 78(3):601–608, 1991.
[117] Anthony J. Pritchard and Jerzy Zabczyk. Stability and stabilizability of infinite-dimensional systems. SIAM Review, 23(1):25–52, 1981.
[118] Luc Pronzato and Andrej Pázman. Design of Experiments in Nonlinear Models, volume 212. Springer, 2013.
[119] Luc Pronzato and Anatoly A. Zhigljavsky. Algorithmic construction of optimal designs on compact sets for concave and differentiable criteria. Journal of Statistical Planning and Inference, 154:141–155, 2014.
[120] Friedrich Pukelsheim. Optimal Experimental Design. Wiley, New York, 1993.
[121] Ewaryst Rafajłowicz. Optimal input signals for parameter estimation in linear distributed parameter systems. International Journal of Systems Science, 13(7):799–808, 1982.
[122] Ewaryst Rafajłowicz. Optimal experiment design for identification of linear distributed-parameter systems: frequency domain approach. IEEE Transactions on Automatic Control, 28(7):806–808, 1983.
[123] Ewaryst Rafajłowicz. Optimization of measurements for state estimation in parabolic distributed systems. Kybernetika, 20(5):413–422, 1984.
[124] Ewaryst Rafajłowicz. Optimum experiment design for parameter identification in distributed systems: brief survey and new results. IFAC Proceedings Volumes, 17(2):747–751, 1984.
[125] Ewaryst Rafajłowicz. Unbounded power input signals in optimum experiment design for parameter estimation in linear systems. International Journal of Control, 40(2):383–391, 1984.
[126] Ewaryst Rafajłowicz. Optimum choice of moving sensor trajectories for distributed-parameter system identification. International Journal of Control, 43(5):1441–1451, 1986.
[127] Ewaryst Rafajłowicz. Sequential identification algorithm and controller choice for a certain class of distributed systems. Kybernetika, 22(6):471–486, 1986.
[128] Ewaryst Rafajłowicz. Nonparametric orthogonal series estimators of regression: a class attaining the optimal convergence rate in L2. Statistics & Probability Letters, 5(3):219–224, 1987.
[129] Ewaryst Rafajłowicz. Splitting of the least squares identification algorithm for distributed systems – case study. Systems Analysis, Modelling, Simulation, 4(6):531–540, 1988.
[130] Ewaryst Rafajłowicz. Reduction of distributed system identification complexity using intelligent sensors. International Journal of Control, 50(5):1571–1576, 1989.
[131] Ewaryst Rafajłowicz. Time-domain optimization of input signals for distributed-parameter systems identification. Journal of Optimization Theory and Applications, 60(1):67–79, 1989.
[132] Ewaryst Rafajłowicz. Algorithms of Experiment Design with Implementations in the Mathematica Environment (in Polish). Academic Publisher PLJ, Warsaw, 1996.
[133] Ewaryst Rafajłowicz. Selective random search for optimal experiment designs. In Anthony Atkinson, Luc Pronzato, and Henry Wynn, editors, MODA 5 – Advances in Model-Oriented Data Analysis and Experimental Design, pages 75–83. Springer, 1998.
[134] Ewaryst Rafajłowicz. Repeated least squares with inversion and its application in identifying linear distributed-parameter systems. International Journal of Systems Science, 31(8):1003–1010, 2000.
[135] Ewaryst Rafajłowicz. Testing (non-)existence of input-output relationships by estimating fractal dimensions. IEEE Transactions on Signal Processing, 52(11):3151–3159, 2004.
[136] Ewaryst Rafajłowicz. Testing homogeneity of coefficients in distributed systems with application to quality monitoring. IEEE Transactions on Control Systems Technology, 16(2):314–321, 2008.
[137] Ewaryst Rafajłowicz and Wojciech Myszka. Computational algorithm for input-signal optimization in distributed-parameter systems identification. International Journal of Systems Science, 17(6):911–924, 1986.
[138] Ewaryst Rafajłowicz and Wojciech Myszka. Optimum experimental design for a regression on a hypercube – generalization of Hoel’s result. Annals of the Institute of Statistical Mathematics, 40(4):821–827, 1988.
[139] Ewaryst Rafajłowicz and Wojciech Myszka. Computational algorithm for generating optimum experimental designs on a hypercube. In Stochastic Methods in Experimental Sciences: Proceedings of the 1989 COSMEX Meeting, pages 394–406. World Scientific, 1990.
[140] Ewaryst Rafajłowicz and Wojciech Myszka. When product type experimental design is optimal? Brief survey and new results. Metrika, 39(1):321–333, 1992.
[141] Ewaryst Rafajłowicz and Wojciech Myszka. Efficient algorithm for a class of least squares estimation problems. IEEE Transactions on Automatic Control, 39(6):1241–1243, 1994.
[142] Ewaryst Rafajłowicz and Wojciech Rafajłowicz. Sensors’ allocation for estimating scalar fields by wireless sensor networks. In 2010 Fifth International Conference on Broadband and Biomedical Communications, pages 1–6. IEEE, 2010.
[143] Ewaryst Rafajłowicz and Wojciech Rafajłowicz. D-optimum input signals for systems with spatio-temporal dynamics. In MODA 10 – Advances in Model-Oriented Design and Analysis, pages 219–227. Springer Verlag, Heidelberg, New York, 2013.
[144] Ewaryst Rafajłowicz and Rainer Schwabe. Equidistributed designs in nonparametric regression. Statistica Sinica, 13(1):129–142, 2003.
[145] Ewaryst Rafajłowicz and Rainer Schwabe. Halton and Hammersley sequences in multivariate nonparametric regression. Statistics & Probability Letters, 76(8):803–812, 2006.
[146] Wojciech Rafajłowicz. Learning Decision Sequences for Repetitive Processes – Selected Algorithms (in print). Springer Science & Business Media, 2021.
[147] Wojciech Rafajłowicz, Piotr Jurewicz, Jacek Reiner, and Ewaryst Rafajłowicz. Iterative learning of optimal control for nonlinear processes with applications to laser additive manufacturing. IEEE Transactions on Control Systems Technology, 27(6):2647–2654, 2018.
[148] Fred W. Ramirez. Identification for distributed parameter chemical engineering systems – fixed-bed tubular reactors. In Distributed Parameter Control Systems – Theory and Application. Pergamon Press Ltd., 1982.
[149] Radhakrishna C. Rao. Linear Statistical Inference and Its Applications, volume 2. Wiley, New York, 1973.
[150] Radhakrishna C. Rao and Helge Toutenburg. Linear Models: Least Squares and Alternatives. Springer, 1999.
[151] Michael Renardy and Robert C. Rogers. An Introduction to Partial Differential Equations, volume 13. Springer Science & Business Media, 2006.
[152] James C. Robinson. Linear partial differential equations, 2010.
[153] Cristian R. Rojas, Jonas Mårtensson, and Håkan Hjalmarsson. A tutorial on applications-oriented optimal experiment design. In Identification for Automotive Systems, pages 149–164, 2012.
[154] Leszek Rutkowski. On system identification by nonparametric function fitting. IEEE Transactions on Automatic Control, AC-27:225–227, 1982.
[155] Leszek Rutkowski and Ewaryst Rafajłowicz. On optimal global rate of convergence of some nonparametric identification procedures. IEEE Transactions on Automatic Control, AC-34:1089–1092, 1989.
178 | Bibliography
[156] Jerome Sacks and Donald Ylvisaker. Designs for regression problems with correlated errors. The Annals of Mathematical Statistics, 37(1):66–89, 1966.
[157] Klaus Schittkowski. Numerical Data Fitting in Dynamical Systems: A Practical Introduction with Applications and Software, volume 77. Springer Science & Business Media, 2002.
[158] Johan Schoukens and Rik Pintelon. Identification of Linear Systems: A Practical Guideline to Accurate Modeling. Elsevier, 2014.
[159] Rainer Schwabe. Optimum Designs for Multi-Factor Models, volume 113. Springer Science & Business Media, 1996/2012.
[160] Rainer Schwabe. Maximin efficient designs. Another view at D-optimality. Statistics & Probability Letters, 35(2):109–114, 1997.
[161] Rainer Schwabe and Werner Wierich. D-optimal designs of experiments with non-interacting factors. Journal of Statistical Planning and Inference, 44(3):371–384, 1995.
[162] Manohar Shamaiah, Siddhartha Banerjee, and Haris Vikalo. Greedy sensor selection: leveraging submodularity. In 49th IEEE Conference on Decision and Control (CDC), pages 2572–2577. IEEE, 2010.
[163] Wen-Jing Shen and Han-Xiong Li. A sensitivity-based group-wise parameter identification algorithm for the electric model of Li-ion battery. IEEE Access, 5:4377–4387, 2017.
[164] Samuel Silvey. Optimal Design: An Introduction to the Theory for Parameter Estimation, volume 1. Springer Science & Business Media, 2013.
[165] Samuel Silvey, Michael D. Titterington, and Ben Torsney. An algorithm for optimal designs on a design space. Communications in Statistics – Theory and Methods, 7(14):1379–1389, 1978.
[166] Ewa Skubalska-Rafajłowicz and Ewaryst Rafajłowicz. Searching for optimal experimental designs using space-filling curves. Applied Mathematics and Computer Science, 8(3):647–656, 1998.
[167] Przemysław Sliwinski. Nonlinear System Identification by Haar Wavelets, volume 210. Springer Science & Business Media, 2012.
[168] Torsten Söderström and Petre Stoica. System Identification. Prentice-Hall International, 1989.
[169] Eduardo D. Sontag. Mathematical Control Theory: Deterministic Finite Dimensional Systems, volume 6. Springer Science & Business Media, 2013.
[170] Karl Spingarn and Robert Kalaba. Control, Identification, and Input Optimization. Plenum Press, New York, NY, 1982; Springer Science & Business Media, 2012.
[171] Ben Torsney and Raúl Martín-Martín. Multiplicative algorithms for computing optimum designs. Journal of Statistical Planning and Inference, 139(12):3947–3961, 2009.
[172] Christophe Tricaud and YangQuan Chen. Optimal Mobile Sensing and Actuation Policies in Cyber-Physical Systems. Springer Science & Business Media, 2011.
[173] Francesco Giacomo Tricomi. Integral Equations, volume 5. Courier Corporation, 1985.
[174] Aslak Tveito and Ragnar Winther. Introduction to Partial Differential Equations: A Computational Approach, volume 29. Springer Science & Business Media, 2004.
[175] Spyros G. Tzafestas. Distributed Parameter Control Systems: Theory and Application, volume 6. Elsevier, 2013.
[176] Spyros G. Tzafestas and Peter Stavroulakis. Recent advances in the study of distributed parameter systems. Journal of the Franklin Institute, 315(5–6):285–305, 1983.
[177] Dariusz Uciński. Optimal Measurement Methods for Distributed Parameter System Identification. CRC Press, Boca Raton, 2005.
[178] Dariusz Uciński. D-optimal sensor selection in the presence of correlated measurement noise. Measurement, 164:107873, 2020.
[179] Dariusz Uciński and Anthony C. Atkinson. Experimental design for time-dependent models with correlated observations. Studies in Nonlinear Dynamics & Econometrics, 8(2), 2004.
[180] Dariusz Uciński and Barbara Bogacka. Construction of T-optimum designs for multiresponse dynamic models. In Compstat, pages 267–272. Springer, 2002.
[181] Dariusz Uciński and Barbara Bogacka. T-optimum designs for discrimination between two multiresponse dynamic models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(1):3–18, 2005.
[182] Dariusz Uciński and Józef Korbicz. Parameter identification of two-dimensional distributed systems. International Journal of Systems Science, 21(12):2441–2456, 1990.
[183] Dariusz Uciński and Maciej Patan. D-optimal design of a monitoring network for parameter estimation of distributed systems. Journal of Global Optimization, 39(2):291–322, 2007.
[184] Dariusz Uciński and Maciej Patan. Sensor network design for the estimation of spatially distributed processes. International Journal of Applied Mathematics and Computer Science, 20(3):459–481, 2010.
[185] Paweł Wachel. Convex aggregative modelling of infinite memory nonlinear systems. International Journal of Control, 89(8):1613–1621, 2016.
[186] Eric Walter and Luc Pronzato. Identification of Parametric Models from Experimental Data. Springer Verlag, 1997.
[187] Katrin Weigmann. The consequence of errors. EMBO Reports, 6(4):306–309, 2005.
[188] Werner Wierich. On optimal designs and complete class theorems for experiments with continuous and discrete factors of influence. Journal of Statistical Planning and Inference, 15:19–27, 1986.
[189] Chien-Fu Wu and Henry P. Wynn. The convergence of general step-length algorithms for regular optimum design criteria. The Annals of Statistics, 6(6):1273–1285, 1978.
[190] Henry P. Wynn. The sequential generation of D-optimum experimental designs. The Annals of Mathematical Statistics, 41(5):1655–1664, 1970.
[191] Masatoshi Yoshida, Ravin Hanutsaha, and Shigeru Matsumoto. Parameter identification for a parabolic distributed parameter system using the finite integral transform technique. Journal of Chemical Engineering of Japan, 29(2):386–389, 1996.
[192] Yaming Yu. Monotonic convergence of a general algorithm for computing optimal designs. The Annals of Statistics, 38(3):1593–1606, 2010.
[193] Yaming Yu. D-optimal designs via a cocktail algorithm. Statistics and Computing, 21(4):475–481, 2011.
[194] Jerzy Zabczyk. Mathematical Control Theory: An Introduction. Springer Science & Business Media, 2009.
[195] Martin Zarrop. Optimal Experimental Design for Dynamic System Identification, volume 21. Springer Verlag, Berlin, 1981.
[196] Tomasz Zawada. Simultaneous estimation of heat transfer coefficient and thermal conductivity with application to microelectronic materials. Microelectronics Journal, 37(4):340–352, 2006.
Index
attainable information matrices 32 – compactness 32 – convexity 32 D-optimal sensors’ allocation 41 – double LSM 78 – eigenvalue estimation 78 – for estimating sources 78 – Kronecker product 77 – loss of the efficiency 42 – rectangular plate 76 – vibrating beam 42 – vibrating systems 42 D-optimality for elliptic type PDEs 124 – characterization of the solution 126 – constraints 128 – exact solution 129 – FIM 125, 128 – Green’s function 125, 127 – observations 125 – one parameter 129 – open problem 131 – optimality conditions 126 – problem statement 125 – the equivalence theorem 128 D-optimality for elliptic type PDEs assumptions 125 design of experiment see experiment design discrete experiment designs 49 discrete vs approximate designs 49 DOE see experiment design experiment design 11 – A-optimality 27 – approximate designs 29 – attainable information matrices 31 – convexity 31 – D-optimal design for polynomial regression 37 – D-optimal sensors’ allocation 40 – D-optimality 26 – design weights 30 – E-optimality 27 – equivalence of D- and G-optimality 34 – equivalence theorem 34 – exact designs 29 – for control systems 11 – for DPS 12, 151 – for DPS – frequency domain 12
– for nonparametric regression 11 – for ODE 11 – for ODEs – in time domain 11 – for PDEs 12 – for PDEs – frequency domain 12 – for PDEs – time domain 12 – for random fields 11 – for regression function 11 – G-optimality 28 – how to compare experiments 26 – in frequency domain 11 – L-optimality 27 – Lp-optimality 27 – multivariate regression 39 – normalized information matrix 31 – optimal sensors’ allocation 39 – properties of D-optimal design 43 – properties of G-optimal design 43 – Q-optimality 28 – sampling 11 – sampling – dynamic systems 11 – sensitivity to support 36 – sensitivity to weights 36 – sensors’ allocation 11 – support 30 – what can be done before 24 experiment designs 49 – allocation of observations 50 – combined algorithm 64 – discrete vs. approximate 49 – examples of weights optimization 53 – multiplicative algorithm 51 – optimization of weights 50 – rounding 49 – W-F algorithm – description 58 – weights optimization 53 – Wynn–Fedorov algorithm 56 extending applicability 165 – departures from linearity 166 – dependence on parameters 165 – eigenfunctions 165, 167 – nonconvex domain 165 – spatial domains 167 Gauss–Markov theorem 23 Green’s functions 116 – ā notation 116
182 | Index
– dependence on parameters 116, 119 – eigenfunctions – examples 120 – eigenvalues vs. parameters 119 – special class 118 – tensor products of eigenfunctions 120 input signals 5 – fractal 5 – persistently exciting 5 input signals for ODEs 81 – a priori knowledge 88 – admissible spectral densities 85 – assumptions 81 – autocorrelation function 84 – D-efficiency 92 – equivalence theorem 90 – example 87 – examples 91 – existence of D-optimal solution 89 – FIM 83 – in frequency domain 81 – observations 82 – optimality conditions 88 – properties of optimal solutions 90 – spectral density 84 – Stieltjes integral 87 – unbounded power signals 93 integral operators 110 – eigenfunction expansion 111 – Green’s function 113 – truncated series 115 – with degenerated kernels 111 Kronecker product – matrices multiplication 78 – matrix inversion 78 – of eigenfunctions 78 – of matrices 78 – of vectors 78 – properties 78 least squares method – accuracy 22 – assumptions 20 – comparing different covariance matrices 22 – covariance matrix 22 – Gauss–Markov theorem 23 – linear models 20 – prediction variance 23
linear models 17 – assumptions 21 – bases 19 – identifiability 19 – least squares method 20 lumped-parameter systems 11 mathematical model 5 – dynamic 5 – methodology 5 – spatial 5 – spatio-temporal 5 – static 5 moving sensors 7 new paradigms 7 – actuators 9 – electro-rheological dampers 9 – lasers 9 – magneto-rheological dampers 9 – microwave heating 9 – Peltier cells 9 – piezoelectric bonds 9 – shape-memory alloys 9 – classic paradigms 7 – observations 8 – 3D imaging 8 – 3D scanners 8 – accelerometers 8 – CO₂, O₂, SO₃ 8 – CT 8 – IR cameras 8 – MRI 8 – optical flow 8 – piezoelectric sensors 8 – strain fields 8 – temperature field 8 – USG 8 – UV cameras 8 – velocity field 8 – wireless sensor network 8 notation 13 – dependence on parameters 14 – Fisher’s information matrix 15 – functional spaces 14 – outputs 14 – spatial variable 14 – system state 14 – the Kronecker product 14
– time variable 13 – trace 14 – transposition 14 – vectors and matrices 14 – vectors of unknown parameters 15 ODEs – parameter estimation 79 – computational algorithms 81 – notation 79 – Cramér–Rao inequality 81 – LTI systems 80 – output observations 80 – persistent signals 79 – problem statement 80 – selected bibliography 79 optimal excitations – elliptic equations 105 – boundary conditions 106 – compact operators 109 – eigenfunctions 106 – eigenvalues 106 – Hilbert–Schmidt operators 110 – notations 105 – operator 106 – weak solutions 106 optimal input signal for DPS – admissible frequencies 154 – admissible signals 153 – assumptions 151 – averaged FIM 155 – boundary inputs 152 – complete class theorem 156 – computational algorithm 160 – constraints 156 – cross-covariance function 153 – cross-spectral density 154 – D-optimality condition 159 – decomposition 159 – equivalence theorem 157 – examples 162 – FIM 154 – frequency domain approach 151 – generalizations 164 – Green’s function 152 – hyperbolic PDEs 161 – observations 153 – parabolic PDEs 161 – point sources 152 – problem statement 152, 153 – space-frequency domain approach 157
– spatio-temporal structure 158 – special cases 164 – spectral density 155 – time domain synthesis – algorithm 142 – examples 145 – Fisher information matrix 138 – more safe 148 – observations 133 – optimality conditions 142 – problem statement 138 – structure of the solution 142 optimal input signals – assumptions 81 – example 100 – examples 92 – frequency domain approach 81 – generalization 93 – model 80, 81 – more safe input signals 103 – new paradigms 8 – numerical algorithm 100 – observations 82 – ODE 80, 81 – optimality conditions in time domain 96 – parameter estimation 79 – problem statement 88 – properties 89 – sensitivity to departures 88 – spectral density 85 – systems described by PDE 8 – the equivalence theorem 90 – the Fisher information matrix 83 – variational approach 96 optimal input signals for ODEs – additional constraints 103 – computational algorithm 100 – examples 99 – further optimality conditions 99 – more safe signals 103 – multioutput systems 93 – optimality conditions 96 – optimization in time domain 94 – problem statement 95 – the FIM 95 – variational approach 96 product of experiments 69 – D-optimality 72
– definition 70 – discussion 75 – estimating eigenvalues 78 – example – thin plate 77 – examples 76 – information matrix 72 – multiplicative models 70 – notation 70 – optimality for other criteria 76 – partial designs 70 – properties 70 – repeated LSM 78 – simplifying the LSM 76 related topics 10 – functional analysis 10 – Green’s functions 10 – Green’s functions of PDEs 10 – mathematical statistics 10 – optimization theory 10 – PDEs 10 – semi-groups of operators 10 – variational calculus 10 remarks on experiments 3 – 100 years 3 – Aristotle example 4 – experiments vs. theory 3 – Galileo Galilei 3 – model building 4 searching for D-optimal designs 49 – approximate designs 49
– combined algorithm 64 – discrete designs 49 – examples 53 – examples – W-F algorithm 60 – multiplicative algorithm 51 – optimizing weights 50 – Wynn–Fedorov algorithm 56 state estimation for DPS – nonlinear 13 – sensors allocation 13 Sturm–Liouville theory 124 system identification for DPS 12 – constant parameters 12 – ill-conditioning 13 – monographs 13 – numerical aspects 13 – parameters in BCs 12 – spatially varying coefficients 13 – state estimation 13 topics out of scope 167 – choice of signals and sensors 168 – Hammerstein systems 169 – moving sensors 168 – nonlinear PDEs 168 – nonparametric estimation 169 – nonstandard inputs 169 – quasi-linear PDEs 168 – sampling rate 168 – spatially varying parameters 167 – time-varying coefficients 167 – Wiener systems 169