Inverse and Ill-Posed Problems Series 56 Managing Editor Sergey I. Kabanikhin, Novosibirsk, Russia /Almaty, Kazakhstan
Computational Methods for Applied Inverse Problems Edited by
Yanfei Wang Anatoly G. Yagola Changchun Yang
De Gruyter
Mathematics Subject Classification 2010: Primary: 65J22; Secondary: 65J20, 86-00, 86-08, 65K10, 65J10, 65J15, 45Q05.
ISBN 978-3-11-025904-9
e-ISBN 978-3-11-025905-6
ISSN 1381-4524

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2012 Higher Education Press and Walter de Gruyter GmbH, Berlin/Boston
Printing and binding: Hubert & Co. GmbH & Co. KG, Göttingen
Printed on acid-free paper
Printed in Germany
www.degruyter.com
Preface

This volume contains chapters based on the lectures given by invited speakers at the international workshop "Computational Methods for Applied Inverse Problems". The workshop was organized under the auspices of the Chinese Academy of Sciences at the Institute of Geology and Geophysics, located in Beijing, the capital of China, and was held during July 12-16, 2010. The workshop was sponsored by the National Science Foundation of China (NSFC), the China-Russia Cooperative Research Project (NSFC-RFBR), the Knowledge Innovation Programs of the Chinese Academy of Sciences (CAS), and the Strategic Priority Research Program of the Chinese Academy of Sciences (CAS).

The main goal of the workshop was to gather young participants (mostly mathematicians and geophysicists) from China and overseas to discuss how to solve inverse and ill-posed problems using different solution strategies. Eminent specialists from China, Russia (partially sponsored by the Russian Foundation for Basic Research), the USA, India and Norway were invited to present lectures, and other young scientists also presented their recent research during the conference.

The book covers many directions in the modern theory of inverse and ill-posed problems: mathematical physics, optimal inverse design, inverse scattering, inverse vibration, biomedical imaging, oceanography, seismic imaging and remote sensing. The methods addressed include standard regularization, parallel computing for multidimensional problems, the Nyström method, numerical differentiation, analytic continuation, perturbation regularization, filtering, optimization and sparse solution methods. The volume attempts to bridge the gap between theoretical studies of ill-posed inverse problems and practical applications. Let us continue our efforts for further progress.

This book will be helpful to researchers and teachers in developing courses on various inverse and ill-posed problems of mathematical physics, the geosciences, design technology, imaging, high-performance computing, inverse scattering and vibration, and so on. It can also be beneficial for senior undergraduate students, graduate and Ph.D. students, recent graduates gaining practical experience, and engineers and researchers who study inverse and ill-posed problems and solve them in practice.
Chinese Academy of Sciences, Beijing June 2012
Yanfei Wang
Editor's Preface

Inverse problem theory and methods are driven by applied problems in science and engineering, and studies of inverse problems have formed an exciting research area in recent decades. The special importance of inverse problems lies in their being an interdisciplinary subject related to mathematics, physics, chemistry, geoscientific problems, biology, finance and business, life science, computing technology and engineering.

Inverse problems consist in using the results of actual observations to infer the values of the (model) parameters characterizing the system under investigation. Inverse problems are typically ill-posed in the sense that one of the three requirements "existence, uniqueness or stability" of the solution may be violated. Inverse problems use modeling, design and solution methods to provide better, more accurate, and more efficient simulation for practical problems. Methodologies for solving inverse problems involve regularization, optimization and statistics; no one particular method solves all inverse problems.

This book provides a background on using regularization and optimization techniques to solve practical inverse problems for readers who do research in computational/applied mathematics, physical chemistry, engineering, geophysics, image processing and remote sensing, etc. In particular, recent advances in inversion theory and solution methods, with applications to practical inverse problems, are addressed. This book, Computational Methods for Applied Inverse Problems, covers the following scientific fields:

• Historical background and key issues of general inverse problems;
• Recent advances in regularization theory and new solution methods;
• Optimal inverse design and optimization methods;
• Recent advances in inverse scattering;
• Inverse vibration and data processing;
• Modeling and inversion of geoscientific problems;
• Analytic, algebraic, statistical and computational methods.
The five main parts of the book are preceded by an introductory first part. Chapter 1, written by S. I. Kabanikhin, presents a general overview of inverse problems and the key theories for solving them.

The second part of the book is devoted to recent advances in regularization theory and solution methods. Chapter 2, written by D. V. Lukyanenko and A. G. Yagola, proposes a parallel computing technique for multidimensional ill-posed problems; as an example, the practical problem of restoring magnetization parameters over a ship body using parallelization is considered. Chapter 3, written by M. T. Nair, treats the theoretical issues of the Nyström approximation method for ill-posed problems, together with numerical issues and error estimates. Chapter 4, written by T. Y. Xiao, H. Zhang and L. L. Hao, discusses regularizing theories for numerical differentiation; different regularization schemes are presented and compared in extensive numerical simulations. Chapter 5, written by C. L. Fu, H. Cheng and Y. J. Ma, treats numerical analytic continuation and regularization; convergence properties and error estimates are included. Chapter 6, written by G. S. Li, discusses the perturbation regularization method for function reconstruction problems; four cases of coefficient determination for an advection-diffusion equation are addressed. Chapter 7, written by L. V. Zotov and V. L. Panteleev, presents some filtering methods for ill-posed problems.

The third part is devoted to inverse design problems and optimization. Chapter 8, written by G. S. Dulikravich and I. N. Egorov, describes an alloy design methodology; the inverse problem is formulated as a constrained multi-objective optimization problem and solved using a robust evolutionary optimizer of IOSO type. Chapter 9, written by Z. H. Xiang, discusses both optimal sensor placement design and the regularization method; practical methods based on the well-posedness analysis of parameter identification procedures and the adaptive adjustment of a-priori information are developed. Chapter 10, written by Y. H. Dai, introduces a stable optimization method, the BFGS method, and obtains a general convergence result for the BFGS algorithm.

The fourth part is devoted to inverse scattering. Chapter 11, written by X. D. Liu and B. Zhang, presents uniqueness results for inverse acoustic and electromagnetic obstacle scattering problems; some interesting open problems are posed at the end of the chapter. Chapter 12, written by G. Bao and P. J. Li, addresses a shape reconstruction problem in inverse scattering; a continuation method for the inverse obstacle scattering problem is developed and its solution details are established.

The fifth part is devoted to inverse vibration problems, data processing and some mathematical problems in biomedical imaging. Chapter 13, written by G. M. Kuramshina, I. V. Kochikov and A. V. Stepanova, treats molecular force field calculations; in particular, the authors discuss how a-priori model assumptions and ab initio quantum mechanical calculations are used to construct regularizing algorithms for the calculation of molecular force fields. Chapter 14, written by J. J. Liu and H. L. Xu, discusses mathematical models and image reconstruction realizations of magnetic resonance electrical impedance tomography (MREIT); the harmonic Bz algorithm and the integral equation method are presented.

The last part is devoted to modeling and inversion problems occurring in geophysics, oceanography and remote sensing. Chapter 15, written by S. I. Kabanikhin and M. A. Shishlenin, discusses iterative regularization methods for solving inverse hyperbolic problems. Chapter 16, written by H. B. Song, X. H. Huang, L. M. Pinheiro, Y. Song, C. Z. Dong and Y. Bai, focuses on a new cross-discipline between seismology and physical oceanography. Chapter 17, written by L. J. Gelius, provides a framework for understanding and analyzing both diffraction-limited imaging and super-resolution. Chapter 18, written by Y. F. Wang, Z. H. Li and C. C. Yang, gives a short review of seismic migration methods and develops a preconditioned regularizing least squares migration method. Chapter 19, written by Y. F. Wang, J. J. Cao, T. Sun and C. C. Yang, extends the concept of compressive sensing to seismic wavefield interpolation; sparse optimization and regularization methods are fully described. Chapter 20, written by H. Yang, presents a quantitative model to characterize the reflectance of land surfaces; regularization and optimization issues and multistage inversion strategies are discussed.

A special feature of the book is that it provides both novel methods for standard and nonstandard regularization and practical applications in science and engineering. Each chapter is written by researchers active in the respective fields, with illustrations and tables provided for better understanding of their ideas. Scientists, researchers and engineers, as well as graduate students engaged in applied/computational mathematics, engineering, physical chemistry, geophysics, medical science, image processing, computer science and remote sensing, will benefit from this book. Finally, we hope that this book will stimulate and inspire new research efforts and the intensive exploration of new promising directions.
Chinese Academy of Sciences, Beijing Lomonosov Moscow State University, Moscow Chinese Academy of Sciences, Beijing June 2012
Yanfei Wang Anatoly G. Yagola Changchun Yang
Contents

Preface

Editor's Preface

Part I  Introduction

1 Inverse Problems of Mathematical Physics (S. I. Kabanikhin)
   1.1 Introduction
   1.2 Examples of Inverse and Ill-posed Problems
   1.3 Well-posed and Ill-posed Problems
   1.4 The Tikhonov Theorem
   1.5 The Ivanov Theorem: Quasi-solution
   1.6 The Lavrentiev Method
   1.7 The Tikhonov Regularization Method
   References

Part II  Recent Advances in Regularization Theory and Methods

2 Using Parallel Computing for Solving Multidimensional Ill-posed Problems (D. V. Lukyanenko and A. G. Yagola)
   2.1 Introduction
   2.2 Using Parallel Computing
       2.2.1 Main idea of parallel computing
       2.2.2 Parallel computing limitations
   2.3 Parallelization of Multidimensional Ill-posed Problem
       2.3.1 Formulation of the problem and method of solution
       2.3.2 Finite-difference approximation of the functional and its gradient
       2.3.3 Parallelization of the minimization problem
   2.4 Some Examples of Calculations
   2.5 Conclusions
   References

3 Regularization of Fredholm Integral Equations of the First Kind using Nyström Approximation (M. T. Nair)
   3.1 Introduction
   3.2 Nyström Method for Regularized Equations
       3.2.1 Nyström approximation of integral operators
       3.2.2 Approximation of regularized equation
       3.2.3 Solvability of approximate regularized equation
       3.2.4 Method of numerical solution
   3.3 Error Estimates
       3.3.1 Some preparatory results
       3.3.2 Error estimate with respect to ‖·‖₂
       3.3.3 Error estimate with respect to ‖·‖∞
       3.3.4 A modified method
   3.4 Conclusion
   References

4 Regularization of Numerical Differentiation: Methods and Applications (T. Y. Xiao, H. Zhang and L. L. Hao)
   4.1 Introduction
   4.2 Regularizing Schemes
       4.2.1 Basic settings
       4.2.2 Regularized difference method (RDM)
       4.2.3 Smoother-based regularization (SBR)
       4.2.4 Mollifier regularization method (MRM)
       4.2.5 Tikhonov's variational regularization (TiVR)
       4.2.6 Lavrentiev regularization method (LRM)
       4.2.7 Discrete regularization method (DRM)
       4.2.8 Semi-discrete Tikhonov regularization (SDTR)
       4.2.9 Total variation regularization (TVR)
   4.3 Numerical Comparisons
   4.4 Applied Examples
       4.4.1 Simple applied problems
       4.4.2 The inverse heat conduction problems (IHCP)
       4.4.3 The parameter estimation in new product diffusion model
       4.4.4 Parameter identification of Sturm-Liouville operator
       4.4.5 The numerical inversion of Abel transform
       4.4.6 The linear viscoelastic stress analysis
   4.5 Discussion and Conclusion
   References

5 Numerical Analytic Continuation and Regularization (C. L. Fu, H. Cheng and Y. J. Ma)
   5.1 Introduction
   5.2 Description of the Problems in Strip Domain and Some Assumptions
       5.2.1 Description of the problems
       5.2.2 Some assumptions
       5.2.3 The ill-posedness analysis for Problems 5.2.1 and 5.2.2
       5.2.4 The basic idea of the regularization for Problems 5.2.1 and 5.2.2
   5.3 Some Regularization Methods
       5.3.1 Some methods for solving Problem 5.2.1
       5.3.2 Some methods for solving Problem 5.2.2
   5.4 Numerical Tests
   References

6 An Optimal Perturbation Regularization Algorithm for Function Reconstruction and Its Applications (G. S. Li)
   6.1 Introduction
   6.2 The Optimal Perturbation Regularization Algorithm
   6.3 Numerical Simulations
       6.3.1 Inversion of time-dependent reaction coefficient
       6.3.2 Inversion of space-dependent reaction coefficient
       6.3.3 Inversion of state-dependent source term
       6.3.4 Inversion of space-dependent diffusion coefficient
   6.4 Applications
       6.4.1 Determining magnitude of pollution source
       6.4.2 Data reconstruction in an undisturbed soil-column experiment
   6.5 Conclusions
   References

7 Filtering and Inverse Problems Solving (L. V. Zotov and V. L. Panteleev)
   7.1 Introduction
   7.2 SLAE Compatibility
   7.3 Conditionality
   7.4 Pseudosolutions
   7.5 Singular Value Decomposition
   7.6 Geometry of Pseudosolution
   7.7 Inverse Problems for the Discrete Models of Observations
   7.8 The Model in Spectral Domain
   7.9 Regularization of Ill-posed Systems
   7.10 General Remarks, the Dilemma of Bias and Dispersion
   7.11 Models Based on the Integral Equations
   7.12 Panteleev Corrective Filtering
   7.13 Philips-Tikhonov Regularization
   References

Part III  Optimal Inverse Design and Optimization Methods

8 Inverse Design of Alloys' Chemistry for Specified Thermo-Mechanical Properties by using Multi-objective Optimization (G. S. Dulikravich and I. N. Egorov)
   8.1 Introduction
   8.2 Multi-Objective Constrained Optimization and Response Surfaces
   8.3 Summary of IOSO Algorithm
   8.4 Mathematical Formulations of Objectives and Constraints
   8.5 Determining Names of Alloying Elements and Their Concentrations for Specified Properties of Alloys
   8.6 Inverse Design of Bulk Metallic Glasses
   8.7 Open Problems
   8.8 Conclusions
   References

9 Two Approaches to Reduce the Parameter Identification Errors (Z. H. Xiang)
   9.1 Introduction
   9.2 The Optimal Sensor Placement Design
       9.2.1 The well-posedness analysis of the parameter identification procedure
       9.2.2 The algorithm for optimal sensor placement design
       9.2.3 The integrated optimal sensor placement and parameter identification algorithm
       9.2.4 Examples
   9.3 The Regularization Method with the Adaptive Updating of A-priori Information
       9.3.1 Modified extended Bayesian method for parameter identification
       9.3.2 The well-posedness analysis of modified extended Bayesian method
       9.3.3 Examples
   9.4 Conclusion
   References

10 A General Convergence Result for the BFGS Method (Y. H. Dai)
   10.1 Introduction
   10.2 The BFGS Algorithm
   10.3 A General Convergence Result for the BFGS Algorithm
   10.4 Conclusion and Discussions
   References

Part IV  Recent Advances in Inverse Scattering

11 Uniqueness Results for Inverse Scattering Problems (X. D. Liu and B. Zhang)
   11.1 Introduction
   11.2 Uniqueness for Inhomogeneity n
   11.3 Uniqueness for Smooth Obstacles
   11.4 Uniqueness for Polygon or Polyhedra
   11.5 Uniqueness for Balls or Discs
   11.6 Uniqueness for Surfaces or Curves
   11.7 Uniqueness Results in a Layered Medium
   11.8 Open Problems
   References

12 Shape Reconstruction of Inverse Medium Scattering for the Helmholtz Equation (G. Bao and P. J. Li)
   12.1 Introduction
   12.2 Analysis of the scattering map
   12.3 Inverse medium scattering
       12.3.1 Shape reconstruction
       12.3.2 Born approximation
       12.3.3 Recursive linearization
   12.4 Numerical experiments
   12.5 Concluding remarks
   References

Part V  Inverse Vibration, Data Processing and Imaging

13 Numerical Aspects of the Calculation of Molecular Force Fields from Experimental Data (G. M. Kuramshina, I. V. Kochikov and A. V. Stepanova)
   13.1 Introduction
   13.2 Molecular Force Field Models
   13.3 Formulation of Inverse Vibration Problem
   13.4 Constraints on the Values of Force Constants Based on Quantum Mechanical Calculations
   13.5 Generalized Inverse Structural Problem
   13.6 Computer Implementation
   13.7 Applications
   References

14 Some Mathematical Problems in Biomedical Imaging (J. J. Liu and H. L. Xu)
   14.1 Introduction
   14.2 Mathematical Models
       14.2.1 Forward problem
       14.2.2 Inverse problem
   14.3 Harmonic Bz Algorithm
       14.3.1 Algorithm description
       14.3.2 Convergence analysis
       14.3.3 The stable computation of ΔBz
   14.4 Integral Equations Method
       14.4.1 Algorithm description
       14.4.2 Regularization and discretization
   14.5 Numerical Experiments
   References

Part VI  Numerical Inversion in Geosciences

15 Numerical Methods for Solving Inverse Hyperbolic Problems (S. I. Kabanikhin and M. A. Shishlenin)
   15.1 Introduction
   15.2 Gel'fand-Levitan-Krein Method
       15.2.1 The two-dimensional analogy of Gel'fand-Levitan-Krein equation
       15.2.2 N-approximation of Gel'fand-Levitan-Krein equation
       15.2.3 Numerical results and remarks
   15.3 Linearized Multidimensional Inverse Problem for the Wave Equation
       15.3.1 Problem formulation
       15.3.2 Linearization
   15.4 Modified Landweber Iteration
       15.4.1 Statement of the problem
       15.4.2 Landweber iteration
       15.4.3 Modification of algorithm
       15.4.4 Numerical results
   References

16 Inversion Studies in Seismic Oceanography (H. B. Song, X. H. Huang, L. M. Pinheiro, Y. Song, C. Z. Dong and Y. Bai)
   16.1 Introduction of Seismic Oceanography
   16.2 Thermohaline Structure Inversion
       16.2.1 Inversion method for temperature and salinity
       16.2.2 Inversion experiment of synthetic seismic data
       16.2.3 Inversion experiment of GO data (Huang et al., 2011)
   16.3 Discussion and Conclusion
   References

17 Image Resolution Beyond the Classical Limit (L. J. Gelius)
   17.1 Introduction
   17.2 Aperture and Resolution Functions
   17.3 Deconvolution Approach to Improved Resolution
   17.4 MUSIC Pseudo-Spectrum Approach to Improved Resolution
   17.5 Concluding Remarks
   References

18 Seismic Migration and Inversion (Y. F. Wang, Z. H. Li and C. C. Yang)
   18.1 Introduction
   18.2 Migration Methods: A Brief Review
       18.2.1 Kirchhoff migration
       18.2.2 Wave field extrapolation
       18.2.3 Finite difference migration in ω-X domain
       18.2.4 Phase shift migration
       18.2.5 Stolt migration
       18.2.6 Reverse time migration
       18.2.7 Gaussian beam migration
       18.2.8 Interferometric migration
       18.2.9 Ray tracing
   18.3 Seismic Migration and Inversion
       18.3.1 The forward model
       18.3.2 Migration deconvolution
       18.3.3 Regularization model
       18.3.4 Solving methods based on optimization
       18.3.5 Preconditioning
       18.3.6 Preconditioners
   18.4 Illustrative Examples
       18.4.1 Regularized migration inversion for point diffraction scatterers
       18.4.2 Comparison with the interferometric migration
   18.5 Conclusion
   References

19 Seismic Wavefields Interpolation Based on Sparse Regularization and Compressive Sensing (Y. F. Wang, J. J. Cao, T. Sun and C. C. Yang)
   19.1 Introduction
   19.2 Sparse Transforms
       19.2.1 Fourier, wavelet, Radon and ridgelet transforms
       19.2.2 The curvelet transform
   19.3 Sparse Regularizing Modeling
       19.3.1 Minimization in l0 space
       19.3.2 Minimization in l1 space
       19.3.3 Minimization in lp-lq space
   19.4 Brief Review of Previous Methods in Mathematics
   19.5 Sparse Optimization Methods
       19.5.1 l0 quasi-norm approximation method
       19.5.2 l1-norm approximation method
       19.5.3 Linear programming method
       19.5.4 Alternating direction method
       19.5.5 l1-norm constrained trust region method
   19.6 Sampling
   19.7 Numerical Experiments
       19.7.1 Reconstruction of shot gathers
       19.7.2 Field data
   19.8 Conclusion
   References

20 Some Researches on Quantitative Remote Sensing Inversion (H. Yang)
   20.1 Introduction
   20.2 Models
   20.3 A Priori Knowledge
   20.4 Optimization Algorithms
   20.5 Multi-stage Inversion Strategy
   20.6 Conclusion
   References

Index
Part I
Introduction
Chapter 1

Inverse Problems of Mathematical Physics

S. I. Kabanikhin
Abstract. In this chapter we give the main definitions and consider various examples of inverse and ill-posed problems. These problems are found everywhere in mathematics (see the right column of Tables 1.1 and 1.2) and in virtually any area of knowledge where mathematical methods are used. Scientists have been dealing with them since ancient times (see the epigraph), but it was not until the middle of the 20th century that these problems began to be studied systematically and gradually earned the right to be considered a promising area of modern science.
1.1 Introduction
In our everyday life we are constantly dealing with inverse and ill-posed problems and, given good mental and physical health, we are usually quick and effective in solving them. For example, consider our visual perception. It is known that our eyes are able to perceive visual information from only a limited number of points in the world around us at any given moment. Then why do we have an impression that we are able to see everything around? The reason is that our brain, like a personal computer, completes the perceived image by interpolating and extrapolating the data received from the identified points. Clearly, the true image of a scene (generally, a three-dimensional color scene) can be adequately reconstructed from several points only if the image is familiar to us, i.e., if we previously saw and sometimes even touched most of the objects in it. Thus, although the problem of reconstructing the image of an object and its surroundings from several points is ill-posed (i.e., there is no uniqueness or stability of solutions), our brain is capable of solving it rather quickly. This is due to the brain’s ability to use its extensive previous experience (a-priori information). A quick glance at a person is enough to determine if he or she is a child or a senior, but it is usually not enough to determine the person’s age with an error of at most five years. From the example given in the epigraph, it is clear that considering only the shape of the Earth’s shadow on the surface of the moon is not sufficient for the
unique solution of the inverse problem of projective geometry (reconstructing the shape of the Earth). Aristotle wrote that some think the Earth is drum-shaped based on the fact that the horizon line at sunset is straight. And he provides two more observations as evidence that the Earth is spherical (i.e., he uses additional information): objects fall vertically (towards the center of gravity) at any point of the Earth's surface, and the celestial map changes as the observer moves on the Earth's surface.

Attempting to understand a substantially complex phenomenon and solve a problem where the probability of error is high, we usually arrive at an unstable (ill-posed) problem. Ill-posed problems are ubiquitous in our daily lives. Indeed, everyone realizes how easy it is to make a mistake when reconstructing the events of the past from a number of facts of the present (for example, reconstructing a crime scene based on the existing direct and indirect evidence, determining the cause of a disease based on the results of a medical examination, and so on). The same is true for tasks that involve predicting the future (predicting a natural disaster or simply producing a one-week weather forecast) or "reaching into" inaccessible zones to explore their structure and internal processes (subsurface exploration in geophysics or examining a patient's brain using NMR tomography). Almost every attempt to expand the boundaries of visual, aural, and other types of perception leads to ill-posed problems.

What are inverse and ill-posed problems? While there is no universal formal definition for inverse problems, an "ill-posed problem" is a problem that either has no solutions in the desired class, or has many (at least two) solutions, or the solution procedure is unstable (arbitrarily small errors in the measurement data may lead to indefinitely large errors in the solutions). Most difficulties in solving ill-posed problems are caused by instability. Therefore, the term "ill-posed problems" is often used for unstable problems.

To define various classes of inverse problems, we should first define a direct problem. Indeed, something "inverse" must be the opposite of something "direct". For example, consider problems of mathematical physics. In mathematical physics, a direct problem is usually a problem of modeling some physical fields, processes, or phenomena (electromagnetic, acoustic, seismic, heat, etc.). The purpose of solving a direct problem is to find a function that describes a physical field or process at any point of a given domain at any instant of time (if the field is nonstationary). The formulation of a direct problem includes:

1. the domain in which the process is studied;
2. the equation that describes the process;
3. the initial conditions (if the process is nonstationary);
4. the conditions on the boundary of the domain.
For example, we can formulate the following direct initial-boundary value problem for the acoustic equation: in the domain Ω ⊂ R^n with boundary Γ = ∂Ω, where R^n is a Euclidean space, (1.1.1)

it is required to find a solution u(x, t) to the acoustic equation

$$c^{-2}(x)\, u_{tt} = \Delta u - \nabla \ln \rho(x) \cdot \nabla u + h(x, t) \tag{1.1.2}$$

that satisfies the initial conditions

$$u(x, 0) = \varphi(x), \qquad u_t(x, 0) = \psi(x) \tag{1.1.3}$$

and the boundary conditions

$$\left. \frac{\partial u}{\partial n} \right|_{\Gamma} = g(x, t). \tag{1.1.4}$$

Here u(x, t) is the acoustic pressure, c(x) is the speed of sound in the medium, ρ(x) is the density of the medium, and h(x, t) is the source function. Like most direct problems of mathematical physics, this problem is well-posed, which means that it has a unique solution and is stable with respect to small perturbations in the data.

The following is given in the direct problem (1.1.1)-(1.1.4): the domain Ω, the coefficients c(x) and ρ(x), the source function h(x, t) in equation (1.1.2), the initial conditions ϕ(x) and ψ(x) in (1.1.3), and the boundary conditions g(x, t) in (1.1.4). In the inverse problem, aside from u(x, t), the unknown functions include some of the functions occurring in the formulation of the direct problem. These unknowns are called the solution to the inverse problem. In order to find the unknowns, equations (1.1.2)-(1.1.4) are supplied with some additional information about the solution to the direct problem. This information represents the data of the inverse problem. (Sometimes the known coefficients of the direct problem are also taken as data of the inverse problem. Many variants are possible.) For example, let the additional information be represented by the values of the solution to the direct problem (1.1.2)-(1.1.4) on the boundary:

$$u|_{\Gamma} = f(x, t). \tag{1.1.5}$$
In the inverse problem, it is required to determine the unknown functions occurring in the formulation of the direct problem from the data f(x, t). Inverse problems of mathematical physics can be classified into groups depending on which functions are unknown. We will use our example to describe this classification.
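Before turning to the classification, here is a minimal numerical sketch (ours, not the book's) of the direct problem above, in one space dimension and under simplifying assumptions: constant density ρ, zero source h, and homogeneous values u = 0 at the ends instead of the Neumann condition (1.1.4). The function and parameter names (solve_direct, nx, nt, etc.) are illustrative only. The solver also records a pointwise trace of the solution, i.e., data of the type used below in (1.1.6).

```python
import numpy as np

def solve_direct(c, phi, psi, nx=201, nt=400, length=1.0, T=0.5):
    """Leapfrog scheme for u_tt = c(x)^2 u_xx on [0, length] with u = 0 at
    both ends (a simplified stand-in for the boundary condition (1.1.4))."""
    x = np.linspace(0.0, length, nx)
    dx, dt = x[1] - x[0], T / nt
    cx = c(x)
    assert dt * cx.max() / dx < 1.0, "CFL stability condition violated"
    r2 = (dt * cx / dx) ** 2

    u_prev = phi(x)
    # First time step from a Taylor expansion of u(x, dt).
    u_curr = u_prev + dt * psi(x)
    u_curr[1:-1] += 0.5 * r2[1:-1] * (u_prev[2:] - 2.0 * u_prev[1:-1] + u_prev[:-2])

    m = nx // 4                          # interior observation point x_m, cf. (1.1.6)
    trace = [u_prev[m], u_curr[m]]
    for _ in range(nt - 1):
        u_next = np.zeros_like(u_curr)   # zeros keep the boundary values u = 0
        u_next[1:-1] = (2.0 * u_curr[1:-1] - u_prev[1:-1]
                        + r2[1:-1] * (u_curr[2:] - 2.0 * u_curr[1:-1] + u_curr[:-2]))
        u_prev, u_curr = u_curr, u_next
        trace.append(u_curr[m])
    return x, u_curr, np.array(trace)    # trace plays the role of f_m(t)

x, u_final, f_m = solve_direct(c=lambda x: 1.0 + 0.2 * x,
                               phi=lambda x: np.exp(-200.0 * (x - 0.5) ** 2),
                               psi=np.zeros_like)
```

Running the forward solver for given coefficients is the stable direction; the inverse problems classified below ask instead to recover c, ϕ, ψ or h from such traces.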
Classification based on the unknown functions

The inverse problem (1.1.2)-(1.1.5) is said to be retrospective if it is required to determine the initial conditions, i.e., the functions ϕ(x) and ψ(x) in (1.1.3).

The inverse problem (1.1.2)-(1.1.5) is called an inverse boundary value problem if it is required to determine the function in the boundary condition (the function g(x, t)).

The inverse problem (1.1.2)-(1.1.5) is called an extension problem if the initial conditions (1.1.3) are unknown, the additional information (1.1.5) and the boundary conditions (1.1.4) are specified only on a certain part Γ₁ ⊆ Γ of the boundary of the domain Ω, and it is required to find the solution u(x, t) of equation (1.1.2) (extend the solution to the interior of the domain).

The inverse problem (1.1.2)-(1.1.5) is called a source problem if it is required to determine the source, i.e., the function h(x, t) in equation (1.1.2).

The inverse problem (1.1.2)-(1.1.5) is called a coefficient problem if it is required to reconstruct the coefficients c(x) and ρ(x) in the main equation.

It should be noted that this classification is still incomplete. There are cases where both initial and boundary conditions are unknown, and cases where the domain Ω (or a part of its boundary) is unknown.

Classification based on the additional information

Aside from the initial-boundary value problem (1.1.2)-(1.1.4), it is possible to give other formulations of the direct problem of acoustics (spectral, scattering, kinematic, etc.) in which it is required to find the corresponding parameters of the acoustic process (eigenfrequencies, reflected waves, wave travel times, and so on). Measuring these parameters in experiments leads to new classes of inverse problems of acoustics. In practice, the data (1.1.5) on the boundary of the domain under study are the most accessible for measurement, but sometimes measuring devices can be placed inside the object under study:

$$u(x_m, t) = f_m(t), \qquad m = 1, 2, \ldots, \tag{1.1.6}$$

which leads us to interior problems. Much like problems of optimal control, retrospective inverse problems include the so-called "final" observations

$$u(x, T) = \hat f(x). \tag{1.1.7}$$

Inverse scattering problems are formulated in the case of harmonic oscillations u(x, t) = e^{iωt} ū(x, ω), the additional information being specified, for example, in the form

$$\bar u(x, \omega_\alpha) = \bar f(x, \alpha), \qquad x \in X_1, \quad \alpha \in \Omega, \tag{1.1.8}$$
where X₁ is the set of observation points and {ω_α}, α ∈ Ω, is the set of observation frequencies. In some cases, the eigenfrequencies of the corresponding differential operator ΔU − ∇ln ρ · ∇U = λU and various eigenfunction characteristics are known (inverse spectral problems). It is sometimes possible to register the arrival times at points {x_k} of the waves generated by local sources concentrated at points {x_m}:

$$\tau(x_m, x_k) = \tilde f(x_m, x_k), \qquad x_k \in X_1, \quad x_m \in X_2.$$

In this case, the problem of reconstructing the speed c(x) is called an inverse kinematic problem.

Classification based on equations

As shown above, the acoustic equation alone yields as many as M₁ different inverse problems depending on the number and the form of the unknown functions. On the other hand, one can obtain M₂ different variants of inverse problems depending on the number and the type of the measured parameters (additional information), i.e., the data of the inverse problem. Let q represent the unknown functions, and let f denote the data of the inverse problem. Then the inverse problem can be written in the form of an operator equation

$$Aq = f, \tag{1.1.9}$$
where A is an operator from the space of unknowns Q to the data space F.

It should be noted in conclusion that, instead of the acoustic equation, we could take the heat conduction equation, the radiative transfer equation, Laplace's or Poisson's equation, or the systems of Lamé's or Maxwell's equations, and the like. Suppose this could yield M₃ different variants. Then, for equations of mathematical physics alone, it is possible to formulate about M₁M₂M₃ different inverse problems. Many of these inverse problems became the subjects of monographs.

Remark 1.1.1. It is certainly possible to make the definition of an inverse problem even more general, since even the law (equation) under study is sometimes unknown. In this case, it is required to determine the law (equation) from the results of experiments (observations). The discoveries of new laws expressed in mathematical form (equations) are brought about by a lot of experiments, reasoning, and discussions. This complex process is not exactly a process of solving an inverse problem. However, the term "inverse problems" is gaining popularity in the scientific literature on a wide variety of subjects. For example, there have been attempts to view
mathematical statistics as an inverse problem with respect to probability theory. There is a lot in common between the theory of inverse problems and control theory, the theory of pattern recognition, and many other areas. This is easy to see, for example, from the results of a search for "inverse problems" on the web site www.amazon.com. At the time when this was written (May 31, 2007), the search produced 6687 titles containing the words "inverse problems", and each of the books found, in its turn, had a list of references to the relevant literature. This textbook, however, deals with only a few mathematical aspects of the theory of inverse problems.

The structure of the operator A

The direct problem can be written in the operator form A₁(Γ, c, ρ, h, ϕ, ψ, g) = u. This means that the operator A₁ maps the data of the direct problem to the solution of the direct problem, u(x, t). We call A₁ the operator of the direct problem. Some data of the direct problem are unknown in the inverse problem. We denote these unknowns by q, and the restriction of A₁ to q (with the remaining data fixed) again by A₁. The measurement operator A₂ maps the solution u(x, t) of the direct problem to the additional information f; for example, A₂u = u|_Γ or A₂u = u(x_k, t), k = 1, 2, ..., etc. Then equation (1.1.9) becomes

$$Aq \equiv A_2 A_1 q = f,$$

where A is the result of the consecutive application (composition) of the operators A₁ and A₂. For example, in the retrospective inverse problem we have q = (ϕ, ψ), f = u(x, T), A₁q = u(x, t), and A₂A₁q = u(x, T); in the coefficient inverse problem, we have q = (c, ρ), f = u|_Γ, A₁q = u(x, t), and A₂A₁q = u|_Γ, etc.

The operator A₁ of the direct problem is usually continuous (the direct problem is well-posed), and so is the measurement operator (normally, one chooses to measure stable parameters of the process under study). The properties of the composition A₂A₁ are usually even too good: in ill-posed problems, A = A₂A₁ is often a completely continuous operator or, in other words, a compact operator. This complicates finding the inverse of A, i.e., solving the inverse problem Aq = f.

In a simple example where A is a constant, the smaller the number A multiplying q, the smaller, generally speaking, the error when q is approximate: A(q + δq) = f̃; i.e., the operator of the direct problem has good properties. However, when solving the inverse problem with approximate data f̃ = f + δf, we have

$$\frac{\tilde f}{A} = \frac{f}{A} + \frac{\delta f}{A}, \qquad \delta q = \frac{\delta f}{A}, \qquad \tilde q = q + \delta q = \frac{\tilde f}{A},$$
and the error δq = q̃ − q can be indefinitely large if the number A is sufficiently small.

Inverse and ill-posed problems began to attract the attention of scientists in the early 20th century. In 1902, J. Hadamard formulated the concept of the well-posedness (properness) of problems for differential equations. A problem is called well-posed in the sense of Hadamard if there exists a unique solution to this problem that continuously depends on its data (see Definition 1.3.1). In the same paper, Hadamard also gave an example of an ill-posed problem (the Cauchy problem for Laplace's equation). In 1943, A. N. Tikhonov pointed out the practical importance of such problems and the possibility of finding stable solutions to them. In the 1950s and 1960s there appeared a series of new approaches that became fundamental for the theory of ill-posed problems and attracted the attention of many mathematicians to this theory. With the advent of powerful computers, inverse and ill-posed problems started to gain popularity very rapidly. By the present day, the theory of inverse and ill-posed problems has developed into a new powerful and dynamic field of science that has an impact on almost every area of mathematics, including algebra, calculus, geometry, differential equations, mathematical physics, functional analysis, computational mathematics, etc.

Some examples of well-posed and ill-posed problems are presented below in Tables 1.1 and 1.2. It should be emphasized that, one way or the other, every ill-posed problem (in the right column) can be formulated as an inverse problem with respect to a well-posed direct problem (the corresponding problems in the left column of Tables 1.1 and 1.2).

The theory of inverse and ill-posed problems is also widely used in solving applied problems in almost all fields of science, in particular:

• physics (quantum mechanics, acoustics, electrodynamics, etc.);
• geophysics (seismic exploration, electrical, magnetic and gravimetric prospecting, logging, magnetotelluric sounding, etc.);
• medicine (X-ray and NMR tomography, ultrasound testing, etc.);
• ecology (air and water quality control, space monitoring, etc.);
• economics (optimal control theory, financial mathematics, etc.).

More detailed examples of applications of inverse and ill-posed problems are given in the next section, in the beginning of each chapter, and in the sections devoted to specific applications.

Without going into further details of mathematical definitions, we note that, in most cases, inverse and ill-posed problems have one important property in common: instability. In the majority of cases that are of interest, inverse problems turn out to be ill-posed and, conversely, an ill-posed problem can usually be reduced to a problem that is inverse to some direct (well-posed) problem.
Table 1.1. Examples of well-posed and ill-posed problems.

Arithmetic
  Well-posed: multiplication by a small number A: Aq = f.
  Ill-posed: division by a small number: $q = A^{-1} f$ ($A \ll 1$).

Algebra
  Well-posed: multiplication by a matrix: Aq = f.
  Ill-posed: $q = A^{-1} f$, where A is ill-conditioned, A is degenerate, or the system is inconsistent.

Calculus
  Well-posed: integration: $f(x) = f(0) + \int_0^x q(\xi)\, d\xi$.
  Ill-posed: differentiation: $q(x) = f'(x)$.

Differential equations
  Well-posed: the Sturm-Liouville problem
    $u''(x) - q(x)u(x) = \lambda u(x)$, $u(0) - hu'(0) = 0$, $u(1) - Hu'(1) = 0$.
  Ill-posed: the inverse Sturm-Liouville problem $\{\lambda_n, \|u_n\|^2\} \to q(x)$.

Integral geometry
  Well-posed: find the integral of a function q(x, y) along a curve Γ(ξ, η).
  Ill-posed: find q(x, y) from a family of integrals $\int_{\Gamma(\xi,\eta)} q(x, y)\, ds = f(\xi, \eta)$.

Integral equations
  Well-posed: Volterra equations and Fredholm equations of the second kind:
    $q(x) + \int_0^x K(x, \xi) q(\xi)\, d\xi = f(x)$, $\quad q(x) + \int_a^b K(x, \xi) q(\xi)\, d\xi = f(x)$.
  Ill-posed: Volterra equations and Fredholm equations of the first kind:
    $\int_0^x K(x, \xi) q(\xi)\, d\xi = f(x)$, $\quad \int_a^b K(x, \xi) q(\xi)\, d\xi = f(x)$.
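As a numerical aside (ours, not the book's), the Calculus row of Table 1.1 is easy to observe on a computer: integrating noisy data keeps the error at roughly the level of the data error, while finite-difference differentiation amplifies it by a factor on the order of (data error)/h, where h is the grid spacing.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2 * np.pi, 2001)
h = x[1] - x[0]
f_noisy = np.sin(x) + 1e-3 * rng.standard_normal(x.size)  # data error ~ 1e-3

# Integration (trapezoid rule) of the noisy data: the error stays small.
F = np.concatenate(([0.0], np.cumsum(0.5 * (f_noisy[1:] + f_noisy[:-1]) * h)))
print("integration error:    ", np.abs(F - (1.0 - np.cos(x))).max())

# Central-difference differentiation: the noise is amplified by ~ 1e-3 / h.
q = np.gradient(f_noisy, h)
print("differentiation error:", np.abs(q - np.cos(x)).max())
```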
Table 1.2. Examples of well-posed and ill-posed problems.

Elliptic equations
  Well-posed: $\Delta u = 0$; Dirichlet and Neumann problems, Robin (mixed) problem.
  Ill-posed: $\Delta u = 0$; Cauchy problem; initial-boundary value problem with data given on a part of the boundary.

Parabolic equations
  Well-posed:
    1) Cauchy problem: $\Delta u = u_t$, $t > 0$, $u|_{t=0} = f(x)$;
    2) initial-boundary value problem: $\Delta u = u_t$, $t > 0$, $u|_{t=0} = 0$, $u|_S = g(x, t)$.
  Ill-posed: Cauchy problem in reversed time: $-u_t = \Delta u$, $u|_{t=0} = f$; initial-boundary value problem with data given on a part of the boundary: $u_t = \Delta u$, $u|_{x=0} = f_1$, $u_x|_{x=0} = f_2$.

Hyperbolic equations
  Well-posed: Cauchy problem; initial-boundary value problem; direct problems:
    $u_{tt} = u_{xx} - q(x)u$, $u|_{t=0} = \varphi(x)$, $u_t|_{t=0} = \psi(x)$;
    $u_t = u_{xx} - q(x)u$, $u|_{t=0} = 0$;
    $\nabla(q(x)\nabla u) = 0$, $u|_S = g$.
  Ill-posed: Dirichlet and Neumann problems; Cauchy problem with data on a time-like surface; coefficient inverse problems:
    $u_{tt} = u_{xx} - q(x)u$, $u|_{t=0} = \varphi(x)$, $u_t|_{t=0} = \psi(x)$, $u(0, t) = f(t)$;
    $u_t = u_{xx} - q(x)u$, $u|_{t=0} = 0$, $u(0, t) = f(t)$;
    $\nabla(q(x)\nabla u) = 0$, $u|_S = g$, $\partial u / \partial n|_S = f$.
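To see why the reversed-time parabolic problem in Table 1.2 is unstable, consider a sketch (ours, not the book's) in Fourier modes: the forward heat semigroup damps mode k by exp(-k²T), so formally inverting it multiplies any error in the measured final state by exp(k²T).

```python
import numpy as np

T = 0.1
k = np.arange(1, 21)                   # Fourier mode numbers
forward_damping = np.exp(-k**2 * T)    # mode k of the initial state is damped by this at t = T
noise = 1e-8                           # error level in the measured final state

# Naive backward solution: divide each mode by its damping factor. The data
# error in mode k is blown up by exp(k^2 T), e.g. e^40 ≈ 2.4e17 for k = 20.
amplified_error = noise / forward_damping
print(amplified_error[[0, 9, 19]])     # modes k = 1, 10, 20
```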
To sum up, it can be said that specialists in inverse and ill-posed problems study the properties of and regularization methods for unstable problems. In other words, they develop and study stable methods for approximating unstable mappings, transformations, or operations. From the point of view of information theory, the theory of inverse and ill-posed problems deals with maps from data tables with very small epsilon-entropy to tables with large epsilon-entropy.
1.2 Examples of Inverse and Ill-posed Problems
Example 1.2.1 (algebra, systems of linear algebraic equations). Consider the system of linear algebraic equations

$$Aq = f, \tag{1.2.1}$$

where A is an m × n matrix, and q and f are n- and m-dimensional vectors, respectively. Let the rank of A be equal to min(m, n). If m < n, the system has many solutions. If m > n, there may be no solutions. If m = n, the system has a unique solution for any right-hand side. In this case, there exists an inverse operator (matrix) A⁻¹. It is bounded, since it is a linear operator in a finite-dimensional space. Thus, all three conditions of well-posedness in the sense of Hadamard are satisfied.

We now analyze in detail the dependence of the solution on the perturbations of the right-hand side f in the case where the matrix A is nondegenerate. Subtracting the original equation (1.2.1) from the perturbed equation

$$A(q + \delta q) = f + \delta f, \tag{1.2.2}$$

we obtain Aδq = δf, which implies δq = A⁻¹δf and ‖δq‖ ≤ ‖A⁻¹‖ ‖δf‖. We also have ‖A‖ ‖q‖ ≥ ‖f‖. As a result, the best estimate for the relative error of the solution is

$$\frac{\|\delta q\|}{\|q\|} \le \|A\|\, \|A^{-1}\|\, \frac{\|\delta f\|}{\|f\|}, \tag{1.2.3}$$

which shows that the error is determined by the constant μ(A) = ‖A‖ ‖A⁻¹‖ called the condition number of the system (matrix). Systems with relatively large condition numbers are said to be ill-conditioned. For normalized matrices (‖A‖ = 1), this means that there are relatively large elements in the inverse matrix and, consequently, small variations in the right-hand side may lead to relatively large (although finite) variations in the solution. Therefore, systems with ill-conditioned matrices can be considered practically unstable, although formally the problem is well-posed and the stability condition ‖A⁻¹‖ < ∞ holds.
For example, the matrix

$$A = \begin{pmatrix} 1 & a & 0 & \cdots & 0 \\ 0 & 1 & a & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}$$

is ill-conditioned for sufficiently large n and |a| > 1 because the inverse matrix has elements of the form $a^{n-1}$. In the case of perturbations in the elements of the matrix, the estimate (1.2.3) becomes

$$\frac{\|\delta q\|}{\|q\|} \le \mu(A)\, \frac{\|\delta A\|}{\|A\|} \left(1 - \mu(A)\, \frac{\|\delta A\|}{\|A\|}\right)^{-1}$$

(for ‖A⁻¹‖ ‖δA‖ < 1). If m = n and the determinant of A is zero, then the system (1.2.1) may have either no solutions or more than one solution. It follows that the problem Aq = f is ill-posed for degenerate matrices A (det A = 0).
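A quick numerical check (ours, not the book's) of this matrix with NumPy; the variable names are illustrative. For n = 20 and a = 2 the condition number is already around 10⁶, and a tiny perturbation of the right-hand side produces a relative solution error roughly μ(A) times larger, exactly as estimate (1.2.3) allows.

```python
import numpy as np

n, a = 20, 2.0
A = np.eye(n) + a * np.eye(n, k=1)    # 1 on the diagonal, a on the superdiagonal
print("condition number mu(A):", np.linalg.cond(A))   # grows like a^(n-1)

q = np.ones(n)
f = A @ q
delta_f = 1e-10 * np.random.default_rng(1).standard_normal(n)
q_tilde = np.linalg.solve(A, f + delta_f)

print("relative data error:    ", np.linalg.norm(delta_f) / np.linalg.norm(f))
print("relative solution error:", np.linalg.norm(q_tilde - q) / np.linalg.norm(q))
```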
Example 1.2.2 (calculus; summing a Fourier series). The problem of summing a Fourier series consists in finding a function q(x) from its Fourier coefficients. We show that the problem of summing a Fourier series is unstable with respect to small variations in the Fourier coefficients in the l₂ metric if the variations of the sum are estimated in the C metric. Let

$$q(x) = \sum_{k=1}^{\infty} f_k \cos kx,$$

and let the Fourier coefficients f_k of the function q(x) have small perturbations: f̃_k = f_k + ε/k. Set

$$\tilde q(x) = \sum_{k=1}^{\infty} \tilde f_k \cos kx.$$

The difference between the coefficients of these series in the l₂ metric is

$$\Big( \sum_{k=1}^{\infty} (f_k - \tilde f_k)^2 \Big)^{1/2} = \varepsilon \Big( \sum_{k=1}^{\infty} \frac{1}{k^2} \Big)^{1/2} = \varepsilon \Big( \frac{\pi^2}{6} \Big)^{1/2},$$

which vanishes as ε → 0. However, the difference

$$\tilde q(x) - q(x) = \varepsilon \sum_{k=1}^{\infty} \frac{1}{k} \cos kx$$

can be as large as desired because the series diverges for x = 0. Thus, if the C metric is used to estimate variations in the sum of the series, then summation of the Fourier series is not stable.
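A small numerical illustration (ours): the coefficient perturbation ε/k stays bounded in l₂ (the distance tends to ε·π/√6 ≈ 1.28ε), while the perturbation of the partial sums at x = 0 equals ε times the harmonic sum, which grows like ε·ln N, slowly but without bound.

```python
import numpy as np

eps = 1e-2
for N in (10**2, 10**4, 10**6):                   # number of retained Fourier modes
    k = np.arange(1, N + 1, dtype=float)
    l2_dist = eps * np.sqrt(np.sum(1.0 / k**2))   # -> eps * pi / sqrt(6), bounded
    sup_dist = eps * np.sum(1.0 / k)              # value at x = 0: eps * H_N ~ eps * ln N
    print(N, l2_dist, sup_dist)
```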
Example 1.2.3 (geometry). Consider a body in space that can be illuminated from various directions. If the shape of the body is known, the problem of determining the shape of its shadow is well-posed. The inverse problem of reconstructing the shape of the body from its projections (shadows) on various planes is well-posed only for convex bodies, since a concave area obviously cannot be detected this way. As noted before, Aristotle was one of the first to formulate and solve a problem of this type. Observing the shadow cast by the Earth on the moon, he concluded that the Earth is spherical.

Example 1.2.4 (integral geometry on straight lines). One of the problems arising in computerized tomography is to determine a function of two variables q(x, y) from the collection of integrals

$$\int_{L(p,\varphi)} q(x, y)\, dl = f(p, \varphi)$$

along various lines L(p, ϕ) in the plane (x, y), where p and ϕ are the line parameters. This problem is not well-posed because the condition for the existence of a solution for any right-hand side f(p, ϕ) does not hold.

Example 1.2.5 (integral geometry on circles). Consider the problem of determining a function of two variables q(x, y) from the integral of this function over a collection of circles whose centers lie on a fixed line. Assume that q(x, y) is continuous for all (x, y) ∈ R². Consider a collection of circles whose centers lie on a fixed line (for definiteness, let this line be the coordinate axis y = 0). Let L(a, r) denote the circle (x − a)² + y² = r², which belongs to this collection. It is required to determine q(x, y) from the function f(x, r) such that

$$\int_{L(x,r)} q(\xi, \tau)\, dl = f(x, r), \tag{1.2.4}$$

and f(x, r) is defined for all x ∈ (−∞, ∞) and r > 0. The solution of this problem is not unique in the class of continuous functions, since for any continuous function q̃(x, y) such that q̃(x, y) = −q̃(x, −y) the integrals

$$\int_{L(x,r)} \tilde q(\xi, \tau)\, dl$$
vanish for all x ∈ R and r > 0. Indeed, using the change of variables ξ = x + r cos ϕ, τ = r sin ϕ, we obtain

$$\int_{L(x,r)} \tilde q(\xi, \tau)\, dl = \int_0^{2\pi} \tilde q(x + r\cos\varphi,\, r\sin\varphi)\, r\, d\varphi = \int_0^{\pi} \tilde q(x + r\cos\varphi,\, r\sin\varphi)\, r\, d\varphi + \int_{\pi}^{2\pi} \tilde q(x + r\cos\varphi,\, r\sin\varphi)\, r\, d\varphi. \tag{1.2.5}$$

Putting ϕ̄ = 2π − ϕ and using the condition q̃(x, y) = −q̃(x, −y), we transform the last integral in the previous formula:

$$\int_{\pi}^{2\pi} \tilde q(x + r\cos\varphi,\, r\sin\varphi)\, r\, d\varphi = \int_0^{\pi} \tilde q(x + r\cos\bar\varphi,\, -r\sin\bar\varphi)\, r\, d\bar\varphi = -\int_0^{\pi} \tilde q(x + r\cos\bar\varphi,\, r\sin\bar\varphi)\, r\, d\bar\varphi.$$

Substituting the result into (1.2.5), we have

$$\int_{L(x,r)} \tilde q(\xi, \tau)\, dl = 0 \quad \text{for } x \in \mathbb{R},\ r > 0.$$

Thus, if q(x, y) is a solution to the problem (1.2.4) and q̃(x, y) is any continuous function such that q̃(x, y) = −q̃(x, −y), then q(x, y) + q̃(x, y) is also a solution to (1.2.4). For this reason, the problem can be reformulated as the problem of determining the even component of q(x, y) with respect to y. The first well-posedness condition is not satisfied in the above problem: solutions may not exist for some f(x, r). However, the uniqueness of solutions in the class of even functions with respect to y can be established using the method described in Section 5.2.
for x ∈ R, r > 0. Thus, if q(x, y) is a solution to the problem (1.2.4) and q(x, ˜ y) is any continuous function such that q(x, ˜ y) = −q(x, ˜ −y), then q(x, y) + q(x, ˜ y) is also a solution to (1.2.4). For this reason, the problem can be reformulated as the problem of determining the even component of q(x, y) with respect to y. The first well-posedness condition is not satisfied in the above problem: solutions may not exist for some f(x, r). However, the uniqueness of solutions in the class of even functions with respect to y can be established using the method described in Section 5.2. Example 1.2.6 (a differential equation). The rate of radioactive decay is proportional to the amount of the radioactive substance. The proportionality constant q1 is called the decay constant. The process of radioactive decay is described by the solution of the Cauchy problem for an ordinary differential equation du = −q1 u(t), t ≥ 0, (1.2.6) dt u(0) = q0 , (1.2.7) where u(t) is the amount of the substance at a given instant of time and q0 is the amount of the radioactive substance at the initial instant of time.
16
1 Inverse Problems of Mathematical Physics
The direct problem: Given the constants q0 and q1 , determine how the amount of the substance u(t) changes with time. This problem is obviously well-posed. Moreover, its solution can be written explicitly as follows: u(t) = q0 e−q1 t ,
t ≥ 0.
Assume now that the decay constant q1 and the initial amount of the radioactive substance q0 are unknown, but we can measure the amount of the radioactive substance u(t) for certain values of t. The inverse problem consists in determining the coefficient q1 in equation (1.2.6) and the initial condition q0 from the additional information about the solution to the direct problem u(tk ) = fk , k = 1, 2, . . . , N . Exercise 1.2.7. Use the explicit formula for the solution of the direct problem u(t) = q0 e−q1 t to determine if the inverse problem is well-posed, depending on the number of points tk , k = 1, 2, . . . , N , at which the additional information fk = u(tk ) is measured. Example 1.2.8 (a system of differential equations). A chemical kinetic process is described by the solution to the Cauchy problem for the system of linear ordinary differential equations dui = qi1 u1 (t) + qi2 u2 (t) + · · · + qin un (t), dt ui (0) = q¯i ,
i = 1, . . . , N.
(1.2.8) (1.2.9)
Here ui (t) is the concentration of the ith substance at the instant t. The constant parameters qij characterize the dependence of the change rate of the concentration of the ith substance on the concentration of the substances involved in the process. The direct problem: Given the parameters qij and the concentrations q¯i at the initial instant of time, determine ui (t). The following inverse problem can be formulated for the system of differential equations (1.2.8). Given that the concentrations of substances ui (t), i = 1, . . . , N , are measured over a period of time t ∈ [t1 , t2 ], it is required to determine the values of the parameters qij , i.e., to determine the coefficients of the system (1.2.8) from a solution to this system. Two versions of this inverse problem can be considered. In the first version, the initial conditions (1.2.9) are known, i.e., q¯i are given and the corresponding solutions ui (t) are measured. In the second version, q¯i are unknown and must be determined together with qij .
1.2 Examples of Inverse and Ill-posed Problems
17
Example 1.2.9 (differential equation of the second order). Suppose that a particle of unit mass is moving along a straight line. The motion is caused by a force q(t) that depends on time. If the particle is at the origin x = 0 and has zero velocity at the initial instant t = 0, then, according to Newton’s laws, the motion of the particle is described by a function u(t) satisfying the Cauchy problem u(t) ¨ = q(t), t ∈ [0, T ], (1.2.10) u(0) = 0,
u(0) ˙ = 0,
(1.2.11)
where u(t) is the coordinate of the particle at the instant t. Assume now that the force q(t) is unknown, but the coordinate of the particle u(t) can be measured at any instant of time (or at certain points of the interval [0, T ]). It is required to reconstruct q(t) from u(t). Thus, we have the following inverse problem: determine the right-hand side of equation (1.2.10) (the function q(t)) from the known solution u(t) to the problem (1.2.10), (1.2.11). We now prove that the inverse problem is unstable. Let u(t) be a solution to the direct problem for some q(t). Consider the following perturbations of the solution to the direct problem: un (t) = u(t) +
1 cos(nt). n
These perturbations correspond to the right-hand sides qn (t) = q(t) − n cos(nt). Obviously, u − un C[0,T ] → 0 as n → ∞, and q − qn c[0,T ] → ∞ as n → ∞. Thus, the problem of determining the right-hand side of the linear differential equation (1.2.10), (1.2.11) from its right-hand side is unstable. Note that the inverse problem is reduced to double differentiation if the values of u(t) are given for all t ∈ [0, T ]. Exercise 1.2.10. Analyze the differentiation operation for stability. Example 1.2.11 (Fredholm integral equation of the first kind). Consider the problem of solving a Fredholm integral equation of the first kind b K(x, s)q(s) ds = f(x), c ≤ x ≤ d, (1.2.12) a
where the kernel K(x, s) and the function f(x) are given and it is required to find q(s). It is assumed that f(x) ∈ C[c, d], q(s) ∈ C[a, b], and K(x, s), Kx (x, s), and Ks (x, s) are continuous in the rectangle c ≤ x ≤ d, a ≤ s ≤ b. The problem of solving equation (1.2.12) is ill-posed because solutions may not exist for some functions f(x) ∈ C[c, d]. For example, take a function f(x)
18
1 Inverse Problems of Mathematical Physics
that is continuous but not differentiable on [c, d]. With such a right-hand side, the equation cannot have a continuous solution q(s) since the conditions for the kernel K(x, s) imply that the integral in the left-hand side of (1.2.12) is differentiable with respect to the parameter x for any continuous function q(s). The condition of continuous dependence of solutions on the initial data is also not satisfied for equation (1.2.12). Consider the sequence of functions qn (s) = q(s) + n sin (n2 s),
n = 0, 1, 2, . . . .
Substituting qn (s) into (1.2.12), we obtain b K(x, s)qn (s) ds fn (x) = a b = f(x) + K(x, s)n sin(n2 s) ds,
n = 0, 1, . . . .
a
We now estimate fn − f C[c,d] . Since b K(x, s)n sin (n2 s) ds |fn (x) − f(x)| ≤ a
b 1 1 b 2 Ks (x, s) cos (n2 s) ds = − cos (n s)K(x, s) + n n a a K1 ≤ , n where the constant K1 does not depend on n, we have K1 , n = 0, 1, . . . . n On the other hand, from the definition of the sequence qn (s) it follows that fn − f C[c,d] ≤
qn − qC[a,b] → ∞ as
n → ∞.
Thus, as n → ∞, the initial data fn (x) are as close to f(x) as desired, but the corresponding solutions qn (s) do not converge to q(s), which means that the dependence of solutions on the initial data is not continuous. Example 1.2.12 (Volterra integral equation of the first kind). Consider a Volterra integral equation of the first kind x K(x, s) q(s) ds = f(x), 0 ≤ x ≤ 1. (1.2.13) 0
Assume that K(x, s) is continuous, has partial derivatives of the first order for 0 ≤ s ≤ x ≤ 1, and K(x, x) = 1 for x ∈ [0, 1]. Suppose also that the unknown function q(s) must belong to C[0, 1] and f(x) ∈ C0 [0, 1], where C0 [0, 1] is the space of functions f(x) continuous on [0, 1] such that f(0) = 0, endowed with the uniform metric.
1.2 Examples of Inverse and Ill-posed Problems
19
Exercise 1.2.13. Prove that the problem of solving equation (1.2.13) is illposed. Example 1.2.14 (the Cauchy problem for Laplace’s equation). Let u = u(x, y) be a solution to the following problem (see Chapter 7): Δu = 0,
x > 0,
u(0, y) = f(y), ux (0, y) = 0,
y ∈ R,
(1.2.14)
y ∈ R,
(1.2.15)
y ∈ R.
(1.2.16)
Let the data f(y) be chosen as follows: 1 sin (ny). n Then the solution to the problem (1.2.14)–(1.2.16) is given by f(y) = fn (y) = u(0, y) =
1 sin (ny)(enx + e−nx ), n ∈ N. (1.2.17) n For any fixed x > 0, the solution un (x, y) tends to infinity as n → ∞, while f(y) tends to zero as n → ∞. Therefore, small variations in the data of the problem in C l or W2l (for any l < ∞) may lead to indefinitely large variation in the solution, which means that the problem (1.2.14)–(1.2.16) is ill-posed. un (x, y) =
Example 1.2.15 (an inverse problem for a partial differential equation of the first order). Let q(x) be continuous and ϕ(x) be continuously differentiable for all x ∈ R. Then the following Cauchy problem is well-posed: ux − uy + q(x)u = 0, u(x, 0) = ϕ(x),
(x, y) ∈ R2 ,
(1.2.18)
x ∈ R.
(1.2.19)
Consider the inverse problem of reconstructing q(x) from the additional information about the solution to the problem (1.2.18), (1.2.19) u(0, y) = ψ(y),
y ∈ R.
(1.2.20)
The solution to (1.2.18), (1.2.19) is given by the formula u(x, y) = ϕ(x + y) exp
x
q(ξ) dξ ,
(x, y) ∈ R2 .
x+y
The condition (1.2.20) implies ψ(y) = ϕ(y) exp
0 y
Thus, the conditions
q(ξ) dξ ,
y ∈ R.
(1.2.21)
20
1 Inverse Problems of Mathematical Physics
(1) ϕ(y) and ψ(y) are continuously differentiable for y ∈ R; (2) ψ(y)/ϕ(y) > 0, y ∈ R; ψ(0) = ϕ(0) are necessary and sufficient for the existence of a solution to the inverse problem, which is determined by the following formula (Romanov, 1973a): q(x) = −
d ψ(x) ln , dx ϕ(x)
x ∈ R.
(1.2.22)
If ϕ(y) and ψ(y) are only continuous, then the problem is ill-posed. Example 1.2.16 (the Cauchy problem for the heat conduction equation in reversed time). The Cauchy problem in reversed time is formulated as follows: Let a function u(x, t) satisfy the equation ut = uxx ,
0 < x < π,
t>0
(1.2.23)
and the boundary conditions u(0, t) = u(π, t) = 0,
t > 0.
(1.2.24)
It is required to determine the values of u(x, t) at the initial instant t = 0: u(x, 0) = q(x),
0 ≤ x ≤ π,
(1.2.25)
given the values of u(x, t) at a fixed instant of time t = T > 0 u(x, T ) = f(x),
0 ≤ x ≤ π.
(1.2.26)
This problem is inverse to the problem of finding a function u(x, t) satisfying (1.2.23)–(1.2.25) where the function q(x) is given. The solution to the direct problem (1.2.23)–(1.2.25) is given by the formula ∞
u(x, t) =
2
e−n t qn sin nx,
(1.2.27)
n=1
where {qn } are the Fourier coefficients of q(x): 2 qn (x) = π
π q(x) sin (nx) dx. 0
Setting t = T in (1.2.27), we get f(x) =
∞ n=1
e−n
2T
qn sin nx,
x ∈ [0, π],
(1.2.28)
1.2 Examples of Inverse and Ill-posed Problems
which implies
2
qn = fn en T ,
21
n = 1, 2, . . . ,
where {fn } are the Fourier coefficients of f(x). Since the function q(x) ∈ L2 (0, π) is uniquely determined by its Fourier coefficients {qn }, the solution of the inverse problem is unique in L2 (0, π). Note that the condition (1.2.25) holds as a limit condition: π lim
t→+0
2 u(x, t) − q(x) dx = 0.
0
The inverse problem (1.2.23)–(1.2.26) has a solution if and only if ∞
fn2 e2n
2T
< ∞,
n=1
which obviously cannot hold for all functions f ∈ L2 (0, π). Example 1.2.17 (coefficient inverse problem for the heat conduction equation). The solution u(x, t) to the boundary value problem for the heat conduction equation cρut = (kux )x − αu + f,
0 < x < l,
0 < t < T,
(1.2.29)
u(0, t) − λ1 ux (0, t) = μ1 (t),
0 ≤ t ≤ T,
(1.2.30)
u(, t) − λ2 ux (l, t) = μ2 (t),
0 ≤ t ≤ T,
(1.2.31)
u(x, 0) = ϕ(x),
0 ≤ x ≤ l,
(1.2.32)
describes many physical processes such as heat distribution in a bar, diffusion in a hollow tube, etc. The equation coefficients and the boundary conditions represent the parameters of the medium under study. If the problem (1.2.29)– (1.2.32) describes the process of heat distribution in a bar, then c and k are the heat capacity coefficient and the heat conduction coefficient, respectively, which characterize the material of the bar. In this case, the direct problem consists in determining the temperature of the bar at a point x at an instant t (i.e., determining the function u(x, t)) from the known parameters c, ρ, k, α, f, λ1 , λ2 , μ1 , μ2 , and ϕ. Now suppose that all coefficients and functions that determine the solution u(x, t) except the heat conduction coefficient k = k(x) are known, and the temperature of the bar can be measured at a certain interior point x0 , i.e., u(x0, t) = f(t), 0 ≤ t ≤ T . The following inverse problem arises: determine the heat conduction coefficient k(x), provided that the function f(t) and all other functions in (1.2.29)–(1.2.32) are given. Other inverse problems can be formulated in a similar way, including the cases where C = C(u), k = k(u), etc. (Alifanov, Artyukhin, and Rumyantsev, 1988).
22
1 Inverse Problems of Mathematical Physics
Example 1.2.18 (interpretation of measurement data). The operation of many measurement devices that measure nonstationary fields can be described as follows: a signal q(t) arrives at the input of the device, and a function f(t) is registered at the output. In the simplest case, the functions q(t) and f(t) are related by the formula
t 0
g(t − τ )q(τ ) dτ = f(t).
(1.2.33)
In this case, g(t) is called the impulse response function of the device. In theory, g(t) is the output of the device in the case t where the input is the generalized function δ(t), i.e., Dirac’s delta function: a g(t − τ )δ(t)dτ = g(t). In practice, in order to obtain g(t), a sufficiently short and powerful impulse is provided as an input. The resulting output function is close to the impulse response function in a certain sense. Thus, the problem of interpreting measurement data, i.e., determining the form of the input signal q(t) is reduced to solving the integral equation of the first kind (1.2.33). The relationship between the input signal q(t) and the output function f(t) can be more complicated. For a “linear” device, this relationship has the form
t 0
K(t, τ )q(τ ) dτ = f(t).
The relationship between q(t) and f(t) can be nonlinear: 0
t
K(t, τ, q) dτ = f(t).
This model describes the operation of devices that register alternate electromagnetic fields, pressure and tension modes in a continuous medium, seismographs, which record vibrations of the Earth’s surface, and many other kinds of devices. Remark 1.2.19. To solve the simple equation (1.2.33), one can use Fourier or Laplace transforms. For example, extend all functions in (1.2.33) by zero to t < 0, and let g(λ), ˜ q(λ), ˜ and f˜(λ) be the Fourier transforms of the functions g(t), q(t), and f(t), respectively: ∞ ∞ ∞ iλt iλt ˜ e g(t) dt, q(λ) ˜ = e q(t) dt, f(λ) = e iλt f(t) dt. g(λ) ˜ = 0
0
Then, by the convolution theorem, ˜ g(λ) ˜ q(λ) ˜ = f(λ),
0
1.2 Examples of Inverse and Ill-posed Problems
23
and, consequently, the inversion of the Fourier transform yields the formula for the solution of (1.2.33): ∞ ∞ f˜(λ) 1 1 dλ. (1.2.34) e−iλt q(λ) ˜ dλ = e−iλt q(t) = 2π −∞ 2π −∞ g(λ) ˜ The calculation method provided by formula (1.2.34) is unstable because the function g(λ), ˜ which is the Fourier transform of the impulse response function of the device, tends to zero as λ → ∞ in real-life devices. This means that arbitrarily small variations in the measured value of f˜(λ) can lead to very large variations in the solution q(t) for sufficiently large λ. Remark 1.2.20. If g(t) is a constant, then the problem of solving (1.2.33) is the differentiation problem. Example 1.2.21 (continuation of stationary fields). Some problems of interpreting gravitational and magnetic fields related to mineral exploration lead to ill-posed problems equivalent to the Cauchy problem for Laplace’s equation. If the Earth were a spherically uniform ball, the gravitational field strength on the surface of the Earth would be constant. The inhomogeneity of the land surface and the density distribution within the Earth causes the gravitational field strength on the Earth’s surface to deviate from its mean value. Although these deviations are small in terms of percentage, they are reliably registered by physical devices (gravimeters). Gravimetric data are used in mineral exploration and prospecting. The purpose of gravimetric exploration is to determine the location and shape of subsurface inhomogeneities based on gravimetric measurement data. If the distance between geological bodies is greater than the distance between either of them and the surface of the Earth, then their locations correspond to the local maxima of the anomalies. Otherwise, the two bodies may be associated with a single local maximum. Geophysical measurements and the interpretation of their results represent the preliminary stage in prospecting for mineral deposits. The main stage of prospecting consists in drilling the exploratory wells and analyzing the drilling data. If the shape of an anomaly leads us to the conclusion that it represents a single body, then a natural choice would be to drill in the center of the anomaly. However, if the conclusion is wrong, the decision to drill in the center will result in the well being drilled between the actual bodies that we are interested in. This was often the case in the practice of geological exploration. Then it was proposed to calculate the anomalous gravitational field at a certain depth under the surface based on the results of the gravitational measurements performed on the surface of the Earth (i.e., to solve the Cauchy problem for Laplace’s equation). If it turns out that the anomaly at that depth still has a single local maximum, then it is highly
24
1 Inverse Problems of Mathematical Physics
probable that the anomaly is generated by a single body. Otherwise, if two local maxima appear as a result of recalculation, then it is natural to conclude that there are two bodies, and the locations for drilling must be chosen accordingly. A similar problem formulation arises when interpreting anomalies of a constant magnetic field, since the potential and the components of the field strength outside the magnetic bodies also satisfy Laplace’s equation. In electrical sounding using direct-current resistivity methods, direct current is applied to two electrodes inserted into the ground and the potential difference on the surface is measured. It is required to determine the structure of the subsurface area under study based on the measurement results. If the sediment layer is a homogeneous conductive medium and the basement resistivity is much higher, then the electric current lines will run along the relief of the basement surface. Consequently, to determine the relief of the basement surface, it suffices to determine the electric current lines in the sediment layer. In a homogeneous medium, the direct current potential satisfies Laplace’s equation. The normal derivative of the electric potential on the Earth’s surface is equal to zero. The potential is measured. Thus, we again arrive at the Cauchy problem for Laplace’s equation.
1.3
Well-posed and Ill-posed Problems
Let an operator A map a topological space Q into a topological space F (A : Q → F ). For any topological space Q, let O(q) denote a neighbourhood of an element q ∈ Q. Throughout what follows, D(A) is the domain of definition and R(A) is the range of A. Definition 1.3.1 (well-posedness of a problem; well-posedness in the sense of Hadamard). The problem Aq = f is well-posed on the pair of topological spaces Q and F if the following three conditions hold: (1) for any f ∈ F there exists a solution qe ∈ Q to the equation Aq = f (the existence condition), i.e., R(A) = F ; (2) the solution qe to the equation Aq = f is unique in Q (the uniqueness condition), i.e., there exists an inverse operator A−1 : F → Q; (3) for any neighbourhood O(qe ) ⊂ Q of the solution qe to the equation Aq = f, there is a neighbourhood O(f) ⊂ F of the right-hand side f such that for all fδ ∈ O(f) the element A−1 fδ = qδ belongs to the neighbourhood O(qe ), i.e., the operator A−1 is continuous (the stability condition). Definition 1.3.1 can be made more specific by replacing the topological spaces Q and F by metric, Banach, Hilbert, or Euclidean spaces. In some cases, it
1.3 Well-posed and Ill-posed Problems
25
makes more sense to take a topological space for Q and a Euclidean space for F , and so on. It is only the requirements of the existence, uniqueness, and stability of the solution that are fixed in the definition. Definition 1.3.2. The problem Aq = f is ill-posed on the pair of spaces Q and F if at least one of the three well-posedness conditions does not hold. M. M. Lavrentiev proposed to distinguish the class of conditionally well-posed problems as a subclass of ill-posed problems. Let Q and F be topological spaces and let M ⊂ Q be a fixed set. We denote by A(M ) the image of M under the map A : Q → F , i.e., A(M ) = {f ∈ F : ∃q ∈ M such that Aq = f}. It is obvious that A(M ) ⊂ F . Definition 1.3.3 (conditional/Tikhonov well-posedness). The problem Aq = f is said to be conditionally well-posed on the set M if f ∈ A(M ) and the following conditions hold: (1) a solution qe to the equation Aq = f, f ∈ A(M ), is unique on M ; (2) for any neighbourhood O(qe ) of a solution to the equation Aq = f, there exists a neighbourhood O(f) such that for any fδ ∈ O(f) ∩ A(M ) the solution to the equation Aq = fδ belongs to O(qe ) (conditional stability). It should be emphasized that the variations fδ of the data f in the second condition are assumed to lie within the class A(M ), which ensures the existence of solutions. Definition 1.3.4. The set M in Definition 1.3.3 is called the well-posedness set for the problem Aq = f. Remark 1.3.5. To prove that the problem Aq = f is well-posed, it is necessary to prove the existence, uniqueness and stability theorems for its solutions. To prove that the problem Aq = f is conditionally well-posed, it is necessary to choose a well-posedness set M , prove that the solution of the problem is unique in M and that q is conditionally stable with respect to small variations in the data (the right-hand side) f that keep the solutions within the well-posedness set M . Remark 1.3.6. It should be emphasized that proving the well-posedness of a problem necessarily involves proving the existence of a solution, whereas in the case of conditional well-posedness the existence of a solution is assumed. It certainly does not mean that the existence theorem is of no importance or that one should not aim to prove it. We only mean to emphasize that in the most interesting and important cases, the necessary conditions for the existence of
26
1 Inverse Problems of Mathematical Physics
a solution qe to a conditionally well-posed problem Aq = f which should hold for the data f turn out to be too complicated to verify and directly apply in numerical algorithms (see the Picard criterion, the solvability conditions for the inverse Sturm-Liouville problem, etc.). In this sense, the title of V. P. Maslov’s paper “The existence of a solution of an ill-posed problem is equivalent to the convergence of the regularization process” (Maslov, 1968) is very characteristic. It reveals one of the main problems in the study of strongly ill-posed problems. For this reason, the introduction of the concept of conditional well-posedness shifts the focus to the search for stable methods for approximate solution of illposed problems. However, this does not make the task of detailed mathematical analysis of the solvability conditions for any specific problem less interesting or important! Note that it is sometimes possible to construct regularizing algorithms converging to a pseudo-solution or a quasi-solution when the exact solution does not exist.
1.4
The Tikhonov Theorem
The trial-and-error method (Tikhonov and Arsenin, 1974) was one of the first methods for the approximate solution of conditionally well-posed problems. In the description of this method, we will assume that Q and F are metric spaces and M is a compact set. For an approximate solution, we choose an element qK from the well-posedness set M such that the residual ρF (Aq, f) attains its minimum at qK , i.e., ρF (AqK , f) = inf ρF (Aq, f) q∈M
(see Definition 1.5.1 in Section 1.5 for the definition of a quasi-solution). Let {qn } be a sequence of elements in Q such that lim ρF (Aqn , f) = 0. We n→∞ denote by qe the exact solution to the problem Aq = f. If M is compact, then from lim ρF (Aqn , f) = 0 it follows that lim ρQ (qn , qe ) = 0, which is proved n→∞ n→∞ by the Tikhonov theorem. Theorem 1.4.1 (The Tikhonov theorem). Let Q and F be metric spaces, and let M ⊂ Q be compact. Assume that A is a one-to-one map from M onto A(M ) ⊂ F . Then, if A is continuous, then so is the inverse map A−1 : A(M ) → M . Proof. Let f ∈ A(M ) and Aqe = f. We will prove that A−1 is continuous at f. Assume the contrary. Then there exists an ε∗ > 0 such that for any δ > 0 there is an element fδ ∈ A(M ) for which ρF (fδ , f) < δ
and ρQ (A−1 (fδ ), A−1 (f)) ≥ ε∗ .
1.4 The Tikhonov Theorem
27
Consequently, for any n ∈ N there is an element fn ∈ A(M ) such that ρF (fn , f) < 1/n
and ρQ (A−1 (fn ), A−1 (f)) ≥ ε∗ .
Therefore, lim fn = f.
n→∞
Since A−1 (fn ) ∈ M and M is compact, the sequence {A−1 (fn )} has a converg¯ ing subsequence: lim A−1 (fnk ) = q. k→∞
¯ A−1 (f)) ≥ ε∗ , it follows that q¯ = A−1 (f) = qe . On the other Since ρQ (q, hand, from the continuity of the operator A it follows that the subsequence A(A−1 (fnk )) converges and lim A(A−1 (fnk )) = lim fnk = f = Aqe . Hence, k→∞
k→∞
q¯ = qe . We have arrived at a contradiction, which proves the theorem. Theorem 1.4.1 provides a way of minimizing the residual. Let A be a continuous one-to-one operator from M to A(M ), where M is compact. Let {δn } be a decreasing sequence of positive numbers such that δn 0 as n → ∞. Instead of the exact right-hand side f we take an element fδk ∈ A(M ) such that ρF (f, fδk ) ≤ δk . For any n, using the trial-and-error method we can find an element qn such that ρF (Aqn , fδ ) ≤ δn . The elements qn approximate the exact solution qe to the equation Aq = f. Indeed, since A is a continuous operator, the image A(M ) of the compact set M is compact. Consequently, by Theorem 1.4.1, the inverse map A−1 is continuous on A(M ). Since ρF (Aqn , f) ≤ ρF (Aqn , fδ ) + ρF (fδ , f), the following inequality holds: ρF (Aqn , f) ≤ δn + δ = γnδ . Hence, by the continuity of the inverse map A−1 : A(M ) → M we have ρQ (qn , qe ) ≤ ε(γnδ ), where ε(γnδ ) → 0 as γnδ → 0. Note that the Tikhonov theorem implies the existence of a function ω(δ) such that (1) lim ω(δ) = 0; δ→0
(2) for all q1 , q2 ∈ M the inequality ρF (Aq1 , Aq2 ) ≤ δ implies ρQ (q1 , q2 ) ≤ ω(δ).
28
1 Inverse Problems of Mathematical Physics
Definition 1.4.2. Assume that Q and F are metric spaces, M ⊂ Q is a compact set, and A : Q → F is a continuous one-to-one operator that maps M onto A(M ) ⊂ F . The function ω(δ) =
sup f1 ,f2 ∈A(M ) ρF (f1 ,f2 )≤δ
ρQ (A−1 f1 , A−1 f2 )
is called the modulus of continuity of the operator A−1 on the set A(M ). Given the modulus of continuity ω(δ) or a function that majorizes it, one can estimate the norm of the deviation of the exact solution from the solution corresponding to the approximate data. Indeed, let qe ∈ M be an exact solution to the problem Aq = f and qδ ∈ M be a solution to the problem Aq = fδ , where f − fδ ≤ δ. Then q − qδ ≤ ω(δ) if M is compact. (see also Section 2.8). Therefore, after proving the uniqueness theorem, the most important stage in the study of a conditionally well-posed problem is obtaining a conditional stability estimate. Remark 1.4.3. Stability depends on the choice of topologies on Q and F . Formally, the continuity of the operator A−1 can be ensured, for example, by endowing F with the strongest topology. If A is a linear one-to-one operator and Q and F are normed spaces, then the following norm can be introduced in F : fA = A−1 f. In this case
A−1 f =1 f =0 fA
A−1 = sup
and therefore A−1 is continuous. In practice, however, the most commonly used spaces are C m and H n , where m and n are not very large (see the definition of these spaces in the Appendix). We conclude this section with several results of a more general character. Theorem 1.4.4 (stability in topological spaces). Let Q and F be topological spaces, F be a Hausdorff space, and M ⊂ Q. Let A be a continuous one-toone operator that maps M onto F (M ). If M is a compact set, then A−1 is continuous on F (M ) in the relative topology. Definition 1.4.5. An operator A : Q → F is called a closed operator if its graph G(A) = {(q, Aq), q ∈ D(A)} is closed in the product of topological spaces Q × F .
1.5 The Ivanov Theorem: Quasi-solution
29
Theorem 1.4.6. Let Q and F be Hausdorff topological spaces satisfying the first axiom of countability. Let A : Q → F be a closed one-to-one operator with domain of definition D(A), K ⊂ Q be a compact set, and M = D(A) ∩ K. Then A(M ) is closed in F and the operator A−1 is continuous on A(M ) in the relative topology. Remark 1.4.7. The assertion of the theorem still holds without the assumption that Q satisfies the first axiom of countability. Remark 1.4.8. Theorem 1.4.6 generalizes Theorem 1.4.4 by relaxing the conditions imposed on the operator A: it is assumed to be closed, but not necessarily continuous. Ill-posed problems can be formulated as problems of determining the value of a (generally, unbounded) operator at a point T f = q,
f ∈ F,
q ∈ Q.
(1.4.1)
If the operator A−1 exists for the problem Aq = f, then this problem is equivalent to problem (1.4.1). The operator A−1 , however, may not exist. Moreover, in many applied problems (for example, differentiating a function, summing a series, etc.) the representation Aq = f may be inconvenient or even unachievable, although both problems can theoretically be studied using the same scheme. We now reformulate the conditions for Hadamard well-posedness for problem (1.4.1) as follows: (1) the operator T is defined on the entire space F : D(T ) = F ; (2) T is a one-to-one map; (3) T is continuous. Problem (1.4.1) is said to be ill-posed if at least one of the well-posedness conditions does not hold. In particular, the case where the third condition does not hold (the case of instability) is the most important and substantial. In this case, problem (1.4.1) is reduced to the problem of approximating an unbounded operator with a bounded one.
1.5
The Ivanov Theorem: Quasi-solution
There are other approaches to ill-posed problems that involve a generalization of the concept of a solution. Let A be a completely continuous operator (which means that A maps any bounded set to a precompact set (see the Appendix for details)). If q is assumed to belong to a compact set M ⊂ Q and f ∈ A(M ) ⊂ F , then the formula q = A−1 f
30
1 Inverse Problems of Mathematical Physics
can be used to construct an approximate solution that is stable with respect to small variations in f. It should be noted that the condition that f belong to A(M ) is essential for the applicability of the formula q = A−1 f for finding an approximate solution, since the expression A−1 f may be meaningless if the said condition does not hold. However, finding out whether f belongs to A(M ) is a complicated problem. Furthermore, even if f ∈ A(M ), measurement errors may cause this condition to fail, i.e., fδ may not belong to A(M ). To avoid the difficulties arising when the equation Aq = f has no solutions, the notion of a quasi-solution to Aq = f is introduced as a generalization of the concept of a solution to this equation. Definition 1.5.1 (Ivanov, 1962b). A quasi-solution to the equation Aq = f on a set M ⊂ Q is an element qK ∈ M that minimizes the residual: ρF (AqK , f) = inf ρF (Aq, f). q∈M
If M is compact, then there exists a quasi-solution for any f ∈ F . If, in addition, f ∈ A(M ), then quasi-solutions qK coincide with the exact solution (qK is not necessarily unique!). We will give a sufficient condition for a quasi-solution to be unique and continuously depend on the right-hand side f. Definition 1.5.2. Let h be an element of a space F , and let G ⊂ F . An element g ∈ G is called a projection of h onto the set G if ρF (h, g) = ρF (h, G) := inf ρF (h, p). p∈G
The projection g of h onto G is written as g = PG h. Theorem 1.5.3. Assume that the equation Aq = f has at most one solution on a compact set M and for any f ∈ F its projection onto A(M ) is unique. Then a quasi-solution to the equation Aq = f is unique and continuously depends on f. The proof of this theorem and Theorem 1.5.5 can be found in Ivanov, Vasin, and Tanana (2002) and Tikhonov and Arsenin (1974). Note that all well-posedness conditions are restored when passing to quasisolutions if the assumptions of Theorem 1.5.3 hold. Consequently, the problem of finding a quasi-solution on a compact set is well-posed. Remark 1.5.4. If the solution of the equation Aq = f is not unique, then its quasi-solutions form a subset D of the compact set M . In this case, even
1.5 The Ivanov Theorem: Quasi-solution
31
without the conditions imposed on A(M ) in the assumption of the theorem, the set D continuously depends on f in the sense of the continuity of multi-valued maps. If the operator A is linear, the theorem can be stated in the following specific form. Theorem 1.5.5. Let A : Q → F be a linear operator and assume that the homogeneous equation Aq = 0 has only one solution q = 0. Furthermore, assume that M is a convex and compact set and any sphere in F is strictly convex. Then a quasi-solution to the equation Aq = f on M is unique and continuously depends on f. We now consider the case where Q and F are separable Hilbert spaces. Let A : Q → F be a completely continuous operator and M = B(0, r) := {q ∈ Q: q ≤ r}. By A∗ we denote the adjoint of the operator A. It is known that A∗ A is a self-adjoint positive completely continuous operator from Q into Q (positivity means that A∗ Aq, q > 0 for all q = 0). Let {λn } be the sequence of eigenvalues of the operator A∗ A (in descending order), and let {ϕn } be the corresponding complete orthonormal sequence of eigenfunctions (vectors). The element A∗ f can be represented as a series: A∗ f =
∞
fn ϕn ,
fn = A∗ f, ϕ∗ .
(1.5.1)
n=1
Under these conditions, the following theorem holds. Theorem 1.5.6. A quasi-solution to the equation Aq = f on the set B(0, r) is given by the formula ⎧ ∞ ∞ fn ⎪ fn2 ⎪ ⎪ ϕ , if < r2 , n ⎪ ⎨ λn λ2n n=1 qK = n=1 ∞ ∞ ⎪ f fn2 n ⎪ ⎪ ≥ r2 , ϕn , if ⎪ ⎩ 2 λ + β λ n=1 n n=1 n where β satisfies the equation ∞ n=1
fn2 = r2 . (λn + β)2
32
1 Inverse Problems of Mathematical Physics ∞
Proof. If
n=1
fn2 /λ2n < r2 , then a quasi-solution qK that minimizes the functional
ρ2F (Aq, f) := J(q) = Aq − f, Aq − f on B(0, r) can be obtained by solving the Euler-Lagrange equation A∗ Aq = A∗ f.
(1.5.2)
We will seek a solution to this equation in the form of a series: qK =
∞
qn ϕn .
n=1
Substituting this series into equation (1.5.2) and using the expansion (1.5.1) for A∗ f, we obtain ∗
A AqK =
∞
∗
qn A Aϕn =
n=1
∞
qn λ n ϕn =
n=1 ∞
Hence, qn = fn /λn . Since
n=1
∞
fn ϕn .
n=1
fn2 /λ2n < r2 , qK =
∞
(fn /λn )ϕn ∈ B(0, r)
n=1
minimizes the functional J(q) on B(0, r). ∞ fn2 /λ2n ≥ r2 , then, taking into account that qK On the other hand, if n=1
must belong to B(0, r), it is required to minimize the functional J(q) = Aq − f, Aq − f on the sphere q2 = r2 . Applying the method of Lagrange multipliers, this problem is reduced to finding the global extremum of the functional Jα (q) = Aq − f, Aq − f + αq, q. To find the minimum of the functional Jα , it is required to solve the corresponding Euler equation (1.5.3) αq + A∗ Aq = A∗ f. Substituting qK = obtain qn = fn /(α +
∞ n=1 λn ).
qn ϕn and A∗ f =
∞
fn ϕn into this equation, we
n=1
The parameter α is determined from the condition ∞ fn2 /(α + λn )2 = r2 . q = r , which is equivalent to the condition w(α) := 2
2
n=1
We now take the root of the equation ω(α) = r2 for β, which completes the proof of the theorem. Remark 1.5.7. The equation w(α) = r2 is solvable because w(0) ≥ r2 and w(α) monotonically decreases with the increase of α and vanishes as α → ∞.
1.6 The Lavrentiev’s Method
1.6
33
The Lavrentiev’s Method
If the approximate right hand side fδ of the equation Aq = f does not belong to A(M ), then one can try to replace this equation with a similar equation αq + Aq = fδ ,
α>0
(1.6.1)
for which the problem becomes well-posed. In what follows, we prove that in many cases this equation has a solution qαδ that tends to the exact solution qe of the equation Aq = f as α and the error δ in the approximation of f tend to zero at the same rate (see Lavrentiev, 1959). Assume that Q and F are separable Hilbert spaces, F = Q, and A is a linear, completely continuous, positive, and self-adjoint operator. Assume that for f ∈ F there exists a qe such that Aqe = f. Then take the solution qα = (αE + A)−1 f to the equation αq + Aq = f as an approximate solution to the equation Aq = f (the existence of qα will be proved below). If the data f is approximate, i.e., instead of f we have fδ such that f −fδ ≤ δ, then we set qαδ = (αE + A)−1 fδ . Equation (1.6.1) defines a family of regularizing operators Rα = (αE + A)−1 , α > 0 (see Definition 1.7.2). Consider this matter in more detail. Let {ϕk } be a complete orthonormal sequence of eigenfunctions and {λk } (0 < · · · ≤ λk+1 ≤ λk ≤ · · · ≤ λ1 ) be the corresponding sequence of eigenvalues of the operator A. Assume that the equation Aq = f (1.6.2) has a solution qe . Substituting the expansions qe = f=
∞ k=1 ∞
qk = qe , ϕk ,
qk ϕk ,
(1.6.3) fk = f, ϕk ,
fk ϕk ,
k=1
into (1.6.2), we conclude that qk = fk /λk and therefore qe =
∞ fk ϕk . λk k=1
Since qe ∈ Q, the series
∞ fk 2 k=1
λk
(1.6.4)
34
1 Inverse Problems of Mathematical Physics
converges. Consider the auxiliary equation αq + Aq = f.
(1.6.5)
As before, a solution qα to equation (1.6.5) can be represented in the form ∞
qα =
k=1
fk ϕk . α + λk
(1.6.6)
Taking into account that fk = λk qk , we estimate the difference qe − qα =
∞
qk ϕk −
k=1
=
∞
∞ λ k qk ϕk α + λk k=1
qk λ k qk ϕk = α ϕk . α + λk α + λk ∞
qk −
k=1
k=1
Consequently, qe − qα 2 = α2
∞ k=1
qk2 . (α + λk )2
(1.6.7)
It is now easy to show that lim qe −qα = 0. Let ε be an arbitrary positive α→+0
number. The series (1.6.7) is estimated from above as follows: α2
∞ k=1
n ∞ qk2 qk2 qk2 2 2 = α + α (α + λk )2 (α + λk )2 (α + λk )2 k=1
≤
Since the series
∞
2
α q2 + λ2n
k=n+1
∞
qk2 .
(1.6.8)
k=1
qk2 converges, there is a number n such that the second
k=1
term in the right-hand side of (1.6.8) is less than ε/2. Then we can choose α > 0 such that the first term is also less than ε/2. It follows that lim qe − qα = 0. α→+0
We now consider the problem with approximate data Aq = fδ ,
(1.6.9)
(where f − fδ ≤ δ) and the regularized problem (1.6.1): αq + Aq = fδ . Put fδ,k = fδ , ϕk . Then the solution qαδ to (1.6.1) can be represented as a series ∞ fδ,k ϕk . qαδ = α + λk k=1
1.7 The Tikhonov Regularization Method
35
We now estimate the difference qe − qαδ ≤ qe − qα + qα − qαδ .
(1.6.10)
The first term in the right-hand side of (1.6.10) vanishes as α → 0. The second term can is estimated as follows: ∞ qα − qαδ 2 = k=1
=
∞ (fk − fδ,k )2 k=1
=
fδ,k fk 2 ϕk − α + λk α + λk
(α + λk )2
≤
∞ 1 (fk − fδ,k )2 α2 k=1
2
1 δ f − fδ 2 ≤ 2 . α2 α
(1.6.11)
We now prove that qαδ → qe as α and δ tend to zero at the same rate. Take an arbitrary positive number ε. Choose α such that qe − qα < ε/2. Then find δ > 0 such that δ/α < ε/2. Then from (1.6.10) and (1.6.11) it follows that qe − qαδ ≤ qe − qα + qα − qαδ ≤ ε/2 + ε/2 = ε. We will apply the Lavrentiev method for the regularization of Volterra integral equations of the first kind in both linear case (Section 4.2) and nonlinear case (Section 4.6). In conclusion, we note that if A is not positive and self-adjoint, then the equation Aq = f can be reduced to an equation with a positive self-adjoint operator by applying the operator A∗ : A∗ Aq = A∗ f. Then the regularizing operator is written as Rα = (αE + A∗ A)−1 A∗ (cf. (1.6.3) from the previous section).
1.7
The Tikhonov Regularization Method
In many ill-posed problems Aq = f, the class M ⊂ Q of possible solutions is not a compact set, and measurement errors in the data f may result in the right-hand side not belonging to the class A(M ) for which solutions exist. The regularization method developed by A. N. Tikhonov (see Tikhonov, 1963a, 1963b, 1964) can be used to construct approximate solutions to such problems.
36
1 Inverse Problems of Mathematical Physics
First, we give the general definition of a regularizing algorithm for the problem Aq = f (Vasin and Ageev, 1995). Let A be an invertible bounded linear operator. Suppose that, instead of the operator A and the right-hand side f, we have their approximations Ah and fδ that satisfy the conditions A − Ah ≤ h,
f − fδ ≤ δ
(Q and F are normed spaces). Let A be the set of admissible perturbations of A. Definition 1.7.1. A family of mappings Rδh : f × A → Q is called a regularizing algorithm for the problem Aq = f if sup f −fδ ≤δ, A−Ah ≤h fδ ∈F, Ah ∈A
Rδh (fδ , Ah ) − A−1 f → 0
as δ → 0 and h → 0 for all f ∈ R(A) = A(Q). The set {Rδh (fδ , Ah )}, δ ∈ (0, δ0 ], h ∈ (0, h0 ] is called a regularized family of approximate solutions to the problem Aq = f. If there is any a-priori information about a solution qe to the equation Aq = f, such as the condition qe ∈ M ⊂ Q, then the set A−1 f in the above definition can be replaced with A−1 f ∩ M . In the sequel, in most cases we assume that the representation of the operator A is exact. Now assume that Q and F are metric spaces, A : Q → F , and qe is an exact solution to the ill-posed problem Aq = f for some f ∈ F . Definition 1.7.2 (regularizing family of operators (Ivanov, Vasin, and Tanana, 2002)). A family of operators {Rα }α>0 is said to be regularizing for the problem Aq = f if (1) for any α > 0 the operator Rα : F → Q is continuous, (2) for any ε > 0 there exists an α∗ > 0 such that ρQ (Rα f, qe ) < ε for all α ∈ (0, α∗ ), in other words, lim Rα f = qe .
α→+0
(1.7.1)
If the right-hand side of the equation Aq = f is approximate and the error ρF (fδ , f) ≤ δ in the initial data is known, then the regularizing family {Rα }α>0 allows us not only to construct an approximate solution qαδ = Rα fδ , but also to
1.7 The Tikhonov Regularization Method
37
estimate the deviation of the approximate solution qαδ from the exact solution qe . Indeed, using the triangle inequality, we have ρQ (qαδ , qe ) ≤ ρQ (qαδ , Rα f) + ρQ (Rα f, qe ).
(1.7.2)
The second term in the right-hand side of (1.7.2) vanishes as α → +0. Since the problem is ill-posed, the estimation of the first term as α → +0 and δ → +0 is a difficult task which is usually solved depending on the specific character of the problem under study and on the a-priori and/or a posteriori information about the exact solution. For example, consider the case where Q and F are Banach spaces, A : Q → F is a completely continuous linear operator, and Rα is a linear operator for any α > 0. Assume that there exists a unique solution qe for f ∈ F , and fδ ∈ F is an approximation of f such that f − fδ ≤ δ.
(1.7.3)
We now estimate the norm of the difference between the exact solution qe and the regularized solution qαδ = Rα fδ : qe − qαδ ≤ qe − Rα f + Rα f − Rα fδ .
(1.7.4)
Set qe − Rα f = γ(qe , α). The property (1.7.1) of a regularizing family implies that the first term in the right-hand side of (1.7.4) vanishes as α → 0, i.e., lim γ(qe , α) = 0. α→0
Since Rα is linear, from the condition (1.7.3) it follows that Rα f − Rα fδ ≤ Rα δ. Recall that the norm of an operator A : Q → F for Banach spaces Q and F is determined by the formula Aq . q∈Q q
A = sup q=0
The norm Rα cannot be uniformly bounded because the problem is ill-posed. Indeed, otherwise we would have lim Rα = A−1 and the problem Aq = f α→+0
would be well-posed in the classical sense. However, if α and δ tend to zero at the same rate, then the right-hand side of the obtained estimate qe − qαδ ≤ γ(qe , α) + Rα δ
(1.7.5)
tends to zero. Indeed, setting ω(qe , δ) = inf {γ(qe , α) + Rα δ}, we will show α>0
that lim ω(qe , δ) = 0.
δ→0
38
1 Inverse Problems of Mathematical Physics
Take an arbitrary ε > 0. Since lim γ(qe , α) = 0, there exists an α0 (ε) such that for all α ∈ (0, α0 (ε)) we have
α→0
γ(qe , α) < ε/2. Put μ0 (ε) =
inf
α∈(0,α0 (ε))
Rα and take δ0 (ε) = ε/(2μ0 (ε)). Then
inf {Rα δ} ≤ δ
α>0
inf
{Rα } ≤ ε/2.
α∈(0,α0 (ε))
for all δ ∈ (0, δ0 (ε)). Thus, for any ε > 0 there exist α0 (ε) and δ0 (ε) such that qe − qαδ < ε for all α ∈ (0, α0 (ε)) and δ ∈ (0, δ0 (ε)). For specific operators A and families {Rα }α>0 , an explicit formula for the relationship between the regularization parameter α and the data error δ can be derived. One of the well-known methods for constructing a regularizing family is the minimization of the Tikhonov functional M (q, fδ , α) = Aq − fδ 2 + αΩ(q − q 0 ), where q 0 is a test solution, α is the regularization parameter, and Ω is a stabilizing functional, which is usually represented by the norm (or a seminorm), for example, Ω(q) = q2 . The stabilizing functional Ω uses the a-priori information about the degree of smoothness of the exact solution (or about the solution structure) and determines the type of the convergence of approximate solutions to the exact one for a given relationship between α(δ) → 0 and δ → 0. For example, the Tikhonov method is effective for Ω(q) = q2W 1 in the numer2 ical solution of integral equations of the first kind that have a unique solution which is sufficiently smooth (see Vasin and Ageev (1995) and the bibliography therein). First, consider a simple example M (q, f, α) = Aq − f2 + αq2 ,
α > 0.
(1.7.6)
Theorem 1.7.3. Let Q and F be Hilbert spaces and A be a completely continuous linear operator. Then for any f ∈ F and α > 0, the functional M (q, f, α) attains its greatest lower bound at a unique element qα . Applying Theorem 1.7.3, we can construct an approximate solution qαδ from the approximate data fδ ∈ F satisfying the condition f − fδ ≤ δ and prove
1.7 The Tikhonov Regularization Method
39
that qαδ converges to the exact solution qe to the equation Aq = f as the parameters α and δ tend to zero at the same rate. Indeed, let the functional M (q, fδ , α) attain its greatest lower bound at a point qαδ , whose existence and uniqueness is guaranteed by Theorem 1.7.3. Theorem 1.7.4 (Denisov, 1995). Let the assumptions of Theorem 1.7.3 hold. Assume that there exists a unique solution qe to the equation Aq = f for some f ∈ F . Let {fδ }δ>0 be a family of approximate data such that f −fδ < δ for each of its elements. Then, if the regularization parameter α = α(δ) is chosen so that lim α(δ) = 0 and lim δ 2 /α(δ) = 0, then the element qα(δ),δ which δ→0
δ→0
minimizes the regularizing functional M (q, fδ , α) tends to the exact solution qe to the equation Aq = f, i.e., lim qα(δ),δ − qe = 0. δ→0
We now return to a more general case, where the problem Ah q = fδ is being solved instead of Aq = f. Theorem 1.7.5 (Vasin, 1989). Let the operator Ah and the right-hand side fδ satisfy the approximation conditions Ah − A ≤ h,
f − fδ ≤ δ,
(1.7.7)
where h ∈ (0, h0 ), δ ∈ (0, δ0 ). Assume that A and Ah are bounded linear operators from Q into F , where Q and F are Hilbert spaces. Let qn0 be a solution to the problem Aq = f that is normal with respect to q 0 (i.e., an element that minimizes the functional q − q 0 on the set Qf of all solutions to the problem Aq = f). Then for all α > 0, q0 ∈ Q, h ∈ (0, h0 ), and δ ∈ (δ0 ) there exists a α to the problem unique solution qhδ min {Ah q − fδ 2 + αq − q 0 2 , q ∈ Q}.
(1.7.8)
Furthermore, if the regularization parameter α satisfies the conditions lim α(Δ) = 0,
Δ→0
where Δ =
√
(h + δ)2 = 0, Δ→0 α(Δ) lim
h2 + δ 2 , then α lim qhδ − q 0 = 0.
Δ→0
Example 1.7.6 (differentiation problem). Suppose that the error in the approximate representation of f(x) ∈ C 1 (0, 1) is f − fδ C(0,1) < δ. It is required to determine the approximation of the derivative f (x) from the function fδ (x) ∈ C(0, 1).
40
1 Inverse Problems of Mathematical Physics
The ill-posedness of this problem readily follows from the fact that the function fδ (x) may not have a derivative at all. However, even if fδ (x) exists, the problem may turn out to be unstable. Indeed, with an error of the form δ sin (x/δ 2 ), the approximate representation fδ (x) = f(x) + δ sin (x/δ 2 ) tends to f(x) as δ → 0. At the same time, the difference between the derivatives fδ (x) − f (x) =
1 cos (x/δ 2 ) δ
increases indefinitely. Consider a simple regularizing family Rα f(x) = (f(x + α) − f(x))/α, where x ∈ (0, 1) and α ∈ (0, 1 − x). We now estimate the deviation f (x) − Rα fδ (x) at a fixed point x ∈ (0, 1). Since f(x) is differentiable, we have f(x + α) = f(x) + f (x)α + o(α). Hence
o(α) f(x + α) − f(x) − f (x) = . α α
Therefore, f(x + α) − f(x) |f (x) − Rα fδ (x)| ≤ f (x) − α f(x + α) − f(x) f (x + α) − f (x) δ δ − + α α o(α, x) 2δ ≤ + . (1.7.9) α α In this case, for the right-hand side of inequality (1.7.9) to vanish, it is sufficient to require that α and δ tend to zero in such a way that δ/α → 0, i.e., δ = o(α). The estimate (1.7.9) can be improved by imposing stricter requirements on the smoothness of f(x). For example, suppose that f(x) has the second derivative f (x) whose absolute value is bounded by a constant c1 in a neighbourhood O(x, ε) = {x ∈ R : |x − x | < ε} of x: sup
x ∈O(x,ε)
|f (x )| < c1 .
(1.7.10)
Then take α ∈ (0, min{ε, 1−x}) and apply the Taylor formula with Lagrange remainder: f(x + α) = f(x) + f (x)α +
f (x + θα)α2 , 2
θ ∈ (0, 1).
1.7 The Tikhonov Regularization Method
41
Hence
f(x + α) − f(x) c α 1 − f (x) ≤ . α 2 Thus, we get the following estimate instead of (1.7.9): |f (x) − Rα fδ (x)| ≤
c1 α 2δ + . 2 α
(1.7.11)
The conditions expressing the dependence of α on δ can be refined, for example, by choosing α > 0 so as to minimize the right-hand side of (1.7.11). Its minimum with respect to α is attained at the positive solution to the equation 2δ c1 = 2, 2 α
i.e., at α = 2 δ/c1 . In this case we have
|f (x) − Rα fδ (x)| ≤ 2 c1 δ. Remark 1.7.7. Other families of regularizing operators can be constructed and analyzed for convergence. For example, Rα f(x) = (f(x) − f(x − α))/α or Rα f(x) = (f(x + α) − f(x − α))/(2α). Remark 1.7.8. We estimated the deviation |f (x) − Rα fδ (x)| in the neighbourhood of an arbitrary fixed point x ∈ (0, 1). Clearly, the estimate (1.7.11) will hold for all x ∈ (0, 1) if the condition (1.7.10) is replaced with the condition f C(0,1) < const. (For more details on the point and uniform regularization, see Ivanov, Vasin, and Tanana, 2002.) Remark 1.7.9. In practice, aside from the fact that the values of f(x) are specified approximately, they are often specified only at a few points of the interval being studied. In such situations, one can use interpolation polynomials (Newton and Lagrange polynomials and splines) and take the derivative of the interpolation polynomial instead of the approximate derivative of f(x). In this case, estimating the deviation of the approximate derivative from the exact one requires additional assumptions on the smoothness of the derivative and the concordance between the measurement accuracy and the discretization step. Remark 1.7.10. Another regularization method for the differentiation operator consists in taking the convolution of the approximate function fδ (x) and a smooth function ωα (x) with the following properties: (1) sup {ωα (x)} ⊂ [−α, α]; α ωα (x) dx = 1. (2) −α
42
1 Inverse Problems of Mathematical Physics
For example, the family {ωα (x)}, α > 0, can be chosen as follows: cα exp{−α2 /(α2 − x2 )}, |x| ≤ α, ωα (x) = 0, |x| > α, cα =
α
exp
−
−α
α2 −1 dx . α2 − x2
It is easy to verify that the operators Rα defined by the formula ∞ Rα fδ (x) = fδ (y)ωα (x − y) dy −∞
constitute a family of regularizing operators. Regularizing sequence. In the construction of a regularizing family of operators, the real parameter α → 0 is sometimes replaced by a natural parameter n → ∞. Consider several examples. Example 1.7.11 (expansion in eigenfunctions). Assume that Q is a separable Hilbert space and F = Q. As in Section 1.5, let A be a completely continuous positive self-adjoint linear operator, and let {ϕn } and {λn } be the sequences of eigenfunctions and eigenvalues of A, respectively, where λk+1 ≤ λk , k = 1, 2, . . . . An exact solution qe to the equation Aq = f can be represented in the form ∞ fk ϕk , fk = f, ϕk . (1.7.12) qe = λk k=1
Since the solution exists and belongs to the Hilbert space Q, the series (1.7.12) ∞ (fk /λk )2 converge. However, if the equation Aq = fδ has no solutions and k=1
for fδ ∈ F satisfying the condition f − fδ < δ, then the series
∞ (fδk /λk )2 , k=1
fδk = fδ , ϕk , diverges (see the Picard criterion in the Appendix). We now construct the sequence of operators {Rn } Rn fδ =
n fδk k=1
λk
(1.7.13)
ϕk
and show that it has all the properties of a regularizing family as n → ∞. The operators Rn are obviously continuous, Rn = 1/λn , and for all q ∈ Q lim Rn Aq = lim
n→∞
n→∞
n k=1
qk ϕk = q,
qk = q, ϕk .
1.7 The Tikhonov Regularization Method
43
Then the following estimate holds for the exact solution (1.7.12) to the equation Aq = f and the regularized solution (1.7.13), qδn = Rn fδ : qe − qδn ≤ qe − Rn f + Rn f − Rn fδ ≤ qe − Rn Aqe + Rn (f − fδ ) ∞ ∞ δ 2 ≤ qk + Rn δ = qk2 + . λn k=n+1
k=n+1
Consequently, for any ε > 0 we can choose a number n0 such that
∞
qk2
0 to satisfy the condition δ < λn ε/2. Then the inequality qe − qδn < ε holds for all n > n0 and δ ∈ (0, λn ε/2). Example 1.7.12 (the method of successive approximations). Under the assumptions of the preceding example, suppose that λ1 < 1. Take a sequence {qn } defined by the formula qn+1 = qn − Aqn + f,
q0 = f,
n = 0, 1, 2, . . . .
We will show that if the equation Aq = f has an exact solution qe ∈ Q, then lim qn = qe .
n→∞
Indeed, qn =
n n (E − A)k f = (E − A)k Aqe k=0
k=0
Exercise 1.7.13. Prove that qe − qn = (E − A)n+1 qe .
(1.7.14)
The representation of (1.7.14) with respect to the basis {ϕn } is written as qe − qn =
∞ (1 − λk )n+1 qn ϕk ,
qk = qe , ϕk .
k=1
Then qe − qn 2 =
∞ (1 − λk )2(n+1)qk2 . k=1
Exercise 1.7.14. Prove that lim qe − qn = 0. n→∞
Exercise 1.7.15. Prove that the sequence of operators defined by the equality n Rn f = (E − A)k f is regularizing and Rn = n + 1. k=0
44
1 Inverse Problems of Mathematical Physics
Acknowledgements The work was supported by Federal Target Grant “Scientific and educational personnel of innovation Russia” for 2009–2013 (government contract No. 14.740. 11.0350).
References [1] A. S. Alekseev, Inverse dynamical problems of seismology, Some Methods and Algorithms for Interpretation of Geophysical Data, Nauka, Moscow, 9-84, 1967 (in Russian). [2] A. S. Alekseev and V. I. Dobrinskii, Questions of practical application of dynamical inverse problems of seismology, Mathematical Problems of Geophysics, 6(2). Computer Center, Siberian Branch of USSR Academy Sci., Novosibirsk, 7-53, 1975 (in Russian). [3] J. S. Azamatov and S. I. Kabanikhin, Nonlinear Volterra operator equations. L2 -theory, J. Inv. Ill-Posed Problems, 7(6), 487-510, 1999. [4] M. I. Belishev, Boundary control in reconstruction of manifolds and metrics (the bc method), Inverse Problems, 13, R1-R45, 1997. [5] M. I. Belishev and V. Yu. Gotlib, Dynamical variant of the bc-method: theory and numerical testing, J. Inverse and Ill-Posed Problems, 7, 221-240, 1999. [6] J. G. Berryman and R. R Greene, Discrete inverse methods for elastic waves, Geophys, 45, 213-233, 1980. [7] A. S. Blagoveshchenskii, One-dimensional inverse problem for a hyperbolic equation of the second order, Mathematical Questions of the Wave Propagation Theory, 2, LOMI Akad. Nauk SSSR, 85-90, 1969 (in Russian). [8] A. S. Blagoveschenskii, The local method of solution of the non-stationary inverse problem for an inhomogeneous string, Proceedings of the Mathematical Steklov Institute, 115, 28-38, 1971 (in Russian). [9] K. Bube, Convergence of discrete inversion solutions, Inverse Problems of Acoustic and Elastic Waves, SIAM, 20-47, 1984. [10] R. Burridge, The Gel’fand-Levitan, the Marchenko and the Gopinath-Sondhi integral equation of inverse scattering theory, regarded in the context of inverse impulse-response problems, Wave Motion, 2, 305-323, 1980. [11] K. M. Case and M. Kack, A discrete version of the inverse scattering problem, J. Math. Phys., 4, 594-603, 1983. [12] I. I. Eremin and V. V. Vasin, Operator and Iterative Processes of Fejer Type, Ural Branch of RAS, Ekaterinburg, 2005 (in Russian). [13] I. M. Gel’fand and B. M. Levitan, On the determination of a differential equation from its spectral function, Izv. Akad. Nauk SSSR. Ser. Mat., 15, 309-360, 1951 (in Russian). [14] G. M. L. Gladwell and N. B. Willms, A discrete Gel’fand-Levitan method for band-matrix inverse eigenvalue problems, Inverse Problems, 5, 165-179, 1989. [15] B. Gopinath and M. Sondhi, Inversion of the telegraph equation and the synthesis of nonuniform lines, Proc. IEE, 59(3), 383-392, 1971. [16] P. Goupillaud, An approach to inverse filtering of nearsurface layer effects from seismic record, Geophysics, 26, 754-760, 1961.
References
45
[17] S. He and S. I. Kabanikhin, An optimization approach to a three-dimensional acoustic inverse problemin the time domain, J. Math. Phys., 36(8), 4028-4043, 1995. [18] V. K. Ivanov, V. V. Vasin and V. P. Tanana, Theory of Linear Ill-Posed Problems and its Applications, VSP, Utrecht, 2002. [19] S. I. Kabanikhin, On the solvability of a dynamical problem of Seismology, Conditionally Well-Posed Mathematical Problems and Problems of Geophysics, Computer Center, Siberian Brunch of USSR Academy of Sciences, Novosibirsk, 43-51, 1979 (in Russian). [20] S. I. Kabanikhin, Linear Regularization of Multidimensional Inverse Problems for Hyperbolic Equations, Preprint No. 27. Sobolev Institute of Math., Novosibirsk, 1988 (in Russian). [21] S. I. Kabanikhin, Projection-Difference Methods for Determining the Coefficients of Hyperbolic Equations, SO Akad. Nauk SSSR, Novosibirsk, 1988 (in Russian). [22] S. I. Kabanikhin, Projection-Difference Methods of Determening the Coefficients of Hyperbolic Equations, Nauka, Novosibirsk (1988) (in Russian). [23] S. I. Kabanikhin, On linear regularization of multidimensional inverse problems for hyperbolic equations, Sov. Math. Dokl., 40(3), 579-583, 1990. [24] S. I. Kabanikhin, Methods of solving dynamical inverse problems for the hyperbolic equations, Ill-Posed Problems of Mathematical Physics and Analysis, Nauka, Novosibirsk, 109-123, 1992 (in Russian). [25] S. I. Kabanikhin, R. Kowar and O. Scherzer, On the Landweber iteration for the solution of parameter identification problem in a hyperbolic partial differential equation of second order, J. Inv. Ill-Posed Problems, 6(5), 403-430, 1998. [26] S. I. Kabanikhin, G. B. Bakanov and M. A. Shishlenin, Comparative Analysis of Methods of Finite-Difference Scheme Inversion, Newton-Kantorovich and Landweber Iteration in Inverse Problem for Hyperbolic Equation, Preprint N 12. Novosibirsk State University, Novosibirsk, 2001 (in Russian). [27] S. I. Kabanikhin, O. Scherzer and M. A. Shishlenin, Iteration methods for solving a two-dimensional inverse problem for a hyperbolic equation, J. Inv. Ill-Posed Problems, 11(1), 87-109, 2003. [28] S. I. Kabanikhin and M. A. Shishlenin, Boundary control and Gel’fand-LevitanKrein methods in inverse acoustic problem, J. Inv. Ill-Posed Problems, 12(2), 125-144, 2004. [29] S. I. Kabanikhin, A. D. Satybaev and M. A. Shishlenin, Direct Methods of Solving Inverse Hyperbolic Problems, VSP, The Netherlands, 2005. [30] S. I. Kabanikhin, Inverse and Ill-Posed Problems, Siberian Scientific publishers, Novosibirsk, 450, 2008 (in Russian). [31] M. V. Klibanov and A. Timonov, Carleman Estimates for Coefficient Inverse Problems and Numerical Applications, VSP, Utrecht, The Netherlands, 2004. [32] L. Beilina and M. V. Klibanov, A Globally Convergent Numerical Method for Some Coefficient Inverse Problems With Resulting Second Order Elliptic Equations, available online at http://www.ma.utexas.edu/mp_arc, preprint number 07-311. [33] M. G. Krein, Solution of the inverse Sturm-Liouville problem, Dokl. Akad. Nauk SSSR, 76, 21-24, 1951 (in Russian).
46
1 Inverse Problems of Mathematical Physics
[34] M. G. Krein, On a method of effective solution of an inverse boundary problem, Dokl. Akad. Nauk SSSR, 94, 987-990, 1954 (in Russian). [35] G. Kunetz, Essai d’analyse de traces sismiques, Geophysical Prospecting, 9(3), 317-341, 1961. [36] G. Kunetz, Quelques exemples d’analyse d’enregistrements sismiques, Geophysical Prospecting, 11(4), 409-422, 1963. [37] F. Natterer, A Discrete Gel’fand-Levitan Theory, Technical report, Institut fuer Numerische und instrumentelle Mathematik, Universitaet Muenster, Muenster, Germany, 1994. [38] B. S. Pariiskii, The inverse problem for a wave equation with a depth effect, Some Direct and Inverse Problems of Seismology, Nauka, Moscow, 139-169, 1968 (in Russian). [39] B. S. Pariiskii, Economical Methods for the Numerical Solutions of Convolution Equations and of Systems of Polynomial Equations with Toeplitz Matrices, Computer Center, USSR Academy Sci., Moscow, 1977 (in Russian). [40] B. S. Pariiskii, An economical method for the numerical solution of convolution equations, USSR Computational Math. and Math. Phys., 17(2), 208-211, 1978. [41] Rakesh, An inverse problem for the wave equation in the half plane, Inverse Problems, 9, 433-441, 1993. [42] Rakesh and W. W. Symes, Uniquiness for an inverse problem for the wave equation, Commun. Part. Different. Equat., 13, 15, 87-96, 1988. [43] V. G. Romanov and S. I. Kabanikhin, Inverse Problems of Geoelectrics (Numerical Methods of Solution), Preprint No. 32. Inst. Math., Siberian Branch of the USSR Acad. Sci., Novosibirsk, 1989 (in Russian). [44] V. G. Romanov, Local solvability of some multidimensional inverse problems for equations of hyperbolic type, Differential Equations, 25(2), 275-284, 1989 (in Russian). [45] F. Santosa, Numerical scheme for the inversion of acoustical impedance profile based on the Gel’fand-Levitan method, Geophys. J. Roy. Astr. Soc., 70, 229-244, 1982. [46] J. Sylvester and G. Uhlmann, A global uniqueness theorem for an inverse boundary value problem, Ann. of Math., 125, 153-169, 1987. [47] W. W. Symes, Inverse boundary value problems and a theorem of Gel’fand and Levitan, J. Math. Anal. Appl., 71, 378-402, 1979. [48] B. Ursin and K.-A. Berteussen, Comparison of some inverse methods for wave propagation in layered media, Proc. IEEI, 74(3), 7-19, 1986.
Author Information S. I. Kabanikhin Institute of Computational Mathematics and Mathematical Geophysics, Russian Academy of Sciences, Novosibirsk 630090, Russia. E-mail: [email protected]
Part II
Recent Advances in Regularization Theory and Methods
Chapter 2
Using Parallel Computing for Solving Multidimensional Ill-posed Problems D. V. Lukyanenko and A. G. Yagola
Abstract. Solving multidimensional ill-posed problems has attracted wide interests and found many practical applications. However, the most modern applications require processing a large amount of data that often very difficult to perform on personal computers. In these cases usual different methods are applied for simplification of the problem statements but these simplifications degrade the accuracy of the inverted parameters. It is supposed to solve calculating difficult applications in general form (without any simplifications) by using parallel computation that gives us an advantage the time and the accuracy. The proposed method can be efficiently applied for solving multidimensional Fredholm integral equations of the first kind in many areas of physics where it is necessary to solve inverse problems such as: radiophysics, optics, acoustics, spectroscopy, geophysics, tomography, image processing etc.
2.1
Introduction
Key propositions of this chapter we consider on the example of a practical problem of restoring magnetization parameters over a ship body. Formulation of the problem is as follows: a ship passes over the system of triaxial sensors (Figure 2.1) which measure the value of the induced magnetic field. According to these values of induced magnetic fields it is necessary to restore the magnetization parameters over the hull of the ship [8], [9]. This formulation of the problem is equal to a situation where the ship stands over the system of sensor arrays (Figure 2.2). The main difficulty of this problem solving (as in most of the other multidimensional problems) is that general solution of the problem is extremely time-taking. As a result, various simplifications are typically used to simplify the problem that reduce the dimension of the problem that allows us to solve it by using simpler methods [6].
50
2
Parallel Computing
Figure 2.1. The ship passes over the system of triaxial sensors.
Figure 2.2. The ship stands over the system of sensor arrays.
For example, it is used a partition of the vessel at sufficiently large subvolumes (Figure 2.3) and restoring of the magnetization parameters of these particular elements of the partition. It is obvious that this approach though can reduce the dimension of the problem but gives us only a qualitative description of the object. Second approach is to approximate the hull of the ship by an ellipsoid of revolution (Figure 2.4) for which well-known analytic transformations might be applied that can significantly reduce the dimension of the problem. But in this case we apply the assumption that magnetized part of the ship is only a hull and inner magnetized parts are not counted. Along with the often rough approximation of the hull of the ship by ellipsoid of revolution this type of simplification also gives us only a partial picture of the object under study. The third approach is to approximate the hull of the ship by a plane (which is applicable for very large ships) (Figure 2.5) which leads to the need to solve two-dimensional integral equation of convolution type for vector functions. But in all of these cases [6] (similar to other multidimensional inverse problems) the simplifications are used which although lower the dimension of the problem but give us results that are useful only in specific situations. And what can we do when it is need to solve the problem in general? For example, what can we do if the dimension of grids too huge (see Figure 2.6)? In these cases, the only way for solving of these problems is using a parallel computing [5].
2.2 Using Parallel Computing
51
Figure 2.3. Dividing the ship into subdivisions with constant values of magnetization.
Figure 2.4. Approximation the hull of the ship by an ellipsoid.
Figure 2.5. Approximation the hull of the ship by a plane.
2.2 2.2.1
Using Parallel Computing Main idea of parallel computing
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently (in parallel) [3], [4]. Parallel computation can be performs on multi-processor clusters
52
2
Parallel Computing
Figure 2.6. Example of a large segmentation of the body of the ship.
or on multi-core computers that have multiple processing elements within a single machine (see Figure 2.7).
Figure 2.7. Parallel computation using multi-processor clusters.
But not every problem can be parallelized efficiently.
2.2.2
Parallel computing limitations
The speed-up of a program as a result of parallelization is observed as Amdahl’s law. It states that a small portion of the program which cannot be parallelized will limit the overall speed-up available from parallelization. Any large mathematical or engineering problem will typically consist of several parallelizable parts and several non-parallelizable (sequential) parts. This relationship is given by equation: 1 , S= (1 − P ) + NP−1 where S is the speed-up of the program (as a factor of its original sequential runtime), N is number of processors and P is the fraction that is parallelizable.
2.3 Parallelization of Multidimensional Ill-posed Problem
53
If the sequential portion of a program is 10% of the runtime, we can get no more than a 10x speed-up, regardless of how many processors are added. This puts an upper limit on the usefulness of adding more parallel execution units (see Figure 2.8).
Figure 2.8. Amdahl’s law.
There will be shown at this chapter that parallelizable fraction for multidimensional Fredholm integral equation of the first kind is 0, (9) that gives us high effectiveness of parallelization.
2.3
Parallelization of Multidimensional Ill-posed Problem
The main practical problem investigated here is effective solving of multidimensional Fredholm integral equation of the first kind by using parallel computation. Let consider the most general problem statement: three-dimensional Fredholm integral equation of the first kind for vector-function. Following statements can be easy reduced in cases of two-dimensional equations and/or cases of a scalar function.
2.3.1
Formulation of the problem and method of solution
The equation describing the magnetic field B of dipole sources in term of the field point position relative to the source r and equivalent magnetic moment
54
2
Parallel Computing
M is defined as B (xs , ys , zs ) =
N Mk μ0 3(M k · r ks )r ks − , 4π |rks |5 |rks |3
(2.3.1)
k=1
where xs , ys , zs are coordinates of a point located on the sensor planes in the Cartesian system of coordinates (x, y, z) and correspond to coordinates of the sensors, rks is a distance between the point (xs , ys , zs ) and the point of the dipole source k, μ0 is a permeability in vacuum, N is number of the dipole sources. The equation (2.3.1) can be expressed by equivalent mathematical model of the three-dimensional Fredholm integral equation of the first kind (2.3.2) B (xs , ys , zs ) = K (xs , ys , zs , x, y, z)M (x, y, z)dv. V
In this case the left hand side B is a vector function defined on the sensors planes, and the unknown function M is also a vector function over the volume of the ship. Here (x, y, z) are coordinates of points distributed over the volume of the ship. The kernel K of the integral equation (2.3.2) corresponds to (2.3.1) and can be written as ! " 3(x−xs )2 −r2 3(x−xs )(y−ys ) 3(x−xs )(z−zs ) μ0 2 2 3(y−ys )(x−xs ) 3(y−ys ) −r 3(y−ys )(z−zs ) , K (xs , ys , zs , x, y, z) = 4πr5 3(z−zs )(x−xs ) 3(z−zs )(y−ys ) 3(z−zs )2 −r2 where r = (x − xs )2 + (y − ys )2 + (z − zs )2 . If we presume V ⊂ P = {(x, y, z) : Lx ≤ x ≤ Rx , Ly ≤ y ≤ Ry , Lz ≤ z ≤ Rz } and the set of the sensors allocated in a rectangle area Q = {(xs , ys , zs ) ≡ (s, t, r) : Ls ≤ s ≤ Rs , Lt ≤ t ≤ Rt , Lr ≤ r ≤ Rr }, we obtain RxRyRz K (s, t, r, x, y, z)M (x, y, z)dxdydz = B (s, t, r).
AM =
(2.3.3)
Lx Ly Lz
We assume that the M ∈ W22 (P ), B ∈ L2 (Q), and operator A with kernel K is continuous and unique. Norms of right-hand side of the equation (2.3.3) and solution are introduced as follows: # B L2 = B 1 2L2 + B 2 2L2 + B 3 2L2 , # M W22 = M 1 2W 2 + M 2 2W 2 + M 3 2W 2 . 2
2
2
¯ and operator A their approximate Suppose that instead of accurately known B ¯ L2 ≤ δ, A−Ah W 2 →L ≤ h. values B δ and Ah are known, such that B δ − B 2 2
2.3 Parallelization of Multidimensional Ill-posed Problem
55
The problem (2.3.3) is ill-posed and it is necessary to build the regularizing algorithm based on the minimization of the Tikhonov functional [1], [2]. We write down the Tikhonov functional F α [M ] for the equation (2.3.3) F α [M ] = Ah M − B δ 2L2 + αM 2W 2 , 2
which in our case takes the form: RxRyRz
RsRt Rr F α [M ] =
K (s, t, r, x, y, z)M (x, y, z)dxdydz
dsdtdr Ls Lt Lr
Lx Ly Lz
$2 − B(s, t, r)
+ αΩ[M ], (2.3.4)
where Ω[M ] ≡ M 2W 2 – smoothing functional: 2
% 2 & % 2 & RxRyRz ∂ M ∂ 2M ∂ M ∂2M (M , M ) + + , , Ω[M ] = ∂x2 ∂x2 ∂y 2 ∂y 2 Lx Ly Lz
% +
∂2M ∂2M , ∂z 2 ∂z 2
&' dxdydz,
α > 0 is the regularization parameter that should be chosen based on the generalized discrepancy principle. The smoothing functional in (2.3.4) is changed to W22 (G) norm of the unknown solution M (x, y, z) accordingly to a-priori information concerning smoothness of the unknown solution. Tikhonov’s theory of ill-posed problems and numerical methods can be applied for numerical solution. For any α > 0 a unique extremal of the Tikhonov functional M αη exists, η = {δ, h}, which implements minimum of F α [M ]. To select the regularization parameter method can be used generalized discrepancy principle [1]. When we choose the parameter α = α(η) accordingly to he generalized discrepancy principle 2 ρ(α) = Ah M αη − B δ 2L2 − δ + hM αη W22 = 0, M αη tends to exact solution as η → 0 in W22 . The minimal element of the Tikhonov functional for fixed α > 0 can be found by the application of the conjugate gradient method.
56
2.3.2
2
Parallel Computing
Finite-difference approximation of the functional and its gradient
When we solve minimization problem by conjugate gradient method it is necessary to calculate value of the Tikhonov functional F α [M ] and its gradient grad F α [M ]. For numerical solution we introduce uniform grids on x, y, z, s, t, r with steps hx , hy , hz , hs , ht , hr which have number of nodes Nx , Ny , Nz , Ns , Nt , Nr accordingly:
xi1 = Lx + (i1 − 1)hx ,
i1 = 1, Nx ,
yi2 = Ly + (i2 − 1)hy ,
i2 = 1, Ny ,
zi3 = Lz + (i3 − 1)hz ,
i3 = 1, Nz ,
sj1 = Ls + (j1 − 1)hs ,
j1 = 1, Ns ,
tj2 = Lt + (j2 − 1)ht ,
j2 = 1, Nt ,
rj3 = Lr + (j3 − 1)hr ,
j3 = 1, Nr ,
R x − Lx , Nx − 1 R y − Ly , hy = Ny − 1 R z − Lz hz = , Nz − 1 R s − Ls hs = , Ns − 1 R t − Lt ht = , Nt − 1 R r − Lr . hr = Nr − 1 hx =
We assume that Mim = M m (xi1 , yi2 , zi3 ), Bjn1 j2 j3 = B n (sj1 , tj2 , rj3 ), 1 i2 i3 nm nm Kj1 j2 j3 i1 i2 i3 = K (sj1 , tj2 , rj3 , xi1 , yi2 , zi3 ), n = 1, 3, m = 1, 3. All integrals in the (2.3.4) are approximated by the formula of rectangles. So we obtain the finite-difference approximation of the functional (2.3.4): F α [M ] = Φ[M ] + αΩ[M ],
(2.3.5)
where
Φ[M ] =
Ns Nt Nr 3
hs ht hr
j1 =1 j2 =1 j3 =1 n=1
⎡
×⎣
Ny Nz Nx 3 i1 =1 i2 =1 i3 =1 m=1
⎤2 hx hy hz Kjnm Mim − Bjn1 j2 j3 ⎦ , (2.3.6) 1 j2 j3 i1 i2 i3 1 i2 i3
2.3 Parallelization of Multidimensional Ill-posed Problem
Ω[M ] = hx hy hz
Ny Nz 3 Nx , i1 =1 i2 =1 i3 =1 m=1
Mim 1 i2 i3
-2
57
+ ...
Ny Nz 3 Nx −1 , m -2 hy hz Mi1 +1i2 i3 − 2Mim + 3 + Mim + ... 1 i2 i3 1 −1i2 i3 hx m=1 i1 =2 i2 =1 i3 =1
y −1 Nz 3 Nx N , m -2 hx hz Mi1 i2 +1i3 − 2Mim + 3 + Mim + ... 1 i2 i3 1 i2 −1i3 hy m=1
i1 =1 i2 =2 i3 =1
+
Ny Nz −1 3 Nx , -2 hx hy Mim − 2Mim + Mim . 1 i2 i3 +1 1 i2 i3 1 i2 i3 −1 3 hz m=1 i1 =1 i2 =1 i3 =2
(2.3.7) So the initial problem is reduced to a minimization problem in N -dimensional space, where N ≡ 3×Nx ×Ny ×Nz , and selecting the regularization parameters accordingly to generalized discrepancy principle. The minimization process starts with an arbitrary admissible vector M (0) and proceeds by the method of conjugate gradients. For given vector M (i) of minimizing sequence we compute gradient g (i) = grad F α [M (i) ]: (grad F α [M ])m i1 i2 i3 =
] ∂F α [Mim 1 i2 i3 = ... m ∂Mi1 i2 i3
= 2hx hy hz ⎡ ×⎣
Nt Nr 3 Ns j1 =1 j2 =1 j3 =1 n=1
Ny Nz 3 Nx l1 =1 l2 =1 l3 =1 p=1
hs ht hr Kjnm 1 j2 j3 i1 i2 i3 ⎤
hx hy hz Kjnp Mlp1 l2 l3 − Bjn1 j2 j3 ⎦ + . . . 1 j2 j3 l1 l2 l3 +α
∂Ω[M ] , (2.3.8) ∂Mim 1 i2 i3
hy hz hx hy ∂Ω[M ] hx hz = 2hx hy hz Mim + 2 3 Ω 1 + 2 3 Ω 2 + 2 3 Ω 3 , m 1 i2 i3 ∂Mi1 i2 i3 hx hy hz ⎧ m m m ⎪ i1 = 1, ⎪ ⎪−2Mi1 +1i2 i3 + Mi1 i2 i3 + Mi1 +2i2 i3 , ⎪ ⎪ m m m m ⎪ −4M + 5M − 2M + M , i1 = 2, ⎪ i1 +1i2 i3 i1 i2 i3 i1 −1i2 i3 i1 +2i2 i3 ⎪ ⎪ ⎨−4M m m m m m i1 +1i2 i3 + 6Mi1 i2 i3 − 4Mi1 −1i2 i3 + Mi1 −2i2 i3 + Mi1 +2i2 i3 , Ω 1 = ⎪ i1 = 3, Nx − 2, ⎪ ⎪ ⎪ ⎪ m m m m ⎪−2Mi +1i i + 5Mi i i − 4Mi −1i i + Mi −2i i , i1 = Nx − 1, ⎪ 1 2 3 1 2 3 1 2 3 1 2 3 ⎪ ⎪ ⎩−2M m m m +M +M , i =N . i1 −1i2 i3
i1 i2 i3
i1 −2i2 i3
1
x
58
2
Parallel Computing
Formulas for Ω 2 and Ω 3 are similar to Ω 1 with appropriate substitutions of the expressions in the coefficients of i2 and i3 on the expression for i1 . Based on a gradient g (i−1) and a direction of descent h (i−1) on the previous steep, new direction of descent is calculated by the formula h (i) = −g (i) + γi h (i−1) ,
γi =
(g (i) − g (i−1) , g (i) ) (g (i−1) , g (i−1) )
(Polak-Ribiere’s variant of conjugate gradient method). Then an one-parameter set consisting of elements M λ = M (i) + λh (i) are constructed. Then the problem of one-dimensional minimizing F α [M λ ] has to be solved. The minimum of the functional F α [M λ ] is taken as the next element M (i+1) of the minimizing sequence. In solving of one-dimensional minimizing problem we use a quadratic approximation (i)
σ(λ) ≡ F α [M (i) + λh (i) ] − F α [M λ ] that uses three point 0, λstep , 2λstep . Then approximate solution of onedimensional minimizing problem can be found as λmin = λstep ·
1 σ2 − 4σ1 , 2 σ2 − 2σ1
σ1 ≡ σ(λstep),
σ2 ≡ σ(2λstep).
The details of the method have been described in [1]. Structure of algorithm is the need to calculate values of the functional F α [M ] (2.3.5)–(2.3.7), and its gradient grad F α [M ] (2.3.8) which contain large groups of independent summands. It allows dividing large problem of calculating the functional and the gradient into smaller ones which are then solved “in parallel” [3], [4].
2.3.3
Parallelization of the minimization problem
In solving the problem of minimizing by the conjugate gradient method it is necessary to calculate the values of the functional and its gradient [5]. Following from (2.3.5)–(2.3.7), (2.3.8) for calculating of the functional F α [M ] and its gradient grad F α [M ] parallelization can be applied. The initial problem is solved using N (N > 2) concurrent processes with numbers 0, . . . , N − 1 which are performed on separate processors and interact with each other if it is necessary. Zero process (process with number 0) performs all unparallelizable operations: reading from a file, save file, the formation of arrays. Parallelization takes place only in cases of minimization of functional F α [M ] and its gradient grad F α [M ].
2.3 Parallelization of Multidimensional Ill-posed Problem
59
Consider in details problem of calculating the functional F α [M ] in some point M . Calculating of the functional Φ[M ] is performed using separate processors for calculating of different squares of sums in (2.3.6). At first, parallelizing of the Tikhonov functional we consider on a simplified example Nx = Ny = Nz = Ns = Nt = Nr = 2. The steps hx , hy , hz , hs , ht , hr and smoothing functional Ω[M ] are skipped for clearness (see Figure 2.9).
Figure 2.9. Simplified example of the Tikhonov functional parallelizing.
So zero process transfers to other processes vector M (see Figure 2.10), then each nonzero process calculates square of sum for variables k and l . All multi-dimensional arrays can be rewritten as one-dimensional so in the scheme the variables k and l are replaced by a variable i , which changes from 1 to N˜ = 3 × Ns × Nt × Nr . Calculations of squares of sums are performed by each processor for its values i . Characteristic property of the functional calculating is that zero process summarizes values of s in order of its calculating by nonzero processes. So each time results of calculations might be different because of round-off errors. In connection with simple structure of the functional Ω[M ] (its equal to three common summands from Φ[M ]) its calculations are performed by the instrumentality of only one process (zero process) because of number of required calculations is too small comparing with another part of algorithm. The scheme of calculating of grad F α [M ] (see Figure 2.11) similar to previous scheme but there is a difference: each non-zero process calculates element k of vector of gradient and transfers its value to zero process. So order of receiving of elements s is not important because numbers k define positions of elements s in an array of the gradient.
60
2
Parallel Computing
Figure 2.10. The scheme of calculating value of the functional for a) zero process, b) non-zero processes.
Figure 2.11. The scheme of calculating value of the gradient of the functional for: a) zero process; b) non-zero processes.
2.4 Some Examples of Calculations
61
This scheme of multiprocessing is highly efficient. Despite Amdahl’s law, which states that if P is the proportion of a program that can be made parallel (i.e. benefit from parallelization), and (1 − P ) is the proportion that cannot be parallelized (remains serial), then the maximum speedup that can be achieved 1 by using N processors is S = (1−P )+ P . For example, if the sequential portion N −1
of a program is 10% of the runtime, we can get no more than a 10x speed-up, regardless of how many processors are added. This puts an upper limit on the usefulness of adding more parallel execution units. In solving our problem, parallelization is carried out only in the calculation of the Tikhonov functional F α [M ] and its gradient grad F α [M ]. It is clear that all other calculations are carried out sequentially. But since the time of the serial code is negligible for large dimensions of the grid, the time spent on all these calculations can be considered equal to zero. It remains a question whether the impact on the effectiveness of parallelizing sequential computations of the smoothing functional Ω[M ] in the calculation of the Tikhonov functional. From formula (2.3.6) for the finite-difference approximation of the Tikhonov functional it can be seen that the residual consist of 3 × Ns × Nt × Nr independent summands, and the smoothing functional (2.3.7) consists of 3 × (≤ 4) equivalent (in terms of time spent on computing) groups of summands. So for the calculation of the Tikhonov functional part of parallelizable actions is s Nt Nr s Nt Nr = NNs N , that shows that even with a relatively small P ≥ 3N3N s Nt Nr +12 t Nr +4 number of input data, for example Ns = Nt = Nr = 10 (recall that the grids correspond to the domain of a known vector function B ), parallelized part of the computation is more than 97%. Modern applied problems require the handling of a much larger number of input data, and therefore the proportion of parallelizable computations tends to 100%, which proves a very high effectiveness of algorithms parallelization. In our case the amount of sequential code is closer to 0% with increasing Ns , Nt and Nr that provides strong efficiently parallelizing of proposed algorithm that allows to process large problems.
2.4
Some Examples of Calculations
Typical dimensions that correspond to real applications are Nx = 100, Ny = 15, Nz = 15 (Figure 2.12 and 2.13). Input data simulated a real experiment and correspond to grids Nx = 200, Ny = 15, Nz = 15, Ns = 4000, Nt = 3, Nr = 2 that relevant to 67500 unknowns and 72000 equations. As a result of implementation of the describing method distribution of the magnetization parameters over the volume of the ship was obtained. Some results of calculations are represented on Figure 2.14. Input data were specified with error equal to 1.5%.
62
2
Parallel Computing
Figure 2.12. Model of the ship.
Figure 2.13. Parallelepiped segmentation of the volume of the ship.
Figure 2.14. The results of the inversion of the magnetization parameters over the volume of the ship (it represented 5 slices of the module inverted vector function M ).
The computation time was approximately 29 hours with using 200 processors (Intel Xeon E5472 3.0 GHz). So long computations associated with the using
References
63
of the regularizing algorithms, which require repeated finding the minimum of the functional to be minimized for each value of the regularization parameter α. Testing calculations were performed on the Computing Cluster of the Moscow State University.
2.5
Conclusions
This chapter presents the advantages of using parallel computing for solving multidimensional ill-posed problems. The conjugate gradients method that is applied for the solution of multidimensional Fredholm integral equations of the first kind has a structure that is ideal for effective parallelization. Key propositions of the chapter can be easy reduced in cases of two-dimensional equations and/or cases of a scalar function. The proposed method can be efficiently applied for solving multidimensional Fredholm integral equations of the first kind in many areas of physics where it is necessary to solve inverse problems such as: radiophysics, optics, acoustics, spectroscopy, geophysics, tomography, image processing etc.
Acknowledgements The authors are partly supported by RFBR grant 10-01-91150-NFSC.
References [1] A. N. Tikhonov, A. V. Goncharsky, V. V. Stepanov and A. G. Yagola, Numerical Methods for the Solution of Ill-posed Problems, Kluwer Academic Publishers, Dordrecht, 1995. [2] A. N. Tikhonov, A. S. Leonov and A. G. Yagola, Nonlinear ill-posed problems, Proceedings of the First World Congress of Nonlinear Analysts (editor V. Lakshmikantham), 1, Berlin, Walter de Gruyters, 505-511, 1996. [3] V. V. Voevodin, Mathematical Foundations of Parallel Computing, World Scientific Publ, Co, Singapore, 1992. [4] V. V. Voevodin, Theory and practice of parallelism detection in sequential programs, Programming and Computer Software, 1992. [5] D. V. Lukyanenko and A. G. Yagola, Application of multiprocessor systems for solving three-dimensional Fredholm integral equations of the first kind for vector functions, Numerical Methods and Programming, 11, 336-343 (in Russian), 2010. [6] D. V. Lukyanenko, Y. H. Pei, A. G. Yagola, G. R. Liu and N. A. Evdokimova, Numerical methods for solving ill-posed problems with constraints and applications to inversion of the magnetic field, International Conference “Inverse and Ill-Posed Problems of Mathematical Physics”, dedicated to Professor M. M.
64
2
Parallel Computing
Lavrentiev on the occasion of his 75-th birthday August 20-25, 2007, Novosibirsk, Russia. Abstracts of Section 3, 1-2, 2007, http://www.math.nsc.ru/conference /ipmp07/section3.htm. [7] E. Shimanovskaya, A. Yagola, E. Koptelova and B. Artamonov, Inverse problems in gravitational lensing research, Optimization and Regularization for Computational Inverse Problems and Applications. A workshop at the Institute of Geology and Geophysics. The Chinese Academy of Sciences, Beijing, China. July 21-25, 2008. Meeting Guide. Abstracts of Presentation, Beijing, IGGCAS, 31, 2008. [8] Y. H. Pei and H. G. Yeo, Sequential inversion of ship magnetization from measurements, Underwater Defense technology, Asia 2003, Singapore. [9] Y. H. Pei and A. G. Yagola, Constraint Magnetization Parameter Inversion by Iterative Tikhonov Regularization, International Conference “Inverse and IllPosed Problems of Mathematical Physics”, dedicated to Professor M. M. Lavrentiev on the occasion of his 75-th birthday August 20-25, 2007, Novosibirsk, Russia, Abstracts of Section 3, 1-2, 2007, http://www.math.nsc.ru/conference /ipmp07/section3.htm.
Authors Information D. V. Lukyanenko and A. G. Yagola Department of Mathematics, Faculty of Physics, Lomonosov Moscow State University, Moscow 119991, Russia. E-mail: {lukyanenko,yagola}@physics.msu.ru
Chapter 3
Regularization of Fredholm Integral Equations of the First Kind using Nyström Approximation M. T. Nair
Abstract. Although Nyström approximation of integral operators, based on a convergent quadrature rule, is effectively used in standard numerical methods for the solution of Fredholm integral equations of the second kind, such procedure has not been explored rigorously in the literature in the context of Fredholm integral equations of the first kind, except possibly in a recent book [8] of the author, though an indication of the same can be found in an earlier work of the author [9] as well. In this chapter we make these results available in a focussed manner, by applying Nyström approximation to the integral equation of the second kind, namely, the Tikhonov regularized equation. It is observed that the derived error estimates are of the same order as the one corresponding to the Tikhonov regularization without approximation, provided the level of discretization is large enough.
3.1
Introduction
Many of the ill-posed inverse problems in science and engineering have their mathematical formulation as a Fredholm integral equation of the first kind, b k(s, t)x(t) dt = y(s), s ∈ [a, b], a
where k(·, ·) is a continuous non-degenerate kernel and y ∈ L2 [a, b] (cf. [3, 4]). In operator notation, the above equation takes the form Kx = y, where K is the integral operator defined by b k(s, t)x(t) dt, x ∈ L2 [a, b], s ∈ [a, b]. (Kx)(s) = a
(3.1.1)
(3.1.2)
66
3
Regularization of Fredholm Integral Equations of the First Kind
For example, the above type of equations arise in the contexts of 1. Computerized tomography, 2. Geological prospecting, 3. Backward heat conduction problem. Since k(·, ·) is a continuous non-degenerate kernel, K : L2 [a, b] → L2 [a, b] is compact with infinite rank. Hence, its range R(K) is not closed, and hence equation (3.1.1) is ill-posed. In fact (See Section 1.2 and Section 4.1 in [8]), (i) there does not exist δ > 0 such that {y˜ ∈ Y : y − y ˜ < δ} ⊆ R(K), so that corresponding to a perturbed data y˜ in place of y, there need not exist a solution, and (ii) even if y ∈ R(K) and there is a unique solution to (3.1.1), the perturbed data y˜ is allowed to vary in R(K), the corresponding solution x˜ need not be close to the actual solution. Therefore, one has to look for a regularization procedure for obtaining a stable approximate solution for the equation (3.1.1). Tikhonov regularization is one of the well-known and widely used such procedure. In Tihkonv regularization, one looks for the solution of the equation ˜ (K ∗ K + αI)x = K ∗ y,
(3.1.3)
for each regularization parameter α > 0, where K ∗ is the adjoint of K and y˜ is an available noisy data in place of the actual data y. Note that K ∗ K is a positive self adjoint operator on L2 [a, b], and hence, for each y˜ ∈ L2 [a, b] and for each α > 0 the equation (3.1.3) has a unique solution, and the operator (K ∗ K + αI)−1 is a bounded operator on L2 [a, b] (See Section 4.4 in [8]). Throughout the chapter we assume that k(·, ·) is continuous so that R(K) ⊆ C[a, b]. Further, we assume that y ∈ R(K) and the noisy data is y δ ∈ L2 [a, b] satisfying y − y δ 2 ≤ δ for some noise level δ > 0. Let xα and xδα be the solutions of (3.1.3) with y˜ replaced by y and y δ , respectively. Thus, we have (K ∗ K + αI)xα = K ∗ y,
(3.1.4)
(K ∗ K + αI)xδα = K ∗ y δ .
(3.1.5)
Then it is known that, if y ∈ D(K † ) := R(K) + R(K)⊥ and x† := K † y, then x† − xα 2 → 0 as
α → 0.
Here, K † denotes the Moore-Penrose generalized inverse of K. Further, using √ the estimate (K ∗ K + αI)−1 K ∗ ≤ 1/ 4α, we obtain δ xα − xδα 2 ≤ √ . 2 α
3.1 Introduction
67
These results can be found in many of the books on regularization theory, for example in [5, 3, 8]. Observe that the equations (3.1.4) and (3.1.5) are in the infinite dimensional setting. So, in order to have numerical approximations for x† , it is necessary to look for the finite dimensional realizations of (3.1.4) and (3.1.5). For this purpose, first let us recall that the operator K ∗ is an integral operator given by b (K ∗ x)(s) = k(t, s)x(t)dt, x ∈ L2 [a, b]. a
Thus, K ∗ K is also an integral operator given by b ˜ t)x(t)dt, x ∈ L2 [a, b] (K ∗ Kx)(s) = k(s, a
with kernel ˜ t) = k(s,
b
k(τ, s)k(τ, t)dτ,
s, t ∈ [a, b].
(3.1.6)
a
It is to be observed that the operators K, K ∗ and K ∗ K are not only compact from L2 [a, b] to L2 [a, b], but also from L2 [a, b] to C[a, b], where C[a, b] is endowed with the supremum norm, x∞ := sup |x(t)|,
x ∈ C[a, b].
a≤t≤b
Thus, for every y˜ ∈ L2 [a, b], K ∗ y˜ ∈ C[a, b],
(K ∗ K + αI)−1 K ∗ y˜ ∈ C[a, b].
Thus, (3.1.4) and (3.1.5) can be viewed as equations in the setting of C[a, b] as well. Therefore, we can make use of the rich theory available for integral equations of the second kind (cf. [1, 2]) for analyzing equations (3.1.4) and (3.1.5). In particular, we can use quadrature based approximations for K ∗ K, namely the Nyström approximation of K ∗ K for obtaining numerical approximation of solutions for (3.1.4) and (3.1.5), and use them for approximating x† . This has been done in the author’s recent book [8]. This chapter is written essentially based on the material available in the fifth chapter of this book for the purpose of making it available in a focussed manner, as the results in the book were derived from many general results of the previous chapters of the book. The plan of the chapter is as follows. In Section 2, we shall recall some of the properties of Nyström approximation of integral operators in general, and also present some properties of the Nyström approximation of K ∗ K. In Section 3, some general error estimates are considered. In Section 4, we shall consider error estimates by imposing specific source conditions on the solution x† , and show that we do not lose any accuracy by the process of using the numerical realizations of (3.1.4) and (3.1.5) with the help of the Nyström approximation of K ∗ K.
68
3.2
3
Regularization of Fredholm Integral Equations of the First Kind
Nyström Method for Regularized Equations
3.2.1
Nyström approximation of integral operators
Let κ(·, ·) be a continuous function on [a, b] × [a, b] and let K be the integral operator with kernel κ, i.e., b (Ku)(s) = κ(s, t)u(t)dt, u ∈ C[a, b], s ∈ [a, b]. a (n)
(n)
For n ∈ N, let Qn be a quadrature formula based on nodes τ1 , . . . , τn (n) (n) [a, b] and weights w1 , . . . , wn in R, i.e., Qn u =
n
(n)
(n)
u(τj )wj ,
in
u ∈ C[a, b].
j=1
Then the Nyström approximation Kn of the operator K associated with the quadrature formula Qn is defined by (Kn u)(s) =
n
(n)
(n)
(n)
κ(s, τj )u(τj )wj ,
u ∈ C[a, b].
j=1
We shall assume that the sequence (Qn ) of quadrature formulas converge, i.e., for every u ∈ C[a, b],
b
Qn u →
u(t) dt as n → ∞.
a
Then it is known (cf. [1, 2, 7, 8]) that there exists ω > 0 such that n
(n)
|wj | ≤ ω
(3.2.1)
j=1
and K − Kn ∞ ≥ K∞ ,
∀ n ∈ N,
whereas (K − Kn )K∞ → 0,
(K − Kn )Kn ∞ → 0 as
n → ∞.
(3.2.2)
Here, we used the notation B∞ for a bounded operator B : C[a, b] → C[a, b] to denote the operator norm of B induced by the norm · ∞ on C[a, b]. Next, suppose that λ is a nonzero scalar which is not an eigenvalue of K. Then the operator K − λI is a bijective linear operator on C[a, b] so that its
3.2 Nyström Method for Regularized Equations
69
inverse, (K − λI)−1 is a bounded linear operator. Further, using the properties of (Kn ) given in (3.2.2), it is known that there exists N ∈ N such that λ is not an eigenvalue of Kn for all n ≥ N , and the set {(Kn − λI)−1 ∞ : n ≥ N } is bounded. In particular, for every v ∈ C[a, b], the equation (Kn − λI)u = v has a unique solution un ∈ C[a, b] for every n ≥ N . Further, it is known that if u0 is the solution of the equation (K − λI)u = v, then u0 − un ∞ → 0 as n → ∞ (cf. [1, 2, 7, 8]).
3.2.2
Approximation of regularized equation
.n be the Nyström Now, let K be the integral operator given in (3.1.2) and let A (n) (n) ∗ approximation of K K associated with the nodes τ1 , . . . , τn in [a, b] and (n) (n) weights w1 , . . . , wn in R such that the corresponding quadrature rule Qn u =
n
(n)
(n)
u(τj )wj ,
u ∈ C[a, b],
j=1
converges, i.e.,
b
Qn u →
u(t) dt as n → ∞
a
.n is defined by for every u ∈ C[a, b]. Thus, A .n x)(s) = (A
n
˜ τ (n) )x(τ (n))w(n) k(s, j j j
(3.2.3)
j=1
˜ ·) is as given in (3.1.6). We have already for x ∈ C[a, b] and s ∈ [a, b], where k(·, ∗ observed that K y˜ ∈ C[a, b] and that equation (3.1.3) is uniquely solvable for every y˜ ∈ L2 [a, b] and the solution belongs to C[a, b]. Thus, the standard Nyström method for (3.1.3) would take the form .n + αI)xn = K ∗ y. (A ˜
(3.2.4)
As per the theory discussed in Section 3.2.1, the unique solvability of the equation (3.2.4) is known only for large n. Now, due to the special form of the operator A := K ∗ K, in fact, we show that this is true for all n ∈ N. For this purpose, we make use of the following observation.
70
3
Regularization of Fredholm Integral Equations of the First Kind
.n of the integral operator K ∗ K, Theorem 3.2.1. The Nyström approximation A defined in (3.2.3), has the representation .n := K ∗ Kn , A
(3.2.5) (n)
where Kn is the Nyström approximation of K associated with the nodes τ1 , . . . , (n) (n) (n) τn in [a, b] and weights w1 , . . . , wn , that is, (Kn x)(s) =
n
(n)
(n)
(n)
k(s, τj )x(τj )wj ,
x ∈ C[a, b], s ∈ [a, b].
j=1
Proof. For x ∈ C[a, b] and s ∈ [a, b], we have .n x)(s) = (A =
n
˜ τ (n))x(τ (n))w(n) k(s, j j j
j=1 n % b a
j=1
& (n) (n) (n) k(t, s)k(t, τj ) dt x(τj )wj
b
=
k(t, s)(Kn x)(t) dt a
= (K ∗ Kn x)(s). .n := K ∗ Kn . Thus, A
3.2.3
Solvability of approximate regularized equation
In order to establish the unique solvability of (3.2.4), we first observe the following result, analogous to Proposition 5.2 in [8]. Lemma 3.2.2 (cf. Nair [8]). For each n ∈ N, the operator Kn K ∗ : L2 [a, b] → L2 [a, b] is a positive and self-adjoint operator. Proof. Let x, y ∈ L2 [a, b] and s ∈ [a, b]. Then we have (Kn K ∗ x)(s) = =
n j=1 n
(n)
(n)
(n)
(n)
(n)
wj k(s, τj )(K ∗ x)(τj ) %
wj k(s, τj )
a
j=1
=
b n a j=1
(n)
(n)
b
& (n) k(t, τj )x(t)dt (n)
wj k(s, τj )k(t, τj )x(t)dt
3.2 Nyström Method for Regularized Equations
so that ∗
Kn K x, y =
b b n a
a
71
(n)
(n)
(n)
k(s, τj )k(t, τj ) wj
x(t) y(s)dt ds
j=1 ∗
= x, Kn K y and Kn K ∗ x, x =
n j=1
Thus, Kn
K∗
2 b (n) (n) wj k(s, τj )x(s) ds ≥ 0. a
is a positive self-adjoint operator for every n ∈ N.
Remark 3.2.3. It can be seen that the operator Kn K ∗ is also obtained by approximating the kernel of KK ∗ by the convergent quadrature rule, that is, b κn (s, t)x(t) dt, x ∈ L2 [a, b], s ∈ [a, b], (Kn K ∗ x)(s) = a
where κn (s, t) :=
n
(n)
(n)
(n)
k(s, τj )k(t, τj ) wj ,
s, t ∈ [a, b].
j=1
At this point, we may recall that, in [6], Groetsch considered the degenerate kernel approximation An of the operator K ∗ K, by approximating the kernel of the operator K ∗ K by the convergent quadrature rule, so that An := Fn K, where Fn is the Nyström approximation of the integral operator K ∗ . Hence, conclusion in Lemma 3.2.2 holds for An := Fn K as well. The following lemma, which will be used in the due course, can be proved using ideas from spectral theory. However, for an elementary proof for the same, one may see Nair ([8], Lemma 4.1 and Corollary 4.4). Lemma 3.2.4 (cf. Nair [8]). Let X and Y be Hilbert spaces, A : X → X and T : X → Y be bounded linear operators with A being a positive self-adjoint operator, that is, Ax, x ≥ 0 for every x ∈ X and A∗ = A. Then for every α > 0, A + αI and T ∗ T + αI are bijective, and the following hold: (A + αI)−1 ≤ 1/α, (A + αI) ∗
(T T + αI)
−1
−1
A ≤ 1,
√ T ≤ 1/ α. ∗
In view of Lemma 3.2.2 and Lemma 3.2.4, Kn to itself for every α > 0, and
K ∗ +αI
(Kn K ∗ + αI)−1 ≤ for all α > 0 and n ∈ N.
1 α
(3.2.6) (3.2.7) (3.2.8) is bijective from L2 [a, b] (3.2.9)
72
3
Regularization of Fredholm Integral Equations of the First Kind
Theorem 3.2.5 ([8], Theorem 5.14). For α > 0 and n ∈ N, the operator K ∗ Kn + αI is bijective from C[a, b] into itself. In particular, for every α > 0 and v ∈ L2 [a, b], there exists a unique xn ∈ C[a, b] such that (K ∗ Kn + αI)xn = K ∗ v. Proof. Let v ∈ C[a, b]. Let un be the unique element in L2 [a, b] such that (Kn K ∗ + αI)un = Kn v. Then we see that u˜ n :=
1 (v − K ∗ un ) ∈ C[a, b] α
and it is the unique element in C[a, b] such that (K ∗ Kn + αI)u˜ n = v. Thus, K ∗ Kn + αI : C[a, b] → C[a, b] is bijective. The particular case follows, since K ∗ v ∈ C[a, b] for every v ∈ L2 [a, b]. In view of the last part of Theorem 3.2.5, for every α > 0 and n ∈ N, there exists a unique xδα,n ∈ C[a, b] such that (K ∗ Kn + αI)xδα,n = K ∗ y δ .
(3.2.10)
Here, for δ > 0, y δ ∈ L2 [a, b] is a noisy data in place of the actual data y such that y − y δ 2 ≤ δ. We observe that xδα,n = K ∗ uδα,n , where uδα,n ∈ L2 [a, b] is the unique element satisfying the equation (Kn K ∗ + αI)uδα,n = y δ . Indeed, applying K ∗ on both sides of equation (3.2.11), we get, (K ∗ Kn + αI)K ∗ uδα,n = K ∗ y δ which is nothing but (3.2.10) with xδα,n = K ∗ uδα,n .
(3.2.11)
3.2 Nyström Method for Regularized Equations
3.2.4
73
Method of numerical solution
From (3.2.10), we have n
˜ τ (n))xδα,n (τ (n) )w(n) + αxδα,n (s) = (K ∗ y δ )(s), k(s, j j j
s ∈ [a, b].
j=1 (n)
In particular, ai := xδα,n (τi ) for i = 1, . . . , n, satisfies n
˜ (n) , τ (n) )w(n) aj + α ai = (K ∗ y δ )(τ (n)). k(τ i j j i
j=1
Thus, xδα,n is given by 1 ∗ δ ˜ τ (n) )w(n) aj , k(s, (K y )(s) − = j j α n
xδα,n (s)
j=1
where a := [aj ] is the unique solution of the matrix equation Ma + αa = b, with
(n)
(n)
(n)
˜ M := [k(τ i , τj )wj ],
(n)
b = [(K ∗ y δ )(τi )].
Remark 3.2.6. The above procedure of the numerical solution can also be arrived at by using equation (3.2.11). Since (Kn K ∗ u)(s) =
n
k(s, τj )wj u, kj ,
s ∈ [a, b],
j=1
with kj (s) = k(s, τj ), j = 1, . . . , n, equation (3.2.11) gives n
k(s, τj )wj uδα,n , kj + αuδα,n (s) = y δ (s),
s ∈ [a, b].
j=1
Taking inner product with ki (·), we get n kj , ki wj uδα,n , kj + αuδα,n , ki = y δ , ki j=1
for i = 1, . . . , n. Thus, xδα,n (s)
:= a
b
k(t, s)uδα,n (t) dt,
s ∈ s ∈ [a, b],
74
3
Regularization of Fredholm Integral Equations of the First Kind
with
1 δ y (s) − k(s, τj )wj a˜ j , α n
uδα,n (s) =
j=1
where ˜ a := [˜aj ] is the unique solution of the matrix equation ˜ ˜ a + α˜ M˜ a = b, with ˜ := [kj , ki w(n) ], M j
˜ = [y δ , ki ]. b
˜ = M and b ˜ = b. It can be easily seen that M
3.3
Error Estimates
Now, we would like to derive estimates for the error x† − xδα,n . Our idea is to derive estimates for xδα − xδα,n and use the relation xˆ − xδα,n = (xˆ − xα ) + (xα − xδα ) + (xδα − xδα,n ), where xα and xδα be the Tikhonov regularized solutions of (3.1.1) with actual data y and noisy data y δ , respectively, that is, solutions of (3.1.4) and (3.1.5), respectively, and the norm · is either ·2 or ·∞ depending on the context. We observe that xδα = K ∗ uδα , xα = K ∗ uα , where uδα and uα are the unique elements satisfying the equations (KK ∗ + αI)uδα = y δ
and (KK ∗ + αI)uα = y,
respectively. Note that xα − xδα = K ∗ (uα − uδα ), xδα − xδα,n = K ∗ (uδα − uδα,n ), where uδα,n is the solution of the equation (3.2.11).
3.3.1
Some preparatory results
For proving the convergence of the method, we make use of the following two lemmas as well.
3.3 Error Estimates
75
Lemma 3.3.1. For every compact operator T : L2 [a, b] → C[a, b], (Kn − K)T → 0
as
n → ∞.
In particular, (Kn − K)K ∗ → 0,
(Kn − K)K ∗K → 0
as n → ∞. Here, the norm B is the operator norm of B : L2 [a, b] → C[a, b]. Proof. Let T : L2 [a, b] → C[a, b] be a compact operator. Since the Nyström approximation Kn of K is pointwise convergent on C[a, b], that is, (K − Kn )x∞ → 0 as n → ∞ for every x ∈ C[a, b], and since T is a compact operator from L2 [a, b] to C[a, b], it follows (cf. [7], Corollary 6.6) that (K − Kn )T → 0 as n → ∞. The particular case is obvious, as both K ∗ and K ∗ K are compact operators from L2 [a, b] to C[a, b]. Lemma 3.3.2. For every u ∈ N (K)⊥ and v ∈ N (K ∗ )⊥ , α(K ∗ K + αI)−1 u → 0
and
α(KK ∗ + αI)−1 v → 0
as α → 0. Proof. Let Tα := α(K ∗ K+αI)−1 for α > 0. By Lemma 3.2.4, we have Tα ≤ 1 for every α > 0. Also, we know that R(K ∗ K) is a dense subspace of N (K)⊥ . Now, for x ∈ R(K ∗ K), let z ∈ X be such that x = K ∗ Kz. Again, by Lemma 3.2.4, Tα x = Tα K ∗ Kz = α(K ∗ K + αI)−1 K ∗ Kz ≤ αz. Thus, α(K ∗ K + αI)−1 x → 0 for every x ∈ R(K ∗ K). Therefore (cf. [7], Theorem 3.11), it follows that α(K ∗ K + αI)−1 u → 0 for every u ∈ N (K)⊥ . By interchanging the roles of K and K ∗ in the above argument we see that α(KK ∗ + αI)−1 v → 0 for every u ∈ N (T ∗ )⊥ . Since, k(·, ·) is continuous, we know that for every x ∈ L2 [a, b], the functions Kx, K ∗ x and K ∗ Kx are in C[a, b]. In particular, it follows that xα and xδα are in C[a, b]. We have already mentioned in Section 3.1 that x† − xα 2 → 0 as α → 0
76
3
Regularization of Fredholm Integral Equations of the First Kind
and δ xα − xδα 2 ≤ √ . α However, with respect to the uniform ·∞ , we only have the following theorem (cf. Groetsch [6]), for the proof of which we shall make use of the relation K ∗ x∞ ≤ κ0 x2 , where
/ κ0 :=
sup
b
(3.3.1) 01/2
2
|(k(s, t)| dt
.
(3.3.2)
a≤s≤b a
Theorem 3.3.3 (cf. Groetsch [6]). For α > 0, δ > 0, xα − xδα ∞ ≤ κ0
δ α
with κ0 as in (3.3.2), and if xˆ ∈ R(K ∗ ), x† − xα ∞ → 0
as
α → 0.
Proof. Using the relation (3.3.1), we have xα − xδα ∞ = K ∗ (KK ∗ + αI)−1 (y − y δ )∞ ≤ κ0 (KK ∗ + αI)−1 (y − y δ )2 κ0 δ . ≤ α Next, suppose that xˆ ∈ R(K ∗ ). Then there exists u ∈ N (K ∗ )⊥ such that = K ∗ u. Hence, by the relation (3.3.1) and the identity xˆ − xα = α(K ∗ K + ˆ we have αI)−1 x, x†
x† − xα ∞ = αK ∗ (KK ∗ + αI)−1 u∞ ≤ κ0 α(KK ∗ + αI)−1 u2 . Now, by Lemma 3.3.2, α(KK ∗ + αI)−1 u2 → 0 as α → 0. Now, we deduce estimates for the error in the regularized approximation with respect to the norms · 2 and · ∞ .
3.3 Error Estimates
3.3.2
77
Error estimate with respect to · 2
We shall use the following notations: εn := (K − Kn )K ∗ ,
ηn := (K − Kn )K ∗ K.
By Lemma 3.3.1, we know that εn → 0 and ηn → 0 as n → ∞. Now, we state a theorem on error estimate with respect to the norm · 2 . For its proof, and for the proof of some of the results in this section, reader may refer [8]. The corresponding results in [8] are mentioned along with the statement of the results. Theorem 3.3.4 ([8], Theorem 5.17). For every α > 0 and n ∈ N, δ xˆ − xδα,n 2 ≤ a˜ n (α)xˆ − xα 2 + b˜ n (α) √ , α where εn ηn 1 + , α α3/2 εn εn 1+ . b˜ n (α) := 1 + α α
a˜ n (α) := 1 +
In particular, we have the following: (i) For α > 0, if nα ∈ N is such that εn ≤ αδ , then
3/2
ηn ≤ αδ ,
%
xˆ −
xδα,n 2
∀ n ≥ nα ,
δ ≤ 3 xˆ − xα 2 + √ α
& ,
∀ n ≥ nα .
√ (ii) If αδ > 0 is such that δ/ αδ → 0 as δ → 0 and if nδ := nαδ , then xˆ − xδα,nδ 2 → 0
3.3.3
as
δ → 0.
Error estimate with respect to · ∞
Now we deduce an error estimate with respect to the norm · ∞ on C[a, b]. Let us first recall that, since R(K ∗) ⊆ C[a, b], both xα and xδα belong to C[a, b].
78
3
Regularization of Fredholm Integral Equations of the First Kind
Theorem 3.3.5 ([8], Theorems 5.18, 5.19). Let κ0 be as in (3.3.1). Then for every α > 0 and n ∈ N, δ xˆ − xδα,n ∞ ≤ aˆ n (α)xˆ − xα ∞ + bˆ n (α) α where
κ0 ηn , α2 In particular, we have the following. aˆ n (α) := 1 +
εn . bˆ n (α) := κ0 1 + α
(i) For α > 0, let nα ∈ N be such that εn ≤ α,
ηn ≤
α2 , κ0
where κ0 be as in (3.3.1). Then & % δ xˆ − xδα,n ∞ ≤ 2 xˆ − xα ∞ + κ0 α
∀ n ≥ nα .
(ii) If αδ > 0 is such that δ/αδ → 0 as δ → 0 and if nδ := nαδ , then xˆ − xδα,nδ ∞ → 0
3.3.4
as
δ → 0.
A modified method
Next, let us look at the case in which the operator K ∗ on the right hand side of the equation (3.2.10) is replaced by an approximation Bn of it. In this case, the approximation of x† is the solution of the equation .n + αI)x˜ δα,n = Bn y δ , (A
(3.3.3)
We observe that δ x˜ δα,n = xδα,n + zα,n ,
where
δ .n + αI)−1 (Bn − K ∗ )y δ . zα,n = (A
We assume that each Bn is a bounded operator from L2 [a, b] to C[a, b] so that there exists μn > 0 such that (Bn − K ∗ )u∞ ≤ μn u2
∀ u ∈ L2 [a, b].
(3.3.4)
For example, Bn may be of the form Bn := Pn K ∗ , where (Pn ) is a sequence of projection operators on C[a, b] such that Pn x − x∞ → 0 as
n→∞
3.3 Error Estimates
79
for every x ∈ C[a, b]. In this case, we see that (Bn − K ∗ )u∞ = (I − Pn )K ∗ u∞ ≤ (I − Pn )K ∗ u2
∀ u ∈ L2 [a, b].
Thus, in this case, we have μn := (I − Pn )K ∗ → 0 as
n → ∞.
.n + αI)−1 with respect Now, in order to derive an estimate for the norm of (A to the norm · ∞ on C[a, b], , we first recall from the definition of Kn that sup{Kn v∞ ≤ cˆ0 v∞ ,
v ∈ C[a, b],
(3.3.5)
where cˆ0 := k∞ ω with ω is as in (3.2.1). The main ingredient for deriving the error estimate is the following lemma. Lemma 3.3.6 ([8], Proposition 5.5). For every v ∈ C[a, b], % & .n + αI)−1 v∞ ≤ 1 1 + κˆ v∞ , (A α α √ where κˆ := κ0 cˆ0 b − a with κ0 and cˆ0 as in (3.3.1) and (3.3.5), respectively. Since (Bn − K ∗ )y δ ∞ ≤ (Bn − K ∗ )(y δ − y)∞ + (Bn − K ∗ )y∞ ≤ μn δ + (Bn − K ∗ )y∞ , we have δ ∞ zα,n
1 ≤ α
%
κˆ 1+ α
&
(μn δ + (Bn − K ∗ )y∞ ) .
Therefore, using the estimates for xˆ − xδα,n 2 and and xˆ − xδα,n ∞ from Theorem 3.3.4 and Theorem 3.3.5, respectively, we obtain the following theorem, using the notations εn ηn , a˜ n (α) := 1 + 3/2 1 + α α εn εn 1+ , b˜ n (α) := 1 + α α κ0 ηn aˆ n (α) := 1 + 2 , α ˆbn (α) := κ0 1 + εn , α % & κˆ μn 1+ , c˜n (α) := b˜ n (α) + √ α α % & κˆ dˆn (α) := bˆ n (α) + μn 1 + α with with κ0 as in (3.3.1), μn as in (3.3.4) and κˆ as in Lemma 3.3.6.
80
3
Regularization of Fredholm Integral Equations of the First Kind
Theorem 3.3.7. For every α > 0 and n ∈ N, δ 1 xˆ − x˜ δα,n 2 ≤ a˜ n (α)xˆ − xα 2 + c˜n (α) √ + α α and xˆ −
xδα,n ∞
δ 1 ≤ aˆ n (α)xˆ − xα ∞ + dˆn (α) + α α
% 1+ %
κˆ α
κˆ 1+ α
&
&
(Bn − K ∗ )y∞
(Bn − K ∗ )y∞ .
From the above theorem the following corollary is immediate. Corollary 3.3.8. Suppose μn → 0 as n → ∞. Then, for α ∈ (0, α0 ] for some α0 > 0, the following hold: (i) Let nα ∈ N be such that εn ≤ α,
ηn ≤ α3/2 ,
μn ≤ α3/2 ,
(Bn − K ∗ )y∞ ≤ δα3/2
for all n ≥ nα . Then there exists c1 > 0 such that % & δ xˆ − x˜ δα,n 2 ≤ c1 xˆ − xα 2 + √ , α
∀ n ≥ nα .
(ii) Let n˜ α ∈ N be such that εn ≤ α,
ηn ≤ α2 ,
μn ≤ α,
(Bn − K ∗ )y∞ ≤ δα
for all n ≥ nα . Then there exists c2 > 0 such that % & δ xˆ − x˜ δα,n ∞ ≤ c2 xˆ − xα ∞ + , α
3.4
∀ n ≥ n˜ α .
Conclusion
.n ) of the integral operator K ∗ K to We have used the Nyström approximation (A have a regularized approximation method for approximately solving the integral equation of the first kind, Kx = y, where K is a Fredholm integral operator with continuous kernel. In this case the solvability the approximate equation is uniquely solvable for every n ∈ N, instead of the standard known result for all large enough n ∈ N. This is established by making use of the the representation .n ), where Kn is the Nyström .n = K ∗ Kn of the Nyström approximation (A A ∗ approximation of K, and the fact that Kn K is a positive self adjoint operator. Error estimates are given in terms of the uniform norm · ∞ as well as the L2 -norm · 2 when the available noisy data y δ ∈ L2 [a, b] satisfies y − y δ 2 ≤ δ for some known error level δ > 0. The derived error estimates are of the same order as that one obtains from Tikhonov regularization for large enough n.
References
81
References [1] P. Anselone, Collectively Compact Operator Approximation Theory and Applications to Integral Equations, Prentice-Hall, Englewood Cliffs, NJ, 1971. [2] K. Atkinson and W. Han, Theoretical Numerical Analysis, Springer, 2000. [3] H. W. Engl, M. Hanke and A. Neubauer, Regularization of Inverse Problems, Dordrecht, Kluwer, 1993. [4] C. W. Groetsch, Inverse Problems in Mathematical Sciences, Vieweg, Braunschweg, Wiesbaden, 1993. [5] C. W. Groetsch, Tikhonov Rgularization of Fredholm Integral Equations of the First Kind, Pitman, 1984. [6] C. W. Groetsch, Convergence of a regularized degenerate kernel method for Fredholm integral equations of the first kind, Integr. Equat. Oper. Th., 13, 67-75, 1990. [7] M. T. Nair, Functional Analysis: A First Course, Prentice-Hall of India, New Delhi, 2002 (Third Printing: PHI Learning, Pvt. Ltd., 2010). [8] M. T. Nair, Liner Operator Equations: Approximation and Regularization, World Scietific, Singapore, 2009. [9] M. T. Nair, A unified approach for regularized approximation methods for Fredholm integral equations of the first kind, Numer. Funct. Anal. and Optim., 15 (3-4), 381-389, 1994.
Author Information M. T. Nair Department of Mathematics, Indian Institute of Technology Madras, Chennai 600 036, India. E-mail: [email protected]
Chapter 4
Regularization of Numerical Differentiation: Methods and Applications T. Y. Xiao, H. Zhang and L. L. Hao
Abstract. In many scientific and engineering applications one may encounter the problem of numerically differentiating the approximated function specified by noise data, which is a classical and typical ill-posed inverse problem. So various regularization schemes have been presented and employed to obtain the stable approximated derivatives. The Objective of this chapter is to summarize and analyze in brief these methods from a viewpoint of integration of approximation, optimization and regularization theory, to test their performances by numerical comparisons and to demonstrate their usefulness by some representative examples. A so-called δ 2 -rule for determining parameter in TV-regularization is presented and shown to be effective. Furthermore, the unsolved problems and suggestions are also proposed.
4.1
Introduction
In many scientific and engineering applications one may have to estimate a derivative y given the noisy values of the function y to be differentiated. This kind of problems is well known as Numerical Differentiation (abbreviated to ND). For the space limitation, we only give a short list of the titles of applications of numerical differentiation: • recovering the local volatility of underlying assets [41] (in Finance); • estimating the parameters in new product (/technology) diffusion model [40] (in Technological Forecasting); • determining the source (/coefficients) item [37] (in Heat Conduct Problem); • the numerical inversion of Abel transform [35] (in Physics); • the parameter identification of differentiation operator [6] (in System Identification);
84
4
Regularization of Numerical Differentiation
• the linear viscoelastic stress-strain analysis from experimental data [27] (in Viscous Elastic Mechanics); • the image edge(/corner) detection [32] (in Image Processing); • calculating vorticity from observational wind [3] (in Meteorology). Many similar problems, and many other type of applications can be found in [23, 24, 14, 25] and the references therein; we will also discuss several representative examples in details in Section 4.4. ND-problem can be formulated in the first kind of integral equations for seeking unknown function φ(x) if we take, without loss of generality, the definition domain of y as [0, 1]: ⎧ x ⎪ ⎪ φ(s)ds = y(x)(y(0) = 0), x ∈ [0, 1], or ⎨ (Aφ)(x) := 0 1 (4.1.1) ⎪ ⎪ φ(s)ds = −y(x)(y(1) = 0), x ∈ [0, 1] ⎩ (A∗ φ)(x) := x
for the first order of derivative; and (Aφ)(x) :=
1 0
K(x, s)φ(s)ds = y(x)(y(0) = y(1) = 0), x ∈ [0, 1]
for the second order of derivative; where (1 − s)x, K(x, s) = (1 − x)s,
(4.1.2)
0 ≤ x ≤ s; s ≤ x ≤ 1.
They are typical inverse problems which means especially that the differentiation of noisy function is ill-posed: small (in some metric) perturbation of y may cause large errors in the computed derivative. As Hanke [14] said it “encompasses many subtleties and pitfalls that a complex (linear) inverse problem can exhibit” in the solving process; yet there are some other remarkable features: the operator A is positive; the structure of A is simple, the related singular systems are easy to obtain, and, as can be shown, the degree of ill-posedness of the discrete matrixes is moderate. So it is easy to analyze, considerably easy to solve. However, we will show that there still exist some deficiencies in the outcomes when using the existing techniques for solving ND-problems. Because of the nature of ill-posedness, we should employ stable numerical methods to do ND, in which Tikhonov regularization must be a preferred approach. Actually, the first batch of applications of Tikhonov regularization to ND were contributed by Dolgolova and Ivanov [8] (1966), Vasin [28] (1969), Dolgopolova [9] (1970) and Cullum [7] (1971). After that various kinds of regularization methods are successively applied to ND problems. Since the purpose
4.1 Introduction
85
of performing regularization or stabilization is to obtain a stable numerical result of ND-problem, hereafter we would incorporate them into a large family named as regularization. It is obvious that the stable numerical schemes (SNS) of ND are mainly rooted in regularization theory, and its development is parallel to the development of regularization theory. Therefore, the basic concepts and ideas of constructing SNS are as in the regularization methods, i.e., Find a good approximation of the original ill-posed problem from a family of the neighbor well-posed problems or some kind of optimization problems with reasonable objective functional and constraints. Thus, to design a concrete SNS it may include the following tasks: (1) How to construct the neighbor ‘well-posed problem’ ? We should make sure what does the ‘neighbor’ mean? How to ensure the ‘neighbor problems’ to be wellposed? Does the ‘neighbor problems’ appear in a discrete or continuous way? (2) How to impose the appreciate constrains on the desired solution? This may involve the setting of the solution space, the smoothness, boundness of the feasible solutions and the quantitative (or qualitative) assumptions about the input data, etc. For example, is y(x) k-times continuously differentiable (y(x) ∈ H (k) )? Is y(x) or Aφ−y bounded? (3) How to control the degree of the proximity? This usually means the suitable choice of some control number or regularization parameter. Of course, most of the above tasks can be settled as in the general regularization methods, but they may appear in a concise style or fresh faces. Try to take a look at the ways of constructing the neighbored well-posed problems. If we suppose the precise data y(x) ∈ H (1) (H (2)), and its δ-approximation, y δ ∈ L2 [0, 1], such that y − y δ ≤ δ, we can Construct an integral operator Dh such that Dh y δ (x) ≈ y (k) (x): 1 1 δ ψk (t)y δ (x + th)dt (k = 1, 2) (4.1.3) Dh y (x) = k h −1 with a suitable step-size h > 0 and a selected polynomial ψk (t); Construct a mollifier Jλ , a convolution operator with a kernel Kλ ∞ δ δ δ Jλ y (x) := y ∗ Kλ = Kλ ∗ y = Kλ (x − s)y δ (s)ds (4.1.4) −∞
such that Jλ y(x) be a smooth ‘Identity Approximation’, i.e., ∞ Kλ (x − s)y(s)ds = y(x), ∀y ∈ H (k) lim+ λ→0
(4.1.5)
−∞
and (Jλ y (x)) δ
(k)
=
∞
−∞
(Kλ (x − s))(k)y δ (s)ds.
(4.1.6)
86
4
Regularization of Numerical Differentiation
We hope to choose Kλ with λ = λ¯ such that (Jλ¯y δ (x))(k) ≈ y (k) . Change positive operator equation (4.1.1) into the second kind: x φ(s)ds + αφ(x) = y δ (x), (α > 0), y δ ∈ L2 [0, 1] (4.1.7) 0
by choosing an appreciate α, (4.1.7) is expected to be a neighbored well-posed problem of (4.1.1). Find a min-norm least square solution among the set of feasible solutions Fδ := {φ|Aφ − y δ ≤ δ; φ ∈ H (k) }: ' 1 φ¯ = arg inf Ω[φ] , Ω[φ] =: |φ(k) (x)|2dx; k = 0, 1, 2 (4.1.8) φ∈Fδ
0
which can be transformed as solving an unconstrained optimization problem by method of Lagrange multiplier: ' δ δ 2 φα = arg inf (Aφ − y + αΩ[φ]) , (α > 0) (4.1.9) φ∈H (k)
where Ω[φ] is a penalty item or stabilizer. Taking a satisfied value of α > 0, the well-posed problem (4.1.9) will close to the problem: ' + 2 (4.1.10) inf Aφ − y . φT = A y := arg φ∈H (k)
Give another set of feasible solutions: (denoting y˜i = y δ (xi ) and Sφ = {φ|φ are spline functions}) $ n−1 1 (φ(xi ) − y˜i )2 ≤ δ 2 ; φ ∈ Sφ . FD,δ := φ | n−1 i=1
So can construct an unconstrained minimization problem: $ n−1 1 (φ(xi ) − y˜i )2 + αΩ[φ] inf φ∈FD,δ n−1
(4.1.11)
i=1
this is well-posed ∀α > 0 and close, when α is suitable, to the problem $ n−1 1 (4.1.12) inf (φ(xi ) − y˜i )2 φ∈Sφ n−1 i=1
and so on and so forth. In a word, the regularizing schemes could be constructed by means of smoother, mollifier, filter, penalty factor or stabilizer accompanying
4.2 Regularizing Schemes
87
some appreciate fit-to-data functional on a selected set of feasible solutions; in which we need to take several theoretical and computational tools including approximation, optimization, regularization, and some combination of them, into the work. The remainder of this chapter is organized as follows. In Section 4.2 we give a brief introduction on major stable schemes of ND. Section 4.3 provides a numerical comparisons of most of those schemes. Some representative applied examples are demonstrated in Section 4.4. Finally, we complete this chapter with some conclusions and suggestions.
4.2
Regularizing Schemes
There are plenty of schemes devoted to stabilized or regularized ND, we intend to pay close attention to the following ones: (1) Regularized Difference Methods (RDM); (2) Smoother-Based Regularization (SBR); (3) Mollifier Regularization Methods (MRM); (4) Tikhonov Variational Regularization(TiVR); (5) Lavrentiev Regularization Methods(LRM); (6) Discrete Regularization Methods(DRM); (7) Semi-Discrete Tikhonov Regularization(SDTR); (8) Total Variation Regularization (TVR). With some settings and basic assumptions, we will present a brief introduction, summarize and analyze the main results in this section. Moreover, a so-called δ 2 -rule for TVR is proposed.
4.2.1
Basic settings
Basic setting of numerical differentiation Given a tabulated function y(Δ) ˜ = {y˜0 , y˜1 , . . . , y˜n }, which are the sampled values at the points of the grid Δx = {0 = x0 < x1 < · · · < xn = 1}, of an ideal and (piecewise) smooth function y(x) on [0, 1]. First we state some basic settings and assumptions: Setting 1 (Discrepancy in Discrete Form): Suppose (1) δ and y˜i are given with |y˜i − y(xi)| ≤ δ, for i = 1, 2, . . . , n − 1; (2) y˜0 = y(0) and y˜n = y(1) (error-free at x = 0, 1). The task is to find a smooth approximation f (k) (x) of y (k)(x), k = 1, 2, from the given data (y(Δ), ˜ δ), with some guaranteed accuracy. Setting 2 (Discrepancy in Continuous Form): Suppose that (1) y˜i = y δ (xi ), i = 0, 1, 2, . . . , n, where y δ (x) ∈ L2 [0, 1] is formally the approximate version of y(x);
88
4
Regularization of Numerical Differentiation
(2) for a given δ: y δ (x) − y(x)μ ≤ δ, μ = L2 [0, 1] or μ = ∞. The task is to find an approximation of y (k)(Δ), k = 1, 2 from the given data (y(Δ), ˜ δ) and the y δ (x), with some guaranteed accuracy. Basic assumptions: H.1 exact data y ∈ Y := H (k) (k ∈ N), a space of whole k-times continuously differentiable functions, y (k) exists uniquely. H.2 approximate data y δ ∈ Lk [0, 1], k = 1, 2 or ∞; it is known with the error level δ > 0: y − y δ l ≤ δ, l = 1, 2, or ∞. General formula of error estimation of ND Let Dh : Lk [0, 1] → Ll [0, 1] and Rα : Lk [0, 1] → Ll [0, 1] be two linear operators. The total error of stabilized ND consists of two parts:
or
Dh y δ − y ≤ Dh y − y + Dh y δ − Dh y 1 23 4 1 23 4
(4.2.1)
Rα y δ − y ≤ Rα y − y + Rα y δ − Rα y 1 23 4 1 23 4
(4.2.2)
⇓ ⇓ ⇓ Total Error ≈ Approximated Error+Regularized Error. So,
Rα y − y ≤ C1 (y, y , α), Rα y δ − Rα y ≤ C2 (δ, α)Rα .
(4.2.3)
We should select h = h(δ) or α = α(δ) such that the right hand-side of (4.2.1) or (4.2.2) is minimized!
4.2.2
Regularized difference method (RDM)
Let y δ ∈ L∞ (0, 1), take step size h = h(δ) as a regularization parameter. Ramm proposed following stable schemes (1968, cf. [25]): ⎧1 δ δ ⎪ 0 < x < h, ⎪ (y (x + h) − y (x)), ⎪ h ⎪ ⎨ 1 δ (4.2.4) Rh y δ (x) = h < x < 1 − h, (y (x + h) − y δ (x − h)), ⎪ 2h ⎪ ⎪ ⎪ ⎩ 1 (y δ (x) − y δ (x − h)), 1 − h < x < 1, h > 0, h ⎧ 1 δ δ δ ⎪ 0 < x < 2h, ⎪ (4y (x + h) − y (x + 2h) − 3y (x)), ⎪ 2h ⎪ ⎨ 1 δ Rh y δ (x) = 2h < x < 1 − 2h, (y (x + h) − y δ (x − h)), ⎪ 2h ⎪ ⎪ ⎪ ⎩ 1 (3y δ (x) + y δ (x − 2h) − 4y δ (x − h)), 1 − 2h < x < 1. 2h (4.2.5)
4.2 Regularizing Schemes
89
Error estimation: for scheme (4.2.4) and (4.2.5), we have Rh(δ) y δ − y 2 ≤ Rh(δ) y δ − y 2 ≤
N2,2h 2δ + ; 2 h
y 2 ≤ N2,2,
(4.2.6)
N3,2h2 4δ + ; 24 h
y 2 ≤ N3,2.
(4.2.7)
Minimizing the right-hand side of (4.2.6), (4.2.7) leads to hopt = 2 δ/N2,2 (or 4 3 3δ/N3,2 ), so Rh(δ) y δ − y 2 = O(δ 1/2 ) or O(δ 2/3) respectively. RDM is an integration of difference approximation and regularizing idea, which is simple to be realized, but the unknown constants N2,2 , N3,2 may result in the non-convenience for choosing h. In addition approximating y (0), y (1) is not possible at x = 0, 1!
4.2.3
Smoother-Based regularization (SBR)
Smoothing methods by integration (see [12]) Lanczos(1956) claimed to perform “differentiation by integration”: 3 f (x) ≈ Dh f(x) = 3 2h
h
−h
tf(x + th)dt.
(4.2.8)
It can be shown for an error-free and smooth function f(x) that Dh f(x) = f (x) + O(h2 ) −→ f (x)(h → 0).
(4.2.9)
Groetsch (1998) [12] considered the case of the noise version, f δ (x): f − f δ ∞ ≤ δ, where f δ ∈ L1 . He proved that Dh f δ − f ∞ ≤ If taking h = h(δ) =
3δ M h2 + , 10 2h
f (3) (x)∞ ≤ M.
(4.2.10)
√ 3 δ, then
Dh f δ − f ∞ = O(δ 2/3 ).
(4.2.11)
Luo and He [21] generalized the above method, they gave more effective schemes, for k = 1, 2, f δ ∈ L1 , f (k) can be approximated by ⎧ 1 1 ⎪ k δ ⎪ ψk (t)f δ (x + th)dt, where ⎪ ⎨ Dh f (x) := hk −1 n n (4.2.12) ⎪ 2i−1 2i ⎪ (t) = a t ; ψ (t) = a t . ψ ⎪ 1 2i−1 2 2i ⎩ i=1
i=0
90
4
Regularization of Numerical Differentiation
Here {a2i−1 } and {a2i } can be determined respectively by matching the error exponents when inserting the Taylor expanding of f(x + th) respect to th, into Dkh f(x). For example, it was derived [21] that ⎧ 15 3t ⎪ ⎨ ψ1 (t) = (n = 1) and (5t − 7t3 ) (n = 2), 2 8 2 4 ⎪ ⎩ ψ2 (t) = 15 (−1 + 3t2 ) (n = 1) and −525 + 4410t − 4725t (n = 2). 4 32 (4.2.13) Moreover, the asymptotic optimal convergence rate of (4.2.12) is given by 2n 1 D1h f δ − f ∞ = O(δ 2n+1 ), h = c1 δ 2n+1 , for f ∈ C (2n+1) , (4.2.14) 2n−1 1 D2h f δ − f ∞ = O(δ 2n+1 ), h = c2 δ 2n+1 . A compromise strategy is to take h = δ 1/(2n+1) in both of the cases. Notice that if f(x) is smooth enough (n > 1), the asymptotic order of the error (4.2.14) is higher than in the case of RDM, but if f ∈ C (3) merely, both of their asymptotic order are the same. Likewise, SBR is easy to be realized numerically; although y (0) and y (1) can be approximated, but they cannot be reconstructed with high precision.
4.2.4
Mollifier regularization method (MRM)
Mollifiers (also known as approximations to the identity) have been used in approximation theory for a very long time, it was introduced by Vasin [29] and Murio [22] as the mollification regularization for dealing with ND-problem. The idea of mollification method is simple and intuitive: given a function which is rather irregular, by convolving (mollifying) it with a mollifier the function gets “mollified” (see (4.1.5) of Section 4.1), that is, its rough features are smoothed, while still remaining close to the original non-smooth function. Since differentiating a smooth function is rather stable, to which we can expect that the derivative of smoothed function will close to the derivative of the original function. Below we introduce some of the main results of MRM presented by Murio in monograph [23]. Let C 0 (I) denote the set of continuous functions over the interval I = [0, 1] with φ∞,I = infx∈I |φ(x)| < ∞. Suppose that y ∈ H (2)[0, 1] and its observed version y δ ∈ C 0 (I) satisfies y δ − y∞,I ≤ δ; where δ ≥ 0 is error level. Outline of the steps of MBR Introduce a Gaussian kernel Kλ with a ‘blurring radius’, λ: √ Kλ (x) = exp(−x2 /λ2 )/λ π.
(4.2.15)
We notice that (1) Kλ ∈ C ∞ falls to nearly 0 outside a few radii from its center (≈ 3λ); (2) it is positive and has total integral 1.
4.2 Regularizing Schemes
91
δ Make a continuation of y, 5 y into Iλ = [−3λ, 1+3λ] such that (1) they decay smoothly to 0 in [−3λ, 0] [1, 1 + 3λ]; (2) they are 0 in R − Iλ . This can be done by defining δ 2 y (x) = y δ (0) exp{ [(3λ)x2 −x2 ] }, −3λ ≤ x ≤ 0, (4.2.16) 2 (x−1) y δ (x) = y δ (1) exp{ [(x−1)2 −(3λ)2 ] }, 1 ≤ x ≤ 1 + 3λ.
∞ Kλ (x − s)f(s)ds Jλ f(x) := (Kλ ∗ f) = −∞ x+3λ ∼ Kλ (x − s)f(s)ds, f = y, y δ . =
So
(4.2.17)
x−3λ
Choose a suitable λ being a regularization parameter which can be done by discrepancy principle (δ is known) or GCV-criterion (without knowing δ). For instance, the radius λ¯ can be determined by solving the discrepancy-like equation: Jλ y δ (x) − y δ I,∞ = δ. Make the differentiating operation: According to the property of the mollifier, for f = y, y δ we have
(Jλ f) (x) = (Kλ ∗ f) (x) = (Kλ ∗ f)(x).
(4.2.18)
Once the radius of mollification λ¯ is determined, we can get continuous ap proximation: (Jλ¯y δ ) (x) by the right-side expression, (Kλ¯ ∗ y δ )(x) of (4.2.18) or use centered differences to approximate the derivative of Jλ¯y δ at the sample ¯ 1 − 3λ] ¯ for the case of the mollified function points of the interval Iλ¯ = [3λ, δ Jλ¯y being discrete. Consistency, Stability & Error Estimation of MRM ⎧ ⎪ (C): y ≤ M ⇒ (Kλ ∗ y) )∞ ≤ 3λM, ⎪ ⎪ ⎪ 2δ ⎨ (S): y δ ∈ C 0 (I), y δ − y∞ ≤ δ ⇒ (Kλ ∗ (y δ − y)) ∞ ≤ √ , λ π ⎪ ⎪ 2δ ⎪ δ ⎪ ⎩ (E): both of the above hold ⇒ (Kλ ∗ y ) − y ∞ ≤ 3λM + √ . λ π (4.2.19) We observe that the error estimate is minimized by choosing √ λ = λ¯ = [2δ π/3M ]1/2 which results in
√ √ 2 6M √ δ δ ⇒ (K ∗ y ) − y = O( δ). (4.2.20) (Kλ ∗ y ) − y ∞ ≤ ∞ λ π4 The Numerical Implementation of MRM The numerical scheme of MRM can refer to [23], where many applied examples of ND are included. Moreover, a group of programs in Matlab language for MRM are also provided by Murio. δ
92
4
4.2.5
Regularization of Numerical Differentiation
Tikhonov’s variational regularization (TiVR)
Since Tikhonov’s variational regularization is well-known, we only give a brief description that may differ from the usual case. Introducing Heaviside function we get two equivalent problems: 1 x φ(s)ds = y δ (x) ⇔ H(x − s)φ(s)ds = y δ (x) (0 ≤ x ≤ 1), (4.2.21) 0
0
where H(x − s) = 1, if x ≥ s; else = 0. The Euler equation of variational problem (4.1.9) is (A∗ A + αΩ )φ(s) = A∗ y δ (s) := g(s) where
⎧ ' 1 1 ⎪ ∗ ⎪ H(x − s)H(x − ξ)dx φ(ξ)dξ, ⎨ (A Aφ)(s) = 0 0 1 1 ⎪ δ ⎪ ⎩ g(s) = H(x − s)y (x)dx = y δ (x)dx 0
and αΩ [φ] = In which φ2
if Ω[φ] = φ2L2 , (case 1)
⎩ α(φ(s) − φ (s)),
if Ω[φ] = φ2
=
(4.2.23)
s
⎧ ⎨ αφ(s),
(1) H2
(4.2.22)
1 0
(1) H2
. (case 2)
(4.2.24)
(|φ(x)|2 +|φ (x)|2)dx. As well known, for α > 0 the Euler
equation (4.2.22) is uniquely solvable and its solution φδα continuously depends on y δ . The selection of regularization parameter α It must satisfy Regularity Conditions, i.e., select α = α(δ) such that ⎧ δ2 ⎪ ⎪ = 0, (strongly) ⎨ lim α(δ) = 0; lim δ→0 δ→0 α(δ) (4.2.25) ⎪ δ2 ⎪ ⎩ lim α(δ) = 0; lim < ∞ (weakly) δ→0 δ→0 α(δ) by which a simple and effective setting is: α(δ) = δ 2 . Further, the regularization parameter can be determined by using Morozov discrepancy principle, L-curve rule, GCV rule and Quasi-optimal rule (see [11, 36]). Of course the above steps should be realized in a discrete way. The error of the regularized solution obeys general estimation: under suitable assumptions and conditions (see [11, 36]), we may have that φδα −y = O(δ 1/2 ) or O(δ 2/3 ).
4.2 Regularizing Schemes
4.2.6
93
Lavrentiev regularization method (LRM)
Let operator A be the same as in (4.1.1). For solving (4.1.1) with right-side hand y δ , we can make some translations to get the neighbored problems: Aφ = y ⇒ Aφ + αφ := (A + αE)φ = y (α > 0), Aφ = y δ ⇒ Aφ + αφ := (A + αE)φ = y δ (α > 0).
(4.2.26) (4.2.27)
The well-posedness of (4.2.27) can be proved based upon the following Lax-Milgram theorem: A strictly coercive operator T : F → F (F is a Hilbert space) has a bounded inversion: T−1 : F → F . x Actually, Tα := (A+αE) is strictly coercive, since Aφ = 0 φ(s)ds is positive; for α > 0 and identity operator E, we have Re(Tα φ, φ) = Re(Aφ, φ) + αφ2 ≥ αφ2 , ∀φ ∈ F
(4.2.28)
and Tα ≤ 1/α. So problem (4.2.27) is well-posed. Select α = α∗ to satisfy discrepancy (/residual) principle: Aφδα∗ − y δ 2 = Cδ 2 (C ≥ 1); for y δ > δ; (for DP) (4.2.29) α∗ (A + α∗ E)−1 (Aφδα∗ − y δ ) = Cδ (C ≥ 1). (for RP) √ When DP is employed and y ∈ L2 , we have φδα(δ) − y = O( δ) [37]. The quasi-optimal rule in discrete form can also be used and seems to be more effective than in Tikhonov regularization as shown in [37]. Advantages: simple, analytical solution can be obtained [2]: ⎧ −1 x s−x δ y δ (x) ⎪ δ ⎪ exp( )y (s)ds + , or ⎨ y (x) ≈ φα (x) = 2 α 0 α α (4.2.30) 1 ⎪ 1 x−s δ y δ (x) ⎪ δ ⎩ y (x) ≈ φα (x) = exp( )y (s)ds − α2 x α α For suitable α > 0, it is easy to be implemented by means of numerical quadratures. Although one can obtain an approximation of y (x) by differentiating expression (4.2.30), but it is not stable numerically. There is another way to approximate the second order derivative. For the second order of ND, Xu and Liu [39] presented an analytical formula by Lavrentiev regularization, we write it in a more compact form: g δ (x) hδα (x) − , where (4.2.31) α α $ x 1 1−s δ x−s δ sinh √ g (s)ds − sinh √ g (s)ds . α α 0 0 (4.2.32)
y (x) ≈ ψαδ (x) = hδα (x)
1 =√ α
cosh √xα cosh √1α
94
4
Regularization of Numerical Differentiation
The ψαδ in (4.2.31) satisfies the second kind of Volterra equation: ⎧ ⎨ B[ψ](x) + αψ(x) = −y δ (x) − (y δ (0)) (1 − x) := g δ (x), 1 s ⎩ B[ψ](x) := ψ(τ )dτ ds. x
(4.2.33)
0
A posterior choice of the regularization parameter is given by Xu and Liu [39], we summarize it as the following theorem: Theorem 4.2.1. Assume that g δ > Cδ γ , g δ − g ≤ δ < Cδ γ for two constraints C ≥ 1 and 0 < γ < 1; ψαδ solves eq. (4.2.33), then 1. ∃α = α(δ) such that δ B[ψα(δ) ] − g δ = Cδ γ ,
(4.2.34)
2. φδα(δ) − y = O(δ min(1−γ,γ/2)). So the optimal convergence rate: φδα(δ) − y = O(δ (1/3)) is obtained for γ = 2/3.
Computational considerations: 1. In g δ (x) we can take (y δ ) (0) ≈
1 −y δ (0) + 2 α α
0
1
exp(
−s δ )y (s)ds; α
2. A further simplification to (4.2.31) − (4.2.32) gives ψαδ (x) ≈
1 1 −|x − s| √ (exp( √ ) + Kα (x, s))y δ (s)ds 2α α 0 α 1 −x 1 − y δ (x) − √ (y δ ) (0) exp( √ ), α α α
(4.2.35)
√ ) − exp( x+s−2 √ ). where Kα (x, s) = exp( −x−s α α
The discrepancy equation (4.2.34) and expression (4.2.35) must be solved in a discrete way if g δ (x) is merely known on the grid Δx . In this case so many numerical approximations of the integrals in the iterative process must be involved, hence the computing cost would be quite large.
4.2.7
Discrete regularization method (DRM)
Roughly speaking, the above-mentioned regularization schemes have their relevant discrete versions in the numerical implementation, thus we can add the word ‘discrete’ in front of them. So we may have discrete MRM, discrete LRM, discrete TiVR and so on; we here abbreviate to DMRM, DLRM and DTiVR
4.2 Regularizing Schemes
95
respectively. Let’s take DTiVR as an example, its performing procedure usually includes three jobs: discretization, regularization and select regularization parameter; the former two are as Aφ = y δ ⇒ Ah φh = yhδ ⇒ (ATh Ah + αE)φh = ATh yhδ
(4.2.36)
where Ah , an m×n-order matrix, is discrete approximation of integral operator A, E=unite matrix or E = LT L where L a discrete derivative matrix. The matrix Ah can be obtained via the following manners. Two class of discrete schemes: collocation and Galerkin Method. S1: for (4.1.1) we employ collocation method, on an uniformly grid Δx with Simpson’s quadrature, which results in ⎧ xk+1 ⎪ ⎪ φ(s)ds = (Δy δ )(xk ) = y δ (xk+1 ) − y δ (xk−1 ), ⎪ ⎪ ⎨ xk−1 xk+1 h (4.2.37) φ(s)ds ≈ [φ(xk−1) + 4φ(xk ) + φ(xk+1 )], ⎪ ⎪ 3 ⎪ ⎪ ⎩ xk−1 (k = 1, 2, . . . , n − 1) this can be written in a matrix equation of (n − 1) × (n + 1) order if denoting δ yh,1 = (y δ (x2 ) − y δ (x0 ), y δ (x3 ) − y δ (x1 ), . . . , y δ (xn ) − y δ (xn−2 ))T : δ Ah,1 φh = yh,1 , φh = (φ(x0), . . . , φ(xn ))T .
(4.2.38)
To remedy its defect, we add two constrained conditions at x0 = 0 and xn = 1, i.e., for k = n − 1, n, we have xk h φ(s)ds ≈ [φ(x0 ) + 2φ(x2 ) + · · · + 2φ(xk−1 ) + φ(xk )] = y δ (xk ) (4.2.39) 2 0 δ := (y δ (x δ T denoted as Ah,2 φh = yh,2 n−1 ), y (xn )) , a 2 × (n + 1) order-equations. Combine these equations with (4.2.38), we get δ δ ; yh,2 ]. Ah φh := [Ah,1 ; Ah,2 ]φh = yhδ := [yh,1
(4.2.40)
By the way, we point out that if φ(x0), φ(xn ) are known, then we can get, by recasting equation (4.2.38), a well-posed linear equations whose condition number never exceed 3 (cf. [40]). This shows the absence of information at the two ends is one of the sources of ill-posedness in ND-problems. S2: for (4.1.2) we adopt Galerkin method with orthogonal base function, by using a Matlab command [15]: [Ah , b] = deriv2(n, example), which generates an n × n-order matrix Ah corresponding to the second order derivative, the discretization of A in (4.1.2).
96
4
Regularization of Numerical Differentiation
S3: for (4.1.2) we can also employ collocation method on the grid Δx = {0 = x0 < x1 < · · · < xn−1 < xn = 1} with the kernel (4.1.3): xi 1 sφ(s)ds + xi (s − 1)φ(s)ds = y δ (xi ) (i = 1, 2, . . . , n − 1). (xi − 1) 0
xi
(4.2.41) Suppose hi = xi+1 − xi = h = const, by using trapezoid formula, we get a (n − 1) × (n − 1)-order linear equations: Ah φh = yhδ , where ⎞ ⎛ x1 (x1 − 1) x1 (x2 − 1) . . . x1 (xn−1 − 1) ⎜ x1 (x2 − 1) x2 (x2 − 1) . . . x2 (xn−1 − 1) ⎟ ⎟ ⎜ ⎟, ... ... ... ... Ah = h ⎜ (4.2.42) ⎟ ⎜ ⎝ x1 (xn−2 − 1) x2 (xn−2 − 1) . . . xn−2 (xn−1 − 1) ⎠ x1 (xn−1 − 1) x2 (xn−1 − 1) . . . xn−1 (xn−1 − 1) φh = (φ(x1 ), φ(x2), . . . , φ(xn−1 ))T , (4.2.43) yhδ = (y δ (x1 ), y δ (x2), . . . , y δ (xn−1 ))T . It should be noted that for S2 and S3, the unknown variables of φ(x0 ), φ(xn ) are not included in vector φh ! The remaining task of DRM can be completed by using Hansen’s Regularization Tools [15] which provides various strategies for getting the regularized solution and effective rules for selecting the regularization parameter. The rational combination of “strategies” and “rules” can help us to obtain stable approximated derivatives. For convenience, we use symbol ‘DRM-S1-dp’ to denote DTiVR-method based on discrete scheme S1 with discrepancy principle; for other similar writings, like DRM-gcv or DRM-S3, their meaning are self-evident.
4.2.8
Semi-Discrete Tikhonov regularization (SDTR)
The idea of SDTR may originate in a work of Hanke [14]. The method is a variant of TiVR: (1) the space of feasible solutions is usually some spline space; (2) the constraints are defined by discrete norm but the stabilizer is in continuous semi-norm. So the objective functional to be minimized is described simultaneously by discrete and continuous forms; that’s why we named it as SDTR. The methods usually include following steps: Preestablish a concrete set of feasible solution, for example 6 7 (4.2.44) FD,δ := φ | φ ∈ Pk [0, 1]; M SEd (φ) ≤ δ 2 where Pk [0, 1] and M SEd (φ) are defined as follows Pk [0, 1] = {φ(x)|φ(x) is k-order spline polynimial, k ∈ N} ,
(4.2.45)
4.2 Regularizing Schemes
97
1 (φ(xi ) − y˜i )2 . n−1 n−1
M SEd (φ) :=
(4.2.46)
i=1
Construct and solve the unconstraint optimal problem: inf
φ∈FD,δ
1 (φ(xi ) − y˜i )2 + αφ(k) 2 n−1 n−1
$ .
(4.2.47)
i=1
To solve (4.2.47) we impose the conditions of 1. interpolating conditions: φ(xi ) = y δ (xi ) := y˜i , i = 0, 1, 2, . . . , n; 2. connecting conditions: φ(l) (xi −) = φ(l) (xi +), i = 1, 2, . . . , n − 1; l = 1, 2, . . . , k; objective) = 0. variables) Some other conditions also need to be imposed.
3. optimality (necessary) condition:
∂ (the
∂ (the
Compute the first/second derivative of the spline polynomial which is the minimizer of (4.2.47). The example of SDTR by Lu and Wang [20]: Denote uniform grid Δx := {0 = x0 < x1 < · · · < xn = 1} with step h = 1/n, suppose y(x) ∈ H (3) (0, 1); α = δ 2 and setting ⎧ ⎪ ⎨ φ(0) = y(0) = y˜0 , φ(1) = y(1) = y˜n ; |φ(xi ) − y˜i | ≤ δ, n−1 1 (φ(xi ) − y˜i )2 + δ 2 φ(3) 2L2 [0,1] . ⎪ ⎩ Ψ[φ] = n − 1
(4.2.48)
i=1
Let φ(x) ∈ FD,δ be the piecewise polynomial of five order (k = 5): φ(x) = aj + bj (x − xj ) + cj (x − xj )2 + dj (x − xj )3 +ej (x − xj )4 + fj (x − xj )5 , x ∈ [xj , xj+1 ], j = 0, 1, . . . n − 1. (4.2.49) The Minimizer of Ψ[φ] : φ∗ (x) must satisfy the following conditions: ⎧ (i) (i) ⎪ φ∗ (xj +) − φ∗ (xj −) = 0; i = 0, 1, . . . , 4, j = 1, 2, . . . , n − 1; ⎪ ⎪ ⎪ ⎪ y˜j − φ∗ (xj ) ⎨ (5) (5) , j = 1, . . . , n − 1; φ∗ (xj +) − φ∗ (xj −) = −α(n − 1) ⎪ (3+j) (3+j) ⎪ ⎪ (1) = 0, j = 0, 1; φ∗ (0) = 0, j = 0, 1; φ∗ ⎪ ⎪ ⎩ φ (0) = y(0) = y˜ , φ (1) = y(1) = y˜ . ∗
0
∗
n
(4.2.50)
98
4
Regularization of Numerical Differentiation
The coefficients in (4.2.49) satisfy the following equations: ⎧ 1 1 ⎪ (y˜j − aj ); fj = Δej ; ⎪ fj+1 − fj = − ⎪ ⎪ 120αn 5h ⎪ ⎪ ⎪ ⎪ ⎨ dj+1 − dj = 4ej h + 10fj h2 ; dj = 1 Δcj − 1 (2ej+1 + 4ej ); 3h 3h ⎪ 2 3 4 ⎪ − b = 2c h + 3d h + 4e h + 5f h ; b ⎪ j+1 j j j j j ⎪ ⎪ ⎪ ⎪ 1 2 1 8 7 ⎪ ⎩ b = Δa − hc − hc h3 ei + h3 ej+1 j j j j+1 + h 3 3 15 15
(4.2.51)
where Δ is difference operator, i.e., Δ(•)j = (•)j+1 − (•)j . Let a = (a1 , a2 , . . . , an−1 )T , c = (c1 , c2 , . . . , cn−1 )T , e = (e1 , e2 , . . . , en−1 )T , y = (y˜1 , y˜2 , . . . , y˜n−1 )T , z = (y˜1 , 0, . . . , 0, y˜n )T , and
⎛
⎛
−2 1 ⎜ 1 −2 ⎜ ⎜ .. P =⎜ . ⎜ ⎝
⎞
⎞
⎛
5 1 ⎟ ⎜1 4 1 ⎟ ⎜ ⎜ ⎟ .. .. ⎟ , H = ⎜ ... . . ⎜ ⎟ ⎝ 1 −2 1 ⎠ 1 −1 ⎞ ⎛ 26 7 ⎟ ⎜ 7 16 1 ⎟ ⎜ ⎟ ⎜ .. .. .. , Q = ⎟ ⎜ . . . ⎟ ⎜ ⎠ ⎝ 1 −2 1 1 −2
−1 1 ⎜ 1 −2 ⎜ ⎜ .. G=⎜ . ⎜ ⎝
⎟ ⎟ ⎟ ⎟, ⎟ 1 4 1⎠ 1 5 ⎞
1 .. .. . .
⎟ 7 ⎟ ⎟ .. .. . . . ⎟ ⎟ ⎠ 7 16 7 7 26
We get the following two equations: ⎧ Gc = 2h2 He, ⎪ ⎪ ⎪ ⎨ h4 h2 z + P a + Qe = Hc, 15 3 ⎪ ⎪ 24α ⎪ ⎩ P e = a − y, h2 % 4 & 72αP 2 4 −1 h Q + e = 3GH −1 (P y + z). 2h H − GH 5 h2
(4.2.52)
(4.2.53)
(4.2.54)
(4.2.55)
With (4.2.54) and (4.2.55) being solved, we obtain e, a and c and then b, d and f in succession. So φ∗ (x) is gotten, and φ∗ (xi ), φ∗ (xi ) can be computed easily. The error estimation of the approximated derivatives: φ∗ − y L2 [0,1] ≤ C11 h2 + C12 δ 2/3 , (4.2.56) φ∗ − y L2 [0,1] ≤ C21 h + C22 δ 1/3 ,
4.2 Regularizing Schemes
99
where Ci,j (i, j = 1, 2) are constants. So if α = δ 2 , and h = δ 1/3 , we have φ∗ − y L2 [0,1] = O(δ 2/3 ), φ∗ − y L2 [0,1] = O(δ 1/3 ) respectively. There are other forms and generations of SDTR: Wang et al. [32] presented another cost functional with the stabilizer Ω[φ] = φ(2) 2L2 [0,1]: Ψ[φ] =
n−1 i=1
hi + hi+1 (φ(xi ) − y˜i )2 + αφ(2) 2L2 [0,1]. 2
They proved that if taking α = δ 2 and denoting h = max hi , then √ φ∗ − y L2 [0,1] ≤ C1 (π, y L2 [0,1])h + C2 δ; y ∈ H (2) (0, 1), lim φ∗ L2 [0,1] = +∞; y ∈ C[0, 1], ∈ / H (2) (0, 1).
(4.2.57)
(4.2.58)
δ,h→0
Wang and Wei (2005) generalized SDTR from 1D-space to 2D-space for computing the first order derivative [33]. Wei and Hon (2007) further generalized SDTR [34] by adopting radial basis function as base of the solution space for the case where the first and second order derivatives could be computed simultaneously based on 1D and 2D data. The relative merits of SDTR are obvious: (1) they are suitable especially to the scatted data both in 1D and 2D-space and calculating simultaneously first and second derivatives is easy; (2) the important assumption of SDTR is error-free of the observations of y δ (x) at x = 0, 1, which may be difficult to be satisfied in quite a few cases; (3) taking α = δ 2 is simple and admissible choice, while this simple-setting is not optimal; (4) but if one adopts an iterative strategy to determine the parameter of α the computational cost may be quite large!
4.2.9
Total variation regularization (TVR)
Total variation regularization (TVR), as a further development of general theory (in metric space) of conventional variation regularization (CVR), was introduced in 1993 as a novel approach capable of handing properly edges and removing noise in a given image. This method has proven to be successful in a wide range of applications. The application of TVR to ND may start from Chartrand in 2005 in a technical report [4]. Although the numerical test for y(x) = |x − 1/2| is successful, but it seems to lake a complete theoretical and numerical analysis including the choice of regularization parameter. Difference between CVR and TVR: Let us consider 1 k(x, s)φ(s)ds = y δ (x), A : F −→ U, (4.2.59) (Aφ)(x) := 0
where F, U denote solution space and data space respectively. For ND-problem, (k) we usually assume U = L2 [0, 1]; F = H2 [0, 1], (k = 1 or 2) in CVR but now
100
4
Regularization of Numerical Differentiation
for TVR we suppose F = BV [0, 1], the space of functions of bounded variation, with the norm φBV [0,1] := φL1 [0,1] +φT V (see below). The cost functionals in CVR and TVR are ⎧ α ⎨ MCV R [φ, y δ ] = Aφ − y δ 2L2 + αφ2H (k) ; φ ∈ F = H (k) , (4.2.60) ⎩ α MT V R [φ, y δ ] = Aφ − y δ 2L2 + αφT V ; φ ∈ F = BV. In (4.2.60), relevant stabilizers (or penalty items) are different: ⎧ 1 ⎪ 2 ⎪ |φ(k) (x)|2dx, for CVR, φ = ⎨ H (k) 01 ⎪ ⎪ ⎩ φT V = |φ (x)|dx, for TVR,
(4.2.61)
0
where φ ∈ BV [0, 1] means that we may permit φ (x) to have at most a countable set of discontinuities. This property together with the minimization of MTαV R makes TVR particulary useful for the ND-problem where the derivative function is piecewise smooth. Another difference is that the Euler equation in CVR is linear (see (4.2.23)) but is not as that in TVR: % & d φ ∗ δ = 0. (4.2.62) 2A (Aφ − y ) − α dx |φ | To overcome possible singularity of (4.2.62), one can introduce an approximate 1 δ stabilizer Ωβ [φ] = 0 |φ |2 + βdx, (β > 0), then for functional MTα,β V R [φ, y ] = δ 2 Aφ − y L2 + αΩβ (φ), its Euler equation becomes 0 / φ d = 0, (4.2.63) 2A∗ (Aφ − y δ ) − α dx |φ |2 + β which can reduce the computational difficult but is still nonlinear one resulting in troubles in the numerical realization. Well-posedness of TVR problems Although BV [0, 1] is Banach space, the stabilizers φT V and Ωβ [φ] are not in BV-norm, so the well-posedness of TVR-problems: ⎧ ⎪ ⎨ inf Aφ − y δ 2L2 + αφT V , φ∈BV (4.2.64) ⎪ ⎩ inf Aφ − y δ 2L2 + αΩβ [φ] φ∈BV
are not natural true. To this end, Acar and Vogel have discussed in detail [1]. According to their proof and results (quoting their notations), under mild
4.2 Regularizing Schemes
101
restrictions on the operator A and stabilizer J(u), T (u) has an unique minimizer which is stable with respect to certain perturbation in the data Z, parameter α and functional J(u). Actually, I = [0, 1] ⊂ R1 is compact set; the anti-differentiation operators A defined by (4.1.1)–(4.1.2) mapping L1 [0, 1] into L2 [0, 1] are linear bounded, AX[0,1] = 0, φT V and Ωβ [φ] are lower semicontinuous, etc., the related ‘restrictions’ are all satisfied, so problem (4.2.64) are well-posed. Choice of regularization parameter in TVR There are several rules for choosing regularization parameter in CVR, e.g., DP, GCV, L-curve and UPRE (unbiased predictive risk estimator) [30]. But, to our knowledge, they are nearly at direct applications to TVR, much less TVR for ND-problems. Recently, Liao et al. and Lin have generalized the GCV and UPRE rules to the TV-image restoration respectively [18, 19], which can be served directly for ND-problems. Here we propose a simple strategy (called δ 2 -rule for short) as in CVR. Assume there exists a true solution φT ∈ BV [0, 1], then φT T V ≤ E < ∞. From Aφ − y δ ≤ δ, we can assert that α(δ) = δ 2 is acceptable, since from the minimality of φδα and MTαV R [φδα , y δ ], we have α(δ) → 0 as δ → 0 and MTαV R [φδα , y δ ] = Aφδα − y δ 2L2 + α(δ)φδα T V
= α(δ) Aφδα − y δ 2L2 /α(δ) + φδα T V 2 δ + φT T V ≤ α(δ)[1 + E] → 0. ≤ α(δ) α(δ)
(4.2.65)
So the setting of α = δ 2 satisfies the weakly regularity condition (4.2.25). About the numerical implementation of TVR Naturally, solving equation (4.2.63) or one of the minimizing problems (4.2.64) is not an easy task. Take the upper one in (4.2.64) as an example, if the discretization of continuous minimizing problem ' δ α δ φα = arg inf MT V R [φ, y ] (4.2.66) φ∈BV [0,1]
is finished, it is changed into the optimizing problem in Rcn ⊂ Rn : ' α,h δ,h h δ,h φα = arg min {MT V R [φ , y ]} . φh ∈Rn c
(4.2.67)
For any fixed α > 0, this can be done by several effective methods; about them and related discrete schemes can be found in [30]. Lucky for us that Regularization Tools [15] provides an M-file for solving the ND-problem in one dimension by using well-known Barrier method: x = T V reg(A, b, lambda); (lambda is positive parameter)
(4.2.68)
102
4
Regularization of Numerical Differentiation
It will be demonstrated that employing A generated by S1, S2 and S3 in Section 4.2.7, and the δ 2 -rule for choosing the α (i.e. lambda), (4.2.68) can work quite well both for the first order and second order ND-problems. There are other effective SNS of ND based on so-called optimal regularizing filter [16] and wavelet-Galerkin method [10], but lack of space forbids our further treatment of the topic here.
4.3
Numerical Comparisons
This section is designed to compare the performances of major SNS of ND described in Section 4.2; the merits of those schemes are also instantiated in next section. One of the measures of numerical performance is relative error norm: δ eM r = φα,M − φT /φT ,
M ∈ {DRM, SDT R, T V R, . . . }
(4.3.1)
and a modified eTr V R (denoted as ‘M-RelErr’) which means that the discontinuous point with its two neighbors are discarded in the calculation. Testing examples include smooth and non-smooth, oscillating and nonoscillating functions in which the former three examples are taken from Hansen [15]: Eg.1: y(x) = (x3 − x)/6, x ∈ [0, 1]; Eg.2: y(x) = exp(x) + (1 − e)x − 1; x ∈ [0, 1]; Eg.3: f(x) = (4x3 − 3x)/24 when x < 0.5 else (−4x3 + 12x2 − 9x + 1)/24, x ∈ [0.5, 1]; Eg.4: y(x) = exp(x) + sin(x); x ∈ [0, π]; Eg.5: y(x) = sin(10x); x ∈ [0, π]; Eg.6: y(x) = |(x − 1/2|; x ∈ [0, 1]. (from [4]) The way of error perturbation in discrete format: √ (4.3.2) yτh = y h + τ ∗ y h ∗ randn(n, 1)/ n ⇒ δ := yτh − y h , where randn generates the normal distributed random numbers with mean zero, variance one and standard deviation one. About the operation environment and ND-package The numerical tests are made under the MATLAB 6.5 computing environment and on the Micro-Computer of Lenovo. In testing process, we have quoted some subroutines from Hansen’s Regularization Tools, Murio’s Mollifiersubroutines and an ND-package [42]; the others are programmed recently by ourselves. In the tables of this section, each result in boldface indicates the best one among the relevant range; the SBR-schemes always take hk = h(δ)k = δ 2k/3
4.3 Numerical Comparisons
103
in (4.2.12), while the TVR-schemes usually employ δ 2 -rule except for the case where a short word (like ‘lc-rule’ or ‘opt-rule’) appears in the parentheses below that item, showing where the adopted parameter is borrowed from. The contents of the comparisons include (1) the relative errors of 1st numerical derivatives by most of the above SNS, 2nd numerical derivatives by DRM, SDTR and TVR with an emphasis on showing the impact of discrete schemes on the reconstruction precision; (2) the performances of DRM, SBR, MRM and TVR for restoring the derivatives at the two sides of interval [0, 1]; (3) the practicality of the δ 2 -rule for choosing the regularization parameter in TVR with comparison to the other rules. Table 4.1 offers a wealth of information: (1) all of the participants do work well for the function like polynomial, among which TVR is the best one, and at the lowest error level (τ = 0.001), SDTR’s ability is a little inferior than TVR; (2) for dealing with the function being drastic oscillating like Eg.5, DRM, SBR and TVR have a good performance, where SBR is the most capable; (3) for dealing with the non-smooth function like Eg.6, as TVR’s line, the best result is of course belongs to him, while DRM-group are the second. SBR and MRM can work rather well in case of the smallest error level (τ = 0.001), but SDTR could do nothing about it; (4) it should be noted that MRM’s exhibition in handling Eg.4 and Eg.5 is disappointing; the bad results may be caused by a
Table 4.1. The relative error comparisons of 1st order derivatives restoration (n = 100; NA=not available).
Scheme DRM-S1-dp DRM-S1-lc DRM-S1-gcv SDTR SBR MRM TVR-S1-δ 2 DRM-S1-dp DRM-S1-lc DRM-S1-gcv SDTR SBR MRM TVR-S1-δ 2
τ 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.001 0.001 0.001 0.001 0.001 0.001 0.001
Eg.1 .90e−1 .36e−1 .39e−1 .71e−1 .70e−1 .52e−1 .15e−1 .22e−1 .42e−1 .42e−1 .95e−2 .36e−1 .11e−1 .43e−2
Eg.3 NA NA .79e−1 .11e−0 .35e−1 .31e−1 .15e−1 .14e−2 NA .16e−2 .98e−2 .13e−1 .13e−1 NA
Eg.4 .17e−0 .13e−0 .16e−0 .21e−0 .80e−1 NA .10e−0 .83e−1 .76e−1 .76e−1 .92e−1 .20e−1 NA .76e−1
Eg.5 .39e−1 .67e−1 .67e−1 .10e+1 .26e−1 NA .27e−1 .65e−2 .56e−1 .65e−1 .96e−0 .35e−2 NA .61e−2
† The relative errors in this column are computed under the absence of the discontinuous point with its two neighbors.
Eg.6(†) .29e−1 .34e−1 .49e−1 .48e−0 .13e−0 .15e−0 .60e−2 .11e−1 .39e−1 .39e−1 .39e−0 .18e−1 .17e−1 .11e−1
104
4
Regularization of Numerical Differentiation
simple post-treatment on the mollified data, i.e., by making ‘center-difference’ to (Kλ¯ ∗ y δ ); if performing numerically integration on (Kλ¯ ∗ y δ ), the situation may be improved. It is clear from Table 4.2 that, for 2nd -ND based on different discrete schemes, DRM and TVR are of about the same efficiency for Eg.1 and Eg.2, while the scheme S2 is obviously superior to scheme S3 for Eg.3; especially, all of the DRM and TVR based on scheme S3 failed the restorations of 2nd derivatives of Eg.3. Pay attention that for restoring the second derivatives in Eg.1 and Eg.2 SDTR-scheme works rather well. Table 4.2. The relative error comparisons of 2nd order derivatives restoration (n = 100; NA=not available).
Scheme DRM-S2-dp DRM-S2-lc DRM-S2-gcv TVR-S2
τ 0.001 0.001 0.001 0.001
DRM-S3-dp DRM-S3-lc DRM-S3-gcv TVR-S3-δ 2 SDTR DRM-S2-dp DRM-S2-lc DRM-S2-gcv TVR-S2
0.001 0.001 0.001 0.001 0.001 0.0001 0.0001 0.0001 0.0001
DRM-S3-dp DRM-S3-lc DRM-S3-gcv TVR-S3-δ 2 SDTR
0.0001 0.0001 0.0001 0.0001 0.0001
Eg.1 .1756e−0 .2360e−0 .1733e−0 .4358e−1 (lc-rule) .1411e−0 .2084e−0 .1461e−0 .4769e−1 .5712e−1 .1061e−0 .1089e−0 .1149e−0 .1344e−1 (lc-rule) .6062e−1 .5727e−1 .1055e−0 .1811e−1 .1947e−1
Eg.2 .1644e−0 .2357e−0 .1665e−0 .2984e−1 (δ 2 -rule) .1321e−0 .2115e−0 .1413e−0 .3724e−1 .6374e−1 .1009e−0 .1065e−0 .1059e−0 .1071e−1 (opt-rule) .5965e−1 .5743e−1 .1045e−0 .1434e−1 .2223e−1
Eg.3 .1650e−1 .2283e−0 .4820e−1 .4998e−1 (lc-rule) .7015e−0(NA) .7095e−0(NA) .7021e−0(NA) .6991e−0(NA) .6967e−0(NA) .6717e−2 .5501e−1 .1203e−1 .2010e−1 (lc-rule) .7017e−0(NA) .7018e−0(NA) .7018e−0(NA) .7018e−0(NA) .7069e−0(NA)
Figure 4.1 shows that DRM-type schemes, SBR and MRM perform poorly in reconstructing the derivatives at x = 0, 1 for Eg.6, while TVR works very well in this situation. But in other cases, for anyone of them, may still exist, in varying degrees, such a tendency that the ability of restoring derivatives at the
4.4 Applied Examples
105
Figure 4.1. The comparison on first order ends-derivatives restoration for Eg.6: y(x) = |x − 1/2| (n = 100, τ = 0.10, δ = 0.0172).
points nearby the end-sides of interval [0, 1], is less effective than at the other points. Retracing Table 4.3, 4.2 and Figure 4.1, we confirm that δ 2 -rule is quite effective, with which TVR is the almost best one to data for reconstructing first and second derivatives, no mater smooth or not; of course, the computational cost compared to the DRM-type methods, will increase greatly by no less than 200% according to our experiments.
4.4
Applied Examples
As mentioned in §4.1, there are plenty of practical applications of ND. In this section, we try to demonstrate some representative examples. Our principal goal is to show in some detail that (1) how a practical problem can be modeled directly by ND; (2) how an ND-procedure can be embedded into the process of solving practical problem as its one of the sub-problems.
106
4
Regularization of Numerical Differentiation
Table 4.3. The RelErr performance of δ 2 -rule in TVR with comparison to the other rules (1st ND, n = 100).
λT V R = Eg.1 Eg.2 Eg.3 Eg.4 Eg.5 Eg.6 (M-RelErr:) success rate
δ2 .15e−1 .43e−2 .14e−1 .20e−2 .15e−1 NA .76e−1 .75e−1 .27e−1 .61e−2 .60e−2 .11e−1 91.67%
αdp (†) NA NA .36e−2 .36e−2 .10e−1 .14e−2 .77e−1 .75e−1 .29e−1 .64e−2 .29e−1 .11e−1 83.33%
αlc (†) .13e−1 .56e−2 .15e−1 NA .11e−1 NA .76e−1 .75e−1 .22e−1 .64e−2 NA .96e−2 75.00%
αgcv (†) .13e−1 .56e−2 NA .51e−2 .11e−1 .16e−2 .76e−1 .75e−1 .22e−1 .64e−2 .13e−1 .96e−2 91.67%
αopt (†) NA NA NA NA NA .26e−1 .76e−1 .75e−1 NA NA .80e−2 .71e−2 58.33%
τ 0.01 0.001 0.01 0.001 0.01 0.001 0.001 0.0001 0.01 0.001 0.01 0.001
RelErr=relative error; NA=not available; † in DRM-S1.
4.4.1
Simple applied problems
The problem of finding the heat capacity cp [25] of a gas as a function of temperature T . Experimentally, one measured heat content qi := q(Ti ), i = 1, 2, . . . , m, while the heat capacity cp,i := cp (Ti ) must obey the following physical relations: Ti cp (τ )dτ = q(Ti ), i = 1, 2, . . . , m. (4.4.1) T0
Obviously, finding cp (Ti ) from q(Ti ) is a numerical differential problem. Selecting the direction of a ship from the recording of courses in navigation [25]; this should be done by finding the maximum of a certain univalent curve. The direction can be obtained by differentiating this curve. Determine the velocity and accelerated velocity of a moving body from the trip recorders at a finite number of observing points; this is undoubtedly a first and second order-numerical differential problem. In a word, things like these are too numerous to mention; they are closely related to the calculation of “change rate”, “change rate of the change rate” by observations (noisy data).
4.4 Applied Examples
4.4.2
107
The inverse heat conduct problems (IHCP)
Presentation of the forward problem: Consider a rather general type of heat conduction problem: ⎧ ut (x, t) = uxx (x, t) + f(x, t) x > 0, t ∈ (0, T ) [1] ⎪ ⎪ ⎪ ⎪ x > 0 [2] ⎨ u(x, 0) = g(x), u(0, t) = ϕ(t), t ∈ (0, T ) [3] ⎪ ⎪ t ∈ (0, T ) [4] u (0, t) = ψ(t), ⎪ ⎪ ⎩ x g(0) = ϕ(0) = 0, [5]
(4.4.2)
where u(x, t), f(x, t) denote the temperature distribution, source item respectively. Suppose ϕ, ψ ∈ L2 [0, T ], g ∈ L2 [0, ∞), and supp g ∈ [0, L] is a compact set. The forward problem is: Given the control equation (4.4.2) ([1]), initial and boundary conditions (4.4.2)([2])–(4.4.2)([5]) (i.e., f(x, t), g(x), ϕ(t), ψ(t) are assumed to be known), determine the temperature function of u(x, t). The analytical formula of u(x, t) exits when f(x, t) has some specific the structure. For example, if f(x, t) = f(t), we have t t f(τ )dτ − 2 k(x, t − τ )ψ(τ )dτ u(x, t) = 0 0 (4.4.3) +∞ g(ξ)[k(x − ξ, t) + k(x + ξ, t)]dξ, + 0
√ where k(x, t) = exp(−x2 /4t)/ πt. Formulation and transform of the source identification Inserting the conditions (4.4.2)([2])–(4.4.2)([5]) into (4.4.3) gives t t +∞ , 1 1 ψ(τ ) √ dτ − √ f(τ )dτ = ϕ(t) + √ g(ξ) exp −ξ 2 /4t dξ. π 0 t−τ πt 0 0 (4.4.4) t Defining (Af)(t) := 0 f(τ )dτ, t ∈ [0, T ], and writing the r.h.s of the above formula as F (t), will result in Af = F (t), t ∈ [0, T ]. This is a special problem of Identification of Source Item (ISI), a typical NDproblem: reconstruction of the first order derivative function. This problem is solved by Wang and Zheng [31] employing TiVR and by Xiao and Zhang [37] recently, employing an improved LRM.
108
4
Regularization of Numerical Differentiation
Another Problem of ISI With f(x, t) = a(t)x and making some other modifications, we get a parabolic equation of the form: ⎧ ut (x, t) = uxx (x, t) + a(t)x, x > 0, 0 < t < T [1] ⎪ ⎪ ⎨ u(x, 0) = 0, x < 0 [2] (4.4.5) u(0, t) = f(t), f(0) = 0, 0 < t < T [3] ⎪ ⎪ ⎩ −ux (0, t) = g(t), g(t) > 0, 0 < t < T [4] where f(t), g(t) are assumed to be known and strictly increasing functions. We want to find unknown functions u(x, t) and a(t) to satisfy the above equations of (4.4.5)([1]) − (4.4.5)([4]). So this is an inverse problem. Assume that (4.4.5) has a solution u(x, t), then it can be shown that u(x, t) = 2
t 0
K(x, t − τ )g(τ ) exp(θ(t) − θ(τ ))dτ,
(4.4.6)
here K(x, t) = exp(−x2 /4t) and θ(t) =
0
t
a(τ )dτ.
(4.4.7)
Let x → 0 in (4.4.6) and using boundary condition (4.4.5)([3]) we obtain an integral equation for y(t) := exp(−θ(t)): 0
t
√ g(τ ) √ y(τ )dτ = πf(t)y(t) t−τ
(4.4.8)
which is stably solvable for y(t) since (4.4.8) is a Volterra equation of the second kind, a well-posed problem. After y(t) is gotten, we meet the ND-problem for seeking a(t) in the first kind of equation: 0
t
a(τ )dτ = − ln y(t).
(4.4.9)
This problem is solved by Li and Zheng [17].
4.4.3
The parameter estimation in new product diffusion model
An Improved Diffusion Model of New Product How to describe the time-varying characteristics of the diffusion of a new product(/technology) and to estimate the relevant model parameters, have been an important content in management science and marketing research. Among
4.4 Applied Examples
109
the numerous product-diffusion models, a representative one is the improved version proposed by Jones and Ritz [13]: S (t) = [c + b(S(t) − S0 )](S ∗ − S(t)), b, c > 0,
(4.4.10)
R (t) = a[pS(t) − R(t)], a, p > 0,
(4.4.11)
S(0) = S0 , R(0) = 0
(4.4.12)
in which the meanings of the related symbols are as follows: S(t) denotes the cumulative number of retailers who have adopted the product at time t; S ∗ denotes the maximum number of retailers who would adopt the product; R(t) denotes the cumulative number of consumers who have adopted the product at time t; R∗ denotes the cumulative potential; and S0 denotes the initial distribution level for the product. The parameters a, b, c, p in the model have to be estimated on the basis of given data {ti , Si , Ri }, i = 1, 2, . . . , m. The exiting methods for estimating the above parameters are usually OLS and LM method [26]. Denote Si = S(ti ), Ri = R(ti ). ξi , ηi are the approximations of S (ti ), R (ti ) respectively, then we have ξi ≈ c(S ∗ − Si ) + b(Si − S0 )(S ∗ − Si ),
(4.4.13)
ηi ≈ a(pSi − Ri ); i = 1, 2, . . . , m.
(4.4.14)
Ordinary least square (OLS) estimation based on the difference approximation: Without loss of generality, we firstly consider the estimation of c, b. Let ξiD = ΔSi /Δti = Si+1 − Si (Δti = 1), based on c(S ∗ − Si ) + b(Si − S0 )(S ∗ − Si ) = ξiD , i = 1, 2, . . . , m − 1
(4.4.15)
the linear least squares estimation of c and b can be obtained; but as well-known, these estimates are not numerical stable. Levenberg-Marquardt (LM) method based on the analytical solution: Given the initial condition S(0) = S0 , S ∗ (a prior estimate by the experts), eq. (4.4.11) can be explicitly solved: S(t) = S0 + (S ∗ − S0 )
1 − exp(−ct) , b, c > 0, 1 + b exp(−ct)
(4.4.16)
then from the discrete sampled data, parameters b, c can be obtained by LMmethod being a regularized strategy; but this is non-linear estimation and thus is quite time-consuming. A regularized least squared estimation (RLSE) developed by Xu and Xiao [40] to improve the numerical stability:
110
4
Regularization of Numerical Differentiation
After getting a stabilized approximation ξiR , ηiR of ξi , ηi by regularization method, we replace ξiD , ηiD in (4.4.15) by ξiR , ηiR which result in: c(S ∗ − Si ) + b(Si − S0 )(S ∗ − Si ) = ξiR , i = 1, 2, . . . , m − 1.
(4.4.17)
Equations (4.4.17) are usually over-determined linear equations. Then we perform a centering and non-dimensional treatment on the data and use a DRMdp, the stable least squared estimation of b, c can be obtained. We call these procedures as Regularized Least Squared Estimation (abbreviated to RLSE). We can also construct an implicit difference approximations to the equations in (4.4.13): c ∗ [(S − Si ) + (S ∗ − Si+1 )] + b(Si − S0 )(S ∗ − Si ) = ξiR 2 (i = 1, 2, 3, . . . , m − 1).
(4.4.18)
Repeat the above process, we obtain a new regularized estimates of b, c, then a, p. For short, the estimates based on (4.4.17) and (4.4.18) are abbreviated to RLSE-1, RLSE-2 respectively. The above RLSE-Schemes have been tested by a group of sampled data from [26] and compared with OLS-method and LM-method. The results indicate that RLSE-1 and RLSE-2 are quite effective as shown in the Table 4.4. Table 4.4. The fitted errors by several methods.
The Fitted Error S h − Sfh 2 S h − Sfh 2 /S h 2
4.4.4
RLSE-1 3.0182 0.1427
RLSE-2 2.5018 0.1183
LM-method 3.4431 0.1628
OLS-method 4.6384 0.2194
Parameter identification of sturm-liouville operator
The distribution parameter identification of Sturm-Liouville operator is an important applied filed in inverse problems, which seeks k(x) or k(u(x)) in the following operator equations: ⎧ ⎪ ⎨ L(k) := d (k(x) d u(x)) = f(x), (linear case) dx dx (4.4.19) d ⎪ ⎩ k(0) u(0) = c = const, dx ⎧ d d ⎪ (k(u(x)) u(x)) = f(x), (non-linear case) ⎨ L(k) := dx dx (4.4.20) ⎪ ⎩ k(u(0)) d u(0) = c = const dx
4.4 Applied Examples
111
from a set of discrete solutions: {u(xi )} and relevant boundary conditions. We further suppose that u(x) ∈ C 1 [0, 1], f(x) ∈ C[0, 1] and constants c’s are known. If there exists a compact set K ⊂ I = [0, 1] such that infK {| du dx |} > 0, then it can be proven that the inversion formulas [23] ⎧ & % x du ⎪ ⎪ (linear case) f(t)dt + c /( ), ⎨ k(x) = dx 0% & (4.4.21) x du ⎪ ⎪ f(t)dt + c /( ) (non-linear case) ⎩ k(u(x)) = dx 0 hold. This is grounded to the identification of k(x), k(u(x)). The Numerical Identifying Schemes Even the inversion formulas are known, there exist two main difficulties in estimating k(x) or k(u(x)): 1. u(x) is approximately known as uδ : u−uδ ≤ δ and {uδ (xi )}m i=1 is merely a finite-subset of uδ (x) (an infinite set); 2. It involves an unstable numerical differentiation process to the noisy data of {uδ (xi )}m i=1 . So the combination of a stable numerical differentiation process and a numerical integration might be a conservative strategy. In [6], a stabilized identification scheme is proposed: (1) Employing DRM-dp to obtain the stable approximation: %
duδ duδ duδ (x1 ), (x2 ), . . . , (xm ) dx dx dx
&T h h h := yα(δ) = (yα(δ),1 , . . . , yα(δ),m )T .
(4.4.22) (2) Employing adaptive Newton-Cotes formula to compute xi F (xi ) = f(t)dt := Fh (xi ) + ei , i = 1, 2, . . . , m.
(4.4.23)
0
(3) Constructing the approximation of k(xi ), i = 1, 2, . . . , m by F,h (xi ) := kα(δ)
F (xi ) + c Fh (xi ) + c Fh ,h ≈ kα(δ) (xi ) := . h h yα(δ),i yα(δ),i
(4.4.24)
The Error Estimates and Numerical Experiment Based on the above discussions and interpolation theory, Cheng and Xiao proved the following theorem [6]:
112
4
Regularization of Numerical Differentiation
Theorem 4.4.1. Assume u(x) ∈ C 2 [0, 1], y(x) = u (x), uδ ∈ C[0, 1]; 0 < λ ≤ h } ≤ C , then when uh − uh ≤ δ < uh , u(x) − u (x) ≤ min{y(x), yα(δ) 1 δ δ δ δ < uδ , we have √ Fh ,h (x) ≤ c1 δ + c2 α(δ) + c3 h, k(x) − kα(δ)
(4.4.25)
where c1 , c2 and c3 are constants. Two Testing Examples: (1) f(x) = 4x(1 + x2 ), u(x) = x + x3 /3, k(0) du dx (0) = 1; 2 (linear case); k(x) = 1 + x (2) f(x) = 3 exp(3x), u(x) = exp(x), k(u(0)) du dx (0) = 1; k(u(x)) = u2 = exp(2x) (non-linear case). The noise uhδ is generated by uhδ = uh + δ ∗ randn(n, 1). h The comparisons are made among the results in which the yα(δ) are obtained by DRM-dp and by MRM-method. The related performances are listed in Table 4.5, from which we can see that 1. Both of the two schemes work well and they are all numerically stable; 2. For Example 1 and same δ, the scheme described in [6] (a kind of DRM) is better than MRM; 3. For Example 2 and same δ, the scheme described in [6] (a kind of DRM) is worse than MRM. h Table 4.5. The comparisons on the relative-errors (h = 1/128; R.E.= kh − kapprox / h k ).
Disturb δ 0.005 0.01 0.05
4.4.5
Eg.-1: MRM 0.04989 0.05334 0.09865
k(x) = 1 + x2 DRM-dp 0.0396 0.0397 0.0411
Eg.-2: MRM 0.04555 0.05255 0.05229
k(u) = e2x DRM-dp 0.0637 0.0597 0.0687
The numerical inversion of Abel transform
Abel Transform with the inversion Formula Abel’s Transform (AT) has many applications in various fields. A typical AT is of the form: R (r)rdr √ = y(x), x ∈ [0, R], y(R) = 0, (4.4.26) (T)(x) := 2 r 2 − x2 x
4.4 Applied Examples
113
where y(x) ∈ H 1 (0, R), (r) ∈ C[0, R] is to be determined. Under certain conditions the theoretical inversion formula is given by 1 R y (x)dx √ , r ∈ [0, R]. (4.4.27) T−1 y = (r) = π r x2 − r 2 Because (4.4.26) has weakly singular kernel and as the approximation of y(x), yδ (x) ∈ L2 [0, R], hence the solving process of (4.4.26) or (4.4.27) is ill-posed, and some regularization techniques must be employed to get its stable numerical solution. Many efforts have been devoted to this end in which three kinds of methods are used: 1. The ordinary least square method on some subspace of definite-dimensional (for example, Pn [0, R], space of n order-polynomials); 2. The method which directly solves the first kind of Abel equation (4.4.26) by one of some regularization methods; 3. The method which firstly deals with the ND-problem by stable numerical technique then computes the integral of (4.4.27) by some quadrature techniques. Numerical Inversion Scheme Based on Stabilized ND Xiao et.al. [35] presented a numerical inversion scheme to solve (4.4.27) which includes the following steps: Step 1 Input yδh = (yδ,0, yδ,1 . . . , yδ,n )T , δ and control constant ε > 0; d h , y , . . . , y )T by using DRM-dp; yδ = (yδ,0 Step 2 Find dx δ,1 δ,n Step 3 Construct the Hermitian interpolation yδh (x); Step 4 Set weighted function ρi (x) = 1/ (x − ri ) resulting in , h d 1 R dx yδ (x) δ (ri ) = ρi (x) √ dx, i = 0, 1, . . . , n − 1. (4.4.28) π ri x + ri Step 5 Using Gauss-type quadrature with two-points, we get ⎧ n−1 ⎪ 1 R ⎪ ⎪ ⎨ δ (ri ) ≈ ¯δ (ri ) := (βj1 Fi (xj1 ) + βj2 Fi (xj2)), π ri j=i ⎪ ⎪ √ d h ⎪ ⎩ Fi (x) := (yδ (x))/ x + ri , βj1 , βj2 are coefficients. dx
(4.4.29)
It has been proven that if y ∈ C[0, 1], yδ ∈ L2 [0, R] then
√ max |Δ(ri )| = max |(ri ) − ¯δ (ri )| ≤ Ch δ.
0≤i≤n
0≤i≤n
(4.4.30)
114
4
Regularization of Numerical Differentiation
Table 4.6 says that the DRM-dp is much better than the method by Pu Xiaoyun (cf. [35]) and both of them are of about the same ability when error level is small (δ = 0.1%). Table 4.6. The mean square deviations of the approximated solutions (h = (1 − 0)/30).
Disturb. δ(%) 5 1 0.5 0.1
Eg.1: by Xiao 191.3660 38.2781 19.1430 3.8414
y = 103 (1 − x2 ) by X.Y.Pu 20.1020 11.7523 8.6265 3.5566
Eg.2: by Xiao 0.0355 0.0067 0.0034 0.0006
y = (1 − x2 )2 by X.Y.Pu 0.0048 0.0025 0.0017 0.0006
Remark 4.4.2. When y (x) is not smooth the DRM-dp method will work at a discount greatly. Cheng [5] presented an effective method based on SDTR which can also solve more general-type Abel equation. Remark 4.4.3. Obviously, if substituting TVR scheme in the Step 2 of the above method to deal with ND, we will also get an effective technique to solve Abel equation under the case of non-smoothed ‘true solution’.
4.4.6
The linear viscoelastic stress analysis
Background and Presentation of the Problem Visco-elasticity is the property of material that exhibit both viscous and elastic characteristics when undergoing deformation. In order to construct a mathematical model for material, the material parameters have to be determined. Moreover, after the material parameters are obtained we can analysis the stress from the experimental data of the strain, or from the stress to compute the strain. For linear viscoelastic material the above mechanical qualities are linked together in a so-called constitutive equation [27]: σ(t) =
t
0
G(t − τ )˙(τ )dτ,
(4.4.31)
where σ is the stress, t is time, is the strain and G(t) is the relaxation modulus. Alternatively the (t) can be written as (t) =
0
t
J(t − τ )σ(τ ˙ )dτ,
(4.4.32)
4.5 Discussion and Conclusion
115
where J(t) is the creep compliance. Regularizing Strategies for Stress-Strain Analysis Another important mechanical basis is: for a typical linear viscoelastic experiment, only either G(t) or J(t) can be determined directly. After such an experiment, the unknown linear viscoelastic material function can be determined with an inter-conversion method. Actually, by performing firstly Laplace Transform to (4.4.31), (4.4.32) and then applying the inversion of Laplace Transform to the intermediate results we have t=
t
0
G(t − τ )J(τ )dτ =
t
0
G(τ )J(t − τ )dτ.
(4.4.33)
After G(t) or J(t) is known experimentally, the other one can be obtained by solving the above convolution equation and this can be done by Tikhonov Regularization [27]. Formally, we can consider two regularization strategies for the stress-strain analysis: the method with ND and the method without ND: 1. Given (t) and G(t), perform ND of (t) to get ˙(t), then σ(t) can be obtain by integration formula (4.4.31) (with ND); 2. Given σ(t) and J(t), perform ND of σ(t) to get σ(t), ˙ then (t) can be obtain by integration formula (4.4.32) (with ND); 3. Given (t) and ˙ t J(t), solve (4.4.32) by regularization method to get σ(t), ˙ )dτ (without ND); then σ(t) = 0 σ(τ 4. Given σ(t) and t G(t), solve (4.4.31) by regularization method to get ˙(t), then (t) = 0 ˙(τ )dτ (without ND). Relatively, using the strategies 1, 2 with SNS of ND are better than the other two strategies. By the way, we can see that the numerical analysis of linear viscoelastic stress offers a challenge it needs a combining use of regularization methods to do Numerical Inter-Conversion, Numerical Differentiation and Solving Ill-Posed Equation!
4.5
Discussion and Conclusion
In this chapter, the general framework of regularized schemes for ND is introduced; under which we briefly introduce and analyze the major methods with their intuitive background, multidisciplinary applications with instructive examples. A simple strategy, δ 2 rule for determining the regularization parameter in TVR is also proposed. Our analysis and numerical results of comparison study on six test examples shows that
116
4
Regularization of Numerical Differentiation
If y δ ∈ L2 [0, 1] or y δ ∈ C 0 [0, 1], y(x) is smooth enough and the suitable α or step-size h is determined, then (1) the asymptotic optimal convergence rates of most of the regularized NDschemes are O(δ 1/2) for 1st -ND and O(δ 1/3 ) for 2nd -ND; the rates of some SNS are even higher; (2) the practical performance of the above schemes depend on not only their own instincts but also the efficiency of numerical implementation, the degree of smoothness of input data, and their integration. Because the 1st -ND (for Eg.1, Eg.3–Eg.5) is relatively easy, most of the schemes can work well except for MRM in case of Eg.4 and Eg.5. And, SBR and TVR-S2-δ 2 possess the optimal performances; (3) for 2nd -ND of Eg.1–Eg.3, a more difficult task than the 1st -ND, DRMS2-dp, DRM-S2-lc and DRM-S2-gcv are still adequate to the work, and TVR-S2 works more effectively and more robustly. If y δ ∈ L2 [0, 1], y ∈ BV [0, 1], TVR-S2 can works quite well for 1st -ND (Eg.6), DRM-S1 can still work for this example but they are less effective. Actually, TVR-S2 can also get the 2nd -ND of Eg.6 rather well by using subroutine (4.2.68) where A, b are derived from (4.1.2) and S3. There are some unsolved problems that will arouse more attention to us: (a) how to increase the accuracy of recovering the ends-derivatives? (b) how to realize RDM and SBR effectively if the sampled-length h is fixed in advance? (c) how to improve the accuracy and take advantage of the potential of SDTR? A probable strategy might be the acceptance of extrapolation techniques of Tikhonov regularization [38]. Solving ND-problems in high-dimension is more difficult and more attractive, to which not all of the above schemes can be applied directly. But, in our opinion, SDTR based on radial basis functions and the wavelet-Galerkin method are particularly suitable to take on this responsibility.
Acknowledgements This work is completed with financial support from the Natural Science Foundation of Hebei Province under grant number A2006000004. The first author was also partially supported by the National Natural Science Foundation of China under grant number 10571039.
References
117
References [1] R. Acar and C. R. Vogel, Analysis of bounded variation methods for ill-posed problems, Inverse Problems, 10, 1217-1229, 1994. [2] S. Ahn, U. J. Choi and A. G. Ramm, A scheme for stable numerical differentiation, J. Comp. Appl. Math., 186(2), 325-334, 2006. [3] Q. F. Cai, et al., A new method for calculating vorticity, Acta Phys. Sin., 57(6), 3912-3919, 2008. [4] R. Chartrand, Numerical differentiation of noisy, non-smooth dada, Technical report LA-UR-05-9309, Los Alamos National Laboratory, 2005. [5] J. Cheng, et al., A numerical method for the discontinuous solution of Abel integral equation (to appear). [6] Z. F. Cheng and T. Y. Xiao, Discrete regularization method for parameter identification of Sturm-Liouville operator, Numerical Mathematics, a Journal of Chinese Universities, 27, 324-328, 2005 (in Chinese). [7] J. Cullum, Numerical differentiation and regularization, SIAM Journal on Numerical Analysis, 8, 254-265, 1971. [8] T. F. Dolgopolova and V. K. Ivanov, Numerical differentiation, Comp. Math. and Math. Physics, 6(3), 570-576, 1966. [9] T. F. Dolgopolova, Finite dimensional regularization in the case of numerical differentiation of periodic functions, Ural. Gos. Univ. Mat. Zap., 7(4), 27-33, 1970. [10] F. F. Dou, C. L. Fu and Y. J. Ma, A wavelet-Galerkin method for high order numerical differentiation, Applied Mathematics and Computation, 215(10), 37023712, 2010. [11] H. W. Engl, M. Hanke and N. Neubauer, Regularization of Inverse Problems, Doedrecht, Kluwer, 1996. [12] C. W. Groetsch, Lanczos’s generalized derivative, Amer. Math. Monthly, 105(4), 320-326, 1998. [13] J. M. Jones and C. J. Ritz, Incorporating distribution into a new product diffusion models. Intern. J. of Research in Marketing, 8, 91-112, 1991. [14] M. Hanke and O. Scherzer, Inverse problems light: numerical differentiation, Amer. Math. Monthly, 6, 512-522, 2001. [15] P. C. Hansen, Regularization tools: A MATLAB package for analysis and solution of discrete ill-posed problems, Numerical Algorithms, 6, 1-35, 1994. [16] G. S. Li and X. P. Fan, An optimal regularization algorithm based on singular value decomposition, Journal of Information and Computational Science, 2(4), 705-714, 2005. [17] Z. L. Li and K. W. Zheng, An inverse problem in a parabolic equation, electronic journal of differential equations, Conference, 01, 193-197, 1997. [18] H. Liao, F. Li and M. K. Ng, Generalized cross-validation for total variation image restoration, Journal of Optical Society of America A, 26(11), 2311-2320, 2009. [19] Y. Z. Lin, UPRE method for total variation parameter selection, Signal Processing, 9(8), 2546-2551, 2010. [20] S. Lu and Y. B. Wang, First and second order numerical differentiation with Tikhonov regularization, Selected Publications of Chinese Universities: Mathematics, 1, 106-116, 2004.
118
4
Regularization of Numerical Differentiation
[21] X. J. Luo and C. N. He, A stable high accuracy approximate differentiation of approximately specified functions, Numerical Computing and Computer Applications, 26(4), 269-277, 2005 (in Chinese). [22] D. A. Murio, Automatic numerical differentiation by discrete mollification, Computers and Mathematics with Applications, 13, 381-386, 1987. [23] D. A. Murio, The Mollification Method and the Numerical Solution of Ill-posed Problems, New York, Wiley, 175-189, 1993. [24] A. G. Ramm and A. B. Smirnova, On stable numerical differentiation, Mathematics of Computation, 70, 1131-1153, 2001. [25] A. G. Ramm and A. Smirnova, Stable numerical differentiation: when is possible? J. KSIAM, 7(1), 47-61, 2003. [26] R. Scitovski, Analysis of a parameter identification problem, Appl. Math. Comput., 82: 39-55, 1997. [27] J. Sovari and M. Malinen, Numerical inter-conversion between linear viscoelastic material functions with regularization, International Journal of Solid and Structures, 44(3-4), 1291-1303, 2007. [28] V. V. Vasin, Regularization of numerical differentiation problem, Mathem. Zap. Uralśkii Univ., 7(2), 29-33, 1969 (in Russian). [29] V. V. Vasin, The stable evaluation of derivative in the space C(−∞, ∞), USSR Computational Mathematics and Mathematical Physics, 13, 16-24, 1973. [30] C. R. Vogel, Computational Methods for Inverse Problems, Philadephia, SIAM, 2002. [31] P. Wang and K. W. Zheng, Determination of conductivity in heat equation, Internat J. Math. & Math. Sci., 24(9), 589-594, 2000. [32] Y. B. Wang, X. Z. Jia and J. Cheng, A numerical differentiation method and its application to reconstruction of discontinuity, Inverse Problems, 18, 1461-1476, 2002. [33] T. Wei, Y. C. Hon and Y. B. Wang, Reconstruction of numerical derivatives from scattered noisy data, Inverse Problems, 21, 657-672, 2005. [34] T. Wei and Y. C. Hon, Numerical differentiation by radial basis functions approximation, Advances in Computational Mathematics, 27(3), 247-272, 2007. [35] T. Y. Xiao and J. L. Song, A Discrete Regularization Method for the Numerical Inversion of Abel Transform, Chinese Journal of Computational Physics, 17(6), 602-610, 2000. [36] T. Y. Xiao, S. G. Yu and Y. F. Wang, Numerical Methods for Inverse Problems, Beijing, Science Press, 2003 (in chinese). [37] T. Y. Xiao and J. L. Zhang, A stable and fast algorithm for identifying the source item in heat conduct problem, Chinese Journal of Computational Physics, 25(3), 335-343, 2008. [38] T. Y. Xiao, Y. Zhao and G. Z. Su, Extrapolation techniques of Tikhonov regularization, In: Optimization and Regularization for Computational Inverse Problems, Editors: Y. F. Wang, A. G. Yagola and C. C. Yang, Higher Edication Press/Springer, 2010. [39] H. L. Xu and J. J. Liu, Stable numerical differentiation for the second order derivatives, Adv. Comput. Math., 33(4), 431-447, 2010.
References
119
[40] W. Xu and T. Y. Xiao, A stable algorithm for the parameter estimation of a new product diffusion model, Numerical Mathematics, a Journal of Chinese Universities, 27, 350-353, 2005 (in Chinese). [41] Y. Z. Ye and L. Yi, Recovering the local volatility of underlying assets, Appl. Math. J. Chinese Univ. Ser A, 21(1), 1-8, 2006 (in Chinese). [42] H. Zhang, A MATLAB package for one dimensional numerical differentiation, MA Thesis of Hebei University of Technology, 2007.
Authors Information T. Y. Xiao School of Science, Hebei University of Technology, Tianjin 300130, P. R. China. E-mail: [email protected] H. Zhang Department of Mathematics and Computer Science, Tongren College, Tongren 554300, P. R. China. E-mail: [email protected] L. L. Hao The Department of Tourism, Hebei Vocational and Technical College of Building Materials, Qinhuangdao 066000, P. R. China. E-mail: [email protected]
Chapter 5
Numerical Analytic Continuation and Regularization C. L. Fu, H. Cheng and Y. J. Ma
Abstract. The problem of numerical analytic continuation of an analytic function, in general, is an ill-posed problem and frequently encountered in many practical applications. The main earlier works for this topic focus on the conditional stability and some rather complicated computational techniques. However, it seems there are few applications of the modern theory of regularization methods which have been developed intensively in the last few decades. This chapter is devoted to some different regularization methods for solving the numerical analytic continuation of an analytic function f(z) = f(x + iy) on the strip domains Ω = {z = x+iy | x ∈ R, 0 < |y| < y0 } and Ω+ = {z = x+iy | x ∈ R, 0 < y < y0 }, where the data is given approximately only on the line y = 0. A simple numerical example illustrates different effects of these methods.
5.1
Introduction
The problem of analytic continuation is a classical problem in the theory of the complex analysis. The analytic continuation just is the attempt to make the domain of an analytic function as large as possible. For this, there is an essential difference between the continuous function with real variable and analytic function with complex variable. In fact, for a continuous function w(x) with single real variable x ∈ [a, b], a < b, there are infinite methods to extend it to the region outside the interval [a, b] and keep its continuity. However, for analytic function f(z) with a complex variable, its value in original domain B can determine entirely the value outside the domain B. There are two important aspects for the study of analytic continuation of analytic function with complex variable. The first topic is the uniqueness of continuation. Some uniqueness theorems have been obtained in the 19th century and presented in the majority of textbooks.
122
5
Numerical Analytic Continuation and Regularization
The second topic is the stability of continuation. In some special cases, this problem is stable or well-posed. For example, the function f(z) is analytic in a domain D and Γ = ∂D1 ⊂⊂ D is a simple closed curve, see Figure 5.1. The value of f(z) on Γ1 is known, then for every z ∈ D1 , by Cauchy formula, we know f(ξ) 1 dξ f(z) = 2πi Γ1 ξ − z is totally determined and it is stable.
Figure 5.1. The well-posed solving domain D1 for data given on Γ1 .
But, in general, this problem is unstable or ill-posed. To explain this phenomenon, let D ⊃ D1 ⊃ D2 be domains of the complex plane, and suppose that their boundaries Γ, Γ1 and Γ2 are piecewise smooth curves: Γ1 ⊂ D and Γ2 ⊂ D1 , see Figure 5.2. Further suppose that there exists a function f(z) analytic in D and the values of f(z) are known everywhere in D2 . It is required to determine the value of f(z) in D1 . This problem reduces to an integral equation of the first kind. Indeed, by Cauchy formula, to solve the problem it suffices to determine the value of f(z) on Γ1 , i.e., f(z)|Γ1 = ϕ(z). Thus, solution of the given problem of analytic continuation reduces to solution of the integral equation 1 ϕ(ξ) dξ = f(z), z ∈ D2 . 2πi Γ1 ξ − z It is well-known that this is an ill-posed problem. The details can refer to Theorem 2.22 of [1]. For above ill-posed problem, there are two branches of study.
Figure 5.2. The ill-posed solving domain D1 for data given on Γ2 .
5.1 Introduction
123
The first one is the conditional stability in theory. For this topic, many important results were obtained in last century. The prototype of the stability estimates for the analytic continuation problem is the famous Hadamard’s three circle theorem for holomorphic function as below ([2]). Theorem 5.1.1 (Hadamard). Let f(z) be a function analytic in the annulus r < |z| < R and continuous in the closed annulus r ≤ |z| ≤ R, see Figure 5.3. Then & % ln |z| − ln r ln R − ln |z| + ln M |f(z)| ≤ exp ln m ln R − ln r ln R − ln r =m
ln R−ln |z| ln R−ln r
M 1−
ln R−ln |z| ln R−ln r
,
where M = max |f(z)|, m = max |f(z)|. |z|=R
|z|=r
Figure 5.3. Illustration for the three circle theorem.
This theorem shows that for any two bounded analytic functions defined in the annulus r ≤ |z| ≤ R, if the difference of their values on inner boundary |z| = r is small, then the difference of their values on total domain is also small. For the more general problem of analytic continuation from a part of the boundary of a domain, we suppose that the boundary of the domain D is a piecewise smooth curve Γ = Γ1 ∪ Γ2 , Γ1 ∩ Γ2 = ∅, and the Γj (j = 1, 2) are also piecewise smooth curves, see Figure 5.4. Function f(z) is analytic in D and continuous in D, and the values of f(z) are known on Γ1 . It is required to determine the values of f(z) in the domain D. For this problem there holds [2]: Theorem 5.1.2. Suppose the function f(z) satisfies the inequalities
Then
|f(z)| ≤ M,
z ∈ Γ2 ,
|f(z)| ≤ m,
z ∈ Γ1 .
|f(z)| ≤ M 1−ω(z) mω(z) ,
where ω(z) is the harmonic measure of curve Γ1 in the domain D.
124
5
Numerical Analytic Continuation and Regularization
Figure 5.4. The general domain D with data given on Γ1 (a part of domain D).
Other related exposition on conditional stability also can be seen in [3, 4]. However, the conditional stability can not ensure the stability in numerical computation. The second more interesting branch is the numerical analytic continuation based on the regularization theory, which have been developed intensively in the last few decades for ill-posed problems. For this, to the authors’ knowledge, it seems there are few results both in theory and algorithm. We only know a method of regularizing the problem of analytic continuation from part of the boundary of a domain, which is to use a Carleman function ([2]), nevertheless, there is only the convergence analysis, but not any estimate for the convergence rate and concrete algorithm, meanwhile, the construction of the Carleman function is also rather difficult. In this chapter, we will consider this problem on the strip domain in the complex plane C, and the data is only known approximately on the real axis. Some important domains, e.g. the elliptic domain, the semi-strip domain can be easily transformed to a strip domain ([5]), and some application background of analytic continuation on strip domain in the complex plain can be found in [5, 6, 7, 8, 9, 10].
5.2 5.2.1
Description of the Problems in Strip Domain and Some Assumptions Description of the problems
Denote Ω ⊂ C : and Ω+ ⊂ Ω :
Ω = {z = x + iy | x ∈ R, |y| < y0 }, Ω+ = {z = x + iy | x ∈ R, 0 < y < y0 }.
Suppose a function f(z) with single complex variable z = x + iy is analytic in ¯ We only know a measurement data fδ (x), which is an approximation of the Ω. exact data f(x) = f(x + iy)|y=0 given on x-axis.
5.2 Description of the Problems in Strip Domain and Some Assumptions
125
Problem 5.2.1. We want to determine an approximation of function f(z) = f(x + iy) on domain Ω+ by using the data fδ (x). Problem 5.2.2. We want to determine an approximation of function f(z) = f(x + iy) on domain Ω by using the data fδ (x). Remark 5.2.3. fined by
1. For g(x) ∈ L1 (R), g(ξ) ˆ denotes its Fourier transform de1 g(ξ) ˆ =√ 2π
∞
e−ixξ g(x)dx.
−∞
2. For g(x) ∈ L2 (R), g denotes its L2 -norm defined by ∞ 1 g = ( |g(x)|2 dx) 2 , −∞
and there holds Parseval formula g = g. ˆ
5.2.2
Some assumptions
Assumption 5.2.4.
1. The noise level satisfies f − fδ ≤ δ.
(5.2.1)
2. Suppose f(· + iy) ∈ L2 (R), for z = x + iy ∈ Ω, or z = x + iy ∈ Ω+ . (5.2.2) 3. There hold the a-priori bounds
5.2.3
f(· + iy0 ) ≤ E,
z = x + iy ∈ Ω+ , or
(5.2.3)
f(· ± iy0 ) ≤ E,
z = x + iy ∈ Ω.
(5.2.4)
The ill-posedness analysis for the Problems 5.2.1 and 5.2.2
Viewpoint 1. Because the function f(z) is analytic in Ω, there holds the following Taylor expansion at point (x, 0) : f(z) = f(x + iy) ∞ ∞ f (n) (x) (iy)n n = (iy)n = D f(x), n! n! n=0 n=0
(5.2.5)
126
5
Numerical Analytic Continuation and Regularization
n
∂ where Dn = ∂x n . It is well known that the numerical differentiation is ill-posed, expression (5.2.5) shows that the numerical analytic continuation is a sum of infinite numerical differentiation with different orders which tends to infinite. So Problem 5.2.1 and 5.2.2 must be severely ill-posed.
Viewpoint 2. Note that assumption (5.2.2) and expression (5.2.5), we know that f(· + iy)(ξ, y) =
∞ (iy)n n=0
=
1 f(x + iy) = √ 2π
(iξ)n fˆ(ξ)
∞ (−yξ)n n=0
Therefore, there holds
n!
∞
−∞
n!
fˆ(ξ) = e−yξ fˆ(ξ).
eixξ e−yξ fˆ(ξ)dξ,
|y| < y0 .
(5.2.6)
(5.2.7)
Note that the factor e−yξ tends to infinity rapidly for y > 0, ξ → −∞ and y < 0, ξ → +∞, and the integral (5.2.7) is convergent, we know the Fourier transform fˆ(ξ) of the exact data f(x) must decay rapidly in the above cases. However, the Fourier transform fˆδ (ξ) of the noise data fδ (x) can not possess such a property, and the high frequency perturbation must be magnified by the factor e−yξ and cause blow up of integral (5.2.7). So the Problems 5.2.1 and 5.2.2 both are severely ill-posed.
5.2.4
The basic idea of the regularization for Problems 5.2.1 and 5.2.2
From equations (5.2.6) and (5.2.7) we know that the reason for causing the illposedness of Problems 5.2.1 and 5.2.2 is that the high-frequency disturbation of the noise data fˆδ (ξ) is magnified by the kernel function e−yξ in (5.2.6) and (5.2.7). So, the basic idea of regularizing these problems is just to give a filtering for the noise data fδ (x) by a proper filter function.
5.3 5.3.1
Some Regularization Methods Some methods for solving Problem 5.2.1
Fourier method The Fourier method is to give an approximation of f(x + iy) by directly cutoff the high frequency components of noise data fδ (x). For a more general
5.3 Some Regularization Methods
127
description for the Fourier method can refer to [11]. Definition 5.3.1. Define the regularization approximation of the solution to Problem 5.2.1 as ∞ 1 fδ,ξmax (x + iy) := √ eixξ e−yξ fˆδ (ξ)χ+ max dξ, 2π −∞ where χ+ max =
1, 0,
ξ ≥ −ξmax , ξ < −ξmax ,
(5.3.1)
and ξmax > 0 is the regularization parameter to be determined. Theorem 5.3.2. Assume conditions (5.2.1), (5.2.2) and (5.2.3) hold, and if we take 1 E ξmax = ln , y0 δ then there holds the error estimate y 1− y f(· + iy) − fδ,ξmax (· + iy) ≤ 2 E y0 δ y0 + o(1) , for δ → 0, 0 < y < y0 . The details and more results for this method can be found in [12]. A modified Tikhonov method Define a new kernel function which comes from a simplifying Tikhonov method: ξ ≥ 0, e−yξ , MT Rα (ξ, y) = e−yξ , ξ < 0, 1+αe−2y0 ξ and define the approximation of the solution of Problem 5.2.1 with noise data fδ (x) as ∞ 1 MT (x + iy) = √ eixξ RαM T (ξ, y)fˆδ (ξ)dξ, fδ,α 2π −∞ we have the following results: Theorem 5.3.3 (a-priori method). Assume the conditions (5.2.1), (5.2.2) and (5.2.3) hold, and if we take the regularization parameter % &2 δ , α= E then there holds the error estimate y
MT (· + iy) − f(· + iy) ≤ E y0 δ fδ,α
1− yy
0
(2 + o(1)),
for δ → 0, 0 < y < y0 .
128
5
Numerical Analytic Continuation and Regularization
The details and more results for this method can refer to [13]. Theorem 5.3.4 (a-posteriori method). Assume the conditions (5.2.1), (5.2.2) and (5.2.3) hold, and if we take the regularization parameter as the solution of the equation (the Discrepancy Principle of Morozov) eξy RαM T (ξ, y)fˆδ − fˆδ = τ δ, where τ > 1 is a constant. Then there holds the error estimate MT (· + iy) − f(· + iy) ≤ C(τ, y, y0 )δ fδ,α
where C(τ, y, y0 ) = (τ + 1)
1− yy
0
1− yy
0
y
E y0 ,
y
( 4(τ1−1) + 1) y0 .
The details can refer to [14]. An optimal filtering method The idea of this method comes from [15, 16] for IHCP and BHCP. Denote filtering function as −yξ ξ ≥ 0, e , ρ(ξ, y) = ρ1 (ξ, y), ξ < 0,
where ρ1 (ξ, y) =
e−yξ , β(y),
e−yξ ≤ β(y), e−yξ ≥ β(y),
y
with β(y) = (1 − yy0 )( Eδ ) y0 . Define the approximation of the solution of Problem 2.1 with noise data fδ (x) as ∞ 1 eixξ ρ(ξ, y)fˆδ (ξ)dξ. fδ,β(y) (x + iy) = √ 2π −∞ Theorem 5.3.5. Assume conditions (5.2.1), (5.2.2) and (5.2.3) hold, then there holds the error estimate y
fδ,β(y) (· + iy) − f(· + iy) ≤ E y0 δ
1− yy
0
(1 + o(1))
for δ → 0, 0 < y < y0 .
The details can refer to [17, 18]. The above estimate is an optimal error estimate [13]. Two modified kernel methods Define the kernel functions as Rα(1) (y, ξ) :=
e−yξ , −
e
yξ 1+α2 ξ2
ξ ≥ 0, ,
ξ < 0,
5.3 Some Regularization Methods
129
and Rα(2)(y, ξ) :=
e−yξ , y sin αξ e− α ,
ξ ≥ 0, ξ < 0,
which is corresponding to the modification of the equation and the central difference method for the IHCP ([19, 20]). Define the approximation of the solution to Problem 5.2.1 as ∞ 1 (k) √ fα,δ (x + iy) = eixξ Rα(k)(y, ξ)fˆδ (ξ)dξ, k = 1, 2. 2π −∞ We have Theorem 5.3.6. Assume conditions (5.2.1), (5.2.2) and (5.2.3) hold, if we take the parameter y0 , α= 2 ln Eδ then (1) for
δ E
≥ e−3 and 0 < y < y0 , there holds error estimate y
(1)
fα,δ (· + iy) − f(· + iy) ≤ 2E y0 δ (2) for
δ E
< e−3 and 0 < y < y0 −
(1) fα,δ (·+iy)−f(·+iy)
(3) for
δ E
≤E
< e−3 and y0 −
(1) fα,δ (· +
y y0
δ
3y0 ln E δ
1− yy
0
3y0 ln E δ
1− yy
0
,
, there holds
+max E
y y0
δ
27yy02 E 0, 3 3 e (y0 − y) (ln( Eδ )2 )2
1− yy
$ ,
≤ y < y0 , there holds
iy) − f(· + iy) ≤ E
y y0
δ
1− yy
0
%
1 + max 1,
y E ln 4y0 δ
'& .
Theorem 5.3.7. Assume conditions (5.2.1), (5.2.2) and (5.2.3) hold, if we take the parameter y0 α= E, ln δ then there holds (2)
y
fα,δ (· + iy) − f(· + iy) ≤ E y0 δ where C =
9yy02 . 2e3 (y0 −y)3
The details can refer to [21, 22].
1− yy
0
+
CE , ln Eδ
0 < y < y0 ,
130
5
Numerical Analytic Continuation and Regularization
An approximation method by harmonic functions Let f(z) = f(x + iy) = u(x, y) + iv(x, y), where u(x, y) and v(x, y) are real and imaginary parts of function f(z) with complex variable z ∈ C. It is easy to see by the Cauchy-Riemann formula that Problem 5.2.1 is equivalent to solving the following two Cauchy problems of Laplace equation: ⎧ 2 ∂ u ∂2u ⎪ ⎪ ⎨ 2 + 2 = 0, x ∈ R, 0 < y < y0 , ∂x ∂y (5.3.2) u(x, 0) = f(x), x ∈ R, ⎪ ⎪ ⎩ x ∈ R, uy (x, 0) = 0, and
⎧ 2 ∂ v ∂2v ⎪ ⎪ ⎨ 2 + 2 = 0, ∂x ∂y v(x, 0) = 0, ⎪ ⎪ ⎩ vy (x, 0) = f (x),
x ∈ R, 0 < y < y0 , (5.3.3)
x ∈ R, x ∈ R.
The solutions of problem (5.3.2) and (5.3.3) are ∞ 1 ˆ u(x, y) = √ eixξ cosh(y|ξ|)f(ξ)dξ, 2π −∞ and
1 v(x, y) = √ 2π
∞
eixξ
−∞
iξ sinh(y|ξ|) ˆ f(ξ)dξ, |ξ|
(5.3.4)
(5.3.5)
respectively. Define the approximation of the solutions to problem (5.3.2) and (5.3.3) with noise data as ∞ 1 (1) uδ,ξ(1) (x, y) = √ eixξ cosh(y|ξ|)fˆδ (ξ)χmax dξ, max 2π −∞ and
1 vδ,ξ(2) (x, y) = √ max 2π
∞
eixξ
−∞ (k)
iξ sinh(y|ξ|) ˆ (2) fδ (ξ)χmax dξ, |ξ| (k)
where the meaning of the notations χmax and ξmax , k = 1, 2, are the same as in (5.3.1). Moreover, we assume there hold the following two a-priori bounds
where
u(·, y0 )p ≤ E1 ,
(5.3.6)
v(·, y0 )p ≤ E1 ,
(5.3.7)
% gp :=
∞
−∞
2 |g(ξ)| ˆ (1 + ξ p )dξ
&1 2
.
5.3 Some Regularization Methods
131
Lemma 5.3.8. Assume conditions (5.2.1), (5.2.2) and (5.3.6) hold, if we take (1) ξmax
1 = ln y0
/
2E1 δ
%
2E1 1 ln y0 δ
&−p 0 ,
then there holds the error estimate yp y
y
u(·, y) − uδ,ξ(1) (·, y) ≤ y0 0 (2E1 ) y0 δ
1− yy
0
,
max
ln 2Eδ 1
-− yp y 0
(2 + o(1)),
for δ → 0, 0 < y ≤ y0 . Lemma 5.3.9. Assume conditions (5.2.1), (5.2.2) and (5.3.7) hold, if we take (2) ξmax
1 = ln y0
/
2E1 δ
%
1 2E1 ln y0 δ
&−p 0 ,
then there holds the error estimates yp y
y
v(·, y) − vδ,ξ(2) (·, y) ≤ y0 0 (2E1 ) y0 δ max
1− yy
0
,
ln 2Eδ 1
-− yp y 0
(1 + o(1)),
for δ → 0, 0 < y ≤ y0 . (1)
(2)
Note that ξmax = ξmax , so we can define the approximation of Problem 5.2.1 as fδ,ξmax (x + iy) = uδ,ξmax (x, y) + ivδ,ξmax (x, y) and we have Theorem 5.3.10. Assume conditions (5.2.1),(5.2.2),(5.3.6) and (5.3.7) hold, if we take / % & 0 1 2E1 1 2E1 −p ξmax = , ln ln y0 δ y0 δ then there holds the error estimates f(· + iy) − fδ,ξmax (· + iy) ≤
y √ yyp0 -− yp 1− y , 2y0 (2E1 ) y0 δ y0 ln 2Eδ 1 y0 (2 + o(1)),
for δ → 0 and 0 < y ≤ y0 , p > 0 or 0 < y < y0 , p = 0. The details can refer to [22].
132
5
Numerical Analytic Continuation and Regularization
Two remarks Remark 5.3.11. For the above methods, replacing the a-priori bound f(· + iy) ≤ E by f(· − iy) ≤ E, we can get the approximation of f(z) = f(x + iy) in the domain Ω− = {z = x + iy | x ∈ R, −y0 < y < 0} and moreover, the approximation of f(z) = f(x + iy) on Ω can be obtained. Remark 5.3.12. Comparison of the kernels in the above methods: 1) Fourier method Rξmax =
e−yξ , e−yξ χmax ,
ξ ≥ 0, ξ < 0,
e−yξ ,
ξ ≥ 0, ξ < 0,
2) A modified Tikhonov method RξαM T =
e−yξ , 1+αe−2y0 ξ
3) An optimal filtering method Rβ (ξ, y) = where
ρ1 (ξ, y) =
with β(y) = (1 −
e−yξ , ρ1 (ξ, y),
e−yξ , β(y),
ξ ≥ 0, ξ < 0,
e−yξ ≤ β(y), e−yξ ≥ β(y),
y
y E y0 y0 )( δ ) .
4) Modified kernel methods Rα(1) (y, ξ)
:=
and
e−yξ , −
Rα(2)(y, ξ) :=
ξ ≥ 0, ,
ξ < 0,
e−yξ , y sin αξ e− α ,
ξ ≥ 0, ξ < 0,
e
yξ 1+α2 ξ2
we can more distinctly see the basic idea of regularization method described in Section 5.2, i.e., substantially speaking, they are just different filterings for high-frequency disturbation for ξ → −∞.
5.3 Some Regularization Methods
5.3.2
133
Some methods for solving Problem 5.2.2
A Meyer wavelet method Due to the scaling and wavelet functions of Meyer wavelet all have the compact support in the frequency space ([23]). So the high-frequency components of noise data can be filtered in these scaling spaces. Therefore, the idea of wavelet method is the same as Fourier method in Section 5.3.1. Let ϕ(x), ψ(x) be Meyer scaling and wavelet function, respectively, and denote {Vj }j∈Z = {ϕjk , k ∈ Z},
j
ϕjk = 2 2 ϕ(2j x − k), j, k ∈ Z,
be the scaling spaces of the multiresolution analysis (MRA) of Meyer wavelet. For g ∈ L2 (R), define (g, ϕjk )ϕJk . PJ g := k∈Z
In this subsection, we consider Problem 5.2.2 on the domain Ω. Let f(x) be the data function. Define operator Ty by Ty f(x) := f(x + iy),
|y| < y0 ,
or equivalently, −yξ ˆ f (ξ), T8 y f (ξ) = f(· + iy)(ξ, y) = e
|y| < y0 .
Define the approximate solution of Problem 5.2.2 with noise data fδ (x) as (Ty,J fδ )(x + iy) := (Ty PJ fδ )(x + iy). Theorem 5.3.13. Assume conditions (5.2.1), (5.2.2) and (5.2.4) hold. If we take the regularization parameter 1 E J = log2 ( (ln )) , 2y0 δ ∗
where [a] denotes the largest integer no more than a ∈ R. Then there holds the error estimate |y|
f(· + iy) − Ty,J ∗ fδ (· + iy) ≤ CE y0 δ
|y| 0
1− y
,
where C is a positive constant independent of J ∗ and y.
0 < |y| < y0 ,
134
5
Numerical Analytic Continuation and Regularization
A mollification method Denote the Gauss function ρε (x) as 1 x2 ρε (x) = √ exp{− 2 }. ε ε π Define an approximation of the solution to Problem 5.2.2 with noise data fδ (x) as ∞ 1 eixξ e−yξ ρ fε,δ (x + iy) = √ ε ∗ fδ (ξ)dξ, 2π −∞ we have two results as follows: Theorem 5.3.14 (a-priori method). Assume conditions (5.2.1), (5.2.2) and (5.2.4) hold, if we take the parameter ε=
y0
1
(ln Eδ ) 2
,
then there holds the estimate y2 E
fε,δ (· + iy) − f(· + iy) ≤ 2e−2 (y0 0−y)2 (ln Eδ )−1 (1 + o(1)), for δ → 0 and 0 < |y| < y0 . Theorem 5.3.15 (a-posteriori method). Assume the conditions (5.2.1), (5.2.2) and (5.2.4) hold and τ > 0, such that 1 0 < δ + τ (ln(ln ))−1 < fδ . δ If we take the regularization parameter ε as the solution of the following equation e−
ε2 ξ 2 4
1 fˆδ − fˆδ = δ + τ (ln ln( ))−1 , δ
then there holds the error estimate y y0 −y fε,δ (· + iy) − f(· + iy) ≤ (E + o(1)) y0 τ (ln ln 1δ )−1 y0
for δ → 0, 0 < |y| < y0 . The details and more results for this method can refer to [24]. A remark Due to the limitations of the techniques, the methods in this section can not be directly used to the Problem 5.2.1 on the domain Ω+ , but the results are suitable for z = x + iy ∈ Ω+ .
5.4 Numerical Tests
5.4
135
Numerical Tests
In this section, a simple numerical example is devised to verify the validity of the regularization methods given in Section 5.3. Example 5.4.1. The function 2
2
f(z) = e−z = e−(x+iy) = ey
2 −x2
(cos 2xy − i sin 2xy)
is analytic in the domain +
Ω = {z = x + iy ∈ C| x ∈ R, 0 ≤ y ≤ 1} with
2
f(z)|y=0 = e−x ∈ L2 (R),
and Re f(z) = ey
2 −x2
cos 2xy,
y2 −x2
Im f(z) = −e
sin 2xy.
We use the Fast Fourier and Inverse Fourier transform to complete our numerical experiments. In these numerical experiments we always calculate the approximate value V (x + iy) of f(x + iy) at y = 0.1, 0.9 for |x| ≤ 10. Suppose the vector F represents samples from the function f(x), then we obtain the perturbation data F δ = F + ε randn(size(F )),
(5.4.1)
where the function “randn(·)” generates arrays of random numbers whose elements are normally distributed with mean 0, variance σ 2 = 1. The error is given by 9 : +1 : 1 M δ δ = F − F l2 := ; |F δ (n) − F (n)|, (5.4.2) M + 1 n=1 here we choose M = 100. In all numerical experiments, we compute the approximation solutions V (x+ iy) according to the theorems in Section 5.3. Figures 5.5–5.10, 5.12 are the comparison of the real and imaginary parts of the exact f(z) and the approximate solution V (x + iy) for different noise level = 10−2 , 10−3 . We can easily see that the Fourier method and the Modified Tikhonov method work better than others. Figure 5.11 gives the real and imaginary parts of the exact f(z) and the approximate solution V (x + iy) for larger noise level = 10−1 , 10−2 , the computational result of Meyer Wavelet method is still very well.
136
5
Numerical Analytic Continuation and Regularization
Figure 5.5. The Fourier method.
Figure 5.6. The modified Tikhonov method.
5.4 Numerical Tests
Figure 5.7. The Filtering method.
Figure 5.8. Modified kernel method (1).
137
138
5
Numerical Analytic Continuation and Regularization
Figure 5.9. Modified kernel method (2).
Figure 5.10. Harmonic function method.
5.4 Numerical Tests
Figure 5.11. Meyer wavelet method.
Figure 5.12. The Mollification method.
139
140
5
Numerical Analytic Continuation and Regularization
Remark 5.4.2. Although the filtering method is optimal in theory, its numerical effect is not better than some of other methods. The reasons may be that new errors must appear in the computational process and we can appropriately adjust the regularization parameter for other methods to obtain a better result. While for the filtering method, the parameter β(y) is completely fixed and can not be corrected.
Acknowledgements We thank our group students Xiao-Li Feng, Zhi-Liang Deng, Yuan-Xiang Zhang, Fang-Fang Dou and Min-Hua Zhao, Zheng-Qiang Zhang for their active participation in this topic and important contributions. The project is supported by the National Natural Science Foundation of China under grant numbers 11171136 and 10671085.
References [1] R. Kress, Linear Integral Equations, 2nd Edition, Berlin, Springer-Verlag, 1998. [2] M. M. Lavrent’ev, V. G. Romanov and S. P. Shishat’skiˇi, Ill-posed problems of mathematical physics and analysis, Translations of Mathematical Monographs, 64, 1986. [3] M. M. Lavrent’ev, Some Improperly Posed Problems of Mathematical Physcis, New York, Springer-Verlge, 1967. [4] S. Vessella, A continuous dependence result in the analytic continuation problem, Forum Math, 11, 695-703, 1999. [5] D. N. Hào and H. Shali, Stable analytic continuation by mollification and the fast Fourier transform Method of Complex and Clifford Analysis, Proc. of ICAM, 143-52, 004. [6] J. Franklin, Analytic continuation by the fast Fourier transform, SIAM J. Sci. Stat. Comput., 11, 112-22, 1990. [7] I. S. Stefanescu, On the stable analytic continuation with a condition of uniform boundedness, J. Math. Phys., 2, 2657-86, 1986. [8] A. G. Ramm, The ground-penetrating radar problem, III, J. Inverse Ill-posed Probl., 8, 23-30, 2000. [9] C. L. Epstein, Introduction to the Mathematics of Medical Imaging, Philadelphia, SIAM, 2008. [10] R. G. Airapetyan and A. G. Ramm, Numerical inversion of the Laplace transform from the real axis, J. Math. Anal. Appl., 248, 572-87, 2000. [11] C. L. Fu and Z. Qian, Numerical pseudodifferential operator and Fourier regularization, Adv. Comput. Math., 33 (4), 449-470, 2010. [12] C. L. Fu, F. F. Dou, X. L. Feng and Z. Qian, A simple regularization method for stable analytic continuation, Inverse Problems, 24, article 065003, 2008.
References
141
[13] C. L. Fu, Z. L. Deng, X. L. Feng and F. F. Dou, A modified Tikhonov regularization for stable analytic continuation, SIAM J. Numer. Anal., 47(4), 2982-3000, 2009. [14] Z. L. Deng, The Study of the Regularization Methods for Two Classes of Ill-posed Problems, Doctoral thesis, Lanzhou University, 2010. [15] T. I. Seidman and L. Eldén, An optimal filtering method for the sideways heat equation, Inverse Problems, 6, 681-696, 1990. [16] T. I. Seidman, Optimal filtering for the backward heat equation, SIAM J. Numer. Anal., 33, 162-170, 1996. [17] H. Cheng, Two Regularization Methods for Solving Some Inverse Problems of Elliptic Equations, Master thesis, Lanzhou University, 2009. [18] H. Cheng, C. L Fu and X. L. Feng, An optimal filtering method for stable analytic continuation, Journal of Computational and Applied Mathematics, 2012 (to appear). [19] L. Eldén, Approximations for a Cauchy problem for the heat equation, Inverse Problems, 3, 263-273, 1987. [20] L. Eldén, Numerical solution of the sideways heat equation by difference approximation in time, Inverse Problems, 11, 913-923, 1995. [21] M. H. Zhao, A Modified Kernel Method for Analytic Continuation, Master thesis, Lanzhou University, 2009. [22] Z. Q. Zhang, Two Regularization Methods for Analytic Continuation, Master thesis, Lanzhou University, 2010. [23] I. Daubechies, Ten Lectures on Wavelets, Philadelphia, SIAM, 1992. [24] Z. L. Deng, C. L. Fu, X. L. Feng and Y. X. Zhang, A mollification regularization method for stable analytic continuation, Mathematics and Computers in Simulation, 81, 1593-1608, 2011.
Authors Information C. L. Fu, H. Cheng and Y. J. Ma School of Mathematics and Statistics, Lanzhou University, Lanzhou 730000, P. R. China. E-mail: [email protected], [email protected], [email protected]
Chapter 6
An Optimal Perturbation Regularization Algorithm for Function Reconstruction and Its Applications G. S. Li
Abstract. This chapter deals with several inverse problems of determining coefficient functions in advection-diffusion equation by an optimal perturbation regularization algorithm. The inversion algorithm is implemented successfully with numerical simulations for reconstruction problems of time-dependent, and space-dependent linear function, and state-dependent nonlinear source term, and space-dependent diffusion coefficient, respectively. Furthermore, the inversion algorithm is applied to determine key parameters in two real inverse problems, and reconstructed data basically coincide with the measured data.
6.1
Introduction
As we know, it is very difficult to measure reaction coefficient or source term by experiment itself for diffusion processes with physical and/or chemical reactions. So, coefficient function reconstruction problem is always encountered with in advection-dispersion and reaction-diffusion equations. In order to determine an unknown function, or reconstruct measured data, an inversion algorithm must be designed and performed with high efficiency and stability. It is noticeable that most of inversion algorithms involve utilizing regularization strategies so as to overcome ill-posedness of data noises and computational errors, and different kinds of inverse problems could need different regularization strategies. For example, method of fundamental solutions ([11, 14]), lie-group estimation method and one-step group preserving scheme ([12, 13]), level set method ([15, 10]), and optimal approximate methods ([16, 1]) have been testified to be effective methods in dealing with corresponding inverse problems. However, to our knowledge, it is still a trouble on how to realize an inversion algorithm utilizing regularization with high efficiency in concrete computations. In this chapter, we will deal with inverse problems of determining coefficient
144
6
Optimal Perturbation Regularization Algorithm
functions in a one-dimensional advection diffusion equation ∂ ∂u ∂u ∂u = (D(x) ) − v + f(x, t; u), 0 < x < l, 0 < t < T. ∂t ∂x ∂x ∂x We will consider four cases based on equation (6.1.1) given below:
(6.1.1)
Problem 6.1.1 (c.f.[5], for instance). D(x) = D, and f(x, t; u) = β(t)u in which case an additional condition at x = l is given as u(l, t) = h(t).
(6.1.2)
The inverse problem is to determine coefficient function β = β(t) by equation (6.1.1) and additional information (6.1.2) with suitable initial boundary conditions. Problem 6.1.2 (c.f.[6], for instance). D(x) = D, and f(x, t; u) = p(x)u, and additional information is given at one final time t = T u(x, T ) = uT (x).
(6.1.3)
The inverse problem in this case is to determine coefficient function p = p(x) by equation (6.1.1) and the additional information (6.1.3) with suitable initial boundary conditions. Problem 6.1.3 (c.f.[2], for instance). D(x) = D, and f(x, t; u) = a(x)g(u), and the inverse problem here is to determine the nonlinear source term g = g(u) also by equation (6.1.1) and the additional condition (6.1.3) with suitable initial boundary conditions. Problem 6.1.4 (c.f.[9], for instance). Let f(x, t; u) = r(x, t) be known, and the inverse problem is to determine the diffusion coefficient D = D(x) also by equation (6.1.1) and the additional condition (6.1.3) with suitable initial boundary conditions. For the above inverse problems of determining coefficient functions, we will employe an optimal perturbation regularization algorithm (c.f.[16], for instance) to perform numerical inversion simulations. Furthermore, two real life examples are presented which arise from regional groundwater pollution and soil-column infiltrating experiment respectively, and the measured data are reconstructed successfully by the optimal perturbation regularization algorithm.
6.2
The Optimal Perturbation Regularization Algorithm
The optimal perturbation regularization algorithm is an approximate method for function reconstruction based on least square method and linearization technique with regularization strategy. It is always called Levenberg-Marquardt
6.2 The Optimal Perturbation Regularization Algorithm
145
method to minimization problem in finite-dimensional space (c.f.[17], for instance), however, it has been testified to be effective to identify unknown functions depending upon space, time or state variables at least for some reactiondiffusion equations (c.f.[16, 5, 6, 7, 8, 2], for instance). On the concrete computations, we find that sometimes the algorithm can be simplified, and reduced to ordinary optimal perturbation algorithm without using explicit regularization. In this section, we will illustrate the inversion algorithm by taking Problem I as example. Assume β(t) ∈ Ψ, where Ψ ⊂ L2 (0, T ) is an admissible space of the unknowns. For any β(t) ∈ Ψ, a unique solution of the corresponding forward problem, denoted by u(x, t; β), can be solved numerically, and then computational data at the boundary x = l are obtained, denoted by u(l, t; β). So, an optimal idea for solving the inverse problem here is to solve a minimization problem: (6.2.1) min J(β), β∈Ψ
where J(β) = u(l, t; β) − h(t)22 =
0
T
[u(l, t; β) − h(t)]2 dt.
Generally speaking, it is unstable to solve the above minimization problem (6.2.1) numerically, especially in the case of the additional data having noises. Thus, regularization methods can be utilized to stabilize and solve it. If employing ordinary Tikhonov regularization, it needs to minimize the following functional with Tikhonov regularization term min{u(l, t; β) − h(t)22 + αβ22 }. β∈Ψ
(6.2.2)
Suppose that {φi (t)}∞ i=1 is a group of basis functions of Ψ, then there is β(t) =
∞
ai φi (t),
i=1
where ai (i = 1, 2, . . . ) are expansion coefficients. Taking an approximation by choosing limited terms, we have β(t) ≈
N
ai φi (t),
(6.2.3)
i=1
here N ≥ 1 is a truncated level of β(t), which can be regarded as dimension of approximate space. It is convenient to set a limited dimensional space as ΨN =span {φ1 , φ2 , . . . , φN }, and an N -dimension vector a = (a1 , a2 , . . . , aN ).
146
6
Optimal Perturbation Regularization Algorithm
Therefore, to get an approximate reaction coefficient solution β ∈ ΨN is equivalent to find a vector a ∈ RN , in which meaning we can say that β = a. Now, for given βj , assume that βj+1 = βj + δβj ,
j = 0, 1, . . . .
(6.2.4)
Then in order to get βj+1 from βj , we need to compute an optimal perturbation δβj . In the follows for convenience of writing, βj and δβj are abbreviated as β and δβ, respectively. Paying attention to (6.2.3), let us set δβ(t) =
N
δai φi (t).
i=1
So, we only need to work out vector δa = (δa1 , δa2 , . . . , δaN ). Taking Taylor’s expansion for u(l, t; β + δβ) at β, and ignoring higher order terms, we can get u(l, t; β + δβ) ≈ u(l, t; β) + ∇T u(l, t; β) · δβ. Thus, define a perturbation functional for δβ (or δa) as follows: F (δβ) = u(l, t; β) + ∇T u(l, t; β) · δβ − h(t)22 + αδβ22 , where ∇T u(l, t; β) · δβ ≈
N u(l, t; β + τi ϕi ) − u(l, t; β) i=1
τi
(6.2.5)
δai ,
here τi (i = 1, 2, . . . , N ) are numerical differential steps in computing the gradient of ∇T u(l, t; β). Next, discretizing the domain [0, T ] with 0 = t1 < t2 < · · · < tK = T , then the above L2 norm in (6.2.5) can be reduced to discrete Euclid norm given as F (δa) = G δa − (η − ξ)22 + αδa22 , where ξ = (u(l, t1 ; β), u(l, t2 ; β), . . . , u(1, tK ; β)); η = (h(t1 ), h(t2 ), . . . , h(tK )); and
u(1, tk ; β + τi φi ) − u(1, tk ; β) , G = (gki )K×N . (6.2.6) τi It is not difficult to prove that the above square minimization problem is equivalent to solve the following normal equation ([3]): gki =
(GT G + αI)δa = GT (η − ξ).
(6.2.7)
6.3 Numerical Simulations
147
Therefore, an optimal perturbation can be worked out by (6.2.7) given as δaα = (GT G + αI)−1 GT (η − ξ),
(6.2.8)
and then an optimal increment δβ α can be obtained. Furthermore, an optimal coefficient function β = β(t) can be obtained approximately by iterative procedures (6.2.4) as long as the increment satisfying a given convergent precision. The key points to realize the inversion algorithm lie in suitable choices of finite-dimensional approximate space, initial iteration, regularization parameter, numerical differential step and convergent precision. The detailed steps to implement the above algorithm are given as follows. Algorithm 6.2.1 (Optimal perturbation regularization algorithm). Step 1 Given basis functions {φi }N i=1 , and initial iteration βj (aj ), numerical differentiation steps vector τ = (τ1 , τ2 , . . . , τN ), and convergent precision eps, and additional data h(t); Step 2 Solve the corresponding forward problem to get u(l, t; βj ) and u(l, t; βj +τi φi ), and then obtain the vector ξ and the matrix G by formula (6.2.6); Step 3 Firstly setting regularization parameter as zero in formula (6.2.8), if a suitable perturbation δaj can be worked out, then turn to the next step. Otherwise, choosing regularization parameter α > 0 appropriately, and get an optimal perturbation δaαj also by using formula (6.2.8), and then get δβ αj ; Step 4 If there is δβ αj ≤ eps, then the inversion algorithm can be terminated, and βj+1 = βj + δβ αj is taken as the solution what we just want to determine; otherwise, go to Step 2 by replacing βj with βj+1 .
6.3
Numerical Simulations
We will utilize the inversion algorithm with zero regularization parameter to perform several numerical tests in this section.
6.3.1
Inversion of time-dependent reaction coefficient
Consider Problem 6.1.1 with the following initial boundary conditions u(x, 0) = 0; u(0, t) = 1; ux (l, t) = 0.
(6.3.1)
Example 6.3.1. Set D = 0.2, v = 1, and l = 1, T = 5 in equation (6.1.1), and take a true reaction coefficient as β(t) = −2 + 0.5t − 0.1t2 .
148
6
Optimal Perturbation Regularization Algorithm
In the concrete inversions, we will choose polynomial basis functions to compose finite-dimensional approximate space, i.e., take ΨN = span{1, t, . . . , tN −1 } as inversion space of solutions. For different approximate space, we can regard the true reaction coefficient as β true = (−2, 0.5, −0.1) in Ψ3 , and β true = (−2, 0.5, −0.1, 0) in Ψ4 , etc. In the case of using accurate data, set initial iteration be zero, i.e., β0 = 0, differential steps τ = (1e − 1, 1e − 2, . . . , 1e − N ) for discrete level N , and convergent precision as eps = 1e − 8. Table 6.1 lists inversion results in different inv denotes inversion coapproximate space ΨN for N = 3, 4, 5, 6, 7, where βN efficient in N dimensional approximate space, and Tcpu /I denotes CPU time (second) for each iteration. Moreover, Figure 6.1 plots the inversion reconstruction solution via the true solution in the case of N = 7. Table 6.1. Inversion results in ΨN for Example 6.3.1. N 3 4 5 6 7
inv βN (−2.00000, (−2.00000, (−2.00003, (−2.00024, (−1.99983,
0.500001, 0.500003, 0.500073, 0.500748, 0.499278,
−0.100000) −0.100001, 2.28e−7) −0.100054, 1.49e−5, −1.38e−6) −0.100750, 3.3e−4, −6.5e−5, 4.8e−6) −0.098987, −6.6e−4, 2.2e−4, −3.5e−5, 2.2e−6)
Tcpu /I 10.6/6 12.9/6 31.7/13 80.8/29 30.1/7
Figure 6.1. Reconstruction coefficient in Ψ7 and the true coefficient with accurate data.
6.3 Numerical Simulations
149
In the case of with noisy data, suppose that the noisy data are expressed by hε (t) = h(t) + εθ, where 0 < ε ≤ ε0 is noisy level, and θ is a random vector ranged in [−1, 1]. The average inversion results by ten computations for different noisy levels are listed in Table 6.2, and all the ten-time inversion results via the true solution for = 5% are plotted in Figure 6.2. Where ε represents noisy level, β¯inv denotes average inversion source, T¯cpu /I¯ denotes ¯ denotes average relative error in average CPU time for each iteration, and Err the solutions. Table 6.2. Inversion results with noisy data in Ψ3 .
ε 5% 10%
β¯inv (1.9986, −0.4986, 0.09968) (1.9912, −0.4934, 0.09895)
T¯cpu /I¯ 40.3/20.7 47.95/26.4
¯ Err 0.00203 0.00528
Figure 6.2. Inversion results of ten-time computations in Ψ3 for ε = 5%.
By the above numerical simulations we can see that the above optimal perturbation algorithm not only has accuracy, but also stable for random noises of the additional data, although explicit regularization strategy not utilized here.
6.3.2
Inversion of space-dependent reaction coefficient
Consider inverse problem 6.1.2 composed by equation (6.1.1) and the additional condition (6.1.3), where taking l = 1, T = 2, and the following initial boundary value conditions: u(x, 0) = 2x + 1, u(0, t) = exp(t), u(1, t) = t + 3.
(6.3.2)
150
6
Optimal Perturbation Regularization Algorithm
Example 6.3.2. Set an exact coefficient be p(x) = 1 − x + x2 /2, and D = 0.5, v = 100 in equation (6.1.1). Choosing basis functions space also as polynomial space, i.e., ΨN = span{1, x, . . . , xN −1 }, and taking N = 3, convergent precision as eps = 1e − 6, and differential step as τ = (1e − 2, 1e − 3, 1e − 4), the algorithm can be realized also with zero regularization parameter. The inversion results are listed in Table 6.3 and plotted in Figure 6.3, where p0 denotes initial iteration, pinv denotes the inversion solution, and Tcpu /I also denotes CPU time (second) for each iteration. Table 6.3. Inversion results with accurate data in Ψ3 for Example 6.3.2.
p0 (−10,−10,−10) (0, 0, 0) (10,10,10)
pinv (1.0000000, −0.99999999, 0.49999999) (1.0000000, −1.0000000, 0.50000000) (1.0000000, −0.99999999, 0.49999999)
Tcpu /I 5.2/4 3.6/3 4.5/4
Figure 6.3. True solution and inversion solution with p0 = 0.
Remark 6.3.3. If performing the algorithm in higher dimensional approximate space ΨN (N ≥ 4) for this example by choosing initial iteration p0 = 0, the inversion results are satisfactory too, which are listed in Table 6.4, where pinv N denotes inversion solution in N -dimensional approximate space.
6.3 Numerical Simulations
151
Table 6.4. Inversion results in ΨN for Example 6.3.2. N 4 5 6 7
pinv N (1.0000000, −1.0000000, 0.50000000, 7.5772e−10) (1.0000000, −1.0000000, 0.50000004, −6.1959e−8, 3.0818e−8) (0.99999999, −0.99999988, 0.49999891, 1.912e−6, −1.19e−6, 1.80e−7) (0.99999250, −0.99975197, 0.49767948, 9.1e−3, −1.7e−2, 1.5e−2, −5.3e−3)
6.3.3
Tcpu /I 8.5/3 5.5/3 8.5/4 7.3/3
Inversion of state-dependent source term
Consider Problem 6.1.3 composed by equation (6.1.1) and the additional condition (6.1.3), where taking D = 1, v = 0, and f = a(x)g(u), and the following initial boundary value conditions: u(x, 0) = 0, ux (0, t) = −b(t), ux (1, t) = 0.
(6.3.3)
The functions a(x), g(u), and b(t) are assumed to satisfy: (A1) a(x) is continuous and takes positive values for x ∈ (0, 1), and g(u) is continuous and piecewise differentiable on R, and g(0) = 0; (A2) b(t) ∈ C[0, ∞), b(0) = 0. Before performing the inversion algorithm, let us firstly investigate in which condition the forward problem has a monotone, positive solution. Proposition 6.3.4 ([2]). Suppose that u = u(x, t) be a-priori bounded, and the functions a(x), g(u), and b(t) satisfy assumptions (A1) and (A2), and b(t) > 0 for t ∈ (0, T ), then for each x ∈ (0, 1) follows that 0 = u(x, 0) ≤ u(x, t) ≤ u(x, T ) = uT (x), 0 < t < T.
(6.3.4)
Denote M = max uT (x), then there has 0 ≤ u(x, t) ≤ M for (x, t) ∈ DT [0,1]
under the conditions of the proposition, i.e., the solution’s range of the forward problem is [0, M ]. In the follows, numerical simulations will be carried out for g(u) ∈ C([0, M ]), and we will take four-order polynomial space as the basis functions space on the concrete computations. That is to say, there is N = 4, and ϕi (u) = ui−1 , i = 1, N . In other words, the source term g(u) ∈ φ4 has the expansion of (6.3.5) g(u) = a1 + a2 u + a3 u2 + a4 u3 , in which sense, g = g(u) is denoted as g = a = (a1 , a2 , a3 , a4 )T as stated before. In addition, the final time is taken as T = 1 in the following computations. Example 6.3.5. Set a(x) = 12 (1 − x), b(t) = t2 , and take g(u) = u2 as a true source solution.
152
6
Optimal Perturbation Regularization Algorithm
In the given polynomial basis space φ4 =span{1, u, u2 , u3 }, the true source can be expressed as g = (0, 0, 1, 0)T . Under the above given conditions, the forward problem is solved and the maximum of the solution is M = 5.3663, and then the source term will be reconstructed on u ∈ [0, M ] by the optimal perturbation algorithm still setting regularization parameter as zero. Choosing initial iteration as g0 = 0, the computational results are listed in Table 6.5 and Figure 6.4, where τ is the numerical differentiation step, g ∗ = (a∗1 , a∗2 , a∗3 , a∗4 )T denotes the computational reconstruction source, I represents number of iterations, and Err1 , Err2 and Err3 defined as follows represent three kinds of computational errors respectively: Err1 = g(u) − # 4 M 1 ∗ + a∗ u + a∗ u2 + a∗ u3 )]2 du, Err = ( 1 g ∗ (u)2 = [g(u) − (a |a∗i − 2 1 2 3 4 M 0 4 i=1
ai |2 )1/2 , Err3 = θ(x) − u∗ (x, 1; g ∗ )2 , where Err1 and Err2 denote errors in the true source solution and the reconstruction sources, and Err3 denotes inversion error, and u∗ (x, t; g ∗ ) denotes the solution corresponding to the reconstruction source g ∗ . Table 6.5. Inversion results for g(u) = u2 and b(t) = t2 . τ ≥ 0.5 0.1 0.01 0.001
g∗ divergent (−0.0005, 0.0002, 0.9996, 0.0000) (−0.0005, 0.0002, 0.9995, 0.0000) (−0.0006, 0.0002, 0.9995, 0.0000)
Err1
Err2
Err3
I
1.2008e−3 1.0334e−3 9.7583e−6 13 1.1933e−3 1.0654e−3 4.2546e−6 11 1.1897e−3 1.1015e−3 1.0692e−6 8
Figure 6.4. Reconstruction solution and true solution for g(u) = u2 and b(t) = t2 .
6.3 Numerical Simulations
153
As for this numerical example, if taking boundary condition as b(t) = t/2 and other conditions are unchanged, the solution’s maximum becomes M = 1.5075. Without loss of generality, taking numerical differential step as τ = 0.001, the computational errors are worked out given as follows: Err1 = 3.8101e − 9, Err2 = 2.9261e − 8, Err3 = 1.6673e − 9. As compared with those of b(t) = t2 , the errors become much smaller especially for Err1 and Err2 . Furthermore, if setting the boundary condition as b(t) = 5 2 4 t , and other conditions also unchanged, the computational results are given as M = 7.5676, and Err1 = 5.8527e − 3, Err2 = 8.9323e − 3, Err3 = 5.8194e − 6, and the reconstruction solution and the true source are plotted in Figure 6.5, respectively.
Figure 6.5. Reconstruction solution and true solution for b(t) = 5t2 /4 in Example 6.3.5.
By the above computations, we can see that the inversion algorithm is influenced by the solution’s range of the forward problem to some extent. On one hand, the inversion solutions errors could become much small if the solution’s maximum takes small value. On the other hand, the inversion solutions errors become large if the solution’s maximum of the forward problem taking relatively large value. Example 6.3.6. Set a(x) = x, b(t) = t2 /2, and g(u) = 1 − cos(u) as a true source solution.
154
6
Optimal Perturbation Regularization Algorithm
In the three-order polynomial basis space φ4 =span{1, u, u2 , u3 }, the true source is expressed as g = (0, 0, 1/2, 0). Similarly as done in Example 6.3.5, by solving the forward problem with the true source, we get the solution’s maximum which is M = 2.0440, then the inversion algorithm will be applied to reconstruct g(u) on u ∈ [0, 2.0440]. Also choosing initial iteration as g0 = 0, the computational results are listed in Table 6.6, where τ , g ∗ , I, and Erri (i = 1, 2, 3) represent the same meanings as in Example 6.3.5. Moreover, in the case of τ = 0.01, the computational reconstruction source solution and the true source are plotted in Figure 6.6, respectively. Table 6.6. Inversion results for Example 6.3.6. τ 0.5 0.1 0.01 0.001
g∗ (0.0051, −0.0760, (0.0051, −0.0760, (0.0051, −0.0760, (0.0051, −0.0760,
0.6713, −0.1395) 0.6713, −0.1395) 0.6713, −0.1395) 0.6713, −0.1395)
Err1 3.2563e−3 3.2550e−3 3.2542e−3 3.2567e−3
Err2 5.2249e−2 5.2254e−2 5.2256e−2 5.2250e−2
Err3 3.0153e−7 2.9500e−7 3.0234e−7 2.9494e−7
I 7 6 7 6
Figure 6.6. Reconstruction solution and true solution for b(t) = t2 /2 in Example 6.3.6.
Similarly as done in Example 6.3.5, replacing the boundary condition with b(t) = t2 and b(t) = t/2, respectively, and other conditions unchanged, the inversion computational results together with that of b(t) = t2 /2 in the case of τ = 0.01 are listed in Table 6.7, and the true source solution and the reconstruction source are plotted in Figure 6.7 for b(t) = t2 , respectively.
6.3 Numerical Simulations
155
Table 6.7. Solution errors for different boundary conditions in Example 6.3.6.

b(t)    M        Err1        Err2        Err3
t/2     1.3987   1.3170e−3   3.9288e−2   8.4357e−7
t²/2    2.0440   3.2542e−3   5.2256e−2   3.0234e−7
t²      4.1991   7.5361e−2   3.8882e−2   3.5746e−6
Figure 6.7. Reconstructed solution and true solution for b(t) = t² in Example 6.3.6.
Example 6.3.7. Set a(x) = x, b(t) = t³, and g(u) = 1 − exp(−u) as the true source solution. In this example the true source is g = (0, 1, −1/2, 1/6), and the solution's maximum is M = 6.7203. Again choosing the initial iteration g₀ = 0, the computational results are listed in Table 6.8, and the reconstructed source for τ = 0.001 is plotted together with the true source in Figure 6.8.

Table 6.8. Inversion results for Example 6.3.7.

τ        g*                                 Err1        Err2        Err3        I
≥ 0.5    divergent
0.1      (0.0387, 0.6583, 0.1515, 0.0119)   6.5966e−2   2.5675e−1   2.8553e−6   8
0.01     (0.0388, 0.6581, 0.1514, 0.0118)   6.5521e−2   2.5689e−1   2.8560e−6   6
0.001    (0.0388, 0.6580, 0.1514, 0.0118)   6.5506e−2   2.5689e−1   2.8582e−6   4
From Table 6.8 and Figure 6.8 we find that the inversion results are not very satisfactory: the reconstructed source shows a different trend for u > 5 compared with the true source. However, the computational errors become
Figure 6.8. Reconstructed solution and true solution for b(t) = t³ in Example 6.3.7.
small if the solution's maximum is relatively small, as observed in Example 6.3.5 and Example 6.3.6. Actually, by setting b(t) = t/4 and b(t) = t²/2, respectively, and also taking τ = 0.001, the computational results, together with the inversion result for b(t) = t³, are listed in Table 6.9; the reconstructed source and the true solution for b(t) = t²/2 are plotted in Figure 6.9.

Table 6.9. Solution errors with different boundary conditions in Example 6.3.7.

b(t)    M        Err1        Err2        Err3
t/4     0.8024   1.8646e−4   3.8777e−2   3.8994e−8
t²/2    2.1652   3.2164e−3   9.3370e−2   1.3256e−7
t³      6.7203   6.5506e−2   2.5689e−1   2.8582e−6
Furthermore, if b(t) = t/10 is set in this example, the inversion algorithm has to take larger numerical differentiation steps in order to obtain good inversion results. The computational results are listed in Table 6.10.

Table 6.10. Inversion results with b(t) = t/10 in Example 6.3.7.

τ        g*                                   Err1        Err2     Err3       I
≥ 0.9    divergent
0.85     (0.0000, 0.9991, −0.4914, 0.1351)    7.4697e−6   0.0164   1.869e−8   76
0.8      (0.0000, 0.9991, −0.4914, 0.1351)    7.3322e−6   0.0164   1.539e−8   60
0.75     (0.0000, 0.9991, −0.4914, 0.1351)    7.3350e−6   0.0164   1.485e−8   55
≤ 0.7    divergent
Figure 6.9. Reconstructed solution and true solution for b(t) = t²/2 in Example 6.3.7.
For τ = 0.8, the reconstructed source and the true solution are plotted in Figure 6.10.
Figure 6.10. Reconstructed solution and true solution for b(t) = t/10 and τ = 0.8 in Example 6.3.7.
6.3.4 Inversion of space-dependent diffusion coefficient
Consider Problem IV, again composed of equation (6.1.1) and the additional condition (6.1.3), where l = 2π and T = 1, with the following initial boundary
value conditions:

uₓ(0, t) = uₓ(2π, t) = 0,  u(x, 0) = cos(x).  (6.3.6)
Example 6.3.8. Set D(x) = 1 + cos(x) as the exact diffusion coefficient, and v = 0, f = exp(−t) cos(2x) in equation (6.1.1). It is easy to verify that the solution of the forward problem here is u(x, t) = exp(−t) cos(x), and the additional data function is u_{T=1}(x) = exp(−1) cos(x). We perform the algorithm by choosing the trigonometric basis space Ψ = span{1, sin(x), cos(x), sin(2x), cos(2x)} on [0, 2π] as the approximate space; the exact coefficient D(x) = 1 + cos(x) is then expressed by the vector c = (1, 0, 1, 0, 0) in this space. Following the methods used above, we also set the regularization parameter α = 0, since accurate data are used here. In Figure 6.11 the true solution and the reconstructed solution are plotted for the case of zero initial iteration and the differentiation step vector τ = (1e−3, 1e−3, 1e−3, 1e−3, 1e−3). In addition, from the computations we find that the first component of the step vector has a much greater impact on the algorithm than the others in this example. For instance, the algorithm does not converge when choosing D₀ = (1, 1, 1, 1, 1) and τ = (1e−2, 1e−2, 1e−2, 1e−2, 1e−2); however, taking τ = (1e−1, 1e−2, 1e−2, 1e−2, 1e−2), the algorithm converges to the true solution, and the solution error is Err = 2.35e−4.
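To make the choice of approximate space concrete, the following minimal Python sketch (the grid and names are ours, not from the chapter) checks that the coefficient vector c = (1, 0, 1, 0, 0) reproduces D(x) = 1 + cos(x) exactly in the basis Ψ:

```python
import numpy as np

# Evaluate the trigonometric basis {1, sin x, cos x, sin 2x, cos 2x} on a grid.
def basis(x):
    return np.array([np.ones_like(x), np.sin(x), np.cos(x),
                     np.sin(2 * x), np.cos(2 * x)])

c = np.array([1.0, 0.0, 1.0, 0.0, 0.0])     # coefficients of D(x) = 1 + cos(x)
x = np.linspace(0.0, 2 * np.pi, 9)
print(np.max(np.abs(c @ basis(x) - (1 + np.cos(x)))))   # ~0: the expansion is exact
```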
Figure 6.11. Inversion data and exact solution for Example 6.3.8.
Now consider the case where the additional data contain random noise; a regularization strategy must then be implemented, and the regularization parameter needs
to be chosen suitably. Suppose the noise level is ε > 0 and the noisy data have the form u^ε_T = u_T + εθ, where θ is a random vector with values in [−1, 1]. In addition, set the numerical differentiation step vector τ = (1e−1, 1e−2, 1e−2, 1e−3, 1e−3). Performing the inversion ten times for each of ε = 1%, 5% and 10%, the average inversion results are obtained and listed in Table 6.11, where D̄ denotes the average inversion coefficient vector, Ī the average number of iterations, and Err̄ the average error over the ten inversions.
Table 6.11. Inversion results with noisy data in Example 6.3.8.

ε      D̄                                              Ī     Err̄
1%     (1.00, 5.57e−4, 1.00, 1.57e−3, 2.98e−3)         7.0   0.018
5%     (0.995, −1.3e−3, 1.00, −6.14e−3, −2.12e−3)      7.9   0.067
10%    (1.02, −1.26e−2, 0.93, 5.63e−2, −2.19e−1)       9.3   0.174
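A minimal sketch of this noisy-data experiment follows (Python/NumPy); the routine `invert` stands in for the optimal perturbation algorithm and is an assumption, not code from the chapter:

```python
import numpy as np

rng = np.random.default_rng()

def average_inversion(u_T, invert, eps, runs=10):
    """Perturb the exact data u_T with uniform noise of level eps,
    run the (assumed) inversion routine `runs` times, average the results."""
    results = [invert(u_T + eps * rng.uniform(-1.0, 1.0, size=u_T.shape))
               for _ in range(runs)]
    return np.mean(results, axis=0)   # corresponds to the averages of Table 6.11
```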
In order to observe the inversion results clearly, the ten inversion results for ε = 5% and ε = 10% are plotted in Figure 6.12 and Figure 6.13, respectively.
Figure 6.12. Inversion data and exact solution for ε = 5% in Example 6.3.8.
6.4 Applications

6.4.1 Determining magnitude of pollution source
Consider acid pollution of groundwater in Fengshui, Zibo, Shandong Province, China. The problem is to determine the average magnitude of the acid pollutants seeping into the aquifer every year, based on
Figure 6.13. Inversion data and exact solution for ε = 10% in Example 6.3.8.
the measured concentration data in the region from 1988 to 1999 along the groundwater flow direction. Under some suitable assumptions on the aquifer, the acid solute transport in this groundwater system can be characterized by

$$\frac{\partial c}{\partial t} - a_L v \frac{\partial^2 c}{\partial x^2} + v\frac{\partial c}{\partial x} + \lambda c = \frac{q(x)}{n_e}, \quad 0 < x < l,\ 0 < t < T_1, \tag{6.4.1}$$
where q = q(x) [mg/(L·y)] represents the average magnitude of the pollutants seeping into the aquifer every year. In addition, l = 4000 [m] and T₁ = 11 [y]. Moreover, the initial boundary value conditions are given as c(0, t) = 7.96t + 45.6, c(l, t) = 1.75t² + 331.6, c₀(x) = 0.0715x + 45.6, and the additional data at the final year T = 11 are given as

$$c_{final}(x) = 0.1026x + 133.2, \quad 0 \le x \le l. \tag{6.4.2}$$
In concrete computations, the known parameters in the model are: a_L = 1 [m], v = 1 [m/d] = 365 [m/y], n_e = 0.25, λ = 0.05 [y⁻¹]. Transforming the real problem into dimensionless form, we get

$$\frac{\partial C}{\partial T} - \frac{D}{lv}\frac{\partial^2 C}{\partial Z^2} + \frac{\partial C}{\partial Z} + \lambda_1 C = Q(Z), \quad 0 < Z < 1,\ 0 < T < \bar{T}, \tag{6.4.3}$$

where

$$\lambda_1 = \frac{l}{v}\lambda, \quad Q(Z) = \frac{l}{v}\,\frac{q(lZ)}{c_0 n_e}, \quad \bar{T} = \frac{v}{l}T_1. \tag{6.4.4}$$
The initial boundary value conditions and the additional final data are transformed to

$$C(Z, 0) = 1 + \frac{0.0715\,l}{c_0} Z, \quad C(0, T) = 1 + \frac{7.96\,l}{v c_0} T, \quad C(1, T) = \frac{331.6}{c_0} + \frac{1.75\,l^2}{c_0 v^2} T^2,$$

and

$$C_{final}(Z) = \frac{133.2}{c_0} + \frac{0.1026\,l}{c_0} Z, \quad 0 \le Z \le 1. \tag{6.4.5}$$
Choosing the linear polynomial space as the basis function space for the unknowns, the initial iteration as zero, and τ = (1e−2, 1e−4), an optimal source function for a data noise level of 5% is easily worked out as

Q(Z) = 13.9538 + 0.04531Z.
(6.4.6)
Substituting the above source function into the model, we reconstruct the additional data, which are plotted in Figure 6.14 and compared with the actual final observations.
Figure 6.14. Additional data and computationally reconstructed data.
Furthermore, transforming (6.4.6) back to dimensional form by (6.4.4), we get

q(x) = 14.5 + 0.0000118x,
(6.4.7)
which implies that the average magnitude of the acid pollutants seeping into the groundwater aquifer every year is about 15 [mg/L]. This coincides with the report given by the local government and with the inverse analysis result of [4].
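As a quick consistency check of (6.4.4), the following sketch converts the dimensionless source (6.4.6) back to dimensional form with the given parameters; reading c0 = 45.6 mg/L off the initial condition is our assumption:

```python
# q(x) = (v*c0*ne/l) * Q(x/l), the inverted form of (6.4.4)
l, v, ne, c0 = 4000.0, 365.0, 0.25, 45.6

def Q(Z):
    return 13.9538 + 0.04531 * Z      # inverted dimensionless source (6.4.6)

def q(x):
    return (v * c0 * ne / l) * Q(x / l)

print(q(0.0), q(l))                    # ~14.5 at both ends, i.e. about 15 mg/(L*y)
```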
6.4.2 Data reconstruction in an undisturbed soil-column experiment
It is important to characterize mathematically the physical and/or chemical reactions occurring in solute transport processes. An effective way to reveal the transport processes is to perform a soil-column experiment. Consider an actual undisturbed soil-column experiment¹. Some related parameters of the experiment are given in Table 6.12, where l is the length of the column [cm], a_L the dispersivity [cm], ρ_b the bulk density [g/cm³], K_d the adsorption coefficient [cm³/g], θ the volumetric water content [dimensionless], v the average pore water velocity [cm/s], and T₁ the total infiltration time with polluted water [h].

Table 6.12. Basic parameters in the soil-column experiment.

l     a_L   ρ_b    K_d     θ      v         T₁
45    1     1.48   0.095   0.15   3.76e−3   120
By the experiment, breakthrough data at the outflow were obtained which can be utilized as additional data. The problem here is to reconstruct the measured data by an inversion algorithm based on a suitable mathematical model describing solute transport in the column. Denote by c₁, c₂, c₃, and c₄ the concentrations of Ca²⁺, SO₄²⁻, Mg²⁺, and Cl⁻ at time t and space point x in the liquid phase in the column, respectively. By the general advection-dispersion mechanism combined with the analysis of chemical reactions, a transport model for the four kinds of solute ions penetrating through the column can be described as follows, for 0 < x < l and 0 < t < T₁:

$$\begin{cases}
\dfrac{\partial c_1}{\partial t} = D\dfrac{\partial^2 c_1}{\partial x^2} - v\dfrac{\partial c_1}{\partial x} + r_2\exp(-r_1 t) - r_3 c_1 c_2,\\[1ex]
\dfrac{\partial c_2}{\partial t} = D\dfrac{\partial^2 c_2}{\partial x^2} - v\dfrac{\partial c_2}{\partial x} + r_2\exp(-r_1 t) - r_3 c_2 c_1 + r_5\exp(-r_4 t),\\[1ex]
\dfrac{\partial c_3}{\partial t} = D\dfrac{\partial^2 c_3}{\partial x^2} - v\dfrac{\partial c_3}{\partial x} + r_5\exp(-r_4 t) - r_6 c_3,\\[1ex]
\dfrac{\partial c_4}{\partial t} = D\dfrac{\partial^2 c_4}{\partial x^2} - v\dfrac{\partial c_4}{\partial x}.
\end{cases} \tag{6.4.8}$$

It is noticeable that the six parameters rᵢ (i = 1, 2, …, 6), which represent the chemical reactions occurring in the soil column, are unknown. Denote Cᵢ = cᵢ/cᵢ₀ (i = 1, 2, 3, 4), where cᵢ₀ (i = 1, 2, 3, 4) are the known initial concentrations
¹ The experiment was supplied by the Inspecting Station of Geology and Environment in Zibo, Shandong.
of the four ion species in the inflow, and let Z = x/l, T = vt/l, P = vl/D, and aⱼ = rⱼl/v (j = 1, 2, …, 6). We get the dimensionless model

$$\begin{cases}
\dfrac{\partial C_1}{\partial T} = \dfrac{1}{P}\dfrac{\partial^2 C_1}{\partial Z^2} - \dfrac{\partial C_1}{\partial Z} + \dfrac{a_2}{c_{10}}\exp(-a_1 T) - c_{20}\, a_3 C_1 C_2,\\[1ex]
\dfrac{\partial C_2}{\partial T} = \dfrac{1}{P}\dfrac{\partial^2 C_2}{\partial Z^2} - \dfrac{\partial C_2}{\partial Z} + \dfrac{a_2}{c_{20}}\exp(-a_1 T) - c_{10}\, a_3 C_2 C_1 + \dfrac{a_5}{c_{20}}\exp(-a_4 T),\\[1ex]
\dfrac{\partial C_3}{\partial T} = \dfrac{1}{P}\dfrac{\partial^2 C_3}{\partial Z^2} - \dfrac{\partial C_3}{\partial Z} + \dfrac{a_5}{c_{30}}\exp(-a_4 T) - a_6 C_3,\\[1ex]
\dfrac{\partial C_4}{\partial T} = \dfrac{1}{P}\dfrac{\partial^2 C_4}{\partial Z^2} - \dfrac{\partial C_4}{\partial Z}.
\end{cases}\tag{6.4.9}$$

The initial and boundary conditions are given as

$$C_i(Z, 0) = 0, \quad C_i(0, T) = 1, \quad \frac{\partial C_i}{\partial Z}(1, T) = 0, \quad i = 1, 2, 3, 4, \tag{6.4.10}$$

and the additional data are

$$C_i(1, T_k) = \hat{C}_{ik}, \quad i = 1, 2, 3, 4, \quad k = 1, 2, \ldots, K, \tag{6.4.11}$$
where K represents the number of measured breakthrough data. As a result, we encounter an inverse problem of identifying the unknown parameter a = (a₁, a₂, …, a₆): the forward problem (6.4.9)–(6.4.10) with the additional information (6.4.11). In the following, a modified optimal perturbation inversion algorithm is introduced with which the reactive coefficients can be determined and the measured breakthrough data reconstructed successfully. Solving the above inverse problem (6.4.9)–(6.4.11) numerically can be transformed into the following minimization problem:

$$\min_{S_a}\Big\{\max_{i=1,2,3,4}\sum_{k=1}^{K}\big[C_i(1, T_k; a) - \hat{C}_{ik}\big]^2 + \alpha\|a\|_2^2\Big\}, \tag{6.4.12}$$
where S_a denotes an admissible set for the unknown parameter vector; for example, S_a = {a : ‖a‖₂ ≤ E, aⱼ > 0, j = 1, 2, …, 6, E a constant} is a suitable set, which can ensure existence of the parameters. Here α > 0 is the regularization parameter and K is the number of samples given in (6.4.11). In concrete computations, the minimization problem (6.4.12) can be solved by obtaining aⁿ⁺¹ from a given aⁿ by the iteration procedure

aⁿ⁺¹ = aⁿ + δaⁿ, n = 0, 1, …,
(6.4.13)
where δaⁿ = (δa₁ⁿ, δa₂ⁿ, …, δa₆ⁿ) is called the perturbation vector; for each n it is worked out by minimizing the following functional for given aⁿ:

$$F(\delta a^n) = \max_{i=1,2,3,4}\sum_{k=1}^{K}\big[C_i(1, T_k; a^n + \delta a^n) - \hat{C}_{ik}\big]^2 + \alpha\|\delta a^n\|_2^2. \tag{6.4.14}$$
If the best perturbation δaⁿ is obtained by minimizing (6.4.14), then the optimal parameter can be approximated by the iterations (6.4.13) as long as the perturbation satisfies a given precision. The iterative procedure is listed below.

Algorithm 6.4.1 (A modified OPR algorithm).

Step 1 Given the iteration vector aⁿ (n = 0, 1, …), compute the output errors

$$E_i = \sum_{k=1}^{K}\big[C_i(1, T_k; a^n) - \hat{C}_{ik}\big]^2, \quad i = 1, 2, 3, 4,$$
for the four ions respectively; without loss of generality suppose the largest error is E_m, and define an error functional as

F(δaⁿ) = E_m + α‖δaⁿ‖₂².  (6.4.15)
Step 2 By the above expression (6.4.15), and utilizing the ordinary optimal perturbation algorithm, the perturbation vector can be worked out via

δaⁿ = (αI + GᵀG)⁻¹Gᵀ(η − ξ),
(6.4.16)
where G = (g_{kj})_{K×6}, g_{kj} = [C_m(1, T_k; aⁿ + τeⱼ) − C_m(1, T_k; aⁿ)]/τ, with k = 1, …, K, j = 1, …, 6, and eⱼ = (0, …, 1, …, 0) (j = 1, 2, …, 6) the standard basis vectors of R⁶; further, ξ = (C_m(1, T₁; aⁿ), …, C_m(1, T_K; aⁿ))ᵀ, η = (Ĉ_{m1}, …, Ĉ_{mK})ᵀ, and τ is the numerical differentiation step.

Step 3 If the perturbation satisfies the given precision ‖δaⁿ‖₂ ≤ eps, the algorithm terminates and aⁿ is taken as the parameter solution we want to determine; otherwise, obtain aⁿ⁺¹ by the iteration (6.4.13) and go to Step 1.

Now let us consider the solution of the real inverse problem. It is much more complicated to solve a real problem with real data than to do numerical simulations, and real problems always need larger regularization parameters than artificial simulations. Setting the initial iteration a₀ = 0, the convergence precision eps = 1e−5, the regularization parameter α = 0.0392, and the differentiation step τ = 0.1, the six parameters in equation (6.4.9) are worked out by the modified optimal perturbation algorithm as

a_inv = (0.0163, 8.1314, 11.2092, 11.3687, 1.9289, 0.0353).
(6.4.17)
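The structure of Step 2 can be sketched in a few lines of Python; the forward solver `forward`, standing for C_m(1, T_k; ·), is a placeholder assumption, not code from the chapter:

```python
import numpy as np

def perturbation_step(a, C_hat, forward, alpha=0.0392, tau=0.1):
    """One perturbation delta a^n of the modified OPR algorithm, eq. (6.4.16).

    a       : current parameter vector a^n (length 6)
    C_hat   : measured breakthrough data of the worst-fit ion, shape (K,)
    forward : placeholder forward solver, forward(a) -> C_m(1, T_k; a), shape (K,)
    """
    K, J = len(C_hat), len(a)
    xi = forward(a)                       # model output at a^n
    G = np.empty((K, J))
    for j in range(J):                    # finite-difference sensitivity matrix
        e = np.zeros(J); e[j] = 1.0
        G[:, j] = (forward(a + tau * e) - xi) / tau
    # delta a^n = (alpha*I + G^T G)^{-1} G^T (eta - xi)
    return np.linalg.solve(alpha * np.eye(J) + G.T @ G, G.T @ (C_hat - xi))

# Step 3 then iterates a <- a + perturbation_step(...) until ||delta a||_2 <= eps.
```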
Furthermore, by substituting the above inversion parameters into the model (6.4.9)–(6.4.10), the breakthrough data of the four ions are reconstructed and plotted in Figure 6.15, compared with the real measured data.
Figure 6.15. Reconstructed data and real breakthrough data.
Remark 6.4.2. With the inversion coefficients, suitable physical/chemical explanations for the soil-column infiltration experiment can be given by re-solving the forward problem (cf. [8]).
6.5 Conclusions
We conclude this chapter with the following comments:

(1) The optimal perturbation regularization algorithm is an effective approximate method for function reconstruction in solving inverse problems of parameter determination. Through numerical simulations and real inversions we find several factors impacting the algorithm's realization: the approximate space, the regularization parameter, the convergence precision or number of iterations, the numerical differentiation step, the initial iteration, and the computational method for the forward problem. Sometimes the algorithm can be performed successfully without an explicit regularization term (setting the regularization parameter to zero in formula (6.2.8)), especially when accurate data are used or when the algorithm operates in a lower-dimensional approximate space. On the other hand, the initial iteration, the numerical differentiation step, the convergence precision and the numerical method for the forward problem seem to have little impact on the algorithm, whereas the dimension of the approximate space and the regularization parameter play important roles in many cases; the regularization parameter should not be too small in a high-dimensional approximate space.

(2) It is much more complicated to solve a real problem with real data than to do numerical simulations, and real problems always need larger regularization parameters than artificial simulations. When dealing with real
inverse problems, it is helpful for the algorithm's implementation to first transform the real model into dimensionless form; moreover, the convergence precision must be chosen small enough to obtain a stable solution with high accuracy when large regularization parameters are used.

(3) There are five kinds of approximation in the algorithm's realization: finite-dimensional approximation, linearization, numerical differentiation, regularization, and iteration. Convergence analysis and a stability theory for the algorithm are therefore complicated and will be the subject of our subsequent work. Moreover, the choice of a suitable regularization parameter in the algorithm's realization remains troublesome, especially for real problems where the noise level of the data is unknown.
Acknowledgements The project is supported by the National Natural Science Foundation of China (No. 11071148, No. 10926194 and No. 10471080), and the Natural Science Foundation of Shandong Province (No. Y2007A29).
References

[1] A. Amirov and Z. Ustaoglu, On the approximation methods for the solution of a coefficient inverse problem for a transport-like equation, Computer Modeling in Engineering and Sciences, 54, 283-300, 2009.

[2] G. S. Chi and G. S. Li, Numerical inversions for a nonlinear source term in the heat equation by optimal perturbation algorithm, Applied Mathematics and Computation, 216, 2408-2416, 2010.

[3] A. Kirsch, An Introduction to Mathematical Theory of Inverse Problems, New York, Springer-Verlag, 1996.

[4] G. S. Li, Y. J. Tan, J. Cheng and X. Q. Wang, Determining magnitude of groundwater pollution sources by data compatibility analysis, Inverse Problems in Science and Engineering, 14, 287-300, 2006.

[5] G. S. Li, J. Cheng, D. Yao, H. L. Liu and J. J. Liu, One-dimensional equilibrium model and source parameter determination for soil-column experiment, Applied Mathematics and Computation, 190, 1365-1374, 2007.

[6] G. S. Li, J. Q. Liu, X. P. Fan and Y. Ma, A new gradient regularization algorithm for source term inversion in 1D solute transportation with final observations, Applied Mathematics and Computation, 196, 646-660, 2008.

[7] G. S. Li, Y. J. Tan, D. Yao, X. Q. Wang and H. L. Liu, A nonlinear mathematical model for undisturbed soil-column experiment and source parameter identification, Inverse Problems in Science and Engineering, 16, 885-901, 2008.

[8] G. S. Li, D. Yao, Y. Z. Wang and H. Y. Jiang, Numerical inversion of multi-parameters in multi-components reactive solutes transportation in an undisturbed soil-column experiment, Computer Modeling in Engineering and Sciences, 51, 53-72, 2009.

[9] G. S. Li and W. J. Gao, Numerical inversion of diffusion coefficient by optimal perturbation regularization algorithm, Proceedings of ICCASM 2010, 14, 244-248, Taiyuan, 2010.

[10] J. S. Lin, W. B. Chen, J. Cheng and L. F. Wang, A level set method to reconstruct the interface of discontinuity in the conductivity, Science in China (Series A), 52, 29-44, 2009.

[11] L. Ling and T. Takeuchi, Boundary control for inverse Cauchy problems of the Laplace equations, Computer Modeling in Engineering and Sciences, 29, 45-54, 2008.

[12] C.-S. Liu, L.-W. Liu and H.-K. Hong, Highly accurate computation of spatial-dependent heat conductivity and heat capacity in inverse thermal problem, Computer Modeling in Engineering and Sciences, 17, 1-18, 2007.

[13] C.-S. Liu, C.-W. Chang and J.-R. Chang, A new shooting method for solving boundary layer equations in fluid mechanics, Computer Modeling in Engineering and Sciences, 32, 1-16, 2008.

[14] L. Marin, Boundary reconstruction in two-dimensional functionally graded materials using a regularized MFS, Computer Modeling in Engineering and Sciences, 46, 221-254, 2009.

[15] H. Shim, V. T.-T. Ho, S.-Y. Wang and D. A. Tortorelli, Topological shape optimization of electromagnetic problems using level set method and radial basis function, Computer Modeling in Engineering and Sciences, 37, 175-202, 2008.

[16] C. W. Su, Numerical Methods and Applications of Inverse Problems in PDE, Xi'an, Northwestern Polytechnical University Press, 1995.

[17] Y. F. Wang, Computational Methods for Inverse Problems and Their Applications, Beijing, Higher Education Press, 2007.
Author Information

G. S. Li, Institute of Applied Mathematics, College of Sciences, Shandong University of Technology, Zibo 255049, P. R. China. E-mail: [email protected]
Chapter 7

Filtering and Inverse Problems Solving

L. V. Zotov and V. L. Panteleev
Abstract. The inverse problems for discrete and continuous linear systems are considered, together with approaches to their solution based on Moore-Penrose inversion, Panteleev corrective filtering, and Phillips-Tikhonov regularization. The common point of all these approaches is the rejection of improbable solutions based on a priori assumptions, in other words, filtering. Thus, the normal pseudosolution for ill-posed linear systems can be obtained by eliminating the small eigenvalues. In corrective filtering an additional filter is superimposed on the frequency band of noise amplification, and regularization can also be interpreted as filtering of the solution in the band of small absolute values of the operator transfer function. A simple geometrical interpretation is given. An example of excitation reconstruction for the Chandler wobble of the Earth's pole is provided.
7.1 Introduction
Inverse problems arise when it is necessary to recover the causes of phenomena from their observable consequences. In the case of dynamical systems this is the reconstruction of an input signal from the observed output, or the determination of the system structure. Any device can be treated as a dynamical system; therefore, reconstruction of the true view of phenomena from instrumental data is an inverse problem. Any reconstruction of model parameters from observations can also be considered an inverse problem. Besides the distortions introduced by the equipment, any observations are aggravated by errors. The presence of observational noise makes inverse problems significantly different from the direct ones. In most cases they are unstable. Simple inversion of the model equations may be so sensitive to the slightest errors of the observations, or of the model itself, that the solution uncertainty becomes enormous. The complexity of inverse problems lies not so much in finding a solution, but in the fact that many of the possible solutions do not correspond to the original. As in the reconstruction of an architectural monument from its remains, where additional sources of information must be taken into account, solving inverse problems
requires additional (a-priori) information, which reduces the uncertainty about the original object. No mathematical trick can replace the lack of information. Inverse problems can be found everywhere. Examples are the spectrum estimation of random time series, restoration of the gravitational field at height h₀ from its smoothed values observed at a greater height h₁ > h₀, reconstruction of images, of the Earth's internal structure [16], of three-dimensional fields in tomography, mapping of stellar surfaces from luminosity curves, building a model of the Universe from relict background radiation fluctuations and galaxy counts, etc. Jacques Hadamard formulated in 1932 the conditions of well-posedness of a problem: (1) a solution should exist; (2) the solution is unique; (3) the solution depends continuously on the initial data. Inverse problems are most often ambiguous and unstable; they do not meet one or several of these conditions and therefore are called ill-posed. They require the special approaches we will talk about. In classical physics and mathematics the greatest attention is paid to inverse problems for systems of linear algebraic and integral equations [13], [15]. Integral equations can be discretized for numerical solution, turning into systems of linear algebraic equations (SLAE). The classical methods were designed to work primarily with deterministic functions; their extension to random processes took place later and is far from complete. In practice we always face randomness, so we need to learn how to use the theory in this case. Linear algebra techniques are also applicable to stochastic signals, but the work should be done with mathematical expectations and requires special care. The methods of mathematical statistics, developed to meet the challenges of estimating the parameters of distributions, form natural approaches to inverse problems with random data [12].
7.2 SLAE Compatibility
Let us consider a system of linear algebraic equations y = Cx
(7.2.1)
with matrix C of dimensionality (m × n). Expression (7.2.1) corresponds to the transformation C, which maps a vector x from the linear space Lⁿ into a vector y from the linear space Lᵐ. The problem of finding the solution x for given y is posed, which can be treated as an inverse problem. Each equation of the system (7.2.1) represents the equation of a hyperplane. It is possible to find the common points of all hyperplanes only if the system
is consistent. In linear algebra [3], and in generalized form in functional analysis [7], the Fredholm theorem is proved, according to which the system (7.2.1) is consistent if and only if every solution η of the transposed (in the general case, adjoint) homogeneous system Cᵀη = 0 satisfies the condition ηᵀy = 0. Let us recall that the set of x ∈ Lⁿ mapped by C into the null vector of Lᵐ (the solution set of the homogeneous system Cx = 0) is called the kernel of the transformation C, or null-space, and is denoted KerC. This is a linear subspace. Respectively, in the space Lᵐ such vectors η can be found which are mapped by the transposed (adjoint) transformation Cᵀ into the null vector of Lⁿ. This is the set KerCᵀ of solutions of the transposed system. The latter is orthogonal to the range (image) of C, denoted ImC. The geometrical interpretation of the Fredholm theorem is the following: if the vector y has a nonzero projection on the basis of KerCᵀ, then it cannot belong to the range ImC and the system (7.2.1) cannot be solved [2].

Example 7.2.1. The system of one equation ax₁ + bx₂ + cx₃ = y with at least one coefficient not equal to zero puts into correspondence to each vector x = (x₁, x₂, x₃)ᵀ from R³ the value y from R¹ and defines a plane in R³. The kernel KerC is the set of vectors of the plane parallel to the defined one and passing through the origin. The system divides the space R³ into the sum KerC ⊕ ImCᵀ. The range J = ImCᵀ of the adjoint operator is the set of vectors z orthogonal to the plane. Indeed, according to the Fredholm theorem the system Cᵀη = z is consistent if and only if xᵀz = 0, where x is a vector from the set K = KerC of solutions of the homogeneous initial system. So the space R³ is divided into two orthogonal subspaces: K, the vectors in the plane given by the initial equation, and J, the vectors orthogonal to this plane, on which the initial system's solutions have zero projection. The system is consistent for any y, because the kernel of the transposed operator consists only of the zero value of η.
7.3 Conditionality
The system (7.2.1) is called ill-conditioned when small errors in y may cause significant errors in the solution x. Suppose C is a square nonsingular matrix. Consider the perturbed system obtained from (7.2.1) by adding small quantities to the matrix coefficients: y = (C + δC)x. If at small perturbations δC the system can become singular, then the matrix of the system (7.2.1) is called almost singular. In other words, the matrix C is almost singular when in its small neighbourhood there is a singular matrix
C + δC. This is a serious case of ill-conditioning. In the case of a singular C the system has an infinite number of solutions. In practice the matrix C is most often known only approximately. If it is almost singular, it may happen that the accuracy with which it is known is not sufficient to answer the question whether it really is singular or not. In this case the inverse problem is highly unstable. Suppose the vector y is not known exactly, but with an absolute error Δy, i.e. we are given the center y of a neighbourhood with radius Δy where the true value of the vector may lie. In this case the inverse problem can be treated as the search for the prototype of this neighbourhood. In the linear problem the prototype of a spherical neighbourhood is an ellipsoid in Lⁿ. In case of bad conditioning, some of the semi-axes of this ellipsoid will be large, and the corresponding components of x will have very large errors.

Example 7.3.1. The system of equations

a₁x₁ + b₁x₂ = y₁, a₂x₁ + b₂x₂ = y₂,
(7.3.1)
defines two lines on the plane (Figure 7.1). The solution of the SLAE is the point of intersection of these lines. If the vector y is known with error Δy = (Δy₁, Δy₂)ᵀ, the solution will lie in the region bounded by the lines

a₁x₁ + b₁x₂ = y₁ + Δy₁,
a1 x1 + b1 x2 = y1 − Δy1 ,
a2 x1 + b2 x2 = y2 + Δy2 ,
a2 x1 + b2 x2 = y2 − Δy2 .
In case of ill-conditioning of the system the lines are almost parallel, and the uncertainty of the intersection point along one of the diagonals of the parallelogram, which the neighbourhood now is, will be very high. If the uncertainty is
Figure 7.1. The range of permissible values of the problem (7.3.1), better (a) and worse (b) conditioned.
hidden in the matrix of the system, the lines are not only shifted but also rotated, and the neighbourhood becomes a more complex figure. The impression may arise that if the lines are perpendicular, the system is determined best of all. The angle between the lines is indeed associated with the degree of singularity of the matrix. However, since conditionality is a broader concept, one can give examples where the lines are perpendicular but the conditionality is bad. For example, if among the coefficients and free terms there are some of greatly different magnitudes (e.g. a₁ = 10³, b₁ = 0, y₁ = 1, a₂ = 0, b₂ = 10⁻³, y₂ = −10⁻³), the system can be ill-posed even in the case of orthogonal lines. In connection with the above, the question of finding a measure of conditionality arises. It can be shown that the value m(C) = ‖C‖·‖C⁻¹‖, where ‖·‖ denotes some matrix norm, can serve as such a measure. In particular, for the spectral norm¹ the ratio of the greatest to the smallest eigenvalue of the matrix C,

m(C) = λ₁/λₙ,

can be used as the measure of conditionality.
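The last example is easy to check numerically; a minimal sketch with the stated coefficients (Python/NumPy):

```python
import numpy as np

# Perpendicular lines, but coefficients of greatly different magnitude:
C = np.array([[1e3, 0.0],
              [0.0, 1e-3]])
s = np.linalg.svd(C, compute_uv=False)   # singular values, in decreasing order
print(s[0] / s[-1])                      # 1e6: badly conditioned despite orthogonality
print(np.linalg.cond(C))                 # the same measure m(C), built in
```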
7.4 Pseudosolutions
In the general case the matrix C of the system (7.2.1) is not square. When m < n a reasonable solution can be obtained only as an exception. When m ≥ n, in most cases it will not be possible (if the matrix C has more than n linearly independent rows) to satisfy all the equations simultaneously. Do we need to choose a consistent subsystem of equations, solve it, and reject the others? What really is reasonable is to find a point which, possibly, does not satisfy any of the equations (7.2.1) exactly, but which in some sense is close to satisfying every one. Here we meet the notion of a generalized solution, or pseudosolution, minimizing the norm of the residual (y − Cx). The residual can also be considered an element of the solution, which can be interpreted as the search for corrections to y which make the system consistent. Minimization is usually performed in the Euclidean norm, i.e. we search for the minimum of the squared deviation. A necessary condition for the extremum,

d(y − Cx)ᵀ(y − Cx) = −2 dxᵀCᵀ(y − Cx) = 0,  (7.4.1)
¹ The spectral norm of a square positive definite matrix is equal to its greatest eigenvalue.
brings us to the normal system of equations CᵀCx = Cᵀy, which is always consistent (CᵀC is symmetric; if CᵀCη = 0, then ηᵀCᵀCη = (Cη)ᵀCη = 0, so Cη = 0 and the Fredholm theorem condition ηᵀCᵀy = 0 is satisfied). The solution of the normal system is the pseudosolution of (7.2.1). If the matrix CᵀC is not singular and allows inversion, the solution can be written as x = C⁺y. Here we introduce C⁺, the generalized inverse (pseudoinverse) matrix, which satisfies the condition C = CC⁺C. If C has linearly independent columns, it is given by C⁺ = (CᵀC)⁻¹Cᵀ, and if it has linearly independent rows, by C⁺ = Cᵀ(CCᵀ)⁻¹. In either case C⁺ consists of m columns of height n, i.e. it has the same dimensionality as Cᵀ. But if the matrix CᵀC is singular, the solution is not unique. It can be written using the Penrose pseudoinverse matrix² in the form

x = C⁺y + (I − C⁺C)c,
(7.4.2)
where I is the identity matrix of size (n × n), and c is an arbitrary vector. In this expression the first term is a particular solution of the normal system, and the second term is a general solution of the homogeneous normal system CᵀCx = 0. The question arises which solution to select. The normal pseudosolution is the one which has the minimum norm. It is proved that every system of normal equations has a unique normal pseudosolution

x = C⁺y.  (7.4.3)

But how can the pseudoinverse matrix be determined if CᵀC cannot be inverted? In the general case C⁺ can be defined as the matrix whose columns are the normal pseudosolutions of the systems of linear equations

Cx = eᵢ,
(7.4.4)
where eᵢ is the i-th column of the unit matrix of order m.
² The notion of generalized (pseudo) inversion of an operator was introduced by Fredholm in 1903. The pseudoinverse matrix method was developed by Moore in 1920 and generalized by Penrose in 1955.
Example 7.4.1. To the system of two equations x = 1, x = 2 there corresponds the normal system 2x = 3 and the pseudosolution x = 3/2.

Example 7.4.2. The system of one equation αx = β with α ≠ 0 has a pseudosolution coinciding with the solution. If α = 0, then any x gives a residual with norm |β|; the normal pseudosolution is the x with minimal norm, i.e. 0. In the same way the pseudosolution of a system with zero matrix of larger dimensionality can be found.

Example 7.4.3. The system of m equations

yᵢ = kxᵢ + b, i = 1, …, m,
can be represented by m points on the plane with coordinates (xᵢ, yᵢ). If we consider xᵢ and 1 as the coefficients forming the rows of the matrix C, then the search for k and b, the parameters of the line passing as close to all the points as possible in the sense of minimal deviation, is the solution of the inverse problem. Since the system is inconsistent (unless all the points merge into one), such a line is the pseudosolution of an SLAE of the form (7.2.1).
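Example 7.4.3 in code, as a minimal sketch (the sample points are made up for illustration):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.1, 2.9])
C = np.column_stack([x, np.ones_like(x)])       # rows (x_i, 1) of the system matrix
(k, b), *_ = np.linalg.lstsq(C, y, rcond=None)  # normal pseudosolution of y = Cx
print(k, b)                                     # slope and intercept of the fitted line
```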
7.5 Singular Value Decomposition
Let us consider the structure of the pseudosolution and the pseudoinverse matrix in the singular basis. To do this, let us perform the singular value decomposition (SVD) of the transformation matrix [6]: C = USVᵀ.
(7.5.1)
On the diagonal of the matrix S of dimensionality (m × n) the singular values sᵢ of the matrix C stand in decreasing order; the other elements are zeros. The columns of the matrix U of dimensionality (m × m) form the second (column) singular basis in Lᵐ, and the rows of the matrix V of dimensionality (n × n) form the first (row) singular basis in Lⁿ. Both matrices are unitary: UᵀU = I, VᵀV = I [3]. Let us write the SVD for the matrix of the normal system:

CᵀC = (USVᵀ)ᵀUSVᵀ = VΛVᵀ,
(7.5.2)
where Λ = S² is a diagonal matrix. Along its main diagonal the eigenvalues of the matrix of the normal system are located in decreasing order, λ₁ ≥ λ₂ ≥ ⋯ ≥ λₙ. Each of them is equal to the square of the corresponding singular number of C, λᵢ = sᵢ². These numbers are nonnegative for the self-adjoint positive semidefinite matrix (7.5.2) and characterize the degree of conditionality of the matrix. Their full set is called the spectrum of the matrix, and the expression
(7.5.2) is nothing more than a coordinate transformation (rotation) bringing the matrix to diagonal form. It means that a basis can be selected in which the normal system takes the diagonal form

Λp = Sᵀθ,
(7.5.3)
where p = Vᵀx, θ = Uᵀy. In this basis of principal components, which is sometimes called the singular basis or the Karhunen-Loeve basis, it is easy to find the pseudosolution. Once again let us emphasize that the notions of spectrum and eigenvalues are defined for square matrices; for non-square matrices singular numbers are defined. Below we work both with the eigenvalues λᵢ of the normal system matrix (7.5.2) and with the singular numbers sᵢ = √λᵢ of the rectangular matrix C of the initial system (7.2.1). In the case of nonsingular Λ the pseudoinverse matrix S⁺ has dimensionality (n × m), with all elements equal to zero except the diagonal elements of the square block (of size (n × n), if m ≥ n) located in the upper left part of the matrix; along its diagonal the numbers 1/sᵢ are located. The measure of conditionality of the problem is the ratio m(S) = s₁/sₙ. If the spectrum of the normal matrix has a big range, the problem becomes unstable. One method of solution in this case is cutting out the small singular numbers: some value ε > 0 is chosen and all singular numbers |sᵢ| < ε in the matrix S are set to zero. When rg(Λ) = k < n, the normal system becomes singular: one or several eigenvalues become zero and the measure of conditionality turns into infinity. In this case the matrix of the normal system can be written in the form

Λ = diag(λ₁, λ₂, …, λₖ, 0, …, 0).

It does not allow direct inversion, and the columns of the pseudoinverse matrix should be found as the normal pseudosolutions of the systems (7.4.4). The first k columns of S⁺ have the non-zero elements 1/sᵢ, i = 1, …, k, standing on the main diagonal of the matrix; the remaining columns are zeros. Whatever they are, they do not change the residual, but the normal pseudosolution requires them to have minimal norm. So the pseudoinverse of the singular diagonal system is the matrix of dimensionality (n × m) with non-zero main diagonal of the square block of size (k × k):

$$S^+ = \begin{pmatrix} s_1^{-1} & & & \\ & \ddots & & 0 \\ & & s_k^{-1} & \\ & 0 & & 0 \end{pmatrix}. \tag{7.5.4}$$
Using the obtained result and the SVD, the pseudoinverse matrix of the original system (7.2.1) can be written in the form C⁺ = VS⁺Uᵀ.
(7.5.5)
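A minimal sketch of (7.5.4)-(7.5.5), the pseudoinverse with small singular values cut out (the test matrix is made up):

```python
import numpy as np

def pinv_truncated(C, eps):
    """Moore-Penrose pseudoinverse C^+ = V S^+ U^T with |s_i| < eps zeroed."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    s_inv = np.where(s > eps, 1.0 / s, 0.0)     # S^+ as in (7.5.4)
    return Vt.T @ np.diag(s_inv) @ U.T

C = np.array([[1.0, 1.0],
              [1.0, 1.0001]])                   # almost singular matrix
y = np.array([2.0, 2.0001])
print(pinv_truncated(C, 1e-3) @ y)              # stable normal pseudosolution
```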
The so-called second form of the singular decomposition, C = QDPᵀ, can also be used; it comes from the original (7.5.1) by taking the first k columns of U, the first k rows of V, and the non-zero block of the matrix S of size k × k:

$$C = USV^T = U (I_{km})^T D\, I_{kn} V^T = \big(I_{km} U^T\big)^T D \big(I_{kn} V^T\big) = QDP^T,$$

where the matrix I_{kn} represents a block of the identity matrix of size k × n. Multiplication of some matrix from the left by I_{kn}, if it is defined, leaves k rows of the matrix, and multiplication from the right adds n − k zero columns. In this case the pseudoinverse matrix D⁺ is square and the solution is written in the form

$$C^+ = P D^+ Q^T, \quad \text{with } D^+ = \mathrm{diag}\Big(\frac{1}{s_1}, \frac{1}{s_2}, \ldots, \frac{1}{s_k}\Big).$$

For the normal matrix (7.5.2) the pseudoinversion is written in the form

$$(C^T C)^+ = V \Lambda^+ V^T = V S^+ (S^+)^T V^T.$$

7.6 Geometry of Pseudosolution
Let us give the geometric interpretation of the pseudosolution. It can be shown that the operation C⁺Cx is a projection of the vector x onto the subspace composed of all linear combinations of the form Cᵀy, Figure 7.2.
Figure 7.2. Estimation as the projection on the range of the adjoint operator, orthogonal to the kernel of the original operator, to which the discrepancy belongs.
Indeed, taking into account the explanation of the Fredholm theorem (Example 7.2.1), one can understand that the space Lⁿ is split by the transformation C : Lⁿ → Lᵐ into two orthogonal subspaces: the null-space K = KerC and the range of the adjoint operator J = ImCᵀ, the subset of linear combinations Cᵀy for any y. As the basis in the space Lⁿ the first singular basis can be chosen. Then the basis of the subspace K consists of the rows of the matrix V corresponding to zero eigenvalues, and that of the subspace J of the rows corresponding to nonzero ones. Any vector of Lⁿ can be represented as the sum of two components: x = x₀ + x₁, x₀ ∈ J, x₁ ∈ K. Since Cx₁ = 0, the image of x in Lᵐ is η = Cx₀. The operation C⁺η gives the prototype in Lⁿ which has minimal norm, i.e. all its components³ connected with K are equal to zero. Thus C⁺Cx = x₀ is a projection⁴ of x on J. Hence the first summand in (7.4.2) represents the pseudosolution of (7.2.1), the projection of x on J, and the second summand (I − C⁺C)c for any vector c is the projection of the latter on K. In the normal pseudosolution the second term of (7.4.2) is set to zero; thus it does not include vectors which are combinations of basis vectors of the null-space. Cutting off small eigenvalues in the calculation of the pseudosolution of an ill-posed problem means zeroing all the components of x in the first singular basis whose uncertainty is maximal. This technique is known as Moore-Penrose inversion.
7.7 Inverse Problems for the Discrete Models of Observations
Let us consider the generalized discrete model of observations

z = Hx + u, E[u] = a, cov(u) = Q,
(7.7.1)
where H is the transformation matrix of dimensionality (m × n), and u is the observational noise with known mean a and covariance matrix cov(u) = E[(u − a)(u − a)ᵀ]. In the general case the matrix Q is non-diagonal; it becomes diagonal when the errors of observations are independent and there are no hidden systematic errors in the data not taken into account in the model. The inverse problem consists in the determination of the n components of the vector x from the m-dimensional vector of observations z. It can be considered as the problem of solving the SLAE z ± Δz = Hx, where Δz = u is the observational noise, which is
³ If the minimum-norm condition is not imposed, the components of the prototype in K, generally speaking, can be arbitrary.
⁴ One of the basic properties of the projection operator is that re-projection does not change the result: Proj(Proj(x)) = Proj(x).
the sequence of random vector realizations with natural fluctuations about its mean a. Let us use the generalized least squares method (LSM) [4] to estimate x̄ in the model (7.7.1) when the covariance matrix Q is full. The problem of minimizing the weighted sum of squared deviations of the computed values from the observed ones is to be solved:

(z − z̄)ᵀQ⁻¹(z − z̄) → min,
(7.7.2)
The optimal estimate is given by the expression

x̄ = (HᵀQ⁻¹H)⁻¹HᵀQ⁻¹(z − a),

and its covariance matrix can be calculated as cov(x̄) = (HᵀQ⁻¹H)⁻¹. The matrix of the normal system of equations, F = HᵀQ⁻¹H, is sometimes called the Fisher information matrix [12]. The model (7.7.1) can be brought to the standard model with zero mean and unit covariance matrix,

y = Cx + n, E[n] = 0, cov(n) = I,
(7.7.3)
by the variable transformation of the form

$$y = Q^{-1/2}(z - a), \quad n = Q^{-1/2}(u - a), \quad C = Q^{-1/2}H. \tag{7.7.4}$$
Extraction of the root of the covariance matrix Q is possible because it is positive definite and nonsingular. Minimization of the sum of squared deviations leads in the standard model [9], as in (7.4.1), to the expression

Cᵀ(y − Cx̄) = 0,
(7.7.5)
wherefrom the normal system of equations and the well-known solution follow:

x̄ = (CᵀC)⁻¹Cᵀy,
cov(x̄) = F⁻¹ = (CᵀC)⁻¹.
(7.7.6)
Indeed, the application of the zero relation (7.7.5) to any object x can be written in the form

(Cx)ᵀ(y − Cx̄) = 0.  (7.7.7)
This expression represents nothing else but an orthogonality condition between the discrepancy (y − Cx̄) and the model. The discrepancy belongs to the kernel of the adjoint operator KerCᵀ (7.7.5), which is orthogonal to the range ImC of the operator C. The LS-estimate is the projection of the observations y on the class of linear combinations Cx; the distance from y to Cx̄ is minimal. The orthogonality condition brought us earlier to the Moore-Penrose inversion, Figure 7.2. If the noise in the model (7.7.1) is white and all components have equal precision, i.e. all uᵢ are mutually independent and have the same variance σ², then the covariance matrix has the diagonal form Q = σ²I, and the variable transformation (7.7.4) can be written as

y = (z − a)/σ,
n = (u − a)/σ,
C = H/σ,
In this case the Fisher matrix and the covariance matrix take the form

F = HᵀH/σ², cov(x̄) = σ²(HᵀH)⁻¹.
(7.7.8)
An interesting approach to solving the problem when errors are expected both in the observations and in the data matrix is given by the total least squares (TLS) method [11].
7.8 The Model in Spectral Domain
Let us transform the standard model (7.7.3) into the spectral band of principal components (7.5.3). To do this we multiply the equation by Uᵀ and change variables: UᵀCx = UᵀUSVᵀx = Sp, with

θ = Uᵀy, ζ = Uᵀn, p = Vᵀx.

Then we obtain the system of equations

θ = Sp + ζ, E[ζ] = 0, cov(ζ) = I,
(7.8.1)
where θ is the decomposition of y over the principal components, and the pᵢ are the principal components (canonical coordinates) of the object x = Vp, ordered according to the precision of reconstruction. The vectors vᵢ (rows of V) are also called empirical orthogonal functions. The LS-estimate in the spectral band and its covariance matrix in the nonsingular case take the form (see (7.5.3))

p̄ = Λ⁻¹Sᵀθ = S⁻¹θ, cov(p̄) = Λ⁻¹.
(7.8.2)
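A minimal sketch of the estimate (7.8.2) in the singular basis (the test data are made up):

```python
import numpy as np

C = np.array([[2.0, 0.0],
              [0.0, 0.5],
              [0.0, 0.0]])
x_true = np.array([1.0, -1.0])
y = C @ x_true                      # noise-free observations for the check
U, s, Vt = np.linalg.svd(C, full_matrices=False)
theta = U.T @ y                     # decomposition of y over principal components
p = theta / s                       # p_bar = S^{-1} theta, cov(p_bar) = diag(1/s_i^2)
print(Vt.T @ p)                     # back to the object: recovers x_true
```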
When the problem is ill-posed, the inverse problem of finding the source vector x becomes very unstable. The components of the estimate x̄ may have huge dispersion and be highly correlated. In this case the methods of pseudosolution search described above become useful; the normal pseudosolution can be obtained in the form (7.4.3). The issue of finding and excluding from the system those components that make it unsolvable or introduce the largest uncertainty into the solution is an issue of filtering. It can be solved by finding a singular basis and setting to zero the components which bring most of the uncertainty into the solution, i.e. the components of the solution in the basis of K. The range of acceptable estimates (RAE), as the prototype of the neighbourhood of the observations y (cf. Example 7.2.1), is ellipsoidal in the case of Gaussian noise. The axes of the coordinate system chosen in (7.8.1) are parallel to the axes of the RAE ellipsoid; the components of p are orthogonal and do not correlate with each other. In the Moore-Penrose pseudoinversion method the estimate is obtained by multiplying the observations by the pseudoinverse matrix (7.5.4), whose decomposition contains 1/sᵢ. Therefore in the system of principal components the smallest singular numbers correspond to the largest dispersions, the lengths of the RAE ellipsoid axes. Truncation of small singular numbers reduces the uncertainty of the inverse solution [12]. It is easy to see that this is nothing more than filtering with a rectangular window in the spectral domain, namely the simple removal of small eigenvalues and the associated components. Smoother filtering methods can be found; an example of a Wiener filter of principal components is given in [12].
7.9 Regularization of Ill-posed Systems
Another way to overcome the ill-posedness mentioned in the previous section is to add a small correction αIₙ to the Fisher matrix; the matrix of reconstruction in (7.7.6) then takes the form

(F + αIₙ)⁻¹Cᵀ.  (7.9.1)

This is the so-called regularized solution, where α is a small regularization parameter which significantly affects the small eigenvalues of the matrix F + αIₙ. The regularization method will be discussed later.
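A minimal sketch of the regularized reconstruction (7.9.1), applied to the same made-up almost singular system as above:

```python
import numpy as np

def regularized_solution(C, y, alpha):
    """x_alpha = (F + alpha*I)^{-1} C^T y with F = C^T C, eq. (7.9.1)."""
    n = C.shape[1]
    F = C.T @ C
    return np.linalg.solve(F + alpha * np.eye(n), C.T @ y)

C = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
y = np.array([2.0, 2.0001])
for alpha in (1e-8, 1e-4, 1e-2):    # growing alpha damps the unstable components
    print(alpha, regularized_solution(C, y, alpha))
```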
7.10 General Remarks, the Dilemma of Bias and Dispersion
The inverse problem is properly posed when the following are established:

(1) the data model;
(2) a-priori information about the object;
(3) the meaning put into the notion of the inverse solution;
(4) the concept of the solution quality.

To satisfy the first item it is enough to formulate the direct problem. The model should be sufficiently complete. Thus, an example of a not quite correct approach is subtracting, before processing, a background obtained from parts of an astronomical image distant from the object of study; it is more reasonable to include the background model in the estimation procedure. The condition x ≥ 0 (the luminance cannot be negative) for a photo or CCD image is an example of a-priori information. A-priori information is often difficult to formalize, and the observer sometimes overestimates the amount available. It is better not to allow the illusion that more information can be extracted from observations than they really contain. Dealing with inverse problems it is useful to follow Occam's principle: do not complicate the entity without necessity. Of two solutions, a simple and a complex one in equally good agreement with the observations, it is reasonable to choose the simple one. Solving an inverse problem is similar to detective work: the search area is narrowed by indicating the group of people the offender belongs to and by establishing preferences within this group. Invalid a-priori information usually biases the estimate in a wrong direction. The fourth item requires a definition of what is important for us in the solution. In statistics three estimation criteria were introduced by Fisher:

• unbiasedness, the absence of systematic error;
• consistency, i.e. increase of accuracy as the number of samples grows;
• efficiency, the most powerful property, implying achievement of the smallest possible dispersion in the selected class of estimates.

If the estimation bias b = x − x̄ tends to zero as the number of samples increases, one speaks about asymptotic unbiasedness. If the scattering of an estimate does not exceed the scattering of any other estimate in its class, it is efficient. For different tasks different properties are important. For example, LSM was developed by Gauss and Legendre for the determination of orbit parameters. Generally speaking, in the standard model (7.7.3) the parameter x is supposed to be nonrandom; the observational errors are random, and for them statistical independence and zero mean are supposed. The distribution of the errors in the general case can be unknown and non-Gaussian. It is proved that in the case of linearly independent columns of the system matrix the LS-estimate is unbiased and efficient in the class of linear estimates. If it is known that the distribution of errors is Gaussian, the LS-estimate becomes efficient in the class of all linear and nonlinear estimates. The columns of the system matrix contain the values of the basis functions.
Everything that does not fit into the model, i.e. has zero projection on the basis functions, is discarded. Thus, by establishing the LSM model and performing estimation, we implicitly perform filtering. An important feature of inverse problem solutions is reflected by the dilemma of bias and variance. Let us consider it on the example of the LS-estimate. The proximity of the estimate x̄ to the true value can be described by the scattering matrix Ω = E[(x − x̄)(x − x̄)ᵀ], whose trace
$$\mathrm{tr}(\Omega) = \sum_{k=1}^{n} E\big[(x_k - \bar{x}_k)^2\big]$$
also characterizes the deviation and is nothing else than the sum of the mean squared deviations of the components of x̄. According to the Gauss-Markov theorem the LS-estimate has the smallest variance in the class of linear unbiased estimates. Hence the scattering matrix can be written as Ω = cov(x̄) = E[(x − x̄)(x − x̄)ᵀ] = F⁻¹. Adding and subtracting the mean E[x̄] of the estimate, let us rewrite Ω in the following way:

$$\Omega = E\big[(x - E\bar{x} + E\bar{x} - \bar{x})(x - E\bar{x} + E\bar{x} - \bar{x})^T\big] = (x - E\bar{x})(x - E\bar{x})^T + E\big[(\bar{x} - E\bar{x})(\bar{x} - E\bar{x})^T\big], \tag{7.10.1}$$
where the non-correlatedness of the bias b = E[x̄] − x and the fluctuation (x̄ − E[x̄]) is taken into account. So we come to the expression

Ω = bbᵀ + D,
(7.10.2)
where D = E[(x̄ − E[x̄])(x̄ − E[x̄])ᵀ] is the dispersion matrix of the estimate x̄ and b is its bias. The obtained expression reflects the dilemma of bias and dispersion, in consequence of which additional filtering of the least-squares estimate can reduce the overall dispersion at the price of sacrificing unbiasedness. This applies to all kinds of inverse problem solutions. An example is the estimation of the power spectral density (PSD), where use of the biased estimate of the autocorrelation function (ACF) leads to a reduced spectrum error at the price of a bias. Introduction of the regularizing term in (7.9.1), or rejection of small eigenvalues in the pseudoinversion (7.5.5), also introduces bias into the estimate while reducing its dispersion. One should always clearly understand what the base model is, where the random parameters are, and what the a-priori information is. If the matrix in the model
(7.7.3) links random variables, it is a regression problem. Additional information incorporated into the model influences the mathematical expectation of the estimate. An unbiased estimate obtained under such a model will appear shifted with respect to the estimate in a model that does not include that additional information, though the first would probably have a smaller dispersion [11]. Estimation of the inverse problem error is, generally speaking, a difficult task, since it is hard to say what error, if any, is introduced with the additional information. Even if we can estimate the scattering (variance) of the obtained estimate within the range it was chosen from, it is unlikely to capture the possible error associated with the constraints on the range of values introduced by regularization, or by cutting off the smallest singular values in Penrose pseudoinversion.
7.11 Models, Based on the Integral Equations
Let us proceed to a review of models based on integral equations. Consider an example where the observed signal y is formed by a linear filter in the presence of observational noise. The model of observations, under the assumption of input signal stationarity, can be written as

$$y(\xi) = \int_a^b h(\xi - \xi')\,x(\xi')\,d\xi' + u(\xi), \tag{7.11.1}$$
where h is a convolution kernel and ξ denotes the coordinates of the problem. The inverse problem of determining x from the observations (7.11.1) is known as deconvolution⁵. From our point of view more appropriate names are inverse filtering or signal reconstruction. Now let us move to the spectral domain, supposing stationarity of the signal and noise:

ŷ(ω) = W(ω)x̂(ω) + û(ω).  (7.11.2)

Here and below the hat ·̂ denotes the Fourier transform and W(ω) = ĥ is the transfer function, which specifies the eigenvalues of the integral transformation of convolution type [18]. The range of frequencies where the absolute value of the transfer function is small corresponds to the frequency band where the inverse amplitude response |W⁻¹| is large. In this band the components of the inverse solution acquire the largest uncertainty, because in real applications a sufficient
⁵ In the general case the Fredholm integral equation with a non-stationary kernel h(ξ, ξ′), or the Volterra equation (with b = ξ > a) [7], is considered. We restrict our consideration to the convolution.
part of the noise spectral density û(ω) is concentrated in this band and is amplified by the inverse operator. To reduce the uncertainty in this area, filtering is required.
7.12 Panteleev Corrective Filtering
Let us suppose that the spectral densities of the useful signal x̂(ω) and the noise û(ω) exist and lie in different frequency bands, and that a filter with transfer function W_flt(ω) and zero phase response can be used to separate the first term in (7.11.2) from the second. Then to solve the inverse problem the transformation

x̂(ω) = W_corr(ω)ŷ(ω)  (7.12.1)

can be used, where the operator

$$W_{corr}(\omega) = \frac{W_{flt}(\omega)}{W(\omega)}$$
will be called corrective, i.e. simultaneously filtering and inverting. Correction here can be interpreted as the addition of filtering to the inversion, resulting in a reduction of the amplitude response of the inverse transformation in the band corresponding to the largest uncertainty of the inverse solution. So, by using a-priori information about the properties of the signal and noise in the design of W_flt, an uncontrolled increase of the observational noise and its transfer to the inverse problem solution can be avoided. Of course, those frequency components that are completely suppressed cannot be restored; an important but difficult task is to restore the components suppressed to the level of the noise. Within the approach (7.12.1) the operator

$$W_{corr}(\omega) = \frac{W^*(\omega)}{W^*(\omega)W(\omega) + \hat{u}(\omega)/\hat{x}(\omega)}$$

can be used;
here and after the conjugation is denoted by star ∗ . This operator approaches to the inverse operator W−1 (ω) where the signal to noise PSD ratio tends to infinity. If for all frequencies the signal to noise ratio is constant and equal 1/α, we come to the expression Wcorr (ω) =
W∗ (ω) , W∗ (ω)W(ω) + αI
(7.12.2)
which can be obtained within the framework of regularization theory (see the next section).

Example 7.12.1. Corrective filtering was suggested by V. L. Panteleev for the reconstruction of the gravity signal from gravity measurements at sea [10].
Let the observational model (gravimeter) be given by the first-order differential equation
$$T\dot y(t) + y(t) = x(t) + u(t),$$
where T is the time constant, y(t) is the measurement, and the right-hand side is the input signal.⁶ Let the frequency bands of the useful signal x(t) and the noise u(t) be different and separable by the Panteleev filter
$$h(t) = \frac{\omega_0}{2\sqrt 2}\, e^{-\frac{\omega_0}{\sqrt 2}|t|} \left(\cos\frac{\omega_0 t}{\sqrt 2} + \sin\frac{\omega_0 |t|}{\sqrt 2}\right) \qquad (7.12.3)$$
with the transfer function, which introduces no phase distortion,
$$W(\omega) = \frac{\omega_0^4}{\omega^4 + \omega_0^4}.$$
Then the solution of the inverse problem for the useful signal x(t) can be obtained by filtering the observations (convolution) with the Panteleev corrective window
$$h_{\mathrm{corr}}(t) = \frac{\omega_0}{2\sqrt 2}\, e^{-\frac{\omega_0}{\sqrt 2}|t|} \left(\cos\frac{\omega_0 t}{\sqrt 2} + \sin\frac{\omega_0 |t|}{\sqrt 2} - 2T\frac{\omega_0}{\sqrt 2}\sin\frac{\omega_0 t}{\sqrt 2}\right). \qquad (7.12.4)$$
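A minimal numerical sketch of this corrective filtering follows; the time constant T, the cutoff ω0, the sampling step and the signal/noise models are illustrative assumptions. The window simultaneously suppresses the high-frequency band and compensates the instrument's first-order lag, which is exactly the correction (7.12.1):

```python
import numpy as np

T, w0, dt = 10.0, 0.05, 1.0        # assumed time constant (s), cutoff (rad/s), step (s)
t = np.arange(-400.0, 400.0 + dt, dt)
a = w0 / np.sqrt(2.0)

# Panteleev corrective window (7.12.4): band-limiting filter plus inversion
# of the gravimeter equation T*dy/dt + y = x + u.
h_corr = (a / 2.0) * np.exp(-a * np.abs(t)) * (
    np.cos(a * t) + np.sin(a * np.abs(t)) - 2.0 * T * a * np.sin(a * t)
)

# Simulated record, integrated by explicit Euler.
rng = np.random.default_rng(1)
tt = np.arange(0.0, 4000.0, dt)
x = np.sin(2.0 * np.pi * 0.002 * tt)       # slow useful signal (inside the passband)
u = 0.5 * rng.standard_normal(tt.size)     # broadband noise
y = np.zeros_like(tt)
for k in range(tt.size - 1):
    y[k + 1] = y[k] + dt * (x[k] + u[k] - y[k]) / T

x_rec = dt * np.convolve(y, h_corr, mode="same")   # corrective filtering (7.12.1)
```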
7.13 Phillips-Tikhonov Regularization
In 1963 A. N. Tikhonov⁷ proposed the regularization algorithm⁸ for solving ill-posed problems [13]. Independently, a similar idea was proposed in 1962 by D. L. Phillips, and this method of solving inverse problems was named Phillips-Tikhonov regularization. The essence of the approach is to stabilize the solution through an additional regularization term, which narrows the class of inverse solutions possible under the uncertainty of the model and noise. The method is applicable to problems where it is a priori known that the solution belongs to a closed set (compact) and has some smoothness properties. In the regularization method, to obtain an inverse solution $\bar x$ satisfying the a-priori conditions and delivering a minimum to the squared deviation of the estimate $\bar y = y(\bar x)$ from the observed values, the Tikhonov functional is to be minimized:
$$E(x) = E_s(y(x)) + \alpha E_r(g(x)) \to \min. \qquad (7.13.1)$$
6 Though the noise u(t) in this model enters as an input signal rather than as observational noise, the Panteleev corrective scheme (7.12.1) is applicable.
7 A biography of A. N. Tikhonov, who in 1970 founded the Faculty of Computational Mathematics and Cybernetics of Moscow State University, is presented in [14].
8 The notion of regularization means improving the properties of a solution. In analysis it also means convexification [1].
The first term in the functional corresponds to the standard deviation
$$E_s(\bar y) = \frac{1}{2}\|y - \bar y\|^2,$$
where $\|\cdot\|$ is the distance in the given metric (norm of the difference), which can be written in the form of a scalar product. The second term in (7.13.1) is a stabilizer $E_r$ multiplied by the regularization parameter $\alpha$. This is a kind of penalty for the complexity of the model: the positive value $\alpha$ regulates the payment for deviation from the a-priori conditions, and the function g(x) defines these a-priori conditions. Tikhonov regularization can be interpreted as conditional optimization with the additional condition $E_r \to \min$ and Lagrange parameter $\alpha$. If the function g(x) is required to be smooth in a certain sense, the stabilizer can be written as
$$E_r(g(x)) = \frac{1}{2}\|F_r(D)g(x)\|^2,$$
where $F_r(D)$ is a certain differential operator [5]. Let the data be linear and given by the operator A, which can be a matrix (7.7.1) or an integral operator (7.11.1):
$$y = Ax + u.$$
The argument x can be a vector of parameters (matrix case) or a vector function (integral case). Solving the inverse problem requires finding the solution $\bar x_\alpha$ that is optimal in the Tikhonov sense, together with the estimate $\bar y_\alpha = A\bar x_\alpha$ as a function of this solution. To do this, it is necessary to find the minimum of (7.13.1) as a functional of x. Without loss of generality one can write the standard-error component in the form
$$E_s(x) = ([y - Ax] \cdot [y - Ax]),$$
where the brackets $(\cdot)$ denote a scalar product. For example, in the space $L_2^C(\mathbb{R})$ (square-integrable functions) it takes the form
$$E_s(x) = \int_{-\infty}^{\infty} [y(\xi) - Ax(\xi)]^* \cdot [y(\xi) - Ax(\xi)]\,d\xi,$$
and in a linear vector space it takes the form
$$E_s(x) = (y - Ax)^T (y - Ax). \qquad (7.13.2)$$
For random signals, the operation of mathematical expectation should be used. Let us write the stabilizer in the form of a scalar product
$$E_r(x) = (F_r(D)(x - x_\infty) \cdot F_r(D)(x - x_\infty)),$$
where $x_\infty$ is an a-priori model, deviation from which is being minimized, and $F_r(D)$ is some differential operator. The notation $x_\infty$ is used because when $\alpha \to \infty$ the solution of the problem is determined only by the regularization term, as if the Tikhonov functional had no standard-error term. On the contrary, when $\alpha \to 0$, the influence of the stabilizer on the solution becomes negligibly small. A-priori information about the solution may be reduced to assumptions about the class of functions it belongs to. Thus, the search for the minimal-norm solution that belongs to $L_2$ and satisfies the observations y leads to the stabilizer $E_r = \|x\|^2$. The search for the minimal-norm solution from the Sobolev space $W_2^1$ (functions with square-integrable first derivatives) leads to the stabilizer $E_r = \|x\|^2 + \|\dot x\|^2$. Let us perform the calculations for the example of a function from $L_2^C(\mathbb{R})$. The stabilizer for this space has the form
$$E_r(x) = \int_{-\infty}^{\infty} [F_r(D)(x(\xi) - x_\infty(\xi))]^* \cdot [F_r(D)(x(\xi) - x_\infty(\xi))]\,d\xi.$$
To find the optimal function $\bar x$, the Tikhonov functional should be minimized using generalized derivatives [7]. Vanishing of the first variation of the Tikhonov functional is a necessary condition, and positivity of the second variation is sufficient to ensure that the solution $\bar x_\alpha$ delivers the minimum to the functional (7.13.1):
$$\bar x_\alpha = \arg\min\{E(x)\}.$$
Let us calculate the first variation of the Tikhonov functional (7.13.1), compiled of the relevant terms,
$$\left.\frac{dE(x + \epsilon\,\delta x)}{d\epsilon}\right|_{\epsilon=0} = -\int_{-\infty}^{\infty} A^*(y - Ax)\,d\xi + \alpha\int_{-\infty}^{\infty} F_r^*(D)F_r(D)(x - x_\infty)\,d\xi,$$
and set it to zero:
$$\int_{-\infty}^{\infty} \left[\alpha F_r^*(D)F_r(D)(x - x_\infty) - A^*(y - Ax)\right] d\xi = 0.$$
Taking into account that $\alpha$ is an arbitrary number from the interval $(0, \infty)$ and that the integrand must vanish for all $\xi$, we obtain the minimality condition
$$\alpha F_r^*(D)F_r(D)x - \alpha F_r^*(D)F_r(D)x_\infty = A^* y - A^* A x.$$
It gives the expression for the optimal regularized estimate
$$[A^* A + \alpha F_r^*(D)F_r(D)]\,\bar x_\alpha = A^* y + \alpha F_r^*(D)F_r(D)\,x_\infty. \qquad (7.13.3)$$
The final solution can be written in operator form as
$$\bar x_\alpha = \frac{A^* y + \alpha F_r^*(D)F_r(D)\,x_\infty}{A^* A + \alpha F_r^*(D)F_r(D)}. \qquad (7.13.4)$$
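In the finite-dimensional case, where A and $F_r(D)$ become matrices and conjugation becomes transposition, (7.13.4) turns into a single linear solve. A minimal sketch, in which the integration operator, the test signal and the parameter values are illustrative assumptions:

```python
import numpy as np

def tikhonov_estimate(A, y, alpha, F=None, x_inf=None):
    """Solve (A^T A + alpha F^T F) x = A^T y + alpha F^T F x_inf, cf. (7.13.4)."""
    n = A.shape[1]
    F = np.eye(n) if F is None else F       # F = I corresponds to the stabilizer ||x||^2
    x_inf = np.zeros(n) if x_inf is None else x_inf
    FtF = F.T @ F
    return np.linalg.solve(A.T @ A + alpha * FtF, A.T @ y + alpha * FtF @ x_inf)

# Illustrative use: regularized differentiation (inversion of cumulative sums).
rng = np.random.default_rng(0)
n, dt = 100, 0.01
A = np.tril(np.ones((n, n))) * dt           # discrete integration operator
x_true = np.sin(np.linspace(0.0, 3.0, n))
y = A @ x_true + 1e-4 * rng.standard_normal(n)
F = np.diff(np.eye(n), axis=0) / dt         # first-difference operator: penalizes dx/dt
x_bar = tikhonov_estimate(A, y, alpha=1e-6, F=F)
```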
In the case when $A^* A$ is a non-singular linear operator, which admits an inverse representation in the form of the inverse transfer function⁹ or a differential operator $F(D) = A^{-1}$, expression (7.13.3) can be written in the form
$$L(D)\bar x_\alpha = \varphi_\alpha(y), \qquad (7.13.5)$$
with the notations
$$L(D) = 1 + \alpha F^*(D)F(D)F_r^*(D)F_r(D), \qquad \varphi_\alpha(y) = F(D)y + \alpha F^*(D)F(D)F_r^*(D)F_r(D)\,x_\infty, \qquad (7.13.6)$$
obtained after multiplication of (7.13.3) by $F^*(D)F(D)$. The solution of the differential equation (7.13.5) can be written in the form
$$x_\alpha(y) = \int_{-\infty}^{\infty} G(y, y')\,\varphi_\alpha(y')\,dy', \qquad (7.13.7)$$
where the Green function G can be determined from the condition $L(D)G(y, y') = \delta(y - y')$. Equation (7.13.7) is a linear filter and in the stationary case becomes a convolution equation. Thus, for the case of an integral operator A, solving the inverse problem reduces to the solution of an integral equation. The regularization method results in the integro-differential equation (7.13.4). If we allow the representation $A = 1/F(D)$, then from (7.13.5) it can be seen that, besides the inverse operation (differentiation of observations) $F(D)y$ on the right side of (7.13.5), the regularization method leads to the additional terms (7.13.6) and to the integration (7.13.7), i.e. smoothing, of the solution. So the solution can be interpreted as inversion of the operator A together with additional filtering of the observations. By analogy with (7.12.1), such an operation can be called a regularizing filter. Calculations for the case of a linear vector space lead to an expression essentially no different from (7.13.4); conjugation in this case is equivalent to transposition (7.13.2). For linear systems of the form (7.7.3) the regularized solution (7.9.1) was suggested above. It can be obtained from the extremum condition for the Tikhonov functional, if $x_\infty = 0$ and the stabilizer is
9 This representation is possible if the operator is non-singular and its eigenfunctions are known.
$E_r = \|x\|^2$ (Example 7.13.1). It is easy to show that as $\alpha \to 0$ the regularized solution (7.9.1) tends to the normal pseudosolution. Regularization allows us to include it into a family of solutions continuous in the parameter, while the normal pseudosolution, obtained by Moore-Penrose pseudoinversion, is not a continuous function of the system matrix. However, there is no need to introduce regularizing functionals directly for finding the solution in $L^n$; this approach is mostly designed for infinite-dimensional spaces. The solution can still be found by using (7.9.1) in an iterative process with decreasing values of the regularization parameter, coordinated with the generalized residual [15].

Let us summarize. While solving inverse problems, their correction is necessary, i.e. the imposition of restrictions on the class of possible solutions based on a-priori data about the object. It is better to introduce such restrictions on the basis of realistic physical assumptions. We have considered Moore-Penrose pseudoinversion, Panteleev corrective filtering, and Tikhonov regularization. All these methods rest on very simple ideas rooted in the principles of filtering.

Example 7.13.1. For the system (7.7.3), where the model is given approximately by the matrix C with errors $\varepsilon_C$, and the observations are contaminated by white noise with unit dispersion ($\sigma^2 = 1$), the search for a regularized solution $\bar x$ of comparable precision is conducted according to the iterative scheme
$$\bar x_\alpha = (C^T C + \alpha I)^{-1} C^T y,$$
where $\bar x_\alpha$ is the solution corresponding to the value of the regularization parameter $\alpha$ from a decreasing sequence. The value $\alpha$ is chosen in accordance with the generalized residual criterion
$$\|y - C\bar x_\alpha\| = \sigma + \varepsilon_C \|\bar x_\alpha\|,$$
where the Euclidean norm $\|x\| = \sqrt{x^T x}$ is used.

Example 7.13.2. The ill-posed operation of differentiation is the inverse problem with respect to the convolution integral problem. Suppose the latter is given in the form of a stationary integral convolution operator A with kernel K:
$$y = Ax = \int_a^b K(t - \tau)\,x(\tau)\,d\tau = K * x.$$
Then the problem of differentiation can be regularized. (a) When looking for solutions from $L_2$, minimization of (7.13.1) leads to the operator expression
$$x = \frac{A^* y}{A^* A + \alpha}.$$
In the frequency domain the solution can be written in the form
$$\hat x = \frac{W^* \hat y}{W^* W + \alpha} = W_{\mathrm{reg}}\,\hat y,$$
where $W(\omega) = \mathcal{F}\{K(t)\} = \hat K(\omega)$ is the transfer function and $W_{\mathrm{reg}}$ is a regularizing operator (see (7.12.2)). The regularized solution in its turn may be rewritten in the time domain as a convolution of the observations with the regularizing kernel $K_{\mathrm{reg}}(t) = \mathcal{F}^{-1}\{W_{\mathrm{reg}}(\omega)\}$:
$$\bar x = K_{\mathrm{reg}} * y.$$
(b) When looking for a solution from $W_2^1$, the minimality condition takes the form
$$(A^* A + \alpha)x - \alpha\ddot x - A^* y = 0,$$
and in the frequency domain the solution can be written in the form
$$\hat x = \frac{W^* \hat y}{W^* W + \alpha(1 + \omega^2)} = W_{\mathrm{reg}}\,\hat y.$$
The regularization parameter is chosen iteratively according to the generalized residual [15]
$$p(\alpha) = \|y - Ax_\alpha\| - \sigma - \varepsilon\|x_\alpha\| \approx 0,$$
where $\sigma$ is the observational error and $\varepsilon$ is the error of the operator assignment.

Example 7.13.3. An interesting example of applying (7.13.7) is the so-called regularization neural networks [5]. After discretization of the continuous function $\varphi_\alpha(y)$ (7.13.6), its samples become the coordinates of a decomposition over the Green functions (7.13.7). It is possible to implement the Green functions on the basis of radial neurons, and the coefficients of the decomposition with linear neurons. The regularized solution can thus be computed by neural networks with radial basis functions.

Example 7.13.4. From the Euler-Liouville equation for the rotating Earth, the equation of motion of the Earth's pole can be derived:
$$\frac{i}{\sigma_c}\dot m + m = \varphi, \qquad (7.13.8)$$
where $m = m_1 + i m_2 \approx x_p - i y_p$, the small values $m_1, m_2$ give the corrections to the angular velocity vector (pole displacement), $x_p, y_p$ are the coordinates of the pole in the Earth-fixed coordinate system, and on the right side is the excitation function $\varphi = \varphi_1 + i\varphi_2$, which depends on the moments of external forces, as well as on the moments of inertia of the Earth, the perturbations of the inertia tensor, the relative angular momentum, and the average velocity of Earth rotation
$\Omega = 7.292 \cdot 10^{-5}$ rad/sec. The main parameter of equation (7.13.8) is the complex Chandler frequency $\sigma_c$, which can be represented in the form
$$\sigma_c = 2\pi f_c\left(1 + \frac{i}{2Q}\right),$$
where $f_c$ is the Chandler frequency and Q is the quality factor. According to estimates made by Wilson and Vicente, $f_c = 0.843$ cycles per year and $Q = 175$. For the input $\varphi(t)$ the solution of equation (7.13.8) has the form
$$m(t) = e^{i\sigma_c t} m_0(t_0) - i\sigma_c \int_{t_0}^{t} \varphi(\tau)\,e^{i\sigma_c(t - \tau)}\,d\tau,$$
where $m_0(t_0)$ is a constant determined by the initial conditions. For a stable system the influence of the initial conditions vanishes over time ($t_0 \to -\infty$), and the trajectory of the pole is determined by the particular solution of the inhomogeneous system, which has the form of a convolution integral $m = h * \varphi$. Thus, the trajectory of the Earth's pole m(t) is the smoothed input $\varphi(t)$. The corresponding filter can be described by the impulse response
$$h(t) = -i\sigma_c e^{i\sigma_c t}, \qquad (7.13.9)$$
or the transfer function
$$W(p) = \frac{\sigma_c}{ip + \sigma_c}. \qquad (7.13.10)$$
The main observed components of polar motion are the Chandler and annual components, decadal variations, and some high-frequency components. The Chandler component is one of the strongest (up to 6 meters in amplitude); it is supposed to be excited by atmospheric and oceanic processes, but the causes of its amplitude changes are not exactly known. Reconstruction of the excitation function $\varphi$ (input force) from the observed polar motion of the Earth is an ill-posed problem (differentiation of observations). To solve it, a corrective procedure is required. To extract the Chandler excitation, additional removal of the annual and side-frequency components is needed. In [17], using the data since 1846 from the IERS EOP C01 bulletin, the reconstruction of the Chandler excitation is performed by three methods:

(a) The Chandler component, extracted by singular spectrum analysis [6], was processed by the Wilson filter, which approximates the operator inverse to (7.13.10):
$$\chi(t) = \frac{i\,e^{-i\pi f_c \Delta t}}{\sigma_c \Delta t}\left(m_{t + \frac{\Delta t}{2}} - e^{i\sigma_c \Delta t}\, m_{t - \frac{\Delta t}{2}}\right), \qquad (7.13.11)$$
where $\Delta t/2$ is the step between observations (0.05 yr).
(b) The annual component was adjusted by the least-squares method and subtracted from the initial series; then regularization in the frequency domain was applied with the use of expression (7.12.2):
$$W_{\mathrm{reg}}(\omega) = \frac{W^*(\omega)}{W^*(\omega)W(\omega) + \alpha}, \qquad (7.13.12)$$
where the transfer function W is given by (7.13.10). The regularization parameter $\alpha = 500$ was selected to make the result more or less consistent with that obtained by method (a).

(c) Panteleev corrective filtering (Example 7.12.1) was applied in the frequency band centered at the Chandler frequency, with the parameter $f_0 = 0.04$ selected to damp high and low frequencies, including the annual component.

Figure 7.3a represents the amplitude responses of the three aforementioned methods. Results of the reconstruction of the Chandler excitation are represented in Figure 7.3b for the x-coordinate of the pole (for the y-coordinate the picture is similar). The three methods gave similar results. The obtained excitation shows amplitude modulation. In most cases the Chandler excitation increases simultaneously with the decrease in the Earth rotation speed (increase of the length of day, LOD), as a result of the tidal force increase in the 18.6-year cycle of the motion of the Moon's orbit node. This can be seen from a comparison of the excitation curve with the 18.6-year harmonic of the IERS zonal tide model for LOD, represented along the abscissa axis in Figure 7.3b. It means that the Moon could be an important factor for the Chandler excitation, while the atmosphere and ocean could provide a channel for energy transfer.
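A minimal sketch of method (a), the Wilson filter (7.13.11); the pole series below is a synthetic Chandler-like oscillation, an illustrative assumption standing in for the IERS EOP C01 data used in [17]:

```python
import numpy as np

fc, Q = 0.843, 175.0                      # Chandler frequency (cycles/yr), quality factor
sigma_c = 2.0 * np.pi * fc * (1.0 + 1j / (2.0 * Q))
dt = 0.1                                  # yr; observations are dt/2 = 0.05 yr apart

t = np.arange(0.0, 160.0, dt / 2.0)
m = 0.15 * np.exp(2j * np.pi * fc * t)    # assumed pole motion m = x_p - i*y_p (arcsec)

# Wilson filter (7.13.11): the samples at t +/- dt/2 are two grid steps apart.
coef = 1j * np.exp(-1j * np.pi * fc * dt) / (sigma_c * dt)
chi = coef * (m[2:] - np.exp(1j * sigma_c * dt) * m[:-2])   # excitation estimate
print(np.abs(chi).mean())
```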
Figure 7.3. Amplitude responses of inverse operators (a) and the result of Chandler excitation reconstruction (b).
Acknowledgements
This work was supported by a grant of the President of Russia and by the Chinese Academy of Sciences Fellowship for Young International Scientists.
References
[1] V. M. Alexeev, V. M. Tihomirov and S. V. Fomin, Optimal Control, Fizmatlit, 2005.
[2] A. E. Albert, Regression and the Moore-Penrose Pseudoinverse, Elsevier, 1972.
[3] D. V. Beklemishev, Additional Chapters from Linear Algebra, Moscow, Nauka, 1983.
[4] V. S. Gubanov, A Generalized Least Squares Method: Theory and Application to Astrometry, St. Petersburg, 1997.
[5] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1998.
[6] I. T. Jolliffe, Principal Component Analysis, Springer-Verlag, 2002.
[7] A. N. Kolmogorov and S. V. Fomin, Elements of the Theory of Functions and Functional Analysis, 1957.
[8] V. L. Panteleev, Gravity Measurements in the Sea, Mir, 1983.
[9] V. L. Panteleev, Mathematical Processing of Observations, Lectures, Moscow, MSU, 2001, http://lnfm1.sai.msu.ru/grav/russian/lecture/mon/mon.htm.
[10] V. L. Panteleev and T. S. Chesnokova, Problem of deconvolution in inertial gravimetry, Moscow University Physics Bulletin, 66(1), 2011.
[11] B. Schaffrin, On penalized least-squares: its mean squared error and a quasi-optimal weight ratio, in: Recent Advances in Linear Models and Related Areas, Essays in Honour of Helge Toutenburg, Springer-Verlag, 2008.
[12] V. Yu. Terebizh, Introduction to Statistical Theory of Inverse Problems, Moscow, Fizmatlit, 2005.
[13] A. N. Tikhonov, A. S. Leonov and A. G. Yagola, Nonlinear Ill-posed Problems, Chapman and Hall, 1998.
[14] A. A. Tikhonova, N. A. Tikhonov and A. N. Tikhonov, Sobranie, 2006.
[15] A. G. Yagola, I. E. Stepanova and V. N. Titarenko, Inverse Problems in Geophysics, 2008.
[16] M. S. Zhdanov, Geophysical Inverse Theory and Regularization Problems, Elsevier, 2002.
[17] L. Zotov, Dynamical modeling and excitation reconstruction as fundamental of Earth rotation prediction, Artificial Satellites, 45(2), 95-106, Warsaw, 2010.
[18] L. Zotov, Theory of Filtering and Time Series Processing, Moscow, MSU, 2010, http://lnfm1.sai.msu.ru/grav/english/lecture/filtering/.
Authors Information
L. V. Zotov and V. L. Panteleev, Sternberg Astronomical Institute, Moscow State University, Moscow 119991, Russia. E-mail: [email protected]
Part III
Optimal Inverse Design and Optimization Methods
Chapter 8
Inverse Design of Alloys' Chemistry for Specified Thermo-Mechanical Properties by using Multi-objective Optimization

G. S. Dulikravich and I. N. Egorov
Abstract. Inversely designing new alloys for specific applications involves determining the concentrations of alloying elements that will provide, for example, a specified tensile strength at a specified temperature for a specified length of time. This represents an inverse problem, which can be formulated as a multi-objective optimization problem with a given set of equality constraints. This chapter describes several such formulations for the multiple objective functions and comparatively evaluates these models when using optimization to solve this de facto inverse problem. This approach allows a materials design engineer to design the precise chemical composition of an alloy that is needed for building a particular object. The inverse method uses a multi-objective constrained evolutionary optimization algorithm to determine not one, but a number of alloys (Pareto front points), each of which satisfies the specified properties while having different concentrations of each of the alloying elements. This provides the user of the alloy with additional flexibility when creating such an alloy, since the chemical composition made of the most readily available and least expensive elements can be used. It should be pointed out that the inverse problem of determining alloy chemical composition is different from a direct optimization problem of designing alloys that will have extreme properties. This alloy design methodology does not require knowledge of metallurgy or crystallography and is directly applicable to alloys having an arbitrary number of alloying elements. Examples are presented for Ni-based steel alloys and bulk metallic glasses, although the method is applicable to inversely designing chemical concentrations of arbitrary alloys.
8.1 Introduction
It is well known that the thermo-physical properties of alloys depend on the choice and number of the alloying elements, the concentrations of each of the alloying elements, and the thermal and/or mechanical treatment protocol to which an alloy is typically subjected afterwards. The microstructure of an alloy depends on these influencing factors; it represents an intermediate step in the cause-consequence relationship between chemistry and thermo/mechanical treatment on one side and thermo-mechanical properties on the other. Mathematical modeling of the dependence of the various thermo-physical properties on each of the influencing factors is either non-existent or based on empiricism and heuristics. Thus, the general problem of designing new alloys is still an art rather than a science. It involves the designer's experience with general metallurgy, personal intuition, and excessively long and expensive experimentation, which makes the alloy design process very costly. It does not currently involve any aspects of chemistry. Therefore, rather than attempting to develop a new fundamental science of alloy chemistry based on nonlinear thermodynamics and atomistic modeling of basic structures, which is still restricted to a relatively small number of atoms because of excessive computing time and memory requirements, it is more prudent to utilize simple models that do not require a detailed elaboration of microstructure and chemistry. Since such simple meta-models linking causes and consequences can significantly reduce the overall time and cost of the alloy design process, it is of utmost importance to utilize computational design tools that already exist and have been successfully applied in numerous other fields of science and engineering. Such proven design tools are the various design optimization algorithms that can be used to create alloys with extreme thermo-physical properties [1–7], or in conjunction with inverse design of alloys [8, 3, 7] having specified thermo-physical properties. For example, the designer of a crankshaft in an internal combustion engine needs to use an alloy that will sustain a very specific maximum stress, at a specific temperature, for a specific number of hours before it breaks. This is a typical example of inverse design of alloys [8]. The resulting alloys that meet the desired specifications are typically considerably less expensive than optimized alloys whose properties were extremized via an alloy design optimization process [1–7]. In this article, we elaborate on a method that we created for the inverse design of alloys that will have the values of their thermo-physical properties specified by the designer. This inverse design method uses a variant of I. N. Egorov's optimization algorithm known as IOSO [9, 3] to determine not one, but a number of alloys, each of which satisfies the specified properties while having different
concentrations of each of the alloying elements. This provides the user of the alloy with increased flexibility when deciding to create such an alloy. In this way, the customer can choose the inversely determined alloy composition (the alloying elements to be used in a new alloy) and the inversely determined set of concentrations (of these alloying elements) that are the most available and the least expensive at the moment the alloy is ordered from the manufacturer. It should be pointed out that the inverse problem of determining alloy chemical composition is different from a direct optimization problem [1–7] of designing alloys that will have extreme properties. The inverse problem can be formulated, for example, as a multi-objective optimization problem with a given set of equality constraints. We have used the IOSO multi-objective optimization algorithm [9] to solve this type of inverse alloy design problem [8, 3, 7]. We have developed eight mathematical formulations and corresponding software packages for different ways of inversely determining the chemical concentrations of alloying elements that simultaneously satisfy several specified mechanical and cost/availability properties. These formulations were then compared and analytically evaluated in an attempt to determine the most appropriate one.
8.2 Multi-Objective Constrained Optimization and Response Surfaces
The key to the success of the proposed inverse method for design of alloys is the robustness, accuracy, and efficiency of the multi-objective constrained optimization algorithm. This inverse problem solution methodology and results presented in this chapter are based on a special adaptation of IOSO [9], which is a robust stochastic multi-objective constrained optimization algorithm. The IOSO algorithm is of a semi-stochastic type incorporating certain aspects of a selective search on a continuously updated multi-dimensional response surface. IOSO can utilize either a weighted linear combination of several objectives or a true multi-objective formulation option for creating Pareto fronts. The main benefits of this algorithm are its outstanding reliability in avoiding local minima, its computational speed, and a significantly reduced number of required experimentally evaluated candidate alloys as compared to more traditional semi-stochastic optimizers such as genetic algorithms. Furthermore, the self-adapting response surface formulation [10] used in IOSO allows for incorporation of realistic non-smooth variations of experimentally obtained data and provides for accurate interpolation of such data. One of the advantages of this approach is the possibility of ensuring good approximating capabilities using a minimum amount of available information.
This possibility is based on self-organization and evolutionary modeling concepts [10, 8, 3]. During the optimization process, the structure of the approximation function (the multi-dimensional response surface) is continuously improved, so that it successfully approximates optimized functions and constraints with fairly complicated topology. The obtained analytical formulations of the response surface approximations can be used by multi-level optimization procedures with an adaptive change of approximation accuracy for both single- and multiple-objective analysis, and also for the solution of their interaction problems. With reference to the particular problem of creating alloys with desirable properties, there inevitably arises the problem of constraints that need to be specified on the objective functions; such constraints are absent in the more general multi-objective optimization statement. These objective constraints should be set by the user (an expert) and could be allowed to vary during the solution process. For example, a minimum acceptable value for the Young's modulus of elasticity could be specified as an inequality constraint; a maximum acceptable percentage for each of the most expensive chemical elements in the alloy could be specified as a cost constraint; and the maximum acceptable manufacturing cost of the alloy could also be specified as an inequality constraint. The search for a Pareto-optimal solution set in multi-objective optimization, while varying the concentrations of alloying elements, would be an unacceptably labor-intensive process, because of the extremely large number of candidate alloys that would need to be created and because several properties of each of these alloys would have to be evaluated experimentally. In this case, we can speak only about the creation of a rather extensive database containing information on various properties of alloys for various combinations of chemical composition. Such a database could be used for the solution of particular problems aimed at the creation of alloys with desirable properties. Unfortunately, inverse problems, as a rule, are difficult to formalize at the initial stage, since the user does not know initially what values of some objectives can be physically reached and how the remaining objectives will vary. That is, the user has very little, if any, a-priori knowledge of the topology of the objective functions. Hence, it is very difficult to predict the number of experiments required in the optimization application proposed here. Therefore, it appears that inverse design of alloys via optimization can be performed only in an interactive mode, where the user can modify both the objective constraints and the objective functions during the solution. Actually, in this case one can speak about optimally controlled experiments. Let us consider several different scenarios for the solution of optimization problems under these conditions.
The first approach is to perform a general multi-objective optimization of the material properties. Within the framework of this strategy, we solve the multi-objective optimization problem (find the Pareto set) using the general IOSO algorithm. This strategy is the most accurate, but it requires a very large number of experiments. The second approach is an interactive step-by-step optimization of the material properties. The first step of this strategy is to create an initial plan of experiments. This involves the formulation of a single (hybrid) optimization objective by the user. This objective may be a convolution of the particular objectives with different weight coefficients assigned to each of them, as in the sketch below. Then one optimization step is performed to minimize this composite objective. The result of this strategy is a single solution that belongs to the Pareto set. However, during such a relatively efficient quasi-multi-objective optimization process we can accumulate information about the particular objectives and construct progressively more accurate response surface models. Thus, in order to develop and realize the most effective optimization strategies of both the first and the second kind, we have to perform a thorough preliminary search for the classes of basis functions that are able to produce the most accurate multi-dimensional response surface models. The number of experiments necessary for a true multi-objective optimization depends not only on the dimensionality of the problem (the number of chemical elements in an alloy); it also depends to a considerable degree on the topologies of the objective functions. For example, for the solution of an actual problem in the car industry with 6 variables, we needed nearly 60 experiments when using a basic IOSO algorithm [11]. However, for finding the minimum of the classical Rosenbrock test function, having only 2 variables, it was necessary to perform almost 300 objective function evaluations.
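A minimal sketch of such a weighted convolution of objectives; the two quadratic criteria are illustrative placeholders for experimentally evaluated properties:

```python
import numpy as np

def hybrid_objective(objectives, weights):
    """Fold several particular objectives into one scalar (weighted) criterion."""
    def composite(x):
        return sum(w * f(x) for w, f in zip(weights, objectives))
    return composite

# Illustrative use with two competing criteria of a design vector x:
f1 = lambda x: float(np.sum((x - 1.0) ** 2))
f2 = lambda x: float(np.sum((x + 1.0) ** 2))
f = hybrid_objective([f1, f2], weights=[0.7, 0.3])
print(f(np.zeros(3)))          # one number that a single-objective step can minimize
```

Different weight vectors steer the single returned solution towards different points of the Pareto set, which is how the step-by-step strategy accumulates Pareto information.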
8.3 Summary of IOSO Algorithm
An extremely important part of the optimization process is the creation and iterative improvement of a multidimensional response surface (an approximation of the objective function as an analytical expression relating it to the design variables, i.e., the concentrations of the different alloying elements). Each iteration of IOSO therefore consists of two steps. The first step is the creation of an approximation of the objective function(s). The response surface in IOSO is modeled analytically as a tree structure or a multi-level graph, where each branch is a quadratic polynomial. Thus, the final analytic expression for a multidimensional response surface is a polynomial-of-a-polynomial-of-a-polynomial..., where each polynomial is a simple quadratic function. Generally speaking,
the basic polynomial could be a linear function, a quadratic function, a cubic function, a quartic function, etc. [11, 12], but the best tradeoff between the accuracy of the fitting process and the computational cost appears to be the quadratic polynomial [11]. The second step in IOSO is the optimization of this approximation function. This approach allows for corrective updates of the structure and the parameters of the response surface approximation. The distinctive feature of this approach is an extremely low number of trial points needed to initialize the algorithm. The obtained response surface functions are used in the multi-level optimization, while adaptively utilizing various single- and multiple-discipline analysis tools that differ in their level of sophistication. During each iteration of IOSO, the optimization of the response function is performed only within the current search area. This step is followed by a direct call to the mathematical analysis model, or an actual experimental evaluation, for the obtained point. During the IOSO operation, the information concerning the behavior of the objective function in the vicinity of the extremum is stored, and the response surface function is re-created locally and made more accurate only for this search area. Thus, during each iteration, a series of approximation functions for a particular objective of the optimization is built. These functions differ from each other according to both structure and definition range. The subsequent optimization of these approximation functions allows us to determine a set of vectors of optimized variables. In this work, artificial neural network (ANN) algorithms [13] utilizing radial basis functions were used, modified in order to build the response surfaces; a minimal sketch of such a radial-basis surrogate is given after the methodology list below. The modifications consisted in the selection of ANN parameters at the training stage based on two criteria: minimal curvature of the response hyper-surface, and provision of the best predictive properties for a given subset of test points. In summary, each iteration of IOSO multi-objective optimization applied to alloy design involves the following:

(1) Building and training ANN1 for a given set of test points.
(2) Conducting multi-objective optimization with the use of ANN1 and obtaining a specified number of Pareto optimal solutions P1.
(3) Determining a subset of test points that are maximally close to the points P1 in the space of variable parameters.
(4) Training ANN2 proceeding from the requirement to provide the best predictive properties for the obtained subset of test points.
(5) Conducting multi-objective optimization with the use of ANN2 and obtaining a set of Pareto-optimal solutions P2.

In general, the database contains information on experimentally obtained alloy properties compiled from different sources and obtained under different
experimental conditions. As a result, alloys with the same chemical composition can have considerable differences between their measured properties. These differences can be explained by errors due to the particular conditions existing during the experiments (measurement errors) and by the effect of certain operating conditions (for example, the thermal conditions of alloy making). Unless operating conditions are quantified numerically, their influence is regarded as an additional chance factor. Therefore, in its simplified form, the alloy design methodology that takes these uncertainties into account can be presented as the following set of actions:

(1) Formulation of the optimization task, that is, selection of variable parameters, definition of optimization objectives and constraints, and setting initial (preliminary) ranges of the variable parameters' variations.
(2) Preliminary reduction of the experimental database. At this stage, the alloys meeting the optimization task statement are picked from the database, so that alloys having chemical composition outside the chosen set of chemical elements are rejected. Alloys for which there is no data for at least one optimization objective are rejected. In addition, alloys with chemical concentrations outside the set range of variable concentrations are also rejected.
(3) Final reduction of the experimental database. Since the accuracy of the response surfaces substantially depends on the uniformity of the distribution of variable parameters in the surveyed area, experimental data values lying significantly outside the universal set are rejected. At the end of this stage, the final range of variable parameters for optimization is set.
(4) Execution of multi-objective optimization resulting in a specified number of Pareto optimal solutions.
(5) Analysis of optimization results.
(6) Manufacturing and experimental evaluation of the obtained Pareto optimal alloys to obtain high-fidelity values of the optimized objectives, and analysis of the results obtained.
(7) Change of the optimization problem statement (number of simultaneous objectives and constraints, the set and range of variable parameters), and return to step 2.
(8) Modification of the database and return to step 4.
(9) Stop.
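A minimal sketch of the radial-basis response surface mentioned above; the Gaussian basis, its width and the stand-in sample data are illustrative assumptions, and IOSO's actual self-organizing construction and training criteria are considerably more elaborate:

```python
import numpy as np

def fit_rbf(X, y, eps=1.0):
    """Fit a Gaussian radial-basis interpolant through the test points (X, y)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.linalg.solve(np.exp(-(eps * d) ** 2), y)

def eval_rbf(X, w, Xq, eps=1.0):
    """Predict the modeled property at new query points Xq."""
    d = np.linalg.norm(Xq[:, None, :] - X[None, :, :], axis=-1)
    return np.exp(-(eps * d) ** 2) @ w

# Illustrative use: a surrogate of one "measured" property over two concentrations.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 2))      # tested concentration pairs
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2     # stand-in for the measured property
w = fit_rbf(X, y)
Xq = rng.uniform(0.0, 1.0, size=(5, 2))
print(eval_rbf(X, w, Xq))                    # cheap predictions for the optimizer
```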
8.4 Mathematical Formulations of Objectives and Constraints
In particular, the objective of this inverse alloy design task was to determine the concentrations (by weight) of each of the 14 alloying elements (C, S, P, Cr, Ni, Mn, Si, Mo, Co, Cb, W, Sn, Zn, Ti) in high-temperature steel alloys that
will have specified (desired) physical properties. No mathematical analysis was used to evaluate the physical properties of candidate alloys; the evaluations of properties were performed using classical experiments on candidate alloys. In other words, we used an existing experimental database [1, 2, 3, 4, 5, 8]. The ranges of concentrations of these elements were set by finding the minimum and maximum values of the concentration of each alloying element in the existing set of experimental data ($\mathrm{Expmin}_i$, $\mathrm{Expmax}_i$, where $i = 1, \ldots, 14$). Then, new minimum and maximum values for the concentrations of each of the 14 alloying elements were specified according to the following simple dependencies: $\mathrm{Min}_i = 0.9\,\mathrm{Expmin}_i$ and $\mathrm{Max}_i = 1.1\,\mathrm{Expmax}_i$, where $i = 1, \ldots, 14$. These ranges are given in Table 8.1.

Table 8.1. Ranges of variation of design variables (concentrations of alloying elements).
Element   min      max
C         0.063    0.539
S         0.001    0.014
P         0.009    0.031
Cr        17.500   39.800
Ni        19.300   51.600
Mn        0.585    1.670
Si        0.074    2.150
Mo        0.000    0.132
Co        0.000    0.319
Cb        0.000    1.390
W         0.000    0.484
Sn        0.000    0.007
Zn        0.001    0.015
Ti        0.000    0.198
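The range construction described above is a one-line computation per element; a minimal sketch with a tiny stand-in database (the numbers are illustrative):

```python
import numpy as np

def design_ranges(concentrations):
    """Widened per-element ranges: Min_i = 0.9*Expmin_i, Max_i = 1.1*Expmax_i."""
    c = np.asarray(concentrations)           # rows: alloys, columns: elements
    return 0.9 * c.min(axis=0), 1.1 * c.max(axis=0)

# Illustrative use with three alloys and two elements (e.g. C and Cr):
db = np.array([[0.07, 19.0],
               [0.50, 36.0],
               [0.10, 25.0]])
lo, hi = design_ranges(db)                   # arrays of Min_i and Max_i
```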
The inverse problem can then be formulated as, for example, a multi-objective optimization problem with a given set of equality constraints. This optimization was formulated as a multi-objective statement with three simultaneous objectives: minimize the difference between the specified and the actual stress, minimize the difference between the specified and the actual maximum temperature, and minimize the difference between the specified and the actual time to rupture at the specified temperature and stress. One additional objective (minimizing the cost of the raw material used in the alloy) was also considered. Eight different mathematical formulations of this constrained optimization problem were created (Table 8.2) and implemented using the IOSO algorithm. When inversely determining the concentrations of each of the 14 alloying elements in steel alloys using the eight mathematical formulations for the objective function(s) and the constraints on the range of design variables (Table 8.1), the IOSO optimization algorithm offered consistently high accuracy in satisfying the specified stress (Figure 8.1), operating temperature (Figure 8.2), and time-until-rupture (Figure 8.3), as well as an overall combined accuracy (Figure 8.4). An overall performance evaluation of the various inverse alloy design formulations was then developed, based on the ad hoc analytical formulation summarized in equations (8.4.1) through (8.4.8).
Table 8.2. Eight formulations for objective functions and constraints (σ, T and H denote the achieved stress, operating temperature and time until rupture; the subscript "spec" marks their specified values, and ε denotes the prescribed tolerance on each deviation).

Model 1 (3 objectives): minimize (σ − σspec)², (T − Tspec)², (H − Hspec)²; no constraints.
Model 2 (1 objective): minimize (σ − σspec)² + (T − Tspec)² + (H − Hspec)²; constraints (σ − σspec) < ε, (T − Tspec) < ε, (H − Hspec) < ε.
Model 3 (3 objectives): minimize (σ − σspec)², (T − Tspec)², (H − Hspec)²; constraints (σ − σspec) < ε, (T − Tspec) < ε, (H − Hspec) < ε.
Model 4 (1 objective): minimize (σ − σspec)² + (T − Tspec)² + (H − Hspec)²; constraints (σ − σspec) < ε, (T − Tspec) < ε, (H − Hspec) < ε.
Model 5 (1 objective): minimize (σ − σspec)²; constraints (T − Tspec) < ε, (H − Hspec) < ε.
Model 6 (1 objective): minimize (T − Tspec)²; constraints (σ − σspec) < ε, (H − Hspec) < ε.
Model 7 (1 objective): minimize (H − Hspec)²; constraints (σ − σspec) < ε, (T − Tspec) < ε.
Model 8 (10 objectives): minimize (σ − σspec)², (T − Tspec)², (H − Hspec)² and the concentrations of Ni, Cr, Nb, Co, Cb, W, Ti (low-cost alloy); constraints (σ − σspec) < ε, (T − Tspec) < ε, (H − Hspec) < ε.
Figure 8.1. Comparison of accuracy of satisfying the specified stress for eight inverse design formulations.
Figure 8.2. Comparison of accuracy of satisfying the specified temperature for eight inverse design formulations.
Figure 8.3. Comparison of accuracy of satisfying the specified time-to-rupture for eight inverse design formulations.
Figure 8.4. Comparison of combined accuracies of satisfying the specified values for eight inverse formulations.
$$\Delta\sigma = (\sigma - \sigma_{\rm spec})/\sigma_{\rm spec}, \qquad (8.4.1)$$
$$\Delta T = (T - T_{\rm spec})/T_{\rm spec}, \qquad (8.4.2)$$
$$\Delta H = (H - H_{\rm spec})/H_{\rm spec}, \qquad (8.4.3)$$
$$EPS = \left[(\Delta\sigma)^2 + (\Delta T)^2 + (\Delta H)^2\right]^{-1}, \qquad (8.4.4)$$
$$K_1 = 10\,N_{\rm objectives} + N_{\rm constraints} + N_{\rm variables}, \qquad (8.4.5)$$
$$K_2 = 100\,(1 - \Delta\sigma) + (1 - \Delta T) + (1 - \Delta H), \qquad (8.4.6)$$
$$K_3 = N_{\rm calls}/N_{\rm Pareto}, \qquad (8.4.7)$$
$$\text{Maximize:}\quad \mathrm{SCORE} = \frac{K_1 K_2}{K_3}\,\exp(EPS). \qquad (8.4.8)$$
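A direct transcription of (8.4.1)-(8.4.8) as a sketch; the grouping of the factors 10 and 100 and the exponent in (8.4.4) follow the printed formulas, whose typography is ambiguous, and exp(EPS) can overflow when the deviations are very small:

```python
import numpy as np

def score(sigma, T, H, sigma_spec, T_spec, H_spec,
          n_obj, n_con, n_var, n_calls, n_pareto):
    """Ad hoc overall performance measure (8.4.1)-(8.4.8), to be maximized."""
    d_sigma = (sigma - sigma_spec) / sigma_spec                # (8.4.1)
    d_T = (T - T_spec) / T_spec                                # (8.4.2)
    d_H = (H - H_spec) / H_spec                                # (8.4.3)
    eps = 1.0 / (d_sigma**2 + d_T**2 + d_H**2)                 # (8.4.4), as printed
    k1 = 10.0 * n_obj + n_con + n_var                          # (8.4.5), as printed
    k2 = 100.0 * (1.0 - d_sigma) + (1.0 - d_T) + (1.0 - d_H)   # (8.4.6), as printed
    k3 = n_calls / n_pareto                                    # (8.4.7)
    return k1 * k2 / k3 * np.exp(eps)                          # (8.4.8)
```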
When the suggested eight formulations were evaluated using this ad hoc procedure, only a few formulations appeared to offer an overall superior performance (Figure 8.5).
Figure 8.5. The values of overall performance (SCORE) for eight formulations for inverse design of alloys.
Table 8.3 presents a summary of the accuracies in satisfying each of the constraints, the number of constraints, the number of simultaneous objectives, the number of Pareto points generated, the number of optimization algorithm calls required, and the final performance scores of the eight design formulations, with formulation number 8 being the best. It is also highly educational to visualize the fact that the inverse design of alloys gives results that are not unique. That is, the same objectives and constraints can be met by using different concentrations of the alloying elements. For example, if the designer specifies a desired stress level of 230 N mm$^{-2}$ at a desired temperature of 975 °C for a desired time of 5000 hours until rupture, the optimization algorithm can be asked to generate 50 possible combinations of Ni and Cr concentrations that all provide a life expectancy of 5000 hours at the desired stress level and temperature. If the life expectancy is specified by the designer to be 6000 hours for the same stress and temperature levels, the allowable range of possible combinations of Ni and Cr concentrations will decrease. This becomes more noticeable as the specified time until rupture is increased to 7000 hours and eventually to 8000 hours (Figure 8.6). Notice the reduction in the range of acceptable variations of the concentrations of the alloying elements as the specified alloy life expectancy increases.
Figure 8.6. Allowable ranges of Ni and Cr concentrations for a specified level of stress at a specified temperature for different specified times until rupture.
Thus, the presented methodology for inversely designing the chemical composition of alloys offers significant freedom to the designer to choose from a relatively large number of possible chemical concentration sets that satisfy the same specified physical properties.
Table 8.3. Summary of accuracies for each of the eight inverse design formulations for alloys.

Prob.  EPS σ      EPS T      EPS H      EPS sum    Nconstr  NObj  NPareto  Ncalls  Score
1      .408e−19   .356e−06   .536e−06   .297e−06   0        3     50       417     0.590
2      .269e−08   .267e−07   .172e−08   .104e−07   3        1     1        703     0.246
3      .897e−10   .143e−09   .134e−12   .777e−10   3        3     50       445     0.817
4      .434e−13   .289e−12   .244e−18   .111e−12   3        1     1        1020    0.246
5      .413e−13   .139e−05   .549e−06   .646e−06   2        1     1        601     0.239
6      .954e−06   .576e−15   .980e−04   .646e−06   2        1     1        774     0.180
7      .408e−10   .515e−10   .299e−12   .309e−10   2        1     1        776     0.256
8      .714e−09   .928e−09   .127e−10   .552e−09   3        10    46       834     1.000
This is very attractive in cases when certain alloying elements are becoming hard to obtain or too expensive, in which case the optimized alloys with the lowest concentrations of such elements can be used. It is also highly educational to visualize the intrinsic nonlinearities of the unknown relationships between the concentrations of the alloying elements and the multiple properties of the alloys. Figure 8.7 shows that although the concentrations of Ni and Cr in the 50 inversely designed alloys vary smoothly (Figure 8.6), the concentrations of the other alloying elements in these alloys vary in a highly non-smooth way, suggesting that even small variations of the concentrations of certain alloying elements can cause significant variations in the properties of alloys. Figure 8.7 was obtained using inverse design formulation number 3 with the following prescribed alloy properties: maximum stress = 4000 kpsi, temperature at which this stress is applied = 1800 F, and time-until-rupture at the prescribed stress and temperature = 5000 hours.
Figure 8.7. Variations of concentrations of several alloying elements corresponding to inversely designed alloys.
The results of this multiple simultaneous least-squares constrained minimization problem cannot be visualized for more than two alloying elements at a time. For example, when the concentrations of only two alloying elements, such as Ni and Cr, are visualized, and temperature and life expectancy are unconstrained (unspecified), the optimization results in a fairly large domain of acceptable variations of the concentrations of Cr and Ni [8]. However, as constraints on the temperature level are introduced and progressively increased, the feasible
domain for varying Cr and Ni will start to shrink (Figure 8.8). Similar general trends can be observed when the time until rupture is specified and progressively increased (Figure 8.9). The iso-contours in these plots depict the constant stress levels as functions of concentrations of Cr and Ni in these alloys.
Figure 8.8. Effect of increasing specified temperature alone on allowable concentrations of Ni and Cr.
Finally, when the temperature level and the time until rupture are specified and then progressively increased simultaneously, the feasible domain for the concentrations of Cr and Ni shrinks rapidly (Figure 8.10). Similar trends can be observed when looking at any other pair of alloying elements.
Figure 8.9. Effect of increasing specified time until rupture alone on allowable concentrations of Ni and Cr.
8.5 Determining Names of Alloying Elements and Their Concentrations for Specified Properties of Alloys
A more realistic (and considerably more complex) problem of inverse design of alloys is to actually determine which chemical elements to use in an alloy, while simultaneously determining the appropriate concentrations of each of the candidate elements. It is best to illustrate this inverse alloy design process by analyzing the details presented in Figure 8.11. In this example, a maximum of 17 candidate alloying elements was considered (Cr, Ni, C, S, P, Mn, Si, Cu, Mo, Pb, Co, Cb, W, Sn, Al, Zn, Ti). The following three desired properties of the alloys were specified: stress = 4000 kpsi, temperature = 1800 F, and
Figure 8.10. Effect of simultaneously increasing specified temperature and specified time until rupture on allowable concentrations of Ni and Cr.
time until rupture = 6000 hours. These specified alloy properties were then treated as three equality constraints (each of the three specified properties had to be satisfied to within one percent), and the entire alloy design problem was formulated as a constrained multi-objective minimization problem (minimize the Cr and Ni concentrations simultaneously in order to minimize the cost of the raw material). Results of this multi-objective constrained optimization task are given in Figure 8.11: the five Pareto optimized alloys are presented on the left-hand side in terms of their concentrations of Ni and Cr, and the concentrations of the remaining 15 candidate alloying elements for each of the five Pareto optimized alloys are given on the right-hand side. Each of the five Pareto optimized alloys satisfies the three specified alloy properties while providing the Pareto-optimized minimum use of Ni and Cr. It is fascinating to realize that the optimized concentrations of some of the remaining 15 candidate alloying elements were found to be negligible, although these elements are currently widely used in such alloys, thus
Figure 8.11. An example of simultaneously determining alloying elements and their concentrations for alloys with specified properties.
eliminating these elements as potential candidates for forming these types of steel alloys. Consequently, the number of alloying elements that actually needs to be used to create an alloy with the three specified properties could be as low as 7 instead of 15 (in addition to Ni and Cr). This is highly attractive for practical applications, where the regular supply, storage, and handling of a large number of different pure elements are considered impractical, costly, and financially risky. This methodology of inversely designing the chemical composition of alloys offers significant freedom to the designer to choose from a relatively large number of possible chemical compositions that satisfy the same specified physical properties. This is very attractive in cases when certain alloying elements are becoming hard to obtain or too expensive, in which case the optimized alloys with the lowest concentrations of such elements can be used.
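As a sketch of how such a statement can be posed for a generic optimizer, the three property targets (to be met within one percent) can be folded into a penalty on top of the raw-material objective. The property evaluator `predict_props` below is purely hypothetical; in the chapter the properties come from experiments and from IOSO's response surfaces:

```python
import numpy as np

SPEC = np.array([4000.0, 1800.0, 6000.0])    # stress (kpsi), temperature (F), hours

def predict_props(c):
    """Hypothetical stand-in for the experimental/surrogate property evaluation."""
    return SPEC * (1.0 + 0.02 * np.tanh(np.sum(c) - 1.0))

def penalized_cost(c, i_ni, i_cr, mu=1e4):
    """Minimize Ni and Cr content subject to hitting all three properties within 1%."""
    rel_err = np.abs(predict_props(c) - SPEC) / SPEC
    violation = np.maximum(rel_err - 0.01, 0.0)      # one-percent tolerance band
    return c[i_ni] + c[i_cr] + mu * float(np.sum(violation ** 2))

c0 = np.full(17, 0.06)                       # one candidate vector of 17 concentrations
print(penalized_cost(c0, i_cr=0, i_ni=1))    # elements ordered as listed: Cr, Ni, ...
```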
8.6 Inverse Design of Bulk Metallic Glasses
Besides the inverse design of Ni-based steel alloys, this inverse alloy design methodology can readily be used for designing arbitrary alloys, including bulk metallic glasses (BMGs). For example, this inverse design method utilizing an optimization algorithm offers the capability to design a number of BMG alloys [7] with the same multiple properties but different chemistries, which improves their availability, affordability and utility. To demonstrate this, we created an initial data set of properties of 53 published, experimentally evaluated Zr-based BMGs (Table 8.4) and then used the IOSO optimization algorithm to determine the chemical concentrations of 7 alloying elements (Zr, Cu, Al, La, (Cu, Ni), Pd, Si) in such BMGs that will all have glass transition temperature
Tg = 680 K for several prescribed values of the BMGs' liquidus temperature Tl (1000 K, 1100 K, 1200 K, 1240 K). Results of these inverse design procedures utilizing optimization are depicted in Figures 8.12–8.15 in the form of concentrations of the alloying elements.
Figure 8.12. Results of an inverse design problem for Zr-based BMGs (specified Tg = 680 K and several specified values of Tl [7]) showing inversely determined concentrations of Cu and Zr for these conditions.
Figure 8.13. Results of an inverse design problem for Zr-based BMGs (specified Tg = 680 K and several specified values of Tl [7]) showing inversely determined concentrations of La and Al for these conditions.
8.7 Open Problems
The entire concept of inverse design of alloys is new and no other attempts to achieve the same have been found in the open literature. Since mathematical models linking the design variables (names and concentrations of the alloying elements) and the objectives (the specified multiple thermo-physical properties
Table 8.4. Experimental data for 53 Zr-based BMGs collected from open literature [7].

 #   Tl(K)    Tg(K)  Tg/Tl     Zr%  Cu%   Al%   La%   (Cu,Ni)%  Pd%   Si%
 1   1188     724    0.609428  50   36    14    0     0         0     0
 2   1170     722    0.617094  50   38    12    0     0         0     0
 3   1176     714    0.607143  50   40    10    0     0         0     0
 4   1181     703    0.595258  50   43    7     0     0         0     0
 5   1184     704    0.594595  49   44    7     0     0         0     0
 6   1186     708    0.596965  48   45    7     0     0         0     0
 7   1187     704    0.593092  49   45    6     0     0         0     0
 8   1192     706    0.592282  48   46    6     0     0         0     0
 9   1195     701    0.586611  49   46    5     0     0         0     0
10   1208     697    0.576987  49   47    4     0     0         0     0
11   1178     717    0.608659  45   49    6     0     0         0     0
12   1185     714    0.602532  45   50    5     0     0         0     0
13   1189     719    0.604710  44   51    5     0     0         0     0
14   1188     720    0.606061  45   48    7     0     0         0     0
15   1195     722    0.604184  45   47    8     0     0         0     0
16   1193     711    0.595977  46   49    5     0     0         0     0
17   1204     704    0.584718  47   49    4     0     0         0     0
18   1190     692    0.581513  54   38    8     0     0         0     0
19   1212     685    0.565182  56   36    8     0     0         0     0
20   1163     705    0.606191  52   38    10    0     0         0     0
21   1176     698    0.593537  54   36    10    0     0         0     0
22   1216     684    0.562500  54   40    6     0     0         0     0
23   759      403    0.530962  0    0     12.4  70    17.6      0     0
24   742      407    0.548518  0    0     13.2  68    18.8      0     0
25   674      405    0.600890  0    0     14    66    20        0     0
26   696      414    0.594828  0    0     14.6  64.6  20.8      0     0
27   699      420    0.600858  0    0     15.2  63.1  21.7      0     0
28   722      422    0.584488  0    0     15.7  62    22.3      0     0
29   729      426    0.584362  0    0     15.9  61.4  22.7      0     0
30   727      423    0.581843  0    0     16.3  60.5  23.2      0     0
31   743      426    0.573351  0    0     16.6  59.6  23.8      0     0
32   764      431    0.564136  0    0     17    58.6  24.4      0     0
33   783      435    0.555556  0    0     17.5  57.6  24.9      0     0
34   813      440    0.541205  0    0     17.9  56.5  25.6      0     0
35   844      436    0.516588  0    0     18.4  55.4  26.2      0     0
36   930      435    0.467742  0    0     20.5  50.2  29.3      0     0
37   763      404    0.529489  0    0     14    70    16        0     0
38   724      405    0.559392  0    0     14    68    18        0     0
39   674      405    0.600890  0    0     14    66    20        0     0
40   715      411    0.574825  0    0     14    64    22        0     0
41   738      417    0.565041  0    0     14    62    24        0     0
42   773      422    0.545925  0    0     14    59    27        0     0
43   815      427    0.523926  0    0     14    57    29        0     0
44   1097.3   633    0.576871  0    2     0     0     0         81.5  16.5
45   1086     635    0.584715  0    4     0     0     0         79.5  16.5
46   1058.1   637    0.602022  0    6     0     0     0         77.5  16.5
47   1135.9   645    0.567832  0    8.2   0     0     0         75    16.8
48   1153.6   652    0.565187  0    10.2  0     0     0         73    16.8
49   862.7    428    0.496117  0    36    14    50    0         0     0
50   785.6    404    0.514257  0    26    14    60    0         0     0
51   731      395    0.540356  0    20    14    66    0         0     0
52   792.7    391    0.493251  0    14    14    72    0         0     0
53   825.5    361    0.437311  0    10    14    76    0         0     0
Figure 8.14. Results of an inverse design problem for Zr-based BMGs (specified Tg = 680 K for several specified values of Tl [7]) showing inversely determined concentrations of Pd and (Cu, Ni) for these conditions.
Figure 8.15. Results of an inverse design problem for Zr-based BMGs (specified Tg = 680 K and several specified values of Tl [7]) showing inversely determined concentrations of Si and Pd for these conditions.
of alloys) are non-existent, one might be inclined to use a heuristic interpolation algorithm (such as artificial neural networks (ANNs) [13]) to search an existing large data set of a similar class of alloys and to interpolate these data in order to obtain a set of concentrations that most closely provides a specified set of alloy properties. However, ANNs require an unacceptably large "training" data set of experimentally obtained multiple thermo-physical properties for each class of alloys studied. In addition, ANNs are strictly interpolation algorithms that cannot themselves perform constrained optimization, nor can they extrapolate outside the initial data set with any confidence. When testing samples of actual alloys, there is always a certain level of measurement error due to the finite accuracy of the testing equipment. This level of expected accuracy can now be specified, and the results of the alloy composition optimization are automatically modified to reflect this degree of uncertainty. Furthermore, during the manufacturing (melting and casting/solidification) of each new alloy, there is always a degree of uncertainty as to whether the resulting alloy will have precisely the chemical composition that was expected when preparing and measuring the masses of the alloying components. The level of this uncertainty depends on the level of sophistication of the alloy manufacturing process. We have now incorporated this feature into our alloy optimization software, whereby the materials designer can specify the accuracy level of the manufacturing process and the optimizer will automatically and appropriately modify the predicted quantities.
8.8
Conclusions
A new concept has been developed for designing alloys having specified multiple physical properties. The design variables are concentrations of the alloying elements and the names of the alloying elements themselves. This inverse method was formulated as a constrained multi-objective optimization problem and solved using a robust evolutionary optimizer of IOSO type. As a result, multiple choices are obtained for combinations of concentrations of alloying elements whereby each of the combinations corresponds to another Pareto front point and satisfies the specified physical properties. This inverse alloy design methodology does not require knowledge of metallurgy or crystallography and is directly applicable to alloys having an arbitrary number of alloying elements.
Acknowledgements The authors are grateful for the financial support provided for this work by the US Army Research Office under the grant DAAD 19-02-1-0363 and partially by the US Department of Energy under the grant DE-FC07-01ID14252. The
References
219
authors are also grateful for the in-kind support provided by their employing institutions.
References [1] G. S. Dulikravich, I. N. Egorov, V. K. Sikka and G. Muralidharan, Semi-stochastic optimization of chemical composition of high-temperature austenitic steels for desired mechanical properties, 2003 TMS Annual Meeting, Yazawa International Symposium: Processing and Technologies, TMS Publication, Editors: F. Kongoli, K. Itakagi, C. Yamaguchi and H.-Y. Sohn, San Diego, CA, March 2-6, 1, 801-814, 2003. [2] I. N. Yegorov-Egorov and G. S. Dulikravich, Optimization of alloy chemistry for maximum stress and time-to-rupture at high temperature, paper AIAA-20044348, 10th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Editors: A. Messac and J. Renaud, Albany, NY, Aug. 30-Sept. 1, 2004. [3] I. N. Egorov-Yegorov and G. S. Dulikravich, Chemical Composition design of superalloys for maximum stress, temperature and time-to-rupture using selfadapting response surface optimization, Materials and Manufacturing Processes, 20 (3) 569-590, 2005. [4] G. S. Dulikravich and I. N. Egorov-Yegorov, Robust optimization of concentrations of alloying elements in steel for maximum temperature, strength, timeto-rupture and minimum cost and weight, ECCOMAS-Computational Methods for Coupled Problems in Science and Engineering, Editors: M. Papadrakakis, E. Onate and B. Schrefler, Fira, Santorini Island, Greece, May 25-28, 2005. [5] G. S. Dulikravich and I. N. Egorov-Yegorov, Design of alloy’s concentrations for optimized strength, temperature, time-to-rupture, cost and weight, Sixth International Special Emphasis Symposium on Superalloys 718, 625, 706 and Derivatives, TMS Publications, Editor: E. A. Loria, Pittsburgh, PA, October 2-5, 419-428, 2005. [6] G. S. Dulikravich, I. N. Egorov and M. J. Colaco, Optimizing chemistry of bulk metallic glasses for improved thermal stability, Modelling and Simulation in Materials Science and Engineering, 16(7), 075010-075029, 2008. [7] G. S. Dulikravich and I. N. Egorov, Optimizing chemistry of bulk metallic glasses for improved thermal stability, Symposium on Bulk Metallic Glasses. TMS 2006 Annual Meeting & Exhibition, Editors: P. K. Liaw and R. A. Buchanan, San Antonio, TX, March 12-16, 2006. [8] I. N. Yegorov-Egorov and G. S. Dulikravich, Inverse design of alloys for specified stress, temperature and time-to-rupture by using stochastic optimization, Proceedings of International Symposium on Inverse Problems, Design and Optimization - IPDO, Editors: M. J. Colaco, G. S. Dulikravich and H. R. B. Orlande, Rio de Janeiro, Brazil, March 17-19, 2004. [9] I. N. Egorov, Indirect optimization method on the basis of self-organization, curtin university of technology, Optimization Techniques and Applications (ICOTA’98), Perth, Australia, 2, 683-691, 1998. [10] I. N. Egorov and G. S. Dulikravich, Calibration of microprocessor control systems for specified levels of engine exhaust toxicity, 2003 JUMV Conference-Science and
220
8 Inverse Design of Alloys’ Chemistry
Motor Vehicles, Editor: C. Duboka, Belgrade, Serbia and Montenegro, May 2728, 2003. [11] H. R. Madala and A. G. Ivakhnenko, Inductive Learning Algorithms for Complex System Modeling, CRC-Press, 1994. [12] R. J. Moral and G. S. Dulikravich, A hybrid self-organizing response surface methodology, paper AIAA-2008-5891, 12th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Victoria, British Columbia, Canada, September 10-12, 2008. [13] H. K. D. H. Bhadeshia, Neural networks in materials science, ISIJ International, 39, 966-979, 1999.
Authors Information G. S. Dulikravich Department of Mechanical and Materials Engineering, Multidisciplinary Analysis, Inverse Design, Robust Optimization and Control (MAIDROC) Laboratory, Florida International University, Miami, Florida 33174, USA. E-mail: [email protected] I. N. Egorov SIGMA Technology, Moscow 107023, Russia.
Chapter 9
Two Approaches to Reduce the Parameter Identification Errors Z. H. Xiang
Abstract. As a typical inverse problem, the parameter identification procedure is ill-posed in nature. Consequently, the identified parameters are likely to contain some errors due to the polluted measurements. To reduce this kind of identification error, this chapter discusses two possible solutions. One is using an integrated algorithm that alternatively selects the optimal sensor placement and conducts the parameter identification. Another is called the modified extended Bayesian method, which is a kind of regularization method under the Bayesian estimation framework, but updates the a-priori information according to the identified parameters at each iteration step. Both of these approaches are developed through well-posedness analyses and emphasize the adaptive adjustment of a-priori information, so that the reliable identification results could be ensured.
9.1
Introduction
A widely used methodology in science and engineering is using an abstract model to describe a physical system. The model parameters could represent material properties, damage information, or other characteristic features of the interested system. And these parameters have to be identified from physical observations. In contract to the direct analysis, i.e., solving system responses with given parameters under certain external exciting, the parameter identification is a kind of inverse procedure, which uses some measured responses to identify the system parameter and sometimes even the external exciting should also be recovered. Ideally, this could be achieved by requiring calculated responses based on the identified parameters being equal to the measurements at some locations. Because of this requirement, the identification procedure is ill-posed in nature [1, 2]. Consequently, the resultant parameters could have some bias from true values due to some uncertainties.
222
9 Two Approaches to Reduce the Parameter Identification Errors
One source of uncertainty comes from the inaccuracy of the model. How to establish a proper model to describe the physical system is an everlasting topic in science community. People may still remember the famous disputation about the model of the solar system. Claudius Ptolemy adjusted the epicycles of his earth centered planetary model according to the observations. Nicolaus Copernicus just proposed a very simple model, in which the sun is located at the center of planetary system. Both of these two models could approximately explain the existing observation, and in many cases, Ptolemy’s model even shown higher accuracy. However, when new observations came up, they could be satisfactorily explained by Copernicus’ model without any modification, while the similar effect could only be achieved by adjusting the epicycles in Ptolemy’s complicated model. For this reason, people eventually prefer Copernicus’ model to Ptolemy’s model. From this story, one may roughly conclude that a good model should be as simple as possible and can not only explain the past but also predict the future. According to the second property of the good model, a practical way to evaluate the parameter identification results is using the so called cross-validation method [3], i.e., just using part of the observations to identify parameters, and then compare the calculated response with all observations. Another source of uncertainty comes from inevitable measurement noises, denoted as ε in this chapter. Assuming these noises are additive, the measured response can be represented as x ¯ = x∗ + ε, where x∗ is the true response. To reduce the influence of these measurement noises on identified parameters, one can formulate the parameter identification procedure as a nonlinear optimization problem with certain objective function f (p, x ¯): Min f (p, x ¯) , s.t. F (p, x) = 0, p ∈ Dp ,
(9.1.1)
where p is the model parameter; Dp is the admissible set of p; F (p, x) = 0 is the mathematical model that describes a physical system, which can be linear or nonlinear; and x is the calculated response based on the mathematical model. The general form of the objective function can be obtained from the extended Bayesian estimation as [4, 5]: , , 0 T −1 Cp p − p0 , f (p, x ¯) = RT C−1 u R+β p−p
(9.1.2)
where Cu and Cp are the covariance of measurements and parameters, respectively; p0 is the pre-estimated parameter; β is a nonnegative scalar and R is the residual vector: R = S (¯ x − x) , (9.1.3)
9.2 The Optimal Sensor Placement Design
223
where S is a selective matrix that determines the number and location of measurements; Cu , Cp , p0 and β are all a-priori information, which could be difficult to obtain in practice. If some a-priori information were unknown, the general objective function in (9.1.2) could be degenerated to other forms. For example, when one has no idea of p0 , i.e., Cp → ∞, the objective function in (9.1.2) becomes the maximum likelihood formulation, which is actually a kind of weighted Least Squares (LS) estimation [6]. Furthermore, it becomes the ordinary LS estimation, when all measurements are uncorrelated and have the same variance σ 2 , i.e., Cu = Iσ 2 (I is the identity matrix). Actually, the second term on the right hand side of (9.1.2) can be regarded as a kind of Tikhonov regulator [7, 8, 9], which can ensure the well-posedness of the parameter identification procedure. However, the regularization parameter β should be carefully selected to compromise between the stability and the accuracy of the parameter identification. So far, this is the most popular approach to reduce the parameter identification errors. However, if reanalysis (9.1.2) and (9.1.3), one may find another possible approach, i.e., trying to find the optimal selective matrix S. This approach is usually called the optimal sensor placement design, which was initiated in the aerospace engineering, then quickly spread out to other branches of science and engineering [10]. This chapter will discuss both the optimal sensor placement design and the regularization method. The main idea is proposing practical methods based on the well-posedness analysis of parameter identification procedures and emphasizes the adaptive adjusting of a-priori information.
9.2
The Optimal Sensor Placement Design
The sensor placement design tries to find the optimal number and location of sensors. For this purpose, one needs a proper definition of optimality and an efficient combinatorial optimization algorithm. In the following text, a criterion of optimal placement is established based on the well-posedness analysis of parameter identification procedure, and a heuristic is constructed with efficient local searches and an effective diversification mechanism. Moreover, the sensor placement design and the parameter identification are alternatively conducted to update some uncertain a-priori information.
9.2.1
The well-posedness analysis of the parameter identification procedure
Supposing there is not any a-priori information available, the objective function in (9.1.1) can be simplified as: f (p, x ¯) = RT R. To solve this LS problem, one
224
9 Two Approaches to Reduce the Parameter Identification Errors
can use the efficient Gauss-Newton method iteratively: , pk = G pk−1 , k = 1, 2, . . . ,
(9.2.1)
where k is the iteration step; and G is the mapping function [11]: , -−1 G (p) = p − JT J JT R.
(9.2.2)
Referring to (9.1.3), the Jacobian matrix J in (9.2.2) is: ∂R ∂x = −S . (9.2.3) ∂p ∂p , During the iteration of (9.2.1), it requires G pk ∈ Dp , otherwise pk should be pull back to the boundary of Dp . This iteration stops until the following conditions are satisfied: , k - ' S x − xk−1 pk −pk−1 2 i i < T ol1 , max pk < T ol2 , i = 1, 2, . . . , Np , (9.2.4) i Sxk 2 J=
where · 2 is the Euclidian norm; Np is the total number of parameters; T ol1 and T ol2 are two tolerances. Usually, each parameter locates in a continuous, bounded and convex admissible set. Therefore, according to the Brouwer’s fixed-point theorem [12], if the mapping function G in (9.2.2) is continuous, at least one solution exists. This can be satisfied if the Jacobian matrix J is continuous and the Fisher information matrix JT J is nonsingular. This requires that the interested response x is very sensitive to the parameter p and the number of measurements is not less than Np . Suppose p1 and p2 are arbitrary two sets of parameters, p1 , p2 ∈ Dp and G (p1 ) − G (p2 ) =
∂G (ζ) (p1 − p2 ) , ∂p
where ζ = p2 + η (p1 − p2 ), 0 < η < 1 and ! , " , T -−1 ∂ JT J , T -−1 T ∂JT ∂G (p) ≡ Ω (p) = J J J J J − R. ∂p ∂p ∂p
(9.2.5)
(9.2.6)
Then, it is easy to obtain: G (p1 ) − G (p2 ) ∞ ≤ Ω (ζ) ∞ p1 − p2 ∞ ≤ LΩ p1 − p2 ∞ ,
(9.2.7)
where · ∞ is the norm of infinity and LΩ ≡ max Ω (p) ∞ .
(9.2.8)
9.2 The Optimal Sensor Placement Design
225
Because G (p) ∈ Dp , according to the contraction mapping theorem [12], the convergence and uniqueness of this iteration can be guaranteed if LΩ < 1. This is a sufficient but not necessary condition. The mapping function G (p) defined in (9.2.2) reveals that the parameter identification procedure would be very unstable if the Fisher information matrix JT J was nearly singular. This happens if neither the response x is not very sensitive to the parameter p, nor the sensor placement, determined by S, is not properly designed, referring to (9.2.3). In this case, small measurement noises may lead to large identification errors. To discuss the stability of parameter identification procedure, it is better to trace the propagation of measurement noises through the identification iterations. For this purpose, one can define the difference between the identified parameters pk and the true parameter p∗ at the k-th step as: hk = pk − p∗ .
(9.2.9)
Referring to x ¯ = x∗ + ε, (9.1.3), (9.2.1), (9.2.2) and (9.2.9), it obtains: , hk − hk−1 = G pk−1 − pk−1 ! " , , ∂G pξk−1 , εξk−1 ∂G pξk−1 , εξk−1 ξk−1 k−1 = + , −I h ε ∂p ∂ε (9.2.10) , where pξk−1 = p∗ + γ pk−1 − p∗ , 0 < γ < 1, εξk−1 ∈ (0, ε). Define -−1 T , ∂G (p, ε) J Sε (9.2.11) ε = − JT J A (p, ε) ≡ ∂ε and refer to (9.2.6), (9.2.10) can be rewritten as: (9.2.12) hk = Ω pξk−1 , εξk−1 hk−1 + A pξk−1 , εξk−1 . Defining LA ≡ max A (p, ε)∞ ,
(9.2.13)
it yields: k h ≤ LΩ hk−1 + LA ∞ ∞ ≤ L2Ω hk−2 ∞ + (1 + LΩ ) LA ... ≤ LkΩ δp∞ + 1 + LΩ + · · ·
If LΩ < 1,
lim hk
k→∞
∞
≤
k−1 + LΩ
(9.2.14)
LA .
1 LA ≈ (1 + LΩ ) LA . 1 − LΩ
(9.2.15)
226
9 Two Approaches to Reduce the Parameter Identification Errors
This means the parameter identification procedure would converge to the solution with bias (1 + LΩ ) LA , if LΩ < 1. This coincides with the sufficient condition of uniqueness. The upper bound of parameter identification error B ≡ (1 + LΩ ) LA can be approximately evaluated through the following estimations of LΩ and LA : Ω (p, ε) ≈ Ω (p∗ , 0) +
∂Ω (p∗ , 0) ∂Ω (p∗ , 0) δp + ε, ∂p ∂ε
(9.2.16)
where δp = p − p∗ . Since R (p∗ , 0) = 0, according to (9.2.6), it is clear that Ω (p∗ , 0) = 0 and -−1 ∂J , ∂Ω (p∗ , 0) ∗ C (p ) ≡ , (9.2.17) = JT J JT ∂p ∂p p=p∗ ∂Ω (p∗ , 0) D (p∗ , ε) ≡ ε= ∂ε
,
" $ ! , T T , J ∂ J ∂J −1 −1 JT J JT J Sε JT − ∂p ∂p
. p=p∗
(9.2.18) Therefore, LΩ ≈ C (p∗ )∞ δp∞ + D (p∗ , ε)∞ . Similarly, A (p, ε) ≈ A (p∗ , ε) +
∂A (p∗ , ε) δp. ∂p
(9.2.19)
(9.2.20)
According to (9.2.11): , -−1 A (p∗ , ε) = − JT J JT Sε
p=p∗
,
(9.2.21)
∂A (p∗ , ε) = D (p∗ , ε) . ∂p
(9.2.22)
LA ≈ A (p∗ , ε)∞ + D (p∗ , ε)∞ δp∞ .
(9.2.23)
Therefore,
9.2.2
The algorithm for optimal sensor placement design
According to the well-posedness analysis in Section 9.2.1, if LΩ < 1 the parameter identification procedure converges to a unique solution with error B ≡ (1 + LΩ ) LA . If let δp = 0, for the given p∗ and the measurement noise ε, B is only determined by the selective matrix S. The best S should minimize B while ensure LΩ < 1. With this criterion, the optimal sensor placement can be obtained with the help of a combinatorial optimization algorithm. However, it
9.2 The Optimal Sensor Placement Design
227
is well known that this is an N P hard problem. Therefore, an efficient heuristic is needed, which should consist of a construction part, an intensification mechanism and a diversification mechanism [13]. In the construction part, all Nm candidate measurements are put into NS = Nm /Np ! measurement sets. Calculating LΩ and B for each set, and keep the best set Sbest as the initial solution that has the minimum B with LΩ (Sbest ) < 1. Starting from the initial solution, an intensification mechanism is needed to find better solutions in some local regions. As Figure 9.1 shows, three efficient elementary local search operations are alternatively conducted for this purpose.
Figure 9.1. The intensification mechanism.
The Exchange operation tries to exchange a pair of measurements between a set S1 and a set S2 . If either B (S1 ) or B (S2 ) is reduced, keep this exchange and stop; otherwise recover S1 , S2 and continue this process until each pair of measurements between S1 and S2 has been tested. During this process, update the best measurement set Sbest from S1 or S2 . The Move 1 operation tries to move one measurement from a set S1 into a set S2 . If either B (S1 ) or B (S2 ) is reduced, keep this move and stop; otherwise recover S1 , S2 and continue this process until every measurement in S1 has been tested. During this process, update the best set Sbest from S1 or S2 . The Move 2 operation tries to move two measurements from a set S1 into a set S2 . If either B (S1 ) or B (S2 ) is reduced, keep this move and stop; otherwise recover S1 , S2 and continue this process until every two measurements in S1 has been tested. During this process, update the best set Sbest from S1 or S2 .
228
9 Two Approaches to Reduce the Parameter Identification Errors
The above local searches try to find the best set that has the minimum parameter identification error B with LΩ < 1. This is the primary objective of sensor placement design, and the resultant best measurement set is denoted as Sopt (B). However, it is well known that local searches are apt to sticking into local optima and consequently, miss the global optimal solution. To help the local search process jumping out of local optima, it usually requires the algorithm can temporarily accept worse solutions. This is called the diversification mechanism, which is the key point of many modern meta-heuristics, such as simulated annealing, generic algorithm, etc. For this purpose, the diversification mechanism adopted in the proposed heuristic uses a similar local search process as that presented in Figure 9.1, but tries to find the best set that minimizes LΩ instead of B. This is called the secondary objective [13] and the resultant best measurement set is denoted as Sopt (LΩ ). Because Sopt (LΩ ) is slightly different from Sopt (B), the local search process can be guided to a nearby new search region by temporarily accepting worse solutions. Then, better solutions may be found in the new region. The flow chart of this heuristic is plotted in Figure 9.2. Because only local optimum can be obtained by this heuristic, the resultant measurement set is only a good one instead of the global optimal solution. However, it could be sufficient for engineering practice at this low cost.
Figure 9.2. The intensification mechanism.
9.2 The Optimal Sensor Placement Design
9.2.3
229
The integrated optimal sensor placement and parameter identification algorithm
For the given p∗ and the measurement noise ε, a good selective matrix S can be obtained by using the heuristic presented in Section 9.2.2. With these optimized measurements, the identified model parameter would have small bias from p∗ . This seems like a Chicken or the Egg causality dilemma. And sometimes, even the measurement noise is very unclear. To solve the above problem, one practical solution is using an integrated algorithm that can adaptively adjust the unclear a-priori information of p∗ and ε according to intermediate results obtained from observations. This integrated algorithm can be proposed as: (1) Set k = 1. Empirically set the initial guess of p∗ and ε, which are denoted as p0 and ε0 . Empty the good measurement set, i.e., S0 = Φ; (2) Let p∗ = pk−1 and ε = εk−1 , use the heuristic presented in Figure 9.2 to search for a good measurement set Sk ; (3) If Sk = Sk−1 , stop; otherwise, go to Step (4); (4) Based on Sk , use the algorithm presented in Section 9.2.1 to identify parameter pk and calculate the response xk ; ¯ − xk ; (5) Update the guess of measurement noise εk = x (6) Set k = k + 1, go to Step (2).
9.2.4
Examples
As Figure 9.3 shows, a simply-supported beam has a uniform rectangular cross section in dimensions of 20 × 10 cm and with the density of 2400 kg/m 3 . It is equally divided into five regions with the Young’s moduli of E1 = E2 = E4 = E5 = 30 GPa and E3 = 25 GPa, respectively. In the following text, the natural frequency and mode shape (i = 1, 2, . . . , N ) of this beam are going to be used to identify the Young’s modulus of each region.
Figure 9.3. The intensification mechanism.
Firstly, the first four natural frequencies and the first mode shape at all 21 points (see Figure 9.3) calculated by Finite Element Method (FEM) are taken as the ‘measurements’ for the parameter identification. The initial guesses of all Young’s Moduli are 5 GPa, which are far from the true values. As Figure 9.4
230
9 Two Approaches to Reduce the Parameter Identification Errors
shows, after only four iteration steps without optimal measurement selection, all parameters can be successfully identified. Moreover, if fewer measurements with the first four natural frequencies and the first mode shape at points of 1, 5, 9, 13, 17 and 21 are used, all parameters can also be successfully identified. The iteration procedure is very similar to that in Figure 9.4.
Figure 9.4. The intensification mechanism.
However, noise free measurements do not exist in practice. Let the normal¯ and X∗ are the measured and the true mode shapes, ized (to unity) vector X respectively. The measurement noise level of a mode shape can be roughly defined as: 9 :m : , -2 X¯i − Xi∗ , (9.2.24) α=; i=1
where m is the number of mode shape points. Assuming the absolute measurement noise at each mode shape point is the same Δ, one can easily get the value of Δ from (9.2.24): α Δ= √ . (9.2.25) m Therefore, for a given noise level α, it is easy to calculate the polluted mea¯ based on the FEM result of X∗ : surement X ¯ = X
X∗ + Δ . X∗ + Δ2
(9.2.26)
¯ − X∗ . Then, the measurement noise can be obtained as εX = X To evaluate the parameter identification result from polluted measurements, 1% positive noises are added in the first four natural frequencies and α = 10% noises are added in the first mode shape ‘measured’ at 11 points (nodes 1, 3, 5, . . . , 19, 21, see Figure 9.3). As Figure 9.5 shows, the identified parameters are very sensitive to measurement noises, so that the parameter identification procedure cannot converge even if the initial parameters are taken as the true values. This observation coincides with the fact that natural frequencies and
9.2 The Optimal Sensor Placement Design
231
mode shapes are structural global properties, which are not sensitive to local change of stiffness [14]. Therefore, the Fisher information matrix JT J in (9.2.2) is not well conditioned. Small noises in measurements will lead to large perturbations in identified parameters.
Figure 9.5. Parameter identification procedure with polluted measurements.
Then taking the same measurements and the initial values, the integrated algorithm proposed in Section 9.2.3 (without the measurement noise updating) is used to find the good measurements. Unfortunately, the program failed to find any good measurement set. The failure continues until the true first four natural frequencies and some mode shapes polluted with α = 10% noises are used. As Table 9.1 shows, five cases have been tested. Case 1 through Case 4 have only one polluted mode shape and Case 5 contains all the first four polluted mode shapes. It observes from Table 9.1: (1) If only the natural frequencies and the first or the second mode shapes are used, the program cannot find any good measurement set. (2) If the third or the fourth mode shape is used, good measurement set can be found and all parameters are quickly identified with the errors less than the estimation of B = (1 + LΩ ) LA . (3) The identification errors of Case 5 are larger than those of Case 3. This indicates that the heuristic proposed in Figure 9.2 can find only good measurement set instead of the optimal one. (4) If all measurements are used without optimal selection, the parameter identification either fails or converges to the results with larger identification errors than the optimized ones. (5) When all measurements are used in Case 2, although LΩ > 1, the parameter identification procedure still converges. This is because LΩ < 1 is just a sufficient but not necessary condition of the convergence. If LΩ > 1, the parameter identification would diverge or converge to results with very large errors.
Good measurement set
E5
E4
E3
E2
E1
Iteration steps
LΩ
B = (1 + LΩ )LA (10GPa)
Model 4
Model 3
Frequency Model 1 Model 2
/
/
/
/
/
/ (3.433958) / (1.699820) / (3.076546) / (5.978787) / (2.357690)
/ (5.069158) / (2.272) diverged (19)
/
/
/ (0.171250) / (33.634) diverged (diverged)
Case 2 / / / /
Case 1 / / / /
3.000136 (2.999318) 2.999585 (2.999682) 2.500436 (2.501743) 2.999585 (2.999682) 3.000136 (2.999318)
0.000480 (0.002565) 0.098 (0.470) 2 (2)
Case 3 1, 2, 4 / / 3, 5, 7, 11, 15, 17, 19 /
2.933297 (2.814913) 3.071625 (3.008291) 2.499584 (2.510786) 2.931186 (3.000000) 3.071081 (3.181542)
0.085833 (0.297000) 0.168 (0.343) 5 (8)
3, 5, 9, 13, 15, 19
Case 4 1, 2, 3 / / /
3.002349 (3.060870) 2.997784 (2.781328) 2.500284 (2.501764) 3.000484 (3.235985) 2.998789 (2.946059)
0.002514 (0.302505) 0.053 (0.317) 4 (5)
Case 5 1, 2, 4 3, 7, 13 3, 13, 17, 19 9, 11, 15, 17, 19 3, 5, 7, 9, 13, 17, 19
Note: the bold values have the maximum identification error; and the values in parenthesis are results from all measurements without optimal selection.
Identified parameters (10GPa)
232 9 Two Approaches to Reduce the Parameter Identification Errors
Table 9.1. The identified parameters with known measurement noises.
9.3 The Regularization Method
233
As what discussed in Section 9.2.3, the a-priori information of p∗ and ε are usually uncertain in practice. Therefore, the integrated algorithm is used to update the a-priori information at each parameter identification step. For this beam example, supposing the initial guesses of the Young’s modulus of all regions are 20GPa, and there are 1% positive errors in measured natural frequencies and α = 10% noises in measured mode shapes at 11 points (nodes 1, 3, 5, . . . , 19, 21, see Figure 9.3). Because natural frequencies are very insensitive to local stiffness, it just fix εkλ = 0 (k = 0, 1, 2, . . . ) so that good measurement sets could be found. In addition, supposing the true mode shapes are unknown, the initial guess of measurement noise of mode shape could be ¯ −X ˜ ∗ , where X ˜ ∗ is the guess of true normalized mode shape solved ε0X = X ∗ ˜ ¯ = X +Δ . As Table 9.2 shows, the parameter identification procedure from X ˜ ∗ +Δ X 2 would not converge if εX was fixed. In addition, if all measurements were used, the identified parameters would have larger errors than those obtained from the integrated algorithm.
Good measurements
Table 9.2. The identified parameters with guessed measurement noises.
Identified parameters (10GPa)
Frequency Model 1 Model 2 Model 3 Model 4 Iteration steps
9.3
E1 E2 E3 E4 E5
Use all measurements
Fix εX
Update εX
All All All All All 7 3.122372 2.837269 2.552086 3.301066 3.005202
/ 3, 5, 13, 15 / / / Diverged / / / / /
1, 2 5, 7, 15, 17, 19 All All 4 3.001390 3.273441 2.559989 2.867603 3.107655
The Regularization Method with the Adaptive Updating of A-priori Information
As what discussed in Section 9.1, the Extended Bayesian Method (EBM) can be regarded as a kind of Tikhonov regulator, which is very effective to stabilize the parameter identification procedure. However, the regularization parameter β should be carefully selected so that the identified parameter is not very close to the initial guess p0 when the a-priori information is not so reliable. For this purpose, this section proposes a Modified Extended Bayesian Method (MEBM),
234
9 Two Approaches to Reduce the Parameter Identification Errors
which is also based on the idea of adaptively adjusting the a-priori information at each parameter identification step.
9.3.1
Modified extended Bayesian method for parameter identification
Assuming the measurement noise is Gaussian with zero mean N (0, Cu ), the a-priori information of the parameters from EBM in (9.1.2) should also fol 1 0 low a Gaussian distribution of N p , β Cp . It should be noticed that these a-priori information is very empirical and mathematically has nothing to do with the measurements. However, the measurements do provide some useful information, which can be used for updating the uncertain a-priori information. Therefore, the EBM can be modified by assuming the a-priori information of the 1 k−1 parameters follows the Gaussian distribution of N p , β k−1 Cpk−1 at each iteration step k (k = 1, 2, . . . ). Thus, the objective function of the proposed MEBM is: T T k k−1 k k−1 f pk , x pk − pk−1 C−1 p . (9.3.1) ¯ = Rk C−1 − p k−1 u R +β p Consequently, the mapping function in (9.2.2) is changed to:
G
k−1
p
=p
k−1
−
Jk−1
T
k−1 C−1 + β k−1 C−1 u J pk−1
−1
Jk−1
T
k−1 C−1 . u R
(9.3.2) When k = 1, p0 can be empirically specified, and for simplicity, β 0 C−1 p0 can be taken as a diagonal matrix with the same diagonal items as matrix , 0 -T −1 0 J Cu J . When k > 1, Cpk−1 is also assumed to be a diagonal matrix, in which the diagonal items are the square of the corresponding items in Δpk−2 = pk−1 −pk−2 . The parameter β k−1 acts as a scale factor that ensures the matrices β k−1 C−1 pk−1 , k−1 -T −1 k−1 and J Cu J have the same maximum items. In this way, the matrix can be automatically updated following its original statistical meanβ k−1 C−1 pk−1 ing, and at the same time balancing the magnitude of the two terms on the right side of (9.3.1), which is also the motivation of how to select β in Bayesian estimation [15, 16].
9.3.2
The well-posedness analysis of modified extended Bayesian method
The well-posedness analysis of MEBM is very similar as that in Section 9.2.1. into the mapping function G in The main difference is introducing β k−1 C−1 pk−1
9.3 The Regularization Method
235
(9.3.2). Consequently, T −1 , , −1 −1 ∂ (J Cu J) JT C−1 J + βC−1 −1 JT C−1 R Ω (p) = I + JT C−1 u J + βCp u p u ∂p , T −1 , T −1 −1 −1 ∂JT −1 −1 −1 T −1 − J Cu J + βCp J Cu J, ∂p Cu R − J Cu J + βCp (9.3.3) , T −1 −1 −1 T −1 J Cu Sε. (9.3.4) A (p) = − J Cu J + β Cp
It notices matrix JT C−1 u J is diagonally dominated. Its magnitude, de= < Tthat −1 noted as J Cu J , can be approximated as the magnitude of its diagonal item, say θ: < T −1 = J Cu J ∼ θ. (9.3.5) Consequently, the following relations can be obtained: = < T −1 ∼ 2θ, J Cu J + βC−1 p J ∼
x , p
< −1 = Cu ∼
(9.3.6) (9.3.7)
1 , x2
R ∼ x .
(9.3.8) (9.3.9)
From (9.3.5), (9.3.7) and (9.3.8), it is obvious that: p2 ∼ θ.
(9.3.10)
Substituting (9.3.5) through (9.3.10) into (9.3.3) and (9.3.4), yields: < = 1 1 1 1 LΩ ∼ Ω ∼ 1 + − − = , 4 2 2 4
(9.3.11)
< = 1 (9.3.12) LA ∼ A ∼ p ε . 2 Therefore, the magnitude of the estimation of parameter identification error is: B = (1 + LΩ ) LA ∼
5 p ε . 8
(9.3.13)
From above analysis, it is clear that LΩ < 1 can always be satisfied and the parameter identification procedure converges to a solution with small bias from the initial guess of p∗ .
236
9.3.3
9 Two Approaches to Reduce the Parameter Identification Errors
Examples
Figure 9.6 shows an embankment with three layers of different materials, which is subjected to a linearly distributed load. All displacements calculated by FEM are taken as ’measurements’ to identify the Young’s modulus E of each layer. The parameter identification procedure starts with p0 = p∗ , and uses the measurements polluted with ±1 relative noises.
Figure 9.6. A three-layered embankment.
Using the method in Section 9.2, one can obtain the best measurement set that contains y displacements at nodes 4, 6 and 7 (see Figure 9.6). The identified results are listed in columns of ‘Sec. 9.2’ in Table 9.3. It observes that the identification errors are almost the same as the estimated values. This gives another validation of the algorithm proposed in Section 9.2. In addition, the program also points out a bad measurement set that contains −1 relative noise in the y displacement of node 5 and the x displacement of node 2, and +1 relative noise in the x displacement of nodes 1 and 3. In this case, the value of LΩ is about 1.15. Consequently, the parameter identification cannot converge. The above best and bad measurement sets are also adopted for the parameter identification using the MEBM with the guess that the standard variance of the measurements is 1×10−6 m. As Table 9.3 shows, with the best measurement set, the identification errors by the MEBM are much smaller than those from the method in Section 9.2. In addition, with the bad measurement set, the MEBM can still converge to a reasonable results with the maximum identification error of 19009Pa (E1 = 5.019009MPa, E2 = 3.001548MPa and E3 = 1.005529MPa). It seems that, the MEBM could converge to a reasonable result even with bad measuremet set. This result could be greatly improved if optimal measurements were used.
E1 (MPa) Sec. 9.2 MEBM ♣5.004989 ♣5.002786 5.003212 ♣5.002400 ♣5.005773 ♣5.002715 ♣5.003994 ♣5.002329 ♣4.995986 ♣4.997652 ♣4.994213 ♣4.997267 4.996760 ♣4.997581 ♣4.994990 ♣4.997196
E2 (MPa) Sec. 9.2 MEBM 3.002954 3.000683 ♣3.005297 3.000667 2.998326 2.999561 3.000658 2.999544 2.999311 3.000453 3.001642 3.000436 ♣2.994693 2.999330 2.997017 2.999314
E3 (MPa) Sec. 9.2 MEBM 1.000978 1.000005 1.002724 1.000219 0.999773 1.000330 1.001515 1.000544 0.998469 0.999453 1.000206 0.999668 0.997270 0.999778 0.999003 0.999992
B (Pa) Sec. 9.2 4989 5297 5773 3994 4014 5787 5307 5009
Note: ’+’: with +0.1% relative noise; ’−’: with −0.1% relative noise; ’♣’: with the maximum bias.
Measurements 4y 6y 7y − − − + − − − + − + + − − − + + − + − + + + + +
MEBM 2786 2400 2715 2329 2348 2733 2419 2804
9.3 The Regularization Method 237
Table 9.3. Comparison of results from the methods in Section 9.2 and the MEBM.
238
9.4
9 Two Approaches to Reduce the Parameter Identification Errors
Conclusion
As discussed in above sections, there are two approaches to reduce the parameter identification errors. One is based on the optimized sensor placement; another is based on regularization techniques. The common feature of the proposed two approaches is adaptively adjusting the a-priori information according to the intermediate results identified from observations. In such a way, reliable parameter identification results can be guaranteed to some extent. In addition, numerical examples reveal that it is promising to combine these two approaches to get better identification. That is to say, one can firstly use the integrated algorithm presented in Section 2 to obtain a reliable guess of the a-priori information and the good measurement set. Then based on these initial solutions, try to fine tune the parameter identification results by using the MEBM presented in Section 9.3. One may notice that all the examples presented in this chapter just use artificial measurements, which come from numerical analysis. This is because the author just wants to emphasize the algorithm itself in this chapter, assuming the model is perfect. How to establish a good model and select the sensitive response is closely related with the insight understanding of a specific physical system. This abroad topic is just beyond the scope of this chapter. If the reader is interested in the practical implementation of the proposed methods, he or she is recommended to read [15–18] for reference.
References [1] A. N. Tikhonov and A. V. Goncharsky, Ill-posed Problems in the Natural Sciences, Moscow, Mir Publishers, 1987. [2] V. G. Romanov, Inverse problems of mathematical physics, Utrecht, Vnuscience Press, 1987. [3] M. R. Forster, The new science of simplicity, Simplicity, Inference and Modelling, Editors: A. Zellner, H. A. Keuzenkamp and M. McAleer, UK, Cambridge University Press, 2002. [4] S. P. Neuman and S. Yakowitz, A statistical approach to the inverse problem of aquifer hydrology-1. Theory, Water Resources Research, 15, 845-860, 1979. [5] H. Yusuke, W. T. Lie and G. Soumitra, Inverse analysis of an embankment on soft clay by extended Bayesian method, International Journal for Numerical and Analytical Methods in Geomechanics, 18, 709-734, 1994. [6] H. W. Sorenson, Least-squares estimation: from Gauss to Kalman, IEEE Spectrum, 7, 63-68, 1970 [7] A. L. Dontchev and T. Zolezzi, Well-posed Optimization Problems, Berlin, Springer-Verlag, 1993. [8] H. W. Engl, M. Hanke and A. Neubauer, Regularization of inverse problems, Netherlands, Kluwer Academic Publishers, 1996.
References
239
[9] Y. F. Wang, A. G. Yagola and C. C. Yang, Optimization and Regularization for Computational Inverse Problems and Applications, Beijing, Higher Education Press, 2010. [10] S. L. Padula and R. K. Kincaid, Optimization strategies for sensor and actuator placement, Technique Report, NASA/TM-1999-209126, 1999. [11] E. Haber, U. M. Ascher and D. Oldenburg, On optimization techniques for solving nonlinear inverse problems, Inverse Problems, 16, 1263-1280, 2000. [12] D. H. Griffel, Applied functional analysis, Mineola, N.Y., Dover Publications, 2002. [13] Z. H. Xiang, C. B. Chu and H. X. Chen, A fast heuristic for solving a largescale static dial-a-ride problem under complex constraints, European Journal of Operational Research, 174, 1117-1139, 2006. [14] Z. H. Xiang and Y. Zhang, Changes of modal properties of simply-supported plane beams due to damage, Interaction and Multiscale Mechanics, 2, 171-193, 2009. [15] Z. H. Xiang, G. Swoboda and Z. Z. Cen, On the optimal layout of displacement measurements for parameter identification process in geomechanics, ASCE The International Journal of Geomechanics, 3, 205-216, 2003. [16] Z. H. Xiang, G. Swoboda and Z. Z. Cen, Parameter identification and its application in tunneling, Numerical Simulation in Tunnelling, Editor: G. Beer, Vienna, Springer-Verlag, 2003. [17] M. S. Zhou, Y. Q. Li, Z. H. Xiang, G. Swoboda and Z. Z. Cen, A modified extended Bayesian method for parameter estimation, Tsinghua Science and Technology, 12, 546-553, 2007. [18] Y. Q. Li, Z. H. Xiang, M. S. Zhou and Z. Z. Cen, An integrated parameter identification method combined with sensor placement design, Communications in Numerical Methods in Engineering, 24, 1571-1585, 2008.
Author Information Z. H. Xiang Department of Engineering Mechanics, Tsinghua University, Beijing 100084, P. R. China. E-mail: [email protected]
Chapter 10
A General Convergence Result for the BFGS Method Y. H. Dai
Abstract. The BFGS method is one of the most famous quasi-Newton algorithms for unconstrained optimization. For general functions, Powell (1976) showed that the BFGS method with Wolfe line searches is globally converyT y gent assuming that the quantity skT yk is uniformly bounded. A recent counterk
k
example by Dai (2010) indicates that, if this quantity increases at an exponential rate, the BFGS method may not converge even if the line search always picks the first local minimizer. In this note, we establish the convergence of the BFGS method with Wolfe line searches for general functions assuming that the quantity is at most linearly increasing.
10.1
Introduction
Consider the unconstrained optimization problem, min f(x),
x ∈ Rn ,
(10.1.1)
where f is smooth and its gradient g is available. The line search method generates the iterates {xk ; k ≥ 1} recursively by xk+1 = xk + αk dk ,
(10.1.2)
where x1 is a given starting point, αk is a step length via some line search and dk is the search direction. Having the current approximation matrices Bk to the Hessian of f at xk , the quasi-Newton method defines the search direction dk+1 = −Bk−1 gk ,
(10.1.3)
and then updates the approximation Bk to Bk+1 based on the curvature pair {sk , yk }, where sk = αk dk and yk = gk+1 − gk .
242
10
A General Convergence Result for the BFGS Method
The BFGS method is one of the most efficient quasi-Newton methods for solving this problem. It was proposed by Broyden [2], Fletcher [7], Goldfarb [9] and Shanno [19] individually, and is given by Bk+1 = Bk −
Bk sk sTk Bk yk ykT + . sTk Bk sk sTk yk
(10.1.4)
In addition to the BFGS method, another famous quasi-Newton method is called the DFP method, which was the first quasi-Newton method discovered by Davidon [5] and modified by Fletcher and Powell [8]. Combining the BFGS and DFP methods, Broyden [2] proposed a family of quasi-Newton methods: Bk+1 (θ) = Bk −
Bk sk sTk Bk yk ykT + T + θ(sTk Bk sk )vk vkT , sTk Bk sk sk yk
where θ ∈ R1 is a scalar, and
vk =
yk T sk yk
−
Bk s k T s k Bk s k
.
(10.1.5)
(10.1.6)
The choice θ = 0 gives rises to the BFGS update, whereas θ = 1 defines the DFP method. The step length αk in line search methods is required to meet certain conditions. If exact line search is used, αk satisfies f(xk + αk dk ) = min f(xk + αdk ). α>0
(10.1.7)
In practical implementations of the BFGS algorithm, one normally requires that step length αk satisfies the Wolfe conditions ([20]): f(xk + αk dk ) − f(xk ) ≤ δ1 αk dTk gk ,
(10.1.8)
dTk ∇f(xk + αk dk ) ≥ δ2 dTk gk ,
(10.1.9)
where δ1 ≤ δ2 are constants in (0, 1). For convenience, we call the line search that satisfies the Wolfe conditions (10.1.8)–(10.1.9) as the Wolfe line search. For uniformly convex functions, Powell [14] showed that the DFP algorithm with exact line searches stops at the unique minimum or generate a sequence that converges to the minimum. Dixon [6] found that all methods in the Broyden family with exact line searches produce the same iterations for general objective functions. For inexact line searches, Powell [16] first proved the global convergence of the BFGS algorithm for general functions with Wolfe yT y line searches if the quantity sTk y k is uniformly bounded, namely, k
ykT yk ≤ M, sTk yk
k
for some M > 0 and all k ≥ 1.
(10.1.10)
10.2 The BFGS Algorithm
243
This condition naturally holds for convex functions. Powell’s result was extended by Byrd, Nocedal and Yuan[1] to all methods in the restricted Broyden family with θ ∈ [0, 1). The following questions remain unanswered for many years (for instance, see [12, 21]): (i) does the DFP method with Wolfe line searches converge for convex functions? and (ii) does the BFGS method with Wolfe line searches converge for non-convex functions? A negative answer for the second question is given in Dai [3]. Mascarenhas [11] constructed a counter-example showing that the BFGS method may fail for general functions if the global exact line search is used. If the line search always picks the first local minimizer, Powell [17] proves the global convergence of the BFGS method for two-dimensional non-convex functions. Dai [4] further provided a four-dimensional counter-example addressing the non-convergence of the BFGS method even if the line search picks the first local minimizer. yT y From the counter-examples, it is not difficult to see that the quantity skT y k k k increases to infinity at an exponential rate. We will give the BFGS algorithm with details in the next section. Then in Section 3, we will give a general yT y convergence theorem for the BFGS method, which says that if the quantity skT y k k k increases at most linearly, then the BFGS algorithm with Wolfe line searches is globally convergent for general objective functions. Conclusion and discussions are given in the last section.
10.2
The BFGS Algorithm
In this section, we give the details of the BFGS algorithm considered in this chapter. Algorithm 10.2.1 (The Broyden-Fletcher-Goldfarb-Shanno algorithm). 0. Given x1 ∈ Rn ; B1 ∈ Rn×n positive definite; k := 1. 1. Compute gk = ∇f(xk ); If gk = 0 then stop; Set dk = −Bk−1 gk ; 2. Calculate αk by the Wolfe line search (10.1.8)–(10.1.9); Set xk+1 = xk + αk dk ; 3. Calculate sk = αk dk and yk = gk+1 − gk ; Update Bk+1 by (10.1.4). 4. k := k + 1; go to Step 1.
244
10
A General Convergence Result for the BFGS Method
For the parameters in the Wolfe line search, typically values for δ1 and δ2 are δ1 = 0.01 (or a smaller value) and δ2 = 0.9, respectively. Under mild assumptions on the objective function, we know that there always exists a step length satisfying the Wolfe conditions (10.1.8)–(10.1.9), see for example [21]. Further, the Wolfe line search ensures a positive curvature pair {sk , yk } to be found; namely, sTk yk > 0. Consequently, the positive definiteness of Bk+1 follows from this condition and the positive definiteness of Bk . Since the starting matrix B1 is chosen to have this property, we know that all matrices {Bk } will be positive definite and the algorithm is well defined.
10.3
A General Convergence Result for the BFGS Algorithm
We make the following basic assumption on the objective function. Assumption 10.3.1. (1) The level set L = {x ∈ Rn : f(x) ≤ f(x1 )} is bounded; (2) In some neighborhood N of L, f is differentiable and its gradient g is Lipschitz continuous. Under the above assumption, we address a general lemma for the descent line search method. The condition (10.3.1) is usually called the Zoutendijk condition ([22]). Lemma 10.3.2. Assume that f satisfies Assumption 10.3.1. Consider a general line search method xk+1 = xk + αk dk , where dk is a descent direction satisfying dTk gk < 0 and αk is computed by the Wolfe line search. Then we have that (dT gk )2 k < ∞. (10.3.1) dk 2 k≥1
Denote tr(Bk ) and det(Bk ) to be the trace and determinant of Bk , respectively. The following lemma provides the basic trace and determinant relations for the BFGS update. One can see Pearson [13] for the proof to (10.3.3). Lemma 10.3.3. Consider the BFGS update. We have that tr(Bk+1 ) = tr(B1 ) −
k Bi si 2 i=1
det(Bk+1 ) =
sTk yk T s k Bk s k
sTi Bi si
det(Bk ).
+
k yi 2 i=1
sTi yi
,
(10.3.2) (10.3.3)
In the following, we present our main convergence result for the BFGS alyT y gorithm. It says that, if the quantity skT yk increases at most linearly, then k
k
10.3 A General Convergence Result for the BFGS Algorithm
245
the BFGS algorithm with Wolfe line searches is globally convergent for general objective functions. Theorem 10.3.4. Suppose that Assumption 10.3.1 holds. Consider the BFGS algorithm with the Wolfe line search (10.1.8)–(10.1.9). For any starting point x1 and any positive definite starting matrix B1 , if there exists positive constants γ1 and γ2 such that ykT yk ≤ γ1 + γ2 k, for all k, (10.3.4) sTk yk we have either gk = 0 for some k, or the following convergence relation holds: lim inf gk = 0.
(10.3.5)
k→∞
Proof. It follows from (10.3.3) and (10.3.2) that % &n k > [tr(Bk+1 )/n]n det(Bk+1 ) Ak sTi yi −1 ≤ ≤ det(B1 ) = , T det(B1 ) det(B1 ) n s Bi s i i=1 i where
k yi 2
Ak = tr(B1 ) +
i=1
sTi yi
(10.3.6)
(10.3.7)
.
Since tr(Bk+1 ) > 0, relation (10.3.2) also implies that k Bi si 2 i=1
sTi Bi si
which is followed by k > Bi si 2 i=1
sTi Bi si
≤ Ak , %
≤
Ak k
(10.3.8)
&k (10.3.9)
.
For any γ3 > γ2 /2, we then know by (10.3.6), (10.3.9) and (10.3.4) that k > Bi si 2 sT yi i
i=1
(sTi Bi si )2
≤ det(B1−1 )
%
Ak n
&n %
Ak k
&k ≤ (γ3 k)k
(10.3.10)
for all sufficiently large k. The line search condition (10.1.9) implies that sTk yk ≥ −(1 − δ2 )sTk gk .
(10.3.11)
It follows from (10.3.10) and (10.3.11) that k > gi 2 ≤ (γ4 k)k , for all large k, Tg −s i i i=1
(10.3.12)
246
10
A General Convergence Result for the BFGS Method
where γ4 =γ3 /(1 − δ2 ). By Assumption 10.3.1 and the line search condition (10.1.8), we get that ∞ −sTi gi < +∞. (10.3.13) i=1
Now we proceed by contradiction and assume that there exists a positive constant γ such that gi ≥ γ, for all i ≥ 1. (10.3.14) For any positive constant < γ 2 γ4−1 , we know by (10.3.13) that there exists an integer k¯ such that k
−sTi gi ≤ ,
for all k ≥ k¯ + 1.
(10.3.15)
¯ i=k+1
Then we have from (10.3.15) that for k ≥ k¯ + 1, k > ¯ i=k+1
! k −sTi gi
≤
"k−k¯ T ¯ (−si gi ) i=k+1 k − k¯
% ≤
k − k¯
&k−k¯ .
(10.3.16)
¯ for all k ≥ k.
(10.3.17)
The above relation and (10.3.14) show that % &k−k¯ k¯ k > > k − k¯ gi 2 gi 2 ≥ , γ −2 −sTi gi −sTi gi i=1 i=1
Since < γ 2 γ4−1 , we know by letting k → ∞ that (10.3.12) and (10.3.17) contradicts each other. The contradiction shows that this theorem is true.
10.4
Conclusion and Discussions
In this chapter, we have provided a general convergence theorem, namely, Theorem 10.3.4 for the famous BFGS method for unconstrained optimization. Similarly to Byrd, Nocedal and Yuan [1], we believe that Theorem 10.3.4 can be extended to the whole Broyden’s family of quasi-Newton methods with θ ∈ [0, 1) except the DFP method. On the other hand, it is still not known to the author whether there exists some class of functions broader than convex functions such that the condition (10.3.4) holds. There have been quite a few researchers to modify the BFGS method such that there is global convergence for general functions (for example, see [10]). It might be helpful to use Theorem 10.3.4 to extend their convergence results or suggest some new convergent variants.
References
247
Finally, we mention several related convergence problems in the quasi-Newton field. If the line search is exact and sk → 0, [15] proved the global convergence of the BFGS algorithm for 2-dimensional quadratic functions. This result was extended by Pu and Yu [18] for any-dimensional general functions. Therefore one interesting question may be, if sk → 0, is the BFGS algorithm with Wolfe line searches globally convergent for general functions? If yes, how to design an inexact line search such that sk → 0 and the condition (10.1.9) holds, and hence ensures the global convergence of the BFGS method for general functions? The hardest convergence problem is obviously related to the DFP method. Does the DFP method with Wolfe line searches converge for uniformly convex functions?
Acknowledgements This work is supported by the National Natural Science Foundation of China under grant number 10831006 and the Chinese Academy of Sciences under grant number kjcx-yw-s7-03.
References [1] R. Byrd, J. Nocedal and Y. Yuan, Global convergence of a class of quasi-Newton methods on convex problems, SIAM J. Numer. Anal., 24, 1171-1190, 1987. [2] G. C. Broyden, The convergence of a class of double rank minimization algorithms: 2. the new algorithm, J. Inst. Math. Appl., 6, 222-231, 1970. [3] Y. H. Dai, Convergence properties of the BFGS method, SIAM J. Opt., 13, 693701, 2002. [4] Y. H. Dai, A Perfect Example for the BFGS Method, Research report, Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences, Beijing, 2010. [5] W. C. Davidon, Variable Metric Methods for Minimization, Argonne National Lab Report (Argonne IL), 1959. [6] L. C. W. Dixon, Variable metric algorithms: necessary and sufficient conditions for identical behavior of nonquadratical functions, J. Opt. Theory Appl., 10, 3440, 1972. [7] R. Fletcher, A new approach to variable metric algorithms. Computer J., 13, 317-322, 1970. [8] R. Fletcher and M. J. D. Powell, A rapidly convergent descent method for minimization, Computer J., 6, 163-168, 1963. [9] D. Goldfarb, A family of variable metric method derived by variational means, Math. Comput., 23, 23-26, 1970. [10] D. H. Li and M. Fukushima, On the global convergence of BFGS method for nonconvex unconstrained optimization problems, SIAM J. Opt., 11, 1054-1064, 2001. [11] W. F. Mascarenhas, The BFGS method with exact line searches fails for nonconvex objective functions, Math. Program., 99, 49-61, 2004.
248
10
A General Convergence Result for the BFGS Method
[12] J. Nocedal, Theory of algorithms for unconstrained optimization, Acta Numerica, 199-242, 1991. [13] J. D. Pearson, Variable metric methods of minization, Computer Journal, 12, 171-178, 1969. [14] M. J. D. Powell, On the convergence of the variable metric algorithm, J. Inst. Maths. Appl., 7, 21-36, 1971. [15] M. J. D. Powell, Quadratic termination properties of minimization algorithm, Part I and Part II, J. Inst. Maths. Appl., 10, 332-357, 1972. [16] M. J. D. Powell, Some global convergence properties of a variable metric algorithm for minimization without exact line searches, Nonlinear Programming, SIAMAMS Proceedings Vol. IX, Editors: R. W. Cottle and C. E. Lemeke, Philadelphia, SIAM Publications, 53-72, 1976. [17] M. J. D. Powell, On the convergence of the DFP algorithm for unconstrained optimization when there are only two variables, Math. Program. Ser. B, 87, 281301, 2000. [18] D. Pu and W. Yu, On the convergence property of DFP algorithm, Annals of Operations Research, 24, 175-184, 1990. [19] D. F. Shanno, Conditioning of quasi-Newton methods for function minimization, Math. Comput., 24, 647-650, 1970. [20] P. Wolfe, Convergence conditions for ascent methods, SIAM Review, 11, 226-235, 1969. [21] Y. Yuan, Numerical Methods for Nonlinear Programming, Shanghai Scientific & Technical Publishers, 1993 (in Chinese). [22] G. Zoutendijk, Nonlinear programming, computational methods, Integer and Nonlinear Programming, Editor: J. Abadie, North-Holland, Amsterdam, 37-86, 1970.
Author Information Y. H. Dai State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Beijing 100190, P. R. China. E-mail: [email protected]
Part IV
Recent Advances in Inverse Scattering
Chapter 11
Uniqueness Results for Inverse Scattering Problems X. D. Liu and B. Zhang
Abstract. In this chapter, we collect and present the uniqueness results in inverse obstacle scattering problems in a readable and informative form. Included are a sketchy summary of the existing results, an outline of some mathematical tools utilized in the proofs of the uniqueness results and some meaningful discussions. Furthermore, we list some interesting open problems.
11.1
Introduction
Inverse scattering is concerned with the reconstruction of scattering objects or their physical properties. It grew from its early beginnings with the invention of radar and sonar during the Second World War to a large and fast developing area of applied mathematics, like medical imaging, ultrasound tomography, nondestructive testing, remote sensing, aeronautics and seismic exploration, that require the practical solution of inverse problems. In the last few decades, as the development of powerful computers and the mathematical theory of ill-posed problems by Tikhonov, the computational simulation of scattering process has become accessible by using microcomputers and the field of inverse scattering problems arose. In the present survey paper, we shall focus on the inverse scattering problems, where one utilizes the time-harmonic acoustic or electromagnetic waves to identify the unaccessible obstacle. Our main concern is on the uniqueness issues for such inverse problems. To proceed further, we first explain in more detail what we mean by the inverse scattering problem. Consider the scattering of a time harmonic acoustic wave by a bounded obstacle in a homogeneous medium. Assume that the incident field is given by the plane wave ui (x) = eikx·d where k = ω/c0 is the wave number, ω is the frequency, c0 is the speed of sound
252
11
Uniqueness Results for Inverse Scattering Problems
in the homogeneous background medium and d is the direction of propagation. To describe the phenomenon of scattering mathematically we must distinguish between the two cases of penetrable and impenetrable obstacles. In particular, for a penetrable inhomogeneous medium, the simplest case can be modeled by

Δu + k²nu = 0 in R^m (m ≥ 2),  (11.1.1)
u(x) = u^i(x) + u^s(x),  (11.1.2)
lim_{r→∞} r^{(m−1)/2} (∂u^s/∂r − iku^s) = 0  (11.1.3)

where r = |x|, n = c₀²/c² is the refractive index given by the ratio of the squares of the sound speeds, and the penetrable obstacle D is given by the compact support of 1 − n. Note that if the inhomogeneous medium in D is absorbing, then n is complex-valued and no longer simply the ratio of the squares of the sound speeds. The radiation condition (11.1.3) was introduced by Sommerfeld in 1912 to guarantee that the scattered wave is outgoing.
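As a quick consistency check (an elementary computation added here for the reader, not part of the original argument), the incident plane wave is a smooth solution of the unperturbed Helmholtz equation: since |d| = 1,

Δe^{ikx·d} = −k²|d|² e^{ikx·d} = −k² e^{ikx·d},

so Δu^i + k²u^i = 0 holds in all of R^m; only the scattered part u^s has to compensate for the inhomogeneity n.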
On the other hand, for the case of scattering by an impenetrable obstacle D, the scattering problem is to seek the total field u such that

Δu + k²u = 0 in R^m\D̄,  (11.1.4)
u(x) = u^i(x) + u^s(x),  (11.1.5)
B(u) = 0 on ∂D,  (11.1.6)
lim_{r→∞} r^{(m−1)/2} (∂u^s/∂r − iku^s) = 0.  (11.1.7)
The equation (11.1.4) carries the name of the physicist Hermann Ludwig Ferdinand von Helmholtz (1821–1894) for his contributions to mathematical acoustics and electromagnetics. The boundary condition (11.1.6) depends on the physical property of the obstacle. For a sound-soft obstacle the pressure of the total wave vanishes on the boundary, so a Dirichlet boundary condition B(u) := u on ∂D is imposed. Similarly, the scattering from a sound-hard obstacle leads to a Neumann boundary condition

B(u) := ∂u/∂ν on ∂D,

since the normal velocity of the total acoustic wave vanishes on the boundary. A more general and realistic boundary condition allows the normal velocity on the boundary to be proportional to the excess pressure on the boundary, which leads to an impedance boundary condition of the form

B(u) := ∂u/∂ν + iλu on ∂D
with a nonnegative continuous function λ. Henceforth, we shall use B(u) = 0 to represent any of the above three types, or a mixed type, of boundary conditions on ∂D.
Solutions to the Helmholtz equation (11.1.4) which are defined in all of R^m (m ≥ 2) are called entire solutions. For example, u^i(x) is an entire solution. With the help of Green's theorem, it is easy to deduce that an entire solution satisfying the Sommerfeld radiation condition (11.1.3) must vanish identically. It is well known that for any Lipschitz continuous boundary ∂D there exists a unique solution u ∈ H¹_loc(R^m\D̄) to the problem (11.1.4)–(11.1.7), and u is analytic on any compact set in R^m\D̄ (see [86]). The well-posedness (existence, uniqueness and stability) of the direct problem for smooth obstacles using the integral equation method can be found in [20]. Moreover, it is known that u^s(x) has the asymptotic representation

u^s(x, d) = (e^{ikr}/r^{(m−1)/2}) { u^∞(x̂, d) + O(1/r) } as r = |x| → ∞  (11.1.8)

uniformly for all directions x̂ := x/|x|, where the function u^∞(x̂, d) defined on the unit sphere S is known as the far field pattern, with x̂ and d denoting, respectively, the observation direction and the incident direction. By analyticity, the far field pattern is completely determined on the whole unit sphere S by knowing it only on some open subset S* of S [20]. Therefore, all the uniqueness results carry over to the case of limited aperture problems where the far field pattern is only known on some open subset S* of S. Without loss of generality, we can assume that the far field pattern is given on the whole unit sphere S, that is, in every possible observation direction.
The inverse problem we consider is, given the far field pattern u^∞(x̂, d, k), which depends on the incident directions d ∈ S, the observation directions x̂ ∈ S and the wave number k, to determine the location and shape of the obstacle D together with its physical property B. The first question to ask about the inverse scattering problem is the identifiability, that is, whether an obstacle can be identified from a knowledge of the far field pattern. Mathematically, the identifiability is the uniqueness issue, which is of theoretical interest and is required in order to proceed to efficient numerical methods of solution.
Consider now the case of electromagnetic waves. Assume that the incident field is given by the electromagnetic plane wave

E^i(x, d, q) = (i/k) curl curl q e^{ikx·d} = ik(d × q) × d e^{ikx·d},
H^i(x, d, q) = curl q e^{ikx·d} = ik d × q e^{ikx·d}

where k = ω√(ε₀μ₀) is the wave number, ω the frequency, ε₀ the electric permittivity, μ₀ the magnetic permeability, d the direction of propagation and q the polarization.
Then the electromagnetic scattering problem corresponding to (11.1.1)–(11.1.3) (assuming variable permittivity but constant permeability μ) is to find the electric field E and the magnetic field H such that

curl E − ikH = 0, curl H + ikn(x)E = 0 in R³,  (11.1.9)
E(x) = E^i(x) + E^s(x), H(x) = H^i(x) + H^s(x),  (11.1.10)
lim_{r→∞} (H^s × x − rE^s) = 0  (11.1.11)
where n(x) = [ε(x) + iσ(x)/ω]/ε₀ is the refractive index with electric permittivity ε(x) and electric conductivity σ(x) in the inhomogeneous medium, and (11.1.11) is the Silver-Müller radiation condition. Similarly, the penetrable obstacle D is given by the compact support of 1 − n. If σ is nonzero, the obstacle is called a conductor, whereas if σ = 0 the obstacle is referred to as a dielectric. The radiation condition (11.1.11) was independently introduced in the 1940s by Silver and Müller. Similarly, the electromagnetic analogue of (11.1.4)–(11.1.7) is the problem of scattering by an impenetrable obstacle D, which can be mathematically formulated as the problem of finding an electromagnetic field E, H such that

curl E − ikH = 0, curl H + ikE = 0 in R³\D̄,  (11.1.12)
E(x) = E^i(x) + E^s(x), H(x) = H^i(x) + H^s(x),  (11.1.13)
B(E) = 0 on ∂D,  (11.1.14)
lim_{r→∞} (H^s × x − rE^s) = 0  (11.1.15)
where the equations (11.1.12) are named after Maxwell (1831–1879) for his fundamental contributions to electromagnetic theory. The boundary condition (11.1.14) depends on the nature of the obstacle D. For a perfect conductor we have B(E) := ν × E. The scattering by an obstacle which is not perfectly conducting, but which does not allow the electromagnetic wave to penetrate deeply into the obstacle, is modeled by an impedance or Leontovich boundary condition of the form [37]

B(E) := ν × curl E − iλ(ν × E) × ν

with a positive continuous impedance function λ. Henceforth, we shall use B(E) = 0 to represent either of the above two types, or a mixed type, of boundary conditions on ∂D. The radiation condition (11.1.15) ensures uniqueness of the solution to the exterior boundary value problem and leads to an asymptotic behavior of the form

E^s(x) = (e^{ikr}/r) { E^∞(x̂, d, q) + O(1/r) } as r = |x| → ∞  (11.1.16)
uniformly in all directions x̂ = x/|x|, where the vector field E^∞ defined on the unit sphere S is known as the electric far field pattern. Note that the electric far field pattern is an analytic function of its independent variables and the dependence on the polarization is linear. The inverse problem we consider is to determine the inhomogeneity n, or the obstacle D with its physical property B, from a knowledge of the electric far field pattern E^∞(x̂, d, q) for the observation directions x̂ ∈ S, the incident directions d ∈ S and the polarizations q ∈ R³.
Since the first uniqueness result, due to Schiffer, for sound-soft smooth obstacles by countably many incident plane waves (see [20, 65]), there has been an extensive study in this direction, and rich results can be found in the literature. For instance, one can find the results on uniqueness for smooth obstacles in [20, 21, 25, 34, 48, 60, 102, 103, 106, 108, 109], the results on uniqueness for polyhedral obstacles in [2, 13, 14, 28, 29, 72], the results on uniqueness for discs or balls in [67, 74, 114], and the results on uniqueness for smooth plane screens or planar curves in [3, 53, 54, 87, 105, 106, 107]. However, this important problem still remains largely unsolved, even up to now. There is an interesting and well-known conjecture that one incident plane wave with one single direction and one single wave number (and one polarization in the electromagnetic case) completely determines the obstacle (without any additional a-priori information). As remarked in [48]: This is a well-known question that supposedly can be solved by elementary means. However, it has been open for thirty to forty years and there is no idea how to attack it.
Inevitably, there is an enormous amount of interesting material left out of this review, as we will provide only uniqueness results in inverse scattering problems. For results on uniqueness in other cases, stability analysis or numerical methods we refer the readers to the review papers [10, 17, 21, 22, 60, 76, 97] and the monographs [11, 20, 41, 48, 55–58, 96, 101].
The plan of this chapter is as follows. In Sections 11.2 to 11.6 we will mainly provide a brief summary of the existing uniqueness results in the inverse acoustic obstacle scattering problem, together with a short description of the mathematical tools involved in the proofs of those results, such as spectral arguments, mixed reciprocity relations, Holmgren's uniqueness theorem, the unique continuation and reflection principles. For completeness, we also mention the uniqueness results for the electromagnetic case, though most of them are analogous to the acoustic case. Section 11.7 is devoted to a review of uniqueness for inverse scattering in a layered medium instead of a homogeneous medium. Finally, in Section 11.8 we list some interesting open problems that are of importance.
11.2 Uniqueness for Inhomogeneity n
The inverse medium problem is closely related to electrical impedance tomography (EIT). The uniqueness theorem in the three-dimensional case was first given by Nachman [89], Novikov [91] and Ramm [99] independently. Their method is motivated by the completeness of the products u₁u₂ of solutions to Δu₁ + k²n₁u₁ = 0 and Δu₂ + k²n₂u₂ = 0 with two different refractive indices n₁, n₂. Such a result was first established in the fundamental work of Sylvester and Uhlmann [110] by constructing complex geometric optics solutions of the form

u(x, ζ) = e^{iζ·x}(1 + v(x, ζ))  (11.2.1)

where ζ · ζ = 0, ζ ∈ C³, and ‖v(·, ζ)‖_{L²} decays to zero as |ℑζ| tends to infinity.
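To see how the ansatz (11.2.1) enters (an elementary computation added here for the reader, not taken verbatim from the cited works), substitute u = e^{iζ·x}(1 + v) into Δu + k²nu = 0 and use Δe^{iζ·x} = −(ζ · ζ)e^{iζ·x} = 0 by the choice of ζ; this leaves the perturbed equation

Δv + 2iζ·∇v + k²n(1 + v) = 0

for the remainder v, and constructing v with the stated L² decay is exactly the step carried out with the Fourier techniques mentioned below.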
To construct these solutions, Sylvester and Uhlmann employed Fourier transform techniques, and Hähner [39] simplified the analysis considerably by using Fourier series techniques. We refer the reader to the Habilitationsschrift of Hähner [41] for a detailed analysis of these techniques. The following uniqueness result can be found in [41] (see Theorem 2.10 and the remark given on page 68).
Theorem 11.2.1. The inhomogeneity n ∈ L²(D) is uniquely determined from the far field pattern for all incident plane waves.
However, the uniqueness result in the two-dimensional case was one of the outstanding open problems in inverse scattering theory and was finally resolved in 2008 by Bukhgeim [7].
The generalization of Theorem 11.2.1 to the case of the Maxwell equations has been established by Colton and Päivärinta [24] and Hähner [41] for n ∈ C^{2,α} (0 < α < 1). The results of [24] and [41] are for the case when the magnetic permeability μ is constant. Generalizations to the case of variable permeability have been given by Ola, Päivärinta and Somersalo [92] and Ola and Somersalo [93].
11.3 Uniqueness for Smooth Obstacles
In general, for the scattering problem the boundary values are as smooth as the boundary, since they are given by the restriction of the analytic function u^i to ∂D. In particular, for domains D of class C² our regularity analysis shows that the scattered field u^s ∈ C²(R^m\D̄) ∩ C^{1,α}(R^m\D̄).
We begin with the following Rellich lemma, which gives the one-to-one correspondence between radiating waves and their far field patterns. It was first proved by Rellich [104] and Vekua [111] in 1943. Due perhaps to wartime conditions [11], Vekua's paper remained unknown in the West, and the result is
now commonly attributed only to Rellich. Rellich's lemma also ensures uniqueness for solutions to exterior boundary value problems and has been essentially utilized in almost all the uniqueness studies.
Lemma 11.3.1 (Rellich). Assume that D ⊂ R^m is a bounded domain with connected complement G := R^m\D̄, and let u ∈ C²(G) be a solution to the Helmholtz equation (11.1.4) satisfying

lim_{r→∞} ∫_{|x|=r} |u(x)|² ds = 0.

Then u = 0 in G.
The following classical uniqueness result for sound-soft obstacles D ⊂ R³ is due to Schiffer [65]. Note that the initial proof described in [65] contains a slight technical fault, since the fact that the complement of D₁ ∪ D₂ might be disconnected was overlooked. The following corrected version was given in the book of Colton and Kress [20, Theorem 5.1].
Theorem 11.3.2. Assume that D₁ and D₂ are two sound-soft obstacles such that their far field patterns coincide for an infinite number of incident plane waves with distinct incident directions and one fixed wave number. Then D₁ = D₂.
Proof. Assume that D₁ ≠ D₂. By Rellich's lemma, for each incident plane wave u^i the scattered waves u^s_1 and u^s_2 for the obstacles D₁ and D₂ coincide in the unbounded component G of the complement of D₁ ∪ D₂. Without loss of generality, we can assume that D* := (R³\G)\D̄₁ is nonempty (see Figure 11.1). Then u^s_1 is defined in D*, and the total field u = u^i + u^s_1 satisfies the Helmholtz equation in D* and the homogeneous boundary condition u = 0 on ∂D*. Hence, u is a Dirichlet eigenfunction of −Δ in the domain D* with eigenvalue k². The proof is now completed by using the results [20, p. 107] that the total fields for distinct incoming plane waves are linearly independent and
Figure 11.1. Two different obstacles.
that for a fixed eigenvalue there exist only finitely many linearly independent Dirichlet eigenfunctions of −Δ in H₀¹(D*).
For n = 0, 1, . . . , we denote the positive zeros of the spherical Bessel functions j_n by t_{n,l}, l = 1, 2, . . . , i.e., j_n(t_{n,l}) = 0. Let N := Σ(2n + 1), where the sum is taken over all n and l such that t_{n,l} < k*R, and let 0 < k₁ < · · · < k_{N+1} ≤ k*. Using the strong monotonicity property of the eigenvalues of −Δ and extending Schiffer's ideas, Colton and Sleeman [25] showed that the obstacle D is uniquely determined by the far field pattern for a finite number of incident plane waves, provided a priori information on the size of the obstacle is available.
Theorem 11.3.3. Let D₁ and D₂ be two sound-soft obstacles which are contained in a ball of radius R. Assume that their far field patterns coincide for N + 1 incident plane waves either (a) with distinct incident directions and one fixed wave number k ≤ k*, or (b) with distinct positive wave numbers k₁, . . . , k_{N+1} and one fixed incident direction. Then D₁ = D₂.
Proof. The proof for case (a) is presented in Theorem 5.2 in [20], while the proof for case (b) is presented in Theorem 6.3.1 in [48].
Note that the above results also hold in the two-dimensional case. In particular, if the obstacle is contained in a ball of radius R, then it is uniquely determined by the far field pattern for one incident plane wave provided the wave number satisfies kR < k_{0,1}, where k_{0,1} denotes the smallest positive zero of the Bessel function J₀ of order zero in R² or the smallest positive zero of the spherical Bessel function j₀ of order zero in R³. Recently, exploiting the fact that the wave functions are complex-valued, this bound was improved to kR < k_{1,1} by Gintides [34], where k_{1,1} denotes the smallest positive zero of the Bessel function J₁ of order one in R² or the smallest positive zero of the spherical Bessel function j₁ of order one in R³. Using the fact that an optimal lower estimate for the eigenvalues of the Laplacian on a domain is given by the Faber-Krahn inequality, which bounds the first Dirichlet eigenvalue of a domain from below by that of a disc of equal area, a local type of uniqueness is proved in [34] under the restriction that the possible obstacles do not deviate “too much” in measure (such as area or diameter). Another local uniqueness result, concerning volume differences, by Stefanov and Uhlmann [109] is based on an estimate using the Poincaré inequality.
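To make the count in Theorem 11.3.3 concrete, here is a small worked example (ours, added for illustration) in R³. Take k*R = 5. The relevant zeros of the spherical Bessel functions are t_{0,1} = π ≈ 3.14 and t_{1,1} ≈ 4.49, while t_{2,1} ≈ 5.76 and t_{0,2} = 2π ≈ 6.28 already exceed 5. Hence

N = (2·0 + 1) + (2·1 + 1) = 4,

so N + 1 = 5 incident plane waves suffice in Theorem 11.3.3. For a single incident plane wave, the classical bound requires kR < t_{0,1} = π, and Gintides' improvement relaxes this to kR < t_{1,1} ≈ 4.49.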
11.3 Uniqueness for Smooth Obstacles
259
However, Schiffer's proof cannot be generalized to other boundary conditions. This is due to the fact that the finiteness of the dimension of the eigenspaces of −Δ for the Neumann or impedance boundary condition requires the boundary of the intersection D* from the proof of Theorem 11.3.2 to be sufficiently smooth. In other words, the validity of the Rellich selection theorem in the Sobolev space H¹(D*), that is, without homogeneous Dirichlet boundary values, requires the boundary to be sufficiently smooth. Therefore, for a long time uniqueness for other inverse scattering problems, from both impenetrable and penetrable obstacles, remained open.
In 1990, Isakov [47] made a great breakthrough. Assuming that two different obstacles produce the same far field pattern for all incident directions, Isakov obtained a contradiction by considering a sequence of solutions with a singularity moving towards a boundary point of one obstacle that is not contained in the other obstacle. He used weak solutions, and the proofs are technically involved. In 1993, Kirsch and Kress [50] realized that these proofs can be simplified by using classical solutions rather than weak solutions and by obtaining the contradiction by considering pointwise limits of the singular solutions rather than limits of L² integrals. Only after this new uniqueness proof was published was it also observed by the authors that for scattering from impenetrable obstacles it is not required to know the physical properties of the obstacles in advance.
Before proceeding to the proof of the general case, we first provide some useful tools. In addition to scattering of plane waves, we also need to consider scattering of a point source located at z ∈ G, which is given by

Φ(x, z) = e^{ik|x−z|}/(4π|x − z|) for m = 3,  Φ(x, z) = (i/4) H₀⁽¹⁾(k|x − z|) for m = 2,

depending on the dimension m, for x ≠ z. Here, H₀⁽¹⁾ denotes the Hankel function of the first kind of order 0. Note that Φ(x, z) is the fundamental solution to the Helmholtz equation.
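For readers who wish to experiment numerically, the following is a minimal sketch for evaluating Φ (an illustration we add here; the function name and structure are our own, using the standard SciPy special functions):

    import numpy as np
    from scipy.special import hankel1

    def fundamental_solution(x, z, k, m=3):
        # Fundamental solution Phi(x, z) of the Helmholtz equation, x != z.
        r = np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(z, dtype=float))
        if m == 3:
            return np.exp(1j * k * r) / (4.0 * np.pi * r)
        if m == 2:
            return 0.25j * hankel1(0, k * r)
        raise ValueError("dimension m must be 2 or 3")

For instance, fundamental_solution((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), k=2.0) evaluates the three-dimensional kernel at distance one from the source point.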
Denote by u^s(·, d) the scattered field for an incident plane wave u^i(·, d) with incident direction d ∈ S, and by u^∞(·, d) the corresponding far field pattern. The scattered field for an incident point source Φ(·, z) with source point z ∈ R^m is denoted by u^s(·; z), and the corresponding far field pattern by Φ^∞(·, z). The following mixed reciprocity relation will play a key role in the uniqueness proof for the general boundary condition (11.1.6). For the mixed reciprocity relation we need the constant

γ_m = 1/(4π) for m = 3,  γ_m = e^{iπ/4}/√(8kπ) for m = 2,

depending on the dimension m.
Lemma 11.3.4 (Mixed reciprocity relation). For the scattering of plane waves u^i(·, d) with d ∈ S and point sources Φ(·, z) from an obstacle D we have

Φ^∞(x̂, z) = γ_m u^s(z, −x̂), z ∈ G, x̂ ∈ S.  (11.3.1)
Proof. The proof can be found in [56] or [96]. The corresponding result for electromagnetic waves can be found in [58, 96].
Besides the mixed reciprocity relation, Holmgren's uniqueness theorem will be useful to prove the uniqueness of the boundary conditions.
Theorem 11.3.5 (Holmgren's uniqueness theorem). Let u ∈ C²(D) ∩ C¹(D̄) be a solution of the Helmholtz equation (11.1.4) in the domain D such that u has zero Cauchy data on a surface element of ∂D. Then u is identically zero.
Proof. For a proof, we refer the reader to [15, 55]. The corresponding result for electromagnetic waves can be found in [1, 57].
Based on these tools, we are able to prove the following uniqueness theorem [9, 64].
Theorem 11.3.6. Assume that D₁ and D₂ are two obstacles with boundary conditions B₁ and B₂ such that the far field patterns coincide for all incident directions and a fixed wave number. Then D₁ = D₂ and B₁ = B₂.
Remark 11.3.7. One of the remarkable features of this uniqueness result is that we need not know a priori the physical property of the underlying obstacle; in fact, it can also be uniquely identified. This seems to be appropriate for a number of applications where the physical nature of the obstacle is unknown. Besides, we would like to mention that the use of singular sources in the proof of Theorem 11.3.6, coupled with the dual space method of Colton and Monk (see [20]), gives rise to the so-called linear sampling method, which has been under extensive study in recent years (see [11, 97]).
Proof. Let u^∞_1(·, d) and u^∞_2(·, d) be the far field patterns for an incident plane wave with incident direction d ∈ S, and let u^s_1(·; z) and u^s_2(·; z) be the scattered waves for an incident point source with source point z, corresponding to D₁ and D₂, respectively. With the mixed reciprocity relation (11.3.1) and two applications of Rellich's lemma, first for scattering of plane waves and then for scattering of point sources, it can be concluded from the assumption u^∞_1(x̂, d) = u^∞_2(x̂, d) for all x̂, d ∈ S that u^s_1(x; z) = u^s_2(x; z) for all x, z ∈ G. Here, as in the previous proof, G denotes the unbounded component of the complement of D₁ ∪ D₂ (Figure 11.2).
Assume that D₁ ≠ D₂. Then, without loss of generality, we may assume that there exists z₀ ∈ ∂D₂ ∩ (R^m\D̄₁). Choose h > 0 such that the sequence z_j := z₀ + (h/j)ν(z₀) (j = 1, 2, . . . ) is contained in G, where ν(z₀) is the outward normal to ∂D₂ at z₀. Since z₀ has a positive distance from D₁, we conclude
Figure 11.2. Two different obstacles.
from the well-posedness of the direct scattering problem that there exists C > 0 such that |B(u^s_1(z₀; z_j))| ≤ C uniformly for j ≥ 1. On the other hand, by the boundary condition on ∂D₂,

|B(u^s_1(z₀; z_j))| = |B(u^s_2(z₀; z_j))| = |−B(Φ(z₀, z_j))| → ∞ as j → ∞.

This is a contradiction, which implies that D₁ = D₂.
We now assume that the boundary conditions are different, that is, B₁ ≠ B₂. First, for the case of impedance boundary conditions, we assume that we have two different continuous impedance functions λ₁ ≠ λ₂. Then from the conditions

∂u/∂ν + iλ₁u = 0, ∂u/∂ν + iλ₂u = 0 on ∂D₁,

it is deduced that (λ₁ − λ₂)u = 0 on ∂D₁. Therefore, on the open set Γ := {x ∈ ∂D₁ : λ₁(x) ≠ λ₂(x)} we have ∂u/∂ν = u = 0. Then Holmgren's uniqueness theorem implies that the total field u = u^i + u^s vanishes identically. The scattered field u^s tends to zero uniformly at infinity, while the incident plane wave has modulus one everywhere; thus the modulus of the total field tends to one. This leads to a contradiction, giving λ₁ = λ₂. The case of the other boundary conditions can be dealt with similarly.
Clearly, the above method exploits the fact that the scattered wave for incident point sources becomes singular at the boundary as the source point approaches a boundary point. It has also been employed by Kirsch and Kress [50] for the transmission problem and by Hettlich [43] and Gerlach and Kress [33] for the conductive boundary condition. Here the analysis becomes more involved, due to the fact that from the transmission or conductive boundary conditions it is not immediately obvious that the scattered wave becomes singular, since the singularity of the incident wave, in principle, could be compensated by a singularity of the transmitted wave. In the proofs, this possibility is excluded through
a somewhat tedious analysis of boundary integral operators that, in particular, requires the fundamental solution both inside and outside the homogeneous obstacle.
In the electromagnetic case, the corresponding uniqueness results with the above approach were obtained for scattering from perfect conductors by Colton and Kress (see Theorem 7.1 in [20]), for scattering from general impenetrable obstacles by Kress [59], for scattering from an inhomogeneous medium by Hähner [38], for scattering from homogeneous chiral media by Gerlach [32] and for scattering from homogeneous orthotropic media by Colton, Kress and Monk [23].
Hähner [42] considered uniqueness for the inverse scattering problem of determining the shape of the obstacle D for the inhomogeneous transmission problem, which, in particular, includes scattering from an inhomogeneous orthotropic medium. Hähner's approach restructures Isakov's [47] original idea in such a way that, in general, it can be applied provided the direct scattering problem is well-posed with a sufficiently regular solution and an associated interior transmission problem is a compact perturbation of a well-posed problem. It differs from the analysis in [47] by using weak solution techniques rather than boundary integral equations, and, as opposed to both [47, 50], it does not need the fundamental solution inside the obstacle. The ideas of Hähner have been extended to Maxwell's equations by Cakoni and Colton [8].
11.4 Uniqueness for Polygons or Polyhedra
In the past few years, good progress has been achieved on uniqueness for polygonal/polyhedral obstacles, and a good survey can be found in [76]. Generally speaking, a polygon is a simply connected set in R² whose boundary is composed of finitely many line segments, while a polyhedron is a simply connected set in R³ whose boundary is composed of finitely many polygons. Recently, Alessandrini and Rondi [2] and Liu and Zou [72] considered some more general polygonal and polyhedral obstacles. In fact, an obstacle D is said to be a polygonal or polyhedral obstacle if it is a compact set in R^m (m ≥ 2) with connected complement G := R^m\D, and the boundary ∂G is composed of a finite union of cells. Here, a cell is defined to be the closure of an open subset of an (m − 1)-dimensional hyperplane. Clearly, such an obstacle is very general and admits the simultaneous presence of finitely many solid- and crack-type obstacles. In R², for instance, D may consist of finitely many polygons and line segments, while in R³, D may consist of finitely many polyhedra and cells that are not necessarily polygons. More significantly, we have the following uniqueness result due to Liu and Zou [72] and Elschner and Yamamoto [30].
Theorem 11.4.1. A sound-soft polyhedral obstacle D as described above is uniquely determined by the far field pattern corresponding to a single incident plane wave, whilst a sound-hard polyhedral obstacle D in R^m is uniquely
determined by the far field pattern corresponding to m incident plane waves with fixed wave number and linearly independent incident directions. A polyhedral obstacle in R^m consisting of finitely many solid polyhedra is uniquely determined by a single incident plane wave.
In 1989, Ari and Firth [4] gave a numerical algorithm for reconstructing polygonal obstacles. In 1994, Liu and Nachman [68] proved, among various results, that, for m ≥ 2, u^∞ uniquely determines the convex envelope of a polyhedral obstacle D. In addition, they also outlined a proof of the unique determination of a polyhedral obstacle. Their arguments involve a scattering theory analogue of a classical theorem of Pólya on entire functions and the reflection principle for solutions of the Helmholtz equation across a flat boundary. In 2003, Cheng and Yamamoto [13] gave the first uniqueness result, under a non-trapping condition, with at most two incoming waves. The key is the analyticity of the solution of the scattering problem and the reflection principle for its solutions; these two properties played an important role in the subsequent studies. At the same time, Alessandrini and Rondi [2] proved that a sound-soft polyhedral obstacle is uniquely determined by the far field pattern corresponding to an incident plane wave at one given wave number and one incident direction. Some improvements were derived by Elschner and Yamamoto [28, 29, 30]. Based on the reflection principle given in [13] and the path argument idea given in [2], Liu and Zou obtained a shorter proof in the sound-soft case, together with the result that any sound-hard polyhedral obstacle is uniquely determined by the far field pattern for m linearly independent incident directions. Moreover, these uniqueness results can be extended to obstacles with mixed-type (Dirichlet/Neumann) boundary conditions [75].
Along this line, some novel reflection principles were derived in [70, 71] for the time-harmonic Maxwell equations. They were then applied to prove that a perfectly electric conducting (PEC) or perfectly magnetic conducting (PMC) obstacle in R³, consisting of finitely many solid polyhedra, is uniquely determined by the far field pattern corresponding to a single incident electromagnetic plane wave. Realizing that the existence of an “unbounded” perfect plane implies certain symmetries of the underlying obstacle, it was proved in [77] that a polyhedral obstacle, together with its mixed PEC or PMC boundary condition, is uniquely determined by the electric far field pattern for a fixed polarization p, an incident direction d (with d × p ≠ 0) and a wave number k.
11.5 Uniqueness for Balls or Discs
Let D be a sound-soft ball centered at the origin. It is obvious from symmetry that the far field pattern only depends on the angle between the incident and
the observation directions (see, for example, (3.30) in [20]). Hence we have

u^∞(x̂, d) = u^∞(Qx̂, Qd)  (11.5.1)

for all x̂, d ∈ S and all rotations, that is, all real orthogonal matrices Q with det Q = 1. Karp's theorem [49] says that the converse of this statement is also true. This result has been simplified and extended to scattering by sound-hard obstacles [18], scattering by a locally inhomogeneous medium [18, 100] and electromagnetic scattering by a perfect conductor [19]. In fact, by using the uniqueness Theorem 11.3.6, the relation (11.5.1) implies that if the obstacle is invariant under any rotation then it must be a ball with center at the origin and the impedance λ must be a constant. In the three-dimensional case, Liu [66] gives an improvement in the sense that (11.5.1) need hold only for rotations about two linearly independent incident directions.
Explicit solutions of scattering problems are only available for balls or discs with special physical properties (see [20, 74, 44]). Therefore, it is natural to study the uniqueness issues from these explicit representations. Indeed, for balls and discs, the uniqueness results can be established by means of the fact that the radiating solutions to the Helmholtz equation corresponding to balls or discs can be analytically extended to a solution in R^m except for the center. The extension of radiating solutions for a sound-soft ball was first shown in [16], based on the study of the Goursat problem for the wave equation, whilst for the sound-hard case it was shown in [114] by carefully estimating the convergence radius of the spherical wave function expansion of the solution. As a consequence of this extension property, one readily gets that the location of a sound-soft/sound-hard ball is uniquely determined by the far field pattern corresponding to a single incident plane wave [67, 114]. Suppose the far field patterns for two balls B_{r₁}(x₁) and B_{r₂}(x₂) with x₁ ≠ x₂ coincide, that is, u^∞_1(x̂) = u^∞_2(x̂) for all x̂ ∈ S. Then one can easily derive from Rellich's lemma and the above extension property that u(x) := u₁(x) − u₂(x) is a radiating entire solution in the whole of R^m, that is, u = 0 in R^m, which is certainly not true. Then, with the asymptotic expressions for the spherical Bessel or Bessel functions, one can further deduce that r₁ = r₂, that is, the shape of the ball/disc is also uniquely determined. It should be mentioned that, by symmetry, the far field pattern u^∞(x̂, d) depends only on the angle θ between the observation direction x̂ and the incident direction d, and therefore these results can also be deduced directly from Theorem 11.3.6.
Note further that, given its physical property, a ball/disc is uniquely determined by its radius and center. Therefore, it seems that only a few far field data should be enough to identify the ball/disc.
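For instance, for a sound-soft ball of radius R centered at the origin, the explicit solution referenced above takes the form of the classical series (cf. [20])

u^∞(x̂, d) = (i/k) Σ_{n=0}^{∞} (2n + 1) [j_n(kR)/h_n^{(1)}(kR)] P_n(x̂ · d).

The following is a minimal numerical sketch of this series (our illustration; the function name and the truncation parameter n_terms are our own choices, using the standard SciPy special functions):

    import numpy as np
    from scipy.special import spherical_jn, spherical_yn, eval_legendre

    def far_field_soft_ball(cos_theta, k, R, n_terms=40):
        # Truncated far field series of a sound-soft ball of radius R centered
        # at the origin; cos_theta = x_hat . d is the cosine of the angle
        # between the observation and incident directions.
        u = np.zeros_like(np.asarray(cos_theta, dtype=float), dtype=complex)
        for n in range(n_terms):
            jn = spherical_jn(n, k * R)
            hn = jn + 1j * spherical_yn(n, k * R)  # spherical Hankel function h_n^{(1)}
            u = u + (2 * n + 1) * (jn / hn) * eval_legendre(n, cos_theta)
        return (1j / k) * u

In particular, the output depends on x̂ and d only through x̂ · d, which is exactly the rotational symmetry (11.5.1).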
From the properties of the zeros of the Bessel and spherical Bessel functions, Liu and Zou [74] made a first step in this direction in 2007, by showing that, in the resonance region, the radius of a sound-soft/sound-hard ball can be uniquely determined by a single far field datum. In fact, the center of a sound-soft/sound-hard ball can also be uniquely determined by at most three far field data [44]. This uniqueness result has recently been extended to the inverse electromagnetic scattering by a perfectly conducting ball in [44].
11.6 Uniqueness for Surfaces or Curves
There are also some uniqueness results concerning obstacles having empty interior, such as a surface in R³ or a curve in R². The study in this direction was initiated by Kress in [53], who proved that a sound-soft open C³-curve in R² can be uniquely determined by the far field pattern at a fixed wave number and all possible incident directions. This result was then extended to the sound-hard case by Mönch in [87]. In [54], Kress further proved that a sound-soft open analytic curve in R² is uniquely determined by a single incident plane wave provided the wave number is sufficiently low. In [105] and [107], it was shown that a much more general multiple obstacle in R², possibly consisting of finitely many holes or curves, can be uniquely determined by measuring the far field pattern corresponding to incident plane waves with a fixed and sufficiently low wave number and all possible incident directions. Alves and Duong [3] established a uniqueness result for admissible scatterers in R³ that include screens with unknown impedance boundary conditions. In [106], Rondi investigated the uniqueness issue for the determination of a sound-soft multiple obstacle in R³, which includes, for example, obstacles and surfaces, by a finite number of far field measurements.
11.7 Uniqueness Results in a Layered Medium
In practical applications the background might not be homogeneous, and it may then be modeled as a layered medium. In this section, we review some recent work on uniqueness results for inverse scattering in a layered medium. For simplicity, we mainly consider the two models shown in Figures 11.3 and 11.4.
The layered medium in the first model is a nested body consisting of a finite number of homogeneous layers; it occurs in various areas of application such as non-destructive testing, biomedical imaging and geophysical exploration. In non-destructive testing, for example, the conducting wire can be modeled in terms of an inhomogeneous impedance boundary condition, while the coating can be characterized as an arbitrarily shaped lossy dielectric layer. We take the inverse acoustic scattering by an impenetrable obstacle in a two-layered medium in R³ as an example for illustration. It is modeled by
Figure 11.3. Scattering by a two-layered obstacle.
Figure 11.4. Scattering by partially coated obstacles in a two-layered medium.
the Helmholtz equation with boundary conditions on the interface S₀ and the boundary S₁:

Δu + k₀²u = 0 in Ω₀,  (11.7.1)
u(x) = u^i(x) + u^s(x) in Ω₀,  (11.7.2)
Δv + k₁²v = 0 in Ω₁,  (11.7.3)
u − v = 0, ∂u/∂ν − λ₀ ∂v/∂ν = 0 on S₀,  (11.7.4)
B(v) = 0 on S₁,  (11.7.5)
lim_{r→∞} r (∂u^s/∂r − ik₀u^s) = 0, r = |x|,  (11.7.6)
where ν is the unit outward normal to the interface S₀ and the boundary S₁, and λ₀ is a positive constant.
Remark 11.7.1. This model can be considered as either (1) the problem of an impenetrable obstacle Ω₂ immersed in a layered medium, or (2) the problem of a layered obstacle Ω := R³\Ω₀ containing an impenetrable core Ω₂ immersed in a homogeneous medium Ω₀.
The well-posedness of the direct problem has been rigorously proved in [85] using the variational method and in [82] employing the integral equation method. The latter method also makes it possible to show some a-priori estimates of the solution on the interface S₀; this result is then used to prove that the interface S₀ is uniquely determined from the far field pattern [82], provided λ₀ ≠ 1. Given the interface S₀, that is, when the background medium is known a-priori, Yan and Pang [113] gave a proof of uniqueness for the sound-soft obstacle based on Schiffer's idea. But their method cannot be extended to other boundary conditions. They also proved a uniqueness result for a two-layered medium with a sound-hard obstacle in [98], using a generalization of Schiffer's method. However, their method requires the interior wave number to lie in an interval, which seems unreasonable, and it is hard to extend to the case of a multilayered medium. Recently, by means of the following generalized mixed reciprocity relation, it was proved in [85] that both the obstacle and its physical property can be uniquely determined.
Lemma 11.7.2 (Mixed reciprocity relation). For the scattering of plane waves u^i(·, d) with d ∈ S and point sources Φ(·, z) from an obstacle Ω₂ we have

Φ^∞(x̂, z) = γ_n u^s(z, −x̂) for z ∈ Ω₀, x̂ ∈ S,
Φ^∞(x̂, z) = λ₀γ_n u^s(z, −x̂) + (λ₀ − 1)γ_n u^i(z, −x̂) for z ∈ Ω₁, x̂ ∈ S.

The mixed reciprocity relation was also modified in [83]. Using Kirsch and Kress's idea [50], the following uniqueness result is available in Liu and Zhang [82].
Theorem 11.7.3. Suppose the positive numbers k₀, k₁ and λ₀ (λ₀ ≠ 1) are given. Then the interface S₀, and the obstacle Ω₂ with its physical property B, are uniquely determined by the far field pattern u^∞(x̂, d) for all incident directions d ∈ S and all observation directions x̂ ∈ S.
A corresponding uniqueness result in the electromagnetic case is available in [80] and [84]. Note that the method used for the proof of the unique determination of the interface S₀ in the acoustic case cannot be extended to the electromagnetic case. But using Hähner's ideas [38], a different method was used in [84] to establish such a uniqueness result in the electromagnetic case.
The second model, shown in Figure 11.4, also has many important applications, such as a mine buried in the soil, where the domain surrounding the obstacle (mine) consists of two half-spaces (air and soil) with different electromagnetic coefficients separated by a flat infinite interface. Moving an electronic device parallel to the flat infinite interface to generate a time-harmonic field, the induced field is measured with the same (or another) device. The goal is to retrieve information from these data to detect and identify buried obstacles.
For more information, especially on mine detection, the reader is referred to [27] and the many references therein.
For x = (x₁, x₂, x₃) ∈ R³, let Σ₀ := {x ∈ R³ | x₃ = 0} be the flat infinite interface, and let the two half-spaces R³₊ := {x ∈ R³ | x₃ > 0} and R³₋ := {x ∈ R³ | x₃ < 0} above and below Σ₀ represent air and soil, respectively. Assume that D ⊂ R³₋ is a bounded domain with connected complement, and define Ω₁ := R³₊ and Ω₂ := R³₋\D̄. Denote by D an impenetrable obstacle that is (possibly) partially coated by a thin dielectric layer. We assume further that the boundary ∂D is Lipschitz continuous with a dissection ∂D = Γ₁ ∪ Γ₂, where Γ₁ (the un-coated part) and Γ₂ (the coated part) are two disjoint, relatively open subsets of ∂D. In particular, a fully coated obstacle corresponds to the case Γ₁ = ∅. For an intuitive description see Figure 11.4. Finally, let ν denote the unit outward normal to ∂D and the unit upward normal to Σ₀.
Consider the propagation of electromagnetic waves with frequency ω in the two-layered medium consisting of the two isotropic half-spaces Ω_j with space-independent electric permittivity ε_j, magnetic permeability μ_j and electric conductivity σ_j, j = 1, 2. The electromagnetic wave is described by the electric field 𝓔_j and the magnetic field 𝓗_j satisfying the Maxwell equations

curl 𝓔_j + μ_j ∂𝓗_j/∂t = 0, curl 𝓗_j − ε_j ∂𝓔_j/∂t = σ_j𝓔_j in Ω_j

for j = 1, 2. For time-harmonic electromagnetic waves of the form

𝓔_j := Re{ (ε_j + iσ_j/ω)^{−1/2} E_j(x) e^{−iωt} }, 𝓗_j := Re{ μ_j^{−1/2} H_j(x) e^{−iωt} },

it follows that the complex-valued space-dependent parts E_j and H_j satisfy the time-harmonic Maxwell equations

curl E_j − ik_j H_j = 0, curl H_j + ik_j E_j = 0 in Ω_j,  (11.7.7)

where the wave number k_j is a constant given by k_j = √((ε_j + iσ_j/ω)μ_j ω²), with the sign of k_j chosen such that ℑk_j ≥ 0. Assume that the medium (air) in Ω₁ is non-conducting, that is, σ₁ = 0 and consequently k₁ > 0. However, the medium (soil) in Ω₂ is possibly conducting with a nonnegative conductivity σ₂ ≥ 0, and therefore ℑk₂ ≥ 0.
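For the reader's convenience we indicate the computation behind (11.7.7) (a routine verification added here, not quoted from [27]). Inserting the time-harmonic ansatz into curl 𝓔_j + μ_j ∂𝓗_j/∂t = 0 gives

Re{ e^{−iωt} [ (ε_j + iσ_j/ω)^{−1/2} curl E_j − iω μ_j^{1/2} H_j ] } = 0,

that is, curl E_j − ik_j H_j = 0 with k_j = ω√((ε_j + iσ_j/ω)μ_j). Similarly, curl 𝓗_j − ε_j ∂𝓔_j/∂t = σ_j𝓔_j yields

μ_j^{−1/2} curl H_j = (σ_j − iωε_j)(ε_j + iσ_j/ω)^{−1/2} E_j = −iω (ε_j + iσ_j/ω)^{1/2} E_j,

which is the second equation curl H_j + ik_j E_j = 0.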
On the interface, the fields must satisfy the transmission conditions

ν × E₁ = a_E ν × E₂, ν × H₁ = a_H ν × H₂ on Σ₀  (11.7.8)

with the constants given by a_E = √(ε₁/(ε₂ + iσ₂/ω)) and a_H = √(μ₁/μ₂). The transmission conditions (11.7.8) express the continuity of the tangential components of the electric field 𝓔 and the magnetic field 𝓗 across the interface.
On the boundary of the obstacle ∂D, we have a mixed boundary condition B defined as

ν × E₂ = 0 on Γ₁, ν × H₂ − (λ/k₂)(ν × E₂) × ν = 0 on Γ₂  (11.7.9)
with a positive constant λ. The impedance boundary condition on Γ₂ is derived under the assumption that the electric permittivity in the thin dielectric coating is much larger than ε₂ [37], so that the electromagnetic wave cannot penetrate deeply into the obstacle. Therefore, the mixed boundary condition (11.7.9) models a partly coated obstacle, a situation which widely occurs in practical applications.
The electromagnetic fields E_j, H_j are decomposed into the scattered fields E^s_j, H^s_j and the incident fields E^i_j, H^i_j. To characterize the physically relevant solution, the scattered fields E^s_j, H^s_j are required to satisfy the Silver-Müller radiation condition

lim_{R→∞} ∫_{Ω_{R,j}} |H^s_j × ν − E^s_j|² ds = 0  (11.7.10)
where Ω_{R,j} = {x ∈ Ω_j : |x| = R}, j = 1, 2.
In [27], an explicit representation of the Green's matrices G_{E,j}, G_{H,j} (j = 1, 2) is provided for the two-layered background medium. Consider the problem of scattering of the following electric dipole in the two-layered medium:

E^i_j(x, y, p) := G_{E,j}(x, y)p = (i/k_j) curl curl (pΦ_j(x, y)) + G̃_{E,j}(x, y)p,
H^i_j(x, y, p) := G_{H,j}(x, y)p = curl (pΦ_j(x, y)) + G̃_{H,j}(x, y)p, x ≠ y,  (11.7.11)

with the source point y ∈ R³\Σ₀ and the polarization p ∈ R³. Here, for x ≠ y, Φ_j(x, y) = e^{ik_j|x−y|}/(4π|x − y|) is the fundamental solution to the Helmholtz equation with the wave number k_j in Ω_j, and G̃_{E,j}(x, y), G̃_{H,j}(x, y) are smooth matrix functions such that the incident fields (11.7.11) satisfy the Maxwell equations (11.7.7) in R³\{y}, the transmission conditions (11.7.8) and the Silver-Müller radiation condition (11.7.10).
The direct problem is to look for the total fields

E_j = E^s_j + E^i_j, H_j = H^s_j + H^i_j  (11.7.12)

satisfying the Maxwell equations (11.7.7) in R³\(D ∪ {y}), the transmission conditions (11.7.8), the mixed boundary condition (11.7.9) and the Silver-Müller radiation condition (11.7.10). The well-posedness of the scattering problem (11.7.7)–(11.7.10) has been established by Delbary et al. [27] for the case when Γ₂ = ∅ and ∂D is sufficiently smooth (e.g., C²-smooth), employing the integral equation method. By a variational approach, and following essentially a
similar argument to that in [26] or [88, Section 12.4], it is easy to establish the well-posedness of the scattering problem (11.7.7)–(11.7.10) in the general case with a Lipschitz continuous boundary ∂D.
The inverse problem we have in mind is, given measurements in the upper half-space Ω₁, to recover the obstacle D and its physical property B, that is, the mixed boundary condition (11.7.9). Precisely, assume that the electromagnetic fields are generated by electric dipoles (11.7.11) located on some two-dimensional device (see Figure 11.4):

Σ_i ⊂ Σ_{d_i} := {x = (x₁, x₂, x₃) ∈ R³ | x₃ = d_i}, d_i > 0,  (11.7.13)

while the measurements {ν × E₁} are taken on another two-dimensional device:

Σ_m ⊂ Σ_{d_m} := {x = (x₁, x₂, x₃) ∈ R³ | x₃ = d_m}, d_m > 0.  (11.7.14)

Mathematically, the two devices Σ_i and Σ_m are assumed to be relatively open in Σ_{d_i} and Σ_{d_m}, respectively. Note that these two devices can possibly be the same in practical applications. Since the field E₁ is analytic [20, Theorem 6.3] in Ω₁\Σ_i, the measurement device Σ_m can be chosen to be any relatively open nonempty subset of Σ_{d_m}.
In recent years, many numerical reconstruction methods have been proposed to solve the above inverse problem; see, e.g., the linear sampling method proposed by Gebauer et al. [31] and by Cakoni, Fares and Haddar [12], an iterative method proposed by Delbary et al. [27], and an asymptotic factorization method studied by Griesmaier [35] and by Griesmaier and Hanke [36]. The first step towards a uniqueness result in this model was made in [81] in the form of the following theorem.
Theorem 11.7.4. Assume that D and D̃ are two impenetrable obstacles with physical properties B and B̃, respectively, for the scattering problem. If the tangential components of the electric fields coincide, i.e.,

ν × E₁(x, y, p) = ν × Ẽ₁(x, y, p), ∀x ∈ Σ_m,  (11.7.15)
for all incident fields (E^i_j(x; y, p), H^i_j(x; y, p)) defined by (11.7.11) with source points y ∈ Σ_i and the two different polarizations p = e₁ := (1, 0, 0) and p = e₂ := (0, 1, 0), then D = D̃ and B = B̃.
A key tool in the proof is a novel reciprocity relation which gives the relationship between the solutions of the scattering problem for the incident fields (11.7.11) with two different source points.
Theorem 11.7.5 (Reciprocity relation). For any two source points y_j ∈ Ω_j and two polarizations p_j ∈ R³, j = 1, 2, we have

p₁ · E₁(y₁, y₂, p₂) = (k₁/k₂) p₂ · E₂(y₂, y₁, p₁).  (11.7.16)
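As a quick sanity check (our observation, not stated in [81]): if the two half-spaces have identical material parameters, then k₁ = k₂ and (11.7.16) reduces to

p₁ · E(y₁, y₂, p₂) = p₂ · E(y₂, y₁, p₁),

the classical reciprocity relation for point sources in a homogeneous background; the factor k₁/k₂ thus encodes precisely the effect of the layering.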
To complete the proof of Theorem 11.7.4, we also need the following uniqueness result.
Lemma 11.7.6. Let b > 0 be a fixed constant. Suppose (E, H) is a solution to the Maxwell equations (11.7.7) in Ω_b := {x ∈ R³ | x₃ > b} satisfying the Silver-Müller radiation condition (11.7.10), with Ω_{R,1} replaced by Ω*_{R,1} := Ω_{R,1} ∩ Ω_b, and the perfectly conducting boundary condition

ν × E = 0 on Σ_b := {x ∈ R³ | x₃ = b}.  (11.7.17)

Then E = H = 0 in Ω_b.
Proof. Let R_b denote the reflection with respect to Σ_b in R³. For x = (x₁, x₂, x₃) ∈ R³ define x′ := R_b x = (x₁, x₂, 2b − x₃). For x ∈ R_bΩ_b := {x ∈ R³ | x₃ < b} set

Ẽ(x) = (Ẽ₁(x), Ẽ₂(x), Ẽ₃(x)) := (−E₁(x′), −E₂(x′), E₃(x′)),
H̃(x) = (H̃₁(x), H̃₂(x), H̃₃(x)) := (H₁(x′), H₂(x′), −H₃(x′)).

Define

V(x) := E(x) for x ∈ Ω_b ∪ Σ_b, V(x) := Ẽ(x) for x ∈ R_bΩ_b,
W(x) := H(x) for x ∈ Ω_b ∪ Σ_b, W(x) := H̃(x) for x ∈ R_bΩ_b.

Then a straightforward calculation shows that (V, W) satisfies the Maxwell equations

curl V − ik₁W = 0, curl W + ik₁V = 0 in R³  (11.7.18)

and the radiation condition

lim_{R→∞} ∫_{Ω*_R} |W × ν − V|² ds = 0  (11.7.19)

where Ω*_R := Ω*_{R,1} ∪ R_bΩ*_{R,1}. It is easy to prove that the radiation condition (11.7.19) is equivalent to the Silver-Müller radiation condition

lim_{R→∞} ∫_{Ω_R} |W × ν − V|² ds = 0  (11.7.20)

where Ω_R := {x ∈ R³ | |x| = R}. Thus, (V, W) is an entire solution to the Maxwell equations (11.7.18) satisfying the Silver-Müller radiation condition (11.7.20). Therefore, V = W = 0 in R³ (cf. the proof on page 163 in [20]), so E = H = 0 in Ω_b.
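The “straightforward calculation” can be made explicit at the gluing step (a verification we add for the reader). On Σ_b we have x′ = x and ν = (0, 0, 1), so the boundary condition ν × E = 0 means E₁ = E₂ = 0 on Σ_b, whence

Ẽ(x) = (−E₁(x), −E₂(x), E₃(x)) = (0, 0, E₃(x)) = E(x) on Σ_b,

so V is continuous across Σ_b. Moreover, from curl E = ik₁H the component ik₁H₃ = ∂₁E₂ − ∂₂E₁ involves only tangential derivatives of the vanishing tangential components, so H₃ = 0 on Σ_b and likewise H̃ = H there, giving the continuity of W.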
11.8 Open Problems
We have reviewed many interesting and important results. However, many important questions remain unanswered. In this section, we list some open problems of importance.
Open Problem 1. Assume that n ∈ C(D̄) but n is not necessarily equal to one for x ∈ ∂D. Prove that the inhomogeneity n is uniquely determined by the electric far field pattern E^∞(x̂, d, q) for all x̂, d ∈ S and q ∈ R³.
This problem was proposed in [10]. In the case when the inhomogeneity n ∈ C^{2,α}(R³), this problem has been resolved in [24] and [41]. However, in many practical applications n is not continuously differentiable in R³; in particular, it usually has a jump across the boundary ∂D. In 1993, Hähner [38] made a first step by treating the case when n ∈ C^{1,α}(D̄) and n is a constant (≠ 1) in a neighborhood of ∂D.
Open Problem 2. Suppose the inhomogeneous medium is anisotropic, that is, n is replaced by a 3 × 3 matrix function N. Prove that the support D of I − N is uniquely determined by the far field pattern u^∞(x̂, d) for all x̂, d ∈ S (or the electric far field pattern E^∞(x̂, d, q) for all x̂, d ∈ S and q ∈ R³ in the electromagnetic case).
This problem can be found in [10]. Note that an anisotropic medium (n replaced by a 3 × 3 matrix function N) is not uniquely determined by the far field pattern, and only the support of I − N can be expected to be uniquely recovered from the far field pattern [94]. A uniqueness result has been established, under the condition that ξ·ℜ(N)ξ ≥ γ|ξ|² or ξ·ℜ(N⁻¹)ξ ≥ γ|ξ|², where γ > 1 is a constant, by Hähner [42] for the acoustic case and by Cakoni and Colton [8] for the electromagnetic case. However, as pointed out in [10], the assumption on γ implies that the contrast keeps a fixed sign, being either positive for all x ∈ D or negative for all x ∈ D, which seems artificial.
Open Problem 3. Prove that one incoming plane wave with one single direction and one single wave number completely determines the obstacle, without any additional a-priori information.
This is a well-known problem in inverse scattering theory which has been open for around fifty years. In [95, 96], Potthast considered the important case of finite data. Precisely, assume that (S_n)_{n∈N} is a sequence of finite subsets of S consisting of n elements and satisfying that, for a given ε > 0, we can find n such that the distance

d(x̂, S_n) := inf_{d∈S_n} |x̂ − d|

is smaller than ε for all x̂ ∈ S. Then the following theorem was proved in [95, 96].
Theorem 11.8.1. Let D₁ and D₂ be two sound-soft or sound-hard obstacles, and let u^∞_1 and u^∞_2 be the two far field patterns corresponding to (11.1.4)–(11.1.8). Given ε > 0, there exist integers n₀ and n₁ such that if u^∞_1(x̂; d) = u^∞_2(x̂; d) for x̂ ∈ S_{n₀}, d ∈ S_{n₁}, then d(D₁; D₂) ≤ ε, where d(D₁; D₂) denotes the Hausdorff distance between D₁ and D₂.
Based on Theorem 11.8.1, if one could establish a local uniqueness result under the condition that the Hausdorff distance between two obstacles is small enough, this problem would be resolved. Furthermore, the study of the exact relation between ε and n₀, n₁ is important. Using a different method, Liu and Nachman [68] have shown that there is at most a finite number of bounded Lipschitz obstacles that can share the same far field pattern arising from a single incident wave. Taking another approach, Kress and Rundell [61] were able to obtain a local uniqueness result for an obstacle sufficiently close to a circle. This allows one to recover an obstacle whose boundary lies in some finite-dimensional set from the far field pattern at a discrete set of values corresponding to a single incident wave. In 1999, Kress and Rundell [63] obtained two local uniqueness results. The first seeks to recover the obstacle from the far field pattern u^∞(x̂, d) at a single observation direction x̂ = Qd for all incident directions d ∈ S and a fixed rotation matrix Q; in particular, this includes the case of backscattering. The second considers a set of incident waves at one fixed direction but with wave numbers in an interval.
Open Problem 4. Prove uniqueness of a sound-hard obstacle when the far field pattern is known at a fixed wave number, a fixed incident direction and all observation directions.
In the case of acoustic scattering by a sound-soft obstacle, corresponding uniqueness results have been established by Colton and Sleeman [25], Gintides [34], Stefanov and Uhlmann [109] and Rondi [106]. Note that Schiffer's method cannot be applied here since, contrary to the case of sound-soft obstacles, one cannot control the space of solutions to the homogeneous Neumann problem in a component of the difference of two possible obstacles. Since this component can have arbitrary cusps at its boundary, one cannot even conclude that this space is finite dimensional.
Open Problem 5. Assume that D is a perfect conductor. Is it uniquely determined by the electric far field pattern E^∞(x̂, d, p) for x̂ ∈ S, p ∈ R³ and a finite number of directions d ∈ S? Under what conditions can D be determined by a single incident plane wave?
Since E^∞(x̂, d, p) is an analytic function of x̂ and d on S, it suffices to know E^∞(x̂, d, p) with x̂, d on an open subset of S. Furthermore, since E^∞(x̂, d, p) is
linear in p, it suffices to know E^∞(x̂, d, p) for three linearly independent vectors p₁, p₂, p₃. However, a rigorous mathematical justification is still not available.
Open Problem 6. Prove the unique determination of a polygonal or polyhedral obstacle with an impedance boundary condition from the far field pattern for one single incident direction.
Based on analytic continuation along a straight plane or line, for convex obstacles with an impedance boundary condition Cheng and Yamamoto [14] gave a uniqueness result in the two-dimensional case by using two incident directions. Liu and Zou [73] also proved a uniqueness result for this case; however, there is a gap in their proof that is hard to fill. Since there is no corresponding reflection principle, some new idea must be used to prove the uniqueness result for polygonal or polyhedral obstacles with impedance boundary conditions. The corresponding case of electromagnetic scattering is also open.
Open Problem 7. Can a sound-soft or sound-hard ball/disc be determined by at most four/three far field pattern data?
Note that a ball in R³ or a disc in R² can be uniquely identified by its radius and center. Based on a study of the zeros of Bessel and spherical Bessel functions, Liu and Zou [74] proved that, in the resonance region, the shape (radius) of a sound-soft/sound-hard ball in R³ or a sound-soft/sound-hard disc in R² is uniquely determined by a single far field datum measured at a fixed observation direction corresponding to a single incident plane wave. The result in [74] was extended to the electromagnetic case with a perfectly conducting ball in [44]. Further, in [44], with the help of the relation between the far field patterns under translation, it is shown that the location (center) of the ball can also be uniquely determined if three more far field data are added. An explicit formula is given in [44] for the electric far field pattern of a perfectly conducting ball, and this result is then applied to obtain the uniqueness result in the electromagnetic case. The far field pattern u^∞ is a complex-valued function defined on the unit sphere, while the radius and the center of a ball/disc are uniquely determined by four/three real numbers. Thus, from a purely information-matching point of view, it seems that two far field data should be enough.
Open Problem 8. If the background medium is inhomogeneous, can the obstacle and the background medium be determined simultaneously by the far field pattern for all incident and observation directions and a fixed wave number?
To the authors' knowledge, there are few papers studying the inverse obstacle scattering problem in an inhomogeneous medium. In particular, there are no global uniqueness results on recovering both the obstacle and the surrounding medium from the far field pattern at a fixed frequency. In 1998, Kirsch and Päivärinta [51] proved a uniqueness result in determining a sound-soft or
a penetrable (with transmission boundary conditions) obstacle when the outside inhomogeneity is known in advance. In the same year, Hähner [40] considered the scattering of time-harmonic waves by a simply connected, sound-soft obstacle in an inhomogeneous medium in R², using an interval of wave numbers. Later, in 2007, the authors of [90] showed that an obstacle inside a known inhomogeneous medium can be uniquely determined from measurements of the far field at one frequency, without a-priori knowledge of the boundary conditions.
Open Problem 9. Can the modulus of the far field pattern, |u^∞|, for one incident wave uniquely determine a sound-soft obstacle D?
The problem of uniqueness was partially investigated by Kress and Rundell in [62]. They have shown that for plane wave incidence the modulus of the far field pattern is invariant under translations of the sound-soft obstacle D; that is, for the shifted domain D_h := {x + h : x ∈ D} with a fixed vector h ∈ R², the far field pattern u^∞_h satisfies the equality

u^∞_h(x̂) = e^{ikh·(d−x̂)} u^∞(x̂).  (11.8.1)
Therefore, from the modulus of the far field pattern for one incident plane wave one cannot recover the location of the obstacle. This ambiguity cannot be remedied by using finitely many incident waves with different wave numbers or different incident directions. It was also pointed out in [62] that it is a very difficult problem to obtain an analogue for Schiffer’s uniqueness result since its proof heavily relies on the fact that, by Rellich’s lemma, the far field pattern u∞ uniquely determines the scattered wave us . A corresponding result is not available for the modulus of the far field pattern, even with the translation invariance taken into account. The equality (11.8.1) holds also in the case of Neumann and impedance boundary conditions (see [78]). A corresponding result in the electromagnetic case can be found in [44]. Recently, Ivanyshyn [45, 46] gave some numerical examples through a nonlinear integral equation approach, which imply that shape reconstruction from the modulus of the far field pattern is possible. Under the condition that the ball is small, Liu and Zhang [79] proved that it is uniquely determined by the modulus of a single far field datum measured at a fixed observation corresponding to a single incident plane wave.
Acknowledgements This work was supported by the National Natural Science Foundation of China under grant no. 11071244.
276
11
Uniqueness Results for Inverse Scattering Problems
References [1] T. Abboud and J. C. Nedelec, Electromagnetic waves in an inhomogeneous medium, J. Math. Anal. Appl., 164, 40-58, 1992. [2] G. Alessandrini and L. Rondi, Determining a sound-soft polyhedral scatterer by a single far-field measurement, Proc. Amer. Math. Soc., 133, 1685-1691, 2005. Corrigendum: http://arxiv.org/abs/math.AP/0601406. [3] C. J. S. Alves and T. H. Duong, On inverse scattering by screens, Inverse Problems, 13, 1161-1176, 1997. [4] N. Ari and J. R. Firth, Acoustic inverse scattering problems of polygonal shape reconstruction, Inverse Problems, 6, 299-309, 1990. [5] C. Athanasiadis, A. G. Ramm and I. G. Stratis, Inverse acoustic scattering by a layered obstacle, Inverse Problem, Tomography and Image Processing, New York, Plenum, 1-8, 1998. [6] C. Athanasiadis and I. G. Stratis, On some elliptic transmission problems, Ann. Polon. Math., 63, 137-154, 1996. [7] A. Bukhgeim, Recovering the potential from Cauchy data in two dimensions, J. Inverse Ill-Posed Problems, 16, 19-33, 2008. [8] F. Cakoni and D. Colton, A uniqueness theorem for an inverse electromagnetic scattering problem in inhomogeneous anisotropic media, Proc. Edinburgh Math. Soc., 46, 293-314, 2003. [9] F. Cakoni and D. Colton, The determination of the surface impedance of a partially coated obstacle from far field data, SIAM J. Appl. Math., 64, 709-723, 2004. [10] F. Cakoni and D. Colton, Open problems in the qualitative approach to inverse electromagnetic scatteing theory, Euro. J. Appl. Math., 16, 411-425, 2005. [11] F. Cakoni and D. Colton, Qualitative Methods in Inverse Scattering Theory: An Introduction, New York, Springer, 2006. [12] F. Cakoni, M. B. Fares and H. Haddar, Analysis of two linear sampling methods applied to electromagnetic imaging of buried objects, Inverse Problems, 22, 845867, 2006. [13] J. Cheng and M. Yamamoto, Uniqueness in an inverse scattering problem within non-trapping polygonal obstacles with at most two incoming waves, Inverse Problems, 19, 1361-1384, 2003. Corrigendum: Inverse Problems, 21, 1193, 2005. [14] J. Cheng and M. Yamamoto, Global uniqueness in the inverse acoustic scattering problem within polygonal obstacles, Chin. Ann. Math., 25 B, 1-6, 2004. [15] D. Colton, Analytic Theorem of Partial Differential Equations, London, Pitman Publishing, 1980. [16] D. Colton, A reflection principle for solutions to the Helmholtz equation and an application to the inverse scattering problem, Glasgow Math J., 18, 125-130, 1997. [17] D. Colton, Inverse acoustic and electromagnetic scattering theory, Inside Out: Inverse Problems, MSRI Publications, Vol. 47, 2003. [18] D. Colton and A. Kirsch, Karp’s theorem in acoustic scattering theory, Proc. Amer. Math. Soc., 103, 783-788, 1988. [19] D. Colton and R. Kress, Karp’s theorem in electromagnetic scattering theory, Proc. Amer. Math. Soc., 104, 764-769, 1988.
References
277
[20] D. Colton and R. Kress, Inverse Acoustic and Electromagnetic Scattering Theory, 2nd Edition, Berlin, Springer, 1998. [21] D. Colton and R. Kress, Using fundamental solutions in inverse scattering, Inverse Problems, 22, R49-R66, 2006. [22] D. Colton and R. Kress, Inverse scattering, Handbook of Mathematical Methods in Imaging, Editor: O. Scherzer, New York, Springer, 551-598, 2011. [23] D. Colton, R. Kress and P. Monk, Inverse scattering from an orthotropic medium, J. Comput. Appl. Math., 81, 269-298, 1997. [24] D. Colton and L. Päivärinta, The uniqueness of a solution to an inverse scattering problem for electromagnetic waves, Arch. Rational Mech. Anal., 119, 59-70, 1992. [25] D. Colton and B. D. Sleeman, Uniqueness theorems for the inverse problem of acoustic scattering, IMA J. Appl. Math., 31, 253-259, 1983. [26] P. M. Cutzach and C. Hazard, Existence, uniqueness and analyticity properties for electromagnetic scattering in a two-layered medim, Math. Methods Appl. Sci., 21, 433-461, 1998. [27] F. Delbary, K. Erhard, R. Kress, R. Potthast and J. Schulz, Inverse electromagnetic scattering in a two-layered medium with an application to mine detection, Inverse Problems, 24, 015002, 2008. [28] J. Elschner and M. Yamamoto, Uniqueness in determining polygonal sound-hard obstacles with a single incoming wave, Inverse Problems, 22, 355-364, 2006. [29] J. Elschner and M. Yamamoto, Uniqueness in determining polygonal sound-hard obstacles with a single incoming wave, Inverse Problems, 22, 355-364, 2006. [30] J. Elschner and M. Yamamoto, Uniqueness in determining polyhedral soundhard obstacles with a single incoming wave, Inverse Problems, 24, 035004, 2008. [31] B. Gebauer, M. Hanke, A. Kirsch, W. Muniz and C. Schneider, A sampling method for detecting buried objects using electromagnetic scattering, Inverse Problems, 21, 2035-2050, 2005. [32] T. Gerlach, The two-dimensional electromagnetic inverse scattering problem for chiral media, Inverse Problems, 15, 1663-1675, 1999. [33] T. Gerlach and R. Kress, Uniqueness in inverse obstacle scattering with conductive boundary condition, Inverse Problems, 12, 619-625, 1996. Corrigendum: Inverse Problems, 12, 1075, 1996. [34] D. Gintides, Local uniqueness for the inverse scattering problem in acoustics via the Faber-Krahn inequality, Inverse Problems, 21, 1195-1205, 2005. [35] R. Griesmaier, An asymptotic factorization method for inverse electromagnetic scattering in layered media, SIAM J. Appl. Math., 68, 1378-1403, 2008. [36] R. Griesmaier and M. Hanke, An asymptotic factorization method for inverse electromagnetic scattering in layered media II: A numerical study, Contemp. Math., 494, 61-79, 2009. [37] H. Haddar and P. Joly, Stability of thin layer approximation of electromagnetic waves scattering by linear and nonlinear coatings, J. Comput. Appl. Math., 143, 201-236, 2002. [38] P. Hähner, A uniqueness theorem for a transmission problem in inverse electromagnetic scattering, Inverse Problems, 9 , 667-678, 1993. [39] P. Hähner, A periodic Faddeev-type solution operator, J. Diff. Equations, 128, 300-308, 1996.
278
11
Uniqueness Results for Inverse Scattering Problems
[40] P. Hähner, A uniqueness theorem for an inverse scattering problem in an exterior domain, SIAM J. Math. Anal., 29, 1118-1128, 1998. [41] P. Hähner, On Acoustic, Electromagnetic and Elastic Scattering Problems in Inhomogeneous Media, Habilitation thesis, Göttingen, 1998. [42] P. Hähner, On the uniqueness of the shape of a penetrable, anisotropic obstacle, J. Comput. Appl. Math., 116, 167-180, 2000. [43] F. Hettlich, On the uniqueness of the inverse conductive scattering problem for the Helmholtz equation, Inverse Problems, 10, 129-144, 1994. [44] G. Hu, X. Liu and B. Zhang, Unique determination of a perfectly conducting ball by a finite number of electric far field data, J. Math. Anal. Appl., 352, 861-871, 2009. [45] O. Ivanyshyn, Shape reconstruction of acoustic obstacles from the modulus of the far field pattern, Inverse Problem Imaging, 1, 609-622, 2007. [46] O. Ivanyshyn and R. Kress, Identification of sound-soft 3D obstacles from phaseless data, Inverse Problems Imaging, 4, 131-149, 2010. [47] V. Isakov, On uniqueness in the inverse transmission scattering problem, Commun. Part. Diff. Eqns., 15, 1565-1587, 1990. [48] V. Isakov, Inverse Problems for Partial Differential Equations, 2nd Edition, New York, Springer, 2006. [49] S. N. Karp, Far field amplitudes and inverse diffraction theory, Electromagnetic Waves, Editor: R. E. Langer, Madison, Univ. of Wisconsin Press, 291-300, 1962. [50] A. Kirsch and R. Kress, Uniqueness in inverse obstacle scattering, Inverse Problems, 9, 285-299, 1993. [51] A. Kirsch and L. Päivärinta, On recovering obstacles inside inhomogeneities, Math. Meth. Appl. Sci., 21, 619-651, 1998. [52] R. Kress, Numerical methods in inverse acoustic obstacle scattering, Inverse Problems in PartialDifferential Equations, Editors: D. Colton, R. Ewing and W. Rundell, Philadelphia, PA, SIAM, 61-72, 1990. [53] R. Kress, Inverse scattering from an open arc, Math. Meth. Appl. Sci., 18, 267293, 1995. [54] R. Kress, Fréchet diffirentiability of the far field operator for scattering from a crack, J. Inverse Ill-Posed Problems, 3, 305-313, 1995. [55] R. Kress, Acoustic scattering: Specific theoretical tools, Scattering, Editors: R. Pike and P. Sabatier, London, Academic Press, 37-51, 2001. [56] R. Kress, Acoustic scattering: Scattering by obstacles, Scattering, Editors: R. Pike and P. Sabatier, London, Academic Press, 52-73, 2001. [57] R. Kress, Electromagnetic waves scattering: Specific theoretical tools, Scattering, Editors: R. Pike and P. Sabatier, London, Academic Press, 175-190, 2001. [58] R. Kress, Electromagnetic waves scattering: Scattering by obstacles, Scattering, Editors: R. Pike and P. Sabatier, London, Academic Press, 191-210, 2001. [59] R. Kress, Uniqueness in inverse obstacle scattering for electromagnetic waves, Proceedings of the URSI General Assembly, Maastricht, 2002. Full paper is avaible via http://num.math.uni-goettingen.de/kress/ursi.ps. [60] R. Kress, Uniqueness and numerical methods in inverse obstacle scattering, J. Phys.: Conf. Series, 73, 012003, 2007. [61] R. Kress and W. Rundell, A quasi-Newton method in inverse obstacle scattering, Inverse Porblms, 10, 1145-1157, 1994.
References
279
[62] R. Kress and W. Rundell, Inverse obstacle scattering with modulus of the far field pattern as data, Inverse Problems in Medical Imaging and Nondestructive Testing, Editors: H. Engl, et al., New York, Springer, 75-92, 1997. [63] R. Kress and W. Rundell, Inverse obstacle scattering using reduced data, SIAM J. Appl. Math., 59, 442-454, 1999. [64] R. Kress and W. Rundell, Inverse scattering for shape and impedance, Inverse Problems, 17, 1075-1085, 2001. [65] P. D. Lax and R. S. Phillips, Scattering Theory, New York, Academic Press, 1967. [66] C. Liu, Inverse obstacle problem: local uniqueness for rougher obstacles and the identification of a ball, Inverse Problems, 13, 1063-1069, 1997. [67] C. Liu, An inverse obstacle problem: a uniqueness theorem for balls, Inverse Problems in Wave Propagation, Editors: Chavent, et al., IMA Vol.90, Berlin, Springer, 347-356, 1997. [68] C. Liu and A. Nachman, A scattering theory analogue of a theorem of Polya and an inverse obstacle problem, Preprint, 1994. [69] H. Liu, A global uniqueness for formally determined inverse electromagnetic obstacle scattering, Inverse Problems, 24, 035018, 2008. [70] H. Liu, M. Yamamoto and J. Zou, Reflection principle for Maxwell equations and its application toinverse electromagnetic scattering, Inverse Problems, 23, 2357-2366, 2007. [71] H. Liu, M. Yamamoto and J. Zou, New reflection principles for Maxwell equations and their applications, Numer. Math. TMA., 2, 1-17, 2009. [72] H. Liu and J. Zou, Uniqueness in an inverse obstacle scattering problem for both sound-hard and sound-soft polyhedral scatterers, Inverse Problems, 22, 515-524, 2006. [73] H. Liu and J. Zou, On unique determination of partially coated polyhedral scatterers with far field measurements, Inverse Problems, 23, 297-308, 2007. [74] H. Liu and J. Zou, Zeros of Bessel and spherical Bessel functions and their applications for uniqueness in inverse acoustic obstacle scattering problems, IMA J. Appl. Math., 72, 817-831, 2007. [75] H. Liu and J. Zou, Uniqueness in determining multiple polygonal scatterers of mixed type, Discrete Cont. Dynamical Syst. B, 9, 375-396, 2008. [76] H. Liu and J. Zou, On uniqueness in inverse acoustic and electromagnetic obstacle scattering problems, J. Phys.: Conf. Series, 124, 012006, 2008. [77] H. Liu, H. Zhang and J. Zou, Recovery of polyhedral scatterers by a single electromagnetic far-field measurement, J. Math. Phys., 50, 123506, 2009. [78] J. Liu and J. Seo, On stability for a translated obstacle with impedance boundary condition, Nonlin. Anal., 59, 731-744, 2004. [79] X. Liu and B. Zhang, Unique determination of a sound soft ball by the modulus of a single far field datum, J. Math. Anal. Appl., 365, 619-624, 2009. [80] X. Liu and B. Zhang, A uniqueness result for the inverse electromagnetic scattering problem in a piecewise homogeneous medium, Appl. Anal., 88, 1339-1355, 2009. [81] X. Liu and B. Zhang, A uniqueness result for inverse electromagnetic scattering problem in a two-layered medium, Inverse Problems, 26, 105007, 2010.
280
11
Uniqueness Results for Inverse Scattering Problems
[82] X. Liu and B. Zhang, Direct and inverse scattering problem in a piecewise homogeneous medium, SIAM J. Appl. Math., 70, 3105-3120, 2010. [83] X. Liu and B. Zhang, Inverse scattering by an inhomogeneous penetrable obstacle in a piecewise homogeneous medium, submitted, (arXiv:0912.2788v1), 2009. [84] X. Liu, B. Zhang and J. Yang, The inverse electromagnetic scattering problem in a piecewise homogeneous medium, Inverse Problems, 26, 125001, 2010 (arXiv:1001.2998v1). [85] X. Liu, B. Zhang and G. Hu, Uniqueness in the inverse scattering problem in a piecewise homogeneous medium, Inverse Problems, 26, 015002, 2010. [86] W. Mclean, Strongly Elliptic Systems and Boundary Integral Equation, Cambridge, Cambridge Univ. Press, 2000. [87] L. Mönich, On the inverse acoustic scattering problem by an open arc: the sound-hard case, Inverse Problems, 13, 1379-1392, 1997. [88] P. Monk, Finite Element Methods for Maxwell’s Equations, Oxford, Oxford Univ. Press, 2003. [89] A. Nachman, Reconstructions from boundary measurements, Ann. Math., 128, 531-587, 1988. [90] A. Nachman, L. Päivärinta and A. Teirlilä, On imaging obstacle inside inhomogeneous media, J. Func. Anal., 252, 490-516, 2007. [91] R. Novikov, Multidimensional inverse spectral problems for the equation −Δψ + (v(x) + Eu(x))ψ = 0, Funktsionalny Analizi Ego Prilozheniya, 22, 11-12, 1988. Transl. Func. Anal. and its Appl., 22, 263-272, 1988. [92] P. Ola, L. Päivärinta and E. Somersalo, An inverse boundary value problem in electrodynamics, Duke Math. J., 70, 617-653, 1993. [93] P. Ola and E. Somersalo, Electromagnetic inverse problems and generalized Sommerfeld potentials, SIAM J. Appl. Math., 56, 1129-1145, 1996. [94] M. Piana, On uniqueness for anisotropic inhomogeneous inverse scattering problems, Inverse Problems, 14, 1565-1579, 1998. [95] R. Potthast, On a concept of uniqueness in inverse scattering for a finite number of incident waves, SIAM J. Appl. Math., 58, 666-682, 1998. [96] R. Potthast, Point Sources and Multipoles in Inverse Scattering Theory, Chapman and Hall/CRC, London, 2001. [97] R. Potthast, A survey on sampling and probe methods for inverse problems, Inverse Problems, 22, R1-R47, 2006. [98] P. Y. H. Pang and G. Yan, Uniqueness of inverse scattering problem for a penetrable obstacle with rigid core, Appl. Math. Lett., 14, 155-158, 2001. [99] A. G. Ramm, Recovery of the potential from fixed energy scattering data, Inverse Problems, 4, 877-886, 1988. [100] A. G. Ramm, Symmetry properties of scattering amplitudes and applicaitons to inverse problems, J. Math. Anal. Appl., 156, 333-340, 1991. [101] A. G. Ramm, Multidimensional Inverse Scattering Problems, New York, Longman Scientific & Wiley, 1992. [102] A. G. Ramm, A new method for proving uniqueness theorems for inverse obstacle scattering, Appl. Math. Lett., 6, 85-87, 1993. [103] A. G. Ramm, Uniqueness theorems for inverse obstacle scattering problems in Lipschitz domains, Appl. Anal., 59, 337-383, 1995.
References
281
[104] F. Rellich, Über das asymptotische Verhalten der Lösungen von Δu + λu = 0 in unendlichen Gebieten, J. Deutsch. Math. Verein., 53, 57-65, 1943. [105] L. Rondi, Uniqueness and Optimal Stability for the Determination of Multiple Defects by Electrostatic Measurements, S.I.S.S.A-I.S.A.S., Trieste, PhD Thesis, 1999. [106] L. Rondi, Unique determination of non-smooth sound-soft scatters by finitely many far-field measurements, Indiana Univ. Math. J., 52, 1631-1662, 2003. [107] L. Rondi, Uniqueness for the determination of sound-soft defects in an inhomogeneous planar medium by acoustic boundary measurements, Trans. Amer. Math. Soc., 355, 213-239, 2003. [108] B. D. Sleemann, The inverse problem of acoustic scattering, IMA J. Appl. Math., 29, 113-142, 1982. [109] P. Stefanov and G. Uhlman, Local uniqueness for the fixed energy fixed angle inverse problem in obstacle scattering, Proc. Amer. Math. Soc., 132, 1351-1354, 2004. [110] J. Sylvester and G. Uhlmann, A global uniqueness theorem for an inverse boundary value problem, Ann. Math., 125, 153-169, 1987. [111] I. N. Vekua, On metaharmonic functions, Trudy Tbilisskogo matematichesgo Instituta, 12, 105-174, 1943. [112] G. Yan, Inverse scattering by a multilayered obstacle, Comput. Math. Appl., 48, 1801-1810, 2004. [113] G. Yan and P. Y. H. Pang, Inverse obstacle scattering with two scatterers, Appl. Anal., 70, 35-43, 1998. [114] K. Yun, The reflection of solutions of Helmholtz equation and an application, Commun. Korean Math. Soc., 16, 427-436, 2001.
Authors Information X. D. Liu and B. Zhang LSEC and Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P. R. China, E-mail: [email protected],[email protected]
Chapter 12
Shape Reconstruction of Inverse Medium Scattering for the Helmholtz Equation G. Bao and P. J. Li
Abstract. Consider a time-harmonic electromagnetic plane wave incident on a medium enclosed by a bounded domain in two dimensions. In this chapter, existence and uniqueness of the variational problem for direct scattering are established. An energy estimate for the scattered field is obtained on which the Born approximation is based. Fréchet differentiability of the scattering map is examined. A continuation method for the inverse obstacle scattering problem, which reconstructs the shape of the inhomogeneous mediums from boundary measurements of the scattered wave, is developed. The algorithm requires multi-frequency data. Using an initial guess from the Born approximation, each update of the shape, represented by the level set function, is obtained via recursive linearization on the wavenumber by solving one forward problem and one adjoint problem of the Helmholtz equation.
12.1
Introduction
Consider the Helmholtz equation in two dimensions Δψtot + κ2 εψtot = 0,
(12.1.1)
where ψtot is the total electric field, κ > 0 is the wavenumber, and ε is the relative electric permittivity. Rewrite ε = 1 + q(x) and q(x) > −1, which has a compact support, is the scatterer. The total electric field ψtot consists of the incident field ψinc and the scattered field ψ: ψtot = ψinc + ψ. Assume that the incident field is a plane wave ψinc (x) = eiκx·d ,
(12.1.2)
284
12
Shape Reconstruction of Inverse Medium Scattering
where d ∈ S1 = {x ∈ R2 : |x| = 1} is the propagation direction. Evidently, such an incident wave satisfies the homogenous equation Δψinc + κ2 ψinc = 0.
(12.1.3)
It follows from the equations (12.1.1) and (12.1.3) that the scattered field satisfies (12.1.4) Δψ + κ2 (1 + q)ψ = −κ2 qψinc . In addition, the scattered field is required to satisfy the following Sommerfeld radiation condition & % ∂ψ √ − iκψ = 0, ρ = |x|, lim ρ ρ→∞ ∂ρ uniformly along all directions x/|x|. In practice, it is convenient to reduce the problem to a bounded domain by introducing an artificial surface. Let Ω be the compact support of the scatterer q(x). Assume that R > 0 is a constant, such that the support of the scatterer, Ω, is included in the open ball B = {x ∈ R2 : |x| < R}. Let S be the sphere of the ball, i.e., S = {x ∈ R2 : |x| = R}. Denote n the outward unit normal to S. A suitable boundary condition then has to be imposed on S. For simplicity, we employ the first order absorbing boundary condition [16] as ∂n ψ − iκψ = 0,
on S.
(12.1.5)
Given the incident field ψinc , the direct scattering problem is to determine the scattered field ψ for the known scatterer q. Based on the Lax-Milgram lemma, the direct problem is shown to have a unique solution for all but possibly a discrete set of wavenumbers. Furthermore, an energy estimate for the scattered filed, with a uniform bound with respect to the wavenumber κ, is given in the case of low frequency. The estimate provides a theoretical basis of the linearization algorithm. Properties on continuity and Fréchet differentiability of the scattering map are also examined. For analysis of the direct scattering in open domain, the reader is referred to [1, 8] and references therein. The relative permittivity or the scatterer is assumed to be constant with a known value inside inhomogeneities. The inverse obstacle scattering is to determine the number, shapes, sizes and locations of these inhomogeneities from the measurements of near field current densities, ψ|S , given the incident field. Our goal of this work is to present a recursive linearization method that solves the inverse obstacle scattering problem of Helmholtz equation in two dimensions. The reader may refer to [2, 6] and [3, 4] for recursive linearization approaches for solving inverse medium scattering problems in two dimensions and three dimensions, respectively. The algorithm requires multi-frequency
12.2 Analysis of the scattering map
285
scattering data, and the recursive linearization is obtained by a continuation method on the wavenumber κ. It first solves a linear equation (Born approximation) at the lowest κ, which maybe done by using the Fast Fourier Transform (FFT). Updates are subsequently obtained by using higher and higher wavenumber κ from the level set representation. Using the idea of Kaczmarz method [10, 11, 22, 23], we use partial data to perform the nonlinear Landweber iteration at each stage of the wavenumber κ. For each iteration, one forward and one adjoint state of the Helmholtz equation are solved. The level set method was originally developed for describing the motion of curves and surfaces [25]. Since then, it has found application in a variety of quite different situations [24, 26]. The idea of using level set representation as part of a solution scheme for inverse problems involving obstacles can be found in [27, 5, 11, 15, 21]. For related results on the inverse obstacle problem, the reader is referred to [7, 9, 14, 18, 19, 20] and references therein. See [8] for an account of the recent progress on the general inverse scattering problem. The plan of this chapter is as follows. The analysis of the variational problem for direct scattering is presented in Section 12.2. The well-posedness of the direct scattering is proved, and important energy estimate is given, and the Fréchet differentiability of the scattering map is examined. Section 12.3 is devoted to the numerical study of the inverse obstacle scattering, and a regularized iterative linearization algorithm is proposed. Numerical examples are presented in Section 12.4. The paper is concluded with some remarks and directions for future research in Section 12.5.
12.2
Analysis of the scattering map
For convenience, denote the inner products u · v, ¯ and u, v = u · v, ¯ (u, v) = S
B
where the bar denotes the complex conjugate. To state our boundary value problem, we introduce the bilinear form a : 1 H (B) × H 1 (B) → C a(u, v) = (∇u, ∇v) − κ2 (εu, v) − iκu, v, and the linear functional on H 1 (B) b(v) = κ2 (qψinc , v). Then, we have the weak form of the boundary value problem (12.1.4) and (12.1.5): Find ψ ∈ H 1 (B) such that a(ψ, ξ) = b(ξ),
for all ξ ∈ H 1 (B).
(12.2.1)
286
12
Shape Reconstruction of Inverse Medium Scattering
Throughout the chapter, C stands for a positive generic constant whose value may change step by step, but should be always be clear from the contexts. Lemma 12.2.1. Given the scatterer q ∈ L∞ (B), the direct scattering problem (12.1.4)–(12.1.5) has at most one solution. Proof. It suffices to show that ψ = 0 in B if ψi = 0 (no source term). From the Green’s formula , ¯ ¯ ¯ ¯ 0 = (ψΔψ − ψΔψ) = ψ∂n ψ − ψ∂n ψ = −2iκ |ψ|2 , B
S
S
we get ψ = 0 on S. The absorbing boundary condition on S yields further that ∂n ψ = 0 on S. By the Holmgren uniqueness theorem, ψ = 0 in R2 \ B. A unique continuation result [17] concludes that ψ = 0 in B. Theorem 12.2.2. If the wavenumber k is sufficiently small, the variational problem (12.2.1) admits a unique weak solution in H 1 (B). Furthermore, it holds the estimate ψ H 1 (B) ≤ Cκ q L∞ (B) ψinc L2 (B) ,
(12.2.2)
where the constant C is independent of the wavenumber κ. Proof. Decompose the bilinear form a into a = a1 + κ2 a2 , where a1 (ψ, ξ) = (∇ψ, ∇ξ) − iκψ, ξ, a2 (ψ, ξ) = −(εψ, ξ). We conclude that a1 is coercive from |a1 (ψ, ψ)| ≥ C ∇ψ 2L2 (B) +κ ψ 2L2 (S) ≥ Cκ ψ 2H 1 (B) ,
for all ψ ∈ H 1 (B),
where the last inequality may be obtained by applying standard elliptic estimates [13]. Next we prove the compactness of a2 . Define the operator K : L2 (B) → 1 H (B) by a1 (Kψ, ξ) = a2 (ψ, ξ), for all ξ ∈ H 1 (B), which gives (∇Kψ, ∇ξ) − iκKψ, ξ = −(εψ, ξ),
for all ξ ∈ H 1 (B).
Using the Lax-Milgram lemma, it follows that Kψ H 1 (B) ≤ Cκ−1 ψ L2 (B) ,
(12.2.3)
12.2 Analysis of the scattering map
287
where the constant C is independent of k. Thus K is bounded from L2 (B) to H 1 (B), and H 1 (B) is compactly imbedded into L2 (B). Hence K : L2 (B) → L2 (B) is a compact operator. Define a function u ∈ L2 (B) by requiring u ∈ H 1 (B) and satisfying a1 (u, ξ) = b(ξ),
for all ξ ∈ H 1 (B).
It follows from the Lax–Milgram lemma again that u H 1 (B) ≤ Cκ q L∞ (B) ψinc L2 (B) .
(12.2.4)
Using the operator K, we can see that the problem (12.2.1) is equivalent to find ψ ∈ L2 (B) such that (12.2.5) (I + κ2 A)ψ = u. When κ is sufficiently small, the operator I + κ2 K has a uniformly bounded inverse. We then have the estimate ψ L2 (B) ≤ C u L2 (B) ,
(12.2.6)
where the constant C is independent of κ. Rearranging (12.2.5), we have ψ = u − κ2 Kψ, so ψ ∈ H 1 (B) and, by the estimate (12.2.3) for the operator K, we have ψ H 1 (B) ≤ u H 1 (B) +Cκ ψ L2 (B) . The proof is completed by combining (12.2.6) and (12.2.4). Remark 12.2.3. The energy estimate of the scattered field (12.2.2) provides a criterion for weak scattering. From this estimate, it is easily seen that fixing any two of the three quantities, i.e., the wavenumber κ, the compact support of the scatterer Ω, and the L∞ (B) norm of the scatterer, the scattering is weak when the third one is small. Especially, for the given scatterer q(x), i.e., the norm and the compact support are fixed, the scattering is weak when the wavenumber κ is small. Remark 12.2.4. For a general wavenumber κ, from the equation (12.2.5), the existence follows from the Fredholm alternative and the uniqueness result. However, the constant C in the estimate (12.2.2) depends on the wavenumber. For a given scatterer q and an incident field ψinc , we define the map S(q, ψinc ) by ψ = S(q, ψinc ), where ψ is the solution of the problem (12.1.4)–(12.1.5) or the variational problem (12.2.1). It is easily seen that the map S(q, ψinc ) is linear with respect to the incident field ψinc but is nonlinear with respect to q. Hence, we may denote S(q, ψinc ) by S(q)ψinc . Concerning the map S(q), we have the following regularity results. Corollary 12.2.5 gives the boundedness of S(q) and directly follows from Theorem 12.2.2; while a continuity result for the map S(q) is presented in Lemma 12.2.6.
288
12
Shape Reconstruction of Inverse Medium Scattering
Corollary 12.2.5. Given q ∈ L∞ (B), the scattering map S(q) is a bounded linear map from L2 (B) to H 1 (B). Moreover, there is a constant C depending on κ and B such that S(q)ψinc H 1 (B) ≤ C q L∞ (B) ψinc L2 (B) .
(12.2.7)
Lemma 12.2.6. Assume that q1 , q2 ∈ L∞ (B). Then (S(q1 ) − S(q2 )) ψinc H 1 (B) ≤ C q1 − q2 L∞ (B) ψinc L2 (B) ,
(12.2.8)
where the constant C depends on κ, B, and the bound of q2 . Proof. Let ψ1 = S(q1 )ψinc and ψ2 = S(q2 )ψinc . It follows that for j = 1, 2 Δψj + κ2 (1 + qj )ψj = −k 2 qj ψinc . By setting w = ψ1 − ψ2 , we have Δw + κ2 (1 + q1 )w = −κ2 (q1 − q2 )(ψinc + ψ2 ). The function w also satisfies the boundary condition (12.1.5). We repeat the procedure in the proof of Theorem 12.2.2 to obtain w H 1 (B) ≤ C q1 − q2 L∞ (B) ψinc + ψ2 L2 (B) . Using Corollary 12.2.5 for ψ2 yields ψ2 H 1 (B) ≤ C q2 L∞ (B) ψinc L2 (B) , which gives (S(q1 ) − S(q2 )) ψinc H 1 (B) ≤ C q1 − q2 L∞ (B) ψinc L2 (B) , where the constant C depends on B, κ, and the bound of q2 . Let γ be the restriction (trace) operator to the boundary S. By the trace theorem, γ is a bounded linear operator from H 1 (B) onto H 1/2 (S). We can now define the scattering map M (q) = γS(q). Next is to consider the Fréchet differentiability of the scattering map. Recall the map S(q) is nonlinear with respect to q. Formally, by using the first order perturbation theory, we obtain the linearized scattering problem of (12.1.4)–(12.1.5) with respect to a reference scatterer q, Δv + κ2 (1 + q)v = −κ2 δq(ψinc + ψ), ∂n ψ − iκv = 0, where ψ = S(q)ψinc .
on S,
in Ω,
(12.2.9) (12.2.10)
12.2 Analysis of the scattering map
289
Define the formal linearization T (q) of the map S(q) by v = T (q)(δq, ψinc ), where v is the solution of the problem (12.2.9)–(12.2.10). The following result is concerned with the boundedness for the map T (q). A proof by be given by following step by step the proofs of Theorem 12.2.2 and Lemma 12.2.6. Hence we omit here. Lemma 12.2.7. Assume that q, δq ∈ L∞ (B), and ψinc is the incident field. Then v = T (q)(δq, ψinc) ∈ H 1 (B) with the estimate T (q)(δq, ψinc) H 1 (B) ≤ C δq L∞ (B) ψinc L2 (B) ,
(12.2.11)
where the constant C depends on κ, B, and q. The next lemma is concerned with the continuity property of the map. Lemma 12.2.8. For any q1 , q2 ∈ L∞ (B), and an incident field ψinc , the following estimate holds T (q1 )(δq, ψinc ) − T (q2 )(δq, ψinc ) H 1 (B) ≤ C q1 − q2 L∞ (B) δq L∞ (B) ψinc L2 (B) , (12.2.12) where the constant C depends on κ and B. Proof. Let vi = T (qi )(δq, ψinc ), for i = 1, 2. It is easy to see that Δ(v1 − v2 ) + κ2 (1 + q1 )(v1 − v2 ) = −κ2 δq(ψ1 − ψ2 ) − κ2 (q1 − q2 )v2 , where ψi = S(qi )ψinc . Similar to the proof of Theorem 12.2.2, we get v1 − v2 H 1 (B) , ≤ C δq L∞ (B) ψ1 − ψ2 H 1 (B) + q1 − q2 L∞ (B) v2 H 1 (B) . From Corollary 12.2.5 and Lemma 12.2.6, we obtain v1 − v2 H 1 (B) ≤ C q1 − q2 L∞ (B) δq L∞ (B) ψinc L2 (B) , which completes the proof. The following result concerns the differentiability property of S(q). Lemma 12.2.9. Assume that q, δq ∈ L∞ (B). Then there is a constant C dependent of κ and B, for which the following estimate holds S(q + δq)ψinc − S(q)ψinc − T (q)(δq, ψinc ) H 1 (B) ≤ C δq 2L∞ (B) ψinc L2 (B) . (12.2.13)
290
12
Shape Reconstruction of Inverse Medium Scattering
Proof. By setting ψ1 = S(q)ψinc , ψ2 = S(q + δq)ψinc , and v = T (q)(δq, ψinc), we have Δψ1 + κ2 (1 + q)ψ1 = −κ2 qψinc , Δψ2 + κ2 (1 + q + δq)ψ2 = −κ2 (q + δq)ψinc , Δv + κ2 (1 + q)v = −κ2 δqψ1 − κ2 δqψinc . In addition, ψ1 , ψ2 , and v satisfy the absorbing boundary condition (12.1.5). Denote U = ψ2 − ψ1 − v. Then ΔU + κ2 (1 + q)U = −κ2 δq(ψ2 − ψ1 ). Similar arguments as in the proof of Lemma 12.2.5 gives U H 1 (B) ≤ C δq L∞ (B) ψ2 − ψ1 H 1 (B) . From Lemma 12.2.5, we obtain further that U H 1 (B) ≤ C δq 2L∞ (B) ψinc L2 (B) , which is the estimate. Finally, by combining the above lemmas, we arrive at Theorem 12.2.10. The scattering map M (q) is Fréchet differentiable with respect to q and its Fréchet derivative is DM (q) = γT (q).
12.3
(12.2.14)
Inverse medium scattering
In this section, a regularized recursive linearization method for solving the inverse obstacle scattering problem of Helmholtz equation in two dimensions is proposed. The algorithm, obtained by a continuation method on the wavenumber κ, requires multi-frequency scattering data. At each wavenumber κ, the algorithm determines a forward model which produces the prescribed scattering data. At low wavenumber κ, the scattered field is weak. Consequently, the nonlinear equation become essentially a linear one, known as the Born approximation. The algorithm first solves this nearly linear equation at the lowest κ to obtain low-frequency modes of the true scatterer. The approximation is then used to linearize the nonlinear equation at the next higher κ to produce a better approximation which contains more modes of the true scatterer. This process is continued until a sufficiently high wavenumber κ where the dominant modes of the scatterer are essentially recovered. At each update, a level set representation is used to keep track of shapes of the scatterer.
12.3 Inverse medium scattering
12.3.1
291
Shape reconstruction
We formulate the inverse obstacle scattering as shape reconstruction problem, and cast it in a form which makes use of the level set representation of the domains. To start with we introduce some useful notations. Definition 12.3.1. Assume that we are given a constant q˜ > 0 and an open ball B ⊂ R2 . We call a pair (Ω, q), which consists of a compact domain Ω ⊂⊂ B and q ∈ L∞ (B), admissible if we have q, ˜ if x ∈ Ω, q(x) = 0, if x ∈ R2 \ Ω. In other words, a pair (Ω, q) is admissible if q has a compact support of Ω with preassigned value q˜ inside. For an admissible pair (Ω, q), and for given q, ˜ the scatterer q is uniquely determined by Ω. It is essential for the success and efficiency of the inverse obstacle scattering to have a good and flexible way of keeping track of the shape evolution during the reconstruction process. The method chosen in our reconstruction algorithm is a level set representation of the shapes [27]. Definition 12.3.2. A function φ : R2 → R is called a level set representation of Ω if (12.3.1) φ|Ω ≤ 0 and φ|R2 \Ω > 0. For each function φ : R2 → R there is a domain Ω associated with φ by (12.3.1) which is called scattering domain and denoted as Ω[φ]. It is clear that different functions φ1 , φ2 , φ1 = φ2 , can be associated with the same domain Ω[φ1 ] = Ω[φ2], but different domains cannot have the same level set representation. Therefore, we can use the level set representation for specifying a domain Ω by any one of its associated level set functions. The boundary of a domain Ω[φ], represented by the level set function φ, is denoted Γ = ∂Ω[φ]. Definition 12.3.3. We call a triple (Ω, q, φ), which consists of a domain Ω ⊂⊂ B and q, φ ∈ L∞ (B), admissible if the pair (Ω, q) is admissible in the sense of Definition 12.3.1, and φ is a level set representation of Ω. For an admissible triple (Ω, q, φ), and for given q, ˜ the pair (Ω, q) is uniquely determined by φ. The shape of the scatterer is then recovered from the representing level set function. We use these definitions to reformulate our inverse obstacle scattering problem: Given a constant q˜ and boundary measurements of the scattered field ψ|S . Find a level set function φ such that the corresponding admissible triple (Ω, q, φ) reproduces the data.
292
12
Shape Reconstruction of Inverse Medium Scattering
A continuation method is proposed to recursively determine the triple (Ωk , qk , φk ) at k = κ1 , κ2 , . . . with increasing wavenumber. For finding this series, we only need keep track of φk and qk , but not of Ωk . The function qk is need in each step for solving a forward and a corresponding adjoint problem. The final level set function is used to recover the final shape of the scatterer.
12.3.2
Born approximation
For starting the shape reconstruction method, an initial guess is needed which is derived from the Born approximation. Rewrite (12.1.4) as Δψ + κ2 ψ = −κ2 q(ψinc + ψ),
(12.3.2)
where the incident wave is taken as ψinc = eiκx·d1 . Consider a test function ψ0 = eiκx·d2 , where d2 ∈ S1 . Hence ψ0 satisfies (12.1.3). Multiplying (12.3.2) by ψ0 , and integrating over B on both sides, we have 2 2 ψ0 Δψ + κ ψ0 ψ = −κ q(ψinc + ψ)ψ0 . B
B
B
Integration by parts yields , 2 2 ψΔψ0 + ψ0 ψ = −κ q(ψinc + ψ)ψ0 . ψ0 ∂n ψ − ψ∂n ψ0 + κ B
S
B
B
We have by noting (12.1.3) and the boundary condition (12.1.5) 1 q(ψinc + ψ)ψ0 = 2 ψ (∂n ψ0 + iκψ0 ) . κ S B Using the special form of the incident wave and the test function, we then get iκx·(d1 +d2 ) −1 iκx·d2 q(x)e = iκ ψ(n · d2 + 1)e − qψψ0 . (12.3.3) B
S
B
From Theorem 12.2.2, for small wavenumber κ, the scattered field is weak and the inverse scattering problem becomes essentially linear. Dropping the nonlinear (second) term of the equation (12.3.3), we obtain the linearized integral equation q(x)eiκx·(d1+d2 ) = iκ−1 ψ(n · d2 + 1)eiκx·d2 , (12.3.4) B
S
which is the Born approximation. Since the scatterer q(x) has a compact support, we use the notation q(ξ) ˆ = q(x)eiκx·(d1+d2 ) , B
12.3 Inverse medium scattering
293
where q(ξ) ˆ is the Fourier transform of q(x) with ξ = κ(d1 + d2 ). Choose dj = (cos θj , sin θj ) ,
j = 1, 2,
where θj is the incident angle. It is obvious that the domain [0, 2π] of θj , j = 1, 2, corresponds to the ball {ξ : |ξ| ≤ 2κ}. Thus, the Fourier modes of q(ξ) ˆ in the ball {ξ : |ξ| ≤ 2k} can be determined. The scattering data with the higher wavenumber κ must be used in order to recover more modes of the true scatterer. Define the data ⎧ ⎨ −1 iκ ψ(n · d2 + 1)eiκx·d2 , for |ξ| ≤ 2κ, G(ξ) = S ⎩ 0, otherwise. The linear integral equation (12.3.4) can be reformulated as q(x)eix·ξ dx = G(ξ). R2
(12.3.5)
Taking the inverse Fourier transform of the equation (12.3.5) leads to −2 −ix·ξ iy·ξ −2 e q(y)e dy dξ = (2π) e−ix·ξ G(ξ)dξ. (2π) R2
R2
R2
By the Fubini theorem, we have (2π)−2 q(y) ei(y−x)·ξ dξ dy = (2π)−2 R2
R2
R2
e−ix·ξ G(ξ)dξ.
Using the inverse Fourier transform of the Dirac Delta function (2π)−2 ei(y−x)·ξ dξ = δ(y − x), R2
we deduce
R2
q(y)δ(y − x)dy = (2π)
which gives q(x) = (2π)
−2
R2
−2
R2
e−ix·ξ G(ξ)dξ,
e−ix·ξ G(ξ)dξ.
(12.3.6)
In practice, the integral equation (12.3.6) is implemented by using the Fast Fourier Transform (FFT). We are now ready to define the initial triple (Ω, q, φ). Choose a threshold value 0 < τ < 1 and define q0 := τ max |q(x)|. x∈B
294
12
Shape Reconstruction of Inverse Medium Scattering
The level set zero of φ is denoted as {x ∈ B : |q(x)| = q0 }. This means that all points of B where the reconstruction |q(x)| has exactly the value q0 are mapped to zero by the level set function φ. The level set function is then defined as φ(x) = σ(q0 − |q(x)|), where σ is some scaling factor. The initial scattering domain Ω and the scatterer are defined as Ω = Ω[φ], q = Λ(φ). Together with φ they form an admissible triple (Ω, q, φ).
12.3.3
Recursive linearization
As discussed in the previous section, when the wavenumber κ is small, the Born approximation allows a reconstruction of those Fourier modes less than or equal to 2κ for the function q(x). We now describe a procedure that recursively determines the triple (Ωk , qk , φk ) at k = kj for j = 1, 2, . . . with the increasing wavenumber. At each stage of the wavenumber κ, using the idea of Kaczmarz method, we use partial data, corresponding to one incident wave, to perform the nonlinear Landweber iteration. Suppose now that the pair (Ωk˜ , qk˜ ) has been recovered at some wavenumber κ, ˜ and that κ > 0 is slightly larger than κ. ˜ Since only the φk and qk need to be kept track, we wish to determine the φk and qk , or equivalently, to determine the perturbation δφ = φk − φk˜ ,
and
δq = qk − qk˜ .
For the reconstructed scatterer qk˜ , we solve at the wavenumber κ the forward scattering problem Δψ˜ + κ2 (1 + qk˜ )ψ˜ = −κ2 qk˜ ψjinc , ∂n ψ˜ − iκψ˜ = 0,
(12.3.7) (12.3.8)
where ψjinc is the incident wave with the incident angle θj = 2πj/J, j = 1, 2, . . . , J. For the scatterer qk , we have Δψ + κ2 (1 + qk )ψ = −κ2 qk ψjinc , ∂n ψ − iκψ = 0.
(12.3.9) (12.3.10)
Subtracting (12.3.7), (12.3.8) from (12.3.9), (12.3.10) and omitting the second˜ we obtain order smallness in δq and in δψ = ψ − ψ, ˜ Δδψ + κ2 (1 + qk˜ )δψ = −κ2 δq(ψjinc + ψ), ∂n ψ − iκδψ = 0.
(12.3.11) (12.3.12)
12.3 Inverse medium scattering
295
For the scatterer qk , and the incident wave ψjinc , we define the map S(qk , ψjinc ) by S(qk , ψjinc ) = ψ, where ψ is the scattering data at the wavenumber k . Let γ be the trace operator to the boundary S. Define the scattering map M (qk , ψjinc ) = γS(qk , ψjinc ). For simplicity, denote M (qk , ψjinc ) by Mj (qk ). By the definition of the trace operator, we have Mj (qk ) = ψ|S . Let DMj (qk˜ ) be the Fréchet derivative of Mj (qk ), and denote the residual operator ˜ S. Rj (qk˜ ) = ψ|S − ψ| It follows from Theorem 12.2.10 that DMj (qk˜ )δq = Rj (qk˜ ).
(12.3.13)
Given a constant q. ˜ Then, with each level set function φ a uniquely determined scatterer Λ(φ) is associated by putting q, ˜ for φ(x) ≤ 0, Λ(φ)(x) = 0, for φ(x) > 0. In [27], it is shown that the infinitesimal response δq in the scatterer q(x) to an infinitesimal change δφ(x) of the level set function φ(x) has the form δφ(x) . δq(x) = −q˜ |∇φ(x)| x∈∂Ω[φ] The Fréchet derivative of Λ is then defined [11] [DΛ(φ)δφ](x) = −q˜
δφ(x) δΓ (x), |∇φ(x)|
where δΓ (x) denotes the Dirac delta distribution concentrated on Γ = ∂Ω[φ]. Define the forward operator Fj (φ) = Mj (Λ(φ)) = ψ|S ,
(12.3.14)
where ψ is the scattered field with scatterer Λ(φ). It is easily seen that the Fréchet derivative of the forward operator can be expressed as DFj (φ)δφ = DMj (Λ(φ))DΛ(φ)δφ,
296
12
Shape Reconstruction of Inverse Medium Scattering
which gives by noting (12.3.13) DFj (φ)δφ = Rj (Λ(φ)).
(12.3.15)
Using the Landweber iteration of (12.3.15) yields δφ = βk DFj∗ (φ)R(Λ(φ)), which gives δφ = βk DΛ∗ (φ)DMj∗ (Λ(φ))Rj (Λ(φ)),
(12.3.16)
where βk is a relaxation parameter. In order to calculate (12.3.16), we will need practically useful expressions for the adjoint of the Fréchet derivatives. First, a simple calculation gives the following theorem. Theorem 12.3.4. The adjoint operator DΛ∗ (φ) is given by [DΛ∗ (φ)δq] (x) = −q˜
δq δΓ (x). |∇φ|
(12.3.17)
Theorem 12.3.5. Given residual Rj (qk˜ ), there exits a function ϕj such that the adjoint Fréchet derivative DMj∗ (qk˜ ) satisfies
, ¯˜ -ϕ (x), DMj∗ (qk˜ )Rj (qk˜ ) (x) = k 2 ψ¯jinc (x) + ψ(x) j
(12.3.18)
where the bar denotes the complex conjugate, φinc j is the incident wave with incident angle θj , and ψ˜j is the solution of (12.3.7), (12.3.8) with the incident wave ψjinc . Proof. Let ψ˜j be the solution of (12.3.7), (12.3.8) with the incident wave ψjinc . Consider the equations as follows ˜ Δδψ + κ2 (1 + qk˜ )δψ = −κ2 δq(ψjinc + ψ),
(12.3.19)
∂n δψ − iκδψ = 0.
(12.3.20)
Δϕj + κ2 (1 + qk˜ )ϕj = 0,
(12.3.21)
∂n ϕj + iκϕj = Rk (qk˜ ).
(12.3.22)
and the adjoint equations
The existence and uniqueness of the weak solution for the adjoint equations follow from the same proof of Corollary 12.2.5, we omit here.
12.3 Inverse medium scattering
297
Multiplying equation (12.3.19) with the complex conjugate of ϕj , integrating over B on both sides, we obtain , 2 2 ϕ¯j Δδψ + κ (1 + qk˜ )δψϕ¯j = −κ δq ψjinc + ψ˜ ϕ¯j . B
B
B
Integration by parts yields , , 2 ϕ¯j ∂n δψ − δψ∂n ϕj = −κ δq ψjinc + ψ˜ ϕ¯j . S
B
Using the boundary condition (12.3.20), we deduce , , δψ ∂n ϕj − iκϕ¯j = κ2 δq ψjinc + ψ˜ ϕ¯j dx. S
B
It follows from (12.3.13), and the boundary condition (12.3.22) , 2 DMj (qk˜ )δq Rj (qk˜ ) = κ δq ψjinc + ψ˜ ϕ¯j . S
B
DMj∗ (qk˜ )
We know from the adjoint operator , 2 ∗ δqDMj (qk˜ )Rj (qk˜ ) = κ δq ψjinc + ψ˜ ϕ¯j . B
B
Since it holds for any δq, we have , DMj∗ (qk˜ )Rj (qk˜ ) = κ2 ψjinc + ψ˜ ϕ¯j . Taking the complex conjugate of the above equation yields the result. Finally, combining Theorems 12.3.4 and 12.3.5, it follows that (12.3.16) can be rewritten as (ψ¯jinc + ψ¯˜j )ϕj 2 (12.3.23) δφ = −qκ ˜ βk δΓ (x). |∇φ| In practice, for a given level set function φ, let Γ = ∂Ω[φ] and Bρ (Γ) = ∪y∈Γ Bρ (y) a small finite width neighborhood of Γ. In our numerical experiment, the constant ρ is chosen about 2-3 grid cells. Equation (12.3.23) is updated with δφ = −qκ ˜ 2 βk
˜j )ϕj (ψ¯jinc + ψ¯ |∇φ|
χBρ (Γ) (x),
where βk is some suitable parameter and χBρ (Γ) (x) is defined as 1, for x ∈ Bρ (Γ), χBρ (Γ) (x) = 0, for R2 \ Bρ (Γ).
(12.3.24)
298
12
Shape Reconstruction of Inverse Medium Scattering
So for each incident wave with incident angel θj , we have to solve one forward problem (12.3.7), (12.3.8), and one adjoint problem (12.3.21), (12.3.22). Since the adjoint problem has a similar variational form with the forward problem, essentially, we need to compute two forward problems at each sweep. Once δφj is determined, φη˜ is updated by φk˜ + δφj . After completing the Jth sweep, we get the reconstructed level set function φη at the spatial frequency η. Then, the scatterer is updated by qk = Λ(φk ). Remark 12.3.6. For a fixed wavenumber κ, the stopping index of nonlinear Landweber iteration (12.3.16) could be determined from the discrepancy principle. However, in practice, it is not necessary to do many iterations. Numerical results show that the iterative process for different incident angles φj , j = 1, . . . , m, is sufficient to obtain reasonable accuracy. The recursive linearization for shape reconstruction of inverse medium scattering can be summarized in Table 12.1. Table 12.1. Recursive linearization reconstruction algorithm.
1 2 3 4 5 6 7 8 9 10 11 12 13 13
12.4
Initialization k = k0 (Ωk0 , qk0 , φk0 ) given from the Born approximation Reconstruction loop do k = kmin : kmax march over wavenumber do j = 1 : J perform J sweeps for incident angles ˜ j )ϕj (ψ¯inc +ψ¯
δφjk = −qκ ˜ 2 βk j|∇φjk | χBρ (Γ) φjk := φjk + δφjk qjk := Λ(φjk ) end do φk := φJk qk := Λ(φk ) end do (Ω, q, φ) := (Ωkmax , qkmax , φkmax ) final reconstruction
Numerical experiments
In this section, we discuss the numerical solution of the forward scattering problem, and the computational issues of the recursive linearization algorithm. As for the forward solver, we adopt the Finite Element Method (FEM). As we know, the FEM usually leads to a sparse matrix. The sparse large-scale linear system can be most efficiently solved if the zero elements of coefficient
12.4 Numerical experiments
299
matrix are not stored. We used the commonly used Compressed Row Storage (CRS) format which makes no assumptions about the sparsity structure of the matrix, and does not store any unnecessary elements. In fact, from the variational formula of our direct problem (12.2.1), the coefficient matrix is complex symmetric. Hence, only the lower triangular portion of the matrix needs be stored. Regarding the linear solver, both BiConjugate Gradient (BiCG) and Quasi-Minimal Residual (QMR) algorithms with diagonal preconditioning are tried to solve the sparse, symmetric, and complex system of the equations, with the QMR more efficient. In the following, we present three numerical examples where the number of incident wave J = 10 and the relaxation parameter βk = 0.1/κ. For stability analysis, some relative random noise is added to the date, i.e., the electric field takes the form ψ|S := (1 + σ rand)ψ|S . Here, rand gives normally distributed random numbers in [−1, 1], and σ is a noise level parameter, taken to be 0.02 in our numerical experiments. Example 12.4.1. Reconstruct a U-shaped scatterer in the domain D = [−1, 1] × [−1, 1]. Figure 12.1 shows the exact scatterer and the evolution of reconstructions at different wavenumbers varying from the Born approximation with wavenumber κ = 1.0 to highest wavenumber κ = 7.0. As can be seen, better reconstructions are obtained when higher wavenumber is used for the inversion. Under the lowest wavenumber, the Born approximation can only generate an average of the scatterer; no detailed features are able to be resolved. The concave part of the scatterer can be gradually resolved by using higher and higher wavenumbers; while the concave part of the scatterer can not be fully recovered at the low wavenumber. This result may be explained by Heisenberg’s uncertainty principle [6]. We point out that the method is not sensitive to the noise and the step size of the wavenumber, which suggests that large step size of the wavenumber may be used to speed up the convergence. Figure 12.2 shows the negative of the level set function −φ, which clearly presents the U shape of the reconstructed scatterer. Example 12.4.2. Reconstruct a cross-shaped scatterer in the domain D = [−1, 1] × [−1, 1]. Figure 12.3 shows the exact scatterer and the evolution of reconstructions at different wavenumbers varying from the Born approximation with wavenumber κ = 1.0 to highest wavenumber κ = 7.0. Similarly, better reconstructions are obtained when higher wavenumber is used for the inversion. Under the lowest wavenumber, the Born approximation can only generate an average of the scatterer; no detailed features are able to be resolved. The cross shape the scatterer can be gradually resolved by using higher and higher wavenumbers. Figure 12.4 shows the negative of the level set function −φ, which clearly presents the cross shape of the reconstructed scatterer.
300
12
Shape Reconstruction of Inverse Medium Scattering
Figure 12.1. Evolution of scatterer in Example 1. Left column from top to bottom: true scatterer; Born approximation; reconstruction at κ = 2.5; right column from top to bottom: reconstruction at κ = 4.0; reconstruction at κ = 5.5; reconstruction at κ = 7.0.
Figure 12.2. Final level set function −φ for Example 1.
12.4 Numerical experiments
301
Figure 12.3. Evolution of scatterer in Example 2. Left column from top to bottom: true scatterer; Born approximation; reconstruction at κ = 2.5; right column from top to bottom: reconstruction at κ = 4.0; reconstruction at κ = 5.5; reconstruction at κ = 7.0.
Figure 12.4. Final level set function −φ for Example 2.
302
12
Shape Reconstruction of Inverse Medium Scattering
Figure 12.5. Evolution of scatterer in Example 3. Left column from top to bottom: true scatterer; Born approximation; reconstruction at κ = 2.5; right column from top to bottom: reconstruction at κ = 4.0; reconstruction at κ = 5.5; reconstruction at κ = 7.0.
Figure 12.6. Final level set function −φ for Example 3.
References
303
Example 12.4.3. Finally, we consider a scatterer which has three disjoint components. This scatterer is difficult to recover due to the three nearby components. Again, Figure 12.5 shows the exact scatterer and the evolution of reconstructions at different wavenumbers varying from the Born approximation with wavenumber κ = 1.0 to highest wavenumber κ = 7.0. Similarly, better reconstructions are obtained when higher wavenumber is used for the inversion and three components can be separated. Under the lowest wavenumber, the Born approximation can only generate an average of the scatterer; the three disjoint parts can not be resolved. The three parts of the scatterer can be gradually separated by using higher and higher wavenumbers. Figure 12.6 shows the negative of the level set function −φ, which clearly presents the three parts of the reconstructed scatterer.
12.5
Concluding remarks
A new continuation method with respect to the spatial frequency of the evanescent plane waves is presented. The recursive linearization algorithm is robust and efficient for solving the inverse medium scattering with the fixed frequency scattering data. Finally, we point out two important future directions along the line of this work. The first is concerned with the convergence analysis. Although our numerical experiments demonstrate the convergence and stability of the inversion algorithm, its analysis needs to be done. Another important project is to consider the case of data with partial measurements at fixed frequency. Without the full measurements, the ill-posedness and nonlinearity of the inverse problem becomes more severe, which will be reported else where.
Acknowledgements The research of the first author was supported in part by the NSF grants DMS0604790, DMS-0908325, CCF-0830161, EAR-0724527, and DMS-0968360, and the ONR grant N00014-09-1-0384, and a special research grant from Zhejiang University. The research of the second author was supported in part by the NSF grants DMS-0914595 and DMS-1042958.
References [1] G. Bao, Y. Chen and F. Ma, Regularity and stability for the scattering map of a linearized inverse medium problem, J. Math. Anal. Appl., 247, 255-271, 2000. [2] G. Bao and J. Liu, Numerical solution of inverse problems with multiexperimental limited aperture data, SIAM J. Sci. Comput., 25, 1102-1117, 2003.
304
12
Shape Reconstruction of Inverse Medium Scattering
[3] G. Bao and P. Li, Inverse medium scattering for three-dimensional time harmonic Maxwell equations, Inverse Problems, 20, L1-L7, 2004. [4] G. Bao and P. Li, Inverse medium scattering problems for electromagnetic waves, SIAM J. Appl. Math., 65, 2049-2066, 2005. [5] M. Burger, Levenberg-Marquardt level set methods for inverse obstacle problems, Inverse Problems, 20, 259-282, 2004. [6] Y. Chen, Inverse scattering via Heisenberg uncertainty principle, Inverse Problems, 13, 253-282, 1997. [7] D. Colton, H. Haddar and M. Piana, The linear sampling method in inverse electromagnetic scattering theory, Inverse Problems, 19, S105-S137, 2003. [8] D. Colton and R. Kress, Inverse Acoustic and Electromagnetic Scattering Theory, 2nd Edition, Appl. Math. Sci., 93, Berlin, Springer-Verlag, 1998. [9] D. Colton and A. Kirsch, A simple method for solving inverse scattering problems in the resonance regions, Inverse Problems, 13, 383-393, 1996. [10] O. Dorn, H. Bertete-Aguirre, J. G. Berryman and G. C. Papanicolaou, A nonlinear inversion method for 3D electromagnetic imaging using adjoint fields, Inverse Problems, 15, 1523-1558, 1999. [11] O. Dorn, E. Miller and C. Rappaport, A shape reconstruction method for electromagnetic tomography using adjoint fields and level sets, Inverse Problems, 16, 1119-1156, 2000. [12] H. Engl, M. Hanke and A. Neubauer, Regularization of Inverse Problems, Dordrecht, Kluwer, 1996. [13] D. Gilbarg and N. Trudinger, Elliptic Partial Differential Equations of Second Order, New York, Springer, 1983. [14] H. Haddar and P. Monk, The linear sampling method for solving the electromagnetic inverse medium problem, Inverse Problems, 18, 891-906, 2002. [15] K. Ito, K. Kunisch and Z. Li, Level-set function approach to an inverse interface problem, Inverse Problems, 17, 1225-1242, 2001. [16] J. Jin, The Finite Element Methods in Electromagnetics, John Wiley & Sons, 2002. [17] D. Jerison and C. Kenig, Unique continuation and absence of positive eigenvalues for Schrödinger operators, Ann. Math., 121, 463-488, 1985. [18] R. Kress, Decomposition methods in inverse obstacle scattering via Newton iterations, Proceedings of the 6th International Wrokshop on Mathematical Methods in Scattering Theory and Biomedical Engineering, Tsepelovo, 2003. [19] R. Kress and W. Rundell, Inverse scattering for shape and impedance, Inverse Problems, 17, 1075-1085, 2001. [20] R. Kress and W. Rundell, Inverse obstacle scattering using reduced data, SIAM J. Appl. Math., 59, 442-454, 1999. [21] A. Litman, D. Lesselier and F. Santosa, Reconstruction of a two-dimensional binary obstacle by controlled evolution of a level-set, Inverse Problems, 14, 685706, 1998. [22] F. Natterer, The Mathematics of Computerized Tomography, Stuttgart, Teubner, 1986. [23] F. Natterer and F. Wübbeling, A propagation-backpropagation method for ultrasound tomography, Inverse Problems, 11, 1225-1232, 1995.
References
305
[24] S. Osher and R. Fedkiw, Level Set Methods and Dynamic Implicit Surfaces, New York, Springer, 2002. [25] S. Osher and J. Sethian, Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations, J. Comput. Phys., 79, 12-49, 1988. [26] J. Sethian, Level Set Methods and Fast Marching Methods, 2nd Edition, Cambridge, Cambridge University Press, 1999. [27] F. Santosa, A level-set approach for inverse problems involving obstacles, ESAIM: Control, Optimization and Calculus of Variations, 1, 17-33, 1996.
Authors Information

Gang Bao†‡
† Department of Mathematics, Zhejiang University, Hangzhou 310027, P. R. China.
‡ Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA.
E-mail: [email protected]

Peijun Li
Department of Mathematics, Purdue University, West Lafayette, IN 47907, USA.
E-mail: [email protected]
Part V
Inverse Vibration, Data Processing and Imaging
Chapter 13
Numerical Aspects of the Calculation of Molecular Force Fields from Experimental Data

G. M. Kuramshina, I. V. Kochikov and A. V. Stepanova
Abstract. New approaches and algorithms have been proposed for molecular force field calculations on the basis of regularizing methods for solving nonlinear ill-posed problems. These algorithms have been implemented in our software package SPECTRUM. The main goal of this chapter is the numerical implementation of algorithms for the calculation of scale factors of molecular force fields expressed in different sets of generalized coordinates including Cartesian ones.
13.1 Introduction
Many problems of physical chemistry belong to the class of inverse problems, in which we need to determine some properties of an object from known experimental data, based on a certain model connecting these properties with the measured characteristics. Inverse problems typically lead to mathematical models that are not well-posed in the sense of Hadamard, i.e., to ill-posed problems. This means that they may not have a solution in the strict sense, and that solutions may not be unique and/or may not depend continuously on the input data. Mathematical problems possessing such properties are called ill-posed [1], mostly because of the instability of their solutions with respect to data perturbations. The theory of ill-posed problems (advanced by A. N. Tikhonov and his scientific school in the 1960s) investigates and develops effective, stable numerical (regularization) methods for the solution of ill-posed problems. At the foundation of this theory lie an understanding of the underdetermined character of ill-posed problems and the concept of a regularizing operator (algorithm). Molecular force fields provide very important information about molecular structure and molecular dynamics and may be determined within the harmonic
approximation from experimental data of vibrational (infrared and Raman) spectroscopy by solving the so-called inverse vibrational problem. Rapid progress in quantum mechanical calculations of theoretical harmonic force fields provides new ways for a more accurate interpretation of experimental data, as well as new possibilities for the development of empirical force field calculations. The latter are particularly important for large molecules, for which accurate ab initio calculations are impossible, so that empirical methods based on solving an inverse vibrational problem still remain the best available source of force field parameters. Fast progress in the investigation of rather large nanomolecules requires the development of special approaches for solving inverse vibrational problems, moving far beyond traditional methods based on least-squares procedures. The analysis of large molecular systems (when a force constant matrix F is constructed from previously evaluated force constants of model compounds) runs into difficulties caused by the possible incompatibility of results determined by different authors and by means of different numerical methods within different approximations (force field models). These difficulties are related to the nonuniqueness and instability of the solution of the inverse vibrational problem, as well as to the incompatibility of the available experimental data with the harmonic model. In this chapter we discuss how a-priori model assumptions and ab initio quantum mechanical calculations may be used for constructing regularizing algorithms for the calculation of molecular force fields. We have proposed a principally new formulation of the problem of searching for the molecular force field parameters that uses all available experimental data and quantum mechanical calculation results and takes into account the a-priori known constraints on the force constants. The essence of the approach is that, using the given experimental data and their accuracy, we seek by means of stable numerical methods approximations to the so-called normal pseudosolution, i.e., the matrix F which is nearest in the chosen Euclidean norm to the given force constant matrix F^0, satisfies the set of a-priori constraints D and is compatible with the experimental data Λ_δ, taking into account the possible incompatibility of the problem [2]. Within this approach, the results will tend to be as close to the quantum mechanical data as the experiment allows. From the mathematical point of view, the algorithm should provide approximations that tend to the exact solution when the experimental data become more extensive and accurate. Imposing certain restrictions on the matrix of force constants (in our case, the requirement of closeness of the solution to the matrix F^0) allows one to obtain a unique solution from the variety of possible choices.
13.2 Molecular Force Field Models
The idea of the force field arises from the attempt to consider a molecule as a mechanical system of nuclei, while all the interactions due to electrons are included in an effective potential function U(q_1, ..., q_n), where q_1, ..., q_n denote the n = 3N − 6 generalized coordinates of the N atomic nuclei of the molecule. The minimum of the potential function (with respect to the nuclear coordinates) defines the equilibrium geometry of the molecule, and the second derivatives of the potential with respect to the nuclear coordinates at the equilibrium,

$$ f_{ij} = \left. \frac{\partial^2 U}{\partial q_i \, \partial q_j} \right|_{\mathrm{eq}}, \qquad i, j = 1, \dots, n, $$
constitute a positive definite matrix F determining all the molecular characteristics connected with small vibrations. The vibrational frequencies (obtained from infrared and Raman spectra) are the main type of experimental information on molecular vibrations. They are connected with the matrix of force constants by the eigenvalue equation

$$ G F L = L \Lambda \tag{13.2.1} $$

where Λ is a diagonal matrix consisting of the squares of the molecular normal vibration frequencies ω_1, ..., ω_n, i.e. $\Lambda = \operatorname{diag}(\omega_1^2, \dots, \omega_n^2)$, G is the kinetic energy matrix in the momentum representation, and L is a matrix of normalized relative amplitudes. The choice of the generalized coordinates {q_i} determines the form of the molecular force field. The most popular coordinates used in the description of molecular vibrations are the so-called internal ones, which include bond stretchings, valence bond bendings (deformations), out-of-plane bendings (determined by changes of dihedral angles) and torsions. From a set of internal coordinates one can construct sets of symmetry or local symmetry coordinates, etc. The molecular force field plays an important role in determining the properties of a molecule, in particular its vibrational spectrum. Many approximations have been proposed for the calculation of a complete quadratic force field, and many computational programs have been created and used in practice (see, e.g., [3, 4, 5, 6, 7]). Usually, the ill-posed character of inverse vibrational problems has led to some degree of subjectivity, related basically to the constraints imposed on the solution to ensure physically meaningful results. In this way various models of molecular force fields have been proposed, and a great number of force field matrices have been calculated for various series of compounds. These matrices are chosen to satisfy either experimental data or a-priori known, but not explicitly formulated, criteria of a physically meaningful solution. As a result, there arose a situation (in particular, for complicated polyatomic molecules)
in which various so-called spontaneous methods for solving inverse problems led to inconsistent force fields, owing to the different criteria for the physical feasibility of solutions used by various investigators and to the instability (with respect to small perturbations in the input information) of the numerical methods used to solve the inverse problem. In our publications [2, 8-25] we have made an attempt to formulate and formalize all possible obvious (and not so obvious) model assumptions concerning the character of force fields which are widely used in vibrational spectroscopy. On the basis of this formalization, in the framework of Tikhonov's regularization theory, we have constructed a principle for choosing a unique solution from the set of solutions. We have formulated this principle in terms of the closeness of the solution to a given matrix of force constants which satisfies all a-priori assumptions concerning the model characteristics of the solution.
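As a concrete illustration of the direct problem behind Eq. (13.2.1), the short Python sketch below computes harmonic frequencies from given G and F matrices; the 2 × 2 matrices and units are hypothetical, chosen only to show the linear algebra, not taken from this chapter.

```python
import numpy as np
from scipy.linalg import sqrtm

# Hypothetical 2x2 kinetic-energy matrix G (momentum representation)
# and positive definite force-constant matrix F, in arbitrary model units.
G = np.array([[1.10, 0.05],
              [0.05, 1.30]])
F = np.array([[6.50, 0.40],
              [0.40, 4.80]])

# Eq. (13.2.1): G F L = L Lambda.  GF is similar to the symmetric matrix
# G^{1/2} F G^{1/2}, so we diagonalize the symmetric form for stability.
G_half = np.real(sqrtm(G))
lam = np.linalg.eigvalsh(G_half @ F @ G_half)  # diag(Lambda) = squared frequencies
omega = np.sqrt(lam)                           # normal vibration frequencies
print("squared frequencies:", lam)
print("frequencies:", omega)
```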
13.3 Formulation of the Inverse Vibrational Problem
If only the experimental frequencies of one molecular isotopomer are known, the inverse vibrational problem of finding the force constant matrix F reduces to an inverse eigenvalue problem; hence, when G is not singular, it follows that any solution of Eq. (13.2.1) is represented by a matrix F of the form

$$ F = G^{-1/2} \, C^{*} \Lambda \, C \, G^{-1/2} \tag{13.3.1} $$
where C is an arbitrary orthogonal matrix (the asterisk denotes the transposed matrix). While Eq. (13.2.1) is the main source of data determining the force constants, it is evident that (except for diatomic molecules) the n(n + 1)/2 parameters of F cannot be found uniquely from the n frequencies ω_1, ..., ω_n. This has led, on the one hand, to attempts to use certain model assumptions concerning the structure of the matrix F and, on the other hand, to bringing in additional experimental data. Within the approximation considered, the force field of a molecule does not depend on the masses of the nuclei, and hence for the spectra of m isotopic species we have, instead of Eq. (13.2.1), the system

$$ (G_i F) \, L_i = L_i \Lambda_i, \qquad i = 1, 2, \dots, m. \tag{13.3.2} $$
Additional information may also be extracted from ro-vibrational spectra (Coriolis constants), gas electron diffraction (mean square amplitudes), etc., where the molecular constants are determined by the force constant matrix F. The mathematical relation between the molecular vibrational properties (Eqs. (13.2.1), (13.3.2), etc.) and their experimental manifestations can be summarized in the form of a single operator equation

$$ A F = \Lambda_\delta \tag{13.3.3} $$
where F ∈ Z ⊂ R^{n(n+1)/2} (Z is the set of possible solutions) is the unknown force constant matrix (real and symmetric), and Λ_δ ∈ R^m represents the set of available experimental data (vibrational frequencies, etc.), determined within the error level δ: ‖Λ − Λ_δ‖ ≤ δ. A is a nonlinear operator which maps the matrix F to Λ. This problem belongs to the class of nonlinear ill-posed problems (it does not satisfy any of the three well-posedness conditions of Hadamard) and in the general case (except for diatomic molecules) may have a non-unique solution or no solution at all (an incompatible problem); solutions may be unstable with respect to errors in the operator A and in the set of experimental data Λ. The theory of regularization methods for nonlinear problems was developed in the last two decades [1] and applied to the inverse problems of vibrational spectroscopy [2]. The most important idea of regularization theory is that, every time we find that the experimental data are insufficient for the unique and stable determination of some (or all) molecular parameters, we should employ some kind of external knowledge or experience. In some cases it is desirable to formulate these considerations as explicit additional restrictions on the set of possible solutions. When that is impossible, a more flexible approach is to choose the solution that is, in a certain sense, nearest to some a-priori defined parameter set. This set may not necessarily conform to the experiment, but should be based on data complementary to the experiment. This external evidence may be derived from general ideas (for example, molecular force field models, or data on similar molecular structures) or, preferably, be based on ab initio calculations. The inverse vibrational problem is formulated as the problem of finding the so-called normal solution (or normal pseudo-(quasi-)solution in the case of incompatibility of the input data) of the nonlinear operator equation (13.3.3). The desired solution is a matrix F_α ∈ Z that reproduces the experimental data within the given error level and is the nearest in the Euclidean metric to some given matrix F^0. All necessary model assumptions (explicit and implicit) concerning the form of the force field may be taken into account by the choice of a given a-priori matrix of force constants F^0 and a pre-assigned set D of a-priori constraints on the values of the force constants. This set defines the form of the matrix F in the framework of the desired force field model (i.e., with specified zero elements, equality of some force constants, etc.). If no a-priori data constrain the form of the solution, then D coincides with the set Z.
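The non-uniqueness described by Eq. (13.3.1) is easy to demonstrate numerically: every orthogonal matrix C yields a different force-constant matrix reproducing exactly the same frequencies. A hedged sketch with hypothetical G and Λ follows; the specific values are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import sqrtm, inv

rng = np.random.default_rng(0)
n = 3
G = np.diag([1.0, 1.2, 0.8])          # hypothetical kinetic-energy matrix
Lam = np.diag([9.0, 4.0, 1.0])        # hypothetical squared frequencies
G_mh = np.real(inv(sqrtm(G)))         # G^{-1/2}

for _ in range(3):
    # Random orthogonal C via QR decomposition.
    C, _ = np.linalg.qr(rng.standard_normal((n, n)))
    # Eq. (13.3.1): F = G^{-1/2} C* Lambda C G^{-1/2}.
    F = G_mh @ C.T @ Lam @ C @ G_mh
    eigs = np.sort(np.real(np.linalg.eigvals(G @ F)))
    print(eigs)   # always [1. 4. 9.]: infinitely many F fit the same spectrum
```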
13.4 Constraints on the Values of Force Constants Based on Quantum Mechanical Calculations
It was mentioned in [19, 20] that in Tikhonov's regularizing procedure one can increase the stability and accuracy of the calculated solution F_α by using

(a) an extended set of experimental data (including, e.g., Coriolis constants, mean square amplitudes, frequencies of isotopomers or related compounds, etc.);
(b) an improved choice of the stabilizer matrix F^0;
(c) an improved choice of the constraint set D.

As a particularly effective choice of stabilizer, we have proposed [17] to use an ab initio quantum mechanical matrix F^0 in the regularizing procedure. This leads to the concept of the regularized quantum mechanical force field (RQM FF), defined as the force constant matrix that is nearest to the corresponding quantum mechanical matrix F^0 and reproduces the experimental frequencies within a given error level. The correct choice of the constraint set D is also extremely important. Physically stipulated limitations may either decrease the range of acceptable matrices F or provide criteria for selecting a concrete solution from a set of tolerable ones. An incorrect choice of constraints may lead to increasing incompatibility of the inverse problem, eventually resulting in a pseudosolution having no physical meaning. A set of a-priori constraints may arise from several types of limitations on force constant values, e.g. [19, 20]:

(1) some force constants may be stipulated on a-priori grounds to be zero;
(2) some force constants may be stipulated to satisfy inequalities a_ij ≤ f_ij ≤ b_ij, where a_ij, b_ij are certain known values;
(3) some force constants may be stipulated to be equal in a series of related molecules (or conformers);
(4) the final solution may be stipulated to conform to Pulay's scaled force constant matrix [26], which may also be considered as a kind of constraint.

During the last 30 years numerous attempts have been undertaken to develop quantum mechanical methods for the evaluation of the molecular structure and molecular dynamics of polyatomic molecules directly from Schrödinger's equation. The use of very restrictive assumptions was (and still is) necessary to make such calculations computationally realizable. At present, direct calculations of vibrational spectra by quantum mechanical methods at different levels of theory are very important and widely used for the interpretation of experimental data, especially in the case of large molecules. Among the different approaches, Hartree-Fock (HF) level calculations are routine and available even for very large (consisting of up to hundreds of atoms)
systems. But the quality of these calculations is insufficient for a direct comparison of theoretical vibrational frequencies with experimental ones. As a rule, the HF frequencies are overestimated (by up to 15%), and these errors have a systematic character for related compounds. The wide use of quantum mechanical calculations of vibrational spectra and harmonic force fields of polyatomic molecules induced the necessity of making empirical corrections to theoretical data for compatibility with experiment. The most popular approach is the so-called scaling procedure proposed by P. Pulay [26], where the disagreement between experimental and theoretical frequencies is eliminated by introducing a finite (not very large) number of scaling factors. This approach can also be formulated in the form of the a-priori constraints mentioned above as type (4). In this approach we can specify [27–29] the set D as D = {F : F = B F^0 B}, B = diag{β_1, ..., β_n} (where the β_i are the scaling parameters). Though this scaling procedure (mathematically, it imposes rather strict limitations on the molecular force field) often does not provide enough freedom to eliminate all discrepancies between calculated and observed data, it has certain advantages that follow from the comparatively small number of adjustable parameters and, consequently, the moderate computational resources required to perform force-field refinement. Indeed, it is very attractive to find a limited number of scaling factors for a series of model molecules (assuming their transferability) and use them for the correction of the quantum mechanical force constants of more complicated molecular systems. The most popular numerical procedure for the calculation (optimization) of scaling factors is the least-squares procedure, but there are a few publications indicating the non-convergence and instability of this numerical procedure when solving an inverse scaling problem. This is explained by the impossibility of using traditional numerical methods for solving nonlinear ill-posed problems [1, 2, 30]. In our works the following strict mathematical formulation of the inverse scaling problem has been proposed [27–29]: the problem of finding scaling factors on the basis of experimental data is treated as an operator equation similar to (13.3.3):

$$ A F(\beta) = \Lambda_\delta \tag{13.4.1} $$
where β are the scaling factors. Let the following norms in the Euclidean space be introduced:

$$ \|F\| = \Big( \sum_{i,j} f_{ij}^2 \Big)^{1/2}, \qquad \|\Lambda\| = \Big( \sum_{k=1}^{m} \rho_k \lambda_k^2 \Big)^{1/2}, $$

where ρ_k > 0 are certain positive weights, f_ij are the elements of the matrix F, and λ_k (k = 1, ..., m) are the components of Λ.
Since problem (13.4.1) is also ill-posed, we have to regularize it. We formulate the problem as the requirement to find a solution of (13.4.1), F_{n,δ}, that is nearest (in the Euclidean norm) to the quantum mechanical matrix F^0, satisfies the experimental data within a given error level δ (‖A(F(β)) − Λ_δ‖ ≤ δ) and has the special form proposed by Pulay. If we consider this problem taking into account its possible incompatibility (within the harmonic model), we come to the following formulation: find

$$ F_{n,\delta} = \arg\min \big\{ \|F - F^0\| : F \in D = \{F : F = B^{1/2} F^0 B^{1/2}\}, \ \|AF - \Lambda_\delta\| \le \mu + \delta \big\}. \tag{13.4.2} $$

Here B is a diagonal matrix of scaling factors β_i, and μ is a measure of incompatibility of the problem [2]; it may arise due to the possible anharmonicity of the experimental frequencies or the crudeness of the chosen model. Finding such a solution may be achieved by minimization of the Tikhonov functional

$$ M^{\alpha}(\beta) = M^{\alpha}[F] = \|A_h F - \Lambda_\delta\|^2 + \alpha \|F - F^0\|^2 \tag{13.4.3} $$
where F = F(β), and the regularization parameter α is chosen in accordance with the generalized discrepancy principle [1]. As a rule, the assumed limitations on the values of the force constants of polyatomic molecules cannot be strictly proved. Nevertheless, numerical quantum mechanical results on molecular force fields can provide useful guidance in choosing realistic force field models for different types of molecules. The simplicity of the scaling procedure has made it extremely popular in recent years. It has been shown that for many molecular fragments the scale factors (within a given level of quantum-mechanical method) are approximately constant over a wide range of similar molecules. Force constant scaling factors have been obtained for different sets of coordinates and quantum-mechanical methods, which in most cases allow one to approximate the experimental frequencies with a reasonable degree of accuracy. However, the regularized scaling procedure was initially suggested [27–29] for force fields defined in internal or symmetry (local symmetry) coordinate systems. In the course of spectroscopic and structural research, the introduction of a complete system of internal coordinates is the most tedious and time-consuming procedure, especially for large molecular systems. From quantum chemistry we usually obtain the force constant matrix in Cartesian coordinates. Therefore we have proposed a procedure [31, 32] to scale the ab initio
force field matrix in Cartesian coordinates, which allows one to avoid introducing internal coordinates. Within this approach scaling is still given by Eq. (13.4.2); however, we are not assuming the matrix B to be diagonal. The force field matrix in Cartesian coordinates is not automatically independent of the molecular position and orientation, as it is when internal coordinates are used. A physically meaningful force constant matrix should therefore satisfy a number of constraints that eliminate the translational and rotational degrees of freedom from the expression for the potential energy. Let the force field matrix in Cartesian coordinates be represented as an array of 3 × 3 submatrices corresponding to each atom:

$$ F = \begin{pmatrix} f_{(11)} & f_{(12)} & \dots & f_{(1N)} \\ f_{(21)} & f_{(22)} & \dots & f_{(2N)} \\ \dots & \dots & \dots & \dots \\ f_{(N1)} & f_{(N2)} & \dots & f_{(NN)} \end{pmatrix} \tag{13.4.4} $$

where N is the number of atoms in the molecule. Independence of the potential energy of the translations and rotations of the molecule as a whole leads to the following requirements, which were introduced in [2]:
f(ij) = 0,
N
Vi f(ij) = 0,
j = 1, 2, . . . , N
(13.4.5)
i=1
where 3 × 3 submatrices Vi are defined as ⎞ ⎛ 0 −Zi0 Yi0 Vi = ⎝ Zi0 0 −Xi0 ⎠ , −Yi0 Xi0 0 and Xi0 , Yi0 , Zi0 are Cartesian components of the i-th atom equilibrium position. Imposing constraints (13.4.1) reduces the rank of matrix F to 3N − 6 (or 3N − 5 for linear molecules), thus leaving only vibrational degrees of freedom. When scaling procedure (13.3.2) is applied to the matrix F in Cartesian coordinates, we may assume that a-priori matrix F 0 satisfies the requirements (13.4.1). However, this does not necessarily mean that the scaled matrix also satisfies these requirements. To ensure that scaled matrix also contains only vibrational degrees of freedom, the scale matrix B should also satisfy certain conditions as it was shown in [31]: (1) Matrix B similarly to force field matrix in Cartesian coordinates (13.3.3) consists of the 3 × 3 unit submatrices multiplied by certain factors β ij (i, j, = 1, . . . , N ):
318
13 Calculation of Molecular Force Fields
⎛
β11 E ⎜ β21 E B=⎜ ⎝ ... βN 1 E
β12 E β22 E ... βN 2 E
⎞ . . . β1N E . . . β2N E ⎟ ⎟. ... ... ⎠ . . . βN N E
(2) The factors β ij are subject to the following constraints: βij = βji ; N i=1
β1i =
N i=1
β2i = · · · =
N
βN i = S = const.
(13.4.6)
i=1
It is easy to see that conditions (13.4.2) allow matrix B to be diagonal only when all β ii are equal. If there exist any extra constraints due to the symmetry or model assumptions, they should be used in addition to (13.4.2). In general, matrix B contains N (N − 1)/2 + 1 independent parameters, since all diagonal elements may be represented as βij . βii = S − j=i
On this way we come to the formulation of inverse vibrational problem in a form (13.2.1) where a set of a-priori constraints D on the molecular force field includes conditions (13.4.2). The solution (a set of scaling factors) can be found by minimization of functional (13.3.1). Additionally, a set D can include the constraints such as equality of some off-diagonal factors to zero, in-pair equalities of factors, symmetry constraints etc. There is a lot of publications describing application of numerical methods to solution of the inverse vibrational problem on the base of the least squares procedure. Very often, as criterion of minimization, the authors choose the “best” agreement between experimental and fitted vibrational frequencies. However, there is a doubt if the “best” agreement criterion is meaningful. We should like to note that this criterion is insufficient due to the ill-posed nature of the inverse vibrational problem. Even in the case of a single molecule, it is a well-known fact that there exist an infinite number of solutions, which exactly satisfy any given set of the experimental frequencies. Addition of the expanded experimental information on frequencies of the isotopomers or related molecules may lead to incompatibility of the mathematical problem and result in no solution at all within the conventional harmonic model. This means that, using any minimization procedure for solving the inverse vibrational problem, it is necessary to apply some additional criteria (that can be mathematically formulated) in the minimization procedure to select the unique solution.
13.5 Generalized Inverse Structural Problem
13.5
319
Generalized Inverse Structural Problem
More complicated procedure was proposed for solving the generalized inverse structural problem (GISP) [23,25] in the case of joint treatment of the experimental data obtained by different physical methods (vibrational spectroscopy, electron diffraction (ED) data and microwave (MW) spectroscopy). To implement an integrated procedure of simultaneous refinement of the force field and equilibrium geometry, a dynamic model of a molecule has been created. Based on the general approximation of small vibrations, the model has been extended to include cubic and quartic anharmonic potential terms for proper description of the large-amplitude motion. Within this approach, a molecule is described using a set of equilibrium geometry parameters R, and a molecular force field F represented by the matrices of quadratic, cubic and possibly quartic force constants defined in the framework of a certain nonlinear system of internal coordinates. Both parameter sets R and F can be considered as finite-dimensional vectors. The model is used to predict experimentally measured values, such as vibrational frequencies ω, electron diffraction intensity M (s), rotational constants {A, B, C } obtained from microwave molecular spectra, etc. All of these values are functions of geometric (R) and force field (F ) parameters. With experimental data and parameters represented as elements of normalized finitedimensional spaces, we can formulate the problem of simultaneous refinement of the force field and equilibrium geometry of the molecule as a system of nonlinear equations ⎧ ⎪ ⎨ ω(F, R) = ωexp , M (s, F, R) = Mexp (s), ⎪ ⎩ {A, B, C}(R, F ) = {A, B, C}exp
(13.5.1)
on a set of predefined constraints F ∈DF , R∈DR . This system can be extended to include additional experimental evidence when available (for example, data for isotopic species of a molecule sharing the same force field and equilibrium geometry). Due to experimental errors, lack of experimental data and model limitations, this system o f equations (that can be also treated as a finite-dimensional nonlinear operator equation) usually fails to define unique solution, often proves to be incompatible and does not provide stability with respect to the errors of input data. To avoid these unfavorable features characteristic to the ill-posed problems, it is necessary to implement a regularizing algorithm for its solution. We suggest using a regularizing algorithm based on optimization of the
320
13 Calculation of Molecular Force Fields
Tikhonov’s functional M α (F, R) = ω(F, R) − ωexp 2 + M (s, F, R) − Mexp (s)2 + + {A, B, C}(R, F ) − {A, B, C}exp 2 7 6 + α F − F 0 2 + R − R0 2
(13.5.2)
where in the last (“stabilizer”) term F 0 and R 0 represent parameters of ab initio force field and equilibrium geometry, respectively. With the appropriate choice of regularization parameter α (that depends on the experimental errors characterized by some numerical parameter δ), it proves possible to obtain approximations converging to a normal pseudosolution of the system (13.5.1) when experimental errors tend to zero [2]. These approximations are obtained as extremals {Fα ,Rα }of functional (13.5.2). Now it is appropriate to give a brief summary of the features distinguishing the given approach from the various previously used attempts to solve the similar inverse problem [23,25]. (1) The approach is aimed at a simultaneous determination of the geometry and force field parameters of a molecule. It combines techniques previously used in R spectroscopy and ED data analysis. In particular, it allows to use more flexible force field models when fitting ED data, far beyond the usually employed scaling of the ab initio force field. (2) Ab initio data (or any other external data) is automatically “weighed” so as to serve an additional source of information when data supplied by the experiment proves insufficient. There is no need to supply ab initio data with some kind of assumed errors, etc. (3) Molecular geometry is defined in terms of equilibrium distances thus allowing compatibility with spectroscopic models and ab initio calculations. Besides, the self-consistency of geometrical configuration is automatically maintained at all stages of the analysis. The complexity of molecular models used in the analysis strongly depends on the availability and quality of the experimental data. Since in most cases vibrational spectra and ED patterns reveal the vibrational motion in a molecule resulting from small deviations of the atoms from their equilibrium positions, most molecular models are based on the assumption of small harmonic vibrations. In some cases of solving GISP within the scaling approximation, it is necessary to include the cubic part of the force field [33]. Similarly, in order to get a set of more reliable cubic force constants it is undoubtedly beneficial to improve empirically the ab initio values, e.g. for simplicity by the use of the Pulay harmonic scale factors. Two schemes of cubic constant scaling are generally feasible. Let the ab initio quadratic force constant fij0 defined in natural
13.6 Computer Implementation
321
internal coordinates be scaled as follows: fij (scaled) = fij0 βi1/2 βj1/2 , where β i and β j are the harmonic scale factors. Then, reasoning by analogy, the cubic constants scaling mode can be formulated [34] as 0 βi / βj / βk / fijk (scaled) = fijk
12 12 12
or, alternatively [34], 0 βi1/3 βj1/3 βk1/3 , fijk (scaled) = fijk 0 are the unscaled theoretical cubic constants. Both scaling schemes where fijk reduce the vibrational problem to the determination of a much smaller number of parameters. The examples of the applying the last procedure to different molecular systems including those with large amplitude motion are given in [25]. The most important step in solving the inverse vibrational problem is formulating a-priori constraints on the solution, which are taken from quantum mechanical calculations. Plausible constraints for the force field matrix followed recommendations discussed elsewhere [19,20].
13.6
Computer Implementation
On the basis of the regularizing algorithms described above we have compiled a package of programs for processing spectroscopic data on a computer [2, 14] which has been revised and now has the structure shown below. The package contains 3 distinct software units. One of them, intended for graphic illustrations of the 3D models of molecules, has no direct output. It shows a rigid molecule rotating in space. At present we plan to extend it to display vibrational modes. Another program, the symmetry analyzing routine, is not absolutely necessary for vibrational calculations but has proved extremely useful when working with molecules including dozens of atoms and possessing relatively high symmetry (for example, C60 with icosahedron symmetry). The version of symmetry analysis implemented in our package has following capabilities: (1) Given the equilibrium configuration (Cartesian coordinates of atoms) and masses of the atoms, it defines the symmetry point group of the molecule. In the other mode, when the symmetry group is given a-priori, the Cartesian coordinates may be adjusted (if necessary) to correspond to given symmetry properties;
322
13 Calculation of Molecular Force Fields
(2) The number of normal frequencies of each symmetry is calculated; (3) If the set of internal coordinates is described in the input, then the symmetry coordinates based on this set are constructed. The resulting set of symmetry coordinates is analyzed (e.g., numbers of independent and redundant coordinates of each symmetry are calculated) and outputted to be used in solving direct or inverse vibrational problems. The main software package known as SPECTRUM [2] has recently under-
13.7 Applications
323
gone important modifications. The current version takes advantages of 32-bit processing and virtual memory management that leads to increases in both the processing rate and in the allowable size of molecules. Progress in these areas has made the processing of molecules of more than 100–200 atoms possible in a reasonable amount of time. Further improvements were concerned with allowing the input of more kinds of data and the imposition of more and different constraints on the solution of the inverse problem. The general flow of data processing within the program is shown in the following table. Now software SPECTRUM is a part of information system ISMOL [35–37], the general scheme of these database is performed in Figure 13.1.
Figure 13.1. Hybrid type information system ISMOL.
13.7
Applications
As an example of solving the inverse scaling problem within scaling schemes, we present the results of the calculations of scaled force field in Cartesian coordinates for trans- and gauche-conformers of isopropylamine molecule (Figure 13.2). This compound was the subject of experimental structural [38], spectral [38] and theoretical [39] studies and the conclusion was made that isopropylamine exists as a mixture of trans and gauche conformers. The trans conformer (of Cs symmetry) has a nitrogen lone pair in trans-position to the central CH
324
Figure 13.2. molecules.
13 Calculation of Molecular Force Fields
Atoms numbering in trans- (a) and gauche- (b) isopropylamine
bond and is the predominant conformer in gaseous state in ratio ∼ 3.5 : 1 [40]. Gauche conformer has a structure of C1 symmetry. Scaling factors in Cartesian coordinates were determined from solving the inverse problem (13.4.1) by minimization functional (13.4.3). The sets of experimental frequencies of both conformers were taken from [38, 41]. In our calculations we did not use any symmetry constraints for trans conformer of isopropylamine but used some model constraints on the values of scaling factors in the way similar to previous calculations of scaling factors for CH3 CH2 Cl [30]. As a result of the imposed constraints, the matrices of scale factors contained 27 different factors that were optimized. Fitted (calculated with the final set of optimized scale factors) frequencies for both conformers of isopropylamine in comparison with experimental and theoretical data are given in Table 13.1. Corresponding resulting matrices of βij coefficients for both conformers are presented in Table 13.2. Average errors in frequency fitting were equal to 19.5 cm−1 , and 20.3 cm−1 (HF/6-31G*) for trans and gauche conformers of isopropyl molecule, respectively. These results demonstrate good correspondence between experimental and fitted frequencies supporting the conclusion of rather good quality of the obtained scale for trans- and gauche-isopropylamine. We can conclude that the model of correcting theoretical vibrational frequencies by means of scaling Cartesian force matrices is quite reasonable; generally, the results of frequency correction with Cartesian scaling factors are very close to the quality of their correction by scaling force matrices in internal coordinates [40, 42].
13.7 Applications
325
Table 13.1. Observed and calculated at the HF/6-31G* theoretical level vibrational frequencies of isopropylamine trans and gauche-conformers in comparison with fitted (calculated with optimized set of scaling factors) data.
3342 2968 2945 2932 2878 1618 1469 1449 1375 1245 1130 942 919 819 785 472 404 258
A 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
3411 2965 2950 2860 1469 1460 1360 1343 1240 1170 1029 976 369 267 236
3803 3273 3254 3196 1636 1632 1573 1528 1378 1122 1043 1015 431 319 239
1
3392 2949 2947 2848 1494 1490 1384 1330 1251 1171 1014 966 361 291 221
the final value of regularization parameter α.
Observed [38,40]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Mode
Observed [38,40]
A
HF/6-31G* Theor. Fitted1 α = 1.56 · 10−2 3721 3342 A 3281 2958 3258 2926 3202 2897 3118 2887 1840 1643 1655 1498 1642 1487 1572 1394 1523 1291 1316 1126 1278 946 1080 900 965 838 876 792 510 461 391 399 289 269
Sym.
Mode
gauche- isopropylamine
Sym.
trans- isopropylamine
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
3393 3328 2968 2965 2950 2945 2918 2878 2860 1618 1469 1469 1460 1449 1375 1360 1343 1245
HF/6-31G* Theor. Fitted1 α = 4.24 · 10−3 3793 3390 3708 3331 3288 2964 3261 2950 3252 2940 3246 2928 3206 2908 3196 2875 3186 2828 1836 1633 1652 1496 1645 1491 1634 1489 1630 1468 1575 1396 1556 1358 1539 1347 1498 1284
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
1240 1170 1130 1029 976 942 919 826 781 459 407 369 258 236 221
1395 1307 1262 1132 1061 1028 1009 982 870 499 440 393 296 289 247
1238 1170 1126 1023 967 948 921 826 792 452 392 359 268 265 229
HF/6-31G* trans-conformer N1 C2 C3 N1 0.904 0.019 0.000 C2 0.019 0.854 0.007 C3 0.000 0.007 0.858 C4 0.000 0.007 0.006 H5 0.005 0.028 0.007 H6 0.019 0.000 0.001 H7 0.019 0.000 0.001 H8 0.005 0.013 0.026 H9 0.005 0.013 0.010 H10 0.005 0.013 0.026 H11 0.005 0.013 0.026 H12 0.005 0.013 0.010 H13 0.005 0.013 0.010 HF/6-31G* gauche-conformer N1 C2 C3 N1 0.901 0.006 0.006 C2 0.006 0.871 0.015 C3 0.006 0.015 0.909 C4 0.006 0.015 0.005 H5 0.006 0.026 0.000 H6 0.016 0.0082 0.005 H7 0.016 0.008 0.005 H8 0.006 0.007 0.011 H9 0.006 0.007 0.011 H10 0.006 0.007 0.011 H11 0.006 0.007 0.005 H12 0.006 0.007 0.005 H13 0.006 0.007 0.005 H5 0.005 0.028 0.007 −0.005 0.960 0.001 0.001 −0.003 −0.003 −0.003 −0.003 −0.003 −0.003 H5 0.006 0.026 0.000 0.000 0.914 0.006 0.006 0.006 0.006 0.006 0.006 0.006 0.006
C4 0.000 0.007 0.006 0.858 0.007 0.007 0.001 0.0103 0.026 0.010 0.0103 0.026 0.026 C4 0.006 0.015 0.005 0.909 0.000 0.005 0.005 0.005 0.005 0.005 0.011 0.011 0.011
H6 0.016 0.008 0.005 0.005 0.006 0.910 0.003 0.007 0.007 0.007 0.007 0.007 0.006
H6 0.019 0.000 0.001 0.003 0.001 0.914 0.007 0.009 0.006 0.009 0.009 0.006 0.006 H7 0.016 0.008 0.005 0.005 0.006 0.003 0.910 0.007 0.007 0.007 0.007 0.007 0.007
H7 0.019 0.000 0.007 0.003 0.001 0.007 0.914 0.006 0.009 0.006 0.006 0.009 0.009 H8 0.006 0.007 0.011 0.005 0.006 0.007 0.007 0.915 0.004 0.004 0.007 0.007 0.007
H8 0.005 0.013 0.026 0.015 −0.003 0.009 0.006 0.928 0.000 0.000 0.000 0.000 0.000 H9 0.006 0.007 0.011 0.005 0.006 0.007 0.007 0.004 0.915 0.004 0.0067 0.0067 0.007
H9 0.005 0.013 0.010 0.024 −0.003 0.006 0.009 0.000 0.928 0.000 0.000 0.000 0.000 H10 0.006 0.007 0.011 0.005 0.006 0.007 0.007 0.004 0.004 0.915 0.0067 0.0067 0.007
H10 0.005 0.013 0.026 0.015 0.003 0.009 0.006 0.000 0.000 0.928 0.000 0.000 0.000 H11 0.006 0.007 0.005 0.011 0.006 0.007 0.007 0.007 0.007 0.007 0.915 0.004 0.004
H11 0.005 0.013 0.026 0.015 0.003 0.009 0.006 0.000 0.000 0.000 0.928 0.000 0.000 H12 0.006 0.007 0.005 0.011 0.006 0.007 0.007 0.007 0.007 0.007 0.004 0.915 0.004
H12 0.0058 0.013 0.010 0.024 0.003 0.006 0.009 0.000 0.000 0.000 0.000 0.928 0.000 H13 0.006 0.007 0.005 0.014 0.006 0.007 0.007 0.007 0.007 0.007 0.004 0.004 0.915
H13 0.005 0.013 0.010 0.024 0.003 0.006 0.009 0.000 0.000 0.000 0.000 0.000 0.928
326 13 Calculation of Molecular Force Fields
Table 13.2. Comparison of calculated scale factors for trans- and gaucheisopropylamine.
Acknowledgements

This work was partly supported by the RFBR with grant numbers 11-03-00040 and 11-01-97020.
References

[1] A. N. Tikhonov, A. S. Leonov and A. G. Yagola, Nonlinear Ill-posed Problems, Vols. 1, 2, Chapman and Hall, 1998.
[2] A. G. Yagola, I. V. Kochikov, G. M. Kuramshina and Yu. A. Pentin, Inverse Problems of Vibrational Spectroscopy, VSP, Utrecht, Tokyo, 1999.
[3] L. A. Gribov, Intensity Theory for Infrared Spectra of Polyatomic Molecules, New York, Consultants Bureau, 1964.
[4] S. J. Cyvin, Molecular Vibrations and Mean Square Amplitudes, Universitetsforlaget, Oslo; Elsevier, Amsterdam, 1968.
[5] M. V. Volkenstein, L. A. Gribov, M. A. El'yashevich and B. I. Stepanov, Vibrations of Molecules, Nauka, Moscow, 1972.
[6] L. M. Sverdlov, M. A. Kovner and E. P. Krainov, Vibrational Spectra of Polyatomic Molecules, New York, Wiley, 1973.
[7] S. Califano, Vibrational States, New York, Wiley, 1976.
[8] I. V. Kochikov, G. M. Kuramshina, Yu. A. Pentin and A. G. Yagola, Regularizing algorithm for solving inverse vibrational problem, Dokl. Akad. Nauk SSSR, 261, 1104-1106, 1981.
[9] I. V. Kochikov, G. M. Kuramshina, Yu. A. Pentin and A. G. Yagola, Calculation of polyatomic molecule force field by means of Tikhonov's method of regularization, Dokl. Akad. Nauk SSSR, 283, 850-854, 1985.
[10] I. V. Kochikov, G. M. Kuramshina and A. G. Yagola, Stable numerical methods of solving certain inverse problems of vibrational spectroscopy, USSR Comput. Maths. Math. Phys., 27, 33-40, 1987.
[11] I. V. Kochikov, G. M. Kuramshina, Yu. A. Pentin and A. G. Yagola, Regularizing algorithms for molecular force field calculations, J. Mol. Struct. (Theochem), 272, 13-33, 1992.
[12] I. V. Kochikov, G. M. Kuramshina and Yu. A. Pentin, Application of Tikhonov's regularization method to the calculation of force fields of polyatomic molecules, Russian J. Phys. Chem., 61, 451-457, 1987.
[13] I. V. Kochikov, G. M. Kuramshina, Yu. A. Pentin and A. G. Yagola, Inverse problems in vibrational spectroscopy, in: Ill-posed Problems in Natural Sciences: Proceedings of the International Conference, Editors: A. N. Tikhonov et al., Utrecht, VSP / Moscow, TVP, 535-542, 1992.
[14] I. V. Kochikov and G. M. Kuramshina, Software package for calculation of polyatomic molecule force fields by means of Tikhonov's regularization method, Vestnik Moskov. Univ., Ser. 2, Khimia, 26, 354-358, 1985.
[15] A. S. Dostavalova, I. V. Kochikov, G. M. Kuramshina and A. G. Yagola, Joint analysis of force fields in a series of polyatomic molecules, Dokl. Akad. Nauk SSSR, 315, 1368-1373, 1990.
[16] G. M. Kuramshina, F. A. Weinhold, I. V. Kochikov, Yu. A. Pentin and A. G. Yagola, Regularized molecular force fields obtained on a base of ab initio quantum mechanical calculations, Russian J. Phys. Chem., 68, 13-26, 1994.
[17] G. M. Kuramshina, F. A. Weinhold, I. V. Kochikov, A. G. Yagola and Yu. A. Pentin, Joint treatment of ab initio and experimental data in molecular force field calculations with Tikhonov's method of regularization, J. Chem. Phys., 100, 1414-1424, 1994.
[18] G. M. Kuramshina, A. V. Lobanov and A. G. Yagola, Searching normal solution of inverse vibrational problem by the method of Monte-Carlo, J. Mol. Struct., 348, 147-150, 1995.
[19] G. M. Kuramshina and A. G. Yagola, A priori constraints in the force field calculations of polyatomic molecules, J. Struct. Chem., 38, 181-194, 1997.
[20] G. M. Kuramshina and F. Weinhold, Constraints on the values of force constants for molecular force field models based on ab initio calculations, J. Mol. Struct., 410-411, 457-462, 1997.
[21] I. V. Kochikov, G. M. Kuramshina and A. G. Yagola, Inverse problems of vibrational spectroscopy as nonlinear ill-posed problems, Surveys on Mathematics for Industry, 8, 63-94, 1998.
[22] G. M. Kuramshina and Yu. A. Pentin, DFT force fields, vibrational spectra and potential functions for hindered internal rotation of CF3CH2CH2Cl and CCl3CH2CH2Cl, J. Mol. Struct., 481, 161-168, 1999.
[23] I. V. Kochikov, Yu. I. Tarasov, V. P. Spiridonov, G. M. Kuramshina, A. G. Yagola, A. S. Saakjan, M. V. Popik and S. Samdal, Extension of a regularizing algorithm for the determination of equilibrium geometry and force field of free molecules from joint use of electron diffraction, molecular spectroscopy and ab initio data on systems with large-amplitude oscillatory motion, J. Mol. Struct., 485-486, 421-443, 1999.
[24] G. M. Kuramshina, Yu. A. Pentin and A. G. Yagola, New approaches for molecular conformer force field analysis in combination with ab initio results, J. Mol. Struct., 509, 255-263, 1999.
[25] I. V. Kochikov, Yu. I. Tarasov, V. P. Spiridonov, G. M. Kuramshina, D. W. H. Rankin, A. S. Saakjan and A. G. Yagola, The equilibrium structure of thiophene by the combined use of electron diffraction, vibrational spectroscopy and microwave spectroscopy guided by theoretical calculations, J. Mol. Struct., 567-568, 29-40, 2001.
[26] P. Pulay, G. Fogarasi, G. Pongor, J. E. Boggs and A. Vargha, Combination of theoretical ab initio and experimental information to obtain reliable harmonic force constants. Scaled quantum mechanical (QM) force fields for glyoxal, acrolein, butadiene, formaldehyde, and ethylene, J. Am. Chem. Soc., 105, 7037-7047, 1983.
[27] A. V. Stepanova, I. V. Kochikov, G. M. Kuramshina and A. G. Yagola, Regularizing scale factor method for molecular force field calculations, Computer Assistance for Chemical Research, International Symposium CACR-96, Moscow, 52, 1996.
[28] I. V. Kochikov, G. M. Kuramshina, A. V. Stepanova and A. G. Yagola, Regularizing scale factor method for molecular force field calculations, Moscow Univ. Bull., Ser. 3, Physics and Astronomy, 5, 21-25, 1997.
[29] I. V. Kochikov, G. M. Kuramshina, A. V. Stepanova and A. G. Yagola, Numerical aspects of the calculation of scaling factors from experimental data, Numerical Methods and Programming, 5, 285-290, 2004.
[30] H. W. Engl, M. Hanke and A. Neubauer, Regularization of Inverse Problems, Dordrecht, Kluwer, 1996.
[31] I. V. Kochikov, G. M. Kuramshina and A. V. Stepanova, New approach for the correction of ab initio molecular force fields in Cartesian coordinates, Int. J. Quant. Chem., 109, 28-33, 2009.
[32] I. V. Kochikov, G. M. Kuramshina and A. V. Stepanova, Scaled ab initio molecular force fields in Cartesian coordinates: application to benzene, pyrazine and isopropylamine molecules, Asian Chem. Let., 13, 143-154, 2009.
[33] S. Kondo, Empirical improvement of the anharmonic ab initio force field of methyl fluoride, J. Chem. Phys., 81, 5945-5951, 1984.
[34] I. V. Kochikov, Yu. I. Tarasov, V. P. Spiridonov, G. M. Kuramshina, A. S. Saakyan and Yu. A. Pentin, The use of ab initio calculation results in electron diffraction studies of the equilibrium structure of molecules, Russian J. Phys. Chem., 75, 395-400, 2001.
[35] I. V. Kochikov, G. M. Kuramshina, L. M. Samkov, D. A. Sharapov and S. A. Sharapova, A hybrid database (information system) on the molecular spectral constants, Numerical Methods and Programming, 6, 83-87, 2005.
[36] I. V. Kochikov, G. M. Kuramshina, L. M. Samkov, D. A. Sharapov and S. A. Sharapova, Data structuring and formalization in the hybrid database on molecular spectral constants (ISMOL), Numerical Methods and Programming, 7, 111-116, 2006.
[37] I. V. Kochikov, G. M. Kuramshina, L. M. Samkov, D. A. Sharapov and S. A. Sharapova, Identification algorithms for substances in the hybrid database on molecular spectral constants (ISMOL), Numerical Methods and Programming, 8, 70-73, 2007.
[38] T. Iijima, T. Kondou and T. Takenaka, Molecular structure of isopropylamine and cyclopropylamine as investigated by gas-phase electron diffraction, J. Mol. Struct., 445, 23-28, 1998.
[39] J. R. Durig, G. A. Guirgis and D. A. C. Compton, Analysis of torsional spectra of molecules with two internal C3V rotors. 13. Vibrational assignments, torsional potential functions, and gas phase thermodynamic functions of isopropylamine-d0 and -d2, J. Phys. Chem., 83, 1313-1323, 1979.
[40] D. Zeroka, J. O. Jensen and A. C. Samuels, Infrared spectra of some isotopomers of isopropylamine: A theoretical study, Int. J. Quant. Chem., 72, 109-126, 1999.
[41] Y. Hamada, M. Tsuboi, M. Nakata and M. Tasumi, Infrared spectrum of isopropylamine, Chem. Phys., 125, 55-62, 1988.
[42] J. Baker, A. A. Jarzecki and P. Pulay, Direct scaling of primitive valence force constants: An alternative approach to scaled quantum mechanical force fields, J. Phys. Chem. A, 102, 1412-1424, 1998.
Authors Information

G. M. Kuramshina
Department of Physical Chemistry, Chemical Faculty, Moscow State University, Moscow 119992, Russia.
E-mail: [email protected]

I. V. Kochikov
Scientific Research Computer Centre, Moscow State University, Moscow 119992, Russia.
E-mail: [email protected]

A. V. Stepanova
Department of Physical Chemistry, Chemical Faculty, Moscow State University, Moscow 119992, Russia.
Chapter 14
Some Mathematical Problems in Biomedical Imaging

J. J. Liu and H. L. Xu
Abstract. Magnetic resonance electrical impedance tomography (MREIT) is a new biomedical imaging modality developed recently to visualize the cross-sectional conductivity images of an electrically conducting object; it is motivated by the poor spatial resolution of traditional electrical impedance tomography (EIT). Current injected through a pair of surface electrodes induces a magnetic flux density distribution B = (Bx, By, Bz) inside the electrically conducting object. With the help of an MRI scanner, the z-component Bz of B can be captured. Based on a relation between the Bz data and the internal conductivity σ, we can reconstruct the cross-sectional conductivity images by suitable image reconstruction algorithms. Numerical simulations and animal experiments have shown that MREIT can provide conductivity images with high spatial resolution and accuracy. Since the original ideas on MREIT were proposed in the early 1990s, the MREIT technique has developed rapidly, from the mathematical models and image reconstruction algorithms to the stage of in vivo animal and human imaging experiments. In this chapter, we review the mathematical models and image reconstruction realizations of MREIT, focusing on some recent algorithms.
14.1 Introduction
Magnetic Resonance Electrical Impedance Tomography (MREIT) is a new biomedical imaging modality developed recently to visualize the cross-sectional conductivity images of an electrically conducting object such as biological tissues and organs inside the human body. Since the early 1980s, numerous attempts in electrical impedance tomography (EIT) have been made to produce cross-sectional conductivity images [4, 6]. EIT imaging uses only the boundary measurement of current-voltage data (Neumann-to-Dirichlet data) as input data, which is insensitive to any change in a local conductivity distribution of the object. Due to the well-known ill-posedness of the corresponding
inverse problem in EIT, the boundary measurement data are insufficient for a robust reconstruction of high-resolution conductivity images. In contrast to the traditional EIT technique, MREIT essentially applies the internal magnetic flux density data induced by an externally injected current to reconstruct the conductivity distribution. Since the internal data contain more information about the conductivity of the object, this technique weakens the ill-posedness of traditional EIT imaging and can provide a high resolution of the conductivity image. In the early 1990s, MREIT was proposed in order to bypass the ill-posedness in EIT. Early trials of MREIT [39, 36, 3] were motivated by the magnetic resonance current density imaging (MRCDI) technique, which aims to visualize the internal current density distribution of the target [29, 30]. Although none of them could produce high-resolution conductivity images in actual imaging experiments, these early ideas changed the way of investigating the conductivity imaging problem by using the internal information. Since then, the MREIT technique has developed rapidly, from the mathematical models and reconstruction algorithms to the stage of in vivo animal and human imaging experiments [11, 8, 31, 23, 13, 20, 10, 34]. In MREIT, we inject an external current into an electrically conducting object through a pair of surface electrodes. The injected current produces the internal current density J = (Jx, Jy, Jz) and magnetic flux density B = (Bx, By, Bz) distributions. With the help of an MRI scanner with its main magnetic field in the z direction, we can capture the data of Bz, the z-component of B. The magnetic flux density B conveys the information about the conductivity distribution σ according to the Ohm law and the Ampère law

$$ -\sigma \nabla u = J = \frac{1}{\mu_0} \nabla \times B, \tag{14.1.1} $$
where μ0 is the magnetic permeability of free space and u is the internal voltage distribution. In the early stage of MREIT, some methods attempted to reconstruct the conductivity σ from the information on the current density J. These J-based methods need to measure all three components of B = (Bx, By, Bz), since J = (1/μ0) ∇ × B, which requires mechanical rotations of the imaging object within the MRI scanner. If we know the current density J, the conductivity σ can be reconstructed by using image reconstruction algorithms such as the J-substitution algorithm [11, 8, 16] and the equipotential line methods [12, 25]. However, these methods are difficult to use in practice, since rotations of the imaging object in the MRI scanner are needed. In order to make the MREIT technique applicable to clinical situations, we should use only one component of B, to avoid object rotations. In 2003, the first constructive Bz-based MREIT algorithm, called the harmonic Bz algorithm,
was proposed in [31]. Numerical simulations and phantom experiments have shown the possibility of high-resolution conductivity imaging for measurement data with small noise [32, 23, 24]. Since then, the MREIT imaging technique has advanced rapidly and has now reached the stage of in vivo animal and human imaging experiments [23, 24, 19, 20, 10, 21, 7, 34]. This novel algorithm is based on the key relation between the internal Bz and the conductivity σ. It is the z-component of the identity

$$ \Delta B = \mu_0 \nabla\sigma \times \nabla u, \tag{14.1.2} $$
which is the curl form of the Ampère law. The harmonic Bz algorithm for the reconstruction of the conductivity σ applies ΔBz, the Laplacian of the magnetic flux density, as input data, rather than Bz itself. Of course, this algorithm obviously amplifies the noise in the measurement data Bz. Thus the performance of the harmonic Bz algorithm can deteriorate if the noise in the measurement of Bz is not small, which is the practical case in MREIT. To deal with this noise problem, some algorithms were developed in order to weaken the ill-posedness of the harmonic Bz algorithm, such as the gradient Bz decomposition algorithm [26] and the variational gradient Bz algorithm [27], which need only the first-order derivative of Bz. Although the noise amplification problem is weakened to some extent in these two schemes, the first-order derivative of Bz is still needed. Because of these difficulties in treating the derivative of the measurement data Bz, it is preferable to establish an image reconstruction algorithm using the magnetic flux density data directly, rather than its derivative. Recently, we proposed an integral equation method, in which the conductivity σ is reconstructed from the Bz data directly; see [22]. The integral version of the Biot-Savart law is used to describe the relation between Bz and σ; see (14.2.7) in the next section. The validity of the integral equation method was shown by some numerical simulations in [22]. In this chapter, we review MREIT from the mathematical models and image reconstruction algorithms to numerical simulations. In Section 14.2, the mathematical models of the forward problem and the inverse problem in MREIT are described in detail. Since the image reconstruction algorithms are the key to the practical applications of the MREIT technique, two specific algorithms, the harmonic Bz algorithm and the integral equation method, are discussed in Section 14.3 and Section 14.4, respectively. In the last section, we give some numerical simulations to show the validity of the above image reconstruction algorithms.
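For orientation, the following hedged sketch recovers J = (1/μ0) ∇ × B from magnetic flux density samples on a regular grid by central differences, as in Eq. (14.1.1). The field B here is synthetic, since in MREIT only the component Bz is actually measurable without rotating the object; the grid size and amplitudes are hypothetical.

```python
import numpy as np

mu0 = 4e-7 * np.pi                        # magnetic permeability of free space
h = 0.01                                  # grid spacing in meters (hypothetical)
x = np.arange(0.0, 0.2, h)
X, Y, Z = np.meshgrid(x, x, x, indexing='ij')

# Synthetic smooth field standing in for measured B = (Bx, By, Bz), in tesla.
Bx = 1e-8 * np.sin(10 * Y)
By = np.zeros_like(X)
Bz = 1e-8 * np.cos(10 * X)

def curl(Fx, Fy, Fz, h):
    dFx = np.gradient(Fx, h)              # [d/dx, d/dy, d/dz] for 'ij' indexing
    dFy = np.gradient(Fy, h)
    dFz = np.gradient(Fz, h)
    return (dFz[1] - dFy[2],              # dBz/dy - dBy/dz
            dFx[2] - dFz[0],              # dBx/dz - dBz/dx
            dFy[0] - dFx[1])              # dBy/dx - dBx/dy

Jx, Jy, Jz = (c / mu0 for c in curl(Bx, By, Bz, h))   # Eq. (14.1.1)
```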
14.2 Mathematical Models
In this section, we describe the mathematical models of the MREIT technique for the forward problem and the corresponding inverse problem.
14.2.1 Forward problem
Consider an electrically conducting object occupying a domain Ω ⊂ R³ with smooth connected boundary ∂Ω. In order to simplify the MREIT problem, we assume that the conductivity distribution σ in Ω is isotropic, positive and smooth. For an anisotropic electrically conducting object such as human skeletal muscle and some neural tissues, the corresponding MREIT problem is more complicated, see [33]. By attaching a pair of electrodes E⁺ and E⁻ on ∂Ω, we inject a fixed low-frequency current of I mA through the electrodes. The injected current I produces an internal current density J = (Jx, Jy, Jz) inside the object Ω which satisfies the following boundary value problem:
$$\begin{cases}
\nabla\cdot J = 0 & \text{in } \Omega,\\[2pt]
I = -\int_{E^+} J\cdot n\,ds = \int_{E^-} J\cdot n\,ds, &\\[2pt]
J\times n = 0 \ \text{on } E^+\cup E^-, \qquad J\cdot n = 0 \ \text{on } \partial\Omega\setminus(E^+\cup E^-),
\end{cases}\tag{14.2.1}$$
where n is the unit outward normal vector on ∂Ω and ds is the surface area element. Since the internal current density J and the voltage u satisfy Ohm's law J = −σ∇u, (14.2.1) can be rewritten as
$$\begin{cases}
\nabla\cdot(\sigma\nabla u) = 0 & \text{in } \Omega,\\[2pt]
I = \int_{E^+}\sigma\frac{\partial u}{\partial n}\,ds = -\int_{E^-}\sigma\frac{\partial u}{\partial n}\,ds, &\\[2pt]
\nabla u\times n = 0 \ \text{on } E^+\cup E^-, \qquad \sigma\frac{\partial u}{\partial n} = 0 \ \text{on } \partial\Omega\setminus(E^+\cup E^-),
\end{cases}\tag{14.2.2}$$
where ∂u/∂n = ∇u · n. By setting a reference voltage such as u|E⁻ = 0, we obtain a unique solution u ∈ W^{1,2}(Ω) of (14.2.2) from standard arguments in PDEs. The nonclassical model (14.2.2) can be solved indirectly from the following standard problem of an elliptic equation with mixed boundary conditions, see [20]:
$$\begin{cases}
\nabla\cdot(\sigma\nabla\tilde u) = 0 & \text{in } \Omega,\\[2pt]
\tilde u|_{E^+} = 1, \quad \tilde u|_{E^-} = 0, &\\[2pt]
-\sigma\dfrac{\partial\tilde u}{\partial n} = 0 & \text{on } \partial\Omega\setminus(E^+\cup E^-).
\end{cases}\tag{14.2.3}$$
More precisely, if u is a solution of the mixed boundary value problem (14.2.2), then
$$u = \frac{I}{\int_{E^+}\sigma\frac{\partial\tilde u}{\partial n}\,ds}\,\tilde u + C \quad \text{in } \Omega,\tag{14.2.4}$$
where the constant C is determined by the reference voltage. Now we consider the magnetic flux density B produced by the injected current I. B can be expressed as
$$B(r) = B_\Omega(r) + B_\chi(r),\quad r\in\Omega,\tag{14.2.5}$$
where BΩ is the magnetic flux density due to the current density J in Ω and Bχ is due to the currents in the lead wires and on the surfaces of the electrodes. From the Biot-Savart law, we have
$$B(r) \approx B_\Omega(r) = \frac{\mu_0}{4\pi}\int_\Omega J(r')\times\frac{r-r'}{|r-r'|^3}\,dr',\quad r\in\Omega,\tag{14.2.6}$$
since Bχ is quite small and therefore is omitted for simplicity. In fact, we have ΔBχ ≡ 0. The z-component of the formula (14.2.6) contains an implicit relation between Bz and the conductivity σ as follows:
$$B_z(r) = \frac{\mu_0}{4\pi}\int_\Omega\left\langle\frac{r-r'}{|r-r'|^3},\,-\sigma(r')\nabla u(r')\times e_z\right\rangle dr',\quad r\in\Omega,\tag{14.2.7}$$
where e_z = (0, 0, 1). By taking the curl form of the Ampère law μ0J = ∇ × B, we have
$$\Delta B = \mu_0\,\nabla\sigma\times\nabla u.\tag{14.2.8}$$
Then the z-component yields the following relation between Bz and the conductivity σ:
$$\Delta B_z = \mu_0\,\nabla\sigma\cdot(\nabla u\times e_z).\tag{14.2.9}$$
The forward problem of MREIT is to compute the internal current density J and the magnetic flux density B from the given conductivity σ in Ω ⊂ R³. Firstly, we solve (14.2.3) for ũ by the finite element method; then the internal voltage u can be determined from (14.2.4) and a given reference voltage. Finally, we compute the internal current density J from Ohm's law J = −σ∇u, and the internal magnetic flux density B and its z-component Bz from (14.2.6)–(14.2.9). For the inverse problem, the conductivity σ is unknown and the Bz data can be measured by the MRI scanner in practical situations. The aim is to reconstruct σ from the measurement data Bz based on the above relations, which will be stated in the next subsections.
14.2.2 Inverse problem
The inverse problem in MREIT is to reconstruct the conductivity σ in Ω from given measurement data of the internal magnetic flux density B, for which an image reconstruction algorithm should be employed. Based on the relations between σ and the internal current density J or the magnetic flux density B given in the previous subsection, several different image reconstruction algorithms have been developed. Before we introduce these algorithms, we first explain briefly how to obtain the measurement data of B by using the MRI scanner, see [37, 7, 34] for more details.
The measurement of the internal magnetic flux density B induced by an injected current was originally studied in the magnetic resonance current density imaging (MRCDI) technique. Assume that the main magnetic field B0 of the MRI scanner is parallel to the z direction. A current injected into the biotissue generates an extra inhomogeneity of the main magnetic field, changing B0 to B0 + B, and this alters the MR phase image in such a way that the phase change is proportional to Bz. Sequentially injecting a positive current I⁺ and a negative current I⁻ through a pair of electrodes, we can obtain the following complex k-space data involving the Bz information in an imaging slice Ωz0 = Ω ∩ {z = z0}:
$$S_{I^\pm}(k_x,k_y) = \int_{\Omega_{z_0}} M(x,y,z_0)\,e^{\pm i\gamma B_z(x,y,z_0)T_c + i\delta(x,y,z_0)}\,e^{i(xk_x+yk_y)}\,dx\,dy,\tag{14.2.10}$$
where M is the transverse magnetization, δ is any systematic phase error, γ = 26.75 × 10⁷ rad/T·s is the gyromagnetic ratio of hydrogen and Tc is the duration of the injected current pulse. The k-space data S_{I±}(kx, ky) are what is measured in MREIT. Taking two-dimensional discrete Fourier transforms of the k-space data S_{I±}, we obtain the following complex MR images:
$$M^\pm(x,y,z_0) = M(x,y,z_0)\,e^{\pm i\gamma B_z(x,y,z_0)T_c}\,e^{i\delta(x,y,z_0)}.\tag{14.2.11}$$
Applying a standard phase unwrapping algorithm, we can get the Bz data as
$$B_z(x,y,z_0) = \frac{1}{2\gamma T_c}\,\arg\frac{M^+(x,y,z_0)}{M^-(x,y,z_0)},\tag{14.2.12}$$
where arg(·) is the principal value of the argument of a complex number. Since the Bz signal is directly proportional to the phase change produced by the injected current, the noise in the Bz data is amplified where the MR signals are weak. Thus, to obtain high-quality Bz data, denoising and inpainting methods should be introduced to recover the measurement data of Bz, see [17, 18, 37]. The recovered Bz data are then used to reconstruct the conductivity σ by some image reconstruction algorithm.
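Computationally, the passage from the complex MR images (14.2.11) to the Bz data (14.2.12) is a single phase-difference operation, in which the systematic phase error δ cancels in the quotient M⁺/M⁻. A minimal sketch follows; the array shapes are illustrative, and a real data set would additionally require 2D phase unwrapping before the division by 2γTc.

```python
import numpy as np

GAMMA = 26.75e7          # gyromagnetic ratio of hydrogen [rad/(T s)]

def extract_bz(M_plus, M_minus, Tc):
    """Recover B_z via (14.2.12) from the complex MR images (14.2.11).
    M_plus, M_minus: complex 2D arrays; Tc: current pulse duration [s]."""
    # delta cancels in the quotient; np.angle returns the principal value
    phase = np.angle(M_plus / M_minus)
    return phase / (2.0 * GAMMA * Tc)
```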
Now we introduce the image reconstruction algorithms, which are the core of the MREIT technique. Generally speaking, there are two main categories of image reconstruction algorithms: one is the J-based algorithms, the other is the Bz-based algorithms [37].
First, we introduce the J-based algorithms. The internal current density J = (1/μ0)∇ × B is available by measuring all three components of B = (Bx, By, Bz), which requires mechanical rotations of the imaging object within the MRI scanner. The J-based algorithms reconstruct the conductivity mainly from the relation between the conductivity σ and the interior current density J:
$$J(r) := |J(r)| = \sigma(r)\,|\nabla u[\sigma](r)|,\tag{14.2.13}$$
where u[σ] represents the solution of (14.2.2) for the conductivity σ. When we know the interior current density J, we can recover the conductivity σ by solving the nonlinear equation (14.2.13).
Before considering the numerical implementation of the above algorithms, we need to consider the solvability of the nonlinear equation (14.2.13). In practice, the existence is always guaranteed as long as J is the magnitude of the current density, whereas the uniqueness cannot be guaranteed. In fact, if σ satisfies (14.2.13), then cσ also satisfies (14.2.13) for any positive constant c, where the corresponding voltage is u[cσ] = (1/c)u[σ]. This uncertainty can be fixed by adopting the normalization of the conductivity σ(ξ) = 1 at an arbitrary point ξ ∈ Ω. Unfortunately, the uniqueness of the nonlinear equation (14.2.13) still cannot be guaranteed, noticing that if σ satisfies (14.2.13), then σ/φ′(u[σ]) also satisfies (14.2.13); in that case, the corresponding voltage is u[σ/φ′(u[σ])] = φ(u[σ]), where φ : R → R is a strictly increasing and continuously differentiable function. Since there are infinitely many choices of the function φ, there are at least as many conductivities σ satisfying the nonlinear equation (14.2.13). This uncertainty can be fixed by providing additional measurement data. By injecting two linearly independent currents through two pairs of surface electrodes on ∂Ω, we can obtain two measurements of the current density. Then the uniqueness of the nonlinear equation (14.2.13) has been proven in [9].
After ensuring the solvability of the nonlinear equation (14.2.13), we can introduce the image reconstruction algorithms based on solving it, such as the J-substitution algorithm [11, 8, 16] and equipotential line methods [12, 25]. Taking the J-substitution algorithm as an example, it is a natural iterative scheme to solve (14.2.13). In order to capture the conductivity σ from the magnitude J of the current density J, we take an initial guess σ^k with k = 0 of the true
conductivity σ, and update σ^k by
$$\sigma^{k+1} = \frac{J(r)}{|\nabla u^k(r)|},$$
where u^k = u[σ^k] is the voltage corresponding to the conductivity σ^k. Combined with two linearly independent injected currents, this process can be carried out; the whole iterative algorithm and its convergence analysis can be found in [9].
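The loop structure of this scheme is simple enough to state in a few lines; in the sketch below, solve_voltage_gradient is a hypothetical placeholder for an elliptic solver of (14.2.2), and the floor eps merely guards against division by a vanishing gradient.

```python
import numpy as np

def j_substitution(J_mag, solve_voltage_gradient, sigma0,
                   n_iter=20, eps=1e-12):
    """Iterate sigma_{k+1} = J / |grad u[sigma_k]|, cf. (14.2.13).
    solve_voltage_gradient(sigma) -> (ux, uy), the gradient of u[sigma];
    it stands for a solver of (14.2.2) and is not provided here."""
    sigma = sigma0.copy()
    for _ in range(n_iter):
        ux, uy = solve_voltage_gradient(sigma)
        grad_norm = np.sqrt(ux**2 + uy**2)
        sigma = J_mag / np.maximum(grad_norm, eps)
    return sigma
```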
Notice that the J-based algorithms are difficult to use in practice since rotations of the imaging object in the MRI scanner are needed. In order to make the MREIT technique applicable to clinical situations, we should use only the Bz data, to avoid object rotation in the MRI scanner. In 2003, the first Bz-based algorithm was proposed to reconstruct the conductivity σ using only the Bz data, see [31]. The so-called harmonic Bz algorithm has been widely used in subsequent experimental studies [23, 24, 19, 20, 10]. Since then, imaging techniques in MREIT have advanced rapidly and have now reached the stage of in vivo animal and human experiments.
The Bz-based algorithms reconstruct the conductivity from the relation between σ and the z-component Bz of the magnetic flux density B. As the Bz data can trace the change of σ to some extent, a high-quality reconstruction of the conductivity σ from the data Bz is possible. From the physical laws, we can get two different relations between σ and Bz. One is the z-component of the Biot-Savart law:
$$B_z(r) = \frac{\mu_0}{4\pi}\int_\Omega\frac{\sigma(r')\big[(x-x')\partial_y u[\sigma](r') - (y-y')\partial_x u[\sigma](r')\big]}{|r-r'|^3}\,dr',\quad r\in\Omega;\tag{14.2.14}$$
the other is the z-component of the curl form of the Ampère law ∇ × J = (1/μ0)∇ × (∇ × B), that is,
$$\Delta B_z = \mu_0\,\nabla\sigma\cdot(\nabla u\times e_z) = \mu_0\,\nabla\ln\sigma\cdot(\sigma\nabla u\times e_z).\tag{14.2.15}$$
This expression shows clearly that the Bz data probe a change of ln σ along the vector flow σ∇u × ez, which is perpendicular to the current density direction −σ∇u and to the z direction. Both of the nonlinear equations (14.2.14) and (14.2.15) relate the Bz data to the conductivity σ. In fact, these two equations are theoretically equivalent. For example, from the equation (14.2.14) we have
$$\nabla B_z(r) = \mu_0\,\nabla\int_\Omega \nabla\frac{1}{|r-r'|}\cdot\big(\sigma(r')\nabla u(r')\times e_z\big)\,dr' = \mu_0\,\sigma(r)\nabla u(r)\times e_z,\quad r\in\Omega.$$
Using the smoothness assumption on the conductivity σ and the fact that ∇ · (∇u × ez) = 0, we can get the formula (14.2.15). Thus, we can attempt to
reconstruct the conductivity σ by solving one of them from the measurement data Bz. The solvability of the nonlinear equations (14.2.14) and (14.2.15) is similar to that of (14.2.13). The existence can be ensured in practice as long as Bz is the z-component of the magnetic flux density B. The uniqueness can be guaranteed by adopting the normalization of the conductivity and providing additional measurement data of Bz. Although the uniqueness has been confirmed in numerous numerical simulations, its rigorous mathematical proof is still open for a three-dimensional domain Ω ⊂ R³. Some uniqueness results in two-dimensional domains, based on geometric index theory, can be found in [14]. Various image reconstruction algorithms can be established by solving the nonlinear equation (14.2.14) or (14.2.15). The well-known harmonic Bz algorithm is based on (14.2.15), and the recently proposed integral equation method is based on the nonlinear equation (14.2.14). Both of them will be discussed concretely in the next two sections, respectively.
14.3 Harmonic Bz Algorithm
In this section, we will give the well-known harmonic Bz algorithm, which is based on the key relation (14.2.15) between the conductivity σ and the data Bz. Assume that we inject two linearly independent currents through two pairs of surface electrodes E1± and E2± on ∂Ω. For given σ, denote by uj[σ] the induced voltage corresponding to the injection current Ij, j = 1, 2; that is, uj[σ] is the solution of the following boundary value problem:
$$\begin{cases}
\nabla\cdot(\sigma\nabla u_j[\sigma]) = 0 & \text{in } \Omega,\\[2pt]
I_j = \int_{E_j^+}\sigma\dfrac{\partial u_j[\sigma]}{\partial n}\,ds = -\int_{E_j^-}\sigma\dfrac{\partial u_j[\sigma]}{\partial n}\,ds, &\\[2pt]
\nabla u_j[\sigma]\times n = 0 \ \text{on } E_j^+\cup E_j^-, \qquad \sigma\dfrac{\partial u_j[\sigma]}{\partial n} = 0 \ \text{on } \partial\Omega\setminus(E_j^+\cup E_j^-).
\end{cases}\tag{14.3.1}$$
The z-components B_z^j, j = 1, 2, of the internal magnetic flux density B satisfy
$$B_z^j(r) = \frac{\mu_0}{4\pi}\int_\Omega\frac{\sigma(r')\big[(x-x')\partial_y u_j[\sigma](r') - (y-y')\partial_x u_j[\sigma](r')\big]}{|r-r'|^3}\,dr',\quad r\in\Omega.\tag{14.3.2}$$
The corresponding Laplacians ΔB_z^j, j = 1, 2, satisfy
$$\Delta B_z^j = \mu_0\,\nabla\sigma\cdot(\nabla u_j[\sigma]\times e_z) = \mu_0\,\nabla\ln\sigma\cdot(\sigma\nabla u_j[\sigma]\times e_z).\tag{14.3.3}$$
14.3.1 Algorithm description
The harmonic Bz algorithm is based on the identity (14.3.3) for the two measurement data B_z^j, j = 1, 2, that is,
$$\frac{1}{\mu_0}\begin{pmatrix}\Delta B_z^1(r)\\[2pt] \Delta B_z^2(r)\end{pmatrix} = (\sigma A[\sigma])(r)\begin{pmatrix}\dfrac{\partial\ln\sigma}{\partial x}(r)\\[6pt] \dfrac{\partial\ln\sigma}{\partial y}(r)\end{pmatrix},\quad r\in\Omega,\tag{14.3.4}$$
where
$$A[\sigma](r) := \begin{pmatrix}\dfrac{\partial u_1[\sigma]}{\partial y}(r) & -\dfrac{\partial u_1[\sigma]}{\partial x}(r)\\[6pt] \dfrac{\partial u_2[\sigma]}{\partial y}(r) & -\dfrac{\partial u_2[\sigma]}{\partial x}(r)\end{pmatrix},\quad r\in\Omega.$$
Provided that A[σ] is invertible, the identity (14.3.4) becomes
$$\begin{pmatrix}\dfrac{\partial\ln\sigma}{\partial x}(r)\\[6pt] \dfrac{\partial\ln\sigma}{\partial y}(r)\end{pmatrix} = \frac{1}{\mu_0}\,(\sigma A[\sigma](r))^{-1}\begin{pmatrix}\Delta B_z^1(r)\\[2pt] \Delta B_z^2(r)\end{pmatrix},\quad r\in\Omega.\tag{14.3.5}$$
The basic idea of the harmonic Bz algorithm is to reconstruct the conductivity σ on each two-dimensional slice Ωz0 := Ω ∩ {z = z0} of Ω ⊂ R³; the conductivity σ in Ω is then obtained by combining {σ(x, y, z0)} for all z0. Therefore, the reconstruction of σ on each two-dimensional slice Ωz0 is the key part of the harmonic Bz algorithm. In the following we consider the two-dimensional MREIT problem, which occurs when the imaging object is locally cylindrical and ∂σ/∂z = 0, meaning that the conductivity σ in Ω does not change along the z direction. In this case, the voltage uj[σ] in Ω is independent of z and satisfies the two-dimensional version of (14.3.1), provided the boundary injected current is also independent of z. The harmonic Bz algorithm (14.3.10) below is an iterative scheme based on the identity (14.3.5) on each slice Ωz0 in terms of the measured data B_z^j, j = 1, 2, see [20, 34]. Denote x := (x, y) and x′ := (x′, y′). The identity (14.3.5) on each slice Ωz0 is
$$\begin{pmatrix}\dfrac{\partial\ln\sigma}{\partial x}(x,z_0)\\[6pt] \dfrac{\partial\ln\sigma}{\partial y}(x,z_0)\end{pmatrix} = \frac{1}{\mu_0}\,(\sigma A[\sigma](x,z_0))^{-1}\begin{pmatrix}\Delta B_z^1(x,z_0)\\[2pt] \Delta B_z^2(x,z_0)\end{pmatrix},\quad (x,z_0)\in\Omega_{z_0},\tag{14.3.6}$$
where Δ is the two-dimensional Laplacian. We define
$$L_{z_0}[\ln\sigma](x) := \ln\sigma(x,z_0) + \frac{1}{2\pi}\int_{\partial\Omega_{z_0}}\frac{(x-x')\cdot\nu(x')}{|x-x'|^2}\,\ln\sigma(x',z_0)\,dl_{x'},\tag{14.3.7}$$
where ν is the unit outward normal vector to ∂Ωz0 and dl is the line element. Define
$$G_{z_0}[\sigma](x) := \frac{1}{2\pi\mu_0}\int_{\Omega_{z_0}}\frac{x-x'}{|x-x'|^2}\cdot\left((\sigma A[\sigma](x',z_0))^{-1}\begin{pmatrix}\Delta B_z^1(x',z_0)\\[2pt] \Delta B_z^2(x',z_0)\end{pmatrix}\right)ds_{x'}.\tag{14.3.8}$$
where ds is the surface area element. Then we have
$$L_{z_0}[\ln\sigma](x) = G_{z_0}[\sigma](x),\quad (x,z_0)\in\Omega_{z_0},\tag{14.3.9}$$
which follows from the identity
$$\Delta_{x'}\frac{1}{2\pi}\ln\frac{1}{|x'-x|} = -\delta(x'-x)$$
and integration by parts applied to $\ln\sigma(x')\,\Delta_{x'}\frac{1}{2\pi}\ln\frac{1}{|x'-x|}$ with respect to x′ = (x′, y′).
We denote by σ* the true conductivity, which is unknown in Ωz0, and assume that the boundary value σ*|∂Ωz0 is known. For a given initial guess σ⁰(x, z0) of σ*, the harmonic Bz algorithm constructs an approximation sequence {σⁿ(x, z0) : n ∈ N} by
$$\begin{cases}
\nabla\ln\sigma^{n+1}(x,z_0) = \dfrac{1}{\mu_0}\,(\sigma A[\sigma^n](x,z_0))^{-1}\begin{pmatrix}\Delta B_z^1(x,z_0)\\[2pt] \Delta B_z^2(x,z_0)\end{pmatrix},\\[10pt]
L_{z_0}[\ln\sigma^{n+1}](x) = G_{z_0}[\sigma^{n+1}](x).
\end{cases}\tag{14.3.10}$$
From the first step in (14.3.10), we can update ∇σ^{n+1} from σⁿ in Ωz0 as long as the measurement data B_z^j, j = 1, 2, are available. In the second step, we update σ^{n+1} from ∇σ^{n+1} and the known boundary value σ* by
$$\ln\sigma^{n+1}(x,z_0) = -H_{z_0}[\ln\sigma^{n+1}](x) + \frac{1}{2\pi}\int_{\Omega_{z_0}}\frac{x-x'}{|x-x'|^2}\cdot\nabla\ln\sigma^{n+1}(x',z_0)\,ds_{x'},\tag{14.3.11}$$
where
$$H_{z_0}[\ln\sigma^{n+1}](x) = H_{z_0}[\ln\sigma^*](x) := \frac{1}{2\pi}\int_{\partial\Omega_{z_0}}\frac{(x-x')\cdot\nu(x')}{|x-x'|^2}\,\ln\sigma^*(x',z_0)\,dl_{x'}.$$
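The first step of (14.3.10) is a pointwise 2 × 2 solve; the following minimal sketch carries it out on arrays, assuming the voltage gradients of u1[σⁿ] and u2[σⁿ] are supplied by an elliptic solver. Near ∂Ωz0 the determinant may be tiny (see the remark below), so a practical code would restrict the update to an interior region.

```python
import numpy as np

def grad_log_sigma(dBz1, dBz2, u1x, u1y, u2x, u2y, sigma, mu0=1.0):
    """First step of (14.3.10): solve sigma*A[sigma]*grad(ln sigma)
    = (1/mu0)*(dBz1, dBz2)^T pointwise, A = [[u1y, -u1x], [u2y, -u2x]]."""
    det = u1x * u2y - u1y * u2x          # det A[sigma]
    rhs1 = dBz1 / (mu0 * sigma)
    rhs2 = dBz2 / (mu0 * sigma)
    # apply the inverse of the 2x2 matrix A to the right-hand side
    gx = (-u2x * rhs1 + u1x * rhs2) / det
    gy = (-u2y * rhs1 + u1y * rhs2) / det
    return gx, gy                        # components of grad(ln sigma)
```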
It should be noticed that (A[σ](r))⁻¹ may become quite large near ∂Ωz0, because the two induced currents σ∇u1 and σ∇u2 are likely to be almost parallel near the boundary for some electrode configurations. This phenomenon causes serious difficulty for the convergence of the harmonic Bz algorithm. To avoid this situation, let us assume that the target conductivity σ* is constant in Ωz0 \ Ω̃z0 for some interior domain Ω̃z0 ⊂⊂ Ωz0. Then it is easy to show that $\Delta B_{z,*}^j \equiv 0$, j = 1, 2, in Ωz0 \ Ω̃z0, where $\Delta B_{z,*}^j$, j = 1, 2, denote the exact magnetic flux densities corresponding to σ*, in the sense that they satisfy (14.3.6) on the slice Ωz0. This assumption is necessary in the following convergence analysis for the harmonic Bz algorithm.
14.3.2 Convergence analysis
The convergence analysis for the harmonic Bz algorithm can be found in [20, 21]. In [20], the authors gave a rigorous mathematical analysis of the convergence of the iterative sequence {σⁿ} in (14.3.10) under the small contrast condition ‖∇σ*‖_{C¹(Ωz0)} ≤ ε, with the convergence rate affected by the known lower and upper bounds σ*_± of σ* satisfying 0 < σ*_− < σ* < σ*_+ < +∞. These convergence results are Theorem 3.2 for the two-dimensional domain and Theorem 3.5 for the three-dimensional domain in [20]. However, there is still a gap between such a mathematical a priori assumption on the exact conductivity and the physical understanding, namely that the reconstruction performance should depend not on ∇σ* but on ∇σ*/σ*, which has also been observed in the numerical performance. Furthermore, the convergence estimate given in [20] is implicit, since it involves some constants coming from the theoretical analysis. In order to give a computable error estimate for the harmonic Bz algorithm and to analyze the convergence property under a practical assumption on the target conductivity, an a posteriori error estimate and convergence analysis were given in [21], which improved the previous convergence result in [20]. In what follows, we give the a posteriori error estimate and the corresponding convergence analysis for the harmonic Bz iteration scheme; see [21] for more details.
Assume that the exact conductivity σ* is compactly supported in a domain Ω̃z0 ⊂⊂ Ωz0 with
$$\nabla\sigma^*\big|_{\Omega_{z_0}\setminus\tilde\Omega_{z_0}} = 0 \quad\text{and}\quad \|\nabla\ln\sigma^*\|_{L^\infty(\Omega_{z_0})} \le C_0,\tag{14.3.12}$$
where C0 is a positive constant. Notice that the norm ‖∇ln σ*‖_{L∞(Ωz0)} chosen here is natural in the sense that it measures the oscillations of σ*; putting a bound C0 means putting a bound on the oscillation of the target conductivity. This assumption, together with the standard electrode configuration, provides a positive lower bound for det(A[σ*]) in Ω̃z0:
$$\inf_{\tilde\Omega_{z_0}}\ \det\begin{pmatrix}\dfrac{\partial u_1[\sigma^*]}{\partial y} & -\dfrac{\partial u_1[\sigma^*]}{\partial x}\\[6pt] \dfrac{\partial u_2[\sigma^*]}{\partial y} & -\dfrac{\partial u_2[\sigma^*]}{\partial x}\end{pmatrix} > 0.\tag{14.3.13}$$
We should notice that (14.3.13) does not hold on the entire domain Ωz0, due to the fact that the two induced currents satisfy ∇u1[σ*] × ∇u2[σ*] ≈ 0 near ∂Ωz0 \ ∪_{j=1,2}(Ej⁺ ∪ Ej⁻). This means that (14.3.13) is correct only in the interior region Ω̃z0. The corresponding positive lower bound for a three-dimensional domain remains an open problem. In the two-dimensional case, the estimate (14.3.13) can be proven using results in [2] when σ* is smooth.
Denote by {σⁿ : n = 1, 2, ...} the iteration sequence from (14.3.10) using the exact input data $B_{z,*}^j$ for j = 1, 2. Let $B_{z,n}^j$, j = 1, 2, be the solutions of
$$\begin{cases}
\Delta B_{z,n}^j = \mu_0\left(\dfrac{\partial u_j[\sigma^n]}{\partial x},\,\dfrac{\partial u_j[\sigma^n]}{\partial y}\right)\cdot\left(\dfrac{\partial\sigma^n}{\partial y},\,-\dfrac{\partial\sigma^n}{\partial x}\right) & \text{in } \Omega_{z_0},\\[8pt]
B_{z,n}^j = B_{z,*}^j \ \text{on } \partial\Omega_{z_0}\setminus E^\pm, \qquad \nabla B_{z,n}^j\cdot n = \nabla B_{z,*}^j\cdot n \ \text{on } E^\pm.
\end{cases}\tag{14.3.14}$$
Notice that $B_{z,n}^j$ for j = 1, 2 are available quantities, obtained by solving the Poisson equation (14.3.14) and the elliptic equation (14.3.1) for each σⁿ. We are now ready to present the a posteriori estimate, see [21].
Theorem 14.3.1. Assume that σ* > 0 satisfies (14.3.12). The error between the reconstructed conductivity σⁿ in (14.3.10) and the unknown true conductivity σ* can be estimated by
$$\|\ln\sigma^n - \ln\sigma^*\|_{L^\infty(\tilde\Omega_{z_0})} \le M\,\|(\sigma^n A[\sigma^n])^{-1}\|_{L^{p_2}(\tilde\Omega_{z_0})}\sum_{j=1}^2\|B_{z,*}^j - B_{z,n}^j\|_{W^{2,p_3}(\tilde\Omega_{z_0})},\tag{14.3.15}$$
where
$$M = \frac{1+C_0}{2\pi\mu_0}\left(\frac{2}{2-p_1}\right)^{1/p_1}\big(\operatorname{diam}(\Omega_{z_0})\big)^{\frac{2}{p_1}-1}$$
and p1, p2, p3 are any positive numbers such that 1 < p1 < 2 and 1/p1 + 1/p2 + 1/p3 = 1.
Now we consider the convergence analysis for the harmonic Bz iteration scheme (14.3.10). The major difficulty comes from the requirement of a uniform upper bound on the inverse matrix A[σⁿ]⁻¹ in the iteration procedure. In order to guarantee this uniform upper bound, some assumptions on the target conductivity σ* and the initial guess σ⁰ are needed. Assume that the target conductivity σ* lies in the set
$$\Xi[\epsilon_0,\lambda,\sigma^0] = \left\{\sigma\in C^1(\Omega_{z_0}) : \frac{1}{\lambda} < \sigma < \lambda,\ \|\nabla\ln\sigma\|_{C(\Omega_{z_0})} < \epsilon_0,\ \sigma|_{\Omega_{z_0}\setminus\tilde\Omega_{z_0}} = \sigma^0\right\},\tag{14.3.16}$$
where λ, σ⁰, ε0 are positive constants. Then we can give the lower bound of A[σ*] in Ω̃z0, which has been proven in [21]:
$$\inf_{\tilde\Omega_{z_0}}|\det A[\sigma^*]| \ge d_-^* > 0,\tag{14.3.17}$$
where d*₋ depends only on λ, ε0, Ω, dist(∂Ωz0, Ω̃z0) and Ej±. Based on the above discussion, we have the following convergence result.
Theorem 14.3.2. Let Ξ[ε0, λ, σ⁰] be the set defined in (14.3.16) and let d*₋ be the quantity in (14.3.17). There exists a constant ε = ε(d*₋, ε0) > 0 such that, for each σ* ∈ Ξ[ε0, λ, σ⁰] with ‖∇ln σ*‖_{C(Ωz0)} ≤ ε, the sequence {σⁿ} given by the harmonic Bz iteration scheme (14.3.10) with the initial guess σ⁰ satisfies
$$\sigma^n \equiv \sigma^* \ \text{in } \Omega_{z_0}\setminus\tilde\Omega_{z_0},\qquad \|\ln\sigma^n - \ln\sigma^*\|_{C^1(\tilde\Omega_{z_0})} \le K\left(\frac{1}{2}\right)^n\epsilon,\quad n = 1, 2, \dots,\tag{14.3.18}$$
where K := diam(Ωz0) + 1.
Combining Theorems 14.3.1 and 14.3.2 yields the following result.
Theorem 14.3.3. For the above iteration process, we have the following a posteriori estimate:
$$\left\|\ln\frac{\sigma^n}{\sigma^*}\right\|_{L^\infty(\Omega_{z_0})} \le K\epsilon_0\left(\frac{\|(\sigma^n A[\sigma^n])^{-1}\|_{L^{p_2}(\tilde\Omega_{z_0})}\sum_{j=1}^2\|\Delta B_{z,*}^j - \Delta B_{z,n}^j\|_{W^{2,p_3}(\Omega_{z_0})}}{\|(\sigma^{n-1} A[\sigma^{n-1}])^{-1}\|_{L^{p_2}(\tilde\Omega_{z_0})}\sum_{j=1}^2\|\Delta B_{z,*}^j - \Delta B_{z,n-1}^j\|_{W^{2,p_3}(\Omega_{z_0})}}\right)^n.\tag{14.3.19}$$
Theorem 14.3.3 gives a computable error estimate for the harmonic Bz iteration algorithm, which provides an adaptive criterion for the stopping rule of the harmonic Bz algorithm, since all the terms on the right-hand side of (14.3.19) can be computed. When the iterative scheme (14.3.10) is applied to noisy input data B_z^j, j = 1, 2, the noise in the measurement data is amplified by the Laplacian operation ΔB_z^j, j = 1, 2. Thus, regularization methods are needed for computing ΔB_z^j from the noisy measurement data of B_z^j. In the next subsection, we will introduce a stable method for this computation based on the Lavrentiev regularization.
14.3.3 The stable computation of ΔBz
In this subsection, we give a stable computation scheme for the Laplacian. For simplicity, we denote the two-dimensional domain by Ω := Ωz0 in the following. Suppose that Bz ∈ H²(Ω) and u := ΔBz ∈ L²(Ω) is its Laplacian. Without loss of generality, we assume that Bz|∂Ω = 0; otherwise we consider B̃z := Bz − Bz,0 with Bz,0 solving
$$\begin{cases}\Delta B_{z,0}(x) = 0, & x\in\Omega,\\ B_{z,0}(x) = B_z(x), & x\in\partial\Omega,\end{cases}\tag{14.3.20}$$
which satisfies B̃z|∂Ω = 0 and ΔB̃z = u. Denote by G(x, x′) the Green function of −Δ with the Dirichlet boundary condition; then we have
$$A[u](x) := \int_\Omega G(x,x')\,u(x')\,dx' = -B_z(x),\quad x\in\Omega.\tag{14.3.21}$$
The norm ‖·‖ and the inner product ⟨·,·⟩ without subscript are in the L² sense in the sequel. Obviously, the operator A is linear, bounded and compact from L²(Ω) to itself. Moreover, the operator A is nonnegative and self-adjoint in L²(Ω): for all v ∈ L²(Ω) satisfying A[v] = g, we have
$$\langle A[v],v\rangle = -\int_\Omega g(x)\Delta g(x)\,dx = -\int_{\partial\Omega} g\,\frac{\partial g}{\partial n}\,ds_x + \int_\Omega|\nabla g|^2\,dx \ge 0,$$
and for u ∈ L²(Ω) satisfying A[u] = f, we have
$$\langle A[u],v\rangle = \int_\Omega A[u](x)v(x)\,dx = -\int_\Omega f(x)\Delta g(x)\,dx = -\int_{\partial\Omega}\left(f\,\frac{\partial g}{\partial n} - g\,\frac{\partial f}{\partial n}\right)ds_x - \int_\Omega g(x)\Delta f(x)\,dx = -\int_\Omega g(x)\Delta f(x)\,dx = \int_\Omega u(x)A[v](x)\,dx = \langle u, A[v]\rangle.$$
In order to distinguish the exact data from the measurement data of the magnetic flux, we denote the exact data by Bz and the measurement data with error level δ by B_z^δ, i.e.,
$$\|B_z^\delta - B_z\| \le \delta\tag{14.3.22}$$
for some kind of norm. Now we consider how to get a stable computation of u = ΔBz from B_z^δ, which is equivalent to solving the operator equation (14.3.21) from the measurement data B_z^δ. The Fredholm integral equation A[u] = −B_z^δ is ill-posed, and some regularization method is needed. Since the integral operator A is nonnegative, the Lavrentiev regularization is applicable, see [15, 28, 35] for its general theoretical framework. The approximate solution u^{α,δ} in this regularizing scheme is computed from
$$\alpha u^{\alpha,\delta} + A[u^{\alpha,\delta}] = -B_z^\delta,\tag{14.3.23}$$
where α > 0 is the regularization parameter. The standard results on the Lavrentiev regularization show that the regularized equation (14.3.23) has a unique solution u^{α,δ}. Moreover, for α = α(δ) > 0 satisfying
$$\lim_{\delta\to 0}\alpha(\delta) = 0,\qquad \lim_{\delta\to 0}\frac{\delta}{\alpha(\delta)} = 0,\tag{14.3.24}$$
we have the convergence lim_{δ→0} u^{α(δ),δ} = u in the L² sense, see [1, 28, 38] for more details.
Although (14.3.24) gives an admissible choice of the regularization parameter α = α(δ), we cannot obtain a quantitative value of α in this way. As the choice of the regularization parameter plays an important role in a regularization method, we give some a priori and a posteriori choice strategies for the regularization parameter in the following. The corresponding convergence analysis and error estimates for the regularizing solution will also be given.
First, we consider the a priori choice strategy for the regularization parameter α = α(δ). Denote by {(λn; un) : n ∈ N} the eigensystem of the self-adjoint operator A : L²(Ω) → L²(Ω). By the nonnegativity of A, we have N(A) = {0} and λn > 0 for all n. Define a linear operator B : L²(Ω) → L²(Ω) such that its eigensystem is {(√λn; un) : n ∈ N}; then B* = B and A = B*B. Based on this decomposition of A, we have the following result.
Theorem 14.3.4. Let α > 0, B_z^δ ∈ L²(Ω) and u^{α,δ} be the unique solution of (14.3.23).
1. If u = B*[ψ] ∈ H¹(Ω) with ‖ψ‖ ≤ M, then the error estimate for the approximate solution u^{α(δ),δ} with α(δ) = c(2δ/M)^{2/3} for a constant c > 0 is
$$\|u^{\alpha(\delta),\delta} - u\| \le \big(1/c + \sqrt{c}\big)\,(M/2)^{2/3}\,\delta^{1/3}.\tag{14.3.25}$$
2. If u = B*B[ψ] ∈ H²(Ω) with ‖ψ‖ ≤ M, then the error estimate for the approximate solution u^{α(δ),δ} with α(δ) = c√(δ/M) for some constant c > 0 is
$$\|u^{\alpha(\delta),\delta} - u\| \le (1/c + c)\,\sqrt{M\delta}.\tag{14.3.26}$$
Proof. Using the standard triangle inequality, it follows that
$$\|u^{\alpha,\delta} - u\| \le \|u^{\alpha,\delta} - u^\alpha\| + \|u^\alpha - u\| \le \frac{\delta}{\alpha} + \|u^\alpha - u\|,$$
where u^α is the unique solution of the equation
$$\alpha u^\alpha + A[u^\alpha] = -B_z.\tag{14.3.27}$$
If u = B*[ψ] ∈ H¹(Ω), it holds that α(u^α − u) + B*B[u^α − u] = −αB*ψ. Then
$$\|u^\alpha - u\| \le \alpha\,\big\|(\alpha I + B^*B)^{-1}B^*\big\|\,\|\psi\| \le \frac{\sqrt{\alpha}}{2}\,\|\psi\| \le \frac{\sqrt{\alpha}}{2}\,M,$$
thus
$$\|u^{\alpha,\delta} - u\| \le \frac{\delta}{\alpha} + \frac{M}{2}\sqrt{\alpha}.\tag{14.3.28}$$
The choice α(δ) = c(2δ/M)^{2/3} leads to the estimate (14.3.25). If u = B*B[ψ] ∈ H²(Ω), it holds that α(u^α − u) + B*B[u^α − u] = −αB*Bψ. Then
$$\alpha(u^\alpha - u) + B^*B[u^\alpha - u + \alpha\psi] = 0.\tag{14.3.29}$$
Taking the inner product with u^α − u + αψ on both sides yields
$$\|u^{\alpha,\delta} - u\| \le \frac{\delta}{\alpha} + M\alpha.$$
The choice α(δ) = c√(δ/M) leads to the estimate (14.3.26). The proof is completed.
Remark 14.3.5. Obviously, in the two a priori cases of this theorem, the best choice of α is obtained for c = 2^{2/3} and c = 1, respectively. Then we have the error estimates
$$\|u^{\alpha(\delta),\delta} - u\| = 3(M/4)^{2/3}\delta^{1/3}\quad\text{for}\quad \alpha(\delta) = (4/M)^{2/3}\delta^{2/3},$$
and
$$\|u^{\alpha(\delta),\delta} - u\| = 2M^{1/2}\delta^{1/2}\quad\text{for}\quad \alpha(\delta) = (1/M)^{1/2}\delta^{1/2},$$
respectively. Therefore, α = α(δ) can be quantitatively specified for these a priori strategies.
Theorem 14.3.4 gives the a priori choice strategy for the regularization parameter α = α(δ), which relies on some assumptions on the exact solution u = ΔBz. The corresponding error estimates for the regularizing solution u^{α(δ),δ} have also been given. However, the a priori choice is not easy to apply in many cases, since these assumptions on the exact solution u are hard to justify in practical situations. Compared with the a priori choice of α, the a posteriori choice relies in general only on the noisy data and its noise level; therefore, it is more useful. Now we consider the generalized Morozov discrepancy principle, which determines α from
$$\big\|A[u^{\alpha,\delta}] + B_z^\delta\big\| = C\delta^\gamma,\tag{14.3.30}$$
where C > 0 and 0 < γ < 1 are two specified constants, see [5, 38]. The equation (14.3.30) has a unique solution α = α(δ) provided that the constants C and γ are chosen such that δ < Cδ^γ < ‖B_z^δ‖.
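Since the discrepancy ‖A[u^{α,δ}] + B_z^δ‖ increases monotonically with α, the parameter solving (14.3.30) can be bracketed and located by bisection. The sketch below works on a discretized version of (14.3.21), with A a symmetric nonnegative matrix; the dense solver, bracketing interval and iteration count are illustrative assumptions.

```python
import numpy as np

def discrepancy(A, b_delta, alpha):
    """||A u + b_delta|| for the Lavrentiev solution of
    (alpha I + A) u = -b_delta, A a symmetric nonnegative matrix."""
    n = A.shape[0]
    u = np.linalg.solve(alpha * np.eye(n) + A, -b_delta)
    return np.linalg.norm(A @ u + b_delta)

def morozov_alpha(A, b_delta, delta, C=1.0, gamma=0.5,
                  lo=1e-14, hi=1e4, max_iter=200):
    """Geometric bisection for alpha with discrepancy = C*delta**gamma,
    i.e. the generalized Morozov principle (14.3.30)."""
    target = C * delta**gamma
    for _ in range(max_iter):
        mid = np.sqrt(lo * hi)
        if discrepancy(A, b_delta, mid) < target:
            lo = mid                     # discrepancy grows with alpha
        else:
            hi = mid
    return np.sqrt(lo * hi)
```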
Combining Theorem 14.3.4 and Theorem 6 in [38] yields the following result.
Theorem 14.3.6. Let α > 0, B_z^δ ∈ L²(Ω) and u^{α,δ} be the unique solution of (14.3.23). Suppose that δ < Cδ^γ < ‖B_z^δ‖ for two constants C > 0 and 0 < γ < 1. Then for the regularizing solution u^{α(δ),δ} with α = α(δ) determined by (14.3.30), it follows that:
1. If u = B*[ψ] ∈ H¹(Ω) with ‖ψ‖ ≤ M, then the convergence rate is ‖u^{α(δ),δ} − u‖ = O(δ^{min{1−γ, γ/2}}).
2. If u = B*B[ψ] ∈ H²(Ω) with ‖ψ‖ ≤ M, then the convergence rate is ‖u^{α(δ),δ} − u‖ = O(δ^{min{1−γ, γ}}).
Having fixed the choice strategies for the regularization parameter α = α(δ), we consider how to compute the regularizing solution u^{α(δ),δ} from the equation (14.3.23). For the numerical realization, we propose to solve (14.3.23) through an equivalent boundary value problem. Define h^{α,δ} := A[u^{α,δ}]; then the equation (14.3.23) can be rewritten as
$$-\alpha\Delta h^{\alpha,\delta}(x) + h^{\alpha,\delta}(x) = -B_z^\delta(x),\quad x\in\Omega,\tag{14.3.31}$$
with
$$h^{\alpha,\delta}(x) = 0,\quad x\in\partial\Omega.\tag{14.3.32}$$
From this boundary value problem, we can compute h^{α,δ}(x) in Ω by numerical methods and finally get
$$u^{\alpha,\delta}(x) = -\Delta h^{\alpha,\delta}(x) = -\frac{1}{\alpha}h^{\alpha,\delta}(x) - \frac{1}{\alpha}B_z^\delta(x),\quad x\in\Omega.\tag{14.3.33}$$
For the given measurement data B_z^j, j = 1, 2, we can compute their Laplacians ΔB_z^j by the above method, and the results can then be used in the iterative scheme of the harmonic Bz algorithm.
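A minimal finite-difference realization of (14.3.31)–(14.3.33) on a rectangle is sketched below; Bz is assumed to vanish on the boundary (otherwise subtract the harmonic part as in (14.3.20) first), and the five-point Laplacian, grid layout and sparse solver are illustrative choices.

```python
import numpy as np
from scipy.sparse import diags, identity, kron
from scipy.sparse.linalg import spsolve

def stable_laplacian(Bz_delta, h, alpha):
    """Solve -alpha*Lap(h) + h = -Bz_delta, h = 0 on the boundary,
    cf. (14.3.31)-(14.3.32); return u = -(h + Bz_delta)/alpha, cf. (14.3.33)."""
    n, m = Bz_delta.shape                # grid includes boundary nodes
    ni, mi = n - 2, m - 2                # interior nodes only
    lap1d = lambda k: diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(k, k)) / h**2
    L = kron(identity(mi), lap1d(ni)) + kron(lap1d(mi), identity(ni))
    rhs = -Bz_delta[1:-1, 1:-1].ravel(order="F")
    hh = spsolve((-alpha * L + identity(ni * mi)).tocsc(), rhs)
    H = np.zeros_like(Bz_delta)
    H[1:-1, 1:-1] = hh.reshape((ni, mi), order="F")
    return -(H + Bz_delta) / alpha       # approximates Delta Bz
```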
14.4 Integral Equations Method
In this section, we will give the integral equation method, which is based on the key relation (14.3.2) between the conductivity σ and the magnetic flux data Bz .
14.4.1 Algorithm description
The integral equation method is based on the identity (14.3.2) for the two measurement data B_z^j, j = 1, 2, that is,
$$B_z^j(r) = \frac{\mu_0}{4\pi}\int_\Omega\frac{\sigma(r')\big[(x-x')\partial_y u_j[\sigma](r') - (y-y')\partial_x u_j[\sigma](r')\big]}{|r-r'|^3}\,dr',\quad r\in\Omega.\tag{14.4.1}$$
The basic idea of the integral equation method is the same as that of the harmonic Bz algorithm: reconstruct the conductivity σ on each two-dimensional slice Ωz0 = Ω ∩ {z = z0} and then combine σ(x, y, z0) for all possible z0. Therefore, we consider the two-dimensional problem directly, which allows us to explain our scheme clearly. Such a simplified model can be derived from the conductivity reconstruction of a three-dimensional cylindrical object. That is, we assume that Ω = D × R¹ is an infinite cylinder along the z direction with a two-dimensional cross section D := Ωz0 ⊂ R². If the conductivity σ(x, y, z) is independent of z and the input currents from the surface electrodes are also independent of z, then there are only transverse currents J = −(σ∂x u[σ], σ∂y u[σ], 0) on each slice D. Taking a fixed slice such as z0 = 0 of the cylinder, the nonlinear equation (14.4.1) becomes
$$\int_D K^j[\sigma](x,x')\,\sigma(x')\int_{-\infty}^{\infty}\frac{dz}{\big(|x-x'|^2+z^2\big)^{3/2}}\,dx' = B_z^j(x),\quad x=(x,y)\in D\tag{14.4.2}$$
for j = 1, 2, with the kernel
$$K^j[\sigma](x,x') := \frac{\mu_0}{4\pi}\big\{(x-x')\partial_y u_j[\sigma](x') - (y-y')\partial_x u_j[\sigma](x')\big\},\tag{14.4.3}$$
where, and in the sequel, we omit the variable z0 in all functions defined on the cylinder Ω for simplicity of notation, since they are independent of z. A straightforward computation of the integral with respect to z in the above relation yields
$$\int_D\frac{2K^j[\sigma](x,x')}{|x-x'|^2}\,\sigma(x')\,dx' = B_z^j(x),\quad x\in D,\ j=1,2.\tag{14.4.4}$$
In this case, u_j[σ](x) satisfies the two-dimensional boundary value problem
$$\begin{cases}
\nabla\cdot(\sigma\nabla u_j[\sigma]) = 0 & \text{in } D,\\[2pt]
I_j = \int_{E_j^+}\sigma\dfrac{\partial u_j[\sigma]}{\partial n}\,ds = -\int_{E_j^-}\sigma\dfrac{\partial u_j[\sigma]}{\partial n}\,ds, &\\[2pt]
\nabla u_j[\sigma]\times n = 0 \ \text{on } E_j^+\cup E_j^-, \qquad \sigma\dfrac{\partial u_j[\sigma]}{\partial n} = 0 \ \text{on } \partial D\setminus(E_j^+\cup E_j^-),\\[2pt]
u_j(x_0) = 0, \quad x_0\in\partial D,
\end{cases}\tag{14.4.5}$$
where u_j(x_0) = 0 is the reference voltage at x_0 ∈ ∂D, imposed for the uniqueness of the above problem. The relations (14.4.3)–(14.4.5) constitute the reconstruction model on each two-dimensional slice D of the integral equation method.
The equation (14.4.4) is a nonlinear integral equation of the first kind with respect to the unknown conductivity σ. For the uniqueness of this nonlinear equation, we need to specify the voltage difference at two fixed boundary points. Obviously there exists at least one point x_j ∈ ∂D such that u_j[σ](x_j) = C_j ≠ 0.
Thus, in the iterative algorithm, we require that the voltage u_j[σⁿ](x) corresponding to the iterate σⁿ always satisfies
$$u_j[\sigma^n](x_j) = u_j[\sigma](x_j) = C_j.\tag{14.4.6}$$
Together with u_j[σⁿ](x_0) = 0, we have thereby specified the voltage difference at x_0 and x_j. Our basic idea is to solve (14.4.4) by the iterative scheme
$$\begin{cases}
\displaystyle\int_D\frac{2K^1[\sigma^n](x,x')}{|x-x'|^2}\,\sigma^{n+1/2}(x')\,dx' = B_z^1(x), & x\in D,\\[10pt]
\displaystyle\int_D\frac{2K^2[\sigma^{n+1/2}](x,x')}{|x-x'|^2}\,\sigma^{n+1}(x')\,dx' = B_z^2(x), & x\in D,
\end{cases}\tag{14.4.7}$$
to get σ^{n+1} from σⁿ, rather than coupling the equations (14.4.4) for j = 1, 2 and solving them together. More precisely, we propose the following iteration algorithm based on the above nonlinear integral equations (14.4.4); a schematic implementation is sketched after the listing.

Alternative Iterations Algorithm: Give an initial guess σ⁰(x).

(i) Solve u_1[σⁿ] from (14.4.5) and then rescale it by
$$u_1[\sigma^n](x)\,\frac{C_1}{u_1[\sigma^n](x_1)} \Rightarrow u_1[\sigma^n](x)$$
such that (14.4.6) holds at each iteration; compute K¹[σⁿ](x, x′) from (14.4.3) using the rescaled electric potential.

(ii) Solve the first linear equation in (14.4.7) to generate σ^{n+1/2} by regularization, and rescale it by
$$\sigma^{n+1/2}(x)\,\frac{u_1[\sigma^n](x_1)}{C_1} \Rightarrow \sigma^{n+1/2}(x).$$

(iii) Solve u_2[σ^{n+1/2}] from (14.4.5) and then rescale it by
$$u_2[\sigma^{n+1/2}](x)\,\frac{C_2}{u_2[\sigma^{n+1/2}](x_2)} \Rightarrow u_2[\sigma^{n+1/2}](x);$$
compute K²[σ^{n+1/2}](x, x′) from (14.4.3) using the rescaled electric potential.

(iv) Solve the second equation in (14.4.7) to generate σ^{n+1} by regularization, and rescale it by
$$\sigma^{n+1}(x)\,\frac{u_2[\sigma^{n+1/2}](x_2)}{C_2} \Rightarrow \sigma^{n+1}(x).$$

(v) Set n := n + 1 and go to (i), or stop the iteration if ‖σ^{n+1} − σⁿ‖ < ε, where ε is a given tolerance.
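The control flow of the algorithm can be summarized in a short driver; here solve_forward and solve_linear_integral are hypothetical placeholders for a solver of (14.4.5) and for a regularized solver of one linear equation in (14.4.7), and x1, x2 are the grid indices of the boundary points x1, x2.

```python
import numpy as np

def alternating_iterations(Bz1, Bz2, sigma0, C1, C2, x1, x2,
                           solve_forward, solve_linear_integral,
                           tol=1e-4, max_iter=50):
    """Skeleton of the Alternative Iterations Algorithm for (14.4.7)."""
    sigma = sigma0.copy()
    for _ in range(max_iter):
        sigma_old = sigma.copy()
        u1 = solve_forward(sigma, 1)
        s1 = u1[x1]                       # value before rescaling
        u1 = u1 * (C1 / s1)               # enforce (14.4.6)
        sigma_half = solve_linear_integral(u1, Bz1) * (s1 / C1)
        u2 = solve_forward(sigma_half, 2)
        s2 = u2[x2]
        u2 = u2 * (C2 / s2)
        sigma = solve_linear_integral(u2, Bz2) * (s2 / C2)
        if np.linalg.norm(sigma - sigma_old) < tol:
            break
    return sigma
```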
In this way, we update σⁿ to generate σ^{n+1}. This procedure applies B_z^1 and B_z^2 alternately. The advantage of this algorithm is that the iteration has the same structure for each measured magnetic field; therefore the scheme can easily be generalized to the case of multiple measurements.
The two linear iteration equations in (14.4.7) can be written in the unified form
$$\int_D\frac{2K^j[\sigma](x,x')}{|x-x'|^2}\,\sigma(x')\,dx' = B_z^j(x),\quad x\in D,\ j=1,2,\tag{14.4.8}$$
with known σ in D. Theoretically, σ can be solved from this equation by some regularization. Unfortunately, there are some numerical difficulties coming from the practical situation which make it difficult to reconstruct the conductivity near ∂D from (14.4.7). To understand this difficulty, we write the exact equation (14.4.4) equivalently as
$$\frac{\mu_0}{2\pi}\int_D\frac{(y-y',\,x'-x)\cdot J^j[\sigma](x')}{|x-x'|^2}\,dx' = B_z^j(x),\quad x\in D,\tag{14.4.9}$$
where J^j[σ](x) = −σ(x)(∂x u_j[σ](x), ∂y u_j[σ](x)) (j = 1, 2) is the transverse current on the slice D. As we know, the two internal currents J¹[σ](x) and J²[σ](x) are almost parallel near the boundary ∂D \ (E1± ∪ E2±). This means that the two magnetic fluxes B_z^j(x), j = 1, 2, contribute almost the same information about σ(x) near ∂D \ (E1± ∪ E2±). Therefore, if we try to use (14.4.9) to reconstruct σ(x) in the whole domain D, then σ(x) near the boundary will contain too much noise. Consequently, the error near the boundary will contaminate the reconstruction in the whole domain D, since the left-hand side of (14.4.9) depends on σ in all of D.
To overcome this difficulty, we assume that the conductivity σ is known near ∂D, which is a reasonable assumption in practical phantom experiments, where the unknown biologic tissue is placed inside a given medium. Mathematically, this assumption ensures that we have the domain decomposition D = D_loc ∪ (D \ D_loc) and only need to determine σ in some interior compact set D_loc, while σ is known in D \ D_loc, where it is denoted by σ*. Under this condition, we can decompose the unknown conductivity σ in D as
$$\sigma(x) = \begin{cases}\sigma^*, & x\in D\setminus D_{loc},\\ \sigma_{loc}, & x\in D_{loc}.\end{cases}\tag{14.4.10}$$
Then the equations in (14.4.7) for x ∈ D_loc and j = 1, 2 become
$$\int_{D_{loc}}\frac{2K^j[\sigma](x,x')}{|x-x'|^2}\,\sigma_{loc}(x')\,dx' = B_z^j(x) - \int_{D\setminus D_{loc}}\frac{2K^j[\sigma](x,x')}{|x-x'|^2}\,\sigma^*(x')\,dx'\tag{14.4.11}$$
with known right-hand side and unknown σloc defined in Dloc . Once we determine σloc from this equation, we can reconstruct σ in the interior part of D approximately from (14.4.7) with σ = σ n , σ n+1/2 respectively.
14.4.2 Regularization and discretization
As explained in the above subsection, we need to assume that the conductivity σ near the boundary ∂D is known. For simplicity, we consider the rectangular domain D = [−X0, X0] × [−Y0, Y0]. We will give a method to compute the integrals in (14.4.11) and derive the corresponding linear algebraic equations in a finite-dimensional space for the iterations. For a model problem in which the conductivity is distributed over the rectangular domain, we divide D by the grid points
$$-X_0 = x_0 < x_1 < \cdots < x_N = X_0,\qquad -Y_0 = y_0 < y_1 < \cdots < y_M = Y_0.$$
Denote by $\{(x_i, y_j)\}_{i,j=0}^{N,M}$ the nodes and by $\{e(j,i)\}_{j,i=1}^{M,N}$ the rectangular elements, and denote by $x_{j,i}$ the center of each element
$$e(j,i) := \{(x,y) : x\in[x_{i-1},x_i],\ y\in[y_{j-1},y_j]\}.$$
Then the collocation form of (14.4.8) is
$$\sum_{i,j=1}^{N,M}\int_{e(j,i)}\frac{2K^j[\sigma](x_{l,k},x')}{|x_{l,k}-x'|^2}\,\sigma(x')\,dx' = B_z^j(x_{l,k}).\tag{14.4.12}$$
By taking
$$\sigma(x') \equiv \sigma(x_{j,i}),\qquad \nabla u_j[\sigma](x') \equiv \nabla u_j[\sigma](x_{j,i})\tag{14.4.13}$$
on each element e(j,i), the integrals over each element in (14.4.12) can be computed from the definition of K^j[σ](x_{l,k}, x′) and (14.4.13). In particular, the integral over the element e(l,k), whose integrand is singular, vanishes under the approximation (14.4.13) due to symmetry, that is,
$$\int_{e(l,k)}\frac{K^j[\sigma](x_{l,k},x')\,\sigma(x')}{|x_{l,k}-x'|^2}\,dx' = 0.\tag{14.4.14}$$
Finally, we are led to
$$\sum_{(i,j)\neq(k,l)}\sigma(x_{j,i})\int_{e(j,i)}\frac{2K^j[\sigma](x_{l,k},x')}{|x_{l,k}-x'|^2}\,dx' = B_z^j(x_{l,k}),\quad j=1,2,\tag{14.4.15}$$
for k = 1, ..., N, l = 1, ..., M. With the values of u at the nodes, we can compute the derivatives in (14.4.13) at x_{j,i} by the central difference method, and hence the regular integrals in (14.4.15).
Decomposing this equation as in (14.4.11) yields
$$\sum_{\substack{(i,j)\neq(k,l)\\ x_{j,i}\in D_{loc}}}\sigma_{loc}(x_{j,i})\int_{e(j,i)}\frac{2K^j[\sigma](x_{l,k},x')}{|x_{l,k}-x'|^2}\,dx' = B_z^j(x_{l,k}) - \sum_{\substack{(i,j)\neq(k,l)\\ x_{j,i}\in D\setminus D_{loc}}}\sigma^*(x_{j,i})\int_{e(j,i)}\frac{2K^j[\sigma](x_{l,k},x')}{|x_{l,k}-x'|^2}\,dx',\quad j=1,2.\tag{14.4.16}$$
Then we can solve for σ_loc at all points x_{j,i} ∈ D_loc by taking x_{l,k} ∈ D_loc in this equation. This linear system has a large condition number. Notice that the size of D \ D_loc entering the right-hand side represents the amount of our a priori information about the exact conductivity σ, which has an obvious influence on the reconstruction accuracy; without any information about this part, it is impossible to reconstruct σ in D to satisfactory accuracy.
On the other hand, the equation (14.4.15) also gives a way to simulate Bz (not ΔBz) for our inversion scheme. For the given σ and input currents I_j, j = 1, 2, it follows that
$$B_z^j(x_{l,k}) = \sum_{(i,j)\neq(k,l)}\sigma(x_{j,i})\int_{e(j,i)}\frac{2K^j[\sigma](x_{l,k},x')}{|x_{l,k}-x'|^2}\,dx' = \frac{2\mu_0}{4\pi}\sum_{(i,j)\neq(k,l)}\sigma(x_{j,i})\big[\partial_y u_j[\sigma](x_{j,i})\,E_{j,i}^{l,k} - \partial_x u_j[\sigma](x_{j,i})\,F_{j,i}^{l,k}\big]\tag{14.4.17}$$
with
$$\begin{cases}
E_{j,i}^{l,k} = \displaystyle\int_{e(j,i)}\frac{x_{l,k}-x'}{|x_{l,k}-x'|^2}\,dx' \approx \Delta x\,\Delta y\,\frac{(k-i)\Delta x}{((k-i)\Delta x)^2 + ((l-j)\Delta y)^2},\\[12pt]
F_{j,i}^{l,k} = \displaystyle\int_{e(j,i)}\frac{y_{l,k}-y'}{|x_{l,k}-x'|^2}\,dx' \approx \Delta x\,\Delta y\,\frac{(l-j)\Delta y}{((k-i)\Delta x)^2 + ((l-j)\Delta y)^2}.
\end{cases}\tag{14.4.18}$$
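The midpoint-rule coefficients (14.4.18) and the sum (14.4.17) translate directly into code; the sketch below assumes uniform spacings dx, dy and cell-centered arrays for σ and the gradient of u_j, with the singular cell set to zero as in (14.4.14). It is written for clarity, not speed.

```python
import numpy as np

def bz_forward(sigma, ux, uy, dx, dy, mu0=1.0):
    """Simulate Bz at every cell center by (14.4.17)-(14.4.18).
    sigma, ux, uy: (M, N) arrays at cell centers x_{j,i}."""
    M, N = sigma.shape
    jj, ii = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
    Bz = np.zeros((M, N))
    for l in range(M):
        for k in range(N):
            ddx = (k - ii) * dx
            ddy = (l - jj) * dy
            r2 = ddx**2 + ddy**2
            r2[l, k] = 1.0                    # avoid 0/0 at the singular cell
            E = dx * dy * ddx / r2            # approximates E_{j,i}^{l,k}
            F = dx * dy * ddy / r2            # approximates F_{j,i}^{l,k}
            w = sigma * (uy * E - ux * F)
            w[l, k] = 0.0                     # singular cell contributes 0
            Bz[l, k] = (2.0 * mu0 / (4.0 * np.pi)) * w.sum()
    return Bz
```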
Our inversion scheme is to update σ_loc by solving (14.4.16) at each iteration step and then to generate the next σ in D by (14.4.8). By rearranging the order of the unknowns σ(x_{l,k}) with x_{l,k} ∈ D_loc, we rewrite (14.4.16) in the matrix form
$$K\vec\sigma = \vec R_{B_z},\tag{14.4.19}$$
where K is an almost-NM × NM matrix with vanishing main diagonal elements due to (14.4.16), and $\vec R_{B_z}$ is the right-hand side of (14.4.16). We suppress the dependence of this equation on j = 1, 2 for the two given input currents. Both $\vec\sigma$ and $\vec R_{B_z}$ are almost-NM vectors defined at the points x_{l,k} ∈ D_loc. In the rectangular case, if we define the domain by
$$D_{loc} = \bigcup_{k=1+LL}^{N-LL}\ \bigcup_{l=1+2\cdot LL}^{M-2\cdot LL} e(l,k),$$
then we have in total (N − 2·LL) × (M − 4·LL) unknowns in D_loc.
Since K has a large condition number, we introduce a regularizing scheme to solve (14.4.19). Assume that we are given the noisy data B_z^δ for the exact magnetic field Bz with noise level δ > 0; then the standard Tikhonov regularization takes the form
$$(\alpha I + K^T K)\vec\sigma = K^T\vec B_z^\delta\tag{14.4.20}$$
for a given regularizing parameter α > 0. Denote by $\vec\sigma^{\alpha,\delta}$ the solution of this well-posed equation. Then we know that $\lim_{\delta\to 0}\vec\sigma^{\alpha(\delta),\delta} = \vec\sigma$ for α = α(δ) chosen suitably. One possible way to choose α(δ) is the Morozov discrepancy principle. The solution to (14.4.20) is stable with respect to the input error.
The integral equation method gives an iterative scheme to reconstruct the conductivity σ from the measurement data B_z^j, j = 1, 2, alternately. The numerical simulations have shown the efficiency of the integral equation method, such as its convergence and its validity for noisy Bz data, but some theoretical issues, such as the convergence property of the iteration algorithm and the error analysis, are still open. Our numerical simulations will be shown in the next section.
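In code, one regularized solve of (14.4.20) is a single linear-algebra call; the sketch below uses a dense matrix for clarity, and the random ill-conditioned test system is purely illustrative.

```python
import numpy as np

def tikhonov_solve(K, b, alpha):
    """Solve the normal equations (alpha*I + K^T K) sigma = K^T b, cf. (14.4.20)."""
    n = K.shape[1]
    return np.linalg.solve(alpha * np.eye(n) + K.T @ K, K.T @ b)

# illustrative usage on an ill-conditioned system with noisy data
rng = np.random.default_rng(0)
K = rng.standard_normal((200, 100)) @ np.diag(1.0 / np.arange(1, 101)**2)
sigma_true = rng.standard_normal(100)
b_delta = K @ sigma_true + 1e-8 * rng.standard_normal(200)
sigma_rec = tikhonov_solve(K, b_delta, alpha=1e-7)
```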
14.5 Numerical Experiments
In this section, we present some numerical experiments to show the validity of the conductivity image reconstruction algorithms of the previous sections. Consider the two-dimensional MREIT problem in the rectangle Ω := [−1, 1] × [−2, 2] ⊂ R². We use two pairs of surface electrodes E1± = {(±1, y) : |y| < 0.1} and E2± = {(x, ±2) : |x| < 0.1} attached on ∂Ω, with two injection currents I1 = I2 = 1. For given σ, the internal voltage u_j[σ] corresponding to the injection current I_j satisfies
$$\begin{cases}
\nabla\cdot(\sigma\nabla u_j[\sigma]) = 0 & \text{in } \Omega,\\[2pt]
I_j = \int_{E_j^+}\sigma\dfrac{\partial u_j[\sigma]}{\partial n}\,ds = -\int_{E_j^-}\sigma\dfrac{\partial u_j[\sigma]}{\partial n}\,ds, &\\[2pt]
\nabla u_j[\sigma]\times n = 0 \ \text{on } E_j^+\cup E_j^-, \qquad \sigma\dfrac{\partial u_j[\sigma]}{\partial n} = 0 \ \text{on } \partial\Omega\setminus(E_j^+\cup E_j^-).
\end{cases}\tag{14.5.1}$$
The z-components B_z^j, j = 1, 2, of the internal magnetic fluxes satisfy
$$\int_\Omega\frac{2K^j[\sigma](x,x')}{|x-x'|^2}\,\sigma(x')\,dx' = B_z^j(x),\quad x\in\Omega,\tag{14.5.2}$$
where
$$K^j[\sigma](x,x') = \frac{\mu_0}{4\pi}\big\{(x-x')\partial_y u_j[\sigma](x') - (y-y')\partial_x u_j[\sigma](x')\big\}.\tag{14.5.3}$$
The corresponding Laplacians ΔB_z^j, j = 1, 2, satisfy
$$\Delta B_z^j = \mu_0\,\nabla\sigma\cdot(\nabla u_j[\sigma]\times e_z) = \mu_0\,\nabla\ln\sigma\cdot(\sigma\nabla u_j[\sigma]\times e_z),\tag{14.5.4}$$
with μ0 = 1 for simplicity. We compute the exact values of B_z^j and ΔB_z^j by using (14.5.2) and (14.5.3) in our numerical simulations, and the corresponding noisy data are generated by adding random noise to the exact ones as follows:
$$\begin{cases}
\tilde B_z^j(x,y) = B_z^j(x,y) + \delta\,\|B_z^j\|_{L^2(\Omega)}\,\mathrm{rand}(x,y),\\[2pt]
\Delta\tilde B_z^j(x,y) = \Delta B_z^j(x,y) + \delta\,\|\Delta B_z^j\|_{L^2(\Omega)}\,\mathrm{rand}(x,y),
\end{cases}\tag{14.5.5}$$
where rand is a random number in (−1, 1).

Example 14.5.1. We test the numerical validity and the convergence property of the harmonic Bz algorithm, where the exact Laplacians ΔB_z^j for j = 1, 2 are computed by using (14.5.4) directly. Consider an MRI image of the human brain, shown in Figure 14.1, embedded inside the rectangular domain Ω. Using triangulation, we converted the original image (left) into a model conductivity image (right), where the grey level of the picture is taken as the conductivity. In principle, there is no reason for a correlation between the image and the conductivity distribution; we merely consider a model problem to test our algorithm. In this model, we mapped the grey level σ̃ ∈ [0, 255] (the value 255 indicates white) to the exact conductivity values by
$$\sigma^*(x,y) = \frac{\tilde\sigma(x,y)}{255} + 1,$$
which ensures that σ*(x, y) ∈ [1, 2] in Ω with a positive lower bound. Our aim is to reconstruct the target model conductivity (right image in Figure 14.1) and analyze the error estimate. We take the initial guess in the harmonic Bz algorithm as
$$\sigma^0(x,y) = 0.8 - 0.2\cos\big(0.2\pi\sqrt{x^2+y^2}\big).$$
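The noise model (14.5.5) amounts to scaling uniform noise by the discrete L²(Ω) norm of the exact data; a minimal sketch (grid spacings and the random generator are illustrative) is:

```python
import numpy as np

def add_noise(B, delta, hx, hy, rng=None):
    """Perturb data as in (14.5.5): B + delta*||B||_{L2(Omega)}*rand,
    with rand uniform in (-1, 1) pointwise."""
    rng = rng or np.random.default_rng()
    l2_norm = np.sqrt(np.sum(B**2) * hx * hy)   # discrete L2 norm
    return B + delta * l2_norm * rng.uniform(-1.0, 1.0, size=B.shape)
```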
Figure 14.1. An image of the human head (left) and the target model conductivity image (right).
The reconstruction results for n = 1, 2, 3, 4 are shown in Figure 14.2, which demonstrate the good numerical performance of the harmonic Bz algorithm.
Figure 14.2. Reconstructed conductivity images for n = 1 (top, left), 2 (top, right), 3 (bottom, left), and 4 (bottom, right).
Now we test the theoretical results of Section 14.3.2 numerically. The relative contrast of the exact conductivity σ* is ‖∇ln σ*‖_{C(Ω)} ≈ 41.64 in this problem. From Theorem 14.3.1, we know that
$$\left\|\ln\frac{\sigma^n}{\sigma^*}\right\|_{L^\infty(\Omega)} \le M\,\|(\sigma^n A[\sigma^n])^{-1}\|_{L^{p_2}(\Omega)}\,\sum_{j=1}^2\frac{1}{\mu_0}\|B_{z,*}^j - B_{z,n}^j\|_{W^{2,p_3}(\Omega)}.\tag{14.5.6}$$
For numerical simplicity, we replace the Sobolev norm in (14.5.6) by the continuous norm. We define the constant
$$M_n := \frac{\|\ln\sigma^n - \ln\sigma^*\|_{C(\Omega)}}{\|(\sigma^n A[\sigma^n])^{-1}\|_{C(\Omega)}\,\sum_{j=1}^2\frac{1}{\mu_0}\|B_{z,*}^j - B_{z,n}^j\|_{C(\Omega)}}\tag{14.5.7}$$
to represent the approximation behavior; by our error estimate, Mn should approach some constant as n increases.
We also use the continuous norm to represent the a posteriori estimate of Theorem 14.3.3, i.e., we use the equivalent form
$$\left\|\ln\frac{\sigma^n}{\sigma^*}\right\|_{C(\Omega)} \le K\epsilon_0\left(\frac{\|(\sigma^n A[\sigma^n])^{-1}\|_{C(\Omega)}\,\sum_{j=1}^2\frac{1}{\mu_0}\|B_{z,*}^j - B_{z,n}^j\|_{C(\Omega)}}{\|(\sigma^{n-1} A[\sigma^{n-1}])^{-1}\|_{C(\Omega)}\,\sum_{j=1}^2\frac{1}{\mu_0}\|B_{z,*}^j - B_{z,n-1}^j\|_{C(\Omega)}}\right)^n.\tag{14.5.8}$$
Then we define Kn as the quotient of the left-hand side of (14.5.8) divided by the right-hand side without the factor Kε0. On the other hand, we use the relative error
$$e_n := \frac{\|\sigma^n - \sigma^*\|_{C(\Omega)}}{\|\sigma^*\|_{C(\Omega)}}$$
to analyze the approximation. The relative error en for different iteration times n is plotted in Figure 14.3. The quantitative behavior of Mn and Kn for different iteration times n is given in Figure 14.4. All the data of en, Mn, and
Figure 14.3. Relative error en for different iteration times n.
Figure 14.4. Behavior of Mn and Kn for different iteration times n.
Kn for different iteration times n are given in Table 14.1, from which we can conclude that the a posteriori estimate is reasonable and sharp.

Table 14.1. Quantitative description of en, Mn, and Kn for different iteration times n in Example 14.5.1.
 n   en             Mn             Kn
 0   6.763 × 10⁻¹   2.334 × 10⁻⁴   –
 1   5.799 × 10⁻¹   2.332 × 10⁻⁴   1.127
 2   1.943 × 10⁻¹   1.023 × 10⁻⁴   1.862 × 10⁻¹
 3   1.827 × 10⁻¹   9.994 × 10⁻⁵   2.604 × 10⁻¹
 4   1.836 × 10⁻¹   1.001 × 10⁻⁴   2.301 × 10⁻¹
 5   1.835 × 10⁻¹   1.001 × 10⁻⁴   2.332 × 10⁻¹
 6   1.835 × 10⁻¹   1.001 × 10⁻⁴   2.329 × 10⁻¹
 7   1.835 × 10⁻¹   1.001 × 10⁻⁴   2.330 × 10⁻¹
 8   1.835 × 10⁻¹   1.001 × 10⁻⁴   2.330 × 10⁻¹
Example 14.5.2. In this example, we test the numerical validity of the harmonic Bz algorithm for given measurement data B_z^j, j = 1, 2, where the stable computation of the Laplacian ΔB_z^j is needed. Consider the MRI image given in Figure 14.5, embedded inside the rectangular domain Ω. In this model, the conductivity values are simulated in the same way as in Example 14.5.1. The exact values of B_z^j, j = 1, 2, are computed by using (14.5.3) and are shown in Figure 14.6, and the corresponding noisy data B̃_z^j, j = 1, 2, are generated by using (14.5.5). We apply the stable computation method given in Section 14.3.3 to compute ΔB̃_z^j from the noisy data B̃_z^j, where the regularization parameter is chosen by the generalized Morozov discrepancy principle with C = 1 and γ = 0.5. Taking
Figure 14.5. An image used in Example 14.5.2.
Figure 14.6. The exact magnetic flux Bz1 and Bz2 .
the same initial guess as in Example 14.5.1, the relative error
$$E(n) := \frac{\|\sigma^n - \sigma^*\|_{L^2(\Omega)}}{\|\sigma^*\|_{L^2(\Omega)}}$$
of the inversion iterates {σⁿ} is shown in Table 14.2 for different noise levels δ and iteration times n. It can be concluded that the harmonic Bz algorithm is satisfactory under reasonable noise levels, due to the efficient computation of ΔBz from its noisy measurement data. For δ = 3%, the inversions for different iteration times n are given in Figure 14.7, showing the edges of the true conductivity clearly.

Table 14.2. Relative error E(n) for different noise levels δ and iteration times n in Example 14.5.2.
          n=0      n=1      n=2      n=3      n=4      n=5      n=6      n=7
 δ = 0%   0.4603   0.1469   0.0688   0.0547   0.0520   0.0514   0.0513   0.0513
 δ = 1%   0.4603   0.1566   0.0991   0.0924   0.0916   0.0915   0.0914   0.0914
 δ = 3%   0.4603   0.1801   0.1383   0.1334   0.1328   0.1327   0.1327   0.1327
 δ = 5%   0.4603   0.1943   0.1621   0.1574   0.1566   0.1564   0.1564   0.1564
Example 14.5.3. In this example, we test the numerical validity of the integral equation method for given measurement data B_z^j, j = 1, 2. Consider the same MRI image and exact conductivity σ* as in Example 14.5.1; the exact values of B_z^j, j = 1, 2, are shown in Figure 14.8. Notice that this picture of the magnetic flux is almost the same as Figure 14.6, although they correspond to different conductivities. This phenomenon reveals the fact that the information about the conductivity contained in the magnetic flux is quite weak. For the rectangular domain D := Ω, we take D_loc = [−0.8, 0.8] × [−1.6, 1.6],
Figure 14.7. The inversion results σ n with noise level δ = 3%.
Figure 14.8. The exact magnetic flux Bz1 and Bz2 .
which means that some a priori information near the medium boundary is used to obtain a stable reconstruction in our experiments. The regularizing parameter is α = 10⁻⁷, and the initial guess is σ⁰ = 2. For the exact Bz data, the reconstructions are given in Figure 14.9. To test the stability of our scheme, the noisy data B̃_z^j, j = 1, 2, are generated by (14.5.5). We consider the reconstruction algorithm based on the integral equation method for the noisy data B̃_z^j, j = 1, 2. For δ = 0.0002, the reconstruction results are shown in Figure 14.10 for different iteration times. The iteration errors for different noise levels and iteration times are shown in Table 14.3. From these numerical
Figure 14.9. Target conductivity and iteration results for n = 1, 2, 3, 4, 5 with initial guess σ 0 = 2.
Figure 14.10. Target conductivity and iteration results for n = 1, 2, 3, 4, 5 with initial guess σ 0 = 2 and δ = 0.0002.
Table 14.3. The iteration error for different noise levels δ and iteration times n in Example 14.5.3.
               n=0      n=1      n=2      n=3      n=4      n=5      n=6      n=7
 δ = 0         3.0621   0.8530   0.3467   0.1276   0.0691   0.0595   0.0583   0.0581
 δ = 0.00005   3.0621   0.8540   0.3485   0.1307   0.0737   0.0645   0.0633   0.0631
 δ = 0.0001    3.0621   0.8560   0.3529   0.1417   0.0910   0.0832   0.0821   0.0819
 δ = 0.0002    3.0621   0.8601   0.3636   0.1677   0.1283   0.1231   0.1224   0.1223
performance with noisy input data applied to a practical biological tissue image, we can see that the alternating iteration method based on the integral equation method, which uses the noisy Bz data directly, is stable.
Acknowledgements
This work is supported by the National Natural Science Foundation of China under grant number 10911140265.
References
[1] S. Ahn, U. J. Choi and A. G. Ramm, A scheme for stable numerical differentiation, J. Comput. Appl. Math., 186, 325-334, 2006.
[2] P. Bauman, A. Marini and V. Nesi, Univalent solutions of an elliptic system of partial differential equations arising in homogenization, Indiana Univ. Math. J., 128, 53-64, 2000.
[3] O. Birgul and Y. Z. Ider, Use of the magnetic field generated by the internal distribution of injected currents for electrical impedance tomography, Proc. 9th Int. Conf. Elec. Bio-impedance, Heidelberg, Germany, 418-419, 1995.
[4] M. Cheney, D. Isaacson and J. C. Newell, Electrical impedance tomography, SIAM Review, 41, 85-101, 1999.
[5] N. S. Hoang and A. G. Ramm, A discrepancy principle for equations with monotone continuous operators, Nonlinear Analysis, 70, 4307-4315, 2009.
[6] D. Holder, Electrical Impedance Tomography: Methods, History and Applications, IOP Publishing, Bristol, UK, 2005.
[7] K. Jeon, H. J. Kim, C. O. Lee, J. K. Seo and E. J. Woo, Integration of the denoising, inpainting and local harmonic Bz algorithm for MREIT imaging of intact animals, Phys. Med. Biol., 55, 7541-7556, 2010.
[8] H. S. Khang, B. I. Lee, S. H. Oh, E. J. Woo, S. Y. Lee, M. H. Cho, O. I. Kwon, J. R. Yoon and J. K. Seo, J-substitution algorithm in magnetic resonance electrical impedance tomography (MREIT): Phantom experiments for static resistivity images, IEEE Trans. Med. Imag., 21, 695-702, 2002.
[9] Y. J. Kim, O. Kwon, J. K. Seo and E. J. Woo, Uniqueness and convergence of conductivity image reconstruction in magnetic resonance electrical impedance tomography, Inverse Problems, 19, 1213-1225, 2003.
[10] H. J. Kim, Y. T. Kim, A. S. Minhas, W. C. Jeong, E. J. Woo, J. K. Seo and O. J. Kwon, In vivo high-resolution conductivity imaging of the human leg using MREIT: The first human experiment, IEEE Trans. Med. Imag., 28, 1681-1687, 2009.
[11] O. Kwon, E. J. Woo, J. R. Yoon and J. K. Seo, Magnetic resonance electrical impedance tomography (MREIT): Simulation study of J-substitution algorithm, IEEE Trans. Biomed. Eng., 49, 160-167, 2002.
[12] O. Kwon, J. Y. Lee and J. R. Yoon, Equipotential line method for magnetic resonance electrical impedance tomography (MREIT), Inverse Problems, 18, 1089-1100, 2002.
[13] O. Kwon, C. J. Park, E. J. Park, J. K. Seo and E. J. Woo, Electrical conductivity imaging using a variational method in Bz-based MREIT, Inverse Problems, 21, 969-980, 2005.
[14] O. Kwon, H. C. Pyo, J. K. Seo and E. J. Woo, Mathematical framework for Bz-based MREIT model in electrical impedance imaging, Comput. Math. Appl., 51, 817-828, 2006.
[15] M. M. Lavrentiev, Some Improperly Posed Problems of Mathematical Physics, Springer-Verlag, New York, 1967.
[16] B. I. Lee, S. H. Oh, E. J. Woo, S. Y. Lee, M. H. Cho, O. Kwon, J. K. Seo and W. S. Baek, Static resistivity image of a cubic saline phantom in magnetic resonance electrical impedance tomography (MREIT), Physiol. Meas., 24, 579-589, 2003.
[17] B. I. Lee, S. H. Lee, T. S. Kim, O. Kwon, E. J. Woo and J. K. Seo, Harmonic decomposition in PDE-based denoising technique for magnetic resonance electrical impedance tomography, IEEE Trans. Biomed. Eng., 52, 1912-1920, 2005.
[18] S. Lee, J. K. Seo, C. Park, B. I. Lee, E. J. Woo, S. Y. Lee, O. Kwon and J. Hahn, Conductivity image reconstruction from defective data in MREIT: Numerical simulation and animal experiment, IEEE Trans. Med. Imag., 25, 168-176, 2006.
[19] J. J. Liu, H. C. Pyo, J. K. Seo and E. J. Woo, Convergence properties and stability issues in MREIT algorithm, Contemp. Math., 408, 201-218, 2006.
[20] J. J. Liu, J. K. Seo, M. Sini and E. J. Woo, On the convergence of the harmonic Bz algorithm in magnetic resonance electrical impedance tomography, SIAM J. Appl. Math., 67, 1259-1282, 2007.
[21] J. J. Liu, J. K. Seo and E. J. Woo, A posteriori error estimate and convergence analysis for conductivity image reconstruction in MREIT, SIAM J. Appl. Math., 70, 2883-2903, 2011.
[22] J. J. Liu and H. L. Xu, Reconstruction of biologic tissue conductivity from noisy magnetic field by integral equation method, submitted to Appl. Math. Comput.
[23] S. H. Oh, B. I. Lee, E. J. Woo, S. Y. Lee, M. H. Cho, O. Kwon and J. K. Seo, Conductivity and current density image reconstruction using harmonic Bz algorithm in magnetic resonance electrical impedance tomography, Phys. Med. Biol., 48, 3101-3116, 2003.
[24] S. H. Oh, B. I. Lee, E. J. Woo, S. Y. Lee, T. S. Kim, O. Kwon and J. K. Seo, Electrical conductivity images of biological tissue phantoms in MREIT, Physiol. Meas., 26, S279-S288, 2005.
[25] S. Onart, Y. Z. Ider and W. Lionheart, Uniqueness and reconstruction in magnetic resonance-electrical impedance tomography (MR-EIT), Physiol. Meas., 24, 591-604, 2003.
[26] C. Park, O. Kwon, E. J. Woo and J. K. Seo, Electrical conductivity imaging using gradient Bz decomposition algorithm in magnetic resonance electrical impedance tomography (MREIT), IEEE Trans. Med. Imag., 23, 388-394, 2004.
[27] C. Park, E. J. Park, E. J. Woo, O. Kwon and J. K. Seo, Static conductivity imaging using variational gradient Bz algorithm in magnetic resonance electrical impedance tomography, Physiol. Meas., 25, 257-269, 2004.
[28] A. G. Ramm, Inverse Problems, Springer-Verlag, New York, 2005.
[29] G. C. Scott, M. L. G. Joy, R. L. Armstrong and R. M. Henkelman, Measurement of nonuniform current density by magnetic resonance, IEEE Trans. Med. Imag., 10, 362-374, 1991.
[30] G. C. Scott, M. L. G. Joy, R. L. Armstrong and R. M. Henkelman, Sensitivity of magnetic-resonance current density imaging, J. Mag. Res., 97, 235-254, 1992.
[31] J. K. Seo, J. R. Yoon, E. J. Woo and O. Kwon, Reconstruction of conductivity and current density images using only one component of magnetic field measurements, IEEE Trans. Biomed. Engrg., 50, 1121-1124, 2003.
[32] J. K. Seo, O. Kwon, B. I. Lee and E. J. Woo, Reconstruction of current density distributions in axially symmetric cylindrical sections using one component of magnetic flux density: Computer simulation study, Physiol. Meas., 24, 565-577, 2003.
[33] J. K. Seo, H. C. Pyo, C. Park, O. Kwon and E. J. Woo, Image reconstruction of anisotropic conductivity tensor distribution in MREIT: Computer simulation study, Phys. Med. Biol., 49, 4371-4382, 2004.
[34] J. K. Seo and E. J. Woo, Magnetic resonance electrical impedance tomography (MREIT), SIAM Review, 53, 40-68, 2011.
[35] U. Tautenhahn, On the method of Lavrentiev regularization for nonlinear ill-posed problems, Inverse Problems, 18, 191-207, 2002.
[36] E. J. Woo, S. Y. Lee and C. W. Mun, Impedance tomography using internal current density distribution measured by nuclear magnetic resonance, Proc. SPIE, 2299, 377-385, 1994.
[37] E. J. Woo and J. K. Seo, Magnetic resonance electrical impedance tomography (MREIT) for high-resolution conductivity imaging, Physiol. Meas., 29, R1-R26, 2008.
[38] H. L. Xu and J. J. Liu, Stable numerical differentiation for the second order derivatives, Adv. Comput. Math., 33, 431-447, 2011.
[39] N. Zhang, Electrical Impedance Tomography Based on Current Density Imaging, M. S. Thesis, Department of Electrical Engineering, University of Toronto, Toronto, Canada, 1992.
References
Authors Information

J. J. Liu, Department of Mathematics, Southeast University, Nanjing 210096, P. R. China. E-mail: [email protected]

H. L. Xu, School of Mathematics and Information Science, Henan Polytechnic University, Jiaozuo 454000, P. R. China. E-mail: [email protected]
Part VI
Numerical Inversion in Geosciences
Chapter 15
Numerical Methods for Solving Inverse Hyperbolic Problems

S. I. Kabanikhin and M. A. Shishlenin
Abstract. We consider the Gel'fand-Levitan-Krein method, the linearization method and the optimization method for coefficient inverse problems. The inverse problem data are given on a time-like surface of the domain. A comparative analysis of the methods is presented and discussed, together with the results of numerical experiments.
15.1 Introduction
Methods for solving inverse and ill-posed problems can be divided into two groups: direct and iterative methods [31]. First we describe two direct methods: the linearization method and the Gel'fand-Levitan-Krein method. Direct methods allow one to determine the unknown coefficients at a fixed point of the medium when additional information is given by the trace of the adjoint problem's solution on a (usually time-like) surface of the domain. Direct methods for multidimensional inverse problems seem to be a very promising direction of investigation, because in iterative algorithms (the method of steepest descent, Landweber iteration, the Newton-Kantorovich method and so on) we have to solve the corresponding direct (forward) and adjoint (or linear inverse) problems at every step of the iterative process, while in the multidimensional case even a single direct problem is hard enough to solve.

In the second part we describe iterative methods. The usual way of formulating a coefficient inverse problem is the operator form [32]

$$A(q) = f,$$

where q is the vector-function of desired coefficients and f is the inverse problem data. The Newton-Kantorovich method

$$q_{n+1} = q_n - \big[A'(q_n)\big]^{-1}\big(A(q_n) - f\big)$$

is very sensitive to the initial guess q_0 and involves the inversion of the compact operator A'(q_n). The Landweber iteration

$$q_{n+1} = q_n - \alpha\,\big[A'(q_n)\big]^{*}\big(A(q_n) - f\big)$$

involves the solution of the direct and adjoint problems. A comparative analysis of direct and iterative methods is presented and discussed, together with the results of numerical experiments. The inverse problems for the acoustic equation are considered. We propose a method of reconstruction of the density that approximates the 2D inverse acoustic problem by a finite system of one-dimensional inverse acoustic problems. The 2D analogy of the Gel'fand-Levitan-Krein method is established. The inverse acoustic problem is formulated, and a short outline of the history and development of this field is given in Section 15.2. In Section 15.2.1 we consider the 2D analogy of the Gel'fand-Levitan-Krein equation. The N-approximation of the Gel'fand-Levitan-Krein equation is obtained for the inverse acoustic problem. We apply the notion of quasi-solution to nonlinear inverse coefficient problems. Instead of a compact set M we use the ball B(0, r), in which the radius r sometimes turns out to be a regularization parameter. Moreover, this constant allows one to estimate the convergence rate of many well-known algorithms for solving coefficient inverse problems and to decrease the number of iterations crucially.
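As a schematic illustration of the Landweber scheme for a finite-dimensional discretization of A(q) = f, a minimal sketch is given below; the operator `A`, its Jacobian `A_jac`, the step size and all numerical values are illustrative placeholders, not objects from the chapter.

```python
import numpy as np

def landweber(A, A_jac, f, q0, alpha, n_iter):
    """Landweber iteration q_{n+1} = q_n - alpha * [A'(q_n)]^T (A(q_n) - f)
    for a discretized operator equation A(q) = f."""
    q = q0.copy()
    for _ in range(n_iter):
        r = A(q) - f                      # residual of the operator equation
        q = q - alpha * A_jac(q).T @ r    # adjoint (transpose) of the Jacobian
    return q

# Toy usage with a mildly nonlinear operator:
M = np.array([[2.0, 1.0], [1.0, 3.0]])
A = lambda q: M @ q + 0.1 * q**2
A_jac = lambda q: M + 0.2 * np.diag(q)
q_true = np.array([1.0, -0.5])
q_rec = landweber(A, A_jac, A(q_true), np.zeros(2), alpha=0.05, n_iter=2000)
```

For a small step size the iterates approach the exact coefficients; the trade-off between step size and iteration count is exactly the issue the chapter revisits in its numerical experiments.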
15.2 Gel'fand-Levitan-Krein Method
Let us consider the direct problem:

$$c^{-2}(x,y)\,u_{tt} = \Delta u - \nabla\ln\rho(x,y)\,\nabla u, \qquad x>0,\ y\in\mathbb R,\ t>0; \qquad (15.2.1)$$
$$u\big|_{t<0} \equiv 0, \qquad y\in\mathbb R; \qquad (15.2.2)$$
$$u_x(+0,y,t) = g(y,t), \qquad y\in\mathbb R,\ t>0. \qquad (15.2.3)$$

Here c(x, y) ≥ c_0 > 0 (c_0 = const) is the velocity of wave propagation, ρ(x, y) ≥ ρ_0 > 0 (ρ_0 = const) is the density of the medium, and u(x, y, t) is the excess pressure.

Inverse problem: find the coefficients of equation (15.2.1) using additional information about the solution of the forward problem (15.2.1)–(15.2.3):

$$u(+0,y,t) = f(y,t), \qquad y\in\mathbb R,\ t\in\mathbb R. \qquad (15.2.4)$$
The inverse problem (15.2.1)–(15.2.4) is very unstable, first, due to the nonlinearity and, second, because of the additional ill-posedness caused by the Cauchy data (15.2.3)–(15.2.4) being given on the time-like surface x = 0. To overcome this ill-posedness we introduce a regularization algorithm which consists of two steps. The first is the projection of the problem (15.2.1)–(15.2.4) onto the N-dimensional subspace spanned by the basis {e^{iky}}_{k=0,±1,…,±N}. The second step is the application of the time-domain approach of the Gel'fand-Levitan-Krein technique. We consider the method of regularization of the 2D inverse acoustic problem based on the projection method and the approach of I. M. Gel'fand, B. M. Levitan and M. G. Krein (the GLK method). In this paper we develop the method proposed in [21, 30] for the 2D inverse acoustic problem. The unknown density ρ(x, y) is supposed to be represented by the finite Fourier series expansion

$$\rho(x,y) = \sum_{|k|\le N} \rho_k(x)\,e^{iky}.$$
In this case the 2D inverse acoustic problem can be approximated by a finite system of 1D inverse acoustic problems. We use the GLK method for solving this system and discuss the results of numerical experiments. The GLK method is one of the most widely used in the theory of inverse problems and consists in reducing a nonlinear inverse problem to a family of linear integral Fredholm equations of the second kind. I. M. Gel'fand and B. M. Levitan [14] presented a method of reconstructing the Sturm-Liouville operator from a spectral function and gave sufficient conditions for a given monotonic function to be the spectral function of the operator. It is also important to mention the papers by M. G. Krein [35, 36], in which the physical statement of the problem is considered and theorems on the solvability of the inverse boundary value problem are formulated. In A. S. Blagoveshchensky [8], a new proof of M. G. Krein's results on the theory of inverse problems for the string equation was obtained. The new proof has the advantage of being local (nonstationary); namely, the local dependence of the unknown coefficient on the additional data was explicitly taken into account and used. The GLK method enjoys the possibility of formulating necessary and sufficient conditions for the existence in the large of the inverse problem solution, which is important in view of the nonlinearity of the inverse problem. Ideas of the GLK method were actively used in the theory of inverse dynamic seismic problems, starting with the papers by G. Kunetz [37], A. S. Alekseev [1] and B. S. Pariiskii [40]. A. S. Blagoveshchensky [9] and B. Gopinath and M. Sondhi [16] developed the dynamic (time-domain) version of the GLK method for the acoustic inverse problem. A. S. Alekseev and V. I. Dobrinskii [2] used the discrete analogy of the GLK method while investigating numerical algorithms for a one-dimensional dynamic inverse problem of seismics (see also P. Goupillaud [17]; G. Kunetz [38];
W. W. Symes [49]; F. Santosa [47]). A comparison of the Gel'fand-Levitan method with other methods of solving one-dimensional inverse problems can be found in R. Burridge [11] and B. Ursin and K.-A. Berteussen [50]. B. S. Pariiskii [41, 42] analyzed numerical algorithms for solving the GLK equation. S. I. Kabanikhin [20] proposed a numerical algorithm for the GLK equation, introducing a sufficient condition for the inverse problem solvability. In [21, 24, 25] the multi-dimensional analogy of the GLK equation was obtained. V. G. Romanov and S. I. Kabanikhin [45] applied the dynamical version of the GLK method to the one-dimensional inverse geoelectric problem for the quasistationary approximation of Maxwell's equations. We also mention G. J. Berryman and R. R. Greene [7], K. M. Case and M. Kack [12], K. Bube [10], G. M. L. Gladwell and N. B. Willms [15] and F. Natterer [39] for further results and references.

The method proposed in this paper is a direct method, which means that we do not use iterations. Numerical methods can be divided into two main groups: iterative methods and direct methods. The first group includes iterative algorithms in which one should solve the corresponding direct problem and the adjoint (or linear inverse) problem at every step of the iterative process (optimization, Newton-Kantorovich, etc.). Direct methods do not use multiple direct problem solutions and allow one to find the solution directly. The development of numerical methods for coefficient inverse problems which provide good approximations for the unknown coefficients regardless of the a priori information is vital for the field of inverse problems due to their applied nature. Currently there are only a few approaches which do not use a priori information. We mention the boundary control method [5, 6] and the globally convergent method proposed by M. V. Klibanov et al. [33, 34, 4].

In the case when c is a given constant, we may pass from (15.2.1) to a simpler equation. Let v(x, y, t) = u(x, y, t) exp(−½ ln ρ(x, y)). From the problem (15.2.1)–(15.2.4) one can obtain the following equations:

$$v_{tt} = \Delta v - q(x,y)\,v, \qquad x>0,\ y\in\mathbb R,\ t>0; \qquad (15.2.5)$$
$$v(x,y,+0) = 0, \qquad v_t(x,y,+0) = 0; \qquad (15.2.6)$$
$$v_x(+0,y,t) = g_1(y,t); \qquad (15.2.7)$$
$$v(+0,y,t) = f_1(y,t), \qquad t>0, \qquad (15.2.8)$$

where

$$q(x,y) = -\frac12\,\Delta\ln\rho(x,y) + \frac14\,\big|\nabla\ln\rho(x,y)\big|^2;$$
$$g_1(y,t) = -\frac12\,f(y,t)\,\frac{\partial}{\partial x}\ln\rho(0,y)\,\exp\Big(-\frac12\ln\rho(0,y)\Big),$$
$$f_1(y,t) = f(y,t)\,\exp\Big(-\frac12\ln\rho(0,y)\Big), \qquad t>0.$$
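The potential q can be evaluated numerically from a given density model. The following sketch (the grid, the spacing and the sample density are illustrative choices, not from the text) uses finite differences for the Laplacian and gradient of ln ρ.

```python
import numpy as np

def potential_q(rho, hx, hy):
    """q = -0.5 * Laplacian(ln rho) + 0.25 * |grad ln rho|^2 on a uniform grid."""
    L = np.log(rho)
    Lx, Ly = np.gradient(L, hx, hy)            # first derivatives of ln(rho)
    Lxx = np.gradient(Lx, hx, axis=0)          # second derivative in x
    Lyy = np.gradient(Ly, hy, axis=1)          # second derivative in y
    return -0.5 * (Lxx + Lyy) + 0.25 * (Lx**2 + Ly**2)

# Example: a smooth density bump on (0,1) x (-pi,pi)
x = np.linspace(0.0, 1.0, 101)
y = np.linspace(-np.pi, np.pi, 201)
X, Y = np.meshgrid(x, y, indexing="ij")
rho = 1.0 + 0.5 * np.exp(-50 * (X - 0.4)**2) * np.cos(Y)
q = potential_q(rho, x[1] - x[0], y[1] - y[0])
```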
Note that the inverse problem for (15.2.1) is uniquely connected with the inverse problem for equation (15.2.5).

Theorem 15.2.1 ([18]). The initial boundary value problem

$$\Delta b(x,y) = \frac12\,\big|\nabla b(x,y)\big|^2 - 2q(x,y), \qquad x>0,\ y\in[-\pi,\pi];$$
$$b(0,y) = \varphi(y); \qquad b_x(0,y) = \psi(y); \qquad b(x,\pm\pi) = b_0\ (\mathrm{const})$$

has a (locally) unique solution in R = {(x, y) : |y| ≤ π, x > 0} in the class of functions b(x, y) which are 2π-periodic and analytic with respect to y and twice continuously differentiable with respect to x.

As far as the existence of the inverse problem solution is concerned, V. G. Romanov [46] proved that the inverse problem (15.2.5)–(15.2.8) has a (locally) unique solution in the space of functions Q represented by the Fourier series

$$q(x,y) = \sum_{k\in\mathbb Z} q_k(x)\,e^{iky}, \qquad \mathbb Z = \{0,\pm1,\pm2,\dots\},$$

with the finite norm

$$\|q\|_s = \sum_{k\in\mathbb Z} \|q_k\|_C\,e^{s|k|}, \qquad s>0,$$

provided v_x(+0, y, t) = f(y) θ(t) + 2δ(t). Here θ(t) is the Heaviside step function and δ(t) is the Dirac delta-function. The technique of Banach spaces can be applied to the problem of determining c(x, y) in (15.2.1) as well.

Let us consider (15.2.5)–(15.2.7) in the half-space x > 0. This is a well-posed direct problem [44], and we may define the Neumann-to-Dirichlet map [48]

$$\Lambda_q : L_2((0,+\infty)\times[0,T]) \longrightarrow H^1((0,+\infty)\times[0,T]), \qquad g_1(y,t)\longmapsto f_1(y,t).$$

Let us consider the problem of determining q(x, y) when the map Λ_q is given.

Theorem 15.2.2 ([43]). Λ_{q_1} = Λ_{q_2} implies q_1 = q_2, provided q_1, q_2 ∈ L_∞((0, +∞) × R) are constant (possibly different) on {x : |x| > r} and T > (π + 1)r.
15.2.1 The two-dimensional analogy of the Gel'fand-Levitan-Krein equation
Let us consider the inverse problem of finding the density ρ(x, y). The main goal is to reduce the multidimensional nonlinear inverse problem to a system of linear integral equations (the multidimensional analogy of the Gel'fand-Levitan-Krein equation [21, 31]). We consider the sequence of forward problems (15.2.9)–(15.2.13):

$$A_\rho u^{(k)} \equiv \Big[\frac{\partial^2}{\partial t^2} - \Delta_{x,y} + \nabla_{x,y}\ln\rho(x,y)\,\nabla_{x,y}\Big]u^{(k)} = 0, \qquad x>0,\ y\in(-\pi,\pi),\ t>0,\ k\in\mathbb Z; \qquad (15.2.9)$$
$$u^{(k)}\big|_{t<0} \equiv 0, \qquad \frac{\partial u^{(k)}}{\partial x}(+0,y,t) = e^{iky}\,\delta(t), \qquad u^{(k)}(+0,y,t) = f^{(k)}(y,t).$$

Consider also the auxiliary problems

$$A_\rho w^{(m)} = 0, \qquad x>0,\ y\in(-\pi,\pi),\ t\in\mathbb R,\ m\in\mathbb Z;$$
$$w^{(m)}(0,y,t) = e^{imy}\,\delta(t), \qquad (15.2.14)$$
$$\frac{\partial w^{(m)}}{\partial x}(0,y,t) = 0. \qquad (15.2.15)$$

The solution to (15.2.14), (15.2.15) has the form

$$w^{(m)}(x,y,t) = S^{(m)}(x,y)\,\big[\delta(x+t)+\delta(x-t)\big] + \tilde w^{(m)}(x,y,t), \qquad (15.2.16)$$

where the function 𝑤̃^{(m)}(x, y, t) is piecewise smooth and

$$S^{(m)}(x,y) = \frac12\sqrt{\frac{\rho(x,y)}{\rho(0,y)}}\;e^{imy}.$$

It is obvious that

$$u^{(k)}(x,y,t) = \int_{\mathbb R}\sum_m f_m^{(k)}(t-s)\,w^{(m)}(x,y,s)\,ds \qquad (15.2.17)$$

for x > 0, y ∈ (−π, π), t ∈ R and k ∈ Z. Here f_m^{(k)} is the Fourier coefficient of the function f^{(k)}:

$$f^{(k)}(y,t) = \sum_m f_m^{(k)}(t)\,e^{imy}.$$
We continue the functions u^{(k)}(x, y, t) and f^{(k)}(y, t) in an odd way with respect to t. Let us apply to (15.2.17) the operation

$$\frac{\partial}{\partial t}\int_0^x \frac{(\,\cdot\,)}{\rho(\xi,y)}\,d\xi$$

and denote

$$V^{(m)}(x,y,t) = \int_0^x \frac{w^{(m)}(\xi,y,t)}{\rho(\xi,y)}\,d\xi. \qquad (15.2.18)$$

Then

$$\frac{\partial}{\partial t}\int_0^x \frac{u^{(k)}(\xi,y,t)}{\rho(\xi,y)}\,d\xi = \frac{\partial}{\partial t}\int_{\mathbb R}\sum_m V^{(m)}(x,y,s)\,f_m^{(k)}(t-s)\,ds$$
$$= 2\sum_m f_m^{(k)}(+0)\,V^{(m)}(x,y,t) + \sum_m\int_{-x}^{x} f_m^{(k)}(t-s)\,V^{(m)}(x,y,s)\,ds$$
$$= -2V^{(k)}(x,y,t) + \sum_m\int_{-x}^{x} f_m^{(k)}(t-s)\,V^{(m)}(x,y,s)\,ds.$$
Let us investigate, for x > |t|, the function

$$G^{(k)}(x,t) = \frac{\partial}{\partial t}\int_{-\pi}^{\pi}\int_0^x \frac{u^{(k)}(\xi,y,t)}{\rho(\xi,y)}\,d\xi\,dy.$$

The form of the solution to problem (15.2.9)–(15.2.13) leads to

$$G_x^{(k)}(x,t) = 0, \qquad G_t^{(k)}(x,t) = 0, \qquad |t|<x.$$

Therefore G^{(k)}(x, t) = const for |t| < x. Let us find these constants. We continue the functions u^{(k)}(x, y, t) and ρ(x, y) in an even way with respect to x. Then u^{(k)}(x, y, t) is the solution of the following problem:

$$A_\rho u^{(k)} = 0, \qquad x\in\mathbb R,\ y\in(-\pi,\pi),\ t>0,\ k\in\mathbb Z;$$
$$u^{(k)}\big|_{t=0} = 0; \qquad \frac{\partial u^{(k)}}{\partial t}\Big|_{t=0} = -2\,e^{iky}\,\delta(x).$$
Therefore

$$G^{(k)} := \lim_{t\to+0} G^{(k)}(x,t) = \lim_{t\to+0}\int_{-\pi}^{\pi}\int_0^x \frac{u_t^{(k)}(\xi,y,t)}{\rho(\xi,y)}\,d\xi\,dy = \frac12\lim_{t\to+0}\int_{-\pi}^{\pi}\int_{-x}^{x} \frac{u_t^{(k)}(\xi,y,t)}{\rho(\xi,y)}\,d\xi\,dy = -\int_{-\pi}^{\pi}\frac{e^{iky}}{\rho(0,y)}\,dy, \qquad k\in\mathbb Z.$$

Let us denote

$$\Phi^{(m)}(x,t) = \int_{-\pi}^{\pi} V^{(m)}(x,y,t)\,dy.$$
Then we obtain the system of integral equations for finding the functions Φ^{(m)}(x, t):

$$2\Phi^{(k)}(x,t) - \sum_m\int_{-x}^{x} f_m^{(k)}(t-s)\,\Phi^{(m)}(x,s)\,ds = G^{(k)}, \qquad (15.2.19)$$

where |t| < x, k ∈ Z. The system (15.2.19) is the multidimensional analog of the Gel'fand-Levitan-Krein equation. Using (15.2.16) and (15.2.18) we can calculate the value of the function V^{(m)}(x, y, t) for t = x:

$$V^{(m)}(x,y,x-0) = \frac{e^{imy}}{2\sqrt{\rho(x,y)\,\rho(0,y)}}.$$

Therefore

$$\Phi^{(m)}(x,x-0) = \int_{-\pi}^{\pi}\frac{e^{imy}}{2\sqrt{\rho(x,y)\,\rho(0,y)}}\,dy.$$

The solution of the inverse problem (15.2.9)–(15.2.13) can be found from

$$\rho(x,y) = \frac{\pi^2}{\rho(0,y)}\Big[\sum_m \Phi^{(m)}(x,x-0)\,e^{-imy}\Big]^{-2}. \qquad (15.2.20)$$

Therefore, to find the solution ρ(x_0, y) for a given x_0 > 0, it is necessary to solve equation (15.2.19) with the fixed parameter x_0 and to calculate ρ(x_0, y) by formula (15.2.20). It is easy to show that if y ∈ R^n with arbitrary n (the multidimensional inverse problem), then in the same way one can derive the system (15.2.19) and formula (15.2.20).
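A schematic discretization of this procedure is sketched below (it is not taken from the chapter): for a fixed depth x0 the kernel built from the Fourier data f_m^(k) is collocated on a uniform grid in t and s, the block linear system (15.2.19) is solved for Φ^(m)(x0, ·), and ρ(x0, y) is evaluated by (15.2.20). The callables `f_hat`, `G` and `rho0` and the trapezoidal quadrature are user-supplied assumptions.

```python
import numpy as np

def solve_glk(x0, N, J, f_hat, G, rho0):
    """Collocation solver for (15.2.19)-(15.2.20) at fixed depth x0.

    f_hat(k, m, tau): Fourier data f_m^(k) evaluated at time tau (assumed given)
    G[k+N]:           right-hand side constants G^(k)
    rho0(y):          density at x = 0 (vectorized over y)
    """
    t = np.linspace(-x0, x0, J)                        # collocation points, |t| <= x0
    w = np.full(J, t[1] - t[0]); w[0] *= 0.5; w[-1] *= 0.5   # trapezoid weights
    K = 2 * N + 1
    A = np.zeros((K * J, K * J))
    b = np.zeros(K * J)
    for ki, k in enumerate(range(-N, N + 1)):
        for i in range(J):
            row = ki * J + i
            A[row, row] += 2.0                         # the 2*Phi^(k) term
            b[row] = G[ki]
            for mi, m in enumerate(range(-N, N + 1)):  # the convolution term
                for j in range(J):
                    A[row, mi * J + j] -= w[j] * f_hat(k, m, t[i] - t[j])
    Phi = np.linalg.solve(A, b).reshape(K, J)
    phi_end = Phi[:, -1]                               # Phi^(m)(x0, x0 - 0)
    y = np.linspace(-np.pi, np.pi, 256)
    m = np.arange(-N, N + 1)
    series = (phi_end[:, None] * np.exp(-1j * np.outer(m, y))).sum(axis=0)
    # formula (15.2.20); series is real up to round-off, so |.|^(-2) = (.)^(-2)
    return y, np.pi**2 / rho0(y) * np.abs(series) ** (-2.0)
```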
15.2.2 N-approximation of the Gel'fand-Levitan-Krein equation
Let us assume that

$$u^{(k)}(x,y,t) = \sum_{n\in\mathbb Z} u_n^{(k)}(x,t)\,e^{iny}. \qquad (15.2.21)$$

Substituting the series expansion (15.2.21) into (15.2.9), we obtain the sequence of inverse problems for the Fourier coefficients:

$$\frac{\partial^2 u_n^{(k)}}{\partial t^2} = \frac{\partial^2 u_n^{(k)}}{\partial x^2} - n^2 u_n^{(k)} - \sum_m \frac{\partial a_m}{\partial x}(x)\,\frac{\partial u_{n-m}^{(k)}}{\partial x} + \sum_m m(n-m)\,a_m(x)\,u_{n-m}^{(k)}, \qquad x>0,\ t>0,\ k,n\in\mathbb Z, \qquad (15.2.22)$$

$$u_n^{(k)}\big|_{t<0} \equiv 0,$$

where a_m(x) are the Fourier coefficients of ln ρ(x, y) = Σ_m a_m(x) e^{imy}. Truncating to |n| ≤ N, |k| ≤ N, we arrive at the N-approximation (15.2.26)–(15.2.29) for the vector-function V_N^{(k)}(x, t) = (v_{-N}^{(k)}(x, t), …, v_N^{(k)}(x, t))^T, in which the matrices A(x) and B(x) are defined by the relations

$$\big(A(x)\,V_N^{(k)}(x,t)\big)_n = n^2 v_n^{(k)} + \sum_{|m|\le N,\ |m-n|\le N} m(n-m)\,a_m(x)\,v_{n-m}^{(k)}(x,t);$$

$$\big(B(x)\,V_N^{(k)}(x,t)\big)_n = -\sum_{|m|\le N,\ |m-n|\le N} \frac{\partial a_m}{\partial x}(x)\,\frac{\partial v_{n-m}^{(k)}}{\partial x}.$$
Using the technique developed in [23, 28] one can prove that V_N^{(k)}(x, t) converges to the exact solution as N → ∞ in special functional spaces. To solve the N-approximation (15.2.26)–(15.2.29) means to find the vector-function V_N^{(k)}(x, t) and the coefficients of the matrices A(x) and B(x) from the known vector-function F_N^{(k)}(t).

Let us consider the N-approximation of the multidimensional analog of the Gel'fand-Levitan-Krein equation (15.2.19):

$$2\Phi^{(k)}(x,t) - \sum_{|m|\le N}\int_{-x}^{x} f_m^{(k)}(t-s)\,\Phi^{(m)}(x,s)\,ds = G^{(k)}, \qquad (15.2.30)$$

where |t| < x, |k| ≤ N. Equation (15.2.30) can be written in matrix form as follows:

$$2E\,\Phi(x,t) - \int_{-x}^{x} F(t-s)\,\Phi(x,s)\,ds = G, \qquad |t|<x. \qquad (15.2.31)$$

Here E is the identity matrix, the unknown vector-function is Φ(x, t) = (Φ^{(−N)}(x, t), …, Φ^{(0)}(x, t), …, Φ^{(N)}(x, t))^T, the vector of the right-hand side is G = (G^{(−N)}, …, G^{(0)}, …, G^{(N)})^T, and the matrix is

$$F(t) = \begin{pmatrix} f_{-N}^{(-N)} & f_{-N+1}^{(-N)} & \cdots & f_{N}^{(-N)} \\ f_{-N}^{(-N+1)} & f_{-N+1}^{(-N+1)} & \cdots & f_{N}^{(-N+1)} \\ \vdots & \vdots & \ddots & \vdots \\ f_{-N}^{(N)} & f_{-N+1}^{(N)} & \cdots & f_{N}^{(N)} \end{pmatrix}(t).$$
15.2.3 Numerical results and remarks
Let h = T/N_x, x = lh, t = nh, [Φ_k]_l^n = Φ_k(lh, nh). The approximate data has the form

$$f^{\varepsilon}(t) = f(t) + \varepsilon\,\alpha(t)\,(f_{\max} - f_{\min}),$$

where ε is the noise level in the data, α(t) is a random number uniformly distributed on [−1, 1] for fixed t, and f_max and f_min are the maximum and minimum values of the exact data. We fixed T = 1, N_x = 50, N_y = 500. We consider two different cases in the numerical simulations for the 2D inverse acoustic problem: a smooth function ρ(x, y) (see Figures 15.2–15.3),

$$\rho(x,y) = \begin{cases} 1 + 4\cos\big(\tfrac{\pi y}{2}+1\big)\,e^{-160(x-0.3)^2}, & x\in(0.1,\,0.5),\ y\in(-2,\,2),\\ 1 + 1.5\cos(\pi y+1)\,e^{-200(x-0.8)^2}, & x\in(0.7,\,0.9),\ y\in(-1,\,1),\\ 1, & \text{else},\end{cases}$$

and a piecewise constant function ρ(x, y) (see Figures 15.4 and 15.5),

$$\rho(x,y) = \begin{cases} 2, & x\in(0.2,\,0.5),\ y\in(-1,\,1),\\ 1.5, & x\in(0.7,\,0.8),\ y\in(-0.5,\,0.5),\\ 1, & \text{else}.\end{cases}$$

For fixed time t = nh the discrete analog of the Gel'fand-Levitan-Krein equation (15.2.31) is the system of algebraic equations Aq = f, where A is an (l + 1) × (2M + 1) matrix and l = 1, …, n. The approximate solution of the inverse problem is

$$\rho(lh, y) = \frac{\pi^2}{\rho(0,y)}\Big[\sum_{m=-N}^{N} \Phi^{(m)}(lh,\,lh-0)\,e^{-imy}\Big]^{-2}.$$

The GLK method determines the solution of the inverse problem at a particular depth x_0 without any special calculation of the unknown coefficients on the interval (0, x_0). The GLK method reconstructs smooth functions better than piecewise constant ones (compare Figures 15.2–15.3 with Figures 15.4–15.5). It was proved [28] that N is a regularization parameter, and the numerical results confirm this (see Figure 15.1). For exact data we see that ‖ρ_ex − ρ‖_{L_2} → 0 as N → ∞; when the data are noisy, there exists N_0 such that ‖ρ_ex − ρ‖_{L_2} grows for all N > N_0.
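The noise model used in these experiments can be reproduced directly; the small sketch below perturbs an exact data trace exactly as described, with a uniformly distributed α(t) (the seed is an illustrative choice).

```python
import numpy as np

def add_noise(f, eps, seed=0):
    """f_eps(t) = f(t) + eps * alpha(t) * (f_max - f_min), alpha ~ U[-1, 1]."""
    alpha = np.random.default_rng(seed).uniform(-1.0, 1.0, size=f.shape)
    return f + eps * alpha * (f.max() - f.min())
```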
Figure 15.1. ‖ρ − ρ_N‖_{L_2} (left) and ‖ρ − ρ_{N,ε}‖_{L_2}, ε = 0.05 (right).

Figure 15.2. Exact solution (left) and reconstruction result, N = 5 (right).

Figure 15.3. Reconstruction results for N = 20: ε = 0 (left) and ε = 0.05 (right).

Figure 15.4. Exact solution (left) and reconstruction result, N = 5 (right).

Figure 15.5. Reconstruction results for N = 20: ε = 0 (left) and ε = 0.05 (right).

15.3 Linearized Multidimensional Inverse Problem for the Wave Equation

We consider the problem of finding the wave propagation speed in the half-space (z, y) ∈ R_+ × R^n in the case where the velocity can be represented in the form c²(z, y) = c_0²(z) + c_1(z, y), where the function c_1 is substantially smaller than c_0² and the subdomain of the half-space where c_1 is nonzero is bounded. For a linearized version of the inverse problem we prove a uniqueness theorem, obtain a conditional stability estimate, construct a regularizing sequence that converges to the exact solution of the linearized inverse problem, and design a finite difference algorithm for solving the inverse problem.
15.3.1 Problem formulation
Assume that the wave propagation speed in the half-space (z, y) ∈ R_+ × R^n, y = (y_1, …, y_n), has the following structure (Kabanikhin, 1988c):

$$c^2(z,y) = c_0^2(z) + c_1(z,y). \qquad (15.3.1)$$

In addition, assume that the functions c_0 and c_1 satisfy conditions A_0 and A_1 defined as follows.

Condition A_0:
(1) c_0 ∈ C²(R_+), c_0'(+0) = 0;
(2) there exist constants M_1, M_2, M_3 ∈ R_+ such that the following inequalities hold for all z ∈ R_+:

$$0 < M_1 \le c_0(z) \le M_2, \qquad \|c_0\|_{C^2(\mathbb R_+)} < M_3.$$

Condition A_1:
(1) The function c_1(z, y) vanishes outside the domain (z, y) ∈ (0, h) × K_n(D_1), where K_n(D_1) = {y ∈ R^n : |y_j| < D_1, j = 1, …, n} and h, D_1 ∈ R_+ are given numbers.
(2) c_1(z, y) ∈ C²((0, h) × K_n(D_1)) and

$$\alpha = \|c_1\|_{C^2((0,h)\times K_n(D_1))}, \qquad \alpha \ll M_1.$$
Suppose that the medium is at rest at any instant before t = 0: u|_{t<0} ≡ 0.

There exist constants ν_* ∈ (0, 1), C_* > 0 and α_* > 0 such that for any initial guess Q^{(0)} and any α ∈ (0, α_*) the approximations Q^{(n)} of the Landweber iteration converge to the solution Q_T as n → ∞ and

$$\|Q_T - Q^{(n)}\|_{L_2}^2 \le \nu_*^n\, C_*^2.$$

15.4.4 Numerical results
We consider the following noisy data: f^ε = f + ε α(t)(f_max − f_min), where ε is the noise level, α(t) is a random number from [−1, 1] for fixed t, and f_max and f_min are the maximum and minimum values of the exact data. The following parameters are fixed: the domain size is 1, the mesh size is 50, N = 10, γ = 1, β = 0.5, α = 0.1. Figure 15.6 shows the exact solution and the initial guess. Figure 15.7 shows the approximate solution after 330 iterations with L_2-cutting and the approximate solution after 1000 iterations without L_2-cutting. Figure 15.8 shows the corresponding approximate solutions with L_2-cutting (330 iterations) and without it (1000 iterations) for noisy data with ε = 0.01.
Figure 15.6. 2D: a — exact solution, b — initial guess.
One can see how involving the constant r in the algorithms decreases the number of iterations. Moreover, the modified algorithms become independent of the initial guess.
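The L_2-cutting can be read as a projection of each Landweber iterate onto the ball B(0, r); a minimal sketch of such a projected step is given below (the radius `r` and the gradient routine `grad` are assumptions, not quantities fixed by the chapter).

```python
import numpy as np

def project_ball(q, r):
    """Project q onto the L2-ball B(0, r) (the 'L2-cutting')."""
    nrm = np.linalg.norm(q)
    return q if nrm <= r else (r / nrm) * q

def projected_landweber_step(q, grad, alpha, r):
    """One Landweber step followed by projection onto B(0, r);
    grad(q) stands for [A'(q)]^* (A(q) - f)."""
    return project_ball(q - alpha * grad(q), r)
```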
Figure 15.7. 2D, Landweber iteration, ε = 0: a — L2-cutting, 330 iterations; b — without L2-cutting, 1000 iterations.

Figure 15.8. 2D, Landweber iteration, ε = 0.01: a — L2-cutting, 330 iterations; b — without L2-cutting, 1000 iterations.
Acknowledgements

The work was supported by the Federal Target Program "Scientific and educational personnel of innovative Russia" for 2009–2013 (government contract No. 17.740.11.0350) and by RFBR grant 11-01-00105.
References

[1] A. S. Alekseev, Inverse dynamical problems of seismology, Some Methods and Algorithms for Interpretation of Geophysical Data, Nauka, Moscow, 9-84, 1967 (in Russian).
[2] A. S. Alekseev and V. I. Dobrinskii, Questions of practical application of dynamical inverse problems of seismology, Mathematical Problems of Geophysics, 6(2), Computer Center, Siberian Branch of USSR Academy Sci., Novosibirsk, 7-53, 1975 (in Russian).
[3] J. S. Azamatov and S. I. Kabanikhin, Nonlinear Volterra operator equations. L2-theory, J. Inv. Ill-Posed Problems, 7, 487-510, 1999.
[4] L. Beilina and M. V. Klibanov, Synthesis of global convergence and adaptivity for a hyperbolic coefficient inverse problem in 3D, Journal of Inverse and Ill-Posed Problems, 18, 85-132, 2010.
[5] M. I. Belishev, Boundary control in reconstruction of manifolds and metrics (the BC method), Inverse Problems, 13, R1-R45, 1997.
[6] M. I. Belishev and V. Yu. Gotlib, Dynamical variant of the BC-method: theory and numerical testing, J. Inverse and Ill-Posed Problems, 7, 221-240, 1999.
[7] J. G. Berryman and R. R. Greene, Discrete inverse methods for elastic waves, Geophys., 45, 213-233, 1980.
[8] A. S. Blagoveshchensky, One-dimensional inverse problem for a hyperbolic equation of the second order, Mathematical Questions of the Wave Propagation Theory, 2, LOMI Akad. Nauk SSSR, 85-90, 1969 (in Russian).
[9] A. S. Blagoveshchensky, The local method of solution of the non-stationary inverse problem for an inhomogeneous string, Proceedings of the Mathematical Steklov Institute, 115, 28-38, 1971 (in Russian).
[10] K. Bube, Convergence of discrete inversion solutions, Inverse Problems of Acoustic and Elastic Waves, SIAM, 20-47, 1984.
[11] R. Burridge, The Gel'fand-Levitan, the Marchenko and the Gopinath-Sondhi integral equation of inverse scattering theory, regarded in the context of inverse impulse-response problems, Wave Motion, 2, 305-323, 1980.
[12] K. M. Case and M. Kack, A discrete version of the inverse scattering problem, J. Math. Phys., 4, 594-603, 1983.
[13] I. I. Eremin and V. V. Vasin, Operator and Iterative Processes of Fejer Type, Ural Branch of RAS, Ekaterinburg, 2005 (in Russian).
[14] I. M. Gel'fand and B. M. Levitan, On the determination of a differential equation from its spectral function, Izv. Akad. Nauk SSSR, Ser. Mat., 15, 309-360, 1951 (in Russian).
[15] G. M. L. Gladwell and N. B. Willms, A discrete Gel'fand-Levitan method for band-matrix inverse eigenvalue problems, Inverse Problems, 5, 165-179, 1989.
[16] B. Gopinath and M. Sondhi, Inversion of the telegraph equation and the synthesis of nonuniform lines, Proc. IEEE, 59(3), 383-392, 1971.
[17] P. Goupillaud, An approach to inverse filtering of near-surface layer effects from seismic records, Geophysics, 26, 754-760, 1961.
[18] S. He and S. I. Kabanikhin, An optimization approach to a three-dimensional acoustic inverse problem in the time domain, J. Math. Phys., 36(8), 4028-4043, 1995.
[19] V. K. Ivanov, V. V. Vasin and V. P. Tanana, Theory of Linear Ill-Posed Problems and its Applications, VSP, Utrecht, 2002.
[20] S. I. Kabanikhin, On the solvability of a dynamical problem of seismology, Conditionally Well-Posed Mathematical Problems and Problems of Geophysics, Computer Center, Siberian Branch of USSR Academy of Sciences, Novosibirsk, 43-51, 1979 (in Russian).
[21] S. I. Kabanikhin, Linear Regularization of Multidimensional Inverse Problems for Hyperbolic Equations, Preprint No. 27, Sobolev Institute of Math., Novosibirsk, 1988 (in Russian).
[22] S. I. Kabanikhin, Projection-Difference Methods for Determining the Coefficients of Hyperbolic Equations, SO Akad. Nauk SSSR, Novosibirsk, 1988 (in Russian).
[23] S. I. Kabanikhin, Projection-Difference Methods of Determining the Coefficients of Hyperbolic Equations, Nauka, Novosibirsk, 1988 (in Russian).
[24] S. I. Kabanikhin, On linear regularization of multidimensional inverse problems for hyperbolic equations, Sov. Math. Dokl., 40(3), 579-583, 1990.
[25] S. I. Kabanikhin, Methods of solving dynamical inverse problems for the hyperbolic equations, Ill-Posed Problems of Mathematical Physics and Analysis, Nauka, Novosibirsk, 109-123, 1992 (in Russian).
[26] S. I. Kabanikhin, R. Kowar and O. Scherzer, On the Landweber iteration for the solution of parameter identification problem in a hyperbolic partial differential equation of second order, J. Inv. Ill-Posed Problems, 6(5), 403-430, 1998.
[27] S. I. Kabanikhin, G. B. Bakanov and M. A. Shishlenin, Comparative Analysis of Methods of Finite-Difference Scheme Inversion, Newton-Kantorovich and Landweber Iteration in Inverse Problem for Hyperbolic Equation, Preprint No. 12, Novosibirsk State University, Novosibirsk, 2001 (in Russian).
[28] S. I. Kabanikhin, O. Scherzer and M. A. Shishlenin, Iteration methods for solving a two-dimensional inverse problem for a hyperbolic equation, J. Inv. Ill-Posed Problems, 11(1), 87-109, 2003.
[29] S. I. Kabanikhin, O. Scherzer and M. A. Shishlenin, Iteration method for solving a two dimensional inverse problem for a hyperbolic equation, J. Inv. Ill-Posed Problems, 11(1), 2003.
[30] S. I. Kabanikhin and M. A. Shishlenin, Boundary control and Gel'fand-Levitan-Krein methods in inverse acoustic problem, J. Inv. Ill-Posed Problems, 12(2), 125-144, 2004.
[31] S. I. Kabanikhin, A. D. Satybaev and M. A. Shishlenin, Direct Methods of Solving Inverse Hyperbolic Problems, VSP, The Netherlands, 2005.
[32] S. I. Kabanikhin, Inverse and Ill-Posed Problems, Novosibirsk, Russia, 2009 (in Russian).
[33] M. V. Klibanov and A. Timonov, Carleman Estimates for Coefficient Inverse Problems and Numerical Applications, VSP, Utrecht, The Netherlands, 2004.
[34] M. V. Klibanov and A. Timonov, Numerical studies on the globally convergent convexification algorithm in 2D, Inverse Problems, 23, 123-138, 2007.
[35] M. G. Krein, Solution of the inverse Sturm-Liouville problem, Dokl. Akad. Nauk SSSR, 76, 21-24, 1951 (in Russian).
[36] M. G. Krein, On a method of effective solution of an inverse boundary problem, Dokl. Akad. Nauk SSSR, 94, 987-990, 1954 (in Russian).
[37] G. Kunetz, Essai d'analyse de traces sismiques, Geophysical Prospecting, 9(3), 317-341, 1961.
[38] G. Kunetz, Quelques exemples d'analyse d'enregistrements sismiques, Geophysical Prospecting, 11(4), 409-422, 1963.
[39] F. Natterer, A Discrete Gel'fand-Levitan Theory, Technical report, Institut fuer Numerische und instrumentelle Mathematik, Universitaet Muenster, Muenster, Germany, 1994.
[40] B. S. Pariiskii, The inverse problem for a wave equation with a depth effect, Some Direct and Inverse Problems of Seismology, Nauka, Moscow, 139-169, 1968 (in Russian).
[41] B. S. Pariiskii, Economical Methods for the Numerical Solutions of Convolution Equations and of Systems of Polynomial Equations with Toeplitz Matrices, Computer Center, USSR Academy Sci., Moscow, 1977 (in Russian).
[42] B. S. Pariiskii, An economical method for the numerical solution of convolution equations, USSR Computational Math. and Math. Phys., 17(2), 208-211, 1978.
[43] Rakesh, An inverse problem for the wave equation in the half plane, Inverse Problems, 9, 433-441, 1993.
[44] Rakesh and W. W. Symes, Uniqueness for an inverse problem for the wave equation, Commun. Part. Different. Equat., 13(15), 87-96, 1988.
[45] V. G. Romanov and S. I. Kabanikhin, Inverse Problems of Geoelectrics (Numerical Methods of Solution), Preprint No. 32, Inst. Math., Siberian Branch of the USSR Acad. Sci., Novosibirsk, 1989 (in Russian).
[46] V. G. Romanov, Local solvability of some multidimensional inverse problems for equations of hyperbolic type, Differential Equations, 25(2), 275-284, 1989 (in Russian).
[47] F. Santosa, Numerical scheme for the inversion of acoustical impedance profile based on the Gel'fand-Levitan method, Geophys. J. Roy. Astr. Soc., 70, 229-244, 1982.
[48] J. Sylvester and G. Uhlmann, A global uniqueness theorem for an inverse boundary value problem, Ann. of Math., 125, 153-169, 1987.
[49] W. W. Symes, Inverse boundary value problems and a theorem of Gel'fand and Levitan, J. Math. Anal. Appl., 71, 378-402, 1979.
[50] B. Ursin and K.-A. Berteussen, Comparison of some inverse methods for wave propagation in layered media, Proc. IEEE, 74(3), 7-19, 1986.
Authors Information

S. I. Kabanikhin, Institute of Computational Mathematics and Mathematical Geophysics, Russian Academy of Sciences, Novosibirsk 630090, Russia. E-mail: [email protected]

M. A. Shishlenin, Sobolev Institute of Mathematics, Russian Academy of Sciences, Novosibirsk 630090, Russia. E-mail: [email protected]
Chapter 16
Inversion Studies in Seismic Oceanography

H. B. Song, X. H. Huang, L. M. Pinheiro, Y. Song, C. Z. Dong and Y. Bai
Abstract. Seismic oceanography is a new cross-discipline between seismology and physical oceanography. In the development of seismic oceanography, the inversion of quantitative physical properties, such as sea water sound speed, temperature and salinity, becomes the key problem. This chapter introduces preliminary results on thermohaline structure inversion by CTD/XBT-controlled wave impedance inversion.
16.1 Introduction of Seismic Oceanography
Marine reflection seismology has been used for decades as an efficient exploration method. A seismic signal, generated below the sea surface, propagates downward, and a fraction of the acoustic energy is reflected wherever it encounters a contrast in acoustic impedance. The upward-propagating reflected energy is detected at the surface and used to derive structural and physical information below the seafloor. The traditional application of marine reflection seismology treats reflections from below the seafloor as useful signal, while those from the seawater are ignored as noise. Recently, Holbrook et al. (2003) found that shallow reflections in seismograms represent temperature and salinity fine structures of seawater, so that marine reflection seismology can be used for oceanographic research; this originated the new discipline of seismic oceanography (Ruddick et al., 2009). Compared to the conventional methods used in physical oceanography for probing the ocean, seismic oceanography has the advantages of high lateral resolution and fast imaging (the typical horizontal sampling interval is about ten meters, and the time needed for sampling is far less than with physical oceanographic methods), and has therefore become more and more accepted by oceanographers (Song et al., 2008). In order to build a benchmark calibration between reflection seismic and oceanographic data sets and to make further progress in seismic oceanography, the EU launched the interdisciplinary GO (Geophysical Oceanography) project in 2006 for joint seismic and oceanographic investigation (http://www.dur.ac.uk/eu.go/general_public/project_info.html).
The first ESF exploratory workshop on seismic oceanography was held in Begur (Spain) on November 19-21, 2008 (http://www.cmima.csic.es/sow/content/1stesf-exploratory-workshop-seismic-oceanography). Until now, seismic oceanography has made much progress in prospecting oceanic features such as eddies, internal waves and ocean fronts (e.g. Holbrook et al., 2005; Nakamura et al., 2006; Biescas et al., 2008; Krahmann et al., 2008; Song et al., 2009; Dong et al., 2009; Hobbs et al., 2009; Klaeschen et al., 2009; Sheen et al., 2009; Pinheiro et al., 2010; Song et al., 2010; see Figures 16.1–16.3).
Figure 16.1. Complete stacked seismic section along the processed Line IAM-5 (2004 processed version of Pinheiro et al. (2010)).
Inversion for the oceanographic parameters of temperature and salinity from seismic data is a very important research field in seismic oceanography. In its early stage, seismic oceanography paid more attention to imaging boundaries in the ocean (Nandi et al., 2005; Nakamura et al., 2006) and to discriminating reflection shapes in seismograms than to inversion for physical oceanographic parameters. After analyzing XBT (Expendable Bathythermograph) and XCTD (Expendable Conductivity-Temperature-Depth) derived oceanographic data combined with seismic data, Nandi et al. (2005) found that reflections in seismograms correlate very well with thermohaline fine structures in the ocean; quantitative research demonstrated that reflectors can indicate temperature changes as small as 0.03 degree, and they also anticipated that high-resolution spatial distributions of temperature can be derived from seismograms. Tsuji et al. (2005) showed that seismograms can represent fine structures of the Kuroshio Current.
Figure 16.2. Seismic sections’ location in the northeastern South China Sea.
Figure 16.3. Seismic sections along the seismic profile in the northeastern SCS shown in Figure 16.2 and digitized undulating seismic reflectors (blue curves). Top left, top right, bottom left and bottom right show sections of the continental slope, abyssal basin, Hengchun Ridge and Luzon volcanic arc, respectively. Vertical scale in two-way travel time (ms); CMP interval is 6.25 m.
By comparing the amplitudes of the seismic signal, they found that the maximum temperature change was about 1 degree. The methods mentioned above emphasized qualitative description rather than quantitative research, but they also showed that temperature can be inverted from seismograms. Paramo et al. (2005) analyzed the temperature gradient of the Norwegian Sea using the AVO method; Wood et al. (2008) applied a full waveform inversion method to synthetic seismograms and real seismic data with agreeable results. Both of these studies are restricted to 1D. Recently, Papenberg et al. (2010) presented a high-resolution result of two-dimensional temperature and salinity inversion applied to GO (Geophysical Oceanography) data, which can represent the thermohaline fine structures of seawater. Here, we present a CTD/XBT-constrained thermohaline structure inversion method. A synthetic seismogram experiment shows that a 2D high-resolution temperature and salinity section can be derived from seismic data using a few CTD stations as control wells. Then we analyze low frequency seismic data from one multichannel seismic (MCS) line acquired in the scope of the European GO Project (line GOLR-12), combined with the simultaneously acquired XBT and CTD (Conductivity-Temperature-Depth) data. We use the post-stack constrained impedance inversion method to derive the temperature and salinity distributions of the seawater; the result demonstrates that this method can indeed provide reliable temperature and salinity distribution profiles every 6.25 m along the seismic line, with resolutions of 0.16 °C and 0.04 psu, respectively.
16.2 Thermohaline Structure Inversion

16.2.1 Inversion method for temperature and salinity
The inversion for the temperature and salinity distributions of seawater from combined seismic and oceanographic data is divided into two steps as follows: the first step is inversion for the velocity or impedance (the product of sound speed and density) distribution from combined seismic and oceanographic data using a post-stack constrained impedance inversion method; the second step is converting the inverted velocity distribution to temperature and salinity distributions. There are some differences between wave impedance inversion in the seawater layer and impedance inversion in oil and gas exploration: the former needs forward calculations based on CTD/XBT data, while the latter does not. Based on the seawater state equations (Fofonoff and Millard, 1983), we can calculate the variations of the sound speed, density and impedance from the temperature and salinity changes with depth measured by CTD/XBT casts. Post-stack constrained inversion is relatively mature and is widely used in
industrial applications. A lot of software has this function; we used the STRATA module of Geoview. The inversion method is model-based and aims to build the model best fitted to the real seismic data. The initial impedance model is built from the seismic data and constraining wells; here, CTD or XBT stations are treated as the constraining wells. Each time a synthetic seismogram is derived from the impedance model, the model is adjusted after comparison with the real seismic data, and this procedure is repeated until the best-fitted model is reached as the inversion result. The inversion process comprises importing the constraining wells and seismic data; extracting a wavelet using a statistical method; picking horizons or importing picked horizons; performing well correlation and extracting the wavelet again using well data (well correlation is necessary since the CTD/XBT measurement and seismic trace coordinates were not exactly the same); and building the initial model and performing the inversion. Converting velocity to temperature and salinity is also an important task. We did this in an iterative process using the empirically derived formula for the relationship between velocity and temperature, salinity and depth (Wilson, 1960):

$$v_p = 1492.9 + 3(T-10) - 6\cdot10^{-3}(T-10)^2 - 4\cdot10^{-2}(T-18)^2 + 1.2(S-35) - 10^{-2}(T-18)(S-35) + Z/61,$$

where v_p is the velocity (m/s), T is the temperature (°C), S is the salinity (psu), and Z is the depth (m). The iteration proceeds as follows: given an initial salinity of 36 psu, derive a temperature using the formula above; then adjust the salinity according to the regional T-S relationship based on the CTD casts, and repeat until convergence is reached. This approach works because the variation scales of temperature and salinity and the T-S relationship lead to a unique pair of temperature and salinity for a given acoustic velocity. More detailed information can be found in Song et al. (2010), Dong (2010) and Huang et al. (2011).
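The iteration described above can be sketched as follows. The regional T-S relationship is represented here by a hypothetical linear fit `ts_fit` standing in for the polynomial fit to the CTD casts (its coefficients are placeholders, not values from the chapter), and T is recovered from Wilson's formula by bisection, which is valid because v_p is increasing in T over the oceanic range.

```python
def wilson_vp(T, S, Z):
    """Wilson's (1960) simplified sound-speed formula (m/s)."""
    return (1492.9 + 3*(T - 10) - 6e-3*(T - 10)**2 - 4e-2*(T - 18)**2
            + 1.2*(S - 35) - 1e-2*(T - 18)*(S - 35) + Z/61.0)

def temperature_from_vp(vp, S, Z, lo=0.0, hi=30.0):
    """Invert Wilson's formula for T by bisection."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if wilson_vp(mid, S, Z) < vp:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def invert_ts(vp, Z, ts_fit=lambda T: 36.0 - 0.05 * (T - 12.0), n_iter=20):
    """Iterate: T from (vp, S, Z), then S from the regional T-S fit,
    starting from S = 36 psu, until the pair (T, S) converges."""
    S = 36.0
    for _ in range(n_iter):
        T = temperature_from_vp(vp, S, Z)
        S_new = ts_fit(T)
        if abs(S_new - S) < 1e-6:
            break
        S = S_new
    return T, S

# Example: convert a velocity sample at 800 m depth into a (T, S) pair
# consistent with the assumed T-S fit.
vp0 = wilson_vp(11.0, 35.95, 800.0)
print(invert_ts(vp0, 800.0))
```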
16.2.2 Inversion experiment of synthetic seismic data
As the seismic data for the inversion we use a synthetic seismic record (121 seismic traces) (Ruddick et al., 2009) that is based on thirteen CTD stations probing Meddy Sharon (Figure 16.4). Ruddick et al. (2009) calculated the vertical variation of wave impedance from the thirteen CTD stations, extracted a seismic wavelet from the IAM5 section, and then obtained the 121 synthetic seismic traces by convolution and linear interpolation. In the inversion process we use only four CTD stations (CTD126, CTD129, CTD133 and CTD136) as "constraining wells". The correspondence between the CTD stations and the seismic trace numbers is given in Table 16.1.
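The construction of such convolution-based synthetics can be illustrated as follows: reflection coefficients are computed from the vertical impedance contrasts at each station and convolved with a wavelet (here a generic Ricker wavelet is a stand-in for the wavelet extracted from the IAM5 section); the lateral traces are then linearly interpolated between stations. All names and parameters below are illustrative.

```python
import numpy as np

def ricker(f0, dt, n):
    """Ricker wavelet of peak frequency f0 (stand-in for the extracted wavelet)."""
    t = (np.arange(n) - n // 2) * dt
    a = (np.pi * f0 * t) ** 2
    return (1 - 2 * a) * np.exp(-a)

def synthetic_traces(impedance, wavelet, n_traces):
    """impedance: (n_stations, n_samples) vertical impedance logs.
    Returns n_traces synthetic records, linearly interpolated laterally."""
    # reflection coefficient r = (Z2 - Z1) / (Z2 + Z1) between adjacent samples
    refl = np.diff(impedance, axis=1) / (impedance[:, 1:] + impedance[:, :-1])
    synth = np.array([np.convolve(r, wavelet, mode="same") for r in refl])
    x_st = np.linspace(0.0, 1.0, impedance.shape[0])
    x_tr = np.linspace(0.0, 1.0, n_traces)
    out = np.empty((n_traces, synth.shape[1]))
    for j in range(synth.shape[1]):          # interpolate sample by sample
        out[:, j] = np.interp(x_tr, x_st, synth[:, j])
    return out
```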
Figure 16.4. Synthetic seismic record.

Table 16.1. CTD correlation to the seismic trace number.

CTD        124  125  126  127  128  129  131  132  133  134  135  136  137
Trace No.    1   11   21   31   41   51   61   71   81   91  101  111  121

The two-dimensional temperature and salinity structure sections at depths from 400 to 2000 m (Figures 16.5 and 16.6) can be obtained by the above inversion method. A comparison of the observed and inverted seawater temperature (in the range 4–14 °C) shows the same trend: the temperature is high in the meddy core and relatively low in the surrounding areas. This reflects the difference between the Mediterranean Outflow Water in the meddy core and the North Atlantic mid-water in the surrounding areas. The inverted salinity of the seawater follows the same trend as the observed salinity: it is higher in the meddy core than elsewhere in the salinity structure profile, and the salinity ranges from 35 to 37 psu. There is a clear separation line corresponding to the 61st seismic trace in Figure 16.6 (bottom). The reason for this is that a single third-order polynomial fit cannot describe the relationship between temperature and salinity for the whole region in the inversion processing, so we need to treat two parts separately (seismic traces 1 to 60 and seismic traces 61 to 121). To check the inversion results further, we extract five CTD profiles (CTD124, CTD127, CTD128, CTD131, CTD136). The comparison between the observed temperature and salinity and the inverted ones shows that the inversion results are good, as shown in Figures 16.7 and 16.8. The starting and ending points of the data differ between CTD stations: the minimum depth varies from 400 to 724 m and the maximum depth from 1366 to 1836 m, so the effective depth range is different for each location.
Figure 16.5. Top: observed and interpolated temperature of sea water; bottom: inverted temperature of sea water.
Figure 16.6. Top: observed and interpolated salinity of sea water; bottom: inverted salinity of sea water.
Specifically, the fit is poorer in the upper and lower parts and better in the middle. There are some distinct differences between the observed and inverted data at CTD131 at depths between 1000 and 1400 m; CTD131 corresponds to seismic trace number 61, which is the boundary between the temperature-salinity relationship subdivisions. The comparisons between the inverted results and the CTD data show that the inverted temperature and salinity are relatively accurate and the proposed inversion method is effective.
Figure 16.7. Comparisons of observed (black line) and inverted (black spots) temperature of sea water: from left to right the figures represent CTD124, CTD127, CTD128, CTD131 and CTD136, respectively.

Figure 16.8. Comparisons of observed (black line) and inverted (black spots) salinity of sea water: from left to right the figures represent CTD124, CTD127, CTD128, CTD131 and CTD136, respectively.
16.2.3 Inversion experiment of GO data (Huang et al., 2011)
In 2007, the GO (Geophysical Oceanography) project of the European Union carried out a two-month observation campaign in the Gulf of Cadiz; the acquired data comprise more than 40 seismic lines (high, middle and low frequency), more than 500 simultaneously measured XBT casts and 43 CTD casts (Hobbs et al., 2009; http://www.dur.ac.uk/eu.go/general_public/project_info.html). A wide range of oceanographic features were found in the study area. Warm and saline seawater spills out of the Mediterranean through the Strait of Gibraltar and flows along the southern margin of the Iberian Peninsula.
The Portimao Canyon, south of Portugal, disrupts this flow and is a locus for the creation of meddies (Mediterranean eddies), which detach from the main vein and drift away from the continental slope at the depth of neutral buoyancy. During the observation period, meddies were repeatedly captured by the seismic and oceanographic data. We analyzed the low frequency seismic data of line GOLR-12. The acquisition parameters are: the seismic source is a 1500L BOLT air-gun system with a main frequency of 5–60 Hz, released at a depth of 11 m; the shot interval is 37.5 m; the reflected signal is recorded by a 2400 m long SERCEL streamer towed at 8 m depth with 192 traces (12.5 m spacing); the near offset is 84 m. 24 XBT casts and 2 CTD profiles were acquired simultaneously. The locations of the seismic line, XBT and CTD measurements are shown in Figure 16.9.
Figure 16.9. Seafloor topography in the study area and locations of the seismic line GOLR-12, XBT, CTD. The red line represents the seismic line, the green solid points represent XBT measurements, and the yellow stars represent CTD locations.
The seismogram (Huang et al., 2011) shows that the reflectors are distributed mostly between depths of 600 m and 1500 m and form a clear elliptical shape. The inverted profiles (Figures 16.10 and 16.11) show that the elliptical area has high acoustic velocity, temperature and salinity, in accordance with warm and saline seawater from the Mediterranean; hence the elliptical structure can be interpreted as a meddy.
Figure 16.10. Inverted temperature distribution. Dotted lines represent locations of CTD.
Figure 16.11. Inverted salinity distribution.
Strong reflections exist at the boundary of the water volume where, on the velocity and salinity profiles, many fine structures can be detected, indicating that the interactions between Mediterranean and Atlantic seawater are very strong at this place. Large temperature and salinity gradients in the same place also indicate strong material and energy exchange. On the other hand, weak reflections in the interior of the water column indicate a milder but still existing non-uniformity, representing fine structures of temperature and salinity. The XBT-measured temperature and the XBT-derived salinity (obtained via the T-S relationship) were compared with the inverted values; see Figures 16.12 and 16.13. As can be seen from the figures, better agreement is achieved in the low frequency component than in the high frequency component. Quantitative calculations indicate that the mean square errors of temperature and salinity are 0.16 °C and 0.04 psu, respectively. Taking their variation ranges into consideration, the inversion result for temperature is better than that for salinity. This is acceptable because
Figure 16.12. Inverted and XBT temperature distribution. Blue lines represent XBT data; red lines represent inverted data.
Figure 16.13. Inverted and XBT-derived salinity distribution. Blue lines represent XBT-derived data from temperature and salinity relationship; red lines represent inverted data.
previous studies have found that the relative contribution of salinity contrasts to reflectivity is approximately 20% (Sallares et al., 2009), far less than that of temperature; that is, reflectivity is insensitive to salinity changes. In addition, the salinity used for the inversion is derived from the T-S relationship in an average manner, which may also affect the inversion results. From the comparisons above we can see that, although the inversion results represent the temperature and salinity distributions and the thermohaline fine structures
very well, errors are inevitable. We consider the following possible reasons. First, the seismic data processing procedure may introduce errors: removal of the direct wave and filtering affect the amplitude and shape of the signal, and inaccurate velocities used in the processing also degrade the results. Second, in regions of strong interaction the T-S relationship is more complex than the averaged one we used, so an accurate result is very difficult to achieve. Third, the main frequency of the seismic data we used is so low that fine structures on scales of less than 15 m cannot be resolved. The temperature and salinity inversion resolutions of Papenberg et al. (2010) are 0.1 °C and 0.1 psu, respectively, while ours are 0.16 °C and 0.04 psu. Note that their method takes the inversion result from the seismic data as the high frequency component and the XBT-derived values as the low frequency component; hence their inversion errors are essentially those of the high frequency component inverted from the seismic data, which explains their higher resolution. Moreover, their analysis provides no resolution estimate at locations without XBT constraint and is in that sense incomplete. We inverted combined seismic and oceanographic data using a post-stack constrained impedance inversion method; generally speaking, the inversion results are consistent with the XBT data. With the constraints of the seismic reflectors, high inversion resolution can also be achieved between the XBT stations. This method can indeed provide reliable temperature and salinity distribution profiles every 6.25 m along the seismic line, which can be used for the analysis of small scale oceanographic features.
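The quoted misfit figures correspond to a simple root-mean-square comparison between co-located inverted and XBT profiles, of the kind sketched below.

```python
import numpy as np

def rms_error(inverted, observed):
    """Root-mean-square misfit between co-located profiles."""
    d = np.asarray(inverted) - np.asarray(observed)
    return np.sqrt(np.mean(d ** 2))
```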
16.3 Discussion and Conclusion
Physical oceanographic research is based on large amounts of observational data. Mesoscale and small scale thermohaline fine structures are very difficult to observe with the conventional observation methods because of their low lateral resolution and long observation times. Seismic oceanography makes up for these deficiencies and makes fast imaging of an entire study area with high lateral resolution possible. In its early stage, seismic oceanography was mostly used for discriminating seawater boundaries in the ocean, and it was pointed out that the amplitude of the seismic data represents temperature and salinity gradients. In this chapter, two-dimensional temperature and salinity structure sections are obtained by CTD/XBT-controlled seawater wave impedance inversion. The comparison of the temperature and salinity inverted from a synthetic seismogram with the observed ones shows a good match. We then analyzed the low frequency seismic data of line GOLR-12 combined with the simultaneously acquired XBT and CTD data, and derived the temperature and salinity distributions of the seawater using the post-stack constrained impedance inversion method with resolutions of 0.16 °C and 0.04 psu, respectively. This further demonstrates that seismic data can be used to extract
two-dimensional temperature and salinity distributions and hence to provide high lateral resolution data for physical oceanographic research. Although the inversion result is affected by the quality of the seismic data, the data processing and the complexity of the thermohaline structures in the study area, it can be anticipated that, with further development, seismic oceanography will play an increasingly important role in physical oceanographic research. Besides the post-stack inversion studies, we also inverted the physical parameter contrasts of the northeastern South China Sea by the AVO (Amplitude Versus Offset) technique (Dong, 2010), and inverted seismic data of the northeastern South China Sea to obtain the 1D velocity structure of the seawater at three CMP (Common Mid-Point) locations by full waveform inversion (Dong, 2010). We propose a technique to explore the 2D thermohaline structure of seawater based on a hybrid inversion of pre- and post-stack seismic data (Dong, 2010), resolving the problem of the lack of simultaneous thermohaline constraints for the seismic data; this technique is easy to implement. Combining it with historical thermohaline relations, we inverted legacy seismic data of the northeastern South China Sea and obtained the 2D thermohaline structure (Dong, 2010), which lays a solid foundation for further studies of physical oceanographic processes such as seawater mixing and internal wave breaking. It is shown in this chapter and in Papenberg et al. (2010) that thermohaline structure sections with high lateral resolution can be obtained from seismic inversion based on seismic data and a few CTD/XBT measurements. The method could make up for some disadvantages of the conventional measurement methods of physical oceanography and provide abundant data for ocean science studies. It should be noted, however, that the inversion study of thermohaline structure is still preliminary and needs further investigation.
Acknowledgements

The research was jointly supported by the National Natural Science Foundation of China (No. 41076024), the European funded GO Project (Geophysical Oceanography-FP6-2003-NEST 15603) and the National Major Fundamental Research and Development Project of China (No. 2011CB403503). The seismic data used here were collected as part of the GO project supported by the EU project GO (15603) (NEST), the United Kingdom Natural Environment Research Council and the German research agency DFG (KR 3488/1-1). We are grateful to Dr. Richard Hobbs and Dr. Dirk Klaeschen for their kind permission to use the seismic and oceanographic data for the post-stack inversion.
References

[1] B. Biescas, V. Sallarès, J. L. Pelegrí, F. Machín, R. Carbonell, G. Buffett, J. J. Dañobeitia and A. Calahorrano, Imaging meddy finestructure using multichannel seismic reflection data, Geophysical Research Letters, 35, 2008, L11609, doi:10.1029/2008GL033971.
[2] C. Z. Dong, H. B. Song, T. Y. Hao, L. Chen and Y. Song, Studying of oceanic internal wave spectra in the northeast south China sea from seismic reflections, Chinese J. Geophys., 52(8), 2050-2055, 2009 (in Chinese).
[3] C. Z. Dong, Seismic oceanography research on inversion of sea-water thermohaline structure and internal waves in South China Sea, Ph.D. thesis, Institute of Geology and Geophysics, Chinese Academy of Sciences, 2010.
[4] N. P. Fofonoff and Jr. R. C. Millard, Algorithms for computation of fundamental properties of seawater, Tech. Pap. Mar. Sci., 44, UNESCO, Paris, 1983.
[5] R. W. Hobbs, D. Klaeschen, V. Sallarès, E. Vsemirnova and C. Papenberg, Effect of seismic source bandwidth on reflection sections to image water structure, Geophys. Res. Lett., 36, 2009, L00D08, doi:10.1029/2009GL040215.
[6] W. S. Holbrook and I. Fer, Ocean internal wave spectra inferred from seismic reflection transects, Geophys. Res. Lett., 32, 2005, L15604, doi:10.1029/2005GL023733.
[7] W. S. Holbrook, P. Paramo, S. Pearse and R. W. Schmitt, Thermohaline fine structure in an oceanographic front from seismic reflection profiling, Science, 301, 821-824, 2003.
[8] X. H. Huang, H. B. Song, L. M. Pinheiro and Y. Bai, Ocean temperature and salinity distributions inverted from combined reflection seismic and hydrographic data, Chinese J. Geophys., 54(5), 1293-1300, 2011 (in Chinese).
[9] D. Klaeschen, R. W. Hobbs, G. Krahmann, C. Papenberg and E. Vsemirnova, Estimating movement of reflectors in the water column using seismic oceanography, Geophys. Res. Lett., 36, 2009, L00D03, doi:10.1029/2009GL038973.
[10] G. Krahmann, P. Brandt, D. Klaeschen and T. J. Reston, Mid-depth internal wave energy off the Iberian Peninsula estimated from seismic reflection data, Journal of Geophysical Research, 113, 2008, C12016, doi:10.1029/2007JC004678.
[11] Y. Nakamura, T. Noguchi, T. Tsuji, S. Itoh, H. Niino and T. Matsuoka, Simultaneous seismic reflection and physical oceanographic observations of oceanic fine structure in the Kuroshio extension front, Geophys. Res. Lett., 33, 2006, L23605, doi:10.1029/2006GL027437.
[12] P. Nandi, W. S. Holbrook, S. Pearse, P. Paramo and R. W. Schmitt, Seismic reflection imaging of water mass boundaries in the Norwegian Sea, Geophys. Res. Lett., 31, 2004, L23311, doi:10.1029/2004GL021325.
[13] C. Papenberg, D. Klaeschen, G. Krahmann and R. W. Hobbs, Ocean temperature and salinity inverted from combined hydrographic and seismic data, Geophys. Res. Lett., 37, 2010, L04601, doi:10.1029/2009GL042115.
[14] P. Paramo and W. S. Holbrook, Temperature contrasts in the water column inferred from amplitude-versus-offset analysis of acoustic reflections, Geophys. Res. Lett., 32, 2005, L24611, doi:10.1029/2005GL024533.
[15] L. M. Pinheiro, H. B. Song, B. Ruddick, J. Dubert, I. Ambar, K. Mustafa and R. Bezerra, Detailed 2-D imaging of the Mediterranean outflow and meddies off W Iberia from multichannel seismic data, Journal of Marine Systems, 79, 89-100, 2010.
[16] B. Ruddick, H. B. Song, C. Z. Dong and L. Pinheiro, Water column seismic images as maps of temperature gradient, Oceanography, 22(1), 192-205, 2009.
[17] V. Sallares, B. Biescas, G. Buffett, R. Carbonell, J. J. Dañobeitia and J. L. Pelegrí, Relative contribution of temperature and salinity to ocean acoustic reflectivity, Geophys. Res. Lett., 36, 2009, L00D06, doi:10.1029/2009GL040187.
[18] K. L. Sheen, N. J. White and R. W. Hobbs, Estimating mixing rates from seismic images of oceanic structure, Geophys. Res. Lett., 36, 2009, L00D04, doi:10.1029/2009GL040106.
[19] H. B. Song, Y. Bai, C. Z. Dong and Y. Song, A preliminary study of application of Empirical Mode Decomposition method in understanding the features of internal waves in the northeastern South China Sea, Chinese J. Geophys., 53(2), 393-400, 2010 (in Chinese).
[20] H. B. Song, C. Z. Dong, L. Chen and Y. Song, Reflection seismic methods for studying physical oceanography: Introduction of seismic oceanography, Progress in Geophysics, 23(4), 1156-1164, 2008 (in Chinese).
[21] H. B. Song, P. Luis, D. X. Wang, C. Z. Dong, Y. Song and Y. Bai, Seismic images of ocean meso-scale eddies and internal waves, Chinese J. Geophys., 52(11), 2775-2780, 2009 (in Chinese).
[22] Y. Song, H. B. Song, L. Chen, C. Z. Dong and X. H. Huang, Study of sea water thermohaline structure inversion from seismic data, Chinese J. Geophys., 53(11), 2696-2702, 2010 (in Chinese).
[23] T. Tsuji, T. Noguchi, H. Niino, T. Matsuoka, Y. Nakamura, H. Tokuyama, S. Kuramoto and N. Bangs, Two-dimensional mapping of fine structures in the Kuroshio Current using seismic reflection data, Geophys. Res. Lett., 32, 2005, L14609, doi:10.1029/2005GL023095.
[24] W. D. Wilson, Equation for the speed of sound in seawater, J. Acoust. Soc. Am., 32, 1357, 1960.
[25] W. T. Wood, W. S. Holbrook, M. K. Sen and P. L. Stoffa, Full waveform inversion of reflection seismic data for ocean temperature profiles, Geophys. Res. Lett., 35, 2008, L04608, doi:10.1029/2007GL032359.
410
16
Inversion Studies in Seismic Oceanography
Authors Information
Haibin Song†, Xinghui Huang†‡, Luis M. Pinheiro§, Yang Song†‡, Chongzhi Dong†‡ and Yang Bai†‡
† Key Laboratory of Petroleum Resources Research, Institute of Geology and Geophysics, Chinese Academy of Sciences, Beijing 100029, P. R. China. E-mail: [email protected]
‡ Graduate University of Chinese Academy of Sciences, Beijing 100049, P. R. China.
§ Departamento de Geociências and CESAM, Universidade de Aveiro, 3800 Aveiro, Portugal.
Chapter 17
Image Resolution Beyond the Classical Limit

L. J. Gelius
Abstract. Conventionally, limits in image resolution are assumed to be explained by classical wavelength criteria such as that of Rayleigh. The concept of resolution is analyzed further here in order to understand the basic cause of diffraction-limited imaging. This analysis is based on the theory of backpropagation of wavefields. It is followed by a discussion of possible techniques that have the potential of giving image resolution beyond the classical limit (i.e., super-resolution). Both synthetic and experimental data are employed to support the theoretical developments.
17.1 Introduction
It is well known that experimental or numerical backpropagation of waves generated by a point source or point scatterer will refocus on a diffraction-limited spot with a size not smaller than half of the wavelength. More recently, however, super-resolution techniques have been introduced that apparently can overcome this fundamental physical limit. This chapter provides a simple framework for understanding and analyzing both diffraction-limited imaging and super-resolution. The resolution analysis presented in the first part of this chapter unifies the different ideas of backpropagation and resolution known from the literature and provides an improved platform for understanding the cause of diffraction-limited imaging. It is demonstrated that the monochromatic resolution function consists of both causal and non-causal parts, even for ideal acquisition geometries. This is caused by the inherent properties of backpropagation, which does not include the evanescent field contributions. As a consequence, only a diffraction-limited focus can be obtained unless an ideal acquisition surface and an infinite source-frequency band are available. In the literature various attempts have been made to obtain images resolved beyond the classical diffraction limit, i.e., super-resolution. The main direction of research within optics has been to exploit the evanescent field components. However, this approach is not practical in the case of, for example, seismic imaging, since the evanescent waves are so weakened by attenuation that they are masked by the noise. Alternatively, one may try to
estimate and deconvolve the space-variant non-ideal resolution function caused by the finite bandwidth and aperture. In the second section of this chapter a possible approach based on ray techniques is discussed. In the section to follow, further improvement of the image resolution in the case of point-like targets is discussed. It is demonstrated that resolution beyond the diffraction limit can apparently be obtained by employing concepts adapted from conventional statistical multiple signal classification (MUSIC) (Schmidt, 1986). The basis of this approach is the decomposition of the measurements into two orthogonal domains: signal and noise (nil) spaces. On comparison with Kirchhoff prestack migration this technique is shown to give superior results for monochromatic data. However, in the case of random noise the super-resolution power breaks down when employing monochromatic signals and a limited acquisition aperture. A modified algorithm is proposed that adds coherently over a frequency band and is able to recover the super-resolution capability also in the presence of noise. This coherent-phase MUSIC technique is demonstrated to work well when applied to experimental ultrasound data.
17.2 Aperture and Resolution Functions
This section is based on Gelius and Asgedom (2011). Consider a closed surface S defining a volume V of space as shown in Figure 17.1, and assume that receivers are distributed across the surface. A point source is located inside V at a position r_0 and illuminates the scatterers embedded in the (possibly) non-uniform background medium.

Figure 17.1. Schematics of problem.

The monochromatic scattered wave field associated with a compact perturbation is now given by the Lippmann-Schwinger equation (Newton, 1982):

$$p_s(\mathbf r,\omega) = p(\mathbf r,\omega) - G_0(\mathbf r,\mathbf r_0,\omega) = \int_V k_0^2(\mathbf r',\omega)\, G_0(\mathbf r,\mathbf r',\omega)\, \alpha(\mathbf r')\, p(\mathbf r',\omega)\, dV', \qquad (17.2.1)$$
where p represents the total wave field, G_0 is the background Green's function, and k_0 and α are respectively the background wavenumber and the scattering potential defined as:

$$k_0(\mathbf r,\omega) = \frac{\omega}{c_0(\mathbf r)}, \qquad \alpha(\mathbf r) = \frac{c_0^2(\mathbf r)}{c^2(\mathbf r)} - 1, \qquad (17.2.2)$$

with c and c_0 representing the space-variant velocities of respectively the scatterers and the background medium. Assume that the scattered waves are recorded by receivers placed on the surface S and backpropagated employing an integral formulation of migration as described by the Kirchhoff integral (Schneider, 1978; Wiggins, 1984; Langenberg, 1987; Esmersoy and Oristaglio, 1988; Schleicher et al., 2007). The backpropagated scattered wave field can then be written explicitly as:

$$p_{bp}(\mathbf r,\omega) = -\oint_S \left[ \frac{\partial G_0^*(\mathbf r,\mathbf r',\omega)}{\partial n}\, p_s(\mathbf r',\omega) - G_0^*(\mathbf r,\mathbf r',\omega)\, \frac{\partial p_s(\mathbf r',\omega)}{\partial n} \right] dS'. \qquad (17.2.3)$$

The surface integral defining the backpropagation of the scattered field can be replaced by an alternative volume integral formulation by combining Eqs. (17.2.1) and (17.2.3) (Esmersoy and Oristaglio, 1988):

$$p_{bp}(\mathbf r,\omega) = \int_V k_0^2(\mathbf r',\omega)\, B(\mathbf r,\mathbf r',\omega)\, \alpha(\mathbf r')\, p(\mathbf r',\omega)\, dV',$$
$$B(\mathbf r,\mathbf r',\omega) = -\oint_S \left[ \frac{\partial G_0^*(\mathbf r,\mathbf r'',\omega)}{\partial n}\, G_0(\mathbf r'',\mathbf r',\omega) - G_0^*(\mathbf r,\mathbf r'',\omega)\, \frac{\partial G_0(\mathbf r'',\mathbf r',\omega)}{\partial n} \right] dS''. \qquad (17.2.4)$$
Comparison between Eqs. (17.2.1) and (17.2.4) shows that the background Green's function G_0(r, r′, ω) is now replaced by the backpropagation kernel B(r, r′, ω). This kernel has a simple physical interpretation: it represents backpropagation of the wave field associated with a (secondary) point source located at r′ (cf. Eq. (17.2.3)), and is therefore closely related to the resolution function, as discussed in the following. Ideally, the backpropagated field as given by Eq. (17.2.4) should resemble the scattered field as given by Eq. (17.2.1). However, due to the special properties of the backpropagation operation, and consequently also of the kernel B, this will only be fulfilled in the case of an ideal acquisition aperture (e.g. receivers uniformly distributed over the closed surface S) and an infinite frequency band. To realize this, the introduction of the
concept of a resolution function is very useful. Employing a U/D type of imaging condition (Claerbout, 1971), an estimate of the scattering potential can be obtained from the equation (integrating over the available frequency band Δω)

$$\hat\alpha(\mathbf r) \cong \frac{1}{\pi} \int_{\Delta\omega} \frac{p_{bp}(\mathbf r,\omega)}{i\omega\, p(\mathbf r,\omega)}\, d\omega = \int_V \alpha(\mathbf r') \left[ \frac{1}{\pi} \int_{\Delta\omega} k_0^2(\mathbf r',\omega)\, B(\mathbf r,\mathbf r',\omega) \cdot \frac{p(\mathbf r',\omega)}{i\omega\, p(\mathbf r,\omega)}\, d\omega \right] dV' \equiv \int_V \alpha(\mathbf r') \cdot R_F(\mathbf r,\mathbf r')\, dV', \qquad (17.2.5)$$

where R_F is the resolution function or point-spread function (Gelius et al., 2002). Since the kernel B is 'singular' when r = r′, it is feasible to set p(r′, ω)/p(r, ω) ≅ 1 in the expression for the resolution function, i.e.:

$$R_F(\mathbf r,\mathbf r') = \frac{1}{\pi} \int_{\Delta\omega} \frac{k_0^2(\mathbf r',\omega)}{i\omega}\, B(\mathbf r,\mathbf r',\omega)\, d\omega. \qquad (17.2.6)$$
According to Eq. (17.2.6), the governing part of the resolution function is given by the kernel B. For a monochromatic case, understanding the role of B then automatically gives a good idea of the resolving power of imaging based on backpropagation as governed by Eq. (17.2.4) (and of migration based on integral formulations in general). In the case of an ideal aperture the kernel B takes the special form (Gelius, 2009; Gelius and Asgedom, 2011):

$$B(\mathbf r,\mathbf r',\omega) = G_0(\mathbf r,\mathbf r',\omega) - G_0^*(\mathbf r,\mathbf r',\omega), \qquad (17.2.7)$$

which represents a superposition of a time-advanced and a time-retarded Green's function. This implies that the kernel B is not causal, and as a consequence backpropagation of the recorded field associated with a point source, as described by B, will give a diffraction-limited focus. By introducing a finite frequency band, backpropagation of the kernel B as given by Eq. (17.2.4) can be better visualized. Consider now the simple example shown in Figure 17.2. The wave fields generated by a point source embedded in a homogeneous medium were recorded at receivers evenly distributed over a sphere (a total of 1600 measurement points was employed here, with the source placed in the centre). In the calculations the homogeneous background medium was assigned a velocity of 2000 m/s, and the source was represented by a band-limited pulse defined by a Ricker wavelet with a centre frequency of 20 Hz. Backpropagation in the time domain is now given by Fourier synthesis over the frequency band
Figure 17.2. Sketch of simple backpropagation example.
(here limited to be between 0 Hz and 50 Hz). Figure 17.3 shows snapshots of the backpropagated field (note that each snapshot has been individually scaled, so only a qualitative description is obtained). The presence of non-causal contributions in snapshots (a) and (b) can easily be seen in Figure 17.3. The diffraction-limited focus given by snapshot (c) is caused by interference between converging and (polarity-reversed) diverging waves as described by Eq. (17.2.7). The size of the focal spot in Figure 17.3 (c) is on the order of half of the centre wavelength, as expected.
Figure 17.3. Snapshots of backpropagated kernel B (complete acquisition aperture) at different (back)propagation times: (a) t = −0.060s, (b) t = −0.032s, (c) t = 0.0s, (d) t = 0.032s and (e) t = 0.060s.
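To make this half-wavelength focal spot concrete, the short sketch below (our own illustration, not part of the original chapter; the spectrum, grids and variable names are assumptions) evaluates the ideal-aperture kernel of Eq. (17.2.7) for a homogeneous medium and synthesizes it over the 0–50 Hz band:

```python
# Minimal sketch of the diffraction-limited focus of the ideal-aperture kernel
# B = G0 - G0*. Assumptions: homogeneous 3-D medium (v = 2000 m/s), free-space
# Green's function G0 = exp(ikR)/(4*pi*R), and a 20 Hz Ricker amplitude
# spectrum synthesized over 0-50 Hz, mirroring the Figure 17.2 example.
import numpy as np

v, fc = 2000.0, 20.0                                # velocity [m/s], centre frequency [Hz]
f = np.linspace(0.5, 50.0, 500)                     # synthesis band [Hz]
ricker = (f / fc) ** 2 * np.exp(-((f / fc) ** 2))   # Ricker amplitude spectrum

R = np.linspace(1.0, 200.0, 2000)                   # distance from the focal point [m]
k = 2 * np.pi * f[:, None] / v
B = 2j * np.sin(k * R[None, :]) / (4 * np.pi * R[None, :])  # G0 - G0* (purely imaginary)
focus = np.imag((ricker[:, None] * B).sum(axis=0))  # t = 0 snapshot profile

zero_idx = np.where(np.diff(np.sign(focus)))[0]
print(f"centre wavelength = {v / fc:.0f} m, first zero of focus at R ~ {R[zero_idx[0]]:.0f} m")
# The first zero falls near half the centre wavelength (~50 m): the focal spot
# is diffraction-limited, consistent with Figure 17.3 (c).
```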
From the previous analysis of B, the phenomenon of diffraction-limited imaging can easily be understood. In the case of imaging with a frequency band, the backpropagation kernel B needs to be recomputed for the relevant range of frequencies before the corresponding resolution function can be obtained from Eq. (17.2.6). By increasing the frequency band of the illuminating source pulse, an improved point-spread function (focus) can be obtained, as shown in Figure 17.4. In the limit of infinite frequencies the focus will be perfectly resolved (i.e. an ideal point-spread or resolution function).
Figure 17.4. Effect of frequency on the resolution function (focus).
It has been demonstrated that in the case of an ideal aperture and a monochromatic signal, the corresponding resolution function is diffraction-limited. Only by introducing an infinite band of frequencies can an ideal point-spread function be obtained. In the case of a limited aperture the image resolution will be even more distorted, and the backpropagation kernel B can be decomposed into two parts:

$$B(\mathbf r,\mathbf r',\omega) = B_H(\mathbf r,\mathbf r',\omega) + B_A(\mathbf r,\mathbf r',\omega), \qquad (17.2.8)$$

where B_H corresponds to an ideal aperture (cf. Eq. (17.2.7)) and B_A represents the additional effect of a possibly limited acquisition geometry. By combining Eqs. (17.2.4) and (17.2.7), the aperture part of the backpropagation kernel B can be expressed as:

$$B_A(\mathbf r,\mathbf r',\omega) = -\oint_S \left[ \frac{\partial G_0^*(\mathbf r,\mathbf r'',\omega)}{\partial n}\, G_0(\mathbf r'',\mathbf r',\omega) - G_0^*(\mathbf r,\mathbf r'',\omega)\, \frac{\partial G_0(\mathbf r'',\mathbf r',\omega)}{\partial n} \right] dS'' - \left[ G_0(\mathbf r,\mathbf r',\omega) - G_0^*(\mathbf r,\mathbf r',\omega) \right]. \qquad (17.2.9)$$

To visualize the additional effect of a limited aperture, the experiment described in Figure 17.2 is repeated, but this time receivers are only placed along the right part of a ring in the vertical slice through the surface of the sphere.
The corresponding snapshots of the kernel B show a more complex wave focus (caused by treating a 3-D problem as a two-dimensional one). Again the focus (Figure 17.5 (b)) is formed by the interaction of converging and diverging waves and is more distorted than the focus obtained in the ideal aperture case (cf. Figure 17.3 (c)).
Figure 17.5. Snapshots of backpropagated kernel B (limited aperture: half-ring) at different (back)propagation times: (a) t = −0.032s, (b) t = 0.0s, (c) t = 0.032s.
The effects of both a limited aperture and a finite frequency band will distort the final point-spread function.
17.3 Deconvolution Approach to Improved Resolution
The material presented in this section is based on Sjoeberg, Gelius and Lecomte (2003). We consider the case of seismic prestack depth imaging employing integral equation formulations. Because of limited aperture and band-limited signals, the output from such imaging methods is a distorted or blurred image. This distortion can be computed using the concept of the resolution function, which is a quantity readily accessible in the Fourier space of the model. The
key parameter is the scattering wavenumber, which at a particular image point is defined by the incident and scattered ray directions in a given background model. If we consider imaging within the Born approximation and 2-D geometries, this resolution function will take the form of a space-variant 2-D point-spread function. Hence, knowing the blur-functions, we can enhance the seismic image quality by employing a 2-D deconvolution technique. In this study, we propose to use a formulation based on the normal-equation form of the conjugate-gradient method. A simple performance test has been carried out, employing synthetic data, to illustrate the potential of this algorithm. We consider the problem of resolving quantities, such as velocity or reflectivity, which characterize a geological model from a seismic experiment. Moreover, we assume 2-D seismic imaging within the Born approximation and introduce the resolution function, which describes how much a model quantity is distorted because of band-limited seismic data and a limited acquisition geometry. For the Born model the resolution function takes the form of a standard point-spread function, and can easily be computed if the following three quantities are known: the background velocity model, the frequency bandwidth and the acquisition geometry. Based on knowledge of the space-variant resolution functions, image distortions can be compensated for. In this study we propose to deconvolve these 2-D point-spread functions by using the normal-equation form of the conjugate-gradient method, which ensures a stable formulation of the problem. Within a Born model the resolution function can be written explicitly as (Lecomte and Gelius, 1998; Gelius et al., 2002):

$$R_F(\mathbf r - \mathbf r') = \int S(\omega) \exp[i\mathbf K \cdot (\mathbf r - \mathbf r')]\, d\mathbf K, \qquad (17.3.1)$$

where S(ω) is the source spectrum if a signature deconvolution has not been applied to the input data before the prestack depth migration. If that is the case, S(ω) must be interpreted as the deconvolved signature, which is then almost constant, at least within a band. Moreover, in Eq. (17.3.1) the position vector r denotes a given image point and r′ is a position vector which denotes points in a small region Ω surrounding r. Finally, K is the Fourier vector of the model space at the center point r. A normalized version of this vector was first introduced by Beylkin et al. (1985). The Fourier vector K is also called the scattering wavenumber vector, and it can be linked to a specific seismic survey through the following expression (Gelius et al., 2002):

$$\mathbf K = -\omega \nabla\tau(\mathbf r_g, \mathbf r_s, \mathbf r) = -\omega [\nabla\tau_s(\mathbf r, \mathbf r_s) + \nabla\tau_g(\mathbf r_g, \mathbf r)], \qquad (17.3.2)$$

where r_g denotes a receiver position and r_s denotes a source position, ω is the angular frequency (i.e. we consider the seismic data after Fourier transformation of the time coordinate) and τ is the total traveltime from r_s via r to r_g.
For a fixed source, Eq. (17.3.1) can be derived from Eq. (17.2.6) if an asymptotic, high-frequency version of the backpropagation kernel B is introduced (for more details see Gelius and Asgedom, 2011). Due to a limited acquisition geometry and a finite frequency band, it follows from Eq. (17.3.2) that K can be recovered only within a finite band. Hence, the seismic image obtained after prestack depth Born migration is a distorted one, and is given explicitly by the following image equation (Gelius et al., 2002):

$$\tilde\gamma(\mathbf r) = \int_\Omega \gamma(\mathbf r')\, R_F(\mathbf r - \mathbf r')\, d\mathbf r' = \gamma ** R_F, \qquad (17.3.3)$$

where γ̃ represents the blurred version of the exact seismic image γ. Equation (17.3.3) represents a local 2-D convolutional model, and in general the resolution function R_F will be space-variant. We now introduce a lexicographical ordering of the image, as illustrated in Figure 17.6. Eq. (17.3.3) can then be rewritten as follows:

$$\mathbf y = D\mathbf x, \qquad (17.3.4)$$
where y is the blurred image, x is the original image and D is the blur matrix. The blur matrix will contain the complete set of resolution functions (which can be space-variant) and can be quite sparse. Next, we introduce the normal-equation form of Eq. (17.3.4), i.e.

$$D^T \mathbf y = D^T D \mathbf x \quad\Rightarrow\quad \tilde{\mathbf y} = \tilde D \mathbf x, \qquad (17.3.5)$$
Figure 17.6. Lexicographical ordering of an image (letters A, B, C etc. indicate pixels).
where the superscript T means matrix transpose. The new system matrix D̃ = D^T D is now a symmetric positive semi-definite matrix, and Eq. (17.3.5) can be solved by employing a conjugate-gradient algorithm. As a performance test of the algorithm we employed the so-called Lena image, which has become an industry standard for work on digital image processing. The image has a size of 256 × 256 pixels, and we selected time-domain samples of this image with a 1 ms interval. Since both low and high frequencies are lost in seismic data, we followed the idea of Liner (2000) and applied a second-order Butterworth filter with pass-band 40–250 Hz in order to arrive at the seismic version of Lena shown in Figure 17.7. This image will now serve as the original (non-distorted) seismic image in the test of our 2-D deconvolution method. Let us now consider the resolution function, within a Born assumption, in seismic imaging. Figure 17.8 shows a typical point-spread function, which has been computed using standard acquisition parameters and a typical source bandwidth. The size of this point-spread function is limited to 7 × 7 pixels. A blurred image can now be established by a two-dimensional convolution between the resolution function in Figure 17.8 and the Lena image in Figure 17.7; the convolved result is shown in Figure 17.9. The distorted image in Figure 17.9 is now input to our deblurring algorithm, with the resolution function in Figure 17.8 assumed to be known. Figure 17.10 shows the output from the conjugate-gradient deconvolution scheme, which should be compared with the original image in Figure 17.7. The overall conclusion is that the restoration method works very well.
Figure 17.7. Seismic version of Lena image.
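As an illustration of this normal-equation strategy, the following Python sketch (a minimal reconstruction of our own, not the authors' code; the PSF, test image and iteration count are placeholders) deblurs an image with a known space-invariant point-spread function by running conjugate gradients on D^T D x = D^T y, applying D matrix-free as a 2-D convolution:

```python
# Minimal CGNR (conjugate gradients on the normal equations) deblurring sketch.
# Assumptions: a known, space-invariant PSF; D is applied matrix-free as a
# 2-D convolution, and D^T as correlation with the flipped PSF.
import numpy as np
from scipy.signal import fftconvolve

def blur(x, psf):                 # forward operator D
    return fftconvolve(x, psf, mode="same")

def blur_T(x, psf):               # adjoint operator D^T (correlation)
    return fftconvolve(x, psf[::-1, ::-1], mode="same")

def cgnr_deblur(y, psf, n_iter=50):
    """Solve D^T D x = D^T y by conjugate gradients (x0 = 0)."""
    x = np.zeros_like(y)
    r = blur_T(y, psf)            # residual of the normal equations
    p, rs = r.copy(), np.vdot(r, r).real
    for _ in range(n_iter):
        q = blur_T(blur(p, psf), psf)          # apply D^T D
        a = rs / np.vdot(p, q).real
        x += a * p
        r -= a * q
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy usage: blur a random 'image' with a 7x7 Gaussian-like PSF, then restore.
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
g = np.exp(-0.5 * (np.arange(-3, 4) / 1.5) ** 2)
psf = np.outer(g, g); psf /= psf.sum()
restored = cgnr_deblur(blur(img, psf), psf)
print("relative error:", np.linalg.norm(restored - img) / np.linalg.norm(img))
```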
After these preliminary studies, the method was tested on synthetic seismic data generated from the so-called Gullfaks model (cf. Figure 17.11). Gullfaks is an oil-producing field on the Norwegian continental shelf, and a detailed geological model of the subsurface and the target area exists for this area. The reservoir zone is located at a depth of approximately 2000m, and involves faulted blocks with finer sequences of dipping layers and an oil-water contact.
Figure 17.8. A typical Born resolution function.
Figure 17.9. The Lena image in Figure 17.7 after blurring with the point-spread function in Figure 17.8.
Figure 17.10. The distorted image in Figure 17.9 after 2-D deconvolution (to be compared with Figure 17.7).
Synthetic constant-offset seismic data were generated employing classical Kirchhoff modeling of the target horizons, defined here by constant reflectivity (+1). Moreover, a homogeneous background velocity of 2160 m/s was used to further simplify the modeling. 481 equally spaced midpoint positions were considered (12.5 m interval). A zero-phase Ricker wavelet with a 0 to 60
Figure 17.11. The Gullfaks model with enlarged target zone.
Hz bandwidth and a peak frequency of 20 Hz was used as a source pulse. In the following we will only consider the far offset of 4000 m. Three different imaging approaches will be presented for the target area:
• raw constant-offset data is input to the PSDM (prestack depth migration);
• a source deconvolution is applied to the raw constant-offset data before PSDM;
• raw constant-offset data is input to the PSDM, and the prestack migration is followed by 2-D deconvolution employing the concept of the resolution function.
Figure 17.12 shows the part of the reservoir zone that will be considered in this study. The horizontal reflector represents the oil-water contact. Correspondingly, Figure 17.13 shows the constant-offset (4 km) PSDM image obtained with raw data as input. We can recognize the oil-water contact, but the overall resolution is rather poor.
Figure 17.12. Gullfaks target area considered in this study.
Figure 17.13. Constant-offset (4km) PSDM-image, with raw data as input.
In order to improve the image resolution obtained in Figure 17.13, a source (signature) deconvolution was applied to the constant-offset data before PSDM. This process will enhance the vertical resolution due to pulse compression, i.e. giving rise to an increased effective bandwidth. Figure 17.14 shows the result obtained, where the increased resolution is obvious. However, on comparison with Figure 17.12, image distortions are still present, caused by the acquisition footprint. Finally, the PSDM image in Figure 17.13 was 2-D deconvolved employing the concept of the resolution function, and the final result is shown in Figure 17.15. This latter image represents by far the best reconstruction of the original target area given in Figure 17.12. Note that in order to obtain the image in Figure 17.15, we used a subset of the full Born point-spread function, i.e. a
Figure 17.14. PSDM image with signature-deconvolved input.
Figure 17.15. PSDM plus 2-D deconvolution.
small zone around the central vertical part. The reason for this sub-selection is that in subsurface models where reflections dominate, energy will concentrate around specular ray directions (cf. the introduction of the reflector spread function by Gelius et al., 2002). By selecting a narrow 2-D point-spread function around the vertical direction, the underlying assumption is that of horizontal layering. An improved result can be obtained if the subset selection is taken around a direction perpendicular to the local dip. In the present study a single point-spread function was used for the examined target. A more accurate approach would be to compute resolution functions at several locations within the target area considered, combined with spatial interpolation. This case of spatially-varying point-spread functions can be handled equally well by our 2-D deconvolution method.
17.4 MUSIC Pseudo-Spectrum Approach to Improved Resolution
The first part of this section (basic MUSIC theory and simulations) is based on Gelius and Asgedom (2011). The second part, discussing the phase-coherent MUSIC algorithm and experimental results, is partly taken from Asgedom et al. (2010). The question is now whether diffraction-limited imaging (focusing) can be somehow relaxed by introducing alternative approaches to image formation. Traditional imaging makes direct use of the signal part of the measurements. Alternatively, identification of the orthogonal noise or nil space apparently can give super-resolved point scatterers, where the size of the focus is represented by fractions of a wavelength (Lehman and Devaney, 2003). However, closer analysis shows that
the output from such a method has to be interpreted as an extreme localization map of a point target rather than a conventional image. The key to this super-resolved localization is to decompose the measurements into a signal space and a complementary noise (nil) space. Such a decomposition is obtained employing a singular value decomposition (SVD) analysis of the data set. The original work of Lehman and Devaney (2003) derives the main results within the framework of time reversal. In this chapter a more direct approach is employed, which provides a simpler analysis of the problem. Consider a multi-source multi-receiver experiment (cf. Figure 17.16) with a total of D scatterers (scattering strength d_i, i = 1, 2, ..., D) embedded in a possibly inhomogeneous background. In the case of transient signals it is assumed that a temporal Fourier transform has been applied. Hence, the following formulation is derived for a monochromatic case. Let N_s and N_r represent respectively the total number of sources and receivers, and let the monochromatic signal S̄ (a vector of dimension N_s) represent the transmitted signal from the source array. Introducing the complex N_r × N_s transfer matrix K gives the following relationship (assuming the noise-free case):

$$\bar R = K \bar S, \qquad (17.4.1)$$
where R̄ represents the data measured at the receiver array. Assume that the rank of the matrix is D < min(N_r, N_s), with D representing the total number of scatterers. This implies a physical experiment with D linear scattering contributions, i.e. of Born type with no interactions between the scatterers. Introducing a singular value decomposition (SVD) of the system matrix K in Eq. (17.4.1) now gives:

$$K = \begin{bmatrix} U_s & U_\perp \end{bmatrix} \begin{bmatrix} \Sigma_s & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_s^H \\ V_\perp^H \end{bmatrix} = U_s \Sigma_s V_s^H, \qquad (17.4.2)$$

where subscript s denotes the signal space and ⊥ denotes the complementary space (nil space). Moreover, superscript H means complex-conjugate transpose.
Figure 17.16. Acquisition geometry.
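As a small illustration (a sketch of our own, not from the chapter; the matrix sizes and rank are arbitrary assumptions), the signal/nil-space split of Eq. (17.4.2) can be obtained directly from numpy's SVD:

```python
# Sketch: split the SVD of a rank-D transfer matrix K into signal and nil spaces.
import numpy as np

rng = np.random.default_rng(1)
Nr, Ns, D = 20, 20, 2                      # receivers, sources, scatterers (assumed)
# Build a rank-D K = sum_d g_r(r_d) g_s(r_d)^T (Born type, no interactions)
gr = rng.standard_normal((Nr, D)) + 1j * rng.standard_normal((Nr, D))
gs = rng.standard_normal((Ns, D)) + 1j * rng.standard_normal((Ns, D))
K = gr @ gs.T                              # unit scattering strengths

U, s, Vh = np.linalg.svd(K)
Us, Uperp = U[:, :D], U[:, D:]             # signal / nil space (receiver side)
Vs, Vperp = Vh[:D].conj().T, Vh[D:].conj().T   # signal / nil space (source side)

# Orthogonality and normalization checks, cf. Eqs. (17.4.3)-(17.4.4) below:
print(np.allclose(Uperp.conj().T @ Us, 0),        # U_perp^H U_s = 0
      np.allclose(Us.conj().T @ Us, np.eye(D)))   # U_s^H U_s = I
```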
Due to the orthogonality of the signal and nil spaces in Eq. (17.4.2), the following holds:

$$U_\perp^H U_s = 0, \qquad V_\perp^H V_s = 0. \qquad (17.4.3)$$

Also, due to the inherent normalization:

$$U_\perp^H U_\perp = I, \quad U_s^H U_s = I, \quad V_\perp^H V_\perp = I, \quad V_s^H V_s = I. \qquad (17.4.4)$$
Moreover, because of the linear scattering assumption, the left- and right-singular matrices take the form¹:

$$U_s = \left[ \bar g_r(\mathbf r_1) e^{i\theta_1} \;\; \bar g_r(\mathbf r_2) e^{i\theta_2} \;\cdots\; \bar g_r(\mathbf r_D) e^{i\theta_D} \right], \qquad V_s = \left[ \bar g_s^*(\mathbf r_1) e^{i\theta_1} \;\; \bar g_s^*(\mathbf r_2) e^{i\theta_2} \;\cdots\; \bar g_s^*(\mathbf r_D) e^{i\theta_D} \right], \qquad (17.4.5)$$
where ḡ_r(r_i) and ḡ_s(r_i) represent respectively the normalized receiver-side and source-side Green's function vectors associated with a secondary source (scatterer) placed at the true scatterer position r_i. It is known that singular vectors output from an SVD analysis of a complex-valued matrix system will be non-unique up to an arbitrary phase. The phase angles θ_i in Eq. (17.4.5) represent symbolically this non-uniqueness. The diagonal matrix Σ_s contains the corresponding true scattering strengths along its diagonal. Based on these observations, a MUSIC type of operator can be introduced that peaks at the true target locations:
$$P_{MUSIC}(\mathbf r) = \frac{\bar g_r^H(\mathbf r)\,\bar g_r(\mathbf r)}{\bar g_r^H(\mathbf r)\, U_\perp U_\perp^H\, \bar g_r(\mathbf r)} + \frac{\bar g_s^T(\mathbf r)\,\bar g_s^*(\mathbf r)}{\bar g_s^T(\mathbf r)\, V_\perp V_\perp^H\, \bar g_s^*(\mathbf r)}. \qquad (17.4.6)$$
Introduce now the projection matrices for the nil space with respect to both the receiver- and source-array sides:

$$P_{\perp,r} = U_\perp U_\perp^H, \qquad P_{\perp,s} = V_\perp V_\perp^H. \qquad (17.4.7)$$

These matrices have the following characteristics:

$$P_{\perp,r} U_\perp = U_\perp, \quad P_{\perp,s} V_\perp = V_\perp, \quad P_{\perp,r} U_s = 0, \quad P_{\perp,s} V_s = 0. \qquad (17.4.8)$$

Projection matrices for the signal space can be introduced correspondingly:

$$P_{s,r} = U_s U_s^H, \qquad P_{s,s} = V_s V_s^H, \qquad (17.4.9)$$
which show the complementary characteristics of those in Eq. (17.4.8).
¹ Also assume that the scatterers are fully resolved (ideal point-spread functions with respect to both source and receiver arrays).
Finally, by combining Eqs. (17.4.6) and (17.4.7) the alternative form of the MUSIC pseudo-spectrum operator can be established:
$$P_{MUSIC}(\mathbf r) = \frac{\bar g_r^H(\mathbf r)\,\bar g_r(\mathbf r)}{\bar g_r^H(\mathbf r)\, P_{\perp,r}\, \bar g_r(\mathbf r)} + \frac{\bar g_s^T(\mathbf r)\,\bar g_s^*(\mathbf r)}{\bar g_s^T(\mathbf r)\, P_{\perp,s}\, \bar g_s^*(\mathbf r)}. \qquad (17.4.10)$$
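A compact numerical sketch of Eq. (17.4.10) (our own illustration; the acquisition geometry, frequency and the simplified 2-D Green's-function model are assumptions) is:

```python
# Sketch of the monochromatic MUSIC pseudo-spectrum of Eq. (17.4.10).
# Assumptions: homogeneous 2-D medium, Green's function ~ exp(ikR)/sqrt(R),
# Born scattering (no interactions), noise-free data.
import numpy as np

v, f = 2000.0, 20.0
k = 2 * np.pi * f / v
src = np.c_[np.linspace(-500, 500, 20), np.zeros(20)]       # source array (assumed)
rec = np.c_[np.linspace(-500, 500, 20), np.full(20, 10.0)]  # receiver array (assumed)
scat = np.array([[-75.0, 500.0], [75.0, 500.0]])            # two point scatterers
D = len(scat)

def green(pts, x):
    """Normalized Green's-function steering vector from points 'pts' to x."""
    R = np.linalg.norm(pts - x, axis=1)
    g = np.exp(1j * k * R) / np.sqrt(R)
    return g / np.linalg.norm(g)

# Transfer matrix K = sum_d g_r(r_d) g_s(r_d)^T (unit strengths)
K = sum(np.outer(green(rec, s), green(src, s)) for s in scat)

U, sv, Vh = np.linalg.svd(K)
Uperp, Vperp = U[:, D:], Vh[D:].conj().T    # nil spaces

def pseudo_spectrum(x):
    gr, gs = green(rec, x), green(src, x)   # steering vectors are normalized
    term_r = 1.0 / np.abs(gr.conj() @ (Uperp @ (Uperp.conj().T @ gr)))
    term_s = 1.0 / np.abs(gs @ (Vperp @ (Vperp.conj().T @ gs.conj())))
    return term_r + term_s

# Evaluate on a small grid around the targets; peaks mark the scatterers.
xs = np.linspace(-150, 150, 61)
P = np.array([pseudo_spectrum(np.array([x, 500.0])) for x in xs])
print("peaks near x =", xs[np.argsort(P)[-2:]])
```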
The use of the noise space in this way gives a result that is similar to the effect of including evanescent energy, i.e., it peaks at the location of the scatterer(s). However, this does not imply that the nil-space singular functions represent the true evanescent waves; they only give an overall effect in the 'image' that resembles well the one obtained from employing the evanescent fields. In practice, experimental/measurement noise will often also be present. Introduce now the noise-corrupted version of the system matrix:

$$\tilde K = K + N \qquad (17.4.11)$$

and its corresponding SVD form:

$$\tilde K = \begin{bmatrix} \tilde U_s & \tilde U_\perp \end{bmatrix} \begin{bmatrix} \tilde\Sigma_s & 0 \\ 0 & \tilde\Sigma_\perp \end{bmatrix} \begin{bmatrix} \tilde V_s^H \\ \tilde V_\perp^H \end{bmatrix} = \tilde U_s \tilde\Sigma_s \tilde V_s^H + \tilde U_\perp \tilde\Sigma_\perp \tilde V_\perp^H. \qquad (17.4.12)$$
K̃ will generally have full rank but is 'close' to a matrix of rank D if the noise N is small. In the following this is assumed. Introduce now the perturbed left-hand and right-hand singular vector matrices (linear, or first-order, perturbation assumed):

$$\tilde U_\perp = U_\perp + U_s Q, \quad \tilde U_s = U_s + U_\perp R, \qquad \tilde V_\perp = V_\perp + V_s L, \quad \tilde V_s = V_s + V_\perp M. \qquad (17.4.13)$$

Due to the orthogonality between the signal and nil spaces, the following condition holds:

$$\tilde U_\perp^H \tilde U_s = 0. \qquad (17.4.14)$$

Combination of Eqs. (17.4.13) and (17.4.14) then gives

$$R = -Q^H. \qquad (17.4.15)$$

By analogy it can be shown that

$$M = -L^H. \qquad (17.4.16)$$
Next, check if the perturbed singular vector matrices in Eq. (17.4.13) are normalized. Use the left-hand perturbed singular vector matrix of the nil space
as a test:

$$\tilde U_\perp^H \tilde U_\perp = (U_\perp + U_s Q)^H (U_\perp + U_s Q) = (U_\perp^H + Q^H U_s^H)(U_\perp + U_s Q) = U_\perp^H U_\perp + Q^H U_s^H U_\perp + U_\perp^H U_s Q + Q^H U_s^H U_s Q \cong I. \qquad (17.4.17)$$
In order to arrive at the result in Eq. (17.4.17), one has to make use of Eqs. (17.4.3) and (17.4.4). Moreover, assuming a first-order perturbation analysis, the second-order term Q^H Q is neglected. The same check can be made on the other perturbed singular matrices, giving the same result. By analogy with Eq. (17.4.10), the MUSIC operator now takes the form
$$P_{MUSIC}(\mathbf r) = \frac{\bar g_r^H(\mathbf r)\,\bar g_r(\mathbf r)}{\bar g_r^H(\mathbf r)\, \tilde P_{\perp,r}\, \bar g_r(\mathbf r)} + \frac{\bar g_s^T(\mathbf r)\,\bar g_s^*(\mathbf r)}{\bar g_s^T(\mathbf r)\, \tilde P_{\perp,s}\, \bar g_s^*(\mathbf r)}, \qquad (17.4.18)$$
where the nil-space projection matrices are defined as

$$\tilde P_{\perp,r} = \tilde U_\perp \tilde U_\perp^H, \qquad \tilde P_{\perp,s} = \tilde V_\perp \tilde V_\perp^H. \qquad (17.4.19)$$
Combination of Eqs. (17.4.13) and (17.4.19) now gives

$$\tilde P_{\perp,r} = (U_\perp + U_s Q)(U_\perp + U_s Q)^H = U_\perp U_\perp^H + U_\perp Q^H U_s^H + U_s Q U_\perp^H + U_s Q Q^H U_s^H \qquad (17.4.20)$$

and

$$\tilde P_{\perp,s} = V_\perp V_\perp^H + V_\perp L^H V_s^H + V_s L V_\perp^H + V_s L L^H V_s^H. \qquad (17.4.21)$$
Explicit expressions for the two first-order perturbation matrices Q and L are derived in Gelius and Asgedom (2011):

$$Q = -\Sigma_s^{-1} V_s^H N^H U_\perp \qquad (17.4.22)$$

and

$$L = -\Sigma_s^{-1} U_s^H N V_\perp. \qquad (17.4.23)$$
Assuming random noise with expected value (mean) equal to zero, i.e. E[N] = 0, Gelius and Asgedom (2011) also show that

$$E[\tilde P_{\perp,r}] = U_\perp U_\perp^H + \sigma^2 (N_r - D)\, U_s \Sigma_s^{-2} U_s^H \qquad (17.4.24)$$

and

$$E[\tilde P_{\perp,s}] = V_\perp V_\perp^H + \sigma^2 (N_s - D)\, V_s \Sigma_s^{-2} V_s^H, \qquad (17.4.25)$$
with σ² being the variance of the noise. Hence, the expected value of the MUSIC operator in Eq. (17.4.18) is now:

$$E[P_{MUSIC}(\mathbf r)] = \frac{\bar g_r^H(\mathbf r)\,\bar g_r(\mathbf r)}{\bar g_r^H(\mathbf r)\, E[\tilde P_{\perp,r}]\, \bar g_r(\mathbf r)} + \frac{\bar g_s^T(\mathbf r)\,\bar g_s^*(\mathbf r)}{\bar g_s^T(\mathbf r)\, E[\tilde P_{\perp,s}]\, \bar g_s^*(\mathbf r)}$$
$$= \frac{\bar g_r^H(\mathbf r)\,\bar g_r(\mathbf r)}{\bar g_r^H(\mathbf r) \left[ U_\perp U_\perp^H + \sigma^2 (N_r - D)\, U_s \Sigma_s^{-2} U_s^H \right] \bar g_r(\mathbf r)} + \frac{\bar g_s^T(\mathbf r)\,\bar g_s^*(\mathbf r)}{\bar g_s^T(\mathbf r) \left[ V_\perp V_\perp^H + \sigma^2 (N_s - D)\, V_s \Sigma_s^{-2} V_s^H \right] \bar g_s^*(\mathbf r)}. \qquad (17.4.26)$$
Note that in the special case of perfect illumination (acquisition geometry) with respect to both arrays, the projection matrices take the ideal form:

$$P_{s,r} = U_s U_s^H = I, \qquad P_{s,s} = V_s V_s^H = I. \qquad (17.4.27)$$
Assuming constant eigenvalues η for all scatterers and perfect illumination, Eq. (17.4.26) simplifies to:

$$E[P_{MUSIC}(\mathbf r)] = \frac{\bar g_r^H(\mathbf r)\,\bar g_r(\mathbf r)}{\bar g_r^H(\mathbf r) \left[ U_\perp U_\perp^H + \sigma^2 (N_r - D)/\eta^2 \right] \bar g_r(\mathbf r)} + \frac{\bar g_s^T(\mathbf r)\,\bar g_s^*(\mathbf r)}{\bar g_s^T(\mathbf r) \left[ V_\perp V_\perp^H + \sigma^2 (N_s - D)/\eta^2 \right] \bar g_s^*(\mathbf r)}, \qquad (17.4.28)$$
which represents the most ideal case. In practice, as can be seen from Eq. (17.4.26), the noise will project the effect of a limited aperture via the projection matrices of the signal space; the more correlated the aperture pattern, the worse. The MUSIC pseudo-spectrum approach as given by Eq. (17.4.10) will now be tested using controlled data. In the first case considered, the measurements are assumed noise-free. A fairly limited acquisition geometry was employed as shown in Figure 17.17 (a), involving only four sources and three receivers
Figure 17.17. (a) Acquisition geometry, (b) array resolution function.
organized in a random way. Data were generated in the time domain employing a Ricker wavelet with a centre frequency of 20 Hz and Foldy-Lax theory (Green and Lumme, 2005) to take into account interactions between the two scatterers. The separation between the two scatterers was one quarter of the centre wavelength. The actual super-resolution computation was carried out in the frequency domain after carrying out a Fourier transform. The corresponding monochromatic (20 Hz) array resolution function is shown in Figure 17.17 (b) with respect to one of the scatterers. A similar result is obtained for the PSF associated with the second point target. This array resolution function is certainly not an ideal one, because it shows non-zero values not only at the location of the considered scatterer. However, it turns out that the super-resolution technique is more robust when used in practice, as long as the PSF of a given scatterer peaks at its true location and shows significantly lower values at the location of the other scatterer(s). This is demonstrated in Figure 17.18 (a), which shows the result obtained employing the super-resolution technique (monochromatic data corresponding to the centre frequency of 20 Hz). The quality of this 'image' is superior. Finally, employing the same monochromatic data as input to Kirchhoff migration, the image in Figure 17.18 (b) was arrived upon. As expected, due to the sparse acquisition geometry and also the use of monochromatic data only, the reconstruction of the two scatterers is now heavily distorted.
Figure 17.18. (a) Super-resolution result and (b) Kirchhoff migration result. Note that only monochromatic data are considered, corresponding to the centre frequency of 20 Hz.
In the second example considered, a more realistic acquisition geometry was applied. This time white noise was also added to the generated controlled data, to further investigate the sensitivity of the MUSIC pseudo-spectrum technique. A Ricker wavelet with a centre frequency of 20 Hz was chosen as a source pulse. The background velocity model was homogeneous and set to 2000 m/s. Separate source and receiver arrays were employed (both with 20 elements), as shown in Figure 17.19. The targets to be illuminated were two point scatterers separated by three centre wavelengths and having unit scattering strength (Born scattering was assumed when generating the test data). The
Figure 17.19. Data acquisition geometry for the imaging of two point scatterers.
noise introduced in the measurements was additive white Gaussian noise (AWGN) with zero mean and a variance equal to 10% of the maximum measurement value for each of the source-receiver pairs in the time domain. The noise was Fourier transformed, scaled by the inverse of the source spectrum and added to the response matrix K. The monochromatic MUSIC algorithm given by Eq. (17.4.10) was now employed. Figure 17.20 (a) shows the result obtained employing the centre frequency of 20 Hz. Due to the superimposed noise, the two scatterers are no longer super-resolved, as would be the case for noise-free input data. This is just as predicted by Eq. (17.4.26), i.e. the signal space starts to leak into the nil space. Repeated use of Eq. (17.4.10) for a frequency band between 5 and 35 Hz, and then adding all the 'images', gave the result shown in Figure 17.20 (b). This result is somewhat poorer than the single-frequency result. It is as expected that employing the MUSIC pseudo-spectrum operator over a frequency band will not improve the result. This is due to the fact that the operator in Eq. (17.4.10) does not include any phase information, which is needed if the effect of the random noise is to be stacked out efficiently. To further evaluate the results in Figure 17.20, standard Kirchhoff migration was also applied to the test data. Figures 17.21 (a) and (b) show respectively the results obtained employing a single frequency of 20 Hz and the frequency band 5–35 Hz. Comparison between Figures 17.20 (b) and 17.21 (b) shows that the output from migration is significantly better. In order to construct a more robust version of a MUSIC pseudo-spectrum operator that can add phase coherently, realize first that an alternative to Eq. (17.4.10) can be constructed based on singular vectors corresponding to
Figure 17.20. Standard MUSIC computed using Eq. (17.4.10) at (a) 20 Hz and (b) for a frequency band between 5–35 Hz.
Figure 17.21. Kirchhoff migration for (a) a single frequency of 20 Hz and (b) a frequency band between 5–35 Hz.
the signal space:
$$P_{MUSIC,2}(\mathbf r) = \frac{\bar g_r^H(\mathbf r)\,\bar g_r(\mathbf r)}{\bar g_r^H(\mathbf r)\,\bar g_r(\mathbf r) - \bar g_r^H(\mathbf r)\, P_{s,r}\, \bar g_r(\mathbf r)} + \frac{\bar g_s^T(\mathbf r)\,\bar g_s^*(\mathbf r)}{\bar g_s^T(\mathbf r)\,\bar g_s^*(\mathbf r) - \bar g_s^T(\mathbf r)\, P_{s,s}\, \bar g_s^*(\mathbf r)}, \qquad (17.4.29)$$
where the projection matrices for the signal space are given by Eq. (17.4.9). However, the pseudo-spectrum operator in Eq. (17.4.29) also carries only magnitude information. By analogy with Eq. (17.4.29), introduce now the mixed-space projection matrices:

$$P_{s,mix1} = U_s V_s^H, \qquad P_{s,mix2} = V_s U_s^H \qquad (17.4.30)$$

and correspondingly the modified MUSIC pseudo-spectrum:
$$P_{MUSIC,3}(\mathbf r) = \frac{|\bar g_r(\mathbf r)||\bar g_s(\mathbf r)|}{|\bar g_r(\mathbf r)||\bar g_s(\mathbf r)| - \bar g_r^H(\mathbf r)\, P_{s,mix1}\, \bar g_s^*(\mathbf r)} + \frac{|\bar g_s(\mathbf r)||\bar g_r(\mathbf r)|}{|\bar g_s(\mathbf r)||\bar g_r(\mathbf r)| - \bar g_s^T(\mathbf r)\, P_{s,mix2}\, \bar g_r(\mathbf r)}. \qquad (17.4.31)$$
The pseudo-spectrum operator in Eq. (17.4.31) now contains phase information in the denominator. Moreover, by introducing the mixed-space projection matrices in Eq. (17.4.30) the effect of the SVD-phase ambiguity (cf. Eq. (17.4.5))
has now been eliminated. A phase-coherent MUSIC pseudo-spectrum operator can now be constructed by adding over frequencies in the denominators of Eq. (17.4.31) (for simplicity the numerator can also be set equal to unity):

$$P_{coh}(\mathbf r) = \frac{1}{\displaystyle\sum_\omega \left[\, |\bar g_r(\mathbf r)||\bar g_s(\mathbf r)| - \bar g_r^H(\mathbf r)\, P_{s,mix1}\, \bar g_s^*(\mathbf r) \right]} + \frac{1}{\displaystyle\sum_\omega \left[\, |\bar g_s(\mathbf r)||\bar g_r(\mathbf r)| - \bar g_s^T(\mathbf r)\, P_{s,mix2}\, \bar g_r(\mathbf r) \right]}. \qquad (17.4.32)$$
The algorithm given by Eq. (17.4.32) we will denote phase-coherent MUSIC. This modified MUSIC operator can now be tested employing the data corresponding to the experiment in Figure 17.19. The output is given in Figures 17.22 (a) and (b). Comparisons with both standard MUSIC (cf. Figure 17.20) and Kirchhoff migration (cf. Figure 17.21) show that the new algorithm is superior and able to give super-resolved images of the two targets considered.
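A minimal sketch of this frequency summation (ours; the geometry, band, noise level and Green's-function model are assumptions, and the sum-in-the-denominator reading of Eq. (17.4.32) is followed) is:

```python
# Sketch of phase-coherent MUSIC, Eq. (17.4.32): accumulate the mixed-space
# denominators over a frequency band before inverting. Geometry, band and
# Green's-function model are assumptions.
import numpy as np

v = 2000.0
src = np.c_[np.linspace(-500, 500, 20), np.zeros(20)]
rec = np.c_[np.linspace(-500, 500, 20), np.full(20, 10.0)]
scat = np.array([[-75.0, 500.0], [75.0, 500.0]]); D = len(scat)

def green(pts, x, k):
    R = np.linalg.norm(pts - x, axis=1)
    g = np.exp(1j * k * R) / np.sqrt(R)
    return g / np.linalg.norm(g)

def p_coh(x, freqs, sigma=0.05, rng=np.random.default_rng(2)):
    """Phase-coherent pseudo-spectrum at image point x (steering vectors normalized)."""
    d1 = d2 = 0.0
    for f in freqs:
        k = 2 * np.pi * f / v
        K = sum(np.outer(green(rec, s, k), green(src, s, k)) for s in scat)
        K = K + sigma * (rng.standard_normal(K.shape) + 1j * rng.standard_normal(K.shape))
        U, sv, Vh = np.linalg.svd(K)
        Us, Vs = U[:, :D], Vh[:D].conj().T
        gr, gs = green(rec, x, k), green(src, x, k)
        d1 += 1.0 - gr.conj() @ Us @ (Vs.conj().T @ gs.conj())  # g_r^H P_mix1 g_s*
        d2 += 1.0 - gs @ Vs @ (Us.conj().T @ gr)                # g_s^T P_mix2 g_r
    return 1.0 / abs(d1) + 1.0 / abs(d2)

freqs = np.linspace(5.0, 35.0, 16)
xs = np.linspace(-150, 150, 61)
P = np.array([p_coh(np.array([x, 500.0]), freqs) for x in xs])
print("peaks near x =", xs[np.argsort(P)[-2:]])
```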
Figure 17.22. New multi-frequency MUSIC computed using Eq. (17.4.32) at (a) 20 Hz and (b) for a frequency band between 5–35 Hz.
To further benchmark the phase-coherent MUSIC algorithm, experimental ultrasound data were employed. Figure 17.23 shows schematics of the measurement set-up in the water tank. The two scatterers were made up of thin wires. Ultrasound frequencies were applied within the band 3.8–4.1 MHz. This corresponds to a centre wavelength of about 0.35 mm. The results obtained respectively using standard MUSIC, Kirchhoff migration and phase-coherent MUSIC are shown in Figures 17.24 (a)–(c) for a single frequency of 4 MHz (corresponding to the centre frequency). For both types of MUSIC methods the results are not of super-resolution quality. Both scatterers are of course resolved using all three techniques, since they are separated several wavelengths apart. Next, the idea of adding over the available frequency band between 3.8 and 4.1 MHz is tested. Figures 17.25 (a)–(c) show the corresponding results obtained. On comparison with Figures 17.24 (a)–(c) it can be seen that the standard MUSIC has not improved, as expected; further, that the Kirchhoff
Figure 17.23. Sketch of ultrasound water-tank experiment.
Figure 17.24. (a) Standard MUSIC, (b) Kirchhoff migration and (c) Phase-coherent MUSIC. Single frequency of 4 MHz (centre frequency).
migration is significantly better, but that the largest changes can be observed for phase-coherent MUSIC. The latter technique is now seen to give a super-resolved result for the two scatterers, showing that experimental noise can be handled well by this improved technique.
17.5 Concluding Remarks
This chapter has provided a simple framework for understanding and analyzing both diffraction-limited imaging and super-resolution. Extrapolation of measurement data employing the principles of migration or backpropagation gives only an estimate of the scattered wavefields. In the case of a point scatterer it is shown explicitly that the band-limited character of the focus is caused by a su-
Figure 17.25. (a) Standard MUSIC, (b) Kirchhoff migration and (c) Phase-coherent MUSIC. Frequency band 3.8–4.1 MHz.
perimposition of converging and diverging waves. Hence, the backpropagation operation is non-causal by nature, since it is based on the anti-causal Green's theorem. Imaging beyond the classical diffraction limit has been proposed in the literature by making use of the evanescent field contributions. However, this approach is not practical in the case of seismic imaging in general, since the evanescent waves are so weakened by attenuation that they are masked by the noise. An alternative procedure is to estimate the space-variant resolution functions based on ray techniques. The resolution of the seismic image can then be improved, as discussed in Section 17.3. In the special case of point-diffracted data, the data can apparently be super-resolved by making use of the null-space solutions. This concept applies equally to a single scatterer or to a collection of scatterers, as long as there are ideally no strong interactions between them (Born type). In this chapter, it was demonstrated that two scatterers with strong interaction could still be super-resolved. Hence, the Born assumption can be somewhat relaxed for a small number of scatterers. However, the apparently super-resolved focus obtained by this technique has to be interpreted as an extreme localization rather than a quantitative image (the value of the peak is arbitrary). Moreover, if white noise is added to the data, the MUSIC pseudo-spectrum operator breaks down. This chapter therefore proposes a modified MUSIC technique, which has the characteristic of coherently adding the results obtained from several frequencies. This new technique was benchmarked employing ultrasound data, demonstrating its robustness towards experimental noise.
References
[1] A. Asgedom, L.-J. Gelius, A. Austeng and S. Holm, Multi-frequency phase coherent super-resolution imaging, 72nd EAGE Conference, Barcelona, Spain, Extended Abstract, 2010.
[2] G. Beylkin, Imaging of discontinuities in the inverse scattering problem by inversion of a causal generalized Radon transform, Journal of Mathematical Physics, 26, 99-108, 1985.
[3] J. F. Claerbout, Toward a unified theory of reflector mapping, Geophysics, 36, 467-481, 1971.
[4] C. Esmersoy and M. Oristaglio, Reverse-time wave-field extrapolation, imaging, and inversion, Geophysics, 53, 920-931, 1988.
[5] L.-J. Gelius, Generalized acoustic diffraction tomography, Geophysical Prospecting, 43, 3-29, 1995.
[6] L.-J. Gelius, A simple analysis of diffraction-limited imaging and super-resolution, 71st EAGE Meeting, Amsterdam, The Netherlands, Extended Abstract, 5431, 2009.
[7] L.-J. Gelius and E. Asgedom, Diffraction-limited imaging and beyond - the concept of super resolution, Geophysical Prospecting, 59, 400-421, 2011.
[8] L.-J. Gelius, I. Lecomte and H. Tabti, Analysis of the resolution function in prestack depth migration, Geophysical Prospecting, 50, 505-515, 2002.
[9] K. Green and K. Lumme, Multiple scattering by the iterative Foldy-Lax scheme, J. Opt. Soc. Am. A, 22, 1555-1558, 2005.
[10] K. J. Langenberg, Applied inverse problems for acoustic, electromagnetic, and elastic wave scattering, in: Basic Methods of Tomography and Inverse Problems (P. C. Sabatier, ed.), 125-467, Adam Hilger Ltd., 1987.
[11] I. Lecomte and L.-J. Gelius, Have a look at the resolution of prestack depth migration for any model, survey and wavefields, 68th SEG Meeting, New Orleans, USA, Expanded Abstract, 1998.
[12] S. K. Lehman and A. J. Devaney, Transmission mode time-reversal super-resolution imaging, J. Acoust. Soc. Am., 113, 2742-2753, 2003.
[13] C. L. Liner, On the history and culture of geophysics, and science in general, The Leading Edge, 19, 502-504, 2000.
[14] R. G. Newton, Scattering Theory of Waves and Particles, 2nd edition, Springer, New York, 1982.
[15] J. Schleicher, M. Tygel and P. Hubral, Seismic True-Amplitude Imaging, Geophysical Developments Series No. 12, Society of Exploration Geophysicists (SEG), 2007.
[16] R. O. Schmidt, Multiple emitter location and signal parameter estimation, IEEE Trans. Antennas Propagation, AP-34, 276-280, 1986.
[17] W. A. Schneider, Integral formulation for migration in two and three dimensions, Geophysics, 43, 49-76, 1978.
[18] T. A. Sjoeberg, L.-J. Gelius and I. Lecomte, 2-D deconvolution of seismic image blur, 73rd SEG Meeting, Dallas, USA, Expanded Abstracts, 2003.
[19] J. W. Wiggins, Kirchhoff integral extrapolation and migration of nonplanar data, Geophysics, 49, 1239-1248, 1984.
Author Information
L.-J. Gelius
Department of Geosciences, University of Oslo, Norway. E-mail: [email protected]
Chapter 18
Seismic Migration and Inversion

Y. F. Wang, Z. H. Li and C. C. Yang
Abstract. This chapter gives a short review of different migration methods in seismic imaging. In particular, regularizing least squares migration and inversion imaging techniques are discussed. A preconditioning technique is also introduced. Numerical tests are made to show the performance of the methods. Since interferometric migration and least squares migration both aim to improve the resolution of seismic imaging, a numerical experiment is made to compare their ability to improve imaging resolution.
18.1 Introduction
In the early days, seismic migration was done by graphical methods, such as the fuzzy wavefront method, diffraction stack, etc. These methods, which are based on Huygens' principle, are only qualitative. They cannot supply images with high resolution and proper waveform characteristics. Migration based on the wave equation began in the early 1970s and gave more satisfactory images. Claerbout [10, 11] suggested a 45-degree finite difference method and opened the gate to wave equation migration. Schneider [44] proposed Kirchhoff migration. Stolt [50] considered the migration method in the frequency-wavenumber domain. Gazdag [17] developed the phase shift method. For the case of strong lateral velocity variation, PSPI (Phase Shift Plus Interpolation, Judson and Schultz [26]) was developed. Baysal [5], McMechan [35] and Whitmore [71] presented reverse time migration (RTM) almost at the same time. Many migration methods are based on the following wave equation:

$$\nabla^2 U(x, y, z, t) - \frac{1}{v^2(x, y, z)} \frac{\partial^2 U(x, y, z, t)}{\partial t^2} = 0. \qquad (18.1.1)$$
Many geophysicists believe that wave propagation under the ground satisfies the above equation. We know the acquisition data U(x, y, z = 0, t); migration means that we try to output the unknown wave field U(x, y, z, t = 0). Seismic migration can be implemented before or after stack.
Poststack migration is currently the most popular in practical production. It first stacks the original data of multiple coverage to get the zero-offset section, and then performs the migration on this stacked section. Its cost is much less than that of prestack migration, but its defects are also apparent. It cannot exactly realize common-reflection-point stacking, and this will cause a low-resolution image in areas with complex structure. Prestack migration is the most ideal migration method. It directly performs the migration on the original data of multiple coverage, and then does the stack. In this way, the data are not only migrated but also stacked at the precise common reflection points. However, it costs many more resources and is therefore sometimes rejected by industry. With the development of supercomputers and powerful computing technologies, prestack migration is nowadays very promising in practice. No matter which migration method is used, a velocity model v(x, y, z) and a forward modeling operator L should be built. We will see many different forms of L in the subsequent sections.
18.2 Migration Methods: A Brief Review

18.2.1 Kirchhoff migration
In mathematics, the following theorem holds:

$$\int_V (U \nabla^2 G - G \nabla^2 U)\, dV = \oint_S (U \nabla G - G \nabla U) \cdot \mathbf n\, dS. \qquad (18.2.1)$$
In the above, we take U and G as shorthand for the functions U = U(x, y, z, ω) and G = G(x, y, z, ω). Now let us choose for U and G two special functions. Firstly, we suppose U equals the Fourier-transformed pressure of a compressional wave field which is generated by sources outside the closed surface S. Therefore, U should satisfy the Helmholtz equation

$$\nabla^2 U + k^2 U = 0 \qquad (18.2.2)$$

for all points inside S, where k = ω/v. Secondly, we suppose G to be the Fourier-transformed pressure for a compressional wave field which is generated by a monopole at a point A inside S; therefore G satisfies the equation

$$\nabla^2 G + k^2 G = -4\pi \delta(x - x_A)\delta(y - y_A)\delta(z - z_A). \qquad (18.2.3)$$
Substituting (18.2.2) and (18.2.3) into (18.2.1) yields

$$\oint_S (U \nabla G - G \nabla U) \cdot \mathbf n\, dS = -4\pi \int_V U \delta(x - x_A)\delta(y - y_A)\delta(z - z_A)\, dV = -4\pi U(x_A, y_A, z_A, \omega).$$

Using $-\nabla U = j\omega\rho_0 \mathbf v$ [1] and $\nabla G \cdot \mathbf n = \partial G/\partial n$, this becomes

$$U(x_A, y_A, z_A, \omega) = -\frac{1}{4\pi} \oint_S \left( U \frac{\partial G}{\partial n} + G\, j\omega\rho_0 v_n \right) dS. \qquad (18.2.4)$$
Result (18.2.4) is the famous Kirchhoff integral. From it, we can draw the following conclusion: if we know the pressure U and the normal component of the particle velocity v_n on a closed surface S, then the pressure can be computed at every point inside S with the aid of (18.2.4) [6]. This conclusion is very important for seismic migration, because it indicates that if we know the data on the ground surface U(x, y, z = 0, t), we can obtain U(x, y, z, t = 0) for every point underground. Reviewing the viewpoint in Section 18.1, we find that this is exactly what seismic migration tries to do. Migration under this scheme is called Kirchhoff migration.
18.2.2 Wave field extrapolation
Different from Kirchhoff migration, we do not expect to derive such a powerful formula as the Kirchhoff integral; instead, we would like to have a formula which relates the data at the current depth to that at the next depth:

$$U(z_{i-1}) = L(U(z_i)), \qquad (18.2.5)$$

where L is the upgoing forward extrapolation operator. If we have such an operator, we can design a migration scheme as follows:

Scheme 1.
1) Design an inverse extrapolation operator F according to (18.2.5), so that

$$U(z_i) = F(U(z_{i-1})); \qquad (18.2.6)$$

2) From z_0 = 0, downward continue to depth z_1 = Δz according to (18.2.6);
3) Repeat the foregoing two steps for all depth levels;
4) Output U(x, y, z, t = 0) from the above results.

Therefore, in the following sections, we will introduce different kinds of forward extrapolation operators and how to construct their inverse operators. Here, we first show the simplest one: the forward extrapolation operator in the space-time domain. First, we choose for the closed surface S the plane z = 0 and a hemisphere in the upper half space, and denote them by S_1 and S_2. Then, we choose a proper function G in (18.2.4) such that G = 0 on S_1 and ∂G/∂n = 0 on S_2;
therefore, according to (18.2.4),

$$U(x_A, y_A, z_A, \omega) = -\frac{1}{4\pi} \oint_{S_1+S_2} U \frac{\partial G}{\partial n}\, dS = -\frac{1}{4\pi} \int_{S_1} U \frac{\partial G}{\partial n}\, dS_1 = \frac{1}{2\pi} \int_{S_1} U\, jk\, \frac{1 + jkr}{jkr} \cos\varphi\, \frac{e^{-jkr}}{r}\, dS_1 = \frac{1}{2\pi} \int_{L_x}\int_{L_y} U(x, y, z = 0, \omega)\, \frac{1 + jkr}{r^2} \cos\varphi\, e^{-jkr}\, dx\, dy. \qquad (18.2.7)$$

Then, we transform the above formula to the time domain via the inverse Fourier transform and obtain

$$U(x_A, y_A, z_{i-1}, t) = \frac{1}{2\pi} \int_{L_x}\int_{L_y} \cos\varphi \left[ \frac{1}{r^2} + \frac{1}{cr} \frac{\partial}{\partial t} \right] U\!\left(x, y, z_i, t - \frac{r}{v}\right) dx\, dy. \qquad (18.2.8)$$

If we assume kr ≫ 1, then

$$U(x_A, y_A, z_{i-1}, t) = \frac{1}{c} \frac{\partial}{\partial t} \int_{L_x}\int_{L_y} \frac{\cos\varphi}{2\pi r}\, U\!\left(x, y, z_i, t - \frac{r}{v}\right) dx\, dy, \qquad (18.2.9)$$
where cos ϕ = (z_i − z_{i−1})/r. Combining (18.2.5) and (18.2.9), we can get the forward extrapolation operator L. Then, we can finish the migration using Scheme 1.
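A minimal numerical sketch of the far-field extrapolator (18.2.9), reduced to 2-D (our own illustration, not the authors' code; the grids and the nearest-sample handling of the time shift are assumptions), is:

```python
# Sketch of the space-time forward extrapolation (18.2.9) in 2-D:
# U(xA, z_{i-1}, t) ~ (1/c) d/dt sum_x [cos(phi)/(2*pi*r)] U(x, z_i, t - r/v) dx
# Assumptions: constant velocity, uniform grids, nearest-sample time shifts.
import numpy as np

def extrapolate_up(U, dx, dz, dt, v):
    """One upward continuation step: U(x, z_i, t) -> U(x, z_{i-1}, t)."""
    nx, nt = U.shape
    x = np.arange(nx) * dx
    out = np.zeros_like(U)
    for ia in range(nx):                        # output lateral position x_A
        r = np.sqrt((x - x[ia]) ** 2 + dz ** 2)
        cosphi = dz / r                         # cos(phi) = (z_i - z_{i-1}) / r
        shift = np.rint(r / (v * dt)).astype(int)
        acc = np.zeros(nt)
        for ix in range(nx):                    # delayed, weighted summation
            s = shift[ix]
            if s < nt:
                acc[s:] += cosphi[ix] / (2 * np.pi * r[ix]) * U[ix, : nt - s] * dx
        out[ia] = np.gradient(acc, dt) / v      # the (1/c) d/dt factor
    return out

# Toy usage: a spike at depth z_i spreads into a hyperbola-like response
# one step shallower.
U = np.zeros((101, 400)); U[50, 200] = 1.0
print(extrapolate_up(U, dx=10.0, dz=50.0, dt=0.002, v=2000.0).shape)
```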
18.2.3 Finite difference migration in ω − X domain
For simplicity, we use the two-dimensional version of (18.2.7) as follows:

$$U(x_A, z_{i-1}, \omega) = -\frac{jk}{2} \int_{L_x} U(x, z_i, \omega) \cos\varphi\, H_1^{(2)}(kr)\, dx, \qquad (18.2.10)$$

where $r = \sqrt{(x_A - x)^2 + (z_i - z_{i-1})^2}$ and $H_1^{(2)}$ represents the first-order Hankel function of the second kind. If we define

$$W(x_A - x, \Delta z, \omega) = -\frac{jk}{2} \cos\varphi\, H_1^{(2)}(kr),$$

then (18.2.10) can be rewritten as

$$U(x_A, z_{i-1}, \omega) = \int_{L_x} W(x_A - x, \Delta z, \omega)\, U(x, z_i, \omega)\, dx = W(x, \Delta z, \omega) * U(x, z_i, \omega). \qquad (18.2.11)$$
Hence, we may conclude that forward extrapolation in the space-frequency domain can be formulated in terms of convolution! Moreover, the space-variant version of (18.2.11) can be given in matrix notation, by suitably sampling the operator, as

$$U(z_{i-1}) = W U(z_i). \qquad (18.2.12)$$

Combining (18.2.5) and (18.2.12), we can get the forward extrapolation operator L. Then, we can finish the migration using Scheme 1.
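The convolutional form (18.2.11) is straightforward to prototype. The following sketch (ours; the operator truncation length and parameters are assumptions) applies one extrapolation step per frequency using scipy's Hankel function:

```python
# Sketch of one omega-x extrapolation step via spatial convolution (18.2.11).
# Assumptions: constant velocity within the step, a truncated operator, and
# scipy's hankel2(1, x) as the Hankel function H1^(2).
import numpy as np
from scipy.special import hankel2

def wx_operator(dx, dz, v, omega, half_len=20):
    """Truncated W(x, dz, omega) = -(jk/2) cos(phi) H1^(2)(kr) on a grid."""
    k = omega / v
    x = np.arange(-half_len, half_len + 1) * dx
    r = np.sqrt(x ** 2 + dz ** 2)
    cosphi = dz / r
    return -0.5j * k * cosphi * hankel2(1, k * r) * dx

def extrapolate_step(U_w, dx, dz, v, omega):
    """Convolve a monochromatic wavefield slice with the W operator."""
    W = wx_operator(dx, dz, v, omega)
    return np.convolve(U_w, W, mode="same")

# Toy usage: a monochromatic point response at one depth level.
omega = 2 * np.pi * 25.0
U_w = np.zeros(201, dtype=complex); U_w[100] = 1.0
print(np.abs(extrapolate_step(U_w, dx=10.0, dz=10.0, v=2000.0, omega=omega)).max())
```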
18.2.4 Phase shift migration
In Section 18.2.3, a procedure was given to migrate single seismic records by inverse extrapolation of the reflected wave fields according to formula (18.2.11). Now if we assume no lateral variations in velocity, then the convolution (18.2.11) may be written as a multiplication by transforming it to the spatial Fourier domain:

$$U(k_x, z_{i-1}, \omega) = e^{-jk_z \Delta z}\, U(k_x, z_i, \omega), \qquad (18.2.13)$$

where $k_z = \sqrt{k^2 - k_x^2}$, $k^2 = (2\omega/v_i)^2$. From expression (18.2.13) it follows that inverse extrapolation of zero-offset data can be formulated as

$$U(k_x, z_i, \omega) = e^{+jk_z \Delta z}\, U(k_x, z_{i-1}, \omega), \qquad (18.2.14)$$

where i runs from 0 to the maximum depth. Imaging at each depth level occurs by adding all frequency components and by inverse Fourier transformation:

$$U^-(k_x, z_i, t = 0) = \frac{1}{\pi}\, \mathrm{Re} \int_{\omega_{\min}}^{\omega_{\max}} U^-(k_x, z_{i-1}, \omega)\, d\omega, \qquad R(x, z_i) = \frac{1}{2\pi} \int_{-k}^{+k} U^-(k_x, z_i, t = 0)\, dk_x.$$
This is the so-called "phase shift migration". A serious disadvantage of this method is that lateral velocity variations along the zero-offset section cannot be included. For laterally variant media, we can apply the Phase Shift Plus Interpolation (PSPI) method. It will offer a better image; however, it is much slower than Kirchhoff migration because of its frequent forward and inverse Fourier transformations between the space and wavenumber domains.
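The complete phase-shift loop is compact enough to sketch directly. The following minimal version (ours; zero-offset data under the exploding-reflector convention, with grid sizes and velocity assumed) migrates a section by alternating imaging and downward continuation:

```python
# Minimal phase-shift migration sketch for a zero-offset section (constant v).
# Assumptions: exploding-reflector convention (velocity halved), imaging by
# summing all frequency components at each depth step.
import numpy as np

def phase_shift_migrate(data, dt, dx, dz, nz, v):
    """data: (nt, nx) zero-offset section -> (nz, nx) migrated image."""
    nt, nx = data.shape
    U = np.fft.fft2(data)                       # to (omega, kx)
    w = 2 * np.pi * np.fft.fftfreq(nt, dt)      # angular frequency axis
    kx = 2 * np.pi * np.fft.fftfreq(nx, dx)
    ve = v / 2.0                                # exploding-reflector velocity
    kz2 = (w[:, None] / ve) ** 2 - kx[None, :] ** 2
    kz = np.sqrt(np.maximum(kz2, 0.0))          # evanescent part zeroed below
    prop = np.exp(1j * np.sign(w)[:, None] * kz * dz) * (kz2 > 0)
    image = np.zeros((nz, nx))
    for iz in range(nz):
        image[iz] = np.real(np.fft.ifft(U.sum(axis=0) / nt))  # image at t = 0
        U = U * prop                            # downward continuation
    return image

# Toy usage: a single diffractor response collapses toward a point.
data = np.zeros((256, 128)); data[80, 64] = 1.0
img = phase_shift_migrate(data, dt=0.004, dx=12.5, dz=10.0, nz=100, v=2000.0)
print(img.shape)
```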
18.2.5 Stolt migration
If the medium velocity is constant, migration can be expressed as a direct mapping from temporal frequency ω to vertical wavenumber k_z [72]. The equation for Stolt mapping is

$$U(k_x, k_z, t = 0) = \frac{v}{2} \frac{k_z}{\sqrt{k_x^2 + k_z^2}}\; U\!\left(k_x, z = 0, \omega = \frac{v}{2}\sqrt{k_x^2 + k_z^2}\right), \qquad (18.2.15)$$

where U(k_x, z = 0, ω) is the zero-offset section and U(k_x, k_z, t = 0) is the migrated section in the frequency-wavenumber domain. Note that Stolt migration involves, first, mapping ω to k_z for a specific k_x by using

$$\omega = \frac{v}{2}\sqrt{k_x^2 + k_z^2}. \qquad (18.2.16)$$

The output of the mapping is then scaled by the quantity

$$S = \frac{v}{2} \frac{k_z}{\sqrt{k_x^2 + k_z^2}}. \qquad (18.2.17)$$
Stolt migration and phase shift migration are both f − k domain migrations, but Stolt migration requires a stricter assumption: constant velocity in the whole medium. Therefore, proceeding similarly to the steps in the phase shift method, but with a constant velocity, it can accomplish the migration in a single pass. Hence, it is much quicker than the phase shift method. However, it cannot give a correct image for an inhomogeneous medium. Stolt's algorithm for constant velocity thus involves the following steps (see the sketch after Figure 18.1):
1) Start with the input wavefield U(x, z = 0, t) approximated by the CMP stack, and apply a 2-D Fourier transform to get U(k_x, z = 0, ω).
2) Map the wavefield from ω to k_z using the dispersion relation given by (18.2.16).
3) Apply the scaling factor S of equation (18.2.17) as part of the mapping procedure.
4) Invoke the imaging principle by setting t = 0 and obtain U(k_x, k_z, t = 0).
5) Finally, apply a 2-D inverse transform to get the migrated section U(x, z, t = 0).
It may be questionable whether the constant-velocity Stolt method has value on its own as a practical migration algorithm. Nevertheless, Stolt's method can be used efficiently to perform a constant-velocity migration as the first step in a residual migration scheme. Additionally, the method constitutes an essential procedural step for migration velocity analysis. Stolt extended his method to handle velocity variations. For the variable velocity case, Stolt's extension consists of:
(a) modifying the input wavefield to make it appear as if it were the response of a constant-velocity earth;
(b) applying the constant-velocity algorithm (see Figure 18.1); and
(c) reversing the original modification of the input wavefield.
Figure 18.1. Stolt’s constant velocity migration in f − k domain.
This modification is essentially a type of stretching of the time axis to make the reflection times approximately equivalent to those recorded for a constant velocity earth. The nature of the stretching is described by the stretch factor $W$; the constant velocity case is equivalent to $W = 1$. Note that the phase-shift and Stolt migration outputs normally are displayed in two-way vertical zero-offset time $\tau = 2z/v$, as are the outputs from the finite-difference and Kirchhoff migrations. In practice, mapping in the $f$-$k$ domain really is from $\omega$-$k_x$ to $\omega_\tau$-$k_x$ rather than to $k_z$-$k_x$, where $\omega_\tau$ is the Fourier dual of $\tau$ and is simply the $k_z$ of equation (18.2.17) scaled by $v/2$:
$$\omega_\tau = \omega\sqrt{1 - \left(\frac{v k_x}{2\omega}\right)^2}. \qquad (18.2.18)$$
One important concept must be pointed out from equation (18.2.18). Note that for a constant $k_x$, $\omega_\tau < \omega$; thus, migration shifts the bandwidth to lower frequencies. This is analogous to the conclusion derived in relation to the NMO correction, since the latter also causes data stretching to lower frequencies. The implication of equation (18.2.18) is demonstrated by the dipping events model (see [72]), where events with different dips that have the same bandwidth before migration will have different bandwidths after migration.
18.2.6
Reverse time migration
Another migration method, known as reverse time migration, extrapolates an initially zero $x$-$z$ plane backward in time, bringing in the seismic data $U(x, z=0, t)$ as a boundary condition at $z = 0$ at each time step to compute snapshots of the $x$-$z$ plane at different times. At time $t = 0$, this $x$-$z$ plane contains the migration result $U(x, z, t=0)$. The algorithmic process of reverse time migration starts with the $x$-$t$ section at the surface $z = 0$ (see Figure 18.2). Also consider an $x$-$z$ frame at $t_{\max}$. This frame is blank except for the first row, which is equal to the bottom row of the $x$-$t$ section at $t_{\max}$. Extrapolate this snapshot at $t = t_{\max}$ to $t = t_{\max} - \Delta t$ by using the phase-shift operator $\exp(i\omega\Delta t)$. This yields a new snapshot of the $x$-$z$ frame at $t = t_{\max} - \Delta t$. The first row of numbers in this frame is identical to the row of the $x$-$t$ plane (the original unmigrated section) at $t = t_{\max} - \Delta t$. Hence, replace the first row of the snapshot at $t = t_{\max} - \Delta t$ with the row of the $x$-$t$ section at $t = t_{\max} - \Delta t$ and continue the extrapolation back in time. The last snapshot, at $t = 0$, represents the final migrated section.
Figure 18.2. The process of reverse time migration.
Reverse time migration is mainly based on reverse time wavefield extrapolation and an imaging condition. It involves the following steps: (a) extrapolate the source wavefield and save the result; (b) reverse-extrapolate the receiver wavefield from the last sample point on the time axis; (c) for each depth, apply the imaging condition to sum up the wavefields and obtain the final result. For the forward and reverse wavefield extrapolation, see Section 18.2.2. Imaging conditions widely used at present include: (a) the equality of the arrival time of the upgoing wave and the departure time of the downgoing wave, proposed by Claerbout in [10]; (b) the inter-relationship in
the frequency-space domain; (c) the ratio of the amplitude of the upgoing wave to that of the downgoing wave. A schematic code sketch of the backward extrapolation with a correlation-type imaging condition follows.
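The sketch below illustrates steps (a)-(c) with a second-order finite-difference acoustic propagator and a zero-lag cross-correlation imaging condition; it is a simplified stand-in for the imaging conditions listed above (no absorbing boundaries, scalar medium), and all array names are assumptions of this example.

```python
import numpy as np

def rtm_image(src_wavefield, rec_data, v, dx, dt):
    """Reverse time migration sketch: back-propagate the receiver data while
    cross-correlating with saved source-wavefield snapshots at each step."""
    nt, nz, nx = src_wavefield.shape
    image = np.zeros((nz, nx))
    u_prev = np.zeros((nz, nx))
    u_curr = np.zeros((nz, nx))
    c2 = (v * dt / dx) ** 2
    for it in range(nt - 1, -1, -1):
        # time-reversed acoustic update (5-point Laplacian)
        lap = (np.roll(u_curr, 1, 0) + np.roll(u_curr, -1, 0) +
               np.roll(u_curr, 1, 1) + np.roll(u_curr, -1, 1) - 4.0 * u_curr)
        u_next = 2.0 * u_curr - u_prev + c2 * lap
        u_next[0, :] = rec_data[it, :]       # inject recorded data at z = 0
        u_prev, u_curr = u_curr, u_next
        # zero-lag cross-correlation imaging condition
        image += src_wavefield[it] * u_curr
    return image
```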
18.2.7
Gaussian beam migration
If we choose different Green's functions, we can obtain different images, as the following formulas show:
$$U(x_A, y_A, z_A, \omega) = -\frac{1}{4\pi}\iint G(r_A, r_s, \omega)\, U(r_d, r_s, \omega)\,\frac{\partial G(r_A, r_d, \omega)}{\partial n}\, dx_d\, dy_d, \quad \text{(Prestack)} \qquad (18.2.19)$$
$$U(x_A, y_A, z_A, \omega) = -\frac{1}{4\pi}\iint U(x, y, z=0, \omega)\,\frac{\partial G(r_A, r, \omega)}{\partial n}\, dx\, dy. \quad \text{(Poststack)} \qquad (18.2.20)$$
If we replace the Green's function in the above two formulas by the sum of Gaussian beams,
$$G(r, r', \omega) \approx \frac{i\omega}{2\pi}\iint \frac{u_{GB}(r, r', p', \omega)}{p'_z}\, dp'_x\, dp'_y, \qquad (18.2.21)$$
where $u_{GB}(r, r', p', \omega)$ is some kind of Gaussian beam approximation, then the scheme which combines (18.2.19) or (18.2.20) with (18.2.21) is called Gaussian beam migration [22, 23]. The construction of $u_{GB}(r, r', p', \omega)$ is described in [41]. One of the choices is as follows [41]:
$$u_{GB}(r, r', p', \omega) = \frac{\mathrm{const}}{a_0}\left(\frac{1}{\rho_0 a_0^2 \det Q}\right)^{1/2} \mathbf{t}\, \exp\!\left[i\omega\left(\tau_0 + \frac{1}{2}\sum_{j,k=1}^{2}\Gamma_{jk}\, q_j q_k\right)\right], \qquad (18.2.22)$$
where $\mathbf{t} = dr_0/ds$ is the unit vector tangent to the central ray, and by $a_0$ and $\rho_0$ we denote the velocity of P-waves $a$ and the density $\rho$ calculated on the central ray, respectively;
$$Q = \begin{pmatrix} q_1^{(1)} & q_1^{(2)} \\ q_2^{(1)} & q_2^{(2)} \end{pmatrix},$$
which can be solved using ray tracing; $\tau_0 + \frac{1}{2}\sum_{j,k=1}^{2}\Gamma_{jk}\, q_j q_k$ is the power series expansion of the solution of the corresponding eikonal equation; and $q_i$ ($i = 1, 2$) is the parameter characterizing the ray path coordinate system.
18.2.8
Interferometric migration
Interferometry originated in optics, where the interference of light waves was used to assess the optical properties of an object. Seismic interferometric migration originated with Claerbout (1968), who suggested that ghost reflections
could be used to image subsurface reflectivity by imaging correlograms. Since then there have been rapid developments in seismic interferometry [14]. Wapenaar and Fokkema (2006) mathematically demonstrated applications of crosscorrelation for reconstructing the Green's function using the wavefield reciprocity theorem in a lossless, three-dimensional inhomogeneous medium. Schuster et al. (2004) demonstrated that the crosscorrelations of seismic signals from both active and passive sources at the surface or in the subsurface can be used to reconstruct a valid model of the subsurface. A complete proceedings volume on the topic "Seismic Interferometry: History and Present Status" was published by the Society of Exploration Geophysicists (SEG) in 2008 [70]. Seismic interferometry consists of simple crosscorrelation and stacking of actual receiver responses to approximate the impulse response as if a virtual source were placed at the location of the applicable receiver [14]. Using the crosscorrelation formula, we can define the crosscorrelation of seismic data
$$\phi(r_{g1}, r_{g2}, t) = \sum_{s} d(r_{g1}, r_s, t) \otimes d(r_{g2}, r_s, t), \qquad (18.2.23)$$
where $\phi$ represents the crosscorrelation graph, and $d(r_{g1}, r_s, t)$ and $d(r_{g2}, r_s, t)$ represent the seismic records for the source at $r_s$ and receivers at $r_{g1}$ and $r_{g2}$, respectively. The Green's functions $G(r_0|r_A)$ and $G(r_0|r_B)$ are defined to satisfy the Helmholtz equations
$$(\nabla^2 + k^2)\, G(r_0|r_A) = -\delta(r_0 - r_A), \qquad (\nabla^2 + k^2)\, G(r_0|r_B)^* = -\delta(r_0 - r_B), \qquad (18.2.24)$$
where $G(r_0|r_B)^*$ is the adjoint of $G(r_0|r_B)$ and $k = \frac{\omega}{v(r)}$. Based on the reciprocity theorem and using basic differential calculations, we obtain the following crosscorrelation equation
$$G(r_B|r_A) - G(r_A|r_B)^* = \oint_{\partial S}\left[\frac{\partial G(r_0|r_A)}{\partial n_r}\, G(r_0|r_B)^* - G(r_0|r_A)\,\frac{\partial G(r_0|r_B)^*}{\partial n_r}\right] d^2 S_r, \qquad (18.2.25)$$
where the integration is over the boundary $\partial S$ of the volume enclosed by $S$, $n_r$ denotes the normal vector pointing outward from the boundary, and $S_r$ represents the points on the surface of $S$. For a VSP acquisition geometry at a free surface $S_0$ and an underlying acoustic medium of arbitrary velocity enclosed by $S_\infty$ and $S_{\mathrm{well}}$, the vertical seismic profile data can be transformed to the surface seismic profile (SSP) data:
$$2i\,\mathrm{Im}[G(r_B|r_A)] = \oint_{\partial S}\left[\frac{\partial G(r_0|r_A)}{\partial n_r}\, G(r_B|r_0)^* - G(r_0|r_A)\,\frac{\partial G(r_B|r_0)^*}{\partial n_r}\right] d^2 S_r. \qquad (18.2.26)$$
In the far field approximation, the gradient of the Green's function can be approximated by
$$\frac{\partial G(r_0|r_A)}{\partial n_r} \approx ik\, G(r_0|r_A) \qquad (18.2.27)$$
and
$$\frac{\partial G(r_B|r_0)^*}{\partial n_r} \approx -ik\, G(r_0|r_B)^*. \qquad (18.2.28)$$
Hence the interferometric migration formula can be written as
$$m(r_0) = k\omega^2 \sum_{r_A \in S_{\mathrm{well}}} \sum_{r_B \in S_{\mathrm{well}}} \oint_{\partial S} G(r_B|r_0)\, G(r_B|r_0)^*\, G(r_0|r_A)\, G(r_A|r_0)^*\, d^2 S_r \qquad (18.2.29)$$
or
$$m(r_0) = k\omega^2 \sum_{r_A \in S_{\mathrm{well}}} \sum_{r_B \in S_{\mathrm{well}}} G(r_B|r_0)^*\, G(r_0|r_A)^* \oint_{\partial S} G(r_B|r_0)\, G(r_A|r_0)^*\, d^2 S_r. \qquad (18.2.30)$$
This indicates that the seismic records can be written as
$$\Phi(r_B, r_A) = k \sum_{r_0 \in S_{\mathrm{well}}} G(r_0|r_B)\, G(r_0|r_A)^*, \qquad (18.2.31)$$
which states that the seismic records could be generated with the virtual source at $A$ and the receivers at $B$. Realistic algorithms based on wave equation interferometric migration and diffraction stack migration are given in the recent works [21] and [73].
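The core of equation (18.2.23), correlate and stack, is straightforward to express in code. Below is a small numpy sketch turning one receiver into a virtual source; the data layout (sources by receivers by time) and the function name are assumptions of this illustration.

```python
import numpy as np

def virtual_source_gather(data, ig_virtual):
    """Interferometric crosscorrelation, eq. (18.2.23): correlate the trace at
    a chosen receiver with all traces and stack over sources, so that the
    chosen receiver acts as a virtual source. data has shape (ns, ng, nt)."""
    ns, ng, nt = data.shape
    phi = np.zeros((ng, 2 * nt - 1))          # correlograms, lags -nt+1..nt-1
    for s in range(ns):
        ref = data[s, ig_virtual]             # d(r_g1, r_s, t)
        for g in range(ng):
            phi[g] += np.correlate(data[s, g], ref, mode="full")
    return phi
```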
18.2.9
Ray tracing
Ray theory, which is analogous to optical ray theory, has been applied for over 100 years to seismic data processing. It continues to be used extensively today, owing to its simplicity and applicability to a wide range of problems. Ray theory is intuitively easy to understand, simple to program, and very efficient. Compared to more complete solutions, it is relatively straightforward to generalize to three-dimensional velocity models. In this section, we will be concerned with the timing and amplitudes of seismic arrivals. This narrow focus is nonetheless very useful for many problems. The theoretical basis for much of ray theory is derived from the eikonal equation (18.2.32) and the transport equation (18.2.34) [59]. Let us assume an isotropic medium with smooth velocity heterogeneities with respect to the wavelength of the signal we want to propagate. Let us attach a Cartesian reference frame (0,x,y,z) to this medium. Consider, at a given time t, a set of particles, at position x = (x,y,z), vibrating in phase on a smooth surface. We call this surface a wavefront. Particles on this wavefront have the same traveltime, T(x) = T0 . As time increases, the wavefront moves locally at speed c(x) and the gradient ∇T(x) is orthogonal to the wavefront (Figure 18.3). Although the wavefront moves in one direction, local properties do not allow us
to detect that direction, which must be known from the previous position of the wavefront. Therefore, we must consider the square of the gradient, which gives us the eikonal equation [59]
$$(\nabla T)^2(x) = \frac{1}{c^2(x)}. \qquad (18.2.32)$$
The gradient is called the slowness vector. In isotropic media, curves orthogonal to the wavefronts, that is, parallel to the slowness vector field, can be defined. We call them rays. Rays are very useful trajectories for calculating not only traveltimes but also amplitude variations. Indeed, the vibration energy moves along the ray tube without any energy leaking (Figure 18.3). The energy over a small volume of length $\Delta T_1$, crossing the wavefront $T_1$ through the surface $dS_1$, is related to the square of the amplitude $A_1$ through the expression $d\varepsilon_1 = A_1^2\, dS_1\, \Delta T_1$. The energy should be preserved at the wavefront $T_2$, which gives the following conservation of flux:
$$A^2(x_1)\,\frac{dS(x_1)}{c(x_1)} = A^2(x_2)\,\frac{dS(x_2)}{c(x_2)}, \qquad (18.2.33)$$
where two different points along the ray tube are denoted by $x_1$ and $x_2$, and $dS$ denotes the surface of the elementary orthogonal cross-sections of the ray tube. From this equation, applying the divergence theorem over an infinitesimal volume, we can obtain the local transport equation
$$2\nabla A(x) \cdot \nabla T(x) + A(x)\,\nabla^2 T(x) = 0. \qquad (18.2.34)$$
Figure 18.3. Ray tube geometry: rays are orthogonal to wavefronts in an isotropic medium. Energy flows along rays. The local energy is preserved over an infinitesimal volume controlled by the local velocity.

The eikonal and transport equations are fundamental ingredients of ray theory and highlight required properties such as smooth spatial continuity of the wavefront as
well as amplitude conservation along ray tubes. Unfortunately, failures of such properties occur quite often in the Earth [9] (interfaces with sharp discontinuities of media properties, shadow zones where no rays enter, caustics where rays cross each other, strong velocity gradients, etc.). We shall describe efforts to overcome these difficulties by introducing a more rigorous framework behind this intuitive ray concept.

Both ray and paraxial ray equations, which are ODEs, can be solved numerically using Runge-Kutta (RK) schemes of specified order or Predictor-Corrector (PC) schemes. Initial value problems, where both the initial position and the initial slowness (i.e., six initial values) are specified, can be solved efficiently. Because we have a first integral of these equations, namely the Hamiltonian/eikonal value, we may avoid Gear-like methods and check the accuracy at each integration step by estimating the eikonal constant. Of course, if we use the reduced Hamiltonian system, we lose this first integral, at the benefit of fewer variables to integrate. Although PC schemes are far superior to RK schemes (only one spatial derivative to be estimated at each integration step, whatever the order of precision), in seismology we rely essentially on the latter because they are easy to implement thanks to their self-starting property. Paraxial equations are linear and can be solved with significantly coarser integration steps than the ray equations. They may be solved in a second step once rays have been found, and may well compete with integral formulations of propagator matrices. Other strategies based on analytical solutions inside each cell of a mesh have been considered, although they are efficient only when a simple coarse discretization is performed.

In the frame of ray tracing by rays, we may consider three kinds of methods for making the ray converge to the receiver. The shooting method corrects an initial value problem until the ray hits the receiver; estimating the new initial slowness relies on many different numerical strategies linked to root solvers, and the paraxial ray yields the Newton procedure. The bending method deforms an already-specified curve connecting the source and the receiver: the search is essentially performed in the spatial domain, where each node along the trajectory is perturbed until we may consider the curve a ray. The continuation method is based on rays connecting the source and the receiver for rather simple velocity structures for which one may easily find the connecting ray; the velocity field is then deformed until it matches the true velocity field. These tracing techniques can be quite inefficient if one has to span the entire medium with many receivers.

Another strategy is ray tracing by wavefronts, where we perform ray tracing from an already evaluated wavefront. When the ray density becomes too high, we may decimate rays; when it becomes too low, we may increase their number. Adaptive techniques have been
developed, and paraxial information may help such a strategy by fixing the ray density according to the ray curvature in phase space. Of course, the entire medium is sampled, which may be quite an intensive task compared to the 1-D sampling performed by a single ray. Unfortunately, for multi-valued estimations of traveltime and amplitude this is at present the only solution, while for first-arrival computations other efficient numerical techniques exist. Finally, we want to indicate that ray theory also has several important limitations. It is only a high-frequency approximation, which may fail at long periods or within steep velocity gradients, and it does not easily predict any "non-geometrical" effects, such as head waves or diffracted waves. The ray geometries must be completely specified, making it difficult to study the effects of reverberation and resonance due to multiple reflections within a layer.
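Since the kinematic ray equations form a small ODE system, the RK integration mentioned above fits in a few lines. The sketch below traces a single ray in 2-D with classical RK4; the linear velocity model and the step settings are illustrative assumptions, and the eikonal constant $|p| - 1/c$ can serve as the accuracy check described above.

```python
import numpy as np

def trace_ray(x0, p0, c, grad_c, ds=10.0, nsteps=500):
    """Integrate the 2-D kinematic ray equations with classical RK4:
        dx/ds = c(x) p,    dp/ds = -grad c(x) / c(x)**2,
    where p is the slowness vector (|p| = 1/c along the ray), s is arclength."""
    def rhs(y):
        x, p = y[:2], y[2:]
        return np.concatenate([c(x) * p, -grad_c(x) / c(x) ** 2])

    y = np.concatenate([np.asarray(x0, float), np.asarray(p0, float)])
    path = [y[:2].copy()]
    for _ in range(nsteps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * ds * k1)
        k3 = rhs(y + 0.5 * ds * k2)
        k4 = rhs(y + ds * k3)
        y = y + ds / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        path.append(y[:2].copy())
    return np.array(path)

# example: linear gradient c(x, z) = 2000 + 0.5 z, take-off angle 30 degrees
c = lambda x: 2000.0 + 0.5 * x[1]
grad_c = lambda x: np.array([0.0, 0.5])
s0 = 1.0 / c((0.0, 0.0))
ray = trace_ray((0.0, 0.0), (s0 * np.sin(np.pi / 6), s0 * np.cos(np.pi / 6)), c, grad_c)
```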
18.3
Seismic Migration and Inversion
We have mentioned many migration methods in the previous sections. However, these traditional seismic migration methods yield an image that carries valid information only about the positions of geological structures. To get an image with high resolution and true amplitude, we turn to inversion migration and regularized migration imaging in the following subsections. Migration only solves the first part of the inverse problem, by considering traveltime to be the only important parameter [7]. The reflector is imaged, in the sense that its position and shape are more correctly represented, but there is no attempt to recover information about the material parameters of the subsurface. This difference in approach represents the major distinction between "migration" and "inversion". To the early migrator, discussions of true amplitudes were moot because of the difficulties of controlling the source and of calibrating the seismometers. Therefore, in the mind of the early migrator, the output of a migration procedure was a processed seismic section, as opposed to a subsurface parameter image. Consequently, early digital migration schemes were not consciously designed to deal with the issue of true amplitude recovery. All that changed in the early 1970s, when the technique of identifying gas-bearing strata by apparent high-amplitude bright spots on seismic sections was established. Current interest in amplitude-versus-offset (AVO) measurements for the determination of specific reservoir characteristics has provided further incentive for true amplitude recovery. The distinction between migration and inversion has blurred in recent years as the more modern approaches to migration do attempt to address the amplitude issue. This change was not as difficult as might be thought, thanks to the serendipitous discovery that relative amplitudes are handled correctly, in an
inversion sense, by some migration algorithms. This happened because using the wave equation to directly handle the traveltimes has the by-product of handling the amplitudes more correctly as well.

Migration and inversion is a new concept. It aims at finding an exact inverse extrapolation operator at the least cost. According to (18.2.5), how to construct an inverse extrapolation operator $F$ is a very important problem. In the following, we use the matrix expression of (18.2.5), $U(z_{i-1}) = L \cdot U(z_i)$. In geophysics, this is a typical ill-posed problem and the propagation matrix $L$ is singular: the inverse $L^{-1}$ cannot be determined. However, there are several alternatives of practical interest:

(a) Inversion of the significant values in the eigenvalue spectrum of the propagation matrix $L$:
$$F = Y \Lambda_c^{-1} X^T, \qquad (18.3.1)$$
where $L = X \Lambda Y^T$. This is called the singular value decomposition (SVD) of $L$ (Lanczos, 1961). Matrix $\Lambda$ is a diagonal matrix containing the eigenvalues of $L$, and matrix $\Lambda_c^{-1}$ contains the inverted eigenvalues of $\Lambda$ which exceed a pre-specified threshold: all eigenvalues smaller than this threshold are set to zero. Hence in (18.3.1) the very small eigenvalues are not inverted but set to zero.

(b) Least squares inversion:
$$F = (L^T L)^{-1} L^T. \qquad (18.3.2)$$
We must point out the drawbacks of this operation [12]: (1) $L^T L$ is much more ill-posed than $L$, thus $(L^T L)^{-1}$ is highly sensitive to perturbations in the data; (2) huge computational cost.

(c) Matched inversion:
$$F = L^T. \qquad (18.3.3)$$
Note that matched inversion is significantly simpler than least squares inversion. However, inversion according to (18.3.3) does not take into account the influence of coherent noise and neglects amplitude errors due to losses and truncation. In practical situations inversion according to (18.3.3) is almost always used because of its simplicity.

(d) Tikhonov regularization migration:
$$F = (L^T L + \alpha R)^{-1} L^T, \qquad (18.3.4)$$
where $R$ represents the normalized spatial autocorrelation matrix of the noise and $\alpha$ equals the signal-to-noise ratio for one temporal frequency component.
For white noise, $R = I$. After the inverse extrapolation operator $F$ has been constructed, the remaining steps output the refined result and are the same as in other migration methods. For inversion migration, developed methods include inversion based on the resolution function, CG-based least squares migration deconvolution, nonstationary matching filter methods and regularized seismic inversion imaging [64]. For general geophysical inversion, many methods have been developed, e.g., the damped least squares method, the preconditioned conjugate gradient method, the restarted conjugate gradient method based on the discrepancy principle, trust region algorithms, Bayesian inference and statistical inversion techniques, regularization methods based on a priori information in Sobolev space, the singular value decomposition method, least squares and various iterative implementations, the mollification method and Backus-Gilbert inversion. Some of these methods were developed in seismic migration imaging. However, the convergence rate of these methods for inversion migration may be slow. Wang et al. studied regularizing methods for inversion migration, discussed the ill-posed nature of the inverse problem in seismic migration and the limitation of the least squares migration in computation, and proposed a regularized hybrid gradient method for the computation of inversion migration [64, 65].
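To make the four alternatives concrete, the toy sketch below builds each operator for a small explicit matrix $L$; the threshold, $\alpha$, and the white-noise default $R = I$ are illustrative choices, and forming these matrices explicitly is of course feasible only at toy scale.

```python
import numpy as np

def inverse_extrapolators(L, alpha=1e-2, threshold=1e-6, R=None):
    """Operators (18.3.1)-(18.3.4) for a small, square propagation matrix L."""
    n = L.shape[1]
    if R is None:
        R = np.eye(n)                        # white noise case
    # (a) truncated SVD: invert only singular values above the threshold
    X, s, Yt = np.linalg.svd(L)
    s_inv = np.where(s > threshold, 1.0 / s, 0.0)
    F_svd = Yt.T @ np.diag(s_inv) @ X.T
    # (b) least squares: (L^T L)^{-1} L^T, unstable for ill-conditioned L
    F_ls = np.linalg.solve(L.T @ L, L.T)
    # (c) matched inversion: simply the transpose (adjoint)
    F_matched = L.T
    # (d) Tikhonov regularization: (L^T L + alpha R)^{-1} L^T
    F_tik = np.linalg.solve(L.T @ L + alpha * R, L.T)
    return F_svd, F_ls, F_matched, F_tik
```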
18.3.1
The forward model
Before showing the inversion formulas, we first note that some formulas in seismic inversion will be similar to those used in Fourier-based migration, although our method of derivation is totally different from that used in seismic migration. Migration and inversion have the same basis: the wave equation. We will begin from its associated Helmholtz equation (18.2.2), and the derivation in the following will be quite different from that in seismic migration. First, we suppose that the wave-speed profile $v(r)$ can be represented as a perturbation from a background profile $c(r)$. While there are many ways to represent such a small deviation, our choice will be constrained by a desire to preserve the form of the Helmholtz equation. To do this, we use an expression of the form
$$\frac{1}{v^2(r)} = \frac{1}{c^2(r)}\,(1 + \alpha(r)). \qquad (18.3.5)$$
For the present discussion, $\alpha(r)$ will be assumed to always be "small" when compared to other relevant quantities in the problem. Then, because the wave speed structure of the medium has been represented as a reference profile plus a perturbation, a similar representation of the wave field $U(r, r_s, \omega)$ is also appropriate. It is proper, therefore, to think of $U(r, r_s, \omega)$ as being made up of a reference field $U_I(r, r_s, \omega)$ (incident field), which would
be present in the absence of the perturbation, plus $U_S(r, r_s, \omega)$ (scattered field), which represents the departure from $U_I(r, r_s, \omega)$ due to the presence of the perturbation $\alpha(r)$. The expression for this decomposition of the total field,
(18.3.6)
is analogous to the wave-speed perturbation expression (18.3.5). An advantage of using this formal decomposition of the field is apparent, because we see that the Helmholtz equation (18.2.2) may be rewritten as the sum of two Helmholtz equations. We require that incident field UI (r, rs , ω) be a solution of the problem ∇2 UI (r, rs , ω) +
ω2 UI (r, rs , ω) = −δ(r − rs ). c2 (r)
(18.3.7)
We then substitute the wave-speed decomposition (18.3.5), the formal field decomposition (18.3.6) and the equation describing the incident field (18.3.7) into (18.2.2), to obtain the Helmholtz equation ∇2 US (r, rs , ω) +
ω2 c2 (r)
US (r, rs , ω) = −
ω2 c2 (r)
α(r)[UI (r, rs , ω) + US (r, rs , ω)].
(18.3.8) Because the Green’s function of the Helmholtz equation posed in terms of c(r) is known or can be approximated, we can use it and denote it by g(r, rg , ω). Therefore, g(r, rg , ω) will satisfy the following equation ∇2 g(r, rg , ω) +
ω2 g(r, rg , ω) = −δ(r − rg ). c2 (r)
(18.3.9)
Compare (18.3.7) and (18.3.9), we immediately know UI (r, rs , ω) = g(r, rs , ω). Equations (18.3.8) and (18.3.9) can be solved using Green’s theorem to create an integral equation for US (r, rs , ω), that is α(r) 2 US (rg , rs , ω) = ω [UI (r, rs , ω) + US (r, rs , ω)]g(r, rg , ω)dr. (18.3.10) c2 (r) Ω
So far, we have built up a forward model which relates the observations of the scattered field US (rg , rs , ω) at rg to the interior values of that unknown field US (r, rs , ω) and the unknown perturbation α(r). The goal of seismic inversion is to find out α(r). We say that the integral equation (18.3.10) is nonlinear because it has a term that contains the product of the unknown field US (r, rs , ω) and the perturbation α(r). This introduces a difficulty because, in the inverse problem, α(r) is the unknown that we seek. An important approach to solving such nonlinear
456
18 Seismic Migration and Inversion
problems is to find a “nearby” linear problem that we can solve. This solution is then viewed as a first approximation-subject to correction-of a solution to the nonlinear problem. The common method for finding such a nearby linear solution is to linearize the problem. Here “linearization” means removing the product α(r)US (r, rs , ω) from the right side of equation (18.3.10). If a justification for ignoring this product can be found, then the linearization can be accomplished. Fortunately, it is reasonable to assume that if α(r) were “small” then US (r, rs , ω) would also be “small”. Hence, the product α(r)US (r, rs , ω) appearing under the integral in (18.3.10) should be significantly smaller than the product α(r)UI (r, rs , ω). This means (18.3.10) can be approximately rewritten to be α(r) UI (r, rs , ω)g(r, rg , ω)dr. (18.3.11) US (rg , rs , ω) = ω 2 c2 (r) Ω
Bearing in mind that UI (r, rs , ω) = g(r, rs , ω), therefore US (rg , rs , ω) = ω 2
Ω
α(r) g(r, rs , ω)g(r, rg , ω)dr. c2 (r)
(18.3.12)
The linearization performed here is often called the “Born approximation” by physicists. (18.3.12) is therefore often called the Born modeling formula.
18.3.2
Migration deconvolution
Migration deconvolution is based on Born approximation (18.3.12) and far field assumption. If we use the Green’s function in exponential form, we can apply inverse Fourier transform to get the solution. Standard seismic migration methods give distorted images of the subsurface, even with an accurate velocity model because of limitations in bandwidth, recording time, and aperture of the seismic-reflection experiment [65]. Application of inverse methods can improve the resolution of seismic images by compensating for these distortions. Migration deconvolution is one possible approach because it uses knowledge of the resolution kernel of the seismic experiment to compensate for the effects mentioned. However, it is well known that deconvolution approaches to this problem are ill-posed, rendering the results sensitive to noise. Least-squares migration, particularly with the addition of regularization, is another promising approach to improve the resolution of migrated images. The key to successful, iterative least-squares migration is the optimization strategy. Because least-squares migration has a cost approximately equal to two migration applications per iteration, it is essential to control the number of iterations required to reach a satisfactory image. We have adapted a gradient-descent optimization technique described by Barzilai
18.3 Seismic Migration and Inversion
457
and Borwein to least-squares seismic migration. One key feature of this approach, as opposed to the typical steepest-descent or conjugate-gradient (CG) approaches, is that it does not have monotonic convergence. The advantage of relaxing the requirement that each iteration must have a lower residual than the previous step is that the method can converge to a lower overall value of the objective function in less overall iteration. This is important for least-squares migration because one wishes to keep the overall number of iterations as small as possible. In [45, 46] and [24], the term migration deconvolution describes inverting the blurring operator from the migration image and thus creating a sharper image of the reflectivity. Nemeth et al. (1999) show that by modeling and migration of synthetic data, we can understand better the impact of the noise (e.g., recording footprint). For example, one might specify source-receiver geometry and a reflectivity model m, compute synthetic seismic data d using some modeling operator L, and then migrate these data to obtain the migrated section mmig [13, 45, 46, 24, 25, 73]: mmig = L∗ d = L∗ Lm, where mmig is the blurred migration image of the reflectivity; L∗ L is the integral blurring kernel operator, which is represented by a so-called resolution function (or point-spread function); and the adjoint operator L∗ is the integral migration operator. A more general blurring equation is given in [18], where they formulate the same problem in different form. The resolution function is important for optimization of survey planning, i.e., defining aperture, sampling, and location. It also can be used to guide the selection of migration parameters (frequency band, frequency sampling, shot/receiver sampling, image sampling) for a well-resolved image [30]. However, the inverse of the operator L∗ L should be only approximated [20, 58]. More sophisticated technologies can be referred to [65].
18.3.3
Regularization model
We consider the minimization of a “regularized functional”, i.e., 1 1 J α [m] := Lm − d2 + αDm2 → min, 2 2
(18.3.13)
where “:=” means “defined by”, D is a (semi-)positive definite bounded scale operator, and α ∈ (0, 1) is the regularization parameter. A very simple steepest descent method can be applied to solve equation (18.3.13), i.e., mk+1 = mk + ωk sk ,
(18.3.14)
where sk = −gk , gk is the gradient of J α and ωk is the stepsize which can be obtained by line search. If we set α = 0 and restrict the stepsize ωk to a constant
458
18 Seismic Migration and Inversion
in the interval (0, L−2 ) in each iteration, we obtain a special gradient method which is known as the Landweber iteration [61]. However these methods are slow in convergence and are difficult to be used on practical problems. One may have noticed that the standard migration is just the one step of steepest descent iteration if we set m0 = 0, α = 0 and choose the stepsize ω as unity. It is well known that one step of gradient iteration is generally far from convergence, hence the resolution of the standard migration imaging is low.
18.3.4
Solving methods based on optimization
Iterative regularization methods are more adaptable for large scale problems in applied sciences. The well-know iterative regularization methods are FridmanLandweber iterative methods [28, 68]. And many optimization methods can be regularized for solving practical inverse problems, e.g., regularizing GaussNewton method, trust region method, steepest descent method and conjugate gradient method [60, 61]. We recall that the gradient-based method is simple in iteration and low memory because of the second-order information of the objective function of the regularized functional is not required. Therefore, we pay attention to the gradient method and study the improvement of its performance for geophysical exploration problems in this paper. Gradient descent methods In the following, we assume that the problem is formulated in finite spaces and accordingly the adjoint of the operator is transferred to transpose of the operator. There are several highly cited gradient methods in the literature. Perhaps the simplest and the easiest one is the steepest descent method mk+1 = mk + ωk sk ,
(18.3.15)
where sk = −gk , gk = g(mk ), g(m) is the gradient of the function J α [m] and ωk the steplength which can be obtained by line search, i.e., optimal ωk∗ satisfies ωk∗ = argminω J α (mk + ωsk ), where the notation argmin denotes minimizing a function with specific argument. However, the steepest descent method is slow in convergence and zigzagging after several iterations [75]. The poor behavior is due to the optimal choice of the step size and not to the choice of the steepest descent direction gk . If we restrict the steplength ωk to a constant value in the interval (0, L−2 ) at each iteration, we obtain a special gradient method which is known as the aforementioned Fridman-Landweber iteration mk+1 = mk + ωsk .
(18.3.16)
18.3 Seismic Migration and Inversion
459
However this method is quite slow in convergence and is difficult to be used for most of practical problems [28, 68, 61]. The poor behavior lies in that both the step size ω and the gradient descent direction gk are not optimizing. Instead of using the negative gradient in each iteration, the non-monotone gradient methods were developed recently [16, 15, 76], meanwhile the BarzilaiBorwein method was one of the most well-known. This method was first proposed for solving the unconstrained optimization problem [4]. In this paper, we propose to apply the nonmonotone gradient method to ill-posed seismic signal retrieval problems by solving the following minimization problem min J α [m].
(18.3.17)
The non-monotone gradient method is aimed to accelerate the convergence of the steepest descent method and requires few storage locations and inexpensive computations. Barzilai and Borwein incorporated the quasi-Newton property with the gradient method for obtaining the second order information of the objective function J α [m]. Specifically, they approximated the Hessian ∇2 J α [mk ] by νk I and based on the secant condition, they considered two minimization problems νk = argminν yk−1 − νIzk−1 and νk = argminν νIyk−1 − zk−1 , where yk−1 = gk − gk−1 and zk−1 = mk − mk−1 . This leads to the two choices of the stepsize νk (gk−1 , gk−1 ) (18.3.18) νk1 = (gk−1 , Agk−1 ) and νk2 =
(gk−1 , Agk−1 ) , (gk−1 , AT Agk−1 )
(18.3.19)
where A = LT L + αDT D. It is evident that the two choices of stepsizes inherit the information of the gradient information from the former iteration instead of the current iteration. Hence the stepsizes inherit the regularized spectrum of the regularization model. Therefore we believe that this method is very efficient for solving ill-posed convex quadratic programming problem [62]. Let {mk } be the sequence generated by the above method from initial vectors m0 and m1 . Then the gradient of the object function J α [m] at mk is gk = Amk − b, where A is mentioned above and b = LT d. We have for all k ≥ 1, gk+1 = νk (
1 I − A)gk . νk
(18.3.20)
To analyze the convergence of the Barzilai-Borwein method, we can assume without loss of generality that an orthogonal transformation is made that transforms A to a diagonal matrix of eigenvalues diag(λi ). Moreover, if there are any eigenvalues of multiplicity M > 1, then we can choose the corresponding (i) eigenvectors so that gk = 0 for at least M − 1 corresponding indices of gk .
460
18 Seismic Migration and Inversion
It follows from equation (18.3.20) and using spectrum representation A = diag(λi ) that 1 (i) (i) gk+1 = νk ( − λi )gk . (18.3.21) νk Using the recurrence, Barzilai and Borwein prove an R-superlinear convergence result for the particular choice of the stepsize νk . Note that from (18.3.18) and (18.3.19) the inverse of the scalar νk is the Rayleigh quotient of A or AT A at the vector gk−1 . More choices for the steplengths can be obtained by setting a hybrid of the two stepsizes, e.g., the mean values of any two Rayleigh ratios, and the convergence properties were obtained [77]. We consider a linear combination of the two stepsizes, i.e., νkRayleigh = β1
(gk−1 , gk−1 ) (gk−1 , Agk−1 ) + β2 , (gk−1 , Agk−1 ) (gk−1 , AT Agk−1 )
(18.3.22)
where β1 and β2 are two positive parameters. An investigation of Eqs. (18.3.18)– (18.3.19) reveals that Eq. (18.3.18) is better than Eq. (18.3.19), since νk1 has small jumps than νk2 has when A is ill-conditioning. However, there is no reason that one should discard the stepsize given by νk2 . To make a trade-off, we choose the parameter β2 geometrically, i.e., β2 = β0 ξ k−1 , β0 > 0, ξ ∈ (0, 1), k = 1, 2, . . . .
(18.3.23)
Another parameter β1 can be set to 1 − β2 . This choice of parameters assigns more weights to νk1 than νk2 , however both step information inherits into the next iteration. In iterative regularization methods, the stopping rule is also an important issue. Since the noise for practical seismic data is hard to be identified, it is infeasible to use the commonly adopted discrepancy principle as the stopping rule [68]. Therefore, for gradient descent methods, the stoping rule is based on the values of the norm of the gradient gk . We preassign a tolerance > 0. Once gk ≤ is reached, the iterative process will be stopped. Smaller yields better approximate solution, however, induces more CPU time. Empirical values of is in the interval (10−3 , 10−5 ). It can be shown that the iteration points generated by the above method converge to the minimal solution of J α [m] [67]. Omitting proof details, we have the following theorem. Theorem 18.3.1. Let J α [m] be given in (18.3.13) with Ω[m] = 12 Dm2 and D a positive definite bounded scale operator and let {mk } be generated by the above nonmonotone gradient method in Rayleigh type with stepsize νk satisfying (18.3.22). Then the sequence {mk } converges to the minimal solution of J α [m].
18.3 Seismic Migration and Inversion
461
Barzilai and Borwein proved an R-superlinear convergence result for the particular choice of the stepsize νk . Since their method is a special case of choosing the stepsize νk with formula (18.3.22), therefore, it is ready to see that the nonmonotone gradient method in Rayleigh type is R-superlinear convergent. We have the following results. Theorem 18.3.2. Let J α [m] be given in (18.3.13) with Ω[m] = 12 Dm2 and D a positive definite bounded scale operator and let {mk } be generated by the above nonmonotone gradient method in Rayleigh type with stepsize νk satisfying (18.3.22). Then the sequence {mk } converges to the minimal solution of J α [m] with R-superlinear convergence. Proof. When β1 = 0 or β2 = 0, the Rayleigh type method reduces to the Barzilai and Borwein’s method. Since the Barzilai and Borwein’s algorithm is R-superlinear convergent, hence the algorithm of Rayleigh type is R-superlinear convergent. Remark 18.3.3. One may argue that why the non-monotone gradient method works well for ill-posed inverse problems. It is easy to see that νk1 ≥ νk2 . Therefore, it would be favorable to choose a longer step νk1 instead of νk2 . However, the shorter step νk2 yields a smaller gk+1 , which indicates that the shorter step νk2 would be efficient for obtaining an accurate solution of a large scale and ill-conditioned problem. It is clear that the choice of νkRayleigh possesses both the suitable length of the step and the reasonable approximation of an accurate solution. So, it is no doubt that proposed method works efficiently for seismic inversion and imaging. Conjugate gradient method Conjugate gradient method is known as a stable method and possesses the termination property for quadratic programming. Different from steepest descent method, this method relies also conjugate descent direction hk−1 . Omitting tedious deduction, we write the iterative formulae as mk+1 = mk + ν˜k hk , gkT hk , hTk Ahk hk = −gk + βk hk−1 , gk+1 = gk − ν˜k Ahk , ν˜k =
βk =
(18.3.24)
T gk+1 gk+1 . gkT gk
With proper stopping tolerance, it can be prove that the conjugate gradient method is convergent for ill-posed inverse problem [68].
462
18.3.5
18 Seismic Migration and Inversion
Preconditioning
The convergence property of iterative gradient methods depends on the conditioning of the operators. Preconditioning is a technique to improve convergence by lowering the condition number or increasing the eigenvalue clustering. This technique applied to gradient descent methods has been considered sufficiently in literature, e.g., [27, 36, 42, 19]. The idea is to solve a modified problem P −1 LT Lm = P −1 LT d,
(18.3.25)
where P is a symmetric positive definite matrix. If the condition number of P −1 LT L is less than that of LT L or the eigenvalues of P −1 LT L are more clustered than that of LT L, a higher rate of convergence will be reached. Let C be a nonsingular matrix and define the factorization of P as P = CC T , then solving (18.3.25) is equivalent to solving C −1 LT LC −T z = C −1 LT d,
(18.3.26)
z = C m.
(18.3.27)
T
Note that minimization of the objective function in equation (18.3.13) is equivalent to minimizing a quadratic programming problem 1 Qα [m] = mT (LT L + αDT D)m − dT Lm, 2
(18.3.28)
therefore the preconditioning problem can be written as minimizing a new quadratic programming problem 1 ˜ − b˜ T z, Q˜ α [z] = z T Az 2
(18.3.29)
where A˜ = C −1 (LT L + αDT D)C −T , b˜ = C −1 LT d and z = C T m. The gradient of Q˜ α can be evaluated as ˜ − b˜ = C −1 (LT L + αDT D)C −T z − C −1 LT d. g(z) ˜ = Az
(18.3.30)
The iterations of the steepest descent method are described by ˜ ˜ k − b, g˜k = Az
(18.3.31)
zk+1 = zk − νk g˜k , g˜ T g˜k νk = Tk . g˜k A˜ g˜k
(18.3.32) (18.3.33)
18.3 Seismic Migration and Inversion
463
Straightforward calculation yields the equivalent iterative formula mk+1 = mk − ν˜k hk , g T hk ν˜k = T T k , hk (L L + αDT D)hk
(18.3.34)
hk = P −1 gk ,
(18.3.36)
gk+1 = gk − ν˜k (L L + αD D)hk . T
T
(18.3.35)
(18.3.37)
Similarly, the iterations of the non-monotone gradient method are described by mk+1 = mk − ν˜k hk , ⎧ 1 ⎨ ν˜k , or ν˜k = ν˜k2 , or ⎩ Rayleigh ν˜k ,
(18.3.38)
hk = P −1 gk ,
(18.3.40)
gk+1 = gk − ν˜k (L L + αD D)hk , T
where ν˜k1 = ν˜k2 =
T
(18.3.39)
(18.3.41)
T h gk−1 k−1 , T T hk−1 (L L + αDT D)hk−1
hTk−1 (LT L + αDT D)hk−1 hTk−1 (LT L + αDT D)P −1 (LT L + αDT D)hk−1
Rayleigh
and ν˜k is the linear combinations of ν˜k1 and ν˜k2 . For any positive definite bounded scale operator D, J α is strictly convex. Since P is nonsingular matrix, therefore similarly to Theorem 18.3.1, we can establish the following convergence results of the preconditioning non-monotone gradient method. Theorem 18.3.4. Let J α [m] be given in (18.3.13) with Ω[m] = 12 Dm2 and D a positive definite bounded scale operator and let {mk } be generated by the preconditioning nonmonotone gradient method. Then the sequence {mk } converges to the minimal solution of J α [m]. Similarly, we can establish the preconditioning conjugate gradient method. The formulae reads as mk+1 = mk + ν˜k hk , gkT hk , hTk Ahk hk = −P −1 gk + βk hk−1 , gk+1 = gk − ν˜k Ahk , ν˜k =
βk =
T P −1 gk+1 gk+1 . gkT gk
(18.3.42)
464
18.3.6
18 Seismic Migration and Inversion
Preconditioners
Many preconditioners have been developed since the resuscitation of the conjugate gradient method. Widely referred preconditioners are Jacobian, GaussSeidel’s and incomplete factorization LU proconditioners [61]. The Jacobian preconditioner uses the diagonal of LT L + αDT D and has been shown to be useful if the diagonal elements are relatively different. The Gauss-Seidel’s preconditioner originates from Gauss-Seidel’s iterative method for solving linear matrix-vector equations. Incomplete factorization preconditioning uses an approximation to LT L+αDT D which is easy to invert. These preconditioners are efficient for well-posed linear problems. However for ill-posed problems, their advantages are not so obvious. We apply a symmetric successive over relaxation preconditioner. We assume that the matrix S = LT L + αDT D can be decomposed as S = M − N, where M=
N=
1 [(K − ωCl )K −1 (K − ωCu )], ω(2 − ω)
1 [(1 − ω)K + ωCl ]K −1 [(1 − ω)K + ωCu ], ω(2 − ω)
and K, Cl and Cu are the diagonal, lower triangular parts and upper triangular parts of S, respectively. Then we choose P as P = (K − ωCl )K −1 (K − ωCu ), where ω is a real scalar parameter within (0, 2). The optimal choice of the ω is not easy to do, as it requires very complicated eigenvalue analysis. And in many cases, such eigenvalue analysis becomes unavailable because the matrix L is not explicitly given, e.g., seismological problems, and it has to be estimated iteratively. In the current paper, we only approximate it by the degree of ill-posedness of the problem. According to Perron-Frobenius Theorem (see any textbook on matrix theory), the spectrum radius of S denoted by ρ(S) is greater than 0. It is ready ρ(S −1 N ) to see that ρ(M −1 N ) = 1+ρ(S Since M −1 S = I − M −1 N and −1 N ) < 1. −1 −1 0 < ρ(M N ) < 1, hence 0 < ρ(M S) < 1. So, the above choice of the preconditioner P is sufficient to guarantee the acceleration of convergence of the iterative methods.
18.4 Illustrative Examples
18.4 18.4.1
465
Illustrative Examples Regularized migration inversion for point diffraction scatterers
We perform experiments on a six-point scatterers diffraction model. These scatterer models are buried at a depth of 625m to 1625m. A source wavelet with central frequency 20 Hz and time sample rate 2 milliseconds is used to generate the data. We assume that 75 receivers are uniformly distributed on a survey line with maximum length of 1875m. The sampling interval of the survey line and the depth gridpoint spacing are both 25m. This yields the grid dimensions of the reflectivity model are 1875 × 1875 points. The background velocity is homogeneous with c = 2000m/s, and the time sampling interval is dt = 2ms. First, we assume that the reflectors are well separated, i.e., the reflectors are in a large distances in offset. The seismogram, the standard migration and gradient descent migration deconvolution images are illustrated in Figures 18.4, 18.5 and 18.6, respectively. Next, we assume that the reflectors are near from each other, i.e., the reflectors are in a small distances in offset. The seismogram, the standard migration and gradient descent migration deconvolution images are illustrated in Figures 18.7, 18.8 and 18.9, respectively. It is clear from the illustrations that the non-monotone gradient descent method yields better resolution imaging results than the standard migration tool. Comparison of the Figures 18.5, 18.6, 18.8 and 18.9, it reveals that the non-monotone gradient descent method possesses better resolution ability than the standard migration method. Figure 18.10 plots the least squares errors of the solution by the non-
Figure 18.4. Seismic data of point diffraction scatterers with large gap.
466
18 Seismic Migration and Inversion
Figure 18.5. The standard migration image for the data in Figure 18.4.
Figure 18.6. The gradient descent migration deconvolution image for the data in Figure 18.4.
Figure 18.7. Seismic data of point diffraction scatterers with small gap.
18.4 Illustrative Examples
467
Figure 18.8. The standard migration image for the data in Figure 18.7.
Figure 18.9. The gradient descent migration deconvolution image for the data in Figure 18.7.
Figure 18.10. Error plot of the least squares errors of the gradient descent method.
468
18 Seismic Migration and Inversion
monotone gradient descent method at each iteration. The nonmonotonicity and speediness of the method are clearly illustrated.
18.4.2
Comparison with the interferometric migration
We consider a layered model to make a comparison of the regularized migration inversion imaging and the interferometric migration. The velocity model is shown in Figure 18.11. In simulation, 10 receivers are placed in the vertical well, and 80 shots are evenly deployed on the surface. Since the seismic data are usually perturbed with different kinds of noise, therefore, we assume an additive random noise with noise level 0.05 being added to the synthetic data. The seismic records are shown in Figure 18.12. The random noise is shown in Figure 18.13. The standard migration, interferometric migration and migration inversion using the preconditioning conjugate gradient method are compared. The results are shown in Figures 18.14–18.16, respectively. It reveals that both the migration inversion and the interferometric migration can be capable of enlarging the imaging area and improving the resolution, whereas the standard migration cannot well control the noise propagation.
Figure 18.11. Velocity model.
18.5
Conclusion
In this chapter, we first make a brief review of migration methods developed in literature. Then we focus on the seismic migration and inversion. In particular, we develop a non-monotone gradient method and applied it to least-square
18.5 Conclusion
Figure 18.12. Noisy vertical seismic profile.
Figure 18.13. The random noise with level equaling 0.05.
Figure 18.14. Standard migration results.
469
470
18 Seismic Migration and Inversion
Figure 18.15. Interferometric migration results.
Figure 18.16. Regularizing migration and inversion results.
migration. Meanwhile, we introduce a regularization technique to reduce the ill-posedness and design a preconditioner to speed up the convergence. We have proved that the method converges with R-superlinear rate. Numerical examples demonstrate that the method is efficient and can improve the resolution of the inversion results. Since interferometric migration using the ghost reflections could be used to image subsurface reflectivity with enhanced resolution, we also make a comparison of these two methods. It seems that the migration and inversion imaging technique yields much higher resolution than the interferometric migration.
References
471
Acknowledgements This work is supported by National Natural Science Foundation of China under grant numbers 10871191, 40974075 and Knowledge Innovation Programs of Chinese Academy of Sciences KZCX2-YW-QN107.
References [1] K. Aki and P. G. Richards, Quantitative Seismology: Theory and Methods, W. H. Freeman and Company, San Francisco, 1980. [2] G. Backus and J. Gilbert, Numerical applications of a formalism for geophysical inverse problems, Geophys. J. R. Astron. Soc., 13, 247-276, 1967. [3] G. Backus and J. Gilbert, Numerical applications of a formalism for geophysical inverse problems, Geophys. J. R. Astron. Soc., 16, 169-205. 1968. [4] J. Barzilai and J. Borwein, Two-point step size gradient methods, IMA Journal of Numerical Analysis, 8, 141-148, 1988. [5] E. Baysal, Reverse time migration, Geophysics, 11, 1514-1524, 1983. [6] A. J. Berkhout, Seismic migratoion: Imaging of Acoustic Energy by Wave Field Extrapolation, Elsevier, New York, 1985. [7] N. Bleistein, J. K. Cohen and John W. Jr. Stockwell, Mathematics of multidimensional seismic imaging, migration and inversion, Springer, New York, 2000. [8] D. Bevc, F. Ortigosa, A. Guitton and B. Kaelin, Next generation seismic imaging: high fidelity algorithms and high-end computing, AGU General Assembly, Acapulo, Mexico, 2007. [9] V. Cerveny, The application of ray tracing to the numerical modeling of seismic wavefields in complex structures, Seismic shear waves (Part A: Theory), Editor: G. Dohr, Number 15 in Handbook of Geophysical Exploration, Geophysical Press, 1-124, 1985. [10] J. F. Claerbout, Toward a unified theory of reflector mapping, Geophysics, 36(3), 467-481, 1971. [11] J. F. Claerbout, Downward continuation of move-out-corrected seismograms, Geophysics, 37(5), 741-768, 1972. [12] J. F. Claerbout, Synthesis of a layered medium from its acoustic transmission response, Geophysics, 32, 264-269, 1968. [13] J. F. Claerbout, Imaging the Earth’s Interior, Blackwell, 1985. [14] A. Curtis, P. Gerstoft, H. Sato, R. Snieder and K. Wapenaar, Seismic interferometry turning noise into signal, The Leading Edge, 25, 1082-1092, 2006. [15] Y. H. Dai, On the nonmonotone line search, Journal of Optimization Theory and Applications, 112, 315-330, 2002. [16] R. Fletcher, On the Barzilai-Borwein Method, University of Dundee Report NA/207, 2001. [17] J. Gazdag, Wave equation migration with the Phase-shift Method, Geophysics, 43(7), 1342-1351, 1978. [18] L.-J. Gelius, I. Lecomte and H. Tabti, Analysis of the resolution function in seismic prestack depth imaging, Geophysical Prospecting, 50, 505-515, 2002. [19] A. Greenbaum, Iterative Methods for Solving Linear Systems, SIAM, Philadelphia, 1997.
472
18 Seismic Migration and Inversion
[20] A. Guitton, Amplitude and kinematic corrections of migrated images for nonunitary imaging operators, Geophysics, 69, 1017-1024, 2004. [21] R. Q. He, B. Hornby and G. Schuster, 3D wave-equation interferometric migration of VSP free-surface multiples, Geophysics, 72(5), S195-S203, 2007. [22] N. R. Hill, Gaussian beam migration, Geophysics, 55(11), 1416-1428, 1990. [23] N. R. Hill, Prestack Gaussian-beam depth migration, Geophysics, 66(4), 12401250, 2001. [24] J. X. Hu and G. T. Schuster, Migration deconvolution, Mathematical Methods in Geophysical Imaging Conference, SPIE, Proceedings, 118-124, 1998. [25] J. X. Hu, G. T. Schuster and P. A. Valasek, Poststack migration deconvolution, Geophysics, 66, 939-952, 2001. [26] D. R. Judson, Depth migration after stack, Geophysics, 45(3), 361-375, 1980. [27] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, SIAM, Philadelphia, 1995. [28] A. Kirsch, An Introduction to the Mathematical Theory of Inverse Problems, Spring-Verlag, New York, 1996. [29] C. Lanczos, Linear Differential Operators, Van Nostrand, New York. [30] I. Lecomte and L.-J. Gelius, Have a look at the resolution of prestack depth migration for any model, survey and wavefields: Expanded Abstracts, SEG 68th Annual Meeting, SP 2.3, 1998. [31] K. Levenberg, A method for the solution of certain nonlinear problems in least squares, Quart. Appl. Math., 2, 164-166, 1944. [32] H. Liu and L. He, Pseudo-differential operator and inverse scattering of multidimensional wave equation, Optimization and Regularization for Computational Inverse Problems and Applications, Editors: Y. F. Wang, A. Yagola and C. C. Yang, Berlin: Springer-Verlag, 301-325, 2011. [33] G. F. Margrave, M. P. Lamoureux, J. P. Grossman and V. Iliescu, Gabor deconvolution of seismic data for source waveform and Q correction, SEG Extended Abstracts, 2190-2193, 2002. [34] D. W. Marquardt, An algorithm for least-squares estimation of nonlinear inequalities, SIAM J. Appl. Math., 11, 431-441, 1963. [35] G. A. Mcmechan, Migration by extrapolation of time-dependent boundary values, Geophysics Prospecting, 31(3), 413-420, 1983. [36] B. Molina and M. Raydan, Preconditioned Barzilai-Borwein method for the numerical solution of partial differential equations, Numerical Algorithms, 13, 45-60, 1996. [37] D. A. Murio, The Mollification Method and the Numerical Solution of Ill-posed Problems, John Wiley and Sons, New York, 1993. [38] T. Nemeth, C. J. Wu and G. T. Schuster, Least-squares migration of incomplete reflection data, Geophysics, 64(1), 208-221, 1999. [39] G. Nolet, Solving or resolving inadequate noisy tomographic systems, J. Comp. Phys., 61, 463-482, 1985. [40] G. Nolet and R. Snieder, Solving large linear inverse problems by projection, Geophys. J. Int., 103, 565-568, 1990. [41] M. M. Popov, Ray theory and Gaussian beam method for geophysicists, EDUFBA, Salvador, 2002. [42] Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM, Philadelphia, 2000.
References
473
[43] M. D. Sacchi, J. Wang and H. Kuehl, Regularized migration/inversion: new generation of imaging algoritihms, CSEG Recorder, 31 (Special Edition), 54-59, 2006. [44] W. A. Schneider, Integral formulation for migration in two and three dimensions, Geophysics, 43(1), 49-76, 1978. [45] G. T. Schuster, Acquisition foot print removal by least-squares migration, Utah Tomography and Modeling/Migration Research Report, University of Utah, 1997a. [46] G. T. Schuster, Green’s functions for migration, 67th Annual International Meeting, SEG, Expanded Abstracts, 1754-1758, 1997b. [47] G. T. Schuster, L. Katz, F. Followill and J. Yu, Autocorrelogram migration: theory, Geophysics, 68, 1685-1694, 2003. [48] G. Schuster, J. Yu, J. Sheng and J. Rickett, Interferometric/daylight seismic imaging, Geophysical Journal International, 157, 838-852, 2004. [49] T. Sjøberg, L.-J. Gelius and I. Lecomte, 2D deconvolution of seismic image blur, Expanded Abstracts, SEG 73rd Annual Meeting, Dallas, 2003. [50] R. H. Stolt, Migration by Fourier transform, Geophysics, 43(1), 23-48, 1978. [51] A. Tarantola, Inverse Problems Theory: Methods for Data Fitting and Model Parameter Estimation, Elsevier, Amsterdam, 1987. [52] A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-posed Problems, John Wiley and Sons, New York, 1977. [53] J. Trampert and J. J. Leveque, Simultaneous iterative reconstruction technique: physical interpretation based on the generalized least squares solution, Journal of Geophysical Research, 95, 12553-12559, 1990. [54] S. Treitel and L. R. Lines, Past, present, and future of geophysical inversion-a new millennium analysis, Geophysics, 66, 21-24, 2001. [55] T. J. Ulrych, M. D. Sacchi and M. Grau, Signal and noise separation: art and science, Geophysics, 64, 1648-1656, 1999. [56] T. J. Ulrych, M. D. Sacchi and A. Woodbury, A Bayesian tour to inversion, Geophysics, 66, 55-69, 2000. [57] J. C. VanDecar and R. Snieder, Obtaining smooth solutions to large linear inverse problems, Geophysics, 59, 818-829, 1994. [58] A. Valenciano, B. Biondi and A. Guitton, Target oriented wave-equation inversion, Geophysics, 71, A35-A38, 2006. [59] B. B. Wafik and H. K. Timothy, The paraxial ray method, Geophysics, 52(12), 1639-1653, 1987. [60] Y. F. Wang and Y. X. Yuan, Convergence and regularity of trust region methods for nonlinear ill-posed inverse problems, Inverse Problems, 21, 821-838, 2005. [61] Y. F. Wang, Computational Methods for Inverse Problems and Their Applications, Higher Education Press, Beijing, 2007. [62] Y. F. Wang and S. Q. Ma, Projected Barzilai-Borwein methods for large scale nonnegative image restorations, Inverse Problems in Science and Engineering, 15(6), 559-583, 2007. [63] Y. F. Wang, C. C. Yang and X. W. Li, A regularizing kernel-based brdf model inversion method for ill-posed land surface parameter retrieval using smoothness constraint, Journal of Geophysical Research, 113, D13101, 2008. [64] Y. F. Wang, C. C. Yang and Q. L. Duan, On iterative regularization methods for seismic migration inversion imaging, Chinese Journal of Geophysics, 52(3), 1615-1624, 2009.
474
18 Seismic Migration and Inversion
[65] Y. F. Wang and C. C. Yang, Accelerating migration deconvolution using a nonmonotone gradient method, Geophysics, 75(4), S131-S137, 2010. [66] Y. F. Wang, I. E. Stepanova, V. N. Titarenko and A. G. Yagola, Inverse Problems in Geophysics and Solution Methods, Higher Eduacaiton Press, Beijing, 2011. [67] Y. F. Wang, Preconditioning non-monotone gradient methods for retrieval of seismic reflection signals, Advances in Computational Mathematics, DOI: 10.1007/s10444-011-9207-2, 2012. [68] T. Y. Xiao, S. G. Yu and Y. F. Wang, Numerical Methods for Solution of Inverse Problems, Science Press, Beijing, 2003. [69] K. Wapenaar and J. Fokkema, Green’s function representations for seismic interferometry, Geophysics, 71(4), SI33-SI46, 2006. [70] K. Wapenaar, D. Draganov and J. O. A. Robertsson, Seismic Interferometry: History and Present Status, Society of Exploration Geophysicists, Tulsa, Oklahoma, 2008. [71] N. D. Whitmore, Iterative depth migration by backward time propagation, 53rd Annual International Meeting, SEG, Expanded Abstacts, 382-385, 1983. [72] O. Yilmaz, Seismic data analysis: Processing, Inversion and Analysis of Seismic Data, 2 Vols., Society of Exploration Geophysicists, Tulsa, 2001. [73] J. Yu and G. T. Schuster, Crosscorrelogram migration of inverse vertical seismic profile data, Geophysics, 71, S1-S11, 2006. [74] J. H. Yu, J. X. Hu, G. T. Schuster and R. Estill, Prestack migration deconvolution, Geophysics, 71(2), S53-S62, 2006. [75] Y. X. Yuan, Numerical Methods for Nonliear Programming, Shanghai Science and Technology Publication, Shanghai, 1993. [76] Y. X. Yuan, Step-sizes for the gradient method, Proceedings of the Third International Congress of Chinese Mathematicians, AMS/IP Studies in Advanced Mathematics, 785-796, 2008. [77] Y. X. Yuan, Gradient methods for large scale convex quadratic functions, Optimization and Regularization for Computational Inverse Problems & Applications, Editors: Y. F. Wang, A. Yagola and C. C. Yang, Berlin/Beijing: SpringerVerlag/Higher Education Press, 141-155, 2010.
Authors Information Yanfei Wang† , Zhenhua Li†‡ and Changchun Yang† †
Key Laboratory of Petroleum Resources Research, Institute of Geology and Geophysics, Chinese Academy of Sciences, Beijing 100029, P. R. China. E-mail: [email protected]
‡
Graduate University of Chinese Academy of Sciences, Beijing 100049, P. R. China.
Chapter 19
Seismic Wavefields Interpolation Based on Sparse Regularization and Compressive Sensing Y. F. Wang, J. J. Cao, T. Sun and C. C. Yang
Abstract. Due to the influence of variations in landform, geophysical data acquisition is usually sub-sampled. Reconstruction of the seismic wavefield from sub-sampled data is an ill-posed inverse problem. It usually requires some regularization techniques to tackle the ill-posedness and provide a stable approximation to the true solution. In this review chapter, we consider the wavefield reconstruction problem as a compressive sensing problem. We solve the problem by constructing different kinds of regularization models and study sparse optimization methods for solving the regularization model. The lp -lq model with p = 2 and q = 0, 1 is fully studied. The Projected gradient descent method, linear programming method, alternating direction method and an l1 norm constrained trust region method are developed to solve the compressive sensing problem. Numerical results demonstrate that the developed approaches are robust in solving the ill-posed compressive sensing problem and can greatly improve the quality of wavefield recovery.
19.1
Introduction
In seismology, the process of acquisition records the continuous wavefield which is generated by the source. In order to restore the seismic data correctly, the acquisition should satisfy the Nyquist/Shannon sampling theorem, i.e., the sampling frequency should be at least twice of the maximum frequency of original signal. In seismic acquisition, because of the influence of obstacles at land surface, rivers, bad receivers, noise, acquisition aperture, restriction of topography and investment, the obtained data usually does not satisfy the sampling theorem. A direct effect of the limitations of acquisition is the sub-sampled data will generate aliasing in the frequency domain; therefore, it may affect the subsequent processing such as filtering, de-noising, AVO (amplitude versus off-
476
19 Seismic Wavefields Interpolation
set) analysis, multiple eliminating and migration imaging. In order to remove the influence of sub-sampled data, the seismic data restoration/interpolation technique is often used. However, this is an ill-posed problem due to the fact that seismic data are usually band-limited and the inversion process is underdetermined, i.e., the dimension of the solution space is infinite [26, 54, 56, 9]. The model of seismic acquisition can be written as Lm = d,
(19.1.1)
where L ∈ RM ×N is the forward modeling/sampling operator, m ∈ RN is the reflectivity model or the wavefield, and d ∈ RM denotes the sampled data (observations). The restoration problem is to solve m from knowledge of L and d, thus it is an inverse problem. A problem is called well-posed if the solution of (19.1.1) exists, is unique and continuous. If any one of these conditions is violated, the problem is ill-posed. Equation (19.1.1) can be solved by finding a least squares solution. However, spectrum analysis reveals that this solution is unstable and physically meaningless, sometimes [50]. Seismic interpolation is a main technique for wavefield restoration. For seismic interpolation problem, let us denote by m the original seismic wavefield, d the sampled data, and L the sampling operator, the expression again can be written as (19.1.1). Our purpose is to restore m from the sampled data d. Since d is usually incomplete and L is an underdetermined operator, this indicates that there are infinite solutions satisfying the seismic interpolation equation (19.1.1). Hence, seismic data interpolation is an ill-posed inverse problem. Signal acquisition systems based on the Nyquist-Shannon sampling theorem require that the number of samples needed to recover a signal without error is twice the bandwidth. This sampling theorem is hard to be satisfied in practice. As an alternative, compressive sensing (CS) has recently received a lot of attention in the signal and image processing community. Instead of relying on the bandwidth of the signal, the CS uses the basic assumption: sparsity. The sparsity can lead to efficient estimations and compression of the signal via a linear transform, e.g., sine, cosine, wavelet and curvelet transforms [27]. The method involves taking a relatively small number of non-traditional samples in the form of projections of the signal onto random basis elements or random vectors [16, 6, 8]. Therefore, if the signal has a sparse representation on some basis, it is possible to reconstruct the signal using few linear measurements. A popular method for solving ill-posed problems is the Tikhonov regularization strategy, which refers to solving a regularized functional α 1 min Jα (m) = Lm − d22 + Ω(m), 2 2
(19.1.2)
where Ω(·) is the so-called Tikhonov stabilizer, and α ∈ (0, 1) is the regular parameter. We often choose Ω(m) = D1/2 m2 , where D is a positive (semi-)
19.2 Sparse Transforms
477
definite and bounded operator. Therefore, the minimizer of Jα (x) can be given by (19.1.3) m = (L∗ L + αD)−1 L∗ d, i.e., the solution of the so-called Euler equation L∗ Lm + αDm = L∗ d.
(19.1.4)
However, the classical Tikhonov regularization method cannot deal with the sparsity. We must resort to establishing sparse regularization models and developing sparse optimization methods.
19.2
Sparse Transforms
Sparse transform is an important part of compressive sensing. If the coefficients in the transform domain are very sparse, then only small sampling numbers are enough. In the seismic processing, the most commonly used transforms are the Fourier transform, the linear Radon transform, the parabolic Radon transform, the wavelet transform and the curvelet transform. For a signal x in N -dimensional space, we have M observation data di = Ai x, i = 1, 2, . . . , M , where Ai is the impulse response of the i-th sensor. Denote A = [A1 , A2 , . . . , AM ]T , the observation data can be reformulated as d = Ax. The aim of the compressive sensing is to use limited observations di (i = 1, 2, . . . , M ) with M N to restore the input signal x. Suppose x is the original wavefield which can be spanned by a series of orthogonal bases Ψi (t). These bases for all i constitute an orthogonal transform matrix Ψ such that mi Ψi (t), (19.2.1) x(t) = (Ψm)(t) = i
where mi = (x, Ψi ). Using operator expression, m = Ψ∗ x. The vector m is thought of the sparse or compressive expression of the signal x. Let L = AΨ, the reconstruction problem of the sparse signal m reduces to solving a simple problem d = Lm. Note that mi is the weight or coefficient of linear combinations for the signal x, the reconstruction of the signal x in turn becomes to find the coefficient vector m.
19.2.1
Fourier, wavelet, Radon and ridgelet transforms
There are many ways to choose an orthogonal transform matrix based on some orthogonal bases, e.g., sine curve, wavelet, curvelet, contourlet, framelet and
478
19 Seismic Wavefields Interpolation
contourlet, and so forth [58, 27]. The simplest is the Fourier transform, which decomposes a signal f through proper a group of orthogonal bases ψj such that fFrouier = cj ψj = Ψc, (19.2.2) j
where c is the coefficient matrix with entries cj . Wavelet transform is the bases for all sparse transforms. Define t = (t, τ ), the wavelet transform of the signal f(t) is given by ψa1 ,a2 ,b1 ,b2 (t)f(t)dt, (19.2.3) fwt (a1 , a2 , b1 , b2 ) = R2
where ψa1 ,a2 ,b1 ,b2 (t) is the tensor form, which can be written as ψa1 ,a2 ,b1 ,b2 (t) = ψa1 ,b1 (t)ψa2 ,b2 (τ ), 1
where ψa,b (t) = a− 2 ψ( t−b a ) (a > 0, b ∈ R). Wavelet transform relies on proper generating function ψa,b (t), which is a kind of point to point transform. Figure 19.1 shows the wavelet generating function.
Figure 19.1. Wavelet generating function ψa,b (t).
Ridgelet transform is very similar to wavelet transform, and can be expressed by wavelet transform. Ridgelet transform of the signal f(t) is defined by ψa,b,θ (t)f(t)dt, (19.2.4) fRT (a, b, θ) = R2
where a > 0, b ∈ R, θ ∈ [0, 2π) and ψa,b,θ (t) is related by 1 1 ψa,b,θ (t) = a− 2 ψ( (t cos θ + τ sin θ − b)). a
19.2 Sparse Transforms
479
In the transform domain, an inverse transform of fRT (a, b, θ) yields recovery of the signal f(t): f(t) =
0
2π
∞
−∞
0
∞
da dθ fcrt (a, b, θ)ψa,b,θ (t) 3 db . a 4π
(19.2.5)
Ridgelet transform relies on proper generating function ψa,b,θ (t), which is a kind of line transform. Figure 19.2 shows the ridgelet generating function.
Figure 19.2. Ridgelet generating ψa,b,θ (t).
It is well-known that the Radon transform is widely applied in computerized tomography (CT). The Radon transform is a kind of line transform with certain directional orientation. The formula reads as δ(t cos θ + τ sin θ − u)f(t)dt, (19.2.6) fRadon (θ, u) = R2
where δ(·) is the Dirac function. It is evident that the ridgelet transform is actually the wavelet transform of the slice up of the Radon transform, i.e., ψa,b (u)fRadon (θ, u)du. (19.2.7) fRT (a, b, θ) = R2
As we are mainly concerned with new methods to solve the linear system d = Lm in this paper, we could choose a simple wavelet orthogonal bases to form the transform matrix Ψ.
480
19.2.2
19 Seismic Wavefields Interpolation
The curvelet transform
Although applications of wavelets have become increasingly popular in scientific and engineering fields, traditional wavelets perform well only at those points which possess singularities; and the geometric properties, e.g., waveforms of wave fronts in seismic data are not utilized. The linear Radon transform can focus the energy of linear events; the parabolic Radon transform can compress events with parabolic shapes. The ridgelet transform can deal with the linear events, but fail with curve events. Curvelet transform as a multi-scale ridgelet transform, is multi-scale, multi-directional, anisotropic and local [4]. Curves are approximated by piecewise line segments in the curvelet domain; therefore, seismic signals can be sparsely compressed. Similar to the wavelet and ridgelet transforms, curvelet transform can be represented by inner product of the curvelet functions ϕ and the signal f(x) f(x)ϕj,l,k (x)dx, (19.2.8) c(j, l, k) = f, ϕj,l,k = R2
where f ∈ L2 (R2 ), ϕj,l,k denotes curvelet function and ϕ¯ is the conjugate of ϕ, which is indexed by three parameters: a scale parameter 2−j , j ∈ N0 (N0 is the positive integer set), a sequence of rotation angles θj,l = 2πl · 2−j/2 , 0 ≤ (j,l) (k1 2−j , k2 2−j/2 )T , (k1 , k2 ) ∈ Z 2 (Z l ≤ 2−j/2 − 1, and a position xk = Rθ−1 j,l denote the integer set), where cos θj,l sin θj,l Rθj,l = − sin θj,l cos θj,l is the rotation matrix with angle θj,l . Curvelets consist of directional entries ϕj,l,k (x) in fine scales and isotropic entries ϕj0 ,k (x) in coarse scales. In fine scales, the curvelet function can be written as (j,l)
ϕj,l,k (x) = ϕj (Rθj,l (x − xk )),
(19.2.9)
while in coarse scales, the curvelet function can be denoted as ϕj0 ,k (x) = ϕj0 (x − 2−j0 k), which indicates that the curvelets are isotropic in the coarse scale. The function ϕj (x) is the waveform by means of its Fourier transform ϕˆ j (ω), which serves as a “mother” curvelet in the sense that all curvelets at scale 2−j are obtained by rotations and translations of ϕj . Digital implementations of curvelet transform can be outlined as three steps: applying 2D FFT; product with frequency windows; and applying 2D inverse FFT for each window. Through the curvelet transform, the original signals are decomposed into various scales and angles. Its discrete form can be written as c = Sf, where c is a vector denotes the discrete set of curvelet coefficients, f is the discrete form
19.3 Sparse Regularizing Modeling
481
of the data, and S = WF F2 is the curvelet transform matrix, where F2 is the 2D Fourier transform matrix and WF denotes the windowing operator followed by 2D inverse Fourier transform in each scale and in each direction. The computational cost of the forward and inverse curvelet transform is O(N 2 log N ) for an N × N data. We refer to [5, 7] for details of the implementation of the curvelet transform by involving FFT and IFFT. The inverse curvelet transform can be written as f = S ∗ c, where S ∗ denotes the adjoint operator of S. Since seismic data are sparse under the curvelet transform, we could also use it as the sparse transform in this paper.
19.3 19.3.1
Sparse Regularizing Modeling Minimization in l0 space
A natural model to satisfy the sparse solutions of the linear system Lm = d is the equality constrained minimization model with l0 quasi-norm: ml0 −→ min, subject to Lm = d,
(19.3.1)
where · l0 is defined as: xl0 = {num(x = 0), for all x ∈ RN }, where num(x = 0) denotes the cardinality of nonzero components of the vector x. Minimization of xl0 means the number of nonzero components of x to be minimal. It is well known that the minimization of xl0 is an N P -Hard problem, i.e., optimization algorithms solving the l0 minimization problem cannot be finished in polynomial times. This indicates that this model is doomed to be infeasible in practice.
19.3.2
Minimization in l1 space
Because of the numerical infeasibility of the l0 quasi-norm minimization problem, we relax it to solve the approximation model based on l1 norm: ml1 −→ min, subject to Lm = d.
(19.3.2)
The presence of the l1 term encourages small components of m to become exactly zero, thus promoting sparse solutions. Introducing the Lagrangian multiplier λ, equation (19.3.2) is equivalent to the following unconstrained problem Lm − d2l2 + αml1 −→ min .
(19.3.3)
The minimization model based on l1 norm approximates the minimization model based on l0 norm quite well, while the sparsity is retained.
482
19.3.3
19 Seismic Wavefields Interpolation
Minimization in lp -lq space
In [52], the authors proposed a general lp -lq model for solving multi-channel ill-posed image restoration problem, 1 α J α [m] := Lm − dplp + Dmqlq −→ min, 2 2 for p, q ≥ 0 and D a scale operator, (19.3.4) which includes most of the regularization models thus far. Straightforward calculation yields the gradient and Hessian (the matrix of the second-order partial derivatives) of J α [m] as ⎡ ⎡ ⎤ ⎤ |r1 |p−1 sign(r1 ) |m1 |q−1 sign(m1 ) ⎢ |r2 |p−1 sign(r2 ) ⎥ 1 ⎢ |m2 |q−1 sign(m2 ) ⎥ 1 ⎢ ⎢ ⎥ ⎥ gradJ α [m] = pLT ⎢ ⎥ + αqDT ⎢ ⎥ .. .. 2 ⎣ ⎣ ⎦ 2 ⎦ . . |rM |p−1 sign(rM ) |mN |q−1 sign(mN ) (19.3.5) and 1 HessJ α [m] = p(p − 1)LT diag(|r1 |p−2 , |r2 |p−2 , . . . |rM |p−2 )L 2 (19.3.6) 1 + αDT q(q − 1)diag(|m1 |q−2 , |m2 |q−2 , . . . , |mN |q−2 )D, 2 respectively, where r = (r1 , r2 , . . . , rM )T = Lm − d is the residual, sign(·) denotes a function which returns −1, 0, or +1 when the numeric expression value is negative, zero, or positive respectively, diag(v) is the diagonal matrix whose i-th diagonal entry is the same as the i-th component of the vector v. Evidently, when p = 2 and q = 0 or q = 1, the lp -lq model becomes the l0 minimization model or the l1 minimization model, respectively. We remark that the lp -lq regularization model does not require the convexity of the objective function, hence could be used to solve inverse problems with complex structure.
19.4
Brief Review of Previous Methods in Mathematics
Finding the sparse solution of under-determined problems have been studied in many areas, meanwhile the commonly used methods are based on the l1 -norm optimization. In the past few years, many algorithms have been proposed and studied for solving the l1 -norm minimization problems. These methods find the sparse solution through solving min al1 ,
s.t. Aa = d.
(19.4.1)
19.4 Brief Review of Previous Methods in Mathematics
483
This problem can be changed into a linear programming, and then solved by interior point methods [11, 50, 51]. Suppose that the sampling process has additive noise n, we obtain d = As + n; therefore, the corresponding problem becomes (19.4.2) min al1 , s.t. Aa − dl2 < , where is a nonnegative real parameter which control the noise level in the data. This problem can be solved as a second order cone program [9]. Problems (19.4.1) and (19.4.2) are closely related to the following two problems: min Aa − d22 + λal1 and
min Aa − d2l2 ,
s.t. al1 < σ,
(19.4.3) (19.4.4)
where λ is the Lagrange multiplier. With appropriate parameter choices of , λ and σ, the solutions of (19.4.2), (19.4.3) and (19.4.4) coincide, and these problems are in some sense equivalent. For example, the well-known least absolute shrinkage and selection operator (LASSO) approach has the form (19.4.4) with σ ≥ 0 [38]; the basis pursuit denoising (BPDN) approach has the form (19.4.2) with = 0 [11, 41]. Evidently, equations (19.4.3) and (19.4.4) fall into the familiar Tikhonov regularization scheme for solving ill-posed problems [39]. A lot of methods can be used to solve problems (19.4.1)–(19.4.4). The homotopy method was originally designed for solving noisy over-determined l1 penalized least squares problems [35]. It was also used to solve underdetermined problems in [16]. This method for the model problems (19.4.1)–(19.4.4) has been considered in [32, 42]. The least angle regression method as a modification of the homotopy method, considered in [18], investigated solving problem (19.4.4). When the solution is sufficiently sparse, these two methods are more rapid than general-purpose linear programming methods even for the high dimensional cases. If the solution is not rigorously sparse, the homotopy method may fail to find the correct solution in certain scenarios. Another popularly used method is the interior point method. In [11], problems (19.4.1) and (19.4.3) are solved by first reformulating them as “perturbed linear programs”, then applying a standard primal-dual interior point method. The linear equations are solved by iterative methods such as LSQR or conjugate gradient method. Another interior point method given in [29] is based on changing (19.4.3) into a quadratic programming (QP) problem solved by preconditioned conjugate gradient method. Problem (19.4.2) can also be solved by reconsidering it as a second-order cone programming and then applying a log-barrier method, e.g., methods developed in [51, 46] are used to solve the land surface parameter retrieval problems. Other optimization methods based on bounded constraints may also be applied to solve this problem, e.g., [12, 13, 34, 2, 1, 22, 30].
484
19 Seismic Wavefields Interpolation
Iterative soft thresholding methods (IST) were used to solve problem (19.4.3) [20]. But they are sensitive to the initial values and are not always stable. In addition, much iteration is required for convergence. Recently, the spectral gradient-projection method was developed for solving problem (19.4.2) [19]. The method relies on projected gradient descent step (including nonmonotone gradient step [14, 48]) and root-finding of the parameter σ through solving the nonlinear convex, monotone and differential equation Γ(σ) = . This method can handle both noisy and noiseless cases. However, it is clear that the root-finding method is the famous “discrepancy principle” in regularization theory for ill-posed problems [43]. Matching pursuit (MP) and orthogonal matching pursuit (OMP) [15, 17, 40] are also used for sparse inversion. This is known as a kind of greedy approach. The vector b is approximated as a linear combination of a few columns of A, where the active set of columns to be used is built in a greed fashion. At each iteration, a new column is added to the active set. OMP includes an extra orthogonal step which is known to perform better than standard MP. The computational cost is small if the solution is very sparse. However, only small dimensional problems are suitable for these methods. If Ax = b, with x being sparse and the columns of A being sufficiently incoherent, then OMP finds the sparsest solution. It is also robust for small levels of noises [17]. The projected gradient method developed in [21] express (19.4.3) as a nonnegative constraint quadratic program by separating a as a = (a)+ − (−a)+ , where (a)+ = max{a, 0}. However, this method can only tackle real numbers, thus it is unsuitable for curvelet-based seismic data restoration. The iterative re-weighted least squares method developed in [36] changes the weights of entries of a at each iteration. A similar method was proposed in [8]. Methods based on non-convex objectives are studied in [33, 23]. In [33], the l0 quasi-norm is replaced by a non-convex function. However, initial values must be carefully chosen to prevent the local optimal solution. Other methods such as iterative support detection method [59] and fix point method are also developed. These methods have extensive applications in various fields, such as image reconstruction, MRI [31], model selection in regression [18], texture/geometry separation [49] and also in seismic imaging [28, 54, 9].
19.5 Sparse Optimization Methods
19.5 19.5.1
485
Sparse Optimization Methods l0 quasi-norm approximation method
The original problem for l0 minimization is the following equality constrained optimization problem min x0 , s.t. Ax = b. (19.5.1) However, direct solution of (19.5.1) is hard to make because of time consuming. Obviously, solving the true l0 norm optimization is superior to the l1 norm optimization though both methods can yield sparse solutions. We consider approximation of l0 minimization problem. Denote fσ (t) = 1 − exp(−t2 /2σ 2 ) as a function of σ and t. This function satisfies the following properties: (a) fσ (t) is continuous and differentiable; (b) fσ (t) tends to the l0 norm when σ tends to 0, i.e., " 0, t = 0, lim fσ (t) = (19.5.2) 1, t = 0. σ→0 Thus, we can construct a continuous function to approximate the l0 norm, and then solve the optimal solution. In this way, problem (19.5.1) is approximated by N fσ (xi ), s.t. Ax = b. (19.5.3) min Jσ (x) := i=1
The object function Jσ (x) is differentiable and is closely related to the parameter σ: the smaller value of σ, the closer behavior of Jσ (x) to the l0 norm. For small values of σ, Jσ (x) is highly non-smooth and contains a lot of local minimum, hence its minimization is not easy. On the other hand, for larger values of σ, Jσ (x) is smoother and contains less local minimum, and its minimization is easier. Practically, we use a decreasing sequence values of σ: for minimizing Jσ (x) for each value of σ, the initial value of the minimization algorithm is the minimum of Jσ (x) for the previous value of σ. Then we apply a simple projected gradient method to solve equation (19.5.3). Details of the procedure are given in Algorithm 19.5.1. For the convergence of a similar method for general signal processing problem, we refer to [33] for details. Below we present a similar algorithm as [33] based on the above l0 norm approximation and use it for seismic wavefield restoration. Algorithm 19.5.1 (Smooth l0 quasi-norm approximation method). 1. Initialization: (1) Let xˆ 0 be the minimum l2 -norm solution of Ax = b, which can be obtained by applying pseudo-inverse of A.
486
19 Seismic Wavefields Interpolation
(2) Choose the inner loop number L, outer loop number J and the steplength μ; set a decreasing sequence values of σ: [σ1 , . . . , σJ ]. 2. Iteration: for j = 1, . . . , J (1) Let σ = σj . (2) Minimize the function Jσ (x) on the feasible set S = {x|Ax = b} using L iterations of gradient descent method. (a) Let x = xˆ j−1 ; (b) For l = 1, . . . , L: a) Let gσ = [∇Jσ (x1 ), ∇Jσ (x2 ), . . . , ∇Jσ (xn )]. b) (The gradient decent iteration): x = x + τ d (τ is the steplength, d = γ(gσ ). c) (Projection): Project x on the feasible set S = {x|Ax = b}: x = x − AT (AAT )−1 (Ax − b). (3) Set xˆ j = x, σ = σ/2. 3. Final solution is xˆ = xˆ J . We can also choose other functions to approximate the l0 norm, e.g., the “truncated hyperbolic” function: " 0, |t| ≤ σ, fσ (t) = (19.5.4) 2 1 − (t/σ) , |t| ≥ σ and
fσ (t) = 1 − σ 2 /(t2 + σ 2 ).
(19.5.5)
We proved that the smooth l0 method based on the truncated hyperbolic functions fσ (xi ) can yield similar restoration results. Function values of fσ (t) = 1 − σ 2 /(t2 + σ 2 ) and fσ (t) = 1 − exp(−t2 /2σ 2 ) for different values of σ are plotted in Figure 19.3. Remark 19.5.2. In step 2 of Algorithm 19.5.1, if the gradient descent step is based on steepest descent (SD) step, i.e, γ(gσ ) = −gσ and τ = τ SD , then the algorithm corresponds to projected steepest descent method. In addition, in Algorithm 19.5.1, the inner loop number L needs not be too large, and according to our experience, the step-length τ should be greater than 2 for sure of fast convergence. However, it is clear that this choice of the steplength is not optimal. Usually, we need to calculate an optimal step-length τ ∗ by line search for the one-dimensional problem τ ∗ = argminτ Jσ (x + τ d).
19.5 Sparse Optimization Methods
487
Figure 19.3. (a) Function values of fσ (x) = 1 − σ 2 /(x2 + σ 2 ) for different σ; (b) Function values of fσ (x) = 1 − exp(−x2 /2σ 2 ) for different σ.
One may readily see that some fast gradient methods based on nonmonotone gradient descent step can be applied, e.g., the BB step used in seismic migration inversion [55], where the function γ(gσ ) is updated by the former iterative information instead of the current iterative information. For curvelet-based restoration, because of the orthogonality of the curvelet transform, the inverse of AAT is the identify matrix, thus the projection onto S = {x|Ax = b} can be simply solved by x = x − AT (Ax − b).
19.5.2
l1 -norm approximation method
As it is well-known that the objective function based on l1 -norm is non-differentiable at the original point, most of the solvers for l1 -norm optimization are based on the interior point methods to solve a linear programming problem. In this section, we consider using smooth functions to approximate the l1 -norm. We consider the function (19.5.6) f (t) = |t|2 + , which is continuous, convex and differentiable. If is small enough, f (t) will approximate the l1 -norm sufficiently, thus it can be used to replace the l1 -norm [49]. Another similar function is ⎧ |t| ≥ 0.01, ⎨ |t|, fθ (t) = 1 (19.5.7) ⎩ [log(1 + exp(−θt)) + log(1 + exp(θt))], |t| < 0.01. θ If θ is large enough, then fθ will approximate the l1 -norm. This function has been used for solving nonlinear and mixed complementarity problems and feature classification [10, 37].
488
19 Seismic Wavefields Interpolation
Since the objective function based on f (t) or fθ (t) is an approximation of the l1 -norm, the errors are inevitable. However, the cost of computation will be much less than the IST method and SPGL1 method. At each iteration of the SPGL1 method, one needs to project the iteration point to the active set specified by the l1 -norm, which will increase the amount of computation [19]; while the gradient projection for sparse reconstruction (GPSR) method cannot deal with complex numbers [21]. We found that the smaller values of , the better approximation of f (t) to the l1 -norm; the smaller values of θ, the worse approximation of fθ (t) to the l1 -norm, see Figure 19.4. Thus, we can solve the following problems to get the sparse solution min F (x) =
N
f (xi ),
s.t. Ax = b
(19.5.8)
fθ (xi ),
s.t. Ax = b.
(19.5.9)
i=1
or min Fθ (x) =
N i=1
Figure 19.4. (a) f (t) for different between [−0.5, 0.5]; (b) fθ (t) for different θ between [−0.5, 0.5].
We use a projected gradient algorithm to solve these two problems. Details are outlined in Algorithm 19.5.3. In our algorithm, a projection to S = {x|Ax = b} is also needed. However, the projection is easy to be computed. Since A is orthogonal, the initial l2 norm solution can be obtained by x = AT b, and T k+1 the projection can be easily calculated through xk+1 = xk+1 pre − A (Axpre − b). However, it is different from the smooth l0 algorithm because it does not need the outer iteration.
19.5 Sparse Optimization Methods
489
Algorithm 19.5.3 (Projected gradient method for approximation of the l1 -norm optimization). 1. Initialization: (1) Choose the maximum loop number L, set = 1.0e − 16 (or θ = 1000), and k = 0. (2) Let x0 be the l2 norm solution of Ax = b, which can be obtained by the pseudo-inverse of A. 2. Iteration: (1) Let g = ∇F (xk ) (or gθ = ∇Fθ (xk )). k+1 = xk + τ d (τ is the step (2) Perform the gradient decent iteration: xpre k+1 denotes the predicted step). length, d = γ(g ) or d = γ(gθ ), xpre k+1 on the feasible set S = {x|Ax = b}: (3) Projection: Project xpre k+1 k+1 − AT (AAT )−1 (Axpre − b). xk+1 = xpre
(4) Let xk = xk+1 , k = k + 1. 3. Final solution is x = xL . In step 2 of Algorithm 19.5.3, if the gradient descent step is based on steepest descent step, i.e, γ(g ) = −g or γ(gθ ) = −gθ and τ = τ SD , then the algorithm corresponds to projected steepest descent method. More advanced choices of the function γ and the step-length τ are given in Remark 19.5.2.
19.5.3
Linear programming method
Direct solution of the l0 quasi-norm minimization model is an NP-Hard problem, i.e., it is impossible to invert the function in polynomial time. Thus, direct solving the l0 quasi-norm minimization problem is not recommended for applications. We reconsider the solution method for the constrained l0 -norm minimization model (19.5.1). Using l1 -norm minimization approach, it is required to search a feasible solution in the feasible set: S = {a|Aa = d, fi (a) ≤ 0}, where fi is a linear function in the form of fi (a) = eTi a + ci for some ei ∈ RN , ci ∈ R. So, it actually searches an interior point within the feasible set S, hence the method is called the interior point method. Note that al1 → min is equivalent to eT0 a → min, where e0 is a vector with all components equaling to 1, hence the l1 -norm minimization under the constraint set S is the well-known linear programming method. Using logarithmic barrier, minimization of al1
490
19 Seismic Wavefields Interpolation
in the feasible set S can be approximated by min eT0 a − (1/γ) a
N
log(−fi (a)),
i=1
s.t. Aa = d,
(19.5.10)
where a vector mentioned above. Define ψ(a) = γ > 0 is a constant and e0 is ∗ (γ) for γ > 0 is the optimal solution of the log(−f (a)) and suppose a − N i i=1 problem mina γeT0 r + ψ(a), (19.5.11) s.t. Aa = d. We can define the central path as {a(γ) : γ > 0}. Therefore a = a∗ (γ) if there exists a vector u such that γe0 + ∇ψ(a) + AT u = 0,
Aa = d.
(19.5.12)
This indicates that a∗ (γ) minimizes the Lagrangian function ∗
∗
φ(a, λ (γ), ν (γ))
eT0 a
+
N
λ∗i (γ)fi (a) + νi∗ (γ)T (Aa − d),
(19.5.13)
i=1
where λ∗i (γ) = −1/(γfi (a∗ (γ))) and ν ∗ (γ) = u/γ. The dual problem is max infa φ(a, λ, ν), s.t. λ ≥ 0.
(19.5.14)
Using the Lagrangian multiplier method, the interior point method chooses a triplet (a, λ, ν) through solving the nonlinear equations ⎤ ⎡ ⎤ ⎡ 0 resprimal (a, λ, ν) (19.5.15) F (a, λ, ν) = ⎣ resdual (a, λ, ν) ⎦ = ⎣ 0 ⎦ , 0 rescentral (a, λ, ν) where the residuals of the primal problem, dual problem and central path are defined as resprimal (a, λ, ν) = Aa − d, N λi ∇fi (a) + AT ν, resdual (a, λ, ν) = e0 +
(19.5.16)
i=1
rescentral (a, λ, ν) = −Λf − 1/γe0 , i = 1, 2, . . . , N, where Λ = diag(λ1 , λ2 , . . . , λn ) and diag() denotes a diagonal operator, and f = (f1 (r), f2 (r), . . . , fn (r))T .
19.5 Sparse Optimization Methods
491
Using Newton’s iterative method, the interior point methods generate iteration sequence {ak , λk , νk }. With the # k going to infinity, the # iterative index equality violations d − Aak and #AT νk + Λe + e0 # approach zero and the dual gap towards N/γ, hence solve the primal-dual problem. Detailed algorithms for ill-posed problems can be found in [29, 50, 51]. The complete theory about interior point algorithms was given in Ref. [60].
19.5.4
Alternating direction method
The alternating direction method is a kind of operator splitting method, which receives much more attention in recent years [24, 25, 3]. For general equalityconstrained convex optimization problem min f(x),
(19.5.17)
s.t. Ax = d,
(19.5.18)
where x ∈ RN , A ∈ RM ×N , f : RN → R is convex and separable, the splitting operator method refers to the following problem min f1 (x) + f2 (y),
(19.5.19)
s.t. Ax + By = c,
(19.5.20)
where y ∈ Rp , B ∈ RM ×p and c ∈ RM ; f1 and f2 are two convex functions. It is clear that the original input signal x is split into two parts, called x and y here, with the objective function separation meeting this splitting. Using the method of multipliers, we form the augmented Lagrangian function 1 Lα (x, y, λ) = f1 (x) + f2 (y) + λT (Ax + By − c) + νAx + By − c2l2 , (19.5.21) 2 where λ is the Lagrangian multiplier and ν > 0 is a preassigned parameter. This formulation is clearly a regularized form of the Lagrangian problem 1 min f1 (x) + f2 (y) + νAx + By − c2l2 , 2 s.t. Ax + By − c = 0,
(19.5.22) (19.5.23)
where ν > 0 serves as a regularization parameter. With the splitting form, the solution of the original problem (19.5.17)–(19.5.18) can be solved by alternating directions: xk+1 = argminx Lν (x, yk , λk ),
(19.5.24)
yk+1 = argminy L (xk+1 , y, λk ),
(19.5.25)
λk+1 = λk + ν(Axk+1 + Byk+1 − c).
(19.5.26)
ν
492
19 Seismic Wavefields Interpolation
Referring to our problem, let us consider lp -lq model with p = 2 and q = 1: f(a) = Aa − d2l2 + αal1 −→ min .
(19.5.27)
With the splitting form, the problem can be written as min f1 (a) + f2 (y),
(19.5.28)
s.t. a − y = 0,
(19.5.29)
where f1 (a) = Aa − d2l2 and f2 (y) = αyl1 . The augmented Lagrangian function can be written as 1 Lν (a, y, λ) = f1 (a) + f2 (y) + λT (a − y) + νa − y2l2 . 2 Define u = ν1 λ, the above expression can be rewritten as 1 Lν (a, y, u) = f1 (a) + f2 (y) + νa − y + u2l2 . 2 Since the second term is nondifferentiable at zero, a subdifferential calculus can be applied. Its differential can be explicitly expressed as in Section 19.3.3 or using some approximations as in Sections 19.5.2 and 19.5.5 or using soft thresholding projection operators. Therefore, the solution of (19.5.27) can be solved by alternating directions: 1 ak+1 = argmina (f1 (a) + νa − (yk − uk )2l2 ), 2 yk+1 = SC (ak+1 + uk ), uk+1 = uk + (ak+1 − yk+1 ),
(19.5.30) (19.5.31) (19.5.32)
where SC serving as a proximal operator is a projection operator which project some iteration points onto C. Referring to the l2 -l1 model, straightforward calculation of the gradient of the objective function and setting it to be zero yield that the alternating directions are explicitly formed by ak+1 = (AT A + νI)−1 (AT d + νyk − λk ),
(19.5.33)
yk+1 = Sα/ν (ak+1 + λk /ν),
(19.5.34)
λk+1 = λk + ν(ak+1 − yk+1 ),
(19.5.35)
where Sc (a) is defined as Sc (a) = (a − c)+ − (−a − c)+ , c ∈ C and (·)+ = max(·, 0), which provides some soft thresholding to a. Since ν > 0, hence AT A + νI is positive definite and the above iterations are validated. We give a simple algorithm as follows.
19.5 Sparse Optimization Methods
493
Algorithm 19.5.4 (Alternating direction method for solving the nonlinear l2 -l1 minimization model). (1) Initialization: input the maximal number of iterations Itermax ; the starting point (a0 , y 0 , λ0 ), where a0 and y 0 are in their domains of f1 and f2 , respectively and λ0 ∈ RN ; the value of the regularization parameter ν. (2) Do loop until the stopping criterion is satisfied or Itermax is reached. (3) Update ak+1 = (AT A + νI)−1 (AT d + νyk − λk ), yk+1 = Sα/ν (ak+1 + λk /ν), λk+1 = λk + ν(ak+1 − yk+1 ). In step 2, the stopping criterion is based on the norm of the residual between yk+1 and yk , or the absolute value of the residual between λk+1 and λk . This can be easily realized by setting a tolerance > 0, if the maximum energy of the above residuals exceeds > 0, then stop. In step 3, solving the iteration point ak+1 requires factorization of the positive definite symmetric matrix AT A + νI. This can be easily realized by the square-root factorization method.
19.5.5
l1 -norm constrained trust region method
Let us go back to the lp -lq minimization model with p = 2 and q = 1. In this case, the model reads as f(m) = Lm − d2l2 + αml1 −→ min .
(19.5.36)
The regularization parameter α is set to be an a priori. It is evident that the above function f is nondifferentiable at m = 0. To make it easy to be calculated l by computer, we approximate ml1 by i=1 (mi , mi ) + ( > 0) and l is the length of the vector m. For simplification of notations, we let A = LT L, T mki mk1 mkn k γ(m ) = √ k T k , √ k T k , . . . , √ k T k and (m1 ) m1 +
⎛
⎜ ⎜ ⎜ ⎜ k χp (m ) = ⎜ ⎜ ⎜ ⎜ ⎝
(mi ) mi +
((mk1 )T mk1 + )p/2
0
0
0
..
0 .. . 0
... 0 0
(mn ) mn +
.
0 ((mki )T mki + )p/2
.. . ...
0 .. .
... .. . .. .
0 .. .
..
. 0
((mkn )T mkn + )p/2
Straightforward calculation yields the gradient of f at mk gk := g(mk ) ≈ LT (Lmk − d) + αγ(mk )
⎞ ⎟ ⎟ ⎟ ⎟ ⎟. ⎟ ⎟ ⎟ ⎠
494
19 Seismic Wavefields Interpolation
and the Hessian of f at mk Hk := H(mk ) ≈ LT L + αχ3 (mk ). With above preparation, a trust region subproblem for the compressing model can be formulated as [56] 1 min φk (ξ) := (gk , ξ) + (Hk ξ, ξ), 2 subject to ξl1 ≤ Δk .
ξ∈X
(19.5.37) (19.5.38)
To solve the trust region subproblem (19.5.37)–(19.5.38), we introduce the Lagrangian multiplier λ and solve an unconstrained minimization problem L(λ, ξ) = φk (ξ) + λ(Δk − ξl1 ) −→ min .
(19.5.39)
Straightforward calculation yields that the solution satisfies ξ = ξ(λ) = −(Hk + λχ1 (ξ))−1 gk .
(19.5.40)
From (19.5.40), we find that the trial step ξ can be obtained iteratively ξ j+1 (λ) = −(Hk + λχ1 (ξ j ))−1 gk .
(19.5.41)
And at the k-th step, the Lagrangian parameter λ can be solved via the nonlinear equation (19.5.42) ξk (λ)l1 = Δk . 1 − Δ1k , the Lagrangian parameter λ can be iteratively Denoting Γ(λ) = ξk (λ)
l1 solved by Newton’s method
Γ(λl ) , l = 0, 1, . . . . (19.5.43) Γ (λl ) & ' (λ) d 1 = − ρρ2(λ) The derivative of Γ(λ) can be evaluated as dλ = ξρ(λ)
2 , ρ(λ) (λ) λl+1 = λl −
k
l1
where ρ(λ) := ξk (λ)l1 . One may readily derive that at the k-th step ⎛ ⎞ k √ k ξ1T(λ)k ⎜ ξ1 (λ) ξ1 (λ)+ ⎟ ⎜ ⎟ .. ⎜ ⎟ . ⎜ ⎟ k (λ) ⎜ ⎟ d ξ i ⎜ ⎟∗ √ ρ (λ) ≈ ⎜ k T k ξk (λ) = γ(ξk )[Hk + λχ1 (ξk )]−1 χ1 (ξk )ξk (λ). ξi (λ) ξi (λ)+ ⎟ dλ ⎜ ⎟ .. ⎜ ⎟ ⎜ ⎟ . ⎝ ⎠ k ξn (λ) √k T k ξn (λ) ξn (λ)+
(19.5.44)
19.5 Sparse Optimization Methods
495
Hence the optimal Lagrangian parameter λ∗ can be obtained from iteration formula (19.5.43). Once λ∗ is reached, the optimal step ξ ∗ is obtained, and the trust region scheme in Algorithm 19.5.5 can be driven to another round of iteration. For general form of the trust region scheme for well-posed problems, we refer to [61, 30] for details. Algorithm 19.5.5. (A trust region algorithm for solving the nonlinear l2 -l1 minimization model) (1) Choose parameters 0 < τ3 < τ4 < 1 < τ1 , 0 ≤ τ0 ≤ τ2 < 1, τ2 > 0 and initial values a0 , Δ0 > 0; Set k := 1. (2) If the stopping rule is satisfied then STOP; Else, solve (19.5.37)–(19.5.38) to give ξk . (3) Compute rk ; " a
k+1
=
ak , a k + ξk ,
if rk ≤ τ0 , otherwise.
Choose Δk+1 that satisfies " [τ3 ξk , τ4 Δk ], Δk+1 ∈ [Δk , τ1 Δk ],
if rk < τ2 , otherwise.
(19.5.45)
(19.5.46)
(4) Evaluate gk and Hk ; k := k + 1; GOTO STEP 2. Discussions In [57], the author proved the uniform boundedness of the Lagrangian parameters {λk } and pointed out the Lagrangian parameters {λk } play a role of regularization. Furthermore, we remark that unlike the smooth regularization, where the ξk (λ)l2 solved by the corresponding trust region method is monotonically decreasing [47, 49]; for the sparse regularization model, the ξk (λ)l1 solved by the above trust region method is not necessarily decreasing, see Figure 19.5. However, the Lagrangian parameter λ acts as a regularization parameter which provide a stable approximation to the true solution. Since the trust region subproblem only supplies an approximate solution, it is unnecessary to solve it accurately. Therefore, it is quite common that in practice the trial step at each iteration of a trust region method is computed by solving the trust region subproblem (19.5.37) and (19.5.38) inexactly. One way to compute an inexact solution of (19.5.37) and (19.5.38) is the truncated conjugate gradient method developed by [44, 45] for ill-posed problems. More advanced trust region methods for fast computation can be found in [53].
496
19 Seismic Wavefields Interpolation
Figure 19.5. Iterative step values of the trust region subproblem.
19.6
Sampling
Regular incomplete sampling takes a number of observations in a measurement line with equidistance. This kind of sampling may not satisfy the Shannon/Nyquist sampling theorem. As the coherence noise in frequency-wavenumber domain occurs in this type of sampling, hence it is not suitable for orthogonal transform-based wavefield reconstruction. Random incomplete sampling refers to taking a number of independent observations in a measurement line with randomly allocated geophones. This sampling technique is better than the regular incomplete sampling, however large sampling interval is not suitable for wavefield reconstruction, e.g., reconstruction using short-time Fourier transform and curvelet transform. This lack of control over the size of the gaps during random sampling may lead to an occasional failed recovery. Figure 19.6 illustrates the problem of the uncontrolled random sampling. Another sampling technique is the jittered undersampling [28]. The basic idea of jittered undersampling is to regularly decimate the interpolation grid and subsequently perturb the coarse-grid sample points on the fine grid. However the jittered undersampling takes only integer partition of the complete sampling, which may not satisfy the practical wavefield reconstruction.
Figure 19.6. Random sampling: the solid circle (yellow one) represents receivers, the hollow circle (white one) means no receivers. Large gap occurs for some sampling points.
Usually in field applications, because of the influence of ground geometry such as valleys and rivers, the sampling is difficult to allocate properly. Therefore
19.7 Numerical Experiments
497
the above sampling techniques could not overcome such kind of difficulties completely. Considering their shortcomings, we propose a new sampling scheme: a piecewise random sub-sampling, see Figure 19.7. We first partition the measurement line into several subintervals; then perform random sampling on each subinterval. As the number of partition is sufficient enough, the sampling scheme will control the size of the sampling gaps while keeping the randomicity of the sampling.
Figure 19.7. Piecewise random sub-sampling: the solid circle (yellow one) represents receivers, the hollow circle (white one) means no receivers. Large gap are controlled for any sampling points.
19.7 19.7.1
Numerical Experiments Reconstruction of shot gathers
We consider a 7 layers geologic velocity model, see Figure 19.8 (a). With the spatial sampling interval of 15 meters and the time sampling interval of 0.002 second, the shot gathers are shown in Figure 19.8 (b). The data with missing traces and its frequency are shown in Figures 19.9 (a) and 19.9 (b), respectively. Though performance of each method provided in this paper is different, they are all stable methods for wavefields restorations. For illustrations, we only list the restoration results of wavefields by the projected gradient method for approximation of the l1 -norm optimization, the alternating direction method and the l1 -norm constrained trust region method. Figures 19.10 (a) and 19.10 (b) show the restored wavefields and its frequency using the projected gradient
Figure 19.8. (a) Velocity model; (b) Seismogram.
498
19 Seismic Wavefields Interpolation
Algorithm 19.5.3 with l1 -norm approximated by Fθ (x), respectively; Figures 19.11 (a) and 19.11 (b) show the restored wavefields and its frequency using the alternating direction method, respectively; Figures 19.12 (a) and 19.12 (b) show the restored wavefields and its frequency using the l1 -norm constrained trust region method, respectively; It is clear from the reconstructions that most of the details of the wavefield are preserved by the above mentioned methods. The frequency information of the sub-sampled data the restored data in above figures indicates the good performance of these methods. It is clear that the aliasing (just like noise) of the sub-sampled data is reduced greatly in the recovered data. The small differences of the original data and the recovered data illustrated in Figures 19.13 (a), 19.13 (b) and 19.14 (a) further reveal that these methods are reliable in recovery of the original seismic wavefields. As we proved that the Lagrangian parameter of the trust region subproblem is uniformly bounded. To show this property, we plot the values of the Lagrangian parameter at each iteration in Figure 19.14 (b). Reconstruction of anisotropic media We consider an anisotropic media to test the performance of the developed methods. The data is generated using a velocity model varying both vertically and transversely, see Figure 19.15 (a). We only list the results by the trust region method. The original data, sub-sampled data and recovered data are shown in Figures 19.15 (b), 19.16 (a) and 19.17 (a), respectively. The frequency information of the sub-sampled data and the recovered data are shown in Figures 19.16 (b) and 19.17 (b), respectively. Again, the aliasing of the subsampled data is reduced greatly in the recovered data. The difference of the original data and the recovered data is illustrated in Figure 19.18 (a). Virtually, all the initial seismic energy is recovered with minor errors. Though the reconstruction is not perfect, most of the details of the wavefield are preserved. Boundedness of Lagrangian parameters at each iteration is shown in Figure 19.18 (b).
19.7.2
Field data
We further examine the efficiency of the new methods with field data. A marine shot gather is provided in Figure 19.19 (a) which consists of 160 traces with spacing 25m and 800 time samples with interval 2 × 10−3 s. There are damaged traces in the gather. The sub-sampled gather is shown in Figure 19.19 (b) with half of the original traces randomly deleted. This sub-sampled gather was used to restore the original gather with different methods. The restoration using the smooth l0 quasi-norm method with fσ (t) = 1 − exp(−t2 /(2σ 2 )) is displayed in Figure 19.20.
19.7 Numerical Experiments
499
Figure 19.9. (a) Incomplete data; (b) Frequency of the sub-sampled data.
Figure 19.10. (a) Recovery results using the projected gradient Algorithm 19.5.3 with l1 -norm approximated by Fθ (x); (b) Frequency of the restored data using the projected gradient Algorithm 19.5.3.
Figure 19.11. (a) Recovery results using the alternating direction method; (b) Frequency of the restored data using the alternating direction method.
500
19 Seismic Wavefields Interpolation
Figure 19.12. (a) Recovery results using the l1 -norm constrained trust region method; (b) Frequency of the restored data using the l1 -norm constrained trust region method.
Figure 19.13. (a) Difference between the restored data and the original data by the projected gradient Algorithm 19.5.3 with l1 -norm approximated by Fθ (x); (b) Difference between the restored data by the alternating direction method and the original data.
Figure 19.14. (a) Difference between the restored data by the l1 -norm constrained trust region method and the original data; (b) Variations of the Lagrangian parameters λ for the l1 -norm constrained trust region subproblem.
19.7 Numerical Experiments
501
Figure 19.15. (a) Velocity model; (b) Seismogram.
Figure 19.16. (a) Incomplete data; (b) Frequency of the sub-sampled data.
Figure 19.17. (a) Recovery results using the l1 -norm constrained trust region method; (b) Frequency of the restored data using the l1 -norm constrained trust region method.
502
19 Seismic Wavefields Interpolation
Figure 19.18. (a) Difference between the restored data and the original data by the l1 -norm constrained trust region method; (b) Variations of the Lagrangian parameters λ for the l1 -norm constrained trust region subproblem.
Figure 19.19. (a) The original marine shot gather; (b) The sub-sampled gather.
Figure 19.20. Restoration results by the l0 approximation method with fσ (t) = 1 − exp(−t2 /(2σ 2 )).
References
19.8
503
Conclusion
In this chapter, we make a review about using sparse optimization methods for solving the compressive sensing problem in seismic imaging. In particular, we introduce several recently developed methods in seismic wavefields recovery, including the l0 quasi-norm approximation method, l1 -norm constrained method, linear programming method, alternating direction method and the l1 norm constrained trust region method. For above mentioned methods, we also perform synthetic experiments and field data tests. The numerical tests reveal that these methods are useful and reliable for seismic wavefields recovery.
Acknowledgements This work is supported by National Natural Science Foundation of China under grant numbers 10871191, 40974075 and Knowledge Innovation Programs of Chinese Academy of Sciences KZCX2-YW-QN107.
References [1] J. Bardsley and C. R. Vogel, A nonnegatively constrained convex programming method for image reconstruction, SIAM Journal on Scientific Computing, 25, 1326-1343, 2003. [2] D. P. Bertsekas, Projected Newton methods for optimization problems with simple constraints, SIAM J. Control and Optimization, 20, 221-246, 1982. [3] S. Boyd, N. Parikh, E. Chu and B. Peleato, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, 2011. [4] E. Candes and D. L. Donoho, Curvelets: A surprisingly effective non-adaptive representation for objects with edges, Curves and Surfaces Fitting, Editors: A. Cohen, C. Rabut and L. L. Schumaker, Nashville, TN, Vanderbilt University Press, 105-120, 2000. [5] E. Candes and D. Donoho, New tight frames of curvelets and optimal representations of objects with piecewise singularities, Communications on Pure and Applied Mathematics, 57, 219-266, 2004. [6] E. J. Candes, J. Romberg and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Trans. on Information Theory, 52, 489-509, 2006. [7] E. Candes, L. Demanet, D. Donoho and L. X. Ying, Fast discrete curvelet transforms, Multiscale Modeling and Simulation, 5, 861-899, 2006. [8] E. J. Candes and M. B. Wakin, An introduction to compressive sampling, IEEE Signal Processing Magazine, 25, 21-30, 2008. [9] J. J. Cao, Y. F. Wang, J. T. Zhao and C. C. Yang, A review on restoration of seismic wavefields based on regularization and compressive sensing, Inverse Problems in Science and Engineering, 19(6), 2011.
504
19 Seismic Wavefields Interpolation
Authors Information

Yanfei Wang†, Jingjie Cao†‡, Tao Sun†‡ and Changchun Yang†

† Key Laboratory of Petroleum Resources Research, Institute of Geology and Geophysics, Chinese Academy of Sciences, Beijing 100029, P. R. China. E-mail: [email protected]

‡ Graduate University of Chinese Academy of Sciences, Beijing 100049, P. R. China.
Chapter 20
Some Research on Quantitative Remote Sensing Inversion

H. Yang
Abstract. In this chapter we review recent developments in quantitative remote sensing inversion, focusing on multi-stage inversion that utilizes a priori knowledge. Both modeling and solution methods are described, including the estimation of a priori knowledge, optimization algorithms and the multi-stage inversion strategy. Numerical experiments based on field data are performed to validate the methods; the results show the effectiveness of our solution methods.
20.1 Introduction
In the most general sense, remote sensing refers to instrument-based techniques for acquiring and measuring spatially organized data on certain properties without contact, at a finite distance from the observed target. Here remote sensing specifically denotes the electromagnetic radiation of the land surface. Into the 21st century, a series of operational satellites from the NASA Earth Observing System (EOS) program are in orbit, other international and commercial programs, such as IKONOS, are operating, and vast amounts of remotely sensed observations are being acquired. With the development of the sciences and technologies of computing, mathematics, physics, etc., the remote sensing of the land surface is moving from qualitative to quantitative remote sensing, and remote sensing itself faces ever greater demands for the retrieval of land surface variables. The land surface variables estimated from remotely sensed observations are typically multi-scale and multi-spectral, matching the input requirements of many applications, such as environmental prediction and common land models, which makes research on global change and related topics feasible and reliable. The retrieved parameters at multiple scales can also serve the demands of regional and global research. In this sense, the core of quantitative remote sensing is inversion, namely, retrieving the land surface parameters from the remotely sensed data. However,
the bottleneck in the development of quantitative remote sensing is that the inversion is ill-posed in essence (Li et al., 1997, 1998; Chen et al., 1997). The emergence of Multi-Angular Remote Sensing (MARS) data brings hope to this problem. However, for satellite MARS datasets the range of illumination angles is usually small, the number of samples is still limited, and the observations may even be highly correlated with each other; in addition, the models typically contain more than ten parameters, so the information is far from plentiful. Facing the fact that the information gained from additional observations is always less than the increase of parameter uncertainty, researchers have begun to study how to make the most of the limited information in order to minimize the uncertainty of the parameters. At present there are three commonly used ways: the use of a priori knowledge, more effective inversion strategies, and optimized algorithms. Choosing applicable models for the inversion of different unknowns is either an independent problem or can be subsumed into the a priori knowledge. Concerning a priori knowledge, its expression and usage are investigated. The bootstrap method, a statistical method for estimating the distribution of the a priori knowledge, is studied. Taking the kernel-driven BRDF (Bidirectional Reflectance Distribution Function) model as an example, the distributions of the three parameters are estimated by the bootstrap method and are found not to obey a normal distribution. The test results show that the bootstrap method can estimate a realistic distribution of the a priori knowledge for inversion and may improve the precision of the inverted parameters. The commonly used expression of a priori knowledge about parameters is their mean vector and covariance matrix; further ways of expression are being researched, and the usage of a priori knowledge is also an important and difficult problem in quantitative remote sensing. Regarding inversion strategy, the multi-stage, sample-direction dependent, target-decision (MSDT) strategy is well known at present (Li et al., 1998, 2000, 2001; Gao et al., 1998). This method tries to fix some parameters quantitatively during the inversion based on the USM (Uncertainty and Sensitivity Matrix). This is the key to pushing BRDF inversion from art to science, but it has not been proved mathematically. The main reason is that the problem of measuring information quantitatively is still unsolved, which leads to difficulties in quantitatively describing the change of information during inversion. Thus, Shannon information entropy is studied to measure the information of remote sensing observations and parameters; the information flow of the inversion is then analyzed; finally, a USM-based information matrix is put forth as a tool for quantitatively controlling the information flow in inversion. In the traditional optimization inversion approach, iterative techniques are employed to estimate the biophysical variables. Many different types are
available; some require derivative information, and others do not. A multidimensional optimization algorithm adjusts the free parameters until the merit function is minimized. The most common method is least squares error (LSE) estimation. However, Li et al. (2001) pointed out that for physical models with about ten parameters (for a single band), remote sensing inversion problems in geosciences always seem to be underdetermined in some sense. Nevertheless, in many cases they can be converted to overdetermined problems by utilizing multi-angular remote sensing data or by accumulating a priori knowledge (Li et al., 2001). The Tikhonov regularization algorithm (Engl et al., 2000), for example, uses a priori knowledge as a constraint to improve the precision and robustness, but the determination of the regularization parameter is then a key issue. Many of the developed methods for quantitative remote sensing inversion are statistical methods derived, with different variations, from Bayesian inference. The most accurate, physically based models are non-differentiable and relatively computationally expensive. Optimization methods such as the simplex method AMOEBA (NAG, 1990) and the conjugate direction set method POWELL (Press et al., 1989) have been widely used by the BRDF community; their major limitation is that they are computationally very expensive and therefore difficult to apply operationally in regional and global studies. Wang et al. analyzed, from an algebraic point of view, the solution theory and methods for quantitative remote sensing inverse problems (Wang et al., 2010). Another solution is the look-up table method, which has become popular in various disciplines. Hybrid inversion methods are another computationally efficient alternative and can be applied on a per-pixel basis; the nonparametric models in these hybrid methods include feed-forward neural networks, projection-pursuit regression, genetic algorithms, etc. Good discussions of the advantages and disadvantages of these inversion methods are given by Kimes et al. (2000). In this chapter, the three areas will be presented in detail, mainly covering the progress of our research group on quantitative remote sensing inversion.
20.2 Models
Because of the non-Lambertian reflectance of the land surface, BRDF models are used to describe the interaction between the land surface objects and the solar radiance. Modeling research is a foundation of quantitative remote sensing and is also important for inversion. At present there are three kinds of models. The first kind consists of empirical relations between the reflectance and the land surface parameters, such as the linear kernel-driven model and the empirical function relating NDVI (Normalized Difference Vegetation Index) to LAI (Leaf Area Index).
The second kind consists of physically based canopy reflectance models, including radiative transfer models, geometric optical models (Li and Strahler, 1985), etc. The last kind is computer simulation, such as Monte Carlo ray tracing models and radiosity models; because of their computational cost and the detailed scene description they require, these are typically suitable for very complex simulations of canopy radiation regimes and for validating the simplified canopy reflectance models. Thus, the first two kinds of models are usually chosen for inversion to retrieve land surface parameters such as albedo and canopy properties such as LAI, the fractional absorption of photosynthetically active radiation (FPAR), etc.

The linear kernel-driven models describe the BRDF of the land surface mathematically as a linear combination of an isotropic kernel, a volume scattering kernel and a geometric optics kernel. Because of its simple formulation, this kind of model is often used to invert BRDF/albedo (Pokrovsky and Roujean, 2003). The linear kernel-driven BRDF model can be written as follows (Roujean et al., 1992):

R(θi, θv, φ) = fiso + kvol(θi, θv, φ) fvol + kgeo(θi, θv, φ) fgeo,    (20.2.1)
where θi and θv are the zenith angles of the solar direction and the view direction, respectively, and φ is the relative azimuth of the sun and view directions; R is the bidirectional reflectance; fiso, fvol and fgeo are the three unknown parameters, the weights of the isotropic, volume and geometric scattering contributions to the BRDF, which are closely related to the reflective properties of the land surface and are therefore usually the quantities to be retrieved; kvol and kgeo are the kernels, that is, known functions of the illumination and viewing geometry that describe volume and geometric scattering, respectively. There are many types of kernels, but it has been demonstrated that the Ross-Thick kernel for volume scattering and the Li-Transit kernel for geometric scattering are more robust and stable. The Ross-Thick kernel is given by

kvol(θi, θv, φ) = [(π/2 − g) cos g + sin g] / (cos θi + cos θv) − π/4.    (20.2.2)
More detailed explanation about g and the other quantities can be found in Roujean et al. (1992). The Li-Transit kernel is related to the Li-Sparse and Li-Sparse-R (reciprocal Li-Sparse) kernels as follows:

ktransit = ksparse,          if B ≤ 2,
ktransit = (2/B) ksparse,    if B > 2,    (20.2.3)

where B is given by

B(θi, θv, φ) = −O(θi, θv, φ) + sec θi + sec θv.    (20.2.4)
More detailed explanation about O, θi and θv can be found in Wanner et al. (1995).
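To make the kernel-driven model concrete, the following minimal sketch evaluates the Ross-Thick kernel of (20.2.2) and the forward model (20.2.1). The phase-angle relation cos g = cos θi cos θv + sin θi sin θv cos φ is taken from Roujean et al. (1992), since this chapter defers the definition of g to that paper; the geometric kernel values are assumed precomputed.

```python
import numpy as np

def ross_thick(theta_i, theta_v, phi):
    """Ross-Thick volume-scattering kernel, Eq. (20.2.2); angles in radians.
    The phase angle g follows the standard definition of Roujean et al. (1992)."""
    cos_g = (np.cos(theta_i) * np.cos(theta_v)
             + np.sin(theta_i) * np.sin(theta_v) * np.cos(phi))
    g = np.arccos(np.clip(cos_g, -1.0, 1.0))
    return (((np.pi / 2 - g) * np.cos(g) + np.sin(g))
            / (np.cos(theta_i) + np.cos(theta_v))) - np.pi / 4

def brdf_forward(theta_i, theta_v, phi, k_geo, f_iso, f_vol, f_geo):
    """Linear kernel-driven BRDF, Eq. (20.2.1). The geometric kernel values
    k_geo (e.g. Li-Transit) are passed in precomputed."""
    return f_iso + ross_thick(theta_i, theta_v, phi) * f_vol + k_geo * f_geo
```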
Physically based canopy reflectance models link remote sensing radiance observations to canopy properties more directly than kernel-driven or statistical models; they provide the theoretical foundations for developing practical algorithms to estimate biophysical parameters such as LAI, leaf angle distribution (LAD), FPAR, etc. These models have their own advantages and disadvantages. For dense vegetation canopies, radiative transfer formulations are typically used, especially one-dimensional radiative transfer models such as the Suits (1972) model and the SAIL (light Scattering by Arbitrarily Inclined Leaves) canopy model (Verhoef, 1984), in which the canopy is assumed to consist of leaves at various angles. These early canopy radiative transfer models were based on Kubelka-Munk (KM) theory, and the four differential equations with nine coefficients take the following form:

dEs/dz = k·Es,
dE⁻/dz = a·E⁻ − σ·E⁺ − s·Es,
dE⁺/dz = −a·E⁺ + σ·E⁻ + s·Es,    (20.2.5)
dEv/dz = −K·Ev + μ·E⁺ + v·E⁻ + w·Es.

The coefficients are related to the observation geometry, the canopy structure and the canopy optical properties, etc.; their detailed expressions can be found in Suits (1972), Verhoef (1984), etc. Given these coefficients, solving this differential equation set is not too difficult (a numerical sketch is given below).

For a canopy that consists of a series of regular geometric shapes (e.g. cylinders, spheres, cones, ellipsoids) arranged in a prescribed manner, geometric optical (GO) models are suitable. For sparse vegetation canopies, the reflectance is the area-weighted sum of different sunlit/shadowed components, whose fractions are calculated from geometric optical principles (Li and Strahler, 1992; Strahler and Jupp, 1990). These models have usually been used for mapping forest tree size and density from remotely sensed imagery. There are also hybrid canopy models, such as GORT, which combine an analytical approximation of the radiative transfer (RT) within the plant canopies, used to model the spectral properties of each scene component, with a geometric optical model. GORT is suitable for discontinuous canopies, handles multiple scattering well, and is better for estimating the component signatures (Ni et al., 1999). Computer simulation models have been applied primarily for understanding the radiation regime and for validating simplified models; if used for inversion, a look-up table is often generated in advance.
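As an illustration only, the sketch below integrates the KM system (20.2.5) as an initial value problem with placeholder coefficient values; in a real SAIL-type model the nine coefficients are derived from canopy structure and leaf optics, and the diffuse fluxes are subject to two-point boundary conditions rather than pure initial values.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Placeholder coefficient values (assumed for illustration; in SAIL they
# follow from canopy structure and leaf optical properties).
k, a, sigma, s = 0.8, 0.9, 0.2, 0.3
K, mu, v, w = 0.7, 0.25, 0.2, 0.1

def km_rhs(z, E):
    """Right-hand side of the Kubelka-Munk four-stream system (20.2.5);
    E = [E_s, E_minus, E_plus, E_v]."""
    Es, Em, Ep, Ev = E
    return [k * Es,
            a * Em - sigma * Ep - s * Es,
            -a * Ep + sigma * Em + s * Es,
            -K * Ev + mu * Ep + v * Em + w * Es]

# Integrate from the canopy top (z = 0) downward to z = -1, starting from
# unit direct solar flux and zero diffuse/view fluxes.
sol = solve_ivp(km_rhs, (0.0, -1.0), [1.0, 0.0, 0.0, 0.0], max_step=0.01)
print(sol.y[:, -1])  # fluxes at the canopy bottom
```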
20.3 A Priori Knowledge
Inversion is the key problem of quantitative remote sensing, but the inversion is in fact ill-posed, and under-determinedness is still the main difficulty of quantitative remote sensing inversion in the EOS era. Adding a priori knowledge to the inversion is therefore the hope. But what is a priori knowledge? As the ancient philosopher Confucius pointed out, "Our knowledge consists of two parts: what we know, and what we know we do not know." Namely, all the information we know before the inversion is a priori knowledge: the class of the land cover, the phenology of the crop, even the uncertainty of the unknowns, and so on, in whatever form the information takes (words, data, images, etc.). In practice, we may classify a priori knowledge into different levels: general knowledge about the land surface ("global knowledge"), knowledge related to the land cover type, target-specific knowledge, etc. In remote sensing inversion it involves these aspects: applicable forward model(s); physical limits and probability densities in model parameter space; statistics of model accuracy and of the noise in remote sensing signals; seasonal change associated with land cover types or targets; and the confidence of the above knowledge. Many papers have investigated the necessity and roles of applying a priori knowledge in inversion (Li et al., 1998), but how to express, accumulate and utilize a priori knowledge needs further research. Tarantola (1987) pointed out that the best expression of a priori knowledge is the PDF (probability density function) of the parameters. However, Gaussian distributions of parameters and errors are usually assumed as the a priori knowledge in current inversion practice, and this assumption is not always reasonable. The Kolmogorov-Smirnov (K-S) test (He, 1993) was used to test this assumption. The results for the 73-group data set collected by Kimes et al. (1986), Ranson et al. (1985), Deering and Leone (1986), etc. show that the p-value is approximately zero, which indicates that the assumption of Gaussian distributions for the parameters and errors is unreasonable. These data sets were mainly collected from grass, farm, forest and soil land cover types; to examine the rationality of this classification, the similarity of the PDF within each class was tested, and the likelihood and K-S test results also show that the classes do not obey the same normal distribution (Yang et al., 2005; Chen et al., 2007). According to these results, estimating the PDF of all a priori knowledge is important for its reasonable usage in quantitative remote sensing, and the bootstrap is one of the best methods for this. The bootstrap method was proposed by Efron (1979). It needs no assumption about the distribution and estimates the distribution or properties of the random variables directly from resamples of the data. The principle of the bootstrap method is to take the basic sample as the population,
draw N resamples of size n by Monte Carlo sampling, and calculate the value of the statistic Γ for each of the N resamples. The frequency distribution of these Γ values can then be taken as the distribution of the statistic, and the estimate improves as N increases; see Efron (1979) for details. Taking the 73 groups and the kernel-driven model as an example, the mean vector and covariance matrix of the three parameters are listed in Table 20.1. Classifying the data sets into four groups according to land cover type and then using the bootstrap method to estimate the mean vector and covariance matrix of each sub-group, the results for grassland and farm are also listed in Table 20.1 as examples.

Table 20.1. Mean vector and covariance matrix (covariance entries in units of 0.0001).

               all classes                grass land                 farm
parameters   fiso    fvol    fgeo      fiso    fvol    fgeo      fiso    fvol    fgeo
means         0.449   0.154   0.113     0.540   0.162   0.096     0.357   0.059   0.060
covariance    0.313   0.207  −0.110     0.773   0.601   0.388     0.573   0.486  −0.163
matrix        0.207   0.140  −0.066     0.601   0.653   0.484     0.486   0.447  −0.123
             −0.110  −0.066   0.110     0.388   0.484   0.852    −0.163  −0.123   1.010
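A minimal sketch of the bootstrap estimation just described, assuming the parameter triples retrieved from the 73 groups are stacked as rows of an array; the function and variable names are illustrative.

```python
import numpy as np

def bootstrap_moments(samples, n_resamples=2000, seed=0):
    """Bootstrap estimate (Efron, 1979) of the mean vector and covariance
    matrix of the BRDF parameters; each row of `samples` is one
    (fiso, fvol, fgeo) triple. No distributional assumption is made: the
    statistics are computed from Monte Carlo resamples with replacement."""
    rng = np.random.default_rng(seed)
    n = samples.shape[0]
    means, covs = [], []
    for _ in range(n_resamples):
        boot = samples[rng.integers(0, n, size=n)]  # resample with replacement
        means.append(boot.mean(axis=0))
        covs.append(np.cov(boot, rowvar=False))
    # The empirical distribution of the resampled statistics approximates
    # their sampling distribution; its mean serves as the point estimate.
    return np.mean(means, axis=0), np.mean(covs, axis=0)
```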
From Table 20.1 we can see that the distributions differ and that the uncertainty is smaller for the classified data sets. According to Li et al. (1998), a priori knowledge can be represented in different forms: 1) (wide-bound form) hard-bounded ranges of parameter values, usually applicable for physical limits; 2) (δ-bound form) some parameters (or relations among them) are accurately known, or we know they are insensitive to the given observation geometry, and can be fixed; 3) (soft-bound form) the unknown parameters have an a priori known joint probability density function (JPDF), which we call a soft-bound range, as above. All three kinds of a priori knowledge can be used in inversion, depending on the situation. At present, most of the a priori knowledge, especially in the soft-bound form, comes from field observations, such as the 73-group data sets. The POLDER BRDF database of Bicheron et al. (1999) provides another opportunity to expand our BRDF knowledge base; its data sets are classified by IGBP land cover types. The 395 POLDER BRDF observations are similar to those of the 73-group collection; although these space-borne observations have coarse spatial resolution, they cover more land cover features. In China, the classic and first a priori knowledge base is the Spectrum Knowledge Base System for Typical Objects in China (Wang et al., 2009). It contains plentiful spectral observations of many typical crops, waters and rocks, accompanied by measurements of environmental parameters, and the measurements of crop structural parameters such as LAI and LAD are abundant. Many methods
have been investigated to extract a priori knowledge from this base system; it is significantly important for the accumulation of a priori knowledge for the inversion of structural and optical parameters. With more and more sensors in orbit and observation systems in operation, a great deal of data and products is being acquired, and methods for accumulating a priori knowledge from them need further research.
20.4 Optimization Algorithms
Remote sensing inversion is an under-determined problem, and optimization methods are often used to retrieve the unknowns. An optimization problem is made up of three basic ingredients. The first is an objective function that we want to minimize or maximize; for remote sensing inversion we usually call it the merit function, and its common form is the total deviation of the observed data from the predictions based on the model. The second is a set of unknowns or variables that affect the value of the objective function, namely the parameters to be retrieved, such as LAI, LAD, etc. The last is a set of constraints that allow the unknowns to take on certain values but exclude others; this is the hard form of the a priori knowledge. If a priori knowledge is considered, as is now usual, it is often expressed as a constraint added to the merit function as an additional term. The optimization problem is then to find values of the variables that minimize or maximize the merit function while satisfying the constraints.

In the traditional optimization inversion approach, iterative techniques are employed. But the most accurate, physically based models are non-differentiable and relatively computationally expensive, so a derivative-free optimization algorithm is usually required, such as the simplex method or the conjugate direction set method POWELL (Press et al., 1989). These optimization approaches have been widely used by the BRDF community; their major limitation is that they are computationally very expensive and therefore difficult to apply operationally in regional and global studies.

For merit functions that incorporate a priori knowledge, the LSE (least squares estimation) method is also used, such as the least squares estimation algorithm with added latent data (Yang et al., 2005). The latent data are generated from the a priori knowledge by simulating a kernel matrix Asimu[3,3] from the covariance matrix of the a priori knowledge, namely

Asimu[3,3] = Λ^{−1/2} E,    (20.4.1)

where Λ is a diagonal matrix whose elements are the three eigenvalues of Cp, the covariance matrix of the a priori knowledge, and E is the matrix of the corresponding eigenvectors.
The simulated observations Ysimu[3] are then obtained by forwarding the model, Asimu[3,3] × X̄[3], with X̄ the mean vector of the a priori knowledge. These simulated data are called latent data (Li et al., 2001). The fundamental idea of this method is to let the merit function of the latent data equal the deviation of the parameters in value; it effectively generalizes a great quantity of random simulated data whose parameters obey the a priori joint probability density function, and it is essentially a method for applying a priori knowledge in inversion. After adding these three latent data to the one single-look observation there are four data, and the inversion becomes determined, because the latent data are simulated from the nonsingular matrix Cp[3,3]. The least squares solution is

X̂[3] = (A[4,3]ᵀ A[4,3])⁻¹ A[4,3]ᵀ Y[4],    (20.4.2)

where Y[4] = [yobs[1]; Ysimu[3]] and A[4,3] = [Aobs[1,3]; Asimu[3,3]], stacked row-wise.

Phillips and Tikhonov developed the regularization method independently (Hansen, 1998). The latter is most widely used to stabilize ill-posed problems and obtain meaningful solutions (Engl et al., 2000; Ma et al., 2000). The key idea is to confine the estimation to a compactly supported set by introducing a mathematical constraint into the solution space, constructed from the observations and the a priori knowledge. In the statistical literature, Tikhonov's method is known as ridge regression. The method also appears when the least squares problem is augmented with statistical a priori information about the solution; in this sense it is the same as the LSE with added latent data. The general form of Tikhonov's method leads to the maximum likelihood estimate obtained by minimizing the following expression:

min_X { (Aobs X − Yobs)ᵀ (Aobs X − Yobs) + ϒ² (X − X̄)ᵀ Cp⁻¹ (X − X̄) },    (20.4.3)
where the symbol ϒ is a regularization parameter. It controls the weights of the contributions of the regularization term and the residual norm to the minimum value; that is, it balances the weights of the observations and the a priori knowledge in the inversion. There are many studies on how to determine this ratio, such as the quasi-optimality criterion, generalized cross-validation (GCV) and the L-curve criterion, but none of them is fully quantitative and commonly accepted. Studies show that the larger the prior information ratio, namely the larger the weight of the a priori knowledge in the inversion, the more stable the inversion will be and the closer the estimate is to the a priori knowledge; conversely, the smaller the ratio, namely the larger the weight of the observations, the less stable the inversion.
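Both estimators admit short linear-algebra sketches. The first builds the latent data of (20.4.1)-(20.4.2); since (20.4.1) as printed does not show a transpose, the reading Asimu = Λ^{−1/2} Eᵀ used below, which whitens the prior covariance Cp = E Λ Eᵀ, is an assumption. The second solves (20.4.3) through its normal equations, which follow by setting the gradient to zero.

```python
import numpy as np

def lse_with_latent_data(A_obs, y_obs, x_prior, C_p):
    """LSE with added latent data, Eqs. (20.4.1)-(20.4.2). C_p is assumed
    positive definite; the latent kernel matrix whitens C_p = E Lambda E^T."""
    eigvals, E = np.linalg.eigh(C_p)
    A_simu = np.diag(eigvals ** -0.5) @ E.T          # Eq. (20.4.1), assumed reading
    y_simu = A_simu @ x_prior                        # latent observations
    A = np.vstack([A_obs, A_simu])                   # A[4,3]
    y = np.concatenate([y_obs, y_simu])              # Y[4]
    return np.linalg.solve(A.T @ A, A.T @ y)         # Eq. (20.4.2)

def tikhonov(A_obs, y_obs, x_prior, C_p, ups):
    """Minimizer of the Tikhonov functional (20.4.3): the zero-gradient
    condition gives (A^T A + ups^2 Cp^{-1}) x = A^T y + ups^2 Cp^{-1} x_prior."""
    Cp_inv = np.linalg.inv(C_p)
    lhs = A_obs.T @ A_obs + ups ** 2 * Cp_inv
    rhs = A_obs.T @ y_obs + ups ** 2 * Cp_inv @ x_prior
    return np.linalg.solve(lhs, rhs)
```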
The following five observations from NOAA-AVHRR are typical multi-angle remote sensing data; among them, the observations of good and bad quality are known, so they are often used to validate inversion methods in theory. Table 20.2 lists the observations, and Table 20.3 the results of the Tikhonov regularization method.

Table 20.2. Observations of NOAA-AVHRR.

data No.    θ      ϑ       φ       NIR      kgeo      kvol
1          35.2   27.6    42.0    0.287    −0.746     0.0774
2          34.3   12.4    42.5    0.298    −0.818     0.0130
3          32.9   20.2   130.6    0.216    −1.180    −0.0929
4          32.5   33.7   129.2    0.210    −1.240    −0.1080
5          32.0   53.0   126.5    0.195    −1.260    −0.0725
Table 20.3. Results of the Tikhonov regularization method; each cell lists (fiso, fvol, fgeo).

                 bootstrap,                 bootstrap,                 normal
                 unclassification           classification             distribution
priori means     0.449   0.154   0.113      0.411   0.104   0.141      0.394   0.162   0.079
results          0.440   0.189  −0.010      0.401   0.112   0.162      0.337   0.082   0.168
Many studies have indicated that the MODIS albedo product (MOD43) has acceptable accuracy, so MODIS BRDF observations are used to invert and to validate this inversion method. The Heihe Watershed in Zhangye, China was chosen as the research area. Considering mixed pixels, the averaged mean vector X̄B and covariance matrix CBp are used as a priori knowledge to retrieve the land surface albedo. The regularization parameter ϒ is important and is decided by the observation error; however, the observation error is difficult to estimate. Here, the MODIS products of the same area and time range of the year 2004 are used to estimate the regularization parameter, based on the assumption that the MODIS products contain a similar observation error. As the regularization parameter ϒ varies over the range 0 to 6 × 10^6, the differences between the inverted BSA (Black Sky Albedo) results and the MODIS BSA results increase up to 2 × 10^6 and decrease thereafter, so the regularization parameter is taken as 2 × 10^6 in the inversion here. The results are shown in Figure 20.1: the left panel is the retrieved BSA (Figure 20.1 (a)), the right one the MODIS product (Figure 20.1 (b)).
Figure 20.1. The retrieved BSA (left) and the MODIS BSA product (right).
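A hedged sketch of the ϒ-selection procedure described above; `invert_for_gamma`, `bsa_of` and `modis_bsa` are hypothetical stand-ins for the actual MOD43-based processing chain, which this chapter does not spell out.

```python
import numpy as np

def select_regularization_parameter(invert_for_gamma, bsa_of, modis_bsa,
                                    gammas=np.linspace(0.0, 6e6, 61)):
    """Scan candidate regularization parameters and keep the one whose
    inverted black-sky albedo best matches the MODIS BSA reference."""
    diffs = [np.mean(np.abs(bsa_of(invert_for_gamma(g)) - modis_bsa))
             for g in gammas]
    return gammas[int(np.argmin(diffs))]
```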
One issue with the traditional optimization method is that the success of the inversion often depends on the initial values provided; if they are far from the true values, the iterative algorithm may not converge. One remedy is to run the iterative algorithm several times with multiple sets of initial values and pick the best solution, which of course increases the computational time significantly. The genetic algorithm might be an alternative solution.

The fundamental concept of the genetic algorithm (GA) is based on natural selection in the evolutionary process, which is accomplished by genetic recombination and mutation (Goldberg, 1989). Applications of GA to a variety of optimization problems in remote sensing have been demonstrated since the 1990s. The general GA has advantages and disadvantages: it searches from a population, not an individual, and its solution is a global rather than a local optimum; but its computation is expensive and it may converge early, i.e. prematurely. The operations in a GA are all realized by corresponding functions: scaling functions (rank, proportional, top, shift linear, etc.), selection functions (stochastic uniform, roulette, remainder, uniform, etc.), crossover functions (scattered, two-point, heuristic, etc.) and mutation functions (Gaussian, uniform, adaptive, etc.); designing a GA amounts to determining these functions. For physical remote sensing models, the unknown parameters to be retrieved are much more numerous than the observations, so if such models are used to retrieve land surface parameters, the initial values of the unknowns affect the inversion results; if they are given according to the a priori knowledge, the weight of the a priori knowledge becomes large, which complicates the problem. Since remote sensing inversion usually has multi-angle observations and GA searches from a population, GA is quite suitable for this inversion. Taking the SAIL model as an example, the functions chosen for all the steps of the GA are listed in Table 20.4. The observations are measured BRDF of wheat in April 2001 at the Shunyi experiment site, Beijing, China. The LAI inversion results are shown in Figure
20.2. The results are good; the total RMS (root mean square) error is 0.267.

Table 20.4. Functions of all steps in the GA.

band                   scaling function   selection function   crossover function
red band               Top (0.42)         Uniform              Heuristic
near infrared band     Top (0.42)         Uniform              Heuristic
Figure 20.2. Inversion results of structural parameters.
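A minimal genetic-algorithm sketch for minimizing a merit function over box-bounded parameters (the hard-bound form of the a priori knowledge). Tournament selection, a blend crossover and Gaussian mutation are generic stand-ins for the specific scaling, selection and crossover choices of Table 20.4.

```python
import numpy as np

def ga_minimize(merit, lo, hi, pop_size=50, n_gen=200,
                p_cross=0.8, mut_sigma=0.05, seed=0):
    """Genetic-algorithm minimization sketch: tournament selection,
    blend crossover, Gaussian mutation, elitism, and clipping to the
    physical limits [lo, hi]."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    pop = rng.uniform(lo, hi, size=(pop_size, lo.size))
    for _ in range(n_gen):
        fit = np.array([merit(ind) for ind in pop])
        children = [pop[np.argmin(fit)].copy()]      # elitism: keep the best
        while len(children) < pop_size:
            i = rng.integers(0, pop_size, size=2)    # two-way tournaments
            j = rng.integers(0, pop_size, size=2)
            p1 = pop[i[np.argmin(fit[i])]]
            p2 = pop[j[np.argmin(fit[j])]]
            child = p1.copy()
            if rng.random() < p_cross:               # blend crossover
                alpha = rng.random()
                child = alpha * p1 + (1.0 - alpha) * p2
            child += rng.normal(0.0, mut_sigma, size=child.size)
            children.append(np.clip(child, lo, hi))  # enforce hard bounds
        pop = np.array(children)
    fit = np.array([merit(ind) for ind in pop])
    return pop[np.argmin(fit)], fit.min()
```

For the SAIL inversion above, `merit(x)` would be the squared deviation between the measured wheat BRDF and the forward simulation at the observation geometries.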
20.5 Multi-stage Inversion Strategy
Quantitative remote sensing is essentially an ill-posed problem. From the viewpoint of information there are two ways to deal with it: one is adding a priori knowledge, as stated above; the other is a multi-stage inversion strategy that uses partial, sensitive data to retrieve partial, uncertain parameters. Li et al. (1997) first put forth the inversion method of fixing parameters semi-quantitatively based on the USM (Uncertainty and Sensitivity Matrix) and the multi-stage, sample-direction dependent, target-decision (MSDT) strategy. The USM is an objective expression of the a priori knowledge, so it is called the knowledge-based USM. For the USM, an argument may be made about the best guess of each parameter and its uncertainty. Li et al. (1997) defined the initial uncertainty of a parameter using both the standard deviation and the physical limits as

{Pi ± Si} AND {physical limit},    (20.5.1)
where Pi and Si are the mean and standard deviation of the probability distribution of the parameter values, and AND denotes the operation of intersection.
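A one-function sketch of (20.5.1), intersecting the ±S interval with the physical limits; the names are illustrative.

```python
import numpy as np

def initial_uncertainty(p_mean, p_std, phys_lo, phys_hi):
    """Initial uncertainty interval of Eq. (20.5.1): {P_i +/- S_i}
    intersected (AND) with the physical limits."""
    lo = np.maximum(np.asarray(p_mean) - np.asarray(p_std), phys_lo)
    hi = np.minimum(np.asarray(p_mean) + np.asarray(p_std), phys_hi)
    return lo, hi
```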
The sensitivity of the parameters has a direct effect on the inversion. Because each parameter has different sensitivities in different sampling directions, and the sensitivity of the same sample can also vary greatly among the parameters, the USM is proposed to describe this complex situation. Assume that a BRDF model has N bands, K structural parameters and L spectral parameters per band, K + N·L parameters in total, and that the observations comprise M samples; the matrix will then have M × N rows and K + N·L columns, which is too big to operate on. Because the spectral parameters of one band are independent of those of the other bands, it is decomposed into one structural parameter matrix with M × N rows and K columns and N spectral parameter matrices (M × L) corresponding to the N bands, N + 1 matrices in total. An element of the USM is defined as

USM[i, j] = Maxdiff[R(j|i)] / R(i),    (20.5.2)
where Maxdiff[R(j|i)] is the maximum difference of the BRDF as a function of only the j-th parameter within its uncertainty, given the i-th geometry of illumination and viewing, and R(i) is the BRDF predicted by the model at the i-th geometry with all parameters at their best-guess values. On the basis of the USM we can analyze the inversion process objectively and select sensitive samples to invert for selected parameters step by step; this is just the MSDT. At every stage of the inversion, the data and the parameters to be retrieved are chosen by searching for the maximum USM[i, j]. Much work has been done using the USM and MSDT in inversion, and the results show that this strategy is effective. However, the USM partition of the data sets and parameters has not been proved mathematically and cannot be made perfectly quantitative. The main reason is that the problem of measuring information quantitatively is still unsolved, which leads to difficulties in quantitatively describing the change of information during inversion. How to express and utilize a priori knowledge, partition the data and fix parameters so as to control the information flow quantitatively remains the key problem to be solved. Yang et al. (2005) took the linear kernel-driven model as an example to study how to describe the information of remote sensing data quantitatively and to measure the information change during inversion, namely, to analyze the information flow in different inversion methods and the effect of controlling the information flow. In general, the information content of any dataset is its total word length (in bits or bytes); however, the information content of a remote sensing dataset rarely reaches this capacity, because it is affected by the inversion objective, the prior knowledge, the inversion model, etc. For the limited sampling of remote sensing observations, Fisher statistics, the determinant or the sum of the diagonal elements of the matrix defined by the practical problem, and Shannon information entropy all have their own shortcomings.
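A sketch of how a USM could be tabulated from a forward model according to (20.5.2); the coarse grid scan over each parameter's uncertainty interval stands in for the exact maximization, and all names are illustrative.

```python
import numpy as np

def build_usm(forward, p_best, p_lo, p_hi, geometries, n_scan=21):
    """Uncertainty and Sensitivity Matrix, Eq. (20.5.2). `forward(p, geom)`
    returns the model BRDF; (p_lo, p_hi) are the per-parameter uncertainty
    intervals from Eq. (20.5.1); `geometries` lists the sampling geometries."""
    M, n_par = len(geometries), len(p_best)
    U = np.zeros((M, n_par))
    for i, geom in enumerate(geometries):
        r_best = forward(np.asarray(p_best, float), geom)
        for j in range(n_par):
            vals = []
            for pj in np.linspace(p_lo[j], p_hi[j], n_scan):
                p = np.asarray(p_best, float).copy()
                p[j] = pj                     # vary only the j-th parameter
                vals.append(forward(p, geom))
            # Maxdiff[R(j|i)]: spread of R over the j-th parameter's range
            U[i, j] = (max(vals) - min(vals)) / r_best
    return U
```

At each MSDT stage, the largest entries of this matrix indicate which samples carry the most information about which parameters.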
Taking the linear kernel-driven model as an example, given a priori knowledge, the information of a single-look observation about the three parameters is measured, in data space and in parameter space respectively, by the Shannon entropy decrease; the natural logarithm is used for the entropy here. If ξ is a random variable obeying an m-dimensional Gaussian distribution, its information entropy is related only to its covariance matrix cov(ξ). In data space, the entropy decrease is thus a measure of the decrease of the uncertainty of the observation, namely the information about the three parameters brought by a single-look observation. It depends on the variance σ² of the observation error and on the covariance of the a priori knowledge mapped into the data space through the model. Note that if the uncertainty of the observation is large, the uncertainty of the prior distribution is difficult to decrease, and the quality of such an observation is regarded as relatively poor. In parameter space, the prior uncertainty of the three parameters is determined by the covariance Cp of the a priori knowledge, whereas the posterior uncertainty is determined by their post-inversion covariance Cov(X̂[3]), which depends on the inversion algorithm. For the LSE method with added latent data, the covariance matrix depends on the error variance of the single-look observation and on the covariance Cp of the a priori knowledge as propagated by the inversion method. The entropy decrease is used to measure the decrease of the uncertainty of the parameters after inversion, and it is the total information content of the single-look observation for the three-parameter inversion. For the Tikhonov regularization method, the covariance matrix of the parameters after inversion is

Cov(X̂) = (A₁ᵀA₁)⁻¹ (γ⁴ σ² Aobs[1,3]ᵀ Aobs[1,3] + Cp⁻¹[3,3]) (A₁ᵀA₁)⁻¹;    (20.5.3)

see Yang et al. (2005) for the details. The posterior information entropy can then be calculated. The entropy decrease is related to γ, and the γ that maximizes the entropy decrease gives the best weight; that is, we can determine the best information ratio at the same time, in which case Cov(X̂) = (A₁ᵀA₁)⁻¹, and we define A₁ᵀA₁ as the information matrix. For the observations in Table 20.2, the entropy decreases after inversion by the two algorithms are given in Table 20.5.

Table 20.5. Entropy decrease after inversion by the two algorithms.

data No.   method A (LSE with latent data)   method B (regularization)
1          0.0127                            3.22
2          0.0128                            3.23
3          0.0143                            3.28
4          0.0146                            3.29
5          0.0149                            3.31
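A sketch of the entropy bookkeeping: for an m-dimensional Gaussian the entropy is (1/2) ln((2πe)^m det C), so the decrease between prior and posterior reduces to (1/2) ln(det Cprior / det Cpost). The augmented matrix A₁ of (20.5.3) is not spelled out in this chapter (see Yang et al. (2005)), so it is simply passed in as an argument.

```python
import numpy as np

def entropy_decrease(C_prior, C_post):
    """Shannon entropy decrease (natural logarithm) between Gaussian
    prior and posterior: the (2*pi*e)^m terms cancel, leaving
    0.5 * ln(det C_prior / det C_post)."""
    _, logdet_prior = np.linalg.slogdet(C_prior)
    _, logdet_post = np.linalg.slogdet(C_post)
    return 0.5 * (logdet_prior - logdet_post)

def posterior_covariance(A1, A_obs, C_p, gamma, sigma2):
    """Posterior covariance of Eq. (20.5.3) for Tikhonov regularization."""
    M = np.linalg.inv(A1.T @ A1)
    middle = gamma ** 4 * sigma2 * (A_obs.T @ A_obs) + np.linalg.inv(C_p)
    return M @ middle @ M
```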
By analyzing the results in Table 20.5 we can draw the following preliminary conclusions. The uncertainty decrease of the same parameter is different for different
inversion methods under the condition of the same data and a priori knowledge; that is, the information utility of the data differs between inversion methods. Under the same conditions, the entropy decrease of method A is the smallest, showing that the information utility of the least squares method with added latent data is not high, whereas the entropy decrease of method B is much larger, showing that the information utility of the regularization algorithm is high. The above information analysis reflects the average uncertainty decrease of the three parameters. However, the uncertainty decrease of each individual parameter after inversion, namely the information flow to each parameter, is the problem we care about more. Therefore we now analyze the information flow in inversion, taking as an example the regularization algorithm with the best prior information ratio. The posterior and prior variances of the parameters fiso, fgeo, fvol are determined by the diagonal elements of Cov(X̂) and of Cp[3,3], respectively. We again calculate the entropy decrease of each parameter, which is the information of the data for that parameter in the inversion, namely the information flow in the inversion. The results are in Table 20.6.

Table 20.6. Entropy decrease for the three parameters after inversion.

data No.   entropy decrease of fiso   entropy decrease of fgeo   entropy decrease of fvol
1          0.572                      0.00827                    0.00206
2          0.564                      0.00938                    0.00057
3          0.334                      0.0544                     0.00574
4          0.307                      0.0642                     0.00700
5          0.273                      0.0802                     0.0136
From Table 20.6 we can see that the information flow to fiso is the largest, that to fgeo smaller, and that to fvol the least. This shows that the distribution of the data information over the parameters is uneven; in other words, the information flow is not uniform, and under these conditions it is governed by the information matrix A₁ᵀA₁. Consequently, we should direct the information flow to the most highly utilized parameters, so as to make full use of the data information and maximally decrease the uncertainty of the parameters. Utilizing the information of the data effectively is thus a key for ill-posed remote sensing inversion problems. According to the information dilution theorem introduced by Fraiture (1986), the information in given observations is limited, so the more parameters the model has, the less information is conveyed to each of them. As a result, the simpler the model, namely the fewer the independent parameters, or the more parameters with small uncertainty are fixed in the inversion, the more effectively the limited information of the data is used for the same precision; that is, the inversion is stable, as in MSDT. The effect of controlling the information flow by fixing parameters at different inversion stages in MSDT was analyzed.
From Table 20.7 we can see that the information flow is different for different inversion strategies, namely for fixing different parameters. Therefore, in order to use the information of the data fully, it is necessary to partition the data and the parameters synthetically, and more quantitative evidence is needed. For this purpose, the information matrix based on the USM (IM-USM) was put forth as a tool for quantitatively controlling the information flow in inversion:

IM-USM = Eᵀ L^{1/2} Aᵀ A L^{1/2} E = A_USMᵀ A_USM,    (20.5.4)
where L is the matrix of eigenvalues of Cp[3,3] and E is the matrix of the corresponding column eigenvectors.

Table 20.7. Inversion results obtained by fixing parameters in multi-stage inversion.

Inversion results of the 1st stage:

parameters   inversion value   entropy decrease
fiso         0.4100            1.4216
fgeo         0.1662            1.0418
fvol         0.0189            0.3202

Inversion results of the 2nd stage:

             fix fvol                      fix fgeo                      fix fiso
parameters   value     entropy decrease    value     entropy decrease    value     entropy decrease
fiso         0.4111    1.4642              0.4171    1.4305              --        --
fgeo         0.1671    1.0854              --        --                  0.1586    1.3422
fvol         --        --                  0.0006    0.6353              0.0389    0.5541
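A sketch of how (20.5.4) could be formed numerically, assuming the eigendecomposition Cp = E L Eᵀ; the placement of the transposes is reconstructed from the identity IM-USM = A_USMᵀ A_USM and is an assumption.

```python
import numpy as np

def im_usm(A, C_p):
    """USM-based information matrix, Eq. (20.5.4): with C_p = E L E^T,
    take A_USM = A L^{1/2} E so that IM-USM = A_USM^T A_USM."""
    L, E = np.linalg.eigh(C_p)                 # eigenvalues, column eigenvectors
    A_usm = A @ np.diag(np.sqrt(L)) @ E
    return A_usm.T @ A_usm
```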
Models, a priori knowledge, optimization algorithms and inversion strategies are the main ingredients of quantitative remote sensing inversion, and the above are some advances in these respects. Because the inversion problems are ill-posed, much work remains to be done, such as on methods for the accumulation and expression of a priori knowledge, quantitative methods for partitioning data and parameters in MSDT, etc.
20.6 Conclusion
In this chapter we have given a short review of solution methods for quantitative remote sensing inversion problems. In particular, the important BRDF models were presented, and the expression and utilization of a priori knowledge were emphasized: the PDF is the common form of expression at present, and the bootstrap method is one of its best estimators. Optimization issues and the multi-stage inversion strategy were discussed, especially the MSDT and the methods for partitioning the dataset and the unknown parameters. Practical remote sensing data were used to carry out the inversion and to illustrate it. These researches show that it is very important and effective to add a priori knowledge in remote sensing inversion; in order to make the most of the limited information of the data and the a priori knowledge, MSDT and optimization algorithms are the key tools. With more
and more remote sensing observations being acquired, the temporal series of the data and of the products are becoming important a priori knowledge. Therefore, besides their use in remote sensing data assimilation, how to use them in inversion is also a very important problem deserving further investigation. In addition, the many mature optimization methods that may be employed in remote sensing inversion are also a focus.
References

[1] P. Bicheron and M. Leroy, BRDF signatures of major biomes observed from space, J. Geophys. Res., 105, 26669-26681, 2000.
[2] S. P. Chen, Geo-spatial/temporal analysis in geo-processing, J. Remote Sensing, 1(3), 161-170, 1997.
[3] X. Chen, H. J. Cui and H. Yang, The application of bootstrap method in linear BRDF models inversion, J. of Remote Sensing, 6(11), 845-851, 2007.
[4] D. Deering and P. Leone, A sphere scanning radiometer for rapid directional measurements of sky and ground radiance, Remote Sens. Environ., 19(1), 1-24, 1986.
[5] B. Efron, Bootstrap methods: another look at the Jackknife, Ann. Statist., 7(1), 1-26, 1979.
[6] H. W. Engl, M. Hanke and A. Neubauer, Regularization of Inverse Problems, Kluwer Academic Publishers, 2000.
[7] L. Fraiture, The information dilution theorem, ESA J., 10, 381-386, 1986.
[8] F. Gao, X. W. Li, Z. G. Xia, Q. J. Zhu and A. H. Strahler, Multi-stage uncertainty multi-angle remote sensing inversion based on knowledge, Science in China, Series D, 28(4), 346-350, 1998.
[9] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.
[10] P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems, Society for Industrial and Applied Mathematics, Philadelphia, 99-131, 1998.
[11] Y. He, Theory and Methods of Mathematical Statistics, Tongji University Publishing House, Shanghai, 1993.
[12] D. S. Kimes, Y. Knyazikhin, J. L. Privette, A. A. Abuelgasim and F. Gao, Inversion methods for physically-based models, Remote Sensing Reviews, 18, 381-440, 2000.
[13] D. S. Kimes, W. W. Newcomb, R. F. Nelson and J. B. Schutt, Directional reflectance distributions of hardwood and pine forest canopy, IEEE Transactions on Geoscience and Remote Sensing, 24, 281-293, 1986.
[14] X. Li and A. H. Strahler, Geometric-optical bidirectional reflectance modeling of the discrete crown vegetation canopy: effect of crown shape and mutual shadowing, IEEE Trans. Geosci. Remote Sens., 30, 276-292, 1992.
[15] X. Li and A. H. Strahler, Geometric-optical modeling of a coniferous forest canopy, IEEE Trans. Geosci. Remote Sens., 23, 705-721, 1985.
[16] X. Li, F. Gao, J. D. Wang and A. Strahler, A priori knowledge accumulation and its application to linear BRDF model inversion, J. Geophysical Research, 106(D11), 11925-11935, 2001.
[17] X. Li, F. Gao, J. D. Wang, A. H. Strahler, W. Lucht and C. Schaaf, Estimation of the parameter error propagation in inversion based BRDF observations at single sun position, Science in China, Series E, 43(Supp.), 9-16, 2000.
[18] X. Li, F. Gao, J. D. Wang and Q. Zhu, Uncertainty and sensitivity matrix of parameters in inversion of physical BRDF model, J. Remote Sensing, 1(1), 5-14, 1997.
[19] X. Li, J. D. Wang, B. X. Hu and A. H. Strahler, On utilization of prior knowledge in inversion of remote sensing model, Science in China, Series D, 41(6), 580-586, 1998.
[20] X. Li, G. J. Yan, Y. Liu, J. Wang and C. Zhu, Uncertainty and sensitivity matrix of parameters in inversion of physical BRDF models, J. Remote Sensing, 1(Supp.), 113-122, 1997.
[21] X. L. Ma, Z. M. Wan, C. C. Moeller, W. P. Menzel, L. E. Gumley and Y. Zhang, Retrieval of geophysical parameters from moderate resolution imaging spectroradiometer thermal infrared data: evaluation of a two-step physical algorithm, Appl. Opt., 39, 3537-3550, 2000.
[22] NAG, The NAG Fortran Library, mark 14, volume 3, NAG, Inc., Downers Grove, IL, 1990.
[23] W. Ni, X. Li, C. E. Woodcock, M. R. Caetano and A. H. Strahler, An analytical hybrid GORT model for bidirectional reflectance over discontinuous plant canopies, IEEE Trans. Geosci. Remote Sens., 37, 987-999, 1999.
[24] O. M. Pokrovsky and J. L. Roujean, Land surface albedo retrieval via kernel-based BRDF modeling: II. An optimal design scheme for the angular sampling, Remote Sens. Environ., 84, 120-142, 2003.
[25] W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, Numerical Recipes: The Art of Scientific Computing (Fortran Version), Cambridge Univ. Press, 1989.
[26] K. J. Ranson, L. L. Biehl and M. E. Bauer, Variation in spectral response of soybeans with respect to illumination, view and canopy geometry, Int. J. of Remote Sensing, 6(12), 1827-1842, 1985.
[27] J. L. Roujean, M. Leroy and P. Y. Deschamps, A bidirectional reflectance model of the earth's surface for the correction of remote sensing data, J. Geophys. Res., 97, 20455-20468, 1992.
[28] A. H. Strahler and D. L. B. Jupp, Modeling bidirectional reflectance of forests and woodlands using Boolean models and geometric optics, Remote Sens. Environ., 34, 153-166, 1990.
[29] G. H. Suits, The calculation of the directional reflectance of a vegetative canopy, Remote Sens. Environ., 2, 117-125, 1972.
[30] A. Tarantola, Inverse Problem Theory: Methods for Data Fitting and Model Parameter Estimation, Elsevier, 1987.
[31] W. Verhoef, Light scattering by leaf layers with application to canopy reflectance modeling: the SAIL model, Remote Sens. Environ., 16, 125-141, 1984.
[32] J. Wang, L. Zhang, Q. Liu, B. Zhang and Q. Yin, The Spectrum Knowledge Base System for Typical Objects in China, Science Press, 2009.
[33] Y. F. Wang, A. G. Yagola and C. C. Yang, Optimization and Regularization for Computational Inverse Problems and Applications, Higher Education Press, 2010.
[34] W. Wanner, X. Li and A. H. Strahler, On the derivation of kernels for kernel-driven models of bidirectional reflectance, J. Geophys. Res., 100, 21077-21090, 1995.
[35] H. Yang, W. L. Xu, H. R. Zhao, X. Chen and J. D. Wang, Information-flow and controlling in regularization inversion of quantitative remote sensing, Science in China (Series D), 48(1), 74-83, 2005.
Author Information

H. Yang, Research Center for Remote Sensing and GIS, School of Geography, Beijing Normal University; Beijing Key Laboratory for Remote Sensing of Environment and Digital Cities; State Key Laboratory of Remote Sensing Science, Beijing 100875, P. R. China. E-mail: [email protected]
Index
R-superlinear convergence, 461
l0 quasi-norm approximation, 485
l1, 481
l1-norm approximation, 487
lp-lq, 482
acoustic equation, 5
additional information, 8
alternating direction, 491
BFGS, 242
BFGS algorithm, 243
Born approximation, 456
BRDF, 512
coefficient problem, 6
coercive operator, 93
conjugate gradient method, 461
corrective filtering, 185
curvelet transform, 480
data of the direct problem, 8
data of the inverse problem, 5
direct problem, 4
Dirichlet boundary condition, 252
discrete regularization, 95
eikonal equation, transport equation, 450
electromagnetic field, 254
Error estimate, 74
extended Bayesian method, 234
filtering, 128
finite difference migration, 443
finite-difference approximation, 56
Fourier method, 127
Fourier transform, wavelet transform, Radon transform, ridgelet transform, 478
Gaussian beam migration, 447
Gel'fand-Levitan-Krein method, 371
gradient methods, 458
harmonic Bz algorithm, 339
heat capacity coefficient, 21
heat conduction coefficient, 21
heat conduction equation, 20, 21
Helmholtz equation, 454
ill-posedness of a problem, 25
impedance boundary condition, 252
interferometric migration, 448
interior problem, 6
inverse alloy design, 204
inverse kinematic problem, 7
inverse problem, 4
inverse scattering, 255
inverse vibration, 312
IOSO, 202
Ivanov theorem, 29
kernel methods, 128
Kirchhoff migration, 441
Landweber iteration, 387
Laplace's equation, 19
Lavrentiev method, 33
Lavrentiev regularization, 93
linear programming, interior point method, 489
Lippman-Schwinger equation, 412
maximum likelihood estimate, 517
measurement operator, 8
Meyer wavelet method, 133
migration deconvolution, 457
migration, inversion, 453
modulus of continuity, 28
mollification method, 134
mollifier regularization, 90
MREIT, EIT, 332
multi-stage inversion, 520
multidimensional ill-posed problem, 53
MUSIC operator, 426
MUSIC pseudo-spectrum operator, 427
Neumann boundary condition, 252
non-monotone gradient methods, 459
Nyquist-Shannon sampling theorem, 476
oceanography, marine reflection seismology, 396
parallel computing, 52
perturbation regularization, 147
phase shift migration, 443
Phillips-Tikhonov regularization, 186
preconditioner, 464
preconditioning, preconditioning gradient methods, 462
problem
    inverse boundary value problem, 6
    inverse extension problem, 6
projection of an element, 30
projection of an element onto a set, 30
pseudosolution, 173
quasi-Newton method, 241
quasi-solution, 30
ray theory, 451
regularization, 181
regularization method, 35
regularizing family of operators, 36
resolution function, 418
resolution function, point-spread function, 414
retrospective problem, 6
reverse time migration, 447
sampling, 496
scattering problem, 6
seismic interpolation, 476
SLAE, 170
solution of the direct problem, 8
solution to the inverse problem, 5
source problem, 6
sparse regularizing, l0, 481
sparse transforms, 477
spectral problem, 7
Stolt migration, 444
SVD, 175, 425
Tikhonov functional, 56
Tikhonov regularization, 92, 476
Tikhonov theorem, 26
total variation regularization, 99
trust region, 494
Volterra equation, 94
wave equation, 381
wavefield extrapolation, 441
well-posedness
    conditional well-posedness, 25
    on a pair of topological spaces, 24
well-posedness set, 25
Wolfe line search, 243