Deep Learning-Based Forward Modeling and Inversion Techniques for Computational Physics Problems

This book investigates in detail the emerging deep learning (DL) technique in computational physics, assessing its promising potential to substitute for conventional numerical solvers in calculating fields in real time. Once properly trained, the proposed architecture can resolve both forward computing and inverse retrieval problems. Pursuing a holistic perspective, the book covers the following areas. The first chapter discusses the basic DL frameworks. Then, the steady heat conduction problem is solved by the classical U-net in Chapter 2, involving both the passive and active cases. Afterwards, in Chapter 3, the sophisticated heat flux on a curved surface is reconstructed by the presented ConvLSTM, exhibiting high accuracy and efficiency. Additionally, a physics-informed DL structure along with a nonlinear mapping module is employed to obtain the space/temperature/time-related thermal conductivity from the transient temperature in Chapter 4. Finally, in Chapter 5, a series of the latest advanced frameworks and the corresponding physics applications are introduced. As deep learning techniques experience vigorous development in computational physics, more people desire related reading materials. This book is intended for graduate students, professional practitioners, and researchers who are interested in DL for computational physics.

Yinpeng Wang received the B.S. degree in Electronic and Information Engineering from Beihang University, Beijing, China in 2020, where he is currently pursuing his M.S. degree in Electronic Science and Technology. Mr. Wang focuses on the research of electromagnetic scattering, inverse scattering, heat transfer, computational multi-physical fields, and deep learning.

Qiang Ren received the B.S. and M.S. degrees, both in electrical engineering, from Beihang University, Beijing, China, and the Institute of Acoustics, Chinese Academy of Sciences, Beijing, China, in 2008 and 2011, respectively, and the Ph.D. degree in Electrical Engineering from Duke University, Durham, NC, in 2015. From 2016 to 2017, he was a postdoctoral researcher with the Computational Electromagnetics and Antennas Research Laboratory (CEARL) of the Pennsylvania State University, University Park, PA. In September 2017, he joined the School of Electronics and Information Engineering, Beihang University as an "Excellent Hundred" Associate Professor.
Deep Learning-Based Forward Modeling and Inversion Techniques for Computational Physics Problems
Yinpeng Wang Qiang Ren
Designed cover image: sakkmesterke

This work was supported by the National Natural Science Foundation of China under Grant 92166107

First edition published 2024
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2024 Yinpeng Wang and Qiang Ren

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-032-50298-4 (hbk)
ISBN: 978-1-032-50303-5 (pbk)
ISBN: 978-1-003-39783-0 (ebk)

DOI: 10.1201/9781003397830

Typeset in CMR10 font by KnowledgeWorks Global Ltd.

Publisher's note: This book has been prepared from camera-ready copy provided by the authors.
To my dear parents. (Yinpeng Wang) To my wife and my daughter. (Qiang Ren)
Contents
Preface

Symbols

1 Deep Learning Framework and Paradigm in Computational Physics
  1.1 Traditional Numerical Algorithms
    1.1.1 Method of Moments
    1.1.2 Monte Carlo Method
  1.2 Basic Neural Network Structure
    1.2.1 Fully Connected Neural Network
    1.2.2 Convolutional Neural Network
    1.2.3 Recurrent Neural Network
    1.2.4 Generative Adversarial Network
  1.3 Paradigms in Deep Learning
    1.3.1 Data Driven
    1.3.2 Physics Constraint
      1.3.2.1 Fully Connected-Based PINN
      1.3.2.2 Convolutional-Based PINN
      1.3.2.3 Recurrent-Based PINN
      1.3.2.4 Generative Adversarial-Based PINN
    1.3.3 Operator Learning
    1.3.4 Deep Learning-Traditional Algorithm Fusion
  1.4 Constitution of the Book
  Bibliography

2 Application of U-Net in 3D Steady Heat Conduction Solver
  2.1 Traditional Methods
    2.1.1 Analytical Methods
    2.1.2 Numerical Methods
  2.2 Literature Review
  2.3 3D Heat Conduction Solvers via Deep Learning
    2.3.1 Heat Conduction Model
    2.3.2 Data Set
      2.3.2.1 Thermophysical Parameters
      2.3.2.2 Basic Dataset
      2.3.2.3 Open-Source Dataset
      2.3.2.4 Enhanced Dataset
    2.3.3 Architecture of the Network
    2.3.4 Loss Functions
    2.3.5 Pre-Experiments
      2.3.5.1 Activation Function
      2.3.5.2 Learning Rate
      2.3.5.3 Dropout Ratio
      2.3.5.4 Split Ratio
      2.3.5.5 Optimizer
    2.3.6 Results
      2.3.6.1 Passive Cases
      2.3.6.2 Active Cases
      2.3.6.3 Computing Acceleration
  2.4 Conclusion
  Bibliography

3 Inversion of Complex Surface Heat Flux Based on ConvLSTM
  3.1 Introduction
  3.2 Progress in Inversion Research
    3.2.1 Conventional Approach
    3.2.2 Artificial Neural Network
  3.3 Methods
    3.3.1 Physical Model of Heat Conduction
    3.3.2 3D Transient Forward Solver Based on Joint Simulation
    3.3.3 Neural Network Framework Based on ConvLSTM
      3.3.3.1 Fully Connected Network
      3.3.3.2 Recurrent Neural Network
      3.3.3.3 Convolutional LSTM
  3.4 Results and Discussion
    3.4.1 Training of the ConvLSTM
    3.4.2 Inversion of the Regular Plane
    3.4.3 Inversion of the Complex Surface
      3.4.3.1 Thermal Inversion Results of the Fixed Complicated Model
      3.4.3.2 Thermal Inversion Results of the Variable Complicated Model
    3.4.4 Statistical Analysis and Comparison
    3.4.5 Engineering Application
  3.5 Conclusion
  Bibliography

4 Reconstruction of Thermophysical Parameters Based on Deep Learning
  4.1 Introduction
    4.1.1 Physical Foundation
  4.2 Progress in Inversion Research
    4.2.1 Gradient-Based Methods
      4.2.1.1 LM Method
      4.2.1.2 Conjugate Gradient Method
    4.2.2 Global Optimization Algorithm
      4.2.2.1 Genetic Algorithm
      4.2.2.2 Particle Swarm Optimization
    4.2.3 Deep Learning Approach
    4.2.4 Structure of the Chapter
  4.3 Physical Model and Data Generation
    4.3.1 2D Heat Conduction Model
    4.3.2 3D Heat Conduction Model
    4.3.3 Data Generation
      4.3.3.1 The Architecture of the PINN and Its Loss Functions
      4.3.3.2 Comparison with Commercial Software
  4.4 Denoising Process
    4.4.1 Conventional Denoising Approach
    4.4.2 Deep Learning Denoising Framework
    4.4.3 Training and Testing
    4.4.4 Comparisons with Other Approaches
  4.5 Inversion Process
    4.5.1 2D Cases
      4.5.1.1 DL Framework
      4.5.1.2 Training and Testing
    4.5.2 3D Cases
      4.5.2.1 DL Framework
      4.5.2.2 Reconstructing Results
      4.5.2.3 Statistical Analysis
      4.5.2.4 Generalization Ability
      4.5.2.5 Computational Speed
      4.5.2.6 Comparisons with Conventional Network
  4.6 Conclusion
  Bibliography

5 Advanced Deep Learning Techniques in Computational Physics
  5.1 Physics-Informed Neural Network
    5.1.1 Fully Connected-Based PINN
      5.1.1.1 Cylindrical Coordinate System
      5.1.1.2 Spherical Coordinate System
      5.1.1.3 Parabolic Coordinate System
    5.1.2 Convolutional-Based PINN
  5.2 Graph Neural Networks
    5.2.1 Architecture of the GNN
    5.2.2 Data Generation and Training
    5.2.3 Results
  5.3 Fourier Neural Networks
    5.3.1 Methods
      5.3.1.1 Framework Architecture
      5.3.1.2 Physics Model
      5.3.1.3 Data Generation
      5.3.1.4 Training
    5.3.2 Results and Discussion
      5.3.2.1 Prediction Accuracy
      5.3.2.2 Statistical Analysis
      5.3.2.3 Comparison
  5.4 Conclusion
  Bibliography

Index
Preface
Computational physics is the third branch of modern physics beyond experimental physics and theoretical physics. It is an emerging discipline that employs computers as tools and applies appropriate mathematical methods to calculate physical problems. Numerical modeling of either steady or transient processes in computational physics finds a plethora of applications in diverse realms, with practical impact on electromagnetics, optics, thermology, hydrodynamics, quantum mechanics, and so on. Traditional wisdom in assisting the modeling procedure is derived from applied mathematics, where a series of numerical algorithms, such as the finite difference method (FDM), the finite element method (FEM), and the method of moments (MoM), have been proposed to resolve the difference, differential, and integral equations related to multiple physics systems. Nevertheless, all the aforementioned approaches encounter the same dilemma when dealing with real computational scenarios. Under these circumstances, the physics boundaries along with the constraint equations of the system are generally discretized into a sophisticated matrix with a high-dimensional space and millions of unknown variables. Although the traditional algorithms may offer a certain feasibility for conquering the task, the computational efficiency can lag far behind. Generally speaking, tens of thousands of hours of calculation definitely hinder researchers from large-scale simulation problems demanding a real-time response.

The deep learning (DL) technique has come onto the stage of scientific computing in recent years. The most evident idea behind the DL technique in scientific computing is data-driven modeling, where the conventional solving tools merely serve to generate sufficient data for the DL algorithms to learn the underlying physics during training. In essence, the implicit relation in any forward or inverse modeling task with numerous variables can be faithfully approximated via training, rather than through resource-demanding and time-consuming numerical calculation. In terms of computational speed, well-trained DL algorithms surpass conventional numerical approaches by several orders of magnitude in forward analysis. However, purely data-dependent DL frameworks resemble black boxes, in which training samples galore are indispensable to yield a satisfactory output. To this end, physics-informed neural networks (PINNs) have been presented, which embed the physics constraint into elaborately designed loss functions, either by means of the automatic differentiation scheme or difference kernels. The training of a PINN is analogous to solving by conventional numerical algorithms, which encode the boundaries
and physics equations into a particular form. Therefore, a significant defect of the PINN is that it is applicable only to a specific scene, demonstrating an inferior generalization ability. Accordingly, the operator learning architecture has been developed, which surmounts this imperfection of the PINN and hence presents a pervasive application prospect.

The deep learning technique appears as a reformative approach with remarkable performance in scientific computing, but it is never an obscure and prohibitive method to employ in various areas. This book hereby furnishes comprehensive insights and practical guidance for pertinent researchers to establish a complicated numeric solver via deep learning networks. The intended audience includes anyone who is interested in implementing machine learning techniques in the field of computational physics, particularly forward and inverse calculations.

To start with, the first chapter provides a detailed introduction to different DL frameworks, such as the prevailing fully connected network, the convolutional network, the recurrent network, the generative adversarial network, and the graph network. After that, the paradigms for employing the deep learning mechanism to settle computational physics tasks are discussed, involving the data-driven, physics-constraint, operator learning, and DL-traditional fusion methods. Next, original and concrete experimental results are demonstrated in the following chapters to showcase the complete procedure for setting up a DL-based solver. In Chapter 2, the steady-state forward heat conduction problem is handled by the classical U-net, including the passive and active scenarios. In Chapter 3, the emerging ConvLSTM architecture is utilized to reconstruct the surface heat flux of curved surfaces. In Chapter 4, a physics-informed neural network (PINN) and a nonlinear mapping module (NMM) are adopted to reconstruct the intricate thermal conductivity. Ultimately, in Chapter 5, several of the latest advanced network structures along with the corresponding physical scenes are investigated, consisting of the application of the PINN in generalized curvilinear coordinates, a living example of graph neural networks in solving electrostatic fields, and an instance of coupled Fourier networks in tackling multiphysics field problems.

Employing the deep learning technique in the modeling of computational physics is stepwise evolving from a spark of practice into a mainstream trend, where physicists, mathematicians, and algorithm engineers gather together to tackle tangled multiscale and multiphysics problems. Considering this background, the authors anticipate that all readers will benefit greatly from this book.

Yinpeng Wang
Qiang Ren
Symbols
Symbol        Description

H             Magnetic field
E             Electrical field
T             Temperature
Φ             Electrical potential
k             Thermal conductivity
q             Heat flux
ρ             Density (mass/electrical source)
P             Power density
Cp            Constant pressure heat capacity
h             Convective heat transfer coefficient
r             Space coordinate
ε             Emissivity/Permittivity
µ             Permeability/Viscosity
t             Time
n             Unit normal vector
λ             Mean free path
kB            Boltzmann constant
n             Electron density
ν             Drift velocity
D             Diffusion velocity
n             Electron density
p             Pressure
u             Velocity
g             Gravitational acceleration
K             Convolutional kernel
W             Weight matrix
b             Bias vector
Q             Query vector
K             Key vector
V             Value vector
α             Learning rate
β             Step size
γ             Updating coefficient
θ             Network parameters
R             Real number set
δ             Unit impulse function
φ             Activation function
b, c          Offset
C̃             Cell state
χ             Input variable
H             Hidden variable
O             Output variable
P             Pooling operator
A             Direct physical field
B             Indirect physical field
E             Equation loss operator
B             Boundary loss operator
D             Observed loss operator
L             Loss function
F             Fourier operator
d             Scaling factor
∇             Gradient operator
⊗             Convolution operator
⟨·, ·⟩        Inner product operator
◦             Hadamard product
1 Deep Learning Framework and Paradigm in Computational Physics
Computational physics is a new subject that uses computers to numerically simulate physical processes. Its application is usually fairly extensive and permeates all fields of physics. The research process of computational physics mainly includes modeling, simulation, and computing. Among them, modeling means the process of abstracting physical processes into mathematical models. Simulation refers to the expression and exploration of physical laws, also known as computer experiments. Computing is the procedure of numerical research and analysis of theoretical problems using computers. Traditional computational physics includes the finite difference method [1, 2, 3], the finite element method [4, 5, 6], the variational method [7, 8], the method of moments [9], the molecular dynamics method [10, 11], the Monte Carlo simulation method [12], etc. The detailed process of the finite element method is covered in later chapters of this book, so it is not included in this first chapter. Here, the electromagnetic scattering calculation method based on the MoM and the Monte Carlo simulation method applied to steady-state heat conduction are briefly introduced.
1.1 Traditional Numerical Algorithms

1.1.1 Method of Moments
The first conventional approach to be discussed in this chapter is the method of moments (MoM), which was first introduced to the realm of computational electromagnetics by Harrington [13, 14]. The main methodology of the MoM is to convert the continuous integral equation into discrete equations that can be solved by computers [15]. The operator equation can be expressed as

$$L(f) = g \tag{1.1}$$

where L denotes the integral-equation operator of the electrical fields, and g and f are the known and unknown functions, respectively. The basic process for the MoM to solve the operator equation can be summarized in the following steps.
First of all, the discretization is implemented to express the unknown function f as a linear combination of basis functions:

$$f = \sum_{n=1}^{N} a_n f_n \tag{1.2}$$

where $f_n$ and $a_n$ are the basis function of the n-th term and the corresponding expansion coefficient, respectively, and N is the number of expansion terms. Since the integral operator is linear, substitution into Eq. 1.1 yields

$$\sum_{n=1}^{N} a_n L(f_n) \approx g \tag{1.3}$$

In this way, the operator equation is transformed into a matrix equation. Next, testing (sampling) is required, for which a weight function (or trial function) needs to be selected. Frequently used choices include the point-matching method and the Galerkin method [16], in which the unit impulse function or the basis function itself is selected as the weight function, respectively. For simplicity, the point-matching method is adopted here:

$$\omega_m(\mathbf{r}) = \delta(\mathbf{r} - \mathbf{r}_m) \tag{1.4}$$

One can define an inner product (also called a moment) that acts between the basis function and the weight function:

$$\langle f_m, f_n \rangle = \int_{f_m}\int_{f_n} f_m(\mathbf{r}) \cdot f_n(\mathbf{r}')\, d\mathbf{r}'\, d\mathbf{r} \tag{1.5}$$

In the method of moments, taking this inner product between each weight function and Eq. 1.3 establishes the relationship between the expansion coefficients and the known function:

$$\sum_{n=1}^{N} a_n \langle \omega_m, L(f_n) \rangle = \langle \omega_m, g \rangle \tag{1.6}$$

A matrix can then be constructed whose elements are

$$z_{mn} = \langle \omega_m, L(f_n) \rangle \tag{1.7}$$

and the right-hand side of Eq. 1.6 can be denoted as

$$b_m = \langle \omega_m, g \rangle \tag{1.8}$$

Here, the inner product equation has been transformed into a matrix equation:

$$A\mathbf{x} = \mathbf{b} \tag{1.9}$$

where x represents the discretization of the total electric field, b represents the discretization of the incident field, and the complex matrix A can be written as

$$A = I + G^d D^s \tag{1.10}$$

where I is the identity matrix, $D^s$ is the diagonal matrix formed by the discretization of the electromagnetic parameters in the scattering area, and $G^d$ is the coefficient matrix:

$$G^d_{p,q} = \begin{cases} \dfrac{i}{2}\left[\pi k_0 a H_1^{(2)}(k_0 a) - 2i\right], & p = q \\[2mm] \dfrac{i}{2}\,\pi k_0 a\, J_1(k_0 a)\, H_0^{(2)}(k_0 \rho_{p,q}), & p \neq q \end{cases} \tag{1.11}$$

where $k_0$ represents the wave number in vacuum, $J_1$ the first-order Bessel function, $H_1^{(2)}$ the first-order Hankel function of the second kind, and $H_0^{(2)}$ the zero-order Hankel function of the second kind, and a represents the radius of the circle whose area equals that of each grid cell $\Delta^2$:

$$a = \sqrt{\frac{\Delta^2}{\pi}} \tag{1.12}$$

$\rho_{p,q}$ represents the distance between two grid center points:

$$\rho_{p,q} = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2} \tag{1.13}$$

Here, an example is used to illustrate the reliability of the method of moments. In the two-dimensional plane, there is a square scattering area with a side length of 2 m. The electromagnetic wave is incident along the +x direction, and the incident wave is TM polarized; that is, the electric field has only the z component. The frequency of the electromagnetic wave is 300 MHz. The scatterer is elliptical, its semi-major axis is 0.5 m, and its eccentricity is $\frac{\sqrt{3}}{2}$; the internal scatterer is a uniform lossless medium with a dielectric constant of 4, and the background is vacuum. Figure 1.1 shows the comparison of the results between the forward solver and the commercial simulation software COMSOL. It can be concluded that the method of moments can accurately solve the forward problem of the electromagnetic fields. In fact, compared with the finite difference method based on difference equations, the method of moments makes use of global information when solving, so it can obtain more accurate solutions. However, a notable disadvantage is that the computational resource consumption of the algorithm is quite high, so it is difficult to apply in scenarios demanding high speed. In addition, using the method of moments to solve the integral equation is subject to certain conditions. The two most important ones are that, in the process of grid division and discretization, the grid needs to be uniform and the cell size should satisfy $k_0 a_m / 2 < 1/10$, where $a_m$ is the maximum length of the segments included in the region. In this way, the electromagnetic field in each grid cell can be regarded as a constant value.
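As a concrete illustration of Eqs. 1.9–1.13, the following is a minimal NumPy/SciPy sketch (not taken from the book) of how the MoM system matrix for this 2D TM scattering problem might be assembled. The function and variable names are hypothetical, and the contrast form of the diagonal matrix $D^s$ (here taken as $\varepsilon_r - 1$) is an assumption.

```python
import numpy as np
from scipy.special import jv, hankel2

def assemble_mom_matrix(centers, eps_r, k0, delta):
    """Assemble A = I + G^d D^s (Eq. 1.10) on a uniform grid.

    centers : (N, 2) array of grid-cell midpoints
    eps_r   : (N,) relative permittivity of each cell
    k0      : wave number in vacuum
    delta   : side length of each square cell
    """
    n = len(centers)
    a = np.sqrt(delta**2 / np.pi)  # equal-area circle radius, Eq. 1.12

    # Pairwise distances rho_{p,q} of Eq. 1.13; the diagonal is patched
    # with a dummy value because it is overwritten below.
    rho = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    np.fill_diagonal(rho, 1.0)

    # Off-diagonal (p != q) and diagonal (p == q) entries of G^d, Eq. 1.11.
    G = 0.5j * np.pi * k0 * a * jv(1, k0 * a) * hankel2(0, k0 * rho)
    np.fill_diagonal(G, 0.5j * (np.pi * k0 * a * hankel2(1, k0 * a) - 2j))

    D = np.diag(eps_r - 1.0)  # assumed contrast form of D^s
    return np.eye(n) + G @ D

# Usage: with b the incident field sampled at the cell centers, the total
# field follows from x = np.linalg.solve(A, b), i.e. Eq. 1.9.
```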
FIGURE 1.1 The comparisons between the MoM algorithm and COMSOL. (a) Fields calculated by the MoM algorithm, (b) fields computed by COMSOL, and (c) the errors.
1.1.2 Monte Carlo Method
The second method introduced here is the Monte Carlo method [12, 17, 18], which is also called the random sampling method. The algorithm makes use of random numbers generated by the computer for statistical experiments and takes statistical characteristics such as the mean value and probability as the numerical solution of the equation. In recent years, with the vigorous development of computer technology, the Monte Carlo algorithm has been widely used in the field of computational physics. Although the Monte Carlo method is slightly clumsy compared with classical numerical algorithms such as the finite element method and the finite difference method, it has an advantage that other methods cannot replace: it can solve for the value at any point in the region independently, without building on other solved points, so it can realize fast parallel computing. When applying the Monte Carlo method to solve partial differential equations, it is usually necessary to establish a probability model that yields the probability of an event through numerical simulation, so as to obtain the numerical solution at the point. This chapter will take the steady-state heat conduction equation as an example to introduce the process of using Monte Carlo simulation to obtain an approximate solution of a PDE. Suppose the problem on the solution domain D is

$$\nabla^2 u = 0 \tag{1.14}$$

$$u|_{\Gamma} = g(\Gamma) \tag{1.15}$$
Divide the spatial region to be solved uniformly, with grid size δ. Denote the internal point to be solved as S and the grid points on the boundary as Γ. The two-dimensional finite difference scheme is then

$$\frac{u(i+1,j) - 2u(i,j) + u(i-1,j)}{\delta^2} + \frac{u(i,j+1) - 2u(i,j) + u(i,j-1)}{\delta^2} = 0 \tag{1.16}$$

Therefore, the value at any point S in the region can be regarded as the average of the values at the surrounding points. Similarly, approximate equations can be established for the other points in the region. The connection between the internal point S and the boundary points can be obtained from the simultaneous equations, namely

$$u(S) = \sum_{i=1}^{N_p} g(\Gamma_i)\, f(\Gamma_i) \tag{1.17}$$

In the above formula, the value at the boundary point $\Gamma_i$ is denoted as $g(\Gamma_i)$, and its weight coefficient is $f(\Gamma_i)$. Therefore, a probability model based on random walks can be constructed to simulate the aforementioned solving process. Suppose there are $N_p$ particles starting from point S, walking along the grid randomly with independent, equal probability, and $p_i$ of them finally arrive at the boundary point $\Gamma_i$. Then formula 1.17 can be rewritten as

$$u(S) = \lim_{N_p \to \infty} \sum_{i=1}^{N_p} \frac{p_i}{N_p}\, g(\Gamma_i) \tag{1.18}$$
An example will be used to illustrate the application of the Monte Carlo simulation method in solving the steady-state heat conduction equation. On the unit square, the Dirichlet boundary conditions on the four sides are given as

$$u = 0, \qquad y = 0 \text{ and } y = 1 \tag{1.19}$$

$$u = \sin \pi y, \qquad x = 0 \tag{1.20}$$

$$u = e^{\pi} \sin \pi y, \qquad x = 1 \tag{1.21}$$
Figure 1.2 shows the analytical solution and the numerical solution obtained by Monte Carlo simulation. Figure 1.3 shows the violin diagram of the calculation error distribution under different particle numbers. Obviously, with an increasing number of particles, the calculation results become more and more accurate. However, the computational load of this algorithm is quite heavy, and it takes a long time to obtain the field values over the whole region, so its computational efficiency is fairly low.

FIGURE 1.2 The comparisons of the Monte Carlo simulation and the analytical solution. (a) Temperature calculated by the Monte Carlo simulation, (b) the analytical solution of the temperature, and (c) the errors.

FIGURE 1.3 The error of the Monte Carlo simulation with different numbers of particles.
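To make the random-walk estimate of Eq. 1.18 concrete, here is a minimal Python sketch (not from the book) for the Dirichlet problem of Eqs. 1.19–1.21. All names are illustrative; the comparison uses the analytical solution $u = e^{\pi x}\sin \pi y$, which satisfies both Laplace's equation and these boundary conditions.

```python
import numpy as np

def mc_laplace_point(ix, iy, n_grid=20, n_particles=5000, seed=0):
    """Estimate u at the interior grid point (ix, iy) of the unit square
    by the random-walk form of Eq. 1.18, with the Dirichlet data of
    Eqs. 1.19-1.21 supplying the boundary values g(Gamma_i)."""
    rng = np.random.default_rng(seed)
    h = 1.0 / n_grid

    def boundary_value(i, j):
        y = j * h
        if j == 0 or j == n_grid:                 # u = 0 on y = 0 and y = 1
            return 0.0
        if i == 0:                                # u = sin(pi*y) on x = 0
            return np.sin(np.pi * y)
        return np.exp(np.pi) * np.sin(np.pi * y)  # u = e^pi sin(pi*y) on x = 1

    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    total = 0.0
    for _ in range(n_particles):
        i, j = ix, iy
        # Equal-probability walk on the grid until a boundary node is hit.
        while 0 < i < n_grid and 0 < j < n_grid:
            di, dj = steps[rng.integers(4)]
            i, j = i + di, j + dj
        total += boundary_value(i, j)
    return total / n_particles

# Point (x, y) = (0.4, 0.5); the exact value is exp(0.4*pi)*sin(0.5*pi) ~ 3.51.
print(mc_laplace_point(8, 10), np.exp(0.4 * np.pi) * np.sin(0.5 * np.pi))
```

Each particle's walk is independent of every other, which is exactly the property that makes the method trivially parallelizable, as noted above.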
1.2 Basic Neural Network Structure
Because of the low efficiency of traditional algorithms, they are not suitable for scenarios demanding high speed. In recent years, deep learning technology has achieved many brilliant results in the field of computational physics. In general, the application of deep learning technology in computational physics includes three parts. The first is the calculation of fields, the so-called forward problem, in which the physical field is obtained from the given boundary conditions and initial conditions. The second is parameter extraction, which mainly involves the inversion of various constitutive parameters. The third is inverse design, which usually reconstructs an unknown structure from a given target response. This book mainly explores the first two types of problems, involving the calculation of the temperature field, electric field, and flow field, and the inversion of heat flux and thermal conductivity. Applying deep learning technology to solve physical problems usually requires building various neural networks. Here, the basic neural network structures will be introduced briefly.

FIGURE 1.4 A simple fully connected network composed of the input layer, several hidden layers, and the output layer.
1.2.1 Fully Connected Neural Network
The fully connected neural network [19], or FCNN for short, is the most basic type of deep learning (DL) network (shown in Figure 1.4). Generally speaking, the network consists of three parts: the input layer, the hidden layers, and the output layer. Supposing that the inputs of the i-th and (i+1)-th layers of the network are represented as $X_i$ and $X_{i+1}$, respectively, then

$$X_{i+1} = \sigma\left(W_{i+1} X_i + b_{i+1}\right) \tag{1.22}$$

where $W_{i+1}$ and $b_{i+1}$ are the weight matrix and bias vector of the (i+1)-th layer, and σ denotes the nonlinear activation function. The goal of the FCNN is to fit some unknown function. In 1991, Leshno [20] proved that an FCNN with a hidden layer is able to approximate any continuous function f(x) uniformly on a compact set with a non-polynomial activation (such as Tanh, logistic, and ReLU). Mathematically, it can be stated as: for all $\varepsilon > 0$, there exist an integer N (the total number of hidden units), as well as parameters $v_i$, $w_i$, and $b_i$, such that the function

$$F(x) = \sum_{i=1}^{N} v_i\, \phi\left(w_i^T x + b_i\right) \tag{1.23}$$

complies with $|F(x) - f(x)| < \varepsilon$ for all x. This theorem lays a solid foundation for implementing the network to approximate unknown functions.
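As a minimal sketch of the layer recursion in Eq. 1.22 (not from the book; the linear output layer and the He-style random initialization are conventions assumed here for illustration):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def fcnn_forward(x, weights, biases, activation=relu):
    """Forward pass of Eq. 1.22: X_{i+1} = sigma(W_{i+1} X_i + b_{i+1}).
    The final layer is left linear, as is common for regression outputs."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = activation(W @ x + b)
    return weights[-1] @ x + biases[-1]

# A toy 2-16-16-1 network with random parameters (illustrative only).
rng = np.random.default_rng(0)
sizes = [2, 16, 16, 1]
weights = [rng.standard_normal((m, n)) * np.sqrt(2 / n)
           for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
print(fcnn_forward(np.array([0.3, -0.7]), weights, biases))
```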
Here, several familiar activation functions [21, 22], such as the Sigmoid function, the ReLU function, the Leaky ReLU function, and the Tanh function, are defined as

$$\sigma(z) = g(z) = \frac{1}{1 + e^{-z}} \tag{1.24}$$

$$\mathrm{ReLU}(z) = \max(0, z) \tag{1.25}$$

$$\mathrm{LeakyReLU}(z) = \max(\alpha z, z) \tag{1.26}$$

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \tag{1.27}$$
Application of U-Net in 3D Steady Heat Conduction Solver
57
FIGURE 2.17 The curves of the four loss functions for regression problems. where δ is an adjustable parameter that determines the behavior of the model to handle outliers. When δ tends to 0, it degenerates into MAE while when it tends to infinity, it degenerates into MSE. The Huber loss function inherits the advantages of MAE and MSE, which not only maintains the continuous derivative of the loss function but also emerges better robustness to outliers. Figure 2.17 displays the curves of the loss functions for regression problems. Through the comprehensive comparison, Huber loss function is adopted in the following studies.
2.3.5
Pre-Experiments
To avert the waste of computing resources due to poorly designed experiments, it is valuable to find out the best experimental conditions. A common approach is to conduct pre-experiments on a small dataset with adjustable hyperparameters before formal experiments. This section mainly focuses on the activation function, learning rate, dropout rate, split ratio, optimizer, etc. 2.3.5.1
Activation Function
The activation function is vital for neural networks to interpret complex systems, without which the output of the network is a linear combination of the input, no matter how many layers the network has. The activation function introduces the nonlinearity to the network so that it can approximate any function. In this section, several activation functions related to the preexperiments will be discussed. • A. Sigmoid function The Sigmoid function is the analytical solution of logistic differential equations, which is often used in the field of ecology. It is defined as σ(x) =
1 1 + e−x
(2.45)
58
Deep Learning-Based Forward Modeling and Inversion Techniques
This function is a monotone increasing function, mapping (−∞, ∞) to (0, 1). Figure 2.18 (a) shows the curve of the Sigmoid function and its derivative. It can be found that the function is smooth and has a continuous derivative. However, the output of Sigmoid is not zero-centered, which will alter the distribution of data with the deepening of the network layers. In addition, since the derivative function ranges from 0 to 0.25, it is easy to suffer from gradient saturation according to the chain rule. • B. Hyperbolic tangent function The hyperbolic tangent function is defined as tanh(x) =
ex − e−x ex + e−x
(2.46)
As exhibited in Figure 2.18 (b), the curve of the hyperbolic tangent function is close to that of the Sigmod function. It is also monotone increasing, mapping (−∞, ∞) to (−1, 1). Although the function has the advantage of zerocentered, it still cannot obviate gradient saturation. In addition, the complexity of derivative operation for exponential function is also the disadvantage of the above two functions. • C. ReLU function ReLU is currently the most widely used activation function whose definition is: ( x x≥0 ReLU (x) = (2.47) 0 x