Lecture Notes on Numerical Methods in Engineering and Sciences
Genki Yagawa Atsuya Oishi
Computational Mechanics with Neural Networks
Lecture Notes on Numerical Methods in Engineering and Sciences Editorial Board Francisco Chinesta, Ecole Centrale de Nantes, Nantes Cedex 3, France Charbel Farhat, Department of Mechanical Engineering, Stanford University, Stanford, CA, USA C. A. Felippa, Department of Aerospace Engineering Science, College of Engineering & Applied Science, University of Colorado, Boulder, CO, USA Antonio Huerta, Universitat Politècnica de Catalunya, Barcelona, Spain Thomas J. R. Hughes, Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX, USA Sergio Idelsohn, CIMNE - UPC, Barcelona, Spain Pierre Ladevèze, Ecole Normale Supérieure de Cachan, Cachan Cedex, France Wing Kam Liu, Evanston, IL, USA Xavier Oliver, Campus Nord UPC, International Center of Numerical Methods, Barcelona, Spain Manolis Papadrakakis, National Technical University of Athens, Athens, Greece Jacques Périaux, Barcelona, Spain Bernhard Schrefler, CISM - International Centre for Mechanical Sciences, Padua, Italy Genki Yagawa, School of Engineering, University of Tokyo, Tokyo, Japan Mingwu Yuan, Beijing, China Series Editor Eugenio Oñate, Jordi Girona, 1, Edifici C1 - UPC, Universitat Politecnica de Catalunya, Barcelona, Spain
This series publishes text books on topics of general interest in the field of computational engineering sciences. The books will focus on subjects in which numerical methods play a fundamental role for solving problems in engineering and applied sciences. Advances in finite element, finite volume, finite differences, discrete and particle methods and their applications to classical single discipline fields and new multidisciplinary domains are examples of the topics covered by the series. The main intended audience is the first year graduate student. Some books define the current state of a field to a highly specialised readership; others are accessible to final year undergraduates, but essentially the emphasis is on accessibility and clarity. The books will be also useful for practising engineers and scientists interested in state of the art information on the theory and application of numerical methods.
More information about this series at http://www.springer.com/series/8548
Genki Yagawa · Atsuya Oishi
Computational Mechanics with Neural Networks
Genki Yagawa University of Tokyo Tokyo, Japan
Atsuya Oishi Tokushima University Tokushima, Japan
Toyo University Tokyo, Japan
ISSN 1877-7341 ISSN 1877-735X (electronic) Lecture Notes on Numerical Methods in Engineering and Sciences ISBN 978-3-030-66110-6 ISBN 978-3-030-66111-3 (eBook) https://doi.org/10.1007/978-3-030-66111-3 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The first article on the finite element method for engineering problems was published in 1956 (Turner et al. 1956). Since then, the application of computational mechanics to science and technology fields has expanded rapidly, assisted by the astonishing progress of computers. A number of international journals devoted to research articles in the field of computational mechanics have been launched, including International Journal for Numerical Methods in Engineering (since 1969), Computers and Structures (since 1971), Computer Methods in Applied Mechanics and Engineering (since 1972), Finite Elements in Analysis and Design (since 1985), Computational Mechanics (since 1986), Archives of Computational Methods in Engineering (since 1994) and others. As is well known, the core of computational mechanics has been, and still is, to find numerical solutions for a variety of partial differential equations in science and engineering. In the meantime, the neural networks emerged, first as the perceptron (Rosenblatt 1958) and then as the error back-propagation algorithm (Rumelhart et al. 1986). Since then, the neural networks have been applied successfully to various fields, including computational mechanics. The neural networks have opened fresh fields in computational mechanics, adding soft or ambiguous features to the traditional, rigid computational mechanics, which solves such problems as structural or fluid problems by discretizing the partial differential equations. Several decades after the invention of the neural networks in the middle of the twentieth century, the deep learning, an extension of the feedforward neural networks, was initiated with a new idea of pretraining (Hinton et al. 2006), making it possible to train large-scale feedforward neural networks with many layers. Needless to say, the impact of the deep learning on society is remarkable, and the same can be said of its impact on computational mechanics. Depicted in Fig. 1 is the trend of the number of research articles related to computational mechanics with the neural networks published in the journals above. It is interesting to see from the figure that the number has been growing sharply in the last few years, although it experienced a short slowdown period after 2010. One of the reasons for the recent sharp growth in the number of published papers is the deep learning supported by Graphics Processing Units (GPUs).
[Fig. 1 Trend of the number of articles on computational mechanics with neural networks, 1980–2020: CMAME, IJNME, CM, and the total over five journals; vertical axis: number of articles]
It is also noted that various machine learning methods in addition to the neural networks have been employed in computational mechanics, including the genetic algorithms (GAs) based on the biological evolution model, the genetic programming (GP), the evolutionary algorithm (EA) or the evolutionary strategy (ES), and the support vector machine (SVM). They play roles similar to those of the neural networks in the field of computational mechanics. This book is written to address what has been achieved by the neural networks and the related methods in the field of computational mechanics, and how the neural networks are applied to computational mechanics. Part I is devoted to the fundamentals of the neural networks and other machine learning methods in relation to computational mechanics. In Chap. 1, the basics of computers and networks are given. In Chaps. 2–5, the neural networks are discussed from the viewpoint of application to computational mechanics. Some machine learning methods other than the neural networks are given in Chap. 6. Part II deals with the applications of the neural networks to a variety of problems of computational mechanics. After an overview of computational mechanics with the neural networks in Chap. 7, applications of the neural networks to computational mechanics are discussed in Chaps. 8–14. In Chap. 15, applications of the other machine learning methods to computational mechanics are given. Finally, in Chap. 16, the perspective on the applications of the deep learning to computational mechanics is discussed. We express our cordial thanks to all the colleagues and students who have collaborated over several decades: S. Yoshimura, M. Oshima, H. Okuda, T. Furukawa, N. Soneda, H. Kawai, R. Shioya, Y. Nakabayashi, T. Horie, Y. Kanto, Y. Wada, T. Miyamura, G. W. Ye, T. Yamada, A. Yoshioka, M. Shirazaki, H. Matsubara, T. Fujisawa,
H. Hishida, Y. Mochizuki, T. Kowalczyk, A. Matsuda, C. R. Pyo, J. S. Lee and K. Yamada. Many of these individuals have contributed to carrying out the research covered in this book. We are particularly grateful to Prof. E. Oñate (CIMNE/Technical University of Catalonia, Spain) for his many important suggestions and encouragement during the publication process of the book.

Tokyo, Japan
Tokushima, Japan
September 2020
Genki Yagawa Atsuya Oishi
References Hinton, G.E., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1544 (2006) Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958) Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986) Turner, M.J., Clough, R.W., Martin, H.C., Topp, L.J.: Stiffness and deflection analysis of complex structures. J. Aeronaut. Sci. 23, 805–823 (1956)
Contents

Part I Preliminaries: Machine Learning Technologies for Computational Mechanics

1 Computers and Network
1.1 Computers and Processors
1.2 Network Technologies
1.3 Parallel Processing
1.4 Numerical Precision
References

2 Feedforward Neural Networks
2.1 Bases
2.2 Various Types of Layers
2.3 Regularization
2.4 Acceleration for Training
2.5 Initialization of Connection Weights
2.6 Model Averaging and Dropout
References

3 Deep Learning
3.1 Neural Network Versus Deep Learning
3.2 Pretraining: Autoencoder
3.3 Pretraining: Restricted Boltzmann Machine
References

4 Mutually Connected Neural Networks
4.1 Hopfield Network
4.2 Boltzmann Machine
References

5 Other Neural Networks
5.1 Self-organizing Maps
5.2 Radial Basis Function Networks
References

6 Other Algorithms and Systems
6.1 Genetic Algorithms
6.2 Genetic Programming
6.3 Other Bio-inspired Algorithms
6.4 Support Vector Machines
6.5 Expert Systems
6.6 Software Tools
References

Part II Applications

7 Introductory Remarks
References

8 Constitutive Models
8.1 Parameter Determination of Viscoplastic Constitutive Equations
8.2 Implicit Constitutive Modelling for Viscoplasticity
8.3 Autoprogressive Algorithm
8.4 Others
References

9 Numerical Quadrature
9.1 Optimization of Number of Quadrature Points
9.2 Optimization of Quadrature Parameters
Reference

10 Identifications of Analysis Parameters
10.1 Time Step Determination of Pseudo Time-Dependent Stress Analysis
10.2 Parameter Identification of Augmented Lagrangian Method
10.3 Predictor–Corrector Method for Nonlinear Structural Analysis
10.4 Contact Stiffness Estimation
References

11 Solvers and Solution Methods
11.1 Finite Element Solutions Through Direct Minimization of Energy Functional
11.2 Neurocomputing Model for Elastoplasticity
11.3 Structural Re-analysis
11.4 Simulations of Global Flexibility and Element Stiffness
11.5 Solutions Based on Variational Principle
11.6 Boundary Conditions
11.7 Hybrid Graph-Neural Method for Domain Decomposition
11.8 Wavefront Reduction
11.9 Contact Search
11.10 Physics-Informed Neural Networks
11.11 Dynamic Analysis with Explicit Time Integration Scheme
11.12 Reduced Order Model for Improvement of Solutions Using Coarse Mesh
References

12 Structural Identification
12.1 Identification of Defects with Laser Ultrasonics
12.2 Identification of Cracks
12.3 Estimation of Stable Crack Growth
12.4 Failure Mechanisms in Power Plant Components
12.5 Identification of Parameters of Non-uniform Beam
12.6 Prediction of Beam-Mass Vibration
12.7 Others
12.7.1 Nondestructive Evaluation with Neural Networks
12.7.2 Structural Identification with Neural Networks
12.7.3 Neural Networks Combined with Global Optimization Method
12.7.4 Training of Neural Networks
References

13 Structural Optimization
13.1 Hole Image Interpretation for Integrated Topology and Shape Optimization
13.2 Preform Tool Shape Optimization and Redesign
13.3 Evolutionary Methods for Structural Optimization with Adaptive Neural Networks
13.4 Optimal Design of Materials
13.5 Optimization of Production Process
13.6 Estimation and Control of Dynamic Behaviors of Structures
13.7 Subjective Evaluation for Handling and Stability of Vehicle
13.8 Others
References

14 Some Notes on Applications of Neural Networks to Computational Mechanics
14.1 Comparison among Neural Networks and Other AI Technologies
14.2 Improvements of Neural Networks in Terms of Applications to Computational Mechanics
References

15 Other AI Technologies for Computational Mechanics
15.1 Parameter Identification of Constitutive Model
15.2 Constitutive Material Model by Genetic Programming
15.3 Data-Driven Analysis Without Material Modelling
15.4 Numerical Quadrature
15.5 Contact Search Using Genetic Algorithm
15.6 Contact Search Using Genetic Programming
15.7 Solving Non-linear Equation Systems Using Genetic Algorithm
15.8 Nondestructive Evaluation
15.9 Structural Optimization
15.10 Others
References

16 Deep Learning for Computational Mechanics
16.1 Neural Networks Versus Deep Learning
16.2 Applications of Deep Convolutional Neural Networks to Computational Mechanics
16.3 Applications of Deep Feedforward Neural Networks to Computational Mechanics
16.4 Others
References

Appendix
Uncited References
Part I
Preliminaries: Machine Learning Technologies for Computational Mechanics
Chapter 1
Computers and Network
Abstract This chapter focuses on the progress of several features of computational mechanics and neural networks. Section 1.1 deals with the computers and the processors, Sect. 1.2 the network technologies, Sect. 1.3 the parallel processing and Sect. 1.4 the numerical precision.
1.1 Computers and Processors

Needless to say, one of the directions of computational mechanics, aiming at faster solutions for larger problems, has been supported by the unprecedented progress of computers. The computing speed is measured by the number of floating-point arithmetic operations performed per second, FLOPS (FLoating-point Operations Per Second). Based on the computing performance in executing the LINPACK code, the benchmark program to measure FLOPS, the ranking of supercomputers in the world has been announced since 1993 [https://www.top500.org]. According to the list as of June 2020, the world's fastest computer can compute at 415.5 PFLOPS (Peta (10^15) FLOPS), which is almost 240 times and 175,000 times faster than the fastest machines of 10 years before and 20 years before, respectively. Estimated from the data of the last few decades, the computing speed of the fastest computer measured by FLOPS has improved at the rate of about 400 times per decade, meaning that we now have a computational power almost 40 × 10^15 times as fast as that of the year 1956, when the first article on the FEM for engineering problems was published [1]. Figure 1.1 shows how the supercomputers have progressed, where the vertical axis is the speed of selected computers measured in GFLOPS (Giga FLOPS). This remarkable progress of computers has been supported by innovative technologies developed for the hardware architecture of computers [2, 3], including vector processors, massively parallel computers, multi-core processors, many-core processors such as GPUs (Graphics Processing Units), FPGAs (Field Programmable Gate Arrays) and ASICs (Application Specific Integrated Circuits).
[Fig. 1.1 Progress of supercomputers: performance (GFLOPS) versus year, 1960–2020, from CDC6600 and Cray-1 through Cray-2, SX-3, ASCI Red, Earth Simulator, Roadrunner, K-computer and TaihuLight to Fugaku]
The most modern computers for HPC (High Performance Computing) make use of multicore CPUs (Central Processing Units) and GPUs. Multicore CPUs, such as the Intel Xeon processors and the IBM Power processors, have multiple computing cores, and they also have SIMD processing units that can accelerate calculations in many HPC applications. Table 1.1 shows specifications of some CPUs, which are usually designed for general purposes and perform well for almost any application. GPUs such as the NVIDIA GeForce and the AMD Radeon were first developed to process image data that consist of many pixels. Therefore, GPUs have several hundreds to thousands of computing cores; they are not designed for general purposes and show poor performance for some problems, but for suitable problems they show very good performance, much better than CPUs. They have proved to be well suited to processing large-scale neural networks, which are often seen in the deep learning. Table 1.2 shows specifications of some GPUs. The use of GPUs for numerical processing other than graphics is called GPGPU (General Purpose GPU), or GPU computing. Shaders and graphics libraries, such as OpenGL and DirectX, were initially used to control GPUs for GPU computing [4, 5]. Programming languages specially designed for GPU computing have also become popular due to their ease of use; among others, CUDA (Compute Unified Device Architecture), developed by NVIDIA, is the most popular one [6, 7]. OpenCL [8] is used for programming GPUs as well.
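To make the idea of GPU computing concrete, the following is a minimal, illustrative sketch, not taken from the book. It assumes the CuPy library, which exposes a NumPy-like interface on top of CUDA, an NVIDIA GPU, and arbitrary array sizes chosen only for this example.

```python
# Offloading a matrix product to the GPU with CuPy (assumed installed to match
# the local CUDA toolkit); the same operation is done on the CPU for comparison.
import numpy as np
import cupy as cp

a_cpu = np.random.rand(4096, 4096).astype(np.float32)
b_cpu = np.random.rand(4096, 4096).astype(np.float32)

a_gpu = cp.asarray(a_cpu)          # host-to-device transfer
b_gpu = cp.asarray(b_cpu)
c_gpu = a_gpu @ b_gpu              # executed on the GPU's many cores
c_cpu = cp.asnumpy(c_gpu)          # device-to-host transfer

print(np.allclose(c_cpu, a_cpu @ b_cpu, rtol=1e-4))  # single precision tolerance
```

The sketch only shows the programming model (explicit transfers plus device execution); measured speed-ups depend entirely on the hardware and problem size.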
Table 1.1 Specifications of CPUs (2019.09)

Manufacturer   Product name          Number of cores   SIMD extension
intel          Xeon Platinum 9282    56                AVX512
intel          Xeon Gold 6262V       24                AVX512
intel          Xeon Silver 4216      16                AVX512
intel          Xeon Bronze 3204      6                 AVX512
intel          Core i9 9900K         8                 AVX2
IBM            Power9                24                —
AMD            EPYC 7742             64                AVX2
AMD            Ryzen 9 3950X         16                AVX2
AMD            Threadripper 2990WX   32                AVX2

Table 1.2 Specifications of GPUs (2019.09)

Manufacturer   Product name          Number of cores   Extension
NVIDIA         GeForce RTX 2080 Ti   4352              544 Tensor cores
NVIDIA         TITAN RTX             4608              576 Tensor cores
NVIDIA         Tesla P100            3584              FP16
NVIDIA         Tesla V100            5120              640 Tensor cores
AMD            Radeon RX Vega 64     4096              —
AMD            Radeon Instinct MI60  4096              FP16
The FPGA [9] has been recognized as an alternative because it has enough gates for constructing fast, large-scale numerical circuits. An FPGA consists of many LBs (logic blocks) including DSPs (digital signal processors), IOBs (input/output blocks) and SBs (switch blocks), and users can configure any application-specific circuit by connecting them on the chip. Table 1.3 shows specifications of some large-scale FPGAs. To build a circuit on an FPGA, users first write the specification of the circuit in a hardware description language (HDL), such as Verilog HDL or VHDL [10, 11], and then implement it on the FPGA chip using an IDE (integrated development environment) provided by the manufacturer of the FPGA, such as Vivado (by Xilinx) or Quartus (by Altera/Intel).
Table 1.3 Specifications of FPGAs (2019.09)

Manufacturer   Product name   Number of LBs   Number of DSPs   Memory (Mb)
Xilinx         XCVU13P        3,780,000       12,288           455
Xilinx         XCVU19P        8,938,000       3840             224
intel/Altera   AGI 027        2,692,760       8736             259
Recently, users can instead use standard C language or OpenCL, rather than HDLs, to specify the circuit. ASICs, including many proprietary processors for numerical processing, were proposed several years ago, but many of them disappeared because they were outperformed both in cost and in performance by commodity CPUs. Among others, Google has developed with remarkable success a special-purpose processor for deep learning, called the TPU (Tensor Processing Unit) [12, 13], which was originally designed not for training but for inference. These developments of hardware technologies have, as a matter of course, stimulated many innovations in the field of software technologies. In computational mechanics, various new software and solution techniques have been developed to make use of these new kinds of hardware [14–19].
1.2 Network Technologies

Together with the remarkable progress of computers, networking technologies have made great advances in performance and availability during the last decades. Ethernet, a typical networking technology to connect computers in a LAN (Local Area Network), has been developed toward ever higher bandwidth, starting from 10 Mbps (Mbit/s) around 1980 and reaching 10 Gbps availability by 2020 or so. It has also promoted parallel processing using multiple computers connected via Ethernet. As the machine learning and the deep learning require not only a large amount of computation for training but also huge amounts of data as input for training, the unprecedented advance of networking technologies has had a big impact on these technologies.
1.3 Parallel Processing

Parallel processing with multiple computers or processing cores for solving a single problem has become popular in many fields including computational mechanics. This is partly because the progress in performance of a single processing core has apparently slowed down, and much better performance is difficult to realize without the help of multiple processors. One of the most popular parallel processing environments is a cluster, which consists of multiple computers mutually
connected via network. It is noted that a single CPU with multiple processing cores can also be regarded as a parallel processing environment. According to Flynn [20], parallel processing is categorized into the following four types:

SISD (Single Instruction stream, Single Data stream): not parallel but sequential.
SIMD (Single Instruction stream, Multiple Data stream).
MISD (Multiple Instruction stream, Single Data stream).
MIMD (Multiple Instruction stream, Multiple Data stream).

It is well known that a cluster is usually used as the MIMD type, and GPUs as the SIMD type. Another categorization, based on the memory configuration, is as follows:

Shared memory: all processors use a common memory space, and inter-processor data transfer via network is unnecessary (e.g. a single CPU with multiple cores).
Distributed memory: each processor exclusively uses its own memory, and inter-processor data transfer via network is necessary (e.g. a cluster).

Support by software is essential for parallel processing. OpenMP is usually used not only for the shared memory environment but also for each multicore processor in a distributed memory environment such as a cluster. To perform parallel processing with OpenMP, users have only to write directives just before the code fragments to be executed in parallel [21]. In addition to OpenMP, MPI (Message Passing Interface) [22, 23] was developed for data transfer among processors in the distributed memory environment. Users need to specify the details of the data transfer, that is, to know and manage all the data among processors throughout the program execution [24–26].
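As a concrete illustration of the distributed-memory, message-passing style described above, here is a minimal sketch, not from the book. It assumes the mpi4py package (a Python binding of MPI); the data and process counts are arbitrary.

```python
# mpi_sum.py: each MPI process owns only its local data (distributed memory)
# and the partial results are combined by explicit communication.
# Run with, for example:  mpiexec -n 4 python mpi_sum.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # id of this process
size = comm.Get_size()      # total number of processes

# Local portion of a global array, never assembled on a single node.
local = np.arange(rank * 1000, (rank + 1) * 1000, dtype=np.float64)
local_sum = local.sum()

# Reduce the partial sums onto process 0 (MIMD processes, explicit transfer).
total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print("global sum =", total)
```

In an OpenMP (shared memory) version of the same task, no explicit transfer would be written; the directives alone distribute the loop among cores.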
1.4 Numerical Precision

A real number is represented either in a fixed-point format, such as 123.456, or in a floating-point format, such as 1.23456 × 10^2. The latter can represent a much broader range of orders of magnitude than the former with the same fixed number of total bits, and therefore the latter is exclusively employed in computational mechanics. Note, however, that arithmetic calculations can be performed much faster with the former format. Representations of floating-point real numbers on computers are defined by IEEE, known as the IEEE 754 standard, which prepares four formats of floating-point numbers according to their numerical precision: FP128, FP64, FP32 and FP16. The number in each format name indicates the total number of bits: the larger the number, the higher the precision that can be represented.
Solving partial differential equations numerically is the major task in computational mechanics, and it is usually done with double precision (FP64) computation. The computing speed in double precision is therefore possibly the most important factor when selecting computers for computational mechanics applications, and most CPUs and GPUs are designed so that double precision as well as single precision arithmetic is executed efficiently with hardware support. On the other hand, the neural networks, including the deep neural networks, do not require double precision arithmetic, and single precision arithmetic usually works well. GPUs, which are known to execute single precision arithmetic much more efficiently than double precision, show good performance for the training of neural networks. Employing a lower precision format for numerical data directly reduces the amount of memory space required to store the data, and the computational time as well. When using commercially available commodity CPUs, however, numerical operations on real numbers of precision lower than single precision (FP32) do not run faster as expected, due to the lack of hardware support dedicated to operations on lower precision numbers. Only when using GPUs equipped with hardware for calculations in the half precision format (FP16) can we expect almost twice faster calculations for deep learning applications with FP16 than with FP32. Uses of reduced precision for neural networks have been studied, including deep learning with limited numerical precision [27], evaluation of low precision storage for deep learning [28], and training with connection weights and activations constrained to +1 or −1 [29, 30]. These studies show that, when using the fixed-point real number format rather than the floating-point one, arithmetic circuits specific to the low-precision format implemented in an FPGA can be much more compact and perform much faster than circuits for floating-point numbers.
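The trade-off between storage and precision can be seen directly with NumPy, which exposes the FP64, FP32 and FP16 formats discussed above. The following sketch is not from the book; the array size is arbitrary.

```python
# Storage size and rounding behaviour of IEEE 754 formats available in NumPy.
import numpy as np

for dtype in (np.float64, np.float32, np.float16):
    info = np.finfo(dtype)
    x = np.ones(1_000_000, dtype=dtype)
    print(dtype.__name__, x.nbytes, "bytes total, machine epsilon =", info.eps)

# FP16 cannot resolve 1.0 + 1.0e-4: the spacing of FP16 numbers near 1.0 is
# about 1.0e-3, so the small increment is rounded away; FP32 resolves it.
print(np.float16(1.0) + np.float16(1.0e-4) == np.float16(1.0))   # True
print(np.float32(1.0) + np.float32(1.0e-4) == np.float32(1.0))   # False
```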
References 1. Turner, M.J., Clough, R.W., Martin, H.C., Topp, L.J.: Stiffness and deflection analysis of complex structures. J. Aeronaut. Sci. 23, 805–823 (1956) 2. Dowd, K.: High Performance Computing. O'Reilly & Associates (1993) 3. Almasi, G.S., Gottlieb, A.: Highly Parallel Computing, 2nd edn. Benjamin/Cummings (1994) 4. Rost, R.J.: OpenGL Shading Language, 2nd edn. Addison-Wesley, Boston (2006) 5. Oishi, A., Yoshimura, S.: Finite element analyses of dynamic problems using graphic hardware. Comput. Model. Eng. Sci. 25(2), 115–131 (2008) 6. Kirk, D.B., Hwu, W.W.: Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann (2010) 7. Sanders, J.: CUDA by Example: An Introduction to General-Purpose GPU Programming. Addison-Wesley, Boston (2011) 8. Munshi, A., Gaster, B.R., Mattson, T.G., Fung, J., Ginsburg, D.: OpenCL Programming Guide. Addison-Wesley, Boston (2012) 9. Gokhale, M.B., Graham, P.S.: Reconfigurable Computing: Accelerating Computation with Field-Programmable Gate Arrays. Springer, Berlin (2005)
10. Palnitkar, S.: Verilog HDL: A Guide to Digital Design and Synthesis. Prentice-Hall, Upper Saddle River (1996) 11. Ashenden, P.J.: The Designer's Guide to VHDL, 3rd edn. Morgan-Kaufmann, Burlington (2008) 12. Jouppi, N.P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., Bates, S., Bhatia, S., Boden, N., Borchers, A., Boyle, R., Cantin, P., Chao, C., Clark, C., Coriell, J., Daley, M., Dau, M., Dean, J., Gelb, B., Ghaemmaghami, T.V., Gottipati, R., Gulland, W., Hagmann, R., Ho, C.R., Hogberg, D., Hu, J., Hundt, R., Hurt, D., Ibarz, J., Jaffey, A., Jaworski, A., Kaplan, A., Khaitan, H., Koch, A., Kumar, N., Lacy, S., Laudon, J., Law, J., Le, D., Leary, C., Liu, Z., Lucke, K., Lundin, A., MacKean, G., Maggiore, A., Mahony, M., Miller, K., Nagarajan, R., Narayanaswami, R., Ni, R., Nix, K., Norrie, T., Omernick, M., Penukonda, N., Phelps, A., Ross, J., Ross, M., Salek, A., Samadiani, E., Severn, C., Sizikov, G., Snelham, M., Souter, J., Steinberg, D., Swing, A., Tan, M., Thorson, G., Tian, B., Toma, H., Tuttle, E., Vasudevan, V., Walter, R., Wang, W., Wilcox, E., Yoon, D.H.: In-datacenter performance analysis of a Tensor Processing Unit. In: Proceedings of the 44th Annual International Symposium on Computer Architecture, Toronto, June 24–28, pp. 1–12 (2017) 13. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., Hassabis, D.: Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017) 14. Coutinho, A.L.G.A., Alves, J.D., Ebecken, N.F.F., Troina, L.M.: Conjugate gradient solution of finite element equations on the IBM 3090 vector computer utilizing polynomial preconditionings. Comput. Methods Appl. Mech. Eng. 84, 129–145 (1990) 15. Yagawa, G., Soneda, N., Yoshimura, S.: A large scale finite element analysis using domain decomposition method on a parallel computer. Comput. Struct. 38, 615–625 (1991) 16. Garatani, K., Nakajima, K., Okuda, H., Yagawa, G.: Three-dimensional elasto-static analysis of 100 million degrees of freedom. Adv. Eng. Softw. 32, 511–518 (2001) 17. Akiba, H., Ohyama, T., Shibata, Y., Yuyama, K., Katai, Y., Takeuchi, R., Hoshino, T., Yoshimura, S., Noguchi, H., Gupta, M., Gunnels, J.A., Austel, V., Sabharwal, Y., Garg, R., Kato, S., Kawakami, T., Todokoro, S., Ikeda, J.: Large scale drop impact analysis of mobile phone using ADVC on Blue Gene/L. In: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, November, 2006, Tampa, Florida. https://doi.org/10.1145/1188455.1188503 18. Kawai, H., Ogino, M., Shioya, R., Yoshimura, S.: Large scale elasto-plastic analysis using domain decomposition method optimized for multi-core CPU architecture. Key Eng. Mater. 462–463, 605–610 (2011) 19. Papadrakakis, M., Stravroulakis, G., Karatarakis, A.: A new era in scientific computing: domain decomposition methods in hybrid CPU-GPU architectures. Comput. Methods Appl. Mech. Eng. 200, 1490–1508 (2011) 20. Flynn, M.J.: Very high-speed computing systems. Proc. IEEE 54(12), 1901–1909 (1966) 21. Chandra, R., Dagum, L., Kohr, D., Maydan, D., McDonald, J., Menon, R.: Parallel Programming in OpenMP. Morgan Kaufmann, Burlington (2001) 22. Snir, M., Otto, S., Huss-Lederman, S., Walker, D., Dongarra, J.: MPI—The Complete Reference Volume 1, The MPI Core, 2nd edn. MIT Press, Cambridge (1998) 23. 
Gropp, W., Huss-Lederman, S., Lumsdaine, A., Lusk, E., Nitzberg, B., Saphir, W., Snir, M.: MPI—The Complete Reference Volume 2, The MPI Extensions. MIT Press, Cambridge (1998) 24. Pacheco, P.S.: Parallel Programming with MPI. Morgan Kaufmann, Burlington (1997) 25. Gropp, W., Lusk, E., Skjellum, A.: Using MPI: Portable Parallel Programming with the Message-Passing Interface, 2nd edn. MIT Press, Cambridge (1999) 26. Gropp, W., Lusk, E., Thakur, R.: Using MPI-2: Advanced Features of the Message-Passing Interface. MIT Press, Cambridge (1999) 27. Gupta, S., Agrawal, A., Gopalakrishnan, K., Narayanan, P.: Deep learning with limited numerical precision. In: Proceedings of the 32nd International Conference on Machine Learning, Lille, France (2015)
28. Courbariaux, M., David, J.P., Bengio, Y.: Low precision storage for deep learning (2014). arXiv:1412.7024 29. Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., Bengio, Y.: Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 (2016). arXiv:1602.02830 30. Courbariaux, M., Bengio, Y., David, J.P.: Binaryconnect: Training deep neural networks with binary weights during propagations. Adv. Neural. Inf. Process. Syst. 28, 3105–3113 (2015)
Chapter 2
Feedforward Neural Networks
Abstract The feedforward neural network is the most fundamental type of neural network, both from the historical viewpoint and in terms of its wide applicability. This chapter discusses several aspects of this type of neural network in detail. Section 2.1 describes its fundamental structure and algorithm, Sect. 2.2 various types of layers, Sect. 2.3 some techniques for regularization, Sect. 2.4 the acceleration techniques for training, Sect. 2.5 the methods for weight initialization, and finally Sect. 2.6 the model averaging technique and the Dropout.
2.1 Bases

The feedforward neural network is a system of multi-layered processing components (Fig. 2.1). Each layered component consists of some units, the multiple-input, single-output processors each modelled after a nerve cell called a neuron, receiving data from the units in the preceding layer as input and providing a single value as output (Fig. 2.2). Assuming the number of layers to be N, the network consists of the input (first) layer, the hidden (second through (N − 1)-th) layers, and the output (N-th) layer.

[Fig. 2.1 Feedforward neural network: units arranged in an input layer, hidden layers and an output layer, with data flowing from the input data to the output data]

[Fig. 2.2 Schematic diagram of a unit]

In a standard feedforward neural network, a unit is connected to units in the neighboring layers, and a connection weight is assigned to each such pair of units. It is noted that connections exist neither among units in the same layer nor between units in non-neighboring layers. The data provided at the input layer run through the neural network via the connections between units, from the input layer to the output layer. It is well known that a feedforward neural network of more than three layers can simulate any non-linear continuous function with any precision desired [1, 2]. This, however, merely ensures the existence of a neural network that can simulate a target function, without any suggestion on what architecture to use, including the number of hidden layers and the number of units in them, or on how to select the values of the connection weights. A feedforward neural network is regarded as a mapping as follows:

    R^{n_I} \rightarrow R^{n_O}    (2.1.1)
where n_I and n_O are the numbers of input and output data, respectively. The numbers of units in the hidden layers, as well as the number of hidden layers, are usually determined through a trial-and-error procedure. Receiving as input the weighted sum of the output data from the units in the preceding layer, any unit in the hidden and output layers outputs the value of an activation function as follows:

    U_j^p = \sum_{i=1}^{n_{p-1}} w_{ji}^{p-1} \cdot O_i^{p-1} + \theta_j^p    (2.1.2)
with

    O_j^p = f\left( U_j^p \right)    (2.1.3)

where
U_j^p: input value to the activation function of the j-th unit in the p-th layer,
n_{p-1}: number of units in the (p − 1)-th layer,
w_{ji}^{p-1}: connection weight between the i-th unit in the (p − 1)-th layer and the j-th unit in the p-th layer,
O_i^{p-1}: output of the i-th unit in the (p − 1)-th layer,
θ_j^p: bias of the j-th unit in the p-th layer,
O_j^p: output value of the activation function of the j-th unit in the p-th layer,
f( ): activation function.

A nonlinear function is used as the activation function. For example, the sigmoid function and the rectified linear unit (ReLU) function are usually employed, which are, respectively, given as

    f(x) = \frac{1}{1 + e^{-x}}    (sigmoid function)    (2.1.4)

    f(x) = \begin{cases} x & (x > 0) \\ 0 & (x \le 0) \end{cases}    (rectified linear unit (ReLU) function)    (2.1.5)
Figure 2.3 shows the sigmoid function and its first derivative, and Fig. 2.4 the ReLU function and its first derivative. Note that the first derivative of the sigmoid function vanishes at x far from the origin, while that of the ReLU does not, even at very large values of x; this fact has significant importance in the gradient-based training of neural networks. In addition, a piecewise linear, convex function called the Maxout [3] is sometimes employed (see Fig. 2.5), which is tuned during the training of the neural network. It is noted that the linear function

    f(x) = ax + b    (2.1.6)

is often employed as the activation function at the output layer, especially for regression. At the input layer, the identity function

    f(x) = x    (2.1.7)

is usually employed. A pair of input data and the corresponding output data, or the teacher signals, is called a training pattern; here, the training means the process to construct the implicit mapping relationship inherent in the large amount of training patterns onto the neural network.
[Fig. 2.3 Sigmoid function and its first derivative]

[Fig. 2.4 ReLU function and its first derivative]

[Fig. 2.5 Piecewise linear function in Maxout]
A feedforward neural network can construct on itself the mapping from the input data to the output data, or the teacher signals, through the iterative learning process of these data pairs. Eventually, the trained neural network can estimate the output even for input data not included in the training data, which is called the generalization. As for the training algorithm, the error back propagation based on the gradient descent algorithm is usually employed [4, 5], where the connection weights between units and the biases of units are iteratively modified to minimize a loss function through the following procedure:

(0) Initialization: All the connection weights and the biases are initialized to small random values. As the training result depends on the initialization, it is often the case that the best one is selected from multiple results trained with different initializations.

(1) Forward propagation: Input data, starting from the input layer, propagate through the hidden layers and finally reach the output layer, which is called the forward propagation, and the data are sequentially processed with Eqs. (2.1.2) and (2.1.3). Finally, using the output at the output layer and the corresponding teacher signals, the error or the loss function is calculated as

    E = \frac{1}{2} \sum_{q=1}^{n_P} \sum_{j=1}^{n_N} \left( {}^q O_j^N - {}^q T_j \right)^2    (2.1.8)

where {}^q O_j^N is the output of the j-th unit in the output (N-th) layer for the q-th training pattern, {}^q T_j the teacher signal corresponding to the output of the j-th unit
in the output layer for the q-th training pattern, and n_P the total number of training patterns.

(2) Back propagation: After the error or the loss function is evaluated in the forward propagation process above, the connection weights and the biases are iteratively modified based on the gradient descent concept as follows:

    \Delta w_{ji}^{p-1} = -\frac{\partial E}{\partial w_{ji}^{p-1}}    (2.1.9)

    w_{ji}^{p-1} = w_{ji}^{p-1} + \alpha \cdot \Delta w_{ji}^{p-1}    (2.1.10)

    \Delta \theta_j^p = -\frac{\partial E}{\partial \theta_j^p}    (2.1.11)

    \theta_j^p = \theta_j^p + \beta \cdot \Delta \theta_j^p    (2.1.12)
where Δw_{ji}^{p-1} is the change of the connection weight between the j-th unit in the p-th layer and the i-th unit in the (p − 1)-th layer, Δθ_j^p the change of the bias at the j-th unit in the p-th layer, and α and β the learning-rate parameters of the updates for the connection weight and the bias, respectively. This procedure is named the error back propagation, as the connection weights and the biases are updated in descending order from the top (output) layer to the bottom (input) layer. Processing the forward and backward propagations cyclically, all the connection weights and the biases converge to values that minimize the error defined by Eq. (2.1.8), which is equal to the sum of the squared errors of all the output units for all the training patterns. When using Eq. (2.1.8) as the loss function, the back propagation process is performed only once after the accumulated forward propagations for all the training patterns. Instead of Eq. (2.1.8), another definition of the error,

    E = \frac{1}{2} \sum_{j=1}^{n_N} \left( O_j^N - T_j \right)^2    (2.1.13)

is also employed, where the error is defined for each training pattern. If this definition of the error is used, the back propagation calculation is performed each time after the forward propagation of a single training pattern. Back propagation based on the direct minimization of the error given by Eq. (2.1.8) is called the batch training, whereas that based on the minimization of the error of Eq. (2.1.13) the on-line training [6]. The latter is based on the stochastic gradient descent, which eventually minimizes the error of Eq. (2.1.8). It is known that the former is easily trapped in a local minimum, whereas the latter is apt to reach the global minimum with the help of its stochastic nature. However, the computational load of the former is smaller than that of the latter, and the former is more suitable for parallel processing.
Therefore, when using the GPU as an accelerator of the deep learning, the former, or the mini-batch training, is usually employed, where the training patterns are divided into multiple smaller groups and the batch training is applied to each group. Usually, the error defined by Eq. (2.1.8) gradually decreases epoch by epoch, where an epoch is a sum of the forward propagation calculations for all the training patterns followed by the corresponding back propagation calculations. It is noted that neural networks sometimes fit themselves to the training patterns too strictly, as is often the case with neural networks of too large a size in the number of hidden layers, or in the number of units in each hidden layer, compared to the complexity of the problem. This is called the overfitting or the overtraining. When it occurs, the neural network loses the generalization capability to provide appropriate estimations for new input other than the training patterns. To avoid this, one can prepare other patterns not included in the training patterns, usually called the test patterns, monitor the error defined as in Eq. (2.1.8) for the test patterns during training, and terminate the training before the error for the test patterns starts to increase. In the classification, the Softmax cross entropy error defined as

    E = -\sum_{j=1}^{n_N} T_j \cdot \ln\!\left( \frac{\exp\left( O_j^N \right)}{\sum_{i=1}^{n_N} \exp\left( O_i^N \right)} \right)    (2.1.14)
is often employed as the loss function instead of the sum of squared error defined by Eq. (2.1.13). Here, each teacher signal has as many components as the categories, and the component corresponding to the correct category is set to be 1 and others to be 0.
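As a concrete illustration of Eqs. (2.1.2)–(2.1.8), the following is a minimal NumPy sketch, not taken from the book; the layer sizes, random seed and data are arbitrary assumptions made only for this example.

```python
# Forward propagation through a small fully connected network and evaluation
# of the squared-error loss of Eq. (2.1.8).
import numpy as np

def sigmoid(x):                       # Eq. (2.1.4), an alternative activation
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):                          # Eq. (2.1.5)
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
layer_sizes = [4, 8, 8, 2]            # input layer, two hidden layers, output layer
weights = [rng.normal(0.0, 0.1, (m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    o = x                             # identity activation at the input layer, Eq. (2.1.7)
    for l, (w, b) in enumerate(zip(weights, biases)):
        u = o @ w + b                 # weighted sum plus bias, Eq. (2.1.2)
        # linear activation at the output layer, Eq. (2.1.6); ReLU in hidden layers
        o = u if l == len(weights) - 1 else relu(u)
    return o

x = rng.normal(size=(16, 4))          # 16 training patterns
t = rng.normal(size=(16, 2))          # corresponding teacher signals
loss = 0.5 * np.sum((forward(x) - t) ** 2)   # Eq. (2.1.8)
print(loss)
```

The gradients used in Eqs. (2.1.9)–(2.1.12) would in practice be obtained by back-propagating the error through the same layers; they are omitted here for brevity.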
2.2 Various Types of Layers

Various types of layers have been proposed, depending on the character of the input data, in addition to the standard fully-connected layer defined in Eq. (2.1.2), where each unit in a layer receives data from every unit in the preceding layer. Here, four types of layers are discussed: a convolution layer, a pooling layer, a normalization layer and a recurrent layer. A convolution layer is often employed for image data, which are regarded as a two-dimensional array, meaning that the units in a layer in the feedforward neural network are also two-dimensionally arranged. Figure 2.6 shows a schematic illustration of the convolution layer with a filter of size 3 × 3. Connections in the convolution layer are defined as

    U_{kl}^p = \sum_{s=0}^{S-1} \sum_{t=0}^{S-1} h_{st}^{p-1} \cdot O_{k+s,l+t}^{p-1} + \theta_{kl}^p    (2.2.1)
2 Feedforward Neural Networks
(p-1)-th Layer
p-th Layer Fig. 2.6 Convolution layer p−1
where Ok+s,l+t is the output of the (k + s, l + t)-th unit in the (p − 1)-th layer, where p−1 units are arranged in a two-dimensional manner, h st the (s, t)-th component of the p filter of S × S size for the (p − 1)-th layer, θkl the bias of the (k, l)-th unit in the p-th p layer, and Ukl the input value to the activation function of the (k, l)-th unit in the p-th layer, where units are also arranged in a two-dimensional manner. A pooling layer is employed just after a convolution layer, where the connections in the pooling layer are defined as ⎛ Ukl = ⎝ p
1 S2
p−1
g
Ost
⎞ g1 ⎠
(2.2.2)
(s.t)∈Dkl
p−1
where Ost is the output of the (s, t)-th unit in the (p − 1)-th layer, where units are arranged in a two-dimensional manner, Dkl the pooling window of the (k, l)-th unit in the p-th layer, and (s, t) the index of the unit within the pooling window Dkl of S × S size. Setting g in Eq. (2.2.2) to be 1.0 results in an average pooling as p
Ukl =
1 S2
p−1
Ost
(2.2.3)
(s.t)∈Dkl
On the other hand, setting g in Eq. (2.2.2) to be ∞results in a max pooling as p
Ukl = max
(s.t)∈Dkl
p−1
Ost
(2.2.4)
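The following minimal NumPy sketch, not from the book, makes Eqs. (2.2.1), (2.2.3) and (2.2.4) concrete; the image size, filter values, zero biases and the absence of padding or stride are assumptions made only for this example.

```python
# A single-channel convolution layer and non-overlapping average/max pooling.
import numpy as np

def convolve(o_prev, h, theta):
    """U[k,l] = sum_{s,t} h[s,t] * O[k+s,l+t] + theta[k,l], Eq. (2.2.1)."""
    S = h.shape[0]
    K = o_prev.shape[0] - S + 1
    L = o_prev.shape[1] - S + 1
    u = np.empty((K, L))
    for k in range(K):
        for l in range(L):
            u[k, l] = np.sum(h * o_prev[k:k + S, l:l + S]) + theta[k, l]
    return u

def pool(o_prev, S=2, mode="max"):
    """Non-overlapping S x S pooling windows, Eqs. (2.2.3)/(2.2.4)."""
    K, L = o_prev.shape[0] // S, o_prev.shape[1] // S
    u = np.empty((K, L))
    for k in range(K):
        for l in range(L):
            window = o_prev[k * S:(k + 1) * S, l * S:(l + 1) * S]
            u[k, l] = window.max() if mode == "max" else window.mean()
    return u

img = np.random.rand(8, 8)                    # outputs of the preceding layer
h = np.random.rand(3, 3)                      # 3 x 3 filter as in Fig. 2.6
u_conv = convolve(img, h, np.zeros((6, 6)))   # 8 - 3 + 1 = 6
print(pool(u_conv, S=2, mode="max").shape)    # (3, 3)
```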
A normalization layer is employed in combination with the convolution layer. A typical normalization layer is defined as follows:

    U_{kl}^p = \frac{ O_{kl}^{p-1} - \bar{O}_{kl}^{p-1} }{ \sqrt{ c + \sigma_{kl}^2 } }    (2.2.5)

where

    \bar{O}_{kl}^{p-1} = \frac{1}{S^2} \sum_{(s,t) \in D_{kl}} O_{k+s,l+t}^{p-1}    (2.2.6)

    \sigma_{kl}^2 = \frac{1}{S^2} \sum_{(s,t) \in D_{kl}} \left( O_{k+s,l+t}^{p-1} - \bar{O}_{kl}^{p-1} \right)^2    (2.2.7)

Here, O_{kl}^{p-1} is the output of the (k, l)-th unit in the (p − 1)-th layer, where the units are arranged in a two-dimensional manner, D_{kl} the normalization window of the (k, l)-th unit in the p-th layer, (s, t) the index of a unit within the normalization window D_{kl} of S × S size, and c a small constant number to avoid division by zero.

Finally, a recurrent layer is taken, which is often employed for time-dependent input data, where not only the data at the current step but also the data at the preceding step are considered. A recurrent layer is defined as follows:

    U_j^{p\,(N)} = \sum_{i=1}^{n_{p-1}} w_{ji}^{p-1} O_i^{p-1\,(N)} + \sum_{j'=1}^{n_p} \tilde{w}_{jj'}^{p} O_{j'}^{p\,(N-1)} + \theta_j^p    (2.2.8)

where ( )^{(N)} means the value for the N-th input pattern, and \tilde{w}_{jj'}^{p} the connection weight between the j-th unit in the p-th layer and the output of the j'-th unit in the p-th layer for the backward propagation.
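A minimal sketch of a single recurrent-layer step, Eq. (2.2.8), is given below; it is not from the book, and the layer sizes, the tanh activation and the random values are assumptions made only for this example.

```python
# One step of a recurrent layer: the pre-activation for pattern N combines the
# preceding layer's current output with this layer's own output at pattern N-1.
import numpy as np

rng = np.random.default_rng(1)
n_prev, n_p = 5, 3
w = rng.normal(0.0, 0.1, (n_p, n_prev))       # w_{ji}^{p-1}
w_rec = rng.normal(0.0, 0.1, (n_p, n_p))      # recurrent weights, tilde w
theta = np.zeros(n_p)

def recurrent_step(o_prev_layer, o_same_layer_old):
    u = w @ o_prev_layer + w_rec @ o_same_layer_old + theta   # Eq. (2.2.8)
    return np.tanh(u)                                          # an example activation

o_old = np.zeros(n_p)
for x in rng.normal(size=(4, n_prev)):        # a short sequence of input patterns
    o_old = recurrent_step(x, o_old)
print(o_old)
```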
2.3 Regularization

In order to relax the overtraining, the so-called regularization is often performed, where the standard target function E_S to be minimized, defined in Eq. (2.1.13), is modified by adding a new term E_{reg} as

    E = E_S + E_{reg} = \frac{1}{2} \sum_{j=1}^{n_N} \left( O_j^N - T_j \right)^2 + E_{reg}    (2.3.1)

where E_{reg} is the sum of the squared weights as follows:

    E_{reg} = \frac{\lambda}{2} \sum_p \sum_{i,j} \left( w_{ji}^{p-1} \right)^2    (2.3.2)

Here, λ is a regularization factor. This is called the Tikhonov regularization [6], and the weight update rule, defined in Eqs. (2.1.9) and (2.1.10), is written as follows:

    w_{ji}^{p-1} = w_{ji}^{p-1} - \alpha \frac{\partial \left( E_S + E_{reg} \right)}{\partial w_{ji}^{p-1}} = (1 - \lambda)\, w_{ji}^{p-1} - \alpha \frac{\partial E_S}{\partial w_{ji}^{p-1}}    (2.3.3)
It is expected that this weight update tends to make the values of the weights smaller; therefore, this regularization is often called the weight decay method. Another type of regularization that does not alter the target function is also employed, in which small noise is added to the input data, with success in relaxing the overtraining; this can be seen as a kind of data augmentation.
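A minimal sketch of the weight-decay update of Eq. (2.3.3) follows; it is not from the book, and the learning rate, decay factor and gradient values are arbitrary assumptions for illustration.

```python
# Weight decay: the weight is shrunk before the ordinary gradient step is taken.
import numpy as np

def weight_decay_update(w, grad_Es, alpha=0.01, lam=0.001):
    # w       : connection weights of one layer
    # grad_Es : dE_S/dw, the gradient of the unregularized loss
    return (1.0 - lam) * w - alpha * grad_Es      # Eq. (2.3.3)

w = np.random.rand(3, 3)
print(weight_decay_update(w, np.ones((3, 3))))
```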
2.4 Acceleration for Training

Training neural networks using the back propagation is often a time-consuming process, and various acceleration techniques have therefore been developed. One of the most frequently employed acceleration techniques is the momentum method, where the update of a weight for an input pattern is reused for the update of the weight for the next input pattern as

    \Delta w_{ji}^{p-1\,(N)} = -\frac{\partial E^{(N)}}{\partial w_{ji}^{p-1}}    (2.4.1)

    w_{ji}^{p-1\,(N)} = w_{ji}^{p-1\,(N-1)} + \alpha \cdot \Delta w_{ji}^{p-1\,(N)} + \gamma \cdot \Delta w_{ji}^{p-1\,(N-1)}    (2.4.2)
Here, ( )^{(N)} is the value for the N-th input pattern, and γ a momentum factor. When Δw_{ji}^{p-1 (N)} has the same sign as Δw_{ji}^{p-1 (N-1)}, the amount of the update of w_{ji}^{p-1} is larger than that by Δw_{ji}^{p-1 (N)} alone, suggesting that the update is accelerated. In contrast, when Δw_{ji}^{p-1 (N)} has the opposite sign to Δw_{ji}^{p-1 (N-1)}, the amount of the update of w_{ji}^{p-1} is smaller than that by Δw_{ji}^{p-1 (N)} alone, which implies that w_{ji}^{p-1} tends to remain unchanged when the updates derived from successive patterns contradict each other.
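A minimal sketch of the momentum update of Eqs. (2.4.1)–(2.4.2) is shown below; it is not from the book, and the learning rate, momentum factor and gradients are arbitrary illustrative values.

```python
# Momentum: the previous update contributes to the current one.
import numpy as np

def momentum_update(w, grad, delta_old, alpha=0.01, gamma=0.9):
    delta = -grad                                         # Eq. (2.4.1)
    return w + alpha * delta + gamma * delta_old, delta   # Eq. (2.4.2)

w = np.zeros(4)
w, delta_old = momentum_update(w, np.ones(4), np.zeros(4))
w, delta_old = momentum_update(w, np.ones(4), delta_old)  # same sign: larger step
print(w)
```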
The adaptation of the learning rate during the training is also often employed to accelerate it. The AdaGrad [7] is one of the most popular techniques for this purpose, which modifies the current update Δw_{ji}^{p-1 (N)} using all the older updates as follows:

    h^{(0)} = \varepsilon    (2.4.3)

    \Delta w_{ji}^{p-1\,(N)} = -\frac{\partial E^{(N)}}{\partial w_{ji}^{p-1}}    (2.4.4)

    h^{(N)} = h^{(N-1)} + \left( \Delta w_{ji}^{p-1\,(N)} \right)^2    (2.4.5)

    \eta^{(N)} = \frac{\eta^{(0)}}{\sqrt{h^{(N)}}}    (2.4.6)

    w_{ji}^{p-1\,(N)} = w_{ji}^{p-1\,(N-1)} + \eta^{(N)} \cdot \Delta w_{ji}^{p-1\,(N)}    (2.4.7)
where ε is a small constant, e.g. ε = 1.0 × 10^{-7}, and the learning rates for connection weights that are frequently updated decrease faster than those for weights rarely updated. The Adam [8] is another popular technique, which is formulated as follows:

    m^{(0)} = 0    (2.4.8)

    v^{(0)} = 0    (2.4.9)

    \Delta w_{ji}^{p-1\,(N)} = -\frac{\partial E^{(N)}}{\partial w_{ji}^{p-1}}    (2.4.10)

    m^{(N)} = \beta_1 \cdot m^{(N-1)} + (1 - \beta_1)\, \Delta w_{ji}^{p-1\,(N)}    (2.4.11)

    v^{(N)} = \beta_2 \cdot v^{(N-1)} + (1 - \beta_2) \left( \Delta w_{ji}^{p-1\,(N)} \right)^2    (2.4.12)

    \hat{m}^{(N)} = \frac{m^{(N)}}{1 - \beta_1^N}    (2.4.13)

    \hat{v}^{(N)} = \frac{v^{(N)}}{1 - \beta_2^N}    (2.4.14)

    w_{ji}^{p-1\,(N)} = w_{ji}^{p-1\,(N-1)} + \alpha \frac{\hat{m}^{(N)}}{\varepsilon + \sqrt{\hat{v}^{(N)}}}    (2.4.15)
where α, β1 , β2 and ε are parameters and α = 0.001, β1 = 0.9, β2 = 0.999 and ε = 1.0 × 10−8 are suggested in the original paper.
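The Adam update rules above can be condensed into a short NumPy sketch (an illustration, not the book's implementation; the variable names and the toy target function are placeholders):

```python
import numpy as np

class Adam:
    """Minimal per-parameter Adam update, following Eqs. (2.4.8)-(2.4.15)."""

    def __init__(self, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.alpha, self.beta1, self.beta2, self.eps = alpha, beta1, beta2, eps
        self.m = None   # first-moment estimate  m^(N)
        self.v = None   # second-moment estimate v^(N)
        self.n = 0      # pattern counter N

    def step(self, w, grad):
        """Return updated weights given the gradient dE^(N)/dw."""
        if self.m is None:
            self.m = np.zeros_like(w)
            self.v = np.zeros_like(w)
        self.n += 1
        dw = -grad                                                   # Eq. (2.4.10)
        self.m = self.beta1 * self.m + (1 - self.beta1) * dw         # Eq. (2.4.11)
        self.v = self.beta2 * self.v + (1 - self.beta2) * dw ** 2    # Eq. (2.4.12)
        m_hat = self.m / (1 - self.beta1 ** self.n)                  # Eq. (2.4.13)
        v_hat = self.v / (1 - self.beta2 ** self.n)                  # Eq. (2.4.14)
        return w + self.alpha * m_hat / (self.eps + np.sqrt(v_hat))  # Eq. (2.4.15)

# usage with placeholder data
opt = Adam()
w = np.random.randn(10)
for _ in range(100):
    grad = 2.0 * w          # gradient of a toy target function E = sum(w^2)
    w = opt.step(w, grad)   # w moves towards the minimum at 0
```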
2.5 Initialization of Connection Weights

Connection weights are usually initialized to small random values. The performance of a trained neural network, however, depends on the initial values of the connection weights used for the training, as the back propagation algorithm is gradient-based. Therefore, in order to find an optimal setting of the network structure and parameters out of many possible candidates, many training runs with different initial values of the connection weights are tested for each setting of the network structure and parameters. Assuming a fully connected layer with m inputs and n outputs, a common heuristic is to initialize the connection weights to random values sampled from a uniform distribution over [-1/\sqrt{m}, 1/\sqrt{m}]. Sampling from a uniform distribution over [-\sqrt{6}/\sqrt{m+n}, \sqrt{6}/\sqrt{m+n}] is also suggested [9], which is called the normalized initialization.
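A minimal sketch of both initialization rules (for illustration only; the layer sizes below are arbitrary):

```python
import numpy as np

def heuristic_init(m, n, rng=np.random.default_rng(0)):
    # uniform over [-1/sqrt(m), 1/sqrt(m)] for a layer with m inputs and n outputs
    limit = 1.0 / np.sqrt(m)
    return rng.uniform(-limit, limit, size=(n, m))

def normalized_init(m, n, rng=np.random.default_rng(0)):
    # "normalized initialization" of Glorot and Bengio [9]
    limit = np.sqrt(6.0) / np.sqrt(m + n)
    return rng.uniform(-limit, limit, size=(n, m))

W1 = heuristic_init(64, 32)    # layer with 64 inputs and 32 outputs
W2 = normalized_init(64, 32)
```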
2.6 Model Averaging and Dropout

The model averaging is a technique to reduce an estimation error by combining outputs of several neural networks trained differently, which is also called the ensemble method. The Dropout [10] is a method to train the ensemble consisting of subnetworks that can be formed by removing some units from an underlying base neural network, where the formulation of the feedforward fully-connected layer given in Eq. (2.1.2) is revised as follows:

r_i^{p-1}(s) = \mathrm{Bernoulli}(s)    (2.6.1)

U_j^{p} = \sum_{i=1}^{n_{p-1}} w_{ji}^{p-1} \cdot r_i^{p-1}(s) \cdot O_i^{p-1} + \theta_j^{p}    (2.6.2)

where r_i^{p-1}(s) is an independent Bernoulli random variable that takes 1 with probability s and 0 with probability 1 − s, which has the effect of temporarily producing subnetworks out of the base network.
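A minimal sketch of the Dropout forward pass of Eqs. (2.6.1)–(2.6.2) (illustrative only; the sigmoid activation and keep probability are arbitrary choices):

```python
import numpy as np

def dropout_forward(O_prev, W, theta, s=0.8, rng=np.random.default_rng(0)):
    """Forward pass of one fully-connected layer with Dropout.

    O_prev : outputs of the previous layer, shape (n_{p-1},)
    W      : connection weights, shape (n_p, n_{p-1})
    theta  : biases, shape (n_p,)
    s      : keep probability of each unit in the previous layer
    """
    r = rng.binomial(1, s, size=O_prev.shape)   # Eq. (2.6.1): Bernoulli mask
    U = W @ (r * O_prev) + theta                # Eq. (2.6.2)
    return 1.0 / (1.0 + np.exp(-U))             # sigmoid activation as an example

out = dropout_forward(np.random.rand(5), np.random.randn(3, 5), np.zeros(3))
```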
References

1. Funahashi, K.: On the approximate realization of continuous mappings by neural networks. Neural Netw. 2, 183–192 (1989)
2. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366 (1989)
3. Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. J. Mach. Learn. Res. W&CP 28(3), 1319–1327 (2013)
4. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)
5. Rumelhart, D.E., McClelland, J.L., The PDP Research Group: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press, Cambridge (1986)
6. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, Upper Saddle River (1999)
7. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
8. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: The 3rd International Conference for Learning Representations (ICLR), San Diego (2015). arXiv:1412.6980
9. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9, 249–256 (2010)
10. Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
Chapter 3
Deep Learning
Abstract Chapter 3 deals with the deep learning based on the viewpoint that it is an extended version of the feedforward neural network. Section 3.1 describes what is common and what is different between the deep learning and the feedforward neural network. Techniques for pretraining are described in Sects. 3.2 and 3.3.
3.1 Neural Network Versus Deep Learning

One often employs a neural network with many hidden layers in order to gain better estimation capability for complex nonlinear problems, but this does not necessarily work well. This is partly because, with many hidden layers, the updates of the weights and the biases near the input layer become too small and the training does not proceed further, which is called the vanishing gradient problem. Hinton et al. [1] claimed that a layer-by-layer training from the bottom to the top using the restricted Boltzmann machine is effective to solve this problem. In this layer-wise training, called the pretraining, the initial values of the connection weights and the biases are determined layer by layer, and then the network as a whole is trained using the back propagation algorithm. The autoencoder, a simple three-layer network with the same teacher signal as the input data, is also shown to be effective to solve the issue [2]. On the other hand, the training of neural networks with more than five non-linear layers is called the deep learning [3]:

A deep-learning architecture is a multilayer stack of simple modules, all (or most) of which are subject to learning, and many of which compute non-linear input–output mappings. Each module in the stack transforms its input to increase both the selectivity and the invariance of the representation. With multiple non-linear layers, say a depth of 5 to 20, a system can implement extremely intricate functions of its inputs that are simultaneously sensitive to minute details — distinguishing Samoyeds from white wolves — and insensitive to large irrelevant variations such as the background, pose, lighting and surrounding objects. (Reprinted from [3] with permission from Nature.)
As the deep learning requires a large amount of computation for big training data, the so-called accelerators such as GPUs are often used, and some libraries for that purpose have become available [4, 5]. Having shown performance superior to any other method in the image classification problem [6], the deep learning has achieved remarkable successes [7–9].
3.2 Pretraining: Autoencoder

As mentioned above, the pretraining is a method with the layer-by-layer initialization of the connection weights. Assuming a deep neural network with many layers, the connection weights between the input layer and the first hidden layer are initialized by means of the pretraining technique first, and then the connection weights between the first and the second hidden layers are initialized, and so on. A simple three-layer network called the autoencoder, which is trained to copy its input to its output (Fig. 3.1), is often used for the pretraining as follows:
(1) A deep neural network to be trained is divided into multiple three-layer neural networks. Figure 3.2 shows the original deep neural network and the first two autoencoders generated by the division, each of which is set to have as many output units as input units.
(2) The first autoencoder (a) in Fig. 3.2 is trained to copy its input to its output using the training patterns for the original neural network. The resulting connection weights between the input and hidden layers of the autoencoder (a) are used as the initial values of those between the input and the first hidden layers in the original deep neural network.
(3) Then, the second autoencoder (b) in Fig. 3.2 is trained to copy its input to its output using the output of the units in the hidden layer of the first autoencoder (a) as the input data. The resulting connection weights between the input and hidden layers of the autoencoder (b) are used as the initial values of those between the first and the second hidden layers in the original deep neural network.

Fig. 3.1 Autoencoder
Fig. 3.2 Division of a deep neural network into multiple autoencoders
(4) The above steps are repeated (Fig. 3.3).

Some improvements of the training of autoencoders have been proposed. One of them is known as the denoising autoencoder [10, 11], where a small artificial noise is added to the input data during training while the teacher signal is left unchanged.
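The greedy layer-wise procedure above can be outlined in code. The following sketch is illustrative only (not the book's implementation): it uses scikit-learn's MLPRegressor as a stand-in for each three-layer autoencoder, and the data and layer sizes are arbitrary:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def pretrain_layerwise(X, hidden_sizes):
    """Greedy layer-wise pretraining with simple autoencoders.

    X            : training patterns, shape (n_samples, n_features)
    hidden_sizes : number of units of each hidden layer, from bottom to top
    Returns the list of input-to-hidden weight matrices, one per hidden layer.
    """
    weights, data = [], X
    for n_hidden in hidden_sizes:
        # three-layer autoencoder: the teacher signal equals the input data
        ae = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation='logistic',
                          max_iter=2000, random_state=0)
        ae.fit(data, data)
        W_in, b_in = ae.coefs_[0], ae.intercepts_[0]
        weights.append(W_in)
        # hidden-layer outputs become the input of the next autoencoder
        data = 1.0 / (1.0 + np.exp(-(data @ W_in + b_in)))
    return weights

X = np.random.rand(200, 8)
pretrained = pretrain_layerwise(X, hidden_sizes=[6, 4])
```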
3.3 Pretraining: Restricted Boltzmann Machine

The restricted Boltzmann machine is a kind of Boltzmann machine with visible and invisible (hidden) neurons [12, 13], in which every visible neuron has a connection with each hidden neuron, while there are no connections among the visible neurons nor among the hidden neurons. Therefore, the machine can be regarded as having a two-layer structure consisting of a layer of visible neurons and a layer of hidden neurons, as shown in Fig. 3.4. This structure is the same as that of two successive layers cut out from a deep feedforward neural network, so the restricted Boltzmann machine can be used for the layer-wise pretraining of the deep neural network. The basic algorithm of the Boltzmann machine is described in Sect. 4.2. Though the machine is often compute-intensive, various improvements and fast learning algorithms have been proposed [1, 14].
Fig. 3.3 Successive training of autoencoders
Fig. 3.4 Restricted Boltzmann machine
References

1. Hinton, G.E., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1544 (2006)
2. Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: Proceedings of NIPS (2006)
3. LeCun, Y., Bengio, Y., Hinton, G.E.: Deep learning. Nature 521, 436–444 (2015)
4. Tokui, S., Oono, K., Hido, S., Clayton, J.: Chainer: a next-generation open source framework for deep learning. In: Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS) (2015)
5. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mane, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viegas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: large-scale machine learning on heterogeneous distributed systems (2016). arXiv:1603.04467
6. Le, Q.V., Ranzato, M.A., Monga, R., Devin, M., Chen, K., Corrado, G.S., Dean, J., Ng, A.Y.: Building high-level features using large scale unsupervised learning. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2013, pp. 8595–8598
7. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
8. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., Hassabis, D.: Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016)
9. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., Hassabis, D.: Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017)
10. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), pp. 1096–1103 (2008)
11. Alain, G., Bengio, Y.: What regularized auto-encoders learn from the data generating distribution. In: ICLR'2013 (2013). arXiv:1211.4246
12. Rumelhart, D.E., McClelland, J.L., The PDP Research Group: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press, Cambridge (1986)
13. Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14, 1771–1800 (2002)
14. Hinton, G.E.: A practical guide to training restricted Boltzmann machines. In: Neural Networks: Tricks of the Trade. Springer, Berlin, pp. 599–619 (2012)
Chapter 4
Mutually Connected Neural Networks
Abstract This chapter focuses on the mutually connected neural networks. Section 4.1 describes the Hopfield network, and Sect. 4.2 the Boltzmann machine.
4.1 Hopfield Network

The Hopfield network [1] consists of mutually-connected neurons shown as open circles in Fig. 4.1, where each neuron, called a unit, is connected with the other neurons in the network. The connections among them are formulated as

w_{ji} = w_{ij}    (4.1.1)

U_j^{(t)} = \sum_{i (i \neq j)} w_{ji} \cdot O_i^{(t-1)} + \theta_j    (4.1.2)

where

O_j^{(t)} = f\left( U_j^{(t)} \right)    (4.1.3)

Minimized through updating the neurons iteratively is the energy of the system, which is defined as

E(t) = -\frac{1}{2} \sum_{i,j (i \neq j)} w_{ij} O_i^{(t)} O_j^{(t)} - \sum_{i} \theta_i O_i^{(t)}    (4.1.4)

where ()^{(t)} means the value at the t-th step. The activation function f(x) in Eq. (4.1.3) used here is the McCulloch–Pitts function as
Fig. 4.1 Hopfield network
f(x) = \begin{cases} 1 & (x > 0) \\ 0 & (\text{otherwise}) \end{cases}    (4.1.5)
The algorithm for the minimization of the system energy is given as follows:
(0) Set t = 0.
(1) A unit is selected at random.
(2) Inputs from the other units to the selected unit are summed up as Eq. (4.1.2).
(3) The output of the selected unit is updated using Eqs. (4.1.3) and (4.1.5).
(4) The outputs of the other units remain unchanged.
(5) Update t → t + 1.
(6) Return to (1).

It is known that the network energy defined in Eq. (4.1.4) decreases monotonically with t. Important to note here is that once the system is initialized in the neighborhood of one of many local minima, it is impossible for the system to escape from that local minimum.
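A minimal sketch of this asynchronous update loop (illustrative only; the weights here are a random symmetric matrix rather than weights stored from memorized patterns):

```python
import numpy as np

def hopfield_minimize(W, theta, O, steps=1000, rng=np.random.default_rng(0)):
    """Asynchronous Hopfield updates; the energy of Eq. (4.1.4) never increases."""
    n = len(O)
    for _ in range(steps):
        j = rng.integers(n)                       # (1) pick one unit at random
        U = W[j] @ O - W[j, j] * O[j] + theta[j]  # (2) summed input with i != j
        O[j] = 1.0 if U > 0 else 0.0              # (3) McCulloch-Pitts output, Eq. (4.1.5)
    return O

def energy(W, theta, O):
    # Eq. (4.1.4), assuming zero diagonal of W
    return -0.5 * (O @ W @ O) - theta @ O

n = 8
A = np.random.randn(n, n)
W = (A + A.T) / 2.0            # symmetric weights, Eq. (4.1.1)
np.fill_diagonal(W, 0.0)       # no self-connections
theta = np.zeros(n)
O = np.random.randint(0, 2, n).astype(float)
O = hopfield_minimize(W, theta, O)
print("final energy:", energy(W, theta, O))
```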
4.2 Boltzmann Machine

The Boltzmann machine [2] is an extension of the Hopfield network, where a probabilistic nature is added to make it possible to escape from a local minimum. The formulation, basically identical to that of the Hopfield network, is given as

w_{ji} = w_{ij}    (4.2.1)

U_j^{(t)} = \sum_{i (i \neq j)} w_{ji} \cdot O_i^{(t-1)} + \theta_j    (4.2.2)

O_j^{(t)} = f\left( U_j^{(t)} \right)    (4.2.3)

E(t) = -\frac{1}{2} \sum_{i,j (i \neq j)} w_{ij} O_i^{(t)} O_j^{(t)} - \sum_{i} \theta_i O_i^{(t)}    (4.2.4)

Note that the activation function is different from that of the Hopfield network: the activation function f(x) is set to 1 with the probability \mathrm{sigmf}(x/T), where \mathrm{sigmf}() is the sigmoid function and T is a parameter called a temperature, as

P(f(x) = 1) = \mathrm{sigmf}\left( \frac{x}{T} \right)    (4.2.5a)

P(f(x) = 0) = 1 - \mathrm{sigmf}\left( \frac{x}{T} \right)    (4.2.5b)

where P(f(x) = a) is the probability that f(x) is a, and

\mathrm{sigmf}(x) = \frac{1}{1 + \exp(-x)}    (4.2.6)
When the temperature T is very high, the probability of f(x) = 1, i.e. the value of sigmf(x/T), is almost 0.5 for any value of x, meaning that the value of f(x) is randomly set to 0 or 1. On the other hand, when the temperature T is very low, or nearly zero, the probability of f(x) = 1 is almost 1 for a positive value of x and almost 0 otherwise. This behavior is equivalent to that of the McCulloch–Pitts model used for the Hopfield network. It is known that a high temperature T allows random increases of the energy defined in Eq. (4.2.4), which often makes it possible to escape from being trapped in local minima. Employing this behavior, a strategy of setting the temperature T high at the early stage and then gradually lowering it is sometimes taken, which is called the simulated annealing [3]. The neurons in the Boltzmann machine are often categorized into two groups: the visible units and the invisible or hidden units, as shown in Fig. 4.2. The learning algorithms for the Boltzmann machine are usually based on the maximum-likelihood principle [4–6].
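The stochastic update with a decreasing temperature schedule can be sketched as follows (an illustration only; the exponential cooling schedule and its constants are arbitrary choices, not taken from the book):

```python
import numpy as np

def boltzmann_anneal(W, theta, O, T0=10.0, cooling=0.99, steps=5000,
                     rng=np.random.default_rng(0)):
    """Stochastic unit updates, Eqs. (4.2.2)-(4.2.6), with simulated annealing."""
    T = T0
    n = len(O)
    for _ in range(steps):
        j = rng.integers(n)
        U = W[j] @ O - W[j, j] * O[j] + theta[j]
        p_one = 1.0 / (1.0 + np.exp(-U / T))    # P(f(U) = 1), Eq. (4.2.5a)
        O[j] = 1.0 if rng.random() < p_one else 0.0
        T = max(T * cooling, 1e-3)              # gradually lower the temperature
    return O

n = 8
A = np.random.randn(n, n)
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)
O = np.random.randint(0, 2, n).astype(float)
O = boltzmann_anneal(W, np.zeros(n), O)
```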
Fig. 4.2 Boltzmann machine
References

1. Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79(8), 2554–2558 (1982)
2. Ackley, D.H., Hinton, G.E., Sejnowski, T.J.: A learning algorithm for Boltzmann machines. Cogn. Sci. 9, 147–169 (1985)
3. Kirkpatrick, S., Gelatt, C.D., Jr., Vecchi, M.P.: Optimization by simulated annealing. Science 220, 671–680 (1983)
4. Rumelhart, D.E., McClelland, J.L., The PDP Research Group: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press, Cambridge (1986)
5. Hassoun, M.H.: Fundamentals of Artificial Neural Networks. MIT Press, Cambridge (1995)
6. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, Upper Saddle River (1999)
Chapter 5
Other Neural Networks
Abstract The present chapter discusses neural networks other than the feedforward neural networks (see Chap. 2) and the mutually connected neural networks (see Chap. 4). Section 5.1 describes the self-organizing map (SOM) and Sect. 5.2 the radial basis function network.
5.1 Self-organizing Maps

The self-organizing maps [1] are a kind of neural networks consisting of only two layers, the input and output layers (Fig. 5.1). Units or neurons in the output layer, virtually allocated at the nodes of a two-dimensional lattice, are connected to all the input data. It is noted that the self-organizing maps, categorized as unsupervised learning, do not require teacher signals. The training algorithm of the self-organizing map with M × N output units and m-dimensional input patterns is summarized as follows:
(1) The weights between the input and output layers are initialized:

w^{(i,j)} = \left( w_1^{(i,j)}, w_2^{(i,j)}, \ldots, w_{m-1}^{(i,j)}, w_m^{(i,j)} \right)^T, \quad (i,j) = (1,1), \ldots, (N,M)    (5.1.1)

where w^{(i,j)} is the weight vector corresponding to the (i,j)-th unit in the output layer, and w_l^{(i,j)} the weight between the (i,j)-th unit in the output layer and the l-th component of the input vector.
(2) An m-dimensional input pattern, or an input vector, is given as

x^{(k)} = \left( x_1^{(k)}, x_2^{(k)}, \ldots, x_{m-1}^{(k)}, x_m^{(k)} \right)^T    (5.1.2)

(3) The distance between the input vector x^{(k)} and the weight vector w^{(i,j)} is calculated for all the weight vectors. Then, the weight vector that minimizes the distance is searched for, with its location being registered as (i_{min}^{(k)}, j_{min}^{(k)}):

\left( i_{min}^{(k)}, j_{min}^{(k)} \right) = \arg\min_{(i,j)} \left\| x^{(k)} - w^{(i,j)} \right\|    (5.1.3)

(4) All the weight vectors between the input and output layers are updated using

d((i,j),k)^2 = \left\| r^{i,j} - r^{i_{min}^{(k)}, j_{min}^{(k)}} \right\|^2 = \left( i - i_{min}^{(k)} \right)^2 + \left( j - j_{min}^{(k)} \right)^2    (5.1.4)

H((i,j),k) = \exp\left( -\frac{d((i,j),k)^2}{2\sigma^2} \right)    (5.1.5)

\sigma = \sigma(t) = \sigma_0 \exp\left( -\frac{t}{\tau_1} \right)    (5.1.6)

\eta = \eta(t) = \eta_0 \exp\left( -\frac{t}{\tau_2} \right)    (5.1.7)

w^{(i,j)}(t+1) = w^{(i,j)}(t) + \eta(t) \cdot H((i,j),k) \cdot \left( x^{(k)} - w^{(i,j)}(t) \right)    (5.1.8)

where d((i,j),k) is the virtual distance between the (i,j)-th unit and the (i_{min}^{(k)}, j_{min}^{(k)})-th unit in the output layer, τ_1, τ_2, σ_0 and η_0 are constants to be appropriately set, and H((i,j),k) the influence circle, which takes one for the output unit at (i_{min}^{(k)}, j_{min}^{(k)}) and smaller values for output units far from (i_{min}^{(k)}, j_{min}^{(k)}).
(5) (2)–(4) above are performed for all the input patterns.

Fig. 5.1 Self-organizing map
(6) (2)–(5) above are repeated until the system reaches a stable state. Note that appropriate values of parameters, τ1 , τ2 , σ0 and η0 above, depend on the problems to be solved and the map size [2].
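A compact NumPy sketch of steps (1)–(6) (for illustration only; the map size, the constants τ_1, τ_2, σ_0, η_0 and the data are arbitrary):

```python
import numpy as np

def train_som(X, M=8, N=8, epochs=20, sigma0=3.0, eta0=0.5, tau1=20.0, tau2=20.0,
              rng=np.random.default_rng(0)):
    """Self-organizing map training following Eqs. (5.1.1)-(5.1.8)."""
    m = X.shape[1]
    W = rng.random((N, M, m))                        # (1) random weight vectors
    ii, jj = np.meshgrid(np.arange(N), np.arange(M), indexing='ij')
    t = 0
    for _ in range(epochs):
        for x in X:                                  # (2) present an input vector
            d2 = np.sum((W - x) ** 2, axis=2)
            imin, jmin = np.unravel_index(np.argmin(d2), d2.shape)   # (3) best unit
            sigma = sigma0 * np.exp(-t / tau1)                       # Eq. (5.1.6)
            eta = eta0 * np.exp(-t / tau2)                           # Eq. (5.1.7)
            dist2 = (ii - imin) ** 2 + (jj - jmin) ** 2              # Eq. (5.1.4)
            H = np.exp(-dist2 / (2.0 * sigma ** 2))                  # Eq. (5.1.5)
            W += eta * H[:, :, None] * (x - W)                       # Eq. (5.1.8)
            t += 1
    return W

X = np.random.rand(100, 3)      # e.g. 100 three-dimensional input patterns
W = train_som(X)
```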
5.2 Radial Basis Function Networks

The radial basis function networks [3, 4] consist of three layers, the input, hidden and output layers, as shown in Fig. 5.2, where all the connection weights between the hidden and input layers are set to 1, meaning that any hidden unit directly uses all the input data. A linear activation function is employed for the output units. The major difference between the basic three-layer feedforward neural network and the radial basis function networks concerns the activation function of the units in the hidden layer: the latter employ a function with a localized receptive field, i.e. a radial basis function, as the activation function of the hidden units. The networks are considered to be universal approximators like the standard feedforward neural networks [5], and are formulated as follows:
H_j = H_j\left( x^{(p)} \right) = \exp\left( -\frac{\left\| x^{(p)} - \mu_j \right\|^2}{\sigma_j^2} \right)    (5.2.1)

O_k = \sum_{j=1}^{n_H} w_{kj} H_j    (5.2.2)

where H_j is the output of the j-th hidden unit, O_k the output of the k-th output unit, x^{(p)} the p-th input vector, w_{kj} the connection weight between the k-th output unit and the j-th hidden unit, μ_j the center of the receptive field of the j-th hidden unit and σ_j a parameter to control the effective area of the j-th hidden unit covered by the radial basis function of Gaussian type.

Fig. 5.2 Radial basis function network

Minimized through the supervised learning is the error defined as

E = \frac{1}{2} \sum_{k=1}^{n_N} \left( O_k - T_k \right)^2    (5.2.3)

where T_k is the teacher signal for the k-th output unit. The networks can be trained using the gradient-based learning algorithm as

\Delta w_{kj} = -\frac{\partial E}{\partial w_{kj}}    (5.2.4)

\Delta \mu_j = -\frac{\partial E}{\partial \mu_j}    (5.2.5)

\Delta \sigma_j = -\frac{\partial E}{\partial \sigma_j}    (5.2.6)
The training algorithm above does not make use of the localized feature of the radial basis functions, often resulting in slow convergence. To solve this problem, faster training algorithms utilizing the localities of the radial basis functions are available [3, 4].
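As an illustration of Eqs. (5.2.1)–(5.2.2), the following sketch fixes the centers μ_j and widths σ_j (by picking random data points and a common width) and fits only the output weights w_kj by least squares; this is a common simplification for illustration, not the gradient scheme above:

```python
import numpy as np

def rbf_design_matrix(X, centers, sigma):
    # H[p, j] = exp(-||x^(p) - mu_j||^2 / sigma^2), Eq. (5.2.1)
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / sigma ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 1))
T = np.sin(np.pi * X[:, 0])                          # teacher signals for a toy problem

centers = X[rng.choice(len(X), 10, replace=False)]   # fixed receptive-field centers
sigma = 0.3                                          # fixed common width
H = rbf_design_matrix(X, centers, sigma)
w, *_ = np.linalg.lstsq(H, T, rcond=None)            # output weights of Eq. (5.2.2)

O = H @ w                                            # network outputs
print("training RMS error:", np.sqrt(np.mean((O - T) ** 2)))
```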
References

1. Kohonen, T.: Self-organizing Maps, 2nd edn. Springer, Berlin (1997)
2. Masuda, M., Nakabayashi, Y., Yagawa, G.: Radius parallel self-organizing map (RPSOM). J. Comput. Sci. Technol. 6(1), 16–27 (2002)
3. Hassoun, M.H.: Fundamentals of Artificial Neural Networks. MIT Press, Cambridge (1995)
4. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, Upper Saddle River (1999)
5. Poggio, T., Girosi, F.: A theory of networks for approximation and learning. Technical Report, MIT (1989)
Chapter 6
Other Algorithms and Systems
Abstract Neural networks have become very popular in computational mechanics. Nevertheless, it is well known that some other machine learning algorithms also work well as the neural networks. In addition, neural networks are often combined with other machine learning algorithms. This chapter describes several algorithms, which are often combined with or compared with neural networks, including the genetic algorithm (GA), the genetic programming (GP), the bio-inspired algorithms and the support vector machine (SVM).
6.1 Genetic Algorithms

The genetic algorithms (GA) are among the artificial optimization methods that mimic the theory of evolution [1–3], in which the basic unit to be optimized is called the individual (see Fig. 6.1). Each individual is an array of parameters called genes to be adjusted. Each gene is either integer-valued, including binary bits, or real-valued. A group of individuals to be optimized evolves under repeated genetic operations as follows:
(1) Crossover: Two individuals are taken, and randomly selected sequences of several genes of these individuals are interchanged to make two new individuals, which is called the crossover. Figure 6.1 shows an example of crossover between two binary-valued individuals of 10-bit length.
(2) Mutation: An individual is picked at random, and a randomly selected gene of the individual is modified to make a new individual, which is called the mutation. Figure 6.2 shows an example of mutation of a binary-valued individual of 10-bit length.
(3) Evaluation: The degree of fitness for the target objectives is evaluated for each individual in the group.
(4) Selection: Each individual is tested for survival to the next generation. Figure 6.3 shows a roulette used for the selection, where each individual has its own area proportional to its fitness. The larger the area of an individual is, the higher the probability of its survival to the next generation.
Fig. 6.1 Crossover in genetic algorithm
Fig. 6.2 Mutation in genetic algorithm
Fig. 6.3 Roulette used for selection in genetic algorithm
The procedure of a typical GA is summarized as follows (Fig. 6.4):
(1) Initialization: All the individuals are initialized to sequences of random values.
(2) Evaluation of fitness: The fitness of each individual is assessed based on the prescribed target function.
(3) Selection: Each individual is tested for survival to the next generation through the roulette selection process.
(4) Crossover: The crossover operation is performed.
(5) Mutation: The mutation operation is performed.
(6) (2)–(5) above are repeated.

Fig. 6.4 Flowchart of a typical GA

Genetic algorithms are often employed for global optimization problems. Though they are more efficient than random search methods, they require heavy computing power depending on how the fitness is evaluated.
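A compact sketch of the procedure above for a binary-coded GA (illustrative only; the fitness function, population size and rates are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(ind):
    # toy objective: maximize the number of ones in the bit string
    return ind.sum()

def run_ga(n_pop=20, n_genes=10, n_gen=50, p_cross=0.8, p_mut=0.05):
    pop = rng.integers(0, 2, (n_pop, n_genes))                 # (1) initialization
    for _ in range(n_gen):
        fit = np.array([fitness(ind) for ind in pop], float)   # (2) evaluation
        prob = fit / fit.sum()                                  # (3) roulette selection
        pop = pop[rng.choice(n_pop, n_pop, p=prob)]
        for i in range(0, n_pop - 1, 2):                        # (4) one-point crossover
            if rng.random() < p_cross:
                c = rng.integers(1, n_genes)
                pop[i, c:], pop[i + 1, c:] = pop[i + 1, c:].copy(), pop[i, c:].copy()
        mask = rng.random(pop.shape) < p_mut                    # (5) mutation
        pop = np.where(mask, 1 - pop, pop)
    return pop[np.argmax([fitness(ind) for ind in pop])]

best = run_ga()
print(best, fitness(best))
```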
6.2 Genetic Programming

The genetic programming (GP) is regarded as one of the genetic algorithms (GA) [4–6]. Like the genetic algorithms, the genetic programming is applied to global optimization problems, and again requires heavy computing power depending on the manner in which the fitness is evaluated. It is noted that such genetic operations as the crossover, the mutation, the evaluation and the selection are common to the GA and the GP. The major difference is that the latter can handle structural representations such as the tree structure, while the former usually handles individuals represented by a simple one-dimensional array. A tree structure can represent various numerical expressions. Figure 6.5 shows an example of the tree structure representation of a numerical expression, 3x + 5y, where each circle is called a node: the nodes with x, 3, 5 or y are the terminal nodes, the nodes with + or ∗ the non-terminal nodes, and the node with + the root node, respectively. The tree structure can represent various mathematical expressions by setting variables and constants in terminal nodes, and operators or functions in non-terminal nodes.

Fig. 6.5 Tree representation of mathematical expression in genetic programming

An example of crossover of two tree structures is shown in Fig. 6.6, where a selected branch of one individual is exchanged with a selected branch of another individual to produce new individuals. In the figure, the crossover between two individuals, called Parents, representing 3xy and 3xy + x, respectively, creates two new individuals, called Childs, representing 3xy^2 and 4x, respectively. Figure 6.7 shows an example of mutation for a tree structure, where a node for mutation is selected at random. If the node is a terminal node, it is changed to a different value, otherwise to a different operator. It is shown in the figure that an individual representing x + 3y is mutated to another one that represents x + y + 3 according to the mutation of the node ∗ to +.

Fig. 6.6 Crossover in genetic programming
Fig. 6.7 Mutation in genetic programming

The tree structure of an individual in the genetic programming is usually implemented using pointers in the C language, which has some disadvantages in computing speed during evaluation and in implementation on a parallel processing environment. On the other hand, the linear genetic programming (GP), with a one-dimensional array to express mathematical expressions, is faster in speed, better in search performance and more efficient in memory usage than the pointer-based genetic programming [7], where mathematical expressions are implemented on the 1D array with each operator placed before its operands. Table 6.1 shows some examples of mathematical expressions represented by the linear GP, in which the genetic operations, the crossover and the mutation, are performed with the help of a corresponding stack array. It is noted that each node has its own stack count according to its node type: the stack count of an operator node with two operands such as + and ∗ is −1, that of an operator node with only one operand such as sin() and cos() is 0, and that of a variable or constant is +1. With these definitions of the stack count, its total sum in a subtree is always 1. Once a node is specified, one can identify the subtree starting from that node by checking the stack count. Table 6.2 shows some examples of stack arrays for mathematical expressions represented by the linear GP.

Table 6.1 Mathematical expressions in linear GP
Mathematical expression      Representation by linear GP
3x + 5y                      + * 3 x * 5 y
y(3 + x)                     * y + 3 x
3xy + sin(x)                 + * 3 * x y sin x
Table 6.2 Stack arrays for mathematical expressions

Stack array                         Representation by linear GP
[−1, −1, 1, 1, −1, 1, 1]            + * 3 x * 5 y
[−1, 1, −1, 1, 1]                   * y + 3 x
[−1, −1, 1, −1, 1, 1, 0, 1]         + * 3 * x y sin x
6.3 Other Bio-inspired Algorithms

As the neural networks are modelled after the human nervous system and the genetic algorithms, including the genetic programming, are modelled after evolution and heredity in nature, they are called bio-inspired algorithms. Other bio-inspired algorithms include the artificial immune system [8, 9], the particle swarm optimization [10], the ant colony algorithms [11], the artificial bee colony algorithm [12, 13] and the cuckoo search [14]. These algorithms are based on the collective behavior of natural or artificial systems, and are often categorized as swarm intelligence [15].
6.4 Support Vector Machines

Among other important methods, the support vector machines (SVM) [16–19] are popular for classification problems and also for regression problems. The SVM often show performance superior to the neural networks. The basic algorithm of the SVM is summarized as follows. First, we assume a set of n-dimensional data points consisting of two categories as

\left\{ (x_i, t_i) \mid x_i \in R^n, \; t_i \in \{-1, +1\}, \; (i = 1, \ldots, N) \right\}    (6.4.1)

where x_i is the i-th data point in the set and t_i its label indicating the category it belongs to. If the data set is linearly separable, we can draw an (n − 1)-dimensional hyperplane that divides the data set, with a marginal distance d called a margin, between the two categories based on their labels, where the hyperplane and its margin are expressed as follows:

w^T x + b = 0    (6.4.2)

d(w, b) = \frac{2}{\| w \|}    (6.4.3)

Figure 6.8 shows a hyperplane for the case n = 2. When the data points are linearly separable, we have
Fig. 6.8 Separation of data points by hyperplane
t_i \left( w^T x_i + b \right) - 1 \geq 0    (6.4.4)

for all the data points. The best hyperplane is determined by maximizing the margin as shown in Fig. 6.9, and the best value w_0 is obtained through the minimization of
Fig. 6.9 Hyperplane with the largest margin
L(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^{N} \alpha_i \left[ t_i \left( w^T x_i + b \right) - 1 \right]    (6.4.5)

where \alpha = (\alpha_1, \ldots, \alpha_N) are the Lagrange multipliers. By solving this, we obtain five equations or inequalities called the Karush–Kuhn–Tucker (KKT) conditions as follows:

w_0 = \sum_{i=1}^{N} \alpha_i t_i x_i    (6.4.6)

\sum_{i=1}^{N} \alpha_i t_i = 0    (6.4.7)

t_i \left( w^T x_i + b \right) - 1 \geq 0    (6.4.8)

\alpha_i \geq 0    (6.4.9)

\alpha_i \left[ t_i \left( w^T x_i + b \right) - 1 \right] = 0    (6.4.10)

In the above, Eq. (6.4.6) with Eqs. (6.4.7)–(6.4.10) as constraints means that the best hyperplane is constructed from a small subset of the data points called the support vectors. If some data points are not separable by a simple hyperplane (see Fig. 6.10), then new parameters \xi = (\xi_1, \ldots, \xi_N), the slack variables, are employed. According to the location of the corresponding data point, the value of each slack variable \xi_i is set as

\xi_i = 0 \; (\text{Case A}), \quad 0 < \xi_i \leq 1 \; (\text{Case B}), \quad \xi_i > 1 \; (\text{Case C})    (6.4.11)

Then, Eqs. (6.4.4) and (6.4.5) are replaced, respectively, by

t_i \left( w^T x_i + b \right) - 1 + \xi_i \geq 0    (6.4.12)

L(w, b, \alpha, \xi, \mu) = \frac{1}{2} w^T w - \sum_{i=1}^{N} \alpha_i \left[ t_i \left( w^T x_i + b \right) - 1 + \xi_i \right] + C \sum_{i=1}^{N} \xi_i - \sum_{i=1}^{N} \mu_i \xi_i    (6.4.13)

where \mu = (\mu_1, \ldots, \mu_N) are another set of Lagrange multipliers. Through the minimization of Eq. (6.4.13), we have
Fig. 6.10 Slack variables
w_0 = \sum_{i=1}^{N} \alpha_i t_i x_i    (6.4.14)

with the other KKT conditions as constraints. Here again, the best hyperplane is determined from the small subset of data points, the support vectors. It is noted that the support vector machines determine an (n − 1)-dimensional hyperplane for the n-dimensional data points, which seems to imply that it is difficult to draw a non-linear line to separate the two categories in the 2D case (n = 2) (see Fig. 6.11). However, this is not true, as shown in the following. For a given set of n-dimensional data points, a new set of m-dimensional data points is defined using an m-dimensional vector function ϕ() as

\left\{ (y_i, t_i) \mid y_i = \varphi(x_i), \; x_i \in R^n, \; y_i \in R^m, \; t_i \in \{-1, +1\}, \; (i = 1, \ldots, N) \right\}    (6.4.15)

Then, the SVM is applied to this new set, which results in an (m − 1)-dimensional hyperplane, and the best hyperplane is obtained as

w_0^T y + b = 0    (6.4.16)

where the first term of the left side of the equation is transformed as follows:
Fig. 6.11 Non-linear separating line
w_0^T y = \sum_{i=1}^{N} \alpha_i t_i y_i^T y = \sum_{i=1}^{N} \alpha_i t_i \varphi(x_i)^T \varphi(x) = \sum_{i=1}^{N} \alpha_i t_i K(x_i, x)    (6.4.17)

Here, K() is called a kernel function. Selecting an appropriate form of the kernel function significantly reduces the computational load. This (m − 1)-dimensional hyperplane is regarded as an (n − 1)-dimensional non-linear surface for the original data points.
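For example, with the Gaussian kernel K(x_i, x) = exp(−γ‖x_i − x‖²) a non-linear boundary such as the one in Fig. 6.11 is obtained without ever forming ϕ() explicitly. A minimal illustration (assuming scikit-learn is available; the data set and the value of γ are arbitrary):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (300, 2))
t = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5, 1, -1)   # circular category boundary

# RBF-kernel SVM: the dual of the soft-margin problem, Eq. (6.4.13), is solved
# internally, and only the support vectors enter the decision function, Eq. (6.4.17)
clf = SVC(kernel='rbf', C=10.0, gamma=2.0).fit(X, t)

print("number of support vectors:", clf.support_vectors_.shape[0])
print("training accuracy:", clf.score(X, t))
```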
6.5 Expert Systems

The expert systems [20] are software systems that have been developed since the 1970s for decision making by emulating human experts. They are usually rule-based and written in such languages for AI as Lisp [21] and Prolog [22]. A typical expert system consists of if–then rules and a database of a huge amount of knowledge. This is different from the systems described above, where a few simple rules are employed to process huge data. A computer algebra system is regarded as a kind of expert system [23, 24].
6.6 Software Tools Various machine learning and statistical analysis techniques are implemented in the free software R [https://www.r-project.org; [25]]. One can choose the best method in the software for a given data set.
References

1. Goldberg, D.E.: Genetic Algorithms in Search, Optimization & Machine Learning. Addison-Wesley, Boston (1989)
2. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. Springer, Berlin (1992)
3. Holland, J.H.: Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press, Cambridge (1992)
4. Koza, J.R.: Genetic Programming. MIT Press, Cambridge (1992)
5. Koza, J.R.: Genetic Programming II. MIT Press, Cambridge (1994)
6. Langdon, W.B., Poli, R.: Foundations of Genetic Programming. Springer, Berlin (2002)
7. Tokui, N., Iba, H.: Empirical and statistical analysis of genetic programming with linear genome. In: Proceedings of 1999 IEEE International Conference on Systems, Man and Cybernetics (1999)
8. Farmer, J.D., Packard, N., Perelson, A.: The immune system, adaptation and machine learning. Physica D 2, 187–204 (1986)
9. DeCastro, L., Timmis, J.: Artificial Immune Systems: A New Computational Intelligence Approach. Springer, Berlin (2001)
10. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of IEEE International Conference on Neural Networks, IV, pp. 1942–1948 (1995)
11. Dorigo, M., Gambardella, L.M.: Ant algorithms for discrete optimization. Artif. Life 5(2), 137–172 (1999)
12. Karaboga, D.: An idea based on honey bee swarm for numerical optimization. Technical Report, TR06, Erciyes University (2005)
13. Karaboga, D., Basturk, B.: A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J. Global Optim. 39(3), 459–471 (2007)
14. Yang, X.S., Deb, S.: Cuckoo search via Lévy flights. In: World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), IEEE Publications, pp. 210–214 (2009). arXiv:1003.1594v1
15. Bonabeau, E., Dorigo, M., Theraulaz, G.: Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, Oxford (1999)
16. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Berlin (1995)
17. Cortes, C., Vapnik, V.: Support vector networks. Mach. Learn. 20, 273–297 (1995)
18. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Berlin (2006)
19. Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge (2012)
20. Jackson, P.: Introduction to Expert Systems. Addison-Wesley, Boston (1986)
21. Winston, P.H., Horn, B.K.P.: Lisp, 3rd edn. Addison-Wesley, Boston (1989)
22. Clocksin, W.F., Mellish, C.S.: Programming in Prolog: Using the ISO Standard, 5th edn. Springer, Berlin (2003)
23. Hearn, A.C.: REDUCE: a user-oriented interactive system for algebraic simplification. In: Klerer, M. (ed.) Interactive Systems for Experimental Applied Mathematics. Springer, Berlin, pp. 79–90 (1968)
24. Moses, J.: Macsyma: a personal history. J. Symbol. Comput. 47, 123–130 (2012)
25. Lander, J.P.: R for Everyone: Advanced Analytics and Graphics, 2nd edn. Addison-Wesley, Boston (2017)
Part II
Applications
Chapter 7
Introductory Remarks
Abstract This chapter provides an introduction for the application of neural networks to computational mechanics, where a basic procedure is discussed to incorporate neural networks in computational mechanics.
In the computational mechanics, we employ a partial differential equation that describes some natural or physical phenomenon of interest, develop or select a method of discretization to solve the problem numerically, and then choose a solver appropriate for the simultaneous linear equations derived through the discretization, depending on their size and computational nature. In the above, careful observation of a natural or physical phenomenon and deep mathematical insight into it lead to partial differential equations that describe the phenomenon appropriately. Also, a constitutive model of a new material is developed with a lot of data measured in a large set of carefully prepared experiments together with mathematical thought, and a new algorithm for faster computation is derived based on the understanding of both the computer hardware and the details of the source code. As is well recognized, any result of a computational mechanics simulation is achieved based on some rules. In this context, we need to select or develop rules to solve the problem throughout the solution process described above, meaning that, once the rules are prepared, a large amount of resulting data or solutions is obtained by applying them to any problem. But the invention and development of rules has exclusively depended on human insight, requiring considerable efforts of experts in the related fields. Also, a rule thus developed often has some limits, being valid only when certain assumptions and idealizations hold, reflecting the limits of the human ability of thinking. On the other hand, the amazing advance of computers and networking technology has brought us the capability to deal with a large amount of data, and also another way to find and develop new rules "automatically" by using the machine learning. Among several machine learning methods, neural networks have been popular in finding new rules. Rules achieved by neural networks are usually different from those acquired through human thought. The latter are usually represented explicitly in mathematical equations or in algorithms for numerical computation, which are rigid and explicit,
reflecting some idealizations on which they are based. The former, on the other hand, are not represented in mathematical equations but in an implicit mapping. In the computational mechanics, neural networks have been employed to produce rules for various fields and processes where, due to complexity, we have never been able to develop any explicit rules. Here, among neural networks of various types, feedforward neural networks and their variants are most often employed in the applications related to the computational mechanics. This is because their ability to approximate mappings is easily implemented in many applications. For example, if any input–output mapping is known in an application, it is straightforward to use a feedforward neural network that approximates the mapping. It is noted that this sometimes results in remarkable success, while it sometimes fails to learn, or cannot learn, a mapping accurate enough for the application, for example with an improper setting of the input–output relationship or an inadequate amount of data. Most applications of feedforward neural networks in the computational mechanics consist of three phases as follows:
(1) Data preparation phase: An input–output relationship involved in the computational mechanics problem is defined by setting the n-dimensional vector (x_1^p, x_2^p, \ldots, x_{n-1}^p, x_n^p) as input and the m-dimensional vector (y_1^p, y_2^p, \ldots, y_{m-1}^p, y_m^p) as output, respectively, where p means the p-th data pair out of a large set of data pairs, which are prepared to construct the mapping relationship between them.
(2) Training phase: Using the large set of input–output data pairs above, the training of a feedforward neural network is carried out in order to achieve the mapping rules between them: the neural network is trained with the n-dimensional data (x_1^p, x_2^p, \ldots, x_{n-1}^p, x_n^p), p = 1, 2, 3, \ldots, as input to the network and the m-dimensional data (y_1^p, y_2^p, \ldots, y_{m-1}^p, y_m^p), p = 1, 2, 3, \ldots, as the teacher signal. When the training above converges, the neural network achieves a multidimensional mapping as

f : R^n \rightarrow R^m    (7.1)

(3) Application phase: The mapping f given above can reproduce the original input–output relationship as follows:

\left( y_1^p, y_2^p, \ldots, y_{m-1}^p, y_m^p \right) = f\left( x_1^p, x_2^p, \ldots, x_{n-1}^p, x_n^p \right)    (7.2)
In other words, the original input–output relationship given in (1) above is replaced by the mapping achieved in the feedforward neural network. Or, given the input data, the corresponding output data is directly estimated by the above mapping. It is noted that a large amount of input–output data pairs in the data preparation phase (1) can be efficiently collected with the computational mechanics simulations. Experiments may work, but not enough. It sometimes occurs that the neural network trained in the training phase is re-trained to strengthen its usefulness or improve
its accuracy with the data pairs collected in the application phase or with additional simulations. The training in (2) can be performed in either of two directions: one from (x_1^p, x_2^p, \ldots, x_{n-1}^p, x_n^p) to (y_1^p, y_2^p, \ldots, y_{m-1}^p, y_m^p), and the other from (y_1^p, y_2^p, \ldots, y_{m-1}^p, y_m^p) to (x_1^p, x_2^p, \ldots, x_{n-1}^p, x_n^p). Using the former, the trained neural network can approximate the original direct mapping of such experiments or numerical methods as the finite element method, and the network is often called a surrogate model or a reduced-order model. Using the latter, which is the inverse type of the mapping, the trained neural network can be used to solve such inverse problems as the nondestructive evaluation and the structural optimization problem. Compared to the plenty of applications of the feedforward neural networks, the mutually connected neural networks have few applications in the computational mechanics. They have been employed only in such applications as a kind of energy minimization problem. Applications of the neural networks in the computational mechanics are classified into the enhancement of the computational mechanics itself and the widening of the applicability of the computational mechanics. The former includes the materials modeling, the numerical quadrature and the contact search, whereas the latter includes such applications as the nondestructive evaluation, the structural optimization and the design optimization. We discuss a variety of applications of the neural networks to the computational mechanics in Chaps. 8 through 14 of this book. Some applications of machine learning methods other than the neural networks to the computational mechanics are described in Chap. 15, and those of the deep learning in Chap. 16. Readers can also refer to other review papers [1, 2] on the application of the neural networks to the computational mechanics, especially about studies in the early days.
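The three-phase workflow above can be illustrated with a small surrogate-model sketch (illustrative only; the "simulation" here is a cheap analytical stand-in for an expensive FEM analysis, and the network settings are arbitrary):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def simulate(x):
    # stand-in for an expensive computational mechanics analysis: input -> output
    return np.array([np.sin(x[0]) * x[1], x[0] ** 2 + 0.5 * x[1]])

rng = np.random.default_rng(0)

# (1) Data preparation phase: generate input-output pairs by running the "simulation"
X = rng.uniform(-1.0, 1.0, (500, 2))
Y = np.array([simulate(x) for x in X])

# (2) Training phase: fit a feedforward network as the mapping f: R^n -> R^m
surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000,
                         random_state=0).fit(X, Y)

# (3) Application phase: the trained mapping replaces the direct simulation
x_new = np.array([[0.3, -0.7]])
print("surrogate:", surrogate.predict(x_new)[0])
print("reference:", simulate(x_new[0]))
```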
References

1. Yagawa, G., Okuda, H.: Neural networks in computational mechanics. Arch. Comput. Methods Eng. 3(4), 435–512 (1996)
2. Waszczyszyn, Z., Ziemianski, L.: Neural networks in mechanics of structures and materials—new results and prospects of applications. Comput. Struct. 79, 2261–2276 (2001)
Chapter 8
Constitutive Models
Abstract Since constitutive equations of some materials are difficult to define mathematically, neural networks have been utilized to solve this problem. Chapter 8 describes how to use neural networks to simulate constitutive equations of various mechanical properties of materials: the parameter determination of viscoplastic constitutive equations (Sect. 8.1), the implicit constitutive modelling for viscoplasticity (Sect. 8.2), the autoprogressive algorithm (Sect. 8.3) and others (Sect. 8.4).
8.1 Parameter Determination of Viscoplastic Constitutive Equations

The time-dependent effect of the deformation of materials is often represented with a viscoplastic term as follows:

\varepsilon = \varepsilon^e + \varepsilon^v + \varepsilon^p = \varepsilon^e + \varepsilon^{vp}    (8.1.1)

where \varepsilon^v, \varepsilon^p and \varepsilon^{vp} are the viscous, plastic and unified viscoplastic strains, respectively. In the Chaboche's model [1], the viscoplastic behavior is formulated by using the back stress χ representing the kinematic hardening and the drag stress R representing the isotropic hardening as

\dot{\varepsilon}^{vp} = \left\langle \frac{|\sigma - \chi| - R}{K} \right\rangle^n \mathrm{sgn}(\sigma - \chi)    (8.1.2)

with

\dot{\chi} = H \dot{\varepsilon}^{vp} - D \chi \dot{\varepsilon}^{vp}    (8.1.3)

and

\dot{R} = (h - dR) \dot{\varepsilon}^{vp}    (8.1.4)
where K, n, H, D, h and d are material parameters, ⟨·⟩ is zero if the value inside the bracket is negative, and the stationary temperature condition is assumed. These six material parameters and R_0, the initial value of R, are to be determined optimally, so that Eqs. (8.1.2)–(8.1.4) approximate well the viscoplastic behaviors of the material. They could be determined through a curve fitting technique, but it is not easy to find values such that the whole range of viscoplastic behaviors is well approximated. To solve this issue, a method is reported to determine these parameters by a feedforward neural network trained using simulated data on four viscoplastic behaviors: a tensile behavior, a cyclic hysteresis behavior, a cyclic strain hardening behavior and a stress relaxation behavior [2]. This method is summarized as follows:
(1) Given the seven parameters above, the four viscoplastic behaviors are simulated based on the Chaboche's model by using Eqs. (8.1.2)–(8.1.4). If we change the parameter values parametrically, various viscoplastic behaviors are generated. The sets of the seven parameters and the feature values extracted from the viscoplastic behaviors are employed as the training patterns in the following step.
(2) A feedforward neural network is trained using the patterns generated above, where the feature parameters extracted from the calculated viscoplastic behaviors are used as the input data to the neural network, and the seven parameters of the Chaboche's model as the teacher signals.
(3) The trained neural network can estimate the optimal values of the seven parameters using the feature parameters as input. Note that the feature parameters are now extracted from the experiments of the four viscoplastic behaviors.

In the paper, 46 selected feature values from the four viscoplastic behaviors are used as input to the neural network: eight stress values from the tensile curve, twenty-one stress values from the cyclic hysteresis curve, seven stress values at the hysteresis tip from the cyclic hardening curve and ten stress values from the stress relaxation curve. A feedforward neural network of one hidden layer with 60 units is employed, which is successfully trained using 98 training patterns through 12,000 learning iterations. It is noted that this method could be applied to various inelastic constitutive equations.
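Step (1) amounts to integrating the rate equations (8.1.2)–(8.1.4) in time for prescribed strain histories. A minimal explicit-Euler sketch for a uniaxial tensile test under a constant strain rate (for illustration only; the elastic modulus, parameter values and time step are arbitrary, and a real implementation would use a more robust integrator):

```python
import numpy as np

def chaboche_tensile_curve(K, n, H, D, h, d, R0,
                           E=200e3, strain_rate=1e-3, t_end=100.0, dt=0.01):
    """Integrate Eqs. (8.1.2)-(8.1.4) for a constant total strain rate (uniaxial)."""
    nstep = int(t_end / dt)
    eps_vp, chi, R = 0.0, 0.0, R0
    strain, stress = [], []
    for i in range(nstep):
        eps_total = strain_rate * dt * (i + 1)
        sigma = E * (eps_total - eps_vp)              # elastic stress-strain relation
        over = abs(sigma - chi) - R
        rate = (over / K) ** n * np.sign(sigma - chi) if over > 0.0 else 0.0  # (8.1.2)
        eps_vp += rate * dt
        chi += (H * rate - D * chi * rate) * dt       # Eq. (8.1.3)
        R += (h - d * R) * rate * dt                  # Eq. (8.1.4)
        strain.append(eps_total)
        stress.append(sigma)
    return np.array(strain), np.array(stress)

eps, sig = chaboche_tensile_curve(K=100.0, n=3.0, H=5000.0, D=100.0,
                                  h=300.0, d=5.0, R0=50.0)
```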
8.2 Implicit Constitutive Modelling for Viscoplasticity It is discussed in the previous section that the neural networks have a capability to estimate material parameters of the constitutive models defined by the explicit mathematical equations. This method may not be applicable to such materials as newly developed compounds that show unique nonlinear behaviors, for which no constitutive models are available. To solve this issue, Furukawa and Yagawa [3] have proposed an implicit constitutive model based on the neural network, where they have employed the state-variable method in the control theory, which is known to describe the dynamical systems by a set of first-order differential equations with
variables called the state, and the solution may be visualized as a trajectory in space. The method has a deep relation with the modern control theory [4]. Using the method, the state-space representation of an explicit viscoplastic constitutive model such as the Chaboche's model can be written as

\dot{\varepsilon}^{vp} = \varphi\left( \varepsilon^{vp}, \xi, \sigma, a \right)    (8.2.1)

and

\dot{\xi} = \psi\left( \varepsilon^{vp}, \xi, \sigma, a \right)    (8.2.2)

where \varepsilon^{vp}, \xi, \sigma and a are the viscoplastic strain, the internal variables, the stress and the material parameters, respectively. Similarly, the generalized implicit viscoplastic model can be written as follows:

\dot{\varepsilon}^{vp} = \varphi\left( \varepsilon^{vp}, \xi, \sigma \right)    (8.2.3)

and

\dot{\xi} = \psi\left( \varepsilon^{vp}, \xi, \sigma \right)    (8.2.4)

In the above, the internal variables include the back stress, the drag stress or others, depending on the material behavior to be described. In the modern control theory, a nonlinear dynamic system is usually given by

\dot{x} = \phi(x, u)    (8.2.5)

where x is a set of state variables and u a set of control inputs. In analogy of the constitutive equations Eqs. (8.2.3) and (8.2.4) with the general dynamic system Eq. (8.2.5), the viscoplastic strain and the internal variables are regarded as the state variables, whereas the stress as a control input. The method above is summarized as follows:
(1) The stress–strain relationships and their time histories are obtained by experiments.
(2) A feedforward neural network is trained using the patterns prepared in (1) above.
(3) The trained neural network is employed as if it were a constitutive model of the material.
In the paper [3], \varepsilon_n^{vp}, \xi_n and \sigma_n are used as the input data for the neural network, and \dot{\varepsilon}_n^{vp} and \dot{\xi}_n as the teacher signals, respectively (Fig. 8.1). It is shown in Fig. 8.2 that the implicit constitutive model by the present neural networks can simulate well the real stress–strain curve obtained by experiment.
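A minimal sketch of training such an implicit model as a state-space mapping (illustrative only; here the "experimental" rate histories are generated synthetically, whereas a real application would use measured stress–strain time histories):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# synthetic stand-in for measured histories: state (eps_vp, xi), control input (sigma)
rng = np.random.default_rng(0)
eps_vp = rng.uniform(0.0, 0.01, 2000)
xi = rng.uniform(-50.0, 50.0, 2000)
sigma = rng.uniform(-300.0, 300.0, 2000)
# values playing the role of measured d(eps_vp)/dt and d(xi)/dt
deps_vp = np.maximum(np.abs(sigma - xi) - 100.0, 0.0) ** 2 * np.sign(sigma - xi) * 1e-7
dxi = 5000.0 * deps_vp - 100.0 * xi * np.abs(deps_vp)

X = np.column_stack([eps_vp, xi, sigma])        # inputs (eps_vp_n, xi_n, sigma_n)
Y = np.column_stack([deps_vp, dxi])             # teacher signals (the rates)

model = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=5000,
                     random_state=0).fit(X, Y)

# the trained network now acts as Eqs. (8.2.3)-(8.2.4): given the current state and
# stress it returns the rates, which can be integrated in time like any ODE system
rates = model.predict(np.array([[0.001, 10.0, 150.0]]))[0]
```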
Fig. 8.1 Training of neural network constitutive model. Reprinted from [3] with permission from Wiley
8.3 Autoprogressive Algorithm

In the autoprogressive algorithm, the neural network itself is integrated as a part of an iterative algorithm to create the stress–strain training cases from global response data [5, 6]. In this algorithm, the material model is iteratively updated using the stresses and the strains obtained from finite element analyses with measured boundary conditions. The algorithm consists of three phases as follows:
(1) Pre-training phase: A target structure, either a virtual or a real one, and its finite element model are taken first, and a material model used in the finite element analysis is given. The output of the neural network is usually the stress at the current step, and the input the strains at the current step and some values of stresses and strains at the previous steps, depending on the material type to be simulated. The training patterns for the initial state of the neural network are selected from such a basic material model as the linear elasticity.
(2) Training phase:
The load \bar{F} and the resultant displacements \bar{u} at the surface of the structure are measured in an experiment. Next, the following steps are repeated until the convergence criterion is met (Fig. 8.3).
(A) The stress data of the target structure under the load \bar{F} are obtained by the FEM analysis with the current NN-based material model.
(B) Similarly, the strain data of the target structure under the measured displacements \bar{u} are obtained by the FEM analysis with the current NN-based material model.

Fig. 8.2 a Comparison between neural network constitutive model and experimental data. b Comparison between best-fit Chaboche's model and experimental data. Reprinted from [3] with permission from Wiley
(C) The NN-based material model is updated using the stress and the strain data obtained above as the training data of the neural network.

The criterion to finish the above cycle (A)–(C) is based on the error between the displacements obtained in (A) and those measured (\bar{u}); a schematic sketch of this loop is given below.
(3) Using the converged NN-based model above, a FEM analysis of the target structure is performed under the load \bar{F}.
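The training-phase loop can be outlined as follows. This is a schematic sketch only: `fem_with_force_bc` and `fem_with_displacement_bc` are hypothetical stand-ins for the force-controlled and displacement-controlled FEM analyses (here they return random placeholder values so the example runs), and the network and elasticity settings are arbitrary:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fem_with_force_bc(material_model, load):
    # placeholder for a force-controlled FEM analysis returning (strains, stresses)
    strains = np.random.rand(50, 3) * 0.01
    return strains, material_model.predict(strains)

def fem_with_displacement_bc(material_model, displacements):
    # placeholder for a displacement-controlled FEM analysis returning strains
    return np.random.rand(50, 3) * 0.01

measured_load = 1.0
measured_disp = np.random.rand(10)

# pre-training phase: initialize the NN material model from plane-stress linear elasticity
E, nu = 200e3, 0.3
C = E / (1 - nu ** 2) * np.array([[1, nu, 0], [nu, 1, 0], [0, 0, (1 - nu) / 2]])
eps0 = np.random.rand(500, 3) * 0.01
model = MLPRegressor(hidden_layer_sizes=(4, 4), max_iter=5000,
                     random_state=0).fit(eps0, eps0 @ C.T)

# training phase: autoprogressive cycles (A)-(C)
for cycle in range(10):
    _, stresses = fem_with_force_bc(model, measured_load)        # step (A)
    strains = fem_with_displacement_bc(model, measured_disp)     # step (B)
    model.fit(strains, stresses)                                  # step (C): update model
```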
Fig. 8.3 Autoprogressive training of neural network material model. Reprinted from [5] with permission from Wiley
8.3 Autoprogressive Algorithm
63
and the second hidden layers, respectively, and three units in the output layer, which correspond to three stress components. Figure 8.4 shows how the proposed method works in the material modelling. This method has also been applied to the modelling of the rate-dependent material behavior [6] (Fig. 8.5), and it has been extended by introducing a new tangent stiffness
Fig. 8.4 Improvements of neural network 1-D material model during autoprogressive training (1 ksi = 895 MPa). Reprinted from [5] with permission from Wiley
64
8 Constitutive Models
Fig. 8.5 Predicted and experimental creep curves (constant stress = 500 MPa). Reprinted from [6] with permission from Elsevier
formulation, applying to the modelling of the cyclic behavior of the connections in frame structures (Fig. 8.6), especially the modelling of the post-yielding and the postlimit behavior of the connection under cyclic loading [7]. In the paper, considering both the geometrical and material nonlinearities, the nonlinear static and dynamic finite element analyses with an incremental iterative scheme based on the Newton– Raphson method are employed, where the autoprogressive loop is implemented as an inner loop of the load increment loop of the Newton–Raphson iteration. Figure 8.7 shows the performance of the method applied to a two-story frame structure with semi-rigid connections.
Fig. 8.6 Neural network based connection model combined with beam–column element. Reprinted from [7] with permission from Elsevier
Fig. 8.7 Comparisons of time history of horizontal displacement at second floor. Reprinted from [7] with permission from Elsevier
8.4 Others In most of the finite element analysis codes, a material model is explicitly expressed in the form of a stress–strain matrix as shown in Eq. (A1.19) for the linear elasticity (see Appendix A1). In contrast, the material models by the neural networks are rarely represented in the matrix form, which often limits their applicability to the FEM analysis. Nevertheless, a method to derive the explicit stress–strain matrix from a material model based on the neural network has been developed and tested in the commercial FEM code, ABAQUS [8]. It is noted that the simulation of material behaviors is mainly performed by the standard feedforward neural networks (see Sect. 2.1), whereas, for time-dependent or rate-dependent material behaviors, the recurrent neural network (see Sect. 2.2) could be an alternative. In [9], a partial recurrent neural network is developed to model the behavior of rheological materials that exhibit Newtonian-like behavior under creep loading. In [10], the recurrent neural networks for fuzzy data [11] are employed to describe the time-dependent material behavior, where the fuzzy finite element method is also utilized. In [12], the recurrent neural network trained to output a stress sequence from a strain sequence is employed to accelerate the multiscale finite element simulations. Another application of the recurrent neural network to the material modelling is found in [13], where the recurrent neural network is tested to model the uncured natural rubber material. In [14], instead of the recurrent neural network, the long short-term memory (LSTM) neural network [15], which outputs the structural response sequence taking the ground motion sequence as input, is employed to predict the nonlinear structural seismic response of buildings.
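As a rough illustration of how an explicit stress–strain (tangent) matrix can be extracted from a neural-network material model, the sketch below differentiates a stand-in constitutive network by central finite differences. This is a generic approach assumed only for illustration; it is not the implementation described in [8], and nn_stress is a hypothetical placeholder.

```python
# Minimal sketch: tangent matrix D[i, j] ~ d sigma_i / d eps_j from a constitutive
# network by central finite differences; nn_stress() is a placeholder network.
import numpy as np

def nn_stress(strain):
    # placeholder constitutive "network": linear isotropic response, illustration only
    E, nu = 200e3, 0.3
    lam = E * nu / ((1 + nu) * (1 - 2 * nu))
    mu = E / (2 * (1 + nu))
    eps = np.asarray(strain, dtype=float)
    sig = np.empty(6)
    sig[:3] = lam * eps[:3].sum() + 2.0 * mu * eps[:3]
    sig[3:] = mu * eps[3:]
    return sig

def tangent_matrix(strain, h=1e-8):
    """Numerical tangent stiffness around the current strain state."""
    strain = np.asarray(strain, dtype=float)
    D = np.empty((6, 6))
    for j in range(6):
        d = np.zeros(6); d[j] = h
        D[:, j] = (nn_stress(strain + d) - nn_stress(strain - d)) / (2.0 * h)
    return D

print(tangent_matrix(np.zeros(6)))
```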
On the other hand, the neural networks of the mutually connected type have been used for the material modelling [16], where the neural network of Hopfield type is employed to approximate the anisotropic hardening elastoplastic behavior of materials. As is well known, the constitutive equation for accurate numerical results often requires too many material parameters, some of which cannot be obtained by simple experiments. The neural networks are utilized to reduce the number of parameters of a material model [17], where the original 28 parameters in the complex isotropic viscoplastic material model are reduced to 19 by using the neural networks. Combinations with other machine learning techniques are also reported. In [18], the feedforward neural network and the support vector machine are used for the multiscale analysis, where the former is employed to evaluate the stress tensor and the latter for the decision on loading or unloading. Both utilize the same input data: the current total strain and the maximum strain ever reached in the loading process. The feedforward neural network and the genetic algorithm (GA) have been combined to simulate the stress–strain response of ceramic-matrix composites, where the initial values of the connection weights of the neural network are determined by the GA, resulting in much faster convergence of the training of the neural network with the back-propagation algorithm [19]. In [20], several techniques including the neural networks, the nested sampling [21], the Monte Carlo sampling and the principal component analysis are combined in a Bayesian inference framework for the plasticity model characterization.
Other studies include the identification of material parameters of the finite deformation viscoplasticity model with static recovery [22], the modelling of sorption hysteresis of a porous material by the four-layer feedforward neural networks [23], the parameter identifications for an elasto-plastic model of superconducting cable under the cyclic loading [24], the simulation of the viscoplastic behavior of a carbon-fiber/thixotropic-epoxy matrix composite by the feedforward neural network trained using the experimental results obtained via creep tests performed at various stress–temperature conditions [25], the numerical modelling of non-linear material in the incremental form [26], the rate-dependent constitutive model by the neural networks [27], the numerical multiscale modelling of composites and hierarchical structures, where the neural networks are used as a tool for the stress–strain recovery at lower levels of the hierarchical structure and for estimation of the state of yielding of the materials [28], the approximation of macroscopic stress vs. crack opening response in the multiscale analysis of a reinforced concrete beam, with which the material formulation of the interface layer between concrete and reinforcement is replaced by the NN to avoid expensive parallel simulation on different scales [29], the constitutive modelling for the non-linear characterization of anisotropic materials [30], the estimation of energy conversion factor from the material properties of flexoelectric nanostructures [31], the prediction of effective electric current density from the effective electric field in graphene/polymer nanocomposites [32], Bayesian inference of non-linear multiscale model parameters assisted by the neural network, which outputs
homogenized tensile stress from homogenized strain history and material parameters [33], the prediction of creep parameters for the equivalent model of perforated Ni-based single crystal plates [34] and so on.
References 1. Chaboche, J.L., Rousselier, G.: On the plastic and viscoplastic equations-Part I: rules developed with internal variable concept. Trans. ASME, J. Pressure Vessel Technol. 105, 153–164 (1983) 2. Yoshimura, S., Hishida, H., Yagawa, G.: Parameter optimization of viscoplastic constitutive equation using hierarchical neural network. In: Proceedings of the VII International Congress on Experimental Mechanics, Nevada, June 8–11, pp. 296–301 (1992) 3. Furukawa, T., Yagawa, G.: Implicit constitutive modelling for viscoplasticity using neural networks. Int. J. Numer. Methods Eng. 43, 195–219 (1998) 4. Franklin, G.F., Powell, J.D., Emami-Naeini, A.: Feedback Control of Dynamicc Systems. Addison-Wesley (1991) 5. Ghaboussi, J., Pecknold, D.A., Zhang, M., Haj-Ali, R.: Autoprogressive training of neural network constitutive models. Int. J. Numer. Methods Eng. 42, 105–126 (1998) 6. Jung, S., Ghaboussi, J.: Characterizing rate-dependent material behaviors in self-learning simulation. Comput. Methods Appl. Mech. Eng. 196, 608–619 (2006) 7. Yun, G.J., Ghaboussi, J., Elnashai, A.S.: Self-learning simulation method for inverse nonlinear modeling of cyclic behavior of connections. Comput. Methods Appl. Mech. Eng. 197, 2836– 2857 (2008) 8. Hashash, Y.M.A., Jung, S., Ghaboussi, J.: Numerical implementation of a network based material model in finite element analysis. Int. J. Numer. Methods Eng. 59, 989–1005 (2004) 9. Oeser, M., Freitag, S.: Modeling of materials with fading memory using neural networks. Int. J. Numer. Methods Eng. 78, 843–862 (2009) 10. Freitag, S., Graf, W., Kaliske, M.: A material description based on recurrent neural networks for fuzzy data and its application within the finite element method. Comput. Struct. 124, 29–37 (2013) 11. Zadeh, L.A.: Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems. World Scientific (1996) 12. Ghavamian, F., Simone, A.: Accelerating multiscale finite element simulations of historydependent materials using a recurrent neural network. Comput. Methods Appl. Mech. Eng. 357, 112594 (2019) 13. Zopf, C., Kaliske, M.: Numerical characterization of uncured elastomers by a neural network based approach. Comput. Struct. 182, 504–525 (2017) 14. Zhang, R., Chen, Z., Chen, S., Zheng, J., Büyüköztürk, O., Sun, H.: Deep long short-term memory networks for nonlinear structural seismic response prediction. Comput. Struct. 220, 55–68 (2019) 15. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016) 16. Theocaris, P.S., Panagiotopoulos, P.D.: Generalised hardening plasticity approximated via anisotropic elasticity: A neural network approach. Comput. Methods Appl. Mech. Eng. 125, 123–139 (1995) 17. Sumelka, W., Lodygowski, T.: Reduction of the number of material parameters by ANN approximation. Comput. Mech. 52, 287–300 (2013) 18. Unger, J.F., Könke, C.: Coupling of scales in a multiscale simulation using neural networks. Comput. Struct. 86, 1994–2003 (2008) 19. Rao, H.S., Ghorpade, V.G., Mukherjee, A.: A genetic algorithm based back propagation network for simulation of stress–strain response of ceramic-matrix-composites. Comput. Struct. 84, 330–339 (2006) 20. Asaadi, E., Heyns, P.S.: A computational framework for Bayesian inference in plasticity models characterisation. Comput. Methods Appl. Mech. Eng. 321, 455–481 (2017)
21. Skilling, J.: Nested sampling for general Bayesian computation. Bayesian Anal. 1, 833–860 (2006) 22. Huber, N., Tsakmakis, Ch.: A neural network tool for identifying the material parameters of a finite deformation viscoplasticity model with static recovery. Comput. Methods Appl. Mech. Eng. 191, 353–384 (2001) 23. Gawin, D., Lefik, M., Schrefler, B.A.: ANN approach to sorption hysteresis within a coupled hygro-thermo-mechanical FE analysis. Int. J. Numer. Methods Eng. 50, 299–323 (2001) 24. Lefik, M., Schrefler, B.A.: Artificial neural network for parameter identifications for an elastoplastic model of superconducting cable under cyclic loading. Comput. Struct. 80, 1699–1713 (2002) 25. Al-Haik, M.S., Garmestani, H., Navon, I.M.: Truncated-Newton training algorithm for neurocomputational viscoplastic model. Comput. Methods Appl. Mech. Eng. 192, 2249–2267 (2003) 26. Lefik, M., Schrefler, B.A.: Artificial neural network as an incremental non-linear constitutive model for a finite element code. Comput. Methods Appl. Mech. Eng. 192, 3265–3283 (2003) 27. Jung, S., Ghaboussi, J.: Neural network constitutive model for rate-dependent materials. Comput. Struct. 84, 955–963 (2006) 28. Lefik, M., Boso, D.P., Schrefler, B.A.: Artificial neural networks in numerical modelling of composites. Comput. Methods Appl. Mech. Eng. 198, 1785–1804 (2009) 29. Unger, J.F., Könke, C.: Neural networks as material models within a multiscale approach. Comput. Struct. 87, 1177–1186 (2009) 30. Man, H., Furukawa, T.: Neural network constitutive modelling for non-linear characterization of anisotropic materials. Int. J. Numer. Methods Eng. 85, 939–957 (2011) 31. Hamdia, K.M., Ghasemi, H., Bazi, Y., AlHichri, H., Alajlan, N., Rabczuk, T.: A novel deep learning based method for the computational material design of flexoelectric nanostructures with topology optimization. Finite Elements Anal. Des. 165, 21–30 (2019) 32. Lu, X., Giovanis, D.G., Yvonnet, J., Papadopoulos, V., Detrez, F., Bai, J.: A data-driven computational homogenization method based on neural networks for the nonlinear anisotropic electrical response of graphene/polymer nanocomposites. Comput. Mech. 64, 307–321 (2019) 33. Wu, L., Zulueta, K., Major, Z., Arriaga, A., Noels, L.: Bayesian inference of non-linear multiscale model parameters accelerated by a deep neural network. Comput. Methods Appl. Mech. Eng. 360, 112693 (2020) 34. Zhang, Y., Wen, Z., Pei, H., Wang, J., Li, Z., Yue, Z.: Equivalent method of evaluating mechanical properties of perforated Ni-based single crystal plates using artificial neural networks. Comput. Methods Appl. Mech. Eng. 360, 12725 (2020)
Chapter 9
Numerical Quadrature
Abstract Elemental integration is one of the major processes in the finite element method. The feedforward neural network could be employed to improve the numerical quadrature of the elemental integration. Chapter 9 deals with this issue: Section 9.1 describes the classification of the finite elements based on the convergence speed in numerical quadrature, and Sect. 9.2 the optimization of quadrature parameters.
9.1 Optimization of Number of Quadrature Points The element matrices in the finite element method are usually calculated using a numerical quadrature, where the integrated value of a function is approximated by the weighted sum of the values of an integrand at several prescribed points. The Gauss–Legendre quadrature is the most popular one among others, where an integrand f(x) is numerically integrated in the range [−1, 1] as follows:
$$\int_{-1}^{1} f(x)\,dx \approx \sum_{i=1}^{n} f(x_i)\,H_i \qquad (9.1.1)$$
where $x_i$ is the ith integration point, $H_i$ the weight corresponding to $x_i$ and n the total number of integration points. In this quadrature, the coordinates of the integration points and their corresponding weights are, respectively, defined using the Legendre polynomials and the Lagrange polynomials. The former polynomial of the n-th degree $P_n$ is given as follows:
$$P_n(x) = \frac{d^n}{dx^n}\left(x^2 - 1\right)^n \qquad (9.1.2)$$
Here, the equation:
$$P_n(x) = 0 \qquad (9.1.3)$$
has n real solutions $x_1, x_2, \cdots, x_{n-1}, x_n$ ($x_i < x_{i+1}$) in the range (−1.0, 1.0), which are used as the coordinates of the integration points in the quadrature. On the other hand, the weights in the quadrature are calculated using the Lagrange polynomials as follows:
$$L_i^{n-1}(x) = \frac{(x - x_1)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_n)}{(x_i - x_1)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n)} \qquad (9.1.4)$$
and
$$H_i = \int_{-1}^{1} L_i^{n-1}(x)\,dx \qquad (9.1.5)$$
where $L_i^{n-1}(x)$ is the ith Lagrange polynomial of (n − 1)th degree constructed from the n solutions of Eq. (9.1.3) and $H_i$ the weight at the ith integration point in the quadrature. It is noted that the Gauss–Legendre quadrature with n integration points is equivalent to integrating not the given integrand but its approximating polynomial of (2n − 1)th degree. The quadrature in one dimension is extended to two or three dimensions, respectively, as follows:
$$\int_{-1}^{1}\int_{-1}^{1} f(x, y)\,dx\,dy \approx \sum_{i=1}^{n}\sum_{j=1}^{m} f\left(x_i, y_j\right) H_{ij} \quad \text{(2D)} \qquad (9.1.6)$$
$$\int_{-1}^{1}\int_{-1}^{1}\int_{-1}^{1} f(x, y, z)\,dx\,dy\,dz \approx \sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l} f\left(x_i, y_j, z_k\right) H_{ijk} \quad \text{(3D)} \qquad (9.1.7)$$
where n, m and l are the numbers of integration points along each axis, respectively. From the viewpoint of the accuracy of the numerical quadrature of the FEM stiffness matrix, elements of regular shape, such as a square in 2D and a cube in 3D, can be accurately integrated with a few quadrature points, whereas elements of irregular or distorted shape require more, often many more, quadrature points for accurate integration. The accuracy of the numerical quadrature of the FEM stiffness matrix can be measured by comparison with a reference matrix obtained by the Gauss–Legendre quadrature with an adequate number of integration points. Here, the Error is defined as follows:
$$Error = \frac{\sum_{i,j}\left|k_{i,j}^{q} - k_{i,j}^{q_{max}}\right|}{\max_{i,j}\left|k_{i,j}^{q_{max}}\right|} \qquad (9.1.8)$$
where $k_{i,j}^{q}$ is the component at the ith row and the jth column of an element stiffness matrix obtained by the numerical quadrature with q integration points, and $k_{i,j}^{q_{max}}$ that obtained with $q_{max}$ integration points per axis, which is used to compute the reference matrix. If $q_{max}$ is set to 30, the reference matrix is obtained using the Gauss–Legendre quadrature with a total of 30 integration points for a one-dimensional element and 27,000 points for a three-dimensional element, respectively. Here, we discuss the accuracy of the element stiffness matrix obtained by using the Gauss–Legendre quadrature for two solid elements: an eight-node linear solid element of standard cubic shape, and one of distorted shape, in which one of the edges is extended twofold and the two neighboring edges are extended by $\sqrt{2}$. Adding nodes at the middle of the edges of these elements, two twenty-node quadratic elements are generated, for which the convergence properties of the Gauss–Legendre quadrature are shown in Fig. 9.1, where the horizontal axis represents the number of integration points per axis, the same for all three axes, and the vertical axis the Error defined in Eq. (9.1.8). In the figure, "1st-Regular" and "2nd-Regular" indicate the linear and the quadratic elements of regular shape, respectively, and "1st-Irregular" and "2nd-Irregular" those of distorted shape, showing that an element of regular shape, whether linear or quadratic, tends to converge faster, requiring only a few integration points to reach the prescribed accuracy, whereas an element of distorted shape converges more slowly, requiring more integration points.
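As a minimal numerical illustration of Eq. (9.1.1), the following snippet integrates a sample function with an increasing number of Gauss–Legendre points. The integrand and the error check are chosen only for illustration and are unrelated to the element-stiffness integrals of [1].

```python
# Gauss-Legendre quadrature of a sample integrand on [-1, 1], Eq. (9.1.1)
import numpy as np

def f(x):
    return x**10 + np.cos(3.0 * x)                 # sample integrand

exact = 2.0 / 11.0 + 2.0 * np.sin(3.0) / 3.0       # analytic value of the integral
for n in (2, 4, 6, 8):
    x, H = np.polynomial.legendre.leggauss(n)      # points x_i and weights H_i
    approx = np.sum(f(x) * H)                      # weighted sum of Eq. (9.1.1)
    print(n, approx, abs(approx - exact))
```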
Fig. 9.1 Error versus number of quadrature points (per axis) for the 1st-Regular, 1st-Irregular, 2nd-Regular and 2nd-Irregular elements. Reprinted from [1] with permission from Elsevier
In conclusion, the number of quadrature points required to achieve the appropriate accuracy, called the optimal number of quadrature points, varies depending on the type of element, whereas it is not easy, often very difficult, to know in advance how many integration points are needed to achieve satisfactory accuracy. However, the optimal number of quadrature points could be determined under the following conditions: (A) Given an integrand, a unique number or a unique set of numbers is mapped to the integrand. It is noted that it is rarely possible to identify the integrand exactly out of a broad range of function families. However, if the integrand is known to belong to a small range of function families, the mapping of the integrand to a unique set of numbers is possible. An integrand in the integral of the FEM stiffness matrix satisfies condition (A), since it consists of the derivatives of the basis functions, which are usually common among elements and of limited variety. (B) Given the target accuracy, the appropriate number of integration points for the integrand identified in (A) is estimated by the neural networks, and the standard Gauss–Legendre quadrature of the integrand is employed with the estimated number of integration points. Then, the optimization of the number of integration points is performed as follows: (1) Data preparation phase: The element stiffness matrices are numerically integrated for many kinds of elements using the Gauss–Legendre quadrature rule with one to $q_{max}$ integration points per axis, and the corresponding errors are evaluated by Eq. (9.1.8). The set of minimum numbers of integration points $(n_{opt}, m_{opt}, l_{opt})$ that achieves an error smaller than a preset value is obtained. Thus, a lot of data pairs ({e-parameters}, $(n_{opt}, m_{opt}, l_{opt})$) are collected, where {e-parameters} is a set of parameters identifying the element. (2) Training phase: Using the data prepared in (1) above, the feedforward neural network is trained. The classifier takes the element parameters {e-parameters} as the input and the set of optimal numbers of integration points $(n_{opt}, m_{opt}, l_{opt})$ as the output or the teacher signal. (3) Application phase: The neural network trained in (2) above is implemented in the numerical quadrature process of the FEM code. The numerical integration of each element stiffness matrix is performed using the Gauss–Legendre quadrature with the corresponding set of optimal numbers of integration points provided by the neural network. The present method has been tested for eight-node solid elements [1], where the input vector consists of the coordinates of the nodes and the output vector of the optimal numbers of the quadrature points; the neural network consists of five layers, or three hidden layers with 50 units in each hidden layer; ten thousand patterns are collected and five thousand learning patterns are used for the training. The optimal numbers of the quadrature points are correctly estimated with an accuracy over 90% for the training patterns and 80% for the test patterns (see Table 9.1).
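The following sketch illustrates the three phases with a small classifier that maps element nodal coordinates to an optimal number of quadrature points. The data and the labeling rule are synthetic placeholders, and scikit-learn's MLPClassifier is assumed here simply as a convenient stand-in for the feedforward network of [1].

```python
# Sketch of phases (1)-(3): element parameters -> optimal number of quadrature points
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((2000, 24))                       # {e-parameters}: 8 nodes x 3 coordinates
# placeholder labeling rule standing in for the measured optimal number n_opt
y = 2 + (np.abs(X - 0.5).sum(axis=1) > 6.0).astype(int)

clf = MLPClassifier(hidden_layer_sizes=(50, 50, 50), max_iter=2000, random_state=0)
clf.fit(X[:1000], y[:1000])                      # (2) training phase
print("test accuracy:", clf.score(X[1000:], y[1000:]))
n_opt = clf.predict(X[:1])                       # (3) application phase: query one element
```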
Table 9.1 Results of classification: confusion matrices of the number of quadrature points estimated by the neural network versus the correct number, for (a) the training patterns and (b) the test patterns. Reprinted from [1] with permission from Elsevier
9.2 Optimization of Quadrature Parameters The Gauss–Legendre quadrature is known to be an excellent, near-optimal and universal quadrature formula based on polynomial approximation, which uses the same coordinates and weights of the integration points for any integrand. It is expected that we could obtain better results by employing different parameters without increasing the number of integration points. To make the concept clear, a simple example of a definite integral is taken as
$$\int_{-1}^{1} x^{10}\,dx = \left[\frac{1}{11}x^{11}\right]_{-1}^{1} = \frac{2}{11} \qquad (9.2.1)$$
This integral is calculated using the Gauss–Legendre quadrature formula with 2 integration points as follows:
$$\int_{-1}^{1} x^{10}\,dx \approx 1.0\cdot\left(\frac{-1}{\sqrt{3}}\right)^{10} + 1.0\cdot\left(\frac{1}{\sqrt{3}}\right)^{10} = \frac{2}{243} \qquad (9.2.2)$$
Comparing Eq. (9.2.1) with Eq. (9.2.2), it is seen that the value obtained by the Gauss–Legendre quadrature is only about one twentieth of the correct value. But, if both of the weights in Eq. (9.2.2) are changed from 1.0 to 243/11, then the correct value is achieved, meaning that the value 243/11 is the optimal weight for the present case. Therefore, as in Sect. 9.1, the optimal quadrature parameters could be obtained under the following conditions: (A) Given an integrand, a unique number or a unique set of numbers is mapped to the integrand. (B) Given the number of integration points, such parameters as the coordinates and the weights of the integration points for the integrand identified in (A) are estimated by the neural networks, and the numerical integration is performed using these parameters. The optimization procedure is given as follows. Here again, the element parameters that identify the element, such as the coordinates of the nodes in an element, are denoted as {e-parameters}, and the prescribed numbers of integration points as $(q_x, q_y, q_z)$, each component of which corresponds to one axis. (1) Data preparation phase: Setting the set of numbers of integration points to $(q_{max}, q_{max}, q_{max})$ for each of many kinds of elements, the element stiffness matrix is integrated using the Gauss–Legendre quadrature with the standard weights $H_{i,j,k}$, and the result is regarded as the correct one. Then, with the prescribed set of numbers of integration points $(q_x, q_y, q_z)$, the element stiffness matrix is repeatedly integrated using the Gauss–Legendre quadrature, not with the standard values of $H_{i,j,k}$ but with modified ones $w_{i,j,k} \times H_{i,j,k}$, and the error defined in Eq. (9.1.8) is evaluated for each result. Finally, the best result, i.e. the one with the smallest error, and the corresponding $w_{i,j,k}$, written as $w_{i,j,k}^{opt}$, are determined out of the many results obtained with the various $w_{i,j,k}$. In this manner, many data pairs ({e-parameters}, $w_{i,j,k}^{opt}$) are collected. (2) Training phase: Using the data prepared in the data preparation phase above, the feedforward neural network is trained, taking the element parameters {e-parameters} as the input and the set of optimal correction factors $w_{i,j,k}^{opt}$ as the output or the teacher signal. (3) Application phase: The trained feedforward neural network constructed in the training phase is implemented in the numerical quadrature process of the FEM code. When the {e-parameters} of an element are put into the neural network, it outputs the optimal correction factors $w_{i,j,k}^{opt}$ of the element, which are employed to calculate the element stiffness matrix of the element.
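A one-dimensional sketch of the search performed in the data preparation phase is given below: the standard weights are scaled by a trial correction factor w and the factor minimizing the error against a high-order reference is kept. The integrand is the toy example of Eq. (9.2.1), so the recovered factor should be close to 243/11; the 3D element-stiffness version of [1] is considerably more involved.

```python
# Data-preparation sketch: search a common correction factor for the 2-point rule
import numpy as np

def integrand(x):
    return x**10                                   # toy integrand from Eq. (9.2.1)

x_ref, H_ref = np.polynomial.legendre.leggauss(30)
reference = np.sum(integrand(x_ref) * H_ref)       # "correct" value (q_max points)

x2, H2 = np.polynomial.legendre.leggauss(2)        # prescribed small number of points
candidates = np.linspace(0.5, 30.0, 2000)          # trial correction factors w
errors = [abs(np.sum(integrand(x2) * (w * H2)) - reference) for w in candidates]
w_opt = candidates[int(np.argmin(errors))]
print("optimal correction factor ~", w_opt)        # close to 243/11 ~ 22.09
```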
This method has been successfully tested for the eight node solid elements [1], where the input vector consists of the coordinates of nodes and the output the optimal correction factors. The neural network consists of seven layers or five hidden layers with 50 units in each hidden layer. Ten thousand patterns are collected and five thousand learning patterns are used for the training. Using the correction factors estimated by the neural network, the accuracy in the quadrature is improved as high as 99% for the training patterns, whereas 97% for the test patterns. Figure 9.2 shows how the accuracy of quadrature is improved by employing the method.
Fig. 9.2 Distributions of error ratios when using correction factors estimated by deep learning ("Optimized" versus "Estimated by Neuro"): (a) training patterns, (b) test patterns; horizontal axis: error ratio, vertical axis: number of elements. Reprinted from [1] with permission from Elsevier
Reference 1. Oishi, A., Yagawa, G.: Computational mechanics enhanced by deep learning. Comput. Methods Appl. Mech. Eng. 327, 327–351 (2017)
Chapter 10
Identifications of Analysis Parameters
Abstract We discuss here the identifications of analysis parameters using the neural networks, which include the time step evaluation in time-dependent stress analysis (Sect. 10.1), the parameter identification in the augmented Lagrangian method (Sect. 10.2), the predictor-corrector method using the neural networks (Sect. 10.3), and the contact stiffness estimation (Sect. 10.4).
10.1 Time Step Determination of Pseudo Time-Dependent Stress Analysis It is well known that a static problem can be converted into a quasi-nonsteady problem by introducing an artificial viscosity, which is then solved using the fractional step method. By the same token, a fully plastic solution for practical structural components with a 3D crack can be obtained using the quasi-nonsteady algorithm based on the mixed formulation [1], where the time step $\Delta t$ has a strong influence on the numerical results and the stability. The neural networks are employed to realize an automatic selection of an appropriate time step for the above problem [2, 3], in which the power-law hardening constitutive relationship of material nonlinearity is assumed as
$$\frac{\bar{\varepsilon}}{\varepsilon_0} = \alpha \left(\frac{\tilde{\sigma}}{\sigma_0}\right)^{n} \qquad (10.1.1)$$
where $\alpha$, $\varepsilon_0$, $\sigma_0$ and n are the material constants, and $\bar{\varepsilon}$ and $\tilde{\sigma}$ are, respectively, the von Mises equivalent strain and stress. The basic equations for the incompressible nonlinear materials are given as
$$\sigma_{ij,j} + F_i = 0 \qquad (10.1.2)$$
$$\varepsilon_{ii} = 0 \qquad (10.1.3)$$
and
$$S_{ij} = \sigma_{ij} - \frac{1}{3}\delta_{ij}\sigma_{kk} \qquad (10.1.4)$$
where $S_{ij}$ is the deviatoric stress component, $\delta_{ij}$ the Kronecker delta, and $(\,)_{,j}$ denotes the partial derivative with respect to $x_j$. The spatial discretization on the above equations yields
$$K(u)u + Qp = F \qquad (10.1.5)$$
and
$$Q^{T}u = 0 \qquad (10.1.6)$$
where $u$ is the nodal displacement, $p$ the hydrostatic pressure, $K(u)$ and $Q$ the stiffness matrix and the pressure gradient matrix, respectively, $F$ the equivalent nodal force and $(\,)^{T}$ denotes the transposed matrix. Adding the quasi-viscosity term to Eq. (10.1.5), we have
$$C\dot{u} + K(u)u + Qp = F \qquad (10.1.7)$$
where $C$ is the quasi-viscosity matrix, and $\dot{u}$ the velocity vector. With the displacement vector discretized explicitly in time and the hydrostatic pressure implicitly in time, Eqs. (10.1.5) and (10.1.6) are written as
$$Q^{T}u_{n+1} = 0 \qquad (10.1.8)$$
and
$$u_{n+1} = u_{n} + \Delta t\, C^{-1}\left(F - K(u)u_{n} - Q p_{n+1}\right) \qquad (10.1.9)$$
Equation (10.1.9) is rewritten using the fractional step method as
$$u_{n+\frac{1}{2}} = u_{n} - \Delta t\, C^{-1} K(u)u_{n} \qquad (10.1.10)$$
and
$$u_{n+1} = u_{n+\frac{1}{2}} + \Delta t\, C^{-1}\left(F - Q p_{n+1}\right) \qquad (10.1.11)$$
Assuming that Eq. (10.1.8) is satisfied at the (n + 1)th time step, we obtain from Eqs. (10.1.11) and (10.1.8) the following equation,
$$Q^{T}C^{-1}Q\,p_{n+1} = \frac{1}{\Delta t}\, Q^{T} u_{n+\frac{1}{2}} + Q^{T}C^{-1}F \qquad (10.1.12)$$
Fig. 10.1 Flowchart for fractional step method. Reprinted from [3] with permission from Emerald Publishing Limited
The procedure above is summarized as shown in Fig. 10.1, where $\Delta t$ is determined automatically by the feedforward neural network as follows: (1) A number of problems with randomly selected analysis parameters are solved with various time steps, where the analysis parameters include $\alpha$, $\varepsilon_0$, $\sigma_0$ and n, the element type, the element shape, the element size and so on. From the analysis results above, the most appropriate time step for each problem is determined based on some criterion. Thus, a lot of data pairs of the analysis parameters and the corresponding best time step are collected. (2) The feedforward neural network is trained using the data pairs collected above: the analysis parameters as input data and the best time step as the teacher signal. (3) The trained neural network above is applied to the estimation of the time step for any new problem. The fundamental performance of this method is studied through the analysis of an inelastic cubic material [3], where several types of network topologies are tested for a small set of data pairs. For example, a neural network with two parameters, n and $\tilde{\sigma}$, as input and $\Delta t$ as output is trained with 25 patterns and tested with 16 patterns, resulting in an estimation error as small as 7% or less.
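A hedged sketch of steps (1)–(3) is given below, using a small feedforward regressor that maps the two input parameters to a time step. The training data are synthetic placeholders and scikit-learn's MLPRegressor merely stands in for the network used in [2, 3].

```python
# Sketch: feedforward regression of the time step from (n, equivalent stress)
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform([1.0, 50.0], [10.0, 500.0], size=(25, 2))   # (n, equivalent stress)
dt_best = 1e-3 / (X[:, 0] * np.sqrt(X[:, 1]))               # placeholder "best" time steps

net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
net.fit(X, dt_best)                                          # step (2): training
print("estimated time step:", net.predict([[5.0, 200.0]]))   # step (3): a new problem
```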
10.2 Parameter Identification of Augmented Lagrangian Method The governing equations of the steady-state Stokes problem are written as follows:
$$-2\nu D_{ij,j} + p_{,i} = f_i \quad \text{in } \Omega \qquad (10.2.1)$$
$$u_{i,i} = 0 \quad \text{in } \Omega \qquad (10.2.2)$$
with
$$D_{ij} = \frac{1}{2}\left(u_{i,j} + u_{j,i}\right) \qquad (10.2.3)$$
where $u_i$ is the velocity of the fluid flow, $p$ the pressure divided by the mass density, $\nu$ the viscosity coefficient and $D_{ij}$ the transformed velocity strain tensor. The finite element discretization on the above equations yields the matrix equations
$$Du + Cp = f \qquad (10.2.4)$$
and
$$C^{T}u = 0 \qquad (10.2.5)$$
where $D$ and $C$ denote the global matrices of the diffusion and the gradient, respectively. Next, we apply the augmented Lagrangian method [4] to the above problem, the analysis flow of which is shown in Fig. 10.2, where $\alpha$ is the penalty factor and $\beta$ the pressure modification factor. The solution procedure consists of three nested iteration loops: the innermost Loop 1 for solving the kinetic equations, Loop 2 for updating the pressure and the outer Loop 3 for updating the advection term. The convergence criteria for these three loops are named epsCR for Loop 1, epsP for Loop 2, and epsU for Loop 3, respectively. Since these analysis parameters as well as the Reynolds number Re and the boundary conditions strongly influence the convergence of the solution, the feedforward neural network is used to find the most appropriate analysis parameters [3] as follows: (1) A lot of problems are solved with various analysis conditions, such as the Reynolds number and the boundary conditions, and various analysis parameters, such as epsCR and epsU. Thus, many data pairs of the analysis conditions, the analysis parameters, and the corresponding convergence efficiency are collected.
Fig. 10.2 Flowchart for augmented Lagrangian method to solve steady-state Navier–Stokes problem. Reprinted from [3] with permission from Emerald Publishing Limited
(2) The feedforward neural network is trained using the data pairs collected above: the convergence efficiency and the analysis conditions as input data and the analysis parameters as teacher signals. (3) The trained neural network is applied to the determination of analysis parameters for any unsolved problem. Studied in the paper is the basic performance of the method with the 2D cavity flow, where the feedforward neural network of three units in the input layer, six units in the single hidden layer and two units in the output layer is trained using 83 training patterns: Reynolds number, epsCR and α as input data, and epsU and CPU time as teacher signals (Fig. 10.3). It is concluded that the method works well in the evaluation of the CPU time and the epsU for several analysis conditions given.
10.3 Predictor–Corrector Method for Nonlinear Structural Analysis The Newton method for nonlinear analysis is a kind of iterative method, which is popular in the area of structural analysis. It is well known that the Newton iteration
Fig. 10.3 Network topology for parameter estimation with augmented Lagrangian method. Reprinted from [3] with permission from Emerald Publishing Limited
converges faster with a better starting point. In this context, the neural networks could be used to predict such a starting point. The equilibrium condition for the nonlinear structural problem is written as follows:
$$K_k^i\, \Delta U_k^i = \lambda_k^i P - R_k^i \qquad (10.3.1)$$
with
$$U_k^i = U_k^{i-1} + \Delta U_k^i \qquad (10.3.2)$$
where the superscript i and the subscript k denote the iteration number and the load step number, respectively, $K_k^i$ the stiffness matrix, $U_k^i$ the displacement vector, $R_k^i$ the residual force vector, $P$ the reference load vector and $\lambda_k^i$ the load factor. The displacement vector is divided into three parts as
$$U_k^{i-1} = U_k^{c} + \Delta U_k^{p} + \sum_{j=1}^{i-1} \Delta U_k^{j} \qquad (10.3.3)$$
where the superscripts c and p indicate the converged and the predicted terms, respectively. Equation (10.3.1) is iteratively solved for $\Delta U_k^i$, where the stiffness matrix and the residual load vector in the equation are obtained using $U_k^{i-1}$ given in Eq. (10.3.3) as follows:
$$K_k^i = K\left(U_k^{i-1}\right) \qquad (10.3.4)$$
$$R_k^i = R\left(U_k^{i-1}\right) \qquad (10.3.5)$$
The method predicts the direction and the magnitude of the starting point vector based on the three previously converged solution vectors, respectively (Fig. 10.4). The pattern of the direction is represented by the mean vector and the complementary vector, and the pattern of the magnitude by the slope factor, where the mean vector is calculated as follows:
$$\bar{U}_{mean} = \frac{1}{3}\sum_{i=1}^{3} \bar{U}_{k-i}^{c} \qquad (10.3.6)$$
with
$$\bar{U}_{k-i}^{c} = \frac{\Delta U_{k-i}^{c}}{\left\| \Delta U_{k-i}^{c} \right\|} \qquad (10.3.7)$$
Fig. 10.4 Iterative procedure for predictor–corrector method with standard Newton method in load-control case. Reprinted from [5] with permission from Elsevier
On the other hand, the complementary vector is calculated as follows:
$$\bar{U}_{compl} = \bar{U}_{k-i}^{c} - \frac{\bar{U}_{mean}\cdot\bar{U}_{k-i}^{c}}{\bar{U}_{mean}\cdot\bar{U}_{mean}}\,\bar{U}_{mean} \qquad (10.3.8)$$
It is easily shown that the mean vector and the complementary vector are orthogonal to each other. The vector plane spanned by these vectors can be used as a reference plane to describe the marching pattern of the converged displacement vectors. Thus, the normalized converged displacement vectors are expressed as follows:
$$\hat{U}_{k-i}^{c*} = c_{m,k-i}\,\hat{U}_{mean} + c_{c,k-i}\,\hat{U}_{compl} \qquad (10.3.9)$$
where the superscript ∗ indicates a vector projected onto the plane. The predicted direction to the next starting point is obtained by using the mean vector and the complementary vector as
$$\bar{U}_k^{p} = \frac{c_{m,k}^{p}\,\bar{U}_{mean} + c_{c,k}^{p}\,\bar{U}_{compl}}{\left\| c_{m,k}^{p}\,\bar{U}_{mean} + c_{c,k}^{p}\,\bar{U}_{compl} \right\|} \qquad (10.3.10)$$
Defining the slope factor as
$$\alpha_{k-i} = \frac{\left\|\Delta U_{k-i}^{c}\right\| / \left\|\Delta U_{1}^{c}\right\|}{\sqrt{\left(\lambda_{k-i}/\lambda_{1}\right)^{2} + \left(\left\|\Delta U_{k-i}^{c}\right\| / \left\|\Delta U_{1}^{c}\right\|\right)^{2}}} \qquad (10.3.11)$$
the predicted magnitude of the incremental displacement is obtained as follows:
$$\left\|\Delta U_{k}^{p}\right\| = \frac{\alpha_{k}^{p}\,\left(\lambda_{k}^{p}/\lambda_{1}\right)}{\sqrt{1 - \left(\alpha_{k}^{p}\right)^{2}}}\,\left\|\Delta U_{1}^{c}\right\| \qquad (10.3.12)$$
where $\alpha_k^{p}$ is the predicted slope factor. The neural network is employed to predict the pattern of the direction and the magnitude of the converged solution vectors of the current step using those of the previous steps as input data. For this purpose, the feedforward neural network has been utilized as the predictor [5], which works as follows: (1) Three sets of data $(l_{k-3}, c_{m,k-3}, c_{c,k-3}, \alpha_{k-3})$, $(l_{k-2}, c_{m,k-2}, c_{c,k-2}, \alpha_{k-2})$ and $(l_{k-1}, c_{m,k-1}, c_{c,k-1}, \alpha_{k-1})$ are collected at the previous three iteration steps, respectively, where the normalized arc-length $l_n$ at the step n is defined as
follows:
$$l_n = \sum_{i=1}^{n} \sqrt{\left(\frac{\lambda_i}{\lambda_1}\right)^{2} + \left(\frac{\left\|\Delta U_i^{c}\right\|}{\left\|\Delta U_1^{c}\right\|}\right)^{2}} \qquad (10.3.13)$$
(2) Using the data sets collected in (1), the neural network is trained using the back-propagation algorithm with $l_{k-i}$ as the input and $(c_{m,k-i}, c_{c,k-i}, \alpha_{k-i})$ as the teacher signals. (3) The trained neural network is applied to the prediction of $c_{m,k}^{p}$, $c_{c,k}^{p}$ and $\alpha_{k}^{p}$ for the input $l_k$. The neural network above consists of the input layer with one unit, two hidden layers each with eight units and the output layer with three units (Fig. 10.5). Some numerical tests show that this method can be employed for faster convergence (Table 10.1), and it has been extended to the geometrically-nonlinear dynamic analysis (Figs. 10.6 and 10.7, Table 10.2) [6].
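The following sketch assembles the direction predictor of Eqs. (10.3.6)–(10.3.10) from three previously converged increments; the coefficients c_m and c_c, which the neural network would supply, are passed in as plain numbers here, and the whole routine is an illustration rather than the implementation of [5].

```python
# Sketch of the direction predictor built from Eqs. (10.3.6)-(10.3.10)
import numpy as np

def predict_direction(dU_prev, c_m, c_c):
    """dU_prev: list of the last three converged displacement increments."""
    U_hat = [d / np.linalg.norm(d) for d in dU_prev]           # Eq. (10.3.7)
    U_mean = sum(U_hat) / 3.0                                   # Eq. (10.3.6)
    # complementary part, Eq. (10.3.8), here evaluated with the latest increment
    U_compl = U_hat[0] - (U_mean @ U_hat[0]) / (U_mean @ U_mean) * U_mean
    d = c_m * U_mean + c_c * U_compl                            # Eq. (10.3.10)
    return d / np.linalg.norm(d)

rng = np.random.default_rng(2)
dU_prev = [rng.standard_normal(6) for _ in range(3)]
print(predict_direction(dU_prev, c_m=0.9, c_c=0.1))             # c_m, c_c: NN outputs
```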
Fig. 10.5 Neural network for predictor–corrector method. Reprinted from [5] with permission from Elsevier
Table 10.1 Comparison of computational efficiency of three iterative algorithms for analysis of cylindrical shell
Algorithm / CPU time (s) / Additional time (s) / Total steps / Total iterations
Newton: 44,837 / 0.0 / 561 / 1689
Predictor–corrector: 15,108 / 2.3 / 192 / 564
Riks' continuation: fail to pass the second limit point
Reprinted from [5] with permission from Elsevier (tol1 = 10^−2, tol1 = 10^−4, λ0 = 1, Id = 3; the columns give the CPU time, the additional time for the predictor, the number of load steps, and the number of total iterations)
Fig. 10.6 Rotating slender plate; geometry and loading condition. Reprinted from [6] with permission from Elsevier
10.4 Contact Stiffness Estimation A contact problem is a constrained minimization problem, which is formulated as
$$\Pi(u) \rightarrow \min, \quad \text{subject to } g_j(u) \leq 0 \;\; (j = 1, 2, \cdots, n) \qquad (10.4.1)$$
where $u$ is the displacement vector, $\Pi(u)$ the total potential energy and $g_j(u)$ represents the n non-penetration constraints that are defined as follows:
Fig. 10.7 Time history of displacement in z-direction at measurement point of rotating slender plate. Reprinted from [6] with permission from Elsevier
Table 10.2 Comparison of computation time for rotating slender plate
Case / Total elapsed time (s) / Additional time for the predictor (s)
Without the predictor: 9996.30 / −
With the predictor: 3860.48 / 1.95
Amount of reduction: 61.38%↓
Reprinted from [6] with permission from Elsevier
$$g_j(u)\;\begin{cases} < 0 & \text{(not in contact)} \\ = 0 & \text{(in contact)} \\ > 0 & \text{(in penetration)} \end{cases} \qquad (10.4.2)$$
Based on the augmented Lagrangian method [4], the contact problem is written as the minimization problem of the functional
$$L(u, \lambda, r) = \Pi(u) + \lambda^{T} g(u) + \frac{1}{2}\, r \left[g(u)\right]_{+}^{2} \qquad (10.4.3)$$
where $g(u) = \left(g_1(u), g_2(u), \cdots, g_n(u)\right)^{T}$, $\lambda = \left(\lambda_1, \lambda_2, \cdots, \lambda_n\right)^{T}$ are the Lagrange multipliers and $r$ is the penalty factor. When performing contact analyses using the ANSYS software, an analysis parameter called the contact stiffness is required, which is known to have a significant influence on the analysis accuracy and to have been determined on a trial-and-error basis. On
the other hand, the neural networks are employed to estimate the appropriate value of the contact stiffness [7]. The contact pressure P is defined in ANSYS as
$$P = \begin{cases} 0 & \text{if } g < 0 \\ K_n g + \lambda^{(i+1)} & \text{if } g \geq 0 \end{cases} \qquad (10.4.4)$$
and the Lagrange multipliers are stated as
$$\lambda^{(i+1)} = \begin{cases} \lambda^{(i)} + K_n g & \text{if } |g| > TOLN \\ \lambda^{(i)} & \text{if } |g| < TOLN \end{cases} \qquad (10.4.5)$$
where $K_n$ is the normal contact stiffness, $TOLN$ the penetration limit and $(\,)^{(i)}$ means the value at the i-th iteration step. The contact stiffness $K_n$ above is to be selected by the user. In ANSYS, the value of $K_n$ is determined through such parameters as the Young's modulus, the contact areas and the scale factor, but this determination process of the correct value of $K_n$ is unclear. As it is known that a high value of $K_n$ leads to a small penetration, whereas a too high value may make the stiffness matrix ill-conditioned, the neural networks could be used to solve this issue. Hattori and Serpa have estimated the appropriate value of the normal contact stiffness $K_n$ for 3D contact problems by the feedforward neural network trained using the data obtained from the analysis results of 2D contact problems [7]. The procedure is summarized as follows: (1) Given a 3D contact problem, two 2D simplified models are derived by viewing the original model from two different directions: a 2D mesh obtained from the frontal view and another 2D mesh obtained from the lateral view (Fig. 10.8). Then, a lot of contact analyses are performed using the 2D front-view meshes with various values of the scale factor $k_n$, and the maximum penetration $g_{max}^{2D}$ and the maximum contact pressure variation $P_{max}^{2D}$ are obtained in each analysis, which are converted to the values $g_{max}^{3D}$ and $P_{max}^{3D}$ for the 3D case, respectively, using the results obtained by the analysis of the corresponding lateral-view mesh. Thus, a lot of data pairs ($(g_{max}^{3D}, P_{max}^{3D})$, $k_n$) are collected. (2) The feedforward neural network is trained using the patterns collected above, using $(g_{max}^{3D}, P_{max}^{3D})$ as the input data and $k_n$ as the teacher signal (Fig. 10.9). (3) The trained neural network above can be used to estimate an appropriate value of the scale factor $k_n$. This method is tested for three cases using the feedforward neural network with two hidden layers, showing a good estimation capability of the method for all the cases tested (Fig. 10.10).
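To illustrate the contact update of Eqs. (10.4.4) and (10.4.5) in isolation, the following one-dimensional sketch drives the penetration of a loaded spring against a rigid wall below TOLN by repeatedly updating the multiplier. The numbers are arbitrary and the example is not taken from [7].

```python
# 1-D augmented-Lagrangian contact update, Eqs. (10.4.4)-(10.4.5)
k_spring, f_ext = 1000.0, 50.0     # spring stiffness and external force
K_n, TOLN = 5000.0, 1e-6           # normal contact stiffness and penetration limit
lam = 0.0
for it in range(100):
    u = (f_ext - lam) / (k_spring + K_n)      # equilibrium with penalty and multiplier
    g = u                                     # penetration (rigid wall at u = 0)
    P = K_n * g + lam if g >= 0.0 else 0.0    # contact pressure, Eq. (10.4.4)
    if abs(g) > TOLN:
        lam = lam + K_n * g                   # multiplier update, Eq. (10.4.5)
    else:
        break
print(it, g, P, lam)                          # pressure approaches the applied force
```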
Fig. 10.8 T-structure contact problem; Left: frontal view, Right: lateral view. Reprinted from [7] with permission from Elsevier
Fig. 10.9 Representation of neural network with correction factor. Reprinted from [7] with permission from Elsevier
Fig. 10.10 Neural network training results; Left: Penetration, Right: Contact pressure variation. Reprinted from [7] with permission from Elsevier
References 1. Yoshimura, S., Pyo, C.R., Yagawa, G., Kawai, K.: Finite element analyses of three dimensional fully plastic solutions using quasi-nonsteady algorithm and tetrahedral elements. Comput. Mech. 14, 128–139 (1994) 2. Yagawa, G., Yoshimura, S., Okuda, H., Neural network based parameter optimization for nonlinear finite element analyses. In: Proceedings of the 2nd Japan-US Symposium on Finite Element Methods in Large-Scale Computational Fluid Dynamics (Extended Abstracts), Tokyo, March 14–16, pp. 83–86 (1994) 3. Okuda, H., Yoshimura, S., Yagawa, G., Matsuda, A.: Neural network-based parameter estimation for non-linear finite element analyses. Eng. Comput. 15(1), 103–138 (1998) 4. Fortin, M., Glowinski, R.: Augumented Lagrangian Methods: Applications to Numerical Solutions of Boundary-Value Problems. North-Holland (1993) 5. Kim, J.H., Kim, Y.H.: A predictor-corrector method for structural nonlinear analysis. Comput. Methods Appl. Mech. Eng. 191, 959–974 (2001) 6. Suk, J.W., Kim, J.H., Kim, Y.H.: A predictor algorithm for fast geometrically-nonlinear dynamic analysis. Comput. Methods Appl. Mech. Eng. 192, 2521–2538 (2003) 7. Hattori, G., Serpa, A.L.: Contact stiffness estimation in ANSYS using simplified models and artificial neural networks. Finite Elem. Anal. Des. 97, 43–53 (2015)
Chapter 11
Solvers and Solution Methods
Abstract This chapter discusses the applications of neural networks to solvers or solution processes in numerical methods. Introduced here are the finite element methods using mutually connected neural networks (Sects. 11.1 and 11.2), the structural re-analysis (Sect. 11.3), the simulations of global flexibility and element stiffness (Sect. 11.4), the domain decomposition (Sect. 11.7), the contact search (Sect. 11.9), the physics-informed neural networks (Section 11.10), etc.
11.1 Finite Element Solutions Through Direct Minimization of Energy Functional Consider the Poisson's equation given as
$$-\Delta u = b \quad \text{in } \Omega \qquad (11.1.1)$$
where $\Delta$ is the Laplace operator, $u$ the unknown variable and $b$ the source term. It is known that the solution of Eq. (11.1.1) minimizes the functional
$$J(u) = \int_{\Omega} \left[\frac{1}{2}\left(\nabla u\right)^{2} - bu\right] d\Omega \qquad (11.1.2)$$
Using the finite element discretization, Eq. (11.1.2) is approximated as
$$J(u) = \frac{1}{2}\sum_{i,j} k_{ij} u_i u_j - \sum_{i} b_i u_i \qquad (11.1.3)$$
where $u_i$ and $b_i$ are the nodal values of $u$ and $b$, respectively, and $k_{ij}$ the stiffness coefficient between nodes i and j. The network energy of the mutually connected neural network with the feedback neurons (see Fig. 11.1) is given as follows:
Fig. 11.1 Mutually connected neural network accompanying feedback neurons
$$E(u) = -\frac{1}{2}\sum_{i,j\,(i \neq j)} w_{ij} u_i u_j - \frac{1}{2}\sum_{i} w_{ii} u_i u_i - \frac{1}{2}\sum_{i} w_{i'i} u_{i'} u_i - \sum_{i} \theta_i u_i \qquad (11.1.4)$$
where $w_{ij}$ is the connection weight between the i-th and j-th neurons, $\theta_i$ the bias of the i-th neuron, and the subscripts i and i′ indicate the i-th base neuron and the i′-th feedback neuron, respectively. Here, it is assumed that the following conditions are satisfied:
$$w_{ij} = w_{ji} = -A k_{ij} \qquad (11.1.5)$$
$$\theta_i = A b_i \qquad (11.1.6)$$
$$w_{ii} = (1 - A) k_{ii} \qquad (11.1.7)$$
$$w_{i'i} = -k_{ii} \qquad (11.1.8)$$
where A is a positive parameter. Then, Eq. (11.1.4) is translated into
Fig. 11.2 Flow around circular cylinder: CPU-time versus time step. Reprinted from [1] with permission from Wiley
Table 11.1 Speed-up and parallel efficiency. Reprinted from [1] with permission from Wiley
Number of processors N (= number of domains): 4 / 16 / 32 / 64 / 128 / 256
Speed-up: 3.37 / 14.8 / 30.7 / 58.0 / 116.0 / 179.0
Parallel efficiency: 0.843 / 0.923 / 0.958 / 0.906 / 0.904 / 0.699
$$E(u) = A\left[\frac{1}{2}\sum_{i,j} k_{ij} u_i u_j - \sum_{i} b_i u_i\right] \qquad (11.1.9)$$
Thus, it is concluded that solving the Poisson's equation (11.1.1) is equivalent to minimizing the network energy defined in Eq. (11.1.9). Yagawa and Okuda have proved that the network minimizes Eq. (11.1.9) when A < 2 and applied the method to incompressible flow analyses, reducing the CPU time (Fig. 11.2) with very high parallel efficiency on the Fujitsu AP1000 parallel computer (Table 11.1) [1].
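The equivalence exploited in this section can be checked with a few lines of code: minimizing the discretized functional of Eq. (11.1.3) by plain gradient descent reproduces the solution of Ku = b. The small matrix below is arbitrary, and gradient descent merely stands in for the dynamics of the mutually connected network.

```python
# Direct minimization of J(u) = 0.5 u^T K u - b^T u reproduces K u = b
import numpy as np

K = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])      # symmetric positive-definite "stiffness"
b = np.array([1.0, 2.0, 3.0])

u = np.zeros(3)
for _ in range(2000):
    grad = K @ u - b                    # gradient of the discretized functional
    u -= 0.1 * grad                     # descent step (step size < 2 / max eigenvalue)
print(u, np.linalg.solve(K, b))         # both should agree
```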
11.2 Neurocomputing Model for Elastoplasticity Another solution method of the finite elements by the neural network of Hopfield type has been proposed based on the two-type-variable minimum potential energy principle, where the elastoplastic problem is transformed into a minimization problem, which can be solved by the mutually connected neural network [2]. It is also shown in
Fig. 11.3 Elastoplasticity problem for simple truss. Reprinted from [2] with permission from Elsevier
Fig. 11.4 Neural network used for elastoplasticity problem. Reprinted from [2] with permission from Elsevier
the paper that the neural network for solving the elastoplasticity problem (Fig. 11.3) can be mapped into a dynamic electric circuit (Fig. 11.4), which makes it possible to solve the problem within an elapsed time of circuit time-constant.
11.3 Structural Re-analysis The feedforward neural network has been employed to approximate the mapping between the external loads and the displacements in a truss structure [3]. Figure 11.5
Fig. 11.5 Truss structure
shows a truss structure tested and Fig. 11.6 the corresponding two-layered neural network for the structure, in which the control flags in the external loads assigned at the input layer indicate where the external loads are applied ("1" indicates the external load being applied), and those in the displacements assigned at the output layer show where the structure is supported ("0" indicates a fixed displacement). The units in the output layer are connected to all the units in the input layer, whereas only
Fig. 11.6 Two-layered network
the connections between units whose control flags are assigned to "1" are defined to be active. The displacement is represented by a linear combination of the external loads as follows:
$$u_j = \sum_{i=1}^{n} w_{ji} f_i \qquad (11.3.1)$$
where the subscript j indicates the node number in the structure: u j is the displacement at the node j, f i the external force at the node i, w ji the connection weight between the nodes j and i, and the summation is performed only for values with their control flags are assigned to 1. Once the displacements of all the nodes are calculated, the internal forces are obtained using the geometries and the material properties of the structure. Combined with the prescribed external forces, the equilibrium conditions at the nodes are evaluated, and then the errors are obtained. The active connection weights are updated by using the back-propagation algorithm. When the external loads are added, or some of the current external loads are changed or deleted, or the support conditions are changed, or the combination of these changes occurs, the control flags are changed and the connection weights are re-trained with the present values as the initial ones, resulting in a faster convergence. Figure 11.7 and Table 11.2 show, respectively, some test problems and the results.
11.4 Simulations of Global Flexibility and Element Stiffness We discuss here the global flexibility simulation (GFS) method and the element stiffness simulation (ESS) method developed for the re-analysis [4], both using the feedforward neural networks to simulate the input–output relationship. The governing equation of the finite element method is represented as
$$Ku = f \qquad (11.4.1)$$
which is solved as
$$u = K^{-1} f \qquad (11.4.2)$$
In the GFS method, Eq. (11.4.2) is viewed as a mapping from the external loads to the displacements, which is approximated by the feedforward neural network with the loads as the input and the displacements as the output. The method is not for a single analysis, but for re-analyses with modified external forces. In the ESS method, Eq. (11.4.1) is represented as follows:
Fig. 11.7 Truss structures tested. Reprinted from [3] with permission from Elsevier
Table 11.2 Maximum member forces by neural network and exact analysis (maximum member forces, tension/compression; forces in kN)
State of structure / maximum tensile force: member (exact, neural network) / maximum compressive force: member (exact, neural network)
State 1: tension members 1, 2 (15.1, 14.5); compression member 15 (55.2, 55.2)
State 2: tension members 1, 2 (15.1, 15.8); compression member 15 (55.2, 55.1)
State 3: tension members 38, 39 (16.7, 16.7); compression member 84 (54.6, 54.6)
State 4: tension members 64, 65 (14.4, 14.4); compression member 84 (47.8, 47.8)
Reprinted from [3] with permission from Elsevier
$$\sum_{e=1}^{n} A_e K_e A_e^{T} u - f = 0 \qquad (11.4.3)$$
where ()e means the variable defined for the e-th element and Ae the transform matrix, which defines the relationship between the element and the global displacement vectors. The stiffness of each element is represented by the feedforward neural network and the global stiffness by the sum of element stiffness of each element as shown in Fig. 11.8. The network consists of the input layer, the sub-nets corresponding to the elements and the output layer, where the connections between the input layer and the subnets are represented as AeT , and those between the sub-nets and the output layer as Ae . For approximation of the element stiffness, the neural network of two layers is used for the known elements, where the components of the element stiffness matrix are used as the connection weights, while that of three layers for the unknown elements, in which the connection weights are determined by the learning and Eq. (11.4.3) is solved by an iteration method such as the conjugate-gradient. The proposed method has been successfully tested for a sample problem (Figs. 11.9 and 11.10).
Fig. 11.8 ESS method
Fig. 11.9 FEM mesh for sample analysis. Reprinted from [4] with permission from Elsevier
Fig. 11.10 Neural network representation of FEM model. Reprinted from [4] with permission from Elsevier
Fig. 11.11 State diagram for learning problem in neural network. Reprinted from [5] with permission from Wiley
11.5 Solutions Based on Variational Principle Many engineering problems are regarded as finding a function that minimizes or maximizes the value of a specified functional. As is well known, the feedforward neural network has the ability to approximate any continuous non-linear function within a prescribed error. This, in turn, means that the neural network can be seen as a function, where the neural networks with the same structure and the different connection weights span a parameterized function space. The learning process of the neural network is formulated in terms of the minimization of an error function of the connection weights so as to fit the output of the neural network to the corresponding teacher signal. In contrast to the above, Lopez et al. have proposed a variational formulation for the feed forward neural networks, where the learning process is formulated in terms of the minimization of a specified functional (Fig. 11.11) [5]. In the paper, several problems in shape design, optimal control and inverse problems are tested, showing the reduction of multimodality. An extension for the feedforward neural network, which is suitable for variational problems, is also developed [6].
11.6 Boundary Conditions The wave propagation in a continuum is one of the most important themes in the computational mechanics. The computational power needed for its simulation is rather high, depending on both the area of the analysis domain and the time step
Fig. 11.12 Analysis domain with a defect
size. The larger the computational area is, the more memory and computing time are required. Figures 11.12a, b show, respectively, the whole analysis domain with a defect and its local division near the defect. With the mesh (b), one can perform the wave propagation simulation much faster than with (a). The results obtained with the domain (b) are, however, different from those with the domain (a), because of the reflection waves from the newly generated boundaries. Several researches have addressed the nonreflecting boundary or the transmitting boundary [7]. The method of the nonreflecting boundary using the neural networks is also developed [8] (see Fig. 11.13), where an original domain Fig. 11.13a is truncated to a local domain, Fig. 11.13b with the nonreflecting boundary using the neural networks. The neural network provides to each node on the nonreflecting boundary the value of the internal force of the truncated part from the original domain. More specifically, to simulate the nonreflecting boundary, the proposed neural network is trained to output the dynamic reaction force at each boundary node from the input parameters consisting of the displacements of the corresponding three nodes: the boundary node itself and its nearest and second nearest nodes located in the line perpendicular to the boundary. The neural network proposed in the paper by Ziemianski above consists of four layers with 12 and 8 units in the first and the second hidden layers, respectively.
Fig. 11.13 Nonreflecting boundary using feedforward neural network
11.7 Hybrid Graph-Neural Method for Domain Decomposition A large analysis domain is divided into a number of subdomains in the domain decomposition method (DDM) for the parallel processing. The computational efficiency of the parallel processing (see Appendix A2) depends on the quality of the result of the domain decomposition. For example, it is often required that the number of nodes at the subdomain boundaries is minimized. A hybrid method for the domain decomposition has been proposed [9], where the graph technique and the neural network are, respectively, utilized for the partial decomposition and the bisection. First, the mesh bisection, the division of a mesh into two partitions A and B, is translated into the minimization of energy to be solved by the Hopfield neural network, which is given as

E(S) = -\frac{1}{2} \sum_{i,j\,(i \neq j)} T_{ij} s_i s_j + \frac{\alpha}{2} \left( \sum_i s_i \right)^2    (11.7.1)

where

s_i = \begin{cases} 1 & \text{(if node } i \text{ is assigned to the partition A)} \\ -1 & \text{(if node } i \text{ is assigned to the partition B)} \end{cases}    (11.7.2)
T_{ij} = \begin{cases} 1 & \text{(if a pair of nodes are connected by an edge)} \\ 0 & \text{(otherwise)} \end{cases}    (11.7.3)
Here, T_{ij} is the connection weight between the i-th and the j-th nodes and T_{ij} = T_{ji} is assumed, s_i the state of the i-th node, I_i the bias of the i-th node, S = (s_1, s_2, \ldots, s_{N-1}, s_N) and \alpha the imbalance parameter to control the bisection ratio. The above minimization problem is then solved by using the mean field annealing [10], where the mean field equation for the bisection is given by

s_i = \tanh\!\left( \frac{\sum_{j=1}^{N} (T_{ij} - \alpha)\, s_j}{Temp} \right)    (11.7.4)
where Temp is the temperature. All values of s_i are initialized to small random numbers, and the temperature is gradually lowered during the optimization process. At a sufficiently low temperature, the state of each neuron reaches a value close to +1 or −1, indicating the fulfillment of the bisection criteria. The above method is successfully applied to three test cases, each of which has several hundreds to thousands of nodes [9]. Among the test cases, a 648-node mesh with three openings is divided into two balanced subdomains: one has 335 nodes and the other 337 nodes, both including 24 interface nodes (Fig. 11.14).
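The following is a minimal sketch of the mean field annealing of Eqs. (11.7.1)–(11.7.4): the states are relaxed repeatedly while the temperature is lowered, and the sign of each converged state assigns the node to partition A or B. The annealing schedule, the value of the imbalance parameter and the small ring-graph example are illustrative assumptions, not the settings of [9].

```python
import numpy as np

def bisect_mean_field(T, alpha=0.5, temp0=2.0, temp_min=0.05, cooling=0.95,
                      sweeps=50, seed=0):
    """Graph bisection by mean field annealing (Eqs. 11.7.1-11.7.4).
    T is the symmetric 0/1 adjacency matrix of the element clique graph."""
    rng = np.random.default_rng(seed)
    n = T.shape[0]
    s = 0.01 * rng.standard_normal(n)        # small random initial states
    temp = temp0
    while temp > temp_min:
        for _ in range(sweeps):
            for i in range(n):               # asynchronous mean field update
                field = np.dot(T[i] - alpha, s)
                s[i] = np.tanh(field / temp)
        temp *= cooling                      # gradually lower the temperature
    return s > 0.0                           # True -> partition A, False -> B

# Example: bisect a ring of 8 nodes.
T = np.zeros((8, 8))
for i in range(8):
    T[i, (i + 1) % 8] = T[(i + 1) % 8, i] = 1.0
part_A = bisect_mean_field(T)
print(part_A.sum(), "nodes in partition A")
```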
11.8 Wavefront Reduction The node numbering in the finite element analysis is very important because it has a strong influence on the efficiency of the solution process of the simultaneous equations. Several studies have addressed this problem [11–15]. The nodal ordering method [14, 15] for the frontal solver [16] has been implemented in the feedforward neural network with the graph theory [17]. Sloan's algorithm [14, 15] is summarized as follows: (1) The element clique graph (ECG) is defined for a given mesh, where the nodes in the ECG are the same as those in the corresponding finite element mesh, and the nodes in an element are interconnected with each other as shown in Fig. 11.15. (2) All the nodes are assigned non-active. (3) The root node s and the end node e are, respectively, defined [18], where the root node s is numbered 1. (4) Nodes already numbered are assigned post-active, and nodes that are adjacent to a post-active node and not post-active themselves are assigned active. For every active node, a value called priority is calculated. (5) The node with the highest priority is labelled with the next node number. (6) (4)–(5) are repeated until all the nodes are numbered.
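A minimal sketch of this priority-driven numbering loop is given below. The priority used here, a linear combination of the distance to the end node and the node degree with placeholder coefficients, only stands in for Sloan's actual priority function, and the tiny element clique graph is likewise illustrative.

```python
import numpy as np
from collections import deque

def bfs_distances(adj, source):
    """Graph distances from a source node by breadth-first search."""
    dist = {source: 0}
    q = deque([source])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return dist

def number_nodes(adj, root, end, w1=1.0, w2=2.0):
    """Priority-driven renumbering in the spirit of Sloan's algorithm: the root
    gets number 1, then the active node with the highest priority is numbered
    next until all nodes are numbered.  The priority is an illustrative linear
    combination of the distance to the end node and the node degree."""
    dist_to_end = bfs_distances(adj, end)
    numbering = {root: 1}
    active = set(adj[root]) - {root}
    while len(numbering) < len(adj):
        best = max(active, key=lambda v: w1 * dist_to_end[v] - w2 * len(adj[v]))
        numbering[best] = len(numbering) + 1      # label with the next number
        active.discard(best)
        active |= {w for w in adj[best] if w not in numbering}
    return numbering

# Element clique graph of two quadrilaterals sharing an edge (nodes 0..5).
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3, 4, 5}, 3: {0, 1, 2, 4, 5},
       4: {2, 3, 5}, 5: {2, 3, 4}}
print(number_nodes(adj, root=0, end=5))
```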
Fig. 11.14 a Finite element model with three openings. b Middle subdomain. c Subdomain bisected by ANN. d Final partitioning. Reprinted from [9] with permission from Elsevier
Fig. 11.15 FEM mesh and its ECG: (a) FEM mesh, (b) element clique graph
Table 11.3 Comparison of results by Sloan and neural network

Example              1        2       3
No. of nodes         5551     2351    1248
No. of elements      5400     2200    1152
No. of members       21,750   8950    4704
Frms by Sloan        75.52    65.6    30.8
Frms by neural net   71.14    61.37   28.4

Reprinted from [17] with permission from Wiley
Though the priority above is calculated using a linear function of two graph parameters, appropriate coefficients used there should be determined depending on the problem to be solved. Kaveh et al. have evaluated the priority using the feedforward neural network [17], where, for each node, six parameters which represent the graph-theoretical relationship between the node and its neighboring nodes are used as input. The neural network employed has three layers: the input layer with six units, the hidden layer with two units and the output layer with one unit. An improvement is seen in the root-mean-square wavefront for the three problems tested (Table 11.3).
11.9 Contact Search The dynamic contact-impact analyses based on the finite element method, such as car crash simulations for improved safety and drop impact simulations of mobile devices for improved durability, have played an important role in modern industry [19, 20]. The dynamic contact-impact analysis usually employs the explicit time integration scheme and its computational efficiency depends on that of the contact search to identify the contacting points, and the node-segment type contact algorithm is popular for the dynamic contact-impact analyses [21, 22]. In this algorithm, one of the contacting surfaces facing each other is named the master surface and the other the slave surface, where a surface is represented as a set of polygons in the finite element method and each polygon is a face of an element exposed on the surface and called a segment. It is noted that the contact search in the dynamic analyses usually consists of two consecutive processes: the global search and the local search. The former process finds pairs of a master segment, or a segment on the master surface, and a slave node, or a node on the slave surface, that are considered to be in contact or in proximity. The bounding boxes are often used in the global search [23]. Once the contacting pairs of a slave node and the corresponding master segment are determined by the global contact search, the local contact search identifies the local coordinates of the contact point, which is the projection of the penetrating slave node onto the corresponding master segment (see Fig. 11.16).
Fig. 11.16 Contact between slave node and master segment in FEM-based dynamic contact-impact analysis
The conventional local contact search procedure in the node-segment algorithm is summarized as follows: (1) For each slave node Ps, the neighboring master node Pm and the contacting segment are determined through the global search. Figure 11.17 shows a configuration of the master segment and the slave node. (2) The contact point H, i.e. the projection of Ps onto the segment, is identified by solving the set of equations derived from the orthogonality condition with Newton's method. (3) The signed distance between Ps and H is calculated to find whether the slave node and the corresponding master segment are in contact: a positive distance indicates that they are not in contact, and vice versa.
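A minimal sketch of step (2) is given below for a four-node master segment described by bilinear shape functions: the local coordinates of the projection H are found by a Newton-type (here Gauss-Newton) iteration on the orthogonality conditions, and the signed distance of step (3) follows from the segment normal. The shape-function parameterization, the tolerances and the example geometry are assumptions for illustration.

```python
import numpy as np

def project_on_segment(corners, p, tol=1e-10, max_iter=20):
    """Local contact search for a four-node (bilinear) master segment: find
    (xi, eta) such that the vector from the surface point x(xi, eta) to the
    slave node p is orthogonal to both surface tangents."""
    corners = np.asarray(corners, dtype=float)       # (4, 3) segment node coordinates
    p = np.asarray(p, dtype=float)
    xi = eta = 0.0                                   # start at the segment centre
    for _ in range(max_iter):
        N = 0.25 * np.array([(1 - xi) * (1 - eta), (1 + xi) * (1 - eta),
                             (1 + xi) * (1 + eta), (1 - xi) * (1 + eta)])
        dN_dxi = 0.25 * np.array([-(1 - eta), (1 - eta), (1 + eta), -(1 + eta)])
        dN_deta = 0.25 * np.array([-(1 - xi), -(1 + xi), (1 + xi), (1 - xi)])
        x = N @ corners                              # point on the segment
        t1 = dN_dxi @ corners                        # tangent in xi
        t2 = dN_deta @ corners                       # tangent in eta
        r = np.array([t1 @ (p - x), t2 @ (p - x)])   # orthogonality residual
        if np.linalg.norm(r) < tol:
            break
        # Gauss-Newton approximation of the Jacobian (second derivatives dropped)
        J = -np.array([[t1 @ t1, t1 @ t2], [t2 @ t1, t2 @ t2]])
        dxi, deta = np.linalg.solve(J, -r)
        xi += dxi
        eta += deta
    normal = np.cross(t1, t2)
    gap = (p - x) @ normal / np.linalg.norm(normal)  # signed distance to the segment
    return xi, eta, gap

# A unit square segment in the z = 0 plane and a slave node slightly below it.
seg = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
print(project_on_segment(seg, [0.3, 0.6, -0.02]))
```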
Fig. 11.17 Local contact search in node-segment algorithm based on projection of slave node onto corresponding master segment
If feedforward neural networks are used for the iterative solution process in (2) above, it is easily expected that one can accelerate the local contact search. The procedure for performing the node-segment type local contact search by feedforward neural networks is summarized as follows: (1) Data preparation phase: For a lot of pairs of a master segment and a slave node, the contact state is calculated. The contact state {state of contact} of the pair, such as the location of the contact point, is evaluated using the iterative solution method. In this way, a lot of data pairs of ({s-parameters} of the master segment and the coordinates of the slave node, {state of contact}) are collected, where a set of parameters defining a segment, including the coordinate values of the nodes in the segment, is denoted as {s-parameters}. (2) Training phase: Using the data pairs collected in the data preparation phase, a neural network is trained to output {state of contact} taking the corresponding {s-parameters} and the coordinates of the slave node as input. (3) Application phase: The trained neural network constructed in the training phase above is implemented in the local contact search process of the finite element analysis code. Given probable contacting pairs prepared through the global contact search, the trained neural network promptly outputs the estimated contact state {state of contact}. In [24], this method is tested for the four-node linear segment in the FEM. The input vector consists of the coordinates of the nodes in the master segment and those of the slave node, and the output vector consists of the coordinates of the contact point in the local parameter coordinate system. The neural network consists of three layers (one hidden layer) with 8 units in the hidden layer. The local coordinate values of contact points are accurately identified by the neural network (Fig. 11.18). This method has been extended and applied to the contact search for NURBS-based curved surfaces [25].
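The three phases above can be mocked up in a few lines. The sketch below generates segment/slave-node pairs, labels them with the local coordinates of the contact point, and fits a small regressor with eight hidden units as in [24]; the use of scikit-learn, the random geometry and the way the reference labels are produced (the generating parametric point rather than a full iterative projection) are illustrative assumptions, not the setup of [24].

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

def random_pair():
    """One training pattern: (segment + slave node coordinates, local contact point)."""
    seg = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
    seg += 0.1 * rng.standard_normal(seg.shape)          # perturbed segment geometry
    xi, eta = rng.uniform(-0.9, 0.9, size=2)
    N = 0.25 * np.array([(1 - xi) * (1 - eta), (1 + xi) * (1 - eta),
                         (1 + xi) * (1 + eta), (1 - xi) * (1 + eta)])
    dN_dxi = 0.25 * np.array([-(1 - eta), (1 - eta), (1 + eta), -(1 + eta)])
    dN_deta = 0.25 * np.array([-(1 - xi), -(1 + xi), (1 + xi), (1 - xi)])
    n = np.cross(dN_dxi @ seg, dN_deta @ seg)
    n /= np.linalg.norm(n)
    # slave node placed a small distance along the normal; the generating (xi, eta)
    # serves here as an approximate stand-in for the iterative reference solution
    p = N @ seg + rng.uniform(-0.05, 0.05) * n
    return np.concatenate([seg.ravel(), p]), np.array([xi, eta])

# (1) Data preparation phase
X, Y = zip(*(random_pair() for _ in range(2000)))
X, Y = np.array(X), np.array(Y)

# (2) Training phase: 15 inputs ({s-parameters} and slave node) -> (xi, eta)
net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
net.fit(X, Y)

# (3) Application phase: the trained network replaces the Newton iteration
print("predicted:", net.predict(X[:1])[0], " reference:", Y[0])
```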
Fig. 11.18 Accuracy of local contact search (histogram of the number of patterns versus the error in the local coordinate) [24]
11.10 Physics-Informed Neural Networks A kind of regularization (see Sect. 2.3) is proposed for solving the nonlinear partial differential equations, which is called the physics-informed neural network [26]. Consider a partial differential equation as

\frac{\partial u(x, t)}{\partial t} + N[u] = 0, \quad x \in \Omega, \; t \in [0, T]    (11.10.1)

where u(x, t) is the unknown, N[\,] the nonlinear differential operator, and \Omega a subset of R^D. Assuming a data-driven solution of u(x, t) by the neural network, it usually results in minimizing the error function E_u as follows:

E_u = \frac{1}{n_u} \sum_{i=1}^{n_u} \left| u\!\left(x_u^i, t_u^i\right) - u^i \right|^2    (11.10.2)
where (x_u^i, t_u^i, u^i) is the initial and boundary training data of u(x, t), and n_u the number of training data. In the physics-informed neural network, the error function E to be minimized has an additional term as follows:

E = E_u + E_N = E_u + \frac{1}{n_N} \sum_{i=1}^{n_N} \left| \frac{\partial u\!\left(x_N^i, t_N^i\right)}{\partial t} + N[u]\!\left(x_N^i, t_N^i\right) \right|^2    (11.10.3)
where (x_N^i, t_N^i) is a collocation point and n_N the number of collocation points. This method is considered to be equivalent to having two neural networks that share parameters: one for E_u and the other for E_N. As the latter is derived by applying the chain rule for differentiation to the former, the latter can use the same parameters as those of the former, where the differentiation can be easily performed by the automatic differentiation capability implemented in some deep learning toolkit libraries such as TensorFlow [27]. It is claimed in [26] that the physics-informed networks have a key feature that they can be effectively trained using small data sets, and they are tested on the one-dimensional Burgers equation. These neural networks have been employed to solve the Euler equations modelling the high-speed aerodynamic flows [28], to predict the arterial blood pressure from the MRI data [29], to construct a surrogate model for the fluid flows [30], and to solve the differential equations using the reinforcement learning [31].
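The composite loss of Eq. (11.10.3) can be sketched as follows for the one-dimensional Burgers equation mentioned above, u_t + u u_x − ν u_xx = 0. Central finite differences stand in here for the automatic differentiation of [27], and the tiny network, the viscosity value and the training data are illustrative assumptions.

```python
import numpy as np

def net_u(params, x, t):
    """Tiny feedforward surrogate u(x, t; params) with one tanh hidden layer."""
    W1, b1, W2, b2 = params
    h = np.tanh(np.outer(x, W1[0]) + np.outer(t, W1[1]) + b1)
    return h @ W2 + b2

def pinn_loss(params, xu, tu, u_data, xN, tN, nu=0.01 / np.pi, eps=1e-4):
    """E = E_u + E_N of Eq. (11.10.3) for Burgers' equation
    u_t + u u_x - nu u_xx = 0, with finite differences standing in for
    automatic differentiation."""
    # data misfit E_u at the initial/boundary training points
    Eu = np.mean((net_u(params, xu, tu) - u_data) ** 2)
    # PDE residual E_N at the collocation points
    u = net_u(params, xN, tN)
    ut = (net_u(params, xN, tN + eps) - net_u(params, xN, tN - eps)) / (2 * eps)
    ux = (net_u(params, xN + eps, tN) - net_u(params, xN - eps, tN)) / (2 * eps)
    uxx = (net_u(params, xN + eps, tN) - 2 * u + net_u(params, xN - eps, tN)) / eps**2
    EN = np.mean((ut + u * ux - nu * uxx) ** 2)
    return Eu + EN

# Illustrative data: initial condition u(x, 0) = -sin(pi x), random collocation points.
rng = np.random.default_rng(0)
H = 10
params = [0.5 * rng.standard_normal((2, H)), np.zeros(H),
          0.5 * rng.standard_normal(H), 0.0]
xu = np.linspace(-1, 1, 50); tu = np.zeros(50); u_data = -np.sin(np.pi * xu)
xN = rng.uniform(-1, 1, 200); tN = rng.uniform(0, 1, 200)
print("total loss E = E_u + E_N:", pinn_loss(params, xu, tu, u_data, xN, tN))
```

In practice the total loss would be minimized with respect to the network parameters by a gradient-based optimizer, which is exactly where the automatic differentiation of the toolkit pays off.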
11.11 Dynamic Analysis with Explicit Time Integration Scheme The neural networks are employed to mitigate the limit of the explicit dynamics [32]. To discuss the method, we take here the standard elastodynamics equation as

[M]\{\ddot{u}\} + [D]\{\dot{u}\} + [K(\{u\})]\{u\} = \{f_e\}    (11.11.1)
where \{\ddot{u}\}, \{\dot{u}\}, and \{u\} are the nodal acceleration, the velocity, and the displacement, respectively, [M] the diagonalized lumped mass matrix, [D] the damping matrix, [K(\{u\})] the displacement-dependent stiffness matrix, and \{f_e\} the nodal external force vector. Here, [D] = \mu[M], or the Rayleigh mass damping, is assumed. Discretizing Eq. (11.11.1) in time using the central-difference approximations with the time step \Delta t as

\{\dot{u}\}_t = \frac{\{u\}_{t+\Delta t} - \{u\}_{t-\Delta t}}{2\Delta t}    (11.11.2)

\{\ddot{u}\}_t = \frac{\{u\}_{t+\Delta t} - 2\{u\}_t + \{u\}_{t-\Delta t}}{(\Delta t)^2}    (11.11.3)
where \{u\}_t is the displacement vector at the time t, we have the iteration rule as follows:

\{u\}_{t+\Delta t} = \alpha [M]^{-1} \{f_T\}_t + \beta \{u\}_t + \gamma \{u\}_{t-\Delta t}    (11.11.4)

where

\alpha = \frac{2(\Delta t)^2}{2 + \mu \Delta t}    (11.11.5)

\beta = \frac{4}{2 + \mu \Delta t}    (11.11.6)

\gamma = -\frac{2 - \mu \Delta t}{2 + \mu \Delta t}    (11.11.7)

\{f_T\} = \{f_e\} - [K(\{u\})]\{u\}    (11.11.8)
As is well known, the explicit time integration scheme is conditionally stable, and \Delta t must be smaller than the critical time step according to the Courant–Friedrichs–Lewy condition, which is the major drawback in the computation with the explicit dynamics. To solve this issue, the neural network has been employed so that the use of a time step larger than the critical one is possible [32], which is summarized as follows. First, the neural network is trained to predict an effective acceleration \{\ddot{u}_e\}_t
for the large time step k\Delta t (k > 1) from the system state at the current time, where \{\ddot{u}_e\}_t is written as follows:

\{\ddot{u}_e\}_t = \frac{\{u\}_{t+k\Delta t} - 2\{u\}_t + \{u\}_{t-k\Delta t}}{(k\Delta t)^2}    (11.11.9)

From this equation, the new iteration formula is obtained as

\{u\}_{t+k\Delta t} = \{\ddot{u}_e\}_t (k\Delta t)^2 + 2\{u\}_t - \{u\}_{t-k\Delta t}    (11.11.10)
where \{\ddot{u}_e\}_t is predicted by the neural network from the current system state including the displacement, the velocity, and the instantaneous acceleration [M]^{-1}\{f_T\}_t. It is noted that the neural network is employed node-wise, i.e. the trained neural network predicts the three components of the effective acceleration vector of a node from the nine values of the displacement, the velocity, and the acceleration of the node at the current state. In the paper, using a seven-layer feedforward neural network, the total Lagrangian explicit dynamics simulation of a soft tissue deformation is successfully performed with a time step larger than the critical one.
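The node-wise update of Eq. (11.11.10) can be sketched as below: for every node, the nine current state values are fed to the trained network, and the returned effective acceleration advances the displacement over the large step kΔt. The predict() interface of the net object, the array shapes and the commented time loop are assumptions for illustration, not the implementation of [32].

```python
import numpy as np

def advance_large_step(net, u_t, u_prev, v_t, f_T, m, dt, k):
    """One update of Eq. (11.11.10): the trained network predicts the effective
    acceleration of each node from its current displacement, velocity and
    instantaneous acceleration (nine values per node in 3D), and the
    displacement is advanced by the large time step k*dt (k > 1)."""
    a_inst = f_T / m[:, None]                 # instantaneous acceleration [M]^-1 {f_T}_t
    X = np.hstack([u_t, v_t, a_inst])         # node-wise input: 9 values per node
    a_eff = net.predict(X)                    # node-wise effective acceleration (3 values)
    u_next = a_eff * (k * dt) ** 2 + 2.0 * u_t - u_prev
    return u_next

# Schematic use inside a time loop (arrays of shape (n_nodes, 3)):
#   f_T = f_ext - internal_forces(u_t)                     # Eq. (11.11.8)
#   u_next = advance_large_step(net, u_t, u_prev, v_t, f_T, m, dt, k=10)
#   v_t = (u_next - u_prev) / (2 * k * dt);  u_prev, u_t = u_t, u_next
```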
11.12 Reduced Order Model for Improvement of Solutions Using Coarse Mesh A finite element formulation for the stress analysis of a solid with a coarse mesh is given as follows:

A_i^c U_i^c = F_i^c    (11.12.1)
where A_i^c is the m × m global stiffness matrix, U_i^c the m-dimensional displacement vector, and F_i^c the m-dimensional force vector, m the degree of freedom of the system, the superscript c means the coarse mesh, and the subscript i the identifications of boundary conditions and external forces. On the other hand, using the fine mesh, the same problem is written as follows:

A_i^f U_i^f = F_i^f    (11.12.2)
where A_i^f is the M × M global stiffness matrix, U_i^f the M-dimensional displacement vector, F_i^f the M-dimensional force vector, M the degree of freedom of the system (M ≫ m), and the superscript f means the fine mesh. The projection of U_i^f to the coarse mesh, U_i^{cf}, which is considered to be much more accurate than U_i^c, will be the solution of the following equation:

A_i^c U_i^{cf} = F_i^c + D_i    (11.12.3)
where D_i is the m-dimensional correction vector. If D_i is obtained as a function of U_i^{cf}, i.e. D_i = g(U_i^{cf}), one can achieve the accurate solution U_i^{cf} by solving the following equation using an iterative method:

A_i^c U_i^{cf} = F_i^c + g(U_i^{cf})    (11.12.4)
This is regarded as a reduced order model of Eq. (11.12.2), and the function g() can be simulated by the neural network [33], where the feedforward neural network is trained to simulate the mapping g using a lot of sample data pairs (U_i^{cf}, D_i) obtained from the two analyses: one using a coarse mesh and the other a fine mesh. Though solving Eq. (11.12.4) of small scale gives as accurate solutions as solving Eq. (11.12.2) of much larger scale, one must use an iterative method to solve the former. Hence, this method is usually applied to non-linear or transient problems that are solved iteratively in nature, and the equation to be solved is given as

A_i^c U_i^{cf} = F_i^c + g(U_{i-1}^{cf})    (11.12.5)
where i means the identifications of boundary conditions and external forces, as well as the non-linear iteration count or the time step count. This method has been applied to a dynamic solid mechanics simulation with material non-linearity, a flow problem in the nearly incompressible regime, and a fluid-structure interaction problem.
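The fixed-point use of Eq. (11.12.4) can be sketched as below: the coarse-mesh system is re-solved with the network-predicted correction until the solution stops changing. The regressor interface, the convergence test and the two-degree-of-freedom toy system are assumptions for illustration, not the formulation of [33].

```python
import numpy as np

def solve_corrected(A_c, F_c, g, tol=1e-8, max_iter=100):
    """Fixed-point iteration for Eq. (11.12.4): A_c U = F_c + g(U), where g is a
    trained regressor returning the m-dimensional correction vector D from the
    current coarse-mesh solution U (any object with a predict() method)."""
    U = np.linalg.solve(A_c, F_c)                 # plain coarse-mesh solution as start
    for _ in range(max_iter):
        D = g.predict(U[None, :])[0]              # network correction, cf. Eq. (11.12.3)
        U_new = np.linalg.solve(A_c, F_c + D)
        if np.linalg.norm(U_new - U) < tol * (1.0 + np.linalg.norm(U)):
            return U_new
        U = U_new
    return U

# Toy stand-in: a "network" whose correction nudges the solution towards a
# known fine-mesh reference, just to exercise the iteration.
class ToyCorrection:
    def __init__(self, A_c, F_c, U_ref):
        self.D_exact = A_c @ U_ref - F_c          # correction that reproduces U_ref
    def predict(self, U_batch):
        return np.tile(self.D_exact, (len(U_batch), 1))

A_c = np.array([[4.0, -1.0], [-1.0, 3.0]])
F_c = np.array([1.0, 2.0])
U_ref = np.array([0.9, 1.1])
print(solve_corrected(A_c, F_c, ToyCorrection(A_c, F_c, U_ref)))
```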
References 1. Yagawa, G., Okuda, H.: Finite element solutions with feedback network mechanism through direct minimization of energy functional. Int. J. Numer. Meth. Eng. 39, 867–883 (1996) 2. Daoheng, S., Qiao, H., Hao, X.: A neurocomputing model for the elastoplasticity. Comput. Methods Appl. Mech. Eng. 182, 177–186 (2000) 3. Jenkins, W.M.: A neural network for structural re-analysis. Comput. Struct. 72, 687–698 (1999) 4. Li, S.: Global flexibility simulation and element stiffness simulation in finite element analysis with neural network. Comput. Methods Appl. Mech. Eng. 186, 101–108 (2000) 5. Lopez, R., Balsa-Canto, E., Oñate, E.: Neural networks for variational problems in engineering. Int. J. Numer. Meth. Eng. 75, 1341–1360 (2008) 6. López, R., Oñate, E.: An extended class of multilayer perceptron. Neurocomputing 71(13–15), 2538–2543 7. Harari, I., Grosh, K., Hughes, T.J.R., Malhotra, M., Pinsky, P.M., Stewart, J.R., Thompson, L.L.: Recent developments in finite element methods for structural acoustics. Arch. Comput. Methods Eng. 3(2–3), 131–309 (1996) 8. Ziemianski, L.: Hybrid neural network/finite element modelling of wave propagation in infinite domains. Comput. Struct. 81, 1099–1109 (2003) 9. Kaveh, A., Bahreininejad, A., Mostafaei, H.: A hybrid graph-neural method for domain decomposition. Comput. Struct. 70, 667–674 (1999) 10. Peterson, C., Anderson, J.R.: A mean field theory learning algorithm for neural networks. Complex Syst. 1, 995–1019 (1987)
11. Cuthill, E., McKee, J.: Reducing the bandwidth of sparse symmetric matrices. ACM Publ. P-69, pp. 157–172 (1969) 12. Liu, J.W.H., Sherman, A.H.: Comparative analysis of the Cuthill-McKee and Reverse CuthillMcKee ordering algorithms for sparse matrices. SIAM J. Numer. Anal. 13, 198–213 (1976) 13. George, J.A., McIntyre, D.P.: On the Application of the minimum degree algorithm to finite element problems. SIAM J. Numer. Anal. 15, 90–112 (1978) 14. Sloan, S.W.: An algorithm for profile and wavefront reduction for sparse matrices. Int. J. Numer. Meth. Eng. 23, 239–251 (1986) 15. Sloan, S.W.: A Fortran program for profile and wavefront reduction. Int. J. Numer. Meth. Eng. 28, 2651–2679 (1989) 16. Irons, B.M.: A frontal solution scheme for finite element analysis. Int. J. Numer. Meth. Eng. 2, 5–32 (1970) 17. Kaveh, A., Rahimi Bondarabady, H.A.: Wavefront reduction using graphs, neural networks and genetic algorithm. Int. J. Numer. Meth. Eng. 60, 1803–1815 (2004) 18. Duff, I.S., Reid, J.K., Scott, J.A.: The use of profile reduction algorithms with a frontal code. Int. J. Numer. Meth. Eng. 28, 2555–2568 (1989) 19. Zhong, Z.H.: Finite Element Procedures for Contact-Impact Problems. Oxford U.P. (1993) 20. Wriggers, P.: Computational Contact Mechanics. Wiley, Hoboken (2002) 21. Hallquist, J.O., Goudreau, G.L., Benson, D.J.: Sliding interfaces with contact-impact in largescale Lagrangian computations. Comput. Methods Appl. Mech. Eng. 51, 107–137 (1985) 22. Benson, D.J., Hallquist, J.O.: A single surface contact algorithm for the post-buckling analysis of shell structures. Comput. Methods Appl. Mech. Eng. 78, 141–163 (1990) 23. Ericson, C.: Real-Time Collision Detection. Morgan Kaufmann, Burlington (2005) 24. Oishi, A., Yoshimura, S.: A new local contact search method using a multi-layer neural network. Comput. Model. Eng. Sci. 21(2), 93–103 (2007) 25. Oishi, A., Yagawa, G.: A surface-to-surface contact search method enhanced by deep learning. Comput. Mech. 65, 1125–1147 (2020) 26. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019) 27. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mane, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viegas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: Large-scale machine learning on heterogeneous distributed systems (2016). arXiv:1603.04467 28. Mao, Z., Jagtap, A.D., Karniadakis, G.E.: Physics-informed neural networks for high-speed flows. Comput. Methods Appl. Mech. Eng. 360, 112789 (2020) 29. Kissas, G., Yang, Y., Hwuang, E., Witschey, W.R., Detre, J.A., Perdikaris, P.: Machine learning in cardiovascular flows modeling: predicting arterial blood pressure from non-invasive 4D flow MRI data using physics-informed neural networks. Comput. Methods Appl. Mech. Eng. 358, 112623 (2020) 30. Sun, L., Gao, H., Pan, S., Wang, J.-X.: Surrogate modeling for fluid flows based on physicsconstrained deep learning without simulation data. Comput. Methods Appl. Mech. Eng. 361, 112732 (2020) 31. 
Wei, S., Jin, X., Li, H.: General solutions for nonlinear differential equations: a rule-based self-learning approach using deep reinforcement learning. Comput. Mech. 64, 1361–1374 (2019) 32. Meister, F., Passerini, T., Mihalef, V., Tuysuzoglu, A., Maier, A., Mansi, T.: Deep learning acceleration of Total Lagrangian Explicit Dynamics for soft tissue mechanics. Comput. Methods Appl. Mech. Eng. 358, 112628 (2020) 33. Baiges, J., Codina, R., Castañar, I., Castillo, E.: A finite element reduced-order model based on adaptive mesh refinement and artificial neural networks. Int. J. Numer. Meth. Eng. 121, 588–601 (2020)
Chapter 12
Structural Identification
Abstract Chapter 12 focuses on applications of neural networks to structural identifications including nondestructive evaluation. Sections 12.1 and 12.2 describe the applications of neural networks to nondestructive evaluation, Sect. 12.3 the estimation of stable crack growth, Sect. 12.4 the prediction of failure mechanism in power plant components, Sect. 12.5 the identification of beam structure, and Sect. 12.6 the prediction of beam-mass vibration. Various studies categorized into the above applications are reviewed in Sect. 12.7.
12.1 Identification of Defects with Laser Ultrasonics Such parameters as the size and the location of a defect have been identified from the dynamic responses measured at several points on the solid surface [1]. Figure 12.1 shows the schematic of the problem, where the defect parameters, i.e. the defect length L, the central position of the defect (X, Y), and the inclination angle A, are to be identified by measuring with the ultrasonics the dynamic responses of the vertical displacements at the surface of the solid, u_1^mea(t) to u_n^mea(t). In the above study, the neural network based inverse analysis is employed as follows: Phase 1: Prepare a number of training patterns, i.e. N combinations of the defect parameters versus their corresponding dynamic responses of displacements, or (L_p, X_p, Y_p, A_p) versus u_1(t) through u_n(t) (t = t_0, ..., t_i), obtained by the dynamic finite element analyses, where the subscript p indicates the value for the p-th pattern. Phase 2: Train the neural network using the N training patterns above with the dynamic responses of displacements u_1(t) through u_n(t) as the input and the defect parameters (L_p, X_p, Y_p, A_p) as the output. Phase 3: Input a set of measured dynamic responses u_1^mea(t) through u_n^mea(t) to the input units of the trained network above. Then, the network immediately outputs their corresponding defect parameters.
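The three phases can be mocked up as below. A toy analytic function stands in for the dynamic finite element analysis that produces the training responses, and a small scikit-learn regressor stands in for the neural network; the parameter ranges, the response model and the library choice are all illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def forward_model(params):
    """Toy stand-in for the dynamic finite element analysis: maps the defect
    parameters (L, X, Y, A) to a vector of 'surface responses'."""
    L, X, Y, A = params
    t = np.linspace(0.0, 1.0, 16)
    return L * np.sin(2 * np.pi * (t - X)) * np.exp(-Y * t) + 0.1 * A * t

# Phase 1: training patterns (defect parameters -> responses) from the forward model.
P = rng.uniform([0.5, 0.0, 0.5, 0.0], [2.0, 1.0, 2.0, 1.0], size=(1000, 4))
U = np.array([forward_model(p) for p in P])

# Phase 2: train the network with responses as input and parameters as output.
net = MLPRegressor(hidden_layer_sizes=(30,), max_iter=5000, random_state=0)
net.fit(U, P)

# Phase 3: feed a 'measured' response; the network returns the defect parameters.
true_params = np.array([1.2, 0.4, 1.0, 0.5])
print("estimated:", net.predict(forward_model(true_params)[None, :])[0])
print("true     :", true_params)
```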
Fig. 12.1 Ultrasonic NDE. Reprinted from [2] with permission from Taylor & Francis
Note that the time-consuming optimization process is involved only in the training process of the network at the second phase above, which is independent of the physical problem to be solved. In addition, once the training process is over, one can solve immediately the inverse problems for various combinations of defect parameters, meaning that the method is regarded as a database of the finite element solutions with an inverse analysis capability, and this approach can be easily applied to any kind of inverse problem if a sufficient number of the training patterns are available through the computational mechanics simulation. Oishi et al. have extended this method to the defect identification with the laser ultrasonics [2]. As given in [3], the laser ultrasonics technique is based on the phenomenon that the laser irradiation onto a solid surface generates ultrasonic waves in the solid, which has several attractive features as follows: (1) Non-contact generation of ultrasonic waves in materials. (2) Possibility of fully non-contact generation and detection of ultrasonic waves by combining with an optical (laser) interferometer for ultrasonic detection. These features make the fully non-contact ultrasonic NDE available, which could be applied to materials in hostile environments, such as very high temperature, or of complicated configuration [4]. There are two kinds of mechanisms of ultrasonic generation by the laser irradiation [3, 5]. One is named the thermoelastic mode (T mode), in which the thermal energy absorbed from the irradiated laser beam causes the thermal expansion of the laser irradiated shallow area, resulting in the thermoelastic stress wave or the ultrasound. The other is called the ablation mode (A mode), in which the thermal energy absorbed from the irradiated laser beam with higher power density causes the ablation of material into the surroundings, and its reactive force vertical to the surface generates the ultrasound in the materials. In the above, the T mode generation causes little damage to the material, while the A mode causes some damage to the solid surface. Thus, the former mode is often adopted to generate the ultrasonic wave by laser. On the other hand, the optical interferometer is used in order to detect the ultrasonic waves, as it can directly measure the dynamic response of displacements at the monitoring point on the solid surface, enabling direct comparison with the calculated one.
Fig. 12.2 Tested specimen. Reprinted from [2] with permission from Taylor & Francis
In addition, the spatial resolution of the optical interferometer is much higher than that of the conventional transducers. In the T mode experimental apparatus, the YAG (Yttrium Aluminum Garnet) laser beam is irradiated onto a sample material, and the T-mode ultrasonic wave is generated. The dynamic responses of the displacements at several monitoring points at the solid surface are measured by the Michelson optical interferometer using a He–Ne laser, where the beam is focused to the size of 35 mm × 0.1 mm through the cylindrical lens. Figure 12.2 shows the schematic view of the specimen and the laser beams. The specimen is 70 mm × 100 mm × 10 mm in size and made of steel (S55C). An artificial surface defect of the size of 3 mm (or 5 mm) in height and 1 mm in width is machined in the bottom of the specimen. The YAG laser beam is irradiated onto the point, which is 3 mm apart from the point just above the defect. The dynamic responses of displacements at two monitoring points on the surface, both of which are 5 mm apart from the laser irradiating point, are directly measured by the optical interferometer. Data averaging is applied to the measured time histories of displacements in order to improve the S/N ratio. Two-dimensional dynamic finite element analysis is employed to simulate both generation and propagation of the ultrasonic wave. To reduce the calculation load, the size of the analysis domain is set to be 10 mm × 70 mm, causing little effect on the calculation results because main events such as the wave reflection by the defect and their detection finish before the incident wave reaches both edges of the analysis domain. The effect of laser irradiation is simulated by application of an external force to the surface nodes in the irradiated area. In the case of T-mode laser ultrasonics, two horizontal traction forces, which have the same amplitude with opposite direction
and tend to expand the irradiated area, are applied to the surface nodes at the edges of the irradiated area. The time history of the amplitude of external force applied to the surface nodes is similar to that of the accumulated thermal energy absorbed from the laser beam. Also, the horizontal traction forces (max 0.5 kN/m) are applied to the edge nodes of the irradiated area of 0.2 mm wide, and very small forces (max 0.05 MPa) normal to the surface to the nodes in the area. Though this is a very simplified model, it can simulate well the basic features of the T-mode laser ultrasonics [6]. Dynamic responses of displacements at two monitoring points on the upper surface are calculated using the dynamic finite element method with the domain decomposition (see Appendix A2). The whole analysis domain is decomposed into 700 subdomains of 1 mm × 1 mm size, and each subdomain is equally divided into four-noded quadrilateral isoparametric elements of 0.1 mm × 0.1 mm. The material properties of steel are employed, i.e. the Young's modulus = 205 GPa, the mass density = 7.84 × 10^3 kg/m^3 and the Poisson's ratio = 0.3. The explicit time integration scheme, i.e. the central difference method, is used with the time increment of 0.01 ms. In order to quantitatively estimate the robustness of the neuro-based system, a robustness measure is introduced as follows. When using a well-trained neural network in an application process, the error \Delta O_k on the k-th output O_k caused by a small amount of error \Delta I_i at the i-th input I_i is evaluated as

\Delta O_k(\Delta I_i) = O_k(\ldots, I_i + \Delta I_i, \ldots) - O_k(\ldots, I_i, \ldots) \approx \frac{\partial O_k}{\partial I_i} \cdot \Delta I_i = S_i^k \cdot \Delta I_i    (12.1.1)
where S_i^k, called the Error Propagation Coefficient (EPC), corresponds to the influence of the fluctuation of the i-th input \Delta I_i on that of the k-th output \Delta O_k(\Delta I_i). The EPC can be easily calculated for the trained neural network, as the input-output relationship of the neural network is mathematically given in an explicit form with the connection weights and the activation functions. Equation (12.1.1) shows that, as the EPC values are larger, a small amount of fluctuation in the input data would cause a larger fluctuation in the output signal, and vice versa. For the assessment of the robustness, the following global index S, called the global EPC, is defined:

S = \frac{\sum_{p=1}^{N} \sum_{i=1}^{n} \sum_{k=1}^{m} \left( S_i^k(p) \right)^2}{N \times n \times m}    (12.1.2)
where S_i^k(p) is the EPC value defined for the k-th output signal and the i-th input signal in the p-th test pattern, N the number of test patterns, m the number of output units, and n the number of input units. As the noise-added training generates a smoother solution space, adding a small amount of artificial noise into the input signals of the training patterns improves both the convergence of the residual error in training and the generalization capability
of the neural network. Then, it is expected that the neural networks with low EPC values may be realized by noise-added training. Note that, using the neural network trained with the data obtained by the parallel finite element analyses, a neuro-based method has been developed for the identification of a surface defect hidden in solid using the laser ultrasonics, where the horizontal locations and the depth of real defects are estimated with 2.4–12.5% and 0.6–4.1% errors, respectively (Fig. 12.3) [2].
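A minimal sketch of evaluating Eqs. (12.1.1) and (12.1.2) on a trained network is given below; the derivatives are approximated here by central finite differences through a generic predict() interface, which is an assumption for illustration (for a known network the EPC can of course be written out explicitly from the weights and activation functions, as noted above).

```python
import numpy as np

def epc_matrix(net, I, eps=1e-4):
    """Error Propagation Coefficients S_i^k = dO_k/dI_i of Eq. (12.1.1) for a
    trained network at one input pattern I, by central finite differences."""
    I = np.asarray(I, dtype=float)
    n = I.size
    O0 = np.atleast_1d(net.predict(I[None, :])[0])
    S = np.zeros((n, O0.size))
    for i in range(n):
        dI = np.zeros(n)
        dI[i] = eps
        Op = np.atleast_1d(net.predict((I + dI)[None, :])[0])
        Om = np.atleast_1d(net.predict((I - dI)[None, :])[0])
        S[i] = (Op - Om) / (2.0 * eps)        # sensitivity of every output to input i
    return S

def global_epc(net, test_patterns):
    """Global index S of Eq. (12.1.2): mean of the squared EPC values over all
    test patterns, input units and output units."""
    return np.mean([epc_matrix(net, I) ** 2 for I in test_patterns])

# Usage (schematic): S = global_epc(trained_net, X_test)
```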
Fig. 12.3 Estimated locations of Defect1 and Defect2 using measured dynamic responses. Reprinted from [2] with permission from Taylor & Francis
12.2 Identification of Cracks The electric potential drop method is one of the nondestructive methods to detect locations and sizes of cracks or defects from the electric potential values measured at multiple points around the cracks and defects, which has such favorable features as the simple procedure, the easy applicability to detect 3D cracks and the insensitiveness to material non-homogeneity and residual stresses in comparison to the ultrasonic method [7]. The governing equation and the boundary conditions for the electric potential φ, which is the key parameter for the method, are described as follows:

\nabla^2 \phi = 0 \quad \text{in } \Omega    (12.2.1)

\phi = \bar{\phi} \quad \text{on } \Gamma_1    (12.2.2)

q = \frac{\partial \phi}{\partial n} = \bar{q} \quad \text{on } \Gamma_2    (12.2.3)
where \nabla^2 is the Laplacian operator, q the flux, \bar{q} the prescribed value of the flux and n the direction of the outer normal to the boundary. Once the boundary conditions of a body including cracks are given, the electric potential φ at any location is obtained by solving the above equation.
Fig. 12.4 Crack identification by electric potential drop method
The schematic of the crack identification problem is shown in Fig. 12.4, where several parameters, such as the length (L), the location of its center (X, Y) and the
angle (A) to the reference direction, are to be identified from the values of the electric potential measured at multiple points. The neural network has been applied to this problem, the procedure of which is summarized as follows [8]: (1) A set of crack parameters and the electric potential values at several points, i.e. a pattern, is obtained by the FEM calculation of the electric potential field. Repeating this for a variety of cracks, many data for the patterns are produced. (2) The feedforward neural network is trained using the patterns above, where we assign the electric potential values at multiple points as the input and the parameters of the crack as the output or the teacher signals. (3) The trained neural network above is applied to a crack identification problem. First, a set of measured electric potential values are input to the trained neural network. Then, the neural network outputs the crack parameters. This method has been applied to a surface fatigue crack of semielliptical shape in a pipe [8]. With the electric potential values at eleven points, two crack parameters are identified: the depth and the aspect ratio of the crack. The training patterns are generated through the FEM analyses of square plates, each of which contains a semielliptical surface crack of various geometries. The three-layer neural network is employed: 11 units in the input layer, 40 units in the hidden layer and 2 units in the output layer, and twenty-four patterns are used to train the neural network. The trained neural network is tested to estimate the shape of a crack, showing a good estimation capability of the neural network within about 10% error. The method is then employed to identify two dissimilar surface cracks that exist at the middle plane of a square plate of 200 mm width and 20 mm thickness. With electric potential values at 41 points on a cracked surface, six crack parameters are to be identified: three parameters for each crack, the position x, the depth a and the half length c of each crack. Generated are 729 training patterns through the 3D finite element analyses of square plates, each of which contains one or two semielliptical surface cracks of different geometries. Then, five kinds of three-layer neural networks, called type A, B, C-1, C-2 and C-3, respectively, are employed. The neural network of type A estimates the number of cracks, i.e. whether one or two. That of type B, which is used only when the number of cracks is estimated to be one, estimates the location, the depth and the length of the surface crack. The neural networks of type C-1, C-2 and C-3, employed only when the existence of two cracks is considered, estimate the locations x_1 and x_2, the depths a_1 and a_2 and the half lengths c_1 and c_2 of the cracks, respectively. The neural networks employed have 41 units in the input layer, 18 to 20 units in the hidden layer and 2 or 3 units in the output layer according to the number of parameters to be identified. The trained neural network of type A is proved to estimate the number of cracks correctly. The trained neural networks of type C are shown to be able to estimate the locations and the shapes of two cracks successfully.
12.3 Estimation of Stable Crack Growth The elastic-plastic fracture phenomena of inhomogeneous materials and structures are one of the critical issues concerning the integrity of welded components such as piping and pressure vessels, in which the nonlinear fracture mechanics based on the J-integral concept has often been utilized. From a series of experiments and numerical studies on the stable crack growth in welded CT specimens (Fig. 12.5), it is suggested that the utilization of mixed material constants of two-component materials could successfully estimate the crack growth behavior in the welded specimen, employing a proper ratio of mixture of material constants achieved by the neural networks [9]. In the paper, the Ramberg-Osgood type stress-strain relation is assumed as follows:

\frac{\bar{\varepsilon}}{\varepsilon_0} = \frac{\tilde{\sigma}}{\sigma_0} + \alpha \left( \frac{\tilde{\sigma}}{\sigma_0} \right)^n    (12.3.1)
Fig. 12.5 F5G welded CT specimen. Reprinted from [9] with permission from Elsevier
where α, ε_0, σ_0 and n are the material constants, while \bar{\varepsilon} and \tilde{\sigma} the von Mises type equivalent strain and stress, respectively. On the other hand, the empirical J-integral [10] is calculated as

J_{D(i+1)} = \left[ J_{D(i)} + \frac{\eta_i}{b_i} A_{i,i+1} \right] \times \left[ 1 - \frac{\gamma_i}{b_i} (a_{i+1} - a_i) \right]    (12.3.2)

where

b_i = W - a_i    (12.3.3)

\eta_i = 2 + 0.522 \frac{b_i}{W}    (12.3.4)

\gamma_i = 1 + 0.76 \frac{b_i}{W}    (12.3.5)
Here, a_i is the crack length at the i-th step, W the width of the CT specimen, and A_{i,i+1} the area of the load versus load-line displacement curve between the i-th and the (i + 1)-th steps. In the GE/EPRI estimation scheme [11], which is based on the J2-deformation theory of plasticity and the power-law hardening constitutive model, the J-integral J and the load-line displacement δ are, respectively, given as

J = J_e + J_p    (12.3.6)

\delta = \delta_e + \delta_p    (12.3.7)

where J_e and J_p are, respectively, the elastic and the fully plastic solutions of J, and \delta_e and \delta_p are the elastic and the fully plastic solutions of δ, respectively. Here, J_p and \delta_p are, respectively, defined as

J_p = \alpha \times \sigma_0 \times \varepsilon_0 \times (W - a) \times h_1\!\left(\frac{a}{W}, n\right) \times \left(\frac{P}{P_0}\right)^{n+1}    (12.3.8)

\delta_p = \alpha \times \varepsilon_0 \times a \times h_3\!\left(\frac{a}{W}, n\right) \times \left(\frac{P}{P_0}\right)^{n+1}    (12.3.9)
where σ_0 and ε_0 are the yield stress and the yield strain, respectively, h_1 and h_3 the fully plastic solutions of J and δ, respectively, P the applied load per unit thickness and P_0 the limit load per unit thickness. Next, the generation phase crack growth simulations are performed as follows: (1) Using the measured δ versus a curve as the input, the applied load P is iteratively calculated from Eq. (12.3.7).
(2) J is calculated by substituting the applied load P into Eq. (12.3.6). With the mixture ratio ω, the hypothetical material constant is derived as follows: C = (1 − ω)C1 + ωC2
(12.3.10)
where C could be any material constant, C_1 that of the softer material and C_2 that of the harder material. A method is proposed to find the best ratio of mixture by the neural networks as follows [9]: (1) Given the mixture ratio, a set of P-δ and J-a curves are calculated using the measured δ-a curve. Thus, a data set of P-δ and J-a curves and the mixture ratio ω, called a pattern, is obtained. A lot of patterns are generated by parametrically varying the mixture ratio. (2) The neural network is trained using the patterns obtained above: P-δ and J-a curves as the input and the mixture ratio ω as the output or the teacher signal. (3) When newly measured P-δ and J-a curves are provided to the trained neural network above, it outputs the estimated best value of ω. It is concluded in the paper that, among the three types of estimation methods tested, the best performance is given by the following one. First, utilize ω_P to calculate the P-δ curve with the measured δ-a curve. Similarly, utilize ω_J to calculate the J-a curve with the measured δ-a curve. Thus, the training patterns are generated by varying ω_P and ω_J parametrically. Then, two neural networks are trained: one is trained with the P-δ curve as the input and ω_P as the teacher signal, and the other with the J-a curve as the input and ω_J as the teacher signal. Both neural networks have the same structure: 16 units in the input layer, 32 units in the hidden layer and 1 unit in the output layer (Fig. 12.6). With this method, the trained neural networks are shown to be able to estimate the best ratios from a smaller number of learning data than with other methods (Fig. 12.7).
12.4 Failure Mechanisms in Power Plant Components Structural failures are one of the main causes of availability loss in industries, and many of them are repeated due to the same failure mechanism. Neural networks that can either predict possible failure mechanisms or classify root failure causes have been developed using the database collected from operating power plants [12]. Failure mechanisms in power plants include creep, overheating (OH) and overstressing (OS) induced failures. The root failure cause is classified into a primary or secondary root cause: the former is defined as manufacturing, material or design-induced causes, while the latter as failures due to operation or mal-operation.
Fig. 12.6 Tested system of neural networks. Reprinted from [9] with permission from Elsevier
Fig. 12.7 Comparison of experimental and estimated P-delta curves of F5G specimen. Reprinted from [9] with permission from Elsevier
In the paper, three neural networks are developed, each of which predicts whether a failure mechanism is creep-induced, whether it is overheating-induced or whether it is overstressing-induced. An additional neural network that predicts whether the root failure cause is primary or not is also developed. In the prediction of whether a failure is creep-induced, the three-layer neural network, which is trained using the following eight parameters, results in the best prediction capability of over 86% correctness:
(1) T/T_c: The ratio of the operation temperature to the creep temperature after the operation hours.
(2) σ/σ_c: The ratio of the stress to the creep rupture stress at the operation temperature after the operation hours, where the stress σ is defined as the nominal hoop stress as

\sigma = \frac{1}{2} \frac{(d - t)}{t} P    (12.4.1)

where P is the operation pressure, d the pipe diameter and t the pipe thickness.
(3) log_{10} H: Logarithmic operation hours.
(4) T: Operation temperature.
(5) P: Operating pressure.
(6) d: Pipe diameter.
(7) t: Pipe thickness.
(8) mc: Material class, which can be the carbon steel (CA), the low-alloy steel (LA), the high-alloy steel (HA) or the austenitic steel (AU).
In the prediction of the root failure cause, the three-layer neural network, which is trained using the following nine parameters, results in over 88% correct prediction:
(1)-(8): The same parameters used in the prediction of the creep-induced failure.
(9) σ/σ_y: The ratio of the stress to the yield stress at the operation temperature after the operation hours.
12.5 Identification of Parameters of Non-uniform Beam A beam consisting of six subsections of equal length, each of which has a different Young's modulus, is taken (Fig. 12.8) [13], where the Young's modulus of each subsection is determined by the feedforward neural network using the vibration characteristics of the beam such as the eigenfrequencies and the eigenvectors of the first to the sixth modes. The procedure of the method is as follows: (1) Setting the values of the Young's modulus of the six subsections as E_1, ..., E_6, the finite element eigenvalue analysis is performed to obtain 36 vibration characteristics that consist of the first to the sixth eigenfrequencies f_1, ..., f_6 and the normalized displacements at the five nodes of the six eigenmodes u_11, u_12, ..., u_64, u_65.
Fig. 12.8 Beam with non-uniform Young's modulus and its finite element model. Reprinted from [13] with permission from Wiley
The above eigenvalue analyses are repeatedly performed for combinations of the various values of the Young's modulus of the six subsections, and thus a lot of patterns, i.e. sets of (E_1, ..., E_6, f_1, ..., f_6, u_11, u_12, ..., u_64, u_65), are generated. (2) The feedforward neural network is trained using the patterns generated in (1) above: f_1, ..., f_6 and u_11, u_12, ..., u_64, u_65 as the input data, while E_1, ..., E_6 as the teacher signals. The three-layer neural network is employed: 36 units in the input layer, 50 units in the hidden layer and 5 units in the output layer. (3) Once the neural network is trained, it is applied to the identification of the Young's modulus of each subsection of a beam. Note that the input data to the neural network above are usually normalized in the unit range of [0.0, 1.0]. For this purpose, either the linear transformation or the logarithmic transformation is employed for most cases. However, these transformations cannot sufficiently remedy the non-uniform nature of heavily localized original data, often resulting in poor convergence in the training process and less estimation accuracy. In the paper, a normalization method for the neural networks, called a generalized-space-lattice (GSL) transform, is proposed. When transforming x_1, ..., x_n with GSL, x_1, ..., x_n are first sorted depending on their values. If the ascending order of x_j is k_j, then x_j is transformed to \bar{x}_j as

\bar{x}_j = \frac{k_j - 1}{n - 1}    (12.5.1)
As the dense data points are spread due to this transformation, the estimation accuracy shows significant improvement (Figs. 12.9 and 12.10).
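The GSL transform of Eq. (12.5.1) amounts to a rank-based normalization, as in the following sketch (handling of tied values is ignored here for brevity):

```python
import numpy as np

def gsl_transform(x):
    """Generalized-space-lattice (GSL) transform of Eq. (12.5.1): each value is
    replaced by (k - 1)/(n - 1), where k is its ascending rank, so that heavily
    clustered data are spread uniformly over [0, 1]."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1          # ascending rank k_j of each x_j
    return (ranks - 1) / (len(x) - 1)

print(gsl_transform([0.01, 0.012, 0.011, 5.0]))    # -> [0.  0.666... 0.333... 1.]
```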
Fig. 12.9 Schematic view of GSL transformation. Reprinted from [13] with permission from Wiley
Fig. 12.10 Mean estimation error for a training patterns versus training iterations and b test patterns versus training iterations. Reprinted from [13] with permission from Wiley
12.6 Prediction of Beam-Mass Vibration The linear and nonlinear free vibrations of beam-mass systems under immovable end conditions are predicted using the feedforward neural networks [14]. Figure 12.11 shows an example of a beam-mass system under immovable end conditions. The mode shapes and the natural frequencies of the system can be obtained by solving the transcendental equations through time-consuming Newton-Raphson iterations for parameters α and η, which are defined as follows:

\alpha = \frac{M}{\rho A L}    (12.6.1)

\eta = \frac{x_s}{L}    (12.6.2)
Here, M is the concentrated mass, ρ the density of the beam, A the cross-sectional area of the beam, L the length of the beam and x_s the position of the concentrated mass.
Fig. 12.11 Beam-mass system
Table 12.1 Average percentage errors for test values of ANN for linear problem. Reprinted from [14] with permission from Elsevier

Case       Input data number   Average error (%)
Case I     18                  0.298561
Case II    20                  0.379373
Case III   18                  0.120634
Case IV    20                  0.385934
Case V     31                  0.4573

Table 12.2 Average percentage errors for test values of ANN for nonlinear problem. Reprinted from [14] with permission from Elsevier

Case       Input data number   Average error (%)
Case I     9                   0.381519
Case II    8                   0.274351
Case III   9                   1.423081
Case IV    8                   1.175880
Case V     12                  0.525184
The nonlinear amplitude-dependent frequencies are approximated as

\omega_{NL} = \omega + \lambda a_0^2    (12.6.3)
where a_0 is the amplitude of vibration and λ the nonlinear correction coefficient. Taking α and η as the input parameters, the neural networks are trained to output five natural frequencies for linear vibrations, and λ for nonlinear vibrations. Here, five-layer feedforward neural networks are employed for both linear and nonlinear vibrations. It is shown that the estimation errors obtained by the trained networks are within 0.5 and 1.5% for linear and nonlinear vibrations, respectively (Tables 12.1 and 12.2).
12.7 Others 12.7.1 Nondestructive Evaluation with Neural Networks Stavroulakis and Antes [15]: The feedforward neural networks are employed for the identification of unilateral contact cracks in nonlinear elastostatics. Performed are simulations of the static unilateral contact problem for an elastic two-dimensional structure with a crack by the multiregion BEM. The location of the crack is identified from the nodal vertical displacements at the upper surface. It is shown that locations of the crack can be identified from the 29 displacement values by the five-layer neural network: 29 units in the first layer, 50 units in each of the three hidden layers and 2 units in the output layer.
Stavroulakis and Antes [16]: The feedforward neural networks are utilized for the two-dimensional crack identification in steady state elastodynamics. Harmonic excitation with time-periodic loadings in linear elastodynamics is simulated using the boundary element method in the frequency domain. The length of a crack at the bottom is identified by using the vertical displacements measured for two frequencies at the 48 points of the top surface. Four-layer neural network is used: 96 units (two frequencies for each of 48 measuring points) in the input layer, 1 unit in the output layer, and two hidden layers have the same number of units, selected out of 10, 50 or 150. Liu et al. [17]: The feedforward neural networks are applied to ultrasonic testing of a crack in solid, where simulation of the wave propagation is performed by combining the finite element method with the boundary element method [18]: the region around a crack is analyzed by the FEM and the outer region by the BEM. Two types of three-layer neural networks are developed: one classifies cracks into three categories, i.e. no crack, surface-breaking crack and sub-surface crack, and the other identifies the sizes and the locations of the cracks. In the paper, 13 characteristic values for the input to the neural network are extracted from each of the transient surface response waveforms obtained from the direct analysis: the first three relative maximum values of responses, the first three relative minimum values of the responses, their corresponding arrival times and the distance between source and receiver. Kao and Hung [19]: Employed are the feedforward neural networks to detect damage in civil engineering structures using free vibration responses generated by the neural networks. First, the feedforward neural network is trained to simulate dynamic responses of the structure using training patterns obtained by experiments or numerical simulations. A training pattern consists of external excitations to the structure p(n), displacements d(n), velocities v(n) and accelerations a(n) at the measuring point, where n indicates the value at the n-th time step. Second, the trained networks are utilized to evaluate the extent of damage in the structure. The method is tested for a five-story steel frame. Zacharias et al. [20]: The feedforward neural networks are applied to the identification of damage in reusable crates of beverages using the vibration response data. Training patterns consisting of frequency response spectra are generated by the finite element vibration analysis package NASTRAN, where a finite element mesh with more than a million degrees of freedom is generated from a CAD data set by the pre-processor PATRAN. The obtained spectra are validated through comparison to selected experimental data. Out of approximately 1000 amplitudes in the calculated or the measured frequency spectrum, ten values are generated by a piecewise integration of them. These 10 data per measuring point are used as input to the neural network. With two sensors, i.e. two measuring points, the damage at the crate is successfully detected by the loosely-coupled feedforward neural network: two three-layer neural networks each corresponding to data from a sensor are combined into a single neural network as shown in Fig. 12.12.
Fig. 12.12 Loosely-coupled neural network
Jeyasehar [21]: The four-layer feedforward neural networks are applied to the assessment of damage in prestressed concrete beams from natural frequency measurements. Given the response data such as natural frequency, the neural network employed predicts the extent of damage in the beam. Jiang and Adeli [22]: The dynamic fuzzy wavelet neural network is utilized for the damage detection of highrise building structures subjected to seismic excitations. The general input-output mapping in the dynamic fuzzy wavelet neural network from the input vector (x_1, ..., x_D) to the k-th output y_k is given as follows:

y_k = \sum_{i=1}^{M} w_i \, \varphi\!\left( \sum_{j=1}^{D} \frac{x_j - c_{ij}}{a_{ij}} \right) + \sum_{j=1}^{D} b_j x_j + d    (12.7.1)
where φ() is a wavelet function such as the Mexican hat [23], c_ij the j-th value in the i-th cluster of the multidimensional input state vector obtained using the fuzzy C-means clustering [24], a_ij the frequency or scale corresponding to the multidimensional input vector, w_i the connection weight between the i-th wavelet node and the output node, b_j the connection weight between the j-th input and the output, d the bias and M the number of wavelet nodes. The wavelet function employed has a spatial-spectral zooming property reducing the undesirable interaction effects among the nodes, which in general improves the accuracy of the structural damage detection. The highrise structure is divided into a series of substructures, in each of which sensors are placed at one of the floors, called a measurement floor. The wavelet neural network is trained to output the current acceleration response at the i-th measurement floor from the input of the acceleration responses of three adjacent measurement floors, i.e. the (i − 1)-th, i-th and (i + 1)-th measurement floors, at the previous time step. The damage to the substructure is evaluated using the damage evaluation method based on the error between the measured
responses and the predicted ones by the trained fuzzy wavelet neural network. This method is validated for a 38-story concrete model. de Oliveira and Marques [25]: The self-organizing maps (Sect. 5.1) are utilized for the damage identification and the discrimination for composite materials using the acoustic emission (AE) signals [26]. Fourteen features are extracted from the AE signals measured both in the time and frequency domain: ten features such as the rise time, the duration and the counts number etc. from the signals in the time domain, while four features such as the fast Fourier transform (FFT) peak frequencies, the FFT amplitude and the center of gravity of the spectrum from the signals in the frequency domain. A self-organizing map of 45 × 29 in size is constructed from 4243 AE events, and the map is further partitioned into clusters by using the kmeans method [27]. In the paper, the AE signals are clustered into six types, each of which is associated to one of the following damages: transverse matrix cracking, decohesion, delamination, longitudinal matrix splitting and fibre fracture. Hattori and Sáez [28]: The self-organizing maps and the feedforward neural networks are combined for the identification of a crack in magneto-electro-elastic materials. The length, the location of the center and the inclination angle of a crack are to be identified from the elastic displacements at the measurement point or the electric and magnetic potentials at the point. Direct analyses are performed by the BEM. Training patterns are first divided into five clusters by using the self-organizing maps and the successive clustering algorithm such as the k-means method. For each categorized cluster, the feedforward neural network is trained just with the patterns in the cluster. It is concluded that, based on this partitioning of the patterns, the neural networks show better performance in the identification of crack parameters. In addition to the above, the studies related to the nondestructive evaluation with the neural networks include: – assessment of hydrothermal effect or change in material properties due to a combined action of moisture and temperature using the feedforward neural networks integrated with the fuzzy logic [29], – damage detection of steel bridge structures from displacements measured at several points under given static load using the four-layer feedforward neural network that outputs the cross-sectional areas of all elements [30], – damage identification of structures from the dynamic response using the four-layer feedforward neural network that outputs where a delamination exists [31], – damage assessment of prestressed concrete piles by the fuzzy neural networks [32]. – identification of stiffness of a substructure from natural frequencies and modal shapes using the feedforward neural networks [33], – detection of fault just from the data obtained by transmissibility measurement using the feedforward neural networks [34],
– identification of mechanical parameters of the surrounding rock of underground caverns in stratified layers using the fuzzy neural networks that output stresses and the modulus of elasticity with the measured displacements as input [35],
– damage identification of structures by the feedforward neural networks using the statistical properties of the structural dynamic response, such as the change of the variance and covariance of the structural dynamic displacements [36],
– seismic damage identification in buildings using the feedforward neural networks that output spatial variables including mass and stiffness from the input of several natural frequencies and mode shapes in each principal direction of the structure [37],
– prediction of the modulus of elasticity from other properties measured by some NDE techniques using the five-layer feedforward neural network [38].
12.7.2 Structural Identification with Neural Networks
Klos and Waszczyszyn [39]: The geometrical parameters of a circular arch, such as the span, the height and the cross-sectional thickness, are identified from the eigenfrequencies of the arch using the cascade neural network, which is a type of fully-connected feedforward neural network.
Hasançebi and Dumlupınar [40]: The model updating is performed based on the static and dynamic responses of a structure using the feedforward neural networks. The dynamic and static responses of a structure gradually change over its life cycle due to aging, deterioration and damage. Therefore, the model parameters of a real structure should be updated based on the measured data of its current state. In the paper, the model updating for bridge structures is considered and five parameters are selected to be identified: the vertical, transverse and lateral spring stiffnesses representing the spring-modelled boundary conditions of the bridge structure, the elasticity modulus of the concrete and the thickness of the surface overlay of the bridge deck. Varying these five parameters, the finite element simulations are performed to obtain the corresponding static and dynamic responses of the structure. Six values selected from the dynamic and static responses are used as the input data to the neural network, and the five model parameters as the corresponding teacher signals.
Facchini et al. [41]: The eigenvalues and the eigenmodes of a structure are identified from vibration data using the feedforward neural networks. From the measured vibration data, four frequency-dependent indexes are calculated, which are used as the input to the neural network, and the judgement whether a structural resonance exists at the frequency is used as the corresponding teacher signal. The four-layer neural network is employed as follows: four units in the first layer, one unit in the output layer and 5–15 units in each of the two hidden layers. Due to the wide generality of the four indexes, training patterns for the neural network can be collected from numerical models of structures different from the target structure.
Chen et al. [42]: The impact load conditions of shell structures are identified from the permanent plastic deformation of the structures with the feedforward neural network, which has two hidden layers: the first hidden layer has 128 units and the second 8 units, all with the ReLU activation function. The location and the speed of the impact load are identified for static problems, while the location, the velocity and the collision duration are identified for dynamic problems, both using the permanent plastic deformation of the shell as the input.
12.7.3 Neural Networks Combined with Global Optimization Method
Mera et al. [43]: The electrical impedance tomography, which identifies the geometry of discontinuities in a conductive material from the voltages and current fluxes measured at the boundaries, is studied using the evolutionary algorithm (EA) combined with the neural networks.
Saridakis et al. [44]: The identification of two cracks with different characteristics, such as the position, the depth and the rotation angle, is performed using the genetic algorithm (GA), where the compliance factors used to model the cracks are estimated by the neural networks.
12.7.4 Training of Neural Networks
Marwala [45]: The fault identification in cylinders from vibration data is performed using the feedforward neural networks, where the neural networks formulated using the Bayesian method are trained by the hybrid Monte Carlo method.
Fang et al. [46]: Studied is the detection of damage in a structure from frequency response functions using the feedforward neural networks, where several types of steepest descent algorithms for the error back propagation are tested, concluding that the tunable steepest descent algorithm outperforms the others in the case studied.
Protopapadakis et al. [47]: Structural defects in concrete piles are identified using the feedforward neural networks, where the parameters of the neural networks, such as the number of hidden layers, the number of units in each hidden layer and other parameters related to the learning algorithm employed, are determined by the genetic algorithm.
References 1. Krautkramer, J., Krautkramer, H.: Ultrasonic Testing of Materials, 4th, fully revised edn. Springer, Berlin (1990) 2. Oishi, A., Yamada, K., Yoshimura, S., Yagawa, G., Nagai, S., Matsuda, Y.: Neural networkbased inverse analysis for defect identification with laser ultrasonics. Res. Nondestruct. Eval. 13(2), 79–95 (2001) 3. Scruby, C.B., Drain, L.E.: Laser Ultrasonics: Techniques and Applications. Adam Hilger (1990) 4. Nakano, H., Nagai, S.: Crack measurement by laser ultrasonic at high temperatures. Jpn. J. Appl. Phys. 32(5B), 2540–2542 (1993) 5. Dewhurst, R.J., Hutchins, D.A., Palmer, S.B.: Quantitative measurements of laser-generated acoustic waveforms. J. Appl. Phys. 53(6), 4064–4071 (1982) 6. Yamawaki, H., Saito, T.: Computer simulation of laser-generated elastic waves in solid. Nondestruct. Test. Eval. 7, 165–177 (1992) 7. Kubo, S., Sakagami, T., Ohji, K.L.: Electric potential CT method for measuring two- and threedimensional cracks. In: Okamura, H., Ogura, K. (Eds.) Current Japanese Materials Research, vol. 8, Fracture Mechanics. pp. 235–254 (1991) 8. Yoshimura, S., Saito, Y., Yagawa, G.: Identification of two dissimilar surface cracks hidden in solid using neural networks and computational mechanics. Comput. Model. Simulat. Eng. 1, 477–491 (1996) 9. Yagawa, G., Matsuda, A., Kawate, H.: Neural network approach to estimate stable crack growth in welded specimens. Int. J. Pressure Vessels Piping 63, 303–313 (1995) 10. Ernst, H.A., Paris, P.C., Landes, J.D.: Estimations on J-integral and tearing modulus T from a single specimen test record. ASTM STP 743, 476–502 (1981) 11. Kumar, V., German, M.D., Shih, C.F.: An engineering approach for elastic-plastic fracture analysis. NP-1391, Project 1237-1, Topical Report, EPRI (1981) 12. Yoshimura, S., Jovanovic, A.S.: Analyses of possible failure mechanism and root failure causes in power plant components using neural networks and structural failure database. J. Press. Vessel Technol. 118, 237–246 (1996) 13. Yoshimura, S., Matsuda, A., Yagawa, G.: New regularization by transformation for neural network based inverse analyses and its application to structure identification. Int. J. Numer. Methods Eng. 39, 3953–3968 (1996) 14. Karlik, B., Ozkaya, E., Aydin, S., Pakdemirli, M.: Vibrations of a beam-mass systems using artificial neural networks. Comput. Struct. 69, 339–347 (1998) 15. Stavroulakis, G.E., Antes, H.: Nondestructive elastostatic identification of unilateral cracks through BEM and neural networks. Comput. Mechan. 20, 439–451 (1997) 16. Stavroulakis, G.E., Antes, H.: Neural crack identification in steady state elastodynamics. Comput. Methods Appl. Mechan. Eng. 165, 129–146 (1998) 17. Liu, S.W., Huang, J.H., Sung, J.C., Lee, C.C.: Detection of cracks using neural networks and computational mechanics. Comput. Methods Appl. Mechan. Eng. 191, 2831–2845 (2002) 18. Brebbia, C.A., Telles, J.C.F., Wrobel, L.C.: Boundary Element Techniques: Theory and Applications in Engineering. Springer, Berlin (1984) 19. Kao, C.Y., Hung, S.-L.: Detection of structural damage via free vibration responses generated by approximating artificial neural networks. Comput. Struct. 81, 2631–2644 (2003) 20. Zacharias, J., Hartmann, C., Delgado, A.: Damage detection on crates of beverages by artificial neural networks trained with finite-element data. Comput. Methods Appl. Mechan. Eng. 193, 561–574 (2004) 21. Jeyasehar, C.A., Sumangala, K.: Damage assessment of prestressed concrete beams using artificial neural network (ANN) approach. Comput. Struct. 
84, 1709–1718 (2006) 22. Jiang, X., Adeli, H.: Pseudospectra, MUSIC, and dynamic wavelet neural network for damage detection of highrise buildings. Int. J. Numer. Method. Eng. 71, 606–629 (2007)
23. Daubechies, I.: Ten Lectures on Wavelets. SIAM (1992) 24. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press (1981) 25. de Oliveira, R., Marques, A.T.: Health monitoring of FRP using acoustic emission and artificial neural networks. Comput. Struct. 86, 367–373 (2008) 26. Nazarchuk, Z., Skalskyi, V., Serhiyenko, O.: Acoustic Emission: Methodology and Application. Springer, Berlin (2017) 27. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Berlin (2006) 28. Hattori, G., Sáez, A.: Crack identification in magnetoelectroelastic materials using neural networks, self-organizing algorithms and boundary element method. Comput. Struct. 125, 187–199 (2013) 29. Ramu, S.A., Johnson, V.T.: Damage assessment of composite structures—a fuzzy logic integrated neural network approach. Comput. Struct. 57(3), 491–502 (1995) 30. Pandey, P.C., Barai, S.V.: Multilayer perceptron in damage detection of bridge structures. Comput. Struct. 54(4), 597–608 (1995) 31. Rhim, J., Lee, S.W.: A neural network approach for damage detection and identification of structures. Comput. Mechan. 16, 437–443 (1995) 32. Rajasekaran, S., Febin, M.F., Ramasamy, J.V.: Artificial fuzzy neural networks in civil engineering. Comput. Struct. 61(2), 291–302 (1996) 33. Yun, C.-B., Bahng, E.Y.: Substructural identification using neural networks. Comput. Struct. 77, 41–52 (2000) 34. Chen, Q., Chan, Y.W., Worden, K.: Structural fault diagnosis and isolation using neural networks based on response-only data. Comput. Struct. 81, 2165–2172 (2003) 35. Liang, Y.C., Feng, D.P., Liu, G.R., Yang, X.W., Han, X.: Neural identification of rock parameters using fuzzy adaptive learning parameters. Comput. Struct. 81, 2373–2382 (2003) 36. Li, Z.-X., Yang, X.-M.: Damage identification for beams using ANN based on statistical property of structural responses. Comput. Struct. 86, 64–71 (2008) 37. Gonzalez, M.P., Zapico, J.L.: Seismic damage identification in buildings using neural networks and modal data. Comput. Struct. 86, 416–426 (2008) 38. Esteban, L.G., Fernández, F.G., de Palacios, P.: MOE prediction. In: Abies pinsapo Boiss. timber: application of an artificial neural network using non-destructive testing. Comput. Struct. 87, 1360–1365 (2009) 39. Klos, M., Waszczyszyn, Z.: Modal analysis and modified cascade neural networks in identification of geometrical parameters of circular arches. Comput. Struct. 89, 581–589 (2011) 40. Hasançebi, O., Dumlupınar, T.: Linear and nonlinear model updating of reinforced concrete T-beam bridges using artificial neural networks. Comput. Struct. 119, 1–11 (2013) 41. Facchini, L., Betti, M., Biagini, P.: Neural network based modal identification of structural systems through output-only measurement. Comput. Struct. 138, 183–194 (2014) 42. Chen, G., Li, T., Chen, Q., Ren, S., Wang, C., Li, S.: Application of deep learning neural network to identify collision load conditions based on permanent plastic deformation of shell structures. Comput. Mechan. 64, 435–449 (2019) 43. Mera, N.S., Elliott, L., Ingham, D.B.: The use of neural network approximation models to speed up the optimization process in electrical impedance tomography. Comput. Methods Appl. Mechan. Eng. 197, 103–114 (2007) 44. Saridakis, K.M., Chasalevris, A.C., Papadopoulos, C.A., Dentsoras, A.J.: Applying neural networks, genetic algorithms and fuzzy logic for the identification of cracks in shafts by using coupled response measurements. Comput. Struct. 86, 1318–1338 (2008) 45. 
Marwala, T.: Scaled conjugate gradient and Bayesian training of neural networks for fault identification in cylinders. Comput. Struct. 79, 2793–2803 (2001)
46. Fang, X., Luo, H., Tang, J.: Structural damage detection using neural network with learning rate improvement. Comput. Struct. 83, 2150–2161 (2005) 47. Protopapadakis, E., Schauer, M., Pierri, E., Doulamis, A.D., Stavroulakis, G.E., Böhrnsen, J.-U., Langer, S.: A genetically optimized neural classifier applied to numerical pile integrity tests considering concrete piles. Comput. Struct. 162, 68–79 (2016)
Chapter 13
Structural Optimization
Abstract We discuss here applications of neural networks to structural optimization. In this category, neural networks are often combined with other global optimization algorithms such as genetic algorithms or evolutionary algorithms. Discussed are neural networks applied to the topology and shape optimization (Sect. 13.1), the preform tool shape optimization (Sect. 13.2), the structural optimization using neural networks and evolutionary methods (Sect. 13.3), the optimal design of materials (Sect. 13.4), the optimization of production processes (Sect. 13.5), the control of dynamic behavior of structures (Sect. 13.6), the subjective evaluation for handling and stability of vehicles (Sect. 13.7), and many other topics related to this category (Sect. 13.8).
13.1 Hole Image Interpretation for Integrated Topology and Shape Optimization
The homogenization and density based topology optimization is performed to obtain the best conceptual structural configuration with prescribed boundary and loading conditions, where the feedforward neural networks are employed for the identification of the geometry of an inner hole [1]. The identification of the geometry consists of two stages. The hole is first assigned to one of the following basic geometries: circular, semicircular, rectangular, square or triangular. Next, it is matched to the best-fitting detailed shape template within the basic geometry assigned above (Fig. 13.1). The neural networks for the first stage (ANN-1) classify the geometry of an inner hole into one of five categories (Fig. 13.2) based on the seven invariant moments, which are derived from the central moments of the hole image and are invariant to changes of size and orientation of the hole. The logarithmic values of these seven parameters are used as the input to the neural network. The four-layer neural network is employed at the first stage with 7 units in the input layer, 35 units in the first hidden layer, 20 units in the second hidden layer and 5 units in the output layer. The neural networks for the second stage (ANN-2) further assign the inner hole to one of several template geometries among the subgroup assigned at the first stage based on the six non-dimensional parameters, which also are invariant to changes of
Fig. 13.1 Flowchart of artificial neural network based hole configuration interpretation process. Reprinted from [1] with permission from Elsevier
size and orientation of the hole and are derived from the following four basic measurements of the hole image: the circumference, the largest and the smallest distances between the centroid and the boundary, and the area of the hole image. The usefulness of this method is demonstrated through three illustrative examples (Fig. 13.3).
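In the standard formulation, such size- and orientation-invariant moments are the Hu moment invariants computed from the normalized central moments of the binary hole image. The following Python sketch, which assumes the standard Hu definitions and the scikit-learn library rather than the exact implementation of [1], shows how the log-scaled moment inputs and the 7-35-20-5 first-stage classifier could be assembled.

```python
# Sketch of the first-stage inputs: seven Hu invariant moments of a binary
# hole image, log-scaled as described above (standard Hu formulas assumed).
import numpy as np
from sklearn.neural_network import MLPClassifier

def hu_moments(img):
    """img: 2D binary array, 1 inside the hole and 0 outside."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00

    def eta(p, q):  # scale-normalized central moment
        return ((x - xc) ** p * (y - yc) ** q * img).sum() / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    e30, e03, e21, e12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return np.array([
        e20 + e02,
        (e20 - e02) ** 2 + 4 * e11 ** 2,
        (e30 - 3 * e12) ** 2 + (3 * e21 - e03) ** 2,
        (e30 + e12) ** 2 + (e21 + e03) ** 2,
        (e30 - 3 * e12) * (e30 + e12) * ((e30 + e12) ** 2 - 3 * (e21 + e03) ** 2)
        + (3 * e21 - e03) * (e21 + e03) * (3 * (e30 + e12) ** 2 - (e21 + e03) ** 2),
        (e20 - e02) * ((e30 + e12) ** 2 - (e21 + e03) ** 2)
        + 4 * e11 * (e30 + e12) * (e21 + e03),
        (3 * e21 - e03) * (e30 + e12) * ((e30 + e12) ** 2 - 3 * (e21 + e03) ** 2)
        - (e30 - 3 * e12) * (e21 + e03) * (3 * (e30 + e12) ** 2 - (e21 + e03) ** 2),
    ])

# Toy hole image: a filled circle on a 64 x 64 grid
yy, xx = np.mgrid[:64, :64]
hole = ((xx - 32) ** 2 + (yy - 32) ** 2 <= 15 ** 2).astype(float)
features = np.log10(np.abs(hu_moments(hole)) + 1e-12)   # 7 log-scaled inputs

# First-stage classifier (ANN-1) with the 7-35-20-5 architecture described above;
# X_train / y_train would hold the log-moments and shape labels of template holes.
ann1 = MLPClassifier(hidden_layer_sizes=(35, 20), max_iter=5000)
# ann1.fit(X_train, y_train); ann1.predict(features.reshape(1, -1))
```

The second-stage networks (ANN-2) would be set up analogously, with the six non-dimensional shape parameters as inputs and one network per basic geometry.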
Fig. 13.2 Training patterns for neural networks for first stage. Reprinted from [1] with permission from Elsevier
13.2 Preform Tool Shape Optimization and Redesign
The preform tool design is one of the most important steps for the product quality control in the forging process, where the shape optimization of the preform tool is required and the fill ratio of the final die cavity is an important factor to be considered. In this context, the difference in shape between the forged part and the actually required part is often taken as the objective function to be minimized. The response surface method (RSM) could reduce the computational cost during the optimization process compared to the iterative use of the time-consuming finite element analyses. Among the most widely used approximation models in RSM, the neural networks are considered to be superior to the others in both the capability of handling nonlinear problems and the ease of use and implementation. Tang, Zhou and Chen have employed the feedforward neural networks for the RSM-based optimization of the preform tool shape (Fig. 13.4) [2]. In the paper, the shape of a preform tool is described by a B-spline curve (Appendix A3), whose control points are taken as design variables (Fig. 13.5). The y-coordinate values of seven control points are used as the design variables of the two-dimensional preform tool. Each
Fig. 13.3 a Design space of cantilever beam topology optimization problem; b resultant optimum binary topology; c Structure model rebuild by ANN based interpretation techniques; d initial design of shape optimization problem; and e optimum structure of shape optimization. Reprinted from [1] with permission from Elsevier
of the seven values takes one of six prescribed values, which results in 6^7 (= 279,936) patterns in total. In order to reduce the number of training patterns, the Latin hypercube sampling (LHS) [3] is employed. Selected out of the 6^7 cases by the LHS are 18 sets of the coordinate variables, and the teacher signals, i.e. the die cavity fill ratios, each of which corresponds to one of the sets, are calculated by the finite elements. The three-layer neural network is employed with 7 units in the input layer, 20 units in the hidden layer and one single unit in the output layer. The response surface constructed on the
Fig. 13.4 RSM-based preform tool shape optimization procedure. Reprinted from [2] with permission from Elsevier
Fig. 13.5 B-spline description of preform tool shape. Reprinted from [2] with permission from Elsevier
Fig. 13.6 Preform shapes during iterations. Reprinted from [2] with permission from Elsevier
neural network, combined with a pattern search algorithm, is successfully applied to the optimization of the shape of the tool (Fig. 13.6).
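A rough sketch of this workflow — Latin hypercube sampling of the control-point ordinates, a 7-20-1 neural network response surface and a search over the surrogate — is given below. The fill-ratio function is a hypothetical stand-in for the finite element forging analysis, the design-variable bounds are assumed, and a multi-start Powell search is used here instead of the pattern search algorithm of [2].

```python
# NN-based response surface sketch (assumptions noted in the text above).
import numpy as np
from scipy.stats import qmc
from scipy.optimize import minimize
from sklearn.neural_network import MLPRegressor

def fill_ratio_fe(y_ctrl):
    """Placeholder for the FE forging simulation returning the die cavity fill ratio."""
    return 1.0 - np.sum((y_ctrl - 0.5) ** 2)

lo, hi = np.zeros(7), np.ones(7)                # assumed bounds on the 7 control-point ordinates
sampler = qmc.LatinHypercube(d=7, seed=0)
X = qmc.scale(sampler.random(18), lo, hi)        # 18 LHS designs, as in [2]
y = np.array([fill_ratio_fe(x) for x in X])      # fill ratios from the "FE" analyses

# 7-20-1 response surface (a three-layer network in the paper's terminology)
rs = MLPRegressor(hidden_layer_sizes=(20,), max_iter=20000, random_state=0).fit(X, y)

# Optimize over the surrogate instead of the FE model (multi-start local search here)
starts = qmc.scale(sampler.random(5), lo, hi)
results = [minimize(lambda v: -rs.predict(v.reshape(1, -1))[0], x0,
                    method="Powell", bounds=list(zip(lo, hi))) for x0 in starts]
best = min(results, key=lambda r: r.fun)
print("predicted optimal control-point ordinates:", best.x)
```

Each surrogate evaluation costs microseconds, so the search itself is essentially free; the accuracy of the result depends entirely on how well the 18 training analyses cover the design space.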
13.3 Evolutionary Methods for Structural Optimization with Adaptive Neural Networks
The genetic algorithms (GAs) and the evolutionary algorithms (EAs) are often used for structural optimization problems because of the global search nature of these algorithms. With these algorithms, however, a full finite element analysis is needed for each individual in the populations considered, resulting in heavy computation. Therefore, the neural networks are used to replace the finite element analyses above, in which case they are often called surrogate models. For example, the feedforward neural network employed in combination with the EA in [4] is adaptive in the sense that its configuration is appropriately updated
in accordance with the progress of the global optimization by the EA, where the update of the neural network is performed by retraining with the information gradually accumulated during the optimization by the EA. The conventional method with non-adaptive neural networks is summarized as follows:
(1) Training set selection: Select input training patterns.
(2) Constraints check: Perform the constraints check for each pattern.
(3) Training step: Train the neural network (NN).
(4) Testing step: Test the trained NN.
(5) Initialization of population.
(6) Constraints check by NN.
(7) Generation of offspring by EA.
(8) Constraints check by NN: If satisfied, continue; else go to step 7.
(9) Selection step.
(10) Convergence check of EA: If satisfied, stop; else go to step 7.
It is noted here that the neural network is trained once in (3) above and is then used throughout the optimization process by the EA in (5)–(10) above. On the other hand, the optimization process proposed in the above paper is summarized as follows:
(1) Initialization of the first population: The first-phase EA starts, where M is set as the number of training patterns of the neural network.
(2) Analysis step: Analysis results are stored in a database as training patterns for the NN.
(3) Constraints check.
(4) Generation of offspring.
(5) Analysis step: Analysis results are stored in the database as training patterns for the NN.
(6) Constraints check.
(7) Selection step.
(8) EA stop check: The first-phase EA stops when M training patterns have been accumulated in the database; else go to step 4.
(9) Training set accumulation step: N training patterns are added to the existing training set. In the first training, no patterns are added.
(10) Training step: Train the NN.
(11) Generation of offspring in EA: The second-phase EA starts.
(12) Constraints check by NN.
(13) Retraining check: If the criterion for retraining is satisfied, go to step 9; else continue.
(14) Selection step.
(15) Convergence check of the second-phase EA: If satisfied, stop; else go to step 11.
As shown in (1)–(8) above, the first training patterns are accumulated in accordance with the progress of the EA until M patterns are prepared. Then, the trained neural
Fig. 13.7 3D bus frame. Reprinted from [4] with permission from Elsevier
network is repeatedly retrained using the extended training set, which includes the new patterns indicated by the EA. With this adaptation, the neural network can gradually acquire prediction capabilities for the regions of the overall design space that are actually visited by the EA. In the retraining check at step (13), the decision whether the retraining is performed or not is based on the number of wrong predictions by the current NN. In the paper, this method is tested on the minimization of the weight of a bus frame (Figs. 13.7 and 13.8).
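The overall control flow of the two-phase procedure can be sketched as follows. The objective, the constraint check standing in for the finite element analysis, the EA operators, the database sizes and the retraining threshold are all toy assumptions introduced purely for illustration; only the structure of the loop follows the steps listed above.

```python
# Sketch of an adaptive NN-assisted EA (toy functions and thresholds assumed).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
DIM, POP, M = 8, 20, 200                        # design variables, population size, DB size

def weight(x):          return x.sum()                     # objective to minimize (toy)
def feasible_exact(x):  return (x ** 2).sum() >= 2.0       # stand-in for the FE constraint check

def offspring(parents):                                     # mutation-only EA operator (toy)
    children = parents[rng.integers(len(parents), size=POP)]
    return np.clip(children + rng.normal(0.0, 0.1, children.shape), 0.0, 1.0)

database, pop = [], rng.random((POP, DIM))

# Phase 1: EA with exact analyses, accumulating M training patterns
while len(database) < M:
    for x in pop:
        database.append((x, feasible_exact(x)))             # analysis results -> database
    pop = offspring(np.array(sorted(pop, key=weight)[:POP // 2]))

X, y = map(np.array, zip(*database))
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=3000).fit(X, y)

# Phase 2: EA driven by the NN surrogate, retrained when it errs too often
wrong = 0
for gen in range(50):
    children = offspring(pop)
    pred = net.predict(children).astype(bool)                # constraints check by NN
    for i in rng.integers(POP, size=3):                      # a few exact spot checks
        truth = feasible_exact(children[i])
        database.append((children[i], truth))
        wrong += int(truth != pred[i])
    if wrong > 5:                                             # retraining criterion (assumed)
        X, y = map(np.array, zip(*database))
        net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=3000).fit(X, y)
        wrong = 0
    survivors = children[pred] if pred.any() else children
    pop = np.array(sorted(survivors, key=weight)[:POP // 2])  # selection on the objective

print("best design weight:", weight(min(pop, key=weight)))
```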
13.4 Optimal Design of Materials
Shown here are several applications of the NN to the optimal design of a variety of materials:
Jayatheertha et al. [5]
The mutually-connected neural networks are applied to the optimum design problem of a composite laminated plate.
Ootao et al. [6]
The five-layer feedforward neural networks are applied to optimization problems of material compositions for a nonhomogeneous hollow sphere with arbitrarily distributed and continuously varied material properties, such as a functionally graded material (Fig. 13.9), where the material compositions that minimize the thermal stress of the hollow sphere are searched for. The feedforward neural network with three
Fig. 13.8 Bus frame generation history for various optimization methodologies. Reprinted from [4] with permission from Elsevier
Fig. 13.9 Analytical model of nonhomogeneous hollow sphere. Reprinted from [6] with permission from Elsevier
hidden layers is given two input values, a control variable for the thermal stress analysis and the temperature of the outer surrounding medium, and outputs a single value, the ratio of the maximum stress to the temperature-dependent tensile or compressive strength (Fig. 13.10).
Fairbairn et al. [7]
The three-layer feedforward neural networks are utilized to evaluate the values characterizing the statistical distribution of material parameters, based on the mean P − δ curve obtained by a Monte-Carlo simulation (Fig. 13.11). The neural network employed has 18 units in the hidden layer, with 26 input values sampled from the response P − δ curve of the specimen and the outputs being the mean and the standard deviation of the tensile strength (Fig. 13.12).
Gotlib et al. [8]
Very small two-layer feedforward neural networks are tested to estimate such material properties as the thermal conductivity, the electrical conductivity, the dielectric constant, the magnetic permeability and the diffusivity in the realm of disordered
Fig. 13.10 Five layer neural network. Reprinted from [6] with permission from Elsevier
Fig. 13.11 Direct analysis. Reprinted from [7] with permission from Elsevier
composites. The neural network employed has four units in the input layer and one unit in the output layer without a hidden layer, and the output unit has a nonlinear activation function of the sigmoid type.
Cetinel et al. [9]
The three-layer feedforward neural networks are applied to the prediction of the microstructure and the properties of steel bars treated by the Tempcore process. From two input values, the diameter of the steel bar and the quenching duration, the neural networks predict seven material parameters: the volume percentages of the
Fig. 13.12 Inverse analysis. Reprinted from [7] with permission from Elsevier
martensite, the bainite, the pearlite and the values of the elongation, the self-tempering temperature, the yield and the tensile strengths. Fratini et al. [10] The five-layer feedforward neural networks are employed to predict the local values of the average size of grains occurring in the friction stir welding process (Fig. 13.13). The neural network employed is five-layered: four units in the input
Fig. 13.13 Sketch of friction stir welding of butt joints. Reprinted from [10] with permission from Elsevier
Fig. 13.14 Comparison between measured and calculated average grain sizes. Reprinted from [10] with permission from Elsevier
layer, three, five and four units in the first, second and third hidden layers, respectively, and a single unit in the output layer, where the input data consist of the local values of the equivalent plastic strain, the strain rate, the temperature and the Zener-Hollomon parameter in the transverse section, and the single output corresponds to the average grain size (Fig. 13.14).
Bessa et al. [11]
A computational framework to assist the design and modelling of new material systems and structures is developed. The framework consists of three general steps for finding new material properties and models: the design of experiments, the computational analysis methods and the machine learning algorithms including the feedforward neural networks (Fig. 13.15).
13.5 Optimization of Production Process
Several applications of the NN to the optimization of production processes are discussed in what follows:
Toparli et al. [12]
The five-layer feedforward neural networks are used for residual stress analyses of cylindrical steel bars (Fig. 13.16). The parameters employed in the neural networks consist of the diameter and the length of a bar and the time as the input, and the temperatures at the center and the surface of the bar as the output. Each hidden layer has 5 units. Figure 13.17 shows the performance of the trained neural network.
Lin [13]
The abductive neural networks [14] are utilized to analyze the hydrodynamic deep drawing of a 3D T-piece design.
Hambli and Guerin [15]
Fig. 13.15 Schematic of global framework for data-driven material systems design/modeling. Reprinted from [11] with permission from Elsevier
Fig. 13.16 Finite element model of cylindrical shape. Reprinted from [12] with permission from Elsevier
Fig. 13.17 Temperatures in center (A) and on surface (B) versus time for cylinder (φ 25 mm × 75 mm). Reprinted from [12] with permission from Elsevier
The three-layer feedforward neural networks are employed to predict the optimum clearance in sheet metal blanking processes (Fig. 13.18). The neural network employed has one unit in the input layer for the material elongation, 15 units in the hidden layer and one unit in the output layer for the optimum clearance (Fig. 13.19). Hambli [16] The feedforward neural networks are used to replace the finite element analyses required in the Monte Carlo simulation for statistical damage analysis of the extrusion processes (Fig. 13.20). The employed neural network has six material parameters as the input and outputs the maximum damage value within the workpiece. Buffa et al. [17]
Fig. 13.18 Illustration of crack propagation angle and diagonal angle. Reprinted from [15] with permission from Elsevier
Fig. 13.19 Optimum clearance versus material elongation. Reprinted from [15] with permission from Elsevier
The three-layer feedforward neural networks are applied to the prediction of the weld integrity in solid bonding dominated processes. The neural network employed has four bonding conditions as the input, the average values of the temperature, the pressure, the strain and the strain rate, and outputs two values related to the quality of the welding (Fig. 13.21).
Chokshi et al. [18]
The four-layer feedforward neural networks are studied for the prediction of the phase distribution after the hot stamping process. The neural network employed has 10 units in the input layer and 3 units in the output layer: the input data consist of 8 timing values sampled from the thermal history, the temperature of deformation and the amount of deformation, and the output data are the three volume fractions of the martensite, the bainite and the ferrite phases.
13.6 Estimation and Control of Dynamic Behaviors of Structures
Shown here are several applications of the NN to the estimation and the control of dynamic behaviors of structures:
Tang [19]
The four-layer feedforward neural networks are used for the active control of single-degree-of-freedom (SDF) systems.
Dovstam and Dalenbring [20]
The three-layer feedforward neural networks are applied to the estimation of the damping function in the augmented Hooke's law. The neural network employed consists of three layers: 1841 units in the input layer for the data sampled from
Fig. 13.20 Flowchart of statistical damage analysis of extrusion process. Reprinted from [16] with permission from Elsevier
frequency spectra, 20 units in the hidden layer and 11 units in the output layer for the parameters of corresponding damping function (Fig. 13.22). Sunar et al. [21] The radial basis function neural networks are applied to the active control of flexible structures (Figs. 13.23 and 13.24). Kuzniar and Waszczyszyn [22] The feedforward neural networks of one or two hidden layers are used to simulate the displacement records in time domain for vibrations of prefabricated buildings. The autoencoders are employed as the data compression and the decompression tools (Fig. 13.25). The master neural network is trained to output the compressed
Fig. 13.21 Comparison between expected output and neural network response: a sound solid bonding occurrence and b Q parameter. Reprinted from [17] with permission from Elsevier
dynamic response of displacements from the input of compressed dynamic excitations. The trained master neural network outputs the compressed dynamic response for a new given excitation, and the compressed output data are then decoded using the decompression tool to recover the response. Figure 13.26 shows the performance of the proposed method.
Lanzi et al. [23]
The four-layer feedforward neural networks are applied to the fast re-analysis of crash behaviors of structural components (Fig. 13.27).
Fall et al. [24]
The four-layer feedforward neural networks are studied for the stability analysis of a controlled aluminium panel. The neural network is given the size and the Young's modulus of a plate as the input, and outputs two parameters related to the stability of the plate.
Ge et al. [25]
Fig. 13.22 Response-to-damping mapping and neural net (NN) structure for augmented Hooke’s law damping parameter estimation. Reprinted from [20] with permission from Springer
The recurrent neural networks are applied to the identification and the control of speed of the ultrasonic motors. The training of the recurrent neural networks, including the determination of their architectural parameters, is performed based on the particle swarm optimization algorithm. Freitag et al. [26] The fuzzy recurrent neural networks shown in Fig. 13.28 are applied to the prediction of time-dependent structural behaviors, such as the prediction of long-term displacements of reinforced concrete plates. Hasançebi and Dumlupınar [27] The three-layer feedforward neural networks are used for the estimation of load ratings of bridges, which is a measure for estimating the live load that a bridge can safely carry. Given six parameters of a bridge, the neural networks output the moment and the shear ratings of the bridge (Fig. 13.29).
13.7 Subjective Evaluation for Handling and Stability of Vehicle
The subjective evaluation for handling and stability of vehicles is required both in the design of vehicles and in the evaluation of vehicle dynamics. However, such evaluations are difficult to perform because of the poor modeling techniques available for complicated man–machine systems. In [28], a procedure for these subjective evaluations using the hierarchical neural networks is developed, where the subjective evaluation scores of
Fig. 13.23 45-bar truss. Reprinted from [21] with permission from Elsevier
test vehicles are obtained from frequency response tests. The method consists of the following three processes:
(1) Samples of evaluation score data are collected through test maneuvers.
Fig. 13.24 Comparison of controller outputs between LQR complete structural controller (solid line) and RBFNN controller (dashed line). Reprinted from [21] with permission from Elsevier
(2) The neural network is trained using the data obtained in (1) above, where the vehicle response characteristics are used as the input and the evaluation scores as the output.
(3) The trained neural network above is employed to predict the evaluation score for vehicle response characteristics.
Fig. 13.25 a Schematic of training phase for data processing to generate input and output vectors for training and testing of master BPNN; b schematic of operational phase of trained networks. Reprinted from [22] with permission from Elsevier
In the paper, 16 patterns for the neural network are sampled from tests of 16 cars of various types. For each car, the 32 response characteristics in Table 13.1 are sampled from several tests, and the 5 subjective evaluation scores in Table 13.2 are evaluated by professional test drivers: the response deadband, the steering feel, the feeling of steering stiffness, the unsteadiness of bounce and the unsteadiness of roll and yaw. The three-layer feedforward neural network with 32 units in the input layer, 16 units in the hidden layer and 5 units in the output layer is trained using 15 of the patterns, and the trained neural network is tested for its generalization capability using the one pattern not used in training. The results show that the subjective evaluation score is successfully estimated within an error of 4% by the present method.
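A minimal sketch of this 32-16-5 network and the leave-one-car-out test is given below, with random placeholder data standing in for the measured response characteristics (Table 13.1) and the drivers' scores (Table 13.2), which are not reproduced here in machine-readable form.

```python
# 32-16-5 network sketch with placeholder data (real inputs/outputs are those
# of Tables 13.1 and 13.2, measured for 16 test vehicles).
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((16, 32))      # 32 response characteristics per car (placeholder)
Y = rng.random((16, 5))       # 5 subjective evaluation scores per car (placeholder)

# Train on 15 cars, test generalization on the remaining one, as in [28]
X_tr, Y_tr, x_te, y_te = X[:15], Y[:15], X[15:], Y[15:]
scaler = StandardScaler().fit(X_tr)

net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=20000, random_state=0)
net.fit(scaler.transform(X_tr), Y_tr)

print("predicted scores:", net.predict(scaler.transform(x_te)).round(2))
print("reference scores:", y_te.round(2))
```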
Fig. 13.26 Comparison of measured and neurally simulated displacement records on 4th floor for: a, b building of C/MBY/V(III) type; c, d building of BSK type. Reprinted from [22] with permission from Elsevier
Fig. 13.27 Comparison between load–time curves obtained by PAMCRASH and by neural network systems: a, b Riveted tubes, c, d Honeycomb structures. Reprinted from [23] with permission from Elsevier
13.8 Others
In addition to the studies above, a variety of papers have been published on neural networks related to structural optimization, as follows:
Nikolaidis and Zhu [29]
The feedforward neural networks are applied to the design of automotive joints. The three-layer feedforward neural networks are used, where several parameters describing the shape of a joint are given as the input data to the neural network, and the stiffness of the joint is the output, or the teacher signal.
Ramasamy and Rajasekaran [30]
The optimization problems of the design of industrial roofs are solved by two methods: one being the optimization by the genetic algorithms and the other the optimization using an expert system combined with the feedforward neural networks.
Gupta and Li [31]
The nonlinear optimization problems of mechanical and structural design are solved by using the mathematical programming neural networks (MPNN), a neural
Fig. 13.28 Partially recurrent neural network for fuzzy data. Reprinted from [26] with permission from Elsevier
network of the Hopfield type. Both unconstrained and constrained optimization problems, formulated as energy minimization problems, are studied using this neural network. This neural network is applied to structural optimization problems in [32] as well.
El-Kassas et al. [33]
The feedforward neural networks are applied to the cold-formed steel design. The neural networks with various numbers of hidden layers, various numbers of units per hidden layer and various activation functions are tested, parametrically changing the size of the training set.
Hadi [34]
The three-layer feedforward neural networks are used to study the optimal design of concrete beams.
Ayhan et al. [35]
The flow geometry optimization of channels with baffles is carried out using the three-layer feedforward neural networks.
Cho et al. [36]
The feedforward neural networks are applied to the shape optimization of a tire tread. The employed neural network, with 17 units in each of the two hidden layers, has seven nodal radii of the outer tread elements as the input data and outputs the nodal contact pressures at 11 outer tread nodes. The trained network could replace the finite element analyses required in the optimization process.
Patnaik et al. [37]
Fig. 13.29 Network architectures; a 6–10–1 architecture for moment LR and b 6–9–1 architecture for shear LR. Reprinted from [27] with permission from Elsevier
The radial basis function neural networks are applied to the design optimization of large systems through a subproblem strategy. Jiang and Adeli [38] The dynamic fuzzy wavelet neural networks are employed to control building structures. Feedforward neural networks employed in combination with other global optimization methods: Papadrakakis et al. [39], Papadrakakis and Lagaros [40], Lagaros et al. [41]
Table 13.1 Response characteristics of vehicle
1. Yaw rate (φ) gain at 0.1 Hz
2. φ phase at 0.1 Hz
3. φ gain at 0.6 Hz
4. φ phase at 0.6 Hz
5. Lateral acceleration (G) gain at 0.1 Hz
6. G phase at 0.1 Hz
7. G gain at 0.6 Hz
8. G phase at 0.6 Hz
9. Steering wheel torque (T) gain at 0.1 Hz
10. T phase at 0.1 Hz
11. T gain at 0.6 Hz
12. T phase at 0.6 Hz
13. Roll rate (ϕ) gain at 0.1 Hz
14. ϕ phase at 0.1 Hz
15. ϕ gain at 0.6 Hz
16. ϕ phase at 0.6 Hz
17. Steering wheel angle (θ) at G = 4 m/s²
18. Roll angle at G = 5 m/s²
19. Phase difference between G and φ at 0.6 Hz
20. φ gain at 1.0 Hz / φ gain at 0.1 Hz
21. ϕ gain at 0.6 Hz × θ
22. T/θ at G = 4 m/s²
23. Horizontal width of θ at G versus θ
24. Horizontal width of θ at φ versus θ
25. Horizontal width of θ at T versus θ
26. Vertical width of T at T versus θ
27. Vertical width of G at G versus θ
28. Vertical width of φ at φ versus θ
29. Horizontal width of T at φ versus T
30. Vertical width of φ at φ versus T
31. Horizontal width of T at G versus T
32. Vertical width of G at G versus T
Table 13.2 Subjective evaluation score
1. Response deadband
2. Steering feel
3. Feeling of steering stiffness
4. Unsteadiness of bounce
5. Unsteadiness of roll and yaw
Combination of the evolutionary strategies and the feedforward neural networks is studied. Giannakoglou et al. [42] Combination of the evolutionary algorithms and the radial basis function neural networks is studied. Polini et al. [43], Marcelin [44, 45] Combination of the genetic algorithms and the feedforward neural networks is studied. Jiang and Adeli [46] Combination of the genetic algorithms and the fuzzy wavelet neural networks is studied. Cheng [47]
Combination with the genetic algorithms for estimating the reliability of long-span suspension bridges is studied.
Alavi and Gandomi [48]
The three-layer feedforward neural networks are employed to predict the peak time-domain characteristics of strong ground motions, where simulated annealing is used for the initialization of the connection weights of the neural networks.
Gomes and Beck [49]
Combination of the particle swarm optimization algorithms and the neural networks of both the feedforward and the radial basis function types is studied.
Giovanis et al. [50]
Combination with Markov chain Monte Carlo sampling in Bayesian updating with structural reliability methods is studied.
Note that, in many of the above works, the neural networks have replaced such time-consuming evaluation processes of individuals as the finite element analyses in the main global optimization loop driven by the GAs, the EAs or other global optimization algorithms.
Research focused on the training of neural networks:
Zhang and Subbarayan [51, 52]
Various training procedures for the feedforward neural networks are tested in relation to the optimal design of structural systems.
Hirschen and Schafer [53]
The feedforward neural networks in combination with the EAs are applied to the shape optimization of a channel junction.
Lee et al. [54]
The feedforward neural networks are applied to the optimization of structures, where additional terms related to various inequality-based constraints on the design of structures are added to the target error function of the neural networks.
Feedforward neural networks used in other applications:
Mukherjee and Deshpande [55]
Generation of a preliminary design model in the design expert system is performed.
Anderson et al. [56]
Predicted is the semi-rigid response of beam-to-column connections.
Cao et al. [57]
Identified are the loads acting on aircraft wings.
Kaveh and Servati [58]
Designed is the double-layer grid, a kind of space structure.
Lee et al. [59]
Surrogate models for the behaviors of a stub-girder system are studied.
Lee and Han [60]
Generated are artificial earthquakes and response spectra.
Cho [61]
Estimated are the shear wave velocities of each layer in multi-layer cement mortar slab systems from the experimental dispersion curve.
Alqedra and Ashour [62], Sakla and Ashour [63]
Predicted are the tensile and shear capacities of anchors.
Hambli et al. [64]
Real-time deformation analyses in virtual reality applications are performed.
Fu et al. [65]
Predicted are the wind-induced pressures on a large gymnasium roof.
Freitag et al. [66]
Lifetime is predicted using accelerated test data.
Hambli [67]
Multiscale analyses of a human femur are performed.
Garzón-Roca et al. [68]
Performed is the estimation of the load-bearing capacity of brick masonry walls.
Froio et al. [69]
Studied is the modelling of superconducting magnet operation in tokamak fusion reactors.
Papadopoulos et al. [70]
Studied are surrogate models for geometrically nonlinear analyses of carbon nanotubes.
Applications of the radial basis function neural networks:
Flood et al. [71]
Analysed are concrete beams reinforced by FRP sheets.
Zhang and Zhang [72]
Predicted is the building interference effect on the wind loads.
Zakasovskaya and Tarasov [73]
Studied is optical fiber imaging based tomography reconstruction from limited data.
Applications of the recurrent neural networks:
Thore [74]
Recurrent neural networks are applied to the design of neuro-mechanical oscillators in biological systems.
Freitag et al. [75]
Performed are real-time predictions of mechanized tunneling with proper orthogonal decomposition (POD).
Applications of other kinds of neural networks:
Arslan and Hajela [76]
Counter-propagation neural networks are applied to decomposition-based design optimization.
Kallassy [77]
NANNs (new architecture of neural networks) are studied for the buckling design of a compression panel.
Beltzer and Sato [78]
Self-organizing maps are applied to the classification of finite elements.
Yuen et al. [79]
General regression neural networks and their extended versions are employed for seismic attenuation modeling.
References 1. Lin, C.-Y., Lin, S.-H.: Artificial neural network based hole image interpretation techniques for integrated topology and shape optimization. Comput. Methods Appl. Mech. Eng. 194, 3817–3837 (2005) 2. Tang, Y.-C., Zhou, X.-H., Chen, J.: Preform tool shape optimization and redesign based on neural network response surface methodology. Finite Elem. Anal. Des. 44, 462–471 (2008) 3. McKay, M.D., Beckman, R.J., Conover, W.J.: A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21(2), 239–245 (1979) 4. Lagaros, N.D., Charmpis, D.C., Papadrakakis, M.: An adaptive neural network strategy for improving the computational performance of evolutionary structural optimization. Comput. Methods Appl. Mech. Eng. 194, 3374–3393 (2005) 5. Jayatheertha, C., Webber, J.P.H., Morton, S.K.: Application of artificial neural networks for the optimum design of a laminated plate. Comput. Struct. 59(5), 831–845 (1996) 6. Ootao, Y., Kawamura, R., Tanigawa, Y.: Optimization of material composition of nonhomogeneous hollow sphere for thermal stress relaxation making use of neural network. Comput. Methods Appl. Mech. Eng. 180, 185–201 (1999) 7. Fairbairn, E.M.R., Ebecken, N.F.F., Paz, C.N.M., Ulm, F.-J.: Determination of probabilistic parameters of concrete: solving the inverse problem by using artificial neural networks. Comput. Struct. 78, 497–503 (2000) 8. Gotlib, V.A., Sato, T., Beltzer, A.I.: Neural computing of effective properties of random composite materials. Comput. Struct. 79, 1–6 (2001) 9. Cetinel, H., Ozyigit, H.A., Ozsoyeller, L.: Artificial neural networks modeling of mechanical property and microstructure evolution in the Tempcore process. Comput. Struct. 80, 213–218 (2002) 10. Fratini, L., Buffa, G., Palmeri, D.: Using a neural network for predicting the average grain size in friction stir welding processes. Comput. Struct. 87, 1166–1174 (2009) 11. Bessa, M.A., Bostanabad, R., Liu, Z., Hu, A., Apley, D.W., Brinson, C., Chen, W., Liu, W.K.: A framework for data-driven analysis of materials under uncertainity: countering the curse of dimensionality. Comput. Methods Appl. Mech. Eng. 320, 633–667 (2017)
12. Toparli, M., Sahin, S., Ozkaya, E., Sasaki, S.: Residual thermal stress analysis in cylindrical steel bars using finite element method and artificial neural networks. Comput. Struct. 80, 1763–1770 (2002) 13. Lin, J.C.: Using FEM and neural network prediction on hydrodynamic deep drawing of T-piece maximum length. Finite Elem. Anal. Des. 39, 445–456 (2003) 14. Montgomery, G.J., Drake, K.C.: Abductive reasoning network. Neurocomputing 2, 97–104 (1991) 15. Hambli, R., Guerin, F.: Application of neural network for optimum clearance prediction in sheet metal blanking processes. Finite Elem. Anal. Des. 39, 1039–1052 (2003) 16. Hambli, R.: Statistical damage analysis of extrusion processes using finite element method and neural networks simulation. Finite Elem. Anal. Des. 45, 640–649 (2009) 17. Buffa, G., Patrinostro, G., Fratini, L.: Using a neural network for qualitative and quantitative predictions of weld integrity in solid bonding dominated processes. Comput. Struct. 135, 1–9 (2014) 18. Chokshi, P., Dashwood, R., Hughes, D.J.: Artificial Neural Network (ANN) based microstructural prediction model for 22MnB5 boron steel during tailored hot stamping. Comput. Struct. 190, 162–172 (2017) 19. Tang, Y.: Active control of SDF systems using artificial neural networks. Comput. Struct. 60(5), 695–703 (1996) 20. Dovstam, K., Dalenbring, M.: Damping function estimation based on modal receptance models and neural nets. Comput. Mech. 19, 271–286 (1997) 21. Sunar, M., Gurain, A.M.A., Mohandes, M.: Substructural neural network controller. Comput. Struct. 78, 575–581 (2000) 22. Kuzniar, K., Waszczyszyn, Z.: Neural simulation of dynamic response of prefabricated buildings subjected to paraseismic excitations. Comput. Struct. 81, 2353–2360 (2003) 23. Lanzi, L., Bisagni, C., Ricci, S.: Neural network systems to reproduce crash behavior of structural components. Comput. Struct. 82, 93–108 (2004) 24. Fall, H., Guessasma, S., Charon, W.: Stability analysis of a controlled aluminium panel using neural network methodology. Comput. Struct. 84, 835–842 (2006) 25. Ge, H.-W., Liang, Y.-C., Marchese, M.: A modified particle swarm optimization-based dynamic recurrent neural network for identifying and controlling nonlinear systems. Comput. Struct. 85, 1611–1622 (2007) 26. Freitag, S., Graf, W., Kaliske, M., Sickert, J.-U.: Prediction of time-dependent structural behaviour with recurrent neural networks for fuzzy data. Comput. Struct. 89, 1971–1981 (2011) 27. Hasançebi, O., Dumlupınar, T.: Detailed load rating analyses of bridge populations using nonlinear finite element models and artificial neural networks. Comput. Struct. 128, 48–63 (2013) 28. Matsuda, A., Yoshimura, S., Yagawa, G., Hirata, T.: Subjective evaluations for handling and stability of vehicle using hierarchical neural networks. In: Proceedings of the International Conference on Computational Engineering Science, Hawaii, July 30–Aug. 3, pp. 76–81 (1995) 29. Nikolaidis, E., Zhu, M.: Design of automotive joints: using neural networks and optimization to translate performance requirements to physical design parameters. Comput. Struct. 60(6), 989–1001 (1996) 30. Ramasamy, J.V., Rajasekaran, S.: Artificial neural network and genetic algorithm for the design optimization of industrial roofs—a comparison. Comput. Struct. 58(4), 747–755 (1996) 31. Gupta, K.C., Li, J.: Robust design optimization with mathematical programming neural networks. Comput. Struct. 76, 507–516 (2000) 32. Adeli, H., Park, H.S.: A neural dynamics model for structural optimization—theory. 
Comput. Struct. 57(3), 383–390 (1995) 33. El-Kassas, E.M.A., Mackie, R.I., El-Sheikh, A.I.: Using neural networks in cold-formed steel design. Comput. Struct. 79, 1687–1696 (2001) 34. Hadi, M.N.S.: Neural networks applications in concrete structures. Comput. Struct. 81, 373–381 (2003)
35. Ayhan, T., Karlik, B., Tandiroglu, A.: Flow geometry optimization of channels with baffles using neural networks and second law of thermodynamics. Comput. Mech. 33, 139–143 (2004) 36. Cho, J.R., Shin, S.W., Yoo, W.S.: Crown shape optimization for enhancing tire wear performance by ANN. Comput. Struct. 83, 920–933 (2005) 37. Patnaik, S.N., Guptill, J.D., Hopkins, D.A.: Subproblem optimization with regression and neural network approximators. Comput. Methods Appl. Mech. Eng. 194, 3359–3373 (2005) 38. Jiang, X., Adeli, H.: Dynamic fuzzy wavelet neuroemulator for non-linear control of irregular building structures. Int. J. Numer. Meth. Eng. 74, 1045–1066 (2008) 39. Papadrakakis, M., Lagaros, N.D., Tsompanakis, Y.: Structural optimization using evolution strategies and neural networks. Comput. Methods Appl. Mech. Eng. 156, 309–333 (1998) 40. Papadrakakis, M., Lagaros, N.D.: Reliability-based structural optimization using neural networks and Monte Carlo simulation. Comput. Methods Appl. Mech. Eng. 191, 3941–3507 (2002) 41. Lagaros, N.D., Garavelas, ATh., Papadrakakis, M.: Innovative seismic design optimization with reliability constraints. Comput. Methods Appl. Mech. Eng. 198, 28–41 (2008) 42. Giannakoglou, K.C., Papadimitriou, D.I., Kampolis, I.C.: Aerodynamic shape design using evolutionary algorithms and new gradient-assisted metamodels. Comput. Methods Appl. Mech. Eng. 195, 6312–6329 (2006) 43. Polini, C., Giurgevich, A., Onesti, L., Pediroda, V.: Hybridization of a multi-objective genetic algorithm, a neural network and a classical optimizer for a complex design problem in fluid dynamics. Comput. Methods Appl. Mech. Eng. 186, 403–420 (2000) 44. Marcelin, J.L.: Genetic optimization of stiffened plates and shells. Int. J. Numer. Meth. Eng. 51, 1079–1088 (2001) 45. Marcelin, J.L.: Genetic optimization of stiffened plates without the FE mesh support. Int. J. Numer. Meth. Eng. 54, 685–694 (2002) 46. Jiang, X., Adeli, H.: Neuro-genetic algorithm for non-linear active control of structures. Int. J. Numer. Meth. Eng. 75, 770–786 (2008) 47. Cheng, J.: An artificial neural network based genetic algorithm for estimating the reliability of long span suspension bridges. Finite Elem. Anal. Des. 46, 658–667 (2010) 48. Alavi, A.M., Gandomi, A.H.: Prediction of principal ground-motion parameters using a hybrid method coupling artificial neural networks and simulated annealing. Comput. Struct. 89, 2176– 2194 (2011) 49. Gomes, W.J.S., Beck, A.T.: Global structural optimization considering expected consequences of failure and using ANN surrogates. Comput. Struct. 126, 56–68 (2013) 50. Giovanis, D.G., Papaioannou, I., Straub, D., Papadopoulos, V.: Bayesian updating with subset simulation using artificial neural networks. Comput. Methods Appl. Mech. Eng. 319, 124–145 (2017) 51. Zhang, L., Subbarayan, G.: An evaluation of back-propagation neural networks for the optimal design of structural systems: Part I. Training procedures. Comput. Methods Appl. Mech. Eng. 191, 2873–2886 (2002) 52. Zhang, L., Subbarayan, G.: An evaluation of back-propagation neural networks for the optimal design of structural systems: Part II. Numerical evaluation. Comput. Methods Appl. Mech. Eng. 191, 2887–2904 (2002) 53. Hirschen, K., Schafer, M.: Bayesian regularization neural networks for optimizing fluid flow processes. Comput. Methods Appl. Mech. Eng. 195, 481–500 (2006) 54. Lee, J., Jeong, H., Choi, D.-H., Volovoi, V., Marvis, D.: An enhancement of consistent feasibility in BPN based approximate optimization. Comput. Methods Appl. 
Mech. Eng. 196, 2147–2160 (2007) 55. Mukherjee, A., Deshpande, J.M.: Application of artificial neural networks in structural design expert systems. Comput. Struct. 54(3), 367–375 (1995) 56. Anderson, D., Hines, E.L., Arthur, S.J., Eiap, E.L.: Application of artificial neural network to the prediction of minor axis steel connections. Comput. Struct. 63(4), 685–692 (1997) 57. Cao, X., Sugiyama, Y., Mitsui, Y.: Application of artificial neural networks to load identification. Comput. Struct. 69, 63–78 (1998)
58. Kaveh, A., Servati, H.: Design of double layer grids using backpropagation neural networks. Comput. Struct. 79, 1561–1568 (2001) 59. Lee, S.C., Park, S.K., Lee, B.H.: Development of the approximate analytical model for the stub-girder system using neural network. Comput. Struct. 79, 1013–1025 (2001) 60. Lee, S.C., Han, S.W.: Neural-network-based models for generating artificial earthquakes and response spectra. Comput. Struct. 80, 1627–1638 (2002) 61. Cho, Y.-S.: Dispersive characteristic measurement of multi-layer cement mortar slabs using SASW method and neural network. Comput. Struct. 81, 2491–2499 (2003) 62. Alqedra, M.A., Ashour, A.F.: Prediction of shear capacity of single anchors located near a concrete edge using neural networks. Comput. Struct. 83, 2495–2502 (2005) 63. Sakla, S.S.S., Ashour, A.F.: Prediction of tensile capacity of single adhesive anchors using neural networks. Comput. Struct. 83, 1792–1803 (2005) 64. Hambli, R., Chamekh, A., Salah, H.B.H.: Real-time deformation of structure using finite element and neural networks in virtual reality applications. Finite Elem. Anal. Des. 42, 985–991 (2006) 65. Fu, J.Y., Liang, S.G., Li, Q.S.: Prediction of wind-induced pressures on a large gymnasium roof using artificial neural networks. Comput. Struct. 85, 179–192 (2007) 66. Freitag, S., Beer, M., Graf, W., Kaliske, M.: Lifetime prediction using accelerated test data and neural networks. Comput. Struct. 87, 1187–1194 (2009) 67. Hambli, R.: Numerical procedure for multiscale bone adaptation prediction based on neural networks and finite element simulation. Finite Elem. Anal. Des. 47, 835–842 (2011) 68. Garzón-Roca, J., Adam, J.M., Sandoval, Roca, P.: Estimation of the axial behaviour of masonry walls based on Artificial Neural Networks. Comput. Struct. 125, 145–152 (2013) 69. Froio, A., Bonifetto, R., Carli, S., Quartararo, A., Savoldi, L., Zanino, R.: Design and optimization of artificial neural networks for the modelling of superconducting magnets operation in tokamak fusion reactors. J. Comput. Phys. 321, 476–491 (2016) 70. Papadopoulos, V., Soimiris, G., Giovanis, D.G., Papadrakakis, M.: A neural network-based surrogate model for carbon nanotubes with geometric nonlinearities. Comput. Methods Appl. Mech. Eng. 328, 411–430 (2018) 71. Flood, I., Muszynski, L., Nandy, S.: Rapid analysis of externally reinforced concrete beams using neural networks. Comput. Struct. 79, 1553–1559 (2001) 72. Zhang, A., Zhang, L.: RBF neural networks for the prediction of building interference effects. Comput. Struct. 82, 2333–2339 (2004) 73. Zakasovskaya, E.V., Tarasov, V.S.: Optical fiber imaging based tomography reconstruction from limited data. Comput. Methods Appl. Mech. Eng. 328, 542–553 (2018) 74. Thore, C.-J.: Optimal design of neuro-mechanical oscillators. Comput. Struct. 119, 189–202 (2013) 75. Freitag, S., Cao, B.T., Ninic, J., Meschke, G.: Recurrent neural networks and proper orthogonal decomposition with interval data for real-time predictions of mechanised tunnelling processes. Comput. Struct. 207, 258–273 (2018) 76. Arslan, M.A., Hajela, P.: Counterpropagation neural networks in decomposition based optimal design. Comput. Struct. 65(5), 641–650 (1997) 77. Kallassy, A.: A new neural network for response estimation. Comput. Struct. 81, 2417–2429 (2003) 78. Beltzer, A.I., Sato, T.: Neural classification of finite elements. Comput. Struct. 81, 2331–2335 (2003) 79. Yuen, K.-V., Oritz, G.A., Huang, K.: Novel nonparametric modeling of seismic attenuation and directivity relationship. 
Comput. Methods Appl. Mech. Eng. 311, 537–555 (2016)
Chapter 14
Some Notes on Applications of Neural Networks to Computational Mechanics
Abstract Chapter 14 compares the performance of neural networks with that of other methods in applications to computational mechanics (Sect. 14.1) and reviews improvements of neural networks aimed at such applications (Sect. 14.2).
14.1 Comparison among Neural Networks and Other AI Technologies

Several studies have compared the performance of various neural networks and other methods in applications to computational mechanics:

Hurtado and Alvarez [1]
The feedforward neural networks are compared with the radial basis function neural networks in applications to the probabilistic analysis of structures.

Rafiq et al. [2]
Three types of neural networks, the feedforward neural networks, the radial basis function networks and the normalized radial basis function networks, are compared with respect to their performance in predicting the optimal design of a concrete slab.

Milano and Koumoutsakos [3]
The feedforward neural network and the proper orthogonal decomposition (POD) are compared in reconstructing the near-wall flow in a turbulent channel flow from flow fields provided by direct numerical simulations. The POD is equivalent to a specific linear neural network that employs linear activation functions, so a nonlinear neural network that employs nonlinear activation functions can be regarded as an extension of the POD. It is reported in the paper that the neural network shows better performance than the POD.

Lee and Shin [4]
The feedforward neural networks are compared with the wavelet neural networks for meta-modeling in constrained approximate optimization.
Garijo et al. [5]
The feedforward neural networks are compared with the support vector machines and linear regression with respect to their performance in the prediction of proximal femur loads from bone morphology (Fig. 14.1 and Table 14.1).
Fig. 14.1 Schematic diagram of computational approach. Reprinted from [5] with permission from Elsevier
Table 14.1 Absolute relative error (RE) and correlation coefficient (RSQ) of the learning techniques analyzed

                                   Linear regression   ANN (15 neurons)   SVM
Force magnitude   RE training (%)  0.09727             0.00107            9.199
                  RE testing (%)   0.10533             0.00273            0.932
                  RSQ testing      0.99977             0.99999            0.976
Force angle       RE training (%)  0.25842             0.00257            6.001
                  RE testing (%)   0.55726             0.00522            1.441
                  RSQ testing      0.99718             0.99999            0.947
Force position    RE training (%)  0.61120             0.00255            0.0074
                  RE testing (%)   0.60468             0.00508            0.002
                  RSQ testing      0.99988             0.99999            0.9988
Total             RE training (%)  0.96689             0.0062             15.207
                  RE testing (%)   1.26727             0.0130             2.375
                  RSQ testing      0.998943            0.99999            0.9739

Reprinted from [5] with permission from Elsevier
Fig. 14.2 Errors as a function of number of training rotations for a random forests and b neural networks in turbulence modeling case study. Reprinted from [6] with permission from Elsevier
Ling et al. [6]
The feedforward neural network and the random forest are compared with respect to their ability to learn invariance and symmetry properties, such as rotational invariance; it is concluded that both methods show very similar performance (Fig. 14.2).
In addition to the studies mentioned above, the neural network-based response surface method is compared with the polynomial-based response surface method in the prediction of the performance characteristics of automotive joints (Fig. 14.3) [7] and in solving inverse reliability problems of steel structures [8]. The feedforward neural networks are applied to the prediction of dam behavior, where their performance is compared with that of other machine learning methods [9, 10]. The former claims that artificial neural networks may perform better than other machine learning methods because they have more tunable hyper-parameters, while the latter argues that, owing to their lack of extrapolation capability, artificial neural networks require more data than other methods to achieve good generalization.
14.2 Improvements of Neural Networks in Terms of Applications to Computational Mechanics

Several works have been reported on improvements of the training data sampling methods, the training algorithms and the network architectures:

Topping et al. [11]
A parallel training of the feedforward neural network is implemented on a network of processors, where the distribution of workload is based on data parallelism.
Fig. 14.3 Comparison of polynomial and neural network results. Reprinted from [7] with permission from Elsevier
Yilbas and Hashmi [12]
A weight pruning method for pattern classification problems is proposed, based on the comparison of the absolute values of the weights in the same layer; the non-contributing weights are deleted to obtain an optimally compact neural network for faster computation.

Pei et al. [13]
A method is studied for analytically constructing a feedforward neural network equivalent to a given polynomial expression consisting of not only single-variable terms but also cross terms, where the feedforward neural network is treated as a polynomial fitting and the appropriate number of hidden units is automatically determined. A fixed-weight training for this method is also proposed, where the connection weights related to the coefficients of the polynomials are trained while those inherited from the fitting of each individual polynomial term are kept fixed.

Chakraverty et al. [14]
An initialization method of connection weights is proposed for the feedforward neural networks based on the assumption that the outputs are approximated by polynomials of the input data, where the number of units in the hidden layer is automatically set to the number of terms in the polynomial expression and the initial
connection weights are also automatically determined by the least-squares method based on the polynomial approximation.

Jenkins [15]
A mutation operation is applied to the training of the feedforward neural networks. The algorithm is summarized as follows (a minimal sketch of this loop is given at the end of this section):
(1) Two identical neural networks, labeled 1 and 2, are prepared and their fitness values are calculated.
(2) A weight of network 2 is chosen at random and mutated to increase its value. Then, the fitness of the network is updated.
(3) If the fitness of network 2 is better than that of network 1, network 2 is copied to network 1 and the procedure returns to (2). Otherwise, go to (4).
(4) The weight is mutated to decrease its value and the fitness of the network is updated.
(5) If the fitness of network 2 is better than that of network 1, network 2 is copied to network 1.
(6) Go to (2).

Slonski [16]
Model selection methods to determine the optimal structure of a feedforward neural network, especially the number of hidden units, are compared among three approaches: the validation set approach, the maximum marginal likelihood approach and the full Bayesian approach. It is concluded that the second one is the most effective.

Teichert et al. [17]
An integrable deep neural network (IDNN) is employed to identify the free energy of material systems. While the free energy is rarely measured or computed directly, its derivatives are first observed or computed and then integrated to find the free energy of the system. Thus, in the IDNN, a neural network is first trained to simulate the observable data, and then the weights of the trained network are transferred to the network that simulates the free energy. The method is tested in phase field simulations.

Yang and Perdikaris [18]
A probabilistic learning methodology that enables the construction of predictive data-driven surrogate models is proposed, in which the generative adversarial networks methodology [19] is employed. It is concluded that the proposed models can be effectively trained with noisy data and provide accurate predictions with uncertainty estimates.

In addition to the above studies, some studies discuss the selection of the training data for neural networks. For example, the uniform design method [20] is applied to the selection of training data, which results in a significant reduction of training data [21, 22], and the Latin hypercube sampling is employed to reduce the size of the training set [23].
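A minimal Python sketch of the mutation loop of Jenkins [15], referred to above, is given below; the `weights` attribute, the `fitness` callback and the step size are assumptions of this sketch rather than details taken from the paper.

```python
import copy
import random

def mutation_training(network, fitness, step=0.1, n_iters=10000):
    """Hill-climbing weight training by mutation, following steps (1)-(6) above.
    `network.weights` is assumed to be a flat list of floats and
    `fitness(network)` a value to be maximized."""
    best = copy.deepcopy(network)                 # network 1
    best_fit = fitness(best)
    for _ in range(n_iters):
        trial = copy.deepcopy(best)               # network 2, identical copy
        k = random.randrange(len(trial.weights))  # pick one weight at random
        trial.weights[k] += step                  # (2) mutate upward
        trial_fit = fitness(trial)
        if trial_fit <= best_fit:                 # (4) otherwise mutate downward
            trial.weights[k] -= 2.0 * step
            trial_fit = fitness(trial)
        if trial_fit > best_fit:                  # (3)/(5) keep the improvement
            best, best_fit = trial, trial_fit
    return best
```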
References 1. Hurtado, J.E., Alvarez, D.A.: Neural-network-based reliability analysis: a comparative study. Comput. Methods Appl. Mech. Eng. 191, 113–132 (2001) 2. Rafiq, M.Y., Bugmann, G., Easterbrook, D.J.: Neural network design for engineering applications. Comput. Struct. 79, 1541–1552 (2001) 3. Milano, M., Koumoutsakos, P.: Neural network modeling for near wall turbulent flow. J. Comput. Phys. 182, 1–26 (2002) 4. Lee, J., Shin, K.H.: A conservative method of wavelet neural network based meta-modeling in constrained approximate optimization. Comput. Struct. 89, 109–126 (2011) 5. Garijo, N., Martinez, J., Garcia-Aznar, J.M., Perez, M.A.: Computational evaluation of different numerical tools for the prediction of proximal femur loads from bone morphology. Comput. Methods Appl. Mech. Eng. 268, 437–450 (2014) 6. Ling, J., Jones, R., Templeton, J.: Machine learning strategies for systems with invariance properties. J. Comput. Phys. 318, 22–35 (2016) 7. Nikolaidis, E., Long, L., Ling, Q.: Neural networks and response surface polynomials for design of vehicle joints. Comput. Struct. 75, 593–607 (2000) 8. Cheng, J., Li, Q.S.: Application of the response surface methods to solve inverse reliability problems with implicit response functions. Comput. Mech. 43, 451–459 (2009) 9. Salazar, F., Toledo, M.A., Morán, R., Oñate, E.: An empirical comparison of machine learning techniques for dam behaviour modelling. Struct. Saf. 56, 9–17 (2015) 10. Salazar, F., Moran, R., Toledo, M.A., Oñate, E.: Data-based models for the prediction of dam behaviour: a review and some methodological considerations. Arch. Comput. Methods Eng. 24(1), 1–21 (2017) 11. Topping, B.H.V., Khan, A.I., Bahreininejad, A.: Parallel training of neural networks for finite element mesh decomposition. Comput. Struct. 63(4), 693–707 (1997) 12. Yilbas, Z., Hashmi, M.S.J.: Simulation of weight pruning process in backpropagation neural network for pattern classification: a self-running threshold approach. Comput. Methods Appl. Mech. Eng. 166, 233–246 (1998) 13. Pei, J.-S., Wright, J.P., Smyth, A.W.: Mapping polynomial fitting into feedforward neural networks for modeling nonlinear dynamic systems and beyond. Comput. Methods Appl. Mech. Eng. 194, 4481–4505 (2005) 14. Chakraverty, S., Singh, V.P., Sharma, R.K.: Regression based weight generation algorithm in neural network for estimation of frequencies of vibrating plates. Comput. Methods Appl. Mech. Eng. 195, 4194–4202 (2006) 15. Jenkins, W.M.: Neural network weight training by mutation. Comput. Struct. 84, 2107–2112 (2006) 16. Slonski, M.: A comparison of model selection methods for compressive strength prediction of high-performance concrete using neural networks. Comput. Struct. 88, 1248–1253 (2010) 17. Teichert, G.H., Natarajan, A.R., Van der Ven, A., Garikipati, K.: Machine learning materials physics: integrable deep neural networks enable scale bridging by learning free energy functions. Comput. Methods Appl. Mech. Eng. 353, 201–216 (2019) 18. Yang, Y., Perdikaris, P.: Conditional deep surrogate models for stochastic, high-dimensional, andmulti-fidelity systems. Comput. Mech. 64, 417–434 (2019) 19. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016) 20. Fang, K.T., Wang, Y.: Number-Theoretic Methods in Statistics. CRC Press (1993) 21. Cheng, J., Li, Q.S.: Reliability analysis of structures using artificial neural network based genetic algorithms. Comput. Methods Appl. Mech. Eng. 197, 3742–3750 (2008) 22. 
Cheng, J., Li, Q.S.: A hybrid artificial neural network method with uniform design for structural optimization. Comput. Mech. 44, 61–71 (2009) 23. Papadopoulos, V., Giovanis, D.G., Lagaros, N.D., Papadrakakis, M.: Accelerated subset simulation with neural networks for reliability analysis. Comput. Methods Appl. Mech. Eng. 223–224, 70–80 (2012)
Chapter 15
Other AI Technologies for Computational Mechanics
Abstract Chapter 15 describes some applications of machine learning methods other than neural networks to computational mechanics. They cover most of the categories discussed in Chaps. 8 through 13. Topics given here include the parameter identification of a constitutive model using an evolutionary algorithm (Sect. 15.1), the construction of a constitutive model using genetic programming (Sect. 15.2), the data-driven analysis (Sect. 15.3), the numerical quadrature using symbolic manipulation (Sect. 15.4), the contact search using a genetic algorithm (Sect. 15.5), the contact search using genetic programming (Sect. 15.6), non-linear equation systems solved with a genetic algorithm (Sect. 15.7), and other applications using various machine learning methods (Sects. 15.8–15.10).
15.1 Parameter Identification of Constitutive Model

The parameter identification problem of a viscoplastic constitutive material model (Figs. 15.1 and 15.2; see Sect. 8.1) is discussed with the evolutionary algorithm, a kind of genetic algorithm [1]. While the procedure of the evolutionary algorithm is equivalent to that of the genetic algorithm, the crossover operation in the standard GA is here replaced by the recombination operation. For two randomly selected individuals $x^{\alpha} = \left( x_1^{\alpha}, x_2^{\alpha}, \dots, x_n^{\alpha} \right)$ and $x^{\beta} = \left( x_1^{\beta}, x_2^{\beta}, \dots, x_n^{\beta} \right)$, the recombination operation generates two new individuals $x'^{\alpha}$ and $x'^{\beta}$, respectively, as

$$x'^{\alpha} = (1 - \mu)\, x^{\alpha} + \mu\, x^{\beta} \tag{15.1.1}$$

$$x'^{\beta} = (1 - \mu)\, x^{\beta} + \mu\, x^{\alpha} \tag{15.1.2}$$
where $\mu$ is drawn either from a normal distribution with mean 0 and standard deviation $\sigma$, i.e. $\mu = N(0, \sigma^2)$, or as a uniform random value between specified minimum and maximum limits $\mu_{\min}$ and $\mu_{\max}$, respectively.
Fig. 15.1 Cyclic loading test. Reprinted from [1] with permission from Wiley
Fig. 15.2 Hysteresis loops of cyclic loading test. Reprinted from [1] with permission from Wiley
For each individual, the fitness is evaluated as

$$\mathrm{fitness} = \frac{1}{\sum_{i=1}^{m} w_i \left( \sigma_i^{*} - \bar{\sigma}_i \left( \varepsilon_i^{*}, x \right) \right)^2} \tag{15.1.3}$$

where $\sigma_i^{*}$ and $\varepsilon_i^{*}$ are the $i$-th measured stress and strain, respectively, $\bar{\sigma}_i(\varepsilon_i^{*}, x)$ the corresponding stress calculated from the material model with the measured $\varepsilon_i^{*}$ and the parameters given by the individual, and $w_i$ a weighting factor. In the paper, several thousand generations of the evolutionary algorithm are successfully performed with a population size of 50 (Fig. 15.3). In [2], an automated system based on the method described above is developed for the parameter identification of inelastic constitutive models (Fig. 15.4).
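The recombination and fitness evaluation of Eqs. (15.1.1)–(15.1.3) translate directly into code. The following Python sketch assumes a user-supplied constitutive routine `model(strain, x)` returning the computed stress; that routine and all parameter values are placeholders, not details from [1].

```python
import numpy as np

def recombine(x_a, x_b, sigma=0.1):
    """Recombination of Eqs. (15.1.1)-(15.1.2): blend two parameter vectors
    with a mixing factor mu drawn from N(0, sigma^2)."""
    mu = np.random.normal(0.0, sigma)
    return (1.0 - mu) * x_a + mu * x_b, (1.0 - mu) * x_b + mu * x_a

def fitness(x, strain_meas, stress_meas, weights, model):
    """Fitness of Eq. (15.1.3): reciprocal of the weighted squared misfit
    between measured stresses and stresses computed by the constitutive model."""
    stress_calc = np.array([model(eps, x) for eps in strain_meas])
    return 1.0 / np.sum(weights * (stress_meas - stress_calc) ** 2)
```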
Fig. 15.3 Sample histories of minimization of objective function value. Reprinted from [1] with permission from Wiley
15.2 Constitutive Material Model by Genetic Programming

In [3], a material model is developed from experimental data with genetic programming (see Sect. 6.2). While neural networks construct practically implicit expressions of a material model, genetic programming can construct explicit ones, which provides various merits. In the paper, expert knowledge is introduced into the standard genetic programming. The method is successfully applied to the modelling of plastic flow at high temperatures and high strain rates. Figure 15.5 shows the comparison between experimental results and computed profiles obtained using (a) the standard MTS (Mechanical Threshold Stress) model, (b) the standard PTW (Preston-Tonks-Wallace) model, (c) the simplest symbolic regression (i.e. genetic programming) model without expert knowledge, (d) the strength model obtained from stress–strain data with additional expert knowledge, (e) the Voce hardening expression from genetic symbolic regression, and (f) the Voce hardening from symbolic regression and expert knowledge on the saturation stress, respectively.
15.3 Data-Driven Analysis Without Material Modelling

In [4], another paradigm for material modelling is developed, where the analyses are carried out directly from experimental data without any material model. The material data are selected from the finite set of data obtained experimentally, then the
Fig. 15.4 Screen shots of automated system. Reprinted from [2] with permission from Elsevier
Fig. 15.5 Computed profiles vs. experimental results, initial velocity 146 m/s at room temperature. Reprinted from [3] with permission from Elsevier
problems to be solved, such as structural or stress analyses, are transformed into the problem of finding the most appropriate assignment of material data points that minimizes the sum of the strain and the complementary energy of the system under the constraints of equilibrium and compatibility. With such experimental data as the strain or the stress measured at discrete points in the state space (Fig. 15.6), the analysis of a truss structure proceeds as follows (a minimal sketch of this iteration is given below):
(1) Data points for strain or stress are randomly selected and assigned to each bar in the truss structure.
(2) Equations derived from the minimization condition are solved using the assigned material data.
(3) Based on the results obtained in (2), data points for strain or stress are calculated for each bar.
(4) For each bar in the truss, the data points for strain or stress in the experimental data that are closest to those calculated in (3) are selected as new candidate data points.
(5) If the data points used in (2) and those selected in (4) coincide for every bar in the truss, the solution is considered to have converged. Otherwise, the iteration returns to (2) with the new candidate data points.
The method is also tested on linear elasticity analyses in the paper. Figure 15.7 shows that the value of F, the penalty functional to be minimized, decays with increasing data resolution for both mesh resolutions, while Fig. 15.8 shows that the RMS errors decay linearly in the data resolution for both stresses (σ) and strains (ε).
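To show the structure of this alternating iteration in code, the following Python sketch handles the truss case. The constrained-minimization step (2) is problem specific and is represented by a user-supplied callback `project_onto_constraints`; that callback, the weighting constant `c_weight` and all other names are assumptions of this sketch rather than details of [4].

```python
import numpy as np

def data_driven_solve(n_bars, data_points, project_onto_constraints,
                      c_weight=1.0, max_iter=100):
    """Alternating data-assignment iteration sketched in steps (1)-(5) above.
    data_points: (n_data, 2) array of measured (strain, stress) pairs.
    project_onto_constraints(assigned): returns the (n_bars, 2) strain/stress
    state closest to the assigned data that satisfies equilibrium and
    compatibility (problem specific, assumed to be supplied by the user)."""
    rng = np.random.default_rng(0)
    idx = rng.integers(0, len(data_points), size=n_bars)   # step (1)
    for _ in range(max_iter):
        assigned = data_points[idx]
        state = project_onto_constraints(assigned)          # steps (2)-(3)
        # step (4): closest data point per bar in a weighted state-space metric
        d_eps = state[:, None, 0] - data_points[None, :, 0]
        d_sig = state[:, None, 1] - data_points[None, :, 1]
        dist = c_weight * d_eps**2 + d_sig**2 / c_weight
        new_idx = np.argmin(dist, axis=1)
        if np.array_equal(new_idx, idx):                    # step (5): converged
            break
        idx = new_idx
    return data_points[idx], idx
```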
Fig. 15.6 Typical material data set for truss bar. Reprinted from [4] with permission from Elsevier
15.4 Numerical Quadrature

As described in Appendix A1.1, the element stiffness matrices in the finite element method are usually calculated using a numerical quadrature such as the Gauss–Legendre quadrature (see Sect. 9.1). In [5], a numerical integration scheme based on symbolic manipulation is developed to reduce the computation time, where REDUCE [6], a kind of expert system, is employed as the symbolic manipulation system. To make the basic idea clear, a simple integration of an m-th order polynomial is considered:

$$f(x) = \sum_{j=0}^{m} c_j x^j = c_0 + c_1 x + c_2 x^2 + \cdots + c_m x^m \tag{15.4.1}$$

Using the Gauss–Legendre quadrature, we have

$$\int_{-1}^{1} f(x) \, dx \cong \sum_{i=1}^{n} f(x_i)\, w_i = \sum_{i=1}^{n} \left( c_0 + c_1 x_i + \cdots + c_m x_i^m \right) w_i \tag{15.4.2}$$
Fig. 15.7 Linear-elastic tensile specimen. Convergence of local material-data assignment iteration. Reprinted from [4] with permission from Elsevier
where $x_i$ and $w_i$ are, respectively, the coordinate value and the corresponding weight of the Gauss–Legendre scheme. The right-hand side of Eq. (15.4.2) is transformed to the compact form

$$\int_{-1}^{1} f(x) \, dx \cong \sum_{i=0}^{m} a_i c_i \tag{15.4.3}$$

where

$$a_i = \sum_{j=1}^{n} x_j^i w_j \tag{15.4.4}$$

This transformation is equivalent to performing the computation of the Gauss–Legendre quadrature with $c_0, c_1, c_2, \cdots, c_m$ treated as variables. The amount of computation required to calculate Eq. (15.4.3) is much smaller than that required for Eq. (15.4.2). A component of the two-dimensional element stiffness matrix for the finite element method is written as follows:
Fig. 15.8 Linear-elastic tensile specimen. Convergence with respect to sample size. Reprinted from [4] with permission from Elsevier
$$k_{pq}^{e} = \int_{-1}^{1} \int_{-1}^{1} f_{pq}(c_1, c_2, \cdots, c_m; \xi, \eta) \, d\xi \, d\eta \tag{15.4.5}$$

where $f_{pq}(c_1, c_2, \cdots, c_m; \xi, \eta)$ is a rational function with $c_1, c_2, \cdots, c_m$ being coefficients. The above equation is transformed into a compact form as

$$\int_{-1}^{1} \int_{-1}^{1} f_{pq}(c_1, c_2, \cdots, c_m; \xi, \eta) \, d\xi \, d\eta \cong \sum_{i=1}^{n} \sum_{j=1}^{n} f_{pq}(c_1, c_2, \cdots, c_m; \xi_i, \eta_j) \, w_i w_j = F_{pq}^{Gauss}(c_1, c_2, \cdots, c_m) \tag{15.4.6}$$

where the Gauss–Legendre quadrature is employed together with the symbolic integration. It is noted that $F_{pq}^{Gauss}(c_1, c_2, \cdots, c_m)$ above is equivalent to the form of Eq. (15.4.3). Using this compact form, it is possible to calculate the element stiffness
matrices much faster than with the usual code consisting of loops over integration points. On the other hand, one can use the symbolic integration for a fully analytical integration of Eq. (15.4.5) as

$$\int_{-1}^{1} \int_{-1}^{1} f_{pq}(c_1, c_2, \cdots, c_m; \xi, \eta) \, d\xi \, d\eta = F_{pq}^{Full}(c_1, c_2, \cdots, c_m) \tag{15.4.7}$$

Though $F_{pq}^{Full}(c_1, c_2, \cdots, c_m)$ gives the exact value, its calculation is much more complicated than that of $F_{pq}^{Gauss}(c_1, c_2, \cdots, c_m)$.
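The point of the compact forms (15.4.3)–(15.4.4) is that the node- and weight-dependent coefficients can be precomputed once, after which the integration of any polynomial of the given order is a short dot product. A minimal numerical Python sketch of this idea is given below; it uses NumPy's Gauss–Legendre rule instead of the symbolic system REDUCE used in [5], so it only illustrates the principle.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def precompute_coefficients(m, n):
    """Coefficients a_i of Eq. (15.4.4) for an n-point Gauss-Legendre rule,
    computed once so that integrating any polynomial of order m reduces to
    the short dot product of Eq. (15.4.3)."""
    x, w = leggauss(n)                                  # nodes x_j and weights w_j
    return np.array([np.sum(x**i * w) for i in range(m + 1)])

def integrate_polynomial(c, a):
    """Eq. (15.4.3): integral of sum_j c_j x^j over [-1, 1] as a dot product."""
    return float(np.dot(a, np.asarray(c)))

# Example: f(x) = 1 + 2x + 3x^2, whose exact integral over [-1, 1] is 4.
a = precompute_coefficients(m=2, n=3)
print(integrate_polynomial([1.0, 2.0, 3.0], a))         # ~4.0
```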
15.5 Contact Search Using Genetic Algorithm

In Sect. 11.9, the local contact search using the neural network is discussed, where the mapping from the relative locations of a slave node and its contacting master segment to the location of the corresponding contact point is implemented on the neural network. While the mapping is implicitly implemented on the neural network there, an approximating polynomial, i.e. an explicit expression of the mapping, can also be obtained by using the genetic algorithm [7]. In the paper, the local coordinate value $\xi_c$ of the contact point is assumed to be a mapping $f$ of the fifteen known coordinate values $x_A, y_A, z_A, \cdots, x_D, y_D, z_D, x_{P_s}, y_{P_s}, z_{P_s}$ of the four nodes of the master segment and the slave node $P_s$ in Fig. 11.17. Using relative coordinates and scaling, the above 15 coordinate values are reduced to the 8 coordinate values $x_C, y_C, z_C, x_D, y_D, x_{P_s}, y_{P_s}, z_{P_s}$, with the point A fixed at (0,0,0), the point B fixed at (1,0,0) and the point D fixed on the xy plane. Thus, $f$ is expressed as

$$\xi_c = f\left( x_C, y_C, z_C, x_D, y_D, x_{P_s}, y_{P_s}, z_{P_s} \right) \tag{15.5.1}$$
An approximating polynomial of the function $f$ is found by using the genetic algorithm (GA). Assuming the $N$th degree perfect polynomial as a candidate, an approximating polynomial for $\xi_c$ is represented as follows:

$$\xi_c = f\left( x_C, y_C, z_C, x_D, y_D, x_{P_s}, y_{P_s}, z_{P_s} \right) = a_0 + \sum_{n=1}^{N} \; \sum_{k_1^i + \cdots + k_8^i = n} a_i \, x_C^{k_1^i} \cdots z_{P_s}^{k_8^i} \tag{15.5.2}$$
Note that an approximating polynomial for $\eta_c$ is represented in the same way. Table 15.1 shows the relationship between the highest degree of a perfect polynomial and the number of terms in it for the case of eight variables. In the paper, a lot of candidate polynomials are generated through the genetic algorithm, where each individual has one binary gene per term of the perfect polynomial, i.e. each gene takes the value 0 or 1, and a candidate polynomial is generated by employing only the terms whose corresponding genes have the value 1. Once a polynomial is constructed by the genetic algorithm, the coefficients of all its terms are determined by the least squares method, and then the fitness of the individual is evaluated. Table 15.2 shows the performance of the method with the number of polynomial terms limited to less than 130.

Table 15.1 Number of terms in perfect polynomial with 8 variables [7]

Highest degree   Number of terms in the polynomial
0                1
1                9
2                45
3                165
4                495
5                1287

Table 15.2 Average error in GA-based contact search [7]

E_ξ        N_term   E_η        N_term
0.017445   124      0.022610   129
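The core of this scheme, selecting a subset of monomial terms with binary genes and fitting their coefficients by least squares, can be sketched as follows in Python. The exponent enumeration, the RMS-error fitness and all names are assumptions of this sketch; the GA loop itself (selection, crossover and mutation of the binary genes) is omitted.

```python
import itertools
import numpy as np

def monomial_exponents(n_vars=8, max_degree=3):
    """All exponent tuples of the perfect polynomial up to max_degree
    (165 terms for 8 variables and degree 3, cf. Table 15.1)."""
    exps = [e for e in itertools.product(range(max_degree + 1), repeat=n_vars)
            if sum(e) <= max_degree]
    return np.array(exps)

def evaluate_individual(genes, exps, X, y):
    """Fitness of one binary individual: keep only the monomials whose gene
    is 1, fit their coefficients by least squares and return the RMS error
    on the training data (X: 8 scaled coordinates per sample, y: xi_c)."""
    active = exps[genes.astype(bool)]
    # design matrix: one column per active monomial
    A = np.stack([np.prod(X ** e, axis=1) for e in active], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    rms = np.sqrt(np.mean((A @ coeffs - y) ** 2))
    return rms, coeffs
```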
15.6 Contact Search Using Genetic Programming

In Sect. 15.5, the approximating polynomials for the local contact search are found by the genetic algorithm (GA). Approximating functions other than polynomials can also be found by using genetic programming (GP). As the genetic programming represents a broader range of functions than the GA, more compact and accurate approximating functions may be found by the GP. In the paper [7], the linear GP (see Sect. 6.2) is employed to find approximating functions for the mapping in the local contact search, where operators, variables and constants are represented by corresponding integer numbers as indexes (a hypothetical decoding is sketched below), and constants are selected from 201 prescribed values at an interval of 0.05 within the range of (−5.0, 5.0). Two kinds of expressions are tested: polynomials using the +, − and × operators and rational expressions using the +, −, × and / operators. As for the maximum number of genes per individual, five values of MaxGeneLength are tested. Figures 15.9 and 15.10 show the performance of the best expression for each MaxGeneLength. The left vertical axis denotes the error of the expression derived from the best individual obtained, while the right vertical axis denotes the count of multiplication operations in Fig. 15.9, and the count of multiplication and division operations in Fig. 15.10, respectively. The horizontal axis denotes MaxGeneLength.
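As a purely illustrative example of how a linear GP genome of integer indexes can be decoded into an expression, consider the following Python sketch; the index layout, the treatment of the constant pool endpoints and the example genome are hypothetical, since [7] only states that operators, variables and constants are encoded as integers.

```python
import numpy as np

# Constant pool: 201 values spaced 0.05 apart (endpoint handling is an assumption).
CONSTANTS = np.linspace(-5.0, 5.0, 201)
OPERATORS = ['+', '-', '*']          # polynomial variant; add '/' for rational expressions
VARIABLES = ['xC', 'yC', 'zC', 'xD', 'yD', 'xPs', 'yPs', 'zPs']

def decode_gene(gene):
    """Map one integer-coded gene to a token: operator, variable or constant."""
    n_op, n_var = len(OPERATORS), len(VARIABLES)
    if gene < n_op:
        return OPERATORS[gene]
    if gene < n_op + n_var:
        return VARIABLES[gene - n_op]
    return str(CONSTANTS[(gene - n_op - n_var) % len(CONSTANTS)])

genome = [3, 0, 4, 2, 111]                 # hypothetical integer-coded individual
print([decode_gene(g) for g in genome])    # -> ['xC', '+', 'yC', '*', '0.0']
```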
Fig. 15.9 Accuracy of approximating polynomials [7]

Fig. 15.10 Accuracy of approximating rational expressions [7]
15.7 Solving Non-linear Equation Systems Using Genetic Algorithm

In [8], the genetic algorithm is employed to search for the best ordering in which to solve a set of non-linear equations. Let us assume a set of non-linear equations

$$F(x) = \begin{Bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_n(x) \end{Bmatrix} = 0 \tag{15.7.1}$$
Then, a process $P$ is defined as a possible order of the functions $f_i$, aiming to solve the equation system through the fixed-point method [9]. For example, we assume $F(x)$ as follows:

$$F(x, y, z) = \begin{Bmatrix} f_1(x, y, z) \\ f_2(x, y) \\ f_3(x, y) \end{Bmatrix} = 0 \tag{15.7.2}$$
In this case, $P$ can be one of the following six orderings: $\{f_1, f_2, f_3\}$, $\{f_1, f_3, f_2\}$, $\{f_2, f_1, f_3\}$, $\{f_2, f_3, f_1\}$, $\{f_3, f_1, f_2\}$ and $\{f_3, f_2, f_1\}$. The solution scheme for the case of $\{f_2, f_3, f_1\}$ is shown in Fig. 15.11. Each function is used to find one variable, and the state of a variable is categorized into one of three states: known, unknown and supposed. Solving a function changes the state of some variables. In the paper, the character of each non-linear function of the equation system $F(x)$ is evaluated based on how the states of the variables are changed. Using the evaluated characteristic value of each function, the fitness of each $P$, which is an individual of the population in the genetic algorithm, is calculated. An extension is also developed to the case where the number of equations to be solved varies during the solving process.
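The idea of scoring an ordering by how well a fixed-point sweep behaves can be illustrated with the following Python toy sketch. It replaces the paper's known/unknown/supposed bookkeeping by a user-supplied map `solve_for` from each equation to the variable it determines, and replaces the GA by exhaustive enumeration of permutations, so every name and design choice here is an assumption rather than the method of [8].

```python
import itertools
import numpy as np
from scipy.optimize import fsolve

def evaluate_ordering(order, equations, solve_for, x0, n_sweeps=50, tol=1e-8):
    """Score one process P (an ordering of the equations) by the number of
    fixed-point sweeps needed to converge; equations[i](x) returns the
    residual f_i and solve_for[i] is the variable equation i determines."""
    x = np.array(x0, dtype=float)
    for sweep in range(n_sweeps):
        x_prev = x.copy()
        for i in order:
            k = solve_for[i]
            def residual(v):                 # equation i with only variable k free
                trial = x.copy()
                trial[k] = np.atleast_1d(v)[0]
                return equations[i](trial)
            x[k] = fsolve(residual, x[k])[0]
        if np.max(np.abs(x - x_prev)) < tol:
            return sweep + 1
    return n_sweeps + 10                     # penalty if the sweep does not converge

def best_ordering(equations, solve_for, x0):
    """Brute-force search over all orderings (stands in for the GA of [8])."""
    orders = itertools.permutations(range(len(equations)))
    return min(orders, key=lambda p: evaluate_ordering(list(p), equations, solve_for, x0))
```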
15.8 Nondestructive Evaluation

Several studies have been performed in the area of nondestructive evaluation using machine learning methods:

Rabinovich et al. [10]
The genetic algorithms are applied to crack identification, where such crack parameters as location, size, orientation and shape are encoded into grey-coded genes of 28 bits in total, 7 bits per parameter, and the direct analyses are performed using the extended finite element method (XFEM) [11], which enables the use of the
Fig. 15.11 Flow diagram of solution process. Reprinted from [8] with permission from Wiley
same regular mesh for all the analyses with various cracks, eliminating the cost for re-meshing. Tan and Awade [12] The Sobol decomposition and the reduced order representation are employed to evaluate localization in heterogeneous polycrystals based on pattern recognition and classification techniques (Fig. 15.12). Peherstorfer and Willcox [13] The dynamic reduced-order models (ROMs) are applied to a structural assessment. The proposed models are based on the proper orthogonal decomposition (POD), and they adapt themselves using just the new measurement data without any full-scale re-analysis. Wang et al. [14] The POD is utilized to estimate material property distributions from partial-field response measurements, in which the gappy POD is used to reconstruct the whole response field from the measurement data for only a portion of the system domain (Figs. 15.13 and 15.14). Sankaran et al. [15] The impact of geometric uncertainty in the cardiovascular system is evaluated by using the machine learning techniques, which results in much faster convergence (Figs. 15.15 and 15.16). Franck and Koutsourelakis [16]
Fig. 15.12 Two-dimensional material model for polycrystalline microstructure. Reprinted from [12] with permission from Elsevier
Fig. 15.13 Schematic for numerically simulated examples representing characterization of elastic modulus distribution with an inclusion. Reprinted from [14] with permission from Elsevier
Fig. 15.14 a Target elastic modulus distribution and b elastic modulus distribution estimated with direct inversion approach (contours in units of Pa). Reprinted from [14] with permission from Elsevier
Fig. 15.15 Schematic of algorithm that couples adaptive collocation with machine learning algorithm. Reprinted from [15] with permission from Elsevier
Fig. 15.16 (left) Comparison of time taken to perform a single simulation using Navier–Stokes equations in 3D and machine learning method and (right) Bar plot comparing time for sensitivity problem using 3D simulation and machine learning. Reprinted from [15] with permission from Elsevier
A variational Bayesian method is applied to the identification of mechanical properties of biological materials in the ultrasound elasticity imaging (elastography) for medical diagnosis. In addition to the above papers, the applications of the machine learning methods to nondestructive evaluation problems include the gappy POD [17], the Sobol decomposition [18], the variational Bayesian method [19], etc.
15.9 Structural Optimization

Some studies on structural optimization using machine learning methods other than neural networks are briefly discussed:

Botello et al. [20]
Two stochastic search methods, the genetic algorithms and the simulated annealing, are taken for the performance study in the optimization of pin-jointed steel bar structures, showing that the hybrid of the two methods performs best in a parallel computing environment.

Bugeda et al. [21]
The evolutionary methods are employed to solve structural shape optimization problems, where a low-cost adaptive remeshing strategy is utilized to reduce the computational cost. In the case of a flywheel, its topology is first optimized by using a conventional technique, followed by the shape optimization with the proposed method (Fig. 15.17).

Congedo et al. [22]
Fig. 15.17 Flywheel. Shape evolution versus generation number. Adapted meshes. Reprinted from [21] with permission from Elsevier
The genetic algorithms are applied to the shape optimization of an airfoil in a Bethe-Zeldovich-Thompson fluids flow, one of the dense gases that are single-phase vapors operating at temperatures and pressures of the order of magnitude of those of their thermodynamic critical point. The polynomial chaos approach [23] is employed to take into account the different sources of uncertainties. The shape of an airfoil is given by Bezier curve (Fig. 15.18). The optimized shape is successfully obtained by the proposed method (Fig. 15.19). Zimmermann and von Hoessle [24] The Iterative Monte Carlo samplings are used to solve automotive crash design problems. The samplings are performed to probe a box-shaped candidate region, and then to readjust its boundaries in order to remove designs with insufficient performance. The goal of the algorithm is to find the maximum box-shaped solution space not including bad design, which improves the robustness and flexibility of component design (Fig. 15.20).
Fig. 15.18 Baseline profile with its interpolating Bézier points. Reprinted from [22] with permission from Elsevier
Fig. 15.19 Three geometries: baseline sonic arc, optimal shape produced by classical (deterministic) optimization and optimal shape yielded by robust optimal design. Reprinted from [22] with permission from Elsevier
Fig. 15.20 Solution box for general problem. Reprinted from [24] with permission from Wiley
15.10 Others

In addition to the above studies, a number of papers have been published applying machine learning methods or so-called AI technologies to computational mechanics.

Genetic Algorithms:

Smith et al. [25]
A machine learning system based on the genetic algorithms is developed to generate air combat tactics for advanced fighter aircraft. Some results on two-sided learning are reported, where aircraft in a one-versus-one combat scenario use this system to adapt their strategies.

Amirjanov [26]
The changing range genetic algorithm (CRGA) is tested on 12 benchmark problems. The algorithm is a variant of the genetic algorithms based on shifting and shrinking the size of the search space towards the global optimum. With this shifting and shrinking, the CRGA gains such additional features as noise handling.

Kernel Methods:

Wang et al. [27]
The support vector machine (SVM) (see Sect. 6.4) is applied to the construction of metamodels for crashworthiness optimization problems. The probability-based least squares SVM for regression [19] is used to construct metamodels appropriate over the entire design space, where an improved version of the boundary and best neighbor sampling (BBNS) [28] is employed.
Tracy et al. [29]
The supervised learning techniques with kernel regression [19] are employed to study low-fidelity models using higher-fidelity data sets in the simulation of turbulent flow.

Wirtz et al. [30]
The kernel methods [19] are used for multiscale simulations such as human spine simulations and simulations of saturation overshoots in porous media. A fast surrogate model for the microscale model is developed using these methods.

Sampling Methods:

Peherstorfer et al. [31]
The importance sampling [19] combined with a surrogate model is applied to failure probability estimation, significantly reducing the number of model evaluations in comparison to the standard Monte Carlo method.

Bayesian Inference:

Kohler et al. [32]
A method to estimate parameters in partial differential equations is developed based on the Bayesian inference concept. Differential equations are transformed to a dynamic Bayesian network (DBN) [33] with the discretization of cellular probabilistic automata [34], and then the inference is performed with the Boyen-Koller algorithm [33].

Sandhu et al. [35]
The Bayesian inference procedure with the Markov Chain Monte Carlo (MCMC) [19] sampling is augmented using the automatic relevance determination (ARD) [19], which is applied to nonlinear dynamical systems, such as the nonlinear aeroelastic oscillator. Note that the ARD is used for model reduction of the nonlinear dynamical system.

Swarm Intelligence:

Vieira et al. [36]
Performance among the genetic algorithms (GA), the particle swarm optimization methods (PSO) and the artificial immune systems (AIS) is discussed for the optimization of offshore oil production systems. The performance evaluation is based on how many objective function evaluations are required during the optimization procedure. It is concluded that the artificial immune system shows better performance than the others.

Parpinelli et al. [37]
Performance among three types of swarm intelligence algorithms, namely the bacterial foraging optimization (BFO), the particle swarm optimization (PSO) and the artificial bee colony algorithm (ABC), is compared for structural engineering optimization problems. Performance is measured based on both the quality of solutions and the number of function evaluations required during optimization. It is reported
that the PSO presents the best balance between these two criteria for the applications tested.

Random Forests:

Trehan et al. [38]
Though surrogate models are essential in many applications, there exist errors, i.e. differences between their results and those obtained using the high-fidelity models. The error introduced by surrogate models of dynamical systems is discussed using such machine learning techniques as the random forests [39] and the least absolute shrinkage and selection operator (LASSO) [40]. The random forests are also employed to improve the RANS (Reynolds-averaged Navier–Stokes) models [41].

Surrogate Models:

Wu et al. [42]
A method to estimate the prediction performance of surrogate models is developed. The prediction accuracy of the models depends on the data tested and also on those used in the training, because the models can usually make correct predictions for data similar or equivalent to the data used in training. In the paper, the Mahalanobis distance [40] and the kernel density estimation technique [40] are used as metrics of the distance between two data sets, which are employed for the assessment of the prediction confidence of the model.

Boosted Regression Trees:

Salazar et al. [43]
The boosted regression trees [44] are employed to evaluate dam deformation and leakage [45], and to detect anomalies in dam performance.

Game Strategies:

Lee et al. [46]
Game strategies such as the Nash equilibrium and the Pareto optimality are combined with the multi-objective evolutionary algorithms (MOEA) to accelerate the convergence speed and to achieve better solutions in optimization problems.
References 1. Furukawa, T., Yagawa, G.: Inelastic constitutive parameter identification using an evolutionary algorithm with continuous individuals. Int. J. Numer. Methods Eng. 40, 1071–1090 (1997) 2. Furukawa, T., Sugata, T., Yoshimura, S., Hoffmann, M.: Automated system of simulation and parameter identification of inelastic constitutive models. Comput. Methods Appl. Mech. Eng. 191, 2235–2260 (2002) 3. Versino, D., Tonda, A., Bronkhorst, C.A.: Data driven modeling of plastic deformation. Comput. Methods Appl. Mech. Eng. 318, 981–1004 (2017) 4. Kirchdoerfer, T., Ortiz, M.: Data-driven computational mechanics. Comput. Methods Appl. Mech. Eng. 304, 81–101 (2016)
5. Yagawa, G., Ye, G.W., Yoshimura, S.: A numerical integration scheme for finite element method based on symbolic manipulation. Int. J. Numer. Methods Eng. 29, 1539–1549 (1990) 6. Hearn, A.C.: Reduce: a user-oriented interactive system for algebraic simplification. In: Klerer, M. (ed.) Interactive Systems for Experimental Applied Mathematics, pp. 79–90. Springer, Berlin (1968) 7. Oishi, A., Yoshimura, S.: Genetic approaches to iteration-free local contact search. Comput. Model. Eng. Sci. 28(2), 127–146 (2008) 8. Rovira, A., Valdes, M., Casanova, J.: A new methodology to solve non-linear equation systems using genetic algorithms. Application to combined cycle gas turbine simulation. Int. J. Numer. Methods Eng. 63, 1424–1435 (2005) 9. Ueberhuber, C.W.: Numerical Computation 2: Methods, Software, and Analysis. Springer, Berlin (1997) 10. Rabinovich, D., Givoli, D., Vigdergauz, S.: XFEM-based crack detection scheme using a genetic algorithm. Int. J. Numer. Methods Eng. 71, 1051–1080 (2007) 11. Moës, N., Dolbow, J., Belytschko, T.: A finite element method for crack growth without remeshing. Int. J. Numer. Methods Eng. 46, 131–150 (1999) 12. Tan, L., Awade, S.R.: Response classification of simple polycrystalline microstructures. Comput. Methods Appl. Mech. Eng. 197, 1397–1409 (2008) 13. Peherstorfer, B., Willcox, K.: Dynamic data-driven reduced-order models. Comput. Methods Appl. Mech. Eng. 291, 21–41 (2015) 14. Wang, M., Dutta, D., Kim, K., Brigham, J.C.: A computationally efficient approach for inverse material characterization combining Gappy POD with direct inversion. Comput. Methods Appl. Mech. Eng. 286, 373–393 (2015) 15. Sankaran, S., Grady, L., Taylor, C.A.: Impact of geometric uncertainty on hemodynamic simulations using machine learning. Comput. Methods Appl. Mech. Eng. 297, 167–190 (2015) 16. Franck, I.M., Koutsourelakis, P.S.: Sparse variational Bayesian approximations for nonlinear inverse problems: applications in nonlinear elastography. Comput. Methods Appl. Mech. Eng. 299, 215–244 (2016) 17. Everson, R., Sirovich, L.: Karhunen-Loeve procedure for gappy data. J. Opt. Soc. Am. A 12(8), 1657–1664 (1995) 18. Sobol, I.M.: Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 55, 271–280 (2001) 19. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Berlin (2006) 20. Botello, S., Marroquin, J.L., Oñate, E., Van Horebeek, J.: Solving structural optimization problems with genetic algorithms and simulated annealing. Int. J. Numer. Methods Eng. 45(5), 1069–1084 (1999) 21. Bugeda, G., Ródenas, J.J., Oñate, E.: An integration of a low cost adaptive remeshing strategy in the solution of structural shape optimization problems using evolutionary methods. Comput. Struct. 86(13–14), 1563–1578 (2008) 22. Congedo, P.M., Corre, C., Martinez, J.-M.: Shape optimization of an airfoil in a BZT flow with multiple-source uncertainties. Comput. Methods Appl. Mech. Eng. 200, 216–232 (2011) 23. Xiu, D.: Numerical Methods for Stochastic Computations: A Spectral Method Approach. Princeton University Press (2010) 24. Zimmermann, M., von Hoessle, J.E.: Computing solution spaces for robust design. Int. J. Numer. Methods Eng. 94, 290–307 (2013) 25. Smith, R.E., Dike, B.A., Mehra, R.K., Ravichandran, B., El-Fallah, A.: Classifier systems in combat: two-sided learning of maneuvers for advanced fighter aircraft. Comput. Methods Appl. Mech. Eng. 186, 421–437 (2000) 26. 
Amirjanov, A.: Investigation of a changing range genetic algorithm in noisy environments. Int. J. Numer. Meth. Eng. 73, 26–46 (2008) 27. Wang, H., Li, E., Li, G.Y.: Probability-based least square support vector regression metamodeling technique for crashworthiness optimization problems. Comput. Mech. 47, 251–263 (2011)
28. Wang, H., Li, G.Y., Zhong, Z.H.: Optimization of sheet metal forming processes by adaptive response surface based on intelligent sampling method. J. Mater. Process. Technol. 197(1–3), 77–88 (2008) 29. Tracy, B., Duraisamy, K., Alonso, J.J.: Application of supervised learning to quantify uncertainties in turbulence and combustion modeling. In: Proceedings of the 51st AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition, Grapevine, 18 pp (2013) 30. Wirtz, D., Karajan, N., Haasdonk, B.: Surrogate modeling of multiscale models using kernel methods. Int. J. Numer. Methods Eng. 101, 1–28 (2014) 31. Peherstorfer, B., Cui, T., Marzouk, Y., Willcox, K.: Multifidelity importance sampling. Comput. Methods Appl. Mech. Eng. 300, 490–509 (2016) 32. Kohler, D., Marzouk, Y.M., Muller, J., Wever, U.: A network approach to Bayesian inference in partial differential equations. Int. J. Numer. Methods Eng. 104, 313–329 (2015) 33. Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press (2012) 34. Kohler, D., Muller, J., Wever, U.: Cellular probabilistic automata—a novel method for uncertainity propagation. SIAM/ASA J. Uncertain. Quantif. 2, 29–54 (2014) 35. Sandhu, R., Pettit, C., Khalil, M., Poirel, D., Sarkar, A.: Bayesian model selection using automatic relevance determination for nonlinear dynamical systems. Comput. Methods Appl. Mech. Eng. 320, 237–260 (2017) 36. Vieira, I.N., Pires de Lima, B.S.L., Jacob, B.P.: Bio-inspired algorithms for the optimization of offshore oil production systems. Int. J. Numer. Methods Eng. 91, 1023–1044 (2012) 37. Parpinelli, R.S., Teodoro, F.R., Lopes, H.S.: A comparison of swarm intelligence algorithms for structural engineering optimization. Int. J. Numer. Methods Eng. 91, 666–684 (2012) 38. Trehan, S., Carlberg, K.T., Durlofsky, L.J.: Error modeling for surrogates of dynamical systems using machine learning. Int. J. Numer. Methods Eng. 112, 1801–1827 (2017) 39. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001) 40. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Berlin (2008) 41. Wang, J.-X., Wu, J.-L., Xiao, H.: Physics informed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on DNS data. Phys. Rev. Fluids 2(3), Paper No. 034603 (2017) 42. Wu, J.-L., Wang, J.-X., Xiao, H., Ling, J.: A priori assessment of prediction confidence for data-driven turbulence modeling. Flow, Turbulence Combust. 99, 25–46 (2017) 43. Salazar, F., Toledo, M.A., González, J.M., Oñate, E.: Early detection of anomalies in dam performance: a methodology based on boosted regression trees. Struct. Control Health Monitor. 24,(11), Paper No. e2012 (2017) 44. Elith, J., Leathwick, J.R., Hastie, T.: A working guide to boosted regression trees. J. Animal Ecol. 77, 802–813 (2008) 45. Salazar, F., Toledo, M.A., Oñate, E., Suárez, B.: Interpretation of dam deformation and leakage with boosted regression trees. Eng. Struct. 119, 230–251 (2016) 46. Lee, D.S., Gonzalez, L.F., Periaux, J., Srinivas, K., Oñate, E.: Hybrid-game strategies for multi-objective design optimization in engineering. Comput. Fluids 47(1), 189–204 (2011)
Chapter 16
Deep Learning for Computational Mechanics
Abstract Since the deep learning is now a hot topic in computational mechanics with neural networks and many related studies have been reported recently, we discuss here some features of computational mechanics with deep learning. First, similarity and difference between conventional neural networks and deep neural networks are reviewed (Sect. 16.1), then the applications of deep learning to the computational mechanics are shown (Sect. 16.2 for the applications of deep convolutional networks, and Sect. 16.3 for those of deep feedforward networks). Finally, the applications of miscellaneous deep networks to computational mechanics are discussed in Sect. 16.4.
16.1 Neural Networks Versus Deep Learning

It is expected that deep learning expands the scope of application of the feedforward neural networks to computational mechanics. On the other hand, the number of hidden layers and the number of units per layer of the neural network increase significantly, resulting in a large computational load. In the training phase, however, few problems are expected regarding available computational resources and computing time, because the training is performed independently, prior to the application of the neural network to computational mechanics. A large-scale neural network obtained by deep learning is, as a matter of course, employed to solve inference problems in the application phase, where the available computers are usually much more limited than in the training phase. In other words, the issues related to computational speed are bigger in the application phase than in the training phase: if the inference speed of the trained large-scale neural network is insufficient, it becomes a critical issue for computational mechanics with deep learning.
It is well known that solving partial differential equations numerically has been the major task of computational mechanics, which is usually done with double precision (FP64) computation. In deep learning, on the other hand, it has become recognized that even lower-precision computation performs well, especially in the
classification problems, such as the classification of images (Sect. 1.4). In addition, the numerical accuracy in the application phase is not as critical as that in the training phase, where accuracy-aware gradient calculations are involved. Thus, low precision arithmetic could be a key to the computational speed in the application phase with deep learning.
The effect of low precision arithmetic on the accuracy of the estimation by a trained neural network, both for classification and for regression, can be easily assessed by using the pseudo low precision (PLP) method [1], a kind of precision reduction from a single precision floating point number to that of an arbitrary lower precision floating point format. If a right shift of n bits and a following left shift of n bits are performed on the original data, the lower n bits of the fraction part are filled with 0, and the meaningful number of digits is reduced. This operation should be carried out after each arithmetic operation that is executed using the arithmetic unit designed for FP32 or FP64 data. Though this method lacks completeness in that it uses only the ordinary arithmetic operation units designed for FP32 or FP64 data and leaves the exponent bits unchanged, it is easy to implement, and the influence of the precision reduction on the results estimated by the neural networks can be evaluated simply by using the PLP. It is noted that the practicality of low precision arithmetic is demonstrated in regression problems where the teacher signals are real-valued (see Sect. 9.2). The accuracy of the estimation, measured by the sum of squared errors, is tested using the PLP with varying numbers of shifted bits for a network of five hidden layers and 50 units per hidden layer for identifying $w_{i,j,k}^{opt}$, trained using FP32 data and arithmetic operations. It is shown that the sum of squared errors does not grow for the PLP with less than 15 shifted bits, indicating that a numerical precision of only 8 bits in the fraction suffices for the neural network tested [1]. Needed in the near future is computational hardware that works best not only for double precision computation but also for low precision computation, the former corresponding to the core solver part, such as the finite element method, and the latter to the deep learning part.
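The PLP operation itself amounts to two bit shifts on the FP32 bit pattern. A minimal NumPy sketch is given below, assuming the shift is applied to the result of each arithmetic operation as described above; the function and variable names are illustrative.

```python
import numpy as np

def pseudo_low_precision(a, n_shift):
    """Pseudo low precision (PLP): clear the lowest n_shift bits of the FP32
    fraction by a right shift followed by a left shift, leaving the sign and
    exponent bits unchanged.  Applying this after every arithmetic operation
    emulates a fraction of (23 - n_shift) bits."""
    a = np.asarray(a, dtype=np.float32)
    bits = a.view(np.int32)
    truncated = np.left_shift(np.right_shift(bits, n_shift), n_shift)
    return truncated.view(np.float32)

x = np.array([0.1234567, -3.1415927], dtype=np.float32)
print(pseudo_low_precision(x, 15))   # about 8 fraction bits survive
```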
16.2 Applications of Deep Convolutional Neural Networks to Computational Mechanics

Generally speaking, the majority of studies on deep learning have been related to the deep convolutional neural networks (CNN). This is because deep learning has mainly been applied to fields where two- or multi-dimensional large data are employed as input, i.e. the classification of images and sounds, the processing of natural languages and so on. Although the use of the CNN has not been so popular in computational mechanics, we discuss here some recent applications of the CNN to computational mechanics:
Tompson et al. [2]
The CNN is applied to the prediction of pressure in order to accelerate Eulerian fluid simulations. The network employed consists of five stages of convolution and rectified linear unit (ReLU) layers.

Finol et al. [3]
The deep CNN is utilized for eigenvalue problems in mechanics, where a phononic eigenvalue problem of a one-dimensional two-phase periodic composite with 100 cells is solved by using the CNN (see Fig. 16.1; a minimal architecture sketch is given after the figure). They also solve the problem by employing a feedforward neural network with six hidden layers and 1024 units in each hidden layer. These neural networks take the Young's modulus and the mass density of each cell as the input, i.e. 200 input data in total, and output the corresponding first two eigenvalues at ten different points in the irreducible Brillouin zone, i.e. 20 output data in total. Each input node of the CNN consists of two channels: one for the Young's modulus and the other for the mass density. The stochastic gradient descent with momentum through backpropagation with mini-batches is employed for the training. A total of 300,000 patterns are generated, and several training datasets of different sizes are randomly chosen out of them. It is shown in Fig. 16.2 that the deep CNN employed achieves significantly higher prediction accuracies than the feedforward neural network, especially when trained with datasets of small size.

Bhatnagar et al. [4]
The CNN is used for flow field predictions in Reynolds-averaged Navier–Stokes flow. The proposed CNN is trained to predict the velocity and the pressure fields from a given image of an airfoil shape pixelated using the signed distance function.

Hou et al. [5]
The deep generative networks are applied to the approximation of posterior distributions in Bayesian inverse problems.
Fig. 16.1 Convolutional neural network architecture used for one-dimensional case. Reprinted from [3] with permission from Wiley
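To make the input–output layout described for [3] concrete, the following PyTorch sketch builds a small 1D CNN with two input channels over 100 cells and 20 outputs; the filter counts, kernel sizes and pooling are illustrative assumptions and do not reproduce the architecture of the paper.

```python
import torch
import torch.nn as nn

class PhononicCNN(nn.Module):
    """Minimal 1D CNN: input (batch, 2, 100) holds Young's modulus and mass
    density per cell; output holds the first two eigenvalues at ten wave
    vectors (20 values).  Layer sizes are illustrative assumptions."""
    def __init__(self, n_cells=100, n_out=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (n_cells // 2), 128), nn.ReLU(),
            nn.Linear(128, n_out),
        )

    def forward(self, x):                 # x: (batch, 2, n_cells)
        return self.head(self.features(x))

model = PhononicCNN()
dummy = torch.randn(8, 2, 100)            # a mini-batch of 8 random unit cells
print(model(dummy).shape)                 # -> torch.Size([8, 20])
```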
Fig. 16.2 Comparison of prediction accuracy as function of training data size. Reprinted from [3] with permission from Wiley
Patel et al. [6]
The CNN is applied to the classification of tumors; it takes the measured displacement field as the input and classifies the tumors based on their elastic heterogeneity and nonlinearity. The CNN, trained only on simulation data, shows 80% classification accuracy on real data obtained using ultrasound elastography.

Han et al. [7]
The CNN is employed to approximate the deformation field for a given pressure field and tissue topology in electrosurgery. The CNN is trained to output the deformation field, taking as input the micropore pressure distribution and the tissue topology described by a level set signed distance.

Li et al. [8]
The deep convolutional neural networks are employed to predict the effective mechanical property of heterogeneous materials, where the CNN employed takes a simplified image of a heterogeneous material as the input and outputs the corresponding effective mechanical property: the Young's modulus (Fig. 16.3). A SEM (Scanning Electron Microscope) image of a material (shale) sample, which consists of many kinds of constituents, is converted into a corresponding simplified image that consists of only five types of constituents: silicate, carbonate, clay, kerogen and others. Each training pattern consists of a simplified image and the corresponding effective material property. Every pixel in the simplified image is converted to a four-node linear quadrilateral plane strain finite element with the corresponding one of the five kinds of material properties. Then, the finite element analysis with the above elements is performed to obtain the corresponding effective material property. About 10,000 patterns are used to train the CNN, and another 2000 patterns are employed for cross validation. Most of the cross validation errors are under 2% and the average error is as low as 0.55%. The trained CNN is also tested on 500 real shale samples, for which the average error is 0.97%.
Fig. 16.3 Convolutional neural network to establish implicit mapping between mesoscale structure of shale sample and its effective modulus. Reprinted from [8] with permission from Elsevier
As discussed above, the microstructures of heterogeneous, composite materials are usually given as images, and the deep convolutional neural networks are directly applicable to various pattern recognition problems related to these materials [9–12].
Li et al. [13]
The design of a phononic crystal with an anticipated band gap is performed by a data-driven method, where an auto-encoder CNN is trained to extract compressed topological features from sample images, and a fully-connected feedforward neural network is simultaneously trained to output the compressed topological features from the band gap distribution obtained by using image-based finite element analysis. Given an anticipated band gap distribution, the trained fully-connected neural network outputs the corresponding compressed topological features, and then the features are input to the decoder part of the trained CNN to output the sample image that is considered to have the given band gap distribution.
Wu et al. [14]
A reduced order model based on the temporal CNN, a counterpart of the recurrent neural network, is applied to a dynamic simulation of flow past a cylinder. It is trained to predict physical values at the next time step from those at previous time steps.
16.3 Applications of Deep Feedforward Neural Networks to Computational Mechanics

The feedforward neural networks have been widely employed for various applications in computational mechanics, where each single component of the input data usually has its own explicit meaning, in contrast to the convolutional neural networks (CNN). This means that the structure of the input–output mappings in the former is simpler and clearer. Some applications of the deep feedforward neural networks to computational mechanics are discussed here, including those given in Chap. 9.
Ling et al. [15]
The feedforward neural networks with 10 hidden layers and 10 units in each hidden layer are employed to learn a model for the Reynolds stress anisotropy tensor from high-fidelity simulation data. It takes approximately 200 h to train a neural network. Also employed are neural networks with a specialized structure designed to manage rotational invariance, with 8 hidden layers and 30 nodes in each hidden layer. It is concluded that the latter networks show better performance.
Yang et al. [16]
The feedforward neural network with 3 hidden layers and 6 units in each hidden layer is utilized to predict the pressure in grid-based fluid simulation. The size of the neural network here is rather small, but the number of training patterns employed is as many as eight million.
Moosavi et al. [17]
Some deep feedforward neural networks are compared to the Gaussian processes in the approximation of multivariate mappings. The deep neural networks with more than 5 hidden layers outperform the Gaussian processes in accuracy for many of the cases tested.
Teichert and Garikipati [18]
A modified feedforward neural network, which is a kind of knowledge-based neural network, is used as a surrogate model in the optimization process to predict precipitate morphology in alloy systems. The model takes such parameters as those describing the boundary dimensions of the surface, those defining the surface topography and one parameter defining the composition of the precipitate as the input, and outputs the total energy of the system. The knowledge-based neural network uses some basic knowledge of the system by employing an analytical low-fidelity model in parallel with the standard fully-connected layers of the deep feedforward neural network (Fig. 16.4). In the paper, a standard feedforward neural network trained using the coarse mesh data is used as the low-fidelity model, while the whole network is trained using the high-fidelity data based on fine meshes with the low-fidelity model part fixed, where the optimal architecture of the deep feedforward neural network for the low-fidelity model is searched in the range of 2–10 hidden layers and of 40–500 hidden units per hidden layer.
Fig. 16.4 Knowledge-based neural network (KBNN). Reprinted from [18] with permission from Elsevier
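As a rough illustration of the knowledge-based composition of Fig. 16.4, the sketch below places a frozen low-fidelity network, standing in for the coarse-mesh or analytical model, in parallel with trainable fully-connected layers; all layer sizes are assumptions, and the code is not the authors' implementation.

```python
import torch
import torch.nn as nn

# Schematic knowledge-based composition (an assumption, not the implementation of [18]):
# a frozen low-fidelity network acts in parallel with trainable fully-connected layers
# that learn the correction toward the high-fidelity (fine-mesh) data.
class KBNN(nn.Module):
    def __init__(self, n_in=8, n_hidden=64):
        super().__init__()
        self.low_fidelity = nn.Sequential(            # pretrained on coarse-mesh data
            nn.Linear(n_in, n_hidden), nn.Tanh(), nn.Linear(n_hidden, 1))
        for p in self.low_fidelity.parameters():
            p.requires_grad = False                   # the low-fidelity part is kept fixed
        self.correction = nn.Sequential(              # trained on the high-fidelity data
            nn.Linear(n_in, n_hidden), nn.Tanh(), nn.Linear(n_hidden, 1))

    def forward(self, x):
        return self.low_fidelity(x) + self.correction(x)

model = KBNN()
energy = model(torch.randn(4, 8))     # total energy for a batch of 4 parameter sets
print(energy.shape)                   # torch.Size([4, 1])
```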
Wang et al. [19]
The deep feedforward neural network is employed to estimate the U* index for predicting the load path, which is a concept for tracking transferred forces within a structure starting from a loading point and ending at a supporting point; it can be determined from the indexes representing the internal stiffness between any point and the loading point of a structure. A plate with a hole and one with a stiffener are, respectively, tested, where 960 sample points are set equally distributed over the plates, and the material information is assigned to each point. The neural network takes the material information at the 960 points as the input, and outputs the U* indexes at the 960 points. The network in this study consists of four hidden layers with 20, 180, 180 and 60 units, respectively, and the ReLU activation function is employed. 150 patterns are used for the training, and 50 patterns are used to check the generalization capability of the trained neural network. It is shown that the estimation of the U* values is more accurate for the case with a stiffener than for that with a hole due to the lower structural singularity and U* distribution discontinuity in the former.
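A minimal sketch of a fully-connected network with the layer sizes reported above (960 inputs, hidden layers of 20, 180, 180 and 60 units with ReLU, 960 outputs) might look as follows; it is an assumption-based illustration, not the authors' code.

```python
import torch
import torch.nn as nn

# Sketch of the fully-connected architecture described above:
# 960 inputs (material information at the sample points), hidden layers of
# 20, 180, 180 and 60 units with ReLU, and 960 outputs (U* indexes).
model = nn.Sequential(
    nn.Linear(960, 20), nn.ReLU(),
    nn.Linear(20, 180), nn.ReLU(),
    nn.Linear(180, 180), nn.ReLU(),
    nn.Linear(180, 60), nn.ReLU(),
    nn.Linear(60, 960),
)
u_star = model(torch.randn(8, 960))   # U* estimates for a batch of 8 plates
print(u_star.shape)                   # torch.Size([8, 960])
```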
16.4 Others

Some additional applications of the deep learning to computational mechanics are discussed below:
Ladicky et al. [20]
This is a study rather categorized into the computer graphics field, where the regression forests are applied to estimate the acceleration of every particle for each frame in a fluid simulation. A large number of random training videos generated with the position-based fluid method are used as the ground truth data for training.
Liu et al. [21]
A deep material network, which represents the direct numerical model of the representative volume element by a hierarchical structure with mechanistic building blocks, is proposed for the multiscale material modeling based on the homogenization theory. Its architecture is not the fully-connected one common among the feedforward neural networks but a binary-tree type, where each unit has two child units (Fig. 16.5). The building block, a simple two-layer subnet consisting of three units, is chosen to be a simple structure with analytical homogenization solutions, where the architecture of the material network represents the path of the homogenization process from each individual phase to the overall macroscopic material (Fig. 16.6). The network uses the same backpropagation algorithm as the ordinary feedforward neural networks, while the propagation rule of the former is different from that of the latter. It is shown that the proposed network with nine layers (depth N = 7) can predict the homogenized compliance matrix within 1% error for all the four different types of morphologies under 2D plane strain conditions: uniform material with a single phase, matrix-inclusion material with circular inclusions embedded in the matrix,
Fig. 16.5 Material network with depth N = 3. Reprinted from [21] with permission from Elsevier
Fig. 16.6 Two-layer building block. Compliance matrix after homogenization operation is \bar{D}, and that after rotation operation \bar{D}^r. Reprinted from [23] with permission from Elsevier
amorphous material and anisotropic material. This method is also combined with the transfer learning strategy [22] to improve training accuracy and speed [23].
Wang and Sun [24]
Reinforcement learning, which learns a mapping from situations to actions so as to maximize a scalar reward [25], is employed to generate mechanical constitutive models, which are considered as an information flow in directed graphs, and the process of writing them is a sequence of forming graph edges with the goal of maximizing the model score. In this research, the process of writing a constitutive model is implemented as a game suitable for reinforcement learning, where the reward is evaluated by the computer agent, i.e. a neural network, which is able to improve itself through self-play of the game to generate directed graphs, or constitutive models. Employed are recurrent neural networks with two hidden layers and 32 gated recurrent units in each hidden layer.
References
1. Oishi, A., Yagawa, G.: Computational mechanics enhanced by deep learning. Comput. Methods Appl. Mech. Eng. 327, 327–351 (2017)
2. Tompson, J., Schlachter, K., Sprechmann, P., Perlin, K.: Accelerating Eulerian fluid simulation with convolutional networks. In: Proceedings of the 34th International Conference on Machine Learning, Sydney (2017)
3. Finol, D., Lu, Y., Mahadevan, V., Srivastava, A.: Deep convolutional neural networks for eigenvalue problems in mechanics. Int. J. Numer. Meth. Eng. 118, 258–275 (2019)
4. Bhatnagar, S., Afshar, Y., Pan, S., Duraisamy, K., Kaushik, S.: Prediction of aerodynamic flow fields using convolutional neural networks. Comput. Mech. 64, 525–545 (2019)
5. Hou, T.Y., Lam, K.C., Zhang, P., Zhang, S.: Solving Bayesian inverse problems from the perspective of deep generative networks. Comput. Mech. 64, 395–408 (2019)
6. Patel, D., Tibrewala, R., Vega, A., Dong, L., Hugenberg, N., Oberai, A.A.: Circumventing the solution of inverse problems in mechanics through deep learning: application to elasticity imaging. Comput. Methods Appl. Mech. Eng. 353, 448–466 (2019)
7. Han, Z., Rahul, De, S.: A deep learning-based hybrid approach for the solution of multiphysics problems in electrosurgery. Comput. Methods Appl. Mech. Eng. 357, 112603 (2019)
8. Li, X., Liu, Z., Cui, S., Luo, C., Li, C., Zhuang, Z.: Predicting the effective mechanical property of heterogeneous materials by image based modeling and deep learning. Comput. Methods Appl. Mech. Eng. 347, 735–753 (2019)
9. Chowdhury, A., Kautz, E., Yener, B., Lewis, D.: Image driven machine learning methods for microstructure recognition. Comput. Mater. Sci. 123, 176–187 (2016)
10. Kondo, R., Yamakawa, S., Masuoka, Y., Tajima, S., Asahi, R.: Microstructure recognition using convolutional neural networks for prediction of ionic conductivity in ceramics. Acta Mater. 141, 29–38 (2017)
11. Cang, R., Li, H., Yao, H., Jiao, Y., Rena, Y.: Improving direct physical properties prediction of heterogeneous materials from imaging data via convolutional neural network and a morphology-aware generative model. Comput. Mater. Sci. 150, 212–221 (2018)
12. Yang, Z., Yabansu, Y.C., Al-Bahrania, R., Liaoa, W.-K., Choudhary, A.N., Kalidindi, S.R., Agrawal, A.: Deep learning approaches for mining structure-property linkages in high contrast composites from simulation datasets. Comput. Mater. Sci. 151, 278–287 (2018)
13. Li, X., Ning, S., Liu, Z., Yan, Z., Luo, C., Zhuang, Z.: Designing phononic crystal with anticipated band gap through a deep learning based data-driven method. Comput. Methods Appl. Mech. Eng. 361, 112737 (2020)
14. Wu, P., Sun, J., Chang, X., Zhang, W., Arcucci, R., Guo, Y., Pain, C.C.: Data-driven reduced order model with temporal convolutional neural network. Comput. Methods Appl. Mech. Eng. 360, 112766 (2020)
15. Ling, J., Kurzawski, A., Templeton, J.: Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. J. Fluid Mech. 807, 155–166 (2016)
16. Yang, C., Yang, X., Xiao, X.: Data-driven projection method in fluid simulation. Comput. Animation Virtual Worlds 27, 415–424 (2016)
17. Moosavi, A., Stefănescu, R., Sandu, A.: Multivariate predictions of local reduced-order-model errors and dimensions. Int. J. Numer. Meth. Eng. 113, 512–533 (2018)
18. Teichert, G.H., Garikipati, K.: Machine learning materials physics: surrogate optimization and multi-fidelity algorithms predict precipitate morphology in an alternative to phase field dynamics. Comput. Methods Appl. Mech. Eng. 344, 666–693 (2019)
19. Wang, Q., Zhang, G., Sun, C., Wu, N.: High efficient load paths analysis with U* index generated by deep learning. Comput. Methods Appl. Mech. Eng. 344, 499–511 (2019)
20. Ladicky, L., Jeong, S., Solenthaler, B., Pollefeys, M., Gross, M.: Data-driven fluid simulations using regression forests. ACM Trans. Graph. 34(6), Article No. 199, 9 (2015)
21. Liu, Z., Wu, C.T., Koishi, M.: A deep material network for multiscale topology learning and accelerated nonlinear modeling of heterogeneous materials. Comput. Methods Appl. Mech. Eng. 345, 1138–1168 (2019)
22. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
23. Liu, Z., Wu, C.T., Koishi, M.: Transfer learning of deep material network for seamless structure–property predictions. Comput. Mech. 64, 451–465 (2019)
24. Wang, K., Sun, W.-C.: Meta-modeling game for deriving theory-consistent, microstructure-based traction–separation laws via deep reinforcement learning. Comput. Methods Appl. Mech. Eng. 346, 216–241 (2019)
25. Sutton, R.S.: Introduction: the challenge of reinforcement learning. In: Sutton, R.S. (ed.) Reinforcement Learning. The Springer International Series in Engineering and Computer Science (Knowledge Representation, Learning and Expert Systems), vol. 173. Springer, Berlin (1992)
Appendix
A1. Bases of Finite Element Method

Assuming an elastic solid body subject to an external force on a part of its surface Sσ and fixed at the other part of the surface Su, the equations of equilibrium and boundary conditions are formulated as follows:

\frac{\partial \sigma_x}{\partial x} + \frac{\partial \tau_{xy}}{\partial y} + \frac{\partial \tau_{zx}}{\partial z} + f_x = 0, \quad \frac{\partial \tau_{xy}}{\partial x} + \frac{\partial \sigma_y}{\partial y} + \frac{\partial \tau_{yz}}{\partial z} + f_y = 0, \quad \frac{\partial \tau_{zx}}{\partial x} + \frac{\partial \tau_{yz}}{\partial y} + \frac{\partial \sigma_z}{\partial z} + f_z = 0 \quad \text{in the body}   (A1.1)

\sigma_x l + \tau_{xy} m + \tau_{zx} n = \bar{T}_x, \quad \tau_{xy} l + \sigma_y m + \tau_{yz} n = \bar{T}_y, \quad \tau_{zx} l + \tau_{yz} m + \sigma_z n = \bar{T}_z \quad \text{on } S_\sigma   (A1.2)

u = \bar{u}, \quad v = \bar{v}, \quad w = \bar{w} \quad \text{on } S_u   (A1.3)

where σx, σy, σz, τxy, τyz, τzx are the three-dimensional stress components, fx, fy, fz the body force components, (l, m, n) the outward unit normal vector, \bar{T}_x, \bar{T}_y, \bar{T}_z the external force vector, \bar{u}, \bar{v}, \bar{w} the prescribed displacements on Su, and (u, v, w) the three-dimensional displacement vector. The potential energy of this system is defined as
\Pi = \iiint \left\{ A_1 + A_2 + A_3 - \left( f_x u + f_y v + f_z w \right) \right\} dv - \iint_{S_\sigma} \left( \bar{T}_x u + \bar{T}_y v + \bar{T}_z w \right) ds   (A1.4)

where

A_1 = \frac{E \nu}{2(1+\nu)(1-2\nu)} \left( \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z} \right)^2   (A1.5)

A_2 = G \left\{ \left( \frac{\partial u}{\partial x} \right)^2 + \left( \frac{\partial v}{\partial y} \right)^2 + \left( \frac{\partial w}{\partial z} \right)^2 \right\}   (A1.6)

A_3 = \frac{G}{2} \left\{ \left( \frac{\partial w}{\partial y} + \frac{\partial v}{\partial z} \right)^2 + \left( \frac{\partial u}{\partial z} + \frac{\partial w}{\partial x} \right)^2 + \left( \frac{\partial v}{\partial x} + \frac{\partial u}{\partial y} \right)^2 \right\}   (A1.7)

Fig. A.1 Analysis domain divided by mesh: mesh generation turns the analysis domain into a mesh of elements and nodes
Here, E is the modulus of elasticity in tension and G that in shear. It is known that the solution of Eqs. (A1.1)–(A1.3) minimizes \Pi, and the solution that minimizes \Pi is that of Eqs. (A1.1)–(A1.3). In the finite element method, a whole analysis domain is divided into a set of small domains of simple shape as shown in Fig. A.1. Each divided domain is called an element and its vertexes the nodes. Displacements at an arbitrary point in an element are defined using those values at the nodes in the element as

\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} u(\xi, \eta, \zeta) \\ v(\xi, \eta, \zeta) \\ w(\xi, \eta, \zeta) \end{pmatrix} = \sum_{i=1}^{n} N_i(\xi, \eta, \zeta) \cdot \begin{pmatrix} U_i \\ V_i \\ W_i \end{pmatrix}   (A1.8)
where (u, v, w) is the three-dimensional displacement vector, (Ui, Vi, Wi) the nodal displacement vector at the i-th node in the element, n the number of nodes in the element and Ni(ξ, η, ζ) is the basis function, which is also called a shape function. Two typical three-dimensional isotropic solid elements in the finite element method are shown in Fig. A.2; one is the eight-node linear element and the other the twenty-node quadratic one.

Fig. A.2 a 8-noded 3D element, b 20-noded 3D element

Basis functions of the former are defined as

N_i(\xi, \eta, \zeta) = \frac{1}{8} (1 + \xi_i \xi)(1 + \eta_i \eta)(1 + \zeta_i \zeta), \quad i = 1, \dots, 8   (A1.9)

and those of the latter as

N_i(\xi, \eta, \zeta) = \frac{1}{8} (1 + \xi_i \xi)(1 + \eta_i \eta)(1 + \zeta_i \zeta)(\xi_i \xi + \eta_i \eta + \zeta_i \zeta - 2), \quad i = 1, \dots, 8   (A1.10)

N_i(\xi, \eta, \zeta) = \frac{1}{4} (1 - \xi^2)(1 + \eta_i \eta)(1 + \zeta_i \zeta), \quad i = 9, 11, 17, 19   (A1.11)

N_i(\xi, \eta, \zeta) = \frac{1}{4} (1 + \xi_i \xi)(1 - \eta^2)(1 + \zeta_i \zeta), \quad i = 10, 12, 18, 20   (A1.12)

N_i(\xi, \eta, \zeta) = \frac{1}{4} (1 + \xi_i \xi)(1 + \eta_i \eta)(1 - \zeta^2), \quad i = 13, 14, 15, 16   (A1.13)

where (ξi, ηi, ζi) is the coordinate of the i-th node in the parametric space as seen in Fig. A.2.
Discretization of the equilibrium equations of a static structural problem, Eqs. (A1.1)–(A1.3), with respect to space dimensions results in

[K]\{U\} = \{F\}   (A1.14)

where {U} is the nodal displacement vector, [K] the global stiffness matrix, and {F} the nodal force vector. It is noted that the global stiffness matrix is constructed by summing up all the element stiffness matrices as follows:
[K] = \sum_{e=1}^{n_e} [k^e]   (A1.15)

where n_e is the total number of elements, and [k^e] the element stiffness matrix of the e-th element calculated as

[k^e] = \int_{v^e} [B]^T [D] [B] \, dv   (A1.16)
where [D] is the stress–strain matrix and [B] the strain–displacement matrix. Define a matrix of shape functions [N] as

[N] = \begin{bmatrix} N_1 & 0 & 0 & & N_n & 0 & 0 \\ 0 & N_1 & 0 & \cdots & 0 & N_n & 0 \\ 0 & 0 & N_1 & & 0 & 0 & N_n \end{bmatrix}   (A1.17)

where N_i = N_i(ξ, η, ζ) is the shape function and n the number of nodes in an element. Using [N] defined above, [B] is calculated as [1, 2],

[B] = \begin{bmatrix} \partial/\partial x & 0 & 0 \\ 0 & \partial/\partial y & 0 \\ 0 & 0 & \partial/\partial z \\ \partial/\partial y & \partial/\partial x & 0 \\ 0 & \partial/\partial z & \partial/\partial y \\ \partial/\partial z & 0 & \partial/\partial x \end{bmatrix} [N]   (A1.18)

In the case of the three-dimensional solid elements with isotropic elastic materials, [D] is defined as

[D] = \frac{E(1-\nu)}{(1+\nu)(1-2\nu)} \begin{bmatrix} 1 & \frac{\nu}{1-\nu} & \frac{\nu}{1-\nu} & 0 & 0 & 0 \\ \frac{\nu}{1-\nu} & 1 & \frac{\nu}{1-\nu} & 0 & 0 & 0 \\ \frac{\nu}{1-\nu} & \frac{\nu}{1-\nu} & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & \frac{1-2\nu}{2(1-\nu)} & 0 & 0 \\ 0 & 0 & 0 & 0 & \frac{1-2\nu}{2(1-\nu)} & 0 \\ 0 & 0 & 0 & 0 & 0 & \frac{1-2\nu}{2(1-\nu)} \end{bmatrix}   (A1.19)
where E is the Young's modulus and ν the Poisson's ratio. The integral Eq. (A1.16), defined in the real xyz space, is usually transformed to that in the parametric ξηζ space of [−1, 1] × [−1, 1] × [−1, 1] and then numerically integrated by the Gauss–Legendre quadrature (see Sect. 9.1). With the isoparametric formulation in the finite element method as well as the isogeometric one in the isogeometric analysis, where the same basis functions are used for both representing the shape of the objects to be analyzed and approximating unknown physical values such as displacements, the integral Eq. (A1.16) is performed as follows:

[k^e] = \int_{v^e} [B]^T [D] [B] \, dx \, dy \, dz = \int_{-1}^{1} \int_{-1}^{1} \int_{-1}^{1} [B]^T [D] [B] \cdot |J| \, d\xi \, d\eta \, d\zeta   (A1.20)

where |J| is the determinant of the Jacobian matrix defined as

[J] = \begin{bmatrix} \partial x/\partial \xi & \partial y/\partial \xi & \partial z/\partial \xi \\ \partial x/\partial \eta & \partial y/\partial \eta & \partial z/\partial \eta \\ \partial x/\partial \zeta & \partial y/\partial \zeta & \partial z/\partial \zeta \end{bmatrix}   (A1.21)

The first derivatives of the basis function with respect to x, y or z, which appear in [B], are calculated as follows:

\begin{Bmatrix} \partial N_i/\partial x \\ \partial N_i/\partial y \\ \partial N_i/\partial z \end{Bmatrix} = [J]^{-1} \begin{Bmatrix} \partial N_i/\partial \xi \\ \partial N_i/\partial \eta \\ \partial N_i/\partial \zeta \end{Bmatrix}   (A1.22)

Accordingly, the Gauss–Legendre quadrature of an element stiffness matrix is computed as follows:

[k^e] \approx \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{k=1}^{l} \left. [B]^T [D] [B] \cdot |J| \right|_{\xi=\xi_i, \, \eta=\eta_j, \, \zeta=\zeta_k} \cdot H_{i,j,k}   (A1.23)

where n, m and l are the numbers of integration points along the ξ, η and ζ axes, respectively, and H_{i,j,k} the weight at the quadrature point (ξ_i, η_j, ζ_k) defined by

H_{i,j,k} = H_i \cdot H_j \cdot H_k   (A1.24)
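The following sketch assembles the element stiffness matrix of a single 8-node hexahedron by the 2 × 2 × 2 Gauss–Legendre quadrature of Eq. (A1.23), combining Eqs. (A1.18), (A1.19), (A1.21) and (A1.22); the geometry (a unit cube) and the material constants are assumed for illustration only.

```python
import numpy as np

# Sketch: element stiffness of one 8-node hexahedron by 2x2x2 Gauss-Legendre
# quadrature, Eq. (A1.23); geometry and material constants are assumed values.
signs = np.array([[-1, -1, -1], [1, -1, -1], [1, 1, -1], [-1, 1, -1],
                  [-1, -1, 1], [1, -1, 1], [1, 1, 1], [-1, 1, 1]], dtype=float)
coords = 0.5 * (signs + 1.0)                  # nodal coordinates of a unit cube

E, nu = 200.0e9, 0.3                          # assumed Young's modulus and Poisson's ratio
c = E * (1 - nu) / ((1 + nu) * (1 - 2 * nu))
d1, d2 = nu / (1 - nu), (1 - 2 * nu) / (2 * (1 - nu))
D = c * np.array([[1, d1, d1, 0, 0, 0],
                  [d1, 1, d1, 0, 0, 0],
                  [d1, d1, 1, 0, 0, 0],
                  [0, 0, 0, d2, 0, 0],
                  [0, 0, 0, 0, d2, 0],
                  [0, 0, 0, 0, 0, d2]])       # stress-strain matrix, Eq. (A1.19)

def dN_dxi(xi, eta, zeta):
    """Derivatives of the 8 shape functions of Eq. (A1.9) w.r.t. (xi, eta, zeta)."""
    out = np.empty((8, 3))
    for i, (xs, ys, zs) in enumerate(signs):
        out[i] = [xs * (1 + ys * eta) * (1 + zs * zeta),
                  (1 + xs * xi) * ys * (1 + zs * zeta),
                  (1 + xs * xi) * (1 + ys * eta) * zs]
    return out / 8.0

gp, gw = np.polynomial.legendre.leggauss(2)   # integration points and weights
ke = np.zeros((24, 24))
for xi, wx in zip(gp, gw):
    for eta, wy in zip(gp, gw):
        for zeta, wz in zip(gp, gw):
            dN = dN_dxi(xi, eta, zeta)
            J = dN.T @ coords                 # Jacobian matrix, Eq. (A1.21)
            dNx = dN @ np.linalg.inv(J).T     # derivatives w.r.t. x, y, z, Eq. (A1.22)
            B = np.zeros((6, 24))
            for i in range(8):
                dx, dy, dz = dNx[i]
                B[:, 3 * i:3 * i + 3] = [[dx, 0, 0], [0, dy, 0], [0, 0, dz],
                                         [dy, dx, 0], [0, dz, dy], [dz, 0, dx]]
            ke += B.T @ D @ B * np.linalg.det(J) * wx * wy * wz   # Eq. (A1.23)

print(ke.shape, np.allclose(ke, ke.T))        # (24, 24), symmetric
```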
As is shown in Eq. (A1.23), a numerical quadrature of an element stiffness matrix is to sum up the value of the integrand evaluated at each quadrature point multiplied by the corresponding weight, meaning that the computational load to perform the quadrature grows in proportion to the total number of integration points. On the other hand, the discretization of the equation of a dynamic structural problem with respect to space dimensions leads to the following equation:

[M]\{\ddot{U}\} + [K]\{U\} = \{F\}   (A1.25)
where \{\ddot{U}\} is the nodal acceleration vector, [M] the global mass matrix and {F} the nodal dynamic force vector. Summing up all element mass matrices results in the global mass matrix as

[M] = \sum_{e=1}^{n_e} [m^e]   (A1.26)

where the element mass matrices are calculated by

[m^e] = \int_{v^e} [N]^T \rho [N] \, dv   (A1.27)
where ρ is the mass density. It is known that there are two kinds of mass matrix: a consistent mass matrix and a lumped mass matrix. The former is the one calculated by Eq. (A1.27), whereas the latter is made by setting each diagonal component to the sum of the corresponding row of the consistent mass matrix and the off-diagonal components to 0. In other words, the ij-th component of the lumped mass matrix m^L_{ij} is calculated from the ij-th component of the consistent mass matrix m_{ij} as follows:

m^L_{ii} = \sum_{j} m_{ij}, \quad m^L_{ij} = 0 \ (i \neq j)   (A1.28)
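A small sketch of the row-sum lumping of Eq. (A1.28) follows, illustrated with the consistent mass matrix of a two-node bar element under assumed unit density, area and length.

```python
import numpy as np

# Row-sum lumping of Eq. (A1.28) for the consistent mass matrix of a two-node
# bar element with assumed unit properties: [m] = (rho*A*L/6) * [[2, 1], [1, 2]].
m_consistent = (1.0 / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])

def lump(m):
    """m^L_ii = sum_j m_ij, off-diagonal components set to zero."""
    return np.diag(m.sum(axis=1))

print(lump(m_consistent))   # diag(0.5, 0.5): half of the element mass at each node
```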
Discretization of Eq. (A1.25) with respect to time using the central difference scheme

\{\ddot{U}\}_n = \frac{\{U\}_{n+1} - 2\{U\}_n + \{U\}_{n-1}}{(\Delta t)^2}   (A1.29)

leads to the following equation:

\frac{1}{(\Delta t)^2} [M]\{U\}_{n+1} = \{F\}_n - \left( [K] - \frac{2}{(\Delta t)^2} [M] \right)\{U\}_n - \frac{1}{(\Delta t)^2} [M]\{U\}_{n-1}   (A1.30)

where \{U\}_{n+1}, \{U\}_n and \{U\}_{n-1} are the nodal displacement vectors at the (n+1)th, nth and (n−1)th time steps, respectively, and Δt the time step. Using the lumped diagonal mass matrix as [M] in Eq. (A1.30) results in an explicit time integration scheme, which needs no matrix inversion to solve the equation. On the other hand, we can use the Newmark β method as

\{U\}_{n+1} = \{U\}_n + \Delta t \{\dot{U}\}_n + (\Delta t)^2 \left( \frac{1}{2} - \beta \right) \{\ddot{U}\}_n + (\Delta t)^2 \beta \{\ddot{U}\}_{n+1}   (A1.31)

\{\dot{U}\}_{n+1} = \{\dot{U}\}_n + \frac{\Delta t}{2} \{\ddot{U}\}_n + \frac{\Delta t}{2} \{\ddot{U}\}_{n+1}   (A1.32)
Fig. A.3 Domain decomposition method (DDM): nodes and domain-boundary nodes

In this case, the equation (A1.25) is discretized in time as

\left( [K] + \frac{1}{\beta (\Delta t)^2} [M] \right) \{U\}_{n+1} = \{F\}_{n+1} + [M] \left( \frac{1}{\beta (\Delta t)^2} \{U\}_n + \frac{1}{\beta \Delta t} \{\dot{U}\}_n + \left( \frac{1}{2\beta} - 1 \right) \{\ddot{U}\}_n \right)   (A1.33)
It is noted that the matrix inversion is needed to solve this equation.
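The following sketch advances a hypothetical two-degree-of-freedom system in time with the explicit central-difference update of Eq. (A1.30) and a lumped (diagonal) mass matrix, so that no matrix inversion is needed; the stiffness, mass, load and time step values are assumptions for illustration.

```python
import numpy as np

# Explicit central-difference update, Eq. (A1.30), with a lumped (diagonal)
# mass matrix for a hypothetical two-degree-of-freedom system.
K = np.array([[2.0, -1.0],
              [-1.0, 1.0]])              # stiffness matrix
M = np.diag([1.0, 1.0])                  # lumped mass matrix
F = np.array([0.0, 1.0])                 # constant external force
dt, nstep = 0.01, 1000

u_prev = np.zeros(2)                     # {U}_{n-1}
u = np.zeros(2)                          # {U}_n
m_inv = 1.0 / np.diag(M)                 # only a diagonal to invert, no matrix inversion
for _ in range(nstep):
    rhs = F - (K - 2.0 / dt**2 * M) @ u - M @ u_prev / dt**2
    u_next = dt**2 * m_inv * rhs         # Eq. (A1.30) solved component-wise
    u_prev, u = u, u_next
print(u)                                 # oscillates about the static solution [1, 2]
```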
A2. Parallel Processing for Finite Element Method

The domain decomposition method (DDM) is considered to be the most popular parallel processing algorithm for the FEM [3–6], in which a whole analysis domain is decomposed into subdomains as shown in Fig. A.3. In the figure, the analysis domain is decomposed into four subdomains, where nine nodes are shared among multiple subdomains and assigned to the inter-subdomain boundary. The procedure of the DDM is summarized as follows:
(1) All the displacements at the domain boundaries are initialized to arbitrary values, usually to zero.
(2) With the prescribed displacements at the inter-domain boundaries being a part of the boundary conditions, the finite element analysis of each subdomain is performed. As the finite element analysis of a subdomain is independent of those of the other subdomains, this analysis can be executed in parallel.
(3) The displacements at the inter-domain boundaries are updated to satisfy the equilibrium of reactive forces. This is usually performed using the preconditioned conjugate gradient algorithm.
(4) If no further update of the displacements at the inter-domain boundaries is required, equilibrium is considered to be achieved. Otherwise, return to step (2) above.
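A heavily simplified one-dimensional sketch of the above procedure is given below: a bar of eight unit springs, fixed at the left end with a unit tip load, is split into two subdomains that share one interface node; each subdomain is solved independently with the interface displacement prescribed, and the interface displacement is corrected by a simple relaxation of the reactive-force unbalance instead of the preconditioned conjugate gradient algorithm used in practice. All numerical values are assumptions.

```python
import numpy as np

# One-dimensional illustration of steps (1)-(4) of the DDM described above.
def bar_stiffness(n_el, k=1.0):
    """Assembled stiffness of a chain of n_el unit springs (n_el + 1 nodes)."""
    K = np.zeros((n_el + 1, n_el + 1))
    for e in range(n_el):
        K[e:e + 2, e:e + 2] += k * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return K

def solve_subdomain(K, f, fixed):
    """FEM solve with prescribed displacements; returns displacements and reactions."""
    n = K.shape[0]
    p = np.array(sorted(fixed))                            # prescribed dofs
    a = np.setdiff1d(np.arange(n), p)                      # free dofs
    u = np.zeros(n)
    u[p] = [fixed[i] for i in p]
    u[a] = np.linalg.solve(K[np.ix_(a, a)], f[a] - K[np.ix_(a, p)] @ u[p])
    r = K[p] @ u - f[p]                                    # reactions at prescribed dofs
    return u, dict(zip(p, r))

K1 = bar_stiffness(4)          # subdomain 1: nodes 0..4, node 0 fixed, node 4 = interface
K2 = bar_stiffness(4)          # subdomain 2: local node 0 = interface, free end loaded
f1 = np.zeros(5)
f2 = np.zeros(5)
f2[-1] = 1.0                                               # unit tip load

ub = 0.0                                                   # step (1): initial interface value
for it in range(200):
    _, r1 = solve_subdomain(K1, f1, {0: 0.0, 4: ub})       # step (2), solvable in parallel
    _, r2 = solve_subdomain(K2, f2, {0: ub})
    g = r1[4] + r2[0]                                      # unbalance of reactive forces
    if abs(g) < 1e-10:
        break                                              # step (4): equilibrium reached
    ub -= g                                                # step (3): relaxed update
print(it, ub)                                              # exact interface displacement: 4.0
```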
Fig. A.4 Flowchart of DDM
Fig. A.5 Allocation of processors for DDM
This procedure is illustrated in Fig. A.4. The DDM is usually performed with the MIMD parallel processing environment as shown in Fig. A.5, where a processor is allocated to the Controller and the others to the Analyzers. The Controller consists of the DCC (Domain Connectivity Controller) and the Sending/Receiving Server. The former manages all the data and updates the displacements of the nodes at the inter-domain boundaries, and the latter communicates with all the other processors. Each of the Analyzers receives the necessary data of the corresponding subdomain from the Controller, performs the finite element analysis of the subdomain with the prescribed boundary conditions and returns the analysis results to the Controller. If a problem to be solved is very complicated or large, an extraordinary number of subdomains and Analyzers is needed to perform the analysis in a bearable timescale. But, with too many Analyzers, the workload of the Controller in communication with them could exceed the practical limit. To solve this issue, the hierarchical domain decomposition method (HDDM) has been studied [7, 8], where an analysis domain is first decomposed into parts, each of which is then decomposed into subdomains as shown in Fig. A.6. Processors for the HDDM are also hierarchically allocated: the Grand Controller that manages the update of the displacements at the inter-part boundaries, the Controllers that manage the update of the displacements at the inter-subdomain boundaries in the corresponding part, and the Analyzers that perform the FEM analyses of subdomains (see Fig. A.7).

Fig. A.6 Hierarchical domain decomposition method (HDDM): the whole analysis domain is divided into parts, and each part is divided into subdomains
Fig. A.7 Allocation of processors for HDDM
A3. Isogeometric Analysis

The Isogeometric Analysis (IGA) [9, 10] is regarded as an extension of the FEM, which employs the NURBS as the basis functions for analysis and also for constructing shapes in the CAD system. Using the same basis functions for both design and analysis eliminates a mesh generation process that is essential for the conventional FEM. The NURBS basis functions are constructed using the B-Spline basis functions [11, 12], which are constructed from a knot vector. A knot vector is a set of monotonically non-decreasing real values, {ξ1, ξ2, ..., ξ_{n+p}, ξ_{n+p+1}}, where p is the polynomial order and n the number of basis functions. The B-Spline basis functions based on a given knot vector {ξ1, ξ2, ..., ξ_{n+p}, ξ_{n+p+1}} are defined recursively using

N_{i,0}(\xi) = \begin{cases} 1 & (\xi_i \le \xi < \xi_{i+1}) \\ 0 & (\text{otherwise}) \end{cases}   (A3.1)

N_{i,p}(\xi) = \frac{\xi - \xi_i}{\xi_{i+p} - \xi_i} N_{i,p-1}(\xi) + \frac{\xi_{i+p+1} - \xi}{\xi_{i+p+1} - \xi_{i+1}} N_{i+1,p-1}(\xi)   (A3.2)

Based on the B-Spline basis functions, the one-dimensional and the three-dimensional NURBS basis functions are, respectively, defined as follows:

R_i^p(\xi) = \frac{N_{i,p}(\xi) \cdot w_i}{\sum_{i'=1}^{n} N_{i',p}(\xi) \cdot w_{i'}}   (A3.3)

R_{i,j,k}^{p,q,r}(\xi, \eta, \zeta) = \frac{N_{i,p}(\xi) \cdot M_{j,q}(\eta) \cdot L_{k,r}(\zeta) \cdot w_{i,j,k}}{\sum_{i'=1}^{n} \sum_{j'=1}^{m} \sum_{k'=1}^{l} N_{i',p}(\xi) \cdot M_{j',q}(\eta) \cdot L_{k',r}(\zeta) \cdot w_{i',j',k'}}   (A3.4)

where N_{i,p}(ξ), M_{j,q}(η) and L_{k,r}(ζ) are the one-dimensional B-Spline basis functions for each axis, p, q and r are the orders of the basis functions and w a weight.
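The recursion of Eqs. (A3.1) and (A3.2) and the rational basis of Eq. (A3.3) can be sketched as follows; the knot vector and the weight setting are those used for Figs. A.8 and A.9 below, and the evaluation point is an arbitrary choice.

```python
import numpy as np

# Cox-de Boor recursion, Eqs. (A3.1)-(A3.2), and the rational basis of Eq. (A3.3),
# evaluated on the knot vector of Figs. A.8 and A.9 (p = 3, n = 8).
def bspline_basis(i, p, xi, knots):
    """N_{i,p}(xi) from Eqs. (A3.1) and (A3.2)."""
    if p == 0:
        return 1.0 if knots[i] <= xi < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + p] > knots[i]:
        left = (xi - knots[i]) / (knots[i + p] - knots[i]) * bspline_basis(i, p - 1, xi, knots)
    if knots[i + p + 1] > knots[i + 1]:
        right = ((knots[i + p + 1] - xi) / (knots[i + p + 1] - knots[i + 1])
                 * bspline_basis(i + 1, p - 1, xi, knots))
    return left + right

def nurbs_basis(i, p, xi, knots, w):
    """R_i^p(xi) from Eq. (A3.3): weighted B-Spline basis normalized by the weighted sum."""
    vals = np.array([bspline_basis(j, p, xi, knots) for j in range(len(w))])
    return w[i] * vals[i] / np.dot(vals, w)

knots = np.array([0, 0, 0, 0, 1, 2, 3, 4, 5, 5, 5, 5], dtype=float)
p, n = 3, 8
w = np.ones(n)
w[5] = 0.5                              # the weight setting of Fig. A.9
xi = 2.5
print(sum(bspline_basis(i, p, xi, knots) for i in range(n)))   # partition of unity: 1.0
print(round(nurbs_basis(5, p, xi, knots, w), 4))               # rational basis at xi = 2.5
```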
Fig. A.8 B-Spline basis functions

Fig. A.9 NURBS basis functions
Figure A.8 shows the B-Spline basis functions derived from the knot vector {0, 0, 0, 0, 1, 2, 3, 4, 5, 5, 5, 5}, and Fig. A.9 the NURBS basis functions derived from the same knot vector with w6 set to 0.5 and all the other weights to 1.0. The IGA is almost equivalent in formulation to the conventional FEM except for using the NURBS as the basis functions; the formulation of the conventional FEM shown in Appendix A1 is employed for the IGA. Figure A.10 shows a sample analysis domain for the IGA, where the control points correspond to the nodes in the FEM. The IGA has some advantages over the conventional FEM as follows:
(1) Geometry defined in the CAD is also used for analysis, while the conventional FEM uses independent geometry re-defined after meshing.
Fig. A.10 IGA mesh
Fig. A.11 Local elements cluster for FMM
(2) The NURBS functions are defined over an analysis domain with higher continuity everywhere, while the basis functions in the conventional FEM are defined in each element, often with only C^0 continuity at the inter-element boundaries, meaning that artificial stress discontinuity appears there.
A4. Free Mesh Method

The free mesh method (FMM) is an extension of the finite element method with a meshless feature [13, 14]. Figure A.11 shows that the FMM employs no pre-defined element data but locally-defined element clusters that are temporarily generated with a node Pi and its neighboring nodes within the circle of prescribed radius ri. The algorithm of the FMM is summarized as follows. For each node Pi in the whole domain, (1)–(4) below are performed:
(1) The neighboring nodes for the node Pi are searched and identified in the circle of the radius ri with Pi as its center.
(2) A local element cluster employing the node Pi and the neighboring nodes searched above is temporarily set.
(3) The element stiffness matrix [K^e] corresponding to the above local element cluster is calculated.
(4) The components of the matrix above are added to the corresponding row in the global stiffness matrix [K].
When (1)–(4) above are done for all the nodes in the domain,
(5) Solve the global equation [K]{U} = {F}.
The above node-based nature of the free mesh method has been deepened and inherited by the node-by-node meshless method [15, 16], and the FMM has been updated and applied to various computational fields [17–29].
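As a rough one-dimensional analogue of the node-based assembly above (not the actual FMM, which generates local triangular or tetrahedral element clusters), the sketch below discretizes a rod by nodes only, forms temporary two-node elements with the neighbours found inside a search radius, and adds only the row of the central node to the global stiffness matrix; all numerical values are assumptions.

```python
import numpy as np

# One-dimensional analogue of the node-based assembly, steps (1)-(5) above.
x = np.linspace(0.0, 1.0, 11)            # nodal coordinates, no pre-defined elements
n = len(x)
EA, radius = 1.0, 0.15                   # search radius of about 1.5 nodal spacings
K = np.zeros((n, n))

for i in range(n):                                             # (1) neighbour search
    nbr = [j for j in range(n) if j != i and abs(x[j] - x[i]) <= radius]
    for j in nbr:                                              # (2) temporary local elements
        L = abs(x[j] - x[i])
        ke = EA / L * np.array([[1.0, -1.0], [-1.0, 1.0]])     # (3) local stiffness
        K[i, i] += ke[0, 0]                                    # (4) add only the row of node i
        K[i, j] += ke[0, 1]

f = np.zeros(n)
f[-1] = 1.0                              # unit load at the free end, node 0 fixed
u = np.zeros(n)
u[1:] = np.linalg.solve(K[1:, 1:], f[1:])                      # (5) solve [K]{U} = {F}
print(u)                                 # linear displacement field, tip value 1.0
```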
A5. Other Meshless Methods

As is well known, the elements employed in the conventional FEM play important roles. They, however, often bring various difficulties as follows:
(1) Automatic mesh generation for a domain of irregular and complicated shape is often impossible and needs human assistance, which takes a large amount of cost and time.
(2) When handling large deformation and breakage, accuracy is often lost due to distorted elements. Though adaptive re-meshing could be a solution to this issue, the automatic meshing used for re-meshing may not be robust enough.
To overcome these difficulties, various methods not requiring mesh generation have been developed. They depend only on nodes and are called meshless methods [30]. Among others, the Diffuse Element Method [31], the EFGM (Element-free Galerkin Method) [32, 33], the MLPGM (Meshless Local Petrov–Galerkin Method) [34, 35], the PIM (Point Interpolation Method) [36], and the Particle Finite Element Method [37, 38] have been successfully applied to many basic and industrial problems. The Isogeometric Analysis and the Free Mesh Method described in the previous sections are also regarded as meshless methods. Other than the methods above, several methods categorized as particle methods have been studied: the SPH (Smoothed Particle Hydrodynamics) [39, 40] and the MPS (Moving Particle Semi-implicit) [41]. They have been mainly developed for fluid simulation, while there are some applications to solid mechanics.
A6. Inverse Problems

Any computational mechanics problem can be classified into one of the following: the direct problem or the inverse problem. To solve the former, it is usually required to have the data listed as [42]:
(a) Shape of domain to be solved with its boundary.
(b) Governing equation.
(c) Boundary conditions and initial conditions.
(d) Material properties.
(e) Forces or other inputs acting in the domain.
When all the above are given, we can obtain the response of the system in a straightforward manner by means of a numerical or analytical method because the direct problem is usually well-posed. On the other hand, in the inverse problem, some of (a)–(e) are lacking, but some of the responses are given instead. Using the response of the system and the information available out of (a)–(e) above, the inverse problems are posed to determine one or some of the following:
(a′) Shape of domain and its boundary.
(b′) Governing equation.
(c′) Boundary conditions and initial conditions.
(d′) Material properties.
(e′) Forces or other inputs acting in the domain.
The solving process of an inverse problem is often called identification, estimation, reconstruction, inversion, back calculation, inverse design and so on. The Non-Destructive Evaluation (NDE) is one of the inverse problems. Some features of the inverse problems are given as follows:
(1) When existence, uniqueness, continuity and stability of the solution of a problem are all assured, the problem is called well-posed. The inverse problems, however, lack one or some of them, which makes them hard to solve, and they are often called ill-posed.
(2) The neural networks have been used as promising tools to solve some of the inverse problems.
References
1. Bathe, K.J.: Finite Element Procedures. Prentice-Hall (1996)
2. Hughes, T.J.R.: The Finite Element Method: Linear Static and Dynamic Finite Element Analysis. Dover (2000)
3. Yagawa, G., Soneda, N., Yoshimura, S.: A large scale finite element analysis using domain decomposition method on a parallel computer. Comput. Struct. 38, 615–625 (1991)
4. Yagawa, G., Yoshioka, A., Yoshimura, S., Soneda, N.: A parallel finite element method with a supercomputer network. Comput. Struct. 47(3), 407–418 (1993)
5. Nikishkov, G.P., Makinouchi, A., Yagawa, G., Yoshimura, S.: Performance study of the domain decomposition method with direct equation solver for parallel finite element analysis. Comput. Mech. 19, 84–93 (1996)
6. Smith, B., Bjorstad, P., Gropp, W.: Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge UP (1996)
7. Miyamura, T., Noguchi, H., Shioya, R., Yoshimura, S., Yagawa, G.: Elastic-plastic analysis of nuclear structures with millions of DOFs using the hierarchical domain decomposition method. Nuclear Eng. Des. 212, 335–355 (2002)
8. Murotani, K., Sugimoto, S., Kawai, H., Yoshimura, S.: Hierarchical domain decomposition with parallel mesh refinement for billions-of-DOF scale finite element analyses. Int. J. Comput. Methods 11(4), Paper No. 1350061 (2014)
9. Hughes, T.J.R., Cottrell, J.A., Bazilevs, Y.: Isogeometric analysis: CAD, finite elements, NURBS, exact geometry, and mesh refinement. Comput. Methods Appl. Mech. Eng. 194, 4135–4195 (2005)
10. Cottrell, J.A., Hughes, T.J.R., Bazilevs, Y.: Isogeometric Analysis. Wiley (2009)
11. Piegl, L., Tiller, W.: The NURBS Book, 2nd edn. Springer, Berlin (2000)
12. Rogers, D.F.: An Introduction to NURBS with Historical Perspective. Academic Press (2001)
13. Yagawa, G., Yamada, T.: Free mesh methods: a new meshless finite element method. Comput. Mech. 18, 383–386 (1996)
14. Yagawa, G., Furukawa, T.: Recent developments of free mesh method. Int. J. Numer. Methods Eng. 47, 1419–1443 (2000)
15. Nagashima, T.: Node-by-node meshless approach and its application to structural analyses. Int. J. Numer. Methods Eng. 46, 341–385 (1999)
16. Nagashima, T.: Development of a CAE system based on the node-by-node meshless method. Comput. Methods Appl. Mech. Eng. 187, 1–34 (2000)
17. Yagawa, G., Shirazaki, M.: Parallel computing for incompressible flow using a nodal-based method. Comput. Mech. 23, 209–217 (1999)
18. Furukawa, T., Yang, C.Q., Yagawa, G., Wu, C.C.: Quadrilateral approaches for accurate Free Mesh Method. Int. J. Numer. Methods Eng. 47, 1445–1462 (2000)
19. Fujisawa, T., Yagawa, G.: A virtually meshless formulation for compressible high speed flows with Free Mesh Method. Comput. Fluid Solid Mech. 2, 836–838 (2001)
20. Yagawa, G.: Parallel computing of local mesh finite element method. Comput. Mech. New Front. New Millen. 1, 17–26 (2001)
21. Fujisawa, T., Inaba, M., Yagawa, G.: Parallel computing of high-speed compressible flows using a node-based finite element method. Int. J. Numer. Methods Eng. 58, 481–511 (2003)
22. Fujisawa, T., Yagawa, G.: A FEM-based meshfree method with a probabilistic node generation technique. Int. J. Comput. Methods 1(2), 241–265 (2004)
23. Yagawa, G.: Node-by-node parallel finite elements: a virtually meshless method. Int. J. Numer. Methods Eng. 60, 69–102 (2004)
24. Yagawa, G., Miyamura, T.: Three-node triangular shell element using mixed formulation and its implementation by Free Mesh Method. Comput. Struct. 83, 2066–2076 (2005)
25. Tian, R., Matsubara, H., Yagawa, G.: Advanced 4-node tetrahedrons. Int. J. Numer. Methods Eng. 68(12), 1209–1231 (2006)
26. Tsuchida, J., Fujisawa, T., Yagawa, G.: Direct numerical simulation of aerodynamic sounds by a compressible CFD scheme with node-by-node finite elements. Comput. Methods Appl. Mech. Eng. 195, 1896–1910 (2006)
27. Yagawa, G., Matsubara, H.: Enriched free mesh method: an accuracy improvement for node-based FEM. Comput. Plastic. Comput. Methods Appl. Sci. 7, 207–219 (2007)
28. Yagawa, G.: Free mesh method: fundamental conception, algorithms and accuracy study. Proc. Japan Acad. Series B 87(4), 115–134 (2011)
29. Yagawa, G.: Computational performance of free mesh method applied to continuum mechanics problems. Proc. Japan Acad. Series B 87(4), 135–151 (2011)
30. Liu, G.R.: Mesh Free Methods: Moving beyond the Finite Element Method. CRC Press (2003)
31. Nayroles, B., Touzot, G., Villon, P.: Generalizing the finite element method: diffuse approximation and diffuse elements. Comput. Mech. 10, 307–318 (1992)
32. Belytschko, T., Lu, Y.Y., Gu, L.: Element-free Galerkin method. Int. J. Numer. Methods Eng. 37, 229–256 (1994)
33. Lu, Y.Y., Belytschko, T., Gu, L.: A new implementation of the element free Galerkin method. Comput. Methods Appl. Mech. Eng. 113, 397–414 (1994)
34. Atluri, S.N., Zhu, T.: A new meshless local Petrov–Galerkin (MLPG) approach in computational mechanics. Comput. Mech. 22, 117–127 (1998)
35. Atluri, S.N., Kim, H.G., Cho, J.Y.: A critical assessment of the truly meshless local Petrov–Galerkin (MLPG), and local boundary integral equation (LBIE) methods. Comput. Mech. 24, 348–372 (1999)
36. Liu, G.R., Gu, Y.T.: A point interpolation method for two-dimensional solids. Int. J. Numer. Methods Eng. 50, 937–951 (2001)
37. Idelsohn, S.R., Oñate, E., Del Pin, F.: The particle finite element method: a powerful tool to solve incompressible flows with free-surfaces and breaking waves. Int. J. Numer. Methods Eng. 61(7), 964–989 (2004)
38. Idelsohn, S.R., Oñate, E.: To mesh or not to mesh. That is the question.... Comput. Methods Appl. Mech. Eng. 195(37–40), 4681–4696 (2006)
39. Gingold, R.A., Monaghan, J.J.: Smoothed particle hydrodynamics: theory and application to non-spherical stars. Mon. Not. R. Astron. Soc. 181, 375–389 (1977)
40. Lucy, L.B.: A numerical approach to the testing of the fission hypothesis. Astron. J. 82, 1013–1024 (1977)
41. Koshizuka, S., Oka, Y.: Moving particle semi-implicit method for fragmentation of incompressible fluid. Nuclear Sci. Eng. 123, 421–434 (1996)
42. Kubo, S.: Inverse problems related to the mechanics and fracture of solids and structures. JSME Int. J. 31(2), 157–166 (1988)