207 20 2MB
English Pages 155 [173] Year 2004
Copyright by Chang-wan Kim 2004
Intellectual Property Statement
“The software implementations of the Automated Multilevel Substructuring (AMLS) method and the Fast Frequency Response Analysis (FFRA) algorithm are commercial products protected by both copyrights and patents. Such protected technology may include some or all of the material described herein. Please contact The Office of Technology Commercialization at The university of Texas at Austin at 512.471.2995 or Professor Jeffrey K. Bennighof at 512.471.4709 if you are interested in licensing or developing a commercial implementation of this technology” i
The Dissertation Committee for Chang-wan Kim certifies that this is the approved version of the following dissertation:
Frequency Response Computation for Complex Structures with Damping and Acoustic Fluid
Committee:
Jeffrey K. Bennighof, Supervisor
Leszek F. Demkowicz
Clint N. Dawson
Eric P. Fahrenthold
Rui Huang
Frequency Response Computation for Complex Structures with Damping and Acoustic Fluid
by
Chang-wan Kim, B.S., M.S.
Dissertation Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
The University of Texas at Austin December 2004
to my parents my wife and children
Acknowledgments First of all, I would like to thank the LORD Jesus for giving me everything I have. He always gives me light in the name of the LORD. I would like to express my gratitude to my advisor, Dr. Bennighof, for his academic advice, teaching, guidance, and insight throughout my research. It has been a pleasure to work with him and he has taught me a great deal about many things. I would also like to thank the members of my committee, Dr. Leszek F. Demkowicz, Dr. Clint N. Dawson, Dr. Eric P. Fahrenthold, and Dr. Rui Huang for their advice and assistance throughout my graduate education. I especially appreciate Dr. Matthew F. Kaplan and Mark Muller who have helped me along in this process. I am very grateful to Mintae Kim, Eric Swenson, Jeremiah Palmer, and Tim Allison for numerous discussions. Also, I am grateful to all my Korean colleagues in ASE/EM department and members of Korean catholic church for their encouragement and prayer. It has been my privilege to work with Mladen Chargin in CDH GmbH. Without his enthusiasm, this research would never have gone this far. Also, I would like to thank Mark Kelly in HP, Doug Petesch in IBM, and Cheng Liao in SGI for their supports. Finally, and most importantly, I would like to thank my parents, brother and sister, and my family for their unconditional love and support throughout my study. v
I am particularly grateful to my wife, Miyoung for her unending patience, love, and prayer.
Chang-wan Kim
The University of Texas at Austin December 2004
vi
Frequency Response Computation for Complex Structures with Damping and Acoustic Fluid
Publication No.
Chang-wan Kim, Ph.D. The University of Texas at Austin, 2004
Supervisor: Jeffrey K. Bennighof
Modal frequency response analysis is a very economical approach for large and complex structural systems since there is an enormous reduction in dimension from the original finite element frequency response problem to the number of modes participating in the response. When damping does not exist, the modal frequency response problem is inexpensive to solve because it becomes uncoupled. However, when damping exists, the modal damping matrices can become fully populated, making the modal frequency response problem expensive to solve at many frequencies. The conventional approach to solve the modal frequency response problem with damping is to use either direct methods with O(n3 ) operations at each frequency, or iterative methods with O(n2 ) operations per iteration and numerous iterations at each frequency, where n is the number of modes used to represent the response. Another approach is to use a state space formulation and an eigensolution to uncouple the damped modal frequency response problem, but this doubles the vii
dimension of the problem. All of the existing traditional methods are very expensive for systems with many modes. In this dissertation, a new algorithm to solve the modal frequency response problem for large and complex structural systems with structural and viscous damping is presented. The newly developed algorithm, fast frequency response analysis (FFRA), solves the damped modal frequency response problem with O(n2 ) operations at each frequency. The FFRA algorithm considers both structural damping and viscous damping for structural systems. When only structural damping exists, the modal frequency response problem is uncoupled by applying the eigensolution of the complex symmetric modal stiffness matrix. A complex symmetric matrix eigensolver (CSYMM) has been developed to solve the complex symmetric matrix eigenvalue problem efficiently. If a viscous damping matrix is also present, the algorithm handles viscous damping by noting that the rank of the viscous damping matrix is typically very low for the problems of interest in the automobile industry because of the small number of viscous damping elements such as shock absorbers and engine mounts. This algorithm has also been applied to the coupled response of systems consisting of a light acoustic fluid and structure, and systems with enforced motion. Also, the algorithm is implemented in parallel on shared memory multiprocessor machines for performance improvement. The FFRA algorithm is evaluated for several industry finite element models which have millions of degrees of freedom. The FFRA algorithm produces outstanding performance compared to the methods available in the commercial finite element software MSC.Nastran or NX.Nastran in terms of analysis time, since the new algorithm is many times faster while obtaining almost the same accuracy as MSC.Nastran. Therefore, the new FFRA algorithm makes inexpensive high frequency analysis possible and extends the capability of solving modal frequency re-
viii
sponse analysis to higher frequencies.
ix
Contents Acknowledgments
v
Abstract
vii
List of Tables
xiv
List of Figures
xvi
Chapter 1 Introduction 1.1
1.2
1
Motivations and Challenges . . . . . . . . . . . . . . . . . . . . . . .
2
1.1.1
Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2
1.1.2
Challenges
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
Outline of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . .
9
Chapter 2 Survey of Methods to Solve Modal Frequency Response Problems
11
2.1
Methods that solve the uncoupled modal frequency response problem
12
2.1.1
Undamped system . . . . . . . . . . . . . . . . . . . . . . . .
12
2.1.2
Damped system with global structural and proportional damping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
12
Damped system with non-proportional damping . . . . . . .
13
Methods that solve the coupled modal frequency response problem .
14
2.1.3 2.2
x
2.2.1
Direct methods for complex indefinite linear systems . . . . .
15
2.2.2
Iterative methods for complex indefinite linear systems . . . .
15
Chapter 3 A New Frequency Response Analysis Algorithm 3.1
19
Modal Frequency Response Problem Formulation using AMLS . . .
19
3.1.1
Overview of AMLS . . . . . . . . . . . . . . . . . . . . . . . .
20
3.1.2
Modal frequency response problem formulation . . . . . . . .
22
3.2
Algorithm FFRA1 : Systems with structural damping . . . . . . . .
24
3.3
Algorithm FFRA2 : Systems with viscous damping . . . . . . . . . .
27
3.4
Algorithm FFRA3 : Systems with both structural and viscous damping 31
3.5
Algorithm FFRA4 : Modal correction approach for system with M, K not diagonalized by Φ . . . . . . . . . . . . . . . . . . . . . . . . .
Chapter 4 Complex Symmetric Matrix Eigensolver
33 38
4.1
Properties of Complex Symmetric Matrices . . . . . . . . . . . . . .
40
4.2
Full Storage Complex Symmetric Matrix Eigensolver . . . . . . . . .
43
4.2.1
Complex symmetric Householder matrix . . . . . . . . . . . .
43
4.2.2
Complex symmetric matrix eigensolver . . . . . . . . . . . . .
46
4.2.3
Tie-situation . . . . . . . . . . . . . . . . . . . . . . . . . . .
48
4.2.4
Tie Breaking . . . . . . . . . . . . . . . . . . . . . . . . . . .
48
Block Packed Storage Symmetric Matrix Eigensolver . . . . . . . . .
53
4.3.1
Block packed storage . . . . . . . . . . . . . . . . . . . . . . .
54
4.3.2
Reduction to tridiagonal form . . . . . . . . . . . . . . . . . .
54
4.3
Chapter 5 Frequency Response Analysis of Acoustic Fluid/Structure Interaction
61
5.1
Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . .
62
5.2
Frequency Response Analysis . . . . . . . . . . . . . . . . . . . . . .
66
xi
Chapter 6 Frequency Response Analysis with Enforced Motion
69
6.1
Motivation
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
70
6.2
Problem Formulation for Structure . . . . . . . . . . . . . . . . . . .
73
6.3
Problem Formulation for Acoustic Fluid and Structure Interaction .
75
6.4
Problem Formulation with Modal Correction Approach . . . . . . .
78
Chapter 7 Parallel Implementation of the New Algorithm 7.1
Parallelization of Frequency Response Analysis . . . . . . . . . . . .
81
7.1.1
Step 1 : Parallel complex symmetric matrix eigensolver . . .
81
7.1.2
Step 2 : Frequency loop . . . . . . . . . . . . . . . . . . . . .
89
7.1.3
Step 3 : Backtransformation
90
. . . . . . . . . . . . . . . . . .
Chapter 8 Numerical Results and Performance 8.1
8.2
8.3
80
91
Accuracy and Performance of CSYMM . . . . . . . . . . . . . . . . .
93
8.1.1
Full storage eigensolver . . . . . . . . . . . . . . . . . . . . .
93
8.1.2
Block packed storage eigensolver . . . . . . . . . . . . . . . .
96
Accuracy and Performance of FFRA . . . . . . . . . . . . . . . . . . 100 8.2.1
Model 1 : Structural damping . . . . . . . . . . . . . . . . . . 100
8.2.2
Model 2 : Structural and viscous damping . . . . . . . . . . . 106
8.2.3
Model 3 : Acoustic fluid . . . . . . . . . . . . . . . . . . . . . 113
8.2.4
Model 4 : Asymmetric B matrix . . . . . . . . . . . . . . . . 119
8.2.5
Model 5 : Full modal mass and stiffness matrices . . . . . . . 123
8.2.6
Model 6 : Enforced motion . . . . . . . . . . . . . . . . . . . 126
8.2.7
Summary of all 6 models . . . . . . . . . . . . . . . . . . . . 132
Parallel Performance of the FFRA . . . . . . . . . . . . . . . . . . . 133 8.3.1
Example 1: model 2 . . . . . . . . . . . . . . . . . . . . . . . 133
8.3.2
Example 2: model 6 . . . . . . . . . . . . . . . . . . . . . . . 134
xii
Chapter 9 Conclusions and Future Work
137
9.1
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
9.2
Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
9.3
Final Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Appendix A Eigenvector orthogonality in the Jordan form
144
Bibliography
146
Vita
155
xiii
List of Tables 3.1
The cost of operations for the FFRA1 algorithm: frequency response analysis with structural damping . . . . . . . . . . . . . . . . . . . .
3.2
The cost of operations for FFRA 2 algorithm: frequency response analysis with viscous damping . . . . . . . . . . . . . . . . . . . . . .
3.3
27
30
The cost of operations for FFRA 3 algorithm: frequency response analysis with structural and viscous damping . . . . . . . . . . . . .
34
4.1
The storage requirements of each stage of the eigenproblem . . . . .
56
8.1
The list of FE models used for numerical examples . . . . . . . . . .
92
8.2
Elapsed time for the reduction of the complex symmetric matrices with CSYMM, CS, and LAPACK [mm:ss] . . . . . . . . . . . . . . .
8.3
93
Elapsed time for the backtransformation of the complex symmetric matrices with CSYMM, CS, and LAPACK [mm:ss] . . . . . . . . . .
94
8.4
Elapsed time for reduction of real symmetric matrices [mm:ss] . . . .
98
8.5
Elapsed time for backtransformation of real symmetric matrices [mm:ss] 98
8.6
FE model 1 analysis information . . . . . . . . . . . . . . . . . . . . 101
8.7
Elapsed time for the modal frequency response analysis for FE model 1102
8.8
Timings of the algorithm FFRA for Model 1 . . . . . . . . . . . . . . 102
8.9
FE model 2 analysis information . . . . . . . . . . . . . . . . . . . . 106
xiv
8.10 Elapsed time for the modal frequency response analysis for FE model 2107 8.11 Timings of the algorithm FFRA for Model 2 . . . . . . . . . . . . . . 109 8.12 FE model 3 analysis information . . . . . . . . . . . . . . . . . . . . 113 8.13 Elapsed time for the modal frequency response analysis for FE model 3115 8.14 Timings of the algorithm FFRA for Model 3 . . . . . . . . . . . . . . 115 8.15 FE model 4 analysis information . . . . . . . . . . . . . . . . . . . . 119 8.16 Elapsed time for the modal frequency response analysis for FE model 4120 8.17 Timings of the algorithm FFRA for Model 4 . . . . . . . . . . . . . . 120 8.18 Elapsed time for the modal frequency response analysis for FE model 5123 8.19 Timings of the algorithm FFRA for Model 5 . . . . . . . . . . . . . . 124 8.20 FE model 6 analysis information . . . . . . . . . . . . . . . . . . . . 126 8.21 Elapsed time for the modal frequency response analysis for FE model 6126 8.22 Timings of algorithm FFRA for Model 6 . . . . . . . . . . . . . . . . 127 8.23 Parallel performance of the FFRA for model 2 . . . . . . . . . . . . 134 8.24 Timings (second) and speedups for parallelized steps in the FFRA algorithm for model 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 134 8.25 Timings (second) and speedups for parallelized steps in the FFRA algorithm for model 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 135 8.26 Parallel performance of the FFRA for model 6 . . . . . . . . . . . . 135 8.27 Timings (second) and speedups for parallelized steps in FFRA algorithm for model 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 8.28 Timings (second) and speedups for parallelized steps in the FFRA algorithm for model 6 . . . . . . . . . . . . . . . . . . . . . . . . . . 136
xv
List of Figures 1.1
Stress-strain diagram showing a hysteresis loop [20] . . . . . . . . . .
5
1.2
Automobile shock absorber in suspension system . . . . . . . . . . .
6
1.3
Example of acoustic fluid and structure interaction problem [6] . . .
8
1.4
Example of enforced motion problem . . . . . . . . . . . . . . . . . .
9
3.1
Analysis flow using the AMLS method and the FFRA algorithm . .
21
3.2
Optimization flow in the modal correction approach . . . . . . . . .
37
4.1
Block packed storage scheme . . . . . . . . . . . . . . . . . . . . . .
55
4.2
Mapping between block packed storage B and the upper triangular matrix A
4.3
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
55
Example of block packed matrix-vector multiplication, x = τ Av, for the p-th column in B4 . . . . . . . . . . . . . . . . . . . . . . . . . .
59
4.4
Example of block packed rank 2k update . . . . . . . . . . . . . . . .
60
5.1
Fluid-structure interaction problem domain . . . . . . . . . . . . . .
63
6.1
Example of large mass approach for the enforced motion problem . .
71
6.2
Norm of residual of each eigenvector from LAPACK and CSYMM .
72
7.1
The flow of the parallel frequency response analysis using the FFRA algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
82
7.2
Elapsed time of PZGEMM with different submatrix sizes for n = 6000 matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
85
7.3
Speedup of PZGEMM for different matrix sizes . . . . . . . . . . . .
85
7.4
Speedup of PZSYR2K for different matrix sizes . . . . . . . . . . . .
86
7.5
Speedup of parallel PZGEMV for different matrix sizes . . . . . . . .
89
8.1
Ratio of elapsed time to compute the eigensolution of complex symmetric matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8.2
95
Eigenvalues calculated from the full storage algorithm DSYTRD, packed storage algorithm DSPTRD, and block packed storage algorithm for a matrix n = 4000 . . . . . . . . . . . . . . . . . . . . . . .
8.3
97
Relative errors between the eigenvalues from the block packed storage algorithm and those from the full storage algorithm for a matrix n = 4000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8.4
97
Ratio of elapsed time to compute the eigensolution of real symmetric matrices: the full storage solver to the block packed storage solver and the full storage solver to the packed storage solver
. . . . . . .
99
8.5
Finite element model: Model 1 with structural damping . . . . . . . 101
8.6
Frequency response for Model 1 with Y direction excitation force . . 104
8.7
Frequency response for Model 1 with Z direction excitation force . . 105
8.8
Finite element model: Model 2 . . . . . . . . . . . . . . . . . . . . . 107
8.9
Frequency responses for Model 2 at cross point 1 . . . . . . . . . . . 110
8.10 Frequency responses for Model 2 at cross point 2 . . . . . . . . . . . 111 8.11 Frequency responses for Model 2 at drive point . . . . . . . . . . . . 112 8.12 Finite element model: Model 3 . . . . . . . . . . . . . . . . . . . . . 114 8.13 Frequency response of structure in X and Z direction with X direction excitation force . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 8.14 Frequency response of fluid with X and Z direction excitation force . 118 xvii
8.15 Frequency response of structure and fluid parts for Model 4 . . . . . 122 8.16 Frequency response of structure in Y direction excitation force . . . 125 8.17 Finite element model: Model 6 . . . . . . . . . . . . . . . . . . . . . 129 8.18 Frequency response of structure at a point . . . . . . . . . . . . . . . 130 8.19 Frequency response of fluid at a point . . . . . . . . . . . . . . . . . 131 8.20 Comparison of the elapsed time for test FE models . . . . . . . . . . 132
xviii
Chapter 1
Introduction Analysis of the dynamic response of large complex structures is a very challenging problem to solve efficiently. A large complex structure such as an automobile, submarine or aircraft can be represented with a finite element (FE) model on which detailed dynamic response analysis can be performed. The dynamic response of a structure to harmonic excitations over a range of frequencies can be found through frequency response analysis. Today’s challenge in frequency response analysis is to obtain accurate frequency response at higher frequencies for FE models with millions of degrees of freedom. Because the frequency response is typically desired at hundreds of frequencies, solutions in terms of all of the FE degrees of freedom are not feasible. Instead modal frequency response approach has been used. The goal of this dissertation is to develop efficient algorithms to solve the modal frequency response problem for structures with structural damping, viscous damping, and acoustic fluid. Newly developed algorithms greatly improve the performance of frequency response analysis compared to current existing methods.
1
1.1 1.1.1
Motivations and Challenges Motivations
There are mainly two methods of frequency response analysis: direct frequency response analysis and modal frequency response analysis. Direct frequency response analysis The forced vibration equations of motion for damped structures discretized by the FE method are ¨ (t) + B x(t) ˙ Mx + (1 + iγ)Kx(t) + iKs x(t) = p(t)
(1.1)
where M , B, and K are the FE mass matrix, the viscous damping matrix and the stiffness matrix, respectively. The scalar γ is a global structural damping coefficient √ and i = −1. Ks is the FE structural damping matrix which represents localized deviations of specific elements from the global structural damping level. For a harmonic excitation p(t) = P(ω)eiωt , we assume a harmonic solution of the form x(t) = X(ω)eiωt
(1.2)
where X(ω) is a complex displacement vector and ω is the radian frequency of timeharmonic excitation and response. When the first and second derivatives of equation (1.2) are substituted into equation (1.1), the following is obtained after dividing by eiωt : £ 2 ¤ −ω M + iωB + (1 + iγ)K + iKs X(ω) = P(ω)
(1.3)
This is a system of equations for the direct frequency response analysis. The frequency response X(ω) is calculated at each frequency ω by solving a set of complex linear equations. Today, the size of finite element models is growing larger for more accurate analysis, so that they have millions of degrees of freedom. Therefore, solving 2
these very large finite element systems of equations at many frequencies has been prohibitive because of the large amounts of CPU time, memory, and data transfer required, even though this method is very straightforward. Instead, modal frequency response analysis has been used. Modal frequency response analysis This method uses the modes of a structure to reduce the dimension of the frequency response problem and uncouple the equations of motion if there is no damping. Since it is very expensive to compute the damped modes [21–23], the undamped modes have been used in practice to form the modal frequency response problem. The modes of the structure having natural frequencies up to a specified cutoff frequency are obtained from a partial eigensolution of the generalized eigenvalue problem KΦ = M ΦΛ
(1.4)
where Φ is the rectangular matrix whose columns are eigenvectors and Λ is a diagonal matrix containing real valued eigenvalues. The frequency response problem in equation (1.3) is projected onto the subspace spanned by eigenvectors in Φ. By making the approximation X(ω) ≈ ΦZ(ω) and premultiplying by ΦT , the modal frequency response problem is obtained as [−ω 2 ΦT M Φ + iωΦT BΦ + (1 + iγ)ΦT KΦ + iΦT Ks Φ]Z(ω) = ΦT P(ω).
(1.5)
With mass normalization, ΦT M Φ = I and ΦT KΦ = Λ, so equation (1.5) can be written in the form ¯ + (1 + iγ)Λ + iK¯s ]Z(ω) = F(ω) [−ω 2 I + iω B
(1.6)
¯ = ΦT BΦ and where F(ω) = ΦT P(ω). Note that the modal damping matrices B K¯s = ΦT Ks Φ are not diagonal except in certain special cases. Now equation (1.6)
3
contains the equations of motion in terms of the modal coordinates. The dimension of each modal matrix is equal to the number of modes in Φ, typically in the thousands, found in equation (1.4). Therefore, there is an enormous reduction in dimension from the original frequency response problem of equation (1.3) to the modal frequency response problem of equation (1.6), so that the solution of the modal frequency response problem becomes much more economical. Some of the most challenging aspects of performing the modal frequency response analysis arise from the modeling of damping. In the frequency response analysis of structures, damping is almost unavoidable. Damping dissipates some energy during each cycle of response. Generally, two types of damping are used for modeling complex structures: structural damping, represented with the matrix Ks , and viscous damping, represented with the matrix B. • Structural damping Structural damping encompasses energy dissipation related to the internal structure of a vibrating body. It is especially intended to account for energy loss due to the hysteresis of elastic materials experiencing cyclic stress as shown in Figure 1.1. During loading, a material in cyclic stress follows a stressstrain path that differs from the path during unloading. The stress-strain curve forms a hysteresis loop and the area enclosed by the curve is equal to the dissipated energy during one cycle of stress. Note that in the frequency response, structural damping produces imaginary numbers in the complex stiffness matrix which are independent of the excitation frequency. In reality, in many structures, structural damping is the main source of damping and it strongly affects vibration of mechanical systems [1]. Structural damping has been found experimentally to represent the energy dissipation exhibited by automobile structures more accurately than the viscous damping model. • Viscous damping 4
Figure 1.1: Stress-strain diagram showing a hysteresis loop [20] Viscous damping is used to model certain discrete devices in an automobile where energy dissipation is not represented well by structural damping. Typically, these devices only include shock absorbers and, in some cases, engine mounts. A viscous damper creates forces at each end which oppose the relative velocity between its ends, and are of magnitude proportional to this relative velocity. The resulting damping force vector depends on the excitation frequency. With most structures, a relatively small amount of viscous damping provides a large reduction in stress and deflection by dissipating energy. For example, in an automobile suspension system, a damper or shock absorber is used to control the motion of the springs as shown in Figure 1.2. The damping forces required are quite small compared to those of springs which must support the vehicle and deflect under bump loadings. However, viscous dampers can greatly reduce unpleasant automobile vibrations. Other examples are pneumatic parts in doors, and seismic viscous dampers inside buildings. When structural and viscous damping do not exist, solving the modal frequency response problem is very inexpensive because the coefficient matrix becomes 5
Damper and spring
Figure 1.2: Automobile shock absorber in suspension system diagonal. However, when damping matrices are nonzero, the modal damping matrices are fully populated except in special cases such as proportional damping. Therefore, it becomes very expensive to solve the modal frequency response problem with damping. In current commercial FE analysis software, the complex dense coefficient matrix in equation (1.6), whose dimension is in the thousands, is factored at each frequency. As the upper frequency limit of frequency response analysis increases, the cutoff frequency for the modes increases proportionally. For many structures of interest, including automobile bodies and other shelllike vehicle structures, the number of modes below the cutoff frequency is roughly proportional to the square of the cutoff frequency. Therefore, the number of modes increases rapidly as the cutoff frequency increases. The cost of a factorization is O(n3 ) operations, where n is the number of modes used to represent the response. This makes the expense of the modal frequency response analysis increase extremely rapidly as the upper frequency limit for the analysis increases. Therefore, the cost of solving the modal frequency response equation with damping is now emerging as a significant issue.
6
1.1.2
Challenges
The challenge of this dissertation is the development of a new algorithm, fast frequency response analysis (FFRA), to solve the modal frequency response problem with structural and viscous damping. Whereas the conventional approach in commercial FE software requires O(n3 ) operations for the factorization of the coefficient matrix at each excitation frequency, the goal of the new algorithm is O(n2 ) operations. There are some more challenging research issues that can be addressed using the new FFRA algorithm. Some of these issues are frequency response analysis for acoustic fluid and structure interaction, frequency response analysis for enforced motion, and parallel implementation. Therefore, the scope of this dissertation is extended to these topics: • Acoustic fluid and structure interaction Reduction of vehicle noise has been a top priority for car manufacturers to improve ride quality. Today’s automobile industry uses detailed acoustic fluid and structure FE models to predict acoustic responses due to road, tire, powertrain, and driveline excitation. An example of such a model is shown in Figure 1.3. The vibrating structure generates acoustic pressures in the acoustic cavity of the passenger compartment, so that not only the structure part of the response but also the fluid part of the response to excitation is of interest. Since a typical industry automobile FE model for the acoustic fluid and structure interaction problem contains a detailed body structure, detailed suspension, tires, detailed driveline, powertrain, and acoustic cavity, these FE models usually have millions of degrees of freedom. Therefore, there has been a limitation in the excitation frequency range for the acoustic fluid and structure interaction problem with these large scale FE models because of high computational cost with current existing methods. 7
Structure-Acoustic FE model
Acoustic FE model for passenger compartment
Figure 1.3: Example of acoustic fluid and structure interaction problem [6] • Enforced motion The structural systems analyzed are subjected to various loads and boundary conditions. Boundary conditions are classified as single point constraints, multi-point constraints, and enforced motion or base motion. Enforced motion is used when base motion is specified instead of or in conjunction with applied loads. The enforced motion technique specifies the displacement, velocity, and/or acceleration at nodes for frequency response analysis [68]. In an automobile, there is a need to enforce the motion of tires to simulate the vehicle traveling over a prescribed surface as shown in Figure 1.4. This enforced motion analysis has been one of the essential analyses for improving ride quality. • Parallelization Much of the research in developing software for advanced architecture computers is motivated by the need to solve large and complex problems in paral8
M, B, K
Prescribed Surface
Figure 1.4: Example of enforced motion problem lel. Parallel machines have been divided into two broad types based on their physical structure: distributed memory multiprocessors and shared memory multiprocessors. Parallel implementation in shared memory multiprocessors is easy and simple since each processor has direct access to the memory shared by all of the processors in the system. Today, the popularity of shared memory multiprocessor systems is increasing in industry, so it is necessary to parallelize the new frequency response analysis algorithm FFRA for shared memory multiprocessors.
1.2
Outline of the Dissertation
The main thrust of this dissertation is the development of new algorithms to solve the modal frequency response problem with structural and viscous damping. The goal of the new algorithm is O(n2 ) operations at each excitation frequency, where n is the number of modes used to represent the response and is usually in the thousands. The following is an outline of the dissertation. • Chapter 2 surveys several existing methods for solving the damped modal frequency response problem.
9
• Chapter 3 presents an efficient frequency response analysis algorithm (FFRA) for structures with O(n2 ) operations at each frequency, in which structural and viscous damping are considered. • In Chapter 4, a complex symmetric matrix eigensolver (CSYMM) is presented for solving a complex symmetric matrix eigenvalue problem efficiently. This eigensolver is essential for the new FFRA algorithm described in Chapter 3. • In Chapter 5, we present a method for solving the acoustic fluid and structure interaction problem using the FFRA algorithm. In automobile acoustic fluidstructure interaction analysis, the response of the structure is more expensive than the acoustic fluid response, so the new algorithm is applied to handle the structure part to reduce the computational cost. • Chapter 6 presents an approach for solving the enforced motion problem effectively using the FFRA algorithm. • Chapter 7 describes parallelizations for the FFRA algorithm presented in Chapter 3 and 4, and all other applications described in Chapter 5 and 6. The parallelization is for shared memory multiprocessors and is implemented using the OpenMP application program interface. • Chapter 8 shows the performance and accuracy of the FFRA algorithm using numerical results from several industry FE models. • Finally, Chapter 9 presents conclusions of this dissertation and identifies potential future work.
10
Chapter 2
Survey of Methods to Solve Modal Frequency Response Problems In this chapter, several methods for solving the modal frequency response problem are surveyed. As mentioned in Chapter 1, solving the direct frequency response problem at each frequency ω is not feasible for large structural system FE models, so we consider methods for solving the modal frequency response problem only. For the purpose of surveying methods to solve the modal frequency response problem, we write the forced vibration equations of motion for damped structures with the modal superposition technique in the form ¯ z(t) ˙ + (1 + iγ)Λz(t) + iK¯s z(t) = ΦT p(t) I¨ z(t) + B
(2.1)
where x(t) ≈ Φz(t) and x(t) is defined in equation (1.1). Then, we rewrite the modal frequency response problem in equation (1.6) ¯ + (1 + iγ)Λ + iK¯s ]Z(ω) = F(ω). [−ω 2 I + iω B 11
(2.2)
Methods are divided into two types: those that solve the uncoupled modal frequency response problem, and those that solve the coupled modal frequency response problem.
2.1
Methods that solve the uncoupled modal frequency response problem
2.1.1
Undamped system
For an undamped system, equation (2.2) becomes the uncoupled equation £ 2 ¤ −ω I + Λ Z(ω) = F(ω),
(2.3)
so that it is easy to solve this modal frequency response problem. Once the modal responses Z(ω) are computed, physical responses are recovered as the summation of the modal responses using X(ω) = ΦZ(ω).
2.1.2
Damped system with global structural and proportional damping
With global structural damping γ and proportional viscous damping B = αK + βM in which α and β are constant scalars, the modal frequency response problem in equation (2.2) becomes £ 2 ¤ −ω I + iω(αI + βΛ) + (1 + iγ)Λ Z(ω) = F(ω)
(2.4)
as a result of the mode orthogonality property and mass normalization. The coefficient matrix becomes a diagonal matrix, so it is inexpensive to solve this linear system. However, it is inaccurate to represent the real damping of structures with these simple models of damping because these models assume that the energy loss mechanism is distributed over the structure in the same way as the mass and the stiffness. 12
2.1.3
Damped system with non-proportional damping
In reality, for physical structures and systems, the damping matrices will not always be proportional to the mass and/or stiffness matrices. This type of damping can be classified as non-proportional damping. The previously used formulation of the eigenvalue problem for the undamped system as shown in equation (1.4) will not yield eigenvectors that uncouple the frequency response problem with nonproportional damping in equation (1.3). To uncouple the equation in (2.2), one needs an eigensolution for the damped system in equation (2.1). The eigensolution of the damped system is the solution of the quadratic eigenvalue problem (QEP) ¯ + (1 + iγ)Λ + iK¯s )vi = 0, Q(λi )vi = (λ2i I + λi B
i = 1, · · · , 2n
(2.5)
where vi is the eigenvector corresponding to the eigenvalue λi . The QEP has 2n eigenvalues with 2n eigenvectors. There are two ways for solving the QEP, those that solve the QEP problem in its original form, and those that linearize it into a generalized eigenvalue problem (GEP) and then apply GEP techniques [21]. Generally, it is known that the former methods are more difficult to implement and costly to solve than the latter methods [31–33]. The linearization for Q(λi ) can be obtained by reformulating equation (2.2) into 2n first order equations using a state space representation. There are several ways [18] of rewriting equation (1.1). One typical way is as follows [35] 0 I −I 0 y(t) y(t) = q(t) ˙ + ¯ ¯ I B 0 (1 + iγ)Λ + iKs
(2.6)
˙ where the state vector, y(t), is of order 2n and contains both z(t) and z(t) in the form
z(t) ˙ y(t) = z(t) 13
(2.7)
and
q(t) =
0
ΦT p(t)
.
(2.8)
Equation (2.6) can be simplified to the system of 2n equations ˙ Ay(t) + Cy(t) = q(t).
(2.9)
This equation now yields a generalized eigenvalue problem in the form Cwi = λi Awi ,
i = 1, · · · , 2n
Note that the equation Q(λi )vi = 0 is equivalent to (C − λi A)wi = 0 with λi vi . wi = vi
(2.10)
(2.11)
Then, similar to the case for an undamped system using the modal superposition method, a set of weighted orthogonality relationships is valid for the system matrices A and C [24, 35]. w∗ Aw = diag(α1 , · · · , α2n ) w∗ Cw = diag(β1 , · · · , β2n )
(2.12)
where w∗ is the conjugate transpose of w and the w∗ = wT if A and C are real. Then equation (2.6) can be reduced to 2n decoupled equations. However, this method requires an additional eigenvalue problem in equation (2.10) which has 2n dimension. Complex arithmetic also increases the cost. Therefore, this approach is not appropriate for large FE models which require more than thousands of modes to represent the responses.
2.2
Methods that solve the coupled modal frequency response problem
There are two ways to solve a coupled modal frequency response problem: direct methods and iterative methods. 14
2.2.1
Direct methods for complex indefinite linear systems
Direct methods are the most straightforward method for solving the damped modal frequency response problem without uncoupling equation (2.2). Equation (2.2) is a complex indefinite linear system of equations. To solve indefinite linear systems, there are three well known algorithms in direct methods when A is a dense matrix: the Bunch-Kaufman algorithm [37], the Bunch-Parlett algorithm [38], and Aasen’s algorithm [39]. All of them provide stable factorizations of symmetric indefinite matrices. Aasen’s algorithm computes an LT LT factorization, where T is symmetric tridiagonal. The Bunch-Parlett and Baunch-Kaufman algorithms factor A as P AP T = LDLT , where P is a permutation matrix, D is a block diagonal matrix composed of 1 × 1 or 2 × 2 blocks, and L is an unit lower triangular matrix. The most recent of these, the Bunch-Kaufman algorithm, has advantages over the other two algorithms [40], so that it has been the most commonly used. This algorithm is adopted in both LINPACK [44] and LAPACK [58]. Overall, all of the factorization algorithms require n3 /3 flops and n2 /2 storage. Therefore, it is very expensive to use a direct method for structural systems with many modes.
2.2.2
Iterative methods for complex indefinite linear systems
For large linear systems, iterative methods have a decisive advantage over direct methods in terms of speed and demands of computer memory. There are two categories of iterative methods [48]. Stationary methods are older, and simpler to understand and implement, but usually not as effective. Nonstationary methods are a relatively recent development. Their analysis is usually harder to understand, but they can be highly effective. • Stationary iterative methods In a stationary method, a general type of iterative process for solving a linear
15
system Ax = b can be expressed in the form M x(k) = (M − A)x(k−1) + b,
(2.13)
where A = M − N is a splitting of the matrix A. This iteration is convergent if the spectral radius of the matrix M −1 N , denoted by ρ(M −1 N ), is less than one. For rapid convergence, we should choose M and N so that ρ(M −1 N ) is as small as possible. The four main stationary methods are the Jacobi method, Gauss-Seidel method, successive overrelaxation method (SOR), and symmetric successive overrelaxation method (SSOR). Although these stationary iterative methods are simple to implement, these methods may break down for indefinite linear systems [73]. Therefore stationary iterative methods are not appropriate for damped modal frequency response problems. • Nonstationary iterative methods Nonstationary methods are highly effective and are based on a sequence of orthogonal vectors. They construct a suitable basis for the Krylov subspace and solve a system Ax = b projected onto this Krylov subspace. Let Kn (A, r) denote the Krylov subspace Kn (A, r) = span{r0 , Ar0 , · · · , An−1 r0 }
(2.14)
where r0 = b − Ax(0) is the residual for the initial solution x(0) . The Conjugate Gradient (CG), the BiConjugate Gradient (BiCG) [48], the Generalized Minimal Residual (GMRES) [45], and the Quasi-Minimal Residual (QMR) [47] methods are some of the typical algorithms in this category. The CG method is very effective since it forms the projected system by a simple three term recurrence, but it can be used only when the coefficient matrix is symmetric positive definite. Therefore it is impossible to use the CG method for solving the damped modal frequency response problem since the coefficient matrix is indefinite. 16
Various methods have been proposed for a system in which A is indefinite. One of the most popular methods is GMRES. The GMRES algorithm generates a sequence of orthogonal vectors, known as Arnoldi vectors, based on the modified Gram-Schmidt procedure in the Krylov subspace. The full GMRES retains all of the previously computed vectors in the orthogonal sequence. The major drawback is that it requires excessive storage requirements and computational costs for the orthogonalization. The restarted GMRES retains only some of the previously computed orthogonal vectors and restarts iteration. This procedure is repeated until convergence is achieved. The Minimum Residual (MINRES)method is an alternative for symmetric indefinite coefficient matrices [46]. While the GMRES method maintains orthogonality of the residuals by using a long recurrence at the cost of a large storage demand, the BiCG method takes another approach. It does not minimize a residual, but replaces the orthogonal sequence of residuals by two mutually orthogonal sequences at the price of no longer providing a minimization. It can be accomplished using the unsymmetric Lanczos process, in which bases are generated for the two involved Krylov spaces. Therefore, the generation of basis vectors is relatively cheap and the memory requirement is low. The BiCG method often displays rather irregular convergence behavior because of its dependence on the unsymmetric Lanczos process. The QMR attempts to overcome this problem. The main idea is to solve the reduced tridiagonal system in a least squares sense, similar to GMRES. Since the constructed basis for the Krylov subspace is bi-orthogonal, rather than orthogonal as in GMRES, the obtained solution is viewed as a quasi-minimal residual solution. Additionally, the QMR method uses look-ahead techniques to avoid breakdowns in the Lanczos process, which makes it more robust than the BiCG
17
method [48]. All of these iterative methods require O(n2 ) operations per iteration. However, the convergence rate of iterative methods depends on spectral properties of the coefficient matrix, so that iterative methods require numerous iterations at each frequency. In addition, when a system has multiple right hand sides, the cost is increased in proportion to the number of right hand sides. Therefore, there are still disadvantages when iterative methods are used to solve the damped modal frequency response problem.
18
Chapter 3
A New Frequency Response Analysis Algorithm This chapter introduces a new frequency response analysis algorithm, known as fast frequency response analysis (FFRA), for structures with structural and viscous damping. At first, the damped modal frequency response problem is formulated using the Automated Multi-Level Substructuring (AMLS) method [25] since AMLS is an efficient method for computing an approximate partial eigensolution and can form modal matrices efficiently. Then, algorithms for structures with structural damping, or both structural and viscous damping, are developed. Finally, an algorithm for the special case in which all modal matrices are full is developed.
3.1
Modal Frequency Response Problem Formulation using AMLS
In modal frequency response analysis, it is necessary to obtain the required natural frequencies and natural modes of a structure up to a specified cutoff frequency as shown in equation (1.4). There have been several methods of calculating or 19
approximating eigensolutions for large scale structural systems such as the Lanczos method [7], the component mode synthesis method [9–11], and the AMLS method [25–28]. The research work in this dissertation uses the AMLS method. The recently developed AMLS method can calculate the partial eigensolution with good performance, so it is now widely used in industry. Another motivation to use the AMLS method is that it forms the modal damping matrices efficiently. While a classical approach requires huge amounts of data transfer for generating the modal damping matrices, the approach with the AMLS method can form the modal damping matrices efficiently without forming the FE eigenvector matrix explicitly, so the amount of data transfer can be significantly reduced.
3.1.1
Overview of AMLS
The AMLS method is very closely related to the mode superposition method [53] and the component mode synthesis method. In the AMLS method, a FE model of a structure is automatically and recursively divided into thousands of substructures on multiple levels. The eigenspace of each substructure is truncated for dimensional reduction, and the eigenvectors associated with these substructures are used to approximate the partial eigensolution of the FE model. The AMLS method for obtaining the partial eigensolution of the FE model is composed of five phases. The flow of the AMLS method is illustrated in Figure 3.1. In PHASE 1, FE matrices are generated. In this dissertation, NX.Nastran version 1.0.1, which is the same as MSC.Nastran version 2001, is used to generate this FE data since it is standard commercial FE software for large-scale vibration analysis in industry. We will refer to either NX.Nastran or MSC.Nastran 2001 as NASTRAN in this dissertation. Once FE matrices are obtained, these are partitioned into many substructures on a number of levels automatically in PHASE 2 based on the sparsity
20
AMLS eigensolver PHASE 1 Generate the finite element matrices
PHASE 2 Reorder and partition the finite element matrices
PHASE 3 Transform the model
PHASE 4 Compute the global eigensolution
PHASE 5 Backtransform to FE space and form the modal matrices
Fast Frequency Response Analysis (FFRA)
Figure 3.1: Analysis flow using the AMLS method and the FFRA algorithm
21
structure of system matrices. PHASE 3 computes partial substructure eigensolutions whose eigenvalues are below a specified cutoff frequency, which is higher than the square of the highest excitation frequency of interest, and projects the FE matrices onto a substructure eigenvector subspace. PHASE 4 approximates the global eigensolution in terms of the substructure eigenvectors found in PHASE 3. PHASE 5 computes approximate FE eigenvectors and generates modal damping matrices. Once modal matrices are obtained in AMLS using the partial eigensolution, the FFRA algorithm solves the modal frequency response problem. This dissertation is focused on the FFRA algorithm for solving the modal frequency response problem efficiently.
3.1.2
Modal frequency response problem formulation
The frequency response problem in the FE space is rewritten in the form [−ω 2 M + iωB + (1 + iγ)K + iKs ]X(ω) = P (ω)
(3.1)
where M , B, K, Ks ∈ RNeq ×Neq and P ∈ CNeq ×Ncase are FE matrices, X(ω) ∈ CNeq ×Ncase is the displacement matrix and P (ω) ∈ RNeq ×Ncase is the force matrix. Neq is the number of finite element degrees of freedom and Ncase is the number of load cases. By using the AMLS transformation matrix T ∈ RNeq ×NA whose columns are substructure eigenvectors, which is generated in PHASE 3, one can assume FE displacements in the form X(ω) ≈ T Y (ω)
(3.2)
where Y (ω) ∈ RNA ×Ncase contains coefficients of substructure eigenvectors. NA is the number of substructure eigenvectors. The frequency response problem in the FE space can be projected onto a substructure eigenvector subspace by substituting
22
equation (3.2) into equation (3.1) and then premultiplying by T T as follows. [−ω 2 M + iωB + (1 + iγ)K + iKs ]Y (ω) = F(ω)
(3.3)
where the transformed matrices are defined as M = T T M T ∈ RNA ×NA B
= T T BT ∈ RNA ×NA
K
= T T KT ∈ RNA ×NA
(3.4)
Ks = T T Ks T ∈ RNA ×NA F
= T T P ∈ RNA ×Ncase .
Based on the approximate global eigensolution Φg which is obtained by solving the reduced eigenvalue problem, KΦg = MΦg Λ, in PHASE 4, the frequency response problem on the substructure eigenvector subspace in equation (3.3) can be projected again onto a subspace of global modes Φg by approximating Y (ω) as Y (ω) ≈ Φg Z(ω)
(3.5)
where Z(ω) ∈ RNev ×Ncase is the modal coefficient matrix. Substituting equation (3.5) into equation (3.3), premultiplying by ΦTg , and using orthogonality relations and mass normalization give the following damped modal frequency response problem ¯ + (1 + iγ)Λ + iK ¯ s ]Z(ω) = F (ω) [−ω 2 I + iω B
(3.6)
¯ s = ΦTg Ks Φg and F = ΦTg F. The equation obtained is the ¯ = ΦTg BΦg , K where B same as equation (1.6). Once the modal solution Z(ω) is obtained over the desired frequency range, the modal solution needs to be backtransformed to the FE space to get the frequency response. It can be represented as X(ω) ≈ T Y (ω) ≈ T Φg Z(ω).
23
(3.7)
Note that forming the modal damping matrices using the AMLS method with the following order of matrix computation ¯ = ΦT BΦ = ΦT (T T BT )Φg B g
(3.8)
K¯s = ΦT Ks Φ = ΦTg (T T Ks T )Φg
(3.9)
is more advantageous in the operation count and the amount of data transfer than the classical way shown in equation (1.6) which has the following order of matrix operations ¯ = ΦT BΦ = (T Φg )T B(T Φg ) B
(3.10)
¯ s = ΦT Ks Φ = (T Φg )T Ks (T Φg ). K
(3.11)
Using the AMLS method, T T BT and T T Ks T are obtained implicitly since T is never formed explicitly, because T is a huge matrix and is very costly to form explicitly. Also, these operations are performed naturally in the AMLS transformation without requiring additional data and cost since these transformations utilize the information which has already been computed to transform the mass and stiffness matrices. In other words, these matrices are obtained in the same way that the mass and stiffness matrices are transformed. Therefore, the AMLS method can significantly reduce the computational cost of forming modal damping matrices compared to the classical approach which forms Φ explicitly with huge amounts of data transfer.
3.2
Algorithm FFRA1 : Systems with structural damping
Without viscous damping, i.e., B = 0, the modal frequency response problem in equation (3.6) becomes ¯ s ]Z(ω) = F (ω). [−ω 2 I + (1 + iγ)Λ + iK 24
(3.12)
Most commercial finite element software solves this equation by factoring the com3 ) at each plex coefficient matrix at each frequency. The cost of operations is O(Nev 3 ∗N frequency, so that the total cost becomes O(Nev f req ) operations which is very
expensive for large size problems. Nf req is the number of excitation frequencies. In the FFRA1 algorithm, instead of factoring the complex coefficient matrix or using any iterative methods, we intend to diagonalize the coefficient matrix. First, a complex symmetric matrix C is defined ¯ s, C = (1 + iγ)Λ + iK
C = C T ∈ CNev ×Nev
(3.13)
and equation (3.12) is rewritten as [−ω 2 I + C]Z(ω) = F (ω).
(3.14)
Note that C is not a Hermitian matrix, but it is frequency independent. Next, the FFRA1 algorithm solves the eigenvalue problem for C CΦC = ΦC ΛC
(3.15)
where ΛC ∈ CNev ×Nev is the diagonal matrix of complex eigenvalues and ΦC ∈ CNev ×Nev is the complex eigenvector matrix. ΦC is normalized to satisfy ΦC ΦTC = ΦTC ΦC = I
(3.16)
ΦTC CΦC = ΛC .
(3.17)
and
To solve this eigenvalue problem efficiently, the complex symmetric matrix eigensolver CSYMM is developed and described in detail in Chapter 4. In the frequency response problem in equation (3.14), we let Z(ω) = ΦC W (ω).
25
(3.18)
Substituting equation (3.18) into equation (3.14) and premultiplying by ΦTC gives ΦTC [−ω 2 I + C]ΦC W (ω) = ΦTC F (ω).
(3.19)
Combining equation (3.16), (3.17), and (3.18) with equation (3.19), the frequency response analysis problem can be written as [−ω 2 I + ΛC ]W (ω) = ΦTC F (ω)
(3.20)
where the coefficient matrix D(ω) = (−ω 2 I + ΛC ) becomes a diagonal matrix and is frequency dependent. Therefore, the solution W (ω) can be computed easily as follows. W (ω) = [−ω 2 I + ΛC ]−1 ΦTC F (ω) = D(ω)−1 ΦTC F (ω)
(3.21)
Then, the modal solution Z(ω) can be obtained from the backtransformation in equation (3.18). Table 3.1 represents the number of operations for each step in the FFRA1 al3 ) operations to solve the complex gorithm. In step (1), the algorithm requires O(Nev
symmetric eigenvalue problem for C before the modal frequency response analysis begins. Note that this computation is required only one time. At each frequency, 2 ∗ N T O(Nev case ) operations are required to form ΦC F (ω) when F has Ncase right
hand sides. If forces are frequency independent, step (2.1) needs to be computed only once before the frequency sweep. To compute W (ω), O(Nev ∗ Ncase ) opera2 ∗N tions are necessary in step (2.2) and O(Nev case ) operations are necessary for the
backtransformation in step (2.3). Therefore, the main operations of the FFRA1 al3 ) operations for solving the eigenvalue problem once and O(N 2 ) gorithm are O(Nev ev
operations per load case at each frequency. Therefore, as shown in Table 3.1, this new algorithm, FFRA1, is much faster than the conventional approach which factors 3 ) operations. the coefficient matrix at each frequency with O(Nev
26
Table 3.1: The cost of operations for the FFRA1 algorithm: frequency response analysis with structural damping Step Task Cost (1)
CΦC = ΦC Λ
3 ) O(Nev
(2.1) (2.2) (2.3)
for i = 1, Nf req ΦTC F W = D−1 (ΦTC F ) Z = ΦC W end
2 ∗N O(Nev case ) O(Nev ∗ Ncase ) 2 ∗N O(Nev case )
3 ) + O(N 2 ∗ N O(Nev f req ∗ Ncase ) ev
main cost
3.3
Algorithm FFRA2 : Systems with viscous damping
When only viscous damping exists, equation (3.6) can be reduced to ¯ + (1 + iγ)Λ]Z(ω) = F (ω). [−ω 2 I + iω B
(3.22)
¯ can be asymmetric if gyroscopic effects are present. Equation (3.22) can be B rewritten as ¯ [D(ω) + iω B]Z(ω) = F (ω)
(3.23)
D(ω) = −ω 2 I + (1 + iγ)Λ.
(3.24)
where
¯ is fully populated, solving equation (3.23) by factoring the Since the matrix B coefficient matrix is very expensive. The new algorithm FFRA2 handles viscous damping by noting that the rank of the viscous damping matrix is typically very low for problems of interest in the automobile industry because of the small number of viscous damping elements such as shock absorbers and engine mounts. At first, the modal viscous damping matrix
27
¯ is decomposed into B ¯ = ΦT Bb Φb B b
(3.25)
where Bb ∈ RNb ×Nb contains only non-zero rows and columns of the finite element matrix B. Φb ∈ RNb ×Nev contains rows of Φ which corresponds to non-zero elements in B. Generally Nb is much smaller than Neq since B is very sparse. Nb is typically tens of degrees of freedom in automobile structures. For the asymmetric B case, that is Bb 6= BbT , Bb is decomposed by using the singular value decomposition (SVD) as Bb = U ΣV T
(3.26)
where U ∈ RNb ×Nrank , V ∈ RNb ×Nrank , and Σ ∈ RNrank ×Nrank . Nrank is the rank of the matrix Bb and Σ is a diagonal matrix which has the nonzero singular values of ¯ can be represented in the form Bb . By combining equation (3.25) and (3.26), B ¯ = ΦT U ΣV T Φb . B b
(3.27)
Using the decomposition in equation (3.27), equation (3.23) becomes [D(ω) + iωΦTb U ΣV T Φb ]Z(ω) = F (ω).
(3.28)
Equation (3.28) is rewritten as ¯ Q(ω)V¯ T ]Z(ω) = F (ω) [D(ω) + U
(3.29)
¯ = ΦT U ∈ RNev ×Nrank , and V¯ = ΦT V ∈ where Q(ω) = iωΣ ∈ CNrank ×Nrank , U b b RNev ×Nrank . Note that the coefficient matrix in equation (3.29) is composed of ¯ , V¯ and Q, so the coefficient the diagonal matrix D and the low rank matrices U ¯ QV¯ T ) is referred as a diagonal plus low rank (DPLR) matrix. This matrix (D + U DPLR coefficient matrix can be inverted inexpensively using the Sherman-MorrisonWoodbury (SMW) formula [49]. 28
Formula 3.1 (Sherman-Morrison-Woodbury). If A and (I + RT A−1 P ) are invertible, then (A + P RT )−1 = A−1 − A−1 P (I + RT A−1 P )−1 RT A−1 . With this SMW formula, the DPLR coefficient matrix of equation (3.29) can be inverted as follows. ¯ QV¯ T )−1 (D + U ¯ Q1/2 )−1 Q1/2 V¯ T D−1 ¯ Q1/2 (I + Q1/2 V¯ T D−1 U = D−1 − D−1 U = D−1 − S1 (I + S2T R1 )−1 S2T
(3.30)
= D−1 − S1 (I + R2T S1 )−1 S2T ¯ Q1/2 ∈ CNev ×Nrank , S1 = D−1 R1 ∈ CNev ×Nrank , R2 = V¯ Q1/2 ∈ where R1 = U CNev ×Nrank , and S2 = D−1 R2 ∈ CNev ×Nrank . Since the dimension of (I + R2T S1 ) is Nrank , computing the inverse of this matrix is very economical. Therefore, Z(ω) can be obtained easily in the form ¯ QV¯ T ]−1 F (ω) Z(ω) = [D + U = [D−1 − S1 (I + R2T S1 )−1 S2T ]F (ω).
(3.31)
If B is symmetric, we use the eigendecomposition of Bb which is a special case of the SVD of Bb . It becomes Bb = U ΣU T
(3.32)
where Σ ∈ RNrank ×Nrank is a matrix containing the nonzero eigenvalues of Bb and U ∈ RNb ×Nrank is the eigenvector matrix. Then equation (3.23) becomes [D(ω) + iωΦTb U ΣU T Φb ]Z(ω) = F (ω).
(3.33)
Equation (3.33) can be rewritten in the form of a DPLR coefficient matrix ¯ Q(ω)U ¯ T ]Z(ω) = F (ω). [D(ω) + U
(3.34)
In this case, the inverse of the coefficient matrix in equation (3.34) using the SMW formula becomes ¯ QU ¯ T ]−1 = D−1 − S1 (I + RT S1 )−1 S T . [D + U 1 1 29
(3.35)
Table 3.2: The cost of operations for FFRA 2 algorithm: frequency response analysis with viscous damping Step Task Cost (1) (2.1) (2.2)
Bb = U ΣV T ¯ = ΦT U U b V¯ = ΦTb V
O(Nb3 ) O(Nev ∗ Nb ∗ Nrank ) O(Nev ∗ Nb ∗ Nrank )
(3.1) (3.2) (3.3) (3.4) (4) (5.1) (5.2) (5.3) (5.4)
for i = 1, Nf req ¯ Q1/2 R1 = U S1 = D−1 R1 R2 = V¯ Q1/2 S2 = D−1 R2 D−1 F S2T F (I + R2T S1 )−1 (I + R2T S1 )−1 ∗ S2T F S1 ∗ (5.3) end
2 O(Nev ∗ Nrank ) O(Nev ∗ Nrank ) 2 O(Nev ∗ Nrank ) O(Nev ∗ Nrank ) O(Nev ∗ Ncase ) O(Nev ∗ Nrank ∗ Ncase ) 3 O(Nrank ) 2 O(Nrank ∗ Ncase ) O(Nev ∗ Nrank ∗ Ncase )
main cost
O(Nev ∗ Nf req ∗ Ncase )
The cost of operations in this FFRA2 algorithm is summarized in Table 3.2. Each step describes the cost of operations to solve equation (3.31). The SVD ¯ and V¯ needs at most O(Nb3 ) costs in step (1) which is almost negligible. Next U are formed in step (2). Steps (1) and (2) are performed once before the frequency response analysis begins. At each frequency, steps (3) through (5) are executed. Since Nrank is very small compared to Nev , most of operations associated with Nrank are negligible. Therefore, the main cost is O(Nev ∗ Nf req ∗ Ncase ) for the entire frequency range, so the FFRA2 algorithm results in a dramatic reduction in the cost of operations compared to factoring the coefficient matrix at each frequency 3 ) operations. with O(Nev
30
3.4
Algorithm FFRA3 : Systems with both structural and viscous damping
¯ and K ¯ S are both nonzero, equation (3.6) can be When modal damping matrices B rewritten as ¯ + C]Z(ω) = F (ω) [−ω 2 I + iω B
(3.36)
where C is defined in equation (3.13). The previously developed algorithms, FFRA1 and FFRA2, are combined to solve this damped frequency response problem. First, the singular value decomposition for the Bb matrix is performed to give Bb = U ΣV T . With equation (3.27), equation (3.36) becomes [−ω 2 I + iωΦTb U ΣV T Φb + C]Z(ω) = F (ω).
(3.37)
Then, the complex symmetric matrix eigenvalue problem for C is solved as shown in equation (3.15). By letting Z(ω) = ΦC W (ω) and premultiplying equation (3.37) by ΦTC , equation (3.37) becomes ¯ ΣV¯ T + C]ΦC W (ω) = ΦTC F (ω). ΦTC [−ω 2 I + iω U
(3.38)
¯ and V¯ are defined in equation (3.29). Equation (3.38) can be rewritten where U using equations (3.16) and (3.17) in the form ¯ ΣV¯ T ΦC + ΛC ]W (ω) = ΦTC F (ω). [−ω 2 I + iωΦTC U
(3.39)
For convenience, we define the diagonal matrix D(ω) as D(ω) = −ω 2 I + ΛC .
(3.40)
Then, the frequency response problem can be represented as [D(ω) + P Q(ω)R]W (ω) = ΦTC F (ω)
31
(3.41)
¯ ∈ CNev ×Nrank , Q(ω) = iωΣ ∈ CNrank ×Nrank , and R = V¯ T ΦC ∈ where P = ΦTC U CNrank ×Nev . The coefficient matrix (D + P QR) is a DPLR matrix and can be inverted using the SMW formula as explained in section 3.3. The inverse of coefficient matrix (D + P QR) is [D + P QR]−1 = D−1 − D−1 P Q1/2 (I + Q1/2 RD−1 P Q1/2 )−1 Q1/2 RD−1
(3.42)
= D−1 − S1 (I + R2T S1 )−1 S2T . where R1 = P Q1/2 ∈ CNev ×Nrank , S1 = D−1 R1 ∈ CNev ×Nrank , R2 = RT Q1/2 ∈ CNev ×Nrank , and S2 = D−1 R2 ∈ CNev ×Nrank . Then, the solution W (ω) can be obtained as W (ω) = [D + P QR]−1 ΦTC F (ω) [D−1 − S1 (I + R2T S1 )−1 S2T ]F (ω).
(3.43)
Finally, the modal solution Z(ω) is given by the backtransformation Z(ω) = ΦC W (ω)
(3.44)
For the Bb = BbT case, the eigenvalue decomposition for Bb matrix is expressed as Bb = U ΣU T . Then the frequency response problem in equation (3.36) becomes ¯ ΣU ¯ T + C]Z(ω) = F (ω). [−ω 2 I + iω U
(3.45)
and can be rewritten in the form [D(ω) + P Q(ω)P T ]W (ω) = ΦTC F (ω).
(3.46)
The inverse of the coefficient matrix which is a DPLR matrix is given by [D + P QP T ]−1 = D−1 − D−1 P Q1/2 (I + Q1/2 P T D−1 P Q1/2 )−1 Q1/2 P T D−1 = D−1 − S1 (I + R1T S1 )−1 S1T . 32
(3.47)
Therefore, the modal solution becomes Z(ω) = ΦC (D + P QP T )−1 ΦTC F (ω) = ΦC [D−1 − S1 (I + R1T S1 )−1 S1T ]F (ω)
(3.48)
Table 3.3 shows the number of operations in each step of the algorithm. 3 ) operations once, Before frequency response analysis, the FFRA3 requires O(Nev 2 ) operations per load case at each frequency. Since the cost of then it needs O(Nev
decomposing B is negligible, the overall cost is almost the same as the FRRA 1 algorithm.
3.5
Algorithm FFRA4 : Modal correction approach for system with M, K not diagonalized by Φ
Optimization plays an important part in the structure design process. In the optimal design procedure, most commercial FE software restarts the entire analysis whenever a design modification is performed, so optimal design for large structural systems is very expensive. Therefore, although structural optimization is potentially very beneficial, structure optimization for large structures has not been feasible up to now due to its expense. Recently, a significant computational time reduction has been achieved by the modal correction approach, in which the modifications realized on an original structural system are directly taken into account in the equations of motion on the basis of modal correction matrices which can be easily deduced from the FE model of the basic or original configuration [75]. Figure 3.2 shows the basic concept of optimization flow for an approximate model with the modal correction approach. As shown, the optimization algorithm is applied to the approximate model until appropriate design improvement is obtained. However, in the modal correction approach, the mass, stiffness and damping matrices of a modified configuration differ 33
Table 3.3: The cost of operations for FFRA 3 algorithm: frequency response analysis with structural and viscous damping Step Task Cost (1) (2) (3.1) (3.2) (4) (5)
CΦC = ΦC ΛC Bb = U ΣV T ¯ = ΦT U U b ¯ V = ΦTb V ¯ P = ΦTC U T ¯ R = V ΦC
3 ) O(Nev O(Nb3 ) O(Nev ∗ Nb ∗ Nrank ) O(Nev ∗ Nb ∗ Nrank ) 2 ∗N O(Nev rank ) 2 ∗N O(Nev rank )
(6) (7) (8) (9.1) (9.2) (9.3) (9.4) (9.5) (9.6) (9.7) (9.8) (10)
for i = 1, Nf req ΦTC F D = −ωi2 I + ΛC W1 = D−1 ∗ ΦTC F R1 = P Q1/2 R2 = RQ1/2 S1 = D−1 R1 S2 = D−1 R2 S2T ∗ ΦTC F (I + R2T S1 )−1 (I + R2T S1 )−1 ∗ S2T ΦTC F W2 = S1 ∗ (I + R2T S1 )−1 S2T ΦTC F Z = ΦC ∗ (W1 − W2 ) end
2 ∗N O(Nev case ) O(Nev ) O(Nev ∗ Ncase ) 2 O(Nev ∗ Nrank ) 2 O(Nev ∗ Nrank ) O(Nev ∗ Nrank ) O(Nev ∗ Nrank ) O(Nev ∗ Nrank ∗ Ncase ) 3 O(Nrank ) 2 O(Nrank ∗ Ncase ) 2 O(Nev ∗ Nrank ∗ Ncase ) 2 O(Nev ∗ Ncase )
3 ) + O(N 2 ∗ N O(Nev f req ∗ Ncase ) ev
main cost
34
from those in equation (3.1) which is related to the basic or original configuration. The following equation represents the frequency response problem of the modified system in the modal correction approach: [−ω 2 (M +∆M )+iω(B+∆B)+(1+iγ)(K+∆K)+i(Ks +∆Ks )]X(ω) = P (ω) (3.49) where ∆M, ∆B, ∆K, and ∆Ks are the changes in the FE mass, viscous damping, stiffness, and structural damping matrices, respectively. Then, the modal frequency response problem for the modified system, in terms of the original systems’s modes, becomes [−ω 2 M + iωB + (1 + iγ)K + iKs ]Z(ω) = F (ω)
(3.50)
where M
= ΦT (M + ∆M )Φ
= ΦTg [T T (M + ∆M )T ]Φg
B
= ΦT (B + ∆B)Φ
= ΦTg [T T (B + ∆B)T ]Φg
K
= ΦT (K + ∆K)Φ
= ΦTg [T T (K + ∆K)T ]Φg
(3.51)
Ks = ΦT (Ks + ∆Ks )Φ = ΦTg [T T (Ks + ∆Ks )T ]Φg . Note that the modal frequency response problem for the modified configuration does not result in diagonal modal mass and stiffness matrices. In this section, the algorithm FFRA4 is developed to solve the modal frequency response problem of a structural system which has full modal mass and stiffness matrices as shown in equation (3.49). The first step is performing the factorization M = LLT
(3.52)
in which L is lower triangular. By representing Z(ω) as Z(ω) = L−T S(ω), and premultiplying equation (3.50) by L−1 , we obtain [−ω 2 I + iωL−1 BL−T + (1 + iγ)L−1 KL−T + iL−1 Ks L−T ]S(ω) = L−1 F (ω). (3.53) Based on the algorithm FFRA1, the complex symmetric matrix C is defined as C = (1 + iγ)L−1 KL−T + iL−1 Ks L−T 35
(3.54)
and the complex symmetric matrix eigenvalue problem, CΦC = ΦC ΛC , is solved. Then, the singular value decomposition for (B + ∆B) is performed, so the modal viscous damping matrix B is decomposed in the form ¯ ΣV¯ T B = ΦTb (U ΣV T )Φb = U
(3.55)
as described for algorithm FFRA2. With equation (3.54) and (3.55), representing S(ω) as S(ω) = ΦC W (ω) and premultiplying equation (3.53) by ΦTC yield ¯ ΣV¯ T L−T ΦC + ΛC ]W (ω) = ΦTC L−1 F (ω). [−ω 2 I + iωΦTC L−1 U
(3.56)
Finally, equation(3.56) can be represented with a DPLR coefficient matrix as follows. [D(ω) + P Q(ω)R]W (ω) = ΦTC L−1 F (ω)
(3.57)
where D(ω) = −ω 2 I + ΛC ∈ CNev ×Nev , Q(ω) = iωΣ ∈ CNrank ×Nrank and ¯ ∈ CNev ×Nrank P = ΦTC L−1 U
(3.58)
R = V¯ T L−T ΦC ∈ CNrank ×Nev .
(3.59)
Once the solution W (ω) is obtained using the SMW formula, the modal solution Z(ω) can be obtained as Z(ω) = L−T S(ω) = L−T ΦC W (ω).
(3.60)
The cost of algorithm FFRA4 is almost the same as that of algorithm FFRA3 except the additional cost for the factorization of M in equation (3.52) and the matrix multiplication of L−1 and L−T in equation (3.53). Therefore, algorithm FFRA4 can also reduce the computational cost greatly compared to the classical approach, which factors the coefficient matrix at each frequency for the modified configuration system.
36
Improved Design
Optimization Algorithm
Frequency Frequency Response Response Analysis Analysis
Approximate Approximate Model Model
Sensitivity Analysis Finite Element Analysis
Initial Design
Figure 3.2: Optimization flow in the modal correction approach
37
Chapter 4
Complex Symmetric Matrix Eigensolver Most of the work on algorithms for solving eigenvalue problems have been related to real symmetric, complex Hermitian, or real and complex nonsymmetric matrices [49, 58]. However, the newly developed algorithm, FFRA, in Chapter 3 calls for a complex symmetric matrix eigensolution. Most numerical libraries do not offer routines specially designed for the solution of complex symmetric eigenvalue problems. The standard approach for solving complex symmetric matrix eigenvalue problems is to treat them as complex general matrix eigenvalue problems. However, the cost of a complex general matrix eigensolver is very high, since complex general matrices are typically reduced to upper Hessenberg form, from which eigenvalues and eigenvectors are computed. This method requires substantially more time and storage than the tridiagonalization based approach for solving real symmetric and complex Hermitian matrices. There has been only limited research on algorithms to solve complex symmetric matrix eigenvalue problems. The lack of research or special software for solving a complex symmetric eigenvalue problem comes from the following observations 38
[55, 56, 76]: • Complex symmetric matrices appear less frequently in practice than real symmetric and complex Hermitian matrices. • Complex symmetric matrices are not necessarily diagonalizable, in contrast to complex Hermitian matrices. • The straight forward reduction of a complex symmetric matrix to a tridiagonal form is not always stable. • There is no robust theory of a complex symmetric tridiagonal eigenproblem. Recently, some effective Lanczos-type tridiagonalization procedures for partially reducing a complex symmetric matrix to a complex symmetric tridiagonal matrix have been introduced [24, 60, 61]. However, although the Lanczos-type methods are very effective for solving large and sparse eigenproblems, they are not practical for dense matrix eigenproblems, especially when all eigensolutions are required. Bar-On and Ryaboy [55] proposed an algorithm, which is called CS (complex symmetric), for computing the eigenvalues and eigenvectors of dense complex symmetric matrices. Their algorithm considerably outperforms the general matrix eigensolver when computing the eigenvalues of complex symmetric matrix. However, their algorithm is still inefficient since it is developed based on a strategy from EISPACK [43], which is an old (1970’s) library of Fortran 77 routines for solving eigenvalue problems. EISPACK algorithms use BLAS 2 operations, so that EISPACK algorithms take too much time to move data instead of doing useful floating-point operations. In this chapter, a new complex symmetric matrix eigenvalue problem solver, CSYMM, is developed. First, some properties of a complex symmetric matrix are addressed. Then, the algorithm of CSYMM is introduced, in which CSYMM implements a complex orthogonal similarity transformation for finding the eigensolutions of a complex symmetric matrix with BLAS 3 operations. Using this transformation, 39
we can reduce computational costs greatly because a complex symmetric matrix is reduced to a complex symmetric tridiagonal matrix. In addition, a block packed storage algorithm is developed to reduce the amount of memory required while obtaining good performance.
4.1
Properties of Complex Symmetric Matrices
A complex symmetric matrix A = AT ∈ Cn×n is diagonalizable if and only if its eigenvector matrix X can be chosen such that X T AX = Λ = diag(λ1 , λ2 , · · · , λn ) X T X = XX T = I,
X ∈ Cn×n
(4.1)
where Λ ∈ Cn×n is the eigenvalue matrix and X is the eigenvector matrix. However, a complex symmetric matrix A may not be diagonalizable. When A is a defective matrix that does not have a full set of n linearly independent eigenvectors because of repeated eigenvalues, A cannot be transformed to a diagonal matrix using similarity transformations. A complex symmetric matrix may be defective if it has a complex eigenvector x such that xT x = 0,
but x 6= 0.
(4.2)
This may occur only when A has repeated eigenvalues [76]. In such cases we can form a basis which consists of eigenvectors and generalized eigenvectors. Then, there exists a similarity transformation P that transforms A to a matrix J, P −1 AP = J, such that J is as close as possible to a diagonal form.
40
J is the Jordan form matrix
J = P −1 AP =
J1
J2 ..
. Js−1
(4.3)
Js in which Ji is a Jordan block. In the FFRA algorithm described in Chapter 3, if the complex symmetric matrix C = (1 + iγ)Λ + iKs can be decomposed in the form of equation (4.1), the coefficient matrix of equation (3.14) can be easily diagonalized in the form (−ω 2 I + ΛC )W (ω) = ΦTC F (ω)
(4.4)
as shown in equation (3.20). If the complex symmetric matrix C is not diagonalizable, then equation (3.14) can be transformed as (−ω 2 I + J)W (ω) = P −1 F (ω).
(4.5)
by using the Jordan form and representing Z(ω) = P W (ω). The coefficient matrix of equation (4.5) becomes nearly diagonal, so that the solution W can be obtained inexpensively. But, in order to obtain equation (4.5), we need P −1 which could be expensive to compute. However, P −1 can be obtained efficiently using the eigensolution of the complex symmetric matrix C in (3.14). As the simplest example, assume that C has only one repeated eigenvalue. When one of C’s eigenvalues is repeated k times, we
41
would get the following Jordan form λ1 P −1 CP =
..
=J
. λn−k
(4.6)
Jk after arranging eigenvalues such that the repeated eigenvalues are listed last. The submatrix Jk contains only Jordan block(s) that exist. There could be more than one. Then, P T P can be represented in the form T h i X d PTP = Xd Xr XrT XdT Xd XdT Xr = T T X Xd Xr Xr r I 0 = T 0 Xr Xr
(4.7)
where Xd ∈ Cn×(n−k) contains the eigenvectors corresponding to the distinct eigenvalues, that is XdT Xd = I, and Xr ∈ Cn×k contains the generalized eigenvectors for the repeated eigenvalues. I ∈ R(n−k)×(n−k) is the identity matrix and XrT Xr ∈ Ck×k . Note that the eigenvectors in Xd and the generalized eigenvectors in Xr are orthogonal to each other as shown in Appendix A. Then P −1 can be found from the following equation −1 0 I PT. P −1 = T 0 Xr Xr
(4.8)
When k ¿ n, the inverse of P can be obtained inexpensively. For a complex symmetric matrix to be defective, it must not only have a repeated eigenvalue, but also the corresponding eigenvector must satisfy equation 42
(4.2). The probability of both of these being true to a high level of precision is apparently very low. In our experience with industrial FE models, we have not yet encountered a defective complex symmetric matrix, so we will not address this special case further in this dissertation, but leave it as a topic for future research.
4.2
Full Storage Complex Symmetric Matrix Eigensolver
One of the most efficient current algorithms for computing full eigensolutions of a real symmetric or complex Hermitian matrix A is the Householder method, in which an original dense matrix A is reduced to a tridiagonal matrix T with n − 2 similarity transformations. Similarly, the Householder method for a complex symmetric matrix is developed by introducing a complex orthogonal similarity transformation.
4.2.1
Complex symmetric Householder matrix
The newly developed algorithm CSYMM performs a complex orthogonal transformation using complex symmetric Householder matrices. The complex symmetric Householder matrix H has the form, H = I − τ vvT
(4.9)
HT = H
(4.10)
H T H = HH T = I.
(4.11)
where H is symmetric
and orthogonal
The τ is defined as τ=
2 vT v
.
(4.12)
Note that this τ is different from τ = 2/kvk22 = 2/(vH v) in the complex Hermitian matrix case [54]. 43
To form H, a complex Householder vector v for a complex symmetric Householder matrix needs to be defined. There are a number of practical details associated with the determination of a Householder vector [49, 54]. Suppose we are given x ∈ Cn , x 6= 0 and want Hx = [ξ, 0, · · · , 0]T = ξe1 . Using the formula for H, we have Hx = (I − τ vvT )x = x − (τ vT x)v
(4.13)
= ξe1 . Since the scalar factor is irrelevant in determining v, we can take v = x − ξe1 .
(4.14)
Similar to the real case [49], substituting equation (4.14) into equation (4.13), then √ we obtain ξ = ∓ xT x. The following algorithm 4.2.1 gives a complex Householder vector v for a complex symmetric Householder matrix. This algorithm comes from some slight modifications of a LAPACK algorithm [58]. It is convenient to normalize v so that v(1) = 1. This normalization is done for storage considerations. One concerns the choice of sign in the definition of ξ in equation (4.14). In order that the scaling factor γ has small relative error in step (6), the sign of β is chosen in step (4). Algorithm 4.2.1 (Complex Householder vector for the complex symmetric Householder matrix). Given x ∈ Cn , this function computes the complex Householder vector v ∈ Cn with v(1) = 1 and τ ∈ C such that H = I − τ vvT is √ orthogonal and Hx = ∓( xT x)e1 . function [v, τ ] = house(x) For x ∈ Cn and α = x(1) ∈ C (1)
σ = x(2 : n)T x(2 : n) ∈ C if σ = 0 44
(2)
τ =0 else
√ α2 + σ ∈ C
(3)
β=
(4)
if | α − β |≥| α + β | then γ =α−β else γ =α+β β = −β end
(5)
τ = −γ/β
(6)
v = x(2 : n)/γ end
Once Hk is obtained in the k-th column, we can write the process of the tridiagonal decomposition as A(k−1) = Hk A(k) Hk
(4.15)
where the superscript represents the number of the reduced column. Starting from the n-th column, applying the matrix Hk to A = AT ∈ Cn×n from the right and left produces zeros above the first superdiagonal of column k, and modifies columns 1 through k − 1 of A. After all complex symmetric Householder matrices are formed, a complex symmetric matrix A may be reduced to the complex symmetric tridiagonal matrix T = T T ∈ Cn×n by a complex orthogonal similarity transformation T = QT AQ,
Q ∈ Cn×n
(4.16)
where the complex orthogonal transformation matrix Q is the product of complex 45
symmetric Householder matrices, Q = Hn Hn−1 · · · H4 H3
(4.17)
and QT Q = QQT = I. This reduction works backward from the n-th column to the 3rd column for the upper and lower triangular parts of the matrix A.
4.2.2
Complex symmetric matrix eigensolver
The main steps of the new complex symmetric matrix eigenvalue problem solver, CSYMM, are composed of three steps: tridiagonal reduction, eigenvalue problem for a tridiagonal matrix, and backtransformation. At first, the complex symmetric matrix A is tridiagonalized by applying the complex symmetric Householder matrices to A as shown in equation (4.15). The unblocked version, which calls only Level 1 BLAS (vector-vector operations) and Level 2 BLAS (matrix-vector operations) routines, of the tridiagonalization algorithm can be described as A(k−1) = Hk A(k) Hk = (I − τ vk vkT )A(k) (I − τ vk vkT )
(4.18)
= A(k) − vk xTk − xk vkT + τ (vkT xk )vk vkT where xk = τ A(k) vk ∈ Cn .
(4.19)
Then, equation (4.18) can be represented as a rank 2 update in the form A(k−1) = A(k) − vk wkT − wk vkT .
(4.20)
1 wk = xk − τ vk (vkT xk ) ∈ Cn . 2
(4.21)
where
These reduction processes can be described with a matrix-vector multiplication and an outer product update [49]. The drawback of the unblocked version is that it 46
does not make efficient use of the cache or register of processors, so it shows poor performance. The related efficiency concerns led Bischof and Van Loan [59] to introduce the W Y representation of Householder vectors, in which the unblocked version algorithm is recast as a block algorithm [52, 58] that operates on blocks or submatrices of the original matrix to exploit the faster speed of Level 3 BLAS which is matrix-matrix operations. The key of a block algorithm is to aggregate a product of Householder transformations so that its application becomes a matrix multiplication. Instead of applying equation (4.18) to A for each column, v and w are accumulated in submatrices V and W , respectively. Then the update is performed in the form A(k−1) = A(k+nb−1) − V W T − W V T
(4.22)
where V = (vk , · · · , vk+nb−1 ) ∈ Cn×nb , W = (wk , · · · , wk+nb−1 ) ∈ Cn×nb , and nb is the number of columns in submatrices V and W . Once the matrix A is reduced to the complex symmetric tridiagonal matrix T , we calculate the eigenvalues Λ ∈ Cn×n and the corresponding eigenvectors Z ∈ Cn×n of T from the following eigenvalue problem T Z = ZΛ.
(4.23)
A complex orthogonal QL algorithm is used to compute the complete set of eigenvalues of the tridiagonal matrix, and the corresponding eigenvectors are computed by inverse iteration. We use the subroutines CMTQL1, to compute the complex eigenvalues, and INVERM, to compute the corresponding complex eigenvectors, from the Lanczos package [51]. Finally, we compute the eigenvectors of the original eigenvalue problem by backtransformation. In the backtransformation, the eigenvectors, X ∈ Cn×n , of the original matrix can be obtained as X = QZ 47
(4.24)
where Q is defined in equation (4.17). We use the storage efficient W Y form [57] to represent the complex orthogonal matrices Q in equation (4.17), in which BLAS 3 is used since this backtransformation is very rich in matrix-matrix multiplication.
4.2.3
Tie-situation
In the process of the reduction, the product vT v of the complex Householder vector is required when the complex symmetric Householder matrix H is formed as H = I − 2vvT /(vT v) in equation (4.12). There may be a breakdown of the process when vT v = 0 and v 6= 0 for a given complex householder vector v ∈ Cn . This situation is called a tie-situation by Bar-on and Ryaboy [55], because it occurs when there is a tie between the real and imaginary parts of v. This condition can be rewritten by decomposing v into the real part and the imaginary part. If a complex symmetric Householder vector is represented as v = vr + ivi ∈ Cn , where vr ∈ Rn and vi ∈ Rn , then a tie-situation can be rewritten as (vr + ivi )T (vr + ivi ) = 0, which gives k vr k2 =k vi k2
(4.25)
vrT vi = 0.
(4.26)
So vr and vi are not only the same magnitude, but they must be orthogonal to each other. To identify the tie-situation for each column in the reduction process, all complex Householder v vectors should be examined.
4.2.4
Tie Breaking
We start the reduction with the last column n of the upper triangular matrix A. At the k-th column, the reduced matrix is represented as follows. A(k) = Hk+1 · · · Hn A(n) Hn · · · Hk+1 = Hk+1 A(k+1) Hk+1 48
(4.27)
where A(n) = A. Suppose a tie-situation is encountered in the reduction of the k-th column. Then a tie breaking algorithm is applied in order to continue reducing A(k) to tridiagonal form. Equation (4.28) represents the reduction process up to the k-th column.
A(k) = Hk+1 A(k+1) Hk+1
a1,1 · · · a1,k .. .. . . a1,k · · · ak,k ek = 0
0 ek dk+1 ek+1 ek+1 ..
.
..
.
..
.
0
0
en−1 en−1
=
dn
U (k)
B (k)
T B (k)
T (k)
(4.28)
where U (k) ∈ Ck×k is the unreduced part, and B (k) ∈ Ck×(n−k) and T (k) ∈ C(n−k)×(n−k) are the tridiagonalized part of A(k) . We assume vkT vk = 0 and vk 6= 0, where vk is the Householder vector for reducing the k-th column and row of A(k) . To break this tie-situation, Bar-On and Ryaboy [55] proposed an efficient strategy using the QL method. As breakdowns are fairly uncommon, the use of QL method for the tridiagonal matrix as a recovery strategy is very effective and insignificantly time consuming [55], so that we adopted their strategy. For T (k) which is the tridiagonalized part of A(k) , we apply the following matrix decomposition T (k) = Gk+1 Lk+1
49
(4.29)
where Lk+1 ∈ Cn×n is a lower triangular matrix and Gk+1 ∈ Cn×n is a complex orthogonal matrix. Gk+1 is constructed in the form of a product of rotation matrices as Gk+1 = ΘTn−1 ΘTn−2 · · · ΘTk+1 .
(4.30)
in which the rotation matrices Θ are complex orthogonal Givens rotation matrices. The subscript k + 1 indicates that these rotation matrices are applied to the (k + 1)th column from the last column n. After the factorization in equation (4.29) is finished, the computation of the matrix product in reverse order yields a tridiagonal matrix 0
T (k) = Lk+1 Gk+1 .
(4.31)
Now, multiplying equation (4.29) by GTk+1 and introducing the results into equation (4.31), we obtain 0
T (k) = GTk+1 Tk Gk+1 .
(4.32)
As shown in equation (4.32), after the rotations in Gk+1 are applied to the tridiagonal 0
matrix T (k) from column n to k + 1, the new tridiagonal matrix T (k) is obtained. 0
After the new T (k) is obtained, we apply the complex orthogonal Givens rotation Θk to the column k and k + 1 as a final step. This step changes the last column of the submatrix U (k) and the first column of B (k) . In other words, elements of the column k, which has a tie-situation, are changed and elements of the column k + 1 become nonzero again. Equation (4.33) illustrates the matrix after applying this tie breaking algorithm.
50
A(k+1)
0
=
U (k+1)
0
T B (k+1)
B (k+1) 0
T (k+1)
0
0
a1,k+1 .. .
a1,1 · · · a1,k .. .. . . 0 0 ak,1 · · · ak,k 0 ak+1,1 · · · = 0
0
ak+1,k+1 e¯k+1
0
e¯k+1 d¯k+2 e¯k+2 e¯k+2 ..
0
0
.
..
.
..
.
e¯n−1
e¯n−1
(4.33)
d¯n
0
0
where U (k+1) ∈ C(k+1)×(k+1) is the unreduced part, B (k+1) ∈ C(k+1)×(n−k−1) and 0
T (k+1) ∈ C(n−k−1)×(n−k−1) are the tridiagonalized part. As shown in equation (4.33), after the tie breaking algorithm is applied, the (k + 2)-th row and column and following rows and columns of the matrix still remain tridiagonal although we retreat one step backward in the reduction process. This approach is very inexpensive and effective since the workload for this QL method for the tridiagonal matrix is only O(n − k) operations. In reality, the tie-situation is very unusual. In fact, we have not seen any model that has a tiesituation so far. The procedure for applying the tie breaking algorithm in CSYMM can be summarized as 0
A(k+1) = GTk A(k) Gk .
(4.34)
Then, the reduction is started again from the (k + 1)-th column as follows. 0
0
0
A(k) = Hk+1 (GTk A(k) Gk )Hk+1
51
(4.35)
The overall reduction procedure with a tie-situation at column k can be described as 0
0
T = A(3) = H3 · · · Hk [Hk+1 GTk (Hk+1 · · · Hn A(n) Hn · · · Hk+1 )Gk Hk+1 ]Hk · · · H3 = QT AQ (4.36) where the complex orthogonal transformation matrix Q is defined as 0
Q = Hn · · · Hk+1 Gk Hk+1 Hk · · · H3 .
52
(4.37)
4.3
Block Packed Storage Symmetric Matrix Eigensolver
A variety of software developments have led to many libraries such as EISPACK [43] and the more recent LAPACK [58]. Most of the numeric libraries provide full storage and packed storage subroutines to compute all eigenvalues, and optionally, eigenvectors of a symmetric matrix. The full storage algorithm requires only either the upper right or the lower left triangular part of the full matrix, so the memory for the other triangular part is wasted. But, this method is very rich in matrix-matrix operations which results in high speed. For packed storage, only the upper or lower triangular part of the matrix is stored in a one-dimensional array. Symmetric or Hermitian matrices may be stored more compactly if the non-zero entries of the triangular part are packed by columns in a one-dimensional array. Therefore, it is very economical in memory usage. But, the packed storage uses only Level 2 BLAS which results in slow performance speed. It is known generally that the implementation of Level 3 BLAS performs better than Level 2 BLAS [52]. Level 3 BLAS makes the algorithm speed up the data movement and obtain near peak performance, although the amount of data moved is the same, since fewer messages are needed to move the data. Therefore, the full storage algorithm which uses a block partitioned algorithm with Level 3 BLAS has much better performance than the packed storage algorithm. This section introduces a new algorithm, block packed storage Householder method (BPHOUSE), which can save almost as much memory as the packed storage algorithm, but can get performance almost as good as the full storage algorithm. It can be applied to a complex symmetric matrix eigenvalue problem in CSYMM as well as to a real symmetric matrix eigenvalue problem.
53
4.3.1
Block packed storage
In BPHOUSE, the block packed storage scheme is used to store the coefficients of a symmetric matrix. It is illustrated in Figure 4.1. A matrix is partitioned into contiguous submatrices which are called block packed parts or blocks. Only these several block packed parts of the matrix are now kept in storage. The mapping between the original matrix with an upper triangular part and the block packed storage is written in Figure 4.2. It is different from conventional packed storage which contains only the upper or lower triangular part of the matrix. Table 4.1 describes the storage requirements of each different algorithm for a square matrix with dimension n. For full storage, n2 storage is required for the reduction process. Also, another n2 storage is needed to save the eigenvectors Z in the eigenproblem of T in equation (4.37). In the backtransformation, there is no additional storage requirement since the space of Z can be used to multiply Q by Z implicitly. The total storage requirement is 2n2 . For packed storage, a total of (3n2 /2 + n/2) storage is required. In the block packed storage algorithm, (3n2 /2 + n/2 + nb(nb+1) ∗ nblock) storage is required. The storage amount ( nb(nb+1) ∗ nblock) 2 2 corresponds to nblock triangular areas below the diagonal in Figure 4.1, which can be negligible compared to n2 . This block packed storage scheme gives an excellent and natural opportunity for matrix operations with Level 3 BLAS although it uses almost the same amount of memory as packed storage. However, the block packed storage algorithm gives almost the same performance as the full storage algorithm.
4.3.2
Reduction to tridiagonal form
A complex symmetric matrix A ∈ Cn×n needs to be reduced to the complex symmetric tridiagonal form T ∈ Cn×n by orthogonal similarity transformation T = QT AQ, where Q is the complex orthogonal transformation matrix. We can rewrite the
54
B1 B2
BN
Figure 4.1: Block packed storage scheme
(Bi,f irstcol : number of the first column of Bi ) (Bi,lastcol : number of the last column of Bi ) (Bi,lda : leading dimension of Bi ) kk = 1 do i = 1, nblock do j = Bi,f irstcol , Bi,lastcol−1 do k = 1, Bi,lda Bi (kk) = A(k, j) kk = kk + 1 end end end
Figure 4.2: Mapping between block packed storage B and the upper triangular matrix A
55
Table 4.1: The storage requirements of each stage of the eigenproblem
reduction eigenproblem of T back transformation max. memory requirement ∗) ² = nb(nb+1) ∗ nblock 2 number of blocks
Full storage n2 n2 − 2n2
Packed storage n(n + 1)/2 n2 − 2 3n /2 + n/2
Block packed storage n(n + 1)/2 + ²∗ n2 − 2 3n /2 + n/2 + ²
nb =the number of columns in a block,
nblock = the
process of tridiagonal decomposition at column k as A(k+1) = Hk A(k) Hk .
(4.38)
Let the upper part of A be stored in the block packed storage according to the mapping in Figure 4.2. In the reduction stage, the first block B1 is tridiagonalized using the unblocked version as described in equation (4.18). For the other blocks Bk , (k = 2, · · · , nblock), the blocked version in equation (4.22) is used. There are two main differences between the full storage algorithm and the block packed storage algorithm. One is the matrix-vector multiplication to compute x in equation (4.19) and the other is the rank 2k update in equation (4.20). First, we consider the matrix-vector multiplication x = τ Av. For convenience, each block packed storage is partitioned again in the following form. 1 1 1 1 B1 B2 B3 · · · BN 2 B22 B32 BN 3 3 (4.39) A= B3 BN . .. .. . N BN where the subscript represents the block number and the superscript represents the partition number in each block. For the current j-th block, x is computed by looping 56
all blocks from the first block to the (j − 1)-th block as follows. x = τ Av = x1 + x1:2 + · · · + x1:j−1
(4.40)
= τ (B1 v1 + B2 v2 + · · · + Bj−1 vj−1 )
Figure 4.3 shows an example for computing x with block packed matrix-vector multiplication. For the p-th column in B4 , B1 through B4 are called, and each block Bj (j = 1, · · · , 4) is multiplied by the part of householder vectors vj (solid line). Then each result x1:j is accumulated to get xp . vp is saved in the p-th column of B4 to be used in the backtransformation. Once x is obtained and saved in Bj , w is computed and saved in working space W to be used in the rank 2k update. Next, for the current block j, the block packed rank 2k updates are performed. All blocks are updated by looping blocks from B1 through Bj−1 in the following way. B1:j−1 = B1:j−1 − V W T − W V T
(4.41)
= B1:j−1 − Bj W T − W BjT This update can be divided into separate updates for each block. T
T
B1
= B1 − Bj1 W 1 − W 1 Bj1
B2
= B2 − Bj1:2 W 2 − W 1:2 Bj2 .. .
T
T
T
Bj−1 = Bj−1 − Bj1:j−1 W j−1 − W 1:j−1 Bjj−1
(4.42) T
where the workspace W is partitioned in the same way as Bj−1 in equation (4.39)
W1
W2 W = .. . W j−1
.
(4.43)
Because the Householder vectors V are stored in the block Bj , the block packed rank 2k update naturally satisfies the goal of using matrix-matrix multiplication. 57
The illustration of the block packed rank 2k updates algorithm is in Figure 4.4. For the current block B4 , updating from B1 to B3 is necessary. The update is performed for each block separately, not at one time like the conventional rank 2k update. To update the first block of B1 , the first part of B1 and W 1 is used. For the B3 update, the first three parts of B4 and the third part of W, which is W 3 , are necessary. In the Householder method, most of the computation is in the matrix-vector multiplication and the rank 2k updates. In BPHOUSE, the block packed matrixvector multiplication and block packed rank 2k updates, BPSYR2K, give performance as fast as that of the regular BLAS routines for full storage matrices. Therefore, the reduction process with block packed storage gives outstanding performance although it only requires about half of the storage requirement of the full storage for storing A. Once we have the complex symmetric tridiagonal matrix T , the eigenvalues Λ and the corresponding eigenvectors Z of T are computed as shown in equation (4.37). Then, the original eigenvectors are obtained from the backtransformation X = QZ. Since Q consists of a sequence of Householder transformations, we just need to apply these transformations to Z in the reverse order. We use the storage efficient W Y form [57] to represent the complex orthogonal matrices. This block packed storage backtransformation is also performed with Level 3 BLAS, unlike the packed storage method, so that the performance is much better than that of the packed storage method.
58
B1
B2 B 3 B 4 B 5
(a)
B1
V1
X1
B1
V2
B1 V3
X1:2
X1:3
B1
V4(1:p) X1:p
(b)
Figure 4.3: Example of block packed matrix-vector multiplication, x = τ Av, for the p-th column in B4
59
B1 B2 B3
B4 B5
(a)
_
_
B 41
B1
( W1)T
W1
(B 41)T
W 1:2
(B42 )T
W 1:3
(B43)T
(b)
_
_
B41:2
B2
( W2 )T (c)
_
B3
_
B41:3
( W3 )T (d)
Figure 4.4: Example of block packed rank 2k update
60
Chapter 5
Frequency Response Analysis of Acoustic Fluid/Structure Interaction In automobiles, vibrating structures generate acoustic pressures in the acoustic cavity of the passenger compartment, so that acoustic fluid-structure interaction analysis is important to improve ride quality by predicting noise levels. A number of finite element formulations have been proposed to model an acoustic fluid for the analysis of fluid-structure interaction problems [62, 64, 66]. There are two main approaches: displacement formulation and pressure formulation. In the displacement formulation, the fluid motion is described in the same manner as the motion of the structure. The displacement formulation, which is also called the Lagrangian method [62], results in symmetric coefficient matrices, so that it is easy to implement a fluid-structure interface condition. However, the displacement approach suffers from the presence of spurious resonance [63] because it exhibits spurious nonzero frequency modes. Although various approaches have been introduced to obtain improved formulations, the currently available displacement 61
based formulation is not yet satisfactory. Therefore, the pressure approach has been widely employed to describe the acoustic fluid-structure interaction problem [66]. In the pressure formulation, which is called the Eulerian method [62], the fluid is characterized by pressure variables, and the coupling between structure and fluid is achieved by consideration of interface forces. This approach has the advantage that a much smaller number of variables are involved to describe the fluid motion [62]. However, unlike the displacement formulation, the pressure formulation results in non-symmetric coefficient matrices. Everstine [65] rearranged the governing equations, which are asymmetric, into a form having symmetric coefficient matrices using the velocity potential. This chapter describes a method for computing the frequency response of the acoustic fluid-structure interaction problem formulated with the pressure formulation. This approach uses the newly developed frequency response algorithm FFRA described in Chapter 3.
5.1
Problem Formulation
Figure 5.1 shows the fluid-structure interaction problem domain. ΩS and ΩF are the structure domain and the fluid domain, respectively. ΓI is the fluid-structure interaction part of the boundary. ΓD and ΓN represent the boundary for the prescribed displacements and the applied forces, respectively. In the fluid domain ΩF , the fluid is treated as a compressible, inviscid, nonflowing medium whose pressure p(t) satisfies the wave equation ∇2 p(t) −
1 p¨(t) = 0 c2
in
ΩF
(5.1)
where c is the wave speed in the fluid. A finite element formulation of the fluid-structure interaction problem with
62
*N
:F
:S
*I
*D
Figure 5.1: Fluid-structure interaction problem domain the pressure formulation results in the form [62] ˙ ¨ x(t) x(t) B 0 M 0 + + ¨ (t) ˙ 0 C p(t) −ρAT E p fs (t) (1 + iγ)K + iKs A x(t) = 0 H p(t) ff (t)
(5.2)
in which x(t) is the displacement in ΩS and p(t) is the fluid pressure in ΩF . M , B, K and Ks are the structure finite element mass, viscous damping, stiffness and structural damping matrices, respectively, as described in Chapter 3. fs (t) is the external force acting on the structure and ff (t) stands for the external force in the fluid due to a wave force field. The tangential forces exerted by the fluid on the structure are neglected. E is the fluid inertia matrix, C is the symmetric damping matrix for the fluid associated with a radiation boundary condition, and H is the fluid stiffness matrix [62]. The matrix A is obtained from the interface interaction 63
force and it couples the structure and fluid responses together. It is sometimes referred to as the area matrix. The imposed boundary conditions are the rigid wall condition, the linearized free surface condition, and the radiation boundary condition [62]. Note that the coefficient matrices of equation (5.2) is not symmetric. The asymmetry can be removed by reformulating the problem in terms of a ˙ new vector q(t) such that p(t) = q(t) [65]. The new variable q(t) is the velocity potential which has long been used in fluid dynamics analysis. If the second partition of equation (5.2) is divided by −ρ and integrated in time, and p(t) is replaced by ˙ q(t), one can obtain the following symmetric form x(t) x ˙ ¨ (t) B A M 0 + + ˙ ¨ (t) AT −C/ρ q(t) 0 −E/ρ q x(t) fs (t) (1 + iγ)K + iKs 0 = 0 −H/ρ q(t) gf (t) where 1 gf (t) = − ρ
Z 0
(5.3)
t
ff (τ )dτ.
(5.4)
For a harmonic excitation, in which fs (t) = Ps (ω)eiωt and gf (t) = Pf (ω)eiωt , we assume a harmonic solution of the form x(t) = Xs (ω)eiωt and q(t) = Xf (ω)eiωt , in which q(t) = (iω)−1 p(t) = (iω)−1 P(ω)eiωt . Then, the frequency response problem can be formulated as follows [65, 67]. Xs (ω) −ω 2 M + iωB + (1 + iγ)K + iKs iωA iωAT −1/ρ(−ω 2 E + iωC + H) Xf (ω) Ps (ω) = P (ω) f
(5.5) where Xs (ω) is the structure response and Xf (ω) is the fluid response. To form the modal frequency response problem, at first, the fluid system is directly projected onto the global modal space of the fluid with the fluid modes ΦF 64
from the following eigenvalue problem HΦF = EΦF ΛF .
(5.6)
ΦF is normalized such that ΦTF EΦF = IF
(5.7)
ΦTF HΦF = ΛF
where ΛF is the eigenvalue matrix for the fluid part. Then the fluid pressure is represented as Xf (ω) = ΦF Zf (ω).
(5.8)
The eigenvalue problem for the fluid part is ordinarily much less expensive to solve than the structure part eigenvalue problem because it has a much smaller dimension. For the structure part, the AMLS method is employed to form modal matrices including a modal area matrix. These modal matrices are formed efficiently without requiring a huge amount of data transfer as described in Chapter 3. Substituting
Xs (ω) T = X (ω) 0 f
Ys (ω) 0 ΦF Zf (ω)
into equation (5.5) and premultiplying equation (5.5) by TT 0 0 ΦTF yield
−ω 2 M + iωB + (1 + iγ)K + iKs
Fs (ω) = F (ω)
iωAT
(5.9)
(5.10)
Ys (ω) −(−ω 2 IF + iω C¯ + ΛF )/ρ Zf (ω) iωA
f
(5.11)
65
where M, B, K and Ks are defined in equations (3.4), and C¯ = ΦTF CΦF . The transformed area matrix becomes A = T T AΦF
(5.12)
and the force matrices are transformed to Fs (ω) = T T Ps (ω) and Ff (ω) = ΦTF Pf (ω). The frequency response problem in equation (5.11) can be projected onto the global modal space using the approximated global eigensolution Φg from PHASE 4 as
ΦTg
0
0
IT
−ω 2 M + iωB + (1 + iγ)K + iKs iωAT
×
Zs (ω) ΦTg = I Zf (ω) 0
Φg 0 0
iωA
−(−ω 2 IF + iω C¯ + ΛF )/ρ Fs (ω) 0 . F (ω) T I
f
(5.13) where Ys (ω) = Φg Zs (ω). Equation (5.13) can be rewritten in the form 2 ¯ ¯ ¯ Zs (ω) −ω I + iω B + (1 + iγ)Λ + iKs iω A iω A¯T −(−ω 2 IF + iω C¯ + ΛF )/ρ Zf (ω) Fs (ω) = F (ω) f
(5.14) where Fs (ω) = ΦTg Fs (ω) and A¯ = ΦTg A. Equation (5.14) is the damped modal frequency response problem for the acoustic fluid-structure interaction problem.
5.2
Frequency Response Analysis
For simplicity, we rewrite equation (5.14) as Zs Fs Ass iωAsf = Z F iωATsf Af f f f 66
(5.15)
where the subscript ss, f f , and sf represent the structure part, the fluid part, and the fluid-structure interaction part of the coefficient matrix, respectively. Note that the responses Zs ∈ CNev ×Ncase , Zf ∈ CNf ev ×Ncase and the forces Fs ∈ CNev ×Ncase , Ff ∈ CNf ev ×Ncase are matrices, not vectors. Ncase represents the number of right hand sides and Nf ev represents the number of global modes of the fluid. Ass ∈ CNev ×Nev , iωAsf ∈ CNev ×Nf ev and Af f ∈ CNf ev ×Nf ev depend on the excitation frequency ω. Conventionally, most commercial FE software solve the linear system in equation (5.15) directly using existing linear system solvers such as direct methods or iterative methods. The main goal of this chapter is to develop a method to solve the equation (5.15) with the newly developed frequency response algorithm FFRA. Instead of dealing with equation (5.15) directly as in the conventional approach, we partition it into the structure part and the fluid part. Then, the upper part of equation (5.15) becomes Ass Zs + iωAsf Zf = Fs
(5.16)
and it is rearranged to give −1 Zs = A−1 ss Fs − iωAss Asf Zf
(5.17)
= Us1 − iωUs2 Zf Nev ×Ncase and U = A−1 A Nev ×Nf ev . where Us1 = A−1 s2 ss Fs ∈ C ss sf ∈ C
Note that Us1 and Us2 can be obtained by solving the following equation Ass [Us1 , Us2 ] = [Fs , Asf ]
(5.18)
and solving this linear system is the most expensive part in the analysis procedure since the structure part has more modes than the fluid part in the frequency response problem of large detailed automobile acoustic fluid-structure models, that is, Ass is the biggest submatrix.
67
By noting that Ass is just the structural part of the coefficient matrix, Us1 and Us2 can be obtained inexpensively using the FFRA algorithm described in Chapter 3, in which the coefficient matrix can be diagonalized using a complex symmetric matrix eigensolution for the structural damping case, or transformed to a DPLR matrix for the viscous damped case. Therefore, using the FFRA algorithm to solve the linear system in equation (5.18), which takes most of the computation time, can reduce the overall computational cost significantly. Now the lower part of equation (5.15) can be rewritten as Af f Zf = Ff − iωATsf Zs .
(5.19)
By substituting equation (5.17) into equation (5.19), we obtain −1 Af f Zf = Ff − iωATsf (A−1 ss Fs − iωAss Asf Zf ).
(5.20)
Equation (5.20) is rearranged in the form T −1 (Af f + ω 2 ATsf A−1 ss Asf )Zf = Ff − iωAsf Ass Fs .
(5.21)
Combining equation (5.18) with equation (5.21) yields (Af f + ω 2 ATsf Us2 )Zf = Ff − iωATsf Us1 .
(5.22)
Finally, the fluid solution Zf is obtained in the form Zf
= (Af f + ω 2 ATsf Us2 )−1 (Ff − iωATsf Us1 ).
(5.23)
Then, substituting Zf into equation (5.17) yields the structural part solution Zs . Once Zs and Zf are obtained, these solutions are transformed to the finite element solutions as follows. Xs = T Φg Zs
(5.24)
P = iωXf = iωΦF Zf
(5.25)
68
Chapter 6
Frequency Response Analysis with Enforced Motion When an enforced motion or a base motion is specified, large mass, large stiffness or Lagrange multiplier techniques have been used [68] to solve the frequency response problem. If the added stiffness or mass is sufficiently stiff or massive, the reaction force from the actual structure will not significantly affect the input motions. The large mass method has more of an advantage than the other method since it is easy to estimate a good value for the large mass which is approximately 106 ∼ 107 times the mass of the entire structure. Figure 6.1 illustrates the large mass modeling concept. Instead of restraining the FE model of the structure at a ground point, it is connected to a point with a large mass. The large mass represents the base for which the motion is to be specified. If a very large mass M0 is attached to a degree of freedom and the load P is applied to the same degree of freedom, the load that produces a desired acceleration u ¨ is approximately P = M0 u ¨. The accuracy of this approximation increases as M0 is made larger in comparison to the mass of the structure. The obvious limit for the size of M0 is numeric overflow in the computer [68]. A more subtle problem is that using large fictitious masses in the 69
model introduces ill-conditioning, which can lead to inaccurate results. This chapter describes a new approach for enforced motion analysis with the large mass method. This approach uses the FFRA algorithm, but it avoids the numerical inaccuracy of the eigensolution by separating modes which have low frequencies because of large masses from the other modes of the structure.
6.1
Motivation
As stated above, inaccurate eigensolutions can be obtained for the complex symmetric matrix C defined in equation (3.13) when there are low frequency modes corresponding to the large masses. Figure 6.2 shows an example of this situation. As an example, model 6 in Chapter 8 is a model with enforced motion, and it has 314 global modes, of which 36 modes have extremely low natural frequencies because of the large masses. For a model with enforced motion, which is called as model 6 in Chapter 8, there are 314 global modes, in which 36 modes are low frequency modes. The eigenvalue problem for the complex symmetric matrix C is solved using both ZGEEV, which is a general matrix eigensolver in LAPACK, and CSYMM, described in Chapter 4. Then the orthogonality condition of the eigenvector matrix, √ X T X = I, is checked using the Euclidian norm of the residual, krk = rH r, of each column of X. The residual is defined as h
i r1 r2 · · · rn−1 rn
= XT X − I
, where X ∈ Cn×n
(6.1)
As shown in Figure 6.2, the eigensolutions for the low frequency modes from both ZGEEV in LAPACK and CSYMM do not satisfy the orthogonality condition well. This situation motivates the need to develop a new approach for solving the frequency response problem with large masses.
70
??? ??? STRUCTURE
M, B, K
Mo
P
LARGE MASS
Figure 6.1: Example of large mass approach for the enforced motion problem
71
eigenvectors from LAPACK
0.9 0.8 0.7
|| rk ||
0.6 0.5 0.4 0.3 0.2 0.1 0 −0.1
0
50
100
150
200
250
300
350
k−th eigenvector
(a) LAPACK −3
3
eigenvectors from CSYMM
x 10
2.5
2
|| rk ||
1.5
1
0.5
0
−0.5
−1
0
50
100
150
200
250
300
350
k−th eigenvector
(b) CSYMM
Figure 6.2: Norm of residual of each eigenvector from LAPACK and CSYMM 72
6.2
Problem Formulation for Structure
The frequency response problem is partitioned into the low frequency mode part Nl , corresponding to modes with very low natural frequencies due to the large masses, and the fixed-mass mode part Nf , corresponding to the rest of the global modes, in which the large masses have virtually no motion, to avoid the numerical inaccuracy of eigensolutions due to the large masses. Therefore, the modal frequency response problem can be represented in the form Ass,ll Ass,lf Zs,l Fs,l = Ass (ω)Zs = Ass,f l Ass,f f Zs,f Fs,f
(6.2)
¯ ¯ s ). The subscripts ll, f f , lf and where Ass (ω) represents (−ω 2 I +iω B+(1+iγ)Λ+i K f l represent the low frequency mode part, the remaining or fixed-mass mode part, and the interaction part between those two parts, respectively. Zs,l ∈ CNl ×Ncase and Zs,f ∈ CNf ×Ncase are the low frequency mode and the fixed-mass mode parts of the structure solution, respectively. Fs,l ∈ CNl ×Ncase is the force matrix in the low frequency mode part and Fs,f ∈ CNf ×Ncase is the force matrix in the fixed-mass mode part. The submatrices Ass,ll , Ass,lf , Ass,f l and Ass,f f can be defined as follows. Ass,ll =
−ω 2 Ill + iωBll + (1 + iγ)Λll + iKs,ll ∈ CNl ×Nl
Ass,lf =
iωBlf + iKs,lf ∈ CNl ×Nf
Ass,f l =
iωBf l + iKs,f l ∈ CNf ×Nl
(6.3)
Ass,f f = −ω 2 If f + iωBf f + (1 + iγ)Λf f + iKs,f f ∈ CNf ×Nf As in the method for solving the partitioned system in the acoustic fluidstructure interaction problem described in Chapter 5, we solve equation (6.2) by representing the upper part and the lower part separately. First, the lower part of equation (6.2) is expressed as Zs,f = A−1 ss,f f (Fs,f − Ass,f l Zs,l ). 73
(6.4)
Substituting equation (6.4) into the upper part of equation (6.2) yields −1 (Ass,ll − Ass,lf A−1 ss,f f Ass,f l )Zs,l = Fs,l − Ass,lf Ass,f f Fs,f .
(6.5)
Equations (6.4) and (6.5) require the inverse of the submatrix Ass,f f , so that we solve the following linear system in advance using the FFRA algorithm. Ass,f f [V1 , V2 ] = [Fs,f , Ass,f l ]
(6.6)
¯ s exists, the complex Based on the FFRA algorithm, when structural damping K symmetric matrix corresponding to the fixed-mass mode part is defined as ¯ s,f f ∈ CNf ×Nf Cf = (1 + iγ)Λf + K
(6.7)
and the complex symmetric eigenproblem Cf ΦC = ΦC ΛC is solved, so the coefficient matrix in equation (6.7) becomes diagonal. In cases in which there is a low-rank viscous damping matrix B, equation (6.6) can be solved easily by representing the coefficient matrix as a DPLR matrix. With solutions [V1 , V2 ], equation (6.5) can be rewritten as follows. (Ass,ll − Ass,lf V2 )Zs,l = Fs,l − Ass,lf V1
(6.8)
Once the low frequency mode part solution Zs,l is obtained from equation (6.8), the fixed-mass mode part Zs,f is obtained from the equation (6.4). It can be rewritten as Zs,f = V1 − V2 Zs,l . Therefore, the solution for the whole structure becomes Zs,l . Zs = Zs,f
74
(6.9)
(6.10)
6.3
Problem Formulation for Acoustic Fluid and Structure Interaction
In this section, the algorithm for the frequency response problem with enforced motion described in the previous section is extended to the acoustic fluid-structure interaction problem with enforced motion. The frequency response of the acoustic fluid-structure interaction problem in equation (5.15) is rewritten as Fs Ass iωAsf Zs . = Z F iωATsf Af f f f
(6.11)
As in the algorithm in section 6.2, the coefficient matrix of the structure part, Ass , is divided into the low frequency mode part and the fixed-mass mode part in the form
Ass,ll
Ass,lf
Ass,f l Ass,f f iωATsf,l iωATsf,f
iωAsf,l
Zs,l
iωAsf,f Zs,f Af f Zf
Fs,l
= Fs,f Ff
.
(6.12)
The area matrix Asf , the structure part Zs of the solution matrix, and the structure part Fs of the force matrix are also partitioned into Asf,l ∈ CNl ×Nf ev and Asf,f ∈ CNf ×Nf ev , Zs,l ∈ CNl ×Ncase and Zs,f ∈ CNf ×Ncase , and Fs,l ∈ CNl ×Ncase and Fs,f ∈ CNf ×Ncase , respectively, based on the low frequency mode and the fixed-mass mode part. This partitioned system can be solved using the same approach used in the acoustic fluid-structure interaction problem in Chapter 5. After the partitioned system for the low frequency mode part and the fixed-mass mode part is solved as described in the previous section, the partitioned system for the acoustic fluidstructure interaction problem is solved later. First, it is necessary to solve equation (5.18) for the partitioned system of the acoustic fluid-structure interaction problem when the FFRA algorithm is employed.
75
Equation (5.18) is rewritten as Ass [Us1 , Us2 ] = [Fs , Asf ] .
(6.13)
Since Ass is the coefficient matrix of the structure part, it is partitioned in the form U U2,l F Asf,l Ass,ll Ass,lf 1,l = s,l . (6.14) Ass [Us1 , Us2 ] = Ass,f l Ass,f f U1,f U2,f Fs,f Asf,f To solve this equation, from the lower part of equation (6.14), we get [U1,f , U2,f ] = A−1 ss,f f ([Fs,f , Asf,f ] − Ass,f l [U1,l , U2,l ]).
(6.15)
The fixed-mass mode part [U1,f , U2,f ] is substituted into the upper part of equation (6.14), so that the low frequency mode part [U1,l , U2,l ] can be expressed as ³ ´ Ass,ll − Ass,lf A−1 A [U1,l , U2,l ] = [Fs,l , Asf,l ] − Ass,lf A−1 ss,f l ss,f f ss,f f [Fs,f , Asf,f ] . (6.16) In order to solve equation (6.16) efficiently, similar to the previous section, the following linear system is solved in advance Ass,f f [V1 , V2 , V3 ] = [Fs,f , Asf,f , Ass,f l ]
(6.17)
using the FFRA algorithm to reduce the computational cost significantly. When structural damping exists, [V1 , V2 , V3 ] is represented in the form [V1 , V2 , V3 ] = ΦC [W1 , W2 , W3 ]
(6.18)
and equation (6.17) can be written as £ ¤ A¯ss,f f [W1 , W2 , W3 ] = F¯s,f , A¯sf,f , A¯ss,f l
(6.19)
where A¯ss,f f = ΦTC Ass,f f ΦC , F¯s,f = ΦTC Fs,f , A¯sf,f = ΦTC Asf,f , and A¯ss,f l = ΦTC Ass,f l , respectively. Once the solution W is obtained with the FFRA algorithm, the solution V can be obtained from the backtransformation in equation (6.18). 76
With the solution [V1 , V2 , V3 ], the equation (6.16) can be rewritten as (Ass,ll − Ass,lf V3 ) [U1,l , U2,l ] = [Fs,l , Asf,l ] − Ass,lf [V1 , V2 ] .
(6.20)
However, instead of substituting V into equation (6.20), the solution W can be used in the equation (6.20) as (Ass,ll − Ass,lf ΦC W3 ) [U1,l , U2,l ] = [Fs,l , Asf,l ] − Ass,lf ΦC [W1 , W2 ]
(6.21)
and equation (6.21) can be rewritten using the definition in equation (6.19) in the form ¢ ¡ Ass,ll − A¯ss,lf W3 [U1,l , U2,l ] = [Fs,l , Asf,l ] − A¯ss,lf [W1 , W2 ] .
(6.22)
Since A¯ss,lf is already obtained in the equation (6.19), solving equation (6.22) instead of equation (6.20) can significantly reduce the computational cost by skipping the backtransformation in equation (6.18). After we get the low frequency mode solution [U1,l , U2,l ] from equation (6.22), the fixed-mass mode solution [U1,f , U2,f ] can be obtained from the equation (6.15). Similar to the way to calculate [U1,l , U2,l ], [U1,f , U2,f ] can be obtained easily as [U1,f , U2,f ] = [V1 , V2 ] − V3 [U1,l , U2,l ]
(6.23)
= ΦC ([W1 , W2 ] − W3 [U1,l , U2,l ]) . For better computational performance with BLAS 3 operations, only the ([W1 , W2 ] -W3 [U1,l , U2,l ]) part is computed at each frequency, and then ΦC is multiplied once after the frequency sweep is finished. Finally, the solution [Us1 , Us2 ] of equation (6.13) can be obtained from equations (6.22) and (6.23) in the form [Us1 , Us2 ] =
U1,l
U2,l
U1,f
U2,f
77
.
(6.24)
After the solution [Us1 , Us2 ] is obtained, the fluid part Zf of the solution can be obtained from ¡ ¢ Af f + ω 2 ATsf Us2 Zf = F − iωATsf Us1
(6.25)
and the structure part Zs of the solution is obtained from Zs = Us1 − iωUs2 Zf .
6.4
(6.26)
Problem Formulation with Modal Correction Approach
The enforced motion analysis can be integrated with the modal correction approach described in Chapter 3. For the system modified from an original configuration, the frequency response problem, including the acoustic fluid-structure interaction, can be described as
Ass
iωAsf
iωATsf
Af f
Zs F s = Z F f f
(6.27)
where Ass = −ω 2 M + iωB + (1 + iγ)K + iKs ,
(6.28)
and M, B, K, and Ks are defined in equation (3.51). We assume that there is no configuration change in the fluid part. The coefficient matrix is partitioned in the same way as shown in equation (6.12). All solution procedures are similar to the previous section except steps given below. To solve the following equation which corresponds to equation (6.17), £ ¤ Ass,f f [V1 , V2 , V3 ] = Fs,f , Asf,f , ATss,lf
(6.29)
where Ass,f f = −ω 2 Mf f + iωBf f + (1 + iγ)Kf f + iKs,f f , the factorization of the full mass matrix is performed as M = LLT . By representing V in the form [V1 , V2 , V3 ] = L−T [S1 , S2 , S3 ] 78
(6.30)
and premultiplying equation (6.29) by L−1 , equation (6.29) becomes [−ω 2 I + iωL−1 Bf f L−T + (1 + iγ)L−1 Kf f L−T + iL−1 Ks,f f L−T ] [S1 , S2 , S3 ] = L−1 [Fs,f , Asf,f , Ass,f l ] . (6.31) Then the complex symmetric matrix Cf is defined as Cf = (1 + iγ)L−1 Kf f L−T + iL−1 Ks,f f L−T
(6.32)
and complex symmetric matrix eigensolutions of C are obtained using CSYMM. With the FFRA algorithm, equation (6.31) becomes [−ω 2 I + ΛC + iωΦTC L−1 U ΣV T L−T ΦC ] [W1 , W2 , W3 ] = ΦTC L−1 [Fs,f , Asf,f , Ass,f l ] (6.33) where [S1 , S2 , S3 ] = ΦC [W1 , W2 , W3 ]. Once [W1 , W2 , W3 ] is computed, [U1,l , U2,l ] is obtained from ¢ ¡ Ass,ll − A¯ss,lf W3 [U1,l , U2,l ] = [Fs,l , Asf,l ] − A¯ss,lf [W1 , W2 ]
(6.34)
A¯Tss,lf = ΦTC L−1 ATss,lf = (Ass,lf L−T ΦC )T .
(6.35)
where
Then the fixed-mass mode part [U1,f , U2,f ] is obtained [U1,f , U2,f ] = [V1 , V2 ] − V3 [U1,l , U2,l ] = L−T ΦC ([W1 , W2 ] − W3 [U1,l , U2,l ]) .
(6.36)
Finally, the structure and fluid parts of the solution can be obtained from equations (6.25) and (6.26).
79
Chapter 7
Parallel Implementation of the New Algorithm In shared memory multiprocessors (SMM), each processor has direct access to the memory of other processors in the system, so that any processor directly loads or stores any shared address. As a tool for parallel implementation on SMM, the OpenMP Application Program Interface (API) has been used. OpenMP is a set of compiler directives and callable runtime library routines that express shared memory parallelism. Also, OpenMP has the very powerful concept of orphan directives that greatly simplify the task of implementing coarse grained parallel algorithms. The parallel implementation exhibits two different types of parallelism: the task or coarse grained parallelism, where a number of separate tasks can be executed simultaneously, and the fine grained parallelism at the matrix level, where one matrix operation may be carried out using a number of processors. In this chapter, the new FFRA algorithm for solving the damped modal frequency response analysis problems is parallelized for SMM using the OpenMP API with task parallelism as well as fine grained parallelism.
80
7.1
Parallelization of Frequency Response Analysis
The newly developed FFRA algorithm is parallelized using the OpenMP API. Figure 7.1 shows the flow of the parallel implementation of the FFRA algorithm. It consists mainly of 3 steps. In the beginning, the complex symmetric matrix eigenvalue problem solver CSYMM is parallelized using fine grained parallelism. Then, the frequency loop is partitioned into several sub-loops based on the number of available processors. Each processor performs analysis at a certain number of frequencies within a sub-loop, so that it can be implemented with task parallelism. Finally, the solutions obtained from several processors are collected and backtransformed to the modal solutions with fine grained parallelism.
7.1.1
Step 1 : Parallel complex symmetric matrix eigensolver
In the FFRA algorithm, the frequency response analysis depends mainly on the complex symmetric matrix eigensolution, so that its parallelization is very important for performance. In this section, the sequential algorithm of CSYMM is parallelized with fine grained parallelism. The parallelization of CSYMM is obtained through a sequence of parallel BLAS calls developed in this dissertation work and referred to as OpenMP parallel BLAS. Parallel Basic Linear Algebra using OpenMP Since CSYMM consists of a sequence of consecutive BLAS calls within a loop, it is necessary to develop OpenMP based parallel BLAS. However, developing parallel BLAS is one of the challenging issues in OpenMP [77]. In this dissertation, the strategy of partitioning BLAS operations into a series of small independent sequential BLAS calls is used, in which a matrix is partitioned and assigned to each process according to the block-cycle decomposition where the blocking factor is consistent with that required for good performance from the single processor matrix multipli81
Read the modal matrices
Step 1
Parallel CSYMM
Step 2
Frequency Loop
1
2
np - 1
np
Step 3
Backtransformation
Output the modal solution
Figure 7.1: The flow of the parallel frequency response analysis using the FFRA algorithm
82
cation. The block cyclic distribution is performed over both rows and columns of the matrix. This allows the routines to get better scalability and balanced computation. This is similar to the approach used by ScaLapack [70] which is for distributed memory computers. In ScaLapack, PBLAS is the name of a set of parallel BLAS routines, and interprocessor communication occurs within PBLAS. When a complex symmetric matrix is reduced to complex symmetric tridiagonal form, the Level 3 BLAS routine ZSYR2K, the Level 2 BLAS routine ZSYMV, ZGEMV and the Level 1 BLAS routine ZAXPY are called heavily. In the backtransformation, the Level 3 BLAS routine ZTRMM and ZGEMM are used. Parallel implementation of three of the typical and most heavily used BLAS routines, ZGEMM, ZSYR2K and ZGEMV, is described. • PZGEMM : matrix-matrix multiplication This complex matrix-matrix multiplication routine updates C ∈ Cm×n by the following equation: C = αAB + βC
(7.1)
where A ∈ Cm×k , B ∈ Ck×n , α ∈ C and β ∈ C. We start by noting that we partition A, B, and C by columns and rows with the block-cycle decomposition, respectively. C11 · · · C1N .. .. C= . . CM 1 · · · CM N
A11 · · · A1K .. .. ,A = . . AM 1 · · · AM K
B11 · · · B1N .. . , B = .. . . BK1 · · · BKN (7.2)
The subscripts M and N represent the number of sub-matrices in each column and
83
row. Then each processor performs the following sub-matrix multiplication. B1j B2j ³ ´ .. + βCij Cij = α Ai1 Ai2 · · · Ai(K−1) AiK ∗ . B(K−1)j BKj
(7.3)
It is a typical approach to optimize matrix multiplication by using an inner kernel to compute Cij . This approach has the property that the CPU attains near optimal performance when Aip remains in the L1 cache and elements of Cij and Bpj are streamed from a lower level in the memory pyramid [79]. Note that each processor operation is independent of another processor operation, and each processor saves the results in its own memory space, so that PZGEMM operations do not need any reduction of data. The only issue is the size of a submatrix since the submatrix should be large enough that near peak sequential performance is obtained. Figure 7.2 shows the performance of PZGEMM with different sizes of submatrices. An HP rx5670 server which has four 900 MHz Itanium 2 processors is used to evaluate the performance. The test matrix is generated with randomly distributed entries. All run times are reported in seconds, and the size of the submatrix is denoted by nb. Several different values of nb are chosen, in which nb = 32, 64, 100, 200, 300, 400, and 500. As shown in Figure 7.2, the performance increases until nb is around 200, so that the value of nb = 256 is chosen in OpenMP based parallel BLAS. Figure 7.3 shows the performance of PZGEMM for several different sizes of test matrix. Speedups of 1.9 and 3.5 are obtained for 2 and 4 threads, respectively. • PZSYR2K : symmetric rank 2k update The symmetric rank 2k update assumes that C ∈ Cn×n is symmetric and is 84
PDGEMM (n=6000) 120 NP=2 NP=4
100
time (sec)
80
60
40
20
0
0
100
200
300
400
500
600
block size nb
Figure 7.2: Elapsed time of PZGEMM with different submatrix sizes for n = 6000 matrix
PDGEMM 4
3.5
3
speed up
2.5
2
1.5
1 NP=2 NP=4 0.5
0
0
2000
4000
6000
8000
10000
12000
matrix size n
Figure 7.3: Speedup of PZGEMM for different matrix sizes
85
to be updated using the following formula. C = α(AB T + BAT ) + βC
(7.4)
where A ∈ Cn×k , B ∈ Cn×k , α ∈ C and β ∈ C. Each processor performs the following rank 2k update. C1i A1 C2i A2 .. .. = α( . . C(N −1)i A(N −1) CN i AN
B1 B2 T .. Bi + . B(N −1) BN
C1i C2i T .. Ai ) + β . C(N −1)i CN i
(7.5)
This operation is also independent of other processors, so that there is no need to transfer data between the processors.
Figure 7.4 shows the performance of
PZSYR2K for several different sizes of test matrices. Speedup is 2 and 3.5 with 2 and 4 threads, respectively. PDSYR2K 4
3.5
3
speed up
2.5
2
1.5
1 NP=2 NP=4 0.5
0
0
2000
4000
6000
8000
10000
12000
matrix size n
Figure 7.4: Speedup of PZSYR2K for different matrix sizes
86
• Issues about Level 2 BLAS parallelization As mentioned earlier, CSYMM consists of a sequence of consecutive BLAS calls within a loop. When these calls operate on just vectors, or perform matrixvector type operations as shown in equation (4.19), they are sensitive to the migration of data from one processor cache to another because of overhead issues, which is one of the intrinsic difficulties of OpenMP [77]. In addition, while OpenMP provides flexibility and convenience which users can exploit, it lacks performance from data cache locality [77]. Because of the overhead of not using the data in the cache of particular processors, the performance of the matrix-vector and vector-vector operations are bad in the parallelization with OpenMP API. Therefore, this issue causes poor performance in the reduction process of CSYMM since the matrix-vector multiplication in equation (4.19) is one of the most expensive parts in the reduction process. The matrix-vector multiplication is implemented in the following way: y = αAx + βy
(7.6)
where A ∈ Cm×n , x ∈ Cn , y ∈ Cm , α ∈ C and β ∈ C. Each processor performs the following operations.
³ yi = α
Ai1 Ai2 · · · Ai(N −1) AiN
x1 x2 ´ .. ∗ . x(N −1) xN
+ βyi ,
i = 1, · · · , M
(7.7) Since all processors need to access x almost at the same time, this leads to excessive overhead. There is another way to parallelize the matrix-vector multiplication. Each
87
processor performs the following operations p y 1 A1j yp A2j 2 .. .. = α . . p y(M −1) A(M −1)j p yM AM j
xj ,
p = 1, · · · , N
(7.8)
where superscript p represents the processor number that computes this operation. However, after each processor finishes its own operation, it requires the reduction of data from all the other processors in the form y=
N X
yp + βy.
(7.9)
p=1
Since OpenMP does not support the reduction of data in vector or matrix format, the overheads due to synchronizing shared work among processes are important factors influencing the performance of a parallel program. Therefore, this approach is also not appropriate, and gives poor performance. This poor parallel performance of Level 2 BLAS makes it difficult to develop efficient OpenMP LAPACK routines because LAPACK routines use successive BLAS calls including Level 2 BLAS which give poor performance [77]. Figure 7.5 shows the poor performance of PZGEMV, in which the speedup with 2 and 4 processors is about 1.2, 1.5, respectively. However, the poor parallel performance of Level 2 BLAS operations is not confined to applications on SMM. In a distributed memory machine, Level 2 BLAS operations, especially matrix-vector multiplication, have been also bottlenecks in performance [70, 80]. Parallelization of CSYMM using OPBLAS The sequential algorithm of CSYMM in equation (4.19), (4.21), (4.22), and (4.24) is parallelized using the following OpenMP based parallel BLAS routines. 88
PDGEMV 4 NP=2 NP=4 3.5
3
speed up
2.5
2
1.5
1
0.5
0
0
2000
4000
6000
8000
10000
12000
matrix size n
Figure 7.5: Speedup of parallel PZGEMV for different matrix sizes
Reduction PZSYMV
:
xk = τ A(k) vk
PZAXPY
:
wk = xk − 21 τ vk (vkT xk )
PZSYR2K :
A(k−1) = A(k+nb−1) − V W T − W V T
PZGEMM :
X = QZ
Backtransformation (7.10)
7.1.2
Step 2 : Frequency loop
The frequency loop is partitioned into several sub-loops based on the available processors as shown in Figure 7.1, then each sub-frequency loop is assigned to each processor. Each processor generates the fluid solution Zf and the structure solution Zs , and these solutions are saved out-of-core. This task parallel implementation, like many similar parallelization efforts, is done to control the allocation of tasks 89
and load-balance effectively.
7.1.3
Step 3 : Backtransformation
Once the frequency sweep is finished, the structure part of the solution needs to be backtransformed by multiplying ΦC as shown in equation (3.18). Since this operation is a matrix-matrix multiplication, PZGEMM in OPBLAS can be used.
90
Chapter 8
Numerical Results and Performance In this chapter, numerical results are presented to demonstrate various aspects of the FFRA algorithm. Six FE models developed by various companies in the international automobile industry are used as numerical examples for testing the FFRA algorithm presented in Chapters 3 through 6. First, before testing the FFRA algorithm, the accuracy and performance of a complex symmetric matrix eigenvalue problem solver, CSYMM, are evaluated. For accuracy, the eigenvalues of CSYMM are compared with those obtained from ZGEEV which is a general matrix eigensolver in LAPACK. Performance is judged primarily on elapsed time. Then, six industry FE models are used as numerical examples to verify the accuracy and evaluate the performance of the FFRA algorithm. Table 8.1 shows the various characteristics of each FE model. The first model has only structural damping and has 5,818 global modes below the user specified cutoff frequency. The second model has structural damping as well as viscous damping, and has 7,570 global modes. The third model, with 4,944 global modes, includes an acoustic fluid 91
Table 8.1: The list of FE models used for numerical examples
Model Model Model Model Model Model
1 2 3 4 5 6
modes (structure) 5,818 7,570 4,944 6,022 7,570 3,017
Ks o o o o o o
damping symm. B asymm. B
fluid
modified system
large mass
o o o o
o o o o
o
model to represent air in the passenger compartment. The fourth model, with 6,022 global modes, has an asymmetric B matrix where the asymmetry comes from gyroscopic effects in an automatic transmission torque converter. The fifth model is a modified configuration of model 2 for design optimization. The last one, which has 3,017 global modes, is a model with enforced motion, in which large masses are used to enforce the motion. The accuracy of the solution from the FFRA algorithm for these FE models is evaluated by comparing it with the solution obtained from ZSYSV or ZGESV in LAPACK, and SOL 108 or SOL 111 in NASTRAN. ZSYSV and ZGESV in LAPACK are a complex symmetric and general matrix linear system solver, respectively. NASTRAN SOL 108 is a direct frequency response analysis module and NASTRAN SOL 111 is a modal frequency response analysis module. Frequency response functions fall into two categories. The frequency response function (FRF) for which the output location is the same as the input location is called the drive point FRF or the point FRF. The frequency response function for which the output location is different from the input location is called the cross point FRF or the transfer FRF. Therefore, the FFRA solution and the solution from NASTRAN SOL 108 and/or NASTRAN SOL 111 are compared for both drive points and cross points. 92
Table 8.2: Elapsed time for the reduction of the complex symmetric matrices with CSYMM, CS, and LAPACK [mm:ss] reduction matrix size n CSYMM CS ZGEEV 1000 00:03 00:04 00:10 2000 00:26 00:55 00:44 3000 00:47 01:41 06:43 4000 01:31 03:54 12:50 5000 03:22 10:02 37:10 Finally, the performance of the FFRA algorithm is measured in terms of elapsed time and compared with NASTRAN SOL 111. An HP rx5670 server which has four 900 MHz Itanium II processors is used to evaluate the performance of the FFRA algorithm. This computer has 4 processors, 8 gigabytes (GB) of physical memory, and 500 GB of disk space. The operating system is HP-UX version B.11.22.
8.1
Accuracy and Performance of CSYMM
8.1.1
Full storage eigensolver
In the FFRA algorithm, it is necessary to solve the complex symmetric matrix eigenvalue problem for C in equation (3.13). In Chapter 4, CSYMM is presented for solving a complex symmetric matrix eigenvalue problem efficiently using full storage. The accuracy of CSYMM is evaluated by comparing the eigenvalues produced by CSYMM with those obtained from ZGEEV in LAPACK [58]. ZGEEV computes the eigenvalues and, optionally, all or selected eigenvectors of a complex general matrix, which is reduced to upper Hessenberg form. The accuracy is tested for a matrix of order n = 1000, in which entries are distributed uniformly in the interval [−1, 1]. The norm of the residual k r k2 , in which r = (diag(Λlapack ) − diag(Λcsymm )), is 2.71e-12. 93
Table 8.3: Elapsed time for the backtransformation of the complex symmetric matrices with CSYMM, CS, and LAPACK [mm:ss] backtransformation matrix size n CSYMM CS ZGEEV 1000 00:03 00:11 00:06 2000 00:09 01:49 00:47 3000 00:22 05:04 02:31 4000 00:52 12:19 07:27 5000 01:28 34:42 11:22 To measure the performance of CSYMM, the elapsed time of CSYMM is compared with that of ZGEEV and the code CS (which stands for complex symmetric) developed by Bar-On and Ryaboy [55]. The performance is measured for randomly generated matrices with entries distributed uniformly in the interval [−1, 1]. Matrices of size n = 1000, 2000, 3000, 4000 and 5000 are generated. Table 8.2 and 8.3 represent the performance of the reduction and backtransformation stages with the three different algorithms as the size of matrix is increased. In the reduction procedure, the performance of CSYMM and CS is better than ZGEEV since ZGEEV does not exploit the symmetry of the matrix. However, in the backtransformation, the performance of CSYMM and ZGEEV is better than CS because CS is using only a BLAS 2 implementation. For example, in the matrix with n = 5000, CS takes 198 % more time than CSYMM, and ZGEEV takes 1004 % more time than CSYMM, in the reduction process. In the backtransformation, CS and ZGEEV take 2266 % and 675 % more time than CSYMM, respectively. Figure 8.1 shows the ratios of elapsed time, between CS and CSYMM and between ZGEEV and CSYMM. Figure 8.1 (a) and (b) shows that CSYMM is significantly faster than CS and ZGEEV in both reduction and backtransformation. Overall, the performance of CSYMM is outstanding while obtaining results almost as accurate as those of ZGEEV in LAPACK.
94
REDUCTION 25
CS / CSYMM LAPACK / CSYMM
Ratio of elapsed time
20
15
10
5
1 0
1000
2000
3000
4000
5000
Matrix size
(a) reduction BACKTRANSFORMATION 25
CS / CSYMM LAPACK / CSYMM
Ratio of elapsed time
20
15
10
5
1 0
1000
2000
3000
4000
5000
Matrix size
(b) backtransformation
Figure 8.1: Ratio of elapsed time to compute the eigensolution of complex symmetric matrices 95
8.1.2
Block packed storage eigensolver
In Chapter 4, the block packed storage Householder method, BPHOUSE, has been developed. It can use almost as little memory as the packed storage algorithm, but can perform almost as well as the full storage algorithm. It can be applied to either the real symmetric or the complex symmetric matrix eigenvalue problem. For a numerical example, BPHOUSE is tested only for a real symmetric matrix. The accuracy and performance of BPHOUSE is compared with LAPACK subroutines. For full storage, DSYTRD and DORMTR are used for reduction and backtransformation, respectively. DSPTRD and DOPMTR are used for packed storage. The accuracy of BPHOUSE is evaluated for a random matrix with n = 4000, in which entries are distributed uniformly in the interval [−1, 1]. Figure 8.2 shows the eigenvalues calculated by the full storage algorithm DSYTRD, the packed storage algorithm DSPTRD, and the block packed storage algorithm, in which all eigenvalues coincide. Figure 8.3 represents the relative errors between the eigenvalues from the block packed storage algorithm and those from the full storage algorithm, DSYTRD. The order of relative error is 1e-12 which is close to the machine precision. Table 8.4 and 8.5 show the performance of the reduction and backtransformation stage with different storage algorithms for matrices of size n = 2000, 4000, 6000, 8000 and 10000. Matrices are generated with uniformly distributed random numbers in the interval [−1, 1]. For a matrix with n = 10000, the performance of the block packed storage code is 10% slower than the full storage subroutine, but 40% faster than the packed storage subroutine in the reduction procedure. In the backtransformation, the block packed storage subroutine is 4% slower than the full storage subroutine, but 272% faster than the packed storage subroutine. Figure 8.4 (a) and (b) show the ratios of elapsed time, between the full storage and the block packed storage codes and between the full storage and the packed storage codes, for several different sizes of matrices. As illustrated, the block
96
80
60
40
Eigenvalue
20
0
−20
−40
full packed block packed
−60
−80
0
500
1000
1500
2000
2500
3000
3500
4000
Figure 8.2: Eigenvalues calculated from the full storage algorithm DSYTRD, packed storage algorithm DSPTRD, and block packed storage algorithm for a matrix n = 4000 −12
1
x 10
0.8
0.6
Relative Error
0.4
0.2
0
−0.2
−0.4
−0.6
−0.8
−1
0
500
1000
1500
2000
2500
3000
3500
4000
Figure 8.3: Relative errors between the eigenvalues from the block packed storage algorithm and those from the full storage algorithm for a matrix n = 4000 97
packed storage gives much better performance than the packed storage, but gives almost the same performance as the full storage. However, the block packed storage requires only half of the memory of the full storage for storing the matrix A while obtaining the same accuracy. Table 8.4: Elapsed time for reduction of real symmetric matrices [mm:ss] reduction n full storage pack storage block packed storage 2000 00:08 00:10 00:09 4000 00:40 01:12 00:48 6000 02:31 03:59 02:50 8000 05:01 07:40 05:45 10000 09:39 15:01 10:47
Table 8.5: Elapsed time for backtransformation of real symmetric matrices [mm:ss] backtransformation n full storage pack storage block packed storage 2000 00:07 00:28 00:07 4000 00:55 03:28 00:57 6000 03:04 13:11 03:07 8000 06:15 26:12 07:11 10000 14:10 55:03 14:47
98
REDUCTION 5 Full / Block pack Full / Pack 4.5
4
Ratio of elapsed time
3.5
3
2.5
2
1.5
1
0.5
0
2000
4000
6000
8000
10000
Matrix size
(a) reduction BACK TRANSFORMATION 5 Full / Block pack Full / Pack 4.5
4
Ratio of elapsed time
3.5
3
2.5
2
1.5
1
0.5
0
2000
4000
6000
8000
10000
Matrix size
(b) back transformation
Figure 8.4: Ratio of elapsed time to compute the eigensolution of real symmetric matrices: the full storage solver to the block packed storage solver and the full storage solver to the packed storage solver 99
8.2
Accuracy and Performance of FFRA
For the FE models described in Table 8.1, the frequency range of interest is defined by an analyst for modal frequency response analysis. The excitation location and response location are also determined by the analyst. The FFRA algorithm is applied to these FE models without modifying any NASTRAN input data provided by the analyst. To form a modal frequency response problem, partial eigensolutions of FE model are obtained using the AMLS method as described in Chapter 3. The global cutoff frequency for computing natural frequencies is set to 1.5 times the highest excitation frequency of interest.
8.2.1
Model 1 : Structural damping
Model 1 is a trim body model which has 1.58 million FE degrees of freedom. A trim body model is a automobile body model that has a steering mechanism and moving parts such as doors, hood, trunk lid, and seats. As shown in Figure 8.5, this FE model represents the cab of a full size truck without any parts of the truck behind the cab. This model has global structural damping γ and structural damping Ks . The number of nodes for which the response is requested by the user is 116, so the frequency response is computed only for 116 selected nodes. The number of load cases is 3. The frequency range of interest is from 1 Hz to 500 Hz with a 1 Hz increment. In the calculation of the partial eigensolution to form the modal frequency response problem, 750 Hz is selected as the global cutoff frequency. If a higher cutoff frequency is selected, a more accurate solution can be obtained in the modal frequency response problem. The detailed analysis information for FE model 1 is summarized in Table 8.6. Since model 1 has only structural damping, the FFRA1 algorithm is employed to solve this model. Table 8.7 compares the analysis time of FFRA, NASTRAN SOL 111, and ZSYSV in LAPACK respectively. NASTRAN SOL 111 uses a sparse matrix 100
Figure 8.5: Finite element model: Model 1 with structural damping
Table 8.6: FE model 1 analysis information Degrees of freedom 1.58 M DOF global structural damping γ Damping structural damping Ks Acoustic fluid none Large masses none Excitation frequency range 1-500 Hz Frequency increment 1 Hz Number of excitation frequencies 500 Global cutoff frequency 750 Hz Force frequency dependence frequency dependent Number of load cases 3 Number of global modes 5,818
101
decomposition to solve a linear system for a modal frequency response problem. The FFRA algorithm is almost 36 times faster than NASTRAN SOL 111 and ZSYSV in LAPACK. Table 8.7: Elapsed time for the modal frequency response analysis for FE model 1 FFRA NASTRAN SOL 111 LAPACK (ZSYSV) elapsed time 20 min 11 sec 12 hr 08 min 11 hr 48 min Table 8.8 represents the timing profile of the main steps in the FFRA algorithm for model 1. Most of the time, 77.8%, is used for solving the complex symmetric matrix eigenvalue problem in step (1). Once the complex symmetric matrix eigensolution is obtained, the time for the frequency sweep, 21.7%, is very inexpensive because the coefficient matrix of the modal frequency response problem becomes diagonal. The frequency sweep includes forming P in step (2.1) and Z in step (2.3). The time which is neglected in the timing profile is for reading and writing data, which is minimal compared to the other analysis time. Table 8.8: Timings of the algorithm FFRA for Model 1 step operation time (mm:ss) portion (1) CSYMM : CΦC = ΦC Λ - reduction, T = QT CQ 07:01 34.8% - EVP for T, T Z = ZΛ 00:46 3.8% - backtransformation, Φ = QZ 07:55 39.2% for i = 1, Nf req (2.1) P = ΦTC F 02:09 10.7% (2.2) W = D−1 P 00:03 0.2% (2.3) Z = ΦC W 02:11 10.8% end total 20:11 For accuracy evaluation, the solution of the FFRA algorithm is compared with that of NASTRAN SOL 111. Figures 8.6 (a) and (b) show the magnitude of the acceleration in the Y direction at driving point 1 and cross point 1 for the 102
Y direction excitation force. Also Figures 8.7 (a) and (b) show the magnitude of acceleration in the Z direction at driving point 2 and cross point 2 for the Z direction excitation force. The driving points are located on the floor of the truck. As shown in the Figures, the results from both the FFRA and NASTRAN SOL 111 are indistinguishable. The figures and tables show outstanding performance of the FFRA algorithm with good accuracy for a FE model with structural damping only.
103
Driver point at floor (Y dir.)
3
10
FFRA NASTRAN SOL 111
2
Inertance Magnitude (Y dir.)
10
1
10
0
10
−1
10
0
50
100
150
200
250
300
350
400
450
500
Frequency (Hz)
(a) drive point 1 Cross point at floor (Y dir.)
2
10
FFRA NASTRAN SOL 111
1
Inertance Magnitude (Y dir.)
10
0
10
−1
10
0
50
100
150
200
250
300
350
400
450
500
Frequency (Hz)
(b) cross point 1
Figure 8.6: Frequency response for Model 1 with Y direction excitation force 104
Driver point at floor (Z dir.)
3
Inertance Magnitude (Z dir.)
10
2
10
1
10
0
10
FFRA NASTRAN SOL 111 0
50
100
150
200
250
300
350
400
450
500
Frequency (Hz)
(a) drive point 2 Cross point at floor (Z dir.)
2
10
Inertance Magnitude (Z dir.)
FFRA NASTRAN SOL 111
1
10
0
10
−1
10
0
50
100
150
200
250
300
350
400
450
500
Frequency (Hz)
(b) cross point 2
Figure 8.7: Frequency response for Model 1 with Z direction excitation force 105
8.2.2
Model 2 : Structural and viscous damping
Model 2 is a full vehicle model that has 1.8 million FE degrees of freedom. Figure 8.8 shows the FE model representation. This full vehicle model includes tires, engine, exhaust system, seats, steering mechanism, and other moving parts. The frequency range of interest is from 1 Hz to 500 Hz with a 1 Hz increment, so the global cutoff frequency is set to 750 Hz. Model 2 has global structural damping γ, structural damping Ks , and viscous damping B in the FE model, so the FFRA3 algorithm is employed. Table 8.9 shows the analysis information of FE model 2. Model 2 has 7570 global modes and 3 frequency dependent forces. Table 8.9: FE model 2 analysis information Degrees of freedom 1.80 M DOF Damping global structural damping γ structural damping Ks viscous damping B none Acoustic fluid Large mass none 1-500 Hz Excitation frequency range Frequency increment 1 Hz Number of excitation frequencies 500 750 Hz Global cutoff frequency Force frequency dependence frequency dependent Number of load cases 3 Number of global modes 7,570 For comparison with the FFRA results, we ran NASTRAN SOL 108, which is for a direct frequency response analysis, and NASTRAN SOL 111. NASTRAN SOL 108 analysis takes over 200 hours, explaining why it is prohibitive to use NASTRAN SOL 108 analysis for large FE models even though the direct frequency response analysis gives the most accurate results. Table 8.10 shows the analysis time with FFRA and NASTRAN SOL 111, respectively. FFRA is almost 60 times faster than NASTRAN SOL 111. 106
Figure 8.8: Finite element model: Model 2
Table 8.10: Elapsed time for the modal frequency response analysis for FE model 2 method FFRA NASTRAN SOL 111 LAPACK (ZSYSV) elapsed time 39 min 33 sec 26 hr 30 min 26 hr 23 min
107
Table 8.11 shows the timing profile of the main steps in the FFRA algorithm. 90.7% of the analysis time is spent to solve the complex symmetric matrix eigenvalue problem for the C matrix in step (1). Then, in step (2), the rank of the viscous damping matrix is obtained. Since this model has a viscous damping matrix B which has 1.8 million rows and columns but is very sparse, Bb is extracted from B, containing non-zero rows and columns of B. The size of the Bb matrix becomes 50 × 50, and the rank of the Bb matrix is obtained from the eigenvalue problem of Bb as described in the FFRA 3 algorithm. The rank of Bb is found to be 4. The time spent for determining the rank of the viscous damping matrix is negligible, as shown in step (2). In the frequency sweep, it should be noted that only 0.2 % of the time is spent inverting the DPLR coefficient matrix in step (3.3). This shows that FFRA, which inverts the DPLR coefficient matrix, is a very effective approach compared to the classical approach which inverts the modal coefficient matrix with 7570 rows and columns at each frequency. Figures 8.9 (a) and (b) represent the magnitude of the velocity at cross point 1. The upper figure shows the comparison between FFRA and NASTRAN SOL 111, in which both results agree very well. In the lower figure, the accuracy of FFRA is compared with NASTRAN SOL 108. The accuracy of FFRA relative to the NASTRAN direct solution is entirely adequate. We can see that the FFRA solution is slightly shifted to the right of the NASTRAN direct solution, especially at the high end of the spectrum. This shift comes from the effects of substructure mode truncation in an AMLS approximate eigensolution. This situation can be improved by increasing the cutoff frequency when the partial eigensolution is calculated. Figures 8.10 (a) and (b) represent the magnitude of the velocity at cross point 2. Figure 8.10 shows that the accuracy of the FFRA solution is good compared to NASTRAN SOL 111 and SOL 108. Figure 8.11 shows the frequency response at
108
step (1)
(2.1) (2.2) (3.1) (3.2) (3.3) (3.4)
Table 8.11: Timings of the algorithm FFRA for Model 2 operation time (mm:ss) portion CSYMM : CΦC = ΦC Λ - reduction, T = QT CQ - EVP for T, T Z = ZΛ - backtransformation, Φ = QZ Dealing with viscous damping matrix B -Bb = U ΣU T -P = ΦTC (ΦTb U ) for i = 1, Nf req P = ΦTC F W1 = D−1 ∗ P W2 = S1 ∗ (I + R2T S1 )−1 S2T ΦTC F Z = ΦC ∗ (W1 − W2 ) end
total
15:58 01:16 18:48
40.4% 3.2% 47.5%
00:00 00:04
0.1%
01:26 00:01 00:05 01:13
3.6% 0.04% 0.2% 3.1%
39:33
the drive point. As shown in Figure 8.11, the results from both the FFRA and NASTRAN SOL 111 are indistinguishable. This FE model shows that the amount of time required for the frequency sweep is not significant once the complex symmetric matrix eigensolution is obtained. Also, it proves that forming the DPLR coefficient matrix is a very effective approach to save computational time when there are only a few viscous damping elements.
109
Z direction
0
10
FFRA NASTRAN SOL 111
−1
Mobility Magnitude
10
−2
10
−3
10
−4
10
0
50
100
150
200
250
300
350
400
450
500
Frequency (Hz)
(a) FFRA and NASTRAN SOL 111 Z direction
0
10
FFRA NASTRAN SOL 108
−1
Mobility Magnitude
10
−2
10
−3
10
−4
10
0
50
100
150
200
250
300
350
400
450
500
Frequency (Hz)
(b) FFRA and NASTRAN SOL 108
Figure 8.9: Frequency responses for Model 2 at cross point 1
110
Z direction
−1
10
FFRA NASTRAN SOL 111
−2
Mobility Magnitude
10
−3
10
−4
10
0
50
100
150
200
250
300
350
400
450
500
Frequency (Hz)
(a) FFRA and NASTRAN SOL 111 Z direction
−1
10
FFRA NASTRAN SOL 108
−2
Mobility Magnitude
10
−3
10
−4
10
0
50
100
150
200
250
300
350
400
450
500
Frequency (Hz)
(b) FFRA and NASTRAN SOL 108
Figure 8.10: Frequency responses for Model 2 at cross point 2
111
Z direction
0
Mobility Magnitude
10
−1
10
FFRA NASTRAN SOL 111
−2
10
0
50
100
150
200
250
300
350
400
450
500
Frequency (Hz)
(a) FFRA and NASTRAN SOL 111 Z direction
0
Mobility Magnitude
10
−1
10
FFRA NASTRAN SOL 108 −2
10
0
50
100
150
200
250
300
350
400
450
500
Frequency (Hz)
(b) FFRA and NASTRAN SOL 108
Figure 8.11: Frequency responses for Model 2 at drive point
112
8.2.3
Model 3 : Acoustic fluid
Model 3 is a full vehicle model including engine, exhaust system, seats, and steering mechanism. It also contains acoustic fluid in the passenger compartment. Figure 8.12 shows the FE model representation. The upper part shows the structure part of the FE model and the lower part shows the fluid part of the FE model. It has 2.27 million FE degrees of freedom for the structure part and 3,359 FE degrees of freedom for the fluid part. The frequency range of interest is from 20 Hz to 400 Hz with a 1 Hz increment. The global cutoff frequency is set to 650 Hz for the partial eigenvalue problems of both structure and fluid parts. Model 3 includes global structural damping γ and structural damping Ks . Table 8.12 summarizes the analysis information for FE model 3. Table 8.12: FE model 3 analysis information Degrees of freedom 2.27 M DOF (structure) 3359 DOF (fluid) Damping global structural damping γ structural damping Ks Acoustic fluid present Large mass none 20-400 Hz Excitation frequency range Frequency increment 1 Hz Number of excitation frequencies 381 Global cutoff frequency 650 Hz (structure) 650 Hz (fluid) Force frequency dependence frequency dependent Number of load cases 102 Number of global modes 4,944 (structure) 196 (fluid) Figures 8.13 (a) and (b) represent the magnitude of the velocity of the structure part at a point in X and Z direction for the excitation force in X direction, respectively. The result from FFRA agrees very well with that of NASTRAN SOL 111 in both directions. Figures 8.14 (a) and (b) show the acoustic pressure at a 113
(a) structure part
(b) fluid part
Figure 8.12: Finite element model: Model 3
114
point in the acoustic fluid with X and Z direction excitation forces, in which the results from both the FFRA and NASTRAN SOL 111 are indistinguishable. To show the accuracy more clearly, a dot symbol is used to represent the NASTRAN SOL 111 solution instead of a line. The dots are exactly on the line of the FFRA solution. Table 8.13: Elapsed time for the modal frequency response analysis for FE model 3 method FFRA NASTRAN SOL 111 LAPACK (ZSYSV) time 1 hr 32 min 10 hr 35 min 10 hr 8 min
step (1)
(2.1) (2.2) (2.3) (2.4) (2.5)
Table 8.14: Timings of the algorithm FFRA for Model 3 operation time (mm:ss) portion CSYMM : CΦC = ΦC Λ - reduction, T = QT CQ - EVP for T, T Z = ZΛ - backtransformation, Φ = QZ for i = 1, Nf req P = ΦTC [Fs , Asf ] ΦTC Ass ΦC [W1 , W2 ] = P [Us1 , Us2 ] = ΦC [W1 , W2 ] (Af f + ω 2 ATsf Us2 )Zf = Ff − iωATsf Us1 Zs = Us1 − iωUs2 Zf end
total
04:38 00:33 05:16
4.5% 0.5% 5.2%
41:16 00:32 40:14 07:04 00:15
40.6% 0.5% 39.6% 6.9% 0.2%
101:32
The analysis time is given in Table 8.13. The FFRA is about 7 times faster than NASTRAN SOL 111 and ZSYSV in LAPACK. Table 8.14 shows the timing profile of the main steps to solve the acoustic fluid and structure interaction problem using the FFRA algorithm. 10.2% of the total time is spent in step (1) for solving the complex symmetric matrix eigenvalue problem for C. In the frequency sweep, step (2), most of time is spent for forming the force matrix and computing the backtransformation in (2.1) and (2.3), respectively. Since this model has 102 115
frequency dependent load cases and 196 global modes for the fluid part, the total number of columns in the force matrix in step (2.1) becomes 298, which must be multiplied by the transpose of ΦC . This matrix multiplication operation needs to be performed for 381 excitation frequencies. Also, similar operations are performed in the backtransformation in step (2.3). Once the force matrix is formed, solving a linear system for the structure part and the fluid part is very inexpensive, as shown in steps (2.2), (2.4), and (2.5). In particular, solving the fluid system using a linear system direct solver is not a significant issue because the fluid system with 196 modes is small compared to the structure system with 4,944 modes. This numerical example shows that FFRA can reduce the computational cost significantly for an acoustic fluid and structure interaction problem while maintaining good accuracy.
116
X direction
0
10
−1
Mobility Magnitude
10
−2
10
−3
10
−4
10
FFRA NASTRAN SOL 111
−5
10
0
50
100
150
200
250
300
350
400
Frequency (Hz)
(a) response in X direction Z direction
0
10
−1
Mobility Magnitude
10
−2
10
−3
10
−4
10
FFRA NASTRAN SOL 111
−5
10
0
50
100
150
200
250
300
350
400
Frequency (Hz)
(b) response in Z direction
Figure 8.13: Frequency response of structure in X and Z direction with X direction excitation force 117
−8
4
X direction
x 10
FFRA NASTRAN SOL 111 3.5
Acoustic pressure
3
2.5
2
1.5
1
0.5
0
0
50
100
150
200
250
300
350
400
Frequency (Hz)
(a) excitation in X direction −8
4
Z direction
x 10
FFRA NASTRAN SOL 111 3.5
Acoustic pressure
3
2.5
2
1.5
1
0.5
0
0
50
100
150
200
250
300
350
400
Frequency (Hz)
(b) excitation in Z direction
Figure 8.14: Frequency response of fluid with X and Z direction excitation force 118
8.2.4
Model 4 : Asymmetric B matrix
Model 4 is a full vehicle model which contains an asymmetric B matrix. An automatic transmission torque converter induces a gyroscopic effect, which results in asymmetry in the B matrix. Table 8.15: FE model 4 analysis information Degrees of freedom 7.66 M DOF (structure) 47703 DOF (fluid) Damping global structural damping γ structural damping Ks asymmetric matrix B Acoustic fluid present none Large masses Excitation frequency range 10-100 Hz with 0.5 Hz increment 101-300 Hz with 1 Hz increment Number of excitation frequencies 381 Global cutoff frequency 500 Hz (structure) 500 Hz (fluid) Force frequency dependence frequency dependent Number of load cases 1 Number of global modes 6,022 (structure) 85 (fluid) The analysis information of FE model 4 is summarized in Table 8.15. This model contains acoustic fluid, global structural damping γ, structural damping Ks , and an asymmetric gyroscopic matrix. It has 7.66 million FE degrees of freedom for the structure and 47,703 FE degrees of freedom for the fluid. The frequency range of interest is from 10 Hz to 200 Hz with a 0.5 Hz increment. The global cutoff frequency is set to 500 Hz for both the structure and fluid parts, and 6,022 global modes for the structure part and 85 global modes for the fluid part are obtained. The performance of FFRA, NASTRAN SOL 111, and ZGESV in LAPACK is represented in Table 8.16. FFRA is about 60 times faster than NASTRAN SOL 111 analysis and ZGESV in LAPACK. FFRA performs a SVD to decompose the B 119
matrix instead of using an eigenvalue problem since this model has an asymmetric B matrix. In both LAPACK and NASTRAN SOL 111 analysis, a system of linear equations with a general matrix is solved since the coefficient matrix of the modal frequency response problem is asymmetric. Therefore, the computational cost is much higher than the cost of solving a system of linear equations with a symmetric coefficient matrix. Table 8.16: Elapsed time for the modal frequency response analysis for FE model 4 method FFRA Nastran SOL 111 LAPACK (ZGESV) time 22 min 29 sec 23 hr 33 min 22 hr 23 min
step (1)
(2.1) (2.2) (3.1) (3.2) (3.3) (3.4) (3.5)
Table 8.17: Timings of the algorithm FFRA for Model 4 operation time (mm:ss) portion CSYMM : CΦC = ΦC Λ - reduction, T = QT CQ - EVP for T, T Z = ZΛ - backtransformation, Φ = QZ Dealing with viscous damping matrix B -Bb = U ΣV T -P = ΦTC (ΦTb U ), R = (ΦTb V )T ΦC for i = 1, Nf req ΦTC [Fs , Asf ] ΦTC Ass ΦC [W1 , W2 ] = ΦTC [Fs , Asf ] [Us1 , Us2 ] = ΦC [W1 , W2 ] (Af f + ω 2 ATsf Us2 )Zf = Ff − iωATsf Us1 Zs = Us1 − iωUs2 Zf end
total
08:20 00:49 09:17
37.1% 3.6% 41.3%
00:00 00:08
0.5%
00:55 00:42 00:37 00:56 00:15
4.1% 3.1% 2.7% 4.2% 1.1%
22:29
The timing profile is summarized in Table 8.17. The complex symmetric matrix eigenvalue problem for C matrix takes 82% of the analysis time. The time for determining the rank of matrix B with SVD is negligible even though B is asymmetric. The rank of B is found to be 13. 15.2% of the analysis time is spent in 120
the frequency sweep. The time for transforming the force matrix is not significant since this model has only one load case. Similar to model 3, the analysis time for the fluid part in step (3.4) is small compared to that of the structure part. Figure 8.15 (a) shows the magnitude of velocity at a cross point, and Figure 8.15 (b) shows the acoustic pressure at a cross point. Figure 8.15 (a) gives the comparison between FFRA and NASTRAN SOL 111, and both results agree very well. In Figure 8.15 (b), the accuracy of acoustic pressures calculated from the FFRA algorithm is also compared with the solution from NASTRAN SOL 111. As shown in both figures, the accuracy of FFRA is very good. This example model shows the FFRA algorithm can deal with an asymmetric B matrix effectively with good performance as well as good accuracy.
121
X direction
3
10
FFRA NASTRAN SOL 111 2
10
Inertance Magnitude
1
10
0
10
−1
10
−2
10
−3
10
0
50
100
150
200
250
300
Frequency (Hz)
(a) structure part X direction
−5
10
FFRA NASTRAN SOL 111 −6
10
−7
Acoustic pressure
10
−8
10
−9
10
−10
10
−11
10
−12
10
0
50
100
150
200
250
300
Frequency (Hz)
(b) fluid part
Figure 8.15: Frequency response of structure and fluid parts for Model 4 122
8.2.5
Model 5 : Full modal mass and stiffness matrices
Model 5 is a full vehicle model with a modified configuration, for which model 2 is the original configuration. The modal correction approach is employed to analyze this modified system. As described in Chapter 3, the modal correction approach does not make the modal mass and stiffness matrices diagonal, so the FFRA4 algorithm is used to analyze this FE model. The performance of the FFRA algorithm compared to NASTRAN SOL 111 is presented in Table 8.18. The analysis time with the FFRA is about 27 times faster than NASTRAN SOL 111 and LAPACK. Table 8.18: Elapsed time for the modal frequency response analysis for FE model 5 method FFRA Nastran SOL 111 LAPACK (ZSYSV) time 58 min 35 sec 27 hr 11 min 26 hr 22 min The detailed analysis of the time profile is represented in Table 8.19. First, as explained in Chapter 3, the full modal mass matrix is factored in step (1.1). Then, the new matrix C is formed. Note that this C matrix is different from the matrix C in the original configuration. 30.7% of the analysis time is spent for these operations which are additional operations in the analysis for a modified configuration FE model. Then the complex eigenvalue problem for C matrix is solved, and this takes 61.1% of the analysis time. As in the previous examples, the time spent for the frequency sweep part is very small compared to the time spent for steps (1) and (2). One difference from the previous examples is that matrices K, B, and Ks are premultiplied and postmultiplied by the inverse of the factor matrix L and its transpose in step (4.1) and step (4.4). Overall, there is a 48% increase in analysis time compared to the original configuration model, FE model 2, due to the additional steps (1), (4.1), and (4.4). However, although the analysis time with FRRA is increased for the modified configuration, it can significantly reduce the overall cost of the optimization proce123
dure since FFRA can solve the modal frequency response problem for a modified configuration much faster than the existing classical methods such as direct methods, and the partial eigensolution for the FE model does not need to be computed again.
step (1.1) (1.2) (2)
(3.1) (3.2) (4.1) (4.2) (4.3) (4.4)
Table 8.19: Timings of the algorithm FFRA for Model 5 operation time (mm:ss) portion factorization M = LLT form C = (1 + iγ)L−1 KL−T + iL−1 Ks L−T CSYMM : CΦC = ΦC Λ - reduction, T = QT CQ - EVP for T, T Z = ZΛ - backtransformation, Φ = QZ Dealing with viscous damping matrix B -Bb = U ΣU T -P = ΦTC L−1 (ΦTb U ) for i = 1, Nf req P = ΦTC L−1 F W1 = D−1 ∗ P W2 = S1 ∗ (I + R2T S1 )−1 S2T P Z = L−T ΦC ∗ (W1 − W2 ) end
total
00:49 17:08
1.4% 29.3%
15:46 01:16 18:34
27.1% 2.2% 31.8%
00:00 00:04
0.1%
02:16 00:01 00:05 01:54
3.9% 0.02% 0.1% 3.3%
58:22
For accuracy comparison, the magnitude of the acceleration for two different cross points is presented in Figure 8.16. It is shown that the accuracy of the FFRA is very good compared to NASTRAN SOL 111. The numerical results of this FE model prove that the FFRA algorithm can solve a modified configuration model with good accuracy as well as high performance.
124
Y direction
0
10
−1
Mobility Magnitude
10
−2
10
FRRA NASTRAN SOL 111
−3
10
0
50
100
150
200
250
300
350
400
450
500
Frequency (Hz)
(a) cross point 1 Y direction
0
10
FRRA NASTRAN SOL 111
−1
Mobility Magnitude
10
−2
10
−3
10
−4
10
0
50
100
150
200
250
300
350
400
450
500
Frequency (Hz)
(b) cross point 2
Figure 8.16: Frequency response of structure in Y direction excitation force
125
8.2.6
Model 6 : Enforced motion
Model 6 is a full vehicle model with acoustic fluid in the passenger compartment, in which large masses are included to enforce motion. A total of 36 large masses are attached to the four wheels. Figure 8.17 shows the FE model representation without wheels with large masses. It has 2.44 million FE degrees of freedom for the structure and 19,407 FE degrees of freedom for the fluid. The frequency range of interest is from 50 Hz to 250 Hz with a 0.5 Hz increment, so that the global cutoff frequency is set to 400 Hz. The numbers of global modes for the structure and fluid are 3,017 and 57, respectively. The damping included in this FE model is global structural damping γ, structural damping Ks , and viscous damping B. Table 8.20 summarizes the analysis information for FE model 6. Table 8.20: FE model 6 analysis information Degrees of freedom 2.44 M DOF (structure) 19407 DOF (fluid) Damping global structural damping γ structural damping Ks symmetric viscous damping B Acoustic fluid present 36 Large masses Excitation frequency range 50-250 Hz Frequency increment 0.5 Hz Number of excitation frequencies 401 Global cutoff frequency 400 Hz (structure) 400 Hz (fluid) Force frequency dependence frequency dependent Number of load cases 2 Number of global modes 3017 (structure) 57 (fluid)
Table 8.21: Elapsed time for the modal frequency response analysis for FE model 6 method FFRA 1 Nastran SOL 111 LAPACK (ZSYSV) time 6 min 28 sec 1 hr 52 min 1 hr 43 min 126
step (1)
(2.1) (2.2) (3.1) (3.2) (3.3) (3.4) (3.5) (3.6)
Table 8.22: Timings of algorithm FFRA for Model 6 operation time (mm:ss) CSYMM : Cf ΦC = ΦC Λ - reduction, T = QT CQ - EVP for T, T Z = ZΛ - backtransformation, Φ = QZ Dealing with viscous damping matrix B -Bb = U ΣU T -P = ΦTC (ΦTb U ) for i = 1, Nf req P = ΦTC [Fs,f , Asf,f , Ass,f r ] ΦTC Ass,f f ΦC [W1 , W2 , W3 ] = P [U1r , U2r ] [U1f , U2f ] (Af f + ω 2 ATsf Us2 )Zf = Ff − iωATsf Us1 Zs = Us1 − iωUs2 Zf end
total
portion
01:04 00:12 01:09
16.5% 3.1% 17.8%
00:00 00:18
4.6%
00:21 02:26 00:06 00:11 00:20 00:09
5.4% 37.6% 1.6% 2.8% 5.1% 2.4%
06:28
Table 8.21 shows the analysis time with FFRA, NASTRAN SOL 111, and ZSYSV in LAPACK, respectively. The FFRA algorithm is about 15 times faster than NASTRAN SOL 111 and ZSYSV. Table 8.22 represents the timing profile of the main steps in the algorithm FFRA for model 6. Since this model has several large masses, the coefficient matrix of the modal frequency response problem is partitioned into the low frequency mode part and the fixed-mass mode part, in which the size of the low frequency mode part is 36 and the size of the fixed-mass mode part is 2981. Modes below 1 Hz are considered to be the low frequency mode part. In step (1), the complex symmetric eigenvalue problem for the flexible mode part of C is solved using CSYMM, 37.4 % of the analysis time is spent. Then the rank of Bb matrix is checked. In this model, the rank of the viscous damping matrix is 134, so it takes 37.6 % of the total time
127
to invert the DPLR matrix in step (3.2) at each frequency. Once Us1 and Us2 are obtained in steps (3.3) and (3.4), the linear system for the fluid part is solved in step (3.5). Since the number of modes of the fluid part is much smaller than the number of modes of the structure part, it is very inexpensive to solve this linear system. Figure 8.18 (a) and (b) represent the magnitude of velocity in the X direction with an X direction excitation force, and in the Z direction with a Z direction excitation force for the same point. The FFRA results agree very well with those of NASTRAN SOL 111 in both directions. Figure 8.19 (a) and (b) show the acoustic pressures in the fluid part with 2 different loads. As shown, the results from both the FFRA algorithm and NASTRAN SOL 111 are indistinguishable.
128
(a) structure part
(b) fluid part
Figure 8.17: Finite element model: Model 6
129
2.5 FFRA NASTRAN SOL 111 2
1.5
Mobility Magnitude
1
0.5
0
−0.5
−1
−1.5
−2
−2.5 50
100
150
200
250
Frequency (Hz)
(a) x direction load and x direction response 2.5 FFRA NASTRAN SOL 111 2
1.5
Mobility Magnitude
1
0.5
0
−0.5
−1
−1.5
−2
−2.5 50
100
150
200
Frequency (Hz)
(b) z direction load and z direction response
Figure 8.18: Frequency response of structure at a point 130
250
−6
4
x 10
FFRA NASTRAN SOL 111 3
Acoustic pressure
2
1
0
−1
−2
−3
−4 50
100
150
200
250
Frequency (Hz)
(a) load case 1 −6
4
x 10
FFRA NASTRAN SOL 111 3
Acoustic pressure
2
1
0
−1
−2
−3
−4 50
100
150
200
Frequency (Hz)
(b) load case 2
Figure 8.19: Frequency response of fluid at a point 131
250
8.2.7
Summary of all 6 models
Based on the results from all test models, the FFRA algorithm gives outstanding performance while obtaining results that agree well with those of NASTRAN and LAPACK routines. Figure 8.20 summarizes the performance of all 6 test FE models. As shown, NASTRAN and LAPACK routines, ZSYSV or ZGESV, give almost the same performance since both of them use a factorization of the coefficient matrix with O(n3 ) operations. However, the FFRA algorithm is about 36, 60, 7, 60, 27, and 15 times faster than NASTRAN and LAPACK routines for each model, respectively. This performance can have a significant impact on modal frequency response analysis, especially for large structures, where response to high excitation frequencies is of interest. Also, from the comparison of frequency response plots, we observe that the FFRA produces accurate solutions compared to NASTRAN.
1800
1600
FFRA NASTRAN LAPACK
1400
Elapsed time (min.)
1200
1000
800
600
400
200
0
1
2
3
4
5
6
Test model
Figure 8.20: Comparison of the elapsed time for test FE models
132
8.3
Parallel Performance of the FFRA
We implemented the parallel FFRA algorithm using the OpenMP API for all models. In this section, results for model 2 and 6 are presented. The parallel performance is measured in terms of speedup and efficiency. Speedup is the ratio of elapsed time with multiple processors to single processor elapsed time. The efficiency is the ratio of speedup to the number of processors used.
8.3.1
Example 1: model 2
The parallel performance of the FFRA algorithm for FE model 2 is presented in Table 8.23. FE model 2 has no acoustic fluid and includes structural damping and viscous damping. In the shared memory machine with 4 processors, we have a speedup of 1.52 with 2 processors, which may be considered adequate. However, the speed up with 4 processors is 2.33, and this is not as good as the speedup with 2 processors, as is evident from the efficiency. Detailed information about the parallel performance with 2 processors and 4 processors is described in Table 8.24 and Table 8.25, respectively. In Table 8.24, the timings and speedups for the main parts of the analysis with 2 processors are presented. As shown, the speedup for the backtransformation in CSYMM, the frequency sweep, and the backtransformation in the FFRA are very good. The results in Table 8.24 show that the fine grain parallelism in the two backtransformation parts and the task parallelism in the frequency sweep are good strategies in the parallel FFRA implementation. However, the speedup of the reduction part in CSYMM is not satisfactory because there is a performance bottleneck in Level 2 BLAS routine parallelization as explained in Chapter 7. The eigenvalue problem for the T matrix is not parallelized since the time spent for this analysis is negligible compared to the rest of the analysis time. Table 8.25 shows the timings and speedups with 4 processors. As in the case 133
with 2 processors, all parallel performance is excellent except the reduction part in CSYMM. Table 8.23: Parallel performance of the FFRA for model 2 Number of processors 1 2 4
FFRA elapsed time (sec.) 2360.5 1548.1 1013.2
Speedup 1.00 1.52 2.33
Efficiency 1.00 0.76 0.58
Table 8.24: Timings (second) and speedups for parallelized steps in the FFRA algorithm for model 2 No. of processors 1 2 2170.1 1388.7 (958.1) (729.7) (77.1) (77.1) (1126.8) (582.8) 76.9 39.1 73.1 37.7
Step complex symmetric matrix EVP - reduction, T = QT CQ - EVP for T, T Z = ZΛ - backtransformation, Φ = QZ frequency sweep backtransformation, Z = ΦC W
8.3.2
Speedup 1.56 (1.31) (1.00) (1.93) 1.97 1.93
Example 2: model 6
The objective of selecting this FE model is to check the parallel performance for a FE model which includes many features including structural damping, viscous damping, acoustic fluid, and large masses. FE model 6 contains all of these features. The parallel performance of the FFRA algorithm for FE model 6 is shown in Table 8.26. The speedup is 1.53 with 2 processors, and 2.7 with 4 processors. The speedup for each analysis step is represented in Table 8.27 for 2 processors and in Table 8.28 for 4 processors. For each case, the parallel performance of FE model 6 is very similar to that of FE model 2 in the previous section. This shows that
134
Table 8.25: Timings (second) and speedups for parallelized steps in the FFRA algorithm for model 2 No. of processors 1 4 2170.1 986.3 (958.1) (609.4) (77.1) (77.1) (1126.8) (299.8) 76.9 20.4 73.1 19.8
Step complex symmetric matrix EVP - reduction, T = QT CQ - EVP for T, T Z = ZΛ - backtransformation, Φ = QY frequency sweep backtransformation, Z = ΦC W
Speedup 2.20 (1.57) (1.00) (3.76) 3.77 3.69
the parallel implementation strategy is independent of the analysis features of the FE model such as types of damping and the existence of acoustic fluid and large masses because all of these features are considered only in the frequency sweep part which is parallelized with task parallelism. Also, the parallel implementation of the backtransformations in CSYMM and elswhere in FFRA has nothing to do with the features of the FE model. Table 8.26: Parallel performance of the FFRA for model 6 Number of processors 1 2 4
FFRA elapsed time (sec.) 388.5 253.1 143.8
135
Speedup 1.00 1.53 2.70
Efficiency 1.00 0.76 0.67
Table 8.27: Timings (second) and speedups for parallelized steps in FFRA algorithm for model 6 Step complex symmetric matrix EVP - reduction, T = QT CQ - EVP for T, T Y = Y Λ - backtransformation, Φ = QY frequency sweep backtransformation, Z = ΦC W
No. of processors 1 2 146.2 sec. 98.0 sec. (64.3) (49.6) (12.1) (12.1) (69.8) (36.3) 212.8 109.2 18.5 9.6
Speedup 1.49 (1.30) (1.0) (1.92) 1.95 1.93
Table 8.28: Timings (second) and speedups for parallelized steps in the FFRA algorithm for model 6 No. of processors 1 4 146.2 66.1 (64.3) (36.1) (12.1) (12.1) (69.8) (17.9) 212.8 56.1 18.5 4.9
Step complex symmetric matrix EVP - reduction, T = QT CQ - EVP for T, T Y = Y Λ - backtransformation, Φ = QY frequency sweep backtransformation, Z = ΦC W
136
Speedup 2.21 (1.79) (1.0) (3.90) 3.79 3.8
Chapter 9
Conclusions and Future Work 9.1
Conclusions
In this dissertation, a new fast frequency response analysis (FFRA) algorithm is presented for the modal frequency response problem with structural and viscous damping. The goal of the FFRA algorithm is O(n2 ) operations at each frequency, where n is the number of modes used to represent the response, while the conventional approaches require O(n3 ) operations for factorization of the coefficient matrix at each excitation frequency. For performance improvement, the FFRA algorithm has been parallelized for shared memory multiprocessors using the OpenMP application program interface (API). In Chapter 3, when only structural damping exists, the modal frequency response problem is diagonalized with the eigensolution of the complex symmetric modal stiffness matrix C = (1 + iγ)Λ + iKs . If a viscous damping matrix B is also present, the modal viscous damping matrix ΦT BΦ is typically fully populated and cannot be diagonalized by the eigenvectors of C. However, for the automobile problems of interest, B is of low rank because of the small number of viscous damping elements in the structure, so the low rank of B is exploited to formulate a low rank
137
update to the inverse of the coefficient matrix in the modal frequency response problem. When the modal frequency response problem does not have diagonal modal mass and stiffness matrices, it is solved by first performing the Cholesky factorization of the full modal matrix M = LLT , in which L is a lower triangular matrix, then representing the solution Z as Z = L−T S and premultiplying the equation of the modal frequency response problem by L−T . In Chapter 4, a complex symmetric matrix eigensolver, CSYMM, is developed in order to solve the complex symmetric matrix eigenvalue problem efficiently since the FFRA algorithm requires an eigensolution for the complex symmetric modal stiffness matrix C. CSYMM uses a complex symmetric Householder method, so that it implements a complex orthogonal similarity transformation for tridiagonalizing the complex symmetric matrix C. CSYMM uses a full storage algorithm with Level 3 BLAS operations for high performance. Also, the block packed storage scheme is developed to reduce memory usage while obtaining performance almost as good as the full storage algorithm. It should be noted that CSYMM, which requires O(n3 ) operations, is used only once in the solution of a modal frequency response problem. Then O(n2 ) operations are necessary at each excitation frequency with the FFRA algorithm. In Chapter 5, the FFRA algorithm is applied to the coupled response of a system consisting of both acoustic fluid and structure. The acoustic fluid is represented in terms of its pressure in the acoustic fluid-structure interaction problem. Instead of factoring the coefficient matrix as in the conventional approach, we partition the coefficient matrix according to the structure displacement and fluid pressure variables, and obtain the structure and fluid responses separately. The FFRA algorithm is applied to obtain the structure part of the solution in terms of the fluid part since the response of the structure part is much more expensive than the response of the fluid part. Then the fluid response is obtained using a direct linear system
138
solver. The efficiency of solving the acoustic fluid and structure interaction problem is improved significantly, and good accuracy is obtained. Chapter 6 presents a new approach for enforced motion analysis with large masses. The large mass represents the base for which the motion is to be specified. However, large masses in the FE model cause ill-conditioning in the C matrix, so current existing eigensolvers give inaccurate eigensolutions. To avoid this issue, as with the acoustic fluid and structure interaction problem, the enforced motion problem is partitioned into a nearly zero frequency mode part associated with the large masses, and a part containing of the remaining modes, which have higher natural frequencies. In Chapter 7, parallel implementation of the FFRA algorithm is described using the OpenMP API for shared memory multiprocessor machines. The FFRA algorithm is divided into 3 main parts: (1) a complex symmetric matrix eigensolution, CΦC = ΦC ΛC , (2) frequency sweep for the entire excitation frequency range to find W , and (3) backtransforming the solution W to the global modal space, Z = ΦC W . In the parallelization of the complex symmetric matrix eigensolver, fine grained parallelism is used, in which several BLAS routines are parallelized. The frequency sweep part is divided into several sub-loops based on the available processors using a task parallelism which can control the allocation of tasks and loadbalance efficiently. Once the frequency sweep is finished, the backtransformation is parallelized with fine grained parallelism. Finally, in Chapter 8, numerical results are presented to demonstrate the various aspects of the FFRA algorithm. Before testing the FFRA algorithm, the accuracy and performance of the complex symmetric matrix eigensolver are evaluated for randomly generated matrices. For a matrix of order n = 1000, the norm of a vector containing difference between eigenvalues computed using CSYMM and using a LAPACK subroutine is found to be on the order of 10−12 . The performance
139
of CSYMM is compared with ZGEEV, an eigensolver in LAPACK, and CS, which is a complex symmetric matrix eigensolver developed by Bar-On and Ryaboy [55]. In the reduction, CSYMM and CS are much faster than ZGEEV. However, in the backtransformation, CSYMM and ZGEEV are much faster than CS. Six FE models from the automotive industry are used to evaluate all features of the FFRA algorithm, including capabilities for handling structural damping, symmetric B matrix and asymmetric B matrix, full modal mass and stiffness matrices, acoustic fluid, and large masses. To evaluate the accuracy of FFRA, the solutions from FFRA are compared with those from NASTRAN. For all six FE models, FFRA gives a very accurate solution while obtaining outstanding performance. For these models, the FFRA algorithm is from seven to sixty times faster than both NASTRAN and LAPACK routines in terms of elapsed time. For model 1 which has structural damping and 5,818 global modes, 77.8 % of the FFRA analysis time is spent in CSYMM. Once the complex symmetric matrix eigensolution is obtained, the frequency sweep is very inexpensive because the coefficient matrix of the modal frequency response problem becomes diagonal. Model 2 has 7,570 global modes and viscous damping. The time spent for determining the rank of the viscous damping matrix is negligible since the rank is only four. Then, in the frequency sweep, only 0.2 % of the analysis time is spent for inverting the diagonal plus low rank (DPLR) coefficient matrix, so forming and inverting the DPLR coefficient matrix is shown as a very effective approach compared to the classical approach which factors the modal coefficient matrix with 7,570 rows and columns at each frequency. Model 3 contains acoustic fluid. This model has 4,944 and 196 global modes for structure and fluid parts, respectively. 80.9 % of the total time is spent for solving the structural part using the FFRA algorithm, and only 9.9 % of the analysis time is used for the fluid part of the solution with a direct linear system solver. Model 4 has a gyroscopic matrix. The SVD is used to obtain the
140
rank of the asymmetric B matrix, and then the coefficient matrix is transformed to a DPLR matrix. Model 5 has fully populated modal mass and stiffness matrices which result from the modal correction approach. The analysis procedure is almost the same as for the original configuration model which has diagonal modal mass and stiffness matrices, except for the factorization of the full modal mass matrix and transformation of other matrices with L−1 and L−T . Therefore, FFRA can significantly reduce the overall cost of the optimization procedure since FFRA can solve the modal frequency response problem for a modified configuration much faster than the existing methods. Finally, model 6 uses enforced motion analysis with large masses, with a total of 36 large masses. The coefficient matrix is partitioned into the structure part with 3,017 global modes, and the fluid part with 57 global modes. The structure part is partitioned again into the 36 low frequency modes and the 2,981 fixed-mass modes. By applying FFRA to obtain the higher frequency modes part of the solution, an accurate solution is obtained with good performance. In the parallel implementation of FFRA, for model 2, we have a speedup of 1.52 with 2 processors, but the speedup with 4 processors is 2.33. The speedups for the backtransformation in CSYMM, the frequency sweep, and the backtransformation in the FFRA are satisfactory: 1.93, 1.97, and 1.93 with 2 processors, and 3.76, 3.77, and 3.69 with 4 processors. However, the speedup of the reduction part in CSYMM is not satisfactory: 1.31 with 2 processors and 1.57 with 4 processors, because there is a performance bottleneck in Level 2 BLAS routine parallelization. This poor performance degrades the overall parallel performance of the FFRA. Overall, the FFRA algorithm with O(n2 ) operations makes inexpensive high frequency analysis possible. Also, the capability of the FFRA algorithm allows an analyst to explore easily complex problems with structural damping, viscous damping, acoustic fluid, and enforced motion.
141
9.2
Future Work
Frequency dependent materials The capability to solve the modal frequency response problem with structural damping and/or viscous damping is built in the FFRA algorithm. However, the FFRA algorithm considers only frequency independent materials. In some cases, properties of a viscoelastic material may depend on the frequency of excitation. So this must be considered in the frequency response analysis. A great deal of work has been devoted to this topic with classical frequency response analysis methods. Therefore, it is worthwile to extend the capability of the FFRA to the analysis of structures with frequency dependent materials. Defective matrix capability It is possible for the modal stiffness matrix to be defective, so that it cannot be diagonalized by a similarity transformation. In this case it can be made nearly diagonal by transforming it to Jordan form, as mentioned in section 4.1. it may be worthwhile to develop the capability to handle defective matrices in the FFRA and CSYMM. Parallelization Much of the research in developing software for advanced architecture computers is motivated by the need to solve large and complex problems in parallel. In this dissertation, the FFRA algorithm has been parallelized for shared memory multiprocessors using the OpenMP API. It would also be desirable to implement the FFRA algorithm in parallel in distributed memory multiprocessors using MPI. This parallelization will improve the performance of the FFRA significantly, and will enable it to solve much larger problems than a single processor or a few processors
142
with shared memory can solve.
9.3
Final Remarks
The impact of the new FFRA algorithm is significant for solving the modal frequency response problem since it uses one order less operations, O(n2 ), than classical approaches with O(n3 ). Along with the AMLS method which can reduce greatly the cost of computing a partial eigensolution, the outstanding performance of the FRRA algorithm can reduce the overall frequency response analysis time significantly for large structural systems that previously requires days of computation. Such a problem can now be solved in several hours on an engineering workstation.
143
Appendix A
Eigenvector orthogonality in the Jordan form Given a defective complex symmetric matrix A = AT ∈ Cn×n , we choose P so that P −1 AP = J is as nearly diagonal as possible, in which J is the Jordan form matrix. As the simplest example, we assume the Jordan form J has (n − k) independent eigenvectors for distinct eigenvalues and k − 1 or less generalized eigenvectors for repeated eigenvalues λr . A distinct eigenvalue λd and the corresponding eigenvector xd satisfy the following equation Axd = λd xd .
(A.1)
For the repeated eigenvalue λr , which corresponds to one eigenvector xn−k+1 and up to k − 1 generalized (non-zero) eigenvectors, they satisfy the equations Axn−k+1 = λr xn−k+1
(A.2)
Axn−k+2 = λr xn−k+2 + xn−k+1
(A.3)
Axn = λr xn + xn−1
(A.4)
144
Next, we multiply equation (A.1) on the left by xn−k+1 and equation (A.2) on the right by xd , subtract the second result from the first and obtain (λd − λr )xTn−k+1 xd = 0.
(A.5)
Since these eigenvalues are distinct, we must have xTn−k+1 xd = 0,
λd 6= λr
(A.6)
Also, premultiplying equation (A.1) by xn−k+2 and premultiplying equation (A.3) by xd , and subtracting the second result from the first, we obtain xTn−k+2 xd = 0
(A.7)
using the result in equation (A.6). Similarly, we can obtain xTn−k+i xd = 0,
i = 3··· ,k
(A.8)
Equations (A.6)-(A.8) state that the eigenvectors of a distinct eigenvalue and the eigenvector and generalized eigenvectors of the repeated eigenvalue are orthogonal to each other.
145
Bibliography [1] Z. Osinski, Damping of Vibration, A.A.Balkema, Rotterdam, 1998. [2] D. I. Jones, Handbook of Viscoelastic Vibration Damping, John Wiley and Sons Ltd, 2001. [3] D. J. Ewins, Modal Testing : Theory and Practice, John Wiley and Sons Inc., 1984. [4] A. D. Nashif, D. I. G. Jones, and J. P. Henderson, Vibration Damping, John Wiley and Sons Inc., 1985. [5] R. J. Guyan, “Reduction of Stiffness and Mass matrices,” AIAA Journal, 3, 1965, pp. 380. [6] N. H. Kim, K. K. Choi, J. Dong, and C. Pierre, “Design Optimization of Structural Acoustic Problems using FEM-BEM,” 44th AIAA/ASME/ASCE/ASC Structure, Structural Dynamics, and Materials Conference, April 7-10, 2003, Norfolk, VA [7] C. Lanczos, “An Iterative Method for the Solution of the Eigenvalue Problem of Linear Differential and Integral Operators,” J. Res. Nat. Bur. Stand., 45, 1950, pp. 255-282.
146
[8] B. Irons, “Structural Eigenvalue Problems : Elimination of Unwanted Variables,” AIAA Journal, 3, 1965, pp. 961-962. [9] W. C. Hurty, “Vibration of Structural System by Component Mode Synthesis,” ACES Journal of the Engineering Mechanics Division, 86, No. EM4, 1960, pp. 51-69. [10] W. C. Hurty, “Dynamic Analysis of Structural System Using Component Modes,” AIAA Journal, 3, 1965, pp. 675-685. [11] R. R. Craig and M. C. Bampton, “Coupling of Substructures for Dynamic Analysis,” AIAA Journal, 6, 1968, pp. 1313-1319. [12] R. L. Goldman, “Vibration Analysis by Dynamic Partitioning,” AIAA Journal, 7, 1969, pp. 1152-1154. [13] W. A. Benfield and R. F. Hruda, “Vibration Analysis of Structures by Component Mode Substitution,” AIAA Journal, 9, 1971, pp. 1255-1261. [14] T. K. Hasselman, “Modal Coupling in Lightly Damped Structures,” AIAA Journal, 14, 1976, pp. 1627-1628. [15] A. Y. T. Leung, “A Simple Dynamic Substructure Method,” Earthquake Engineering Structural Dynamics, 16, 1988, pp. 827-837. [16] A. L. Hale and L. A. Bergman, “The Dynamics Synthesis of General Nonconservative Structures from Separately Identified Substructure Models,” J. of Sound and Vibration, 98, 1985, pp. 431-446. [17] L. Meirovitch and A. L. Hale, “On the Substructure Synthesis Method,” AIAA Journal, 19, 1981, pp. 940-947.
147
[18] B. N. Parlett and H. C. Chen, “Use of Indefinite Pencils for Computing Damped Natural Modes,” Linear Algebra and its Application, 140, 1990, pp. 53-88. [19] A. Y. T. Leung, Dynamic Stiffness and Substructures, Springer-Verlag, 1993. [20] L. Meirovitch, Principles and Techniques of Vibrations, Prentice Hall, 1997. [21] F. Tisseur and K. Meerbergen, “The Quadratic Eigenvalue Problem,” SIAM REVIEW, 43, No. 2, 2001, pp. 235-286. [22] G. Peters and J. H. Wilkinson, “Inverse Iteration, Ill-Conditioned Equations, and Newton’s Method,” SIAM REVIEW, 21, 1979, pp. 339-360. [23] G. Sleijpen, H. Vorst, and M. Gijzen “Quadratic Eigenproblems Are No Problem,” SIAM News, 29, 1996, pp. 8-9. [24] Y. Saad, “Numerical Methods for Large Eigenvalue Problems,” Manchester University Press, Manchester, UK, 1992. [25] M. F. Kaplan, “Implementation of Automated Multilevel Substructuring for Frequency Response Analysis of Structures,” Ph. D. Dissertation, The University of Texas at Austin, 2001. [26] J. K. Bennighof, M. F. Kaplan, M. Kim, C. W. Kim, and M. B. Muller, “Implementing Automated Multilevel Substructuring in Nastran Vibroacoustic Analysis,” SAE Transactions-Journal of Passenger Cars-Mechanical Systems, 6, 2002, pp. 1474-1480. [27] J. K. Bennighof, M. B. Muller, M. Kim, and C. W. Kim, “Automated Multilevel Substructuring: Issues in Large Scale Vibration Analysis,” 7th U.S. National Congress on Computational Mechanics, Albuquerque, NM, July 2003.
148
[28] J. K. Bennighof and R. B. Lehoucq, “An Automated Multilevel Substructuring Method for Eigenspace Computation In Linear Elastdynamics,” SIAM Journal of Scientific Computing, Vol. 25, No. 6, pp.2084-2106. [29] L. W. Dunne and D. W. Deuermeyer, “MSC/NASTRAN 68X - A Tool for NVH Response Optimization,” Proc. of the 3rd International Conference on High Performance Computing in the Automotive Industry Silicon Graphics, Inc., 1997, pp. 97-110. [30] H. Misra, E. Johnson, and L. Komzsik, “NVH Optimization on NEC Supercomputers Using MSC/NASTRAN,” MSC 1999 Aerospace Users’ Conference Proceedings, No. 0599, 1999. [31] Y. Saad, Numerical Methods for Large Eigenvalue Problems, Manchester Univ. Press, Manchester, 1992. [32] G. L. G. Sleijpen, A. G. L. Booten, D. R. Fokkema and H. A. Van der Vorst, “Jacobi-Davidson Type Methods for Generalized Eigenproblems and Polynomial Eigenproblems,” BIT, 36, 1996, pp. 595-633. [33] F. Tisseur and K. Meerbergen, “A Survey of the Quadratic Eigenvalue Problem,” Manchester Center for Computational Mathematics, Numerical Analysis Report No. 370, The University of Manchester, 2000. [34] J. K. Bennighof and M. F. Kaplan, “Frequency Sweep Analysis Using MultiLevel Substructuring, Global Modes, and Iteration,” Proceeding of 39th AIAA/ASME/ASCE/AHS Structures, Structural Dynamics and Materials Conference, Long Beach, April 1998, Paper No.98-2015. [35] R. J. Allemang, Vibrations: Analytic and Experimental Modal Analysis, UCSDRL-CN-20-263-662, University of Cincinnati, 1994. 2001.
149
[36] C. W. Bert, “Materal Damping: An Introductory Review of Mathematical Models, Measures, and Experimental Techniques,” Journal of Sound and Vibration, 29, No.2, 1973, pp. 129-153. [37] J. R. Bunch and L. Kaufman, “Some Stable Methods for Calculating Inertia and Solving Symmetric Linear Systems,” Math. Comp., 31, 1977, pp. 163-179. [38] J. R. Bunch and B. N. Parlett, “Direct Methods for Solving Symmetric Indefinite Systems of Linear Equations,” SIAM J. Numer. Anal., 8, 1971, pp. 639-655. [39] J. O. Aasen, “On the Reduction of a Symmetric Matrix to Tridiagonal Form,” BIT, 11, 1971, pp. 233-242. [40] C. Ashcraft, R. G. Grimes, and J.G. Lewis, “Accurate Symmetric Indefinite Linear Equation Solvers,” SIAM J. Matrix Anal. Appl., 20, 2, 1998, pp. 513561. [41] A. S. Householder, “Unitary Triangularization of a Nonsymmtric Matrix,” J. Assoc. Comput. Mach., 5, 1958, pp. 339-342. [42] J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart, “A Set of Level 3 Basic Linear Algebra Subprograms,” ACM. Trans. Math. Software, 14, 1988, pp. 1-17. [43] B. Smith, J. Boyle, J. Dongarra, B. Garbow, Y. Ikebe, V. Klema, and C. Moler, Matrix Eigensystem Routines, EISPACK Guide, Lecture Notes in Comput. Sci. 6, Springer-Verlag, Berlin, 1976. [44] J. Dongarra, J. Bunch, C. Moler, and G. W. Stewart. Linpack users guide, SIAM, Philadelphia, PA, 1979.
150
[45] Y. Saad and M. Schultz, “GMRES : A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems,” SIAM J. Sci. and Stat. Comp., 7, 1986, pp. 856-869. [46] C. C. Paige and M. A. Saunders, “Solution of Sparse Indefinite Systems of Linear Equations,” SIAM J. Numer. Anal., 12, 1975, pp. 617-629. [47] R. Freund and N. Nachtigal, “QMR: A quasi-minimal residual method for non-Hermitian linear system,” Numer. Math., 60, 1991, pp.315-339. [48] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra,V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd ed. Philadelphia, PA: SIAM, 1994. [49] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins, 1996. [50] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, UK, 1985. [51] J. Cullum and R. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalues Computations, In Theory, Vol. 1, Boston, Cambridge, MA, , 1985. [52] J. Dongarra, J. R. Bunch, C. B.Moler, and G. W.Stewart, “A Set of Level 3 Basic Linear Algebra Subprograms,” ACM. Trans. Math. Software, 14, 1988, pp. 1-17. [53] R. R. Craig, Structural Dynamics : An Introduction to Computer Methods, John Wiley., New York, 1981. [54] R. B. Lehoucq, “The Computation of Elementary Unitary Matrices,” ACM Trans. on Mathematical Software , 22, 4, 1996, pp. 393-400.
151
[55] I. Bar-On and V. Ryaboy, “Fast Diagonalization of Large and Dense Complex Symmetric Matrices with Application to Quantum Reaction Dynamics,” SIAM J. Sci. Comput., 18, 1997, pp. 1412-1435. [56] I. Bar-On and V. Ryaboy, “High Performance Solution of the Complex Symmetric Eigenproblem,” Numerical Algorithms, 18, 1998, pp. 195-208. [57] R. Schreiber and C. Van Loan, “A Storage Efficient WY Representation for Products of Householder Transformations,” SIAM J. Sci. Statist. Comput., 10, 1989, pp. 53-57. [58] E. Anderson, Z. Bai, C. Bischof, L.S. Blackford, J. Demmel. J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney , and D. Sorenson LAPACK Users’ Guide (Third Edition), SIAM, 1999. [59] C. Bischof and C. Van Loan, “The WY Representation for Products of Householder Matrices,” SIAM J. Sci. Statist. Comput., 8, No. 1, 1987, pp. 2-13. [60] F. T. Luk and S. Qiao, “Using Complex-Orthogonal Transformation to Diagonalize a Complex Symmetric Matrix”, Advanced Signal Processing Algorithms, Architectures, and Implementations VII Proc.SPIE 3162, 1997, pp. 418-425. [61] F. T. Luk and S. Qiao, “A Fast Eigenvalue Algorithm for Hankel Matrices,” Linear Algebra and Its Applications, 316, 2000, pp. 171-182. [62] O. C. Zienkiewicz, and P. Bettes, “Fluid-Structure Dynamic Interaction and Wave Forces. An Introduction to Numerical Treatment,” Int. J. Num. Meth. Engng, 13, 1978, pp. 1-16. [63] M. A. Hamdi and Y. Ousset, “A Displacement Method for the Analysis of Vibration of Coupled fluid-structure systems,” Int. J. for Numerical Methods in Engineering, 13, 1978, pp. 139-150. 152
[64] X. Wang and K. J. Bathe, “Displacement/Pressure Based Mixed Finite Element Formulations for Acoustic Fluid-Structure Interaction Problems,” Int. J. for Numerical Methods in Engineering, 40, 1997, pp. 2001-2017. [65] G. C. Everstine, “A Symmetric Potential Formulation for Fluid-Structure Interaction,” J. of Sound and Vibration, 79(1), 1981, pp. 157-160. [66] G. C. Everstine, “Finite Element Formulations of Structural Acoustics Problems,” Computers and Structures, 65(3), 1997, pp. 307-321. [67] J. K. Bennighof, “Vibroacoustic Frequency Sweep Analysis using Automated Multi-Level Substructuring,” Proc. of AIAA 40-th SDM Conference, St.Louis, Missouri, April, 1999. [68] K. Blakely, MSC/NASTRAN User’s Guide : BASIC DYNAMIC ANALYSIS, The MacNeal-Schwendler Corporation, 1993. [69] MSC/NASTRAN User’s Manual, The MacNeal-Schwendler Corporation, 1994. [70] Blackford, L.S., Choi, J., Cleary, A., D’Azevedo, E., Demmel, J., Dhillon, I., Dongarra, J., Hammarling, S., Henry, G., Petitit, A., Stanley, K., Walker, D. and Whaley, R.C., ScaLAPACK Users’ Guide, SIAM, Philadelphia PA, 1997. [71] S. Balay, W.D. Gropp, L. Curfman McInnes, and B.F. Smith, ”Efficient Management of Parallelism in Object Oriented Numerical Software Libraries,” Modern Software Tools in Scientific Computing, Birkhauser Press, 1997. [72] R. A. van de Geijin, Using PLAPACK : Parallel Linear Algebra Package, The MIT Press, 1997. [73] J.W. Demmel, Applied Numerical Linear Algebra, SIAM, 1997.
153
[74] V. Fraysse, L. Giraud, and S. Gratton, A set of Flexible GMRES routines for Real and Complex Arithmetics, CERFACS Technical Report TA/RA/98/20, France, 1998. [75] R. Freymann, Advanced Numerical and Experimental Methods in the Field of Vehicle Structural-Acoustics, Hieronymus, Germany, 2000. [76] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM, Philadelphia, 2000. [77] C. Addison and Y. Ren, OpenMP Issues Arising in the Development of Parallel BLAS and LAPACK libraries, Third European Workshop on OpenMP (EWOMP’01), Spain, 2001. [78] J. Choi, J. J. Dongarra, D. Walker, The Design of a Parallel Dense Linear Algebra Software Library: Reduction to Hessenberg, Tridiagonal, and Bidiagonal Form, LAPACK Working Note 92, University of Tennessee, Techanical Report CS-95-275. [79] K. Goto, R. van de Geijin, On Reducing TLB Misses in Matrix Multiplication, FLAME Working Note 9, The University of Texas at Austin, 2002. [80] B. Hendrickson, E. Jessup, and C. Smith, “Toward an Efficient Parallel Eigensolver for Dense Symmetric Matrices,” SIAM J. Sci. Comput., 20, No. 3, 1999, pp. 1132-1154.
154
Vita Chang-wan Kim was born in Kyungbuk, Korea on March 20, 1969. After completing his work at Pohang Steel Iron High School, Pohang, Korea, in 1987, He entered Hanyang University. He received the degree of Bachelor of Science in Mechanical engineering. In March, 1991, He entered Pohang University of Science and Technology. After completing the degree of Master of Science in Mechanical engineering in 1991, he joined the military research institute, Agency for Defense Development(ADD), Taejon, Korea, in 1993. In Fall 1997, he entered the graduate pgrogram, Computational Applied Mathematics (CAM), at The University of Texas. After completing his work the degree of Master of Science in CAM, he continued his Ph.D in Aerospace engineering.
Permanent Address: 2501 LakeAustin Blvd. Apt.C-206, Austin, TX, 78703, USA
This dissertation was typeset with LATEX 2ε 1 by the author. 1 A LT
EX 2ε is an extension of LATEX. LATEX is a collection of macros for TEX. TEX is a trademark of the American Mathematical Society. The macros used in formatting this dissertation were written by Dinesh Das, Department of Computer Sciences, The University of Texas at Austin, and extended by Bert Kay and James A. Bednar.
155