Techniques of scientific computing for the energy and environment [illustrated edition] 1600219217, 9781600219214

Research and development in scientific computing and computational science has considerably increased the power of numer

261 49 2MB

English Pages 112 Year 2007

Report DMCA / Copyright

DOWNLOAD PDF FILE

Table of contents :
TECHNIQUES OF SCIENTIFIC COMPUTING FOR ENERGY AND THE ENVIRONMENT......Page 3
NOTICE TO THE READER......Page 6
CONTENTS......Page 7
PREFACE......Page 9
1. Introduction......Page 11
2.1. Nonlinear ShallowWater Equation......Page 12
2.2. Ecological Model......Page 13
3.1. Lyapunov’s Stability Theory......Page 14
3.2. Linearization......Page 15
4.1. Arnoldi’s Method......Page 16
5.2. Algorithm......Page 17
5.3. Sensitivity Matrix......Page 18
6.1. Modal Analysis......Page 19
7.1. Verification of Ecological Model......Page 21
7.2. Analysis in Lake Kasumigaura......Page 23
References......Page 29
1. Introduction......Page 31
2. Finite Element Method Package Hydro-Geo......Page 32
3. Numerical Procedure......Page 33
4. Parallel Hydro-Geo For Shared Memory Supercomputers......Page 35
5.1. Distributed FEM Package......Page 36
6.1. Embankment Rising......Page 38
6.2. Besko Dam......Page 39
Conclusions......Page 42
References......Page 43
1. Introduction......Page 47
2. Programmable Filters......Page 49
3.1. Creating and Manipulating VTK Arrays......Page 50
3.2. Data Passing Library......Page 51
3.3. Passing Data from VTK Object to Fortran Routine......Page 52
4.1. Simple Pipeline Example......Page 53
4.2. Pipeline with Filter......Page 54
4.3. User Function......Page 56
5.1. Simple Pipe Example......Page 57
5.2. Pipeline with Filter......Page 58
6. Using Dynamically Linked Function as Filter Method in Tcl......Page 59
6.1. Adding New Built-in Command to Tcl Interpreter......Page 60
6.2. Tcl Interface to VTK Library......Page 62
6.3. Accessing C++ Object from Tcl Script......Page 63
6.4. vtkProgrammableFilter with C++ User Function......Page 64
A Example of Accessing Object Data with Routines from dpl Library......Page 68
B dpl Library......Page 70
References......Page 77
1. Introduction......Page 79
2.1. Basic Equation......Page 80
2.2. Finite Element Interpolation......Page 81
2.3. Fictitious Domain Formulation......Page 82
2.4. Movement of Propeller......Page 84
3.2. Finite Element Mesh......Page 85
3.3. Numerical Result......Page 87
4. Numerical Study 2......Page 90
4.2. Numerical Result......Page 91
References......Page 92
1. Introduction......Page 95
2. Algorithm......Page 96
3. Parallel Implementation......Page 98
4. Test Case Configuration......Page 99
5. Results......Page 100
References......Page 105
INDEX......Page 109
Recommend Papers

Techniques of scientific computing for the energy and environment [illustrated edition]
 1600219217, 9781600219214

  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

TECHNIQUES OF SCIENTIFIC COMPUTING FOR ENERGY AND THE ENVIRONMENT

TECHNIQUES OF SCIENTIFIC COMPUTING FOR ENERGY AND THE ENVIRONMENT

FRÉDÉRIC MAGOULÈS AND

RIAD BENELMIR EDITORS

Nova Science Publishers, Inc. New York

Copyright © 2007 by Nova Science Publishers, Inc.

All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher. For permission to use material from this book please contact us: Telephone 631-231-7269; Fax 631-231-8175 Web Site: http://www.novapublishers.com NOTICE TO THE READER The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works. Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication. This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS. LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA Available upon request

ISBN 13: 978-1-60692-823-3

Published by Nova Science Publishers, Inc.

New York

CONTENTS Preface

vii Frédéric Magoulès and Riad Benelmir

Stability Analysis of Abnormal Multiplication of Plankton Considering Flow Velocity T. Yamauchi and M. Kawahara

1

Achieving High-Performance Computing in Geomechanics by Development of Parallel Finite Element Package F. Okulicka - Dłuzewska

21

Large-Scale Data Visualization Using Multi-Language Programming Applied to Environmental Problems F. Magoulès and R. Putanowicz

37

An Analysis of Flow Around a Propeller Using Fictitious Domain Finite Element Method K. Harada and M. Kawahara

69

Numerical Simulation of Supersonic Combustion Using Parallel Computing E. von Lavante and M. Kallenberg

85

Index

99

P REFACE Fr´ed´eric Magoul`es and Riad Benelmir The research and development in Scientific Computing and Computational Science have considerably increased the power of numerical simulation. Engineers and researchers are now able to solve large and complex problems which were impossible to solve in the past. This Special Issue presents some techniques, methods and algorithms for solving engineering problems arising in energy and environment. The first article by T. Yamauchi and M. Kawahara presents a numerical method for the abnormal multiplication of plankton that happens by the water pollution. In this study, the basic equation represents food chain of ecological model, which consists of phytoplankton, zooplankton, and nutrient. The stability problem, the eigenvalue problem and the parameter identification technique are reintroduced. Experimental data used in the numerical simulation arise from the Lake Kasumigaura, located in the Ibaraki Prefecture in Japan. A discretisation with finite element is considered for the numerical experiments. The paper of F. Okulicka - Dłuzewska analyzes the geotechnical problem, the embankment rising, the dam building and the settlement of the underground. A finite element software is considered for this analysis. Due to the large amount of data used in such model, these environmental problems appear to be difficult to solve with classical finite element methods. The use of high performance computers is thus mandatory. For this reason, the finite element software is parallelized on a network of PC’s, and details of the proposed approach are presented by the authors. The paper written by F. Magoul`es and R. Putanowicz describes a technique well suited for the visualization and analysis of large data sets arising in environmental problems. This technique is based on a multi-programming approach using the visualisation toolkit VTK library, and components written in Tcl and C++. VTK is an open-source software system for visualisation, computer graphics and imaging. Though it is possible to write a whole VTK application in a scripting language like Tcl, it is more suitable, for efficiency reasons, to implement some functionality in a compiled language like C/C++ for instance. This is specially the case when working with large data sets arising from environment analysis for instance, as presented here. Pieces of code and detailed examples are provided to the reader in order to alow him to program his own software. The paper by K. Harada and M. Kawahara describes the analysis of flow around a propeller. This analysis is related to the minimisation of the turbulence around the propeller which leads to a lost of energy for the engine. The proposed analysis is based on the fictitious domain method and a finite element discretisation is performed for the numerical experiments. Finally, the paper of E. von Lavante and M. Kallenberg discusses the numerical simulation of supersonic combustion using parallel computing. This analysis is related to a better understanding of the transfer of energy involved in such combustion. For this purpose the

viii

Fr´ed´eric Magoul`es and Riad Benelmir

unsteady, three-dimensional, supersonic flow in a channel with transverse hydrogen injection is simulated. The time accurate computation was accelerated by an implicit method and implemented on a massively parallel computer. The parallelization is accomplished using domain decomposition on a distributed memory systems. The relative efficiency and relative speedup of the parallel algorithm are analyzed for various sizes of the problem and number of processor units. Naturally, the present issue cannot provide a complete record of the many approaches, applications, features, and numerical methods related to energy and environment. However, it does give an indication of the progress that is being made in addressing these issues and the possibilities that are available for future research in this area. Fr´ed´eric Magoul`es Universit´e Henri Poincar´e Institut Elie Cartan de Nancy, BP 239 54506 Vandoeuvre-les-Nancy Cedex, France

Riad Benelmir Universit´e Henri Poincar´e Ecole Sup. Sc. Tech. Ing. de Nancy 2 rue Jean Lamour 54519 Vandoeuvre-les-Nancy Cedex, France

In: Techniques of Scientific Computing for Energy ... ISBN 1-60021-921-7 c 2007 Nova Science Publishers, Inc. Editors: F. Magoul`es and R. Benelmir, pp. 1–20

S TABILITY A NALYSIS OF A BNORMAL M ULTIPLICATION OF P LANKTON C ONSIDERING F LOW V ELOCITY Tomohiro Yamauchi and Mutsuto Kawahara∗ Department of Civil Engineering, Chuo University, Kasuga 1-13-27,Bunkyou-ku,Tokyo 112-8551,Japan

Abstract This paper presents a numerical method for the abnormal multiplication of plankton that happens by the water pollution. In this study, the basic equation represents food chain of ecological model, which consists of phytoplankton, zooplankotn, and nutrient. Flow velocity and time-lag increase are added to the ecological model as a new approach. Velocity of the flow is obtained by using the nonlinear shallow water equation. The stabilized bubble function finite element method is applied to the spatial discretization in an analysis of the nonlinear shallow water flow. In this paper, an abnormal multiplication is thought as one of the unstable problem.Therefore, if there is no problem of water quality, the system is stable. The stability of the system is investigated introducing the eigenvalues of the basic equation. The stability of the system can be judged by the eigenvalue based on the Lyapunov’s stability theory. In this paper, the Arnoldi-QR method is used to obtain eigenvalues and eigenvectors of the system. The Lake Kasumigaura, that is located in the Ibaraki Prefecture in Japan, is selected and actual data in 1991 is used in order to guess the phenomenon of plankton at the lake. Mode analysis is employed to make the initial distribution at the lake Kasumigaura. Finally, change of distribution of plankton patchness for various time stage and equilibrium solution is obtained.

Keywords: plankton, shallow water equation, ecological model, stability, parameter identification

1.

Introduction

Recently, environmental problems are serious problems. Water pollution causes a large amount of nutrient, because the industrial waste water, that streamed into the river, lake and sea, etc. includes a lot of nutrient, which means nitrogen and phosphorous. The phytoplankton preys the nutrient. As a result, a large amount of nutrient causes an abnormal multiplication of the plankton. Fish and shellfish die by suffocation because a lot of plankton consumes a large quantity of oxygen. Therefore, the abnormal multiplication of the plankton has damaged seriously fishing industry. In fact, income of villages dramatically ∗ E-mail

address: [email protected]

2

Tomohiro Yamauchi and Mutsuto Kawahara

decrease for heavy damage more than billion yen. This abnormal multiplication of the plankton is called as red tide or blue-green alagae 1),2),3),4). Prediction of abnormal multiplication of plankton using numerical analysis leads to prevention of abnormal multiplication. In the recent study, abnormal multiplication is thought as one of the unstable problem 5),6). The purpose of this study is to prevent an abnormal multiplication of plankton by investigating stability of the system considering flow velocity. To obtain the initial spatial distribution of plankton at the Lake Kasumigaura, the mode analysis is applied. This spatial distribution is used as the initial data.

2.

Basic Equation

2.1.

Nonlinear Shallow Water Equation

The two-dimensional nonlinear shallow water equation is used to caluculate the water flow, which is written as follows; ∂u ∂u ∂u ξ+η ∂2 u ∂2 u ∂2 u ∂2 v + (u + v ) + g − ν[( 2 + 2 ) + ( 2 + )] + f u = 0, ∂t ∂x ∂y ∂x ∂x ∂x ∂y ∂x∂v

(1)

∂v ∂v ξ+η ∂2 v ∂2 u ∂2 v ∂2 v ∂v + (u + v ) + g − ν[( 2 + ) + ( 2 + 2 )] + f v = 0, ∂t ∂x ∂y ∂y ∂x ∂x∂y ∂y ∂y

(2)

∂ξ ∂ξ ∂u ∂v ∂ξ + u + v + ξ( + ) = 0. ∂t ∂x ∂y ∂x ∂y

(3)

The boundary condition can be expressed as; on Γd ,

(4)

v = vˆ on Γd , ξ = ξˆ on Γd ,

(5)

u = uˆ

(6)

un = unx = uˆn

on

Γn ,

(7)

vn = vny = vˆn

on

Γn .

(8)

The stabilized bubble element is used for the discretization by the finite element method7). The bubble function is capable of eliminating the barycenter point by using the static condensation. The discretized form derived from the bubble function element is equivalent to that from the SUPG 9) . Therefore, the stabilized parameter which is derived from the bubble function element is expressed as follows for the momentum equation of the shallow water flow: hφe, 1i2Ωe A−1 e , τe B u i = 1 1 2 2 2 ˜ ∆t ||φe||Ωe + 2 (ν + ν)2||φe, j||Ωe − f ||Ωe ||Ωe and for the continuity equation :

(9)

Stability Analysis of Abnormal Multiplication...

τe B u i =

hφe, 1i2Ωe A−1 e , 1 2 + 1 (ν 2 ˜ ||φ || )||φ e Ωe e, j||Ωe ∆t 2

3

(10)

where ν˜ is the stabilized control parameter. From the criteria for the stabilized parameter corresponding to the SUPG, an optimal parameter can be given as follows for the momentum equation of shallow water flow; α −1 1 ) τe Bui = ( τ−1 es + 2 ∆t τ− es 1 = [(

2|Ui| 2 4ν 1 ) + ( 2 )2 ] 2 , he he

(11)

(12)

and for the continuity equation; 1 α τeBui = ( τ−1 + )−1 , 2 es ∆t τ− es 1 = (

2|Ui| ), he

(13)

(14)

where α=

Ae ||φe||2Ωe hφe, 1i2Ωe

he = |Ui| =

p

p

(15)

2Ae,

(16)

u2 + v2 + gξ,

(17)

and Ωe is the element domain and hu, viΩe =

2.2.

,

R

Ωe uvdΩ.

Ecological Model

In this study, a simple mathematical model is employed, which is suggested in 1). There are many parameters in these equations. The ecological model is shown in Figure 1. ∂2 P ∂2 P ∂P ∂P ∂P = D1x 2 + D1y 2 − u − v + f (P, Z, N), ∂t ∂x ∂y ∂x ∂y

(18)

∂Z ∂2 Z ∂2 Z ∂Z ∂Z = D2x 2 + D2y 2 − u − v + g(P, Z, N), ∂t ∂x ∂y ∂x ∂y

(19)

4

Tomohiro Yamauchi and Mutsuto Kawahara

Figure 1. Ecological System

∂N ∂2 N ∂2 N ∂N ∂N = D3x 2 + D3y 2 − u −v + h(P, Z, N). ∂t ∂x ∂y ∂x ∂y

(20)

where, P is Phytoplankton, Z is Zooplankton, and N is Nutrient in which P,Z and N show the concentration of each component. In these equation, D1x,D1y ,D2x,D2y ,D3x and D3y are the non-dimensional diffusion coefficient of P,Z and N, respectively. The terms f(P,Z,N),g(P,Z,N) and h(P,Z,N) are the biological reaction terms, which are expressed as follows; NP ˆ − ψP, − βZ[1 − exp{−λ(P − P)}] (21) f (P, Z, N) = α+N ˆ − γβZ 2 [1 − exp{−λ(P − P)}], ˆ g(P, Z, N) = βZ[1 − exp{−λ(P − P)}] NP ˆ + ψP + γβZ 2 [1 − exp{−λ(P − P)}]. α+N where positive term underlined means the increase of the time-lag. h(P, Z, N) = −

3. 3.1.

(22) (23)

Stability Problem Lyapunov’s Stability Theory

The stability analysis based on the Lyapunov’s stability theory is employed. Considering this theory, equilibrium points and perturbation can be thought to research the stability of the system.Equilibrium means the state points, and the perturbation is the microscopic oscillation. In case that the system is completely stable, the oscillation settles down according as time goes by. But if the system is unstable, according as time passes, oscillation becomes unlimited. In this study, to decide the stability of the system, eigenvalue is employed. The judging criteria of the stability is described in Table1.

Stability Analysis of Abnormal Multiplication...

5

Table 1. Judging Criteria Eigenvalue σ0

3.2.

system Completely stable Neutral Unstable

Linearization

In order to obtain eigenvalues of the system, the basic equations are the linearized Lyapunov’s stability theory. At the first equilibrium point is pursuited. The following way is employed in order to linearize the equations; 1:Considering the solution around the equilibrium points as follows; Pγ +∆P,Zγ +∆Z,Nγ +∆N, where, Pγ ,Zγ and Nγ mean equilibrium points in each component. The equilibrium points are determined if change of value by the incremental method is less than 1.0 × 10−5 , and ∆P,∆Z, ∆ N are the perturbations. Thus, substituting Pγ + ∆P,Zγ + ∆Z, and Nγ +∆N to eqs.(18) - eq.(20); ∂(Pγ + ∆P) = D1 ∇2 (Pγ + ∆P) + f (Pγ + ∆P, Zγ + ∆Z, Nγ + ∆N), ∂t

(24)

∂(Zγ + ∆Z) = D2 ∇2 (Zγ + ∆Z) + g(Pγ + ∆P, Zγ + ∆Z, Nγ + ∆N), ∂t

(25)

∂(Nγ + ∆N) = D3 ∇2 (Nγ + ∆N) + h(Pγ + ∆P, Zγ + ∆Z, Nγ + ∆N). (26) ∂t 2:Employing Taylor-expansion and omitting terms more than one order, the linearized equation of eqs.(24) - (26) is obtained as follows; ∆φ˙ = F∆φ, where

 ∆P ∆φ =  ∆Z  , ∆N 



 F = in which

(27)

∂ fγ ∂P

D1 ∇2 + ∂gγ ∂P ∂hγ ∂P

∂ fγ ∂P

∂ fγ ∂Z 2

D2 ∇ + ∂hγ ∂Z

∂gγ ∂Z

∂ fγ ∂N ∂gγ ∂N ∂h D3 ∇2 + ∂Nγ



 .

is the function at the substituted equilibrium solution with differentiation by P.

6

Tomohiro Yamauchi and Mutsuto Kawahara

3.3.

Discretization by FEM

The following perturbations are substituted ; ˆ σt ∆P = Pe

(28)

ˆ σt ∆Z = Ze ˆ σt ∆N = Ne

(29) (30)

Using eqs.(28) - (30) and discretizing eq.(27), the following equation is obtained ; ˆ σ[M]φˆ = [H]φ, where



M= −D1 Sαβ + FPαβ  GPαβ H= HPαβ 

Mαβ =

4.

Z

V

(31) 

Mαβ Mαβ Mαβ FZαβ −D2 Sαβ + GZαβ HZαβ

Φα Φβ dV, Sαβ =

Z

V

,  FNα beta , GNαβ −D3 Sαβ + HNαβ

Φα,i Φβ,i dV.

(32)

Eigenvalue Problem

4.1.

Arnoldi’s Method

To obtain the eigenvalue of the system, the Arnoldi’s method is applied in this research. This method is enable to decrease the memory of dimension and computation time. Algorithm for the standard eigenvalues and eigenvectors problem(Cu = σ u) is as follows; 1:Start;Choose an initial vector v1 of unity norm, and a number of step m. 2:Iterate;For j = 1,2,...,m do: j

vˆ j+1 = Cv j − ∑ hi j vi ,

(33)

hi j = (Cv j , vi ), i = 1, ...., j,

(34)

h j+1, j = ||vˆ j+1 ||2 ,

(35)

v j+1 = vˆ j+1 /h j+1, j .

(36)

i=1

where

This algorithm produces an orthonormal basis Vm = [v1 , v2 , ..., vm] of the Krylov subspace Km = span{v1 ,Cv1 , ...,Cm−1v1 }. In this basis the restriction of C to Km is represented by the upper Hessenberg matrix Hm whose entries are the hi j produced by algorithm,i.e., Hm = hi j .

(37)

The eigenvalues of C are approximated by those of Hm which are : Hm = VmT CVm .

(38)

Stability Analysis of Abnormal Multiplication...

4.2.

7

Application for Generalized Eigenvalue Problem

If one wishes to find out the leading eigenvalue with maximum real part,it is common to use the shift and invert strategy. If σ0 is an approximation to an eigenvalue of interest, then the shifted and inverted problem is; (C − σ0 I)−1 u = λu,

(39)

where, λ = 1/(σ − σ0 ).Thus,eigenvalues of C close to σ0 correspond to eigenvalues λ of eq.(39) with large absolute value, and one expects the Arnoldi’s method to converge to such eigenvalues. In order to apply the Arnoldi’s method to eq.(39) for the generalized eigenvalue problem eq.(31), eq(39) may be described as; (H − σ0 M)−1 Mu = λu,

(40)

and to apply to the Arnoldi’s method the LU decomposition of H - σ0 M once is performed, and then each time(H-σ0 M)−1 Mv is needed, we solve (H − σ0 M)w = Mv by forward and backward analysis. This is much more economical than forming the matrix of eq.(40) explicitly since it is usually full and also its dimension is much larger than M.

5. 5.1.

Parameter Identification Performance Function

In case that the parameter in the equation is changed, the stability of the system changes. Obtaining the parameter value in case that the system is stable, the parameter identification technique is applied. This technique is equal to the estimation with minimization of the performance function J, which is defined as the sum of square residual between calculated and observed values. This function is described as follows: 1 J= 2

Z

v

(σ(k) − σ)t (σ(k) − σ)dv.

(41)

where, σ is objective eigenvalue,σ(k) is the eigenvalues of the system . In a word, the optimal parameter value can be decided to minimize the performance function J applying the parameter identification technique.

5.2.

Algorithm

In this reseach, the Conjugate Gradient Method is employed to minimize the performance function J. The algorithm of the parameter identification technique is as follows: 1.Assume initial parameter value k(0) , decide convergence criterion εJ 2.Calculate state value σ(k)(0) 3.Calculate performance function J (0) (0) 4.Calculate sensitivity matrix [ ∂σ(k) ∂k ] ∂σ(k) 5.Calculate initial gradient d (0) = [ ∂k ](0) 6.Calculate step size α so as to minimize J(σ(i) + αd (i) )

8

Tomohiro Yamauchi and Mutsuto Kawahara

7.Renew parameter k(i+1) = k(i) + αd (i) 8.Calculate state value σ(k)(i+1) 9.Calculate performance function J (i+1) (i+1) 10.Calculate sensitivity matrix [ ∂σ(k) ∂k ] 11.Calculate β =

(i+1) ∂J (i+1) [ ∂J [ ∂k ] ∂k ] ∂J (i) ∂J (i) [ ∂k ] [ ∂k ]

(i+1) 12.Calculate gradient of performance function J;d (i+1) = −[ ∂J + βd (i) ∂k ] (i+1) (i) 13.If |J | − |J | < ε, then stop 14.Set i = i + 1 and go to 6

5.3.

Sensitivity Matrix

In order to solve the sensitivity matrix, the left eigenvalue problem has to be used in this study. Left eigenvalue problem is as follows: σMφ = Hφ,

(42)

σM T ψ = H T ψ.

(43)

where, M T (or H T ) is the transposed matrix of M(or H).The eigenvector of eqs.(42) and (43) are not the same, but the eigenvalues are the same. In this study, the maximum eigenvalue of real part is investigated and eigenvectors of real and imaginary part are employed to solve sensitivity matrix. The real part of sensitivity matrix is able to be obtained as follows : Re

AC + BD ∂σ =− 2 . ∂k A + B2

(44)

where, A = ψTreMφre − ψTim Mφim ,

(45)

B = ψTreMφim + ψTim Mφre,

(46)

∂H ∂H φre − ψTim φim , ∂k ∂k ∂H partialH D = ψTim φre + ψTre φim . ∂k ∂k φre:Real part of eigenvectors φim :Imaginary part of eigenvectors φre:Real part of eigenvectors φim :Imaginary part of eigenvectors C = ψTre

(47) (48)

Calculating this matrix, parameter identification technique can be applied and the optimal parameter value which makes the system stable can be obtained.

Stability Analysis of Abnormal Multiplication...

6.

9

Initial Distribution of Plankton

6.1.

Modal Analysis

To represent the whole distribution of the plankton as the initial distribution,the concept of the modal analysis is utilized as shown in Figure 2. If the eigenvalue of the linear Laplacian ∇ is denoted as λ2 in area V , spectrum of linear Laplacian ∇ is determined by the Helmholtz equation as follows;

Figure 2. Modal analysis ∇2 ψ + λ2 ψ = 0,

(49)

∂2 ∂2 + , ∂x2 ∂y2

(50)

where ∇2 ≡

In eq.(49),ψ presents basic mode of phytoplankton, zooplankton and nutrient. The boundary condition is as follows; ∇ψ~n = 0. (51)

6.2.

Eigenvalue Problem by FEM

To obtain the eigenvalues λ2 and eigenvectors ψ, the finite element method is employed. The Galerkin method is used for the spatial discretization of eq.(49). Sαβ Ψβ − λ2 Mαβ Ψβ = 0, where Sαβ =

Z

Mαβ =

V

Z

(52)

ψα,i Ψβ,i dV,

(53)

ψα ψβ dV.

(54)

V

10

Tomohiro Yamauchi and Mutsuto Kawahara

Eq(52) is dealed with as the general eigenvalue problem. The Householder-QR method is employed to find the eigenvalues λ2 . However this method can’t be applied to the general eigenvalue problem. Therefore this problem is transformed the into standard eigenvalue problem. Matrix Mαβ is symmetric, therefore the matrix can be divided into two matrices by the Choleski Method; (55) Mαβ = LTαβ Lαβ , Substituting eq.(55) into eq.(52); Sαβ ψβ − λ2 LTαβ Lαβ ψβ = 0,

(56)

where eigenvector ψ is replaced by using the following equation; zβ = Lαβ ψβ ,

(57)

Sαβ L−1 z = λ2 LTαβ Lαβ ψβ , αβ β

(58)

−1 λ2 zβ = L−T αβ Sαβ Lαβ zβ ,

(59)

−1 Aαβ = L−T αβ Sαβ Lαβ ,

(60)

then it is obtained that

where Substituting eq.(60) into eq.(59), the following equation can be derived ; λ2 zβ = Aαβ zβ .

(61)

To obtain the eigenvalues λ2 and the eigenvectors zβ by eq.(61), the Householder-QR Method and the Inverse iteration method is employed. And ψ equals to L−1 z, thus, eigenvector ψ is found by the Backward substitution method 8) .

6.3.

Superposition of Spectra

6.3.1. Performance Function Two-dimensional distribution is calculated by the superposition of the eigenmode and observation data. Then obtained spectra which represent the state for spatial density is superposed. It is called as the Modal analysis method. The method for superposition is calculated as follows. The performance function is defined as; 1 J= 2

Z

(uˆ − u) ˜ 2 dV,

(62)

V

where

n

uˆ = ∑ ci ui ,

(63)

i=1

Determine c1 -cn so as to minimize J,where; J=

1 2

Z

V

(uˆ − u) ˜ 2 dV,

(64)

Stability Analysis of Abnormal Multiplication...

J=

1 2

mx

∑ (uˆj − u˜j )2,

(65)

∑ (uˆj2 − 2uˆj u˜j + u˜j 2 ),

(66)

J=

j=1

mx

1 2

J=

1 2

j=1

n

n

mx

∑ [( ∑ ui j ci )2 − 2( ∑ ui j ci )u˜j + u˜j 2 ], mx

n

j=1

i=1

∑ [(2 ∑ ui j ci )ul j − 2ul j u˜j],

mx

=

(67)

i=1

j=1 i=1

1 ∂J = ∂Cl 2

11

(68)

n

∑ [( ∑ ui j ci)ul j − ul j u˜j ],

(69)

j=1 i=1

=

mx

n

j=1

i=1

∑ [ul j ( ∑ ui j ci − u˜j )].

(70)

(l = 1,2,3,,n)

6.3.2. Minimization Method Spatial distribution is made by the superposition of each eigenmode. Therefore, mode with component influence by the unknown constants ci have generated the spatial distribution. The conjugate gradient method is employed for the above equations to obtain the ci .

7. 7.1.

Numerical Example Verification of Ecological Model

7.1.1. Case 1 Figure 3 shows the finite element mesh. The total number of element and node are 400 and 303, respectively.

Figure 3. Used Mesh Assuming that the sufficient nutrient change of phytoplankton, zooplankton, nutrient is computed and represented in Figures 3-6. Vertical axis is normalized fraction of nutrient Nt. Horizontal axis is the scaled distance of plankton patchness. Figure 4 shows the initial condition. In figure 5, phytoplankton increases as the nutrient decreases and all patchnesses are moved by velocity. In figure 6, zooplankton increases as phytoplankton decreases, because zooplankton absorbs phytoplankton. The nutrient is also preyed by the phytoplankton. All patchnesses don’t move outsiede through the boundary. In figures 5 and 6, all patchness is moved by velocity and diffusion. The small amount of the nutrient remained mainly due to the extinction of zooplankton. In figure 7, equilibrium solution is obtained. Equilibrium solution means balanced value on the ecological model.

12

Tomohiro Yamauchi and Mutsuto Kawahara Phytoplankton Zooplankton Nutrient

1

Normalized Fraction of Nt

0.8

0.6

0.4

0.2

0

-30

-20

-10

0 Scaled Distance

10

20

30

Figure 4. t = 0.00

Phytoplankton Zooplankton Nutrient

1

Normalized Fraction of Nt

0.8

0.6

0.4

0.2

0

-30

-20

-10

0 Scaled Distance

10

20

30

Figure 5. t = 3.00

Phytoplankton Zooplankton Nutrient

1

Normalized Fraction of Nt

0.8

0.6

0.4

0.2

0

-30

-20

-10

0 Scaled Distance

Figure 6. t = 9.00

10

20

30

Stability Analysis of Abnormal Multiplication...

13

Phytoplankton Zooplankton Nutrient

1

Normalized Fraction of Nt

0.8

0.6

0.4

0.2

0

-30

-20

-10

0 Scaled Distance

10

20

30

Figure 7. t = 50.00

7.1.2. Case 2

Figure 8. Used Mesh

The mesh in Figure 8 is used in case 2. The total number of element is 3600 and that of node is 1861. Figure 9 is initial condition of the nutrient. Phytoplankton and zooplankton are constant. From figure 10, that nutrient diffuses and moves by flow velocity is confirmed.

7.2.

Analysis in Lake Kasumigaura

In this research, the Lake Kasumigaura is chosen as the analysis field. This lake is with area of 220 square kilometers which is the second in size within Japan. In this lake,there have been water quality problems and its damage has been very serious. One of the famous problems is outbreak of ’Microcystis aeruginisa’.It is a kind of phytoplankton like red tide, and by the eutrophication, the water quality problem like ’Microcystis’ was taken place. Location of the Lake Kasumigaura is shown in Figure 11. The mesh is used in Figure 12. The total number of element is 1409 and that of node is 804.

14

Tomohiro Yamauchi and Mutsuto Kawahara

N 0.95 0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05

Figure 9. t = 0.00 Table 2. Biological Parameter The parameter values of the system are described on Table 2. Parameter difinition α Micahelis constant 0.1 β Zooplankton maximum grazing threshold 1.2 γ Zooplankton egestion coefficient 2.31 λ Ivlev constant 1.0 ⋆ p Phytoplankton loss coefficient 0.083 ψ Zooplankton grazing threshold 0.15

7.2.1. Initial Distribution Figures 13 and 14 are results of modal analysis. The convergence of the performance function J is shown in Figure 13. Figure 14 gives control vector ci to each spectrum, it is confirmed which mode has significant influence on the components in the Lake Kasumigaura. Figure 15 represents appearance of initial distribution of phytoplankton in time. The plankton patchness is changed for predatism and velocity of flow on the Lake Kasumigaura. In this study, the values in figure 16 are regarded as equilibrium solution, which means balanced value in ecological system.

7.2.2. Stability Analysis In this study, it is thought to combine the outbreak of the planktons with stability problem employing eigenvalues. For example, the maximum eigenvalue is negative, this system is stable. However, if it is positive, it is considered that the system is unstable. Figure 17 shows the stability of the system. Changing parameter α, the system shifts from stable to unstable. The parameter value of α which

Stability Analysis of Abnormal Multiplication...

N 0.012 0.011 0.01 0.009 0.008 0.007 0.006 0.005 0.004 0.003 0.002 0.001

Figure 10. t = 10.00

Figure 11. Place of the lake Kasumigaura

15

16

Tomohiro Yamauchi and Mutsuto Kawahara

Figure 12. Used mesh

1.8 J 1.6

Performance Function J

1.4

1.2

1

0.8

0.6

0.4

0.2

0 0

10

20

30

40

50

60

70

80

90

Iteration

Figure 13. Performance function of Phytoplankton

0.9 control value 0.8

0.7

control value

0.6

0.5

0.4

0.3

0.2

0.1

0 0

5

10

15 mode number

20

25

Figure 14. Control Quantity of Phytoplankton

30

Stability Analysis of Abnormal Multiplication...

P 3.22 2.98615 2.75231 2.51846 2.28462 2.05077 1.81692 1.58308 1.34923 1.11538 0.881538 0.647692 0.413846 0.18

Figure 15. t = 0.00(phytoplankton)

P 3.22 2.98615 2.75231 2.51846 2.28462 2.05077 1.81692 1.58308 1.34923 1.11538 0.881538 0.647692 0.413846 0.18

Figure 16. t = 6.00(phytoplankton)

17

18

Tomohiro Yamauchi and Mutsuto Kawahara 0.02

’falpha.r’

0

Maximum eigenvalue

-0.02

-0.04

-0.06

-0.08

-0.1 0

2

4

6

8

10 12 Parameter alpha

14

16

18

20

Figure 17. Stability of The System 0.014 ’dej.r’

0.012

performance function J

0.01

0.008

0.006

0.004

0.002

0 0

50

100

150 iteration

200

250

300

Figure 18. Performance Function -0.32 ’emax.r’ -0.34

maximum eigenvalue

-0.36

-0.38

-0.4

-0.42

-0.44

-0.46

-0.48 0

50

100

150 iteration

200

Figure 19. Maximum eigenvalue

250

300

Stability Analysis of Abnormal Multiplication...

19

1.1 ’balpha.r’ 1.05

parameter alpha

1

0.95

0.9

0.85

0.8

0.75

0.7 0

50

100

150

200

250

300

iteration

Figure 20. Parameter α

changes the stability is guessed about 9.45. From this result, in case that parameter α is small, abnormal multiplication occurs. That α is small means that phytoplankton absorbs large amount of nutrient. Figure 18 shows the convergence of the performance function. Figure 19 shows that of real part of maximum eigenvalue, and Figure 20 shows the convergence of α. In this case, convergent value of the performance function J is 1.0 × 10−5. From the result, in case that maximum eigenvalue is -0.32, parameter α is 1.1. Therefore, 1.1 is the value of parameter α which makes the system is stable.

8.

Conclusion

In this study, two-dimensional spatial distribution is made by the superposition of spectra of mode number from 1 to 30. From Figure 5, mode No.1 has a significant influence on spatial distribution. Relation of food chain is represented and applied to the phenomena in the Lake Kasumigaura. The nonlinear shallow water equation is employed to represent influence of flow velocity. The computed result is used as the initial distribution of the stability analysis. The main purpose of this study is to judge the stability of the system from the real part of the maximum eigenvalue. Figure 17 is the result of the forward analysis, Figures 18, 19 and 20 are results of backward analysis about parameter α. When parameter α is small, phytoplankton absorbs large amount of nutrient. When the parameter α is 1.1, the system is stable.

References [1] J.S.Wroblewski and J.J.O’Brien(1976), A Spatial Model of Phytoplankton Patchiness. Marine Biology 35, 161-172. [2] J.S.Wroblewski(1977), A model of Phytoplankton plume formation during variable Oregon upwelling, Journal of Marine Research, 358-394. [3] N.F.Britton (1999), Reaction-diffsion equations and their application to biology , Academic Press; pp109-137. [4] Peter J.S. Franks and Changsheng Chen (1996), Plankton prouction in tidal fronts: A model of Georges Bank in summer, Journal of Marine Research; 54: pp631-651.

20

Tomohiro Yamauchi and Mutsuto Kawahara

[5] G.Ono and M.Kawahara (2004), Stability Analysis of Multiplication of Plankton Using Parameter Identification Technique. Int. J. Num. Meth. Fluids Vol. 44 Num. part1 pg71. [6] Y.Ding and M.Kawahara (1998), Bifurcation Analysis of Brown Tide in Tidal Flow Using Finite Element Method, Oceanographic Literature Review pp502-502. [7] J.Matsumoto, T.Umestu and M.kawahara(1998), Shallow Water and Sediment Transport Analysis by Implicit FEM, Journal of Applied Mechanics, vol.3, 263-274. [8] N. Nayar and J. M. Ortega (1993), Computing of Selected Eigenvalues of Generalized Eigenvalue Problems, Jour. Comp. Phys.; 108: pp8-14. [9] Hughes, T.J.R., Framea, L. P. and Balestra, M. (1986), A New Finite Element Formulation for Computational Fluid Dynamics, V. Comp. Meth. Appli. Meth. Eng. 59, 85-99.

In: Techniques of Scientific Computing for Energy ... ISBN 1-60021-921-7 c 2007 Nova Science Publishers, Inc. Editors: F. Magoul`es and R. Benelmir, pp. 21–35

ACHIEVING H IGH -P ERFORMANCE C OMPUTING IN G EOMECHANICS BY D EVELOPMENT OF PARALLEL F INITE E LEMENT PACKAGE Felicja Okulicka - Dłuzewska∗ Faculty of Mathematics and Information Science Warsaw University of Technology, Pl. Politechniki 1, 00-661 Warsaw, POLAND

Abstract The parallelization of the finite element method (FEM) algorithm is considered. The parallel versions of the FEM package are developed on the base of the sequential one. The elasto-plastic behavior of the geotechnical constructions can be modelled, calculated and analyzed by the package. The Cray Fortran compiler directives are applied for the parallelization of source code for shared memory machines and MPI library for distributed environment. As the engineering example of the geotechnical problem the embankment rising, the dam building and the settlement of the underground are remodelled, calculated and analyzed.

1.

Introduction

Finite element method (FEM) is the most general and powerful tool for solving engineering problems. The FEM algorithm, well known and widely used in practice, is very appropriate for the parallelization. Due to the parallelization of the code the high performance calculation can be done, very large structures can be modelled and considerable speed-up is reached. For smaller engineering problems it would be nice to have result as fast as a draw a mouse to a new pixel. Considering the finite element modelling, one of the essential problems, which we face in software development, is the parallelization of the existing sequential codes, which very often are developed by years. The question arises if is it worthy done and how great effort is required. Advantages and difficulties of the parallelization of the finite element method package are presented and discussed in the paper. The FEM package Hydro-Geo oriented at hydro and geotechnical problems is presented in Section 2. The program is developed at Warsaw University of Technology and next extended to allow the parallel calculations. The sequential version of the program is the starting point for developing the parallel versions step by step. The package is composed of the three main programs: the preprocessor for mesh generation and preparing the data, the processor for main mechanical calculation and the graphical post processor. In the paper two parallel versions of processor are compared: the first working on the shared memory machines ∗ E-mail

address: [email protected]

22

Felicja Okulicka - Dłuzewska

and the second in the distributed environment. In section 3 the numerical procedure implemented in the package is recalled after [6, 7]. Section 4 contains the algorithm of processor for shared memory machines. In section 5 the message passing method implemented in distributed Hydro-Geo is described. The computational results are included in section 6. The version for shared memory machines is implemented and tested due to the access to the supercomputer Sun 10000E, owned by COI PW (Computing Center of Warsaw University of Technology). The distributed version of the package is implemented and tested due the support of the European Community - Access to Research Infrastructure action of the improving Human potential programme (contract No HPRi-1999-CT00026).

2.

Finite Element Method Package Hydro-Geo

The finite element package HYDRO-GEO [8] is oriented at hydro and geotechnical problems. The structure of the package is drown on Figure 1.

management shell

processor

preprocessor

structure

data

soil

postprocessor

water

graphical presentation

coupled Auto CAD interface

mesh generation

results selection

Figure 1. Relation between various programs in the finite element package HYDRO-GEO. Finite element method package algorithm can be divided into three separate parts: 1. preprocessor for mesh generation, mesh optimalization and data input 2. processor for stiffness matrix calculation, solving the global set of equations and analysis of strains and stresses 3. postprocessor for graphical and numerical presentation of the computed results The above tasks are independent and they are realized by separate modules of the package. In Hydro-geo these programs can run under the management shell or can be executed on different machines as well, where the data between programs can be sent by the net. The

Achieving High-Performance Computing in Geomechanics...

23

data transfer between modules are due to the text files. In the package the format of the data transfer files between modules are fixed. It allows to exchange the parts of the package. It is very useful when we take under consideration the data preparation and the representation of the results. In fact in the package a few preprocessors exists. They are written in Fortran and in C. The main part - processor was developed by years by group of persons. It is written in Fortran. The most time-consuming part of the modelling is the processor in which the numerical finite element algorithms are implemented what is really worth to be done in parallel. In the processor the coefficient matrix of the set of equations is calculated for each stage of the construction building and each time increment. Parallel calculation speeds up the process and putting data into the distributed memories increases the number of elements that can be proceeded. The set of equation is solved several times. The parallel solver can also speed calculation process radically. The structure of the package gives the opportunity to execute the calculation on parallel machine without changing the format of data input and output. The parallelization of the preprocessor presented in the paper is done in such a way, that the procedures responsible for modelling are not change and the program can be developed by others without problems.

3.

Numerical Procedure

Description of different mechanical phenomena such as flow, mechanical behavior, thermal effects, leads to coupled systems of differential equations. To solve a certain initial boundary value problems, the finite element methods can be used. In such situation where a few phenomena are taken into account the final form of global equation set takes the block form. In general, the times when important phenomena has been considered separately belongs to the past. Now we want to model very complicated and complex effects. For example if we consider the car engine we have to solve mechanical and thermal differential equations as a coupled system. The coupled systems appear in modern mechanics very often. Ground waters flow and mechanical behavior deformation and stressed, transport of the pollutants, thermal flow etc. Coupled problems are much more complicated, comparing each effect considered separately but solving of them gives very realistic behavior of complex problems [12]. To solve system of linear equations obtained during the modelling of coupled systems the standard solvers, which are available from the Internet, can be used. Here we consider the direct solvers, the Scalapack [24] library for distributed blocked matrix equations is very appropriate for the problem. The iterative solvers needs preconditioning because even for relatively not too big problems the convergence of algorithm is difficult to reach. In Hydro-geo the virtual work principle, continuity equation with boundary conditions is the starting points for numerical formulation. The finite element method is applied to solve initial boundary value problems. Several procedures stemming from elasto-plastic modelling can be coupled with the time stepping algorithm during the consolidation process. The elasto-plastic soil behavior is modelled by means of visco-plastic theory (Perzyna, 1966). The finite element formulation for the elasto-plastic consolidation combines overlapping numerical processes. The elasto pseudo-viscoplastic algorithm for numerical modelling of elasto-plastic behavior is used after Zienkiewicz and Cormeau (1974). The sta-

24

Felicja Okulicka - Dłuzewska

bility of the time marching scheme was proved by Cormeau (1975). The pseudo-viscous algorithm developed in finite element computer code Hydro-Geo is successfully applied to solve a number of boundary value problems, Dluzewski (1993). The visco-plastic procedure was extended to cover the geometrically non-linear problems by Kanchi et al (1978) and also developed for large strains in consolidation, Dluzewski (1997) [6]. The pseudoviscous procedure is adopted herein for modelling elasto-plastic behavior in consolidation. In the procedure two times appear, the first is the real time of consolidation and the second time is only a parameter of the pseudo-relaxation process. The global set of equations for the consolidation process is derived as follows   i     i  i ∆u 0 0 KT L u ∆F • = (1) • + i T i i i ∆p L −(S + Θ∆tH ) 0 −∆tH p ∆q where KT is the tangent stiffness array, considering large strains effects, L is the coupling array, S is the array responsible for the compressibility of the fluid, H is the flow array, u are the nodal displacements, p are the nodal excesses of the pore pressure, ∆F i is the load nodal vector defined below ∆F i = ∆FL + ∆RiI + ∆RiII

(2)

∆F i is the load increment, ∆RiI is the vector of nodal forces due to pseudo-visco iteration, ∆RiII is the unbalanced nodal vector due to geometrical nonlinearity. ∆RiI takes the following form Z νp t+∆t ∆εi )t+∆t (3) ∆RiI = (i−1) BT(i−1)D(t+∆t i−1 dv t+∆t

V

and is defined in the current configuration of the body. The subscripts indicate the configuration of the body, and superscripts indicate time when the value is defined (notation after Bathe (1982)). ∆RiI stands for the nodal vector which results from the relaxation of the stresses. For each time step the iterative procedure is engaged to solve the material non-linear problem. The i-th indicates steps of iterations. Both local and global criterions for terminating the iterative process are used. The iterations are continued until the calculated stresses are acceptable close to the yield surface, F ≤ Tolerance at all checked points, where F is the value of the yield function. At the same time the global criterion for this procedure is defined at the final configuration of the body. The global criterion takes its roots from the conjugated variables in the virtual work principle, where the Cauchy stress tensor is coupled with the linear part of the Almansi strain tensor. For two phase medium, the unbalanced nodal vector ∆RiII is calculated every iterative pseudo-time step. ∆Rk−1 =



Z

Z

t+∆t V

N T f t+∆t dV +

Z

N T t t+∆t dS t+∆t S

(k−1)

(k−1) t+∆t V

t+∆t j(k−1) t+∆t (k−1) BT(k−1)D(t+∆t σ + mt+∆t p )t+∆t dV

(4)

The square norm on the unbalanced nodal forces is used as the global criterion of equilibrium. The iterative process is continued until both criterions are fulfilled.

Achieving High-Performance Computing in Geomechanics...

4.

25

Parallel Hydro-Geo For Shared Memory Supercomputers

In programs for shared memory machines all variables are visible by all threads created during the program execution. The most popular and simple way of the parallelization is the division of the loops into threads, which run on the separate processors. We can make the compiler do it automatically by adding the directive ”autoparallel” during the compilation process. Any changes need not be make in the source code. Explicit parallelization is reached by putting directives into source code before the parts that can be execute concurrently. The number of processors should be known during compilation because it determines the number of threads. The parallel versions of the FEM package are built on the base of the sequential one. The structure of the package is not changed. In the first step all auxiliary files for keeping data during the calculation process are cancelled. All data are put into the memory. The number of read/write on/from the disk operations is reduced. The order in which single elements are calculated became not important. That allows us to parallelize the main loops which calculate the local values and the local matrices for single elements. The Hydro-Geo processor algorithm for shared memory machines with compiler parallelizing directives can be written as follows [14]: Start Data reading, initial computations For each stage of the construction and each increment of load do Read the data specific for the stage Parallelize the following loop For each element do Calculate the local stiffness matrix, coupling matrix, flow matrix Calculate the initial stresses End do Calculate the global set of fully coupled system First part of the solver (forward substitution) - parallel calculation For each Gauss point do Second part of the solver (backward substitution) -parallel calculation Parallelize the following loop For each element do Calculation of strains and stresses End do Print the result in the disk file End do End do Stop The loops calculating values for each element (the local stiffness matrix, coupling matrix, flow matrix, stresses, strains) are divided into threads that were executed con-

26

Felicja Okulicka - Dłuzewska

currently. It is a kind of domain decomposition, done by splitting the set of elements into subsets. The calculated variables are visible for the rest of program commands and procedures. Such approach needs big amount of memory for big problems. The speed-up is reached not only due to parallelism but due to the reduction of disk operations as well.

5. 5.1.

Distributed Calculation Distributed FEM Package

The Message Passing Interface (MPI) standard is used as the tool for parallelization [13, 25]. For distributed memory machines the number of processes is created during the execution of the parallel program. The processes are distinguished by own unique names called ranks, which are integer numbers. They have their own private variables. Each process has to have copies of all variables needed for calculations. All results calculated by one process and needed by another should be send or broadcast. In our approach only one process reads data. It will be called ”master”. Others obtain the data by broadcast from master, make calculations using received data and send the results needed for solving global set of equation back to master. The distributed version bases on the program working in memory - no auxiliary files for keeping data during the calculation process are created. The parallelization is similar to the one described in the previous section [14, 17]. The calculation is done concurrently in such a way that the loops, which calculate the local values for each element, are divided between processes. Each process calculates local values connected with single element (the local stiffness matrix, coupling matrix, flow matrix, initial stresses) for own private subset of elements. The subsets are determined at the beginning and remain fixed during whole calculation. All processes know which subset of elements belongs to each one. Each process keeps data connected with elements for his private subset only. When the local matrixes are calculated, they are sent to master, which calculates the global matrix for fully coupled system and solves the set of linear equations. The result is broadcast to all processes to allow them to continue the calculation. For master process which is the process with rank 0 the algorithm can be written as follows: Start Data reading, initial computations Sharing the computation determine the subsets of elements calculating by separate processes Broadcast read data to other processes For each stage of the construction and each increment of load do For my subset of the set of elements do Calculate the local stiffness matrix, coupling matrix , flow matrix Calculate the initial stresses End do Synchronization point 1 Gather the local stiffness matrices from all processes Calculate the global set of fully coupled system For each Gauss point do

Achieving High-Performance Computing in Geomechanics...

27

Solve the set of equations Synchronization point 2 Broadcast the solution to all processes For my subset of the set of element do Calculation of strains and stresses End do End do Synchronization point 3 Receiving the results from all processes Printing of the calculated values End do Stop

Processes with ranks greater than 0 receive the read data, initialize the data connected with their subsets of elements, calculate the elements of local matrices connected with their private subsets of elements and send them to the master. The algorithm for slave processes is as follows: Start Receiving of data from process 0 For each stage of the construction and each increment of load do For my subset of the set of element do Calculate the local stiffness matrix, coupling matrix, flow matrix Calculate the initial stresses End do Synchronization point 1 Send the local matrices to process number 0 For each Gauss point do Synchronization point 2 Receive the solution from the process number 0 For my subset of the set of element do Calculation of strains and stresses End do End do Synchronization point 3 Sending the results to process 0 End do Stop

The synchronization points are added to secure the proper communication and exchange of data.

28

Felicja Okulicka - Dłuzewska Table 1. Time table of the embankment rising Stage No 0 I II III

5.2.

Description Initial stresses Rising of stage I Consolidation Rising of stage II Consolidation Rising of stage III Consolidation

Time increment days 0 12 189 12 239 12 590

Total time days 0 12 201 213 452 464 1054

Parallel Numerical Algorithm for Solving the Linear Equations for Consolidation Problem

The block formulation of the coupled problems makes natural the application of the block methods for solving the sets of linear equations. The large matrixes can be split into blocks and put into separate memories of the net of computers. The parallel calculations are reached due to the matrix operations on separate blocks. The standard numerical algorithms should be rebuilt for the block version or the standard libraries can be used [3, 4, 5]. For big problems matrix of the system of linear equations is put on distributed memories and iterative methods is used to obtain solution of the set of equations. For the consolidation problem the coefficient matrix is ill-conditioned [22]. The preconditioners are needed to improve the convergence although for not big problem [1, 2, 9, 10, 11, 12, 19, 21, 22, 23].

6. 6.1.

Engineering Problems Embankment Rising

To study the influence of the large deformation description rising of the embankment on peat is modelled [18]. The Coulomb-Mohr yield criterion and the non-associated flow rule is used with dilatancy angle equals zero. The permeable boundary below the peat layer is assumed. The layer is 10m thick. The embankment slope is 1:2. The embankment is built in four stages. At the beginning the initial stresses are introduced into subsoil. The first stage of the embankment is risen up to the height of 2.0 m, the second up to 4.0 m, the third up to 6.0 m and the fourth up to 7.0 m. The timetable of the embankment rising is given in the Table 1. The mesh contains 1879 nodes. The example of the calculating results - the pressure are presented on Figure 4. The mesh is shown in Fig.2. Six nodded izoparametric elements are used. The non-consistent formulation of the consolidation is applied (herein, pore pressures are calculated in all nodes). The three different times are compared: real wall time, processor

Achieving High-Performance Computing in Geomechanics...

29

Figure 2. Embankment rising - finite element mesh. calculation time - user time from the point of view of system and system time i.e. time for synchronization , management of disk and memory access. The calculations were made for elastic model on Sun6500. The reached speedup is presented on Figures 5. The speedup depends on the size of the problem - on shared memory machines we did not calculate big models. That is why the results are not spectacular.

6.2.

Besko Dam

The Besko dam has been risen on the Carpathian flysch [15, 16]. The height of the dam is about 40m. The inclined schist layers are located in the subsoil. The parallel schist layers with various material parameters create the specific foundation typical for Polish dams in the south. The dam is built from concrete. The clay-concrete screen of 0.8 m thickness is performed. The height of the screen is about 25 m.

Figure 3. Speed-up of user time reached on the Sun 6500.

The numerical modelling is done in three stages. In the first stage the initial stresses are introduced into the subsoil. In the second one, the heavy concrete dam is built; special teeth are formed between the subsoil and the dam body for a better interaction between the dam and the subsoil, and the rising of the dam is modelled by adding elements. In the third stage the loading caused by filling the reservoir is applied.

Figure 4. Embankment rising - excess of the pore pressure.

Figure 5. Besko dam - calculated stresses.

The performance of the different parallel versions of the package is compared for user time. The real time strongly depends on the number of users concurrently working on the supercomputer, while the system time is connected mostly with the synchronization of threads and the number of input/output commands, which is the same in all parallel versions. The average speed-up of the sequential in-memory version, compared with the sequential version of the HYDRO-GEO processor working with auxiliary files, is about 10. With automatic parallelization, the speed-up for user time compared with the sequential in-memory version is about 2, and it does not change much when the number of processors is changed.

6.3. The Settlements of the Warsaw Underground

The settlements of the Warsaw underground structures of the metro station are analyzed. The calculations are performed in six stages:

1 - introduction of the initial stresses in the subsoil, installation of the diaphragm wall, construction of the ground ceiling;
2 - soil excavation;
3 - construction of the foundation plate;
4 - adding columns;
5 - loading from the station trains and traffic;
6 - extra loading from a 10-floor building.

Figure 6. Besko dam - isolines of displacements.

In each stage a new arrangement of the global equilibrium system is made. Some additional boundary conditions (supporting the internal walls) change the global numbering of the equations. The problem is nonlinear due to the elasto-plastic soil models based on the Coulomb-Mohr yield criterion and the non-associated flow rule [2]. Chosen results are shown in Figs. 7, 8 and 9.

Figure 7. Warsaw underground - the finite element mesh; 8-noded isoparametric elements are used.

Figure 8. Warsaw underground - the displacements in the form of contour lines.

Three different times are compared: the real wall time, the processor calculation time (the user time from the point of view of the system) and the system time, i.e. the time for synchronization and for the management of disk and memory access. To compare the speed-up, the calculations are done for the elastic model. The maximum speed-up for user time is bounded by Amdahl's law: if a fraction s of the work must be done sequentially, the speed-up on p processors cannot exceed 1/(s + (1 - s)/p), a bound which tends to 1/s no matter how many processors are used. The parallel versions are compared with the sequential version working in memory, because they are based on it. The average speed-up is the following:

1. for the version of the processor obtained by compilation with the autoparallel option, the speed-up is about 2;

2. for the version obtained by compilation with the parallel option (i.e. explicit and auto), the speed-up is about 3.5;

3. for the version obtained by compilation with the parallel option (i.e. explicit and auto) and with the band-matrix parallel solver, the speed-up is about 5.

In our case about half of the calculation of the processor has to be done sequentially. The speed-up reached by the parallel package, compared with the sequential version, is about 2 for different problems. A comparison of the calculation times of both versions is possible only for small problems. Bigger problems cannot be calculated sequentially in memory only - auxiliary files must be used to keep the data between the called procedures, because the memory capacity of a single machine is too small. For big problems, calculation without using the disc for writing and reading intermediate results is possible only when the data is divided between the memories of different machines. Big problems can be calculated quickly, keeping all data in memory, only in parallel.
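In symbols, Amdahl's bound (a standard result, stated here for convenience) with the roughly one-half sequential fraction measured in our case reads:

$$S(p) = \frac{1}{s + (1 - s)/p}, \qquad s = \frac{1}{2} \;\Rightarrow\; S(p) = \frac{2p}{p + 1} < 2 \quad \text{for every number of processors } p.$$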

Conclusions

The shared-memory version in practice does not work for problems with a huge number of elements: such an approach leads to difficulties with shared-memory access via the bus. It can be used for programs with a large number of iterations over the Gauss points. This version is easier to implement, because there is no exchange of data between processes and the compiler ensures the synchronization. Considering the parallelization of the finite element source code, the first steps are obvious. First, the element procedures (calculating the stiffness matrix and next calculating the strains and stresses) are parallelized. In the second step, the frontal procedure for solving the set of linear equations is replaced by a band-matrix solver. The distributed version is not limited in the number of elements of the calculated problems and can be used for really huge models. For small models the speed-up is small, or the parallel version can run longer than the sequential one because of the communication and synchronization procedures.

Figure 9. Warsaw underground - the displacement of the station with the subsoil.

References

[1] O. Axelsson, Iterative Solution Methods, Cambridge University Press, 1994.
[2] O. Axelsson, V.A. Barker, Finite Element Solution of Boundary Value Problems, Academic Press, Inc., 1984.
[3] R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, 1994.
[4] J.J. Dongarra, Performance of Various Computers Using Standard Linear Equations Software, 1999.
[5] J. Demmel, Applied Numerical Linear Algebra, 1997.
[6] J.M. Dłuzewski, Non-linear consolidation in finite element modelling, Proceedings of the Ninth International Conference on Computer Methods and Advances in Geomechanics, Wuhan, China, November 1997.
[7] J.M. Dłuzewski, Nonlinear problems during consolidation process, Advanced Numerical Applications and Plasticity in Geomechanics, ed. D.V. Griffiths and G. Gioda, Springer Verlag, Lecture Notes in Computer Science, 2001.
[8] J.M. Dłuzewski, HYDRO-GEO - finite element package for geotechnics, hydrotechnics and environmental engineering, Warsaw, 1997 (in Polish).
[9] V. Eijkhout, T. Chan, ParPre: a parallel preconditioners package, reference manual for version 2.0.21, revision 1.
[10] M.J. Grote, T. Huckle, Parallel Preconditioning with Sparse Approximate Inverses, SIAM Journal on Scientific Computing, 18 (1997), pp. 838-853.
[11] P. Krzyzanowski, On block preconditioners for non-symmetric saddle point problems, SIAM Journal on Scientific Computing, Vol. 23, No. 1, 2001, pp. 157-169.
[12] R.W. Lewis, B.A. Schrefler, The Finite Element Method in the Static and Dynamic Deformation and Consolidation of Porous Media, John Wiley & Sons, 1998.
[13] MPI: A Message Passing Interface Standard, June 1995.
[14] F. Okulicka, High-Performance Computing in Geomechanics by a Parallel Finite Element Approach, Lecture Notes in Computer Science 1947, "Applied Parallel Computing", 5th International Workshop, PARA 2000, Bergen, Norway, June 2000, pp. 391-398.
[15] F. Okulicka, Block parallel solvers for coupled geotechnical problems, 10th International Conference on Computer Methods and Advances in Geomechanics, January 7-12, 2001, Tucson, Arizona, USA, Vol. 1, A.A. Balkema, Rotterdam, Brookfield, 2001, pp. 861-866.
[16] F. Okulicka, Parallel Calculations of Geotechnical Problems by Means of the Parallel Finite Element Code Hydro-Geo, Proceedings of the IASTED International Symposia APPLIED INFORMATICS, Innsbruck, Austria, February 19-22, 2001, pp. 440-443.
[17] F. Okulicka, Achieving high performance calculation by the parallelization of the code, The Eighth International Conference on Advanced Computer Systems ACS'2001, October 17-19, 2001, Mielno, Poland, pp. 259-268.
[18] F. Okulicka, Parallelization of a Finite Element Package by the MPI Library, International Conference of MPI/PVM Users, MPI/PVM 01, Santorini, 2001, Lecture Notes in Computer Science 2131, pp. 425-436.
[19] P.S. Pacheco, A User's Guide to MPI, 1998.
[20] S. Parter, Preconditioning Legendre spectral collocation methods for elliptic problems I: finite difference operators, SIAM Journal on Numerical Analysis, Vol. 39, No. 1, 2001, pp. 320-347.
[21] S. Parter, Preconditioning Legendre spectral collocation methods for elliptic problems II: finite element operators, SIAM Journal on Numerical Analysis, Vol. 39, No. 1, 2001, pp. 348-362.
[22] K.K. Phoon, K.C. Toh, S.H. Chan, F.H. Lee, An Efficient Diagonal Preconditioner for Finite Element Solution of Biot's Consolidation Equations, to appear in International Journal for Numerical Methods in Engineering.
[23] Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM, 2003.
[24] http://www.netlib.org/scalapack/
[25] http://www-unix.mcs.anl.gov/mpi

In: Techniques of Scientific Computing for Energy ... Editors: F. Magoulès and R. Benelmir, pp. 37-68

ISBN 1-60021-921-7 © 2007 Nova Science Publishers, Inc.

LARGE-SCALE DATA VISUALIZATION USING MULTI-LANGUAGE PROGRAMMING APPLIED TO ENVIRONMENTAL PROBLEMS

Frédéric Magoulès* and Roman Putanowicz†

Institut Elie Cartan de Nancy, Université Henri Poincaré, BP 239, 54506 Vandoeuvre-lès-Nancy Cedex, France
Institute of Computer Methods in Civil Engineering (L5), Cracow University of Technology, Cracow, Poland

Abstract

Environmental problems lead to large and complex data sets which are difficult to analyze. Scientific visualization, which transforms raw data into images, has been recognized as an effective way to understand such data. Currently, most scientific software has its own data format, and special visualization interfaces or independent software are used to display the data. In this paper a technique for the visualization of large-scale data using multi-language programming is investigated. Mixing Tcl, C++ and Fortran components with the VTK library makes it possible to build efficient and robust applications. This article presents in detail how to build such applications; it provides an elegant solution to the problem of accessing VTK objects from different languages and shows how to mix Tcl, C++ and Fortran components in one single application.

Keywords: image processing, visualization, graphics, scripting language, compiled language, multi-language programming

1. Introduction

The amount of data collected and stored electronically is doubling every three years. Even if the development of standard data interface protocols solves the data access problems, the analysis of this information becomes an emerging problem. Visualization technology provides effective data presentation. Unfortunately, we are now reaching the limits of interactive visualization of large-scale data sets, since the amount of data to be analyzed is overwhelming. Despite the number of scientific visualization packages and the multiple options available in them, researchers have particular requirements, and the development of home-made visualization software is very common.

* E-mail address: [email protected]; address all correspondence to this author.
† E-mail address: [email protected]

The Visualization ToolKit, named VTK, is a software system for computer graphics, visualization and image processing. The VTK library is written in C++; however, it provides interfaces to the scripting languages Tcl, Python and Java. Though it is possible to write a whole VTK application in a scripting language like Tcl, it is more suitable, for efficiency reasons, to implement some functionality in a compiled language like C/C++. This is especially the case when working with the large data sets arising from environmental analysis, such as noise reduction. For example, when the noise level distribution generated by cars or airplanes over a city is analyzed, large data sets are considered: huge amounts of data are needed to model the whole city. An example of such a model is illustrated in Figures 1, 2 and 3. These figures have been obtained with the technique presented in this paper.

Figure 1. Example of a city.

Figure 2. Example of a city (bis).

Figure 3. Example of a city (ter).

This article presents in thorough detail how to access VTK objects from different languages and how to mix Tcl and C++ components in one application. Several source code examples are shown in order to help the reader write a complete application on his own. The paper is organized as follows. In Section 2, the concept of programmable filters in VTK is recalled. Then, in Section 3, the way to access a VTK object's data is detailed. Section 4 presents programmable filters written in C++, and Section 5 presents programmable filters written in Tcl. In Section 6, dynamically linked functions used as filter methods in Tcl are investigated. Finally, Section 7 contains the conclusions of this paper.

2. Programmable Filters

In programs using the VTK library, the visualization process can be described in terms of data flow through a so-called visualization network or visualization pipeline [14, 12]. During visualization, data, represented by data objects, are passed between process objects connected into the visualization pipeline. The process objects operate on input data to generate output data, and can be divided into the following categories: source objects, filter objects and mapper objects. Filter objects require one or more input data objects and generate one or more output data objects. VTK provides several filter objects which perform various visualization operations (e.g. extracting geometry, extracting and modifying data attributes, etc.). When new processing capabilities are required, and when they cannot be obtained by a combination of existing filters, new classes of filters can be added. Normally this requires the introduction of a new class and a modification of the source code. However, it is possible to create new filter objects without creating new classes, and even to create new kinds of filters at run time from scripting languages like Tcl or Python. To make this possible, VTK provides a family of programmable filter classes which have all the common properties of ordinary filter classes, except that their processing routine can be set to a specified user function. This way the user only has to write the processing function, create a new instance of a programmable filter and use it to build the visualization pipeline. Each time the filter is requested to execute, it will call the user-specified function. The next section shows how to write functions for programmable filters (vtkProgrammableFilter in particular) in Tcl, C++ and Fortran.

3. Accessing VTK Object's Data

It might happen that we want to extend visualization programs with functions written in Fortran, or that we have a large Fortran legacy code we would like to interface with a visualization program written using the VTK library. One problem that immediately appears is that in Fortran (including Fortran 90) we do not have direct access to C++ objects. In theory it is possible to pass a C++ object pointer to a Fortran function and then, knowing the memory layout of the object, manipulate it directly from Fortran, but this is restricted to the simplest cases and is highly non-portable. What should be done instead is to extract all the necessary information from the C++ object, pack it into ordinary variables and arrays, and pass that data to the Fortran function. When the Fortran function returns the modified arguments, they are used to alter the C++ objects or to create new ones. We assume that the reader is already familiar with the basic VTK components and in particular with the VTK data model; if not, we suggest reading chapters 4 and 5 of [14] or chapter 11 of [12]. Nevertheless, we will start our discussion with a very simple example which introduces one of the VTK array classes - vtkDoubleArray.

3.1. Creating and Manipulating VTK Arrays

The example below shows how to create an array of double values with 10 rows and 3 columns. Such an array could be used, for instance, to hold the point coordinates of a three-dimensional mesh in finite element methods.

#include <iostream>
#include "vtkDoubleArray.h"
using namespace std;

int main(void) {
  vtkDoubleArray *array;
  int m = 10;
  int n = 3;
  double buff[3];

  /* Creating VTK array object */
  array = vtkDoubleArray::New();
  array->SetNumberOfComponents(n);
  array->Allocate(m);
  array->SetNumberOfTuples(m);
  for (int i=0; i<m; i++) {
    /* fill the buffer with the coordinates of point i */
    for (int j=0; j<n; j++) {
      buff[j] = 0.0;
    }
    array->SetTuple(i, buff);
  }

  /* Accessing the array data through a plain C array */
  int nrows = array->GetNumberOfTuples();
  int ncols = array->GetNumberOfComponents();
  double *carray = new double [nrows * ncols];
  for (int i=0; i<nrows; i++) {
    array->GetTuple(i, carray + i*ncols);
  }

  array->Delete();
  delete [] carray;
  return 0;
}

4. Programmable Filters in C++

4.1. Simple Pipeline Example

The simplest pipeline connects a reader object directly to a writer object:

#include "vtkUnstructuredGridReader.h"
#include "vtkUnstructuredGridWriter.h"

int main() {
  vtkUnstructuredGridReader *reader;
  vtkUnstructuredGridWriter *writer;

  reader = vtkUnstructuredGridReader::New();
  reader->SetFileName("2Dmesh.vtk");

  writer = vtkUnstructuredGridWriter::New();
  writer->SetFileName("newMesh.vtk");

  // connect objects to form the pipe
  writer->SetInput(reader->GetOutput());

  // initialize pipe processing
  writer->Update();

  writer->Delete();
  reader->Delete();
  return 0;
}

As can be seen, this program does nothing other than copy the file "2Dmesh.vtk" into "newMesh.vtk".

4.2. Pipeline with Filter

Now we introduce a programmable filter in order to transform an unstructured grid by a user-specified function. The layout of the program is shown in Figure 5. We assume that the data file specifies a scalar attribute for each point. The user function copies the grid topology and geometry and sets new point attributes, which are the old values multiplied by 10.

Figure 5. Pipeline with filter (a vtkUnstructuredGridReader, followed by a vtkProgrammableFilter with a user function, followed by a vtkUnstructuredGridWriter).

With a user function as above it would be better to use vtkProgrammableAttributeDataFilter, but we will use vtkProgrammableFilter to show how to create a new output object and how to copy the grid topology and geometry. The vtkProgrammableFilter class provides, among others, the following method:

vtkProgrammableFilter::SetExecuteMethod (void(*f)(void *), void *arg)

This method takes two arguments, the first being the pointer to the user function. The user function must take one argument of void pointer type and return void. The second argument is a pointer to client data, which will be passed to the user function upon its execution. The client data allow us to pass to the user function all the necessary information the function needs to perform its job. The client data will usually contain a pointer to the filter itself, which allows the function to retrieve the filter's input and output objects. In the simplest case the client data will be the pointer to the filter alone. Let us assume that the function

void ScaleBy10 (void *arg);

is going to be used by the filter. Here is the new program:

1  #include "vtkUnstructuredGridReader.h"
2  #include "vtkUnstructuredGridWriter.h"
3  #include "vtkProgrammableFilter.h"

4  void ScaleBy10 (void *arg);

5  int main() {
6    vtkUnstructuredGridReader *reader;
7    vtkUnstructuredGridWriter *writer;
8    vtkProgrammableFilter *filter;

9    reader = vtkUnstructuredGridReader::New();
10   reader->SetFileName("2Dmesh.vtk");

11   writer = vtkUnstructuredGridWriter::New();
12   writer->SetFileName("newMesh.vtk");

13   filter = vtkProgrammableFilter::New();
14   filter->SetExecuteMethod (ScaleBy10, (void*)filter);

15   // connect objects to form the pipe
16   filter->SetInput(reader->GetOutput());
17   writer->SetInput(filter->GetUnstructuredGridOutput());

18   // initialize pipe processing
19   writer->Update();

20   writer->Delete();
21   filter->Delete();
22   reader->Delete();
23   return 0;
24 }

Note line 14, where, as we said, we pass the pointer to the filter object as the client data.

4.3. User Function

The task of the user function ScaleBy10 is to create a new grid with the same geometry and topology as the input grid, but with the point attributes scaled by 10. First the topology and geometry are copied from the input object. Then the point data array is copied and modified. At the end, the modified data array is inserted into the output mesh as the new point data. The code for the user function is given below.

1  #include "vtkDataSet.h"
2  #include "vtkDoubleArray.h"
3  #include "vtkDataArray.h"
4  #include "vtkProgrammableFilter.h"
5  #include "vtkUnstructuredGrid.h"

6  void ScaleBy10(void *arg) {
7    vtkIdType numPts;
8    vtkDataArray *da;
9    vtkDoubleArray *scalars;
10   vtkDataSet *input;
11   vtkUnstructuredGrid *output;
12   vtkProgrammableFilter *myFilter;

13   /* get the filter from client data */
14   myFilter = (vtkProgrammableFilter *)arg;

15   /* get the filter input and output */
16   input = myFilter->GetInput();
17   output = myFilter->GetUnstructuredGridOutput();

18   /* copy grid geometry and topology */
19   output->CopyStructure(input);

20   /* copy the scalar point attribute */
21   scalars = vtkDoubleArray::New();
22   scalars->DeepCopy((input->GetPointData())->GetScalars());

23   /* modify the scalars */
24   double factor = 10;
25   for (vtkIdType j=0; j<scalars->GetNumberOfTuples(); j++)
26   {
27     scalars->SetTuple1(j, factor*scalars->GetTuple1(j));
28   }

29   /* Set back the scalars as the point data in output object */
30   (output->GetPointData())->SetScalars(scalars);
31 }

In line 14 we retrieve the pointer to the filter from the client data. In lines 16 to 19 we get the filter input and output and do the copying of the grid geometry and topology. Lines 21 to 28 contain the code that copies the point scalar attributes into an array of doubles and modifies the array. Finally, in line 30, we set the modified array as the point data of the output object. The action in lines 25 to 28 is the core of the filter. We could substitute those lines by a call to a Fortran routine which does the multiplication. Of course, in light of what we said about passing data from C++ objects to Fortran code, we would need to add appropriate data transfer routines.
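As a sketch of such a substitution (the routine name scaleby10_, the trailing-underscore linkage and the helper function are our own illustrative assumptions, not part of the original program):

#include "vtkDoubleArray.h"

/* Hypothetical Fortran routine, e.g.
 *    subroutine scaleby10(s, n)
 * assumed to be compiled with a trailing-underscore naming
 * convention; Fortran passes all arguments by reference. */
extern "C" void scaleby10_(double *s, int *n);

/* Pack the VTK array into a plain C array, let the Fortran
   routine modify it, then copy the values back. */
static void ScaleWithFortran(vtkDoubleArray *scalars)
{
  int n = (int)scalars->GetNumberOfTuples();
  double *buf = new double[n];
  for (int j = 0; j < n; j++) {
    buf[j] = scalars->GetTuple1(j);
  }
  scaleby10_(buf, &n);
  for (int j = 0; j < n; j++) {
    scalars->SetTuple1(j, buf[j]);
  }
  delete [] buf;
}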

5. Programmable Filters in Tcl

Now we are about to repeat the example from the previous section entirely in Tcl.

5.1. Simple Pipe Example

Here is the Tcl code that does not use any filter:

package require vtk

vtkUnstructuredGridReader reader
reader SetFileName "2Dmesh.vtk"

vtkUnstructuredGridWriter writer
writer SetFileName "newMesh.vtk"

writer SetInput [reader GetOutput]
writer Write

vtkCommand DeleteAllObjects
exit

5.2. Pipeline with Filter

Next we present the code in which a programmable filter is used to modify the data.

package require vtk

proc ScaleBy10 {} {
    set input [filter GetInput]
    set output [filter GetUnstructuredGridOutput]

    # copy the geometry and topology
    $output CopyStructure $input

    # copy the point scalar attributes to the scalars array
    set dai [[$input GetPointData] GetScalars]
    vtkDoubleArray scalars
    scalars DeepCopy $dai

    # modify the array values
    set n [scalars GetNumberOfTuples]
    set factor 10
    for {set i 0} {$i < $n} {incr i} {
        scalars SetTuple1 $i [expr $factor * [scalars GetTuple1 $i]]
    }

    # set the scalars array as the point data in the output mesh
    [$output GetPointData] SetScalars scalars
}

vtkUnstructuredGridReader reader
reader SetFileName "2Dmesh.vtk"

vtkUnstructuredGridWriter writer
writer SetFileName "newMesh.vtk"

vtkProgrammableFilter filter
filter SetInput [reader GetOutput]
filter SetExecuteMethod ScaleBy10

writer SetInput [filter GetUnstructuredGridOutput]
writer Write

vtkCommand DeleteAllObjects
exit

The example looks similar to the C++ one, though there is one important difference - the filter method SetExecuteMethod was called with only one argument:

filter SetExecuteMethod ScaleBy10

It must be called with only one argument because the second argument is implicitly provided by the Tcl wrapper of SetExecuteMethod: this argument is set to the pointer to the Tcl interpreter, which the vtkProgrammableFilter object needs in order to evaluate the user-supplied Tcl script. This raises the question of how the user-supplied script will get its client data and, in particular, its filter. To solve this problem we must use some other data-passing mechanism available in Tcl: we can pass client data through a global variable or through the namespace variable mechanism. In the presented example we used the fact that the filter was created in the global name scope, so the command procedure associated with it is available everywhere.

6. Using Dynamically Linked Function as Filter Method in Tcl

In the example of the previous section the user function ScaleBy10 was written in the scripting language. What we would like to do now is to keep the previous example intact except for the user function, which is going to be written in C, C++ or Fortran. We may want to do this for several reasons: first for efficiency, and second because we may already have a substantial portion of the user function code written in another language. Before we attempt to give a recipe for achieving this goal, some introduction to extending Tcl applications with new built-in commands is necessary. This topic is wide and of course we are not going to cover it in every detail; a more detailed discussion can be found in chapter 44 of [16] or in part III of [9]. It will also be necessary to understand some inner workings of the VTK library. Unfortunately, for that we must refer to the source code itself; in particular we will refer to the following files: Common/vtkTclUtil.h, Common/vtkTclUtil.cxx, Graphics/vtkProgrammableFilter.cxx, Graphics/vtkProgrammableFilterTcl.cxx.


6.1. Adding New Built-in Command to Tcl Interpreter

Tcl may be easily extended by writing new command implementations in C (in this section, whenever we say "C" we also mean "C++"). New commands can be implemented in C for efficiency, or to provide functionality which is not possible to provide in pure Tcl. The example below provides an implementation of an Add command, which adds two numbers and returns the result of the addition. Here is how a Tcl implementation might look:

proc Add {a b} {
  return [expr $a + $b]
}

Instead of the Tcl code we would like to use the following C code:

double AddVeryFast(double a, double b) {
  return (a+b);
}

To implement a Tcl command in C we must provide a C function called a command procedure. In our example we will use the command procedure interface which is based on so-called "dual ported objects". These "objects" are structures of type Tcl_Obj and were introduced in Tcl 8.0 to minimize the number of conversions between the string and native representations of data. Each command procedure has a standard interface. For our Add command it looks like:

int AddObjCmd(ClientData clientData, Tcl_Interp *interp,
              int objc, Tcl_Obj *CONST objv[]);

The first argument allows arbitrary client data to be passed to the command procedure. The second is the Tcl interpreter, the third is the number of arguments passed to the command (including the command name) and the fourth is the array of argument values represented as Tcl_Obj "objects". The function returns an integer indicating success or failure. Here is an implementation of AddObjCmd:

1  #include <tcl.h>

2  int AddObjCmd(ClientData clientData, Tcl_Interp *interp,
3                int objc, Tcl_Obj *CONST objv[])
4  {
5    double a, b, c;
6    static Tcl_Obj *resultObjPtr=NULL;
7
8    /* check the number of arguments: command a b */
9    if (objc != 3) {
10     Tcl_WrongNumArgs(interp, 1, objv,
11       "arg1 arg2\n\t arg1,arg2 are numeric values");
12     return TCL_ERROR;
13   }
14   /* get the value of the first argument */
15   if (Tcl_GetDoubleFromObj(interp, objv[1], &a) != TCL_OK) {
16     return TCL_ERROR;
17   }
18   /* get the value of the second argument */
19   if (Tcl_GetDoubleFromObj(interp, objv[2], &b) != TCL_OK) {
20     return TCL_ERROR;
21   }
22   /* create the result object with value
23      being the sum of the arguments */
24
25   c = AddVeryFast(a, b);
26   resultObjPtr = Tcl_NewDoubleObj(c);
27   Tcl_SetObjResult(interp, resultObjPtr);

28   return TCL_OK;
29 }

In lines 8 to 13 we check if the command was called with the proper number of arguments. In lines 15 to 23 we get the values of the objects which were passed as command arguments. In line 25 we call our C "fast addition" function and in line 26 we create the result object, which is then set as the result of the command. This example only touches the issue of creating command procedures in C; the Tcl API (Application Programmer Interface) provides several functions to assist in writing command procedures (support for lists, hash tables, script evaluation, variable tracking, etc.). Creating a command procedure is only half of the job: Tcl must be informed about the new command, so that whenever it encounters the command name in a procedure call context it knows what to do. We are going to provide the new built-in Tcl Add command as a dynamically loadable package. A package can be loaded with the following command:

load library package

The first argument to the load command is the name of a shared library file (usually named *.dll on Windows and *.so on Unix). The second, package, is the name of the package; this name is used to call the package initialization function. Each package must provide an initialization function called package_Init. The initialization function performs all the necessary initialization and in particular registers the new commands provided by the package. The initialization function for our example is given below:

1  #include <tcl.h>

2  int AddObjCmd(ClientData clientData, Tcl_Interp *interp,
3                int objc, Tcl_Obj *CONST objv[]);

4  extern "C" {
5    int Add_Init(Tcl_Interp *interp);
6  } /* end extern "C" */

7  int Add_Init(Tcl_Interp *interp) {
8    /* Register the Add command */
9    Tcl_CreateObjCommand(interp, "Add", AddObjCmd, (ClientData) NULL,
10                        (Tcl_CmdDeleteProc *)NULL);

11   /* Declare the package */
12   Tcl_PkgProvide(interp, "Add", "1.1");

13   return TCL_OK;
14 }

In lines 4 to 6 we declare Add_Init as extern "C" to prevent the C++ compiler from mangling its name; otherwise the load command could not find the initialization procedure. Now we can create the shared library (.dll for Windows or .so for Unix) for the package by compiling the files with the AddObjCmd and Add_Init functions. The use of the Add package is demonstrated under Unix by the following Tcl script:

load ./add.so Add
set c [Add 1 2]
puts $c

6.2. Tcl Interface to VTK Library

Though the details might be different, the Tcl interface to the VTK library is created in a way similar to the one presented in the previous section. For each VTK class the VTK library provides a command procedure and an initialization procedure. The initialization procedure (or procedures) are called when the VTK library is loaded into Tcl, i.e. when the interpreter encounters a line like:

package require vtk

The command procedure for a given class creates new class instances (objects) and new commands specific to each instance. In the following Tcl script

vtkUnstructuredGridReader myreader
myreader SetFileName "mesh.vtk"

two things happen. First, when the interpreter encounters the command vtkUnstructuredGridReader, it calls the command procedure provided by the VTK library. That procedure in turn creates an object of vtkUnstructuredGridReader type and registers a new Tcl command called myreader. The myreader command will then deal with all requests to the vtkUnstructuredGridReader object - it will forward the requests and their arguments to the object by calling object methods. If we create another reader by calling vtkUnstructuredGridReader myreader1, a new Tcl command myreader1 will be registered, and so on. It is important to remember that each VTK object is related to its unique Tcl command, and the command name determines the specific object. Figure 6 illustrates the way of accessing object methods from a Tcl script.

Figure 6. Accessing object methods from Tcl (the Tcl interpreter dispatches the script line myreader SetFileName "mesh.vtk" to the myreader command procedure, which calls SetFileName() on the vtkUnstructuredGridReader object in the VTK library).

6.3. Accessing C++ Object from Tcl Script

Imagine that we have a C++ function which takes a pointer to an unstructured grid object. We have wrapped that function in Tcl (by hand as in section 6.1, or using tools like SWIG or CABLE) and now we are about to call it; however, we do not have the right data to pass to it. When in a Tcl shell we issue the command

vtkUnstructuredGrid mygrid

a vtkUnstructuredGrid object is created together with its command procedure. What we deal with in Tcl is the object command, not the object pointer: the C++ and Tcl layers are completely separated. VTK, however, provides utilities in vtkTclUtil.cxx which allow access to the object pointer knowing its command name and its class name. It also provides the reverse function, to create an object command from an object pointer, but here we will discuss and use only the former. With the help of the function

extern VTKTCL_EXPORT void *
vtkTclGetPointerFromObject(const char *name, const char *result_type,
                           Tcl_Interp *interp, int &error);

it is possible to write a new Tcl command which will retrieve the pointer to an object and return it to the scripting language (e.g. as an integer value or a string value), where it can be passed around.
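As a minimal sketch of this idea (the command name GetVtkPointer and the textual encoding of the address are our own choices for illustration):

#include <cstdio>
#include <tcl.h>
#include "vtkTclUtil.h"

/* Hypothetical command: GetVtkPointer objectName className
   returns the address of the underlying VTK object as a string. */
int GetVtkPointerObjCmd(ClientData, Tcl_Interp *interp,
                        int objc, Tcl_Obj *CONST objv[])
{
  if (objc != 3) {
    Tcl_WrongNumArgs(interp, 1, objv, "objectName className");
    return TCL_ERROR;
  }
  int error = 0;
  void *ptr = vtkTclGetPointerFromObject(Tcl_GetString(objv[1]),
                                         Tcl_GetString(objv[2]),
                                         interp, error);
  if (error || ptr == NULL) {
    Tcl_AppendResult(interp, "could not resolve the object", NULL);
    return TCL_ERROR;
  }
  char buf[32];
  snprintf(buf, sizeof(buf), "%p", ptr);  /* encode the address as text */
  Tcl_SetObjResult(interp, Tcl_NewStringObj(buf, -1));
  return TCL_OK;
}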

6.4. vtkProgrammableFilter with C++ User Function

When trying to use a user function written in Fortran we encounter two problems - accessing the data of VTK objects from Fortran, and accessing the C++ representation of objects from Tcl. The former problem was discussed in section 3, so now we will concentrate only on the latter. Let us assume that in Tcl we created a programmable filter and we would like to call the C++ function

void ScaleBy10 (void *arg);

as its method. We have to solve two additional problems: first, how to get the pointer to the filter object and pass it to the user function as client data, and second, how to call the C++ function. For accessing the pointer we can use vtkTclGetPointerFromObject. For the second problem we must recall that in Tcl the argument to SetExecuteMethod is a Tcl script; when the filter executes, the script is evaluated in the context of the current interpreter. What we have to do is to create a new special Tcl command which we can pass to the Tcl interpreter; this command will take care of getting the pointer to the filter object and of calling the C++ function. Let us call this Tcl command ScaleBy10. When it is called from Tcl, it calls the C++ function ScaleBy10(void*). Additionally, the Tcl command is able to register the name of the programmable filter with which it is going to be used. Now we repeat the example from section 5.2, but this time the filter method is implemented in C++.

package require vtk
load ./ScaleBy10 ScaleBy10

vtkUnstructuredGridReader reader
reader SetFileName "2Dmesh.vtk"

vtkUnstructuredGridWriter writer
writer SetFileName "newMesh.vtk"

vtkProgrammableFilter filter

# register the filter in the callback routine
ScaleBy10 set filter
filter SetInput [reader GetOutput]
filter SetExecuteMethod ScaleBy10

writer SetInput [filter GetUnstructuredGridOutput]
writer Write

vtkCommand DeleteAllObjects
exit

The complete code for the ScaleBy10 package is given below. The code has the same structure as the one discussed in section 6.1.

#include <tcl.h>
#include "string.h"
#include "ScaleBy10.h"
#include "vtkTclUtil.h"

extern "C" {
  int ScaleBy10ObjCmd(ClientData clientData, Tcl_Interp *interp,
                      int objc, Tcl_Obj *CONST objv[]);
  int Scaleby10_Init(Tcl_Interp *interp);
}

int Scaleby10_Init(Tcl_Interp *interp) {
  /* Initialize the stub table interface */
  if (Tcl_InitStubs(interp, "8.1", 0) == NULL) {
    return TCL_ERROR;
  }
  /* Register the command */
  Tcl_CreateObjCommand(interp, "ScaleBy10", ScaleBy10ObjCmd,
                       (ClientData) NULL, (Tcl_CmdDeleteProc *)NULL);
  /* Declare the package */
  Tcl_PkgProvide(interp, "ScaleBy10", "1.1");
  return TCL_OK;
}

The code above registers the new command with the interpreter; it is exactly the same as in the example with the Add command in section 6.1. Here comes the function which actually implements the command:

int ScaleBy10ObjCmd(ClientData clientData, Tcl_Interp *interp,
                    int objc, Tcl_Obj *CONST objv[])
{
  void *filterPtr = NULL;
  char *filterName = NULL;
  int index;
  static Tcl_Obj *filterNameObjPtr=NULL;
  Tcl_Obj *objPtr;
  /* array of subcommand names */
  const char *subCmds[] = {"set", "get", NULL};
  enum CmdIdx {SetIdx, GetIdx};

  /* check if command was called with proper number of arguments */
  if (objc > 3) {
    Tcl_WrongNumArgs(interp, 1, objv, "?get?, ?set FilterName?");
    return TCL_ERROR;
  }
  /* If no argument was given then try to call associated C++ function */
  if (objc == 1) {
    /* check if we have an object which holds the filter name */
    if (filterNameObjPtr == NULL) {
      Tcl_AppendResult(interp, "Error: filter name was not set\n", NULL);
      Tcl_AppendResult(interp, "Use: ", Tcl_GetString(objv[0]),
                       " set arg", NULL);
      return TCL_ERROR;
    }
    /* get the name from the object */
    filterName = Tcl_GetString(filterNameObjPtr);
    int error=0;
    /* Get the pointer to the filter object.
       This is the crucial part of the function - getting the
       VTK object knowing the name of its command procedure */
    filterPtr = vtkTclGetPointerFromObject(filterName,
                     "vtkProgrammableFilter", interp, error);
    if (error == 1 || filterPtr == NULL) {
      Tcl_AppendResult(interp, "Could not find the vtkProgrammableFilter",
                       " with name: ", filterName, NULL);
      return TCL_ERROR;
    }
    /* Run the filter method */
    ScaleBy10(filterPtr);
    return TCL_OK;
  }
  else /* if more arguments are given, check if they are valid subcommands */
  {
    if (Tcl_GetIndexFromObj(interp, objv[1], subCmds, "subcommand",
                            TCL_EXACT, &index) != TCL_OK) {
      return TCL_ERROR;
    }
    /* check if subcommands were called with proper number of arguments */
    if ((index == SetIdx && objc != 3) || (index == GetIdx && objc != 2)) {
      Tcl_WrongNumArgs(interp, 1, objv, "?get?, ?set FilterName?");
      return TCL_ERROR;
    }
    /* execute the particular subcommand */
    switch (index) {
    /* duplicate the argument given to 'set' subcommand
       and store it as the filter name */
    case SetIdx:
      if (filterNameObjPtr != NULL) {
        Tcl_DecrRefCount(filterNameObjPtr);
      }
      filterNameObjPtr = Tcl_DuplicateObj(objv[2]);
      Tcl_IncrRefCount(filterNameObjPtr);
      return TCL_OK;
    /* return the stored filter name */
    case GetIdx:
      objPtr = Tcl_DuplicateObj(filterNameObjPtr);
      /* Setting the result adds a new reference, so afterwards
         the reference count is 1 */
      Tcl_SetObjResult(interp, objPtr);
      return TCL_OK;
    default: /* just a sanity check */
      char errMsg[256];
      snprintf(errMsg, 255, "index value %d, file: %s, line %d",
               index, __FILE__, __LINE__);
      Tcl_AppendResult(interp, "Internal error, unknown ", errMsg, NULL);
      return TCL_ERROR;
    }
  }
}

The command can be called with no arguments - that triggers the execution of the C++ function ScaleBy10(void *) - or with one of the following subcommands:

• ScaleBy10 get - returns the filter name registered with the command;
• ScaleBy10 set name - registers the name in the command.

The most important part of the above code is the line

filterPtr = vtkTclGetPointerFromObject(filterName,
                 "vtkProgrammableFilter", interp, error);

In this line we retrieve the pointer to the registered filter from the filter name. In the presented implementation there is no way to check whether the C++ function ScaleBy10 has finished successfully. To provide such feedback it would be necessary to create a compound data object and pass it as the client data. In our case the declaration of such a compound object could look like:

typedef struct MyClientData {
  void *arg;
  int status;
} MyClientData;

If we have more than one C++ function to use as a filter method, we need to provide an appropriate Tcl command for each of them. As the number of functions increases, the task of writing the command procedures becomes tedious. However, if we look at the given example we may notice that the implementation of the command procedure will be exactly the same for all functions except for the names. It is very easy to provide a script which will generate the implementations from the list of function names. (One may ask whether it is possible to use, for instance, SWIG for that job. It is possible, however not necessary, and in fact it would require somewhat twisting the normal SWIG behaviour; it is much simpler and more appropriate to use a bash, AWK or sed script.)

7. Conclusions

Visualization and analysis of data arising from realistic environmental studies become problematic since the amount of data is very large. A solution for the efficient and interactive visualization of such phenomena is to mix Tcl, C++ and Fortran interface calls to the VTK library. The visualization application based on the VTK library can be written in one of the scripting languages Tcl, Python or Java, and the treatment methods involving data manipulation can be written in one of the compiled languages C++ or Fortran. With this approach, compiled language components are used for speed and scripting language components are used for flexibility and rapid development. This article has presented in thorough detail how to access VTK objects from different languages and how to mix Tcl and C++ components in one single application. Several source code examples were shown in order to help the reader write a complete application on his own.

A. Example of Accessing Object Data with Routines from the dpl Library

#include <iostream>
#include <cassert>
#include "dpl.h"
using namespace std;

template <class T>
void PrintArray(T *array, int length, int stride) {
  T *pt = array;
  for (int i=0; i<length; i++, pt += stride) {
    cout << *pt << " ";
  }
  cout << endl;
}

B. dpl Library

/* helper function to set scalar point data values
   (note: the argument list of this routine is reconstructed
   by analogy with dplAddScalars below) */
static int dplSetScalars (vtkPointData *pd, const double *sdata,
                          const vtkIdType n)
{
  vtkDataArray *da = pd->GetScalars();
  if (da == NULL || da->GetNumberOfTuples() != n) {
    return 0;
  }
  /* just sanity check */
  assert (da->GetNumberOfComponents() == 1);
  /* set the array */
  for (vtkIdType j=0; j<n; j++) {
    da->SetTuple1(j, sdata[j]);
  }
  return 1;
} /* end of dplSetScalars */

/* helper function to add scalar array to point data */
static int dplAddScalars (vtkPointData *pd, const double *sdata,
                          const vtkIdType n, const char *name)
{
  vtkDoubleArray *da = vtkDoubleArray::New();
  da->SetName(name);
  da->SetNumberOfComponents(1);
  da->Allocate(n);
  da->SetNumberOfTuples(n);
  /* set the array */
  for (vtkIdType j=0; j<n; j++) {
    da->SetTuple1(j, sdata[j]);
  }
  /* add array to point data */
  pd->AddArray(da);
  int hasScalar = pd->SetActiveScalars(name);
  assert (hasScalar != -1);
  return 1;
} /* end of dplAddScalars */

B5. Getting Scalar Points Attribute

/* NAME
 *   dplGetScalarsCArray
 * DESCRIPTION
 *   Allocate and fill the array with point scalar data
 * ARGUMENTS
 *   pd     - point data object (IN)
 *   name   - name of the scalars (IN)
 *   length - variable used to return the length of the allocated array
 * RETURN VALUE
 *   Pointer to the allocated array, or NULL if there was no scalar
 *   attribute of the given name or the array allocation failed
 */
double *
dplGetScalarsCArray (vtkPointData *pd, const char *name, vtkIdType *length)
{
  assert (pd != NULL);
  /* Set the active scalar attribute */
  int hasScalar = pd->SetActiveScalars(name);
  if (hasScalar == -1) {
    return NULL;
  }
  vtkDataArray *da = pd->GetAttribute(vtkDataSetAttributes::SCALARS);
  vtkIdType n = da->GetNumberOfTuples();
  if (length != NULL) {
    *length = n;
  }
  /* we are expecting scalar data with one component only */
  assert (da->GetNumberOfComponents() == 1);
  double *dscalars = new double [n];
  if (dscalars == NULL) return NULL;
  /* fill the array */
  for (vtkIdType j=0; j<n; j++) {
    da->GetTuple(j, dscalars+j);
  }
  return dscalars;
} /* end of dplGetScalarsCArray */

B6. Creating vtkUnstructuredGrid from Arrays Data

/* NAME
 *   dplUGridTriangleFromCArrays
 * DESCRIPTION
 *   Create an unstructured grid with triangular elements using
 *   coordinates and topology arrays
 * ARGUMENTS
 *   npoints - number of points
 *   dcoords - array of point coordinates (of size 3 * npoints)
 *   ncells  - number of triangular cells in the grid
 *   ids     - array of indices of cell points
 * RETURN VALUE
 *   Pointer to vtkUnstructuredGrid or NULL on error
 */
vtkUnstructuredGrid *
dplUGridTriangleFromCArrays (vtkIdType npoints, const double *dcoords,
                             vtkIdType ncells, vtkIdType *ids)
{
  vtkUnstructuredGrid *ugrid = vtkUnstructuredGrid::New();
  vtkPoints *points = vtkPoints::New();
  assert (ugrid != NULL);
  assert (points != NULL);
  /* set points coordinates */
  points->SetNumberOfPoints(npoints);
  for (vtkIdType i=0; i<npoints; i++) {
    points->SetPoint(i, dcoords+3*i);
  }
  ugrid->SetPoints(points);
  points->Delete();
  /* set cell indices */
  ugrid->Allocate(ncells);
  for (vtkIdType i=0; i<ncells; i++) {
    ugrid->InsertNextCell(VTK_TRIANGLE, 3, ids+3*i);
  }
  return ugrid;
} /* end of dplUGridTriangleFromCArrays */

References

[1] A.V. Aho, B. Kernighan, P.J. Weinberger. The AWK Programming Language. Addison-Wesley, 1988.
[2] J. Ahrens et al. A Parallel Approach for Efficiently Visualizing Extremely Large, Time-Varying Datasets. Los Alamos National Laboratory, Technical Report #LAUR-001620.
[3] D.M. Beazley. SWIG: An Easy to Use Tool for Integrating Scripting Languages with C and C++. Proceedings of the 4th USENIX Tcl/Tk Workshop, Monterey, California, 129-139, July 1996.
[4] K. Brodlie et al. Harnessing the Web for Scientific Visualization. VisFiles by Bill Hibbard, February 2000.
[5] J.M. Favre. Towards Efficient Visualization Support for Single-block and Multi-block Datasets. IEEE Visualization Proceedings, 1997.
[6] J. Friesen, T. Tarman. Remote High-Performance Visualization and Collaboration. IEEE CG&A, vol. 20, no. 4, August 2000.
[7] CABLE page. http://public.kitware.com/Cable/HTML/Index.html. April 2003.
[8] B. Kernighan, D. Ritchie. The C Programming Language. (2nd Edition) Prentice-Hall, 1988.
[9] J. Ousterhout. Tcl and the Tk Toolkit. Addison-Wesley, March 1994.
[10] R. Putanowicz, F. Magoulès. Simple visualizations of unstructured grids with VTK. Research Report LORIA - Université Henri Poincaré, France, No A03-R-38, February 2003.
[11] R. Putanowicz, F. Magoulès. Building Applications with VTK Library Using Tcl, C++ and Fortran Components. Research Report LORIA - Université Henri Poincaré, France, No A04-R-37, February 2003.
[12] W. Schroeder. The VTK User's Guide. Kitware, Inc., 2003.
[13] W. Schroeder, L. Avila, W. Hoffman. Visualizing with VTK: A Tutorial. IEEE Computer Graphics and Applications, September 2000.
[14] W. Schroeder, K. Martin, B. Lorensen. The Visualization Toolkit: An Object-Oriented Approach To 3D Graphics. (3rd Edition) Kitware, Inc. publishers, 2003.
[15] B. Stroustrup. The C++ Programming Language. (3rd Edition) Addison-Wesley Longman, 1997.
[16] B. Welch. Practical Programming in Tcl & Tk. (2nd Edition) Prentice Hall PTR, June 1997.

In: Techniques of Scientific Computing for Energy ... Editors: F. Magoulès and R. Benelmir, pp. 69-83

ISBN 1-60021-921-7 © 2007 Nova Science Publishers, Inc.

AN ANALYSIS OF FLOW AROUND A PROPELLER USING FICTITIOUS DOMAIN FINITE ELEMENT METHOD

Kazutaka Harada and Mutsuto Kawahara
Department of Civil Engineering, Chuo University, Kasuga 1-13-27, Bunkyou-ku, Tokyo 112-8511, Japan

Abstract

In this paper, an analysis by a 3-dimensional finite element method based on the mixed bubble function interpolation using a fictitious domain method is presented. Following the present approach, moving boundary problems can be computed successfully. How to use a fictitious domain method based on the Navier-Stokes equations is explained. Then, the incompressible viscous flow around a rotating propeller is solved. It is shown that the flow around the rotating propeller can be computed successfully by the present method.

Keywords: finite element method, fictitious domain method, distributed Lagrange multiplier, Navier-Stokes equation.

1. Introduction

The purpose of this research is to analyze moving boundary problems in incompressible viscous fluids. Most of the natural phenomena dealt with in engineering fields involve moving boundary problems, so it is important to be able to analyze them. The experimental study of complicated moving boundary problems of the incompressible viscous fluid is itself difficult, and the finite element method is a useful tool for their analysis. However, computing moving boundary problems by the conventional finite element method has some disadvantages: one is an increase of the computational time and the other is a shortage of computational memory, because moving boundary problems become more and more complicated and large-scaled owing to the developments of the finite element method and of the computer. The most important feature is to avoid the remeshing required in the conventional finite element method. Therefore, the finite element method combined with the fictitious domain method [1]~[3] is employed for the moving boundary problems in this research. To solve the finite element equation, the mixed interpolation for velocity and pressure based on the bubble function formulation is employed.


The basic idea of this method is simple. It is assumed that the inside of the objective domain is filled with fluid, and the action of the object is reflected in the fluid. Moreover, the boundary condition of the object can be expressed as a constraint condition using the distributed Lagrange multiplier. Remeshing is therefore not necessary, because both the objective and the computational domains are introduced, and moving boundary problems can be analyzed by moving the mesh of the objective domain only. The mixed interpolation [7]~[10] is employed, with the bubble function interpolation for the velocity and the linear interpolation for the pressure, to obtain a stable finite element computation. As a computational study, the falling particulate flow has already been presented in the papers [3]~[7]. In this paper a three-dimensional moving boundary problem of a moving propeller is computed. For the development of wind turbine power supply [13]~[15], the fictitious domain finite element method is applied to a wind turbine, and a comparison between the computational fluid dynamics and a wind tunnel experiment is examined.

2. Computational Scheme

2.1. Basic Equation

As the basic equations of the incompressible viscous fluid, the incompressible Navier-Stokes equations are used, which are expressed as

$$\dot{u} + (u \cdot \nabla)u + \nabla p - \nu \nabla \cdot \{\nabla u + (\nabla u)^T\} = f \quad \text{in } \Omega \setminus \omega, \qquad (1)$$

$$\nabla \cdot u = 0 \quad \text{in } \Omega \setminus \omega, \qquad (2)$$

$$u = g_0 \quad \text{on } \Gamma_1, \qquad (3)$$

$$u = g_1 \quad \text{on } \gamma, \qquad (4)$$

$$[-pI + \nu \{\nabla u + (\nabla u)^T\}] \cdot n = t \quad \text{on } \Gamma_2, \qquad (5)$$

$$\Gamma_1 \cup \Gamma_2 = \Gamma, \qquad (6)$$

$$\Gamma_1 \cap \Gamma_2 = \emptyset, \qquad (7)$$

where $u$ and $p$ denote the velocity and pressure, $\nu$ is the viscosity, $\Omega$ is the computational domain, $\omega$ is the bounded domain contained in $\Omega$, $\Gamma$ is the boundary of $\Omega$ and $\gamma$ is the boundary of $\omega$, respectively. $\Gamma$ is divided into the subsets $\Gamma_1$ and $\Gamma_2$, and $f$ is the external force. The domain $\omega$ is referred to as the objective domain. The given values $g_0$ and $g_1$ are boundary conditions. The traction is denoted by $t$, and $n$ is the unit outward normal vector to $\Gamma$.

Figure 1. Computational Domain Ω and Objective Domain ω.

2.2. Finite Element Interpolation

Multiplying equation (1) by a weighting function and integrating over the computational domain, the weighted residual equations can be obtained after applying the Green formula:

$$\int_\Omega w \cdot \dot{u} \, d\Omega + \int_\Omega w \cdot (u \cdot \nabla)u \, d\Omega + \int_\Omega \nabla w \cdot [-pI + \nu \{\nabla u + (\nabla u)^T\}] \, d\Omega = \int_\Omega w \cdot f \, d\Omega + \int_{\Gamma_2} w \cdot t \, d\Gamma, \qquad (8)$$

$$\int_\Omega q \, \nabla \cdot u \, d\Omega = 0, \qquad (9)$$

where $w$ and $q$ are weighting functions. The fractional step projection method [11] is applied in this formulation. As for the spatial discretization, the finite element method based on the bubble function interpolation [8]~[10] for the velocity and on the linear interpolation for the kinematic pressure is applied, expressed as follows. The linear interpolation can be denoted as

$$p = \Psi_1 p_1 + \Psi_2 p_2 + \Psi_3 p_3 + \Psi_4 p_4, \qquad (10)$$

$$\Psi_1 = L_1, \quad \Psi_2 = L_2, \quad \Psi_3 = L_3, \quad \Psi_4 = L_4, \qquad (11)$$

where $p_1 \sim p_4$ are the nodal values of the pressure and $L_1 \sim L_4$ are the area coordinates. The bubble function interpolation can be represented as

$$u_i = \Phi_1 u_{i1} + \Phi_2 u_{i2} + \Phi_3 u_{i3} + \Phi_4 u_{i4} + \Phi_5 \tilde{u}_{i5}, \qquad (12)$$

$$\tilde{u}_{i5} = u_{i5} - \frac{1}{4}\,(u_{i1} + u_{i2} + u_{i3} + u_{i4}), \qquad (13)$$

$$\Phi_1 = L_1, \quad \Phi_2 = L_2, \quad \Phi_3 = L_3, \quad \Phi_4 = L_4, \quad \Phi_5 = 256\, L_1 L_2 L_3 L_4, \qquad (14)$$

where $i$ denotes the components of $u$, and $u_{i1} \sim u_{i4}$ are nodal values. The nodes are shown in Fig. 2. The inside node is taken at the center of gravity, at which the value can be eliminated.
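As a quick check of the normalization (our own verification, not part of the original text): at the interior node, where $L_1 = L_2 = L_3 = L_4 = 1/4$, the bubble function takes the value $\Phi_5 = 256\,(1/4)^4 = 1$, while it vanishes on every face of the element, since at least one area coordinate is zero there.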

Figure 2. Nodes of element.

2.3. Fictitious Domain Formulation

The basic idea of the fictitious domain method is simple. The fluid domain is denoted by Ω and the solid body is denoted by ω and is referred to as the objective domain. The inside of ω is assumed to be filled with fluid, and an objective reaction is reflected in the fluid. Moreover, the objective boundary condition can be expressed as a constraint condition using the distributed Lagrange multipliers. Denoting the distributed Lagrange multiplier by λ and the weighting functions by W, Q and μ, respectively, the formulation of the fictitious domain method can be expressed as follows,

$$\int_{\Omega \setminus \omega} W_h \left\{ \frac{\tilde{U}_h - U_h^n}{\Delta t} + (U_h^n \cdot \nabla) U_h^n \right\} dx + \int_{\Omega \setminus \omega} \nabla W_h \cdot \frac{\nu}{2} \{\nabla U_h^n + (\nabla U_h^n)^T\} \, dx = \int_{\Omega \setminus \omega} W_h \cdot F_h \, dx + \int_\omega W_h \cdot \lambda_h \, dx, \qquad (15)$$

$$\int_{\Omega \setminus \omega} W_h \frac{U_h^{n+1} - \tilde{U}_h}{\Delta t} \, dx + \int_{\Omega \setminus \omega} \nabla W_h \cdot \left[ -P_h^{n+1} I + \frac{\nu}{2} \{\nabla U_h^n + (\nabla U_h^n)^T\} \right] dx = \int_{\Gamma_2} W_h \cdot t \, dx + \int_\omega W_h \cdot \lambda_h \, dx, \qquad (16)$$

$$\int_{\Omega \setminus \omega} Q_h \, \nabla \cdot U_h \, dx = 0, \qquad (17)$$

$$\int_\omega \mu_h \, (U_h - g_1) \, dx = 0, \qquad (18)$$

where $\tilde{U}_h$ is the intermediate velocity and $W_h$, $Q_h$ are functions interpolated with the bubble function. In order to integrate more easily the last terms in eqs. (15) and (16), added by the fictitious domain method, the following Dirac delta function is introduced:

$$\delta(X - X_i) = \begin{cases} \infty & \text{if } X = X_i \\ 0 & \text{if } X \neq X_i \end{cases}, \qquad (19)$$

The integral form is expressed as:

$$\int_\omega \delta(X - X_i) \, dx = \begin{cases} 1 & \text{if } X_i \in \omega \\ 0 & \text{otherwise} \end{cases}. \qquad (20)$$

The interpolations of $\lambda$ and $\mu$, which are expressed by $\lambda_h$ and $\mu_h$, can be written as follows:

$$\lambda_h = \sum_{i=1}^{N_d} \lambda_i \, \delta(X - X_i), \qquad (21)$$

$$\mu_h = \sum_{i=1}^{N_d} \mu_i \, \delta(X - X_i), \qquad (22)$$

where $N_d$ is the number of nodes of the domain $\omega$; the integrations over the domain $\omega$ can then be written as

$$\int_\omega \lambda_h v_h \, d\omega = \sum_{i=1}^{N_d} \lambda_i \, v_h(X_i), \qquad (23)$$

$$\int_\omega \mu_h \, (U_h - g_1) \, d\omega = \sum_{i=1}^{N_d} \mu_i \, (U_h(X_i) - g_1(X_i)). \qquad (24)$$

Eqs. (15)~(18) can be solved at each time cycle. The simultaneous equations are solved by the element-by-element conjugate gradient method [12].
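To make the element-by-element idea concrete, the following sketch shows the core operation such a solver needs: a matrix-vector product accumulated from per-element matrices, without ever assembling the global matrix. The dense 4x4 element storage and the 4-node (tetrahedral) connectivity are illustrative assumptions, not the authors' implementation.

#include <array>
#include <cstddef>
#include <vector>

// y = A x computed element by element: the global matrix A is never
// assembled; each element contributes Ae times the restriction of x
// to its nodes, scattered back into the global vector y.
void ebeMatVec(const std::vector<std::array<std::array<double,4>,4> >& Ae,
               const std::vector<std::array<int,4> >& conn,
               const std::vector<double>& x,
               std::vector<double>& y)
{
    for (std::size_t k = 0; k < y.size(); ++k) y[k] = 0.0;
    for (std::size_t e = 0; e < Ae.size(); ++e) {
        for (int a = 0; a < 4; ++a) {
            double s = 0.0;
            for (int b = 0; b < 4; ++b) {
                s += Ae[e][a][b] * x[conn[e][b]];   // gather
            }
            y[conn[e][a]] += s;                     // scatter
        }
    }
}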


2.4. Movement of Propeller

The motion of the propeller is caused by the fluid force; the momentum equation following Newton's second law is expressed by

$$I_p \frac{d\omega_p}{dt} = T_p, \qquad (25)$$

where $\omega_p$ is the angular speed, $I_p$ is the moment of inertia and $T_p$ is the moment imposed on the propeller by the fluid, respectively, in which the subscript $p$ expresses that the quantity is concerned with the propeller. The force and the moment imposed on the propeller by the fluid are described as follows,

$$F_p = \int_\gamma \sigma n \, dx, \qquad (26)$$

$$R^n = \begin{bmatrix} \cos\theta_p^n & -\sin\theta_p^n & 0 \\ \sin\theta_p^n & \cos\theta_p^n & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad (27)$$

$$T_p^n = \int_\gamma (x - G_p) \, R^n (\sigma n) \, dx, \qquad (28)$$

where $\sigma = -pI + \nu(\nabla u + (\nabla u)^T)$ is the stress tensor, $G_p$ is the coordinate of the center of gravity, $x$ is the coordinate of a point and $n$ is the outward unit normal to the boundary $\gamma$, respectively. The velocity of the propeller is computed as follows,

$$\omega_p^{n+1} = \omega_p^n + \Delta t \, \frac{T_p^n}{I_p}, \qquad (29)$$

$$\theta_p^{n+1} = \theta_p^n + \Delta t \, \omega_p^n + \frac{\Delta t^2}{2} \, \frac{T_p^n}{I_p}, \qquad (30)$$

where $\theta_p$ in eq. (30) is the angular displacement. The movement of the propeller can be expressed by the following procedure. The nodal coordinates of the points on the surface of the domain $\omega$ are expressed by $X_\omega$ in eq. (31); they can then be moved in rotation by multiplying by $(R^{n+1})^T$. These procedures are schematically illustrated in Fig. 3.

$$X_\omega = [x_\omega \;\; y_\omega \;\; z_\omega], \qquad (31)$$

Figure 3. Image for rotational movement.
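A minimal sketch of this update, assuming the explicit form of eqs. (29) and (30) reconstructed above (the data structure and function are ours, for illustration only):

// One explicit time step of the propeller motion: the fluid moment
// Tn of eq. (28) drives the angular speed, eq. (29), and the angular
// displacement, eq. (30).
struct PropellerState {
    double omega;   // angular speed
    double theta;   // angular displacement
};

PropellerState advance(const PropellerState& s,
                       double Tn,   // moment exerted by the fluid
                       double Ip,   // moment of inertia
                       double dt)   // time increment
{
    PropellerState next;
    next.omega = s.omega + dt * Tn / Ip;                    // eq. (29)
    next.theta = s.theta + dt * s.omega
               + 0.5 * dt * dt * Tn / Ip;                   // eq. (30)
    return next;
}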

3. Numerical Study 1

3.1. Numerical Model

To show the efficiency of the fictitious domain method, the simulation of the flow around a propeller in the channel Ω = (-2.0, 2.0) × (-2.0, 2.0) × (-4.0, 10.0) is given as the numerical study, as shown in Fig. 4. The Reynolds number is 1000. The time increment is taken as 0.01 in this numerical computation.

Figure 4. Numerical model of propeller.

3.2. Finite Element Mesh

The finite element meshes for Ω and ω are shown in Figs. 5 and 6, respectively. The total numbers of nodes of Ω and ω are 42345 and 4321, respectively; the total numbers of elements of Ω and ω are 244642 and 19528, respectively.


Figure 5. Mesh for computational domain.

Figure 6. Mesh for propeller.


3.3. Numerical Result

Figure 7. Time history of drag force.

Figure 8. Time history of lift and side forces.


Figure 9. Figure 8 enlarged for the period from time step 7000 to 8000.

Figure 10. Streamlines and pressure at time step 4000.


Figure 11. Streamlines and pressure at time step 5000.

Figure 12. Pressure at time step 3000.


Figure 13. Pressure at time step 5000.

In order to understand the behavior of the wake of a wind turbine, the flow around the rotating propeller is analyzed. Numerical results are shown in Figs. 7~13. The time histories of the drag, lift and side forces are presented in Figs. 7, 8 and 9. The drag, lift and side forces are the fluid forces expressed in eq. (26) in the directions of x, z and y, respectively. As the flow reaches the stationary state, a periodic behavior can be seen in the figures. The pressure and streamlines are represented in Figs. 10 and 11, in which the streamlines, the iso-surface of pressure and the position of the propeller are shown. It is clear that a rotating flow is obtained behind the propeller. Looking at the figures, it is understood that the tip of the propeller receives stronger pressure than the center of the propeller. Figs. 12 and 13 show the pressure distribution behind the propeller; the extent of the influence on the rear side can be seen. It is hoped that these findings can be put to use in deciding the installation intervals of wind turbines and in the design of the blade shape.

4. Numerical Study 2

The result obtained from the wind tunnel experiment [13] is compared with the numerical solution, and the validity of the present technique is examined.


4.1. Model for Wind Tunnel Experiment

Figure 14. Model for wind tunnel experiment.

The model for the wind tunnel experiment is presented in Figure 14. This propeller is a 1/50 scale model of a real machine; its diameter D is 90 cm. The basic blade profile is M-F073. The Reynolds number is 10000 and the tip speed ratio is 4.1.
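For reference, the tip speed ratio quoted here is assumed to follow the standard definition (the source does not state it explicitly): the ratio of the blade tip speed to the inflow velocity,

$$\lambda_{tip}=\frac{\omega_p R}{U_\infty}=4.1, \qquad R=\frac{D}{2}=45\ \text{cm}.$$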

4.2. Numerical Result

Figure 15. Comparison between experiment and CFD of the wind velocity W (z/D = 0.0).

The behavior of the flow velocity is captured reasonably well. However, the CFD result shows a larger response than the experiment; this is attributed to not being able to use a mesh of exactly the same shape as the real machine. Nevertheless, it can be said that the fictitious domain FEM is an effective technique compared with the experiment, which requires a large amount of labor.

5. Conclusions

The fictitious domain method can be applied to the three-dimensional flow around a rotating propeller, which has a complex body configuration and a moving boundary. These problems require consideration of the interaction between fluid and solid. The mixed interpolation finite element method can be applied to the fictitious domain method, with the bubble function interpolation for velocity and the linear function for pressure. The element-by-element conjugate gradient method can be applied as the solver of the simultaneous equations. The fictitious domain finite element computation can be more stable and faster than the conventional finite element method in the computation of moving boundary problems, as shown by Glowinski et al. [1], [2]. These results lead directly to solutions for the shape design of propellers, and it is thought that this approach can become a new method for wind power generation in the future, taking the place of the experiment.

Acknowledgements

The authors are grateful for the computations performed on the AMD Opteron Quad Saver, which was installed under the project "Anchoring Method for the Friction Anchor Bridge" funded by the Research Institute of Science and Technology, Chuo University, in 1999.

References

[1] R. Glowinski, T.-W. Pan and J. Periaux, 'A fictitious domain method for Dirichlet problem and applications', Comput. Methods Appl. Mech. Engrg., Vol. 111, pp. 283-303, 1994.
[2] R. Glowinski, T.-W. Pan and J. Periaux, 'A fictitious domain method for external incompressible viscous flow modeled by Navier-Stokes equations', Comput. Methods Appl. Mech. Engrg., Vol. 112, pp. 133-148, 1994.
[3] D. D. Joseph and R. Glowinski, 'Interrogations of Direct Numerical Simulation of Solid-Liquid Flow', http://www.aem.umn.edu/Solid-liquid_Flows/
[4] T.-W. Pan, 'Numerical Simulation of the Motion of a Ball Falling in an Incompressible Viscous Fluid', http://math.uh.edu/particulate_flow/f98.ps
[5] H. Kawarada and H. Suito, 'Numerical method for a free surface flow on the basis of the fictitious domain method', East-West J. Numer. Math., Vol. 5, No. 1, pp. 57-66, 1997.
[6] H. Okumura, H. Naya, N. Shimada and M. Kawahara, 'A distributed Lagrange multiplier / fictitious domain method for incompressible flows moving rigid bodies', Proceedings of the 14th Symposium on Computational Fluid Dynamics, c06-4.pdf, 2000.
[7] N. Shimada and M. Kawahara, 'Analysis of particulate flows by fictitious domain method with distributed Lagrange multiplier', Proceedings of the First Asian-Pacific Congress on Computational Mechanics, Vol. 1, pp. 127-132, 2001.


[8] A. Maruoka, J. Matsumoto and M. Kawahara, 'Lagrangian finite element method for incompressible Navier-Stokes equations using quadrilateral scaled bubble function', Journal of Applied Mechanics, Vol. 44A, pp. 383-390, 1998 (in Japanese).
[9] J. Matsumoto and M. Kawahara, 'Shape Identification for Fluid-Structure Interaction Problem using Improved Bubble Element', International Journal of Computational Fluid Dynamics, Vol. 15, pp. 33-45, 2001.
[10] J. Matsumoto, T. Umetsu and M. Kawahara, 'Incompressible Viscous Flow Analysis and Adaptive Finite Element Method Using Linear Bubble Function', Journal of Applied Mechanics, Vol. 2, pp. 223-232, 1999.
[11] H. Okumura and K. Ohmori, 'Mass conservative finite element method for immiscible two-fluid flow problems', Journal of Applied Mechanics, J.S.C.E., Vol. 7, 2004 (in Japanese).
[12] S. L. Zhang, 'GPBi-CG: Generalized Product-type Method Based on Bi-CG for Solving Nonsymmetric Linear Systems', SIAM J. Sci. Comput., Vol. 5, No. 4, 1995.
[13] Y. Hattori and M. Yamamoto, 'Wind Tunnel Experiment of Flow Field in Wake of Wind Turbine', Journal of JSME Fluids Engineering Division, 25.11.2004 (in Japanese).
[14] R. Barthelmie et al., Proc. European Wind Energy Conf. and Ex. (2003), A.T1-591.
[15] G. E. Refael et al., Proc. European Wind Energy Conf. and Ex. (2003), A.T1-591.

In: Techniques of Scientific Computing for Energy ... ISBN 1-60021-921-7
Editors: F. Magoulès and R. Benelmir, pp. 85-97
© 2004 Nova Science Publishers, Inc.

NUMERICAL SIMULATION OF SUPERSONIC COMBUSTION USING PARALLEL COMPUTING

E. von Lavante 1∗ and M. Kallenberg 2†
1 University of Duisburg-Essen, FB12, D-45127 Essen, Germany
2 RWE-IT Essen, D-45137 Essen, Germany

∗ E-mail address: [email protected]
† E-mail address: [email protected]

Abstract

In the present investigation, the unsteady, three-dimensional, supersonic flow with nonequilibrium chemistry in a square channel with transverse hydrogen injection was simulated using a parallelized computer program. To this end, the concepts of large-eddy simulation (LES) were applied to a model supersonic combustion chamber using a three-dimensional solver of the compressible Navier-Stokes equations with chemical reactions developed by the present authors. The time accurate computation was accelerated by an implicit method and implemented on a massively parallel computer. The parallelization was accomplished using domain decomposition on a distributed memory system. The results of the present three-dimensional simulation were analyzed with respect to their timewise behaviour and compared, where applicable, with two-dimensional predictions and experimental data obtained by other investigators. The relative efficiency and relative speedup of the parallel algorithm were analyzed for various sizes of the problem and numbers of processor units (PUs) ranging between 1 and 128.

1. Introduction

The research and development of high-speed-flight vehicles is spurring activity in the areas of corresponding scientific development. The increase of scientific activities can be attributed to the reemerging interest in the concept of the scramjet. Several nations are planning unmanned hypersonic research vehicles with the scramjet as the most logical choice of propulsion. Typical representatives of these research vehicles are the US X-43A and X-43B with the ISTAR engine, developed as NASA hypersonic propulsion demonstration vehicles, the stationary PTE and GDE hypersonic scramjet test engines, the French VRR and A3CP, the French-German Japhar project and the German ELAC/EOS two-stage vehicle. All of these research vehicles are intended to validate design tools which could be used in the future development of hypersonic propulsion technology, [1], [2]. Detailed study of some of the physical aspects of supersonic combustion has been carried out by, for example, Brummund and Nuding [3]. The simulation methods of these types of flows have reached a certain degree of maturity,


offering a choice of standard spatial and time-wise discretization procedures. For details, see Cox et al. [4] or Godfroy and Tissier [5]. However, several problems remain. One of the main difficulties is the treatment of turbulence, since none of the models is adequate for these complex flow cases. The interaction between the turbulent effects and the chemistry, in particular the chemical rates of reaction, is difficult to predict numerically due to the uncertainties in describing this physical phenomenon theoretically. The ability of various turbulence models to predict the mixing phenomena in a scramjet combustor was investigated by Madabhushi et al. [6]. More recently, a survey of numerical algorithms for solving the three-dimensional Navier-Stokes equations with nonequilibrium detailed chemistry was published by Chen and Shuen [7]. A corresponding computer code was put to practical use by, for example, Chamberlain et al. [8]. The present authors have been carrying out numerical simulations of reacting flows in supersonic combustion chambers for some time now, [9], [10]. Here, the strong coupling between the fluid-mechanic and thermodynamic variables found in supersonic flows requires the simultaneous solution of the PDF-equations for the velocities, partial densities and thermodynamic variables. A possible solution to this problem is to employ large-eddy simulation (LES) to resolve the low frequency fluctuations, assuming that they contain most of the turbulent kinetic energy, while implementing a subgrid model for the unresolved effects. The subgrid model should include fine scale mixing to account more realistically for the turbulence-chemistry interactions. A modified version of the linear eddy model for subgrid combustion was developed by Chakravarthy and Menon [13] and, more recently, Menon [14]. The LES-equivalent to the PDF-approach in the case of the RANS-solver are the filtered density function (FDF) methods. These ideas are relatively recent (see, for example, Colucci et al. [15]) and, therefore, unproven on anything but the simplest test cases.

In view of the above discussion of the difficulty of formulating an appropriate model for the turbulence-chemistry interaction, as well as the turbulence alone, the present authors decided to investigate the feasibility of large-eddy simulation, applied to a case of chemically reacting supersonic flow of the air-hydrogen system. The presence of a rather complex shock system made the use of assumed PDF-distributions seem inappropriate, while a full scale Monte Carlo approach was too computationally intensive. It was decided to take a simplified approach similar to the above mentioned hybrid methods. The velocities were obtained from the LES-simulation in combination with a relatively simple Smagorinsky subgrid model. The scalar temperature fluctuations were obtained from the resolved values while neglecting their subgrid component, and their effect on the reaction rates was modelled. The unresolved part of the mass fractions was neglected. A justification for this assumption is given below in the corresponding section of this paper.

2. Algorithm

In the present work, the flow was assumed to be compressible, viscous, and a mixture of thermally perfect species. Due to the relatively low temperature and high pressure in the present configuration, the gas mixture can be treated as being in vibrational (thermodynamic) equilibrium. The governing equations were in this case the compressible Navier-Stokes equations for $n_s$ species:

$$\frac{\partial \hat Q}{\partial t}+\frac{\partial \hat F}{\partial \xi}+\frac{\partial \hat G}{\partial \eta}+\frac{\partial \hat H}{\partial \zeta}=\frac{S}{J} \qquad (1)$$

where $\hat F$, $\hat G$ and $\hat H$ are the flux vectors in the corresponding ξ, η and ζ directions, Q is the vector of the dependent state variables $(\rho u, \rho v, \rho w, e, \rho_1 \ldots \rho_{n_s})^T$ and J is the Jacobian of the transformation of coordinates. The details of the governing equations are given in [16]. A simple model according to Fick's law for the binary diffusion coefficient, along with the Sutherland equation for the viscous coefficient, were used. The chemical reactions for the H2-air combustion were realized with an 8-reaction, 7-species model of Evans and Schexnayder [17], representing a compromise between computational effort and complexity and physical reality.

The present algorithm was based on the work of Roe, using his flux-difference splitting scheme (FDS) [18]. This scheme was demonstrated to be accurate, with relatively low dissipation and dispersion. In the present version, the reconstruction of the cell-centered variables to the cell-interface locations was done using a monotone interpolation as introduced by Grossmann and Cinella in [19]. The interpolation slope was limited by an appropriate limiter, according to the previously published MUSCL type procedure (see, for example, [9]):

$$Q_R^n=Q_{i+1}^{n-1}-\frac{L_{i+1}}{4}\left[(1+\kappa)\,\Delta^{-,\,n-1}_{i+1}+(1-\kappa)\,\Delta^{+,\,n-1}_{i+1}\right], \qquad (2)$$

using a special quadratic version of the van Albada limiter developed by the present authors:

$$L_i=\frac{2\,(\Delta^-_i)^2(\Delta^+_i)^2+\varepsilon}{(\Delta^-_i)^4+(\Delta^+_i)^4+\varepsilon}, \qquad \varepsilon\approx 1\cdot 10^{-5}. \qquad (3)$$
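A minimal sketch of the reconstruction (2) with the limiter (3), for a single scalar component at a frozen time level; the function names are hypothetical, and the limiter follows the quadratic van Albada form as reconstructed above.

    #include <vector>

    // Quadratic van Albada limiter, eq. (3), built from the backward and
    // forward differences of the cell-centered variable.
    double limiter(double dm, double dp, double eps = 1e-5)
    {
        const double dm2 = dm * dm, dp2 = dp * dp;
        return (2.0 * dm2 * dp2 + eps) / (dm2 * dm2 + dp2 * dp2 + eps);
    }

    // MUSCL right state at the interface between cells i and i+1, eq. (2);
    // kappa = 0 gives the upwind-biased Fromm scheme used in the text.
    // Assumes 0 <= i <= Q.size() - 3 so that Q[i+2] is valid.
    double rightState(const std::vector<double>& Q, int i, double kappa = 0.0)
    {
        const double dm = Q[i + 1] - Q[i];      // Delta^-_{i+1}
        const double dp = Q[i + 2] - Q[i + 1];  // Delta^+_{i+1}
        const double L  = limiter(dm, dp);
        return Q[i + 1] - 0.25 * L * ((1.0 + kappa) * dm + (1.0 - kappa) * dp);
    }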

Variation of the accuracy factor κ, and switching the limiter on or off, results in various schemes from simple first order accurate up to third order formulations. In the present work, the second order accurate upwind biased Fromm scheme (κ = 0) and the above form of the van Albada limiter have been used. The viscous fluxes $F_v$, $G_v$ and $H_v$ were centrally differenced. The details of this scheme, with the corresponding modifications, are given by Hilgenstock et al. [10]. This includes also the positivity preserving modifications in the sense of Larrouturou [20]. The governing equations were integrated in time using a semi-implicit method, with different multi-stage Runge-Kutta type schemes used for the explicit operator. Only the chemical source terms were treated implicitly:

$$\left(I-\Delta t\,\frac{\partial S^n}{\partial Q^n}\right)\frac{\partial \hat Q^n}{\partial t}=\frac{S^n}{J}-\frac{\partial \hat F^n}{\partial \xi}-\frac{\partial \hat G^n}{\partial \eta}-\frac{\partial \hat H^n}{\partial \zeta} \qquad (4)$$

The numerical effort to invert the matrix $D=I-\Delta t\,\partial S/\partial Q^n$ depends on the formulation of the Jacobian of the chemical source terms. Several different forms of the Jacobian matrix, with increasing complexity and accuracy, were implemented and compared. The possibility of simplifying the matrix D by dropping all the off-diagonal terms while keeping only the diagonal terms turned out to be an effective means of accelerating the convergence, with


stability limits given by the acoustic wave speeds. Without subiterations in pseudo-time, the scheme is first-order accurate in time, improving to second order accuracy in time with the substepping. Using a multi-block grid structure resulted in a flexible code with the possibility of working with different chemical models (nonequilibrium, equilibrium, frozen) in different blocks. Besides, some of the blocks were selectively refined, depending on the evolving results. The present geometrical treatment of the computational domain was simple, yet flexible enough.
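The diagonal simplification of D described above can be sketched as follows: if the chemical-source Jacobian is reduced to its diagonal, inverting D = I − Δt ∂S/∂Q degenerates to a scalar division per variable and cell. The names are hypothetical and one cell is shown.

    #include <cstddef>
    #include <vector>

    // Point-implicit update for one cell: solve (I - dt * dS/dQ) * dQ = dt * R
    // with the source Jacobian reduced to its diagonal, so that the "matrix
    // inversion" is a division per equation.
    void pointImplicitUpdate(std::vector<double>& Q,              // state variables
                             const std::vector<double>& residual, // explicit RHS (fluxes + S)
                             const std::vector<double>& dSdQdiag, // diagonal of source Jacobian
                             double dt)
    {
        for (std::size_t k = 0; k < Q.size(); ++k) {
            const double D = 1.0 - dt * dSdQdiag[k];  // diagonal of D = I - dt dS/dQ
            Q[k] += dt * residual[k] / D;             // cheap per-cell "inversion"
        }
    }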

3. Parallel Implementation

Ideally, in a LES, turbulent eddies with as large an energy content as possible should be directly simulated, making resolutions of the order of magnitude below $y^+$ necessary. Since no preferential direction is assumed, this very high resolution should be applied not only normal to the solid walls, but in all spatial directions considered in that particular simulation. Additionally, the computational grid should be uniformly distributed. Clearly, even if the above requirements are somewhat relaxed, an extremely high number of grid points (or cells in the finite volume method) has to be utilized. The corresponding computations can be carried out only on the largest computers available. A performance that is adequate for the LES is presently offered only on massively parallel computers.

Early in this work, it was decided to implement a data parallel structure, since the multi-block grid system already had data exchange between the blocks built in. The present method of domain decomposition is shown schematically in Fig. 1. At the block interfaces, a system of two rows of overlapping cells is employed to assure the preservation of up to third order accuracy. The information exchange between the PUs was accomplished using the MPI library with standard point to point communications. Only a few global operations had to be used. The production runs were carried out on a Cray T3E using 128 PUs. Before this machine became available, an IBM SP2 using 8 PUs and a cluster of LINUX PCs using up to 12 PUs were also used. In the two-dimensional case, the maximum total grid size was 1024x1024 internal cells, although a 256x128 grid was mostly sufficient. The present three-dimensional simulation was executed using a 256x128x32 grid, assuming symmetric flow in the crosswise direction. The more appropriate periodic boundary condition resulted in an unrestricted crosswise velocity w. It might be interesting to note that the Cray T3E, the IBM SP2 and, surprisingly, a Linux PC-cluster based on Pentium IV CPUs demonstrated approximately the same performance per PU. Furthermore, since the present application is highly computationally intensive, the times necessary for interzonal communication were relatively insignificant, resulting in a very efficient parallelization of the code.
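A minimal sketch of the block-interface exchange described above, swapping the two rows of overlapping cells with the neighbors in one grid direction via standard MPI point-to-point calls. The flat buffer layout is an assumption; blocks at the domain boundary can pass MPI_PROC_NULL as a neighbor rank.

    #include <mpi.h>
    #include <vector>

    // Exchange two rows of overlapping (ghost) cells with the left and right
    // neighbor blocks in one grid direction.  'rowSize' is the number of
    // doubles per grid row; the field stores two ghost rows on each side.
    void exchangeOverlap(std::vector<double>& field, int rowSize,
                         int leftRank, int rightRank, MPI_Comm comm)
    {
        const int n = 2 * rowSize;                                   // two overlap rows
        double* leftGhost  = field.data();                           // ghost rows, left side
        double* leftInner  = field.data() + n;                       // first two owned rows
        double* rightInner = field.data() + field.size() - 2 * n;    // last two owned rows
        double* rightGhost = field.data() + field.size() - n;        // ghost rows, right side

        // Send owned boundary rows, receive into ghost rows (nonblocking).
        MPI_Request req[4];
        MPI_Irecv(leftGhost,  n, MPI_DOUBLE, leftRank,  0, comm, &req[0]);
        MPI_Irecv(rightGhost, n, MPI_DOUBLE, rightRank, 1, comm, &req[1]);
        MPI_Isend(leftInner,  n, MPI_DOUBLE, leftRank,  1, comm, &req[2]);
        MPI_Isend(rightInner, n, MPI_DOUBLE, rightRank, 0, comm, &req[3]);
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    }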

4. Test Case Configuration

Figure 1. Parallelization by domain decomposition in blocks, employing overlapping cells for information exchange.

The geometry selected for the present test was relatively simple, consisting of a rectangular channel with a 0.06 m × 0.06 m cross section. All the opposite walls were parallel; the length of the channel was approximately 0.465 m. At a distance of 0.105 m from the leading edge of the tube, hydrogen was injected from the upper and lower wall through a dense row of small holes (0.002 m diameter). A schematic picture of the expected flow field can be seen in Fig. 2. Due to the blockage of the injected jet of hydrogen, a strong bow shock is generated. At the injection hole, a so-called barrel shock forms with two Mach triple points. After the injection, a recompression shock turns the flow into a direction parallel to the wall. The flow in the injection holes reached critical conditions. The holes were very closely spaced, at a distance of 4 mm between their centers. The intention was to approximate two-dimensional flow conditions as closely as possible, thus enabling a comparison with two-dimensional simulations. Unfortunately, it will be shown below that even in this case the flow at the injection location was fully three-dimensional. The basic geometry, including information about the flow conditions, is shown schematically in Fig. 3. The structure of the multi-block computational grid used in the present simulation is displayed in Fig. 4. The location of the H2 injection holes, the physical size of the overall domain in meters and the various boundary conditions are also indicated. The computational grid consisted of 256x128x32 internal cells, arranged in 128 blocks.

[Figure 2 labels: flow direction; separation shock; bow shock; mixing zone; recirculation area; Mach disk; recompression shock; barrel shock; injectant.]

Figure 2. Physics of the flow field at the injection hole.

A simplified view of the flow and the shock structure close to the H2 injection hole has been published by, among others, Ramakrishnan and Singh [23]. The dominant features of this shock system can be seen in Fig. 2. The geometry and the boundary conditions of this configuration are described in detail by von Lavante et al. [9]. At the inflow, the Mach number was M = 2.97, the static pressure was p = 0.137 MPa and the static temperature was T = 1300 K. The hydrogen jet enters at sonic conditions, at a static pressure of p = 0.4 MPa; its static temperature was T = 350 K. The Reynolds number per one meter reference length was in this case $Re_x = 1.7\cdot 10^7$. This case is of particular interest, since it has been frequently used in numerical simulations by several other authors and was experimentally investigated by, for example, Quenett [22].

5. Results

The present three-dimensional computations were started initially from the two-dimensional flow distribution published in [9], extrapolated crosswise into the third dimension. At the initial time, called $t_0$, the length of the separated region $x_{sep}$ was 94 mm and the penetration depth of the H2-jet $h_{pen}$ was 8.9 mm. In the course of the computation, $x_{sep}$ decreased until it reached its final average value of 29 mm at time $t_0 + 790\,\mu s$. Due to the unsteady character of the flow, the length of the separated region fluctuated about the timewise average by approximately ±1 mm. The penetration depth reached $h_{pen} = 3.3$ mm and fluctuated by approximately 0.3 mm. Even in the three-dimensional case, the tendency of the relationship between $x_{sep}$ and $h_{pen}$ is preserved, as the separation length decreased with decreasing penetration depth.


[Figure 3 annotations: comparison of injection hole diameter (manufactured geometry 1.75 mm, testbed geometry 1.125 mm); air inflow T = 1300 K, p = 0.137 MPa, M = 2.97; hydrogen injection T = 350 K, p = 0.4 MPa, M = 1.1; dimensions 30 mm, 105 mm, 220 mm; hole spacing 4 mm; coordinate axes x, y, z.]

Figure 3. Present configuration including the boundary conditions.

Figure 4. Computational grid consisting of 128 blocks.


Figure 5. Simulated velocity vectors at the injection hole.

Figure 6. Comparison of experimentally determined schlieren picture (top) and numerically simulated schlieren (bottom).


Figure 7. Relative parallel efficiency on a Cray T3E.

Figure 8. Relative parallel speedup on a Cray T3E.


In Fig. 5, instantaneous velocity vectors in the vicinity of the injection hole are displayed. Clearly visible are the barrel shock, the oblique bow shock, the free shear layer behind the hydrogen jet and the separation regions upstream and downstream of the injection. The high resolution of the boundary layer can be seen in the two magnified pictures. The flow forms a horse-shoe vortex around the jet, thus relieving the pressure ahead of the injection. This flow feature explains the much smaller separation region upstream of the injection hole. After leaving the injection hole at critical conditions, the jet expands and forms several Mach disks. The H2-jet is unsteady, moving in a circular periodic motion with a frequency of approximately 8 kHz. Consequently, the flow field downstream of it is also highly unsteady. The simulated schlieren picture of the flow at and downstream of the injection position is compared with its experimental counterpart in Figs. 6a and 6b. Although shown at a different vertical scale, the similarity can be clearly recognized. In particular, the location and shape of the oblique shocks and the extent of the turbulent shear layer agree rather well. The structure of the numerically generated shear layer is similar to the experimental picture at the injection, where the resolution is high enough, but becomes coarser downstream as the grid resolution rapidly decreases.

The efficiency of the parallelization was studied on a Cray T3E by successively increasing the number of PUs while keeping the overall number of cells constant at the basic resolution of $n_{tot} = 1048576$ cells. This means that with an increasing number of PUs $n_{PU}$, the size of each of the blocks being processed by one PU decreased as $n_{block} = n_{tot}/n_{PU}$. Clearly, the execution time per iteration decreased while the communication time increased, so that the parallel efficiency, defined as follows, should have decreased. The relative parallel efficiency $\varepsilon_p$ and relative parallel speedup $S_p$ are defined as:

$$\varepsilon_p=\frac{T_1}{n_{PU}\cdot T_p}, \qquad S_p=\frac{T_1}{T_p}$$

where $T_1$ is the execution time on one PU and $T_p$ the execution time on $n_{PU}$ processing units. The relative efficiency obtained for the basic configuration is shown in Fig. 7, marked by squares. At first, it might seem strange that at 32 PUs the efficiency suddenly increases to a value larger than 1.0. This is due to the smaller size of the blocks, which now fit entirely into the cache, making the execution speed much faster. To prove this point, the authors also carried out the same study for a much smaller total number of cells, $n_{tot} = 32768$, marked by circles, and for a much larger number of cells, $n_{tot} = 8388608$, marked by rhombi. The efficiency of the small-sized computation displays the expected behavior, strongly decreasing with increasing $n_{PU}$. The large-sized case was much more efficient, since the computational time per block per iteration is significantly larger. In general, the parallel efficiency of the present code was rather good since, due to the physical complexity of the problem, the computational effort per cell per iteration was very large, resulting in 2.9 s per iteration for the basic case of $n_{tot} = 1048576$. The corresponding plot of the parallel speedup is displayed in Fig. 8.
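The definitions of $\varepsilon_p$ and $S_p$ translate directly into a few lines of code. The following sketch evaluates them from hypothetical timing data; only the 2.9 s per iteration figure for 128 PUs is taken from the text, the remaining values are illustrative placeholders.

    #include <cstdio>

    int main()
    {
        // Hypothetical execution times T_p (seconds per iteration) measured
        // on n_PU processing units; T_1 is the single-PU reference time.
        const int    nPU[] = { 1, 8, 32, 128 };
        const double Tp[]  = { 300.0, 40.0, 8.0, 2.9 };  // illustrative values only

        const double T1 = Tp[0];
        for (int i = 0; i < 4; ++i) {
            const double Sp  = T1 / Tp[i];    // relative speedup  S_p = T_1 / T_p
            const double eps = Sp / nPU[i];   // relative efficiency eps_p = T_1 / (n_PU * T_p)
            std::printf("nPU=%4d  S_p=%7.2f  eps_p=%5.2f\n", nPU[i], Sp, eps);
        }
        return 0;
    }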

6. Conclusion

The main goal of the present work was the numerical simulation of a small yet important component of a simple supersonic combustion chamber. Here, the flow field in a part of a supersonic channel with transverse hydrogen injection, containing one injection opening, was predicted using the present solver of compressible, viscous, chemically reacting flows. This configuration is the subject of experimental studies being undertaken by many academic and research institutions and is therefore of significant importance. The resulting flow field was highly unsteady, with periodic motion of the hydrogen jet and the free shear layer downstream of the location of the injection port. The present LES was able to capture some of the large scale turbulent effects in the boundary layer downstream of the hydrogen jet and in the free shear layer, however at a significant computational cost. Even so, the resolution should be further increased in future work to capture more of the turbulent kinetic energy spectrum. The effect of turbulence on the nonequilibrium chemistry was accounted for only by considering the temperature fluctuations. In the present special case, this might be an acceptable simplification, but generally the PDF-transport equations for mass fractions must be considered as well.

Acknowledgements

The present work was supported by a grant from the DFG (German Research Association). The major part of the computer resources was provided by a grant from the HLRZ (High Performance Computer Center for Science and Research) at the Research Center Jülich GmbH, Germany.

References

[1] Rausch, V. L., McClinton, C. R. and Hicks, J. W., "Scramjets breathe new life into hypersonics", Aerospace America, July 1997, pp. 40-46.
[2] Covault, C., "Hypersonic strategy sets stage for 'next great leap'", Aviation Week & Space Technology, March 26, 2001, pp. 28-30.
[3] Brummund, U. and Nuding, J.-R., "Interaction of Compressible Shear Layer with Shock Waves: an Experimental Study", AIAA Paper 97-0392, 1997.
[4] Cox, C. F., Cinnella, P., and Arabshahi, A., "Multi-Block Calculations for Flows in Local Chemical Equilibrium", AIAA Paper 93-2999, 1993.
[5] Godfroy, F., and Tissier, P. Y., "CFD Analysis of Vortex Shedding Inside a Subscale Segmented Motor", AIAA Paper 94-2781.
[6] Madabhushi, R. K., Choi, D., Barber, T. J., and Orszag, S., "Computational Modeling of Mixing Process for Scramjet Combustor Applications", AIAA Paper 97-2638.
[7] Chen, K.-H. and Shuen, J.-S., "A Comprehensive Study of Numerical Algorithms for Three-Dimensional, Turbulent, Nonequilibrium Viscous Flows with Detailed Chemistry", AIAA Paper 95-0800, 1995.


[8] Chamberlain, R., Dang, A. and McClure, D., "Effect of Exhaust Chemistry on Reaction Jet Control", AIAA Paper 99-0806, 1999.
[9] von Lavante, E., Hilgenstock, M. and Groenner, J., "Simple Numerical Method for Simulating Supersonic Combustion", AIAA Paper 94-3179, 1994.
[10] Hilgenstock, M., von Lavante, E. and Groenner, J., "Efficient Computations of Navier-Stokes Equations with Nonequilibrium Chemistry", ASME Paper 94-GT-251, 1994.
[11] Narayan, J. R., "Prediction of Turbulent Reacting Flows Related to Hypersonic Airbreathing Propulsion Systems", AIAA Paper 94-2948, 1994.
[12] Möbius, H., Gerlinger, P. and Brüggemann, D., "Monte Carlo PDF Simulation of Compressible Turbulent Diffusion Flames Using Detailed Chemical Kinetics", AIAA Paper 99-0198, 1999.
[13] Chakravarthy, V. K., and Menon, S., "Characteristics of a Subgrid Model for Turbulent Premixed Combustion", AIAA Paper 97-3331, 1997.
[14] Menon, S., "Subgrid Combustion Modelling for Large-Eddy Simulation", Int. J. Engine Research 1, pp. 209-227, 2000.
[15] Colucci, P. J., Jaberi, F. A., Givi, P. and Pope, S. B., "Filtered Density Function for Large Eddy Simulations of Turbulent Reactive Flows", Phys. Fluids 10, pp. 499-515, 1998.
[16] Kallenberg, M. and von Lavante, E., "The Dynamics of Unsteady Supersonic Combustion", AIAA Paper 98-3319, 1998.
[17] Evans, J. S., Schexnayder, C. J., "Influence of Chemical Kinetics and Unmixedness on Burning in Supersonic Hydrogen Flames", AIAA Journal, Febr. 1980, pp. 188-193.
[18] Roe, P. L., Pike, J., "Efficient Construction and Utilisation of Approximate Riemann Solutions", Computing Methods in Applied Sciences and Engineering, VI, pp. 499-516, INRIA, 1984.
[19] Grossmann, B., and Cinella, P., "Flux-Split Algorithms for Flows with Nonequilibrium Chemistry and Vibrational Relaxation", J. Comp. Phys., Vol. 88, pp. 131-168, 1990.
[20] Larrouturou, B., and Fezoui, L., "On the Equations of Multi-Component Perfect or Real Gas Inviscid Flow", Nonlinear Hyperbolic Problems, Lecture Notes in Mathematics, 1402, Springer Verlag, Heidelberg, 1989.
[21] von Lavante, E., "The Accuracy of Upwind Schemes Applied to the Navier-Stokes Equations", AIAA Journal, Vol. 28, No. 7, 1990.
[22] Quenett, Ch., "Stoßrohruntersuchungen zur H2-Verbrennung in einer heissen Überschallströmung", Ph.D. Thesis, University of Essen, Germany, 1995.


[23] Ramakrishnan and Singh, "Scramjet Combustor Flowfields", AIAA Journal, Vol. 32, No. 5, pp. 930-935, May 1994.
