Modeling, Simulation and Optimization of Complex Processes: Proceedings of the Fourth International Conference on High Performance Scientific Computing, March 2-6, 2009, Hanoi, Vietnam [1 ed.] 3642257062, 9783642257063

This proceedings volume contains a selection of papers presented at the Fourth International Conference on High Performa


English Pages 338 [348] Year 2012


Table of contents :
Front Matter....Pages i-ix
A Cutting Hyperplane Method for Generalized Monotone Nonlipschitzian Multivalued Variational Inequalities....Pages 1-11
Robust Parameter Estimation Based on Huber Estimator in Systems of Differential Equations....Pages 13-23
Comparing MIQCP Solvers to a Specialised Algorithm for Mine Production Scheduling....Pages 25-39
A Binary Quadratic Programming Approach to the Vehicle Positioning Problem....Pages 41-51
Determining Fair Ticket Prices in Public Transport by Solving a Cost Allocation Problem....Pages 53-63
A Domain Decomposition Method for Strongly Mixed Boundary Value Problems for the Poisson Equation....Pages 65-75
Detecting, Monitoring and Preventing Database Security Breaches in a Housing-Based Outsourcing Model....Pages 77-89
Real-Time Sequential Convex Programming for Optimal Control Applications....Pages 91-102
SuperQuant Financial Benchmark Suite for Performance Analysis of Grid Middlewares....Pages 103-113
A Dimension Adaptive Combination Technique Using Localised Adaptation Criteria....Pages 115-125
Haralick’s Texture Features Computation Accelerated by GPUs for Biological Applications....Pages 127-137
Free-Surface Flows over an Obstacle: Problem Revisited....Pages 139-151
The Relation Between the Gene Network and the Physical Structure of Chromosomes....Pages 153-167
Generalized Bilinear System Identification with Coupling Force Variables....Pages 169-182
Reduced-Order Wave-Propagation Modeling Using the Eigensystem Realization Algorithm....Pages 183-193
Complementary Condensing for the Direct Multiple Shooting Method....Pages 195-206
Some Inverse Problem for the Polarized-Radiation Transfer Equation....Pages 207-217
Finite and Boundary Element Energy Approximations of Dirichlet Control Problems....Pages 219-231
Application of High Performance Computational Fluid Dynamics to Nose Flow....Pages 233-245
MaxNet and TCP Reno/RED on Mice Traffic....Pages 247-255
Superstable Models for Short-Duration Large-Domain Wave Propagation....Pages 257-269
Discontinuous Galerkin as Time-Stepping Scheme for the Navier–Stokes Equations....Pages 271-281
Development of a Three Dimensional Euler Solver Using the Finite Volume Method on a Multiblock Structured Grid....Pages 283-292
Hybrid Algorithm for Risk Conscious Chemical Batch Planning Under Uncertainty....Pages 293-304
On Isogeometric Analysis and Its Usage for Stress Calculation....Pages 305-314
On the Efficient Evaluation of Higher-Order Derivatives of Real-Valued Functions Composed of Matrix Operations....Pages 315-324
Modeling of Non-ideal Variable Pitch Valve Springs for Use in Automotive Cam Optimization....Pages 325-337

Hans Georg Bock · Hoang Xuan Phu · Rolf Rannacher · Johannes P. Schlöder
Editors

Modeling, Simulation and Optimization of Complex Processes

Proceedings of the Fourth International Conference on High Performance Scientific Computing, March 2-6, 2009, Hanoi, Vietnam

Editors

Hans Georg Bock, Rolf Rannacher, Johannes P. Schlöder
University of Heidelberg, Interdisciplinary Center for Scientific Computing (IWR), Heidelberg, Germany

Hoang Xuan Phu
Vietnam Academy of Science and Technology (VAST), Hanoi, Vietnam

ISBN 978-3-642-25706-3
e-ISBN 978-3-642-25707-0
DOI 10.1007/978-3-642-25707-0
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2012931412
Math. Subj. Class. (2010): 35-06, 49-06, 60-06, 65-06, 68-06, 70-06, 76-06, 86-06, 90-06, 93-06, 94-06
© Springer-Verlag Berlin Heidelberg 2012

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Front cover figure: The Huc Bridge on Hoan Kiem Lake, Hanoi. By courtesy of Johannes P. Schlöder.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

High Performance Scientific Computing is an interdisciplinary area that combines many fields such as mathematics and computer science as well as scientific and engineering applications. It is an enabling technology both for competitiveness in industrialized countries and for speeding up development in emerging countries. High performance scientific computing develops methods for modeling, computer-aided simulation, and optimization of systems and processes. In practical applications in industry and commerce, science and engineering, it helps to conserve resources, to avoid pollution, to reduce risks and costs, to improve product quality, to shorten development times, or simply to operate systems better. Topical aspects of scientific computing were presented and discussed at the Fourth International Conference on High Performance Scientific Computing held at the Institute of Mathematics, Vietnam Academy of Science and Technology (VAST), March 2–6, 2009. The conference was organized by the Institute of Mathematics of VAST, the Interdisciplinary Center for Scientific Computing (IWR) of the University of Heidelberg, and Ho Chi Minh City University of Technology. More than 200 participants from countries all over the world attended the conference. The scientific program consisted of more than 140 talks, 10 of which were invited plenary lectures given by Robert E. Bixby (Houston), Olaf Deutschmann (Karlsruhe), Iain Duff (Chilton), Roland Eils (Heidelberg), László Lovász (Budapest), Peter Markowich (Cambridge & Vienna), Volker Mehrmann (Berlin), Alfio Quarteroni (Lausanne & Milan), Horst Simon (Berkeley), and Ya-xiang Yuan (Beijing).
Topics included mathematical modeling, numerical simulation, methods for optimization and control, parallel computing, software development, applications of scientific computing in physics, mechanics, hydrology, chemistry, biology, medicine, transport, logistics, site location, communication networks, scheduling, industry, business, and finance. This proceedings volume contains 27 carefully selected contributions referring to lectures presented at the conference. We would like to thank all authors and the referees.


Special thanks go to the sponsors whose support significantly contributed to the success of the conference:

+ Heidelberg Graduate School of Mathematical and Computational Methods for the Sciences
+ Daimler and Benz Foundation, Ladenburg
+ The International Council for Industrial and Applied Mathematics (ICIAM)
+ Berlin Mathematical School
+ Berlin/Brandenburg Academy of Sciences and Humanities
+ The Abdus Salam International Centre for Theoretical Physics, Trieste
+ Institute of Mathematics, Vietnam Academy of Science and Technology
+ Faculty of Computer Science and Engineering, HCMC University of Technology

Heidelberg

Hans Georg Bock
Hoang Xuan Phu
Rolf Rannacher
Johannes P. Schlöder

Contents

A Cutting Hyperplane Method for Generalized Monotone Nonlipschitzian Multivalued Variational Inequalities
Pham Ngoc Anh and Takahito Kuno

Robust Parameter Estimation Based on Huber Estimator in Systems of Differential Equations
Tanja Binder and Ekaterina Kostina

Comparing MIQCP Solvers to a Specialised Algorithm for Mine Production Scheduling
Andreas Bley, Ambros M. Gleixner, Thorsten Koch, and Stefan Vigerske

A Binary Quadratic Programming Approach to the Vehicle Positioning Problem
Ralf Borndörfer and Carlos Cardonha

Determining Fair Ticket Prices in Public Transport by Solving a Cost Allocation Problem
Ralf Borndörfer and Nam-Dũng Hoàng

A Domain Decomposition Method for Strongly Mixed Boundary Value Problems for the Poisson Equation
Dang Quang A and Vu Vinh Quang

Detecting, Monitoring and Preventing Database Security Breaches in a Housing-Based Outsourcing Model
Tran Khanh Dang, Tran Thi Que Nguyet, and Truong Quynh Chi

Real-Time Sequential Convex Programming for Optimal Control Applications
Tran Dinh Quoc, Carlo Savorgnan, and Moritz Diehl

SuperQuant Financial Benchmark Suite for Performance Analysis of Grid Middlewares
Abhijeet Gaikwad, Viet Dung Doan, Mireille Bossy, Françoise Baude, and Frédéric Abergel

A Dimension Adaptive Combination Technique Using Localised Adaptation Criteria
Jochen Garcke

Haralick’s Texture Features Computation Accelerated by GPUs for Biological Applications
Markus Gipp, Guillermo Marcus, Nathalie Harder, Apichat Suratanee, Karl Rohr, Rainer König, and Reinhard Männer

Free-Surface Flows over an Obstacle: Problem Revisited
Panat Guayjarernpanishk and Jack Asavanant

The Relation Between the Gene Network and the Physical Structure of Chromosomes
Dieter W. Heermann, Manfred Bohn, and Philipp M. Diesinger

Generalized Bilinear System Identification with Coupling Force Variables
Jer-Nan Juang

Reduced-Order Wave-Propagation Modeling Using the Eigensystem Realization Algorithm
Stephen A. Ketcham, Minh Q. Phan, and Harley H. Cudney

Complementary Condensing for the Direct Multiple Shooting Method
Christian Kirches, Hans Georg Bock, Johannes P. Schlöder, and Sebastian Sager

Some Inverse Problem for the Polarized-Radiation Transfer Equation
A.E. Kovtanyuk and I.V. Prokhorov

Finite and Boundary Element Energy Approximations of Dirichlet Control Problems
Günther Of, Thanh Xuan Phan, and Olaf Steinbach

Application of High Performance Computational Fluid Dynamics to Nose Flow
I. Pantle and M. Gabi

MaxNet and TCP Reno/RED on Mice Traffic
Khoa T. Phan, Tuan T. Tran, Duc D. Nguyen, and Nam Thoai

Superstable Models for Short-Duration Large-Domain Wave Propagation
Minh Q. Phan, Stephen A. Ketcham, Richard S. Darling, and Harley H. Cudney

Discontinuous Galerkin as Time-Stepping Scheme for the Navier–Stokes Equations
Th. Richter

Development of a Three Dimensional Euler Solver Using the Finite Volume Method on a Multiblock Structured Grid
Tran Thanh Tinh, Dang Thai Son, and Nguyen Anh Thi

Hybrid Algorithm for Risk Conscious Chemical Batch Planning Under Uncertainty
Thomas Tometzki and Sebastian Engell

On Isogeometric Analysis and Its Usage for Stress Calculation
Anh-Vu Vuong and B. Simeon

On the Efficient Evaluation of Higher-Order Derivatives of Real-Valued Functions Composed of Matrix Operations
Sebastian F. Walter

Modeling of Non-ideal Variable Pitch Valve Springs for Use in Automotive Cam Optimization
Henry Yau and Richard W. Longman



A Cutting Hyperplane Method for Generalized Monotone Nonlipschitzian Multivalued Variational Inequalities

Pham Ngoc Anh and Takahito Kuno

Abstract We present a new method for solving multivalued variational inequalities, where the underlying function is upper semicontinuous and satisfies a certain generalized monotone assumption. First, we construct an appropriate hyperplane which separates the current iterative point from the solution set. Then the next iterate is obtained as the projection of the current iterate onto the intersection of the feasible set with the halfspace containing the solution set. We also analyze the global convergence of the algorithm under minimal assumptions.

Keywords Multivalued variational inequalities • Generalized monotone • Upper semicontinuous

1 Introduction

We consider the classical multivalued variational inequality problem (see e.g. [7, 8, 11]), MVI for short, which is to find points $x^* \in C$ and $w^* \in F(x^*)$ such that

$$\langle w^*, x - x^* \rangle \ge 0 \quad \forall x \in C,$$

P.N. Anh
Department of Scientific Fundamentals, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam
e-mail: [email protected]

T. Kuno
Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki, Japan
e-mail: [email protected]

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_1, © Springer-Verlag Berlin Heidelberg 2012

where $C$ is a closed convex subset of $\mathbb{R}^n$, $F$ is a point-to-set mapping from $C$ into subsets of $\mathbb{R}^n$, and $\langle \cdot, \cdot \rangle$ denotes the usual inner product in $\mathbb{R}^n$.

Various methods have been developed so far to solve variational inequality problems (see e.g. [3, 5, 9, 10, 12]). In general, these methods assume the underlying mapping to be single-valued and monotone, and hence cannot be applied directly to MVI, where $F$ is multivalued. Unfortunately, there are still not many methods for solving MVI (see e.g. [4, 10]). Most of these methods require that $F$ be either Lipschitz with respect to the Hausdorff distance or strongly monotone on $C$. However, neither the Lipschitz constant nor the strong monotonicity constant is easy to compute.

In a recent paper [4], Anh et al. proposed a generalized projection method to solve MVI whose mapping $F$ is not assumed to be Lipschitz. The main features of their method are that at each iteration at most one projection onto $C$ is needed, and that the search direction can be determined from any point in the image of the current iterate. In [1, 2], Anh also proposed an interior proximal method for solving monotone generalized variational inequalities and pseudomonotone multivalued variational inequalities when $C$ is a polyhedron. His method is based on a special interior-quadratic function which replaces the usual quadratic function. This leads to an interior proximal type algorithm, which can be viewed as combining an Armijo-type line-search technique and the special interior-quadratic function. The only assumption required is that $F$ is monotone on $C$.

Our main concern in this paper is to use the projection operator onto the closed convex set $C$,

$$\mathrm{Pr}_C(x) = \arg\min_{y \in C} \|y - x\|.$$

We propose an algorithm for solving MVI, making no assumptions on the problem other than upper semicontinuity, compact-valuedness and a certain generalized monotonicity of $F$. In Sect. 2, we give formal definitions of our target MVI and the generalized monotonicity of $F$. We then extend an idea often used for single-valued variational inequalities to MVI and develop an iterative algorithm. Section 3 is devoted to the proof of its global convergence to a solution of MVI. An application to nonlinear complementarity problems is discussed in the last section.
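The projection operator onto $C$ is the basic building block of the method. As a minimal sketch, assuming $C$ is a box $[lo, hi]^n$ (a choice made here only because its projection has a closed form; the function names are hypothetical and not from the paper), the projection and its standard variational characterization $\langle x - \mathrm{Pr}_C(x), z - \mathrm{Pr}_C(x) \rangle \le 0$ for all $z \in C$ can be checked as follows:

```python
from typing import List, Sequence


def project_box(x: Sequence[float], lo: float, hi: float) -> List[float]:
    """Euclidean projection of x onto the box [lo, hi]^n (componentwise clipping)."""
    return [min(max(xi, lo), hi) for xi in x]


def inner(u: Sequence[float], v: Sequence[float]) -> float:
    """Usual inner product in R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


def satisfies_projection_inequality(x, p, z, tol: float = 1e-12) -> bool:
    """Check <x - p, z - p> <= 0, which holds for p = Pr_C(x) and any z in C."""
    d = [xi - pi for xi, pi in zip(x, p)]
    e = [zi - pi for zi, pi in zip(z, p)]
    return inner(d, e) <= tol


x = [1.5, -0.3, 0.4]
p = project_box(x, 0.0, 1.0)
# Test the inequality against a few points of C = [0, 1]^3.
for z in ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.2, 0.9, 0.5]):
    assert satisfies_projection_inequality(x, p, z)
```

For a general closed convex set the projection has no closed form and would have to be computed by solving the minimization problem numerically.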

2 Generalized Monotonicity and Algorithm

Let $F : \mathbb{R}^n \to 2^{\mathbb{R}^n}$ be a mapping upper semicontinuous on a closed convex set $C \subseteq \mathbb{R}^n$, which is a subset of $\mathrm{dom}\,F = \{x \in \mathbb{R}^n \mid F(x) \ne \emptyset\}$. Again, let us write out the target problem:

(MVI) Find points $x^* \in C$ and $w^* \in F(x^*)$ such that

$$\langle w^*, x - x^* \rangle \ge 0 \quad \forall x \in C.$$

For simplicity, we assume MVI to have a solution $(x^*, w^*)$. Let $S \subseteq C$ denote the first component of the solution set.


Now we recall the well-known definition of generalized monotonicity of mappings, which will be required in the following analysis (see e.g. [13]). We assume that the mapping $F$ of MVI satisfies this condition.

Definition 1. $F$ is called generalized monotone on $C$ if

$$\langle w, x - x^* \rangle \ge 0 \quad \forall w \in F(x),\ \forall x \in C.$$

It is clear that $F$ is generalized monotone if $F$ is monotone, i.e.,

$$\langle w - w', x - x' \rangle \ge 0 \quad \forall x, x' \in C,\ w \in F(x),\ w' \in F(x').$$

More generally, $F$ is also generalized monotone if $F$ is pseudomonotone, i.e., for all $x, x' \in C$, $w \in F(x)$, $w' \in F(x')$,

$$\langle w', x - x' \rangle \ge 0 \ \Rightarrow\ \langle w, x - x' \rangle \ge 0.$$

However, even if $F$ is generalized monotone, $F$ might not be monotone or pseudomonotone. It is not difficult to check such examples (see e.g. [13]).

If $F$ is a point-to-point mapping, then MVI can be formulated as the following variational inequality:

(VI) Find $x^* \in C$ such that

$$\langle F(x^*), x - x^* \rangle \ge 0 \quad \forall x \in C.$$

In this case, it is known that solutions coincide with zeros of the following projected residual function:

$$T(x) = x - \mathrm{Pr}_C(x - F(x)).$$

In other words, $x' \in C$ is a solution of (VI) if and only if $T(x') = 0$ (see e.g. [12]). Applying this idea to the multivalued variational inequality MVI, we have the following solution scheme. Let $x^k$ be a current approximation to the solution of MVI. First, we compute $w^k = \arg\sup_{w \in F(x^k)} \langle w, x^k \rangle$ and $\mathrm{Pr}_C(x^k - c w^k)$ for some positive constant $c$. Next, we search the line segment between $x^k$ and $\mathrm{Pr}_C(x^k - c w^k)$ for a point $(\bar{w}^k, z^k)$ such that the hyperplane $\partial H_k = \{x \in \mathbb{R}^n \mid \langle \bar{w}^k, x - z^k \rangle = 0\}$ strictly separates $x^k$ from the solution set $S$ of the problem. To find such $(\bar{w}^k, z^k)$, we may use a computationally inexpensive Armijo-type procedure. Then we compute the next iterate $x^{k+1}$ by projecting $x^k$ onto the intersection of the feasible set $C$ with the halfspace $H_k = \{x \in \mathbb{R}^n \mid \langle \bar{w}^k, x - z^k \rangle \le 0\}$. The algorithm is then described as follows.

Algorithm 1

Step 0. Choose $\sigma > 0$, $x^0 \in C$, $w^0 \in F(x^0)$, $0 < c < 1/\sigma$, and $\gamma \in (0, 1)$.

Step 1. Compute

$$w^k := \arg\sup_{w \in F(x^k)} \langle w, x^k \rangle, \qquad r(x^k) := x^k - \mathrm{Pr}_C(x^k - c w^k).$$

Let $G_k(m) := F\big(x^k - \gamma^m r(x^k)\big)$ for an integer $m$, and find the smallest nonnegative number $m_k$ of $m$ such that

$$v_k := \sup_{w \in G_k(m_k)} \langle w, r(x^k) \rangle \ge \sigma \|r(x^k)\|^2. \tag{1}$$

Choose $\bar{w}^k \in G_k(m_k)$ such that $\langle \bar{w}^k, r(x^k) \rangle = v_k$. Set $z^k := x^k - \gamma^{m_k} r(x^k)$.

Step 2. Set

$$H_k := \{x \in \mathbb{R}^n \mid \langle \bar{w}^k, x - z^k \rangle \le 0\}.$$

Find $x^{k+1} := \mathrm{Pr}_{C \cap H_k}(x^k)$.

Step 3. Set $k := k + 1$, and go to Step 1. □
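The steps of Algorithm 1 can be illustrated with a deliberately small sketch. It assumes a single-valued map $F$ on the interval $C = [a, b] \subset \mathbb{R}$, so that the supremum in Step 1 is trivial and $C \cap H_k$ is again an interval whose projection is a simple clip. The names `sigma` (the constant in the Armijo-type test), `gamma` (the step-reduction factor in $(0,1)$), `tol`, `max_iter` and the function name are choices made here for illustration, not part of the paper.

```python
from typing import Callable


def clip(x: float, lo: float, hi: float) -> float:
    """Projection of a scalar onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


def cutting_hyperplane_1d(F: Callable[[float], float], a: float, b: float,
                          x0: float, sigma: float = 0.4, c: float = 1.0,
                          gamma: float = 0.5, tol: float = 1e-9,
                          max_iter: int = 200) -> float:
    """One-dimensional sketch of Algorithm 1 for single-valued F on C = [a, b]."""
    assert sigma > 0 and 0 < c < 1 / sigma and 0 < gamma < 1  # Step 0
    x = clip(x0, a, b)
    for _ in range(max_iter):
        # Step 1: projected residual r(x) = x - Pr_C(x - c * F(x)).
        r = x - clip(x - c * F(x), a, b)
        if abs(r) <= tol:          # r(x) = 0 characterises a solution
            break
        # Armijo-type search: smallest m with <F(z), r> >= sigma * |r|^2.
        z = w_bar = None
        for m in range(60):        # cap on m is a safeguard added here
            z_try = x - gamma ** m * r
            if F(z_try) * r >= sigma * r * r:
                z, w_bar = z_try, F(z_try)
                break
        if z is None:
            raise RuntimeError("Armijo-type search failed")
        # Step 2: H_k = {y : w_bar * (y - z) <= 0}; in 1D this is y <= z or y >= z.
        lo, hi = (a, min(b, z)) if w_bar > 0 else (max(a, z), b)
        x = clip(x, lo, hi)        # projection of x onto C intersected with H_k
    return x


# F(x) = x - 0.3 is monotone; the solution on [0, 1] is x* = 0.3.
print(cutting_hyperplane_1d(lambda x: x - 0.3, 0.0, 1.0, x0=1.0))
```

In higher dimensions the projection onto $C \cap H_k$ generally has no closed form and would require a quadratic programming or alternating-projection subroutine, which this sketch does not attempt.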

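3 Convergence of the Algorithm

Let us discuss the global convergence of Algorithm 1.

Lemma 1. Let $\{x^k\}$ be the sequence generated by Algorithm 1. Then the following hold:
(i) if $r(x^k) = 0$, then $x^k \in S$;
(ii) $x^k \notin H_k$, $S \subseteq C \cap H_k$;
(iii) $x^{k+1} = \mathrm{Pr}_{C \cap H_k}(y^k)$, where $y^k = \mathrm{Pr}_{H_k}(x^k)$.

Proof. (i) It follows from $r(x^k) = 0$ that $\mathrm{Pr}_C(x^k - c w^k) = x^k$. Then

$$\langle x^k - c w^k - x^k, x - x^k \rangle \le 0 \quad \forall x \in C.$$

Hence,

$$\langle w^k, x - x^k \rangle \ge 0 \quad \forall x \in C.$$

(ii) By noting $r(x^k) \ne 0$, we have

$$\langle \bar{w}^k, x^k - z^k \rangle = \langle \bar{w}^k, x^k - (x^k - \gamma^{m_k} r(x^k)) \rangle = \langle \bar{w}^k, \gamma^{m_k} r(x^k) \rangle \ge \sigma \gamma^{m_k} \|r(x^k)\|^2 > 0.$$

This implies $x^k \notin H_k$. Since $F$ is assumed to be generalized monotone,

$$\langle \bar{w}^k, z^k - x^* \rangle \ge 0 \ \Rightarrow\ \langle \bar{w}^k, x^* - z^k \rangle \le 0 \ \Rightarrow\ x^* \in H_k.$$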


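(iii) We know that

$$H = \{x \in \mathbb{R}^n \mid \langle w, x - x^0 \rangle \le 0\} \ \Rightarrow\ \mathrm{Pr}_H(y) = y - \frac{\langle w, y - x^0 \rangle}{\|w\|^2}\, w.$$

Hence,

$$y^k = \mathrm{Pr}_{H_k}(x^k) = x^k - \frac{\langle \bar{w}^k, x^k - z^k \rangle}{\|\bar{w}^k\|^2}\, \bar{w}^k = x^k - \frac{\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|^2}\, \bar{w}^k.$$

Otherwise, for every $y \in C \cap H_k$ there exists $\lambda \in (0, 1)$ such that

$$\hat{x} = \lambda x^k + (1 - \lambda) y \in C \cap \partial H_k,$$

where $\partial H_k = \{x \in \mathbb{R}^n \mid \langle \bar{w}^k, x - z^k \rangle = 0\}$, because $x^k \in C$ but $x^k \notin H_k$. Therefore,

$$\begin{aligned}
\|y - y^k\|^2 &\ge (1 - \lambda)^2 \|y - y^k\|^2 \\
&= \|\hat{x} - \lambda x^k - (1 - \lambda) y^k\|^2 \\
&= \|(\hat{x} - y^k) - \lambda (x^k - y^k)\|^2 \\
&= \|\hat{x} - y^k\|^2 + \lambda^2 \|x^k - y^k\|^2 - 2\lambda \langle \hat{x} - y^k, x^k - y^k \rangle \\
&= \|\hat{x} - y^k\|^2 + \lambda^2 \|x^k - y^k\|^2 \ \ge\ \|\hat{x} - y^k\|^2,
\end{aligned} \tag{2}$$

because $y^k = \mathrm{Pr}_{H_k}(x^k)$. Also we have

$$\begin{aligned}
\|\hat{x} - x^k\|^2 &= \|\hat{x} - y^k + y^k - x^k\|^2 \\
&= \|\hat{x} - y^k\|^2 - 2 \langle \hat{x} - y^k, x^k - y^k \rangle + \|y^k - x^k\|^2 \\
&= \|\hat{x} - y^k\|^2 + \|y^k - x^k\|^2.
\end{aligned}$$

Since $x^{k+1} = \mathrm{Pr}_{C \cap H_k}(x^k)$, using the Pythagorean theorem we can reduce this to the following:

$$\begin{aligned}
\|\hat{x} - y^k\|^2 &= \|\hat{x} - x^k\|^2 - \|y^k - x^k\|^2 \\
&\ge \|x^{k+1} - x^k\|^2 - \|y^k - x^k\|^2 \\
&= \|x^{k+1} - y^k\|^2.
\end{aligned} \tag{3}$$

From (2) and (3), we have

$$\|x^{k+1} - y^k\| \le \|y - y^k\| \quad \forall y \in C \cap H_k,$$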


which means $x^{k+1} = \mathrm{Pr}_{C \cap H_k}(y^k)$. □

Using Lemma 1, we can prove the global convergence of Algorithm 1 under moderate assumptions.

Theorem 2 (Convergence theorem). Let $F$ be upper semicontinuous, compact valued and generalized monotone on $C$. Suppose the solution set $S$ of MVI is nonempty. Then any sequence $\{x^k\}$ generated by Algorithm 1 converges to a solution of MVI.

Proof. Suppose that $\|r(x^k)\| > 0$. First, we need to show the existence of the smallest nonnegative integer $m_k$ such that

$$\sup_{w \in G_k(m_k)} \langle w, r(x^k) \rangle \ge \sigma \|r(x^k)\|^2,$$

where

$$G_k(m_k) = F\big(x^k - \gamma^{m_k} r(x^k)\big).$$

Assume on the contrary that it is not satisfied for any nonnegative integer $i$, i.e.,

$$\sup_{w \in G_k(i)} \langle w, r(x^k) \rangle < \sigma \|r(x^k)\|^2 \ \Leftrightarrow\ \langle w, r(x^k) \rangle < \sigma \|r(x^k)\|^2 \quad \forall w \in G_k(i).$$

As $i \to \infty$, from the upper semicontinuity of $F$ we have

$$\langle w, r(x^k) \rangle \le \sigma \|r(x^k)\|^2 \quad \forall w \in F(x^k). \tag{4}$$

Since

$$\langle x - \mathrm{Pr}_C(x), z - \mathrm{Pr}_C(x) \rangle \le 0 \quad \forall x \in \mathbb{R}^n,\ z \in C,$$

we have

$$\langle x^k - c w^k - \mathrm{Pr}_C(x^k - c w^k), x^k - \mathrm{Pr}_C(x^k - c w^k) \rangle \le 0,$$

by taking $x = x^k - c w^k$ and $z = x^k$. This means

$$\langle r(x^k) - c w^k, r(x^k) \rangle \le 0 \ \Rightarrow\ \|r(x^k)\|^2 \le c \langle w^k, r(x^k) \rangle. \tag{5}$$

From (4) and (5), we have

$$\|r(x^k)\|^2 \le c \langle w^k, r(x^k) \rangle \le c \sigma \|r(x^k)\|^2 \ \Rightarrow\ c \ge \frac{1}{\sigma}.$$

This contradicts the choice $c < 1/\sigma$ in Step 0.


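We next show that the sequence $\{x^k\}$ is bounded. Since $x^{k+1} = \mathrm{Pr}_{C \cap H_k}(y^k)$, we have

$$\langle y^k - x^{k+1}, z - x^{k+1} \rangle \le 0 \quad \forall z \in C \cap H_k.$$

Substituting $z = x^* \in C \cap H_k$, we have

$$\langle y^k - x^{k+1}, x^* - x^{k+1} \rangle \le 0 \ \Leftrightarrow\ \langle y^k - x^{k+1}, x^* - y^k + y^k - x^{k+1} \rangle \le 0,$$

which implies

$$\|x^{k+1} - y^k\|^2 \le \langle x^{k+1} - y^k, x^* - y^k \rangle.$$

Hence,

$$\begin{aligned}
\|x^{k+1} - x^*\|^2 &= \|x^{k+1} - y^k + y^k - x^*\|^2 \\
&= \|x^{k+1} - y^k\|^2 + \|y^k - x^*\|^2 + 2 \langle x^{k+1} - y^k, y^k - x^* \rangle \\
&\le \langle x^* - y^k, x^{k+1} - y^k \rangle + \|y^k - x^*\|^2 + 2 \langle x^{k+1} - y^k, y^k - x^* \rangle \\
&= \|y^k - x^*\|^2 + \langle x^{k+1} - y^k, y^k - x^* \rangle \\
&\le \|y^k - x^*\|^2 - \|x^{k+1} - y^k\|^2.
\end{aligned} \tag{6}$$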

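Since $z^k = x^k - \gamma^{m_k} r(x^k)$ and

$$y^k = \mathrm{Pr}_{H_k}(x^k) = x^k - \frac{\langle \bar{w}^k, x^k - z^k \rangle}{\|\bar{w}^k\|^2}\, \bar{w}^k,$$

we have

$$\begin{aligned}
\|y^k - x^*\|^2 &= \|x^k - x^*\|^2 + \frac{\langle \bar{w}^k, x^k - z^k \rangle^2}{\|\bar{w}^k\|^4}\, \|\bar{w}^k\|^2 - \frac{2 \langle \bar{w}^k, x^k - z^k \rangle}{\|\bar{w}^k\|^2}\, \langle \bar{w}^k, x^k - x^* \rangle \\
&= \|x^k - x^*\|^2 + \left( \frac{\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|} \right)^2 - \frac{2 \gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|^2}\, \langle \bar{w}^k, x^k - x^* \rangle \\
&= \|x^k - x^*\|^2 - \left( \frac{\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|} \right)^2 - \frac{2 \gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|^2} \Big[ \langle \bar{w}^k, x^k - x^* \rangle - \gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle \Big] \\
&= \|x^k - x^*\|^2 - \left( \frac{\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|} \right)^2 - \frac{2 \gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|^2}\, \langle \bar{w}^k, x^k - x^* - \gamma^{m_k} r(x^k) \rangle \\
&= \|x^k - x^*\|^2 - \left( \frac{\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|} \right)^2 - \frac{2 \gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|^2}\, \langle \bar{w}^k, z^k - x^* \rangle.
\end{aligned} \tag{7}$$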

From the generalized monotonicity of $F$ we see that $\langle \bar{w}^k, z^k - x^* \rangle \ge 0$. This, together with $\bar{w}^k \in F(z^k)$, implies

$$\langle \bar{w}^k, r(x^k) \rangle \ge \sigma \|r(x^k)\|^2.$$

Thus, (7) reduces to

$$\|y^k - x^*\|^2 \le \|x^k - x^*\|^2 - \left( \frac{\gamma^{m_k} \langle \bar{w}^k, r(x^k) \rangle}{\|\bar{w}^k\|} \right)^2 \le \|x^k - x^*\|^2 - \left( \frac{\sigma \gamma^{m_k}}{\|\bar{w}^k\|} \right)^2 \|r(x^k)\|^4. \tag{8}$$

Combining (6) and (8), we obtain

$$\|x^{k+1} - x^*\|^2 \le \|x^k - x^*\|^2 - \|x^{k+1} - y^k\|^2 - \left( \frac{\sigma \gamma^{m_k}}{\|\bar{w}^k\|} \right)^2 \|r(x^k)\|^4. \tag{9}$$

This implies that the sequence $\{\|x^k - x^*\|\}$ is nonincreasing and hence convergent. Consequently, the sequence $\{x^k\}$ is bounded. Since $w^k \in F(x^k)$, $r(x^k) = x^k - \mathrm{Pr}_C(x^k - c w^k)$, $z^k = x^k - \gamma^{m_k} r(x^k)$, and $F$ is upper semicontinuous and compact valued on $C$, the sequence $\{z^k\}$ is also bounded (see e.g. [6]). Hence, the sequence $\{F(z^k)\}$ is bounded, i.e., there exists $M > 0$ such that

$$\|w^k\| \le M \quad \forall w^k \in F(z^k).$$

This, together with (9), implies

$$\|x^{k+1} - x^*\|^2 \le \|x^k - x^*\|^2 - \|x^{k+1} - y^k\|^2 - \left( \frac{\sigma \gamma^{m_k}}{M} \right)^2 \|r(x^k)\|^4. \tag{10}$$

Since $\{\|x^k - x^*\|\}$ converges, it is easy to see that

$$\lim_{k \to \infty} \gamma^{m_k} \|r(x^k)\| = 0.$$

The cases remaining to consider are the following.


Case 1. $\limsup_{k \to \infty} \gamma^{m_k} > 0$.

In this case it follows that $\liminf_{k \to \infty} \|r(x^k)\| = 0$. Since $x - \mathrm{Pr}_C(x - cF(x))$ is upper semicontinuous on $C$ and $\{x^k\}$ is bounded, there exists $\bar{x}$, an accumulation point of $\{x^k\}$. In other words, a subsequence $\{x^{k_i}\}$ converges to some $\bar{x}$ such that $r(\bar{x}) = 0$, as $i \to \infty$. Then we see from Lemma 1 that $\bar{x} \in S$, and besides we can take $x^* = \bar{x}$, in particular in (10). Thus $\{\|x^k - \bar{x}\|\}$ is a convergent sequence. Since $\bar{x}$ is an accumulation point of $\{x^k\}$, the sequence $\{\|x^k - x^*\|\}$ converges to zero, i.e., $\{x^k\}$ converges to $\bar{x} \in S$.

Case 2. $\lim_{k \to \infty} \gamma^{m_k} = 0$.

Since $m_k$ is the smallest nonnegative integer, $m_k - 1$ does not satisfy (1). Hence, we have

$$\langle w, r(x^k) \rangle < \sigma \|r(x^k)\|^2 \quad \forall w \in F\big(x^k - \gamma^{m_k - 1} r(x^k)\big),$$

and besides

$$\langle w, r(x^{k_i}) \rangle < \sigma \|r(x^{k_i})\|^2 \quad \forall w \in F\big(x^{k_i} - \gamma^{m_{k_i} - 1} r(x^{k_i})\big). \tag{11}$$

Passing to the limit in (11) as $i \to \infty$ and using the upper semicontinuity of $F$, we have

$$\langle w, r(\bar{x}) \rangle \le \sigma \|r(\bar{x})\|^2 \quad \forall w \in F(\bar{x}). \tag{12}$$

From (5) we have

$$\|r(x^{k_i})\|^2 \le c \langle w^{k_i}, r(x^{k_i}) \rangle.$$

Since $F$ is upper semicontinuous, passing to the limit as $i \to \infty$ we obtain

$$\|r(\bar{x})\|^2 \le c \langle \bar{w}, r(\bar{x}) \rangle.$$

Combining this with (12), we have

$$\|r(\bar{x})\|^2 \le c \langle \bar{w}, r(\bar{x}) \rangle \le c \sigma \|r(\bar{x})\|^2,$$

which implies $r(\bar{x}) = 0$, and hence $\bar{x} \in S$. Letting $x^* = \bar{x}$ and repeating the previous arguments, we conclude that the whole sequence $\{x^k\}$ converges to $\bar{x} \in S$. This completes the proof. □


4 An Application to Nonlinear Complementarity Problems

It is well known [8] that when $C = \mathbb{R}^n_+$ is a closed convex cone, MVI becomes the nonlinear complementarity problem, NCP for short:

Find $x^* \in C$ such that $F(x^*) \in C^*$, $\langle F(x^*), x^* \rangle = 0$,

where $F : C \to \mathbb{R}^n$ and $C^* := \{w : \langle w, x \rangle \ge 0\ \forall x \in C\}$ is the polar cone of $C$. We apply Algorithm 1 to the complementarity problem NCP. Note that in this case $F$ is single-valued, so that

$$w^k = F(x^k), \qquad r(x^k) = x^k - \mathrm{Pr}_C(x^k - c w^k),$$

and the algorithm for NCP can be detailed as follows.

Algorithm 2

Step 0. Choose $\sigma > 0$, $x^0 \in C$, $w^0 = F(x^0)$, $0 < c < 1/\sigma$, and $\gamma \in (0, 1)$.

Step 1. Compute $w^k$, $r(x^k)$. Find the smallest nonnegative number $m_k$ such that

$$\langle F(x^k - \gamma^{m_k} r(x^k)), r(x^k) \rangle \ge \sigma \|r(x^k)\|^2.$$

Set $z^k := x^k - \gamma^{m_k} r(x^k)$.

Step 2. Set

$$H_k := \{x \in \mathbb{R}^n \mid \langle F(z^k), x - z^k \rangle \le 0\}.$$

Find $x^{k+1} := \mathrm{Pr}_{C \cap H_k}(x^k)$.

Step 3. Set $k := k + 1$, and go to Step 1. □

Validity and convergence of this algorithm follow immediately from those of Algorithm 1.
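For $C = \mathbb{R}^n_+$ the projection is a componentwise maximum with zero, so the residual $r(x)$ and the complementarity conditions are cheap to evaluate. The following sketch checks both for a candidate point; the example map `F_example` and the function names are hypothetical and serve only as an illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def ncp_residual(F: Callable[[Vector], List[float]], x: Vector,
                 c: float = 1.0) -> List[float]:
    """r(x) = x - Pr_C(x - c F(x)) for C the nonnegative orthant."""
    Fx = F(x)
    return [xi - max(0.0, xi - c * fi) for xi, fi in zip(x, Fx)]


def is_ncp_solution(F: Callable[[Vector], List[float]], x: Vector,
                    tol: float = 1e-10) -> bool:
    """Check x >= 0, F(x) >= 0 and <F(x), x> = 0 up to the tolerance tol."""
    Fx = F(x)
    feasible = all(xi >= -tol for xi in x) and all(fi >= -tol for fi in Fx)
    complementary = abs(sum(fi * xi for fi, xi in zip(Fx, x))) <= tol
    return feasible and complementary


def F_example(x: Vector) -> List[float]:
    """Hypothetical affine map used only for illustration."""
    return [x[0] - 1.0, x[1] + 1.0]


x_star = [1.0, 0.0]
# At a solution the residual vanishes and the complementarity conditions hold.
assert is_ncp_solution(F_example, x_star)
assert all(abs(ri) <= 1e-12 for ri in ncp_residual(F_example, x_star))
```

A vanishing residual and the three complementarity conditions are two views of the same solution set, which is why the residual norm is a natural stopping criterion for Algorithm 2.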

Acknowledgements The authors would like to thank the referee for his/her useful comments, remarks, questions and constructive suggestions, which helped us very much in revising the paper. This work is supported in part by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) and the Grant-in-Aid for Scientific Research (B) 20310082 from the Japan Society for the Promotion of Science.

References

1. Anh P. N.: An interior proximal method for solving pseudomonotone nonlipschitzian multivalued variational inequalities, Nonlinear Analysis Forum, 14, 27–42 (2009).
2. Anh P. N.: An interior proximal method for solving monotone generalized variational inequalities, East-West Journal of Mathematics, 10, 81–100 (2008).
3. Anh P. N., and Muu L. D.: Coupling the Banach contraction mapping principle and the proximal point algorithm for solving monotone variational inequalities, Acta Mathematica Vietnamica, 29, 119–133 (2004).
4. Anh P. N., Muu L. D., and Strodiot J. J.: Generalized Projection Method for Non-Lipschitz Multivalued Monotone Variational Inequalities, Acta Mathematica Vietnamica, 34, 67–79 (2009).
5. Anh P. N., Muu L. D., Nguyen V. H., and Strodiot J. J.: On the Contraction and Nonexpansiveness Properties of the Marginal Mappings in Generalized Variational Inequalities Involving Co-coercive Operators. In: Eberhard, A., Hadjisavvas, N. and Luc, D. T. (eds) Generalized Convexity and Monotonicity. Springer (2005).
6. Aubin J. P., and Ekeland I.: Applied Nonlinear Analysis, Wiley, New York (1984).
7. Daniele P., Giannessi F., and Maugeri A.: Equilibrium Problems and Variational Models, Kluwer (2003).
8. Facchinei F., and Pang J. S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, Springer-Verlag, New York (2003).
9. Farouq N. El.: Pseudomonotone variational inequalities: convergence of the auxiliary problem method, J. of Optimization Theory and Applications, 111(2), 305–325 (2001).
10. Hai N. X., and Khanh P. Q.: Systems of set-valued quasivariational inclusion problems, J. of Optimization Theory and Applications, 135, 55–67 (2007).
11. Konnov I. V.: Combined Relaxation Methods for Variational Inequalities, Springer-Verlag, Berlin (2000).
12. Rockafellar R. T.: Monotone operators and the proximal point algorithm, SIAM J. Control Optimization, 14, 877–898 (1976).
13. Schaible S., Karamardian S., and Crouzeix J. P.: Characterizations of generalized monotone maps, J. of Optimization Theory and Applications, 76, 399–413 (1993).



Robust Parameter Estimation Based on Huber Estimator in Systems of Differential Equations

Tanja Binder and Ekaterina Kostina

Abstract The paper discusses the use of the Huber estimator for parameter estimation problems which are constrained by a system of ordinary differential equations. In particular, a local and global convergence analysis for the estimation with the Huber estimator is given. For comparison, numerical results are given for an estimation with this estimator and both $l_1$ estimation and the least squares approach for a parameter estimation problem for a chemical process.

T. Binder · E. Kostina
Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str., 35032 Marburg, Germany
e-mail: [email protected]; [email protected]

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_2, © Springer-Verlag Berlin Heidelberg 2012

1 Motivation

Robustness, in the sense used here for parameter estimation problems, means “insensitivity to small deviations from the assumptions” as defined by Huber [3]. Standard assumptions in parameter identification are, e.g., normally distributed and independent measurement data. But even high-quality measurements are not exactly normally distributed; typically they are longer-tailed. In scientific routine data, i.e. data not taken with special care, about 1%–10% gross errors can be expected [2]. Such gross errors often show up as outliers, although not every outlier is a gross error; outliers may also arise for other reasons. Nevertheless, a single outlier can often completely spoil a least squares estimation.

This is where the Huber estimator comes into play. Essentially, it is a combination of $l_1$ and $l_2$ criteria for parameter estimation. It is robust in the sense that it can reduce the influence of “wild” data points [5]. To do this, the Huber estimator minimizes a cost function $\rho_\gamma(t)$ that evaluates a least squares term if the data point has an absolute value smaller than a given constant $\gamma$ and an absolute value term if the data point's absolute value is greater than $\gamma$,

$$\rho_\gamma(t) = \begin{cases} \dfrac{t^2}{2}, & |t| \le \gamma, \\[4pt] \gamma |t| - \dfrac{\gamma^2}{2}, & |t| > \gamma. \end{cases} \tag{1}$$

The partition constant $\gamma$ is linked to the ratio $\varepsilon$ of “bad” data points in the measurement data by the nonlinear equation [4, 5]

$$(1 - \varepsilon)^{-1} = \int_{-\gamma}^{\gamma} \varphi(t)\, dt + 2\varphi(\gamma)/\gamma,$$

where $\varphi(t) = (2\pi)^{-1/2} \exp(-\tfrac{1}{2} t^2)$ denotes the standard normal density. For $\varepsilon$ going to zero the partition constant $\gamma$ tends to infinity and the solution of the Huber estimator converges to the solution of the least squares method. If the error probability $\varepsilon$ tends to one, then $\gamma$ approaches zero and the solution of the Huber estimator converges to the solution of the $l_1$ approximation.

The Huber estimator is more robust than the $l_2$ estimator in the sense that it is less sensitive to outliers in the measurement data. Via the partition constant $\gamma$ the Huber estimator even shows directly which data points are to be considered as outliers. Absolute value minimization also gives a robust estimation by interpolating the $n$ “best” data points and ignoring the rest, i.e. the outliers, where $n$ is the number of degrees of freedom. In contrast, the Huber estimator takes more measurements into account and thus more information about the system.
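As an illustration of the cost function (1) and of the relation between the outlier ratio and the partition constant, the following Python sketch evaluates the Huber function and solves the nonlinear equation for the partition constant by bisection. The function names, the bracketing interval and the tolerance are assumptions made for this example; the source does not prescribe a particular root finder.

```python
import math


def huber_rho(t, gamma):
    """Huber cost (1): quadratic for |t| <= gamma, linear beyond."""
    if abs(t) <= gamma:
        return 0.5 * t * t
    return gamma * abs(t) - 0.5 * gamma * gamma


def normal_pdf(t):
    """Standard normal density phi(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def gamma_residual(gamma, eps):
    """Right-hand side minus left-hand side of the defining equation.

    The integral of phi over [-gamma, gamma] equals erf(gamma / sqrt(2)).
    """
    integral = math.erf(gamma / math.sqrt(2.0))
    return integral + 2.0 * normal_pdf(gamma) / gamma - 1.0 / (1.0 - eps)


def partition_constant(eps, lo=1e-6, hi=10.0, tol=1e-12):
    """Solve for gamma by bisection (illustrative choice of method).

    Assumes 0 < eps < 1 and that the root lies in [lo, hi]; the residual
    is strictly decreasing in gamma, so the sign test below is valid.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma_residual(mid, eps) > 0.0:
            lo = mid  # residual still positive: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for eps in (0.01, 0.05, 0.10):
        print(eps, round(partition_constant(eps), 3))
```

Smaller outlier ratios give larger partition constants, in line with the limiting behaviour described above.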

2 Parameter Estimation in Systems of Differential Equations

We assume that at time points $t_j$, $j = 1, \dots, N$, we are given measurement data $\eta_{ij}$, $i = 1, \dots, K_j$, which are the sum of some model response $h_i(t_j, x(t_j), p)$ and some unknown measurement errors $\varepsilon_{ij}$,

$$\eta_{ij} = h_i(t_j, x(t_j), p) + \varepsilon_{ij}, \quad i = 1, \dots, K_j, \quad j = 1, \dots, N,$$

where the function $x$ satisfies an ordinary differential equation

$$\dot{x}(t) = f(t, x(t), p) \quad \text{for } t \in [t_0, t_f], \qquad x(t_0) = x_0.$$

Here $x_0$ is given, and $p$ is a vector of unknown parameters which are to be estimated. To obtain the parameters $p$ we solve an optimization problem in which an appropriate function describing the deviation between model and data is minimized subject to constraints. In the case of the Huber estimator this problem reads:

$$\begin{aligned} \min_{x(\cdot),\, p} \quad & \sum_{j=1}^{N} \sum_{i=1}^{K_j} \rho_\gamma\bigl(\eta_{ij} - h_i(t_j, x(t_j), p)\bigr) \\ \text{s.t.} \quad & \dot{x}(t) = f(t, x(t), p) \quad \forall t \in [t_0, t_f], \\ & r_C(x(t_0), \dots, x(t_f), p) = 0. \end{aligned} \tag{2}$$

Besides the ODE constraint, equality constraints $r_C(x(t_0), \dots, x(t_f), p) = 0$ can hold, e.g., any initial, boundary and interior point conditions or similar restrictions which the solution of the problem has to satisfy.

2.1 Discretization of the Dynamics

Since the problem (2) is a parameter estimation problem with an ODE system in the constraints, we follow the boundary value problem approach [1]. This means we discretize the dynamics like a boundary value problem and solve simultaneously the optimization problem, the boundary value problem as equality constraints and the further constraints in one loop. The method of choice for the discretization of the ODE is the multiple shooting method. The integration interval of the ODE is divided into $M$ multiple shooting intervals $[\tau_k, \tau_{k+1}]$, $k = 0, \dots, M-1$, with $t_0 = \tau_0 < \dots < \tau_M = t_f$, where new unknowns $x(\tau_k) = s_k$ are introduced to represent the ODE state values at the end points of the intervals. Then $M$ initial value problems

$$\dot{x}(t; s_k, p) = f(t, x(t; s_k, p), p), \quad t \in [\tau_k, \tau_{k+1}], \qquad x(\tau_k; s_k, p) = s_k, \quad k = 0, \dots, M-1,$$

have to be solved. Of course, the solutions of these subproblems do not necessarily link together continuously. Therefore we have to introduce $M$ additional continuity conditions, i.e. equality constraints at the end points of the multiple shooting intervals,

$$x(\tau_{k+1}; s_k, p) = s_{k+1},$$

to guarantee continuity of the overall solution $x(t; s_0, \dots, s_M, p)$, $t \in [t_0, t_f]$. Collecting the new unknowns $s_k$, $k = 0, \dots, M$, and the original parameters $p$ in a new parameter vector $s = (s_0^T, \dots, s_M^T, p^T)^T$, this procedure leads to an equality constrained, parametrized optimization problem:

$$\begin{aligned} \min_{s} \quad & \sum_{j=1}^{N} \sum_{i=1}^{K_j} \rho_\gamma\bigl(\eta_{ij} - h_i(t_j, x(t_j; s), p)\bigr) \\ \text{s.t.} \quad & r_k(s) := x(\tau_{k+1}; s_k, p) - s_{k+1} = 0, \quad k = 0, \dots, M-1, \\ & r_C(s) = 0. \end{aligned} \tag{3}$$
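The continuity conditions in (3) can be made concrete with a small Python sketch. It assumes a scalar test equation (exponential decay), a classical Runge-Kutta integrator with a fixed number of steps, and hypothetical node values; none of these choices come from the paper, which leaves the integrator and the model open.

```python
import math


def rk4(f, t0, t1, x0, p, steps=20):
    """Integrate x' = f(t, x, p) from t0 to t1 with the classical RK4 scheme."""
    h = (t1 - t0) / steps
    t, x = t0, x0
    for _ in range(steps):
        k1 = f(t, x, p)
        k2 = f(t + 0.5 * h, x + 0.5 * h * k1, p)
        k3 = f(t + 0.5 * h, x + 0.5 * h * k2, p)
        k4 = f(t + h, x + h * k3, p)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return x


def continuity_residuals(f, tau, s, p):
    """Residuals r_k = x(tau_{k+1}; s_k, p) - s_{k+1}, k = 0..M-1."""
    return [rk4(f, tau[k], tau[k + 1], s[k], p) - s[k + 1]
            for k in range(len(tau) - 1)]


def decay(t, x, p):
    """Illustrative scalar model x' = -p x (an assumption for this sketch)."""
    return -p * x


tau = [0.0, 0.5, 1.0, 1.5, 2.0]               # shooting nodes, M = 4
p = 1.3                                       # hypothetical parameter value
s_exact = [math.exp(-p * t) for t in tau]     # nodes on the true trajectory
s_guess = [1.0, 0.8, 0.6, 0.4, 0.2]           # arbitrary initial guess

if __name__ == "__main__":
    print(continuity_residuals(decay, tau, s_exact, p))  # close to zero
    print(continuity_residuals(decay, tau, s_guess, p))  # clearly nonzero
```

Node values taken from a true trajectory satisfy the continuity conditions up to integration error, whereas an arbitrary guess does not; the optimizer drives these residuals to zero together with the cost function.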


2.2 Parameter Estimation Problem with Huber Estimator

Essentially, the parameter estimation problem with the Huber estimator (3) can be written as

$$\min_{X} \ \sum_{i=1}^{N_1} \rho_\gamma(F_{1,i}(X)), \quad \text{s.t.} \quad F_2(X) = 0, \tag{4}$$

where the cost function splits up into two terms,

$$\sum_{i=1}^{N_1} \rho_\gamma(F_{1,i}(X)) = \sum_{i \in I_1(X)} \frac{1}{2} \bigl(F_{1,i}(X)\bigr)^2 + \sum_{i \in I_2(X)} \Bigl( \gamma\, |F_{1,i}(X)| - \frac{1}{2} \gamma^2 \Bigr),$$

with the two index sets

$$I_1(X) = \{ i : |F_{1,i}(X)| \le \gamma \}, \qquad I_2(X) = \{ i : |F_{1,i}(X)| > \gamma \}.$$

Here $F_l(X)$, $l = 1, 2$, are sufficiently smooth $N_l$-vector functions, $X \in \mathbb{R}^N$.

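2.3 Constrained Gauss-Newton Method

For the solution of the minimization problem (4) we use the constrained Gauss-Newton method. Starting from a given initial value $X^0$, we compute new iterates

$$X^{k+1} = X^k + [t^k]\, \Delta X^k,$$

where the increment $\Delta X^k$ solves the linearized problem

$$\begin{aligned} \min_{\Delta X} \quad & \frac{1}{2} \sum_{i \in I_1(\Delta X)} \bigl( F_{1,i}(X^k) + J_{1,i}(X^k) \Delta X \bigr)^2 + \sum_{i \in I_2(\Delta X)} \Bigl( \gamma\, |F_{1,i}(X^k) + J_{1,i}(X^k) \Delta X| - \frac{\gamma^2}{2} \Bigr), \\ \text{s.t.} \quad & F_2(X^k) + J_2(X^k) \Delta X = 0, \end{aligned} \tag{5}$$

for the index sets

$$I_1(\Delta X) = I_1(X^k, \Delta X) = \{ i : |F_{1,i}(X^k) + J_{1,i}(X^k) \Delta X| \le \gamma \},$$
$$I_2(\Delta X) = I_2(X^k, \Delta X) = \{ i : |F_{1,i}(X^k) + J_{1,i}(X^k) \Delta X| > \gamma \}.$$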


Here $J_l(X) = \partial F_l(X) / \partial X$, $l = 1, 2$. The optional stepsize $t^k$ can be determined by a line search strategy, see Sect. 2.5. A numerical procedure for solving the problem (5) is briefly discussed in Sect. 2.6.

Let $\tilde{J}_1$ denote a transformation of the Jacobian of the cost function,

$$\tilde{J}_1(X, \Delta X) = D(X, \Delta X)\, J_1(X),$$

where $D(X, \Delta X)$ is a diagonal matrix with the entries

$$d_{ii} = \begin{cases} 1, & \text{if } |F_{1,i}(X) + J_{1,i}(X) \Delta X| \le \gamma, \\ 0, & \text{if } |F_{1,i}(X) + J_{1,i}(X) \Delta X| > \gamma. \end{cases}$$

This means that in the transformed Jacobian $\tilde{J}_1$ only those components of the cost function are still present for which the absolute value of the linearized cost function does not exceed the partition constant $\gamma$, while all other components are set to zero and therefore ignored in the further course of the procedure. We further introduce the notation $\tilde{J}$ for the composed Jacobian matrix

$$\tilde{J} = \tilde{J}(X, \Delta X) = \begin{pmatrix} \tilde{J}_1(X, \Delta X) \\ J_2(X) \end{pmatrix}.$$

Under the regularity assumptions

$$\operatorname{rank} J_2(X^k) = N_2, \qquad \operatorname{rank} \tilde{J}(X^k, \Delta X^k) = N \tag{6}$$

the solution $\Delta X^k$ of the linearized problem (5) is given by means of a generalized inverse $\tilde{J}^+$:

$$\Delta X^k = -\tilde{J}^+(X^k, \Delta X^k)\, \tilde{F}(X^k, \Delta X^k),$$

where $\tilde{F}$ is defined as

$$\tilde{F} = \tilde{F}(X, \Delta X) = \begin{pmatrix} \tilde{F}_1(X, \Delta X) \\ F_2(X) \end{pmatrix}$$

with

$$\tilde{F}_{1,i}(X, \Delta X) = \begin{cases} F_{1,i}(X), & \text{if } i \in I_1(X, \Delta X), \\ \gamma\, \operatorname{sign}\bigl(F_{1,i}(X) + J_{1,i}(X) \Delta X\bigr), & \text{if } i \in I_2(X, \Delta X). \end{cases}$$

The generalized inverse $\tilde{J}^+$ is explicitly given by

$$\tilde{J}^+ = \tilde{J}^+(X, \Delta X) = (I_N, 0_{N,N_2}) \begin{pmatrix} J_1(X)^T D(X, \Delta X) J_1(X) & J_2(X)^T \\ J_2(X) & 0_{N_2,N_2} \end{pmatrix}^{-1} \begin{pmatrix} J_1(X)^T & 0_{N,N_2} \\ 0_{N_2,N_1} & I_{N_2} \end{pmatrix}$$

and satisfies the condition

$$\tilde{J}^+ \tilde{J} \tilde{J}^+ = \tilde{J}^+.$$

Here $I_k$ denotes the $k \times k$ identity matrix and $0_{k,r}$ denotes the $k \times r$ zero matrix.
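To make the roles of the diagonal matrix, the transformed residual and the generalized inverse concrete, the following Python sketch treats the simplest possible case: a single scalar unknown and no equality constraints, so that the generalized-inverse formula reduces to a ratio of two sums. The function name `huber_gn_step` and the fixed-point loop over the partition are assumptions made for illustration; the paper itself solves the linearized problem with a structure-exploiting QP solver (Sect. 2.6).

```python
def sign(v):
    """Sign function returning -1, 0 or 1."""
    return (v > 0) - (v < 0)


def huber_gn_step(F, J, gamma, max_iter=50, tol=1e-12):
    """Increment for a scalar unknown without equality constraints.

    F[i] are the residuals F_{1,i}(X), J[i] the derivatives J_{1,i}(X).
    The partition depends on the increment itself, so we iterate until the
    increment (and hence the partition) no longer changes. This fixed-point
    loop is an illustrative assumption, not the paper's QP-based solver.
    """
    dx = 0.0
    d = [1.0] * len(F)
    for _ in range(max_iter):
        lin = [f + j * dx for f, j in zip(F, J)]             # linearized residuals
        d = [1.0 if abs(r) <= gamma else 0.0 for r in lin]   # diagonal of D
        F_tilde = [f if di == 1.0 else gamma * sign(r)
                   for f, r, di in zip(F, lin, d)]
        H = sum(di * j * j for di, j in zip(d, J))           # J1^T D J1
        if H == 0.0:
            raise ValueError("rank condition (6) violated: no 'good' data left")
        new_dx = -sum(j * ft for j, ft in zip(J, F_tilde)) / H
        if abs(new_dx - dx) < tol:
            return new_dx, d
        dx = new_dx
    return dx, d


# Location model F_i(X) = X - y_i at X = 0, with one gross outlier y = 10.
F = [0.0, -0.1, 0.1, -10.0]
J = [1.0, 1.0, 1.0, 1.0]
step, weights = huber_gn_step(F, J, gamma=1.0)

if __name__ == "__main__":
    print(step, weights)
```

In this example the outlier is assigned to the second index set and contributes only through its sign, so the increment stays close to the three consistent data points; a plain least squares step would move to the mean 2.5 instead.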

2.4 Local Convergence

It can be shown that the Gauss-Newton method eventually identifies the “optimal partitioning” of residuals $F_{1i}(X)$ in a neighbourhood of a solution $X^*$ that satisfies certain assumptions:

Theorem 1. Suppose that $X^*$ is a solution of the problem (4) that satisfies (6) with $X^k = X^*$, $\Delta X^k = 0$ and the strict complementarity

$$|F_{1i}(X)| \ne \gamma, \quad i = 1, \dots, N_1,$$

with $X = X^*$. Then, there exists a neighbourhood $D$ of $X^*$ such that for all $X^k \in D$ the linearized problem (5) has a solution $\Delta X^k$ whose partitioning $I_1(X^k, \Delta X^k)$, $I_2(X^k, \Delta X^k)$ is the same as the partitioning $I_1(X^k)$, $I_2(X^k)$ of the nonlinear problem (4) at $X = X^k$, which is in turn the same as the partitioning $I_1(X^*)$, $I_2(X^*)$ of the nonlinear problem (4) at $X = X^*$:

$$I_l(X^k) = I_l(X^*) = I_l(X^k, \Delta X^k) = I_l^*, \quad l = 1, 2.$$

Moreover,

$$\operatorname{sign} F_{1i}(X^k) = \operatorname{sign} F_{1i}(X^*) = \operatorname{sign}\bigl(F_{1i}(X^k) + J_{1i}(X^k) \Delta X^k\bigr), \quad i \in I_2^*.$$

Hence, the full-step ($t^k \equiv 1$) method becomes equivalent to the Gauss-Newton method applied to solving a modified least squares problem

$$\min_X \ \frac{1}{2} \sum_{i \in I_1^*} F_{1i}^2(X) + \gamma \sum_{i \in I_2^*} \operatorname{sgn}\bigl(F_{1i}(X^*)\bigr) F_{1i}(X), \quad \text{s.t.} \quad F_2(X) = 0,$$

and as a result it has a linear rate of local convergence.

Theorem 2 (Local Contraction). Let $D$ be the neighbourhood defined by Theorem 1. Assume that the following (weighted) Lipschitz conditions for $\tilde{J}$ and $\tilde{J}^+$ are satisfied for all $X$, $Y = X + \Delta X$, $Z \in D$ and all $t \in [0, 1]$:

$$\frac{\bigl\| \tilde{J}^+(Y) \bigl[ \tilde{J}(X + t(Y - X)) - \tilde{J}(X) \bigr] (Y - X) \bigr\|}{t\, \|Y - X\|^2} \le \omega < \infty, \tag{7}$$

$$\frac{\bigl\| \bigl[ \tilde{J}^+(Z) - \tilde{J}^+(X) \bigr] R(X, \Delta X) \bigr\|}{\|Z - X\|} \le \kappa < 1, \tag{8}$$

where $\Delta X$ solves the linearized problem (5) at $X$, and $R$ is the residual of the linearized problem,

$$R = R(X, \Delta X) = \begin{pmatrix} \tilde{R}(X, \Delta X) \\ F_2(X) + J_2(X) \Delta X \end{pmatrix},$$

with

$$\tilde{R}_i(X, \Delta X) = \begin{cases} F_{1,i}(X) + J_{1,i}(X) \Delta X, & \text{if } i \in I_1, \\ \gamma\, \operatorname{sign}\bigl(F_{1,i}(X) + J_{1,i}(X) \Delta X\bigr), & \text{if } i \in I_2. \end{cases}$$

Assume further that all initial guesses $X^0 \in D$ satisfy

$$\delta_0 = \frac{\omega \|\Delta X^0\|}{2} + \kappa < 1, \qquad D_0 = \bar{B}\Bigl( X^0, \frac{\|\Delta X^0\|}{1 - \delta_0} \Bigr) \subset D.$$

Here $\Delta X^0$ solves the linearized problem (5) at $X^0$. Then the sequence of iterates $\{X^k\}$ of the full step Gauss-Newton method is well defined, remains in $D$ and converges to a point $X^*$ with $\tilde{J}^+(X^*) \tilde{F}(X^*) = 0$. Furthermore, the a priori estimate

$$\|\Delta X^{k+1}\| \le \bigl( \|\Delta X^k\|\, \omega / 2 + \kappa \bigr) \|\Delta X^k\|$$

holds, which means that the convergence is linear with rate $\kappa$.

The statements of this theorem can be interpreted similarly to the least squares case. The constant $\omega$ from the Lipschitz condition (7) for $\tilde{J}$ is a measure for the nonlinearity of the model, as it is in fact nothing but a weighted second derivative. Its inverse $\omega^{-1}$ characterizes the region of validity of the linear model. The constant $\kappa$ from the Lipschitz condition (8) for $\tilde{J}^+$ refers to the incompatibility of the model and the measurements and is therefore called the incompatibility constant. A value $\kappa < 1$ is a necessary condition for the identifiability of the parameters from the available data. Only a solution with $\kappa < 1$ is statistically stable. Solutions with $\kappa \ge 1$ have large residuals and are statistically unstable. Let us note that in the case of the Huber estimator one can reduce $\kappa$ by decreasing the partitioning constant $\gamma$.

2.5 Global Convergence

As we have seen that the constrained Gauss-Newton method for the Huber estimator is locally convergent, we concentrate our attention on globalization strategies. One possibility is a line search in which the iteration step is damped,

$$X^{k+1} = X^k + t^k \Delta X^k,$$

with a stepsize $t^k \in (0, 1]$. This stepsize is chosen such that the next iterate $X^{k+1}$ is “better” in some sense than the current iterate $X^k$,

$$T_1(X^{k+1}) < T_1(X^k).$$

As a measure for the goodness of the iterates we use the exact penalty function $T_1$ as a merit function,

$$T_1(X) = \sum_{i=1}^{N_1} \rho_\gamma(F_{1,i}(X)) + \sum_{i=1}^{N_2} \alpha_i\, |F_{2,i}(X)|, \tag{9}$$

with sufficiently large weights $\alpha_i > 0$.

Theorem 3 (Compatibility of Gauss-Newton method for Huber estimator and exact penalty function). Under the regularity assumptions (6) and if we further assume strict complementarity, i.e.

$$|F_{1,i}(X)| \ne \gamma, \qquad |F_{1,i}(X) + J_{1,i}(X) \Delta X| \ne \gamma,$$

the increment $\Delta X$ solving the linearized problem leads to a descent direction of the nonlinear problem with Huber estimator,

$$\lim_{\varepsilon \to 0} \frac{T_1(X^k + \varepsilon \Delta X) - T_1(X^k)}{\varepsilon} < 0,$$

for the exact penalty function (9). Based on this theorem we can prove global convergence of the method with exact line search.
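The damping strategy can be sketched in Python as follows. The example uses a simple backtracking rule (halving the stepsize until the merit function (9) decreases) instead of the exact line search mentioned in the text; this substitution, the function names and the toy data are assumptions for illustration only.

```python
def huber_rho(t, gamma):
    """Huber cost (1)."""
    if abs(t) <= gamma:
        return 0.5 * t * t
    return gamma * abs(t) - 0.5 * gamma * gamma


def merit(F1, F2, X, gamma, alpha):
    """Exact penalty function (9); F1 and F2 map a list X to residual lists."""
    cost = sum(huber_rho(r, gamma) for r in F1(X))
    penalty = sum(a * abs(r) for a, r in zip(alpha, F2(X)))
    return cost + penalty


def backtracking_step(F1, F2, X, dX, gamma, alpha, shrink=0.5, t_min=1e-8):
    """Halve the stepsize until the merit function strictly decreases.

    Backtracking replaces the exact line search here; it only enforces
    T1(X + t dX) < T1(X), as required in the text.
    """
    T0 = merit(F1, F2, X, gamma, alpha)
    t = 1.0
    while t >= t_min:
        trial = [x + t * d for x, d in zip(X, dX)]
        if merit(F1, F2, trial, gamma, alpha) < T0:
            return t, trial
        t *= shrink
    raise RuntimeError("no decrease found; dX may not be a descent direction")


# Toy location problem with one outlier and no equality constraints.
y = [0.0, 0.1, -0.1, 10.0]
F1 = lambda X: [X[0] - yi for yi in y]
F2 = lambda X: []

# A deliberately overlong step in a descent direction.
t_acc, X_new = backtracking_step(F1, F2, [0.0], [4.0], gamma=1.0, alpha=[])

if __name__ == "__main__":
    print(t_acc, X_new)
```

The overlong step is rejected three times before the damped iterate reduces the merit function, which shows how the stepsize protects the iteration when the full step overshoots.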

2.6 Numerical Solution of the Linearized Problem

One of the decisive steps of the method, which largely affects its performance, is the solution of the linearized problems (5). An efficient method is the so-called condensing, which exploits the block structure of the Jacobian $J_2$ that arises due to the applied multiple shooting approach, see also [1],

$$J = \begin{pmatrix} R_0^1 & R_1^1 & R_2^1 & \cdots & R_M^1 & R_p^1 \\ R_0^2 & R_1^2 & R_2^2 & \cdots & R_M^2 & R_p^2 \\ A_0 & -I & & & & B_0 \\ & A_1 & -I & & & B_1 \\ & & \ddots & \ddots & & \vdots \\ & & & A_{M-1} & -I & B_{M-1} \end{pmatrix}, \qquad F = \begin{pmatrix} F_1 \\ r_C \\ r_0 \\ r_1 \\ \vdots \\ r_{M-1} \end{pmatrix},$$

with

$$R_j^1 = \frac{\partial F_1}{\partial s_j}, \quad R_j^2 = \frac{\partial r_C}{\partial s_j}, \quad j = 0, \dots, M, \qquad R_p^1 = \frac{\partial F_1}{\partial p}, \quad R_p^2 = \frac{\partial r_C}{\partial p},$$

$$A_j = \frac{\partial x(\tau_{j+1}; \tau_j, s_j, p)}{\partial s_j}, \quad B_j = \frac{\partial x(\tau_{j+1}; \tau_j, s_j, p)}{\partial p}, \quad j = 0, \dots, M-1.$$

For notational simplicity we omit here the variable $X$. For given $\Delta s_0$ and $\Delta p$ we can solve the continuity equations by a simple forward recursion. This is equivalent to a reduction of the linearized system in the following way. Define iteratively vectors $d_j$ and matrices $C_j$ and $D_j$ with $d_0 = r_0$, $C_0 = A_0$, $D_0 = B_0$,

$$d_j = A_j d_{j-1} + r_j, \quad C_j = A_j C_{j-1}, \quad D_j = A_j D_{j-1} + B_j, \quad j = 1, \dots, M-1,$$

and compute

$$\bar{F}_1 = F_1 + \sum_{j=1}^{M} R_j^1 d_{j-1}, \qquad \bar{r}_C = r_C + \sum_{j=1}^{M} R_j^2 d_{j-1},$$

$$\bar{R}_0^l = R_0^l + \sum_{j=1}^{M} R_j^l C_{j-1}, \qquad \bar{R}_p^l = R_p^l + \sum_{j=1}^{M} R_j^l D_{j-1}, \quad l = 1, 2.$$

We can then write

$$\Delta s_j = C_{j-1} \Delta s_0 + D_{j-1} \Delta p + d_{j-1}, \quad j = 1, \dots, M,$$

and substitute this in the linearized problem to get:

$$\begin{aligned} \min_{\Delta s_0,\, \Delta p} \quad & \sum_{i=1}^{N_1} \rho_\gamma\bigl( \bar{F}_{1,i} + \bar{R}_{0,i}^1 \Delta s_0 + \bar{R}_{p,i}^1 \Delta p \bigr), \\ \text{s.t.} \quad & \bar{r}_C + \bar{R}_0^2 \Delta s_0 + \bar{R}_p^2 \Delta p = 0. \end{aligned} \tag{10}$$

The condensed problem (10) is equivalent to a quadratic programming problem with additional variables $v$, $u$ and $w \in \mathbb{R}^{N_1}$ and additional equality and inequality constraints

$$\begin{aligned} \min_{\Delta s_0,\, \Delta p,\, v,\, u,\, w} \quad & \frac{1}{2} v^T v + \gamma \sum_{i=1}^{N_1} (u_i + w_i), \\ \text{s.t.} \quad & \bar{F}_1 + \bar{R}_0^1 \Delta s_0 + \bar{R}_p^1 \Delta p - v = u - w, \\ & u \ge 0, \quad w \ge 0, \\ & \bar{r}_C + \bar{R}_0^2 \Delta s_0 + \bar{R}_p^2 \Delta p = 0, \end{aligned}$$

and can be solved by a structure exploiting QP solver with an active set strategy.
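The forward recursion behind condensing can be checked numerically. The Python sketch below assumes scalar states and a scalar parameter, so that all blocks are plain numbers, and compares the condensed representation of the node increments with a node-by-node solution of the linearized continuity conditions. The numerical values and function names are hypothetical.

```python
def condense(A, B, r):
    """Forward recursion for d_j, C_j, D_j (scalar blocks for simplicity).

    Returns lists C, D, d such that
        delta_s[j + 1] = C[j] * delta_s0 + D[j] * delta_p + d[j].
    """
    C, D, d = [A[0]], [B[0]], [r[0]]
    for j in range(1, len(A)):
        C.append(A[j] * C[j - 1])
        D.append(A[j] * D[j - 1] + B[j])
        d.append(A[j] * d[j - 1] + r[j])
    return C, D, d


def direct_recursion(A, B, r, ds0, dp):
    """Solve A_j ds_j - ds_{j+1} + B_j dp + r_j = 0 node by node."""
    ds = [ds0]
    for j in range(len(A)):
        ds.append(A[j] * ds[j] + B[j] * dp + r[j])
    return ds


# Hypothetical sensitivities and continuity residuals for M = 3 intervals.
A = [0.9, 0.8, 1.1]
B = [0.1, -0.2, 0.3]
r = [0.01, -0.02, 0.03]
ds0, dp = 0.5, -0.25

C, D, d = condense(A, B, r)
ds_direct = direct_recursion(A, B, r, ds0, dp)
ds_condensed = [C[j] * ds0 + D[j] * dp + d[j] for j in range(len(A))]

if __name__ == "__main__":
    print(ds_direct[1:])
    print(ds_condensed)
```

Both routes give the same increments, which is why the intermediate node increments can be eliminated and only the initial-state and parameter increments remain as unknowns in (10).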


A violation of the regularity assumption that the rank of $\tilde{J}$ equals the number of unknowns means that the measurements judged to be “good” by the Huber estimator do not provide enough information about the parameters, i.e. the parameters are not identifiable from the given data. In this case a regularization by rank reduction is necessary. Alternatively, methods from optimum experimental design can be applied to gain more information about the parameters.

3 Numerical Results

As a numerical example we consider the chemical process of the denitrogenization of pyridine, see [1]. Pyridine is converted into ammonia and pentane by means of three catalysts. The reaction coefficients $p_1, \dots, p_{11}$ are the unknowns of the reaction which are to be estimated from given measurements. As the process is isothermal at 350 K and 100 atm, we do not need an Arrhenius term. Thus the process can be described mathematically by a system of seven ordinary differential equations, one for each occurring species,

$$\begin{aligned} \text{pyridine:} \quad & \dot{A} = -p_1 A + p_9 B, \\ \text{piperidine:} \quad & \dot{B} = p_1 A - p_2 B - p_3 BC + p_7 D - p_9 B + p_{10} DF, \\ \text{pentylamine:} \quad & \dot{C} = p_2 B - p_3 BC - 2 p_4 C^2 - p_6 C + p_8 E + p_{10} DF + 2 p_{11} EF, \\ \text{N-pentylpiperidine:} \quad & \dot{D} = p_3 BC - p_5 D - p_7 D - p_{10} DF, \\ \text{dipentylamine:} \quad & \dot{E} = p_4 C^2 + p_5 D - p_8 E - p_{11} EF, \\ \text{ammonia:} \quad & \dot{F} = p_3 BC + p_4 C^2 + p_6 C - p_{10} DF - p_{11} EF, \\ \text{pentane:} \quad & \dot{G} = p_6 C + p_7 D + p_8 E. \end{aligned}$$
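The right-hand side of this system is straightforward to code. The Python sketch below uses the “true” parameter values listed in Table 1; the initial pyridine concentration of 1.0 is an assumption made for the example, since the text only states that pyridine alone is present at the start.

```python
# "True" parameter values p1..p11 as listed in Table 1.
P_TRUE = [1.810, 0.894, 29.400, 9.210, 0.058, 2.430,
          0.0644, 5.550, 0.0201, 0.577, 2.150]


def pyridine_rhs(state, p):
    """Right-hand side of the seven-species pyridine model."""
    A, B, C, D, E, F, G = state
    p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11 = p
    dA = -p1 * A + p9 * B
    dB = p1 * A - p2 * B - p3 * B * C + p7 * D - p9 * B + p10 * D * F
    dC = (p2 * B - p3 * B * C - 2.0 * p4 * C * C - p6 * C
          + p8 * E + p10 * D * F + 2.0 * p11 * E * F)
    dD = p3 * B * C - p5 * D - p7 * D - p10 * D * F
    dE = p4 * C * C + p5 * D - p8 * E - p11 * E * F
    dF = p3 * B * C + p4 * C * C + p6 * C - p10 * D * F - p11 * E * F
    dG = p6 * C + p7 * D + p8 * E
    return [dA, dB, dC, dD, dE, dF, dG]


# Assumed initial state: only pyridine present (the value 1.0 is hypothetical).
x0 = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

if __name__ == "__main__":
    print(pyridine_rhs(x0, P_TRUE))
    # The first six right-hand sides sum to zero identically, which is a
    # useful algebraic sanity check when transcribing the model.
    test_state = [0.5, 0.2, 0.1, 0.05, 0.03, 0.02, 0.1]
    print(sum(pyridine_rhs(test_state, P_TRUE)[:6]))
```

The vanishing sum of the first six equations reflects that the total amount of the six nitrogen-containing species is conserved by the model.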

In the beginning of the process only pyridine is available while the initial concentration for all other species is zero. The artificial measurement data was generated using “true” parameter values. Four outliers were randomly introduced into the data. We solved the corresponding parameter estimation problem with the l2 norm, the l1 norm, and the Huber function as optimization criterion for the cost function. The computed parameter estimates together with their true values are given in Table 1. Obviously, l1 and Huber estimation do not differ much while the least squares approximation yields quite different results. The most deviant parameter estimates of the l2 estimation are marked in italics.

Table 1 Estimates of the parameters

Parameter   l2        l1        Huber     true
p1          1.812     1.810     1.810     1.810
p2          0.850     0.894     0.894     0.894
p3          29.597    29.399    29.393    29.400
p4          4.467     9.209     9.172     9.210
p5          0.059     0.058     0.058     0.058
p6          2.503     2.429     2.430     2.430
p7          0.112     0.0644    0.0647    0.0644
p8          1.990     5.550     5.551     5.550
p9          0.0203    0.0201    0.0201    0.0201
p10         0.497     0.577     0.576     0.577
p11         8.468     2.149     2.184     2.150
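The comparison in Table 1 can be quantified by relative errors with respect to the true values. The following Python sketch does this arithmetic on the tabulated numbers; the helper names are hypothetical and the computation is only a reading aid, not part of the original study.

```python
# Table 1: parameter -> (l2, l1, Huber, true)
TABLE = {
    "p1": (1.812, 1.810, 1.810, 1.810),
    "p2": (0.850, 0.894, 0.894, 0.894),
    "p3": (29.597, 29.399, 29.393, 29.400),
    "p4": (4.467, 9.209, 9.172, 9.210),
    "p5": (0.059, 0.058, 0.058, 0.058),
    "p6": (2.503, 2.429, 2.430, 2.430),
    "p7": (0.112, 0.0644, 0.0647, 0.0644),
    "p8": (1.990, 5.550, 5.551, 5.550),
    "p9": (0.0203, 0.0201, 0.0201, 0.0201),
    "p10": (0.497, 0.577, 0.576, 0.577),
    "p11": (8.468, 2.149, 2.184, 2.150),
}
COLUMNS = {"l2": 0, "l1": 1, "Huber": 2}


def relative_errors(method):
    """Relative error of each estimate with respect to the true value."""
    col = COLUMNS[method]
    return {name: abs(row[col] - row[3]) / row[3] for name, row in TABLE.items()}


def worst_parameter(method):
    """Parameter with the largest relative error and that error."""
    errs = relative_errors(method)
    name = max(errs, key=errs.get)
    return name, errs[name]


if __name__ == "__main__":
    for method in COLUMNS:
        name, err = worst_parameter(method)
        print(f"{method:>5}: worst parameter {name}, relative error {err:.4f}")
```

For the tabulated values, the largest relative error of the least squares estimate is several hundred percent, while the l1 and Huber estimates stay within a few percent of the true values.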

4 Conclusions

We developed methods for using the Huber estimator in parameter estimation problems with underlying dynamic processes. This estimator is “better” than the least squares method in the sense that (1) it is robust, (2) it gives a possibility for outlier identification, and (3) it tells us if there are enough “good” measurements to identify the parameters or if more information is needed. Therefore, the Huber estimator is the method of choice for normally distributed data with some outliers. For normally distributed data without outliers the $l_2$ estimator is of course still preferable. Although the $l_1$ estimator is also a robust method, it is inferior to the Huber estimator with respect to the amount of information taken into account.

Acknowledgements This research was supported by the German Federal Ministry for Education and Research (BMBF) through the Programme “Mathematics for Innovations in Industry and Public Services”.

References

1. Bock, H.G.: Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen. Bonner Mathematische Schriften, 183, Bonn (1987)
2. Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A.: Robust Statistics: The Approach Based on Influence Functions. Wiley, New York (1986)
3. Huber, P.J.: Robust Statistics. Wiley, New York (1981)
4. Huber, P.J.: Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35, 73–101 (1964)
5. Ekblom, H., Madsen, K.: Algorithms for Non-linear Huber Estimation. BIT, 29, 60–76 (1989)



Comparing MIQCP Solvers to a Specialised Algorithm for Mine Production Scheduling

Andreas Bley, Ambros M. Gleixner, Thorsten Koch, and Stefan Vigerske

Abstract This paper investigates the performance of several out-of-the-box solvers for mixed-integer quadratically constrained programmes (MIQCPs) on an open pit mine production scheduling problem with mixing constraints. We compare the solvers BARON, COUENNE, SBB, and SCIP to a problem-specific algorithm on two different MIQCP formulations. The computational results presented show that general-purpose solvers with no particular knowledge of problem structure are able to nearly match the performance of a hand-crafted algorithm.

A. Bley
Technische Universität Berlin, Straße des 17. Juni 136, 10623 Berlin
e-mail: [email protected]

A.M. Gleixner · T. Koch
Zuse Institute Berlin, Takustraße 7, 14195 Berlin
e-mail: gleixner,[email protected]

S. Vigerske
Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin
e-mail: [email protected]

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_3, © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

Effective general-purpose techniques are currently applicable for most linear mixed-integer and continuous convex optimisation problems. In contrast, for many nonconvex optimisation problems, specialised algorithms are still required to find globally optimal solutions. Traditional solution methods for nonconvex integer optimisation problems have been developed either as entirely new solvers [16, 19], or by directly extending a solver for NLPs to cope with integrality conditions, see e.g. [3, 10]. In recent years several groups have started to explore a different direction by trying to extend MIP solvers to handle nonlinearities, see e.g. [1, 4, 5, 9, 14].

In this paper we compare the performance of a specialised branch-and-bound code for an open-pit mine production scheduling problem with mixing constraints to the performance of several general-purpose solvers on the same problems, specifically BARON [19], COUENNE [4], SBB [3], and SCIP [5]. An extended version of this article is available as technical report [7]. Open-pit mine production scheduling has been chosen as a test case, since the authors were involved in a research project to solve these challenging, large-scale optimisation problems [6]. Now, a few years later, it can be seen that recent general-purpose software yields nearly as good solutions out of the box.

2 Open Pit Mine Production Scheduling with Stockpiles

In this section we describe in detail our model of the open pit mine production scheduling problem (OPMPSP) [8, 12, 17]. Typically, the orebody of an open pit mine is discretised into small mining units called blocks. Block models of real-world open pit mines may consist of hundreds of thousands of blocks, resulting in large-scale optimisation problems. Groups of blocks are often aggregated to form larger mining units with possibly heterogeneous ore distribution, which we call aggregates. We assume such an aggregation of a block model is given a priori, with the set of aggregate indices $\mathcal{N} = \{1, \dots, N\}$. (Various techniques exist for computing aggregates of blocks; for an example see the fundamental tree method [18].) Note that this setting comprises the special case of an unaggregated block model where we have only one block per aggregate.

Moreover, we assume complete knowledge about the contents of each aggregate $i$: First, its rock tonnage $R_i$, i.e. the amount of material which has to be extracted from the mine. Second, its ore tonnage $O_i$, i.e. the fraction of the rock tonnage sufficiently valuable to be processed further; in contrast, the non-ore fraction of each aggregate is discarded as waste immediately after its extraction from the mine. Finally, the tonnages $A_i^1, \dots, A_i^K$ quantify a number of mineral attributes contained in the ore fraction. Attributes may be desirable, such as valuable mineral, or undesirable, such as chemical impurities.

The mining operations consist of several processes: First, rock is extracted from the pit, which we refer to as mining. Subsequently, the valuable part of the extracted material is refined further for sale, which is called processing; the remaining material not sufficiently valuable is simply discarded as waste. In an intermediate stage between mining and processing, the valuable material may be stored on stockpiles. A stockpile can be imagined as a “bucket” in which all material is immediately mixed and becomes homogeneous.

The lifespan of the mine is discretised into several, not necessarily homogeneous periods $1, \dots, T$. A feasible mine schedule determines, for each time period, the amount of rock which is to be mined from each aggregate, the fraction of the mined ore which is to be sent for processing or stockpiled, as well as the amount of ore sent from the stockpiles to the processing plant. Resource constraints restrict the amount of rock which may be mined and the amount of ore which may be processed during each time period $t$ by limits $M_t$ and $P_t$, respectively. Precedence constraints model the requirement that wall slopes are not too steep, ensuring the safety of the mine. Technically, these constraints demand that, before the mining of aggregate $i$ may be started, a set of predecessor aggregates $\mathcal{P}(i)$ must have been completely mined.

Long-term mining schedules have to be evaluated by their net present value: For each time period, we take the return from the processed and sold minerals minus the cost for mining and processing, multiplied by a decreasing discount factor to account for the time value of money. For homogeneous time periods and constant interest rate $q > 0$ per time period, the profit made in time period $t$ is multiplied by a factor of $1/(1+q)^t$. The objective is to find a feasible mine schedule with maximum net present value. Already without considering stockpiles, open pit mine production scheduling poses an NP-hard optimisation problem, see e.g. [13].

This paper focuses on the special case of one attribute (some valuable mineral) and a single stockpile. A more general setting comprising multiple attributes, multiple stockpiles, or blending constraints in case of multiple attributes can easily be modelled by minor extensions and modifications, see [6]. To conclude this section, Table 1 summarises the notation introduced above.

Table 1 List of notation

Symbol                    Meaning
$N$, $\mathcal{N}$        Number of aggregates and set of aggregate indices $\{1, \dots, N\}$, respectively
$\mathcal{P}(i)$          Set of immediate predecessors of aggregate $i$
$R_i$, $O_i$              Rock and ore tonnage of aggregate $i$, respectively [tonnes]
$A_i^k$                   Tonnage of attribute $k$ in aggregate $i$ ($A_i$ for a single attribute) [tonnes]
$c^k$                     Sales price of attribute $k$ ($c$ for a single attribute) [$m/tonne]
$m$, $p$                  Mining and processing cost, respectively [$m/tonne]
$T$                       Number of time periods
$\delta_t$                Discount factor for time period $t$ (typically $1/(1+q)^t$ with fixed interest rate $q > 0$)
$M_t$, $P_t$              Mining and processing capacity, respectively, for time period $t$ [tonnes]

3 MIQCP Formulations

In this section we provide mixed-integer quadratically constrained programming (MIQCP) formulations of the open pit mine production scheduling problem with one attribute (“metal”) and a single, infinite-capacity stockpile, as presented in [6]: an aggregated “basic” formulation and an extended “warehouse” formulation. These formulations are theoretically equivalent. The results in [6], however, clearly speak in favour of the extended formulation. For the LP relaxation based solvers, this is equally confirmed by our computational study presented in Sect. 4.

3.1 Basic Formulation

To track the various material flows, we define the following continuous decision variables for each aggregate $i$ and time period $t$:

$y^m_{i,t} \in [0, 1]$ as the fraction of aggregate $i$ mined at time period $t$,
$y^p_{i,t} \in [0, 1]$ as the fraction of aggregate $i$ mined at time period $t$ and sent immediately for processing,
$y^s_{i,t} \in [0, 1]$ as the fraction of aggregate $i$ mined at time period $t$ and sent to the stockpile,
$o^s_t, a^s_t \ge 0$ as the absolute amount of ore respectively metal on the stockpile at time period $t$, and
$o^p_t, a^p_t \ge 0$ as the absolute amount of ore respectively metal sent from the stockpile to the processing plant at time period $t$.

With this, the net present value of a mine schedule is calculated as

$$NPV(y^m, y^p, o^p, a^p) = \sum_{t=1}^{T} \delta_t \Biggl[ c \Bigl( a^p_t + \sum_{i=1}^{N} A_i y^p_{i,t} \Bigr) - p \Bigl( o^p_t + \sum_{i=1}^{N} O_i y^p_{i,t} \Bigr) - m \sum_{i=1}^{N} R_i y^m_{i,t} \Biggr]. \tag{1}$$

In order to model the precedence constraints, we define the binary variables $x_{i,t} \in \{0, 1\}$ as equal to 1 if aggregate $i$ is completely mined within time periods $1, \dots, t$. A precedence-feasible extraction sequence is then ensured by the constraints

$$x_{i,t} \le \sum_{\tau=1}^{t} y^m_{i,\tau} \quad \text{for } i \in \mathcal{N},\ t = 1, \dots, T, \tag{2}$$

$$\sum_{\tau=1}^{t} y^m_{i,\tau} \le x_{j,t} \quad \text{for } i \in \mathcal{N},\ j \in \mathcal{P}(i),\ t = 1, \dots, T. \tag{3}$$

Additionally, we may, without altering the set of feasible solutions, require the sequence $x_{i,1}, \dots, x_{i,T}$ to be nondecreasing for each aggregate $i$:

$$x_{i,t-1} \le x_{i,t} \quad \text{for } i \in \mathcal{N},\ t = 2, \dots, T. \tag{4}$$


Though redundant from a modelling point of view, these inequalities may help (or hinder) computationally, and have been used in the benchmark algorithm from [6]. Conservation of the mined material is enforced by

$$\sum_{t=1}^{T} y^m_{i,t} \le 1 \quad \text{for } i \in \mathcal{N}, \text{ and} \tag{5}$$

$$y^p_{i,t} + y^s_{i,t} \le y^m_{i,t} \quad \text{for } i \in \mathcal{N},\ t = 1, \dots, T, \tag{6}$$

i.e. for each aggregate, the amount sent for processing or to the stockpile in one time period must not exceed the total amount mined. (The difference $y^m_{i,t} - y^p_{i,t} - y^s_{i,t}$ is discarded as waste.) To model the state of the stockpile, we make

3.1.1 Assumption S

Material sent from the stockpile to processing is removed at the beginning of each time period, while material extracted from the pit (and not immediately processed) is stockpiled at the end of each time period.

Following this assumption, we must not send more material from the stockpile to processing than is available at the end of the previous period:

$$o^p_t \le o^s_{t-1} \quad \text{and} \quad a^p_t \le a^s_{t-1} \quad \text{for } t = 2, \dots, T. \tag{7}$$

If we assume the stockpile to be empty at the start of the mining operations, we have $o^p_1 = a^p_1 = 0$. Now, the book-keeping constraints for the amount of ore on the stockpile read

$$o^s_t = \begin{cases} \sum_{i=1}^{N} O_i y^s_{i,1} & \text{for } t = 1, \\ o^s_{t-1} - o^p_t + \sum_{i=1}^{N} O_i y^s_{i,t} & \text{for } t = 2, \dots, T, \end{cases} \tag{8}$$

and analogously for the amount of metal on the stockpile

$$a^s_t = \begin{cases} \sum_{i=1}^{N} A_i y^s_{i,1} & \text{for } t = 1, \\ a^s_{t-1} - a^p_t + \sum_{i=1}^{N} A_i y^s_{i,t} & \text{for } t = 2, \dots, T. \end{cases} \tag{9}$$

The resource constraints on mining and processing read

$$\sum_{i=1}^{N} R_i y^m_{i,t} \le M_t \quad \text{for } t = 1, \dots, T, \text{ and} \tag{10}$$

$$o^p_t + \sum_{i=1}^{N} O_i y^p_{i,t} \le P_t \quad \text{for } t = 1, \dots, T. \tag{11}$$


Last, we need to ensure that the ore-metal-ratio of the material sent from stockpile to processing equals the ore-metal-ratio in the stockpile itself. Otherwise, only the profitable metal could be sent to processing and for sale while the ore, only causing processing costs, could remain in the stockpile. This involves the nonconvex quadratic mixing constraints $a^p_t / o^p_t = a^s_{t-1} / o^s_{t-1}$ for $t = 2, \dots, T$. To avoid singularities, we reformulate these constraints as

$$a^p_t\, o^s_{t-1} = a^s_{t-1}\, o^p_t \quad \text{for } t = 2, \dots, T. \tag{12}$$

All in all, we obtain the basic formulation

$$\begin{aligned} \max \quad & NPV(y^m, y^p, o^p, a^p) \\ \text{s.t.} \quad & \text{(2)–(12)}, \\ & x \in \{0, 1\}^{N \times T}, \quad y^m, y^p, y^s \in [0, 1]^{N \times T}, \quad o^s, a^s, o^p, a^p \ge 0. \end{aligned} \tag{BF}$$

Stockpiling capacities can be incorporated as upper bounds on $o^s$ and $a^s$.
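The book-keeping constraints (8) and (9), the mixing constraint (12) and the objective (1) can be evaluated for a given schedule with a few lines of Python. The two-aggregate, two-period data below are invented for illustration, and the function names are hypothetical; the sketch only checks a schedule, it does not optimise one.

```python
def stockpile_levels(coef, y_s, out):
    """Book-keeping (8)/(9): stockpile content at the end of each period.

    coef[i] is O_i (ore) or A_i (metal), y_s[i][t] the stockpiled fraction,
    out[t] the amount reclaimed in period t (out[0] must be 0).
    """
    levels = []
    for t in range(len(out)):
        inflow = sum(c * y[t] for c, y in zip(coef, y_s))
        previous = levels[t - 1] if t > 0 else 0.0
        levels.append(previous - out[t] + inflow)
    return levels


def mixing_residuals(o_s, a_s, o_p, a_p):
    """Residuals of (12): a^p_t o^s_{t-1} - a^s_{t-1} o^p_t for t = 2..T."""
    return [a_p[t] * o_s[t - 1] - a_s[t - 1] * o_p[t]
            for t in range(1, len(o_p))]


def npv(delta, c, p, m, R, O, A, y_m, y_p, o_p, a_p):
    """Net present value (1) of a schedule."""
    total = 0.0
    n = len(R)
    for t in range(len(delta)):
        metal = a_p[t] + sum(A[i] * y_p[i][t] for i in range(n))
        ore = o_p[t] + sum(O[i] * y_p[i][t] for i in range(n))
        rock = sum(R[i] * y_m[i][t] for i in range(n))
        total += delta[t] * (c * metal - p * ore - m * rock)
    return total


# Invented toy data: two aggregates, two periods, interest rate q = 0.1.
R, O, A = [300.0, 400.0], [100.0, 200.0], [10.0, 5.0]
y_m = [[1.0, 0.0], [0.5, 0.5]]   # fractions mined
y_p = [[0.5, 0.0], [0.0, 0.5]]   # fractions processed directly
y_s = [[0.5, 0.0], [0.5, 0.0]]   # fractions stockpiled
o_p, a_p = [0.0, 60.0], [0.0, 3.0]           # reclaim 40% of the stockpile
delta = [1.0 / 1.1 ** t for t in (1, 2)]

o_s = stockpile_levels(O, y_s, o_p)
a_s = stockpile_levels(A, y_s, a_p)
value = npv(delta, 10.0, 0.1, 0.05, R, O, A, y_m, y_p, o_p, a_p)

if __name__ == "__main__":
    print(o_s, a_s, mixing_residuals(o_s, a_s, o_p, a_p), value)
    # "Cherry-picking" all metal but no ore violates the mixing constraint:
    print(mixing_residuals(o_s, a_s, [0.0, 0.0], [0.0, 7.5]))
```

Reclaiming ore and metal in the same proportion satisfies (12) exactly, while taking only the metal produces a large residual; this is the situation the mixing constraints are designed to exclude.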

3.2 Warehouse Formulation In the basic formulation (BF) the material of all aggregates sent from the pit to the stockpile is aggregated into variables os and as . Alternatively, we may track the material flows via the stockpile individually. Instead of variables os , as , op , and ap , we then define for each aggregate i and time period t: p

zi;t 2 Œ0; 1 as the fraction of aggregate i sent from stockpile for processing at time period t and s

zi;t 2 Œ0; 1 as the fraction of aggregate i remaining in the stockpile at time period t. The net present values in terms of these variables is calculated as NP V .y m ; y p ; zp / D # " N N N T X X X X  p  p p  p  m Ri yi;t : Oi yi;t C zi;t  m ıt c Ai yi;t C zi;t  p t D1

i D1

i D1

(13)

i D1

s

Constraints (2)–(6) remain unchanged. Starting with an empty stockpile gives zi;1 D p zi;1 D 0 for i 2 N . Under Assumption S, the stockpile balancing equations read s

s

s

p

zi;t 1 C yi;t 1 D zi;t C zi;t

for i 2 N ; t D 2; : : : ; T:

(14)

MIQCP Solvers vs. a Spec. Algo. for Mine Sched.

31

The resource constraints on mining are the same as (10), the resource constraints on processing become N X i D1

 p p  Oi yi;t C zi;t 6 Pt

for t D 1; : : : ; T:

(15)

Instead of the mixing constraints (12), we now demand that for each time period $t$, the fraction $z^p_{i,t} / z^s_{i,t}$ is equal for each aggregate $i$. We obtain a better formulation by introducing, for each time period $t$, a new variable $f_t \in [0,1]$ called out-fraction, and requiring for all $i \in \mathcal{N}$ that $z^p_{i,t} / (z^s_{i,t} + z^p_{i,t}) = f_t$. To avoid zero denominators, we reformulate this as

$$z^p_{i,t} \, (1 - f_t) = z^s_{i,t} \, f_t \quad \text{for } i \in \mathcal{N},\ t = 2, \ldots, T. \tag{16}$$
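The interplay of the balance equations (14) and the out-fraction constraints (16) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example data are hypothetical, and a single period transition is considered in isolation.

```python
def split_stockpile(z_s_prev, y_s_prev, f_t):
    """Split every aggregate's stockpiled fraction with one common out-fraction f_t.

    z_s_prev[i]: fraction of aggregate i in the stockpile at period t-1
    y_s_prev[i]: fraction of aggregate i sent to the stockpile in period t-1
    Returns (z_p, z_s): fractions sent to processing and remaining at period t.
    """
    z_p, z_s = [], []
    for zs, ys in zip(z_s_prev, y_s_prev):
        available = zs + ys                   # left-hand side of balance equation (14)
        z_p.append(f_t * available)           # part leaving the stockpile
        z_s.append((1.0 - f_t) * available)   # part staying in the stockpile
    return z_p, z_s

def satisfies_mixing(z_p, z_s, f_t, tol=1e-12):
    """Check the bilinear mixing constraint (16) for every aggregate."""
    return all(abs(zp * (1.0 - f_t) - zs * f_t) <= tol for zp, zs in zip(z_p, z_s))

# Hypothetical data for three aggregates and an out-fraction of 25 %.
z_p, z_s = split_stockpile([0.5, 0.0, 0.2], [0.1, 0.3, 0.0], 0.25)
uniform_ok = satisfies_mixing(z_p, z_s, 0.25)
# Emptying one aggregate while keeping another violates (16) for f_t = 0.5.
selective_ok = satisfies_mixing([0.3, 0.0], [0.3, 0.3], 0.5)
```

A uniform split satisfies (16) by construction, whereas a selective withdrawal of individual aggregates does not; this is the behaviour that the single out-fraction per period is meant to rule out.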

This gives the warehouse formulation

$$\max \; NPV(y^m, y^p, z^p) \quad \text{s.t.} \quad (2)\text{–}(6),\ (10),\ (14)\text{–}(16); \tag{WF}$$

$$x \in \{0,1\}^{N \times T}, \quad y^m, y^p, y^s, z^p, z^s \in [0,1]^{N \times T}, \quad f \in [0,1]^{T}.$$

Note that the basic formulation is an aggregated version of the warehouse formulation, and thus the LP relaxation (obtained by dropping integrality and mixing constraints) is tighter for the warehouse formulation. Bley et al. [6] propose a rough a priori discretisation of the out-fractions in order to tighten the linear MIP relaxation obtained when dropping all of the quadratic constraints. Computational results on the effect of this technique can be found in the extended version of this article [7].

4 Computational Study

4.1 Application-Specific Benchmark Algorithm

As benchmark algorithm we used the application-specific approach developed by Bley et al. [6]. It features a branch-and-bound algorithm based on the linear MIP relaxation of the problem obtained by dropping the nonlinear mixing constraints, i.e. (12) for the basic and (16) for the warehouse formulation. A specialised branching scheme is used to force the maximum violation of the nonlinear constraints arbitrarily close to zero. They implemented their approach using the state-of-the-art MIP solver CPLEX 11.2.1 with tuned parameter settings. Additionally, they apply problem-specific heuristics as well as a variable fixation scheme and cutting planes derived from the underlying precedence constrained knapsack structure, which have been shown to improve the dual bound for linear mine production scheduling models. For further details, see [6]. We used the same implementation in our computational study.

4.2 General-Purpose MIQCP Solvers

For our computational experiments, we had access to four general-purpose solvers for MIQCPs:

BARON [19], a closed-source mixed-integer nonlinear programming (MINLP) solver that implements a spatial branch-and-bound algorithm based on a linear relaxation obtained from a convexification of the MINLP. Branching is performed on both integer variables and continuous variables, the latter to reduce the gap between a nonconvex function and its convex underestimator. We used BARON 9.0.2 with CPLEX 12.1.0 [14] as LP solver and MINOS 5.51 [15] as NLP solver.

COUENNE [4], a recently developed open-source MINLP solver that implements a technique similar to that of BARON. It is built on top of the MIP solver CBC [11]. We used COUENNE 0.2 (stable branch, rev. 256) with CBC 2.3 as branch-and-bound framework, CLP 1.10 [11] as LP solver, and the interior-point solver IPOPT 3.6 [20] to handle NLPs.

SCIP [2], a constraint integer programming solver that is freely available for academic use and has recently been extended to handle quadratic constraints within an LP based branch-and-cut algorithm [5]. We used SCIP 1.2.0.4 once with CPLEX 12.1.0 and once with CLP 1.10 as LP solver, and IPOPT 3.7 as QCP solver.

SBB [3], a commercial solver for MINLPs that implements an NLP based branch-and-bound algorithm. The solution of the NLP relaxation is used as dual bound for the branch-and-bound algorithm. This bound can only be trusted if all NLPs are solved to global optimality. However, we used SBB with CONOPT 3.14T [3] as NLP solver, which does not guarantee global optimality for the nonconvex NLP relaxations in our application. We still include SBB in our test set, since NLP based branch-and-bound algorithms often obtain very good primal solutions for nonconvex MINLPs as well.

4.3 Test Instances

Our industry partner BHP Billiton Pty. Ltd.² has provided us with realistic data from two open pit mines. Data set Marvin is based on a block model provided with the Whittle 4X mine planning software,³ originally consisting of 8,513 blocks, which were aggregated to 85 so-called “panels”, i.e. single layers of blocks without block-to-block precedence relations. The lifespan of this mine, i.e. the time in which the profitable part of the orebody can be fully mined, is 15 years. Each panel has an average of 2.2 immediate predecessor aggregates. Data set Dent is based on the block model of a real-world open pit mine in Western Australia, originally consisting of 96,821 blocks, which were aggregated to 125 panels. Each panel has an average of 2.0 immediate predecessor aggregates. The lifespan of this mine is 25 years.

The aggregations to panels, the cutoff grades (determining which blocks in each panel are immediately discarded as waste), and the precedence relations between the panels were pre-computed by our industry partner. Scheduling periods are time periods of one year each with a discount rate of 10% per year. Realistic values for mining costs and processing profits as well as for mining and processing capacities per year were chosen by our industry partner.

We tested the performance of the general-purpose MIQCP solvers from Sect. 4.2 on this data using the basic and the warehouse formulation, i.e. the same formulations on which the benchmark algorithm is based. Table 2 gives an overview of the size of these MIQCP formulations for instances Marvin and Dent.

Table 2 Size of MIQCPs for instances Marvin and Dent (before presolving)

| Formulation | Instance | Variables: Total | Variables: Bin | Variables: Cont | Constraints: Total | Constraints: Linear | Constraints: Quad |
|---|---|---|---|---|---|---|---|
| (BF) | Marvin | 5,848 | 1,445 | 4,403 | 7,598 | 7,582 | 16 |
| (BF) | Dent | 12,600 | 3,125 | 9,475 | 15,774 | 15,750 | 24 |
| (WF) | Marvin | 8,687 | 1,445 | 7,242 | 10,404 | 9,044 | 1,360 |
| (WF) | Dent | 18,775 | 3,125 | 15,650 | 21,900 | 18,900 | 3,000 |

² http://www.bhpbilliton.com/
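The binary and quadratic counts in Table 2 can be cross-checked against the formulations with a short calculation. The Python sketch below rests on two assumptions that are not stated in the text: that Marvin is modelled with $T = 17$ periods (inferred from 1,445 binaries for 85 panels, although the stated lifespan is 15 years) and that Dent is modelled with $T = 25$ periods. The function name is hypothetical.

```python
def expected_counts(num_aggregates, num_periods):
    """Counts implied by the formulations:
    binaries x in {0,1}^(N x T), one mixing constraint (12) per period t = 2..T in (BF),
    and one mixing constraint (16) per aggregate and period t = 2..T in (WF)."""
    return {
        "binaries": num_aggregates * num_periods,
        "quad_bf": num_periods - 1,
        "quad_wf": num_aggregates * (num_periods - 1),
    }

marvin = expected_counts(85, 17)   # T = 17 is an inference from Table 2, not a stated value
dent = expected_counts(125, 25)    # T = 25 assumed equal to the stated lifespan
```

Under these assumptions the sketch reproduces the Bin and Quad columns of Table 2 for both instances and both formulations.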

³ Gemcom Whittle, http://www.gemcomsoftware.com/products/whittle/

4.4 Computational Results

The experiments were run single-threaded on an Intel Core2 Extreme CPU X9650 with 3.0 GHz and 8 GB RAM, with a time limit of 10,000 s. Since no solver was able to provide a provably optimal solution, we report the primal and dual bounds and the number of nodes processed after one hour and at the end of the time limit.

4.4.1 Solver Settings

We ran both BARON and COUENNE with default and with optimised settings. For BARON we set maxpretime 1800, limiting the preprocessing time to 30 minutes, and PEnd 5 and PDo 50, reducing probing to depth 5 for at most 50 variables. For COUENNE we set aggressive fbbt no and optimality bt no to switch off bound propagation techniques that were too expensive. For SBB, we generally switched on the option acceptnonopt, ensuring that SBB did not prune a node if the NLP subsolver did not conclude optimality or infeasibility of the node’s QCP relaxation. Besides default settings, we also tested a tuned version with option dfsstay 25. The results for tuned settings are indicated by “∗” in tables and figures. We parenthesised the dual bound and gap for solver SBB, since they might be invalid due to the nonconvexity of the problem.

SCIP was run with one setting only. The extended RENS heuristic [5] was called frequently. The QCP solver was only used inside the RENS heuristic to find feasible solutions of the sub-MIQCP with all integer variables fixed. To allow a better comparison with both BARON and COUENNE, we ran SCIP once with CPLEX and once with CLP as LP solver.

4.4.2 Results for the Basic Formulation

Table 3 shows the performance of the application-specific benchmark algorithm from Sect. 4.1 and the general-purpose solvers when using the basic formulation. The application-specific algorithm yields the smallest primal-dual gaps among the LP relaxation based solvers, all of which, however, terminate with large dual bounds. Among the LP based general-purpose solvers, BARON has the best dual bounds, while it is outperformed by COUENNE and SCIP in terms of primal solutions. However, including the benchmark algorithm, all LP based solvers perform rather unsatisfactorily on the basic formulation.

In contrast, the tightest dual bounds are clearly obtained by the NLP based approach of solver SBB, although they cannot be trusted. It produces the best primal solution for problem instance Marvin and terminates with the smallest gap of 3.25%. For instance Dent, however, the best solution found by SBB was 18.1% worse than the best primal solution found by COUENNE, resulting in a final gap larger than for the benchmark algorithm.

4.4.3 Results for the Warehouse Formulation

Table 4 shows the results for the warehouse formulation. First note that the LP based approaches perform significantly better on this formulation. The application-specific algorithm shows excellent performance on the warehouse formulation. It produces the best primal solutions and terminates with the smallest primal-dual gaps of 0.02% for instance Marvin and 0.33% for instance Dent. Nevertheless, the best solutions found by the general-purpose solvers are only 0.4% and 0.2% below those found by the benchmark algorithm for Marvin and Dent, respectively.

Table 3 Results for basic formulation (BF)

| Instance | Solver | Primal (3,600 s) | Dual (3,600 s) | Nodes (3,600 s) | Primal (10,000 s) | Dual (10,000 s) | Nodes (10,000 s) | Gap [%] |
|---|---|---|---|---|---|---|---|---|
| Marvin | Benchmark | 678.2 | 916.6 | 249,100 | 678.2 | 916.4 | 476,456 | 35.13 |
| | BARON | 229.2 | 1,217.9 | 500 | 229.1 | 1,163.9 | 1,925 | 407.90 |
| | BARON∗ | 568.4 | 1,386.1 | 10 | 641.4 | 1,142.7 | 3,979 | 78.15 |
| | COUENNE | – | 1,655.6 | 2 | – | 1,650.0 | 104 | – |
| | COUENNE∗ | 283.9 | 1,645.9 | 2,893 | 642.6 | 1,636.0 | 15,429 | 154.58 |
| | SCIP/CPLEX | 669.6 | 1,584.1 | 232,600 | 672.4 | 1,579.7 | 645,397 | 57.43 |
| | SCIP/CLP | 671.6 | 1,581.1 | 170,800 | 671.6 | 1,577.4 | 462,962 | 57.43 |
| | SBB | 676.6 | (706.1) | 7,980 | 682.7 | (705.0) | 25,266 | (3.25) |
| | SBB∗ | 683.0 | (706.0) | 8,040 | 685.1 | (704.9) | 24,922 | (2.88) |
| Dent | Benchmark | 47.3 | 54.0 | 100,500 | 47.3 | 53.8 | 269,023 | 13.71 |
| | BARON | 4.8 | 106.8 | 19 | 4.8 | 106.8 | 384 | 2,118.86 |
| | BARON∗ | 4.6 | 104.3 | 6 | 4.6 | 104.3 | 1,333 | 2,161.57 |
| | COUENNE | 48.1 | 113.9 | 0 | 48.1 | 113.9 | 2 | 137.04 |
| | COUENNE∗ | 48.1 | 113.9 | 0 | 48.1 | 113.4 | 2,955 | 135.81 |
| | SCIP/CPLEX | 46.1 | 110.1 | 61,400 | 46.1 | 109.9 | 179,554 | 58.05 |
| | SCIP/CLP | 47.5 | 110.3 | 45,000 | 47.5 | 110.0 | 141,391 | 56.86 |
| | SBB | 39.4 | (50.2) | 500 | 39.4 | (50.2) | 1,214 | (27.53) |
| | SBB∗ | 39.4 | (50.2) | 540 | 39.4 | (50.2) | 1,220 | (27.33) |

The best dual bounds from the general-purpose solvers are 1.4% and 0.2% away from the benchmark values for Marvin and Dent, respectively. Note that this difference is not only due to the handling of the nonlinear constraints. Also, the benchmark algorithm uses knowledge about the underlying precedence constrained knapsack structure of the linear constraints in order to fix binary variables and separate induced cover inequalities. This structure is not directly exploited by the general-purpose solvers. In contrast to the LP relaxation based solvers, the QCP relaxation based approach of SBB appears to be less dependent on the change in formulation. Notably, the Dent instance appears more challenging to SBB than Marvin, while for SCIP the situation is reversed. This is probably due to the increased problem size, which affects the solvability of the QCP relaxation in SBB more than the solvability of the LP relaxation in SCIP. For both instances, SCIP was able to compute better primal solutions than SBB. For instance Marvin, SBB produced a solution only slightly worse than SCIP when using the option dfsstay 25. Here, the forced depth first search after nodes with integer feasible solution appears to function as an improvement heuristic, compensating for SBB’s lack of heuristics.
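The percentages quoted for the warehouse formulation can be retraced from the rounded values in Table 4. The Python sketch below is an illustration only; which table entries count as the "best" general-purpose values is my reading of Table 4 (SBB's parenthesised dual bounds are left out), and the function name is hypothetical.

```python
def relative_difference_percent(value, reference):
    """Absolute difference between value and reference, in percent of the reference."""
    return abs(value - reference) / abs(reference) * 100.0

# Final primal bounds: best general-purpose value vs. benchmark value
primal_marvin = relative_difference_percent(692.0, 695.0)   # SCIP/CLP vs. benchmark
primal_dent = relative_difference_percent(48.8, 48.9)       # SCIP/CPLEX vs. benchmark
# Final dual bounds: best general-purpose value vs. benchmark value
dual_marvin = relative_difference_percent(704.6, 695.1)     # SCIP/CPLEX vs. benchmark
dual_dent = relative_difference_percent(49.1, 49.0)         # SCIP vs. benchmark
```

Rounded to one decimal place, these values agree with the 0.4% and 0.2% quoted for the primal solutions and the 1.4% and 0.2% quoted for the dual bounds.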

Table 4 Results for warehouse formulation (WF)

| Instance | Solver | Primal (3,600 s) | Dual (3,600 s) | Nodes (3,600 s) | Primal (10,000 s) | Dual (10,000 s) | Nodes (10,000 s) | Gap [%] |
|---|---|---|---|---|---|---|---|---|
| Marvin | Benchmark | 694.8 | 695.9 | 41,057 | 695.0 | 695.1 | 115,103 | 0.02 |
| | BARON | 303.3 | 715.1 | 482 | 388.8 | 713.9 | 1,881 | 83.60 |
| | BARON∗ | 427.8 | 715.9 | 803 | 619.3 | 714.8 | 4,546 | 15.43 |
| | COUENNE | – | 719.5 | 0 | – | 719.5 | 2 | – |
| | COUENNE∗ | 681.2 | 718.5 | 193 | 687.6 | 715.7 | 1,534 | 4.09 |
| | SCIP/CPLEX | 691.9 | 705.1 | 43,000 | 691.9 | 704.6 | 149,474 | 1.80 |
| | SCIP/CLP | 691.7 | 705.5 | 31,200 | 692.0 | 704.7 | 95,927 | 1.80 |
| | SBB | 677.8 | (705.9) | 8,940 | 689.0 | (705.0) | 27,498 | (2.32) |
| | SBB∗ | 684.3 | (705.9) | 9,020 | 691.8 | (705.0) | 27,095 | (1.92) |
| Dent | Benchmark | 48.8 | 49.1 | 7,300 | 48.9 | 49.0 | 23,401 | 0.33 |
| | BARON | in preprocessing | in preprocessing | in preprocessing | 11.7 | 50.0 | 61 | 327.00 |
| | BARON∗ | 11.6 | 50.0 | 146 | 46.5 | 50.0 | 864 | 7.65 |
| | COUENNE | 47.3 | 50.3 | 0 | 47.3 | 50.3 | 2 | 6.43 |
| | COUENNE∗ | 47.3 | 50.2 | 2 | 47.3 | 50.2 | 10 | 6.11 |
| | SCIP/CPLEX | 48.5 | 49.2 | 13,000 | 48.8 | 49.1 | 41,312 | 0.71 |
| | SCIP/CLP | 48.7 | 49.2 | 9,600 | 48.7 | 49.1 | 34,275 | 1.00 |
| | SBB | 40.2 | (50.1) | 580 | 40.2 | (50.1) | 1,546 | (24.69) |
| | SBB∗ | 40.3 | (50.1) | 660 | 40.3 | (50.1) | 1,611 | (24.19) |

4.4.4 Comparison of LP Based Solvers BARON, COUENNE, and SCIP

For the basic formulation, the best dual bounds were found by BARON, while for the warehouse formulation SCIP computed tighter bounds. For all formulations, SCIP computed the best primal solutions among the global solvers (all found by the extended RENS heuristic) and terminated with the smallest gaps. BARON spent much time in preprocessing and per node, also with reduced probing, which results in a comparably small number of enumerated nodes. COUENNE, in contrast, spent much time in its primal solution heuristics. A significant amount of this time was used by the underlying NLP solver IPOPT, which seems to have difficulties solving the (nonconvex) QCPs obtained from fixing integer variables in the original formulation.

Figure 1 compares the progress of the primal and dual bounds from the start to the time limit of 10,000 s for all three solvers. It can be seen that even with tuned settings BARON and COUENNE spent a significant amount of time in presolving, especially for instance Dent. BARON found a number of primal solutions in presolving. Since the BARON log files do not show the times when primal solutions were found during presolving, we plot these at the end of presolving. For the dual bound it can be seen that all three solvers start approximately with the same dual bound. SCIP, however, is able to decrease it more rapidly and comes closer to the best known primal solution values from the application-specific algorithm. For both instances, SCIP’s dual bound after 1,800 s is already less than 0.35% above its final value at 10,000 s.

[Figure 1: two panels, Marvin and Dent; horizontal axis: time [s] from 0 to 10,000; vertical axes: primal bound and dual bound; curves for Benchmark, BARON∗, Couenne∗, and SCIP]

Fig. 1 Progress in primal and dual bounds for the application-specific algorithm (grey) and the global solvers BARON (dotted), COUENNE (continuous), and SCIP/CPLEX (dashed) for warehouse formulation (WF). The “best primal” axis is level with the best known primal solution value from the application-specific benchmark algorithm

5 Conclusion

We have compared the performance of state-of-the-art generic MIQCP solvers on two realistic instances arising from the scheduling of open pit mine production. The problem can be characterised as a large mixed-integer linear program which is complemented by quadratic mixing constraints.

The performance of SCIP and the application-specific algorithm indicates that for such problems, extending a MIP framework compares favourably to other approaches. Intuitively, the reason might be that integer variables usually model decisions, whereas nonlinear constraints model conditions. Once a linear relaxation of the nonlinear constraints has been solved and all variables are integer feasible, what remains is to fix violations of the nonlinear constraints. One could argue that in many applications this is easier than trying to fix violated integrality constraints once a continuous nonlinear relaxation has been solved.

On the other hand, the performance of the QCP relaxation based solver SBB shows that employing nonlinear relaxations can make the solver more robust with respect to the choice of the formulation used. Unfortunately, as long as there is no way to prove global optimality for the relaxation used, this can only be used as a heuristic.

Comparing the LP based general-purpose solvers, COUENNE exploits some sophisticated heuristics in the root node, which enable it to produce good primal solutions. However, the low number of enumerated nodes, partly due to using IPOPT as QCP solver, yields weaker dual bounds. BARON computed better dual bounds, but was unable to produce comparable primal solutions.

Our experiments demonstrated that SCIP is able to perform nearly as well as a problem-specific implementation. In a pure MIP setting, SCIP would employ 25 primal heuristics. At the time of testing, only one of these had been extended to handle nonlinearities.

Acknowledgements We thank our industry partner BHP Billiton Pty. Ltd. for providing us with the necessary data sets to conduct this study, and GAMS Development Corp. for providing us with evaluation licenses for BARON and SBB. This research was partially funded by the DFG Research Center MATHEON, Project B20.

References

1. K. Abhishek, S. Leyffer, and J.T. Linderoth. FilMINT: An outer-approximation-based solver for nonlinear mixed integer programs. Technical Report ANL/MCS-P1374-0906, Argonne National Laboratory, 2006.
2. T. Achterberg. Constraint Integer Programming. PhD thesis, TU Berlin, 2007.
3. ARKI Consulting & Development A/S. CONOPT and SBB. http://www.gams.com/solvers/solvers.htm.
4. P. Belotti, J. Lee, L. Liberti, F. Margot, and A. Wächter. Branching and bounds tightening techniques for non-convex MINLP. Optimization Methods and Software, 24(4-5):597–634, 2009.
5. T. Berthold, S. Heinz, and S. Vigerske. Extending a CIP framework to solve MIQCPs. Technical Report 09-23, Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB), 2009. http://opus.kobv.de/zib/volltexte/2009/1186/.
6. A. Bley, N. Boland, G. Froyland, and M. Zuckerberg. Solving mixed integer nonlinear programming problems for mine production planning with a single stockpile. Technical Report 2009/21, Institute of Mathematics, TU Berlin, 2009.
7. A. Bley, A.M. Gleixner, T. Koch, and S. Vigerske. Comparing MIQCP solvers to a specialised algorithm for mine production scheduling. Technical Report 09-32, Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB), October 2009. http://opus.kobv.de/zib/volltexte/2009/1206/.
8. N. Boland, I. Dumitrescu, G. Froyland, and A.M. Gleixner. LP-based disaggregation approaches to solving the open pit mining production scheduling problem with block processing selectivity. Comp. & Oper. Research, 36:1064–1089, 2009.
9. P. Bonami, L.T. Biegler, A.R. Conn, G. Cornuéjols, I.E. Grossmann, C.D. Laird, J. Lee, A. Lodi, F. Margot, N. Sawaya, and A. Wächter. An algorithmic framework for convex mixed integer nonlinear programs. Disc. Opt., 5:186–204, 2008.
10. O. Exler and K. Schittkowski. A trust region SQP algorithm for mixed-integer nonlinear programming. Optimization Letters, 1:269–280, 2007.
11. J.J. Forrest. CLP and CBC. http://projects.coin-or.org/Clp,Cbc/.
12. C. Fricke. Applications of Integer Programming in Open Pit Mining. PhD thesis, University of Melbourne, August 2006.
13. A.M. Gleixner. Solving large-scale open pit mining production scheduling problems by integer programming. Master’s thesis, TU Berlin, June 2008.
14. IBM. ILOG CPLEX. http://www-01.ibm.com/software/integration/optimization/cplex.
15. B.A. Murtagh and M.A. Saunders. MINOS 5.5 User’s Guide. Department of Operations Research, Stanford University, 1998. Report SOL 83-20R.
16. I. Nowak and S. Vigerske. LaGO: a (heuristic) branch and cut algorithm for nonconvex MINLPs. Central Europ. J. of Oper. Research, 16(2):127–138, 2008.
17. M.G. Osanloo, J. Gholamnejad, and B. Karimi. Long-term open pit mine production planning: a review of models and algorithms. International Journal of Mining, Reclamation and Environment, 22(1):3–35, 2008.
18. S. Ramazan. The new fundamental tree algorithm for production scheduling of open pit mines. European Journal of Oper. Research, 177(2):1153–1166, 2007.
19. M. Tawarmalani and N.V. Sahinidis. Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software, and Applications. Kluwer Academic Publishers, 2002.
20. A. Wächter and L.T. Biegler. On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25–57, 2006.



A Binary Quadratic Programming Approach to the Vehicle Positioning Problem

Ralf Borndörfer and Carlos Cardonha

R. Borndörfer · C. Cardonha, Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany
H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_4, © Springer-Verlag Berlin Heidelberg 2012

Abstract The VEHICLE POSITIONING PROBLEM (VPP) is a classical combinatorial optimization problem that has a natural formulation as a MIXED INTEGER QUADRATICALLY CONSTRAINED PROGRAM. This MIQCP is closely related to the QUADRATIC ASSIGNMENT PROBLEM and, as far as we know, has not received any attention yet. We show in this article that such a formulation has interesting theoretical properties. Its QP relaxation produces, in particular, the first known nontrivial lower bound on the number of shuntings. In our experiments, it also outperformed alternative integer linear models computationally. The strengthening technique that raises the lower bound might also be useful for other combinatorial optimization problems.

1 Introduction

The VEHICLE POSITIONING PROBLEM (VPP) is about the assignment of vehicles (buses, trams, or trains) to parking positions in a depot and to timetabled trips. The parking positions are organized in tracks, which work as one- or two-sided stacks or queues. If at some point in time a required type of vehicle is not available in the front of any track, shunting movements must be performed in order to change the vehicle positions. This is undesirable and should be avoided.

The VPP and its variants, such as the BUS DISPATCHING PROBLEM [5], the TRAM DISPATCHING PROBLEM [13], and the TRAIN UNIT DISPATCHING PROBLEM [10], are well investigated in the combinatorial optimization literature, see Hansmann and Zimmermann [7]. The problem was introduced by Winter [13] and Winter and Zimmermann [14], who modeled the VPP as a QUADRATIC ASSIGNMENT PROBLEM and used linearization techniques to solve it as an integer linear program. This approach was extended by Gallo and Di Miele [4] to deal with vehicles of different lengths and interlaced sequences of arrivals and departures. Similarly, Hamdouni et al. [5] explored robustness and the idea of uniform tracks (tracks which receive just one type of vehicle) to solve larger problems. Recently, Freling, Kroon, Lentink, and Huisman [3] and Kroon, Lentink, and Schrijver [10] proposed an integer linear program to consider decomposable vehicles (trains) and different types of tracks; they assume that the number of uniform tracks is known in advance.

Although the VPP was originally modeled as a binary quadratic program, this formulation was not explored theoretically and it was not used for computations. All research efforts that we are aware of concentrated on integer linear models that used more and more indices in order to produce tighter linearizations. Recent progress in mixed integer nonlinear programming (MINLP) and, in particular, in mixed integer quadratically constrained programming (MIQCP) methods [11], however, has increased the attractiveness of the original quadratic model. Besides the compactness of this formulation, quadratic programming models also yield potentially superior lower bounds from fractional quadratic programming relaxations. In fact, the LP relaxations of all known integer linear models yield only the trivial lower bound zero.

We investigate in this article two binary quadratic programming formulations for the VPP. Our main result is that the QP relaxation of one of these models yields a nontrivial lower bound on the number of shunting movements, that is, the fractional QP lower bound is nonzero whenever shunting is required. This model also gave the best computational performance in our tests, even though it is not convex. We also tried to apply convexification techniques [6], but the results were mixed. Convexification helped, but only when the smallest eigenvalue of the objective function was not too negative.

The article is organized as follows. The VPP is described in Sect. 2. Section 3 discusses integer linear and integer quadratic 2-index models, i.e., we revisit the original approach of Winter. In Sect. 4 we present integer linear and integer quadratic 3-index models. One of them produces the already mentioned QP bound. All our computational experiments were done on an Intel(R) Core 2 Quad 2,660 MHz with 4 GB RAM, running under openSUSE 11.1 (64 bits). We used CPLEX 11.2 [8] to solve linear programs, SCIP 1.0 for integer programs [1], and SNIP 1.0 for integer non-linear programs [12].

2 The Vehicle Positioning Problem

The VEHICLE POSITIONING PROBLEM (VPP) is a 3-dimensional matching problem, where vehicles that arrive in a sequence $\mathcal{A} = \{a_1, a_2, \ldots, a_n\}$ must be assigned to parking positions $\mathcal{P} = \{p_1, p_2, \ldots, p_n\}$ in a depot and depart to service a sequence of timetabled trips $\mathcal{D} = \{d_1, d_2, \ldots, d_n\}$. We assume that the first departure trip starts after the last incoming vehicle arrived. Each vehicle $a_i$ has a type $t(a_i)$ and each trip $d_i$ can be serviced only by vehicles of type $t(d_i)$. The parking positions are located in tracks $\mathcal{S}$, and we assume that positions in the tracks are numbered consecutively. Each track $s \in \mathcal{S}$ has size $\beta$, and we assume that $\beta |\mathcal{S}| \geq n$. Each track is operated as a FIFO queue, that is, vehicles enter the track at one end and leave at the other.

Consider a matching with assignments $(i, p, k)$ and $(j, q, l)$, that is, the $i$-th arriving vehicle is assigned to parking position $p$ in order to service the $k$-th departing trip, and the $j$-th arriving vehicle is assigned to parking position $q$ in order to service the $l$-th departing trip. Assume that $p$ and $q$ are located in the same stack; then a shunting movement is required if either $i < j$ and $p > q$, or $p < q$ and $k > l$. In this case, we say that these assignments are in conflict and denote the associated crossings by $(i, p) \dagger (j, q)$ or $(p, k) \dagger (q, l)$.

Given $\mathcal{A}$, $\mathcal{P}$, $\mathcal{D}$, $\mathcal{S}$, $t$, and $\beta$, the VPP is to find a 3-dimensional matching that minimizes the number of crossings. The number of crossings is related to the number of required shuntings. We remark that there are more complex versions of this problem involving different sizes of vehicles and parking positions, multiple periods, etc. However, we do not consider them here.

We use the following notation. $V(M)$ denotes the optimal objective value of a model $M$. If $M$ is an ILP, $V_{LP}(M)$ is the optimal objective value of its LP relaxation, and if $M$ is an MIQCP, $V_{QP}(M)$ is the optimal objective value of its fractional quadratic programming relaxation. Finally, we say that two models $M$ and $M'$ are equivalent if, for every solution of $M$, there is a solution of $M'$ with the same objective value and vice versa.
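The conflict rule above can be turned into a small counting routine. The following Python sketch is an illustration under stated assumptions: the function name, the explicit position-to-track map `track_of`, and the example data are hypothetical, and arrival and departure crossings of a pair are counted separately, as the two sums in a quadratic objective would.

```python
from itertools import combinations

def count_crossings(assignments, track_of):
    """Count arrival and departure crossings of a matching.

    assignments: list of (i, p, k) = (arrival index, position, departure index)
    track_of: dict mapping each position p to the track it belongs to
    """
    crossings = 0
    for (i, p, k), (j, q, l) in combinations(assignments, 2):
        if track_of[p] != track_of[q]:
            continue  # vehicles on different tracks never block each other
        if (i < j) != (p < q):
            crossings += 1  # arrival crossing: arrival order and position order disagree
        if (p < q) != (k < l):
            crossings += 1  # departure crossing: position order and departure order disagree
    return crossings

# Two tracks with two positions each: positions 1, 2 on track 0 and 3, 4 on track 1.
track_of = {1: 0, 2: 0, 3: 1, 4: 1}
fifo_matching = [(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4)]
swapped_departures = [(1, 1, 1), (2, 2, 2), (3, 3, 4), (4, 4, 3)]
```

In this example, `fifo_matching` has no crossings, while `swapped_departures` has one departure crossing on the second track, because the vehicle at position 3 would have to leave after the vehicle at position 4.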

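3 Two-Index Models

Winter [14] gave the following integer quadratic programming formulation for the VPP:

$$
\begin{array}{llll}
\text{(W)} \quad \min & \displaystyle \sum_{(a,p) \dagger (a',q)} x_{a,p}\, x_{a',q} + \sum_{(d,p) \dagger (d',q)} y_{d,p}\, y_{d',q} & & (1) \\[2ex]
& \displaystyle \sum_{a \in \mathcal{A}} x_{a,p} = 1 & p \in \mathcal{P} & (2) \\[2ex]
& \displaystyle \sum_{p \in \mathcal{P}} x_{a,p} = 1 & a \in \mathcal{A} & (3) \\[2ex]
& \displaystyle \sum_{d \in \mathcal{D}} y_{d,p} = 1 & p \in \mathcal{P} & (4) \\[2ex]
& \displaystyle \sum_{p \in \mathcal{P}} y_{d,p} = 1 & d \in \mathcal{D} & (5) \\[2ex]
& x_{a,p} + y_{d,p} \leq 1 & (a,p,d) \in \mathcal{A} \times \mathcal{P} \times \mathcal{D},\ t(a) \neq t(d) & (6) \\[1ex]
& x_{a,p},\, y_{d,p} \in \{0,1\}. & &
\end{array}
$$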


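The model uses binary variables $x_{a,p}$, with $a \in \mathcal{A}$ and $p \in \mathcal{P}$, and $y_{d,p}$, with $d \in \mathcal{D}$ and $p \in \mathcal{P}$. If $x_{a,p} = 1$ ($y_{d,p} = 1$), vehicle $a$ (trip $d$) is assigned to parking position $p$. Constraints (2)–(5) define the assignments; constraint (6) enforces the coherence of these assignments by allowing only vehicles and trips of the same type to be assigned to a given parking position. Finally, the quadratic cost function calculates the number of crossings.

In his work, Winter did not solve the quadratic program directly. Instead, he applied the linearization method of Kaufman and Broeckx [9], obtaining the following integer linear model:

$$
\begin{array}{llll}
\text{(LW)} \quad \min & \displaystyle \sum_{a \in \mathcal{A},\, p \in \mathcal{P}} w_{a,p} + \sum_{d \in \mathcal{D},\, p \in \mathcal{P}} u_{d,p} & & (7) \\[2ex]
& \displaystyle \sum_{a \in \mathcal{A}} x_{a,p} = 1 & p \in \mathcal{P} & (8) \\[2ex]
& \displaystyle \sum_{p \in \mathcal{P}} x_{a,p} = 1 & a \in \mathcal{A} & (9) \\[2ex]
& \displaystyle \sum_{d \in \mathcal{D}} y_{d,p} = 1 & p \in \mathcal{P} & (10) \\[2ex]
& \displaystyle \sum_{p \in \mathcal{P}} y_{d,p} = 1 & d \in \mathcal{D} & (11) \\[2ex]
& x_{a,p} + y_{d,p} \leq 1 & (a,p,d) \in \mathcal{A} \times \mathcal{P} \times \mathcal{D},\ t(a) \neq t(d) & (12) \\[1ex]
& \displaystyle d^x_{a,p}\, x_{a,p} - w_{a,p} + \sum_{(a,p) \dagger (a',q)} x_{a',q} \leq d^x_{a,p} & \forall p \in \mathcal{P},\ a \in \mathcal{A} & (13) \\[2ex]
& \displaystyle d^y_{d,p}\, y_{d,p} - u_{d,p} + \sum_{(d,p) \dagger (d',q)} y_{d',q} \leq d^y_{d,p} & \forall p \in \mathcal{P},\ d \in \mathcal{D} & (14) \\[2ex]
& x_{a,p},\, y_{d,p} \in \{0,1\}, \quad w_{a,p},\, u_{d,p} \in \mathbb{N}. & &
\end{array}
$$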
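To see why constraints of the form (13) reproduce the quadratic cost, it helps to evaluate them for fixed binary values. The Python sketch below is an illustration, not the implementation used in the experiments; the function name is hypothetical, and it assumes that the counting variable takes its smallest feasible value, as it would in a minimization.

```python
def linearized_cost(x_ap, conflicting_sum, d_ap):
    """Smallest natural number w that satisfies constraint (13) for fixed x values.

    (13) reads d*x - w + sum <= d, i.e. w >= sum - d*(1 - x); w is also >= 0.
    """
    return max(0, conflicting_sum - d_ap * (1 - x_ap))

# With a valid upper bound d >= sum, the linear cost equals the quadratic term x * sum.
matches_product = all(
    linearized_cost(x, s, d) == x * s
    for d in range(5) for s in range(d + 1) for x in (0, 1)
)
# With a bound that is too small, an unused assignment (x = 0) is charged by mistake.
charged_by_mistake = linearized_cost(0, 3, 1)
```

The second case, which evaluates to 2 instead of 0, shows why the constants in (13) and (14) must be valid upper bounds on the number of crossings of an assignment.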

In this model, the integer variables $w_{a,p}$ and $u_{d,p}$ count the number of crossings involving the assignments $(a,p)$ and $(d,p)$, respectively. $d^x_{a,p}$ and $d^y_{d,p}$ are upper bounds on these variables, respectively, that are computed a priori. The following is known about these models:

Remark 1. The model W has $2n^2$ variables and $n^3 + 4n$ constraints.

Remark 2. The model LW has $4n^2$ variables and $n^3 + 2n^2 + 4n$ constraints.

Theorem 1 (WZ00). The models W and LW are equivalent.

Theorem 2 (WZ00). $V_{LP}(\mathrm{LW}) = 0$.

It is not difficult to modify Winter’s proof of Theorem 2 in order to get a similar result for the QP relaxation of his quadratic model:

Theorem 3. $V_{QP}(\mathrm{W}) = 0$ if $|\mathcal{S}| > 1$.

Proof. Let $M$ be a matching where each $a_i$ is assigned to $d_i$ (i.e., first vehicle to first trip, second vehicle to second trip, and so on) and the assignment of the pairs $(a_i, d_i)$ to the parking positions is made according to the following scheme, where each column represents a track:

$$
\begin{array}{cccc}
(a_{n-|\mathcal{S}|+1}, d_{n-|\mathcal{S}|+1}) & (a_{n-|\mathcal{S}|+2}, d_{n-|\mathcal{S}|+2}) & \cdots & (a_n, d_n) \\
\vdots & \vdots & \ddots & \vdots \\
(a_{|\mathcal{S}|+1}, d_{|\mathcal{S}|+1}) & (a_{|\mathcal{S}|+2}, d_{|\mathcal{S}|+2}) & \cdots & (a_{2|\mathcal{S}|}, d_{2|\mathcal{S}|}) \\
(a_1, d_1) & (a_2, d_2) & \cdots & (a_{|\mathcal{S}|}, d_{|\mathcal{S}|})
\end{array}
$$

Such a matching has no crossings. However, it is not always feasible for W because of type mismatches (cf. the coherence constraints (6)). If the integrality of the variables is relaxed, assigning each pair $(a_i, d_i)$ to the same relative position in each track avoids the restrictions given by the coherence constraints. More precisely, if a pair $(a_i, d_i)$ is assigned to the second position of some track (in other words, if $\lfloor (i-1)/|\mathcal{S}| \rfloor = 1$), we fix $x_{a_i,p} = y_{d_i,p} = 1/|\mathcal{S}|$ for each position $p \in \mathcal{P}$ which is the second position in some track (in other words, if $\lfloor (p-1)/|\mathcal{S}| \rfloor = 1$). If $|\mathcal{S}| > 1$, constraints (6) are satisfied. Since there are no crossings, the objective value is zero. □

A problem with model W is that the objective is not convex. This obstacle can be overcome using the following eigenvalue technique of Hammer and Rubin [6]. Initially, we observe that $\sum_{(a,p) \dagger (a',q)} x_{a,p}\, x_{a',q}$ can be written as $x^T A x$, where $A \in \{0,1\}^{n^2 \times n^2}$ is the symmetric incidence matrix of all arrival crossings. If $\alpha$ is the minimum eigenvalue of $A$, we have

$$x^T A x = x^T (A - \alpha I) x + \alpha\, x^T x. \tag{15}$$

As $x$ is binary, this equation can be rewritten as

$$x^T A x = x^T (A - \alpha I) x + \alpha \sum_i x_i. \tag{16}$$

Finally, in our case, we have $\sum_i x_i = n$ for every feasible solution, that is,

$$x^T A x = x^T (A - \alpha I) x + \alpha n. \tag{17}$$
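The identity behind (15) and (16) can be checked by enumeration on a small example. The Python sketch below is an illustration under stated assumptions: the $3 \times 3$ matrix is a hypothetical stand-in for a crossing matrix, the helper names are not from the paper, and, to stay dependency-free, it uses the Gershgorin lower bound on the smallest eigenvalue in place of the exact minimum eigenvalue. Any shift that does not exceed the smallest eigenvalue keeps the shifted matrix positive semidefinite, although a weaker shift changes the relaxation.

```python
from itertools import product

def quad_form(matrix, x):
    """Evaluate x^T M x for a square matrix given as a list of rows."""
    n = len(x)
    return sum(matrix[r][c] * x[r] * x[c] for r in range(n) for c in range(n))

def gershgorin_lower_bound(matrix):
    """Lower bound on the smallest eigenvalue of a symmetric matrix (Gershgorin circles)."""
    n = len(matrix)
    return min(
        matrix[r][r] - sum(abs(matrix[r][c]) for c in range(n) if c != r)
        for r in range(n)
    )

def shifted(matrix, alpha):
    """Return matrix - alpha * I."""
    n = len(matrix)
    return [[matrix[r][c] - (alpha if r == c else 0) for c in range(n)] for r in range(n)]

A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]                      # hypothetical symmetric 0/1 crossing matrix
alpha = gershgorin_lower_bound(A)    # a bound on, not the value of, the smallest eigenvalue
B = shifted(A, alpha)                # diagonally dominant, hence positive semidefinite

# Identity (16): x^T A x = x^T (A - alpha I) x + alpha * sum_i x_i for every binary x
identity_holds = all(
    quad_form(A, x) == quad_form(B, x) + alpha * sum(x)
    for x in product((0, 1), repeat=3)
)
# For fractional points the two sides differ, so the QP relaxation changes.
x_frac = (0.5, 0.5, 0.5)
fractional_gap = quad_form(A, x_frac) - (quad_form(B, x_frac) + alpha * sum(x_frac))
```

The enumeration confirms the identity on all binary points, while the nonzero `fractional_gap` illustrates that the convexified objective is a different function on the relaxation.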

As A  ˛I P is positive semidefinite, the function on the right is convex. The same ideas yield .d;p/Ž.d;q0 / yd;p yd;q 0 D y T A0 y. Moreover, A0 D A. Then, the objective can be written as X X 2 2 x T A0 x  ˛ .xa;p  xa;p / C y T A0 y  ˛ .yd;p  yd;p /: (18) .a;p/

.d;p/

Applying this substitution to the model W, we obtain:

46

R. Bornd¨orfer and C. Cardonha

minx T A0 x  ˛

P

.a;p/

2 .xa;p  xa;p / C y T A0 y  ˛

P

xa;p D 1

a2A

(20)

yd;p D 1

p2P

(21)

yd;p D 1

d 2D

(22)

p2P

d 2D

P

p2P

2 .yd;p  yd;p /

(19)

xa;p D 1

P

.d;p/

p2P

a2A

P

P

xa;p C yd;p  1

.a;p;d /2APD t .a/¤t .d /

(23)

xa;p ; yd;p 2 f0; 1g: Table 1 give the results of a computational comparison of models W and LW, and W and CW, respectively, on a test set of ten instances of small and medium sizes. The first column in these tables give the name x-y-z of the problem. Here, x is the number of vehicle types, y is the number of tracks, and z D ˇ is the number of parking positions per track. The arrival sequences A were built randomly (i.e., the type of each vehicle was uniformly chosen among the x possibilities), while sequences D were obtained by applying 1,000 uniformly chosen random swaps to A. The columns labeled Row, Col, and NZ give the number of constraints, variables, and non-zeros of the respective model. The numbers of rows and columns for the problems of model CW are the same as the ones for model W. Columns Nod give the number of nodes in the search tree generated by the respective solver (SCIP with LP solver CPLEX for LW and SNIP for W) and T/s the computation time in seconds. Comparing the results for models CW and W shows that convexification led to an improvement, but not enough to outperform the linearized model LW, in particular not on the larger instances. We remark that more sophisticated convexification techniques might improve the results [2].

4 Three-Index Models

Gallo and Di Miele [4] improved Winter's model by noting that assignments $(a, s)$ and $(d, s)$ of arrivals and departures to stacks implicitly determine the parking positions uniquely; this produces a substantially smaller model. Kroon, Lentink and Schrijver [10] took up this idea to create a 3-index model with a stronger LP relaxation (although the lower bound is still equal to zero):

Table 1 Comparing models LW, W, and CW

        LW                                        W/CW             W                          CW
Name    Row     Col    NZ       Nod     T/s      Row     Col      NZ      Nod      T/s       NZ      Nod     T/s
3-6-4   10,465  2,305  43,741   1,343   58       9,325   1,165    21,889  215      142       21,913  1,543   116
4-6-4   11,617  2,305  46,045   12,849  265      10,477  1,165    24,193  816      214       29,977  24,217  690
5-6-4   12,289  2,305  47,389   32,870  654      11,149  1,165    25,537  1,010    237       25,561  586     96
3-7-3   7,141   1,765  25,257   234     18       6,273   897      14,995  590      58        15,023  245     29
4-7-3   7,897   1,765  26,769   17,220  15       7,029   897      16,507  523      52        16,535  324     32
5-7-3   8,359   1,765  27,693   114     19       7,491   897      17,431  651      64        17,459  858     42
3-7-4   16,297  3,137  68,391   17,220  124      14,743  1,583    33,937  480      121       33,965  2,122   176
4-7-4   18,145  3,137  72,087   7,393   574      16,591  1,583    37,633  1,609    251       37,661  1,526   242
5-7-4   19,209  3,137  74,215   60,590  2,171    17,655  1,583    39,761  113,997  11,845    39,789  1,320   1,544
3-7-5   31,151  4,901  152,125  59,992  3,251    28,715  2,465    64,471  6,612    76,685    64,499  627     40,145

\[ (\mathrm{LU}) \quad \min \sum_{(a,s,d) \in ASD} r_{a,s,d} \tag{24} \]

[...]

Proof. Let $M$ be a matching where each $a_i$ is assigned to $d_i$ (i.e., first vehicle to first trip, second vehicle to second trip, and so on). Assign $\frac{1}{|S|}$ to each variable $x_{a,s,d}$ such that $(a, d) \in M$. In this case, Constraints (25) and (26) clearly hold, as

\[ \sum_{s} x_{a,s,d} = \sum_{s} \frac{1}{|S|} = 1 \]

for each $a \in A$ and $d \in D$. Moreover, as $|M| = n$,

\[ \sum_{(a,d)} x_{a,s,d} = n\, \frac{1}{|S|} \le \beta \]

for each $s \in S$, satisfying (27). Finally, because each arrival is assigned to only one departure, we have [...]

Theorem 6. $V_{QP}(\mathrm{UI}) > 0$ if $V(\mathrm{UI}) > 0$.

Proof. If $V(\mathrm{UI}) > 0$, there is a crossing for each possible assignment of vehicles to trips and tracks. Let $x^*$ be an optimal solution of the QP relaxation of UI. Consider the vector $\lceil x^* \rceil$. If $\lceil x^* \rceil$ contains an integer solution, there is a crossing and

\[ \sum_{s,\,(a,d) \dagger (a',d')} \lceil x^*_{a,s,d} \rceil \lceil x^*_{a',s,d'} \rceil > 0. \]

Then

\[ \sum_{s,\,(a,d) \dagger (a',d')} x^*_{a,s,d}\, x^*_{a',s,d'} > 0. \]

If $\lceil x^* \rceil$ does not contain an integer solution, there is an inconsistent assignment and therefore

\[ \sum_{a,\,(s,d) \ne (s',d')} x^*_{a,s,d}\, x^*_{a,s',d'} > 0. \]

As far as we know, $V_{QP}(\mathrm{UI})$ is the first nontrivial lower bound for the VPP. We remark that the same idea can also be used to strengthen some of the linear models such that they sometimes also produce nonzero lower bounds. We have, however, not been able to prove a result similar to Theorem 6, that is, that the lower bound is always nonzero if shuntings are required.

Table 2 gives the results of a computational comparison of models U and LU on the same set of test problems as in Sect. 3 plus one additional model that could not be solved there. Model UI could not be tested yet due to numerical problems. The comparison of the results for models CW and W from Sect. 3 and those for LU and U shows a clear superiority of the U models over the W models. Among the U models, the integer quadratic model U outperformed the integer linear model LU. The next instance 7-8-7, however, could not be solved using any of our formulations.

Table 2 Comparing models LU and U

        LU                                      U
Name    Row     Col     NZ       Nod   T/s     Row   Col    NZ      Nod   T/s
3-6-4   3,511   4,609   38,017   1     1       61    1,159  4,621   28    4
4-6-4   3,511   4,321   30,241   1     0       61    871    3,463   69    3
5-6-4   3,511   4,153   25,675   59    15      61    703    2,797   16    2
3-7-3   3,137   4,117   30,871   12    8       57    1,037  4,124   16    3
4-7-3   3,137   3,865   24,816   1     1       57    785    3,123   20    2
5-7-3   3,137   3,711   21,274   54    6       57    631    2,507   27    28
3-7-4   5,552   7,323   67,803   1     1       71    1,842  7,351   22    10
4-7-4   5,552   6,861   53,509   41    29      71    1,380  5,503   33    7
5-7-4   5,552   6,595   45,389   1     1       71    1,114  4,432   21    4
3-7-5   8,653   11,439  126,099  1     4       85    2,871  11,467  17    34
4-7-5   8,653   10,725  98,582   59    44      85    2,157  8,604   22    21
5-7-5   8,653   10,291  82,321   26    38      85    1,723  6,875   31    12
6-7-6   12,440  14,407  117,307  227   200     99    2,066  8,240   27    32

We have also tried to apply the convexification technique of Hammer and Rubin [6] to model U, but this time it did not bring any performance gain. A possible explanation for this behavior is that the spectra of the objectives of the U instances have negative eigenvalues of much larger magnitude than those in the W instances. Again, more sophisticated convexification could be tried [2].

Acknowledgements We thank Stefan Vigerske for his advice with respect to the formulation of integer quadratic programs and SNIP support. We also thank an anonymous referee for helpful comments and suggestions. The work of C. C. is supported by CNPq-Brazil.

References

1. T. Achterberg, Constraint integer programming, PhD thesis, TU Berlin, 2007.
2. A. Billionnet and S. Elloumi, Using a mixed integer quadratic programming solver for the unconstrained quadratic 0-1 problem, Math. Program., 109 (2007), pp. 55–68.
3. R. Freling, R. Lentink, L. Kroon, and D. Huisman, Shunting of passenger train units in a railway station, ERIM Report Series Research in Management, 2002.
4. G. Gallo and F. Di Miele, Dispatching buses in parking depots, Transportation Science, 35 (2001), pp. 322–330.
5. M. Hamdouni, F. Soumis, and G. Desaulniers, Dispatching buses in a depot minimizing mismatches, 7th IMACS, Scientific Computing, Toronto, Canada, 2005.
6. P. Hammer and A. Rubin, Some remarks on quadratic programming with 0-1 variables, Revue Francaise d'Informatique et de Recherche Operationelle, 4 (1970), pp. 67–79.
7. R. S. Hansmann and U. T. Zimmermann, Optimal Sorting of Rolling Stock at Hump Yards, in Mathematics – Key Technology for the Future: Joint Projects Between Universities and Industry, Springer, Berlin, 2008, pp. 189–203.
8. ILOG, CPLEX website. http://www.ilog.com/products/cplex/.
9. L. Kaufmann and F. Broeckx, An algorithm for the quadratic assignment problem, European J. Oper. Res., 2 (1978), pp. 204–211.
10. L. Kroon, R. Lentink, and A. Schrijver, Shunting of passenger train units: an integrated approach, ERIM Report Series Reference No. ERS-2006-068-LIS, 2006.
11. I. Nowak, Relaxation and Decomposition Methods for Mixed Integer Nonlinear Programming, Birkhäuser Verlag, 2005.
12. S. Vigerske, Nonconvex mixed-integer nonlinear programming. http://www.math.hu-berlin.de/stefan/B19/.
13. T. Winter, Online and Real-Time Dispatching Problems, PhD thesis, TU Braunschweig, 1998.
14. T. Winter and U. Zimmermann, Real-time dispatch of trams in storage yards, Annals of Operations Research, 96 (2000), pp. 287–315.



Determining Fair Ticket Prices in Public Transport by Solving a Cost Allocation Problem

Ralf Borndörfer and Nam-Dũng Hoàng

Abstract Ticket pricing in public transport usually takes a welfare maximization point of view. Such an approach, however, does not consider fairness in the sense that users of a shared infrastructure should pay for the costs that they generate. We propose an ansatz to determine fair ticket prices that combines concepts from cooperative game theory and integer programming. An application to pricing railway tickets for the intercity network of the Netherlands is presented. The results demonstrate that prices that are much fairer than standard ones can be computed in this way.

1 Introduction

Public transport ticket prices are well studied in the economic literature on welfare optimization as well as in the mathematical optimization literature on certain network design problems, see, e.g., the literature survey in [2]. To the best of our knowledge, however, the fairness of ticket prices has not been investigated yet. The point is that typical pricing schemes are not related to infrastructure operation costs and, in this sense, favor some users, who do not fully pay for the costs they incur. For example, we will show that in this paper's (academic) example of the Dutch IC railway network, the current distance tariff results in a situation where the passengers in the central Randstad region of the country pay over 25% more than the costs they incur, and these excess payments subsidize operations elsewhere. One can argue that this is not fair. We therefore ask whether it is possible to construct ticket prices that reflect operation costs better.

R. Borndörfer · N.-D. Hoàng, Zuse Institute Berlin (ZIB), Germany, e-mail: [email protected]; [email protected]
H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0 5, © Springer-Verlag Berlin Heidelberg 2012

Ticket pricing can be seen as a cost allocation problem; see [11] for a survey and introduction. Cost allocation problems are widespread. They come up whenever it is necessary or desirable to divide a common cost between several users or items. If the users have an alternative to accepting the prices, cost allocation becomes a game. Examples of applications where cost allocations have been determined using methods from cooperative game theory include aircraft landing fees [8], water resource planning [10], water resource development [12], distribution cost of gas and oil transportation [4], and investment in electric power [5].

In this paper we model ticket pricing as a cooperative cost allocation game to minimize overpayments. We argue that the f-least core of this game can be used to determine fair prices. The f-least core can be computed by solving a linear program. This linear program has a number of constraints that is exponential in the number of players, but it can be solved using a constraint generation approach. The associated separation problem is an NP-hard combinatorial optimization problem.

The article is structured as follows. Section 2 recalls some concepts from cooperative game theory. Some desired properties that a cost allocation should have are introduced in Sect. 3. A model that treats ticket pricing as a cost allocation game is presented in Sect. 4. The final Sect. 5 is devoted to the Dutch IC railway network example.

We use the following notation. For a vector $x \in \mathbb{R}^N$ and a subset $S \subseteq N$, we denote by $x(S) = \sum_{i \in S} x_i$ the sum of the coordinates of $x$ in $S$.

2 Game Theoretical Setting

A cost allocation game deals with price determination and is defined as follows.

Definition 1. Consider a set of players $N = \{1, 2, \dots, n\}$, a cost function $c : 2^N \setminus \{\emptyset\} \to \mathbb{R}_+$, and a non-empty polyhedron $P = \{x \in \mathbb{R}^n \mid Ax \le b,\ x_i \ge 0,\ \forall i \in N\}$, which gives the set of feasible prices $x$ that the players are asked to pay. The triple $\Gamma = (N, c, P)$ is a cost allocation game.

Definition 2. Let $\Gamma = (N, c, P)$ be a cost allocation game and $f : 2^N \setminus \{\emptyset\} \to \mathbb{R}_+$ a weight function. For each coalition $\emptyset \ne S \subsetneq N$ and each vector $x = (x_1, \dots, x_n) \in \mathbb{R}^n$, we define the f-excess of $S$ at $x$ as

\[ e_f(S, x) := \frac{c(S) - x(S)}{f(S)}. \]

The f-excess represents the gain (or loss, if it is negative) of coalition $S$, scaled by $f(S)$, if its members accept to pay $x(S)$ instead of operating some service themselves at cost $c(S)$. We will assume in this article that the weight function $f$ has the form $f = \alpha + \beta |\cdot| + \gamma c$ with $\alpha, \beta, \gamma \ge 0$ and $\alpha + \beta + \gamma > 0$; e.g., $f(S) = |S|$ gives the gain per coalition member. The excess measures price acceptability: the smaller $e_f(S, x)$, the less favorable is price $x$ for coalition $S$, and for $e_f(S, x) < 0$, i.e., in case of a loss, $x$ will be seen as unfair by the members of $S$.

We assume furthermore that the set of cost-covering prices of a cost allocation game $\Gamma$,

\[ X(\Gamma) := \{x \in P \mid x(N) = c(N)\}, \]

is non-empty. The goal of the cost allocation game is to determine a price $x \in X(\Gamma)$ which minimizes the loss (or maximizes the gain) over all coalitions. In other words, we associate the optimization problem

\[ \max_{x \in X(\Gamma)}\ \min_{\emptyset \ne S \subsetneq N} e_f(S, x) \]

with the cost allocation game $\Gamma = (N, c, P)$. We recall some related definitions from game theory.

Definition 3. Consider a cost allocation game $\Gamma = (N, c, P)$. Let $\varepsilon$ be a real number and $f : 2^N \setminus \{\emptyset\} \to \mathbb{R}_+$ be a weight function. The set

\[ C_{\varepsilon,f}(\Gamma) := \{x \in X(\Gamma) \mid e_f(S, x) \ge \varepsilon,\ \forall\, \emptyset \ne S \subsetneq N\} \]

is the $(\varepsilon, f)$-core of $\Gamma$. In particular, $C_{0,f}(\Gamma)$ is the core of $\Gamma$. The f-least core of the game $\Gamma$, denoted $LC_f(\Gamma)$, is the intersection of all nonempty $(\varepsilon, f)$-cores.

Proposition 1. Let $\varepsilon_f(\Gamma)$ be the largest $\varepsilon$ such that the set $C_{\varepsilon,f}(\Gamma)$ is nonempty, i.e.,

\[ \varepsilon_f(\Gamma) = \max_{x \in X(\Gamma)}\ \min_{\emptyset \ne S \subsetneq N} e_f(S, x). \]

Then $LC_f(\Gamma) = C_{\varepsilon_f(\Gamma),f}(\Gamma)$.

In other words, the f-least core is the set of all vectors in $X(\Gamma)$ that maximize the minimum f-excess of proper subsets of $N$. The following trivial proposition holds.

Proposition 2. If $X(\Gamma)$ is non-empty, then $LC_f(\Gamma)$ is non-empty.
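The f-excess and the max-min objective can be illustrated on a toy game. The sketch below is only an illustration under stated assumptions: the three-player cost function `toy_cost` and both price vectors are made up, and the weights default to $f = c$ (that is, $\alpha = \beta = 0$, $\gamma = 1$). It evaluates the minimum f-excess over all proper coalitions by enumeration, which is feasible only for very small player sets.

```python
from itertools import combinations

def proper_coalitions(players):
    """Yield every non-empty proper subset of the player set."""
    for k in range(1, len(players)):
        for S in combinations(players, k):
            yield frozenset(S)

def f_excess(S, x, cost, alpha=0.0, beta=0.0, gamma=1.0):
    """e_f(S, x) = (c(S) - x(S)) / f(S) with f = alpha + beta*|S| + gamma*c(S)."""
    weight = alpha + beta * len(S) + gamma * cost[S]
    return (cost[S] - sum(x[i] for i in S)) / weight

def min_excess(x, players, cost, **weights):
    """Smallest f-excess over all proper coalitions (inner problem of the max-min)."""
    return min(f_excess(S, x, cost, **weights) for S in proper_coalitions(players))

# Hypothetical three-player game with c(N) = 12 (illustration only).
players = (1, 2, 3)
toy_cost = {
    frozenset({1}): 6.0, frozenset({2}): 6.0, frozenset({3}): 6.0,
    frozenset({1, 2}): 8.0, frozenset({1, 3}): 10.0, frozenset({2, 3}): 10.0,
}
equal_split = {1: 4.0, 2: 4.0, 3: 4.0}   # x(N) = 12
adjusted = {1: 3.5, 2: 3.5, 3: 5.0}      # x(N) = 12

print(min_excess(equal_split, players, toy_cost))  # coalition {1, 2} has zero gain
print(min_excess(adjusted, players, toy_cost))     # every proper coalition gains
```

In this toy game the second price vector has the larger minimum excess, so it is preferable in the sense of the max-min problem, although the sketch makes no claim that it lies in the least core.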

3 Cost Allocation Methods

In this section we take a novel axiomatic approach to cost allocation games. Introducing a number of desirable properties, we ask whether there is a "perfect" cost allocation method that fulfills them all. It turns out that this is unfortunately impossible. Due to lack of space, we state here only the results; a full exposition can be found in [7].

Definition 4. Let $\Pi$ be the set of all cost allocation games. A cost allocation method $\Phi$ is a function

\[ \Phi : \Pi \to \bigcup_{k=1}^{\infty} \mathbb{R}^k_{\ge 0}, \quad (N, c, P) \mapsto x \in \mathbb{R}^{|N|}_{\ge 0}. \]

Definition 5. A cost allocation method $\Phi$ is feasible for a cost allocation game $(N, c, P)$ if $\Phi(N, c, P)$ belongs to $P$. A cost allocation method is feasible if it is feasible for every game $(N, c, P)$.

Definition 6. A cost allocation method $\Phi$ is efficient for a cost allocation game $(N, c, P)$ if there holds

\[ \Phi(N, c, P)(N) = c(N). \]

A cost allocation method is efficient if it is efficient for every game $(N, c, P)$.

Definition 7. Consider a cost allocation game $(N, c, P)$. For each coalition $\emptyset \ne S \subsetneq N$, define the set

\[ P_S := \{\, x|_S \mid x \in P : x|_{N \setminus S} = 0 \,\}. \]

Assume that $P_S$ is non-empty for every coalition $\emptyset \ne S \subsetneq N$. A cost allocation method $\Phi$ is a coalitionally stable allocation method for the game $(N, c, P)$ if there holds for every coalition $\emptyset \ne S \subsetneq N$

\[ \Phi(N, c, P)(S) \le \Phi(S, c_S, P_S)(S), \]

where $c_S := c|_{2^S}$. A cost allocation method is coalitionally stable if it is coalitionally stable for every cost allocation game $(N, c, P)$ satisfying $P_S \ne \emptyset$ for every nonempty coalition $S$.

With a coalitionally stable cost allocation method, there is no proper coalition $S$ of $N$ such that the price for each player in $S$ will not increase and the price for at least one player will decrease if $S$ leaves the grand coalition $N$. Hence, the grand coalition $N$ is stable, since no coalition $S$ has an incentive to leave $N$.

Definition 8. A cost allocation method $\Phi$ is a core allocation method if for every cost allocation game $(N, c, P)$ with non-empty core the vector $\Phi(N, c, P)$ belongs to the core of $(N, c, P)$. It is an f-least core allocation method if for every cost allocation game $(N, c, P)$ the vector $\Phi(N, c, P)$ belongs to the f-least core of $(N, c, P)$.

In reality, allocation costs can often only be approximated or they can change over time. Therefore, a cost allocation method should be insensitive with respect to changes of the cost function.

Definition 9. A cost allocation method $\Phi$ is said to have bounded variation if for each number $\delta \in (0, 1)$ there exists a positive number $K$ such that for all cost allocation games $(N, c, P)$ and $(N, \tilde{c}, P)$ satisfying

\[ |\tilde{c}(S) - c(S)| \le \alpha\, |c(S)|, \quad \forall S \subseteq N, \]

for some $0 \le \alpha \le \delta$, there holds

\[ |\Phi(N, \tilde{c}, P)_i - \Phi(N, c, P)_i| \le K \alpha\, \Phi(N, c, P)_i, \quad \forall i \in N : \Phi(N, c, P)_i \ne 0. \]

Proposition 3. Each f-least core allocation method is a feasible, efficient core allocation method.

We can now formulate our question: Does a feasible, efficient, coalitionally stable core allocation method, which has bounded variation, exist? The answer is "no" due to the following two propositions.

Proposition 4. There is no efficient, coalitionally stable allocation method for cost allocation games whose cores are empty.

Proposition 5. Core allocation methods do not have bounded variation, even if we only consider cost allocation games having a monotone, subadditive cost function.

That means that, in general, one cannot construct an efficient, coalitionally stable core allocation method, which has bounded variation. Even worse, at most two of these four properties can be simultaneously fulfilled. There are two ways to proceed: One way is to consider more specific families of cost allocation games which could have better properties, another is to try to minimize the degree of axiomatic violation. An example for the latter approach, namely, the minimization of unfairness in the sense of coalitional instability, is given in the following section.

4 Ticket Pricing as a Cooperative Game

To apply the framework of Sects. 2 and 3 to the ticket pricing problem, we define a suitable cost allocation game $\Gamma = (N, c, P)$. Consider a railway network as a graph $G = (V, E)$, and let $N \subseteq V \times V$ be a set of origin-destination (OD) pairs, between which passengers want to travel, i.e., we consider each (set of passengers of an) OD-pair as a player. We next define the cost $c(S)$ of a coalition $S \subseteq N$ as the minimum operation cost of a network of railway lines in $G$ that service $S$. Using the classical line planning model of [3], $c(S)$ can be computed by solving the integer program

\[
\begin{aligned}
c(S) := \min_{(\xi,\eta)}\ & \sum_{(r,f) \in R \times F} \left( c^1_{r,f}\, \xi_{r,f} + c^2_{r,f}\, \eta_{r,f} \right) && (1)\\
\text{s.t.}\ & \sum_{r \in R,\, r \ni e}\ \sum_{f \in F} c_{cap}\, f\, (m\, \xi_{r,f} + \eta_{r,f}) \ge \sum_{i \in S} P^i_e, && \forall e \in E\\
& \sum_{r \in R,\, r \ni e}\ \sum_{f \in F} f\, \xi_{r,f} \ge F^i_e, && \forall (i, e) \in S \times E\\
& \eta_{r,f} - (M - m)\, \xi_{r,f} \le 0, && \forall (r, f) \in R \times F\\
& \sum_{f \in F} \xi_{r,f} \le 1, && \forall r \in R\\
& \xi \in \{0, 1\}^{|R \times F|},\ \eta \in \mathbb{Z}^{|R \times F|}_{\ge 0}.
\end{aligned}
\]

The model assumes that the $P_i$ passengers of each OD-pair $i$ travel on a unique shortest path $P^i$ (with respect to some distance in space or time) through the network, such that demands $P^i_e$ on transportation capacities on edges $e$ arise, and, likewise, demands $F^i_e$ on frequencies of edges. These demands can be covered by a set $R$ of possible routes (or lines) in $G$, which can be operated at a (finite) set of possible frequencies $F$, and with a minimal and maximal number of wagons $m$ and $M$ in each train. $c_{cap}$ is the capacity of a wagon, and $c^1_{r,f}$ and $c^2_{r,f}$, $(r, f) \in R \times F$, are cost coefficients for the operation of route $r$ at frequency $f$. The variable $\xi_{r,f}$ equals 1 if route $r$ is operated at frequency $f$, and 0 otherwise, while variable $\eta_{r,f}$ denotes the number of wagons in addition to $m$ on route $r$ with frequency $f$. The constraints guarantee sufficient capacity and frequency on each edge, link the two types of route variables, and ensure that each route is operated at a single frequency. It is shown in [3] that the problem is NP-hard, but it can be solved for the sizes that we consider.

Finally, we define the polyhedron $P$, which gives conditions on the prices $x$ that the players are asked to pay, as follows. Let $(u_{j-1}, u_j)$, $j = 1, \dots, l$, be OD-pairs such that $u_j$, $j = 0, \dots, l$, belong to the travel path $P^{st}$ associated with some OD-pair $(s, t)$, $u_0 = s$, and $u_l = t$, and let $(u, v)$ be an arbitrary OD-pair such that $u$ and $v$ also lie on the travel path $P^{st}$ from $s$ to $t$. We then stipulate that the prices $x_i / P_i$, which individual passengers of OD-pair $i$ have to pay, must satisfy the monotonicity properties

\[ 0 \le \frac{x_{uv}}{P_{uv}} \le \frac{x_{st}}{P_{st}} \le \sum_{j=1}^{l} \frac{x_{u_{j-1} u_j}}{P_{u_{j-1} u_j}}. \tag{2} \]

Moreover, we require that the prices should have the following property

\[ \max_{st} \frac{x_{st}}{d_{st} P_{st}} \le K \min_{st} \frac{x_{st}}{d_{st} P_{st}}, \tag{3} \]

where $d_{st}$ is the distance of the route $(s, t)$. This inequality guarantees that the price difference per unit of length, say one kilometer, is bounded by a factor of $K$.

The triple $\Gamma = (N, c, P)$ defines a cost allocation game to determine cost-covering prices for using the railway network $G$, in which coalitions $S$ consider the option to bail out of the common system and set up their own, private ones. Computing prices in the f-least core of $\Gamma$ requires solving the linear program

Determining Fair Ticket Prices in Public Transport

max "

59

(4)

.x;"/

s:t: x.S / C "f .S /  c.S /; 8S 2 2N nf;; N g x 2 X . /: This can be done using a constraints generation approach. We start with a (small) subset ; ¤ ˙  2N nf;; N g and solve (5)

max " .x;"/

s:t: x.S / C "f .S /  c.S /; 8S 2 ˙ x 2 X . /: Let .x  ; " / be an optimal solution of this LP. The separation problem for .x  ; " / is to find a coalition T 2 2N nf;; N g such that .x  ; " / violates the constraint x  .T / C " f .T /  c.T /:

(6)

This can be done by solving the optimization problem max x  .S / C " f .S /  c.S /:

;¤S ¨N

(7)

If the optimal value is non-positive then .x  ; " / is a feasible and hence optimal solution of (4). Otherwise, each optimal solution of (7) provides a violated constraint. Recalling f D ˛ C ˇj  j C c, we have x  .S / C " f .S /  c.S / D ˛" C

X .xi C ˇ" /iS C . "  1/c.S /;

i 2N

where iS WD



1 if i 2 S 0 otherwise:

On the other hand, there holds "  1:

(8)

Trivially, the inequality (8) holds for " < 0. In the case "  0, since ˛; ˇ  0 and .x  ; " / is a feasible solution of (5),one can easily verify that the inequality (8)

60

R. Bornd¨orfer and N.-D. Ho`ang

holds as well. Therefore, the optimization problem (7) can be reformulated as the integer program max ˛" C

.;;z/

s:t:

X

X  xi C ˇ" zi C . "  1/ i 2N

X

ccap f .mr;f C r;f / 

X

.r;f /2RF

X



1 2 r;f C cr;f r;f cr;f



(9)

Pei zi  0; 8e 2 E

i 2N

r2R;r3e f 2F

X

X

f r;f  Fei zi  0; 8.i; e/ 2 N  E

r2R;r3e f 2F

r;f  .M  m/r;f  0; 8.r; f / 2 R  F X r;f  1; 8r 2 R f 2F

j  2 f0; 1gjRF j ;  2 ZjRF ; z 2 f0; 1gjN j nf0; 1g: 0

The variables zi , i 2 N , correspond to a coalition S  N , zi equals 1 if the player i belongs to S and 0 otherwise. Other variables and constraints come from the integer program (1), which models the cost function. A violated constraint exists iff the optimal value is larger than 0. If it is non-positive, then .x  ; " / is a feasible solution N ; N zN/ of (9) with a positive of (4). Otherwise, we can find a feasible solution .; objective function value. Define T WD fi 2 N j zNi D 1g, then .x  ; " / violates the constraint (6).
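For very small games, the separation problem (7) can be solved by plain enumeration instead of the integer program (9). The following sketch shows that idea only; it is not the authors' method. The cost function `toy_cost`, the candidate points, and the helper name `most_violated_coalition` are hypothetical, and the cost of every coalition is assumed to be given explicitly rather than through the line planning model.

```python
from itertools import combinations

def most_violated_coalition(x, eps, players, cost, alpha=0.0, beta=0.0, gamma=1.0):
    """Maximize x(S) + eps*f(S) - c(S) over proper coalitions by enumeration.

    Returns (S, value); the constraint for S is violated iff value > 0.
    """
    best_S, best_val = None, float("-inf")
    for k in range(1, len(players)):
        for S in map(frozenset, combinations(players, k)):
            f_S = alpha + beta * len(S) + gamma * cost[S]
            val = sum(x[i] for i in S) + eps * f_S - cost[S]
            if val > best_val:
                best_S, best_val = S, val
    return best_S, best_val

# Hypothetical three-player game (illustration only), weights f = c.
players = (1, 2, 3)
toy_cost = {
    frozenset({1}): 6.0, frozenset({2}): 6.0, frozenset({3}): 6.0,
    frozenset({1, 2}): 8.0, frozenset({1, 3}): 10.0, frozenset({2, 3}): 10.0,
}

# Candidate (x*, eps*) as it might come out of a relaxed LP of type (5).
T, value = most_violated_coalition({1: 4.0, 2: 4.0, 3: 4.0}, 0.1, players, toy_cost)
if value > 1e-9:
    print("add constraint for coalition", sorted(T))   # T would be added to Sigma
else:
    print("no violated constraint")
```

In a full constraint generation loop, the returned coalition would be added to $\Sigma$ and the LP (5) solved again until the maximum in (7) is non-positive. Enumeration visits $2^{|N|} - 2$ coalitions, which is why the paper resorts to the integer program (9) for realistic instance sizes.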

5 Fair IC Ticket Prices

We now use our ansatz to compute ticket prices for the intercity network of the Netherlands, which is shown in Fig. 1. Our data is a simplified version of that published in [3], namely, we consider all 23 cities, but reduce the number of OD-pairs to 85 by removing pairs with small demand. However, with $2^{85} - 1$ possible coalitions, the problem is still very large.

Fig. 1 The intercity network of the Netherlands

We start with a "pure fairness scenario" where the prices are only required to have the monotonicity property (2), i.e., we ignore property (3) for the moment. By solving LP (4), we determine a point $x^*$ in the c-least core (i.e., $f = c$) and define c-least core ticket prices (lc-prices) for each passenger of an OD-pair $i$ as $p^*_i := x^*_i / P_i$. Figure 2 compares these lc-prices $p^*$ with the distance dependent prices $\bar{p}$ that have been used by the railway operator NS Reizigers for this network as reported in [1]. The picture on the left side plots the relative c-profits $\frac{c(S) - x(S)}{c(S)}$ with $x = x^*$ and $x = \bar{x} = \bar{p} \circ P$ ($\circ$ denotes the coordinate-wise product) of 8,000 coalitions, which have been computed in the course of our constraint generation algorithm. These c-profits are sorted in non-decreasing order. Note that the core of this particular game is empty, and some coalitions have to pay more than their cost. The maximum c-loss of any coalition with respect to the lc-prices is a mere 1.1%. This hardly noticeable unfairness is in contrast with the 25.67% maximum c-loss in the distance prices. In fact, there are 10 other coalitions with losses of more than 20%. Even worse, the coalition with the maximum loss is the main coalition of passengers traveling in the center of the country, i.e., in our model, a major coalition would earn a substantial benefit from shrinking the network.

Fig. 2 c-least core vs. distance prices for the Dutch IC network (1). Left: relative profit per coalition for lc-prices and distance prices. Right: distribution of the ratio lc-prices/distance prices over the number of passengers.

The picture on the right side of Fig. 2 plots the distribution of the ratio between the lc-prices and the distance prices. It can be seen that lc-prices are lower, equal, or slightly higher for most passengers. However, some passengers, mainly in the periphery of the country, pay much more to cover the costs that they produce. The increment factor is at most 3.78 except for two OD-pairs, which face very high price increases. The top of the list is the OD-pair Den Haag HS to Den Haag CS, which gets 14.4 times more expensive. The reason is that the travel path of this OD-pair consists of a single edge that is not used by any other travel route.

From a game theoretical point of view, these lc-prices can be seen as fair. It would, however, be very difficult to implement such prices in practice. We therefore add property (3) in order to limit price increases by a factor of $K$. Considering the results from the previous computation, we set $K = 3$. Figure 3 gives the same comparisons as Fig. 2 for these lc-prices. The maximum c-loss of any coalition in the new lc-prices is 1.68%, which is slightly worse than before. But the price increments are significantly smaller: no passenger has to pay more than 1.89 times the distance price. In this way, one can come up with price systems that constitute a good compromise between fairness and enforceability.

Fig. 3 c-least core vs. distance prices for the Dutch IC network (2). Left: relative profit per coalition for lc-prices and distance prices. Right: distribution of the ratio lc-prices/distance prices over the number of passengers.

Acknowledgements We thank Sebastian Stiller for valuable comments and suggestions. The work of N.-D. H. is supported by a Konrad-Zuse Scholarship.

References

1. R. Borndörfer, M. Neumann, and M. E. Pfetsch, Optimal fares for public transport, Operations Research Proceedings 2005, (2006), pp. 591–596.
2. R. Borndörfer, M. Neumann, and M. E. Pfetsch, Models for fare planning in public transport, Tech. Rep. ZIB Report 08-16, Zuse-Institut Berlin, 2008.
3. M. R. Bussieck, Optimal Lines in Public Rail Transport, PhD thesis, TU Braunschweig, 1998.
4. S. Engevall, M. Göthe-Lundgren, and P. Värbrand, The traveling salesman game: An application of cost allocation in a gas and oil company, Annals of Operations Research, 82 (1998), pp. 453–471.
5. D. Gately, Sharing the gains from regional cooperation: A game theoretic application to planning investment in electric power, International Economic Review, 15 (1974), pp. 195–208.
6. Å. Hallefjord, R. Helming, and K. Jørnsten, Computing the nucleolus when the characteristic function is given implicitly: A constraint generation approach, International Journal of Game Theory, 24 (1995), pp. 357–372.
7. N. D. Hoang, Algorithmic Cost Allocation Game: Theory and Applications, PhD thesis, TU Berlin, 2010.
8. S. Littlechild and G. Thompson, Aircraft landing fees: A game theory approach, Bell Journal of Economics, 8 (1977).
9. M. Maschler, B. Peleg, and L. S. Shapley, Geometric properties of the kernel, nucleolus, and related solution concepts, Mathematics of Operations Research, 4 (1979), pp. 303–338.
10. P. Straffin and J. Heaney, Game theory and the Tennessee Valley Authority, International Journal of Game Theory, 10 (1981), pp. 35–43.
11. H. P. Young, Cost allocation, in R. J. Aumann and S. Hart, editors, Handbook of Game Theory, vol. 2, North-Holland, Amsterdam, 1994.
12. H. P. Young, N. Okada, and T. Hashimoto, Cost allocation in water resources development, Water Resources Research, 18 (1982), pp. 463–475.



A Domain Decomposition Method for Strongly Mixed Boundary Value Problems for the Poisson Equation

Dang Quang A and Vu Vinh Quang

Abstract Recently we proposed a domain decomposition method (DDM) for solving a Dirichlet problem for a second order elliptic equation, where differently from other DDMs, the value of the normal derivative on an interface is updated from iteration to iteration. In this paper we develop a method for solving strongly mixed boundary value problems (BVPs), where boundary conditions are of different type on different sides of a rectangle and the transmission of boundary conditions occurs not only in vertices but also in one or several inner points of a side of the rectangle. Such mixed problems often arise in mechanics and physics. Our method reduces these strongly mixed BVPs to sequences of weakly mixed problems for the Poisson equation in the sense that on each side of the rectangle there is given only one type of boundary condition, which are easily solved by a program package, constructed recently by Vu (see [13]). The detailed investigation of the convergence of the method for a model problem is carried out. After that the method is applied to a problem of semiconductors. The convergence of the method is proved and numerical experiments confirm the efficiency of the method.

Keywords Domain decomposition method • Poisson equation • strongly mixed boundary conditions

D. Quang A, Institute of Information Technology, VAST, 18 Hoang Quoc Viet, Cau giay, Hanoi, Vietnam, e-mail: [email protected]
V.V. Quang, Faculty of Information Technology, Thai Nguyen University, Hanoi, Vietnam, e-mail: [email protected]
H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0 6, © Springer-Verlag Berlin Heidelberg 2012

1 Introduction Consider the problem 8 u D f; ˆ ˆ < u D '; ˆ @u ˆ : D ; @

x 2 ˝;

x 2 @˝ n n ;

(1)

x 2 n :

where ˝  R2 with a Lipschitz continuous boundary @˝, f 2 L2 .˝/; ' 2 H 1=2 .@˝ n n /; 2 H 1=2 .n /; x D .x1 ; x2 / . Under the above assumptions the problem (1) has a solution in H 1 .˝/ (see [11]). We call the problem (1) with the point of transmission of Dirichlet and Neumann boundary conditions being an inner point of a smooth part of the boundary strongly mixed boundary value problem in distinguishing it from the weakly mixed one where the points of transmission of boundary conditions occur only in angle points of the boundary. This problem and other mixed boundary value problems often arise in mechanics and physics, and attract attention of many researchers. When the domain ˝ has a special shape such as rectangle, circle, half-plane or infinite strip for the case f D 0 and some special boundary conditions using series or integral transform methods many authors reduce strongly mixed boundary value problems to dual series or dual integral equations, and after that reduce the latter equations to the Fredholm equation for obtaining approximate solutions [6, 12]. An another approach to the solution of strongly mixed BVPs for the Laplace equation is the use of the expansion by fundamental solutions (see e.g. [1,5]). In 1989 Vabishchevich [8] proposed a method for reducing a strongly mixed BVP to a sequence of Dirichlet problems by an iterative method. In this method the value of the unknown function on the part of boundary, where its derivative is prescribed, is updated from iteration to iteration. As the author showed, this iterative method is not convergent on the continuous level, but it is convergent on the discrete level. Recently, in [2] the first author of the present paper proposed an alternative idea for treating the mixed BVP in a rectangle, where the Neumann condition is given on a part n of one side and the Dirichlet condition is given on the remaining part of boundary. 
This idea is based on iteratively updating the derivative of the unknown function on the part of that side where the Dirichlet condition is given. This iterative process reduces the strongly mixed problem to a sequence of weakly mixed BVPs, which are easily solved. It is proved that the method converges on both the continuous and the discrete level. Below we present a completely different approach to strongly mixed BVPs, based on a domain decomposition method (DDM) that we developed in [3, 4]. Unlike other DDMs, where the value of the sought function on the interface is updated in each iteration, our method updates the value of its normal derivative. In the works mentioned, the advantage of our method in convergence rate over other DDMs is demonstrated on many numerical examples. The investigation of our DDM for the model problem (1) is presented in Sect. 2. In Sect. 3 we develop the method for a

DDM for Strongly Mixed BVP for Poisson Equation


problem which is a generalization of a problem in physics of semiconductors. The performed numerical examples show the efficiency of the proposed method.

2 A Domain Decomposition Method for a Strongly Mixed BVP

For problem (1), divide the domain $\Omega$ into two subdomains $\Omega_1, \Omega_2$ by a curve $\Gamma$. Denote $\Gamma_1 = \partial\Omega_1 \setminus \Gamma$ and $\Gamma_2 = \partial\Omega_2 \setminus (\Gamma_n \cup \Gamma)$ (see Fig. 1), let $u_i$ $(i = 1, 2)$ be the restriction of the solution $u$ to the subdomain $\Omega_i$, and let $\nu_i$ be the outward normal to the boundary of $\Omega_i$ $(i = 1, 2)$. Further, denote $g = \left.\dfrac{\partial u_1}{\partial \nu_1}\right|_{\Gamma}$. Our idea is to determine the boundary function $g$ together with the functions $u_i$ $(i = 1, 2)$ by an iterative process. This reduces the original problem to a sequence of weakly mixed BVPs, which are easily solved.

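2.1 Description of the Method

Step 1. Given a function $g^{(0)} \in L^2(\Gamma)$, for example, $g^{(0)} = 0$, $x \in \Gamma$.

Step 2. For the known $g^{(k)}$ on $\Gamma$, consecutively solve the two problems
\[
\begin{cases}
-\Delta u_1^{(k)} = f, & x \in \Omega_1,\\
u_1^{(k)} = \varphi, & x \in \Gamma_1,\\
\dfrac{\partial u_1^{(k)}}{\partial \nu_1} = g^{(k)}, & x \in \Gamma,
\end{cases}
\qquad
\begin{cases}
-\Delta u_2^{(k)} = f, & x \in \Omega_2,\\
u_2^{(k)} = \varphi, & x \in \Gamma_2,\\
u_2^{(k)} = u_1^{(k)}, & x \in \Gamma,\\
\dfrac{\partial u_2^{(k)}}{\partial \nu_2} = \psi, & x \in \Gamma_n.
\end{cases}
\tag{2}
\]

Step 3. Update the value of $g^{(k+1)}$:
\[
g^{(k+1)} = (1 - \tau)\, g^{(k)} - \tau\, \frac{\partial u_2^{(k)}}{\partial \nu_2}, \quad x \in \Gamma,
\tag{3}
\]
where $\tau$ is a parameter to be chosen so as to guarantee the convergence of the method.

Fig. 1 Domain $\Omega$ and its subdomains $\Omega_1, \Omega_2$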
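The three steps can be tried out on a one-dimensional analogue. The following is a minimal sketch under simplifying assumptions that are not part of the paper: the equation is $u'' = 0$ on $(0, 1)$ with Dirichlet data at both ends (so there is no Neumann part $\Gamma_n$), the interface is the single point $x = c$, and both subdomain problems are solved in closed form because their solutions are linear. The function name `ddm_1d` and its parameters are illustrative only.

```python
def ddm_1d(phi0, phi1, c=0.5, tau=0.5, tol=1e-12, max_iter=200):
    """Two-subdomain iteration for u'' = 0 on (0, 1), u(0) = phi0, u(1) = phi1.

    Omega_1 = (0, c), Omega_2 = (c, 1); the interface is the point x = c.
    Returns the interface flux g = u_1'(c) and the number of iterations used.
    """
    g = 0.0  # Step 1: g^(0) = 0
    for k in range(1, max_iter + 1):
        # Step 2, first problem: u_1 is linear with u_1(0) = phi0, u_1'(c) = g
        u1_at_c = phi0 + g * c
        # Step 2, second problem: u_2 is linear with u_2(c) = u_1(c), u_2(1) = phi1
        du2_dx = (phi1 - u1_at_c) / (1.0 - c)
        # The outward normal of Omega_2 at x = c points in the -x direction
        du2_dnu2 = -du2_dx
        # Step 3: relaxation update, same form as (3)
        g_new = (1.0 - tau) * g - tau * du2_dnu2
        if abs(g_new - g) < tol:
            return g_new, k
        g = g_new
    return g, max_iter
```

In this toy setting the exact flux is `phi1 - phi0`, and the error in `g` is multiplied by `1 - tau / (1 - c)` in every step, so with `c = 0.5` the choice `tau = 0.5` reproduces the exact flux after a single update.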


D. Quang A and V.V. Quang

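2.2 Convergence of the Method

Rewrite (3) in the form
\[
\frac{g^{(k+1)} - g^{(k)}}{\tau} + g^{(k)} + \frac{\partial u_2^{(k)}}{\partial \nu_2} = 0, \quad k = 0, 1, 2, \ldots
\]
Introduce the notations $e_i^{(k)} = u_i^{(k)} - u_i$ $(i = 1, 2)$ and $\xi^{(k)} = g^{(k)} - g$. Then the $e_i^{(k)}$ satisfy the problems
\[
\begin{cases}
-\Delta e_1^{(k)} = 0, & x \in \Omega_1,\\
e_1^{(k)} = 0, & x \in \Gamma_1,\\
\dfrac{\partial e_1^{(k)}}{\partial \nu_1} = \xi^{(k)}, & x \in \Gamma,
\end{cases}
\qquad
\begin{cases}
-\Delta e_2^{(k)} = 0, & x \in \Omega_2,\\
e_2^{(k)} = 0, & x \in \Gamma_2,\\
e_2^{(k)} = e_1^{(k)}, & x \in \Gamma,\\
\dfrac{\partial e_2^{(k)}}{\partial \nu_2} = 0, & x \in \Gamma_n,
\end{cases}
\]
and the $\xi^{(k)}$ satisfy the relation
\[
\frac{\xi^{(k+1)} - \xi^{(k)}}{\tau} + \xi^{(k)} + \frac{\partial e_2^{(k)}}{\partial \nu_2} = 0, \quad k = 0, 1, 2, \ldots
\tag{4}
\]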

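Now we define the Steklov-Poincaré operators $S_1, S_2$ as follows:
\[
S_1 \xi = \frac{\partial v_1}{\partial \nu_1}, \qquad S_2 \xi = \frac{\partial v_2}{\partial \nu_2}, \quad x \in \Gamma,
\]
where $v_1$ and $v_2$ are the solutions of the problems
\[
\begin{cases}
-\Delta v_1 = 0, & x \in \Omega_1,\\
v_1 = 0, & x \in \Gamma_1,\\
v_1 = \xi, & x \in \Gamma,
\end{cases}
\qquad
\begin{cases}
-\Delta v_2 = 0, & x \in \Omega_2,\\
v_2 = 0, & x \in \Gamma_2,\\
v_2 = \xi, & x \in \Gamma,\\
\dfrac{\partial v_2}{\partial \nu_2} = 0, & x \in \Gamma_n.
\end{cases}
\]
Then the inverse operator $S_1^{-1}$ is defined by $S_1^{-1} \xi = w_1|_{\Gamma}$, where $w_1$ is the solution of the problem
\[
\begin{cases}
-\Delta w_1 = 0, & x \in \Omega_1,\\
w_1 = 0, & x \in \Gamma_1,\\
\dfrac{\partial w_1}{\partial \nu_1} = \xi, & x \in \Gamma.
\end{cases}
\]
Therefore, we have
\[
S_1^{-1} \xi^{(k)} = e_1^{(k)}, \qquad S_2 e_1^{(k)} = \frac{\partial e_2^{(k)}}{\partial \nu_2}, \quad x \in \Gamma.
\]
Using the operators defined above we can rewrite relation (4) in the form
\[
\frac{\xi^{(k+1)} - \xi^{(k)}}{\tau} + (I + S_2 S_1^{-1})\, \xi^{(k)} = 0, \quad k = 0, 1, 2, \ldots
\]
Applying the operator $S_1^{-1}$ to both sides of this equality we obtain the two-layer iterative scheme
\[
\frac{e_1^{(k+1)} - e_1^{(k)}}{\tau} + B e_1^{(k)} = 0, \quad x \in \Gamma \quad (k = 0, 1, 2, \ldots),
\tag{5}
\]
where $B = I + S_1^{-1} S_2$. From this scheme we have
\[
e_1^{(k+1)} = (I - \tau B)\, e_1^{(k)}, \quad x \in \Gamma.
\]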

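The convergence of the iterative scheme depends on the properties of the operator $B$. To investigate the operator $B$ we introduce the space $\Lambda = H_{00}^{1/2}(\Gamma) = \{ v|_{\Gamma} : v \in H_0^1(\Omega) \}$ and its dual space $\Lambda' = H_{00}^{-1/2}(\Gamma)$. Then in weak formulation the operator $S_1$ can be defined as
\[
\langle S_1 \xi, \eta \rangle_{\Lambda', \Lambda} = (\nabla H_1 \xi, \nabla H_1 \eta)_{L^2(\Omega_1)}, \quad \forall \xi, \eta \in \Lambda,
\]
where $H_1 \xi$ is the harmonic extension of $\xi$ from $\Gamma$ to $\Omega_1$. In [3] we proved that $S_1$ is a symmetric, positive definite and bounded operator in the space $\Lambda$.

Now consider $S_2$. Let $\eta \in H_{00}^{1/2}(\Gamma)$ and denote by $w = \widehat{H}_2 \eta$ the harmonic extension of $\eta$ to $\Omega_2$, i.e., $w$ is the solution of the problem
\[
\begin{cases}
-\Delta w = 0, & x \in \Omega_2,\\
w = 0, & x \in \Gamma_2,\\
w = \eta, & x \in \Gamma,\\
\dfrac{\partial w}{\partial \nu_2} = 0, & x \in \Gamma_n.
\end{cases}
\]
Analogously, denote by $v = \widehat{H}_2 \xi$ the harmonic extension of $\xi$ to $\Omega_2$. Then
\[
0 = -\int_{\Omega_2} \Delta v \, w \, dx
= -\int_{\partial\Omega_2} \frac{\partial v}{\partial \nu_2}\, w \, ds + \int_{\Omega_2} \nabla v \cdot \nabla w \, dx
= -\int_{\Gamma} S_2 \xi \, \eta \, ds + \int_{\Omega_2} \nabla \widehat{H}_2 \xi \cdot \nabla \widehat{H}_2 \eta \, dx.
\]
From here it follows that
\[
\int_{\Gamma} S_2 \xi \, \eta \, ds = \int_{\Omega_2} \nabla \widehat{H}_2 \xi \cdot \nabla \widehat{H}_2 \eta \, dx,
\]
which means that $S_2$ is a symmetric operator in the space $\Lambda$. Next, using the Poincaré-Friedrichs inequality and the trace theorem we obtain
\[
\langle S_2 \xi, \xi \rangle_{\Lambda', \Lambda} = (\nabla \widehat{H}_2 \xi, \nabla \widehat{H}_2 \xi)_{L^2(\Omega_2)} = (\nabla v, \nabla v)_{L^2(\Omega_2)} \ge C_1^2 \|v\|^2_{H^1(\Omega_2)} \ge C_2^2 \|\xi\|^2_{H^{1/2}(\Gamma)}.
\tag{6}
\]
On the other hand, we have the following estimate for the solution of the problem for $v$:
\[
\|v\|_{H^1(\Omega_2)} \le C \|v\|_{H^{1/2}(\Gamma)}.
\tag{7}
\]
Besides, from the definition of the norm $\|v\|^2_{H^1(\Omega_2)} = \|v\|^2_{L^2(\Omega_2)} + \|\nabla v\|^2_{L^2(\Omega_2)}$ it follows that
\[
\|\nabla v\|^2_{L^2(\Omega_2)} \le \|v\|^2_{H^1(\Omega_2)}.
\tag{8}
\]
Thus, from (6), (7) and (8), and since $v = \xi$ on $\Gamma$, we obtain
\[
C_2^2 \|\xi\|^2_{H^{1/2}(\Gamma)} \le \langle S_2 \xi, \xi \rangle_{\Lambda', \Lambda} \le C^2 \|\xi\|^2_{H^{1/2}(\Gamma)}.
\]
It means that $S_2$ is a positive definite and bounded operator in $\Lambda$. In the energetic product of $S_1$ we have
\[
(B \xi, \eta)_{S_1} = \langle S_1 (I + S_1^{-1} S_2)\, \xi, \eta \rangle_{\Lambda', \Lambda} = \langle S_1 \xi, \eta \rangle_{\Lambda', \Lambda} + \langle S_2 \xi, \eta \rangle_{\Lambda', \Lambda}.
\]
Since, as shown above, the operators $S_1$, $S_2$ are symmetric, positive definite and bounded, the operator $B$ is also a symmetric, positive definite and bounded operator in $\Lambda$. According to the general theory of two-layer iterative schemes [7] we conclude that with an appropriately chosen parameter $\tau$ the iterative scheme (5) converges. The results of computational experiments in [4] confirm this conclusion.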
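What "chosen appropriately" means can be made concrete with the standard estimate for one-parameter two-layer schemes. Assume, purely for illustration, that the spectrum of the symmetric positive definite operator $B$ lies in an interval $[\gamma_1, \gamma_2]$ with $0 < \gamma_1 \le \gamma_2$; the bounds are hypothetical inputs and are not computed in the paper. Then the error operator $I - \tau B$ has norm $\max(|1 - \tau\gamma_1|, |1 - \tau\gamma_2|)$, which is below 1 exactly for $0 < \tau < 2/\gamma_2$ and smallest at $\tau = 2/(\gamma_1 + \gamma_2)$. A minimal Python sketch of these two formulas:

```python
def contraction_factor(tau, gamma1, gamma2):
    """Norm of I - tau*B when the spectrum of B lies in [gamma1, gamma2]."""
    return max(abs(1.0 - tau * gamma1), abs(1.0 - tau * gamma2))


def best_tau(gamma1, gamma2):
    """Value of tau that minimises the contraction factor."""
    return 2.0 / (gamma1 + gamma2)
```

For example, with the assumed bounds `gamma1 = 1.5` and `gamma2 = 2.5` the best parameter is 0.5 and the error shrinks by a factor of 0.25 per iteration, whereas any `tau` above `2 / gamma2 = 0.8` gives a factor of at least 1.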


3 A Parallel Algorithm for Solving a Problem of Semiconductors

At present, hydrodynamical models are usually used to describe physical processes in semiconductor devices. One of them is described in [10] (see also [9]). This model contains the Poisson equation for the electric potential with mixed boundary conditions. In this section we extend the idea of the DDM presented in Sect. 2 to a more general problem in the domain $\Omega = (0, 6a) \times (0, b)$, which is the model in [9]:
\[
\begin{cases}
-\Delta u = f, & (x_1, x_2) \in \Omega,\\
\dfrac{\partial u}{\partial x_2} = \beta, & x_2 = 0;\ 0 < x_1 < 6a,\\
\dfrac{\partial u}{\partial x_1} = \alpha, & x_1 = 0,\ 6a;\ 0 < x_2 < b,\\
u = g, & x_2 = b;\ 0 < x_1 < a,\ 2a < x_1 < 4a,\ 5a < x_1 < 6a,\\
\dfrac{\partial u}{\partial x_2} = \beta, & x_2 = b;\ a < x_1 < 2a,\ 4a < x_1 < 5a,
\end{cases}
\tag{9}
\]
where $\alpha, \beta, g$ are given functions.

3.1 Description of the Algorithm

We divide the domain $\Omega$ into five subdomains $\Omega_i$ $(i = 1, \ldots, 5)$ by the straight-line segments $\Gamma_1 = (x_1 = a;\ 0 < x_2 < b)$, $\Gamma_2 = (x_1 = 2a;\ 0 < x_2 < b)$, $\Gamma_3 = (x_1 = 4a;\ 0 < x_2 < b)$, $\Gamma_4 = (x_1 = 5a;\ 0 < x_2 < b)$. We denote the left side of $\Omega_1$ by $\Gamma_0$, the right side of $\Omega_5$ by $\Gamma_5$, and the top and bottom sides of $\Omega_i$ by $T_i$ and $B_i$ $(i = 1, \ldots, 5)$, respectively (see Fig. 2).

Fig. 2 Domain $\Omega$ and its subdomains with boundaries


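Also, we introduce the notations
\[
u_i = u|_{\Omega_i}\ (i = 1, \ldots, 5), \qquad \xi = (\xi_1, \xi_2, \xi_3, \xi_4)',
\]
\[
\xi_1 = \left.\frac{\partial u_1}{\partial x_1}\right|_{\Gamma_1}, \quad
\xi_2 = \left.\frac{\partial u_3}{\partial x_1}\right|_{\Gamma_2}, \quad
\xi_3 = \left.\frac{\partial u_3}{\partial x_1}\right|_{\Gamma_3}, \quad
\xi_4 = \left.\frac{\partial u_5}{\partial x_1}\right|_{\Gamma_4}.
\]

Step 1. Given $\xi^{(0)} = 0$.

Step 2. For $k = 0, 1, 2, \ldots$ solve in parallel the problems in the subdomains $\Omega_1, \Omega_3, \Omega_5$:
\[
\begin{cases}
-\Delta u_1^{(k)} = f, & x \in \Omega_1,\\
\dfrac{\partial u_1^{(k)}}{\partial x_1} = \alpha, & x \in \Gamma_0,\\
\dfrac{\partial u_1^{(k)}}{\partial x_1} = \xi_1^{(k)}, & x \in \Gamma_1,\\
\dfrac{\partial u_1^{(k)}}{\partial x_2} = \beta, & x \in B_1,\\
u_1^{(k)} = g, & x \in T_1,
\end{cases}
\qquad
\begin{cases}
-\Delta u_3^{(k)} = f, & x \in \Omega_3,\\
\dfrac{\partial u_3^{(k)}}{\partial x_1} = \xi_2^{(k)}, & x \in \Gamma_2,\\
\dfrac{\partial u_3^{(k)}}{\partial x_1} = \xi_3^{(k)}, & x \in \Gamma_3,\\
\dfrac{\partial u_3^{(k)}}{\partial x_2} = \beta, & x \in B_3,\\
u_3^{(k)} = g, & x \in T_3,
\end{cases}
\]
\[
\begin{cases}
-\Delta u_5^{(k)} = f, & x \in \Omega_5,\\
\dfrac{\partial u_5^{(k)}}{\partial x_1} = \xi_4^{(k)}, & x \in \Gamma_4,\\
\dfrac{\partial u_5^{(k)}}{\partial x_1} = \alpha, & x \in \Gamma_5,\\
\dfrac{\partial u_5^{(k)}}{\partial x_2} = \beta, & x \in B_5,\\
u_5^{(k)} = g, & x \in T_5.
\end{cases}
\]

Step 3. Solve in parallel in $\Omega_2, \Omega_4$:
\[
\begin{cases}
-\Delta u_2^{(k)} = f, & x \in \Omega_2,\\
u_2^{(k)} = u_1^{(k)}, & x \in \Gamma_1,\\
u_2^{(k)} = u_3^{(k)}, & x \in \Gamma_2,\\
\dfrac{\partial u_2^{(k)}}{\partial x_2} = \beta, & x \in B_2 \cup T_2,
\end{cases}
\qquad
\begin{cases}
-\Delta u_4^{(k)} = f, & x \in \Omega_4,\\
u_4^{(k)} = u_3^{(k)}, & x \in \Gamma_3,\\
u_4^{(k)} = u_5^{(k)}, & x \in \Gamma_4,\\
\dfrac{\partial u_4^{(k)}}{\partial x_2} = \beta, & x \in B_4 \cup T_4.
\end{cases}
\]

Step 4. Update
\[
\xi^{(k+1)} = (1 - \tau)\, \xi^{(k)} - \tau\, \varphi^{(k)},
\tag{10}
\]
where $\tau$ is a parameter to be chosen and
\[
\varphi = -\left( \left.\frac{\partial u_2}{\partial x_1}\right|_{\Gamma_1},\ \left.\frac{\partial u_2}{\partial x_1}\right|_{\Gamma_2},\ \left.\frac{\partial u_4}{\partial x_1}\right|_{\Gamma_3},\ \left.\frac{\partial u_4}{\partial x_1}\right|_{\Gamma_4} \right)'.
\]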
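The control flow of Steps 1 to 4 can be sketched as follows. This is only a scheduling skeleton under stated assumptions: the two callables `solve_odd` and `solve_even` are hypothetical placeholders for whatever Poisson solver is used on the subdomains, threads stand in for a real parallel environment, and the stopping rule on the change of the interface data is an illustrative choice rather than the criterion used in the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def ddm_parallel(solve_odd, solve_even, xi0, tau, tol=1e-10, max_iter=200):
    """Skeleton of the five-subdomain iteration.

    solve_odd(i, xi)      -> interface traces of u_i for i in (1, 3, 5)   [Step 2]
    solve_even(j, traces) -> the two components of phi contributed by
                             u_j for j in (2, 4)                          [Step 3]
    Both callables are assumptions standing in for real subdomain solvers.
    """
    xi = list(xi0)  # Step 1: initial interface fluxes
    with ThreadPoolExecutor() as pool:
        for k in range(max_iter):
            # Step 2: the problems in Omega_1, Omega_3, Omega_5 are independent
            odd = (1, 3, 5)
            traces = dict(zip(odd, pool.map(lambda i: solve_odd(i, xi), odd)))
            # Step 3: the problems in Omega_2, Omega_4 are independent
            phi2, phi4 = pool.map(lambda j: solve_even(j, traces), (2, 4))
            phi = list(phi2) + list(phi4)
            # Step 4: relaxation update, same form as (10)
            new_xi = [(1.0 - tau) * x - tau * p for x, p in zip(xi, phi)]
            if max(abs(a - b) for a, b in zip(new_xi, xi)) < tol:
                return new_xi, k + 1
            xi = new_xi
    return xi, max_iter
```

The results of each stage are collected before the next stage starts, so Step 3 always sees the traces produced in the same iteration.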

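3.2 Convergence of the Iterative Process

In order to study the convergence of the proposed iterative process we introduce an operator $B$ defined in the space $L^2(\Gamma_1 \cup \Gamma_2 \cup \Gamma_3 \cup \Gamma_4)$ by the formula
\[
B \xi = \varphi,
\]
where $u_i$ $(i = 1, 2, 3, 4, 5)$ are the solutions of the problems
\[
\begin{cases}
-\Delta u_1 = 0, & x \in \Omega_1,\\
\dfrac{\partial u_1}{\partial x_1} = 0, & x \in \Gamma_0,\\
\dfrac{\partial u_1}{\partial x_1} = \xi_1, & x \in \Gamma_1,\\
\dfrac{\partial u_1}{\partial x_2} = 0, & x \in B_1,\\
u_1 = 0, & x \in T_1,
\end{cases}
\qquad
\begin{cases}
-\Delta u_3 = 0, & x \in \Omega_3,\\
\dfrac{\partial u_3}{\partial x_1} = \xi_2, & x \in \Gamma_2,\\
\dfrac{\partial u_3}{\partial x_1} = \xi_3, & x \in \Gamma_3,\\
\dfrac{\partial u_3}{\partial x_2} = 0, & x \in B_3,\\
u_3 = 0, & x \in T_3,
\end{cases}
\]
\[
\begin{cases}
-\Delta u_5 = 0, & x \in \Omega_5,\\
\dfrac{\partial u_5}{\partial x_1} = \xi_4, & x \in \Gamma_4,\\
\dfrac{\partial u_5}{\partial x_1} = 0, & x \in \Gamma_5,\\
\dfrac{\partial u_5}{\partial x_2} = 0, & x \in B_5,\\
u_5 = 0, & x \in T_5,
\end{cases}
\]
\[
\begin{cases}
-\Delta u_2 = 0, & x \in \Omega_2,\\
u_2 = u_1, & x \in \Gamma_1,\\
u_2 = u_3, & x \in \Gamma_2,\\
\dfrac{\partial u_2}{\partial x_2} = 0, & x \in B_2 \cup T_2,
\end{cases}
\qquad
\begin{cases}
-\Delta u_4 = 0, & x \in \Omega_4,\\
u_4 = u_3, & x \in \Gamma_3,\\
u_4 = u_5, & x \in \Gamma_4,\\
\dfrac{\partial u_4}{\partial x_2} = 0, & x \in B_4 \cup T_4.
\end{cases}
\]
Then formula (10) can be written in the form of the iterative scheme
\[
\frac{\xi^{(k+1)} - \xi^{(k)}}{\tau} + (I + B)\, \xi^{(k)} = F,
\tag{11}
\]
where $F$ is a function determined by the data functions of problem (9).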


Table 1 Convergence of the iterative scheme in Example 1

        u1              u2              u3              u4
τ       K    Error      K    Error      K    Error      K    Error
0.1     40   5·10⁻⁴     40   0.05       40   0.05       40   0.0047
0.2     27   9·10⁻⁵     40   6·10⁻⁴     30   0.002      34   9·10⁻⁵
0.3     17   9·10⁻⁵     22   6·10⁻⁴     18   0.001      22   6·10⁻⁵
0.4     12   9·10⁻⁵     16   6·10⁻⁴     14   0.002      15   8·10⁻⁵
0.5     9    8·10⁻⁵     12   6·10⁻⁴     10   0.002      12   4·10⁻⁵
0.6     16   8·10⁻⁵     30   8·10⁻⁴     17   0.002      20   7·10⁻⁵
0.7     40   2·10⁻⁴     40   0.05       40   0.002      40   0.0025

Analogously to the case of two subdomains in Sect. 2, we can prove that $B$ is a symmetric, positive definite and bounded operator in an appropriate space. It follows that $I + B$ is also a symmetric, positive definite and bounded operator in that space. Therefore, according to the general theory of two-layer iterative schemes [7], the iterative scheme (11) converges for some range of the parameter $\tau$ at the rate of a geometric progression.

3.3 Numerical Experiments

We perform some numerical examples to test the convergence of the proposed iterative method. Below we report the results of the experiments when the exact solution $u(x_1, x_2)$ is known. The computational domain $\Omega = (0, 6a) \times (0, b)$ with $a = \pi/6$ and $b = \pi/3$ is covered by a uniform grid with grid size $h = 1/64$. The stopping criterion is $\|u^{(k)} - u^{(k-1)}\|_\infty < 0.001$. The results of the experiment for four different functions,
\[
u_1 = \sin x_1 x_2, \quad
u_2 = x_1^3 + x_2 e^{x_1} + x_2^3 + x_1 e^{x_2}, \quad
u_3 = e^{x_2} \log(x_1 + 5) - \sin x_2 \log(x_1 + 6), \quad
u_4 = x_1^2 + x_2^2,
\]
are given in Table 1, where $K$ is the number of iterations and error $= \|u^{(k)} - u\|_\infty$. From the results of the experiments we see that the value $\tau = 0.5$ of the iterative parameter appears to be optimal, and with this value the proposed iterative process stops after 9 to 12 iterations for all four test functions.

4 Conclusion

In this paper we propose a domain decomposition method for solving strongly mixed BVPs, in which the type of boundary condition changes at one or many points on a smooth part of the boundary. This domain decomposition method is based on updating the normal derivative of the solution on the interface between subdomains. For the case of many transition points, as in a problem in the physics of semiconductors, a parallel algorithm is considered. The computational experiments performed for some examples indicate an optimal value of the iterative parameter, which gives the fastest convergence of the iterative process. The proposed method can be applied to other strongly mixed BVPs for the Poisson equation and for elliptic equations in general.

Acknowledgements The authors kindly acknowledge financial support from Vietnam National Foundation for Science and Technology Development (NAFOSTED), project 102.99-2011.24, and would like to thank the referees for the helpful suggestions.

References

1. Arad M., Yosibash Z., Ben-Dor G., Yakhot A., Computing flux intensity factors by a boundary method for elliptic equations with singularities, Communications in Numerical Methods in Engineering 14 (1998) 657–670.
2. Dang Q. A, Iterative method for solving strongly mixed boundary value problem, Proceedings of the First National Workshop on Fundamental and Applied Researches in Information Technology, Publ. House “Science and Technology”, Ha Noi, 2004.
3. Dang Q. A, Vu V.Q., A domain decomposition method for solving an elliptic boundary value problem, In: L. H. Son, W. Tutschke, S. Jain (eds.), Methods of Complex and Clifford Analysis (Proceedings of ICAM Hanoi 2004), SAS International Publications, Delhi, 2006, 309–319.
4. Dang Q. A, Vu V.Q., Experimental study of a domain decomposition method for mixed boundary value problems in domains of complex geometry, J. of Comp. Sci. and Cyber. 21(3) (2005) 216–229.
5. Georgiou G.C., Olson L., Smyrlis Y.S., A singular function boundary integral method for the Laplace equation, Communications in Numerical Methods in Engineering 12(2) (1996) 127–134.
6. Mandal N., Advances in dual integral equations, Chapman & Hall, 1999.
7. Samarskii A.A., The Theory of Difference Schemes, Marcel Dekker, New York, 2001.
8. Vabishchevich P.N., Iterative reduction of mixed boundary value problem to the Dirichlet problem, Differential Equations 25(7) (1989) 1177–1183 (Russian).
9. Blokhin A.M., Ibragimova A.S., Krasnikov N.Y., On a variant of the method of lines for the Poisson equation, Comput. Technologies 12(2) (2007) 33–42 (Russian).
10. Romano V., 2D simulation of a silicon MESFET with a non-parabolic hydrodynamical model based on the maximum entropy principle, J. Comput. Phys. 176 (2002) 70–92.
11. Savaré G., Regularity and perturbation results for mixed second order elliptic problems, Commun. in Partial Differential Equations 22(5–6) (1997) 869–899.
12. Sneddon I., Mixed boundary value problems in potential theory, North-Holland Publ., Amsterdam, 1966.
13. Vu V. Q., Results of application of algorithm for reducing computational amount in the solution of mixed elliptic boundary value problems, Proceedings of the National symposium “Development of tools of informatics for the help of teaching, researching and applying mathematics”, Hanoi, 2005, 247–256 (Vietnamese).



Detecting, Monitoring and Preventing Database Security Breaches in a Housing-Based Outsourcing Model

Tran Khanh Dang, Tran Thi Que Nguyet, and Truong Quynh Chi

Abstract In a housing-based outsourcing model, the database server is the client’s property and the outsourcing service provider only provides physical security of machines and data, and monitors (and if necessary restores) the operating condition of the server. Soft security-related aspects (e.g., DBMS security breaches) are the client’s responsibility. This is a non-trivial task for most clients. In this paper, we propose an extensible architecture for detecting, monitoring and preventing database security breaches in a housing-based outsourcing model. The architecture can help in dealing with both outsider and insider threats. It is well suited for the detection of both predefined and potential security breaches. Our solution to database security breach detection is based on the well-known pentesting- and version checking-based techniques in network and operating system security. The architecture features visual monitoring and secure auditing w.r.t. all database user activities in real time. Moreover, it also supports automatic prevention techniques if security risks are established w.r.t. the found security breaches.

1 Introduction

With the rapid development of the Internet and networking technology, outsourcing database services has become increasingly popular in enterprise database management [3, 4, 6]. Organizations (partly) outsource their data management needs to external service providers, thereby freeing themselves to concentrate on their core business. One popular form of this outsourcing model is called “housing service”, where the servers are the property of the client who installs and

T.K. Dang  T.T.Q. Nguyet  T.Q. Chi Faculty of Computer Science & Engineering, HCMUT, Vietnam e-mail: fkhanh,ttqnguyet,[email protected] H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0 7, © Springer-Verlag Berlin Heidelberg 2012


T.K. Dang et al.

operationally manages the servers’ software. In this case the outsourcing service provider provides the physical security of machines and data, and monitors (and if necessary restores) the operating condition of the server. Even then, “soft” security-related issues in this model are usually ignored, or it is simply assumed that “the installed software (or the client) must be responsible for it”. For example, suppose a client chooses a database housing service at a service provider SP, and SP has a special, limited account easytask for some managerial activities (easytask is assigned very limited rights on the client’s “outside-housed” database clientDB). Then SP is typically liable only for the so-called “physical” security of the client’s hardware and data. This is reasonable only if (the client believes that) the database management system (DBMS) is working as expected in terms of security. In practice, serious problems can occur when the DBMS has security flaws and the client does not become aware of these “soft” flaws before the housing service provider SP does. In this case, SP may use the account easytask and the discovered flaws to gain control of, or perform unauthorized accesses to, the client’s clientDB, which is housed on SP’s premises. For the above reasons, we have been carrying out research to develop a means of detecting database security flaws and visually monitoring the real-time activities of the account easytask, especially with respect to the possible security flaws of clientDB. Moreover, based on the results of the detecting and monitoring phase, the system will also be able to send warning messages or take other appropriate preventive actions if it detects risks that may violate the security policy. The rest of this paper is organized as follows. Section 2 briefly discusses related work. Section 3 introduces a classification of security flaws and monitored database activities. 
An architecture for the system is proposed in Sect. 4. After that, we present our prototype for an Oracle database and evaluate our solution in Sect. 5. Section 6 discusses open related research issues. Finally, Sect. 7 gives concluding remarks and presents future work.

2 Related Work

2.1 Detecting Techniques

Data mining is one of the techniques used in anomaly detection methodology [16] to find abnormal patterns of access transactions, which may signal an attack [1], by means of heuristics and/or stochastic algorithms. However, it may also give false alarms or attach a degree of uncertainty to the discovered results. Therefore, we focus only on two major detecting techniques for finding security flaws: version checking-based and pentesting-based techniques. We return to this issue in Sect. 6, where we place data mining-based techniques in the context of the system under development.

Detecting, Monitoring and Preventing Database Security Breaches


The version checking-based technique [5, 6] is used to find inherent security flaws related to each particular DBMS version that have been published on database security-related web sites. This technique is simple, but it has a noticeable weakness: it may result in false alarms, because security flaws may depend not only on the DBMS but also on other factors. Pentesting is a method of evaluating the security of a database system by simulating an attack by a malicious user through various phases [2, 7] in order to detect security flaws such as those related to SQL injection (see Sect. 3.1). If the simulated intrusion is successful, a security flaw definitely exists in the system.
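The version checking-based technique amounts to a lookup of the installed DBMS version in a repository of published flaws. The following minimal sketch illustrates the idea; the repository contents, the flaw identifiers and the three-component version format are all hypothetical and are not taken from the paper or from any real advisory database.

```python
# Hypothetical repository: flaw id -> (first affected version, first fixed version)
FLAW_REPOSITORY = {
    "FLAW-A": ((10, 1, 0), (10, 2, 0)),
    "FLAW-B": ((9, 0, 0), (11, 0, 0)),
}


def parse_version(text):
    """Turn a dotted version string such as '10.1.5' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def candidate_flaws(dbms_version, repository=FLAW_REPOSITORY):
    """Return ids of flaws whose affected range contains the given version."""
    version = parse_version(dbms_version)
    return sorted(
        flaw_id
        for flaw_id, (first_affected, first_fixed) in repository.items()
        if first_affected <= version < first_fixed
    )
```

The result is only a list of candidates: as noted above, a version match alone can be a false alarm, which is why a pentesting-based check can be used to confirm a flaw.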

2.2 Monitoring Techniques

Database activity monitoring means that all structured query language (SQL) statements of both normal users and database administrators are captured and recorded in real time. To perform auditing activities, database activity monitoring needs collectors. There are three major categories of collection techniques [8, 13, 14]: network monitoring, local agent, and remote monitoring. Each category has its own advantages and disadvantages, which are related to the system performance, the DBMS platform, and the type of captured activities (internal or external). In our system, the remote monitoring-based technique is used to collect information on outsourced database access activities (by using native auditing features of the DBMS or other internal database features such as triggers). We can, therefore, monitor both internal and external database activities.

2.3 Preventing Techniques

To protect the outsourced database, in addition to the built-in security functionalities of the DBMS, our system provides users with two additional kinds of prevention: active and passive unauthorized access prevention [6]. Active prevention is a countermeasure taken before the system is attacked, while passive prevention is carried out right after a sign of an attack appears in the database system. There are three levels of passive prevention: to alert, to disable (temporarily) the user’s access, or to shut down the database. An instance of this prevention is to monitor all activities on the system in real time and allow the tracker to stop malicious or attacking actions as soon as possible.
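The three levels of passive prevention can be viewed as an escalation ladder. The sketch below is one possible way to encode it; the numeric risk score and the three thresholds are assumptions introduced only for illustration, since the paper does not specify how a level is selected.

```python
from enum import IntEnum


class Response(IntEnum):
    NONE = 0
    ALERT = 1
    DISABLE_ACCESS = 2   # temporarily disable the user's access
    SHUTDOWN = 3         # shut down the database


def choose_response(risk_score, alert_at=0.3, disable_at=0.6, shutdown_at=0.9):
    """Map a risk score in [0, 1] to a passive prevention level.

    The thresholds are hypothetical defaults, not values from the paper.
    """
    if risk_score >= shutdown_at:
        return Response.SHUTDOWN
    if risk_score >= disable_at:
        return Response.DISABLE_ACCESS
    if risk_score >= alert_at:
        return Response.ALERT
    return Response.NONE
```

Using an ordered enumeration keeps the levels comparable, so a policy can state, for example, that no automatic action stronger than `Response.DISABLE_ACCESS` is allowed without confirmation by the administrator.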

3 Problem Analysis

The desired system has two key features: detecting database security flaws and monitoring database activities. As a consequence, it also provides the ability to prevent possible damage to the database systems. Firstly, we must identify and classify the objects or events which should or must be detected.

3.1 Database Security Flaw Classification

A database security flaw is a vulnerability which can be exploited to attack the database system. Database security flaws are classified into five major categories, which will also be used as the fundamental criteria to scan for flaws later on [5, 6, 11]:

– Version: The version-related flaws are inherent security flaws that have existed in the DBMS since its release.
– Users/Accounts: This type refers to the way users manage their accounts, or to the security-related account settings.
– Procedures/Functions/Packages: Some procedures/functions/packages may contain errors/bugs that can be exploited to attack the database system. For instance, if the parameters of a procedure are not well validated, hackers can take advantage of this to mount a privilege escalation or denial of service (DoS) attack on the database system.
– Privileges/Roles: This kind of flaw stems from a lack of full awareness of the power that the privileges/roles give to users, and from an incomplete understanding of the existing technologies w.r.t. the employed DBMS.
– System security settings: Misconfiguration of the database system often leads to serious security flaws. Thus, the database administrator has to cover as many as possible of the aspects that affect the database system security.

3.2 Monitoring Activity Classification

Based on the requirements of the Sarbanes-Oxley Act (SOX) for auditing activities [Nat05], we classify monitoring activities into six main categories, which are used to set up the monitoring policy for our system:

– Connection: The events of sign-on and sign-off should be recorded with the login name, a timestamp for the event, and some additional information such as the TCP/IP address of the user or the program creating the connection.
– Object changes: This category falls into three kinds, namely schema changes, user changes, and role changes. Auditing schema changes means that all data definition activities are audited. User changes or role changes are activities related to the addition, deletion or expiration of users or roles.
– System security settings: This includes audit settings and system configuration settings. Activities related to audit settings are monitored when they change what is to be audited and which users can access the audit tables. System configuration settings which change the values of system parameters are also captured.
– Security settings for users: This kind includes privilege/role settings and account settings. It means that activities which change the privileges of users and the parameter values of accounts must be recorded.
– Privileged user activities: Any activities by privileged users, including public users, database administrators and other predefined privileged users, are monitored.
– Direct data access: This type monitors any content changes or accesses to key tables, including audit tables and sensitive tables.
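The six categories above can serve as labels that a monitoring policy attaches to each captured event. The following minimal sketch shows one way such labelling could look. The event format (a dictionary with `action`, `user` and `object` keys), the action names, and the sets of key tables and privileged users are all hypothetical; a real deployment would take them from the DBMS audit trail and from the configured policy.

```python
SENSITIVE_TABLES = {"AUDIT_LOG", "PAYROLL"}   # hypothetical key tables
PRIVILEGED_USERS = {"SYS", "SYSTEM"}          # hypothetical privileged accounts


def classify(event):
    """Return the sorted list of monitoring categories an event falls into."""
    action = event["action"].upper()
    categories = set()
    if action in {"LOGON", "LOGOFF"}:
        categories.add("connection")
    elif action in {"AUDIT", "NOAUDIT", "ALTER SYSTEM"}:
        categories.add("system security settings")
    elif action in {"GRANT", "REVOKE", "ALTER USER"}:
        categories.add("security settings for users")
    elif action.split()[0] in {"CREATE", "ALTER", "DROP"}:
        categories.add("object changes")
    # The last two categories depend on who acts and on what, not on the verb
    if event.get("user", "").upper() in PRIVILEGED_USERS:
        categories.add("privileged user activities")
    if event.get("object", "").upper() in SENSITIVE_TABLES:
        categories.add("direct data access")
    return sorted(categories)
```

An event may carry several labels at once, for example a privileged user dropping a sensitive table, which is why the function returns a list rather than a single category.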

4 The Proposed Architecture

The system architecture consists of three programs, located separately at the client side, the service server side and the client server side. The client side is the application used to scan the client’s remote database system, to monitor the database activities and to prevent security problems. The service server side is an independent, central service server providing services for the above application to process the client’s requests (cf. Fig. 1). The client server side, or housing provider side, is actually the client’s outsourced database server. The system architecture is described in detail as follows:

Fig. 1 The overall system architecture


4.1 Client Side

The client installs a desktop application to interact with the service server. This application is a functional interface. Its main tasks are to connect to the service server and call suitable deployed services in order to accomplish the following tasks: scanning, monitoring, configuring, flaw fixing, reporting, feedback sending, and visualization.

– Scan: identify which database security flaws exist in the client’s database system and notify the client administrators of the current health of their database (see Sect. 4.4 for more details).
– Monitor: all activities of an account are monitored in real time while the account accesses the client’s database or whenever the account’s privileges change (see Sect. 4.5 for more details).
– Configure: set up the system configuration.
– Fix: offer solutions to repair the found flaws in two modes, manual or automatic fixing.
– Report: present the final results of the scan and monitoring processes in graphical form (graphs and reports).
– Send feedback: send comments and report new security flaws to the system developers.
– Visualization: visually display the scan/monitor results with intuitively understandable interfaces.
– Update program: ensure that the program’s database and functionalities at the client side are always up-to-date (e.g., new security flaw patterns, new solutions to monitor database activities, or even a new GUI).

As mentioned above, the remote services are actually in charge of performing the main functions and then returning results to the client program. Thus, no real scanning or monitoring process happens at the client side, and the application is light enough for the client to download and run on a personal computer. The program database at the client side only stores some configuration information for the program to connect to the service server.

4.2 Service Server Side

Each function is implemented as a web service. This implementation model also makes it easy for the service provider to manage the services it supplies to specific clients. The services correspond to the primary functions of the client system: scan, monitor, prevent, fix, report and collect feedback. The prevent function differs from the other functions in that it is not called by the client. The response actions are predefined and called automatically when there are risks threatening the client’s server. The other functions are as described for the client side, but they are implemented as web services. For example, when a client wants to scan his server, the program at the client side calls the “scan” service to detect the security flaws in the target database.

Another program at the service server side is built for system development purposes. It supports the developers in specifying more database security flaws, detecting scripts and monitoring policies:

– Specify database security flaws: update the database of the program. The repository includes the flaw information, preconditions, and corresponding fixing solutions.
– Specify detecting scripts: define the scripts to detect database security flaws. Each security flaw needs scripts and a solution to detect it. The solution can be based on the version checking- or pentesting-based technique.
– Specify monitoring policies: define the policies to monitor database activities. A policy defines the sensitive objects, subjects, and database privileges that should be monitored.
– Collect feedback: get feedback from the users/clients (from the client side) in order to further improve the program with new advanced features.

A crucially important feature of the service server side is the capability of securing the sensitive data of each client, namely the scanning results, the audit data, and the sensitive configuration information of its database server. In this way, even the provider administrators do not know the state of their client’s database server and, of course, each authorized client can only access its own data.

4.3 Client Server Side (Housing Provider Side)

The program at the client server side is a small piece of code, called the listener, that records what is happening at the client’s outsourced database. It stores audit data temporarily in the client’s database server and then, after filtering, sends them to the service server for further analysis and reporting. The audit data are saved temporarily in the client’s server, instead of being transferred directly to the service server, in order to ensure the secure auditing property. Such data contain important information about the system’s state and so must be continuously updated for clients to make timely decisions. When the connection between the client database server and the service server is interrupted, all activities at the outsourced database continue to be audited and stored at the local client database server. When the connection is re-established, the recovery and synchronization mechanism in the system ensures that these two audit databases are consistent. Consequently, monitoring of users’ database activities continues even if the connection is interrupted. In addition, all audit data are encrypted, digitally signed, and transferred over a secure communication channel. Besides that, security policies are enforced strictly to prevent unauthorized users from accessing the audit tables.
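The store-and-forward behaviour of the listener can be sketched as a local queue that is drained whenever the service server is reachable. The class below is a simplified illustration under several assumptions that go beyond the paper: the class name `AuditListener`, the `send` callback and the record format are hypothetical, an HMAC tag stands in for the digital signature, and encryption and the secure channel are omitted entirely.

```python
import hashlib
import hmac
import json
from collections import deque


class AuditListener:
    """Buffers audit records locally and forwards them when possible."""

    def __init__(self, key, send):
        self._key = key            # bytes; shared secret (assumption)
        self._send = send          # callable(record) -> True if delivered
        self._pending = deque()    # local temporary audit store

    def record(self, event):
        # Serialise deterministically so the tag can be re-checked later
        payload = json.dumps(event, sort_keys=True)
        tag = hmac.new(self._key, payload.encode(), hashlib.sha256).hexdigest()
        self._pending.append({"payload": payload, "tag": tag})
        self.flush()

    def flush(self):
        """Forward buffered records in order; stop at the first failure."""
        while self._pending:
            if not self._send(self._pending[0]):
                break              # connection down: keep the record locally
            self._pending.popleft()
        return len(self._pending)
```

Because a record is removed from the queue only after a successful delivery, records captured during an outage are forwarded in their original order once `flush` succeeds again, which is the consistency property that the synchronization mechanism described above has to provide.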


T.K. Dang et al.

Fig. 2 Process of detecting database security flaws

4.4 Detecting Engine

The detecting engine (cf. Fig. 2) implements the scanning web service. The process is activated when the client’s administrator defines requirements and, through the application at the client side, requests the detecting engine to find database security flaws. After gathering the scan requirements and criteria, the engine makes the necessary preparations, such as backing up important data and generating temporary files. Next, it scans, analyzes and evaluates the database system; these three processes work together to find the database security flaws. It then returns the results to the client, where the found flaws are displayed on the client user interface as a statistical report, and, if necessary, sends alert messages to the client’s administrator. The results are organized visually by flaw category and ranked by risk level, and possible solutions to the found flaws are suggested. This gives the administrator an overall view of the system’s health and helps in taking suitable actions to better protect the database system. While running, the process uses the database security flaw repository, audit logs and stated policies to detect the security flaws. The results of the scanning phase are stored in a local database, called the scan history, for later reference.
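The way scan results are organized by flaw category and ranked by risk level can be sketched as below. The `Flaw` record, the numeric risk scale and the sample flaw names are hypothetical and chosen only for illustration; the prototype's actual data model is not given in the text.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Flaw(NamedTuple):
    name: str
    category: str
    risk: int  # hypothetical scale: a higher value means a more severe flaw


def build_report(found: List[Flaw]) -> Dict[str, List[str]]:
    """Group flaws by category and list each group from highest to lowest risk."""
    grouped: Dict[str, List[Flaw]] = defaultdict(list)
    for flaw in found:
        grouped[flaw.category].append(flaw)
    report: Dict[str, List[str]] = {}
    for category in sorted(grouped):
        ranked = sorted(grouped[category], key=lambda f: f.risk, reverse=True)
        report[category] = [f.name for f in ranked]
    return report


SAMPLE = [
    Flaw("weak_password_policy", "authentication", 6),
    Flaw("outdated_patch_level", "version", 5),
    Flaw("default_password", "authentication", 9),
]
```

Calling `build_report(SAMPLE)` yields one ranked list per category, which is the shape a statistical report or chart could be built from.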

4.5 Monitoring Engine

Fig. 3 Process of monitoring database activities

Figure 3 illustrates the process of monitoring database activities, which is activated when the administrator configures the monitor settings and requests the monitoring service to track users’ database activities in real time. First, it gathers the users’ monitoring requirements. Then, using the collected information, it makes some preparations, for example, by installing the auditing modules (if necessary), generating temporary files, etc. [5, 10]. Next, it starts to monitor database activities, which involves the following sub-processes: gathering audit data, filtering data, analyzing, evaluating, and detecting. These processes work together to monitor the health of the client database system. The audit data gathering process records, in real time and from the audit logs, all activities of an account while it accesses the client’s database and whenever the account’s privileges change. These activities are captured by an auditor module. The data filtering process filters audit trails using policy-based rules, which are set to single out the activities that really matter. The engine sends the filtered data to the analyzing and evaluating processes. Using the results of the detector process described in Sect. 4.4 and the account’s activities, combined with thresholds retrieved from the predefined baseline storage, the analysis and evaluation part can give trustworthy and salient results about the found security flaws. After that, based on the policies for response actions, the preventing process proposes possible solutions for fixing the found flaws and sends alert messages to the administrator. The results of the security flaw scanning process, the corresponding response actions, and what the monitoring processes have captured are presented visually in the form of charts, graphs, etc. by the displaying process. They are also summarized and stored in the history database for statistics.
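The filtering and evaluation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `AuditRecord` and `Policy` types, their fields, and the idea of counting failed attempts against a threshold are hypothetical stand-ins for the policy-based rules and baseline thresholds mentioned in the text.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class AuditRecord:
    user: str
    action: str     # e.g. "SELECT", "GRANT"
    obj: str        # database object touched by the action
    success: bool


@dataclass(frozen=True)
class Policy:
    """Hypothetical monitoring policy: which objects and actions matter."""
    objects: FrozenSet[str]
    actions: FrozenSet[str]


def filter_records(records: List[AuditRecord], policy: Policy) -> List[AuditRecord]:
    """Keep only the audit records that the policy marks as relevant."""
    return [r for r in records if r.obj in policy.objects and r.action in policy.actions]


def users_over_threshold(records: List[AuditRecord], max_failures: int) -> List[str]:
    """Return users whose number of failed actions exceeds the baseline threshold."""
    failures: Dict[str, int] = {}
    for r in records:
        if not r.success:
            failures[r.user] = failures.get(r.user, 0) + 1
    return sorted(user for user, n in failures.items() if n > max_failures)


POLICY = Policy(objects=frozenset({"SALARY"}), actions=frozenset({"SELECT", "UPDATE"}))
TRAIL = [
    AuditRecord("alice", "SELECT", "SALARY", False),
    AuditRecord("alice", "UPDATE", "SALARY", False),
    AuditRecord("bob", "SELECT", "SALARY", True),
    AuditRecord("bob", "SELECT", "LOOKUP", False),   # not a sensitive object
]
```

With these sample values, `users_over_threshold(filter_records(TRAIL, POLICY), 1)` flags only `alice`, who has two failed accesses to a sensitive object.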


5 Prototype and Evaluation

In this section, we briefly introduce our prototype implementation, which scans for security flaws and monitors activities for Oracle databases in a real-world housing model [6]. The prototype can also be extended to other DBMSs by means of the specification modules presented in Sect. 4.2. We then evaluate our architecture and implementation.

5.1 Prototype

The prototype can scan for about 150 flaws [6] of the Oracle Database Server 10g and 11g, including all kinds of flaws mentioned in Sect. 3.1. In addition, it can track all activities of normal users who access the client’s outsourced database server (clientDB) for special managerial activities in the housing-outsourcing model, and alert on violations of predefined security policies. These activities include Data Manipulation Language (DML), Data Definition Language (DDL), and Data Control Language (DCL) statements in both successful and unsuccessful modes. This means that all performed activities are recorded in the program database even if they were unsuccessful. The prototype comprises a collection of web services and the two programs described in the system architecture (cf. Sect. 4): the client tool at the client side and the developer tool at the service server side. The client tool occupies less than 4 MB of storage space. It mainly calls web services deployed at the service server side of the provider to carry out the main functions such as scanning and monitoring. Sensitive data such as scanning results or auditing data are secured for each client by using Oracle Data Vault. Apart from the authorized client, nobody, not even the administrator of the housing service provider, has the privilege to view such sensitive data.
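As a small illustration of recording DML, DDL and DCL activities in both successful and unsuccessful modes, the sketch below classifies a statement by its leading keyword and builds an audit entry regardless of the outcome. The keyword sets follow common SQL usage and are not exhaustive; the function names and the entry layout are hypothetical.

```python
from typing import Dict, Optional

# Leading keywords per statement class (illustrative, not exhaustive).
STATEMENT_CLASSES = {
    "DML": {"SELECT", "INSERT", "UPDATE", "DELETE", "MERGE"},
    "DDL": {"CREATE", "ALTER", "DROP", "TRUNCATE"},
    "DCL": {"GRANT", "REVOKE"},
}


def classify(statement: str) -> Optional[str]:
    """Return "DML", "DDL" or "DCL" based on the first keyword, else None."""
    words = statement.strip().split()
    if not words:
        return None
    keyword = words[0].upper()
    for cls, keywords in STATEMENT_CLASSES.items():
        if keyword in keywords:
            return cls
    return None


def audit_entry(user: str, statement: str, succeeded: bool) -> Dict[str, object]:
    """Build an audit entry; failed statements are recorded as well."""
    return {
        "user": user,
        "class": classify(statement),
        "statement": statement,
        "succeeded": succeeded,
    }
```

For example, `audit_entry("scott", "grant dba to scott", False)` produces an entry of class `"DCL"` with `succeeded` set to `False`, so the failed attempt is still available for later analysis.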

5.2 Evaluation

The evaluation is based on the following characteristics:
– Extensibility: The system developers of the service provider can enlarge the security flaw database and the kinds of database activities to be monitored through the specification functions of the service server side (cf. Sect. 4.2). Moreover, this architecture can be extended to any kind of DBMS.
– Efficiency: The clients only need to install a lightweight program to perform scanning, monitoring, alerting and reporting; this program calls the corresponding web services located at the service server side. This reduces the workload as well as the resource costs for the clients when running the application (cf. Sect. 4.1). The response time depends on the speed of the network connection between the client server and the service server. If many clients connect to the service server concurrently, a bottleneck is unavoidable and must be addressed, for example, by upgrading the configuration of the service server.
– Security: The architecture keeps the sensitive data of a client secure from the administrator(s) of the service provider as well as from other clients. In addition, with the secure auditing data feature, all database activities are monitored even when the network connection is interrupted (they are stored in the local audit data by the listener, see Sect. 4.3).
– Reliability: The client application calls web services located on a separate service server. Therefore, it cannot work while the network between the client and the service server is interrupted. The analyzed results, however, can be reviewed by the client afterwards because they are stored at the service server side in the client data vault.
With the proposed solution, the soft security aspects of the clients’ databases can be monitored and evaluated even though the clients are not present at the database server location. Moreover, the clients can take the necessary prevention actions when suspicious activities are found.

6 Open Research Issues

Although our framework is general-purpose for a variety of application contexts, many research topics are still open. We would like to introduce the most notable issues which have been identified but not yet addressed thoroughly:
– Pentesting weaknesses: Although pentesting is a useful practice and a simple and sound solution for identifying vulnerabilities in the database system, there are many security problems and limitations associated with penetration tests [2]. We have proposed a framework which combines five components based on a variety of rules: verifying scripts before testing, monitoring and preventing, alerting, auditing all events during testing, and checking the recovery state [15].
– Database security visualization: The in-depth analysis of huge amounts of information from scanning or monitoring results is not easy without the support of data visualization techniques. Security visualization helps users to understand security issues in depth and to keep track of the health state of the database over time from a visual perspective [12].
– Data mining-based techniques: The crucial advantage of data mining techniques is the ability to draw attention to potential/unknown security flaws. To implement data mining in the proposed system, we need a mechanism to collect all database transactions and configurations from the clients. The main challenge of this technique is to develop an efficient algorithm to distinguish risky patterns [1] and to protect the clients’ privacy in data mining [9].


7 Conclusion and Future Work

In this paper, we have presented a framework for detecting, monitoring and preventing database security breaches in a housing-based outsourcing model, together with three relevant methodologies: version checking- and pentesting-based techniques for detecting database security flaws, remote monitoring-based techniques for auditing data, and policy-based techniques for preventing suspicious activities. We have also introduced a classification of the security flaws existing in database systems and of the activities that need to be monitored, as a basis for enlarging the database of flaws and monitoring policies. In the future, we are going to integrate our recent results on a side-effects-free database pentesting solution [15] into the proposed architecture and work on the open research issues discussed in Sect. 6 to improve the effectiveness, performance, and security of the system. We also continue to do research on an extensible framework for database security flaw specification, in order to provide a means for extending the system more easily and for reducing the risks due to the pentesters’ lack of specification experience.

References

1. Ashish, K., Evimaria, K., Elisa, B.: Detecting Anomalous Access Patterns in Relational Databases. VLDB Journal, 17(5), 1063–1077 (2008)
2. The Bundesamt für Sicherheit in der Informationstechnik: Study: A Penetration Testing Model, URL: https://ssl.bsi.bund.de/english/publications/studies/penetration.pdf (2003)
3. Dang, T.K., Nguyen, T.S.: Providing Query Assurance for Outsourced Tree-Indexed Data. HPSC2006, Hanoi, Vietnam, pp. 207–224 (2008)
4. Dang, T.K.: Ensuring Correctness, Completeness and Freshness for Outsourced Tree-Indexed Data. IRMJ, Idea Group, 21(1), 59–76 (2008)
5. Dang, T.K., Truong, Q.C., Cu-Nguyen, P.H., Tran, T.Q.N.: An Extensible Framework for Detecting Database Security Flaws. ACOMP2008, Vietnam, pp. 68–77 (2008)
6. Dang, T.K., Tran, T.Q.N., Truong, Q.C.: Security Issues in Housing Service Outsourcing Model with Database Systems. ASIS LAB, ASIS-TR-0017/2009, URL: http://www.cse.hcmut.edu.vn/asis (2009)
7. Geer, D., Harthorne, J.: Penetration testing: a duet. Proceedings of the 18th Annual Computer Security Applications Conference, Las Vegas, USA, pp. 185–198 (2002)
8. Handscombe, K.: Continuous Auditing From A Practical Perspective. Information Systems Control Journal, 2 (2007)
9. Huynh, V.Q.P., Dang, T.K.: eM2: An Efficient Member Migration Algorithm for Ensuring k-Anonymity and Mitigating Information Loss. VLDB Workshop on Secure Data Management, LNCS, Springer Verlag, Singapore, pp. 26–40 (2010)
10. Natan, R.B.: Implementing Database Security and Auditing. Elsevier Digital Press (2005)
11. Qiang, L.: Defense In-Depth to Achieve Unbreakable Database Security. ICITA2004, China, pp. 386–390 (2004)
12. Raffael, M.: Applied Security Visualization. Addison-Wesley (2008)
13. Rich, M.: Understanding and Selecting a Database Activity Monitoring Solution. URL: http://securosis.com/publications/DAM-Whitepaper-final.pdf (2008)
14. Surajit, C., Arnd, C., Koenig, V.N.: SQLCM: A Continuous Monitoring Framework for Relational Database Engines. ICDE2004, USA, pp. 473–485 (2004)
15. Tran, T.Q.N., Dang, T.K.: Towards Side-Effects-free Database Penetration Testing. Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA), 1(1), 72–85 (2010)
16. Varun, C., Arindam B., Vipin K.: Anomaly Detection: A Survey. ACM Computing Surveys (CSUR), 41(3), article 15 (2009)



Real-Time Sequential Convex Programming for Optimal Control Applications

Tran Dinh Quoc, Carlo Savorgnan, and Moritz Diehl

Abstract This paper proposes real-time sequential convex programming (RTSCP), a method for solving a sequence of nonlinear optimization problems depending on an online parameter. We provide a contraction estimate for the proposed method and, as a byproduct, a new proof of the local convergence of sequential convex programming. The approach is illustrated by an example where RTSCP is applied to nonlinear model predictive control.

T.D. Quoc · C. Savorgnan · M. Diehl
Department of Electrical Engineering (ESAT-SCD) and Optimization in Engineering Center (OPTEC), K.U. Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_8, © Springer-Verlag Berlin Heidelberg 2012

1 Introduction and Motivation

Consider a parametric optimization problem of the form:

$$
\mathrm{P}(\xi):\qquad
\begin{cases}
\min\limits_{x} & c^T x \\
\text{s.t.} & g(x) + M\xi = 0, \\
& x \in \Omega,
\end{cases}
$$

where $x, c \in \mathbb{R}^n$, $g : \mathbb{R}^n \to \mathbb{R}^m$ is a nonlinear function, $\Omega \subseteq \mathbb{R}^n$ is a convex set, the parameter $\xi$ belongs to a given set $\mathcal{P} \subseteq \mathbb{R}^p$, and $M \in \mathbb{R}^{m \times p}$ is a given matrix. This paper deals with the efficient calculation of approximate solutions to a sequence of problems of the form $\mathrm{P}(\xi)$ where the parameter $\xi$ is varying slowly. In other words, for a sequence $\{\xi_k\}_{k \ge 1}$ such that $\|M(\xi_{k+1} - \xi_k)\|$ is small, we want to solve problem $\mathrm{P}(\xi_k)$ in an efficient way without requiring too much accuracy in the result.

In practice, sequences of problems of the form $\mathrm{P}(\xi)$ can be solved in the framework of nonlinear model predictive control (MPC). MPC is an optimal control technique which avoids computing an optimal control law in a feedback form, which is often a numerically intractable problem. A popular way of solving the optimization problem to calculate the control sequence is to use either interior point methods [1] or sequential quadratic programming (SQP) [2, 3, 9]. A drawback of SQP is that it may require several iterations before convergence, and therefore the computation time may be too large for a real-time implementation. A solution to this problem was proposed in [6], where the real-time iteration (RTI) technique was introduced. Extensions to the original idea and some theoretical results are reported in [5, 7, 8]. Similar nonlinear MPC algorithms are proposed in [10, 13]. RTI is based on the observation that for several practical applications of nonlinear MPC, the data of two successive optimization problems to be solved in the MPC loop are numerically close. In particular, if we express these optimization problems in the form $\mathrm{P}(\xi)$, the parameter $\xi$ usually represents the current state of the system, which, for most applications, does not change significantly between two successive measurements. The RTI technique consists of performing only the first step of the usual SQP algorithm, which is initialized using the solution calculated in the previous MPC iteration.

1.1 Contribution

Before stating the main contributions of the paper we need to outline the (full-step) sequential convex programming (SCP) algorithm framework applied to problem $\mathrm{P}(\xi)$ for a given value $\xi_k$ of the parameter $\xi$:

1. Choose a starting point $x^0 \in \Omega$ and set $j := 0$.
2. Solve the convex approximation of $\mathrm{P}(\xi_k)$:

$$
\mathrm{P}_{\mathrm{cvx}}(x^j; \xi_k):\qquad
\begin{cases}
\min\limits_{x} & c^T x \\
\text{s.t.} & g'(x^j)(x - x^j) + g(x^j) + M\xi_k = 0, \\
& x \in \Omega
\end{cases}
$$

to obtain a solution $x^{j+1}$, where $g'(\cdot)$ is the Jacobian matrix of $g(\cdot)$.
3. If the stopping criterion is satisfied then: STOP. Otherwise, set $j := j + 1$ and go back to Step 2.

The real-time sequential convex programming (RTSCP) method proposed in this paper combines the RTI technique and the SCP algorithm: instead of solving every $\mathrm{P}(\xi_k)$ to full accuracy with SCP, RTSCP solves only one convex approximation $\mathrm{P}_{\mathrm{cvx}}(x^{k-1}; \xi_k)$ using as a linearization point $x^{k-1}$, which is the approximate solution of $\mathrm{P}(\xi_{k-1})$ calculated at the previous iteration. Therefore, RTSCP solves a sequence of convex problems corresponding to the different problems $\mathrm{P}(\xi_k)$. This method is suitable for problems that contain a general convex substructure, such as a nonsmooth convex cost or second-order or semidefinite cone constraints, which may not be convenient for SQP methods.
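The full-step SCP loop outlined above can be tried on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses a hypothetical constraint $g(x) = x_1 + 0.1 x_2^2$ with $M = 1$ and $\Omega = \mathbb{R}^2$, and it replaces the linear cost by the convex quadratic cost $\tfrac{1}{2}\|x\|^2$ (convex costs are covered by Remark 2 in Sect. 2) so that each convex subproblem has a closed-form solution and no solver is needed.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def g(x: Vec) -> float:
    """Hypothetical nonlinear constraint function g(x) = x1 + 0.1 * x2^2."""
    return x[0] + 0.1 * x[1] ** 2


def g_jac(x: Vec) -> Vec:
    """Jacobian (here a single row) of g at x."""
    return (1.0, 0.2 * x[1])


def solve_subproblem(x_lin: Vec, xi: float) -> Vec:
    """Solve min 0.5*||x||^2 s.t. g'(x_lin)(x - x_lin) + g(x_lin) + xi = 0.

    The feasible set is the hyperplane a.x = b, so the solution is the
    minimum-norm point on it, x = a * b / (a.a).
    """
    a = g_jac(x_lin)
    b = a[0] * x_lin[0] + a[1] * x_lin[1] - g(x_lin) - xi
    scale = b / (a[0] ** 2 + a[1] ** 2)
    return (scale * a[0], scale * a[1])


def scp(x0: Vec, xi: float, tol: float = 1e-10, max_iter: int = 50) -> List[Vec]:
    """Full-step SCP: re-linearize at the latest iterate until the step is tiny."""
    iterates = [x0]
    for _ in range(max_iter):
        x_old = iterates[-1]
        x_new = solve_subproblem(x_old, xi)
        iterates.append(x_new)
        if math.hypot(x_new[0] - x_old[0], x_new[1] - x_old[1]) <= tol:
            break
    return iterates


def errors_to(point: Vec, iterates: List[Vec]) -> List[float]:
    """Distance of every iterate to a given reference point."""
    return [math.hypot(x[0] - point[0], x[1] - point[1]) for x in iterates]
```

For $\xi = 1$ the minimizer of this toy problem is $(-1, 0)$, and `errors_to((-1.0, 0.0), scp((0.0, 0.5), 1.0))` shrinks by a roughly constant factor per iteration, which is the linear convergence behaviour discussed for SCP in this paper.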


In this paper we provide a contraction estimate for RTSCP which can be interpreted in the following way: if the linearization of the first problem $\mathrm{P}(\xi_0)$ is close enough to the solution of the problem and the quantity $\|M(\xi_{k+1} - \xi_k)\|$ is not too big (which is the case for many problems arising from nonlinear MPC), RTSCP provides a sequence of good approximations of the sequence of optimal solutions of the problems $\mathrm{P}(\xi_k)$. As a byproduct of this result, we obtain a new proof of local convergence for the SCP algorithm.

The paper is organized as follows. Section 2 describes the RTSCP algorithm. Section 3 proves the contraction estimate for the RTSCP method. The last section shows an application of the RTSCP method to nonlinear MPC.

2 The RTSCP Method

As mentioned in the previous section, SCP solves a possibly nonconvex optimization problem by solving a sequence of convex subproblems which approximate the original problem locally. In this section, we combine RTI and SCP to obtain the RTSCP method. The method consists of the following steps:

Initialization. Find an initial value $\xi_1 \in \mathcal{P}$, choose a starting point $x^0 \in \Omega$ and compute the information needed at the first iteration such as derivatives, dependent variables, etc. Set $k := 1$.

Iteration.
1. Solve $\mathrm{P}_{\mathrm{cvx}}(x^{k-1}; \xi_k)$ (see Sect. 3) to obtain a solution $x^k$.
2. Determine a new parameter $\xi_{k+1} \in \mathcal{P}$, update (or recompute) the information needed for the next step. Set $k := k + 1$ and go back to Step 1.

One of the main tasks of the RTSCP method is to solve the convex subproblem $\mathrm{P}_{\mathrm{cvx}}(x^{k-1}; \xi_k)$ at each iteration. This can be done either by implementing an optimization method which exploits the problem structure or by relying on one of the many efficient software tools available nowadays.

Remark 1. In the RTSCP method, a starting point $x^0$ in $\Omega$ is required. It can be any point in $\Omega$. But as we will show later (Theorem 1), if we choose $x^0$ close to the true solution of $\mathrm{P}(\xi_0)$ and $\|M(\xi_1 - \xi_0)\|$ is sufficiently small, then the solution $x^1$ of $\mathrm{P}_{\mathrm{cvx}}(x^0; \xi_1)$ is still close to the true solution of $\mathrm{P}(\xi_1)$. Therefore, in practice, problem $\mathrm{P}(\xi_0)$ can be solved approximately to get a starting point $x^0$.

Remark 2. Problem $\mathrm{P}(\xi)$ has a linear cost function. However, RTSCP can deal directly with problems where the cost function $f(x)$ is convex. If the cost function is quadratic and $\Omega$ is a polyhedral set then the RTSCP method collapses to the real-time iteration of a Gauss-Newton method (see, e.g. [4]).

Remark 3. In MPC, the parameter $\xi$ is usually the value of the state variables of a dynamic system at the current time $t$. In this case, $\xi$ is measured at each sample time based on the real-world dynamic system (see the example in Sect. 4).
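The RTSCP iteration (one convex subproblem per parameter value, warm-started at the previous iterate) can be sketched on a toy tracking problem. Everything specific here is an assumption made for illustration: the constraint $g(x) = x_1 + 0.1 x_2^2$ with $M = 1$, $\Omega = \mathbb{R}^2$, the quadratic cost $\tfrac{1}{2}\|x - p\|^2$ with a hypothetical target $p = (0, 1)$ (a convex cost in the sense of Remark 2), and the slowly increasing parameter sequence. With this cost each subproblem is a projection onto a hyperplane and has a closed form.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]
TARGET: Vec = (0.0, 1.0)  # hypothetical cost: 0.5 * ||x - TARGET||^2


def solve_subproblem(x_lin: Vec, xi: float) -> Vec:
    """Project TARGET onto the linearized constraint
    g'(x_lin)(x - x_lin) + g(x_lin) + xi = 0, with g(x) = x1 + 0.1 * x2^2."""
    a = (1.0, 0.2 * x_lin[1])                      # Jacobian of g at x_lin
    g_lin = x_lin[0] + 0.1 * x_lin[1] ** 2         # g(x_lin)
    b = a[0] * x_lin[0] + a[1] * x_lin[1] - g_lin - xi
    t = (b - a[0] * TARGET[0] - a[1] * TARGET[1]) / (a[0] ** 2 + a[1] ** 2)
    return (TARGET[0] + t * a[0], TARGET[1] + t * a[1])


def rtscp(x0: Vec, params: List[float]) -> List[Vec]:
    """RTSCP: solve exactly one subproblem for each new parameter value."""
    xs: List[Vec] = []
    x = x0
    for xi in params:
        x = solve_subproblem(x, xi)   # linearize at the previous iterate
        xs.append(x)
    return xs


def reference_solution(xi: float, x_start: Vec, iters: int = 100) -> Vec:
    """Full-step SCP run for many iterations; used only as a reference point."""
    x = x_start
    for _ in range(iters):
        x = solve_subproblem(x, xi)
    return x


def tracking_errors(x0: Vec, params: List[float]) -> List[float]:
    """Distance between each RTSCP iterate and the converged SCP point."""
    errs = []
    for x, xi in zip(rtscp(x0, params), params):
        ref = reference_solution(xi, x)
        errs.append(math.hypot(x[0] - ref[0], x[1] - ref[1]))
    return errs


PARAMS = [1.0 + 0.02 * k for k in range(1, 21)]   # slowly varying parameter
```

Running `tracking_errors((-1.0, 0.8), PARAMS)` shows that the single-step iterates stay close to the converged SCP points as long as the parameter changes slowly, which is the qualitative behaviour the contraction estimate of Sect. 3 describes.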


3 RTSCP Contraction Estimate

The KKT conditions of problem $\mathrm{P}(\xi)$ can be written as

$$
\begin{cases}
0 \in c + g'(x)^T \lambda + N_\Omega(x), \\
0 = g(x) + M\xi,
\end{cases}
\tag{1}
$$

where $N_\Omega(x) := \{ u \in \mathbb{R}^n \mid u^T (v - x) \le 0, \ \forall v \in \Omega \}$ if $x \in \Omega$ and $N_\Omega(x) := \emptyset$ if $x \notin \Omega$, is the normal cone of $\Omega$ at $x$, and $\lambda$ is a Lagrange multiplier associated with $g$. Note that the constraint $x \in \Omega$ is implicitly included in the first line of (1). A pair $\bar z(\xi) := (\bar x(\xi), \bar\lambda(\xi))$ satisfying (1) is called a KKT point and $\bar x(\xi)$ is called a stationary point of $\mathrm{P}(\xi)$. We denote by $\Gamma(\xi)$ the set of KKT points at $\xi$. In the sequel, we use $z$ for a pair $(x, \lambda)$, $\bar z^k$ is a KKT point of $\mathrm{P}(\xi)$ at $\xi_k$ and $z^{k+1}$ is a KKT point of $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$ (defined below) at $\xi_{k+1}$ for $k \ge 0$. The symbols $\|\cdot\|$ and $\|\cdot\|_F$ stand for the $L_2$-norm and the Frobenius norm, respectively.

Now, let us define

$$
\varphi(z; \xi) := \begin{pmatrix} c + g'(x)^T \lambda \\ g(x) + M\xi \end{pmatrix}
$$

and $K := \Omega \times \mathbb{R}^m$; then the KKT system (1) can be expressed as a parametric generalized equation [11]:

$$
0 \in \varphi(z; \xi) + N_K(z), \tag{2}
$$

where $N_K(z)$ is the normal cone of $K$ at $z$. Let $x^k \in \Omega$ be a solution of $\mathrm{P}_{\mathrm{cvx}}(x^{k-1}; \xi_k)$ at the $k$-th iteration of RTSCP. We consider the following parametric convex subproblem at Step 1 of the RTSCP algorithm:

$$
\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1}):\qquad
\begin{cases}
\min\limits_{x} & c^T x \\
\text{s.t.} & g'(x^k)(x - x^k) + g(x^k) + M\xi_{k+1} = 0, \\
& x \in \Omega.
\end{cases}
$$

If we define

$$
\hat\varphi(z; x^k, \xi_{k+1}) := \begin{pmatrix} c + g'(x^k)^T \lambda \\ g(x^k) + g'(x^k)(x - x^k) + M\xi_{k+1} \end{pmatrix}
$$

then the KKT condition for $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$ can also be represented as a parametric generalized equation:

$$
0 \in \hat\varphi(z; x^k, \xi_{k+1}) + N_K(z), \tag{3}
$$

where the pair $(x^k, \xi_{k+1})$ plays the role of a parameter. Suppose that the Slater constraint qualification condition holds for problem $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$, i.e.:

$$
\mathrm{ri}(\Omega) \cap \{ x : g(x^k) + g'(x^k)(x - x^k) + M\xi_{k+1} = 0 \} \ne \emptyset,
$$

where $\mathrm{ri}(\Omega)$ is the set of the relative interior points of $\Omega$. Then by convexity of $\Omega$, a point $z^{k+1} = (x^{k+1}, \lambda^{k+1})$ is a KKT point of the subproblem $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$ if and only if $x^{k+1}$ is a solution of $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$ with a corresponding multiplier $\lambda^{k+1}$.

For a given KKT point $\bar z^k \in \Gamma(\xi_k)$ of $\mathrm{P}(\xi_k)$, we define a set-valued mapping:

$$
L(z; \xi) := \hat\varphi(z; \bar x^k, \xi) + N_K(z), \tag{4}
$$

and $L^{-1}(\delta; \xi) := \{ z \in \mathbb{R}^{n+m} : \delta \in L(z; \xi) \}$ for $\delta \in \mathbb{R}^{n+m}$ is its inverse mapping. Note that $0 \in L(z; \xi)$ is indeed the KKT condition of $\mathrm{P}_{\mathrm{cvx}}(\bar x^k; \xi)$. For each $k \ge 0$, we make the following assumptions:

(A1) The set of the KKT points $\Gamma_0 := \Gamma(\xi_0)$ is nonempty.
(A2) The function $g$ is twice continuously differentiable on its domain.
(A3) There exist a neighborhood $\mathcal{N}_0 \subset \mathbb{R}^{n+m}$ of the origin and a neighborhood $\mathcal{N}_{\bar z^k}$ of $\bar z^k$ such that for each $\delta \in \mathcal{N}_0$, the mapping $\delta \mapsto \mathcal{N}_{\bar z^k} \cap L^{-1}(\delta; \xi_k)$ is single-valued and Lipschitz continuous on $\mathcal{N}_0$ with a Lipschitz constant $\gamma > 0$.
(A4) There exists a constant $0 \le \kappa < 1/\gamma$ such that $\|E_g(\bar z^k)\|_F \le \kappa$, where $E_g(\bar z^k) := \sum_{i=1}^m \bar\lambda_i^k \nabla^2 g_i(\bar x^k)$.

Assumptions (A1) and (A2) are standard in optimization, while Assumption (A3) is related to the strong regularity concept introduced by Robinson [11] for parametric generalized equations of the form (2). It is important to note that the strong regularity assumption follows from the strong second order sufficient optimality condition in nonlinear programming when the constraint qualification condition (LICQ) holds [11, Theorem 4.1]. In this paper, instead of the generalized linear mapping $L_R(z; \xi) := \varphi(\bar z^k; \xi) + \varphi'(\bar z^k)(z - \bar z^k) + N_K(z)$ used in [11] to define strong regularity, in Assumption (A3) we use a similar form $L(z; \xi) = \varphi(\bar z^k; \xi) + D(\bar z^k)(z - \bar z^k) + N_K(z)$, where

$$
\varphi'(\bar z^k) = \begin{pmatrix} E_g(\bar z^k) & g'(\bar x^k)^T \\ g'(\bar x^k) & 0 \end{pmatrix}
\quad\text{and}\quad
D(\bar z^k) = \begin{pmatrix} 0 & g'(\bar x^k)^T \\ g'(\bar x^k) & 0 \end{pmatrix}.
$$

These expressions differ from each other only in the top-left block $E_g(\bar z^k)$, the Hessian of the Lagrange function. Assumption (A3) corresponds to the standard strong regularity assumption (in the sense of Robinson [11]) of the subproblem $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$ at the point $\bar z^k$, a KKT point of (2) at $\xi = \xi_k$. Assumption (A4) implies that either the function $g$ should be “weakly nonlinear” (small second derivatives) in a neighborhood of a stationary point or the corresponding Lagrange multipliers are sufficiently small in this neighborhood. The latter case occurs if the optimal value of $\mathrm{P}(\xi)$ depends only weakly on perturbations of the nonlinear constraint $g(x) + M\xi = 0$.

Theorem 1 (Contraction Theorem). Suppose that Assumptions (A1)–(A4) are satisfied. Then there exist neighborhoods $\mathcal{N}_\xi$ of $\xi_k$, $\mathcal{N}_\rho$ of $\bar z^k$ and a single-valued function $\bar z : \mathcal{N}_\xi \to \mathcal{N}_\rho$ such that for all $\xi_{k+1} \in \mathcal{N}_\xi$, $\bar z^{k+1} := \bar z(\xi_{k+1})$ is the unique KKT point of $\mathrm{P}(\xi_{k+1})$ in $\mathcal{N}_\rho$ with respect to parameter $\xi_{k+1}$ (i.e. $\Gamma(\xi_{k+1}) \ne \emptyset$). Moreover, for any $\xi_{k+1} \in \mathcal{N}_\xi$, $z^k \in \mathcal{N}_\rho$ we have

$$
\|z^{k+1} - \bar z^{k+1}\| \le \omega_k \|z^k - \bar z^k\| + c_k \|M(\xi_{k+1} - \xi_k)\|, \tag{5}
$$

where $\omega_k \in (0, 1)$ and $c_k > 0$ are constants, and $z^{k+1}$ is a KKT point of $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$.

Proof. The proof is organized in two parts. The first part proves $\Gamma_k := \Gamma(\xi_k) \ne \emptyset$ for all $k \ge 0$ by induction and estimates the norm $\|\bar z^{k+1} - \bar z^k\|$. The second part proves the inequality (5) (see Fig. 1).

Part 1: For $k = 0$, $\Gamma_0 \ne \emptyset$ by Assumption (A1). Suppose that $\Gamma_k \ne \emptyset$ for some $k \ge 0$; we will show that $\Gamma_{k+1} \ne \emptyset$. We divide the proof into four steps.

Step 1.1. We first provide the following estimations. Take any $\bar z^k \in \Gamma_k$. We define

$$
r_k(z; \xi) := \hat\varphi(z; \bar x^k, \xi_k) - \varphi(z; \xi). \tag{6}
$$

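Since $\gamma\kappa < 1$ by (A4), we can choose $\varepsilon > 0$ sufficiently small such that $\gamma\kappa + 5\sqrt{3}\,\gamma\varepsilon < 1$. By the choice of $\varepsilon$, we also have $c_0 := \kappa + \sqrt{3}\,\varepsilon \in (0, 1/\gamma)$. Since $g$ is twice continuously differentiable, there exist a neighborhood $\mathcal{N}_\xi$ of $\xi_k$ and a neighborhood $\mathcal{N}_\rho \subseteq \mathcal{N}_{\bar z^k}$ of radius $\rho > 0$ centered at $\bar z^k$ such that: $r_k(z; \xi) \in \mathcal{N}_0$, $\|E_g(z) - E_g(\bar z^k)\|_F \le \varepsilon$, $\|E_g(z) - E_g(z^k)\|_F \le \varepsilon$, $\|g'(x) - g'(\bar x^k)\|_F \le \varepsilon$ and $\|g'(x) - g'(x^k)\|_F \le \varepsilon$ for all $z \in \mathcal{N}_\rho$. Next, we shrink the neighborhood $\mathcal{N}_\xi$ of $\xi_k$, if necessary, such that:

$$
\|M(\xi - \xi_k)\| \le (1 - \gamma c_0)\rho / \gamma. \tag{7}
$$

Fig. 1 The approximate sequence $\{z^k\}_k$ along the manifold $\bar z(\xi)$ of the KKT points

Step 1.2. For any $z, z' \in \mathcal{N}_\rho$, we now estimate $\|r_k(z; \xi) - r_k(z'; \xi)\|$. From (6) we have

$$
r_k(z; \xi) - r_k(z'; \xi) = \hat\varphi(z; \bar x^k, \xi_k) - \hat\varphi(z'; \bar x^k, \xi_k) - \varphi(z; \xi) + \varphi(z'; \xi) = \int_0^1 B(z_t; \bar x^k)(z' - z)\,dt, \tag{8}
$$

where $z_t := z + t(z' - z) \in \mathcal{N}_\rho$ and

$$
B(z; \hat x) := \begin{pmatrix} E_g(z) & g'(x)^T - g'(\hat x)^T \\ g'(x) - g'(\hat x) & 0 \end{pmatrix}. \tag{9}
$$

Using the estimations of $E_g$ and $g'$ at Step 1.1, it follows from (9) that

$$
\|B(z_t; \bar x^k)\| \le \|E_g(\bar z^k)\|_F + \left[ \|E_g(z_t) - E_g(\bar z^k)\|_F^2 + 2\|g'(x_t) - g'(\bar x^k)\|_F^2 \right]^{1/2} \le \kappa + \sqrt{3}\,\varepsilon \equiv c_0. \tag{10}
$$

Substituting (10) into (8), we get

$$
\|r_k(z; \xi) - r_k(z'; \xi)\| \le c_0 \|z - z'\|. \tag{11}
$$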

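Step 1.3. Let us define $\Phi_\xi(z) := \mathcal{N}_{\bar z^k} \cap L^{-1}(r_k(z; \xi); \xi_k)$. Next, we show that $\Phi_\xi(\cdot)$ is a contraction self-mapping onto $\mathcal{N}_\rho$ and then show that $\Gamma_{k+1} \ne \emptyset$. Indeed, since $r_k(z; \xi) \in \mathcal{N}_0$, applying (A3) and (11), for any $z, z' \in \mathcal{N}_\rho$, one has

$$
\|\Phi_\xi(z) - \Phi_\xi(z')\| \le \gamma \|r_k(z; \xi) - r_k(z'; \xi)\| \le \gamma c_0 \|z - z'\|. \tag{12}
$$

Since $\gamma c_0 \in (0, 1)$ (see Step 1.1), we conclude that $\Phi_\xi(\cdot)$ is a contraction mapping on $\mathcal{N}_\rho$. Moreover, since $\bar z^k = \mathcal{N}_{\bar z^k} \cap L^{-1}(0; \xi_k)$, it follows from (A3) and (7) that

$$
\|\Phi_\xi(\bar z^k) - \bar z^k\| \le \gamma \|r_k(\bar z^k; \xi)\| = \gamma \|M(\xi - \xi_k)\| \le (1 - \gamma c_0)\rho.
$$

Combining the last inequality with (12) and noting that $\|z - \bar z^k\| \le \rho$, we obtain

$$
\|\Phi_\xi(z) - \bar z^k\| \le \|\Phi_\xi(z) - \Phi_\xi(\bar z^k)\| + \|\Phi_\xi(\bar z^k) - \bar z^k\| \le \rho,
$$

which proves that $\Phi_\xi$ is a self-mapping onto $\mathcal{N}_\rho$. Consequently, for any $\xi_{k+1} \in \mathcal{N}_\xi$, $\Phi_{\xi_{k+1}}$ possesses a unique fixed point $\bar z^{k+1}$ in $\mathcal{N}_\rho$ by virtue of the contraction principle. This statement is equivalent to $\bar z^{k+1}$ being a KKT point of $\mathrm{P}(\xi_{k+1})$, i.e. $\bar z^{k+1} \in \Gamma(\xi_{k+1})$. Hence, $\Gamma_{k+1} \ne \emptyset$.

Step 1.4. Finally, we estimate $\|\bar z^{k+1} - \bar z^k\|$. From the properties of $\Phi_\xi$ we have

$$
\|\bar z^{k+1} - z\| \le (1 - \gamma c_0)^{-1} \|\Phi_{\xi_{k+1}}(z) - z\|, \quad \forall z \in \mathcal{N}_\rho. \tag{13}
$$

Using this inequality with $z = \bar z^k$ and noting that $\bar z^k = \Phi_{\xi_k}(\bar z^k)$, we have

$$
\|\bar z^{k+1} - \bar z^k\| \le (1 - \gamma c_0)^{-1} \|\Phi_{\xi_{k+1}}(\bar z^k) - \Phi_{\xi_k}(\bar z^k)\|. \tag{14}
$$

Since $\|r_k(\bar z^k; \xi_k) - r_k(\bar z^k; \xi_{k+1})\| = \|M(\xi_{k+1} - \xi_k)\|$, applying again (A3), it follows from (14) that

$$
\|\bar z^{k+1} - \bar z^k\| \le \gamma (1 - \gamma c_0)^{-1} \|M(\xi_{k+1} - \xi_k)\|. \tag{15}
$$

Part 2: Let us define the residual from $\hat\varphi(z; \bar x^k, \xi_{k+1})$ to $\hat\varphi(z; x^k, \xi_{k+1})$ as:

$$
\delta(z; x^k, \xi_{k+1}) := \hat\varphi(z; \bar x^k, \xi_{k+1}) - \hat\varphi(z; x^k, \xi_{k+1}). \tag{16}
$$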

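Step 2.1. We first provide an estimation for $\|\delta(z; x^k, \xi_{k+1})\|$. From (16) we have

$$
\begin{aligned}
\delta(z; x^k, \xi_{k+1}) &= \left[ \hat\varphi(z; \bar x^k, \xi_{k+1}) - \varphi(\bar z^k; \xi_{k+1}) \right] - \left[ \varphi(z; \xi_{k+1}) - \varphi(\bar z^k; \xi_{k+1}) \right] \\
&\quad - \left[ \hat\varphi(z; x^k, \xi_{k+1}) - \varphi(z^k; \xi_{k+1}) \right] + \left[ \varphi(z; \xi_{k+1}) - \varphi(z^k; \xi_{k+1}) \right] \\
&= \int_0^1 B(z_t^k; x^k)(z - z^k)\,dt - \int_0^1 B(\bar z_t^k; \bar x^k)(z - \bar z^k)\,dt \\
&= \int_0^1 \left[ B(z_t^k; x^k) - B(\bar z_t^k; \bar x^k) \right](z - z^k)\,dt - \int_0^1 B(\bar z_t^k; \bar x^k)(z^k - \bar z^k)\,dt,
\end{aligned}
\tag{17}
$$

where $z_t^k := z^k + t(z - z^k)$, $\bar z_t^k := \bar z^k + t(z - \bar z^k)$ and $B$ is defined by (9). Using the definition of $\hat\varphi$ and the estimations of $E_g$ and $g'$ at Step 1.1, it is easy to show that

$$
\begin{aligned}
\|B(z_t^k; x^k) - B(\bar z_t^k; \bar x^k)\| &\le \left[ \|E_g(z_t^k) - E_g(\bar z^k)\|_F^2 + 2\|g'(x_t^k) - g'(x^k)\|_F^2 \right]^{1/2} \\
&\quad + \left[ \|E_g(\bar z_t^k) - E_g(\bar z^k)\|_F^2 + 2\|g'(\bar x_t^k) - g'(\bar x^k)\|_F^2 \right]^{1/2} \le 2\sqrt{3}\,\varepsilon.
\end{aligned}
\tag{18}
$$

Similar to (10), the quantity $B(\bar z_t^k; \bar x^k)$ is estimated by

$$
\|B(\bar z_t^k; \bar x^k)\| \le \kappa + \sqrt{3}\,\varepsilon. \tag{19}
$$

Substituting (18) and (19) into (17), we obtain an estimation for $\|\delta(z; x^k, \xi_{k+1})\|$ as

$$
\|\delta(z; x^k, \xi_{k+1})\| \le (\kappa + \sqrt{3}\,\varepsilon)\|z^k - \bar z^k\| + 2\sqrt{3}\,\varepsilon\|z - z^k\|. \tag{20}
$$

Step 2.2. We finally prove the inequality (5). Suppose that $z^{k+1}$ is a KKT point of $\mathrm{P}_{\mathrm{cvx}}(x^k; \xi_{k+1})$; then $0 \in \hat\varphi(z^{k+1}; x^k, \xi_{k+1}) + N_K(z^{k+1})$. This inclusion implies $\delta(z^{k+1}; x^k, \xi_{k+1}) \in \hat\varphi(z^{k+1}; \bar x^k, \xi_{k+1}) + N_K(z^{k+1}) \equiv L(z^{k+1}; \xi_{k+1})$ by the definition (16) of $\delta(z^{k+1}; x^k, \xi_{k+1})$. On the other hand, since $0 \in \hat\varphi(\bar z^k; \bar x^k, \xi_k) + N_K(\bar z^k)$, which is equivalent to $\delta_1 := M(\xi_{k+1} - \xi_k) \in L(\bar z^k; \xi_{k+1})$, applying (A3) we get

$$
\|z^{k+1} - \bar z^k\| \le \gamma \|\delta(z^{k+1}; x^k, \xi_{k+1}) - \delta_1\| \le \gamma \|\delta(z^{k+1}; x^k, \xi_{k+1})\| + \gamma \|M(\xi_{k+1} - \xi_k)\|.
$$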

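Combining this inequality and (20) with $z = z^{k+1}$, we obtain

$$
\|z^{k+1} - \bar z^k\| \le \gamma(\kappa + \sqrt{3}\,\varepsilon)\|z^k - \bar z^k\| + 2\sqrt{3}\,\gamma\varepsilon\|z^{k+1} - z^k\| + \gamma\|M(\xi_{k+1} - \xi_k)\|. \tag{21}
$$

Using the triangle inequality, after a simple rearrangement, (21) implies

$$
\|z^{k+1} - \bar z^{k+1}\| \le \frac{\gamma(\kappa + 3\sqrt{3}\,\varepsilon)}{1 - 2\sqrt{3}\,\gamma\varepsilon}\|z^k - \bar z^k\| + \frac{1 + 2\sqrt{3}\,\gamma\varepsilon}{1 - 2\sqrt{3}\,\gamma\varepsilon}\|\bar z^{k+1} - \bar z^k\| + \frac{\gamma}{1 - 2\sqrt{3}\,\gamma\varepsilon}\|M(\xi_{k+1} - \xi_k)\|. \tag{22}
$$

Now, let us define

$$
\omega_k := \frac{\gamma(\kappa + 3\sqrt{3}\,\varepsilon)}{1 - 2\sqrt{3}\,\gamma\varepsilon}, \qquad
c_k := \frac{\gamma}{1 - 2\sqrt{3}\,\gamma\varepsilon}\left[ \frac{2\sqrt{3}\,\gamma\varepsilon + 1}{1 - \gamma c_0} + 1 \right].
$$

By the choice of $\varepsilon$ at Step 1.1, we can easily check that $\omega_k \in (0, 1)$ and $c_k > 0$. Substituting (15) into (22) and using the definitions of $\omega_k$ and $c_k$, we obtain

$$
\|z^{k+1} - \bar z^{k+1}\| \le \omega_k \|z^k - \bar z^k\| + c_k \|M(\xi_{k+1} - \xi_k)\|,
$$

which proves (5). The theorem is proved.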
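To see what the one-step estimate (5) implies over many iterations, the recursion can be unrolled. The following short derivation is a sketch under additional assumptions that are not part of Theorem 1: the constants are taken to be uniformly bounded, $\omega_k \le \omega < 1$ and $c_k \le c$, the parameter increments satisfy $\|M(\xi_{k+1} - \xi_k)\| \le \Delta$ for all $k$, and all iterates are assumed to stay inside the neighborhoods in which (5) is valid. The symbols $e_k$, $\omega$, $c$ and $\Delta$ are introduced here only for this illustration.

```latex
% Tracking error e_k := \| z^k - \bar z^k \|, so that (5) reads e_{k+1} \le \omega_k e_k + c_k \|M(\xi_{k+1}-\xi_k)\|.
\begin{aligned}
e_{k+1} &\le \omega\, e_k + c\,\Delta \\
        &\le \omega^2 e_{k-1} + (1 + \omega)\, c\,\Delta \\
        &\;\;\vdots \\
        &\le \omega^{k+1} e_0 + c\,\Delta \sum_{i=0}^{k} \omega^i
         \;\le\; \omega^{k+1} e_0 + \frac{c\,\Delta}{1 - \omega}.
\end{aligned}
```

Under these assumptions the influence of the initial error decays geometrically, and the tracking error eventually stays within a band of width proportional to the size $\Delta$ of the parameter changes. Setting $\Delta = 0$ recovers linear convergence of the full-step SCP iteration.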



If $\mathcal{P} \equiv \{\xi\}$ then the RTSCP method collapses to the full-step SCP method described in Sect. 1. Without loss of generality, we can assume that $\xi_k = 0$ for all $k \ge 0$. The following corollary immediately follows from Theorem 1.

Corollary 1. Suppose that $\{z^j\}_{j \ge 1}$ is the sequence of the KKT points of $\mathrm{P}_{\mathrm{cvx}}(x^{j-1}; 0)$ generated by the SCP method described in Sect. 1 and that the assumptions of Theorem 1 hold for $k = 0$. Then

$$
\|z^{j+1} - \bar z\| \le \omega \|z^j - \bar z\|, \quad \forall j \ge 0, \tag{23}
$$

where $\omega \in (0, 1)$ is the contraction factor. Consequently, this sequence converges linearly to a KKT point $\bar z$ of $\mathrm{P}(0)$.

4 Numerical Example: Control of an Underactuated Hovercraft

In this section we apply RTSCP to the control of an underactuated hovercraft. We use the same model as in [12], which is characterized by the following differential equations:

[…]

                  NRMSE                              Time
                  0.015   > 0.04   > 0.08   > 0.09   > 400   > 100   > 200   > 100
GLOBAL CHANGE     0.017   0.022    0.043    0.089    2,796   15      157     7,946
LOCAL RESIDUAL    0.013   0.023    0.042    0.085    2,463   83      1,011   2,500

The run time depends on the number of grids, but also on the kind of grids being used. Grids which depend on a small number of dimensions but are highly refined, i.e. with few but large entries in k, are worse in this regard than grids which depend on more dimensions but only have a small level, i.e. many but small entries in k. For the considered data sets all dimensions were used in at least one grid, although the number of grids can vary widely between the different attributes. We observed up to 5 non-constant dimensions per grid. How often a dimension is used and the size of the error indicators for these grids provide information about the importance of attributes and can be derived from the final results. Whether this information is worthwhile in practice needs to be investigated on real-life data sets together with specialists from the application area. Only on the helicopter data set, with just 13 dimensions, could the non-adaptive optimized combination technique [3] be used. It achieves an MSE of 3.265 in 18,280 s using level 3. Level 4 had not finished after 5 days. Finally, a comparison with results using CVR, a special form of support vector regression, is given in Table 2. For all data sets our method achieves better results, but might need more run time, in one case significantly more. On the other hand, using a smaller tolerance, a somewhat worse result could be achieved by our approach in less time. Note that for a larger synthetic data set a quite significant run time advantage of the dimension adaptive approach in comparison to CVR can be observed [3, 9].


J. Garcke

4 Conclusions and Outlook

The dimension adaptive combination technique for regression shows good results in high dimensions and breaks the curse of dimensionality of grid based approaches. It gives a non-linear function describing the relationship between predictor and response variables and (approximately) identifies the ANOVA-decomposition. Of the three different refinement criteria, GLOBAL CHANGE is best suited for applications using a small number of partial grids; otherwise LOCAL RESIDUAL performed best.

It is known that error estimators which use the difference between two approximations of different resolution, i.e. of extrapolation type, have weaknesses. For example, the error estimator can be small although the actual error is still large [1]. Furthermore, the combination technique can also be derived as an extrapolation technique, therefore a thorough investigation of the observed behaviour in this context is warranted.

We currently employ a simple greedy approach in the adaptive procedure. More sophisticated adaptation strategies and different error indicators, for example taking the computational complexity of a grid into account, are worth investigating, especially with regard to an underlying theory which could provide robustness and efficiency of the approach similar to the numerical solution of partial differential equations with adaptive finite elements [1].

The original approach scales linearly in the number of data [2, 3]. In the dimension adaptive approach at least the computational effort for each partial grid scales linearly in the number of data. Since the value of the adaptation and stopping criteria depends on the number of data, the number of partial grids might change with a different number of data for a given stopping tolerance. Although we did not observe such unwanted behaviour in our experiments, it remains to be seen whether in a worst case scenario the dimension adaptive approach could result in a non-linear scaling with regard to the number of data.

References
1. Mark Ainsworth and J. Tinsley Oden. A posteriori error estimation in finite element analysis. Wiley, 2000.
2. J. Garcke, M. Griebel, and M. Thess. Data mining with sparse grids. Computing, 67(3):225–253, 2001.
3. Jochen Garcke. Regression with the optimised combination technique. In W. Cohen and A. Moore, editors, 23rd ICML ’06, pages 321–328, 2006.
4. Jochen Garcke. A dimension adaptive sparse grid combination technique for machine learning. In Wayne Read, Jay W. Larson, and A. J. Roberts, editors, Proc. of 13th CTAC-2006, volume 48 of ANZIAM J., pages C725–C740, 2007.
5. T. Gerstner and M. Griebel. Dimension-Adaptive Tensor-Product Quadrature. Computing, 71(1):65–87, 2003.
6. M. Griebel, M. Schneider, and C. Zenger. A combination technique for the solution of sparse grid problems. In P. de Groen and R. Beauwens, editors, Iterative Methods in Linear Algebra, pages 263–281. IMACS, Elsevier, 1992.

A Dimension Adaptive Combination Technique Using Localised Adaptation Criteria


7. M. Hegland. Adaptive sparse grids. In K. Burrage and Roger B. Sidje, editors, Proc. of 10th CTAC-2001, volume 44 of ANZIAM J., pages C335–C353, 2003.
8. M. Hegland, J. Garcke, and V. Challis. The combination technique and some generalisations. Linear Algebra and its Applications, 420(2–3):249–275, 2007.
9. Ivor W. Tsang, James T. Kwok, and Kimo T. Lai. Core vector regression for very large regression problems. In Luc De Raedt and Stefan Wrobel, editors, 22nd ICML 2005, pages 912–919. ACM, 2005.



Haralick’s Texture Features Computation Accelerated by GPUs for Biological Applications
Markus Gipp, Guillermo Marcus, Nathalie Harder, Apichat Suratanee, Karl Rohr, Rainer König, and Reinhard Männer

Abstract In biological applications, features are extracted from microscopy images of cells and are used for automated classification. Usually, a huge number of images has to be analyzed so that computing the features takes several weeks or months. Hence, there is a demand to speed up the computation by orders of magnitude. This paper extends previous results on the computation of co-occurrence matrices and Haralick texture features, as used for analyzing images of cells, by general-purpose graphics processing units (GPUs). New GPUs include more cores (480 stream processors) and their architecture enables several new capabilities (namely, computing capabilities). With the new capabilities (atomic functions) we further parallelize the computation of the co-occurrence matrices. The visual profiling tool was used to find the most critical bottlenecks, which we investigated and improved. Changes in the implementation like using more threads, avoiding costly barrier synchronizations, better handling of divergent branches, and a reorganization of the thread tasks yielded the desired performance boost. The computing time of the features for one image with around 200 cells is compared to the original software version as a reference, to our first CUDA version with computing capability v1.0 and to our improved CUDA version with computing capability v1.3. With the latest CUDA version we obtained an improvement by a factor of 1.4 over the previous CUDA version, computed on the same GPU (GeForce GTX 280).

M. Gipp · G. Marcus · R. Männer
Department of Computer Science V, Institute of Computer Engineering (ZITI), University of Heidelberg, B6, 26, 68131 Mannheim, Germany
N. Harder · A. Suratanee · K. Rohr · R. König
Department of Bioinformatics and Functional Genomics, IPMB, BIOQUANT and DKFZ Heidelberg, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_11, © Springer-Verlag Berlin Heidelberg 2012


In total, we achieved a speedup of 930 with the most recent GPU (GeForce GTX 480, Fermi) compared to the original CPU version and a speedup of 1.8 compared to the older GPU with the optimized CUDA version.
Keywords Co-occurrence matrix • GLCM • Graphics processing units • GPU • GPGPU • Fermi • Haralick texture features extraction

1 Introduction

In 1973 Haralick introduced the co-occurrence matrix and texture features for automated classification of rocks into six categories [6]. Today, these features are widely used for different kinds of images, for example, for microscope images of biological cells. One drawback of the features is the relatively high cost of computation. However, it is possible to speed up the computation using general-purpose graphics processing units (GPUs). Nowadays, GPUs (ordinary computer graphics cards) are used to accelerate non-graphical software by highly parallel execution.

In biological applications, features are extracted from microscopy images of cells and are used for automated classification as described in [3, 7]. Figure 1 shows an example of a microscopy image (1,344 × 1,024 pixels and 12 bit gray level depth), which includes several hundred cells (typically 100–600). Usually a very large number of images has to be analyzed so that computing the features takes several weeks or months. Hence, there is a demand to speed up the computation by orders of magnitude.

The overall goal of this biological application is to construct a network of signalling pathways of the cells. Therefore, genes are knocked down and images are acquired. Afterwards, the images are segmented using the adaptive thresholding algorithm in [7] to distinguish cells from the background. For the segmented cells Haralick texture features are computed. Besides these features, other features are also calculated and a well-chosen list of features is used for classification. The classification result yields information about the signalling network of the cells. Due to a large range of interesting genes and images, the image analysis process must be automated.

Fig. 1 Microscopy image with several hundred cells

After analyzing the different computation steps it turned out that the Haralick texture features consume most of the time. In a previous GPU version we analyzed the features and found similarities in the computation between them. Common intermediate results were visualized in a dependency graph and the optimal computational order was determined to avoid costly double computation. Further, we grouped the features in functional steps and parallelized the computation. Implementation details on how the features are parallelized in order to dramatically speed up the computations can be found in [4].

In this paper, our approach is to optimize the latest GPU version. Since then the profiling tool offered by NVIDIA has been fully developed and includes good visualization options. The profiling is simple and shows a detailed timing behavior with a direct comparison between small changes in the implementation. Furthermore, new GPU architectures with more computing functions (computing capabilities) are available. These computing capabilities help to speed up our latest algorithm by changing the structure of the implementation. Hence, we use new tools and architectures to further improve our latest results.

Below, we present the latest changes in the state of the art of GPUs, refresh important details about the co-occurrence matrices, and introduce all versions of the computations. Then, we explain the investigation of the profiling and implementation changes in Sects. 2.4 and 2.5 (these sections are a good entry point for those familiar with our last paper). Afterwards, we present the speedup factors for the different versions. We finally discuss the result and draw conclusions.

2 Methods

2.1 State of the Art

Speedup of the computation of the co-occurrence matrix and the Haralick texture features using reconfigurable hardware has been described in [9]. There, only a subset of the 14 features was chosen, obtaining a speedup of 4.75 for the co-occurrence matrix and 7.3 for the texture features when compared to a CPU application. More recent FPGAs (Xilinx Virtex4, Virtex5, Virtex6) would provide more space to implement more features at a higher clock speed.

Using GPUs for general-purpose computation is more and more common. During the last years, the peak computing power of GPUs has been rising dramatically. As an example, the NVidia GTX 480 from the GeForce 400 series with 480 usable thread processors and 1.4 GHz clock speed reaches a peak of about 1,345 GFLOPS. It can process 2 single precision floating point operations (SP) concurrently or 1 double precision operation (DP) per thread processor. Hence, the maximum speed is 480 × 1.4 GHz × 1 DP floating point operation = 672 GFLOPS, and twice that with SP. A state of the art CPU (Intel Xeon X7560 with 16 thread cores at 2.66 GHz turbo) reaches around 307 GFLOPS [1], i.e. 19 GFLOPS for each core. Figure 2 illustrates the peak performance of GPUs and CPUs and highlights a much steeper growth curve for GPUs. Reference [8] presents various applications in which GPUs provide a speedup of 3–59 compared to CPUs. In particular, n-body simulations achieve a GPU performance of over 200 GFLOPS. One should mention that the total peak performance depends on the application itself and on how the GFLOPS are counted. Only applications using multiply-add operations without divisions and other costly operations come close to the theoretical maximum performance. The better an application can be parallelized and partitioned into identical small computational units, the better the architecture of a GPU is utilized.

Fig. 2 Peak performance growth of different GPU and CPU generations

The NVidia graphics cards we use are the GeForce GTX 280 and the GeForce GTX 480 (called Fermi). The older card (GTX 280) has 30 streaming multiprocessors (SMs), each consisting of 16,384 registers, 16 kBytes of shared memory, and 8 processing elements. The partition of the newer card (GTX 480) is different: it has 15 streaming multiprocessors, each containing 32,768 registers, 64 kBytes of shared memory, and 32 processing elements. These processing elements are arranged in a single instruction multiple data (SIMD) fashion. In total, the GPUs provide 240 and 480 parallel pipelines, respectively, which operate most efficiently if a much higher number of light-weight program threads is available.

A GPU (below called “device”) is divided into many multiprocessors and provides several usable memories. The device memory is the biggest memory with around 1 GByte (GTX 280) and 1.5 GByte (GTX 480) but also the slowest. Access to this memory has a latency of several hundred cycles. One important improvement of the newer card is the presence of caches, so that the latency is dramatically reduced in case of a cache hit.

NVidia offers an Application Programming Interface (API), an extension to the programming language C called Compute Unified Device Architecture (CUDA), to use the highly parallel GPU architecture. One CUDA block contains program code in a single instruction multiple threads (SIMT) fashion and is executed on one SM. All threads within a block share the total number of registers and the shared memory of one SM. Using a high number of threads has the advantage of hiding the latency of memory accesses for a maximum occupation of the SM computational units. Blocks are arranged in a block grid so they can be dispatched among the SMs. Reference [2] discusses the architecture and CUDA.
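The peak-performance estimate in Sect. 2.1 is simple arithmetic and can be retraced with a few lines of Python. This is only an illustrative sketch: the function names are ours, and the input numbers are the ones quoted in the text (thread processors, clock speed, operations per cycle, SM counts), not measured values.

```python
def peak_gflops(thread_processors, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: processors x clock [GHz] x operations per cycle."""
    return thread_processors * clock_ghz * flops_per_cycle


def parallel_pipelines(multiprocessors, elements_per_sm):
    """Total number of processing elements of a device."""
    return multiprocessors * elements_per_sm


# GTX 480 figures as quoted in the text: 480 processors at 1.4 GHz.
gtx480_dp = peak_gflops(480, 1.4, 1)   # one DP operation per cycle
gtx480_sp = peak_gflops(480, 1.4, 2)   # two SP operations per cycle

# Per-core figure for the CPU quoted in the text (307 GFLOPS on 16 cores).
xeon_per_core = 307 / 16

# Pipelines of the two cards used in the paper.
gtx280_pipelines = parallel_pipelines(30, 8)
gtx480_pipelines = parallel_pipelines(15, 32)

if __name__ == "__main__":
    print(round(gtx480_dp), round(gtx480_sp), round(xeon_per_core, 1))
    print(gtx280_pipelines, gtx480_pipelines)
```

With a clock of exactly 1.4 GHz the single precision value comes out at 1,344 GFLOPS, which agrees with the quoted peak of about 1,345 GFLOPS up to rounding of the clock speed.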

2.2 Co-occurrence Matrix

The generation of the co-occurrence matrices (simply co-matrices) is based on second order statistics as described in [6] and [5]. With this approach, histogram matrices are computed for different orientations of pixel pairs. Using pixel pairs along a specific angle (horizontal, diagonal, vertical, co-diagonal) and distance (one to five pixels), a two-dimensional symmetric histogram of the gray levels is generated. The gray levels of the pixel pair address the indexes in the co-matrix and increment the addressed element by one; a detailed example can be found in [5]. For each specific angle/distance combination a separate matrix must be generated. This means that one side of the square co-matrix is as long as the gray level range of the image.

The microscope generates multi-cell images (Fig. 1) with a gray level depth of 12 bits corresponding to 4,096 different gray levels. Hence, each co-matrix needs 4,096 × 4,096 × 4 bytes = 64 MBytes of storage capacity. The graphics device is equipped with 1,536 MBytes of memory. Therefore we can generate only 24 matrices at once and compute the features on the corresponding image, which does not fully use the GPU. For a massively parallel approach we need to reduce the size of the co-matrices, and the size depends on the existing gray level range of each cell image extracted from the multi-cell image.

Actually, the co-matrices contain zeros almost everywhere. The reason for this sparsity is that a cell image contains nothing purely random and all the pixel pairs have preferred gray tones, so that during the co-matrix counting the elements are not hit randomly. For example, the cell core pixels have a small variation compared to the neighboring pixels. In particular, the background of the segmented image contains only pixels with the same intensity value so that no gray tone difference between neighboring pixels exists. These facts result in many pixel pairs with similar gray values, and the counts in the matrix are more or less concentrated in small regions. Figure 3a shows a binary image of a full matrix with a size of 4,096 × 4,096 pixels. In particular, the plain background of the cell images has the gray tone zero (black) with only one combination of gray levels (zero/zero) apart from the combinations at the border between background and cell.

In our algorithm we delete all rows having only zero elements (and also all such columns since the matrices are symmetric) to obtain a smaller packed co-matrix. Figure 3b shows the co-matrix of Fig. 3a in a packed representation with only 277 × 277 elements. For this example, a reduction from 64 MByte to 300 kByte could be achieved. The average packed co-matrix size has been determined to be about 1.5 MByte of storage space. A large standard deviation of the average size forces us to assume a bigger size when determining the actual memory demand for the computations.

Fig. 3 Binary images of a full (a) and a packed (b) co-occurrence matrix. White pixels indicate zeros and black pixels indicate nonzero values

For the feature computations, we store the gray value index of the full co-matrix in a lookup table corresponding to the index of the packed co-matrix. So the gray value can be reconstructed from the index of the packed co-matrix, which is necessary for computing related features. This co-matrix reduction strategy is a compromise between less storage capacity and direct accessibility in memory and works well in our algorithm for real cell images.
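The counting and packing steps of Sect. 2.2 can be sketched in plain Python. This is a sequential illustration under our own assumptions, not the authors' CUDA implementation: the function names `cooccurrence`, `pack` and `full_matrix_bytes` are hypothetical, the image is a small list of lists, and a pixel offset `(dx, dy)` stands in for one angle/distance combination.

```python
def cooccurrence(image, dx, dy, levels):
    """Symmetric gray-level co-occurrence matrix for the pixel offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                a, b = image[y][x], image[y2][x2]
                matrix[a][b] += 1
                matrix[b][a] += 1  # count the pair in both directions: symmetric
    return matrix


def pack(matrix):
    """Drop all-zero rows (and, by symmetry, columns).

    Returns the packed matrix and a lookup table that maps a packed index
    back to the original gray value.
    """
    lookup = [g for g, row in enumerate(matrix) if any(row)]
    packed = [[matrix[g][h] for h in lookup] for g in lookup]
    return packed, lookup


def full_matrix_bytes(levels, bytes_per_entry=4):
    """Storage of one unpacked co-matrix."""
    return levels * levels * bytes_per_entry


# Tiny example: 3 x 3 image with gray values 0, 5 and 7, horizontal distance 1.
image = [[0, 0, 5],
         [0, 5, 5],
         [7, 7, 5]]
packed, lookup = pack(cooccurrence(image, dx=1, dy=0, levels=8))

if __name__ == "__main__":
    print(lookup)   # gray values that actually occur in pixel pairs
    print(packed)
    print(full_matrix_bytes(4096) // 2**20, "MBytes per full 12 bit co-matrix")
```

The lookup table plays the role described in the text: index `i` of the packed matrix corresponds to gray value `lookup[i]`, so features that need the gray value itself can still be computed on the packed form.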

2.3 Previous Versions

Previously, we implemented an optimized software version and a GPU version. For our software version, we analyzed the existing software that computes the Haralick texture features. The goal was to optimize the code and run it on a single node. The single node version can also be run on a cluster with different data sources. In our GPU version we could parallelize the software version in several ways: to compute several cell images in parallel (C), to generate all co-occurrence matrices for each angle/distance combination in parallel (AD), and to compute each feature by summing and multiplying several elements in parallel. In CUDA we created a grid of AD × C blocks so that for each cell all matrices and features are computed in parallel. More details on how we mapped the CUDA blocks onto the GPU architecture and how we used the threads within a CUDA block can be found in [4].
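One way to picture the grid of AD × C blocks is as a mapping from a linear block index to a (cell, angle, distance) task. The sketch below is an assumption made for illustration only; the actual block layout is described in [4] and may differ. The four angles and five distances are those named in Sect. 2.2, and `block_task` is a hypothetical helper.

```python
ANGLES = ("horizontal", "diagonal", "vertical", "co-diagonal")
DISTANCES = (1, 2, 3, 4, 5)  # in pixels


def block_task(block_id, n_cells):
    """Map a linear block index to (cell index, angle, distance).

    Assumed layout: all angle/distance blocks of one cell are contiguous.
    """
    ad = len(ANGLES) * len(DISTANCES)
    if not 0 <= block_id < n_cells * ad:
        raise IndexError("block index outside the grid")
    cell, rest = divmod(block_id, ad)
    angle, distance = divmod(rest, len(DISTANCES))
    return cell, ANGLES[angle], DISTANCES[distance]


if __name__ == "__main__":
    n_cells = 8  # example value for C
    grid = [block_task(b, n_cells) for b in range(n_cells * 20)]
    print(len(grid), grid[0], grid[-1])
```

With eight cells this yields 8 × 20 = 160 distinct tasks, one per block.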

2.4 Optimizations with Profiling

The profiler showed us the most time-consuming computational functions in our algorithm and visualized them. With the profiling we investigated the functions and found various points for changes. Many divergent paths and synchronization barriers could be avoided by changes in the structure.

Accesses to the matrices are made row-wise in small blocks, simultaneously by all threads in use. Often, the row size is not a multiple of the thread block size so that at the border of the row divergent paths exist for some threads. We divided the memory accesses into a common part, which all threads execute before they reach the border, and a border part in which only the thread blocks at the borders execute.

The common way is to read from global memory, store the information in shared memory, synchronize, do operations on shared memory, synchronize again, and write the results to global memory. For many functions with few operations, the execution order was changed to read from global memory, compute the operations and write the results back to global memory. The operations read, compute, and write are coded in just one line of code. Since we are not using the shared memory, we got rid of all synchronization barriers. In the next section we will give more details on how and why the changes affect the speedup in the optimized implementation.
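The split into a common part and a border part can be illustrated sequentially. In the sketch below the inner `lane` loop stands in for the threads of one block; `split_row` and `scale_row` are hypothetical names, the block size of 16 is an example value, and multiplying by a factor stands in for whatever few operations a kernel performs.

```python
def split_row(row_len, threads):
    """Length of the common part (a multiple of `threads`) and of the border part."""
    common = (row_len // threads) * threads
    return common, row_len - common


def scale_row(row, factor, threads=16):
    """Scale one matrix row, processed in chunks of `threads` elements."""
    common, border = split_row(len(row), threads)
    out = [0.0] * len(row)
    # Common part: every chunk is full, so no lane needs a bounds check
    # and all lanes follow the same path.
    for start in range(0, common, threads):
        for lane in range(threads):
            out[start + lane] = row[start + lane] * factor  # read, compute, write in one line
    # Border part: only the first `border` lanes have work left.
    for lane in range(border):
        out[common + lane] = row[common + lane] * factor
    return out


if __name__ == "__main__":
    print(split_row(277, 16))  # a packed row of 277 elements
```

For a packed row of 277 elements and 16 lanes, 272 elements fall into the common part and only 5 need the border handling.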

2.5 Implementation Changes

Before we explain the changes to our last version we refresh details about the structure of the algorithm. For each of the C cell images, AD co-matrices with different orientations are generated and on each co-matrix 13 features are computed. The total computation consists of C × AD × 13 features. Before the features can be computed we need to generate the C × AD matrices. Afterwards the features are computed on the matrices. Below we describe implementation details about the matrix generation process and how we further improved it. For details of the feature implementation we refer to our last work in [4].

With the profiling we could identify and significantly optimize the slowest kernel functions. This section explains the implementation changes of the functions with the biggest gain in speedup, see also the results in Table 2.

Kernel Function 0B sets all matrix elements to zero, which was done by C × AD CUDA blocks in the old code. Since each CUDA block sets one relatively small matrix to zero, each block carries only a small work load. A more efficient work load and a reduction of the call overhead could be achieved by using only C CUDA blocks, each of which does the work for AD matrices.

The matrix generation process in Function 0C is similar to the computation of a histogram. Each event increments one of the histogram bins. First the value is read, then one is added, and finally the result is written back to the same memory address. This process is not thread safe, which means that several threads working on the same value must access it mutually exclusively. In our previous version we avoided computing conflicts by using one thread within each CUDA block to generate one matrix. Our current version is supported by atomic functions provided by devices with higher computing capabilities. The atomic add reads the memory value, increments it, and writes it back in one instruction. This has several advantages: for example, it needs fewer memory accesses, and the add is executed by the memory controller so that the computational units are free. More importantly, the atomic operations are thread safe so that several threads can be used to generate one matrix without any computing conflicts.

Function 1D is the slowest kernel in the old code, so we spent much effort to speed it up. The task of this function is to compute a vector containing the sum of each matrix row. The profiling showed that using one thread for each matrix row to sum up all elements in a loop has an inefficient memory access pattern. To speed up the memory accesses, we changed the thread mapping from one thread per matrix row to 16 threads per row, which also reduced the number of loop iterations per thread accordingly.

In Sect. 2.4 we already mentioned that the use of shared memory with costly barrier synchronizations and only a few computational instructions is counterproductive in comparison to avoiding the barriers altogether. Further, splitting up the computation into a common part without divergent branches and a border part with branches increases the occupancy of the GPU. These two changes are applied to several kernel functions in the new code. The ones with the most gain are the functions 0D, 1F and 5A.
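Two of the changes described above can be mimicked on the CPU. The sketch is illustrative only and makes its own assumptions: a `threading.Lock` stands in for the hardware atomic add (on the GPU the atomic instruction takes the place of the lock), and the 16 threads per row of Function 1D are modelled as 16 strided "lanes" executed one after another. All function names are hypothetical.

```python
import threading


def histogram_atomic(values, bins, n_threads=4):
    """Several threads increment shared bins; the lock makes read-add-write indivisible."""
    counts = [0] * bins
    lock = threading.Lock()  # stand-in for the hardware atomic add

    def worker(chunk):
        for v in chunk:
            with lock:  # read, increment and write back as one step
                counts[v] += 1

    threads = [threading.Thread(target=worker, args=(values[i::n_threads],))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


def row_sum_lanes(row, lanes=16):
    """Row sum with `lanes` cooperating workers instead of one worker per row."""
    partial = [0] * lanes
    for lane in range(lanes):
        # Strided access: in each iteration neighbouring lanes read neighbouring elements.
        for i in range(lane, len(row), lanes):
            partial[lane] += row[i]
    return sum(partial)


def iterations_per_lane(row_len, lanes=16):
    """Upper bound on the loop iterations one lane performs."""
    return -(-row_len // lanes)  # ceiling division


if __name__ == "__main__":
    print(histogram_atomic([0, 1, 1, 3, 3, 3] * 100, bins=4))
    print(row_sum_lanes(list(range(277))), iterations_per_lane(277))
```

For a packed row of 277 elements, one worker per row needs 277 loop iterations, whereas each of 16 lanes needs at most 18.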

3 Results

We compare six versions of the Haralick texture feature computation: the original version, a well-optimized software version, and two CUDA versions, each run on two different GPUs. Results are shown in Table 1. The execution times have been compared on an Intel Core 2 Quad machine (Q6600) with 2.4 GHz and 8 MBytes L2 cache, 4 GBytes DDR2 RAM with 1,066 MHz clock speed, an NVidia GeForce 8800 GTX with a 1,350 MHz shader clock, 768 MByte GDDR3 at 900 MHz and 384 bit wide in a PCIe v1.0 16x slot; and an NVidia GeForce GTX 280 with a 1,300 MHz shader clock, 1,024 MByte GDDR3 at 1,107 MHz and 512 bit wide in a PCIe v2.0 slot. The operating system was Linux Ubuntu x64 with kernel version 2.6.20 and GNU C compiler version 4.1.2. For software versions 1 and 2 we used one CPU core only. In the GPU version, we chose C = 8 and AD = 20, i.e. eight cells are calculated in parallel with 4 angles and 5 distances per cell. These parameters gave the best results. The total grid size is 160 blocks in CUDA for each feature kernel. CUDA version I is compiled for the architecture with computing capability v1.0 and CUDA version II is compiled with computing capability v1.3.

In Table 2 we show the profiling improvements of the previous and current CUDA versions. The first column lists the kernel functions. The second column contains the execution times of our previous GPU version and the third column those of our current version. The last column contains the corresponding speedup factors.

Table 1 Comparison of execution times and speedup factors of all introduced versions and different GPUs

Version                          Execution   Speedup       Speedup       Speedup
                                 time [s]    factor to 1.  factor to 2.  factor to 3.
1. Original software version     2,378       –             –             –
2. Optimized software version    214         11            –             –
3. GPU version I (8800 GTX)      11.1        214           19            –
4. GPU version I (GTX 280)       6.6         360           32            1.7
5. GPU version II (GTX 280)      4.65        511           46            2.4
6. GPU version II (GTX 480)      2.55        930           83            4.4

Table 2 Execution times and speedup factors of the previous and current GPU versions

ID, Function                  CUDA version 1   CUDA version 2   Factor
0A, lookup tables             276.3 ms         275.9 ms         1
0B, clear co-matrices         242.4 ms         86.8 ms          2.8
0C, compute co-matrices       466.4 ms         367.6 ms         1.3
0D, normalize co-matrices     224.4 ms         141.0 ms         1.6
1A, compute f1                221.8 ms         221.2 ms         1
1B, compute f5                416.8 ms         373.6 ms         1.1
1C, compute f9                202.5 ms         203.3 ms         1
1D, compute P                 929.2 ms         177.0 ms         5.2
1E, compute P|x−y|            310.1 ms         288.2 ms         1.1
1F, compute Px+y              602.6 ms         418.2 ms         1.4
2A, compute mean              4.5 ms           4.5 ms           1
2B, compute var               6.2 ms           6.2 ms           1
2C, compute H                 4.2 ms           4.2 ms           1
3A, compute f2                5.0 ms           5.0 ms           1
3B, compute f11               4.9 ms           4.9 ms           1
3C, compute M ac P|x−y|       6.7 ms           6.7 ms           1
3D, compute f10               5.2 ms           5.2 ms           1
4A, compute f6                7.2 ms           7.2 ms           1
4B, compute f8                10.9 ms          10.9 ms          1
4C, compute f7                13.1 ms          13.1 ms          1
5A, compute f3                418.6 ms         300.5 ms         1.4
5B, compute f4                269.3 ms         270.7 ms         1
5C, compute f12               309.9 ms         273.0 ms         1.1
5D, compute f13               225.1 ms         226.5 ms         1

GPU execution time            5,183 ms         3,690 ms         1.40
CPU execution time            1,416 ms         963 ms           1.47
Total execution time          6,600.0 ms       4,650.0 ms       1.42
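The speedup factors reported in Tables 1 and 2 are ratios of execution times and can be rechecked directly. The short sketch below uses the times as printed in the tables; the variable and function names are ours.

```python
# Execution times in seconds, versions 1 to 6 of Table 1.
TABLE1_TIMES_S = [2378.0, 214.0, 11.1, 6.6, 4.65, 2.55]


def speedups(times, reference):
    """Speedup of every later version relative to version `reference` (1-based)."""
    ref_time = times[reference - 1]
    return [ref_time / t for t in times[reference:]]


to_version_1 = speedups(TABLE1_TIMES_S, 1)
to_version_2 = speedups(TABLE1_TIMES_S, 2)
to_version_3 = speedups(TABLE1_TIMES_S, 3)

# Totals of Table 2 in milliseconds: (previous version, current version).
TABLE2_TOTALS_MS = {"GPU": (5183.0, 3690.0), "CPU": (1416.0, 963.0), "Total": (6600.0, 4650.0)}
table2_factors = {name: old / new for name, (old, new) in TABLE2_TOTALS_MS.items()}

if __name__ == "__main__":
    print([round(s, 1) for s in to_version_1])
    print([round(s, 1) for s in to_version_2])
    print([round(s, 1) for s in to_version_3])
    print({name: round(f, 2) for name, f in table2_factors.items()})
```

The recomputed ratios agree with the tabulated factors up to rounding; for instance, 2,378 s / 2.55 s is about 932, reported as 930.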


4 Discussion

The speedup by a factor of 930 for the GPU version compared to the original software version meets the demand of the biologists. Compared to the optimized software version the speedup is still around a factor of 83. Table 1 shows the execution times in seconds. The optimized software version is 11 times faster than the original software version. Our latest CUDA version is in turn 4.4 times faster than our first GPU version, which includes both the implementation optimization and a faster GPU. In a direct comparison on the same device, CUDA version II is around 1.4 times faster than CUDA version I, due to optimizations and new computing capabilities in the recent GTX 280 device.

A detailed performance comparison of our CUDA versions can be found in Table 2. The best improvements were obtained for Function 1D and Functions 0B–0D. The biggest execution time reduction was achieved by avoiding divergent paths and reducing synchronization barriers. Many complex operations can be performed very effectively on shared memory due to its fast access times of one to two cycles. In most functions we have only two operations to compute, so that the benefit of using fast shared memory is smaller than the benefit of removing the barrier synchronization. Moreover, the division into a common memory access part and a memory border access part leads, together with not using shared memory, to a steady computing and memory flow without any synchronization points.

The benefit of using the atomic function is shown by Function 0C with an improvement of 1.3. Originally, we planned to use multi-threading for the matrix generation, but a look at the profiling result told us that spending more effort to parallelize it would hardly change the total computational time. Therefore, we left the matrix generation single threaded for each co-matrix.

Given the complexity of the Haralick texture features and the co-occurrence matrix computations, and the application requirements, our most recent implementation yields excellent performance.

5 Conclusion

In this paper we have shown that the costly computation of the co-occurrence matrix and the Haralick texture features can be sped up by a factor of 930 in comparison to the original software version. This allows biologists to perform many more tests to acquire novel knowledge in cell biology in weeks or days instead of several months. Graphics Processing Units (GPUs) are inexpensive alternatives to reconfigurable hardware with an even higher computational capability and a much shorter implementation development time, and they are much faster (by orders of magnitude) than Central Processing Units (CPUs). By using many CUDA functions to reduce the complexity of the algorithm in combination with avoiding divergent branches and refraining from using shared memory we could improve our own results by an additional factor of 1.4, and by a factor of 4.4 including the latest GPU.


References
1. Intel(r) microprocessor export compliance metrics. URL: http://www.intel.com/support/processors/xeon/sb/CS-020863.htm, (5. Dec. 2008).
2. NVIDIA CUDA Programming Guide Version 2.0. URL: http://www.nvidia.com/object/cuda_develop.html, (5. Dec. 2008).
3. C. Conrad, H. Erfle, P. Warnat, N. Daigle, T. Lörch, J. Ellenberg, R. Pepperkok, and R. Eils. Automatic identification of subcellular phenotypes on human cell arrays. Genome Research, 14:1130–1136, 2004.
4. M. Gipp, G. Marcus, N. Harder, A. Suratanee, K. Rohr, R. König, and R. Männer. Haralick’s texture features computed by GPUs for biological applications. IAENG International Journal of Computer Science, 36:1:IJCS_36_1_09, 2009.
5. R. M. Haralick. Statistical and structural approaches to texture. Proceedings of the IEEE, 67(5):786–804, 1979.
6. R. M. Haralick and K. Shanmugam. Computer classification of reservoir sandstones. IEEE Transactions on Geoscience Electronics, 11(4):171–177, 1973.
7. N. Harder, B. Neumann, M. Held, U. Liebel, H. Erfle, J. Ellenberg, R. Eils, and K. Rohr. Automated recognition of mitotic patterns in fluorescence microscopy images of human cells. In B. Neumann, editor, 3rd IEEE International Symposium on Biomedical Imaging: Nano to Macro, pages 1016–1019, 2006.
8. H. Nguyen. GPU Gems 3. Addison-Wesley, Upper Saddle River, NJ, USA, 2007.
9. M. A. Tahir, A. Bouridane, F. Kurugollu, and A. Amira. Accelerating the computation of GLCM and Haralick texture features on reconfigurable hardware. In A. Bouridane, editor, International Conference on Image Processing (ICIP ’04), volume 5, pages 2857–2860, 2004.



Free-Surface Flows over an Obstacle: Problem Revisited
Panat Guayjarernpanishk and Jack Asavanant

Abstract Two-dimensional steady free-surface flows over an obstacle are considered. The fluid is assumed to be inviscid and incompressible, and the flow is irrotational. Both gravity and surface tension are included in the dynamic boundary condition. Far upstream, the flow is assumed to be uniform. A triangular obstruction is located at the channel bottom as a positive bump or a negative bump (dip). This problem has been investigated by many researchers, such as Forbes [5], Shen [8], and Dias and Vanden-Broeck [2], in search of new types of solutions. In this paper, the fully nonlinear problem is formulated by using a boundary integral equation technique. The resulting integrodifferential equations are solved iteratively by using Newton’s method. When surface tension is neglected, a new solution type of subcritical flow is proposed, the so-called drag-free solution. Furthermore, solutions of flows over a dip in the bottom are also presented. When surface tension is included, there is an additional parameter in the problem known as the Bond number B. In addition, the weakly nonlinear problem is investigated and compared with the fully nonlinear results. Finally, solution diagrams for all flow regimes are presented on the (F, h_ob)-plane, where F is the Froude number and h_ob is the dimensionless height of the obstacle.
Keywords Free-surface flow • Obstacle • Boundary integral equation • Surface tension

P. Guayjarernpanishk
Department of Mathematics, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
e-mail: [email protected]

J. Asavanant
Advanced Virtual and Intelligent Computing (AVIC) Research Center, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
e-mail: [email protected]

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_12, © Springer-Verlag Berlin Heidelberg 2012


1 Introduction

Flow over submerged obstacles is one of the classical problems in fluid mechanics. It has many physical applications, ranging from the flow of water over rocks to atmospheric and oceanic stratified flows encountering topographic obstacles, or even a moving pressure distribution over a free surface. Free-surface flows over an obstacle have been investigated for different bottom topographies by many researchers. Lamb [7] calculated solutions of the linear problem of free-surface flow over a submerged semi-elliptical obstacle. He obtained solutions with downstream waves for subcritical flow and symmetric solutions without waves for supercritical flow. Forbes and Schwartz [6] used the boundary integral method to find fully nonlinear solutions of subcritical and supercritical flows over a semi-circular obstacle. Their results confirmed and extended Lamb's solutions. In 1987, Vanden-Broeck [11] showed that there exist two solutions of supercritical flow: one corresponds to a perturbation of the uniform flow, and the other to a perturbation of a solitary wave. Both profiles were symmetric with respect to the obstacle. Forbes [4] computed numerical solutions of critical flow over a semi-circular obstacle. The flow was a uniform subcritical stream ahead of the obstacle, followed by a uniform supercritical stream behind the obstacle. This type of solution is generally referred to as a "hydraulic fall".

Supercritical and critical flows over a submerged triangular obstacle were investigated by Dias and Vanden-Broeck [3]. They used a series truncation technique to find numerical solutions. In the critical regime, the flow behavior near the apex of the triangle was similar to the flow over a wedge as the size of the triangle increased. In the case of supercritical flow, the flow approached a limiting configuration with a stagnation point on the free surface with a 120° angle.

Shen et al. [10], Shen and Shen [9] and Shen [8] presented weakly nonlinear solutions of flow over an obstacle. They confirmed the two branches of solutions for supercritical flow and showed that Forbes's numerical result [4] was a limit of the cnoidal wave solution. Zhang and Zhu [12] derived a new nonlinear integral equation model in terms of hodograph variables for free-surface flow over an arbitrary bottom obstruction with downstream waves. Their results did not suffer from the spurious upstream waves present in the results of Forbes and Schwartz [6]. In 2002, Dias and Vanden-Broeck [2] found a new solution called the "generalized hydraulic fall". Such solutions are characterized by downstream supercritical flow and a train of waves on the upstream side. This type of solution can be obtained by removing the radiation condition far upstream of the obstacle. Forbes [5] calculated numerical solutions of gravity-capillary flows over a semicircular obstruction. The fluid was subject to the combined effects of gravity and surface tension. Three different branches of solutions were presented, and the linear and fully nonlinear problems were compared.

In this paper we consider both the fully nonlinear problem and the weakly nonlinear problem of free-surface flows over an obstacle. Both gravity and surface tension are included in the dynamic boundary condition. The fully nonlinear problem is formulated and solved, as in Binder, Vanden-Broeck, and Dias [1], by using the boundary integral equation technique in Sect. 2. Results and discussion are presented in Sect. 3.

2 Mathematical Formulation

2.1 Fully Nonlinear Problem

We consider a steady two-dimensional flow over an obstacle. The fluid is assumed to be inviscid¹ and incompressible², and the flow is irrotational³. The flow domain is bounded below by a horizontal bottom, except where a bump or dip in the form of an isosceles triangular obstacle is present, and above by a free surface. We introduce Cartesian coordinates $(x, y)$ with the $x$-axis along the flat part of the bottom and the $y$-axis directed vertically upwards through the apex of the triangular obstacle. Gravity $g$ acts in the negative $y$-direction. The apex and the other two vertices of the isosceles triangular obstacle are denoted by $x_b$, $x_b^l$ and $x_b^r$, respectively. Here the superscripts "l" and "r" refer to the leftmost and the rightmost vertices of the obstacle. Far upstream, as $x \to -\infty$, the flow approaches a uniform stream with constant velocity $U$ and constant depth $H$. All variables are made dimensionless with respect to the velocity and length scales $U$ and $H$, respectively. The dimensionless parameters in the problem are the Froude number⁴ $F = U/\sqrt{gH}$, the Bond number⁵ $B = T/(\rho g H^2)$, and the dimensionless height of the triangular obstacle $h_{ob}$. Here $T$ is the surface tension and $\rho$ is the fluid density. From irrotationality and incompressibility, there exist a potential function $\phi(x, y)$ and a stream function $\psi(x, y)$. We define the complex potential $f = \phi + i\psi$, which is an analytic function of $z = x + iy$. Without loss of generality, we choose $\psi = 0$ on the free surface and $\phi = 0$ at the apex of the obstacle. It follows that $\psi = -1$ on the bottom. The flow domain in the complex $f$-plane is the strip $-1 < \psi < 0$. The mathematical problem can be formulated in terms of the potential function $\phi$ satisfying Laplace's equation and the corresponding boundary conditions

¹ Inviscid implies that viscosity is negligible and that the fluid can therefore support no shearing stress.
² Fluid density is unchanged under pressure variation or, mathematically, the divergence of velocity is zero.
³ This can be thought of as fluid flow that is free of vortices or, mathematically, the curl of velocity vanishes everywhere in the flow field.
⁴ The Froude number is the dimensionless ratio of characteristic velocity to wave celerity.
⁵ The Bond number is the dimensionless ratio of body forces (often gravitational) to surface tension forces.


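$$\phi_{xx} + \phi_{yy} = 0 \quad \text{in the fluid domain}, \tag{1}$$

$$\phi_x^2 + \phi_y^2 + \frac{2}{F^2}\,y - \frac{2}{F^2}\,B\,\kappa = 1 + \frac{2}{F^2} \quad \text{on } y = \eta(x), \tag{2}$$

$$\phi_y = \phi_x\,\eta_x \quad \text{on } y = \eta(x), \tag{3}$$

$$\phi_y = \phi_x\,h_x \quad \text{on } y = h(x), \tag{4}$$

$$\phi_x \to 1, \quad \eta(x) \to 1 \quad \text{as } x \to -\infty. \tag{5}$$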
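To make the scaling behind (1)–(5) concrete, the following sketch computes the Froude and Bond numbers from dimensional inputs and evaluates the residual of the dynamic boundary condition (2) at a single point of the free surface, using the curvature $\kappa = \eta_{xx}/(1+\eta_x^2)^{3/2}$. The function names and the sample values (roughly water at room temperature in a 10 cm deep channel) are assumptions made for this illustration and are not part of the original formulation.

```python
import math

def froude_number(U, H, g=9.81):
    """F = U / sqrt(g H), with U and H the upstream speed and depth."""
    return U / math.sqrt(g * H)

def bond_number(T, rho, H, g=9.81):
    """B = T / (rho g H^2), following the definition used in the text."""
    return T / (rho * g * H ** 2)

def curvature(eta_x, eta_xx):
    """Free-surface curvature kappa = eta_xx / (1 + eta_x^2)^(3/2)."""
    return eta_xx / (1.0 + eta_x ** 2) ** 1.5

def bernoulli_residual(phi_x, phi_y, eta, eta_x, eta_xx, F, B):
    """Left-hand side minus right-hand side of the dynamic condition (2)."""
    kappa = curvature(eta_x, eta_xx)
    lhs = phi_x ** 2 + phi_y ** 2 + 2.0 * eta / F ** 2 - 2.0 * B * kappa / F ** 2
    rhs = 1.0 + 2.0 / F ** 2
    return lhs - rhs

# Hypothetical sample values (SI units), chosen only for illustration.
F = froude_number(U=0.5, H=0.1)
B = bond_number(T=0.073, rho=1000.0, H=0.1)

# The uniform upstream state (5): unit speed, unit depth, flat surface.
uniform = bernoulli_residual(1.0, 0.0, 1.0, 0.0, 0.0, F, B)
```

The uniform stream of condition (5) should give a vanishing residual, which is a quick sanity check on the signs in (2).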

Here $y = \eta(x)$ is the unknown free surface, $y = h(x)$ is the equation of the bottom, and $\kappa = \eta_{xx}/(1 + \eta_x^2)^{3/2}$ is the curvature of the free surface. Equation (2) is the dynamic boundary condition known as Bernoulli's equation, and (3) and (4) are the kinematic boundary conditions on the free surface and on the bottom, respectively.

We introduce the conformal mapping $\zeta = \alpha + i\beta = e^{-\pi f}$, which maps the region occupied by the fluid onto the upper half of the $\zeta$-plane. The values of $\alpha$ at the apex and at the leftmost and rightmost vertices of the triangular obstacle are denoted by $\alpha_b$, $\alpha_b^l$ and $\alpha_b^r$. Introducing the hodograph variables $\hat\tau$ and $\hat\theta$, we can write the complex velocity $w$ as

$$w \equiv \phi_x - i\phi_y = e^{\hat\tau - i\hat\theta}.$$

Here $e^{\hat\tau}$ is the magnitude of the velocity and $\hat\theta$ is the angle giving the direction of the flow ($-\pi \le \hat\theta < \pi$). We now apply the Cauchy integral formula to the function $\hat\tau - i\hat\theta$ in the $\zeta$-plane with a contour consisting of the real axis ($\alpha$-axis) and a semicircle of arbitrarily large radius in the upper half plane. Since the flow is uniform far upstream, $\hat\tau - i\hat\theta \to 0$ as $|\zeta| \to \infty$. After taking the real part, we have

$$\tilde\tau(\alpha_0) = -\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\tilde\theta(\alpha)}{\alpha - \alpha_0}\,d\alpha, \tag{6}$$

where $\tilde\tau(\alpha)$ and $\tilde\theta(\alpha)$ are the values of $\hat\tau$ and $\hat\theta$ on the $\alpha$-axis. It should be noted that the integral in (6) is of Cauchy principal value type. The kinematic boundary condition on the bottom ($-\infty < \alpha < 0$) of the channel implies [...]

3.1.1 Bump ($h_{ob} > 0$)

In the case of subcritical flow, there exist two types of solutions. The first type is characterized by a train of nonlinear waves behind the obstacle (Forbes and Schwartz [6], see Fig. 1a). The second type can be called the "drag-free" solution, as

shown in Fig. 1b. When the Froude number $F$ decreases to its critical value $F_c$ (solid line with squares in Fig. 2), the amplitude of the nonlinear waves of the first type decreases and ultimately vanishes. The drag-free solution exists in the region to the left of the solid line with squares.

For supercritical flows, solutions exist with a symmetric free-surface profile in the form of an elevation wave for given values of $F$ and $h_{ob}$. Two types of supercritical solutions can be found: one is a perturbation of a uniform flow (Forbes and Schwartz [6]), the so-called type 1 supercritical solution, whereas the other is a perturbation of a solitary wave (Vanden-Broeck [11]), the so-called type 2 supercritical solution.

Unlike the above flow regimes, critical flow over an obstacle depends on only one parameter, which can be chosen to be the obstacle height $h_{ob}$. This type of solution is traditionally called the "hydraulic fall" (see Fig. 1c). Regions of existence of the different types of solutions are illustrated in Fig. 2. In summary, for pure gravity flow over a bump, there exist two types of supercritical and subcritical flows, and a one-to-one correspondence between $F$ and $h_{ob}$ for hydraulic fall solutions.

Fig. 1 Typical free-surface profiles of flows over a bump. (a) Nonsymmetric subcritical solution for $F = 0.70$ and $h_{ob} = 0.10$. (b) Symmetric subcritical solution for $F = 0.20$ and $h_{ob} = 0.10$. (c) Critical solution for $F = 0.62$ and $h_{ob} = 0.30$

Fig. 2 Solution diagram in the $(F, h_{ob})$-plane of free-surface flows over a bump (horizontal axis: $F$ from 0 to 1.6; vertical axis: $h_{ob}$ from 0 to 0.8; labelled regions and curves: "Subcritical flow", "Symmetric", "Nonsymmetric", "Critical flow", "Supercritical flow", "No solution")
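A hydraulic fall connects two uniform streams, so mass conservation and the Bernoulli constant alone tie the Froude number to the far-downstream depth. The sketch below illustrates this under that assumption only. The symbol $d$ for the dimensionless downstream depth and all function names are introduced here and do not appear in the original. With the upstream depth and speed scaled to 1, the downstream speed is $1/d$, and Bernoulli's equation gives $\tfrac12 + 1/F^2 = 1/(2d^2) + d/F^2$. Discarding the trivial root $d = 1$ leaves $F^2 = 2d^2/(1 + d)$, and expanding about $d = 1$ gives $d - 1 \approx \tfrac23 (F^2 - 1)$.

```python
import math

def froude_from_downstream_depth(d):
    """F from F^2 = 2 d^2 / (1 + d), for a uniform downstream stream of depth d."""
    return math.sqrt(2.0 * d * d / (1.0 + d))

def downstream_depth(F):
    """Positive root of 2 d^2 - F^2 d - F^2 = 0 (the non-trivial conjugate depth)."""
    F2 = F * F
    return (F2 + math.sqrt(F2 * F2 + 8.0 * F2)) / 4.0

def downstream_depth_weakly_nonlinear(F):
    """Leading-order expansion about F = 1: d = 1 + (2/3)(F^2 - 1)."""
    return 1.0 + (2.0 / 3.0) * (F * F - 1.0)

# Compare the exact and the expanded depth for a few subcritical upstream states.
comparison = [
    (F, downstream_depth(F), downstream_depth_weakly_nonlinear(F))
    for F in (0.62, 0.82, 0.95)
]
```

For upstream Froude numbers below 1 the conjugate depth is below 1, consistent with a fall in the free-surface level, and the two expressions agree increasingly well as $F \to 1$.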

3.1.2 Dip ($h_{ob} < 0$)

For subcritical flow, there exist two types of solutions: one is characterized by a train of nonlinear waves and the other by an elevation wave. The first type of solution is depicted in Fig. 3a for $F = 0.60$ and $h_{ob} = -0.20$. The downstream behavior of this flow is similar to the case of a bump, except in the dip region, where the free surface is uplifted. The other subcritical solution takes the form of an elevation profile that is symmetric with respect to the obstacle. A typical profile is

shown in Fig. 3b. For supercritical flow, it is found that a unique solution exists in the form of a symmetric depression wave (see Fig. 3c). Weakly nonlinear solutions of subcritical and supercritical flows are found to be in good qualitative agreement with the fully nonlinear results. In the case of critical flow, the upstream free surface is elevated in the region of the dip, with a hydraulic fall downstream of the obstacle. Typical profiles of fully nonlinear and weakly nonlinear critical solutions are shown in Fig. 4a and c. A fully nonlinear phase trajectory is illustrated in Fig. 4b. Figure 5 illustrates the regions of existence of solutions for the subcritical, supercritical, and critical flow regimes in the presence of a dip.

Fig. 3 Typical free-surface profiles of flows over a dip. (a) Subcritical solution of the first type (nonsymmetric) for $F = 0.60$ and $h_{ob} = -0.20$. (b) Subcritical solution of the second type (symmetric) for $F = 0.20$ and $h_{ob} = -0.20$. (c) Supercritical solution for $F = 1.50$ and $h_{ob} = -0.50$

Fig. 4 Critical flows over a dip. (a) Fully nonlinear solution for $F = 0.82$ and $h_{ob} = -0.45$. (b) Plot of $y - 1$ versus $dy/dx = \tan\theta$ of the fully nonlinear phase trajectories for (a). (c) Weakly nonlinear profile for $F = 0.79$ and $h_{ob} = -0.25$

3.2 Free-Surface Flows over an Obstacle with Surface Tension

3.2.1 Bump ($h_{ob} > 0$)

For subcritical flow, a nonsymmetric solution with downstream waves and a nonphysical wave train of small amplitude on the upstream side was proposed by Forbes [5]. In our computation, an upstream radiation condition is imposed to

remove this unwanted numerical phenomenon. For the nonsymmetric solution of subcritical flow in Fig. 6a, the amplitude of the train of nonlinear waves increases and its wavelength decreases as the Bond number increases. The shape of a nonlinear free-surface profile for a symmetric solution of subcritical flow, when $F = 0.50$ and $h_{ob} = 0.15$, is shown in Fig. 6b. It is found that solutions for small Bond numbers exhibit a sharper trough at $x = 0$ than those for large Bond numbers.

Fig. 5 Solution diagram in the $(F, h_{ob})$-plane of free-surface flows over a dip (horizontal axis: $F$ from 0 to 1.4; vertical axis: $h_{ob}$ from $-0.6$ to 0; labelled regions and curves: "Subcritical flow", "Symmetric", "Nonsymmetric", "Critical flow", "Supercritical flow", "No solution"). The Froude number on the dashed line with diamonds and on the dash-dotted line with squares is 0.24 and 1.04, respectively

Fig. 6 Gravity-capillary subcritical flows over a bump. (a) Nonsymmetric solutions for $h_{ob} = 0.10$, $F = 0.70$ and $B = 0$, $0.005$, and $0.01$. (b) Symmetric solutions for $h_{ob} = 0.15$, $F = 0.50$ and $B = 0.10$, $0.20$, $0.30$, and $0.40$

For supercritical flows, as shown in Fig. 7, the maximum free-surface elevation of the type 1 solution is slightly greater than that obtained in the case $B = 0$. For solutions of

type 2, the effect of surface tension is found to be nonuniform in the numerical results. That is, as the Bond number increases, the maximum level of the elevation wave increases when the obstacle height is small (see Fig. 7b) but decreases when the obstacle height is large (see Fig. 7c). For a given Bond number, the free-surface profile on the upstream side of critical flows changes from a rigid-lid profile to a profile with an elevated free surface as $h_{ob}$ decreases to the critical height $h_{ob}^*$ (see Fig. 8a). For $h_{ob} < h_{ob}^*$, a critical flow solution does not exist. Fully nonlinear phase trajectories of these results are shown in Fig. 8b. In each case, the trajectory starts at the fixed point $y - 1 = 0$ with a downward jump to the solitary wave orbit and then returns to another fixed point, $y - 1 = \tfrac{2}{3}(F^2 - 1)$. A behavior similar to that of the previous case, in which the Bond number was fixed and the obstacle height varied, is also found when the obstacle height is fixed and the Bond number varies. In particular, when the Bond number decreases to its critical value $B^*$, the amplitude of the upstream wave increases and the shape of the free surface above the bump ultimately approaches a dimple-like profile whose amplitude is of $O(h_{ob}^2)$, as shown in Fig. 8c. Weakly nonlinear results and weakly nonlinear phase portraits of critical flows are shown in Fig. 8e–f for $h_{ob} = 0.10$ and $B = 0.10$, $0.20$, and $0.30$.

Fig. 7 Typical free-surface profiles of gravity-capillary supercritical flows over a bump. (a) Type 1 solutions for $F = 1.20$, $h_{ob} = 0.25$ and $B = 0$, $0.05$, $0.10$. (b–c) Type 2 solutions for $F = 1.20$, $B = 0$, $0.02$, $0.04$ and $h_{ob} = 0.10$ and $0.25$, respectively

3.2.2 Dip ($h_{ob} < 0$)

For given values of $h_{ob}$ and $F$, the amplitude and wavelength of the downstream waves of the type 1 solution of subcritical flow decrease as the Bond number increases. Typical free-surface profiles are shown in Fig. 9a. This solution can be found only for small values of the Bond number. When the Bond number increases ($B = 0.05 \to 0.19$), the maximum level of the elevation wave over a dip for a type 2 symmetric solution of subcritical flow decreases (see Fig. 9b). Typical profiles of supercritical flow over a dip for various values of the Bond number are shown in Fig. 9c. The minimum level of the flow over a dip is found to be a decreasing function of the Bond number. In the case of critical flow, as the Bond number increases, the

maximum elevation over a dip increases, whereas the far-downstream level decreases, as shown in Fig. 9d. It should be noted that, for critical flow, the Froude number $F$ is treated as part of the solution, which is an inverse proportion of the far-downstream free-surface elevation.

Fig. 8 Gravity-capillary critical flows over a bump. (a) Fully nonlinear solutions for $B = 0.10$ and $h_{ob} = 0.165$, $0.20$, $0.25$, $0.30$. The critical height $h_{ob}^*$ is 0.165. (b) Values of $y - 1$ versus $dy/dx = \tan\theta$ of the fully nonlinear phase trajectories for (a). (c) Fully nonlinear solutions for $h_{ob} = 0.20$ and $B = 0.10$, $0.20$, $0.30$. The critical Bond number $B^*$ is $0.10$. (d) Fully nonlinear phase trajectories for (c). (e) Weakly nonlinear solutions for $h_{ob} = 0.10$ and $B = 0.10$, $0.20$, and $0.30$. (f) Weakly nonlinear phase portrait for (e) showing $A$ versus $A_x$

Fig. 9 Typical fully nonlinear free-surface profiles of gravity-capillary flows over a dip. (a) Subcritical flows of type 1 for $h_{ob} = -0.40$, $F = 0.60$ and $B = 0.0$, $0.002$, and $0.004$. (b) Subcritical flows of type 2 for $h_{ob} = -0.30$, $F = 0.40$ and $B = 0.05$, $0.10$, $0.15$, and $0.19$. (c) Supercritical flows for $h_{ob} = -0.40$, $F = 1.50$ and $B = 0$, $0.05$, $0.10$, and $0.15$. (d) Critical flows for $h_{ob} = -0.20$ and $B = 0$, $0.02$, $0.04$, and $0.06$

4 Conclusion

Subcritical, supercritical and critical flows of gravity-capillary waves over a triangle-shaped bump and dip are considered. Fully nonlinear solutions are calculated by using the boundary integral equation technique. When the flow is subcritical or supercritical, there exists a three-parameter family of solutions ($F$, $B$ and $h_{ob}$). For critical flow, it is found that there is a two-parameter family of solutions ($B$ and $h_{ob}$). In this paper, new solutions of subcritical and critical flows over a bump and of critical flows over a dip are found for both the fully nonlinear and the weakly nonlinear problems.

Acknowledgements This work was partially supported by the Graduate and Faculty of Science, Chulalongkorn University, the National Research Council of Thailand, the Franco-Thai Cooperation Program in Higher Education, and Advanced Virtual and Intelligent Computing (AVIC) Research Center.


References
1. Binder, B.J., Vanden-Broeck, J.-M., Dias, F.: Forced solitary waves and fronts past submerged obstacles. Chaos 15, 037106-1–13 (2005)
2. Dias, F., Vanden-Broeck, J.-M.: Generalised critical free-surface flows. J. Eng. Math. 42, 291–301 (2002)
3. Dias, F., Vanden-Broeck, J.-M.: Open channel flows with submerged obstructions. J. Fluid Mech. 206, 155–170 (1989)
4. Forbes, L.K.: Critical free-surface flow over a semi-circular obstruction. J. Eng. Math. 22, 3–13 (1988)
5. Forbes, L.K.: Free-surface flow over a semicircular obstruction, including the influence of gravity and surface tension. J. Fluid Mech. 127, 283–297 (1983)
6. Forbes, L.K., Schwartz, L.W.: Free-surface flow over a semi-circular obstruction in a channel. J. Fluid Mech. 114, 299–314 (1982)
7. Lamb, H.: Hydrodynamics. Cambridge University Press, Cambridge (1945)
8. Shen, S.S.P.: On the accuracy of the stationary forced Korteweg-de Vries equation as a model equation for flows over a bump. Quart. Appl. Math. 53, 701–719 (1995)
9. Shen, S.S.P., Shen, M.C.: Notes on the limit of subcritical free-surface flow over an obstruction. Acta Mech. 82, 225–230 (1990)
10. Shen, S.S.P., Shen, M.C., Sun, S.M.: A model equation for steady surface waves over a bump. J. Eng. Math. 23, 315–323 (1989)
11. Vanden-Broeck, J.-M.: Free-surface flow over an obstruction in a channel. Phys. Fluids 30, 2315–2317 (1987)
12. Zhang, Y., Zhu, S.: Open channel flow past a bottom obstruction. J. Eng. Math. 30, 487–499 (1996)



The Relation Between the Gene Network and the Physical Structure of Chromosomes Dieter W. Heermann, Manfred Bohn, and Philipp M. Diesinger

1 Introduction

Human cells contain 46 chromosomes with a total length of about 5 cm of beads-on-a-string type nucleosomal fibre, called chromatin. Packaging this into a nucleus of typically 5–20 µm diameter requires extensive compactification. This packaging cannot be random, as considerable evidence has been gathered that chromatin folding is closely related to local genome function. However, the different levels of compactification are poorly understood and not easily accessible by experiments.

The consensus is that chromosomes are folded and compactified on several length scales. The lowest level of compactification is the nucleosome [27], consisting of a cylindrical histone octamer and a stretch of DNA which is wrapped around the histone complex approximately 1.65 times. The histone octamer consists of four pairs of core histones (H2A, H2B, H3 and H4) and is known up to atomistic resolution [8, 15]. The nucleosomes are connected by naked DNA strands, and together with these linkers they form the so-called 30 nm fibre. The histone H1 (and the variant histone H5, with similar structure and functions) is involved in the packing of the beads-on-a-string structure into the 30 nm chromatin structure (the second level of compaction). To do so, it sits in front of the nucleosome, keeping in place the DNA which is wrapped around the histone octamer, and thus stabilizes the chromatin fibre.

The folding motifs of the chromatin fibre on the scale of the entire chromosome are totally unclear. Imaging techniques do not allow one to follow the folding path of the fibre in the interphase nucleus. Therefore, indirect approaches have been used to obtain information on the folding [12, 20]. There is an ever-growing body of evidence that chromatin loops play a dominant role in transcriptional regulation [16].

D.W. Heermann • M. Bohn • P.M. Diesinger
Institute for Theoretical Physics, University of Heidelberg, Philosophenweg 19, 69120 Heidelberg, Germany
e-mail: [email protected]

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_13, © Springer-Verlag Berlin Heidelberg 2012


It was suggested that genes from different positions on a chromosome assemble into transcription factories showing a high degree of gene activity. These studies indicate that there is a tight and probably causal relationship between the folding of the chromatin fibre and its transcriptional activity.

Models and simulations are required to gain understanding and to compare with the mostly indirect evidence of the folding motifs. There are challenges here too, on the modelling side as well as on the numerical methods side. It is practically impossible to do simulations on the level of individual atoms or DNA base pairs. Coarse-grained descriptions are necessary due to the enormous number of constituents of the system. This holds true for all the scales involved. On the scale of the 30 nm fibre it is impossible to use atomistic simulations and methods, and this is clearly also the case on the scale of an entire chromosome. Furthermore, new methods are necessary to take into account the dynamic nature of the problem. The individual association of molecules to chromosomes that alters the folding motifs can certainly not be taken into account on an individual basis. Here methods that use statistics are required. Taken together, multi-scale methods are clearly needed to couple the information on the different levels. These have yet to be invented, however.

2 Modelling on the 30 nm Scale

Chromatin can be described by the two-angle model, which was introduced by Woodcock et al. [28] to model the geometry of the 30 nm chromatin fibre. In the framework of the extended two-angle model ("E2A model", cf. Fig. 1), the nucleosomes are characterized by their centres $N_i \in \mathbb{R}^3$ and their orientations $\hat p_i \in \mathbb{R}^3$. The linkers between the centres of two nucleosomes are denoted by $b_i := N_i - N_{i-1}$ with $i = 1, \dots, N$ for a fibre of $N$ nucleosomes. The length $\|b_i\|$ of the linkers is a further input parameter of the model (in contrast to the direction $b_i \in \mathbb{R}^3$ of the linkers). Furthermore, the entry-exit angle $\alpha_i \in [0, \pi]$ between two consecutive linkers is defined by $\alpha_i := \angle(-b_i, b_{i+1})$ with $i = 1, \dots, N-1$, and the rotational angle $\beta_i \in [0, \pi]$ between two consecutive orientations is given by $\beta_i := \angle(\hat p_{i-1}, \hat p_i)$ with $i = 1, \dots, N$. Moreover, $h_i$ represents the distance along the orientational axis $\hat p_{i-1}$ from $N_{i-1}$ to $N_i$ due to the spatial offset between the incoming and outgoing DNA strands. $h_i$ can be expressed by the vertical distances $d_i$ which the DNA covers by wrapping itself around the histone complexes: $h_i = \tfrac{1}{2}(d_{i-1} + d_i)$ with $i = 1, \dots, N$. The fibre can be constructed by an iterative process. A further part of the model is the presence of an H1 histone, which is assumed to be present with probability $p$.
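The definitions above can be turned into a small routine that recovers the local model parameters from a given chain of nucleosome centres. This is only a sketch: the function names are hypothetical, the entry-exit angle is taken as the angle between $-b_i$ and $b_{i+1}$ (a sign convention assumed here), and the example coordinates are invented for illustration.

```python
import math

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def angle(u, v):
    """Angle in [0, pi] between two non-zero vectors."""
    c = dot(u, v) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding

def linkers(centers):
    """Linker vectors b_i = N_i - N_{i-1} for consecutive centres."""
    return [sub(centers[i], centers[i - 1]) for i in range(1, len(centers))]

def entry_exit_angles(centers):
    """alpha_i = angle(-b_i, b_{i+1}); assumed sign convention (see lead-in)."""
    b = linkers(centers)
    return [angle(tuple(-x for x in b[i]), b[i + 1]) for i in range(len(b) - 1)]

def vertical_offsets(d):
    """h_i = (d_{i-1} + d_i) / 2 from the per-nucleosome vertical distances d_i."""
    return [0.5 * (d[i - 1] + d[i]) for i in range(1, len(d))]

# Invented example: a planar zigzag of four centres with unit-length linkers.
example_centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
example_angles = entry_exit_angles(example_centers)
```

With this convention a perfectly straight chain has $\alpha_i = \pi$, and smaller angles correspond to a more sharply folded zigzag.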


Fig. 1 The basic parameters of the E2A model: the entry-exit angle $\alpha_i$, the rotational angle $\beta_i$, the linker length $b_i$ and the vertical distance $d_i$ between the incoming and outgoing linker. A large entry-exit angle was chosen here to make the visualization clear

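The first nucleosome centre and its orientation are arbitrary. We chose

$$N_0 = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \qquad \hat p_0 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$

The following vectors fulfil the conditions of the two-angle model for the second nucleosome location and its orientation:

$$N_1 = N_0 + \sqrt{\|b_1\|^2 - h_1^2}\,\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + h_1\,\hat p_0 = \begin{pmatrix} \sqrt{\|b_1\|^2 - h_1^2} \\ 0 \\ h_1 \end{pmatrix}$$

and

$$\hat p_1 = R_{\hat a}^{\beta_1}\,\hat p_0 \quad \text{with} \quad \hat a = (1, 0, 0)^t.$$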
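A minimal sketch of this starting step is given below. The rotation $R_{\hat a}^{\beta}$ is evaluated with Rodrigues' rotation formula, which is an implementation choice made here; the text does not say how the rotation is computed. The function names and the numerical values in the example (a 20 nm linker, a 3 nm vertical offset and a 30° rotational angle) are hypothetical.

```python
import math

def rotate(v, axis, angle):
    """Rotate v about the given axis by angle (Rodrigues' rotation formula)."""
    n = math.sqrt(sum(c * c for c in axis))
    k = tuple(c / n for c in axis)  # unit rotation axis
    c, s = math.cos(angle), math.sin(angle)
    k_cross_v = (
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    )
    k_dot_v = sum(a * b for a, b in zip(k, v))
    return tuple(
        v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1.0 - c) for i in range(3)
    )

def first_two_nucleosomes(b1, h1, beta1):
    """Return (N0, p0, N1, p1) for linker length b1, offset h1 and angle beta1."""
    if h1 > b1:
        raise ValueError("the vertical offset h1 cannot exceed the linker length b1")
    N0 = (0.0, 0.0, 0.0)
    p0 = (0.0, 0.0, 1.0)
    N1 = (math.sqrt(b1 * b1 - h1 * h1), 0.0, h1)
    p1 = rotate(p0, (1.0, 0.0, 0.0), beta1)
    return N0, p0, N1, p1

# Hypothetical example values, lengths in nm.
N0, p0, N1, p1 = first_two_nucleosomes(20.0, 3.0, math.radians(30.0))
```

By construction the first linker has length $\|N_1 - N_0\| = \|b_1\|$ and the angle between $\hat p_0$ and $\hat p_1$ equals $\beta_1$.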

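Now we can calculate $N_{i+1}$ and $\hat p_{i+1}$ as functions of $N_i$, $N_{i-1}$, $\hat p_i$ and $\hat p_{i-1}$. With

$$v_i := -b_i + \langle \hat p_i, b_i \rangle\,\hat p_i$$

and

$$v'_i := R_{\hat p_i}^{\alpha_0}\,\sqrt{b_{i+1}^2 - h_{i+1}^2}\;\frac{v_i}{\|v_i\|} + h_{i+1}\,\hat p_i \tag{1}$$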


one gets the location of nucleosome $i + 1$ by

$$N_{i+1} = N_i + v'_i.$$

Here $\alpha_0$ is the angle between the projections of $b_{i+1}$ and $-b_i$ onto an arbitrary plane orthogonal to $\hat p_i$. We need to calculate the dependence of this projected entry-exit angle $\alpha_0$ on the actual entry-exit angle $\alpha$. Using the law of cosines one gets

$$l^2 = b_i^2 + b_{i+1}^2 - 2\,b_i\,b_{i+1}\cos(\alpha). \tag{2}$$

Now we use an affine transformation $T : (x, y, z) \to (x', y', z')$ to a new coordinate system in order to get a second relation for $l$. We shift the origin to $N_i$ and rotate the old coordinate system so that $\hat p_i$ corresponds to the new $z$-axis. Furthermore, the new $x$-axis has to coincide with the projection of $b_i$ onto any plane orthogonal to $\hat p_i$. Obviously, $l^2 = \|b_i + v'_i\|^2 = \|b'_i + v''_i\|^2$ with

$$b_i \xrightarrow{\;T\;} b'_i = \begin{pmatrix} \sqrt{b_i^2 - \langle \hat p_i, b_i \rangle^2} \\ 0 \\ \langle \hat p_i, b_i \rangle \end{pmatrix}$$

and

$$v'_i \xrightarrow{\;T\;} v''_i = \begin{pmatrix} \cos(\alpha_0)\,\sqrt{b_{i+1}^2 - h_{i+1}^2} \\ \sqrt{b_{i+1}^2 - h_{i+1}^2 - \cos^2(\alpha_0)\,\bigl(b_{i+1}^2 - h_{i+1}^2\bigr)} \\ h_{i+1} \end{pmatrix}.$$

This leads to

$$l^2 = b_{i+1}^2 + b_i^2 - 2\,h_{i+1}\,\langle \hat p_i, b_i \rangle - 2\cos(\alpha_0)\,\sqrt{b_i^2 - \langle \hat p_i, b_i \rangle^2}\;\sqrt{b_{i+1}^2 - h_{i+1}^2}. \tag{3}$$

By comparing (2) and (3) one eventually gets

$$\cos(\alpha_0) = \frac{b_i\,b_{i+1}\cos(\alpha) - h_{i+1}\,\langle \hat p_i, b_i \rangle}{\sqrt{b_{i+1}^2 - h_{i+1}^2}\;\sqrt{b_i^2 - \langle \hat p_i, b_i \rangle^2}} \tag{4}$$

with the boundary condition

$$\alpha_0 > \alpha_{min} = \operatorname{acos}\!\left( \frac{\bigl(h_{i+1} + |\langle \hat p_i, b_i \rangle|\bigr)^2 - b_{i+1}^2 - b_i^2}{2\,b_i\,b_{i+1}} \right)$$


due to non-vanishing $d_i$ and $d_{i+1}$. The calculation of $N_{i+1}$ is complete, since we now know the dependence of $\alpha_0$ on $\alpha$ and can therefore use (1) to determine $N_{i+1}$. One still has to calculate the orientation $\hat p_{i+1}$ of nucleosome $N_{i+1}$. Due to the fixation of the incoming and outgoing DNA strands by the H1 histones, this orientation can be calculated by a rotation around the following normalized axis $\hat a$:

$$\hat a := \frac{a}{\|a\|}, \qquad a := b_{i+1} - \langle \hat p_i, b_{i+1} \rangle\,\hat p_i.$$

$\hat p_{i+1}$ then follows by a rotation of $\hat p_i$ around this axis:

$$\hat p_{i+1} = R_{\hat a}^{\beta_{i+1}}\,\hat p_i.$$

These equations can be solved analytically in some cases and thus supply the basis of our Monte Carlo model to describe the structure of chromatin [5]. It has been shown that the excluded volume of the histone complex plays a very important role for the stiffness of the chromatin fibre [18] and for the topological constraints during condensation/decondensation processes [3]. In [22] a rough approximation of the forbidden surface in the chromatin phase diagram was given. In a previous work [5] we answered questions concerning the fine structure of the excluded volume borderline, which separates the allowed and forbidden states in the phase diagram, under the basic assumption of spherical nucleosomes and no vertical shift between the incoming and outgoing strands. Furthermore, we were able to show analytically that the very irregular shape of the excluded volume borderline comes from an underlying prime factorization problem. In a subsequent work [6] we presented a Ramachandran-like diagram for chromatin fibres with cylindrical nucleosomes for the extended model, and furthermore discussed the influence of a vertical shift between the linkers due to H1 histones and the volume exclusion of the DNA. This diagram is shown in Fig. 2. The coloured lines represent the phase transition between allowed and forbidden states. All states below the corresponding line are forbidden, those above it are allowed. The states near the excluded volume borderline are the most interesting ones in the phase diagram, since they are the most compact. The gaps in the borderline might be used by the fibre to become (at least locally) very dense.

The nucleosome-nucleosome interaction as well as the nucleosome-DNA interaction are highly complex and still an area of current research. We avoided the need for these potentials by using experimental data [29] for the distribution of the nucleosome repeat length (NRL) and by taking advantage of the fact that the local chromatin parameters are not independent. This makes it possible to partially invert the convolution of the probability distributions (which is given by the experimental data) and thus to obtain information on the individual distributions of our model parameters.

Fig. 2 Cut-out of the chromatin phase diagram for different $d$ (plot title: "Phase diagram of the chromatin fiber"; horizontal axis: $\alpha$ [deg] from 10 to 100; vertical axis: $\beta$ [deg] from 0 to 25; one curve per value of $d$ between 0.00 nm and 2.33 nm). The states below the corresponding lines are forbidden due to excluded volume interactions. With increasing $d$ more and more states become accessible to the fibre

Using given distributions for the model parameters has the advantage of saving computation time that would otherwise be spent on the equilibration of the fibres. The saved computation time can then be used to generate very large fibres (i.e. chromatin fibres consisting of several Mbp). Of course, excluded volume potentials for the DNA and the nucleosomes are taken into account: the DNA has a tube-like shape and the nucleosomes have the excluded volume of flat cylinders. An example conformation of such a regular chromatin fibre (that is, a fibre without defects) is shown in Fig. 5. Integrating these experimental parameter distributions does not lead to one specific chromatin fibre structure, but instead to a distribution of structures in the chromatin phase diagram (cf. Fig. 3).

A further part of the model is the presence of an H1 histone, which is assumed to be present with a fixed probability [6, 7]. For a certain nucleosome $N_i$ the defect probability $p$ gives the chance of a missing H1 histone. If the histone is missing, the incoming and outgoing DNA strands are no longer fixed in front of the nucleosome but are instead arbitrary, subject to the excluded volume interactions of the chromatin strand (cf. Fig. 4b). The second chromatin feature that we included in the model is the possibility for nucleosomes to dissolve entirely, so that only naked DNA stretches remain. These DNA strands are modelled as worm-like chains with a diameter of 2.2 nm and a persistence length of 50 nm, as illustrated in Fig. 4a. In a real cell nucleus, nucleosome-free regions are likely to be occupied by regulatory proteins.
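For orientation, the worm-like chain picture used for the naked DNA stretches has a standard closed-form mean-square end-to-end distance (the Kratky-Porod result), $\langle R^2 \rangle = 2\,l_p L\,[1 - (l_p/L)(1 - e^{-L/l_p})]$ for contour length $L$ and persistence length $l_p$. The sketch below evaluates it for the persistence length of 50 nm quoted above. The conversion of 0.34 nm per base pair, the function names and the chosen stretch length are assumptions for this example, and excluded volume is ignored.

```python
import math

def wlc_mean_square_end_to_end(L, lp=50.0):
    """<R^2> of an ideal worm-like chain of contour length L (same units as lp)."""
    if L <= 0.0:
        return 0.0
    return 2.0 * lp * L * (1.0 - (lp / L) * (1.0 - math.exp(-L / lp)))

def rms_end_to_end_bp(n_bp, lp=50.0, rise_nm=0.34):
    """Root-mean-square extension in nm of a naked stretch of n_bp base pairs.

    rise_nm = 0.34 nm per base pair is an assumed conversion factor.
    """
    return math.sqrt(wlc_mean_square_end_to_end(n_bp * rise_nm, lp))

# Hypothetical example: a naked stretch of 200 bp (about 68 nm of contour length).
rms_200bp = rms_end_to_end_bp(200)
```

For stretches much shorter than the persistence length the chain behaves like a rigid rod ($\langle R^2 \rangle \approx L^2$), while for much longer stretches it approaches the random-coil limit $\langle R^2 \rangle \approx 2\,l_p L$.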

The Relation Between the Gene Network and the Physical Structure of Chromosomes


Fig. 3 A single point in this phase diagram corresponds to a specific chromatin structure. The forbidden structures lie left and below the dashed line which is the excluded volume borderline [5]. Due to the parameter distributions in our model we do not expect a specific chromatin structure but instead a distribution of structures in the phase diagram. This probability distribution is shown in the back of the figure

To estimate the average rate of nucleosome skips we used data for the average nucleosome occupancy per bp [25], obtained by experiments combined with a probabilistic prediction model. We use a prediction of the nucleosome occupancy for the entire yeast genome [23]. An example conformation of such a disturbed chromatin fibre with fixed depletion rates is shown in Fig. 5. Depletion of linker histones and nucleosomes massively affects the flexibility and the extension of chromatin fibres. Increasing the number of nucleosome skips (i.e., nucleosome depletion) can lead either to a collapse or to a swelling of chromatin fibres. These opposing effects were discussed and we showed that depletion effects might even contribute to chromatin compaction. Furthermore, we found that predictions from experimental data for the average nucleosome skip rate lie exactly in the regime of maximum chromatin compaction. We determined the pair distribution function of chromatin. This function reflects the structure of the fibre, and its Fourier transform can be measured experimentally. Our calculations show that even in the case of fibres with depletion effects, the dominant peaks (characterizing the structure and the length scales) can still be identified, which might lead to new experimental approaches for determining chromatin structure, for instance by light optical methods.


D.W. Heermann et al.

Fig. 4 Illustration of two kinds of histone depletion. (a) An example of an individual nucleosome skip. If a nucleosome is dissolved, a blank stretch of DNA will remain. The naked DNA stretches have lengths of integer multiples of the nucleosome repeat length plus once the length of a DNA linker and can lead either to a collapse or to a swelling of the chromatin fibre. In both cases they increase the flexibility of the chromatin chain. (b) An example conformation of a chromatin fibre with a missing linker histone. The upper strand and the strand below the defect are regular, i.e. the local fibre parameters are fixed. Please note that the fibre is drawn very open to make the visualization clear

3 Modelling on the Large Scale

The folding of chromatin above the scale of about 100 kb is unknown to a surprising extent. The limited resolution of light microscopy does not allow tracking the path of the chromatin fibre in vivo. An ever growing body of evidence suggests that chromatin folding is tightly connected to genome function. A pivotal role in maintaining this connection is attributed to the formation of chromatin loops, i.e. the possibility of genes and regulatory elements to co-locate [16, 26]. The formation of these loops is dynamic: different genes interact with the control sequences during development in a mutually exclusive way, correlated with their expression. Loops have also been associated with the formation of transcription factories, which bring together transcriptionally active genes [10, 19]. Despite recent progress in understanding some links between genome folding and function, a coherent connection has not been established yet. Polymer models are able to shed light on the most important features of chromatin folding, being able to make predictions as well as explain experimental data on the basis of very general assumptions. An interesting outcome of recent experiments [20] is that the mean square displacement $\langle R^2 \rangle$ between two fluorescent markers becomes independent of genomic separation g at about 5–10 Mb (see Fig. 6), indicating a folding of chromosomes into a confined space of the nucleus. Let us assume that we can approximate the chromosomal conformations on the scale above 100 kb by a bead-spring or linker type of polymer model, the chain consisting of N uncorrelated, equal subunits of length b. Such a description of a biological polymer is correct when we make N sufficiently small so that b is larger


Fig. 5 (a) Example conformation of a chromatin strand of length 40 kbp. The light blue tubes represent the DNA, the histone octamers are modeled as purple cylinders and the linker histones are marked pale yellow. This chromatin conformation with a diameter of about 34 nm has no depletion effects (i.e. it is regular). (b) Example conformation of a chromatin fibre with depletion effects: The linker histone skip rate is 6 and the nucleosome skip rate is 8. The linker histone skips are marked orange. One can see that the concept of a regular 30 nm fibre does not hold anymore. Instead one obtains very flexible coil-like structures of compact regions which are separated by naked DNA stretches. Shown is a section of the fibre which has a total length of 394 kbp

than the persistence length of chromatin [11]. Three basic polymer models are commonly used for comparison with experimental data: (a) the random walk (RW) model, where no volume interactions are taken into account, (b) the self-avoiding walk (SAW) model, which takes excluded volume into account, and (c) the globular state (GS) model, which furthermore includes temperature-dependent attractive interactions [4]. One characteristic feature of a polymer model is the mean squared end-to-end distance $\langle R_N^2 \rangle$, which displays a typical scaling behaviour,

$$\langle R_N^2 \rangle = b^2 N^{2\nu} \qquad (5)$$

in the limit of large N, where b is the linker length, N the chain length and $\nu$ a constant depending on the model used: $\nu = 0.5$ for the RW, $\nu \approx 0.588$ for the SAW and $\nu = 1/3$ for the GS.
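The scaling law (5) can be checked numerically for the simplest of the three models. The following is a minimal sketch, not part of the simulations reported here: it assumes NumPy, realises the random walk with independent Gaussian segments, and uses the illustrative names `mean_square_end_to_end` and `fit_scaling_exponent` together with arbitrarily chosen chain lengths and sample sizes.

```python
import numpy as np

def mean_square_end_to_end(n_steps, n_chains, b=1.0, seed=0):
    """Sample <R_N^2> for ideal (random-walk) chains of n_steps segments.

    Assumption: each segment is an independent isotropic Gaussian vector
    whose mean squared length is b**2 (one convenient RW realisation).
    """
    rng = np.random.default_rng(seed)
    steps = rng.normal(scale=b / np.sqrt(3.0), size=(n_chains, n_steps, 3))
    end_to_end = steps.sum(axis=1)              # end-to-end vector of each chain
    return float(np.mean(np.sum(end_to_end**2, axis=1)))

def fit_scaling_exponent(chain_lengths, msd):
    """Fit nu in <R_N^2> = b^2 N^(2 nu) by a straight line in log-log space."""
    slope, _ = np.polyfit(np.log(chain_lengths), np.log(msd), 1)
    return slope / 2.0

# Illustrative chain lengths and ensemble size (assumptions, not paper values).
chain_lengths = np.array([16, 32, 64, 128, 256])
msd = np.array([mean_square_end_to_end(n, 4000, seed=int(n)) for n in chain_lengths])
nu = fit_scaling_exponent(chain_lengths, msd)
print(f"fitted nu = {nu:.3f}")
```

For the random walk the fitted exponent should come out close to 0.5; reproducing the SAW or GS values would require simulations with excluded volume and attractive interactions, which this sketch does not attempt.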

[Figure 6: panels a and b, each with plots for chromosome 1 and chromosome 11; horizontal axis: genomic separation [Mb]; vertical axis: mean square distance [µm²].]

Fig. 6 Distance measurements in fibroblast cells. a. Plots show the mean square physical distances $\langle R^2 \rangle$ between two fluorescent markers as a function of the genomic distance for regions of increased gene density (ridges, green) and gene-poor regions (anti-ridges, red) on human chromosomes 1 and 11 in the 0.5–10 Mb range. Data points in green and red correspond to the ridges and anti-ridges, respectively [20]. Error bars represent standard errors. b. The mean square displacement $\langle R^2 \rangle$ is shown as a function of genomic distance in the 25–75 Mb range. Error bars represent standard errors

We compare data from experiments (Fig. 6) to these polymer models by calculating the moment ratio $\langle R^4 \rangle / \langle R^2 \rangle^2$ [1]. It has the advantage of being dimensionless and containing information about the fluctuations. Interestingly, we find that the experimental data display pronounced deviations from these simple polymer models (Fig. 7): the values are even larger than for the RW model, indicating huge distance fluctuations inside the cell nucleus. Based on these observations we propose a general polymer model, the Random Loop (RL) model [2], which is able to explain the observed levelling-off in the mean-square distance as well as the large cell-to-cell variation. The model takes into account the looping of the chromatin fibre. In contrast to other chromatin models [13, 17, 24], the RL model for the first time includes two important aspects of chromatin folding: Firstly, loops are assumed to be dynamic, i.e. the loop attachment points are not fixed throughout the ensemble. Secondly, our model allows the formation of loops of all sizes, in agreement with experimental evidence [26]. The RL model assumes a chain of length N, where the spatial bead positions are denoted by $x_0, \ldots, x_N$, to be subjected to the following potential


[Figure 7: moment ratio $\langle R^4 \rangle / \langle R^2 \rangle^2$ (vertical axis, 1.2 to 4.4) versus genomic distance [Mb] (logarithmic horizontal axis, 0.1 to 10), with reference values for the RW, SAW and GS models.]

Fig. 7 The moment ratio $\langle R^4 \rangle / \langle R^2 \rangle^2$ for the experimental data from human chromosomes 1 and 11 (Fig. 6) and data from the murine Igh locus [14]. The ratios are compared to the random walk (RW), self-avoiding walk (SAW) and globular state (GS) polymer models [1]. The large ratios of the experimental data indicate a huge cell-to-cell variation

$$U = \frac{\kappa}{2} \sum_{j=1}^{N} \lVert x_j - x_{j-1} \rVert^2 + \frac{1}{2} \sum_{i<j-1}^{N} \kappa_{ij} \lVert x_i - x_j \rVert^2 \qquad (6)$$

The first term describes the connectivity of the chain, while the second term describes the formation of random loops. $\kappa_{ij} = \kappa_{ji}$ are the spring constants for the loop attachment points. Right now we keep them arbitrary but they will be randomly chosen later within the model. The probability density for a bead conformation $(x_0, \ldots, x_N)$ in the canonical ensemble is given by $P(x_0, \ldots, x_N) = C \exp(-U/k_B T)$. Eliminating the degrees of freedom stemming from the translational invariance and factorizing the spatial dimensions, we can rewrite the one-dimensional probability density,

$$P_1(x_1, \ldots, x_N) = C_1 \exp\left(-\tfrac{1}{2} X^T K X\right) \qquad (7)$$

where $X = (x_1, \ldots, x_N)^T$ and K is a matrix made up of the $\kappa_{ij}$ [2]. Assuming K to be symmetric and positive semi-definite, we can integrate out some degrees of freedom and obtain the probability distribution for the coordinates of two arbitrary beads I and J

$$P(x_I, x_J) = \int \cdots \int \prod_{i=1,\; i \neq I,J}^{N} dx_i \; P(x_1, \ldots, x_N) \qquad (8)$$

This integral can be evaluated by standard methods for normal distributions. Going back to three dimensions we obtain after some basic integral evaluations the mean square distance between two beads I and J ,


Fig. 8 The Random Loop Model averages over (a) the thermal disorder and (b) over the possible configurations of loops. Here one can see two possible configurations of loops

$$\langle r_{IJ}^2 \rangle_{\mathrm{thermal}} = 3\,(\sigma_{JJ} + \sigma_{II} - 2\sigma_{IJ}) \qquad (9)$$

where

$$\Sigma = K^{-1} = (\sigma_{ij})_{i,j} \qquad (10)$$
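Equations (9) and (10) translate directly into a few lines of linear algebra. The sketch below is only an illustration under stated assumptions: it uses NumPy, pins bead $x_0$ at the origin to remove the translational degree of freedom, measures all spring constants in units of $k_B T$, and the function names `coupling_matrix` and `thermal_msd` are hypothetical.

```python
import numpy as np

def coupling_matrix(n_beads, loops=(), kappa=1.0, kappa_loop=1.0):
    """Build K for beads x_1..x_N with bead x_0 pinned at the origin.

    loops is an iterable of bead-index pairs (i, j), 0 <= i < j <= N, that are
    joined by an extra spring; kappa and kappa_loop are in units of k_B T.
    """
    K = np.zeros((n_beads, n_beads))

    def add_spring(i, j, k):
        # Contribution of (k/2) (x_i - x_j)^2 to (1/2) X^T K X; bead 0 is fixed.
        for a in (i, j):
            if a > 0:
                K[a - 1, a - 1] += k
        if i > 0 and j > 0:
            K[i - 1, j - 1] -= k
            K[j - 1, i - 1] -= k

    for j in range(1, n_beads + 1):    # backbone springs, first term of (6)
        add_spring(j - 1, j, kappa)
    for i, j in loops:                 # loop springs, second term of (6)
        add_spring(i, j, kappa_loop)
    return K

def thermal_msd(K, I, J):
    """Equation (9) for beads 1 <= I, J <= N, with Sigma = K^-1 as in (10)."""
    sigma = np.linalg.inv(K)
    return 3.0 * (sigma[J - 1, J - 1] + sigma[I - 1, I - 1] - 2.0 * sigma[I - 1, J - 1])

K_free = coupling_matrix(20)                     # chain without loops
K_loop = coupling_matrix(20, loops=[(5, 12)])    # one loop between beads 5 and 12
print(thermal_msd(K_free, 5, 12), thermal_msd(K_loop, 5, 12))
```

Without loops the result grows linearly with the contour distance between the two beads, as expected for a Gaussian chain; a single loop joining the two beads reduces their mean square distance sharply.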

The important point is now that we let the $\kappa_{ij}$ be Bernoulli-distributed random variables with probability p, meaning that the loop attachment points are chosen randomly. In a first approach, we assume a homogeneous looping probability p for all pairs of monomers, i.e. each pair of monomers will form a loop with equal probability independent of the contour length n in between. The mean square displacement between two monomers has to be calculated not only over the thermal ensemble given by (9) but also over the ensemble of different loop configurations, i.e. the random variables $\kappa_{ij}$. Two such conformations are displayed in Fig. 8. The disorder average cannot be calculated analytically, so we have to use a representative subset of the ensemble and calculate the averages numerically.

The results for the mean square displacement $\langle R^2 \rangle$ in relation to genomic separation g are displayed in Fig. 9a. It shows an increase at small genomic separations, which is due to the random-walk nature of the backbone. At larger genomic separation, however, the model displays a levelling-off comparable to that of the experimental data. Note that loops on all scales are necessary to explain this levelling-off [2]. Interestingly, the Random Loop model can also explain the large fluctuations found, represented by the ratio $\langle R^4 \rangle / \langle R^2 \rangle^2$. Fluctuations of the RL model exceed the random walk value due to the additional disorder in the system (Fig. 9b), yielding a natural explanation for the cell-to-cell variation being based on different looping configurations.

In contrast to the assumption of a homogeneous looping distribution, experiments reveal a strong dependence of the level of compaction on transcriptional activity [12] (see Fig. 6a). How can these findings be explained by the Random Loop model? Indeed, the short-scale behaviour of the Random Loop model displays a power-law behaviour $\langle R^2 \rangle \sim N^{2\nu}$, where the exponent $\nu$, i.e. the compaction, depends on the looping probability (Fig. 10a). This leads us to propose that different states of compaction can be explained by different local looping probabilities. As a first approximation we divide the polymer into ridge and anti-ridge regions and define three different looping probabilities, i.e. $P_R$, defining loop formation in ridge regions, $P_{AR}$ for anti-ridges and $P_{inter}$ for the interaction between such


[Figure 9: mean square physical distance [µm²] (vertical axis, 0 to 10) versus genomic distance [Mb] (horizontal axis, 0 to 80); model curves for p = 4E-5, 5E-5, 6E-5 and 7E-5 together with chromosome 11 long-distance data, chromosome 1 ridge data and chromosome 1 anti-ridge data.]

Fig. 9 (a). Mean square distance $\langle R^2 \rangle$ in relation to genomic separation g (contour length) of the Random Loop model compared to experimental data. Data is shown for the model without excluded volume and a chain length of N = 1,000 for different looping probabilities p. (b). The dimensionless ratio $\langle R^4 \rangle / \langle R^2 \rangle^2$ of the Random Loop model has much larger values than the RW, SAW or GS polymer model, in agreement with experimental data

[Figure 10, panel a: $\langle R_n^2 \rangle$ (vertical axis, 0 to 200) versus contour length n (horizontal axis, 0 to 80), with fitted exponents p = 1 × 10⁻⁵, ν = 0.450 ± 0.001; p = 2 × 10⁻⁵, ν = 0.406 ± 0.002; p = 3 × 10⁻⁵, ν = 0.366 ± 0.004; p = 5 × 10⁻⁵, ν = 0.307 ± 0.005; p = 8 × 10⁻⁵, ν = 0.241 ± 0.007. Panel b: mean square displacement $\langle R^2 \rangle$ [µm²] (vertical axis, 0 to 10) versus genomic distance g [Mb] (horizontal axis, 0 to 12) for chr 11 ridge, chr 11 anti-ridge, model in ridge region and model in anti-ridge region.]

Fig. 10 a. Qualitative short-scale behaviour of the Random Loop model. The relationship between the mean square displacement between two monomers and their contour distance is shown for different values of the looping probability P and fitted to a power law $\langle R^2 \rangle \sim N^{2\nu}$. The scaling exponent $\nu$ varies over a broad range of values, depending on the looping probability P. b. This panel shows simulations of the RL model using different P values for ridges, anti-ridges and the interactions between these regions on the q-arm of chromosome 11, as shown in Fig. 6. The assigned P values are $p_R = 3 \times 10^{-5}$, $p_{AR} = 7 \times 10^{-5}$ and $p_{inter} = 1 \times 10^{-5}$, respectively. Calculations are without excluded volume; the coarse-grained monomer is set at 75 kb


regions. Figure 10b shows the result of a simulation for $P_R = 3 \times 10^{-5}$, $P_{AR} = 7 \times 10^{-5}$ and $P_{inter} = 1 \times 10^{-5}$. The RL model with these values describes the folding of the ridge and anti-ridge region of chromosome 11 remarkably well. Thus, this heterogeneous Random Loop model allows a unified description of the folding of the chromatin fibre inside the interphase nucleus over different length scales. It furthermore bridges the gap between genome folding and function, explaining different levels of compaction with different local looping probabilities.
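The numerical disorder average described above can be sketched as follows. This is an illustrative toy version, not the code behind Figs. 9 and 10: it assumes NumPy, unit spring constants, a short chain, a hypothetical split of the chain into a ridge half and an anti-ridge half, and looping probabilities scaled up from the values quoted above (keeping their 3 : 7 : 1 ratio) so that loops actually occur on such a short chain. For each fixed loop configuration the bead separation is Gaussian, so $\langle r^4 \rangle = \tfrac{5}{3} \langle r^2 \rangle^2$ thermally, and the excess of the moment ratio over 5/3 comes entirely from the disorder.

```python
import numpy as np

def random_loop_moments(n_beads, I, J, loop_prob, n_configs=200, seed=0):
    """Disorder-averaged <R^2> and <R^4>/<R^2>^2 between beads I and J (1 <= I, J <= N).

    loop_prob(i, j) gives the looping probability of the bead pair (i, j).
    """
    rng = np.random.default_rng(seed)
    size = n_beads + 1                              # beads 0..N
    idx_i, idx_j = np.triu_indices(size, k=2)       # all pairs with i < j - 1
    probs = np.array([loop_prob(i, j) for i, j in zip(idx_i, idx_j)])
    backbone = np.arange(n_beads)
    msd = np.empty(n_configs)
    for c in range(n_configs):
        # Graph Laplacian of the backbone plus one random draw of loops.
        L = np.zeros((size, size))
        L[backbone, backbone + 1] = -1.0
        L[backbone + 1, backbone] = -1.0
        on = rng.random(probs.size) < probs         # Bernoulli loop variables
        L[idx_i[on], idx_j[on]] = -1.0
        L[idx_j[on], idx_i[on]] = -1.0
        np.fill_diagonal(L, -L.sum(axis=1))
        sigma = np.linalg.inv(L[1:, 1:])            # pin bead 0; Sigma = K^-1
        msd[c] = 3.0 * (sigma[I - 1, I - 1] + sigma[J - 1, J - 1]
                        - 2.0 * sigma[I - 1, J - 1])
    r2 = msd.mean()
    r4 = (5.0 / 3.0) * np.mean(msd**2)              # thermal Gaussian factor 5/3
    return r2, r4 / r2**2

N = 120                                             # illustrative chain length

def in_ridge(bead):
    # Hypothetical layout: first half of the chain is ridge, second half anti-ridge.
    return bead < N // 2

def heterogeneous_prob(i, j, p_r=3e-3, p_ar=7e-3, p_inter=1e-3):
    # Illustrative probabilities, scaled up for the short chain.
    if in_ridge(i) and in_ridge(j):
        return p_r
    if not in_ridge(i) and not in_ridge(j):
        return p_ar
    return p_inter

r2_ridge, ratio_ridge = random_loop_moments(N, 10, 40, heterogeneous_prob)
r2_anti, ratio_anti = random_loop_moments(N, 70, 100, heterogeneous_prob)
print(r2_ridge, ratio_ridge, r2_anti, ratio_anti)
```

With all looping probabilities set to zero the function reduces to the random-walk values, which gives a simple sanity check before exploring heterogeneous settings.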

4 Discussion

In this contribution we have presented two models for chromatin on the small and the large scale. Using Monte Carlo simulations of the 30 nm chromatin fibre, it was shown that linker histone H1 depletion as well as nucleosomal skips massively affect the flexibility and the extension of chromatin fibres. On the scale of the whole chromosome we have presented a model, the Random Loop model, which predicts important features of large-scale chromatin organization by assuming probabilistic loops on a broad range of scales. Local differences in chromatin compaction, as for instance found in ridges and anti-ridges along the q-arms of chromosomes 1 and 11 (Fig. 6a), are taken into account by locally assigning different looping probabilities to the polymer. Although still highly simplifying, this explains remarkably well the difference in compaction of ridges and anti-ridges, assuming a 2.5-fold difference in looping probability for the studied region on human chromosome 11. Thus the RL model allows for a unified description of the folding of the chromatin fibre inside the interphase nucleus over different length scales and explains different levels of compaction by assuming different looping probabilities, related for instance to local differences in transcription level and gene density. The RL model creates a basis for explaining the formation of chromosome territories, not requiring a scaffold or other physical confinement. While there is a lot of evidence that chromatin-chromatin interactions play a crucial role in genome function (e.g. see [9, 21]), our study proposes that they also play an important role in chromatin organization inside the interphase nucleus on the scale of the whole chromosome (tens of Mb) as well as on that of subchromosomal domains in the size range of a few Mb.

References

1. M. Bohn and D. W. Heermann, J. Chem. Phys., 130(17):174901, 2009.
2. M. Bohn, D. W. Heermann, and R. van Driel, Phys. Rev. E, 76(5):051805, 2007.
3. M. Barbi, J. Mozziconacci, and J.-M. Victor, Phys Rev E Stat Nonlin Soft Matter Phys, 71(3 Pt 1):031910, Mar 2005.
4. P.-G. de Gennes, Ithaca, N.Y., Cornell University Press, 1979.
5. P. M. Diesinger and D. W. Heermann, Phys. Rev. E, 74, 031904, Sep 2006.
6. P. M. Diesinger and D. W. Heermann, Biophys. J., 94(11), 4165–4172, 2008.
7. P. M. Diesinger and D. W. Heermann, Biophys. J., 97(8), 2146–2153, Oct 2009.
8. C. A. Davey, D. F. Sargent, K. Luger, A. W. Maeder, and T. J. Richmond, J Mol Biol, 319(5), 1097–1113, Jun 2002.
9. P. Fraser and W. Bickmore, Nature, 447(7143), 413–417, May 2007.
10. P. Fraser, Current Opinion in Genetics & Development, 16(5), 490–495, Oct 2006.
11. A. Y. Grosberg and A. R. Khokhlov, Statistical Physics of Macromolecules. AIP Press, 1994.
12. S. Goetze, J. Mateos-Langerak, H. J. Gierman, W. de Leeuw, Osdilly Giromus, M. H. G. Indemans, J. Koster, V. Ondrej, R. Versteeg, and R. van Driel, Mol. Cell. Biol., 27(12), 4475–4487, 2007.
13. P. Hahnfeldt, J. E. Hearst, D. J. Brenner, R. K. Sachs, and L. R. Hlatky, PNAS, 90, 7854–7858, August 1993.
14. S. Jhunjhunwala, M. C. van Zelm, M. M. Peak, S. Cutchin, R. Riblet, J. J. M. van Dongen, F. G. Grosveld, T. A. Knoch, and C. Murre, Cell, 133(2), 265–279, Apr 2008.
15. K. Luger, A. W. Maeder, R. K. Richmond, D. F. Sargent, and T. J. Richmond, Nature, 389(6648), 251–260, September 1997.
16. A. Miele and J. Dekker, Mol. BioSyst., 4(11), 1046–1057, Nov 2008.
17. C. Münkel, R. Eils, S. Dietzel, D. Zink, C. Mehring, G. Wedemann, T. Cremer, and J. Langowski, J. Mol. Biol., 285, 1053–1065, 1999.
18. B. Mergell, R. Everaers, and H. Schiessel, Phys. Rev. E, 70, 011915, Jul 2004.
19. D. Marenduzzo, I. Faro-Trindade, and P. R. Cook, Trends Genet., 23(3), 126–133, 2007.
20. J. Mateos-Langerak, M. Bohn, W. de Leeuw, O. Giromus, E. M. M. Manders, P. J. Verschure, M. H. G. Indemans, H. J. Gierman, D. W. Heermann, R. van Driel, and S. Goetze, PNAS, 106(10), 3812–3817, 2009.
21. R.-J. Palstra, B. Tolhuis, E. Splinter, R. Nijmeijer, F. Grosveld, and W. de Laat, Nat. Genet., 35(2), 190–194, Oct 2003.
22. H. Schiessel, J. Phys.: Condens. Matter, 15(19), R699–R774, 2003.
23. E. Segal and J. Widom, Nature Reviews Genetics, 10, 443–456, 2009.
24. R. K. Sachs, G. V. D. Engh, B. Trask, H. Yokota, and J. E. Hearst, PNAS, 92(7), 2710–2714, 1995.
25. E. Segal, Y. Fondufe-Mittendorf, L. Chen, A. Thåström, Y. Field, I. K. Moore, J.-P. Z. Wang, and J. Widom, Nature, 442(7104), 772–778, Aug 2006.
26. M. Simonis, P. Klous, E. Splinter, Y. Moshkin, R. Willemsen, E. de Wit, B. van Steensel, and W. de Laat, Nat. Genet., 38(11), 1348–1354, Nov 2006.
27. K. E. van Holde, Chromatin, New York: Springer-Verlag, 1989.
28. C. L. Woodcock, S. A. Grigoryev, R. A. Horowitz, and N. Whitaker, PNAS, 90(19), 9021–9025, 1993.
29. J. Widom, PNAS, 89(3), 1095–1099, Feb 1992.



Generalized Bilinear System Identification with Coupling Force Variables

Jer-Nan Juang

Abstract A novel method is presented for identification of a generalized bilinear system with nonlinear terms consisting of the product of the state vector and the coupling force variables. The identification process requires a series of pulse response experiments from input values of various pulse duration for coupling force variables. It also requires experiments with multiple inputs rather than one single input at a time. The resulting identified system matrices represent the input–output map of the generalized bilinear system. A simple example is given to illustrate the concept of the identification method.

1 Introduction

Many important processes, not only in engineering but also in biology, socioeconomics, and ecology, may be modeled by bilinear systems (see Bruni et al. [1, 2], Mohler et al. [3], Mohler [4] and Elliott [5]). An important feature of the bilinear system is that it has the characteristics of a linear system for a constant or zero input. These special characteristics are the basis for the identification method developed by Juang [6]. Sontag et al. [7] adapted many of the basic ideas of [6] to prove that step inputs are not sufficient, nor are single pulses, but that the family of all pulses (of a fixed amplitude but varying widths) does suffice to completely identify the input/output behavior of generic bilinear systems. Recently, the earlier work [6] was extended by Juang [8] to identify a generalized bilinear system with dynamics jointly nonlinear in the state and the force variables of order higher than one.

J.-N. Juang, Department of Engineering Science, National Cheng Kung University, Tainan, Taiwan

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_14, © Springer-Verlag Berlin Heidelberg 2012


This paper is intended to motivate interest in bilinear system identification and to present the current state of research in its various aspects of nonlinearity. The identification methods introduced in [6] and [8] are advanced to handle the nonlinearity consisting of the product of the state vector and the coupling force variables. The organization of this paper is as follows. After this introductory section, the Basic Formulation section highlights the special characteristics of bilinear systems. The formulations are self-contained for the purpose of completeness, even though they look quite similar to the ones described in [6] and [8]. The main results are given in the System Identification Method section, which describes the main contributions of this paper in comparison with its companion papers. The Numerical Example section gives a simple example to illustrate the identification method developed. In the final section, Concluding Remarks, some remarks are made on open problems and possible trends for future research.

2 Basic Formulation

Let x be the state vector of $n \times 1$, $A_c$ the state matrix of $n \times n$, u the input vector of $r \times 1$, $B_c$ the input matrix of $n \times r$, y the output vector of $m \times 1$, C the output matrix of $m \times n$, and D the direct transmission matrix of $m \times r$. The generalized bilinear state equation in the continuous-time domain is expressed by

$$\dot{x} = A_c x + B_c u + \sum_{i=1}^{r} N_{ci}\, x\, u_i + \sum_{i=1}^{r} \sum_{j=1}^{r} N_{cij}\, x\, u_i u_j \qquad (1)$$

with the output measurement equation

$$y = C x + D u \qquad (2)$$

The coupling terms $x u_i$ and $x u_i u_j$ between the state vector x and each individual $u_i$ and/or $u_j$ ($i, j = 1, \ldots, r$) in the input vector u are weighted by the matrices $N_{ci}$ and $N_{cij}$ of $n \times n$, respectively. Subscript c indicates that the associated quantity is in the continuous-time domain. Considering two inputs at a time, (1) reduces to

$$\dot{x} = A_c x + b_{ci} u_i + b_{cj} u_j + \left( N_{ci} u_i + N_{cj} u_j + N_{cii} u_i^2 + N_{cjj} u_j^2 + N_{cij} u_i u_j \right) x \qquad (3)$$

where $b_{ci}$ and $b_{cj}$ are the i-th and j-th columns of $B_c$ associated with the inputs $u_i$ and $u_j$, respectively. Assuming that $u_i = \delta_i$ = constant and $u_j = \delta_j$ = constant, the continuous-time state equation (3) further reduces to

$$\dot{x} = \left( A_c + N_{ci} \delta_i + N_{cj} \delta_j + N_{cii} \delta_i^2 + N_{cjj} \delta_j^2 + N_{cij} \delta_i \delta_j \right) x + b_{ci} \delta_i + b_{cj} \delta_j \qquad (4)$$


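The discrete-time model of this system is

$$x(k+1) = \bar{A}_{ij}\, x(k) + \bar{b}_{ij}; \quad i, j = 1, 2, \ldots, r \qquad (5)$$

with the measurement equation

$$y_{ij}(k) = C x(k) + d_{ij} \qquad (6)$$

where the quantities $\bar{A}_{ij}$, $\bar{b}_{ij}$, and $d_{ij}$ are determined by

$$\bar{A}_{ij} = e^{\bar{A}_{cij} \Delta t}; \quad \bar{A}_{cij} = A_c + N_{ci} \delta_i + N_{cj} \delta_j + N_{cii} \delta_i^2 + N_{cjj} \delta_j^2 + N_{cij} \delta_i \delta_j \qquad (7)$$

$$\bar{b}_{ij} = \int_0^{\Delta t} e^{\bar{A}_{cij} \tau}\, d\tau \; b_{cij}; \quad b_{cij} = b_{ci} \delta_i + b_{cj} \delta_j \qquad (8)$$

$$d_{ij} = d_i \delta_i + d_j \delta_j \qquad (9)$$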
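Equations (7) and (8) can be evaluated together with a single exponential of an augmented matrix, a standard device for zero-order-hold discretization. The following is a minimal sketch assuming NumPy; the helper names `expm_taylor` and `discretize_pair` are illustrative and not taken from the paper, and the small Taylor-series exponential merely stands in for a library routine.

```python
import numpy as np

def expm_taylor(M, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    M = np.asarray(M, dtype=float)
    norm = np.linalg.norm(M, 1)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = M / 2**squarings
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def discretize_pair(Ac, bci, bcj, Nci, Ncj, Ncii, Ncjj, Ncij, delta_i, delta_j, dt):
    """Return (A_bar_ij, b_bar_ij) of (7) and (8) for constant inputs delta_i, delta_j."""
    A_cij = (Ac + Nci * delta_i + Ncj * delta_j
             + Ncii * delta_i**2 + Ncjj * delta_j**2 + Ncij * delta_i * delta_j)
    b_cij = bci * delta_i + bcj * delta_j
    n = Ac.shape[0]
    # exp([[A, b], [0, 0]] dt) = [[exp(A dt), int_0^dt exp(A tau) dtau b], [0, 1]]
    augmented = np.zeros((n + 1, n + 1))
    augmented[:n, :n] = A_cij
    augmented[:n, n] = b_cij
    E = expm_taylor(augmented * dt)
    return E[:n, :n], E[:n, n]

# Scalar illustration with made-up numbers.
A_bar, b_bar = discretize_pair(
    np.array([[-1.0]]), np.array([1.0]), np.array([0.5]),
    np.array([[0.2]]), np.array([[-0.1]]), np.array([[0.05]]),
    np.array([[0.0]]), np.array([[0.1]]), delta_i=1.0, delta_j=2.0, dt=0.1)
print(A_bar, b_bar)
```

The augmented-matrix form avoids inverting $\bar{A}_{cij}$, so it also works when that matrix is singular.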

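The quantity $\Delta t$ is the time interval for data sampling. In the absence of the input, i.e., $u_i = u_j = 0$, (5) reduces to

$$x(k+1) = A x(k) \qquad (10)$$

where

$$A = e^{A_c \Delta t} \qquad (11)$$

Assuming that the initial state $x(0)$ is a zero vector of n by 1, i.e., $x(0) = 0_{n \times 1}$, the measurement quantities $y_{ij}(k)$ for $k = 0, 1, \ldots, p + \ell$, due to the force excitation of $u_i = \delta_i$ and $u_j = \delta_j$ (constant force) simultaneously for $k < p$, and $u_i = u_j = 0$ for $k \geq p$, can be expressed as

$$\begin{aligned}
y_{ij}(0) &= d_{ij} \\
y_{ij}(1) &= C \bar{b}_{ij} + d_{ij} \\
y_{ij}(2) &= C \left( \bar{A}_{ij} \bar{b}_{ij} + \bar{b}_{ij} \right) + d_{ij} \\
&\;\;\vdots \\
y_{ij}(p-1) &= C \tilde{b}_{ij}(p-1) + d_{ij} \\
y_{ij}(p) &= C \tilde{b}_{ij}(p) \\
y_{ij}(p+1) &= C A\, \tilde{b}_{ij}(p) \\
&\;\;\vdots \\
y_{ij}(p+\ell) &= C A^{\ell}\, \tilde{b}_{ij}(p)
\end{aligned} \qquad (12)$$

where

$$\tilde{b}_{ij}(p) = \bar{A}_{ij}^{p-1} \bar{b}_{ij} + \bar{A}_{ij}^{p-2} \bar{b}_{ij} + \cdots + \bar{A}_{ij} \bar{b}_{ij} + \bar{b}_{ij} \qquad (13)$$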


and $\ell$ is an integer indicating the data length of the free-decay response. The upper portion of (12), $y_{ij}(0), y_{ij}(1), \ldots, y_{ij}(p-1)$, corresponds to the multiple-pulse response resulting from a constant force over multiple sample periods, i.e., $p \Delta t$. The lower portion, $y_{ij}(p), y_{ij}(p+1), \ldots, y_{ij}(p+\ell)$, corresponds to the free-decay response, which is quite similar, if not identical, to the pulse response of a linear system without nonlinear coupling terms between the state x and the input u. Any linear system identification technique may be applied to compute the state matrix A and the output matrix C [9, 10] from the free-decay response.
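The structure of the response (12) is easy to reproduce by direct simulation of the two discrete-time models. The sketch below assumes NumPy and made-up system matrices; `pulse_response` and `b_tilde` are illustrative names, and the forced-phase matrices $\bar{A}_{ij}$, $\bar{b}_{ij}$ are simply taken as given.

```python
import numpy as np

def pulse_response(A_bar, b_bar, A, C, d, p, ell):
    """Simulate y_ij(k), k = 0..p+ell, for a pulse lasting p sample periods.

    For k < p the constant-input model (5)-(6) is active; from k = p on the
    input is zero and the state decays freely according to (10).
    """
    x = np.zeros(A.shape[0])
    y = []
    for k in range(p + ell + 1):
        if k < p:
            y.append(C @ x + d)          # forced phase, feedthrough d_ij present
            x = A_bar @ x + b_bar
        else:
            y.append(C @ x)              # free decay, no feedthrough
            x = A @ x
    return np.array(y)

def b_tilde(A_bar, b_bar, p):
    """Equation (13): sum of A_bar^q b_bar for q = 0..p-1."""
    return sum(np.linalg.matrix_power(A_bar, q) @ b_bar for q in range(p))

# Made-up matrices for illustration only.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
A_bar = np.array([[0.85, 0.1], [0.05, 0.7]])
b_bar = np.array([0.1, 0.2])
C = np.array([[1.0, 0.0]])
d = np.array([0.3])
p, ell = 3, 4
y = pulse_response(A_bar, b_bar, A, C, d, p, ell)
print(y.ravel())
```

The samples from index p onward depend on the input only through $\tilde{b}_{ij}(p)$, which is what allows linear identification techniques to be applied to the free-decay portion.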

3 System Identification Method

The identification method requires two steps. The first step is to identify the state matrix $A_c$, the output matrix C, and the direct transmission matrix D. The second step is to determine the input matrix $B_c$, the matrices $N_{ci}$ for the bilinear coupling term between the state x and the input $u_i$, and the matrices $N_{cij}$ for the nonlinear term between the state vector x and the product of forces $u_i$ and $u_j$ for $i, j = 1, 2, \ldots, r$.

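3.1 Identification of $A_c$, C, and D

Apply a pulse of $\delta_i$ for the i-th input $u_i$ and $\delta_j$ for the j-th input $u_j$ to the system for one time step to generate the pulse response ($i, j = 1, 2, \ldots, r$). From (12) for p = 1, the pulse response has the following expression

$$\begin{aligned}
y_{ij}(0) &= d_{ij} \\
y_{ij}(1) &= C \bar{b}_{ij} \\
y_{ij}(2) &= C A \bar{b}_{ij} \\
&\;\;\vdots \\
y_{ij}(\ell+1) &= C A^{\ell} \bar{b}_{ij}
\end{aligned} \qquad (14)$$

For any input set of $\delta_i$ and $\delta_j$, one obtains a sequence of pulse responses $y_{ij}(k)$; $k = 1, 2, \ldots, \ell$. Any different set of values $\delta_i$ and $\delta_j$ will generate another sequence of pulse responses. In the end, one may generate as many sequences as desired for system identification. Now, form the system Markov parameters as (see [9] and [10])

$$\begin{aligned}
Y_1(0) &= \begin{bmatrix} y_{12}(0) & y_{13}(0) & \cdots \end{bmatrix} = \begin{bmatrix} d_{12} & d_{13} & \cdots \end{bmatrix} = D \Lambda \\
Y_1(1) &= \begin{bmatrix} y_{12}(1) & y_{13}(1) & \cdots \end{bmatrix} = C \begin{bmatrix} \bar{b}_{12} & \bar{b}_{13} & \cdots \end{bmatrix} \\
Y_1(2) &= \begin{bmatrix} y_{12}(2) & y_{13}(2) & \cdots \end{bmatrix} = C A \begin{bmatrix} \bar{b}_{12} & \bar{b}_{13} & \cdots \end{bmatrix} \\
&\;\;\vdots \\
Y_1(\ell+1) &= \begin{bmatrix} y_{12}(\ell+1) & y_{13}(\ell+1) & \cdots \end{bmatrix} = C A^{\ell} \begin{bmatrix} \bar{b}_{12} & \bar{b}_{13} & \cdots \end{bmatrix}
\end{aligned} \qquad (15)$$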


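Subscript 1 for $Y_1(k)$ ($k = 1, 2, \ldots$) indicates the one-time-step pulse response. Equation (15) provides the basic parameters for identification of A, C and D. Note that each system Markov parameter has a minimum number of columns, $\eta$, to be determined later. Let us form a Hankel matrix of $\alpha m \times \beta \eta$ as follows.

$$H_1 = \begin{bmatrix}
Y_1(1) & Y_1(2) & \cdots & Y_1(\beta) \\
Y_1(2) & Y_1(3) & \cdots & Y_1(\beta+1) \\
\vdots & \vdots & \ddots & \vdots \\
Y_1(\alpha) & Y_1(\alpha+1) & \cdots & Y_1(\alpha+\beta-1)
\end{bmatrix}
= \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\alpha-1} \end{bmatrix}
\begin{bmatrix} \bar{B}_1 & A\bar{B}_1 & \cdots & A^{\beta-1}\bar{B}_1 \end{bmatrix} \qquad (16)$$

where

$$\bar{B}_1 = \begin{bmatrix} \bar{b}_{12} & \bar{b}_{13} & \cdots \end{bmatrix} \qquad (17)$$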

with the size of $n \times \eta$. The matrix product in (16) shows the relationship between the system Markov parameters and the discrete-time system matrices. The Hankel matrix $H_1$ has rank n, the order of the state matrix A, if we choose $\alpha$ and $\beta$ such that $\alpha m$ and $\beta \eta$ are larger than or equal to n, where m is the number of outputs and $\eta$ is related to the number of inputs. Using the singular value decomposition (SVD) to decompose the Hankel matrix $H_1$ yields

$$H_1 = U_1 \Sigma_1 V_1^T = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\alpha-1} \end{bmatrix}
\begin{bmatrix} \bar{B}_1 & A\bar{B}_1 & \cdots & A^{\beta-1}\bar{B}_1 \end{bmatrix} \qquad (18)$$

where $\Sigma_1$ is a square matrix containing n non-zero singular values. One may choose

$$U_1 = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\alpha-1} \end{bmatrix} \qquad (19)$$

and

$$\Sigma_1 V_1^T = \begin{bmatrix} \bar{B}_1 & A\bar{B}_1 & \cdots & A^{\beta-1}\bar{B}_1 \end{bmatrix} \qquad (20)$$


The matrix $U_1$ has the dimension of $\alpha m \times n$, whereas $\Sigma_1 V_1^T$ has the dimension of $n \times \beta \eta$. This choice is not unique. Many other choices are also valid. Note that the choice of (19) has the advantage that $U_1^T U_1 = I_{n \times n} \Rightarrow U_1^{\dagger} = U_1^T$, because $U_1$ has orthonormal columns as a result of the singular value decomposition. Equation (19) is commonly called the observability matrix, whereas (20) is referred to as the controllability matrix. Equations (19) and (20) produce the following solutions

$$C = \text{the first } m \text{ rows of } U_1 \qquad (21)$$

$$\bar{B}_1 = \text{the first } \eta \text{ columns of } \Sigma_1 V_1^T \qquad (22)$$

Note that the dimension $n \times \eta$ for $\bar{B}_1$ is oversized in comparison with the dimension $n \times r$ for the original input matrix $B_c$ to be determined later. Since the choices of controllability and observability matrices are not unique, the identified matrices C and $\bar{B}_1$ are not unique. To determine the state matrix A, let us first define and observe the following matrices.

$$U_{1\uparrow} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\alpha-2} \end{bmatrix}
\quad \text{and} \quad
U_{1\downarrow} = \begin{bmatrix} CA \\ CA^2 \\ \vdots \\ CA^{\alpha-1} \end{bmatrix} = U_{1\uparrow} A \qquad (23)$$

Deleting the last m rows of $U_1$ forms the matrix $U_{1\uparrow}$, whereas deleting the first m rows of $U_1$ yields the matrix $U_{1\downarrow}$. The state matrix A can then be determined by

$$A = U_{1\uparrow}^{\dagger} U_{1\downarrow} \qquad (24)$$

For the identified state matrix to have the rank n, the integer $\alpha$ must be chosen such that $(\alpha - 1) m \geq n$, i.e., $\alpha m > n$. From (11), (24) produces the continuous-time state matrix as

$$A_c = \frac{1}{\Delta t} \log(A) = \frac{1}{\Delta t} \log\left( U_{1\uparrow}^{\dagger} U_{1\downarrow} \right) \qquad (25)$$
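The steps (16) to (25) can be sketched compactly with NumPy. The code below is an illustration under stated assumptions rather than the implementation used for the numerical example: the function names are hypothetical, the Markov parameters are synthesised from a made-up linear system, and the matrix logarithm in (25) is taken through an eigendecomposition, which presumes that A is diagonalizable with no eigenvalues on the negative real axis.

```python
import numpy as np

def identify_A_C(markov, m, n, alpha, beta):
    """Identify (A, C, B1_bar) from one-step pulse Markov parameters.

    markov[k] holds Y_1(k + 1), an m x eta array; at least alpha + beta - 1
    entries are required. The result is in an arbitrary state basis.
    """
    H1 = np.block([[markov[i + j] for j in range(beta)] for i in range(alpha)])  # (16)
    U, s, Vt = np.linalg.svd(H1, full_matrices=False)                            # (18)
    U1 = U[:, :n]                                    # observability matrix (19)
    ctrl = np.diag(s[:n]) @ Vt[:n, :]                # controllability matrix (20)
    C = U1[:m, :]                                    # (21)
    B1_bar = ctrl[:, :markov[0].shape[1]]            # (22)
    U_up, U_down = U1[:-m, :], U1[m:, :]             # (23)
    A = np.linalg.pinv(U_up) @ U_down                # (24)
    return A, C, B1_bar

def continuous_from_discrete(A, dt):
    """Equation (25) via eigendecomposition (assumes A is diagonalizable)."""
    lam, V = np.linalg.eig(A)
    Ac = V @ np.diag(np.log(lam.astype(complex))) @ np.linalg.inv(V) / dt
    return Ac.real

# Synthetic data from a made-up system with n = 2, m = 1, eta = 3.
A_true = np.array([[0.9, 0.0], [0.0, 0.5]])
C_true = np.array([[1.0, 1.0]])
B_true = np.array([[1.0, 0.5, 2.0], [0.3, 1.0, -1.0]])
markov = [C_true @ np.linalg.matrix_power(A_true, k) @ B_true for k in range(8)]

A_id, C_id, B1_id = identify_A_C(markov, m=1, n=2, alpha=4, beta=4)
Ac_id = continuous_from_discrete(A_id, dt=0.1)
print(np.linalg.eigvals(A_id), np.linalg.eigvals(Ac_id))
```

Because the state basis is not unique, the identified matrices should be compared with the true ones through invariants such as eigenvalues or the reproduced Markov parameters, not entry by entry.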

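Thus, we have determined $A_c$ from (25), and C from (21). The original transmission matrix D can be determined from (9)

$$\begin{bmatrix} d_{12} & d_{13} & \cdots & d_{(r-1)r} & \cdots \end{bmatrix}
= \begin{bmatrix} d_1 & d_2 & d_3 & \cdots & d_{r-1} & d_r \end{bmatrix} \Lambda = D \Lambda, \qquad
\Lambda = \begin{bmatrix}
\delta_1 & \delta_1 & \cdots & 0 & \cdots \\
\delta_2 & 0 & \cdots & 0 & \cdots \\
0 & \delta_3 & \cdots & 0 & \cdots \\
\vdots & \vdots & \ddots & \vdots & \\
0 & 0 & \cdots & \delta_{r-1} & \cdots \\
0 & 0 & \cdots & \delta_r & \cdots
\end{bmatrix} \qquad (26)$$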


The matrix D is uniquely determined only when the $r \times \eta$ matrix $\Lambda$ has rank r, which is the case for nonzero $\delta_i$ ($i = 1, 2, \ldots, r$) when $\eta > r$. Any column of $\Lambda$ may be repeated by assigning a different value for $\delta_i$, i.e., by repeating the same experiment with different input values. From (15), observe that

$$Y_1(0) = \begin{bmatrix} y_{12}(0) & y_{13}(0) & \cdots \end{bmatrix} = \begin{bmatrix} d_{12} & d_{13} & \cdots \end{bmatrix} = D \Lambda \qquad (27)$$

The matrix D can then be recovered using (26) to have

$$D = Y_1(0)\, \Lambda^{\dagger} \qquad (28)$$

Note again that the identified matrices $A_c$ and C are not uniquely determined but D is coordinate invariant and so is uniquely computed.
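The recovery of D in (26) to (28) amounts to one pseudo-inverse. The following sketch assumes NumPy, zero-based input indices, made-up pulse amplitudes and a made-up D; `amplitude_matrix` and `recover_D` are illustrative names only.

```python
import numpy as np

def amplitude_matrix(r, experiments):
    """Build the r x eta amplitude matrix of (26).

    experiments is a list of (i, j, delta_i, delta_j) tuples, one per
    two-input pulse experiment, with zero-based input indices i != j.
    """
    Lam = np.zeros((r, len(experiments)))
    for col, (i, j, delta_i, delta_j) in enumerate(experiments):
        Lam[i, col] = delta_i
        Lam[j, col] = delta_j
    return Lam

def recover_D(Y1_0, Lam):
    """Equation (28): D = Y_1(0) Lam^+, valid when Lam has full row rank r."""
    if np.linalg.matrix_rank(Lam) < Lam.shape[0]:
        raise ValueError("amplitude matrix must have rank r; add experiments")
    return Y1_0 @ np.linalg.pinv(Lam)

# Illustration with r = 3 inputs, m = 2 outputs and invented numbers.
D_true = np.array([[1.0, -2.0, 0.5], [0.0, 3.0, 1.5]])
experiments = [(0, 1, 1.0, 2.0), (0, 2, 1.0, 0.5), (1, 2, 2.0, 0.5)]
Lam = amplitude_matrix(3, experiments)
Y1_0 = D_true @ Lam          # what the first samples y_ij(0) = d_ij would measure
D_id = recover_D(Y1_0, Lam)
print(D_id)
```

Repeating an experiment with different amplitudes simply appends another column to the amplitude matrix, which helps when the available input pairs alone do not give full row rank.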

3.2 Identification of $B_c$, $N_{ci}$ and $N_{cij}$; $i, j = 1, 2, \ldots, r$

The second step begins with generating the multiple-sample-period pulse response for all inputs, with two inputs at a time exciting the bilinear system. Figure 1 shows several pulses with sample periods up to p = 4 (four periods) for the i-th input $u_i$ with the pulse of magnitude $\delta_i$. Similarly, other inputs have the same structure as Fig. 1 but may have various values of $\delta_i$. Apply a pulse of $\delta_i$ for the i-th input $u_i$ and $\delta_j$ for the j-th input $u_j$ to the system for p time steps to generate the pulse response ($i, j = 1, 2, \ldots, r$).

$$\begin{aligned}
y_{ij}(p) &= C \tilde{b}_{ij}(p) \\
y_{ij}(p+1) &= C A\, \tilde{b}_{ij}(p) \\
y_{ij}(p+2) &= C A^2\, \tilde{b}_{ij}(p) \\
&\;\;\vdots \\
y_{ij}(p+\ell) &= C A^{\ell}\, \tilde{b}_{ij}(p)
\end{aligned} \qquad (29)$$

Fig. 1 Multiple-sample-period pulse


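where $\tilde b_{ij}(p)$ is defined in (13). There are a total of $(r-1)r/2$ combinations of two inputs at a time for generating the multiple-sample-period pulse responses. Additional sets of pulse responses are generated by repeating some experiments with different input values. Now define the system Markov parameters for the $p$-sample-period pulse response as

$$\begin{aligned}
Y_p(p) &= \begin{bmatrix} y_{12}(p) & y_{13}(p) & \cdots \end{bmatrix} = C\bar B_p \\
Y_p(p+1) &= \begin{bmatrix} y_{12}(p+1) & y_{13}(p+1) & \cdots \end{bmatrix} = CA\bar B_p \\
&\ \,\vdots \\
Y_p(p+\ell) &= \begin{bmatrix} y_{12}(p+\ell) & y_{13}(p+\ell) & \cdots \end{bmatrix} = CA^{\ell}\bar B_p
\end{aligned} \tag{30}$$

where the matrix $\bar B_p$, which has $n$ rows and one column per experiment, is defined as

$$\bar B_p = \begin{bmatrix} \tilde b_{12}(p) & \tilde b_{13}(p) & \cdots \end{bmatrix} \tag{31}$$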

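Let us form the matrix $H_p$, which has $\alpha m$ rows, as follows:

$$H_p = \begin{bmatrix} Y_p(p) \\ Y_p(p+1) \\ \vdots \\ Y_p(p+\alpha-1) \end{bmatrix} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\alpha-1} \end{bmatrix} \bar B_p \tag{32}$$

Using $U_1$ computed in (19) from the one-time-step pulse response, the matrix $\bar B_p$ in (32) can be solved by

$$\bar B_p = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\alpha-1} \end{bmatrix}^{\dagger} H_p = U_1^{T} H_p \tag{33}$$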

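To determine $B_c$, let us first observe the matrices $\bar B_1, \ldots, \bar B_p$ defined in (17) and (31), and determined by (22) and (33), i.e.,

$$\begin{aligned}
\bar B_1 &= \begin{bmatrix} \tilde b_{12}(1) & \tilde b_{13}(1) & \cdots \end{bmatrix} = \begin{bmatrix} \bar b_{12} & \bar b_{13} & \cdots \end{bmatrix} = \text{the first columns of } \Sigma_1 V_1^{T} \text{ (one per experiment)} \\
\bar B_2 &= \begin{bmatrix} \tilde b_{12}(2) & \tilde b_{13}(2) & \cdots \end{bmatrix} \\
&\ \,\vdots \\
\bar B_p &= \begin{bmatrix} \tilde b_{12}(p) & \tilde b_{13}(p) & \cdots \end{bmatrix}
\end{aligned} \tag{34}$$

Applying the recursive formula

$$\bar B_k - \bar B_{k-1} = \begin{bmatrix} \bar A_{12}^{\,k-1}\bar b_{12} & \bar A_{13}^{\,k-1}\bar b_{13} & \cdots \end{bmatrix}; \quad k = 2, 3, \ldots, p \tag{35}$$

yields

$$\begin{bmatrix} \bar B_1 \\ \bar B_2 - \bar B_1 \\ \vdots \\ \bar B_p - \bar B_{p-1} \end{bmatrix} =
\begin{bmatrix}
\bar b_{12} & \bar b_{13} & \cdots & \bar b_{(r-1)r} & \cdots \\
\bar A_{12}\bar b_{12} & \bar A_{13}\bar b_{13} & \cdots & \bar A_{(r-1)r}\bar b_{(r-1)r} & \cdots \\
\vdots & \vdots & & \vdots & \\
\bar A_{12}^{\,p-1}\bar b_{12} & \bar A_{13}^{\,p-1}\bar b_{13} & \cdots & \bar A_{(r-1)r}^{\,p-1}\bar b_{(r-1)r} & \cdots
\end{bmatrix} \tag{36}$$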

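Based on the above matrix, define the controllability-like matrices for each pair of input $i$ and input $j$:

$$\bar{\mathcal{C}}_{ij} = \begin{bmatrix} \bar b_{ij} & \bar A_{ij}\bar b_{ij} & \cdots & \bar A_{ij}^{\,p-1}\bar b_{ij} \end{bmatrix}; \quad i, j = 1, 2, \ldots, r;\ i \neq j \tag{37}$$

To determine the state matrix $\bar A_{ij} = e^{A_{cij}\Delta t}$, let us first define the two matrices

$$\mathcal{C}_{ij\leftarrow} = \begin{bmatrix} \bar b_{ij} & \bar A_{ij}\bar b_{ij} & \cdots & \bar A_{ij}^{\,p-2}\bar b_{ij} \end{bmatrix} \tag{38}$$

and

$$\mathcal{C}_{ij\rightarrow} = \bar A_{ij}\,\mathcal{C}_{ij\leftarrow} = \begin{bmatrix} \bar A_{ij}\bar b_{ij} & \bar A_{ij}^{\,2}\bar b_{ij} & \cdots & \bar A_{ij}^{\,p-1}\bar b_{ij} \end{bmatrix} \tag{39}$$

Deleting the last column of $\bar{\mathcal{C}}_{ij}$ forms the matrix $\mathcal{C}_{ij\leftarrow}$, whereas deleting the first column of $\bar{\mathcal{C}}_{ij}$ yields the matrix $\mathcal{C}_{ij\rightarrow}$. Equations (38) and (39) produce the solutions

$$\bar A_{ij} = \mathcal{C}_{ij\rightarrow}\,\mathcal{C}_{ij\leftarrow}^{\dagger}; \quad i, j = 1, 2, \ldots, r;\ i \neq j \tag{40}$$

and

$$\bar b_{ij} = \text{the first column of } \mathcal{C}_{ij\leftarrow}; \quad i, j = 1, 2, \ldots, r;\ i \neq j \tag{41}$$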
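The shift structure behind (38)–(41) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `identify_from_controllability` and the test matrices `A_true` and `b_true` are hypothetical and do not come from the paper.

```python
import numpy as np

def identify_from_controllability(C_bar):
    """Recover (A_bar, b_bar) from C_bar = [b, A b, ..., A^(p-1) b]."""
    C_left = C_bar[:, :-1]    # last column deleted, as in (38)
    C_right = C_bar[:, 1:]    # first column deleted, as in (39)
    A_bar = C_right @ np.linalg.pinv(C_left)   # pseudo-inverse solution (40)
    b_bar = C_left[:, 0]                       # first column (41)
    return A_bar, b_bar

# Hypothetical second-order test data (n = 2), with p = 5 > n pulse durations.
A_true = np.array([[0.5, 0.1],
                   [0.0, 0.3]])
b_true = np.array([1.0, 2.0])
p = 5
C_bar = np.column_stack(
    [np.linalg.matrix_power(A_true, k) @ b_true for k in range(p)]
)

A_id, b_id = identify_from_controllability(C_bar)
```

Because the left-shifted matrix has full row rank here, the pseudo-inverse recovers the state matrix exactly up to round-off; with fewer than $n + 1$ pulse durations it would not.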

For the identified matrix $\bar A_{ij}$ to have rank $n$, both $n \times (p-1)$ matrices $\mathcal{C}_{ij\rightarrow}$ and $\mathcal{C}_{ij\leftarrow}$ must also have rank $n$. This implies that $p$ must be chosen such that $p > n$, so identification of $\bar A_{ij}$ requires at least $(n+1)$ sets of responses generated by $(n+1)$ different durations of the pulse input. From the definitions of $\bar A_{ij}$ and $\bar b_{ij}$ in (7) and (8), converting from discrete time to continuous time produces

$$A_{cij} = A_c + N_{ci}\gamma_i + N_{cj}\gamma_j + N_{cii}\gamma_i^2 + N_{cjj}\gamma_j^2 + N_{cij}\gamma_i\gamma_j = \frac{1}{\Delta t}\log\left(\bar A_{ij}\right) \tag{42}$$

and

$$b_{cij} = \left[ I_{n\times n}\,\Delta t + \frac{1}{2!}A_{cij}(\Delta t)^2 + \frac{1}{3!}(A_{cij})^2(\Delta t)^3 + \cdots \right]^{-1} \bar b_{ij} \tag{43}$$
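As a rough numerical illustration of the conversion in (42) and (43), the sketch below assumes a diagonalizable state matrix with positive real discrete-time eigenvalues, so that the matrix logarithm can be taken through an eigendecomposition. The helper names and the test values (`A_c_true`, `b_c_true`, `dt`) are hypothetical.

```python
import numpy as np

def matrix_log(A):
    """Matrix logarithm via eigendecomposition (assumes A is diagonalizable)."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(np.log(w.astype(complex))) @ np.linalg.inv(V)).real

def integral_series(A_c, dt, terms=30):
    """Truncated series I*dt + A_c*dt^2/2! + A_c^2*dt^3/3! + ... from (43)."""
    n = A_c.shape[0]
    total = np.zeros((n, n))
    term = np.eye(n) * dt                  # k = 0 term
    for k in range(terms):
        total += term
        term = term @ A_c * dt / (k + 2)   # next term: A_c^(k+1) dt^(k+2) / (k+2)!
    return total

def discrete_to_continuous(A_bar, b_bar, dt):
    A_c = matrix_log(A_bar) / dt                                # (42)
    b_c = np.linalg.solve(integral_series(A_c, dt), b_bar)      # (43)
    return A_c, b_c

# Hypothetical continuous-time pair used to build consistent discrete data.
dt = 1.0
A_c_true = np.array([[-1.0, 0.0],
                     [1.0, -2.0]])
b_c_true = np.array([1.0, 0.5])
w, V = np.linalg.eig(A_c_true)
A_bar = (V @ np.diag(np.exp(w * dt)) @ np.linalg.inv(V)).real   # e^(A_c dt)
b_bar = integral_series(A_c_true, dt) @ b_c_true

A_c_id, b_c_id = discrete_to_continuous(A_bar, b_bar, dt)
```

A general-purpose implementation would more likely rely on a library matrix logarithm; the eigendecomposition route is used here only to keep the sketch dependent on NumPy alone.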


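for $i, j = 1, 2, \ldots, r$; $i \neq j$, where $I_{n\times n}$ is an $n \times n$ identity matrix. Now recall from (9) that

$$b_{cij} = b_{ci}\gamma_i + b_{cj}\gamma_j; \quad i, j = 1, 2, \ldots, r;\ i \neq j \tag{44}$$

which yields

$$\begin{bmatrix} b_{c12} & b_{c13} & \cdots & b_{c(r-1)r} & \cdots \end{bmatrix} =
\begin{bmatrix} b_{c1} & b_{c2} & b_{c3} & \cdots & b_{c(r-1)} & b_{cr} \end{bmatrix} \otimes
\begin{bmatrix}
\gamma_1 & \gamma_1 & \cdots & 0 & \cdots \\
\gamma_2 & 0 & \cdots & 0 & \cdots \\
0 & \gamma_3 & \cdots & 0 & \cdots \\
\vdots & \vdots & & \vdots & \\
0 & 0 & \cdots & \gamma_{r-1} & \cdots \\
0 & 0 & \cdots & \gamma_r & \cdots
\end{bmatrix} \tag{45}$$

or equivalently, with $B_c = \begin{bmatrix} b_{c1} & b_{c2} & \cdots & b_{cr} \end{bmatrix}$,

$$\begin{bmatrix} b_{c12} & b_{c13} & \cdots \end{bmatrix} = B_c \otimes \Gamma \tag{46}$$

where the symbol $\otimes$ means the Kronecker product. The input matrix $B_c$ can thus be identified as

$$B_c = \begin{bmatrix} b_{c12} & b_{c13} & \cdots \end{bmatrix} \otimes \Gamma^{\dagger} \tag{47}$$

From (42), the matrices $N_{ci}$ and $N_{cij}$ ($i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, r$) are determined by

$$N_{ci}\gamma_i + N_{cj}\gamma_j + N_{cii}\gamma_i^2 + N_{cjj}\gamma_j^2 + N_{cij}\gamma_i\gamma_j = A_{cij} - A_c \tag{48}$$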

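Rewriting it into a matrix form yields

$$\begin{bmatrix} \gamma_i & \gamma_j & \gamma_i^2 & \gamma_j^2 & \gamma_i\gamma_j \end{bmatrix} \otimes
\begin{bmatrix} N_{ci} \\ N_{cj} \\ N_{cii} \\ N_{cjj} \\ N_{cij} \end{bmatrix} = A_{cij} - A_c \tag{49}$$

For the case where the system has only two inputs, denoted $u_i$ and $u_j$, we need five experiments with five sets of values $(\gamma_{i1}, \gamma_{j1}), \ldots, (\gamma_{i5}, \gamma_{j5})$:

$$\begin{bmatrix}
\gamma_{i1} & \gamma_{j1} & \gamma_{i1}^2 & \gamma_{j1}^2 & \gamma_{i1}\gamma_{j1} \\
\gamma_{i2} & \gamma_{j2} & \gamma_{i2}^2 & \gamma_{j2}^2 & \gamma_{i2}\gamma_{j2} \\
\gamma_{i3} & \gamma_{j3} & \gamma_{i3}^2 & \gamma_{j3}^2 & \gamma_{i3}\gamma_{j3} \\
\gamma_{i4} & \gamma_{j4} & \gamma_{i4}^2 & \gamma_{j4}^2 & \gamma_{i4}\gamma_{j4} \\
\gamma_{i5} & \gamma_{j5} & \gamma_{i5}^2 & \gamma_{j5}^2 & \gamma_{i5}\gamma_{j5}
\end{bmatrix} \otimes
\begin{bmatrix} N_{ci} \\ N_{cj} \\ N_{cii} \\ N_{cjj} \\ N_{cij} \end{bmatrix} =
\begin{bmatrix}
A_{c\,i_1 j_1} - A_c \\
A_{c\,i_2 j_2} - A_c \\
A_{c\,i_3 j_3} - A_c \\
A_{c\,i_4 j_4} - A_c \\
A_{c\,i_5 j_5} - A_c
\end{bmatrix} \tag{50}$$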

Matrices $N_{ci}$ and $N_{cij}$ ($i, j = 1, 2$) can then be computed by

$$\begin{bmatrix} N_{ci} \\ N_{cj} \\ N_{cii} \\ N_{cjj} \\ N_{cij} \end{bmatrix} =
\begin{bmatrix}
\gamma_{i1} & \gamma_{j1} & \gamma_{i1}^2 & \gamma_{j1}^2 & \gamma_{i1}\gamma_{j1} \\
\gamma_{i2} & \gamma_{j2} & \gamma_{i2}^2 & \gamma_{j2}^2 & \gamma_{i2}\gamma_{j2} \\
\gamma_{i3} & \gamma_{j3} & \gamma_{i3}^2 & \gamma_{j3}^2 & \gamma_{i3}\gamma_{j3} \\
\gamma_{i4} & \gamma_{j4} & \gamma_{i4}^2 & \gamma_{j4}^2 & \gamma_{i4}\gamma_{j4} \\
\gamma_{i5} & \gamma_{j5} & \gamma_{i5}^2 & \gamma_{j5}^2 & \gamma_{i5}\gamma_{j5}
\end{bmatrix}^{-1} \otimes
\begin{bmatrix}
A_{c\,i_1 j_1} - A_c \\
A_{c\,i_2 j_2} - A_c \\
A_{c\,i_3 j_3} - A_c \\
A_{c\,i_4 j_4} - A_c \\
A_{c\,i_5 j_5} - A_c
\end{bmatrix} \tag{51}$$

The matrix inverse $[\,\cdot\,]^{-1}$ becomes the matrix pseudo-inverse $[\,\cdot\,]^{\dagger}$ if more than five experiments are conducted for the pair of inputs. Note that the five pairs of values $\gamma_i$ and $\gamma_j$ should be chosen such that the matrix to be inverted is well-conditioned. For the general case where the number of inputs is $r > 1$, an equation similar to (51) may be formulated by performing $(r + r + r(r-1)/2)$ sets of experiments with two inputs at a time. Each set of experiments requires a minimum of $n + 1$ different pulse durations to compute the matrix $A_{cij}$ for that set, in order to establish enough equations to solve for the coupling coefficient matrices $N_{ci}$ and $N_{cij}$ for $i, j = 1, 2, \ldots, r$.
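The linear solve in (51) can be sketched as follows, reading the product in (49)–(51) as multiplication by the regressor matrix expanded with an identity block. Everything named here is an assumption made for illustration: the function `solve_coupling_matrices`, the magnitude pairs in `gammas`, and the randomly generated "true" matrices are hypothetical and are not the paper's numerical example.

```python
import numpy as np

def solve_coupling_matrices(gammas, A_cij_list, A_c):
    """Solve the stacked system of (50) for [N_ci, N_cj, N_cii, N_cjj, N_cij]."""
    n = A_c.shape[0]
    # One regressor row [g_i, g_j, g_i^2, g_j^2, g_i*g_j] per experiment.
    G = np.array([[gi, gj, gi**2, gj**2, gi * gj] for gi, gj in gammas])
    rhs = np.vstack([A - A_c for A in A_cij_list])
    K = np.kron(G, np.eye(n))                   # regressor acting on n x n blocks
    # lstsq equals the inverse for five experiments and the pseudo-inverse for more.
    stacked, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    blocks = [stacked[k * n:(k + 1) * n] for k in range(5)]
    return blocks, np.linalg.cond(G)

# Hypothetical data: five magnitude pairs and random coupling matrices.
rng = np.random.default_rng(0)
n = 2
A_c = np.array([[-1.0, 0.0],
                [1.0, -2.0]])
N_true = [rng.standard_normal((n, n)) for _ in range(5)]
gammas = [(0.5, 1.0), (1.0, 0.5), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
A_cij_list = [
    A_c + N_true[0] * gi + N_true[1] * gj + N_true[2] * gi**2
        + N_true[3] * gj**2 + N_true[4] * gi * gj
    for gi, gj in gammas
]

N_id, cond_G = solve_coupling_matrices(gammas, A_cij_list, A_c)
```

Returning the condition number of the regressor matrix alongside the solution makes it easy to check whether a chosen set of pulse magnitudes is a reasonable one before trusting the identified matrices.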

4 Numerical Example Consider the following bilinear equation for the two-input (r D 2) and single-output (m D 1) case xP D Ac x C Bc u C Nc1 xu1 C Nc2 xu2 C Nc11 xu21 C Nc22 xu22 C Nc12 xu1 u2 (52) y D Cx where          11 00 10 1 0 Ac D I I Nc2 D I C D 0 1 I Nc1 D I Bc D 00 11 01 1 2       0 0 1 1 1 1 I Nc22 D I Nc12 D I Nc11 D 1 1 0 0 1 1 (53) The system possesses five nonlinear terms, i.e., r C r C r.r  1/=2 D 5, in this two-input case. Five nonlinear terms implies the need of five independent sets of multiple-pulse response data with five different input values for identification of a complete set of system matrices, Ac ; Bc ; C; D; Nc1 ; Nc2 ; Nc11 ; Nc22 and Nc12 for the two-input case. The order of the system is generally unknown. Let us generate five sets of data with the time period  D 1 second for each pair of input values. The five sets of input values are shown as follows: 


Fig. 2 Five responses sampled at 1 Hz from the two inputs of amplitudes [0.4; 0.001] over five different sample periods (spp: sample-period pulse)



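$$\begin{bmatrix}
\gamma_{11} & \gamma_{12} & \gamma_{13} & \gamma_{14} & \gamma_{15} \\
\gamma_{21} & \gamma_{22} & \gamma_{23} & \gamma_{24} & \gamma_{25}
\end{bmatrix} =
\begin{bmatrix}
0.4 & 0.001 & 0.25 & 1 & 0.4 \\
0.001 & 1 & 0.25 & 2 & 0.1
\end{bmatrix} \tag{54}$$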

The first subscript of $\gamma$ indicates the input number, whereas the second subscript gives the experiment number. These five sets of pulse values were carefully selected to generate response histories of comparable amplitude. Figure 2 shows the pulse responses generated by exciting the bilinear system (52) with the first set of input values shown in (54). Each response, sampled at 1 Hz, has 12 data points. Similar pulse responses (not shown) are also generated by the other sets of input values. These responses are obtained by numerically integrating the bilinear system shown in (52). Using the five one-sample-period pulse responses and setting $\alpha = 5$ and $\beta = 6$, the Hankel matrix $H_1$ shown in (16) with five sets of input values has the size $m\alpha \times 5\beta = 5 \times 30$. The singular values of this Hankel matrix are

$$\Sigma_1 = \mathrm{diag}\begin{bmatrix} 1.0114 & 0.8767 & 0 & 0 & 0 \end{bmatrix} \tag{55}$$

which implies the system order $n = 2$. The left singular vector matrix is

3 00:8827 00:4673 6 7 6 7 6 00:4313 00:7638 7 6 7 6 7 U1 D 6 7 D 6 00:1731 00:4076 7 7 4 5 6 4 00:0656 00:1671 5 4 CA 00:0244 00:0638 2

C CA :: :

3

The state matrix and the output matrix identified from this Hankel matrix are

(56)


      00:5674 30:8003 Q I CQ D 00:8827 00:4673 I D D 0 0 Ac D 00:1631 20:4326


(57)

Using the other multiple-sample-period responses, the Hankel matrices Hk shown in (32) for k = 2, 3, 4, 5 have the size of 5  5 (i.e., ˛ number of input pairs), that produce the 2  5 matrices BN1 ; BN 2 ;    ; BN 5 shown in (34), and in turn yield 2  4 matrices Cij and Cij ! shown in (38) and (39). Applying (35)–(44), the quantity Bc is identified as   00:1241 00:9444 I (58) BQc D 00:2344 00:3560 Applying (51) thus yields the identified quantities:   00:2759 00:3833 Q I Nc2 D 00:5212 00:7241

(59)

   00:4326 30:8003 00:0568 00:4993 I NQ c22 D I 00:1631 10:4326 00:1074 00:9432   00:7085 40:1836 D 00:3581 00:7085

(60)

NQ c1 D and



NQ c11 D NQ c12

 20:0999 20:9177 I 00:7916 10:0999



The tilde on top of $A_c$, $B_c$, $N_{c1}$, $N_{c2}$, $N_{c11}$, $N_{c22}$, and $N_{c12}$ signifies identified quantities that are not uniquely determined. The identified matrices $\tilde A_c$, $\tilde B_c$, $\tilde C$, $\tilde N_{c1}$, $\tilde N_{c2}$, $\tilde N_{c11}$, $\tilde N_{c22}$, and $\tilde N_{c12}$ are equivalent to the matrices $A_c$, $B_c$, $N_{c1}$, $N_{c2}$, $N_{c11}$, $N_{c22}$, and $N_{c12}$ shown in (53) in the sense that they give the same map from the input $u$ to the output $y$. Note that the eigenvalues of $A_c$ and $\tilde A_c$ are identical, i.e., $-1$ and $-2$. A coordinate transformation can transform one set of matrices into the other (see [6]).

5 Concluding Remarks

This paper is part of a larger effort to study continuous-time bilinear system identification. A series of studies has been carried out for several cases with various aspects of nonlinearity consisting of the product of the state vector and force variables. The whole identification process requires a series of pulse response experiments with various pulse durations and a finite number of pairs of input values. For simplicity, only two coupling force variables at a time are considered in this paper. The identification process can be easily extended to more general cases with any fixed number of coupling force variables.

The derivation of the identification algorithm relies on noise-free input and output data. In practice, only noisy signals are measured and model errors always exist. Furthermore, the initial state may not be at rest. These practical limitations of the proposed method have not been addressed in this paper. Nevertheless, it is anticipated that the accuracy indicators that already exist for linear system identification with noisy measurements may be enhanced for bilinear system identification.

In the past decade, many identification methods were developed for discrete-time bilinear systems. It is known at this moment that continuous-time bilinear systems do not have an explicit discrete-time counterpart as in the linear case. It remains an open question whether there exists an explicit or implicit discrete-time version of the continuous-time bilinear systems, in the sense that one can be transformed into the other as in the linear case.

Acknowledgements The major portion of this research was completed when the author served as President of National Applied Research Laboratory, Taipei, Taiwan.

References

1. Bruni, C., DiPillo, G., and Koch, G., On the Mathematical Models of Bilinear Systems, Ricerche Di Automatica, 2 (1), 1971, pp. 11–26.
2. Bruni, C., DiPillo, G., and Koch, G., Bilinear Systems: An Appealing Class of Nearly Linear Systems in Theory and Application, IEEE Transactions on Automatic Control, AC-19, 1974, pp. 334–348.
3. Mohler, R. R., and Kolodziej, W. J., An Overview of Bilinear System Theory and Applications, IEEE Transactions on Systems, Man and Cybernetics, SMC-10, 1980, pp. 683–688.
4. Mohler, R. R., Nonlinear Systems: Vol. II, Applications to Bilinear Control, Prentice-Hall, New Jersey, 1991.
5. Elliott, D. L., Bilinear Systems, in Encyclopedia of Electrical Engineering, Vol. II, John Webster (ed.), John Wiley and Sons, New York, 1999, pp. 308–323.
6. Juang, J.-N., Continuous-Time Bilinear System Identification, Nonlinear Dynamics, Kluwer Academic Publishers, Special Issue 39(1-2), (January I-II 2005), pp. 79–94.
7. Sontag, E.D., Wang, Y., Megretski, A., Input Classes for Identification of Bilinear Systems, 2007 American Control Conference, July 11–13, 2007, Marriott Marquis Hotel at Time Square, New York, USA, Paper FrA04.3.
8. Juang, J.-N., Generalized Bilinear System Identification, The Journal of the Astronautical Sciences, Vol. 57, Nos. 1 & 2, January-June 2009, pp. 261–273.
9. Juang, J.-N., Applied System Identification, Prentice Hall, New Jersey, 1994.
10. Juang, J.-N. and Phan, M. Q., Identification and Control of Mechanical Systems, Cambridge University Press, New York, 2001.

Reduced-Order Wave-Propagation Modeling Using the Eigensystem Realization Algorithm

Stephen A. Ketcham, Minh Q. Phan, and Harley H. Cudney

Abstract This paper presents a computationally efficient version of the Eigensystem Realization Algorithm (ERA) to model the dynamics of large-domain acoustic propagation from High Performance Computing (HPC) data. This adaptation of the ERA permits hundreds of thousands of output signals to be handled at a time. Once the ERA-derived reduced-order models are obtained, they can be used for future simulation of the propagation accurately without having to go back to the HPC model. Computations that take hours on a massively parallel high performance computer can now be carried out in minutes on a laptop computer.

S.A. Ketcham · H.H. Cudney
Engineer Research and Development Center, Hanover, NH 03755, USA

M.Q. Phan
Thayer School of Engineering, Dartmouth College, Hanover, NH 03755, USA

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_15, © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

Simulation of linear-time-invariant systems has broad application in outdoor sound propagation. On a site-specific basis, particularly for geometrically complex and heterogeneous domains, sound propagation can be best examined and understood using results of three-dimensional numerical simulations. However, to analyze large-domain, long-duration, and wide-bandwidth systems associated with many practical scenarios, these simulations require massively parallel computations merely to approach an acceptable fidelity between simulation and actual outdoor sound transmission. This research, recognizing the computational investment inherent in every high-performance numerical model, develops reduced-order models (ROMs) from input–output signals of sound-propagation supercomputing simulations [1, 2]. The method uses a state-space minimum-realization technique called the Eigensystem Realization Algorithm (ERA) [3–6] to generate efficient and reusable ROMs for very large wave fields that are down-selected from the entire output field of the numerical model.

For problems where the number of outputs is in the hundreds of thousands or millions, conventional ERA requires the singular value decomposition (SVD) of a very large matrix. This calculation is expensive in terms of both computational time and random access memory (RAM) requirement. Hence there is a need for a modified ERA formulation that avoids using a large amount of RAM and the SVD of a large matrix.

It should be noted that outdoor propagation cannot be described with classical modal decomposition. The implementation, therefore, focuses on finding a model that results in nearly exact reproduction of the Markov parameters (i.e., system unit pulse response samples) for the duration and the dynamic range of interest. Compared to results generated by the urban-acoustics HPC simulation, the prediction error of the reduced-order model derives primarily from any inaccuracies in the computed Markov parameters, rather than from the realization technique. Our study of the wave-field error levels of a reduced-order model with 1.26 million outputs indicates that the method is capable of reducing a supercomputer model to a model that can operate on a laptop computer.

The outline of this paper is as follows. First, we review the original ERA algorithm. Next, we describe a modified version for computational efficiency. Finally, we present a numerical illustration that shows how the technique is used to derive highly accurate reduced-order models from HPC simulation data of sound propagation in a highly complex environment. A companion paper [7] describes another technique where the reduced-order models can be directly realized from the Markov parameters without going through ERA.

2 State-Space Model

The discrete-time state-space model for a linear time-invariant dynamical system takes the form

$$\begin{aligned} x(k+1) &= A\,x(k) + B\,u(k) \\ y(k) &= C\,x(k) \end{aligned} \tag{1}$$

In (1), $k$ is the integer sample index, $x$ is an $n$-dimensional state vector, $u$ is an $m$-dimensional input vector, and $y$ is a $q$-dimensional output vector. $A$ is the $n$-by-$n$ system matrix, $B$ is the $n$-by-$m$ input influence matrix, and $C$ is the $q$-by-$n$ output influence matrix. For example, the input can be a single sound source, and the output can be sound pressures at nodes throughout a domain of interest. The system Markov parameters, $h(k)$, are

$$h(k) = C A^{k-1} B, \quad k = 1, 2, \ldots \tag{2}$$

These can be obtained from the source and response data by various techniques, including the inverse FFT method as described in [6]. The next section describes how a state-space model (1) can be realized from a sufficient number of Markov parameters.
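A short sketch may help connect (1) and (2): the Markov parameters are the samples of the response to a unit pulse applied at $k = 0$ with the system initially at rest. The small single-input, three-output system below is hypothetical and chosen only for illustration.

```python
import numpy as np

def markov_parameters(A, B, C, count):
    """Return [h(1), ..., h(count)] with h(k) = C A^(k-1) B, as in (2)."""
    params = []
    AkB = B.copy()                 # holds A^(k-1) B, starting with k = 1
    for _ in range(count):
        params.append(C @ AkB)
        AkB = A @ AkB
    return params

def pulse_response(A, B, C, steps):
    """Simulate (1) from rest with a unit pulse on the single input at k = 0."""
    x = np.zeros(A.shape[0])
    outputs = []
    for k in range(steps):
        u = np.array([1.0]) if k == 0 else np.array([0.0])
        outputs.append(C @ x)      # y(k) = C x(k)
        x = A @ x + B @ u          # x(k+1) = A x(k) + B u(k)
    return outputs

# Hypothetical system with n = 2 states, m = 1 input, q = 3 outputs.
A = np.array([[0.9, 0.2],
              [-0.2, 0.7]])
B = np.array([[1.0],
              [0.5]])
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

h = markov_parameters(A, B, C, 5)
y = pulse_response(A, B, C, 6)     # y[k] equals the single column of h(k) for k >= 1
```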

3 Eigensystem Realization Algorithm (ERA)

Starting with the Markov parameters, a state-space model of the system can be derived by ERA. The algorithm begins by forming two $r$-by-$s$ block Hankel matrices $H(0)$ and $H(1)$ as follows:

$$H(0) = \begin{bmatrix}
h(1) & h(2) & \cdots & h(s) \\
h(2) & h(3) & \cdots & h(s+1) \\
\vdots & \vdots & \ddots & \vdots \\
h(r) & h(r+1) & \cdots & h(r+s-1)
\end{bmatrix} \tag{3}$$

$$H(1) = \begin{bmatrix}
h(2) & h(3) & \cdots & h(s+1) \\
h(3) & h(4) & \cdots & h(s+2) \\
\vdots & \vdots & \ddots & \vdots \\
h(r+1) & h(r+2) & \cdots & h(r+s)
\end{bmatrix} \tag{4}$$

The minimum order $n$ of the system is revealed by the singular value decomposition of the Hankel matrix $H(0)$,

$$H(0) = U \Sigma V^{T} \tag{5}$$

The columns of $U$ and $V$ are orthonormal, $\Sigma$ is an $n$-by-$n$ diagonal matrix of positive singular values, and $n$ is the minimum order of the system. Define a $q$-by-$rq$ matrix $E_q^{T}$ and an $m$-by-$sm$ matrix $E_m^{T}$ consisting of identity and null matrices of the form

$$E_q^{T} = \begin{bmatrix} I_{q\times q} & 0_{q\times (r-1)q} \end{bmatrix}, \quad E_m^{T} = \begin{bmatrix} I_{m\times m} & 0_{m\times (s-1)m} \end{bmatrix} \tag{6}$$

A discrete-time minimum-order realization of the system can be shown to be

$$\begin{aligned}
A_r &= \Sigma^{-1/2} U^{T} H(1) V \Sigma^{-1/2} \\
B_r &= \Sigma^{1/2} V^{T} E_m \\
C_r &= E_q^{T} U \Sigma^{1/2}
\end{aligned} \tag{7}$$

In (7) the subscript $r$ is added to indicate that the realized state-space model is not necessarily in the same coordinates as the original state-space model in (1), but they have the same input–output map, i.e., the same Markov parameters. With perfect Markov parameters, the singular value decomposition in (5) reveals the true minimum order $n$ of the system if $r$ and $s$ are chosen to be sufficiently large so that the rank $n$ of $H(0)$ can be revealed. In the presence of noise, or when the Markov parameters are imperfect, the Hankel matrix $H(0)$ will contain more than $n$ nonzero singular values, so that the true minimum order of the system is no longer obvious. In that case the order of the realization in (7) is determined by the number of singular values that the user decides to retain in (5).
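The steps (3)–(7) can be sketched compactly in NumPy. This is a minimal illustration assuming noise-free Markov parameters and a simple relative threshold for the retained singular values; the function names, the threshold `tol`, and the test system are assumptions made here rather than anything prescribed by the paper.

```python
import numpy as np

def block_hankel(h, r, s, shift):
    """Block Hankel matrix whose (i, j) block is h(i + j + 1 + shift); h[0] is h(1)."""
    return np.block([[h[i + j + shift] for j in range(s)] for i in range(r)])

def era(h, r, s, q, m, tol=1e-9):
    """Basic ERA following (3)-(7); keeps singular values above tol * largest."""
    H0 = block_hankel(h, r, s, 0)
    H1 = block_hankel(h, r, s, 1)
    U, sv, Vt = np.linalg.svd(H0, full_matrices=False)
    n = int(np.sum(sv > tol * sv[0]))            # retained model order
    U, sv, Vt = U[:, :n], sv[:n], Vt[:n, :]
    S_half = np.diag(np.sqrt(sv))
    S_inv_half = np.diag(1.0 / np.sqrt(sv))
    A_r = S_inv_half @ U.T @ H1 @ Vt.T @ S_inv_half
    B_r = (S_half @ Vt)[:, :m]                   # first m columns
    C_r = (U @ S_half)[:q, :]                    # first q rows
    return A_r, B_r, C_r

# Hypothetical system (n = 2, m = 1, q = 3) used to generate Markov parameters.
A = np.array([[0.9, 0.2], [-0.2, 0.7]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
r, s = 4, 4
h = [C @ np.linalg.matrix_power(A, k) @ B for k in range(r + s)]   # h(1)..h(r+s)

A_r, B_r, C_r = era(h, r, s, q=3, m=1)
h_r = [C_r @ np.linalg.matrix_power(A_r, k) @ B_r for k in range(r + s)]
```

The realized matrices generally differ from the originals by a coordinate transformation, so the meaningful check is that the Markov parameters of the two models agree.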

4 Computationally Efficient Version of ERA

Consider a system where the number of outputs is many orders of magnitude larger than the number of inputs, $q \gg m$. Instead of working with the original state-space model, we work with its transposed model,

$$\bar A = A^{T}, \quad \bar B = C^{T}, \quad \bar C = B^{T} \tag{8}$$

It follows that the Markov parameters of the transposed model are the transpose of the Markov parameters of the original model,

$$\bar C \bar A^{k} \bar B = B^{T} \left(A^{T}\right)^{k} C^{T} = \left(C A^{k} B\right)^{T} \tag{9}$$

Applying ERA to the Markov parameters of the transposed system produces

$$\begin{aligned}
\bar A_r &= \bar\Sigma^{-1/2} \bar U^{T} \bar H(1) \bar V \bar\Sigma^{-1/2} \\
\bar B_r &= \bar\Sigma^{1/2} \bar V^{T} E_q \\
\bar C_r &= E_m^{T} \bar U \bar\Sigma^{1/2}
\end{aligned} \tag{10}$$

where $\bar H(0) = \bar U \bar\Sigma \bar V^{T}$, and both Hankel matrices $\bar H(0)$ and $\bar H(1)$ are built from the transposed Markov parameters. When $q \gg m$, the matrices $\bar H(0)$ and $\bar H(1)$ are both very wide. To compute the SVD of $\bar H(0)$ efficiently, recognize that

$$\bar H(0) \bar H^{T}(0) = \bar U \bar\Sigma \bar V^{T} \left(\bar U \bar\Sigma \bar V^{T}\right)^{T} = \bar U \bar\Sigma^{2} \bar U^{T} \tag{11}$$

The product $\bar H(0)\bar H^{T}(0)$, which is a square symmetric matrix of much lower dimensions, has $\bar U$ as its matrix of left singular vectors. The singular values of $\bar H(0)\bar H^{T}(0)$ are the squares of the singular values of $\bar H(0)$. The matrices $\bar V$ and $\bar V^{T}$ can be expressed as

$$\bar V = \bar H(0)^{T} \bar U \bar\Sigma^{-1} \tag{12}$$

$$\bar V^{T} = \bar\Sigma^{-1} \bar U^{T} \bar H(0) \tag{13}$$

The Hankel matrices formed by the transposed Markov parameters are the transpose of the Hankel matrices formed by the original Markov parameters,

$$\bar H(0) = H(0)^{T}, \quad \bar H(1) = H(1)^{T} \tag{14}$$

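Therefore,

$$H(0)^{T} H(0) = \bar H(0) \bar H(0)^{T} = \bar U \bar\Sigma^{2} \bar U^{T} \tag{15}$$

Substituting (12), (13), and (14) into (10) produces

$$\begin{aligned}
\bar A_r &= \bar\Sigma^{-1/2} \bar U^{T} \left[ H(1)^{T} H(0) \right] \bar U \bar\Sigma^{-3/2} \\
\bar B_r &= \bar\Sigma^{-1/2} \bar U^{T} H(0)^{T} E_q \\
\bar C_r &= E_m^{T} \bar U \bar\Sigma^{1/2}
\end{aligned} \tag{16}$$

Recognizing the relationship between the transposed model and the original model, we arrive at the final formulas for $A_r$, $B_r$, $C_r$ as

$$\begin{aligned}
A_r &= \bar\Sigma^{-3/2} \bar U^{T} \left[ H(0)^{T} H(1) \right] \bar U \bar\Sigma^{-1/2} \\
B_r &= \bar\Sigma^{1/2} \bar U^{T} E_m \\
C_r &= E_q^{T} H(0) \bar U \bar\Sigma^{-1/2}
\end{aligned} \tag{17}$$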

From (11), $\bar U$ and $\bar\Sigma$ are obtained from the SVD of $H(0)^{T} H(0) = \bar U \bar\Sigma^{2} \bar U^{T}$. The formulas in (17) are computationally efficient because both $H(0)^{T} H(0)$ and $H(0)^{T} H(1)$ have small dimensions when $q \gg m$. There is no need to form $H(0)$ and $H(1)$ explicitly because only their products, $H(0)^{T} H(0)$ and $H(0)^{T} H(1)$, are called for. It should be noted that given $p = r + s$ Markov parameters, $H(0)$ and $H(1)$ with the largest number of columns are:

$$H(0) = \begin{bmatrix}
h(1) & h(2) & \cdots & h(p-2) & h(p-1) \\
h(2) & h(3) & \cdots & h(p-1) & h(p) \\
h(3) & h(4) & \cdots & h(p) & 0 \\
h(4) & h(5) & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
h(p-1) & h(p) & \cdots & 0 & 0
\end{bmatrix} \tag{18}$$

$$H(1) = \begin{bmatrix}
h(2) & h(3) & \cdots & h(p-1) & h(p) \\
h(3) & h(4) & \cdots & h(p) & 0 \\
h(4) & h(5) & \cdots & 0 & 0 \\
h(5) & h(6) & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
h(p) & 0 & \cdots & 0 & 0
\end{bmatrix} \tag{19}$$

In (18) and (19), $h(k) = 0$ for $k > p$, which is appropriate for modeling the propagation dynamics during a finite-time interval $k = 0, 1, \ldots, p$. For $q \gg m$, these Hankel matrices produce the largest possible state-space model, consisting of $(p-1)m$ states. Common terms that are present in the products $H(0)^{T} H(0)$ and $H(0)^{T} H(1)$ do not have to be computed twice. We use an algorithm that computes only the unique multiplications in $H(0)^{T} H(0)$ and $H(0)^{T} H(1)$ in forming these products. Finally, as the matrices $E_m$ and $E_q^{T}$ in (17) simply pick out the first $m$ columns and the first $q$ rows of $\bar U^{T}$ and $H(0)$, respectively, the computation of $B_r$ and $C_r$ is simpler than it may appear.

Fig. 1 3D HPC simulation model and RMS of output signals (Pa) from source at center

Fig. 2 Waveform of filtered pulse source and frequency spectrum
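The formulas in (17) can be sketched as follows. For clarity this illustration forms $H(0)$ and $H(1)$ explicitly and then takes their products, which is exactly what a memory-conscious implementation would avoid; the function names, the eigenvalue threshold `tol`, and the small test system are assumptions made for this sketch.

```python
import numpy as np

def block_hankel(h, r, s, shift):
    """Block Hankel matrix whose (i, j) block is h(i + j + 1 + shift); h[0] is h(1)."""
    return np.block([[h[i + j + shift] for j in range(s)] for i in range(r)])

def era_gram(h, r, s, q, m, tol=1e-10):
    """ERA via the small products H(0)^T H(0) and H(0)^T H(1), as in (17)."""
    H0 = block_hankel(h, r, s, 0)
    H1 = block_hankel(h, r, s, 1)
    G0 = H0.T @ H0                    # (s*m) x (s*m), small when q >> m
    G1 = H0.T @ H1
    w, U = np.linalg.eigh(G0)         # eigenvalues are squared singular values
    order = np.argsort(w)[::-1]       # sort in descending order
    w, U = w[order], U[:, order]
    n = int(np.sum(w > tol * w[0]))   # retained model order
    sig = np.sqrt(w[:n])              # singular values of H(0)
    U = U[:, :n]
    A_r = np.diag(sig ** -1.5) @ U.T @ G1 @ U @ np.diag(sig ** -0.5)
    B_r = (np.diag(sig ** 0.5) @ U.T)[:, :m]     # first m columns
    C_r = H0[:q, :] @ U @ np.diag(sig ** -0.5)   # first q rows of H(0)
    return A_r, B_r, C_r

# Hypothetical system (n = 2, m = 1, q = 3) used to generate Markov parameters.
A = np.array([[0.9, 0.2], [-0.2, 0.7]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
r, s = 4, 4
h = [C @ np.linalg.matrix_power(A, k) @ B for k in range(r + s)]   # h(1)..h(r+s)

A_r, B_r, C_r = era_gram(h, r, s, q=3, m=1)
h_r = [C_r @ np.linalg.matrix_power(A_r, k) @ B_r for k in range(r + s)]
```

One trade-off worth keeping in mind is that working with the Gram matrix squares the condition number, so very small singular values are resolved less accurately than with a direct SVD of $H(0)$.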

5 Numerical Illustration

A 3D HPC model of a city center (778 m by 775 m by 179 m in height) with 2.8 billion nodes is used to simulate the dynamic propagation of a sound source (Fig. 1). The output layer contains 1.26 million nodes above the ground and rooftops. The street-level source is at the model center. The sound source is a filtered pulse with the waveform shown on the left side of Fig. 2. The source concentrates energy in a particular frequency range, as shown on the right side of Fig. 2. Time series of sound levels at the 1.26 million output locations are used in the identification. Each series is 1,024 samples long, which corresponds to 4.36 s of propagation. These outputs are divided further into 11 strips, each with 114,240 outputs. This set of time-domain input-output data, one strip at a time, is used to generate the system Markov parameters by the inverse FFT method. From these Markov parameters, the modified version of ERA is applied to produce 11 reduced-order single-input 114,240-output state-space models, one model per strip. The HPC simulation was performed on a Cray XT3 supercomputer with 256 CPUs and 512 GB of RAM. The ERA model, on the other hand, operates using a single core and about 2.6 GB of RAM on a laptop computer, accessing the Markov parameters by virtual memory.

To verify these reduced-order models, they are used to produce the pulse responses at the 1.26 million output nodes, and these responses are compared to the Markov parameters computed from the original HPC model. Figure 3 shows the time-series agreement for the highly scattered signals at selected locations. Using the ordering of the singular values, plotted in Fig. 4 for the 11 strips, the effect of model order reduction from 1,024 to 768 to 512 to 256 states is illustrated by spatial relative error plots in Fig. 5. As the plots quantify, the error increases as the model order is reduced, revealing the accuracy expected within the duration of the Markov parameters. The higher-order models capture more propagation dynamics, hence better prediction quality is expected.

Further verification is by an HPC simulation with an independent source, shown in Fig. 6 in both time and frequency. This input signal is a superposition of two Gaussian pulses and three harmonics. The source is longer than the 4.36-s duration of the Markov parameter sequence, in order to test accuracy when ignoring late-arriving scattered energy. When comparing HPC results and the 1,024-state model signals before 4.36 s, the median relative error over the output field is 1.6%. This error is 6% when comparing the full 6.5-s duration of the Fig. 6 signals, revealing the importance of capturing the desired dynamic range in the Markov parameters for models with continuous sources.

Regarding efficiency of the ROM compared to the HPC simulation, the 1,024-state model with the Fig. 6 source operates on a laptop computer with reduction factors of about 10,000 in computational requirements (number of cores × seconds) and about 2 million in combined computational and memory requirements (number of cores × seconds × bytes). The utility of the reduced-order models is thus clearly demonstrated.

Fig. 3 Markov parameters (Pa s) of reduced-order model and HPC model

Fig. 4 Singular value plots for all 11 strips

Fig. 5 Relative error of 256-, 512-, 768-, and 1024-state reduced-order models (top left, top right, bottom left, bottom right, respectively)

Fig. 6 Waveforms and spectra of source signals superimposed to test the reduced-order models
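The figures quoted in this section can be cross-checked with a few lines of arithmetic. The storage estimate at the end assumes 8-byte double-precision values, which the text does not state; it is included only to indicate the order of magnitude of the data handled per strip.

```python
# Quantities stated in the text.
samples_per_series = 1024
duration_s = 4.36
strips = 11
outputs_per_strip = 114_240

# Derived quantities.
sample_interval_s = duration_s / samples_per_series   # roughly 4.26 ms per sample
total_outputs = strips * outputs_per_strip            # about 1.26 million outputs

# Assumption: one 8-byte float per Markov-parameter entry (single input).
bytes_per_value = 8
markov_bytes_per_strip = outputs_per_strip * samples_per_series * bytes_per_value
markov_gb_per_strip = markov_bytes_per_strip / 1e9

print(f"sample interval: {sample_interval_s * 1e3:.2f} ms")
print(f"total outputs:   {total_outputs:,}")
print(f"Markov data per strip (float64 assumed): {markov_gb_per_strip:.2f} GB")
```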

6 Conclusions

In this work we have developed a computationally efficient version of the Eigensystem Realization Algorithm to derive reduced-order models from wave-field data. This version of ERA can handle systems with hundreds of thousands of outputs. A high-fidelity HPC simulation code is used to generate the acoustic responses to a source input in the entire 3D domain, from which a subset of output locations of interest is selected. The inverse FFT technique is used to recover the system Markov parameters for use by the computationally efficient version of ERA developed in this paper. When the dynamic responses to a different source input are needed, the ERA-derived reduced-order models can be used in place of the original HPC simulation code, resulting in many orders of magnitude savings in computational requirements. To test the validity of the reduced-order models, the predicted acoustic wave-field signals from the reduced-order models are compared to the HPC-generated responses over a large and highly resolved domain. With care in the generation of the Markov parameters, highly accurate reduced-order models that describe the dynamics of sound propagation with severe scattering can be produced by the developed technique.

Acknowledgements This research is supported in part by In-House Laboratory Independent Research (ILIR) and a US Department of the Army Small Business Technology Transfer (STTR) subcontract to Dartmouth College by Sound Innovations, Inc. The authors thank Mr. Michael W. Parker, who contributed to the HPC simulations.

References

1. Ketcham, S.A., Parker, M.W., Cudney, H.H., and Wilson, D.K.: Scattering of Urban Sound Energy from High-Performance Computations. DoD High Performance Computing Modernization Program Users Group Conference, IEEE Computer Society, pp. 341–348 (2008).
2. Cudney, H.H., Ketcham, S.A., and Parker, M.W.: Verification of Acoustic Propagation Over Natural and Synthetic Terrain. DoD High Performance Computing Modernization Program Users Group Conference, IEEE Computer Society, pp. 247–252 (2007).
3. Ho, B.L., Kalman, R.E.: Effective Construction of Linear State-Variable Models from Input–Output Functions. Regelungstechnik, 14, 545–548 (1966).
4. Juang, J.-N., Pappa, R.S.: An Eigensystem Realization Algorithm for Modal Parameter Identification and Model Reduction. Journal of Guidance, Control, and Dynamics, 8, 620–627 (1985).
5. Juang, J.-N., Cooper, J.E., Wright, J.R.: An Eigensystem Realization Algorithm Using Data Correlations (ERA/DC) for Modal Parameter Identification. Control Theory and Advanced Technology, 4, No. 1, 5–14 (1988).
6. Juang, J.-N.: Applied System Identification. Prentice-Hall, Upper Saddle River, NJ (2001).
7. Phan, M.Q., Ketcham, S.A., Darling, R.S., Cudney, H.H.: Superstable State-Space Representation for Large-Domain Wave Propagation. Proceedings of the 4th International Conference on High Performance Scientific Computing, Hanoi, Vietnam (2009).



Complementary Condensing for the Direct Multiple Shooting Method

Christian Kirches, Hans Georg Bock, Johannes P. Schlöder, and Sebastian Sager

Abstract In this contribution we address the efficient solution of optimal control problems of dynamic processes with many controls. Such problems typically arise from the convexification of integer control decisions. We treat this problem class using the direct multiple shooting method to discretize the optimal control problem. The resulting nonlinear problems are solved using an SQP method. Concerning the solution of the quadratic subproblems we present a factorization of the QP’s KKT system, based on a combined null-space range-space approach exploiting the problem’s block sparse structure. We demonstrate the merit of this approach for a vehicle control problem in which the integer gear decision is convexified.

C. Kirches · H.G. Bock · J.P. Schlöder · S. Sager
Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Im Neuenheimer Feld 368, 69120 Heidelberg, Germany

H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_16, © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

Mixed-integer optimal control problems (MIOCPs) in ordinary differential equations (ODEs) have a high potential for optimization. A typical example is the choice of gears in transport [6, 8, 9, 14, 19]. Direct methods, in particular all-at-once approaches [2, 3], have become the methods of choice for most practical OCPs. The drawback of direct methods with binary control functions is that they lead to high-dimensional vectors of binary variables. Because of the exponentially growing complexity of the problem, techniques from mixed-integer nonlinear programming will work only for small instances [20]. In past contributions [9, 12, 13, 15] we proposed to use an outer convexification with respect to the binary controls, which has several main advantages over standard formulations or convexifications, cf. [12, 13].

In an SQP framework for the solution of the discretized MIOCP, the outer convexification approach results in QPs with many control parameters. Classical methods [3] for exploiting the block sparse structure of the discretized OCP leave room for improvement. In [16, 17], structured interior point methods for solving QP subproblems arising in SQP methods for the solution of discretized nonlinear OCPs are studied. A family of block structured factorizations for the arising KKT systems is presented. Extensions to tree-sparse convex programs can be found in [18].

In this contribution we present an alternative approach to solving the QPs arising from outer convexification of MIOCPs, showing that a certain factorization from [16] ideally lends itself to the case of many control parameters. We employ this factorization for the first time inside an active-set method. Comparisons of run times and complexity to classical condensing methods are presented.

2 Direct Multiple Shooting for Optimal Control

2.1 Optimal Control Problem Formulation

In this section we describe the direct multiple shooting method [3] as an efficient tool for the discretization and parameterization of a broad class of OCPs. We consider the following general class (1) of optimal control problems

\begin{align}
\min_{x(\cdot),\,u(\cdot)} \quad & l(x(\cdot), u(\cdot)) \tag{1a} \\
\text{s.t.} \quad & \dot{x}(t) = f(t, x(t), u(t)) && \forall t \in \mathcal{T}, \tag{1b} \\
& 0 \le c(t, x(t), u(t)) && \forall t \in \mathcal{T}, \tag{1c} \\
& 0 \leqq r_i(t_i, x(t_i)) && 0 \le i \le m, \tag{1d}
\end{align}

in which we strive to minimize an objective function $l(\cdot)$ depending on the trajectory $x(\cdot)$ of a dynamic process described in terms of a system $f$ of ordinary differential equations on the time horizon $\mathcal{T} := [t_0, t_f] \subset \mathbb{R}$, and governed by a control trajectory $u(\cdot)$ subject to optimization. The process trajectory $x(\cdot)$ and the control trajectory $u(\cdot)$ shall satisfy certain inequality path constraints $c$ on the time horizon $\mathcal{T}$, as well as (in-)equality point constraints $r_i$ on a grid of $m + 1$ grid points on $\mathcal{T}$,

\[ t_0 < t_1 < \ldots < t_{m-1} < t_m := t_f, \qquad m \in \mathbb{N},\ m \ge 1. \tag{2} \]

The direct multiple shooting method is applied to discretize the control trajectory $u(\cdot)$ to make this infinite dimensional problem computationally accessible.


2.2 Direct Multiple Shooting Discretization

Control Discretization. A discretization of the control trajectory $u(\cdot)$ on the shooting grid (2) is introduced, using control parameters $q_i \in \mathbb{R}^{n_i^q}$ and base functions $b_i : \mathcal{T} \times \mathbb{R}^{n_i^q} \to \mathbb{R}^{n^u}$. Examples are piecewise constant or linear functions.

\[ u(t) := \sum_{j=1}^{n_i^q} b_{ij}(t, q_{ij}), \qquad t \in [t_i, t_{i+1}] \subset \mathcal{T},\ 0 \le i \le m-1. \tag{3} \]

State Parameterization. In addition to the control parameter vectors, we introduce state vectors $s_i \in \mathbb{R}^{n^x}$ in all shooting nodes serving as initial values for $m$ IVPs

\[ \dot{x}_i(t) = f(t, x_i(t), q_i), \quad x_i(t_i) = s_i, \qquad t \in [t_i, t_{i+1}] \subset \mathcal{T},\ 0 \le i \le m-1. \tag{4} \]

This parameterization of the process trajectory $x(\cdot)$ will in general be discontinuous on $\mathcal{T}$. Continuity is ensured by introduction of additional matching conditions

\[ x_i(t_{i+1}; t_i, s_i, q_i) - s_{i+1} = 0, \qquad 0 \le i \le m-1, \tag{5} \]

where $x_i(t_{i+1}; t_i, s_i, q_i)$ denotes the evaluation of the $i$-th state trajectory $x_i(\cdot)$ at time $t_{i+1}$ depending on the start time $t_i$, initial value $s_i$, and control parameters $q_i$.

Constraint Discretization. The path constraints of problem (1) are enforced on the nodes of the shooting grid (2) only. It can be observed that in general this formulation already leads to a solution that satisfies the path constraints on the whole of $\mathcal{T}$.

\[ 0 \leqq r_i(t_i, s_i, q_i), \quad 0 \le i \le m-1, \qquad 0 \leqq r_m(t_m, s_m). \tag{6} \]
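The state parameterization (4) and the matching conditions (5) can be illustrated with a small sketch. It is not the authors' implementation: the scalar test dynamics, the fixed-step Runge-Kutta integrator, and the names `rk4` and `matching_residuals` are assumptions chosen only for illustration.

```python
import math

def rk4(f, t0, t1, s, q, steps=20):
    """Integrate x' = f(t, x, q) from t0 to t1 with x(t0) = s (scalar state)."""
    h = (t1 - t0) / steps
    t, x = t0, s
    for _ in range(steps):
        k1 = f(t, x, q)
        k2 = f(t + h / 2, x + h / 2 * k1, q)
        k3 = f(t + h / 2, x + h / 2 * k2, q)
        k4 = f(t + h, x + h * k3, q)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

def matching_residuals(f, grid, s, q):
    """Residuals x_i(t_{i+1}; t_i, s_i, q_i) - s_{i+1} of the matching conditions (5)."""
    return [rk4(f, grid[i], grid[i + 1], s[i], q[i]) - s[i + 1]
            for i in range(len(grid) - 1)]

# Hypothetical dynamics x' = -x + q with one constant control per interval.
def dynamics(t, x, q):
    return -x + q

grid = [0.0, 0.5, 1.0]        # m = 2 shooting intervals
q = [1.0, 0.0]                # piecewise constant control parameters
s_guess = [0.0, 0.0, 0.0]     # arbitrary node values: the trajectory has a jump
gaps = matching_residuals(dynamics, grid, s_guess, q)

# Node values taken from the exact solution close all gaps.
s1 = 1.0 - math.exp(-0.5)
s_exact = [0.0, s1, s1 * math.exp(-0.5)]
closed = matching_residuals(dynamics, grid, s_exact, q)
print(gaps, closed)
```

With arbitrary node values the first residual is nonzero, which is the discontinuity mentioned above; an NLP solver drives these residuals to zero together with the optimality conditions.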

Separable Objective. The objective function shall be separable with respect to the shooting grid structure,

\[ l(x(\cdot), u(\cdot)) = \sum_{i=0}^{m} l_i(s_i, q_i). \tag{7} \]

In general, $l(\cdot)$ will be a Mayer type function or a Lagrange type integral function. For both types, a separable formulation is easily found. Summarizing, the discretized multiple shooting optimal control problem can be cast as a nonlinear problem

\begin{align}
\min_{w} \quad & \sum_{i=0}^{m} l_i(w_i) \tag{8a} \\
\text{s.t.} \quad & 0 = x_i(t_{i+1}; t_i, w_i) - s_{i+1}, && 0 \le i \le m-1, \tag{8b} \\
& 0 \leqq r_i(w_i), && 0 \le i \le m, \tag{8c}
\end{align}


with the vector of unknowns $w := (s_0, q_0, \ldots, s_{m-1}, q_{m-1}, s_m)$ and subvectors $w_i := (s_i, q_i)$ for $0 \le i \le m-1$, and $w_m := (s_m)$. The evaluation of the matching condition constraint (8b) requires the solution of the initial value problem (4).

2.3 Block Sparse Quadratic Subproblem

For solving the highly structured NLP (8) we employ methods of SQP type, a long-standing and highly effective class of methods for the solution of NLPs that also allows for much flexibility in exploiting the problem's special structure. SQP methods iteratively progress towards a KKT point of the NLP by solving a linearly constrained local quadratic model of the NLP's Lagrangian [11]. For NLP (8) the local quadratic model of the Lagrangian, to be solved in each step of the SQP method, reads

\begin{align}
\min_{\delta w} \quad & \sum_{i=0}^{m} \left( \tfrac{1}{2}\, \delta w_i' B_i\, \delta w_i + g_i'\, \delta w_i \right) \tag{9a} \\
\text{s.t.} \quad & 0 = X_i(w_i)\, \delta w_i - \delta s_{i+1} - h_i(w_i), && 0 \le i \le m-1, \tag{9b} \\
& 0 \leqq R_i(w_i)\, \delta w_i - r_i(w_i), && 0 \le i \le m, \tag{9c}
\end{align}

with the following notations for the vector of unknowns $\delta w$ and its components

\[ \delta w_i := (\delta s_i, \delta q_i),\ 0 \le i \le m-1, \qquad \delta w_m := \delta s_m, \tag{10} \]

reflecting the notation used in (8), and with vectors $h_i$ denoting the residuals

\[ h_i(w_i) := x_i(t_{i+1}; t_i, w_i) - s_{i+1}. \tag{11} \]

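The matrices $B_i$ denote the node Hessians or suitable approximations, cf. [3], and the vectors $g_i$ denote the node gradients of the objective function, while the matrices $X_i$, $R_i^{\mathrm{eq}}$, and $R_i^{\mathrm{in}}$ denote linearizations of the constraint functions obtained in $w_i$,

\[ B_i \approx \frac{\mathrm{d}^2 l_i(w_i)}{\mathrm{d} w_i^2}, \quad g_i := \frac{\mathrm{d} l_i(w_i)}{\mathrm{d} w_i}, \quad R_i := \frac{\mathrm{d} r_i(w_i)}{\mathrm{d} w_i}, \quad X_i := \frac{\partial x_i(t_{i+1}; t_i, w_i)}{\partial w_i}. \tag{12} \]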

The computation of the sensitivity matrices $X_i$ requires the computation of derivatives of the solution of IVP (4) with respect to $w_i$. Consistency of the derivatives is ensured by applying the principle of internal numerical differentiation (IND) [1].


3 Block Sparse Quadratic Programming

3.1 Classical Condensing

In the classical condensing algorithm [3, 10], which works as a preprocessing step to obtain a small dense QP from the block sparse one, the matching conditions (9b) are used for block Gaussian elimination of the steps of the additionally introduced state variables $(\delta s_1, \ldots, \delta s_m)$. The resulting QP has $n^x + m n^q$ unknowns instead of $m(n^x + n^q)$, is usually densely populated, and is suited for solution with any standard QP code such as the null-space active-set codes QPOPT [7], qpOASES [4], or BQPD [5]. As we will see in Sect. 4, however, for MIOCPs with many control parameters (i.e. large dimension $n^q$) resulting from the outer convexification of integer control functions, the achieved reduction of the QP's size is marginal.
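How little condensing gains for large $n^q$ can be checked by counting unknowns. The sketch below is an illustration only; the function names are hypothetical, and the block sparse count includes the final node state $s_m$, which is what the matrix sizes reported in Table 3 suggest.

```python
def block_sparse_unknowns(m, nx, nq):
    """Unknowns of QP (9): (s_i, q_i) on nodes 0..m-1 plus the final state s_m."""
    return m * (nx + nq) + nx

def condensed_unknowns(m, nx, nq):
    """Unknowns left after eliminating the m node states s_1, ..., s_m."""
    return nx + m * nq

# Dimensions of the vehicle example in Sect. 4: m = 20, nx = 3, nq = 2+6 and 2+16.
for nq in (8, 18):
    sparse = block_sparse_unknowns(20, 3, nq)
    dense = condensed_unknowns(20, 3, nq)
    print(nq, sparse, dense, round(dense / sparse, 2))
```

For these dimensions the condensed QP keeps roughly three quarters or more of the unknowns while losing the sparsity of the original problem.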

3.2 The KKT System's Block Sparse Structure

In this section we present an alternative approach to solving the KKT system of QP (9), found in [16, 17], where it was employed inside an interior-point method. We derive in detail the necessary elimination steps that will ultimately retain the duals of the matching conditions only. In this sense, the approach is complementary to the classical condensing algorithm. For optimal control problems with dimensions $n^q \ge n^x$, the presented approach obviously is computationally more favorable than retaining unknowns of dimension $n^q$. In contrast to [16, 17] we employ this factorization approach inside an active-set method, and intend to further adapt it to this case by exploitation of simple bounds and derivation of matrix updates in a further publication. For a given active set, the KKT system of the QP (9) to be solved for the primal step $\delta w_i$ and the dual step $(\delta\lambda, \delta\mu)$ reads for $0 \le i \le m$

\begin{align}
P_i' \delta\lambda_{i-1} + B_i(-\delta w_i) + R_i' \delta\mu_i + X_i' \delta\lambda_i &= B_i w_i + g_i && =: \bar g_i, \tag{13a} \\
R_i(-\delta w_i) &= R_i w_i - r_i && =: \bar r_i, \tag{13b} \\
X_i(-\delta w_i) + P_{i+1}(-\delta w_{i+1}) &= X_i w_i + P_{i+1} w_{i+1} - h_i && =: \bar h_i, \tag{13c}
\end{align}

with multipliers $\delta\lambda_i \in \mathbb{R}^{n^x}$ for the matching conditions (9b) and $\delta\mu_i \in \mathbb{R}^{n_i^r}$ for the active point constraints (9c). The projection matrices $P_i$ are defined as

\[ P_i := \begin{bmatrix} I & 0 \end{bmatrix} \in \mathbb{R}^{n^x \times (n^x + n^q)}, \qquad 1 \le i \le m, \tag{14} \]

and as $P_0 := 0 \in \mathbb{R}^{n^x \times (n^x + n^q)}$, $P_{m+1} := 0 \in \mathbb{R}^{n^x \times n^x}$ for the first and last shooting nodes, respectively. In the following, all matrices and vectors are assumed to


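comprise the components of the active set only. To avoid the need for repeated special treatment of the first and last shooting node throughout this paper, we introduce the following conventions that make (13) hold also for the border cases $i = 0$ and $i = m$:

\begin{align}
\delta\lambda_{-1} &:= 0 \in \mathbb{R}^{n^x}, & \lambda_{-1} &:= 0 \in \mathbb{R}^{n^x}, & \delta\lambda_m &:= 0 \in \mathbb{R}^{n^x}, & \lambda_m &:= 0 \in \mathbb{R}^{n^x}, \tag{15a} \\
\delta w_{m+1} &:= 0 \in \mathbb{R}^{n^x}, & w_{m+1} &:= 0 \in \mathbb{R}^{n^x}, & h_m &:= 0 \in \mathbb{R}^{n^x}, & X_m &:= 0 \in \mathbb{R}^{n^x \times n^x}. \tag{15b}
\end{align}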

3.3 Hessian Projection Schur Complement Factorization

Hessian Projection Step. Under the assumption that the number of active point constraints does not exceed the number of unknowns, i.e. the active set is not degenerate, we can perform QR factorizations of the point constraints matrices $R_i$,

\[ R_i Q_i = \begin{bmatrix} R_i^R & 0 \end{bmatrix}, \qquad Q_i := \begin{bmatrix} Y_i & Z_i \end{bmatrix}. \tag{16} \]

Here the $Q_i$ are unitary matrices and the $R_i^R$ are upper triangular. We partition $\delta w_i$ into its range space part $\delta w_i^Y$ and its null space part $\delta w_i^Z$, where the identity $\delta w_i = Y_i \delta w_i^Y + Z_i \delta w_i^Z$ holds. We find $\delta w_i^Y$ from the range space projection of (13b),

\[ R_i(-\delta w_i) = -R_i^R\, \delta w_i^Y = \bar r_i. \tag{17} \]

We transform the KKT system onto the null space of $R_i$ by substituting $Y_i \delta w_i^Y + Z_i \delta w_i^Z$ for $\delta w_i$ and solving for $\delta w_i^Z$. We find for the matching conditions (13c)

\[ -X_i Z_i\, \delta w_i^Z - P_{i+1} Z_{i+1}\, \delta w_{i+1}^Z = \bar h_i + X_i Y_i\, \delta w_i^Y + P_{i+1} Y_{i+1}\, \delta w_{i+1}^Y \tag{18} \]

to be solved for $\delta w_i^Z$ once $\delta w_{i+1}^Z$ is known. For stationarity (13a) we find

\[ Z_i' P_i' \delta\lambda_{i-1} - Z_i' B_i Z_i\, \delta w_i^Z + Z_i' R_i' \delta\mu_i + Z_i' X_i' \delta\lambda_i = Z_i' \bar g_i + Z_i' B_i Y_i\, \delta w_i^Y \tag{19} \]

and

\[ Y_i' R_i' \delta\mu_i = Y_i' \left( B_i\, \delta w_i - P_i' \delta\lambda_{i-1} - X_i' \delta\lambda_i + \bar g_i \right). \tag{20} \]

Therein, $Z_i' R_i' = 0$ and $Y_i' R_i' = (R_i^R)'$. Thus (19) can be solved for $\delta\lambda_i$ once $\delta w_i$ and $\delta\lambda_{i-1}$ are known, while (20) can be used to determine the point constraints multipliers $\delta\mu_i$. Let thus null space projections be defined as follows:

\begin{align}
\tilde B_i := Z_i' B_i Z_i, \quad \tilde g_i &:= Z_i' \bar g_i + Z_i' B_i Y_i\, \delta w_i^Y, & 0 \le i \le m, \tag{21a} \\
\tilde X_i := X_i Z_i, \quad \tilde P_i &:= P_i Z_i, & 0 \le i \le m-1, \tag{21b} \\
\tilde h_i &:= \bar h_i + X_i Y_i\, \delta w_i^Y + P_{i+1} Y_{i+1}\, \delta w_{i+1}^Y, & 0 \le i \le m-1. \tag{21c}
\end{align}
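The range space and null space splitting behind (16) and (21a) can be made concrete for a single active point constraint in two unknowns. This is a minimal sketch under our own assumptions: the constraint row, the Hessian, and all names are hypothetical, and for one constraint row the factor $R_i^R$ reduces to a scalar.

```python
import math

def range_null_basis(R):
    """Orthonormal Y (range space of R') and Z (null space of R) for one row R in R^2."""
    norm = math.hypot(R[0], R[1])
    Y = (R[0] / norm, R[1] / norm)
    Z = (-R[1] / norm, R[0] / norm)
    return Y, Z, norm            # norm plays the role of the 1x1 factor R^R

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

R = (3.0, 4.0)                   # hypothetical active point constraint row
B = ((2.0, 0.5), (0.5, 1.0))     # hypothetical positive definite node Hessian
Y, Z, RR = range_null_basis(R)

dw = (1.0, -2.0)                 # any step
dwY, dwZ = dot(Y, dw), dot(Z, dw)
rebuilt = (Y[0] * dwY + Z[0] * dwZ, Y[1] * dwY + Z[1] * dwZ)

BZ = (dot(B[0], Z), dot(B[1], Z))
B_reduced = dot(Z, BZ)           # Z' B Z, the projected Hessian of (21a)
print(dot(R, Z), dot(R, dw), RR * dwY, rebuilt, B_reduced)
```

The constraint row annihilates $Z$, so only the range space part of the step enters the point constraint, and the reduced Hessian stays positive whenever $B$ is positive definite.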


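With this notation the projection of the KKT system on the null space of the point constraints can be read from (18) and (19) for $0 \le i \le m-1$ as

\begin{align}
\tilde P_i' \delta\lambda_{i-1} + \tilde B_i(-\delta w_i^Z) + \tilde X_i' \delta\lambda_i &= \tilde g_i, \tag{22a} \\
\tilde X_i(-\delta w_i^Z) + \tilde P_{i+1}(-\delta w_{i+1}^Z) &= \tilde h_i. \tag{22b}
\end{align}

Schur Complement Step. In (22a) the elimination of $\delta w^Z$ is possible using a Schur complement step, provided that the reduced Hessians $\tilde B_i$ are positive definite. We find

\[ (-\delta w_i^Z) = \tilde B_i^{-1} \left( \tilde g_i - \tilde P_i' \delta\lambda_{i-1} - \tilde X_i' \delta\lambda_i \right) \tag{23} \]

depending on the knowledge of $\delta\lambda_i$. Inserting into (22b) and collecting for $\delta\lambda_i$ yields

\begin{align}
& \tilde X_i \tilde B_i^{-1} \tilde P_i' \delta\lambda_{i-1} + \left( \tilde X_i \tilde B_i^{-1} \tilde X_i' + \tilde P_{i+1} \tilde B_{i+1}^{-1} \tilde P_{i+1}' \right) \delta\lambda_i + \tilde P_{i+1} \tilde B_{i+1}^{-1} \tilde X_{i+1}' \delta\lambda_{i+1} \notag \\
& \qquad = -\tilde h_i + \tilde X_i \tilde B_i^{-1} \tilde g_i + \tilde P_{i+1} \tilde B_{i+1}^{-1} \tilde g_{i+1}. \tag{24}
\end{align}

With Cholesky factorizations $\tilde B_i = (R_i^B)' R_i^B$ we define the following symbols

\begin{align}
\hat X_i &:= \tilde X_i (R_i^B)^{-1}, \qquad \hat P_i := \tilde P_i (R_i^B)^{-1}, \qquad \hat g_i := (R_i^B)^{-T} \tilde g_i, \notag \\
A_i &:= \tilde X_i \tilde B_i^{-1} \tilde X_i' + \tilde P_{i+1} \tilde B_{i+1}^{-1} \tilde P_{i+1}' = \hat X_i \hat X_i' + \hat P_{i+1} \hat P_{i+1}', \notag \\
B_i &:= \tilde X_i \tilde B_i^{-1} \tilde P_i' = \hat X_i \hat P_i', \tag{25} \\
a_i &:= -\tilde h_i + \tilde X_i \tilde B_i^{-1} \tilde g_i + \tilde P_{i+1} \tilde B_{i+1}^{-1} \tilde g_{i+1} = -\tilde h_i + \hat X_i \hat g_i + \hat P_{i+1} \hat g_{i+1}. \notag
\end{align}

Equation (24) may then be written in terms of these values for $0 \le i \le m-1$ as

\[ B_i\, \delta\lambda_{i-1} + A_i\, \delta\lambda_i + B_{i+1}'\, \delta\lambda_{i+1} = a_i. \tag{26} \]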
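The collection step that leads from (22b) and (23) to (24) can be made explicit. The following lines are a sketch of the intermediate algebra only; they apply (23) at nodes $i$ and $i+1$ and assume the sign conventions of (22) and (23).

```latex
\begin{align*}
\tilde h_i
  &= \tilde X_i(-\delta w_i^Z) + \tilde P_{i+1}(-\delta w_{i+1}^Z) \\
  &= \tilde X_i \tilde B_i^{-1}
       \left( \tilde g_i - \tilde P_i' \delta\lambda_{i-1} - \tilde X_i' \delta\lambda_i \right)
   + \tilde P_{i+1} \tilde B_{i+1}^{-1}
       \left( \tilde g_{i+1} - \tilde P_{i+1}' \delta\lambda_i - \tilde X_{i+1}' \delta\lambda_{i+1} \right).
\end{align*}
```

Moving all multiplier terms to the left-hand side and $\tilde h_i$ to the right-hand side gives one block row coupling $\delta\lambda_{i-1}$, $\delta\lambda_i$, and $\delta\lambda_{i+1}$, which is the block tridiagonal structure exploited in the solution step.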

Solving the Block Tridiagonal System. In the symmetric positive definite banded system (26), only the matching condition duals $\delta\lambda_i \in \mathbb{R}^{n^x}$ remain as unknowns. In classical condensing, exactly these matching conditions were used for elimination of a part of the primal unknowns. System (26) can be solved for $\delta\lambda$ by means of a block tridiagonal Cholesky factorization and two backsolves.

Recovering the Block Sparse QP's Solution. Once $\delta\lambda$ is known, the step $\delta w^Z$ can be recovered using (23). The full primal step $\delta w$ is then obtained from $\delta w = Y \delta w^Y + Z \delta w^Z$. The constraint multipliers step $\delta\mu$ is recovered using (20).
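The factorization and the two backsolves used for system (26) can be sketched for the special case of scalar blocks, i.e. $n^x = 1$. This restriction is an assumption made only to keep the example short; in the block case the square root and the divisions become a dense Cholesky factorization and triangular solves per node. The function name and the test data are hypothetical.

```python
import math

def solve_spd_tridiagonal(diag, sub, rhs):
    """Solve M x = rhs for a symmetric positive definite tridiagonal M.

    diag[i] = M[i][i] and sub[i] = M[i][i-1]; sub[0] is ignored.
    Uses M = L L' with a lower bidiagonal Cholesky factor L.
    """
    n = len(diag)
    d = [0.0] * n                # diagonal of L
    e = [0.0] * n                # subdiagonal of L, e[i] = L[i][i-1]
    for i in range(n):
        if i > 0:
            e[i] = sub[i] / d[i - 1]
        pivot = diag[i] - e[i] ** 2
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        d[i] = math.sqrt(pivot)
    y = [0.0] * n                # forward solve L y = rhs
    for i in range(n):
        prev = e[i] * y[i - 1] if i > 0 else 0.0
        y[i] = (rhs[i] - prev) / d[i]
    x = [0.0] * n                # backward solve L' x = y
    for i in reversed(range(n)):
        nxt = e[i + 1] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (y[i] - nxt) / d[i]
    return x

# Hypothetical data with known solution (1, 2, 3).
solution = solve_spd_tridiagonal([2.0, 2.0, 2.0], [0.0, -1.0, -1.0], [0.0, 0.0, 4.0])
print(solution)
```

Each of the three loops visits every node once, which mirrors the linear dependence on the grid length $m$ claimed for the proposed method.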


3.4 Computational Complexity

In the left part of Table 1 a detailed list of the linear algebra operations required to carry out the individual steps of the complementary condensing method can be found. The number of floating point operations (FLOPs) required per shooting node, depending on the system's dimensions $n = n^x + n^q$ and $n_i^r$, is given in the right part of Table 1. The numbers $n^y$ and $n^z$ with $n^y + n^z = n$ denote the range-space and null-space dimension in (16), respectively. The proposed method's runtime complexity is $O(m)$, in sharp contrast to the classical condensing method's $O(m^2)$, as the shooting grid length $m$ does not appear explicitly in Table 1.

Table 1 Left: Number of factorizations (dc), backsolves (bs), multiplications (*), and additions (+) required per shooting node. Right: Number of FLOPs required per shooting node

Action                                        Matrix              Vector
                                              dc   bs   *    +    bs   *    +
Decompose $R_i$                               1    –    –    –
Solve for $\delta w^Y$, $Y \delta w^Y$                            1    1    –
Build $\tilde B_i$                            –    –    2    –
Build $\tilde X_i$, $\tilde P_i$              –    –    2    –
Build $\tilde g_i$, $\tilde h_i$                                  –    4    3
Decompose $\tilde B_i$                        1    –    –    –
Build $\hat X_i$, $\hat P_i$                  –    1    1    –
Build $A_i$, $B_i$                            –    –    3    1
Build $\hat g_i$, $a_i$                                           –    3    2
Decompose (26)                                1    –    –    –
Solve for $\delta\lambda_i$                                       2    –    –
Solve for $\delta w_i^Z$, $Z \delta w_i^Z$                        2    3    2
Solve for $\delta\mu_i$                                           1    4    3

Action                                        Floating point operations
Decompose $R_i$                               $(n_i^r)^2 (n - \tfrac{1}{3} n_i^r)$
Solve for $Y \delta w^Y$                      $n_i^r n^y + n^y n$
Build $\tilde B_i$                            $(n^z)^2 n + n^z n^2$
Build $\tilde X_i$, $\tilde P_i$              $2 n^x n^z n$
Build $\tilde g_i$, $\tilde h_i$              $2 n^x n + n^z n + n^2 + 2 n^x + n$
Decompose $\tilde B_i$                        $\tfrac{1}{3} (n^z)^3$
Build $\hat X_i$, $\hat P_i$                  $2 n^x (n^z)^2$
Build $A_i$, $B_i$                            $3 (n^x)^2 n^z + (n^x)^2$
Build $\hat g_i$, $a_i$                       $(n^z)^2 + 2 n^x n^z + 2 n^x$
Decompose (26)                                $\tfrac{4}{3} (n^x)^3$
Solve for $\delta\lambda_i$                   $2 \cdot 2 (n^x)^2$
Solve for $Z \delta w_i^Z$                    $(n^z)^2 + 2 n^x n^z + n^z n + 2 n^z$
Solve for $\delta\mu_i$                       $n_i^r n^y + 2 n^x n + n^y n + n^2 + 3 n$


4 Example: A Vehicle Mixed-Integer Optimal Control Problem

In this section we formulate a vehicle control problem as a test bed for the presented approach to solving the block sparse QPs.

Exemplary Vehicle Mixed-Integer Optimal Control Problem. We consider a simple dynamic model of a car driving with velocity $v$ on a straight lane with varying slope $\gamma$. The optimizer exerts control over the engine and brake torque rates of change $R_{\mathrm{eng}}$ and $R_{\mathrm{brk}}$, and the gear choice $y$. The state dimension is $n^x = 3$, and we consider different numbers of available gears to scale the problem's control dimension $n^q \ge 3$.

\begin{align}
\dot v(t) &= \frac{1}{m} \left( \frac{i_A}{r} \left( i_T(y)\, \eta_T(y)\, M_{\mathrm{eng}} - M_{\mathrm{brk}} - i_T(y)\, M_{\mathrm{fric}} \right) - M_{\mathrm{air}} - M_{\mathrm{road}} \right), \tag{27a} \\
\dot M_{\mathrm{eng}}(t) &= R_{\mathrm{eng}}(t), \qquad \dot M_{\mathrm{brk}}(t) = R_{\mathrm{brk}}(t). \tag{27b}
\end{align}
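The right-hand side of (27a) can be evaluated with a few lines of code. The sketch below is an illustration under stated assumptions: the grouping of terms follows our reading of (27a), `eta_T` stands for the gear-dependent factor multiplying $i_T(y) M_{\mathrm{eng}}$ (read here as a gearbox efficiency), the friction torque is passed in as a number because its nonlinear dependence on engine speed is not specified, and all parameter values are hypothetical.

```python
import math

# Hypothetical vehicle data, not taken from the paper.
PARAMS = {
    "m": 1500.0,       # vehicle mass in kg
    "g": 9.81,         # gravity constant in m/s^2
    "r": 0.3,          # wheel radius in m
    "i_A": 3.5,        # rear axle transmission ratio
    "c_w": 0.3,        # aerodynamic shape coefficient
    "A": 2.0,          # effective flow surface in m^2
    "rho_air": 1.2,    # air density in kg/m^3
    "f_r": 0.01,       # coefficient of rolling friction
}

def acceleration(v, M_eng, M_brk, M_fric, i_T, eta_T, gamma, p=PARAMS):
    """Evaluate the right-hand side of (27a) for one gear with ratio i_T."""
    M_air = 0.5 * p["c_w"] * p["A"] * p["rho_air"] * v ** 2
    M_road = p["m"] * p["g"] * (math.sin(gamma) + p["f_r"] * math.cos(gamma))
    drive = p["i_A"] / p["r"] * (i_T * eta_T * M_eng - M_brk - i_T * M_fric)
    return (drive - M_air - M_road) / p["m"]

# Standstill on a level road without any torque: only rolling friction acts.
coasting = acceleration(0.0, 0.0, 0.0, 0.0, i_T=3.0, eta_T=0.95, gamma=0.0)
driving = acceleration(0.0, 100.0, 0.0, 5.0, i_T=3.0, eta_T=0.95, gamma=0.0)
print(coasting, driving)
```

In the outer convexification setting, one such right-hand side is evaluated per available gear, which is why the control dimension grows with the number of gears.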

Herein $m$ is the vehicle's mass, $i_A$ and $i_T(y)$ are the rear axle and gearbox transmission ratios. The amount of engine friction is denoted by $M_{\mathrm{fric}}$, a nonlinear function of the engine speed. By $M_{\mathrm{air}} := \tfrac{1}{2} c_w A \rho_{\mathrm{air}} v^2(t)$ we denote air resistance, $c_w$ being the aerodynamic shape coefficient, $A$ the effective flow surface, and $\rho_{\mathrm{air}}$ the air density. Finally $M_{\mathrm{road}} = m g (\sin\gamma(t) + f_r \cos\gamma(t))$ accounts for downhill force and tyre friction, $g$ being the gravity constant and $f_r$ the coefficient of rolling friction. On a predefined track with varying slope, we minimize a weighted sum of travel time and fuel consumption, subject to velocity and engine speed constraints making the gear choice nontrivial.

Run Time Complexity. From Table 2 it can clearly be seen that the classical condensing algorithm will be suitable for problems with limited grid lengths $m$ and with considerably fewer controls than states, i.e. $n^q \ll n^x$, which is exactly contrary to the situation encountered when applying outer convexification to MIOCPs. Nonetheless, using this approach we could solve several challenging mixed-integer optimal control problems to optimality with little computational effort, as reported in [9, 12, 14].

Table 2 Run time complexity of classical condensing and a dense active-set QP solver

Action                                   Run time complexity
Computing the Hessian $B$                $O(m^2 n^3)$
Computing the constraints $X$, $R$       $O(m^2 n^3)$
Dense QP solver, startup                 $O((m n^q + n^x)^3)$
Dense QP solver, per iteration           $O((m n^q + n^x)^2)$
Recovering $\delta v$                    $O(m (n^x)^2)$


Sparsity. In Table 3 the dimensions and amount of sparsity present in the Hessian and constraints matrices are given for the exemplary problem for 6 and 16 available gears. A grid length of $m = 20$ was used. As can be seen in the left part, the QP (9) is only sparsely populated for this example problem, with the number of nonzero elements (nnz) never exceeding 3%. After classical condensing, sparsity has been lost as expected. Had the overall dimension of the QP reduced considerably, as is the case for optimal control problems with $n^x \gg n^q$, that would be of no concern. For our MIOCP with outer convexification, however, the results shown in Tables 2 and 3 indicate that a considerable run time increase for larger $m$ or $n^q$ is to be expected.

Implementation Run Times. The classical condensing algorithm as well as the QP solver QPOPT [7] are implemented in ANSI C and translated using gcc 4.3.3 with optimization level -O3. The linear algebra package ATLAS was used for BLAS operations. The proposed complementary condensing algorithm was preliminarily implemented in MATLAB (Release 2008b). All run times have been obtained on a Pentium 4 machine at 3 GHz under SuSE Linux 10.3. The resulting run times shown in Table 4 support our conclusions drawn from Table 3. For $m = 30$ as well as for $m = 20$ and $n^q \ge 14$ the MATLAB code of our proposed method beats an optimized C implementation of classical condensing plus QPOPT. In addition, we could solve four instances with $m = 30$ or $n^q = 18$ that could not be solved before due to active set cycling of the QPOPT solver.

Table 3 Dimensions and number of nonzero elements (nnz) of the block structured QP (9) and the condensed QP for the exemplary vehicle control problem. Here $m = 20$, $n^x = 3$.

$n^q$   Matrix        Block sparse             Condensed                Dense QP solver
                      Size         nnz         Size         nnz         nnz seen
2+6     Hessian       223 × 223    1,419       163 × 163    13,366      13,366 (27%)
2+6     Constraints   438 × 223    1,535       378 × 163    13,591      61,614 (63%)
2+16    Hessian       423 × 423    6,623       363 × 363    66,066      66,066 (37%)
2+16    Constraints   858 × 423    3,731       798 × 363    131,769     289,674 (80%)

Table 4 Average run time per iteration of the QP solver QPOPT on the condensed QPs (left, condensing run times excluded), and of a preliminary MATLAB code running the proposed method on the block sparse QPs (right). "–" indicates cycling of the active set

          QPOPT on condensed QPs          Proposed method on block sparse QPs
$n^q$     m = 10    m = 20    m = 30      m = 10    m = 20    m = 30
2+6       6 ms      31 ms     103 ms      40 ms     65 ms     100 ms
2+8       11 ms     58 ms     467 ms      42 ms     75 ms     110 ms
2+12      18 ms     226 ms    –           50 ms     95 ms     140 ms
2+16      –         –         –           60 ms     115 ms    170 ms
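The statements drawn from Table 4 can be checked mechanically. The following sketch copies the run times from Table 4 into two dictionaries keyed by $n^q$; the helper name `proposed_wins` is hypothetical, and an instance on which QPOPT cycled is counted as a win for the block sparse code.

```python
# Average run times per QP iteration in ms, copied from Table 4; None marks cycling.
M_VALUES = (10, 20, 30)
QPOPT = {8: (6, 31, 103), 10: (11, 58, 467), 14: (18, 226, None), 18: (None, None, None)}
PROPOSED = {8: (40, 65, 100), 10: (42, 75, 110), 14: (50, 95, 140), 18: (60, 115, 170)}

def proposed_wins(nq, m):
    """True if the block sparse code is faster or QPOPT failed by cycling."""
    k = M_VALUES.index(m)
    dense = QPOPT[nq][k]
    return dense is None or PROPOSED[nq][k] < dense

wins = [(nq, m) for nq in QPOPT for m in M_VALUES if proposed_wins(nq, m)]
cycling = [(nq, m) for nq in QPOPT
           for k, m in enumerate(M_VALUES) if QPOPT[nq][k] is None]
print(wins)
print(cycling)
```

The list of cycling instances reproduces the four cases with $m = 30$ or $n^q = 18$ mentioned in the text.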


5 Summary and Future Work

Summarizing the results presented in Tables 3 and 4, we have seen that for OCPs with larger dimension $n^q$, the classical $O(m^2 n^3)$ condensing algorithm is unable to significantly reduce the QP's size. Worse yet, the condensed QP is densely populated. As a consequence, the dense QP solver's performance, exemplarily tested using QPOPT, is worse than what can be achieved by a suitable exploitation of the sparse block structure for the case $n^q \ge n^x$.

We presented an alternative $O(m n^3)$ factorization of the block sparse KKT system due to [16, 17], named complementary condensing in the context of MIOCPs. By theoretical analysis as well as by preliminary implementation we provided evidence that the proposed approach is able to challenge the run times of the classical condensing algorithm.

The complementary condensing approach for solving the QP's KKT system is embedded in an active set loop. In our preliminary implementation, a new factorization of the KKT system is computed in $O(m n^3)$ time in every iteration of the active set loop. Nonetheless, the achieved computation times are attractive for larger values of $m$ or $n^q$. To improve the efficiency of this active set method further, several issues have to be addressed. Exploiting simple bounds on the unknowns will reduce the size of the matrices $B_i$, $R_i$, and $X_i$ involved. For dense null-space and range-space methods it is common knowledge that certain factorizations can be updated after an active set change in $O(n^2)$ time. Such techniques would essentially relieve the active-set loop from all matrix-only operations, yielding $O(m n^2)$ active set iterations with only an initial factorization in $O(m n^3)$ time necessary. A forthcoming publication shall investigate this topic.

References

1. J. Albersmeyer and H. Bock, Sensitivity Generation in an Adaptive BDF-Method, in Modeling, Simulation and Optimization of Complex Processes: Proc. 3rd Int. Conf. on High Performance Scientific Computing, Hanoi, Vietnam, 2008, pp. 15–24.
2. L. Biegler, Solution of dynamic optimization problems by successive quadratic programming and orthogonal collocation, Comp. Chem. Eng., 8 (1984), pp. 243–248.
3. H. Bock and K. Plitt, A Multiple Shooting algorithm for direct solution of optimal control problems, in Proc. 9th IFAC World Congress Budapest, 1984, pp. 243–247.
4. H. Ferreau, H. Bock, and M. Diehl, An online active set strategy for fast parametric quadratic programming in MPC applications, in Proc. IFAC Workshop on Nonlinear Model Predictive Control for Fast Systems, Grenoble, 2006.
5. R. Fletcher, Resolving degeneracy in quadratic programming, Numerical Analysis Report NA/135, University of Dundee, Dundee, Scotland, 1991.
6. M. Gerdts, A variable time transformation method for mixed-integer optimal control problems, Optimal Control Applications and Methods, 27 (2006), pp. 169–182.
7. P. Gill, W. Murray, and M. Saunders, User's Guide For QPOPT 1.0: A Fortran Package For Quadratic Programming, 1995.
8. E. Hellström, M. Ivarsson, J. Åslund, and L. Nielsen, Look-ahead control for heavy trucks to minimize trip time and fuel consumption, Control Eng. Pract., 17 (2009), pp. 245–254.
9. C. Kirches, S. Sager, H. Bock, and J. Schlöder, Time-optimal control of automobile test drives with gear shifts, Opt. Contr. Appl. Meth. (2010). DOI 10.1002/oca.892.
10. D. Leineweber, I. Bauer, H. Bock, and J. Schlöder, An efficient multiple shooting based reduced SQP strategy for large-scale dynamic process optimization. Part I: Theoretical aspects, Computers and Chemical Engineering, 27 (2003), pp. 157–166.
11. J. Nocedal and S. Wright, Numerical Optimization, Springer, 2nd ed., 2006.
12. S. Sager, Numerical methods for mixed-integer optimal control problems, Der andere Verlag, Tönning, Lübeck, Marburg, 2005.
13. S. Sager, Reformulations and algorithms for the optimization of switching decisions in nonlinear optimal control, Journal of Process Control, 19 (2009), pp. 1238–1247.
14. S. Sager, C. Kirches, and H. Bock, Fast solution of periodic optimal control problems in automobile test-driving with gear shifts, in Proc. 47th IEEE CDC, Cancun, Mexico, 2008, pp. 1563–1568.
15. S. Sager, G. Reinelt, and H. Bock, Direct methods with maximal lower bound for mixed-integer optimal control problems, Math. Prog., 118 (2009), pp. 109–149.
16. M. Steinbach, Fast recursive SQP methods for large-scale optimal control problems, PhD thesis, Universität Heidelberg, 1995.
17. M. Steinbach, Structured interior point SQP methods in optimal control, Zeitschrift für Angewandte Mathematik und Mechanik, 76 (1996), pp. 59–62.
18. M. Steinbach, Tree-sparse convex programs, Math. Methods Oper. Res., 56 (2002), pp. 347–376.
19. S. Terwen, M. Back, and V. Krebs, Predictive powertrain control for heavy duty trucks, in Proc. IFAC Symposium in Advances in Automotive Control, Salerno, Italy, 2004, pp. 451–457.
20. J. Till, S. Engell, S. Panek, and O. Stursberg, Applied hybrid system optimization: An empirical investigation of complexity, Control Eng. Pract., 12 (2004), pp. 1291–1303.

Some Inverse Problem for the Polarized-Radiation Transfer Equation

A.E. Kovtanyuk and I.V. Prokhorov

Abstract An inverse problem for the steady vector transfer equation for polarized radiation is studied. For this problem, an attenuation factor is found from a given solution of the equation at a medium boundary. An approach is propounded to solve the inverse problem by using special external radiative sources. A formula is proposed which relates the Radon transform of an attenuation factor to a solution of the equation at the medium boundary. Numerical experiments show that the proposed reconstruction algorithm for the polarized-radiation transfer equation has an advantage over the similar method for the scalar case.

1 Introduction

The linear integro-differential Boltzmann equation, also called the radiation transfer equation, is a basic model for describing the photon transfer process. Two kinds of interaction of photons with substance, namely, absorption and scattering, are considered within the framework of this model. For a more accurate description of the radiation transfer process, account should be taken of light beam polarization. Theoretical aspects of solving the vector transfer equation are presented in [1–3] where general functional properties of the direct problem are explored and conditions are specified under which a Neumann series converges in various spaces. A nice review on the vector transfer equation can be found in [4]. Among the few works on inverse problems for the vector transfer equation it is worth mentioning [5–7], in which scattering properties of a medium are defined. In [5], in particular,

A.E. Kovtanyuk: Far Eastern National University, Vladivostok, Russia. I.V. Prokhorov: Institute of Applied Mathematics FEBRAS, Vladivostok, Russia. H.G. Bock et al. (eds.), Modeling, Simulation and Optimization of Complex Processes, DOI 10.1007/978-3-642-25707-0_17, © Springer-Verlag Berlin Heidelberg 2012


the problem of finding the single scattering albedo in a semi-infinite layer with a Rayleigh scattering matrix is solved. A model of polarized radiation passing through a plane homogeneous layer is considered in [6] where, too, an inverse problem of finding scattering matrix coefficients by using the incoming and outgoing radiation at a layer boundary is formulated and solved. In this paper, we deal with the problem of determining the attenuation factor in the transfer equation. A method of finding the factor is advanced which is based on employing a special-type external radiation source with discontinuities of the first kind in an angular variable. This method was used in [8–10] for solving an inverse problem in the scalar case. A method for determining the attenuation factor in the vector equation was proposed and substantiated in our recent paper [11]. The present account relies essentially on the results in [11], and so we will prove only those necessary statements that are not contained therein. The emphasis is on numerical verification of the method in order to underline peculiarities and demonstrate advantages of the proposed algorithm over a similar method for the scalar transfer equation. In computational experiments on finding the attenuation factor, we plan to realize some known weight modifications of the Monte-Carlo method: namely, the conjugate trajectories method and the maximum cross-section method [1].

2 Formulation and Solution of the Inverse Problem

The main characteristic of polarized radiation is $f = (f_1, f_2, f_3, f_4)$, a four-dimensional vector of Stokes parameters. The corresponding transfer equation for this vector in an isotropic medium has the form

\[ \omega \cdot \nabla_r f(r, \omega) + \mu(r) f(r, \omega) = \mu_s(r) \int_\Omega P(r, \omega, \omega') f(r, \omega')\, d\omega' + J(r, \omega), \tag{1} \]

where $r = (r_1, r_2, r_3) \in G$, $G$ is a convex bounded domain in a three-dimensional Euclidean space $E^3$, and $\omega \in \Omega = \{\omega \in E^3 : |\omega| = 1\}$. In (1), the function $J(r, \omega)$ is a four-dimensional vector of internal radiation sources, $\mu(r)$ is the total attenuation factor, $\mu_s(r)$ is a scattering coefficient, $P(r, \omega, \omega')$ is a $4 \times 4$ scattering matrix. By writing $\omega \cdot \nabla_r f(r, \omega)$ we mean a four-component vector function whose $i$-th component is a derivative of the function $f_i(r, \omega)$ in a direction $\omega$ with respect to a space variable $r$. To characterize the inhomogeneity of a medium $G$ in which the radiation transfer process is examined, we introduce a partition $G_0$ of the domain $G$. Assume that the set $G_0$ is open and dense in $G$, that is, $\overline{G_0} = \overline{G}$. Moreover, we let $G_0$ be the union of a finite number of domains and write

\[ G_0 = \bigcup_{i=1}^{p} G_i, \qquad G_i \cap G_j = \emptyset, \quad i \ne j. \]


The domains $G_i$ can be interpreted as parts of the inhomogeneous medium $G$ filled with substance $i$. Suppose that the set $G_0$ is generalized convex [12]; that is, any ray $L_{r,\omega} = \{r + t\omega,\ t \ge 0\}$ outgoing from point $r \in G_0$ in a direction $\omega \in \Omega$ will intersect $\partial G_0$ at finitely many points. Denote by $C_b(X)$, $X \subset E^m$, a Banach space of functions which are defined on $X$, are bounded and continuous on $X$, and have the norm

\[ \|f\| = \sup_{x \in X} |f(x)|. \]

Similarly, we define a space $C_b^{(4)}(X)$ which is formed by vector functions $f = (f_1, f_2, f_3, f_4)$ every component of which belongs to $C_b(X)$, and its corresponding norm is defined by setting

\[ \|f\|_4 = \max_{1 \le i \le 4} \|f_i\|. \]

Treating the coefficients in (1), we assume that functions $\mu(r)$, $\mu_s(r)$ are nonnegative and belong to a space $C_b(G_0)$, with $\mu(r) \ge \mu_s(r)$, and that the vector function $J(r, \omega) \in C_b^{(4)}(G_0 \times \Omega)$. All components of the scattering matrix $P(r, \omega, \omega')$ belong to $C_b(G_0 \times \Omega \times \Omega)$. Let $d(r, \omega)$ be a distance from point $r \in G$ to boundary $\partial G = \overline{G} \setminus G$ in a direction $\omega$. In view of [10], $d(r, \omega) \in C_b(G \times \Omega)$. Put

\[ \Gamma_\omega^\pm = \{ z \in \partial G : L_{z, \mp\omega} \cap G_0 \ne \emptyset \}, \qquad \Gamma^\pm = \{ (z, \omega) \in \partial G \times \Omega : z \in \Gamma_\omega^\pm \}, \qquad \Gamma = \Gamma^+ \cup \Gamma^-. \]

The set $\Gamma^-$ ($\Gamma^+$) is a domain of incoming (outgoing) radiation. To (1), we add the following boundary condition:

\[ f(\xi, \omega) = h(\xi, \omega), \qquad (\xi, \omega) \in \Gamma^-. \tag{2} \]

The vector function $h(\xi, \omega)$ is defined on $\Gamma^-$ and describes a radiation flux entering the medium $G$. By the definitions of $\Gamma^-$ and $d(r, \omega)$, the boundary condition specified by (2) can be written in the form

\[ f(r - d(r, -\omega)\omega, \omega) = h(r - d(r, -\omega)\omega, \omega), \qquad (r, \omega) \in G_0 \times \Omega. \tag{2'} \]

As for $h$, we assume that it is nonnegative and that $\tilde h(r, \omega) = h(r - d(r, -\omega)\omega, \omega)$ belongs to $C_b^{(4)}(G_0 \times \Omega_0)$, where $\Omega_0$ is an open and dense subset of $\Omega$. Along with the boundary condition given by (2'), we specify the following:

\[ f(r + d(r, \omega)\omega, \omega) = H(r + d(r, \omega)\omega, \omega), \qquad (r, \omega) \in G_0 \times \Omega. \tag{3} \]

The function $H(\xi, \omega)$ is defined on $\Gamma^+$ and specifies a radiation flux leaving the medium. We formulate an inverse problem which in essence can be thought of as a tomography problem.
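Why incoming and outgoing boundary data carry information about the attenuation factor is easiest to see in a special case. The sketch below assumes, only for illustration, that scattering and internal sources are switched off, so that each component of (1) decays exponentially along a ray; the sample values and function names are hypothetical and the attenuation factor is written `mu`.

```python
import math

def optical_depth(mu_values, step):
    """Line integral of a piecewise constant attenuation factor along a ray."""
    return sum(mu * step for mu in mu_values)

def outgoing_intensity(h_in, mu_values, step):
    """Outgoing value for incoming value h_in without scattering and sources."""
    return h_in * math.exp(-optical_depth(mu_values, step))

mu_along_ray = [0.2, 0.2, 1.5, 1.5, 0.4]   # hypothetical samples of the attenuation factor
step = 0.1                                  # hypothetical segment length
h_in = 2.0                                  # incoming boundary value
H_out = outgoing_intensity(h_in, mu_along_ray, step)

# The logarithm of the ratio of boundary values returns the line integral.
recovered = math.log(h_in / H_out)
print(recovered, optical_depth(mu_along_ray, step))
```

In this special case the boundary values themselves determine the line integral of the attenuation factor; with scattering present this direct relation no longer holds for the full boundary values.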


2.1 Tomography Problem

Determine a function $\mu(r)$ from (1) and boundary conditions (2') and (3) if only functions $h$ and $H$ are known. To solve the tomography problem, we need some properties of the solution of a direct problem.

2.1.1 Direct Problem

Equations (1), (2) pose the problem of determining a function $f$ with known $\mu$, $P$, $J$, and $h$. Put

\[ (lf)(r, \omega) = \omega \cdot \nabla_r f(r, \omega) + \mu(r) f(r, \omega), \tag{4} \]

\[ N(r, \omega) = \mu_s(r) \int_\Omega P(r, \omega, \omega') f(r, \omega')\, d\omega'. \tag{5} \]

Let $\Omega_0$ be an open subset of a unit sphere $\Omega$ that is dense in $\Omega$. We define a class $D$ in which the solution to the direct problem is sought.

Definition 1. A vector function $f(r, \omega)$ belongs to $D(G_0 \times \Omega_0)$ if, for any points $(r, \omega) \in G_0 \times \Omega_0$, the function $f(r + t\omega, \omega)$ is absolutely continuous with respect to a variable $t \in [-d(r, -\omega), d(r, \omega)]$ and functions $f(r, \omega)$ and $\omega \cdot \nabla_r f(r, \omega)$ belong to the space $C_b^{(4)}(G_0 \times \Omega_0)$.

Note that since $\mu(r) \in C_b(G_0)$, the operator $l$ defined by (4) maps $D(G_0 \times \Omega_0)$ to $C_b^{(4)}(G_0 \times \Omega_0)$.

Definition 2. A solution of the direct problem (1), (2) is a function $f(r, \omega) \in D(G_0 \times \Omega_0)$ satisfying the relations

\[ (lf)(r, \omega) = N(r, \omega) + J(r, \omega), \qquad f(r - d(r, -\omega)\omega, \omega) = h(r - d(r, -\omega)\omega, \omega) \]

for all $(r, \omega) \in G_0 \times \Omega_0$.

In what follows, we use some conditions evoked by physical constraints on components of the function $f(r, \omega)$. Functions $f_i(r, \omega)$ are Stokes parameters; so they should satisfy the following:

\[ f_1 \ge 0, \qquad f_1^2 \ge f_2^2 + f_3^2 + f_4^2 \tag{6} \]

(see [1–3]). Denote by $K$ a cone in the space $C_b^{(4)}(G_0 \times \Omega_0)$ formed by functions $f = (f_1, f_2, f_3, f_4) \in C_b^{(4)}(G_0 \times \Omega_0)$ satisfying (6). For the functions in $K$, note, there are conditions that are physical in character, which ultimately guarantee the existence and uniqueness of a solution of the direct problem (1), (2). These


conditions are as follows. For all f 2 K, the matrix P .r; !; ! 0 / must meet the constraints Pf 2 K; (7) Z Z .r/ s .r/ .P .r; !; ! 0 /f .r; ! 0 //1 d! 0  f1 .r; ! 0 /d! 0 (8) 4 ˝

˝

(see [2]). Constraint (7) signifies that the matrix operator P maps the cone K into itself, that is, .Pf /1  0;

.Pf /21  .Pf /22 C .Pf /23 C .Pf /24 ;

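and condition (8) expresses energy conservation for a scattering event in a nonmultiplying medium [2].

Let $\tilde\mu(r)$ be a function in $C_b(G_0)$ satisfying the inequality $\tilde\mu(r) \ge \mu(r)$, $r \in G$. We cite some facts from [11] which are needed for our further reasoning. We define integral operators $\tilde A : K \to K$, $S : K \to K \cap C^{(4)}(G_0 \times \Omega)$ and $\tilde S : K \to K$ as follows:

\[ (\tilde A \varphi)(r,\omega) = \int_0^{d(r,-\omega)} \exp\bigl(-\tilde\tau(r,\omega,t)\bigr)\, \varphi(r - t\omega, \omega)\, dt, \]

\[ (S\varphi)(r,\omega) = \mu_s(r) \int_\Omega P(r,\omega,\omega')\, \varphi(r,\omega')\, d\omega', \]

\[ (\tilde S\varphi)(r,\omega) = \mu_s(r) \int_\Omega P(r,\omega,\omega')\, \varphi(r,\omega')\, d\omega' + \bigl(\tilde\mu(r) - \mu(r)\bigr)\varphi(r,\omega), \]

where

\[ \tilde\tau(r,\omega) = \int_0^{d(r,-\omega)} \tilde\mu(r - t\omega)\, dt, \qquad \tilde\tau(r,\omega,t) = \int_0^t \tilde\mu(r - t'\omega)\, dt'. \]

Put

\[ \tilde f_0(r,\omega) = \tilde h(r,\omega) \exp\bigl(-\tilde\tau(r,\omega)\bigr) + (\tilde A J)(r,\omega). \]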
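As a rough numerical illustration of the operator $\tilde A$ and the optical depth $\tilde\tau$ defined above, the following sketch evaluates $(\tilde A\varphi)(r,\omega)$ for a single scalar component by the midpoint rule. It assumes, purely for illustration, that $G$ is the unit ball; the function names `dist_to_boundary` and `apply_A` and the quadrature choice are not taken from the paper.

```python
import numpy as np

def dist_to_boundary(r, omega):
    """Distance d(r, omega) from r to the unit sphere along the unit vector omega.
    Assumption: the domain G is the unit ball centred at the origin."""
    b = float(np.dot(r, omega))
    return -b + np.sqrt(b * b - float(np.dot(r, r)) + 1.0)

def apply_A(phi, mu_tilde, r, omega, n=2000):
    """Midpoint-rule approximation of
    (A~ phi)(r, omega) = int_0^{d(r,-omega)} exp(-tau~(r, omega, t)) phi(r - t omega, omega) dt
    for one scalar component phi(points, omega) -> array of length n."""
    d = dist_to_boundary(r, -omega)
    dt = d / n
    ts = (np.arange(n) + 0.5) * dt                      # midpoints of the n cells
    pts = r[None, :] - ts[:, None] * omega[None, :]     # points r - t*omega
    mu_vals = mu_tilde(pts)
    # tau~(r, omega, t) at each midpoint: full cells before it plus half of its own cell
    tau_mid = (np.cumsum(mu_vals) - 0.5 * mu_vals) * dt
    return float(np.sum(np.exp(-tau_mid) * phi(pts, omega)) * dt)

# Constant mu~ = 2 and phi = 1 at the centre of the ball: the exact value is (1 - exp(-2)) / 2.
value = apply_A(lambda x, w: np.ones(len(x)),
                lambda x: np.full(len(x), 2.0),
                np.zeros(3), np.array([0.0, 0.0, 1.0]))
exact = (1.0 - np.exp(-2.0)) / 2.0
```

For a vector function in $K$ the same routine would be applied to each of the four components in turn.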

Let us formulate a statement on the well-posedness of the direct problem (1), (2).

Theorem 1. Assume that $\tilde h(r,\omega) \in K$, $J(r,\omega) \in C_b^{(4)}(G_0 \times \Omega) \cap K$, and conditions (7) and (8) hold. Then, in the cone $K$, a unique solution to problem (1), (2) exists and is given by the Neumann series

\[ f(r,\omega) = \tilde f_0(r,\omega) + \sum_{n=1}^{\infty} (\tilde A \tilde S)^n \tilde f_0(r,\omega), \tag{9} \]

which converges in the norm of $C_b^{(4)}(G_0 \times \Omega_0)$.

If $\tilde\mu(r) = \mu(r)$, then the proof that the direct problem is well posed coincides with a similar argument in [11]. The function $\tilde\mu(r)$ is introduced to justify the computational algorithm which is used to solve the direct problem in the next section.

Now we specify the set $\Omega_0$. Hereinafter, let

\[ \Omega_0 = \Omega_- \cup \Omega_+, \qquad \Omega_\pm = \{\omega \in \Omega : \operatorname{sgn}(\omega_3) = \pm 1\}. \]
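To make the structure of the Neumann series (9) concrete, here is a minimal finite-dimensional sketch. It assumes the product $\tilde A \tilde S$ has been replaced by a hypothetical matrix `T` with norm less than one (a random matrix here, not a discretization from the paper), and compares the truncated series with the direct solution of $(I - T) f = f_0$.

```python
import numpy as np

def neumann_sum(T, f0, m):
    """Return f0 + sum_{n=1}^{m} T^n f0, accumulating one application of T per term."""
    term = f0.copy()
    total = f0.copy()
    for _ in range(m):
        term = T @ term          # term now holds T^n f0
        total += term
    return total

rng = np.random.default_rng(0)
T = rng.random((5, 5))
T *= 0.5 / np.linalg.norm(T, 1)  # rescale so the induced 1-norm equals 0.5 < 1
f0 = rng.random(5)

approx = neumann_sum(T, f0, 60)
exact = np.linalg.solve(np.eye(5) - T, f0)
```

With the norm of `T` equal to 0.5, the truncation error decays like $0.5^{m}$, so a few dozen terms already reproduce the direct solution to rounding accuracy in this toy setting.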

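In order to solve the above tomography problem, along with the conditions stated in Theorem 1, we impose extra requirements on the function $h$.

1. Let

\[ \tilde h(r,\omega) \in C_b^{(4)}(G_0 \times \Omega_0). \tag{10} \]

2. For at least one $i \in \{1, 2, 3, 4\}$ and for all $\omega = (\omega_1, \omega_2, 0) \in \Omega$, the following relation holds:

\[ [\tilde h_i(r,\omega)] = \tilde h_i^{+}(r,\omega) - \tilde h_i^{-}(r,\omega) \ne 0, \qquad r \in G_0, \tag{11} \]

where

\[ \tilde h_i^{\pm}(r,\omega) = \lim_{\varepsilon \to +0} \tilde h_i\!\left(r, \frac{(\omega_1, \omega_2, \pm\varepsilon)}{\sqrt{1 + \varepsilon^2}}\right). \]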

Thus, we assume that one or more components of the function $h(r,\omega)$ in horizontal directions ($\omega_3 = 0$) have a discontinuity of the first kind. Below is a result from [11] which yields a solution to our tomography problem.

Theorem 2. Assume that under the conditions of Theorem 1, $h$ satisfies (10), (11) and relation (3) holds. Then, for all $r \in G_0$, $\omega = (\omega_1, \omega_2, 0) \in \Omega$, the following equality holds true:

\[ \int_{-d(r,-\omega)}^{d(r,\omega)} \mu(r + \omega t)\, dt = \ln \frac{[h_i(r - d(r,-\omega)\omega, \omega)]}{[H_i(r + d(r,\omega)\omega, \omega)]}. \tag{12} \]

Thus, the tomography problem is reduced to inverting the two-dimensional Radon transform of the function $\mu$, that is,

\[ (R\mu)(r,\omega) \equiv \int_{-d(r,-\omega)}^{d(r,\omega)} \mu(r + \omega t)\, dt = \Phi_i(r,\omega), \tag{13} \]

where

\[ \Phi_i(r,\omega) = \ln \frac{[h_i(r - d(r,-\omega)\omega, \omega)]}{[H_i(r + d(r,\omega)\omega, \omega)]} \tag{14} \]

in any horizontal plane $\{r = (r_1, r_2, r_3) \in E^3 : r_3 = \mathrm{const}\}$ which has a point in common with the set $G_0$. This problem has a unique solution in a wide class of functions [13, 14]. From (12) to (14), it follows immediately that in order to find $(R\mu)(r,\omega)$ we can use any components of the vector functions $h$ and $H$ with nonzero discontinuities treated as functions of the angular variable. This fact is used in the numerical experiments of the next section.
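The role of the jumps in (12) to (14) can be illustrated in the simplest setting. The sketch below assumes a purely absorbing medium (no scattering and no internal sources), a unit-ball domain, and hypothetical one-sided limits `h_plus`, `h_minus` of the incoming radiation; none of these choices come from the paper. Under these assumptions the outgoing radiation is just the attenuated incoming radiation, so every component with a nonzero jump should return approximately the same line integral of $\mu$.

```python
import numpy as np

def dist_to_boundary(r, omega):
    """Distance from r to the unit sphere along omega (assumption: G is the unit ball)."""
    b = float(np.dot(r, omega))
    return -b + np.sqrt(b * b - float(np.dot(r, r)) + 1.0)

def optical_length(mu, z, omega, length, n=4000):
    """Midpoint rule for the integral of mu(z - t*omega) over t in [0, length]."""
    dt = length / n
    ts = (np.arange(n) + 0.5) * dt
    pts = z[None, :] - ts[:, None] * omega[None, :]
    return float(np.sum(mu(pts)) * dt)

def mu(pts):
    """Hypothetical smooth attenuation coefficient."""
    return 1.0 + 0.5 * np.exp(-4.0 * np.sum(pts**2, axis=1))

# Hypothetical limits of the incoming Stokes vector from above (+) and below (-) the horizon.
h_plus = np.array([1.0, 0.2, 0.5, 0.0])
h_minus = np.array([2.0, 0.2, -0.5, 0.0])

r = np.array([0.3, 0.0, 0.0])
omega = np.array([0.0, 1.0, 0.0])                      # horizontal direction, omega_3 = 0
z_out = r + dist_to_boundary(r, omega) * omega         # exit point of the chord
radon = optical_length(mu, z_out, omega, dist_to_boundary(z_out, -omega))

eps = 1e-4                                             # small tilt standing in for the limit
def outgoing(sign, h_in):
    """Outgoing radiation at z_out in the slightly tilted direction (pure absorption)."""
    w = np.array([omega[0], omega[1], sign * eps]) / np.sqrt(1.0 + eps**2)
    return h_in * np.exp(-optical_length(mu, z_out, w, dist_to_boundary(z_out, -w)))

jump_h = h_plus - h_minus
jump_H = outgoing(+1, h_plus) - outgoing(-1, h_minus)
# Phi_i = ln([h_i] / [H_i]) for every component whose incoming jump is nonzero.
phi = {i + 1: float(np.log(jump_h[i] / jump_H[i])) for i in range(4) if jump_h[i] != 0.0}
```

Here components 1 and 3 have nonzero jumps and both entries of `phi` agree with `radon` up to the small tilt `eps`, while components 2 and 4 carry no information and are skipped.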

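3 Numerical Results

We demonstrate the solution algorithm for the tomography problem on the 3D Shepp-Logan phantom [14] (see Fig. 1a). The function $\mu(r)$ is recovered in the plane $r_3 = 0$. We assume that the matrix $P = P(\omega, \omega')$ describes the Rayleigh law of scattering [4] and that $\mu_s(r) = 0.5\,\mu(r)$ throughout the medium $G$. For the vector function $h(r,\omega)$ corresponding to the incoming radiation, we take the components

\[ h_1 = \begin{cases} 1, & \omega_3 \ge 0, \\ 1.1, & \omega_3 < 0, \end{cases} \qquad h_2 = \begin{cases} 0, & \omega_3 \ge 0, \\ 1.1, & \omega_3 < 0, \end{cases} \qquad h_3 = \begin{cases} 1, & \omega_3 \ge 0, \\ 0, & \omega_3 < 0, \end{cases} \qquad h_4 = 0. \]

The intersection of the plane $r_3 = 0$ with the domain $G$ is a disk of radius 1. To recover $\mu(r)$, we use a parallel scanning scheme [14]. Let

\[ r_1 = \rho \cos\varphi, \qquad r_2 = \rho \sin\varphi, \qquad \rho \in [-1, 1], \qquad \varphi \in [0, 2\pi), \]

\[ \omega = (-\sin\varphi, \cos\varphi, 0), \qquad \omega_\perp = (\cos\varphi, \sin\varphi, 0), \qquad \omega \cdot \omega_\perp = 0. \]

Then equality (13) can be written in the form

\[ \int_{-\sqrt{1 - \rho^2}}^{\sqrt{1 - \rho^2}} \mu(\rho\,\omega_\perp + t\omega)\, dt = \tilde\Phi_i(\rho, \omega), \]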

where $\tilde\Phi_i(\rho, \omega) = \Phi_i(r(\rho, \omega), \omega(\varphi))$, $r = \rho\,\omega_\perp(\varphi)$. Hence, in the cross-section $r_3 = 0$, we derive integrals of the trace of the function $\mu$ on almost all lines passing through the points $r = \rho\,\omega_\perp(\varphi)$ in the direction $\omega(\varphi)$, with $\rho \in [-1, 1]$ and $\varphi \in [0, 2\pi)$. Thus, the problem of determining the function $\mu(r)$ reduces to inverting its Radon transform $(R\mu)(\rho, \omega)$. In the computational experiments, we use the following partition of the set $[-1, 1] \times [0, 2\pi)$:

\[ \rho_l = -1 + l/60, \quad l = 0, \dots, 120; \qquad \varphi_s = \pi s / 90, \quad s = 0, \dots, 179, \]

on which the Radon transform $(R\mu)(\rho, \omega(\varphi))$ is defined.
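The sampling grid described above can be written down directly. The following sketch builds the $121 \times 180$ grid $(\rho_l, \varphi_s)$ and evaluates the chord integrals for a test coefficient by the midpoint rule; the helper name `chord_integral` and the constant test coefficient are illustrative assumptions, chosen because the exact value is then the chord length $2\sqrt{1 - \rho^2}$.

```python
import numpy as np

rho = -1.0 + np.arange(121) / 60.0      # rho_l = -1 + l/60,  l = 0..120
phi = np.pi * np.arange(180) / 90.0     # phi_s = pi*s/90,    s = 0..179

def chord_integral(mu, rho_l, phi_s, n=200):
    """Midpoint rule for the integral of mu(rho*omega_perp + t*omega) over the chord
    |t| <= sqrt(1 - rho^2) of the unit disk in the plane r_3 = 0."""
    half = np.sqrt(max(0.0, 1.0 - rho_l**2))
    omega = np.array([-np.sin(phi_s), np.cos(phi_s), 0.0])
    omega_perp = np.array([np.cos(phi_s), np.sin(phi_s), 0.0])
    dt = 2.0 * half / n
    ts = -half + (np.arange(n) + 0.5) * dt
    pts = rho_l * omega_perp[None, :] + ts[:, None] * omega[None, :]
    return float(np.sum(mu(pts)) * dt)

# Test coefficient mu = 1: each entry of the sinogram should equal the chord length.
mu_const = lambda pts: np.ones(len(pts))
sinogram = np.array([[chord_integral(mu_const, p, a) for a in phi] for p in rho])
chord_length = 2.0 * np.sqrt(np.clip(1.0 - rho**2, 0.0, None))
```

In the setting of the paper the entries of such an array would instead be the values $\tilde\Phi_i(\rho_l, \omega(\varphi_s))$ obtained from the jumps, which are then passed to a Radon inversion routine.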


Fig. 1 The 3D Shepp-Logan phantom at the section $r_3 = 0$: (a) original cross-section; (b) reconstruction using the jump of $f_1$; (c) reconstruction using the jump of $f_2$


To find $\tilde\Phi_i(\rho_l, \omega(\varphi_s))$, we calculate the jump $[H(r + d(r,\omega)\omega, \omega)]$ at the points $(r^{l,s}, \omega^s)$, where $r^{l,s} = \rho_l\,\omega_\perp(\varphi_s)$ and $\omega^s = \omega(\varphi_s)$. The vector function $H$ is calculated by a Monte Carlo method. Let $\mu(r) \le \bar\mu$ for any $r \in G$, where $\bar\mu$ is a constant. Put $\tilde\mu(r) = \bar\mu$. Then, by Theorem 1, we arrive at a solution in the form of a convergent series such as in (9). The component $(\bar\mu - \mu(r))\varphi(r,\omega)$ in the expression for the integral operator $\tilde S$ can be treated as a fictitious scattering in which the direction of photon propagation is kept fixed. This method, called the maximum cross-section method [1], simplifies the sampling of the free path length of a particle even in domains with a complex structure. For small variations of the total interaction coefficient in the medium, such an approach gives rather good results.

Let $m$ be the number of terms retained in the Neumann series and $n$ be the number of simulated trajectories. Then the function $f(r,\omega)$ can be found approximately in the form

\[ f(r,\omega) \approx f^n(r,\omega) = \frac{1}{n} \sum_{i=1}^{n} s_i(r,\omega), \]

\[ s_i(r,\omega) = \tilde f_0(r,\omega) + \sum_{j=1}^{m} \prod_{k=1}^{j} \frac{1 - \exp\bigl(-\bar\mu\, d(r^{i,k-1}, -\omega^{i,k-1})\bigr)}{\bar\mu} \bigl(\bar\mu - \mu(r^{i,k}) + \mu_s(r^{i,k})\bigr)\, Q(\omega^{i,k-1}, \omega^{i,k})\, \tilde f_0(r^{i,j}, \omega^{i,j}). \]

In simulating trajectories at each step $(i, k)$, we define $r^{i,k}$ by setting

\[ r^{i,k} = r^{i,k-1} - \omega^{i,k-1} t_{i,k}, \qquad r^{i,0} = r, \qquad \omega^{i,0} = \omega, \]

where $t_{i,k}$ is an independent realization of a random variable distributed on $[0, d(r^{i,k-1}, -\omega^{i,k-1})]$ with density

\[ \frac{\bar\mu \exp(-\bar\mu t)}{1 - \exp\bigl(-\bar\mu\, d(r^{i,k-1}, -\omega^{i,k-1})\bigr)}. \]
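A free path with the truncated exponential density above can be drawn by inverting its cumulative distribution function. The sketch below is one standard way to do this and is not claimed to be the authors' implementation; the function name `sample_free_path` and the parameter values are illustrative assumptions.

```python
import numpy as np

def sample_free_path(mu_bar, d, alpha):
    """Inverse-CDF sample on [0, d) for the density
    mu_bar * exp(-mu_bar * t) / (1 - exp(-mu_bar * d)),
    given alpha uniformly distributed on [0, 1)."""
    return -np.log(1.0 - alpha * (1.0 - np.exp(-mu_bar * d))) / mu_bar

rng = np.random.default_rng(1)
mu_bar, d = 2.0, 1.5                      # hypothetical bound and chord length
t = sample_free_path(mu_bar, d, rng.random(200_000))

# Exact mean of the truncated exponential, for comparison with the sample mean.
exact_mean = 1.0 / mu_bar - d * np.exp(-mu_bar * d) / (1.0 - np.exp(-mu_bar * d))
sample_mean = float(t.mean())
```

Because the path is sampled only on the segment inside the domain, the trajectory never leaves $G$, and the factor $(1 - \exp(-\bar\mu d))/\bar\mu$ in the estimator compensates for the normalization of the density.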

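Then we simulate a realization $\alpha_{i,k}$ of a random variable distributed uniformly on $[0, 1]$. In defining the quantity $\omega^{i,k}$ and the matrix $Q(\omega^{i,k-1}, \omega^{i,k})$, for

\[ \alpha_{i,k} \ge \frac{\bar\mu - \mu(r^{i,k})}{\bar\mu - \mu(r^{i,k}) + \mu_s(r^{i,k})} \]

we use the following formulas:

\[ \omega_1^{i,k} = \sqrt{1 - \nu_{i,k}^2}\, \cos\varphi_{i,k}, \qquad \omega_2^{i,k} = \sqrt{1 - \nu_{i,k}^2}\, \sin\varphi_{i,k}, \qquad \omega_3^{i,k} = \nu_{i,k}, \]

\[ Q(\omega^{i,k-1}, \omega^{i,k}) = 4\pi P(\omega^{i,k-1}, \omega^{i,k}), \]

and for $\alpha_{i,k}$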