Modeling, Simulation and Optimization of Complex Processes: Proceedings of the Third International Conference on High Performance Scientific Computing, March 6-10, 2006, Hanoi, Vietnam [1 ed.] 3540794085, 9783540794080

This proceedings volume contains a selection of papers presented at the Third International Conference on High Performance Scientific Computing, held March 6-10, 2006, in Hanoi, Vietnam.


Bock · Kostina · Phu · Rannacher (Eds.) Modeling, Simulation and Optimization of Complex Processes

Hans Georg Bock · Ekaterina Kostina Hoang Xuan Phu · Rolf Rannacher Editors

Modeling, Simulation and Optimization of Complex Processes Proceedings of the Third International Conference on High Performance Scientific Computing, March 6–10, 2006, Hanoi, Vietnam


Hans Georg Bock Universität Heidelberg Interdisziplinäres Zentrum für Wissenschaftliches Rechnen (IWR) Im Neuenheimer Feld 368 69120 Heidelberg Germany [email protected]

Ekaterina Kostina Universität Marburg FB Mathematik und Informatik Hans-Meerwein-Str. 35032 Marburg Germany [email protected]

Hoang Xuan Phu Vietnamese Academy of Science and Technology (VAST) Institute of Mathematics 18 Hoang Quoc Viet Road 10307 Hanoi Vietnam [email protected]

Rolf Rannacher Universität Heidelberg Interdisziplinäres Zentrum für Wissenschaftliches Rechnen (IWR) Im Neuenheimer Feld 368 69120 Heidelberg Germany [email protected]

The cover picture shows a computer reconstruction (courtesy of Angkor Project Group, IWR) of the mountain temple of Phnom Bakheng in the Angkor Region, Siem Reap, Cambodia, where the pre-conference workshop “Scientific Computing for the Cultural Heritage” took place on March 2–5, 2006.

ISBN: 978-3-540-79408-0

e-ISBN: 978-3-540-79409-7

Library of Congress Control Number: 2008925522 Mathematics Subject Classification: 49-06, 60-06, 65-06, 68-06, 70-06, 76-06, 85-06, 90-06, 93-06, 94-06

© 2008 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: WMX Design GmbH, Heidelberg
Printed on acid-free paper
9 8 7 6 5 4 3 2 1
springer.com

Preface

High Performance Scientific Computing is an interdisciplinary area that combines many fields such as mathematics, computer science, and scientific and engineering applications. It is a key high technology for competitiveness in industrialized countries as well as for speeding up development in emerging countries. High performance scientific computing develops methods for computer-aided simulation and optimization of systems and processes. In practical applications in industry and commerce, science and engineering, it helps to save resources, to avoid pollution, to reduce risks and costs, to improve product quality, to shorten development times, or simply to operate systems better.

Different aspects of scientific computing were the topics of the Third International Conference on High Performance Scientific Computing held at the Hanoi Institute of Mathematics, Vietnamese Academy of Science and Technology (VAST), March 6-10, 2006. The conference was organized by the Hanoi Institute of Mathematics, Ho Chi Minh City University of Technology, the Interdisciplinary Center for Scientific Computing (IWR), Heidelberg, and its International PhD Program “Complex Processes: Modeling, Simulation and Optimization”. The conference had about 200 participants from countries all over the world. The scientific program consisted of more than 130 talks, 10 of which were invited plenary talks given by John Ball (Oxford), Vincenzo Capasso (Milan), Paolo Carloni (Trieste), Sebastian Engell (Dortmund), Donald Goldfarb (New York), Wolfgang Hackbusch (Leipzig), Satoru Iwata (Tokyo), Hans Petter Langtangen (Oslo), Tao Tang (Hong Kong) and Philippe Toint (Namur). Topics included mathematical modelling, numerical simulation, methods for optimization and control, parallel computing, software development, applications of scientific computing in physics, chemistry, biology and mechanics, environmental and hydrology problems, transport, logistics and site location, communication networks, production scheduling, and industrial and commercial problems.

This proceedings volume contains 44 carefully selected contributions referring to lectures presented at the conference. We would like to thank all contributors and referees. We would also like to take this opportunity to thank the sponsors whose support significantly contributed to the success of the conference: the Interdisciplinary Center for Scientific Computing (IWR) and its International PhD Program “Complex Processes: Modeling, Simulation and Optimization” of the University of Heidelberg, the Gottlieb Daimler- and Karl Benz-Foundation, the DFG Research Center Matheon, the Berlin-Brandenburg Academy of Sciences and Humanities, the Abdus Salam International Centre for Theoretical Physics, the Vietnamese Academy of Science and Technology (VAST) and its Institute of Mathematics, the Vietnam National Program for Basic Sciences and its Key Project “Selected Problems of Optimization and Scientific Computing”, and the Ho Chi Minh City University of Technology.

Heidelberg, January 2008

Hans Georg Bock Ekaterina Kostina Hoang Xuan Phu Rolf Rannacher

Contents

Development of a Fault Detection Model-Based Controller
Nitin Afzulpurkar, Vu Trieu Minh . . . . . 1

Sensitivity Generation in an Adaptive BDF-Method
Jan Albersmeyer, Hans Georg Bock . . . . . 15

The gVERSE RF Pulse: An Optimal Approach to MRI Pulse Design
Christopher K. Anand, Stephen J. Stoyan, Tamás Terlaky . . . . . 25

Modelling the Performance of the Gaussian Chemistry Code on x86 Architectures
Joseph Antony, Mike J. Frisch, Alistair P. Rendell . . . . . 49

Numerical Simulation of the December 26, 2004 Indian Ocean Tsunami
J. Asavanant, M. Ioualalen, N. Kaewbanjak, S.T. Grilli, P. Watts, J.T. Kirby, F. Shi . . . . . 59

Approximate Dynamic Programming for Generation of Robustly Stable Feedback Controllers
Jakob Björnberg, Moritz Diehl . . . . . 69

Integer Programming Approaches to Access and Backbone IP Network Planning
Andreas Bley, Thorsten Koch . . . . . 87

An Adaptive Fictitious-Domain Method for Quantitative Studies of Particulate Flows
Sebastian Bönisch . . . . . 111

Adaptive Sparse Grid Techniques for Data Mining
H.-J. Bungartz, D. Pflüger, S. Zimmer . . . . . 121

On the Stochastic Geometry of Birth-and-Growth Processes. Application to Material Science, Biology and Medicine
Vincenzo Capasso . . . . . 131

Inverse Problem of Lindenmayer Systems on Branching Structures
Somporn Chuai-Aree, Willi Jäger, Hans Georg Bock, Suchada Siripant . . . . . 163

3D Cloud and Storm Reconstruction from Satellite Image
Somporn Chuai-Aree, Willi Jäger, Hans Georg Bock, Susanne Krömker, Wattana Kanbua, Suchada Siripant . . . . . 187

Providing Query Assurance for Outsourced Tree-Indexed Data
Tran Khanh Dang, Nguyen Thanh Son . . . . . 207

An Adaptive Space-Sharing Scheduling Algorithm for PC-Based Clusters
Viet Hung Doan, Nam Thoai, Nguyen Thanh Son . . . . . 225

Fitting Multidimensional Data Using Gradient Penalties and Combination Techniques
Jochen Garcke, Markus Hegland . . . . . 235

Mathematical Modelling of Chemical Diffusion through Skin using Grid-based PSEs
Christopher Goodyer, Jason Wood, Martin Berzins . . . . . 249

Modelling Gene Regulatory Networks Using Galerkin Techniques Based on State Space Aggregation and Sparse Grids
Markus Hegland, Conrad Burden, Lucia Santoso . . . . . 259

A Numerical Study of Active-Set and Interior-Point Methods for Bound Constrained Optimization
Long Hei, Jorge Nocedal, Richard A. Waltz . . . . . 273

Word Similarity In WordNet
Tran Hong-Minh, Dan Smith . . . . . 293

Progress in Global Optimization and Shape Design
D. Isebe, B. Ivorra, P. Azerad, B. Mohammadi, F. Bouchette . . . . . 303

EDF Scheduling Algorithm for Periodic Messages On Switched Ethernet
Myung-Kyun Kim, Dao Manh Cuong . . . . . 313

Large-Scale Nonlinear Programming for Multi-scenario Optimization
Carl D. Laird, Lorenz T. Biegler . . . . . 323

On the Efficiency of Python for High-Performance Computing: A Case Study Involving Stencil Updates for Partial Differential Equations
Hans Petter Langtangen, Xing Cai . . . . . 337

Designing Learning Control that is Close to Instability for Improved Parameter Identification
Richard W. Longman, Kevin Xu, Benjamas Panomruttanarug . . . . . 359

Fast Numerical Methods for Simulation of Chemically Reacting Flows in Catalytic Monoliths
Hoang Duc Minh, Hans Georg Bock, Hoang Xuan Phu, Johannes P. Schlöder . . . . . 371

A Deterministic Optimization Approach for Generating Highly Nonlinear Balanced Boolean Functions in Cryptography
Le Hoai Minh, Le Thi Hoai An, Pham Dinh Tao, Pascal Bouvry . . . . . 381

Project-Oriented Scheduler for Cluster Systems
T. N. Minh, N. Thoai, N. T. Son, D. X. Ky . . . . . 393

Optimizing Spring-Damper Design in Human Like Walking that is Asymptotically Stable Without Feedback
Katja D. Mombaur, Richard W. Longman, Hans Georg Bock, Johannes P. Schlöder . . . . . 403

Stability Optimization of Juggling
Katja Mombaur, Peter Giesl, Heiko Wagner . . . . . 419

Numerical Model of Far Turbulent Wake Behind Towed Body in Linearly Stratified Media
N. P. Moshkin, G. G. Chernykh, A. V. Fomina . . . . . 433

A New Direction to Parallelize Winograd’s Algorithm on Distributed Memory Computers
D. K. Nguyen, I. Lavallee, M. Bui . . . . . 445

Stability Problems in ODE Estimation
Michael R. Osborne . . . . . 459

A Fast, Parallel Performance of Fourth Order Iterative Algorithm on Shared Memory Multiprocessors (SMP) Architecture
M. Othman, J. Sulaiman . . . . . 471

Design and Implementation of a Web Services-Based Framework Using Remoting Patterns
Phung Huu Phu, Dae Seung Yoo, Myeongjae Yi . . . . . 479

Simulation of Tsunami and Flash Floods
S. G. Roberts, O. M. Nielsen, J. Jakeman . . . . . 489

Differentiating Fixed Point Iterations with ADOL-C: Gradient Calculation for Fluid Dynamics
Sebastian Schlenkrich, Andrea Walther, Nicolas R. Gauger, Ralf Heinrich . . . . . 499

Design Patterns for High-Performance Matrix Computations
Hoang M. Son . . . . . 509

Smoothing and Filling Holes with Dirichlet Boundary Conditions
Linda Stals, Stephen Roberts . . . . . 521

Constraint Hierarchy and Stochastic Local Search for Solving Frequency Assignment Problem
T.V. Su, D.T. Anh . . . . . 531

Half-Sweep Algebraic Multigrid (HSAMG) Method Applied to Diffusion Equations
J. Sulaiman, M. Othman, M. K. Hasan . . . . . 547

Solving City Bus Scheduling Problems in Bangkok by Eligen-Algorithm
Chotiros Surapholchai, Gerhard Reinelt, Hans Georg Bock . . . . . 557

Partitioning for High Performance of Predicting Dynamical Behavior of Color Diffusion in Water using 2-D tightly Coupled Neural Cellular Network
A. Suratanee, K. Na Nakornphanom, K. Plaimas, C. Lursinsap . . . . . 565

Automatic Information Extraction from the Web: An HMM-Based Approach
M. S. Tran-Le, T. T. Vo-Dang, Quan Ho-Van, T. K. Dang . . . . . 575

Advanced Wigner Method for Fault Detection and Diagnosis System
Do Van Tuan, Sang Jin Cho, Ui Pil Chong . . . . . 587

Appendix . . . . . 605

Development of a Fault Detection Model-Based Controller

Nitin Afzulpurkar¹ and Vu Trieu Minh²

¹ Asian Institute of Technology (AIT), [email protected]
² King Mongkut’s Institute of Technology North Bangkok (KMITNB)/Sirindhorn International Thai-German Graduate School (TGGS), [email protected]

Abstract This paper describes a model-based control system that can determine the optimal control actions online, detect faults quickly in the controlled process, and reconfigure the controller accordingly. Such a system can therefore perform its function correctly even in the presence of internal faults. A fault detection model-based (FDMB) controller consists of two main parts: the first is fault detection and diagnosis (FDD), and the second is controller reconfiguration (CR). Systems subject to such faults are modeled as a stochastic hybrid dynamic model. Each fault is deterministically represented by a mode in a discrete set of models. The FDD uses an interacting multiple-model (IMM) estimator and the CR uses a generalized predictive control (GPC) algorithm. Simulations for the proposed controller are illustrated and analyzed.

1 Introduction

Various methods for fault detection of dynamic systems have been studied and developed over recent years ([2–8]), but there are relatively few successful developments of controller systems that can deal with faults in sensors and actuators. In a process control system, for example, failures of an actuator or a sensor may cause serious problems and need to be detected and isolated as soon as possible. We therefore examine and propose a fault detection model-based (FDMB) controller system for fault detection and diagnosis (FDD) and controller reconfiguration (CR). The proposed FDMB controller is only simulated theoretically for simple examples; the results show its strong potential for real applications of a controller that detects sensor/actuator faults in dynamic systems. The outline of this paper is as follows: Section 2 describes the design and verification of fault modeling; Section 3 analyzes the selection of an FDD system; Section 4 develops a CR in an integrated FDMB controller; examples and simulations are given after each section to illustrate its main ideas; finally, conclusions are given in Section 5.

2 Fault Modeling

Faults are difficult to foresee and prevent. Traditionally, faults were handled by describing the resulting behavior of the system and grouping it into a hierarchic structure of fault models [1]. This approach is still widely used in practice: when a failure occurs, the system behavior changes and should be described by a different mode from the one that corresponds to the normal mode. For dynamic systems in which the state may jump as well as vary continuously within a discrete set of modes, an effective way to model the faults is the so-called stochastic hybrid system. Apart from applications to problems involving failures, hybrid systems have found great success in other areas such as target tracking and control involving possible structural changes [2]. The stochastic hybrid model assumes that the actual system at any time can be modeled sufficiently accurately by

x(k+1) = A(k, m(k+1)) x(k) + B(k, m(k+1)) u(k) + T(k, m(k+1)) \xi(k, m(k+1))   (1)

z(k+1) = C(k, m(k+1)) x(k) + \eta(k, m(k+1)),   (2)

and the system mode sequence is assumed to be a first-order Markov chain with transition probabilities

\Pi\{m_j(k+1) \mid m_i(k)\} = \pi_{ij}(k), \quad \forall m_i, m_j \in I,   (3)

where A, B, T, and C are the system matrices; x \in \mathbb{R}^n is the state vector; z \in \mathbb{R}^p is the measured output; u \in \mathbb{R}^m is the control input; \xi \in \mathbb{R}^{n_\xi} and \eta \in \mathbb{R}^p are independent noises with means \bar{\xi}(k) and \bar{\eta}(k) and covariances Q(k) and R(k); \Pi\{\cdot\} denotes probability; m(k) is the discrete-valued modal state, i.e. the index of the normal or fault mode, at time k; I = \{m_1, m_2, \ldots, m_N\} is the set of all possible system modes; and \pi_{ij}(k) is the transition probability from mode m_i to mode m_j, i.e. the probability that the system will jump to mode m_j at time instant k. Obviously, the following relation must hold for any m_i \in M:

\sum_{j=1}^{N} \pi_{ij}(k) = \sum_{j=1}^{N} \Pi\{m_j(k+1) \mid m_i(k)\} = 1, \quad i = 1, \ldots, N.   (4)

Faults can be modeled by changing the appropriate matrices A, B, and C in equation (1) or (2), representing the effectiveness of failures in the system:

m_i \in M_k = \begin{cases} x(k+1) = A_i(k) x(k) + B_i(k) u(k) + T_i(k) \xi_i(k) \\ z(k) = C_i(k) x(k) + \eta_i(k) \end{cases}   (5)

where the subscript i denotes the fault modeled in the model set m_i \in M_k = \{m_1, \ldots, m_N\}; each m_i corresponds to a mode (a fault) occurring in the process at time instant k. The number of failure combinations may be huge if the model set M_k is fixed over time, that is, for all k. We can resolve this difficulty by designing a variable structure model in which the model set M_k varies at any time within the total model set M, i.e. M_k \in M. A variable structure set overcomes fundamental limitations of a fixed structure set, because the fixed model set does not always exactly match the true system at any time, or the set of possible modes at any time varies and depends on the previous state of the system.

The design of the model set should ensure a clear difference between models so that they are identifiable by the multiple-model estimators. We therefore verify and check the distance between models in the model set M_k. At present there is still no algorithm to determine offline the difference between models for FDD detection. Therefore we propose to check the distance between models via the H_\infty norm, i.e. the distance d between two models m_1 and m_2 is defined as

d = \| \mathrm{Norm}(m_1 - m_2) \|_\infty   (6)

Even though this distance does not reflect the real difference between models (the magnitude of this value also depends on the system dimension units), it can help to verify the model set. If the distance between two models is short, they may not be identifiable by the FDD.

Example 1: Fault Model Set Design and Verification. Simulations throughout this paper use the following process model of a distillation column with four state variables, two inputs (feed flow rate and reflux flow rate) and two outputs (overhead flow rate and overhead composition). For simplicity, we verify only one input (feed flow rate). The state space model of the system is

M: \quad \dot{x}(t) = \begin{bmatrix} -0.05 & -6 & 0 & 0 \\ -0.01 & -0.15 & 0 & 0 \\ 1 & 0 & 0 & 13 \\ 1 & 0 & 0 & 0.1 \end{bmatrix} x(t) + \begin{bmatrix} -0.2 \\ 0.03 \\ 2 \\ 0 \end{bmatrix} u(t) + \xi(t), \qquad z(t) = \begin{bmatrix} 1 & -0.5 & 1 & 1 \\ -1 & 0.6 & 0 & 1 \end{bmatrix} x(t) + \xi(t)   (7)

It is assumed that the model set includes the following five models:

- Model 1: Nominal model (no fault); nothing changes in equation (7).
- Model 2: Total sensor 1 failure, z_{m2}(t) = \begin{bmatrix} 0 & 0 & 0 & 0 \\ -1 & 0.6 & 0 & 1 \end{bmatrix} x(t) + \xi(t)
- Model 3: Total sensor 2 failure, z_{m3}(t) = \begin{bmatrix} 1 & -0.5 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} x(t) + \xi(t)
- Model 4: -50% sensor 1 failure, z_{m4}(t) = \begin{bmatrix} 0.5 z_1 \\ z_2 \end{bmatrix} = \begin{bmatrix} 0.5 & -0.25 & 0.5 & 0.5 \\ -1 & 0.6 & 0 & 1 \end{bmatrix} x(t) + \xi(t)
- Model 5: -50% sensor 2 failure, z_{m5}(t) = \begin{bmatrix} z_1 \\ 0.5 z_2 \end{bmatrix} = \begin{bmatrix} 1 & -0.5 & 1 & 1 \\ -0.5 & 0.3 & 0 & 0.5 \end{bmatrix} x(t) + \xi(t)

The above system was discretized with a sampling period T = 1 s. Now we check the distances between models (Table 1).

Table 1. Distances between models

Models   m1    m2    m3    m4    m5
m1         0   479    85   239   322
m2       479     0   486   239   578
m3        85   486     0   254   239
m4       239   239   254     0   401
m5       322   578   239   401     0

The distance from m1 to m3 is relatively smaller than the other distances. This will make it difficult for the FDD to identify which model is currently running; the issue is addressed in the next section.
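As a rough cross-check of such a model-set verification, the pairwise distance (6) can be approximated numerically by sampling the frequency response of the difference between two models and taking the largest singular value over the grid. The sketch below does this with NumPy only; the frequency grid and the function name hinf_distance are illustrative assumptions, the model matrices are those reconstructed above, and the sampled maximum only approximates the true H∞ norm, so the numbers need not match Table 1 exactly.

```python
import numpy as np

def freq_response(A, B, C, w):
    """Transfer matrix G(jw) = C (jwI - A)^{-1} B of a continuous-time model."""
    n = A.shape[0]
    return C @ np.linalg.solve(1j * w * np.eye(n) - A, B)

def hinf_distance(C1, C2, A, B, w_grid):
    """Approximate ||G1 - G2||_inf by the largest singular value of the
    difference of the two output maps over a finite frequency grid."""
    return max(np.linalg.svd(freq_response(A, B, C1, w) - freq_response(A, B, C2, w),
                             compute_uv=False)[0]
               for w in w_grid)

# Distillation-column model of Example 1 (values as reconstructed above).
A = np.array([[-0.05, -6.0, 0.0, 0.0],
              [-0.01, -0.15, 0.0, 0.0],
              [1.0, 0.0, 0.0, 13.0],
              [1.0, 0.0, 0.0, 0.1]])
B = np.array([[-0.2], [0.03], [2.0], [0.0]])
C_nominal = np.array([[1.0, -0.5, 1.0, 1.0],
                      [-1.0, 0.6, 0.0, 1.0]])
C_sensor1_fail = C_nominal.copy()
C_sensor1_fail[0, :] = 0.0            # total sensor 1 failure (Model 2)

w_grid = np.logspace(-3, 2, 400)      # assumed frequency range, rad/s
print(hinf_distance(C_nominal, C_sensor1_fail, A, B, w_grid))
```

Since the models of Example 1 differ only in their output matrices, only C changes between the compared systems here; the same routine also applies when A or B differ between fault modes.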

3 Fault Detection and Diagnosis (FDD)

In this section we analyze and select a fast and reliable FDD system applied to the above set of models, using algorithms of multiple-model (MM) estimators. MM estimation algorithms appeared in the early 1980s when Shalom and Tse [5] introduced a suboptimal, computationally bounded extension of the Kalman filter to cases where measurements were not always available. Then several multiple-model filtering techniques, which could provide accurate state estimation, were developed. Major existing approaches for MM estimation are discussed and introduced in [4–8], including the Non-Interacting Multiple Model (NIMM), the Gaussian Pseudo Bayesian (GPB1), the Second-order Gaussian Pseudo Bayesian (GPB2), and the Interacting Multiple Model (IMM) estimators.

From the design of the model set, a bank of Kalman filters runs in parallel at every time, each based on a particular model, to obtain the model-conditional estimates. The overall state estimate is a probabilistically weighted sum of these model-conditional estimates. The jumps in system modes can be modeled as switching between the assumed models in the set. Figure 1 shows the operation of a recursive multiple-model estimator, where \hat{x}_i(k|k) is the estimate of the state x(k) obtained from the Kalman filter based on the model m_i at time k given the measurement sequence through time z(k|k); \hat{x}_i^0(k-1|k-1) is the equivalent reinitialized estimate at time (k-1) used as the input to the filter based on model m_i at time k; \hat{x}(k|k) is the overall state estimate; and P_i(k|k), P_i^0(k-1|k-1), and P(k|k) are the corresponding covariances.

A simple and straightforward way of filter reinitialization is that each single-model-based recursive filter uses its own previous state estimate and state covariance as the input at the current cycle:

\hat{x}_i^0(k-1|k-1) = \hat{x}_i(k-1|k-1), \qquad P_i^0(k-1|k-1) = P_i(k-1|k-1)   (8)

Figure 1. Structure of a MM estimator

This leads to the non-interacting multiple model (NIMM) estimator, because the filters operate in parallel without interactions with one another, which is reasonable only under the assumption that the system mode does not change (Figure 2).

Figure 2. Illustration of the NIMM estimator

Another way of reinitialization is to use the previous overall state estimate and covariance as the required input for each filter:

\hat{x}_i^0(k-1|k-1) = \hat{x}(k-1|k-1), \qquad P_i^0(k-1|k-1) = P(k-1|k-1)   (9)

This leads to the first-order Generalized Pseudo Bayesian (GPB1) estimator. It belongs to the class of interacting multiple-model estimators since it uses the previous overall state estimate, which carries information from all filters. Clearly, if the transition probability matrix is an identity matrix, this method of reinitialization reduces to the first one (Figure 3).

Figure 3. Illustration of the GPB1 estimator

The GPB1 and GPB2 algorithms were the result of early work by Ackerson and Fu [6], and good overviews are provided in [7], where suboptimal hypothesis pruning techniques are compared. GPB2 differs from GPB1 by including knowledge of the previous time step’s possible mode transitions, as modeled by a Markov chain. Thus, GPB2 produces slightly smaller tracking errors than GPB1 during non-maneuvering motion. However, owing to the size limit of this paper, we do not include GPB2 in our simulation test and comparison.

A significantly better way of reinitialization is used by the IMM. The IMM was introduced by Blom in [8] and by Zhang and Li in [4]:

\hat{x}_j^0(k|k) = E[x(k) \mid z^k, m_j(k+1)] = \sum_i \hat{x}_i(k|k)\, P\{m_i(k) \mid z^k, m_j(k+1)\}
P_j^0(k|k) = \mathrm{cov}[\hat{x}_j^0(k|k)] = \sum_i P\{m_i(k) \mid z^k, m_j(k+1)\} \{ P_i(k|k) + |\tilde{x}_{ij}^0(k|k)|^2 \}   (10)

where cov[·] stands for covariance and \tilde{x}_{ij}^0(k|k) = \hat{x}_i^0(k|k) - \hat{x}_j^0(k|k). Figure 4 depicts the reinitialization in the IMM estimator. In this paper we use this approach for setting up the FDD system.

Figure 4. Illustration of the IMM estimator

The probability of each model matching

to the system mode provides the required information for the mode decision. The mode decision can be made by comparison with a fixed threshold probability \mu_T: if \max(\mu_i(k)) \geq \mu_T, the corresponding mode has occurred and takes effect at the next cycle; otherwise there is no new mode detection, and the system maintains the current mode for the next cycle's calculation.

Example 2: Analysis and Selection of the FDD system. It is assumed that the modes in Example 1 can jump from one to another with the mode transition probability matrix

\pi_{ij} = \begin{bmatrix} 0.96 & 0.01 & 0.01 & 0.01 & 0.01 \\ 0.05 & 0.95 & 0 & 0 & 0 \\ 0.05 & 0 & 0.95 & 0 & 0 \\ 0.05 & 0 & 0 & 0.95 & 0 \\ 0.05 & 0 & 0 & 0 & 0.95 \end{bmatrix}.

The threshold value for the mode probabilities is chosen as \mu_T = 0.9. We now compare the three estimators NIMM, GPB1, and IMM to test their ability to detect faults. The five models are run over time: m_1 for k = 1-20, k = 41-60, k = 81-100, k = 121-140, and k = 161-180; m_2 for k = 21-40; m_3 for k = 61-80; m_4 for k = 101-120; and m_5 for k = 141-160. The results of the simulation are shown in Figure 5.

Figure 5. Probabilities of estimators (a) NIMM, (b) GPB1, and (c) IMM

In Figure 5 we can see that the GPB1 estimator performs as well as the IMM estimator, while the NIMM estimator fails to detect the sensor failures in the model set. Next we test the ability of the GPB1 and IMM estimators further by narrowing the distances between modes as much as possible, until one of the methods can no longer detect the failures. We now design the following mode set: m_1 = Model 1: nominal mode; m_2 = Model 2: -5% sensor 1 failure, z_{m2}(t) = \begin{bmatrix} 0.95 z_1 \\ z_2 \end{bmatrix}; m_3 = Model 3: -5% sensor 2 failure, z_{m3}(t) = \begin{bmatrix} z_1 \\ 0.95 z_2 \end{bmatrix}; m_4 = Model 4: -2% sensor 1 failure, z_{m4}(t) = \begin{bmatrix} 0.98 z_1 \\ z_2 \end{bmatrix}; and m_5 = Model 5: -2% sensor 2 failure, z_{m5}(t) = \begin{bmatrix} z_1 \\ 0.98 z_2 \end{bmatrix}. With these parameters we obtain the distances between models shown in Table 2.

Table 2. Distances between models

Models   m1      m2      m3      m4      m5
m1       0       0.048   0.009   0.240   0.004
m2       0.048   0       0.049   0.192   0.048
m3       0.009   0.049   0       0.240   0.004
m4       0.240   0.192   0.240   0       0.240
m5       0.004   0.048   0.004   0.240   0

Figure 6. Probabilities of estimators (a) GPB1, and (b) IMM

Since the distances between the models are very close, the GPB1 fails to detect the failures, while the IMM still proves much superior in detecting them (Figure 6). As a result, we select the IMM for our FDD system. We now move to the next step and design a controller reconfiguration for the FDMB system.
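To make the recursion of equations (8)-(10) and the threshold decision concrete, the following sketch runs one IMM cycle over a bank of discrete-time Kalman filters. It is a minimal illustration, not the authors' implementation: the function names, the zero-mean noise assumption, and the use of a single process-noise covariance Q for all models are simplifications introduced here.

```python
import numpy as np

def imm_cycle(models, pi, mu, x0, P0, u, z, Q, R):
    """One IMM cycle: mixing, per-model Kalman filter, mode-probability
    update and combination.  models is a list of (A, B, C) triples,
    pi the transition matrix, mu the current mode probabilities."""
    N = len(models)
    # Mixing probabilities and mixed initial conditions, cf. eq. (10).
    c_j = pi.T @ mu
    mu_ij = (pi * mu[:, None]) / c_j[None, :]
    x_mix = [sum(mu_ij[i, j] * x0[i] for i in range(N)) for j in range(N)]
    P_mix = [sum(mu_ij[i, j] * (P0[i] + np.outer(x0[i] - x_mix[j], x0[i] - x_mix[j]))
                 for i in range(N)) for j in range(N)]
    xs, Ps, lik = [], [], np.zeros(N)
    for j, (A, B, C) in enumerate(models):
        # Kalman prediction and update for model j.
        xp = A @ x_mix[j] + B @ u
        Pp = A @ P_mix[j] @ A.T + Q
        S = C @ Pp @ C.T + R
        K = Pp @ C.T @ np.linalg.inv(S)
        r = z - C @ xp
        xs.append(xp + K @ r)
        Ps.append((np.eye(len(xp)) - K @ C) @ Pp)
        lik[j] = np.exp(-0.5 * r @ np.linalg.solve(S, r)) / np.sqrt(np.linalg.det(2 * np.pi * S))
    mu_new = lik * c_j
    mu_new /= mu_new.sum()
    x_hat = sum(mu_new[j] * xs[j] for j in range(N))   # overall weighted estimate
    return xs, Ps, mu_new, x_hat

def decide_mode(mu, current_mode, mu_T=0.9):
    """Keep the current mode unless some mode probability exceeds the threshold."""
    j = int(np.argmax(mu))
    return j if mu[j] >= mu_T else current_mode
```

The threshold logic in decide_mode mirrors the rule stated above: a new mode is only declared once its probability exceeds µ_T; otherwise the previously detected mode is kept for the next cycle.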

4 Controller Reconfiguration (CR)

In this section we develop a new CR which can determine the optimal control actions online and reconfigure the controller using Generalized Predictive Control (GPC). We will show how an IMM-based GPC controller can form a good FDMB system. First we review the basic GPC algorithm. Generalized Predictive Control (GPC) is one of the MPC techniques developed by Clarke et al. [9, 10]; GPC was intended to offer a new adaptive control alternative. GPC uses the ideas of the controlled autoregressive integrated moving average (CARIMA) plant model in an adaptive context with self-tuning by recursive estimation. Kinnaert [11] developed GPC from the CARIMA model into a more general state-space form for multiple-input multiple-output (MIMO) systems as in equation (5). The optimal control problem for the general cost function GPC controller for MIMO systems is

\min_U J(U, x(k)) = x(k+N_y)' \Phi x(k+N_y) + \sum_{k=0}^{N_y-1} \{ x(k)' \Omega x(k) + u(k)' \Theta u(k) \}   (11)

where the column vector U = [u_0', u_1', \ldots, u_{N_U-1}']' is the predictive optimization vector, and \Omega = \Omega' > 0, \Theta = \Theta' \geq 0 are the weighting matrices for the predictive state and input, respectively. The Lyapunov matrix \Phi > 0 is the solution of the Riccati equation for the system in equation (5). N_y and N_U are the predictive output horizon and the predictive control horizon, respectively. By substituting x(k+N) = A^N x(k) + \sum_{j=0}^{N-1} A^j B u(k+N-1-j) + A^{N-1} T \xi(k), equation (11) can be rewritten as

\min_U \{ \tfrac{1}{2} U' H U + x(k)' F U + \xi(k)' Y U \}   (12)

where H = H' > 0, and H, F, Y are obtained from equations (5) and (11). The optimization problem (12) is then a quadratic program and depends on the current state x(k) and noise \xi(k). In the case of unconstrained GPC, the optimization input vector can be calculated as

U = -H^{-1} \{ x(k)' F + \xi(k)' Y \}   (13)

and then the first input U(1) in U is implemented into the system at each time step.

The above GPC algorithm is proposed for a non-output-tracking system. However, in reality the primary control objective is to force the plant outputs to track their setpoints. In this case, the state space model in equation (5) can be transformed into a new innovation form as follows:

\bar{M}: \quad \hat{x}(k+1|k) = \tilde{A} \hat{x}(k|k-1) + \tilde{B} \triangle u(k) + \tilde{K} e(k), \qquad z(k) = \tilde{C} \hat{x}(k|k-1) + e(k)   (14)

where \tilde{A}, \tilde{B}, \tilde{C}, and \tilde{K} are fixed matrices obtained from A, B, C, and T in equation (5), \eta(k) = \xi(k) = e(k), z(k) \in \mathbb{R}^{n_y}, \triangle u(k) = u(k) - u(k-1) \in \mathbb{R}^{n_u}, and \hat{x}(k|k-1) is an estimate of the state x(k) obtained from a Kalman filter. For a moving horizon control, the prediction of x(k+j|k) given the information \{z(k), z(k-1), \ldots, u(k-1), u(k-2), \ldots\} is

\hat{x}(k+j|k) = A^j \hat{x}(k|k-1) + \sum_{i=0}^{j-1} A^{j-1-i} B \triangle u(k+i) + A^{j-1} K e(k)   (15)

and the prediction of the filtered output will be

\hat{z}(k+j|k) = C A^j \hat{x}(k|k-1) + \sum_{i=0}^{j-1} C A^{j-1-i} B \triangle u(k+i) + C A^{j-1} K e(k)   (16)

If we form \tilde{u}(k) = [\triangle u(k)', \ldots, \triangle u(k+N_U-1)']' and \tilde{z}(k) = [\hat{z}(k+N_0|k)', \ldots, \hat{z}(k+N_{y-1}|k)']', we can write the global predictive model for the filtered output, for the output prediction horizon from 0 to N_y and the input prediction horizon from 0 to N_U - 1, as

\hat{z}(k) = \begin{bmatrix} CB & \ldots & 0 \\ CAB & \ldots & 0 \\ \vdots & \ldots & \vdots \\ CA^{N_U-1}B & \ldots & CB \\ \vdots & \ldots & \vdots \\ CA^{N_y-1}B & \ldots & CA^{N_y-N_U}B \end{bmatrix} \tilde{u}(k) + \begin{bmatrix} CA \\ CA^2 \\ \vdots \\ CA^{N_y} \end{bmatrix} \hat{x}(k|k-1) + \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{N_y-1} \end{bmatrix} K e(k).

For simplicity, we can rewrite this as

\tilde{z}(k) = G \tilde{u}(k) + V \hat{x}(k|k-1) + W K e(k)   (17)
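As an illustration of how the prediction matrices in (17) can be assembled, here is a small NumPy sketch. It is a plain transcription of the block structure above under the stated horizons (outputs predicted from step 1 to N_y, control moves from 0 to N_U-1); the function name prediction_matrices is ours, not from the paper.

```python
import numpy as np

def prediction_matrices(A, B, C, Ny, Nu):
    """Build G, V, W of z~(k) = G u~(k) + V x^(k|k-1) + W K e(k), eq. (17).
    W contains the C A^j blocks; the Kalman gain K is applied afterwards."""
    ny, nx = C.shape
    nu = B.shape[1]
    G = np.zeros((Ny * ny, Nu * nu))
    V = np.zeros((Ny * ny, nx))
    W = np.zeros((Ny * ny, nx))
    Apow = np.eye(nx)                              # A^j, starting with A^0
    for j in range(Ny):                            # block row j predicts z^(k+j+1|k)
        W[j*ny:(j+1)*ny, :] = C @ Apow             # C A^j
        V[j*ny:(j+1)*ny, :] = C @ Apow @ A         # C A^{j+1}
        for i in range(min(j + 1, Nu)):            # lower block-triangular part of G
            G[j*ny:(j+1)*ny, i*nu:(i+1)*nu] = C @ np.linalg.matrix_power(A, j - i) @ B
        Apow = Apow @ A
    return G, V, W
```

The same routine can be called once per model m_i of the set to obtain G_i, V_i, W_i, which are then blended with the mode probabilities as described after Figure 7.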

Consider the new general cost function of GPC for tracking setpoints:

\min_{\tilde{u}(k)} J(\tilde{u}(k), x(k)) = \sum_{k=0}^{N_y-1} \{ [z(k) - w(k)]' [z(k) - w(k)] + [\triangle u(k)]' \Gamma [\triangle u(k)] \}   (18)

where w(k) is the output reference setpoint and \Gamma = \Gamma' \geq 0 is the control weighting matrix. Similarly, the optimization problem in equation (18) is a quadratic program, and in the case of unconstrained GPC the control law that minimizes this cost function is

\tilde{u}(k) = -(G'G + \Gamma)^{-1} (V \hat{x}(k|k-1) + W K e(k) - w(k))   (19)

then the first input \tilde{u}(1) in \tilde{u}(k) is implemented into the system in each time step. Note that the optimization solutions in equations (13) and (19) are for unconstrained GPC. In the case of constrained GPC, the quadratic programs in equations (11) and (18) must be solved subject to constraints on states, inputs and outputs.

Now we can combine GPC with the IMM estimator. Since GPC follows a stochastic perspective, we can use a GPC controller for the CR, taking the outputs of the IMM as the inputs of the CR. The overall state estimate is x(k) \approx \hat{x}(k) = \sum_{i=1}^{N} \mu_i(k) \hat{x}_i(k), where N is the number of models in the model set. So we can assume that the “true” system is the weighted sum with \mu_i(k) of the models in a convex combination M_k = \{m_1, \ldots, m_N\}. A generalized diagram of the IMM-based GPC controller is shown in Figure 7.

Figure 7. Diagram of the IMM-based GPC controller
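A minimal sketch of the combination shown in Figure 7, reusing the hypothetical prediction_matrices helper from the earlier sketch: the per-model matrices are blended with the current mode probabilities and the unconstrained tracking solution of (18) is evaluated. The variable names, the stacked reference trajectory w_traj, and the explicitly written-out G' factor in the least-squares solution are our assumptions, not the authors' code.

```python
import numpy as np

def imm_gpc_control(models, mu, x_hat, e, w_traj, Gamma, Ny, Nu, nu):
    """Probability-weighted GPC step: blend the prediction matrices of each
    model (A_i, B_i, C_i, K_i) with the IMM mode probabilities mu_i and solve
    the unconstrained tracking problem (18); returns the first input move."""
    blended = None
    for m, (A, B, C, K) in zip(mu, models):
        G_i, V_i, W_i = prediction_matrices(A, B, C, Ny, Nu)   # helper sketched above
        terms = (m * G_i, m * V_i, m * (W_i @ K))
        blended = terms if blended is None else tuple(b + t for b, t in zip(blended, terms))
    G, V, WK = blended
    # Least-squares minimizer of (18); Gamma is the (Nu*nu x Nu*nu) control weighting.
    rhs = G.T @ (w_traj - V @ x_hat - WK @ e)
    u_tilde = np.linalg.solve(G.T @ G + Gamma, rhs)
    return u_tilde[:nu]        # receding horizon: apply only the first move
```

As in any receding-horizon scheme, only the first blended input move is applied; at the next sample the IMM supplies fresh mode probabilities and the blend is rebuilt.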

So we build a bank of GPC controllers, one for each model in the model set. Assuming the mode probabilities are constant during the control horizon, we can easily derive the GPC control law by forming the matrices G = \sum_{i=1}^{N} \mu_i G_i, V = \sum_{i=1}^{N} \mu_i V_i and W = \sum_{i=1}^{N} \mu_i W_i that correspond to the “true” model m = \sum_{i=1}^{N} \mu_i m_i in equation (17), and determining the optimal control action from equation (19). One disadvantage of the IMM-based GPC controller should be noted here: the type and the magnitude of the input excitation play an important role in its performance. When the magnitude of the input signal is very small, the residuals of the Kalman filters will be very small and, therefore, the likelihood functions for the modes will be approximately equal. This leads to unchanging (or very slowly changing) mode probabilities, which in turn makes the IMM estimator incapable of detecting failures. Next, we run some simulations to test the proposed fault detection and control system.

Example 3: Controller Reconfiguration (CR).

Example 3.1: We run a normal GPC controller with N_y = 4, N_U = 4, the weighting matrix \Gamma = 0.1 and a reference setpoint w = 1. It is assumed that from time k = 0-50 there is a -50% sensor 1 failure, z_{m2}(t) = \begin{bmatrix} 0.5 z_1 \\ z_2 \end{bmatrix}; from k = 51-100 the system runs in the normal mode; and from k = 101-150 there is a 50% sensor 1 failure, z_{m3}(t) = \begin{bmatrix} 1.5 z_1 \\ z_2 \end{bmatrix}. The normal GPC controller provides the wrong output (Figure 8).

Figure 8. GPC controller with sensor errors (a) Output, and (b) Input

Example 3.2: We run the same parameters as in Example 3.1, but using our proposed IMM-based GPC controller with the transition matrix

\pi_{ij} = \begin{bmatrix} 0.90 & 0.05 & 0.05 \\ 0.05 & 0.90 & 0.05 \\ 0.05 & 0.05 & 0.90 \end{bmatrix}.

The results are shown in Figure 9: our new FDMB system still keeps the output at the desired setpoint, since the IMM estimator easily identifies the correct fault mode and activates the CR system online.

Figure 9. IMM-based GPC controller (a) Output, (b) Input, and (c) Probabilities

Example 3.3: A low magnitude of the input signal can lead to failure of the IMM estimator. We run the same parameters as in Example 3.2 but reduce the reference setpoint to a very low value, w = 0.01. The system becomes uncontrollable (Figure 10).

Figure 10. IMM-based GPC controller with low magnitude of input signal

5 Conclusion

Systems subject to sensor and actuator failures can be modeled as a stochastic hybrid system with fault modes in the model set. Instead of a fixed structure model, the model set can be designed with a variable structure. A variable structure model can overcome fundamental limitations of a fixed structure model when the number of failure combinations becomes huge and the fixed model set does not match the true system mode set at all times, or when the set of possible modes varies over time and depends on the previous state of the system. Our proposed IMM-based GPC controller can provide real-time control performance and online detection and diagnosis of sensor and actuator failures. Simulations in this study show that the system can maintain the output setpoints amid internal failures. One of the main advantages of the GPC algorithm is that the controller can provide soft switching signals based on the weighted probabilities of the outputs of the different models. The main difficulty of this approach is the choice of the modes in the model set, as well as of the transition probability matrix that assigns probabilities for jumping from one mode to another, since the IMM algorithms are very sensitive to the transition probability matrix. Another limitation of the IMM-based GPC controller is the magnitude of the input excitation: when we change the output setpoints to small values, the input signal may become very small, which leads to unchanging mode probabilities, so that the IMM-based GPC controller cannot detect failures. Lastly, this approach does not consider issues of uncertainty in the controller system.

References

1. Cristian, F., Understanding Fault Tolerant Distributed Systems, Communications of the ACM, Vol. 34, pp. 56-78, 1991
2. Li, R., Hybrid Estimation Techniques, Control and Dynamic Systems, Vol. 76, pp. 213-287, Academic Press, New York, 1996
3. Kanev, S., Verhaegen, M., Controller Reconfiguration for Non-linear Systems, Control Engineering Practice, Vol. 8, 11, pp. 1223-1235, 2000
4. Zhang, Y., Li, R., Detection and Diagnosis of Sensor and Actuator Failures Using IMM Estimator, IEEE Trans. on Aerospace and Electronic Systems, Vol. 34, 4, 1998
5. Shalom, Y., and Tse, E., Tracking in a Cluttered Environment with Probabilistic Data Association, Automatica, Vol. 11, pp. 451-460, 1975
6. Ackerson, G., and Fu, K., On State Estimation in Switching Environments, IEEE Trans. on Automatic Control, Vol. 15, 1, pp. 10-17, 1970
7. Tugnait, J., Detection and Estimation of Abruptly Changing Systems, Automatica, Vol. 18, pp. 607-615, 1982
8. Blom, H., and Shalom, Y., Interacting Multiple Model for Systems with Markovian Switching Coefficients, IEEE Trans. on Automatic Control, Vol. 33, 8, pp. 780-785, 1983
9. Clarke, D. W., Mohtadi, C., and Tuffs, P. S., Generalized Predictive Control – Extensions and Interpretations, Automatica, 23(2), 149-160, 1987
10. Clarke, D. W., Mohtadi, C., and Tuffs, P. S., Generalized Predictive Control: I. The Basic Algorithm, Automatica, 23(2), 137-147, 1987
11. Kinnaert, M., Adaptive Generalized Predictive Control for MIMO Systems, Int. J. Control, 50(1), 161-172, 1987

Sensitivity Generation in an Adaptive BDF-Method

Jan Albersmeyer and Hans Georg Bock

Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Im Neuenheimer Feld 368, D-69120 Heidelberg, Germany
[email protected], [email protected]

Abstract In this article we describe state-of-the-art approaches to calculate solutions and sensitivities for initial value problems (IVP) of semi-explicit systems of differential-algebraic equations of index one. We start with a description of the techniques we use to solve the systems efficiently with an adaptive BDF-method. Afterwards we focus on the computation of sensitivities using the principle of Internal Numerical Differentiation (IND) invented by Bock [4]. We present a newly implemented reverse mode of IND to generate sensitivity information in an adjoint way. At the end we show a numerical comparison of the old and new approaches for sensitivity generation using the software package DAESOL-II [1], in which both approaches are implemented.

1 Introduction

In real-world dynamical optimization problems the underlying process models can often be described by systems of ordinary differential equations (ODEs) or differential-algebraic equations (DAEs). Many state-of-the-art algorithms for optimal control, parameter estimation or experimental design used to solve such problems are based on derivative information, e.g. sequential quadratic programming (SQP) type or Gauss-Newton type methods. This results in a demand for an efficient and reliable way to calculate accurate function values and high-precision derivatives of the objective function and the constraints. Therefore we have to calculate (directional) derivatives of the solutions of the model ODEs/DAEs, so-called (directional) sensitivities. The need for accurate sensitivity information also arises, for example, from model reduction and network analysis, e.g. in systems biology [13].

In the following we will first describe how to compute solutions of IVPs for DAEs of index one efficiently, and then how to generate accurate sensitivity information along with the computation of the nominal trajectories by using the principle of Internal Numerical Differentiation (IND) [4]. At the end we show some numerical results obtained with the integrator DAESOL-II, a C++ code based on BDF-methods written by Albersmeyer [1], where the presented ideas have been implemented.

2 Efficient Solution of the Initial Value Problems

We will now briefly describe how the solutions of the IVPs themselves can be generated efficiently using an adaptive BDF-method, and sketch the ideas and features that are implemented in the code DAESOL-II.

2.1 Problem formulation

We consider in this article IVPs for semi-implicit DAEs of the type

A(t, x, z, p, q) \dot{x} - f(t, x, z, p, q) = 0, \qquad x(t_0) = x_0,   (1a)
g(t, x, z, p, q) - \theta(t)\, g(t_0, x_0, z_0, p, q) = 0, \qquad z(t_0) = z_0.   (1b)

Here t \in [t_0, t_f] \subset \mathbb{R} denotes the time, x \in \mathbb{R}^{n_x} represents the differential states, z \in \mathbb{R}^{n_z} the algebraic states, p \in \mathbb{R}^{n_p} the parameters and q \in \mathbb{R}^{n_q} the control parameters. We use here a relaxed formulation of the algebraic equations. This allows us to start the integration with inconsistent algebraic variables, which is often an advantage when solving optimization problems. The damping function \theta(\cdot) is a nonnegative, strictly decreasing real-valued function satisfying \theta(t_0) = 1. Furthermore we assume that A and \partial g / \partial z are regular along the solution trajectory (index 1 condition). In our practical problems the ODE/DAE system is usually stiff, nonlinear and high-dimensional.

2.2 Strategies used in DAESOL-II

The code DAESOL-II is based on variable-order variable-stepsize Backward Differentiation Formulas (BDF). BDF methods were invented by Curtiss and Hirschfelder [6] for the solution of stiff ODEs, and were later also very successfully applied to DAEs. They are known for their excellent stability properties for stiff equations. Besides that, they give a natural and efficient way to obtain an error-controlled continuous representation of the solution [5] by interpolation polynomials, which are calculated during the solution process of the IVP anyway. In every BDF step one has to solve the implicit system of corrector equations

A(t_{n+1}, x_{n+1}, z_{n+1}, p, q) \sum_{l=0}^{k_n} \alpha_l^n x_{n+1-l} - h_n f(t_{n+1}, x_{n+1}, z_{n+1}, p, q) = 0   (2a)
g(t_{n+1}, x_{n+1}, z_{n+1}, p, q) - \theta(t_{n+1})\, g(t_0, x_0, z_0, p, q) = 0,   (2b)

where n is the index of the current BDF step, k_n is the BDF order in step n, and h_n the current stepsize. The coefficients \alpha_l^n of the BDF corrector polynomial are calculated and updated efficiently via modified divided differences. For the solution of this implicit system we apply a Newton-like method, where we follow a monitor strategy to use existing Jacobian information as long as possible. Based on the contraction rates of the Newton-like method, we decide whether to reuse the Jacobian, to decompose it anew with the current stepsize and coefficients and old derivative information of the model functions, or to build it from scratch. Especially the second option often saves a lot of evaluations of model derivatives compared to ordinary approaches [9]. Note that in any case in our algorithm at most one iteration matrix is used per BDF step and at most three Newton iterations are made.

The stepsize and order selection in DAESOL-II is based on local error estimation on the variable integration grid, and aims for relatively smooth stepsize changes for stability reasons. Compared to stepsize strategies based on equidistant grids, this approach leads to better estimates and results in fewer step rejections [3, 10]. DAESOL-II allows the use of inconsistent initial values via the relaxed formulation and alternatively provides routines to compute consistent initial values. The generation of derivatives of the model functions is done optionally by user-supplied derivative functions, internally by finite differences, or by automatic differentiation via built-in ADOL-C support. In any case, directional derivatives are used whenever possible to reduce memory consumption and computational effort. Linear algebra subproblems are currently solved either using routines from the ATLAS [15] library in the case of dense matrices, or using UMFPACK [7] in the case of unstructured sparse matrices. A complete survey of the strategies and features of DAESOL-II can be found in [1].
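To make the corrector step (2) concrete, here is a small sketch of one BDF step with a Newton-like iteration that keeps a previously factorized iteration matrix. It is only an illustration of the idea, not DAESOL-II code: the ODE-only setting (A = I, no algebraic states), the trivial predictor, the fixed iteration cap of three, and the names bdf_step and M_lu are assumptions made here.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def bdf_step(f, jac_f, t_next, x_hist, alpha, h, M_lu=None, max_iter=3, tol=1e-10):
    """One BDF corrector step for x' = f(t, x) (A = I, no algebraic part):
    solve sum_l alpha_l x_{n+1-l} - h f(t_{n+1}, x_{n+1}) = 0 with a
    Newton-like method; M_lu is a (possibly outdated) factorized iteration matrix."""
    x = x_hist[0].copy()                                    # crude predictor: reuse x_n
    tail = sum(a * xl for a, xl in zip(alpha[1:], x_hist))  # alpha_1 x_n + ... (history part)
    if M_lu is None:                                        # (re)build M = alpha_0 I - h df/dx
        M_lu = lu_factor(alpha[0] * np.eye(len(x)) - h * jac_f(t_next, x))
    for _ in range(max_iter):
        res = alpha[0] * x + tail - h * f(t_next, x)
        dx = lu_solve(M_lu, -res)
        x += dx
        if np.linalg.norm(dx) < tol:
            break
    return x, M_lu
```

Returning the factorization allows the caller to reuse it over several steps and only rebuild it when the contraction of the iteration deteriorates, which is the essence of the monitor strategy described above.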

3 Sensitivity Generation

In the following section we will explain how to generate (directional) sensitivities efficiently using the principle of Internal Numerical Differentiation and reusing information from the computation of the solution of the IVP.

3.1 The principle of Internal Numerical Differentiation

The simplest approach to obtain sensitivity information, so-called External Numerical Differentiation, is to treat the integrator as a black box and to calculate finite differences after solving the IVP for the original initial values and again for slightly perturbed ones. This approach, although very easy to implement, suffers from the fact that the output of an adaptive integrator usually depends discontinuously on the input: jumps in the range of the integration tolerance can always occur for different sets of parameters and initial values. Therefore the number of accurate digits of the solution of the IVP has to be approximately twice as high as the needed number of accurate digits of the derivatives. This leads to a very high and often unacceptable numerical effort.

The idea of Internal Numerical Differentiation is to freeze the adaptive components of the integrator and to differentiate not the adaptive integrator itself, but the adaptively generated discretization scheme (consisting of the used stepsizes, BDF orders and iteration matrices, compare Fig. 1). This scheme can be interpreted as a sequence of differentiable mappings, each leading from one integration time to the next, and can therefore be differentiated, for example, using the ideas of Automatic Differentiation (AD). We assume in the following that the reader is familiar with the basics of AD, especially with the forward and the adjoint mode of AD. For an introduction to AD see e.g. [11].

Figure 1. The adaptively generated discretization scheme of the solution of the IVP; h_i denotes the used stepsize, M_i the used iteration matrix in step i

Differentiating the scheme using the forward respectively the adjoint mode of AD leads to the two variants of IND: forward IND and reverse IND. The first generates sensitivity information of the type

\frac{\partial (x(t_f), z(t_f))}{\partial (x_0, z_0, p, q)} \cdot \begin{pmatrix} v_x \\ v_z \\ v_p \\ v_q \end{pmatrix},

which is preferable for calculating directional sensitivities, or the full sensitivity matrix when only few parameters and controls are present. The latter generates sensitivity information of the type

\begin{pmatrix} \lambda_x^T & \lambda_z^T \end{pmatrix} \cdot \frac{\partial (x(t_f), z(t_f))}{\partial (x_0, z_0, p, q)},

very efficiently and is therefore advantageous if we need gradient-type information, the sensitivities of only a few solution components, or the full sensitivity matrix in the case that many parameters and control parameters are present.

3.2 Forward IND

Differentiation of the integration scheme (2) using the forward mode of AD leads to an integration scheme that is equivalent to solving the corresponding variational DAE using the same discretization scheme as for the computation of the nominal trajectory.
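A sketch of the direct forward variant in the simplified ODE setting of the earlier step sketch: differentiating the corrector equation of one BDF step with respect to the initial values and parameters gives a linear recursion for the directional sensitivities that reuses the iteration matrix of the nominal step. The names and the restriction to A = I are again assumptions, not DAESOL-II internals.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def forward_ind_step(jac_x, jac_p, t_next, x_next, S_hist, Sp_hist, alpha, h):
    """Propagate sensitivities S = dx/dx0 and Sp = dx/dp through one BDF step of
    sum_l alpha_l x_{n+1-l} - h f(t_{n+1}, x_{n+1}, p) = 0.  Differentiating the
    corrector equation gives
      (alpha_0 I - h f_x) S_{n+1}  = -sum_{l>=1} alpha_l S_{n+1-l}
      (alpha_0 I - h f_x) Sp_{n+1} = -sum_{l>=1} alpha_l Sp_{n+1-l} + h f_p ."""
    fx = jac_x(t_next, x_next)                 # df/dx at the accepted point
    fp = jac_p(t_next, x_next)                 # df/dp at the accepted point
    M_lu = lu_factor(alpha[0] * np.eye(len(x_next)) - h * fx)
    rhs_x = -sum(a * S for a, S in zip(alpha[1:], S_hist))
    rhs_p = -sum(a * S for a, S in zip(alpha[1:], Sp_hist)) + h * fp
    return lu_solve(M_lu, rhs_x), lu_solve(M_lu, rhs_p)
```

The right-hand sides here play the role of the variational DAE mentioned above; in the iterative variant one would instead differentiate the stored Newton-like iterations themselves.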

Depending on whether we prefer to solve the linear systems occurring in this scheme directly, or to differentiate also the Newton-like iterations used in the integration procedure for the solution of the IVP, we speak of direct forward IND or iterative forward IND. For more details on how to apply the ideas of forward IND to BDF methods refer to [1].

3.3 Adjoint Sensitivities

We now combine the ideas of IND and the adjoint mode of AD and present a reverse mode of IND. Analogous to forward IND, we will obtain two slightly different schemes, the direct and the iterative reverse IND.

Direct Reverse IND

Here we assume that we have solved the IVP, that all trajectory values on the integration grid are available, and that we have solved the corrector equations (2) in each step exactly. We now interpret each integration step as one elementary operation with inputs x_{n+1-i}, z_{n+1-i} (i = 1, \ldots, k_n), p, q, x_0, z_0 and outputs x_{n+1}, z_{n+1}, of which we have to calculate the derivatives to apply AD. The fact that x_{n+1} depends via one elementary operation directly on x_{n+1-i} we denote with the symbol \succ and their indices, i.e. n+1 \succ n+1-i. We use the implicit function theorem on the function

F^n(x_{n+1}, z_{n+1}; x_n, \ldots, x_{n+1-k_n}, p, q, x_0, z_0) := \begin{pmatrix} A(t_{n+1}, x_{n+1}, z_{n+1}, p, q) \sum_{l=0}^{k_n} \alpha_l^n x_{n+1-l} - h_n f(t_{n+1}, x_{n+1}, z_{n+1}, p, q) \\ g(t_{n+1}, x_{n+1}, z_{n+1}, p, q) - \theta(t_{n+1})\, g(t_0, x_0, z_0, p, q) \end{pmatrix} = 0.   (3)

For the derivative with respect to the new trajectory points one obtains

\frac{\partial F^n}{\partial (x_{n+1}, z_{n+1})} = \begin{pmatrix} A_{x,n+1} \dot{x}^C_{n+1} + \alpha_0^n A_{n+1} - h_n f_{x,n+1} & A_{z,n+1} \dot{x}^C_{n+1} - h_n f_{z,n+1} \\ g_{x,n+1} & g_{z,n+1} \end{pmatrix},   (4)

with the abbreviation \dot{x}^C_{n+1} \equiv \sum_{l=0}^{k_n} \alpha_l^n x_{n+1-l} for the corrector polynomial, and with subscripts of the form B_{d,m} denoting here and in the following the partial derivative of the function B with respect to the variable d, evaluated at the values for t_m. For the derivative with respect to the already known trajectory points x_{n+1-i}, i = 1, \ldots, k_n, we obtain

\frac{\partial F^n}{\partial x_{n+1-i}} = \begin{pmatrix} \alpha_i^n A_{n+1} & 0 \\ 0 & 0 \end{pmatrix}.   (5)

Furthermore we have the derivatives with respect to the parameters and control parameters

\frac{\partial F^n}{\partial (p, q)} = \begin{pmatrix} A_{p,n+1} \dot{x}^C_{n+1} - h_n f_{p,n+1} & A_{q,n+1} \dot{x}^C_{n+1} - h_n f_{q,n+1} \\ g_{p,n+1} & g_{q,n+1} \end{pmatrix}.   (6)

If we use the relaxed formulation from (2b), we also have the derivatives with respect to the initial differential and algebraic states

\frac{\partial F^n}{\partial (x_0, z_0)} = \begin{pmatrix} 0 & 0 \\ -\theta(t_{n+1}) g_{x,0} & -\theta(t_{n+1}) g_{z,0} \end{pmatrix}.   (7)

Hence, by using the implicit function theorem, we finally obtain for the derivatives of the new trajectory values

\frac{\partial (x_{n+1}, z_{n+1})}{\partial D} = - \left( \frac{\partial F^n}{\partial (x_{n+1}, z_{n+1})} \right)^{-1} \frac{\partial F^n}{\partial D},   (8)

with D \in \{x_{n+1-i}, p, q, x_0, z_0\}. Note that many of the derivatives of the model functions needed here can be evaluated very efficiently in practice, without forming the entire model Jacobian, as adjoint directional derivatives by using AD.

Algorithm 1 Basic form of direct reverse IND
1: Initialize \bar{x}_{NintSteps} = L^T with the adjoint sensitivity directions L^T.
2: Initialize all intermediate variables \bar{x}_i, \bar{z}_0, \bar{p}, \bar{q} to zero.
3: for n = NintSteps - 1, \ldots, 0 do
4:   \bar{x}_n \mathrel{+}= \sum_{k \succ n} \bar{x}_k \frac{\partial x_k}{\partial x_n} \equiv - \sum_{k \succ n} \bar{x}_k (I\;0) (F^{k-1}_{(x_k, z_k)})^{-1} F^{k-1}_{(x_n)}
5:   [\bar{p}, \bar{q}, \bar{z}_0] \mathrel{+}= \bar{x}_{n+1} \frac{\partial x_{n+1}}{\partial (p, q, z_0)} \equiv - \bar{x}_{n+1} (I\;0) (F^{n}_{(x_{n+1}, z_{n+1})})^{-1} [F^{n}_{(p)}, F^{n}_{(q)}, F^{n}_{(z_0)}]
6: end for
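The loop of Algorithm 1 translates almost directly into code once the per-step derivative blocks (4)-(7) are available. The sketch below is a simplified rendition under the assumption that these blocks are supplied as dense arrays per step; the dictionary keys dFdy, dFdx_prev and dFdpqz0 and the scatter-style accumulation are our naming and arrangement, not the DAESOL-II interface.

```python
import numpy as np

def direct_reverse_ind(L, steps, n_x):
    """Adjoint sweep in the spirit of Algorithm 1.  steps[n] describes BDF step n
    by dFdy = dF^n/d(x_{n+1},z_{n+1}), dFdx_prev = {m: dF^n/dx_m} for the earlier
    points it uses, and dFdpqz0 = dF^n/d(p,q,z_0).  L has one row per adjoint
    direction; returns the adjoints of x_0 and of (p, q, z_0)."""
    N = len(steps)
    ndir = L.shape[0]
    xbar = [np.zeros((ndir, n_x)) for _ in range(N + 1)]
    xbar[N] = L.copy()
    pqz0_bar = np.zeros((ndir, steps[0]["dFdpqz0"].shape[1]))
    for n in range(N - 1, -1, -1):
        step = steps[n]                       # step n produced (x, z)_{n+1}
        pad = np.zeros((ndir, step["dFdy"].shape[0] - n_x))
        # lambda_bar = xbar_{n+1} (I 0) (dF^n/d(x,z)_{n+1})^{-1}
        lam = np.linalg.solve(step["dFdy"].T, np.hstack([xbar[n + 1], pad]).T).T
        for m, dFdx_m in step["dFdx_prev"].items():
            xbar[m] += -lam @ dFdx_m          # line 4, written as a scatter to inputs
        pqz0_bar += -lam @ step["dFdpqz0"]    # line 5: parameters, controls, z_0
    return xbar[0], pqz0_bar
```

Scattering the contributions of each step to its input points is equivalent to the gather form of line 4, because in the backward loop every xbar[n+1] is complete before step n is processed.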

Reverse sweep for direct reverse IND With the derivatives of our elementary functions (8) we are now able to perform an adjoint AD sweep, using the values of the nominal trajectories at the integration points, to obtain adjoint sensitivity information for given adjoint directions (cf. algorithm 1). Let now N be the number of accepted integration steps from the integration of the IVP, kmax the maximal BDForder during integration and ndir the number of directional sensitivities to be calculated. The computational effort for the reverse sweep can then be estimated as follows: In each step we need to build and factorize the Jacobian (4) once, to solve ndir linear equation systems and to compute ndir directional derivatives of the model functions. Overall, for the direct reverse IND we need N matrix factorizations and N · ndir directional derivatives of the model functions and have to solve also N · ndir linear equation systems. The memory

Sensitivity Generation in an Adaptive BDF-Method

21

demand for the (intermediate) adjoint sensitivity quantities can be estimated by ndir ·[(kmax + 1) · (nx + nz ) + np + nq ], because at most kmax earlier trajectory values contribute directly to a trajectory value. Note that for one adjoint direction this is of the order of storage needed for the interpolation during integration of the IVP anyway. Remark: For practical implementation it is more efficient to introduce k−1 ¯ k := x )−1 and to rewrite the algorithm in these ¯k (I 0)(F(x new variables λ k ,zk ) new quantities. Iterative Reverse IND In this approach we follow more closely the predictor-corrector scheme which is applied in practice for the integration of the IVP: There we use as start-value for the Newton-like method in step n the predictor value P yn+1 , obtained by the polynomial through the last kn + 1 trajectory valP = ues yn , . . . , yn−kn , extrapolated at tn+1 . This can be written as yn+1 kn P,n yn−l . Note that for notational convenience we combine here difl=0 αl ferential and algebraic states to y ≡ (x, z)T . (0) P , 1 ≤ ln ≤ 3 Newton-like We then perform, starting in yn+1 := yn+1 iterations with an iteration Matrix Mn to search the root of the function F n (i) defined in (3). We denote the iterates by yn+1 , 1 ≤ i ≤ ln and set the new (ln ) trajectory value to the last iterate yn+1 := yn+1 . If we analyze the dependencies inside this predictor-corrector scheme we find that the predictor value only depends on earlier trajectory values and its derivative is obtained by P ∂yn+1 = αlP,n · I ∂(yn+1−i )

(9)

with $i = 1, \dots, k_n + 1$. By studying the Newton-like iterations we obtain for the derivatives of one iterate with respect to the previous iterate

$$\frac{\partial y^{(i+1)}_{n+1}}{\partial y^{(i)}_{n+1}} = I - M_n F^n_{y_{n+1}}, \qquad i = 0, \dots, l_n - 1, \qquad (10)$$

and for the derivatives with respect to initial states, parameters and controls

$$\frac{\partial y^{(i+1)}_{n+1}}{\partial (p, q, y_0)} = -M_n \bigl(F^n_p, F^n_q, F^n_{y_0}\bigr), \qquad i = 0, \dots, l_n - 1,$$

where the derivatives of $F^n$ are evaluated at the values $y^{(i)}_{n+1}$.

Reverse sweep for iterative reverse IND

We assume again that all trajectory values on the integration grid are known, and additionally all iterates and iteration matrices used during the
integration. Note that due to the use of the monitor strategy in the Newton-like method the number of used iteration matrices is significantly smaller than the number of integration steps. We then make an adjoint AD sweep, interpreting the integration as an alternating sequence of two kinds of elementary operations: evaluating the predictor polynomial and performing Newton-like iteration steps. When using this approach we have to account for different start-up strategies of the BDF-method by adapting the last operations of the backward sweep to the actually used starter method. The computational effort for the iterative reverse IND can be estimated by $n_{dir} \cdot \sum_{i=0}^{N} l_i$ needed solutions of linear systems and the same number of directional derivatives of the model functions. In return no additional matrix decompositions are needed here. The storage needed for the adjoint quantities is the same as for the direct approach.
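As a small illustration of how an adjoint direction is pulled back through the corrector and predictor operations via (10) and (9), consider the following sketch; the iteration matrix, Jacobian and predictor coefficients are random stand-ins (and the Jacobian is kept fixed over the iterates for brevity), so this only shows the structure of the backward pass for one step, not the integrator's actual data.

```python
import numpy as np

rng = np.random.default_rng(0)
ny, l_n, k_n = 3, 2, 2                    # state size, Newton-like iterations, BDF order
M_n = 0.1 * rng.normal(size=(ny, ny))     # iteration matrix of step n (stand-in)
F_y = rng.normal(size=(ny, ny))           # Jacobian of F^n w.r.t. y_{n+1} (stand-in)
alpha = rng.normal(size=k_n + 1)          # predictor coefficients alpha^{P,n}_l (stand-in)

ybar = rng.normal(size=(1, ny))           # adjoint of the accepted iterate y_{n+1}
for _ in range(l_n):                      # walk the l_n corrector iterates backwards, eq. (10)
    ybar = ybar @ (np.eye(ny) - M_n @ F_y)
# eq. (9): the predictor spreads the remaining adjoint over the k_n+1 history values
hist_bars = [alpha[i] * ybar for i in range(k_n + 1)]
```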

4 Numerical Examples

As a proof of concept we tested the reverse IND approach on two small ODE examples. We confined ourselves to comparing the results of the reverse IND with the well tested forward IND approach, also implemented in DAESOL-II. For a comparison of the ideas implemented in DAESOL-II with other codes such as DASSL, DDASAC and LIMEX, refer to [2]. We considered two chemical reaction systems: the pyridine reaction and the peroxidase-oxidase reaction. The model for the pyridine reaction consists of 7 ODEs and 11 parameters, the model for the peroxidase-oxidase reaction of 10 ODEs and 17 parameters. A detailed description of the systems and their properties can be found e.g. in [1]. We chose as task for the comparison the evaluation of the whole sensitivity matrix, i.e. the derivatives of all states at the final time with respect to all initial states and parameters, using the direct and iterative variants of the forward and reverse IND approaches. The necessary (directional) derivatives of the model functions were generated inside DAESOL-II with the help of the AD tool ADOL-C [12]. For the solution of the linear systems we used here the dense linear algebra package ATLAS [15]. We tested with error tolerances for the nominal trajectories of TOL = $10^{-3}$ and TOL = $10^{-6}$. The calculations were all performed on a standard 2.8 GHz Pentium IV computer with a SuSE 9.2 Linux operating system. It was observed that the difference between the sensitivities calculated using the forward and reverse IND was practically of the order of machine precision, i.e. the maximal relative difference was smaller than $10^{-14}$ for both the direct and iterative mode. Table 1 gives a comparison of the effort for the direct variants on these two examples. Note that the computational effort per sensitivity direction for the forward IND is theoretically the same as for the reverse IND in terms of matrix decompositions, linear systems to be solved and model derivatives to be calculated.

                                        Pyridine   Pyridine   Peroxi    Peroxi
TOL                                       10^-3      10^-6     10^-3     10^-6
# BDF-steps                                  82        162       723      1796
# Forward sensitivity directions             18         18        27        27
# Linear systems / dir. der. (forward)     1476       2916     19521     48492
# Reverse sensitivity directions              7          7        10        10
# Linear systems / dir. der. (reverse)      574       1134      7230     17960
Time savings                              14.1%      18.6%     19.2%     35.1%

Table 1. Comparison of numerical and runtime effort for direct forward and reverse IND on two examples from reaction kinetics

But in the adjoint mode fewer sensitivity directions are needed to form the complete sensitivity matrix. In exchange, a forward directional derivative of a model function is slightly cheaper than an adjoint one. Overall, we already see a visible speed-up on these small systems with a moderate number of parameters when using the reverse IND, even though there is still room for algorithmic improvement, and the results indicate that the adjoint mode will show the most benefits on larger systems with many parameters, when the overhead from trajectory integration and general implementation is less significant, or of course in the case where only sensitivities for a few adjoint directions are needed.

5 Conclusion and Outlook

In this article we have briefly discussed how to solve IVPs for ODEs and DAEs of index 1 and how to efficiently generate sensitivities for these solutions. We explained the idea of Internal Numerical Differentiation (IND) and derived a reverse mode of IND for semi-implicit DAEs of index 1. We gave a first numerical proof of concept of the reverse IND by applying it to two ODE examples from reaction kinetics. We demonstrated that it works accurately and that already on these small problems the reverse approach is more efficient for the computation of the whole sensitivity matrix than the forward approach. In the future we want to use the reverse IND in connection with special adjoint-based optimization methods for large-scale systems which only need one or two adjoint directional sensitivities per optimization step, cf. [8]. As the memory consumption of the reverse mode could become a problem for very large systems, a closer investigation of checkpointing schemes (cf. [14]) will be interesting. Furthermore we want to extend the sensitivity generation by the use of IND to directional second order sensitivities, which would allow e.g.
the calculation of a directional derivative of a gradient, as needed in robust optimization and optimum experimental design.

References

1. J. Albersmeyer. Effiziente Ableitungserzeugung in einem adaptiven BDF-Verfahren. Master's thesis, Universität Heidelberg, 2005.
2. I. Bauer. Numerische Verfahren zur Lösung von Anfangswertaufgaben und zur Generierung von ersten und zweiten Ableitungen mit Anwendungen bei Optimierungsaufgaben in Chemie und Verfahrenstechnik. PhD thesis, Universität Heidelberg, 1999.
3. G. Bleser. Eine effiziente Ordnungs- und Schrittweitensteuerung unter Verwendung von Fehlerformeln für variable Gitter und ihre Realisierung in Mehrschrittverfahren vom BDF-Typ. Master's thesis, Universität Bonn, 1986.
4. H. G. Bock. Numerical treatment of inverse problems in chemical reaction kinetics. In K. H. Ebert, P. Deuflhard, and W. Jäger, editors, Modelling of Chemical Reaction Systems, volume 18 of Springer Series in Chemical Physics, pages 102–125. Springer, 1981.
5. H. G. Bock and J. P. Schlöder. Numerical solution of retarded differential equations with state-dependent time lags. Zeitschrift für Angewandte Mathematik und Mechanik, 61:269, 1981.
6. C. F. Curtiss and J. O. Hirschfelder. Integration of stiff equations. Proc. Nat. Acad. Sci., 38:235–243, 1952.
7. T. A. Davis. Algorithm 832: UMFPACK - an unsymmetric-pattern multifrontal method with a column pre-ordering strategy. ACM Trans. Math. Software, 30:196–199, 2004.
8. M. Diehl, A. Walther, H. G. Bock, and E. Kostina. An adjoint-based SQP algorithm with quasi-Newton Jacobian updates for inequality constrained optimization. Technical Report Preprint MATH-WR-02-2005, TU Dresden, 2005.
9. E. Eich. Numerische Behandlung semi-expliziter differentiell-algebraischer Gleichungssysteme vom Index I mit BDF Verfahren. Master's thesis, Universität Bonn, 1987.
10. E. Eich. Projizierende Mehrschrittverfahren zur numerischen Lösung von Bewegungsgleichungen technischer Mehrkörpersysteme mit Zwangsbedingungen und Unstetigkeiten. PhD thesis, Universität Augsburg, 1991.
11. A. Griewank. Evaluating Derivatives, Principles and Techniques of Algorithmic Differentiation. Number 19 in Frontiers in Appl. Math. SIAM, 2000.
12. A. Griewank, D. Juedes, and J. Utke. Algorithm 755: ADOL-C: A package for the automatic differentiation of algorithms written in C/C++. ACM Trans. Math. Softw., 22(2):131–167, 1996.
13. D. Lebiedz, J. Kammerer, and U. Brandt-Pollmann. Automatic network coupling analysis for dynamical systems based on detailed kinetic models. Physical Review E, 72:041911, 2005.
14. A. Walther. Program reversal schedules for single- and multi-processor machines. PhD thesis, TU Dresden, 2000.
15. R. C. Whaley, A. Petitet, and J. J. Dongarra. Automated empirical optimization of software and the ATLAS project. Parallel Computing, 27(1–2):3–35, 2001.

The gVERSE RF Pulse: An Optimal Approach to MRI Pulse Design

Christopher K. Anand1, Stephen J. Stoyan2, and Tamás Terlaky3

1 McMaster University, Hamilton, Ontario, Canada. [email protected]
2 University of Toronto, Toronto, Ontario, Canada. [email protected]
3 McMaster University, Hamilton, Ontario, Canada. [email protected]

Abstract A Variable Rate Selective Excitation (VERSE) pulse is a type of Radio Frequency (RF) pulse that reduces the Specific Absorption Rate (SAR) of molecules in a specimen. As high levels of SAR lead to increased patient temperatures during Magnetic Resonance Imaging (MRI) procedures, we develop a selective VERSE pulse, called the generalized VERSE (gVERSE) pulse, that is designed to minimize SAR while preserving pulse duration and slice profile. After the formulation of a rigorous mathematical model, the nonlinear gVERSE optimization problem is solved via an optimal control approach. Using the state of the art Sparse Optimal Control Software (SOCS), two separate variations of SAR-reducing gVERSE pulses were generated. The Magnetic Resonance (MR) signals produced by numerical simulations were then tested and analyzed by an MRI simulator. Computational experiments with the gVERSE model provided constant RF pulse levels and had encouraging results with respect to MR signals. The testing results produced by the gVERSE pulse illustrate the potential that advanced optimization techniques have in designing RF sequences.

1 Introduction to the Problem

Magnetic Resonance Imaging (MRI) produces high resolution cross-sectional images by utilizing selective Radio Frequency (RF) pulses and field gradients. Selective excitations are obtained by applying simultaneous gradient waveforms and a RF pulse with the appropriate bandwidth [LL01]. Many conventional RF pulse sequences are geared towards generating high definition images, but fail to consider the SAR (Specific Absorption Rate) of the excitation. High levels of SAR during MRI procedures can cause undesired side effects such as skin burns. Thus, one needs to reduce SAR levels, and this is the focus of our paper. Instead of using common approaches to approximating the Bloch equation, we integrate the Bloch equation in a nonlinear optimization model that is designed to minimize RF SAR levels. Several researchers have studied the selective RF excitation problem and employed different optimization methods in their designs. Simulated annealing
[She01], evolutionary algorithms [WXF91], quadratic optimization [CGN88] and optimal control techniques [CNM86, UGI04] are the most common. Although they produce solutions that relate to their desired profile, they are computationally intensive and in many cases their design for the pulse envelope consists of relaxed conditions. In [UGI04], the excitation design under the optimal control approach leads to an ill-conditioned algebraic problem, which stems from the model's attempt to include the Bloch equation in the Chebyshev domain. In [CGN88], Conolly et al. design the Variable Rate Selective Excitation (VERSE) pulse that is aimed at reducing MRI SAR levels; however, their model does not incorporate penalties to trade off energy and adhesion to the desired slice profile, nor does it incorporate relaxation. The problem, as pointed out in [CNM86, CGN88], still remains: there is no sufficient mathematical formulation for pulse-envelope design. In this paper we address this problem, and use our model to construct a dynamical nonlinear optimization algorithm that is aimed at reducing RF SAR levels. Using the idea of Conolly et al., we design the generalized VERSE (gVERSE) pulse. The gVERSE pulse is a highly selective pulse that differs from its originator with respect to how SAR is minimized. The prefix "g" was added to VERSE because our objective function directly encompasses the high demands of RF pulse levels by allowing the gradient waveform to freely vary. In addition, we have significantly increased the dynamics of the VERSE problem, added additional constraints, and enhanced the degrees of freedom. Using our RF pulse formulation, we develop two separate pulse sequences that include variable slice gradients (listed as future work in [UGI04]). In this paper, we begin with a review of general RF pulse sequences that leads to the development of our new SAR-reducing gVERSE pulse model. In Section 3, the gVERSE model is fully detailed and the accompanying Nonlinear Optimization (NLO) problem is formulated. The implementation issues involved in computing the gVERSE pulse are briefly described in Section 4. In Section 5, the computational results for the gVERSE pulse are shown for two different test cases. The results are graphically illustrated and then tested by an MRI simulation in Section 6, where they are analyzed and examined with respect to the MR signals they generate. Finally, in Section 7 we conclude on how our results and MRI simulations show that mathematical optimization can have a strong effect on improving RF pulse sequences.

2 MRI Background

To understand the implications and effects of the gVERSE pulse we will begin with a short outline of our notation and a review of two different types of RF pulse sequences. For more information with regards to the MR formulations and/or general RF pulses, one can refer to [Bus96, HBTV99, LL01]. To begin, we define the Bloch equation, which provides the rate of magnetization $d\vec{M}(t)/dt$,
$$\frac{d\vec{M}(t)}{dt} = \gamma\, \vec{M}(t) \times \vec{B}(t) + \frac{1}{\tau_1}\bigl(M_0 - M_z(t)\bigr)\,\hat{z} - \frac{1}{\tau_2}\,\vec{M}_{\perp}(t)$$

where t is time, $\vec{B}(t)$ is the external magnetic field in the z-axis direction, $\gamma$ is the gyromagnetic constant, and

$$\vec{M}(t) = \begin{pmatrix} M_x(t) \\ M_y(t) \\ M_z(t) \end{pmatrix}, \quad \text{and} \quad \vec{M}_{\perp}(t) = \begin{pmatrix} M_x(t) \\ M_y(t) \\ 0 \end{pmatrix}$$

are respectively the net and transverse magnetization vectors. Furthermore, $\hat{z}$ is the z-axis unit vector, $M_0$ is the initial magnetization in the z direction, $\tau_1$ is the spin-lattice interaction parameter and $\tau_2$ is the spin-spin interaction parameter. In addition, we let

$$\vec{B}(t) = \begin{pmatrix} b_x(t) \\ b_y(t) \\ b_z(t) \end{pmatrix},$$

where $b_x(t)$, $b_y(t)$ and $b_z(t)$ are the external magnetization vector coordinates, which will be used in the formulation of the gVERSE pulse.
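As a concrete reading of the Bloch equation above, a minimal numerical right-hand side could look as follows; this is a sketch in Python/NumPy rather than anything used by the authors, and the value of γ follows the unit convention quoted later in Section 4.

```python
import numpy as np

GAMMA = 42.58  # gyromagnetic constant in the units used in this paper (Hz/mT)

def bloch_rhs(M, B, M0, tau1, tau2):
    """Right-hand side of the Bloch equation:
    dM/dt = gamma * (M x B) + (M0 - Mz)/tau1 * z_hat - M_perp/tau2."""
    Mx, My, Mz = M
    dM = GAMMA * np.cross(M, B)                       # precession about B
    dM = dM + np.array([0.0, 0.0, (M0 - Mz) / tau1])  # spin-lattice relaxation
    dM = dM - np.array([Mx, My, 0.0]) / tau2          # spin-spin relaxation
    return dM
```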

2.1 Generic RF pulse

When processing an image, a number of precise RF pulses are applied in combination with synchronized gradients in different dimensional directions. For a detailed analysis of this process one can look at [CDM90, HBTV99, LL01]. We would like to highlight that RF pulses are only aimed at a specific portion of the object or specimen that the user intends to image. In addition, the RF pulse is accompanied by a gradient waveform that is used to spatially modulate the signal's orientation [Bus96]. There are many different techniques in which RF and gradient waveforms can generate useable signals. Gaussian and sinc pulses are two of the many RF pulse sequences used today. Figure 1 is an illustration of a slice select sinc pulse [HBTV99]. Sinc pulses are successful at exciting particular magnetization vectors into the transverse plane that generate signal readings; however, they fail to account for side effects such as SAR levels. The heating effect experienced by patients during MRI procedures is measured by the level of SAR, which is a direct result of the RF pulse used. The level of SAR becomes particularly important with pediatric patients, and as a result the FDA has strict limitations on SAR, which subsequently restricts RF pulse potential and other elements involved in MRI procedures. In addition, while MRI researchers are constantly developing faster scanners, higher tesla magnets, enhanced software components and improved RF coils, they are all still limited by SAR levels. Hence, RF pulses that consider such a factor are in high demand.


Figure 1. A generic NMR slice select SINC pulse imaging sequence (RF pulse, gradient G(t), and MR signal)

2.2 The VERSE Pulse

Originally proposed by Conolly et al. [CGN88], VERSE pulses were designed to generate MR signals similar to generic RF pulses; however, low pulse SAR levels were incorporated into the model. As mentioned, the SAR of a selective RF pulse is a critical parameter in clinical settings and may limit the use of a particular pulse sequence if the SAR limit exceeds given FDA requirements [LL01]. Due to the high SAR levels of various RF pulses the scan time for given pulse sequences is restricted [CGN88]. The key innovation with VERSE pulses is to allow a "trade off" between time and amplitude. By lowering RF pulse amplitude the duration of the pulse may be extended [CGN88].

Figure 2. The VERSE pulse imaging sequence (RF pulse, gradient G(t), and MR signal)

As illustrated in Figure 2, VERSE pulses are similar to generic pulses; however, they contain a flattened center peak and their gradient waveform possesses two additional steps. It is this uniform redistribution of the pulse area that allows the decrease in SAR. Conolly et al. designed three different types of SAR-reducing pulses, each of which had constraints on the strength of the RF pulse; however, they differed with respect to how they minimized SAR levels. The first model consisted of a minimum-SAR facsimile pulse for a specified duration, whereby the gradient waveform and RF pulse were integrated in the objective and subject to maximum gradient and constant duration constraints.
The second model used a minimum time formulation approach, whereby it searched for the briefest pulse that did not exceed a specified peak–RF level. The pulse was optimized for time and constrained by maximum gradient and RF levels. The final model, called the parametric gradient, constrained both the maximum gradient and slew-rate, and involved the parametric gradient and the RF pulse in the objective [CGN88]. The first two models consisted of a maximum of 3κ + 1 variables, where κ was the total number of samples or RF pulses. The final model involved κ(p + 1) + 1 variables, where p represented the dimension of a parameter vector. Experimentation proved that only 256 sample values were necessary, which kept the variable count relatively low [CGN88]. Of the three algorithms, the parametric formulation offered the most robust SAR minimization, however, the design still had areas for improvement as the results contained gradient and RF timing mismatches. Subsequently, further experimentation was necessary with VERSE pulses, as Conolly et al. were the first to introduce this innovative concept.

3 The gVERSE Model

Conolly et al. [CGN88] showed that SAR can be reduced by combined RF/gradient reductions and time dilations, starting with an initial pulse design. For our research we would like to search a larger parameter space by allowing arbitrary gradient waveforms (subject to machine constraints), including sign changes. The gVERSE pulse is illustrated in Figure 3; our aim is to lower RF pulse energy and more evenly distribute the RF pulse signal.

Figure 3. The gVERSE pulse imaging sequence (RF pulse, gradient G(t), and MR signal)

This flattened redistribution of the pulse will allow for a longer signal reading and potentially cause an even greater decrease in the level of SAR than the original VERSE pulse. Mathematically, this is the same as minimizing the external magnetic field generated by the RF pulse, $\vec{B}_{rf}(t)$, and therefore our objective is

$$\min \; \mathrm{SAR} = \int_0^T \bigl|\vec{B}_{rf}(t)\bigr|^2\,dt = \int_0^T b_x^2(t) + b_y^2(t)\,dt,$$

where T is the time at the end of the RF pulse and

$$\vec{B}_{rf}(t) = \begin{pmatrix} b_x(t) \\ b_y(t) \\ 0 \end{pmatrix}.$$

As MRI is based on the interaction of nuclear spin with an external magnetic field, $\vec{B}_{rf}(t)$ is simply the vertical and horizontal components of $\vec{B}(t)$. Also, if low pulse amplitudes are produced by the gVERSE pulse, the duration T of the pulse can be increased. Another part of MRI comes from the fact that since all magnetization vectors are spinning, there exists a rotational frame of reference. If we set up our equations in the rotating frame of reference then we exclude the uniform magnetic field generated by the main super-conducting magnet, $B_0$. Instead, we are left with the magnetic field of our RF pulse, $\vec{B}_{rf}(t)$, and our gradient

$$\vec{G}(t, s) = \begin{pmatrix} 0 \\ 0 \\ sG(t) \end{pmatrix},$$

where sG(t) is the gradient value at coordinate position s. The primary function of the gradient is to produce time-altering magnetic fields such that the MR signal can be spatially allocated [HBTV99]. Hence, different parts of a specimen experience different gradient field strengths. Thus, by multiplying a constant gradient value by different coordinate positions s, we have potentially produced an equivalent linear relationship to what is used in practice. Fundamentally, coordinate positions s split a specimen or object into "planes" or "slices" along the s direction, which for the purposes of this paper will be parallel to z, as depicted in Figure 4.

Figure 4. Specimen or object separated into planes or slices about the z-axis

Here, s corresponds to a specific coordinate value depending on its respective position and further it has a precise and
representative gradient strength. A RF pulse excites particular magnetization vectors into the transverse (x, y) plane where a signal is generated that is eventually processed into an image. In MRI a voxel corresponds to the unit volume of protons necessary to produce graphic information [HBTV99], and as this is directly related to a group or unit volume of magnetization vectors we will use the word voxel and magnetization vector interchangeably. Thus, s allows us to distinguish between voxels that are excited into the transverse plane by a RF pulse and those that are not. Coordinate positions, s, of voxels that are excited into the transverse plane will be recorded and referred to as being "in the slice." Magnetization vectors that are not tipped into the transverse plane will be referred to as being "outside the slice." Since any specimen or object we intend to image will have a fixed length, given $s \in S$, we will restrict S by choosing a finite set $S \subset \mathbb{R}$. S can then be further partitioned into the disjoint union of sets $S_{in} \dot{\cup} S_{out}$, where $S_{in}$ represents the coordinate positions in the slice and $S_{out}$ represents the positions together with the magnetization vectors that we do not wish to tip into the transverse plane, i.e. those which are outside the slice. For each coordinate position $s \in S$ we add constraints corresponding to the Bloch equation, however, boundary constraints correspond to different conditions depending on the position of the slice, as we will discuss later. Fundamentally, voxels in $S_{in}$ ensure uniform magnetic tipping into the transverse plane, whereas $s \in S_{out}$ certify that external magnetization is preserved. Thus, we now have the magnetic field $\vec{B}(t, s)$ with respect to coordinate positions s, whereby $b_x(t)$ and $b_y(t)$ are independent of s, hence

$$\vec{B}(t, s) = \vec{B}_{rf}(t) + \vec{G}(t, s).$$

Also, since $\vec{B}(t, s)$ has divided the z component of our external magnetization into coordinate components, the same notation must be introduced into our net magnetization. By adding coordinate positions s to the magnetization vector we have

$$\vec{M}(t, s) = \begin{pmatrix} M_x(t, s) \\ M_y(t, s) \\ M_z(t, s) \end{pmatrix}.$$

In addition, since VERSE pulses typically have short sampling times we will assume the same for the gVERSE pulse and thus omit proton interactions and relaxation. Therefore, including positions s into the Bloch equation, we are left with

$$\frac{d\vec{M}(t, s)}{dt} = \gamma\, \vec{M}(t, s) \times \vec{B}(t, s).$$

Hence, we have

$$\vec{M}(t, s) \times \vec{B}(t, s) = \begin{pmatrix} 0 & -sG(t) & b_y(t) \\ sG(t) & 0 & -b_x(t) \\ -b_y(t) & b_x(t) & 0 \end{pmatrix} \begin{pmatrix} M_x(t, s) \\ M_y(t, s) \\ M_z(t, s) \end{pmatrix},$$


and finally

$$\frac{d\vec{M}(t, s)}{dt} = \gamma \begin{pmatrix} 0 & -sG(t) & b_y(t) \\ sG(t) & 0 & -b_x(t) \\ -b_y(t) & b_x(t) & 0 \end{pmatrix} \vec{M}(t, s). \qquad (1)$$

When stimulating a specific segment of a specimen by a RF pulse, some of the magnetization vectors are fully tipped into the transverse plane, some are partially tipped, and those lying outside the slice profile are minimally affected. The magnetization vectors that are only partially tipped into the transverse plane are described as having off-resonance and tend to disrupt pulse sequences and distort the final MRI image [HBTV99]. In anticipation of removing such inhomogeneities we introduce the angle α, at which net magnetization moves from the z direction to the transverse plane. By convention, α will be the greatest at the end of our RF pulse, at time T, and since we are in the rotating frame we can remove the y-axis from our equations. Thus, we can eliminate off-resonance s coordinates by bounding voxels affected by the pulse,

$$\left\| \begin{pmatrix} M_0 \sin(\alpha) \\ 0 \\ M_0 \cos(\alpha) \end{pmatrix} - \begin{pmatrix} M_x(T, s) \\ M_y(T, s) \\ M_z(T, s) \end{pmatrix} \right\| \le \varepsilon_1,$$

and those in $S_{out}$, with α = 0, hence

$$\left\| \begin{pmatrix} 0 \\ 0 \\ M_0 \end{pmatrix} - \begin{pmatrix} M_x(T, s) \\ M_y(T, s) \\ M_z(T, s) \end{pmatrix} \right\| \le \varepsilon_2,$$

where $\varepsilon_1, \varepsilon_2 \ge 0$. By comparing these two bounds we can determine the s coordinates from which we would like the signal to be generated and exclude off-resonance. Another factor we must integrate into our pulse is the slew rate W(t), also called gradient rise time. This identifies how fast a magnetic gradient field can be ramped to different gradient field strengths [CGN88]. As a result, higher slew rates enable shorter measurement times since the signal generated by the RF pulse to be imaged is dependent on it. Thus, the slew rate and gradient field strength together determine an upper bound on the speed and ultimately minimum time needed to perform the pulse. Thus, there must be a bound on these two entities in our constraints,

$$|G(t)| \le G_{max}, \qquad W(t) = \left| \frac{dG(t)}{dt} \right| \le W_{max}.$$
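On a sampled gradient waveform these two machine limits can be checked directly; the sketch below approximates the slew rate by finite differences and uses the numerical bounds reported later in Section 4 as illustrative defaults.

```python
import numpy as np

def gradient_feasible(t, G, G_max=0.02, W_max=0.2):
    """Check |G(t)| <= G_max and |dG/dt| <= W_max on a sampled waveform
    (t in ms, G in mT/mm, W_max in mT/mm/ms, as in Section 4)."""
    slew = np.abs(np.diff(G) / np.diff(t))      # finite-difference slew rate
    return bool(np.all(np.abs(G) <= G_max) and np.all(slew <= W_max))
```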

Finally, we have the semi-infinite nonlinear optimization problem

$$\min \; \mathrm{SAR} = \int_0^T b_x^2(t) + b_y^2(t)\,dt, \qquad (2)$$

subject to,

$$\frac{d\vec{M}(t, s)}{dt} = \gamma \begin{pmatrix} 0 & -sG(t) & b_y(t) \\ sG(t) & 0 & -b_x(t) \\ -b_y(t) & b_x(t) & 0 \end{pmatrix} \vec{M}(t, s), \qquad (3)$$

$$\left\| \begin{pmatrix} M_0 \sin(\alpha) \\ 0 \\ M_0 \cos(\alpha) \end{pmatrix} - \begin{pmatrix} M_x(T, s) \\ M_y(T, s) \\ M_z(T, s) \end{pmatrix} \right\| \le \varepsilon_1, \qquad (4S_{in})$$

$$\left\| \begin{pmatrix} 0 \\ 0 \\ M_0 \end{pmatrix} - \begin{pmatrix} M_x(T, s) \\ M_y(T, s) \\ M_z(T, s) \end{pmatrix} \right\| \le \varepsilon_2, \qquad (4S_{out})$$

$$|G(t)| \le G_{max}, \qquad (5)$$

$$\left| \frac{dG(t)}{dt} \right| \le W_{max}, \qquad (6)$$

$$M_x(0, s) = 0, \quad M_y(0, s) = 0, \quad M_z(0, s) = M_0, \qquad (7)$$

where equations (2) – (7) hold for all $s \in S$, $t \in [0, T]$. Thus, depending on our bound for the pulse, we will construct two sets of constraints, one for the voxels $S_{in} \subset \mathbb{R}$ that will be excited by the RF pulse and one for those that will not, $S_{out} \subset \mathbb{R}$. Which indices are affected will be determined by the constraints ($4S_{in}$) and ($4S_{out}$).

3.1 Discretization

By separating our specimen into coordinate positions we have ultimately created two dimensional segments that are similar to records in a record box, whereby $s \in S$ represents the transverse plane at a particular position. Now we will discretize S into coordinate positions $s_1, s_2, \dots, s_n$, where n is the total number of slices. As we have discussed earlier, $S_{in}$ is the set of coordinate positions whose magnetization vectors have been tipped into the transverse plane by a RF pulse. Next we can define the finite band of particular coordinate positions in $S_{in}$ to consist of positions $s_k, \dots, s_{k+\delta}$, where $1 < k \le k+\delta < n$, $\delta \ge 0$ and $k, \delta \in \mathbb{Z}$. Subsequently $S_{out}$, which was defined as the positions that were not excited into the transverse plane, will consist of all coordinate positions not in $S_{in}$, hence $S_{out} = s_1, \dots, s_{k-1}, s_{(k+\delta)+1}, \dots, s_n$. Figure 5 represents how $s_i \in S$ for $i = 1, \dots, n$ would separate magnetization vectors into coordinate positions that have been tipped into the transverse plane, and those that have not. One should also note that we have only discretized with respect to coordinate positions $s_i \in S$, not time t. Furthermore, we will define the first coordinate position in $S_{in}$ where RF pulse stimulation begins as $\underline{s}$, and similarly, the last position in $S_{in}$ where stimulation ends as $\overline{s}$. Thus, we have $\underline{s} = s_k$ and $\overline{s} = s_{k+\delta}$, and we can now state the coordinate positions in the slice as $S_{in} = [\underline{s}, \overline{s}]$.


Figure 5. Separating magnetization vectors into coordinate positions which are in the slice, $S_{in}$, and out, $S_{out}$ (groups $s_1, \dots, s_{k-1}$; $s_k, \dots, s_{k+\delta}$; $s_{(k+\delta)+1}, \dots, s_n$)

The first position where RF stimulation is a minimum, closest to $\underline{s}$, but in $S_{out}$ and towards the direction of $s_1$, will be defined as $s_l$. As well, the same will be done for the position closest to $\overline{s}$, which is in $S_{out}$ and towards the direction of $s_n$, defined as $s_u$. Consequently, $s_l = s_{k-1}$ and $s_u = s_{(k+\delta)+1}$, and therefore the coordinate positions outside the slice can be represented as $S_{out} = [s_1, s_l] \dot{\cup} [s_u, s_n]$. As depicted in Figure 5, $S_{in}$ is located between the two subintervals of $S_{out}$, where $s_i \in S_{in}$ is centered around 0, leaving $S_{out}$ subintervals $[s_1, s_l] < 0$ and $[s_u, s_n] > 0$. As well, $[s_1, s_l]$ and $[s_u, s_n]$ are symmetric with respect to each other, hence the lengths of these subintervals are equivalent, $s_{k-1} - s_1 = s_n - s_{(k+\delta)+1}$. Furthermore, the differences between respective coordinate positions within each interval are equal to one another such that

$$s_2 - s_1 = s_n - s_{n-1}$$
$$s_3 - s_2 = s_{n-1} - s_{n-2}$$
$$\vdots \qquad\qquad (8)$$
$$s_{k-1} - s_{k-2} = s_{(k+\delta)+2} - s_{(k+\delta)+1}.$$

Also note that the discretization points $s_i$ within any interval $[s_1, s_l]$, $[\underline{s}, \overline{s}]$ and $[s_u, s_n]$ do not necessarily have to be uniformly distributed and thus, more or less coordinate positions can be positioned closer to the boundaries of $S_{in}$ and $S_{out}$. The distance between coordinate positions $(s_l, \underline{s})$ and $(\overline{s}, s_u)$ will be much larger in comparison to other increments of $s_i$. This is typically the area where voxels that have off-resonance characteristics are located. As mentioned earlier, magnetization vectors having off-resonance tend to disrupt pulse sequences and distort the MRI image. For this reason we will define tolerance gaps $S_0$ of finite length between $(s_l, \underline{s})$ and $(\overline{s}, s_u)$, where off-resonance prominently resides. Hence, S can now be partitioned into $S_{in} \dot{\cup} S_{out} \dot{\cup} S_0$, where a general sequence of the intervals would be $S_{out}, S_0, S_{in}, S_0, S_{out}$.

3.2 gVERSE Penalty

An important component of the model now becomes evident: the nonlinear optimization problem defined in (2) – (7) may be infeasible or difficult to solve
as the number n of $s_i \in S$ becomes large and the slices are close together. In particular, constraints ($4S_{in}$) and ($4S_{out}$) pose a threat to the feasibility of the problem as the number of discretization points increases. A penalty for the violation of these constraints can be imposed such that an optimal solution is located for problems with large numbers of variables and small distances between $s_i$ coordinate positions. The basic idea in penalty methods is to relax particular constraints and add a penalty term to the objective function that prescribes high cost to infeasible points [Ber95]. The penalty parameter determines the severity of violation and, as a consequence, the extent to which the resulting unconstrained problem approximates the original constrained one. Thus, returning to the semi-infinite nonlinear optimization problem formulated at the start of Section 3, we introduce penalty variables $\xi_1$ and $\xi_2$ to constraints ($4S_{in}$) – ($4S_{out}$), and the optimization problem objective becomes

$$\min \; \mathrm{SAR} = \int_0^T b_x^2(t) + b_y^2(t)\,dt + \xi_1 \zeta_1 + \xi_2 \zeta_2, \qquad (9)$$

subject to constraints (3), (5) – (7), and

$$\left\| \begin{pmatrix} M_0 \sin(\alpha) \\ 0 \\ M_0 \cos(\alpha) \end{pmatrix} - \begin{pmatrix} M_x(T, s_i) \\ M_y(T, s_i) \\ M_z(T, s_i) \end{pmatrix} \right\| \le \varepsilon_1 + \xi_1, \qquad (10S_{in})$$

$$\left\| \begin{pmatrix} 0 \\ 0 \\ M_0 \end{pmatrix} - \begin{pmatrix} M_x(T, s_i) \\ M_y(T, s_i) \\ M_z(T, s_i) \end{pmatrix} \right\| \le \varepsilon_2 + \xi_2, \qquad (10S_{out})$$

where ζ1 , ζ2 ∈ R are scalar penalty parameters and as in the earlier equations of Section 3, (9) – (10Sout ) apply ∀ s ∈ S, t ∈ [0, T ]. One should note that the larger the value of ζ1 and ζ2 , the less violated constraints (10Sin ) and (10Sout ) become. In addition, as it is written the penalty variables are applied to each si ∈ S for constraints (10Sin ) and (10Sout ). However, depending on computational results, it may be appropriate to only penalize coordinate positions in the neighbourhood of the bounds [sl , s] and [s, su ]. This would enhance the constraints on the optimization problem and only allow violations to occur at the most vulnerable points of the problem. Adding penalty variables and parameters to our optimization problem is an option that may not be necessary. It is dependent on the number n of coordinate positions applied to the model as well as how close we would like si ∈ S to be to one another. Hence, for the remainder of this paper we will omit writing out the penalty variables and parameters, however, the reader should note that they can easily be incorporated into the formulation.
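For a discretized pulse, the penalized objective (9) is cheap to evaluate; the following sketch uses a trapezoidal rule for the SAR integral and the penalty parameter values ζ1 = ζ2 = 100 that are reported for the computations in Section 4.2 (the function and argument names are illustrative).

```python
import numpy as np

def penalized_sar(t, bx, by, xi1, xi2, zeta1=100.0, zeta2=100.0):
    """Discretized objective (9): SAR integral plus the penalty terms.
    t, bx, by are samples on the time grid; xi1, xi2 are the slack variables
    of the relaxed constraints (10Sin)/(10Sout)."""
    integrand = bx**2 + by**2
    sar = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))  # trapezoidal rule
    return sar + zeta1 * xi1 + zeta2 * xi2
```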

4 Results

The gVERSE pulse was designed to improve RF pulse sequences by minimizing SAR levels while upholding MRI resolution, however, the complex
mathematical requirements of the model may be difficult to satisfy. Even simple NLO problems with large numbers of variables can be challenging to solve and threaten many software packages. Thus, when attempting to minimize the objective function in (2) under the constraints (3) – (7), the number of variables implemented was especially important. Preliminary results were found by implementing the gVERSE model using five coordinate positions, and the SQP-based optimal control software package SOCS solved the demanding time dependent NLO problems. This kept the variable count to a minimum of 19 (3n + 4), excluding the independent time variable t. The number of slices was systematically increased until software limitations on memory became a factor. Nonetheless, this was a remarkably larger number of variables than anticipated, as it accounted for 15 slices with a total of 38 857 variables. By experimenting and consulting the literature, realistic MRI values for the constants were used during each computation. Namely, γ = 42.58 Hz/mT, $G_{max}$ = 0.02 mT/mm and $W_{max}$ = 0.2 mT/mm/ms, where Hz is Hertz, mm is millimeters, ms is milliseconds, and mT is millitesla. The magnetization vectors in $S_{in}$ were fully tipped into the transverse plane, hence α = π/2. The magnitude of the initial magnetization vector for each coordinate position had an initial magnetization value of $M_0$ = 1.0 spin density units. Initially, we chose $\varepsilon_1, \varepsilon_2 \le 0.1$; however, as the number of variables increased, larger values of $\varepsilon_1$ and $\varepsilon_2$ were needed in order to find a feasible solution, hence $\varepsilon_1, \varepsilon_2 = 0.1$ for the 15 slice results.

Figure 6. The separation of coordinate positions $s_i$ into $S_{in}$ and $S_{out}$ for 15 magnetization vectors ($s_1, \dots, s_6$ and $s_{10}, \dots, s_{15}$ in $S_{out}$; $s_7, s_8, s_9$ in $S_{in}$)

4.1 Fifteen Slice Results

The results for the 15 slice problem accounted for the largest number of variables that SOCS could solve. The problem became even more challenging as the distances from $\underline{s}$ to $s_l$ and $\overline{s}$ to $s_u$ decreased. For smaller distances between the magnetization vectors in $S_{in}$ and $S_{out}$, penalty variables and parameters had to be incorporated into the formulation of the problem. We will begin with the 15 slice results without penalty, where greater distances between $\underline{s}$ to $s_l$ and $\overline{s}$ to $s_u$ were used.


Since there were 15 slices, the three middle magnetization vectors were tipped into the transverse plane to ensure that the symmetric structure of the problem was maintained. Hence, coordinate positions $s_7$, $s_8$ and $s_9$ were in $S_{in}$, while $s_1, s_2, \dots, s_6$ and $s_{10}, s_{11}, \dots, s_{15}$ remained in $S_{out}$. The arrangement of the coordinate positions is shown in Figure 6 and the exact values for the coordinate positions (in mm) are as follows:

s_1   s_2   s_3   s_4   s_5   s_6   s_7   s_8   s_9   s_10  s_11  s_12  s_13  s_14  s_15
-30   -28   -26   -24   -22   -20   -0.2   0    0.2    20    22    24    26    28    30

The results for the 15 slice coordinate simulation are illustrated in Figures 7, 8 and 11. Information on the magnetic vector projection is shown in the graphs found in Figures 7 – 8. Due to the symmetric structure of the problem, voxels $s_1, \dots, s_6$ and $s_{10}, \dots, s_{15}$ were identical, as were $s_7$ and $s_9$. Hence, only the first eight coordinate positions are shown. Thus, Figures 7 – 8 correspond to magnetization vectors in $S_{out}$ and $S_{in}$.

Figure 7. From top to bottom, magnetization vectors corresponding to coordinate positions $s_1$, $s_2$, $s_3$ and $s_4$

Figure 8. From top to bottom, magnetization vectors corresponding to coordinate positions $s_5$, $s_6$, $s_7$ and $s_8$

The resulting RF pulse procedure, represented by the external magnetization components and
the gradient waveform, is shown in Figure 11; $b_x(t)$ is not shown as it was constant and equal to zero. One can observe that the precession of the magnetization vectors in $S_{out}$ is evident; this is shown in the graphs of Figures 7 – 8. The initial point is close to the voxels' precession range and at most it takes one full rotation for them to orbit uniformly. The magnetization vectors in Figure 8, those $s_i$ that belong to $S_{in}$, smoothly tip into the transverse plane without any cusps or peaks. There are small differences between $s_7$ and $s_8$ as they begin to tip into the transverse plane; however, they act very similarly after their height decreases below 0.8 spin density units. In Figure 11, the gradient waveform starts off negative and then ends up positive. It is not a smooth curve, since it is composed of many local hills and valleys. Also, the gradient seems to be the opposite of what is used in practical MRI sequences; however, this proves to be a proficient sequence, as we will investigate in the next section. Finally, the external magnetization components, $b_x(t)$ and $b_y(t)$, are constant and linear, precisely what we optimized for in the objective function. The value of $b_x(t)$ is zero mT/mm, while $b_y(t)$ of Figure 11 has a constant value of 0.01925 mT/mm.

Figure 9. From top to bottom, magnetization vectors corresponding to coordinate positions $s_1$, $s_2$, $s_3$ and $s_4$

4.2 Fifteen Slice Penalty Results

To increase the distance between the coordinate positions that were tipped into the transverse plane and allow a smooth transition between magnetization vectors in $S_{in}$ and $S_{out}$, penalty variables and parameters were introduced. As described in Section 3.2, penalty variables were added to each $s_i$ vector in constraints ($10S_{in}$) and ($10S_{out}$) in order to decrease the distance between $[\underline{s}, s_l]$ and $[\overline{s}, s_u]$. The remaining variables, constants, and constraints were consistent with what was used in the 15 slice results. The exact values for the coordinate positions (in mm) and their penalty variables were as follows:

s_1   s_2   s_3   s_4   s_5   s_6   s_7   s_8   s_9   s_10  s_11  s_12  s_13  s_14  s_15
-30   -28   -26   -24   -22   -20   -2     0     2     20    22    24    26    28    30
ξ_2   ξ_2   ξ_2   ξ_2   ξ_2   ξ_2   ξ_1   ξ_1   ξ_1   ξ_2   ξ_2   ξ_2   ξ_2   ξ_2   ξ_2

where the positions that were penalized have their respective penalty variables listed below them. Notice that with the addition of penalty variables and parameters the distance from $s_7$ to $s_9$ increased to 4 mm, compared to the 0.4 mm difference in the 15 slice results of Section 4.1. This allowed the difference
between the vectors in $S_{in}$ and $S_{out}$ to be reduced. The results for the penalized 15 coordinate simulation are illustrated in Figures 9, 10 and 12, where the values of the penalty parameters were $\zeta_1 = 100$ and $\zeta_2 = 100$. The profiles of the magnetic moments are shown in Figures 9 – 10. Again, due to the problem's symmetry we have omitted the graphs of the magnetization vectors corresponding to coordinate positions $s_9, \dots, s_{15}$. Hence, Figure 9 and the top two graphs in Figure 10 correspond to magnetization vectors in $S_{out}$, whereas the bottom two graphs in Figure 10 refer to the coordinate positions in $S_{in}$. The resulting RF pulse procedure, represented by the external magnetization components and gradient sequence, is shown in Figure 12; again $b_x(t)$ is not shown as it was constant and equal to zero.

Figure 10. From top to bottom, magnetization vectors corresponding to coordinate positions $s_5$, $s_6$, $s_7$ and $s_8$

As illustrated, the precession of the magnetization vectors in $S_{out}$, Figures 9 – 10, has a much larger radius than that of the 15 slice problem. In fact, these magnetization vectors have at most three successive orbits in the entire time duration. The magnetization vectors in Figure 10, those $s_i$ that belong to $S_{in}$, smoothly tip into the transverse plane and there is a greater similarity between $s_7$ and $s_8$ than in the preceding results. However, due to the penalty
variables, these vectors only tip down to a spin density value of 0.2. Also, the y-axis is larger than it was in the 15 slice problem; this is because the $M_y(t, \cdot)$ vectors are increasing as they descend into the transverse plane. In Figure 12, the gradient waveform contains two large peaks. The first is negative and it starts about one quarter into the time period. The second peak is positive and it starts approximately three quarters into the time period. Also, the gradient sequence has three linear segments: one that is zero at the start of the sequence, while the other two occur within the peaks, each having a value of exactly $\pm G_{max}$. For the external magnetization components, $b_x(t)$ is again constant and has a value of zero mT/mm. Although the axis of $b_y(t)$ in Figure 12 has been magnified, it is not as linear as the previous results and has increased to a value of approximately 0.10116 mT/mm. Nevertheless, this is still less than the amplitude for a conventional pulse, such as the one illustrated in Figure 1, which has a typical $b_y(t)$ value of approximately 0.7500 mT/mm. In fact, if we look at the value of the objective function in (2), the 15 slice penalty results have an objective value of 0.1874 SAR units, whereas the generic RF pulse produced a value of 0.5923 SAR units. The 15 slice results generated the lowest objective value of 0.0385 SAR units.

Figure 11. External magnetization component $b_y(t)$ and gradient sequence G(t) for the 15 slice results; $b_x(t)$ is zero

Using the simulated gVERSE magnetization results, we produce two different graphs showing the transverse and longitudinal magnetization profiles. The desired magnetization distributions for a 90° RF pulse with 15 coordinate positions are shown in Figure 13; for further information about the desired profiles the reader may consult [CNM86], [HBTV99]. The transverse magnetization profile illustrated in Figure 14 is very similar to what is desired as given in Figure 13. The $M_x$ magnetization component is free of ripples and contains the requested step function. The $M_y$ magnetization profile is included to illustrate its minimal presence.

Figure 12. External magnetization component $b_y(t)$ and gradient sequence G(t) for the 15 slice penalty results; $b_x(t)$ is zero

One should note that the lower axis in Figures 13 – 14 represents the magnetization vectors' coordinate positions and from the results this corresponds to a distance of 60 mm. The longitudinal magnetization profile in Figure 14 is also similar to what is required; however, the $M_z$ dip is slightly higher than desired. In Figure 14 it is important to note that our resultant profiles have no ripples extending past the slice of interest, which is not the case for the results of [CNM86], [CGN88] and [UGI04]. By virtually omitting ripples in our magnetization profiles we potentially reduce aliasing and other such factors that disrupt MR image resolution.

5 Image Reconstruction

To obtain an idea of how the gVERSE pulse performs with respect to MR imaging, we provide a simple illustration of its behaviour. First, one should be familiar with how the signal produced by the RF pulse is mathematically amplified, digitized, transformed, and then combined together with other signals to form a final image [CDM90, HBTV99, LL01, Nis96]. There are several techniques that can be used to produce a final image, however, the core of the systematic procedure is the same for all methods. For the purpose of our analysis we use 1D imaging coverage.

5.1 gVERSE Simulation

An MRI simulation was implemented in Matlab to test the performance of the gVERSE pulse, where, using the Bloch equation, we created an environment similar to that occurring in practical MRI. Thus, by feeding the optimized RF and gradient gVERSE values to a program that simulates the behaviour of a portion of a human spinal cord, we can show how the gVERSE
MR signal performs.

Figure 13. Desired $M_x$ and $M_z$ distribution profiles for a 90° pulse

Figure 14. Transverse magnetization components highlighting $M_x$ and $M_y$ magnitudes (left), longitudinal $M_z$ magnetization component magnitude (right)

Specifically, the gVERSE values of $G(t_j)$, $b_x(t_j)$ and $b_y(t_j)$ for $j = 1, \dots, N$ were read into the Bloch equation (1) for magnetization vectors at different $s_1, \dots, s_n$ positions. Although we used a total of n coordinate positions in the optimization of our model, the RF pulse and gradient sequence can be applied to $\tilde{n} > n$ positions for imaging purposes. Thus, given $\tilde{n} > n$ coordinate positions, N time discretizations, and the initial magnetization vector $\vec{M}_0$ in the z direction, the Bloch equation was numerically integrated for each $s_i$ value and $j = 1, \dots, N$. The VERSE pulse sequence, $G(t_j)$, $b_x(t_j)$ and $b_y(t_j)$, was then inserted into the integral of

$$\vec{M}(t, s_i) = \int_{t_1}^{t_N} \frac{d\vec{M}(t, s_i)}{dt}\,dt, \qquad (11)$$

for $i = 1, \dots, \tilde{n}$, where $[t_1, t_2, \dots, t_N]^T$ is the vector of discretization times. The values for the magnetization vectors were then converted into a signal by simulating the amplification and digitization used in MRI. For a complete description of how (11) was
integrated and amplified one can refer to [Sto04]. At this step we would be able to investigate the signal produced by our simulation and examine its properties. Using the gVERSE gradient and RF pulse sequence, many MRI simulations were conducted over various tissues. We will show one of the results; for more simulation examples the reader can see [Sto04]. As there was relatively no difference with regards to simulation results using either of the gVERSE cases, the 15 slice penalty results are shown, as they were the better of the two. Using cerebrospinal fluid, the most graphically significant results were tested by placing the tissue on an angle, as shown in Figure 15. As the signal generated by the pulse has a direct relationship with that of the tissue's spin density, each tissue's spin density value was substituted into $M_0$ at its respective position. Thus, a spin density value of 1.0 for cerebrospinal fluid was used when performing the MR imaging simulation. Also note, the gVERSE pulse was designed to tip only the magnetization vectors in $S_{in}$ into the transverse plane. Thus, the coordinate positions $s_i \in S_{in}$ would produce a peak in the signal when the gVERSE pulse reaches the cerebrospinal fluid for these $s_i \in S_{in}$ voxels. As detailed in the preceding sections, voxels $s_i \in S_{in}$ are located at the center coordinate positions.
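A minimal sketch of the forward simulation step described by (11), integrating the rotating-frame Bloch model (1) for each coordinate position with the sampled gVERSE pulse, is given below. A plain explicit Euler sweep in Python is used purely for illustration; the authors' simulator was written in Matlab and additionally models the amplification and digitization of the signal.

```python
import numpy as np

def simulate_profiles(times, G_vals, bx_vals, by_vals, positions, M0=1.0, gamma=42.58):
    """Integrate dM/dt = gamma * A(t, s) * M (equation (1)) for every coordinate
    position s_i, driven by the sampled gradient and RF values on the time grid.
    A simple explicit Euler rule stands in for the integration in (11)."""
    profiles = []
    for s in positions:
        M = np.array([0.0, 0.0, M0])                    # initial condition (7)
        for j in range(len(times) - 1):
            dt = times[j + 1] - times[j]
            A = np.array([[0.0,              -s * G_vals[j],  by_vals[j]],
                          [s * G_vals[j],     0.0,           -bx_vals[j]],
                          [-by_vals[j],       bx_vals[j],     0.0       ]])
            M = M + dt * gamma * (A @ M)
        profiles.append(M)
    return np.array(profiles)
```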

Figure 15. The angular position of cerebrospinal fluid to be imaged by our MRI simulation

Figure 16.(A) represents the signal
generated after the gVERSE pulse and gradient waveform was used to excite particular voxels within the cerebrospinal fluid into the transverse plane. As it is shown in Figure 16.(A), the large central peak in the signal represents when the gVERSE pulse reaches the voxels in Sin of the fluid. The peak in the center of the figure is very distinctive and although noise was not integrated into the simulation, the signal produced a strong step function. Figure 16.(B) represents the signal produced when a generic sinc pulse and gradient waveform is used. In comparing Figure 16.(A) to Figure 16.(B), one can see that the signal produced by the gVERSE pulse has a highly distinctive central

peak and a much clearer division with regards to what is tissue and what is not. The base of the signal in Figure 16.(A) is also more representative of when the voxels in $S_{in}$ reach the fluid, which is not the case for the sinc pulse. In addition, the objective value, which defines the strength of the RF pulse necessary to produce such a signal, was 0.1874 SAR units for the gVERSE pulse, substantially lower than that of the conventional pulse, which had an objective value of 0.5923 SAR units.

Figure 16. The signal produced by the gVERSE pulse MRI simulation over the diagonal cerebrospinal fluid (A), and when a generic sinc RF pulse and gradient sequence is applied (B)

6 Conclusions and Future Work

We designed the gVERSE model to reduce the SAR of RF pulses by maintaining a constant RF pulse strength ($\vec{B}_{rf}$ value) and generating high quality MR signals. It was shown that the gVERSE results produced strong MR signals with clear divisions of the location of the tissue being imaged. For this reason various MRI studies utilizing gVERSE pulses could be developed in the near future. The observations noted in Section 4 deserve some additional reasoning and explanation. To begin, the reader should understand that the symmetry displayed between coordinate position vectors in each of the result cases was precisely designed in (8) of the gVERSE model. However, the precession illustrated by the magnetization vectors was not directly part of the gVERSE design; it was a consequence of the Bloch constraint (3). Nonetheless, the precession shown in our results validated our design since it occurs within the nucleus of atoms in vivo. Furthermore, investigating the precession of the magnetization vectors in the 15 slice results, it was shown that they had a much tighter radial orbit than the 15 slice penalty results. This was due to the fact that the penalty parameters allowed the feasible range of the constraints
on these variables to be larger. With respect to precession, the 15 slice results were the most realistic. However, penalty variables in the 15 slice penalty results allowed the span of the magnetization vectors in $S_{in}$ to be fairly large, which is probably what occurs in practice. In addition, investigating only the coordinate positions in $S_{in}$, one should note that the penalty variables relaxed the constraints of the 15 slice penalty results, which did not induce the wave-like motion found in the vectors of the 15 slice results. One could conclude that in order to have improved transverse tipping and increase the length of magnetization vectors in $S_{in}$, larger $\varepsilon_2$ values are necessary; however, whether or not such a large precessional value is a realistic approximation would then become a factor. The aim of the gVERSE pulse was to minimize SAR by maintaining a constant RF pulse ($b_x(t)$ and $b_y(t)$ values), which was established in both of the results. Although the values of $b_x(t)$ were identical for both cases, $b_y(t)$ values increased as the distance between the slices in $S_{in}$ became larger. This was expected since an increase in the distance between $\underline{s}$ and $\overline{s}$ would require additional energy to tip the voxels into the transverse plane, yielding an increase in the strength of the RF pulse, or a larger $b_y(t)$ value. The $b_y(t)$ values for the penalty results were the greatest and were not as constant as in the other case. This was again due to the penalty variables and parameters; however, the nonlinear portions of the $b_y(t)$ graph only had small differences with respect to the other values, and they were lower. Also, when comparing the gVERSE pulse to conventional pulses, the gVERSE objective value was lower for all cases, and hence did not require as much energy to tip the magnetization vectors into the transverse plane. Finally, the most surprising part of the gVERSE pulse results is the gradient waveform. Since we optimized for the RF pulse in our model, this process returned the gradient waveform that would allow such a pulse to occur. In other words, in order to use the $b_x(t)$ and $b_y(t)$ pulse design, the accompanying gradient waveform, mainly derived from the Bloch constraint, would have to be imposed to acquire a useable signal. With regards to practical MR gradient waveforms, the 15 slice penalty results produced the simplest and most reasonable gradient values to implement, particularly due to their large linear portions. However, if necessary, regardless of the difficulty, either gradient could be implemented. Finally, both results had similar features in the sense that they each started off fairly negative and then ended up quite positive. This is a very interesting consequence of the gVERSE pulse; as shown in Sections 2 and 4, conventional gradient sequences usually have the opposite characteristics. In terms of our MRI simulation, good signal results were produced for such unique gradient waveforms, which would justify further research with gVERSE pulses. In fact, Sections 5 and 6 demonstrated that the gVERSE RF pulse and gradient sequence were viable and could be applied to practical MRI.

The gVERSE RF Pulse: An Optimal Approach to MRI Pulse Design

47

Future Work The gVERSE pulse proved to have encouraging MRI results and performed to be better than anticipated with respect to useable MR imaging signals. However, there are still areas left for investigation and various elements of the gVERSE model can be improved. A few of the issues that should be taken into account for future developments are: • • • • •

Specializing the model to the rotating structure of the equations; Apply the gVERSE model to more than 50 slices; Add spin-lattice and spin-spin proton interactions to the gVERSE formulation; Apply alternative optimization software to the problem; Include gradient distortions to the gVERSE model.

The issues are listed in sequential order, starting with what we believe is the most important item to be addressed. As most are self explanatory, adding rotation into the equations was one of the factors that deemed to be important after the results were examined. By integrating the rotating frame of reference into our equations we eliminated the y-axis. It is possible that this was a source of singularities when optimizing and therefore caused SOCS to increase the size of its working array, occasionally creating memory problems.

References [Ber95] [Ber01]

[BH01]

[Bus96] [CNM86]

[CGN88]

[CDM90]

[HBTV99]

Bertsekas, D. P.: Nonlinear Programming. Athena Scientific, Belmont, Massachusetts (1995) Betts, J. T.: Practical Methods for Optimal Control Using Nonlinear Programming. Society for Industrial and Applied Mathematics, Philadelphia (2001) Betts, J. T., and Huffman, W. P.: Manual: Release 6.2 M and CTTECH-01-014. The Boeing Company, PO Box 3707. Seattle, WA 981242207 (2001) Bushong, S. C.: Magnetic Resonance Imaging: Physical and Biological Principles. Mosby, Toronto, 2nd edition (1996) Conolly, S. M., Nishimura, D. G., and Macovski, A.: Optimal Control Solutoins to the Magnetic Resonace Selelctive Excitation Problem. IEEE Transacitons on Medical Imaging, MI-5, 106-115 (1986) Conolly, S. M., Glover, G., Nishimura, D. G., and Macovski, A.: Variable-Rate Selective Excitation. Journal of Magnetic Resonance, 78, 440-458 (1988) Curry, T. S., Dowdey, J. E., and Murry, R. C.: Christensen’s Physics of Diagnostic Radiology. Lippincott Williams and Wilkins, New York, 4th edition (1990) Haacke, E. M., Brown, R. W., Thompson, M. R., and Venkatesan, R.: Magnetic Resonance Imaging: Physical Principles and Sequence Design, John Wiley and Sons, Toronto (1999)

48

C.K. Anand et al.

[LL01]

[Nis96]

[She01] [Sto04] [UGI04]

[WXF91]

Liang, Z. P., and Lauterbur, P. C.: Principles of Magnetic Resonance Imaging: A Signal Processing Perspective. IEEE Press, New York, New York (2001) Nishimura, D. G.: Principles of Magnetic Resonance Imaging. Department of Electrical Engineering, Stanford University, San Francisco (1996) Shen, J.: Delayed-Focus Pulses Obtimized Using Simulated Annealing. Journal of Magnetic Resonance, 149, 234-238 (2001) Stoyan, S. J.: Variable Rate Selective Excitation RF Pulse in MRI. M.Sc. Thesis: McMaster University, Hamilton (2004) Ulloa, J. L., Guarini, M., and Irarrazaval, P.: Chebyshev Series for Designing RF Pulses Employing an Optimal Control Approach. IEEE Transacitons on Medical Imaging, 23, 1445-1452 (2004) Wu, X. L., Xu, P., and Freeman, R.: Delayed-Focus Pulses for Magnetic Resonace Imaging: An Evolutionary Approach. Magnetic Resonance Medicine, 20, 165-170 (1991)

Modelling the Performance of the Gaussian Chemistry Code on x86 Architectures Joseph Antony1 , Mike J. Frisch2 , and Alistair P. Rendell1 1

2

Department of Computer Science, The Australian National University, ACT 0200, Australia. {joseph.antony, alistair.rendell}@anu.edu.au Gaussian Inc. 340 Quinnipiac St., Bldg. 40, Wallingford CT 06492, USA.

Abstract Gaussian is a widely used scientific code with application areas in chemistry, biochemistry and material sciences. To operate efficiently on modern architectures Gaussian employs cache blocking in the generation and processing of the two-electron integrals that are used by many of its electronic structure methods. This study uses hardware performance counters to characterise the cache and memory behavior of the integral generation code used by Gaussian for Hartree-Fock calculations. A simple performance model is proposed that aims to predict overall performance as a function of total instruction and cache miss counts. The model is parameterised for three different x86 processors – the Intel Pentium M, the P4 and the AMD Opteron. Results suggest that the model is capable of predicting execution times to an accuracy of between 5 and 15%.

1 Introduction It is well known that technological advances have driven processor speeds faster than main memory speeds, and that to address this issue complex cache based memory hierarchies have been developed. Obtaining good performance on cache based systems requires that the vast majority of the load/store instructions issued by the processor are serviced using data that resides in cache. In other words, to achieve good performance it is necessary to minimize the number of cache misses [4]. One approach to achieving this goal is to implement some form of cache blocking [10]. The objective here is to structure the computational algorithm in such a way that it spends most of its time working with blocks of data that are sufficiently small to reside in cache, and only periodically moves data between main memory and cache. Gaussian [5] is a widely used computational chemistry code that employs cache blocking to perform more efficient integral computations [8,14]. The integrals in question lie at the heart of many of the electronic structure methods implemented within Gaussian, and are associated with the various interactions between and among the electrons and nuclei in the system under study.

50

J. Antony et al.

Since many electronic structure methods are iterative, and the number of integrals involved too numerous for them to be stored in memory, the integrals are usually re-computed several times during the course of a typical calculation. For this reason algorithms that compute electronic structure integrals fast and on-demand are extremely important to the computational chemistry community. To minimize the operation count, integrals are usually computed in batches, where all integrals in a given batch share a number of common intermediates [7]. In the PRISM algorithm used by Gaussian, large batch sizes give rise to large inner loop lengths. This is good for pipelining, but poor if it causes cache overflows and the need to fetch data from main memory. To address this problem Gaussian imposes cache blocking by limiting the maximum size of an integral batch. This in effect says that the time required to recompute the common shared intermediates is less than the time penalty associated with having inner loops fetch data quantities from main memory. In the current version of Gaussian there is a “one size fits all” approach to cache blocking, in that the same block size is used regardless of the exact characteristics of the integrals being computed. A long term motivation for our work is to move away from this model towards a dynamic model where cache blocking is tailored to each and every integral batch. As a first step towards this goal, this paper explores the ability of a simple Linear Performance Model (LPM) to predict the performance of Gaussian’s integral evaluation code purely as a function of instruction count and cache misses. It is important to note that the LPM is very different to that used in typical analytic or simulation based performance studies. Analytic models attempt to weight various system parameters and present an empirical equation for performance, whereas simulation studies are either trace3 or execution4 driven with each instruction considered in order to derive performance metrics. Analytic models fail, however, to capture dynamic aspects of code execution that are only evident at runtime, while execution or trace driven simulations are extremely slow, often being 100-1000 times slower than execution of the actual code. The LPM on the other hand, effectively ignores all the intricate details of program execution and assumes that, over time these details can be averaged out and incorporated into penalty factors associated with the average cost of issuing an instruction and the average cost of a cache miss. In this study, three x86 platforms – the Intel Pentium M, Pentium 4 (P4) and AMD Opteron – are considered. On-chip hardware performance counters are used to gather instruction and cache miss data from which the LPM is derived. The paper is broken into the following sections: section 2 discusses the background to this study, the tools and methodology used, and introduces 3

Trace driven simulation uses a pre-recorded list of instructions in a tracefile for later interpretation by a simulator. 4 An execution driven simulator interprets instructions from a binary source to perform its simulation.

Gaussian Performance Modelling

51

the LPM; section 3 uses the LPM for a series of experiments on the three different platforms and discusses the results. Previous work, conclusions and future work are covered in sections 4 and 5.

2 Background 2.1 The Hartree-Fock Method Electronic structure methods aim to solve Schr¨ odinger’s wave equation for atomic and molecular systems. For all but the most trivial systems it is necessary to make approximations. Computational chemists have developed a hierarchy of methods each with varying computational cost and accuracy. Within this hierarchy the Hartree-Fock (HF) method is relatively inaccurate, but it is also the bedrock on which more advanced and accurate methods are built. For these reasons the HF method was chosen as the focus of this work. At the core of HF theory is the concept of a molecular orbital (MO), where one MO is used to describe the motion of each electron in the system. The MOs (φ) are expanded in terms of a number (N ) of basis functions (χ) such that for MO φi : N  cαi χα (1) φi = α

where cαi are the expansion or molecular orbital coefficients. In HF methods the form of these coefficients is optimized so that the total energy of the system is minimized. The basis functions used are normally located at the various atomic nuclei, and are a product of a radial function that depends on the distance from that nuclei, and an angular function such as a spherical harmonic Ylm 5 [8]. Usually the radial function is a Gaussian Gnl (r) 6 , and it is for this reason that the Gaussian code is so named. The matrix form of HF equations is given by: F C = SCǫ

(2)

where C is the matrix of molecular orbital coefficients, S a matrix of (overlap) integrals between pairs of basis functions, ǫ a vector with elements corresponding to the energy of each MO, and F is the so called Fock matrix defined by: core + Fµν = Hµν

Ne N   i

λσ

∗ Cλi Cσi [(µν | λσ) − (µλ | νσ)]

(3)

where Ne is the number of electrons in the system. In equation 3 each element of the Fock matrix is expressed in terms of another two-index quantity core ) that involves other integrals between pairs of basis functions, and (Hµν the molecular orbital coefficients (C) contracted with a four-index quantity (µν | λσ). Since F depends on C, which is the same quantity that we seek 5 6

Ylm (θ, ϕ) = Gnl (r) =



2l+1 (l−m)! m P (cos θ)(eimϕ ) 4π (l+m)! l

3/4

2(2α) π 1/4





22n−l−2 ( (4n−2l−3)!!

2αr)2n−l−2 exp(−αr2 )

52

J. Antony et al.

to determine, equation 2 is solved iteratively by guessing C, building a Fock matrix, solving equation 2 and then repeating this process until convergence is reached. The four-index quantities, (µν | λσ), are the electron repulsion integrals (ERIs) that are of interest to this work, and arise due to repulsive interactions between pairs of electrons. They are given by: (µν | λσ) =



χµ (r1 ) χν (r1 )

1 χλ (r2 ) χσ (r2 ) dr1 dr2 |r1 − r2 |

(4)

where r1 and r2 are the coordinates of two electrons. For a given basis the number of two-electron integrals grows as O(N 4 ), so evaluation and processing of these quantities quickly becomes a bottleneck. (We note that for large systems it is possible to reduce this asymptotic scaling through the use of pre-screening and other techniques [13], but these alternative approaches still require a substantial number of ERIs to be evaluated and processed.) In the outline given above it has been assumed that each basis function is a single Gaussian function multiplied by a spherical harmonic (or similar). In fact it is common to combine several Gaussian functions with different exponents together in a fixed linear combination, treating the result as one contracted Gaussian basis function. We also note that when a basis function involves a spherical harmonic (or similar) of rank one or higher (i.e. l ≥ 1), it is normal to include all orders of spherical harmonic functions within that rank (i.e. ∀m : −l ≤ m ≤ l). Thus if a basis function involves a spherical harmonic of rank 2, all 5 components are included as basis functions. Thus there are three parameters that characterise a basis function; i) its location, ii) its degree of contraction and the exponents of the constituent Gaussians, and iii) the rank of its angular component. In the PRISM [12] algorithm functions with equivalent ii) and equivalent iii) the same are treated together, with a batch of ERI integrals defined by doing this for all of the four functions involved. The size of these batches can quickly become very large since the same basis set is generally applied to all atoms of the same type within the system under study, e.g. all oxygen or hydrogen atoms in the system. It is for this reason that Gaussian imposes cache blocking to limit maximum batch sizes. 2.2 Linear Performance Model The Linear Performance Model (LPM) gives the total number of cycles required to execute a given code segment as: Cycles = α ∗ (ICount ) + β ∗ (L1M isses ) + γ ∗ (L2M isses )

(5)

where ICount is the instruction count, L1M isses the total number of Level 1 cache misses, L2M isses the total number of Level 2 cache misses, and α, β, and γ are fitting parameters. Intuitively the value of α reflects the ability of the code to exploit the underlying superscalar architecture, β is the average

Gaussian Performance Modelling

53

cost of an L1 cache miss, and γ is the average cost of an L2 cache miss. We will collectively refer to α, β and γ as the Processor and Platform specific coefficients (PPCoeffs). They will be derived by performing a least squares fit of the Cycles, ICount , L1M isses and L2M isses counts obtained from hardware performance counters for a variety of cache blocking sizes. 2.3 PAPI PAPI [3], a cross platform performance counter library, is used to obtain hardware counter data. It uses on-chip performance counters to measure application events of interest like instruction and cycle counts as well as other cache events. Of the three x86 machines used, the Intel Pentium M, P4 and the AMD Opteron, have different numbers of on-chip performance counters. Each on-chip performance counter can count one particular hardware event. The following hardware events are used in this study; PAPI L1 TCM (Total level one (L1) misses (data and instruction)), PAPI L2 TCM (Total level two (L2) misses), PAPI TOT INS (Total instructions) and PAPI TOT CYC (Total cycles). PAPI also supports hardware performance counter event multiplexing. This uses an event sampling approach to enable more events to be counted than there are available hardware registers. Events counted using multiplexing will therefore have some statistical uncertainty associated with them. It is noted that on the P4 processor PAPI does not have a PAPI L1 TCM preset event, as it is not exposed by the underlying hardware counters. Instead PAPI L1 DCM and PAPI L1 ICM are used to count the total number of data and instruction cache misses respectively. Table 1 lists processor characteristics and the cache and memory latencies measured using lmbench [11]. The P4’s L1 instruction cache is a trace cache [6], unlike the Pentium M and Opteron. Hyperthreading on the P4 was turned off for this study. PAPI’s event multiplexing was used on the Pentium M, as this processor has only two hardware counters, but four hardware events are required by the LPM. 2.4 Methodology Gaussian computations are performed on a small system consisting of a solvated potassium ion surrounded by 11 water molecules with the geometry obtained from a snapshot of a molecular dynamics simulation. This work uses two basis sets denoted as 6-31G* and 6-31G++(3df,3pd) [8]. The former is a relatively modest basis set, while the latter would be considered large. Cache blocking in Gaussian is controlled by an input parameter cachesize, this was set to values of 2, 8, 32, 128, 256 and 512 kilowords (where a word is 8 bytes). The default value of this parameter equates to the approximate size of the highest level of cache on the machine being used, and from this value the sizes of various intermediate buffers are derived. For each blocking size, performance counter results were recorded for one complete iteration of the HF procedure and averaged over five runs.

54

J. Antony et al.

Clock Rate Ops. per Cycle Memory Subsystem Perf. Counters L1 DCache

L2 Unified

lmbench Latencies for L1 DCache L2 Unified Main Memory

(Ghz) (Cy) (No.) Size (KB) Associativity (Ways) Line size (Bytes) Cache Policies Size (MB) Associativity (Ways) Line size (Bytes) Relation to L1 Cache Policies Latency (Cy) Latency (Cy) Latency (Cy, ≈ )

Pentium M P4 Opteron 1.4 3.0 2.2 3 3 3 NtBr NtBr HT 2 18 4 32 16 64 8 8 2 64 64 64 LRU, WB P-LRU LRU, WB, WA 1 1 1 8 8 16 64 64 64 Inclusive Inclusive Exclusive LRU P-LRU P-LRU 3 9 201

4 28 285

3 20 405

NtBr = Northbridge, HT = HyperTransport, LRU = Least Recently Used, P-LRU = Pseudo-LRU, WB = Write Back, WA = Allocate on Write Table 1. Processor characteristics of clock rate, cache sizes and measured latencies for L1 DCache, L2 cache and main memory latencies for the three x86 processors used in the study. Block 6-31G* 6-31G++(3df,3pd) Size Pentium M P4 Opteron Pentium M P4 Opteron 2 42.0 28.2 20.8 4440 3169 2298 8 36.7 24.2 17.0 3849 2821 1970 32 30.0 19.8 13.6 2914 2210 1484 128 31.8 20.2 17.0 2869 2121 1701 256 37.0 22.0 20.2 3349 2259 1856 512 42.0 24.8 22.0 3900 2516 2214 x 36.6 23.2 18.4 3554 2516 1921 σ 5.0 3.2 3.1 618 409 308 Table 2. Timings (seconds) for HF benchmark using the 6-31G* and 631G++(3df,3pd) basis sets as a function of the cache blocking parameter. Also shown are the average (x) times and their standard deviations (σ).

3 Observed Timings and Hardware Counter Data Observed timings Table 2 shows the execution times obtained on the three different hardware platforms as a function of the different cache block sizes and when using the 631G* and 6-31G++(3df,3pd) basis sets. These results clearly show that cache blocking for integral evaluation has a major effect on the overall performance of the HF code in Gaussian. As the block size is increased from 2 to 512 kilowords the total execution time initially decreases, reaches a minimum,

Gaussian Performance Modelling

55

and then increases again. Exactly where the minimum is located is seen to vary slightly across the different platforms, and between the two different basis sets. Also shown in Table 2 are the execution times averaged over all the different cache block sizes on a given platform, together with the corresponding standard deviation. Although, the absolute value of the standard deviations are significantly smaller for the 6-31G* basis, as a percentage of average total execution times they are roughly equal for both basis sets at around 15%.

Figure 1. Hardware counter data as a function of the cache blocking parameter for the HF method, using the 6-31G++(3df,3pd) basis set on the three different hardware platforms

Hardware counter data Hardware counter data for the 6-31G++(3df,3pd) basis set is given in Figure 1. The left hand scale of the graph quantifies Total Level 1 misses (L1misses ) and Total Level 2 misses (L2misses ), while the right hand scale quantifies Total Cycles (Cycles) and Instruction Count (ICount ). The x axis is plotted using a log2 scale. The cycle counts shown in Figure 1 are directly related to the times given in Table 2 by the clock speeds (see Table 1). Hence they show a similar behavior, decreasing initially as the block size increases, reaching a minimum and then increasing. In contrast the instruction counts show a steep initial decrease, but then appear to level off for large block sizes. This behavior reflects the fact that similar integrals, previously split into multiple batches, will be computed in fewer batches as the block size increases. Mirroring this behavior the L1 and L2 cache misses are initially low, increase when the blocking size is expanded, and ultimately will plateau when there are no more split batches to be combined (although this is not evident for the block sizes given in the figure). Obtained PPCoeffs Using the LPM (equation 5) and the hardware performance counter data for the HF/6-31G* calculations, a least squares fit was performed in order to obtain the PPCoeffs values given in Table 3. For the Pentium M and Opteron the values of α, β and γ appear reasonable. Specifically a value of 0.67 for

56

J. Antony et al. Processor α β γ Pentium M 0.67 13.39 63.60 P4 2.87 -59.35 588.46 Opteron 0.64 7.13 388.23 P4a 0.86



323.18

Table 3. PPCoeff (α, β, γ) values for the Pentium M, P4 and Opteron obtained from HF/6-31G* results. See text for further details. a Results obtained when ignoring counts for L1 cache misses.

α on the Pentium M and 0.64 on the Opteron, implies that the processors are issuing 1.5 and 1.6 instructions per cycle respectively. Given that both processors (and also the P4) can issue upto three instructions per cycle these values are in the typical range of what might be expected. The values for β and γ are average L1 and L2 cache miss penalties respectively, or alternatively β is the average cost of referencing data in L2 cache, while γ is the average cost of referencing data in main memory. The actual costs for referencing the L2 cache and main memory as measured using lmbench are given in Table 1. Thus for the Pentium M a value for β of 13.39 can be compared with the L2 latency of 9 cycles (Table 1), and a value for γ of 63.60 can be compared with 201 cycles. On the Opteron the equivalent comparisons are 7.13 to 20, and 388.23 to 405. These results for β, and particularly those for γ are roughly in line with what we might expect if we note that they are averages, while those measured by lmbench are worst case scenarios; hardware features such as prefetching and out-of-order execution are likely to mask some of the latencies associated with a cache miss in Gaussian, but not for lmbench (by design). In contrast to the Pentium M and Opteron systems the results for the P4 are clearly unphysical with a negative value for β. The reason for this will be outlined in a future publication [2], but in essence it is due to the nature of the P4 micro-architecture which makes it very hard to count accurately the L1 cache misses. If, however, we ignore L1 misses and restrict the LPM to just the instruction count and L2 cache misses we obtain the second set of P4 data given in Table 3. This is far more reasonable,with a value for α that now equates to 1.2 instructions per cycle compared to an unlikely previous value of 0.37. Similarly the latency for a main memory reference is now less than that recorded by lmbench. The PPCoeffs in Table 3 were derived using performance counter data obtained from running with the 6-31G* basis set. It is of interest to combine these values for α, β and γ with the instruction and cache miss counts recorded with the larger 6-31G++(3df, 3pd) basis set, and thereby obtain predicted cycle counts for this larger calculation. The difference between these predicted cycle counts and the actual cycle counts gives a measure of the ability of the LPM to make predictions outside of the domain in which it was originally parameterised. Doing this we find RMS differences between the predicted and measured execution times of 456, 268 and 95 seconds for the Pentium M, P4

Gaussian Performance Modelling

57

(for the 2 parameter LPM) and Opteron processors respectively. Compared to the average execution times given in Table 2, this represents an error of ∼13% on the Pentium M, ∼10% on the P4, and ∼5% on the Opteron. Since the total execution time varies by over 50% as the block size is changed, these results suggest that the LPM is accurate enough to make useful predictions concerning the performance of Gaussian as a function of total instruction and cache misses.

4 Previous Work Using a sparse set of trace based cache simulations, Gluhovsky and O’Krafka [9] build a multivariate model of multiple cache miss rate components. This can then be used to extrapolate for other hypothetical system configurations. Vera et al. use cache miss equations [15] to obtain an analytical description of cache memory behavior of loop based codes. These are used at compile time to determine near optimal cache layouts for data and code. Snavely et. al use profile convolving [1] a trace based method which involves the creation of a machine profile and an application profile. Machine profiles describe the behavior of loads and stores for the given processor, while the application profile is a runtime utility which captures and statistically records all memory references. Convolving involves creating a mapping of the machine signature and application profile; this is then fed to an interconnect simulator to create traces that aids in predicting performance. In comparison to these methods, the LPM is lightweight in obtaining application specific performance characteristics. PPCoeffs are obtained using hardware counter data which can then be used by either trace based or execution based simulators.

5 Conclusions and Future Work A linear performance model was proposed to model the cache performance of Gaussian. PPCoeffs (α, β, γ) obtained intuitively correspond to how well the code uses the superscalar resources of the processor, the average cost in cycles of an L1 cache miss and the average cost in cycles of an L2 miss. Experiments show optimal batch sizes are both platform and computation specific, hinting that a dynamic means of varying batch sizes at runtime might be useful. In which case the LPM could be used to determine cache blocking sizes prior to computing a batch of integrals. On completing each batch cache metrics gathered could then be used to guide a runtime search toward the most optimal blocking size. The predictive ability of the LPM can be used to aid experiments which use cache simulation tools. These tools are capable of simulating caches of current and possible future processors and yield instruction counts, number of L1 and L2 misses. In tandem with the LPM, cycle counts can be computed thus allowing determination of which microarchitectural features have the greatest impact on code performance.

58

J. Antony et al.

For future work we propose to test the usefulness of the LPM at runtime to aid in searching for optimal blocking factors and use it to study the effect of microarchitectural changes on code performance.

Acknowledgments This work was possible due to funding from the Australian Research Council, Gaussian Inc. and Sun Microsystems Inc. under ARC Linkage Grant LP0347178. JA and APR wish to thank Alexander Technology and DCS TSG for access to various hardware platforms.

References 1. A. Snavely, N. Wolter, and L. Carrington. Modelling Application Performance by Convolving Machine Signatures with Application Profiles. IEEE Workshop on Workload Characterization, December 2001. 2. J. Antony, M. J. Frisch, and A. P. Rendell. Future Publication. 3. S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci. PAPI. Intl. Journal of HPC Applications, 14(3):189–204, 2000. 4. D. E. Culler, A. Gupta, and J. P. Singh. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann Publishers, Inc., San Francisco, California, USA, 1999. 5. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, and J. R. Cheeseman et al. Gaussian 03, Revision C.01. Gaussian Inc., Gaussian, Inc., Wallingford CT, USA, 2004. 6. G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel. The microarchitecture of the Pentium 4 processor. Intel Technical Journal, 2001. 7. M. Head-Gordon and J. A. Pople. A method for two-electron gaussian integral and integral derivative evaluation using recurrence relations. J. Chem. Phys., 89(9):5777–5786, 1988. 8. T. Helgaker, P. Jorgensen, and J. Olsen. Molecular Electronic-Structure Theory. John Wiley & Sons, 2001. 9. I. Gluhovsky and B. O’Krafka. Comprehensive Multiprocessor Cache Miss Rate Generation Using Multivariate Models. ACM Transactions on Computer Systems, May 2005. 10. M. D. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. SIGOPS Oper. Syst. Rev., 25:63–74, 1991. 11. L. W. McVoy and C. Staelin. lmbench: Portable tools for performance analysis. In USENIX Annual Technical Conference, pages 279–294, 1996. 12. P. M. W. Gill. Molecular Integrals over Gaussian Basis Functions. Advances in Quantum Chemistry, 25:141–205, 1994. 13. P. M. W. Gill, B. G. Johnson, and J. A. Pople. A simple yet powerful upper bound for Coulomb integrals. Chemical Physics Letters, 217:65–68, 1994. 14. R. Lindh. Integrals of Electron Repulsion. In P. v. R. Schleyer et al., eds, Encyclopaedia of Computational Chemistry, volume 2, page 1337. Wiley, 1998. 15. X. Vera, N. Bermudo, and A. G. J. Llosa. A Fast and Accurate Framework to Analyze and Optimize Cache Memory Behavior. ACM Transactions on Programming Languages and Systems, March 2004.

Numerical Simulation of the December 26, 2004: Indian Ocean Tsunami J. Asavanant,1 M. Ioualalen,2 N. Kaewbanjak,1 S.T. Grilli,3 P. Watts,4 J.T. Kirby,5 and F. Shi5 1

2

3

4

5

Advanced Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics, Faculty of Science, Chulalongkorn University, Bangkok, 10330, Thailand [email protected], [email protected] Geosciences Azur (IRD, CNRS, UPMC, UNSA), Villefranche-sur-Mer, France [email protected] Department of Ocean Engineering, University of Rhode Island, Narragansett, RI 02882 [email protected] Applied Fluids Engineering, Inc., 5710 E. 7th Street, Long Beach, CA 90803, USA [email protected] Center for Applied Coastal Research, University of Delaware, Newark, DE 19761, USA [email protected]

Abstract The December 26, 2004 tsunami is one of the most devastating tsunami in recorded history. It was generated in the Indian Ocean off the western coast of northern Sumatra, Indonesia at 0:58:53 (GMT) by one of the largest earthquake of the century with a moment magnitude of Mw = 9.3. In the study, we focus on best fitted tsunami source for tsunami modeling based on geophysical and seismological data, and the use of accurate bathymetry and topography data. Then, we simulate the large scale features of the tsunami propagation, runup and inundation. The numerical simulation is performed using the GEOWAVE model. GEOWAVE consists of two components: the modeling of the tusnami source (Okada, 1985) and the initial tsunami surface elevation, and the computation of the wave propagation and inundation based on fully nonlinear Boussinesq scheme. The tsunami source is used as initial condition in the tsunami propagation and inundation model. The tsunami source model is calibrated by using available tide gage data and anomalous water elevations in the Indian Ocean during the tsunami event, recorded by JASON’s altimeter (pass 129, cycle 109). The simulated maximum wave heights for the Indian Ocean are displayed and compared with observations with a special focus on the Thailand coastline.

1 Introduction On December 26, 2004 at 0:58:53 GMT a 9.3 Magnitude earthquake occurred along 1300 km of the Sundra and Andaman trenches in the eastern Indian Ocean, approximately 100 km off the west coast of northern Sumatra.

60

J. Asavanant et al.

The main shock epicenter was located at 3.32◦ N and 95.85◦ E, 25-30 km deep. Over 200,000 people across the entire Indian Ocean basin were killed with tens of thousands reported missing as a result of this disastrous event. In accordance with modern practice, several international scientific team were organized to conduct quantitative survey of the tsunami characteristics and hazard analysis in the impacted coastal regions. Numerous detailed eyewitness observations were also reported in the form of video digital recordings. Information concerning the survey and some ship-based expeditions on tsunami source characteristics can be found in Grilli et al (2007), Moran et al (2005), and McNeill (2005), Kawata et al (2005), Satake et al (2005), Fritz and Synolakis (2005). In this paper a resonable tsunami source based on available geological, seismological and tsunami elevation and timing data is constructed using the standard half-plane solution for an elastic dislocation formula (Okada, 1985). Inputs to these formula are fault plane location, depth, strike, dip, slip, length, and width as well as seismic moment and rigidity. Okada’s solution is implemented in TOPICS (Tsunami Open and Progressive Initial Conditions System) which is a software tool that provides the vertical coseismic displacements as output. Tsunami propagation and inundation are simulated with FUNWAVE (Fully nonlinear Wave Model) based on the dispersive Boussinesq system. Comparisons of simulated surface elevations with tide gage data, satellite transect and runup observations show good agreement both in amplitudes and wave periods. This validates our tsunami source and propagation model of the December 26, 2004 event. Dispersive effects in the simulations are briefly discussed.

2 Source and Propagation Models The generation mechanism for the Indian Ocean tsunami is mainly due to the static sea floor uplift caused by abrupt slip at the India/Burma plate interface. Seismic inversion models (Ammon, 2005) indicate that the main shock propagated northward from the epicenter parallel to the Sumatra trenches for approximately 1,200 km of the fault length. In this study, the ruptured subduction zone is identified by five segments of tsunami source based on different morphologies (Figure 1). 2.1 Source Model The main generating force of a tsunami triggered by an earthquake is the uplift or subsidence of the sea-floor. Determining the actual extent of sea-floor change in a sub-sea earthquake is very difficult. In general, the displacement can be computed from the formulae which output surface deformation as a function of fault strike, dip, slip, length, width, depth, moment magnitude, and Lame’s constants for the surrounding rock (Okada, 1985). The underlying

Numerical Simulation of the December 26, 2004: Indian Ocean Tsunami

61

Figure 1. Earthquake tsunami source

assumptions are based on the isotropic property and half-plane homogeneity of a simple source configuration. Okada’s formulae are used in this study to compute ground displacement from fault parameters of each segment shown in Table 1. The total seismic moment release is Mo = 1.3 × 1023 J, equivalent to Mw = 9.3. Okada’s solution is implemented in TOPICS (Tsunami Open and Progressive Initial Conditions System) which are then tranferred and linearly superimposed into the wave propagation model (FUNWAVE) as an initial free surface condition. The five segments are distinguished by their unique shape and orientation. Segment 1 covers the Southern arc of the ruptured subduction zone with length L = 220 km. Segments 2 and 3 are relatively straight sections of the subduction zone in a NNW direction along the trench with the lengths of 150 and 390 kms respectively. The last two segments (4 and 5) have a marked change in orientation and shape. Segment 4 (L = 150 km) is facing southern Thailand whereas a significant number of larger islands are located on the overriding plate of segment 5 (L = 350 km).

62

J. Asavanant et al.

2.2 Wave Propagation Model Usually tsunamis are long waves (as compared with the ocean depth). Therefore, it is natural first to consider the long-wave (or shallow-water) approximation for the tsunami generation model. However, the shallow water equations ignore the frequency dispersion which can be important for the case of higher frequency wave propagation in relatively deep water. In this paper, a fully nonlinear and dispersive Boussinesq model (FUNWAVE) is used to simulate the tsunami propagation from deep water to the coast. FUNWAVE also includes physical parameterization of dissipation processes as well as an accurate moving inundation boundary algorithm. A review of the theory behind FUNWAVE are given by Kirby (2003). Table 1. Tsunami source parameters Parameters

Segment 1 Segment 2 Segment 3 Segment 4 Segment 5

xo (longitude) yo (latitude) d (km) ϕ (degrees) λ (degrees) δ (degrees) ∆ (m) L (km) W (km) to (s) µ (Pa) Mo (J) λo (km) To (min) ηo (m)

94.57 3.83 25 323 90 12 18 220 130 60 4.0 × 1010 1.85 × 1022 130 24.77 -3.27;+7.02

93.90 5.22 25 348 90 12 23 150 130 272 4.0 × 1010 1.58 × 1022 130 17.46 -3.84;+8.59

93.21 7.41 25 338 90 12 12 390 120 588 4.0 × 1010 2.05 × 1022 120 23.30 -2.33;+4.72

92.60 9.70 25 356 90 12 12 150 95 913 4.0 × 1010 0.61 × 1022 95 18.72 -2.08;+4.49

92.87 11.70 25 10 90 12 12 350 95 1273 4.0 × 1010 1.46 × 1022 95 18.72 -2.31;+4.6

3 Tsunami Simulations Simulations of the December 26, 2004 tsunami propagation in the Bay of Bengal (72◦ to 102◦ E in longitude and 13◦ S to 23.5◦ N in latitude) are performed by using GEOWAVE, which is a single integrated model combining TOPICS and FUNWAVE. The application of GEOWAVE on landslide tsunami is discussed in Watts et al (2003). We construct the numerical simulation grid by using ETOPO2 bathymetry and topography data together with denser and more accurate digitized bathymetry and topography data provided by Chulalongkorn University Tsunami Rehabilitation Research Center. These

Numerical Simulation of the December 26, 2004: Indian Ocean Tsunami

63

data were derived by a composite approach using 30 m NASA’s Space Shuttle Radar Topography Mission (SRTM) data for the land area with digitized navigational chart (Hydrographic Department of the Royal Thai Navy) and overlaid onto the 1:20,000 scale administrative boundary GIS (ESRI Thailand, Co Ltd). The projection’s rectification was verified and adjusted, whenever needed, using up to two ground control points per square kilometer. We regridded the data using linear interpolation to produce the uniform grid with 1.85 × 1.85 km, which approximately corresponds to a 1 minute grid spacing, yielding 1,793 by 2,191 points. The time step for each simulation is set at 1.2 sec. Two kinds of boundary conditions are used in FUNWAVE, i.e. total reflected wall and sponge layer on all ocean boundaries. In the simulations, the five segments of tsunami sources are triggered at appropriate times to according to the reduced speed of propagation of the rupture. Based on the shear wave speed prediced by seismic inversion models, the delay between each segments can be estimated and the values to are provided in Table 1.

4 Discussion of Results The maximum elevations above the sea level are plotted in Fig. 2 showing the tsunami’s radiation patterns. Details of regional areas of Banda Aceh and Thailand’s westcoast are shown in Figs. 3a, 3b. The estimate of sea surface elevation about two hours after the start of tsunami event is obtained along the satellite track No. 129 (Jason 1). The comparison between the model results with the satellite altimetry illustrated in Fig. 4 shows satisfactory agreement, except for a small spatial shift at some locations. This may be due to the noise in satellite data. During the event, sea surface elevations were measured at several tide gage stations and also recorded with a depth echo-sounder by the Belgian yacht “Mercator”. Table 2 lists the the tide gage and the Belgian yacht with their locations. Fig. 5 shows both measured and simulated time series in the Maldives (Hannimaadhoo, Male), Sri Lanka (Columbo), Taphao Noi (east coast of Thailand) and the yacht. Simulated elevation and arrival times at these locations agree well as compared to those of observations. As expected from seismological aspects, all of the tide gage data (both observed and modeled) show leading elevation wave on the western side (uplift) of the sources and depression waves on the eastern side (subsidence) of the source area. As shown in Fig. 3b, the largest runups are predicted near Banda Aceh (northern Sumatra) and in western coast of Thailand (Khao Lak area). The largest runup measured on the west coast of Banda Aceh are underpredicted by 50% likely due to the lack of detailed coastal bathymetry and topography. However, better agreement on the extreme runup values can be found in the Khao Lak area, Thailand where more accurate coastal topography was specified in the model grid.

64

J. Asavanant et al.

Figure 2. Maximum elevations in Bay of Bengal

(a)

(b)

Figure 3. (a) Maximum elevations along Banda Aceh and (b) Maximum elevations along the westcoast of Thailand

Numerical Simulation of the December 26, 2004: Indian Ocean Tsunami

65

Figure 4. Comparison of tsunami measured with satellite altimetry by Jason1 and results of tsunami simulation

(a)

(b) Figure 5. Comparison of numerical tide gage data for (a) Hanimaadhoo, (b) Male, (c) Colombo, (d) Taphao Noi, and (e) mercator yatch

66

J. Asavanant et al.

(c)

(d)

(e) Figure 5. (continued)

Numerical Simulation of the December 26, 2004: Indian Ocean Tsunami

67

Table 2. Tide gage locations Locations

Coordinates

Hannimaadhoo, Maldives Male, Maldives Columbo, Sri Lanka Taphao Noi, Thailand Mercator (Phuket), Thailand

(6.767, (4.233, (7.000, (7.833, (7.733,

73.167) 73.540) 79.835) 98.417) 98.283)

5 Final Remarks For several decades, models based on either linear or nonlinear versions of the shallow water theory are most generally used within the tsunami problem. This shallow water model basically neglects the effects of dispersion. Here we propose the use of dispersive (yet fully nonlinear) Boussinesq equations which is also another long wave propagation model. The earthquake tsunami sources consisting of five segments for vertical coseismic displacement are simulated in TOPICS based on parameters provided in Table 1. We simulate tsunami propagation and inundation with FUNWAVE which is a public domain higher order Boussinesq model developed over the last ten years at the University of Delaware (Wei and Kirby, 1995). To estimate dispersive effects, FUNWAVE can also be set to perform the simulations under the Nonlinear Shallow Water Equations (NSWE) for the same set of parameters. Grilli, et al (2007) reported, in the regions of deeper water in WSW direction of main tsunami propagation west of the source, that dispersion can reduce the wave amplitude by up to 25% compared to the nondispersive shallow water equation model. These differences occur very locally which may be associated with local topographic features and decrease significantly after the tsunami has reached the shallower continental shelf. Furthermore the eastward propagation towards Thailand exhibits a very weak dependence on frequency dispersion. A more detailed discussion on these dispersive effects can be found in Ioualalen, et al (2007).

6 Acknowledgments The authors would like to gratefully acknowledge the National Electronics and Computer Technology Center (NECTEC) under the Ministry of Science and Technology, Thailand for the use of their ITANIUM computer cluster and Dr. A. Snidvongs, Head of Chulalongkorn University Tsunami Rehabilitation Research Center, for providing the digitized inland topography and sea bottom bathymetry along the westcoast of Thailand. M. Merrifield from UHSLC for kindly providing us with the tide gage records. S. Grilli, J. Kirby, and F. Shi acknowledge continuing support from the Office of Naval Research, Coastal Geosciences Program.

68

J. Asavanant et al.

References 1. Ammon, C. J. et al (2005). Rupture process of the 2004 Sumatra-Andaman earthquake. Science, 308, 1133-1139. 2. Fritz, H. M. and C. E. Synolakis (2005). Field survey of the Indian Ocean tsunami in the Maldives. Proc 5th Intl on Ocean Wave Meas and Analysis (WAVES 2005, Madrid, Spain, July 2005), ASCE. 3. Grilli, S. T., Ioualalen, M., Asavanant, J., Shi, F., Kirby, J., and Watts, P. (2007). Source constraints and model simulation of the December 26, 2004 Indian Ocean Tsunami. J Waterway Port Coast and Ocean Engng, 133(6), 414–428. 4. Ioualalen, M., Asavanant, J., Kaewbanjak, N., Grilli, S. T., Kirby, J. T., and Watts, P. (2007) Modeling the 26th December 2004 Indian Ocean tsunami: Case study of impact in Thailand. J Geophys Res, 112, C07025, doi:10.1029/2006JC003850. 5. Kawata, T. et al (2005). Comprehensive analysis of the damage and its impact on coastal zones by the 2004 Indian Ocean tsunami disaster. Disaster Prevention Research Institute http://www.tsunami.civil.tohoku.ac.jp/sumatra2004/ report.html 6. Kirby, J. T. (2003). Boussinesq models and applications to nearshore wave propagation, surf zone processes and wave-induced currents. Advances in Coastal Modeling, V. C. Lakhan (ed), Elsevier Oceanography Series 67, 1-41. 7. McNeill, L., Henstock, T. and Tappin, D. (2005). Evidence for seafloor deformation during great subduction zone earthquakes of the Sumatran subduction zone: Results from the first seafloor survey onboard the HMS Scott, 2005, EOS Trans. AGU, 86(52), Fall Meet. Suppl., Abstract U14A-02. 8. Moran, K., Grilli, S. T. and Tappin, D. (2005). An overview of SEATOS: Sumatra earthquake and tsunami offshore survey. EOS Trans. AGU, 86(52), Fall Meet. Suppl., Abstract U14A-05. 9. Okada, Y. (1985). Surface deformation due to shear and tensile faults in a halfspace. Bull. Seis. Soc. Am., 75(4), 1135-1154. 10. Satake, K. et al (2005). Report on post tsunami survey along the Myanmar coast for the December 2004 Sumatra-Andaman earthquake http: //unit.aist.go.jp/actfault/english/topics/Myanmar/index.html 11. Watts, P., Grilli, S. T., Kirby, J. T., Fryer, G. J., and Tappin, D. (2003). Landslide tsunami case studies using a Boussinesq model and a fully nonlinear tsunami generation model. Nat. Hazards and Earth Sci. Systems, 3(5), 391-402. 12. Wei, G. and Kirby, J. T. (1995). Time-dependent numerical code for extended Boussinesq equations. J. Waterway Port Coast and Ocean Engng, 121(5), 251-261.

Approximate Dynamic Programming for Generation of Robustly Stable Feedback Controllers Jakob Bj¨ ornberg1 and Moritz Diehl2 1

2

Center of Mathematical Sciences, University of Cambridge, Wilberforce Road, CB3 OWB Cambridge, United Kingdom Optimization in Engineering Center (OPTEC)/Electrical Engineering Department (ESAT), K.U.Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium [email protected]

Abstract In this paper, we present a technique for approximate robust dynamic programming that allows to generate feedback controllers with guaranteed stability, even for worst case disturbances. Our approach is closely related to robust variants of the Model Predictive Control (MPC), and is suitable for linearly constrained polytopic systems with piecewise affine cost functions. The approximation method uses polyhedral representations of the cost-to-go function and feasible set, and can considerably reduce the computational burden compared to recently proposed methods for exact dynamic programming for robust MPC [1, 8]. In this paper, we derive novel conditions for guaranteeing closed loop stability that are based on the concept of a “uroborus”. We finish by applying the method to a state constrained tutorial example, a parking car with uncertain mass.

1 Introduction The optimization based feedback control technique of Model Predictive Control (MPC) has attracted much attention in the last two decades and is nowadays widespread in industry, with many thousand large scale applications reported, in particular in the process industries [19]. Its idea is, simply speaking, to use a model of a real plant to predict and optimize its future behaviour on a so called prediction horizon, in order to obtain an optimal plan of future control actions. Of this plan, only the first step is realized at the real plant for one sampling time, and afterwards the real system state – which might be different than predicted – is observed again, and a new prediction and optimization is performed to generate the next sampling time’s feedback control. So far, in nearly all MPC applications, a deterministic – or

70

J. Bj¨ ornberg and M. Diehl

nominal – model is used for prediction and optimization. Though the reason for repeated online optimization and feedback is exactly the non-deterministic nature of the process, this “nominal MPC” approach is nevertheless useful in practice due to its inherent robustness properties [9, 10]. In contrast to this, robust MPC, originally proposed by Witsenhausen [21], is directly based on a worst-case optimization of future system behaviour. While a key assumption in nominal MPC is that the system is deterministic and known, in robust MPC the system is not assumed to be known exactly, and the optimization is performed against the worst-case predicted system behaviour. Robust MPC thus typically leads to min-max optimization problems, which either arise from an open loop, or from a closed loop formulation of the optimal control problem [14]. In this paper, we are concerned with the less conservative, but computationally more demanding closed loop formulation. We regard discrete-time polytopic systems with piecewise affine cost and linear constraints only. For this problem class, the closed loop formulation of robust MPC leads to multi-stage min-max optimization problems that can be attacked by a scenario tree formulation [12, 20] or by robust dynamic programming (DP) approaches [1, 8, 17] (an interesting other approach to robust MPC is based on “tubes”, see e.g. [13]). Note that the scenario tree formulation treats a single optimization problem for one initial state only, whereas DP produces the feedback solution for all possible initial states. Unfortunately, the computational burden of both approaches quickly becomes prohibitive even for small scale systems as the size of the prediction horizon increases. A recently developed approximation technique for robust DP [4] considerably reduces the computational burden compared to the exact method. This approximation is at the expense of optimality, but still allows to generate robustly stable feedback laws that respect control and state constraints under all circumstances. The first aim of the article is to review this technique in a detailed, self contained fashion. In addition, in this paper, we considerably weaken the previous requirements to obtain robust stability, by a novel approach based on the concept of a “uroborus”. We first show, in Section 2, how the robust DP recursion can be compactly formulated entirely in terms of operations on sets. For the problem class we consider, these sets are polyhedra and can be explicitly computed [1, 8], as reviewed in Section 3. In Section 4 we generalize an approximation technique originally proposed in [15] (for deterministic DP). This allows us to approximate the result of the robust dynamic programming recursion with considerably fewer facets than in the exact approach. In Section 5 we give conditions that guarantee robust closed loop stability of the generated feedback law, entirely in terms of the polyhedral set representation, and introduce the idea of a “uroborus”. Finally, in Section 6 we illustrate with an example how the approximation approach can be used to synthesize a robustly stable feedback, and we conclude the paper in Section 7. Implementations of all algorithms presented in this paper are publicly available [6].

Approximate Dynamic Programming

71

2 Robust Dynamic Programming with Constraints We consider discrete-time dynamic systems xk+1 = fk (xk , uk ),

fk ∈ F,

k∈N

(1)

with states xk ∈ Rnx and controls uk ∈ Rnu . The transition functions fk in each step are uncertain, but we assume they are known to be in a certain set F. Given an N -stage policy π = (u0 (·), u1 (·), . . . , uN −1 (·)), and given an uncertainty realization φ = (f0 , f1 , . . . , fN −1 ) ∈ FN , as well as an initial 0 , k = 0, . . . , N , by value x0 , we define the corresponding state sequence xπ,φ,x k π,φ,x0 π,φ,x0 π,φ,x0 π,φ,x0 x0 )). To each admissible event , uk (xk = x0 and xk+1 = fk (xk (xk , uk ), i.e. state xk and control uk at time k, a cost Lk (xk , uk ) is associated, which is additive over time. The objective in robust dynamic programming is to find a feedback policy π ∗ that minimizes the worst case total cost, i.e. solves the min-max optimization problem min max π

φ∈ FN

N −1 

0 0 0 ) )) + VN (xπ,φ,x , uk (xπ,φ,x Lk (xπ,φ,x N k k

(2)

k=0

0 0 )) ∈ Lk , , uk (xπ,φ,x (xπ,φ,x k k

subject to:

0 xπ,φ,x ∈ XN , N

∀φ ∈ FN , k ∈ {0, . . . , N −1}. The sets Lk and XN specify constraints on states and controls, and VN (·) is a final cost. Starting with VN and XN , we compute the optimal cost-to-go functions Vk and feasible sets Xk recursively, for k = N − 1, . . . , 1, by the robust Bellman equation with constraints (cf. [17]): V˜k (x, u) Vk (x) := min n u∈R

u

s.t.

˜ k, (x, u) ∈ X

and

˜ k }, Xk := {x | ∃u : (x, u) ∈ X

(3) (4)

˜ k := {(x, u) ∈ Lk | where V˜k (x, u) := Lk (x, u)+ maxf ∈F Vk+1 (f (x, u)) and X f (x, u) ∈ Xk+1 ∀f ∈ F }. The optimal feedback control uk (x) from the optimal policy π ∗ can be determined as the minimizer of (3). 2.1 Formulation in Terms of Epigraphs Given a set W and a function g : W → R, define the epigraph of g by epi(g) := {(w, s) ∈ W × R | w ∈ W, s ≥ g(w)}.

(5)

Given two subsets A and B of W × R, define their cut-sum A  B by A  B := {(x, s + t) | (x, s) ∈ A, (x, t) ∈ B}.

(6)

72

J. Bj¨ ornberg and M. Diehl

If W ⊆ X × U and f is any function W → X, define an “epigraph function” fE : W × R → X × R by fE (x, u, s) = (f (x, u), s). We think of X as a state space and U as a space of controls. If we have a dynamic programming recursion with stage constraints defined by the sets Lk and stage costs Lk : Lk → R, let ek = epi(Lk ). Suppose the final cost and constraint set are VN and XN , and let EN = epi(VN ). Similarly define Ek = epi(Vk ) for any 0 ≤ k ≤ N − 1. Here we regard Vk as a function defined on Xk only. We now define the operation Tk on the set Ek+1 as follows: ⎛ ⎞  Tk (Ek+1 ) := p ⎝ek  (7) fE−1 (Ek+1 )⎠ , fE ∈ F E

˜ := {(x, s) | ∃u : where p : X × U × R → X × R denotes projection, p(E) ˜ (x, u, s) ∈ E}, and FE := {fE | f ∈ F}.

Proposition 1. Ek = Tk (Ek+1 ) for k = 0, . . . , N − 1. Proof. By definition of Ek = epi(Vk ) and by (3) ˜ k , s ≥ V˜k (x, u)}. Ek = {(x, s) | ∃u : (x, u) ∈ X

(8)

−1 We have that (x, u, s) ∈ fE ∈FE fE (Ek+1 ) iff f (x, u) ∈ Xk+1 for all ˜ k := ek  f , and s ≥ maxf ∈F Vk+1 (f (x, u)). Furthermore, (x, u, s) ∈ E −1 f (E ) iff, in addition, (x, u) ∈ L and s ≥ Lk (x, u) + k+1 k E fE ∈FE maxf ∈F Vk+1 (f (x, u)), in other words iff (x, u, s) is in the epigraph epi(V˜k ) ˜ k . Therefore E ˜ k = epi(V˜k ). Hence Tk (Ek+1 ) = p(E ˜k) = of V˜k restricted to X p(epi(V˜k )) which is the same as the expression (8) for Ek .

In view of Proposition 1, we call Tk the robust dynamic programming operator. We will also denote this by T , suppressing the subscript, whenever the time-dependency is unimportant. Using Proposition 1 we easily deduce the following monotonicity property, motivated by a similar property in dynamic programming [3]. Proposition 2. If E′ ⊆ E then T (E′ ) ⊆ T (E). Proof. Referring to (7), we see that for any fE ∈ FE we have fE−1 (E′ ) ⊆ fE−1 (E). Thus, letting P ′ = fE−1 (E′ ) and P = fE−1 (E), we have P ′ ⊆ P . Now suppose (x, s) ∈ p(e  P ′ ), i.e. there exists a u such that (x, u, s) ∈ e  P ′ . Thus there exist s1 , s2 with s = s1 + s2 and (x, u, s1 ) ∈ e and (x, u, s2 ) ∈ P ′ . But then (x, u, s2 ) ∈ P , so (x, u, s) = (x, u, s1 + s2 ) ∈ e  P . Thus (x, s) ∈ p(e  P ). Hence, T (E′ ) = p(e  P ′ ) ⊆ p(e  P ) = T (E).

(9)

Approximate Dynamic Programming

73

3 Polyhedral Dynamic Programming

From now on we consider only affine systems with polytopic uncertainty, of the form

f(x, u) = Ax + Bu + c.   (10)

Here the matrices A and B and the vector c are contained in a polytope

F = conv{(A_1|B_1|c_1), . . . , (A_{n_f}|B_{n_f}|c_{n_f})},   (11)

and we identify each matrix (A|B|c) ∈ F with the corresponding function f. Polytopic uncertainty may arise naturally in application problems (see the example in this paper) or may be used to approximate nonlinear systems. We consider convex piecewise affine (CPWA) cost functions

L(x, u) = max( P̆ [x; 1] + Q̆ u ),    V_N(x) = max( P_N [x; 1] ),   (12)

where the maximum is taken over the components of a vector, and P̆, Q̆, and P_N are matrices of appropriate dimensions. For simplicity of notation, we assume here and in the following that the stage costs L_k and feasible sets 𝕃_k in (2) do not depend on the stage index k. We treat linear constraints that result in polyhedral feasible sets

𝕃 = { (x, u) | P̌ [x; 1] + Q̌ u ≤ 0 },    X_N = { x | P̂_N [x; 1] ≤ 0 }.

Definition 1. By polyhedral dynamic programming we denote robust dynamic programming (3)–(4) for affine systems f with polytopic uncertainty (11), CPWA cost functions L and V_N, and polyhedral constraint sets 𝕃 and X_N.
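For concreteness, evaluating a CPWA cost of the form (12) is just a row-wise maximum; a minimal sketch of ours, assuming numpy and the matrices P̆, Q̆, P_N as inputs:

```python
import numpy as np

def cpwa_cost(P_breve, Q_breve, x, u):
    """Stage cost (12): L(x, u) = max over rows of  P_breve @ [x, 1] + Q_breve @ u."""
    x1 = np.append(np.asarray(x), 1.0)
    return float(np.max(P_breve @ x1 + Q_breve @ np.asarray(u)))

def terminal_cost(P_N, x):
    """Terminal cost (12): V_N(x) = max over rows of  P_N @ [x, 1]."""
    return float(np.max(P_N @ np.append(np.asarray(x), 1.0)))
```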

The point is that for polytopic F, CPWA L, V_N, and polyhedral sets 𝕃, X_N, all cost-to-go functions V_k are also CPWA, and the feasible sets X_k are polyhedral. This is proved in [1] and [8], the latter of which uses epigraphs. We give an alternative formulation here, using the ideas and notation from Section 2.1.

Theorem 1. For T a polyhedral dynamic programming operator and E a polyhedron, also T(E) as defined in (7) is a polyhedron.

Proof. We begin by proving that ∩_{f_E ∈ F_E} f_E^{-1}(E) is a polyhedron. F_E is given as the convex hull of the matrices D_1, . . . , D_{n_f}, with

D_i = ( A_i  B_i  0  c_i ;  0  0  1  0 ).

Assume that E consists of all points y satisfying Qy ≤ q. Thus (x, u, s) ∈ f_E^{-1}(E) iff Q D (x, u, s, 1)^T ≤ q (where f_E is represented by D ∈ F_E). It follows that

∩_{f_E ∈ F_E} f_E^{-1}(E) = ∩_{t_i ≥ 0, Σ_{i=1}^{n_f} t_i = 1} { (x, u, s) | Q (Σ_{i=1}^{n_f} t_i D_i) (x, u, s, 1)^T ≤ q } = ∩_{i=1}^{n_f} { (x, u, s) | Q D_i (x, u, s, 1)^T ≤ q },

which as the finite intersection of polyhedra is again a polyhedron. In addition, the following trivial lemma holds [5].
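A small numpy sketch of this construction (our own illustration, not code from the paper): it assembles the matrices D_i for the vertices of F and stacks the inequalities Q D_i (x, u, s, 1)^T ≤ q into one H-representation of the intersection.

```python
import numpy as np

def epigraph_map_matrix(A, B, c):
    """D such that f_E(x, u, s) = D @ [x, u, s, 1] for f(x, u) = A x + B u + c."""
    nx, nu = A.shape[0], B.shape[1]
    top = np.hstack([A, B, np.zeros((nx, 1)), c.reshape(-1, 1)])
    bot = np.hstack([np.zeros((1, A.shape[1] + nu)), np.ones((1, 1)), np.zeros((1, 1))])
    return np.vstack([top, bot])

def preimage_intersection(Q, q, vertices):
    """Stack Q @ D_i @ [x, u, s, 1] <= q over all polytope vertices (A_i, B_i, c_i).

    Returns (G, h) with the intersection = {(x, u, s) : G @ [x, u, s] <= h}.
    """
    G_rows, h_rows = [], []
    for (A, B, c) in vertices:
        QD = Q @ epigraph_map_matrix(A, B, c)
        G_rows.append(QD[:, :-1])        # coefficients of (x, u, s)
        h_rows.append(q - QD[:, -1])     # constant column moves to the right-hand side
    return np.vstack(G_rows), np.concatenate(h_rows)
```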


Lemma 1. The cut-sum of two polyhedra is a polyhedron.

Therefore, also the cut-sum Ẽ = e ⊞ ∩_{f_E ∈ F_E} f_E^{-1}(E) is a polyhedron.

Corollary 1. If this Ẽ is a polyhedron, also p(Ẽ) is a polyhedron.

This corollary, which is a direct consequence of the constructive Theorem 2 below, completes the proof of Theorem 1.

Theorem 2 ([8]). Assume Ẽ is a nonempty polyhedron represented as

Ẽ = { (x, u, s) | ( P̃  Q̃ ; P̄  Q̄ ) [x; 1; u] ≤ [1; 0] s },   (13)

which is bounded below in the s-dimension. Then

p(Ẽ) = {(x, s) | ∃u : (x, u, s) ∈ Ẽ} = { (x, s) | ( P ; P̂ ) [x; 1] ≤ [1; 0] s }   (14)

with

P = ( D̃  D̄ ) ( P̃ ; P̄ )   and   P̂ = R̄ P̄.   (15)

The row vectors of ( D̃  D̄ ) are the vertices of the polyhedron

Λ = { (λ̃; λ̄) | λ̃ ≥ 0, λ̄ ≥ 0, Q̃^T λ̃ + Q̄^T λ̄ = 0, 1^T λ̃ = 1 }   (16)

and the row vectors of ( 0  R̄ ) span the extreme rays of Λ.

Proof. We see that (x, s) ∈ p(Ẽ) iff the linear program

min_{(u, s′)} s′   s.t.   (x, u, s′) ∈ Ẽ   (17)

has a finite solution y and s ≥ y. The dual of (17) is

max_{λ ∈ Λ} λ^T p_x,   (18)

where

p_x = ( P̃ ; P̄ ) [x; 1].   (19)

By assumption (17) is solvable, so the dual (18) is feasible and the set Λ is therefore nonempty. Denoting the vertices and extreme ray vectors of Λ by d_i and r_j respectively, we may write

Λ = conv{d_1, . . . , d_{n_d}} + nonneg{r_1, . . . , r_{n_r}}.   (20)

There are two cases, depending on x:


1. If the primal (17) is feasible, then by strong duality the primal (17) and the dual (18) have equal and finite solutions. In particular max_{λ∈Λ} λ^T p_x < ∞. This implies that there is no ray vector r of Λ with r^T p_x > 0. Therefore, the maximum of the dual (18) is attained in a vertex d_i, so that its optimal value is max_{i=1,...,n_d} d_i^T p_x.
2. If the primal (17) is infeasible, then the dual must be unbounded above, since it is feasible. So there is a ray r ∈ nonneg{r_1, . . . , r_{n_r}} with r^T p_x > 0, and hence there must also be an extreme ray r_j with r_j^T p_x > 0.

This shows that (x, s) ∈ p(Ẽ) iff s ≥ d_i^T p_x for all vertices d_i and r_j^T p_x ≤ 0 for all extreme rays r_j. The proof is completed by collecting all vertices d_i^T = (d̃_i^T, d̄_i^T) as rows in the matrix (D̃ D̄), the extreme rays r_j^T = (r̃_j^T, r̄_j^T) as rows in the matrix (R̃ R̄), and showing that r̃_j = 0 for all j. For any λ ∈ Λ we have that λ + r_j ∈ Λ, so that 1^T(λ̃ + r̃_j) = 1. Thus 1^T r̃_j = 1 − 1^T λ̃ = 1 − 1 = 0. As r̃_j ≥ 0 we see that r̃_j = 0.

The possibility to represent the result of the robust DP recursion algebraically can be used to obtain an algorithm for polyhedral dynamic programming. Such an algorithm was first presented by Bemporad et al. [1], where the explicit solution of multi-parametric programming problems is used. The representation via convex epigraphs and the duality-based construction at the base of Theorem 2 above was first presented in [8]. Polyhedral dynamic programming is exact and does not require any tabulation of the state space. Thus it avoids Bellman's "curse of dimensionality". However, the number of facets required to represent the epigraphs E_k will in general grow exponentially. Even for simple examples, the computational burden quickly becomes prohibitive as the horizon N grows.
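Before moving on, here is a small illustration of ours (assuming scipy is available and that Ẽ is bounded below in s, as in Theorem 2): by the proof above, membership of (x, s) in p(Ẽ) can be checked by solving the primal LP (17) for the fixed x and comparing s with its optimal value.

```python
import numpy as np
from scipy.optimize import linprog

def in_projection(x, s, G, h, n_u, tol=1e-9):
    """Check (x, s) ∈ p(Ẽ) for Ẽ = {(x, u, s'): G @ [x, u, s'] <= h}.

    Solves LP (17): minimize s' over (u, s') with x fixed; (x, s) lies in the
    projection iff the LP is feasible and s >= optimal value.
    """
    n_x = len(x)
    A_ub = G[:, n_x:]                      # columns belonging to (u, s')
    b_ub = h - G[:, :n_x] @ np.asarray(x)  # fixed x goes to the right-hand side
    c = np.zeros(n_u + 1)
    c[-1] = 1.0                            # objective: minimize s'
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n_u + 1), method="highs")
    return res.status == 0 and s >= res.fun - tol
```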

4 Approximate Robust Dynamic Programming

We now review an approximation technique for polyhedral dynamic programming that was first presented in [4], and which allows one to considerably reduce the computational burden compared to the exact method. When used properly, the method is able to preserve robust stability properties in MPC applications. This approximation technique is also the motivation for the new stability proofs centered around the "uroborus" presented in Section 5. We will first show how to generate a polyhedron that is "in between" two polyhedra V ⊆ W, where V and W can be thought of as two alternative epigraphs generated during one dynamic programming step. For this aim we suppose

V = { x | v_i^T [x; 1] ≤ 0, i ∈ I_V },    W = { x | w_j^T [x; 1] ≤ 0, j ∈ I_W }.

We describe how to generate another polyhedron

A = { x ∈ R^n | v_i^T [x; 1] ≤ 0, ∀i ∈ I_A ⊆ I_V },


which satisfies V ⊆ A ⊆ W, and which uses some of the inequalities from V. In general A will be represented by fewer inequalities than V, and shall later be used for the next dynamic programming recursion step. It is exactly this data reduction that makes the approximation method so much more powerful than the exact method. We show how to generate the index set I_A iteratively, using intermediate sets I_A^{(k)}. The method is a generalization of an idea from [15] to polyhedral sets. At each step, let A^{(k)} be given by

A^{(k)} = { x ∈ R^n | v_i^T [x; 1] ≤ 0, ∀i ∈ I_A^{(k)} },

where the index set I_A^{(k)} is a subset of I_V, the index set of the inner polyhedron V.

Procedure–Polyhedron Pruning

1. Let k = 0 and I_A^{(0)} := ∅ (i.e. A^{(0)} = R^n).
2. Pick some j ∈ I_W.
   a) If there exists an x* ∈ A^{(k)} such that w_j^T [x*; 1] > 0, then let i* = arg max_{i ∈ I_V} v_i^T [x*; 1]. Set I_A^{(k+1)} := I_A^{(k)} ∪ {i*}, and remove i* from I_V.
   b) If no such x* exists, set I_A^{(k+1)} := I_A^{(k)} and remove j from I_W.
3. If I_W = ∅ then let I_A := I_A^{(k)} and end. Otherwise set k := k + 1 and go to 2.

The idea of polyhedron pruning is illustrated in Figure 1; a small code sketch of the procedure follows Figure 2 below. In step (2) of the algorithm, for given j ∈ I_W we have to find an x* ∈ A^{(k)} such that w_j^T [x*; 1] > 0, or make sure that no such x* exists. We address this task using the following linear program:

max_x  w_j^T [x; 1]   s.t.   w_j^T [x; 1] ≤ η,   v_i^T [x; 1] ≤ 0  ∀i ∈ I_A^{(k)},   (21)

for some η > 0 (in our implementation we used η = 0.05). If the optimal value of problem (21) is negative or zero, we know that no such x* exists. Otherwise we take its optimizer to be x*. In our algorithm, we also perform a special scaling of the inequalities, which leads to the following assumption:

Assumption 1. Each vector v_i = (v̄_i, ξ_i) ∈ R^n × R defining V is normed so that ‖v̄_i‖_2 = 1.

We require Assumption 1 for the following reason. The quantity

v_i^T [x*; 1]   (22)

in step (2) of the pruning procedure measures the vertical "height" of the hyperplane described by v_i. If Assumption 1 holds, then by maximizing (22) we find the piece of V that is furthest away from x*, because then the height


Figure 1. Polyhedron pruning. Here x* is in A^{(k)} but violates a constraint defining W, so we form A^{(k+1)} by including another constraint from V. The inequality represented by v_{i*} is chosen rather than the one represented by v_i, as the distance from v_{i*} to x* is greater (note that inequalities are hatched on the infeasible side)

above x* and the distance from x* coincide. Note that we employ a different selection rule from [15]: instead of adding the constraint v_{i*} that is furthest away from x* in a predetermined direction, we choose it so that it maximizes the perpendicular distance to x*. Summarizing, we use the described pruning method to perform approximate polyhedral dynamic programming, by executing the steps of the following algorithm.

Procedure–Approximate Polyhedral DP

1. Given the epigraph E_k, the stage cost L and feasible set 𝕃, let E_out = T(E_k) as in Section 2.1.
2. Choose some polyhedron E_in that satisfies E_in ⊆ E_out.
3. Let E_{k-1} be the result (A) of applying the polyhedral pruning procedure with V = E_in and W = E_out.

4.1 How to choose E_in?

There are many ways of choosing the set E_in, and the choice is largely heuristic. Recall that E_out = T(E_k) is the epigraph of the cost-to-go V_{k-1} restricted to the feasible set X_{k-1}. To get a polyhedron contained in E_out you could raise or steepen the cost-to-go V_{k-1}, or you could diminish the size of the feasible set X_{k-1}, or both. We are mainly interested in the case where x = 0


is always feasible and the minimum value of the cost-to-go V_k is attained at zero for all k. Then you can let

E_in := {(x, s) ∈ R^n × R | (αx, s) ∈ E_out}   (23)

with some α > 1. This is illustrated in Figure 2. It is difficult to assess the loss of optimality due to this approximation, as not only the cost function is made steeper, but also the feasible set is reduced. However, with the above definition, E_in is a factor 1/α smaller than E_out. For example, if we choose α = 1.1 in the tutorial example below, we lose at most roughly 10% of the volume in each iteration.

Figure 2. Pruning of epigraphs (sketch)
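As promised above, here is a minimal Python sketch of the pruning step (our own illustration, assuming scipy; names such as `prune_polyhedron` are hypothetical). V_rows and W_rows are numpy arrays whose rows are the vectors v_i^T and w_j^T acting on [x; 1], with V normed as in Assumption 1; the search for a violating x* uses the LP (21).

```python
import numpy as np
from scipy.optimize import linprog

def find_violator(w_j, A_rows, eta=0.05):
    """LP (21): maximize w_j^T [x;1] over A^(k) intersected with {w_j^T [x;1] <= eta}.
    A_rows (shape m x (n+1)) holds the rows v_i^T currently defining A^(k).
    Returns some x* with w_j^T [x*;1] > 0, or None if no such point exists."""
    n = len(w_j) - 1
    A_ub = np.vstack([w_j[:n].reshape(1, -1), A_rows[:, :n]])
    b_ub = np.concatenate([[eta - w_j[n]], -A_rows[:, n]])
    res = linprog(-w_j[:n], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n, method="highs")
    if res.status != 0:
        return None
    return res.x if w_j[:n] @ res.x + w_j[n] > 0 else None

def prune_polyhedron(V_rows, W_rows, eta=0.05):
    """Polyhedron pruning: select inequalities of V until the selected
    polyhedron A satisfies A ⊆ W, hence V ⊆ A ⊆ W.
    Returns the index set I_A of the selected rows of V."""
    I_V = list(range(len(V_rows)))     # inequalities of V still available
    I_A = []                           # inequalities already selected for A
    I_W = list(range(len(W_rows)))     # inequalities of W not yet verified
    while I_W:
        j = I_W[0]
        x_star = find_violator(W_rows[j], V_rows[I_A], eta)
        if x_star is None:
            I_W.pop(0)                 # step 2b: constraint j of W already holds on A
        else:                          # step 2a: add the piece of V furthest above x*
            heights = [V_rows[i][:-1] @ x_star + V_rows[i][-1] for i in I_V]
            i_star = I_V[int(np.argmax(heights))]
            I_A.append(i_star)
            I_V.remove(i_star)
    return I_A
```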

4.2 The Approximated Robust MPC Feedback Law

For a given polyhedral epigraph E we can define an approximate robustly optimal feedback law u(·) by letting u(x) be the solution of a linear programming problem (LP),

u(x) := arg min_{u,s} s   s.t.   (x, u, s) ∈ Ẽ,   (24)

with Ẽ = e ⊞ ∩_{f_E ∈ F_E} f_E^{-1}(E) as in Theorem 1. The computational burden associated with this LP in only n_u + 1 variables is negligible and can easily be performed online. Alternatively, a precomputation via multiparametric linear programming is possible [1].
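The feedback LP (24) is indeed tiny; the following sketch of ours (assuming scipy and that the H-representation of Ẽ is available as G [x; u; s] ≤ h) recovers u(x).

```python
import numpy as np
from scipy.optimize import linprog

def mpc_feedback(x, G, h, n_u):
    """Approximate robust MPC feedback law, LP (24):
    minimize s over (u, s) subject to G @ [x, u, s] <= h with x fixed."""
    n_x = len(x)
    A_ub = G[:, n_x:]                          # columns belonging to (u, s)
    b_ub = h - G[:, :n_x] @ np.asarray(x)      # fixed state goes to the right-hand side
    c = np.zeros(n_u + 1)
    c[-1] = 1.0                                # cost: the epigraph variable s
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n_u + 1), method="highs")
    if res.status != 0:
        raise ValueError("x lies outside the feasible set encoded in E-tilde")
    return res.x[:n_u]                         # the control u(x)
```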

5 Stable Epigraphs and the Uroborus

The question of stability is a major concern in MPC applications and has been addressed e.g. in [2, 12, 18]. In the notation of this paper, the question is whether the uncertain closed loop system

x_{k+1} = f_k(x_k, u(x_k)),   f_k ∈ F,   (25)

where u(x_k) is determined as the solution of (24), is stable under all possible realizations of f_k ∈ F. In the case of uncertain systems, it might not be possible to be attracted by a fixed point, but instead by an attractor, a set that we denote T. In this case, the cost function L would penalize all states that are outside T in order to drive the system towards T. The following theorem formulates easily verifiable conditions that guarantee that a given set T is robustly asymptotically attractive for the closed-loop system, in the sense defined e.g. in [12].

Theorem 3 (Attractive Set for Robust MPC [4]). Consider the closed loop system (25), and assume that

1. there is a non-empty set T ⊂ R^n and an ε > 0 such that L(x, u) ≥ ε · d(x, T) for all (x, u) ∈ 𝕃, where d(x, T) := inf_{y∈T} ‖x − y‖ is the distance of x from T,
2. E ⊆ T(E),
3. there exists an s_0 ∈ R with (x_0, s_0) ∈ T(E), such that A := {(x, s) ∈ T(E) | s ≤ s_0} is compact.

Then the closed loop is robustly asymptotically attracted by the set T, i.e.,

lim_{k→∞} d(x_k, T) = 0,   for all system realizations (f_k) with f_k ∈ F.

The simple proof uses V(x) := min_s s s.t. (x, s) ∈ T(E) as a Lyapunov function and is omitted for the sake of brevity. While conditions 1 and 3 are technical conditions that can easily be met by suitably setting up the MPC cost function and constraints, the crucial condition 2 on the epigraph E – which guarantees that V(x) is a robust Lyapunov function – is much more difficult to satisfy.

Definition 2. We call a set E a stable epigraph iff E ⊆ T(E).

Unfortunately, it is not straightforward to generate a stable epigraph in practice. One might think of performing approximate dynamic programming for a while and checking at each iteration whether the most recently generated epigraph is stable. Unfortunately, there is no monotonicity in approximate dynamic programming, and this procedure need not yield a result at any iteration. Note that if we would start the exact dynamic programming procedure with a stable epigraph E_N, each iterate T^k(E_N) would also yield a stable epigraph.

Theorem 4. If E_N is stable, then E := T^k(E_N) is stable for all k ≥ 0.

Proof. Using the monotonicity of T, we obtain E_N ⊆ T(E_N) ⇒ T^k(E_N) ⊆ T^k(T(E_N)) ⇔ E ⊆ T(E).


The assumption that E_N is stable (or an equivalent assumption) is nearly always made in existing stability proofs for MPC [7, 12, 16], but we point out that (i) it is very difficult to find such a positively invariant terminal epigraph E_N in practice and (ii) the exact robust dynamic programming procedure is prohibitive for nontrivial horizon lengths N. These are the reasons why we avoid any assumption on E_N at all and directly address stability of the epigraph E that is finally used to generate the feedback law (24). A similar approach is taken in [11]. But the crucial question of how to generate a stable epigraph remains open.

5.1 The Novel Concept of a Uroborus

Fortunately, we can obtain a stable epigraph with considerably weaker assumptions than usually made.

Definition 3. Let the collection E_1, E_2, . . . , E_N satisfy

E_k ⊆ T(E_{k+1}),   k = N − 1, . . . , 1,   (26)

and

E_N ⊆ T(E_1).   (27)

Then we call E_1, E_2, . . . , E_N a uroborus. The name "uroborus" is motivated from mythology, where it denotes a snake that eats its own tail, as visualized in Figure 3. A uroborus could be generated by approximate dynamic programming, as follows. Starting with some epigraph E_N, we can perform the conservative version of approximate dynamic programming to generate sets E_{N−1}, . . . , E_1, E_0, which ensures E_k ⊆ T(E_{k+1}) by construction, i.e. the first N − 1 inclusions (26) are already satisfied. Now, if also the inclusion E_N ⊆ E_0 holds, i.e., if the head E_0 "eats" the tail E_N, then also (27) holds because of E_0 ⊆ T(E_1), and the generated sequence of sets E_{N−1}, . . . , E_1 is a uroborus. But why might E_N ⊆ E_0 ever happen? It is reasonable to expect the sets E_k to grow on average with diminishing index k if the initial set E_N was chosen sufficiently small (a detailed analysis of this hope, however, is involved). So, if N is large enough, we would expect the condition E_N ⊆ E_0 to hold, i.e., the final set E_0 to be large enough to include E_N. It is important to note that our numbering of the epigraphs is contrary to their order of appearance during the dynamic programming backwards recursion, but we stick to this numbering here in order to avoid confusion. It should be kept in mind, however, that a uroborus need not necessarily be found in the whole collection E_N, . . . , E_0 generated during the conservative dynamic programming recursion by checking E_N ⊆ T(E_1); it suffices that we find E_k ⊆ T(E_{k−N+1}) for any two integers k and N. Note also that a uroborus with N = 1 is simply a stable epigraph.


Figure 3. The uroborus in mythology is a snake that eats its own tail (picture taken from http://www.uboeschenstein.ch/texte/marks-tarlow3.html)

Theorem 5. Let E_1, . . . , E_N be a uroborus for T, and let K := ∪_{k=1}^N E_k be their union. Then K is a stable epigraph.

Proof. Using the monotonicity of T, we have from (26) the following inclusions: E_{N−1} ⊆ T(E_N) ⊆ T(K) down to E_1 ⊆ T(E_2) ⊆ T(K) and finally, from (27), E_N ⊆ T(E_1) ⊆ T(K). Hence K = ∪_{k=1}^N E_k ⊆ T(K).

Definition 4. We call a robust dynamic programming operator T convex iff T (E) is convex whenever E is convex. Corollary 2. If E1 , . . . , EN is a uroborus and T is convex, then the convex hull E := conv{E1 , . . . , EN } is also stable. Proof. Again K ⊆ T (E), so E = conv(K) ⊆ T (E) by the convexity of T (E).

6 Stability of a Tutorial Example

In order to illustrate the introduced stability certificate, we consider a tutorial example first presented in [8], with results that have partly been


presented in [4]. The task is to park a car with uncertain mass in front of a wall as fast as possible, without colliding with the wall. The state x = (p, v) consists of position p and velocity v, the control u is the acceleration force, constant on intervals of length t = 1. We define the following discrete time dynamics:

x_{k+1} = ( 1  t ; 0  1 ) x_k + (t / 2m) ( t ; 2 ) u_k.   (28)

The mass m of the car is only known to satisfy 1 ≤ m ≤ 1.5, i.e. we have a polytopic system (10) with uncertain matrix B. We impose the constraints p ≤ 0 and |u| ≤ 1, and choose

L(x, u) = max(−p + v, −p − v),
X_N = { (p, v) | p ≤ 0, v ≤ 0, −p − v ≤ 0.3 },
V_N(x) = 100 · (−p − v).

First we computed the cost-to-go functions and feasible sets using the exact polyhedral dynamic programming method described in [8]. Computations took almost 4 hours for N = 7. The number of facets defining E_0 was 467. Then we repeated the computation using the approximate method of Section 4, where we chose E_in according to (23) with α = 1.1. Computations took about 8 minutes for N = 49; the robustly feasible sets X_N, X_{N−1}, . . . , X_0 are plotted in Figure 4. The number of facets defining E_0 was 105.
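A minimal numpy sketch of this uncertain model (our illustration; the two vertices of the uncertainty polytope correspond to the extreme masses m = 1 and m = 1.5, and the intermediate masses lie on the segment between them):

```python
import numpy as np

def car_vertex(m, t=1.0):
    """Vertex (A, B, c) of the polytope F in (11) for the car dynamics (28)."""
    A = np.array([[1.0, t],
                  [0.0, 1.0]])
    B = (t / (2.0 * m)) * np.array([[t],
                                    [2.0]])
    c = np.zeros((2, 1))
    return A, B, c

vertices = [car_vertex(m) for m in (1.0, 1.5)]   # polytopic uncertainty in B

def step(x, u, m):
    """One simulation step for a fixed control u and a mass m in [1, 1.5]."""
    A, B, c = car_vertex(m)
    return A @ x + B.flatten() * u + c.flatten()

x_next = step(np.array([-10.0, 2.0]), u=-1.0, m=1.3)   # position -10, speed 2
```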


Figure 4. Large: Feasible sets for N = 49 using the approximate procedure with α = 1.1. The largest set is the feasible set corresponding to the result T (E0 ) of exact dynamic programming applied to the final epigraph E0 (which corresponds to the second largest set). Small: Feasible sets for N = 7 using the exact method. The figures have the same scale


6.1 Stability by a Trivial Uroborus

As can already be seen in Figure 4, the epigraph E_0 is contained in its image under the exact dynamic programming operator, E_0 ⊆ T(E_0), i.e., E_0 is stable, or, equivalently, E_0 forms a uroborus with only one element. We checked this inclusion by finding the vertices of E_0 and checking that they indeed satisfy the inequalities that define T(E_0) (within a tolerance of 10^{-6}). Therefore, we can simply set E := E_0 in the definition (24) of the robust MPC feedback law u(·) to control the parking car. To satisfy all conditions of the stability guarantee, Theorem 3, we first set T = {0}. By construction, conditions 1 and 3 are met whenever x_0 ∈ X_0. Together with stability of E, the approximate robust MPC leads the closed loop robustly towards the origin for all initial states x_0 in the set X_0, which is much larger than what could be obtained by the exact procedure in a reasonable computing time.
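The stability check just described amounts to a simple vertex test; a sketch of ours, under the assumption that the vertices of E_0 and the H-representation of T(E_0) have already been computed:

```python
import numpy as np

def vertices_satisfy(vertices_E, G_TE, h_TE, tol=1e-6):
    """Vertex test used for the inclusion E ⊆ T(E): every vertex of E
    (rows of vertices_E) must satisfy G_TE @ y <= h_TE, the inequalities
    describing T(E).  (For a complete inclusion certificate the recession
    directions of E would have to be checked against T(E) as well.)"""
    residual = vertices_E @ G_TE.T - h_TE      # (num_vertices, num_inequalities)
    return bool(np.all(residual <= tol))
```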


Figure 5. Asymptotic stability of the closed loop resulting from the min-max MPC feedback u(·), for three trajectories with randomly generated masses in each time step

7 Conclusions

We have reviewed a method for approximate robust dynamic programming that can be applied to polytopic systems with piecewise affine cost and linear constraints, and have given novel conditions to guarantee robust stability of


approximate MPC. The underlying dynamic programming technique uses a dual approach and represents the cost-to-go functions and feasible sets at each stage in one single polyhedron E_k. A generalization of the approximation technique proposed in [15] to polyhedral sets allows us to represent these polyhedra approximately. Based on the novel concept of a uroborus, we presented a way to generate a robust MPC controller with the approximate dynamic programming recursion that is guaranteed to be stable. The ideas are demonstrated in a tutorial example, a parking car with uncertain mass. It is shown that our heuristic approximation approach is indeed able to generate the positively invariant set required for this stability certificate. Comparing the results with the exact robust dynamic programming method used in [8], we were able to demonstrate a significant reduction of the computational burden. The approximation algorithm, which is publicly available [6], is able to yield much larger positively invariant sets than the exact approach and thus considerably widens the range of applicability of robust MPC schemes with guaranteed stability. The novel concept of a uroborus makes the generation of robustly stable MPC controllers much easier than before and promises to fertilize both robust MPC theory and practice.

Acknowledgments

Financial support by the DFG under grant BO864/10-1, by the Research Council KUL: CoE EF/05/006 Optimization in Engineering Center (OPTEC), and by the Belgian Federal Science Policy Office: IUAP P6/04 (Dynamical systems, control and optimization, 2007-2011) is gratefully acknowledged. The authors also thank Sasa Rakovic for very fruitful discussions, in particular for pointing out the possibility to convexify the union of a uroborus, during an inspiring walk to Heidelberg castle.

References

1. A. Bemporad, F. Borrelli, and M. Morari. Min-max control of constrained uncertain discrete-time linear systems. IEEE Transactions on Automatic Control, 48(9):1600–1606, 2003.
2. A. Bemporad and M. Morari. Robust model predictive control: A survey. In A. Garulli, A. Tesi, and A. Vicino, editors, Robustness in Identification and Control, number 245 in Lecture Notes in Control and Information Sciences, 207–226, Springer-Verlag, 1999.
3. D. P. Bertsekas. Dynamic Programming and Optimal Control, volumes 1 and 2. Athena Scientific, Belmont, MA, 1995.
4. J. Björnberg and M. Diehl. Approximate robust dynamic programming and robustly stable MPC. Automatica, 42(5):777–782, May 2006.
5. J. Björnberg and M. Diehl. Approximate robust dynamic programming and robustly stable MPC. Technical Report 2004-41, SFB 359, University of Heidelberg, 2004.
6. J. Björnberg and M. Diehl. The software package RDP for robust dynamic programming. http://www.iwr.uni-heidelberg.de/~Moritz.Diehl/RDP/, January 2005.
7. H. Chen and F. Allgöwer. A quasi-infinite horizon nonlinear model predictive control scheme with guaranteed stability. Automatica, 34(10):1205–1218, 1998.
8. M. Diehl and J. Björnberg. Robust dynamic programming for min-max model predictive control of constrained uncertain systems. IEEE Transactions on Automatic Control, 49(12):2253–2257, December 2004.
9. R. Findeisen and F. Allgöwer. Robustness properties and output feedback of optimization based sampled-data open-loop feedback. In Proc. of the joint 44th IEEE Conf. Decision Contr., CDC'05/9th European Control Conference, ECC'05, 7756–7761, 2005.
10. R. Findeisen, L. Imsland, F. Allgöwer, and B.A. Foss. Towards a sampled-data theory for nonlinear model predictive control. In W. Kang, C. Borges, and M. Xiao, editors, New Trends in Nonlinear Dynamics and Control, volume 295 of Lecture Notes in Control and Information Sciences, 295–313, New York, 2003. Springer-Verlag.
11. L. Grüne and A. Rantzer. On the infinite horizon performance of receding horizon controllers. Technical report, University of Bayreuth, February 2006.
12. E. C. Kerrigan and J. M. Maciejowski. Feedback min-max model predictive control using a single linear program: Robust stability and the explicit solution. International Journal on Robust and Nonlinear Control, 14:395–413, 2004.
13. W. Langson, I. Chryssochoos, S.V. Rakovic, and D.Q. Mayne. Robust model predictive control using tubes. Automatica, 40(1):125–133, 2004.
14. J. H. Lee and Z. Yu. Worst-case formulations of model predictive control for systems with bounded parameters. Automatica, 33(5):763–781, 1997.
15. B. Lincoln and A. Rantzer. Suboptimal dynamic programming with error bounds. In Proceedings of the 41st Conference on Decision and Control, 2002.
16. D. Q. Mayne. Nonlinear model predictive control: Challenges and opportunities. In F. Allgöwer and A. Zheng, editors, Nonlinear Predictive Control, volume 26 of Progress in Systems Theory, 23–44, Basel Boston Berlin, 2000. Birkhäuser.
17. D. Q. Mayne. Control of constrained dynamic systems. European Journal of Control, 7:87–99, 2001.
18. D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert. Constrained model predictive control: stability and optimality. Automatica, 36(6):789–814, 2000.
19. S. J. Qin and T. A. Badgwell. A survey of industrial model predictive control technology. Control Engineering Practice, 11:733–764, 2003.
20. P. O. M. Scokaert and D. Q. Mayne. Min-max feedback model predictive control for constrained linear systems. IEEE Transactions on Automatic Control, 43:1136–1142, 1998.
21. H. S. Witsenhausen. A minimax control problem for sampled linear systems. IEEE Transactions on Automatic Control, 13(1):5–21, 1968.

Integer Programming Approaches to Access and Backbone IP Network Planning∗

Andreas Bley and Thorsten Koch

Konrad-Zuse-Zentrum für Informationstechnik Berlin, Takustr. 7, 14195 Berlin, Germany
{bley,koch}@zib.de

Abstract. In this article we study the problem of designing a nation-wide communication network. Such networks usually consist of an access layer, a backbone layer, and maybe several intermediate layers. The nodes of each layer must be connected to those of the next layer in a tree-like fashion. The backbone layer must satisfy survivability and IP-routing constraints. Given the node locations, the demands between them, the possible connections and hardware configurations, and various other technical and administrational constraints, the goal is to decide which node is assigned to which network level, how the nodes are connected, what hardware must be installed, and how traffic is routed in the backbone. Mixed integer linear programming models and solution methods are presented for both the access and the backbone network design problem. The focus is on the design of IP-over-SDH networks, but the access network design model and large parts of the backbone network design models are general and also applicable for other types of communication networks. Results obtained with these methods in the planning of the German research network are presented.

1 Introduction

The German gigabit research network G-WiN, operated by the DFN-Verein e.V., is the largest IP network in Germany. In this article we describe the mathematical models and tools used to plan the layout and dimensioning of the access and backbone network [BK00, BKW04]. Since these models are general in nature, they were used also in two other projects. One was the placement of switching centers in circuit switched networks, a cooperation



∗ This work was partially funded by the Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie (BMBF).


with Telekom Austria. In the other project, studies about MSC planning in mobile phone networks were conducted together with E-Plus. Unfortunately, the data from the later projects cannot be published. For this reason, we focus throughout this article on the G-WiN IP network as an example, but show the generalized models that were developed. The problem we were faced with, roughly can be stated as follows: Given the node locations, the demands between them, the possible node layers, connections, and hardware configurations, and various other technical and administrational constraints, the goal is to decide, which node is assigned to which network layer, how the nodes are connected, what hardware must be installed, and how traffic is routed in the backbone network. In this article we present a two-phase approach that splits between access and backbone network planning. In a first step, we only consider the access network. Then, in the second phase, the backbone dimensioning and routing problem is addressed. Both problems are solved by integer linear programming techniques. The problems encountered in the access network planning can be viewed as capacitated two-level facility location problems. There is a huge amount of literature on various aspects of this kind of problems. See for example [ALLQ96, Hal96, MD01, MF90, PLPL00]. The backbone network planning problem is a capacitated survivable network design problem with some additional constraints to make sure that the routing can be realized in practice with the OSPF routing protocol, which is the dominating routing protocol in the Internet. The survivable network design problem in general is well studied in the literature, see for example [AGW97, BCGT98, GMS95, KM05]. Lagrangian approaches for minimizing the maximum link utilization and for finding minimum cost network designs with respect to OSPF routing have been proposed in [LW93] and [Ble03]. Various heuristic algorithms for OSPF network design and routing optimization problem can be found in [BRT04, BRRT05, BGW98, ERP01, FGL+ 00, FT00, FT04]. Polyhedral approaches to the network design problem with shortest path routing for unicast and multicast traffic were presented in [BGLM03, HY04] and [Pry02]. In [BAG03] the combinatorial properties of shortest path routings were studied and a linear programming approach to compute routing weights for a given shortest path configuration was proposed. The computational complexity of several problems related to (unsplittable) OSPF routing is discussed in [Ble07a] and [Ble05]. This article is organized as follows: In Section 2 we describe the problem setting in more detail. In Section 3 we present the mathematical model for the access network planning problem. The same is done for the backbone network in Section 4. In Section 5 we describe the algorithms used and in the last section we report on computational results.


Figure 1. Network layers

2 Problem Description

The network design and routing problem studied in this article can be described as follows. Given are three node sets, a set U of demand nodes (locations), a set W of possible intermediate nodes, and a set V of possible backbone nodes (see Figure 1). The same location must belong to each set U, W, and V, if it may be assigned to each of the corresponding layers. In addition to the node sets U, W, and V, we have sets A_UW and A_WV with all potential connections between nodes from U and W, and W and V, respectively. For each pair of demand nodes u, v ∈ U the directed traffic demand is d_{u,v} ∈ R_+. Various hardware or technology can be installed or rented at the nodes or potential edges of the network. This hardware has a modular structure and combined hardware components must have matching interfaces. For example, a router provides several slots, these can be equipped with cards providing link-interfaces, and, in order to set up a certain link-type on an edge, corresponding link-interfaces are needed at the two terminal nodes. The set of all available component types is denoted by C. We distinguish between node components C_V, edge components C_E, and global components C_G, depending on whether they can be installed at nodes, on edges, or globally for the entire network. Components may be real, like router cards or leased lines, or artificial, like bundles of components that can be purchased only together. A component, if installed, provides or consumes several resources. The set of all considered resources is denoted by R. For each node component c ∈ C_V, resource r ∈ R and node v ∈ V the number k_v^{c,r} ∈ R denotes how much of resource r is provided (if > 0) or consumed (if < 0) by component c if installed at node v. Analogously, k_e^{c,r} and k_G^{c,r} specify the provision or consumption of edge and global components if installed. We distinguish between three classes of resources, node resources R_V, edge resources R_E, and global resources R_G. For each edge resource r and each edge e, the total consumption must not exceed the total provision of resource r by the components installed on edge e and its incident vertices. Analogously, the consumption must not exceed the provision by the components installed at a node and its incident edges for node resources or by the components installed on all edges, nodes, and


globally for global resources. The component cost and the routing capacity are special resources. Due to the forest structure of the solutions, for the access network planning problem it is often possible to simplify the possible hardware combinations to a set of relatively few configurations and reassign node costs to edges in the preprocessing. Also the directed traffic demands can be aggregated. Hence, we can assume that for the access network planning we are given a set of assembly stages Sn , for each node n from W or V , and that κrs n describes the capacity of resource r that will be provided at node n if it is at stage s. Regarding the traffic demands as a special resource, each location u ∈ U has an undirected demand of δur for each resource r. For each v node from W and V the assembly stage has to be chosen in such a way, that there is enough capacity to route the demands of all resources for all demand nodes that are connected to this node v. The backbone nodes and edges have to be dimensioned in such a way, that the accumulated demands can be routed according to the OSPF specification. Since in our approach the access network planning problem is solved first, the set of backbone nodes is given and fixed for the backbone network planning problem. For notational convenience, it will again be denoted by V . The set of potential links between these nodes is denoted by E, the graph G = (V, E) is called supply graph. The traffic demands between the demand nodes U are aggregated to a set of demands between the backbone nodes V . We assume that between each pair of nodes v1 , v2 ∈ V there is a demand (maybe equal to zero) denoted by dv1 ,v2 . We wish to design a backbone network that is still operational if a single edge or node fails. Therefore, we introduce the set of operating states O ⊆ V ∪ E ∪ {∅}. We distinguish between the normal operating state o = ∅, which is the state with all nodes and edges operational, and failure states, which are the states with a single edge (o = e ∈ E) or a single node (o = v ∈ V ) non-operational. Note that in a node failure state o = v ∈ V all edges incident to v are non-operational, too, and the demands with origin or destination v cannot be satisfied and are therefore not considered in this operating state. Unfortunately, in the backbone network planning we cannot simplify the hardware combinations like in the access network planning. Since we always had significant costs associated with or several restrictions on the use of certain hardware components, it was necessary to explicitly consider all single hardware components. The capacities provided by the components installed on the edges must be large enough to allow a feasible routing with respect to the OSPF routing protocol in all operating states. Assuming non-negative routing weights for all arcs, the OSPF protocol implies that each demand is sent from its origin to its destination along a shortest path with respect to these weights. In each operating state only operational nodes and arcs are considered in the shortest path computation. In this article we address only static OSPF routing


where the weights do not depend on nor change with the traffic. Dynamic shortest path routing algorithms, which try to adapt to traffic changes, often cause oscillations that lead to significant performance degradation, especially if the network is heavily loaded (see [CW92]). Also, though most modern IP routers support OSPF extensions that allow to split traffic onto more than one forwarding arcs, in this article we consider only the standard nonbifurcated OSPF routing. This implies, that the routing weights must be chosen in such a way that, for all operating states and all demands, the shortest path from the demand’s origin to its destination is unique with respect to these weights. Otherwise, it is not determined which one of the shortest paths will be selected by the implementation of the routing protocol in the real network and, therefore, it would be impossible to guarantee that the chosen capacities permit a feasible routing of the demands. The variation of data package transmission times increases significantly with the number of nodes in the routing path, especially if the network is heavily loaded. In order to guarantee good quality of service, especially for modern real-time multi-media services, the maximum routing path length is bounded by a small number (at least for the normal operating state). Also, even though OSPF routing in principle may choose different paths for the routing from s to t and t to s, a symmetric routing was preferred by the network administration. For different planning horizons, usually also different optimization goals are used. In the long-term strategic planning the goal is to design a network with minimal cost. In the short-term operational planning the goal often is to improve the network’s quality with no or only very few changes in the hardware configuration. Since Quality-of-Service in data networks is strongly related to the utilization (load or congestion) of the network, typical objectives chosen for quality optimization thus are the minimization of the total, average, or maximum utilization of the network’s components. The utilization of a network’s component is ratio between the traffic flow through this component and its capacity. In order to provide the best possible quality uniformly for all connections and to hedge against burstiness in traffic, a variant of minimizing the maximum utilization of the edge components was chosen as objective function for operational backbone network planning in our application. Usually, simply minimizing the maximum utilization over all network components in all operating states makes no sense in practice. Often there are some components or links in the network that always attain the maximum utilization, like, for example, gateways to other networks with fixed small capacities. Also, it is not reasonable to pay the same attention to minimizing the utilization of some link in a network failure state as in the normal operating state. To overcome these difficulties, we introduce a set of disjoint load groups j ∈ J, j ⊂ C × E × O. Each load group defines a set of edge components on some edges in some operating states,e.g., all edge components on all interior edges in the normal operating state, and each triple (edge component, edge, operating state) belongs to at most one load group. For each load group the


maximum utilization is taken only over the components, edges, and operating states in that load group. The objective is to minimize a linear combination of the maximum utilization values for all load groups. Thus, different classes of network components can be treated independently and differently in the utilization minimization. The concept of load groups can be generalized straightforward to include all node, edge, and global components. In our application, we only consider edge components in load groups. In the next two sections, we will develop mathematical models for the problems described above.

3 Access Network Planning

For the access network we assume that all traffic is routed through the backbone. This leads to a forest structure with the demand nodes as leaves and the backbone nodes as roots. One advantage of this approach is that we can map all node attributes like cost and capacity on the edges, since each node has at most one outgoing edge. In the uncapacitated case this results in a Steiner tree problem in a layered graph, which could be solved at least heuristically quite easily, see for example [KM98]. For each possible connection in A_UW we introduce a binary variable x_uw, which is 1 iff u and w are connected. Analogously, variables x_wv are introduced for A_WV. For each possible connection in A_WV and each resource r ∈ R we use a continuous variable f_wv^r that is equal to the flow of resource r between w and v. For each node n from W and V and each assembly stage s ∈ S_n we use a binary variable z_n^s that is 1 iff stage s is selected for node n. There are no flow variables and costs between the demand and intermediate nodes, because these costs can be computed in advance and added to the installation cost for the link.

3.1 Constraints

Here we present the constraints needed to construct a feasible solution. Each demand node has to be connected to exactly one intermediate node and each intermediate node can be connected to at most one backbone node:

Σ_{(u,w) ∈ A_UW} x_uw = 1   ∀u ∈ U,      Σ_{(w,v) ∈ A_WV} x_wv ≤ 1   ∀w ∈ W.   (1)

Each intermediate and each backbone node has exactly one configuration:

Σ_{s ∈ S_w} z_w^s = 1   ∀w ∈ W,      Σ_{s ∈ S_v} z_v^s = 1   ∀v ∈ V.   (2)

If it is possible that node n ∈ W ∪ V is not chosen at all, there has to be a configuration with κ_n^{rs} = 0 for all r ∈ R. In the case of a planning with no


current installations the cost of the corresponding variable z_n^s would be zero. Otherwise, if there is already an installation, we can give this configuration zero cost and associate with all other configurations either negative costs, which means we earn something by switching to another configuration, or positive changing costs. This way, we could also add a fixed charge just for changing what is already there. The configuration of the intermediate nodes must provide enough resources to meet the demands:

Σ_{(u,w) ∈ A_UW} δ_u^r x_uw − Σ_{s ∈ S_w} κ_w^{rs} z_w^s ≤ 0   ∀w ∈ W, r ∈ R.   (3)

Demands can only be routed on chosen connections:

λ_w^r x_wv − f_wv^r ≥ 0   ∀(w, v) ∈ A_WV, r ∈ R,   with   λ_w^r = max_{s ∈ S_w} κ_w^{rs}.   (4)

The flow balance at each intermediate node has to be ensured:

Σ_{(u,w) ∈ A_UW} δ_u^r x_uw − Σ_{(w,v) ∈ A_WV} f_wv^r = 0   ∀w ∈ W, r ∈ R.   (5)

Equation (5) can easily be extended with a constant additive factor in case the intermediate node is serving a fixed amount of the demand. Also, a scaling factor can be incorporated if, for example, data compression is employed for the traffic from the intermediate to the backbone nodes. The configuration of a backbone node has to provide enough resources to meet the demands of the assigned intermediate nodes:

Σ_{(w,v) ∈ A_WV} f_wv^r − Σ_{s ∈ S_v} κ_v^{rs} z_v^s ≤ 0   ∀v ∈ V, r ∈ R.   (6)

3.2 Objective function

While minimizing cost was always the objective, getting sensible cost coefficients for the objective function proved to be a major problem in all projects. The objective can be a combination of the following terms: installation cost for connections

Σ_{(u,w) ∈ A_UW} τ_uw x_uw + Σ_{(w,v) ∈ A_WV} τ_wv x_wv,   (7)

installation cost for configurations

Σ_{w ∈ W} Σ_{s ∈ S_w} τ_w^s z_w^s + Σ_{v ∈ V} Σ_{s ∈ S_v} τ_v^s z_v^s,   (8)

and flow unit cost between intermediate and backbone nodes

Σ_{r ∈ R} Σ_{(w,v) ∈ A_WV} τ_wv^r f_wv^r.   (9)

The τ_uw, τ_wv, τ_w^s, τ_v^s, and τ_wv^r ∈ R are the cost coefficients used to weight the desired terms.
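To make the model concrete, here is a compact sketch of constraints (1)–(3) and the objective term (7) in PuLP. This is our own illustration with made-up container names such as `A_UW` and `kappa`; it is not the solver setup used in the projects, which relied on CPLEX/SIP (see Section 5).

```python
import pulp

def build_access_model(U, W, V, A_UW, A_WV, S, kappa, delta, tau, resources):
    """Sketch of the access network MIP: assignment constraints (1),
    configuration choice (2), and intermediate-node capacity (3).
    U, W, V are lists of nodes; A_UW, A_WV lists of arcs (tuples)."""
    m = pulp.LpProblem("access_network", pulp.LpMinimize)
    x_uw = pulp.LpVariable.dicts("x_uw", A_UW, cat="Binary")
    x_wv = pulp.LpVariable.dicts("x_wv", A_WV, cat="Binary")
    z = pulp.LpVariable.dicts("z", [(n, s) for n in W + V for s in S[n]], cat="Binary")

    # (1): every demand node to exactly one intermediate node,
    #      every intermediate node to at most one backbone node.
    for u in U:
        m += pulp.lpSum(x_uw[(u, w)] for w in W if (u, w) in x_uw) == 1
    for w in W:
        m += pulp.lpSum(x_wv[(w, v)] for v in V if (w, v) in x_wv) <= 1

    # (2): exactly one assembly stage per intermediate and backbone node.
    for n in W + V:
        m += pulp.lpSum(z[(n, s)] for s in S[n]) == 1

    # (3): the chosen stage of an intermediate node must cover the assigned demand.
    for w in W:
        for r in resources:
            m += (pulp.lpSum(delta[u][r] * x_uw[(u, w)] for u in U if (u, w) in x_uw)
                  - pulp.lpSum(kappa[w][r][s] * z[(w, s)] for s in S[w]) <= 0)

    # Objective term (7): connection installation cost.
    m += pulp.lpSum(tau[a] * x_uw[a] for a in A_UW) + pulp.lpSum(tau[a] * x_wv[a] for a in A_WV)
    return m
```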


3.3 Notes If a demand node can be connected directly to a backbone node, this can be modeled by introducing an additional artificial intermediate node with only one link to the backbone node and adding a constraint that forbids a “zero” configuration of the backbone if this link is active. The model can be easily extended to an arbitrary number of intermediate levels, but, of course, solving it will become more and more difficult. A major drawback of this model is the inability to cope with locally routed demands. If two demand nodes are connected to the same intermediate node, often the demand need not to be routed to the backbone. Finding optimal partitions that minimize the traffic in the backbone is NP-hard itself and difficult to solve [FMdS+ 96]. On the other hand, demands are usually quite unstable. While a “big” node will always have a high demand, the destination of its emanating traffic might change frequently. In the case of the G-WiN, a large fraction of the traffic is leaving it or entering from outside the network. We did limited experiments which indicated that, at least in this case, the possible gain from incorporating local traffic is much less than the uncertainty of the input data.

4 Backbone Network Planning

In this section we present a mixed-integer linear programming model for the backbone network dimensioning and OSPF routing problem.

4.1 Network hardware

In contrast to the access network, the possible configurations of the modular hardware for the backbone network are described by explicit single component variables and resource constraints. The number of installations of components is modeled by integer variables: z_v^c ∈ N for each node component c ∈ C_V and node v ∈ V, z_e^c ∈ N for each edge component c ∈ C_E and edge e ∈ E, and z_G^c ∈ N for each global component c ∈ C_G, each with given lower and upper bounds. These bounds restrict, for each component and node, edge, or the global network, how often this component can be installed there. Depending on the resource type, with each resource there are one or more inequalities associated. For node resource constraints we have

Σ_{c ∈ C_E} Σ_{e ∈ δ(v)} k_e^{c,r} z_e^c + Σ_{c ∈ C_V} k_v^{c,r} z_v^c ≥ 0   ∀ r ∈ R_V, v ∈ V.   (10)


Analogously, the inequalities for edge and global resource constraints are

Σ_{c ∈ C_E} k_e^{c,r} z_e^c + Σ_{c ∈ C_V} (k_v^{c,r} z_v^c + k_w^{c,r} z_w^c) ≥ 0   ∀ r ∈ R_E, vw = e ∈ E,   (11)

and

Σ_{c ∈ C_E} Σ_{e ∈ E} k_e^{c,r} z_e^c + Σ_{c ∈ C_V} Σ_{v ∈ V} k_v^{c,r} z_v^c + Σ_{c ∈ C_G} k_G^{c,r} z_G^c ≥ 0   ∀ r ∈ R_G.   (12)









 




 










The node configurations s ∈ Sw chosen for the backbone nodes in the access network planning phase can be regarded as special fixed components now, which provide the resources to install further components. The optimization goal of minimizing the total network cost can be easily formulated as  c,cost     c kG zG , (13) kvc,cost zvc + kec,cost zec + min c∈CE e∈E

c∈CV v∈V

c∈CG

where cost ∈ RG is the special cost resource. Of course, any other objective that is linear in the components can be formulated as well via an appropriate resource. 4.2 Routing There are several possible ways to model the OSPF routing of the demands. We used arc–flow, path–flow, and tree–based formulations in our practical experiments. In this article, we present a formulation that is based on binary arc–flow variables. To model the directed OSPF traffic appropriately, we associate the two directed arcs (u, v) and (v, u) with each edge e = uv ∈ E and let A = {(u, v), (v, u) | uv ∈ E}. To simplify the notation, we use V o , E o , and Ao to denote the sets of operational nodes, edges, and arcs in operating state o, respectively. For each operating state o ∈ O and each traffic demand we use a standard ∈ formulation with binary arc–flow variables, i.e., there is a variable ps,t,o a {0, 1} for all o ∈ O, s, t ∈ V o , a ∈ Ao , s = t, with ps,t,o = 1 iff arc a is in a the routing path from s to t is operating state o. The flow balance and edge capacity constraints then are ⎧ ⎪ ⎨1 u=s   s,t,o ps,t,o − p = ∀ o ∈ O, s, t, u ∈ V o , and (14) −1 u = t a a ⎪ ⎩ + − a∈δA a∈δA 0 otherwise o (u) o (u) 

ds,t · ps,t,o (u,v) ≤

s,t∈V o

 c,cap c kuv zuv

c∈CE

∀ o ∈ O, (u, v) ∈ Ao .

(15)

In our application, there were no node capacity constraints necessary to model, but they can be included into the model in a straightforward way. To allow only symmetric routing, the inequalities

96

A. Bley and T. Koch t,s,o ps,t,o (u,v) = p(v,u)

∀ o ∈ O, s, t ∈ V o , (u, v) ∈ Ao

(16)

can be added to the formulation. A maximum admissible routing path length p¯s,t,o between s and t in operating state o can be enforced by the inequalities  ps,t,o ≤ p¯s,t, o ∈ O, s, t ∈ V o . (17) a a∈Ao

4.3 OSPF Routing So far, we presented a mixed-integer linear programming model for a general survivable network design problem with single-path routing. In this section, we will show how to incorporate the special features of the OSPF routing protocol into this model. It is easy to see, that not all possible configurations of routing paths are realizable with a shortest path routing protocol. For some path configurations it is impossible to find weights, such that all paths are shortest paths simultaneously. Many, rather complicated constraints have to be satisfied by a path configuration to be realizable by some routing weights. Examples of such constraints are presented, for example, in [BAGL00], [BAG03], [Gou01], and [SKK00]. We will call a path configuration admissible if it can be realized by a set of routing weights. In the following, we will present some simple necessary constraints for admissible path configurations and what inequalities these constraints impose on the path variables p. Unfortunately, a complete description of all admissible path configurations by inequalities (in a template-scheme) on the path variables is not known. But it is possible to decide in polynomial time by solving a linear program whether a given path configuration is admissible and, if not, to generate an inequality that is valid for admissible path configurations but violated by the given one. We will present this linear program and show how it can be used to separate inequalities cutting off non-admissible path configurations. Subpath constraints The simplest constraints that must be satisfied by the routing paths are the subpath (or path monotony) constraints. Suppose we have a set of routing weights w and P s,t is a unique shortest path between s, t ∈ V , s = t, of length at least 2. Let v ∈ V be an inner node of P s,t . Then, if P s,v or P v,t denote the shortest paths between s, v and v, t, respectively, both paths must be subpaths of P s,t . Otherwise, P s,t o cannot be the unique shortest path between s in t, see Figure 2. Hence, the routing variables of all admissible path configurations must satisfy the subpath inequalities:




97



∀ o ∈ O, s, t, v ∈ V o , a ∈ Ao .

(18)

 s,t,o   pv,t,o ≤ ps,t,o + 1− pa′ a a

∀ o ∈ O, s, t, v ∈ V o , a ∈ Ao .

(19)

 ps,v,o ≤ ps,t,o + 1− a a

ps,t,o a′

− a′ ∈δA o (v)

− a′ ∈δA o (v)

Note that, in general, the subpath inequalities are not facet defining in this basic form but can be turned into facets by appropriate lifting. Nevertheless, in our branch-and-cut implementation, we only use the unlifted form of the subpath inequalities (18) and (19) together with the (still not facet-defining) simple lifted version   s,t,o  pa′ ∀ o ∈ O, s, t, v ∈ V o , a ∈ Ao . (20) + pv,t,o ≤ ps,t,o +2 1− ps,t,o a a a − a′ ∈δA o (v)

These inequalities proved already sufficiently tight to obtain good practical results, even without the complicated lifting procedures. Operating state coupling constraints The operating states coupling constraints are the simplest constraints between routing paths of different operating states. Let o1 , o2 ∈ O, o1 = o2 , be two different operating states. Suppose for some given routing weights the paths P s,t,o1 and P s,t,o2 are the unique shortest paths between s, t ∈ V o1 ,o2 , s = t, in operating states o1 and o2 , respectively. Furthermore, suppose that in operating state o1 no edge or node on P s,t,o2 fails and that in operating state o2 no edge or node on P s,t,o1 fails. Then both paths would remain feasible (s, t)-paths in both operating states. Since only one of them can be the shorter path, both must be identical (see Figure 3). For notational convenience, we define the following “trigger term”: ⎧ if oi = ∅, ⎪ ⎨0 s,t,oj s,t,oj s,t,oj if oi = uv ∈ E, ] := p(u,v) + p(v,u) [oi ∈ P ⎪ s,t,oj ⎩ if oi = v ∈ V. p a∈δ − (v)−oj a All admissible path configurations must satisfy the following operating state coupling inequalities: 1 2 + [o1 ∈ P s,t,o2 ] + [o2 ∈ P s,t,o1 ] ≤ ps,t,o ps,t,o a a ∀ o1 , o2 ∈ O, s, t ∈ V o1 ,o2 , a ∈ Ao1 ,o2 .

(21)

Also the operating state coupling inequalities are not facet defining in this basic form in general, but can be turned into facets by a lifting procedure.

98

A. Bley and T. Koch t

t v u

u′

s

s

Figure 2. Example of a violated subpath constraint

Figure 3. Example of a violated operating state coupling constraint

Computing routing weights Let w be routing weights that induce unique shortest path between all node pairs in all operating states. For each o ∈ O and s, t ∈ V − o denote by o πs,t ∈ IR+ the distance between s and t in operating state o with respect to w. It is well known from shortest path theory, that the metric inequalities o o πs,u + w(u,v) − πs,v ≥ 0,

∀ o ∈ O, s ∈ V o , (u, v) ∈ Ao

are satisfied and hold with equality if and only if (u, v) is on a shortest (s, v)path in operating state o. Since scaling the weights does not change the shortest paths, we can find a scaling factor α > 0 such that for weights αw the left-hand-side of all strict inequalities is at least 1. Now suppose we are given integer variables p that define a path configuration, that is not necessarily admissible but satisfies all subpath constraints. Consider the following linear program, where w and π are variables (while p is fixed): min



w(u,v)

(22)

(u,v)∈A o o πs,u + w(u,v) − πs,v =0 o πs,u

+ w(u,v) −

o πs,v

≥1

wa ≥ 1

o πs,v ≥0

∀ o ∈ O, s ∈ V o , (u, v) ∈ Ao with ps,v,o (u,v) = 1

(23)

∀ o ∈ O, s ∈ V , (u, v) ∈ A , with

(24)

o

∀a∈A

∀ o ∈ O, s, v ∈ V o , s = v

o

ps,v,o (u,v)

=0

(25) (26)

It is easy to see that this linear program has a (primal feasible) solution if and only if the given variables p correspond to an admissible path configuration and, if this is the case, are unique shortest paths with respect to the weights w. Otherwise, if the linear program has no solution, there are no routing weights inducing the given configuration of paths, i.e., the configuration is not admissible. In that case, the linear program contains an infeasible system

IP Approaches to Access and Backbone IP Network Planning

99

of rows $I = I_1 \cup I_0$, consisting of some equalities $I_1 \subseteq \{(s,(u,v),o) \mid p^{s,v,o}_{(u,v)} = 1\}$ and inequalities $I_0 \subseteq \{(s,(u,v),o) \mid p^{s,v,o}_{(u,v)} = 0\}$. Since I is an infeasible system, in no admissible path configuration can all $p^{s,v,o}_{(u,v)}$ with $(s,(u,v),o) \in I_1$ take the value 1 and all $p^{s,v,o}_{(u,v)}$ with $(s,(u,v),o) \in I_0$ take the value 0 simultaneously. Hence, the inequality
$$\sum_{(s,(u,v),o)\in I_1} p^{s,v,o}_{(u,v)} \;-\; \sum_{(s,(u,v),o)\in I_0} p^{s,v,o}_{(u,v)} \;\le\; |I_1| - 1, \tag{27}$$
which is violated by the given non-admissible path configuration p, is valid for all admissible path configurations and can be added to the problem formulation.
Note that, in general, these infeasibility inequalities (27) are not facet defining. They cut off only one specific non-admissible sub-configuration of paths. To use them efficiently in a branch-and-cut framework, they have to be strengthened; otherwise they become active only so deep in the branching tree that they are of little use. In our implementation, we try to reduce their support as far as possible. First, we use only irreducible infeasibility systems of rows (set-inclusion minimal infeasibility systems); second, we apply some easy reductions that exploit the problem structure. For example, such an infeasibility system usually contains both an entry $(s,(u,v),o) \in I_1$ and one or more entries $(s,(u',v),o) \in I_0$. Since, if $p^{s,v,o}_{(u,v)} = 1$, the variables $p^{s,v,o}_{(u',v)}$ are already forced to 0 by the flow balance constraints (14), they need not be included in the infeasibility system's inequality (27).
If the OSPF routing must be symmetric, as in the G-WiN, one usually wants to provide equal routing weights for both directions of an edge. This is achieved by adding the equalities
$$w_{(u,v)} = w_{(v,u)} \qquad \forall\, uv \in E \tag{28}$$
to the linear system (23)–(26).

4.4 Utilization minimization

The variables and constraints presented so far are sufficient to model the cost minimization variants of the backbone network design problem as a mixed-integer linear program. For the utilization minimization variant, we need some further variables and constraints. For each load group $j \in J$, we introduce a continuous variable $\lambda_j \in \mathbb{R}$, $0 \le \lambda_j \le \bar\lambda_j$, for the maximum utilization attained by a component c on some edge e in operating state o with $(c, e, o) \in j$. The upper bound $\bar\lambda_j$ for that variable is used to specify initially a maximum feasible utilization. The objective is
$$\min\; \sum_{j\in J} \alpha_j \lambda_j, \tag{29}$$


where $\alpha_j$ is the non-negative objective coefficient associated with load group j. For each $j \in J$, $e \in E$, and $c \in C$ with $(c, e, o) \in j$ for some $o \in O$, we introduce variables for the maximum usable routing capacity $y_e^{c,j} \ge 0$. The value of these variables is the amount of flow that can be routed through the component in this operating state without increasing the current maximum utilization for the load group. The maximum usable capacities cannot be larger than the original component's capacity, if installed, times the maximum attained utilization of the load group. This can be expressed by the following two variable-upper-bound inequalities:
$$y_e^{c,j} \le \lambda_j\, k_e^{c,\mathrm{cap}} \qquad \forall\, j \in J,\; c \in C,\; e \in E \text{ with } (c, e, o) \in j \text{ for some } o \in O, \tag{30}$$
$$y_e^{c,j} \le z_{uv}^{c}\,\bigl(\bar\lambda_j\, k_e^{c,\mathrm{cap}}\bigr) \qquad \forall\, j \in J,\; c \in C,\; e = uv \in E \text{ with } (c, e, o) \in j \text{ for some } o \in O. \tag{31}$$

In the capacity constraints (15) the original capacities provided by the installed edge components are replaced by the maximum usable capacities. This yields for utilization minimization the new capacity constraints
$$\sum_{s,t\in V^o} d_{s,t} \cdot p^{s,t,o}_{(u,v)} \;\le\; \sum_{c\in C:\, j(c,uv,o)\ne\emptyset} y_{uv}^{c,\,j(c,uv,o)} \qquad \forall\, o \in O,\; (u,v) \in A^o, \tag{32}$$

where j(c, uv, o) denotes the load group that (c, uv, o) belongs to or ∅, if it belongs to no load group.

5 Computation

In this section we report on the methods used to solve the models presented in the previous sections, and we explain why these algorithms were chosen.

5.1 Access network

In all cases we studied, the access network planning problem was highly restricted. This is not surprising, because the locations that can be chosen as backbone or intermediate nodes are usually limited for technical, administrative and political reasons. The biggest obstacle was always the unavailability of sensible cost data for the potential connections. Also, in all projects we encountered some peculiar side constraints that were specific to the project. Therefore, we needed a very flexible approach and used a general IP solver. At least for the data used in the projects, it was possible with some preprocessing to solve the access network planning problems with CPLEX [CPL01], SIP [Mar98], or a comparable MIP solver. Unfortunately, the solution times depend heavily on the actual data and objective function, especially on the relation between connection costs and


assembly stage costs and on how tight the resource capacities are. In the G-WiN all locations were already built and the number of backbone and intermediate nodes to choose was fixed as a design criterion. So only the connection costs had to be regarded in the objective function. In one of the other projects this situation was reversed: the transportation network was already installed and only the cost for the equipment at the nodes had to be considered. In Section 6 we show some examples of how difficult the problems can get if we allow all connections between the nodes and have several tight resources.

5.2 Backbone network

In order to solve the backbone network planning problem, a special branch-and-cut algorithm based on the MIP formulation presented in Section 4 was developed and implemented in C++. CPLEX [CPL01] or SoPlex [Wun96] can be used to solve the linear programming relaxations; the branch-and-cut tree is managed by our software. The initial LP relaxation contains all component variables and all resource constraints (10), (11), (12). If the objective is utilization minimization, all load group variables and all usable capacity variables are in the initial LP relaxation as well, together with the associated variable-upper-bound constraints (30) and (31). If the binary arc–flow variables of all demands and all operating states are considered, the LP relaxation becomes too large to be used. We only consider the arc–flow variables for all demands in the normal operating state and for the biggest demands in the node failure states, i.e., only for demands with a value of at least 50% of the biggest demand value. Arc–flow variables for edge failure states are not in the formulation; most edge failure states are already dominated by some node failure states. Numerous experiments with real world data revealed that using only this restricted variable set is a good tradeoff between the quality of the relaxation and the time necessary to obtain good solutions. The flow balance constraints (14) and the edge capacity constraints (15) or (32) are in the initial formulation if at least one arc–flow variable in their support is. The routing symmetry constraints (16), of course, are not explicitly generated in practice. Instead, arc–flow variables are generated only for one of the directions s to t or t to s and then used to model the paths for both directions. The path length inequalities (17), if necessary, are also in the initial LP. Clearly, for all arcs that do not belong to any short (s, t)-path, or that start in t or end in s, the corresponding arc–flow variables must be 0 and are not included in the model. Although there is only a polynomial number of subpath constraints (18), (19), and (20) and operating state coupling constraints (21), we generate these inequalities only if violated, to keep the size of the relaxation as small as possible. Also, we only use coupling inequalities (21) that link the normal operating state to some failure state. Separating coupling inequalities between two failure states is too time consuming, and many of these inequalities are


already implied by the coupling constraints between the normal operating state and the two failure states.
At each node of the branch-and-cut tree we iteratively separate violated inequalities and resolve the LP relaxation for a limited number of times (≤ 5), as long as there is a substantial increase (≥ 1%) in the optimal LP value. In each such iteration we separate the following inequalities in the order given below, until at most 50 inequalities are separated:
1. subpath inequalities (18), (19), and (20) for the normal operating state,
2. induced cover inequalities [Boy93] for the knapsacks given by the edge capacity constraints (15) or (32) in the normal operating state¹,
3. cover inequalities [BZ78, NV94] for the knapsacks given by the resource constraints (10), (11), and (12),
4. subpath inequalities (18), (19), and (20) for failure operating states,
5. induced cover inequalities for the knapsacks given by the edge capacity constraints (15) or (32) for failure operating states,
6. operating state coupling inequalities (21) between the normal and some failure operating state, and
7. IIS inequalities (27).
This strategy first separates those inequalities that are computationally easier to separate and give the best improvement in the LP bound. It proved to be well suited in our experiments.
If, at some branch-and-cut node, the final LP solution is fractional, our basic strategy is to branch on a "good" arc–flow variable of a big demand. From those demands whose value is no less than a certain percentage (0.9) of the biggest demand with fractional arc–flow variables, we choose the arc–flow variable whose fractional value is closest to a target value (0.8). Flow variables for the normal operating state are preferred: when choosing the next branching variable, we divide all demands by a factor of 5 for failure state flow variables (a sketch of this selection rule is given below). At every branch-and-cut node with depth 3k, k ∈ ℕ, we branch on a fractional component variable, if there is one. In such a case, global components are always preferred to edge components and these to node components, and then the fractional variable whose component provides the biggest routing capacity is chosen.
The next node to explore in the branch-and-cut tree is selected by a mix of best-dual-bound and dive strategies. For a given number of iterations (32) we choose the node with the best dual bound. Thereafter, we dive for a good feasible solution in the tree, starting at a best-dual-bound node and then always choosing the child node whose arc–flow variable was set to 1 or whose edge component variable was rounded up in the last branching. We do not backtrack on the dive path. If a feasible solution was found or the last child node is infeasible, we switch back to the best-dual-bound strategy and continue.

¹ The precedence constraints in these knapsacks are the subpath constraints among the paths using the edge.
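The following Python sketch illustrates the branching-variable selection rule described above. It is only an illustration, not the authors' C++ implementation; all names are hypothetical, and the fractional LP values are assumed to be given as a dictionary keyed by (demand, arc, operating state).

def select_branching_variable(frac_vals, demand_value, is_failure_state,
                              demand_ratio=0.9, target=0.8, failure_scale=5.0):
    """Pick an arc-flow variable to branch on.

    frac_vals: dict (demand, arc, state) -> fractional LP value
    demand_value: dict demand -> traffic value
    is_failure_state: dict state -> bool
    """
    # Effective demand values: failure-state flow variables are scaled down by 5.
    def effective(demand, state):
        scale = failure_scale if is_failure_state[state] else 1.0
        return demand_value[demand] / scale

    # Consider only variables that are really fractional.
    fractional = {k: v for k, v in frac_vals.items() if 1e-6 < v < 1 - 1e-6}
    if not fractional:
        return None

    # Biggest (effective) demand that still has a fractional arc-flow variable.
    biggest = max(effective(d, o) for (d, _, o) in fractional)

    # Among the big demands, choose the variable whose value is closest to the target.
    candidates = [k for k in fractional
                  if effective(k[0], k[2]) >= demand_ratio * biggest]
    return min(candidates, key=lambda k: abs(fractional[k] - target))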


Two primal heuristics are used at the branch-and-cut nodes. They are applied only at nodes with depth 2k, k ∈ ℕ, reflecting the fact that the "important" branches, which fix big demands' flow variables or large edge components' variables, are performed first, while deeper in the branch-and-cut tree only small demands' flow variables are fixed. Both heuristics first generate initial routing weights and compute a shortest path routing with respect to these weights. Then, a (mixed-)integer programming solver is used to compute a hardware configuration minimizing the original cost or utilization objective function plus the total violation of the edge capacity constraints by this routing. These integer programs are fairly easy to solve in practice; they only contain the component variables and resource constraints. In a last step, the initial routing weights are adapted to the topology computed by the integer program and perturbed to make all shortest paths unique.
Our first heuristic takes a linear combination of the dual variables of the edge capacity constraints (15) or (32) over the different operating states as initial routing weights. The second heuristic utilizes the linear program (22)–(26) to compute initial routing weights. In contrast to the IIS inequality separator, the heuristic initializes the metric (in-)equalities (23) and (24) only for those arc–flow variables $p^{s,v,o}_{(u,v)}$ that are integer or near-integer (≤ 0.1 or ≥ 0.9) in the current fractional solution. All other metric inequalities (23) or (24) are "deactivated" by setting their sense and right hand side to "≤ 0". If this LP has a feasible solution, the computed routing weights induce at least all the near-integer routing paths at the current branch-and-cut node. If it has no solution, we generate an IIS inequality (27) cutting off a non-admissible sub-configuration of the near-integer paths. The heuristic thus works as a separator, too.
If the objective is to minimize the average utilization, we apply additional bound strengthening techniques to speed up the computation. For each load group j, we store an upper bound $\bar\lambda^*_j$ on the corresponding maximum utilization of the optimal solution and update these bounds during the optimization process. Initially, $\bar\lambda^*_j = \bar\lambda_j$. Whenever a new feasible solution with average utilization $\lambda^*$ is found, we tighten these bounds by setting $\bar\lambda^*_j := \min\{\bar\lambda^*_j, \lambda^*/\alpha_j\}$, and we update the variable-upper-bound constraints (31) for these new bounds $\bar\lambda^*_j$. Note that this operation may exclude feasible but non-optimal solutions from the remaining solution space; but we are interested only in the optimal solution and not in the entire feasible solution space. Indirectly, this bound strengthening also tightens the capacity constraints (32). Thus, in the following separation attempts we may find much stronger cover inequalities for the associated knapsacks than we would find without the bound strengthening. But, again, these inequalities are valid only for the optimal solution, not for all feasible solutions of the original problem.
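To illustrate how the linear program (22)–(26) acts both as a weight generator and as an infeasibility detector, the following sketch sets it up for a single operating state with a generic LP solver. It is a simplified illustration under stated assumptions (one operating state, dense variable indexing); the graph, the fixed path variables p and all names are hypothetical and not taken from the authors' code.

import numpy as np
from scipy.optimize import linprog

def find_routing_weights(nodes, arcs, p):
    """Try to find weights w >= 1 inducing the paths fixed by p (one operating state).

    arcs: list of (u, v) pairs; p: dict with p[(s, (u, v))] in {0, 1}, where 1 means
    arc (u, v) lies on the chosen shortest-path tree rooted at source s.
    Returns (weights, None) if the configuration is admissible, else (None, rows).
    """
    n_w = len(arcs)
    pi_index = {(s, v): n_w + i for i, (s, v) in
                enumerate((s, v) for s in nodes for v in nodes)}
    n_var = n_w + len(pi_index)

    A_eq, b_eq, A_ub, b_ub, rows = [], [], [], [], []
    for s in nodes:
        for a, (u, v) in enumerate(arcs):
            row = np.zeros(n_var)
            row[a] = 1.0                      # + w_(u,v)
            row[pi_index[(s, u)]] += 1.0      # + pi_{s,u}
            row[pi_index[(s, v)]] -= 1.0      # - pi_{s,v}
            if p.get((s, (u, v)), 0) == 1:    # metric equality (23)
                A_eq.append(row); b_eq.append(0.0); rows.append(("eq", s, (u, v)))
            else:                             # metric inequality (24): row >= 1
                A_ub.append(-row); b_ub.append(-1.0); rows.append(("ge", s, (u, v)))

    c = np.zeros(n_var); c[:n_w] = 1.0        # objective (22): minimize sum of weights
    bounds = [(1.0, None)] * n_w + [(0.0, None)] * len(pi_index)   # (25), (26)
    res = linprog(c,
                  A_ub=np.array(A_ub) if A_ub else None,
                  b_ub=np.array(b_ub) if b_ub else None,
                  A_eq=np.array(A_eq) if A_eq else None,
                  b_eq=np.array(b_eq) if b_eq else None,
                  bounds=bounds, method="highs")
    if res.success:
        return res.x[:n_w], None
    return None, rows  # infeasible: a subset of these rows forms an IIS for inequality (27)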


6 Results

The application of our main interest was the planning of the German research network G-WiN. The G-WiN started in 2000 with about 759 demand locations, see Figure 4; 261 of them could be used as intermediate or backbone nodes, see Figure 5. It was decided by the administration that there should be about 10 backbone nodes and two intermediate nodes per backbone node. Several sets of demand data were generated. For the initial access and backbone network planning, accounting data from the predecessor network B-WiN was used; later, for the operational replanning of the G-WiN backbone network, accounting data from the current G-WiN network were available. These traffic measurements were scaled to anticipate future demands. In the access network planning step, several scenarios were optimized and assessed according to the stability of the solutions for increasing demands. The backbone nodes were then successively selected. For this reason we cannot provide running times for the original problems. We generated some data sets with the original data and varying costs and resources. As can be seen in Table 1, instances of identical size can vary strongly in running time (CPLEX 8.0 MIP solver on a 2.5 GHz Pentium 4, Linux). The harder instances were always those with scarcer resources, a higher number of assembly stages, or a more difficult cost structure.
The solution for the G-WiN access network planning problem is shown in Figure 6. It should be noted that access link cost functions that do not depend on the distance can lead to very strange-looking optimal solutions. We encountered this when we had costs that were constant for a certain range. These solutions may be difficult to verify, and crossing access network links may be hard to explain to a practitioner.
For the G-WiN backbone planning, the underlying network topology was a complete graph on 10 nodes, corresponding to the virtual private STM backbone network inside Germany, plus one additional node linked to two of these ten nodes, corresponding to the 'uplink' to other networks via two gateways. The capacities on the two gateway links and the 'configuration' of the uplink node were fixed. On each of the other links, one capacity configuration from a given set of configurations could be installed. Depending on the specific problem instance, these sets represented various subsets of the STM hierarchy. Several types of IP router cards and SDH cards had to be considered as components at the nodes. In order to install a certain capacity configuration on an edge, appropriate IP router and SDH interfaces must be provided by the router and interface cards at the two terminal nodes. Additional IP interfaces had to be provided at the nodes in order to connect the backbone level to the access level of the network. Of course, every node could hold only a limited number of router and interface cards. Besides these local restrictions, for both edge capacity components and node interface and router card components, there were also several global restrictions. For

Figure 4. G-WiN-1 sites with aggregated traffic demands
Figure 5. G-WiN-1 sites, potential level-1/2 sites as triangles
Figure 6. G-WiN-1 access network solution
Figure 7. G-WiN-1 core backbone and level-1–level-2 links

Table 1

Name   Vars   Cons   Nonz    B&B      Time (h:mm:ss)
DFN1    6062   771   12100      235   6
DFN2    5892   747   11781    13670   27:05
DFN3    6062   771   12100   >40000   >3:00:00
TV1     1818   735    5254        0   1
TV2     1818   735    5254       24
SL1     7188  3260   27280      267
SL2     7188  3260   27280     7941
KG1     8230  5496   42724      511
KG2     8230  5496   42724     3359
BWN1   13063  9920   57688      372
BWN2   13063  9920   57688    18600

$$\phi_{l,i}(x) \;=\; \begin{cases} 1 & \text{if } l = 1 \wedge i = 1,\\ \begin{cases} 2 - 2^{l}x & \text{if } x \in [0, 2^{1-l}]\\ 0 & \text{otherwise} \end{cases} & \text{if } l > 1 \wedge i = 1,\\ \begin{cases} 2^{l}x + 1 - i & \text{if } x \in [1 - 2^{1-l}, 1]\\ 0 & \text{otherwise} \end{cases} & \text{if } l > 1 \wedge i = 2^{l} - 1,\\ \phi\bigl(x \cdot 2^{l} - i\bigr) & \text{otherwise.} \end{cases}$$

Figure 4 (left) shows the sparse grid for level two with additional grid points on the boundary. The common hat basis functions span a space of piecewise linear functions with a mesh width of $2^{-2}$. If the modified boundary functions are used, the same function space can be obtained by using additionally the basis functions adjacent to the boundary of the next higher level. This results in the same number of unknowns and in the grid on the right.

Figure 4. $V_2^{(1)}$ with (left) common hat basis functions, (right) modified basis functions

Considering the modified basis, adaptivity can take care of the boundary functions: they are automatically created wherever needed. Classifications showed that it is hardly necessary to modify the basis functions if there are no data points located exactly on the boundary, as in the case of the Ripley dataset, for example. Table 2 shows that there is almost no difference in the number of grid points created. But as the condition number of the matrix deteriorates when the modified boundary functions are used, it takes more than twice as many iterations to reduce the norm of the residuum below $10^{-8}$. If the datasets are normalized not to the unit hypercube $[0,1]^d$ but to a slightly smaller region, then it suffices to take the normal hat function basis when classifying adaptively. This makes it possible to start with 2d + 1 grid points for the sparse grid on level two, creating "boundary" grid points only when necessary.
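For illustration, the following sketch evaluates the 1D basis functions discussed here: the common hat function and the modified variant that folds the boundary behavior into the first and last basis function of each level. It assumes the standard modified linear basis (consistent with the truncated definition above) and is not taken from the authors' code.

def hat(x):
    """Common 1D hat function phi(x) = max(1 - |x|, 0)."""
    return max(1.0 - abs(x), 0.0)

def basis(l, i, x, modified=False):
    """Evaluate the basis function phi_{l,i} at x in [0, 1].

    For the common basis this is phi(x * 2^l - i).  With modified=True the
    functions adjacent to the boundary (i = 1 and i = 2^l - 1) are extrapolated
    linearly towards the boundary, so no boundary grid points are needed.
    """
    if not modified:
        return hat(x * 2**l - i)
    if l == 1 and i == 1:
        return 1.0                                   # constant on level one
    if i == 1:
        return max(2.0 - 2**l * x, 0.0)              # left boundary function
    if i == 2**l - 1:
        return max(2**l * x + 1.0 - i, 0.0)          # right boundary function
    return hat(x * 2**l - i)                         # interior: common hat

# Example: the modified left boundary function of level 2 at a few points.
print([round(basis(2, 1, x, modified=True), 2) for x in (0.0, 0.25, 0.5, 0.75)])
# -> [2.0, 1.0, 0.0, 0.0]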

Table 2. Conventional hat basis functions vs. modified boundary functions

              hat functions                         modified boundary functions
# grid points  max l  # iterations  acc.      # grid points  max l  # iterations  acc.
          5      2          5       89.9                5      2          5       89.9
         17      3         18       91.1               17      3         20       91.0
         49      4         35       91.0               49      4         57       91.0
        123      5         50       91.0              117      5        115       91.0
        263      6         64       90.7              232      6        169       90.6
        506      7         85       90.8              470      6        235       90.7
        866      7         96       90.7              812      7        252       90.8

5 Summary

We presented an adaptive classification algorithm using sparse grids to discretize feature space. The algorithmically hard part, the multiplication with the stiffness matrix, was sketched. The algorithm allows for the classification of large datasets as it scales only linearly in the number of training data points.
Using adaptivity in sparse grid classification makes it possible to reduce the number of grid points significantly. An adaptive selection of grid points is especially useful for higher dimensional feature spaces. Special care has to be taken regarding the boundary values. Sparse grids usually employ grid points located on the boundary. A vast number of grid points can be saved if those are omitted and if the datasets are instead normalized so that no data points are located on the boundary. This allows one to start with only 2d + 1 grid points; adaptivity takes care of the creation of grid points next to the boundary.
Ongoing research includes investigations on other refinement criteria which are better suited for classification and on the use of other basis functions to further improve adaptive sparse grid classification.

References

[BuGr04] H.-J. Bungartz and M. Griebel. Sparse grids. Acta Numerica, Volume 13, 2004, p. 147–269.
[GaGT01] J. Garcke, M. Griebel and M. Thess. Data Mining with Sparse Grids. Computing 67(3), 2001, p. 225–253.
[NHBM98] D. Newman, S. Hettich, C. Blake and C. Merz. UCI Repository of machine learning databases, 1998.
[Pflü05] D. Pflüger. Data Mining mit Dünnen Gittern. Diplomarbeit, IPVS, Universität Stuttgart, March 2005.
[RiHj95] B. D. Ripley and N. L. Hjort. Pattern Recognition and Neural Networks. Cambridge University Press, New York, NY, USA, 1995.
[Sing98] S. Singh. 2D spiral pattern recognition with possibilistic measures. Pattern Recogn. Lett. 19(2), 1998, p. 141–147.

On the Stochastic Geometry of Birth-and-Growth Processes. Application to Material Science, Biology and Medicine

Vincenzo Capasso

ADAMSS (Centre for Advanced Applied Mathematical and Statistical Sciences) and Department of Mathematics, Università degli Studi di Milano, via Saldini 50, 20133 Milano, Italy
[email protected]

Dedicated to Willi Jaeger on his 65th birthday

Abstract Nucleation and growth processes arise in a variety of natural and technological applications, such as solidification of materials, semiconductor crystal growth, biomineralization (shell growth), tumor growth, vasculogenesis, and DNA replication. All these processes may be modelled as birth-and-growth processes (germ-grain models), which are composed of two processes, birth (nucleation, branching, etc.) and subsequent growth of spatial structures (crystals, vessel networks, etc.), which, in general, are both stochastic in time and space. These structures usually induce a random division of the relevant spatial region, known as a random tessellation. A quantitative description of the spatial structure of a random tessellation can be given in terms of random distributions à la Schwartz, and their mean values, known as mean densities of interfaces (n-facets) of the random tessellation, at different Hausdorff dimensions (cells, faces, edges, vertices), with respect to the usual d-dimensional Lebesgue measure. With respect to all fields of application, predictive mathematical models which are capable of producing quantitative morphological features can contribute to the solution of optimization or optimal control problems. A nontrivial difficulty arises from the strong coupling of the kinetic parameters of the relevant birth-and-growth (or branching-and-growth) process with the underlying field, such as temperature, and with the geometric spatial densities of the evolving spatial random tessellation itself. Methods for reducing complexity include homogenization at mesoscales, thus leading to hybrid models (deterministic at the larger scale, and stochastic at lower scales); we bridge the two scales by introducing a mesoscale at which we may locally average the microscopic birth-and-growth model in the presence of a large number of grains.


The proposed approach also suggests methods of statistical analysis for the estimation of the mean geometric densities that characterize the morphology of a real system.

1 Introduction Many processes of biomedical interest may be modelled as birth-andgrowth processes (germ-grain models), which are composed of two processes, birth (nucleation, branching, etc.) and subsequent growth of spatial structures (cells, vessel networks, etc), which, in general, are both stochastic in time and space. These structures induce a random division of the relevant spatial region, known as random tessellation. A quantitative description of the spatial structure of a tessellation can be given, in terms of the mean densities of interfaces (n-facets). In applications to material science a main industrial interest is controlling the quality of the relevant final product in terms of its mechanical properties; as shown e.g. in [30], these are strictly related to the final morphology of the solidified material, so that quality control in this case means optimal control of the final morphology. In medicine, very important examples of birth-and-growth processes are mathematical models of tumor growth and of tumor-induced angiogenesis. In this context, the understanding of the principles and the dominant mechanisms underlying tumor growth is an essential prerequisite for identifying optimal control strategies, in terms of prevention and treatment. Predictive mathematical models which are capable of producing quantitative morphological features of developing tumor and blood vessels can contribute to this. A major difficulty derives from the strong coupling of the kinetic parameters of the relevant birth-and-growth (or branching-and-growth) process with various underlying fields, and the geometric spatial densities of the existing tumor, or capillary network itself. All these aspects induce stochastic time and space heterogeneities, thus motivating a more general analysis of the stochastic geometry of the process. The formulation of an exhaustive evolution model which relates all the relevant features of a real phenomenon dealing with different scales, and a stochastic domain decomposition at different Hausdorff dimensions, is a problem of high complexity, both analytical and computational. Methods for reducing complexity include homogenization at larger scales, thus leading to hybrid models (deterministic at the larger scale, and stochastic at smaller scales). The aim of this paper is to present an overview of a large set of papers produced by the international group coordinated by the author on the subject. As a matter of example we present a couple of simplified stochastic geometric models, for which we discuss how to relate the evolution of mean geometric densities describing the morphology of the systems to the kinetic parameters of birth and growth.


In Section 2 the general structure of stochastic birth-and-growth processes is presented, introducing a basic birth process as a marked point process. In Section 3 a volume growth model is presented which is of great interest in many problems of material science, and has attracted a lot of attention because of its analytical complexity with respect to the geometry of the growth front (see e.g. [31, 47] and references therein). In many of the quoted applications it is of great importance to handle evolution equations of random closed sets of different (even though integer) Hausdorff dimensions. Following a standard approach in geometric measure theory, such sets may be described in terms of suitable measures. For a random closed set of lower dimension with respect to the environment space, the relevant measures induced by its realizations are singular with respect to the Lebesgue measure, and so their usual Radon-Nikodym derivatives are zero almost everywhere. In Sections 4 and 5 an original approach is presented, recently proposed by the author and his group, who have suggested coping with these difficulties by introducing random generalized densities (distributions) à la Dirac-Schwartz, for both the deterministic and the stochastic case. In the latter we analyze mean generalized densities, and relate them to densities of the expected values of the relevant measures. For the applications of our interest, the Delta formalism provides a natural framework for deriving evolution equations for mean densities at all (integer) Hausdorff dimensions, in terms of the local relevant kinetic parameters of birth and growth. In Section 6 a connection with the concept of hazard function is offered, with respect to the survival of a point until its capture by the relevant growing phase. Section 7 shows how evolution equations for the mean densities of interfaces at all Hausdorff dimensions are obtained in terms of the kinetic parameters of the process, via the hazard function. In Sections 8 and 9 it is shown how to reduce the complexity of the problem, from both the analytical and the computational point of view, by taking into account the multiple scale structure of the system, and deriving a hybrid model via a heuristic homogenization of the underlying field at the larger scale. Some numerical simulations are reported for the case of crystallization of polymers, together with a simplified problem of optimal control of the final morphology of the crystallized material.

2 Birth-and-Growth Processes

The set of Figures 1–11 shows a family of real processes from Biology, Medicine and Material Science. In a detailed description, all these processes can be modelled as birth-and-growth processes. In forest growth, births start from seeds randomly dispersed in a region of interest, and growth is due to nutrients


Figure 1. Candies or phtalate crystals?

Figure 2. Forest growth or crystallization process?

in the soil that may be randomly distributed themselves or driven by a fertilization procedure; in tumor growth abnormal cells are randomly activated and develop thanks to a nutritional underlying field driven by blood circulation (angiogenesis); in crystallization processes such as sea shells, polymer solidification, nucleation and growth may be due to a biochemical underlying field, to temperature cooling, etc.


Figure 3. Sea shell crystallization (from [50])

Figure 4. A simulation of the growth of a tumor mass coupled with a random underlying field (from [3])

Figure 5. Vascularization of an allantoid (from [27])


Figure 6. Angiogenesis on a rat cornea (from [26]) (left). A simulation of an angiogenesis due to a localized tumor mass (black region on the right) (from [25]) (right)

Figure 7. Response of a vascular network to an antiangiogenic treatment (from [33])

All these kinds of phenomena are subject to random fluctuations, together with the underlying field, because of intrinsic reasons or because of the coupling with the growth process.


Figure 8. A libellula wing

Figure 9. A real picture showing a spatial tessellation due to vascularization of a biological tissue : endothelial cells form a vessel network (from [44])


Figure 10. A schematic representation of a spherulite and an impingement phenomenon

2.1 The Birth Process – Nucleation or Branching

The birth process is modelled as a stochastic marked point process (MPP) N, defined as a random measure on $\mathcal{B}_{\mathbb{R}_+} \times E$ by
$$N = \sum_{j=1}^{\infty} \epsilon_{T_j, X_j}$$


Figure 11. Real Experiment

Figure 12. Simulated Experiment; a Johnson-Mehl tessellation

where
• $\mathcal{E}$ denotes the σ-algebra of the Borel subsets of E, a bounded subset of $\mathbb{R}^d$, the physical space;
• $T_j$ is an $\mathbb{R}_+$-valued random variable representing the time of birth of the j-th nucleus;
• $X_j$ is an E-valued random variable representing the spatial location of the nucleus born at time $T_j$;
• $\epsilon_{t,x}$ is the Dirac measure on $\mathcal{B}_{\mathbb{R}_+} \times E$ such that, for any $t_1 < t_2$ and $B \in \mathcal{E}$,
$$\epsilon_{t,x}([t_1, t_2] \times B) = \begin{cases} 1 & \text{if } t \in [t_1, t_2],\; x \in B,\\ 0 & \text{otherwise.} \end{cases}$$
The (random) number of nuclei born during A, in the region B, is given by
$$N(A \times B) = \#\{T_j \in A,\; X_j \in B\}, \qquad A \in \mathcal{B}_{\mathbb{R}_+},\; B \in \mathcal{E}.$$
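As a simple illustration of such a marked point process, the following sketch samples nucleation times and locations from a homogeneous space-time Poisson process on [0, T] x E, with E a rectangle, and counts N(A x B). The homogeneity assumption and all names are illustrative only.

import numpy as np

def sample_birth_process(rate, T, width, height, rng=np.random.default_rng(0)):
    """Sample a homogeneous marked Poisson process on [0, T] x ([0,width] x [0,height]).

    Returns arrays of birth times T_j (sorted) and the corresponding locations X_j.
    """
    n = rng.poisson(rate * T * width * height)      # total number of nuclei
    times = rng.uniform(0.0, T, size=n)
    locations = rng.uniform([0.0, 0.0], [width, height], size=(n, 2))
    order = np.argsort(times)
    return times[order], locations[order]

def count_nuclei(times, locations, t_interval, box):
    """N(A x B): number of nuclei born during A = [t1, t2] inside the box B."""
    (t1, t2), (x1, x2, y1, y2) = t_interval, box
    in_time = (times >= t1) & (times <= t2)
    in_box = ((locations[:, 0] >= x1) & (locations[:, 0] <= x2) &
              (locations[:, 1] >= y1) & (locations[:, 1] <= y2))
    return int(np.count_nonzero(in_time & in_box))

times, locations = sample_birth_process(rate=0.5, T=10.0, width=4.0, height=4.0)
print(count_nuclei(times, locations, (0.0, 5.0), (0.0, 2.0, 0.0, 2.0)))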

2.2 Stochastic Intensity The stochastic intensity of the nucleation process provides the probability that a new nucleation event occurs in the infinitesimal region [x, x + dx], during the infinitesimal time interval [t, t + dt], given its past history Ft− up to time t, ν(dx × dt) := P [N (dx × dt) = 1 | Ft− ] = E[N (dx × dt) | Ft− ] In many cases, such as in a crystallization process, we have volume growth models. If the nucleation events occur at {(Tj , Xj ) | 0 ≤ T1 ≤ T2 ≤ . . .}

the crystalline phase at time t > 0 is described by a random set
$$\Theta^t = \bigcup_{T_j \le t} \Theta^t_j, \tag{1}$$

given by the union of all crystals born at times Tj and locations Xj and freely grown up to time t. In this case Θt has the same dimension d as the physical space. If we wish to impose that no further nucleation event may occur in the crystalline phase, we have to assume that the stochastic intensity is of the form ν(dx × dt) = α(x, t)(1 − δΘt− (x))dt dx where δΘt denotes the indicator function of the set Θt , according to a formalism that will be discussed later; so that the term (1 − δΘt− (x)) is responsible of the fact that no new nuclei can be born at time t in a zone already occupied by the crystalline phase. The parameter α, also known as the free space nucleation rate, is a suitable real valued measurable function on R+ × E, such that α(·, t) ∈ L1 (E), for all t > 0 and such that   T

dt

0
t0 , is given by [13] Θ(t; x0 , t0 ) = {x ∈ E|∃ξ ∈ W 1,∞ ([t0 , t]) : ξ(t0 ) = x0 , ξ(t) = x,   ˙  ξ(s) ≤ G(ξ(s), s), s ∈ (t0 , t)}

(4)

for t ≥ t0 and Θ(t; x0 , t0 ) = ∅ for t < t0 . In this case we may indeed claim that the whole crystalline phase is given by (1). The following definition will be useful later; we denote by dimH and Hn the Hausdorff dimension, and the n-dimensional Hausdorff measure, respectively. Definition 1. [24] Given an integer n, such that 0 ≤ n ≤ d, we say that a subset A of Rd is n-regular, if it satisfies the following conditions: (i) A is Hn -measurable; (ii) Hn (A) > 0; and Hn (A ∩ Br (0)) < ∞, for any r > 0;

142

V. Capasso

(iii)

Hn (A ∩ Br (x)) lim = r→0 bn r n



1 0

where bn is the volume of the unit ball in Rn .

Hn -a.e. x ∈ A, , ∀x ∈ A.

Remark 1. Note that condition (iii) is related to a characterization of the Hn rectifiability of the set A [29]. Theorem 1. [8] Subject to the initial condition that each initial germ is a spherical ball of infinitesimal radius, under suitable regularity on the growth field G(t, x), each grain Θtt0 (x0 ) is such that the following inclusion holds Θts0 (x0 ) ⊂ Θtt0 (x0 ), for s < t, with ∂Θts0 (x0 ) ∩ ∂Θtt0 (x0 ) = ∅, for s < t. Moreover, for almost every t ∈ R+ , Θtt0 (x0 ) is a d-regular closed set, and ∂Θtt0 (x0 )is a (d − 1)-regular closed set. As a consequence Θtt0 (x0 ) and ∂Θtt0 (x0 ) satisfy Hd (Θtt0 (x0 ) ∩ Br (x)) = 1 for Hd -a.e. x ∈ Θtt0 (x0 ), r→0 bd r d Hd−1 (∂Θtt0 (x0 ) ∩ Br (x)) lim = 1 for Hd−1 -a.e. x ∈ ∂Θtt0 (x0 ), r→0 bd−1 rd−1 lim

where Br (x) is the d-dimensional open ball centered in x with radius r, and bn denotes the volume of the unit ball in IRn . Further we assume that G(t, x) is sufficiently regular so that, at almost any time t > 0, the following holds Hd ((Θtt0 (x0 )⊕r \ Θtt0 (x0 )) ∩ A) = Hd−1 (∂Θtt0 (x0 ) ∩ A), r→0 r lim

for any A ∈ BRd such that Hd−1 (∂Θtt0 (x0 ) ∩ ∂A) = 0, where we have denoted by F⊕r the parallel set of F at distance r ≥ 0 (i.e. the set of all points x ∈ Rd with distance from F at most r) (see e.g. [37, 45]).

Birth-and-Growth Processes

143

4 Closed Sets as Distributions – The Deterministic Case In order to pursue our analysis of the germ-grain model associated with a birth-and-growth process, we find convenient to represent n-regular closed ´ la Dirac-Schwartz, in terms of their “geometric sets in IRd as distributions a densities”. Consider an n-regular closed set Θn in IRd . Then we have  Hn (Θn ∩ Br (x)) 1 Hn -a.e. x ∈ Θn , = lim , n 0 ∀x ∈ Θn . r→0 bn r but Hn (Θn ∩ Br (x)) bn rn Hn (Θn ∩ Br (x)) = lim = d r→0 r→0 bd r bn r n bd r d lim



∞ 0

Hn -a.e. x ∈ Θn , ∀x ∈ Θn .

By analogy with the delta function δx0 (x) associated with a point x0 , for x ∈ IRd we define the generalized density of Θn as Hn (Θn ∩ Br (x)) . r→0 bd r d

δΘn (x) := lim

The density δΘn (x) (delta function of the set Θn ) can be seen as a linear functional defined by a measure in a similar way as the classical delta function δx0 (x) of a point x0 . Define µΘn (A) := (δΘn , 1A ) := Hn (Θn ∩ A),

A bounded in BIRd .

In accordance with the usual representation of distributions in the theory of generalized functions, for any function f ∈ Cc (IRd , IR), the space of continuous functions with compact support, we may formally write   (δΘn , f ) := f (x)δΘn (x)dx. f (x)µΘn (dx) = IRd

IRd

5 Stochastic Geometry A relevant aspect of stochastic geometry is the analysis of the spatial structure of objects which are random in location and shape. Given a random object Σ ∈ Rd , a first quantity of interest is for example the probability that a point x belongs to Σ, or more in general the probability that a compact set K intersects Σ. The theory of Choquet-Matheron [39, 46], shows that it is possible to assign a unique probability law PΣ associated with a RACS (random closed set) Σ ∈ Rd on the measurable space (F, σF ) of the family of closed sets in Rd

144

V. Capasso

endowed with the σ-algebra generated by the hit-or-miss topology, by assigning its hitting functional TΣ . Given a probability space (Ω, A, P ). A RACS Σ is a measurable function Σ : (Ω, A) −→ (F, σF ). The hitting functional of Σ is defined as TΣ : K ∈ K −→ P (Σ ∩ K = ∅), where K the family of compact sets in Rd . Actually we may consider the restriction of TΣ to the family of closed balls {Bε (x); x ∈ Rd , ε ∈ R+ − {0}}. We shall denote by EΣ , or simply by E, the expected value with respect to the probability law PΣ . 5.1 Closed sets as distributions – The stochastic case Suppose now that Θn is an n-regular Random Closed Set in IRd on a suitable probability space (Ω, F, P ), with E[Hn (Θn ∩ Br (0))] < ∞, for all r > 0. As a consequence, µΘn , defined as above, is a random measure, and correspondingly δΘn is a random linear functional. Consider the linear functional E[δΘn ] defined on Cc (IRd , IR) by the measure E[µΘn ](A) := E[Hn (Θn ∩ A)], i.e. by (E[δΘn ], f ) =



IRd

f (x)E[δΘn ](x)dx :=



IRd

f (x)E[µΘn ](dx),

for any f ∈ Cc (IRd , IR). It can be shown that the expected linear functional E[δΘn ] so defined is such that, for any f ∈ Cc (IRd , IR), (E[δΘn ], f ) = E[(δΘn , f )], which corresponds to the expected linear functional a´ la Gelfand-Pettis. For a discussion about measurability of (δΘn , f ) we refer to [6, 38, 51]. Note that, even though for any realization Θn (ω), the measure µΘn (ω) may be singular, the expected measure E[µΘn ] may be absolutely continuous with respect to ν d , having classical Radon-Nykodym density E[δΘn ]. It is then of interest to say whether or not a classical mean density can be introduced for sets of lower Hausdorff dimensions, with respect to the usual Lebesgue measure on Rd . In order to respond to this further requirement, in [23] we have introduced the following definition.

Birth-and-Growth Processes

145

Definition 2. Let Θ be a random closed set in Rd with E[HdimH (∂Θ) (∂Θ)] > 0. We say that Θ is absolutely continuous if and only if E[HdimH (∂Θ) (∂Θ ∩ ·)] ≪ ν d (·)

(5)

on BRd , where dimH denotes the Hausdorff dimension. Remark 2. We are assuming that the random set Θ is sufficiently regular so that, if dimH (Θ) = d, then dimH (∂Θ) = d − 1, while if dimH (Θ) = s < d, then ∂Θ = Θ, and E[HdimH (∂Θ) (∂Θ)] < ∞; thus (5) becomes: E[Hd−1 (∂Θ ∩ ·)] ≪ ν d (·) if dimH (Θ) = d, E[Hs (Θ ∩ ·)] ≪ ν d (·) if dimH (Θ) = s < d. It is easy to check that the definition above is consistent with the case that Θ is a random variable or a random point in Rd . For n = d, it is easily seen that δΘd (x) = 1Θd (x), ν d -a.s., which directly implies E[δΘd ](x) = P (x ∈ Θd ), ν d -a.s.. The density VV (x) := E[δΘd ](x) = P (x ∈ Θd ) in material science is known as the (degree of ) crystallinity. The complement to 1 of the crystallinity, is known as porosity px = 1 − VV (x) = P (x ∈ Θd ). When the RACS Θd is a.c. according to the definition above, then the mean surface density SV (x) := E[δ∂Θd ](x) is well defined, too, as a classical function (see Fig. 15).

6 The Hazard Function In the dynamical case, such as a birth-and-growth process, the RACS Θt may depend upon time so that a second question arises, i.e. when a point x ∈ E is reached (captured) by a growing stochastic region Θt ; or viceversa up to when a point x ∈ E survives capture? In this respect the degree of crystallinity (now also depending on time) VV (x, t) = P (x ∈ Θt ) may be seen as the probability of capture of point x ∈ E , by time t > 0. In this sense the complement to 1 of the crystallinity, also known as porosity px (t) = 1 − VV (x, t) = P (x ∈ Θt )

(6)

represents the survival function of the point x at time t, i.e. the probability that the point x is not yet covered by the random set Θt .

146

V. Capasso

Figure 15. An estimate of SV and VV on a planar (simulated) sample of coppertungsten alloy (from [32])

Figure 16. Capture of a point x during time ∆t

With reference to the growing RACS Θt we may introduce the (random) time τ (x) of survival of a point x ∈ E with respect to its capture by Θt , such that px (t) = P (τ (x) > t). In order to relate these quantities to the kinetic parameters of the process, we follow Kolmogorov [35] by introducing the concept of causal cone. Definition 3. The causal cone C(x, t) of a point x at time t is the set of points (y, s) in the time space R+ × E such that a crystal born in y at time s covers the point x by time t

Birth-and-Growth Processes

147

Figure 17. The causal cone of point x at time t

C(x, t) : = {(y, s) ∈ E × [0, t]|x ∈ Θ(t; y, s)}. where we have denoted by Θst (y) the crystal born at y ∈ E at time s ∈ R+ , and observed at time t ≥ s. Some information on the properties of the boundaries in the sense of geometric measure theory have been obtained in [8], for a freely grown crystal Θ(t; y, s), Proposition 1. For almost every t > s, the set Θ(t; y, s) has finite nontrivial Hausdorff-measure Hd ; its boundary ∂Θ(t; y, s) has finite nontrivial Hausdorff-measure Hd−1 . From the theory of Poisson processes, it is easily seen that P(N (C(x, t)) = 0) = e−ν0 (C(x,t)) , where ν0 (C(x, t)) is the volume of the causal cone with respect to the intensity measure of the Poisson process  ν0 (C(x, t)) = α(y, s)d(y, s). C(x,t)

The following theorem holds [13] for the time derivative of the measure ν0 (C(x, t)). Proposition 2. Let the standard assumptions on the nucleation and growth rates be satisfied. Then ν0 (C(x, t)) is continuously differentiable with respect to t and  t  ∂ ν0 (C(x, t)) = G(x, t) dt0 dx0 K(x0 , t0 ; x, t)α(x0 , t0 ) (7) ∂t 0 Rd

148

V. Capasso

with K(x0 , t0 ; x, t) :=



{z∈Rd |τ (x0 ,t0 ;z)=t}

da(z)δ(z − x).

Here δ is the Dirac function, da(z) is a (d − 1)-surface element, and τ (x0 , t0 ; z) is the solution of the eikonal problem |

∂τ ∂τ 1 (x0 , t0 , x)| = (x0 , t0 , x) ∂x0 G(x0 , t0 ) ∂t0 |

∂τ 1 (x0 , t0 , x)| = , ∂x G(x, τ (x0 , t0 , x))

subject to suitable initial and boundary conditions. Let us suppose that the growth rate of a crystal depends only upon the point (x, t) under consideration, and not upon the age of the crystal, for example. In this case, under our modelling asssumptions, px (t) = P (x ∈ Θt ) = P (N (C(x, t)) = 0) = e−ν0 (C(x,t)) . Thanks to Proposition 2 , ν0 (C(x, t)) is continuously differentiable with respect to t, so that the hazard function, defined as the rate of capture by the process Θt, i.e. P (x ∈ Θt+∆t |x ∈ / Θt ) , h(x, t) = lim ∆t→0 ∆t is given by ∂ ∂ ln px (t) = ν0 (C(x, t)), ∂t ∂t is well defined; hence the time of capture τ (x) is an absolutely continuous random variable, having probability density function h(x, t) = −

fx (t) = px (t)h(x, t). Since fx (t) =

d ∂VV (x, t) (1 − px (t)) = dt ∂t

we immediately obtain ∂VV (x, t) = (1 − VV (x, t))h(x, t). ∂t This is an extension of the well known Avrami-Kolmogorov formula [4,35], proven for a very specific space and time homogeneous birth and growth process; instead our expression holds whenever a mean volume density and an hazard function are well defined.

Birth-and-Growth Processes

149

Consider now the extended birth-and-growth process which evolves in such a way that germs are born with birth rate α(x, t) and grains grow with growth rate G(x, t), independently of each other, i.e. ignoring overlapping of germs and grains; under the above mentioned regularity assumptions on these parameters, which make the free crystals absolutely continuous the following quantities are well defined a.e. Definition 4. We call mean extended volume density at point x and time t the quantity Vex (x, t) such that, for any B ∈ BR+ ,   d E[ ν (Θ(t; Xj , Tj ) ∩ B)] = Vex (x, t)ν d (dx). B

Tj di,j then dminj = di,j } } Algorithm for Skeletonization Process for 2D Image 1. Take the array M N ew from the region growing process. 2. For all elements in M N ew do N ew - Set the value of each element in array M to 1 if Mi,j = 0. 3. Calculate height field map (see figure 13 for 2D). 4. Start the thinning process For all elements in M N ew do { Old Old N ew = 0 then Mi,j = 1 else Mi,j =0 - If M(i,j)

Inverse Problem of Lindenmayer Systems on Branching Structures

177

} Set Oldpoint = 1, N ewpoint = 0, and Iteration = 0 While (Oldpoint = N ewpoint) do { • Oldpoint = N ewpoint, Iteration = Iteration + 1 • Set zero array C with size H×W. • For all elements in M N ew { - Compute the Hilditch’s algorithm - Count N ewpoint } // For } // While. 5. Remove some jagged points along the skeleton of branching structure and updating B. Array M N ew and set B are stored the information of branch structure. 6. Record the thickness information of the branch structure from calculating height field map. Algorithm for Skeletonization Process for 3D Volume Data In 3D volume data, we calculate skeleton by applying the peeling off process from outside and label the number of each layer. 1. Take the array M N ew from the volume growing process. 2. For all elements in M N ew do N ew = 0 for 3D. - Set the value of each element in array M to 1 if Mi,j,k 3. Calculate depth field value (DFV) of each voxel by peeling off process. For all elements in M N ew do { - Set the depth field value at position DF Vi,j,k = 1 - Count N p if depth field value DF Vi,j,k = 1 } Iteration = 1 While (N p > 0) do { • Iteration = Iteration + 1 • For all elements in M N ew { - Count the number of its 26-neighbors (Cp) if DF Vi,j,k = 1, - If Cp < 20 then set the depth field value of current voxel DF Vi,j,k = Iteration, } // For - Count N p if depth field value DF Vi,j,k = 1 } // While. 4. Construct the skeleton from the maximum DF Vi,j,k to the smaller value by replacing the sphere and removing the smaller spheres inside the bigger sphere and connect them to its neighbors,

178

S. Chuai-Aree et al.

5. Remove some jagged points along the skeleton of branching structure and updating B. Array M N ew and set B are stored the information of branch structure, 6. Record the thickness information of the branch structure from calculating depth field value.

7 Construction of Branching Structure After skeletonization process, the skeleton of branching structure is stored in array M new and set B. Since all the skeleton points are not connected to each other by the line connection yet, this section describes the algorithm to generate the network and remove some unnecessary points, which are on the same straight line. 7.1 Algorithm for Constructing the Branching Network 1. Initialize a stack S and start with a user supplied point for generating the network, 2. Calculate the nearest given starting point R in array M N ew and push point R into stack S, 3. Set point R as a root node of the branching structure T, 4. Set M old = M N ew for marking the path that was discovered, 5. While (stack S is not empty) do { Old = 0, • Pop a point Pi,j from the top of stack S and set the value of Mi,j • Look for all neighboring points of Pi,j with a value equal to 1, and push them to stack S (If there are no neighboring points of Pi,j , the point Pi,j will be set as terminated point.), • Set all neighboring points of Pi,j as children of Pi,j and mark the value of that point in array M Old to be 0, } // While 6. Finally, the network of branching structure T is constructed.

8 Resolution Reduction Up to now the network of branching structure T has been reconstructed, but it is still having a high resolution. In this section, we propose the algorithm to reduce the number of point in the network in algorithm 8.1. The L-string construction is given in algorithm 8.2. 8.1 Algorithm for Resolution Reduction 1. Consider the network of branching structure T,

Inverse Problem of Lindenmayer Systems on Branching Structures

179

2. Start the process at the root node R of branching structure T, 3. For all nodes in T do { • For every 3 nodes and each node has only one child, A is the parent of B and B is the parent of C, −−→ • If node A, B, and C are on the same line, then calculate the angle BA −−→ and BC, −−→ −−→ • If the angle between BA and BC ≤ (180 ± δ), then remove node B from T, where δ is the resolution angle for removal, } // For 4. Finally, the network of branching structure T is regenerated. 8.2 Algorithm for L-string Construction 1. 2. 3. 4. 5.

Read the network of branching structure T after reducing its resolution, Start the process at the root node T of branching structure T, Let A be the root node and B be the first child of root node A, −−→ − → Calculate the angle δR between unit vector j and AB, For all nodes in T which have parent node and at least one child do { • Let B be the current node and A be the parent of node B, −−→ • Calculate the vector AB and its length DAB , • If node B has more than one child then – Rearrange all children of node B in the order of left branches “+(δ)I”, right branches “-(δ)I”, and middle branches “I” (has no angle) for L-string preparation and exchange all children of node B in the same order left,right and middle, respectively • For all children of node B do { – Let C be the current child of node B, −−→ – Calculate the vector BC and its length DBC , −−→ −−→ – Calculate the angle δB between AB and BC, – If the angle δB > ǫ and node B has more than one child then print “[” for adding new branch, → n of – Calculate the angle δN between unit perpendicular vector − −−→ − → AB, where n is rotated 90 degrees clockwise, – If δN > 90 + ǫ degrees then print “+(δB )” else if δN < 90 − ǫ print “-(δB )”, where ǫ is a small angle, −−→ – Print the segment of BC in the form of “I(DBC )”, – If node C has no child and the angle δB > ǫ then print “]” for ending its branch } // For } // For 6. Finally, the L-string code of the network T is generated.

180

S. Chuai-Aree et al.

(a)

(b)

(c)

(d)

(e)

Figure 14. L-string construction: (a) input network from algorithm 8.1, (b), (c), (d) and (e) step by step of L-string construction of input network (a)

Figure 14 shows the L-string reconstruction from input network from algorithm 8.1 by applying the algorithm 8.2. The L-string of the input network after applying the algorithm 8.2 is “I(181.0)[−(45.5)I(185.2)]I(179.0)[+(43.68)I(162.6)]I(188)”. The given L-string starts with an internode with 181.0 pixel unit length, then draws a new branch rotated 45.5 degrees clockwise with 185.2 pixel unit length and closes its branch, then continues a new main stem with 179.0 pixel unit length, then draws a new branch rotated 43.68 degrees counter-clockwise with 162.6 pixel unit length, and finally draws a new main stem with 188 pixel unit length.

9 Experiment and Results In this section we show the result of some examples. Figure 15 illustrates the development of region growing from six given initial points. The black boundary regions represent the growing process until all branching structures are discovered. The skeletonization of clover plant is shown in figure 15 in the third row. The different resolution structures of the clover plant are shown in figure 16. The user define δ value will reduce the structure resolution of the network. Figure 17 illustrates the reconstruction process from an input image, de-noising, smoothing, region growing, skeletonization, network construction, and 3D object of a leaf network.

Inverse Problem of Lindenmayer Systems on Branching Structures

Input image

iter=70

skel iter=1

iter=10

iter=90

skel iter=5

iter=30

iter=120

skel iter=10

181

iter=50

iter=157

skel iter=15

Figure 15. Region growing process and skeletonization in branching structure of clover plant

Some Tree-like structures and their L-string codes are given in figure 18 by applying the algorithm 8.2. Each structure shows the number labels of the pixel node number. Figure 19 shows some results of reconstructed canola roots reconstructed by Dr. P. Kolesik in .MV3D file and a neuron structure of rat done by P. J. Broser in HOC file. They can be converted to L-string file format. Figure 20 shows the vascular aneurysm reconstruction with the different conditions of consideration during doing region growing method. Since the input image is not sharp enough, the smoothing process and anisotropic diffusion filtering can be applied as a preprocessing step before applying the region or volume growing method.

10 Conclusion and Further Works This paper provides the methods to reconstruct the branching structure from input images. It will be useful for further uses in bio-informatics and

182

S. Chuai-Aree et al.

(a)

(b)

(c)

(d)

Figure 16. Two resolution structures of clover plant with different δ value, (a) and (c) show the wire frame structure, (b) and (d) show 3D structure

Actinidia latifolia (Actinidiaceae)

Actinidia latifolia (Actinidiaceae)

Actinidia latifolia (Actinidiaceae)

Actinidia latifolia (Actinidiaceae)

Fig. 39.2

Fig. 39.2

Fig. 39.2

Fig. 39.2

opposite percurrent - 4⬚s cross

opposite percurrent - 4⬚s cross

opposite percurrent - 4⬚s cross

opposite percurrent - 4⬚s cross

Figure 17. The reconstruction with de-noising process from an input image, denoising, smoothing, region growing, skeletonization, network construction, and 3D object

Inverse Problem of Lindenmayer Systems on Branching Structures

(a)

(b)

(c)

(d)

183

(e)

Figure 18. Some Tree-like structures constructed in L-string codes: (a) I(132.0) [+(29.61)I(117.8)] [-(29.58)I(119.0)], (b) I(136) [-(43.1)I(111.0)] I(101), (c) I(102.0) [+(45.21)I(104.6)] I(145.0), (d) I(108)[+(46.05)I(115.2)] [-(45.6)I(117.3)] I(137), (e) I(94.00) [+(45.64)I(96.89)] [-(1.29)I(84.00)] [-(45.1)I(100.4)] I(84), the number labels in each network are the pixel node numbers

(a)

(b)

(c)

Figure 19. Some results of (a), (b) reconstructed roots and (c) neuron structure from P. J. Broser in HOC file

medical applications. One can use this method to observe the development of branching structures in biology, botany or medicine. The method allows the user to choose the resolution of the network since in many cases the high resolution of network can be ignored. This research also provides to work with the sliced images from medical and biological applications. The volume of branching structure can be discovered and represented in a 3D object. The user can adjust the parameter of each angle in branching structure. The L-string or production rules of L-systems can also be simply generated after reconstruction process using turtle interpretation based on bracketed L-system.

184

S. Chuai-Aree et al.

(a)

(b)

(c)

(d)

Figure 20. The vascular aneurysm and its reconstruction with different conditions and resolutions

Acknowledgments: The authors would like to thank Dr. Susanne Kr¨ omker at the Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, for her productive suggestion. We also would like to thank Dr. Peter Kolesik at the University of Adelaide for example of soil volume with canola roots, and Dr. Philip Julian Broser at the Max Planck Institute for Medical Research (MPI) in Heidelberg for his rat’s neuron reconstruction.

References [PM90]

Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7), 629–639 (1990) [AB94] Adams, R., Bischof L.: Seeded region growing. IEEE Trans. on PAMI, 16(7), 641–647 (1994) [Ash99] Ash, A., Ellis, B., Hickey, L. J., Johnson, K., Wilf, P., and Wing S. : Manual of Leaf Architecture- morphological description and categorization of dicotyledonous and net-veined monocotyledonous angiosperms by Leaf Architecture Working Group. (1999) [CJB05a] Chuai-Aree, S., J¨ ager, W., Bock, H. G., and Siripant, S.: Simulation and Visualization of Plant Growth Using Lindenmayer Systems. In

Inverse Problem of Lindenmayer Systems on Branching Structures

[CJB05b]

[DPS00]

[Mech96]

[Prus94]

[Rus95] [Set99] [Str04] [Tou97]

[Wei98]

185

H. G. Bock, E. Kostina, H. X. Phu and R. Rannacher (eds.): Modeling, Simulation and Optimization of Complex Processes, Springer-Verlag, pp. 115–126 (2005) Chuai-Aree, S., J¨ ager, W. Bock, H. G., and Siripant, S.: Reconstruction of Branching Structures Using Region and Volume Growing Method. International Conference in Mathematics and Applications (ICMA-MU 2005) (Bangkok, Thailand) (2005) Dimitrov, P., Phillips, C., Siddiqi, K.: Robust and Efficient Skeletal Graphs. Conference on Computer Vision and Pattern Recognition (Hilton Head, South Carolina) (2000) Mech, R., and Prusinkiewicz, P.: Visual models of plants interacting with their environment, Proceeings in Computer Graphics (SIGGRAPH’96), 397–410 (1996) Prusinkiewicz, P., Remphrey, W., Davidson, C., and Hammel, M.: Modeling the architecture of expanding Fraxinus pennsylvanica shoots using L-systems, Canadian Journal of Botany, 72, 701–714 (1994) Russ, J. C.: The Image Processing Handbook. CRC Press, (1995) Sethian, J. A.: Level Set Methods and Fast Marching Methods. Cambridge University Press, (1999) Strzodka, R., and Telea, A.: Generalized Distance Transforms and Skeletons in Graphics Hardware, The Eurographics Association (2004). Toussaint, G.: Hilditch’s Algorithm for Skeletonization. (1997) http://jeff.cs.mcgill.ca/ godfried/teaching/projects97/ azar/skeleton.html Weickert, J.: Anisotropic Diffusion in Image Processing. B.G. Teubner Stuttgart (1998)

3D Cloud and Storm Reconstruction from Satellite Image

Somporn Chuai-Aree (1,4), Willi Jäger (1), Hans Georg Bock (1), Susanne Krömker (1), Wattana Kanbua (2), and Suchada Siripant (3)

1 Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Im Neuenheimer Feld 368, 69120 Heidelberg, Germany; [email protected], [email protected], [email protected], [email protected]
2 Thai Meteorological Department, 4353 Sukhumvit Road, Bangna, Bangkok 10260, Thailand; watt_[email protected]
3 Advanced Virtual and Intelligent Computing (AVIC), Chulalongkorn University, Phayathai, Bangkok 10330, Thailand; [email protected]
4 Faculty of Science and Technology, Prince of Songkla University, Muang, Pattani 94000, Thailand; [email protected]

Abstract Satellite images of Asia are produced every hour by Kochi University, Japan (URL http://weather.is.kochi-u.ac.jp/SE/00Latest.jpg). They show the development of cloud and storm movement. A sequence of satellite images can easily be combined into an animation, but only as seen from the top view. In this paper, we propose a method to process the 2D satellite images so that they can be viewed from any perspective angle. The cloud or storm regions are analyzed, segmented and reconstructed as 3D clouds or storms based on the gray intensity of the cloud properties. The reconstruction result can be used in a warning system for areas at risk. Typhoon Damrey (September 25-27, 2005) and typhoon Kaitak (October 29 - November 1, 2005) are presented as case studies. Other satellite images can be processed with this approach as well.

1 Introduction

In recent years many storms have occurred around the world, especially in South East Asia and the United States. Even though the movement of a storm can be predicted and tracked step by step, catastrophes still happen. Warning systems have to prompt people to evacuate from risky areas to safe regions. In this paper we propose a method that uses visualization to motivate people to evacuate from the area of a storm. The satellite images are captured in time steps of one hour; Fig. 1 shows a typical 2D image from the top view.


The reconstruction of those satellite images into a 3D image of cloud and storm is important for viewing from any perspective. Image processing for cloud and storm segmentation can be applied as a filter before combining the filtered storm with earth topography data. In this paper we use the satellite images from Kochi University, Japan, as a case study. For cloud segmentation, detection, tracking, extraction and classification, many methods have been proposed: Tian et al. studied cloud classification with neural networks using spectral and textural features in [TSA99], Visa et al. proposed a neural-network-based cloud classifier in [VIVS95], and Hong et al. used an Artificial Neural Network (ANN) for a cloud classification system in [HHGS05]. Griffin et al. applied Principal Component Analysis (PCA) for characterizing and delineating plumes, clouds and fires in hyperspectral images in [GHBS00]. A fuzzy method was used by Hetzheim for characterizing clouds and their heights by texture analysis of multi-spectral stereo images in [Het00]. Kubo et al. extracted clouds in the Antarctic using wavelet analysis in [KKM00]. Welch et al. classified cloud fields based upon high spatial resolution textural features in [Wel88]. Yang et al. used wavelets to detect cloud regions in sea surface temperature images by combining data from NOAA polar-orbiting and geostationary satellites in [YWO00]. Mukherjee et al. tracked cloud regions by scale space classification in [MA02]. In this paper, we propose two new techniques for image segmentation of cloud and storm: one uses the color difference of the cloud property, the other segments on a 2D histogram of intensity against gradient length. From Fig. 1 we can see the cloud and storm regions which need to be segmented. The main purpose of this paper is to convert the 2D satellite image of Fig. 2 (left image) into the 3D image of Fig. 2 (right image) of cloud and storm as virtual reality, using a given virtual height. The paper is organized as follows: Section 2 presents the satellite image and its properties, and Section 3 the segmentation of cloud and storm. Section 4 describes the volume rendering by sliced reconstruction. The visualization methods and animations are shown in Section 5. Finally, the conclusion and further work are given in Section 6.

2 Satellite Image and Its Properties

The color values of cloud and storm regions are mostly gray. They can be seen clearly when their intensities are high. In the color satellite image, some regions with thin cloud layers lie over the land and islands, which shifts the cloud color from gray-scale to some color deviations, as shown inside the red circle in Fig. 3.


Figure 1. 2D satellite image on September 9, 2005 at 10:00GMT

Figure 2. The conversion of 2D satellite image to 3D surface

In this paper we use the satellite images from MTSAT-IR IR1 JMA, Kochi University, Japan, at URL http://weather.is.kochi-u.ac.jp/SE/00Latest.jpg (latest file). The satellite image is a combination of the cloud satellite image and a background topography image from NASA. Fig. 3 shows the cloud color, which varies in gray from black (intensity value = 0) to white (intensity value = 255). The background consists of the land, which varies from green to red, and the ocean, which is blue. Cloud regions are distributed everywhere over the background.


Figure 3. Satellite image on September 23, 2005 at 21:00GMT

3 Cloud and Storm Segmentation

This section describes two methods for cloud and storm segmentation. In the first method we define two parameters for segmenting the cloud region from the ocean and earth, namely Cdv (Color Difference Value) and Ccv (Cloud Color Value). The second method performs the segmentation by gradient length and pixel intensity.

3.1 Image Segmentation by Color Difference and Color Value

Let I be a set of input images with width W and height H, P be the set of pixels of an image in I, B be the set of background pixels, C be the set of cloud or storm pixels, and p_{i,j} be the pixel in row i and column j. The pixel p_{i,j} consists of four elements, namely red (RR), green (GG) and blue (BB) for the color image, and gray (YY). The sets are given in equation (1):

P = {p_{i,j} | (0 ≤ p_{i,j} ≤ 255) ∧ (1 ≤ i ≤ W) ∧ (1 ≤ j ≤ H)}
p_{i,j} = {RR_{i,j}, GG_{i,j}, BB_{i,j}, YY_{i,j} | RR_{i,j}, GG_{i,j}, BB_{i,j}, YY_{i,j} ∈ [0, 255]}
C = {p_{i,j} ∈ P | (|RR_{i,j} − GG_{i,j}| ≤ Cdv) ∧ (|GG_{i,j} − BB_{i,j}| ≤ Cdv) ∧ (|RR_{i,j} − BB_{i,j}| ≤ Cdv)
        ∧ (YY_{i,j} ≥ Ccv) ∧ (0 ≤ Ccv ≤ 255) ∧ (0 ≤ Cdv ≤ 255)}        (1)
P = B ∪ C


The pixel p_{i,j} of the color image is transformed to gray-scale (YY_{i,j}) by equation (2):

YY_{i,j} = Round(0.299 · RR_{i,j} + 0.587 · GG_{i,j} + 0.114 · BB_{i,j}),   YY_{i,j} ∈ {0, 1, 2, ..., 255}        (2)

The gray-scale value is used to test all pixels in P. Each pixel has a red, green, blue and gray channel. The gray value and the differences between red and green, green and blue, and red and blue are checked against the specified parameters to decide whether a pixel belongs to the group of cloud pixels.

Algorithm for checking cloud pixels
For every pixel p_{i,j} in P, the differences between red and green, green and blue, and red and blue must be bounded by the value Cdv, and the gray-scale value must be greater than or equal to the parameter Ccv. If all these conditions are true, the current pixel p_{i,j} is accepted as a cloud pixel in C. The algorithm is given below; a NumPy sketch of the same test is given at the end of this subsection.

For all pixels do {
    Calculate the gray value YY_{i,j} from p_{i,j} with equation (2)
    Define the two parameters Cdv and Ccv
    The pixel is cloud if all of the following conditions are true:
        (|RR_{i,j} − GG_{i,j}| ≤ Cdv) and (|GG_{i,j} − BB_{i,j}| ≤ Cdv) and
        (|RR_{i,j} − BB_{i,j}| ≤ Cdv) and (YY_{i,j} ≥ Ccv)
}

Fig. 4 compares the results for different values of the two parameters Cdv and Ccv. The values of Cdv and Ccv are 50 and 140 for the first row and 70 and 100 for the second row, respectively. Fig. 4(a) and 4(c) show the segmented cloud and storm regions; 4(b) and 4(d) show the backgrounds of 4(a) and 4(c), respectively. A second example, for a world satellite image, is shown in Fig. 5. Fig. 5(a) and 5(d) are the (identical) input images. Fig. 5(b) and 5(c) are segmented with the parameters Cdv = 106, Ccv = 155; Fig. 5(e) and 5(f) are the output for the parameters Cdv = 93, Ccv = 134. Figs. 4 and 5 show that the parameters Cdv and Ccv affect the size and shape of the cloud and storm regions. A bigger value of Cdv captures a wider range of the cloud region; the result also depends on the parameter Ccv.
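As a minimal illustration, the test above maps directly onto a few NumPy array operations; the function below and its argument names are our own, not part of the original implementation:

import numpy as np

def segment_clouds(rgb, cdv, ccv):
    # A pixel is cloud (set C of equation (1)) if its three channel differences
    # are all <= Cdv and its gray value (equation (2)) is >= Ccv.
    rgb = rgb.astype(np.int32)
    rr, gg, bb = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    yy = np.round(0.299 * rr + 0.587 * gg + 0.114 * bb)
    near_gray = ((np.abs(rr - gg) <= cdv) & (np.abs(gg - bb) <= cdv)
                 & (np.abs(rr - bb) <= cdv))
    return near_gray & (yy >= ccv)

For instance, segment_clouds(image, 50, 140) corresponds to the first row of Fig. 4 and segment_clouds(image, 70, 100) to the second row.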


Figure 4. Segmented cloud and storm from Fig. 1, (a) and (b) by Cdv = 50, Ccv = 140, (c) and (d) by Cdv = 70, Ccv = 100

3.2 Image Segmentation by Gradient Length and Its Intensity

Our second method computes the gradient length and intensity of each pixel and performs the segmentation on a 2D histogram. The method transforms the input image into a 2D histogram of gradient length and intensity. Let ∇p_{i,j} be the gradient of a pixel p_{i,j}. The gradient length is given by equation (3):

‖∇p_{i,j}‖ = sqrt( (p_{i+1,j} − p_{i−1,j})² + (p_{i,j+1} − p_{i,j−1})² )
‖∇p‖_max = max_{i,j} ‖∇p_{i,j}‖,   ‖∇p‖_min = min_{i,j} ‖∇p_{i,j}‖        (3)

The 2D histogram is plotted with the gradient length on the vertical axis and the intensity on the horizontal axis. The size of the histogram is set to 255 × 255, since the intensity of each pixel is mapped onto the horizontal axis and the gradient length of each pixel is mapped onto the vertical axis. Let Ω be the set of histogram points, h_{m,n} be the frequency of the intensity and gradient length position at the point (m, n), where 0 ≤ m ≤ 255 and 0 ≤ n ≤ 255, h_max be the maximum frequency over all histogram points, α be a multiplying factor for mapping all frequencies onto the 2D plane, ρ(h_{m,n}) be the intensity of the plotted point (m, n) on the histogram Ω, and p_max and p_min be the maximum and minimum intensity values of all pixels in P. The intensity position m and the gradient length position n are computed by equation (4).


Figure 5. Segmented cloud and storm (a) and (b) by Cdv = 106, Ccv = 155, (c) and (d) by Cdv = 93, Ccv = 134

p_max = max_{i,j} { p_{i,j} },   p_min = min_{i,j} { p_{i,j} }
m = Round( 255 · (p_{i,j} − p_min) / (p_max − p_min) )
n = Round( 255 · (‖∇p_{i,j}‖ − ‖∇p‖_min) / (‖∇p‖_max − ‖∇p‖_min) )
h_max = max_{m,n} { h_{m,n} }
α = 255 / Log_10(h_max)
ρ(h_{m,n}) = α · Log_10(h_{m,n})        (4)
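For illustration, the histogram of equations (3)-(4) can be built as follows; this NumPy sketch uses 256 bins so that the indices 0..255 fit, and the function name is ours:

import numpy as np

def intensity_gradient_histogram(gray):
    # Map every interior pixel to (intensity position m, gradient length position n)
    # and accumulate the log-scaled frequencies rho(h_{m,n}) of equation (4).
    p = gray.astype(np.float64)
    gx = p[2:, 1:-1] - p[:-2, 1:-1]            # p_{i+1,j} - p_{i-1,j}
    gy = p[1:-1, 2:] - p[1:-1, :-2]            # p_{i,j+1} - p_{i,j-1}
    glen = np.sqrt(gx ** 2 + gy ** 2)          # gradient length, equation (3)
    inner = p[1:-1, 1:-1]
    rng_p = float(inner.max() - inner.min()) or 1.0
    rng_g = float(glen.max() - glen.min()) or 1.0
    m = np.round(255 * (inner - inner.min()) / rng_p).astype(int)
    n = np.round(255 * (glen - glen.min()) / rng_g).astype(int)
    h = np.zeros((256, 256))
    np.add.at(h, (n, m), 1)                    # frequencies h_{m,n}
    alpha = 255.0 / np.log10(h.max()) if h.max() > 1 else 255.0
    return np.where(h > 0, alpha * np.log10(np.maximum(h, 1.0)), 0.0)

A segmentation mask is then obtained by marking a region of (m, n) cells on this histogram and keeping the pixels whose (intensity, gradient length) pair falls into the marked cells, as in Fig. 6.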

Fig. 6 shows the transformation of a gray-scale image into the 2D histogram. All points of the gray-scale image are mapped onto the 2D histogram, which is indexed by gradient length and intensity. The segmented region (the white region) on the histogram marks the area selected for segmenting the gray-scale image, and the segmented result is shown as a white region in the image. The rectangles on the 2D histogram correspond to the segmented regions of the input image. The segmentation is carried out on the 2D histogram, and its result is transferred to the output image (right images). Fig. 7 compares the cloud and storm segmentation of a gray-scale and a color image using the same segmented region of the histogram. The segmented results, shown in the right column, are nearly identical.


Figure 6. The transformation of gray-scale image to 2D histogram and its segmentation

Figure 7. The comparison of cloud and storm segmentation of the same segmented region: input images (left), 2D histograms (middle) and output images (right), grayscale segmentation (upper row), color segmentation (lower row)


The middle column shows the different intensity distributions of the histograms of the gray-scale and the color image, since for the color image the operation is carried out on the red, green and blue channels.

4 Volume Rendering by Sliced Reconstruction

In this section the volume rendering method is described. We exploit OpenGL (Open Graphics Library) by using an alpha-cut value. Each satellite image is converted into N slices using different alpha-cut values, from a minimum alpha-cut value (ground layer) to a maximum alpha-cut value (top layer). The alpha-cut value is a real value in [0, 1]. Fig. 8 shows the structure of the sliced layers of a satellite image.

Figure 8. 2D surfaces for volume rendering of cloud and storm reconstruction

4.1 Volume Rendering Algorithm

1. Define the number of sliced layers (N) and the cloud layer height (CloudLayerH).
2. Define the virtual cloud height (CloudHeight) value and the unit cell size (κ).
3. For all sliced layers do
   a) define the cloud density (CloudDens) of each layer,
   b) define the alpha-cut value of the current layer,


   c) draw a rectangle with texture mapping of the satellite image using its alpha-cut value.

The source code for volume rendering by sliced images is given below:

CloudLayerH := 0.01;
for i := 1 to N do
begin
  { density-dependent base transparency of this layer }
  CloudDens := 0.5*(100 - CloudDensity)/100;
  { discard fragments whose alpha is below the per-layer alpha-cut value }
  glAlphaFunc(GL_GREATER, CloudDens + (1 - CloudDens)*i/N);
  glNormal3f(0, 1, 0);
  glBegin(GL_QUADS);
    glTexCoord2f(0, 0); glVertex3f(-50*κ, CloudLayerH*CloudHeight + 0.1*i/N, -50*κ);
    glTexCoord2f(w, 0); glVertex3f( 50*κ, CloudLayerH*CloudHeight + 0.1*i/N, -50*κ);
    glTexCoord2f(w, h); glVertex3f( 50*κ, CloudLayerH*CloudHeight + 0.1*i/N,  50*κ);
    glTexCoord2f(0, h); glVertex3f(-50*κ, CloudLayerH*CloudHeight + 0.1*i/N,  50*κ);
  glEnd();
end;

In the source code, the value 50 is the specific size of each polygon. The polygon of each layer is drawn on the XZ-plane through the corners (-50κ, -50κ), (50κ, -50κ), (50κ, 50κ) and (-50κ, 50κ), respectively.

5 Visualization and Animation

This section explains the visualization techniques and the results for the two case studies, typhoon Damrey and typhoon Kaitak. This paper proposes two methods for visualization. The first method combines the segmented cloud and storm regions from the segmentation process with real topography (Etopo2). In the second method a fully virtual height of cloud and earth is applied. The end user can select either visualization.

5.1 Visualization Using Etopo Data

Real satellite topography data, namely Etopo2 (a 2-minute grid, ≈ 3.7 km near the equator), can be retrieved from NOAA; it is the highest resolution available for Asia. Fig. 9 (left) shows the spherical map of the world using Etopo10 (10 minutes); the case study region with Etopo2 is shown in Fig. 9 (right).


Figure 9. 3D topography of world (Etopo10) and earth (Etopo2)

Visualization Procedure

1. Read the target region of the Etopo2 data for the 3D surface of the earth.
2. Calculate the average normal vector of each grid point (see the sketch below).
3. For all time steps do
   a) read the satellite input images of the target period of time (every hour),
   b) apply the segmentation method for cloud and storm filtering,
   c) draw the Etopo2 surface of the earth and all sliced layers of cloud with their virtual height,
   d) apply the light source to the average normal vectors of all objects.
4. Show the animation of all time steps.

Fig. 10 shows the result for a fixed satellite image of typhoon Damrey in different perspectives using Etopo2. The virtual height of the filtered cloud and storm is defined by the user.
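One way to realize step 2 is a central-difference normal on the regular height grid, which approximates averaging the facet normals around each grid point; the following NumPy sketch assumes a height array and grid spacings of our choosing:

import numpy as np

def grid_normals(height, dx=1.0, dy=1.0):
    # Unit surface normals at the interior points of a regular height field,
    # obtained from central differences of the elevation values.
    dzdx = (height[2:, 1:-1] - height[:-2, 1:-1]) / (2.0 * dx)
    dzdy = (height[1:-1, 2:] - height[1:-1, :-2]) / (2.0 * dy)
    n = np.stack([-dzdx, -dzdy, np.ones_like(dzdx)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

These normals are handed to the light source in step 3d to shade the Etopo2 surface.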


Figure 10. The typhoon Damrey from different perspectives

5.2 Visualization Using Fully Virtual Height

In this method, each satellite image is mapped onto the whole volume for all layers with a given maximum virtual height. The alpha-cut value of the intermediate layers is interpolated. The topography of earth and ocean is the result of the filtering process applied to each satellite image.

Visualization Procedure

1. Define the maximum virtual height value.
2. For all time steps do
   a) read the satellite input images of the target period of time (every hour),
   b) apply the segmentation method for cloud and storm filtering,
   c) draw all sliced layers of cloud with their virtual height,
   d) apply texture mapping to all slices.
3. Show the animation of all time steps.

The result of this technique is shown in Fig. 11 in different perspectives. The filtering process gives a smooth result for the ground layers and the cloud layers. The alpha-cut value is applied to all slices using the algorithm of Section 4.1.

5.3 Visualization of Numerical Results

In order to compare the behavior of the storm movement, this paper also presents numerical results for these two typhoons obtained with the MM5 weather model. The meteorological simulation of this study is carried out using the nonhydrostatic version of the MM5 Mesoscale Model from NCAR/PSU (National Center for Atmospheric Research/Pennsylvania State University) described in [Dud93] and [GDS94].


Figure 11. The typhoon Kaitak from different perspectives

The model has been modified for parallel execution using an MPI version. MM5 Version 3 Release 7 (MM5v3.7) was compiled with PGI version 6.0 and run on the Linux TLE 7.0 platform. The calculations were performed for the first three days of the typhoon Damrey (September 25-28, 2005) and typhoon Kaitak (October 28 - November 1, 2005) periods. The central latitude and longitude of the coarse domain were 13.1 degrees North and 102.0 degrees East, respectively, and the Mercator map projection was used. The vertical resolution of 23 pressure levels increases progressively towards the surface. The 3D storm reconstruction and the numerical solution of the typhoon Damrey movement are shown in Fig. 12 and Fig. 13 for every 6 hours, starting from September 25, 2005 at 01:00GMT (left to right, top to bottom) to September 26, 2005 at 07:00GMT. The cloud volume is calculated by the marching cubes method from a given iso-surface value. The second 3D storm reconstruction and the numerical result, for typhoon Kaitak, for every 6 hours starting from October 29, 2005 at 02:00GMT (left to right, top to bottom) to October 30, 2005 at 08:00GMT, are shown in Fig. 14 and Fig. 15, respectively.


The numerical results were visualized by our software, VirtualWeather3D, which runs on the Windows operating system.

Figure 12. The 3D storm reconstruction of typhoon Damrey every 6 hours starting from September 25, 2005 at 01:00GMT (left to right, top to bottom)


Figure 13. The numerical result of typhoon Damrey every 6 hours starting from September 25, 2005 at 01:00GMT (left to right, top to bottom)


Figure 14. The 3D storm reconstruction of typhoon Kaitak for every 6 hours starting from October 29, 2005 at 02:00GMT (left to right, top to bottom)


Figure 15. The numerical result of typhoon Kaitak for every 6 hours starting from October 29, 2005 at 02:00GMT (left to right, top to bottom)


6 Conclusion and Further Works

Figure 16. The 3D reconstruction of hurricane Katrina: input image (from NASA) (a) in different perspectives

This paper has proposed a methodology for reconstructing cloud and storm from satellite images by converting them into a 3D volume rendering, which can be useful for warning systems. Two methods for cloud and storm segmentation are described, one using the parameters Cdv and Ccv and one using the histogram of gradient length and intensity. For the visualization, two methods are shown, one using the Etopo2 data and one using a fully virtual height given by the end user. The method can be used for any kind of satellite image, both gray-scale and color. Further examples, the hurricane Katrina approaching New Orleans on August 28, 2005 and the hurricane Kyrill over Europe on January 18, 2007, are shown in Fig. 16 and Fig. 17, respectively. The virtual height parameter can be adjusted by the end user as a maximum virtual height.


Figure 17. The 3D storm reconstruction of hurricane Kyrill every 6 hours starting from January 18, 2007 at 01:00GMT (left to right, top to bottom)

The numerical results from VirtualWeather3D show the movement of the cloud and storm volume up to the real height of each pressure level. The software supports the visualization of both the satellite images and the numerical results from the MM5 model. The animation results can be captured for every time step. In further work, the combination of predicted wind speed and direction will be applied to the satellite images.

7 Acknowledgment

The authors wish to thank the EEI-Laboratory at Kochi University for all satellite images, NASA for the input image in Fig. 16, and the two meteorologists Mr. Somkuan Tonjan and Mr. Teeratham Tepparat at the Thai Meteorological Department in Bangkok, Thailand, for their kindness in executing the MM5 model. Finally, the authors would like to thank the National Geophysical Data Center (NGDC), NOAA Satellite and Information Service, for the earth topography (ETOPO) data.

References

[Dud93] Dudhia, J.: A nonhydrostatic version of the Penn State-NCAR mesoscale model: Validation tests and simulation of an Atlantic cyclone and cold front. Mon. Wea. Rev., 121, 1493-1513 (1993)
[GDS94] Grell, G., Dudhia, J., Stauffer, D.: A description of the fifth-generation Penn State/NCAR Mesoscale Model. NCAR Tech. Note NCAR/TN-398+STR (1994)
[GHBS00] Griffin, M. K., Hsu, S. M., Burke, H. K., Snow, J. W.: Characterization and delineation of plumes, clouds and fires in hyperspectral images. In: Stein, T. I. (ed.) Proc. 2000 IEEE International Geoscience and Remote Sensing Symposium, II, Piscataway: IEEE, 809-812 (2000)
[Het00] Hetzheim, H.: Characterisation of clouds and their heights by texture analysis of multi-spectral stereo images. In: Stein, T. I. (ed.) Proc. 2000 IEEE International Geoscience and Remote Sensing Symposium, V, Piscataway: IEEE, 1798-1800 (2000)
[HHGS05] Hong, Y., Hsu, K., Gao, X., Sorooshian, S.: Precipitation estimation from remotely sensed imagery using an Artificial Neural Network Cloud Classification System. Journal of Applied Meteorology, 43(12), 1834-1853 (2005)
[KKM00] Kubo, M., Koshinaka, H., Muramoto, K.: Extraction of clouds in the Antarctic using wavelet analysis. In: Stein, T. I. (ed.) Proc. 2000 IEEE International Geoscience and Remote Sensing Symposium, V, Piscataway: IEEE, 2170-2172 (2000)
[MA02] Mukherjee, D. P., Acton, S. T.: Cloud tracking by scale space classification. IEEE Trans. Geosci. Rem. Sens., GE-40(2), 405-415 (2002)
[TSA99] Tian, B., Shaikh, M. A., Azimi-Sadjadi, M. R., Von der Haar, T. H., Reinke, D. L.: A study of cloud classification with neural networks using spectral and textural features. IEEE Trans. Neural Networks, 10, 138-151 (1999)
[VIVS95] Visa, A., Iivarinen, J., Valkealahti, K., Simula, O.: Neural network based cloud classifier. Proc. International Conference on Artificial Neural Networks (ICANN'95) (1995)
[Wel88] Welch, R. M., et al.: Cloud field classification based upon high spatial resolution textural features (I): Gray level co-occurrence matrix approach. J. Geophys. Res., 93, 12663-12681 (1988)
[YWO00] Yang, Z., Wood, G., O'Reilly, J. E.: Cloud detection in sea surface temperature images by combining data from NOAA polar-orbiting and geostationary satellites. In: Stein, T. I. (ed.) Proc. 2000 IEEE International Geoscience and Remote Sensing Symposium, V, Piscataway: IEEE, 1817-1820 (2000)

Providing Query Assurance for Outsourced Tree-Indexed Data

Tran Khanh Dang and Nguyen Thanh Son

Faculty of Computer Science and Engineering, HCMC University of Technology, National University of Ho Chi Minh City, Vietnam; {khanh, sonsys}@cse.hcmut.edu.vn

Abstract Outsourcing database services is emerging as an important new trend thanks to the continued development of the Internet and advances in networking technology. In this outsourced database service model, organizations rely upon the premises of an external service provider for the storage and retrieval management of their data. Since a service provider is typically not fully trusted, this model introduces numerous interesting research challenges. Among them, the most crucial security research questions relate to (1) data confidentiality, (2) user privacy, (3) data privacy, and (4) query assurance. Although there exists a number of research works on these topics, to the best of our knowledge, none of them has dealt with ensuring query assurance for outsourced tree-indexed data. To address this issue, the system must prove authenticity and data integrity, completeness and, not less importantly, provide freshness guarantees for the result set. These objectives imply that (1) data in the result set originated from the actual data owner and has not been tampered with; (2) the server did not omit any tuples matching the query conditions; and (3) the result set was generated with respect to the most recent snapshot of the database. This is not a trivial task, especially as tree-based index structures are outsourced to untrusted servers. In this paper, we discuss and propose solutions to these security issues in order to provide query assurance for outsourced databases that come together with tree-based index structures. Our techniques allow users to operate on their outsourced tree-indexed data on untrusted servers with high query assurance and at reasonable cost. Experimental results with real datasets confirm the efficiency of our approach and the theoretical analysis.

1 Introduction

Outsourcing database services is emerging as an important new trend thanks to the continued growth of the Internet and advances in networking technology. In the outsourced database service (ODBS) model, organizations rely on the premises of an external service provider, which include hardware, software and manpower, for the storage and retrieval management of their data. Since a service provider is typically not fully trusted, this model raises


numerous interesting research challenges related to security issues. First of all, because the life-blood of every organization is the information stored in its databases, making the outsourced data confidential is one of the foremost challenges in this model. In addition, privacy-related concerns must be taken into account due to their important role in real-world applications. Not less importantly, in order to make the outsourced database service viable and really applicable, the quality of query results must also be provable. This means that the system has to provide users with some means to verify the query assurance claims of the service provider. Overall, the most crucial security-related research questions in the ODBS model relate to the following issues:



• Data confidentiality: Outsiders and the server's operators (database administrators - DBAs) cannot see the user's outsourced data contents in any case (even while the user's queries are performed on the server).
• User privacy: Users do not want the server or even the DBA to learn about their queries and the results. Ensuring user privacy is one of the keys to the ODBS model's success.
• Data privacy: Users are not allowed to get more information than what they are querying on the server. In many situations, users must pay for what they get from the server and the data owner does not allow them to get more than what they have paid for; or users may not even want to pay for what they do not need because of low-bandwidth connections, limited memory/storage devices, etc. This security objective is not easy to obtain, and a cost-efficient solution to this issue is still an open question [Dan06b].
• Query assurance: Users are able to verify the correctness (authenticity and data integrity), completeness and freshness of the result set. We succinctly explain these concepts as follows (more discussions can be found in [NaT06, MNT04, BGL+03, PJR+05, PaT04, Sio05]):
  – Proof of correctness: As a user queries outsourced data, it expects a set of tuples satisfying all query conditions and also needs assurance that the data returned from the server originated from the data owner and has not been tampered with, either by an outside attacker or by the server itself.
  – Proof of completeness: As a user queries outsourced data, completeness implies that the user can verify that the server returned all tuples matching all query conditions, i.e., the server did not omit any tuples satisfying the query conditions. Note that a server, which is either malicious or lazy, might not execute the query over the entire database and return no or only partial results. Ensuring the completeness of the query result aims to detect this unexpected behavior.
  – Proof of freshness: The user must be assured that the result set was generated with respect to the most recent snapshot of the database.


This issue must be addressed in order to facilitate dynamic outsourced databases, which frequently have updates on their data.

The above security requirements differ from the traditional database security issues [CFM+95, Uma04] and will in general influence the performance, usability and scalability of the ODBS model. Although there exists a number of research works on the above topics, such as [DuA00, HIL+02, BoP02, DVJ+03, LiC04, ChM04, Dan06a, Dan06b], to the best of our knowledge, none of them has dealt with the problem of ensuring query assurance for outsourced tree-indexed data. It has been clearly shown in the literature that tree-indexed data plays an important role in both traditional and modern database applications [Dan03]. Therefore, the query assurance issues for outsourced tree-indexed data need to be addressed completely in order to materialize the ODBS model. This is not a trivial task, especially as tree-based index structures are outsourced to untrusted servers [DuA00, Dan05]. In this paper, we discuss and propose solutions to these security issues in order to provide query assurance for dynamic outsourced databases that come together with tree-based index structures. Our techniques allow users to operate on their outsourced tree-indexed data on untrusted servers with high query assurance and at reasonable cost. Our proposed solutions address all three desired security properties of query assurance. Moreover, as presented in [DuA00, MNT04, Dan06b], there are several ODBS models depending on the desired security objectives. In this paper, however, we focus on the most basic and typical ODBS model, where only the data confidentiality, user privacy and query assurance objectives need to be taken into account. Our holistic solution allows users to manipulate their outsourced data as if it were stored in an in-house database server.

The rest of this paper is organized as follows: Section 2 briefly summarizes the main related work; Section 3 introduces a state-of-the-art approach to managing outsourced tree-indexed data without query assurance; in Section 4, we present our contributions to completely solve the problem of query assurance for dynamic outsourced tree-indexed data; Section 5 shows experimental results with real datasets in order to establish the practical value of our proposed solutions; and, finally, Section 6 gives conclusions and future work.

2 Related Work

Although various theoretical problems concerning computation with encrypted data and searching on encrypted data have appeared in the literature [Fon03], the ODBS model, which heavily depends on data encryption methods, emerged only recently [DuA00, HMI02, Dan06b]. Even so, it has rapidly gained special attention from the research community due to the variety of conveniences it brings in as well as the interesting related research challenges [Dan05]. The foremost research challenge relates to the security objectives


for the model as introduced in Section 1. In Figure 1 we diagrammatically summarize security issues in the ODBS model, together with major references to the corresponding state-of-the-art solutions.

[Figure 1 summarizes the security issues in the ODBS model [DuA00, HMI02, Dan06b] with the corresponding references: Confidentiality [BoP02, DVJ+03, Dan05]; Privacy, divided into User Privacy [HIL+02, LiC04, ChM04, Dan06a] and Data Privacy [GIK+98, DuA00, Dan06b]; Query Assurance, divided into Correctness [BGL+03, MNT04, PaT04, PJR+05, NaT06, Sio05, this paper], Completeness [PJR+05, NaT06, Sio05, this paper] and Freshness [this paper]; and Auditing [BDW+04, Dan06b].]

Figure 1. Security issues in the ODBS model

As shown in Figure 1, most security objectives of the ODBS model have been investigated. To deal with the data confidentiality issue, most approaches encrypt the (outsourced) data before it is stored at the external server [BoP02, DVJ+03, Dan05]. Although this solution can protect the data from outsiders as well as from the server, it introduces difficulties in the querying process: it is hard to ensure user and data privacy when performing queries over encrypted data. In general, to address the privacy issue (including both user and data privacy), the outsourced data structures (tree- or non-tree-based) that are employed to manage the data storage and retrieval should be considered. Notably, the problem of user privacy has been quite well solved (even without special hardware [SmS01]) if the outsourced database contains only encrypted records and no tree-based indexes are used for the storage and retrieval purposes (see [Dan06b] for an overview). However, the results are less satisfactory when such trees are employed, although some proposals have been made, e.g., [LiC04, Dan06b]. In our previous work [Dan06b], we proposed an extreme protocol for the ODBS model based on private information retrieval (PIR)-like protocols [Aso01]. It would, however, become prohibitively expensive if only one server is used to host the outsourced data [CGK+95]. In [DVJ+03], Damiani et al. also gave a solution to querying outsourced data indexed by B+-trees, but their approach does not provide an oblivious way to traverse the tree, and this may compromise the security objectives [LiC04, Dan06a]. Of late, Lin and Candan [LiC04] introduced a computational complexity approach to solve the problem, with sound experimental results reported. Their solution, however, only supports oblivious search operations on outsourced search trees, but not insert, delete and modify operations. That means their solution cannot


be applied to dynamic outsourced search trees, where items may be inserted and removed, or existing data may be modified. In our very recent work [Dan06a], we analyzed and introduced techniques to completely solve the problems of data confidentiality and user privacy, but not query assurance, in the ODBS model with support for dynamic tree-indexed data. In Section 3 we will elaborate on these techniques and extend them in order to deal with the three security objectives of query assurance mentioned above.

Contrary to user privacy, although there are initial research activities, see [GIK+98, DuA00, Dan06b], the problem of data privacy still needs much more attention. In [GIK+98], Gertner et al. considered the data privacy issue for the first time in the context of PIR-like protocols and proposed the so-called SPIR (Symmetrically PIR) protocol in order to prevent users from knowing more than the answers to their queries. Unfortunately, such PIR-based approaches cannot be applied to the ODBS model because the data owners in PIR-like protocols are themselves the database service providers. In [DuA00], Du and Atallah introduced protocols for secure remote database access with approximate matching with respect to four different ODBS models requiring different security objectives among those presented in the previous section. Even so, their work did not support outsourced tree-indexed data. In our recent work [Dan06b] we presented a solution to ensuring data privacy in the ODBS model, which can be applied to tree-indexed data as well. Nevertheless, our proposed solution must resort to a trusted third party, which is not easy to find in practice.

Recently, addressing the three issues of query assurance has also attracted many researchers and, as a result, a number of solutions have been proposed, e.g., [BGL+03, MNT04, PaT04, PJR+05, NaT06, Sio05]. We must note, however, that none of them has given a solution to the problem of guaranteeing the freshness of the query result (cf. Figure 1). To prove the correctness of a user's query results, the state-of-the-art approaches [BGL+03, MNT04, PaT04, Sio05] employ an aggregated/condensed digital signature scheme to reduce the communication and computation costs. First, Boneh et al. [BGL+03] introduced an interesting aggregated signature scheme that allows the aggregation of multiple signers' signatures generated from different messages into one short signature, based on elliptic curves and bilinear mappings. This scheme was built on a "Gap Diffie-Hellman" group, where the Decisional Diffie-Hellman problem is easy while the Computational Diffie-Hellman problem is hard [JoN01]. Despite the big advantage that this scheme can be applied to different ODBS models, it suffers from a performance disadvantage: as shown in [MNT04], its computational complexity is quite high for practical use in many cases. Second, in [MNT04] Mykletun et al. introduced an RSA-based condensed digital signature scheme that can be used for ensuring authenticity and data integrity in the ODBS model. Their scheme is concisely summarized as follows.

Condensed-RSA Digital Signature Scheme: Suppose pk = (n, e) and sk = (n, d) are the public and private keys, respectively, of the RSA signature


scheme, where n is a k-bit modulus formed as the product of two k/2-bit primes p and q. Let φ(n) = (p − 1)(q − 1); both the public and private exponents e, d ∈ Z_n^* must satisfy ed ≡ 1 mod φ(n). Given t different messages {m_1, ..., m_t} and their corresponding signatures {s_1, ..., s_t} generated by the same signer, a condensed-RSA signature is computed as

s_{1,t} = Π_{i=1}^{t} s_i mod n.

This signature is of the same size as a single standard RSA signature. To verify the correctness of the t received messages, the user multiplies the hashes of all t messages and checks that

(s_{1,t})^e ≡ Π_{i=1}^{t} h(m_i) (mod n).

As we can see, the above scheme is possible due to the fact that RSA is multiplicatively homomorphic. We will apply this scheme in our ODBS model in order to provide correctness guarantees for the tree nodes received from the server (cf. Section 4.1). Note, however, that this scheme is applicable only to a single signer's signatures. Sion [Sio05] also employed this approach to deal with the correctness of query results in his scheme. Besides, in [PaT04], Pang and Tan applied and modified the idea of "Merkle Hash Trees" (MHT) [Mer80] to provide a proof of correctness for edge computing applications, where a trusted central server outsources parts of the database to proxy servers located at the edge of the network. In [NaT06], however, the authors pointed out possible security flaws in this approach.

Furthermore, there are also some approaches that deal with the completeness of a user's query results [Sio05, PJR+05, NaT06]. First, in [Sio05], the author proposed a solution to provide such assurances for arbitrary queries in outsourced database frameworks. The solution is built around a mechanism of runtime query "proofs" in a challenge-response protocol. More concretely, before outsourcing the data, the data owner partitions its data into k segments {S_1, ..., S_k}, computes hashes for each segment, H(S_i), i = 1, ..., k, and then stores (outsources) them all together at the service provider's server. In addition, the data owner also calculates some "challenge tokens" with respect to the S_i. The challenge tokens are, in fact, queries for which the data owner already knows the results and which can be used for verification later. Whenever a batch of queries is sent to the server, certain challenge token(s) are also sent along. The result set is then verified for completeness using the challenge tokens. Although this approach can be applied to different query types, completeness cannot be guaranteed 100%, because there are chances for a malicious server to "get away" with cheating in the query execution phase (i.e., the server only needs to "guess" and return the correct answer to the challenge token together with fake result sets for the other queries in the batch, but nothing else). Moreover, this approach also introduces cost-inefficiency for database updates because the challenge answers must be recalculated. More seriously, although the author did not aim to address the user privacy issue in that paper, we should note that user privacy in this approach may be compromised, because the server knows which data segments are required by the user, so inference and linking attacks can be conducted [Dan06b, DVJ+03]. Second, in [PJR+05], the authors introduced a solution based on aggregated signature schemes and MHT to provide the


completeness of the query result. This approach is an extension of the one presented in their previous work [PaT04], which has been proven insecure due to some possible security flaws [NaT06]. Last, in [NaT06], the authors developed an approach, called Digital Signature Aggregation and Chaining (DSAC), which achieves both correctness and completeness of query replies. However, in their approach, tuples must be pre-sorted in ascending order with respect to each searchable dimension for the calculation of the signature chain, and thus it still does not support outsourced tree-indexed data, where the order of the tree nodes' contents cannot be determined. This pre-sorting requirement also has a tremendously negative impact on data updates, and hence the total performance of the system degenerates.

Apart from the security issues mentioned above and in Section 1, as we can observe in Figure 1, there is another question: how can the server conduct auditing activities in systems provided with such security guarantees (without employing special hardware equipment)? The server may not know who is accessing the system (see, e.g., [Dan06b]), what they are asking for, and what the system returns to the user; how, then, can it effectively and efficiently handle accountability or develop intrusion detection/prevention systems? The goals of privacy preservation and accountability appear to be in contradiction, and an efficient solution balancing the two is still open. More discussion of this topic can be found in a recent publication [BDW+04].

In Section 3 below we elaborate on the state-of-the-art approach proposed in [Dan06a] to managing the storage and retrieval of dynamic outsourced tree-indexed data, and in Section 4 we extend this approach to strengthen it with query assurance support, including all three concerned security objectives.

3 A Pragmatic Approach to Managing Outsourced Tree-Indexed Data

As discussed in the literature, tree-based index structures play an indispensable role in both traditional and modern database applications [Dan03]. In spite of their advantages, however, these index structures introduce a variety of difficulties in the ODBS model [DuA00, Dan06b]. To illustrate the problem, consider Figure 2a, which shows an example of a B+-tree for an attribute CustomerName with sample values. All tree nodes are encrypted before being stored at the outsourcing server to ensure data confidentiality. Assume a user queries all customers whose name is Ha on this tree. If there is no secure mechanism for the query processing, the sequence of queries that accesses nodes 0, 1 and 5 for the above query will be revealed to the server. In addition, the server also learns that the user accessed nodes 0, 1, 5, that node 0 is the root, node 1 an internal node and node 5 a leaf node of the tree, and so the user privacy is compromised. More seriously, using such information collected gradually, together


with statistical methods, data mining techniques, etc., the server can rebuild the whole tree structure and infer sensitive information from the encrypted database, and hence data confidentiality can also be spoiled. Besides, during the querying, the user also gets additional information showing that there are at least two other customers named John and Bob in the database, so data privacy is not satisfied (note that we will not address the data privacy problem in this paper).

[Figure 2a shows the B+-tree with the nodes: 0 = (1, John, 2, -, -1), 1 = (3, Bob, 4, Ha, 5), 2 = (6, Rose, 7, Trang, 8), 3 = (Alice, Anne, 4), 4 = (Bob, Carol, 5), 5 = (Ha, -, 6), 6 = (John, Linh, 7), 7 = (Rose, Son, 8), 8 = (Trang, -, -1). Figure 2b shows the corresponding plaintext table B+Table = {NID, Node} with these entries and the encrypted table B+EncryptedTable = {NID, EncryptedNode} holding their encryptions.]

Although Damiani et al. proposed an approach [DVJ+03] to outsourced tree-based index structures, it unfortunately has some security flaws that may compromise the desired security objectives [Dan06b, Dan05]. Recently, in [LiC04, Dan06a], the authors developed algorithms based on access redundancy and node swapping techniques to address security issues of outsourced tree-indexed data. We briefly summarize their solutions below. Obviously, as private data is outsourced with search trees, the tree structure and data should all be confidential. As shown in [DVJ+03], encrypting each tree node as a whole is preferable because protecting a tree-based index by encrypting each of its fields would disclose to the server the ordering relationship between the index values. Lin and Candan’s approach [LiC04] also follows this solution and, like others [Dan05, Dan06a, DVJ+03], the unit of storage and access in their approach is also a tree node. Each node is identified by a unique node identifier (NID). The original tree is then stored at the server as a table with two attributes: NID and an encrypted value representing the node content. Let’s see an example: Figure 2a shows a B + -tree built on an attribute CustomerName; Figure 2b shows the corresponding plaintext

Providing Query Assurance for Outsourced Tree-Indexed Data

215

and encrypted table used to store the B + -tree at the external server. As we can see, that B + -tree is stored at the external server as a table over schema B + EncryptedTable = {NID, EncryptedNode}. Based on the above settings, Lin and Candan proposed an approach to oblivious traversal of outsourced search trees using two adjustable techniques: access redundancy and node swapping. Access Redundancy: Whenever a client accesses a node, called the target node, it asks for a set of m-1 randomly selected nodes in addition to the target node from the server. Hence, the probability that the server can guess the target node is 1/m. This technique is different from those presented in [DVJ+03], where only the target node is retrieved (this may lead to reveal the tree structure as shown in [Dan05, Dan06b]). Besides the access redundancy, it also bears another weakness: it can leak information on the target node position. This is easy to observe: multiple access requests for the root node will reveal its position by simply calculating the intersection of the redundancy sets of the requests. If the root position is disclosed, there is a high risk that its child nodes (and also the whole tree structure) may be exposed [LiC04]. This deficiency is overcome by secretly changing the target node’s address after each time it is accessed. Node Swapping: Each time a client requests to access a node from the server, it asks for a redundancy set of m nodes consisting of at least one empty node along with the target one. The client then (1) decrypts the target node; (2) manipulates its data; (3) swaps it with the empty node; and (4) re-encrypts all m nodes and writes them back to the server. Note that, this technique must re-encrypt nodes using a different encryption scheme/key (see [LiC04] for details). Thanks to this, the authors proved that the possible position of the target node is randomly distributed over the data storage space at the server, and thus the weakness of the access redundancy technique is overcome. Although Lin and Candan’s approach only supports oblivious tree search operations, the two above techniques have served as the basis for our further investigation. Based on the access redundancy and node swapping techniques, in [Dan06a] we developed practical algorithms for privacy-preserving search, insert, delete, and modify operations that can be applied to a variety of dynamic outsourced tree-based index structures and unified user as well as multi-querier model (without data privacy considerations) (see [MNT04, DuA00, Dan06b] for more details about ODBS models). Although our work provided the vanguard solutions for this problem, it did not consider the query assurance problem. In Section 4 we will extend our previous work to address this problem.

4 Query Assurance for Outsourced Tree-Indexed Data In this section, we present an extension of our previous work in [Dan06a], which introduced solutions to the problems of data confidentiality and user

216

T.K. Dang and N.T. Son

privacy in the ODBS model, in order to incorporate solutions to ensuring the correctness, completeness, and freshness of the query results. Section 5 will show the experimental results with real datasets. 4.1 Correctness Guarantees As introduced in Section 1, to guarantee the correctness of the query result set the system must provide a means for the user to verify that the received data originated from the data owner as it is. As analyzed in Section 2, the state-of-the-arts employed the public key cryptography scheme to deal with this problem. With respect to our concerned ODBS model, where data privacy considerations are omitted and only single signer (i.e., only one data owner) participates in the query processing, the RSA-based signature scheme is the most suitable as discussed in Section 2. In our context, outsourced tree-indexed data is stored at the server side as described in the previous section, i.e., as a table over schema EncryptedTable = {NID, EncryptedNode}. Before outsourcing the data, the data owner computes the hash h(m) of each encrypted node m. Here, h() denotes a cryptographically strong hash function (e.g., SHA-1). The data owner then “signs” that encrypted node m by encrypting h(m) with its private/secret key sk and stores the signatures together with EncryptedTable at the server. The table schema stored at the server therefore becomes EncryptedTable = {NID, EncryptedNode, Signature} (see Figure 3). With these settings users can then verify each returned node using the data owner public key pk, hence ensuring the correctness of the result set.

NID 0 1 2 3 4 5 6 7 8

B+Table Node (1,John,2,-,-1) (3,Bob,4,Ha,5) (6,Rose,7,Trang,8) (Alice,Anne,4) (Bob,Carol,5) (Ha,-,6) (John,Linh,7) (Rose,Son,8) (Trang,-,-1)

B+Encrypted Table Encrypted Node Signature NID 0 s0 D0a1n2g3Kh75nhs& 1 T9&8ra§ÖÄajh3q91 s1 2 H&$uye’’µnÜis57ß@ s2 3 L?{inh*ß23&§gnaD s3 Wh09a/[%?Ö*#Aj2k 4 s4 5 j8Hß}[aHo$§angµG s5 #Xyi29?ß~R@€>Kh 6 s6 7 ~B3!jKDÖbd0K3}%§ s7 T-§µran&gU19=75m 8 s8

Figure 3. EncryptedTable with tree node contents’ signatures

Although the naive approach above ensures the security objective, it is expensive because the number of signatures to verify equals the redundancy set size. To solve this issue, we employ the condensed-RSA digital signature scheme based on the fact that RSA is multiplicatively homomorphic as presented in Section 3 as follows: Given t input encrypted nodes {m1 , ..., mt } (the redundancy set) and their corresponding signatures {s1 , ..., st }, the server

Providing Query Assurance for Outsourced Tree-Indexed Data

217

computes a condensed RSA signature s1,t as the product of these individual signatures and sends it together with the redundancy set to the user. The user, in turn, will then be able to verify the condensed signature s1,t by employing the hashes computed from all received nodes (in the corresponding redundancy set) as shown in Section 3. With this method, not only the query result correctness is ensured, but both communication and computation costs are also tremendously reduced. Note that, in this case the server has to send only one condensed-RSA signature s1,t to the user for verification instead of t individual ones. Section 5 will show the experimental results. 4.2 Completeness Guarantees Completeness guarantees mean that the server did not omit any tuples matching the query conditions. In our context, as a user asks the server for a redundancy set A of t nodes A={m1 , ..., mt } and the server returns him a set R of t nodes R={n1 , ..., nt }, the user must be able to verify that A = R. As presented in Section 3, a user asks for any encrypted nodes (at the server side) through their NIDs. Therefore, the user should be provided with a means of verifying that NID of each mi , i = 1, t, equals NID of each corresponding ni , i = 1, t. To ensure this, our solution is embarrassingly simple: a NID is encrypted with the corresponding node contents and this encrypted value is stored at the server side, together with its signature. Users can then check if the server returned the NIDs (in the redundancy set) that s/he required (the completeness) as well as verify the query result correctness (as shown in Section 4.1). This idea is clearly illustrated in Figure 4.

NID

Including NIDs (encrypted with the corresponding node contents)

0 1 2 3 4 5 6 7 8

B+Encrypted Table Signature Encrypted Node xD0a1n2g3Kh75nhs& yT9&8ra§ÖÄajh3q91 zH&$uye’’µnÜis57ß@ mL?{inh*ß23&§gnaD nWh09a/[%?Ö*#Aj2k oj8Hß}[aHo$§angµG p#Xyi29?ß~R@ C>Kh q~B3!jKDÖbd0K3}%§ fT-§µran&gU19=75m

s0 s1 s2 s3 s4 s5 s6 s7 s8

Figure 4. Settings for verifying completeness guarantees

In more detail, Figure 4 sketches settings for verifying completeness guarantees of the system. First, the encrypted value with respect to the attribute EncryptedNode also includes the NID of its corresponding node (for example, in the first row, the encrypted value also includes value 0). Second, the data

218

T.K. Dang and N.T. Son

owner signs each encrypted node using the RSA signature scheme, then stores the signature (e.g., s0 ) together with the NID and its corresponding encrypted value as described in the previous section. Note that, verifying the completeness and correctness must be carried out together, i.e., the user cannot omit any of them. This is also true for freshness guarantees presented below. 4.3 Freshness Guarantees As discussed previously, with dynamic outsourced databases, ensuring only the correctness and completeness of the result set is not enough. But, apart from those, the system must also provide a means for users to verify that the received nodes are from the most recent database state, but the older one(s). Either motivating by clear cost-incentives for dishonest behavior or due to intrusions/viruses, the server may return users obsolete nodes, which do not truly reflect the state of the outsourced database at the querying time. This is not a less important problem that also needs to be sorted out to make the ODBS model viable. Actually, in [NaT06] the authors did mention this problem and outlined a possible solution based on MHTs but no cost evaluation has been given (note that, MHTs-based approaches to the ODBS model are quite expensive, especially for dynamic outsourced tree-indexed data [NaT06]). In this section, we propose a vanguard solution to this problem and a comprehensive evaluation for all concerned security objectives will be presented in the next section.

NID

Including NIDs and timestamps of the child nodes

0 1 2 3 4 5 6 7 8

B+Encrypted Table Signature Encrypted Node D0a1n2g3Kh75nhs. T9&8ra§ÖÄajh3q91c%. H&$uye’’µnÜis57ß@j9. L?{inh*ß23&§gnaDxKh{}~B3!jKDÖbd0K3}%§5, T-§µran&gU19=75mz*

s0 s1 s2 s3 s4 s5 s6 s7 s8

Figure 5. Settings for verifying freshness guarantees

To solve the problem of freshness guarantees, users must be able to verify that the server did return them the most up-to-date required tree nodes (at the time it processed the query). Our solution is also quite simple, but sound and complete, based on timestamps: A timestamp of each child node is stored at its parent node. This timestamp changes as the child node is updated. In other words, a node keeps timestamps of all of its child nodes, and a user

Providing Query Assurance for Outsourced Tree-Indexed Data

219

can then check (from the root node) if the server returned him the latest version of the required node: As accessing the root, the user knows in advance all timestamps of its child nodes, and as a child node is returned s/he can check if this node’s timestamp equals the known value, and so on. There is, however, one question arose: How can users check the root’s timestamp? The answer to this question is quite simple: In the settings for access redundancy and node swapping techniques, there is a special node called SNODE that keeps some meta-data and the root’s address. The SNODE’s address and its decryption key are known to all qualified users. Therefore, SNODE will keep the timestamp of the root, and each qualified user is also informed about the timestamp of SNODE. With these settings, freshness guarantees of the query result can be effectively verified. Note that, the encrypted value representing the corresponding node contents now includes not only its NID, but also timestamps of the child nodes. The corresponding signature is computed based on this final encrypted value. Figure 5 clearly illustrates this.

5 Experimental Results To confirm theoretical analyses carried out in previous sections and establish the practical applicability of our approach, we implemented a prototype system and evaluated the proposed solutions with real datasets. For all experiments, we used 2-dimensional datasets, which were extracted from the SEQUOIA dataset at http://www.rtreeportal.org/spatial.html. The SEQUOIA dataset consists of 2-dimensional points of the format (x, y), representing locations of 62556 California place names. We extracted 5 sub-datasets of 10K, 20K, 30K, 40K and 50K points from the SEQUOIA dataset for experiments. To manage the spatial points, we employed 2-dimensional kd-trees due to its simplicity. For all the trees, we set the maximum number M of data items that a leaf node can keep to 50 and the minimum fill factor value to 4%. This means that each tree leaf node must contain at least 2 points and can store up to 50 points. Furthermore, the tree was stored in a data storage space with 22500-node capacity, divided into 15 levels of 1500 nodes each (see [Dan06a, LiC04] for detailed meanings of these settings). Our prototype system consisted of only one P4 CPU 2.8GHz/1GB RAM PC running Windows 2003 Server. Both client and server were accommodated in the same computer so, for all experiments, we will report averaged time to complete a user request, which can represent the averaged CPU-cost of each client request, and analyze averaged IO- and communication-cost. In addition, all programs were implemented using C#/Visual Studio .NET 2003 and we employed the DES algorithm for the encryption of data, the RSA signature scheme (1024 bits key) with SHA-1 hashing for the digital signatures. We did experiments with all major basic operations, including search (for both point and range queries) and updates (inserts and deletes). Note that, modify operations are combinations of inserts and deletes [Dan05, Dan06a].

220

T.K. Dang and N.T. Son

In addition, because there is no previous work built on the same or similar scheme and addressed the same problem, we had to build our scheme from scratch and did experiments to evaluate our solutions to the query assurance issue on the basis of the condensed-RSA signature scheme and the naive/standard RSA signature scheme (cf. Sections 2, 4). All the security objectives of the query assurance issue (i.e., correctness, completeness, and freshness guarantees) were taken into account. The details are as follows. Initially, we did experiments with the biggest dataset, 50K points for insert, delete, point, and range queries in order to see the performance of both naive RSA and condensed-RSA based solutions. The redundancy set size is set to 4 for the tests. Figure 6 shows the experimental results concerning the CPUcost. It is clearly shown that the condensed-RSA scheme CPU-cost is much better that of the naive RSA scheme. Note that the averaged accessed node number (i.e., the IO-cost) of the two is the same, but the communication cost of the condensed-RSA scheme is also better by a factor of (Redundancy set size - 1) ∗ RSA signature size. This is due to the fact that as with the condensedRSA scheme the server has to send the user only one condensed signature, while it has to send Redundancy set size signatures with respect to the naive RSA scheme. Verifying more signatures is the main reason for a higher CPUcost of the latter.

50K 2-d points (kd-tree)

CPU-time (sec)

30

Naive RSA Condensed RSA

25 20 15 10 5 0 Point

Range Insert Query type

Delete

Figure 6. Condensed RSA signature scheme vs. naive RSA signature scheme

Furthermore, to see the affect of different database sizes on the performance, for each of sub-datasets, we ran 100 separate queries with the redundancy set size being set to 4, and calculated averaged values for CPU-time. With inserts, deletes, and point queries we randomly chose 100 points from the corresponding dataset as the queries. With range queries, we randomly chose 100 squares as the queries. The sides of each square were chosen to be 1% of the norm of the data space side (if the dataset is uniformly distributed, this value maintains the selectivity of 0.01% for these range queries).The experimental results are shown in Figure 7. As we can see, the CPU-cost saving of

Providing Query Assurance for Outsourced Tree-Indexed Data

221

all kinds of queries is high, over 30% at the minimum between the condensedRSA scheme and the naive RSA scheme. Again, as mentioned above, although the averaged accessed node number is equal for both schemes, the communication cost of the condensed-RSA scheme is better than that of the naive RSA scheme.

CPU-time saving (%)

Computational cost savings

40 35 30 25 20 15 10 5 0

Point query Range query Insert Delete

10k

20k

30k

40k

50k

Dataset size

Figure 7. A variety of dataset sizes

To conclude this section we emphasize that it has been mathematically proven in [LiC04, Dan06a] that our approach based on the access redundancy and node swapping techniques is computationally secure to protect both queries and the tree structure from a polynomial time server. Therefore, it is quite safe to claim that our proposed solutions in this paper, which have extended the previous work, become full-fledged and can be applied to real-world ODBS models.

6 Conclusion and Future Work In this paper, we explored the problem of query assurance in the oursourced database service (ODBS) model. Concretely, we extended our previous work, see e.g. [Dan06a] and presented a full-fledged solution to the problem of ensuring the correctness, completeness, and freshness for basic operations (insert, delete, modify, point and range queries) on dynamic outsourced treeindexed data. Experimental results with real multidimensional datasets have confirmed the efficiency of our proposed solution. Notably, to the best of our knowledge, none of the previous work has dealt with all the three above security issues of query assurance in the ODBS model with respect to dynamic outsourced trees. Our work therefore provides a vanguard solution for this problem. Also, this work can also be applied to non tree-indexed data outsourced to untrusted servers (with settings like those of [DVJ+03, Dan06a]).

222

T.K. Dang and N.T. Son

Our future work will focus on evaluating the efficiency of the proposed solutions in real-world applications and on addressing open research issues related. Specially, supporting multiple data owners’ signatures is a generalization of the proposed solution in this paper. An efficient solution to this problem is still open (cf. Section 2). Moreover, as discussed in Section 2, auditing and accountability for the ODBS model as well as computer criminal-related issues must be addressed and it will be one of our future research activities of great interest. Another problem also attracts us is that: How to deal with the problem of over redundancy of the result set returned from the server, i.e., the server sends the user more than what should be returned in the answers. This may cause a user to pay more for the communication cost, to incurs worse computation cost, and so this issue needs to be investigated carefully.

References [Aso01]

Asonov, D.: Private Information Retrieval: An Overview and Current Trends. Proc. ECDPvA Workshop, Informatik, Vienna, Austria (2001) [BDW+04] Burmester, M., Desmedt, Y., Wright, R. N., Yasinsac, A.: Accountable Privacy. Proc. 12th International Workshop on Security Protocols, Cambridge, UK (2004) [BGL+03] Boneh, D., Gentry, C., Lynn, B., Shacham, H.: Aggregate and Verifiably Encrypted Signatures from Bilinear Maps. Proc. International Conference on the Theory and Applications of Cryptographic Techniques, May 4-8, Warsaw, Poland, pp. 416-432 (2003) [BoP02] Bouganim, L., Pucheral, P.: Chip-Secured Data Access: Confidential Data on Untrusted Servers. Proc. 28th International Conference on Very Large Data Bases, Hong Kong, China, pp. 131-142 (2002) [CFM+95] Castano, S., Fugini, M. G., Martella, G., Samarati, P.: Database Security. Addison-Wesley and ACM Press, ISBN 0-201-59375-0 (1995) [CGK+95] Chor, B., Goldreich, O., Kushilevitz, E., Sudan, M.: Private Information Retrieval. Proc. 36th Annual IEEE Symposium on Foundations of Computer Science, Milwaukee, Wisconsin, USA, pp. 41-50 (1995) [ChM04] Chang, Y-C., Mitzenmacher, M.: Privacy Preserving Keyword Searches on Remote Encrypted Data. Cryptology ePrint Archive: Report 2004/ 051 (2004) [Dan05] Dang, T. K.: Privacy-Preserving Basic Operations on Outsourced Search Trees. Proc. International Workshop on Privacy Data Management (PDM2005, in conjunction with ICDE2005), IEEE Computer Society press, April 8-9, 2005, Tokyo, Japan (2005) [Dan06a] Dang, T. K.: A Practical Solution to Supporting Oblivious Basic Operations on Dynamic Outsourced Search Trees. Special Issue of International Journal of Computer Systems Science and Engineering, CRL Publishing Ltd, UK, 21(1), 53-64 (2006) [Dan06b] Dang, T. K.: Security Protocols for Outsourcing Database Services. Information and Security: An International Journal, ProCon Ltd., Sofia, Bulgaria, 18, 85-108 (2006)

Providing Query Assurance for Outsourced Tree-Indexed Data [Dan03]

[DuA00]

[DVJ+03]

[Fon03]

[GIK+98]

[HIL+02]

[HMI02]

[JoN01]

[LiC04]

[Mer80] [MNT04]

[NaT06]

[PaT04]

[PJR+05]

[Sio05]

[SmS01]

223

Dang, T. K.: Semantic Based Similarity Searches in Database Systems: Multidimensional Access Methods, Similarity Search Algorithms, PhD Thesis, FAW-Institute, University of Linz, Austria (2003) Du, W., Atallah, M. J.: Protocols for Secure Remote Database Access with Approximate Matching. Proc. 7th ACM Conference on Computer and Communications Security, 1st Workshop on Security and Privacy in E-Commerce, Greece (2000) Damiani, E., Vimercati, S. D. C., Jajodia, S., Paraboschi, S., Samarati, P.: Balancing Confidentiality and Efficiency in Untrusted Relational DBMSs. Proc. 10th ACM Conference on Computer and Communication Security, Washingtion, DC, USA, pp. 93-102 (2003) Fong, K. C. K.: Potential Security Holes in Hacig¨ um¨ us’ Scheme of Executing SQL over Encrypted Data (2003) http://www.cs.siu. edu/∼kfong/research/database.pdf Gertner, Y., Ishai, Y., Kushilevitz, E., Malkin, T.: Protecting Data Privacy in Private Information Retrieval Schemes. Proc. 30th Annual ACM Symposium on Theory of Computing, USA (1998) Hacig¨ um¨ us, H., Iyer, B. R., Li, C., Mehrotra, S.: Executing SQL over Encrypted Data in the Database-Service-Provider Model. Proc. ACM SIGMOD Conference, Madison, Wisconsin, USA, pp. 216-227 (2002) Hacig¨ um¨ us, H., Mehrotra, S., Iyer, B. R.: Providing Database as A Service, Proc. 18th International Conference on Data Engineering, San Jose, CA, USA, pp. 29-40 (2002) Joux, A., Nguyen, K.: Separating Decision Diffie-Hellman from DiffieHellman in Cryptographic Groups. Cryptology ePrint Archive: Report 2001/003 (2001) Lin, P., Candan, K. S.: Hiding Traversal of Tree Structured Data from Untrusted Data Stores. Proc. 2nd International Workshop on Security in Information Systems, Porto, Portugal, pp. 314-323 (2004) Merkle, R.: Protocols for Public Keys Cryptosystems. Proc. IEEE Symposium on Research in Security and Privacy (1980) Mykletun, E., Narasimha, M., Tsudik, G.: Authentication and Integrity in Outsourced Databases. Proc. 11th Annual Network and Distributed System Security Symposium, San Diego, California, February 5-6, San Diego, California, USA, (2004) Narasimha, M., Tsudik, G.: Authentication of Outsourced Databases Using Signature Aggregation and Chaining. Proc. 11th International Conference on Database Systems for Advanced Applications, April 12-15, Singapore, pp. 420-436 (2006) Pang, H. H., Tan, K-L.: Authenticating Query Results in Edge Computing. Proc. 20th International Conference on Data Engineering, March 30-April 2, Boston, MA, USA, pp. 560-571 (2004) Pang, H. H, Jain, A., Ramamritham, K., Tan, K-L.: Verifying Completeness of Relational Query Results in Data Publishing. SIGMOD Conference, pp. 407-418 (2005) Sion, R.: Query Execution Assurance for Outsourced Databases. Proc. 31st International Conference on Very Large Data Bases, August 30-September 2, Trondheim, Norway, pp. 601-612 (2005) Smith, S. W., Safford, D.: Practical Server Privacy with Secure Coprocessors. IBM Systems Journal 40(3), 683-695 (2001)

224

T.K. Dang and N.T. Son

[Uma04]

Umar, A.: Information Security and Auditing in the Digital Age: A Managerial and Practical Perspective. NGE Solutions, ISBN 0-97274147-X (2004)

An Adaptive Space-Sharing Scheduling Algorithm for PC-Based Clusters Viet Hung Doan1 , Nam Thoai2 , and Nguyen Thanh Son3 1 2 3

Ho Chi Minh City University of Technology [email protected] Ho Chi Minh City University of Technology [email protected] Ho Chi Minh City University of Technology [email protected]

Abstract In recent years, PC-based cluster has become a mainstream branch in high performance computing (HPC) systems. To improve performance of PC-based cluster, various scheduling algorithms have been proposed. However, they only focused on systems with all jobs are rigid or all jobs are moldable. This paper fills in the gap by building a scheduling algorithm for PC-based clusters running both rigid jobs and moldable jobs. As an extension of existing adaptive space-sharing solutions, the proposed scheduling algorithm helps to reduce the turnaround time. In addition, the algorithm satisfies some requirement about job-priority. Evaluation results show that even in extreme cases such as all jobs are rigid or all jobs are moldable, performance of the algorithm is competitive to the original algorithms.

1 Introduction PC-based clusters are getting more and more popular these days as they provide extremely high execution rates with great cost effectiveness. On par with the development of PC-based clusters, scheduling on PC-based clusters has been an interesting research topic in recent years. Most of the research to date on scheduling has focused on the scheduling of rigid jobs, i.e., jobs that require fixed number of processors. However, many parallel jobs are moldable [2, 4], i.e., they adapt to the number of processors that scheduler set at the beginning of the job executions. Due to this flexibility, schedulers can choose an effective number of processors for each job. If most of processors are free, jobs may be allocated a large number of processors, their execution time is reduced. On the other hand, if the load is heavy, the number of processors allocated for each job will be smaller, which reduces the wait time. These scenarios help to reduce the turnaround time. Older and recent studies in [1–3,8,12,14,16,17] have shown the effectiveness of scheduling for moldable jobs. Based on the scheduling mechanisms developed separately for rigid or moldable jobs, this paper proposes a scheduling solution for a class of PC-based clusters running both rigid and moldable jobs. This study

226

V.H. Doan et al.

is motivated from the fact that today PC-based clusters are commonly used as shared resources for difference purposes. The jobs run in these PC-based clusters are usually both rigid and moldable. The rest of this paper is organized as follows: Section 2 describes background knowledge on scheduling. Section 3 states the research problem addressed in this paper. Section 4 gives an overview of the previous approaches related to the problem. Sections 5 and 6 describe details of the proposed solution. Section 7 describes the evaluation of the new scheme. Section 8 concludes the paper.

2 Background Knowledge Scheduling algorithms are widely disparate both in their aims and methods. From user’s view point, a good scheduler should minimize turnaround time, which is the time elapsed between job submission and its completion. The turnaround time is calculated by summing up wait time and execution time [1]. Another expectation is those jobs with higher priority should be finished sooner than jobs with lower priority. The paper focuses on two of these objectives. In PC-based clusters, to improve the performance of fine-grained parallel applications, the systems are usually space-shared, and each job has a dedicated access to some number of the processors [1]. The set of dedicated processors is called partition and the number of processors in each partition is called partition size. Space-sharing policies can be divided into four categories [12]. (i) Fixed partitioning, in which partitions are created before the system operates and these partitions cannot be changed later on. (ii) Variable partitioning, in which partitions are created whenever jobs come into the system. (iii) Adaptive partitioning, in which partitions are created when jobs begin to be executed. (iv) Dynamic partitioning, in which partitions may change their sizes during job execution time to adapt to the current system status. To reach the goals of scheduling, there are many different ways to schedule jobs. However, they are all based on few mechanisms. A study in [7] shows that in batch scheduling, where jobs are not preempted, two popular approaches are First Come First Serve (FCFS) and backfilling. FCFS considers jobs in order of arrival. If there are enough processors to run a job, the scheduler allocates processors, and the job can start. Otherwise, the first job must wait, and all subsequent jobs also have to wait. This mechanism may lead to a waste of processing power. However, the FCFS algorithm has been implemented in many real systems for two main reasons: it is quite simple to implement and it does not require users to estimate the run time of jobs. Backfilling is a mechanism that tries to balance between utilization and maintaining FCFS order. It allows small jobs to move ahead and to run on processors that would otherwise remain idle [7]. Drawbacks of backfilling are that it is more complex than FCFS and an estimate of execution time is required.

An Adaptive Space-Sharing Scheduling Algorithm for PC-Based Clusters

227

Characteristics of the job also have much impact on performance of scheduling algorithms. In the view of processor allocation, parallel jobs can be classified according to their run-time flexibility. There are four types of parallel jobs classified by this criterion [8]. (i) Rigid: the required number of processors for job execution is fixed. (ii) Evolving: job may change its resource requirement during execution. (iii) Moldable: job allows scheduler to set the number of processors at the beginning of execution, and the job initially configures itself to adapt to this number. (iv) Malleable: job can adapt to the changes in the number of processors that scheduler allocates during execution.

3 Problem Definition The problem tackled in this paper comes from our experiences on operating Supernode II [18], a PC-based cluster at Ho Chi Minh City University of Technology supporting high performance computing applications. All jobs are running in batch mode. Two classes of jobs running on the system are (i) rigid jobs, e.g., sequential jobs or MPI jobs which the required number of processors is fixed, and (ii) moldable jobs, e.g., MPI jobs which can run on any number of processors. Because Supernode II is used as a shared resource, each Supernode II user has a priority to execute their jobs. Additionally, all users of the system want their jobs to finish as soon as possible. Therefore, the goal of the scheduler is to minimize the turnaround time and to prefer users with high priority. Moreover, the scheduler should not require a good estimation on job execution time. It can be seen that requirements above are common to other PC-based clusters. Therefore a solution for Supernode II can be applied to other PCbased clusters with similar characteristics.

4 Related Works Scheduling for rigid jobs have been studied extensively. Two main mechanisms for scheduling rigid jobs are FCFS and backfilling have been described in Sect. 2. This section only focuses on scheduling moldable jobs. When classifying parallel jobs based on the view of processor allocation, it is pointed out in [8] that adaptive partitioning is a suitable processor allocation policy for moldable jobs. Current research in scheduling moldable jobs [1, 3, 12, 14, 16, 17] also use the adaptive space-sharing scheme. In [12], Rosti et al. try to reduce the impact of fragmentation by attempting to generate equally-sized partitions while adapting to transient workload. The partition size is computed by (1). Let T otalP Es be the number of processing elements in the system and QueueLength be the number of jobs currently in

228

V.H. Doan et al.

queue. The policy always reserves one additional partition for jobs that might come in the future by using QueueLength + 1 instead of QueueLength. P artitionSize = max(1, ceil(

T otalP Es )) QueueLength + 1

(1)

An analysis in [14] shows that (1) considers only the queued jobs to determine partition size. This will lead to situation that contravenes the equal allocation principle in [10, 11]. Therefore, Sivarama et al. [14] suggest a modified policy as in (2). S is the number of executing jobs. The best value of f depends on system utilization and job structure, but a value of f in the range between 0.5 and 0.75 appears to be a reasonable choice. P artitionSize = max(1, ceil(

T otalP Es )), 0 ≤ f ≤ 1 1 + QueueLength + f × S

(2)

The adaptive policy in [1,3] is more restrictive, in that users must specify a range of the number of processors for each job. Schedulers will select a number which gives the best performance. Schedulers in [1,3] use a submit-time greedy strategy to schedule moldable jobs. In [16], Srinivasan et al. have some improvement to [1]: (i) deferring the choice of partition size until the actual job start time instead of job submission time and, (ii) using aggressive backfilling instead of conservative backfilling. In [17], Srinivasan et al. argue that an equivalent partition strategy tends to benefit jobs with small computation size (light jobs). On the other hand, allocating processors to jobs proportional to the job computation size tends to benefit heavy jobs significantly. A compromise policy is that each job will have a partition size proportional to the square root of its computation size (W eight) as in (3). This equation is used to calculate partition size in an enhanced backfilling scheme proposed in [17]. √ W eighti √ (3) W eightF ractioni =  W eighti i∈{P arallelJobInSystem} In all the works above, there are two questions that schedulers must answer: (i) how many processors will be allocated for a job (partitioningfunction) and (ii) which jobs will be executed when there are some processors that finish their jobs and become idle (job-selection rules). The two next sections describe details of partitioning-function and job-selection rules.

5 Partitioning-Function Based on the analysis of FCFS and backfilling in Sect. 2, a scheduler using backfilling for the system in Sect. 3 is not suitable. Because the backfilling requires information about the running time of jobs which cannot be precisely

An Adaptive Space-Sharing Scheduling Algorithm for PC-Based Clusters

229

Sum turn around time

provided by users [9], a new scheme should be suggested and it is also the objective of this paper to enhance the FCFS scheme. As analyzed in [10, 11, 14], a scheduling policy should share the processing power equally among jobs. To verify that, we use simulation. The virtual PC-based cluster and its scheduler are built using SimGrid [15]. This virtual system has the same characteristics as those of Supernode II [18]. Three partitioning-functions used in this experiment are (2, 3) and a new partitioning-function similar to (3) but with W eight be replaced by P riority × W eight. The third partition function gives a larger partition size for job with higher priority. These three functions are named Sivarama, Sudha and Sudha with priority respectively. Until now, though there was some research on models for moldable jobs [2, 4], they are still not accepted as a standard for simulation. Therefore, we used a subset of 4000 jobs of CTC workload log from [6]. This workload is also used in [16]. To determine the speedup and execution time for a job for any desired partition size, this experiment uses Downey’s model [5]. Because the workload log does not contain information of user priority, we statistically assign priority by user ID, there are five levels of priority ranging from 1.0 (lowest priority) to 1.4 (highest priority). To show the performance details of scheduling algorithms jobs are categorized based on their workload-based weight, i.e., the product of their requested number of processors and actual run time in seconds. In all charts, the vertical axis presents time in second and the horizontal axis shows the workload-based weight.

2000000

Sivarama

1750000

Sudha Sudha with priority

1500000 1250000 1000000 750000 500000 250000 0

256K

Total

Job categories

Figure 1. Turn around time when priority = 1.4

Figure 1 shows that turnaround time increases respectively from Sivarama and Sudha to Sudha with priority. In other words, turnaround time tends to be reduced if jobs have a similar partition size. The reason is that in a heavilyloaded system, if a scheduler assigns a large partition for a heavy job, its

230

V.H. Doan et al.

execution time will be smaller. However, because the partition size is large, the job must wait for a long time until having enough processors to execute. In this situation, delay caused of waiting overwhelms the benefit of reducing execution time. Therefore, in a system that we do not know job execution time in advance, an equivalent partitioning is preferred. This result consists with [10, 11]. With this observation, the partitioning-function chosen in this paper is an enhancement from the partitioning-function in [14]. For rigid jobs, partition size for each job must be equal to the partition size it requires. For moldable jobs, the partition size is calculated in a similar way of (3). However, the numerator now is the number of processors for parallel jobs (P EsF orP arallel). Furthermore, the dominator only counts the number of parallel jobs in queue (P arallelJobInQueue). These differences rise from the fact that there is a portion of processors reserved for rigid jobs. P artitionSize = max(1, ceil(

P EsF orP arallel )), 0 ≤ f ≤ 1 (4) 1 + P arallelJobInQueue + f × S

P EsF orP arallel = T otalP Es − N umberOf SequentialJobInQueue

(5)

Equations (4, 5) imply that the effect of rigid-parallel jobs and moldableparallel jobs to the size of moldable jobs are the same. An evaluation of this partitioning-function will be conducted in combination with the job-selection rules in the next sections.

6 Job-Selection Rules Job-selection rules use information about partition size in Sect. 5 and status of the system to decide which job will execute next. Scheduling policy must utilize all processing power of the system to reduce turnaround time and it also helps to reduce waiting time for jobs with high priority. To reduce the complexity of the scheduling algorithm, two queues are implemented in the scheduler. The first one is a fixed-size queue and the scheduling algorithm only uses this queue, the fixed size of the first queue helps to reduce scheduling complexity. The second one is a dynamic-size queue to stabilize job stream entering the system. It contains jobs overflowing from the fixed-size queue. Jobs in this queue are ordered by the time they come into the system. Each job X in the fixed-size queue has a priority value called P riority(X). An initial value of P riority(X) is the priority of job X when it is submitted. Details of job-selection rules are: Step 1: Classifying jobs in the fixed-size queue into groups based on the job priority value. Each group Groupi is distinguished by the index i, group that has higher index contains jobs with higher priority.

An Adaptive Space-Sharing Scheduling Algorithm for PC-Based Clusters

231

Step 2: Select next job(s) to execute k = M axGroupIndex while still have P Es for jobs do Select a subset of jobs in Groupk that best fit the available P Es Execute that subset k =k−1 end while The task of selecting a subset of jobs in Groupk that best fit the available P Es using a scheme proposed in [18], including three steps: (i) best-fit loop, selecting a job in Groupk that partition size best fit the available P Es, then repeating the previous selection procedure until being unable to select any more job, (ii) worst-fit loop, similar to best-fit loop but selecting job that worst fit instead of best fit, (iii) exhausted search on the remaining set. Authors in [18] have proved the scheme is a compromise between the correctness and the complexity. With the job-selection rules above, job execution order is not exactly FCFS, but the set of jobs is selected based on their priority and their fitness to available processors. In this way, system utilization will increase and jobs with a higher priority are ensured to be prioritized. However, like other priority scheduling policies, the scheduling algorithm above may lead to starvation. A steady stream of high-priority processes can prevent a low-priority process from getting processors. To cope with this problem, an aging policy is used. P riority(X) will be increased whenever there is a job entering the fixed-size queue after X. When P riority(X) > M axP riority, X will be chosen to execute first. If at this time, there are not enough P Es, the scheduler must reserve P Es for these jobs. These job-selection rules associated with partitioning-function in Sect. 5 satisfy all requirements in Sect. 3. The next section presents simulation results of the proposed algorithm.

7 Overall Evaluation Until now, there is no algorithm for both rigid and moldable jobs. Therefore, in this section we only introduce simulation results of the proposed algorithm in two extreme cases when all jobs are rigid or all jobs are moldable. The system and workload are the same as in Sect. 5. When all jobs are rigid, the partitioning-function gives each job a partition size equal to the number of processors it requires. Figures 2 and 3 show turnaround time of the proposed algorithm (represented by Priority) and basic FCFS for rigid jobs (represented by FIFO). In general, the proposed algorithm is a little bit better than FIFO, because the program uses best-fit policy in job selection rules. With jobs having high priority, their turnaround time in Priority scheme is smaller than that in FIFO (Fig. 3). That result

232

V.H. Doan et al.

demonstrates the effectiveness of the proposed algorithm in ensuring priority demand. 8000000

Sum turn around time

7000000

FIFO Priority

6000000 5000000 4000000 3000000 2000000 1000000 0 256K

Total

>256K

Total

Job categories

Figure 2. Turn around time when all jobs are rigid

1800000

Sum turn around time

1600000 1400000

FIFO Priority

1200000 1000000 800000 600000 400000 200000 0 256K

Total

Job categories

Figure 4. Turn around time when all jobs are moldable 1800000

Sum turn around time

1600000 1400000

FIFO Priority

1200000 1000000 800000 600000 400000 200000 0 2.

Proof. Using the Green’s kernel matrix GN with components GN (xi , xj ) one has for the vector fN of function values fN (xi ) the system (GN + mλI)fN = GN y where y is the data vector with components yi . It follows that fN −y = (GN +mλI)−1 GN y−(GN +mλI)−1 (GN +mλI) = −mλ(GN +mλI)−1 y. The bounds for the distance between the function values and the data then follow when the asymptotic behaviour of GN mentioned above is taken into account. ⊓ ⊔ It follows that one gets an asymptotic overfitting in the data points and the data term in R(f ) satisfies the same bound m 

mλ , | log h|

if d = 2

(fN (xi ) − yi ) ≤ Cmλhd−2 ,

if d ≥ 3

i=1

and

m  i=1

2

(fN (xi ) − yi ) ≤ C

2

and h → 0. The case d = 2 is illustrated in Figure 2, right. While the approximations fN do converge on the data points they do so very locally. In an area outside a neighbourhood of the data points the fN tend to converge to a constant function so that the fit picks up fast oscillations near the data points but only slow variations further away. It is seen that the value of R(fN ) → 0 for h → 0. In the following we can give a bound for this for d ≥ 3. Proposition 2. The value of functional J converges to zero on the estimator fN and J(fN ) ≤ Cmλhd−2 √ for some C > 0. In particular, one has ∇fN  ≤ C mλh(d−2) . Proof. While we only consider regular partitioning with hyper-cubical elements Q, the proof can be generalised for other elements. First, let bQ be a member of the finite element function space such that bQ (x) = 1 for x ∈ Q and bQ (x) = 0 for x in any element which is not a neighbour of Q. One can see that

Fitting Multidimensional Data Using Combination Techniques



Q

243

|∇bQ |2 dx ≤ Chd−2 .

Choose h such that for the k-th component of xi one has |xi,k − xj,k | > 3h,

for i = j.

In particular, any element contains at most one data point. Let furthermore Qi be the element containing xi , i.e., xi ∈ Qi . Then one sees that g defined by g(x) =

m 

yi bQi (x)

i=1

interpolates the data, i.e., g(xi ) = yi . Consequently,  R(g) = λ |∇g|2 dx. Ω

Because of the condition on h one has for the supports supp bQi ∩ supp bQj = ∅ for i = j and so R(g) = λ

m  i=1

yi2





|∇bQi |2 dx

and, thus, R(g) ≤ Cmλhd−2 .

It follows that inf R(f ) ≤ R(g) ≤ Cmλhd−2 .

⊓ ⊔

We conjecture that in the case of d = 2 one has J(fN ) ≤ Cmλ/| log h|. We would also conjecture, based on the observations, that fN converges very slowly towards a constant function.

4 Projections and the Combination Technique It is well known that finite element solutions of V-elliptic problems can be viewed as Ritz projections of the exact solution into the finite element space satisfying the following Galerkin equations: fN , gRLS = f∗ , gRLS ,

g ∈ VN .

The projections are orthogonal with respect to the energy norm  · RLS . Let Pl : V → Vl denote the orthogonal projection with respect to the norm S  · RLS  and let Pn be the orthogonal projection into the sparse grid space S Vn = |l|≤n Vl . If the projections Pl form a commutative semigroup, i.e., if for all l, l′ there exists a l′′ such that Pl Pl′ = Pl′′ then there exist cl such that

244

J. Garcke and M. Hegland

PnS =



cl Pl .

|l|≤n

We have seen in the previous section why the combination technique may not provide good approximations as the quadrature errors do not cancel in the same way as the approximation errors. The aspect considered here is that the combination technique may break down if there are angles between spaces which are sufficiently smaller than π/2 and for which the commutator may not be small. For illustration, consider the case of three spaces V1 , V2 and V3 = V1 ∩ V2 . The cosine of the angle α(V1 , V2 ) ∈ [0, π/2] between the two spaces V1 and V2 is defined as 5 6 c(V1 , V2 ) := sup (f1 , f2 ) | fi ∈ Vi ∩ (V1 ∩ V2 )⊥ , fi  ≤ 1, i = 1, 2 . The angle can be characterised in terms of the orthogonal projections PVi into the closed subspaces Vi and the corresponding operator norm, it holds [2] c(V1 , V2 ) = P1 P2 PV3⊥ .

(7)

If the projections commute then one has c(V1 , V2 ) = 0 and α(V1 , V2 ) = π/2 which in particular is the case for orthogonal Vi . However, one also gets α(V1 , V2 ) = π/2 for the case where V2 ⊂ V1 (which might contrary to the notion of an “angle”). Numerically, we estimate the angle of two spaces using a Monte Carlo approach and the definition of the matrix norm as one has c(V1 , V2 ) = PV1 PV2 − PV1 ∩V2  = sup g

PV1 PV2 g − PV1 ∩V2 g PV2 g

(8)

For the energy norm the angle between the spaces substantially depends on the positions of the data points xi . We consider in the following several different layouts of points and choose the function values yi randomly. Then the ratio PV1 PV2 g−PV1 ∩V2 g is determined for these function values and data points PV2 g and the experiment is repeated many times. The estimate chosen is then the maximal quotient. In the experiments we choose Ω = (0, 1)2 and the subspaces V1 and V2 were chosen such that the functions were linear with respect to one variable while the h for the grid in the other variables was varied. In a first example, the data points are chosen to be the four corners of the square Ω. In this case, the angle turns out to be between 89.6 and 90 degrees. Lower angles corresponded here to higher values of λ. In the case of λ = 0 one has the interpolation problem at the corners. These interpolation operators, however, do commute. In this case the penalty term is actually the only source of non-orthogonality. A very similar picture evolves if one chooses the four data points from {0.25, 0.75}2 . The angle is now between 89 and 90 degrees where the higher angles are now obtained for larger λ and so the regulariser improves the orthogonality.

Fitting Multidimensional Data Using Combination Techniques

245

A very different picture emerges for the case of four randomly chosen points. In our experiments we now observe angles between 45 degrees and 90 degrees and the larger angles are obtained for the case of large λ. Thus the regularise again does make the problem more orthogonal. We would thus expect that for a general fitting problem a choice of larger α would lead to higher accuracy (in regard to the sparse grid solution) of the combination technique. A very similar picture was seen if the points were chosen as the elements of the set 0.2i(1, 1) for i = 1, . . . , 4. In all cases mentioned above the angles decrease when smaller mesh sizes h are considered. 4.1 Optimised combination technique In [7] a modification of the combination technique is introduced where the combination coefficients not only depend on the spaces as before, which gives a linear approximation method, but instead depend on the function to be reconstructed as well, resulting in a non-linear approximation approach. In [6] this ansatz is presented in more detail and the name “opticom” for this optimised combination technique is suggested. Assume in the following, that the generating subspaces of the sparse grid are suitably numbered from 1 to s. To compute the optimal combination coefficients ci one minimises the functional θ(c1 , . . . , cs ) = |P f −

s  i=1

ci Pi f |2RLS ,

where one uses the scalar product corresponding to the variational problem ·, ·RLS , defined on V to generate a norm. By simple expansion one gets θ(c1 , . . . , cs ) =

s 

i,j=1

−2

ci cj Pi f, Pj f RLS

s  i=1

ci Pi f 2RLS + P f 2RLS .

While this functional depends on the unknown quantity P f , the location of the minimum of J does not. By differentiating with respect to the combination coefficients ci and setting each of these derivatives to zero we see that minimising this norm corresponds to finding ci which have to satisfy ⎤ ⎤⎡ ⎤ ⎡ ⎡ P1 f 2RLS P1 f 2RLS · · · P1 f, Ps f RLS c1 ⎢P2 f, P1 f RLS · · · P2 f, Ps f RLS ⎥⎢ c2 ⎥ ⎢ P2 f 2RLS ⎥ ⎥ ⎥⎢ ⎥ ⎢ ⎢ (9) ⎥ ⎥⎢ .. ⎥ = ⎢ ⎢ .. .. .. .. ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ . . . . . Pm f 2RLS cm Ps f, P1 f RLS · · · Ps f 2RLS

The solution of this small system creates little overhead. However, in general an increase in computational complexity is due to the need for the determination of the scalar products Pi f, Pj f RLS . Their computation is often difficult

246

J. Garcke and M. Hegland

Figure 3. Value of the functional (1) and the least squares error on the data, i.e. M 2 2 −x2 1 + e−y for the combination i=1 (f (xi ) − yi ) , for the reconstruction of e M technique and the optimised combination technique for the grids Ωi,0 , Ω0,i , Ω0,0 and the optimised combination technique for the grids Ωj,0 , Ω0,j , 0 ≤ j ≤ i with λ = 10−4 (left) and 10−6 (right) 0.01

ct ls-error ct functional opticom ls-error opticom functional opticom intermed. grids l2-error opticom intermed. grids functional

0.001

least square error and functional

least square error and functional

0.01

1e-04

1e-05

1e-06

ct ls-error ct functional opticom ls-error opticom functional opticom intermed. grids l2-error opticom intermed. grids functional

0.001

1e-04

1e-05

1e-06 0

2

4

6

8

10

0

level

2

4

6

8

10

level

as it requires an embedding into a bigger discrete space which contains both Vi and Vj . Using these optimal coefficients ci the combination formula is now fnc (x) :=

d−1 



cl fl (x).

(10)

q=0 |l|1 =n−q 2

2

Now let us consider one particular additive function u = e−x + e−y , which we want to reconstruct based on 5000 random data samples in the domain [0, 1]2 . We use the combination technique and optimised combination technique for the grids Ωi,0 , Ω0,i , Ω0,0 . For λ = 10−4 and λ = 10−6 we show in Figure 3 the value of the functional (1), in Table 1 the corresponding numbers for the residuals and the cosine of γ = ∠(PU1 u, PU2 u) are given. We see that both methods diverge for higher levels of the employed grids, nevertheless as expected the optimised combination technique is always better than the normal one. We also show in Figure 3 the results for an optimised combination technique which involves all intermediate grids, i.e. Ωj,0 , Ω0,j for 1 ≤ j < i, as well. Here we do not observe rising values of the functional for higher levels but a saturation, i.e. higher refinement levels do not substantially change the value of the functional.

5 Conclusions Here we consider a generalisation of the usual kernel methods used in machine learning as the “kernels” of the technique considered here have singularities on the diagonal. However, only finite dimensional approximations

Fitting Multidimensional Data Using Combination Techniques level 1 2 3 4 5 6 7 8 9 10

cos(γ) -0.012924 -0.025850 -0.021397 -0.012931 0.003840 0.032299 0.086570 0.168148 0.237710 0.285065

e2c 3.353704 · 10−4 2.124744 · 10−5 8.209228 · 10−6 1.451818 · 10−5 2.873697 · 10−5 5.479755 · 10−5 1.058926 · 10−4 1.882191 · 10−4 2.646455 · 10−4 3.209026 · 10−4

247

e2o 3.351200 · 10−4 2.003528 · 10−5 7.372946 · 10−6 1.421387 · 10−5 2.871036 · 10−5 5.293952 · 10−5 9.284347 · 10−5 1.403320 · 10−4 1.706549 · 10−4 1.870678 · 10−4

Table 1. Residual for the normal combination technique e2c and the optimised combination technique, as well as cosine of the angle γ = ∠(PU1 u, PU2 u)

are considered. The overfitting effect which occurs for fine grid sizes is investigated. We found that the method (using the norm of the gradient as a penalty) did asymptotically (in grid size) overfit the data but did this very locally only close to the data points. It appeared that the information in the data was concentrated on the data point and only the null space of the penalty operator (in this case constants) was fitted for fine grids. Except for the overfitting in the data points one thus has the same effect as when choosing very large regularisation parameters so that the overfitting in the data points does arise together with an “underfitting” in other points away from the data. Alternatively, one could say that the regularisation technique acts like a parametric fit away from the data points for small grid sizes and overall for large regularisation parameters. The effect of the data samples is akin to a quadrature method if there are enough data points per element. In practise, it was seen that one required at least one data point per element to get reasonable performance. In order to understand the fitting behaviour we analysed the performance both on the data points and in terms of the Sobolev norm. The results do not directly carry over to results about errors in the sup norm which is often of interest for applications. However, the advice to have at least one data point per element is equally good advice for practical computations. In addition, the insight that the classical combination technique amplifies the sampling errors and thus needs to be replaced by an optimal procedure is also relevant to the case of the sup norm. The method considered here is in principle a “kernel method” [8] when combined with a finite dimensional space. However, the arising kernel matrix does have diagonal elements which are very large for small grids and, in the limit is a Green’s function with a singularity along the diagonal. It is well known in the machine learning literature that kernels with large diagonal elements lead to overfitting, however, the case of families of kernels which approximate a singular kernel is new.

248

J. Garcke and M. Hegland

References 1. D. Braess. Finite elements. Cambridge University Press, Cambridge, second edition, 2001. 2. F. Deutsch. Rate of convergence of the method of alternating projections. In Parametric optimization and approximation (Oberwolfach, 1983), volume 72 of Internat. Schriftenreihe Numer. Math., pages 96–107. Birkh¨ auser, Basel, 1985. 3. J. Garcke. Maschinelles Lernen durch Funktionsrekonstruktion mit verallgemeinerten d¨ unnen Gittern. Doktorarbeit, Institut f¨ ur Numerische Simulation, Universit¨ at Bonn, 2004. 4. J. Garcke, M. Griebel, and M. Thess. Data mining with sparse grids. Computing, 67(3):225–253, 2001. 5. M. Griebel, M. Schneider, and C. Zenger. A combination technique for the solution of sparse grid problems. In P. de Groen and R. Beauwens, editors, Iterative Methods in Linear Algebra, pages 263–281. IMACS, Elsevier, North Holland, 1992. 6. M. Hegland, J. Garcke, and V. Challis. The combination technique and some generalisations. Linear Algebra and its Applications, 420 (2-3): 249–275, 2007. 7. M. Hegland. Additive sparse grid fitting. In Proceedings of the Fifth International Conference on Curves and Surfaces, Saint-Malo, France 2002, pp. 209–218. Nashboro Press, 2003. 8. B. Sch¨ olkopf and A. Smola. Learning with Kernels. MIT Press, 2002. 9. A. N. Tikhonov and V. A. Arsenin. Solutions of ill-posed problems. W. H. Winston, Washington D.C., 1977. 10. G. Wahba. Spline models for observational data, volume 59 of Series in Applied Mathematics. SIAM, Philadelphia, 1990.

Mathematical Modelling of Chemical Diffusion through Skin using Grid-based PSEs Christopher Goodyer1 , Jason Wood1 , and Martin Berzins2 1

2

School of Computing, University of Leeds, Leeds, UK [email protected], [email protected] SCI Institute, School of Computing, University of Utah, Salt Lake City, USA [email protected]

Abstract A Problem Solving Environment (PSE) with connections to remote distributed Grid processes is developed. The Grid simulation is itself a parallel process and allows steering of individual or multiple runs of the core computation of chemical diffusion through the stratum corneum, the outer layer of the skin. The effectiveness of this Grid-based approach in improving the quality of the simulation is assessed.

1 Introduction The use of Grid technologies [8] has enabled the automated use of remote distributed computers for many new applications. Many of the larger Grid projects around the world are not simply using high performance computers (HPC) but are making better use of the distributed networks of machines within their own companies and institutions. In many of these cases, the problems being tackled on these systems are not so large that parallel computing is being used for single cases, but instead, multiple production runs, sometimes within the context of a design optimization process, are run on a multiprocessor architecture, e.g. [11]. The key elements of any computation are that the results obtained are useful, accurate, and have been obtained as efficiently as possible. Accuracy of the results is an issue that is resolved by an appropriate choice of mathematical model and computational method. The usefulness of the results is usually governed by choice of appropriate input parameters, ranging from those representing the solution case to those concerning the methods used during the computation. A common method for getting better control of the relationship between input parameters and output results has been through the use of Problem Solving Environments (PSEs). These typically combine the inputs, the computation, and visualisation of the outputs into one workflow, where the user is said to “close the loop” by feeding back from the visualisation of the output, to changes to the inputs. A key element of PSEs is that they

250

C. Goodyer et al.

must be accessible for non-expert computer users, since the users are typically scientists more focused on their own fields rather than on the intricacies of using the Grid. PSEs are discussed in general in Sect. 3, together with the PSE developed in this work. The application considered here is that of chemical diffusion through the skin. In this work we have simulated how the application of a chemical on the outside of the body gets through the outer layer into the body. The use of multiple simulations is very important for situations such as this, since each calculation represents only one particular case. For transient cases using the true full 3-d heterogeneous skin structure it can take hours or even days for solutions to fully converge. Through the use of multiple instances of the solver it is possible to reduce this to the maximum individual runtime, provided enough resources are available. Brief details of the solver, and the range of cases being considered are given in Sect. 2. The interaction of the PSE with the remote processes is the most important part of this work. The software described here has allowed transparent use of the Grid. For example making the process of running a large parallel job on a remote resource as easy as running a small one locally is very important for non-expert users. We have used the gViz library [2, 22] to provide the Grid middleware to handle the communication between the user’s desktop and the remote Grid processes. How these components are joined together is discussed in Sect. 4 where the use of a directory service of running jobs is also described. The main advantages of using a PSE are the ability to visualise the output datasets and to steer the calculations; these are discussed in Sect. 5. The paper is summarised in Sect. 6 along with a summary of the advantages of the Gridbased approach described, and suggested necessary extensions for future work.

2 Chemical Diffusion Through Skin The motivating problem in this work is that of numerical modelling of chemical diffusion through the skin. Such a situation might arise either purposefully or accidentally. For example a person being accidentally exposed to a chemical at work may hope for minimal adsorption into the body, but application of a drug transdermally could have great therapeutic benefits. The barrier function of the skin comes almost entirely from the outermost layer, the stratum corneum. This is a highly lipophilic layer only about 10µm thick. Once a chemical has got through the stratum corneum it is into the viable epidermis, which is hydrophilic, and hence effectively into the blood steam. The stratum corneum itself is made up of between six and 40 layers of corneocytes. Each corneocytes is hexagonal in shape, and is typically about 40µm across, 1µm high. They are surrounded by a lipid layer about 0.1µm wide, both between the tessellating corneocytes and between the individual layers. It is through this lipid that almost all of the chemical diffuses since the corneocytes themselves are almost impermeable. The diffusion path through

Mathematical Modelling of Diffusion through Skin using Grid-based PSEs

251

the lipid from surface to body is very tortuous due to the aspect ratio of the corneocytes. The mathematical model of this Fickian diffusion process has been well understood for some time, although the modelling is often only done on 1-d homogeneous membranes. There has been some 2-d work on “brick and mortar” shaped domains, notably that of Heisig et al. [13] and Frasch [9] but this work is the first to tackle the true three dimensional nature of the skin. The model of Fickian diffusion is given in non-dimensional form by:       ∂ ∂θ ∂θ ∂ ∂θ ∂ ∂θ = γ + γ + γ , (1) ∂τ ∂X ∂X ∂Y ∂Y ∂Z ∂Z where X, Y and Z are the coordinates, τ is the time, θ is the concentration, and γ the diffusion coefficient, with boundary conditions: θ = 1, on Z = L θ = 0, on Γ : Z = 0 , and periodic boundary conditions between opposing faces perpendicular to the X-Y plane. In this work we are assuming that the corneocytes themselves are impermeable and hence on these boundaries we are assuming symmetry conditions perpendicular to the boundary. The domains have been meshed using unstructured tetrahedra, and discretised and solved using a Galerkin linear finite element solver, based on that originally developed by Walkley [19]. The quantities of interest are the concentration profile throughout the domain; the total flux out through the bottom face and the lag time a measure of the relative time taken to reach steady state. These are defined precisely as  ∂θ 1 dΓ (2) Flux out, Fout = AΓ Γ ∂Z  T Cumulative mass out = Fout dτ , (3) τ =0

where the lag time is the X-intercept of the steady state rate of cumulative mass out, extrapolated backwards. The key questions being considered with the solver thus far have been concerned with the physical geometric model of the skin. The differences in the alignment of 2-d corneocytes was considered in [12] and these differences are further complicated by the move to 3-d. We have been considering the effects of alignment, and how these are affected by the aspect ratio of the corneocytes and the thickness of the lipids. Part of this work is to verify the previously published geometric estimates for the effect of “slits, wiggles and necking” around the corneocytes [5, 14]. The effect of the number of layers of corneocytes is also of great importance. The idea of layer independence of quantities needs rigorous proof, as the relative difference between having two or three layers is much greater than the difference between ten and eleven layers. Another case of great importance concerns the effect of the size of

252

C. Goodyer et al.

application patches to the skin. Since any patch spreads out sideways as well as down, even through homogeneous membranes, then the separation of patches compared to the depth of stratum corneum makes significant changes to both the mass of chemical getting into the body and the lag time.

3 Problem Solving Environments PSEs allow a user to break away from the traditional batch mode of processing. Traditionally, the user of a scientific research code would set all the parameters inside the program, compile the software, run it with the results saved to disk. These results could then be loaded into a visualisation package, processed and rendered as a screen output. Only once this sequence had been completed would the user be able to decide if the original set of parameters needed alteration, and the whole process would begin again. PSEs are typically built in or around visualisation packages. Some commercial visualisation products have added PSE capabilities during their development, such as IRIS Explorer [21] and AVS [18]. Other systems that were built around particular application areas have expanded into more general products, such as SCIRun [16]. These all have the same key elements of modular environments where graphical blocks represent the tasks (modules), with wires joing the blocks representing dataflow from one task to another. The arrangements of modules and wires are variously referred to as workflow, dataflow networks and maps. With a PSE the simulation is presented as a pre-compiled module in the workflow, and all the variables that may be changed are made available to the user through a graphical user interface (GUI). An example of the interface of the standard PSE for chemical diffusion through the skin, with the simulation running completely on the local machine, is shown with IRIS Explorer in Fig. 1. The key parts of the environment are, clockwise from top left, the librarian (a repository of all available modules), the map editor (a workspace for visually constructing maps), the output visualisation, and the input modules. IRIS Explorer has a visual programming interface where modules are dragged from the librarian into the map editor and then input and output ports are wired together between different modules to represent the flow of data through the pipeline, typically ending up with a rendered 3-d image. The inputs to the modules are all given through GUIs enabling parameter modification without the need to recompile. In the map shown the first module is the interface to the mesh generator, Netgen [17]. This module takes inputs concerning the alignment of the corneocytes, their sizes and separation, and how much their alignment is staggered. It then passes the appropriately formatted mesh structure onto the second module which performs the simulation. This module takes user inputs such as the location and size of chemical patch on the skin, whether the solve is to be steady state or transient, and computational parameters such as the level of initial grid adaptation and the frequency of generation of output datasets.

Figure 1. IRIS Explorer PSE for skin

4 Remote Grid Simulation

One of the drawbacks to using a PSE used to be that the simulation was part of the environment and therefore had to be run locally within the PSE. This assumption, however, is not necessarily true, as was shown by the ‘Pollution Demonstrator’ of Walkley et al. [3, 20], which extended the approach to a remote simulation connected to a local (IRIS Explorer) PSE that still handled the steering and visualisation needs. This bespoke approach has been extended into a generic thread-based library, called gViz [2], which provides an interface to the remote simulation. The local PSE environment is now independent and hence no longer needs to be IRIS Explorer.

The gViz library itself operates as a collection of threads attached to the simulation code which manage the connections between the simulation and any connected users. Thus, once the code is running, the simulation initialises the library, making itself available for connections from multiple users. When a user connects to the simulation, new threads are started to handle steering and output datastreams. The library does not limit the number of connected users, making it possible for multiple users to collaborate by visualising the output results and modifying the input parameters of the same simulation.

To launch the simulation onto a remote resource it is necessary to select both the desired machine and an appropriate job launching mechanism. The launching module can interrogate a GIIS service (part of the Globus toolkit [7]) to discover information about available resources. Once the simulation starts, the gViz thread opens a port to which connections can be made.

The gViz library supports several communication methods between the simulation and other processes. Here we are using a web service style directory service communication built around SOAP using the gSOAP library [6]. In this method, the simulation contacts the directory service at startup and deposits its connection details. These can then be retrieved by one or more PSEs when convenient to the user. The ability to undertake collaborative work is provided at no extra cost through the use of the gViz library. Since the directory service can be advertised to collaborators as running in a consistent location, they will all be able to connect to it to discover the locations of all the running simulations. This also assists users who may wish only to visualise the output rather than steer. When multiple simulations are involved there are additional job submission considerations. Many providers of Grid resources actively limit the number of jobs each user may execute simultaneously, and hence running large numbers concurrently is often not possible on the chosen resource. In the work of Goodyer et al. [11] it was seen how the use of a parallel environment was beneficial for the solution time of a previously serial optimisation application, through concurrent simulation of independent cases with similar parameter sets. Here we extend this idea to run the multiple independent simulations inside one large MPI job. This means that co-allocation of all runs is handled by the resource’s own job scheduler and reduces the number of submitted jobs to just one, thus avoiding any resource limits set. When the MPI job starts, all simulations register separately with the directory service, with unique identifiers allowing each to communicate back to the PSE separately if required.

5 Visualization and Steering

With multiple simulations being controlled by the same PSE there are issues concerning the effective management of both input and output data [4, 22]. In this section we address these in turn, relating to how they have been used for the skin cases solved to date. The cases discussed here are all concerned with assessing how the geometric makeup of the skin affects the calculated values of flux out and lag time. To that end we have already generated a large selection of 3-d meshes for the solver to use. On start-up, each parallel process uses standard MPI [15] commands to discover its unique rank in the simulation and to load the appropriate mesh. The steering control panel has a collection of input variables, known as steerables, and output values, known as viewables. The steerables are the inputs described in Sect. 2, and the viewables are output variables calculated by the simulation to which the module is currently attached, including the quantities of interest and the current time of the solver through a transient simulation.
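The sketch below is a minimal illustration (not the project's solver code) of the start-up step just described: each process in the single large MPI job discovers its own rank and loads a correspondingly named, pre-generated mesh. The file naming scheme and the placeholder mesh reader are assumptions made only for this example.

```python
# Minimal sketch of per-rank start-up inside one large MPI job.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()            # unique identifier of this simulation instance
n_sims = comm.Get_size()          # total number of concurrent simulations

mesh_file = f"skin_mesh_{rank:03d}.vol"   # hypothetical pre-generated mesh file


def load_mesh(path):
    """Placeholder reader; a real run would parse the mesh generator's format."""
    with open(path) as f:
        return f.read()


mesh = load_mesh(mesh_file)
print(f"simulation {rank} of {n_sims} loaded {mesh_file}")
# Each instance would now register with the directory service under its own
# identifier and start its transient diffusion solve, steerable from the PSE.
```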

Figure 2. MultiDisplay visualising many output streams

Another difference between the PSE shown in Fig. 1 and the Grid-enabled version is that the steering module now has two location bars along the bottom, and the buttons ‘Connect’ and ‘Steer all’. The two location bars contain the connection information retrieved from the directory service: the first holds an individual location, the second a multiline list of simulations. Steering updates can therefore be sent to anywhere from one to all of the simulations simultaneously. In this manner it is possible to extend the experimentation approach discussed in Sect. 3 to many simulations, rather than one specific case at a time. In addition to the ability to ‘Steer all’, it is similarly possible to retrieve all the current viewable parameters from the simulations currently running, and hence to produce summary visualisations of all the cases. The visualisation of remote gViz processes for both 2-d and 3-d datasets has been done in IRIS Explorer, SCIRun, VTK and Matlab [1, 2, 10, 20]. The data sent is generic, to enable the client PSE end to convert it into a native format. For concurrent simulations visualisation is conceptually no harder than for a single simulation; in practice, however, it is slightly different. In the work of Handley et al. [1] this has been tackled for the first time: multiple datastreams from multiple simulations are combined, and the 2-d output is tiled across the display. In 3-d this is a significantly harder problem, because the quantities of data being returned by the simulations are potentially orders of magnitude larger. To address this issue we have developed a ‘MultiDisplay’ module, as shown in Fig. 2. This module receives images rather than the usual 3-d data. The individual datastreams from the simulations are rendered off-screen,
with an appropriate colourmap and camera position, to produce a 2-d image, and these images are fed into the MultiDisplay module. The image pane on the left shows tiles of 16 simulations at a time, and these may be enlarged into the right hand pane for closer inspection. With multiple simulations connected to the PSE the quantity of data being received is the same regardless of how it is visualised; it is only the quality of the rendering process used to produce the final images that makes a difference to the time and memory used.
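As a purely illustrative sketch of the tiling idea (this is not the gViz or IRIS Explorer MultiDisplay module, and the image file names are assumptions), up to 16 pre-rendered simulation images can be arranged into a 4 x 4 overview pane as follows.

```python
# Tile pre-rendered per-simulation images into a single 4 x 4 overview.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

image_files = [f"sim_{i:02d}.png" for i in range(16)]   # hypothetical rendered outputs

fig, axes = plt.subplots(4, 4, figsize=(8, 8))
for ax, fname in zip(axes.flat, image_files):
    ax.imshow(mpimg.imread(fname))      # each tile shows one simulation's image
    ax.set_title(fname, fontsize=6)
    ax.axis("off")
fig.tight_layout()
fig.savefig("multidisplay_overview.png", dpi=150)
```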

6 Conclusions

In this work we have demonstrated how an intensive finite element solution code can be run through a Problem Solving Environment. It has been seen that this simulation can be launched remotely onto a Grid resource, and still remain interactive for both steering and output visualisations. By running multiple simulations through one large MPI job it is possible to calculate different cases concurrently, for example with the same set of steerable inputs. Steering has been shown to be possible on both individual cases and sets of cases up to a ‘steer all’ capacity. The output visualisations for these cases have been shown to be possible in a highly detailed manner, through the use of the visualisation environment's processing power on individual cases, or in a group processing fashion to render the simultaneous cases all in the same window.

The use of the PSE has provided two forms of steering to the skin scientist. Firstly, the ability of the scientist to ask “What if...?” questions on individual cases has been seen to be very beneficial in getting quick answers to individual questions. Through the use of multiple concurrent simulations it has been possible to get more general answers over a wider range of cases than had been possible with just the single run. An important part of running the simulations through the PSE and on the Grid has been the ability to detach the local processes from the remote simulations, hence enabling monitoring on demand rather than being constantly connected. It is true that when potentially hundreds of cases are run for hours and days, the contact time with the simulations is probably only a small percentage of that time; however, the added flexibility enables the user to see instantly when a parameter has been set up incorrectly. This means that erroneous computations can be minimised and all cases adjusted accordingly.

There are several issues that have arisen out of this work needing further consideration. The main ones are concerned with how to efficiently handle the visualisation of large numbers of large datastreams. For the skin geometries we have considered here the obvious first step would be to give the user the option of only retrieving the surface mesh. This would reduce the necessary transmission time from the simulation to the PSE, and also reduce the amount of work necessary locally to render the final image. By retaining the
full dataset at the simulation end it would be possible to examine the data of an individual simulation in greater depth than would normally be done for multiple simulations. Another potential idea would be to do far more of the visualisation work remotely, reducing the local load even further. This could be done by using products such as the Grid-enabled version of IRIS Explorer discussed by Brodlie et al. [2] to put the visualisation process closer to the simulation than to the desktop, hence removing the need for the data to reach the local machine before the final rendering.

Acknowledgments

This work is funded through the EPSRC grant GR/S04871 and builds on the “gViz Grid Middleware” UK e-Science project. We also acknowledge Annette Bunge of the Colorado School of Mines for her collaboration and expertise.

References

1. Aslanidi, O. V., Brodlie, K. W., Clayton, R. H., Handley, J. W., Holden, A. V. and Wood, J. D. Remote visualization and computational steering of cardiac virtual tissues using gViz. In: Cox, S., and Walker, D. W., eds.: Proceedings of the 4th UK e-Science All Hands Meeting (AHM’05), EPSRC (2005)
2. Brodlie, K., Duce, D., Gallop, J., Sagar, M., Walton, J. and Wood, J. Visualization in Grid Computing Environments. IEEE Visualization (2004)
3. Brodlie, K. W., Mason, S., Thompson, M., Walkley, M. A., and Wood, J. W. Reacting to a crisis: benefits of collaborative visualization and computational steering in a Grid environment. In: Proceedings of the All Hands Meeting 2002 (2002)
4. Brooke, J. M., Coveney, P. V., Harting, J., Jha, S., Pickles, S. M., Pinning, R. L., and Porter, A. R. Computational steering in RealityGrid. In: Cox, S., ed.: Proceedings of the All Hands Meeting 2003, EPSRC, 885–888 (2003)
5. Cussler, E. L., Hughes, S. E., Ward III, W. J. and Rutherford, A. Barrier Membranes. Journal of Membrane Science, 38:161–174 (1988)
6. van Engelen, R. A. and Gallivan, K. A. The gSOAP Toolkit for Web Services and Peer-To-Peer Computing Networks. In: Proceedings of IEEE CCGrid (2002)
7. Foster, I. and Kesselman, C. Globus: A metacomputing infrastructure toolkit. International Journal of Supercomputer Applications, 11, 115–128 (1997)
8. Foster, I. and Kesselman, C. The Grid 2: The Blueprint for a New Computing Infrastructure. Elsevier (2004)
9. Frasch, H. F. and Barbero, A. M. Steady-state flux and lag time in the stratum corneum lipid pathway: results from finite element models. Journal of Pharmaceutical Sciences, Vol 92(11), 2196–2207 (2003)
10. Goodyer, C. E. and Berzins, M. Solving Computationally Intensive Engineering Problems on the Grid using Problem Solving Environments. In: Cuhna, J. C. and Rana, O. F., eds., Grid Computing: Software Environments and Tools. Springer Verlag (2006)

11. Goodyer, C. E., Berzins, M., Jimack, P. K., and Scales, L. E. A Grid-enabled Problem Solving Environment for Parallel Computational Engineering Design. Advances in Engineering Software, 37(7):439–449 (2006)
12. Goodyer, C. E. and Bunge, A. What If...? Mathematical Experiments on Skin. In: Proceedings of the Perspectives in Percutaneous Penetration, La Grande Motte, France (2004)
13. Heisig, M., Lieckfeldt, R., Wittum, G., Mazurkevich, G. and Lee, G. Non steady-state descriptions of drug permeation through stratum corneum. I. The biphasic brick-and-mortar model. Pharmaceutical Research, Vol 13(3), 421–426 (1996)
14. Johnson, M. E., Blankschtein, D. and Langer, R. Evaluation of solute permeation through the stratum corneum: Lateral bilayer diffusion as the primary transport mechanism. Journal of Pharmaceutical Sciences, 86:1162–1172 (1997)
15. Message Passing Interface Forum: MPI: A message-passing interface standard. International Journal of Supercomputer Applications 8 (1994)
16. Parker, S. G., and Johnson, C. R. SCIRun: A scientific programming environment for computational steering. In: Meuer, H. W., ed.: Proceedings of Supercomputer ’95, New York, Springer-Verlag (1995)
17. Schöberl, J. Netgen mesh generation package, version 4.4 (2004). http://www.hpfem.jku.at/netgen/
18. Upson, C., Faulhaber, T., Kamins, D., Laidlaw, D., Schlegel, D., Vroom, J., Gurwitz, R. and van Dam, A. The application visualization system: A computational environment for scientific visualization. IEEE Computer Graphics and Applications, 9(4):30–42 (1989)
19. Walkley, M. A., Jimack, P. K. and Berzins, M. Anisotropic adaptivity for finite element solutions of 3-d convection-dominated problems. International Journal of Numerical Methods in Fluids, Vol 40, 551–559 (2002)
20. Walkley, M. A., Wood, J., Brodlie, K. W. A distributed collaborative problem solving environment. In: Sloot, P. M. A., Tan, C. J. K., Dongarra, J. J., Hoekstra, A. G., eds.: Computational Science, ICCS 2002 Part I, Lecture Notes in Computer Science, Volume 2329, 853–861, Springer (2002)
21. Walton, J. P. R. B. Now you see it – interactive visualisation of large datasets. In: Brebbia, C. A., Power, H., eds.: Applications of Supercomputers in Engineering III. Computational Mechanics Publications/Elsevier Applied Science (1993)
22. Wood, J. W., Brodlie, K. W., Walton, J. P. R. gViz: Visualization and computational steering for e-Science. In: Cox, S., ed.: Proceedings of the All Hands Meeting 2003, EPSRC, 164–171 (2003) ISBN: 1-904425-11-9

Modelling Gene Regulatory Networks Using Galerkin Techniques Based on State Space Aggregation and Sparse Grids

Markus Hegland (a), Conrad Burden (b), and Lucia Santoso (a)

(a) Mathematical Sciences Institute, ANU and ARC Centre in Bioinformatics
(b) Mathematical Sciences Institute and John Curtin School of Medical Research, ANU

Abstract An important driver of the dynamics of gene regulatory networks is noise generated by transcription and translation processes involving genes and their products. As relatively small numbers of copies of each substrate are involved, such systems are best described by stochastic models. With these models, the stochastic master equations, one can follow the time development of the probability distributions for the states defined by the vectors of copy numbers of each substance. Challenges are posed by the large discrete state spaces, and are mainly due to high dimensionality. In order to address this challenge we propose effective approximation techniques, and, in particular, numerical techniques to solve the master equations. Two theoretical results show that the numerical methods are optimal. The techniques are combined with sparse grids to give an effective method to solve high-dimensional problems.

1 Introduction

Biological processes can involve large numbers of proteins, RNA, and genes which interact in complex patterns. Modern biology has uncovered many of the components involved and identified many basic patterns of their interactions. The large number of components does, in itself, pose a major challenge to the investigation of their interactions. Consequently, one often studies subsystems in isolation both in vitro and in silico. These subsystems are localised in time and space and may themselves display complex behavioural patterns. In order to be able to deal with this complexity, computational tools need to be scalable in the number of components and their interactions. In the following, we will consider some new computational tools which have the potential to deal with high complexity. We will, in particular, discuss tools which are suitable for investigating transcriptional regulation processes. Transcription is the process of replicating genetic information to messenger RNA. The main machinery doing transcription is the RNA polymerase which binds at promoter sites of the DNA and polymerises messenger RNA
(mRNA) encoding the same sequence as the gene. Later, the mRNA is used as a blueprint for the synthesis of proteins by the ribosomes. The process of transcription (and translation) is relatively slow compared to other biochemical processes like polymerisation and protein-DNA interactions. This simple observation will allow approximations which then lead to substantial computational simplifications.

Transcription can be both negatively and positively regulated. In an example of negative regulation, a repressor protein (or complex) binds to an operator site (on the DNA) which overlaps a promoter site belonging to a gene. This prevents the RNA polymerase from binding to the promoter site and consequently transcription of this gene is blocked. Positive regulation, on the other hand, may be achieved by proteins which are bound to operator sites close to a promoter site. These proteins may interact with the RNA polymerase to attract it to the promoter site and thus initiate transcription. Following transcription, the mRNA is translated by ribosomes to form proteins. As transcription factors consist of proteins, which themselves have been synthesised in the transcription/translation process, feedback loops emerge where the transcription factors control their own synthesis. The regulation processes discussed above thus give rise to complex regulatory networks which are the target of the tools discussed in the following. The reader interested in a more in-depth discussion and, in particular, a more comprehensive coverage of the principles of regulation of transcriptional control, can consult [12, 14].

An accurate, albeit computationally infeasible, model of the transcriptional regulation processes is provided by the Schrödinger equations modelling the electrons and atomic nuclei of the proteins, DNA, water molecules and many other substances involved. A much simpler, but still infeasible, model is based on molecular dynamics where the state of a system is described by the locations (and conformations) of all the molecules involved. At the next stage one could model a cell by the number of molecules in each compartment of the cell. The models considered here have only one compartment and the state is modelled by the number of copies of each component. Transcriptional control processes are modelled as biochemical reactions. Chemical reactions are characterised by their stoichiometry and their kinetic properties. Consider first the stoichiometry. A typical process is the process of two components A and B (e.g., proteins, or transcription factor/DNA operator) forming a complex: A + B ⇆ A · B. In the very simplified model considered here the system (cell) is modelled by the copy numbers of substances or species $A_1, \dots, A_s$. These include proteins, RNA, DNA operator sites and complexes. The state of the system is characterised by the copies of each species. These copy numbers $x_1, \dots, x_s$ can range from one (in the case of DNA operator sites) to several hundreds in the case
of proteins. The copy numbers define the “state” of the system, which is thus described by the vector $x = (x_1, \dots, x_s) \in \mathbb{N}^s$. Chemical reactions are then interpreted as “state transitions”. We assume that the $s$ species are involved in $r$ reactions. Each chemical reaction is described by two stoichiometric vectors $p_j, q_j$ with which one can write the reaction as
$$\sum_{i=1}^{s} p_{j,i} A_i \;\rightleftharpoons\; \sum_{i=1}^{s} q_{j,i} A_i, \qquad j = 1, \dots, r. \qquad (1)$$

Note that most components $p_{j,i}, q_{j,i}$ of the stoichiometric vectors $p_j, q_j$ are zero; the most typical nonzero components are one, and higher numbers may occur as well (e.g. two in the case of dimerisation). The $j$-th chemical reaction then gives rise to a state transition from the state $x$ to the state $x + z_j$ where $z_j = q_j - p_j$. The species considered consist of “elementary” species like proteins, RNA and DNA operator sites and of “compound” species which are formed as complexes of the elementary species. The overall numbers of elementary species (bound and free) are constant over time and form an invariant vector $y$ which depends linearly on $x$: $y = Bx$, where the matrix $B$ describes the compositions of the compounds. In particular, for the $j$-th reaction one has $Bp_j = Bq_j$, or $Bz_j = 0$, $j = 1, \dots, r$. Note that the $p_j$ and $q_j$ can be regarded as the positive and negative parts of the $z_j$. A consequence of the invariant $Bx = y$ is also that it defines the feasible domain
$$\Omega := \{x \mid Bx = y,\ x \in \mathbb{N}^s\}. \qquad (2)$$
In the computations we take as the computational domain the smallest rectangular grid which contains $\Omega$. In some cases one considers open systems where the domain $\Omega$ is unbounded and some of the species are fed through an external “reservoir”. Stochastic models are described by their probability distributions $p(x; t)$ where $x \in \Omega$ and where $t > 0$ is the time. Equivalently, one may describe the systems by random variables $X(t)$, i.e., stochastic processes. The stochastic system is assumed to be Markovian and is characterised by the conditional probability distribution $p(x_2; t_2 \mid x_1, t_1)$, which is the probability that the system is in state $x_2$ at time $t_2$ if it was in state $x_1$ at time $t_1$. This is basically a transition probability. It is assumed that the system is stationary, i.e., that
$$p(x_2; t + s \mid x_1, t) = p(x_2; s \mid x_1, 0). \qquad (3)$$

It follows that
$$p(x_2; t_2) = \sum_{x_1 \in \Omega} p(x_2; t_2 \mid x_1, t_1)\, p(x_1; t_1). \qquad (4)$$

Taking the derivative with respect to s and evaluating at s = 0 one gets a system of linear differential equations:

$$\frac{\partial p(x;t)}{\partial t} = \sum_{y\in\Omega} A(x \mid y)\, p(y;t) \qquad (5)$$
where
$$A(x \mid y) = \left.\frac{\partial p(x; s \mid y, 0)}{\partial s}\right|_{s=0}.$$
As one has $\sum_{x\in\Omega} p(x;t) = 1$ it follows that $\sum_{x\in\Omega} A(x \mid y) = 0$. In a very small time step $\Delta t$ one gets a reasonable approximation with
$$p(x; \Delta t) = p(x; 0) + \Delta t \sum_{y\in\Omega} A(x \mid y)\, p(y; 0).$$

Consider now that the system is in state $x_0$ at time zero, so that $p(x_0; 0) = 1$ and $p(x; 0) = 0$ for $x \neq x_0$. As the probabilities cannot be negative, one sees from the approximation formula that $A(x \mid y) \geq 0$ for $y \neq x$, and, as the columns of $A$ have to sum to zero, one has $A(x \mid x) \leq 0$. In the case of one reaction (or state transition) $x \to x + z_j$, for (asymptotically) small $\Delta t$ only one reaction will occur, and so one sees that $A(\cdot \mid y)$ can only have one nonzero off-diagonal element in every column. Consider now reaction $j$ in isolation and set for this case $a_j(x) = -A(x \mid x)$. As the columns sum up to zero one has $A(x + z_j \mid x) = a_j(x)$. The differential equations are then
$$\frac{\partial p(x;t)}{\partial t} = a_j(x - z_j)\, p(x - z_j; t) - a_j(x)\, p(x; t).$$
Note that by definition the propensity $a_j(x) \geq 0$. For the case of multiple reactions, one can then derive the following differential equations:
$$\frac{\partial p(x;t)}{\partial t} = \sum_{j=1}^{r} \big[ a_j(x - z_j)\, p(x - z_j; t) - a_j(x)\, p(x; t) \big]. \qquad (6)$$

These are the fundamental master equations. The propensities $a_j(x)$ depend on the numbers of particles on the left hand side of the reaction; more specifically, they depend on the $p_j$ and only on the components of $x$ which also occur in $p_j$. Essentially, they are polynomial in $x$ and are known up to a multiplicative constant from the law of mass action. The master equations can be written as a linear initial value problem with differential equations
$$\frac{dp}{dt} = A p \qquad (7)$$
and initial conditions $p(\cdot; 0)$; formally, one has the solution
$$p(\cdot; t) = e^{At} p(\cdot; 0). \qquad (8)$$

The pj , qj and aj (x) fully determine the reaction. Using them and the chemical master equations one can now determine p(x; t) for different times and a given initial condition p(x; 0).
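As a concrete illustration of equations (6)-(8), the following sketch (ours, not the solver proposed later in this paper) assembles the generator matrix $A$ for a toy one-species birth-death system on a truncated state space and evaluates the solution (8) with a matrix exponential. The reaction set, rate constants and truncation level are assumptions made only for this example.

```python
# Assemble the master equation generator A for a toy system and evaluate (8).
import itertools
import numpy as np
from scipy.linalg import expm

max_copies = 30                      # truncation of the copy-number range (assumption)
states = list(itertools.product(range(max_copies + 1), repeat=1))   # one species
index = {x: i for i, x in enumerate(states)}

# Production 0 -> A (z = +1) and decay A -> 0 (z = -1); propensities follow the
# law of mass action up to the assumed constants k_prod and k_dec.
k_prod, k_dec = 5.0, 0.2
reactions = [(np.array([+1]), lambda x: k_prod),
             (np.array([-1]), lambda x: k_dec * x[0])]

n = len(states)
A = np.zeros((n, n))
for x, i in index.items():
    for z, a in reactions:
        rate = a(np.array(x))
        y = tuple(np.array(x) + z)
        A[i, i] -= rate              # column sums of A are zero (up to truncation)
        if y in index:
            A[index[y], i] += rate   # probability flux from state x to state x + z

p0 = np.zeros(n)
p0[index[(0,)]] = 1.0                # start with zero copies
p_t = expm(A * 10.0) @ p0            # distribution at time t = 10
print("mean copy number at t = 10:",
      sum(x[0] * p_t[i] for x, i in index.items()))
```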

Assume that we are able to determine $p(x; t)$. How does this help us understand the gene regulatory network? With the probability distribution one may determine the probability of any subset $\Omega' \subset \Omega$ at any time. Such a subset may relate to certain disease states, development or differentiation, the cell cycle, or, in the case of the lambda phage, to whether the phage is in lysogeny or lysis. In the particular example of the lambda phage, the levels of the proteins cro and cI (the repressor) indicate whether the phage lyses, so that the death of the host is imminent, or is in lysogeny, where the host is not threatened. The probability of the set $\Omega'$ is
$$p(\Omega'; t) = \sum_{x\in\Omega'} p(x; t).$$

Other questions can also be answered which relate to the levels of proteins at the stationary state $p_\infty(x) = \lim_{t\to\infty} p(x; t)$, in particular the average protein levels, which are the expectations of the components $x_i$,
$$E(x_i) = \sum_{x\in\Omega} x_i\, p(x; t),$$

and, of course, their variances. Other questions which can be addressed relate to the marginal distributions of the components involved. As the probability $p(x; t)$ is so useful, one would like an efficient way to determine it. This is hampered by the size of the feasible set $\Omega$, which depends exponentially on the dimension $d$. This is a manifestation of the curse of dimensionality. One way to address this curse is to use simulation [6]. Here, sample paths $x^{(i)}(t)$ are generated and the sample statistics are used to estimate probabilities. Examples of regulatory networks where this approach has been used include the lac operon and the lambda phage, see [1]. The error of sampling methods is of order $O(n^{-1/2})$, where $n$ is the number of sample paths used, and this approach amounts to a Monte Carlo method for the computation of the underlying integrals. Often one requires large numbers of simulations in order to obtain accurate estimates. The direct determination of $p(x; t)$ would be preferable if it were feasible. Several groups [4, 10] have made efforts to achieve this. In the following we will show how some earlier work by biologists can lead to one approach. In the next section we review this earlier approach. In the following main section we provide the two main convergence results of the Galerkin approach. Then we show how this can be combined with sparse grids.
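The sampling approach mentioned above is usually realised with Gillespie's stochastic simulation algorithm [6]. The following sketch (our illustration, reusing the assumed birth-death reactions from the previous example) generates a single sample path $x^{(i)}(t)$; averaging statistics over many such paths gives the Monte Carlo estimates with $O(n^{-1/2})$ error discussed above.

```python
# Gillespie's direct method for one sample path of the toy birth-death system.
import numpy as np

rng = np.random.default_rng(0)
k_prod, k_dec = 5.0, 0.2                     # assumed rate constants
z = np.array([+1, -1])                       # state changes of the two reactions


def propensities(x):
    return np.array([k_prod, k_dec * x])


def sample_path(x0=0, t_end=10.0):
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        a = propensities(x)
        a0 = a.sum()
        if a0 == 0.0:
            break
        tau = rng.exponential(1.0 / a0)      # waiting time to the next reaction
        if t + tau > t_end:
            break
        j = rng.choice(len(a), p=a / a0)     # which reaction fires
        t, x = t + tau, x + z[j]
        path.append((t, x))
    return path


print(sample_path()[-1])                     # time and state at the end of the path
```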

2 Approximation of Fast Processes

Biological processes often involve subprocesses at multiple time scales. At least two time scales have been identified for gene regulatory processes: the slow scale of protein production involving transcription and translation and
the fast scales of polymerisation and of protein/gene interactions. The scales correspond to a blocking of the state variables into a fast and a slow part, i.e., $x = (x_f, x_s)$. Protein numbers correspond to slow variables while the states of operator switches correspond to fast variables. By definition of the conditional probability distribution $p(x_f \mid x_s; t)$ one has $p(x_f, x_s; t) = p(x_f \mid x_s; t)\, p(x_s; t)$. We assume now that the fast processes are so fast that $p(x_f \mid x_s; t)$ can be replaced by its stationary limit, which we denote by $p(x_f \mid x_s)$. If $p_s$ denotes the probability distribution over the slow variables then one can restate the above equation as $p = F p_s$, where the components of $F$ are $F(x_f, x_s \mid y_s) = 0$ for $x_s \neq y_s$ and $F(x_f, x_s \mid x_s) = p(x_f \mid x_s)$. Let the aggregation operator $E$ be defined by $E(x_s \mid y_f, y_s) = 0$ if $y_s \neq x_s$ and $E(y_s \mid y_f, y_s) = 1$. Then one has $EF = I$ and $F$ is usually called the disaggregation operator. Furthermore, one has $p_s = Ep$ and so
$$\frac{dp_s}{dt} = E\,\frac{dp}{dt} = EAp = EAFp_s.$$
Having determined $p_s$ one can then compute $p$. This is not the only way to do aggregation/disaggregation; for other operators based on piecewise constant approximations of smooth probability distributions, see [10]. The required conditional probabilities $p(x_f \mid x_s)$ can be obtained experimentally as in Shea/Ackers [15]. The discussion is based on statistical thermodynamics. Corresponding to each state $x = (x_f, x_s)$ there is an energy $E_x$ and a “redundancy” $\omega_x$ such that the (canonical ensemble) partition function is
$$Q(x_s, T) = \sum_{x_f} \omega_{(x_f, x_s)} \exp(-E_{(x_f, x_s)}/(kT))$$
and the conditional probability follows a Boltzmann distribution
$$p(x_f \mid x_s) = \frac{\omega_{(x_f, x_s)} \exp(-E_{(x_f, x_s)}/(kT))}{Q(x_s, T)}.$$

Here the redundancies are known and the energies are determined from experiments. Shea and Ackers apply this method of determining the stationary distributions to three examples from the $\lambda$ phage, namely, the cooperative binding occurring for the cI repressor on the operator sites OR2 and OR3, the interactions of repressors and RNA-polymerase, and the balance between monomers and dimers. In the first example one has $x_f \in \{0,1\}^2$, which describes the binding state of the cI$_2$ repressor on the operator sites OR1 and OR2, and $x_s$ is the number of copies of cI$_2$. In this case the energies $E_x$ only depend on $x_f$ and the redundancy factors are $\omega_{(0,0,x_s)} = 1$, $\omega_{(1,0,x_s)} = \omega_{(0,1,x_s)} = x_s$ and $\omega_{(1,1,x_s)} = x_s(x_s - 1)$, respectively.

The matrix $A$ can be decomposed into two parts as $A = A_f + A_s$, where $A_f$ corresponds to the fast processes and $A_s$ to the slow processes. Overall, one has $A = \sum_z (S_z - I) D_z$ and it is assumed that each shift $z$ either only involves variables $x_s$ or “mainly” $x_f$. Corresponding to the decomposition into $x_f$ and $x_s$ the aggregation operator is $E = e^T \otimes I$ (where $e^T = (1, \dots, 1)$) and for the slow processes one has
$$E S_{(0,z_s)} = (e^T \otimes I)(I \otimes S_{z_s}) = S_{z_s}(e^T \otimes I) = S_{z_s} E.$$
If one introduces the “reduced” diagonal propensity matrix $D_{z_s} = E D_{(z_f,z_s)} F$ then one gets
$$E A_s F = \sum_{(z_f,z_s)} (S_{z_s} - I)\, E D_{(z_f,z_s)} F.$$

Note that the last matrix is again a diagonal matrix, so that one gets the ordinary reduced form. For the fast terms one has $S_{(z_f,z_s)} = S_{z_f} \otimes I$ and then one gets $E A_f F = 0$; in the more general case we use here the approximation $E A_f F = 0$. For the solution of the reduced system one now needs to compute the reduced diagonal matrices $E D_{(z_f,z_s)} F$. Consider the example of a simple feedback switch which is used to control upper levels of proteins. Here $x_f$ is binary and $x_s$ is an integer. Any two by two block of $D_z$ corresponding to a fixed $x_s$ is diagonal and has diagonal elements $\alpha$ (the rate of production) and $0$, as for $x_f = 1$ the translation and hence production is suppressed. The partition function is in this case $Q(x_s) = e^{-E_1/kT} + x_s e^{-E_2/kT}$ and, with $\Delta E = E_1 - E_2$ ($E_1 > E_2$ are the energies of the unbound and bound states, respectively), an element of the diagonal matrix $E D_z F$ is $1/(1 + x_s e^{\Delta E/kT})$. One can see that the production decreases with high $x_s$ and, as the decay increases, one gets an equilibrium between decay and production for a certain level of $x_s$. In the $\lambda$ phage one also has cooperative binding. In this case one has two operator sites and the same approach as above now gives a diagonal element of $1/(1 + 2 x_s e^{\Delta E_1/kT} + x_s(x_s - 1) e^{\Delta E_2/kT})$. This provides a much faster switching to zero than the single operator. If there is no cooperativity one has $\Delta E_2 = 2\Delta E_1$; the interesting case is where there is cooperative binding, i.e., where the energy of the case where both operator sites are bound to a transcription factor is lower than the sum of the energies, i.e., where $\Delta E_2 > 2\Delta E_1$. In this case the suppression for large protein numbers, which is dominated by the case of both operator sites bound, is much stronger than in the small protein number case, which is dominated by the cases where only one operator is bound to a transcription factor. Using the aggregation in this case has two advantages: first, it helps reduce the computational complexity substantially. Secondly, one does not need to determine the reaction rates for this case; instead, one only needs to measure the Gibbs energies $\Delta E_i$. These macroscopic properties of an equilibrium have been determined for the $\lambda$ phage, see [15].
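The reduction described in this section can be checked numerically. The sketch below (our illustration, with assumed energies $E_1$, $E_2$ and production rate $\alpha$) builds the aggregation and disaggregation operators $E$ and $F$ for the simple feedback switch from the Boltzmann weights, applies them to the diagonal production propensity, and verifies that the reduced propensity $EDF$ reproduces the factor $1/(1 + x_s e^{\Delta E/kT})$ derived above.

```python
# Numerical check of the Shea-Ackers style reduction for the feedback switch.
import numpy as np

kT, E1, E2, alpha = 1.0, 0.0, -2.0, 5.0      # assumed energies and production rate
dE = E1 - E2
max_s = 50
xs = np.arange(max_s + 1)

# Boltzmann conditional probabilities p(x_f | x_s) with redundancies 1 and x_s.
w_unbound = np.exp(-E1 / kT) * np.ones(max_s + 1)
w_bound = xs * np.exp(-E2 / kT)
Q = w_unbound + w_bound
p_unbound, p_bound = w_unbound / Q, w_bound / Q

n_f, n_s = 2, max_s + 1                      # states ordered as (x_f, x_s)
E = np.kron(np.ones((1, n_f)), np.eye(n_s))  # aggregation e^T (x) I: sum over x_f
F = np.vstack([np.diag(p_unbound),           # disaggregation: spread p_s by the
               np.diag(p_bound)])            # conditional probabilities p(x_f | x_s)

# Diagonal production propensity: alpha if the operator is unbound, 0 if bound.
D = np.diag(np.concatenate([alpha * np.ones(n_s), np.zeros(n_s)]))

reduced = np.diag(E @ D @ F)                 # reduced diagonal propensity
expected = alpha / (1.0 + xs * np.exp(dE / kT))
print(np.allclose(reduced, expected))        # True
```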

3 Aggregation Errors

In the following, the approximation error incurred when numerically determining the probability distribution is discussed. Rather than errors in function values, which are naturally bounded by the $L_\infty$ norm, errors now occur in probabilities of sets $M \subset \Omega$ as $p(M) - \tilde p(M)$. An upper bound is
$$|p(M) - \tilde p(M)| \leq \sum_{x\in M} |p(x) - \tilde p(x)| \leq \|p - \tilde p\|_1,$$

which motivates the usage of the $L_1$ norm here. However, as in the case of function approximation, one often resorts to Euclidean type norms for convenience and computation. Mostly we will consider the finite $\Omega$ case, where the choice of the norm used is less of an issue. In addition to the norm in the space of distributions $p$ we will also use the norm for elements $q$ in the dual space. These norms are chosen such that one has
$$(p, q) \leq \|p\|\,\|q\|, \qquad p \in X,\; q \in X',$$
where in the finite $\Omega$ case $X = X' = \mathbb{R}^\Omega$ and where $(p, q) = \sum_{x\in\Omega} p(x)q(x)$. The norms are also such that the operator $A$ in the master equations is bounded, i.e., $a(p, q) = -(Ap, q) \leq C\|p\|\,\|q\|$ for some $C > 0$. Furthermore, it will be assumed that the norms are such that an inf-sup condition for $a$ holds:
$$\inf_p \sup_q \frac{a(p, q)}{\|p\|\,\|q\|} \geq \alpha$$
for some $\alpha > 0$. In the following consider $X = \mathbb{R}^\Omega$ to be the fundamental linear space, $M = \{p \mid \sum_{x\in\Omega} p(x) = 1\}$ the subspace of probability distributions and $V = \{p \mid \sum_{x\in\Omega} p(x) = 0\}$. The stationary distribution $p_\infty \in M$ then satisfies $a(p_\infty, q) = 0$ for all $q \in X'$. More generally, consider the problem of finding a $p \in M$ (not necessarily positive) from a given $f \in V$ (a necessary and sufficient condition for the existence of a solution) such that

$$a(p, q) = (f, q), \qquad q \in V, \qquad (9)$$
where $(f, q) = \sum_{x\in\Omega} f(x)q(x)$. Let $p_0$ be any element in $M$; then $p' = p - p_0 \in V$ and it satisfies the equations
$$a(p', q) = (f, q) - a(p_0, q), \qquad q \in V.$$
One can invoke a variant of the Lax-Milgram lemma (see [2]) to show that problem (9) is well-posed. The numerical aggregation-disaggregation method considered here approximates probability distributions $p \in M$ by distributions $p_h \in F(M)$ (where $F(M)$ is the image of $M$ under the disaggregation operator $F$) which satisfy
$$a(p_h, q) = (f, q), \qquad q \in E^*(V). \qquad (10)$$

The approximation is of optimal order (a variant of Céa's/Strang's lemma, see, e.g. [2]):

Proposition 1. Let $p$ and $p_h$ be the solutions of equations (9) and (10), respectively, and let $a(\cdot, \cdot)$ be a bounded bilinear form satisfying the inf-sup conditions. If, in addition, the inf-sup conditions also hold on the subspaces $F(V)$ and $E^*(V)$, i.e.,
$$\sup_{q\in E^*(V)} \frac{a(p, q)}{\|q\|} \geq \alpha\, \|p\|, \qquad p \in F(V),$$
then the error is bounded by
$$\|p - p_h\|_1 \leq \left(1 + \frac{C}{\alpha}\right) \inf_{q_h \in F(M)} \|p - q_h\|_1.$$

Proof. By the triangle inequality one has for any $q_h \in F(M)$: $\|p - p_h\| \leq \|p - q_h\| + \|p_h - q_h\|$. As $p_h - q_h \in F(V)$ the inf-sup condition provides the bound
$$\|p_h - q_h\| \leq \alpha^{-1} \sup_{r\in E^*(V)} \frac{a(p_h - q_h, r)}{\|r\|}.$$
As $a(p_h, r) = (f, r) = a(p, r)$ one gets $a(p_h - q_h, r) = a(p - q_h, r)$ and, by the boundedness $a(p - q_h, r) \leq C\|p - q_h\|\,\|r\|$, it follows that
$$\|p_h - q_h\| \leq \frac{C}{\alpha} \|p - q_h\|.$$
Combining this bound with the first inequality one gets
$$\|p - p_h\| \leq \left(1 + \frac{C}{\alpha}\right) \|p - q_h\|$$
for any $q_h \in F(M)$, and so the inequality is valid for the infimum as well. $\square$

The above proposition is then be restated in the short form as p − ph  ≤ (1 + C/α)h,

p ∈ Ah .

When applying this to the determination of the stationary distribution p∞ one gets   C p∞ − ph,∞  ≤ 1 + p∞ − F Ep∞ . α

In particular, the Galerkin approximation is of optimal error order relative to the $\|\cdot\|$ norm. In the following, we will make use of the operator $T_h : V \to F(V)$ defined by
$$a(T_h f, q) = (f, q), \qquad \text{for all } q \in E^*(V).$$

If one knows an element $p_0 \in F(M)$ then the solution of the Galerkin problem is $p_h = p_0 + T_h(f - Ap_0)$. By the inf-sup condition, the operator $T_h$ is bounded and one has $\|T_h\| \leq 1/\alpha$. Moreover, the restriction $T_h|_{F(V)}$ is invertible and one has, by continuity of $a(\cdot,\cdot)$, that $\|T_h|_{F(V)}^{-1}\| \leq C$. For simplicity, we will denote the inverse of the restriction by $T_h^{-1}$. The method considered here for the solution of the time-dependent master equations is a semidiscrete Galerkin method. If $p_t$ denotes the time derivative then the solution of the master equations amounts to determining $p(t) \in M$ for $t \geq 0$ which satisfies
$$(p_t(t), q) + a(p(t), q) = 0, \qquad q \in V. \qquad (11)$$

Under the conditions above on $a$, the solution of these equations exists for any initial condition $p(0) = p_0 \in M$. The typical theorems involving time dependent systems are based on two ingredients, the stability of the underlying differential equations (in time) and the approximation order of the scheme involved. Consider first the stability. In the following we will often make some simplifying assumptions; more complex situations will be treated in future papers. One assumption made is that the domain $\Omega$ is finite. In this case the operator $A$ is a matrix. Each of the components of $A$ is of the form $(S - I)D$ for a nonnegative diagonal matrix $D$ and a matrix $S \geq 0$ which contains at most one nonzero element (which is equal to one) per column. Using the generalised inverse $D^+$ it follows that $(S - I)DD^+ = P - I$ for some nonnegative matrix $P$ which has only zero and one eigenvalues. Thus the matrix $I - P$ is a singular M-matrix (see, e.g. [11, p. 119]) and so the product $(P - I)D = (I - S)D$ is positive semistable, i.e., it has no eigenvalues with positive real parts [11]. We will furthermore assume that $A$ itself is also semistable, such that $A$ has only one eigenvalue $0$ with eigenvector $p_\infty$ and all the other eigenvalues have strictly negative real parts. Consider now the solution of the initial value problem in $V$. Here the corresponding matrix $A$ is stable and, in particular, the logarithmic norm introduced by Dahlquist [3, 16],
$$\mu = \lim_{s\to 0^+} \frac{\|I + sA\| - 1}{s},$$
is less than zero. The logarithmic norm provides an estimate of the behaviour of the norm of the solution as $\|p(t)\| \leq e^{\mu t}\|p(0)\|$. As $t$ increases, $p(t)$ gets “smoother” and so the aggregation approximation should intuitively provide better approximations. Formally we model this by assuming that the smoothness class $A_h$ is invariant over time, meaning that $p(0) \in A_h \Rightarrow p(t) \in A_h$.

The semidiscrete Galerkin method specifies a $p_h(t) \in F(M)$ such that
$$(p_{h,t}(t), q) + a(p_h(t), q) = 0, \qquad q \in E^*(V), \qquad (12)$$

where $p_{h,t}$ is the derivative of $p_h$ with respect to time $t$. For the error of the semidiscrete Galerkin method one has

Proposition 2. Let $p(t)$ be the solution of the initial value problem (11) with initial condition $p(0) = p_0$. Furthermore, let the approximations be such that the approximation class $A_h$ is invariant over time, and that $p_0 \in A_h$. Then, let the bilinear form $a(\cdot,\cdot)$ be bounded with constant $C$ and satisfy an inf-sup condition, and let it be stable on $V = \{p \mid a(p, q) = 0,\ \forall q\}$ with logarithmic norm $\mu < 0$. Then
$$\|e(t)\| \leq C(1 + e^{-\mu t})\,h$$
for some $C > 0$.

Proof. The proof will use two operators, namely $T_h : V \to F(V)$ defined earlier and $R_h : M \to F(M)$ such that
$$a(R_h p, q) = a(p, q), \qquad q \in F(V).$$

Note that $\|R_h\| \leq C/\alpha$. In the symmetric case the operator $R_h$ is sometimes called the Ritz projection. The approximation error is $e(t) = p_h(t) - p(t)$. Consider the time derivative $e_t$. By definition one has for $q \in F(V)$:
$$(e_t, q) = (p_{h,t} - p_t, q) = -a(p_h, q) + a(p, q) = -a(p_h, q) + a(R_h p, q) = a(-p_h + R_h p, q).$$

On the other hand, $T_h$ is such that $(e_t, q) = a(T_h e_t, q)$. As the Galerkin problem is uniquely solvable on $V$, one has $T_h e_t = -p_h + R_h p = (p - p_h) + (R_h p - p)$ and, with $r = R_h p - p$, one gets
$$T_h e_t + e = r. \qquad (13)$$

Let $s = p_h - R_h p$; then $e = s + r$ and thus $\|e\| \leq \|r\| + \|s\|$. As $p(t) \in A_h$ by our assumption, one has $\|r(t)\| \leq h$. From equation (13) one gets
$$s_t + A_h s = -A_h T_h r_t,$$
where $A_h$ is a mapping from $V_h$ onto $V_h$ such that $a(p, q) = (A_h p, q)$ for all $q, p \in V_h$. It follows that $\|A_h\| \leq C$. The differential equation for $s$ has the solution
$$s(t) = e^{-A_h t} s(0) - \int_0^t e^{-A_h (t-s)} A_h T_h r_t \, ds,$$

and integrating by parts one gets
$$s(t) = e^{-A_h t} s(0) - \Big[e^{-A_h (t-s)} A_h T_h r\Big]_0^t + \int_0^t e^{-A_h (t-s)} A_h^2 T_h r \, ds.$$

Using the bound on $\|e^{-A_h t} x\|$ introduced above one gets (also using that $\|A_h\|$ and $\|T_h\|$ are bounded uniformly in $h$) that, for some $C > 0$, $\|s(t)\| \leq C(1 + e^{-\mu t})\,h$. Combining this bound with $\|r(t)\| \leq h$ provides the claimed bound. $\square$

4 Sparse Grids

Sparse grids provide an effective means to approximate functions with many variables, and their application to the master equations for gene regulatory networks has been suggested in [10]. Sparse grids can be constructed as the superposition of regular grids. Here this corresponds to constructing $m$ aggregations with disaggregation operators $F_1, \dots, F_m$. The sparse grid approximation space is then given by the sum of the ranges of the disaggregation operators
$$V_{SG} = \sum_{i=1}^{m} R(F_i).$$

For the solution of the Galerkin equations for both the stationary and the time dependent problem the same general approximation results hold as above. While for ordinary aggregations one sees that the resulting approximation is still a probability distribution, this is not necessarily the case here any more. Approximation with sparse grids is based on tensor product smoothness spaces. Here we will only briefly describe the method; further analysis will be provided in forthcoming papers. The operators $P_i = F_i E_i$ form projections onto the spaces $R(F_i)$ such that the residuals are orthogonal to $R(E_i)$. In the case where the set $\{P_1, \dots, P_m\}$ forms a commutative semigroup, it has been shown in [7, 9] that there exist constants $c_1, \dots, c_m$ (the combination coefficients) such that
$$P_{CT} = \sum_{i=1}^{m} c_i P_i$$

is a projection onto $V_{SG}$ such that the residual is orthogonal to all the spaces $R(E_1), \dots, R(E_m)$. Consider now the Galerkin approximations $p_{i,\infty}$ of the stationary distribution in the spaces $R(F_i)$. Using these, a sparse grid combination approximation of the stationary distribution is obtained as
$$p_{CT,\infty} = \sum_{i=1}^{m} c_i\, p_{i,\infty}.$$

It is known in practice that such approximations provide good results; the theory, however, requires a detailed knowledge of the structure of the error which goes beyond what can be done with the Céa-type results provided above. So far, one knows that the combination technique acts as an extrapolation technique in the case of the Poisson problem [13] by cancelling lower order error terms. It is also known, however, that in some cases the combination technique may not give good results, see [5]. This is in particular the case where the operators $P_i$ do not commute. In order to address this, a method has been suggested which adapts the combination coefficients $c_i$ to the data, see [8]. The method used to determine the stationary distribution can also be used to solve the instationary problem. Let $p_i(t)$ be the Galerkin approximation discussed above in the space $R(F_i)$. Then one obtains a sparse grid combination approximation as
$$p_{CT}(t) = \sum_{i=1}^{m} c_i\, p_i(t).$$

In terms of the error analysis the error can again be decomposed into two terms, namely the best possible error in the sparse grid space and the difference between the Galerkin solution and the best possible approximation. For the second error a result similar to that for the aggregation/disaggregation method provides optimality but, as was suggested above, the first type of error is still not well understood.
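To make the combination formula concrete, the sketch below (an illustration under stated assumptions, not the gene-network solver) applies the combination technique to a simple block-averaging aggregation of a smooth 2-d probability table. Each $P_l$ aggregates onto a $2^{l_1} \times 2^{l_2}$ grid and spreads the mass back uniformly (the piecewise constant variant mentioned in [10]); the combination coefficients $+1$ on $l_1 + l_2 = n$ and $-1$ on $l_1 + l_2 = n-1$ are the classical 2-d choice of [7]. The test distribution and the levels are assumptions for the example.

```python
# Combination technique p_CT = sum_i c_i P_i p for block-averaging projections.
import numpy as np

n = 6                                     # finest level: 2^n points per axis
N = 2 ** n
x = (np.arange(N) + 0.5) / N
p = np.outer(np.exp(-30 * (x - 0.3) ** 2), np.exp(-30 * (x - 0.6) ** 2))
p /= p.sum()                              # a smooth 2-d probability table


def project(p, l1, l2):
    """P_l = F_l E_l: aggregate onto a 2^l1 x 2^l2 grid, spread back uniformly."""
    b1, b2 = N // 2 ** l1, N // 2 ** l2   # block sizes
    blocks = p.reshape(2 ** l1, b1, 2 ** l2, b2).sum(axis=(1, 3))
    return np.repeat(np.repeat(blocks, b1, axis=0), b2, axis=1) / (b1 * b2)


level = 4                                 # combination level (assumption)
p_ct = np.zeros_like(p)
for l1 in range(1, level):                # coefficient +1 on l1 + l2 = level
    p_ct += project(p, l1, level - l1)
for l1 in range(1, level - 1):            # coefficient -1 on l1 + l2 = level - 1
    p_ct -= project(p, l1, level - 1 - l1)

p_coarse = project(p, level - 1, level - 1)   # a single isotropic coarse grid
print("combination error:", np.abs(p_ct - p).sum())
print("coarse grid error:", np.abs(p_coarse - p).sum())
```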

5 Conclusion

The numerical solution of the chemical master equations can be based on aggregation/disaggregation techniques. These techniques have roots both in statistical thermodynamics and approximation theory. While this leads to substantially different error estimates of the corresponding approximation, the Galerkin techniques based on these approximation spaces are the same for both approaches, and a unified error theory for the Galerkin approach is developed above which demonstrates the optimality of the Galerkin techniques for these applications. Sparse grids are used successfully in many applications to address the curse of dimensionality and further analysis of their performance will be done in future work. In addition, we plan to consider the case of infinite $\Omega$ and a further discussion of implementation and results.

6 Acknowledgments

We would like to thank Mike Osborne for pointing out the importance of the logarithmic norm for stability discussions.

References

1. A. Arkin, J. Ross, and H. H. McAdams. Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli. Genetics, 149:1633–1648, 1998.
2. D. Braess. Finite Elements. Cambridge, second edition, 2005.
3. G. Dahlquist. Stability and error bounds in the numerical integration of ordinary differential equations. Kungl. Tekn. Högsk. Handl. Stockholm. No., 130:87, 1959.
4. L. Ferm and P. Lötstedt. Numerical method for coupling the macro and meso scales in stochastic chemical kinetics. Technical Report 2006-001, Uppsala University, January 2006.
5. J. Garcke, M. Griebel, and M. Thess. Data mining with sparse grids. Computing, 67(3):225–253, 2001.
6. D. T. Gillespie. Markov Processes: an introduction for physical scientists. Academic Press, San Diego, USA, 1992.
7. M. Griebel, M. Schneider, and C. Zenger. A combination technique for the solution of sparse grid problems. In Iterative methods in linear algebra (Brussels, 1991), pages 263–281. North-Holland, Amsterdam, 1992.
8. M. Hegland. Adaptive sparse grids. ANZIAM J., 44(C):C335–C353, 2002.
9. M. Hegland. Additive sparse grid fitting. In Curve and surface fitting (Saint-Malo, 2002), Mod. Methods Math., pages 209–218. Nashboro Press, Brentwood, TN, 2003.
10. M. Hegland, C. Burden, L. Santoso, S. MacNamara, and H. Booth. A solver for the stochastic master equation applied to gene regulatory networks. J. Comp. Appl. Math., 205:708–724, 2007.
11. R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge University Press, Cambridge, 1990. Corrected reprint of the 1985 original.
12. B. Lewin. Genes VIII. Pearson Prentice Hall, 2004.
13. C. Pflaum and A. Zhou. Error analysis of the combination technique. Numer. Math., 84(2):327–350, 1999.
14. M. Ptashne and A. Gann. Genes and Signals. Cold Spring Harbor Laboratory Press, 2002.
15. M. A. Shea and G. K. Ackers. The OR control system of bacteriophage lambda, a physical-chemical model for gene regulation. Journal of Molecular Biology, 181:211–230, 1985.
16. T. Ström. On logarithmic norms. SIAM J. Numer. Anal., 12(5):741–753, 1975.

A Numerical Study of Active-Set and Interior-Point Methods for Bound Constrained Optimization∗

Long Hei (1), Jorge Nocedal (2), and Richard A. Waltz (2)

(1) Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston IL 60208, USA
(2) Department of Electrical Engineering and Computer Science, Northwestern University, Evanston IL 60208, USA

∗ This work was supported by National Science Foundation grant CCR-0219438, Department of Energy grant DE-FG02-87ER25047-A004 and a grant from the Intel Corporation.

Abstract This paper studies the performance of several interior-point and active-set methods on bound constrained optimization problems. The numerical tests show that the sequential linear-quadratic programming (SLQP) method is robust, but is not as effective as gradient projection at identifying the optimal active set. Interior-point methods are robust and require a small number of iterations and function evaluations to converge. An analysis of computing times reveals that it is essential to develop improved preconditioners for the conjugate gradient iterations used in SLQP and interior-point methods. The paper discusses how to efficiently implement incomplete Cholesky preconditioners and how to eliminate ill-conditioning caused by the barrier approach. The paper concludes with an evaluation of methods that use quasi-Newton approximations to the Hessian of the Lagrangian.

1 Introduction

A variety of interior-point and active-set methods for nonlinear optimization have been developed in the last decade; see Gould et al. [12] for a recent survey. Some of these algorithms have now been implemented in high quality software and complement an already rich collection of established methods for constrained optimization. It is therefore an appropriate time to evaluate the contributions of these new algorithms in order to identify promising directions of future research. A comparison of active-set and interior-point approaches is particularly interesting given that both classes of algorithms have matured.

A practical evaluation of optimization algorithms is complicated by details of implementation, heuristics and algorithmic options. It is also difficult
to select a good test set because various problem characteristics, such as nonconvexity, degeneracy and ill-conditioning, affect algorithms in different ways. To simplify our task, we focus on large-scale bound constrained problems of the form
$$\min_{x}\; f(x) \qquad (1a)$$
$$\text{subject to}\quad l \leq x \leq u, \qquad (1b)$$

where f : Rn → R is a smooth function and l ≤ u are both vectors in Rn . The simple geometry of the feasible region (1b) eliminates the difficulties caused by degenerate constraints and allows us to focus on other challenges, such as the effects of ill-conditioning. Furthermore, the availability of specialized (and very efficient) gradient projection algorithms for bound constrained problems places great demands on the general-purpose methods studied in this paper. The gradient projection method can quickly generate a good working set and then perform subspace minimization on a smaller dimensional subspace. Interior-point methods, on the other hand, never eliminate inequalities and work on an n-dimensional space, putting them at a disadvantage (in this respect) when solving bound constrained problems. We chose four active-set methods that are representative of the best methods currently available: (1) The sequential quadratic programming (SQP) method implemented in snopt [10]; (2) The sequential linear-quadratic programming (SLQP) method implemented in knitro/active [2]; (3) The gradient projection method implemented in tron [15]; (4) The gradient projection method implemented in l-bfgs-b [4, 19]. SQP and gradient projection methods have been studied extensively since the 1980s, while SLQP methods have emerged in the last few years. These three methods are quite different in nature. The SLQP and gradient projection methods follow a so-called EQP approach in which the active-set identification and optimization computations are performed in two separate stages. In the SLQP method a linear program is used in the active-set identification phase, while the gradient projection performs a piecewise linear search along the gradient projection path. In contrast, SQP methods follow an IQP approach in which the new iterate and the new estimate of the active set are computed simultaneously by solving an inequality constrained subproblem. We selected two interior-point methods, both of which are implemented in the knitro software package [5]: (5) The primal-dual method in knitro/direct [18] that (typically) computes steps by performing a factorization of the primal-dual system; (6) The trust region method in knitro/cg [3] that employs iterative linear algebra techniques in the step computation.
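As a small, hedged illustration of the problem class (1) (ours, not taken from the paper's test framework), the snippet below minimizes a smooth quadratic subject to simple bounds using SciPy's L-BFGS-B option, a limited-memory quasi-Newton bound-constrained solver in the same family as the l-bfgs-b code evaluated later. The objective, bounds and dimension are assumptions for the example.

```python
# Solve a bound constrained problem of the form (1) with a quasi-Newton method.
import numpy as np
from scipy.optimize import minimize

n = 1000
l, u = np.zeros(n), np.full(n, 0.5)          # bounds l <= x <= u


def f(x):                                    # a smooth convex test objective
    return 0.5 * np.dot(x - 1.0, x - 1.0)


def grad_f(x):
    return x - 1.0


x0 = np.full(n, 0.25)
res = minimize(f, x0, jac=grad_f, method="L-BFGS-B", bounds=list(zip(l, u)))
print(res.success, res.nit)                  # every upper bound is active at x = u
```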

The algorithm implemented in knitro/direct is representative of various line search primal-dual interior-point methods developed since the mid 1990s (see [12]), whereas the algorithm in knitro/cg follows a trust region approach that is significantly different from most interior-point methods proposed in the literature. We have chosen the two interior-point methods available in the knitro package, as opposed to other interior-point codes, to minimize the effect of implementation details. In this way, the same type of stop tests and scalings are used in the two interior-point methods and in the SLQP method used in our tests. The algorithms implemented in (2), (3) and (6) use a form of the conjugate gradient method in the step computation. We study these iterative approaches, giving particular attention to their performance in interior-point methods where preconditioning is more challenging [1, 8, 13]. Indeed, whereas in active-set methods ill-conditioning is caused only by the objective function and constraints, in interior-point methods there is an additional source of ill-conditioning caused by the barrier approach. The paper is organized as follows. In Section 2, we describe numerical tests with algorithms that use exact Hessian information. The observations made from these results set the stage for the rest of the paper. In Section 3 we describe the projected conjugate gradient method that plays a central role in several of the methods studied in our experiments. A brief discussion on preconditioning for the SLQP method is given in Section 4. Preconditioning in the context of interior-point methods is the subject of Section 5. In Section 6 we study the performance of algorithms that use quasi-Newton Hessian approximations.

2 Some Comparative Tests In this section we report test results for four algorithms, all using exact second derivative information. The algorithms are: tron (version 1.2), knitro/direct, knitro/cg and knitro/active (versions 5.0). The latter three were not specialized in any way to the bound constrained case. In fact, we know of no such specialization for interior-point methods, although advantage can be taken at the linear algebra level, as we discuss below. A modification of the SLQP approach that may prove to be effective for bound constraints is investigated by Byrd and Waltz [6], but was not used here. We do not include snopt in these tests because this algorithm works more effectively with quasi-Newton Hessian approximations, which are studied in Section 6. Similarly, l-bfgs-b is a limited memory quasi-Newton method and will also be discussed in that section. All the test problems were taken from the CUTEr collection [11] using versions of the models formulated in Ampl [9]. We chose all the bound constrained CUTEr problems available as Ampl models for which the sizes could be made large enough for our purposes, while excluding some of the repeated models (e.g., we only used torsion1 and torsiona from the group of torsion models).

Table 1. Comparative results of four methods that use exact second derivative information

tron knitro/direct knitro/cg knitro/active problem n iter feval CPU actv@sol aveCG iter feval CPU iter feval CPU aveCG endCG/n iter feval CPU aveCG biggsb1 20000 X1 X1 X1 X1 X1 12 13 2.61 12 13 245.48 942.50 0.1046 X1 X1 X1 X1 bqpgauss 2003 206 206 6.00 95 6.93 20 21 0.88 42 43 183.85 3261.14 2.0005 232 234 65.21 1020.46 chenhark 20000 72 72 4.30 19659 1.00 18 19 2.57 20 21 1187.49 4837.60 0.7852 847 848 1511.60 1148.74 clnlbeam 20000 6 6 0.50 9999 0.83 11 12 2.20 12 13 2.60 3.67 0.0001 3 4 0.41 1.00 cvxbqp1 20000 2 2 0.11 20000 0.00 9 10 51.08 9 10 3.60 6.33 0.0003 1 2 0.18 0.00 explin 24000 8 8 0.13 23995 0.88 24 25 6.79 26 27 16.93 16.46 0.0006 13 14 1.45 3.08 explin2 24000 6 6 0.10 23997 0.83 26 27 6.39 25 26 16.34 16.72 0.0005 12 13 1.26 2.17 expquad 24000 X2 X2 X2 X2 X2 X4 X4 X4 X4 X4 X4 X4 X4 183 663 56.87 1.42 gridgena 26312 16 16 14.00 0 1.75 8 23 17.34 7 8 43.88 160.86 0.0074 6 8 9.37 77.71 harkerp2 2000 X3 X3 X3 X3 X3 15 16 484.48 27 28 470.76 12.07 0.0010 7 8 119.70 0.86 jnlbrng1 21904 30 30 6.80 7080 1.33 15 16 6.80 18 19 163.62 632.33 0.1373 39 40 27.71 92.23 jnlbrnga 21904 30 30 6.60 7450 1.37 14 16 6.31 18 19 184.75 708.67 0.1608 35 36 30.05 122.03 mccormck 100000 6 7 2.60 1 1.00 9 10 11.60 12 13 20.89 4.17 0.0001 X5 X5 X5 X5 minsurfo 10000 10 10 2.10 2704 3.00 367 1313 139.76 X1 X1 X1 X1 X1 8 10 4.32 162.33 ncvxbqp1 20000 2 2 0.11 20000 0.00 35 36 131.32 32 33 10.32 4.63 0.0006 3 4 0.36 0.67 ncvxbqp2 20000 8 8 0.50 19869 1.13 75 76 376.01 73 74 58.65 26.68 0.0195 30 39 5.90 4.26 nobndtor 32400 34 34 10.00 5148 2.85 15 16 8.66 13 14 6817.52 24536.62 2.0000 66 67 78.85 107.42 nonscomp 20000 8 8 0.82 0 0.88 21 23 5.07 129 182 81.64 12.60 0.0003 10 11 1.37 4.20 obstclae 21904 31 31 4.90 10598 1.84 17 18 7.66 17 18 351.83 846.00 0.3488 93 116 40.36 37.01 obstclbm 21904 25 25 4.20 5262 1.64 12 13 5.52 11 12 562.34 2111.64 0.1819 43 50 16.91 39.08 pentdi 20000 2 2 0.17 19998 0.50 12 13 2.24 14 15 3.40 5.36 0.0005 1 2 0.21 1.00 probpenl 5000 2 2 550.00 1 0.50 3 4 733.86 3 4 6.41 1.00 0.0002 1 2 2.79 1.00 qrtquad 5000 28 58 1.60 5 2.18 39 63 1.56 X5 X5 X5 X5 X5 783 2403 48.44 2.02 qudlin 20000 2 2 0.02 20000 0.00 17 18 2.95 24 25 12.74 16.08 0.0004 3 4 0.20 0.67 reading1 20001 8 8 0.78 20001 0.88 16 17 6.11 14 15 5.64 7.21 0.0001 3 4 0.44 0.33 scond1ls 2000 592 1748 18.00 0 2.96 1276 4933 754.57 1972 2849 10658.26 2928.17 0.3405 X1 X1 X1 X1 sineali 20000 11 15 1.30 0 1.27 9 12 3.48 18 61 13.06 4.57 0.0001 34 112 8.12 1.58 torsion1 32400 59 59 14.00 9824 1.86 11 12 8.13 7 8 359.78 1367.14 0.2273 65 66 57.53 64.65 torsiona 32400 59 59 16.00 9632 1.88 10 11 7.62 6 7 80.17 348.33 0.0279 62 63 62.43 74.06 X1 : iteration limit reached X2 : numerical result out of range X3 : solver did not terminate X4 : current solution estimate cannot be improved X5 : relative change in solution estimate < 10−15


The results are summarized in Table 1, which reports the number of variables for each problem, as well as the number of iterations, function evaluations and computing time for each solver. For tron we also report the number of active bounds at the solution; for those solvers that use a conjugate gradient (CG) iteration, we report the average number of CG iterations per outer iteration. In addition, for knitro/cg we report the number of CG iterations performed in the last iteration of the optimization algorithm divided by the number of variables (endCG/n). We use a limit of 10000 iterations for all solvers; otherwise, default settings, including the default stopping tests and tolerances, were used. We also provide, in Figures 1 and 2, performance profiles based, respectively, on the number of function evaluations and computing time. All figures plot the logarithmic performance profiles described in [7].

We now comment on these results. In terms of robustness, there appears to be no significant difference between the four algorithms tested, although knitro/direct is slightly more reliable.

Function Evaluations. In terms of function evaluations (or iterations), we observe some significant differences between the algorithms. knitro/active requires more iterations overall than the other three methods; if we compare it with tron, the other active-set method, we note that tron is almost uniformly superior. This suggests that the SLQP approach implemented in knitro/active is less effective than gradient projection at identifying the optimal active set. We discuss this issue in more detail below. As expected, the interior-point methods typically perform between 10 and 30 iterations to reach convergence. Since the geometry of bound constraints is simple, only nonlinearity and nonconvexity in the objective function cause interior-point methods to perform a large number of iterations. It is not surprising that knitro/cg requires more iterations than knitro/direct, given that it uses an inexact iterative approach in the step computation. Figure 1 indicates that the gradient projection method is only slightly more efficient than the interior-point methods in terms of function evaluations. As in any active-set method, tron sometimes converges in a very small number of iterations (e.g. 2), but on other problems it requires significantly more iterations than the interior-point algorithms.

CPU Time. It is clear from Table 1 that knitro/cg requires the largest amount of computing time among all the solvers. This test set contains a significant number of problems with ill-conditioned Hessians, ∇²f(x), and the step computation of knitro/cg is dominated by the large number of CG steps performed. tron reports the lowest computing times; the average number of CG iterations per step is rarely greater than 2. This method uses an incomplete Cholesky preconditioner [14], whose effectiveness is crucial to the success of tron.
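The performance profiles in Figures 1 and 2 follow the construction of Dolan and Moré [7]. As a rough illustration of how such a profile is computed, the following is a minimal sketch of ours (not the code used to generate the figures); the cost matrix and failure markers are hypothetical.

```python
import numpy as np

def performance_profile(costs, taus):
    """costs: (n_problems, n_solvers) array of CPU times or function-evaluation
    counts, with np.inf marking a failure.  Returns, for each tau and solver,
    the fraction of problems solved within a factor tau of the best solver."""
    best = costs.min(axis=1, keepdims=True)      # best cost per problem
    ratios = costs / best                        # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for s in range(costs.shape[1])]
                     for tau in taus])

# Hypothetical data: 3 problems, 2 solvers; inf = solver failed on that problem.
costs = np.array([[2.0, 4.0],
                  [1.0, np.inf],
                  [8.0, 2.0]])
taus = np.array([1, 2, 4, 8, 16])                # logarithmic abscissa, as in Figures 1-2
print(performance_profile(costs, taus))
```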

Figure 1. Number of Function Evaluations (logarithmic performance profile: percentage of problems vs. x times slower than the best; curves for TRON, KNITRO-DIRECT, KNITRO-CG, KNITRO-ACTIVE)

Figure 2. CPU Time (logarithmic performance profile: percentage of problems vs. x times slower than the best; curves for TRON, KNITRO-DIRECT, KNITRO-CG, KNITRO-ACTIVE)

The high number of CG iterations in knitro/cg is easily explained by the fact that it does not employ a preconditioner to remove ill-conditioning caused by the Hessian of the objective function. What is not so simple to explain is the higher number of CG iterations in knitro/cg compared to knitro/active. Both methods use an unpreconditioned projected CG method in the step computation (see Section 3), and therefore one would expect that both methods would suffer equally from ill-conditioning. Table 1 indicates that this is not the case. In addition, we note that the average cost of a CG iteration is higher in knitro/cg than in knitro/active. One possible reason for this difference is that the SLQP method applies CG to a smaller problem than the interior-point algorithm. The effective number of variables in the knitro/active CG iteration is n − t_k, where t_k is the number of constraints in the working set at the kth iteration. On the other hand, the interior-point approach applies the CG iteration in n-dimensional space. This, however, accounts only partly for the differences in performance. For example, we examined some runs in which t_k is about n/3 to n/2 during the run of knitro/active and noticed that the differences in CG iterations between knitro/cg and knitro/active are significantly greater than a factor of 2 or 3 toward the end of the run. As we discuss in Section 5, it is the combination of barrier and Hessian ill-conditioning that can be very detrimental to the interior-point method implemented in knitro/cg.

Active-set identification. The results in Table 1 suggest that the SLQP approach will not be competitive with gradient projection on bound constrained problems unless the SLQP method can be redesigned so as to require fewer outer iterations. In other words, it needs to improve its active-set identification mechanism. As already noted, the SLQP method in knitro/active computes the step in two phases. In the linear programming phase, an estimate of the optimal active set is computed. This linear program takes a simple form in the bound constrained case, and can be solved very quickly. Most of the computing effort goes into the EQP phase, which solves an equality constrained quadratic program in which the constraints in the working set are imposed as equalities (i.e., fixed variables in this case) and all other constraints are ignored. This subproblem is solved using a projected CG iteration. Assuming that the cost of this CG phase is comparable in tron and knitro/active (we can use the same preconditioners in the two methods), the SLQP method needs to perform a similar number of outer iterations to be competitive.

Comparing the detailed results of tron versus knitro/active highlights two features that provide tron with superior active-set identification properties. First, the active set determined by SLQP is given by the solution of one LP (whose solution is constrained by an infinity-norm trust region), whereas the gradient projection method minimizes a quadratic model along the gradient projection path to determine an active-set estimate. Because it explores a whole path as opposed to a single point, gradient projection often obtains a better active-set estimate.


An enhancement to SLQP proposed in [6] mimics what is done in gradient projection by solving a parametric LP (parameterized by the trust-region radius) rather than a single LP to determine an active set, with improved results. Second, the gradient projection implementation in tron has a feature which allows it to add bounds to the active set during the unconstrained minimization phase, if inactive bounds are encountered. On some problems this significantly decreases the number of iterations required to identify the optimal active set. In the bound constrained case, it is easy to do something similar for SLQP. In [6], this feature was added to an SLQP algorithm and shown to improve performance on bound constrained problems. The combination of these two features may result in an SLQP method that is competitive with tron. However, more research is needed to determine whether this goal can be achieved.

In the following section we give attention to the issue of preconditioning. Although, in this paper, we are interested in preconditioners applied to bound constrained problems, we will first present our preconditioning approach in the more general context of constrained optimization, where it is also applicable.
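To make the contrast with the single-LP active set concrete, the following sketch (our own illustration, not the tron implementation) shows the basic ingredient of gradient-projection active-set identification for bounds l ≤ x ≤ u: several points along the projected-gradient path x(t) = P(x − t g) are examined, and the bounds active at each trial point form a candidate working set.

```python
import numpy as np

def project(x, l, u):
    """Projection onto the box l <= x <= u."""
    return np.minimum(np.maximum(x, l), u)

def gp_candidate_sets(x, g, l, u, ts=(1e-3, 1e-2, 1e-1, 1.0), tol=1e-10):
    """Examine a few points along the projected-gradient path x(t) = P(x - t*g)
    and return, for each trial step, the indices of the bounds active there.
    (A real gradient projection method such as tron minimizes a quadratic model
    along the whole path; this sketch only shows how the path generates
    candidate active sets.)"""
    candidates = []
    for t in ts:
        xt = project(x - t * g, l, u)
        active = np.where((xt <= l + tol) | (xt >= u - tol))[0]
        candidates.append((t, active))
    return candidates

# Hypothetical example with simple bounds 0 <= x <= 1.
x = np.array([0.5, 0.9, 0.1])
g = np.array([1.0, -2.0, 3.0])          # gradient of the objective at x
l, u = np.zeros(3), np.ones(3)
for t, active in gp_candidate_sets(x, g, l, u):
    print(t, active)
```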

3 The Projected Conjugate Gradient Method

Both knitro/cg and knitro/active use a projected CG iteration in the step computation. To understand the challenges of preconditioning this iteration, we now describe it in some detail. The projected CG iteration is a method for solving equality constrained quadratic programs of the form

   minimize_x   (1/2) x^T G x + h^T x        (2a)
   subject to   A x = b,                     (2b)

where G is an n × n symmetric matrix that is positive definite on the null space of the m × n matrix A, and h is an n-vector. Problem (2) can be solved by eliminating the constraints (2b), applying the conjugate gradient method to the reduced problem of dimension (n − m), and expressing this solution process in n-dimensional space. This procedure is specified in the following algorithm. We denote the preconditioning operator by P; its precise definition is given below.

Algorithm PCG. Preconditioned Projected CG Method.
Choose an initial point x0 satisfying Ax0 = b. Set x ← x0, compute r = Gx + h, z = Pr and p = −z.
Repeat the following steps, until ‖z‖ is smaller than a given tolerance:


   α = r^T z / p^T G p
   x ← x + α p
   r^+ = r + α G p
   z^+ = P r^+
   β = (r^+)^T z^+ / r^T z
   p ← −z^+ + β p
   z ← z^+ and r ← r^+
End

The preconditioning operation is defined indirectly, as follows. Given a vector r, we compute z = Pr as the solution of the system

   [ D   A^T ] [ z ]   [ r ]
   [ A   0   ] [ w ] = [ 0 ],        (3)

where D is a symmetric matrix that is required to be positive definite on the null space of A, and w is an auxiliary vector. A preconditioner of the form (3) is often called a constraint preconditioner. To accelerate the convergence of Algorithm PCG, the matrix D should approximate G in the null space of A and should be sparse so that solving (3) is not too costly. It is easy to verify that since initially Ax0 = b, all subsequent iterates x of Algorithm PCG also satisfy the linear constraints (2b).

The choice D = I gives an unpreconditioned projected CG iteration. To improve the performance of Algorithm PCG, we consider some other choices for D. One option is to let D be a diagonal matrix; see e.g. [1, 16]. Another option is to define D by means of an incomplete Cholesky factorization of G, but the challenge is how to implement it effectively in the setting of constrained optimization. An implementation that computes the incomplete factors L and L^T of G, multiplies them to give D = LL^T, and then factors the system (3), is of little interest; one might as well use the perfect preconditioner D = G. However, for special classes of problems, such as bound constrained optimization, it is possible to rearrange the computations and compute the incomplete Cholesky factorization on a reduced system, as discussed in the next sections.

We note that the knitro/cg and knitro/active algorithms actually solve quadratic programs of the form (2) subject to a trust region constraint ‖x‖ ≤ ∆; in addition, G may not always be positive definite on the null space of A. To deal with these two requirements, Algorithm PCG can be adapted by following Steihaug's approach: we terminate the iteration if the trust region is crossed or if negative curvature is encountered [17]. In this paper, we will ignore these additional features and consider preconditioning in the simpler context of Algorithm PCG.
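The following is a minimal sketch of Algorithm PCG in our own notation, using dense linear algebra for clarity; a practical implementation would factor the constraint-preconditioner system (3) once and reuse the factorization rather than re-solving it at every iteration.

```python
import numpy as np

def projected_cg(G, A, h, x0, D=None, tol=1e-8, max_iter=200):
    """Algorithm PCG: minimize 0.5 x^T G x + h^T x subject to A x = b,
    starting from a feasible x0 (A x0 = b).  The preconditioning operation
    z = P r is applied by solving the constraint preconditioner (3) with the
    symmetric matrix D (D = I if not supplied)."""
    n, m = G.shape[0], A.shape[0]
    D = np.eye(n) if D is None else D
    K = np.block([[D, A.T], [A, np.zeros((m, m))]])   # KKT matrix of (3)

    def apply_P(r):
        sol = np.linalg.solve(K, np.concatenate([r, np.zeros(m)]))
        return sol[:n]

    x = x0.copy()
    r = G @ x + h
    z = apply_P(r)
    p = -z
    for _ in range(max_iter):
        if np.linalg.norm(z) <= tol:
            break
        alpha = (r @ z) / (p @ G @ p)
        x = x + alpha * p
        r_new = r + alpha * (G @ p)
        z_new = apply_P(r_new)
        beta = (r_new @ z_new) / (r @ z)
        p = -z_new + beta * p
        r, z = r_new, z_new
    return x
```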


4 Preconditioning the SLQP Method

In the SLQP method implemented in knitro/active, the equality constraints (2b) are defined as the linearization of the problem constraints belonging to the working set. We have already mentioned that this working set is obtained by solving an auxiliary linear program. In the experiments reported in Table 1, we used D = I in (3), i.e. the projected CG iteration in knitro/active was not preconditioned. This explains the high number of CG iterations and computing time for many of the problems. Let us therefore consider other choices for D.

Diagonal preconditioners are straightforward to implement, but are often not very effective. A more attractive option is incomplete Cholesky preconditioning, which can be implemented as follows. Suppose for the moment that we use the perfect preconditioner D = G in (3). Since z satisfies Az = 0, we can write z = Zu, where Z is a basis matrix such that AZ = 0 and u is some vector of dimension (n − m). Multiplying the first block of equations in (3) by Z^T and recalling the condition Az = 0, we have that

   Z^T G Z u = Z^T r.        (4)

We now compute the incomplete Cholesky factorization of the reduced Hessian,

   L L^T ≈ Z^T G Z,        (5)

solve the system

   L L^T û = Z^T r,        (6)

and set z = Z û. This defines the preconditioning step. Since for nonconvex problems Z^T G Z may not be positive definite, we can apply a modified incomplete Cholesky factorization of the form L L^T ≈ Z^T (G + δI) Z, for some positive scalar δ; see [14].

For bound constrained problems, the linear constraints (2b) are defined to be the bounds in the working set. Therefore the columns of Z are unit vectors and the reduced Hessian Z^T G Z is obtained by selecting appropriate rows and columns from G. This preconditioning strategy is therefore practical and efficient, since the matrix Z need not be computed and the reduced Hessian Z^T G Z is easy to form. In fact, this procedure is essentially the same as that used in tron. The gradient projection method selects a working set (a set of active bounds) by using a gradient projection search, and computes a step by solving a quadratic program of the form (2). To solve this quadratic program, the gradient projection method in tron eliminates the constraints and applies a preconditioned CG method to the reduced problem

   minimize_u   u^T Z^T G Z u + h^T Z u.


The preconditioner is defined by the incomplete Cholesky factorization (5). Thus the only difference between the CG iterations in tron and the preconditioned projected CG method based on Algorithm PCG is that the latter works in R^n while the former works in R^{n−m}. (It is easy to see that the two approaches are equivalent and that the computational costs are very similar.) Numerical tests of knitro/active using the incomplete Cholesky preconditioner just described will be reported in a forthcoming publication. In the rest of the paper, we focus on interior-point methods and report results using various preconditioning approaches.
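For bound constraints, Z merely selects the free (non-working) variables, so Z^T G Z is a submatrix of G and the preconditioner can be formed without ever building Z. The sketch below is our own illustration of this; a full Cholesky factorization with an optional δI shift stands in for the (modified) incomplete Cholesky factorization of [14].

```python
import numpy as np

def reduced_hessian_preconditioner(G, working_set, delta=0.0):
    """For bound constraints, Z selects the free variables, so Z^T G Z is the
    submatrix of G over the free indices.  Returns a function applying the
    preconditioning step (5)-(6): solve L L^T u = Z^T r and set z = Z u."""
    n = G.shape[0]
    free = np.setdiff1d(np.arange(n), working_set)
    Gr = G[np.ix_(free, free)] + delta * np.eye(free.size)
    L = np.linalg.cholesky(Gr)

    def apply(r):
        u = np.linalg.solve(L.T, np.linalg.solve(L, r[free]))
        z = np.zeros(n)
        z[free] = u          # components in the working set stay zero, so A z = 0
        return z
    return apply

# Hypothetical 4-variable example with variables 1 and 3 fixed at their bounds.
G = np.diag([4.0, 2.0, 3.0, 1.0]); G[0, 2] = G[2, 0] = 0.5
P = reduced_hessian_preconditioner(G, working_set=np.array([1, 3]))
print(P(np.array([1.0, 2.0, 3.0, 4.0])))
```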

5 Preconditioning the Interior-Point Method

The interior-point methods implemented in knitro solve a sequence of barrier problems of the form

   minimize_{x,s}   f(x) − µ Σ_{i∈I} log s_i        (7a)
   subject to       c_E(x) = 0                      (7b)
                    c_I(x) − s = 0,                 (7c)

where s is a vector of slack variables, µ > 0 is the barrier parameter, and c_E(x), c_I(x) denote the equality and inequality constraints, respectively. knitro/cg finds an approximate solution of (7) using a form of sequential quadratic programming. This leads to an equality constrained subproblem of the form (2), in which the Hessian and Jacobian matrices are given by

   G = [ ∇²_xx L   0 ]          A = [ A_E    0 ]
       [ 0         Σ ],             [ A_I   −I ],        (8)

where L(x, λ) is the Lagrangian of the nonlinear program, Σ is a diagonal matrix, and A_E and A_I denote the Jacobian matrices corresponding to the equality and inequality constraints, respectively. (In the bound constrained case, A_E does not exist and A_I is a simple sparse matrix whose rows are unit vectors.) The matrix Σ is defined as Σ = S^{−1} Λ_I, where S = diag{s_i}, Λ_I = diag{λ_i}, i ∈ I, and where the s_i are slack variables and the λ_i, i ∈ I, are Lagrange multipliers corresponding to the inequality constraints. Hence there are two separate sources of ill-conditioning in G: one caused by the Hessian ∇²_xx L and the other by the barrier effects reflected in Σ. Any ill-conditioning due to A is removed by the projected CG approach. Given the block structure (8), the preconditioning operation (3) takes the form

   [ D_x   0     A_E^T   A_I^T ] [ z_x ]   [ r_1 ]
   [ 0     D_s   0       −I    ] [ z_s ]   [ r_2 ]
   [ A_E   0     0       0     ] [ w_1 ] = [ 0   ]
   [ A_I   −I    0       0     ] [ w_2 ]   [ 0   ].        (9)

The matrix D_s will always be chosen as a diagonal matrix, given that Σ is diagonal. In the experiments reported in Table 1, knitro/cg was implemented with D_x = I and D_s = S^{−2}. This means that the algorithm does not include preconditioning for the Hessian ∇²_xx L, and applies a form of preconditioning for the barrier term Σ (as we discuss below). The high computing times of knitro/cg in Table 1 indicate that this preconditioning strategy is not effective for many problems, and therefore we discuss how to precondition each of the two terms in G.

5.1 Hessian Preconditioning

Possible preconditioners for the Hessian ∇²_xx L include diagonal preconditioning and incomplete Cholesky. Diagonal preconditioners are simple to implement; we report results for them in the next section. To design an incomplete Cholesky preconditioner, we exploit the special structure of (9). Performing block elimination on (9) yields the condensed system

   [ D_x + A_I^T D_s A_I   A_E^T ] [ z_x ]   [ r_1 + A_I^T r_2 ]
   [ A_E                   0     ] [ w_1 ] = [ 0               ];        (10)

the eliminated variables z_s, w_2 are recovered from the relation z_s = A_I z_x, w_2 = D_s z_s − r_2. If we define D_x = LL^T, where L is the incomplete Cholesky factor of ∇²_xx L, we still have to face the problem of how to factor (10) efficiently. However, for problems without equality constraints, such as bound constrained problems, (10) reduces to

   (D_x + A_I^T D_s A_I) z_x = r_1 + A_I^T r_2.        (11)

Let us assume that the diagonal preconditioning matrix D_s is given. For bound constrained problems, A_I^T D_s A_I can be expressed as the sum of two diagonal matrices. Hence, the coefficient matrix in (11) is easy to form. Setting D_x = ∇²_xx L, we compute the (possibly modified) incomplete Cholesky factorization

   L L^T ≈ ∇²_xx L + A_I^T D_s A_I.        (12)

The preconditioning step is then obtained by solving

   L L^T z_x = r_1 + A_I^T r_2        (13)

and by defining z_s = A_I z_x.
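For a purely bound constrained problem, A_I simply selects the bounded variables, so A_I^T D_s A_I is diagonal and (12)-(13) are cheap to form and apply. A minimal sketch of ours follows; a dense Cholesky factorization with a δI retry loop stands in for the modified incomplete Cholesky factorization of [14], and the data at the bottom are hypothetical.

```python
import numpy as np

def hessian_preconditioner(H, AI, ds, delta0=1e-8):
    """Form M = H + AI^T diag(ds) AI (cf. (12)) and return a solver for
    M z_x = r (cf. (13)), adding delta*I and retrying whenever the Cholesky
    factorization fails because H is nonconvex."""
    M = H + AI.T @ np.diag(ds) @ AI
    delta = 0.0
    while True:
        try:
            L = np.linalg.cholesky(M + delta * np.eye(M.shape[0]))
            break
        except np.linalg.LinAlgError:
            delta = max(2 * delta, delta0)

    def solve(r):
        zx = np.linalg.solve(L.T, np.linalg.solve(L, r))
        return zx, AI @ zx          # z_s = A_I z_x
    return solve

# Hypothetical bound-constrained example: 3 variables with lower bounds,
# so AI = I and ds holds the barrier (Sigma-type) scaling.
H = np.array([[2.0, -1.0, 0.0], [-1.0, 0.5, 0.0], [0.0, 0.0, 1.0]])
AI = np.eye(3)
ds = np.array([10.0, 0.1, 1.0])
solve = hessian_preconditioner(H, AI, ds)
print(solve(np.array([1.0, 0.0, -1.0]))[0])
```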


One advantage of this approach is apparent from the structure of the matrix on the right-hand side of (12). Since we are adding a positive diagonal matrix to ∇²_xx L, it is less likely that a modification of the form δI must be introduced in the course of the incomplete Cholesky factorization. Minimizing the use of the modification δI is desirable because it can introduce undesirable distortions in the Hessian information. We note that the incomplete factorization (12) is also practical for problems that contain general inequality constraints, provided the term A_I^T D_s A_I is not costly to form and does not lead to severe fill-in.

5.2 Barrier Preconditioning

It is well known that the matrix Σ = S^{−1} Λ_I becomes increasingly ill-conditioned as the iterates of the optimization algorithm approach the solution. Some diagonal elements of Σ diverge while others converge to zero. Since Σ is a diagonal matrix, it can always be preconditioned adequately using a diagonal matrix. We consider two preconditioners:

   D_s = Σ   and   D_s = µ S^{−2}.

The first is the natural choice corresponding to the perfect preconditioner for the barrier term, while the second choice is justified because near the central path, Λ_I ≈ µS^{−1}, so Σ = S^{−1} Λ_I ≈ S^{−1}(µS^{−1}) = µS^{−2}.

5.3 Numerical Results

We test the preconditioners discussed above using a MATLAB implementation of the algorithm in knitro/cg. Our MATLAB program does not contain all the features of knitro/cg, but is sufficiently robust and efficient to study the effectiveness of various preconditioners. The results are given in Table 2, which reports the preconditioning option (option), the final objective function value, the number of iterations of the interior-point algorithm, the total number of CG iterations, the average number of CG iterations per interior-point iteration, and the CPU time. The preconditioning options are labeled as option = (a, b), where a denotes the Hessian preconditioner and b the barrier preconditioner. The options are:

a = 0: No Hessian preconditioning (current default in knitro)
a = 1: Diagonal Hessian preconditioning
a = 2: Incomplete Cholesky preconditioning
b = 0: D_s = S^{−2} (current default in knitro)
b = 1: D_s = µS^{−2}
b = 2: D_s = Σ.

Since our MATLAB code is not optimized for speed, we have chosen test problems with a relatively small number of variables.


Table 2. Results of various preconditioning options

option | final objective | #iteration | #total CG | #average CG | time
biggsb1 (n = 100):
(0,0) +1.5015971301e−02 31 3962 1.278e+02 3.226e+01
(0,1) +1.5015971301e−02 29 2324 8.014e+01 1.967e+01
(0,2) +1.5015971301e−02 28 2232 7.971e+01 1.880e+01
(1,0) +1.5015971301e−02 30 3694 1.231e+02 3.086e+01
(1,1) +1.5015971301e−02 30 2313 7.710e+01 2.010e+01
(1,2) +1.5015971301e−02 30 2241 7.470e+01 2.200e+01
(2,0) +1.5015971301e−02 31 44 1.419e+00 1.950e+00
(2,1) +1.5015971301e−02 29 42 1.448e+00 1.870e+00
(2,2) +1.5015971301e−02 28 41 1.464e+00 1.810e+00
cvxbqp1 (n = 200):
(0,0) +9.0450040000e+02 11 91 8.273e+00 4.420e+00
(0,1) +9.0453998374e+02 8 112 1.400e+01 4.220e+00
(0,2) +9.0450040000e+02 53 54 1.019e+00 1.144e+01
(1,0) +9.0454000245e+02 30 52 1.733e+00 9.290e+00
(1,1) +9.0450040000e+02 30 50 1.667e+00 9.550e+00
(1,2) +9.0454001402e+02 47 48 1.021e+00 1.527e+01
(2,0) +9.0450040000e+02 11 18 1.636e+00 2.510e+00
(2,1) +9.0454000696e+02 8 15 1.875e+00 1.940e+00
(2,2) +9.0450040000e+02 53 53 1.000e+00 1.070e+01
jnlbrng1 (n = 324):
(0,0) −1.7984674056e−01 29 5239 1.807e+02 8.671e+01
(0,1) −1.7984674056e−01 27 885 3.278e+01 1.990e+01
(0,2) −1.7984674056e−01 29 908 3.131e+01 2.064e+01
(1,0) −1.7984674056e−01 29 5082 1.752e+02 9.763e+01
(1,1) −1.7984674056e−01 27 753 2.789e+01 3.387e+01
(1,2) −1.7988019171e−01 26 677 2.604e+01 2.917e+01
(2,0) −1.7984674056e−01 30 71 2.367e+00 6.930e+00
(2,1) −1.7984674056e−01 27 59 2.185e+00 6.390e+00
(2,2) −1.7984674056e−01 29 66 2.276e+00 6.880e+00
obstclbm (n = 225):
(0,0) +5.9472925926e+00 28 7900 2.821e+02 1.919e+02
(0,1) +5.9473012340e+00 18 289 1.606e+01 1.268e+01
(0,2) +5.9472925926e+00 31 335 1.081e+01 1.618e+01
(1,0) +5.9472925926e+00 27 6477 2.399e+02 1.620e+02
(1,1) +5.9472925926e+00 29 380 1.310e+01 2.246e+01
(1,2) +5.9473012340e+00 18 197 1.094e+01 1.192e+01
(2,0) +5.9472925926e+00 27 49 1.815e+00 7.180e+00
(2,1) +5.9473012340e+00 17 32 1.882e+00 4.820e+00
(2,2) +5.9472925926e+00 25 49 1.960e+00 6.650e+00
pentdi (n = 250):
(0,0) −7.4969998494e−01 27 260 9.630e+00 6.490e+00
(0,1) −7.4969998502e−01 25 200 8.000e+00 5.920e+00
(0,2) −7.4969998500e−01 28 205 7.321e+00 5.960e+00
(1,0) −7.4969998494e−01 28 256 9.143e+00 1.111e+01
(1,1) −7.4992499804e−01 23 153 6.652e+00 9.640e+00
(1,2) −7.4969998502e−01 26 132 5.077e+00 9.370e+00
(2,0) −7.4969998494e−01 27 41 1.519e+00 3.620e+00
(2,1) −7.4969998502e−01 25 39 1.560e+00 3.350e+00
(2,2) −7.4969998500e−01 28 42 1.500e+00 3.640e+00
torsion1 (n = 100):
(0,0) −4.8254023392e−01 26 993 3.819e+01 9.520e+00
(0,1) −4.8254023392e−01 25 298 1.192e+01 4.130e+00
(0,2) −4.8254023392e−01 24 274 1.142e+01 3.820e+00
(1,0) −4.8254023392e−01 26 989 3.804e+01 9.760e+00
(1,1) −4.8254023392e−01 25 274 1.096e+01 4.520e+00
(1,2) −4.8254023392e−01 25 250 1.000e+01 3.910e+00
(2,0) −4.8254023392e−01 25 52 2.080e+00 1.760e+00
(2,1) −4.8254023392e−01 25 53 2.120e+00 1.800e+00
(2,2) −4.8254023392e−01 24 51 2.125e+00 1.660e+00
torsionb (n = 100):
(0,0) −4.0993481087e−01 25 1158 4.632e+01 1.079e+01
(0,1) −4.0993481087e−01 25 303 1.212e+01 4.160e+00
(0,2) −4.0993481087e−01 23 282 1.226e+01 3.930e+00
(1,0) −4.0993481087e−01 25 1143 4.572e+01 1.089e+01
(1,1) −4.0993481087e−01 24 274 1.142e+01 4.450e+00
(1,2) −4.0993481087e−01 23 246 1.070e+01 3.700e+00
(2,0) −4.0993481087e−01 24 49 2.042e+00 1.720e+00
(2,1) −4.0993481087e−01 24 49 2.042e+00 1.700e+00
(2,2) −4.0993481087e−01 23 48 2.087e+00 1.630e+00


Note that for all the test problems, except cvxbqp1, the number of interior-point iterations is not greatly affected by the choice of preconditioner. Therefore, we can use Table 2 to measure the efficiency of the preconditioners, but we must exercise caution when interpreting the results for problem cvxbqp1.

Let us consider first the case when only barrier preconditioning is used, i.e., where option has the form (0, ∗). As expected, the options (0, 1) and (0, 2) generally decrease the number of CG iterations and computing time with respect to the standard option (0, 0), and can therefore be considered successful in this context. From these experiments it is not clear whether option (0, 1) is to be preferred over option (0, 2).

Incomplete Cholesky preconditioning is very successful. If we compare the results for options (0, 0) and (2, 0), we see substantial reductions in the number of CG iterations and computing time for the latter option. When we add barrier preconditioning to incomplete Cholesky preconditioning (options (2, 1) and (2, 2)) we do not see further gains. Therefore, we speculate that the standard barrier preconditioner D_s = S^{−2} may be adequate, provided the Hessian preconditioner is effective. Diagonal Hessian preconditioning, i.e., options of the form (1, ∗), rarely provides much benefit. Clearly this preconditioner is of limited use.

One might expect that preconditioning would not affect the number of iterations of the interior-point method much, because it is simply a mechanism for accelerating the step computation procedure. The results for problem cvxbqp1 suggest that this is not the case (we have seen a similar behavior on other problems). In fact, preconditioning changes the form of the algorithm in two ways: it changes the shape of the trust region and it affects the barrier stop test. We introduce preconditioning in knitro/cg by defining the trust region as

   ‖ ( D_x^{1/2} d_x , D_s^{1/2} d_s ) ‖_2 ≤ ∆.

The standard barrier preconditioner D_s = S^{−2} gives rise to the trust region

   ‖ ( D_x^{1/2} d_x , S^{−1} d_s ) ‖_2 ≤ ∆,        (14)

which has proved to control well the rate at which the slacks approach zero. (This is the standard affine scaling strategy used in many optimization methods.) On the other hand, the barrier preconditioner D_s = µS^{−2} results in the trust region

   ‖ ( D_x^{1/2} d_x , √µ S^{−1} d_s ) ‖_2 ≤ ∆.        (15)

When µ is small, (15) does not penalize a step approaching the bounds s ≥ 0 as severely as (14). This allows the interior-point method to approach the boundary of the feasible region prematurely and can lead to very small steps.


An examination of the results for problem cvxbqp1 shows that this is indeed the case. The preconditioner D_s = Σ = S^{−1} Λ_I can be ineffective for a different reason. When the multiplier estimates λ_i are inaccurate (too large or too small) the trust region will not properly control the step d_s. These remarks reinforce our view that the standard barrier preconditioner D_s = S^{−2} may be the best choice and that our effort should focus on Hessian preconditioning.

Let us consider the second way in which preconditioning changes the interior-point algorithm. Preconditioning amounts to a scaling of the variables of the problem; this scaling alters the form of the KKT optimality conditions. knitro/cg uses a barrier stop test that determines when the barrier problem has been solved to sufficient accuracy. This strategy forces the iterates to remain in a (broad) neighborhood of the central path. Each barrier problem is terminated when the norm of the scaled KKT conditions is small enough, where the scaling factors are affected by the choice of D_x and D_s. A poor choice of preconditioner, including diagonal Hessian preconditioning, introduces an unwanted distortion in the barrier stop test, and this can result in a deterioration of the interior-point iteration. Note in contrast that the incomplete Cholesky preconditioner (option (2, ∗)) does not adversely affect the overall behavior of the interior-point iteration in problem cvxbqp1.

6 Quasi-Newton Methods

We now consider algorithms that use quasi-Newton approximations. In recent years, most of the numerical studies of interior-point methods have focused on the use of exact Hessian information. It is well known, however, that in many practical applications second derivatives are not available, and it is therefore of interest to compare the performance of active-set and interior-point methods in this context.

We report results with five solvers: snopt version 7.2-1 [10], l-bfgs-b [4, 19], and knitro/direct, knitro/cg and knitro/active version 5.0. Since all the problems in our test set have more than 1000 variables, we employ the limited memory BFGS quasi-Newton options in all codes, saving m = 20 correction pairs. All other options in the codes were set to their defaults. snopt is an active-set SQP method that computes steps by solving an inequality constrained quadratic program. l-bfgs-b implements a gradient projection method. Unlike tron, which is a trust region method, l-bfgs-b is a line search algorithm that exploits the simple structure of limited memory quasi-Newton matrices to compute the step at small cost. Table 3 reports the results on the same set of problems as in Table 1. Performance profiles are provided in Figures 3 and 4.
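All five codes build limited memory BFGS approximations from m = 20 correction pairs. For reference, the standard two-loop recursion that applies such an approximation of the inverse Hessian to a gradient looks roughly as follows; this is a generic sketch of ours, not the implementation used by any of the solvers compared here.

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Limited memory BFGS two-loop recursion: returns d = -H g, where H is the
    L-BFGS approximation of the inverse Hessian built from the stored pairs
    s_k = x_{k+1} - x_k and y_k = g_{k+1} - g_k (most recent pair last)."""
    q = g.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        a = rho * (s @ q)
        alphas.append(a)
        q = q - a * y
    # Initial matrix H0 = gamma * I, scaled by the most recent pair.
    gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1]) if s_list else 1.0
    r = gamma * q
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r = r + s * (a - b)
    return -r

# Hypothetical use: one stored correction pair on a 2-variable problem.
g = np.array([1.0, -2.0])
print(lbfgs_direction(g, [np.array([0.1, 0.0])], [np.array([0.2, 0.1])]))
```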

Table 3. Comparative results of five methods that use quasi-Newton approximations (m = 20)

problem n | snopt: iter feval CPU | l-bfgs-b: iter feval CPU | knitro/direct: iter feval CPU | knitro/cg: iter feval CPU | knitro/active: iter feval CPU
biggsb1 20000 | X1 X1 X1 | X1 X1 X1 | 6812 6950 1244.05 | 3349 3443 1192.32 | X1 X1 X1
bqpgauss 2003 | 5480 6138 482.87 | 9686 10253 96.18 | X1 X1 X1 | X1 X1 X1 | X4 X4 X4
chenhark 20000 | X1 X1 X1 | X1 X1 X1 | X1 X1 X1 | X1 X1 X1 | X1 X1 X1
clnlbeam 20000 | 41 43 45.18 | 22 28 0.47 | 19 20 31.42 | 14 15 11.94 | 16 17 6.42
cvxbqp1 20000 | 60 65 139.31 | 1 2 0.04 | 29 30 89.03 | 25 26 71.44 | 2 3 0.59
explin 24000 | 72 100 28.08 | 29 36 0.52 | 50 51 239.29 | 47 48 76.84 | 32 34 35.84
explin2 24000 | 63 72 25.62 | 20 24 0.30 | 33 34 133.65 | 40 41 69.74 | 23 28 17.32
expquad 24000 | X4 X4 X4 | X2 X2 X2 | X4 X4 X4 | X5 X5 X5 | 206 645 513.75
gridgena 26312 | X6 X6 X6 | X7 X7 X7 | X5 X5 X5 | X5 X5 X5 | 20 97 120.59
harkerp2 2000 | 50 57 7.05 | 86 102 4.61 | 183 191 76.26 | 164 168 58.82 | 10 11 1.48
jnlbrng1 21904 | 1223 1337 8494.55 | 1978 1992 205.02 | 1873 1913 992.23 | 1266 1309 1968.66 | 505 515 1409.80
jnlbrnga 21904 | 1179 1346 1722.60 | 619 640 59.24 | 2134 2191 10929.97 | 1390 1427 221.73 | 395 417 1236.32
mccormck 100000 | 1019 1021 10820.22 | X8 X8 X8 | 53 166 1222.38 | X4 X4 X4 | X5 X5 X5
minsurfo 10000 | 904 1010 8712.90 | 1601 1648 97.66 | 3953 3980 801.87 | 1633 1665 16136.98 | 497 498 743.37
ncvxbqp1 20000 | 41 43 60.54 | 1 2 0.04 | 85 86 382.62 | X1 X1 X1 | 9 10 2.03
ncvxbqp2 20000 | X6 X6 X6 | 151 191 4.76 | 3831 3835 20043.27 | 8118 8119 993.03 | 124 125 178.97
nobndtor 32400 | 1443 1595 12429 | 1955 1966 314.42 | 1100 1129 8306.03 | 1049 1069 27844.21 | 873 886 3155.06
nonscomp 20000 | 233 237 1027.41 | X8 X8 X8 | 31 34 99.36 | 1098 1235 2812.25 | 87 92 123.82
obstclae 21904 | 547 597 4344.33 | 1110 1114 109.11 | 982 1009 1322.69 | 618 639 11489.74 | 1253 1258 2217.58
obstclbm 21904 | 342 376 1332.14 | 359 368 35.94 | 383 391 2139.91 | 282 286 1222.99 | 276 279 641.07
pentdi 20000 | 2 6 0.57 | 1 3 0.05 | 59 61 221.98 | 60 62 67.39 | 3 7 0.72
probpenl 5000 | 3 5 8.86 | 2 4 0.03 | 4 8 0.53 | 4 5 0.30 | 2 4 0.10
qrtquad 5000 | X6 X6 X6 | 241 308 4.85 | X4 X4 X4 | X5 X5 X5 | X1 X1 X1
qudlin 20000 | 41 43 19.80 | 1 2 0.02 | 17 18 27.78 | 24 25 34.81 | 4 5 0.43
reading1 20001 | 81 83 114.18 | 7593 15354 234.93 | 359 625 1891.24 | 66 69 150.48 | 15 16 5.17
scond1ls 2000 | X1 X1 X1 | X1 X1 X1 | X1 X1 X1 | X1 X1 X1 | X1 X1 X1
sineali 20000 | 466 553 918.33 | 14 19 0.63 | X5 X5 X5 | X4 X4 X4 | X1 X1 X1
torsion1 32400 | 662 733 4940.83 | 565 579 86.39 | 696 716 1564.78 | 336 362 15661.85 | 300 303 1251.40
torsiona 32400 | 685 768 5634.62 | 490 496 77.42 | 625 643 950.16 | 349 370 15309.47 | 296 306 1272.50
X1: iteration limit reached; X2: numerical result out of range; X4: current solution estimate cannot be improved; X5: relative change in solution estimate < 10^{-15}; X6: dual feasibility cannot be satisfied; X7: rounding error; X8: line search error

Figure 3. Number of Function Evaluations (logarithmic performance profile: percentage of problems vs. x times slower than the best; curves for SNOPT, L-BFGS-B, KNITRO-DIRECT, KNITRO-CG, KNITRO-ACTIVE)

Figure 4. CPU Time (logarithmic performance profile: percentage of problems vs. x times slower than the best; curves for SNOPT, L-BFGS-B, KNITRO-DIRECT, KNITRO-CG, KNITRO-ACTIVE)

A sharp drop in robustness and speed is noticeable for the three knitro algorithms; compare with Table 1. In terms of function evaluations, l-bfgs-b and knitro/active perform the best. snopt and the two interior-point methods require roughly the same number of function evaluations, and this number is often dramatically larger than that obtained by the interior-point solvers using exact Hessian information. In terms of CPU time, l-bfgs-b is by far the best solver and knitro/active comes in second. Again, snopt and the two interior-point methods require a comparable amount of CPU time, and for some of these problems the times are unacceptably high.

In summary, as was the case with tron when exact Hessian information was available, the specialized quasi-Newton method for bound constrained problems, l-bfgs-b, has an edge over the general purpose solvers. The use of preconditioning has helped bridge the gap in the exact Hessian case, but in the quasi-Newton case improved updating procedures are clearly needed for general purpose methods.

References

1. L. Bergamaschi, J. Gondzio, and G. Zilli, Preconditioning indefinite systems in interior point methods for optimization, Tech. Rep. MS-02-002, Department of Mathematics and Statistics, University of Edinburgh, Scotland, 2002.
2. R. H. Byrd, N. I. M. Gould, J. Nocedal, and R. A. Waltz, An algorithm for nonlinear optimization using linear programming and equality constrained subproblems, Mathematical Programming, Series B, 100 (2004), pp. 27–48.
3. R. H. Byrd, M. E. Hribar, and J. Nocedal, An interior point algorithm for large scale nonlinear programming, SIAM Journal on Optimization, 9 (1999), pp. 877–900.
4. R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, A limited memory algorithm for bound constrained optimization, SIAM Journal on Scientific Computing, 16 (1995), pp. 1190–1208.
5. R. H. Byrd, J. Nocedal, and R. Waltz, KNITRO: An integrated package for nonlinear optimization, in Large-Scale Nonlinear Optimization, G. di Pillo and M. Roma, eds., Springer, 2006, pp. 35–59.
6. R. H. Byrd and R. A. Waltz, Improving SLQP methods using parametric linear programs, tech. rep., OTC, 2006. To appear.
7. E. D. Dolan and J. J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming, Series A, 91 (2002), pp. 201–213.
8. A. Forsgren, P. E. Gill, and J. D. Griffin, Iterative solution of augmented systems arising in interior methods, Tech. Rep. NA 05-3, Department of Mathematics, University of California, San Diego, 2005.
9. R. Fourer, D. M. Gay, and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, Scientific Press, 1993. www.ampl.com.
10. P. E. Gill, W. Murray, and M. A. Saunders, SNOPT: An SQP algorithm for large-scale constrained optimization, SIAM Journal on Optimization, 12 (2002), pp. 979–1006.


11. N. I. M. Gould, D. Orban, and P. L. Toint, CUTEr and SifDec: A Constrained and Unconstrained Testing Environment, revisited, ACM Transactions on Mathematical Software, 29 (2003), pp. 373–394.
12. N. I. M. Gould, D. Orban, and P. L. Toint, Numerical methods for large-scale nonlinear optimization, Technical Report RAL-TR-2004-032, Rutherford Appleton Laboratory, Chilton, Oxfordshire, England, 2004.
13. C. Keller, N. I. M. Gould, and A. J. Wathen, Constraint preconditioning for indefinite linear systems, SIAM Journal on Matrix Analysis and Applications, 21 (2000), pp. 1300–1317.
14. C. J. Lin and J. J. Moré, Incomplete Cholesky factorizations with limited memory, SIAM Journal on Scientific Computing, 21 (1999), pp. 24–45.
15. C. J. Lin and J. J. Moré, Newton's method for large bound-constrained optimization problems, SIAM Journal on Optimization, 9 (1999), pp. 1100–1127.
16. M. Roma, Dynamic scaling based preconditioning for truncated Newton methods in large scale unconstrained optimization: The complete results, Technical Report R. 579, Istituto di Analisi dei Sistemi ed Informatica, 2003.
17. T. Steihaug, The conjugate gradient method and trust regions in large scale optimization, SIAM Journal on Numerical Analysis, 20 (1983), pp. 626–637.
18. R. A. Waltz, J. L. Morales, J. Nocedal, and D. Orban, An interior algorithm for nonlinear optimization that combines line search and trust region steps, Mathematical Programming, Series A, 107 (2006), pp. 391–408.
19. C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal, Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound constrained optimization, ACM Transactions on Mathematical Software, 23 (1997), pp. 550–560.

Word Similarity In WordNet

Tran Hong-Minh and Dan Smith

School of Computing Sciences, University Of East Anglia, Norwich, UK, NR4 7TJ
[email protected]

Abstract This paper presents a new approach to measuring the semantic similarity between concepts. By exploiting the advantages of the distance (edge-based) approach for taxonomic tree-like concepts, we enhance the strength of the information theoretic (node-based) approach. Our measure therefore gives a complete view of word similarity, which cannot be achieved by solely applying the node-based approach. Our experimental measure achieves 88% correlation with human ratings.

1 Introduction

Understanding concepts expressed in natural language is a challenge in Natural Language Processing and Information Retrieval. It is often decomposed into comparing semantic relations between concepts, which can be done by using Hidden Markov models and Bayesian networks for part-of-speech tagging. Alternatively, a knowledge-based approach can also be applied, but it has not been well explored due to the lack of machine readable dictionaries (such as lexicons, thesauri and taxonomies) [12]. However, more dictionaries have since been developed (e.g., Roget, Longman, WordNet [5, 6], etc.) and the amount of research in this direction has increased. The task of understanding and comparing the semantics of concepts becomes one of understanding and comparing such relations by exploiting machine readable dictionaries.

We propose a new information theoretic measure to assess the similarity of two concepts on the basis of exploring a lexical taxonomy (e.g., WordNet). The proposed formula is domain-independent. It could be applied to either a generic or a specialized lexical knowledge base. We use WordNet as an example of the lexical taxonomy.

The rest of the paper is organized as follows. In Section 2 we give an overview of the structure of a lexical hierarchy and use WordNet as a specific example. In the following section, Section 3, we analyze two approaches (distance (edge) based and information theoretic (node) based) for measuring the similarity degree. Based on this analysis we present our measure


which combines both advantages of the two approaches in Section 4. In Section 5 we discuss our comparative experiments. Finally we outline our future work in Section 6.

2 Lexical Taxonomy

A taxonomy is often organized as a hierarchical and directional structure, in which nodes represent concepts (nouns, adjectives, verbs) and edges represent relations between concepts. The hierarchical structure seldom has more than 10 levels of depth. Although hierarchies in the system vary widely in size, each hierarchy covers a distinct conceptual and lexical domain. They are also not mutually exclusive, as some cross-references are required. The advantage of the hierarchical structure is that information common to many items need not be stored with every item. In other words, all characteristics of the superordinate are assumed to be characteristic of all its subordinates as well. The hierarchical system therefore is an inheritance system, possibly with multiple inheritance but without cycles. Consequently, nodes at deeper levels are more informative and specific than nodes that are nearer to the root. In principle, the root would be semantically empty. The number of leaf nodes is obviously much greater than the number of upper nodes.

In a hierarchical system there are three types of nodes: concept nodes indicating nouns (a.k.a. Noun nodes), attribute nodes representing adjectives, and function nodes standing for verbs. Nodes are linked together by edges to give full information about concepts; a node is distinguished by the set of nodes linked to it by incoming edges. Edges represent the relations between nodes. They are currently categorized into some common types (such as is-a, equivalence, antonymy, modification, function and meronymy). Among them, the IS-A relation, connecting a Noun node and another Noun node, is the dominant and the most important one. Like the IS-A relation, the meronymy relation connecting two Noun nodes together also has an important role in the system. Besides the two popular relations, there are four more types of relations. The antonymy relation connects opposites (e.g. man-woman, wife-husband), and the equivalence relation connects synonyms together. The modification relation indicates attributes of a concept by connecting a Noun node and an Adjective node, and the function relation indicates behaviour of a concept by linking a Verb node to a Noun node. Table 1 briefly summarizes the characteristics of these relations.

In practice, one example of such lexical hierarchical systems is WordNet, which is currently one of the most popular and the largest online dictionaries, produced by Miller et al. from Princeton University in the 1990s. It supports multiple inheritance between nodes and has the largest number of relations implemented. The WordNet hierarchical system includes 25 different branches rooted by 25 distinct concepts. Each of these 25 concepts can be considered


Table 1. Characteristic of relations in the lexical hierarchical system
Is-A Meronymy Equivalence Modification Function Antonymy
Transitive: √ √ √ √ √ ×
Symmetric: √ √ √ × × ×

as the beginners of the branches and regarded as a primitive semantic component of all concepts in its semantic hierarchy. Table 2 shows such beginners.

Table 2. List of 25 unique beginners for WordNet nouns

{act, action, activity}, {animal, fauna}, {artifact}, {attribute, property}, {body, corpus}, {cognition, knowledge}, {communication}, {event, happening}, {feeling, emotion}, {natural object}, {natural phenomenon}, {person, human being}, {plant, flora}, {possession}, {process}, {quantity, amount}, {relation}, {food}, {group, collection}, {location, place}, {motive}, {shape}, {state, condition}, {substance}, {time}

Figure 1. Fragments of WordNet noun taxonomy

Like many other lexical inheritance systems, WordNet fully supports the IS-A and meronymy relations. Although the modification and function relations have not been implemented, the antonymy and synonym sets are implemented in WordNet. Figure 1 shows fragments of the WordNet noun hierarchy. With a hierarchical structure, similarity can be obtained not only by solely comparing the common semantics between two nodes in the system (the information theoretic, node-based approach) but also by measuring their position in the structure and their relations (the distance-based approach).


3 Information Theoretic vs. Conceptual Distance Approach for Measuring Similarity

Based on different underlying assumptions about taxonomy and definitions of similarity (e.g., [1–3, 7, 10], etc.), there are two main trends for measuring semantic similarity between two concepts: the node based approach (a.k.a. information content approach) and the edge based approach (a.k.a. conceptual distance approach). The most distinguishing characteristic of the node-based approach is that the similarity between nodes is measured directly and solely by the common information content. Since a taxonomy is often represented as a hierarchical structure, a special case of a network structure, similarity between nodes can also make use of the structural information embedded in the network, especially that of the links between nodes. This is the main idea of edge-based approaches.

3.1 Conceptual Distance Approach

The conceptual distance approach is natural and intuitive, and it addresses the problem of measuring similarity of concepts in the lexical hierarchical system presented in Section 2. The similarity between concepts is related to the conceptual distance between them: the more differences they have, the less similar they are. The conceptual distance between concepts is measured by the geometric distance between the nodes representing the concepts.

Definition 1 Given two concepts c1 and c2 and dist(c1, c2) as the distance between c1 and c2, the difference between c1 and c2 is equal to the distance dist(c1, c2) between them [3].

Definition 2 The distance dist(c1, c2) between c1 and c2 is the sum of the weights wt_i of the edges e_i on the shortest path from c1 to c2:

   dist(c1, c2) = Σ_{e_i ∈ shortestPath(c1, c2)} wt_i        (1)
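Definition 2 can be read directly as a weighted shortest-path computation over the taxonomy graph. The following is a small sketch of ours using Dijkstra's algorithm; the toy graph and its edge weights are hypothetical and would normally come from an edge-weighting scheme such as the ones discussed below.

```python
import heapq

def conceptual_distance(graph, c1, c2):
    """graph: dict mapping a concept to a list of (neighbour, weight) pairs.
    Returns dist(c1, c2), the minimum total edge weight over paths from c1 to
    c2 (Formula (1)), or infinity if the concepts are not connected."""
    dist = {c1: 0.0}
    heap = [(0.0, c1)]
    while heap:
        d, c = heapq.heappop(heap)
        if c == c2:
            return d
        if d > dist.get(c, float("inf")):
            continue
        for nb, w in graph.get(c, []):
            nd = d + w
            if nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                heapq.heappush(heap, (nd, nb))
    return float("inf")

# Hypothetical taxonomy fragment with unit edge weights.
taxonomy = {
    "nickel": [("coin", 1.0)],
    "coin":   [("cash", 1.0), ("dime", 1.0)],
    "cash":   [("money", 1.0)],
}
print(conceptual_distance(taxonomy, "nickel", "dime"))   # 2.0, via coin
```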

Being a distance, Formula (1) should satisfy the properties of a metric [10], such as the zero property, the positive property and the triangle inequality. However, the symmetric property may not be satisfied, i.e. dist(c1, c2) ≠ dist(c2, c1), as different types of relations contribute differently to the weight of the edge connecting two nodes. For example, within the meronymy type of relation, an aggregative relation may contribute differently from a part-of relation, even though they are the reverse of each other. Most contributions to the weight of an edge come from the characteristics of the hierarchical network, such as local network density, depth of a node in the hierarchy, type of link and strength of link:

• Network density of a node can be taken as the number of its children. Richardson et al. [8] suggest that the greater the density, the closer the distance between parent-child nodes or sibling nodes.

• The distance between parent-child nodes is also closer at deeper levels, since the differentiation at such levels is less.
• The strength of a link is based on the closeness between a child node and its direct parent, against those of its siblings. This is the most important factor in the weight of the edge, but it is still an open and difficult issue.

There are studies on conceptual similarity using the distance approach with the above characteristics of the hierarchical network (e.g., [1, 11]). Most research focuses on proposing an edge-weighting formula and then applying Formula (1) for measuring the conceptual distance. For instance, Sussna [11] considers depth, relation type and network density in his weight formula as follows:

   wt(c1, c2) = [ wt(c1 →r c2) + wt(c2 →r′ c1) ] / (2d)        (2)

in which

   wt(x →r y) = max_r − (max_r − min_r) / n_r(x)        (3)

where →r and →r′ are respectively a relation of type r and its reverse, d is the greater of the depths of c1 and c2 in the hierarchy, min_r and max_r are respectively the minimum and maximum weight of relations of type r, and n_r(x) is the number of relations of type r at node x, which is viewed as the network density of node x. The conceptual distance is then given by applying Formula (1). It gives a good result in a word sense disambiguation task with multiple-sense words. However, the formula does not take into account the strength of the relation between nodes, which is still an open issue for the distance approach.

In summary, the distance approach obviously requires a lot of information on the detailed structure of the taxonomy. Therefore it is difficult to apply or directly manipulate on a generic taxonomy, which originally is not designed for similarity computation.

3.2 Information Theoretic Approach

The information theoretic approach is more theoretically sound. Therefore it is generic and can be applied to many taxonomies without regard to their underlying structure. In a conceptual space, a node represents a unique concept and contains a certain amount of information. The similarity between concepts is related to the information the nodes have in common. The more commonality they share, the more similar they are.

Given concepts c1 and c2, IC(c) is the information content value of concept c. Let w be the word denoting concept c. For example, in Figure 1, the word nickel has three senses:

• "a United States coin worth one twentieth of a dollar" (concept coin)
• "atomic number 28" (concept chemical element)
• "a hard malleable ductile silvery metallic element that is resistant to corrosion; used in alloys; occurs in pentlandite and smaltite and garnierite and millerite" (concept metal).

Let s(w) be the set of concepts in the taxonomy that are senses of word w. For example, in Figure 1, the words nickel, coin and cash are all members of the set s(nickel). Let Words(c) be the set of words subsumed by concept c.

Definition 3 The commonality between c1 and c2 is measured by the information content value used to state the commonalities between c1 and c2 [3]:

   IC(common(c1, c2))        (4)

Assumption 1 The maximum similarity between c1 and c2 is reached when c1 and c2 are identical, no matter how much commonality they share.

Definition 4 In information theory, the information content value of a concept c is generally measured by

   IC(c) = − log P(c)        (5)

where P(c) is the probability of encountering an instance of concept c. For implementation, the probability is practically measured by the concept frequency. Resnik [7] suggests a method of calculating the concept probabilities in a corpus on the basis of word occurrences. Given count(w) as the number of occurrences of a word belonging to concept c in the corpus, and N as the number of concepts in the corpus, the probability of a concept c in the corpus is defined as follows:

   P(c) = (1/N) Σ_{w ∈ Words(c)} count(w)        (6)
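A small sketch of (5)-(6) in code, ours rather than the authors'; the word counts, the Words(c) sets and the normalizing constant N are toy placeholders.

```python
import math

def information_content(words_of, count, N):
    """words_of: dict concept -> set of words subsumed by it (Words(c));
    count: dict word -> corpus frequency; N: normalizing constant.
    Returns IC(c) = -log P(c), with P(c) = (1/N) * sum of count(w) over Words(c)."""
    ic = {}
    for c, words in words_of.items():
        p = sum(count.get(w, 0) for w in words) / N
        ic[c] = -math.log(p) if p > 0 else float("inf")
    return ic

# Toy example: 'coin' subsumes more words than 'nickel', so it gets a lower IC.
count = {"nickel": 5, "dime": 7, "coin": 3}
words_of = {"coin": {"coin", "nickel", "dime"}, "nickel": {"nickel"}}
print(information_content(words_of, count, N=100))
```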

In a taxonomy, the shared information of two concepts c1 and c2 is measured by the information content value of the concepts that subsume them. Given sim(c1, c2) as the similarity degree of two concepts c1 and c2 and Sup(c1, c2) as the set of concepts that subsume both c1 and c2, the formal definition of the similarity degree between c1 and c2 is given as follows:

   sim(c1, c2) = max_{c ∈ Sup(c1, c2)} IC(c),   if c1 ≠ c2,
   sim(c1, c2) = 1,                             if c1 = c2.        (7)

The word similarity between w1 and w2 is formally defined:

   sim(w1, w2) = max_{c1 ∈ s(w1), c2 ∈ s(w2)} [ sim(c1, c2) ]        (8)
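Formulas (7) and (8) translate almost line by line into the following sketch (ours; the function returning the Sup-sets, the IC table and the sense sets s(w) are assumed to be supplied by the taxonomy).

```python
def concept_similarity(c1, c2, ic, superordinates):
    """Formula (7): the IC of the most informative common subsumer,
    or 1 if the concepts are identical."""
    if c1 == c2:
        return 1.0
    common = superordinates(c1) & superordinates(c2)      # Sup(c1, c2)
    return max((ic[c] for c in common), default=0.0)

def word_similarity(w1, w2, senses, ic, superordinates):
    """Formula (8): maximize the concept similarity over all sense pairs."""
    return max(concept_similarity(c1, c2, ic, superordinates)
               for c1 in senses[w1] for c2 in senses[w2])
```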


When applying the above formulae to a hierarchical concept space, a few specializations are needed. The set of words Words(c), which is directly or indirectly subsumed by the concept c, is taken to consist of all nodes in the sub-tree rooted at c, including c itself. Therefore, as we move from the leaves to the root of the hierarchy, Formula (6) gives a higher probability of encountering a concept at an upper level. The probability of the root obviously is 1. Consequently, the information content value given by Formula (5) monotonically decreases in the bottom-up direction, and the information content value of the root is 0. This means that concepts at the upper levels are less informative, so the characteristic of the lexical hierarchical structure discussed in Section 2 is respected.

In a lexical hierarchical concept space, Sup(c1, c2) contains all superordinates of c1 and c2. For example, in Figure 1, coin, cash and money are all members of Sup(nickel, dime). However, as analyzed above, only IC(coin) gives the highest information content value. The similarity computed by using Formula (7), sim(nickel, dime), therefore is equal to the information content value of the direct superordinate, IC(coin). So the direct superordinate of a node in a hierarchy (e.g. coin is the direct superordinate of nickel and dime) is called the minimum upper bound of the node. Similarly, for a multiple inheritance system, the similarity between concepts sim(c1, c2) is equal to the maximum information content value among those of their minimum upper bounds. For example, in Figure 1,

   sim(nickel*, gold*) = max[ IC(chemical element), IC(metal) ].

To conclude, unlike the distance approach, the information theoretic approach requires less structural information about the taxonomy. Therefore it is generic and flexible and has wide application to many types of taxonomies. However, when it is applied to hierarchical structures it does not differentiate the similarity of concepts as long as their minimum upper bounds are the same. For example, in Figure 1, sim(bicycle, fork) and sim(bicycle, tableware) are equal.

4 A Measure for Word Similarity

We propose a combined model for measuring word similarity, derived from the node-based notion by adding structural information. We put the depth factor and the link strength factor into the node-based approach. By adding such structural information about the taxonomy, the node-based approach can exploit all the typical characteristics of a hierarchical structure when it is applied to such a taxonomy. Moreover, this information can be tuned via parameters. The method therefore is flexible for many types of taxonomy (e.g., hierarchical structure or plain structure).


Definition 5 The strength of a link is defined to be P(ci | p), the conditional probability of encountering a child node ci, given an instance of its parent node p. Using the Bayesian formula, we have:

   P(ci | p) = P(ci ∩ p) / P(p) = P(ci) / P(p)        (9)

The information content value of a concept c with regard to its direct parent p, which is a modification of Formula (5), is given by:

   IC(c | p) = − log P(c | p) = − log ( P(c) / P(p) ) = IC(c) − IC(p)        (10)

As we discussed in Section 2, concepts at upper levels of the hierarchy have less semantic similarity between them than concepts at lower levels. This characteristic should be taken into account as a constraint in calculating the similarity of two concepts when depth is considered. Therefore, the depth function should give a higher value when applied to nodes at lower levels. The contribution of the depth to the similarity is considered as an exponential-growth function:

   f_{c1,c2}(d) = (e^{αd} − e^{−αd}) / (e^{αd} + e^{−αd}),        (11)

where d = max(depth(c1), depth(c2)) and α is a tuning parameter. The optimal value of the parameter is α = 0.3057, based on our numerous experiments. Function (11) is a monotonically increasing function with respect to the depth d, and therefore it satisfies the constraint above. Moreover, by employing an exponential-growth function rather than an exponential-decay function, it is an extension of Shepard's Law [2, 9], which claims that exponential-decay functions are a universal law of stimulus generalisation for psychological science.

Then, the function given in Formula (7) becomes a function of the depth and the information content, with the strength of a link taken into account, as follows:

   sim(c1, c2) = max_{c ∈ Sup(c1, c2)} ( IC(c | p) × f_c(d) ),   if c1 ≠ c2,
   sim(c1, c2) = 1,                                              if c1 = c2.        (12)
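Putting (9)-(12) together gives roughly the following sketch (ours, under the reading that d in (12) is taken from c1 and c2 as in (11)); α = 0.3057 as reported above, and the parent, depth and IC data are assumed to come from the taxonomy.

```python
import math

ALPHA = 0.3057          # tuning parameter reported for Formula (11)

def depth_factor(d, alpha=ALPHA):
    """Formula (11): a monotonically increasing function of depth
    (equivalently tanh(alpha * d))."""
    return (math.exp(alpha * d) - math.exp(-alpha * d)) / \
           (math.exp(alpha * d) + math.exp(-alpha * d))

def ic_given_parent(ic, c, parent):
    """Formula (10): IC(c | p) = IC(c) - IC(p)."""
    return ic[c] - ic[parent]

def similarity(c1, c2, ic, parent, depth, superordinates):
    """Formula (12): like (7), but each common subsumer is weighted by its
    link strength IC(c | parent(c)) and by the depth factor f(d)."""
    if c1 == c2:
        return 1.0
    d = max(depth[c1], depth[c2])
    common = superordinates(c1) & superordinates(c2)
    return max((ic_given_parent(ic, c, parent[c]) * depth_factor(d)
                for c in common if c in parent), default=0.0)
```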

5 Experiments

Although there is no standard way to evaluate computational measures of semantic similarity, one reasonable way to judge would seem to be agreement with human similarity ratings. This can be assessed by measuring and rating the similarity of each word pair in a set and then looking at how well the ratings correlate with human ratings of the same pairs.


We use the human ratings collected by Miller and Charles [4] and revised by Resnik [7] as our baseline. In their study, 38 undergraduate subjects were given 30 pairs of nouns and were asked to rate the similarity of meaning for each pair on a scale from 0 (dissimilar) to 4 (fully similar). The average rating of each pair represents a good estimate of how similar the two words are. Furthermore, we compare our similarity values with those produced by a simple edge-count measure and by Lin's measure [3]. We use WordNet 2.0 as the hierarchical system to exploit the relationships among the pairs.

Table 3. Results obtained evaluating with human judgement and WordNet 2.0 (the final entry of each column, labelled correlation, is that measure's correlation with the human ratings)

word pairs (word1 word2), in order: car automobile; gem jewel; journey voyage; boy lad; coast shore; asylum madhouse; magician wizard; midday noon; furnace stove; food fruit; bird cock; bird crane; tool implement; brother monk; crane implement; lad brother; journey car; monk oracle; cemetery woodland; food rooster; coast hill; forest graveyard; shore woodland; monk slave; coast forest; lad wizard; chord smile; glass magician; noon string; rooster voyage; correlation

Human: 3.92 3.84 3.84 3.76 3.70 3.61 3.50 3.42 3.11 3.08 3.05 2.97 2.95 2.82 1.68 1.66 1.16 1.10 0.95 0.89 0.87 0.84 0.63 0.55 0.42 0.42 0.13 0.11 0.08 0.08 1.00

simedge: 1.00 1.00 0.50 0.50 0.50 0.50 1.00 1.00 0.13 0.13 0.50 0.25 0.50 0.50 0.20 0.20 0.07 0.13 0.10 0.07 0.20 0.10 0.17 0.20 0.14 0.20 0.09 0.13 0.08 0.05 0.77

simLin: 1.00 1.00 0.69 0.82 0.97 0.98 1.00 1.00 0.22 0.13 0.80 0.92 0.25 0.29 0.00 0.23 0.08 0.10 0.71 0.08 0.14 0.25 0.13 0.27 0.27 0.13 0.00 0.00 0.80

ours: 1.00 1.00 0.92 0.87 1.00 0.90 1.00 1.00 0.32 0.73 0.85 0.85 0.73 0.54 0.80 0.27 0.00 0.26 0.07 0.26 0.71 0.13 0.27 0.31 0.38 0.21 0.07 0.07 0.00 0.00 0.88

302

T. Hong-Minh and D. Smith

6 Conclusion We have presented a review of two main trends of measuring similarity of words in a generic and hierarchical corpus. Based on this review we proposed a modification on the node based approach to capture the structural information of a hierarchical taxonomy. Therefore our approach gives a complete view on similarity of words.

References 1. J. J. Jiang and D. W. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. In the International Conference on Research in Computational Linguistics, 1997. 2. Y. Li, Z. A. Bandar, and D. McLean. An approach for measuring semantic similarity between words using multiple information sources. In IEEE Transaction on Knowledge and Data Transaction, vol. 15, 871–882, 2003. 3. D. Lin. An information-theoretic definition of similarity. In ICML ’98: Proceedings of the Fifteenth International Conference on Machine Learning, 296–304, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc. 4. G. Miller and W. Charles. Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1–28, 1991. 5. G. A. Miller. Nouns in wordnet: A lexical inheritance system. International journal of Lexicography, 3(4):245–264, 1990. 6. G. A. Miller, C. Fellbaum, R. Beckwith, D. Gross, and K. Miller. Introduction to wordnet: An online lexical database. International journal of Lexicography, 3(4):235–244, 1990. 7. P. Resnik. Using information content to evaluate semantic similarity in a taxonomy. In IJCAI, 448–453, 1995. 8. R. Richardson and A. F. Smeaton. Using WordNet in a knowledge-based approach to information retrieval. Technical Report CA-0395, Dublin, Ireland, 1995. 9. R. N. Shepard. Toward a universal law of generalization for psychological science. 237(4820):1317–1323, September 1987. 10. R. Roy, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. In IEEE Transactions on Systems, Man and Cybernetics, volume 19, 17–30, 1989. 11. M. Sussna. Word sense disambiguation for free-text indexing using a massive semantic network. In CIKM ’93: Proceedings of the second international conference on Information and knowledge management, 67–74, New York, NY, USA, 1993. ACM Press. 12. W. Gale, K. Church, and D. Yarovsky. A method for disambiguating word senses in a large corpus. Common Methodologies in Humanities Computing and Computational Linguistics, 26:415–439, 1992.

Progress in Global Optimization and Shape Design D. Isebe1 , B. Ivorra1 , P. Azerad1 , B. Mohammadi1 , and F. Bouchette2 1

2

I3M- Universite de Montpellier II, Place Eugene Bataillon, CC051, 34095 Montpellier, France [email protected] ISTEEM - Universite de Montpellier II, Place Eugene Bataillon, 34095 Montpellier, France [email protected]

Abstract In this paper, we reformulate global optimization problems in terms of boundary value problems. This allows us to introduce a new class of optimization algorithms. Indeed, many optimization methods can be seen as discretizations of initial value problems for differential equations or systems of differential equations. We apply a particular algorithm included in the former class to the shape optimization of coastal structures.

1 Introduction Many optimization algorithms can be viewed as discrete forms of Cauchy problems for a system of ordinary differential equations in the space of control parameters [1, 2]. We will see that if one introduces an extra information on the infimum, solving global optimization problems using these algorithms is equivalent to solving Boundary Value Problems (BVP) for the same equations. A motivating idea is therefore to apply algorithms solving BVPs to perform this global optimization. In this paper we present a reformulation of global minimization problems in term of over-determined BVPs, discuss the existence of their solutions and present some algorithms solving those problems. One aim here is also to show the importance of global optimization algorithms for shape optimization. Indeed, because of excessive cost of global optimization approaches usually only local minimization algorithms are used for shape optimization of distributed systems, especially with fluids [3, 4]. Our semi-deterministic algorithm permits for global optimization of systems governed by PDEs at reasonable cost. Section 2 presents our global optimization method and mathematical background. In Section 3, the previous approach is applied to our considered optimization problem.

304

D. Isebe et al.

2 Optimization Method We consider the following minimization problem: min J(x)

x∈Ωad

(1)

where J : Ωad → IR is called cost function, x is the optimization parameter and belongs to a compact admissible space Ωad ⊂ IRN , with N ∈ IN. We make the following assumptions [2]: J ∈ C 2 (Ωad , IR) and coercive. The infimum of J is denoted by Jm . 2.1 BVP Formulation of Optimization Problems Many minimization algorithms which perform the minimization of J can be seen as discretizations of continuous first or second order dynamical systems with associated initial conditions [1]. A numerical global optimization of J with one of those algorithms, called here core optimization method, is possible if the following BVP has a solution:  First or second order initial value problem (2) |J(x(Z)) − Jm | < ǫ where x(Z) is the solution of the considered dynamical system found at a given finite time Z ∈ IR and ǫ is the approximation precision. In practice, when Jm is unknown, we set Jm to a lower value (for example Jm = 0 for a non-negative function J) and look for the best solution for a given complexity and computational effort. This BVP is over-determined as it includes more conditions than derivatives. This over-determination can be removed for instance by considering one of the initial conditions in the considered dynamical system as a new variable denoted by v. Then we could use what is known on BVP theory, for example a shooting method [1], in order to determine a suitable v solving (2). 2.2 General Method for the Resolution of BVP (2) In order to solve previous BVP (2), we consider the following general method: We consider a function h : Ωad → IR given by: h(v) = J(x(Z, v))

(3)

where x(Z, v) is the solution of considered dynamical in (2) starting from the initial condition v, defined previously, at a given time Z ∈ IR. Solving BVP (2) is equivalent to minimize in Ωad function (3). Depending on the selected optimization method, h is usually a discontinuous plateau function. For example, if a Steepest Descent method is used as

Progress in Global Optimization and Shape Design

305

core optimization method, the associated dynamical system reach, in theory, the same local minimum when it starts from any points included in a same attraction basin. In other words, if Z is large enough, h(v) is piecewise constant with values corresponding to the local minima of J(x(Z, v)). Furthermore, h(v) is discontinuous where the functional reaches a local maximum, or has a plateau. In order to minimize such a kind of function we propose a multi-layer algorithm based on line search methods [1]: We first consider the following algorithm A1 (v1 , v2 ): -

(v1 , v2 ) ∈ Ωad × Ωad given Find v ∈ argminw∈O(v2 ) h(w) where O(v2 ) = {v1 +t(v2 −v1 ), t ∈ IR}∩Ωad using a line search method return v

The line search minimization in A1 is defined by the user. It might fails. For instance, a secant method [1] degenerates on plateaus and critical points. In this case, in order to have a multidimensional search, we add an external layer to the algorithm A1 by minimizing h′ : Ωad → IR defined by: h′ (v ′ ) = h(A1 (v ′ , w′ ))

(4)



with w chosen randomly in Ωad . This leads to the following two-layer algorithm A2 (v1 , v2′ ): -

(v1′ , v2′ ) ∈ Ωad × Ωad given Find v ′ ∈ argminw∈O(v2′ ) h′ (w) where O(v2′ ) = {v1′ +t(v2′ −v1′ ), t ∈ IR}∩Ωad using a line search method return v ′

The line search minimization in A2 is defined by user. N.B Here we have only described a two-layers algorithm structure. But this construction can be pursued by building recursively hi (v1i ) = hi−1 (Ai−1 (v1i , v2i )), with h1 (v) = h(v) and h2 (v) = h′ (v) where i = 1, 2, 3, ... denotes the external layer. In this paper, we call this general recursive algorithm: Semi-Deterministic Algorithm (SDA). For each class of method used as core optimization method, we will describe more precisely the SDA implementation. 2.3 1st Order Dynamical System Based Methods We consider optimization methods that come from the discretization of the following dynamical system [1, 2]:  M (ζ, x(ζ))xζ (ζ) = −d(x(ζ)) (5) x(ζ = 0) = x0 n where ζ is a fictitious time. xζ = dx dζ . M is an operator, d : Ωad → IR is a function giving a suitable direction. For example:

306

• •

D. Isebe et al.

If d = ∇J, the gradient of J, and M (ζ, x(ζ)) = Id, the identity operator, we recover the classical steepest descent method. If d = ∇J and M (ζ, x(ζ)) = ∇2 J(x(ζ)) the Hessian of J, we recover the Newton method.

In this case, BVP (2) can be rewritten as: ⎧ ⎨ M (ζ, x(ζ))xζ = −d(x(ζ)) x(0) = x0 ⎩ |J(x(Z)) − Jm | < ǫ

(6)

This BVP is over-determined by x0 . i.e, the choice of x0 determines if BVP (6) admits or not a solution. For instance, in the case of a steepest descent method, BVP (6) generally has a solution if x0 is in the attraction basin of the global minimum. In order to determine a such x0 , we consider the implementation of algorithms Ai with i = 1, 2, 3, ... (here we limit the presentation to i = 2).

The first layer A1 is applied with a secant method in order to perform line search. The output is denoted byA1 (v1 , J, I, ǫ), and the algorithm reads: Input: v1 , J, ǫ v2 chosen randomly For l going from 1 to J ol = D(vl , ǫ) ol+1 = D(vl+1 , ǫ) If J(ol ) = J(ol+1 ) EndFor If min{J(om ), m = 1, ..., l} < ǫ EndFor l+1 −vl vl+2 = vl+1 − J(ol+1 ) J(ovl+1 )−J(ol ) EndFor Output: A1 (v1 , J, ǫ): argmin{J(om ), m = 1, ..., i} where v1 ∈ Ω, ǫ ∈ IR+ and J ∈ IN are respectively the initial condition, the stopping criterion and the iteration number. The second layer A2 is applied with a secant method in order to perform line search. The output is denoted by A2 (w1 , K, J, I, ǫ), and the algorithm reads: Input: w1 , K, J, ǫ w2 chosen randomly For l going from 1 to K pl = A1 (wl , J, ǫ) pl+1 = A1 (wl+1 , J, ǫ) If J(pl ) = J(pl+1 ) EndFor If min{J(pm ), m = 1, ..., l} < ǫ EndFor l+1 −wl wl+2 = wl+1 − J(pl+1 ) J(pwl+1 )−J(pl ) EndFor Output: A2 (w1 , K, J, ǫ): argmin{J(pm ), m = 1, ..., i}

Progress in Global Optimization and Shape Design

307

where w1 ∈ Ω, ǫ ∈ IR+ and (K, J) ∈ IN2 are respectively the initial condition, the stopping criterion and the iteration number. 2.4 2nd Order Dynamical System Based Methods In order to keep an exploratory character during the optimization process, allowing us to escape from attraction basins, we could use variants of previous methods after adding second order derivatives. For instance we could reformulate BVP (2) considering methods coming from the discretization of the following ‘heavy ball ’ dynamical system [2]: ⎧ ⎨ ηxζζ (ζ) + M (ζ, x(ζ))xζ (ζ) = −d(x(ζ)), x(0) = x0 , xζ (0) = xζ,0 (7) ⎩ |J(x(Z)) − Jm | < ǫ

with η ∈ IR. System (7) can be solved by considering x0 (as previously) or xζ,0 as a new variable. In the first case the existence of solution for BVP (7) is trivial. In the second case, considering particular hypothesis interesting in numerical analysis, when x0 is fixed it can be proved that it exists a xζ,0 such that BVP (7) admits numerical solutions: Theorem 1 Let J : IRn → IR be a C 2 -function such that minIRn J exists and is reached at xm ∈ IRn . Then for every (x0 , δ) ∈ IRn × IR+ , exists (σ, t) ∈ IRn × IR+ such that the solution of the following dynamical system: ⎧ ⎨ ηxζζ (ζ) + xζ (ζ) = −∇J(x(ζ)) x(0) = x0 (8) ⎩ xζ (0) = σ with η ∈ IR, passes at time ζ = t into the ball Bδ (xm ).

Proof : We assume x0 = xm . Let ǫ > 0, we consider the dynamical system: ⎧ ⎨ ηyτ τ (τ ) + ǫyτ (τ ) = −ǫ2 ∇J(x(τ )) y(0) = x0 (9) ⎩ yτ (0) = ̺(xm − x0 )

with ̺ in IR+ \{0}. •

Assume that ǫ = 0, we obtain the following system: ⎧ ⎨ ηyτ τ,0 (τ ) = 0 y0 (0) = x0 ⎩ yτ,0 (0) = ̺(xm − x0 )

(10)

System (10) describes a straight line of origin x0 and passing at time θ̺ by the point xm , i.e. y0 (θ̺ ) = xm .

308



D. Isebe et al.

Assume that ǫ = 0. System (9) could be rewritten as: ⎧    y(τ ) yτ (τ ) ⎪ ⎪ = ⎨ ηyτ (τ ) τ −ǫyτ (τ ) − ǫ2 ∇J(y(τ )) y(0) = x0 ⎪ ⎪ ⎩ yτ (0) = ̺(xm − x0 )

(11)

System (11) is of the form yτ = f (τ, y, ǫ), with f satisfying the CauchyLipschitz conditions. Applying the Cauchy-Lipschitz Theorem [5]: |yǫ (θ̺ )− y0 (θ̺ )| →ǫ→0 0 uniformly. Thus for every δ ∈ R+ \{0}, there exists ǫδ such that for every ǫ < ǫδ : |yǫ (θ̺ ) − xm | < δ (T.1)

Let δ ∈ IR+ \{0}. We consider the following variable changing ζ = ǫδ τ and x(ζ) = yǫδ ( ǫζδ ). System (9) becomes: ⎧ ⎨ ηxζζ (ζ) + xζ (ζ) = −∇J(x(ζ)) x(0) = x0 ⎩ x(0) ˙ = ǫ̺δ (xm − x0 )

(12)

Let ϑ = ǫδ θ̺ . Under this assumption, x(ϑ) = yǫδ (θ̺ ). Thus, due to (T.1) : |x(ϑ) − xm | < δ. We have found σ = ǫ̺δ (xm − x0 ) ∈ IRn and t = ϑ ∈ IR+ such that the solution of system (8) passes at time t into the ball Bδ (xm ). ◦ In order to determine a suitable x0 or x(ζ,0) solving BVP (7), we can consider, for instance, the same algorithms A1 and A2 introduced in section 2.3. 2.5 Other Hybridizations with SDA In practice, any user-defined, black-box or commercial minimization package starting from an initial condition can be used to build the core optimization sequences in the SDA presented in section 2.2. In that sense, the algorithm permits the user to exploit his knowledge on his optimization problem and to improve it. In the same way, preconditioning can be introduced at any layer, and in particular at the lowest one.

3 Application to Shape Optimization of Coastal Structures To our knowledge, despite the fact that beach protection becomes a major problem, shape optimization techniques have never been used in coastal

Progress in Global Optimization and Shape Design

309

engineering. Groins, breakwaters and many other structures are used to decimate water waves or to control sediment flows but their shapes are usually determined using simple hydrodynamical assumptions, structural strength laws and empirical considerations. In this section, we expose two examples of shape optimization in coastal engineering. First, we solve a problem concerning the minimization of the water waves scattered by reflective vertical structures in deep water. Secondly, we study the protection of the lido of S`ete (NW Mediterranean sea, France) by optimizing geotextile tubes with a wave refraction-diffraction model. All of these works are part of the COPTER research project (2006-2009) NT05 - 2-42253 funded by French National Research Agency. Both optimization problems are performed using the two-layer algorithm A2 , introduced in section 2.3 with a steepest descent algorithm [1] as core optimization method. Each steepest descent iteration number is equals to 100. The layers iteration number is set to 5 (i.e. K = J = 5). 3.1 Minimization of Water Waves Impact in Deep Water In deep water, the main scattering phenomenon is the reflection, so we compute the solution of a boundary value problem describing the water waves ξ r scattered by a fully reflective vertical structure in deep water and modify accordingly its shape, in order to minimize a pre-defined cost function taking into account the strength (energy) of the water waves [6, 7]. The optimization procedure relies on the global semi-deterministic algorithm detailed in the preceding section, able to pursue beyond local minima. For the control space, we consider a free individual parameterization for the structure which allows different original and non-intuitive shapes. Practically, a generic structure is a tree, described by its trunk represented by a set of connected principal edges and by a number of secondary branches leaving from each node.This parameterization gives a large freedom in the considered shapes. The cost function to minimize is the energy norm L2 of water waves free surface in an admissible domain representing a ten meter wide coastal strip of prescribed width, located between two successive structures. We present here an optimized shape for the structures in the case of a north-western incidental wave. Optimization reveals an original and nonintuitive optimized shape represented in Fig. 1-(Left). It is a monotonous structure which provides superior results for the control of the free surface along the coastline. To highlight the effectiveness of this structure, we will compare it with a traditional structure (rectangular and perpendicular to the coastline) (See Fig. 2). The cost function decreases by more than 93% compared to rectangular structures perpendicular to the coastline, like we can see in the cost function convergence during the optimization process (Fig. 1-(Right)).

310

D. Isebe et al. 0.7 0.6

value ofJ

0.5 0.4 0.3 0.2 0.1 0 0

20 40 60 80 number of cost function evaluation

100

Figure 1. (Left) Initial (Dashed line) and optimized structures (Solid line). (Right) Cost function evolution during the optimization process (history of convergence) 1.5 m

(a)

(b)

0.6 m

1

0.4 0.5

0.2 0

0 -0.5

-1

-1.5 m

-0.2

-0.1 m

Figure 2. free surface elevation ξ resulting from a reflection (a) on rectangular structures perpendicular to the coastline, (b) on optimized structures with no feasibility constraints

3.2 Minimization of the Sediment Mobilization Energy in a Coastal Zone The objective is to prevent the erosion phenomenon in the region of the lido of S`ete (NW Mediterranean sea, France) by minimizing the sediment mobilization energy, for a set of given periods, directions and heights for the water waves with the help of a refraction-diffraction model [8,9]. In few words, it is important to note that the incident water waves can be divided in two categories, the destructive waters waves and the advantageous water waves. The cost function considered has the ambition to reduce the energy for the destructive water waves and, in addition, to be transparent with the advantageous water waves.

Progress in Global Optimization and Shape Design

311

The solution proposed is the use of geotextile tubes attenuator devices for the water waves. Initially, a preliminary draft proposes to put geotextile tubes, with an height of 3m, at a distance of 550m compared to the coast. We have the possibility to optimize the distance, the shape, the angle with the coast, the height of the geotextile tubes but in this paper, in order to show that global optimization is of great interest in coastal engineering, we fix all the dimensioning quantities and we only optimize the distance compared to the coast. First, we sample the offshore distance between 150m et 750m and we compute the value of J for a geotube disposed in each value of the sampling. We expose the results in Fig. 3-Left.

0.65

0.65

0.6

0.6 value of J

Cost function value J

Best convergence History of convergence

0.55

0.5

0.45 100

Space control parameters

200 300 400 500 600 700 Distance Coastline/Geotextile (m)

0.55 0.5

800

0.45 0

10

40 30 20 number of evaluation of J

50

Figure 3. (Left) Cost function value with respect to the position of the geotube. The admissible domain for the geotubes is 350 − 800m. (Right) Cost function evolution during the optimization. We see the importance of using global minimization

We see clearly that the minimum is obtain for a geotube placed with a offshore distance of 350m. But what we would like to stress is that the cost function is obviously non-convex and this bring us to think that the use of a global optimization algorithm is necessary, in order to avoid the optimization process to be catch in local attraction basin. So, we apply the global algorithm described in the first section with a starting point corresponding to the initial position (550m). The optimization process recover the optimal case seen by the sampling. More precisely, we obtain an optimal position of 353m. We expose the cost function evolution in the Fig. 3-Right.

312

D. Isebe et al.

4 Conclusions A new class of Semi-Deterministic methods has been introduced. This approach allow us to improve both deterministic and non-deterministic optimization algorithms. Various algorithms included in former class have been validated on various benchmark functions. Obtained results over-perform those given by a classical genetic algorithm in term of computational complexity and precision. One of them have been applied with success to the design of a coastal structures. These algorithms have been applied to other various industrial optimization problems: Multichannel Optical Filters Design, Shape optimization of a fast-microfluidic protein folding device, flame temperature and pollutant control [10].

References 1. B. Mohammadi and J-H. Saiac. Pratique de la simulation num´ erique. Dunod, 2002. 2. H. Attouch and R. Cominetti. A dynamical approach to convex minimization coupling approximation with the steepest descent method. J. Differential Equations, 128(2):519–540, 1996. 3. B. Mohammadi and O. Pironneau. Applied Shape Optimization for Fluids. Oxford University Press, 2001. 4. A. Jameson, F. Austin, M. J. Rossi, W. Van Nostrand, and G. Knowles. Static shape control for adaptive wings. AIAA Journal, 32(9):1895–1901, 1994. 5. F. Verhulst. Nonlinear differential equations and dynamical systems. SpringerVerlag., 1990. 6. D. Colton and R. Kress. Inverse acoustic and electromagnetic scattering theory. Springer-Verlag, 1992. 7. D. Isebe, P. Azerad, B. Ivorra, B. Mohammadi, and F. Bouchette. Optimal shape design of coastal structures minimizing coastal erosion. In Proceedings of workshop on inverse problems, CIRM, Marseille, 2005. 8. J. T. Kirby and R. A. Dalrymple. A parabolic equation for the combined refraction diffraction of stokes waves by mildly varying topography. J. Fluid. Mechanics., 136:443–466, 1983. 9. J. T. Kirby and R. A. Dalrymple. Combined refraction/diffraction model ref/dif 1, User’s manual. Coastal and Offshore Engineering and Research, Inc., Newark, DE., January, 1985. (Revised June, 1986). 10. B. Ivorra, B. Mohammadi, D. E. Santiago, and J. G. Hertzog. Semi-deterministic and genetic algorithms for global optimization of microfluidic protein folding devices. International Journal of Numerical Method in Engineering, 66: 319–333, 2006.

EDF Scheduling Algorithm for Periodic Messages On Switched Ethernet Myung-Kyun Kim and Dao Manh Cuong School of Computer Engineering and Information Communication, University of Ulsan, Nam-Gu, 680749 Ulsan, Republic of Korea [email protected] Abstract The switched Ethernet offers many attractive features for real-time communications such as traffic isolation, providing large bandwidth, and full-duplex links, but the real-time features may be affected due to the collisions on an output port. This paper analyzes the schedulability condition for real-time periodic messages on a switched Ethernet where all nodes operate in a synchronized mode. This paper also proposes a EDF (Earliest Deadline First)-based scheduling algorithm to support the real-time features of the periodic traffic over switched Ethernet without any change in the principles of switched Ethernet. The proposed algorithm allows dynamic addition of new periodic messages during system running, which gives more flexibility to designing the real-time systems.

1 Introduction Switched Ethernet has been the most widely used in data communications. It has many attractive features for real-time communications such as traffic isolation, providing large bandwidth, and full-duplex links [1]. However, switches have some problems that have to be solved to support real-time features for industrial communications. The main disadvantage of switched Ethernet is the collision of messages at the output port. If two or more messages are destined to the same output port at the same time, it causes variable message delay in the output buffer or message loss in the case of buffer overflow, which affects the timely delivery of real-time messages. This paper analyzes the schedulability condition for real-time periodic messages on a switched Ethernet where the switch does not require any modification and only the end nodes handles messages according to the EDF policy. This paper also proposes a EDF-based scheduling algorithm to support the real-time features of the periodic traffic over the switched Ethernet. The transmission of the periodic messages is handled by a master node to guarantee the real-time communication. The master node determines which messages can be transmitted without violating the real-time requirements of industrial communication

314

M.-K. Kim and D.M. Cuong

by checking the schedulability condition, and makes a feasible schedule on demand. The rest of paper is organized as follow. In section 2, the previous works on industrial switched Ethernet and message transmission model on the switched Ethernet are described. Section 3 describes the schedulability condition for real-time periodic messages on the Switched Ethernet and an EDF-based scheduling algorithm for the real-time messages on the switched Ethernet is described in section 4. Section 5 concludes and summaries the paper.

2 Backgrounds 2.1 Related Works From the last few years, a large number of works have been done to analyze the applicability of switched Ethernet to industrial communication. The first idea was using Network Calculus theory for evaluating the real-time performance of switched Ethernet networks. Network Calculus was introduced by Cruz [2, 3] and described a theory for obtaining delay bounds and buffer requirements. George et al. [4] paid attention to the architectures of switched Ethernet networks and presented a method to design them which aimed to minimize end-to-end delays by using Network Calculus theory. This method may be effective when designing static applications to guarantee real-time performance of switched Ethernet networks. Another work has been done by Loser and Hartig [5], where they used the traffic shaper (smoother) to regulate the traffic entering switched Ethernet and to bound end-to-end delay based on Network Calculus theory refined by Le Boudec and Thiran [6]. In this method, the traffic pattern must satisfy the burstiness constraints to guarantee the delay bounds to be met. Another way to support real-time communication is modifying the original switches to have extra functionality to provide more efficient scheduling policy and admission control. Hoai Hoang et al. [7] attempted to support real-time communication of switched Ethernet by adding the real-time layer in both end nodes and a switch. Instead of using FIFO queuing, packets are queued in the order of deadline in the switch. A source node, before sending a real-time periodic message to the destination node, has to establish a real-time channel that is controlled by the switch for guaranteeing the deadline. The main contribution of this paper is proposing a schedulability condition for EDF scheduling of periodic messages and a EDF-based message scheduling algorithm to support hard real-time communication over switched Ethernet without any modification in original switches, so they can be directly applied to industrial communications.

EDF Scheduling Algorithm for Periodic Messages On Switched Ethernet

315

2.2 Message Transmission Model We assume that the switched Ethernet operates in full-suplex mode, where each node is connected to the switch through a pair of links: a transmission link and a reception link as shown in Fig. 1-(a). Both the transmission and reception links operate independently and the switches use store-and-forward or cut-through switching modes for forwarding frames from the input ports to the output ports. In the cut-through switching which is widely used nowadays for fast delivery of frames, the switch decides the output port right after receiving the header of a frame and forwards the frame to the output port. When a switch uses the cut-through switching mode, if a message from node i to node j begins to be transmitted at time t0 , the first bit of the message arrives at node j after TL = 2 ∗ tp + tsw amount of delay from t0 if there is no collision at the output port and the output queue is empty. This is shown in Fig. 1-(b). In the figure, tp is a propagation delay on a link between a node and the switch and tsw is the switching latency (destination port look-up and switch fabric set-up time) which depends on the switch vendor. Normally, tsw is about 11µs in a 100Mbps switch.

TLi TL1

TL2

RL2 RL1

TLk RLk

RLs TLs

Node 2 (a)

Node k

t

SMij SMij

tp

ts SMij

RLj Node 1

t

SMij

t0

t

TL=2*tp+ ts (b)

Figure 1. (a) Switched Ethernet and (b) message transmission on a switch

All the transmission links and reception links are slotted into a sequence of fundamental time units, called Elementary Cycles (ECs), which is similar to that of FTT-CAN [9]. Message transmission on the switched Ethernet is triggered by a master node which sends a TM (Trigger Message) at the beginning of every EC. An EC consists of a TM, a SMP (Synchronous Message Period) and a AMP (Asynchronous Message Period) as shown in Fig. 2-(a). But, in this paper, we only consider the scheduling of periodic messages, so for the simplicity of the analysis, we assume there is no AMP in each EC as shown in Fig. 2-(b). When using the EC in Fig. 2-(b), we have to increase the length of each periodic message by multiplying E/Eo for the correct analysis, because the length of the SMP is increased by multiplying E/Eo . The TM is used to synchronize all nodes and contains a schedule for the periodic messages that are transmitted on the respective EC. The SMP is the time duration for transmitting the real-time periodic messages in this EC, and the AMP is the time duration for transmitting aperiodic messages. In Fig. 2, LT M is the

316

M.-K. Kim and D.M. Cuong

length of trigger message and E ′ = E − TL − LT M is the available time for transmitting messages on an EC.

Figure 2. Message transmission model

The operation of real-time communication network can be described as follows. Firstly, all the slave nodes send the real-time requirements to the o master node that are characterized by SMij (Dij , Pij , Oij , Cij ) where SMij o is the synchronous message from node i to node j and Dij , Pij , Oij , Cij is the deadline, period, initial offset and the amount of data of SMij , respectively. As mentioned before, the amount of message SMij has to be modified for o the correct analysis as follows: Cij = Cij * E/Eo . In addition, we assume that all the Dij , Pij and Oij are the multiple integers of E and Pij = Dij . After receiving all the request frames from slaves, the master node checks the feasibility of the synchronous messages and broadcasts the result to all the slaves to indicate which messages can be transmitted over the switched Ethernet and meet their deadline. The real-time messages over the switched Ethernet are sorted in the increasing order of their deadline when they arrive at the slave nodes. Then at the beginning of every EC, the master node broadcasts the trigger message with scheduling information that announces a set of messages that can be transmitted at the respective EC.

3 Scheduability Condition on Switched Ethernet By applying EDF scheduling condition for preemptive tasks propsed by Liu and Layland [8], Pedreira et al [9] showed that a set of periodic messages is schedulable if  Ci E ′ − max(Ci ) (1) ≤ U= Pi E i where E ′ is the available time to transmit periodic messages on a shared medium. For real-time communication over switched Ethernet, the transmission of periodic messages must be considered both on transmission links and on reception links. In our proposed scheduling algorithm, we consider the periodic

EDF Scheduling Algorithm for Periodic Messages On Switched Ethernet

317

messages in the order of deadline. U Ti and U Rj are the utilization of T Li and RLj such as U Ti =

 Cij j

Pij

; j ∈ STi

U Rj =

 Cij i

Pij

; j ∈ SRj

(2)

where STi is a set of nodes to which node i sends the messages and SRj is a set of nodes from which node j receives the messages. U Tmax,j are the maximum utilization of a set of transmission links that transmit messages to node j, such that U Tmax,j = max(U Ti ) j ∈ SRj (3) The main idea of our scheduling algorithm is that the periodic messages which are transmitted at each EC of transmission link must be able to be delivered completely at respective EC on reception links, that means meet their deadline. According to the schedulability condition (2) for the periodic messages on a transmission link i, we can make a schedule for the periodic messages that satisfies the following condition: Ti,n ≤ U Ti ∗ E + max(Cij )

(4)

where Ti,n is the total time for transmitting messages on T Li in nth EC. We call Tmax,i = U Ti ∗ E + max(Cij ) is the maximum total time to transmit messages on T Li at every EC For hard real-time communication, all the periodic messages have to meet their deadline in the worst-case. In our message transmission model, the worstcase situation occured on RLj when, cumulatively, i) all the periodic messages arrive at a reception link at the same time, and ii) the arrival time of the first message on an EC on RLj is latest, which means there is a shortest time for transmitting messages on current EC.

Figure 3. Worst case situation on RLj

Fig. 3 shows the worst-case situation on RLj when the messages are transmitted over switched Ethernet. The figure also shows that by this scheduling

318

M.-K. Kim and D.M. Cuong

algorithm, the temporal constraints of periodic messages can be satisfied if we bound the total time to transmit message on reception link by Rmax,j and they are not affected by the FIFO queue of original switches. Now we can express the worst-case situation occurs on RLj when i) the utilization of all transmission links that are transmitting messages to RLj are equal U Ti = U Tmax,j ∀i ∈ SRj

(5)

Tmax,i = U Tmax,j ∗ E + max(Cij ) ∀i ∈ SRj

(6)

which leads to

th

ii) all the messages arrive at RLj on the n EC at the latest time. As shown in Fig. 3, the latest finishing time of periodic messages on RLj is Tmax,i (∀i ∈ SRj ). So the latest arrival time of messages on RLj is ATmax,j = Tmax,i − min(Cij ) = U Tmax,j ∗ E + max(Cij ) − min(Cij )

(7)

when the size of all messages SMi j(∀i ∈ SRj ) is smallest. If this worst-case situation happens, the available time to transmit messages on RLj is Rmax,j = E ′ − ATmax,j = E ′ − U Tmax,j ∗ E − max(Cij ) + min(Cij ).

(8)

Because our proposed scheduling algorithm considers the messages in the order of deadline, we can apply equation (2) to analyze the schedulability of messages on the reception links. Thus, a set of periodic messages on RLj is scheduable if Rmax,j − max(Cij ) E by (10), we have U Rj ≤

If we replace Rmax,j U Rj ≤

E ′ − U Tmax,j ∗ E − max(Cij ) + min(Cij ) − max(Cij ) E

(9)

(10)

which leads to E ′ − 2 ∗ max(Cij ) + min(Cij ) (11) E Finally, we have the schedulability condition as follows. A given set of messages SMij from node i to node j are scheduable if U Rj + U Tmax,j ≤

U Ti + U Rj ≤

E ′ − 2 ∗ max(Cij ) + min(Cij ) ∀i, j E

(12)

EDF Scheduling Algorithm for Periodic Messages On Switched Ethernet

319

Initialization

EC_Trigger? yes

Broadcast TM

no

SMP_Trigger?

no

no

AMP_Trigger?

yes

yes

Delay until next AMP

no

More PM_req message? yes

New msg added? yes

no

Perform Scheduling alg.

Receive PM_req Check schedulability yes

Schedulable?

Send PM_rep (accept)

no

Send PM_rep (reject)

Figure 4. Processing flow of master node

4 Message Scheduling Algorithm on Switched Ethernet The master node receives a new periodic message request from the slave nodes in AMP, and checks the schedulability condition and makes a transmission schedule for the message if it is schedulable. At the beginning of each EC, the master transmits a TM which includes the number of messages that each node can transmit in the EC. The processing flow of the master node is shown in Fig. 4. The operation of the master is triggered by three events EC T rigger, SM T rigger, and AM T rigger which are enabled at the start of EC, SMP, and AMP, respectively. A new periodic message request P M req includes the real-time requirements of the periodic message. In response to the request, the master replies by P M rep(accept) or P M rep(reject) according to the scehulability check result. The master node can also play the role of a slave node. In that case, the master node performs the operations shown in Fig. 5 during SMP and AMP, respectively. Each slave node maintains a message buffer which contains periodic message instances allowed to transmit according to message deadlines When a slave node receives a TM, it interprets the message and transmits the number of messages specified in the TM. When there is a new periodic message to transmit, each slave node sends a new message request to the master in AMP. The processing flow of a slave node is shown in Fig. 5. The proposed scheduling algorithm is described in Algorithm1. Initially, r carries the information of ready messages at the beginning of each EC. At line 9, the algorithm checks whether a ready message can be transmitted or not on current EC. If the condition is not satisfied, this message is delayed to

320

M.-K. Kim and D.M. Cuong

Figure 5. Processing flow of slave node

next EC. Finally, the master node considers the initial offset Oij of message SMij at line 15 when establishing the list of ready messages. // Algorithm1: EDF-based Scheduling Algorithm // N Ti,n : the number of messages that node i can transmit at nth EC 1. for (k = 1;k ≤ N ;k + +){rk,1 = 0;} 2. for(n = 0;n ≤ LCM (Pij );n + +){ 3. Ti,n = 0, N Ti,n = 0 for all i; Rj,n = 0 for all j; 4. {sort messages in increasing order of deadline}; 5. for (k = 1;k ≤ N ;k + +){ 6. rk,n+1 = rk,n ; 7. if (rk,n = 1) { 8. read SMij ; 9. if ((Ti,n + Cij ≤ Tmax,i ) and (Rj,n + Cij ) ≤ Rmax,j )) { 10. Ti,n = Ti,n + Cij ; 11. Rj,n = Rj,n + Cij ; 12. N Ti,n + +; rk,n+1 = 0; 13. } 14. } 15. if ((n-1) mod Pij /E = Oij ) rk,n+1 = 1; 16. } 17. }

Now we prove that when a new message SMij satisfies the condition (12) and is scheduled by this scheduling algorithm, this message can be transmitted completely within the same EC on both T Li and RLj , that means meet its

EDF Scheduling Algorithm for Periodic Messages On Switched Ethernet

321

deadline. The latest finishing time of SMij on T Li is Tmax,i , so the latest arrival time of SMij on RLj is ATmax,ij = Tmax,i − min(Cij ) = U Ti ∗ E + max(Cij ) − min(Cij )

(13)

when SMij has a smallest size. After arriving at RLj , in the worst-case situation, SMij may have to wait for other messages in the switch buffer before being transmitted. But the total delay of this message on RLj , according to scheduling algorithm, is bounded by Rmax,j = E ′ − U Tmax,j ∗ E − max(Cij ) + min(Cij )

(14)

so the finishing time Fij,n of SMij on nth (current) EC will be Fij,n ≤ ATmax,ij + Rmax,j

(15)

Fij,n ≤ E ′ − (U Tmax,j − U Ti )

(16)

Fij,n ≤ E ′

(17)

which leads to

Because U Ti ≤ U Tmax,j

so SMij is transmitted on the same EC on both T Li and RLj ,that means it meets its deadline.

5 Conclusions and Future Work Real-time distributed control systems have been more widely used in industrial applications like process control, factory automation, vehicles and so on. In those applications, each task must be executed within a specified deadline, and also the communications between the tasks have to be completed within their deadlines to satisfy the real-time requirements of the tasks. Switched Ethernet which is the most widely used in the office has good operational features for real-time communications. The switched Ethernet, however, needs some mechanisms to regulate the traffic on the network in order to satisfy the hard real-time communication requirements of the industrial applications. In this paper, the EDF-based scheduling algorithm for hard real-time communication over switched Ethernet was proposed. With this scheduling algorithm, there is no need to modify the original principles of switches to support hard real-time communication in industrial environment. This paper also analyzed the schedulability condition for real-time periodic messages and showed that the proposed scheduling algorithm reflects correctly the feasibility condition of the periodic messages on the switched Ethernet. With our assumption that the changes in synchronous requirements is carried on the aperiodic messages, we will analyze the real-time features of aperiodic message as well as the level of flexibility of the scheduling algorithm in the future.

322

M.-K. Kim and D.M. Cuong

Acknowledgment The authors would like to thank Ministry of Commerce, Industry and Energy and Ulsan Metropolitan City which partly supported this research through the Network-based Automation Research Center (NARC) at University of Ulsan.

References 1. K. Lee and S. Lee, Performance evaluation of switched Ethernet for real-time industrial communications, Comput. Stand. Interfaces, vol. 24, no. 5, pp. 411–23, Nov. 2002. 2. R. L. Cruz, A calculus for network delay Part I: Network elements in isolation, IEEE Trans. Inform. Theory, vol. 37, no. 1, pp. 114–131, Jan. 1991. 3. R. L. Cruz, A calculus for network delay Part II : Network analysis, IEEE Trans. Information Theory, vol. 37, no. 1, pp. 132–141, Jan. 1991. 4. J.-P. Georges, N. Krommenacker, T. Divoux, and E. Rondeau, A design process of switched Ethernet architectures according to real-time application constraints, Eng. Appl. of Artificial Intelligence, Volume 19, Issue 3, April 2006, pp 335–344 5. J. Loser and H. Hartig, Low-latency hard real-time communication over switched Ethernet, In Proc. 16th Euromicro Conf. Real-Time Systems, ECRTS 2004, pp. 13–22, July 2004. 6. J. Y. Le Boudec and P. Thiran, Network Calculus. Berlin, Germany: Springer Verlag, LNCS, July 2001, vol. 2050. 7. H. Hoang, M. Jonsson, U. Hagstrom, and A. Kallerdahl, Real-time Switched Ethernet with earliest deadline first scheduling protocols and traffic handling, In Proc 10th Int. Workshop on Parallel and Distributed Real-Time Systems, FL, Apr. 2002. 8. C. L. Liu and J. W. Layland, Scheduling algorithms for multiprogramming in a hard real-time environment, J. ACM, vol. 20, no. 1, pp. 46–51, 1973. 9. L. Almeida, P. Pedreiras, and J. A. Fonseca, The FTT-CAN protocol: Why and how, IEEE Trans. Ind. Electron., vol. 49, no. 6, pp. 1189–1201, Dec. 2002.

Large-Scale Nonlinear Programming for Multi-scenario Optimization Carl D. Laird∗ and Lorenz T. Biegler Chemical Engineering Department, Carnegie Mellon University, Pittsburgh, PA 15213 [email protected]

Abstract Multi-scenario optimization is a convenient way to formulate design optimization problems that are tolerant to disturbances and model uncertainties and/or need to operate under a variety of different conditions. Moreover, this problem class is often an essential tool to deal with semi-infinite problems. Here we adapt the IPOPT barrier nonlinear programming algorithm to provide efficient parallel solution of multi-scenario problems. The recently developed object oriented software, IPOPT 3.1, has been specifically designed to allow specialized linear algebra in order to exploit problem specific structure. Here, we discuss the high level design principles of IPOPT 3.1 and develop a parallel Schur complement decomposition approach for large-scale multi-scenario optimization problems. A large-scale example for contaminant source inversion in municipal water distribution systems is used to demonstrate the effectiveness of this approach, and parallel results with up to 32 processors are shown for an optimization problem with over a million variables.

1 Introduction This study deals with development of specialized nonlinear programming algorithms for large, structured systems. Such problems often arise in the optimal design, planning and control of systems described by nonlinear models, and where the model structure is repeated through the discretization of time, space or uncertainty distributions. We focus on the important application of optimal design with unknown information. Here the goal is to incorporate the effects of variable process inputs and uncertain parameters at the design stage. For general nonlinear models, many approaches rely on the solution of multiscenario problems. To develop the multi-scenario formulation, we consider optimal design problems with two types of unknown information [13, 15]. First, we consider uncertainty, i.e., what is not known well. This includes model parameters (kinetic and transport coefficients, etc.) as well as unmeasured ∗

Carl D. Laird ([email protected]) is currently with the Department of Chemical Engineering, Texas A&M University, College Station, TX 77843-3122.

324

C.D. Laird and L.T. Biegler

and unobservable disturbances (e.g., ambient conditions). Second, we could also incorporate process variability (process parameters that are subject to change, but measurable at run time) which can be compensated by operating variables. Examples of variability include feed flow rates, changing process conditions, product demands and measured process disturbances. While nonlinear multi-scenario optimization formulations can be solved directly with general purpose NLP solvers, the problem size can easily become intractable with these off-the-shelf tools. Traditionally, large-scale structured optimization problems like this one have been handled by specialized problem level decomposition algorithms. In contrast, this study develops the concept of internal linear decomposition for a particular NLP algorithm, IPOPT. The dominant computational expense in this algorithm is the solution of a large linear system at each iteration. With the internal linear decomposition approach, the fundamental interior point algorithm is not altered, but the mathematical operations performed by the algorithm are made aware of the problem structure. Therefore, we can develop decomposition approaches for these mathematical operations that exploit the induced problem structure, preserving the desirable convergence properties of the overall NLP algorithm. Similar concepts have also been advanced by Gondzio and coworkers [9, 10], primarily for linear, quadratic, and convex programming problems. In this work, we exploit the structure of large multi-scenario problems with a parallel Schur complement decomposition strategy and efficiently solve the problems on a distributed cluster. In the next section we provide a general statement of the optimal design problem with uncertainty and variability, along with the derivation of multi-scenario formulations to deal with this task. Section 3 then reviews Newton-based barrier methods and discusses their adaptation to multiscenario problems. Section 4 presents the high level design of the IPOPT 3.1 software package and describes how the design enables development of internal linear decomposition approaches without changes to the fundamental algorithm code. The parallel Schur complement decomposition is implemented within this framework. This approach is demonstrated in Section 5 for a source inversion problem in municipal water networks with uncertain demands. Scaling results are shown for a parallel implementation of the multiscenario algorithm with up to 32 processors. Section 6 then concludes the paper and discusses areas for future work.

2 Optimization Formulations Optimal design with unknown information can be posed as a stochastic optimization problem that includes varying process parameters and uncertain model parameters. For generality, we write the optimization problem with unknown input parameters as:

NLP Algorithm for Multi-scenario Optimization

min

d,z,y

325

Eθ∈Θ [P (d, z, y, θv , θu ) (1)

s.t. h(d , z , y, θv , θu ) = 0 , g(d, z, y, θv , θu ) ≤ 0]

where Eθ∈Θ is the expected value operator and Θ = Θv ∪ Θu is the space of unknown parameters, with Θv containing the “variability” parameters that are measurable at run time and Θu containing the “uncertainty” parameters. In addition, d ∈ Rnd are the design variables, z ∈ Rnz are the control variables, y ∈ Rny are the state variables, and models are represented by the inequality and equality constraints, g : Rnd +nz +ny → Rm and h : Rnd +nz +ny → Rny , respectively. To develop the multi-scenario formulation, we select discrete points from Θ. We define index set K for discrete values of the varying process parameters θkv ∈ Θv , k ∈ K, and define index set I for discrete values of the uncertain model parameters, θiu ∈ Θu , i ∈ I. Both sets of points can be chosen by performing a quadrature for the expectation operator in (1). We assume that the control variables z can be used to compensate for the measured process variability, θv , but not the uncertainty associated with model parameters, θu . Thus, the control variables are indexed over k in the multi-scenario design problem, while the state variables, y, determined by the equality constraints, are indexed over i and k. With these assumptions, the multi-scenario design problem is given by:  ωik fik (d, zk , yik , θkv , θiu ) min P = f0 (d) + d,z,y

s.t.

k∈K i∈I ) hik (d, zk , yik , θkv , θiu ) = 0, k gik (d, zk , yik , θkv , θiu ) ≤ 0

∈ K, i ∈ I

(2)

While d⋆ from the solution vector of (2) may provide a reasonable approximation to the solution of (1), additional tests are usually required to ensure constraint feasibility for all θ ∈ Θ. These feasibility tests require the global solution of nested optimization problems and remain challenging for general problem classes. Moreover, they may themselves require the efficient solution of multi-scenario problems [13, 14]. Detailed discussion of feasibility tests is beyond the scope of this study and the reader is directed to [5] for further information.

3 NLP Solution Algorithm In principle, multi-scenario problems, such as (2), can be solved directly with general purpose NLP algorithms, but specialized solution approaches are necessary for large problems. SQP-based strategies have been developed that exploit the structure of these problems [4, 16]. In this study, we present an improved multi-scenario strategy based on a recently developed, primal-dual barrier NLP method called IPOPT.

326

C.D. Laird and L.T. Biegler

IPOPT [17] applies a Newton strategy to the optimality conditions that result from a primal-dual barrier problem, and adopts a novel filter based line search strategy. Under mild assumptions, the IPOPT algorithm has global and superlinear convergence properties. Originally developed in FORTRAN, the IPOPT algorithm was recently redesigned to allow for structure dependent specialization of the fundamental linear algebra operations. This new package is implemented in C++ and is freely available through the COIN-OR foundation from the following website: http://projects.coin-or.org/Ipopt. The key step in the IPOPT algorithm is the solution of linear systems derived from the linearization of the first order optimality conditions (in primal-dual form) of a barrier subproblem. More information on the algorithm and analysis is given in [17]. Here, we derive the structured form of these linear systems for the multi-scenario optimization problem and present a specialized decomposition for their solution. To simplify the derivation, we consider problem (2) with only a single set of discrete values for the uncertain parameters, replacing i ∈ I and k ∈ K with the indices q ∈ Q. This simplification forces a discretization of z over both of the sets I and K instead of K alone, and constraints can be added to enforce equality of the z variables across each index of I. A more efficient nested decomposition approach that recognizes the structure resulting from the two types of unknown information can be formulated and will be the subject of future work. By adding appropriate slack variables sq ≥0 to the inequalities, and by defining linking variables and equations in each of the individual scenarios, we write a generalized form of the multi-scenario problem as:  fq (xq ) min xq ,d

s.t.

q∈Q

cq (xq ) = 0, Sq xq ≥ 0, ¯qd = 0 Dq xq − D

⎫ ⎬ ⎭

q∈Q

(3)

¯ q and Sq matrices extract suitwhere xTq = [zqT yqT sTq dTq ] and the Dq , D able components of the xq vector to deal with linking variables and variable bounds, respectively. If all of the scenarios have the same structures, we can ¯ q = I (where | · | indicates the set S1 = · · · =S|Q| , Dq = [0 | 0 | 0 | I] and D cardinality of the set). On the other hand, indexing these matrices also allows ¯ q may not even scenarios with heterogeneous structures to be used where D be square. Using a barrier formulation, this problem can be converted to:   {fq (xq ) − µ ln[(Sq xq )(j) ]} min xq ,d

s.t.

q∈Q

cq (xq ) = 0, ¯qd = 0 Dq xq − D

j

)

q∈Q

(4)

NLP Algorithm for Multi-scenario Optimization

327

where indices j correspond to scalar elements of the vector (Sq xq ). Defining the Lagrange function of the barrier problem (4), L(x, λ, σ, d) = =



q∈Q



q∈Q

L¯q (xq , λq , σq , d)

{Lq (xq , λq , σq , d) − µ

=



q∈Q

{fq (xq )−µ





ln[(Sq xq )(j) ]}

j

(j)

ln[(Sq xq )

j

(5) 1

2 ¯ q d T σq } ] + cq (xq ) λq + Dq xq −D T

with the multipliers λq and σq . Defining Gq =diag(Sq xq ) leads to the primal dual form of the first order optimality conditions for this equality constrained problem, written as: ⎫ ∇xq fq (xq ) + ∇xq cq (xq )λq + DqT σq − SqT νq = 0 ⎪ ⎪ ⎬ cq (xq ) = 0 q∈Q (6) ¯ Dq xq − Dq d = 0 ⎪ ⎪ ⎭ Gq νq − µe = 0  ¯ qT σq = 0 − D q∈Q

where we define eT = [1, 1, . . . , 1]. Writing the Newton step for (6) at iteration ℓ leads to: ⎫ ∇xq xq Lℓq ∆xq + ∇xq cℓq ∆λq + DqT ∆σq − SqT ∆νq = −(∇xq Lℓq − SqT νqℓ ) ⎪ ⎪ ⎬ ∇xq cℓq ∆xq = −cℓq q ∈ Q (7) ℓ ℓ ¯ ¯ Dq ∆xq − Dq ∆d = −Dq xq + Dq d ⎪ ⎪ ⎭ ℓ ℓ Vq Sq ∆xq + Gq ∆νq = µe − Gq νq  T  T ℓ ¯ q ∆σq = ¯ q σq − D D q∈Q

q∈Q

where the superscript ℓ indicates that the quantity is evaluated at the point (xℓq , λℓq , σqℓ , νqℓ , dℓ ). Eliminating ∆νq from the resulting linear equation gives the primal-dual augmented system ⎫ Hqℓ ∆xq + ∇xq cℓq ∆λq + DqT ∆σq = − ∇xq L¯ℓq ⎬ ∇xq cℓq ∆xq = −cℓq q∈Q (8) ¯ q dℓ ⎭ ¯ q ∆d = −Dq xℓq + D Dq ∆xq − D   ¯ qT σqℓ ¯ qT ∆σq = D D − q∈Q

q∈Q

where Hqℓ = ∇xq xq Lℓq +SqT (Gℓq )−1 Vqℓ Sq , and Vq = diag(νq ). According to the IPOPT algorithm [17], the linear system (8) is modified as necessary by adding diagonal terms. Diagonal elements are added to the block Hessian terms in the augmented system to handle nonconvexities and to the lower right corner in each block to handle temporary dependencies in the constraints. The linear system (8), with these modifications, can be written with a block bordered diagonal (arrowhead) structure given by:

328

C.D. Laird and L.T. Biegler

$$
\begin{bmatrix}
W_1 & & & & A_1 \\
& W_2 & & & A_2 \\
& & \ddots & & \vdots \\
& & & W_N & A_N \\
A_1^T & A_2^T & \cdots & A_N^T & \delta_1 I
\end{bmatrix}
\begin{bmatrix}
u_1 \\ u_2 \\ \vdots \\ u_N \\ \Delta d
\end{bmatrix}
=
\begin{bmatrix}
r_1 \\ r_2 \\ \vdots \\ r_N \\ r_d
\end{bmatrix}
\tag{9}
$$

where r_q^T = −[(∇_{x_q} L_q^ℓ)^T, (c_q^ℓ)^T, (D_q x_q^ℓ − D̄_q d^ℓ)^T], u_q^T = [∆x_q^T  ∆λ_q^T  ∆σ_q^T], A_q^T = [ 0  0  −D̄_q ],

$$
W_q = \begin{bmatrix}
H_q^{\ell} + \delta_1 I & \nabla_{x_q} c_q^{\ell} & D_q^T \\
(\nabla_{x_q} c_q^{\ell})^T & -\delta_2 I & 0 \\
D_q & 0 & -\delta_2 I
\end{bmatrix}
$$

for q ∈ Q, and r_d = Σ_{q∈Q} D̄_q^T σ_q^ℓ.

The IPOPT algorithm requires the solution of the augmented system (9) at each iteration. IPOPT also requires the inertia (the number of positive and negative eigenvalues), limiting the available linear solvers that can be used efficiently. Conceptually, (9) can be solved with any serial direct linear solver configured with IPOPT. However, as the problem size grows the time and memory requirements can make this approach intractable. Instead, applying a Schur complement decomposition allows an efficient parallel solution technique. Eliminating each W_q from (9) we get the following expression for ∆d,

$$
\Big[\delta_1 I - \sum_{q\in Q} A_q^T (W_q)^{-1} A_q\Big] \Delta d
= r_d - \sum_{q\in Q} A_q^T (W_q)^{-1} r_q
\tag{10}
$$

which requires forming the Schur complement, B = δ_1 I − Σ_{q∈Q} A_q^T (W_q)^{-1} A_q, and solving this dense symmetric linear system for ∆d. Once a value for ∆d is known, the remaining variables can be found by solving the following system,

$$
W_q u_q = r_q - A_q \Delta d
\tag{11}
$$

for each q ∈ Q. This approach is formally described by the following algorithm.

1. Form the Schur complement, B = δ_1 I − Σ_{q∈Q} A_q^T (W_q)^{-1} A_q:

   initialize B = δ_1 I
   for each period q ∈ Q
      for each column j = 1..M of A_q
         solve the linear system W_q p_j = A_q^{(j)} for p_j (A_q^{(j)} is column j of A_q)
         add to column j of the Schur complement, B = B − A_q^T p_j

2. Solve the Schur complement system, B ∆d = r_d − Σ_{q∈Q} A_q^T (W_q)^{-1} r_q, for ∆d:

   initialize r̄ = r_d
   for each period q ∈ Q
      solve the linear system W_q p = r_q for p
      add to the right hand side, r̄ = r̄ − A_q^T p
   solve the dense linear system B ∆d = r̄ for ∆d

3. Solve for the remaining variables:

   for each period q ∈ Q
      solve the linear system W_q u_q = r_q − A_q ∆d for u_q
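To make the three steps concrete, here is a minimal serial sketch in Python/NumPy, under the assumption that the blocks W_q, A_q, r_q and the vector r_d are available as small dense arrays; it only illustrates the algorithm above and is not the authors' parallel implementation (which factorizes the sparse W_q blocks and distributes the scenarios over processors).

    import numpy as np

    def schur_solve(W_blocks, A_blocks, r_blocks, r_d, delta1):
        # Steps 1 and 2 (accumulation): B = delta1*I - sum_q A_q^T W_q^{-1} A_q,
        #                               rbar = r_d - sum_q A_q^T W_q^{-1} r_q
        M = A_blocks[0].shape[1]                    # number of common (design) variables
        B = delta1*np.eye(M)
        rbar = np.array(r_d, dtype=float)
        for Wq, Aq, rq in zip(W_blocks, A_blocks, r_blocks):
            WinvA = np.linalg.solve(Wq, Aq)         # solves W_q p_j = A_q^{(j)}, column by column
            B -= Aq.T @ WinvA
            rbar -= Aq.T @ np.linalg.solve(Wq, rq)  # one more solve with W_q per scenario
        dd = np.linalg.solve(B, rbar)               # step 2: dense solve for Delta d
        # Step 3: recover the per-scenario variables
        u_blocks = [np.linalg.solve(Wq, rq - Aq @ dd)
                    for Wq, Aq, rq in zip(W_blocks, A_blocks, r_blocks)]
        return dd, u_blocks

In the parallel setting described below, each processor would hold only its own (W_q, A_q, r_q) triples and the two accumulations would become reductions across processors.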

This decomposition strategy applies specifically to the solution of the augmented system within the overall IPOPT algorithm and simply replaces the default linear solver. The sequence of steps in the overall IPOPT algorithm is not altered, and as such, this specialized strategy inherits all of the convergence properties of the IPOPT algorithm for general purpose nonlinear programs [17]. As discussed in [17], the method is globally and superlinearly convergent under mild assumptions. Furthermore, this decomposition strategy is straightforward to parallelize with excellent scaling properties. Let N = |Q|, the number of scenarios, and M = dim(d), the number of common or design variables in the problem. The number of linear solves of the W_q blocks required by the decomposition approach is N·M + 2N. If the number of available processors in a distributed cluster is equal to N (one processor for each scenario), then the number of linear solves required by each processor is only M + 2, independent of the number of scenarios. This implies an approach that scales well with the number of scenarios. As we increase the number of scenarios under consideration, the cost of the linear solve remains fairly constant (with minimal communication overhead) as long as an additional processor is available for each new scenario. More importantly, the memory required on each processor is also nearly constant, allowing us to expand the number of scenarios and, using a large distributed cluster, move beyond the memory limitation of a standard single processor machine. The efficient use of a distributed cluster to solve large problems that were previously not possible with a single standard machine is a major driving force of this work.
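As a back-of-the-envelope illustration using the problem sizes reported later in Section 5 (assumed here purely for the example: M = 600 common variables, N = 32 scenarios), the counts above work out as follows.

    # Linear solves with the W_q blocks: N*M + 2N in total, M + 2 per processor
    # when one processor is assigned to each scenario (sizes from Section 5).
    N, M = 32, 600
    total_solves = N*M + 2*N      # 19264 block solves overall
    per_processor = M + 2         # 602 block solves on each of the N processors
    print(total_solves, per_processor)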

4 Implementation of Internal Linear Decomposition

The Schur complement algorithm described above is well-known. Nevertheless, the implementation of this linear decomposition in most existing NLP software requires a nontrivial modification of the code. In many numerical codes, the particular data structures used for storing vectors and matrices are exposed to the fundamental algorithm code. With this design it is straightforward to perform any necessary mathematical operations efficiently within the


algorithm code. However, changing the underlying data representation (e.g. storing a vector in block form across a distributed cluster instead of storing it as a dense array) requires that the algorithm code be altered every place it has access to the individual elements of these vectors or matrices. Developing algorithms that exploit problem specific structure through internal linear decomposition requires the use of efficient (and possibly distributed) data structures that inherently represent the structure of the problem. It also requires the implementation of mathematical operations that can efficiently exploit this structure. If the fundamental algorithm code is intimately aware of the underlying data representation (primarily of vectors and matrices) then altering that representation for a particular problem structure can require a significant refactoring of the code. In the recent redesign of IPOPT, special care was taken to separate the fundamental algorithm code from the underlying data representations. The high level structure of IPOPT is described in Figure 1. The fundamental algorithm code communicates with the problem specification through a well-defined NLP interface. Moreover, the fundamental algorithm code is never allowed access to individual elements in vectors or matrices and is purposely unaware of the underlying data structures within these objects. It can only perform operations on these objects through various linear algebra interfaces. While the algorithm is independent of the underlying data structure, the NLP implementation needs to have access to the internal representation so it can fill the necessary data (e.g. specify the values of Jacobian entries). Therefore, the NLP implementation is aware of the particular linear algebra implementation, but only returns interface pointers to the fundamental algorithm code. The IPOPT package comes with a default linear algebra representation (using dense arrays for vectors and a sparse structure for matrices) and a default set of NLP interfaces. However, this design allows the data representations and mathematical operations to be modified for a particular problem structure without changes to the fundamental algorithm code. Similar ideas have also been used in the design of reduced-space SQP codes, particularly for problems constrained by partial differential equations [1–3]. In this work, we tested the redesigned IPOPT framework by implementing the Schur complement decomposition approach for the multi-scenario design problem. This implementation makes use of the Message Passing Interface (MPI) to allow parallel execution on a distributed cluster. The implementation uses the composite design pattern and implements a composite NLP that forms the overall multi-scenario problem by combining individual specifications for each scenario. This implementation has also been interfaced to AMPL [8], allowing the entire problem to be specified using individual AMPL models for each scenario and AMPL suffixes to describe the connectivity (implicitly defining the D_q, D̄_q, and S_q matrices). This allows the formulation of large multi-scenario problems with relative ease. Furthermore, when solving the problem in parallel, each processor only evaluates functions for its own


Figure 1. Redesigned IPOPT structure, allowing for specialized linear algebra

scenarios, allowing distribution of data and parallelization of these computations across processors. Parallel implementations for vectors and matrices have also been developed that distribute individual blocks across processors. All the necessary vector operations (e.g. BLAS operations, etc.) have been implemented for efficient calculation in parallel. Finally, a distributed solver has been written for the augmented system that uses a parallel version of the algorithm described in the previous section. This distributed solver uses a separate linear solver instance for the solution of each of the Wq blocks (and can use any of the linear solvers already interfaced with IPOPT). This separation allows solution of heterogeneous multi-scenario problems where the individual scenarios may have different structures. Finally, the distributed solver calls LAPACK routines for the dense linear solve of the Schur complement.
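The following is a small illustrative sketch (in Python, not the actual C++ classes of IPOPT or rSQP++) of the design idea just described: a composite vector keeps its per-scenario blocks private and exposes only high-level linear algebra operations, so the algorithm code never touches individual elements and the storage can later be distributed without changing that code.

    import numpy as np

    class CompositeVector:
        """Illustrative composite vector: one block per scenario."""
        def __init__(self, blocks):
            self.blocks = [np.asarray(b, dtype=float) for b in blocks]

        def dot(self, other):
            # the algorithm only sees a scalar; the block storage stays hidden
            return sum(a @ b for a, b in zip(self.blocks, other.blocks))

        def axpy(self, alpha, other):
            # self := self + alpha*other, carried out block by block
            for a, b in zip(self.blocks, other.blocks):
                a += alpha*b

    # In a distributed variant, each processor would own a subset of the blocks
    # and dot() would finish with a reduction (e.g. an MPI Allreduce) over processors.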

5 Source Detection Application and Results

To illustrate these concepts on a large-scale application, we consider the determination of contamination sources in large municipal water distribution systems. Models for these can be represented by a network where edges represent pipes, pumps, or valves, and nodes represent junctions, tanks, or reservoirs. Assuming that contaminant can be injected at any network node, the goal of this source inversion approach is to use information from a limited number of sensors to calculate injection times and locations. Identifying the contamination source enables security or utilities personnel to stop the contamination and propose effective containment and cleanup strategies. This problem can be formulated as a dynamic optimization problem which seeks to find the unknown injection profiles (at every network node) that


minimize the least-squares error between the measured and calculated network concentrations. For the water quality model, the pipes are modeled with partial differential equations in time and space, and the network nodes are modeled with dynamic mass balances that assume complete mixing. This produces an infinite dimensional optimization problem that can be discretized to form a large-scale algebraic problem. A naive approach requires a discretization in both time and space due to the constraints from the pipe model. Instead, following the common assumption of plug flow in the main distribution lines, we developed an origin tracking algorithm which uses the network flow values to precalculate the time delays across each of the network pipes. This leads to a set of algebraic expressions that describe the time-dependent relationship between the ends of each of the pipes. The discretized dynamic optimization problem was solved using IPOPT [17]. This large scale problem had over 210000 variables and 45000 degrees of freedom. Nevertheless, solutions were possible in under 2 CPU minutes on a 1.8 GHz Pentium 4 machine and the approach was very successful at determining the contamination source [11]. Furthermore, the approach has been extended to address very large networks by formulating the problem on a small window or subdomain of the entire network [12]. In previous work, we assumed that the network flows were known. In this work, we assume that the network flows are not known exactly, but can only be loosely characterized based on a demand generation model. Following the approach of Buchberger and coworkers, we use the PRPSym software [6, 7] to generate reasonable residential water demands. Varying parameters in the Poisson Rectangular Pulse (PRP) model allows us to generate numerous scenarios for the water demands. To account for uncertain water demands in the source inversion, we formulate a multi-scenario problem over a set of possible flow scenarios as,

$$
\min_{p_q, c_q, m_q, \bar{m}} \;\; \sum_{q\in Q} \Big\{ [c_q - c^{\star}]^T \Omega_q [c_q - c^{\star}] + \frac{\rho}{|Q|}\, m_q^T m_q \Big\}
\quad \text{s.t.} \quad
\varphi_q(p_q, c_q, m_q) = 0, \quad m_q \ge 0, \quad m_q - \bar{m} = 0, \quad q\in Q
\tag{12}
$$

where q ∈ Q represents the set of possible realizations or scenarios, c⋆ are the measured concentrations at the sensor nodes (a subset of the entire domain), and c_q and m_q are the vectors of calculated concentrations and injection variables for scenario q at each of the nodes (both discretized over all nodes in the subdomain and all timesteps). The vector of pipe inlet and outlet concentrations for scenario q is given by p_q and is discretized over all pipes in the subdomain and all timesteps. The weighting matrix Ω_q and the discretized model equations ϕ_q(p_q, c_q, m_q) are discussed in more detail in previous publications [11, 12]. The variables m̄ are the aggregated injection profiles that are common across each of the scenarios.
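For illustration only, one scenario's contribution to the objective in (12) could be evaluated as in the following sketch with dense NumPy arrays; the names and data layout are assumptions made for the example and are not taken from the authors' implementation.

    import numpy as np

    def scenario_term(c_q, c_star, Omega_q, m_q, rho, n_scenarios):
        # weighted least-squares mismatch at the sensor nodes:
        #   [c_q - c*]^T Omega_q [c_q - c*]
        resid = c_q - c_star
        mismatch = resid @ Omega_q @ resid
        # regularization of the scenario injection variables: (rho/|Q|) m_q^T m_q
        regularization = (rho/n_scenarios)*(m_q @ m_q)
        return mismatch + regularization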


To demonstrate the multi-scenario decomposition approach on this problem, we selected a real municipal network model with 400 nodes and 50 randomly placed sensors. We then restricted ourselves to a 100 node subdomain of this model. Three variable parameters in the PRP model were assumed to be unknown, but with reasonable bounds shown in Table 1.

   Uncertain Parameter            Lower Bound   Upper Bound
   Avg. Demand Duration (min)     0.5           2.0
   Avg. Demand Intensity (gpm)    1.0           4.0
   Total Base Demand (gpm)        500           700

   Table 1. Range of parameters used in the PRP model in order to generate reasonable flow patterns for the distribution system. Different scenarios were formulated by randomly selecting values assuming uniform distributions. The total base demand is the sum of the base demands over all the nodes in the network.

Assuming uniform

distributions for the PRP parameters, random values were used to produce the “true” water demands and 32 different possible realizations of the uncertain water demands. Using the “true” demands, a pulse contamination injection was simulated from one of the nodes in the network to generate time profiles for the concentration measurements. Hydraulic simulations were performed for each of the 32 realizations to calculate the network flows. We formulate the optimization problem with a 6 hour time horizon and 5 minute integration timesteps. This generates an individual scenario with 36168 variables. Across scenarios, we aggregate the common variables and discretize m ¯ over 1 hour timesteps. This gives 600 common variables, one for each node across each of the 6 aggregate time discretizations. We then test the scalability of our parallel decomposition implementation by formulating multi-scenario optimization problems with 2 to 32 scenarios and solve the problems using 16 nodes of our in-house Beowulf cluster. Each node has 1.5 GB of RAM and dual 1 GHz processors. Timing results for these runs are shown in Figure 2 where the number of scenarios is shown on the abscissa, and the wall clock time (in seconds) is shown on the left ordinate. It is important to note that for each additional scenario, an additional processor was used (i.e. 2 processors were used for the 2 scenario formulation and 16 for the 16 scenario formulation, etc). The right ordinate shows the total number of variables in the problem. The timing results for 2 to 16 processors show nearly perfect scaleup where the additional time required as we add scenarios (and processors) is minimal (the same is true from 17 to 32 scenarios). Tests were performed on a 16 node cluster and the jump in time as we switch from 16 to 17 scenarios corresponds to the point where both processors on a single machine were first utilized. When two processors on the same machine are each forming their own local contribution to the Schur complement, the process appears


Figure 2. Timing Results for the Multi-Scenario Problem with 600 Common Variables: This figure shows the scalability results of the parallel interior-point implementation on the multi-scenario problem. The number of processors used was equal to the number of scenarios in the formulation. The total number of variables in the problem is shown with the secondary axis

to be limited by the memory bandwidth of the dual processor machine. This observation, coupled with the scaleup results in Figure 2, demonstrates that the approach scales well and remains effective as we increase the number of nodes and scenarios. Furthermore, it implies that the distributed cluster model is appropriate for this problem and that the scaling of communication overhead is quite reasonable.

6 Conclusions

This study deals with the formulation and efficient solution of multi-scenario optimization problems that often arise in the optimal design of systems with unknown information. Discretizing the uncertainty sets leads to large multi-scenario optimization problems, often with few common variables. For the solution of these problems we consider the barrier NLP algorithm, IPOPT, and have developed an efficient parallel Schur complement approach that exploits the block bordered structure of the KKT matrix. The formulation and implementation are demonstrated on a large-scale multi-scenario problem with over 30000 variables in each block and 600 common variables linking the blocks. Testing up to 32 scenarios, we observe nearly perfect scaleup with additional scenarios using a distributed Beowulf cluster. Furthermore, this implementation is easily facilitated by the software structure of the redesigned IPOPT code, because of the separation of the fundamental algorithm code and the linear algebra code. The MPI implementation of the parallel Schur complement solver and the parallel vector and


matrix classes are possible without any changes to the fundamental algorithm code. Finally, this implementation has been interfaced with the AMPL [8] modeling language to allow straightforward specification of the multi-scenario problem. Individual scenarios can be specified as AMPL models of their own, with the connectivity described using AMPL suffixes. This easily allows the development of both homogeneous and heterogeneous problem formulations. The decomposition presented in this work was formulated using a single discretized set for the unknown parameters. A formulation which explicitly includes both forms of unknown information, uncertainty and variability, leads to a nested block bordered structure in the KKT system. Developing a recursive decomposition strategy for problems of this type will be the subject of future work. Also, while the motivation for this work was the multi-scenario optimization problem arising from design under uncertainty, other problems produce similar KKT systems. Large-scale nonlinear parameter estimation problems have a similar structure with an optimization of large models over many data sets where the unknown parameters are the common variables. This problem and others are excellent candidates for this solution approach.

Acknowledgments

Funding from the National Science Foundation (under grants ITR/AP0121667 and CTS-0438279) is gratefully acknowledged.

References

1. Bartlett, R. A. (2001). New Object-Oriented Approaches to Large-Scale Nonlinear Programming for Process Systems Engineering, Ph.D. Thesis, Chemical Engineering Department, Carnegie Mellon University.
2. Bartlett, R. A. and van Bloemen Waanders, B. G. (2002). A New Linear Algebra Interface for Efficient Development of Complex Algorithms Independent of Computer Architecture and Data Mapping, Technical Report, Sandia National Laboratories, Albuquerque, NM.
3. Bartlett, R. A. (2002). rSQP++, An Object-Oriented Framework for Reduced Space Successive Quadratic Programming, Technical Report, Sandia National Laboratories, Albuquerque, NM.
4. Bhatia, T. and Biegler, L. (1999). Multiperiod design and planning with interior point methods, Comp. Chem. Eng. 23(7): 919–932.
5. Biegler, L. T., Grossmann, I. E., and Westerberg, A. W. (1997). Systematic Methods of Chemical Process Design, Prentice-Hall, Upper Saddle River, NJ.
6. Buchberger, S. G. and Wells, G. J. (1996). Intensity, duration and frequency of residential water demands, Journal of Water Resources Planning and Management, ASCE, 122(1): 11–19.
7. Buchberger, S. G. and Wu, L. (1995). A model for instantaneous residential water demands, Journal of Hydraulic Engineering, ASCE, 121(3): 232–246.
8. Fourer, R., Gay, D. M., and Kernighan, B. W. (1992). AMPL: A Modeling Language for Mathematical Programming, Duxbury Press, Belmont, CA.


9. Gondzio, J. and Grothey, A. (2004). Exploiting Structure in Parallel Implementation of Interior Point Methods for Optimization, Technical Report MS-04-004, School of Mathematics, The University of Edinburgh.
10. Gondzio, J. and Grothey, A. (2006). Solving Nonlinear Financial Planning Problems with 10^9 Decision Variables on Massively Parallel Architectures, Technical Report MS-06-002, School of Mathematics, The University of Edinburgh.
11. Laird, C. D., Biegler, L. T., van Bloemen Waanders, B., and Bartlett, R. A. (2005). Time Dependent Contaminant Source Determination for Municipal Water Networks Using Large Scale Optimization, ASCE Journal of Water Resource Management and Planning, 131(2), p. 125.
12. Laird, C. D., Biegler, L. T. and van Bloemen Waanders, B. (2007). Real-time, Large Scale Optimization of Water Network Systems using a Subdomain Approach, in Real-Time PDE-Constrained Optimization, SIAM, Philadelphia.
13. Ostrovsky, G. M., Datskov, I. V., Achenie, L. E. K., and Volin, Yu. M. (2003). Process uncertainty: the case of insufficient process data at the operation stage, AIChE Journal, 49, 1216–1240.
14. Ostrovsky, G., Volin, Y. M. and Senyavin, N. M. (1997). An approach to solving a two-stage optimization problem under uncertainty, Comp. Chem. Eng. 21(3): 317.
15. Rooney, W. and Biegler, L. (2003). Optimal Process Design with Model Parameter Uncertainty and Process Variability, AIChE Journal, 49(2), 438.
16. Varvarezos, D., Biegler, L. and Grossmann, I. (1994). Multi-period design optimization with SQP decomposition, Comp. Chem. Eng. 18(7): 579–595.
17. Wächter, A. and Biegler, L. T. (2006). On the Implementation of an Interior Point Filter Line Search Algorithm for Large-Scale Nonlinear Programming, Mathematical Programming, 106(1), 25–57.

On the Efficiency of Python for High-Performance Computing: A Case Study Involving Stencil Updates for Partial Differential Equations

Hans Petter Langtangen (1,2) and Xing Cai (1,2)

1 Simula Research Laboratory, P.O. Box 134, N-1325 Lysaker, Norway, {hpl,xingca}@simula.no
2 Department of Informatics, University of Oslo, P.O. Box 1080, Blindern, N-0316 Oslo, Norway

Abstract

The purpose of this paper is to assess the loss of computational efficiency that may occur when scientific codes are written in the Python programming language instead of Fortran or C. Our test problems concern the application of a seven-point finite stencil for a three-dimensional, variable coefficient, Laplace operator. This type of computation appears in many codes solving partial differential equations, and the variable coefficient is a key ingredient to capture the arithmetic complexity of stencils arising in advanced multi-physics problems in heterogeneous media. Different implementations of the stencil operation are described: pure Python loops over Python arrays, Psyco-acceleration of pure Python loops, vectorized loops (via shifted slice expressions), inline C++ code (via Weave), and migration of stencil loops to Fortran 77 (via F2py) and C. The performance of these implementations is compared against codes written entirely in Fortran 77 and C. We observe that decent performance is obtained with vectorization or migration of loops to compiled code. Vectorized loops run between two and five times slower than the pure Fortran and C codes. Mixed-language implementations, Python-Fortran and Python-C, where only the loops are implemented in Fortran or C, run at the same speed as the pure Fortran and C codes. At present, there are three alternative (and to some extent competing) implementations of Numerical Python: numpy, numarray, and Numeric. Our tests uncover significant performance differences between these three alternatives. Numeric is fastest on scalar operations with array indexing, while numpy is fastest on vectorized operations with array slices. We also present parallel versions of the stencil operations, where the loops are migrated to C for efficiency, and where the message passing statements are written in Python, using the high-level pypar interface to MPI. For the current test problems, there is hardly any efficiency loss by doing the message passing in Python. Moreover, adopting the Python interface of MPI gives a more elegant parallel


implementation, both due to a simpler syntax of MPI calls and due to the efficient array slicing functionality that comes with Numerical Python.

1 Introduction

The Python language has received significant attention in the scientific computing community over the last few years. Many research projects have experienced the gain in using Python as a scripting language, i.e., to administer simulation, data analysis, and visualization tasks as well as archiving simulation results [5]. For this purpose, shell languages were used in the past, but Python serves as a much more powerful and advanced programming language with comprehensive standard libraries. For example, built-in heterogeneous lists and hash maps can conveniently hold data of various kinds; the strong support for text manipulation comes in handy for interpreting data formats; and graphical or web-based user interfaces are well supported. Such features make it easy to bring new life to old, less user-friendly code, by gluing applications together and equipping them with modern interfaces. Important features of Python, compared to competing scripting languages like Perl and Ruby, are the very clean syntax and the wide range of add-on modules for numerical computations and graphics. Although Python without doubt has demonstrated a great potential as a scripting language in scientific computing, Python also has a potential as a main language for writing new numerical codes. Such codes are traditionally implemented in compiled languages, such as Fortran, C, or C++. Python is known to be very much slower than compiled languages and may hence seem unsuitable for numerical computations. However, most of a scientific code deals with user interfaces (files, command line, web, graphical windows), I/O management, data analysis, visualization, file and directory manipulation, and report generation, to mention some items for which Python is much better suited than the traditional compiled languages. The reason is that Python has more support for such tasks, and the resulting code is much more compact and convenient to write. Usually, only a small portion of the total code volume deals with intensive numerics where high performance matters. This paper addresses how to write these number-crunching snippets when Python is chosen to be the main language of a scientific code. Many computational scientists have moved from compiled languages to Matlab. Despite the chance for experiencing decreased performance, scientists choose Matlab because it is a more productive and convenient computing environment. The recent popularity of Python is a part of the same trend. Python shares many of Matlab’s features: a simple and clean syntax of the command language; integration of simulation and visualization; interactive execution of commands, with immediate feedback; lots of built-in functions operating efficiently on arrays in compiled code; satisfactory performance of everyday operations on today’s computers; and good documentation and support.


In addition, Python is a full-fledged programming language that supports all major programming styles (procedural, functional, object-oriented, and generic programming). Simple programs are concise, as in Matlab, while the concepts of classes, modules, and packages are available to users for developing huge codes. Many widely used Java-type tools for documentation in source code files and testing frameworks are also mirrored in the Python world. Packing and distributing Python applications are well supported by several emerging tools. We should also mention that Python code can easily be made truly cross-platform, even when graphical user interfaces and operating system interaction are present. To all these advantages adds the feature that Python is free and very actively developed by an open source community. The many advantages and promises of Python make it an interesting platform for building scientific codes. A major question is, however, the performance. We shall address this question by doing performance tests on a range of codes that solve partial differential equations (PDEs) numerically via finite difference methods. More precisely, we are concerned with standard stenciltype operations on large three-dimensional arrays. The various implementations differ in the way they use Python and other languages to implement the numerics. It turns out that naive loops in pure Python are extremely slow for operations on large three-dimensional arrays. Vectorization of expressions may help a lot, as we shall show in this paper, but for optimal performance one has to combine Python with compiled languages. Python comes with strong support for calling C code. To call a C function from Python, one must write wrapper code that converts function arguments and return values between Python and C. This may be quite tedious, but several tools exist for automating the process, e.g., F2PY, SWIG, Boost.Python, and SIP. We shall apply F2PY to automate gluing Python with Fortran code, and we shall write complete extension modules by hand in C for comparison. Modern scientific codes for solving PDEs frequently make use of parallel computing. In this paper, we advocate that the parallelization can be done more conveniently at the “high” Python level than at the “low” C/MPI level. Parallelization at the Python level utilizes modules like pypar, which gives access to MPI functionality, but at a higher abstraction level than in C. We shall demonstrate how our PDE solvers can be parallelized with pypar, and carefully evaluate the potential performance loss. There are few studies of the performance of Python for numerical computing. Ramachandran [11] ran some tests of a two-dimensional Laplace equation (solved by an SOR method) with different combinations of Python tools and compiled languages (Weave, F2PY, Pyrex, Psyco) in a pure serial setting. Some simpler operations and case studies appear in [5], while [2] contains a study quite similar to the present paper, but with simpler PDE and stencil models. These models, arising from constant-coefficient PDEs, involve so few arithmetic operations inside the loops that the conclusions on performance may be misleading for applications involving heterogeneous media and/or more complicated constitutive relations in the PDEs. We therefore saw the


need to repeat the PDE-related tests from [2] in a more physically demanding context in order to reach conclusions of wider relevance. These new applications and tests constitute the core of the present paper. In the remaining text of this section we present the model problems used in the performance tests. Next, we present various implementation strategies and their efficiency for stencil operations on three-dimensional arrays. Thereafter, the most promising implementation strategies are employed in a parallel context, where the parallelization is carried out at the Python level. Finally, we give some concluding remarks.

1.1 Two Simple PDEs with Variable Coefficients

To study the performance of Python in realistic scientific applications, while keeping the mathematical and numerical details to a minimum, we will in the upcoming text heavily use the following two simple PDEs in three space dimensions:

$$
\frac{\partial u}{\partial t} = \nabla\cdot\big(a(x,y,z)\nabla u\big), \tag{1}
$$

$$
\frac{\partial^2 u}{\partial t^2} = \nabla\cdot\big(a(x,y,z)\nabla u\big). \tag{2}
$$

These PDEs arise in a number of physical applications, either as standalone equations for diffusion and wave phenomena, respectively, or as part of more complicated systems of PDEs for multi-physics problems. Both (1) and (2) need to be accompanied with suitable boundary and initial conditions, which we will not discuss in the present paper, but rather use simple choices (such as u = 0 on the boundary) in the numerical experiments. The numerical operations associated with solving (1) and (2) enter not only codes solving these specific equations, but also many other PDE codes where variable-coefficient Laplace operators are present. For example, an explicit “Forward Euler” finite difference update of (1) and (2) at a time step involves (to a large extent) the same code and numerical operations as encountered in, e.g., a multigrid solver for a Poisson equation. This means that a performance analysis of such a finite difference update gives a good idea of how Python will perform in real-life PDE applications.

1.2 Finite Difference Discretization

The finite difference discretization of the common right-hand sides of (1) and (2) takes the following form on a uniform three-dimensional mesh with grid spacings ∆x, ∆y, ∆z:

$$
\begin{aligned}
L^{\ell}_{i,j,k} \equiv\;
&\frac{1}{\Delta x}\Big[ a_{i+\frac12,j,k}\,(u^{\ell}_{i+1,j,k} - u^{\ell}_{i,j,k})/\Delta x
 - a_{i-\frac12,j,k}\,(u^{\ell}_{i,j,k} - u^{\ell}_{i-1,j,k})/\Delta x \Big] \\
+\;&\frac{1}{\Delta y}\Big[ a_{i,j+\frac12,k}\,(u^{\ell}_{i,j+1,k} - u^{\ell}_{i,j,k})/\Delta y
 - a_{i,j-\frac12,k}\,(u^{\ell}_{i,j,k} - u^{\ell}_{i,j-1,k})/\Delta y \Big] \\
+\;&\frac{1}{\Delta z}\Big[ a_{i,j,k+\frac12}\,(u^{\ell}_{i,j,k+1} - u^{\ell}_{i,j,k})/\Delta z
 - a_{i,j,k-\frac12}\,(u^{\ell}_{i,j,k} - u^{\ell}_{i,j,k-1})/\Delta z \Big].
\end{aligned}
$$


Here, a_{i+1/2,j,k} denotes the value of a(x, y, z) at the mid-point ((i + 1/2)∆x, j∆y, k∆z), likewise for a_{i−1/2,j,k}, a_{i,j+1/2,k}, a_{i,j−1/2,k}, a_{i,j,k+1/2}, and a_{i,j,k−1/2}. In real-life applications, it is not unusual that the values of a(x, y, z) are only available at the mesh points, thus not known on the mid-points such as required by a_{i+1/2,j,k} etc. We therefore assume in this paper that values of a(x, y, z) have to be approximated at these mid-points. The harmonic mean is a robust technique for approximating a_{i+1/2,j,k}, especially in the case of strong discontinuities in a, such as those often met in geological media. The harmonic mean between a_{i,j,k} and a_{i+1,j,k} is defined as follows:

$$
a_{i+\frac12,j,k} \approx \frac{2}{\dfrac{1}{a_{i,j,k}} + \dfrac{1}{a_{i+1,j,k}}}.
\tag{3}
$$

Similarly, we can use the harmonic mean to approximate a_{i−1/2,j,k}, a_{i,j+1/2,k}, a_{i,j−1/2,k}, a_{i,j,k+1/2}, and a_{i,j,k−1/2}. Our numerical tests will involve sweeps of finite difference stencils over three-dimensional uniform box grids. To this end, it makes sense to build test software that solves (1) and (2) by explicit finite difference schemes. Even with a more suitable implicit scheme for (1), the most time-consuming part of a typical multigrid algorithm for the resulting linear system will closely resemble an explicit finite difference update. The explicit schemes for (1) and (2) read, respectively,

$$
\frac{u^{\ell+1}_{i,j,k} - u^{\ell}_{i,j,k}}{\Delta t} = L^{\ell}_{i,j,k},
\tag{4}
$$

$$
\frac{u^{\ell+1}_{i,j,k} - 2u^{\ell}_{i,j,k} + u^{\ell-1}_{i,j,k}}{\Delta t^2} = L^{\ell}_{i,j,k}.
\tag{5}
$$

Both equations can be solved with respect to the new and only unknown value u^{ℓ+1}_{i,j,k}. Any implementation essentially visits all grid points at a time level and evaluates the formula for u^{ℓ+1}_{i,j,k}.
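Solving (4) and (5) for the new value gives the explicit updates

$$
u^{\ell+1}_{i,j,k} = u^{\ell}_{i,j,k} + \Delta t\, L^{\ell}_{i,j,k}
\qquad\text{and}\qquad
u^{\ell+1}_{i,j,k} = 2u^{\ell}_{i,j,k} - u^{\ell-1}_{i,j,k} + \Delta t^2\, L^{\ell}_{i,j,k},
$$

the first of which is the update computed by the stencil kernels shown in the next section.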

2 Python and High-Performance Serial Computing

The basic computational kernel for our model PDEs consists of loops over three-dimensional arrays, with arithmetic operations and array look-up of neighboring points in the underlying grid. We shall in this section study different types of implementation of this computational kernel, keeping a focus on both performance and programming convenience.

2.1 Alternative Numerical Python Implementations

The fundamental data structure is a contiguous three-dimensional array. Python has an add-on package, called Numerical Python, which offers a


multi-dimensional array object encapsulating a plain C array. Because the underlying data structure is just a pointer to a stream of numbers in memory, it is straightforward to feed C, C++, and Fortran functions with the data in Numerical Python arrays. Numerical Python consists of several modules for efficient computations with arrays. These include overloaded arithmetic operators for array objects, standard mathematical functions (trigonometric, exponential, etc.), random number generation, and some linear algebra (eigenvalues, eigenvectors, solution of dense linear systems). Numerical Python makes extensive use of LAPACK, preferably through the highly optimized ATLAS library. In short, Numerical Python adds “basic numerical Matlab functionality” to Python. At the time of this writing, there are three alternative implementations of Numerical Python. These are named after the name of the fundamental module that defines the array object: Numeric, numarray, and numpy. Numeric is the “old”, original implementation from the mid 1990s, developed by Jim Hugunin, David Ascher, Paul Dubois, Konrad Hinsen, and Travis Oliphant. Much existing numerical Python software makes heavy use of Numeric. Now the source code of Numeric is not maintained any longer, and programmers are encouraged to port code using Numeric to the new numpy implementation. This is happening to a surprisingly large extent. The numarray implementation, mainly developed by Perry Greenfield, Todd Miller, and Rick White, appeared early in this century and was meant to replace Numeric. This did not happen, partly because so much software already depended on Numeric and partly because some numerical operations were faster with Numeric. To avoid two competing Numerical Python implementations, Travis Oliphant decided to merge ideas from Numeric and numarray into a new implementation, numpy (often written as NumPy, but this shortform is also widely used as a synonym for “Numerical Python”). The hope is that numpy can form the basis of the single, future Numerical Python source, which can be distributed as part of core Python. The new numpy implementation was released in the beginning of 2006, with several deficiencies with respect to performance, but many improvements have taken place lately. The experiments later will precisely report the relative efficiency of the three Numerical Python versions. The interface to Numeric, numarray, and numpy is almost the same for the three packages, but there are some minor annoying differences. For example, modules for random number generation, linear algebra, etc., have different names. Also, some implementations contain a few features that the others do not have. If one has written a numerical Python program using Numeric, most of the statements will work if one changes the Numeric import with numarray or numpy, but it is unlikely that all statements work. One way out of this is to define a common (“least common denominator”) interface such that application writers can use this interface and afterwards transparently switch between the three implementations. This is what we have done in all software developed for the tests in the present paper. The interface, called numpytools,


is available on the Web as part of the SciTools package [4, 5]. As we will show later in the paper, the performance of these three implementations differs, so many application developers may want to write their code such that the user can trivially switch between underlying array implementations. The SciPy package [12] builds on Numerical Python and adds a wide range of useful numerical utilities, including improved Numerical Python functions, a library of mathematical functions (Bessel functions, Fresnel integrals, and many, many more), as well as interfaces to various Netlib [7] packages such as QUADPACK, FITPACK, ODEPACK, and similar. Weave is a part of the SciPy package that allows inline C++ code in Python code. This is the only use of SciPy that we make in the present paper. SciPy version 0.3 depended on Numeric, while version 0.4 and later require numpy. Our use of Weave employs numpy arrays. All experiments reported in this section (see Table 1) are collected in software that can be downloaded from the Web [13] and executed in the reader’s own hardware and software environment. We have run the serial experiments on a Linux laptop, using Linux 2.6 and GNU compilers version 4.1.3 with -O3 optimization.

2.2 Plain Python Loops

The schemes (4) and (5) are readily coded as a loop over all internal grid points. Using Numerical Python arrays and standard Python for loops, with xrange as a more efficient way than range to generate the indices, the code for (4) takes the following form:

    def scheme(unew, u, a, dx, dy, dz, dt):
        nx, ny, nz = u.shape;  nx -= 1;  ny -= 1;  nz -= 1
        dx2 = dx*dx;  dy2 = dy*dy;  dz2 = dz*dz
        for i in xrange(1,nx):
            for j in xrange(1,ny):
                for k in xrange(1,nz):
                    a_c  = 1.0/a[i,j,k]
                    a_ip = 2.0/(a_c + 1.0/a[i+1,j,k])
                    a_im = 2.0/(a_c + 1.0/a[i-1,j,k])
                    a_jp = 2.0/(a_c + 1.0/a[i,j+1,k])
                    a_jm = 2.0/(a_c + 1.0/a[i,j-1,k])
                    a_kp = 2.0/(a_c + 1.0/a[i,j,k+1])
                    a_km = 2.0/(a_c + 1.0/a[i,j,k-1])
                    unew[i,j,k] = u[i,j,k] + dt*( \
                        (a_ip*(u[i+1,j,k] - u[i  ,j,k]) - \
                         a_im*(u[i  ,j,k] - u[i-1,j,k]))/dx2 + \
                        (a_jp*(u[i,j+1,k] - u[i,j  ,k]) - \
                         a_jm*(u[i,j  ,k] - u[i,j-1,k]))/dy2 + \
                        (a_kp*(u[i,j,k+1] - u[i,j,k  ]) - \
                         a_km*(u[i,j,k  ] - u[i,j,k-1]))/dz2)
        return unew


The three-dimensional arrays unew and u correspond to u^{ℓ+1} and u^{ℓ}, whereas nx, ny, nz represent the numbers of grid cells in the three spatial directions. We remark that xrange(1,nx) returns indices from 1 to nx-1 (nx is not included) according to the Python convention. Thus the above code kernel will update the interior mesh points at one time step. (The boundary point layers i = 0, i = nx, j = 0, j = ny, k = 0, k = nz require additional computations according to the actual boundary conditions – here we just assume u = 0 on the boundary.) Moreover, dt, dx2, dy2, dz2 contain the values of ∆t, ∆x², ∆y², ∆z². The CPU time of the best performance of all implementations addressed in this paper is scaled to have value 1.0. Time consumption by the above function is 70 CPU time units in case of Numeric arrays, 147 for numarray objects, and 151 for numpy (v1.0.3.1) arrays. First, these results show how much slower the loops run in Python compared with Fortran, C, or C++. Second, the newer implementations numarray and numpy are about a factor of two slower than the old Numeric on such scalar operations. Similar loops implemented in Matlab (from version 6.5 and onwards, when loops are optimized by a just-in-time compiler) run much faster, in fact as fast as the vectorized Python implementation (Section 2.4). The syntax u[i,j,k] implies that a tuple object (i,j,k) is used as index argument. In older versions of Numeric the array look-up could be faster if each dimension was indexed separately, as in plain C arrays: u[i][j][k]. We have tested the efficiency of u[i,j,k] versus u[i][j][k] in the loops above, and it appears that the latter is always slower than the former for all the most recent Numerical Python implementations. Iterators have recently become popular in Python code. One can think of an iterator (see footnote 3) with name it over a Numerical Python array u such that a three-dimensional loop can be written as

    it, dummy = NumPy_array_iterator(u, offset_stop=1, offset_start=1)
    for i,j,k, value in it(u):

The overhead of using such an iterator is almost negligible in the present test problems, regardless of the Numerical Python implementation. The numpy implementation comes with its own iterator, called ndenumerate, such that the loops can be written as

    for index, value in numpy.ndenumerate(u):
        i,j,k = index
        if i == 0 or i == u.shape[0]-1 or \
           j == 0 or j == u.shape[1]-1 or \
           k == 0 or k == u.shape[2]-1:
            continue   # next pass

3 The iterator is defined as NumPy_array_iterator in the numpyutils module of the SciTools package [4, 5].


From Table 1 we see that ndenumerate is clearly slower than the it iterator above or plain nested loops. To increase the efficiency of scalar array operations, numpy offers functions for setting/getting array entries instead of using subscripts [i,j,k]:

    u.itemset(i,j,k, b)    # means u[i,j,k] = b
    b = u.item(i,j,k)      # means b = u[i,j,k]

The code now requires only 51 time units. That is, numpy is faster than Numeric for scalar array operations if item and itemset are used instead of subscripting.

2.3 Psyco-Accelerated Loops

Psyco [8] is a Python module, developed by Armin Rigo, that acts as a kind of just-in-time compiler. The usage is trivial:

    import psyco
    scheme = psyco.proxy(scheme)

The psyco.proxy function here returns a new version of the scheme function where Psyco has accelerated various constructions in the code. Unfortunately, the effect of Psyco in the present type of intensive loops-over-arrays applications is often limited. For indexing with numpy and item/itemset the CPU time is halved, but for standard subscripting the gain is only a reduction of 15%. With Numeric, the speed-up of plain loops with Psyco is almost 30%. Since a CPU time reduction of 2-3 orders of magnitude is actually required, we conclude that Psyco is insufficient for improving the performance significantly. The CPU time results so far show that array indexing in Python is slow. For one-dimensional PDE problems it may often be fast enough on today’s computers, but for three-dimensional problems one must avoid explicit indexing and use instead the techniques discussed in the forthcoming text.

2.4 Vectorized Code

Users of problem-solving environments such as Maple, Mathematica, Matlab, R, and S-Plus, which offer interpreted command languages, know that loops over arrays run slowly and that one should attempt to express array computations as a set of basic operations, where each basic operation applies to (almost) the whole array, and the associated loop is implemented efficiently in a C or Fortran library. The rewrite of “scalar” code in terms of operations on arrays such that explicit loops are avoided is often referred to as vectorization [6]. For example, instead of looping over all array indices and computing the sine function of each element, one can just write sin(x). The whole array x is then sent to a C routine that carries out the computations and returns the result as an array. In the present test problems, we need to vectorize the scheme function, using Numerical Python features. First, the harmonic mean computations


can be carried out on the whole array at once (utilizing overloaded arithmetic operators for array objects). Second, the combination of geometric neighboring values must be expressed as array slices shifted in the positive and negative space directions. A slice in Python is expressed as a range of indices, e.g., 1:n, where the first number is the lowermost index and the upper number is the last index minus one (i.e., 1:n corresponds to indices 1,2,...,n-1). Negative indices count reversely from the end, i.e., 1:-1 denotes all indices from 1 up to, but not including, the last index. If the start or end of the slice is left out, the lowermost or uppermost index is assumed as limit. Now, u[1:-1,1:-1,1:-1] represents a view to all interior points, while u[2:,1:-1,1:-1] has a shift “i+1” in the first index. Vectorization of the code in Section 2.2 can take the following form, using shifted array slices:

    def schemev(unew, u, a, dx, dy, dz, dt):
        nx, ny, nz = u.shape;  nx -= 1;  ny -= 1;  nz -= 1
        dx2 = dx*dx;  dy2 = dy*dy;  dz2 = dz*dz
        a_c  = 1.0/a[1:-1,1:-1,1:-1]
        a_ip = 2.0/(a_c + 1.0/a[2:,1:-1,1:-1])
        a_im = 2.0/(a_c + 1.0/a[:-2,1:-1,1:-1])
        a_jp = 2.0/(a_c + 1.0/a[1:-1,2:,1:-1])
        a_jm = 2.0/(a_c + 1.0/a[1:-1,:-2,1:-1])
        a_kp = 2.0/(a_c + 1.0/a[1:-1,1:-1,2:])
        a_km = 2.0/(a_c + 1.0/a[1:-1,1:-1,:-2])
        unew[1:-1,1:-1,1:-1] = u[1:-1,1:-1,1:-1] + dt*( \
            (a_ip*(u[2:,1:-1,1:-1]   - u[1:-1,1:-1,1:-1]) - \
             a_im*(u[1:-1,1:-1,1:-1] - u[:-2,1:-1,1:-1]))/dx2 + \
            (a_jp*(u[1:-1,2:,1:-1]   - u[1:-1,1:-1,1:-1]) - \
             a_jm*(u[1:-1,1:-1,1:-1] - u[1:-1,:-2,1:-1]))/dy2 + \
            (a_kp*(u[1:-1,1:-1,2:]   - u[1:-1,1:-1,1:-1]) - \
             a_km*(u[1:-1,1:-1,1:-1] - u[1:-1,1:-1,:-2]))/dz2)
        return unew

= = = = = = =

1.0/a[1:-1,1:-1,1:-1] 2.0/(a_c + 1.0/a[2:,1:-1,1:-1]) 2.0/(a_c + 1.0/a[:-2,1:-1,1:-1]) 2.0/(a_c + 1.0/a[1:-1,2:,1:-1]) 2.0/(a_c + 1.0/a[1:-1,:-2,1:-1]) 2.0/(a_c + 1.0/a[1:-1,1:-1,2:]) 2.0/(a_c + 1.0/a[1:-1,1:-1,:-2])

unew[1:-1,1:-1,1:-1] = u[1:-1,1:-1,1:-1] + dt*( \ (a_ip*(u[2:,1:-1,1:-1] - u[1:-1,1:-1,1:-1]) - \ a_im*(u[1:-1,1:-1,1:-1] - u[:-2,1:-1,1:-1]))/dx2 + \ (a_jp*(u[1:-1,2:,1:-1] - u[1:-1,1:-1,1:-1]) - \ a_jm*(u[1:-1,1:-1,1:-1] - u[1:-1,:-2,1:-1]))/dy2 + \ (a_kp*(u[1:-1,1:-1,2:] - u[1:-1,1:-1,1:-1]) - \ a_km*(u[1:-1,1:-1,1:-1] - u[1:-1,1:-1,:-2]))/dz2) return unew

All the slow loops have now been removed. However, since all arithmetic operations on array objects are necessarily binary (pair-wise) operations, a lot of temporary arrays are created to store intermediate results in compound arithmetic expressions. Python is good at removing such temporaries fast, but their allocation and deallocation imply some overhead. The vectorized version of the finite difference scheme, still implemented solely in Python code, requires only 2.3 CPU time units when numpy arrays are used. The number decreases to 3.1 for Numeric arrays and 4.6 numarray objects4 . In this case, the recent numpy implementation outperforms the older Numeric. For many applications where other parts of the code, e.g. involv4

On a MacBook Pro running Mac OS X 10.5, vectorization was even less efficient, the CPU time numbers being 4.1, 5.2, and 5.5 for numpy, Numeric, and numarray, respectively. Most other timing results on the Mac were compatible with those on the IBM computer.

Python for High-Performance Computing

347

ing I/O, consume significant CPU time, vectorized implementation of finite difference stencils may exhibit satisfactory performance. 2.5 Inline C++ Code Using Wave The Weave tool [15], developed by Eric Jones, comes with the SciPy package and allows us to write the loops over the arrays in C++ code that will be automatically compiled, linked, and invoked from Python. The C++ code employs Blitz++ arrays and the corresponding subscripting syntax must be applied in the C++ code snippet. Let us exemplify the use in our case: def schemew(unew, u, a, dx, dy, dz, dt): nx, ny, nz = u.shape; nx -= 1; ny -= 1; nz -= 1; dx2 = dx*dx; dy2 = dy*dy; dz2 = dz*dz code = r""" int i,j,k; double a_c, a_ip, a_im, a_jp, a_jm, a_kp, a_km; for (i=1; i