Multiscale Optimization Methods and Applications (Nonconvex Optimization and Its Applications, 82) 9780387295497, 0387295496

As optimization researchers tackle larger and larger problems, scale interactions play an increasingly important role. O

144 6 26MB

English Pages 424 [416] Year 2005

Report DMCA / Copyright

DOWNLOAD PDF FILE

Recommend Papers

Multiscale Optimization Methods and Applications (Nonconvex Optimization and Its Applications, 82)
 9780387295497, 0387295496

  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

MULTISCALE OPTIMIZATION METHODS AND APPLICATIONS

Nonconvex Optimization and Its Applications VOLUME 82 Managing Editor: Panos Pardalos University of Florida, U.S.A.

Advisory Board: J. R. Birge University of Chicago, U.S.A. Ding-Zhu Du University of Minnesota, U.S.A. C. A. Floudas Princeton University, U.S.A. J. Mockus Lithuanian Academy of Sciences, Lithuania H. D. Sherali Virginia Polytechnic Institute and State University, U.S.A. G. Stavroulakis Technical University Braunschweig, Germany H. Tuy National Centre for Natural Science and Technology, Vietnam

MULTISCALE OPTIMIZATION METHODS AND APPLICATIONS

Edited by WILLIAM W. HAGER University of Florida, Gainesville, Florida SHU-JEN HUANG University of Florida, Gainesville, Florida PANOS M. PARDALOS University of Florida, Gainesville, Florida OLEG A. PROKOPYEV University of Florida, Gainesville, Florida

^

Springer

Library of Congress Control Number: 2005933792 ISBN-10: 0-387-29549-6

e-ISBN: 0-387-29550-X

ISBN-13: 978-0387-29549-7

Printed on acid-free paper.

© 2006 Springer Science-fBusiness Media, Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science-HBusiness Media, hic., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in coimection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed in the United States of America. 987654321 springeronline.com

Contents

Multiscale Optimization in VLSI Physical Design Automation Tony F. Chan, Jason Cong, Joseph R. Shinnerl, Kenton Sze, Min Xie, Yan Zhang

1

A Distributed Method for Solving Semidefinite Programs Arising from Ad Hoc Wireless Sensor Network Localization Pratik Biswas, Yinyu Ye

69

Optimization Algorithms for Sparse Representations and Applications Pando G. Georgiev, Fabian Theis, Andrzej Cichocki

85

A Unified Framework for Modeling and Solving Combinatorial Optimization Problems: A Tutorial Gary A. Kochenberger, Fred Glover

101

Global Convergence of a Non-monotone Trust-Region Filter Algorithm for Nonlinear Programming Nicholas I. M. Gould, Philippe L. Toint

125

Factors Affecting the Performance of Optimization-based Multigrid Methods Robert Michael Lewis, Stephen G. Nash

151

A Local Relaxation Method for Nonlinear Facility Location Problems Walter Murray, Uday V. Shanbhag

173

Fluence Map Optimization in IMRT Cancer Treatment Planning and A Geometric Approach Yin Zhang, Michael Merritt

205

vi

Contents

Panoramic Image Processing using Non-Commutative Harmonic Analysis Part I: Investigation Amal Aafif, Robert Boyer

229

Generating Geometric Models through Self-Organizing Maps Jung-ha An, Yunmei Chen, Myron N. Chang, David Wilson, Edward Geiser

241

Self-similar Solution of Unsteady Mixed Convection Flow on a Rotating Cone in a Rotating Fluid Devarapu Anilkumar, Satyajit Roy

251

Homogenization of a Nonlinear Elliptic Boundary Value Problem Modelling Galvanic Interactions on a Heterogeneous Surface Y.S. Bhat

263

A Simple Mathematical Approach for Determining Intersection of Quadratic Surfaces Ken Chan

271

Applications of Shape-Distance Metric to Clustering Shape-Databases Shantanu H. Joshi, Anuj Srivastava

299

Accurately Computing the Shape of Sandpiles Christopher M. Kuster, Pierre A. Cremaud

305

Shape Optimization of Transfer Functions Jiawang Nie, James W. Demmel

313

Achieving Wide Field of View Using Double-Mirror Catadioptric Sensors Ronald Perline, Emek Rose

327

Darcy Flow, Multigrid, and Upscaling James M. Rath

337

Iterated Adaptive Regularization for the Operator Equations of the First Kind Yanfei Wang, Qinghua Ma

367

Recover Multi-tensor Structure from H A R D MRI Under Bi-Gaussian Assumption Qingguo Zeng, Yunmei Chen, Weihong Guo, Yijun Liu

379

P A C B B : A Projected Adaptive Cyclic Barzilai-Borwein Method for Box Constrained Optimization Hongchao Zhang, William W. Eager

387

Contents Nonrigid Correspondence and Classification of Curves Based on More Desirable Properties Xiqiang Zheng, Yunmei Chen, David Groisser, David Wilson

vii

393

Preface

The Conference on Multiscale Optimization Methods and Applications (February 26-28, 2004) and a Student Workshop (March 3-4) took place at the University of Florida (UF), hosted by the Center for Applied Optimization and the local SIAM student chapter, the SIAM Gators. The organizers of the optimization conference were William Hager, Timothy Davis, and Panos Pardalos, while the student workshop was organized by a committee chaired by Beyza Asian, president of the SIAM Gators. In addition, Jung-ha An, Yermal Bhat, Shu-Jen Huang, Oleg Prokopyev, and Hongchao Zhang co-edited the student paper submissions to this volume. The conferences were supported by the National Science Foundation and by UF's Mathematics Department, Industrial and Systems Engineering Department, Computer and Information Science and Engineering Department, College of Liberal Arts and Sciences, and Division of Sponsored Research. At the optimization conference, poster prizes were awarded to Lei Wang, Rutgers University, Kenton Sze, UCLA, Jiawei Zhang, Stanford University, and Balabhaskar Balasundarara, Texas A & M University. At the student workshop, the awards for best presentation were given to Dionisio Fleitas, University of Texas, Arhngton, and Firmin Ndeges, Virginia Polytechnic Institute. For details concerning the student workshop, see the article "By students for students: SIAM Gators welcome chapters nationwide to Florida conference," SIAM News, Vol. 37, Issue 8. The conferences focused on the development of new solution methodologies, including general multilevel solution techniques, for tackhng difficult, large-scale optimization problems that arise in science and industry. Applications presented at the conference included: (a) (b) (c) (d) (e)

the circuit placement problem in VLSI design, the protein folding problem, and drug design, a wireless sensor location problem, internet optimization, the siting of substations in an electrical network.

X

Preface (f) optimal dosages in the treatment of cancer by radiation therapy, (g) facihty location, and (h) shape and topology optimization.

These problems are challenging and intriguing, often easy to state, but difficult to solve, fn each case, complexity is related to geometry: components must be placed so as to satisfy geometric constraints, while optimizing a cost function. The geometric constraints lead to an exponentially large solution space. The development of efficient techniques to probe this huge solution space is an ongoing effort that can have an enormous economic impact. The key to success seems to be the development of techniques that exploit problem structure. In solving difficult problems, it is important to have a quantitative way to compare the effectiveness of different approaches. For the circuit placement problem (the subject of talks by Jason Cong and Tony Chan, and a poster by Kenton Sze) benchmarks have been developed with known optimal solutions. When state-of-the-art algorithms for the circuit placement problem were applied to these benchmarks, researchers were startled to see the large gap between the best algorithms and the actual optimum (a factor of two difference between the best approximation and the optimum). The idea of generating test problems with known optimal solutions seems to be one research direction of importance in the coming years. For certain classes of problems, such as the quadratic assignment problems, best-known solutions as well as lower bounds on the cost are published. Although bounds and estimates are useful, test problems with known optimal solutions provide a quantitative way to compare the advantages and disadvantages of algorithms. The Netlib LP test set catalyzed the development of linear programming algorithms during the past decade, while the circuit placement benchmarks appear to be having a similar impact in VLSI design. In the area of VLSI design, a multilevel approach has proved to be an effective way to cope with the huge solution space (see paper of Tony F. Chan et ai). This approach, closely connected with ideas developed for multigrid techniques in partial differential equations, uses a series of scaled approximations to the original problem. The coarsest approximation is easiest to solve and coarse information passed on to the next finer level gives a good starting point for a more difficult problem. Research is progressing towards systematic ways of moving back and forth between problem scales in an increasing number of apphcation areas (see paper of Michael Lewis and Stephen Nash). In large-scale discrete optimization, another important strategy is to transform the discrete problem into a continuous setting. This is being done in many different ways. Semidefinite programming is used to obtain approximations to partitioning problems with a guaranteed error bound. Continuous quadratic programs are used in reformulations of both graph partitioning and maximum clique problems. A parameterized exponential transformation is used in the siting substation problem (paper by Walter Murray and Uday Shanbhag) to obtain a feasible point.

Preface

xi

Interest in interior point methods remains strong; it is the basis of some powerful optimization packages. On the other hand, recent success was reported by Igor Griva with an exterior approach based on a primal-dual nonlinear rescaling method. The approach was particularly effective in a neighborhood of an optimum where numerical errors and stabihty issues impede convergence of interior point methods. In the area of continuous, smooth optimization, the "Newton Liberation Front" (NLF), introduced by Philippe Toint, received strong endorsement. In the past, researchers placed much emphasis on global convergence of algorithms. This led to rather stringent criteria for the acceptance of a Newton step. A new class of acceptance criteria is emerging, the so-called filter methods. And a new variation of this criterion, with a less stringent acceptance criterion, yielded superior performance on a large set of test problems (paper of Nicholas Gould and Philippe Toint). The papers in this book highlight some of the research presented at the Gainesville conferences. Additional papers wiU appear in a special issue of Computational Optimization and Applications. We would like to thank the sponsors and participants of the conference, the authors, the anonymous referees, and the publisher for helping us produce this volume.

Gainesville, Florida, USA April, 2005

William W. Eager Shu-Jen Huang Panos M. Pardalos Oleg A. Prokopyev

List of Contributors

Amal Aaflf Department of Mathematics Drexel University Philadelphia, PA 19104 USA amalOdrexel.edu Jung-ha An Department of Mathematics University of Florida Gainesville, FL 32611 USA junghaOmath.uf1.edu

Devarapu Anilkumar Department of Mathematics Indian Institute of Technology Madras, Chennai 600 036 India [email protected],in Pratik Biswas Electrical Engineering Stanford University Stanford, CA 94305 USA pbiswasOstanford.edu Y.S. Bhat Department of Mathematics University of Florida

Gainesville, FL 32611 USA ybhatOmath.uf1.edu

Robert Boyer Department of Mathematics Drexel University Philadelphia, PA 19104 USA rboyerOmcs.drexel.edu

Tony F. Chan UCLA Mathematics Department Los Angeles, California 90095-1555 USA chanOmath.ucla.edu

Myron N . Chang Department of Biostatistics University of Florida Gainesville, FL 32611 USA mchangQcog.uf1.edu Ken Chan The Aerospace Corporation 15049 Conference Center Drive, Chantilly, VA 20151 USA kenneth.f.chanQaero.org

List of Contributors Yunmei Chen Department of Mathematics University of Florida Gainesville, FL 32611 USA yunOmath.uf1.edu

David Groisser Department of Mathematics University of Florida Gainesville, FL 32611 USA

Andrzej Cichocki Laboratory for Advanced Brain Signal Processing Brain Science Institute, RIKEN Wako-shi Japan

Fred Glover School of Business University of Colorado at Boulder Boulder, Colorado 80304 USA

groisserOmath.uf1.edu

Fred.GloverOColorado.edu

ciaObsp.brain.riken.jp

Jason Cong UCLA Computer Science Department Los Angeles, Cahfornia 90095-1596 USA congQcs.ucla.edu James W. Demmel Department of Mathematics University of California Berkeley, CA 94710 USA demmelOmath.bekeley.edu

Nick Gould Rutherford Appleton Laboratory Computational Science and Engineering Department Chilton, Oxfordshire England gouldOrl.ac.uk Pierre A. Gremaud Management Science and Department of Mathematics and Center for Research in Scientific Computation Raleigh, NC 27695 USA gremaudOmath.ncsu.edu

Edward Geiser Department of Medicine University of Florida Gainesville, FL 32611 USA

Weihong Guo Department of Mathematics University of Florida Gainesville, FL 32611 USA

geiseeaOmedicine.uf1.edu

guoQmath.uf1.edu

Pando G. Georgiev ECECS Department University of Cincinnati Cincinnati, Ohio 45221-0030 USA

William W. Hager Department of Mathematics University of Florida Gainesville, FL 32611 USA [email protected]

pgeorgieQececs.uc.edu

List of Contributors

XV

Shantanu H. Joshi Department of Electrical Engineering Florida State University Tallahassee, PL 32310 USA j oshiOeng.f su.edu

Qinghua Ma Department of Information Sciences College of Arts and Science of Beijing Union University Beijing, 100038 P.R.China qinghuaSygi.edu.en

Gary A. Kochenberger School of Business University of Colorado at Denver Denver, Colorado 80217 USA Gary.KochenbergerQcudenver.edu

Michael Merritt Department of Computational and Applied Mathematics Rice University Houston, TX 77005 - 4805 USA mmerrittOcaam.rice.edu

Emek Kose Department of Mathematics Drexel University Philadelphia, PA 19103 USA [email protected] Christopher M. Kuster Department of Mathematics and Center for Research in Scientific Computation Raleigh, NC 27695 USA cmkust erOmath.nc su.edu

Walter Murray Department of Management Science and Engineering Stanford University Stanford, CA 94305-4026 USA walterOstanford.edu Stephen G. Nash School of Information Technology and Engineering Mail Stop 5C8 George Mason University Fairfax, VA 22030 USA snashQgmu.edu

Robert Michael Lewis Department of Mathematics College of William & Mary Williamsburg, Virginia, 23187-8795 USA buckarooQmath.wm.edu Yijun Liu Department of Psychiatry University of Florida Gainesville, FL 32611 USA [email protected]. edu

Jiawang Nie Department of Mathematics University of California Berkeley, CA 94710 USA njwSmath.bekeley.edu Ronald Perline Department of Mathematics Drexel University Philadelphia, PA 19103 USA rperlineSmcs.drexel.edu

xvi

List of Contributors

James M. Rath Institute for Computational Engineering and Sciences University of Texas at Austin TX 78712 USA orgcinismSices. u t e x a s . edu Satyajit Roy Department of Mathematics Indian Institute of Technology Madras, Chennai 600 036 India sjroySiitm.ac.in Uday V. Shanbhag Department of Mechanical and Industrial Engineering University of Illinois at UrbanaChampaign Urbana, II 61801 USA udaybagQstcinf ord. edu

Fabian Theis Institute of Biophysics University of Regensburg D-93040 Regensburg Germany f a b i a n S t h e i s . nsime Philippe L. Toint University of Namur Department of Mathematics 61, rue de Bruxelles, B-5000 Namur Belgium philippe.tointOfundp.ac.be Yanfei Wang State Key Laboratory of Remote Sensing Science P.O. BOX 9718, Beijing 100101 P.R.China yf Wcing_ucf Oyahoo. com David Wilson Department of Mathematics University of Florida Gainesville, FL 32611 USA dcwQmath.ufl.edu

Joseph R. Shinnerl UCLA Computer Science Department Los Angeles, Cahfornia 90095-1596 USA shirmerlOcs.ucla.edu

Min Xie UCLA Computer Science Department Los Angeles, Cahfornia 90095-1596 USA xieScs.ucla.edu

Anuj Srivastava Department of Statistics Florida State University Tallahassee, FL 32306 USA anujOstat.fsu.edu

Yinyu Ye Management Science and Engineering Stanford University Stanford, CA 94305 USA yinyu-yeSstanford.edu

Kenton Sze UCLA Mathematics Department Los Angeles, Cahfornia 90095-1555 USA nkszeOmath.ucla.edu

Qingguo Zeng Department of Mathematics University of Florida GainesviUe, FL 32611 USA qingguoOmath.uf1.edu

List of Contributors

xvii

Hongchao Zhang Department of Mathematics University of Florida Gainesville, FL 32611 USA hzhctngOuf 1. edu

Yin Zhang Department of Computational and Applied Mathematics Rice University Houston, TX 77005 - 4805 USA yzhangOcaam.rice.edu

Yan Zhang UCLA Computer Science Department Los Angeles, California 90095-1596 USA zhangyanOcs.ucla.edu

Xiqiang Zheng Department of Mathematics University of Florida Gainesville, FL 32611 USA xzhengSmath.uf1.edu

Multiscale Optimization in VLSI Physical Design Automation Tony F. C h a n \ Jason Cong^, Joseph R. Shinnerl^, Kenton Sze\ Min Xie^, and Yan Zhang^ ^ UCLA Mathematics Department, Los Angeles, CaHfornia 90095-1555, USA. {chan,nksze}Qmath.ucla.edu ^ UCLA Computer Science Department, Los Angeles, CaHfornia 90095-1596, USA. {cong,shinnerl,xie,zhangyan}Qcs.ucla.edu

Summary. The enormous size and complexity of current and future integrated circuits (IC's) presents a host of challenging global, combinatorial optimization problems. As IC's enter the nanometer scale, there is increased demand for scalable and adaptable algorithms for VLSI physical design: the transformation of a logicaltemporal circuit specification into a spatially explicit one. There are several key problems in physical design. We review recent advances in multiscale algorithms for three of them: partitioning, placement, and routing. K e y words: VLSI, VLSICAD, layout, physical design, design automation, scalable algorithms, combinatorial optimization, multiscale, multilevel

1 Introduction In the computer-aided design of very-large-scale integrated circuits (VLSICAD), physical design is concerned with the computation of a precise, spatially explicit, geometrical layout of circuit modules and wires from a given logical and temporal circuit specification. Mathematically, the various stages of physical design generally amount to extremely challenging mixed integer— nonlinear-programming problems, including large numbers of both continuous and discrete constraints. The numbers of variables, nonconvex constraints, and discrete constraints range into the tens of millions and beyond. Viewed discretely, the solution space grows combinatorially with the number of variables. Viewed continuously, the number of local extrema grows combinatorially. The principal goals of algorithms for physical design are (i) speed and scalabihty; (ii) the ability to accurately model and satisfy complex physical constraints; and (iii) the ability to attain states with low objective values subject to (i) and (ii).

2

Tony F. Chan et al.

Highly successful multiscale algorithms for circuit partitioning first appeared in the 1990s [CS93, KAKS97, CAMOO]. Since then, multiscale metaheuristics for VLSICAD physical design have steadily gained ground. Today they are among the leading methods for the most critical problems, including partitioning, placement and routing. Recent experiments strongly suggest, however, that the gap between optimal and attainable solutions remains quite substantial, despite the burst of progress in the last decade. Thus, an improved understanding of the apphcation of multiscale methods to the large-scale combinatorial optimization problems of physical design is widely sought [CS03]. A brief survey of some leading multiscale algorithms for the principal stages of physical design — partitioning, placement, and routing — is presented here. First, the role of physical design in VLSICAD is briefly described, and recent experiments revealing a large optimality gap in the results produced by leading placement algorithms are reviewed. 1.1 Overvievv^ of VLSI Design As illustrated in Figure 1, VLSI design can be divided into the following steps: system modeling, architectual synthesis, logic synthesis, physical design, fabrication, packaging. •









System modeling. The concepts in the designer's mind are captured as a set of computational operations and data dependencies subject to constraints on timing, chip area, etc. Functional Design. The resources that can implement the system's operations are identified, and the operations are scheduled. As a result, the control logic and datapath interconnections are also identified. Functional design is also called high-level synthesis. Logic synthesis. The high-level specification is transformed into an interconnection of gate-level boolean primitives — nand, xor, etc. The circuit components that can best realize the functions derived in functional design are assembled. Circuit delay and power consumption are considered at this step. The output description of the interconnection between different gate-level primitives is usually called a netlist (Section 1.2). Physical design. The actual spatial layout of circuit components on the chip is determined. The objectives during this step usually include total wirelength, maximum signal propogation time ("performance"), etc. Physical design can be further divided into steps including partitioning, floorplanning, placement, and routing; these are described in Section 1.2 below. Fabrication. Fabrication involves the deposition and diffusion of material onto a sihcon wafer to achieve desired electronic circuit properties. Since designs will make use of several layers of metal for wiring, masks mirroring the layout on each metal layer will be applied in turn to produce the required interconnection pattern by photolithography.

Multiscale Optimization in VLSI Physical Design Automation System Modeling

Functional Design ^

Logic Synthesis

Physical Design

Fabrication

Packaging

Fig. 1. VLSI design includes system specification, functional design, logic physical design, fabrication, and paclcaging. •

n.

Packaging. The wafer is diced into individual chips, which are then packaged and tested.

As the fundamental physical barriers to continued transistor miniaturization begin to take shape, efforts in synthesis and physical design have intensified. The main component stages of physical design are reviewed next in more detail. 1.2 Overview of VLSI Physical Design At the physical level, an integrated circuit is a collection of rectangular modules connected by rectangular wires. The wires are arranged in parallel, horizontal layers stacked along the z axis; the wires in each layer are also parallel. Each module has one face in a prescribed rectangle in the xy-plane known as the placement region. However, different modules may intersect with different numbers of metal wiring layers. After logic synthesis, most modules are

Tony F. Chan et al.

Cell

'chip multipin net

two-pm net

Fig. 2. A 2-D illustration of the physical elements of an integrated circuit. The routing layers have been superimposed. selected from a cell library and assigned to logic elements as part of a process known as technology mapping. These modules are called standard cells or simply cells. Their widths (x-direction) may vary freely, but their heights {ydirection) are taken from a small, discrete set. Other, larger modules may represent separately designed elements known as IP blocks (inteUectual-property blocks) or macros; the heights of these larger blocks typically do not fall within the standard-cell heights or their integer multiples. The area of a module refers to the area of its cross-sections in the xy-plane. A signal may propagate from a source point on a module to any number of sinks on other modules. The source and sinks together define a net. At steady state, a net is an equipotential of the circuit. A connection point between a net and a module is called a pin. Hence, a net may be abstracted as either a set of pins, or, less precisely, as the set of modules to which these pins belong. The netlist specifies the nets of a circuit as lists of pins and is a product of logic synthesis (Section 1.1). The physical elements of an IC are iUustrated in Figure 2. As iUustrated in Figure 3, VLSI physical design proceeds through several stages, including partitioning, floorplanning, placement, routing, and compaction [DiM94, She99].

Multiscale Optimization in VLSI Physical Design Automation Logic Synthesis Physical design

Partitioning

Floorplanning

Placement

Routing

Compaction

Fabrication

Fig. 3. Stages in the physical design of integrated circuits. Partitioning Due to the complexity of integrated circuits, the first step in physical design is usually to divide a design into subdesigns. Considerations include area, logic functionality, and interconnections between subdesigns. Partitioning is applied recursively until the complexity in each subdesign is reduced to the extent that it can be handled efficiently by existing tools. Floorplanning The shapes and locations of the components within each partitioning block are determined at this stage. These components are also called blocks and may be reshaped. Floorplanning takes as input a set of rectangular blocks, their fixed areas, their allowed shapes expresed as maximum aspect ratios, and the connection points on each block for the nets containing it. Its output includes the shape and location of each block. Constraints may involve the location

6

Tony F, Chan et al.

of a block and/or adjacency requirements between arbitrary pairs of blocks. The blocks are not allowed to overlap. Floorplanning is typically limited to problems with a few hundred or a few thousand blocks. As such, it is typically used as a means of coarse placement on a simplified circuit model, either as a precursor to placement or as a means of guiding logic synthesis to a physically reasonable solution. Placement In contrast to floorplanning, placement treats the shapes of all blocks as fixed; i.e., it only determines the location of each block on the chip. The variables are the xy-locations of the blocks; most blocks are standard cells (Section 1.2). The y-locations of cells are restricted to standard-cell rows, as in Figure 2. Placement instance sizes range into the tens of millions and will continue to increase. Placement is usually divided into two steps: global placement and detailed placement. Global placement assigns blocks to certain subregions of the chip without determining the exact location of each component within its subregion. As a result, the blocks may still overlap. Detailed placement starts from the result of global placement, removes all overlap between blocks, and further optimizes the design. Placement objectives include the estimated total wirelength needed to connect blocks in nets, the maximum expected wiring congestion in subsequent routing, and/or the timing performance of the circuit. A simphfied formulation of placement is given in Section 1.5.1. Routing With the locations of the blocks fixed, their interconnections as specified by the netlist must be realized. That is, the shapes and locations of the metal wires connecting the blocks must be determined. This wiring layout is performed not only within the placement region but also in a sequence of parallel metal routing layers above it. Cells constitute routing obstacles in layers which pass through them. Above the cells, all the wires in the same routing layer are parallel to the same coordinate axis, either x or y. Routing layers alternate in the direction of their wires. Interlayer connections are called vias. The objective of routing is to minimize the total wirelength while realizing all connections subject to wire spacing constraints within each layer. In addition, the timing performance of the circuit may also be considered. Routing is usually done in two steps, global routing and detailed routing. Global-routing algorithms determine a route for each connection in terms of the regions it passes through, without giving the exact coordinates of the connection. During this phase, the maximum congestion in each region must be kept below a certain hmit. The goal of detailed routing is to realize a point-to-point path for each net following the guidance given by the global routing. It is in this step that the geometric location and shape of each wire is determined. Due

Multiscale Optimization in VLSI Physical Design Automation

7

to the sequential nature of most routing algorithms, a 100% completion rate may not be obtained for many designs. An additional step called rip-up and reroute is used to remove a subset of connections already made and find alternate routes, so that the overall completion rate can be improved. The rip-up and reroute process works in an iterative fashion until either no improvement can be obtained or a certain iteration limit is reached. Compaction Compaction is used to reduce the white space on the chip so that the chip area can be minimized. This step involves heavy manipulation of geometric objects. Depending on the movement these geometric objects are allowed, compaction can be categorized into 1-D compaction or 2-D compaction. However, the chip area for many of the designs are given as fixed. In this case, instead of compacting the design, intelligent allocation of white space can be adopted to further optimize certain metrics, e.g., routing congestion, maximum temperature, etc. 1.3 Hypergraph Circuit Model for Physical Design An integrated circuit is abstracted more accurately as a hypergraph than as a graph, because each of its nets may connect not just a pair of nodes but rather an arbitrarily large subset of nodes. The details of the abstraction depend on the point in the design flow where it is used. A generic definition of the hypergraph concept is given here. In later sections, specific instances of it are given for partitioning, placement, and routing. A hypergraph H = {V, E} consists of a set of vertices V = {fi,f2, •••Vn] and a set of hyperedges E = {ei, 6 2 , . . . , e ^ } . Each hyperedge Cj is just some subset of V, i.e., ej = {t'jijVjj, ...f^^} C V. Each hyperedge corresponds to some net in the circuit. Each vertex Vi may have weight wivi) associated with it, e.g., area; each hyperedge ej may have weight w{ej) associated with it, e.g., timing criticality. In either case, the hypergraph itself is said to be weighted as well. The number of vertices contained by e^ (we will also say "connected by" ej ) is called the degree of Cj and is denoted |ej|. The number of hyperedges containing vi is called the degree of Vi and is denoted \vi\. Every hypergraph, weighted or unweighted, has a dual. The dual hypergraph H' = {V, £"} of a given hypergraph H — {y, -B} is defined as follows. First, let V = E; if H is weighted, then let w{vl) = w{ei). Second, for each Vi & V, let e[ e E' be the set of Cj £ E that contain Vi. li H is weighted, then let w{e'j) = 'w{vi). It is straightforward to show that H", the dual of H', is isomorphic to H. 1.4 The Gigascale Challenge Since the early 1960s, the number of transistors in an integrated circuit has doubled roughly every 18 months. This trend, known as Moore's Law, is ex-

8

Tony F. Chan et al.

pected to continue into the next decade. Projected statistics from the 2003 International Technology Roadmap for Semiconductors (ITRS 2003) are summarized in Table 1. Production year DRAM 1/2 pitch (nm) M transistors/chip Chip size (mm'') Local clock (MHz) Wiring levels

2003 100 153 140 2976 13

2004 90 193 140 4171 14

2005 80 243 140 5204 15

2006 70 307 140 6783 15

2007 65 386 140 9285 15

2008 57 487 140 10972 16

2009 50 614 140 12369 16

Table 1. Circuit Statistics Projections from ITRS 2003 [itr].

Over 40 years of this exponential growth have brought enormous complexity to integrated circuits, several hundred million transistors integrated on a single chip. Although the power of physical-design algorithms has also increased over this period, evidence suggests that the relative gap between optimal and achievable widens with increasing circuit size (Section 1.5). As the number and heterogeneity of devices on chip continues to increase, so does the difficulty in accurately modeling and satisfying various manufacturability constraints. Typically, constraints in VLSICAD physical design are concerned with module nonoverlap ("overlap"), signal propagation times ("timing"), wiring congestion ("routability"), and maximum temperature. A detailed survey of the modeling techniques currently used for these conditions is beyond the scope of this chapter. However, the practical utility of any proposed algorithm rests largely in (a) its scalability and (b) its ability to incorporate constraint modeling efficiently and accurately at every step. Recent research [CCXOSb, JCX03] strongly suggests that, over the last few decades, advances in algorithms for physical design have not kept pace with increasing circuit complexity. These studies are reviewed next. 1.5 Quantifying the Optimality Gap As even the simplest formulations of core physical-design problems are NPhard [She99j, practical algorithms rely heavily on heuristics. Meaningful bounds on the deviation from optimal are not yet known for these algorithms as applied to the design of real circuits. However, a recent optimality study of VLSI placement algorithms shows a large gap between solutions from stateof-the-art placement algorithms and the true optima for a special class of synthetic benchmarks. In this section, the wirelength-driven placement problem is introduced for the purpose of summarizing this study. In Section 3, the role of placement in VLSICAD is considered in more detail, and some recent multiscale algorithms for it are reviewed.

Multiscale Optimization in VLSI Physical Design Automation

9

1.5.1 The Placement Model Problem In the given hypergraph-netlist representation H = {V, E) of an integrated circuit, we require for placement that each Vi has a given, fixed rectangular shape. We assume given a bounded rectangle R in the plane whose boundaries are parallel to coordinate axes x and y. The orientation of each Vi will also be assumed prescribed in alignment with the boundaries of R, although in some cases flipping Vi across coordinate axes may be allowed. The length of Vi along the a;-axis is called its width; its length along the y-axis is called its height. The vertices Vi are typically represented at some fixed level of abstraction. Possible levels are, from lowest to highest, transistors, logic gates, standard cells, or macros (cf. Section 1.2). As IC's become more heterogeneous, the mixed-size problem, in which elements from several of these levels are simultaneously placed, increases in importance. The coverage in this chapter assumes the usual level of standard cells. Interconnections among placed cells (Section 4) are ultimately made not only within R but also in multiple routing layers above R; each routing layer has the same x and y coordinates as R but a different z coordinate. For this reason, the total area of all Vi G V may range anywhere from 50% to 98% or more of the area of R. With multiple routing layers above placed cells, making metal connections between all the pins of a net can usually be accomplished within the bounding box of the net. Therefore, the most commonly used estimate of the length of wire i{ei) that will be required for routing a given net ej = {vi^ ,Vi^,... ,Vij} is simply the half-perimeter of its bounding box: i{ei) = (m&xx{vi^) - minx{vi^)j

+ f m a x y ( v i j - mmy{v,,)j

.

(1)

Wirelength-Driven Model Problem In the simplest commonly used abstraction of placement, total 2D-boundingbox wirelength is minimized subject to the pairwise nonoverlap, row-alignment, and placement-boundary constraints. Let w{R) denote the width (along the x-direction) of the placement region R, and let y i , y 2 , . . . ,F„^ denote the ycoordinates of its standard-cell rows' centers; assume every cell fits in every row. With {xi, yi) denoting the center of cell Vi and Wi denoting its width, this wirelength-driven form of placement may be expressed as ™n(j.._y,) I]eg£;iy(e)^(e)

for t{e) defined in (1)

subject to yj G {Yi,...,y„^} all ^i G V" 0 < Xj < w{R) - Wi/2 all Vi&V \xi -Xj\ > {wi + Wj)/2 or yi ^ j/j all Vi,Vj G V.

(2)

Despite its apparent simplicity, this formulation captures much of the difficulty in placement. High-quality solutions to (2) generally serve as useful starting

10

Tony F, Chan et al.

wm

mm \ r

F

\

i

w T)

H

i

Sii

^

n %

W

T J

imm^^m^^m:

m mm

:•;•:':

% :•:•:•.

•0

for aU i < j .

Quadratic wirelength is minimized subject to the pairwise nonoverlap constraints by a customized interior-point method with a slack variable added to the objective and the nonoverlap constraints to gradually remove overlap. Interestingly, experiments suggest that area variations among the disks can be ignored without loss in solution quality. That is, the radius of each disk can be set to the average over all the disks: pi = pj = p = {1/N) J2i Pk- After nonlinear programming, larger-than-average cells are chopped into average-size fragments, and an overlap-free configuration is obtained by linear assignment on the cells and cell fragments. Fragments of the same cell are then reunited, the area overflow incurred being removed by ripple-move cell propagation described below. Discrete Goto-based swaps are then employed as described below to further reduce wirelength prior to interpolation to the next level. Relaxation at each level therefore starts from a reasonably uniform area-density distribution of vertices.

32

Tony F. Chan et al.

A uniform bin grid is used to monitor the area-density distribution. The first four versions of mPL rely on two sweeps of relaxations on local subsets at all levels except the coarsest. These local-subset relaxations are described in the next two paragraphs. The first of these sweeps allows vertices to move continuously and is called quadratic relaxation on subsets (QRS). It orders the vertices by a simple depth-first search (DPS) on the netlist and selects movable vertices from the DPS ordering in small batches, one batch at a time. Por each batch the quadratic wirelength of all nets containing at least one of the movable vertices is minimized, and the vertices in the batch are relocated. Typically, the relocation introduces additional area congestion. In order to maintain a consistent area-density distribution, a "ripple-move" algorithm [HLOO] is applied to any overfull bins after QRS on each batch. Ripple-move computes a maximum-gain monotone path of vertex swaps along a chain of bins leading from an overfull bin to an underfuU bin. Keeping the QRS batches small facilitates the area-congestion control; the batch size is set to three in the reported experiments. After the entire sweep of QRS-Fripple-move, a second sweep of Goto-style permutations [GotSl] further improves the wirelength. In this scheme, vertices are visited one at a time in netlist order. Each vertex's optimal "Goto" location is computed by holding all its vertex neighbors fixed and minimizing the sum of the bounding-box lengths of all nets containing it. If that location is occupied by b, say, then b's optimal Goto location is similarly computed along with the optimal Goto locations of all of b's nearest neighbors. The computations are repeated at each of these target locations and their nearest neighbors up to a predetermined Hmit (3-5). Chains of swaps are examined by moving a to some location in the Manhattan unit-disk centered at b, and moving the vertex at that location to some location in the Manhattan unit disk centered at its Goto location, and so on. The last vertex in the chain is then forced into a's original location. If the best such chain of swaps reduces wirelength, it is accepted; otherwise, the search begins anew at another vertex. See Pigure 14.

^^ \, s\,

H I J

K

1

N

G K'' *^ J

Fig. 14. Goto-based discrete relaxation in mPL.

C

.D B F E

Multiscale Optimization in VLSI Physical Design Automation

33

In the most recent implementation of mPL [CCS05], global relaxations, in which all movable objects are simultaneously displaced, have been scalably incorporated at every level of hierarchy. The redistribution of smoothed area density is formulated as a Helmholtz equation subject to Neumann boundary conditions, the bins defining area-density constraints serving as a discretization. A log-sum-exp smoothing of half-perimeter wirelength defined in Section 3.4.2 below is the objective. Given an initial unconstrained solution at the coarsest level or an interpolated solution at finer levels, an Uzawa method is used to iteratively improve the configuration. Interpolation AMG-based weighted aggregation [BHMOO], in which each vertex may be fractionally assigned to several generalized aggregates rather than to just one cluster, has yet to be successfully apphed in the hypergraph context. The obstacle is that it is not known how to transfer the finer-level hyperedges, objectives, and constraints accurately to the coarser level in this case. AMGbased weighted disaggregation is simpler, however, it has been successfully apphed to placement in mPL. For each cluster at the coarser level, a C-point representative is selected from it as the vertex largest in area among those of maximal weighted hyperedge degree. C-points simply inherit their parent clusters' positions and serve as fixed anchors. The remaining vertices, called F-points, are ordered by nonincreasing weighted hyperedge degree and placed at the weighted average of their strong C-point neighbors and strong, already-placed F-point neighbors. This F-point repositioning is iterated a few times, but the C-points are held fixed all the while. Iteration Flow Two backtracking V-cycles are used (Figure 8). The first follows the connectivity-based FC clustering hierarchy described above. The second follows a similar FC-cluster hierarchy in which both connectivity and proximity are used to calculate vertex affinities:

- s cPl•.•c.^ E

w{e) (|ej-l)area(e)|j(a:i,yi)-(a;j,2/j-)ir

During this second aggregation, positions are preserved by placing clusters at the weighted average of their component vertices' positions. No nonUnear programming is used in the second cycle, because it alters the initial placement too much and degrades the final result.

34

Tony F. Chan et al.

3.3.3

mPG

As described in Section 4, the wires implementing netlist connections are placed not only in the same region containing circuit cells, but also in a set of 3-12 routing layers directly above the placement region. The bottom layers closest to the cells are used for the shortest connections. The top layers are used for global connections. These can be made faster by increasing wire widths and wire spacing. As circuit sizes continue to increase, so do both the number of layers and the competition for wiring paths at the top layers. Over half the wires in a recent microprocessor design are over 0.5 mm in length, while only 4.1% are below 0.5 mm in length [CCPY02]. While many simple, statistical methods for estimating routing congestion during placement exist {topology-free congestion estimation [LTKS02]) it is generally believed that, for a placement algorithm to consistently produce routable results, some form of approximate routing topology must be explicitly constructed during placement as a guide. The principal goal of mPG [CCPY02] is to incorporate fast, constructive routing-congestion estimates, including layer assignment, into a wirelength-driven, simulatedanneahng based multiscale placement engine. Compared to Gordian-L [SDJ91], mPG is 4-6.7 times faster and generates slightly better wirelength for test circuits with more than 100,000 cells. In congestion-driven mode, mPG reduces wiring overflow estimates by 45%~74%, with a 5% increase in wirelength compared to wirelength-driven mode, but 3-7% less wirelength after global routing. The results of the mPG experiments show that the multiscale placement framework is readily adapted to incorporate complex routability constraints effectively. Hierarchy construction mPG uses connectivity-driven, recursive first-choice clustering (FC, Section 2.2.1) to build its placement hierarchy on the netlist. The vertex affinity used to define clusters is similar to that used by mPL, as defined in (3). However, instead of matching a given vertex to its highest-affinity neighbor, mPG selects a neighbor at random from the those in the top 10% affinity. Moreover, the mPG vertex affinity does not consider vertex area. Experiments for mPG show that imposing explicit constraints on cluster areas in order to limit the cluster-area variation increases run time without significantly improving placement quality. The strategy in mPG is instead to allow unlimited variation in cluster areas, thereby reducing the number of cluster levels and allowing more computation time at each level. Large variations in cluster sizes are managable in mPG due to its hierarchical area-density model. Optimization at each level of the netlist-cluster hierarchy is performed over the exact same set of regular, uniform, nested bin-density grids. By gradually reducing the area overflow in bins at all scales from the size of the smaflest cells up to 1/4 the placement region, a sufficiently

Multiscale Optimization in VLSI Physical Design Automation

35

uniform distribution of cell areas is obtained for detailed placement. The same grid hierarchy is also used to perform fast incremental, global routing, including fast layer assignment, and to estimate routing congestion in each bin. In wirelength mode, the mPG objective is simply bounding-box wirelength, as in (1). In (routability) congestion mode, the objective is equivalent to a weighted-wirelength version of (1), in which the weight of a net is proportional to the sum of the estimated wire usages of the bins used by that net's rectilinear Steiner-tree routing. The congestion-based objective is used only at finer levels of the cluster hierarchy. Relaxation Relaxation in mPG is by simulated anneahng. Throughout the process, vertices are positioned only at bin centers. All vertex moves are discrete, from one bin center to another. At each step, a cluster is randomly selected. A target location for the cluster is then selected either (a) randomly within some range limit or (b) to minimize the total bounding-box wirelength of the nets containing it. The probability of selecting the target randomly is set to max{a, 0.6}, where a is the "acceptance ratio." The probability p of accepting a move with cost change AC is one if ziC < 0 and exp{-ZiC/T} if AC > 0, where T denotes the temperature. At the coarsest level k, the starting temperature is set to approximately 20 times the standard deviation of the cost changes of rifc random moves, where rii denotes the number of clusters at level i. At other levels, binary search is used to estimate a temperature for which the expected cost change is zero. These approximate "equilibrium temperatures" are used as starting temperatures for those levels. When accepting a move to a target location results in a density-constraint violation, an alternative location near the target can be found efficiently, if it exists, by means of the hierarchical bin-density structure. Annealing proceeds at a given temperature in sweeps of rij vertex moves as long as the given objective can be decreased. After a certain number of consecutive sweeps with net increase in the objective, the temperature is decreased by a factor /U = /u(a), in a manner similar to that used by Ultrafast VPR. The default stopping temperature is taken to be 0.005C/|£^|, where C is the objective value and \E\ is the number of nets at the current level. Interpolation Cluster components are placed concentrically at the cluster center. Iteration Flow One V-cycle is used: recursive netlist coarsening followed by one recursive interpolation pass from the coarsest to the finest level. Relaxation is used at each level only in the interpolation pass.

36 3.4

Tony F. Chan et al. Partitioning-based Methods

An apparent deficiency with clustering-based hierarchies is that, while they are observed to perform well on circuits that can be placed with mostly short, local connections, they may be less effective for circuits that necessarily contain a high proportion of relative long, global connections. A bottom-up approach to placement may not work well on a design formed from the top down. A placement hierarchy can also be constructed from the top down in an effort to better capture the global interconnections of a design. An aggregate need not be defined by recursive clustering; recursive circuit partitioning (Section 2) can be used instead. In this approach, the coarsest level is defined first as just two (or perhaps four) partition blocks. Each placement level is obtained from its coarser neighboring level by partitioning the partition blocks at that coarser level. Although partitioning's use as a means of defining a hierarchy for multiscale placement is relatively new, placement by recursive partitioning has a long tradition. The average number of interconnections between subregions is obviously correlated with total wirelength. A good placement can therefore be viewed as one requiring as few interconnections as possible between subregions. Minimizing cutsize is generally acknowledged as easier than placement, and, since the arrival of multiscale hypergraph partitioners hMetis and MLpart (Sections 2.2 and 2.3), little or no progress in partitioning tools has been made. Partitioning tools are thus generaUy seen as more mature than placement tools, whose development continues to progress rapidly. Placement algorithms based on partitioning gain some leverage from the superior performance of the state-of-the-art partitioning tools. 3.4.1

Dragon

Since its introduction in 2000 [WYSOOb], Dragon has become a standard for comparison among academic placers for the low wirelength and high routabihty of the placements it generates. Dragon combines a partitioningbased cutsize-driven optimization with wirelength-driven simulated anneahng on partition blocks to produce placements at each level. Like mPG and Ultrafast VPR, Dragon relies on simulated annealing as its principal means of intralevel iterative improvement. Unlike these algorithms. Dragon's hierarchy is ultimately defined by top-down recursive partitioning rather than recursive clustering. Heavy reliance on annealing slows Dragon's performance relative to other techniques and may diminish its scalabihty somewhat (Section 1.5). The flexibility, simplicity, and power of the annealing-based approach, however, makes Dragon adaptable to a variety of problem formulations [SWY02, XWCS03].

Multiscale Optimization in VLSI Physical Design Automation

37

Hierarchy construction Dragon's placement hierarchy is built from the top down. Initially, a outsizedriven quadrisection of the circuit is computed by hMetis (Section 2.2). Each of the four partition blocks is then viewed as an aggregate. The aggregate is given an area in proportion to its cell content, and the cells within each such aggregate are placed at the aggregate's center. Half-perimeter wirelength— driven annealing on these aggregates is then used to determine their relative locations in the placement region. Cutsize-driven quadrisection is then applied to each of the aggregates, producing 16 = 4 x 4 aggregates at the next level. Wirelength-driven annealing then determines positions of these new, smaller aggregates within some limited distance of their parent aggregates' locations. This sequence of cutsize-driven subregion quadrisection followed by wirelength-driven positioning continues until the number of cells in each partitioning block is approximately 7. At that point, greedy heuristics are used to obtain a final, overlap free placement. When a given partition block Bi is quadrisected, the manner in which its connections to other partition blocks Bj at the same level are modeled may have considerable impact. In standard non-multiscale approaches, various forms of terminal propagation (Section 3.2) are the most effective known technique. Experiments reported by Dragon's authors indicate, however, that terminal propagation is inappropriate in the multiscale setting, where entire partition blocks are subsequently moved. Among a variety of attempted strategies, the one ultimately selected for Dragon is simply to replace any net containing cells in both B^ and other partition blocks by the ceUs in Bi contained by that net. Thus, during quadrisection, all connections within Bi are preserved, but connections to external blocks are ignored. Connections between blocks are accurately modeled only during the relaxation phase described below. Although hMetis is a multiscale partitioner and therefore generates its own hierarchy by recursive clustering, Dragon makes no exphcit use of hMetis's clustering hierarchy it its top-down phase. Instead, Dragon uses the final result of the partitioner as a means of defining a new hierarchy from the top down. It is this top-down, partitioning-based hierarchy which defines the placement problems to which Dragon's wirelength-driven relaxations are applied. Relaxation Low-temperature wirelength-driven simulated annealing is used to perform pairwise swaps of nearby partition blocks. These blocks are not required to remain within the boundaries of their parent blocks. Thus, relaxation at finer levels can to some extent correct premature decisions made at earlier, coarser levels. However, to control the run time, the range of moves that can be considered must be sharply limited.

38

Tony F. Chan et al.

After the final series of quadrisections, four stages proceed to a final placement. First, annealing based on swapping cells between partition blocks is performed. Second, area-density balancing linear programming [CXWS03] is used. Third, cells are spread out to remove all overlap. Finally, small permutations of cells are greedily considered separately along horizontal and vertical directions from randomly selected locations. Interpolation The top-down hierarchy construction completely determines the manner in which a coarse-level solution is converted to a solution at the adjacent finer level. First, the cutsize-driven netUst quadrisection performed by hMetis divides a partition-block aggregate into four equal-area subblocks. Second, the annealing-based relaxation is used to assign each subblock to a subregion. The initial assignment of subregions to subblocks is unspecified, but, due to the distance-limited annealing that follows, it may be taken simply as a collection of randomly selected 4-way assignments, each of these made locally within each parent aggregate's subregion. As stated above, the ultimate subregion selected for a subblock by the annealing need not belong to the region associated with its parent aggregate. Iteration Flow Dragon's flow proceeds top-down directly from the coarsest level to the finest level. Because the multiscale hierarchy is constructed from the top down, there is no explicit bottom-up phase and thus no notion of V-cycle etc. 3.4.2

Aplace

In VLSICAD, the word "analytical" is generally used to describe optimization techniques relying on smooth approximations. Among these methods, forcedirected algorithms [QB79, EJ98] model the hypergraph netlist as a generalized spring system and introduce a scalar potential field for area density. A smooth approximations to half-perimeter wirelength (1) is minimized subject to implicit bound constraints on area-density. Regions of high cell-area density are sources of cell-displacement force gradients, regions of low cell-area density are sinks. The placement problem becomes a search for equilibrium states, in which the tension in the spring system is balanced by the area-displacement forces. Aplace [KW04] (the 'A' stands for "analytic") is essentially a multiscale implementation of this approach. The wirelength model for a net t consisting of pin locations^''-* t = {{xi,yi)\i = 1 , . . . , deg(t)} follows a well-known logsum-exp approximation. *'*' A pin is a point on a cell where a net is connected to the cell.


ℓ_exp(t) = α ( ln(Σ_i e^{x_i/α}) + ln(Σ_i e^{−x_i/α}) + ln(Σ_i e^{y_i/α}) + ln(Σ_i e^{−y_i/α}) ),   (4)

where α is a smoothing parameter. To estimate area densities, uniform grids are used. Each subregion of a grid G is called a bin (the word "cell" is reserved for the movable subcircuits being placed). The potential contributed by a cell v is scaled so that it sums to area(v), and

p(d) = 1 − 2d²/r²      if 0 ≤ d ≤ r/2,
p(d) = 2(d − r)²/r²    if r/2 ≤ d ≤ r,

where d is the distance from the cell to the bin and r is the radius of the potential. The potential φ at any bin b is then defined as the sum of the potentials

≥ R², for the rest i < j,   ‖a_k − x_j‖² ≥ R², for the rest k, j,   α_ij ≥ 0, α_kj ≥ 0.

Let X = [x_1 x_2 ... x_n] be the 2 × n matrix that needs to be determined. Then

‖x_i − x_j‖² = e_ij^T X^T X e_ij,
‖a_k − x_j‖² = (a_k; e_j)^T [I X]^T [I X] (a_k; e_j),

where e_ij is the vector with 1 at the ith position, −1 at the jth position and zero everywhere else; and e_j is the vector of all zeros except −1 at the jth position. Let Y = X^T X. Then the problem can be rewritten as:

minimize    Σ_{i,j ∈ N_1, i<j} α_ij + Σ_{k,j ∈ N_2} α_kj
subject to  e_ij^T Y e_ij = d_ij² + α_ij,   ∀ i < j ∈ N_1,
            (a_k; e_j)^T [ I  X ; X^T  Y ] (a_k; e_j) = d̂_kj² + α_kj,   ∀ k, j ∈ N_2,
            e_ij^T Y e_ij ≥ R²,   ∀ i < j ∉ N_1,
            (a_k; e_j)^T [ I  X ; X^T  Y ] (a_k; e_j) ≥ R²,   ∀ k, j ∉ N_2,
            Y = X^T X,
            α_ij ≥ 0, α_kj ≥ 0.                                          (2)
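The following small numpy check illustrates the notation used above: for randomly chosen positions, e_ij^T (X^T X) e_ij indeed equals ‖x_i − x_j‖². The variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
X = rng.random((2, n))           # columns are sensor positions x_1, ..., x_n
Y = X.T @ X

i, j = 1, 3
e_ij = np.zeros(n)
e_ij[i], e_ij[j] = 1.0, -1.0     # 1 at position i, -1 at position j, zero elsewhere

lhs = e_ij @ Y @ e_ij
rhs = np.linalg.norm(X[:, i] - X[:, j]) ** 2
print(np.isclose(lhs, rhs))      # True
```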

Unfortunately, the above problem is not a convex optimization problem. Doherty et al. [DGP01] ignore the non-convex inequality constraints but keep the convex ones, resulting in a convex second-order cone optimization problem. A drawback of their technique is that all position estimates lie in the convex hull of the known points. Others have essentially used various types of


nonlinear-equation and optimization solvers to solve similar quadratic models, where the final solutions are highly dependent on the initial solutions and search directions. Our method is to relax problem (1) to a semidefinite program: the nonconvex constraint Y = X^T X in (2) is relaxed to the semidefinite condition Y ⪰ X^T X.
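The sketch below illustrates this standard SDP relaxation on a tiny synthetic instance. It is only an illustration: the solver stack (cvxpy with its default SDP solver), the instance size, and the variable names are our choices, and the distances are taken noise-free, so the slack variables α of (2) are omitted.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, m = 6, 3                                    # 6 sensors with unknown positions, 3 anchors
truth = rng.random((2, n)) - 0.5               # "true" positions, used only to generate distances
anchors = rng.random((2, m)) - 0.5

# Z plays the role of [[I, X], [X^T, Y]]; Z >= 0 with Z[:2,:2] = I relaxes Y = X^T X.
Z = cp.Variable((2 + n, 2 + n), PSD=True)
cons = [Z[:2, :2] == np.eye(2)]

def qform(u):
    """u^T Z u written as an elementwise sum, so it stays affine in Z."""
    return cp.sum(cp.multiply(np.outer(u, u), Z))

for i in range(n):
    for j in range(i + 1, n):                  # sensor-sensor distance constraints
        u = np.zeros(2 + n); u[2 + i], u[2 + j] = 1.0, -1.0
        cons.append(qform(u) == np.sum((truth[:, i] - truth[:, j]) ** 2))
for k in range(m):
    for j in range(n):                         # anchor-sensor distance constraints
        v = np.concatenate([anchors[:, k], np.zeros(n)]); v[2 + j] = -1.0
        cons.append(qform(v) == np.sum((anchors[:, k] - truth[:, j]) ** 2))

cp.Problem(cp.Minimize(0), cons).solve()
X_est = Z.value[:2, 2:]                        # recovered sensor positions
print(np.round(np.abs(X_est - truth).max(), 3))
```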


Fig. 9. Third-round position estimations in the 4,000-sensor network, noisy-factor=0, radio-range=.035, and the number of clusters=100.

It is interesting to note that the erroneous points are concentrated within particular regions. This clearly indicates that the clustering approach prevents the propagation of errors to other clusters. Again, see Figure 11 for the correlation between the individual error offset (blue diamond) and the square root of the trace (red square) for a few sensors whose trace is higher than 0.008 · (1 + noisy-factor) · radio-range after the final round of the third simulation.

5 Work in Progress

The current clustering approach assumes that the anchor nodes are more or less uniformly distributed over the entire space. So, by dividing the entire space into smaller square clusters, the number of anchors in each cluster is also more or less the same. However, this may or may not be the case in a real scenario. A better approach would be to create clusters more intelligently based on local connectivity information. Keeping this in mind, we try to find for each sensor


Fig. 10. Fifth and final round position estimations in the 4,000 sensor network, noisy-factor=0, radio-range=.035, and the number of clusters=100.


Fig. 11. Diamond: the offset distance between estimated and true positions; Square: the square root of the individual trace (5) for the 4,000 sensor network.


its immediate neighborhood, that is, the points within radio range of it. It can be said that such points are within one hop of each other. Higher degrees of connectivity between different points can also be evaluated by calculating the minimum number of hops between the two points. Using the hop information, we propose to construct clusters which are not necessarily of any particular geometric configuration but are defined by their connectivity with neighborhood points. Such clusters would yield much more efficient SDP models and faster and more accurate estimations.
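A minimal sketch of the connectivity information such a scheme would need is shown below: one-hop neighborhoods and all-pairs hop counts computed by breadth-first search. The function names and the way a cluster is formed from the hop counts are our own illustrative choices, not the authors' algorithm.

```python
import numpy as np
from collections import deque

def hop_distances(positions, radio_range):
    """All-pairs hop counts in the connectivity graph: two nodes are adjacent
    (one hop apart) if they lie within radio range of each other."""
    n = positions.shape[1]
    d = np.linalg.norm(positions[:, :, None] - positions[:, None, :], axis=0)
    adj = [np.flatnonzero((d[i] <= radio_range) & (np.arange(n) != i)) for i in range(n)]
    hops = np.full((n, n), np.inf)
    for s in range(n):                       # BFS from every node
        hops[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if hops[s, v] == np.inf:
                    hops[s, v] = hops[s, u] + 1
                    queue.append(v)
    return hops

rng = np.random.default_rng(2)
pts = rng.random((2, 40)) - 0.5
h = hop_distances(pts, radio_range=0.25)
cluster_of_node_0 = np.flatnonzero(h[0] <= 2)   # e.g. all nodes within two hops of node 0
print(cluster_of_node_0)
```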

6 Concluding Remarks

The distributed SDP approach solves, with great accuracy and speed, very large estimation problems that would otherwise be extremely time-consuming in a centralized approach. Also, because the clusters are smaller and independent, noise or error propagation is quite limited, as opposed to centralized algorithms. In fact, the trace error (5) provides us with a very reliable measure of how accurate an estimation is; it is used both to discard estimations that may be very inaccurate and to identify good estimations that may be used in future estimations. This distributed algorithm is particularly relevant in the ad hoc network scenario, where so much emphasis is given to decentralized computation schemes.

References

[BYZ03] S. J. Benson, Y. Ye and X. Zhang. DSDP, http://www-unix.mcs.anl.gov/~benson/ or http://www.stanford.edu/~yyye/Col.html, 1998-2003.
[BY98] D. Bertsimas and Y. Ye. Semidefinite relaxations, multivariate normal distributions, and order statistics. Handbook of Combinatorial Optimization (Vol. 3), D.-Z. Du and P.M. Pardalos (Eds.), pp. 1-19, Kluwer Academic Publishers, 1998.
[BY04] P. Biswas and Y. Ye. Semidefinite programming for ad hoc wireless sensor network localization. Proc. IPSN'04, 2004.
[BEF94] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory. SIAM, 1994.
[BHE00] N. Bulusu, J. Heidemann, and D. Estrin. GPS-less low cost outdoor localization for very small devices. TR 00-729, Computer Science, University of Southern California, April 2000.
[DGP01] L. Doherty, L. E. Ghaoui, and K. Pister. Convex position estimation in wireless sensor networks. Proc. Infocom 2001, Anchorage, AK, April 2001.
[GKW02] D. Ganesan, B. Krishnamachari, A. Woo, D. Culler, D. Estrin, and S. Wicker. An empirical study of epidemic algorithms in large scale multihop wireless networks. UCLA/CSD-TR-02-0013, Computer Science, UCLA, 2002.
[HB01] J. Hightower and G. Boriello. Location systems for ubiquitous computing. IEEE Computer, 34(8) (2001) 57-66.
[HMS01] A. Howard, M. J. Mataric, and G. S. Sukhatme. Relaxation on a mesh: a formalism for generalized localization. Proc. IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems (IROS01) (2001) 1055-1060.
[NN01] D. Niculescu and B. Nath. Ad-hoc positioning system. IEEE GlobeCom, Nov. 2001.
[SRL02] C. Savarese, J. Rabaey, and K. Langendoen. Robust positioning algorithm for distributed ad-hoc wireless sensor networks. USENIX Technical Annual Conf., Monterey, CA, June 2002.
[SHS01] A. Savvides, C. C. Han, and M. Srivastava. Dynamic fine-grained localization in ad hoc networks of sensors. ACM/IEEE Int'l Conf. on Mobile Computing and Networking (MOBICOM), July 2001.
[SHS02] A. Savvides, H. Park, and M. Srivastava. The bits and flops of the n-hop multilateration primitive for node localization problems. 1st ACM Int'l Workshop on Wireless Sensor Networks and Applications (WSNA '02), 112-121, Atlanta, 2002.
[SRZ03] Y. Shang, W. Ruml, Y. Zhang, and M. Fromherz. Localization from mere connectivity. MobiHoc'03, Annapolis, Maryland, June 2003.
[Stu01] J. F. Sturm. Let SeDuMi seduce you, http://fewcal.kub.nl/sturm/software/sedumi.html, October 2001.
[XY97] G. L. Xue and Y. Ye. An efficient algorithm for minimizing a sum of Euclidean norms with applications. SIAM Journal on Optimization 7 (1997) 1017-1036.

Optimization Algorithms for Sparse Representations and Applications

Pando G. Georgiev¹, Fabian Theis², and Andrzej Cichocki³

¹ Laboratory for Advanced Brain Signal Processing, Brain Science Institute, RIKEN, Wako-shi, Japan. Current address: ECECS Department, University of Cincinnati, ML 0030, Cincinnati, Ohio 45221-0030, USA. pgeorgie@ececs.uc.edu
² Institute of Biophysics, University of Regensburg, D-93040 Regensburg, Germany. fabian@theis.name
³ Laboratory for Advanced Brain Signal Processing, Brain Science Institute, RIKEN, Wako-shi, Japan. cia@bsp.brain.riken.jp

Summary. We consider the following sparse representation problem, which is called Sparse Component Analysis: identify the matrices S ∈ IR^{n×N} and A ∈ IR^{m×n} (m ≤ n < N) uniquely (up to permutation and scaling), knowing only their product X = AS, under some conditions, expressed either in terms of A and the sparsity of S (identifiability conditions), or in terms of X (Sparse Component Analysis conditions). A crucial assumption (sparsity condition) is that S is sparse of level k in the sense that each column of S has at most k nonzero elements (k = 1, 2, ..., m − 1). We present two types of optimization problems for such identification. The first is used for identifying the mixing matrix A: this is a typical clustering-type problem aimed at finding hyperplanes in IR^m which contain the columns of X. We present a general algorithm for this clustering problem and a modification of Bradley-Mangasarian's k-planes clustering algorithm for data allowing reduction of this problem to an orthogonal one. The second type of problem is that of identifying the source matrix S. This corresponds to finding a sparse solution of a linear system. We present a source recovery algorithm, which allows us to treat the underdetermined case. Applications include Blind Signal Separation of underdetermined linear mixtures of signals in which the sparsity is either given a priori, or obtained with some preprocessing techniques such as wavelets, filtering, etc. We apply our orthogonal m-planes clustering algorithm to fMRI analysis.

Keywords: Sparse Component Analysis, Blind Source Separation, underdetermined mixtures


1 Introduction

One of the fundamental questions in data analysis, signal processing, data mining, neuroscience, etc. is how to represent a large data set X (given in the form of an (m × N)-matrix) in different ways. A simple approach is a linear matrix factorization:

X = AS,   A ∈ IR^{m×n}, S ∈ IR^{n×N},   (1)

where the unknown matrices A (dictionary) and S (source signals) have some specific properties, for instance:
1) the rows of S are (discrete) random variables, which are statistically independent as much as possible - this is the Independent Component Analysis (ICA) problem;
2) S contains as many zeros as possible - this is the sparse representation or Sparse Component Analysis (SCA) problem;
3) the elements of X, A and S are nonnegative - this is Nonnegative Matrix Factorization (NMF) (see [LS99]).
There is a large number of papers devoted to ICA problems (see for instance [CA02], [HKO01] and references therein), but mostly for the case m ≥ n. We refer to [BZ01, LLGS99, TLP03, WS03, ZP01] and references therein for some recent papers on SCA and underdetermined ICA (m < n).
A related problem is the so-called Blind Source Separation (BSS) problem, in which we know a priori that a representation such as in equation (1) exists and the task is to recover the sources (and the mixing matrix) as accurately as possible. A fundamental property of the complete BSS problem is that such a recovery (under the assumptions in 1) and non-Gaussianity of the sources) is possible up to permutation and scaling of the sources, which makes the BSS problem so attractive.
In this paper we consider the SCA and BSS problems in the underdetermined case (m < n, i.e. more sources than sensors, which is a more challenging problem), where the additional information compensating for the limited number of sensors is the sparseness of the sources. It should be noted that this problem is quite general and fundamental, since the sources need not be sparse in the time domain. It would be sufficient to find a linear transformation (e.g. wavelet packets) in which the sources are sufficiently sparse.
In the sequel, we present new algorithms for solving the BSS problem: a matrix identification algorithm and a source recovery algorithm, under the conditions that the source matrix S has at most m − 1 nonzero elements in each column and that the identifiability conditions are satisfied (see Theorem 1). We demonstrate the effectiveness of our general matrix identification algorithm and the source recovery algorithm in the underdetermined case for 7 artificially created sparse source signals, such that the source matrix S has at most 2 nonzero elements in each column, mixed with a randomly generated (3 × 7) matrix. For comparison, we present a recovery using ℓ1-norm minimization


[CDS98], [DE03], which gives signals that are far from the original ones. This implies that the conditions which ensure equivalence of ℓ1-norm and ℓ0-norm minimization ([DE03], Theorem 7) are generally not satisfied for randomly generated matrices. Note that ℓ1-norm minimization gives solutions which have at most m nonzeros [CDS98], [DE03]. Another connection with [DE03] is the fact that our algorithm for source recovery works "with probability one", i.e. for almost all data vectors x (in the measure sense) such that the system x = As has a sparse solution with fewer than m nonzero elements, this solution is unique, while in [DE03] the authors proved that for all data vectors x such that the system x = As has a sparse solution with fewer than Spark(A)/2 nonzero elements, this solution is unique. Note that Spark(A) ≤ m + 1, where Spark(A) is the smallest number of linearly dependent columns of A.
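For comparison purposes, the ℓ1-norm minimization mentioned above can be posed as a linear program using the standard splitting s = s⁺ − s⁻. The sketch below (scipy's linprog, variable names ours) is only an illustration of that baseline, not the authors' source recovery algorithm.

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(A, x):
    """Solve min ||s||_1 subject to A s = x via the LP split s = s_plus - s_minus."""
    m, n = A.shape
    c = np.ones(2 * n)
    A_eq = np.hstack([A, -A])
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(3)
m, n = 3, 7
A = rng.standard_normal((m, n))
s_true = np.zeros(n); s_true[[1, 4]] = [1.0, -2.0]   # 2 = m - 1 nonzeros
x = A @ s_true
print(np.round(l1_recover(A, x), 3))   # has up to m nonzeros and may differ from s_true
```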

2 Blind Source Separation

In this section we develop a method for solving the BSS problem if the following assumptions are satisfied:
A1) the mixing matrix A ∈ IR^{m×n} has the property that any square m × m submatrix of it is nonsingular;
A2) each column of the source matrix S has at most m − 1 nonzero elements;
A3) the sources are sufficiently richly represented in the following sense: for any index set of n − m + 1 elements I = {i_1, ..., i_{n−m+1}} ⊂ {1, ..., n} there exist at least m column vectors of the matrix S such that each of them has zero elements in places with indexes in I and each m − 1 of them are linearly independent.

2.1 Matrix identification

We describe conditions in the sparse BSS problem under which we can identify the mixing matrix uniquely up to permutation and scaling of the columns. We give two types of such conditions. The first corresponds to the least sparse case in which such identification is possible. Further, we consider the sparsest case (for a small number of samples), as in this case the algorithm is much simpler.

2.1.1 General case - full identifiability

Theorem 1 (Identifiability conditions - general case) Assume that in the representation X = AS the matrix A satisfies condition A1), the matrix S satisfies conditions A2) and A3) and only the matrix X is known. Then the mixing matrix A is identifiable uniquely up to permutation and scaling of the columns.


Proof. It is clear that any column a_j of the mixing matrix lies in the intersection of all (n−1 choose m−2) hyperplanes generated by those columns of A in which a_j participates. We will show that these hyperplanes can be obtained from the columns of the data X under the conditions of the theorem.
Let 𝒥 be the set of all subsets of {1, ..., n} containing m − 1 elements and let J ∈ 𝒥. Note that 𝒥 consists of (n choose m−1) elements. We will show that the hyperplane (denoted by H_J) generated by the columns of A with indexes from J can be obtained from some columns of X. By A2) and A3), there exist m indexes {t_k}_{k=1}^m ⊂ {1, ..., N} such that any m − 1 vector columns of {S(:, t_k)}_{k=1}^m form a basis of the (m − 1)-dimensional coordinate subspace of IR^n with zero coordinates given by {1, ..., n} \ J. Because of the mixing model, vectors of the form v_k = Σ_{j∈J} S(j, t_k) a_j, k = 1, ..., m, belong to the data matrix X. Now, by condition A1) it follows that any m − 1 of the vectors {v_k}_{k=1}^m are linearly independent, which implies that they span the same hyperplane H_J.
By A1) and the above, it follows that we can cluster the columns of X in (n choose m−1) groups ℋ_k, k = 1, ..., (n choose m−1), uniquely, such that each group ℋ_k contains at least m elements and they span one hyperplane H_{J_k}, for some J_k ∈ 𝒥. Now we cluster the hyperplanes obtained in such a way in the smallest number of groups such that the intersection of all hyperplanes in one group gives a single one-dimensional subspace. It is clear that such a one-dimensional subspace will contain one column of the mixing matrix, the number of these groups is n, and each group consists of (n−2 choose m−2) hyperplanes.
The proof of this theorem gives the idea for the matrix identification algorithm.

Algorithm 2.1 SCA matrix identification algorithm
Data: samples x(1), ..., x(N) of X
Result: estimated mixing matrix A
Hyperplane identification.
1 Cluster the columns of X in (n choose m−1) groups ℋ_k, k = 1, ..., (n choose m−1), such that the span of the elements of each group ℋ_k produces one hyperplane and these hyperplanes are different.
Matrix identification.
2 Cluster the normal vectors to these hyperplanes in the smallest number of groups G_j, j = 1, ..., n (which gives the number of sources n) such that the normal vectors to the hyperplanes in each group G_j lie in a new hyperplane Ĥ_j.

Algorithm 2.1 SCA matrix identification algorithm Data: samples x ( l ) , . . . , x ( r ) o / X Result: estimated mixing matrix A Hyperplane i d e n t i f i c a t i o n . 1 Cluster the columns ofK in ( „ 1 j ) groups ?ik,k = I,..., i^_^] such that the span of the elements of each group Tik produces one hyperplane and these hyperplanes are different. Matrix i d e n t i f i c a t i o n . 2 Cluster the normal vectors to these hyperplanes in the smallest number of groups Gj,j = l,...,n (which gives the number of sources n) such that the normal vectors to the hyperplanes in each group Gj lie in a new hyperplane

Optimization Algorithms for Sparse Representations and Applications

89

3 Calculate the normal vectors kj to each hyperplane Hj,j = l,...,n. 4 The matrix A with columns a.j is an estimation of the mixing matrix (up to permutation and scaling of the columns). Remark The above algorithm works for data for which we know a priori that they he on hyperplanes (or near to hyperplanes). 2.2 Identification of sources Theorem 2 (Uniqueness of sparse representation) Let Ti be the set of all X £ IR"* such that the linear system As = x has a solution with at least n — m + k zero components. If A fulfills Al), then there exists a subset TLQ C Ji with measure zero with respect to H, such that for every x £ Ti\Ho this system has no other solution with this property. Proof. Obviously W is the union of all ( „ " ^.) = (m-k) 0 and some /i G [0,1). If condition (12) does not hold, we assume, as in [FGLTW02], that the computation of tk is unlikely to produce a satisfactory decrease in ruk, and proceed just as if the feasible set of TRQF{xk, Ak) were empty. If Uk can be computed and (12) holds, TRQP(a;^:,zifc) is said to be compatible for /x . In this sufficient model decrease seems possible. We formalize this notion in the form of a familiar Cauchy-point condition, and, recalling that the feasible set of QP(x'fc) is convex, we introduce the first-order criticality measure Xk = \

min cxixk) +

{gk + Hknk,t)\

(13)

Ax{xk)(nk+t)>0

l|tllQ,

(14)

which is, up to the constant term ^{nk,Hknk), equivalent to Q,P{xk) with s = Uk + t. The sufficient decrease condition then consists in assuming that there exists a constant K,,,„d > 0 such that mk{Xk) -rukixk

+tk) > KtmdXfc]

Xk ^ Pk

(15)

whenever TRQP(xfc, Z\/c) is compatible, where /3fc = 1 -t- ||i:f/c||. We know from [Toi88] and [CGST93] that such steps can be computed, even if we recognise that (15) may be difficult to expficitly verify in practice for large problems.

Non-monotone Trust-Region Filter Algorithm

129

2,2 The restoration procedure If TRQP(a;fc, Ak) is not compatible for /x, that is when the feasible set determined by the constraints of QP{xk) is empty, or the freedom left to reduce ruk within the trust region is too small in the sense that (12) fails, we must consider an alternative. Observe that, if 9{xk) is sufficiently small and the true nonlinear constraints are locally compatible, the linearized constraints should also be compatible, since they approximate the nonlinear constraints (locally) correctly. Furthermore, the feasible region for the linearized constraints should then be close enough to Xk for there to be some room to reduce m^, at least if Z\fc is large enough. If the nonlinear constraints are locally incompatible, we have to find a neighbourhood where this is not the case, since the problem (1) does not make sense in the current one. As in [FGLTW02], we rely on a restoration procedure. The aim of this procedure is to produce a new point Xk + fk that satisfies two conditions: we require TRQP(a:;fe + r/j, ^fe+i) to be compatible for some Ak+i > 0, and also require that x/^ + r/. be acceptable, in the sense that we discuss in the Section 2.3.3 (precisely, we require that either (20) or (21) holds for such an x j ) . In what follows, we wiU denote TZ = {k \ Uk does not satisfy (9) or ||nfc|j > K^Zifcmin[l,K^iZ\j!]}, the set of restoration iterations. The idea of the restoration procedure is to (approximately) solve min 9(x) (16) xeIR" starting from Xk, the current iterate. This is a non-smooth problem, but there exist methods, possibly of trust-region type (such as that suggested by [Yua94]), which can be successfully applied to solve it. Thus we wih not describe the restoration procedure in detail. Note that we have chosen here to reduce the infinity norm of the constraint violation, but we could equally well consider other norms, such as ii or £2, in which case the methods of [FL98] or of [HT95] and [DAW99] can respectively be considered. Of course, this technique only guarantees convergence to a first-order critical point of the chosen measure of constraint violation, which means that, in fact, the restoration procedure may fail as this critical point may not be feasible for the constraints of (1). However, even in this case, the result of the procedure is of interest because it typicaUy produces a local minimizer of 9{x), or of whatever other measure of constraint violation we choose for the restoration, yielding a point of locally-least infeasibility. There seems to be no easy way to circumvent this drawback, as it is known that finding a feasible point or proving that no such point exists is a global optimization problem and can be as difficult as the optimization problem (1) itself. One therefore has to accept two possible outcomes of the restoration procedure: either the procedure fails in that it does not produce a sequence of iterates converging to feasibility, or a point Xk + rk is produced such that 6{xk + rk) is as small as desired.

130

Nicholas I. M. Gould and Philippe L. Toint

2.3 The filter as a criterion to accept trial points Unfortunately, because the SQP iteration may only be locally convergent, the step Sfc or rk may not always be very useful. Thus, having computed a step Sk or rk from our current iterate Xk, we need to decide whether the trial point x^, defined by ^+ def (xk+Vk if ken, ,^^. '^ \xk + Sk otherwise ^ ' is any better than Xk as an approximate solution to our original problem (1). If we decide that this is the case, we say that iteration k is successful and choose x'^ as our next iterate. Let us denote by S the set of (Indices of) all successful iterations, that is S = {k\ Xk+i = a ; ^ } . We will discuss the details of precisely when we accept x'^ as our next iterate in Section 2.3.3, but note that an important ingredient in the process is the notion of a filter, a notion itself based on that of dominance. We say that a point xi dominates a point X2 whenever 0{xi) < e{x2) and fixi)

< /(xz).

Thus, if iterate Xk dominates iterate Xj, the latter is unlikely to be of real interest to us since Xk is at least as good as Xj on account of both feasibility and optimality. All we need to do now is to remember iterates that are not dominated by other iterates using a structure called a filter. A filter is a list J^ of pairs of the form {Oi, fi) such that either %< dj or f, < fj for i 7^ j . [FGLTW02] propose to accept a new trial iterate Xk + Sk only if it is not dominated by any other iterate in the filter and Xk- In the vocabulary of multi-criteria optimization, this amounts to building elements of the efficient frontier associated with the bi-criteria problem of reducing infeasibihty and the objective function value. We may describe this concept by associating with each iterate Xk its {9, /)-pair {9k, fk) and might choose to accept Xk + Sfc only if its {9, /)-pair does not lie, in the two-dimensional space spanned by constraint violation and objective function value, above and on the right of a previously accepted pair. If we define p(jr) = {(5)^ f)\9>9j

and / > fj for some j e J"},

(18)

the part of the (9, /)-space that is dominated by the pairs in tiie filter, this amounts to say that xjj" could be accepted if {9{x'^),f{x'l)) 0 V{J^k)i where jFfc denotes the filter at iteration k.

Non-monotone Trust-Region Filter Algorithm

131

2.3.1 The contribution of a trial point to the filter However, we may not wish to accept a new point x^ if its {9, /)-pair

{Ohf^) = {9{xt),fi4)) is arbitrarily close to being dominated by another point already in the filter. [FGLTW02], as all other theoretical analysis of the filter tiiat we know of, set a small "margin" around the border of T>{rk) in which trial points are also rejected. We follow here a different idea and define, for any [9, /)-pair, an area that represents a sufficient part of its contribution to the area of 'D{J-k)- For this purpose, we partition the right half-plane [0, -l-oo] x [—oo, -|-oo] into four different regions (see Figure 1).

fix)

NWiJ'k) e^k

v{rk)

n

C'max

, min

e[x] SW{J^k)

ff SE{Tk)

Fig. 1. The partition of the right half-plane for a filter Tk containing four {0,f) pairs.

If we define ViTkf'

to be the complement of P(J^fe) in the right half-plane,

0'mill ^ t , / =' m" i" n" ^",J,i 3^J^k

"C m aLx =~' m' -a. "x" ^0" :,( ), J&J^k

and /min = m m / j l^J^k

f:^l^ - m a x / , , O^-Fk

132

Nicholas I. M. Gould and Philippe L, Toint

these four parts are 1. the dominated part of the filter, P(^fe). 2. the undominated part of lower left (south-west) corner of the half plane, SW{:F,)

^^ V{Tkf

n [0,eax] x [-oo,/;^^J,

3. the undominated upper left (north-west) corner,

4. the undominated lower right (south-east) corner, 5i?(^fc) = ' ( C L , + o o ] x [ - o o , / ; ^ f ; , ) . Consider first a trial iterate x^ with its associated {6, /)-pair {0^,fi^) with 0'^ > 0. If the filter is empty (JF^. = 0), then we measure its contribution to the area of the filter by the simple formula

for some constant Kp > 0. If the filter already contains some past iterates, we measure the contribution of xt to the area of the filter by

4

a{x+,Tk)

'^' area(^P(^fc)''n[0+, C * „ , + « , ] x [ / + , / ; ^ L + « K ] )

if (^^4+) G SW{:Fk

by

a{xt,:Fk) = K,{e^L - Gt) if {etJi) e NW{n), by a ( 4 , n ) "^ K.{f^,^ - / + )

if (0+,/+) G SE{J^,k)

and by a{xl,T,)

"^ -area(^2?(^On[0fc+-e:t„]x[/+-/^|=jj,

if ( e + , / + ) G V{Tu),

where Vk ='' m,fi)

Gn

I ^i < 0fe+ and /j < / + } ,

(the set of filter pairs that dominate (0^, / ^ ) ) , and ^min =^ min Oj,

e'^i^ ='' rnax 9j.

Figure 2 iUustrates the corresponding areas in the filter for four possible (0^,4+) pairs (in V{Tk), SWi^k), NWiJ'k) and SEiJ'k)) to the area of the filter. Horizontally dashed surfaces indicate a positive contribution and

Non-monotone Trust-Region Filter Algorithm

133

vertically dashed ones a negative contribution. Observe that, by construction, the assigned areas a{x'^,!Fk) for nondominated points are all disjoint and that the negative area for dominated points is chosen such that 2?(^fe) is updated correctly. Also note that a(x, JF) is a continuous function of {0{x), f{x)), and thus of x, for a given filter J-". Furthermore, a{x,!F) is identically zero if {d{x),f{x)) is on the boundary of the dominated region P ( ^ ) . Also note that, although seemingly complicated, the value of a{x,!F) is not difficult to compute, since its calculation requires, in the worst case, us to consider all the points currently in the filter only once.

/ m a x "f~ ^ F

J max

fVk /max

(9

f )

Fig. 2. The contributions of four {9^, fil) pairs (in V{rk),sW(h), NW{Tk) and SE{Tk)) to the area of the filter. Klorizontal stripes indicate a positive contribution and vertical stripes a negative one.

2.3.2 U p d a t i n g t h e filter The procedure to update the filter for a particular {9, f) pair is extremely simple. If {Ok,fk) — {0{xk),f{xk)) does not belong to V{!Fic) (i.e. ii Xk is not dominated), then J^k+i ^ .Ffc U {Ok,fk), while if {9k, fk) £ T^i^k) (if ^k is dominated).

134

Nicholas I, M. Gould and Philippe L. Toint Tk + 1

(•^fc\n)u(0:L,/fc)u(efc,/,

"Pfc \ min/

where Vh is now the subset of pairs in Tk that dominate {9k, fk)- This last situation is illustrated by Figure 3, which shows the filter resulting from the operation of including the pair {6k, fk) belonging to V{J^k) (that associated with the vertically shaded "decrement" in the filter area of Figure 2) in the filter. The two points in Vk that have been removed are marked with crossed circles and their associated dominated orthants are indicated by dotted lines. Observe that it may happen that the number of points in the filter decreases when the set of dominating points Vk contains more than two filter pairs. Moreover, the pair for which the filter is updated is not always itself included in the filter (as shown in Figure 3). fk

{OkJk)

{OZlJk)

9k

"k, / m i n )

Fig. 3. The filter J-k+i after including the dominated pair {9k, fk) into J^k-

2.3.3 Acceptability of potential iterates We now return to the question of deciding whether or not a trial point x'l is acceptable for the filter. We will insist that this is a necessary condition for the iteration k to be successful in the sense that Xk+i = x^, i.e. the algorithm changes its current iterate to the trial point. Note that all restoration iterations are successful {TZ C S). Note also that (except for XQ) all iterates are produced

Non-monotone Trust-Region Filter Algorithm

135

by successful iterations : if we consider an iterate Xk, there must exists a predecessor iteration of index p{k) G S such that Xk.

S(fc) " ^Pik)+l

(19)

Observe that we do not always have that p{k) = k — 1 since not all iterations need being successful. A monotone version of our method (rather similar to that developed in [FGLTW02], but using a{x,!F) rather than a margin around the filter) would be to accept x'^ whenever this trial point results in an sufficient increase in the dominated area of the filter, i.e. 'D{Tk)- This is to say that x'j. would be acceptable for the filter whenever

ak > lAOt)\

(20)

def

where a^ "= a{x'^,J^k) and where 7jr G (0,1) is a constant. The non-monotone version that we analyze below replaces this condition by the weaker requirement that k

k

^

E (^^ + ((^t)'

ap(j) + Qffc > 7;r

= r(/c) + l

(21)

j=r(fc)+l

where aq = a{x'^,!Fq) (and thus a^i^q) = a{xq,!Fpi^q))), where U = {k\ the filter is updated for

{9k,fk)},

and where r{k) < fc is some past reference iteration such that r{k) G U. Note that condition (21) may equivalently be written in the more symmetric form k

Mi) + "fe > 7.F = r(/=) + l

E ^nu))"+ iot?

j = r(fc) + 1

jew iew because of (19). The reader may notice that condition (21) is reminiscent of the condition for non-monotone trust-region algorithms developed in [Toi96]. It requires that the average contribution to the filter area of the last points included in the filter and x'^ together to be globally (sufficiently) positive, but makes it possible to accept x'^ even though it may be dominated (i.e. lie in P(J-)j)). However, if x'j" provides a clear monotonic improvement, in the sense that (20) holds, we are also prepared to accept it. Thus, x^ will he called acceptable at iteration k if either (20) or (21) holds. We will denote def

A=

{keS\

(21) holds}

(22)

Observe also that we could replace Of. by min[0^,Ke] in (20) and (21), where a and Ke are strictly positive constants. This variant may be more numerically sensible, and does not affect the theory developed below.

136

Nicholas I. M. Gould and Philippe L. Toint

2.4 T h e n o n - m o n o t o n e A l g o r i t h m We are now ready to define our algorithm formally as Algorithm 2.1. A simplified flow-chart of the algorithm is given as Figure 4.

Algorithm 2.1: N o n - m o n o t o n e Filter Algorithm Step 0: Initialization. Let an initial point xo, an initial trust-region radius Ao > 0 and an initial symmetric matrix Ho be given, as well as constants 0 < 70 < 71 < 1 < 72, 0 < ?7i < 772 < 1, 7J=- G (0,1), Ke E (0,1), KA e (0,1], Kft > 0, fj, G (0,1), ip > 1/(1 -I- Id) and Ktmd £ (0,1]. Compute /(xo) and c(xo). Set J ^ = 0 and /c = 0. Step 1: Test for optimality. If dk = Xk = 0, stop. Step 2: Ensure compatibility. Attempt to compute a step Uk- If TRQP (x^, Zifc) is compatible, go to Step 3. Otherwise, update the filter for {9k, fk) and compute a restoration step rk for which TRQP{xk+rk, Ak+i) is compatible for some Ak+i > 0, and x^ = Xk+rk is acceptable. If this proves impossible, stop. Otherwise, set Xk+i = x^ and go to Step 7. Step 3: Determine a trial step. Compute a step tk, set x^ = Xk + rik + tk, and evaluate c(x;J') and }{x'l). Step 4: Test acceptability of the trial point. If xjj" is not acceptable, again set Xfe+i = Xk, choose Ak+\ £ [-yoAk,^\Ak\, set nk+\ = uk, and go to Step 7. If mk{xk) - mk{xl) < Ke9f, (23) then update the filter for {Ok,fk) and go to Step 6. Step 5; Test predicted vs. achieved reduction. If Pk'='

^^;i-^^i\, V2 and (23) fails.

Step 7: Update the Hessian approximation. Determine Hk+i- Increment k by one and go to Step 1.

As in [FL98, FL02], one may choose tp = 2 (Note t h a t the choice •0 = 1 is always possible because /i > 0). Reasonable values for the constants might then be 7 ^ = 10-4,

70-0.1, 71=0.5, 72=2, 7yi=0.01, ryz = 0.9, K4 = 0.7, K^ = 100, /ii = 0.01, Ke = 10-'^, and K,,„d = 0.01,

b u t it is too early to know if these are even close to the best possible choices.

Non-monotone Trust-Region Filter Algorithm

137

initialization {k — 0)

attempt to compute nt

TRQP(;i;t,At) compatible?

update the filter

compute tk

compute rfc and Afc+1

ret acceptable?

mfc(3,'t) - mt(;i;J) < Kefljf?

update the filter

.Tt+i =xt

+ rk

Xk+l =Xk

Pk > V> •

+Sk

(maybe) increase Afc —> A^+i

reduce Afc -» At+,

compute Hk-\-\ and increment k by one

Fig. 4. Flowchart of the algorithm (without termination tests)

For the restoration procedure in Step 2 to succeed, we have to evaluate whether TRQP(a;fe -|- rfc,Zifc+i) is compatible for a suitable value of Zi^+i. This requires that a suitable normal step be computed which successfully passes the test (12). Of course, once this is achieved, this normal step may be reused at iteration fc + 1. Thus we shall require the normal step calculated to verify compatibility of TRQP(a;fc +rk, /^fc+i) should actually be used as Ufc+i. Also note that the restoration procedure cannot be applied on two successive iterations, since the iterate Xk + Tk produced by the first of these iterations leads to a compatible TRQP(xfe-|-i,Zifc+i) and is acceptable. As it stands, the algorithm is not specific about how to choose Ak+i during a restoration iteration. We refer the reader to [FGLTW02] for a more complete

138

Nicholas 1, M. Gould and Philippe L. Toint

discussion of this issue, whose implementation may involve techniques such as the internal doubling strategy of [BSS87] to increase the new radius, or the intelligent radius choice described by [Sar97]. However, we recognize that numerical experience with the algorithm is too limited at this stage to make definite recommendations. The role of condition (23) may be interpreted as follows. If it holds, then one may think that the constraint violation is significant and that one should aim to improve on this situation in the future, by inserting the current point in the filter. If it fails, then the reduction in the objective function predicted by the model is more significant than the current constraint violation and it is thus appealing to let the algorithm behave as if it were unconstrained. In this case, it is important that the predicted decrease in the model is reafized by the actual decrease in the function, which is why we then perform the test (24). In particular, if the iterate Xk is feasible, then (9) implies that Xfc = x^ and Ke9f = 0 < mk{xk) - mk{x^) = mk{xk) - mk{x~l). (25) As a consequence, the filter mechanism is irrelevant if all iterates are feasible, and the algorithm reduces to a traditional unconstrained trust-region method. Another consequence of (25) is that no feasible iterate is ever included in the filter, which is crucial in allowing finite termination of the restoration procedure, as explained in [FGLTW02]. Note that the argument may fail and a restoration step may not terminate in a finite number of iterations if we do not assume the existence of the normal step when the constraint violation is small enough, even if this violation converges to zero (see Fletcher, Leyffer and Toint, 1998, for an example). Notice also that the failure of (23) ensures that the denominator of pk in (24) will be strictly positive whenever 9k is. If 6k = 0, then x^, = x^, and the denominator of (24) will be strictly positive unless Xk is a first-order critical point because of (15). The reader may have observed that Step 6 allows a relatively wide choice of the new trust-region radius /ifc+i- While the stated conditions appear to be sufficient for the theory developed below, one must obviously be more specific in practice. We refer again to [FGLTW02] for a more detailed discussion of this issue. Finally, observe that the mechanism of the algorithm imposes that UCS,

(26)

i.e. that iterates are included in the filter only at successful iterations.

3 Convergence to First-Order Critical Points We now prove that our non-monotone algorithm generates a globally convergent sequence of iterates. In the following analysis, we concentrate on the case that the restoration iteration always succeeds. If this is not the case, then it

Non-monotone Trust-Region Filter Algorithm

139

usually follows that the restoration phase has converged to an approximate solution of the feasibility problem (16) and we can conclude that (1) is locally inconsistent. In order to obtain our global convergence result, we will use the assumptions ASl: / and the constraint functions cg and c j are twice continuously diflerentiable; AS2: there exists «„,„,, > 1 such that ll^fcll < Kumh - 1 for all k, ASS: the iterates {xk} remain in a closed, bounded domain X C K". If, for example, Hk is chosen as the Hessian of the Lagrangian function e{x,y) = f{x) + (y£,C£{x)) + {yx,cx{x)) at Xk, in that Hk = ^xxf{Xk)

+ X ^ [yk]i^xxCi{Xk),

(1)

iefuz where [yk\i denotes the i-th component of the vector of Lagrange multipliers Vk = iVs k 2/jfc)>then we see from ASl and ASS that AS2 is satisfied when these multipliers remain bounded. The same is true if the Hessian matrices in (1) are replaced by bounded approximations. A first immediate consequence of AS1~ASS is that there exists a constant K„bi, > 1 such that, for all k, i/(x+)-mfc(x+)! 0. Thus the part of the (0,/)-space in which the {9, /)-pairs associated with the filter iterates lie is restricted to the rectangle [0,6''"'"'] x [/""", oo]. We also note the following simple consequence of (9) and ASS. Lemma 1. Suppose that Algorithm 2.1 is applied to problem (1). Suppose also that (9) and ASS hold, and that Ok < 5nThen there exists a constant

K,S 0 independent of k such that

Kisc^fc < ||"-fc|!-

(4)

140

Nicholas I. M. Gould and Philippe L. Toint

Proof. See [FGLTW02], Lemma 3.1. Our assumptions and the definition of Xk in (13) also ensure that 6^ and Xfc can be used (together) to measure criticality for problem (1). Lemma 2. Suppose that Algorithm 2.1 is applied to problem (1) and that finite termination does not occur. Suppose also that ASl and ASS hold, and that there exists a subsequence {ki} % TZ such that lim Xfci = 0 and Um Qk^ = 0.

(5)

Then every limit point of the subsequence {x^.} is a first-order critical point for problem (1). Proof. See [FGLTW02], Lemma 3.2. We start our analysis by examining the impact of our non-monotone acceptance criteria (20) and (21). Once a trial point is accepted as a new iterate, it must be because it provides some improvement, compared to either a past reference iterate (using (21)), or to the previous iterate (using (20)). We formalize this notion by saying that iterate Xk = a;p(fc)+i improves on iterate Xi(^k), where i{k) = r{p{k)) if p{k) G A, that is if Xk is accepted at iteration p{k) using (21), and i{k)=p{k)

if p{k)(^A,

(6)

that is if Xk is accepted at iteration p{k) using (20). Now consider any iterate Xk- This iterate improved on Xj(fc), which was itself accepted because it improved on Xj(i(fc)), and so on, back to the stage where XQ is reached by this backwards referencing process. Hence we may construct, for each fc, a chain of successful iterations indexed by Cfc = {^i, ^2, • • •, ^g} such that £1=0,

iq = k and x^. = x^g.^^) for j =

l,...,q-l.

We start by proving the following useful lemma. Lemma 3. Suppose that Algorithm 2.1 is applied to problem (1). Then, for each k, fe-i

area(2?(J-fe)) > 7 ^ ^

di

Proof. Consider now the backward referencing chain from iteration fc — 1, Ck-i, and any £j {j > 0) in this chain. Observe that, if p{ij) G A, then (21) implies that i{ij) = r{p{£j)) = ij-i and that

Non-monotone Trust-Region Filter Algorithm If now p{ij)

^ .4, then £j-i

= p{ij)

141

and thus

{£j^i + i,...,ej}nuc

{^^_i + i,...,ij}ns

= {ij},

where we have used (26). Moreover, (20) then implies t h a t ap(^,) > Jy^dfi t h a t (7) holds again in this case. Observe also t h a t

so

q

fc-l

area(I'(j^fc)) > ^ a p

^^ j=0

ieu

E

^p(i)

='•3-I-'

ieu

since the ap(^i) are all disjoint for nondominated points and the dominated area of the filter is updated correctly for dominated ones. Combining this inequality with (7) then gives the desired result. We now consider what happens when the filter is u p d a t e d an infinite number of times. L e m m a 4 . Suppose t h a t Algorithm 2.1 is applied to problem (1). Suppose also t h a t A S l and ASS hold and t h a t \U\ = oo. Then hm 0k = 0. keu Proof. Suppose, for the purpose of obtaining a contradiction, t h a t there exists an infinite subsequence {ki} C U such t h a t 0fc. > e for all i and for some e > 0. Applying now Lemma 3, we deduce t h a t area(2?(J'fc,+i)) > i-fy^e'^. However, (3) implies t h a t , for any k, area('Z?(^fc)) is bounded above by a constant Kp"" > 0 independent of k. Hence we obtain t h a t t
mimkixk)

- •rrikix'^)]

and the desired inequality follows. >V^9k We now estabUsh that if the trust-region radius and the constraint violation are both small at a non-critical iterate Xk, TRQP{xk,Ak) must be compatible. Lemma 10. Suppose that Algorithm 2.1 is applied to problem (1). Suppose also that AS1-AS3, (9) and (10) hold, that (15) holds for k^TZ, and that "^Sn.

Ak < min

(16)

Suppose furthermore that 9k < mm[5o,Sn].

(17)

Then kK4K^Zi^+^

(18)

144

Nicholas I. M. Gould and Philippe L. Toint

where we have used (12) and the fact that Kf^A'/^ < 1 because of (16). In this case, the mechanism of the algorithm then ensures that fc — 1 ^ 7 ^ . Now assume that iterationfc— 1 is unsuccessful. Because of Lemmas 7 and 9, which hold at iteration k — 1 ^ TZ because of (16), the fact that dk = Ok-ij (9), and (15), we obtain that Pk-i > m and /(.T^t^i) < f{xk-i)

- y^9k-i.

(19)

Hence, if iteration fc — 1 is unsuccessful, this must be because x'^_-^ is not acceptable for the filter. However, if we have that eti

< {I - V^)Ok~i,

(20)

then, using the second part of (19) and the fact that {O^^^, fil_^) G a ( 4 - i . - ^ f c - i ) > ifi^k-i)

SW{J^k-i),

- /(a;+_i)][0fc-i - ^ j ] > 7 ^ ^ L i >

lAOt-if,

and a;^_j is acceptable for the filter because of (20). Since this is not the case, (20) cannot hold and we must have that ^ ^ l > (1 - vi^Wk-i

= (1 -

v^)9k-

But Lemma 5 and the mechanism of the algorithm then imply that (1 - Vi^)Ok < /^uM^ti
0.

(27)

We now decompose the model decrease in its normal and tangential components, that is f^kA^ki)

~ mk,{xl.) = nikiixk,) - mkiixlj

+ rukiixl^) -

mk,{x'l).

Substituting (26) and (27) into this decomposition, we find that hm inf [mfc,; {xk,) - ruk^ {xt.)] > S > 0. i—too

(28)

'

We now observe that, because fcj G U \TZ, we know from the mechanism of the algorithm that (23) must hold, that is

146

Nicholas I. M. Gould and Philippe L. Toint rrik, {xki) - viki {xj:^) < KgOf,.

(29)

Combining this bound with (28), we find that 9^. is bounded away from zero for i sufficiently large, which is impossible in view of (21). We therefore deduce that (24) cannot hold and obtain that there is a subsequence {ke} C {/cj} for which lim Ak, = 0. •+00

We now restrict our attention to the tail of this subsequence, that is to the set of indices ke that are large enough to ensure that (14), (15) and (16) hold, which is possible by definition of the subsequence and because of (21). For these indices, we may therefore apply Lemma 10, and deduce that iteration ki ^TZ for (. sufficiently large. Hence, as above, (29) must hold for £ sufficiently large. However, we may also apply Lemma 8, which contradicts (29), and therefore (23) cannot hold, yielding the desired result. Thus, if the filter is updated at an infinite subsequence of iterates. Lemma 2 ensures that there exists a limit point which is a first-order critical point. Our remaining analysis then naturally concentrates on the possibility that there may be no such infinite subsequence. In this case, the filter is unchanged for k sufficiently large. In particular, this means that the number of restoration iterations, \TZ\, must be finite. In what follows, we assume that /CQ > 0 is the last iteration at which the filter was updated. Lemma 12. Suppose that Algorithm 2.1 is applied to problem (1), that finite termination does not occur and that \U\ < oo. Suppose also that AS1-AS3, (9) hold and that (15) holds for k ^TZ. Then we have that fim 6ifc = 0.

(30)

/c—>oo

Furthermore, n^ satisfies (4) for aU k > ko sufficiently large. Proof. Consider any successful iterate with k > ko. Since the filter is not updated at iteration k, it follows from the mechanism of the algorithm that Pk > Vi holds and thus that f{xk) - f{xk+\) > Vi\mk{xk) - mkix'l)] > r]iKg0f > 0.

(31)

Thus the objective function does not increase for all successful iterations with k > ko. But ASl and AS3 imply (3) and therefore we must have, from the first part of this statement, that lim / ( x f c ) - / ( x f c + i ) = 0 .

fces fc—>oo

(32)

The hmit (30) then immediately follows from (31) and the fact that 9j = 6^ for all unsuccessful iterations j that immediately follow the successful iteration k, if any. The last conclusion then results from (9) and Lemma 1. We now show that the trust-region radius cannot become arbitrarily small if the (asymptoticaUy feasible) iterates stay away from first-order critical points.

Non-monotone Trust-Region Filter Algorithm

147

Lemma 13. Suppose that Algorithm 2.1 is applied to problem (1), that finite termination does not occur and that \U\ < oo. Suppose also that AS1-AS3 hold and (15) holds for k ^ TZ. Suppose furthermore that (10) holds for all k > ko. Then there exists a A^in > 0 such that

for all k. Proof. Suppose that fci > fco is chosen sufficiently large to ensure that (17) holds and that Uk satisfies (9) for all k > ki, which is possible because of Lemma 12. Suppose also, for the purpose of obtaining a contradiction, that iteration j is the first iteration following iteration fcj for which Aj < 7o min

= loSs,

(33)

where OF d e f

.

,

r = mmt ieu is the smallest constraint violation appearing in the filter. Note also that the inequahty Aj < joAki, which is implied by (33), ensures that j > fci + 1 and hence that j — I > ki and thus that j — 1 ^ TZ. Then the mechanism of the algorithm and (33) imply that Aj_i < —Aj < 5,

(34)

70

and Lemma 7, which is applicable because (33) and (34) together imply (12) with k replaced by j — 1, then ensures that Pj-i > m-

(35)

Furthermore, since rij^i satisfies (9), Lemma 1 implies that we can apply Lemma 5. This together with (33) and (34), gives that 0+_i < «...,/i,2_i < (1 - Vi^)e'.

(36)

We may also apply Lemma 9 because (33) and (34) ensure that (12) holds and because (15) also holds for j — I > ki. Hence we deduce that /( lo^s for all k > ki, and the desired result follows if we define ^min

=inm\Ao,...,Ak,,'yoSs].

We may now analyze the convergence of Xk itself. Lemma 14. Suppose that Algorithm 2.1 is applied to problem (1), that finite termination does not occur and that \U\ < oo. Suppose also that AS1-AS3, (9) hold and (15) holds for k^U. Then liminf Xfc = 0.

(37)

fc—*CX)

Proof. We start by observing that Lemma 12 implies that the second conclusion of (9) holds for k sufficiently large. Moreover, as in Lemma 12, we obtain (31) and therefore (32) for each k £ S, k > ko- Suppose now, for the purpose of obtaining a contradiction, that (10) holds and notice that mfc(xfc) - mkix'l^) = mk{xk) - mk{xl) + mfc(x^) - mfc(a;^).

(38)

Moreover, note, as in Lemma 6, that \mk{xk) - mfc(Xfc)| < Kubgll^fcll + K„„h||nfcf, which in turn yields that lim [mk{xk) - nikixk)] = 0 k—*oo

because of Lemma 12 and the first inequality of (9). This limit, together with (31), (32) and (38), then gives that lim [mfc(4)~mfe(x+)]=0.

(39)

k€S

But (15), (10), AS2 and Lemma 13 together imply that, for all k > ko "mkixk) - mfc(x^) > K.^^Xk I

f,Ak

(40)

Pk

immediately giving a contradiction with (39). Hence (10) cannot hold and the desired result follows. We may summarize all of the above in our main global convergence result. Lemma 15. Suppose that Algorithm 2.1 is apphed to problem (1) and that finite termination does not occur. Suppose also that AS1-AS3 and (9) hold, and that (15) holds for k ^ TZ. Let {xk} be the sequence of iterates produced by the algorithm. Then either the restoration procedure terminates unsuccessfully by converging to an infeasible first-order critical point of problem (16), or there is a subsequence {kj} for which hm Xk.: = X* j->oo

and

first-order

critical point for problem (1).

Non-monotone Trust-Region Filter Algorithm

149

Proof. Suppose t h a t the restoration iteration always terminates successfully. From ASS, Lemmas 11, 12 and 14, we obtain t h a t , for some subsequence

{kj}, Urn 9k, = Um Xfe, = 0.

(41)

T h e conclusion then follows from Lemma 2.

4 Conclusion and Perspectives We have introduced a trust-region SQP-filter algorithm for general nonlinear programming, and have shown this algorithm to be globally convergent to first-order critical points. T h e proposed algorithm differs from t h a t discussed by [FL02], notably because it uses a decomposition of the step in its normal and tangential components and imposes some restrictions on the length of the former. It also differs from the algorithm of [FGLTW02] in two main aspects. T h e first and most important is t h a t the rule for deciding whether a trial point is acceptable for the filter is non-monotone, and allows, in some circumstances, acceptance of points t h a t are dominated by other filter pairs. This gives hopes t h a t an S Q P filter algorithm can be developed without introducing secondorder correction steps. T h e second is t h a t the algorithm no longer relies on the definition of a "margin" around the filter, but directly uses the dominated area of the filter as an acceptance criterion.

References [BSS87] R. H. Byrd, R. B. Schnabel, and G. A. Shultz. A trust region algorithm for nonlinearly constrained optimization. SIAM Journal on Numerical Analysis, 24, 1152-1170, 1987. [CGTOO] A. R. Conn, N. L M. Gould, and Ph. L. Toint. Trust-Region Methods. Number 01 in 'MPS-SIAM Series on Optimization'. SIAM, Philadelphia, USA, 2000. [CGST93] A. R. Conn, N. I. M. Gould, A. Sartenaer, and Ph. L. Toint. Global convergence of a class of trust region algorithms for optimization using inexact projections on convex constraints. SIAM Journal on Optimization, 3(1), 164-221, 1993. [DAW99] J. E. Dennis, M. El-Alem, and K. A. Williamson. A trust-region approach to nonlinear systems of equalities and inequalities. SIAM Journal on Optimization, 9(2), 291-315, 1999. [HT95] M. El-Hallabi and R. A. Tapia. An inexact trust-region feasible-point algorithm for nonlinear systems of equalities and inequalities. Technical Report TR95-09, Department of Computational and Applied Mathematics, Rice University, Houston, Texas, USA, 1995. [FL98] R. Fletcher and S. LeyfFer. User manual for filterSQP. Numerical Analysis Report NA/181, Department of Mathematics, University of Dundee, Dundee, Scotland, 1998.

150 [FL02]

Nicholas I. M. Gould and Philippe L. Toint

R. Fletcher and S. LeyfFer. Nonlinear programming without a penalty function. Mathematical Programming, 91(2), 239-269, 2002. [FGLTW02] R. Fletcher, N. I. M. Gould, S. Leyffer, Ph. L. Toint, and A. Wachter. Global convergence of trust-region SQP-filter algorithms for nonlinear programming. SIAM Journal on Optimization, 13(3), 635-659, 2002. [FLT98] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of a SLP-filter algorithm. Technical Report 98/13, Department of Mathematics, University of Namur, Namur, Belgium, 1998. [FLT02] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of a filter-SQP algorithm. SIAM Journal on Optimization, 13(1), 44-59, 20026. [Omo89] E. O. Omojokun. Trust region algorithms for optimization with nonlinear equality and inequality constraints. PhD thesis. University of Colorado, Boulder, Colorado, USA, 1989. [Sar97] A. Sartenaer. Automatic determination of an initial trust region in nonlinear programming. SIAM Journal on Scientific Computing, 18(6), 17881803, 1997. [Toi88] Ph. L. Toint. Global convergence of a class of trust region methods for nonconvex minimization in Hilbert space. IMA Journal of Numerical Analysis, 8(2), 231-252, 1988. [Toi96] Ph. L. Toint. A non-monotone trust-region algorithm for nonlinear optimization subject to convex constraints. Mathematical Programming, 77(1), 69-94, 1997. [Ulb04] S. Ulbrich. On the superlinear local convergence of a filter-SQP method. Mathematical Programming, Series B, 100(1), 217-245, 2004. [Var85] A. Vardi. A trust region algorithm for equality constrained minimization: convergence properties and implementation. SIAM Journal on Numerical Analysis, 22(3), 575-591, 1985. [WBOl] A. Wachter and L. T. Biegler. Global and local convergence of line search filter methods for nonlinear programming. Technical Report CAPD B-01-09, Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, USA, 2001. Available on http://www.optimizationonline.org/DB.HTML/2001/08/367.html. [Yua94] Y. Yuan. Trust region algorithms for nonlinear programming, in Z. C. Shi, ed., 'Contemporary Mathematics', Vol. 163, pp. 205-225, Providence, Rhode-Island, USA, 1994. American Mathematical Society.

Factors Affecting the Performance of Optimization-based Multigrid Methods * Robert Michael Lewis^ and Stephen G. Nash^ ^ Department of Mathematics, College of WilHam k Mary, P.O. Box 8795, Williamsburg, Virginia, 23187-8795, USA. buckarooQmath.wm.edu ^ Associate Dean, School of Information Technology and Engineering, Mail Stop 5C8, George Mason University, Fairfax, VA 22030, USA. [email protected] S u m m a r y . Many large nonlinear optimization problems are based upon discretizations of underlying continuous functions. Optimization-based multigrid methods are designed to solve such discretized problems efficiently by taking explicit advantage of the family of discretizations. The methods are generalizations of more traditional multigrid methods for solving partial differential equations. The goal of this paper is to clarify the factors that affect the performance of an optimization-based multigrid method. There are five main factors involved: (1) global convergence, (2) local convergence, (3) role of the underlying optimization method, (4) role of the multigrid recursion, and (5) properties of the optimization model. We discuss all five of these issues, and illustrate our analysis with computational examples. Optimization-based multigrid methods are an intermediate tool between general-purpose optimization software and customized software. Because discretized optimization problems arise in so many practical settings we think that they could become a valuable tool for engineering design.

1 Introduction Many large nonlinear optimization problems are based upon discretizations of underlying continuous functions. For example, the underlying infinite dimensional model may be governed by a diflFerential or integral equation representing, for example, the flow of air over an airplane. When solved computationally, the underlying functions are typically approximated using a discretization (an approximation at a discrete set of points) or a finite-element approximation (for example, approximating the solution by a spline function). We focus here on discretizations to simplify the discussion. Optimization-based multigrid methods are designed to solve such discretized problems efficiently by taking explicit advantage of the family of * This research was supported by the National Aeronautics and Space Administration under NASA Grant NCC-1-02029, and by the National Science Foundation under grant DMS-0215444.

152

Robert Michael Lewis and Stephen G. Nash

discretizations. That is, they perform computations on a less costly version of the optimization problem based on a coarse discretization (grid), and use the results to improve the estimate of the solution on a finer discretization (grid). On each individual grid the algorithm applies iterations of a traditional optimization method to improve the estimate of the solution. The methods are generaUzations of more traditional multigrid methods for solving partial differential equations. The goal of this paper is to clarify the factors that affect the performance of an optimization-based multigrid method. There are five main factors involved: • • • • •

Global convergence: Is the algorithm guaranteed to converge to a solution of the optimization problem? Local convergence: How rapidly does the algorithm converge close to the solution? Behavior of the underlying optimization method: What is the role of the underlying optimization method in improving the estimate of the solution? Behavior of the multigrid recursion: What is the role of the multigrid recursion in improving the estimate of the solution? Properties of the optimization model: What types of optimization models are well-suited to an optimization-based multigrid method?

These questions will be addressed in subsequent sections, but it is possible to give brief responses here. In short: • • •



Optimization-based multigrid algorithms can be implemented in a manner that guarantees convergence to a local solution. The multigrid algorithm can very efficient, with a fast hnear rate of convergence near the solution (the same as for traditional multigrid algorithms). The underlying optimization method and the multigrid recursion are complementary. The underlying optimization method is effective at approximating the high-frequency (rapidly changing) components of the solution, and the multigrid recursion is effective at approximating the low-frequency (slowly changing) components of the solution. Thus, combining the two approaches is far more effective than using either one separately. The multigrid algorithm will be effective if the reduced Hessian of the optimization model is nearly diagonal in the Fourier (frequency) basis. There is evidence that suggests that large classes of optimization models have this property. (The reduced Hessian should also be positive semidefinite, but this will always be true near a local solution of an optimization problem.)

Multigrid algorithms were first proposed for solving elliptic linear partial differential equations (PDEs). In that setting they are well known for their efficiency, with computational costs that are linear in the number of variables. Multigrid algorithms have been extended to nonlinear PDEs, and to non-elliptic PDEs, but their behavior in these settings is not as ideal. More

Performance of Optimization-based Multigrid Methods

153

specifically, multigrid algorithms for nonlinear equations are not guaranteed to converge to a solution of the equations, and multigrid algorithms for nonelhptic equations can be far less efficient than for ehiptic equations. We are proposing that optimization-based multigrid algorithms be apphed to discretizations of models of the form minimize F(a) = / ( a , u ( a ) ) ,

(1)

where a is an infinite-dimensional set of design variables (for example, a might be a function on an interval), and u = u{a) is a set of state variables. Given a, the state variables are defined implicitly by a system of equations S{a,u{a)) = 0

(2)

in a and u. We assume that S{a, u) = Ois either a system of partial differential equations or a system of integral equations. The design variables a might represent boundary conditions, might define the shape of a machine part, etc. There may also be additional constraints on the design and state variables. Problems of this type can be very difficult to solve using general-purpose optimization software. In other words, we are applying multigrid methods to an optimization model, i.e., to the minimization of a nonhnear function subject to nonhnear constraints. Some of these constraint may be inequalities that are not always active, and become active in an abrupt, discontinuous manner. This is in contrast to applying multigrid methods to a system of nonlinear equations, where all the equalities (equations) will be active (satisfied) at a solution. Given the limitations of traditional multigrid algorithms applied to nonelUptic or nonlinear equations, it might seem questionable to apply optimization-based multigrid methods to the more general model (l)-(2). (We are not assuming that the state equation (2) is eUiptic or linear.) To further compficate matters, the performance of the multigrid algorithm depends on the properties of the reduced Hessian for (1)~(2), which in turn depends on the Jacobian of S{a,u{a)) and the Hessian of F{a). If the state equation were ill-suited for multigrid, would not the optimization model be even worse? Perhaps counter-intuitively, we believe that the optimization model can be a better setting for multigrid than a system of PDEs. In particular, it is possible to design the multigrid algorithm so that it is guaranteed to converge to a local solution of the optimization problem. (Typically, optimization algorithms have better guarantees of convergence than algorithms for solving systems of nonhnear equations.) Also, broad classes of optimization models appear to be well-suited to multigrid. An additional advantage is that the optimization setting is more general than the nonlinear-equations context in the sense that it can include auxihary constraints (including inequalities) [LN05]. The effectiveness of multigrid for optimization depends on all five of the factors mentioned above. Without careful design of the multigrid algorithm, it

154

Robert Michael Lewis and Stephen G. Nash

is not guaranteed to converge. Without the fast local convergence, the multigrid algorithm is not competitive with existing techniques. Both the underlying optimization algorithm and the multigrid recursion are needed to achieve the fast local convergence. And the multigrid recursion will be of no value if the reduced Hessian does not have the right properties. Our results build on existing work, especially on the extensive research on multigrid algorithms for PDEs (see, e.g., [Bra77, Hac85, McC89]), but also on multigrid optimization research [FP95, KTS95, NasOO, Ta'91, TT98, ZC92]. In our work we have focused on geometric multigrid, in which the optimization problem decomposes along length-scales. Here is an outline of the paper. Section 2 gives a template for the method. Section 3 describes several broad categories of models that are candidates for an optimization-based multigrid method. Sections 4 and 5 discuss global and local convergence, respectively. Sections 6 and 7 explain the complementary roles of the underlying optimization algorithm and the multigrid recursion. The properties of the reduced Hessian are the topic of Section 8. Section 9 has computational examples, and conclusions are in Section 10. The optimization-based multigrid methods we discuss are not completely general-purpose optimization methods. They assume that the optimization problem is based on a discretization, and not all large optimization problems are of this type. Nevertheless, the methods have great flexibility. They can adapt to specialized software (e.g., for grid generation, and for solving the underlying state equation), and they require httle effort from the user beyond that required to apply a traditional optimization method to the problem. Thus, these optimization-based multigrid methods are an intermediate tool between general-purpose optimization software (which may have difficulty solving these problems) and customized software and preconditioners (which require extensive human effort to develop). Because discretized optimization problems arise in so many practical settings (e.g., aircraft design) we think that they could become a valuable tool for engineering design.

2 T h e Multigrid Algorithm Our goal in this paper is to clarify the issues that affect the performance of an optimization-based multigrid algorithm. We specify an algorithm with sufficient generality for the results to have broader interest, yet with sufficient detail that properties of the algorithm (such as convergence theorems) can be deduced. The algorithm below, called MG/Opt, is an attempt to satisfy these conflicting aims. The description of the algorithm MG/Opt is taken from [LN05]. The recursion is a traditional multigrid V-cycle. The coarse-grid subproblems are motivated by the full approximation scheme, a multigrid method for solving systems of nonlinear equations [McC89]. Yet, despite the motivation of more

Performance of Optimization-based Multigrid Methods

155

traditional multigrid methods for equations, MG/Opt is truly based on optimization. The solver ("smoother") used on the fine grid is an optimization algorithm, and the coarse-grid subproblems are optimization problems. MG/Opt differs in two other ways from a more traditional multigrid algorithm. The coarse-grid subproblem imposes bounds on the solution (thus Hmiting the length of the step taken at each iteration), and the result of the coarse-grid subproblem is used to define a search direction for the fine grid. This search direction is used within a line search, a tool for ensuring that the algorithm makes progress toward the solution (as measured in terms of the value of the objective function). These additions guarantee that MG/Opt will converge to a local solution to the optimization problem. (See Section 4.) Several steps in the algorithm require explanation. In two places there is the requirement to "partially minimize" F{a). In our implementation this means to apply some (typically small) number of iterations of a nonlinear optimization algorithm. In our computational tests, we use one outer iteration of a truncated-Newton method. The algorithm MG/Opt refers to multiple versions of quantities corresponding to the various grids. At each iteration of MG/Opt, however, there are only references to two grids: the current grid, identified by the symbol h; and the next coarser grid, identified by H. Thus a^ is the version of the vector of design variables on the current grid, and Ff{ is the objective function on the next coarser grid. MG/Opt requires update and downdate operators, / ^ and I^^, respectively. These operators transform a vector on one grid to a vector on the next finer or coarser grid. For theoretical reasons, we require that these two operators be essentially the transposes of one another: J]^ = constant x I^ where the constant is required to be a positive number. This is a standard assumption [Bri87]. In other respects, MG/Opt offers considerable flexibility. Any optimization algorithm could be used. There are no additional assumptions about the update and downdate operators. No assumptions are made about the relationship among the various grids (e.g., they need not be nested, though this may be desirable). The line search is not specified in detail. Of course, these choices would have a major effect on the practical performance of MG/Opt. One iteration of the algorithm takes a step from a^^\ an initial estimate of the solution on the finest grid, to a^^\ via: • •

If on the coarsest grid, minimizeF/t(a/i) = fh{ 1. An illustrative example that demonstrates the impact of changing 7 is shown in [MS04]. In practice, this method of obtaining an initial feasible solution has been highly effective. 6.1.2 A Local Relaxation Given a feasible integer solution, we use the 9-point stencil to specify a neighborhood (see figure 4). As discussed earlier, we solve a continuous problem given by SSOsub minimize

V^YV

V,I

7-yy=0

subject to

V^ . Proof. This follows from a Taylor series expansion of 2^ (we drop the subscript k to allow for readability)

200

Walter Murray and Uday V. Shanbhag

1 ^ ( 1 _ e-^^o = ^ E ( i - (1 - ^y^ + l^''yi • • •)

i

i

The last inequality holds if 0{(iyi) < 0. We prove this as follows:

2/-1

2

\

J-

4

4/w

•'•

\

= - 2 - " Vi (1 " 3 W') - ^ / ^ 2/i (1 - gAtyi) + • • • • This implies that 1 1 - r/uyi >o =^ 6 1 1 - -^i^yi > 0 ^ 5 1 2n + l = ^

yi
3.

Lemma 1 implies that

r•

A Local Relaxation Method for Nonlinear Facility Location Problems ^FLP^

^

201

ZFLP-

Moreover, FEAFLPQFEAPLPL,

where FEA refers to the feasible region implying the result. D The following lemma prescribes a distance measure between the solutions of the globally relaxed problem and the true problem for a particular set of integer decisions. Lemma 2. The expression \0{nyi)\ is bounded from above by ^jj.'^e^. Proof.

1 = ~2^^yii'^

1 1 - g^yi + Y^iwi?

- •••)

< -li^^yU'^ + \i^yi + ^ ( w i ) ' + • • •) |^(/iyi)l < 2M^2^'(l + tiyi + -{^lyif

+ ...)

Theorem 2. Given an integer solution {xk,yk) of FLP in which we have exactly m facilities. Then we may obtain a set of m lower bounds Zf. '•', where j of the facilities have been relaxed and j = 1 , . . . ,m. Note that j = 0 gives us the original integer solution. We obtain Zf.'-' by solving the appropriate continuous subproblem. Moreover, we have a monotonicity relationship given by Lfl ^ Zk

L,l ^

> Z k

^ ^ k

L,2

L,m •••^k



Proof. This follows immediately by noticing that when any facility is relaxed, it results in a lower transportation cost at the same cost of capital. D 7.2 Convexity and MIP Bounds Duran and Grossman [DG86] discuss an outer-approximation algorithm for solving mixed-integer nonlinear programs, with a convex objective and convex constraints. This algorithm may also be used to provide a tighter set of bounds. However, as the problem sizes grow to the order of thousand variables, it may be well nigh impossible to obtain such bounds.

202

Walter Murray and Uday V. Shanbhag

7.3 Discrete Local Minima In this section, we define tlie notion of a discrete local minimizer by extending the ideas from continuous optimization. It should be noted that such minimizers tend to be meaningful in situations when an integer variable has some locational significance on the graph. We begin by defining a J—local neighborhood of a solution to (FLP). We should specify that the local neighborhood accounts only for a change in the position of the facilities and not in the actual number of facilities. Definition 4. A S—local neighborhood of {xk,yk) represents a set of points {xj,yj] such that if[yk]i = 1 then

Yl [yk]j = 1 and Xj represents the solution of continuous problem FLPgubiVj)• We denote such a neighborhood by J^xk,vk' Essentially, the 5—local neighborhood allows the replacement of existing facilities with those within a 6 distance from the current set of facilities. We may now define necessary conditions for a discrete local minimizer of this problem. Definition 5. Given an integer solution {x*,y*) to the problem FLP with system cost z*. Then any point {xj,yj) belonging to Afx*,y* ^^^ system cost z^ where z^ > z*. Given such a definition, we now state the simple result that our relaxation algorithm does not move off a discrete local minimizer. Lemma 3. Suppose the local relaxation algorithm begins at a discrete local minimizer. Then it terminates without any further progress. Proof. This follows immediately by noticing that a full step of our algorithm moves to a point in the local neighborhood as does every other point visited in the backtracking process. Since the local neighborhood contains no points with strictly better costs, we never move off' the starting point. D

8 Summary and Comments We have presented a framework algorithm to solve nonlinear facility location problems and illustrated how it may be applied to the placement of substations in an electrical network. The algorithm is based on an approach commonly used to solve continuous problems in which an improved point is identified from the neighborhood of the current best point . In general, the concept of

A Local Relaxation Method for Nonlinear Facility Location Problems

203

a neighborhood does not generalize to discrete problems. However, we show that the FLP is an exception. We define a neighborhood and show how an improved point may be found. The algorithm we propose has two important properties. First, the algorithm generates a sequence of feasible and improving estimates of the solution. Second, provided the density of facilities to locations does not significantly decrease as the dimension of the problem increases then the number of steps needed to move from the initial estimate to the solution does not, in general, increase. The key to the scalability of the algorithm is that at every step it is possible (and likely) that all facilities change locations. In one way, the discrete algorithm works better than its continuous counterparts. Typically in a linesearch or trust-region algorithm for continuous problems, the initial step taken is usually only accepted when close to the solution (and not always then). In our observations, our discrete algorithm usually accepts the first step regardless of whether or not the current iterate is a good estimate. The reason for this is that the neighborhood of a continuous problems is difficult to define in terms of magnitude but for a discrete problem it is not. Acknowledgments We extend our sincere thanks to Robert H. Fletcher, PubUc Utility District No. 1 of Snohomish County, Everett, Washington and Patrick Gaffney, Bergen Software Services International (BSSI), for their guidance and support.

204

Walter Murray and Uday V. Shanbhag

References [AH89]

H.M. Amir and T. Hasegawa. Nonlinear mixed-discrete structural optimization. Journal of Structural Engineering, 115:626-646, 1989. [AM093] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms and Applications. Prentice Hall, Englewood Cliffs, NJ, 1993. [CM89] J.Z. Cha and R.W. Mayne. Optimization with discrete variables via recursive quadratic programming: Part 2 - algorithms and results. Transactions of the ASME, Journal of Mechanisms, Transmissions and Automation in Design, 111:130-136, 1989. [DG86] M.A. Duran and I.E. Grossman. An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Mathematical Programming, 36:307-339, 1986. [Geo72] A.M. Geoffrion. Generalized benders decomposition. J. Optim. Theory Appl,, 10:237-260, 1972. [GMS97a] P. E. Gill, W. Murray, and M. A. Saunders. SNOPT: An SQP algorithm for large-scale constrained optimization. Technical Report NA-97-2, San Diego, CA, 1997. [GMS97b] P. E. Gill, W. Murray, and M. A. Saunders. User's guide for SQOPT 5.3: A fortan package for large-scale linear and quadratic programming. Technical report. Systems Optimization Laboratory, Department of Operations Research, Stanford University - 94305-4022, 1997. [Hoc97] D. S. Hochbaum. Approximation algorithms for NP-hard problems. PWS Publishing Co., 1997. [Hol99] K. Holmstrom. The TOMLAB Optimization Environment in Matlab. Advanced Modeling and Optimization, l(l):47-69, 1999. [JV99] K. Jain and V. V. Vazirani. Primal-dual approximation algorithms for metric facility location and k-median problems. In IEEE Symposium on Foundations of Computer Science, pages 2-13, 1999. [Ley93] S. Leyffer. Deterministic Methods for Mixed Integer Nonlinear Programming. PhD thesis, University of Dundee, Dundee, Scotland, UK, 1993. [MS04] W. Murray and U.V. Shanbhag. A local relaxation approach for the siting of electrical substations. Computational Optimization and Applications (to appear), 2004. [Ng02] K-M Ng. A Continuation Approach to Solving Continuous Problems with Discrete Variables. PhD thesis, Stanford University, Stanford, CA 94305, June, 2002. [NW88] G.L. Nemhauser and L.A. Wolsey. Integer and Combinatorial Optimization. John Wiley, New York, 1988. [OV89] G.R. Olsen and G.N. Vanderplaats. Methods for nonlinear optimization with discrete design variables. AIAA Journal, 27:1584-1589, 1989. [STA97] D. B. Shmoys, E. Tardos, and K. Aardal. Approximation algorithms for facility location problems (extended abstract). In 29th Annual ACM Symposium on Theory of Computing, pages 265-274, 1997.

Fluence Map Optimization in IMRT Cancer Treatment Planning and A Geometric Approach Yin Zhang and Michael Merritt* Department of Computational and Applied Mathematics, Rice University, Houston, TX 77005 - 4805, USA. {yzhang,minerritt}Qcaam.rice.edu S u m m a r y . Intensity-modulated radiation therapy (IMRT) is a state-of-the-art technique for administering radiation to cancer patients. The goal of a treatment is to deliver a prescribed amount of radiation to the tumor, while limiting the amount absorbed by the surrounding healthy and critical organs. Planning an IMRT treatment requires determining fluence maps, each consisting of hundreds or more beamlet intensities. Since it is difficult or impossible to deliver a sufficient dose to a tumor without irradiating nearby critical organs, radiation oncologists have developed guidelines to allow tradeoffs by introducing so-called dose-volume constraints (DVCs), which specify a given percentage of volume for each critical organ that can be sacrificed if necessary. Such constraints, however, are of combinatorial nature and pose significant challenges to the fluence map optimization problem. The purpose of this paper is two-fold. We try to introduce the IMRT fluence map optimization problem to a broad optimization audience, with the hope of attracting more interests in this promising application area. We also propose a geometric approach to the fluence map optimization problem. Contrary to the traditional view, we treat dose distributions as primary independent variables and beamlet intensities as secondary. We present theoretical and preliminary computational results for the proposed approach, while omitting excessive technical details to maintain an expository nature of the paper. K e y w o r d s : Cancer radiation therapy, Optimal t r e a t m e n t planning, Fluence m a p optimization, A geometric Approach.

1 Introduction Using radiation to treat cancer requires careful planning. Bombarding malignant tumors with high-energy X-rays can kill cancerous cells (or hinder their * This author's work was supported in part by DOE/LANL Contract 03891-99-23 and NSF Grant No. DMS-0240058.

206

Yin Zhang and Michael Merritt

growth), but it is usually impossible to deliver a terminal dose without damaging nearby healthy organs in the process. Serious patient complications can occur when the surrounding healthy tissues receive too much of this collateral radiation. On the other hand, sacrificing a modest number of healthy cells may be tolerable since many organs are resilient enough to sustain a certain degree of damage while still providing their anatomical function and can eventually recover. Therefore, research in radiation therapy seeks methods of delivering a sufficient dose to the tumor, while carefully controlling the dose received by neighboring critical organs and other healthy tissues. 1.1 IMRT Intensity-modulated radiation therapy (IMRT) is a state-of-the-art method which delivers higher doses to tumors and allows more precise conformation than the conventional 3D conformal radiotherapy. The primary delivery tool for IMRT is a linear accelerator that rotates on a gantry around the patient, emitting "modulated" beams of X-rays. This modulation is accomplished by means of a device known as a multileaf collimator (MLC) which is attached to the accelerator. Its adjustable heavy-metal leaves act as a filter, blocking or allowing radiation through in a precise manner controlled by a computer, in order to tailor the beam shape to the shape of the tumor volume while minimizing exposure of the neighboring structures. Several mathematical problems arise in order to optimally administer IMRT. Treatment proceeds by rotating the accelerator around the patient and coordinating the leaf movements in the MLC so that the radiation delivered conforms to some desirable dose distribution at each gantry (beam) angle. We will assume in this paper that treatments are administered by fixing the accelerator at a finite number of given gantry angles, rather than emitting radiation while rotating through a continuous arc. We note that determining the number and the values of the gantry angles constitutes a higher-level optimization problem of a combinatorial nature, often called the beam-angle optimization problem. Typically, increasing the number of gantry angles would increase the quality and the cost of the treatments. In addition to knowing the beam angles, one must also know how intense the beams should be at each point {x,y) on the MLC aperture for aU gantry angles. These intensity profiles, or fluence maps, are represented by two-dimensional, nonnegative functions Ia{x,y) for a = 1, 2 , . . . , /c, where k is the number of gantry angles in use. The process of determining the functions Ia{x,y) is often called fluence map optimization. Finally, once the fluence maps Ia{x,y) are determined, one must convert these into MLC leaf sequences that attempt to reahze them. The longer an MLC leaf is open at a certain position {x,y), the more dose the tissue along a straight path from that position (plus some surrounding tissue) absorbs. The process of converting fluence maps into the opening and closing movements of leaves is called leaf-sequencing. There are many physical and mathematical

Fluence Map Optimization in IMRT Cancer Treatment Planning

207

issues that affect how successful MLC leaf sequences are at approximating the desired fluence maps. In this paper, we will focus solely on the problem of computing the fluence maps Ia{x,y), such that the tumor, or target, structures receive the prescribed doses and the healthy critical structures receive as little as possible. These conflicting goals are the primary cause of difflculty in fluence map optimization. 1.2 Dose-Volume Constraints Besides some scattering, the radiation travels primarily in a straight line, so it must typically pass next to or even directly go through critical organs in order to reach and irradiate intended tumor targets. Since the doses that kill most cancers are much larger than those that kiU most healthy tissue in the body, even though multiple angles are used in an attempt to focus radiation on the targets, more often than not one has no choice but to sacrifice some healthy tissues. The next sensible objective is to control the volume of the healthy tissues to be sacrificed. Toward this end, in standard practice oncologists prescribe dose-volume constraints (DVCs) that allow a certain percentage of volume in healthy tissues to be sacrificed in order to make sufficient progress in treating the cancer. A typical DVC has the form, for example, that no more than 30% volume of the right lung can exceed a radiation dose of 20Gy, where "Gy" is the shorthand for "Gray" - the international unit for radiation dose absorption. In addition, oncologists may specify another level of constraint on the same organ, such as no more than 40% volume of the right lung can exceed lOGy. These dose-volume constraints are natural for oncologists to specify and have become the de facto standard way to prescribe radiation therapy treatment in practice. Clearly, dose-volume constraints provide the much needed flexibiUty necessary for the escalation of tumor doses. On the other hand, they also introduce a high degree of complexity to the underlying optimization problem. In the above example, for instance, which 30% of the right lung volume should be allowed to absorb more that 20Gy? This brings a combinatorial component to the optimization problem (once the problem is discretized). Mathematically, finding the globally optimal combination of critical organ cells to sacrifice in this way can be an extremely difficult problem.

2 Fluence M a p Optimization In this section, we provide details on the current practice of IMRT treatment planning, as they are relevant to the fluence map optimization.

208

Yin Zhang and Michael Merritt

2.1 Discretizations To determine the fluence map functions Ia{x,y), we first discretize the MLC aperture for each angle by putting a rectangular grid {{xi,yj)} on it. As a result, the two-dimensional function Ia{x,y) will be approximated by a set of discrete values {Ia{xi,yj)}. The actual number of these small rectangular elements, or "bixels," will depend not only on the physical sizes of the MLC device (such as the width of the MLC leaves), but also on the gantry angles and the geometry of the region under treatment. For instance, if a beam emitting from a given grid point is determined not to have a significant intersection with or impact on the region of treatment, then this particular grid point will be omitted from consideration. With this discretization, each MLC aperture is broken into hundreds (or up to thousands) of discrete "bixels" and, correspondingly, each radiation beam is broken into as many discrete "beamlets." The total number of beamlets in a given fluence map optimization problem is the sum of the beamlets for all the beam angles. Let n be the total number of beamlets for all beam angles and let us index the bixels linearly. Instead of using the notation Ia{xi,yj) for the unknown beamlet intensities, we denote the unknowns by a vector x £ R". In addition, since the intensity values are nonnegative, we have x G K" where M" denotes the nonnegative orthant of M". Moreover, we also need to discretize the "region of interest" or "region of treatment." This is the three-dimensional volume of the patient's anatomy containing the target structures and any nearby critical structures that might be adversely affected by the radiation. Similarly, we will break this volume up into small three-dimensional rectangular elements known as "voxels", each of which is associated with a point {xi,yj,Zk) 6 K'^. Let m be the total number of voxels in the region of interest and let us index the voxels linearly. During the treatment, each voxel will absorb a dose of radiation. We denote the dose values absorbed by the voxels in the region of interest by a vector d 6 R^. Furthermore, let rrit and m^, be the number of target and healthy voxels, respectively, so that rrit + rrih = m. Similarly, we will decompose the dose vector into two sub-vectors dt and dh, corresponding to dose values absorbed by the target and healthy voxels, respectively. 2.2 Dose Calculation The standard IMRT model for dose absorbed at the i-th voxel in the region of interest is n

di^j^^-ij^j^ (1) i=i where a^ represents the amount of dose absorbed by the *-th voxel per unit intensity emission from the j - t h beamlet. The values a^ for all the voxels and bixels form a matrix A G R^^", known as the "influence matrix" (or kernel matrix). In the matrix notation, the dose calculation formula (1) becomes

Fluence Map Optimization in IMRT Cancer Treatment Planning

209

Figure 1 shows how each element aij relates to a beamlet emitted from the MLC and a voxel in the discretized region of interest.

element of the influence matrix

voxel i

Fig. 1. Influence Matrix Element

Assume A has no zero rows or columns. This means, respectively, that all voxels receive some nonzero amount of radiation and every beamlet influences a t least one voxel's dose. These conditions can be easily met by pre-processing, if necessary. Typically, m >> n with m on the order of lo5 or larger and n of lo3 up to lo4. Note the entries aij are necessarily nonnegative. In fact, depending on how much scattering is included, the influence matrix A can be very sparse or fairly dense. The dose calculation model using the influence matrix A can be considered as a first-order approximation. Radiation absorption as a function of the beamlet intensities can be modeled with linear Boltzmann transport equations [Lar97]. Solving these equations can be complicated and computationally expensive, so many different approximation methods have been proposed for computing A. Monte Carlo sampling techniques are, for example, among the more popular methods because of their accuracy. However, Monte Carlo methods are also very slow and expensive. Some commercial planning systems include dose calculation engines with several levels of accuracy. In this way, dose calculations in early iterations, which are usually not required to be

210

Yin Zhang and Michael Merritt

highly accurate, can be made less expensive [WDL+03], while more accurate (and more expensive) schemes can be used in later iterations. Clearly, dose calculation is still an important research area on its own right. While acknowledging its importance, for now we will assume that a constant influence matrix A is provided to us a priori, and we will use it throughout our optimization process. 2.3 Prescriptions A sample prescription for a lung cancer case is given below. As one can see, the prescription consists of a set of dose-volume constraints. Currently, this is the standard practice in prescribing radiation therapy treatments.

>= 95'/, of Tumor = 95'/, of Ext.Tumor = >= >= >=

63 72 60 70

Gy Gy Gy Gy

=

43 30 10 20 10 19 10 54

Gy Gy Gy Gy Gy Gy Gy Gy

1'/. 15'/, 20'/, 2'/, 8'/, 30*/. 40"/, 50*/,

of of of of of of of of

Cord Heart Esophagus Lt.Lung Lt_Lung Rt_Lung Rt_Lung Norm_Tissue

Fig. 2. A sample prescription

For each structure, tumorous or healthy, there is at least one dose-volume constraint, specified by a percentage number on the left and a threshold value for dose on the right. The first four lines of the prescription are for a tumor and a so-called extended area around the tumor that is introduced to account for uncertainties about the boundary of the tumor. The first two lines state that the target tumor dose should be higher than a threshold value of 63Gy, although 5% of the target volume may be lower than that. On the other hand, the target dose should be below 72Gy except for a portion of 1% of volume. We can similarly interpret the specifications on the other structures. It should be pointed out that the dose-volume constraints for the target structures are very different in nature than those for the healthy structures. Obviously, they are not there for the purpose of sacrificing a part for the good of the whole. They are there because it is too difhcult or impossible to achieve

Fluence Map Optimization in IMRT Cancer Treatment Planning

211

a uniform target dose, so some imperfections are allowed. For example, we may very well regard the first two lines of the prescription as a perturbation to a target dose specification of 65Gy. In the rest of the paper, that is precisely the approach we will take; namely, we will assume that a single target dose value will be given for each target structure, knowing that it is unlikely to be exactly satisfiable. This assumption will significantly simplify our presentation, even though our formulation, to be introduced in the next section, can be extended to deal with dose-volume constraints for target structures. 2.4 Current Practice and Research The IMRT fluence map optimization problem has been extensively studied for a number of years, mostly by medical physicists but more recently also by operations researchers and apphed mathematicians. A survey on this subject from a mathematical viewpoint can be found in [SFOM99]. We now give a brief overview on the current practice and on-going research in IMRT fluence map optimization. Intended for a non-expert audience, this overview is by no means comprehensive. For a collection of recent survey papers on many aspects of IMRT treatment planning, including one on mathematical optimization by Censor [Cen03], we refer the reader to the book [PM03] and the references thereof. The fluence map optimization problem can be viewed as an inverse problem where one designates a desirable dose distribution and attempts to determine a beamlet intensity vector that best realizes the given distribution. There are different ways to formulate this problem into optimization problems using different objective functions, some biological and some physical. Biological models attempt to represent statistical knowledge of various biological responses, such as tumor control probability (see [Bra99], for example). At present, however, the predominant formulation is the "weighted least squares" model as described below, which is being used in most commercial IMRT systems on the market. If an objective function is associated with each anatomical structure, then this problem can be naturally viewed as a multi-objective optimization problem (for example, see [CLKBOl, LSB03]). However, due to the difficulties in directly solving multi-objective optimization problems, the prevalent approach in IMRT fluence map optimization is to use a weighted least squares fitting strategy. Although many variations exist, a typical form of the weighted least squares formulation is the following. For each voxel i, one tries to fit the calculated dose value di to some "desirable" value bi. For a target voxel, this "desirable" value is just the prescribed dose value for the tumor structure to which the voxel belongs. For a healthy voxel (for which there is really no "desirable" dose other than zero), it is usually set to the threshold value of the dose-volume constraint, though sometimes adaptive values are used. If a calculated dose for a healthy voxel is less than its "desirable" value, then the

212

Yin Zhang and Michael Merritt

corresponding error term is set to zero. This way, only those doses higher than their "desirable" values are penalized. Then to each target and critical structure, one attaches a weight parameter that represents the relative priority of fitting its calculated doses to the desired one. In fact, different tradeoffs between structures can be made by adjusting these weights. To illustrate, suppose there are four structures Sj, j = 0,1,2,3, each consisting of a set of voxels, where SQ is a target structure and the other three are healthy ones. Then the objective function in the weighted least squares formulation takes the form 3

fix) = Y,w^fM^))^

(3)

i=o where d{x) = Ax is the calculated dose vector corresponding to a beamlet intensity vector x > 0 (see (2)), Wj are the weights, fo{d) = J^{di-bif, i s So

fj{d) = Y, max(0,di " 6 i ) ^ j = 1,2,3, iSSj

and h is the vector of "desirable values" for all voxels. In this case, the resulting error function f{x) in (3) is a convex, piece-wise quadratic function of x. One seeks to minimize /(x) subject to the nonnegativity of the beamlet intensities in X. Obviously, solutions to this weighted least squares problem vary with the a priori choice of the weights. The weighted least squares model in (3) does not directly enforce the dosevolume constraints in a given prescription, which represent the most challenging aspect of the fluence map optimization problem. One approach to enforcing the dose-volume constraints is to try out different choices of the weights while using the dose-volume constraints as evaluation criteria for solutions. The fundamental difficulty in this approach is that there does not seem to exist any transparent relationships between weights and prescriptions. Hence, weight selection basically reduces to a trial-and-error process, which too often becomes overly time-consuming in terms of both human and computer times. In addition to manipulating the weights, some formulations (e.g. [WMOO]) add penalty terms to the weighted least squares objective to "encourage," but not impose, dose-volume constraint feasibility. These penalty terms are inevitably non-convex, thus introducing the complexity of having to deal with local minima. To address this problem, some systems include the option of using stochastic global optimization techniques such as simulated annealing and genetic algorithms to help escape from unsatisfactory local minima. Since the weighted least squares problems are usually quite large, gradienttype algorithms are often the methods of choice. Some implementations also employ conjugate gradient [SC98] or secant methods [LSB03]. The nonnegativity of beamlet intensities are enforced either by projection [WMOO] or some other means [LSB03].

Pluence Map Optimization in IMRT Cancer Treatment Planning

213

With all its shortcomings, the conceptual and algorithmic simpUcity of the weighted least squares approach is still attractive to the practitioners. Indeed, this current state of IMRT treatment planning does represent a remarkable progress in cancer radiotherapy. On the other hand, many issues remain, ample room for improvement exists, and intensive research activities are still on-going. One of the research directions in this field is the so-called weight optimization (see [XLD+99], for example), aimed at automating the weight selection process. The IMRT fiuence map optimization problem has attracted considerable attention from researchers in mathematical programming community who tend to formulate the problem into hnear or mixed-integer linear programs (see [LFC03, Hol03, HADL03, RLPT04] for a sample of some recent works). Linear programming techniques for radiation therapy have been proposed and studied since the early days [BKH+68] and have also been considered for treating dose-volume constraints [LMRW91]. On the other hand, the introduction of mixed-integer programming techniques into radiotherapy planning for treating dose-volume constraints was a more recent event. Many contributions from the mathematical programming community seem encouraging and promising. Their impact on the clinical practice of radiotherapy, even though limited at this point, will hopefully be felt over time. Finally, we reiterate that the above short overview is by no means comprehensive. Given the vast literature on this subject, many omissions have inevitably occurred, most notably works based on biological objective functions and works connecting the fiuence map optimization to the beam angle optimization as well as to the multileaf sequencing.

3 A Proposed Approach In our view, there are two levels of difficulties in IMRT fiuence map optimization, as outlined below. Our focus in this paper will be on the first issue. 1. Given a prescription, one needs to find a beamlet intensity vector so that the calculated dose from it will satisfy the prescription as closely as possible. The difficulty for this problem lies in the fact that dose-volume constraints define a complicated non-convex feasibility set. This leads to a non-convex global optimization problem that is difficult to solve exactly. The traditional weighted least squares approach relies, to a large degree, on a trial-and-error weight-selection process to search for a good plan. 2. Due to variations from patient to patient even for the same kind of cancer, more often than not, oncologists themselves do not know a priori a "good and achievable" prescription for a particular patient. A highly desirable prescription could be too good to be achievable, while an achievable one might not be close enough to the best possible. Development of procedures to assist oncologists in their decision-making is of paramount importance.

214

Yin Zhang and Michael Merritt

In this section we propose a geometric approach t h a t is entirely prescriptiondriven and does not require any artificial weights. At the same time, we will retain the least squares framework. As such, one can consider our formulation as a "weightless least squares" approach. We consider two sets in the dose space: (i) the physical set consisting of physically realizable dose distributions, and (h) the prescription set consisting of dose distributions meeting t h e prescribed tumor doses and satisfying the given dose-volume constraints. In the case where a prescription is given, we seek a suitable dose distribution by successively projecting between these two sets. A crucial observation is t h a t the projection onto the prescription set, which is non-convex, can be properly defined and easily computed. T h e projection onto the physical set, on the other hand, requires solving a nonnegative least squares problem. We show t h a t this alternating projection algorithm is actuahy equivalent to a greedy algorithm driven by local sensitivity information readily available in our formulation. Moreover, the availabihty of such local sensitivity information offers an opportunity to devise greedy algorithms to search for a desirable plan even when a "good and achievable" prescription is unknown. To keep the expository flavor of the paper, we will not include long and overly technical proofs for some mathematical results stated. A more complete treatment, including extensive numerical results, will be presented in a subsequent paper in preparation. 3.1 P r e s c r i p t i o n a n d P h y s i c a l S e t s We partition the rows of the influence matrix A into two groups: those for target voxels and those for healthy ones; t h a t is. A

At Ah

(4)

where At is the submatrix consisting of the rows for target voxels and likewise Ah of those for healthy voxels. Recall t h a t A S IR^J?^" where m = mt + ruh is the number of voxels, and n the number of bixels. T h u s At € IR!f!*^" and Ah £ R'^''^". W i t h this notation, Atx gives the calculated doses for the target voxels and AhX those for the healthy ones. We start by defining two sets in the dose space. D e f i n i t i o n 1 ( P r e s c r i p t i o n S e t ) . Let bt £ R ^ ' be the dose vector for target voxels in a given prescription, and Vy C R^''' be the set of dose vectors for healthy voxels that satisfy all the dose-volume constraints in the given prescription. We call the following set the prescription set

n=[['^:u^Vy^

C R!".

(5)

Fluence Map Optimization in IMRT Cancer Treatment Planning

215

Clearly, any dose vector d EH precisely meets the prescribed target doses and at the same time satisfies all the dose-volume constraints given in the prescription. If healthy tissue doses calculated from a beamlet intensity vector X e R" satisfy the dose-volume constraints, then we must have AhX < u; or equivalently, A^x + s = u ior some nonnegative slack variable s G R^''. Definition 2 (Physical Set). Let A be defined as in (4). We call the following set the physical set K.

Atx AhX + s

{x,s)>0}

CR"^.

(6)

Clearly, the physical set contains all the dose vectors that can be physically realized (disregarding the slack variable) under the standard dose calculation model. Both H and K, are closed sets in W^, and /C is a convex cone but Ti is non-convex. In fact, Vy is a non-convex union of convex "boxes." For example, suppose we have only two healthy tissue voxels in the region of interest and one dose-volume constraint: at least 50% of voxel doses must be less than or equal to IGy. Then Vy = {u £R% : ui < 1} L) {u &R% : U2 < 1}; i.e., either ui can be greater than one or U2, but not both. Clearly, this is the L-shaped (hence non-convex) region in the first quadrant along the two coordinate axes. Figure 3 shows the relation of Vy to H and K, for this case when there is one target voxel. Note the L-shaped region is elevated to a height corresponding to a given target dose value 64. In this figure, the two sets H and AC do not intersect. It is easy to imagine that with more voxels and more dose-volume constraints, the complexity of the geometry for T>y grows quickly out of hand. However, Vy always enjoys a very nice geometric property. That is, despite its non-convexity, Vy permits a trivial projection onto it once the issue of non-uniqueness is resolved (see below). For example, suppose that Vy c R^ specifies only one dose-volume constraint: at least 70% of the voxels must have doses of no more than 5Gy. Then Projo„((l,2,3,4,5,6,7,8,9,10)^) = (1,2,3,4,5,5,5,8,9,10)^. where Projp is the projection onto Vy. That is, we set the smallest two numbers greater than 5 equal to 5. Clearly, this is the closest point in Vy as it affects the least change on the original point in R^. Since Vy is nonconvex, such a projection will not always be unique, but this issue can be resolved by setting some priority rules. It is not difficult to see that dosevolume constraints (DVCs) for multiple structures, and multi-level DVCs for the same structure, can be treated in a similar fashion. Moreover, it is worth noting that projecting a point d G R^ onto H is tantamount to setting the first mt components of d (the target voxel dose values) to bt and projecting the last m^ components of d (the healthy voxel dose values) onto Vy. On the other hand, projecting onto IC is substantially more difficult and will be discussed next.

216

Yin Zhang and Michael Merritt

besirable and beliverable Fig. 3. Prescription set 7-t and physical set K in dose space :!FI

3.2 O p t i m i z a t i o n F o r m u l a t i o n s

Given a prescription, ideally we would like to find x E iR7, s E R y ' h n d u E D, such that Atx = bt, Al,x s = u,

+

but this system is generally over-determined and does not permit a solution. To see this, it suffices to examine the first equation Atx = bt which has mt equations with n unknowns. In practice, there are usually more target voxels than the total number of bixels, i.e., mt > n. The reality of the IMRT fluence map problem is that there may be no physically achievable dose that both satisfies the DVCs and meets the target prescription. That is, 1-I n K = 0; or K) > 0 where dist(., .) is the Euclidean distance between equivalently, di~t(1-I~ two sets. Thus, we are motivated t o find a prescription dose vector dT = [b: uT], u E D,, that is closest to the physical set K (or vice versa). In this view, we have an optimization problem with a variable u E D, (because bt is fixed):

The objective in the problem (7) describes the distance from a given prescription dose vector t o the physical set K which can be written as

Fluence Map Optimization in IMRT Cancer Treatment Planning

217

where || • || is the Euchdean norm by default (though any other fixed, weighted norms can be used as well). Equivalently, we can replace the norm above by one half times the square of the norm and define the following objective function fiu) = min^ i WAx - btf + i \\A„x + s - uf . (8) Namely, f{u) is itself the optimal value of a linear least squares problem with nonnegativity constraints. Using this notation, we can rewrite the problem (7) into the following equivalent form min f(u).

(9)

It is not difficult to show that f{u) decreases monotonically as u increases. Let {x{u),s{u)) be the solution of the optimization problem defined in the right-hand side of (8) for a given u e Vy. Then under suitable conditions, it can be proved that f{u) is differentiable and Vf{u) = -ma.x{0, Ahx{u)-u)

0. For our problem, it is usually the case that dist{H,IC) > 0. Successive or simultaneous projection algorithms have been applied to different formulations of the IMRT flucence map optimization problem, see [CLM+98, WJLM04, XMCG04] for example, where the sets involved in projections are all convex sets (in some cases convex approximations to non-convex sets). To our best knowledge, projections have not been directly apphed to the non-convex DVC feasibility set Vy defined in (9). The set Vy in (9) consists

218

Yin Zhang and Michael Merritt

of a large number of "branches," where one or more of the voxels has a dose that exceeds its threshold dose. Obtaining or verifying a global minimum on such a set can be excessively difficult. We will instead seek a local minimum in one of the branches. We propose to apply an alternating projection algorithm to find a local minimum of (7) by successively projecting iterates in dose space back and forth between the prescription set Ti and the physical set IC. Specifically, given do &7i and for /e = 0,1,2,..., do 4+1/2 = Proj^cK),

4 + 1 = Proj„((ifc+i/2).

(H)

In this algorithm, the iterates are in the prescription set, while the intermediate iterates are in the physical set corresponding to a sequence of beamlet intensity vectors {xi;^i/2 : k = 0,1,2,...}. As mentioned earlier, the projection onto H is easy, and the projection onto IC requires solving a nonnegative hnear least squares problem as defined in the right-hand side of (8). We emphasize that the starting point do & Ti should be chosen to satisfy threshold values of all the dose-volume constraints; i.e., do should be in the intersection of all the "branches" (or "boxes"). For example, if a dosevolume constraint for a given structure is "no more than 30% of voxels can have dose values greater than or equal to 20Gy," then we should require that every component of do corresponding to a voxel of that structure to be set to the threshold value 20 (or possibly lower). As the iterations progress, the algorithm will then automatically select voxels where the threshold value of 20 will be exceeded. This way we avoid arbitrarily selecting which "branch" to enter at the outset. 3.4 Equivalence to a Greedy Algorithm We now consider a gradient projection algorithm directly appfied to the problem (9): given uo G !?„, •Ufc+i =Projj,^^(ufc-afcV/(ufc)), fc = 0 , 1 , 2 , . . . .

(12)

This is a steepest-descent type algorithm, or a greedy algorithm. At each step, the movement is based on the local sensitivity information ~ the gradient of f{u). Likewise, we select the initial iterate UQ to be at or below the threshold values of all the dose-volume constraints, ensuring u & T>y. Then the algorithm will automatically increase u (recall that Vf{u) < 0) in proportion to the sensitivity of the objective function at the current iterate. Moreover, the projection Proj-p^ that follows each move will keep the iterate within the feasibihty set Vy. Thus, this algorithm can be considered as a sensitivity-driven greedy algorithm for solving (9). We note that f{u) is monotone in the direction —Wf{u) > 0. Hence the step length selection in (12) seems not as critical as in general situations. We now show that the constant step length a/j s 1 will lead to an algorithm that is equivalent to the alternating projection algorithm (11).

Fluence Map Optimization in IMRT Cancer Treatment Planning

219

Theorem 1. Let {dk} and {uk} be generated by algorithms (11) and (12), respectively, where d]^ = \bj UQ\ and ai- = 1 in (12). Then dk =

bt

/c = l , 2 , 3 , . . . .

(13)

Proof. Let us drop the iteration subscript k. Define x{u) and s{u) as the solutions associated with the subproblem in the right-hand side of (8). By the definitions of the relevant projections, Proj„ (^Proj^ (^ J j j = P r o j „ (^^^(^^ + ,(^) j = (proj^^(yl,J(u) + ^(u)) Therefore, it suffices to show that u —V/(it) = Afix{u) + s{u). By the gradient formula (10) of V / ( u ) , u — Vf{u) = u + max(0, Ahx{u) — u) = max(u, Ahx{u)). So, it remains to show that Ahx{u) + s{u) = max(u, Ahx{u)) for all u E Vy. In the following, we use subscripts to denote components of vectors. If [A/jx(u)]j < Ui, then necessarily the slack variable s{u)i > 0 must satisfy [Ahx{u) + s{u)]i =Ui = max{ui, [Ahx{u)]i). On the other hand, if [i4/ia;(u)]i > m, then necessarily the slack variable s{u)i = 0 and [Ahx{u) + s{u)]i = [Ahx{u)]i = max{ui, [Ahx{u)]i). This completes the proof. D The equivalence between these two algorithms allows us to view the problem geometrically and apply the alternating projection algorithm (11) with the confidence that locally, reasonable choices are being made as to which dose bounds to relax to take advantage of the flexibility in the dose-volume constraints. 3.5 Convergence to Local Minimum Since the prescription set 7i is non-convex, the classic convergence theory for alternating projection algorithm is not directly apphcable. In our limited computational experience, the algorithm has never failed to converge so far. We observe that despite H being non-convex, it is the union of finitely many (simple) convex sets. This "local convexity" of 7i seems to likely allow a modified convergence proof for the alternating projection algorithm in our case, which remains to be a further research topic. In the meantime, if we introduce a minor modification to the alternating projection algorithm, then the convergence of the algorithm to a local minimum will be guaranteed. For simplicity, let us assume that there is only one healthy structure with a single dose-volume constraint that defines the feasibility set !?„:

220

Yin Zhang and Michael Merritt "No more than P-percent of the healthy voxels can receive doses exceeding a given threshold value 7 > 0."

Let us work with the simpler algorithmic form (12). Suppose that at iteration k, the dose vector u^ S T>y is such that (rfcP)-percent of the voxels have already exceeded the threshold value 7 for some r^ G [0>1]- Then we define 2?^ to be the set of dose vectors for the healthy structure that satisfy the following dose-volume constraint: "No more than (1 — Tfc) P-percent of the healthy voxels corresponding to [uk]i < 7 can receive doses exceeding the threshold value 7." In this setting, once a voxel has taken a dose value [uk]i > 7 at some iteration k, it will be allowed to take dose values greater than 7 for all the subsequent iterations. Moreover, once r/c = 1 at some iteration k, then in all subsequent iterations no more dose upper-bounds will be allowed to exceed 7 except those already having been allowed. The modified algorithm wiU take a projection at iteration k onto the set V^ instead of onto Vy; that is, Mfc+i =ProJx,fc(Mfc - V/(wfe)), /c = 0 , 1 , 2 , . . . .

(14)
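For one healthy structure with a single dose-volume constraint, the projection onto D_v^k can be sketched as below; this is our illustrative reading of the greedy rule just described (function name, inputs and the nearest-point clipping strategy are assumptions), not the authors' Matlab implementation.

```python
import numpy as np

def project_onto_Dvk(w, gamma, P, exceeded):
    """Sketch of the projection used in (14) for one healthy structure with the
    constraint "no more than P-percent of voxels above gamma".

    w        : trial healthy-structure dose vector, u_k - grad f(u_k)
    gamma    : dose threshold of the dose-volume constraint
    P        : allowed fraction of voxels above gamma (e.g. 0.3)
    exceeded : boolean mask of voxels already allowed above gamma
    """
    w = np.asarray(w, dtype=float)
    u = w.copy()
    n = w.size
    r_k = np.count_nonzero(exceeded) / (P * n)            # fraction of the budget already used
    remaining = np.flatnonzero(~np.asarray(exceeded))
    budget = int(np.floor((1.0 - min(r_k, 1.0)) * P * remaining.size))
    over = remaining[w[remaining] > gamma]                # voxels trying to newly exceed gamma
    if over.size > budget:
        order = over[np.argsort(w[over])]
        u[order[:over.size - budget]] = gamma             # clip all but the largest `budget` excesses
    return u
```

The mask `exceeded` is carried from one iteration to the next, which is what makes a voxel that once exceeds γ stay unconstrained in all later iterations.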

In essence, this algorithm provides a greedy scheme to select a set of healthy voxels that are allowed to receive higher doses. We now state the following convergence result for this algorithm without a proof.

Theorem 2. Let the iteration sequence {u_k} be generated by the algorithm (14) with u_0 ≤ γ. Then {u_k} ⊂ D_v and satisfies that (i) u_{k+1} ≥ u_k componentwise, (ii) f(u_{k+1}) ≤ f(u_k), and (iii) {u_k} converges to a local minimum u_* of f(u) in D_v.

We emphasize that the proposed algorithms in this paper are designed for quickly finding a good local optimum instead of locating a global optimum. A number of studies [WM02, LDB+03, JWM03] indicate that the existence of multiple local minima due to dose-volume constraints does not appear to notably affect the quality of treatment plans obtained by the weighted least squares approach. A plausible interpretation of this phenomenon is that there exist many easily reachable local minima with function values very close to the global minimum value. Given the presence of various errors in mathematical models (such as dose calculation models) and in data measurements, finding a global optimum for the underlying non-convex optimization problem does not seem necessary or practically meaningful, as long as a good local minimum is found. Of course, it is still important to carefully assess the quality of obtained solutions from a clinical viewpoint.

Let us examine the dose values calculated at the solution u_*. Clearly, u_* ∈ D_v and A x(u_*) ∈ K (corresponding to s = 0). However, in general one should not expect that A_h x(u_*) ∈ D_v. That is, the locally optimal physical dose calculated by the algorithm generally does not satisfy the dose-volume constraints, because such constraints are not explicitly imposed in


our "weightless least-squares" formulation, just as in weighted least-squares formulations. While this lack of a direct control over the dose-volume constraint satisfaction could be viewed as a potential disadvantage on one hand, it does usually allow fast solution times on the other hand. 3.6 Preliminary Numerical Results In this section, we demonstrate the potential of our algorithm on some twodimensional phantom cases. The region of interest is a 101 x 101 voxel cross section of a simulated treatment area. Each of the three test cases have different geometries for the "tumor" and "critical organ", or organ at risk (OAR). The simplest involves a C-shaped tumor that has grown around a small OAR. More challenging is a small OAR completely surrounded by an "O"-shaped tumor. In the third case, we add further comphcation to the "O" configuration by having the OAR also include a rectangular region just outside the tumor. The geometries of these cases, as outhned in the top-side pictures of Figures 46, are nontrivial and, in our view, sufficient for prehminary proof-of-principle studies. In all the test cases, we specify the prescribed target dose to be 80Gy for all the tumors, and consider the dose-volume constraint: "at most 30% of the critical organ voxels can have doses greater than 25Gy." We label as "normal" all the tissue which is neither the tumor nor the organ at risk. Although not as concerning as injury to the critical organ, we would always hke to prevent this normal tissue from receiving too high a dose. Therefore, we also specify an upper bound of 75Gy for the normal tissue, equivalent to a dose-volume constraint: "0% of the normal tissue can have doses greater than 75Gy." Additionally, each plan uses 9 coplanar beams with dose absorption governed by an influence matrix (i.e., A in (2)) that we have obtained from The University of Wisconsin-Madison Tomotherapy Research Group. We implemented the algorithm (14) in Matlab. To perform the minimization in (8) (projection onto /C) at each iteration, we used an interior-point scaled gradient algorithm [MZ04]. In anticipation that least squares solutions will allow calculated doses to vary both above and below their desired values, to be on the safer side we adjust bt to be 5% higher than the desired tumor dose SOGy, and similarly the OAR threshold value to be 15% lower than the desired 25Gy. We stop the algorithm once the relative change from u^ to u^+i becomes less than 1%. In our experiments, we have observed that the algorithm took very few (usually two) iterations to terminate in all the tested cases. Our computational results are presented in Figures 4-6 corresponding to the three test cases. In each figure, we have included a dose distribution on the left, and a dose-volume histogram (DVH) on the right. The dose distribution indicates the level of calculated radiation intensity (in gray scale) deposited in the region of interest. As can be seen, the calculated doses are well focused on the tumors while more or less sparing the critical organs. The dose-volume


histograms show the relationship between a given dose value (on the x-axis) and the volume percentage (on the y-axis) of an anatomical structure receiving that level of radiation or higher. For instance, in Figure 4 or 5 the point (40, 0.1) on the "Normal" curve means that 10% of the normal tissue has received a radiation dose of 40Gy or higher.
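A cumulative DVH of the kind plotted in Figures 4-6 can be computed directly from a structure's dose vector; the sketch below (function name and grid resolution are ours) returns the curve sampled on a dose grid.

```python
import numpy as np

def dose_volume_histogram(dose, grid=None):
    """Cumulative dose-volume histogram for one structure.

    dose : 1-D array of calculated doses (Gy) over the structure's voxels.
    grid : dose values (x-axis) at which to evaluate the curve.
    Returns (grid, fraction), where fraction[j] is the fraction of voxels
    receiving grid[j] Gy or more; e.g. the point (40, 0.1) means 10% of the
    structure receives at least 40 Gy.
    """
    dose = np.asarray(dose, dtype=float)
    if grid is None:
        grid = np.linspace(0.0, dose.max(), 200)
    fraction = np.array([(dose >= d).mean() for d in grid])
    return grid, fraction
```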

Fig. 4. C-Shape Dose Distribution and DVH

We have previously performed computational experiments on the same set of phantom cases with a weighted least squares (WLS) approach and a successive linear programming (SLP) approach [MZL+04]. Given these experiences, drawing some comparison would be useful. The weighted least squares approach requires trying out multiple sets of weights and solving a non-negative least squares problem for each set of weights. As such, it generally requires considerably more computation than the new approach. In [MZL+04], we used an exhaustive search, designed only for cases with one critical organ and one tumor, to find a set of optimal


Fig. 5. O-Shape Dose Distribution and DVH

weights. With such optimal weights, the WLS approach produced solutions of a quality similar to that of the new approach. The SLP approach enforces the exact satisfaction of the dose-volume constraints and solves a sequence of linear programs. It obtained slightly better quality solutions than the new approach, but required far more computation than the new approach. In addition, the beamlet intensity distributions generated by the SLP approach are generally less smooth, creating difficulties for the later leaf-sequencing stage. For more details on the WLS and SLP approaches, we refer interested readers to [MZL+04]. These preliminary numerical results, as encouraging as they may appear, constitute only a first step towards validating the viability of the proposed approach.


Fig. 6. OA-Shape Dose Distribution and DVH

4 Final Remarks

The IMRT fluence map optimization problem arises, along with a few other optimization problems, from the state-of-the-art technologies of radiation therapy for cancer treatment. The problem has been extensively studied by medical physicists, and has also attracted considerable on-going research from the operations research and optimization communities. Currently, the predominant methodology in practice is the "classic" weighted least squares (WLS) approach, which focuses on determining an optimal beamlet intensity vector. In this paper, we take a different view and treat dose distributions as the primary variables, resulting in a formulation based on the geometry in "dose space." It is our purpose to retain the popular "least squares" framework, while doing away with the burden of having to select weights in the classic WLS approach. The proposed formulation is free of weights, prescription-driven, sensitivity guided, and still shares basic characteristics of a least-squares approach such as not having a precise control over the dose-volume


constraint satisfaction and, at the same time, being much less computationally demanding. It is designed for quickly finding a good locally optimal plan associated with a given prescription. Preliminary computational results indicate that the approach is potentially capable of producing solutions of a quality at least comparable to that obtainable by the classic WLS approach. Encouraged by these proof-of-principle results, we are currently working towards more realistic testing on three-dimensional clinical cases. The approach presented in this paper is only one of many on-going research efforts in helping optimize IMRT cancer treatment planning. It is hoped that an active participation of the operations research and optimization communities in this important application field will bring about an advancement in cancer treatment planning.

Acknowledgment

The first author would like to thank his colleagues in the Optimization Collaborative Working Group, sponsored by the National Cancer Institute and the National Science Foundation, and Dr. Helen Liu of M. D. Anderson Cancer Center, from whom he has learned the basics about cancer radiation therapy.

References

[BKH+68] G. K. Bahr, J. G. Kereiakes, H. Horwitz, R. Finney, J. Galvin, and K. Goode. "The method of linear programming applied to radiation treatment planning." Radiology, 91:686-693, 1968.
[BB93] H. Bauschke and J. Borwein. "On the convergence of von Neumann's alternating projection algorithm for two sets." Set-Valued Analysis, 1:185-212, 1993.
[Bra99] A. Brahme. "Optimized radiation therapy based on radiobiological objectives." Sem. in Rad. Oncol., 9(1):35-47, 1999.
[Cen03] Y. Censor. "Mathematical optimization for the inverse problem of intensity modulated radiation therapy." In: J. R. Palta and T. R. Mackie (Editors), Intensity-Modulated Radiation Therapy: The State of The Art. American Association of Physicists in Medicine, Medical Physics Monograph No. 29, Medical Physics Publishing, Madison, Wisconsin, USA, 2003, pp. 25-49.
[CG59] W. Cheney and A. Goldstein. "Proximity maps for convex sets." Proceedings of the AMS, 10:448-450, 1959.
[CLM+98] P. S. Cho, S. Lee, R. J. Marks II, S. Oh, S. G. Sutlief, and M. H. Phillips. "Optimization of intensity modulated beams with volume constraints using two methods: cost function minimization and projections onto convex sets." Medical Physics, 25:435-443, 1998.
[CLKB01] C. Cotrutz, M. Lahanas, C. Kappas, and D. Baltas. "A multiobjective gradient-based dose optimization algorithm for external beam conformal radiotherapy." Phys. Med. Biol., 46:2161-2175, 2001.
[Hol03] A. Holder. "Designing radiotherapy plans with elastic constraints and interior point methods." Health Care and Management Science, 6(1):5-16, 2003.
[JWM03] R. Jeraj, C. Wu, and T. R. Mackie. "Optimizer convergence and local minima errors and their clinical importance." Phys. Med. Biol., 48:2809-2827, 2003.
[LSB03] M. Lahanas, M. Schreibmann, and D. Baltas. "Multiobjective inverse planning for intensity modulated radiotherapy with constraint-free gradient-based optimization algorithms." Phys. Med. Biol., 48(17):2843-2871, 2003.
[LMRW91] R. G. Lane, S. M. Morrill, I. I. Rosen, and J. A. Wong. "Dose volume considerations with linear programming optimization." Med. Phys., 18(6):1201-1210, 1991.
[Lar97] E. W. Larsen. "Tutorial: the nature of transport calculations used in radiation oncology." Transport Theory Statist. Phys., 26:739, 1997.
[LFC03] E. Lee, T. Fox, and I. Crocker. "Integer programming applied to intensity-modulated radiation treatment planning optimization." Annals of Operations Research, Optimization in Medicine, 119:165-181, 2003.
[LDB+03] J. Llacer, J. Deasy, T. Bortfeld, T. Solberg, and C. Promberger. "Absence of multiple local minima effects in intensity modulated optimization with dose-volume constraints." Phys. Med. Biol., 48:183-210, 2003.
[MZL+04] M. Merritt, Y. Zhang, H. Liu, X. Zhang, X. Wang, L. Dong, and R. Mohan. "A successive linear programming approach to the IMRT fluence map optimization problem." Manuscript, 2004.
[MZ04] M. Merritt and Y. Zhang. "An interior-point gradient method for large-scale totally nonnegative least squares problems." To appear in JOTA, 126(1):191-202, 2005.
[PM03] J. R. Palta and T. R. Mackie, eds. Intensity-Modulated Radiation Therapy: The State of the Art. Medical Physics Publishing, 2003.
[RLPT04] R. Rardin, M. Langer, F. Preciado-Walters, and V. Thai. "A coupled column generation, mixed integer approach to optimal planning of intensity-modulated radiation therapy for cancer." To appear in Mathematical Programming, 2004.
[HADL03] H. Romeijn, R. Ahuja, J. Dempsey, A. Kumar, and J. Li. "A novel linear programming approach to fluence map optimization for intensity modulated radiation therapy treatment planning." Phys. Med. Biol., 48:3521-3542, 2003.
[SFOM99] D. Shepard, M. Ferris, G. Olivera, and T. Mackie. "Optimizing the delivery of radiation therapy to cancer patients." SIAM Review, 41:721-744, 1999.
[SC98] S. V. Spirou and C. Chui. "A gradient inverse planning algorithm with dose-volume constraints." Med. Phys., 25(3):321-333, 1998.
[Neu50] J. von Neumann. "The geometry of orthogonal spaces." Functional Operators, Vol. II, Annals of Math. Studies, no. 22, Princeton University Press, 1950. (This is a reprint of mimeographed lecture notes, first distributed in 1933.)
[WM00] Q. Wu and R. Mohan. "Algorithms and functionality of an intensity modulated radiotherapy optimization system." Med. Phys., 27(4):701-711, 2000.
[WJLM04] C. Wu, R. Jeraj, W. Lu, and T. Mackie. "Fast treatment plan modification with an over-relaxed Cimmino algorithm." Med. Phys., 31:191-200, 2004.
[WM02] Q. Wu and R. Mohan. "Multiple local minima in IMRT optimization based on dose-volume criteria." Med. Phys., 29:1514-1527, 2002.
[WDL+03] Q. Wu, D. Djajaputra, M. Lauterbach, Y. Wu, and R. Mohan. "A fast dose calculation method based on table lookup for IMRT optimization." Phys. Med. Biol., 48(12):N159-N166, 2003.
[XMCG04] Y. Xiao, D. Michalski, Y. Censor, and J. Galvin. "Inherent smoothness of intensity patterns for intensity modulated radiation therapy generated by simultaneous projection algorithms." Phys. Med. Biol., 49:3227-3245, 2004.
[XLD+99] L. Xing, J. G. Li, S. Donaldson, Q. T. Le, and A. L. Boyer. "Optimization of importance factors in inverse planning." Phys. Med. Biol., 44(10):2525-2536, 1999.

Panoramic Image Processing using Non-Commutative Harmonic Analysis, Part I: Investigation

Amal Aafif and Robert Boyer

Department of Mathematics, Drexel University, Philadelphia, PA 19104, USA. [email protected], [email protected]

Summary. Automated surveillance, navigation and other applications in computational vision have prompted the need for omnidirectional imaging devices and processing. Omnidirectional vision involves capturing and interpreting full 360° panoramic images using rotating cameras, multiple cameras or cameras coupled with mirrors. Due to the enlarged field of view and the type of sensors required, typical techniques in image analysis generally fail to provide sufficient results for feature extraction and identification. A non-commutative harmonic analysis approach takes advantage of the Fourier transform properties of certain groups. Past work in representation theory already provides the theoretical background to analyze 2-D images, though extensive numerical work for applications is limited. We will investigate the implementation and computation of the Fourier transform over groups, such as the motion group. The Euclidean motion group SE(2) is a solvable Lie group that requires a 2-D polar FFT and has symmetry properties that could be used as a tool in processing panoramic images.

1 Introduction

Applications in computer vision have expanded as larger and larger images can be stored, processed and analyzed quickly and efficiently. Autonomous robot navigation, automated surveillance and medical imaging all benefit from expanded fields of view with minimal distortion. Rotating cameras, multiple cameras and mirror-camera systems are being developed to capture 360° panoramic images. An image taken from a camera aimed at a spherical mirror, for example, will give a very distorted view of the surroundings. While the image can be "unwrapped" through a cylindrical transformation to create a panoramic image, the objects will appear distorted and blurry, making feature identification difficult. Several camera-mirror systems with various mirror shapes have been proposed to minimize distortion and eliminate the need for


pre-processing. However, efficient methods for feature identification and template matching for such images are still needed for automated image analysis. Conventional approaches to pattern recognition include moment invariants and Fourier descriptors; however, they are generally not suited for omnidirectional images. A group theoretical approach combined with Fourier analysis takes advantage of the symmetry properties of the Fourier transform over certain groups, particularly matrix Lie groups. By defining the general Fourier transform over a group, one can find invariants and descriptors similar to those mentioned above. This paper presents a preliminary investigation into a non-commutative harmonic analysis approach proposed in [GBS91] and [Fon96] and its possible application to panoramic images. Sections 2 and 3 provide the necessary background in Lie group theory and representation theory applied to the group of motions on the plane. Section 4 describes the motion descriptors and the invariants to be extracted from 2-D black & white and gray-scale images. The last two sections illustrate the results and effectiveness of the invariants as a tool for identifying objects under rotation, translation and scaling.

2 Basics of matrix Lie groups and Representations

An important class of groups is the matrix Lie groups over the real and complex numbers. The general linear group GL(n, R) or GL(n, C) over the real or complex numbers is the group of all n × n invertible matrices with real or complex entries. Regular noncommutative matrix multiplication is the group operation. A matrix Lie group is any subgroup G of GL(n, C) with the property that if any sequence of matrices in G converges to some matrix A, then either A ∈ G or A is not invertible. This property holds if and only if a matrix Lie group is a closed subset of GL(n, C). Several matrix Lie groups will be taken into consideration in this paper. The set of all n × n orthogonal (i.e. A^T = A^{-1}) matrices with determinant 1 is the special orthogonal group SO(n), where both orthogonality and having determinant 1 are preserved under limits. SO(n) is then the matrix Lie group of rotations. Orthogonal groups generalized to complex entries are also matrix Lie groups. The set of all unitary matrices also forms a subgroup of GL(n, C). A matrix is unitary if A* = A^{-1}, where A* is the adjoint or conjugate-transpose of A. Unitary matrices have orthonormal column vectors and preserve the inner product. The Euclidean group E(n) is the group of all one-to-one distance preserving maps from R^n to itself. Rotations and translations are described by this group; so the orthogonal group is a subgroup of the Euclidean group. For x ∈ R^n, the translation by x is defined by T_x(y) = x + y. Every element

T of E(n) can be expressed uniquely as a rotation R followed by a translation T_x, where R ∈ O(n): T = T_x R. The group operation is described by (R_1, x_1)(R_2, x_2) = (R_1 R_2, x_1 + R_1 x_2). Since translations are not linear, E(n) is not a subgroup of the general linear group but is isomorphic to a group of matrices of the form:

    [ R        x ]
    [ 0 ... 0  1 ]

with R ∈ O(n) and x ∈ R^n a column vector.
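As a small illustration of this embedding for the planar case n = 2 (the group SE(2) used below), one can build and compose such matrices directly; the function name and test values here are ours.

```python
import numpy as np

def planar_motion(theta, x, y):
    """Embed the Euclidean motion (R(theta), (x, y)) of E(2) as the
    3x3 matrix [[R, t], [0, 1]] described above."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0.0, 0.0, 1.0]])

# Composition of motions is ordinary matrix multiplication, matching
# (R1, x1)(R2, x2) = (R1 R2, x1 + R1 x2):
g1 = planar_motion(np.pi / 4, 1.0, 0.0)
g2 = planar_motion(np.pi / 6, 0.0, 2.0)
g12 = g1 @ g2
```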

E(n) is not connected, i.e. for A, B ∈ E(n) we cannot always find a continuous path lying in the group from A to B. However, each non-connected matrix Lie group can be decomposed into connected components. If the component of G contains the identity then it is also a subgroup of G. Other important groups such as the positive reals under multiplication are usually not thought of as matrix groups; however, they are isomorphic to matrix Lie groups. R* and C* are the groups of non-zero real numbers and complex numbers under multiplication that are isomorphic to the general linear group. S^1 is a commutative group of complex numbers with modulus 1, i.e. the unit circle. The orthogonal groups over R and S^1 satisfy the conditions for compactness: 1. If A_m is any sequence of matrices in a matrix Lie group that converges to a matrix A, then A is in the group. 2. There exists a constant C such that for all A ∈ G, |A_ij| ≤ C for all i, j. The orthogonal groups over C, the Euclidean groups, R* and C* all violate property 2 and are therefore not compact. Before defining the Fourier transform over groups, some background in representation theory is required. If G and H are matrix Lie groups, we can define a continuous map φ : G → H called a Lie group homomorphism that satisfies φ(gh) = φ(g)φ(h).

{…, λ ∈ Z}

(2)

The second representation, φ², which will not be used here, maps an element of SE(2) into operators acting on C

where z ∈ C. The first representation, φ_λ, maps an element (θ, x, y) into operators that act on L²(S¹, dθ), the set of square summable functions on the unit circle with Lebesgue measure dθ:

where z & S^ and F G L'^. It is typically more common to express the elements of the group and unitary representation in polar coordinates so that g G SE{2) and (f)\ are functions of 9,p and co. The unitary representation of SE{2) can generally be expressed as a matrix; let Um,n{9{9, p, w), A) be a matrix element of the representation 4>\{9, p,u!) then Um,n{9(.9,p,u;),X) = i"—e-^("^+('"-")-)j^_„(pA)

(3)

where J_k(x) is the kth Bessel function. The matrix unitary representation of SE(2) allows us to see the group symmetry properties more clearly [CKOO]:

ū_{m,n}(g, λ) = (−1)^{m−n} u_{−m,−n}(g, λ)
ū_{n,m}(g, λ) = u_{m,n}(g⁻¹, λ)
u_{m,n}(g(θ, −ρ, ω), λ) = (−1)^{m−n} u_{m,n}(g(θ, ρ, ω), λ)
ū_{n,m}(g(θ, ρ, ω), λ) = (−1)^{m−n} u_{m,n}(g(−θ, ρ, ω − θ), λ)

(4)

where ū_{m,n} is the complex conjugate of u_{m,n}. Since SE(2) is neither compact nor commutative, the matrix representation is infinite dimensional. However, the representation is still irreducible.
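Equation (3) is straightforward to evaluate numerically; the sketch below uses scipy's Bessel function and assumes the (θ, ρ, ω) polar convention stated above, so the sign and index conventions should be checked against whichever reference implementation is used.

```python
import numpy as np
from scipy.special import jv   # Bessel function J_k of integer order

def u_mn(m, n, theta, rho, omega, lam):
    """Matrix element u_{m,n}(g(theta, rho, omega), lambda) of the SE(2)
    representation in equation (3):
        i^(n-m) * exp(-i(n*theta + (m-n)*omega)) * J_{m-n}(rho*lambda)."""
    return (1j) ** (n - m) * np.exp(-1j * (n * theta + (m - n) * omega)) * jv(m - n, rho * lam)

# Arbitrary test evaluation; the symmetry relations in (4) can be checked
# numerically by comparing such values for related group elements.
g = (0.3, 1.2, 0.7)           # (theta, rho, omega), illustrative values
val = u_mn(2, -1, *g, lam=1.5)
```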


With the representation established, we can go back to defining the Fourier transform on SE(2) using the representation φ_λ(θ, x, y). The Fourier transform of a function f in L¹(SE(2)) ∩ L²(SE(2)) is

f̂(λ) = ∫_{SE(2)} f(g) φ_λ(g⁻¹) dg.

Homogenization of a Nonlinear Elliptic Boundary Value Problem


3 Existence and Uniqueness

Consider the problem

Δu_ε = 0 in Ω,
∂u_ε/∂ν = f(x/ε, u_ε) on Γ,
∂u_ε/∂ν = 0 on ∂Ω \ Γ,    (1)

where f(y, v) = λ(y)[e^{α(y)(v − V(y))} − e^{−(1−α(y))(v − V(y))}]. We consider the 3-D problem, i.e. let Ω ⊂ R³, Γ ⊂ R². Here Y = [0, 1]² and λ, α, and V are piecewise smooth, real valued, Y-periodic functions. We also assume there exist constants λ_0, Λ_0, a_0, A_0 and V_0 such that 0 < λ_0 ≤ λ(y) ≤ Λ_0, 0 < a_0 ≤ α(y) ≤ A_0 < 1 and |V(y)| ≤ V_0. We show that the energy minimization form of the problem (1) has a unique solution in H¹(Ω). For a given ε, define the following energy functional,

2 Jn

\Vv\^dx+

f

Jr

F{x/e,v)dx

where, Hy),

,a(y)(v~V(y)) {y){v--V{y))

F{y,^) = —r^e a{y)

,

My) ^\y)

1 - a{y)

e

(i-a(y))(v-V(y))

Theorem 1 (Existence and Uniqueness of the Minimizer). There exists one function u^, G H^{f2) solving E^{u^) = min„g/^i(j7) E^{u). Proof: Note that -^F{y,v)

= λ(y)α(y) e^{α(y)(v − V(y))} + λ(y)(1 − α(y)) e^{−(1−α(y))(v − V(y))};

since λ > 0, α > 0, and 1 − α > 0 we have that ∂²F/∂v² > 0. It is easy to see that this second partial derivative is bounded below. That is, there exists a constant c_0, independent of y and v, such that

∂²F/∂v²(y, v) ≥ c_0 > 0.

Since F is smooth in the second variable, for any v, w ∈ H¹(Ω) and for any y, there exists some ξ between v + w and v − w such that

F(y, v + w) + F(y, v − w) − 2F(y, v) = ∂²F/∂v²(y, ξ) w²,

which from the lower bound yields

F(x/ε, v + w) + F(x/ε, v − w) − 2F(x/ε, v) ≥ c_0 w²,

whence

E_ε(v + w) + E_ε(v − w) − 2E_ε(v) ≥ ∫_Ω |∇w|² dx + c_0 ∫_Γ w² dx ≥ c_0 ‖w‖²_{H¹(Ω)}

(2)

where the last inequality follows by a variant of Poincaré's inequality. Now let {uⁿ}_{n=1}^∞ be a minimizing sequence, that is,

E_ε(uⁿ) → inf_{u ∈ H¹(Ω)} E_ε(u)  as  n → ∞;

note that clearly inf_{u ∈ H¹(Ω)} E_ε(u) > −∞. Let

v = (uⁿ + uᵐ)/2  and  w = (uⁿ − uᵐ)/2;

then note that v + w = uⁿ and v − w = uᵐ, and so

E_ε(uⁿ) + E_ε(uᵐ) − 2E_ε(v) ≥ (c_0/4) ‖uⁿ − uᵐ‖²_{H¹(Ω)}.

Now if we let m, n → ∞, we see that {uⁿ} is a Cauchy sequence in the Hilbert space H¹(Ω). Define u_ε to be its limit in H¹(Ω). Then we have

uⁿ → u_ε  in  H¹(Ω),

which by the Trace Theorem implies

uⁿ → u_ε  in  L²(Γ),

which implies (Rudin [Rud66], p.68) there exists a subsequence {uⁿᵏ}_k such that uⁿᵏ → u_ε a.e. in Γ. So now we claim

F(x/ε, u_ε) = lim inf_k F(x/ε, uⁿᵏ)

a.e..

(3)

Since F is smooth in the second variable, and uⁿᵏ → u_ε a.e. in Γ, we have that F(·, u_ε) = lim_{k→∞} F(·, uⁿᵏ) a.e., which clearly implies (3). Now note that


clearly F(x/ε, uⁿᵏ) ≥ 0 for all k, k = 1, 2, .... So that by Fatou's Lemma (Rudin [Rud66], p.23) we can claim

∫_Γ F(x/ε, u_ε) dx ≤ lim inf_k ∫_Γ F(x/ε, uⁿᵏ) dx.

Combining this with the weak lower semicontinuity of the Dirichlet term gives E_ε(u_ε) ≤ lim inf_n E_ε(uⁿ) = inf_{u ∈ H¹(Ω)} E_ε(u), so u_ε is a minimizer. Finally, suppose u_s ∈ H¹(Ω) is another minimizer. Applying (2) with v = (u_ε + u_s)/2 and w = (u_ε − u_s)/2,

we have  inf_{u ∈ H¹(Ω)} E_ε(u) ≤ E_ε((u_ε + u_s)/2) ≤ (1/2)E_ε(u_ε) + (1/2)E_ε(u_s) − (c_0/8)‖u_ε − u_s‖²_{H¹(Ω)},

whence

‖u_ε − u_s‖_{H¹(Ω)} = 0.

So u_ε = u_s in H¹(Ω). Thus we have shown the uniqueness of the minimizer. □ Note that this argument can be generalized to address the n-dimensional problem, i.e. the case in which we have Ω ⊂ Rⁿ, Γ ⊂ R^{n−1} with boundary period cell Y = [0, 1]^{n−1}.

4 Numerical Experiments

Finally we wish to numerically observe the behaviour of the homogenized boundary value problems as a way to describe the behaviour of the current near the boundary. We plan to use a finite element method approach to the 2-D problem. For the 2-D problem the domain Ω is a unit square and the boundary Γ is the left side of the unit square, that is Γ = {(x₁, x₂) : x₁ = 1}. In this case we impose a grid of points (called nodes) on the unit square and triangulate the domain, then introduce a finite set of piecewise continuous basis functions. Now we wish to minimize the energy functional with respect to these basis functions. In particular we assume that the minimizer can be written as a linear combination of basis functions; we write

u = Σ_{i=1}^{m} η_i b_i,


where m is the number of nodes and {b_i}_{i=1}^m is the set of basis functions. We attempt to minimize the energy over the set of coefficients {η_i}_{i=1}^m using a conjugate descent algorithm developed by Hager and Zhang. We are currently implementing this minimization and refer the reader to future publications for numerical results.

Acknowledgments: All figures appear courtesy of Valerie Bhat.
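Returning to the discretized minimization, the sketch below illustrates one way the energy of Section 4 could be evaluated over the coefficients η_i; the stiffness matrix, boundary quadrature data and sampled coefficients λ, α, V are hypothetical precomputed inputs, and scipy's nonlinear conjugate gradient is used only as a stand-in for the Hager-Zhang conjugate descent algorithm mentioned above.

```python
import numpy as np
from scipy.optimize import minimize

def energy(eta, K, boundary_quad, lam, alpha, V):
    """Discretized energy E_eps(sum_i eta_i b_i): 0.5*eta^T K eta plus the
    boundary integral of F(x/eps, u) approximated by quadrature.

    K             : finite-element stiffness matrix (assumed precomputed)
    boundary_quad : (weights, basis values) at quadrature points on Gamma
    lam, alpha, V : coefficient functions sampled at those quadrature points
    """
    w_q, B_q = boundary_quad
    u_q = B_q @ eta                                   # trace of the FE function on Gamma
    F_q = (lam / alpha) * np.exp(alpha * (u_q - V)) \
        + (lam / (1.0 - alpha)) * np.exp(-(1.0 - alpha) * (u_q - V))
    return 0.5 * eta @ (K @ eta) + w_q @ F_q

# Illustrative call (data assumed assembled elsewhere):
# eta0 = np.zeros(n_nodes)
# result = minimize(energy, eta0, args=(K, boundary_quad, lam, alpha, V), method="CG")
```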

References

[BM05] Bhat, Y.S. and Moskow, S. (2005), "Homogenization of a nonlinear elliptic boundary value problem modeling galvanic currents," in preparation.
[Eva98] Evans, L.C. (1998), Partial Differential Equations, American Mathematical Society, Providence, Rhode Island.
[Hol95] Holmes, M.H. (1995), Introduction to Perturbation Methods, Springer-Verlag New York Inc., New York, New York.
[MS88] Morris, R. and Smyrl, W. (1988), "Galvanic interactions on periodically regular heterogeneous surfaces," AIChE Journal, Vol. 34, 723-732.
[Rud66] Rudin, W. (1966), Real and Complex Analysis, McGraw-Hill, New York, New York.
[VX98] Vogelius, M. and Xu, J.M. (1998), "A nonlinear elliptic boundary value problem related to corrosion modeling," Quart. Appl. Math., Vol. 56, No. 3, 479-505.

* www.math.ufl.edu/~hager

A Simple Mathematical Approach for Determining Intersection of Quadratic Surfaces

Ken Chan

The Aerospace Corporation, 15049 Conference Center Drive, Chantilly, VA 20151, USA. [email protected]

Summary. This paper is primarily concerned with the mathematical formulation of the conditions for intersection of two surfaces described by general second degree polynomial (quadratic) equations. The term quadric surface is used to denote implicitly a surface described by a quadratic equation in three variables. Of special interest is the case of two ellipsoids in the three dimensional space for which the determination of intersection has practical applications. Even the simplest of traditional approaches to this intersection determination has been based on a constrained numerical optimization formulation in which a requisite combined rotational, translational and dilational transformation reduces one of the ellipsoids to a sphere, and then a numerical search procedure is performed to obtain the point on the other ellipsoid closest to the sphere's center. Intersection is then determined according to whether this shortest distance exceeds the radius of the sphere. An alternative novel technique, used by Alfano and Greer [AG01], is based on formulating the problem in four dimensions and then determining the eigenvalues which yield a degenerate quadric surface. This method has strictly relied on many numerical observations of the eigenvalues to arrive at the conclusion whether these ellipsoids intersect. A rigorous mathematical formulation and solution was provided by Chan [Cha01] to explain the myriads of numerical observations obtained through trial and error using eigenvalues. Moreover, it turns out that this mathematical analysis may also be extended in two ways. First, it is also valid for quadric surfaces in general: ellipsoids, hyperboloids of one or two sheets, elliptic paraboloids, hyperbolic paraboloids, cylinders of the elliptic, hyperbolic and parabolic types, and double elliptic cones. (The term ellipsoids includes spheres, and elliptic includes circular.) The general problem of analytically determining the intersection of any pair of these surfaces is not simple. This formulation provides a much desired simple solution. The second way of generalization is to extend it to n dimensions in which we determine the intersection of higher dimensional surfaces described by quadratic equations in n variables. The analysis using direct substitution and voluminous algebraic simplification turns out to be very laborious and troublesome, if at all possible in the general case. However, by using abstract symbolism and invariant properties of the extended (n+1) by (n+1) matrix, the analysis is greatly simplified and its overall structure made comprehensive and comprehensible. These results are also included in this paper.


They also serve as a starting point for further theoretical investigations in higher dimensional analytical geometry.

1 Introduction

Nomenclature

As a common starting point, we shall first introduce the nomenclature used in this paper in connection with surfaces described by second degree polynomial equations. The definitions are the same as those used in standard practice. This will also permit us to point out some of the differences when we go from the case of two dimensions to three dimensions and then to higher dimensions. In general, a second degree polynomial will contain first degree terms and a constant term. For brevity, we shall refer to this polynomial as "quadratic". This term is consistent with the usage in "quadratic equations". However, it is used in a different sense in "quadratic forms" which are homogeneous polynomials containing only second degree terms, but not first degree and constant terms. A surface described by a quadratic equation is referred to as a "quadratic surface". If there are two variables in the polynomial, we have a conic (curve). If there are three variables, we have a quadric surface. (Some authors refer to a surface described by a quadratic equation in n variables as a quadric surface in n-dimensional space, but we shall refer to it simply as a quadratic surface here unless we wish to use a term such as "n-dric" surface as a compression of n-dimensional quadric.)

Classification of Conics

A conic is described by the general second degree polynomial equation

q₁₁x² + q₂₂y² + 2q₁₂xy + 2q₁₃x + 2q₂₃y + q₃₃ = 0

.

(1)

If we next introduce homogeneous coordinates by defining a column vector r of 3 components and a symmetric 3 × 3 matrix Q by

r = (x, y, 1)^T,    (2)
Q = [q_ij],    (3)

then equation (1) may be written in an e x t e n d e d form as r^Qr = 0

.

(4)

We note that the LHS of equation (4) is in the form that we are famihar with when dealing with quadratic forms except that we now have an extended


vector and an extended matrix. Note that not all quadratic polynomials can be reduced to homogeneous form by eliminating the linear terms. This is the case when a particular variable has no second degree terms but has only linear terms, e.g., a parabola. If we define Δ as the determinant of Q, i.e.,

Δ = |Q|

,

(5)

then we may classify the conic according to whether Δ is zero or otherwise. If Δ is non-zero, then we have a proper (or non-singular) conic which may be an ellipse, a hyperbola or a parabola. If Δ is zero, we have a singular (or improper) conic which may be a point, a pair of intersecting lines, a pair of parallel lines or a coincident line which actually comprises two lines. Here we are restricting ourselves to discussing only real conics. Because these singular cases have degenerated into very simple elements (a point or line), a singular conic is also called a degenerate conic. All this has been known and may be found in a book by Pettofrezzo [Pet66]. It must be stated that for the case of a conic, it has not been customary (or necessary) to use the term singular if Δ vanishes. Rather, the term degenerate is the only one used for this description. However, for quadratic polynomials with more than two variables, such a distinction is necessary as we shall next discuss.

Classification of Quadrics

A quadric is described by the general second degree polynomial equation

9iia; + (a + b), then

so that, by equation (25), the two eigenvalues are real, negative and not equal. Thus, the two coefficients in equation (30) are of the same sign. Consequently, the two degenerate conics corresponding to λ₁ and λ₂ are two different points (x₁, 0) and (x₂, 0). Because the eigenvalues are different, the two corresponding eigenvectors r₁ and r₂ are also different, but they are still the two degenerate conics expressed in homogeneous coordinates. We note that if x₀ > (a + b), then the circle A and the ellipse B are outside of each other, i.e., they share no common area. It is easily verified that

Case III: If x₀ = b = 2a, then


so that, by equation (25), the two eigenvalues are complex conjugates. Consequently, there are no real degenerate conics and no real eigenvectors corresponding to the two complex eigenvalues. We note that if x₀ = b = 2a, then the ellipse B intersects the circle A and passes through its center, i.e., they share common area. However, the three cases discussed above do not exhaustively describe all the scenarios. There are additional complications if we continue to move the center of the ellipse toward the origin. For instance, consider

Case IV: If we let x₀ = a = b/2, then the discriminant vanishes and we obtain equation (31), so that, by equation (25), the two eigenvalues are equal and positive and are given by λ₁,₂ =

.

(36)

However, only the semi-axes b and c effectively determine whether the two coefficients in equation (30) are of the same sign or of opposite signs. Consequently, by equations (28) and (30), both the degenerate conics corresponding to λ₁ and λ₂ are the same point (x₁, 0) or the same pair of intersecting lines passing through that point. By substituting equation (36) into (28), we obtain x₁,₂ = −a

(37)

so that the two degenerate conics have the common point (−a, 0). Moreover, by equation (27), the two eigenvectors r₁ and r₂ corresponding to the two eigenvalues are also the same. We note that if x₀ = a = b/2, then the circle A and the ellipse B are tangent at the point (−a, 0). For the parameters in this example, a little consideration reveals that both the degenerate conics comprise the same pair of intersecting lines passing through the point (−a, 0). This pair of lines also intersects the circle at the other two points where the ellipse intersects the circle. All these conclusions follow from equations (13), (16) and (17) which state that if there is a point common to any two of them, then it is also common to the third. However, not all common points when expressed in homogeneous coordinates will yield eigenvectors. But an eigenvector will always lie on the degenerate conic associated with the eigenvalue, even if that degenerate conic may not have any point in common with the circle and the ellipse. We have described many interesting and important manifestations associated with a circle intersecting with an ellipse. We have given four examples of these two conics having no, one, two and three points in common. A little consideration reveals that for one common point, we can only have the two conics tangent to each other. Our example of two common points considered the case of two intersecting points. However, we could also have two tangent points. Our example of three common points considered the case of one tangent and


two intersecting points; and this is the only possible configuration. Finally, we could also have considered four common points, which must necessarily be intersecting points. Our Illustrative Example of the circle and the eUipse (spanning some 5 pages of concise description) is perhaps the simplest one which brings out the salient features of intersection between quadratic surfaces. Figure 3 illustrates the four cases just discussed.


Fig. 3. Relation Between Conic Intersections and Eigenvalues

At the very root of our methodology is the necessity to solve a polynomial equation to obtain explicit expressions for the eigenvalues so that we may then study their relation with each other. In our simple example, we had chosen the center of the ellipse to be on the x-axis thus making it possible to obtain easily the three eigenvalues. If we had chosen the center of the ellipse at a general point in the plane, then we would have to solve unnecessarily a much more comphcated cubic equation even though we have Cardan's method at our disposal. Moreover, if we had considered the case of intersection between two ellipsoids in three dimensions, our methodology would have required us to solve a quartic equation for the four eigenvalues associated with the singular quadric surface. While this is possible using Ferrari's method, it is hardly practical. For these more complex problems, it is doubtful if we will obtain new


revelations illustrating the nature of the intersections. If we were to proceed along the same lines for the case of intersection between higher dimensional quadratic surfaces, we would be confronted with an insurmountable task which is rather daunting. Thus, we are forced to seek a new general approach to formulate and solve the problem.

3 General Theorems

In this section, we shall consider the general case of two quadratic surfaces defined in n-dimensional space, denoted by A and B. We shall assume that one of them (say A) is non-singular so that its inverse exists. Again, we shall introduce homogeneous coordinates. Let r denote the extended column vector defined by

r = (x₁, x₂, ..., x_n, 1)

.

(38)

Let the equations in homogeneous coordinates for A and B be

r^T A r = 0,    (39)
r^T B r = 0.    (40)

From A and B, it is obvious that we may form another quadratic surface Q described by any linear combination of A and B such as

r^T Q r = r^T B r − λ r^T A r = 0

.

(41)

Since A and B are symmetric, Q is also symmetric. If we choose Q to be singular, i.e., |Q| = 0, then λ must be an eigenvalue of the matrix A⁻¹B, since we have |A| ≠ 0 in the equation

|A(A⁻¹B − λI)| = 0

.

(42)

There are (n+1) values of λ and they are found from the characteristic polynomial equation

|A⁻¹B − λI| = 0

.

(43)
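The eigenvalue computation in (43) is easy to carry out numerically once the extended matrices are assembled; the sketch below and its example data (a unit circle and an ellipse passing through its center) are ours, chosen so that the complex-conjugate pair mirrors the Case III behaviour described in the illustrative example.

```python
import numpy as np

def pencil_eigenvalues(A, B):
    """Eigenvalues lambda solving |A^{-1}B - lambda I| = 0 in (43), for two
    quadratic surfaces given by their (n+1)x(n+1) extended symmetric matrices
    A (assumed non-singular) and B.  Each eigenvalue yields a singular
    (degenerate) surface Q = B - lambda*A of the pencil."""
    return np.linalg.eigvals(np.linalg.solve(A, B))

# Example: unit circle x^2 + y^2 - 1 = 0 and ellipse (x - 2)^2/4 + y^2 - 1 = 0,
# written with extended 3x3 matrices (illustrative data).
A = np.diag([1.0, 1.0, -1.0])
B = np.array([[0.25, 0.0, -0.5],
              [0.0,  1.0,  0.0],
              [-0.5, 0.0,  0.0]])
print(np.round(pencil_eigenvalues(A, B), 4))   # one real value and a complex pair
```

Here the ellipse passes through the circle's center, and the complex-conjugate pair of eigenvalues is in line with Case III of the earlier circle-ellipse discussion.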

Then, we may prove the following theorems.

Theorem 1. Common Point. If a point P lies on any two of the three surfaces A, B and Q (where Q need not be singular), then it lies on the third surface also.


Proof. The proof is obvious and follows directly from equations (39), (40) and (41). □

Remark: Even though it is trivial, this theorem is stated here because reference will be made to it repeatedly.

Theorem 2. Eigenvector. Let λ_i be an eigenvalue of A⁻¹B and let r_i be the eigenvector corresponding to λ_i. Then, the eigenvector r_i always lies on the singular quadratic surface Q associated with that eigenvalue λ_i.

Proof. By the definition of an eigenvector of A⁻¹B, r_i satisfies the equation

(A⁻¹B − λ_i I) r_i = 0

.

(44)

If we pre-multiply it by A, then we obtain

(B − λ_i A) r_i = 0

.

(45)

Therefore, we have

Q r_i = 0

.

(46)

Consequently, r_i also satisfies

r_i^T Q r_i = 0

.

(47)

Hence, the homogeneous coordinates of this eigenvector r_i yield a point which lies on the singular surface Q. This proves the theorem. □

Remark: Note that even though the eigenvector r_i lies on Q, this singular surface Q may not have any point in common with A and B.

Theorem 3. Tangency. If a point P common to A and B yields the homogeneous coordinates of an eigenvector of A⁻¹B, then the two quadratic surfaces are tangent at that point.

Proof. Associated with the surface Q given by equation (41) is the quadratic polynomial

Φ = r^T Q r = r^T B r − λ r^T A r

.

(48)

Let ∇Φ denote the n-dimensional gradient of Φ, which is an n-dimensional vector. In keeping with current usage, ∇