

STOCHASTIC SIMULATION OPTIMIZATION FOR DISCRETE EVENT SYSTEMS Perturbation Analysis, Ordinal Optimization and Beyond

Editors

CHUN-HUNG CHEN

George Mason University, USA and National Taiwan University, Taiwan

QING-SHAN JIA

Tsinghua University, China

LOO HAY LEE

National University of Singapore, Singapore

World Scientific
New Jersey • London • Singapore • Beijing • Shanghai • Hong Kong • Taipei • Chennai

Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

Library of Congress Cataloging-in-Publication Data
Stochastic simulation optimization for discrete event systems : perturbation analysis, ordinal optimization, and beyond / edited by Chun-Hung Chen (George Mason University, USA), Qing-Shan Jia (Tsinghua University, China) & Loo Hay Lee (National University of Singapore, Singapore).
pages cm
ISBN 978-9814513005 (alk. paper)
1. Discrete-time systems--Mathematical models. 2. Perturbation (Mathematics) 3. Systems engineering--Computer simulation.
I. Chen, Chun-Hung, 1964– editor of compilation. II. Jia, Qing-Shan, 1980– editor of compilation. III. Lee, Loo Hay, editor of compilation.
TA343.S76 2013
003'.83--dc23
2013012700

British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.

Copyright © 2013 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

In-house Editor: Amanda Yun

Printed in Singapore


Preface

Discrete event systems (DES) have become pervasive in our daily life. Examples include (but are not restricted to) manufacturing, telecommunication, supply chain, transportation, healthcare, call center, and finance systems. Stochastic simulation is a powerful tool for analyzing these modern complex systems. Optimization involving millions or even billions of decision variables and constraints has become possible with the advent of modern computing. The combination of these two successful paradigms, called Stochastic Simulation Optimization, has proven dramatically powerful for DES. However, computational efficiency remains a major concern because (i) in the optimization process, many alternative designs need to be simulated; and (ii) to obtain a sound statistical estimate, a large number of simulation runs (replications) is required for each design alternative. A user may therefore be forced to compromise on simulation accuracy, modeling accuracy, and the optimality of the selected design. Several approaches have been developed to address this efficiency issue. Among them, perturbation analysis (PA) and ordinal optimization (OO) are two important techniques launched and developed by Professor Y.-C. Ho and his team. Over the years, these topics have continued to spawn a wealth of ideas and to generate enormous practical and theoretical interest. This book presents basic ideas, important fundamentals, and the state of the art, and discusses some future research directions in both fields. The book is divided into two parts. Part I (Chapters 1–7) presents topics in perturbation analysis, while Part II (Chapters 8–11) gives comprehensive coverage of ordinal optimization. These 11 chapters are written by 40 authors, most of whom are leading scholars in their fields.


Part I – Perturbation Analysis

Chapter 1 (by Cassandras and Wardi) presents an infinitesimal perturbation analysis (IPA) technique to evaluate performance sensitivities along sample paths of hybrid dynamical systems, which combine time-driven and discrete-event dynamics. It also includes an application of IPA to a class of stochastic flow models.

Chapter 2 (by Fu, Gong, Hu, and Li) reviews smoothed perturbation analysis (SPA), which is an extension and generalization of IPA. SPA is a sample path approach for gradient estimation based on conditional Monte Carlo. Its applications to queueing, inventory, and finance systems are presented.

Chapter 3 (led by Vakili) establishes some connections between PA and variance reduction (VR) in Monte Carlo simulation. Utilizing this connection, it offers a PA-inspired approach that makes the process of control variate selection more systematic when the variance reduction technique of control variates is used.

Chapter 4 (by Glasserman) discusses two techniques in derivative estimation. The first is the use of adjoint variables in calculating sample path derivatives. Compared with a standard forward IPA calculation, the adjoint method can substantially speed up the computation of derivatives. The second method uses an average of multiple combinations of IPA and likelihood ratio method estimators, and is shown to inherit attractive features from both.

Chapter 5 (by Chong) takes a retrospective look at one approach to analyzing the convergence of stochastic approximation algorithms driven by IPA estimators. It highlights the interesting insight that algorithms with different update intervals behave similarly to one that updates after every regenerative cycle.

Chapter 6 (by Xie) investigates simulation-based gradient estimation for continuous flow lines under both time-dependent and operation-dependent failures. For time-dependent failures, simple IPA estimators can easily be established. For operation-dependent failures, gradient estimation is more difficult: IPA leads to biased estimates due to the discontinuity of the flow dynamics, but more elaborate conditional perturbation analysis can be used to establish unbiased gradient estimators.

Chapter 7 (by Xia and Cao) documents the research that has grown from the insightful idea of perturbation analysis over the past 30+ years, especially the developments beyond dynamic programming. It shows that a sample path of a Markov


system under a policy contains some information about the performance of the system under other policies; with this view, policy iteration is simply a discrete version of the gradient descent method, and each iteration can be implemented based on a single sample path. This view leads to the direct-comparison based approach to performance optimization, which may be applied to cases where dynamic programming does not work well.

Part II – Ordinal Optimization

Chapter 8 (by Jia, Zhao, Guan, Shen, and Dai) introduces the fundamentals and basic ideas of ordinal optimization (OO). The topics include the exponential convergence of ordinal comparison, goal softening, the universal alignment probabilities, and extensions to different selection rules, vector ordinal optimization, constrained ordinal optimization, deterministic complex optimization problems, and the quantification of heuristic designs.

Chapter 9 (led by Lee and Chen) presents the Optimal Computing Budget Allocation (OCBA) framework, which aims to find a good design using a minimum simulation budget by optimally controlling how that budget is allocated to each alternative design. Inspired by OO, one has several ways to find a good enough design; OCBA offers a highly efficient one by formulating the allocation process itself as an optimization problem. Further, it has been shown that OCBA can dramatically enhance efficiency for simulation optimization problems.

Chapter 10 (by Chen and Shi) presents Nested Partitions (NP), a partition- and sampling-based framework for solving large-scale optimization problems. NP can help OO when one searches the design space for a good enough design. Both the generic NP method and some recently developed variants are presented. Global convergence of NP is discussed as well.

The last chapter, Chapter 11 (led by Lau and Jia), concludes Part II of the book by presenting several OO applications, including a scheduling problem for apparel manufacturing, to which conventional OO is applied; a turbine blade manufacturing process optimization problem, which involves a deterministic but very complex objective (as opposed to a stochastic problem); a remanufacturing system performance optimization problem, in which both constrained OO and vector OO are applied; and the Witsenhausen problem, a famous problem in team decision theory that has remained unsolved for more than forty years since it was first introduced. In this last example, OO helps to narrow down the vast search space of control laws and to find a simple and near-optimal strategy.


Acknowledgements

We are deeply indebted to our teacher, Prof. Yu-Chi Ho (Harvard University & Tsinghua University). Along with the authors of the various chapters, we would like to thank him for his excellent mentorship, his continuous support, his pioneering vision, his valuable inspiration, and his great wisdom. Without him, the fruitful developments presented in this book would simply not be possible. We are extremely grateful to Mrs. Sophia Ho for her support in all of our endeavors. Without her, we would not have become the close-knit family that we are today. We would also like to thank Xi-Ren Cao (Hong Kong University of Science and Technology & Shanghai Jiao Tong University), Christos Cassandras (Boston University), Ek Peng Chew (National University of Singapore), Michael Fu (University of Maryland), Jian-Qiang Hu (Fudan University), Peter B. Luh (University of Connecticut), Leyuan Shi (Peking University) and Qian-Chuan Zhao (Tsinghua University). They provided us with much needed support and help during the initiation and preparation of this book.

Chun-Hung Chen, George Mason University & National Taiwan University
Qing-Shan Jia, Tsinghua University
Loo Hay Lee, National University of Singapore


Foreword: A Tribute to a Great Leader in Perturbation Analysis and Ordinal Optimization

My first interaction with Professor Yu-Chi (Larry) Ho was in the early 1990s, when I was close to finishing my PhD degree at Stanford University. I was engaged to be married and would be moving to the Cambridge/Boston, Massachusetts area after graduation, and I wrote a letter to Professor Ho (by snail mail, as this was before email became an official means of communication) asking if he had suggestions or leads on any job openings in the fields of systems and controls in the Boston area. Even though Professor Ho didn't know me at that time, he was kind enough to respond to my letter. Professor Ho ultimately became a good friend. He always took the time to talk with me early in my career and provided me with candid and valuable advice and guidance. He served as one of my hosts at Harvard when I spent a sabbatical year there during 2001–2002, and his work on ordinal optimization and optimal computing budget allocation influenced the methods my research group was developing at that time in our projects investigating sensor fusion and sensor management algorithms for surveillance and tracking applications. It has been a privilege to know and work with Professor Ho, and I was deeply touched when asked to write a foreword for this book. As you will find from reading this book in honor of Professor Ho's 80th birthday, Professor Ho has contributed significantly to several areas in the broad systems and controls fields. He has made a lasting impact on optimal control, differential games, and discrete event systems, among many other areas. He has worked on discrete event systems since the 1970s, and this work has led to numerous contributions in perturbation analysis and ordinal


optimization, including the books Perturbation Analysis of Discrete Event Dynamic Systems [a] and Ordinal Optimization — Soft Optimization for Hard Problems [b]. The chapters in this book have been written by Professor Ho's former PhD students, students of former PhD students, and collaborators, as well as by those whose work has been strongly influenced by Professor Ho. All of these authors are well-respected and distinguished researchers and faculty members, and many have themselves mentored a large cohort of graduate students who have continued successfully on to diverse career paths. Professor Ho not only mentored his students and others in the methods and technical aspects of research, he also advised them on how to make lucid presentations and clearly explain their work. As such, you will find that the chapters in this book are extremely well-written. The chapters provide historical perspective to those new to the field, include relevant real-world examples to motivate the mathematical principles discussed, showcase the power of the methods outlined, and discuss directions for further research. Many of the authors also share comments and stories about how Professor Ho created, guided, and advanced the areas of perturbation analysis and ordinal optimization, and about how he influenced their own view of these topics as well as their careers. While Professor Ho's work motivated me to adapt and develop some of the techniques discussed in this book for the sensor fusion and sensor management application areas, my own expertise is not in the areas of perturbation analysis and ordinal optimization. As a result, I have very much enjoyed reading the chapters that have been contributed to this book by such a prominent group of authors. In particular, I have appreciated the background material provided and the discussions of the surprising and unexpected results that have been revealed and proven over the last 30 years of research in these fields. The book is divided into two parts, with the first part focusing on perturbation analysis, and the second part on ordinal optimization. Both perturbation analysis and ordinal optimization were motivated by discrete event systems. Many dynamic interconnected systems found in manufacturing environments, computer networks, and communication systems can

[a] Y. C. Ho and X. R. Cao, Perturbation Analysis of Discrete Event Dynamic Systems, Kluwer Academic Publishers, Boston, 1991.
[b] Y. C. Ho, Q. C. Zhao, and Q. S. Jia, Ordinal Optimization: Soft Optimization for Hard Problems, Springer, 2007.


be described as discrete event systems. Discrete event systems (DES) are those in which the state of the system changes according to a sequence of distinct events, such as the arrival of parts at a machine in a manufacturing line. The events can occur at either deterministic or stochastic times, and they cause the system to evolve by terminating or initiating activities over time. The number of design choices in such systems is usually very large, and closed-form analytical solutions generally do not exist for real-world DES problems. Consequently, the analysis of such systems is usually conducted through simulations. For example, in a flexible manufacturing system, we may be interested in how changing the buffer size (the maximum number of parts allowed to be in the queue) before one machine may affect the overall throughput of the manufacturing system. To answer this, we would run simulations with a variety of buffer sizes for that machine. In complex systems with hundreds (or thousands or millions) of parameters to be varied, the number of simulations required to determine the optimal solution becomes prohibitive. Both perturbation analysis and ordinal optimization are techniques developed by Professor Ho that expedite the process of performing and evaluating discrete event simulations. One of the main issues in studying DES is performance sensitivity analysis. This can be computationally expensive if implemented in a brute-force manner, where two simulations are run under nominal and perturbed parameter values for the system and the performance results are differenced to compute a sensitivity estimate. With Perturbation Analysis (PA), Professor Ho realized that a single sample path or experiment on a DES inherently contains much more information about the system than conventional simulations utilize. Professor Ho showed how single-run data can be wisely processed to yield much more information, including gradients. The basic idea of PA is to reconstruct a perturbed sample path from the nominal one with slight changes in parameters. Infinitesimal perturbation analysis (IPA) methods further expedite computation by providing a very efficient method for estimating derivatives in discrete event systems. In addition to computational efficiency, another advantage of IPA is that it estimates performance gradients directly, rather than through finite differences, thus leading to superior variance properties. The chapters in the first part of this book discuss PA and IPA, as well as extensions that enable these techniques to be more broadly applicable, methods that enable further reductions in computation time, and convergence analysis of stochastic approximation algorithms driven by IPA derivative estimators.
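As a concrete illustration of this contrast (the following sketch is an illustrative addition, not material from the book; the queue parameters and function names are hypothetical), consider estimating the derivative of the mean waiting time with respect to a service-time scale parameter theta in a single-server queue. The brute-force route runs two simulations and takes a finite difference; the IPA route accumulates the derivative along a single run using the Lindley recursion:

```python
import random

def mean_wait_and_ipa(theta, n=100_000, seed=1):
    """Single-server FIFO queue via the Lindley recursion.
    Service times are theta * X with X ~ Exp(1); interarrivals ~ Exp(1/2).
    Returns (mean waiting time, IPA estimate of its derivative w.r.t. theta),
    both extracted from the same sample path."""
    rng = random.Random(seed)
    w = dw = 0.0                 # waiting time and its sample-path derivative
    sum_w = sum_dw = 0.0
    for _ in range(n):
        s = theta * rng.expovariate(1.0)  # service time; dS/dtheta = s / theta
        a = rng.expovariate(0.5)          # interarrival time
        if w + s - a > 0.0:               # the next customer must wait
            w += s - a
            dw += s / theta               # IPA: the perturbation accumulates
        else:                             # an idle period resets the perturbation
            w = dw = 0.0
        sum_w += w
        sum_dw += dw
    return sum_w / n, sum_dw / n

theta, h = 0.8, 1e-3
_, ipa_grad = mean_wait_and_ipa(theta)
# Brute force: two runs (with common random numbers) and a finite difference.
fd_grad = (mean_wait_and_ipa(theta + h)[0] - mean_wait_and_ipa(theta)[0]) / h
print(f"IPA: {ipa_grad:.4f}   finite difference: {fd_grad:.4f}")
```

Both approaches estimate the same gradient, but the IPA estimate comes from a single run and avoids the bias-variance trade-off inherent in choosing the perturbation size h.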


The chapters in the second part of the book review ordinal optimization (OO) and various research directions that have grown from OO. The basic idea in OO is that an ordinal comparison (is A better than B?) is much easier to make than a cardinal comparison (how much better is A than B?). A second important idea in the OO literature is that of goal softening, where the softened goal is to find a top-n design rather than the global optimal design. In other words, OO aims to find a "good enough" design with high probability, and this can be done much more efficiently than with traditional optimization algorithms that seek the global optimal solution. OO has been used effectively to compare and evaluate DES designs involving large numbers of parameters in a wide variety of application areas. Inspired by OO, which focuses on how to efficiently select a good enough design in a large-scale optimization problem, Optimal Computing Budget Allocation (OCBA) is a framework that aims to maximize the quality of the selected design given a limited computing budget. OCBA carefully analyzes the trade-off between devoting computational effort to searching a large design space for new promising solutions versus determining more accurately which of the currently promising solutions is the best. While OO assumes that the number of simulation replications for each alternative is equal, OCBA determines how the simulation budget can be concentrated on certain designs to better differentiate the promising alternatives. The goal of OCBA is either to obtain the highest-performance design selection using a fixed computing budget or to attain a design selection above a "good enough" performance threshold using a minimum computing budget. OCBA attempts to determine which are the most worthy designs on which to spend our precious computing resources for additional simulations. As OCBA focuses on allocating the simulation budget to a finite number of alternatives, it is natural to integrate it with search algorithms (such as Nested Partitions) to handle large-scale simulation optimization problems where the number of alternatives is so large that it is not tractable to simulate all of them. Both OO and OCBA have been successfully used in many applications, and these are highlighted throughout the second part of the book.
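To make the allocation idea concrete, the sketch below (an illustrative addition; the closed-form rule follows the OCBA literature surveyed in Chapter 9, under Gaussian noise and a smaller-is-better objective, and the function name and example numbers are hypothetical) computes how a fixed budget of replications would be split across competing designs given current estimates of their means and standard deviations:

```python
import numpy as np

def ocba_allocation(means, stds, total_budget):
    """Classic OCBA split of `total_budget` replications (smaller-is-better,
    Gaussian noise). For non-best designs i and j the rule is
        N_i / N_j = (stds[i]/d_i)**2 / (stds[j]/d_j)**2,
    with d_i = means[i] - means[best], and the best design receives
        N_best = stds[best] * sqrt(sum over i != best of (N_i/stds[i])**2).
    Returns integer counts summing approximately to total_budget."""
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    best = int(np.argmin(means))
    gaps = means - means[best]                    # gaps[best] == 0
    share = np.zeros_like(means)
    others = np.arange(len(means)) != best
    share[others] = (stds[others] / gaps[others]) ** 2
    share[best] = stds[best] * np.sqrt(np.sum((share[others] / stds[others]) ** 2))
    alloc = total_budget * share / share.sum()
    return np.maximum(1, np.round(alloc)).astype(int)

# Five candidate designs: the current best (design 0) and its close, noisy
# competitor (design 1) absorb most of the budget.
print(ocba_allocation(means=[1.0, 1.1, 1.5, 2.0, 3.0],
                      stds=[0.5, 0.6, 0.5, 0.4, 0.3],
                      total_budget=1000))
```

In a full sequential OCBA procedure this allocation is recomputed after each batch of replications as the mean and variance estimates are refined, but the pattern is already visible here: close competitors of the current best receive the bulk of the simulation effort.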


This book is a marvelous collection of chapters that review the contributions of Professor Ho in the areas of perturbation analysis and ordinal optimization, and also provide a look to the future of these areas, which continue to impact a growing number of applications. I believe this book will be educational and useful both for readers new to this field and for those familiar with it, and it is a wonderful way to pay tribute to Professor Ho, who has been an excellent mentor, teacher, and friend to so many.

Lucy Pao
Richard & Joy Dorf Professor of Electrical, Computer, & Energy Engineering
University of Colorado
Boulder, Colorado, USA
February 2013


Foreword: The Being and Becoming of Perturbation Analysis

One of the best ideas in the field of systems and control over the last three decades is perturbation analysis (PA). It originated, as mentioned in many chapters of this volume, from Larry's consulting work on buffer storage design in an automobile production line. The research was first reported in a 1979 paper [11] (see also the follow-up paper [12]). Like all great ideas, PA has an elegant simplicity, built upon first principles. Suppose one is given the task of improving the performance of a system (the production line) by changing some of its parameters (buffer storage). Then it will be very helpful to know the sensitivity (i.e., derivative or gradient) of the system's performance with respect to the parameters—changing the parameters along the direction of the gradient is the best way to improve the performance, and often the best route towards an optimal solution. But here's the rub: the system itself is often so complex that a functional relationship between the performance and the parameter is inaccessible, let alone the gradient. Indeed, in most cases, the performance can only be estimated by observing and experimenting with the real system or by running a simulation of it. Larry's idea (which he called PA) is this: as we simulate the system's performance over time, we should be able to derive, with minimal additional work, an estimate of its gradient, alongside estimating the performance itself in the same simulation run. In other words, we can avoid running the simulation twice (first at the nominal parameter value, a second time at its perturbed value, and then taking the difference of the two as an estimate of the gradient). The year 1983 (exactly thirty years ago) witnessed three publications in PA, by Larry


and two of his doctoral students (Xiren C and Christos C) [8–10]. Thus, a new research topic in systems and control was born. The rest, as they say, is history. History in general, and the history of ideas in particular, according to the Hegelian world view, invariably consists of three stages, entwined with the dual themes of being and becoming. The first stage is being. An idea is born; its existence and legitimacy are immediately challenged, often fiercely—the whips and scorns of serious people and anonymous referees alike. (I am sure many chapter authors of this book have retained fond memories of those challenges in the early years of PA.) But a good idea always survives and eventually wins acceptance. The decade of the 1980s roughly coincides with this first stage of PA, culminating in the publication of two books on PA [2, 7], both in 1991. Another high point of that same year (1991): Larry started a new journal, Discrete Event Dynamic Systems, published by Kluwer. The second stage is being and becoming. While the idea continues to establish itself and gains popularity, it also keeps evolving, setting off variations and extensions. The third stage is becoming over being: the idea's original (static) identity gradually fades away, but the dynamism of its character continues to flourish, generating ever more penetrating strengths and long-lasting impacts. The years following 1991 up to now, roughly the last two decades, mark the second stage of PA; and I believe—with a healthy dose of Larry's infectious optimism—that the transition to the third stage has just about started in recent years. As mentioned above, the initial motivation for PA was to solve optimal design problems in stochastic systems. With a set of tools well developed to estimate gradients along sample paths, the research focus gradually shifted to stochastic optimization; refer to [5]. This includes ordinal optimization (the second part of this volume) as well, which, in a way, is also rooted in the original buffer design problem of [11]. Indeed, all of the chapters in this volume together can be viewed as an excellent representation of, as well as reflection upon, the developments in the second stage; and a few also give a preview of possible paths leading to the third stage. For the latter, I would point to—forgive the limitation to examples that I am personally more familiar with—the application of PA to dynamic programming and reinforcement learning (Xiren C), and the connection (or extension) of PA to financial hedging strategies via evaluating sensitivities of derivative products (Paul G).


Throughout the 1980s, discrete-event systems was arguably the hottest topic in systems and control. Larry's unique foresight and contribution to the field was to emphasize early on the critical importance of dynamics. In fact, during those days, DEDS (with the second D representing "dynamic") was effectively the "label"—indeed, the second D being the tenet—of the "Ho school" of discrete-event systems control. Without this second D, a discrete-event system is typically modeled as a finite-state machine, and the focus is on the set of all feasible strings of events, the language that can be generated under a given control, and the goal of avoiding certain undesirable states or outcomes such as deadlock. In contrast, by explicitly considering the clock/life-times of the underlying events, DEDS emphasizes the dynamic and stochastic nature of the system, and the objective is typically to optimize the system performance by selecting or fine-tuning certain design parameters. Needless to say, without the emphasis on dynamics, there would be no need for PA at all. (Also refer to the remarks at the end of the chapter by Fu, Gong, Hu and Li.) A more personal perspective on this, if I may, is the 1994 book Paul and I co-authored [3], which is based on a set of papers we published in the early 1990s. The DEDS is modeled as a generalized semi-Markov process (GSMP), which is about as general as any discrete-event simulation model. The view is to separate the GSMP into a "scheme"—the set of all feasible strings of events, or the language—and a set of clock-times associated with each event. Hence, this framework literally integrates discrete-event systems into DEDS. The focus of the research is to explore the structure of the language or the scheme (which is deterministic and static) in terms of its impact on system performance (which is dynamic and stochastic). To give a flavor of the results coming out of this line of investigation, take Chapter 3 of [3] as an example. The "commuting" condition, (C), which plays a central role in guaranteeing path continuity and hence the validity of PA, is extended to a more general condition (M), which, in turn, leads to monotonicity. (M) is a condition imposed on the scheme only, with no restrictions on the event times. It is shown to endow the language with an appealing structure called an antimatroid, and this structure leads to a (min, max, +) recursion of event completion-times (from which delay and queue-length processes all follow), as well as their monotonicity. Adding another condition called "min-closure" to (M) (Chapter 4) results in a special antimatroid with a supermodular rank function. Consequently, the (min, max, +) recursion simplifies to (max, +), and the monotonicity of completion-times is strengthened to convexity. This last


result explains why (max, +) has received so much attention in the research literature, as it represents the best possible structure of a discrete-event system. Furthermore, condition (M) makes it possible to make comparisons across different schemes, which translates into the implications for system performance of different (supervisory) controls exercised on the language. Thus, it directly connects DEDS control to the control of the underlying discrete-event system, and explicitly brings out the performance implications. Some of these results are highlighted in [4], the first paper in the inaugural issue of the journal DEDS. The Lipschitz continuity of the operators min and max is highlighted in [3]. Its connection to PA, however, appears to be less well explored. Let me explain. It is a well-known fact that the dynamics of many stochastic systems, queues and queueing networks in particular, can be expressed in the form of a linear complementarity problem (LCP) over time—involving the stochastic events of arrivals and service completions as input processes; refer to [1]. In the single-class generalized Jackson network (including the single-server queue, a special case highlighted in several chapters of this volume), the dynamic LCP (also called the Skorohod problem) is known to involve a Lipschitz continuous mapping. And this property appears to be central to all kinds of "nice" results regarding the single-class network (e.g., existence of the diffusion limit). Extending this to a multi-class network, whether or not the Skorohod mapping is still Lipschitz continuous remains unknown, except in some very special cases. It is perhaps no coincidence that PA also works best in settings where the Lipschitz continuous mapping is present. The more intriguing question is whether the smoothing techniques of PA via conditioning—taking averages over a collection of sample paths; refer to [6]—may have some role to play outside of the Lipschitz continuous framework. In this age of Big Data dynamically (yes, dynamically) feeding off the Internet and permeating all kinds of apps in our daily life, it will not be an exaggeration to project that the best of PA, the continuing story of its becoming, may still have ways to go. A great idea invariably attracts a community of like minds and scholars, as this volume has so eloquently illustrated. Thank you, Larry, for creating both the idea and the fellowship; and Happy Birthday!

David D. Yao
New York


References

[1] Chen, H. and Yao, D. D., Fundamentals of Queueing Networks: Performance, Asymptotics, and Optimization. Springer-Verlag, New York, 2001.
[2] Glasserman, P., Gradient Estimation via Perturbation Analysis. Kluwer, Boston, MA, 1991.
[3] Glasserman, P. and Yao, D. D., Monotone Structure in Discrete-Event Systems. Wiley Inter-Science, Series in Probability and Mathematical Statistics, 1994.
[4] Glasserman, P. and Yao, D. D., Algebraic Structure of Some Stochastic Discrete Event Systems, with Applications. Discrete Event Dynamic Systems, 1 (1991), 7–35.
[5] Fu, M. C. and Hu, J. Q., Conditional Monte Carlo: Gradient Estimation and Optimization Applications. Kluwer, 1997.
[6] Gong, W. B. and Ho, Y. C., Smoothed (Conditional) Perturbation Analysis of Discrete Event Dynamical Systems. IEEE Transactions on Automatic Control, 32 (1987), 858–866.
[7] Ho, Y. C. and Cao, X., Perturbation Analysis of Discrete Event Dynamic Systems. Kluwer, Boston, MA, 1991.
[8] Ho, Y. C. and Cao, X., Perturbation Analysis and Optimization of Queueing Networks. Journal of Optimization Theory and Applications, 40 (1983), 559–582.
[9] Ho, Y. C. and Cassandras, C. G., A New Approach to the Analysis of Discrete Event Dynamic Systems. Automatica, 19 (1983), 149–167.
[10] Ho, Y. C., Cao, X., and Cassandras, C. G., Infinitesimal and Finite Perturbation Analysis of Discrete Event Dynamic Systems. Automatica, 19 (1983), 439–445.
[11] Ho, Y. C., Eyler, M. A., and Chien, T. T., A Gradient Technique for General Buffer Storage Design in a Serial Production Line. International Journal of Production Research, 17 (1979), 557–580.
[12] Ho, Y. C., Eyler, M. A., and Chien, T. T., A New Approach to Determine Parameter Sensitivities of Transfer Lines. Management Science, 29 (1983), 700–714.


Foreword: Remembrance of Things Past

It is indeed a pleasure to have the opportunity to write a few words about this book, Larry himself, and the body of work created by him and his students. His leadership in the field of automatic control is, of course, well known, and much has been written about the ways in which his contributions have led to new and lasting trends. Even so, there are perspectives not yet completely explored, and the papers here on perturbation analysis and ordinal optimization provide ample evidence of that. Larry has always been interested in fresh ideas that have the potential for payoff in the real world. The Ho–Kashyap algorithm, his early work on game theory in a variety of settings, and discrete event systems are examples. The list could easily be extended, and I'm sure there are ideas brought forth here that will find their place among them. Perhaps it is not out of place to recall a little history that does not show up in the Transactions on Automatic Control or the allied journals that archive Larry's body of work. By the late 1960s, Larry had been tenured at Harvard; he and Art Bryson had completed their well-received "first of its type" textbook on optimal control, and Art was off to Stanford. The moon landing was not far in the future and the aerospace industry was growing. This was a very exciting period for automatic control, and Larry sensed the possibility of enlarging the control group at Harvard. In those days, the editorial board of the IEEE Transactions on Automatic Control met quarterly in New York City, where the entire board got together to discuss the latest set of submissions—George Axelby presiding, Nat Nichols providing insider information on the most interesting places to eat in New


York City (on a budget, of course)—and naturally Larry was part of it. In spite of the considerable work involved, these meetings were fun. One could buy a one-way shuttle ticket from Boston for about $10, and sit down to discuss with some of the best people in our field, past, present and future! It was through these meetings that I got to know Larry. Larry put a lot of energy into maintaining an active postdoctoral and visitor program at Harvard, and he was always on the lookout for interesting visitors and conference events to facilitate interaction with other leaders in the field. A prime example of this was the informal exchange program he arranged with David Mayne, which resulted in bringing David and, over time, at least four others from the control group at Imperial College to Harvard for longer-term visits. This not only enriched life for the doctoral students, but also had a significant long-term effect on the field. The visitor program was not limited to Imperial College, however; notables, in alphabetical order, from Hirotugu Akaike to Hans Witsenhausen, as well as a series of leaders from the People's Republic of China, come to mind. Perturbation analysis is very much in keeping with the current trends based on a more systematic use of simulation, data mining and probabilistic computing. These ideas are at the heart of some of the most intensively studied subjects of our time, ranging from quantum computing (which, in most forms, is intrinsically probabilistic) to models of cognition (which rely more and more on Bayesian analysis and stochastic descent). Thus this book could not be more in keeping with our times. Let me conclude with a short quote about Larry from the longtime Berkeley professor Pravin Varaiya: "I have always treasured his enthusiasm, his generosity, his friendship, and the leadership by example that he has provided to us all."

Roger W. Brockett
An Wang Professor of Computer Science and Electrical Engineering
Division of Engineering and Applied Sciences
Founder of the Harvard Robotics Laboratory
Harvard University
Cambridge, MA, USA
February 2013


Contents

Preface
Foreword: A Tribute to a Great Leader in Perturbation Analysis and Ordinal Optimization
Foreword: The Being and Becoming of Perturbation Analysis
Foreword: Remembrance of Things Past

Part I: Perturbation Analysis

Chapter 1. The IPA Calculus for Hybrid Systems
  1.1. Introduction
  1.2. Perturbation Analysis of Hybrid Systems
    1.2.1. Infinitesimal Perturbation Analysis (IPA): The IPA calculus
  1.3. IPA Properties
  1.4. General Scheme for Abstracting DES to SFM
  1.5. Conclusions and Future Work
  References

Chapter 2. Smoothed Perturbation Analysis: A Retrospective and Prospective Look
  2.1. Introduction
  2.2. Brief History of SPA
  2.3. Another Example
  2.4. Overview of a General SPA Framework
  2.5. Applications
    2.5.1. Queueing
    2.5.2. Inventory
    2.5.3. Finance
    2.5.4. Stochastic Activity Networks (SANs)
    2.5.5. Others
  2.6. Random Retrospective and Prospective Concluding Remarks
  Acknowledgements
  References

Chapter 3. Perturbation Analysis and Variance Reduction in Monte Carlo Simulation
  3.1. Introduction
  3.2. Systematic and Generic Control Variate Selection
    3.2.1. Control variate technique: a brief review
    3.2.2. Parametrized estimation problems
    3.2.3. Deterministic function approximation and generic CV selection
  3.3. Control Variates for Sensitivity Estimation
    3.3.1. A parameterized estimation formulation of sensitivity estimation
    3.3.2. Finite difference based controls
    3.3.3. Illustrating example
  3.4. Database Monte Carlo (DBMC) Implementation
  3.5. Conclusions
  Acknowledgements
  References

Chapter 4. Adjoints and Averaging
  4.1. Introduction
  4.2. Adjoints: Classical Setting
  4.3. Adjoints: Waiting Times
  4.4. Adjoints: Vector Recursions
  4.5. Averaging
  4.6. Concluding Remarks
  References

Chapter 5. Infinitesimal Perturbation Analysis and Optimization Algorithms
  5.1. Preliminary Remarks
  5.2. Motivation
  5.3. Single-server Queues
    5.3.1. Controlled single-server queue
    5.3.2. Infinitesimal perturbation analysis
    5.3.3. Optimization algorithm
  5.4. Convergence
    5.4.1. Stochastic approximation convergence theorem
    5.4.2. Updating after every busy period
    5.4.3. Updating after every service time
    5.4.4. Example
  5.5. Final Remarks
  References

Chapter 6. Simulation-based Optimization of Failure-prone Continuous Flow Lines
  6.1. Introduction
  6.2. Two-machine Continuous Flow Lines
  6.3. Gradient Estimation of a Two-machine Line
  6.4. Modeling Assembly/Disassembly Networks Subject to TDF Failures with Stochastic Fluid Event Graphs
  6.5. Evolution Equations and Sample Path Gradients
  6.6. Optimization of Stochastic Fluid Event Graphs
  6.7. Conclusion
  References

Chapter 7. Perturbation Analysis, Dynamic Programming, and Beyond
  7.1. Introduction
  7.2. Perturbation Analysis of Queueing Systems Based on Perturbation Realization Factors
    7.2.1. Performance gradient
    7.2.2. Policy iteration
  7.3. Performance Optimization of Markov Systems Based on Performance Potentials
    7.3.1. Performance gradients and potentials
    7.3.2. Policy iteration and HJB equation
  7.4. Beyond Dynamic Programming
    7.4.1. New results based on direct comparison
      7.4.1.1. N-bias optimality in MDP
      7.4.1.2. Optimization of sample-path variance in MDP
    7.4.2. Event-based optimization
    7.4.3. Financial engineering related
  Acknowledgments
  References

Part II: Ordinal Optimization

Chapter 8. Fundamentals of Ordinal Optimization
  8.1. Two Basic Ideas
  8.2. The Exponential Convergence of Order and Goal Softening
  8.3. Universal Alignment Probabilities
  8.4. Extensions
    8.4.1. Comparison of selection rules
    8.4.2. Vector ordinal optimization
    8.4.3. Constrained ordinal optimization
    8.4.4. Deterministic complex optimization problem
    8.4.5. OO ruler: quantification of heuristic designs
  8.5. Conclusion
  References

Chapter 9. Optimal Computing Budget Allocation Framework
  9.1. Introduction
  9.2. History of OCBA
  9.3. Basics of OCBA
    9.3.1. Problem formulation
    9.3.2. Common assumptions
    9.3.3. Ideas for deriving the simulation budget allocation
    9.3.4. Closed-form allocation rules
    9.3.5. Intuitive explanations of the allocation rules
    9.3.6. Sequential heuristic algorithm
  9.4. Different Extensions of OCBA
    9.4.1. Selection qualities other than PCS
    9.4.2. Other extensions to OCBA with single objective
    9.4.3. OCBA for multiple performance measures
    9.4.4. Integration of OCBA and the searching algorithms
  9.5. Generalized OCBA Framework
  9.6. Applications of OCBA
  9.7. Future Research
  9.8. Concluding Remarks
  References

Chapter 10. Nested Partitions
  10.1. Overview
  10.2. Nested Partitions for Deterministic Optimization
    10.2.1. Nested partitions framework
    10.2.2. Global convergence
  10.3. Enhancements and Advanced Developments
    10.3.1. LP solution-based sampling
    10.3.2. Extreme value-based promising index
    10.3.3. Hybrid algorithms
      10.3.3.1. Product design
      10.3.3.2. Local pickup and delivery
  10.4. Nested Partitions for Stochastic Optimization
    10.4.1. Nested partitions for stochastic optimization
    10.4.2. Global convergence
  10.5. Conclusions
  Acknowledgements
  References

Chapter 11. Applications of Ordinal Optimization
  11.1. Scheduling Problem for Apparel Manufacturing
  11.2. The Turbine Blade Manufacturing Process Optimization Problem
  11.3. Performance Optimization for a Remanufacturing System
    11.3.1. Application of constrained ordinal optimization
    11.3.2. Application of vector ordinal optimization
  11.4. Witsenhausen Problem
  11.5. Other Application Researches
  Acknowledgments
  References


PART I

Perturbation Analysis



Chapter 1 The IPA Calculus for Hybrid Systems

Christos G. Cassandras∗
Division of Systems Engineering and Center for Information and Systems Engineering
Boston University, 15 St. Mary's St., Brookline, MA 02446

Yorai Wardi†
School of Electrical and Computer Engineering
Georgia Institute of Technology, Atlanta, GA 30332-0250

This chapter presents a systematic approach to evaluating performance sensitivities along sample paths of hybrid dynamical systems and identifying some of their properties. The systems under consideration combine time-driven and discrete-event dynamics, thus defining their hybrid nature. Stochastic hybrid systems provide a suitable modeling framework for performance evaluation of networks in various application areas such as telecommunications, manufacturing, and transportation. In typical applications, performance metrics are functions of structural or control variables, and, for the purpose of system design optimization, provisioning, or control, it is desirable to compute their sensitivities with respect to these variables. However, the hybrid model is often too complex to yield a closed-form functional characterization, hence the sensitivities can be computed only along sample paths of the system. The general framework we present is based on Infinitesimal Perturbation Analysis (IPA) for estimating the gradients of the performance functions with respect to variable parameters. The chapter includes an application of IPA to a class of Stochastic Flow Models (SFMs), and describes a systematic procedure for deriving such SFMs from discrete event systems.

∗ Supported in part by NSF under Grants EFRI-0735974 and CNS-1239021, by AFOSR under Grant FA9550-12-1-0113, by ONR under Grant N00014-09-1-1051, and by ARO under Grant W911NF-11-1-0227.
† Supported in part by NSF under Grant CNS-1239225.


1.1. Introduction

In pioneering the field of Discrete Event Systems (DES) in the early 1980s, Y.C. Ho and his research group at Harvard University discovered that event-driven dynamics give rise to state trajectories (sample paths) from which one can very efficiently and nonintrusively extract sensitivities of state variables (and, therefore, of various performance metrics as well) with respect to at least certain types of design or control parameters. This eventually led to the development of a theory for perturbation analysis in DES [1–3], the most successful branch of which is Infinitesimal Perturbation Analysis (IPA), owing to its simplicity and ease of implementation. Using IPA, one obtains unbiased estimates of performance metric gradients that can be incorporated into standard gradient-based algorithms for optimization purposes. However, IPA estimates become biased (hence, unreliable for control purposes) when dealing with various aspects of DES that cause significant discontinuities in the sample functions of interest. Such discontinuities normally arise when a parameter perturbation changes the order in which events occur, and this event order change may violate a basic "commuting condition" [3]. When this happens, one must resort to significantly more complicated methods for deriving unbiased gradient estimates [1]. By the early 1990s, a large collection of Perturbation Analysis (PA) algorithms had been developed to accommodate complexities in DES such as saturation phenomena occurring in queueing systems with finite capacities and changes in the order in which events occur in systems providing service to different user classes with class-dependent characteristics. PA techniques were also extended to include discrete parameters, so that one could efficiently obtain finite difference estimates for performance metrics dependent on such parameters or simply estimate performance metrics over large changes of continuous parameters. These techniques came to be known as Concurrent Estimation methods [4]. Nonetheless, as new, increasingly complex DES were being designed, the benefits of Perturbation Analysis and Concurrent Estimation techniques reached a natural efficiency limit compared to, for example, "brute force" simulation methods, where one simply repeatedly simulates a DES with one simulation for each parameter value of interest. The 1990s also saw the emergence of hybrid systems. Hybrid systems consist of interacting components, some with time-driven and others with event-driven dynamics. For example, electromechanical components


governed by time-driven dynamics become a hybrid system when interacting through a communication network whose behavior is event-driven. Indeed, so-called "cyber-physical systems" are explicitly designed to allow discrete event components (e.g., embedded microprocessors, sensor networks) to be integrated with physical components (e.g., generators in a power grid, engine parts in an automotive vehicle). Given that they must also operate in the presence of uncertainty, an appropriate framework must include the means to represent stochastic effects, be they purely random or the result of adversarial disturbances. The basis for a hybrid system modeling framework is often provided by a hybrid automaton. In a hybrid automaton, discrete events (either controlled or uncontrolled) cause transitions from one discrete state (or "mode") to another. While operating in a particular mode, the system's behavior is usually described by differential equations. In a stochastic setting, such frameworks are augmented with models for random processes that affect either the time-driven dynamics, the events causing discrete state transitions, or both. A general-purpose stochastic hybrid automaton model may be found in [5], along with various classes of Stochastic Hybrid Systems (SHS) which exhibit different properties or suit different types of applications. As in DES, the performance of a SHS is generally hard to estimate because of the absence of closed-form expressions capturing the dependence of interesting performance metrics on control parameters. Consequently, we lack the ability to systematically adjust such parameters for the purpose of improving — let alone optimizing — performance. By the early 2000s, it was shown that IPA can also be applied to at least some classes of SHS and yield simple unbiased gradient estimators that can be used for optimization purposes. In particular, Stochastic Flow (or Fluid) Models (SFMs), as introduced in [6], are a class of SHS where the time-driven component captures general-purpose flow dynamics and the event-driven component describes switches, controlled or uncontrolled, that alter these flow dynamics. What is attractive about SFMs is that they can be viewed as abstractions of complex stochastic DES which retain their essential features for the purpose of control and optimization. In fact, fluid models have a history of being used as abstractions of DES. Introduced in [7], fluid models have been shown to be very useful in simulating various kinds of high-speed networks [8], manufacturing systems [9] and, more generally, settings where users compete over different sharable resources. It should be stressed that fluid models may not always provide accurate


representations for the purpose of analyzing the performance of the underlying DES. What we are interested in, however, is control and optimization, in which case the value of a fluid model lies in capturing only those system features needed to design an effective controller that can potentially optimize performance, without any attempt at estimating the corresponding optimal performance value with accuracy. While in most traditional fluid models the flow rates involved are treated as fixed parameters, an SFM has the extra feature of treating flow rates as stochastic processes. With only minor technical conditions imposed on the properties of such processes, the use of IPA has been shown to provide simple gradient estimators for stochastic resource contention systems that include blocking phenomena and a variety of feedback control mechanisms [10–13]. Until recently, the use of IPA was limited to a particular class of SFMs and was based on exploiting the special structure of specific systems. However, a major advance was brought about by the unified framework introduced in [11] and extended in [14], which places IPA in the general context of stochastic hybrid automata with arbitrary structure. As a result, IPA may now be applied to arbitrary SHS for the purpose of performance gradient estimation and, therefore, gradient-based optimization through standard methods. In this chapter, we begin with an overview of this general IPA framework for SHS. Our emphasis is on the main concepts and key results that will enable the reader to apply the general IPA methodology through a set of simple equations driven by observable system data; additional details on this material can be found in [14]. We then present properties of the resulting gradient estimators (details and proofs may be found in [15]) that justify their applicability even in the absence of detailed models for the time-driven components in some cases, and we provide an illustrative example. Next, we focus on DES and address the issue of obtaining a SHS model from an arbitrary DES as a means of abstraction that facilitates its analysis for control and optimization purposes, and we describe a scheme introduced in [15] for systematically performing this abstraction process. The chapter is organized as follows. In Section 1.2 we present the general IPA framework for SHS, which we refer to as the "IPA calculus". We then identify in Section 1.3 properties which provide sufficient conditions under which IPA is particularly simple and efficient. Section 1.4 presents a general scheme to abstract a DES to a stochastic hybrid automaton, and Section 1.5 summarizes and concludes the chapter.


1.2. Perturbation Analysis of Hybrid Systems

We begin by adopting a standard hybrid automaton formalism to model the operation of a (generally stochastic) hybrid system [5]. Thus, let q ∈ Q (a countable set) denote the discrete state (or mode) and x ∈ X ⊆ R^n denote the continuous state. Let υ ∈ Υ (a countable set) denote a discrete control input and u ∈ U ⊆ R^m a continuous control input. Similarly, let δ ∈ Δ (a countable set) denote a discrete disturbance input and d ∈ D ⊆ R^p a continuous disturbance input. The state evolution is determined by means of (i) a vector field f : Q × X × U × D → X, (ii) an invariant (or domain) set Inv : Q × Υ × Δ → 2^X, (iii) a guard set Guard : Q × Q × Υ × Δ → 2^X, and (iv) a reset function r : Q × Q × X × Υ × Δ → X. A sample path of such a system consists of a sequence of intervals of continuous evolution followed by a discrete transition. The system remains at a discrete state q as long as the continuous (time-driven) state x does not leave the set Inv(q, υ, δ). If x reaches a set Guard(q, q′, υ, δ) for some q′ ∈ Q, a discrete transition can take place. If this transition does take place, the state instantaneously resets to (q′, x′) where x′ is determined by the reset map r(q, q′, x, υ, δ). Changes in υ and δ are discrete events that either enable a transition from q to q′ by making sure x ∈ Guard(q, q′, υ, δ) or force a transition out of q by making sure x ∉ Inv(q, υ, δ). We will also use E to denote the set of all events that cause discrete state transitions and will classify events in a manner that suits the purposes of perturbation analysis.

In what follows, we describe the general framework for IPA presented in [11] and generalized in [14]. Let θ ∈ Θ ⊂ R^l be a global variable, henceforth called the control parameter, where Θ is a given compact, convex set. This may represent a system design parameter, a parameter of an input process, or a parameter that characterizes a policy used in controlling this system. The disturbance input d ∈ D encompasses various random processes that affect the evolution of the state (q, x) so that, in general, we can deal with a SHS. We will assume that all such processes are defined over a common probability space, (Ω, F, P). Let us fix a particular value of the parameter θ ∈ Θ and study a resulting sample path of the SHS. Over such a sample path, let τ_k(θ), k = 1, 2, . . ., denote the occurrence times of the discrete events in increasing order, and define τ_0(θ) = 0 for convenience. We will use the notation τ_k instead of τ_k(θ) when no confusion arises. The continuous state is also generally a function of θ, as well as of t, and is thus denoted by x(θ, t). Over an interval [τ_k(θ), τ_{k+1}(θ)), the system is at some


mode during which the time-driven state satisfies
\[
\dot{x} = f_k(x, \theta, t) \qquad (1.1)
\]
where \(\dot{x}\) denotes ∂x/∂t. Note that we suppress the dependence of f_k on the inputs u ∈ U and d ∈ D and stress instead its dependence on the parameter θ which may generally affect either u or d or both. The purpose of perturbation analysis is to study how changes in θ influence the state x(θ, t) and the event times τ_k(θ) and, ultimately, how they influence interesting performance metrics which are generally expressed in terms of these variables. The following assumption guarantees that (1.1) has a unique solution w.p.1 for a given initial boundary condition x(θ, τ_k) at time τ_k(θ):

Assumption 1: W.p.1, there exists a finite set of points t_j ∈ [τ_k(θ), τ_{k+1}(θ)), j = 1, 2, . . ., which are independent of θ, such that the function f_k is continuously differentiable on R^n × Θ × ([τ_k(θ), τ_{k+1}(θ)) \ {t_1, t_2, . . .}). Moreover, there exists a random number K > 0 such that E[K] < ∞ and the norm of the first derivative of f_k on R^n × Θ × ([τ_k(θ), τ_{k+1}(θ)) \ {t_1, t_2, . . .}) is bounded from above by K.

An event occurring at time τ_{k+1}(θ) triggers a change in the mode of the system, which may also result in new dynamics represented by f_{k+1}, although this may not always be the case; for example, two modes may be distinct because the state x(θ, t) enters a new region where the system’s performance is measured differently without altering its time-driven dynamics (i.e., f_{k+1} = f_k). The event times {τ_k(θ)} play an important role in defining the interactions between the time-driven and event-driven dynamics of the system. We now classify events that define the set E as follows:

1. Exogenous events. An event is exogenous if it causes a discrete state transition at time τ_k in a manner that is independent of the controllable vector θ, hence it satisfies dτ_k/dθ = 0. Exogenous events typically correspond to uncontrolled random changes in input processes.

2. Endogenous events. An event occurring at time τ_k is endogenous if there exists a continuously differentiable function g_k : R^n × Θ → R such that
\[
\tau_k = \min\{t > \tau_{k-1} : g_k(x(\theta, t), \theta) = 0\} \qquad (1.2)
\]
The function g_k normally corresponds to a guard condition in a hybrid automaton model.

3. Induced events. An event at time τ_k is induced if it is triggered by the occurrence of another event at time τ_m ≤ τ_k. The triggering event


may be exogenous, endogenous, or itself an induced event. The events that trigger induced events are identified by a subset of the event set, E_I ⊆ E. Although this event classification is sufficiently general, recent work has shown that in some cases it is convenient to introduce further event distinctions [16]. Moreover, it has been shown in [17] that an explicit event classification is in fact unnecessary if one is willing to appropriately extend the definition of the hybrid automaton described above. However, for the rest of this chapter we shall make use of the above classification.

Next, consider a performance function of the control parameter θ:

J(θ; x(θ, 0), T) = E[L(θ; x(θ, 0), T)]

where L(θ; x(θ, 0), T) is a sample function of interest evaluated in the interval [0, T] with initial conditions x(θ, 0). For simplicity, we write J(θ) and L(θ). Suppose that there are N events, with occurrence times generally dependent on θ, during the time interval [0, T] and define τ_0 = 0 and τ_{N+1} = T. Let L_k : R^n × Θ × R^+ → R be a function satisfying Assumption 1 and define L(θ) by
\[
L(\theta) = \sum_{k=0}^{N} \int_{\tau_k}^{\tau_{k+1}} L_k(x, \theta, t)\, dt \qquad (1.3)
\]
where we reiterate that x = x(θ, t) is a function of θ and t. We also point out that the restriction of the definition of J(θ) to a finite horizon T which is independent of θ is made merely for the sake of simplicity of exposition.

Given that we do not wish to impose any limitations (other than mild technical conditions) on the random processes that characterize the discrete or continuous disturbance inputs in our hybrid automaton model, it is infeasible to obtain closed-form expressions for J(θ). Therefore, for the purpose of optimization, we resort to iterative methods such as stochastic approximation algorithms (e.g., [18]) which are driven by estimates of the cost function gradient with respect to the parameter vector of interest. Thus, we are interested in estimating dJ/dθ based on sample path data, where a sample path of the system may be directly observed or it may be obtained through simulation. We then seek to obtain θ* minimizing J(θ) through an iterative scheme of the form
\[
\theta_{n+1} = \theta_n - \eta_n H_n(\theta_n; x(\theta, 0), T, \omega_n), \qquad n = 0, 1, \ldots \qquad (1.4)
\]
where H_n(θ_n; x(0), T, ω_n) is an estimate of dJ/dθ evaluated at θ_n and based on information obtained from a sample path denoted by ω_n, and {η_n} is an appropriately selected step size sequence.
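In code, the scheme (1.4) is simply a gradient-descent loop driven by the sample-path estimates H_n. The following minimal sketch (our own illustration; the names sa_optimize, grad_estimate, and the toy objective are hypothetical) treats the gradient estimator as a black box, which is exactly the role the IPA estimators of this chapter play:

```python
import random

def sa_optimize(theta0, grad_estimate, n_iters=200, eta0=0.5):
    """Stochastic approximation scheme (1.4):
    theta_{n+1} = theta_n - eta_n * H_n(theta_n), where H_n is a sample-path
    (e.g., IPA) estimate of dJ/dtheta and {eta_n} is a diminishing step size."""
    theta = theta0
    for n in range(n_iters):
        eta = eta0 / (n + 1)   # step sizes with sum eta_n = inf, sum eta_n^2 < inf
        theta -= eta * grad_estimate(theta)
    return theta

# Toy sanity check with a noisy gradient of J(theta) = (theta - 3)^2:
rng = random.Random(0)
noisy_grad = lambda theta: 2.0 * (theta - 3.0) + rng.gauss(0.0, 1.0)
print(sa_optimize(0.0, noisy_grad))   # converges toward theta* = 3
```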


In order to execute an algorithm such as (1.4), we need the estimate H_n(θ_n) of dJ/dθ. The IPA approach is based on using the sample derivative dL/dθ as an estimate of dJ/dθ. The strength of the approach is that dL/dθ can be obtained from observable sample path data alone and, usually, in a very simple manner that can be readily implemented on line. Moreover, it is often the case that dL/dθ is an unbiased estimate of dJ/dθ, a property that allows us to use (1.4) in obtaining θ*. We will return to this issue later, and concentrate first on deriving the IPA estimates dL/dθ.

1.2.1. Infinitesimal Perturbation Analysis (IPA): The IPA calculus

Let us fix θ ∈ Θ, consider a particular sample path, and assume for the time being that all derivatives mentioned in the sequel do exist. To simplify notation, we define the following for all state and event time sample derivatives:
\[
x'(t) \equiv \frac{\partial x(\theta, t)}{\partial \theta}, \qquad \tau_k' \equiv \frac{\partial \tau_k}{\partial \theta}, \quad k = 0, \ldots, N \qquad (1.5)
\]
In addition, we will write f_k(t) instead of f_k(x, θ, t) whenever no ambiguity arises. By taking derivatives with respect to θ in (1.1) on the interval [τ_k(θ), τ_{k+1}(θ)) we get
\[
\frac{d}{dt} x'(t) = \frac{\partial f_k(t)}{\partial x}\, x'(t) + \frac{\partial f_k(t)}{\partial \theta} \qquad (1.6)
\]
The boundary (initial) condition of this linear equation is specified at time t = τ_k, and by writing (1.1) in an integral form and taking derivatives with respect to θ when x(θ, t) is continuous in t at t = τ_k, we obtain for k = 1, . . . , N:
\[
x'(\tau_k^+) = x'(\tau_k^-) + \left[ f_{k-1}(\tau_k^-) - f_k(\tau_k^+) \right] \tau_k' \qquad (1.7)
\]
We note that whereas x(θ, t) is often continuous in t, x'(t) may be discontinuous in t at the event times τ_k, hence the left and right limits above are generally different. If x(θ, t) is not continuous in t at t = τ_k, the value of x(τ_k^+) is determined by the reset function r(q, q′, x, υ, δ) discussed earlier and
\[
x'(\tau_k^+) = \frac{dr(q, q', x, \upsilon, \delta)}{d\theta} \qquad (1.8)
\]
Furthermore, once the initial condition x'(τ_k^+) is given, the linearized state trajectory {x'(t)} can be computed in the interval t ∈ [τ_k(θ), τ_{k+1}(θ)) by


solving (1.6) to obtain:
\[
x'(t) = e^{\int_{\tau_k}^{t} \frac{\partial f_k(u)}{\partial x}\, du} \left[ \int_{\tau_k}^{t} \frac{\partial f_k(v)}{\partial \theta}\, e^{-\int_{\tau_k}^{v} \frac{\partial f_k(u)}{\partial x}\, du}\, dv + \xi_k \right] \qquad (1.9)
\]

with the constant ξ_k determined from x'(τ_k^+) in (1.7), since x'(τ_k^-) is the final-time boundary condition in the interval [τ_{k-1}(θ), τ_k(θ)), or it is obtained from (1.8). Clearly, to complete the description of the trajectory of the linearized system (1.6)–(1.7) we have to specify the derivative τ_k' which appears in (1.7). Since τ_k, k = 1, 2, . . ., are the mode-switching times, these derivatives explicitly depend on the interaction between the time-driven dynamics and the event-driven dynamics, and specifically on the type of event occurring at time τ_k. Using the event classification given earlier, we have the following.

1. Exogenous events. By definition, such events are independent of θ, therefore τ_k' = 0.

2. Endogenous events. In this case, (1.2) holds and taking derivatives with respect to θ we get:
\[
\frac{\partial g_k}{\partial x} \left[ x'(\tau_k^-) + f_k(\tau_k^-)\, \tau_k' \right] + \frac{\partial g_k}{\partial \theta} = 0 \qquad (1.10)
\]
which, assuming \(\frac{\partial g_k}{\partial x} f_k(\tau_k^-) \neq 0\), can be rewritten as
\[
\tau_k' = -\left[ \frac{\partial g_k}{\partial x}\, f_k(\tau_k^-) \right]^{-1} \left[ \frac{\partial g_k}{\partial \theta} + \frac{\partial g_k}{\partial x}\, x'(\tau_k^-) \right] \qquad (1.11)
\]

3. Induced events. If an induced event occurs at t = τ_k, the value of τ_k' depends on the derivative τ_m', where τ_m ≤ τ_k is the time when the associated triggering event takes place. The event induced at τ_m will occur at some time τ_m + ω(τ_m), where ω(τ_m) is a random variable which is generally dependent on the continuous and discrete states x(τ_m) and q(τ_m) respectively. This implies the need for additional state variables, denoted by y_m(θ, t), m = 1, 2, . . ., associated with events occurring at times τ_m, m = 1, 2, . . . The role of each such state variable is to provide a “timer” activated when a triggering event occurs. Recalling that triggering events are identified as belonging to a set E_I ⊆ E, let e_k denote the event occurring at τ_k and define Ψ_k = {m : e_m ∈ E_I, m ≤ k} to be the set of all indices with corresponding triggering events up to τ_k. Omitting the dependence on θ for simplicity, the dynamics of y_m(t) are then given by
\[
\dot{y}_m(t) = \begin{cases} -C(t) & \tau_m \le t < \tau_m + \omega(\tau_m),\ m \in \Psi_m \\ 0 & \text{otherwise} \end{cases}
\qquad
y_m(\tau_m^+) = \begin{cases} y_0 & y_m(\tau_m^-) = 0,\ m \in \Psi_m \\ 0 & \text{otherwise} \end{cases}
\qquad (1.12)
\]
where y_0 is an initial value for the timer y_m(t), which decreases at a “clock rate” C(t) > 0 until y_m(τ_m + ω(τ_m)) = 0 and the associated induced event takes place. Clearly, these state variables are only used for induced events, so that y_m(t) = 0 unless m ∈ Ψ_m. The value of y_0 may depend on θ or on the continuous and discrete states x(τ_m) and q(τ_m), while the clock rate C(t) may depend on x(t) and q(t) in general, and possibly θ. However, in most simple cases where we are interested in modeling an induced event to occur at time τ_m + ω(τ_m), we have y_0 = ω(τ_m) and C(t) = 1, i.e., the timer simply counts down for a total of ω(τ_m) time units until the induced event takes place. An example where y_0 in fact depends on the state x(τ_m) and the clock rate C(t) is not necessarily constant arises in the case of multiclass resource contention systems as described in [19]. Henceforth, we will consider y_m(t), m = 1, 2, . . ., as part of the continuous state of the SHS and, similar to (1.5), we set
\[
y_m'(t) \equiv \frac{\partial y_m(t)}{\partial \theta}, \quad m = 1, \ldots, N. \qquad (1.13)
\]
For the common case where y_0 is independent of θ and C(t) is a constant c > 0 in (1.12), the following lemma facilitates the computation of τ_k' for an induced event occurring at τ_k. Its proof is given in [14].

Lemma 1: If in (1.12) y_0 is independent of θ and C(t) = c > 0 (constant), then τ_k' = τ_m'.

With the inclusion of the state variables y_m(t), m = 1, . . . , N, the derivatives x'(t), τ_k', and y_m'(t) can be evaluated through (1.6)–(1.11) along with (1.13). This very general set of equations represents the “IPA calculus”. In general, the derivative evaluation is recursive over the event (mode switching) index k = 0, 1, . . . In some cases, however, it can be reduced to simple expressions, as seen in the analysis of many SFMs, e.g., [6].

Remark: If a SHS does not involve induced events and if the state does not experience discontinuities when a mode-switching event occurs, then the full extent of the IPA calculus reduces to three equations:

(i) Equation (1.9), which describes how the state derivative x'(t) evolves over [τ_k(θ), τ_{k+1}(θ)),


(ii) Equation (1.7), which specifies the initial condition ξ_k in (1.9), and
(iii) Either τ_k' = 0 or (1.11), depending on the event type at τ_k(θ), which specifies the event time derivative present in (1.7).

Now the IPA derivative dL/dθ can be obtained by taking derivatives in (1.3) with respect to θ:
\[
\frac{dL(\theta)}{d\theta} = \sum_{k=0}^{N} \frac{d}{d\theta} \int_{\tau_k}^{\tau_{k+1}} L_k(x, \theta, t)\, dt, \qquad (1.14)
\]
Applying the Leibnitz rule we obtain, for every k = 0, . . . , N,
\[
\frac{d}{d\theta} \int_{\tau_k}^{\tau_{k+1}} L_k(x, \theta, t)\, dt
= \int_{\tau_k}^{\tau_{k+1}} \left[ \frac{\partial L_k}{\partial x}(x, \theta, t)\, x'(t) + \frac{\partial L_k}{\partial \theta}(x, \theta, t) \right] dt
+ L_k(x(\tau_{k+1}), \theta, \tau_{k+1})\, \tau_{k+1}' - L_k(x(\tau_k), \theta, \tau_k)\, \tau_k' \qquad (1.15)
\]

where x'(t) and τ_k' are determined through (1.6)–(1.11). What makes IPA appealing, especially in the SFM setting, is the simple form the right-hand side above often assumes.

We close this section with a comment on the unbiasedness of the IPA derivative dL/dθ. This IPA derivative is statistically unbiased [1, 2] if, for every θ ∈ Θ,
\[
E\left[\frac{dL(\theta)}{d\theta}\right] = \frac{d}{d\theta}\, E[L(\theta)] = \frac{dJ(\theta)}{d\theta}. \qquad (1.16)
\]
The main motivation for studying IPA in the SHS setting is that it yields unbiased derivatives for a large class of systems and performance metrics compared to the traditional DES setting [1]. The following conditions have been established in [20] as sufficient for the unbiasedness of IPA:

Proposition 1: Suppose that the following conditions are in force: (i) For every θ ∈ Θ, the derivative dL(θ)/dθ exists w.p.1. (ii) W.p.1, the function L(θ) is Lipschitz continuous on Θ, and the Lipschitz constant has a finite first moment. Fix θ ∈ Θ. Then, the derivative dJ(θ)/dθ exists, and the IPA derivative dL(θ)/dθ is unbiased.

The crucial assumption for Proposition 1 is the continuity of the sample performance function L(θ), which in many SHS (and SFMs in particular) is guaranteed in a straightforward manner. Differentiability w.p.1 at a given θ ∈ Θ often follows from mild technical assumptions on the probability law underlying the system, such as the exclusion of co-occurrence of multiple events (see [19]). Lipschitz continuity of L(θ) generally follows from upper boundedness of |dL(θ)/dθ| by an absolutely integrable random variable, generally a weak assumption. In light of these observations, the proofs


of unbiasedness of IPA have become standardized and the assumptions in Proposition 1 can be verified fairly easily from the context of a particular problem.

1.3. IPA Properties

The IPA estimators obtained within the general framework of the previous section lead to some sufficient conditions under which they become particularly simple and efficient to implement, with minimal information required about the underlying SHS dynamics.

The first question we address is related to dL(θ)/dθ in (1.14), which, as seen in (1.15), generally depends on information accumulated over all t ∈ [τ_k, τ_{k+1}). It is, however, often the case that it depends only on information related to the event times τ_k, τ_{k+1}, resulting in an IPA estimator which is very simple to implement. Using the notation L_k'(x, t, θ) ≡ ∂L_k(x, t, θ)/∂θ, we can rewrite dL(θ)/dθ in (1.14) as
\[
\frac{dL(\theta)}{d\theta} = \sum_{k} \left[ \tau_{k+1}' \cdot L_k\!\left(\tau_{k+1}^+\right) - \tau_k' \cdot L_k\!\left(\tau_k^+\right) + \int_{\tau_k}^{\tau_{k+1}} L_k'(x, t, \theta)\, dt \right] \qquad (1.17)
\]
The following proposition provides two sufficient conditions under which dL(θ)/dθ is independent of t and involves only the event time derivatives τ_k', τ_{k+1}' and the “local” performance L_k(τ_{k+1}^+), L_k(τ_k^+), which is obviously easy to observe. The proof of this result is given in [15].

Proposition 2: If condition (i) or (ii) below holds, then dL(θ)/dθ depends only on information available at event times {τ_k}, k = 0, 1, . . .:
(i) L_k'(x, t, θ) is independent of t over [τ_k, τ_{k+1}) for all k = 0, 1, . . .
(ii) L_k'(x, t, θ) is only a function of x and the following condition holds for all t ∈ [τ_k, τ_{k+1}), k = 0, 1, . . .:
\[
\frac{d}{dt}\frac{\partial f_k}{\partial x} = \frac{d}{dt}\frac{\partial f_k}{\partial \theta} = \frac{d}{dt}\frac{\partial L_k}{\partial \theta} = 0 \qquad (1.18)
\]
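As a simple illustration (constructed here for concreteness, anticipating the SFM example below rather than taken from [15]): in a fluid queue mode where the buffer is neither empty nor full, the workload follows \(\dot{x} = f_k = \alpha(t) - \beta(t)\), and if the sample cost is the workload itself, L_k(x, t, θ) = x, then
\[
\frac{\partial f_k}{\partial x} = \frac{\partial f_k}{\partial \theta} = \frac{\partial L_k}{\partial \theta} = 0,
\]
so all three terms in (1.18) vanish identically and the contribution of this mode to dL(θ)/dθ in (1.17) reduces to the boundary terms involving only event times and their derivatives.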

The second question we address is related to the discontinuity in x'(t) at event times, described in (1.7). This happens when endogenous events occur, since for exogenous events we have τ_k' = 0. The next proposition identifies a simple condition under which x'(τ_k^+) is independent of the dynamics f before the event at τ_k. This implies that we can evaluate the sensitivity of the state with respect to θ without any knowledge of the state trajectory in the interval [τ_{k-1}, τ_k) prior to this event. Moreover, under an


additional condition, we obtain x'(τ_k^+) = 0, implying that the effect of θ is “forgotten” and one can reset the perturbation process. This allows us to study the SHS over reset cycles, greatly simplifying the IPA process. The proof of the next result is also given in [15].

Proposition 3: Suppose an endogenous event occurs at τ_k with a switching function g(x, θ). If f_k(τ_k^+) = 0, then x'(τ_k^+) is independent of f_{k-1}. If, in addition, ∂g/∂θ = 0, then x'(τ_k^+) = 0.

The condition f_k(τ_k^+) = 0 typically indicates a saturation effect or the state reaching a boundary that cannot be crossed, e.g., when the state is constrained to be non-negative. When the conditions in these two propositions are satisfied, IPA provides sensitivity estimates that do not require knowledge of the noise processes or the detailed time-driven dynamics of the system, other than mild technical conditions. Thus, one need not have a detailed model (captured by f_{k-1}) to describe the state behavior through \(\dot{x} = f_{k-1}(x, \theta, t)\), t ∈ [τ_{k-1}, τ_k), in order to estimate the effect of θ on this behavior. This explains why simple abstractions of a complex stochastic system are often adequate to perform sensitivity analysis and optimization, as long as the event times corresponding to discrete state transitions are accurately observed and the local system behavior at these event times, e.g., x'(τ_k^+) in (1.7), can also be measured or calculated. In the case of SFMs, the conditions in these propositions are frequently satisfied since (i) common performance metrics such as workloads or overflow rates satisfy (1.18), and (ii) flow systems involve non-negative continuous states and are constrained by capacities that give rise to dynamics of the form \(\dot{x} = 0\).

The simplicity of the IPA derivatives was noted in our earlier papers on SHS [6], and we present its principal example here for the sake of illustration. Consider the fluid single-queue system shown in Fig. 1.1, where the arrival-rate process {α(t)} and the service-rate process {β(t)} are random processes (possibly correlated) defined on a common probability space. The queue has a finite buffer, {x(t)} denotes the buffer workload (amount of fluid in the buffer), and {γ(t)} denotes the overflow of excess fluid when the buffer is full. Let the controllable parameter θ be the buffer size, and consider the sample performance function to be the loss volume during a given horizon interval [0, T], namely
\[
L(\theta) = \int_0^T \gamma(\theta, t)\, dt. \qquad (1.19)
\]

We assume that α(t) and β(t) are independent of θ, and note that the buffer workload and overflow processes certainly depend upon θ; hence, they are denoted by {x(θ, t)} and {γ(θ, t)}, respectively.


Fig. 1.1. A fluid single-queue system.

The only other assumptions we make on the arrival process and service process are that, w.p.1, α(t) and β(t) are piecewise continuously differentiable in t (but need not be continuous), and the terms ∫_0^T α(t) dt and ∫_0^T β(t) dt have finite first moments. These certainly are very weak and general assumptions, and yet the IPA derivative has the following simple form (see [6]): with N denoting the number of lossy nonempty periods of the queue in the horizon interval [0, T],(a)
\[
\frac{dL(\theta)}{d\theta} = -N. \qquad (1.20)
\]

For example, in Fig. 1.2 there are three nonempty periods; the first and third incur loss while the second does not, hence dL(θ)/dθ = −2. Observe that the formula in (1.20) does not depend in any functional way on the details of the arrival or service rate processes, and hence it is said to be nonparametric. Furthermore, it is simple to compute, and in fact amounts to a counting process, since its evaluation consists solely of counting the number of lossy nonempty periods. These two factors conceivably render it computable in real time and hence usable in optimization-based control.

Fig. 1.2. A sample path of the fluid single-queue system.

(a) The term “nonempty period” is defined as a maximal time-interval during which the buffer is nonempty; it is the fluid equivalent of the term “busy period” that is commonly used in the context of discrete queues. A nonempty period is lossy if it incurs loss.
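To make the counting nature of (1.20) concrete, the following is a minimal simulation sketch (our own illustration, not code from [6]; all names are hypothetical and the piecewise-constant rate processes are an arbitrary choice): it integrates the buffer dynamics for a given buffer size θ and returns −N, where N counts the lossy nonempty periods observed over [0, T].

```python
import random

def ipa_loss_derivative(theta, T=100.0, dt=0.01, seed=0):
    """Simulate a finite-buffer fluid queue with buffer size theta and return
    the IPA estimate dL/dtheta = -N of Eq. (1.20), where N is the number of
    nonempty periods that incur loss over the horizon [0, T]."""
    rng = random.Random(seed)
    x = 0.0                # buffer workload
    n_lossy = 0            # count of lossy nonempty periods
    in_nonempty = False    # currently inside a nonempty period?
    lossy = False          # has the current nonempty period incurred loss?
    alpha = beta = 0.0
    t = next_switch = 0.0
    while t < T:
        if t >= next_switch:              # piecewise-constant random rates
            alpha = rng.uniform(0.0, 2.0)
            beta = rng.uniform(0.5, 1.5)
            next_switch = t + rng.expovariate(1.0)
        x_new = x + (alpha - beta) * dt
        if x_new > theta:                 # buffer full: excess fluid is lost
            x_new = theta
            lossy = True
        if x_new <= 0.0:                  # buffer empties: period ends
            x_new = 0.0
            if in_nonempty:
                n_lossy += lossy
                in_nonempty, lossy = False, False
        elif not in_nonempty:             # a new nonempty period starts
            in_nonempty = True
        x = x_new
        t += dt
    if in_nonempty and lossy:             # period still open at the horizon
        n_lossy += 1
    return -n_lossy

print(ipa_loss_derivative(theta=1.0))
```

In an actual application, the rate processes and loss events would come from an observed sample path rather than a simulation; the estimator itself is just the counter.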


In more complicated systems, such as those comprised of queueing networks with flow control, one cannot expect the IPA derivatives to have such a crisp formula as (1.20). Instead, the term dL(θ)/dθ often is computable by a recursive algorithm rather than a single formula. The IPA derivative is given by Equations (1.15) and (1.16), and the recursion is inherent in (1.8). Typically the algorithms compute a step whenever an event occurs, such as a queue becoming empty, full, or its buffer workload assuming a value that activates a control action; and the computed variables are related to the terms x'(τ_k^+) in the left-hand sides of Equations (1.7) and (1.8) and have local significance associated with the queue. If the above event triggers another, induced, event, the computed variables are passed on to the node (queue) where the latter event occurs. This dynamic structure of the algorithms underscores their perturbation propagation along temporal and spatial dimensions, a fundamental concept lying at the heart of the Perturbation Analysis technique [2].

It is desirable to have the IPA algorithms be based only on counting processes or equally simple procedures (e.g., computing the lengths of nonempty periods [6]). However, this is not the case even if the conditions for Proposition 2 are satisfied. Instead, the algorithms compute (and propagate) two kinds of terms: those derived from counting processes or equally simple procedures, and those requiring precise measurements of flow rates at event epochs. For example, in networks with threshold flow control, where the inflow rate at a queue depends on whether the buffer workload at another queue is larger or smaller than a given threshold, the terms in Equations (1.7) and (1.8) often require the instantaneous flow rates at the former queue. More generally, such flow terms may be inherent in the propagation of perturbations that are associated with induced events (see [11, 14]). Their computations can be much more time-consuming than those of the counting processes. Furthermore, in real-time applications there arises a question of how to compute flow rates, where measurement-based approaches using moving averages may be both inaccurate and time-consuming.

In summary, the application of IPA to SHS yields unbiased gradient estimators in a far larger class of systems as compared to the setting of DES. Furthermore, the algorithms and formulas for the IPA derivatives often are quite simple to compute. However, one part of them, comprised of counting or similar processes, is much simpler than their other part requiring explicit flow rate calculations. The question that we pose is whether the latter part can be ignored while guaranteeing convergence of IPA-based


optimization and control algorithms. We are confident that the answer is affirmative for a significant class of systems, since generally convergence of gradient-descent algorithms has a considerable robustness with respect to noise in the gradient computations. Specifically, in the setting of IPA, the answer is positive as long as the relative error resulting from neglecting the explicit-rate terms, with respect to the exact IPA derivative, is bounded from above by a certain given number. Once this is established for a class of realistic problems, a strong case can be made for the eventual use of IPA in optimization-based control of highly complex systems.

1.4. General Scheme for Abstracting DES to SFM

As mentioned in the introduction, one of the main motivations for SHS is to use them as abstractions of complex DES, where the dynamics are abstracted to an appropriate level that enables effective optimization and control of the underlying DES. In this section, we review a general scheme first presented in [15] to abstract a DES to a HS, under the assumption that the state of the DES is represented by integers; this is typically the case for queueing systems, hence the abstracted SHS is usually a SFM.

Consider a DES modeled by an automaton G = {X, E, f, Γ}, where X ⊆ R^n is the set of states, E is the (finite) set of events associated with G, f : X × E → X is the transition function, i.e., f(x, e) = y means that there is a transition caused by event e from state x to state y, and Γ : X → 2^E is the active event function, i.e., Γ(x) is the set of all events e for which f(x, e) is defined, referred to as the “active event set” of G at x (see also [1]). In addition, for any x ∈ X, e ∈ E, we define a function h_{x,e} : X → X, such that h_{x,e}(x) = f(x, e). In what follows, we present a general scheme to abstract G to a hybrid automaton modeling a HS. There are two steps in the scheme: the first step is to partition the states in X into a number of discrete aggregate states; in the second step, the discrete transitions within each aggregate state are abstracted into continuous (time-driven) dynamics.

Step 1: A partition divides the set of states X into non-overlapping and non-empty subsets and is described by the partition function P : X → Z, such that P(a) = P(b) if and only if states a and b are in the same subset. A partition P_1 is said to be larger than P_2 if any states in the same subset in P_2 are also in the same subset in P_1, i.e.,
\[
P_2(a) = P_2(b) \Rightarrow P_1(a) = P_1(b)
\]


We also define interior states to be states x ∈ X that have no active events causing transitions to states outside the subset they are in, i.e., a is said to be an interior state of P if for all events e ∈ Γ(a), P(f(a, e)) = P(a). States that are not interior are termed boundary states. The partitions that we are interested in for the purpose of abstracting G should satisfy the following two criteria:

Criterion 1: For two states a, b ∈ X, if P(a) = P(b), then the following holds:
\[
\Gamma(a) = \Gamma(b) \qquad (1.21)
\]
Criterion 2: For any two states a, b ∈ X, if P(a) = P(b), then for any event e ∈ Γ(a) ∩ Γ(b) such that P(f(a, e)) = P(f(b, e)) = P(a), the following holds:
\[
h_{a,e} = h_{b,e} \qquad (1.22)
\]

where h_{a,e}, h_{b,e} are two functions in the state space as defined above, i.e., they satisfy h_{a,e}(a) = f(a, e) and h_{b,e}(b) = f(b, e). Condition (1.22) states an equivalence relation between two functions such that for all x ∈ X, h_{a,e}(x) = h_{b,e}(x) holds. For example, h_{a,e}(x) = h_{b,e}(x) = x + 1; or h_{a,e}(x), h_{b,e}(x) are constant functions such as h_{a,e}(x) = h_{b,e}(x) = 0. The first criterion indicates that all states x ∈ X in the same aggregate state (subset) must have the same active event set. The second criterion simply states that an active event of an aggregate state will have the same effect for all the interior states of that aggregate state. The partition we are seeking is the largest one satisfying the above two criteria.

Step 2: After the state partition is carried out, the next step is to abstract discrete transitions within each of the aggregate states to some form of continuous dynamics. This is achieved by analyzing the effects of all active events in changing the values of the interior states. For example, for an interior state x = (x_1, . . ., x_n) ∈ R^n in a certain aggregate state, an active event e_1 at x that increases the value of x_i can be abstracted as a continuous inflow with a rate α_i(t), while an active event e_2 that decreases x_i can be abstracted as a continuous outflow with a rate β_i(t), where x_i is viewed as a “flow content” in the system. Doing so for all events in the active set gives flow-like continuous dynamics for all elements of state x, such that
\[
\frac{dx_i}{dt} = \alpha_i(t) - \beta_i(t)
\]
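A small sketch may help fix ideas for Step 1 (our own illustration, not code from [15]; names such as abstract_partition and delta are hypothetical). It handles the common case where events act on integer states by fixed increments, so Criterion 2 can be approximated by comparing each event’s increment; states are grouped by their active event set (Criterion 1) together with these increments:

```python
def abstract_partition(states, events, delta):
    """Group integer states into aggregate states by a signature consisting of
    (i) the active event set (Criterion 1) and (ii) the increment each active
    event applies to the state (a simplified, slightly stricter stand-in for
    Criterion 2). delta(x, e) returns f(x, e), or None if e is not active at x."""
    groups = {}
    for x in states:
        active = tuple(sorted(e for e in events if delta(x, e) is not None))
        effects = tuple((e, delta(x, e) - x) for e in active)
        groups.setdefault((active, effects), []).append(x)
    return list(groups.values())

# G/G/1/K example: 'a' = arrival (blocked at x = K), 'd' = departure.
K = 5
def delta(x, e):
    if e == 'a':
        return min(x + 1, K)                # arrival blocked when queue is full
    if e == 'd':
        return x - 1 if x > 0 else None     # no departure from an empty queue
    return None

print(abstract_partition(range(K + 1), ['a', 'd'], delta))
# -> [[0], [1, 2, 3, 4], [5]], i.e., the three aggregate states derived below
```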


Finally, we get the HS abstraction where the event-driven part is represented by the aggregate states obtained in Step 1 and events that cause transitions among them, and the time-driven part is represented by the continuous system evolution within each of the aggregate states obtained through Step 2. In addition, the stochastic characteristics of the original DES remain in the abstraction model, so that events between aggregate states may occur randomly, and the flow rates in the continuous dynamics within each aggregate state may be stochastic processes. Note that the abstraction process aims at obtaining the structure of a hybrid automaton and not the explicit values of the flow processes involved. As previously mentioned, we limit ourselves to DES with integer-valued state variables, hence the abstracted continuous dynamics normally describe flows, leading to the class of SFMs as the abstracted SHS. For more general DES, the abstraction process is more complicated and may generate more general SHS other than SFMs.

As an example, we apply the proposed scheme to G/G/1/K systems with a First Come First Serve (FCFS) serving policy, and obtain the corresponding SFMs, which can be seen to be the same as those obtained in prior work without this systematic framework. Let x ∈ {0, 1, . . . , K} be the state of the system, representing the number of jobs in the queue. The queue has a capacity K, so that when x = K further incoming jobs will be blocked. There are two types of events in this system, i.e., E = {a, d}, where a stands for acceptance of a job arrival, and d represents a departure of a job after being processed. Note that when event a occurs at x = K, this arrival is blocked because the queue is full. The automaton model of this system is shown in Fig. 1.3. Applying the general scheme described above, we divide the state space of the G/G/1/K system into the following three aggregate states:

Aggregate State 1: This aggregate state includes only x = 0, and the active event set is obviously Γ(0) = {a}.

Fig. 1.3. Automaton model of the G/G/1/K system.


Aggregate State 2: x belongs to this aggregate state if 0 < x < K, and the active event set is {a, d}, since there can be both job arrivals and departures, and all arrivals will be accepted given that the queue has not reached its capacity. It is also easy to check that criterion (1.22) is satisfied, as event a increases all x by 1, while event d decreases all x by 1.

Aggregate State 3: This aggregate state includes only x = K, and the active event set is Γ(K) = {a, d}. However, unlike the previous aggregate state, arrivals at this state will be blocked because the queue is full.

Next, we abstract the transitions within the above three aggregate states to appropriate continuous dynamics. First, for aggregate states 1 and 3, since they are both singletons with no discrete transitions within, the corresponding continuous dynamics are obviously \(\dot{x} = 0\). Within aggregate state 2, the active discrete transitions are job arrivals and job departures, which can both be abstracted as continuous flows, thus giving the continuous dynamics \(\dot{x} = \alpha(t) - \beta(t)\), where α(t) is the inflow rate that corresponds to job arrivals, and β(t) is the outflow rate that comes from job departures. It follows from the above analysis that the SHS abstraction of the G/G/1/K system is a SFM with dynamics given by
\[
\dot{x} = \begin{cases} 0 & x = 0 \text{ and } \alpha(t) \le \beta(t) \\ 0 & x = K \text{ and } \alpha(t) \ge \beta(t) \\ \alpha(t) - \beta(t) & \text{otherwise} \end{cases}
\]
consistent with what has been presented in previous related work [6]. Note that there are some conditions in the above dynamics, e.g., α(t) ≤ β(t), that do not directly follow from the two steps in the abstraction scheme. They are included simply to ensure flow conservation.

1.5. Conclusions and Future Work

We have provided an overview of a recently developed general framework for IPA in Stochastic Hybrid Systems (SHS) and established some conditions under which IPA is particularly simple and efficient. When our goal is to develop a SHS as an abstraction of a complex DES, we have presented a systematic method for generating such abstractions. The proposed IPA framework opens up a new spectrum of applications where this IPA calculus may be used to study a very large class of optimization problems, including many that can be placed in the context of stochastic non-cooperative games termed “resource contention games” [21]. In such games, multiple users compete for a sharable resource and IPA enables gradient-based


optimization for both a system-centric and a user-centric perspective. In general, the corresponding solutions do not coincide, giving rise to what is referred to as “the price of anarchy.” However, for at least one class of resource contention problems it was recently shown that these two solutions do in fact coincide [22], opening up interesting directions in exploring conditions under which this is possible.

References

[1] C. G. Cassandras and S. Lafortune, Introduction to Discrete Event Systems, 2nd Edition. Springer (2008).
[2] Y. C. Ho and X. R. Cao, Perturbation Analysis of Discrete Event Dynamic Systems. Kluwer Academic Pub. (1991).
[3] P. Glasserman, Gradient Estimation via Perturbation Analysis. Kluwer Academic Pub. (1991).
[4] C. G. Cassandras and C. G. Panayiotou, Concurrent sample path analysis of discrete event systems, Journal of Discrete Event Dynamic Systems: Theory and Applications. 9, 171–195 (1999).
[5] C. G. Cassandras and J. Lygeros, eds., Stochastic Hybrid Systems. Taylor and Francis (2006).
[6] C. G. Cassandras, Y. Wardi, B. Melamed, G. Sun, and C. G. Panayiotou, Perturbation analysis for on-line control and optimization of stochastic fluid models, IEEE Transactions on Automatic Control. 47(8), 1234–1248 (2002).
[7] D. Anick, D. Mitra, and M. M. Sondhi, Stochastic theory of a data-handling system with multiple sources, The Bell System Technical Journal. 61, 1871–1894 (1982).
[8] B. Liu, Y. Guo, J. Kurose, D. Towsley, and W. B. Gong, Fluid simulation of large scale networks: Issues and tradeoffs, in Proceedings of the Intl. Conf. on Parallel and Distributed Processing Techniques and Applications, pp. 2136–2142 (June, 1999).
[9] D. Connor, G. Feigin, and D. D. Yao, Scheduling semiconductor lines using a fluid network model, IEEE Transactions on Robotics and Automation. 10(2), 88–98 (1994).
[10] H. Yu and C. Cassandras, Perturbation analysis of feedback-controlled stochastic flow systems, IEEE Transactions on Automatic Control. 49(8), 1317–1332 (2004).
[11] Y. Wardi, R. Adams, and B. Melamed, A unified approach to infinitesimal perturbation analysis in stochastic flow models: the single-stage case, IEEE Transactions on Automatic Control. 55(1), 89–103 (2010).
[12] G. Sun, C. G. Cassandras, and C. G. Panayiotou, Perturbation analysis and optimization of stochastic flow networks, IEEE Transactions on Automatic Control. 49(12), 2113–2128 (2004).
[13] H. Yu and C. Cassandras, Perturbation analysis and feedback control of communication networks using stochastic hybrid models, Journal of Nonlinear Analysis. 65(6), 1251–1280 (2006).


[14] C. G. Cassandras, Y. Wardi, C. G. Panayiotou, and C. Yao, Perturbation analysis and optimization of stochastic hybrid systems, European Journal of Control. 16(6), 642–664 (2010).
[15] C. Yao and C. G. Cassandras, Perturbation analysis of stochastic hybrid systems and applications to resource contention games, Frontiers of Electrical and Electronic Engineering in China. 6(3), 453–467 (2011).
[16] Y. Wardi, A. Giua, and C. Seatzu, IPA for continuous stochastic marked graphs, Automatica. 49(5), 1204–1215 (2013).
[17] A. Kebarighotbi and C. G. Cassandras, A general framework for modeling and online optimization of stochastic hybrid systems, in Proceedings of 4th IFAC Conf. on the Analysis and Design of Hybrid Systems (June, 2012).
[18] H. J. Kushner and G. G. Yin, Stochastic Approximation Algorithms and Applications. Springer-Verlag (1997).
[19] C. Yao and C. G. Cassandras, Perturbation analysis and optimization of multiclass multiobjective stochastic flow models, J. of Discrete Event Dynamic Systems. 21(2), 219–256 (2011).
[20] R. Rubinstein, Monte Carlo Optimization, Simulation and Sensitivity of Queueing Networks. John Wiley and Sons (1986).
[21] C. Yao and C. G. Cassandras, Resource contention games in multiclass stochastic flow models, Nonlinear Analysis: Hybrid Systems. 5(2), 301–319 (2012).
[22] C. Yao and C. G. Cassandras, A solution to the optimal lot sizing problem as a stochastic resource contention game, IEEE Trans. on Automation Science and Engineering. 9(2), 240–264 (2012).



Chapter 2

Smoothed Perturbation Analysis: A Retrospective and Prospective Look

Michael C. Fu, University of Maryland, College Park
Weibo Gong, University of Massachusetts, Amherst
Jian-Qiang Hu, Fudan University
Shu Li, WA Regenerative Medicine Ltd

We review an extension of infinitesimal perturbation analysis (based on conditional Monte Carlo) known as smoothed perturbation analysis (SPA). We introduce the main ideas behind SPA, develop some simple examples, outline a general framework, and provide some historical perspective. The exposition will focus more on intuition and general concepts rather than mathematical rigor, in the spirit of Professor Ho’s approach to presenting ideas to non-specialists. New connections and recent applications will also be discussed.

2.1. Introduction

Smoothed perturbation analysis (SPA) is a sample path approach for gradient estimation based on conditional Monte Carlo. SPA is quite general; in principle it can be applied to any stochastic system, and it is a generalization of infinitesimal perturbation analysis (IPA). In practice, it is similar to the use of conditional Monte Carlo for variance reduction in being heavily problem-dependent in terms of both its implementation and its effectiveness. A simple example will be used to illustrate the main idea. Let T and A denote two independent random variables, and consider the objective of finding


P(T ≥ A). If both distributions are known explicitly, then one way to compute this would be to take an appropriate convolution, since the random variables are independent. However, assume that this cannot be done easily, and that the setting is such that it is easy (both in cost and effort) to take samples of the two random variables. Then the (sometimes called “crude”) Monte Carlo method would be simply to take a large sample of iid T and A, {T_i, A_i}, count the number of times the condition is satisfied (sum of indicator functions), and divide by the total number of trials, i.e.,
\[
\frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{T_i \ge A_i\}, \qquad (2.1)
\]

where N is the sample size and 1{·} is the indicator function, taking the value 1 if its argument is true and 0 otherwise. (This estimator is of course applicable even in the correlated setting.) Conditional Monte Carlo can be applied for two purposes here: variance reduction for estimating the original performance measure, and derivative estimation via SPA. In both cases, there are two main steps: deciding what to condition on, and doing appropriate conditional expectation calculations. Expressing the desired probability as an expectation of an indicator function,
\[
P(T \ge A) = E[\mathbf{1}\{T \ge A\}], \qquad (2.2)
\]
we distinguish between what we will refer to as the performance measure P(T ≥ A), always an expectation in our discussion here, and the sample performance 1{T ≥ A}, which will depend on a sample path in the more general setting. Now assume that T is a complicated random variable, whereas A is relatively simple (the example considered later is that T is the system time of a queueing system, an output random variable in a simulation model, whereas A is an interarrival time, an input to the simulation). Then by conditioning on T, we write
\[
P(T \ge A) = E[\mathbf{1}\{T \ge A\}] = E[E[\mathbf{1}\{T \ge A\} \mid T]]. \qquad (2.3)
\]
Since A is assumed independent of T, and letting the cumulative distribution function (cdf) of A be denoted by G, we then have
\[
P(T \ge A) = E[G(T)], \qquad (2.4)
\]
so that now applying the Monte Carlo method would consist of sampling T iid, computing G(T_i) for each one, and then taking the sample average, i.e.,
\[
\frac{1}{N} \sum_{i=1}^{N} G(T_i). \qquad (2.5)
\]

This estimator (2.5) can be shown to be no worse (in terms of variance) than the crude Monte Carlo estimator given by (2.1). Intuitively, it makes sense, since there is less sampling and every T_i makes a contribution.
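As a concrete numerical check of this claim, here is a small sketch (our own illustration; the distributional choices and the name compare_estimators are hypothetical). With T exponential with mean θ and A exponential with mean 1, we have G(t) = 1 − e^{−t} and the exact probability is θ/(1 + θ):

```python
import math
import random

def compare_estimators(theta=2.0, N=100_000, seed=1):
    """Compare the crude estimator (2.1) with the conditional estimator (2.5)
    for P(T >= A), where T ~ exponential(mean theta), A ~ exponential(mean 1),
    so that G(t) = 1 - exp(-t) and the exact answer is theta / (1 + theta)."""
    rng = random.Random(seed)
    crude, cond = [], []
    for _ in range(N):
        T = rng.expovariate(1.0 / theta)   # sample T
        A = rng.expovariate(1.0)           # sample A (not needed for (2.5))
        crude.append(1.0 if T >= A else 0.0)   # indicator, estimator (2.1)
        cond.append(1.0 - math.exp(-T))        # G(T), estimator (2.5)
    def mean_var(v):
        m = sum(v) / len(v)
        return m, sum((x - m) ** 2 for x in v) / len(v)
    return mean_var(crude), mean_var(cond)

print(compare_estimators())  # the conditional estimate shows smaller variance
```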


For our purposes, we are interested in the sensitivity of the performance measure P(T ≥ A) with respect to a parameter θ in the distribution of T. In other words, the objective is to find
\[
\frac{dP(T(\theta) \ge A)}{d\theta} = \frac{dE[\mathbf{1}\{T(\theta) \ge A\}]}{d\theta}. \qquad (2.6)
\]
The brute-force approach would be a finite-difference estimate, which would require perturbing θ and taking additional samples at the perturbed value(s). If θ is high-dimensional, then this could become computationally burdensome and even impractical in many settings. Furthermore, besides not providing an unbiased estimate of the desired quantity, finite difference estimates also require the choice of the size of the perturbation(s), trading off bias versus noise in the stochastic estimate. If common random numbers can be applied in an effective manner, this can reduce the noise dramatically. However, direct gradient estimation techniques avoid the need altogether for choosing a perturbation.

IPA assumes the interchange of differentiation and expectation operators in (2.6). Unfortunately, that interchange is generally invalid here, because the sample performance is an indicator function, which is a constant function with a single jump and hence would have a sample derivative of zero almost everywhere. Hence, there is a need for an alternative approach. Depending on the form of G, other possible approaches include the likelihood ratio (LR) or score function (SF) method or weak derivatives (WD) (also known as measure-valued differentiation), which differentiate the measure rather than the sample performance. SPA uses conditional Monte Carlo and then still differentiates the sample performance. Here, the conditioning used is the same as used for the variance reduction, as taken in (2.3) and leading to the expression in (2.4), which, assuming a weaker interchange of differentiation and (conditional) expectation operators now holds, can be directly differentiated to get
\[
\frac{dP(T(\theta) \ge A)}{d\theta} = \frac{dE[E[\mathbf{1}\{T(\theta) \ge A\} \mid T]]}{d\theta} = E\left[\frac{dG(T(\theta))}{d\theta}\right] = E\left[G'(T(\theta))\, \frac{dT}{d\theta}\right], \qquad (2.7)
\]
where dT/dθ is the usual IPA estimator and G′ is just the probability density function, assuming its existence. Thus, just as (2.4) is used to provide a reduced-variance conditional Monte Carlo estimate of the performance measure, (2.7) can be used to provide an SPA estimate of the derivative of the performance measure, which would not require any additional sampling above that used for (2.5), i.e.,
\[
\frac{1}{N} \sum_{i=1}^{N} G'(T_i)\, \frac{dT_i(\theta)}{d\theta}. \qquad (2.8)
\]
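Continuing the same numerical sketch (our own illustration; spa_estimate is a hypothetical name): with T(θ) = θE, E ~ exponential(1), the IPA derivative is dT/dθ = E = T/θ and G′(t) = e^{−t}, so (2.8) can be checked against the closed form dP(T(θ) ≥ A)/dθ = 1/(1 + θ)²:

```python
import math
import random

def spa_estimate(theta=2.0, N=100_000, seed=2):
    """SPA estimator (2.8) for dP(T(theta) >= A)/dtheta with T = theta * E,
    E ~ exponential(1), A ~ exponential(1); dT/dtheta = T / theta and
    G'(t) = exp(-t). Exact value: 1 / (1 + theta)**2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        T = theta * rng.expovariate(1.0)       # sample of T(theta)
        total += math.exp(-T) * (T / theta)    # G'(T_i) * dT_i/dtheta
    return total / N

print(spa_estimate(), 1.0 / (1.0 + 2.0) ** 2)  # estimate vs. exact value
```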


Note that both (2.5) and (2.8) no longer require sampling A_i, but if the A_i were also sampled (this makes more sense in the queueing example to follow), then alternative SPA estimators are the following:
\[
\frac{1}{N} \sum_{i:\, A_i > T_i} \frac{G'(T_i)}{1 - G(T_i)}\, \frac{dT_i(\theta)}{d\theta}, \qquad (2.9)
\]
\[
\frac{1}{N} \sum_{i:\, A_i \le T_i} \frac{G'(T_i)}{G(T_i)}\, \frac{dT_i(\theta)}{d\theta}. \qquad (2.10)
\]

Alternatively, for this simple example, using (2.4), which was obtained by conditioning, and viewing θ as appearing directly in G, direct differentiation (again assuming an interchange of operators) leads to a conditional IPA estimator (NOT usually what is referred to as SPA) of the form
\[
\frac{1}{N} \sum_{i=1}^{N} \frac{\partial G(T_i; \theta)}{\partial \theta}.
\]

2.2. Brief History of SPA

SPA is a direct descendant of IPA, which to much of the outside world remains the face of perturbation analysis. The basic idea behind perturbation analysis envisioned by Y.C. (Larry) Ho and his early collaborators in the 1970s [1] was that of a ‘thought experiment’ in which a perturbation is introduced into the system and its effect on the performance measure of interest tracked. The research grew out of a consulting project at Fiat Motors in Italy on an automotive engine production line, involving the allocation of buffer spaces between machines on the line. Originally, a finite perturbation was introduced, but at some point the limit was taken, leading to a virtual infinitesimal perturbation whereby no perturbation is introduced at all, with the first major work on IPA being [2] for queueing networks. Some confusion ensued when this transition took place, but soon things were sorted out into the dichotomy of the infinitesimal and the finite; cf. [3].

It was realized early on that IPA would not work in many cases — discrete sample performances, such as those involving an indicator function (as in the simple example of the previous section), being an obvious case. The other main source of discontinuity was the underlying system, and the commuting condition introduced by Glasserman [4, 5] was seminal in providing an easily checkable condition for when the system was the cause of the discontinuity. Cao [6] was the first to formalize conditions under which the interchange of expectation and differentiation required for IPA to be unbiased was valid. Conditioning was


first proposed by Zazanis and Suri for the FCFS single-server queue in estimating the second derivative. Although the paper was not published until 1994 [7], the work was actually carried out in the early 1980s. The idea was formalized in a more general framework by Gong and Ho [8], where the name SPA was coined. Glasserman and Gong [9] considered a setting of terminal rewards where SPA could be applied in the GSMP framework; see also [4]. Fu and Hu [10] extended this to general performance measures in the GSMP framework, introducing the idea of the degenerated nominal path, which is critical in calculating the effect of the event order change. In parallel to SPA, another direction to extend IPA to the finite case was given by [11] and [12].

One characteristic that distinguished PA from the measure-based approaches of LR/SF and WD is that the parameter did not need to be in the distribution (measure) for PA to be applicable. Sometimes it is possible to move the parameter(s) into the measure by a change of variable to make LR/SF or WD applicable, whereas IPA is directly implementable (albeit not necessarily unbiased) in either case — the parameter is called distributional or structural to distinguish the two cases, where the latter refers to basically any case other than the former. Thus, in inventory systems, IPA or SPA might be used for parameters in the demand distribution as well as for the inventory control parameters such as the reorder point or order quantity.

2.3. Another Example

As stated in the introduction, there are two main ingredients for applying conditional Monte Carlo:

(1) Deciding the set of random variables on which to condition.
(2) Estimating the resulting conditional expectation efficiently from the sample path(s).

In [8], the conditioning set of random variables was called the characterization, and that is the terminology that will be used here. We now consider a slightly more involved example, the first-come, first-served (FCFS) single-server queue. Let A_n denote the interarrival time between the (n − 1)st and nth customer (iid), X_n the service time of the nth customer (iid), D_n the delay of the nth customer, and T_n the system time (delay plus service time, D_n + X_n) of the nth customer. Assume that the parameter of interest θ appears in the service time distribution F(·; θ). For simplicity, let the performance measure be the probability that the ith customer has to wait, i.e., P(D_i > 0). Note that this can easily be extended to


the fraction of customers having to wait by averaging over customers. Usually, the performance measure of interest involves D_i or T_i itself, but in that case IPA works for the first derivative, whereas by selecting the probability, we know that, just as in the first example, IPA automatically fails. The FCFS single-server queue satisfies the well-known Lindley equation:
\[
D_n = (T_{n-1} - A_n)^+, \qquad (2.11)
\]

where x^+ = max(x, 0). Thus, D_n > 0 ⟺ T_{n−1} > A_n. In words, a customer has to wait in a FCFS single-server queue if he/she arrives before the customer in front leaves. The input random variables to this system are the set of interarrival and service times {A_n, X_n}. There are many possible choices for the characterization z, but by choosing perhaps the simplest, all of the random variables except for A_i, i.e., z = {A_n, X_n} \ {A_i}, the estimator becomes basically the same as in the first example, given by (2.8), (2.9), or (2.10), depending on the sample paths used. The usual estimator is given by (2.9), which corresponds to the case where the ith customer is starting a new busy period. Also, the IPA portion is given by differentiating (2.11) to get
\[
\frac{dT_n}{d\theta} = \frac{dX_n}{d\theta} + \frac{dT_{n-1}}{d\theta}\, \mathbf{1}\{T_{n-1} > A_n\}. \qquad (2.12)
\]
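A minimal sketch of how (2.11) and (2.12) run together on a sample path (our own illustration; the name lindley_ipa and the distributional choices are hypothetical, using the scale parameterization X_n = θY_n discussed later in this chapter): the same loop that generates the system times also propagates the IPA accumulators, here for the average system time, for which IPA is unbiased under mild conditions (see Section 2.4).

```python
import random

def lindley_ipa(theta=1.0, n_customers=100_000, seed=3):
    """Simulate a FCFS single-server queue via the Lindley recursion (2.11)
    with service times X_n = theta * Y_n, and propagate dT_n/dtheta via (2.12).
    Returns the average system time and its IPA derivative estimate.
    Interarrival times are exponential(rate 0.5); Y_n ~ exponential(mean 1)."""
    rng = random.Random(seed)
    T, dT = 0.0, 0.0          # previous customer's system time and derivative
    sum_T = sum_dT = 0.0
    for _ in range(n_customers):
        A = rng.expovariate(0.5)              # interarrival time A_n
        Y = rng.expovariate(1.0)
        X = theta * Y                         # service time, dX_n/dtheta = Y
        wait = T > A                          # indicator 1{T_{n-1} > A_n}
        T = max(T - A, 0.0) + X               # Lindley: T_n = (T_{n-1}-A_n)^+ + X_n
        dT = Y + (dT if wait else 0.0)        # IPA recursion (2.12)
        sum_T += T
        sum_dT += dT
    return sum_T / n_customers, sum_dT / n_customers

print(lindley_ipa())
```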

Note that if interarrival times were deterministic, then the characterization used here would not work, since in that case there is nothing random left in the system. [13] considers an alternative choice of characterization for that case.

2.4. Overview of a General SPA Framework

Infinitesimal perturbation analysis (IPA) remains the bread and butter of PA, and to many non-experts, it is the only PA that they know. It is well accepted that in terms of gradient estimation, when IPA works, it usually works the best among all competing approaches. There are of course exceptions, which are well documented in the literature. However, IPA is limited in its applicability, which is where SPA comes to the rescue. In principle, SPA can be applied to almost any problem; in practice, it may not be so easy, because it almost always requires some domain knowledge of the problem/system being analyzed. As alluded to already, this is analogous to the application of conditional Monte Carlo to variance reduction in stochastic simulation.


Intuitively, IPA works when small changes in the parameter cause small changes in the sample performance estimate. In the first example, it is clear that this cannot be the case, because the sample performance takes only two values: 0 or 1, so either the sample performance will be constant as the parameter is changed or it will jump to the other value. The second queueing example potentially contains more subtleties if, instead of the probability performance measure, a common performance measure such as delay or system time were considered, where the first derivative of the sample performance changes smoothly. For example, the IPA estimator for system time given by (2.12) is unbiased under mild conditions. Other difficulties actually ensue if the random variable is in fact discrete, but these will not be addressed here.

The main technical requirement for IPA to be applicable is uniform integrability; however, this all-but-necessary-and-sufficient condition proved to be unwieldy and unverifiable in practice, so it was quickly replaced by the dominated convergence theorem, a sufficient condition, which proved to be a reliable workhorse and the “go-to” method in proving unbiasedness for both IPA and SPA. Almost sure (a.s.) continuity often proved to be an even more convenient sufficient condition that worked well in many settings by providing the needed bound implicitly via the mean value theorem. This was refined further to almost sure Lipschitz continuity, where the bound would be explicitly assumed.

Consider the following general form for the performance measure:
\[
J(\theta) = E[L(\theta, \omega)] \qquad (2.13)
\]

where J will be called the performance measure and L will be called the sample performance. Most performance measures can be put in this framework, with quantiles a notable exception. In the GSMP framework of [14], the SPA estimator consists of an IPA contribution plus an additional conditional contribution that is the product of a probability rate and a conditional expectation difference term. Specifically, the SPA estimator for (2.13) can be written (informally) as
\[
\frac{dL}{d\theta} + \lim_{\Delta\theta \to 0} \frac{P(B(\Delta\theta))}{\Delta\theta} \cdot \lim_{\Delta\theta \to 0} \delta L(B(\Delta\theta)), \qquad (2.14)
\]

where the first term is the IPA contribution, B represents a critical change in the sample path that causes a discontinuity in the sample performance, and δL corresponds to the resulting expected change. To compute δL, two key sample paths are introduced: the degenerated nominal path (DNP) and the perturbed path (PP). The original sample path is called the nominal path (NP), consistent with early PA usage. Both DNP and PP are defined in the limit as Δθ → 0, with DNP corresponding to the same event sequence as NP and PP corresponding to


the path in which the critical change B occurs. Roughly speaking,
\[
\lim_{\Delta\theta \to 0} \delta L(B(\Delta\theta)) = E[L_{PP}] - E[L_{DNP}].
\]

Generally, the difference is estimated using common random numbers, and often it can be constructed either implicitly or explicitly from NP, the original sample path. In fact, in many cases, $E[L_{DNP}]$ can be estimated using the L from NP. We illustrate the concepts of DNP and PP for the two previous examples. For the simple two-random-variable example, the IPA contribution is zero, and $NP_i = \{T_i, A_i\}$. Assuming that T is increasing in θ and we are considering the righthand derivative, the critical change would be from $\mathbf{1}\{T \ge A\} = 0$ to 1, so that $DNP_i$ would have $T_i = A_i^-$, whereas $PP_i$ would have $T_i = A_i^+$, with the "−" and "+" indicating infinitesimally below and above, respectively. For the FCFS single-server queue, assume that the service times $\{X_i\}$ are increasing in θ and we are considering the righthand derivative. Then the critical change would be an event order change in which two busy periods coalesce. In this case, again $DNP_i$ corresponds to the case where $T_i = A_i^-$, just before the two busy periods coalesce (so the customer still does not wait), whereas $PP_i$ corresponds to the case where $T_i = A_i^+$, just after the two busy periods coalesce (so the customer waits an infinitesimal amount of time). Note that this example is simple enough for the performance measures considered that one does not have to trace the DNP and PP paths very far.
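To make the construction concrete, the following is a minimal sketch of the SPA (conditional Monte Carlo) estimator for the two-random-variable example, under the illustrative assumptions (not fixed by the text) that T is exponential with mean θ and A is uniform on (0, 1). Conditioning on A smooths the 0/1 sample performance into $L_Z = P(T \ge A \mid A) = e^{-A/\theta}$, which can then be differentiated pathwise:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n = 2.0, 200_000

# Characterization Z = {A}: condition on A and integrate T out analytically.
A = rng.uniform(0.0, 1.0, n)

# L_Z(theta) = P(T >= A | A) = exp(-A/theta), so the SPA estimate of
# d/dtheta P(T >= A) is the pathwise derivative of the conditional expectation:
spa = np.exp(-A / theta) * A / theta**2

# Naive IPA on the raw indicator 1{T >= A} is identically zero wherever it is
# differentiable, so its sample mean estimates nothing here (the IPA term is 0).
exact = 1.0 - np.exp(-1.0/theta) - np.exp(-1.0/theta)/theta  # d/dtheta of theta*(1 - e^{-1/theta})
print(f"SPA estimate: {spa.mean():.4f}   exact: {exact:.4f}")
```

As the comments note, the entire derivative comes from the conditional contribution, consistent with the zero IPA contribution stated above.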

More on Higher Derivatives for Single-server Queues

As mentioned earlier, the first use of conditioning arose when trying to extend the IPA results for estimating the second derivative of system time for the FCFS single-server queue. Using the construction just mentioned applied to (2.12) yields the SPA estimator:
\[
\frac{d^2 T_n}{d\theta^2} = \frac{d^2 X_n}{d\theta^2} + \frac{d^2 T_{n-1}}{d\theta^2}\,\mathbf{1}\{T_{n-1} > A_n\} + \frac{g(T_{n-1})}{1 - G(T_{n-1})}\left(\frac{dT_{n-1}}{d\theta}\right)^2 \mathbf{1}\{T_{n-1} < A_n\}, \tag{2.15}
\]
where G and g are the respective cdf and pdf of the interarrival times. In an alternative approach, Gong and Hu [15] applied the basic ideas of SPA to estimate all of the higher derivatives. In fact, they calculated the MacLaurin series, i.e., the higher derivatives in light traffic when the service times are equal to zero. In this case, simulation is not required and analytical formulas can be obtained. Specifically, assuming the service times are given by $X_n = \theta Y_n$, where


$\{Y_n\}$ is an iid sequence, the mean steady-state delay can be expressed as
\[
E[D(\theta)] = \sum_{i=0}^{\infty} \sum_{k=1}^{i+1} \frac{g_{*(k)}^{(i)}(0)}{(i+2)!}\, E[(Y_1 + \cdots + Y_k)^{i+2}]\,\theta^{i+2},
\]
with
\[
g_{*(k)}^{(n)}(0) = \sum_{i=0}^{n-1} g^{(i)}(0)\, g_{*(k-1)}^{(n-1-i)}(0).
\]

Later this method was extended to inventory systems [16], and the analytical derivatives were used in the Padé approximation for queueing systems in a paper [17] that received the 1997 IEEE Control Systems Society George Axelby Outstanding Paper award.

2.5. Applications

We will briefly summarize the major application areas, providing some high-level comments about where SPA was used and the critical events, along with the corresponding DNP and PP.

2.5.1. Queueing

Queueing systems were the first place where PA was applied, with the seminal papers [2] and [3] considering IPA for Jackson-like queueing networks (the second paper also considered the finite case). As alluded to earlier, for the FCFS single-server queue, the critical order change for most performance measures of interest would correspond to the coalescing of two adjacent busy periods, in which the departure ending the first busy period overtakes the arrival starting the next busy period. In this case, the DNP and PP would be the resulting sample paths formed by taking the NP and forcing these two events to occur simultaneously, with the former corresponding to the case where the event order is preserved and the latter corresponding to the case where the arrival occurs infinitesimally before the departure; i.e., technically there are still two busy periods in the DNP but only a single busy period in the PP. Note that the coalescing corresponds to the setting where a perturbation results in an "expansion" of the sample path; if the perturbation results in a "contraction," then the critical event order change would cause a splitting of busy periods, which, except under very light traffic conditions, would arise far more often on a nominal sample path. In other words, the set B in (2.14) would be larger for the latter case than for the former.
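To illustrate how a perturbation propagates through busy periods, here is a minimal sketch of IPA for the mean system time of an FCFS single-server queue, assuming (purely to obtain a checkable example, since the text fixes no distributions) Poisson arrivals and service times $X_n = \theta Y_n$ with $Y_n$ standard exponential, so the M/M/1 formula provides the analytic value. The derivative accumulates along the system-time recursion and resets at each idle period, which is exactly why the busy-period boundary events above are the critical ones:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, theta, n = 0.5, 1.0, 500_000    # arrival rate, mean service time (rho = 0.5)

A = rng.exponential(1.0/lam, n)      # interarrival times
X = theta * rng.exponential(1.0, n)  # service times X_n = theta * Y_n

T = dT = T_sum = dT_sum = 0.0
for An, Xn in zip(A, X):
    busy = T > An                          # arriving customer finds a busy period
    T = Xn + (T - An if busy else 0.0)     # Lindley-type recursion for system time
    dT = Xn/theta + (dT if busy else 0.0)  # IPA: dX_n/dtheta = Y_n = X_n/theta;
                                           # the perturbation resets when the queue idles
    T_sum += T
    dT_sum += dT

print("mean system time       :", T_sum/n)            # exact: theta/(1 - lam*theta) = 2
print("IPA estimate dE[T]/dth :", dT_sum/n)
print("M/M/1 exact derivative :", 1.0/(1.0 - lam*theta)**2)
```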


Some of the other queueing settings treated using SPA in the literature include higher derivatives, multi-server systems with unequal servers, and routing probabilities. An example of the latter is [18].

2.5.2. Inventory

Inventory control is another area in which PA has been successfully applied. In the case where there are no fixed setup costs associated with ordering, a base-stock or single-parameter ordering policy is generally optimal, and IPA can often be applied successfully; cf. [19]. In fact, a nice success story for IPA was reported in a Fortune magazine article of October 30, 2000, "New Victories in the Supply-Chain Revolution":

"Among the techniques ... used to attack this complex (supply chain inventory control) problem was ... infinitesimal perturbation analysis, for which no complete explanation is possible for the faint-hearted or mathematically disadvantaged."

When there is a fixed setup cost to place an order, a two-parameter ordering policy is necessary, with one of the parameters being the reorder point. In this case, IPA does not suffice by itself. For example, in the case of an (s, S) inventory control system, if the two parameters are taken to be s and q = S − s, it can be shown that an IPA estimator is unbiased for the sensitivity of most cost performance measures with respect to s with q kept fixed, whereas if s is kept fixed and q is varied, there is the possibility of a drastic change in the ordering pattern, so that the IPA estimator would be biased, but SPA can be applied (cf. [20]; see also [21]). Note that here the control parameters are structural as opposed to distributional. In this case, the DNP and PP would be the resulting sample paths formed by taking the NP and forcing the inventory position at the order decision point to be exactly equal to the order point. For the righthand derivative (limit as Δq → 0+), the critical event change is from placing an order to not placing an order, with the DNP corresponding to the former case and the PP corresponding to the latter case; see [14, 22–25] for numerous examples.

2.5.3. Finance

Finance is another, more recent, area where PA has found a welcome audience, one motivation being summarized by this quote from [26]:

"Whereas the prices themselves can often be observed in the market, their sensitivities cannot, so accurate calculation of sensitivities is arguably even more important than calculation of prices."


Furthermore, when he introduced Monte Carlo simulation to the finance community in the 1970s, Boyle [27] stated: “To obtain option values corresponding to different current stock prices a set of simulation trials has to be carried out for each starting stock price.”

Prior to the mid-1990s, this was the common belief in the field, as summarized in the 1993 edition of a popular textbook [28], which instructs that to estimate the so-called "Greeks" — the sensitivities with respect to certain parameters, such as the current stock price (delta), volatility (vega), and interest rate (rho), which play a central role in hedging financial derivatives — one must in fact follow the prescription above in carrying out multiple simulations with the parameter changed, i.e., using finite differences. The same book also makes another claim on the limitations of simulation [28, p. 363]:

"Monte Carlo simulation can only be used for European-style options"

PA helped to refute both of these claims, and current editions of the textbook have been revised accordingly. Especially in the setting where the underlying stochastic process has continuous sample paths, IPA, usually known as the "pathwise method" in the field, can often be applied successfully. The first to apply PA to this setting were [29] and [30]; see [26] for both IPA and LR/SF, which work well in many settings. Settings where SPA is required include the following:

• stochastic processes with jumps, such as Lévy processes (cf. [31] for IPA);
• terminal discontinuous payoff functions, e.g., digital options or the gamma (2nd derivative of a simple call or put option);
• path-dependent discontinuous payoff functions, e.g., barrier options, lookback options, and basket options;
• American-style options, where, similar to the inventory control setting, the parameters affect the decision to exercise or hold the option.

Fu and Hu [29] (see also [32]) treated the simplest American-style options, where the parameter of interest is a threshold that determines whether to exercise or hold the option. Gradient-based search (stochastic approximation) utilizing the SPA estimators was used to maximize the expected payoff by adjusting the threshold and thus price the option. Analogous to the inventory control setting, for the righthand derivative (positive perturbation in the threshold, i.e., raising it), the critical event change is from exercising the option (being above the threshold) to holding the option (being below the threshold), with the DNP corresponding to the former case and the PP corresponding to the latter case. Wu and Fu [33] extended


the work to American-Asian options with more complicated exercise boundaries, which are also parameterized and then optimized. More recent work includes the following: barrier options were treated using SPA in [34], where the critical change is from being knocked out (or knocked in) to staying alive (or vice versa); a general framework for a class of discontinuous payoffs was treated using a change of variables in [35], where the resulting estimator turns out to have both an IPA and an LR/SF component, perhaps the first such estimator arising from an actual setting (versus a constructed toy example).

2.5.4. Stochastic Activity Networks (SANs)

Another application of SPA is the setting of stochastic activity networks (SANs), which arise in many applications such as the Critical Path Method (CPM) and the Project Evaluation Review Technique (PERT). Since this application is much less known, we describe it in more detail. For large and/or complex networks, Monte Carlo simulation is the only way to estimate performance, e.g., a longest path or a shortest path. Similar to the finance setting, estimating the sensitivities of the performance measures with respect to parameters of the network can sometimes be more important than estimating the performance measure itself. In terms of PA, [36] was an early work that applied IPA for efficient sensitivity analysis of SANs. The following SPA derivations are taken from [37], which also derives other IPA, LR/SF, and WD estimators.

Consider a directed acyclic graph defined by a set of nodes $\mathcal{N}$ of integers $1, \ldots, |\mathcal{N}|$ and a set of directed arcs $\mathcal{A} \subset \{(i,j) : i, j \in \mathcal{N};\ i < |\mathcal{N}|,\ j > 1\}$, where $(i,j)$ represents an arc from node i to node j, and, without loss of generality, node 1 is taken as the source and node $|\mathcal{N}|$ as the sink (destination). Let $\mathcal{P}$ denote the set of paths from source to sink. The input random variables are the individual activity times $X_i$, with cumulative distribution function (cdf) $F_i$, $i = 1, \ldots, |\mathcal{A}|$, and corresponding probability density function (pdf) or probability mass function (pmf) $f_i$. Assume all of the activity times are independent. However, it should be clear that the durations of paths in $\mathcal{P}$ will not in general be independent, as in the following example, where all three of the path durations are dependent, since $X_6$ must be included in any path.

Example: 5-node network with $\mathcal{A} = \{(1,2), (1,3), (2,3), (2,4), (3,4), (4,5)\}$ mapped to arcs 1 through 6 as shown in Figure 2.1; $\mathcal{P} = \{(1,4,6), (1,3,5,6), (2,5,6)\}$.

Let $P^* \in \mathcal{P}$ denote the set of activities on the optimal (critical) path corresponding to the total project duration (e.g., shortest or longest path, depending on


Fig. 2.1. Stochastic Activity Network (5-node example with activity times $X_1, \ldots, X_6$).

the problem), which is the sample performance of interest here:
\[
L = \sum_{j \in P^*} X_j, \tag{2.16}
\]

where $P^*$ itself is a random variable. IPA works well for estimating the sensitivity with respect to the mean, i.e., $dE[L]/d\theta$, where θ appears in the activity time distributions, but SPA is needed for estimating the sensitivity of the tail distribution performance measure, i.e., $dP(L > x)/d\theta$ for some given $x \ge 0$. To derive the SPA estimator, we begin by defining the following:

$\mathcal{P}_j = \{P \in \mathcal{P} \mid j \in P\}$ = set of paths containing arc j,
$|P|$ = length of path P,
$|P|_{-j}$ = length of path P with $X_j = 0$.

The idea will be to condition on all activity times except a set that includes the activity times dependent on the parameter. Assume that θ occurs in the density of $X_1$, and take L to be the longest path; other forms such as the shortest path could be handled in a similar manner. Take the characterization to be everything except $X_1$, i.e., $Z = \{X_2, \ldots, X_{|\mathcal{A}|}\}$, so
\[
L_Z(\theta) = P_Z(L > x) \equiv E\big[\mathbf{1}\{L > x\} \,\big|\, X_2, \ldots, X_{|\mathcal{A}|}\big]
= \begin{cases} 1 & \text{if } \max_{P \in \mathcal{P}} |P|_{-1} > x; \\ P_Z\big(\max_{P \in \mathcal{P}_1} |P| > x\big) & \text{otherwise}; \end{cases}
\]
where $P_Z$ denotes the conditional (on Z) probability. Since
\[
P_Z\Big(\max_{P \in \mathcal{P}_1} |P| > x\Big) = P_Z\Big(X_1 + \max_{P \in \mathcal{P}_1} |P|_{-1} > x\Big) = P_Z\Big(X_1 > x - \max_{P \in \mathcal{P}_1} |P|_{-1}\Big) = F_1^c\Big(x - \max_{P \in \mathcal{P}_1} |P|_{-1};\,\theta\Big),
\]


where $F^c \equiv 1 - F$ denotes the complementary cdf,
\[
L_Z(\theta) = F_1^c\Big(x - \max_{P \in \mathcal{P}_1}|P|_{-1};\,\theta\Big)\cdot \mathbf{1}\Big\{\max_{P \in \mathcal{P}}|P|_{-1} \le x\Big\} + \mathbf{1}\Big\{\max_{P \in \mathcal{P}}|P|_{-1} > x\Big\}.
\]

Differentiation leads to the SPA estimator:
\[
\frac{dL_Z}{d\theta} = \frac{\partial F_1^c\big(x - \max_{P \in \mathcal{P}_1}|P|_{-1};\,\theta\big)}{\partial \theta}\cdot \mathbf{1}\Big\{\max_{P \in \mathcal{P}}|P|_{-1} \le x\Big\}, \tag{2.17}
\]
which applies for both continuous and discrete distributions, as the following example illustrates.

Example: For the 5-node example, $\mathcal{P} = \{146, 1356, 256\}$, $\mathcal{P}_1 = \{146, 1356\}$, $|146|_{-1} = X_4 + X_6$, $|1356|_{-1} = X_3 + X_5 + X_6$, $|256|_{-1} = X_2 + X_5 + X_6$. If $X_1$ is exponentially distributed with mean θ, $\partial F_1^c(x;\theta)/\partial\theta = e^{-x/\theta}(x/\theta^2)$, the SPA estimator is given by
\[
\exp\Big(-\frac{x - \max(X_3 + X_5, X_4) - X_6}{\theta}\Big)\,\frac{x - \max(X_3 + X_5, X_4) - X_6}{\theta^2}\cdot \mathbf{1}\{\max(X_3 + X_5, X_4, X_2 + X_5) + X_6 \le x\},
\]
whereas if $X_1$ is a Bernoulli random variable, i.e., equal to $x_{low}$ with probability θ and equal to $x_{high} > x_{low}$ otherwise, then $\partial F_1^c(x;\theta)/\partial\theta = \mathbf{1}\{x_{low} \le x < x_{high}\}$, and the SPA estimator is given by
\[
\mathbf{1}\{x_{low} \le x - \max(X_3 + X_5, X_4) - X_6 < x_{high}\}\cdot \mathbf{1}\{\max(X_3 + X_5, X_4, X_2 + X_5) + X_6 \le x\}.
\]
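The exponential case of this example is easy to simulate. The following is a minimal sketch of the SPA estimator (2.17) for the 5-node network, with the illustrative assumption (not made in the text) that arcs 2 through 6 have unit-mean exponential activity times; only $X_1$ is integrated out, per the characterization $Z = \{X_2, \ldots, X_6\}$:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, x, n = 1.0, 5.0, 200_000     # mean of X1, tail level, replications

# Sample the characterization Z = {X2,...,X6}; X1 is integrated out analytically.
X2, X3, X4, X5, X6 = rng.exponential(1.0, (5, n))

thru1 = np.maximum(X3 + X5, X4) + X6          # max over P in P_1 of |P|_{-1}
all_paths = np.maximum(thru1, X2 + X5 + X6)   # max over all P of |P|_{-1}

# SPA estimator (2.17): nonzero only when no path exceeds x without X1's help.
mask = all_paths <= x
arg = x - thru1[mask]                         # argument of dF1c/dtheta, >= 0 on mask
est = np.exp(-arg/theta) * arg / theta**2     # X1 ~ Exp(mean theta)
print("SPA estimate of dP(L > x)/dtheta:", est.sum()/n)
```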

Clearly, the estimator (2.17) was derived without loss of generality, so if θ is in the distribution of $X_i$, the SPA estimator is given by
\[
\frac{\partial F_i^c\big(x - \max_{P \in \mathcal{P}_i}|P|_{-i};\,\theta\big)}{\partial \theta}\cdot \mathbf{1}\Big\{\max_{P \in \mathcal{P}}|P|_{-i} \le x\Big\}. \tag{2.18}
\]
For the shortest path problem, simply replace "max" with "min" throughout in the estimator (2.18). [38] performed a simulation comparison of various sensitivity analysis estimators (PA, LR/SF, WD), in which the SPA estimator exhibits quite unusual behavior relative to the rest, with quite low variance.

2.5.5. Others

There have been many other application areas for SPA, including statistical process control, preventive maintenance, and traffic light signal control; see [14, 37, 39, 40] for references. We end this applications section by summarizing each of these


three briefly. In the statistical process control application [41, 42], the goal was the design of control charts when Monte Carlo simulation was used to estimate performance, either in terms of average run length or of costs associated with the process being monitored. Derivative estimation was used to select the optimal control chart parameters, e.g., the control limits. Thus, in this case, the parameter is inherently a structural parameter. In this setting, the event change is similar to a barrier option in the finance setting, where an out-of-control signal is analogous to a knock-out option that stops the process. So, for example, if one of the parameters is an upper control limit and the righthand derivative of the average run length is being considered, i.e., a positive perturbation or increase in the control limit, then the event change would be from an out-of-control signal (whereby the process was stopped) to an in-control signal (whereby the process continued). In this case, the DNP would stop at a point at which the test statistic is just above the control limit, whereas the PP would have the test statistic just below the control limit at the same point in time and continue until some other future out-of-control signal.

For preventive maintenance models, critical events involve replacing a component (or set of components) in the system. Parameters could be in the distributions of the failure and repair times, as well as in the operation of the system, e.g., a parameterized threshold policy that specifies when to replace a component [43]. In traffic light signal control, if the parameters control the signal timings, and if individual cars are considered, then a small perturbation could lead to one more or one fewer car making it through a signal cycle [44].

2.6. Random Retrospective and Prospective Concluding Remarks

The performance measures treated here were all expectations; see [45] for the use of conditional Monte Carlo for quantiles, along with references to other work on sensitivities of quantile performance measures. An alternative to SPA for discrete-event systems is to apply IPA to fluid-based models to derive estimators that can then be used in the original discrete-event system, as well as in hybrid systems. Cassandras, Wardi, and other colleagues and students have developed this approach successfully, which has led to a wide extension of the applicability of IPA. Some of this work is described in Chapter 1 of this book. Other prominent books focusing on perturbation analysis not already mentioned include [46, 47]. Introductory expository articles include [48], [39], and [21].


We conclude by mentioning two intellectual lessons one can draw from the history of PA research. First, the initial introduction of PA concepts in queueing systems involved guesswork from Ho and his students; mathematical rigor came later. Ho's great intuition guided the invention and discovery process, as he sensed that sample path sensitivity should be attainable. We still remember that when the name of infinitesimal perturbation analysis first appeared, some reviewer asked "How small is infinitesimal?", partly because the theoretical framework was not clear at the time. This to some extent reminds us of Isaac Newton's comment on why he believed the derivative, as a ratio of vanishing quantities, should exist, well before the introduction of real analysis:

"Perhaps it may be objected, that there is no ultimate proportion of evanescent quantities; because the proportion, before the quantities have vanished, is not the ultimate, and when they are vanished, is none. But by the same argument, it may be alleged, that a body arriving at a certain place, and there stopping, has no ultimate velocity: because the velocity, before the body comes to the place, is not its ultimate velocity; when it has arrived, is none. But the answer is easy; for by the ultimate velocity is meant that with which the body is moved, neither before it arrives at its last place and the motion ceases, nor after, but at the very instant it arrives; that is, that velocity with which the body arrives at its last place, and with which the motion ceases. And in like manner, by the ultimate ratio of evanescent quantities is to be understood the ratio of the quantities not before they vanish, nor afterwards, but with which they vanish. In like manner the first ratio of quantities is that with which they begin to be. And the first or last sum is that with which they begin and cease to be (or to be augmented or diminished). There is a limit which the velocity at the end of the motion may attain, but not exceed. This is the ultimate velocity. And there is the like limit in all quantities and proportions that begin and cease to be. And since such limits are certain and definite, to determine the same is a problem strictly geometrical. But whatever is geometrical we may be allowed to use in determining and demonstrating any other thing that is likewise geometrical."

Here we see that physical intuition played a crucial role in laying the foundation for arguably the greatest invention of human intelligence. This perhaps warns us not to prematurely snuff out unorthodox and innovative new ideas for lack of mathematical rigor.

A second lesson is that PA is a sample path-based method. Prior to PA, analytical modeling of queueing systems and inventory systems focused on calculating distributions or ensemble averages; sample path analysis was not in the mainstream. PA showed us that there is much to be gained by looking carefully at sample path behavior. In the history of quantum physics, this is a bit like the contrast between the partial differential equation description (such as the Schrödinger equation) and the Feynman path integral description. While they might be equivalent, the


sample path approach seems to give a more direct means of attacking problems (such as the renormalization issue). Ho, as a control theorist, naturally looked at queueing problems from the sample path point of view. That is why he and his students managed to propose an entirely new approach to an old discipline. In today's rapidly changing environment, we believe that this spirit of looking at seemingly mature and crowded fields with a new perspective is as crucial as ever.

Acknowledgements

This work was supported in part by the U.S. National Science Foundation (NSF) under Grants CMMI-0856256, EECS-0901543, EFRI-0735974, CNS-1065113, CNS-1239102, by the Air Force Office of Scientific Research (AFOSR) under Grant FA9550-10-1-0340, by the Army Research Office under Contract W911NF-08-1-0233, by the National Natural Science Foundation of China under Grants 71071040, 71028001, 70832002, 71061160506, and by the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning. Professor Ho was the doctoral advisor for all four of us, supervising our Ph.D. dissertations during the 1980s. He taught us not only about research and academics but also about life in general, covering various and sundry practical topics. Inspiring then and always, we are grateful to be able to contribute to this Festschrift in honor of his 80th birthday.

References

[1] Y. C. Ho, M. A. Eyler, and T. T. Chien, A gradient technique for general buffer storage design in a serial production line, International Journal of Production Research. 17, 557–580 (1979).
[2] Y. C. Ho and X. R. Cao, Perturbation analysis and optimization of queueing networks, Journal of Optimization Theory and Applications. 40, 559–582 (1983).
[3] Y. C. Ho, X. R. Cao, and C. G. Cassandras, Infinitesimal and finite perturbation analysis for queueing networks, Automatica. 19, 439–445 (1983).
[4] P. Glasserman, Gradient Estimation via Perturbation Analysis. Kluwer Academic Publishers, Boston, Massachusetts (1991).
[5] P. Glasserman, Derivative estimates from simulation of continuous-time Markov chains, Operations Research. 40, 292–308 (1992).
[6] X. R. Cao, Convergence of parameter sensitivity estimates in a stochastic experiment, IEEE Transactions on Automatic Control. AC-30, 845–853 (1985).
[7] M. A. Zazanis and R. Suri, Perturbation analysis of the GI/GI/1 queue, Queueing Systems. 18, 199–248 (1994).


[8] W. B. Gong and Y. C. Ho, Smoothed (conditional) perturbation analysis of discrete-event dynamic systems, IEEE Transactions on Automatic Control. AC-32, 858–867 (1987).
[9] P. Glasserman and W. B. Gong, Smoothed perturbation analysis for a class of discrete event systems, IEEE Transactions on Automatic Control. AC-35, 1218–1230 (1990).
[10] M. C. Fu and J. Q. Hu, Extensions and generalizations of smoothed perturbation analysis in a generalized semi-Markov process framework, IEEE Transactions on Automatic Control. 37(10), 1483–1500 (1992).
[11] Y. C. Ho and S. Li, Extensions of infinitesimal perturbation analysis, IEEE Transactions on Automatic Control. AC-33, 827–838 (1988).
[12] C. G. Cassandras and S. G. Strickland, On-line sensitivity analysis of Markov chains, IEEE Transactions on Automatic Control. AC-34, 76–86 (1989).
[13] M. C. Fu and J. Hu, On choosing the characterization for smoothed perturbation analysis, IEEE Transactions on Automatic Control. 36(11), 1331–1336 (1991).
[14] M. C. Fu and J. Q. Hu, Conditional Monte Carlo: Gradient Estimation and Optimization Applications. Kluwer Academic Publishers (1997).
[15] W. B. Gong and J. Q. Hu, The MacLaurin series for the GI/G/1 queue, Journal of Applied Probability. 29, 176–184 (1992).
[16] J. Q. Hu, S. Nananukul, and W. B. Gong, A new approach to (s, S) inventory problems, Journal of Applied Probability. 30, 898–912 (1994).
[17] W. B. Gong, S. Nananukul, and A. Yan, Padé approximation for stochastic discrete-event systems, IEEE Transactions on Automatic Control. 40(8), 1349–1358 (1995).
[18] P. Brémaud and W. B. Gong, Derivatives of likelihood ratios and smoothed perturbation analysis for the routing problem, ACM Transactions on Modeling and Computer Simulation. 3(2), 134–161 (1993).
[19] P. Glasserman and S. Tayur, Sensitivity analysis for base-stock levels in multi-echelon production-inventory systems, Management Science. 41, 263–281 (1995).
[20] M. C. Fu, Sample path derivatives for (s, S) inventory systems, Operations Research. 42(2), 351–364 (1994).
[21] M. C. Fu, What you should know about simulation and derivatives, Naval Research Logistics. 55(8), 723–736 (2008).
[22] S. Bashyam and M. C. Fu, Application of perturbation analysis to a class of periodic review (s, S) inventory systems, Naval Research Logistics. 41(1), 47–80 (1994).
[23] S. Bashyam and M. C. Fu, Optimization of (s, S) inventory systems with random lead times and a service level constraint, Management Science. 44(12), S243–S256 (1998).
[24] S. Bashyam, M. C. Fu, and B. K. Kaku, Application of perturbation analysis to multiproduct capacitated production-inventory control. In Proceedings of the American Control Conference, pp. 1270–1274 (1995).
[25] M. C. Fu and J. Q. Hu, (s, S) inventory systems with random lead times: Harris recurrence and its implications in sensitivity analysis, Probability in the Engineering and Informational Sciences. 8(3), 355–376 (1994).
[26] P. Glasserman, Monte Carlo Methods in Financial Engineering. Springer, New York (2004).
[27] P. P. Boyle, Options: A Monte Carlo approach, Journal of Financial Economics. 4, 323–338 (1977).


[28] J. C. Hull, Options, Futures, and Other Derivative Securities, 2nd edn. Prentice Hall (1993).
[29] M. C. Fu and J. Q. Hu, Sensitivity analysis for Monte Carlo simulation of option pricing, Probability in the Engineering and Informational Sciences. 9(3), 417–446 (1995).
[30] M. Broadie and P. Glasserman, Estimating security price derivatives using simulation, Management Science. 42(2), 269–285 (1996).
[31] M. C. Fu, Variance-gamma and Monte Carlo. In eds. M. C. Fu, R. A. Jarrow, J.-Y. Yen, and R. J. Elliott, Advances in Mathematical Finance, pp. 21–35. Birkhäuser (2007).
[32] M. C. Fu, R. Wu, G. Gürkan, and A. Y. Demir, A note on perturbation analysis estimators for American-style options, Probability in the Engineering and Informational Sciences. 14(3), 385–392 (2000).
[33] R. Wu and M. C. Fu, Optimal exercise policies and simulation-based valuation for American-Asian options, Operations Research. 51(1), 52–66 (2003).
[34] Y. Wang, M. C. Fu, and S. I. Marcus, Sensitivity analysis for barrier options. In Proceedings of the 2009 Winter Simulation Conference, pp. 1272–1282, Piscataway, NJ (2009).
[35] Y. Wang, M. C. Fu, and S. I. Marcus, A new stochastic derivative estimator for discontinuous payoff functions with application to financial derivatives, Operations Research. 60(2), 447–460 (2012).
[36] R. A. Bowman, Stochastic gradient-based time-cost tradeoffs in PERT networks using simulation, Annals of Operations Research. 53, 533–551 (1994).
[37] M. C. Fu, Sensitivity analysis in Monte Carlo simulation of stochastic activity networks. In eds. F. B. Alt, M. C. Fu, and B. L. Golden, Perspectives in Operations Research: Papers in Honor of Saul Gass' 80th Birthday, pp. 351–366. Springer (2006).
[38] C. Groër and K. Ryals, Sensitivity analysis in simulation of stochastic activity networks: A computational study. In eds. E. K. Baker, A. Joseph, A. Mehrotra, and M. A. Trick, Extending the Horizons: Advances in Computing, Optimization, and Decision Technologies, pp. 183–200. Springer (2007).
[39] M. C. Fu, Gradient estimation. In eds. S. G. Henderson and B. L. Nelson, Handbooks in Operations Research and Management Science: Simulation, chapter 19, pp. 575–616. Elsevier (2006).
[40] M. C. Fu, Perturbation analysis. In eds. S. I. Gass and M. C. Fu, Encyclopedia of Operations Research and Management Science, 3rd edn. Springer (2013).
[41] M. C. Fu and J. Q. Hu, Efficient design and sensitivity analysis of control charts using Monte Carlo simulation, Management Science. 45(3), 385–413 (1999).
[42] M. C. Fu, S. Lele, and T. Vossen, Conditional Monte Carlo gradient estimation in economic design of control limits, Production & Operations Management. 18(1), 60–77 (2009).
[43] M. C. Fu, J. Q. Hu, and L. Shi, An application of perturbation analysis to replacement problems in maintenance. In Proceedings of the 1993 Winter Simulation Conference, pp. 329–337, Piscataway, NJ (1993).
[44] W. C. Howell and M. C. Fu, Application of perturbation analysis to traffic light signal timing. In Proceedings of the 42nd IEEE Conference on Decision and Control, pp. 4837–4840 (2003).


[45] M. C. Fu, L. J. Hong, and J. Q. Hu, Conditional Monte Carlo estimation of quantile sensitivities, Management Science. 55(12), 2019–2027 (2009).
[46] X. R. Cao, Realization Probabilities: The Dynamics of Queuing Systems. Springer-Verlag, Boston, Massachusetts (1994).
[47] X. R. Cao, Stochastic Learning and Optimization: A Sensitivity-Based Approach. Springer, New York, NY (2007).
[48] R. Suri, Perturbation analysis: The state of the art and research issues explained via the G/G/1 queue, Proceedings of the IEEE. 77, 114–137 (1989).


Chapter 3

Perturbation Analysis and Variance Reduction in Monte Carlo Simulation

Tarik Borogovac, Na Sun and Pirooz Vakili
Boston University

In this chapter we present two distinct connections between Perturbation Analysis (PA) and Variance Reduction (VR) in Monte Carlo simulation. PA sensitivity estimators are random variables, and reducing their variance is of interest, for example, when PA estimators are used to drive stochastic optimization algorithms. In this case, VR may be used to obtain PA-based sensitivity estimators with smaller variance and thus potentially faster converging optimization algorithms. A less obvious connection, and one that will be discussed in more detail in this chapter, is how a PA-inspired approach can be used to define effective controls when the variance reduction technique of Control Variates (CV) is applied. Infinitesimal Perturbation Analysis (IPA) rests on two basic observations: (i) once the randomness is fixed, the sample performance is a deterministic function of the model/design/control parameter, whose derivative in many instances provides useful information about the sensitivity of the expected performance; (ii) the derivative of the sample performance for some systems can be computed efficiently via the IPA algorithm. The approach presented here for defining effective control variates is based on viewing the sample performance as a deterministic function of the parameter, as in (i), and may be thought of as an extension of the sample path approach of PA.

3.1. Introduction

Sensitivity estimation plays an important role in the analysis, optimization, and control of stochastic systems. Perturbation Analysis (PA), as an efficient sensitivity estimation method, originated in the context of performance improvement and optimization of manufacturing systems modeled as queueing networks. A key innovation was to derive algorithms for computing the derivatives of some sample performances by analyzing the generation and propagation of perturbations due to infinitesimal changes in some


design or control parameters (see, e.g., [1]). Some of the developments since then have included establishing basic conditions for the unbiasedness of IPA estimators (see, e.g., [2]), finding remedies when direct IPA estimators are not informative (see, e.g., [3]), and extending the IPA algorithm to wider classes of stochastic models (see, e.g., [4]). For our purposes in this chapter, we borrow two components from the PA approach: (i) we formulate the basic estimation problem within a larger problem of analysis, control, or optimization of a stochastic system where the dependence of the system performance on model or decision parameters is explicitly specified, and (ii) fixing the underlying randomness in the system, we view the sample performance as a deterministic function of model or decision parameters.

It is well known that while Monte Carlo (MC) simulation is flexible and widely applicable, it has a relatively slow rate of convergence, and since its inception significant effort has been devoted to improving its efficiency. The statistical approaches to efficiency improvement are referred to as Variance Reduction Techniques (VRT) or Efficiency Improvement Techniques (EIT). The most commonly used techniques, such as Control Variates, Stratification, and Importance Sampling, were introduced at the early stages of the development of the MC method (see, e.g., [5]), and subsequent developments have generally focused on adapting these methods to specific domains of application and analyzing the performance of the resulting algorithms (see, e.g., [6, 7]).

To achieve estimation efficiency, Variance Reduction Techniques generally bring some auxiliary information to bear on the estimation problem. The Control Variates technique (CV) makes this use most explicit: assume the simulation objective is to estimate E[Y], the mean of a random variable Y; the CV method relies on one or more auxiliary random variables called control variates, or controls, and utilizes information about these variables (deviations from their known means) to reduce the variance of the estimator of E[Y]. Once the source of information, i.e., the set of controls, is identified, the mechanisms for optimal information extraction and transfer are well understood and analyzed (see, e.g., [7], Section 4.1, [6], Section V.2, [8–11]).

In contrast to the above optimal information transfer problem, which has a standard solution, the problem of identifying/discovering effective controls has so far been addressed in a fairly ad hoc way. There are some guidelines and common approaches for selecting controls, but identifying effective controls has often depended on whether users can discover and exploit specific features of specific estimation problems.


The approach described in this chapter is an attempt to make the process of control variate selection more systematic. As mentioned earlier, motivated and influenced by the general philosophy of Perturbation Analysis, (i) the estimation task is taken to be a component of a more general problem in which the dependence of the system performance on model or decision parameters is explicitly specified; thus a parameterized version of the estimation problem is considered; and (ii) fixing the underlying randomness in the system, the sample performance is viewed as a deterministic function of model or decision parameters. It is then shown that a large class of deterministic function approximation methods imply very effective control variates, and that these controls can be systematically defined. It is further shown that sensitivity estimation problems can be cast within the framework of the above parameterized estimation problem, so that systematic and generic control variates can be obtained to reduce the variance of sensitivity estimators.

The rest of the chapter is organized as follows. Section 3.2 describes the approach of systematic selection of generic control variates for parameterized estimation problems. Section 3.3 extends this approach to sensitivity estimation problems. An implementation of the approach is described in Section 3.4. Concluding remarks are provided in Section 3.5.

3.2. Systematic and Generic Control Variate Selection

We begin this section with a brief overview of the Control Variates technique.

3.2.1. Control variate technique: a brief review

Assume our goal is to estimate the unknown mean of an estimation variable Y, $J = E[Y]$. Let Y be defined on the probability space $(\Omega, \mathcal{B}, P)$ and let $\{Y_1, \cdots, Y_n\}$ be an i.i.d. sample of Y. Then the standard/crude estimator of $J = E[Y]$ is the sample average
\[
\hat{J}(n) = \bar{Y}(n) = \frac{1}{n}(Y_1 + \cdots + Y_n).
\]
Under mild conditions we have the central limit theorem
\[
\sqrt{n}\,\frac{\hat{J}(n) - J}{S(n)} \Rightarrow N(0, 1),
\]
which provides the basis for constructing asymptotically valid confidence intervals for J ($S(n)$ is the sample standard deviation and $\Rightarrow$ denotes weak convergence).


Assume k controls $X_1, \cdots, X_k$, defined on the same probability space $(\Omega, \mathcal{B}, P)$, have somehow been identified. Let $X = (X_1, \cdots, X_k)$. (All vectors are assumed to be column vectors; they are written as row vectors for ease of presentation.) Assume the mean of $X_i$ is known for all $i = 1, \cdots, k$. Without loss of generality we can assume $E[X_i] = 0$, $i = 1, \cdots, k$. For any set of scalars $\beta_1, \cdots, \beta_k$ define the following controlled variable
\[
Z(\beta) = Y - (\beta_1 X_1 + \cdots + \beta_k X_k) = Y - \beta^{\top} X,
\]
where $\beta = (\beta_1, \cdots, \beta_k)$ and $\top$ denotes transpose. $Z(\beta)$ is an unbiased estimator of J for any $\beta \in \mathbb{R}^k$, i.e., $E[Z(\beta)] = J$. Assume Y and $X_i \in L^2(\Omega, \mathcal{B}, P)$ for all i, where $L^2(\Omega, \mathcal{B}, P)$ (or $L^2$ for simplicity) is the Hilbert space of random variables on $(\Omega, \mathcal{B}, P)$ with finite second moment. Furthermore, assume the covariance matrix of X, $\Sigma_X$, is nonsingular. Then there exists a variance-minimizing coefficient vector $\beta^* = \Sigma_X^{-1}\Sigma_{XY}$. Let $Z^* = Z(\beta^*)$. The minimized variance is
\[
\sigma_{Z^*}^2 = (1 - R_{XY}^2)\,\sigma_Y^2,
\]
where $R_{XY}^2$ is the squared correlation coefficient given by
\[
R_{XY}^2 = \Sigma_{XY}^{\top}\Sigma_X^{-1}\Sigma_{XY}/\sigma_Y^2.
\]
The variance reduction ratio due to using samples of the optimally controlled $Z^*$ as opposed to the uncontrolled Y, denoted by VRR, is given by
\[
VRR = \frac{\sigma_Y^2}{\sigma_{Z^*}^2} = (1 - R_{XY}^2)^{-1}.
\]
In almost all applications of the CV method, $\beta^*$ needs to be estimated, due to the fact that all or some components of $\Sigma_{XY}$ and $\Sigma_X^{-1}$ are not known in advance and need to be estimated from i.i.d. samples $(Y_1, X_1), \cdots, (Y_n, X_n)$. Let $\mathcal{S}$ denote the closed linear subspace of $L^2$ generated by $X_1, \cdots, X_k$, i.e.,
\[
\mathcal{S} = \{\beta^{\top} X;\ \beta \in \mathbb{R}^k,\ X = (X_1, \cdots, X_k)\}.
\]


Then $X^* = \beta^{*\top} X$ is the perpendicular projection of $Y - E[Y]$ onto $\mathcal{S}$, and we have the following variance decomposition:
\[
Var(Y) = Var(X^*) + Var(Y - X^*).
\]
A portion of $Var(Y)$, namely $Var(X^*)$, can be explained and removed by the control variates. Furthermore, $Z^*$ is perpendicular to $\mathcal{S}$, hence uncorrelated with every element of $\mathcal{S}$; as a result, its variance cannot be further reduced using the controls. For a Hilbert space-based exposition of the CV technique, derivations, and further results, see, e.g., [9].

Let A be a nonsingular $k \times k$ matrix and $AX$ a nonsingular transformation of X. The components of $AX$ produce a new set of controls that span the same linear subspace as that spanned by the components of X and yield the same optimal controlled estimator $Z^*$. Therefore, they form a vector of controls as effective as the original. In other words, the effectiveness of a vector of controls is a property of the linear subspace generated by its components rather than of the controls themselves. Thus, an effective vector of controls is one where the linear subspace generated by its components is close to the estimation variable Y. This notion of proximity is the basis for the controls proposed in the next section.
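Before turning to the parametrized setting, here is a minimal sketch of the mechanics above, with $\beta^*$ estimated from the same run (the usual practice, as noted above). The estimation variable $Y = e^U$ and the zero-mean polynomial controls are illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
U = rng.uniform(size=n)

Y = np.exp(U)                                  # estimate E[e^U] = e - 1
X = np.column_stack([U - 0.5, U**2 - 1.0/3.0]) # controls with known mean zero

# Estimated beta* = Sigma_X^{-1} Sigma_XY; estimating it from the same samples
# introduces a small bias that is usually ignored (or removed by sample splitting).
Sigma_X = np.cov(X, rowvar=False)
Sigma_XY = (X * (Y - Y.mean())[:, None]).mean(axis=0)
beta = np.linalg.solve(Sigma_X, Sigma_XY)

Z = Y - X @ beta                               # controlled variable Z(beta)
print("crude:", Y.mean(), "+/-", Y.std()/np.sqrt(n))
print("CV   :", Z.mean(), "+/-", Z.std()/np.sqrt(n))
print("VRR ~", Y.var()/Z.var())
```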

3.2.2. Parametrized estimation problems

As mentioned earlier, the problem setting we consider is a parametrized estimation one. This setting represents many problems of practical interest. Specifically, we assume $Y = Y(\theta)$, $\theta \in \Theta \subseteq \mathbb{R}^d$, i.e., the estimation variable Y depends on some model or decision parameter θ. Assume the above parametric family of random variables is defined on a single probability space $(\Omega, \mathcal{F}, P)$, corresponding, for example, to the use of Common Random Numbers (CRN) in all parametric simulations, i.e., $Y(\theta) : \Omega \times \Theta \longrightarrow \mathbb{R}$. Fixing $\theta \in \Theta$, $Y(\theta)$ is a random variable on $(\Omega, \mathcal{F}, P)$. On the other hand, fixing $\omega \in \Omega$, $Y(\omega, \cdot) : \Theta \longrightarrow \mathbb{R}$ is a real-valued function on Θ, and $\{Y(\theta) : \theta \in \Theta\}$ can be viewed as a random function. Let W be a random element of Ω corresponding to the probability measure P (i.e., for all $A \subseteq \Omega$, $P(W \in A) = P(A)$), and $Y(\cdot\,; \theta) : \Omega \to \mathbb{R}$ a parametric family of functions defined on Ω ($\theta \in \Theta$). Let $Y(\theta) = Y(W; \theta)$ and define $J(\theta)$ as
\[
J(\theta) = E[Y(\theta)] = E[Y(W; \theta)].
\]

Our goal is to design efficient simulation methods to address the following estimation problems (as we will discuss in what follows, this formulation includes sensitivity estimation, i.e., estimating $\partial J/\partial\theta(\theta)$):

(1) Estimate $J(\theta) = E[Y(\theta)]$ for many $\theta \in \Theta$, or
(2) Estimate $J(\theta) = E[Y(\theta)]$ for some $\theta \in \Theta$ within some time/budget constraint.

3.2.3. Deterministic function approximation and generic CV selection

In this section we describe our approach for generic CV selection. Fix $\omega \in \Omega$. Then $Y(\omega, \theta)$ is a deterministic function of θ. To simplify the discussion, assume θ is a scalar. Furthermore, assume some information about this deterministic function is available. For example, the function value and a number of its derivatives are known at a specific parameter value $\theta_0$; or, alternatively, the function values at a number of parameter values, say $\theta_1, \cdots, \theta_k$, are known; or, in the most general setting, function values and some derivatives are known at a number of parameter values. The problem of deterministic function approximation is to find a function (from a class of appropriately defined functions) that best approximates $Y(\omega, \theta)$ over a given range (relative to a defined distance between functions, or equivalently in a defined topology). For a review of some common function approximation methods see, e.g., [12].

Here, as an example, we consider the well-known Taylor series approximation based on knowing $Y(\omega, \theta)$ and a number of its derivatives at a parameter value $\theta_0$, say
\[
X_j(\omega) = \left.\frac{d^j Y(\omega, \theta)}{d\theta^j}\right|_{\theta_0}
\]
for $j = 0, \cdots, k$. Then, assuming that for each ω, $Y(\omega, \theta)$ is sufficiently smooth, the pathwise Taylor approximation to $Y(\omega, \theta)$ is given by
\[
T_k(\omega, \theta) = \sum_{j=0}^{k} \frac{(\theta - \theta_0)^j}{j!}\, X_j(\omega).
\]

For θ sufficiently close to $\theta_0$, the error $|Y(\omega, \theta) - T_k(\omega, \theta)|$


is small. To describe our control variate selection approach, several points are worth highlighting:

• $T_k(\omega, \theta)$ is a linear combination of $X_0(\omega), \cdots, X_k(\omega)$, and the coefficients are independent of ω.
• Therefore, we can define
\[
T_k(\theta) = \sum_{j=0}^{k} \frac{(\theta - \theta_0)^j}{j!}\, X_j,
\]
where $T_k(\theta)$ and $X_j$ ($j = 0, \cdots, k$) are viewed as random variables.
• If $|Y(\omega, \theta) - T_k(\omega, \theta)|$ is small for each ω, it implies that $\|Y(\theta) - T_k(\theta)\|$ is small, where $\|\cdot\|$ denotes the $L^2$ distance. In fact the $L^2$ distance is more forgiving and does not necessarily require a small error at all ω's.
• The above observations suggest that $X_0, \cdots, X_k$ are effective controls for estimating $E[Y(\theta)]$.
• $T_k(\theta)$ is in the span of $X_0, \cdots, X_k$, but it is not necessarily the closest point of this subspace to $Y(\theta)$. Once $X_0, \cdots, X_k$ are used as controls, it is the $L^2$ norm that is in force when we speak of distance, not sample path distances. As a result, the optimal coefficients are not necessarily (and most likely are not) $(\theta - \theta_0)^j/j!$ for $j = 0, \cdots, k$. The optimal choice of coefficients leads to a controlled estimator with smaller distance from $Y(\theta)$.

The above observations lead to the following conclusions:

• Path (IPA) derivatives as controls. For problems where path derivatives can be easily obtained (say, using an IPA algorithm), we can use them (and the function value itself, viewed as the derivative of zero-th order) at a parameter value, say $\theta_0$, as controls when estimating $E[Y(\theta)]$ at neighboring θ.
• Independence from deterministic approximation. Note that the optimal controlled estimator depends only on the controls, and not on the deterministic estimation method used pathwise. On the other hand, the optimal $L^2$ distance is smaller than the $L^2$ average error of any deterministic method that is a linear combination of path derivatives (as long as the coefficients are not path dependent).
• Generalization. A generalization of the argument presented above is that all deterministic function approximation methods that are linear in their input data, and where the linear coefficients of the approximations are independent of the realization of the random function being


estimated, imply sets of controls defined by their input data (see [13]). This is true for multidimensional as well as scalar θ. This observation suggests a large class of control variates that can be considered for parametric estimation. We illustrate our approach and its effectiveness via two simple examples.

Example 1. Consider the following simple example. Let
\[
\{Y(\theta) = e^{\theta U};\ \theta \in \Theta = [a, b]\}, \qquad J(\theta) = E[Y(\theta)] = \int_0^1 e^{\theta u}\, du,
\]

where $U \sim U(0, 1)$ is a uniform random variate on (0, 1) and $\theta \in [a, b]$. Two types of experiments were run. In the first, we used $Y(\theta_0)$ and its first two path derivatives at $\theta_0 = 3$ as controls. Figure 3.1(a) shows the variance reduction ratios (compared to crude MC) when these controls were used to estimate $E[Y(\theta)]$ for $\theta \in (0, 8)$. The solid line represents using optimal control variate coefficients, and the dotted line corresponds to using the coefficients prescribed by the Taylor expansion. In the second experiment, $Y(\theta_1 = 1)$, $Y(\theta_2 = 2)$, and $Y(\theta_3 = 3)$ were used as controls. Figure 3.1(b) shows the variance reduction ratios when these controls were used to estimate $E[Y(\theta)]$ for $\theta \in (0, 8)$. The solid line represents using optimal control variate coefficients, and the various dotted lines correspond to using coefficients prescribed by linear, polynomial, and trigonometric interpolations.
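The first experiment is easy to reproduce. Here is a minimal sketch using $Y(\theta_0)$ and its first two path derivatives at $\theta_0 = 3$ as controls, with their exact means obtained by integrating $u^j e^{3u}$ over (0, 1); the sample size and the target θ = 5 are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, theta = 10_000, 5.0
U = rng.uniform(size=n)

Y = np.exp(theta * U)           # estimation variable Y(theta) = e^{theta U}
e3 = np.exp(3.0)                # controls: Y and its path derivatives at theta0 = 3,
X = np.column_stack([           # each centered by its exact mean:
    np.exp(3*U)      - (e3 - 1)/3,      # E[e^{3U}]      = (e^3 - 1)/3
    U*np.exp(3*U)    - (2*e3 + 1)/9,    # E[U e^{3U}]    = (2e^3 + 1)/9
    U**2*np.exp(3*U) - (5*e3 - 2)/27])  # E[U^2 e^{3U}]  = (5e^3 - 2)/27

beta = np.linalg.solve(np.cov(X, rowvar=False),
                       (X * (Y - Y.mean())[:, None]).mean(axis=0))
Z = Y - X @ beta
print("VRR ~", Y.var()/Z.var())
print("estimate:", Z.mean(), "  exact J(theta):", (np.exp(theta) - 1)/theta)
```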

Fig. 3.1. Variance Reduction Ratios for estimation of E[Y(θ)] over θ ∈ (0, 8): (a) derivative-based controls at θ0 = 3; (b) controls Y(1), Y(2), Y(3) (legend: OCV, POLY, TRIG, LIN).


Both experiments indicate that the control variates selected are highly effective and, as can be proved, the optimal control variate coefficients dominate others prescribed by deterministic interpolations.


Example 2. Here we give results from estimating the price of a financial (Asian call) option. The price of the option is the expectation of the discounted payoff under a risk-neutral measure. We assume the underlying asset follows a geometric Brownian motion with drift and volatility parameters μ and σ. The controls are $X^1 = Y(\mu, \sigma)$, $X^2 = \frac{\partial Y}{\partial \mu}(\mu, \sigma)$, and $X^3 = \frac{\partial Y}{\partial \sigma}(\mu, \sigma)$ for $(\mu, \sigma) = (0.1, 0.25)$. These controls were used to estimate $J(\mu, \sigma)$ for $0.01 \le \mu \le 0.3$ and $0.01 \le \sigma \le 0.6$ (a wide range for these parameters in finance models). Details are omitted here; they are included in [13]. In Figure 3.2 the following controls are


Fig. 3.2. Variance Reduction Ratios for Asian call option value over a range of μ and σ. Control sets are (a) $X^1$; (b) $X^1$ and $X^2$; (c) $X^1$ and $X^3$; (d) $X^1$, $X^2$, and $X^3$. Optimal control variate coefficients are used in each case.


used: (a) only $X^1$; (b) $X^1$ and $X^2$; (c) $X^1$ and $X^3$; and (d) all three controls. Note that the logarithm (base 10) of the Variance Reduction Ratios is plotted.

3.3. Control Variates for Sensitivity Estimation

In this section we discuss the selection of effective control variates for reducing the variance of various sensitivity estimators. The first step is to cast the problem in the framework of parameterized estimation presented earlier. Once this is accomplished, the control variates proposed in that setting can be directly applied for variance reduction of sensitivity estimators. There is an additional class of control variates, worth mentioning, that can be used for sensitivity estimators. These controls, called finite difference controls, are selected based on the same principle as the others discussed earlier; however, given their role as controls for sensitivity estimators, they deserve special mention. We begin with a reformulation of sensitivity estimation as a parameterized estimation problem.

3.3.1. A parameterized estimation formulation of sensitivity estimation

We begin with a brief review of the sensitivity estimators used in Monte Carlo simulation; see, e.g., [1, 6, 7] for more details. Let $\{\Psi(\theta), \theta \in \Theta\}$ denote a family of random variables, where $\Psi(\theta)$ represents the sample performance of a stochastic system and θ is a model or decision variable. To simplify the discussion, assume θ is a scalar. Let $\alpha(\theta) = E[\Psi(\theta)]$ denote the expected system performance. Sensitivity estimation is the problem of estimating
\[
\alpha'(\theta) = \frac{d\alpha}{d\theta} = \frac{d}{d\theta} E[\Psi(\theta)].
\]
The most straightforward and longstanding sensitivity estimators are those based on the finite-difference approach. For example,
\[
\widehat{\alpha'}(\theta) = \frac{\Psi(\theta + h) - \Psi(\theta)}{h}
\]


represents the forward-difference estimator (h > 0). These are sometimes called indirect estimators, since to evaluate them we need to generate samples from a system with perturbed parameters. If sensitivities with respect to k parameters are required, we need to simulate samples at k different perturbed systems in addition to sampling at the nominal parameter. Direct sensitivity estimators, which require sampling at a single parameter only, fall into two broad categories: path (IPA) derivatives and likelihood ratio (LR) derivatives (see, e.g., [6, 7]). More recently, sensitivity estimators based on Malliavin calculus have been proposed (see, e.g., [14, 15]). Malliavin estimators are defined in continuous time; to evaluate them via Monte Carlo, they need to be discretized in time, and it has been shown that the discretized versions are closely related to combinations of pathwise and likelihood ratio derivatives [16]. We therefore focus on the two methods of pathwise and likelihood ratio derivatives; our approach extends to Malliavin-based estimators evaluated via simulation.

Following [17], let $\{P_\theta;\ \theta \in \Theta\}$ be a family of probability measures on the same measurable space $(\Omega, \mathcal{B})$. Let G be a probability measure that dominates all $P_\theta$, i.e., $P_\theta$ is absolutely continuous with respect to G for all $\theta \in \Theta$. Then $\alpha(\theta)$ can be written as
\[
\alpha(\theta) = E_\theta[\Psi(\theta)] = \int_\Omega \Psi(\theta, \omega)\, dP_\theta(\omega) = \int_\Omega \Psi(\theta, \omega)\, L(G, \theta, \omega)\, dG(\omega) = E_G[\Psi(\theta) L(G, \theta)],
\]
where $L(G, \theta, \omega) = (dP_\theta/dG)(\omega)$ is the likelihood ratio. Subject to the validity of the interchange of differentiation and integration, we have
\[
\alpha'(\theta) = E_G\Big[\frac{d\Psi(\theta)}{d\theta}\, L(G, \theta) + \Psi(\theta)\, \frac{dL(G, \theta)}{d\theta}\Big].
\]
The above identity implies the following unbiased estimator for $\alpha'(\theta)$:
\[
\widehat{\alpha'}(\theta) = \frac{d\Psi(\theta)}{d\theta}\, L(G, \theta) + \Psi(\theta)\, \frac{dL(G, \theta)}{d\theta}. \tag{3.1}
\]

In most stochastic models of practical interest either the sample performance Ψ or the probability measure P depends on θ and not both. If this is not the case, in almost all cases, the user can select an equivalent representation for which this is true.
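As a toy instance of the two pure cases about to be described, consider estimating $d\,E[X]/d\theta = 1$ for X exponential with mean θ (an illustrative choice, not from the text). Pushing θ into the sample performance gives a pathwise estimator; keeping Ψ free of θ and differentiating the density gives a likelihood ratio estimator:

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n = 2.0, 500_000

# Pathwise: represent X = -theta*ln(U), so theta lives in the sample performance;
# the path derivative is dX/dtheta = -ln(U).
U = rng.uniform(size=n)
pw = -np.log(U)

# Likelihood ratio: sample X ~ Exp(mean theta) and weight Psi = X by the score
# d ln f(X; theta)/dtheta = X/theta^2 - 1/theta, for f(x) = (1/theta) e^{-x/theta}.
X = rng.exponential(theta, n)
lr = X * (X/theta**2 - 1.0/theta)

print("PW:", pw.mean(), "  LR:", lr.mean(), "  exact: 1.0")
```

Both estimators are unbiased; note that the LR estimator typically has larger variance, one motivation for the control variates proposed below.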


If θ is only a parameter of the sample performance Ψ and $P = G$, then the above estimator is the so-called pathwise (IPA) estimator, i.e.,
\[
\widehat{\alpha'}_{PW}(\theta) = \frac{d\Psi}{d\theta}(\theta). \tag{3.2}
\]
On the other hand, if the sample performance is independent of θ and the sampling measure G is equal to $P_{\theta_0}$ for a nominal parameter $\theta_0$, then the estimator is the so-called likelihood ratio estimator
\[
\widehat{\alpha'}_{LR}(\theta_0) = \Psi \cdot \frac{d\ln(dP_\theta)}{d\theta}(\theta_0). \tag{3.3}
\]
More generally, if we fix $G = P_{\theta_0}$, then
\[
\widehat{\alpha'}_{LR}(\theta) = \Psi \cdot \frac{dL(\theta_0, \theta)}{d\theta}. \tag{3.4}
\]
Therefore, for both cases of pathwise and likelihood ratio derivatives we can write
\[
\alpha'(\theta) = J(\theta) = E[Y(\theta, \omega)], \tag{3.5}
\]
where $\{Y(\theta);\ \theta \in \Theta\}$ is an appropriately defined family of random variables defined on the same probability space $(\Omega, \mathcal{B}, P)$ for an appropriately defined probability measure P. Note that this representation is not unique. Given such a representation, we can use the approach of the previous section for control variate selection.

3.3.2. Finite difference based controls

So far we have not taken into account the fact that Y itself is a derivative of another function, say $Y = d\Phi/d\theta$. We now look specifically at what can be said when $Y = d\Phi/d\theta$ for some function Φ. Again, to simplify, assume θ is a scalar. Note that Y can be approximated by a finite difference:
\[
Y(\theta, \omega) \approx \frac{\Phi(\theta + h, \omega) - \Phi(\theta, \omega)}{h} = \frac{1}{h}\Phi(\theta + h, \omega) - \frac{1}{h}\Phi(\theta, \omega) = \beta_1 \Phi(\theta + h, \omega) + \beta_2 \Phi(\theta, \omega).
\]

This finite difference approximation satisfies the criteria we specified in the previous sections. Namely, it is linear in its input data, i.e., $\Phi(\theta + h, \omega)$ and $\Phi(\theta, \omega)$, and the linear coefficients of the approximation, i.e., $1/h$ and $-1/h$, are independent of the realization of the random function being


estimated. Therefore, this approximation implies that Φ(θ + h) and Φ(θ) can be effective control variates. This argument generalizes to estimating higher-order derivatives of Φ (see [18]). Based on the above discussion, we consider the following class of control variates in our experiments:

(FD) X_i = Φ(θ_i), i = 1, · · · , k.

3.3.3. Illustrating example

Here, we illustrate the above approach via a simple example from computational finance. The experimental setting is the same as one provided in [19]. Assume our objective is to estimate the Delta of a European call option on a dividend-paying stock that satisfies the Black-Scholes (BS) model. Delta is the sensitivity of the price of the option to the initial value of the stock. Assume the stock price process, {S_t; t ∈ [0, T]}, follows the Black-Scholes model under the risk-neutral measure. Namely,

$$\frac{dS_t}{S_t} = (r - \delta)\,dt + \sigma\,dW_t,$$

where r is the risk-free interest rate, δ is the rate of dividend payment, σ is the volatility parameter, and {W_t} is a standard Brownian motion. Let K be the strike price. In general, the parameters of interest are S₀, r, σ, K, and T, and we can take θ = (S₀, r, σ, K, T). To simplify the discussion, we limit ourselves to the initial stock price and assume θ = S₀. Let Φ denote the discounted payoff of the option, i.e.,

$$\Phi(\theta) = e^{-rT}(S_T - K)^+,$$

where S_T = S₀ · exp((r − δ − σ²/2)T + σW_T). The price of the option is given by J(θ) = E[Φ(θ)] and its Delta is

$$\frac{dJ}{d\theta}(\theta).$$

We give the pathwise and likelihood ratio estimators of Delta and a set of proposed control variates for each estimator below. For the derivation of pathwise and likelihood ratio estimators, see [19].

The pathwise estimator of delta is

$$D_{PW}(\theta) \equiv \frac{\partial\Phi}{\partial\theta}(\theta) = Y(\theta) = e^{-rT}\,\mathbf{1}\{S_T \geq K\}\,\frac{S_T}{S_0}.$$

• PL control variates. These controls are simply of the form X₁ = D_PW(S₀(1)), · · · , X_k = D_PW(S₀(k)), where S₀(i) is the ith initial stock price. These initial values are selected by the user.
• TY control variates. To define a Taylor expansion we note that the derivative of D_PW(θ) with respect to θ is zero almost everywhere and in this particular case is not informative.
• FD control variates. In this case we have X_i = Φ(θ_i), i = 1, · · · , k, where the θ_i's are again selected by the user.

The likelihood ratio estimator of delta is

$$D_{LR}(\theta) = Y(\theta) = \Phi(\theta) \cdot \frac{\ln(S_T/S_0) - (r - \delta - \sigma^2/2)T}{S_0\,\sigma^2\,T}.$$

• PL control variates. The controls are of the form X₁ = Y(S₀(1)), · · · , X_k = Y(S₀(k)), where S₀(i) is the ith initial stock price. These initial values are selected by the user.
• TY control variates. The controls are X₁ = D_LR(θ₀), X₂ = (d/dθ)D_LR(θ₀), etc.
• FD control variates. In this case we need to find Υ(θ) such that dΥ(θ)/dθ = Y(θ). It can be verified that

$$\Upsilon(\theta) = \Phi(\theta_0)\,\frac{g(x,\theta)}{g(x,\theta_0)},$$

where g(x, θ) is the density of the terminal stock price S_T when the initial stock price is θ. Then, we have X_i = Υ(θ_i), i = 1, · · · , k, where the θ_i's are again selected by the user.

The experimental results based on the above controls (see [20, 21]) compare favorably with the results reported in [19].
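To make the two estimators concrete, the following is a minimal simulation sketch of ours (not the experimental code of [19]–[21]; the parameter values are arbitrary) that computes both Delta estimates under the Black-Scholes dynamics above:

```python
import numpy as np

def simulate_deltas(s0, k, r, delta, sigma, t, n_paths, seed=0):
    """Monte Carlo pathwise (IPA) and likelihood ratio estimates of the
    Black-Scholes delta of a European call on a dividend-paying stock."""
    rng = np.random.default_rng(seed)
    w_t = np.sqrt(t) * rng.standard_normal(n_paths)          # samples of W_T
    s_t = s0 * np.exp((r - delta - 0.5 * sigma**2) * t + sigma * w_t)
    disc_payoff = np.exp(-r * t) * np.maximum(s_t - k, 0.0)  # Phi(theta)

    # Pathwise estimator: e^{-rT} 1{S_T >= K} S_T / S_0
    d_pw = np.exp(-r * t) * (s_t >= k) * s_t / s0

    # Likelihood ratio estimator:
    # Phi * (ln(S_T/S_0) - (r - delta - sigma^2/2) T) / (S_0 sigma^2 T)
    score = (np.log(s_t / s0) - (r - delta - 0.5 * sigma**2) * t) / (s0 * sigma**2 * t)
    d_lr = disc_payoff * score

    return d_pw.mean(), d_lr.mean()

pw, lr = simulate_deltas(s0=100.0, k=100.0, r=0.05, delta=0.02,
                         sigma=0.2, t=1.0, n_paths=200_000)
print(f"pathwise delta ~ {pw:.4f}, likelihood-ratio delta ~ {lr:.4f}")
```

Both estimators target the same Delta; they differ in variance, which is exactly what well-chosen control variates can reduce.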

3.4. Database Monte Carlo (DBMC) Implementation

As stated earlier, the means of the control variates proposed in the previous section can rarely be computed exactly. They need to be estimated. In this section, we give a brief description of the DBMC approach and comment on an alternative implementation. See [22] and [13] for more details.

Assume our objective is to estimate J(θ) = E[Y(θ)] for some θ ∈ Θ. We draw N i.i.d. samples from Ω according to probability measure P. The size of N is user defined. Assume that N is "very large," taken to be significantly larger than the number of samples generally used for estimating J(θ). Let Ω_DB = {ω₁, · · · , ω_N} denote the samples generated. In what follows, we assume Ω_DB is fixed. We refer to Ω_DB as the database. Let P̃ denote the empirical measure associated with samples ω₁, · · · , ω_N. We consider solving an approximate problem defined as estimating

$$\tilde{J}(\theta) = E_{\tilde{P}}[Y(\theta)] = \tilde{E}[Y(\theta)].$$

In other words, J̃(θ) is the expectation of Y(θ) with respect to the empirical measure P̃. Note that the empirical measure P̃ is simply the uniform measure on Ω_DB = {ω₁, · · · , ω_N}, assuming that if identical samples are generated, they are kept as separate elements. Therefore,

$$\tilde{J}(\theta) = \frac{1}{N}\sum_{j=1}^{N} Y(\theta, \omega_j).$$

In other words, J̃(θ) is simply the average of N i.i.d. samples of Y(θ, ω_j). Given our assumption of very large N, and under some regularity assumptions on Y(θ, ω), we can expect J̃(θ) to be a very good approximation of J(θ) with high probability.

Let X₁, · · · , X_k be a set of control variates as proposed in the previous section. We can directly calculate the means of these controls with respect to the empirical measure P̃ as

$$\tilde{\mu}_i = \frac{1}{N}\sum_{j=1}^{N} X_i(\omega_j).$$

We will not evaluate J̃(θ) exactly for other values of θ. Rather, we use the classical control variate technique to find an estimate of it. Figure 3.3 gives the outline of this implementation.

(1) For j = 1, · · · , n:
    (a) Generate a sample ω_j uniformly from the database.
    (b) Evaluate Y(θ, ω_j) and X_i(θ, ω_j), i = 1, · · · , k.
(2) The controlled estimator for J̃(θ) is Z(θ), defined as

$$Z(\theta) = \bar{Y}(\theta) + \sum_{i=1}^{k} \beta_i^*\,(\bar{X}_i - \tilde{\mu}_i).$$

Fig. 3.3. Implementation of DBMC.

Ȳ(θ) and X̄_i are the sample averages of the above n samples, and β_i* is the estimate of the optimal coefficient based on these samples. An alternative implementation is the following. Once the means of the controls are computed, we discard the samples ω₁, · · · , ω_N, and in step (a) of Figure 3.3, we generate new samples from Ω according to the original probability measure P. The next steps follow as in Figure 3.3. This is the method of Estimated Control Variates [23]. Both implementations produce conditionally biased estimators of J(θ). The bias can be reduced by increasing N. The statistical properties of the estimators resulting from the two implementations prove to be quite similar (see, e.g., [13, 23]). Both implementations incur a setup cost, as should be clear from the above description. The cost can be justified if the control variates are used for estimation of J(θ) for a large number of θ ∈ Θ, for example.
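As an illustration of the procedure in Figure 3.3, here is a minimal sketch of ours (function names are hypothetical; it assumes scalar ω's stored in a NumPy array, with user-supplied performance function y and generic controls):

```python
import numpy as np

def dbmc_estimate(y, controls, database, theta, n, rng):
    """One DBMC controlled estimate of J~(theta), following Figure 3.3."""
    # Control means mu~_i, computed exactly under the empirical measure.
    mu = np.array([np.mean([x(w) for w in database]) for x in controls])

    # Step (1): draw n samples uniformly from the database.
    batch = rng.choice(database, size=n, replace=True)
    ys = np.array([y(theta, w) for w in batch])
    xs = np.array([[x(w) for x in controls] for w in batch])

    # Step (2): estimate the optimal coefficients by least squares and
    # form the controlled estimator Z(theta).
    beta, *_ = np.linalg.lstsq(xs - xs.mean(0), ys - ys.mean(), rcond=None)
    return ys.mean() + beta @ (mu - xs.mean(0))

rng = np.random.default_rng(0)
database = rng.standard_normal(20_000)           # the stored omega samples
y = lambda theta, w: np.exp(theta * w)           # illustrative performance
controls = [lambda w: w, lambda w: w * w]        # illustrative controls
print(dbmc_estimate(y, controls, database, theta=0.3, n=500, rng=rng))
```

The setup cost is the one-time pass over the database to compute the μ̃_i; each subsequent θ re-uses those means, which is the point of the method.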

We show that in this context a large class of deterministic function approximation methods implies very effective control variates. Two directions of further research are worth pursuing: (i) we have so far used a sample function approximation approach to define the proposed control variates; given the Hilbert space formulation of the control variate technique, it seems possible, and desirable, to appeal directly to L² function approximation methods to define the control variates. This is a subject of our current research. (ii) We have argued that an appropriate setting to use the proposed control variates is when the parameterized estimation problem is a component of a larger problem, say performance optimization. The proposed approach can be viewed as computational learning: by sampling at a number of parameter values and estimating the performance and/or performance sensitivities, for example, we hope to learn about the parameter values that lead to better performance, or, more ambitiously, the optimal performance. How to best select control variates to assist with this optimization goal, i.e., the appropriate experimental design in this setting, is an open question.

Acknowledgements

Research inspired and influenced in good part by the generous guidance and mentorship of Larry Ho and the vibrant community of Perturbation Analysis researchers he has spawned. Thank you & Happy Birthday Larry!

References

[1] Y. Ho and X.-R. Cao, Perturbation Analysis of Discrete Event Systems. Kluwer Academic Publishers (1991).
[2] P. Glasserman, Gradient Estimation via Perturbation Analysis. Kluwer Academic Publishers (1991).
[3] M. Fu and J. Hu, Conditional Monte Carlo: Gradient Estimation and Optimization Applications. Kluwer Academic Publishers (1997).
[4] C. Cassandras, Y. Wardi, B. Melamed, G. Sun, and C. G. Panayiotou, Perturbation analysis for on-line control and optimization of stochastic fluid models, IEEE Trans. on Automatic Control. 47(3), 1234–1248 (2002).
[5] J. M. Hammersley and D. C. Handscomb, Monte Carlo Methods. John Wiley (1964).
[6] S. Asmussen and P. Glynn, Stochastic Simulation: Algorithms and Analysis. Springer (2007).
[7] P. Glasserman, Monte Carlo Methods in Financial Engineering. Springer-Verlag (2004).

[8] B. L. Nelson, Control variate remedies, Operations Research. 38, 974–992 (1990).
[9] R. Szechtman, A Hilbert space approach to variance reduction. In eds. S. G. Henderson and B. L. Nelson, Handbook in OR and MS, vol. 13, chapter 10, pp. 259–289. Elsevier B.V. (2006).
[10] P. Glynn and R. Szechtman, Some new perspectives on the method of control variates. In eds. K. Fang, F. Hickernell, and H. Niederreiter, Monte Carlo and Quasi-Monte Carlo Methods 2000, pp. 27–49. Springer-Verlag (2002).
[11] S. S. Lavenberg and P. D. Welch, A perspective on the use of control variables to increase the efficiency of Monte Carlo simulations, Management Science. 27(3), 322–335 (1981).
[12] K. Atkinson and W. Han, Theoretical Numerical Analysis: A Functional Analysis Framework. Texts in Applied Mathematics. Springer (2007).
[13] T. Borogovac and P. Vakili, Database Monte Carlo (DBMC) and generic control variates for parametric estimation. Technical report, Boston University College of Engineering (2009).
[14] E. Fournié, J. Lasry, and J. Lebuchoux, Applications of Malliavin calculus to Monte Carlo methods in finance, Finance and Stochastics. 3, 391–412 (1999).
[15] E. Fournié, J. Lasry, J. Lebuchoux, and P. Lions, Applications of Malliavin calculus to Monte Carlo methods in finance, II, Finance and Stochastics. 5, 201–236 (2001).
[16] N. Chen and P. Glasserman, Malliavin Greeks without Malliavin calculus, Stochastic Processes and Their Applications. 117, 1689–1723 (2007).
[17] P. L'Ecuyer, A unified view of the IPA, SF, and LR gradient estimation techniques, Management Science. 36, 1364–1383 (1990).
[18] T. Borogovac, Constructive and Generic Control Variates for Monte Carlo Estimation. PhD thesis, Boston University (May, 2009).
[19] M. Broadie and P. Glasserman, Estimating security price derivatives using simulation, Management Science. 42, 269–285 (1996).
[20] T. Borogovac, N. Sun, and P. Vakili, Control variates for sensitivity estimation. In Proceedings of the 2010 Winter Simulation Conference, pp. 2624–2641 (2010).
[21] N. Sun, Control Variate Approach for Multi-user Estimation via Monte Carlo Simulation. PhD thesis, Boston University (January, 2013).
[22] T. Borogovac and P. Vakili, Control variate technique: a constructive approach. In eds. S. J. Mason, R. R. Hill, L. Moench, and O. Rose, Proceedings of the 2008 Winter Simulation Conference, pp. 320–327 (2008).
[23] R. Pasupathy, B. W. Schmeiser, M. R. Taaffe, and J. Wang, Control-variate estimation using estimated control means, IIE Transactions. 44(5), 381–385 (2012).

Chapter 4

Adjoints and Averaging

Paul Glasserman
Columbia Business School, Columbia University

This chapter discusses two techniques in derivative estimation, each of which combines old and new ideas. The first is the use of adjoint variables in calculating sample path derivatives. Compared with a standard forward IPA calculation, the adjoint method can produce substantial speed-up in computing the derivatives of a small number of performance measures with respect to a large number of parameters. The second method uses an average of multiple combinations of IPA and likelihood ratio method estimators, each combination switching from one method to the other at a fixed time along a sample path. The average is shown to inherit attractive features of both methods in a continuous-time limit.

4.1. Introduction

In the 30 years since Larry and his earliest collaborators launched the field of derivative estimation, the topic has continued to spawn a wealth of ideas and to generate enormous practical and theoretical interest. Indeed, the number of methods has grown too extensive to cover in one place. The scope of applications has also widened over the years, going beyond the topic's roots in discrete-event systems. In the financial industry, for example, the accurate estimation of sensitivities for hedging purposes is among the most important and most demanding uses of computing resources, and it continues to motivate new work. The lasting theoretical and practical importance of the topic is a tribute to Larry's insight in creating the field and his perseverance in advancing it. My objective in this chapter is to highlight two developments in derivative estimation that have benefited from ideas in related areas and that can also be usefully viewed from the perspective of IPA. The first is the use of adjoint methods, and the second is an averaging of multiple

combinations of IPA and likelihood ratio method (LRM) estimators that proves exceptionally effective in a particular application. The adjoint method is an algorithmic technique that can produce significant speed-ups in calculating sample-path derivatives. The averaging technique reveals a connection between the "classical" methods of IPA and LRM and a more recent approach using Malliavin calculus. Both of the tools discussed in this chapter have proved useful in financial applications and may warrant further investigation in applications to other domains.

4.2. Adjoints: Classical Setting

Adjoint variables feature prominently in optimal control theory, so I will briefly describe how they are introduced in Larry's classic book. Bryson and Ho (1975) study linear systems governed by differential equations of the form

$$\dot{x}(t) = F(t)x(t) + G(t)u(t),$$

where F and G are time-varying matrices of dimensions compatible with the state x and the control u. They introduce the fundamental matrix Φ(t, t₀), which satisfies

$$\frac{d}{dt}\Phi(t, t_0) = F(t)\Phi(t, t_0), \tag{4.1}$$

and which maps the initial state x(t₀) and the control function u(s), t₀ ≤ s ≤ t, to the terminal state x(t). In the special case u ≡ 0, we have x(t) = Φ(t, t₀)x(t₀), so that Φ is the matrix of sensitivities of terminal values to initial values. Equation (4.1) takes Φ(t, t₀) as a function of terminal time t with initial time t₀ fixed. As discussed in Bryson and Ho (1975), there are computational advantages to working with the adjoint equation,

$$\frac{d}{d\tau}\Phi^\top(t, \tau) = -F^\top(\tau)\,\Phi^\top(t, \tau), \qquad \Phi(t, t) = I, \tag{4.2}$$

which fixes t and varies the initial time. It will be useful to keep the reversal in mind, as somewhat analogous ideas have proved computationally valuable in calculating sample path derivatives.

4.3. Adjoints: Waiting Times

Adjoint differentiation is an algorithmic method for accelerating the calculation of derivatives. It belongs, more generally, to the field of automatic

differentiation, which seeks to automate the differentiation of a function defined implicitly through software; see Giles and Pierce (2000) and Griewank (2000). Automatic differentiation views a computer program as the composition of a possibly very large number of simple operations: arithmetic operations, comparisons of two arguments, and so on. Each of these simple operations is easy to differentiate; applying the chain rule through thousands of compositions of such operations is tedious, but the logic for doing so can itself be implemented in a computer program to automate the calculation of derivatives. The steps in this approach are familiar to anyone who has implemented an IPA algorithm. Consider for example the Lindley recursion

$$W_{n+1}(\theta) = \max\{W_n(\theta) + S_n(\theta) - A_{n+1}(\theta),\, 0\}, \tag{4.3}$$

for the waiting time of the nth job in a queue, where S_n is the service time for the nth job, A_{n+1} is the time between arrivals of the nth and (n + 1)st jobs, and θ is a parameter of these inputs. To differentiate both sides, we use the rules (x + y)′ = x′ + y′ and

$$(\max(x, y))' = \begin{cases} x' & \text{if } x > y; \\ y' & \text{if } x < y, \end{cases}$$

to get

$$W'_{n+1}(\theta) = \begin{cases} W'_n(\theta) + S'_n(\theta) - A'_{n+1}(\theta), & \text{if } W_{n+1} > 0; \\ 0, & \text{otherwise.} \end{cases} \tag{4.4}$$

This is a forward calculation. At each step, it records the sensitivity of the current state to the input θ. An adjoint calculation works backwards, after the full path has been generated. And, in contrast to a forward variable, an adjoint variable records the sensitivity of the final output to the current variable. This is analogous to the reversal of the roles of the time indices in (4.2). For example, suppose the final output of interest is W_N, the waiting time of the Nth job, with N fixed. We can write

$$W_N = \sum_{i=N-k}^{N-1} (S_i - A_{i+1}),$$

where the (N − k)th job is the one that starts the busy period containing the Nth job. Clearly,

$$\frac{\partial W_N}{\partial S_i} = \begin{cases} 1, & \text{if } i \text{ and } N \text{ are in the same busy period, } i < N; \\ 0, & \text{otherwise.} \end{cases}$$

This is an adjoint variable because it records the influence of S_i on the final output W_N. (Indeed, Bryson and Ho (1975) also use the term influence variables for adjoint variables.) Of course, we cannot calculate this variable solely through a forward pass; we need to generate and store the full path to W_N and then work backwards to the beginning of the last busy period. Once we have done so, we can evaluate the derivative of W_N as

$$W'_N(\theta) = \sum_{i=N-k}^{N-1}\left(\frac{\partial W_N}{\partial S_i}\,S'_i(\theta) + \frac{\partial W_N}{\partial A_{i+1}}\,A'_{i+1}(\theta)\right). \tag{4.5}$$

The logic underlying this simple rearrangement is as old as IPA itself. It is implicit in Ho, Eyler, and Chien (1983) and in the tableau presentation in Ho and Cao (1983), and it underlies the analysis in Suri and Zazanis (1988). Ho and Cao (1983) describe the propagation of perturbations through a network of queues, which defines a forward differentiation algorithm. They also show that each perturbation is eventually realized or lost, which we may interpret to mean that a corresponding adjoint variable is either 1 or 0. The value of the adjoint variable is not known at the time the perturbation is generated and is determined only later. Equation (4.5) can also be interpreted through the idea of a critical timing path discussed in Ho (1987), where an analogy with the transition matrix Φ(t, t₀) is made explicit.

So far, the adjoint perspective does not offer an obvious computational advantage over the usual forward calculation. But suppose that instead of a single parameter θ we had multiple parameters with respect to which we wanted sensitivities. We might have two parameters each for the service time distribution and the interarrival time distribution. A straightforward extension of the algorithm in (4.4) would require keeping track of a separate accumulator W′_n for each parameter, essentially multiplying the number of derivative recursions by the number of parameters. In contrast, once we have calculated the adjoint variables in (4.5), we can re-use the same adjoint variables with all parameters: we simply replace the values of S′_i and A′_{i+1}, depending on the parameter.
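As a small illustration (a minimal sketch of ours, not code from the chapter), the following compares the forward recursion (4.4) with a backward pass that locates the start of the busy period containing the last job and applies (4.5); here θ is taken to scale the service times, so S′_i = S_i/θ and A′_{i+1} = 0:

```python
import numpy as np

def forward_ipa(S, dS, A, dA):
    """Forward IPA for the Lindley recursion: accumulate W' via (4.4)."""
    W, dW = 0.0, 0.0
    for n in range(len(S)):
        W_next = max(W + S[n] - A[n], 0.0)
        dW = dW + dS[n] - dA[n] if W_next > 0 else 0.0
        W = W_next
    return dW

def adjoint_ipa(S, dS, A, dA):
    """Adjoint pass: store the path forward, walk back to the start of the
    busy period containing the last job, then apply (4.5); the adjoint
    variables equal 1 inside that busy period and 0 elsewhere."""
    n = len(S)
    W = [0.0] * (n + 1)
    for i in range(n):                     # forward phase (no derivatives)
        W[i + 1] = max(W[i] + S[i] - A[i], 0.0)
    if W[n] == 0.0:
        return 0.0
    m = n - 1
    while m > 0 and W[m] > 0.0:            # backward phase
        m -= 1
    return sum(dS[i] - dA[i] for i in range(m, n))

rng = np.random.default_rng(1)
S = rng.exponential(0.9, size=1000); A = rng.exponential(1.0, size=1000)
theta = 0.9
dS, dA = S / theta, np.zeros_like(A)       # theta scales the service times
assert np.isclose(forward_ipa(S, dS, A, dA), adjoint_ipa(S, dS, A, dA))
```

With several parameters, the backward pass is done once and only the input derivatives dS, dA change per parameter, which is the source of the savings discussed next.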

This simple example illustrates the key feature of the adjoint perspective: Adjoint differentiation offers a computational advantage in calculating the derivatives of a small number of performance measures with respect to a large number of input parameters. In contrast, if the number of performance measures is large or the number of parameters is small, forward differentiation is usually more efficient because of the additional overhead of adjoint calculations. Adjoint differentiation has proved tremendously effective in financial applications. A bank seeking to hedge the value of a large portfolio of derivative securities is primarily interested in one output — the total value of the portfolio — and sensitivities to a very large number of inputs including the prices of the underlying assets, interest rates, exchange rates, and volatilities. Giles and Glasserman (2006) illustrate the computational gains that can be achieved, and similar success with other examples has been reported in Capriotti and Giles (2012). It is worth stressing that adjoint differentiation is fundamentally different from a variance reduction technique. Indeed, on each sample path, an adjoint calculation produces the same result as a forward calculation, but the computing times required for the two methods can vary widely. The adjoint method is an algorithmic technique, not a probabilistic one.

4.4. Adjoints: Vector Recursions

The Lindley recursion (4.3) offers a simple setting in which to illustrate the idea of adjoint differentiation in a discrete-event simulation. But its very simplicity may obscure the method's potential. In this section, we review the vector recursion setting considered in Giles and Glasserman (2006). We consider a simulation algorithm of the form

$$X_{n+1} = F(X_n, Z_{n+1}) \equiv F_n(X_n) \tag{4.6}$$

in which the vector X_n records the system state, Z_{n+1} is a vector of stochastic inputs, and F encodes the rule for updating the state. Writing F_n allows us to absorb the stochastic input into the transition rule. Algorithms of this form arise in many settings; the Lindley recursion is a simple example. Another class of examples arises from the time-discretization of a stochastic differential equation

$$d\tilde{X}(t) = a(\tilde{X}(t))\,dt + b(\tilde{X}(t))\,dW(t). \tag{4.7}$$

The process X̃ is m-dimensional, W is a d-dimensional Brownian motion, a(·) takes values in ℝ^m and b(·) takes values in ℝ^{m×d}. In financial applications, X̃ would record all relevant market prices and rates, for example.

The purpose of the simulation might then be to evaluate an expectation E[g(X̃(T))], where g is the discounted payoff of a derivative security maturing at time T. To simulate (4.7), one often uses an Euler approximation with fixed time step h = T/N, with N an integer. We write X_n for the Euler approximation at time nh, which evolves according to

$$X_{n+1} = X_n + a(X_n)\,h + b(X_n)\,Z_{n+1}\sqrt{h}, \qquad X_0 = \tilde{X}_0, \tag{4.8}$$

where Z₁, Z₂, . . . are independent d-dimensional standard normal random vectors. For hedging purposes, one is particularly interested in sensitivities with respect to current market prices, which are part of the initial conditions of the simulation. Thus, we consider derivatives

$$\frac{\partial}{\partial X_{0,j}} E[g(X_N)].$$

To estimate this sensitivity, we will calculate the sample path derivative

$$\frac{\partial}{\partial X_{0,j}} g(X_N).$$

We assume that g is available explicitly and that we can calculate its sensitivity to the terminal values X_N. The chain rule gives

$$\frac{\partial}{\partial X_{0,j}} g(X_N) = \sum_{i=1}^{m} \frac{\partial g(X_N)}{\partial X_{N,i}}\,\frac{\partial X_{N,i}}{\partial X_{0,j}}.$$

Differentiating the recursion (4.6), we get

$$\sum_{i=1}^{m} \frac{\partial g(X_N)}{\partial X_{N,i}}\,\Delta_{ij}(N) \tag{4.9}$$

with

$$\Delta_{ij}(n) = \frac{\partial X_{n,i}}{\partial X_{0,j}}, \qquad i, j = 1, \ldots, m.$$

The Δ_ij(n) variables are state sensitivities. For example, from (4.8) one can derive the evolution of the Δ_ij by differentiating both sides of (4.8) with respect to X_{0,j}. More generically, let D(n) denote the derivative map of F_n,

$$D_{ij}(n) = \frac{\partial F_{n,i}}{\partial x_j},$$

evaluated at (X_n, Z_{n+1}). (Again, this can be made explicit in the special case of (4.8) by differentiating both sides.) Then the state sensitivities Δ evolve according to

$$\Delta(n+1) = D(n)\,\Delta(n), \qquad \Delta(0) = I.$$

By the chain rule, the gradient of g(X_N) with respect to the initial state X₀ is given by

$$\nabla_{X_0} g(X_N) = \nabla_{X_N} g(X_N)\,D(N-1)\,D(N-2)\cdots D(1)\,D(0)\,\Delta(0). \tag{4.10}$$
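The two orders of evaluating the product in (4.10) differ greatly in cost, as discussed next. The following sketch (our own illustration, with random Jacobians standing in for the stored D(n)) verifies that they produce the same gradient:

```python
import numpy as np

def forward_gradient(grad_g, jacobians):
    """Right-to-left: accumulate Delta(n+1) = D(n) Delta(n) by matrix-matrix
    products (O(N m^3)), applying the payoff gradient once at the end."""
    m = jacobians[0].shape[0]
    delta = np.eye(m)
    for d_n in jacobians:                 # n = 0, 1, ..., N-1
        delta = d_n @ delta
    return grad_g @ delta

def adjoint_gradient(grad_g, jacobians):
    """Left-to-right: propagate the row vector grad_g backwards through the
    stored Jacobians by vector-matrix products (O(N m^2))."""
    v = grad_g.copy()
    for d_n in reversed(jacobians):       # n = N-1, ..., 1, 0
        v = v @ d_n
    return v

rng = np.random.default_rng(0)
m, n_steps = 50, 200
ds = [np.eye(m) + 0.01 * rng.standard_normal((m, m)) for _ in range(n_steps)]
g = rng.standard_normal(m)
assert np.allclose(forward_gradient(g, ds), adjoint_gradient(g, ds))
```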

A forward calculation works from right to left in (4.10), at each step multiplying by an m × m matrix D(n), until the final step, at which the accumulated matrix is multiplied by the vector ∇_{X_N} g(X_N). This is O(Nm³) operations in total. In contrast, an adjoint implementation works in the opposite direction. More precisely, in the forward phase we store (but do not multiply) the matrices D(n). Then, we evaluate the product in (4.10) from left to right. The adjoint calculation is a vector-matrix multiplication at each step, rather than a matrix-matrix multiplication, so the number of operations is reduced to O(Nm²). The savings can be dramatic when m is large, as in models of the yield curve. The price we pay is that we have committed ourselves to a particular payoff g. In contrast, most of the work in a forward calculation can be re-used if we introduce a different payoff. So, neither method is uniformly better than the other. As we noted previously, the adjoint method is advantageous when we need a large number of derivatives for a small number of functions. The application of this method for discrete-event systems appears not to have been fully explored and may merit further investigation.

4.5. Averaging

In applications in finance and in other areas as well, the use of IPA is sometimes limited by the requirement that the performance measure of interest be continuous in the underlying variables. Likelihood ratio method (LRM) estimators generally avoid this need for smoothness but face other limitations. In discrete-event simulation, the main weakness of LRM is that its variance usually grows with the length of the simulation horizon (see Chapter 7 of Ho and Cao (1990)). In some applications to finance, LRM estimators work well when time is discretized into fairly large steps

but can blow up as the time step shrinks to approximate a continuous-time process. In this section, we review a result from Chen and Glasserman (2007) showing that by taking the average of multiple combinations of IPA and LRM, we can arrive at an estimator that avoids the need for smoothness, as LRM does, yet remains stable as the time step shrinks, as IPA does. Several other combinations of IPA and LRM have been proposed before, but the benefit in this example is particularly striking. We consider a model defined by a scalar stochastic differential equation on [0, T],

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad X_0 = x, \tag{4.11}$$

where W is a standard Brownian motion. As in the previous section, we are interested in an expectation u(x) = E[g(X_T)] and its sensitivity u′(x) to the initial state x. Bringing the derivative inside the expectation, we get

$$u'(x) = E\left[g'(X_T)\,\frac{dX_T}{dx}\right] \equiv E[g'(X_T)\Delta_T],$$

where Δ_T = dX_T/dx is the derivative of X_T with respect to the initial state x. Differentiating both sides of (4.11), we get

$$d\Delta_t = \mu'(X_t)\Delta_t\,dt + \sigma'(X_t)\Delta_t\,dW_t, \qquad \Delta_0 = 1,$$

the continuous-time analog of the derivatives we get by differentiating (4.8). If g is not continuous, the likelihood ratio method (LRM) offers an alternative. The LRM estimator starts from the transition density f(x, ·) describing the distribution of X_T given X₀ = x. Then u(x) is given by

$$u(x) = \int g(x_T)\,f(x, x_T)\,dx_T,$$

so bringing the derivative inside the integral and then multiplying and dividing by f(x, x_T) yields the estimator

$$u'(x) = g(X_T)\,\frac{d}{dx}\log f(x, X_T). \tag{4.12}$$

The difficulty with this method is that the transition density f is generally unknown. We can try to circumvent this difficulty by dividing the interval [0, T] into smaller steps. As before, we may use an Euler approximation

$$\hat{X}_i = \hat{X}_{i-1} + \mu(\hat{X}_{i-1})\Delta t + \sigma(\hat{X}_{i-1})\Delta W_i, \qquad \hat{X}_0 = x, \tag{4.13}$$

i = 1, . . . , N, with time step Δt = T/N and ΔW_i = W(iΔt) − W((i−1)Δt). Let û(x) = E[g(X̂_N)] and let Δ̂_i = dX̂_i/dx; then

$$\hat{\Delta}_i = \hat{\Delta}_{i-1} + \mu'(\hat{X}_{i-1})\hat{\Delta}_{i-1}\Delta t + \sigma'(\hat{X}_{i-1})\hat{\Delta}_{i-1}\Delta W_i, \qquad \hat{\Delta}_0 = 1. \tag{4.14}$$

The IPA estimator of the discrete-time derivative û′(x) is then g′(X̂_N)Δ̂_N. For the LRM estimator, we may use the transition density f̂ over a time step Δt to write

$$\hat{u}(x) = \int\cdots\int g(x_N)\,\hat{f}(x, x_1)\cdots\hat{f}(x_{N-1}, x_N)\,dx_N\cdots dx_1. \tag{4.15}$$

Following the usual steps, we get the LRM estimator

$$g(\hat{X}_N)\sum_{i=1}^{N}\frac{d}{dx}\log \hat{f}(\hat{X}_{i-1}, \hat{X}_i) = g(\hat{X}_N)\,\frac{d}{dx}\log \hat{f}(x, \hat{X}_1). \tag{4.16}$$

Here we have used the important observation that only the first of the transition densities depends on the initial state x. We may not know the original transition density f for the continuous-time process X, but the transition density for the Euler scheme is Gaussian. In particular, X̂₁ is normally distributed with mean x + μ(x)Δt and variance σ²(x)Δt. As a consequence, we can differentiate the log density and, after some simplification and dropping higher-order terms, write the estimator (4.16) as

$$g(\hat{X}_N)\left(\frac{\Delta W_1}{\sigma(x)\Delta t}\right).$$

This estimator is nearly unbiased (ignoring higher-order terms) but it blows up as Δt approaches zero because of the non-differentiability of Brownian motion. We now come to the first main idea of this section. A pure IPA estimator differentiates each X̂_i with respect to X̂_{i−1}, all the way back to the initial condition x. A pure LRM estimator treats x as a parameter of the distribution of X_T through the transition density f(x, ·). In analogy with the adjoint ideas of the previous section, we have the flexibility to combine the two methods, switching from IPA to LRM midway through the path. We can think of X̂_{i−1} as the initial condition for the second part of the path (applying LRM for the adjoint derivative from that point forward)

and then use the chain rule and IPA to differentiate back from X̂_{i−1} to the initial condition x. This combination gives us the estimator

$$g(\hat{X}_N)\,\frac{d}{d\hat{X}_{i-1}}\log\hat{f}(\hat{X}_{i-1}, \hat{X}_i)\,\frac{d\hat{X}_{i-1}}{dx}. \tag{4.17}$$

Switching methods in mid-path could be viewed as a type of smoothing through conditional expectations (Gong and Ho (1987); Fu and Hu (1997)). Again using the fact that f̂ is Gaussian, we can write (4.17) as

$$g(\hat{X}_N)\,\frac{\Delta W_i}{\sigma(\hat{X}_{i-1})\Delta t}\,\hat{\Delta}_{i-1}.$$

Ignoring higher-order terms, this is unbiased for û′(x) for all Δt, for all i = 1, . . . , N. So far, the combination of IPA and LRM has not bought us much: this estimator still blows up as Δt → 0. But we now come to the second main idea of this section. If we average over all i — that is, over all times at which we can switch from IPA to LRM — we get

$$\frac{1}{N}\sum_{i=1}^{N} g(\hat{X}_N)\,\frac{\Delta W_i}{\sigma(\hat{X}_{i-1})\Delta t}\,\hat{\Delta}_{i-1} \;\approx\; g(X_T)\,\frac{1}{T}\int_0^T \frac{\Delta_t}{\sigma(X_t)}\,dW_t, \tag{4.18}$$

for small Δt. Individually, each estimator blows up, but by averaging over multiple combinations of IPA and LRM we get an estimator that behaves well as the time step shrinks. Moreover, we do not need any smoothness in g. The expression on the right side of (4.18) is derived using Malliavin calculus in Fournié et al. (1999); see also Gobet and Munos (2005). The connection with IPA and LRM is made in Chen and Glasserman (2007), where the approximation in (4.18) is justified as a limit as Δt → 0. We have presented a very specific example, but the example nevertheless raises an intriguing question of whether there are other application areas in which averaging multiple combinations of IPA and LRM can produce estimators that are superior to what either type of estimator can produce separately.
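As a concrete illustration of (4.18) (a minimal sketch of ours, not code from the chapter), the following simulates the Euler scheme (4.13) together with (4.14) and forms the averaged IPA–LRM estimator of u′(x) for a discontinuous, digital-style payoff, where pure IPA fails and any fixed-i estimator blows up as Δt → 0:

```python
import numpy as np

def averaged_ipa_lrm(mu, sigma, dmu, dsigma, g, x0, T, N, n_paths, seed=0):
    """Averaged IPA-LRM estimator (4.18) for u'(x) = d/dx E[g(X_T)]."""
    rng = np.random.default_rng(seed)
    dt = T / N
    x = np.full(n_paths, x0)
    delta = np.ones(n_paths)       # Delta-hat from recursion (4.14)
    weight = np.zeros(n_paths)     # accumulates sum_i dW_i * Delta_{i-1} / sigma(X_{i-1})
    for _ in range(N):
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        weight += dw * delta / sigma(x)
        x, delta = (x + mu(x) * dt + sigma(x) * dw,
                    delta * (1 + dmu(x) * dt + dsigma(x) * dw))
    # (1/N) sum_i dW_i/(sigma dt) * Delta equals (1/T) sum_i dW_i * Delta/sigma.
    return np.mean(g(x) * weight / T)

# Illustrative choices: geometric Brownian motion and a digital payoff.
r, vol, K = 0.05, 0.2, 100.0
est = averaged_ipa_lrm(mu=lambda x: r * x, sigma=lambda x: vol * x,
                       dmu=lambda x: r, dsigma=lambda x: vol,
                       g=lambda x: (x > K).astype(float),
                       x0=100.0, T=1.0, N=100, n_paths=100_000)
print("averaged IPA-LRM estimate of u'(x):", est)
```

Halving Δt (doubling N) leaves the variance of this estimator essentially unchanged, in contrast to the fixed-i combination, whose variance grows like 1/Δt.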

4.6. Concluding Remarks

It has been a pleasure and a privilege to be part of the community of Larry's students and collaborators for 25 years, a community shaped by Larry's vision, insight, optimism, and generosity. As I hope this chapter shows, his ideas continue to bear fruit, and we have all benefited from the research paths he opened for us. Happy Birthday, Larry.

References

Bryson, A.E., Jr., and Ho, Y.C. (1975) Applied Optimal Control (Taylor and Francis).
Capriotti, L., and Giles, M. (2012) Algorithmic differentiation: adjoint Greeks made easy, Risk, to appear.
Chen, N., and Glasserman, P. (2007) Malliavin Greeks without Malliavin calculus, Stochastic Processes and Their Applications 117, pp. 1689–1723.
Fournié, E., Lasry, J.-M., Lebuchoux, J., Lions, P.-L., and Touzi, N. (1999) Applications of Malliavin calculus to Monte Carlo methods in finance, Finance and Stochastics 3, pp. 391–412.
Fu, M.C., and Hu, J.Q. (1997) Conditional Monte Carlo: Gradient Estimation and Optimization Applications (Kluwer).
Giles, M., and Glasserman, P. (2006) Smoking adjoints: fast Monte Carlo Greeks, Risk 19, pp. 88–92.
Giles, M., and Pierce, N. (2000) An introduction to the adjoint approach to design, Flow, Turbulence and Combustion 65, pp. 393–415.
Gobet, E., and Munos, R. (2005) Sensitivity analysis using Ito-Malliavin calculus and martingales, with application to stochastic optimal control, SIAM Journal on Control and Optimization 43, 5, pp. 1676–1713.
Gong, W.B., and Ho, Y.C. (1987) Smoothed (conditional) perturbation analysis of discrete event dynamical systems, IEEE Transactions on Automatic Control 32, 10, pp. 858–866.
Griewank, A. (2000) Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation (SIAM).
Ho, Y.C. (1987) Performance evaluation and perturbation analysis of discrete event dynamic systems, IEEE Transactions on Automatic Control 32, 7, pp. 563–572.
Ho, Y.C., and Cao, X. (1983) Perturbation analysis and optimization of queueing networks, Journal of Optimization Theory and Applications 40, 4, pp. 559–582.
Ho, Y.C., and Cao, X. (1990) Perturbation Analysis of Discrete Event Dynamic Systems (Kluwer).
Ho, Y.C., Eyler, M.A., and Chien, T.T. (1983) A new approach to determine parameter sensitivities of transfer lines, Management Science 29, 6, pp. 700–714.
Suri, R., and Zazanis, M. (1988) Perturbation analysis gives strongly consistent estimates for the M/G/1 queue, Management Science 34, 1, pp. 39–64.

Chapter 5

Infinitesimal Perturbation Analysis and Optimization Algorithms

Edwin K. P. Chong
Dept. of Electrical & Computer Engineering
Colorado State University
Fort Collins, CO 80523-1373, USA

We take a retrospective look at one approach to analyzing the convergence of stochastic approximation algorithms driven by infinitesimal perturbation analysis derivative estimators. This particular approach highlights the interesting insight that algorithms with different update intervals behave similarly to one that updates after every regenerative cycle.

5.1. Preliminary Remarks

This chapter is a retrospective look at the use of infinitesimal perturbation analysis (IPA) to drive optimization. IPA is a method to estimate the derivative or gradient of a performance measure in a discrete event system by observing only a single sample path of the system. Naturally, the method lends itself to iterative optimization using gradient-based search algorithms, most notably the Robbins-Monro stochastic approximation method. Applications of such algorithms include on-line optimization and single-run optimization of simulation models. Ever since the first formal descriptions of IPA by Professor Y. C. Ho and his group [Ho et al. (1979); Ho and Cassandras (1983); Ho et al. (1983a,b)], there has been interest in studying optimization algorithms driven by IPA, including [Wardi (1988); Fu (1990); L'Ecuyer and Glynn (1994); Chong and Ramadge (1992, 1993, 1994); Tang and Chen (1994); Andradottir (1998)]. It is not the purpose of this chapter to provide an exhaustive account of these developments and the work that followed them. Instead, this chapter is an opportunity for me to reflect on Professor Y. C. Ho's impact on my own view of this topic. It goes without saying that mine is not the only view, nor the final word.

5.2. Motivation

In 1989, while still a graduate student at Princeton, I read a paper that directed my research for several years to come. It appeared in a special issue of the Proceedings of the IEEE on the dynamics of discrete event systems, guest edited by Professor Y. C. Ho. The paper was by R. Suri, "Perturbation analysis: The state of the art and research issues explained via the GI/G/1 queue" [Suri (1989)]. This paper introduced me to the use of IPA to drive the Robbins-Monro algorithm, a method Suri called "PA-RM single-run" (PARMSR). A specific remark in the paper struck me deeply:

    Proof of "convergence with probability 1" for the PARMSR algorithm (i.e., updating every 5 customers), even for a simple queueing system such as M/M/1, remains an open question.

Well, if this was an open question, I had to seek an answer to it. It is instructive to understand the state-of-the-art in the analysis of iterative optimization driven by IPA at the time of Suri’s paper. Quoting again from [Suri (1989)]: Glynn (1986) presents a convergence proof for an algorithm which is also single-run. By using a clever trick where parameter updates are done only at the end of every two regenerative cycles, he eliminates bias from the estimates, but a concern with this method may be the length of a regenerative cycle in practical systems, which could be hundreds or even thousands of customers in a queueing network problem. (With the PARMSR algorithm in Suri and Leung (1989) each update step was applied every 5 customers.) Fu (1988) extends Glynn’s method to prove convergence of a single-run optimization algorithm for a CI/G/I queue. However, his algorithm still needs to wait till the end of one regenerative cycle before it updates the parameter.

The references to the papers above have been modified for transparency. In our list of references, the paper by Glynn above is [Glynn (1986)], Suri and Leung (1989) is [Suri and Leung (1989)], and Fu (1988) eventually appeared as [Fu (1990)]. So the state-of-the-art at the time of [Suri (1989)], at least as far as convergence proofs go, is that both Glynn and Fu have proofs for algorithms that update after either one or two regenerative cycles (busy periods, in the case of single-server queues). Suri’s open question is concerned with a convergence proof for an algorithm that updates after every 5 customers. As

chapter5

Suri points out, algorithms that update only after one or two regenerative cycles suffer from the disadvantage that the times between updates might be large (if the regenerative cycles are long). An algorithm that updates after every 5 customers has a guaranteed update rate, presumably resulting in faster convergence than one that waits for the ends of long regenerative cycles. To go even further, an algorithm that updates after every customer would have the highest possible update rate, and, one might think, the fastest rate of convergence. Motivated by Suri's open question, I set out to understand what it would take to prove the convergence of the PARMSR algorithm in [Suri and Leung (1989)]. In the rest of this chapter, I will describe how Suri's open question ultimately received an answer, though perhaps an unexpected one. Contrary to initial expectations, it will turn out that the behavior of the PARMSR algorithm is in fact similar to one that updates after every regenerative cycle. In fact, even an algorithm that updates after every customer behaves similarly to one that updates after every busy period. They are so similar that their sample paths are visually almost indistinguishable. Figure 5.1 illustrates this clearly (we will explain the details of this figure later; for now it suffices to see that the two plots are very close to each other). Moreover, one way to prove the convergence of an algorithm that updates after every customer (or 5 customers) is to show that the algorithm is close to one that updates after every busy period. This means that, in a very practical sense, an algorithm that updates after every customer does not converge any more quickly than one that updates after every busy period.

5.3. Single-server Queues

We begin with a treatment of IPA for single-server queues and their use in optimization algorithms. This is the context in which Suri's open question was posed in [Suri (1989)] and, eventually, answered in [Chong and Ramadge (1993)]. The treatment here is based on [Chong (1995)].

5.3.1. Controlled single-server queue

Consider a controlled single-server (G/G/1) queue, where the service times depend on a real-valued control parameter. We index the customers by n = 1, 2, . . . , and assume that the first customer arrives at an empty queue. Let α(n) be the time between the arrivals of the nth and (n+1)st customers, θ(n) the value of the control parameter for the nth customer, and σ(n, θ(n))

[Figure: trajectories of θ(n) (vertical axis, 0–1.0) versus customer index n (horizontal axis, 0–12000) for the Customer and Busy Period algorithms; the two curves are nearly indistinguishable.]

Fig. 5.1. A comparison of the Customer and Busy Period algorithms (taken from [Chong and Ramadge (1990)]).

the service time of the nth customer. We assume that {(α(n), σ(n, ·))} is i.i.d. For the controlled single-server queue, we are interested in performance measures that involve sojourn times. Let T(n) be the sojourn time of the nth customer — the time duration between arrival and departure of the nth customer. For any given control parameter sequence {θ(n)}, the resulting sequence {T(n)} can be expressed via the Lindley recursion:

$$T(n+1) = [T(n) - \alpha(n)]^+ + \sigma(n+1, \theta(n+1)), \qquad n \in \mathbb{N},$$

where [·]⁺ = max(·, 0), T(1) = σ(1, θ(1)), and ℕ = {1, 2, . . .}. The performance of the nth customer is given by J(n) = F(T(n), θ(n)), where F is a given function. A customer that arrives at a busy (nonempty) queue is said to arrive at a busy period; otherwise, it is said to arrive at an idle period. A necessary and sufficient condition for the nth customer to be the last in a busy period (or, equivalently, the (n+1)st customer to arrive at an idle period) is T(n) < α(n). We index the busy periods by k = 1, 2, . . . . The index of the

last customer in the kth busy period is given by

$$S(k) = \min\{n \geq S(k-1) + 1 : T(n) < \alpha(n)\}, \tag{5.1}$$

where S(0) = 0. The number of customers in the kth busy period is given by N(k) = S(k) − S(k−1). Let D be a compact interval contained in the stability region of the queue; i.e., D ⊂ {θ : E[σ(1, θ)] < E[α(1)]}. We assume throughout that θ(n) ∈ D for all n ∈ ℕ. In our discussion, we will often need to consider the special case where, for some fixed θ ∈ D, θ(n) = θ for all n. We refer to this as the "θ-system." In the θ-system, we denote the sojourn time of the nth customer by T(n, θ), explicitly displaying the dependence on θ. Similarly, we denote the performance of the nth customer by J(n, θ) = F(T(n, θ), θ), and the number of customers in the kth busy period by N(k, θ). Note that the sequences {T(n, θ)} and {J(n, θ)} are regenerative processes, with regeneration times corresponding to customers that arrive at idle periods of the queue. We define the overall performance measure J(θ) for the system as the steady-state average performance over all customers, given by

$$J(\theta) = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} J(i, \theta) = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} F(T(i, \theta), \theta). \tag{5.2}$$

The objective is to minimize the performance function J(θ) with respect to θ ∈ D. For this, we use estimates of the derivative of J to drive an iterative optimization algorithm. We describe the derivative estimates in the next section.

5.3.2. Infinitesimal perturbation analysis

Infinitesimal perturbation analysis uses derivatives of "sample performances" (performances of individual customers) to estimate the derivative of the steady-state function J. To proceed, we write the Lindley recursion in the form

$$T(n+1) = \begin{cases} T(n) - \alpha(n) + \sigma(n+1, \theta(n+1)) & \text{if } T(n) \geq \alpha(n) \\ \sigma(n+1, \theta(n+1)) & \text{otherwise.} \end{cases} \tag{5.3}$$

Assume that each σ(n, ·) is differentiable on D a.s., with derivative σ′(n, ·). Formally differentiating (5.3) with respect to θ, we get the recursion

$$T'(n+1) = \begin{cases} T'(n) + \sigma'(n+1, \theta(n+1)) & \text{if } T(n) \geq \alpha(n) \\ \sigma'(n+1, \theta(n+1)) & \text{otherwise.} \end{cases} \tag{5.4}$$

We refer to (5.4) as the IPA recursion for the sojourn times of the queue. We also formally differentiate the performance of the nth customer, and obtain

$$J'(n) = \partial_T F(T(n), \theta(n))\,T'(n) + \partial_\theta F(T(n), \theta(n)),$$

where ∂_T F and ∂_θ F are the partial derivatives of F with respect to its first and second arguments, respectively. For the θ-system, denote by {T′(n, θ)} the sequence generated by (5.4), and {J′(n, θ)} the associated performance-derivative sequence, given by

$$J'(n, \theta) = \partial_T F(T(n, \theta), \theta)\,T'(n, \theta) + \partial_\theta F(T(n, \theta), \theta). \tag{5.5}$$
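As a concrete illustration of (5.3)–(5.4) (a minimal sketch of ours, not from the chapter; we take σ(n, θ) = θ·σ̄(n), so that σ′(n, θ) = σ̄(n)):

```python
def ipa_sojourn_derivatives(alpha, sigma_bar, theta):
    """Sojourn times T(n) via the Lindley recursion (5.3) and their IPA
    derivatives T'(n) via (5.4), for service times sigma(n) = theta * sigma_bar(n)."""
    T, dT = [], []
    t, dt = 0.0, 0.0
    for n in range(len(sigma_bar)):
        s, ds = theta * sigma_bar[n], sigma_bar[n]  # sigma and d sigma / d theta
        if n == 0 or t < alpha[n - 1]:              # customer arrives at an idle period
            t, dt = s, ds
        else:                                       # arrives during a busy period
            t, dt = t - alpha[n - 1] + theta * sigma_bar[n], dt + ds
        T.append(t)
        dT.append(dt)
    return T, dT
```

With F(T, θ) = T, the sequence dT is exactly the IPA estimate sequence {J′(n, θ)} in (5.5).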

We refer to J′(n, θ) as an IPA estimate of the derivative d_θJ(θ), where the symbol d_θ denotes differentiation with respect to θ. Note that {T′(n, θ)} and {J′(n, θ)} are regenerative processes with the same regeneration times as {T(n, θ)} and {J(n, θ)}. A key research question is the relationship between the IPA estimates J′(n, θ) and the true derivative d_θJ(θ) (see, e.g., [Glasserman (1991); Ho and Cao (1991)]). In particular, we are interested in the strong consistency property:

$$d_\theta J(\theta) = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} J'(i, \theta) \quad \text{a.s.} \tag{5.6}$$

Because the process {J′(n, θ)} is regenerative, equation (5.6) implies that

$$d_\theta J(\theta) = \frac{1}{E[N(1, \theta)]}\,E\!\left[\sum_{n=1}^{N(1,\theta)} J'(n, \theta)\right] \tag{5.7}$$

(this is a standard result in regenerative process theory; see, e.g., p. 126 of [Asmussen (1987)]). As we shall see shortly, equation (5.7) is more useful for our purpose than (5.6). In fact, we will need an even stronger condition to hold. Specifically, note that (5.7) applies to the θ-system. What we need is for the equation to hold, approximately, if we randomly perturb the values of the control parameters θ(n) around θ (i.e., "continuity" of the right-hand side of (5.7) with respect to the sequence {θ(n)}). To proceed, we introduce the following assumptions.

(Q1) Each σ′(n, ·) is Lipschitz on D a.s., with modulus K_σ(n);
(Q2) The positive random variables sup_{θ∈D} |σ(1, θ)|, sup_{θ∈D} |σ′(1, θ)|, sup_{{θ(n)}⊂D} N(1), and K_σ(1) have finite 4th moments;
(Q3) ∂_T F and ∂_θ F are bounded and Lipschitz in both arguments, with bounded moduli;
(Q4) There exist constants q > 0 and B < ∞ such that given θ ∈ D and a random sequence {θ(n)} ⊂ D with max_{1≤n≤N(1)} |θ(n) − θ| ≤ aY, where a is a finite constant and Y a random variable with finite 4th moment, we have P{N(1) ≠ N(1, θ)} ≤ Ba^q.

Assumptions (Q1), (Q2), and (Q3) are simple and natural. The crucial assumption is (Q4). Basically, (Q4) requires that, given a θ-system, the number of customers in a busy period should remain approximately the same (in a probabilistic sense) if we randomly perturb the control parameters θ(n) around θ. The assumption corresponds roughly to the idea that if we perturb the control parameters slightly, the probability of "event order change" should be small (this idea was pervasive in the early IPA literature; see, e.g., [Heidelberger et al. (1988)]). Assumption (Q4) holds for the G/G/1 queue under certain regularity assumptions (see [Heidelberger et al. (1988); Chong and Ramadge (1993)]). We are now ready to state a result that will be particularly useful for our purpose.

Theorem 1. Assume that (Q1–Q4) hold. Then, there exist constants K < ∞ and r > 0 such that given θ ∈ D and a random sequence {θ(n)} ⊂ D with max_{1≤n≤N(1)} |θ(n) − θ| ≤ aY, where a ≥ 0 is a finite constant and Y a random variable with finite 4th moment, we have

$$\left| d_\theta J(\theta) - \frac{1}{E[N(1)]}\,E\!\left[\sum_{n=1}^{N(1)} J'(n)\right] \right| \leq K a^r.$$

Proof. Define the random variables N_max(1) = sup_{{θ(n)}⊂D} N(1), σ_max(n) = sup_{θ∈D} |σ(n, θ)|, and σ′_max(n) = sup_{θ∈D} |σ′(n, θ)|. We may assume that any moment, if it exists, of N_max(1), σ_max(n), Σ_{n=1}^{N_max(1)} σ_max(n), Σ_{n=1}^{N_max(1)} σ′_max(n), K_σ(1), and Y, and any combination of their products, is bounded by B. Also, we may assume without loss of generality that F has been scaled in such a way that both ∂_T F and ∂_θ F, as well as their Lipschitz moduli, are all bounded by 1.

For notational convenience, we suppress the argument "1" in N(1), N(1, θ), and N_max(1). We note that (Q1–Q3) imply strong consistency (see, e.g., [Glasserman (1993)]), and hence (5.7). Therefore,

$$\left| d_\theta J(\theta) - \frac{1}{E[N]}\,E\!\left[\sum_{n=1}^{N} J'(n)\right] \right|
= \left| \frac{1}{E[N(\theta)]}\,E\!\left[\sum_{n=1}^{N(\theta)} J'(n,\theta)\right] - \frac{1}{E[N]}\,E\!\left[\sum_{n=1}^{N} J'(n)\right] \right|$$
$$\leq E[|N - N(\theta)|]\,E\!\left[\sum_{n=1}^{N} |J'(n)|\right] + E\!\left[\left|\sum_{n=1}^{N} J'(n) - \sum_{n=1}^{N(\theta)} J'(n,\theta)\right|\right]. \tag{5.8}$$

We consider the above term by term. First,

$$E[|N - N(\theta)|] \leq E\bigl[N_{max}\,\mathbf{1}\{N \neq N(\theta)\}\bigr] \leq \sqrt{E[N_{max}^2]}\,\sqrt{P\{N \neq N(\theta)\}} \leq \sqrt{B}\,\sqrt{Ba^q} = Ba^{q/2}, \tag{5.9}$$

applying the Cauchy–Schwarz inequality, (Q2), and (Q4). Second,

$$E\!\left[\sum_{n=1}^{N} |J'(n)|\right] \leq E\!\left[\sum_{n=1}^{N_{max}} \bigl(T'(n) + 1\bigr)\right]
\leq E\!\left[\sum_{n=1}^{N_{max}}\left(\sum_{m=1}^{n} \sigma'_{max}(m) + 1\right)\right]
\leq E\!\left[N_{max}\sum_{m=1}^{N_{max}} \sigma'_{max}(m) + N_{max}\right]$$
$$\leq \sqrt{E[N_{max}^2]}\,\sqrt{E\!\left[\Bigl(\sum_{m=1}^{N_{max}}\sigma'_{max}(m)\Bigr)^{2}\right]} + E[N_{max}] \leq 2B, \tag{5.10}$$

where we once again applied the Cauchy–Schwarz inequality, and conditions (Q2) and (Q3).

Third,

$$E\!\left[\left|\sum_{n=1}^{N} J'(n) - \sum_{n=1}^{N(\theta)} J'(n,\theta)\right|\right]
\leq E\!\left[\mathbf{1}\{N = N(\theta)\}\sum_{n=1}^{N} |J'(n) - J'(n,\theta)|\right]
+ E\!\left[\mathbf{1}\{N \neq N(\theta)\}\sum_{n=1}^{\max(N, N(\theta))} |J'(n)|\right].$$

Bounding the terms as we have done above, applying the Cauchy–Schwarz inequality repeatedly, and using (Q1–Q4), we obtain

$$E\!\left[\left|\sum_{n=1}^{N} J'(n) - \sum_{n=1}^{N(\theta)} J'(n,\theta)\right|\right] \leq 5Ba + \sqrt{2}\,Ba^{q/2}. \tag{5.11}$$

Combining (5.9), (5.10), and (5.11) with (5.8) yields the desired result.

The result above is useful because it greatly simplifies the convergence proof for stochastic approximation algorithms driven by IPA, as we shall see in the next section. In the sequel, we assume (Q1–Q4).

5.3.3. Optimization algorithm

The optimization algorithm works as follows. We start with an initial control parameter value θ(1). As customers arrive, they are served with this control-parameter value. At the same time, we observe the system and generate an IPA sequence {J′(n)} via (5.4) and (5.5). Using the IPA estimates, we update the control-parameter value after the τ(1)th customer completes service, and before the (τ(1) + 1)st customer begins service. We then serve the customers using the updated parameter value, until the next time we update the value again, which is after the τ(2)th customer leaves. The procedure is repeated, generating a sequence of control parameters θ(1), θ(2), . . . . The updates of the control parameter are performed after customers τ(1), τ(2), . . . . In general, let τ(m) be the index of the service time just before the mth update; i.e., the mth update is performed just after the τ(m)th customer departs from the system. Examples of choices of the update times τ(m) include updating after every service time (i.e., τ(m) = m), or updating after the last customer in every busy period (i.e., τ(m) = S(m), where S(m) is

defined in (5.1)). The latter corresponds to what is done in [Fu (1990)], while the PARMSR algorithm in [Suri and Leung (1989)] uses τ(m) = 5m. For the parameter updates, we use the Robbins-Monro stochastic approximation algorithm. Specifically, let θ(n) be the value of the control parameter used in the service time of the nth customer. We have θ(n) = θ(τ(m)) if τ(m−1) < n ≤ τ(m), and

$$\theta(\tau(m+1)) = \theta(\tau(m)) - a(m)\sum_{n=\tau(m-1)+1}^{\tau(m)} J'(n), \tag{5.12}$$

where a(m) is a positive step-size, and τ(0) = 0. Note that to perform each parameter update, we use the sum of the IPA estimates between the updates. For simplicity, we assume that θ(n) remains in the compact interval D for all n. In practice, this constraint can be enforced using a projection operation. Specifically, if we incorporate a projection, the algorithm becomes

$$\theta(\tau(m+1)) = \Pi\!\left[\theta(\tau(m)) - a(m)\sum_{n=\tau(m-1)+1}^{\tau(m)} J'(n)\right],$$

where Π : ℝ → D is a projection operator. The analysis of the algorithm with projection introduces some technical complications that detract from our present purpose. We refer the reader to [Chong and Ramadge (1993)] for details in handling that case. Let F(n) denote the σ-algebra generated by the i.i.d. process {(α(1), σ(1, ·)), . . . , (α(n), σ(n, ·))}. Naturally, we assume that each τ(m) is a stopping time with respect to the filtration {F(n)}; i.e., the update times should depend only on previous observations and not future observations. We also assume that the step-size sequence {a(m)} is adapted to F(τ(m−1)) (which is well defined because τ(m−1) is a stopping time with respect to {F(n)}). We make the following (relatively standard) assumptions on the step-size sequence.

(G1) {a(m)} is nonincreasing a.s.;
(G2) There exist constants A₁ < ∞, A₂ > 0 such that for each m ∈ ℕ, A₂/m ≤ a(m) ≤ A₁/m a.s.;
(G3) There exists a constant B_a < ∞ such that for all m ∈ ℕ, (1/a(m+1)) − (1/a(m)) ≤ B_a a.s.

In the sequel, we assume (G1–G3).
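For concreteness, here is a minimal sketch of ours (not code from the chapter) of the update rule (5.12) with projection, using illustrative choices: σ(n, θ) = θ·σ̄(n), performance F(T, θ) = T + c/θ for a cost weight c, updates every 5 customers as in PARMSR, and step-size a(m) = 1/m:

```python
import numpy as np

def parmsr(theta0, D, c, n_customers, update_every=5, seed=0):
    """PARMSR sketch: Lindley/IPA recursions driving projected
    Robbins-Monro updates (5.12), with D = (lo, hi)."""
    rng = np.random.default_rng(seed)
    lo, hi = D
    theta, m = theta0, 1
    t = dt = 0.0                    # T(n) and T'(n)
    grad_sum, prev_t, prev_a = 0.0, 0.0, None
    for n in range(1, n_customers + 1):
        a_n = rng.exponential(1.0)              # interarrival time alpha(n)
        s_bar = rng.exponential(1.0)            # sigma_bar(n); sigma = theta * s_bar
        if n == 1 or prev_t < prev_a:           # idle-period arrival
            t, dt = theta * s_bar, s_bar
        else:                                   # busy-period arrival
            t, dt = t - prev_a + theta * s_bar, dt + s_bar
        grad_sum += dt - c / theta**2           # J'(n) for F(T, theta) = T + c/theta
        prev_t, prev_a = t, a_n
        if n % update_every == 0:               # update (5.12) with a(m) = 1/m
            theta = min(max(theta - (1.0 / m) * grad_sum, lo), hi)
            grad_sum, m = 0.0, m + 1
    return theta

print(parmsr(theta0=0.5, D=(0.1, 0.9), c=0.05, n_customers=50_000))
```

Replacing update_every=5 with updates at the ends of busy periods produces nearly identical trajectories, which is exactly the phenomenon shown in Figure 5.1.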

5.4. Convergence

5.4.1. Stochastic approximation convergence theorem

To prove the convergence of the optimization algorithm described in the last section, it will be helpful to use a general convergence theorem for stochastic approximation algorithms. There is an extensive literature on such convergence results. For our purpose, we need only an elementary convergence result. To this end, suppose that our objective is to minimize a differentiable function J(θ) with respect to θ. Consider a stochastic approximation algorithm

$$\tilde{\theta}(k+1) = \tilde{\theta}(k) - \tilde{a}(k)\left(d_\theta J(\tilde{\theta}(k)) + \varepsilon(k+1)\right), \tag{5.13}$$

where ã(k) is the step-size, and ε(k+1) represents a noise term. Let {G(k)} be a filtration with respect to which {ε(k)} and {ã(k)} are adapted; i.e., {ε(1), . . . , ε(k)} and {ã(1), . . . , ã(k)} are G(k)-measurable. To establish the convergence of the general stochastic approximation algorithm above, we introduce the following assumptions:

(A1) Σ_{k=1}^∞ ã(k) = ∞ a.s.;
(A2) Σ_{k=1}^∞ ã(k)² < ∞ a.s.;
(E1) Σ_{k=1}^∞ ã(k)|E_{G(k)}[ε(k+1)]| < ∞ a.s.;
(E2) For all k ∈ ℕ, E_{G(k)}[ε(k+1)²] ≤ σ², where σ² is a finite constant;
(J1) There exists θ* in the interior of D such that J is continuously differentiable with bounded derivative on D \ {θ*};
(J2) For all θ ∈ D \ {θ*}, (θ − θ*)d_θJ(θ) > 0.

Assumptions (A1–A2) are standard assumptions used in the analysis of stochastic approximation algorithms. Assumption (E1) requires that the "conditional bias" E_{G(k)}[ε(k+1)] approaches zero at a sufficiently fast rate, while (E2) ensures bounded noise variance. Assumptions (J1–J2) ensure that the function being optimized is sufficiently well-behaved. In the above, θ* is the point minimizing J.

Theorem 2. Suppose that (A1–A2), (J1–J2), and (E1–E2) hold. Assume that θ̃(k) ∈ D for all k a.s. Then, θ̃(k) → θ* a.s.

The theorem above is a standard elementary convergence result (for a detailed proof based on martingale convergence arguments, see [Chong and

Ramadge (1992)]). Significantly more sophisticated results are available in the literature. The convergence of stochastic approximation algorithms remains a current research topic. The elementary result in Theorem 2 suffices for our present purpose. Indeed, the simple nature of the result leads to a proof technique that is insightful. The remainder of this section is focused on applying Theorem 2 to our optimization algorithm. In the sequel, we assume that (J1–J2) hold. Therefore, to apply Theorem 2 to our problem, it remains only to verify (A1–A2) and (E1–E2).

5.4.2. Updating after every busy period

We now return to our optimization algorithm driven by IPA estimates, and consider the issue of convergence of the algorithm. We first address the case where updating takes place after every busy period of the queue. Convergence results for this case were first reported in [Glynn (1986)] and [Fu (1990)], and later in [Chong and Ramadge (1992)]. Interestingly, as pointed out before, the convergence of algorithms with more general update times is intimately related to convergence in this case, a point made clear in [Chong and Ramadge (1993)]. Recall that S(k) is the index of the last customer in the kth busy period, and S(k) = Σ_{j=1}^{k} N(j), where N(k) is the number of customers served in the kth busy period. Because updates are performed after every busy period, we have τ(k) = S(k). The optimization algorithm (5.12) can therefore be represented as

$$\theta(S(k+1)) = \theta(S(k)) - a(k)\sum_{i=1}^{N(k)} J'(S(k-1) + i), \tag{5.14}$$

while θ(n) = θ(S(k)) if S(k − 1) < n ≤ S(k) (i.e., the value of the control parameter remains constant within each busy period). We have the following convergence result, which is essentially the result of [Fu (1990)]. Theorem 3. For the algorithm (5.14), we have θ(n) → θ ∗ a.s. ˜ ˜ Proof. Let θ(k) = θ(S(k − 1) + 1) (or, equivalently, θ(k) = θ(S(k))), i.e., the value of the control parameter within each busy period. To prove the ˜ theorem, it is equivalent to show that θ(k) → θ∗ a.s. We use Theorem 2. First, let G(k) = F (S(k − 1)) (which is well defined because S(k − 1) is a

April 29, 2013

16:20

World Scientific Review Volume - 9in x 6in

chapter5

Infinitesimal Perturbation Analysis and Optimization Algorithms

87

stopping time with respect to {F (n)}). We may write N (k)

˜ + 1) = θ(k) ˜ − a(k) θ(k



J  (S(k − 1) + i)

i=1

⎞ N (k)  1 ˜ −a J  (S(k − 1) + i)⎠ , = θ(k) ˜(k) ⎝ EG(k) [N (k)] i=1 ⎛

(5.15)

a(k)} is adapted to {G(k)}, and where a ˜(k) = EG(k) [N (k)]a(k). Note that {˜ ˜(k) ≤ A1 B/k. Therefore, (A1–A2) hold. by (G2), A2 /k ≤ a It remains to show that (E1–E2) hold. Now, ε(k + 1) =

N (k)  1 ˜ J  (S(k − 1) + i) − dθ J(θ(k)). EG(k) [N (k)] i=1

Recall that θ(S(k − 1) + i) = θ(S(k)) for all i = 1, . . . , N (k). Therefore, ˜ noting that ε(k+1) depends on G(k) only through θ(k), applying Theorem 1 yields EG(k) [ε(k + 1)] = 0. Therefore, (E1) holds. To verify (E2), we write ⎞2 ⎛ ⎞2 ⎛ N (k) N (k)   1 ⎝ J  (S(k − 1) + i)⎠ ≤ ⎝ J  (S(k − 1) + i)⎠ EG(k) [N (k)] i=1 i=1 N (k)

≤2



J  (S(k − 1) + i)2 .

i=1

Applying bounds as we did for (5.10), we see that (E2) holds. The proof exploits the fact that the control parameter remains constant within each busy period. Therefore, within the kth busy period, the system ˜ behaves like a θ-system with θ = θ(k). Under this situation, equation (5.7) N (k) ˜ holds. Therefore, the update term ( i=1 J  (S(k−1)+i))/EG(k) [N (k, θ(k)] ˜ ˜ in (5.15) is (conditionally given θ(k)) an unbiased estimate of dθ J(θ(k)). This is the precisely the “bias elimination” referred to in the quote by Suri [Suri (1989)] given earlier. We would expect that a similar argument applies even if the control parameter is not constant within the busy period, but is instead being updated. The key is that asymptotically, the parameter iterates do not change by much when updated, and therefore (5.7) still holds approximately (in the sense of Theorem 1). We shall see this idea applied in the next section.

April 29, 2013

16:20

World Scientific Review Volume - 9in x 6in

chapter5

E. K. P. Chong

88

5.4.3. Updating after every service time We now turn to the case where the parameter updates are performed after every customer’s service time. In this case, we have τ (k) = k. The optimization algorithm (5.12) can therefore be written in this case as θ(n + 1) = θ(n) − a(n)J  (n).

(5.16)

To prove the convergence of this algorithm, we use the argument outlined at ˜ the end of Section 5.4.2. Specifically, we argue that the subsequence {θ(k)} ˜ of {θ(n)} taken at the start of every busy period (i.e., θ(k) = θ(S(k −1)+1) as before) behaves approximately like an algorithm that updates after every busy period. The convergence of this subsequence then follows using the proof of Theorem 3, provided the values of the control parameters within ˜ each busy period do not deviate too far away from the subsequence {θ(k)}. For this, we will need the following lemma. Lemma 1. There exists an i.i.d. sequence {Y (k)}, with finite 4th moment, such that for each k, max

1≤n≤N (k)

|θ(S(k − 1) + n) − θ(S(k − 1) + 1)| ≤ a(k)Y (k).

Proof. Fix k. For every n such that 1 ≤ n ≤ N (k), we have |θ(S(k − 1) + n) − θ(S(k − 1) + 1)| ≤ |θ(S(k − 1) + n) − θ(S(k − 1) + n − 1)|+ · · · + |θ(S(k − 1) + 2) − θ(S(k − 1) + 1)| ≤

n−1 

a(S(k − 1) + i)|J  (S(k − 1) + i)|.

i=1

Because S(k − 1) + i ≥ S(k − 1) + 1 ≥ k for all i ≥ 1, and {a(k)} is nonincreasing by (G1), we have a(S(k − 1) + i) ≤ a(k). Hence, N (k)

|θ(S(k − 1) + n) − θ(S(k − 1) + 1)| ≤ a(k)



|J  (S(k − 1) + i)|.

i=1

Applying (5.10), we get the desired result. ˜ We now establish the convergence of the subsequence {θ(k)}. ˜ Theorem 4. Let θ(k) = θ(S(k − 1) + 1), k ∈ N, where {θ(n)} is given by ˜ (5.16). We have θ(k) → θ∗ a.s.

April 29, 2013

16:20

World Scientific Review Volume - 9in x 6in

Infinitesimal Perturbation Analysis and Optimization Algorithms

chapter5

89

Proof. The proof follows closely to that of Theorem 3. Let G(k) = F (S(k− 1)) as before. We write ˜ + 1) θ(k N (k)

˜ − = θ(k)

 i=1

a(S(k − 1) + i)J  (S(k − 1) + i)

⎞ N (k)  a(S(k − 1) + i) 1 ˜ −a J  (S(k − 1) + i)⎠ , = θ(k) ˜(k) ⎝ EG(k) [N (k)] i=1 a(S(k − 1) + 1) ⎛

where a ˜(k) = EG(k) [N (k)]a(S(k − 1) + 1), which is G(k)-measurable. Because S(k) ≥ S(k − 1) + 1 ≥ k, by (G1) and (G2) we obtain 

k S(k)



A1 B A2 ≤a ˜(k) ≤ . k k

Clearly (A2) holds. To verify (A1), note that k k S(k) 1 1 = N (j) ≤ Nmax (j), k k j=1 k j=1

where Nmax (k) =

sup {θ(n):n≥S(k−1)+1}⊂D

min{n ∈ N : T (S(k − 1) + n) < α(S(k − 1) + n)}. Because E [Nmax (1)] < ∞, by the strong law of large numbers, S(k)/k is bounded above by an a.s. finite random variable, and hence k/S(k) is bounded below by an a.s. nonzero (positive) random variable. Thus, (A1) holds.

April 29, 2013

16:20

World Scientific Review Volume - 9in x 6in

chapter5

E. K. P. Chong

90

To verify (E1), we write EG(k) [ε(k + 1)]

⎤ ⎡ N (k)  a(S(k − 1) + i) 1 EG(k) ⎣ J  (S(k − 1) + i)⎦ = EG(k) [N (k)] a(S(k − 1) + 1) i=1 ˜ − dθ J(θ(k))







N (k) 



a(S(k − 1) + i) 1 EG(k) ⎣ − 1 J  (S(k − 1) + i)⎦ EG(k) [N (k)] a(S(k − 1) + 1) i=1 ⎤ ⎞ ⎛ ⎡ N (k)  1 ˜ ⎠. EG(k) ⎣ J  (S(k − 1) + i)⎦ − dθ J(θ(k)) (5.17) +⎝ EG(k) [N (k)] i=1 =

By (G1), the first term on the right-hand side of (5.17) is bounded above by ⎡ Nmax (k)   1 ⎣ a(k)EG(k) a(S(k − 1) + 1) i=1   1 |J  (S(k − 1) + i)| . − a(S(k − 1) + i) Using (G3), we obtain 1 1 − ≤ Ba i ≤ Ba Nmax (k). a(S(k − 1) + 1) a(S(k − 1) + i) Hence, we can bound the first term on the right-hand side of (5.17) by ⎤ ⎡ Nmax (k)  a(k)Ba EG(k) ⎣Nmax (k) |J  (S(k − 1) + i)|⎦ ≤ a(k)Ba B. i=1

We use Lemma 1 to bound the second term on the right-hand side of (5.17) by Ka(k)r . Hence, by (G2), ∞ 

a ˜(k)|EG(k) [ε(k + 1)]| ≤

k=1



∞  k=1 ∞  k=1

< ∞.

a ˜(k) (Ba Ba(k) + Ka(k)r )  A1 B

K Ba B + 1+r k2 k



April 29, 2013

16:20

World Scientific Review Volume - 9in x 6in

chapter5

Infinitesimal Perturbation Analysis and Optimization Algorithms

91

Therefore, (E1) holds. To verify (E2), we use the same steps as in the proof of Theorem 3. ˜ Having established the convergence of the subsequence {θ(k)} taken at the start of each busy period, the convergence of the sequence {θ(n)} follows easily from the fact that, asymptotically, the parameter values within each busy period are close to the values at the start of the busy periods. Theorem 5. For the algorithm (5.16), we have θ(n) → θ∗ a.s. Proof. In light of Theorem 4, it remains to show that 

e(k) =

max

1≤n≤N (k)

|θ(S(k − 1) + n) − θ(S(k − 1) + 1)| → 0

a.s.

To this end, let  > 0 be given. By Lemma 1 and (G2), we have e(k) ≤ a(k)Y (k) ≤ A1 Y (k)/k. Therefore, using Chebyshev’s inequality, P {e(k) ≥ } ≤

E[e(k)2 ] A1 E[Y (1)2 ] ≤ . 2  2 k 2

Hence, ∞ 

P {e(k) ≥ } < ∞.

k=1

By the Borel-Cantelli lemma, e(k) → 0 a.s.

As we have pointed out before, our approach to proving the convergence of the algorithm that updates after every service time has the follow˜ ing interesting insight. The subsequence {θ(k)} behaves like the parameter sequence in an algorithm that updates after every busy period. The conver˜ gence of the sequence {θ(n)} is closely tied to the convergence of {θ(k)}. In fact, the sample paths of {θ(n)} will “track” the sample paths of an algorithm that updates after every busy period, and has parameter value ˜ θ(k) in the kth busy period. Therefore, the convergence rate of {θ(n)} is constrained by that of the sequence that updates only after every busy period. A priori, this observation seems counterintuitive. The convergence of algorithms with general update times τ (k), including one that updates after every 5 customers as in [Suri and Leung (1989)], follows from the same argument as we have used here (see [Chong and Ramadge (1993)] for detailed analyses).

April 29, 2013

16:20

92

World Scientific Review Volume - 9in x 6in

E. K. P. Chong

5.4.4. Example Consider a queue with exponential interarrival times, and arrival rate 1. Assume that the service times are given by σ(n, θ) = θs(n), where s(n) is exponentially distributed with mean 1; i.e., θ is a scale parameter of the service time distribution. Take D = [0.001, 0.999] as the constraint set for the control parameter (note that the stability region is [0, 1)). For this system, let the performance function J(θ) be given by (5.2), with F (T, θ) = T +

16 . θ

In this particular example, we can compute the minimizer of the performance function analytically (e.g., using well-known formulas for the M/M/1 queue). We obtain θ∗ = 0.8. It is easy to see that assumptions (Q1–Q4) and (J1–J2) hold in this example. Suppose that we apply optimization algorithms driven by IPA estimates to this example. Our purpose is to illustrate the behavior of the algorithms described in the foregoing. We use a step-size sequence given by a(m) = 0.002/m, and an initial condition of θ(1) = 0.4. Note that (G1–G3) hold. We consider the two algorithms described before: one that updates after every busy period (called the Busy Period algorithm), and one that updates after every customer’s service time (called the Customer algorithm). Theorems 3 and 5 guarantee the convergence of these algorithms for this example. Figure 5.1 (referred to earlier) shows plots of a single realization of the control parameter sequences from the Customer and Busy Period algorithms (taken from [Chong and Ramadge (1990)]). As we can see in Figure 5.1, the control parameter sequences of the two algorithms “track” each other. Note that for a parameter value of θ = 0.8, the average number of customers in a busy period is 5. Therefore, close to the optimal parameter value of 0.8, the Customer algorithm updates five times more frequently, on average, than the Busy Period algorithm. This example provides convincing illustration of the insight we described in the previous section. 5.5. Final Remarks It would ultimately turn out that Suri’s open question was answered in a way that took an unexpected turn. Specifically, the behavior of the PARMSR algorithm in [Suri and Leung (1989)] is tied to that of Fu’s algorithm [Fu (1990)], and the convergence of the former can be analyzed by

chapter5

April 29, 2013

16:20

World Scientific Review Volume - 9in x 6in

Infinitesimal Perturbation Analysis and Optimization Algorithms

chapter5

93

exploiting this connection. Of course, we need not have taken this route. We could have proved the convergence of the PARMSR algorithm directly using any of a number of convergence theorems for stochastic approximation algorithms. In fact, my study of such algorithms in general led to some original ways to analyze them [Wang et al. (1996, 1997); Wang and Chong (1998); Chong et al. (1999)]. Nonetheless, it was the historical unfolding of the issue as described in [Suri (1989)] that led to what I believe is a more insightful way to view the convergence of optimization algorithms driven by IPA. IPA methods and their extensions continue to be a topic of research interest, including their use in optimization algorithms. Going beyond queues, we have built on the basic approach to treat general regenerative systems [Chong and Ramadge (1994)] and general recursive systems [Chong (1994)]. Recent and ongoing work include the use of single-run gradient-based optimization for policy improvement and optimization in Markov decision processes [Chang et al. (2004); Chong et al. (2009)] (see also the chapter by Cao in this book). For example, if a problem is known to have an optimal threshold policy, then optimizing the threshold using a single-run gradient-based algorithm would be a natural approach. This direction seems promising and deserves to be studied further. My journey is just one of many that bear the signature of Professor Y. C. Ho. His legacy continues to leave its mark on my career. In celebration of his 80th birthday, I pay tribute to a much respected teacher and friend. References S. Andradottir (1998). A review of simulation optimization techniques, Proc. 1998 Winter Simulation Conf., pp. 151–158. S. Asmussen (1987). Applied Probability and Queues (John Wiley & Sons, Chichester). H. S. Chang, R. L. Givan, and E. K. P. Chong (2004). Parallel rollout for online solution of partially observable Markov decision processes, Discrete Event Dynamic Systems, 14, 3, pp. 309–341. E. K. P. Chong (1994). A recursive approach to stochastic optimization via infinitesimal perturbation analysis, in Proc. 33rd IEEE Conf. on Decision and Control, pp. 1984–1989. E. K. P. Chong (1995). On-line optimization of queues using infinitesimal perturbation analysis, in P. R. Kumar and P. P. Varaiya (eds.), Discrete Event Systems, Manufacturing Systems, and Communication Networks, Vol. 73 of Institute for Mathematics and its Applications (IMA) Volumes in Mathematics and its Applications (Springer-Verlag), ISBN: 0-387-97987-5, pp. 41–57.

April 29, 2013

16:20

94

World Scientific Review Volume - 9in x 6in

E. K. P. Chong

E. K. P. Chong, C. Kreucher, and A. O. Hero III (2009). Partially observable Markov decision process approximations for adaptive sensing, Discrete Event Dynamic Systems, special issue on Optimization of Discrete Event Dynamic Systems, 19, 3, pp. 377–422. E. K. P. Chong and P. J. Ramadge (1990). On a stochastic optimization algorithm using IPA which updates after every customer, in Proc. 28th Annual Allerton Conf. on Comm., Control and Comput., pp. 658–667. E. K. P. Chong and P. J. Ramadge (1992). Convergence of recursive optimization algorithms using infinitesimal perturbation analysis estimates, Discrete Event Dynamic Systems, 1, 4, pp. 339–372. E. K. P. Chong and P. J. Ramadge (1993). Optimization of queues using an infinitesimal perturbation analysis-based stochastic algorithm with general update times, SIAM J. Control and Optimization, 31, 3, pp. 698–732. E. K. P. Chong and P. J. Ramadge (1994). Stochastic optimization of regenerative systems using infinitesimal perturbation analysis, IEEE Trans. Automat. Control, 39, 7, pp. 1400–1410. E. K. P. Chong, I.-J. Wang, and S. R. Kulkarni (1999). Noise conditions for prespecified convergence rates of stochastic approximation algorithms, IEEE Trans. Inf. Theory, 45, 2, pp. 810–814. M. C. Fu (1990). Convergence of a stochastic approximation algorithm for the GI/G/1 queue using infinitesimal perturbation analysis, J. Opt. Th. Appl., 65, 1, pp. 149–160. P. Glasserman (1991). Gradient Estimation via Perturbation Analysis (Kluwer Academic Publishers, Norwell, Massachusetts). P. Glasserman (1993). Regenerative derivatives of regenerative sequences, Adv. Appl. Prob., 25, pp. 116–139. P. W. Clynn (1986). Optimization of stochastic systems, in Proc. 1986 Winter Simulation Conf., pp. 52–59. P. Heidelberger, X.-R. Cao, M. A. Zazanis, and R. Suri (1988). Convergence properties of infinitesimal perturbation analysis estimates, Management Sci., 34, 11, pp. 1281–1301. Y. C. Ho and C. Cassandras (1983). A new approach to the analysis of discrete event dynamic systems, Automatica, 19, 2, pp. 149–167. Y. C. Ho, X. R. Cao, and C. Cassandras (1983a). Infinitesimal and finite perturbation analysis for queueing networks, Automatica, 19, pp. 439–445. Y. C. Ho, M. A. Eyler, and T. T. Chien (1979). A gradient technique for general buffer storage design in a serial production line, Int. J. Prod. Res., 17, 6, pp. 557–580. Y. C. Ho, M. A. Eyler, and T. T. Chien (1983b). A new approach to determine parameter sensitivities of transfer lines, Management Sci., 29, 6, pp. 700– 714. Y.-C. Ho and X.-R. Cao (1991). Perturbation Analysis of Discrete Event Dynamic Systems (Kluwer Academic Publishers, Norwell, Massachusetts). P. L’Ecuyer and P. Glynn (1994). Stochastic optimization by simulation: Convergence proofs for the GI/G/1 queue in steady-state, Management Science, 40, pp. 1562–1578.

chapter5

April 29, 2013

16:20

World Scientific Review Volume - 9in x 6in

Infinitesimal Perturbation Analysis and Optimization Algorithms

chapter5

95

R. Suri (1989). Perturbation analysis: The state of the art and research issues explained via the GI/G/1 queue, Proc. of the IEEE, 77, 1, pp. 114–137. R. Suri and Y. T. Leung (1989). Single run optimization of discrete event simulations—An empirical study using the M/M/1 queue, IIE Transactions, 21, 1, pp. 35–49. Q.-Y. Tang and H.-F. Chen (1994). Convergence of perturbation analysis based optimization algorithm with fixed number of customer period, Discrete Event Dynamic Systems, 4, 4, pp. 359–375. I.-J. Wang, E. K. P. Chong, and S. R. Kulkarni (1996). Equivalent necessary and sufficient conditions on noise sequences for stochastic approximation algorithms, Adv. Appl. Prob., 28, pp. 784–801. I.-J. Wang, E. K. P. Chong, and S. R. Kulkarni (1997) Weighted averaging and stochastic approximation, Math. Contr., Sig., and Sys., 10, pp. 41–60. I.-J. Wang and E. K. P. Chong (1998). A deterministic analysis of stochastic approximation with randomized directions, IEEE Trans. Automat. Control, 43, 12, pp. 1745–1749. Y. Wardi (1988). Simulation-based stochastic algorithms for optimizing GI/G/1 queues, preprint, Dept. of Industrial Engr., Ben Gurion University of the Negev, Beersheva, Israel.

This page intentionally left blank

Chapter 6 Simulation-based Optimization of Failure-prone Continuous Flow Lines

Xiaolan Xie Centre for Health Engineering Ecole Nationale Supérieure des Mines de Saint Etienne, France Department of Industrial Engineering and Logistics Management Shanghai Jiao Tong University, China This chapter addresses simulation-based gradient estimation and optimization of continuous flow lines under both time dependent and operation dependent failures and show that gradient estimation depends highly on the failure modes. For time dependent failure, simple IPA estimators can be easily established and more efficient gradient estimators can be derived through a (max, +) formulation of the flow dynamics. Further, single sample path optimization algorithm is proposed for optimization of continuous flow model. For operation dependent failures, gradient estimation is more difficult, the IPA leads to biased estimation due the discontinuity of flow dynamics but more complicated conditional perturbation analysis can be used to establish unbiased gradient estimators.

6.1. Introduction Continuous flow models have been widely used for optimal control and design of manufacturing systems. In manufacturing systems especially in high volume production systems, it is often cumbersome to track all individual parts either in performance evaluation or real-time flow control. The number of possible states is huge and is usually beyond reasonable limits. The number of events to consider in a simulation study is very large as a result of large number of ‘‘minor’’ events such as the start and end of each individual part on a machine. Continuous flow models offer an interesting way to reduce the complexity inherent to discrete flow models by approximating the discrete material flows

97

98

X. Xie

with continuous material flows and hence allowing one to focus on important events such as machine failures and demand fluctuations. Further, techniques for optimization of system parameters are simpler than those for optimization of discrete flow models. For example, continuous flow model makes the gradient estimation with respect to system parameters such as buffer capacities and base stock levels possible and hence efficient gradientbased optimization applies to buffer capacity optimization while it is not possible with discrete flow models. Such system parameters are integers in corresponding discrete flow models and the optimization of such parameters requires time-consuming discrete combinatorial optimization techniques. Continuous flow model was first proposed in [26] for failure-prone transfer lines. Since then, there have been increasing interests in both analytical methods or simulation based methods of various continuous flow models for various applications (see [4] for survey). For example, a decomposition-based analytical method was proposed in [3] for performance evaluation of continuous flow lines and the convergence proof was established in [23]. Discrete production networks with finite buffers, unreliable machines and multiple products were approximated by a continuous flow model in [11] and the latter was shown to be superior in speed and very accurate when compared to discrete flow simulation model. For transfer lines subject to operation-dependent failures and with buffers of finite capacity Ci, it was proved in [6] that the throughput rate of the continuous flow model with buffer capacity Ci (respectively Ci + 2) is smaller (respectively larger) than the throughput rate of the corresponding discrete transfer line. This confirms observations made in several simulation studies that continuous flow models are good approximation of high throughput manufacturing systems. The Infinitesimal Perturbation Analysis (IPA) techniques used in this paper have been widely considered for control and optimization of discrete event systems. Motivated by buffer storage optimization in a production line, the first perturbation analysis was developed in the pioneer work of the group of Prof. Y.C. Ho [9] to compute the sensitivity of a performance measure with respect to system parameters in a single simulation run. This is opposite to other simulation-based gradient estimation methods which require multiple simulation runs. IPA calculates directly the sample derivative dL(θ, ξ ) /dθ using information on the nominal trajectory (θ, ξ ) alone. The basic idea is the following: if the perturbations introduced into the trajectory (θ, ξ ) are small enough, then we can assume that the event sequence of the perturbed trajectory (θ + ∆θ, ξ ) remains unchanged from the nominal one. In this case the derivative dL(θ, ξ ) /dθ can be easily calculated ([8, 10]).

Simulation-based Optimization of Failure-prone Continuous Flow Lines

99

Sample path gradient estimation and optimization of continuous flow models have been addressed by many authors. IPA techniques were first developed in [19, 21] for throughput optimization of continuous production lines with operation-dependent failures. A generalized semi-Markov process model was used to establish gradient estimators with respect to maximal production rates. Gradient estimation of the throughput rate with respect to buffer capacity was addressed in [7] for a two-machine continuous flow line subject to operationdependent failures. They proved that IPA estimators are in general biased and proposed gradient estimators using the so-called Smoothed Perturbation Analysis. Contrary to transfer lines subject to operation-dependent failures, IPA estimators are proved unbiased in [24, 25] for continuous flow production networks subject to time-dependent failures and simulation-based optimization was applied for optimal buffer capacity allocation. Motivated by flow control in telecommunication networks, a basic Stochastic Flow Model (SFM) was proposed in [22] and closed-form formula were derived for gradient estimators of loss-related and workload-related metrics with respect buffer size, service rate and inflow rate. This work is later extended in [2] to include buffer control and in [20] for tandem SFM networks. It was shown that the formula of gradient estimators can be implemented for optimization of the actual discrete flow system. IPA techniques have also been applied for optimization of make-to-stock manufacturing systems. Perturbation analysis and large deviation theory were used in [18] for optimization of a base-stock policy subject to service level constraints. Optimal buffer capacity allocation of two-stage manufacturing system was considered in [17]. When evaluated based on the sample path of discrete-part systems, sample derivatives of the fluid model was shown to provide good estimation of related sensitivity measures. Simulation results of the related discrete-part systems showed that stochastic approximation algorithms that use such estimates do converge to optimal or near optimal hedging points. The purpose of this chapter is to show the applicability of infinitesimal perturbation analysis (IPA) for performance optimization of continuous flow models of manufacturing systems. Most practical continuous flow models have two interacted components: a continuous flow component modeling the fluid dynamics and a discrete-event component modeling significant changes of fluid dynamics such as machine failures and repairs. We will show that the applicability of IPA estimation depends strongly on the interactions between these two components. In particularly, we consider manufacturing systems with two failure modes: operation-dependent failures (ODF) under which a machine

100

X. Xie

cannot fail if it is not producing and time-dependent failures (TDF) under which a machine can fail at any time. IPA estimations are shown to be biased for production lines subject to operation-dependent failures while they are unbiased and strongly consistent when time-dependent failures are considered. ODF and TDF are two most important failure models in the literature. It was shown in [1] that 80% of interruptions in manufacturing systems are generated by ODF failures. It was formally proved in [14] that, for the same production line except the failure modes, the system subject to ODF failures have higher throughput rate than that subject to TDF failures. Nevertheless, TDF failures are frequently assumed in the literature especially for flow control of failure-prone systems. The rest of the chapter is organized as follows. Section 6.2 presents both fluid dynamics and discrete event dynamics of two-machine continuous flow lines subject to either ODF or TDF. Section 6.3 presents IPA analysis of these failure-prone production lines and analyzes the correctness of the IPA estimators. Section 6.4 introduces stochastic fluid event graphs as a general framework for modeling general assembly/disassembly production networks subject to TDF failures. Section 6.5 characterizes the fluid dynamics by a system of (min, +) evolution equations and presents properties of sample path performance measures and sample path gradients. Section 6.6 shows how the evolution equation system can be used for asymptotic optimization of the production network by using a single sample path. Section 6.7 concludes the chapter with immediate extensions and possible future research. 6.2. Two-machine Continuous Flow Lines We first consider a continuous flow model of a single-product transfer line composed of two failure-prone machines (M1, M2) and one buffer B (see Fig. 6.1). Material flows continuously from outside the system to machine M1, then flows to the buffer B, and then to the last machine before departing from the system. We assume that M1 is never starved and M2 is never blocked.

Fig. 6.1. A two-machines production line.

Simulation-based Optimization of Failure-prone Continuous Flow Lines

101

xt

θ Case U1 > U2 slope U1 – U2 0

Case U1 < U2

t

Fig. 6.2. Evolution of buffer level.

The normal behavior of the system is illustrated by Fig. 6.2. At the beginning, both machines Mi produce at their maximal production rate Ui. The buffer level x(t) at time t increases if U1 ≥ U2 and it decreases if U1 < U2. This continues until either the buffer capacity θ is reached or the buffer becomes empty. In the former case, the production rate of M1 is reduced to U2 while the production rate of M2 is reduced to U1 in the latter case. In both cases, the buffer level will remain constant. The above process changes when a machine fails and it cannot produce at all during its repair. At this point, the continuous dynamics evolves as above with the maximal production rate of the failed machine set to 0. The continuous dynamics changes again when the failed machine is repaired and its maximal production rate becomes Ui. The remainder of this section is devoted to the formal description of both discrete and continuous dynamics of the system. Let αi(t) be a reliability state of machine Mi at time t with αi(t) = 1 if Mi is up and αi(t) = 0 otherwise. Let x(t) be the quantity of material in buffer B with xi(t) ∈ [0, θ ]. Let ui(t) ∈ [0, Ui] be the production rate of machine Mi. Let yi(t) be the cumulative production of machine Mi till time t. Each machine produces at its maximal feasible rate under the constraints of its maximum rate αi(t)Ui and the buffer capacity. More specifically, when a machine Mi is up, three situations are possible: (i) its input buffer is empty and its production rate is slowed down to the production rate ui–1(t) of its upstream machine; (ii) its downstream buffer is full and its production rate is slowed down to the production rate ui+1(t) of its downstream machine; (iii) it produces at its maximum production rate. This leads to:

U1α1 (t ), if x(t ) < θ or U1α1 (t ) ≤ U 2α 2 (t ) u1 (t ) =  U 2α 2 (t ),if x(t ) = θ and U1α1 (t ) > U 2α 2 (t )

(6.1)

102

X. Xie

U 2α 2 (t ), if x(t ) < 0 or U1α1 (t ) ≥ U 2α 2 (t ) u 2 (t ) =  U1α1 (t ),if x(t ) = 0 and U1α1 (t ) < U 2α 2 (t )

(6.2)

Note that u(t) does not depend on the exact value of x(t). It only depends on whether x(t) = 0 or x(t) = θ, i.e. whether the buffer is full and/or empty. For simplicity, the following assumption is made in the first part of the chapter. Assumption 1. θ > 0, x(0) = 0. Under Assumption 1, the following flow dynamics hold.

x (t ) = u1 (t ) − u2 (t )

(6.3)

0 ≤ x(t ) = y1 (t ) − y2 (t ) ≤ θ

(6.4)

t

yi (t ) = ∫ ui ( s )ds

(6.5)

0

The second part of the system dynamics is related to changes of reliability state of each machine. We make the following assumption: Assumption 2. Each machine Mi is associated with two sequences of positive i.i.d. random variables {TBFik} and {TTRik} with positive mean where TBFik is the k-th up time of machine Mi and TTRik k-th repair time. The dynamic evolution of the reliability state of a machine Mi is characterized by its up/down state αi(t) and its remaining clock ri(t). The remaining clock ri(t) is set initially to its corresponding TBFik or TTRik depending whether Mi is up or down. It ticks down till zero at which the up/down state αi(t) switches. The decrement speed of ri(t) is 1 when the machine is down. Its decrement speed at the up state depends on the reliability model. Assumption 3. The remaining clock ri(t) of an up machine Mi decrements at speed 1 when Mi is subject to Time-Dependent-Failures (TDF) and it decrements at speed ui(t)/Ui if Mi is subject to Operation-Dependent-Failures (ODF). As a result,

 −1,if α i (t ) = 0 or M i is a TDF-machine  ri (t ) =  ui (t ) ,if α1 (t ) = 1 & M i is an ODF-machine −  Ui

(6.6)

Simulation-based Optimization of Failure-prone Continuous Flow Lines

103

Under Assumption 3, the epochs of machine failure and repair events do not depend on the buffer capacity in a production line subject to TDF failures. This is clearly wrong for a production line subject to ODF failures. The above continuous flow model is a piecewise linear system and its dynamics change at the occurrence of different discrete events including the failure of a machine Mi, the repair of Mi, buffer full of B (i.e. x(t) reaches θ ), buffer empty of Bi (i.e. x(t) reaches 0) denoted respectively, Fi, Ri, BF, BE. Flow rates u(t) and decrement rate r (t ) of remaining clocks can be uniquely determined by (i) the initial state of the machines, (ii) the initial full/empty state of the buffers, and (iii) the sequence of events up to time t. This is true for u(t) as (i)–(iii) determine the state of the machines and the full/empty state of the buffers at time t. Let ek ∈{Fi, Ri, BF, BE} be the k-th event. Let tk be the epoch of event ek with t0 = 0. The inter-arrival times of events (time between ek and ek+1) is denoted by δk, i.e., δk = tk+1 – tk. Let ri (t) be the remaining life time (until failure of machine Mi if αi (t) = 1 or repair of machine Mi if αi (t) = 0). The discrete event dynamics of the system can be characterized as follows. The continuous state variables of the system at time t, ∀ t ∈[tk, tk+1) includes:

x(t ) = x(tk ) + (u1 (tk ) − u2 (tk ))(t − tk )

(6.7)

yi (t ) = yi (tk ) + ui (tk )(t − tk )

(6.8)

ri (t ) = ri (tk ) + ri (tk )(t − tk )

(6.9)

Next event epoch δk = tk+1 – tk can be determined as follows:

 ri (tk ) , if ek +1 = Ri or Fi   ri (tk )  x (t k ) δ k = tk +1 − tk =  ,if ek +1 = BE u t (  2 k ) − u1 (tk )  θ − x (t k ) ,if ek +1 = BF   u1 (tk ) − u2 (tk )

(6.10)

The update of discrete state variables αi(t) at the next event epoch is obvious and the Continuous state variables become:

TBFi orTTRi , if ek +1 = Ri or Fi ri (tk +1 ) =   ri (tk ) + ri (tk )δ k ,otherwise

(6.11)

104

X. Xie

if ek +1 = BE 0,  x(tk +1 ) = θ , if ek +1 = BF  x(t ) + (u (t ) − u (t ))δ ,otherwise  k 1 k 2 k k

(6.12)

yi (tk +1 ) = yi (tk ) + ui (tk )δ k

(6.13)

tk +1 = tk + δ k

(6.14)

The performance measure of interest in this chapter is the throughput rate:

L = lim y2 (t ) t t →∞

which is approximated by the following finite-time estimate:

Lt = y2 (t ) t . 6.3. Gradient Estimation of a Two-machine Line The purpose of this section is to derive sample path gradient estimators by using the Infinitesimal Perturbation Analysis, i.e. by assuming that the sequence of event does not change. Assumption 4. There exists an open interval Θ of real such that

e1 (θ ') = e1 (θ ),…, ek (θ ') = ek (θ ), ∀θ ',θ ∈Θ Theorem 1. Under Assumption 1 to 4, the sequence of flow rates u(tk) and clock decrement rates r (tk ) does not change. All continuous state variables are differentiable at θ ∈ Θ and their derivatives can be derived as follows:

 1 ∂ri (tk ) , if ek +1 = Ri or Fi   ri (tk ) ∂θ ∂δ k  1 ∂ x (t k ) = , if ek +1 = BE ∂θ  u2 (tk ) − u1 (tk ) ∂θ  1  ∂ x (t k )   1 −  , if ek +1 = BF ∂θ   u1 (tk ) − u2 (tk ) 

Simulation-based Optimization of Failure-prone Continuous Flow Lines

105

if ek +1 = Ri or Fi 0, ∂ri (tk +1 )  =  ∂ri (tk ) ∂δ k ∂θ  ∂θ + ri (tk ) ∂θ ,otherwise

 0, if ek +1 = BE ∂x(tk +1 )  = 1, if ek +1 = BF ∂θ  ∂ x (t ) ∂δ k  + (u1 (tk ) − u2 (tk )) k ,otherwise  ∂θ ∂θ

∂yi (tk +1 ) ∂yi (tk ) ∂δ = + ui (tk ) k ∂θ ∂θ ∂θ ∂tk +1 ∂tk ∂δ k = + ∂θ ∂θ ∂θ ∂yi (t ) ∂yi (tk ) ∂t = − ui (tk ) k , ∀t ∈[tk , tk +1 ) ∂θ ∂θ ∂θ ∂Lt ∂y2 (t ) = t ∂θ ∂θ Example 1. This example compares IPA estimators with SD (Symmetric Difference) estimators with ∆θ = 0.05. All random variables TBFi and TTRi have mean equal to 1, the maximal production rates Ui are equal to 1. When TBFi and TTRi are exponentially distributed, analytical results are available, which can be used to assess the convergence of the various estimators. The throughput rates can be found in [4 and references therein] and are equal to LODF = 0.5 − (4θ + 6) −1 LTDF = 0.5 − (4θ + 4) −1

leading to ∂LODF ∂θ = (3 + 2θ ) −2 , ∂LTDF ∂θ = (2 + 2θ ) −2 . The simulation results are summarized in Table 6.1 for ODF lines with exponentially distributed TBF and TTR and Table 6.2 for TDF lines with random variables of both exponential distribution (*) and two-stage Erlang distribution (**). Each case is simulated with 20 replications. Both mean and 95% confidence half-widths (in parentheses) are given. Clearly, the IPA estimator is biased for ODF production lines while it is unbiased and strongly consistent for TDF production lines. As observed in most IPA studies, IPA estimators converge very quickly with small variance.

106

X. Xie Table 6.1. Simulation results of ODF lines. c 0.5

1

2

t 103 104 105 106 103 104 105 106 103 104 105 106

dL/dc 0.0625 0.0625 0.0625 0.0625 0.04 0.04 0.04 0.04 0.0204 0.0204 0.0204 0.0204

IPA 0.0304 (0.0010) 0.031 (0.0003) 0.0312 (0.0001) 0.0313 (0.0000) 0.0197 (0.0008) 0.0200 (0.0003) 0.0200 (0.0001) 0.0200 (0.0000) 0.0103 (0.0006) 0.0102 (0.0002) 0.0102 (0.0001) 0.0102 (0.0000)

SD 0.0573 (0.0075) 0.0625 (0.0028) 0.0620 (0.0010) 0.0625 (0.0002) 0.0197 (0.0008) 0.0386 (0.0024) 0.0404 (0.0006) 0.0400 (0.0002) 0.0211 (0.0058) 0.0203 (0.0015) 0.0203 (0.0004) 0.0204 (0.0002)

Table 6.2. Simulation results of TDF lines. c 0.5

1

2

t 103 104 105 106 103 104 105 106 103 104 105 106

dL/dc* 0.1111 0.1111 0.1111 0.1111 0.0625 0.0625 0.0625 0.0625 0.0278 0.0278 0.0278 0.0278

IPA* 0.1133 (0.0112) 0.1109 (0.0035) 0.1111 (0.0013) 0.1111 (0.0005) 0.0639 (0.0094) 0.0624 (0.0030) 0.0624 (0.0009) 0.0625 (0.0003) 0.0277 (0.0054) 0.0278 (0.0017) 0.0278 (0.0006) 0.0278 (0.0002)

SD* 0.1085 (0.3806) 0.1221 (0.0740) 0.1105 (0.0348) 0.1103 (0.0111) 0.0592 (0.3787) 0.0764 (0.0764) 0.0609 (0.0313) 0.0619 (0.0117) 0.0239 (0.3672) 0.0443 (0.0738) 0.0260 (0.0301) 0.0271 (0.0119)

IPA** 0.1389 (0.0130) 0.1418 (0.0031) 0.1421 (0.0015) 0.1419 (0.0003) 0.0620 (0.0090) 0.0638 (0.0021) 0.0639 (0.0008) 0.0638 (0.0003) 0.0214 (0.0042) 0.0217 (0.0014) 0.0219 (0.0006) 0.0218 (0.0002)

SD** 0.1108 (0.2627) 0.1376 (0.0871) 0.1423 (0.0192) 0.1417 (0.0084) 0.0318 (0.2541) 0.0588 (0.0836) 0.0638 (0.0181) 0.0634 (0.0084) –0.0128 (0.2393) 0.0165 (0.0885) 0.0209 (0.0187) 0.0217 (0.0081)

The bias of the IPA estimator for ODF production lines is mainly due to the discontinuity of the sample path with respect the buffer capacity. This happens when machine M1 fails at the same time as the buffer reaches its capacity and M1 is blocked due to the failure of M2. The sample path depends strongly on the order of the two events. If M1 fails right before the buffer full event, then both M1 and M2 will be under repair simultaneously and both will produce to keep the buffer level at the capacity after the repair of M2 in case of quicker repair of M1. If buffer full event occurs first, then M1 is blocked till the repair of M2 at which M1 fails and the buffer drops. This situation is illustrated in Fig. 6.3. Similar discontinuity exists when M2 fails at the same time the buffer becomes empty. Conditional perturbation analysis is used in [7] to derive unbiased and strongly consistent gradient estimators that can still be evaluated in a single simulation run.

Simulation-based Optimization of Failure-prone Continuous Flow Lines

F1

R1 BF

107

R2

xt(θ')

θ'

θ BF 0

R2, F1

xt(θ)

R1 Fig. 6.3. Sample path discontinuity of ODF flow lines.

The dynamics of TDF lines are significantly different from those of ODF lines. The major difference is that the remaining clocks of TDF lines are independent of their buffer capacity. As a result, their sample path functions are continuous.

Theorem 2. Lt is piece-wise linear, Lipschitz continuous, non-decreasing and concave in θ . Further the gradient estimator ∂Lt /∂θ is a subgradient of Lt for all θ . Proof. The first part of the theorem follows from Theorem 2 of [24] with Lt = y2(t). The second part is a direct consequence of the concavity and Theorem 3 of [24]. □ Theorem 3 (unbiasedness). If E[Lt] is differentiable at θ , then E[gt] = ∂E[Lt]/∂θ for any subgradient gt of Lt. Theorem 4 (strong consistency). If the long-run throughput rate L = limt→∞ Lt exists and is differentiable at θ, then limt→∞ gt = ∂L/∂θ , w.p.1 for any subgradient gt of Lt. The above two theorems are special cases of Theorems 7 and 9 of [24]. They establish both the unbiasedness and strong consistency of IPA gradient estimators given in Theorem 1 for TDF lines.

108

X. Xie

6.4. Modeling Assembly/Disassembly Networks Subject to TDF Failures with Stochastic Fluid Event Graphs The purpose of this section is to extend the results of the previous section to more general production networks with assembly and disassembly operations. Given the bias of IPA estimators for two-machines ODF lines, IPA is expected to provide biased gradient estimation for general production networks subject to operation-dependent failures. For this reason, we limit ourselves time-dependent failures hereafter. It was shown in [12] that the errors in throughput rate between ODF and TDF systems are small compared with typical modeling errors for most practical systems. As a result, results of this section are meaningful for design and optimization of general failure-prone production networks. General assembly/disassembly production networks are modeled as a class of fluid Petri nets called fluid event graphs subject to failures. We will introduce fluid events graphs, the continuous and discrete-event dynamics and then show how typical production networks can be modeled. A (ordinary) Petri net is a bipartite directed graph N = (P, T, F, m) where P is a set of places and T is a set of transitions, F ⊆ P × T ∪ T × P is a set of arcs from transitions to places or from places to transitions, m: P → N is the initial marking that assigns to each place a given number of tokens. A transition can fire if each of its input places contains at least one token. The firing of a transition removes a token from each of input places and adds one token into each of its output places. Repeating the firing process leads to a firing sequence. An event graph also called marked graph is a Petri net such that each place has one input transition and one output transition. As a result, each place can be represented by its input transition and its output transition. For this reason, we use in this paper the following notation:

• • • •

N = (P, T, F, m): the event graph, T = (1, …, i, …, I ): the set of transitions, (i, j): place connecting transitions i to j if such a place exists, mij ∀ (i, j) ∈ P: initial marking of place (i, j).

Event graphs are also called decision-free Petri nets since different transitions do not share common input places and there is no conflict in transition firing. Important properties of event graphs include: (a) an event graph is live and reversible if every elementary circuit contains at least one token; (b) the total number of tokens in any elementary circuit remains invariant.

Simulation-based Optimization of Failure-prone Continuous Flow Lines

109

Fig. 6.4. A fluid event graph.

Fluid Petri nets, also called continuous Petri nets, are extension of classical Petri net models to cope with state explosion problem. In contrast to traditional Petri nets in which each place holds a discrete number of tokens, a place in fluid Petri nets holds a fluid and transitions fire continuously according to some firing speed. As a result, marking of a place is a nonnegative real number that we sometimes call the token content of a place. Furthermore, we shall say firing speed instead of firing sequence. General presentations of fluid Petri nets can be found in [5] and hereafter we limit ourselves to fluid event graphs. In a fluid event graph (see Fig. 6.4), we associate to each transition a maximal firing speed and the following notation will be used:

• • • •

Ai: maximal firing speed of transition i, ui(t): firing speed of transition i at time t, xij(t): marking of place (i, j) at time t. Of course, xij(0) = mij, yi: cumulative firing quantity of transition i up to time t. The following assumptions are made in this paper:

Assumption 5. All transitions have finite maximal firing speed, i.e. Ai < ∞ . Assumption 6. The event graph is connected. While Assumption 6 is not restrictive, Assumption 5 forbids immediate transitions and its relaxation will be addressed in the conclusion. Before giving more technical explanation, let us use Fig. 6.4 to illustrate the dynamic behavior of a fluid event graph. Consider the place (1, 3). Clearly,

110

X. Xie

transition t1 and t2 can always fire at their maximal speed, i.e. u1t = A1 and u2t = A2. Assume that A2 > A3. Then transition t3 can fire at its maximal speed and the fluid level of place (1, 3) increases if A1 ≥ A3. Otherwise, t3 fires at its maximal speed until place (1, 3) becomes empty and then it fires at reduced speed A1. The evolution of the fluid content of place (1, 3) is similar that of Fig. 6.2. In general, the fluid level of each place evolves piece-wise linearly according to the firing speeds of its input/output transitions. We extend the notion of marking to any pair of transitions that are not connected directly via a place. This extension is done by means of the classical notion of token distance defined as follows:

mij = min ρ ∈Γ ij

xij (t ) = min ρ ∈Γ ij



mkl

(6.15)



xkl (t )

(6.16)

( k ,l ) ∈ρ

( k , l ) ∈ρ

where Γij is the set of directed path connecting transition i to transition j. With obvious convention, mij = ∞ and xij(t) = ∞ if there is no directed path from i to j. Note that mii = 0 for any transition i. In Fig. 6.4, m14 = m13 + m34 and x14(t) = x13(t) + x34(t). It immediately follows that:

xij (t ) = ui (t ) − u j (t ), ∀i, j ∈T t

(6.17)

yi (t ) = ∫ ui (τ ) d τ , ∀i ∈T

(6.18)

0 ≤ ui (t ) ≤ Ai , ∀i ∈T

(6.19)

xij (t ) ≥ 0, ∀i, j ∈T

(6.20)

0

Relations (6.17) and (6.20) hold for all pairs of transitions (i, j) connected via a place or not. To prove relation (6.17), the fact that each place has only one input transition and one output transition leads to



( k ,l )∈ρ

xkl (t ) = ∑ ( k ,l )∈ρ xkl (0) + yi (t ) − y j (t ), ∀ρ ∈Γ ij

Combining it with (6.16) gives xij (t ) = xij (0) + yi (t ) − y j (t ) and proves (6.17). Further,

xij (t ) = mij + yi (t ) − y j (t ), ∀i, j ∈T

(6.21)

Simulation-based Optimization of Failure-prone Continuous Flow Lines

111

A control policy ui(t) is said feasible if (6.19) and (6.20) hold. A transition can fire at its maximal firing speed if each of its input places has positive marking. Otherwise, it can be shown that, for any place, xji(t) = 0 implies that ui(t) = uj(t) and transition i cannot fire at its maximal speed if Ai > uj (t). It naturally leads to the following firing policy:

{

ui (t ) = min Ai ,

min

( j , i ) ∈P / x ji ( t ) = 0

}

u j (t ) , ∀i ∈T

(6.22)

Assumption 7. The total token flow in each elementary circuit γ is positive, i.e. ∑(i, j) ∈ γ mij > 0. Firing policy (6.22) can be iteratively determined thanks to Assumption 7 and the invariance property of the fluid content in a circuit, i.e.



( i , j )∈γ

xij (t ) = ∑(i , j )∈γ mij > 0.

It can be proved that:

ui (t ) = min A j , ∀i ∈T j / x ji ( t ) = 0

(6.23)

We notice that only transitions immediately preceding transition i via an empty place are considered in (6.22), while all transitions connecting to transition i through a token-free path are considered in (6.23). From this result,

Theorem 5. Given the marking at time t, the firing policy (6.22) maximizes the firing speed of each transition and consequently maximizes the cumulative firing quantities yi(t). This is not surprising since firing policy (6.22) corresponds to the earliest firing policy of discrete event graph.

Proof. For any feasible control, from (6.17) and (6.20), xji(t) = 0 → x ji (t ) = u j (t ) − ui (t ) ≥ 0 → ui (t ) ≤ u j (t ) ≤ Aj . Hence,

ui (t ) ≤ min Aj j / x ji ( t ) = 0

112

X. Xie

which proves the first part of the theorem. The second part will be proved by contradiction. Assume that it does not hold. Then there exists a feasible control u'i (t) such that y'i (t) is greater than yi (t) at some t. Then there exist i* and t ≥ 0 such that y'i* (t) = yi* (t), u'i* (t) > ui* (t) and y'i (t) ≤ yi (t), ∀i ≠ i*. From (6.21), x'ji* (t) = mji* + y'j (t) – y'i* (t) and xji* (t) = mji* + yj (t) – yi* (t). Hence x'ji* (t) ≤ xji* (t). From the first part of the theorem,

u'i* (t ) ≤ min Aj ≤ min Aj = ui* (t ) j / x' ji* ( t ) = 0

j / x ji* ( t ) = 0

which contradicts our earlier assumption and completes the proof.



Definition 1. A stochastic fluid event graph is a event graph (P, T, F, m) in which each place (i, j) contains a fluid mij, each transition i is associated with a maximum firing speed Ai and an ON-OFF process also called failure-repair process. A transition i in ON state can fire up to speed Ai and a transition in OFF state cannot fire at all. The extension to fluid stochastic event graphs is motivated by the need of a general tool to representing failure-prone manufacturing systems. The following notations will be used:

• tn : epoch of n-th transition ON-OFF event with t0 = 0, • τn = tn+1 – tn : inter-arrival time of ON-OFF events, • αi (t) : ON-OFF state of transition i at time t with αi (t) = 1 if it is ON and αi (t) = 0 otherwise, • Aiαi (t) : maximal firing speed of transition i at time t. The dynamics of fluid stochastic event graph is similar to that of fluid event graphs. All results of deterministic fluid event graphs hold with Ai replaced by Aiαi(t). Concerning the underlying stochastic ON-OFF process,

Assumption 8. The ON-OFF process, represented by {αi (t), ∀i ∈ T, ∀t ≥ 0} and {tn, ∀n ≥ 0}, is a given independent stochastic process. In particular, it does not depend on the initial marking mij and the control policy ui (t).

Simulation-based Optimization of Failure-prone Continuous Flow Lines

113

This assumption corresponds to the assumption of time-dependent failures. It implies that the failure and repair process is independent of the machine utilization. This assumption is usually made in flow control of failure-prone manufacturing systems. From the above, a stochastic fluid event graph is hybrid system composed of a discrete-event component characterized by {αi (t), ∀i ∈ T, ∀t ≥ 0} and {tn, ∀n ≥ 0} and a continuous component characterized by relations (6.17)–(6.23). Assumption 8 implies that the continuous component is governed by the discrete-event component but it has no impact on the discrete-event component. Finally, for simplicity, the following shorthand notations will be used:

• Yin = yi (tn) : cumulative firing quantity of transition i at event n, • Xijn = xij (tn) : marking of place (i, j) at event n, • Ain = Aiαi (tn) : maximal firing speed of transition i upon event n. Before presenting the analysis of stochastic fluid event graphs, let us show how to model manufacturing systems using this tool. For this purpose, we consider a production line composed of three machines M1, M2, M3. Products flow from M1 to M2 and then to M3. Different production control mechanisms will be considered. Figure 6.5 is the model of a production line separated by buffers of limited size. Transitions t1, t2 and t3 represent the machines. Places p1 and p3 represent the buffers while places p2 and p4 represent their remaining capacity. Figure 6.6 corresponds to a production line with a global buffer represented by p2 or control by a CONWIP.

p4

p2 h1

t1

p1

h2

t2

p3

Fig. 6.5. A transfer line with limited buffers.

t3

114

X. Xie

p2 h1

t1

p1

t2

t3

p3

Fig. 6.6. A production line with a global buffer.

Figure 6.7 corresponds to a Kanban system. Transitions LUi are load/unload operations. t0 represents the arrival of demand. Places fi are buffers of free kanbans of stage i, places wi represent parts waiting for machine Mi, places pi represent parts ready to move to the next stage. Place p0 model backlogged demand. f1

f2

f3

h1

h2

h3

t0 p0

LU1

w1

t1

p1 LU2

w2

t2

p2 LU3

w3

t3

p3

LU4

Fig. 6.7. A production line controlled by kanbans.

Figure 6.8 models a production line controlled by the so-called base-stock policy, also called surplus control. The base-stock policy allows each machine to produce more than the true demand of the system. This surplus also called echelon-stock, including the parts in the output buffer of the machine and those located in the downstream stages, is limited by a base-stock level. In the Petri net model, the base-stock levels of M1, M2 and M3 are represented respectively by the initial markings of places p2, p4 and p6. Place p0 represents unsatisfied demand and place p5 the inventory of finished products.

115

Simulation-based Optimization of Failure-prone Continuous Flow Lines

t0 p2

h1

p4 h2 p6 h3

t1

p1

t2

t3

p3

p5

p0

t4

Fig. 6.8. Base-stock control.

The control policy represented by Fig. 6.9 is similar to the base-stock policy except that in the new policy the buffers among machines have limited sizes represented by places b1 and b2. This policy is called generalized kanban systems or two-boundary control policy (surplus control and buffer control).

t0 p2

h1

p4 h2 p0

p6 h3

t1

p1

t2 b1

t3

p3

p5

t4

b2 Fig. 6.9. A generalized kanban system.

6.5. Evolution Equations and Sample Path Gradients This section is devoted to the evaluation of the throughput rate and its gradient with respect to firing speed and initial marking of stochastic fluid event graphs. Instead of discrete event simulation, we use here a (max, +)-like evolution equation system to characterize system evolutions at ON-OFF events only without explicit consideration of continuous flow related events such as buffer full and buffer empty.

116

X. Xie

Theorem 6. Between any two ON-OFF events, i.e. tn ≤ t ≤ tn+1, the following evolution equations hold:

yi (t ) = min{Y jn + Ajn (t − tn ) + m ji }, ∀i ∈T

(6.24)

Y jn +1 (t ) = min{Y jn + Ajnτ n + m ji }, ∀i ∈T

(6.25)

j ∈T

j ∈T

The proof of Theorem 2 is is given in [25]. It might appear that equation (6.24) did not cover the case of small t for which yi (t) is close to 0. It is actually correct as mii = 0 and, for t ≤ t1, yi (t) ≤ Yi0 + Aj0(t – t0) + mii = Aj0t and yi (t) is close to 0 for small t. The evolution equations (6.25) play an important role in the remainder. Further, once the sequence of ON-OFF event epochs and the sequence of ONOFF states are known, they can be used to efficiently estimate performance measures such as throughput rate as shown in the following algorithm. Using this algorithm, there is no need to track the complex fluid dynamics. Algorithm 1 (performance evaluation) 1. Initialization: Choose αi0, ri0 for all i ∈ T and set n = t0 = Yi0 = 0, Xij0 = mij where ri0 is the time to state change of transition i 2. Next event time: tn+1 = tn + mini∈T{rin} 3. Determine Yin+1 using (6.25) and Xijn+1 = mij + Yin+1 – Yjn+1 4. Update ON-OFF states, i.e. αin+1, rin+1 5. Go to step 2.

Theorem 7. For all i ∈ T, yi (t) are piece-wise linear, Lipschitz continuous, nondecreasing and concave in (A, m). Furthermore, for all (A, m) ≥ 0 and for all (∆A, ∆m) ≥ 0,

0 ≤ ∆yi (t ) ≤ ∆ A



t + ( Nt + 1) ∆ m

1

(6.26)

where Nt = sup{n : tn < t}, ∆yit = yit (A + ∆A, m + ∆m) – yit (A, m).

Corollary 1. Assume that (A, m) is a function of a real number θ denoted by F(θ ).yi (t) for all i ∈ T are Lipschitz continuous in θ if F(θ ) is Lipschitz continuous. yi (t) are non-decreasing in θ if F(θ ) is non-decreasing. yi (t) are concave in θ if F(θ ) is concave.

Simulation-based Optimization of Failure-prone Continuous Flow Lines

117

Assume now that the parameters (A, m) are functions of a real number θ. As a result, Yin are as well functions of θ and we use also notations Yin(θ ) to stress the dependence on θ. The objective is to derive derivative of Yin(θ ) that will be used for performance optimization.

Theorem 8. Assume that (A, m) is a function of a real number θ denoted by F(θ ). If F(θ ) is continuous and differentiable at θ, then the right and left derivatives of Yin exist and

 ∂+Y jn ∂Aj ∂+ m( j ,i )  ∂ +Yin +1 α jnτ n + = min  +  j ∈Ein ∂θ ∂θ ∂θ   ∂θ

(6.27)

 ∂−Y jn ∂Aj ∂− m( j ,i )  ∂−Yin +1 = max  + α jnτ n +  j ∈Ein ∂θ ∂θ ∂θ   ∂θ

(6.28)

where Ein = { j ∈ T Yin+1 = Yjn + Ajnτn + mji}.

Theorem 9. Under the conditions of Theorem 8, if F(θ ) is concave, then ∂+Yin /∂θ ≤ ∂–Yin /∂θ for i ∈ T and for all n ≥ 0. Further, any gin such that ∂+Yin /∂θ ≤ gin ≤ ∂–Yin /∂θ is a subgradient of Yin. Theorems 4 and 5 lead to the following subgradients of Yin for concave F(θ ):

g in +1 = g jn + a j α jnτ n + µ ji

(6.29)

for any j ∈ Ein where aj is a subgradient of Aj and µij a subgradient of mij. We now use the evolution equations to analysis the properties of the throughput functions Yin(θ ) and their gradients. Rewrite the evolution equations (6.25) as follows:

Yn+1 = Wn ⊗ Yn where the the (min, +) algebra is used, ⊕ denotes the “min” operator and ⊗ the “+” operator, Yn = (Y1n, …, YIn)T, and Wn is I × I matrix with Wnij = Ainτn + mij.

Assumption 9. {Wn, n∈ IN} is a stationary ergodic sequence. Stationary ergodicity of {Wn n∈ IN} is equivalent to the stationary ergodicity of {(αn, rn), n∈ IN} where rin is the remaining time to state change of transition i at time tn+ as tn+1 – tn = min{rin i∈T}.

118

X. Xie

Theorem 10. Assume that the event graph is strongly connected. Under Assumption 9, there exists a constant γ > 0 such that: limn→∞ Yin /n = γ with probability 1 (or w.p. 1 for short) and limn→∞ E[Yin]/n = γ. We notice that γ corresponds to the average firing quantity of a transition in a state. We term it the throughput rate of type 1. The above theorem shows that the throughput rate of type 1 is the same for all transitions if the event graph is strongly connected.

Corollary 2. Suppose that Assumption 9 holds and that the event graph is strongly connected. If there exists µ > 0 such that tn/n → µ w.p. 1, then limt→∞ yi(t)/t = Γ w.p. 1, where Γ = γ/µ. Further, γ and Γ are Lipschitz continuous, non-decreasing and concave in (A, m).

Theorem 11. If (A, m) is a differentiable and concave function of θ and if E[yi(t)] is differentiable at θ, then E[git] = ∂E[yi(t)]/∂θ for any subgradient git of yi(t).

Proof. From Corollary 1, yi(t) is a concave function of θ. Therefore, for all ∆ ≥ 0,

[yi(t, θ + ∆) − yi(t, θ)]/∆ ≤ git ≤ [yi(t, θ) − yi(t, θ − ∆)]/∆.

By taking expectation, we obtain:

[E[yi(t, θ + ∆)] − E[yi(t, θ)]]/∆ ≤ E[git] ≤ [E[yi(t, θ)] − E[yi(t, θ − ∆)]]/∆.

Letting ∆ → 0 leads to E[git] = ∂E[yi(t)]/∂θ. □


Theorem 12. Assume that Assumption 9 holds and that (A, m) is a differentiable and concave function of θ. If the throughput γ defined in Theorem 10 is differentiable at θ, then limn→∞ gin/n = ∂γ/∂θ w.p. 1 for any subgradient gin of Yin. Further, limt→∞ git/t = ∂Γ/∂θ for any subgradient git of yi(t).

Proof. As in the proof of Theorem 11, for all ∆ ≥ 0,

[Yin(θ + ∆) − Yin(θ)]/∆ ≤ gin ≤ [Yin(θ) − Yin(θ − ∆)]/∆.


Letting n → ∞, Theorem 10 implies that w.p.1,

[γ(θ + ∆) − γ(θ)]/∆ ≤ lim inf n→∞ gin/n ≤ lim sup n→∞ gin/n ≤ [γ(θ) − γ(θ − ∆)]/∆.

Letting ∆ → 0 proves limn→∞ gin/n = ∂γ/∂θ. The proof of the last part is easy, as the ON-OFF event times are independent of θ. □

6.6. Optimization of Stochastic Fluid Event Graphs

In this section, we consider the following optimization problem. Let z = (A, m) be all system parameters to optimize. The problem consists in maximizing a criterion function of the throughput rate vector Γ(z) and z, denoted J(Γ(z), z). The following assumptions will be considered:

Assumption 10. J(Γ, z) is non-decreasing in Γ and is concave in (Γ, z).

Assumption 11. limz→∞ J(Α, z) = −∞.

An example of such criteria is J(Γ, z) = ⟨c, Γ⟩ − ⟨p, z⟩ with c ≥ 0 and p > 0, where ⟨c, Γ⟩ is the gain related to throughput rates and ⟨p, z⟩ corresponds to the cost of the firing speeds and markings. An important consequence of Assumption 11 is that limz→∞ J(Γ(z), z) = −∞, since Γ(z) ≤ Α for all z.

We propose here a simulation-based approach for solving the optimization problem. The approach makes use of a single sample path ω = {t0 = 0, α0, t1, α1, …, tN, αN}. Equations (6.25) are used to compute Yin for all i ∈ T and 1 ≤ n ≤ N. We then choose Yin(z)/tn as an estimator of the throughput rate of transition i and maximize J(Γn(z), z), where Γn(z) is the vector of estimated throughput rates whose i-th entry Γin(z) is Yin(z)/tn. The new sample function has the following property.

Theorem 13. Under Assumptions 5 to 10, J(Γn(z), z) is concave in z. If Assumption 11 holds as well, then J(Γn(z), z) reaches its maximum at a finite zn.


The optimization approach can be summarized as follows:

Algorithm 2 (performance optimization)
1. Generate a sample path of failure-repair events ω = {t0 = 0, α0, t1, α1, …, tN, αN}.
2. Apply a concave function optimization technique to obtain zn that maximizes J(Γn(z), z), where the throughput rates Γn(z) are evaluated using Algorithm 1 and the gradients ∂J(Γn(z), z)/∂z are evaluated using equations (6.27)–(6.28).

From the existence of long-run throughput rates, limn→∞ Γn(z) = Γ∞(z) = Γ(z) w.p. 1. Further, let z∞ = argmax{J(Γ∞(z), z)}. The asymptotic optimality of zn is ensured by the following theorem, which motivates the use of a single sample path of finite length for solving our optimization problem.

Theorem 14. Under Assumptions 5 to 11, w.p. 1, (a) J(Γ∞(z), z) is a finite, continuous concave function; (b) limn→∞ J(Γn(zn), zn) = limn→∞ J(Γ∞(zn), zn) = J(Γ∞(z∞), z∞).

Let us notice that the results of this section can be extended to the case where (A, m) is a concave function of parameters of interest θ. In this case, Assumption 11 is replaced by limθ→∞ J(Γ, z) = −∞. Constraints on the system parameters θ can be taken into account. In particular, if θ is limited to a bounded convex set, then Assumption 11 can be removed.

Example 2. To illustrate the approach, let us consider a production line composed of three machines M1, M2 and M3 separated by two buffers B1 and B2. The machines are identical with maximal production rate equal to 1 unit per unit of time. The time to failure of each machine is exponentially distributed with mean equal to 10, and the time to repair is also exponentially distributed with mean equal to 5. The buffer capacity of B1 is h1 and that of B2 is h2. We assume that h1 + h2 = 10 and determine the h1 and h2 that maximize the throughput rate. The Petri net model of this system is given in Fig. 6.5. Clearly, transition ti has maximal firing speed Ai = 1 and its ON-OFF process represents that of machine Mi. From the ergodicity property, without loss of generality, we restrict ourselves to the initial marking m = [0, h1, 0, h2] with h1 = θ, h2 = 10 − θ and 0 ≤ θ ≤ 10, which implies that we start the simulation with empty buffers. Clearly, (A, m) is a concave function of θ. As a result, Yin(θ) is concave


in θ. The marking distances needed in the evolution equations are as follows: m(2,1) = θ, m(3,2) = 10 − θ, m(3,1) = 10, and m(i,j) = 0 for all other pairs (i, j). A sample path of 100000 failure-repair events is generated. Algorithm 2 is applied to maximize the concave function Γ3n(θ) = Y3n(θ)/tn, where Γ3n(θ) is the throughput rate of transition t3 in Fig. 6.5, i.e. the throughput rate of machine M3, evaluated over the partial sample path of the first n failure-repair events. A bisection search with subgradients is used in step 2 of Algorithm 2 to maximize Γ3n(θ). The subgradients of Γ3n(θ) are evaluated using equations (6.27)–(6.28) with ∂Aj/∂θ = 0 for all transitions j, and ∂m21/∂θ = 1, ∂m32/∂θ = −1, ∂mij/∂θ = 0 for all other places (i, j). The simulation results are given in Table 6.3, where the exact performance Γ3(θ) is estimated with n = 1000000 failure-repair events. From the symmetry of the production line, the optimal solution is h1 = h2 = 5. Table 6.3 shows that θn quickly converges to the optimal solution in terms of the exact performance function Γ3(θ), even though the sample performance function Γ3n(θ) converges very slowly. Further, the finite-horizon solution θn is already in the neighborhood of the optimal solution even with a very small number of events n. The variation of the sample optimum θn around the real optimum can be partly explained by the flatness of the performance function Γ3(θ) around the real optimum. For comparison, the throughput rate with h1 = 1 and h2 = 9 is 0.4432.

Table 6.3. Simulation optimization of a TDF line.

      n        h1       h2      Γ3n(θn)    Γ3(θn)
     50      5.495    4.505     0.6801    0.473042
    100      5.495    4.505     0.5360    0.473042
    500      4.933    5.067     0.4829    0.473473
   1000      5.776    4.224     0.4695    0.472432
   2000      5.355    4.645     0.4577    0.473247
   5000      5.214    4.786     0.4618    0.473387
  10000      5.074    4.926     0.4692    0.473462
 100000      4.933    5.067     0.4737    0.473473
     —       5.000    5.000       —       0.473476
     —       1.000    9.000       —       0.443201
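The bisection search with subgradients used in step 2 above is simple enough to sketch. The routine below assumes access to a function `subgrad(theta)` returning a subgradient of the concave sample function Γ3n(θ) on the fixed sample path (computed, e.g., via (6.27)–(6.29)); the routine name and the tolerance are illustrative.

```python
def maximize_concave(subgrad, lo=0.0, hi=10.0, tol=1e-3):
    """Bisection on the sign of the subgradient: for a concave function
    of a scalar theta, a maximizer lies where the subgradient changes sign."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g = subgrad(mid)
        if g > 0:        # function still increasing: maximizer is to the right
            lo = mid
        elif g < 0:      # decreasing: maximizer is to the left
            hi = mid
        else:            # zero subgradient: mid is already a maximizer
            return mid
    return 0.5 * (lo + hi)
```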

Note that a stochastic version of a classical local optimization approach was proposed in [16] to solve a kanban allocation problem for discrete production lines. It was shown that, under some smoothness condition which is usually hard to check, this approach was able to approach the optimal allocation within a small number of local moves.


In practice, the approach proposed in this chapter and the approach proposed in [16] are complementary, especially in the case of high-volume production. The fluid-flow approach proposed in this chapter can be used to provide a good initial solution, and the approach of [16] can then be used to fine-tune this solution to obtain the optimal allocation for a failure-prone discrete manufacturing system.

6.7. Conclusion

In this chapter, we addressed the infinitesimal perturbation analysis of failure-prone continuous flow production lines. We showed that the correctness of IPA gradient estimators depends strongly on the interaction between the discrete event component and the continuous flow component of the failure-prone systems. In particular, we showed that IPA gradient estimators are biased when machines are subject to operation-dependent failures, under which a machine cannot fail when it is not producing. We also showed that IPA gradient estimators are unbiased and strongly consistent when machines are subject to time-dependent failures, under which failure-repair events do not depend on the fluid dynamics of the system. The latter results are established within a quite general framework called stochastic fluid event graphs, which can model assembly/disassembly operations.

For production lines subject to operation-dependent failures, unbiased and strongly consistent gradient estimators were proposed in [7] by conditional perturbation analysis for two-machine lines. Under some conditions, gradients can be evaluated with a single simulation run. Extension to production lines with any number of machines is still open.

For production networks subject to time-dependent failures, some immediate extensions are possible. For stochastic fluid event graphs, the constraint of finite maximal firing speeds can be replaced by the following less restrictive constraints: (i) any immediate transition i, i.e. one with Ai = ∞, is always ON; (ii) for each immediate transition i there exists a path from a finite-speed transition to i; (iii) every immediate transition has an input place that is initially empty. The evolution equations hold when restricted to finite-speed transitions, and hence all other results hold. Actually, fluid event graphs with immediate transitions can be reduced to equivalent event graphs by removing the immediate transitions and adding a place between any pair (i, j) of transitions with marking equal to the minimal fluid content among all paths of immediate transitions connecting i to j. Extension to ON-OFF immediate transitions is significant but still open.


The other immediate extension of fluid event graphs is to allow each transition to have an arbitrary number of states instead of the ON-OFF states considered here. This extension is particularly useful for modeling the arrival process of demand. All results of this chapter can easily be shown to remain true. The evolution equation approach of this chapter provides an elegant way of estimating gradients of throughput rates. One important pending issue is the analysis of the marking process with such approaches; this analysis is necessary for the evaluation and optimization of waiting times and inventory levels of manufacturing systems. In this chapter, fluid travels across the production systems instantaneously, subject only to capacity constraints. Continuous flow models with delays are considered in [13, 15], which show that the introduction of delays makes the fluid dynamics more complicated.

References

[1] Buzacott, J.A. and Hanifin, L.E. (1978). Models of Automatic Transfer Lines with Inventory Banks: A Review and Comparison. AIIE Trans., 10(2), 197–207.
[2] Cassandras, C.G., Wardi, Y., Melamed, B., Sun, G., and Panayiotou, C.G. (2002). Perturbation Analysis for On-line Control and Optimization of Stochastic Fluid Models. IEEE Trans. on Automatic Control, 47(8), 1234–1248.
[3] Dallery, Y., David, R., and Xie, X. (1989). Approximate Analysis of Transfer Lines with Unreliable Machines and Finite Buffers. IEEE Transactions on Automatic Control, 34(9), 943–953.
[4] Dallery, Y. and Gershwin, S.B. (1992). Manufacturing Flow Line Systems: A Review of Models and Analytical Results. Queueing Systems, 12, 3–94.
[5] David, R. and Alla, H. (1992). Petri Nets and Grafcet: Tools for Modeling of Discrete Event Systems (Prentice-Hall, London).
[6] David, R., Dallery, Y., and Xie, X. (1990). Properties of Continuous Models of Transfer Lines with Unreliable Machines and Finite Buffers. IMA Journal of Mathematics Applied in Business & Industry, 6, 281–308.
[7] Fu, M. and Xie, X. (2002). Derivative Estimation for Buffer Capacity of Continuous Transfer Lines Subject to Operation-dependent Failures. J. Discrete Event Dynamic Systems: Theory and Applications, 12, 447–469.
[8] Glasserman, P. (1990). Gradient Estimation via Perturbation Analysis (Kluwer Academic Publisher).
[9] Ho, Y.C., Eyler, A., and Chien, T.T. (1979). A Gradient Technique for General Buffer Storage Design in a Serial Production Line. International Journal of Production Research, 17, 557–580.


[10] Ho, Y.C. and Cao, X.R. (1991). Perturbation Analysis of Discrete Event Dynamic Systems (Kluwer Academic Publishers, Boston).
[11] Kouikoglou, V.S. and Phillis, Y.A. (1997). A Continuous-flow Model for Production Networks with Finite Buffers, Unreliable Machines, and Multiple Products. International Journal of Production Research, 35(2), 381–397.
[12] Li, J. and Meerkov, S.M. (2009). Production Systems Engineering (Springer).
[13] Mourani, I., Hennequin, S., and Xie, X. (2006). Optimization of Failure-prone Continuous-flow Transfer Lines with Delays and Echelon Base Stock Policy using IPA. Proc. 45th IEEE Conference on Decision and Control (CDC2006).
[14] Mourani, I., Hennequin, S., and Xie, X. (2007). Failure Models and Throughput Rate of Transfer Lines. International Journal of Production Research, 45(8), 1835–1859.
[15] Mourani, I., Hennequin, S., and Xie, X. (2008). Simulation-based Optimization of a Single-stage Failure-prone Manufacturing System with Transportation Delay. International Journal of Production Economics, 112(1), 26–36.
[16] Panayiotou, C.G. and Cassandras, C.G. (1999). Optimization of Kanban-based Manufacturing Systems. Automatica, 35, 1521–1533.
[17] Panayiotou, C.G. and Cassandras, C.G. (2006). Infinitesimal Perturbation Analysis and Optimization for Make-to-stock Manufacturing Systems Based on Stochastic Fluid Models. Journal of Discrete Event Dynamic Systems, 16(1), 109–142.
[18] Paschalidis, I.C., Liu, Y., Cassandras, C., and Panayiotou, C.G. (2004). Inventory Control for Supply Chains with Service Level Constraints: A Synergy between Large Deviations and Perturbation Analysis. Annals of Operations Research, 126, 231–258.
[19] Shi, L., Fu, B.-R., and Suri, R. (1999). Sample Path Analysis for Continuous Tandem Production Lines. J. Discrete Event Dynamic Systems: Theory and Applications, 9(3), 211–239.
[20] Sun, G., Cassandras, C.G., and Panayiotou, C.G. (2004). Perturbation Analysis and Optimization of Stochastic Flow Networks. IEEE Trans. on Automatic Control, 49(12), 2113–2128.
[21] Suri, R. and Fu, B.-R. (1994). On Using Continuous Flow Lines to Model Discrete Production Lines. J. Discrete Event Dynamic Systems: Theory and Applications, 4, 127–169.
[22] Wardi, Y., Melamed, B., Cassandras, C.G., and Panayiotou, C.G. (2002). IPA Gradient Estimators in Single-node Stochastic Fluid Models. Journal of Optimization Theory and Applications, 115(2), 369–406.
[23] Xie, X. (1993). Performance Analysis of a Transfer Line with Unreliable Machines and Finite Buffers. IIE Transactions, 25(1), 99–108.
[24] Xie, X. (2002a). Evaluation and Optimization of Two-stage Continuous Transfer Lines Subject to Time-dependent Failures. Discrete Event Dynamic Systems: Theory and Applications, 12, 109–122.


[25] Xie, X. (2002b). Fluid-stochastic-event Graphs for Evaluation and Optimization of Discrete-event Systems with Failures. IEEE Transactions on Robotics and Automation, 18(3), 360–367.
[26] Zimmern, B. (1956). Etude de la propagation des arrêts aléatoires dans les chaînes de production. Revue de Statistique Appl., 4, 85–104.


Chapter 7

Perturbation Analysis, Dynamic Programming, and Beyond

Li Xia
Center for Intelligent and Networked Systems, Department of Automation, TNList, Tsinghua University, Beijing 100084, China

Xi-Ren Cao
Department of Finance and the Key Laboratory of System Control and Information Processing of the Ministry of Education, Department of Automation, Shanghai Jiao Tong University, China, and Institute of Advanced Study, Hong Kong University of Science and Technology, China∗

The main idea of perturbation analysis is that a sample path of a discrete event dynamic system contains information about not only the performance itself, but also its derivative. The development of the past 30+ years has testified to the power of this insightful idea. It is now clear that a sample path of a Markov system under a policy contains some information about the performance of the system under other policies; with this view, policy iteration is simply a discrete version of the gradient descent method, and each iteration can be implemented based on a single sample path. This view leads to the direct-comparison based approach to performance optimization, which may be applied to cases where dynamic programming does not work well. In this chapter, dedicated to Prof. Ho’s 80th birthday, we document the research along this direction, especially the parts beyond dynamic programming.

∗This research was supported in part by the Collaborative Research Fund of the Research Grants Council, Hong Kong Special Administrative Region, China, under Grant No. HKUST11/CRF/10, the 111 International Collaboration Project (B06002), the National Natural Science Foundation of China (61221003, 61203039, 60736027), the Specialized Research Fund for the Doctoral Program of Higher Education (20120002120009), and the Tsinghua National Laboratory for Information Science and Technology (TNList) Cross-discipline Foundation.


7.1. Introduction

Perturbation analysis (PA) was first proposed for queueing systems by Prof. Y. C. Ho et al. in the early 1980s [29, 32]; its main idea was to estimate the performance derivatives with respect to system parameters by analyzing a single sample path of a stochastic dynamic system. It indicates that a system’s sample path contains information not only about the system performance, but also about the performance derivatives. The development in the past three decades has testified to the power of this highly innovative idea. In this chapter, dedicated to Prof. Ho’s 80th birthday, we briefly document the research development starting from PA that the authors have been involved in, and we hope this may illustrate, from one angle, the deep influence of Prof. Ho’s insightful initial work. The chapter represents our own understanding; many other important works are presented in other chapters.

The early works of PA focus on queueing systems, mostly with a finite horizon performance; the approach was to develop an efficient algorithm to calculate the sample derivative as an estimate for the derivative of the average performance [29]. It was realized in the mid 1980s that for the PA-based sample derivative obtained from a single sample path to be an unbiased estimate, conditions are required for exchanging the order of expectation and differentiation [2]. Since then, one of the major research directions has been developing efficient algorithms for the sample derivatives of various systems, proving the unbiasedness of the sample derivative as an estimate of the derivative of the average performance, and developing new techniques when the unbiasedness does not hold. There have been many excellent papers and books in this research direction, including [3, 5, 18, 19, 21–24, 26, 28, 30, 31], and we certainly cannot cite all of them. We will not review them in detail, except mentioning here that the idea has recently been applied to analyze the performance of non-linear behavior in financial engineering, where a new property called mono-linearity was discovered, leading to some new insights into this subject [14].

In this chapter, we mainly focus on another research direction starting from PA. The main ideas leading to this direction are as follows. First, the effect of a parameter change on a performance can be decomposed into the sum of the effects of a sequence of single perturbations, each of which can be measured by a quantity called the perturbation realization factor. When the perturbation generated by the parameter change is infinitesimal, this decomposition leads to the performance derivative, and when the


perturbation generated is finite, it leads to the performance difference. This idea applies to both queueing systems [2, 4, 6, 23, 29] and general Markov systems [13, 17], and to both finite and infinite horizon performance. For Markov systems, the realization factor of a perturbation, which represents a jump from one state to another, equals the difference of the performance potentials of these two states; the latter measure the “contribution” of a state to the system performance. This approach overcomes the difficulty involved with the exchangeability.

The second idea is a fundamental observation: because the policy space is usually extremely large, exhaustive search for an optimal policy is infeasible. Therefore, if we do not know anything about the property of the performance curve on the policy space, to develop feasible methods for performance optimization we need to compare the performance of any two policies by analyzing only one of them. (If comparing two policies required analyzing both of them, then comparing all the policies would require analyzing all of them, equivalent to an enumerative search of the policy space.) The realization factor approach provides a simple way to implement this single-policy based comparison: the realization factors can be determined by analyzing one of the policies. With the realization factors, the original idea of PA, i.e., that a system’s sample path contains information not only about the system performance but also about the performance derivatives, extends to finite parameter changes; i.e., under some mild conditions such as the Markov structure, a system’s sample path under a policy also contains some information about the performance under other policies. This establishes a new approach to performance optimization, based simply on a direct comparison of the performance of any two policies. This approach rests on the performance difference equation, with no dynamic programming. Compared with the standard dynamic programming approach, it is simple and intuitive, and may be applied to problems where dynamic programming fails to work.

In this chapter, we mainly review the research along the direction of the direct-comparison based approach. This approach was first applied to Markov decision processes (MDPs), where it was shown that the standard results such as the HJB optimality equations and policy iteration can be easily established, and Q-learning and other techniques can be derived; in addition, the study has led to a new topic, gradient-based learning (called policy gradient in the reinforcement learning community). Furthermore, because of its simplicity and intuitiveness, the approach also motivates the study of many problems with relatively complex performance, such


as the N-bias and the sample-path based variance. The N-bias optimality eventually leads to the Blackwell optimality as N goes to infinity [12, 16, 39]; the final results are equivalent to the N-discount optimality [41, 49], but the approach is simpler and more direct, in the sense that no discounting is involved.

The standard dynamic programming requires two assumptions, among others: it assumes that the actions taken at different states can be chosen independently, and that the performance to be optimized is time consistent; that is, the optimal policy for the performance starting from time t + 1 is also optimal for the performance starting from t. However, in many practical problems, such as problems involving aggregation or event-triggered actions, the independent-action assumption is violated. We have applied the direct-comparison based approach to study these problems. The study of time-inconsistent performance is underway.

The direct-comparison based approach also motivates some new ideas in well-established subjects, such as the optimization of queueing systems. There are already rich studies on the optimal control of queueing systems. The traditional approaches usually model the optimization problem of Markovian queues as a Markov decision process and use the optimality equation to analyze and develop the related algorithms. However, for complex queueing systems, such as queueing networks, the associated optimality equation is too complicated for effective analysis. By applying the direct-comparison based approach, we can derive the difference equation for queueing systems, which is more efficient than the derivative equation in the traditional PA theory [44]. The direct comparison and the difference equation of queueing systems give a new perspective, and some new results are derived that are difficult to obtain in the traditional queueing theory, such as the Max-Min optimality of service rate control [48] and the optimal admission control of queueing networks [46]. The structure of the relations among the above topics is illustrated in Fig. 7.1.

The rest of this chapter is organized as follows. First, we introduce the traditional PA theory in queueing systems. Then, we briefly introduce the development of the performance optimization of Markov systems. Finally, we discuss further advancements on some emerging topics, including a new optimization framework called event-based optimization and some new ideas based on PA in financial engineering. The concept of the perturbation realization factor (or performance potential for Markov systems) is the overall thread throughout the chapter.


[Fig. 7.1. Summary of the chapter: some researches motivated by PA. The figure relates perturbation analysis (sample derivative and perturbation realization) to the performance derivative and the performance difference; the former leads to gradient-based optimization (policy gradient) and to the analysis of non-linear behavior in financial engineering (mono-linearity), while the latter leads to policy iteration and the HJB equation in MDPs, Q-learning, complex performance criteria (N-bias, sample-path variance), event-based optimization, time-inconsistent performance in financial engineering, and a new approach to queueing optimization.]

7.2. Perturbation Analysis of Queueing Systems Based on Perturbation Realization Factors

The PA theory for queueing systems reviewed in this chapter is based on the perturbation realization factor, which measures the long-term effect of a single perturbation on the system average performance [4, 6]. With perturbation realization factors as building blocks, the performance derivative equation and the performance difference equation with respect to system parameters are derived, and the gradient-based algorithm and the policy iteration algorithm can be developed, respectively. In this section, we mainly introduce these two types of sensitivity equations based on perturbation realization factors. Other parts of the PA theory, such as IPA, FPA, SPA, and PA for fluid models, are omitted for reasons of space; they are covered in other parts of this book and in the literature [6, 19, 24–26, 28, 31].

7.2.1. Performance gradient

Below, we use a typical queueing model, the closed Jackson network, to introduce the key results of the traditional PA theory in queueing systems. Note that similar principles of PA theory can also be applied to other queueing systems [6, 21, 28, 35]. Consider a closed Jackson network (also called a Gordon-Newell network [27]) with M servers and N customers. The total number of customers in the network is a constant N; that is, there is no customer arrival to or departure from the network. Customers transit among the servers and receive service. The service time at every server obeys an exponential distribution. We denote the service rate of


server i as μ_i, i = 1, 2, . . . , M. When a customer finishes its service at server i, it transits to server j with routing probability q_ij, i, j = 1, 2, . . . , M. Obviously, we have Σ_{j=1}^M q_ij = 1 for all i. The service discipline of the servers is first come first serve (FCFS). The waiting capacity of the servers is adequate and there are no customer overflows or losses. The number of customers (including the customer being served) at server i is denoted as n_i. The system state is denoted as n := (n_1, n_2, . . . , n_M). The state space is defined as S := {all n : Σ_{i=1}^M n_i = N}. Let n(t) := (n_1(t), n_2(t), . . . , n_M(t)) be the system state at time t, where n_i(t) is the number of customers at server i at time t, i = 1, 2, . . . , M, t ≥ 0. Define T_l as the lth state transition time of the stochastic process n(t); T_l also indicates the time at which the entire network has served l customers. The cost function of the system is denoted as f(n), n ∈ S. We define the time-average performance η of the system as

η := lim_{T→∞} (1/T) ∫_0^T f(n(t)) dt = lim_{T→∞} F_T/T,  (7.1)

where F_T := ∫_0^T f(n(t)) dt is the accumulated performance until T. As a comparison, we define another performance metric called the customer-average performance η_C as

η_C := lim_{l→∞} (1/l) ∫_0^{T_l} f(n(t)) dt = lim_{l→∞} F_{T_l}/l.  (7.2)

When the network is strongly connected (any customer may visit every server in the network), n(t) is ergodic. Thus, the limits in (7.1) and (7.2) exist with probability 1. It is clear that η is the system performance averaged per unit of time, while η_C is the system performance averaged per served customer. These two performance metrics are both important for queueing systems and they have the following relation:

η_C = lim_{l→∞} (T_l/l) (1/T_l) ∫_0^{T_l} f(n(t)) dt = η/η_th = η η_I,  (7.3)

where η_th := lim_{l→∞} l/T_l is the average throughput of the entire network, and η_I is a special case of η_C with f(n) = I(n) ≡ 1 for all n. That is, we have

η_I := lim_{l→∞} (1/l) ∫_0^{T_l} 1 dt = lim_{l→∞} T_l/l = 1/η_th.  (7.4)

Generally speaking, the optimizations of η and η_C are different and we have to develop specific approaches for each of them [44].
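Both performance metrics are easy to estimate on a single sample path. The following is a minimal discrete-event sketch of such an estimator; the service rates, routing matrix, and cost function in the example are illustrative placeholders.

```python
import numpy as np

def simulate_closed_jackson(mu, Q, f, N, n_services, seed=0):
    """Estimate the time-average performance eta (7.1) and the
    customer-average performance eta_C (7.2) of a closed Jackson
    network from one sample path (exponential FCFS servers)."""
    rng = np.random.default_rng(seed)
    M = len(mu)
    n = np.zeros(M, dtype=int)
    n[0] = N                                      # all customers start at server 1
    t, F = 0.0, 0.0                               # clock and accumulated cost F_T
    for _ in range(n_services):
        busy = np.where(n > 0)[0]
        total_rate = mu[busy].sum()
        tau = rng.exponential(1.0 / total_rate)   # time to the next completion
        F += f(n) * tau
        t += tau
        i = busy[rng.choice(len(busy), p=mu[busy] / total_rate)]
        j = rng.choice(M, p=Q[i])                 # route the served customer
        n[i] -= 1
        n[j] += 1
    return F / t, F / n_services                  # eta and eta_C

# example: three servers in a cycle, cost = queue length at server 1
mu = np.array([1.0, 1.2, 0.8])
Q = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
eta, eta_C = simulate_closed_jackson(mu, Q, lambda n: n[0], N=5, n_services=100_000)
```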


Suppose that the service rate of server i has a very small change Δμ_i; that is, the service rate is changed from μ_i to μ_i + Δμ_i, where Δμ_i → 0. With the inverse transform method, we have s = −(1/μ_i) ln ζ, where ζ is a uniformly distributed random number in [0, 1] and s is the random service time obeying an exponential distribution with rate μ_i. For the same randomness ζ, when the service rate is changed from μ_i to μ_i + Δμ_i, the random service time s will be changed to

s′ = −(1/(μ_i + Δμ_i)) ln ζ = −((1 − Δμ_i/μ_i)/μ_i) ln ζ + o(Δ).

The perturbation of the random service time can be written as

Δs = s′ − s = −(Δμ_i/μ_i) s.  (7.5)
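Relation (7.5) is easy to check numerically with common random numbers, as in the small sketch below (the rates are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, dmu = 2.0, 1e-4
zeta = rng.random(100_000)              # common random numbers
s_old = -np.log(zeta) / mu              # original service times
s_new = -np.log(zeta) / (mu + dmu)      # perturbed service times
# per (7.5): s_new - s_old  ~  -(dmu / mu) * s_old
print(np.allclose(s_new - s_old, -(dmu / mu) * s_old, rtol=1e-3))  # True
```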

Therefore, during the simulation process, the service time of each customer at server i will have a perturbation whose amount is proportional to its original service time, with scaling factor −Δμ_i/μ_i. After the perturbations are generated, they are propagated throughout the entire network according to some specific rules [6]. The effect of such a single perturbation on the accumulated performance can be quantified by a quantity called the perturbation realization factor. Consider a single perturbation Δ of the service time of server i when the system is at state n. The perturbation realization factor is denoted as c^(f)(n, i), n ∈ S, i = 1, . . . , M, and it measures the long-term effect of this perturbation on the total accumulated performance F_{T_l}. Theoretically, c^(f)(n, i) is defined as

c^(f)(n, i) := lim_{l→∞} lim_{Δ→0} E[ΔF_{T_l}/Δ] = lim_{l→∞} lim_{Δ→0} E[(F′_{T′_l} − F_{T_l})/Δ]
            = lim_{l→∞} lim_{Δ→0} (1/Δ) E[ ∫_0^{T′_l} f(n′(t)) dt − ∫_0^{T_l} f(n(t)) dt ],  (7.6)

where n′(t) is the stochastic process of the state of the perturbed system (with the perturbation Δ of the service time of server i at time 0) at time t, and T′_l is the lth service completion time of the perturbed system. The value of c^(f)(n, i) can be numerically obtained by solving the following set of linear equations [6]:

If n_i = 0, then c^(f)(n, i) = 0;  (7.7)

Σ_{i=1}^M c^(f)(n, i) = f(n);  (7.8)

and, for the remaining state-server pairs,


(Σ_{k=1}^M 1_{n_k>0} μ_k) c^(f)(n, i) = Σ_{k=1}^M Σ_{j=1}^M 1_{n_k>0} μ_k q_kj c^(f)(n_{−k,+j}, i)
    + Σ_{j=1}^M μ_i q_ij (1 − 1_{n_j>0}) [c^(f)(n_{−i,+j}, j) + f(n) − f(n_{−i,+j})];  (7.9)

where n_{−i,+j} := (n_1, . . . , n_i − 1, . . . , n_j + 1, . . . , n_M) is a neighboring state of n with n_i > 0, and 1_{n_i>0} is an indicator function defined as: 1_{n_i>0} = 1 if n_i > 0, and 1_{n_i>0} = 0 otherwise. On the other hand, c^(f)(n, i) can also be estimated statistically from a single sample path. For more details, readers can refer to the literature [6, 45].

With c^(f)(n, i) as building blocks, we can derive the performance derivative equation of the PA theory in queueing systems in an intuitive way as follows. Suppose the service rate μ_i is decreased to μ_i − Δμ_i, where Δμ_i is an infinitesimal amount. According to the inverse transform method, every service time at server i will have a delay (Δμ_i/μ_i) s, where s is the random service time originally generated. During a period T_l ≫ 1, the total amount of service-time delay accumulated at state n is T_l π(n) Δμ_i/μ_i, where π(n) is the steady-state probability that the system is at n. Since the effect of a unit delay at (n, i) is quantified by c^(f)(n, i), the total effect of −Δμ_i on the accumulated performance F_{T_l} is

ΔF_{T_l} = (Δμ_i/μ_i) T_l Σ_{n∈S} π(n) c^(f)(n, i).  (7.10)

Dividing both sides by −Δμ_i l and letting l → ∞ and Δ → 0, we derive the following performance derivative equation:

(μ_i/η_I) dη_C/dμ_i = − Σ_{n∈S} π(n) c^(f)(n, i).  (7.11)

With (7.11), we can develop gradient-based optimization algorithms for the system parameters, such as the service rates, arrival rates, or routing probabilities. The performance gradients can be directly estimated from the sample path, or indirectly obtained by utilizing the estimates of c^(f)(n, i) and (7.11). In fact, the traditional PA approach in queueing systems can be viewed as an efficient implementation of the above derivative equation. Details can be found in the literature [6, 24, 30, 44, 45].
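A driver for such a gradient-based search might look like the sketch below; `estimate_grad` stands for a hypothetical single-run routine returning the estimated vector of dη_C/dμ_i (built, e.g., from sample-path estimates of π(n), c^(f)(n, i), and η_I via (7.11)), and the step sizes and bounds are illustrative.

```python
import numpy as np

def optimize_service_rates(mu0, estimate_grad, steps=200,
                           lr=0.05, mu_min=0.1, mu_max=10.0):
    """Projected gradient descent on the customer-average cost eta_C,
    driven by single-sample-path gradient estimates per (7.11).
    estimate_grad is a hypothetical estimator supplied by the user."""
    mu = np.array(mu0, dtype=float)
    for k in range(steps):
        grad = estimate_grad(mu)                # estimated d eta_C / d mu_i
        mu -= lr / np.sqrt(1.0 + k) * grad      # diminishing step size
        mu = np.clip(mu, mu_min, mu_max)        # keep the rates feasible
    return mu
```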


7.2.2. Policy iteration

Gradient-based optimization suffers from intrinsic deficiencies, such as trapping into local optima and the difficulty of selecting proper step sizes. Fortunately, in recent studies we find that the perturbation realization factors contain not only derivative sensitivity information, but also difference sensitivity information. For the optimization of service rates in a closed Jackson network, the following difference equation is derived in [44] when only one particular service rate μ_{i,n} is changed to μ′_{i,n}:

η′_C − η_C = η′_I π′(n) [ (−Δμ_{i,n}/μ_{i,n}) c^(f)(n, i) + h(n) ],  (7.12)

where h(n) := f′(n) − f(n), and η′_I, π′(n), and f′(n) are the corresponding quantities of the perturbed system with the new service rate μ′_{i,n}. For the service rate control problem, we can further write f′(n) = f(n, μ′_n), where μ′_n := (μ_{1,n}, . . . , μ′_{i,n}, . . . , μ_{M,n}) in this scenario.

When the service rates of server i at all states n are changed from μ_{i,n} to μ′_{i,n}, n ∈ S, the difference equation of η_C is obtained similarly:

η′_C − η_C = η′_I Σ_{n∈S} π′(n) [ (−Δμ_{i,n}/μ_{i,n}) c^(f)(n, i) + h(n) ].  (7.13)

Furthermore, when the service rates of all servers i at all states n are changed from μ_{i,n} to μ′_{i,n}, i = 1, 2, . . . , M, n ∈ S, the difference equation is derived as follows by combining (7.8):

η′_C − η_C = η′_I Σ_{n∈S} π′(n) [ f(n, μ′_n) − Σ_{i=1}^M (μ′_{i,n}/μ_{i,n}) c^(f)(n, i) ],  (7.14)

where μ′_n := (μ′_{1,n}, μ′_{2,n}, . . . , μ′_{M,n}) in this scenario. From (7.14), we can easily derive the optimality (HJB) equation for the service rate control problem: a set of service rates μ*_n = (μ*_{1,n}, μ*_{2,n}, . . . , μ*_{M,n}), n ∈ S, is optimal (for the minimization problem) if and only if it satisfies the optimality (HJB) equation

f(n, μ_n) − Σ_{i=1}^M (μ_{i,n}/μ*_{i,n}) c^(f)(n, i)* ≥ 0,  ∀ μ_n, n,  (7.15)

where c^(f)(n, i)* are the perturbation realization factors corresponding to the policy L* := {μ*_n, n ∈ S}, and the policy space is denoted as Ψ := {all L}. The correctness of (7.15) follows directly from (7.14), since π′(n) and η′_I are always positive.


Based on the difference equation (7.14) and the optimality equation (7.15), we have the following policy iteration algorithm for minimizing the customer-average performance of queueing systems.

Algorithm 1. Policy iteration algorithm for the optimization of the customer-average performance of state-dependent closed Jackson networks.
(1) Initialization: Choose an arbitrary policy L0 ∈ Ψ as the initial policy, and set k = 0.
(2) Evaluation: At the kth iteration, with the policy denoted as Lk = {μ̃_n, n ∈ S}, calculate or estimate the perturbation realization factors c^(f)(n, i), i = 1, 2, . . . , M, n ∈ S, under policy Lk.
(3) Improvement: Choose the policy of the next (the (k + 1)th) iteration as Lk+1 = {μ̂_n, n ∈ S} with

μ̂_n = arg min_{μ_n} { f(n, μ_n) − Σ_{i=1}^M (μ_{i,n}/μ̃_{i,n}) c^(f)(n, i) },  n ∈ S.  (7.16)

If at a state n the current μ̃_n already attains the minimum of the bracket above, then set μ̂_n = μ̃_n.
(4) Stopping Rule: If Lk+1 = Lk, stop; otherwise, set k := k + 1 and go to step 2.

Compared with gradient-based optimization, policy iteration is much more efficient. During each iteration, policy iteration always finds a strictly better policy (usually a much better one), while gradient-based optimization searches only within a near neighborhood; thus, policy iteration usually has a much faster convergence speed. Moreover, policy iteration converges to the global optimum, while gradient-based optimization can be trapped in a local optimum. This is a prominent improvement over the gradient-based optimization of the traditional PA theory.

As we see, (7.14) initiates a new approach to the performance optimization of queueing systems. Since (7.14) provides a very clear relation between the performance and the parameters, it may solve some problems which are difficult for the traditional approaches of queueing theory. For example, the service rate control problem is richly studied in queueing theory, but it is difficult to solve in a complicated queueing network, such as a closed Jackson network, because the traditional approach usually uses the MDP formulation, whose associated optimality equation is too complicated for effective analysis.


However, based on (7.14), we easily derive the Max-Min optimality for this service rate control problem: when the cost function is convex with respect to the service rate, the optimal service rate is either maximal or minimal. This result is more profound than those in the existing literature, and the progress benefits from the difference equation (7.14). Details can be found in [48].

In summary, the perturbation realization factor plays a key role in building the performance derivative equation (7.11) and the performance difference equation (7.14). It contains information about not only the performance derivative near the current policy, but also the performance difference under other policies. The perturbation realization factor can be efficiently estimated from a single sample path. Therefore, both the gradient-based optimization and the policy iteration can be implemented based on a single sample path, which makes the algorithms implementable online.

7.3. Performance Optimization of Markov Systems Based on Performance Potentials

During recent decades, rich studies have been conducted to extend the idea of PA theory from queueing systems to Markov systems [9, 17, 21]. In the previous section, we saw that a parameter change is decomposed into a series of perturbations and the total effect equals a weighted sum of perturbation realization factors. The same idea applies to Markov systems, where we use the performance potential as the building block, from which the performance derivative and difference equations are derived. This gives a new perspective on the performance optimization of Markov systems, besides the classical MDP theory. Below, we use a discrete-time Markov chain to briefly introduce this theory (it has also been extended to continuous-time Markov processes, multi-chain Markov systems, semi-Markov processes, etc. [8, 17, 49]). A more detailed introduction and review can be found in the literature [11, 12].

7.3.1. Performance gradients and potentials

Consider an ergodic discrete-time Markov chain X = {X0, X1, X2, . . . }, where X_l is the system state at time l, l = 0, 1, 2, . . . . The state space is finite and denoted as S = {1, 2, . . . , S}, where S is the size of the state space.


The transition probability matrix is denoted as P = [p(j|i)]_{i,j=1}^S. The steady-state distribution is denoted as a row vector π = (π(1), π(2), . . . , π(S)). Obviously, we have Pe = e and πP = π, where e is an S-dimensional column vector whose elements are all 1. The cost function is denoted as a column vector f = (f(1), f(2), . . . , f(S))^T. The long-run average performance of the Markov system is denoted as η and we have

η = E{f(X_l)} = Σ_{i=1}^S π(i) f(i) = πf = lim_{L→∞} (1/L) Σ_{l=0}^{L−1} f(X_l) = lim_{L→∞} F_L/L,  (7.17)

where E denotes the expectation over the steady-state distribution and F_L := Σ_{l=0}^{L−1} f(X_l).

At each state i, we have to choose an action a from the action space A. Different actions a determine different state transition probabilities p^a(j|i), i, j ∈ S, a ∈ A. A policy d is a mapping from the state space S to the action space A, that is, d : S → A. The policy space is denoted as D, and here we consider only deterministic and stationary policies. Since the policy d determines both the transition probability matrix P and the cost function f, for simplicity we also use (P, f) to represent a policy in some situations. The optimization problem is to choose a proper (P, f) to make the corresponding η optimal.

Similar to the parameter perturbations discussed in the previous section, we consider perturbations of the transition probability matrix P of a Markov system. We assume that the states X_l, l = 0, 1, . . . , are observable. For simplicity, we assume the cost function f does not depend on the policy d. Different policies have different P, so we also use P to represent a policy. We want to obtain the performance gradients around the current policy in the policy space by analyzing the system’s behavior under this policy P. In the policy space, along the direction from P to P′, we consider a randomized policy P^δ: at state i the system transits according to p′(j|i), i, j ∈ S, with probability δ, and according to p(j|i), i, j ∈ S, with probability 1 − δ. That is, the transition probability matrix of the randomized policy is P^δ = (1 − δ)P + δP′, where 0 ≤ δ ≤ 1. P and P′ can be any two policies in the policy space. Let π^δ and η^δ be the steady-state probability and the long-run average performance associated with P^δ. The performance derivative at policy P along the direction ΔP := P′ − P


(from P to P′) is

dη^δ/dδ |_{δ=0} = lim_{δ→0} (η^δ − η)/δ.  (7.18)

Different P′’s represent different directional derivatives in the policy space. Consider the policy P being slightly changed to P^δ, δ ≪ 1. This slight change in P will generate a series of perturbations on the sample path under P. Different from the perturbation (a small delay of service time) in a queueing system, a perturbation in a Markov system is a “jump” from one state i to another state j, i, j ∈ S. That is, at time l, suppose the Markov chains with P and P^δ are both at state k, k ∈ S; at time l + 1, the original Markov chain with P may transit to state X_{l+1} = i, while, because of the slight change in the transition probability p(·|k), the Markov chain with P^δ may transit to state X′_{l+1} = j. Thus, there is a state “jump” from i to j if we compare these two sample paths. The long-term effect of such a state “jump” from i to j on the accumulated performance F_L can be quantified by the perturbation realization factor γ(i, j), which is defined as

γ(i, j) := E{ Σ_{l=0}^∞ [f(X′_l) − f(X_l)] | X′_0 = j, X_0 = i },  (7.19)

where X′_l is the perturbed sample path with initial state j. X_l and X′_l can be viewed as two replications of the stochastic process with different initial states. Note that γ(i, j) in Markov systems is different from c^(f)(n, i) in queueing systems, although both are called perturbation realization factors. Actually, these two kinds of perturbation realization factors are related as follows [42]:

c^(f)(n, i) − c(n, i) η = 1_{n_i>0} μ_{i,n} Σ_{j=1}^M q_ij γ(n_{−i,+j}, n),  (7.20)

where c(n, i) is called the perturbation realization probability; it is the special case of c^(f)(n, i) with f(n) ≡ 1 for all n. The perturbation realization factor γ(i, j) can be further written as

γ(i, j) = g(j) − g(i),  ∀ i, j ∈ S,  (7.21)

where g(i), i ∈ S, is called the performance potential. The performance potential g(i) quantifies the long-term contribution of initial state i to the


accumulated performance and is defined as

g(i) := E{ Σ_{l=0}^∞ [f(X_l) − η] | X_0 = i }.  (7.22)

The performance potential is a fundamental quantity in Markov systems since it composes the perturbation realization factor γ(i, j). Decomposing the first step of the summation in (7.22), we can derive the following Poisson equation in matrix form:

(I − P)g + ηe = f,  (7.23)

where g = (g(1), g(2), . . . , g(S))^T is the potential vector. From the definition of g(i) in (7.22), we see that the effect of a jump from state i to j on the long-run accumulated performance F_L can be measured by γ(i, j) = g(j) − g(i). Finally, the effect of a small (infinitesimal) change in a Markov chain’s transition probability matrix (from P to P^δ) on the long-run accumulated performance F_L can be decomposed into the sum of the effects of all the single perturbations (jumps on a sample path) induced by the change in the transition probability matrix. With these principles, we can intuitively derive the performance derivative along any direction ΔP (from P to any P′) as follows. During a long enough period L ≫ 1, the time that the system stays at state i is π(i)L. When the system is at state i, the probability that the sample path has a state jump from i to j caused by P^δ is p^δ(j|i) − p(j|i), so the number of state jumps from i to j is π(i)L[p^δ(j|i) − p(j|i)]. The effect of each state jump from i to j is measured by γ(i, j). The total effect of all of these perturbations therefore sums to

F^δ_L − F_L = Σ_{i∈S} π(i)L Σ_{j∈S} [p^δ(j|i) − p(j|i)] γ(i, j) = Σ_{i∈S} π(i)L Σ_{j∈S} [p^δ(j|i) − p(j|i)] g(j).  (7.24)

Dividing both sides of the above equation by L and combining (7.18), we have the following performance derivative equation:

dη^δ/dδ |_{δ=0} = π(P′ − P)g = π ΔP g.  (7.25)

This equation can also be easily derived from the Poisson equation, as we will see later. However, the PA principles provide a clear and intuitive explanation for the performance potential and the derivative equation, and they can easily be extended to other non-standard problems for which the Poisson equation may not exist.
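Both the Poisson equation (7.23) and the derivative formula (7.25) are easy to verify numerically. The sketch below solves (7.23) under the standard normalization πg = 0 (so that the linear system with matrix I − P + eπ is invertible) for an arbitrary small chain, and compares π∆Pg with a finite-difference derivative; the random chain is purely illustrative.

```python
import numpy as np

def steady_state(P):
    """Stationary distribution: solve pi (P - I) = 0 together with pi e = 1."""
    S = P.shape[0]
    A = np.vstack([P.T - np.eye(S), np.ones(S)])
    b = np.zeros(S + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potentials(P, f):
    """Solve the Poisson equation (I - P) g + eta e = f with pi g = 0,
    i.e. (I - P + e pi) g = f - eta e."""
    pi = steady_state(P)
    eta = pi @ f
    S = P.shape[0]
    g = np.linalg.solve(np.eye(S) - P + np.outer(np.ones(S), pi), f - eta)
    return g, eta, pi

rng = np.random.default_rng(0)
P  = rng.random((4, 4)); P  /= P.sum(axis=1, keepdims=True)   # current policy
P2 = rng.random((4, 4)); P2 /= P2.sum(axis=1, keepdims=True)  # another policy
f = rng.random(4)
g, eta, pi = potentials(P, f)
delta = 1e-6
eta_delta = steady_state(P + delta * (P2 - P)) @ f
print(pi @ (P2 - P) @ g, (eta_delta - eta) / delta)  # (7.25): the two agree
```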


For a more rigorous construction of the derivative equation (7.25), see the literature [12].

7.3.2. Policy iteration and HJB equation

Besides the performance derivative equation (7.25), we also obtain the difference equation in Markov systems, which is similar to (7.14) in queueing systems. The difference equation can also be derived by analyzing a series of perturbations on the sample paths [10], similar to the derivation of (7.25). However, for the purpose of rigorous analysis, and to provide another perspective, we derive it here from the Poisson equation. Suppose the policy of a Markov system is changed from (P, f) to (P′, f′). The steady-state distribution of the perturbed system with (P′, f′) is denoted as π′, and the corresponding system performance is η′ = π′f′. Multiplying both sides of (7.23) by π′ and after some transformations, we derive the performance difference equation:

η′ − η = π′[(P′ − P)g + (f′ − f)] = π′(ΔP g + h),  (7.26)

where ΔP = P′ − P and h = f′ − f. With (7.26), we can directly compare the performance difference of any two policies based only on information from the current policy. This is the key idea of the direct-comparison based approach. Policy iteration follows directly: based on the sample path of the current policy, we calculate or estimate the potential g of the current system; we then choose a proper (P′, f′) which makes every element of the column vector P′g + f′ minimal componentwise. Since π′ is always positive for ergodic systems, we directly have η′ ≤ η, and the system performance is improved for the minimization problem. Repeating this process, we can find the optimal policy within a finite number of iterations. This is the basic idea of policy iteration in Markov systems. Moreover, based on (7.26), we can directly derive the optimality (HJB) equation for Markov systems:

Pg* + f ⪰ P*g* + f*,  ∀ P, f,  (7.27)

where ⪰ denotes ≥ componentwise and g* is the performance potential under the optimal policy (P*, f*). The detailed policy iteration algorithm can be found in [12, 39]. With (7.26), it is also very easy to obtain the performance derivative equation.
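The resulting policy iteration is only a few lines of code. The sketch below treats a finite MDP given as per-action transition matrices and cost vectors (illustrative inputs, unichain assumed) and reuses the `potentials` routine from the previous sketch for the evaluation step.

```python
import numpy as np

def policy_iteration(P, f):
    """Policy iteration for the long-run average cost via potentials.
    P[a] : S x S transition matrix under action a;  f[a] : S-vector of costs.
    Improvement minimizes (P' g + f')(i) componentwise, per (7.26)-(7.27)."""
    n_actions, S = len(P), P[0].shape[0]
    d = np.zeros(S, dtype=int)                        # initial policy
    while True:
        Pd = np.array([P[d[i]][i] for i in range(S)])
        fd = np.array([f[d[i]][i] for i in range(S)])
        g, eta, _ = potentials(Pd, fd)                # evaluation via (7.23)
        scores = np.array([[P[a][i] @ g + f[a][i] for a in range(n_actions)]
                           for i in range(S)])
        d_new = d.copy()                              # keep action unless strictly better
        for i in range(S):
            a_best = int(scores[i].argmin())
            if scores[i, a_best] < scores[i, d[i]] - 1e-12:
                d_new[i] = a_best
        if np.array_equal(d_new, d):
            return d, eta
        d = d_new
```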


For the situation where the cost function also changes under different policies, the performance derivative equation (7.25) is further extended to

dη^δ/dδ = π(ΔP g + h).  (7.28)

With the performance derivative, policy-gradient based approaches can be developed for continuous-parameter optimization problems of Markov systems [12, 36]. In summary, the performance potential plays a key role in the direct-comparison theory of Markov systems. The effect of a change in the policy applied to a Markov system can be intuitively decomposed into a series of state “jumps”, and the effects of these state “jumps” can be quantified with the performance potentials. This analysis via perturbation decomposition is similar to that in the PA theory of queueing systems. With performance potentials as building blocks, we can construct the performance difference equation and the performance derivative equation; the former yields the policy iteration algorithm and the latter the policy-gradient based algorithm. The direct-comparison based approach provides a new perspective on the performance optimization of Markov systems, besides the dynamic programming of the classic MDP theory. It also makes it possible to develop new approaches to handle emerging problems which are difficult for the classical MDP theory, such as event-based optimization. This is what we will discuss in the next section.

7.4. Beyond Dynamic Programming

In this section, we first introduce some problems solved by the direct-comparison based approach in Markov systems which are difficult for the traditional dynamic programming method: the N-bias optimization problem and the sample-path variance minimization problem in MDPs. Next, we introduce a new optimization framework called event-based optimization. Then, we introduce some new results in financial engineering. The last two problems are solved by following an idea similar to the direct-comparison based approach in Markov systems; it is infeasible to use dynamic programming to solve them.


7.4.1. New results based on direct comparison

In this subsection, we introduce some new results based on the direct-comparison approach. These results are derived naturally and intuitively from the difference equation. They were not obtained using traditional dynamic programming, probably because applying dynamic programming to the long-run average performance usually requires discounting with the discount factor approaching one, which may lose intuitive insights.

7.4.1.1. N-bias optimality in MDP

Consider a discrete-time multichain Markov chain X = {X0, X1, . . . }, where X_l is the system state at time l. The notation is the same as in Section 7.3. The transition probability matrix under policy d is denoted as P^d and the associated cost function is denoted as f^d. The average performance under policy d is denoted as a column vector g_0^d with components

g_0^d(i) = lim_{L→∞} (1/L) E{ Σ_{l=0}^{L−1} f(X_l, d(X_l)) | X_0 = i },  i ∈ S.  (7.29)

From the above equation, we see that the average performance g_0^d(i) depends on the initial state i since X is a multichain. We denote by (P^d)* the Cesaro limit of the transition probability matrix P^d, that is,

(P^d)* := lim_{L→∞} (1/L) Σ_{l=0}^{L−1} (P^d)^l.  (7.30)

The bias under policy d is denoted as a column vector g_1^d with components

g_1^d(i) = lim_{L→∞} E{ Σ_{l=0}^L [f(X_l, d(X_l)) − g_0^d(i)] | X_0 = i },  i ∈ S.  (7.31)

From the above equation, we see that the bias g_1^d(i) quantifies the contribution of initial state i to the accumulated performance, and g_1^d(i) is equivalent to the performance potential g(i) in (7.22) when the Markov chain is a unichain.


Furthermore, we define the (n + 1)th bias under policy d as a column vector with components

g_{n+1}^d(i) = −E{ Σ_{l=0}^∞ ( g_n^d(X_l) − [(P^d)* g_n^d](i) ) | X_0 = i },  i ∈ S,  (7.32)

where [(P^d)* g_n^d](i) is the steady-state value of the nth bias, i ∈ S, and n ≥ 1. With the above nth biases, we further define nth-bias optimality, n ≥ 0. The optimal nth bias is denoted as

g_n*(i) := min_{d∈D_{n−1}} g_n^d(i),  i ∈ S,  (7.33)

where D_{n−1} := {d ∈ D_{n−2} : g_{n−1}^d = g_{n−1}*} is the set of all (n − 1)th-bias optimal policies and D_{−1} := D is the original policy space. Therefore, if d ∈ D_{n−1}, then g_k^d = g_k* for all k = 0, 1, . . . , n − 1. A policy d* ∈ D_{n−1} is called nth-bias optimal if it satisfies

g_n^{d*}(i) = g_n*(i),  i ∈ S.  (7.34)

Therefore, the nth-bias optimization problem is to find the nth-bias optimal policy, n = 0, 1, . . . . When n = 0, it is equivalent to the classical MDP with the long-run average performance criterion. The classical MDP theory handles average and bias optimality, but no work had been done on nth-bias optimality, because the classical theory usually has to utilize n-discount optimality (with the discount factor approaching 1), whose analysis is complicated. As we will show, we can derive performance difference equations for the nth-bias optimality, following the direct-comparison based approach in Markov systems. Based on these difference equations, it is natural and direct to derive the policy iteration for nth-bias optimality and the corresponding optimality equations. This benefits from the advantage of the difference equation, which directly compares the performance under any two policies. More details can be found in the literature [16, 49].

Similar to the difference equation (7.26), we can derive the difference equation of the nth biases under any two policies d and b, d, b ∈ D_{n−1}. When n = 0, the nth bias is the average performance and we derive the following difference equation:

g_0^b − g_0^d = (P^b)*[(f^b + P^b g_1^d) − (f^d + P^d g_1^d)] + [(P^b)* − I] g_0^d.  (7.35)

When the Markov system is a unichain, the above equation is exactly the same as (7.26).


When $n = 1$ and $g_0^b = g_0^d$, we consider the bias optimality of the MDP, and the difference equation of the bias is as follows.
$$g_1^b - g_1^d = \left[I - P^b + (P^b)^*\right]^{-1}\left[(f^b + P^b g_1^d) - (f^d + P^d g_1^d)\right] + (P^b)^*(P^b - P^d)\, g_2^d. \qquad (7.36)$$
When $n > 1$ and $g_{n-1}^b = g_{n-1}^d$, the difference equation of the $n$th biases under policies $b$ and $d$ is as follows.
$$g_n^b - g_n^d = \left[I - P^b + (P^b)^*\right]^{-1}(P^b - P^d)\, g_n^d + (P^b)^*(P^b - P^d)\, g_{n+1}^d. \qquad (7.37)$$
Therefore, we obtain the difference equations of the $n$th biases, $n = 0, 1, \dots$. These equations are more complicated than (7.26) because here we consider the multichain Markov system: the performance of a multichain is a vector, while the performance of a unichain is a scalar. It is not straightforward for the classical MDP theory to handle these $n$th-bias optimality problems [39, 41].

Based on the difference equations (7.35), (7.36), and (7.37), we can directly derive the policy iteration for the $n$th-bias optimality. The basic idea is similar to those in Section 7.3, so here we only give a very brief discussion, using $n = 1$ as an example. Suppose the current policy $d$ is average optimal. We can choose a new policy $b$ from the set of average optimal policies $D_0$ which satisfies the following conditions. First, we have to guarantee $f^b + P^b g_1^d \le f^d + P^d g_1^d$ component-wise. Second, we have to guarantee $(P^b g_2^d)(i) \le (P^d g_2^d)(i)$ when $f^b(i) + (P^b g_1^d)(i) = f^d(i) + (P^d g_1^d)(i)$ for some $i \in S$. With these conditions and the structure of $(P^b)^*$, we directly have $g_1^b \le g_1^d$ based on (7.36). Thus, the bias of the system is improved. Repeating this process, we finally find the bias optimal policy. The scenario for a general $n$th-bias optimality is similar, and the details can be found in [16, 49].

7.4.1.2. Optimization of sample-path variance in MDP

In the classical MDP theory, we often discuss the optimization under the long-run average performance criterion. However, in many practical systems, such as controller design in automatic control or portfolio management in finance, the variability of the system performance is also an important metric. When the long-run average performance already attains its optimum, how to choose a policy with minimal variance is an interesting problem. More specifically, the variance considered here is the limiting average variance along the sample path; we call it the sample-path variance,


which is different from the traditional definition of the variance of a stochastic process [33]. The sample-path variance is difficult to handle because it is the time average of the square of the total costs deviating from the average cost. In an ergodic Markov chain, denote by $D_0$ the set of optimal policies under the long-run average criterion. Under a policy $d \in D_0$, the sample-path variance is defined as
$$\sigma_{sp}^d := \lim_{L\to\infty} \frac{1}{L}\, E\left\{\left[\sum_{l=0}^{L-1} \left(f^d(X_l) - \eta^*\right)\right]^2\right\}, \qquad (7.38)$$
where $\eta^*$ is the optimal long-run average performance. The sample-path variance minimization problem is to find the optimal policy $d^* \in D_0$ which makes $\sigma_{sp}^{d^*}$ minimal, that is,
$$d^* = \arg\min_{d \in D_0} \sigma_{sp}^d. \qquad (7.39)$$
Similar to the basic idea of the direct-comparison based approach in Markov systems, we can derive the following difference equation of the sample-path variances under any two policies $d$ and $b$, with $d, b \in D_0$ [33]
$$\sigma_{sp}^b - \sigma_{sp}^d = \pi^b\left[(P^b - P^d)\, g_{sp}^d + (f_{sp}^b - f_{sp}^d)\right], \qquad (7.40)$$
where $f_{sp}^d$ and $g_{sp}^d$ are the performance function and the sample-path variance potential, respectively. The forms of $f_{sp}^d$ and $g_{sp}^d$ are special and defined as follows.
$$f_{sp}^d(i) := 2\left[f^d(i) - \eta^*\right] g^d(i) - \left[f^d(i) - \eta^*\right]^2, \quad \forall i \in S, \qquad (7.41)$$
where $g^d(i)$ is the performance potential under the long-run average performance criterion, defined in (7.22). The sample-path variance potential $g_{sp}^d$ is defined with the performance function $f_{sp}^d$ as below.
$$g_{sp}^d(i) := \lim_{L\to\infty} E\left[\sum_{l=0}^{L} \left[f_{sp}^d(X_l) - \pi^d f_{sp}^d\right] \,\Big|\, X_0 = i\right], \quad \forall i \in S. \qquad (7.42)$$
It is proved that $\pi^d f_{sp}^d = \sigma_{sp}^d$ [37]. With (7.40), it is easy to do the performance analysis for the sample-path variance minimization problem. Some conclusions, such as the direct comparison and the sufficient condition of the optimal policy, follow naturally. This gives valuable inspiration for handling such problems; more details can be found in [33].
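The definition (7.38) can be checked directly by simulation. The following sketch uses a made-up two-state ergodic chain under a fixed policy; the transition matrix, cost vector, and run lengths are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
f = np.array([1.0, 4.0])
pi = np.array([2.0 / 3.0, 1.0 / 3.0])   # stationary distribution of P
eta = pi @ f                            # long-run average, here eta* = 2.0

def sample_path_variance(L=500, reps=2000):
    """Monte Carlo estimate of (1/L) E{ [sum_{l<L} (f(X_l) - eta*)]^2 }, Eq. (7.38)."""
    vals = np.empty(reps)
    for r in range(reps):
        x, acc = 0, 0.0
        for _ in range(L):
            acc += f[x] - eta
            x = rng.choice(2, p=P[x])
        vals[r] = acc * acc / L
    return vals.mean()

print(sample_path_variance())   # limiting average variance along the sample path
```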


7.4.2. Event-based optimization

In a standard MDP, the decision maker has to select an action at every state. We may call this state-based control. However, in many practical systems, control decisions are made only when certain events happen. We call this event-based control. Since the sequence of events usually does not have the Markov property, how to optimize such problems is a new challenge that does not fit the optimization framework of MDP theory. In this subsection, we discuss a new optimization framework called event-based optimization to solve this kind of problem. In the work of [11, 12, 15, 34, 47], event-based optimization is studied with a sensitivity-based approach, which can be viewed as a consequence of Section 7.3. In this approach, the performance optimization is based on the difference equation and the derivative equation; the policy iteration and the gradient-based optimization follow directly from these two sensitivity equations. These approaches may be applied to cases where dynamic programming fails to work. Below, we first introduce the mathematical formulation of event-based optimization, and then briefly introduce the theory of event-based optimization based on direct comparison.

Consider a discrete time Markov chain $X = \{X_0, X_1, \dots\}$, where $X_l$ is the system state at time $l$, $l = 0, 1, \dots$. The notations for the basic elements of $X$ are the same as those in Section 7.3. An event $e$ is defined as a set of state transitions which possess certain common properties. That is, $e := \{\langle i, j\rangle : i, j \in S \text{ and } \langle i, j\rangle \text{ has common properties}\}$, where $\langle i, j\rangle$ denotes the state transition from $i$ to $j$. All of the events which can trigger control compose the event space $E$. Assume the event space is finite and the total number of types of events is $V$. We define $E := \{e_\phi, e_1, e_2, \dots, e_V\}$, where $e_\phi$ indicates the "no action" event, which means that when it occurs, no action is taken. The input state set of event $e$, $e \in E$, is defined as $I(e) := \{\text{all } i \in S : \langle i, j\rangle \in e \text{ for some } j\}$. The output state set of event $e$ at input state $i$ is defined as $O_i(e) := \{\text{all } j \in S : \langle i, j\rangle \in e\}$.

From the definition of event $e$, we see that an event includes more information than the system state: it includes not only the current state, but also the next state. With an event, we may have some information about the future dynamics of the system. For example, if an event $e$ happens, we know that the current state must be in the set $I(e)$ and the next state must be in the set $O_i(e)$, $i \in I(e)$. The formulation of events can capture some structural information of the problem.


When an event happens, we have to take an action. The action space is assumed finite and is denoted as $A$. At event $e$, when an action $a$ is adopted, the state transition probability is denoted as $p^a(j|i, e)$, where $a \in A$, $i, j \in S$, and $e \in E$. Here we consider the event-based policy $d$ and assume $d$ is Markovian and deterministic. That is, $d$ is a mapping from the event space $E$ to the action space $A$, $d : E \to A$. At time $l$, we have to determine the policy $d_l$, $l = 0, 1, \dots$. We further assume the policy is stationary, that is, $d_l \equiv d$ for all $l$. Therefore, we only consider stationary and deterministic policies, and all of the possible policies compose the event-based policy space, denoted as $D_e$. The cost function under policy $d$ is denoted as $f^d = [f^d(i)]_{i\in S}$, and the associated long-run average performance is denoted as $\eta^d$. The steady-state distribution under policy $d$ is denoted as $\pi^d$ and we have $\eta^d = \pi^d f^d$. The goal of event-based optimization is to find the optimal event-based policy $d \in D_e$ which makes the long-run average performance $\eta^d$ minimal. That is,
$$d^* = \arg\min_{d\in D_e} \eta^d = \arg\min_{d\in D_e} \lim_{L\to\infty} E\left[\frac{1}{L}\sum_{l=0}^{L-1} f^d(X_l)\right]. \qquad (7.43)$$

The framework of event-based optimization captures the structure of event-triggered control in many practical systems. Moreover, in practice, the occurrence of events is much rarer than state transitions, and the event space is much smaller than the state space, i.e., $|E| \ll |S|$. Therefore, the complexity of event-based optimization is much less than that of the standard MDP. Below, we introduce how to follow the direct-comparison based approach in Section 7.3 to solve this event-based optimization problem.

First, we study the performance difference equation of event-based optimization. Under policy $d$, the state transition probability of the Markov system is denoted as $p^d(j|i, e)$, where $i, j \in S$, $e \in E$, $d \in D_e$, and $\langle i, j\rangle \in e$. With the law of total probability and the formula of conditional probability, we have
$$p^d(j|i) = \sum_{e=1}^{V} \frac{\pi^d(i, j, e)}{\pi^d(i)} = \sum_{e=1}^{V} \frac{p^d(j|i, e)\,\pi^d(i, e)}{\pi^d(i)} = \sum_{e=1}^{V} p^d(j|i, e)\,\pi^d(e|i), \qquad (7.44)$$
where $\pi^d(e|i)$ is the conditional probability that event $e$ happens at the current state $i$. Usually, this probability $\pi^d(e|i)$ is determined by the


definition of the event and is irrelevant to the policy $d$ [12]. Therefore, we rewrite (7.44) as below.
$$p^d(j|i) = \sum_{e=1}^{V} p^d(j|i, e)\,\pi(e|i). \qquad (7.45)$$
With (7.45), we apply the difference equation (7.26) to this event-based optimization. Consider two event-based policies $d$ and $b$, $d, b \in D_e$. The performance difference under these two policies is quantified by (7.26) as follows.
$$\begin{aligned}
\eta^b - \eta^d &= \pi^b\left[(P^b - P^d)\, g^d + (f^b - f^d)\right]\\
&= \sum_{i=1}^{S} \pi^b(i)\left\{\sum_{j=1}^{S}\sum_{e=1}^{V} \pi(e|i)\left[p^{b(e)}(j|i,e) - p^{d(e)}(j|i,e)\right] g^d(j) + \left[f^b(i) - f^d(i)\right]\right\}\\
&= \sum_{i=1}^{S}\left\{\sum_{j=1}^{S}\sum_{e=1}^{V} \pi^b(i,e)\left[p^{b(e)}(j|i,e) - p^{d(e)}(j|i,e)\right] g^d(j) + \pi^b(i)\left[f^b(i) - f^d(i)\right]\right\}\\
&= \sum_{e=1}^{V} \pi^b(e) \sum_{i\in I(e)} \pi^b(i|e)\left\{\sum_{j\in O_i(e)} \left[p^{b(e)}(j|i,e) - p^{d(e)}(j|i,e)\right] g^d(j) + \left[f^b(i) - f^d(i)\right]\right\}. \qquad (7.46)
\end{aligned}$$
The difference equation (7.46) clearly describes how the system performance changes under any two event-based policies. We notice that (7.46) involves $\pi^b(i|e)$, which is a probability of the system under the new policy $b$. In order to choose a better policy $b$, it is computationally prohibitive to enumerate $\pi^b(i|e)$ for all of the possible $b$. Fortunately, under some conditions, we may overcome this difficulty. If the event-based optimization problem satisfies the following property
$$\pi^b(i|e) = \pi^d(i|e), \quad \forall\, i \in I(e),\ e \in E, \text{ for any two policies } b \text{ and } d, \qquad (7.47)$$


by substituting (7.47) into (7.46), we have the following difference equation for event-based optimization
$$\eta^b - \eta^d = \sum_{e=1}^{V} \pi^b(e) \sum_{i\in I(e)} \pi^d(i|e)\left\{\sum_{j\in O_i(e)} \left[p^{b(e)}(j|i,e) - p^{d(e)}(j|i,e)\right] g^d(j) + \left[f^b(i) - f^d(i)\right]\right\}. \qquad (7.48)$$
As we know, $\pi^b(e)$ is always positive for ergodic systems, although it is difficult to enumerate its value under all of the possible $b$. In contrast, $\pi^d(i|e)$ and $g^d(j)$ can be calculated or estimated from the sample path under the current policy $d$. With (7.48), if we choose
$$b(e) := \arg\min_{a\in A} \sum_{i\in I(e)} \pi^d(i|e)\left[\sum_{j\in O_i(e)} p^{a}(j|i,e)\, g^d(j) + f(i,a)\right], \quad \forall e \in E, \qquad (7.49)$$
then we have $\eta^b \le \eta^d$. With this update procedure, the performance of the policy is repeatedly improved until we find the optimal policy $d^*$. This optimization procedure is similar to the policy iteration in the direct-comparison based approach for Markov systems, and the optimality equation of event-based optimization can be derived similarly to (7.27). There are further studies on the implementation of this procedure, such as the online estimation of $\pi^d(i|e)$ and $g^d(i)$, and Q-factor based optimization algorithms; for more details, please refer to the literature [11, 12, 34, 47].
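The following is a minimal sketch of the policy-improvement step (7.49). All inputs are hypothetical stand-ins: p[a][e][i][j] plays the role of $p^a(j|i,e)$ (assumed zero for $j$ outside $O_i(e)$), pi_ie[e][i] an estimate of $\pi^d(i|e)$ (assumed zero for $i$ outside $I(e)$), g the potential $g^d$, and f[i][a] the cost $f(i,a)$; with those conventions, summing over all states is harmless.

```python
def improve_policy(events, actions, states, p, pi_ie, g, f):
    """One event-based policy-improvement step per Eq. (7.49): for each event,
    pick the action minimizing the conditional expected cost-plus-potential."""
    b = {}
    for e in events:
        def expected_cost(a, e=e):
            # sum over i of pi^d(i|e) * [ sum_j p^a(j|i,e) g^d(j) + f(i,a) ]
            return sum(pi_ie[e][i] *
                       (sum(p[a][e][i][j] * g[j] for j in states) + f[i][a])
                       for i in states)
        b[e] = min(actions, key=expected_cost)
    return b   # the improved event-based policy; eta^b <= eta^d by (7.48)
```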

However, the optimality of the above procedure requires the condition (7.47), which is satisfied in several categories of problems [11, 40, 43, 46, 48]. That is, the steady-state conditional distribution of state $i$ given event $e$ should be independent of the event-based policy $d$. When (7.47) does not hold, we cannot guarantee that the above procedure finds the optimal policy. For such situations, we can develop gradient-based optimization. With (7.48), it is easy to derive the following derivative equation
$$\frac{\partial \eta^d}{\partial \delta} = \sum_{e=1}^{V} \pi^d(e) \sum_{i\in I(e)} \pi^d(i|e)\left\{\sum_{j\in O_i(e)} \left[p^{b(e)}(j|i,e) - p^{d(e)}(j|i,e)\right] g^d(j) + \left[f^b(i) - f^d(i)\right]\right\}. \qquad (7.50)$$
The above derivative equation does not require the condition (7.47). With (7.50), we can estimate the performance derivative along a direction in the


policy space (the direction from policy $d$ to policy $b$, $b, d \in D_e$), and a local optimum can be found by the subsequent gradient-based optimization.

In summary, the theory of event-based optimization provides an efficient way to optimize the system performance when the control decision is triggered by certain events. Event-based optimization inherits the main idea of the direct-comparison based optimization of Markov systems. By deriving the difference equation of event-based optimization, we can develop the policy iteration to find the optimal event-based policy; similarly, we can derive the derivative equation and develop the gradient-based optimization. The performance potential plays a fundamental role in the above theory. The theory of event-based optimization can handle problems which do not fit traditional optimization frameworks, such as dynamic programming. Therefore, this approach provides a new perspective on optimization theory for stochastic systems beyond dynamic programming.

7.4.3. Financial engineering related

The ideas of PA theory can be applied to many fields, and in financial engineering some new results have been derived recently. Portfolio management is an important research topic in financial engineering. A portfolio consists of a number of assets, for example, bond assets and stock assets. The prices of the assets vary in the market environment according to some patterns of stochastic processes. The goal of portfolio management is to find an optimal policy determining the amounts of all kinds of assets (buying or selling assets) so as to maximize the total wealth. One formulation of the portfolio management problem uses a partially observable Markov decision process (POMDP) [40]. Consider a discrete time model of this problem. At the beginning, the amounts of the bond asset and the stock asset in the portfolio are $x_0^b$ and $x_0^s$, respectively. At time $l$, the prices of the bond and the stock are denoted as $p_l^b$ and $p_l^s$, respectively, where $l = 0, 1, \dots$. We have dynamic equations to quantify the variation of the prices. For the price of the bond asset, we have
$$p_{l+1}^b = f^b(p_l^b, r), \qquad (7.51)$$
where $f^b$ is a deterministic function and $r$ is the risk-free interest rate of the market. For the price of the stock asset, we have
$$p_{l+1}^s = f^s(p_l^s, \mu, V_l, B_l), \qquad (7.52)$$


where $f^s$ is also a deterministic function, $\mu$ is the appreciation rate, $V_l$ is the volatility at time $l$, and $B_l$ is the value of a random walk at time $l$. We assume that $V_l$ also follows a random walk. At each time $l$, we can observe $x_l^b$, $x_l^s$, $p_l^b$, and $p_l^s$, but we cannot observe $V_l$. We define the system state as $(p_l^b, p_l^s, V_l)$ and we see that the system state is partially observable. The action is to determine the values of $x_l^b$ and $x_l^s$ at every time $l$. The goal is to choose an optimal policy which maximizes the long-run average reward rate, i.e.,
$$\lim_{l\to\infty} E\left[\frac{\log(x_l^b p_l^b + x_l^s p_l^s)}{l}\right].$$
For this problem, we can use the direct-comparison approach to conduct a complete analysis. The idea of event-based optimization has been applied to this problem [40]. By properly defining the events of this problem, we can formulate it within the event-based optimization framework. Fortunately, the condition (7.47) for the optimality of event-based optimization is satisfied in this problem. The validity of (7.47) here is easy to understand based on the following fact: the change of an individual's portfolio management policy does not affect the entire market. That is, in a complete market, the buying or selling behavior of an individual investor has no notable effect on the prices of the assets. Thus, the price distribution of assets is irrelevant to the policy of an individual, and the condition (7.47) holds for this problem. Therefore, we can use a similar approach to find the optimal policy of this portfolio management problem. Details can be found in [40]. Another study of stock portfolio optimization as a Markov decision process can be found in [20].

Another recent result on portfolio management is the optimization of a distorted performance criterion, which is also based on PA theory [14]. In portfolio management, we use a utility function to measure the investor's "satisfaction" with the return of the portfolio. Owing to human psychological characteristics, the investor's "satisfaction" with different returns of the portfolio is nonlinear. Such nonlinearity can be explained as people's preferences toward different risks. This nonlinear behavior makes the utility function complicated, and we call it a distorted performance function, since it can be modeled by distorting the probabilities of events. With the distorted performance criterion, standard optimization approaches, such as dynamic programming, fail to work. This is because, if a policy is optimal for a distorted performance starting from time $t$, this policy may no longer be optimal for


the system starting from $t'$, where $t' < t$. This is caused by the nonlinearity of the distorted performance function; thus, the recursion for the optimal policy in dynamic programming is not valid. Fortunately, [14] finds the mono-linearity of the distorted performance function. That is, the distorted performance function becomes locally linear if we properly change the underlying probability measure. Thus, the derivative of the distorted performance can be estimated by the expectation of the sample-path based derivatives of the distorted performance. Then, we can apply the PA theory to efficiently estimate the performance derivative from the sample path. A gradient-based optimization approach has been developed to solve this portfolio management problem. For more details, readers can refer to [14].

In summary, the PA theory provides a new and efficient perspective for studying the portfolio management problem in financial engineering, especially when standard optimization approaches fail to work. This also demonstrates the broad applicability of the PA theory to various research fields.

Acknowledgments

This chapter briefly records the research development, starting from PA, that the authors have been involved in. Prof. Ho's deep insight has been motivating all my research activities and influencing my students (by Cao). We wish Mrs. and Prof. Ho a long, happy, and healthy life.

References

[1] Bryson, A. E. and Ho, Y. C. (1969). Applied Optimal Control: Optimization, Estimation, and Control. Blaisdell, Waltham, Massachusetts.
[2] Cao, X. R. (1985). Convergence of parameter sensitivity estimates in a stochastic experiment. IEEE Transactions on Automatic Control 30, 834–843.
[3] Cao, X. R. (1987a). First-order perturbation analysis of a single multi-class finite source queue. Performance Evaluation 7, 31–41.
[4] Cao, X. R. (1987b). Realization probability in closed Jackson queueing networks and its application. Advances in Applied Probability 19, 708–738.
[5] Cao, X. R. (1988). A sample performance function of Jackson queueing networks. Operations Research 36, 128–136.
[6] Cao, X. R. (1994). Realization Probabilities — The Dynamics of Queueing Systems. New York: Springer Verlag.


[7] Cao, X. R. (2000). A unified approach to Markov decision problems and performance sensitivity analysis. Automatica 36, 771–774.
[8] Cao, X. R. (2003a). From perturbation analysis to Markov decision processes and reinforcement learning. Discrete Event Dynamic Systems: Theory and Applications 13, 9–39.
[9] Cao, X. R. (2003b). Semi-Markov decision problems and performance sensitivity analysis. IEEE Transactions on Automatic Control 48, 758–769.
[10] Cao, X. R. (2004). The potential structure of sample paths and performance sensitivities of Markov systems. IEEE Transactions on Automatic Control 49, 2129–2142.
[11] Cao, X. R. (2005). Basic ideas for event-based optimization of Markov systems. Discrete Event Dynamic Systems: Theory and Applications 15, 169–197.
[12] Cao, X. R. (2007). Stochastic Learning and Optimization — A Sensitivity-based Approach. New York: Springer.
[13] Cao, X. R., Yuan, X. M., and Qiu, L. (1996). A single sample path-based performance sensitivity formula for Markov chains. IEEE Transactions on Automatic Control 41, 1814–1817.
[14] Cao, X. R. and Wan, X. (2012). Analysis of non-linear behavior — a sensitivity-based approach. Proceedings of the 2012 IEEE Conference on Decision and Control, Maui, Hawaii, USA.
[15] Cao, X. R. and Zhang, J. (2008a). Event-based optimization of Markov systems. IEEE Transactions on Automatic Control 53, 1076–1082.
[16] Cao, X. R. and Zhang, J. (2008b). The nth-order bias optimality for multichain Markov decision processes. IEEE Transactions on Automatic Control 53, 496–508.
[17] Cao, X. R. and Chen, H. F. (1997). Potentials, perturbation realization, and sensitivity analysis of Markov processes. IEEE Transactions on Automatic Control 42, 1382–1393.
[18] Cassandras, C. G. and Lafortune, S. (2008). Introduction to Discrete Event Systems, 2nd Edition. Springer-Verlag, New York.
[19] Cassandras, C. G., Wardi, Y., Melamed, B., Sun, G., and Panayiotou, C. G. (2002). Perturbation analysis for online control and optimization of stochastic fluid models. IEEE Transactions on Automatic Control 47, 1234–1248.
[20] Ding, C. and Xi, Y. (2012). Study on stock portfolio optimization problem based on Markov chain. Computer Simulation 29, 366–369.
[21] Fu, M. C. and Hu, J. Q. (1994). Smoothed perturbation analysis derivative estimation for Markov chains. Operations Research Letters 15, 241–251.
[22] Fu, M. C. and Hu, J. Q. (1997). Conditional Monte Carlo: Gradient Estimation and Optimization Applications. Boston: Kluwer Academic Publishers.
[23] Glasserman, P. (1990). The limiting value of derivative estimators based on perturbation analysis. Communications in Statistics: Stochastic Models 6, 229–257.
[24] Glasserman, P. (1991). Gradient Estimation via Perturbation Analysis. Boston, MA: Kluwer Academic Publishers.


[25] Glasserman, P. and Gong, W. B. (1990). Smoothed perturbation analysis for a class of discrete event systems. IEEE Transactions on Automatic Control 35, 1218–1230.
[26] Gong, W. B. and Ho, Y. C. (1987). Smoothed (conditional) perturbation analysis for discrete event dynamic systems. IEEE Transactions on Automatic Control 32, 858–866.
[27] Gordon, W. J. and Newell, G. F. (1967). Closed queueing systems with exponential servers. Operations Research 15, 252–265.
[28] Heidergott, B. (2000). Customer-oriented finite perturbation analysis for queueing networks. Discrete Event Dynamic Systems: Theory and Applications 10, 201–232.
[29] Ho, Y. C. and Cao, X. R. (1983). Perturbation analysis and optimization of queueing networks. Journal of Optimization Theory and Applications 40, 559–582.
[30] Ho, Y. C. and Cao, X. R. (1991). Perturbation Analysis of Discrete Event Systems. Norwell: Kluwer Academic Publishers.
[31] Ho, Y. C., Cao, X. R., and Cassandras, C. G. (1983). Infinitesimal and finite perturbation analysis for queueing networks. Automatica 19, 439–445.
[32] Ho, Y. C., Eyler, A., and Chien, T. T. (1979). A gradient technique for general buffer-storage design in a serial production line. International Journal of Production Research 17, 557–580.
[33] Huang, Y. and Chen, X. (2012). A sensitivity-based construction approach to sample-path variance minimization of Markov decision process. Proceedings of the 2012 Australian Control Conference, Sydney, Australia, 215–220.
[34] Jia, Q. S. (2011). On solving event-based optimization with average reward over infinite stages. IEEE Transactions on Automatic Control 56, 2912–2917.
[35] Leahu, H., Heidergott, B., and Hordijk, A. (2012). Perturbation analysis of waiting times in the G/G/1 queue. Discrete Event Dynamic Systems: Theory and Applications 22, DOI 10.1007/s10626-012-0144-0.
[36] Marbach, P. and Tsitsiklis, J. N. (2001). Simulation-based optimization of Markov reward processes. IEEE Transactions on Automatic Control 46, 191–209.
[37] Meyn, S. P., Tweedie, R. L., and Glynn, P. W. (2009). Markov Chains and Stochastic Stability. Cambridge: Cambridge University Press.
[38] Parr, R. E. (1998). Hierarchical Control and Learning for Markov Decision Processes. Ph.D. Dissertation, University of California at Berkeley.
[39] Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: John Wiley & Sons.
[40] Wang, D. X. and Cao, X. R. (2011). Event-based optimization for POMDP and its application in portfolio management. Proceedings of the 18th IFAC World Congress, Milano, Italy, 3228–3233.
[41] Veinott, A. F. (1969). Discrete dynamic programming with sensitive discount optimality criteria. The Annals of Mathematical Statistics 40, 1635–1660.
[42] Xia, L. and Cao, X. R. (2006a). Relationship between perturbation realization factors with queueing models and Markov models. IEEE Transactions on Automatic Control 51, 1699–1704.


[43] Xia, L. and Cao, X. R. (2006b). Aggregation of perturbation realization factors and service rate-based policy iteration for queueing systems. Proceedings of the 45th IEEE Conference on Decision and Control, San Diego, CA, USA, 1063–1068.
[44] Xia, L., Chen, X., and Cao, X. R. (2009). Policy iteration for customer-average performance optimization of closed queueing systems. Automatica 45, 1639–1648.
[45] Xia, L. and Cao, X. R. (2012). Performance optimization of queueing systems with perturbation realization. European Journal of Operational Research 218, 293–304.
[46] Xia, L. (2013a). Event-based optimization of admission control in open queueing networks. Discrete Event Dynamic Systems: Theory and Applications, in press.
[47] Xia, L., Jia, Q. S., and Cao, X. R. (2013b). A tutorial on event-based optimization — a new optimization framework. Discrete Event Dynamic Systems: Theory and Applications, under review.
[48] Xia, L. and Shihada, B. (2013c). Max-Min optimality of service rate control in closed queueing networks. IEEE Transactions on Automatic Control 58, 1051–1056.
[49] Zhang, J. and Cao, X. R. (2009). Continuous-time Markov decision processes with nth-bias optimality criteria. Automatica 45, 1628–1638.


PART II

Ordinal Optimization



Chapter 8

Fundamentals of Ordinal Optimization

Qing-Shan Jia, Qianchuan Zhao, Xiaohong Guan∗
Center for Intelligent and Networked Systems, Department of Automation, TNList, Tsinghua University, Beijing 100084, China

Zhen Shen
Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Liyi Dai
Computer Sciences Division, US Army Research Office, Durham, NC 27709, USA

In many cases a simulation model might be the only faithful way to describe the detailed system dynamics. It is usually time-consuming to run a simulation model for accurate performance estimation, let alone to optimize the design of the system. In this chapter, we review the fundamentals of ordinal optimization (OO), which started with Ho et al. [1].

8.1. Two Basic Ideas

There are two basic ideas in OO, namely ordinal comparison and goal softening. Ordinal comparison says that it is much easier to find out which design is better than to find out how much better. In order to see this, imagine that you received two identical-looking boxes as Christmas gifts. You are asked to judge which one is heavier, though you have no idea of the contents of the boxes. Almost all of us can answer this question correctly even if there is only a small difference in their weights.

∗This work is partially supported by NSFC under grants 60704008, 60736027, 61074034, 61021063, 61174072, 61174105, 61222302, 90924001, and 91224008.


However, if we are asked to tell the exact weight difference, most of us will have a hard time. In simulation-based optimization, the situation is similar. If we want to find out which design has a better performance, a small number of replications of the simulation is sufficient to evaluate the performance of the designs. Should we be required to tell the difference between the performances of two designs, many more replications are required. This is the first basic idea of OO.

The second basic idea of OO, namely goal softening, says that it is much easier to find a top-n design than to find the global best. In order to see this, consider another example. Imagine you are throwing darts at a dartboard. It is easy to see that when the darts are the same, it is much easier to hit a big dartboard than to hit a small one. Consider a large but finite and discrete design space: there is only one global optimum, but there are n top-n designs. Finding the global optimum in this case is like using a bullet to hit another bullet, the chance of which is very small, while finding a top-n design becomes much easier, especially when the value of n is large. This is the second basic idea of OO.

As we shall see in the next subsection, these two basic ideas are not only intuitively reasonable, but also mathematically insightful. Putting the two ideas together, one may say that though traditional optimization algorithms try to find the global optimum for sure, OO suggests finding a good enough design with high probability. In the last chapter of this book, we will see that OO has been successfully applied to many practical large-scale optimization problems, especially problems where the performance evaluation is based on simulation.

8.2. The Exponential Convergence of Order and Goal Softening

In this section we show that the advantages of ordinal comparison and goal softening discussed in the previous section can also be justified mathematically. First, we focus on ordinal comparison. We introduce the following notations. Consider an optimization problem with $\Theta$ being the design space, which is finite and large. Let $|\Theta| = N$, which means there are $N$ designs in total. Let $J(\theta_i)$ be the true performance of $\theta_i \in \Theta$. Without loss of generality, assume that $J(\theta_1) < J(\theta_2) < \dots < J(\theta_N)$. The observed performance in each simulation is
$$\hat J(\theta_i, j) = J(\theta_i) + w(\theta_i, j), \qquad (8.1)$$


where $j = 1, \dots, n$ indexes the replications and $w(\theta_i, j)$ is the observation noise of design $\theta_i$ in the $j$th replication. We assume that $E[w(\theta_i, j)] = 0$, and the observation sequence $\hat J(\theta_i, j)$, $j = 1, \dots, n$, is assumed to be independently and identically distributed. Then the estimated performance after $n$ replications is
$$\bar J(\theta_i, n) = \frac{1}{n}\sum_{j=1}^{n} \hat J(\theta_i, j). \qquad (8.2)$$

The advantage of ordinal comparison can be mathematically presented as the exponential convergence of order. This is shown in two steps. First, we show that the probability of correctly identifying the order between two designs with distinct performances converges to 1 exponentially fast. Second, we show that the observed order among all the designs converges to the true order exponentially fast. We start from the first step. For $i < j$, we have
$$\begin{aligned}
\Pr\{\bar J(\theta_i, n) > \bar J(\theta_j, n)\} &= 1 - \Pr\{\bar J(\theta_i, n) \le \bar J(\theta_j, n)\}\\
&\le 1 - \Pr\{\bar J(\theta_i, n) < a\}\,\Pr\{\bar J(\theta_j, n) > a\}\\
&= 1 - \left[1 - \Pr\{\bar J(\theta_i, n) \ge a\}\right]\left[1 - \Pr\{\bar J(\theta_j, n) \le a\}\right]\\
&\le 1 - \left[1 - e^{-n\alpha}\right]\left[1 - e^{-n\beta}\right]\\
&= 1 - \left[1 - e^{-n\alpha} - e^{-n\beta} + e^{-n(\alpha+\beta)}\right]\\
&= e^{-n\alpha} + e^{-n\beta} - e^{-n(\alpha+\beta)}\\
&< e^{-n\alpha} + e^{-n\beta}\\
&\le 2e^{-n\min\{\alpha,\beta\}} = e^{\ln 2 - n\min\{\alpha,\beta\}}, \qquad (8.3)
\end{aligned}$$
where $a$ in the second line is any given constant with $J(\theta_i) < a < J(\theta_j)$, and $\alpha$ and $\beta$ in the fourth line are the corresponding positive large-deviation constants. Note that when $n$ is large, $\ln 2$ is much smaller than $n\min\{\alpha,\beta\}$. Therefore, there exists a constant $\gamma$ such that $\Pr\{\bar J(\theta_i, n) > \bar J(\theta_j, n)\} < e^{-n\gamma}$. This decay is easy to observe numerically, as in the sketch below.
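The following small experiment illustrates the exponential decay of the misordering probability; the two true performances, the Gaussian noise, and the replication counts are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
J_i, J_j, sigma = 0.0, 0.5, 1.0   # two designs with J_i < J_j; i.i.d. Gaussian noise

for n in [5, 10, 20, 40, 80]:
    reps = 20000
    Jbar_i = J_i + rng.normal(0.0, sigma, (reps, n)).mean(axis=1)
    Jbar_j = J_j + rng.normal(0.0, sigma, (reps, n)).mean(axis=1)
    # empirical misordering probability; it shrinks exponentially in n, cf. Eq. (8.3)
    print(n, np.mean(Jbar_i > Jbar_j))
```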


In the second step, we show that the observed order among multiple designs converges to the true order exponentially fast. Mathematically speaking, this means that $\Pr\{\bar J(\theta_1, n) < \bar J(\theta_2, n) < \dots < \bar J(\theta_N, n)\} \ge 1 - e^{-nc}$ for some constant $c$. Simply note that
$$\begin{aligned}
1 - \Pr\{\bar J(\theta_1, n) < \bar J(\theta_2, n) < \dots < \bar J(\theta_N, n)\}
&= \Pr\{\bar J(\theta_1, n) > \bar J(\theta_2, n)\}\\
&\quad + \Pr\{\bar J(\theta_1, n) < \bar J(\theta_2, n),\ \bar J(\theta_2, n) > \bar J(\theta_3, n)\}\\
&\quad + \dots + \Pr\{\bar J(\theta_1, n) < \bar J(\theta_2, n), \dots, \bar J(\theta_{N-2}, n) < \bar J(\theta_{N-1}, n),\ \bar J(\theta_{N-1}, n) > \bar J(\theta_N, n)\}\\
&\le \sum_{i=1}^{N-1} \Pr\{\bar J(\theta_i, n) > \bar J(\theta_{i+1}, n)\}. \qquad (8.4)
\end{aligned}$$

From the discussion in the first step, we have
$$\Pr\{\bar J(\theta_i, n) > \bar J(\theta_{i+1}, n)\} \le e^{-n\alpha_i} \qquad (8.5)$$

for each $i$. So combining Eqs. (8.4) and (8.5), we have
$$1 - \Pr\{\bar J(\theta_1, n) < \bar J(\theta_2, n) < \dots < \bar J(\theta_N, n)\} \le \sum_{i=1}^{N-1} e^{-n\alpha_i}. \qquad (8.6)$$

Then it is straightforward to show that there exists a constant $c$ such that $\Pr\{\bar J(\theta_1, n) < \bar J(\theta_2, n) < \dots < \bar J(\theta_N, n)\} \ge 1 - e^{-nc}$.

Second, we focus on goal softening. The advantage of goal softening can be stated mathematically as follows: the probability of not providing enough alignment between the truly top-$g$ and observed top-$s$ designs converges to zero exponentially fast, i.e., $\Pr\{|G \cap S| < k\} \le O(e^{-n\beta})$ for some constant $\beta$. For simplicity, we provide the analysis for the case of $k = 1$; the proof can be extended to other more general cases. First, we consider the selection rule of blind picking. The misalignment probability is
$$\Pr\{|G \cap S| = 0\} = \binom{N-g}{s}\Big/\binom{N}{s} = \frac{(N-g)(N-g-1)\cdots(N-g-s+1)}{N(N-1)\cdots(N-s+1)} \le \left(\frac{N-g}{N}\right)^s = \left(1 - \frac{g}{N}\right)^s. \qquad (8.7)$$
Since $1 - x \le e^{-x}$ holds for all $x$, we have
$$\Pr\{|G \cap S| = 0\} \le e^{-\frac{gs}{N}}. \qquad (8.8)$$
This shows that the misalignment probability for blind picking converges exponentially w.r.t. the sizes of the sets $G$ and $S$, as the following numerical check illustrates.
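The following quick check, with made-up values of N, g, and s, compares the exact blind-picking misalignment probability of Eq. (8.7) with the exponential bound of Eq. (8.8).

```python
from math import comb, exp

N, g = 1000, 50                              # design space size and good enough set size
for s in [10, 20, 40, 80]:
    exact = comb(N - g, s) / comb(N, s)      # Eq. (8.7): none of the top-g selected
    bound = exp(-g * s / N)                  # Eq. (8.8): exponential upper bound
    print(s, round(exact, 4), round(bound, 4))   # exact <= bound, both decay in s
```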


Second, we consider the selection rule of horse racing. Simply note that the misalignment probability of horse racing (HR) is upper bounded by that of blind picking (BP), i.e.,
$$\Pr\{|G \cap S| = 0 \mid HR\} \le \Pr\{|G \cap S| = 0 \mid BP\} \le e^{-\frac{gs}{N}}, \qquad (8.9)$$

where the second inequality follows directly from the above discussion. This shows that the misalignment probability for horse racing also converges exponentially w.r.t. the sizes of the sets $G$ and $S$. We direct interested readers to [2, 3] for more detailed proofs of the exponential convergence of order, and to [4] for a more detailed proof of the exponential convergence of goal softening.

8.3. Universal Alignment Probabilities

In order to make OO useful, once the user specifies the size of the good enough set $G$, the required alignment level $k$, and picks a selection rule (say blind picking or horse racing), there should be a method that can specify how many designs to select in order to ensure that $\Pr\{|G \cap S| \ge k\}$ is high enough. The size of this selected set is of course problem dependent. But once we classify the optimization problems according to the performance distribution of the designs and the observation noise level, there are universal schemes to specify the size of $S$. The universal alignment probabilities (UAPs) are used for this purpose. First, consider the selection rule of blind picking. For given sizes of $S$ and $G$ being $s$ and $g$, respectively, the alignment probability for BP is
$$\Pr\{|G \cap S| \ge k \mid BP\} = \sum_{i=k}^{\min\{g,s\}} \frac{\binom{g}{i}\binom{N-g}{s-i}}{\binom{N}{s}}. \qquad (8.10)$$
Using Eq. (8.10), one can easily calculate the alignment probability for each value of $s$. Once the user specifies the required lower bound of the alignment probability, say $\Pr\{|G \cap S| \ge k\} \ge \alpha$, Eq. (8.10) can be used to find the smallest $s$ that satisfies this lower bound, as in the sketch below.
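The following sketch computes Eq. (8.10) directly and searches for the smallest selected-set size; the problem sizes in the example call are made up.

```python
from math import comb

def ap_bp(N, g, s, k):
    """Alignment probability Pr{|G ∩ S| >= k} for blind picking, Eq. (8.10)."""
    return sum(comb(g, i) * comb(N - g, s - i)
               for i in range(k, min(g, s) + 1)) / comb(N, s)

def smallest_s(N, g, k, alpha=0.95):
    """Smallest selected-set size s meeting the required alignment probability."""
    for s in range(k, N + 1):
        if ap_bp(N, g, s, k) >= alpha:
            return s

# e.g. top-5% good enough set out of N = 1000, at least one truly good design
print(smallest_s(N=1000, g=50, k=1))
```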

Second, consider the selection rule of horse racing. In this case, we no longer have a closed-form expression as in Eq. (8.10). Instead, abundant numerical experiments have been conducted to provide an approximation. In particular, the required size of the selected set $S$ for $\alpha = 0.95$ can be approximated by
$$s = e^{Z_1} k^{Z_2} g^{Z_3} + Z_4, \qquad (8.11)$$


where $Z_1$, $Z_2$, $Z_3$, and $Z_4$ are coefficients that depend on the problem type and the observation noise level, and have been tabulated [5]. Note that the optimization problems are classified into five types for this purpose, based on the concept of the ordered performance curve (OPC). The OPC is defined as a non-decreasing curve that shows the true performance of all the designs sorted from small to large. Since we do not know the true performance of the designs (otherwise we would have already solved the problem and found the optimal design), in practice the OPC is estimated by sorting the observed performance of all the designs. The OPCs are classified into five types according to their shape. The assumption is that as long as we are interested only in the type of the OPC, not the exact value of each point on the curve, the estimated OPC should be sufficient.

We now summarize the procedure of the OO method as follows; a compact sketch in code is given after the steps.

Step 1. Randomly pick N designs from Θ to compose a representative set Θ_N. Usually N = 1000.
Step 2. The user specifies the size of the good enough set G and the required alignment level k.
Step 3. Evaluate the designs in Θ_N using a crude model, which is fast but not necessarily accurate.
Step 4. Estimate the noise level and the problem type.
Step 5. Calculate the value of s such that Pr{|G ∩ S| ≥ k} ≥ α.
Step 6. Select the observed top-s designs.
Step 7. Then OO theory ensures that there are at least k truly top-g designs in S with a probability no smaller than α.

Using the above procedure, we can reduce the search space from $\Theta_N$ down to $S$, usually by at least one order of magnitude. Studies have shown that this procedure also picks out top-$(g/N)\times 100\%$ designs from the entire design space $\Theta$ with high probability [6, 7]; the reduction from $\Theta$ down to $S$ is then usually multiple orders of magnitude. After we obtain the set $S$, other optimization algorithms can be applied to pick the best or top-$k$ designs in $S$, so OO can be easily combined with other optimization methods. In short, instead of finding the best for sure, which is computationally infeasible in many practical situations, OO finds a good enough design with high probability.
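The sketch below strings the steps together for a hypothetical problem. The design space, the crude model and its noise level, and the value of s are all stand-in assumptions; in practice, s comes from the UAP table.

```python
import numpy as np

rng = np.random.default_rng(2)
true_J = lambda theta: np.sin(1e-3 * theta) + 1e-9 * theta   # hypothetical true performance
crude_model = lambda theta: true_J(theta) + rng.normal(0.0, 0.5)  # fast but noisy evaluation

# Step 1: N = 1000 representative designs from a huge integer-coded space Theta
Theta_N = rng.integers(0, 10**9, size=1000)

# Steps 3 and 6: one crude evaluation per design, then keep the observed top-s designs
obs = np.array([crude_model(t) for t in Theta_N])
s = 40                                 # stand-in; would be read off the UAP table
S = Theta_N[np.argsort(obs)[:s]]       # reduced search set, refined by other methods later
```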

chapter8

April 29, 2013

16:28

World Scientific Review Volume - 9in x 6in

Fundamentals of Ordinal Optimization

chapter8

165

constraints. Ever since then, the idea of OO has been extended to multiple objective functions, simulation-based constraints, deterministic but complex objective functions, as well as many other situations. We briefly review some of these extensions in this section. 8.4.1. Comparison of selection rules A selection rule is a procedure that selects the set S based on the observed performance of the designs. Two selection rules have been discussed in this chapter, namely blind picking and horse racing. BP does not consume any simulation budget. HR only allocated the simulation budget equally among all the designs and once for all. There are of course other ways to allocate the simulation budget and to select the set S. We list a few examples in the following. Round Robin (RR) Every design compares with every other design pairwisely. The simulation budget is equally allocated to ensure that the same number of replications are used for each design in each comparison. Sequential Pairwise Elimination (SPE) Motivated by the tennis tournament, designs are initially grouped into many pairs. The winners of these pairs are grouped into pairs again. This continues until a final winner appears. Optimal Computing Budget Allocation (OCBA) The idea is to allocate the simulation budget to best separate the good designs from the bad ones. This is an iterative allocation procedure. More details on this rule will be presented in the next chapter. Breadth vs. Depth (B vs. D) The idea is to first estimate the marginal benefit if the next unit of simulation budget is allocated to the best observed design (i.e., on depth), or to a randomly picked design that has never been explored before (i.e., on breadth). Then the next unit of the simulation budget is allocated to maximize this estimated marginal benefit. This procedure is continued iteratively until all the simulation budget is allocated. HR with No Elimination (HR ne) In each round we compare the mean values of the observations so far for each design, and allocate the additional δi units of simulation budget to the observed best mi designs. The values of δi and mi are reduced by half in each iteration.

April 29, 2013

16:28

166

World Scientific Review Volume - 9in x 6in

chapter8

Q.-S. Jia et al.

In order to compare these different selection rules, we use the size of the selected set S that achieves a specified alignment probability as a measure of the efficiency of the selection rule. In other words, when two different sets S1 and S2 achieve the same alignment probability, the one with a smaller size is more efficient in reducing the search region. Therefore the selection rule that outputs this smaller selected set is regarded as more efficient. Abundant numerical experiments have been conducted to approximate the size of the selected set S under each selection rule to achieve the required alignment probability. The following function is used for this approximation s(k, g) = eZ1 k Z2 g Z3 + Z4 ,

(8.12)

where the values of Z1 , Z2 , Z3 , and Z4 depend on the type of the optimization problem, the noise level, and the selection rule. These values have been tabularized [8]. In this way, one can estimate the size of the selected set for each selection rule to achieve the same alignment probability and then pick the most efficient selection rule (i.e., the one leads to the smallest selected set). We also summarize three simple rules to guide us to look for better selection rules in the future. The first rule is without elimination. This means even if a design fails in an early round of the comparison, it should still have a chance to receive further simulation budget in the future (though may with a probability that is smaller than that of a good design). Roughly speaking, this allows to correct a mistake in the history. The second rule is global comparison. This means all the designs should be compared in each iteration. Otherwise, some good designs may be screened out in early rounds due to the large observation noise. The third rule is to use the mean value of the observations as the estimate of the performance of the design. This rule fully utilizes the simulation budget that has been allocated to a design to improve the accuracy of the performance estimation. We also summarize two simple and quick tips for picking up a good selection rule. 1. In most of the cases, we recommend HR ne. 2. When the simulation budget is small, the size of the good enough set is small, and we try to find many good enough designs, we recommend HR. 8.4.2. Vector ordinal optimization In many practical problems, there are multiple simulation-based objective functions. If the user knows the priority among these objective functions,

April 29, 2013

16:28

World Scientific Review Volume - 9in x 6in

Fundamentals of Ordinal Optimization

chapter8

167

s/he can reformulate the problem as a sequence of single-objective optimization problems. If the user can assign appropriate weights to each objective functions, s/he can use the weighted sum of these objective functions as a new objective function. Then the problem can be solved using the conventional OO method. However, a more difficult case is that the user does not know the priority neither the appropriate weights among the objective functions. In this subsection, we focus on this case. There are different ways to introduce order among the designs when there are multiple objective functions. One way is to follow the definition of Pareto frontier. Suppose there are m objective functions. Recall that design θ is dominated by θ if Ji (θ ) ≤ Ji (θ) for all i = 1, . . . , m with at least one inequality being strict. A design θ is Pareto optimal if there does not exist any other design θ that can dominate θ. Pareto optimal designs are called layer 1 designs, and denoted as L1 . After removing all the designs in layer 1, . . . , k − 1, layer k is defined as the Pareto optimal designs of the rest of the designs. We then say designs in Li are better than designs in Lj if i < j. And for designs in the same layer, they are incomparable and are regarded as equally good. A second way to introduce the order among designs is to count the number of designs that dominate a design θ, denoted as n(θ). Then we sort all the designs according to n(θ) from small to large. Design θ is said to be better than θ if n(θ) < n(θ ). And designs that are dominated by the same number of designs are regarded as equally good. A third way to introduce the order among designs is to count the number of designs that are dominated by a design θ, denoted as n (θ). Then we sort all the designs according to n (θ) from small to large. Design θ is said to be better than θ if n (θ) < n (θ ). And designs that dominate the same number of designs are regarded as equally good. Besides the above three definitions, there are also other ways to introduce the order among designs. We use the first definition as an example in the rest of this subsection. The discussions can be generalized to cases when the order among designs are defined in other ways. When we use the layers to define the order among designs, the good enough set G is composed of the designs in the truly first g layers. The selected set S is composed of the designs in the observed first s layers. We can draw a curve showing the number of designs in the first x layers for x = 1, . . .. This curve is called the ordered performance curve in the vector case (VOPC). For the case of two objective functions, abundant numerical experiments have been conducted to approximate the value of s using

April 29, 2013

16:28

168

World Scientific Review Volume - 9in x 6in

Q.-S. Jia et al.

Eq. (8.11), where the values of Z1 , Z2 , Z3 , and Z4 depend on the type of VOPC, the observation noise level, g, and k. Note that we can also show the exponential convergence of order in this case. We direct interested readers to [9] for more details. Similar to the case of single objective function, by reducing the search region from Θ down to the observed top-s layers, we can usually save the simulation budget by at least one order of magnitude. Discussions on other definitions of order can be found in [10, 11]. 8.4.3. Constrained ordinal optimization In some practical problems, besides the objective function, the feasibility of a design is also evaluated by simulation. If we directly apply OO in this case, there might be many infeasible designs in the selected set. The size of the selected set thus needs to be recalculated to provide the required alignment probability. Even if that is done, the size of the selected set could be large if the portion of feasible designs is small. The constrianed ordinal optimization (COO) is developed to address this issue. A key observation of COO is that the classification of feasible vs. infeasible is ordinal. The advantages of the two basic ideas of OO also apply here. First, similar to the advantage of ordinal comparison over cardinal value evaluation, it can be relatively easy to obtain a group of truly feasible designs with high probability instead of one for sure. Second, “imperfectness” of the feasibility model is also in tune with goal softening. A feasibility model may make mistake on an individual design. But the model could be very robust w.r.t. a group of designs overall. The basic idea of COO is to first use a feasibility model to screen out N ˆ f . The feasibility designs that are observed as feasible. Denote this set as Θ model is not necessarily accurate but computationally light. Then a crude model is used to evaluate the performance of these screened-out designs. The observed top-s designs are then selected to compose the selected set Sf . Obviously the size of S should depend on the seize of the G (the set of truly top-g designs among the truly feasible designs), the alignment level k, the required alignment probability, the accuracy of the feasibility model, and the observation noise level of the crude model. To simplify the discussion, we discuss how this size may be calculated for the blind picking selection rule in the following. ˆ f to select Sf . Let The BP selection rule means to apply BP within Θ ρf be the density of feasible designs in the entire design space Θ. Suppose there are Nf truly feasible designs in the N predicted feasible designs.

chapter8

April 29, 2013

16:28

World Scientific Review Volume - 9in x 6in

Fundamentals of Ordinal Optimization

chapter8

169

ˆ f as G, the good Denote the set of top αg × 100% truly feasible designs in Θ enough set. Let Pe1 denote the probability that a truly feasible design is predicted as infeasible (also known as the type-I error). Let Pe2 denote the probability that a truly infeasible design is predicted as feasible (also known as the type-II error). Then follow the Bayesian formula we have ˆ f , which is predicted as feasible, the probability that for each design θ in Θ that it is truly feasible is r=

ρf (1 − Pe1 ) . ρf (1 − Pe1 ) + (1 − ρf )Pe2

(8.13)

Let |Sf | = sf and tf denote the number of infeasible designs in the selected subset Sf . We have   sf sf −j r (1 − r)j . (8.14) Pr{tf = j} = j Given that there are tf infeasible designs in Sf , the conditional AP that there are exactly k good enough designs in Sf is g  Nf −g  Pr{|G ∩ Sf | = k|tf } =

k

sf −tf −k  Nf sf −tf



.

(8.15)

Then following the Total-Probability Theorem, we have Pr{|G ∩ Sf | ≥ k} min{g,s} min{sf −i,N −Nf }

=





i=k

j=0

g i

Nf −g sf −j−i  Nf  sf −j

  sf sf −j r (1 − r)j . j

(8.16)

Using Eq. (8.16) we can then calculate the approximate size of Sf to achieve the required alignment probability. Note that when other selection rules are used, modifications to AP are needed. A quick first approximation is to simply modify the unconstrained UAP by r. 8.4.4. Deterministic complex optimization problem All the previous discussion assume the performance of a design can be estimated by a stochastic simulation. However, in practice there are cases where the performance of a design is evaluated through a deterministic but complex calculation. An interesting question is: In what sense are OO in deterministic complex optimization problem (DCP) and OO in stochastic

April 29, 2013

16:28

170

World Scientific Review Volume - 9in x 6in

Q.-S. Jia et al.

complex simulation-based optimization problems (SCP) equivalent so that the UAP table in SCP can be used in both cases? We address this question through the following steps. First, the Kolmogorov complexity justified that we can regard the unpredictable deterministic number as a random number. This means that there is no fundamental difference between the DCP and SCP from an engineering viewpoint. Second, the procedures to apply OO in DCP and in SCP are almost identical. Let us focus on the determination of the size of the selected set in DCP. Suppose we want to regress another UAP table for DCP. Then we need to repeat the experiments as what we did for the UAP for SCP. When Θ is extremely large, almost no design can be selected more than once in the initial random sampling of N designs. This means that all the experimental data are statistically equivalent to those obtained when regressing the UAP table for SCP. So the table thus regressed should be the same as the UAP table for SCP, subject to statistical error. This is why we can use the same UAP table for both cases. Also note that when there are deterministic errors in the crude model for different designs, this can be regarded as correlated noise or independent but non-identical noise in SCP. Numerical results and theoretical explanations have been conducted [12–15] to show that such correlation among the noises seldom hurt and actually helps most of the time. In short, as long as the true performance of the designs in the DCP can be assumed to be Kolmogorov complex, we can apply OO to this DCP and use the UAP table in SCP to determine the size of the selected set. 8.4.5. OO ruler: quantification of heuristic designs Generally speaking, optimization algorithms for discrete event systems are either heuristic or general search methods. In this subsection, we review the idea of using OO as a universal ruler to quantify the goodness of the solution designs of an algorithm. Recall that when we apply OO, N designs are sampled in the beginning to represent the entire design space Θ. Let ΘN denote the set of these N designs. Then for a given design, after ordering designs in ΘN via the observed performance, the lined-up uniform samples of these performance can be seen as an approximate to measure the goodness of this design. This “OO ruler” idea can be explained intuitively as follows. Suppose we can obtain the accurate performances of all the designs in the search space and order them, then we obtain the most accurate ruler. We can know the

chapter8

April 29, 2013

16:28

World Scientific Review Volume - 9in x 6in

Fundamentals of Ordinal Optimization

chapter8

171

goodness of any design by simply comparing its performance with the ruler. But, as the search space can be very huge and the accurate performance is costly to obtain, we are only able to evaluate a limited number of uniform samples and obtain their observed performances. This is an approximate ruler, which can still be used to measure the performance. The mathematical tool we mainly use for the “OO ruler” idea is hypothesis testing. Assume the output set of an optimization algorithm is SH . We give hypotheses as follows, H0 : |SH ∩ GΘ | ≥ k, H1 : |SH ∩ GΘ | < k,

(8.17)

where GΘ is the good enough set of the search space Θ, i.e., the top n% of Θ. We here explain the case when there is only one output design, i.e., SH = {θH }. The acceptance region and rejection region are given as follow, ˆ H ) < J(θ ˆ N,[t] ), D0 : J(θ ˆ H ) ≥ J(θ ˆ N,[t] ), t = 1, 2, . . . , N, D1 : J(θ

(8.18)

where Ĵ(θ_{N,[t]}) is the observed performance of the top-t design when the designs in ΘN are ordered from best to worst according to the observed performances. There are two types of errors:

Type I: H0 is true, but is judged to be false;
Type II: H1 is true, but is judged to be false.

We mistakenly accept H0 when a Type II error happens, that is, we overestimate the designs given by the algorithm. Thus, a Type II error is more severe than a Type I error. We control the Type II error to be no larger than a given level β0, that is,

P(D0 | H1) ≤ β0.    (8.19)

Usually we set β0 = 0.05. When there is no noise, i.e., Ĵ(θ) = J(θ), we can calculate the probability in (8.19) directly:

P(J(θH) < J(θ_{N,[t]}) | |{θH} ∩ GΘ| < 1)
    ≤ P(J(θH) < J(θ_{N,[t]}) | |{θH} ∩ GΘ| = 1)
    ≈ ∑_{j=0}^{t−1} C(N, j) (n%|Θ|/|Θ|)^j ((|Θ| − n%|Θ|)/|Θ|)^{N−j}
    = ∑_{j=0}^{t−1} C(N, j) (n%)^j (1 − n%)^{N−j},    (8.20)

where C(N, j) denotes the binomial coefficient.


The approximation in (8.20) is due to the fact that there may exist some other design θ ∈ ΘN such that J(θ) = J(θH); please refer to [16] for details. With (8.20), we can tabulate the relationship between n% and t for a given β0. When there is noise, we give a conclusion for the special case of i.i.d. additive noise with a continuous probability density function. If a heuristic design θH is observed to be better than all the uniform samples in ΘN, then with Type II error probability no larger than β0, we can judge the design to rank at least within the top

n% = min{ (1/N) ln(1/(cβ0)), 1 } × 100%,    (8.21)

where c is a constant determined by the noise distribution.
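The bound in (8.20) is straightforward to evaluate numerically. The following minimal sketch (our illustration; the function names are ours) computes the binomial tail in (8.20) and, for a given β0, the largest comparison threshold t that keeps the Type II error bound under control, which is exactly the n%-versus-t table mentioned above.

```python
from math import comb

def type_ii_bound(N: int, n_frac: float, t: int) -> float:
    # Right-hand side of (8.20): probability that fewer than t of the
    # N uniform samples fall inside the top-n% set G_Theta.
    return sum(comb(N, j) * n_frac**j * (1 - n_frac)**(N - j) for j in range(t))

def max_threshold(N: int, n_frac: float, beta0: float = 0.05) -> int:
    # Largest t such that the bound in (8.20) stays no larger than beta0,
    # i.e., the weakest acceptance test that still controls the Type II error.
    t = 0
    while t < N and type_ii_bound(N, n_frac, t + 1) <= beta0:
        t += 1
    return t

# One slice of the n% versus t table for N = 1000 samples and beta0 = 0.05:
for n_frac in (0.01, 0.05, 0.10):
    print(f"top {n_frac:.0%}: t = {max_threshold(1000, n_frac)}")
```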

Here R_i(α_b, α_i) denotes the large-deviations rate function of the false selection probability P(X̄_b > X̄_i). According to Glynn and Juneja [48], the problem (9.4) for the general distribution can be optimized when

R_i(α_b, α_i) = R_j(α_b, α_j)  ∀ i ≠ b, j ≠ b,    (9.7)

and

∑_{i≠b} [∂R_i(α_b, α_i)/∂α_b] / [∂R_i(α_b, α_i)/∂α_i] = 1.    (9.8)

In the case of the normal distribution, the combination of either Eqs. (9.5)–(9.6) or Eqs. (9.7)–(9.8) is equivalent to

R_i(α_b, α_i) = δ_{b,i}² / (σ_b²/α_b + σ_i²/α_i).    (9.9)
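For the normal case it is worth spelling out what condition (9.8) implies; the following short derivation is ours, using only (9.9). Differentiating (9.9) gives

∂R_i/∂α_b = δ_{b,i}² (σ_b²/α_b²) / (σ_b²/α_b + σ_i²/α_i)²,    ∂R_i/∂α_i = δ_{b,i}² (σ_i²/α_i²) / (σ_b²/α_b + σ_i²/α_i)²,

so each ratio in (9.8) equals σ_b² α_i² / (σ_i² α_b²), and (9.8) reduces to

∑_{i≠b} σ_b² α_i² / (σ_i² α_b²) = 1,  i.e.,  α_b = σ_b √( ∑_{i≠b} α_i²/σ_i² ),

the familiar square-root rule for the best design's share of the budget.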


9.3.4. Closed-form allocation rules

As the motivation for OCBA is to reduce computational time, decision makers may not want to spend additional effort running a solver to compute the allocation. Therefore, approximate closed-form allocation rules can be derived that are easy to implement. From equations (9.5) and (9.6), or (9.7) and (9.8), it can be proven that w_i → 0 ∀ i ≠ b as the number of non-best designs that are compared with the best in terms of the main objective tends to infinity. Thus, we can approximate w_i ≈ 0 ∀ i ≠ b, i.e., the allocation for the best design is much greater than the allocation for any non-best design. For more explanation, see Pujowidianto et al. [91]. For example, the allocation rule for the non-best designs for unconstrained optimization under the normal distribution, as described in (9.5), becomes

α_i / α_j = (σ_i²/δ_{b,i}²) / (σ_j²/δ_{b,j}²),  ∀ i ≠ b, j ≠ b.    (9.10)

The allocation rules described in equations (9.10) and (9.6) can then be easily implemented. Let η_i be the noise-to-signal ratio, where

η_i = σ_i² / δ_{b,i}²,  ∀ i ≠ b.    (9.11)

The rule thus indicates that the allocation to a non-best design is proportional to its noise-to-signal ratio; a code sketch of these closed-form rules is given below.
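The sketch below is our illustration of how (9.10)–(9.11) can be turned into allocation fractions for a minimization problem with normally distributed outputs. The function name is ours, and we take the rule for the best design to be the square-root relation derived after (9.9), which we assume corresponds to (9.6).

```python
import numpy as np

def ocba_fractions(means: np.ndarray, stds: np.ndarray) -> np.ndarray:
    # Closed-form OCBA fractions alpha_i (normalized to sum to 1) for a
    # minimization problem, assuming normal outputs and distinct sample means.
    b = int(np.argmin(means))                      # observed best design
    delta = means - means[b]                       # delta_{b,i}
    idx = np.arange(len(means)) != b               # non-best designs
    alpha = np.zeros(len(means))
    alpha[idx] = stds[idx] ** 2 / delta[idx] ** 2  # (9.10)-(9.11): noise-to-signal ratio
    # square-root rule for the best design (scale-equivariant, so it can be
    # applied before normalization)
    alpha[b] = stds[b] * np.sqrt(np.sum(alpha[idx] ** 2 / stds[idx] ** 2))
    return alpha / alpha.sum()

# Toy example with five designs: close, noisy competitors get more budget.
means = np.array([1.0, 1.1, 1.5, 2.0, 3.0])
stds = np.array([0.8, 0.9, 1.0, 1.0, 1.2])
print(ocba_fractions(means, stds).round(3))
```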

9.3.5. Intuitive explanations of the allocation rules

Intuitively, to ensure a high probability of correctly selecting the desired optimal designs, a larger portion of the computing budget should be allocated to those designs that are critical in identifying the necessary ordinal relationships. The questions are how to identify the critical designs and how to allocate the budget between critical and non-critical designs. The answers depend on the specific problem setting. The following insights for the unconstrained optimization problem act as the basis for the insights for other optimization problems.

The optimal allocation for unconstrained optimization shows that the critical designs are the best design and the non-best designs with high noise-to-signal ratio η_i. It is intuitive that the best design should receive a significant portion of the computing budget, as it is the design that we want to find and it is


being compared with all the other non-best designs. As for the non-best designs, the allocation depends on the noise-to-signal ratio. The higher the ratio of a non-best design, the greater the chance that it is wrongly selected as the best design in the simulation experiment. For example, the designs with high noise-to-signal ratios are those with large variances and main objective values close to that of the best design. These are the designs that should receive more computational effort.

On the other hand, limited computational effort should be expended on non-critical designs that have little effect on identifying the good design (even if these designs have high variances). Overall simulation efficiency is improved because less computational effort is spent on simulating non-critical designs and more is spent on critical designs.

These insights extend to problems with multiple performance measures. As the goal is to allocate less computational effort to non-critical designs, we can use the easiest performance measure to prevent a wrong selection. For example, in the case of constrained optimization, if a non-best design has a high chance of being infeasible, it should receive little computational effort even when its main objective value is close to that of the best design. In other words, the allocation for the design in this example is based only on the constraint measure instead of the main objective. If the design has a high chance of being both infeasible and inferior to the best, both the constraint measures and the main objective can be considered, further penalizing the design by spending less simulation effort on it. This avoids wasting budget on estimating designs that are not critical in identifying the best design. In addition, it implies that for problems with multiple performance measures, the non-best designs can be categorized into several groups depending on which performance measures influence their chance of being wrongly selected.

9.3.6. Sequential heuristic algorithm

One direct and intuitive approach is to gradually increase the computing budget (i.e., the number of simulation replications) for each alternative solution until the variance of the estimated performance is sufficiently small (i.e., the confidence interval for the estimate is satisfactorily narrow). However, optimizing the budget allocation sequentially is challenging. The allocation rules are thus often derived in the asymptotic setting, where the computing budget tends to infinity and the sample means and sample variances approach the true means and variances. This is because of the nice mathematical properties that can be obtained in the asymptotic analysis.


In practice, the means and variances are unknown and need to be estimated. A sequential heuristic algorithm can be developed in which an initial number of simulation replications is collected to estimate the parameters required in the allocation. As the computing budget increases, these parameters are updated.

INPUT: number of designs, total computing budget, initial number of replications, increment in each iteration
INITIALIZE: perform the initial number of replications for each alternative
LOOP: WHILE the total number of replications conducted so far is less than the total computing budget, DO:
    UPDATE: calculate the sample mean and variance of each alternative; determine the best design based on the sample means
    ALLOCATE: add the increment to the total number of replications conducted so far; determine the new number of replications for each alternative based on the OCBA rule; compute the additional replications that need to be conducted for each alternative
    SIMULATE: perform the additional replications for each alternative
END OF LOOP

For discussions on the choice of the initial number of replications and the increment, see Chen et al. [26, 27] and Law and Kelton [70]. Extensive numerical results on the performance of the OCBA algorithm for selecting a single best design can be found in Chen et al. [21, 27] and chapter 4 of Chen and Lee [14], where OCBA performs significantly better than an equal allocation of the simulation budget. As the optimal allocation is developed in an asymptotic setting, one may ask about consistency, namely whether the sequential allocation converges to the true allocation. Complete discussions showing that the estimators for the allocation are consistent can be found in Glynn and Juneja [48], Szechtman and Yücesan [103], Frazier and Powell [42], and Hunter and Pasupathy [61].
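A runnable sketch of this sequential loop is given below. It is our illustration under simplifying assumptions: normal-theory rules, a user-supplied simulate(i, n, rng) function, and the ocba_fractions helper from the previous sketch; all names are ours.

```python
import numpy as np

def sequential_ocba(simulate, k, total_budget, n0=10, increment=20, rng=None):
    # Sequential OCBA heuristic: estimate means and variances, re-allocate the
    # growing budget with the closed-form rules, and simulate the difference.
    rng = np.random.default_rng() if rng is None else rng
    samples = [list(simulate(i, n0, rng)) for i in range(k)]     # INITIALIZE
    spent = n0 * k
    while spent < total_budget:                                  # LOOP
        means = np.array([np.mean(s) for s in samples])          # UPDATE
        stds = np.array([np.std(s, ddof=1) for s in samples])
        target = ocba_fractions(means, stds) * (spent + increment)  # ALLOCATE
        counts = np.array([len(s) for s in samples])
        extra = np.maximum(np.ceil(target) - counts, 0).astype(int)
        if extra.sum() == 0:
            # guard: always spend at least one replication per iteration
            extra[int(np.argmax(target - counts))] = 1
        for i in np.flatnonzero(extra):                          # SIMULATE
            samples[i].extend(simulate(i, int(extra[i]), rng))
        spent += int(extra.sum())
    return int(np.argmin([np.mean(s) for s in samples]))

# Toy usage: five normal designs; design 0 has the smallest true mean.
true_means = [1.0, 1.1, 1.5, 2.0, 3.0]
sim = lambda i, n, rng: rng.normal(true_means[i], 1.0, n)
print(sequential_ocba(sim, 5, total_budget=2000))
```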


9.4. Different Extensions of OCBA

This section is a revised and updated version of Lee et al. [78], which surveyed how OCBA has previously been extended. In addition, it complements the second section, which briefly described the key milestones in the development of OCBA. Section 9.4.1 focuses on the OCBA works that use selection qualities other than PCS. The extensions of OCBA for R&S problems with a single objective are listed in Section 9.4.2. Section 9.4.3 presents the OCBA approaches for R&S problems with multiple performance measures. The integration of OCBA with searching algorithms is demonstrated in Section 9.4.4.

9.4.1. Selection qualities other than PCS

Although the most frequently used measure of selection quality is PCS, alternative measures are also widely developed. One important alternative is the expected opportunity cost (EOC), which penalizes particularly bad choices more than mildly bad choices. It is used for unconstrained optimization by Chick and Inoue [32, 33], Chick and Wu [34], and He et al. [50], while Lee et al. [75] handled multi-objective optimization problems based on EOC. Chick and Gans [31] provided a novel work that aims to maximize the expected net present value; the undiscounted version can be found in Chick and Frazier [30]. The rationale is that in some contexts the financial significance is more important than the statistical significance in finding the best alternative. In cases where the objective of the decision makers is not to select the best alternative, the selection quality can be changed, as described in Sections 9.4.4 and 9.5.

9.4.2. Other extensions to OCBA with a single objective

As mentioned in Section 9.3.2, OCBA for R&S problems with a single objective has been extended to handle problems with a transient state of the simulated system [86, 87], different simulation times per replication across designs [18], and correlated designs [33, 46]. Apart from these, there are also some other extensions. Chen et al. [24] introduced minor random perturbation to the original OCBA. Trailovic and Pao [107] proposed an allocation procedure for the problem where the best design is the one having the smallest variance instead of the best mean. Crain et al. [36] aimed to select the design with


the smallest probability of a rare event. Jia [65] proposed an OCBA rule for selecting the best design in the case of random simulation time. Peng et al. [89] proposed a procedure for the case where computing resources are shared and different simulation models have different computational requirements.

In some cases, the decision makers want several designs, for other qualitative considerations, instead of only one single best design. Thus, the problem of selecting the optimal subset is studied in Chen et al. [26]. The procedure uses a constant to separate the optimal subset from the remaining designs. To avoid the inefficiency caused by this constant, Zhang et al. [116] employ the performance of the designs as the boundary to improve the procedure. LaPorte et al. [69] addressed the subset selection problem in a small-budget environment, i.e., a setting with a small initial number of replications or a small overall computing budget. There may also be a need to rank all of the designs instead of only finding the best one, as studied in Xiao et al. [113].

9.4.3. OCBA for multiple performance measures

OCBA has also been extended to tackle other simulation-based optimization problems, as the need to handle optimization problems with multiple performance measures has become more evident. We can categorize these problems according to whether there are any constraints and whether the constraints are stochastic or deterministic.

In the case of multi-objective optimization, where all objectives are equally important and there are no constraints, Lee et al. [72, 79] worked on finding the non-dominated Pareto set, with type I and type II errors as the selection qualities. Teng et al. [106] incorporated the indifference-zone concept. Lee et al. [83] proposed a procedure for optimal subset selection by studying PCS analytically. Branke and Gamer [6] transformed the multiple objectives into a single objective with the ability to interactively update the weight distribution.

There are cases where secondary stochastic performance measures act as constraints. In this case, the simulation budget can be allocated based on optimality only, feasibility only, or both. Lee et al. [84] proposed an OCBA approach that maximizes a lower bound of PCS. The procedure is applicable both to the independent case and to the case with correlated performance measures. Hunter and Pasupathy [61] proposed a


large-deviations approach for the case where the main objective and the constraint measures are independent. As the correlation between performance measures had previously not been explicitly considered, Hunter et al. [62] presented the effect of correlation on the simulation budget allocation in the case of the bivariate normal distribution. Hunter et al. [63] extended the work to the multivariate normal distribution. Pujowidianto et al. [91] proposed closed-form expressions for the allocation rules and proved under which circumstances they are valid. For the case where feasibility determination is the only issue, i.e., there is no need to select the best feasible design, see Szechtman and Yücesan [103]. It is also possible to have a constraint in terms of a complexity preference. In this case, we want to select good enough designs whose implementation complexity is no greater than an allowable level [66, 114].

9.4.4. Integration of OCBA and searching algorithms

Many research works integrate OCBA with searching algorithms or concepts to handle large-scale simulation optimization problems. Lee et al. [74] propose a framework for the integration of MOCBA with search algorithms that is also applicable to general OCBA procedures. Most papers about the integration of OCBA with search algorithms follow the basic idea of this framework: OCBA is applied to determine the right number of replications allocated to each candidate solution generated by the search algorithm at each iteration, so as to accurately estimate the fitness of these solutions and compare them. We can classify the related papers based on the search algorithm integrated with OCBA.

For the integration with Nested Partitions (NP), Shi et al. [97] show its application in discrete resource allocation. Shi and Chen [96] then give a more detailed hybrid NP algorithm and prove its convergence to the global optimum. Brantley and Chen [9] use OCBA with a moving-mesh algorithm for searching the most promising region. Chew et al. [29] integrate MOCBA with NP to handle multi-objective inventory policy problems. For the integration with evolutionary algorithms, Lee et al. [76] discuss the integration of MOCBA with a Multi-objective Evolutionary Algorithm (MOEA). In Lee et al. [77], a Genetic Algorithm (GA) is integrated with MOCBA to deal with the computing budget allocation for Data Envelopment Analysis. The integration of OCBA with Coordinate Pattern Search for simulation optimization problems with continuous solution spaces is considered in Romero et al. [95]. Chen et al. [26]


show numerical examples of the performance of the algorithm combining OCBA-m with Cross-Entropy (CE). The theory behind the integration of OCBA with CE is further analyzed in He et al. [51]. There has also been a recent trend of using Particle Swarm Optimization (PSO) as the searching algorithm in tandem with OCBA [3, 57, 115, 117]. OCBA has also been integrated with Design of Experiments [8, 10], Sequential Parameter Optimization [2, 4], and the Kriging method [93].

There are also works integrating OCBA and search algorithms when there are multiple performance measures. In the context of constrained optimization, Vieira Junior et al. [108] combined the OCBA approach of Hunter and Pasupathy [61] with the COMPASS searching algorithm of Hong and Nelson [54]. In the context of multi-objective optimization, Lee et al. [81] developed multi-objective COMPASS (MO-COMPASS). This potentially leads to the integration of multi-objective OCBA with searching algorithms other than EA and NP. Examples of potential searching algorithms for multi-objective optimization are GO-POLARS by Li et al. [85] and the reference-point method by Siegmund et al. [100].

9.5. Generalized OCBA Framework

The different extensions in Section 9.4 indicate a consistent notion: an optimization model is used to determine the best allocation scheme to maximize a certain desired quality of the outcome given a fixed budget. This is summarized by the generalized OCBA framework in Figure 9.1. The framework also shows that OCBA can be used to address problems beyond simulation optimization, i.e., those that do not aim to select the optimal subset or the best design.

Fig. 9.1. A Generic View of the OCBA Framework


In all scenarios, the key motivation is the limited total budget T, which has to be allocated among the different processors. The total budget and the processors usually refer to the computing budget and the simulations, respectively, so that the allocation quantity Ni is the number of simulation replications. Each processor generates an output Xi, a random variable whose quality depends on the budget allocated, i.e., Ni. A better-quality output is usually obtained when more budget is allocated. Based on these outputs, a synthesizer is used to produce the overall outcome. For example, in the original OCBA, the synthesizer is a selection procedure that uses the simulation outputs to select the best alternative, with the outcome being the selected best. Once an outcome is obtained, there are different ways of analyzing its quality, as described in the review of the works with different selection qualities in Section 9.4.1.

The framework shows that OCBA can be used for other simulation problems. For example, it can be used for estimating the probability of a rare event by changing the synthesizer into a procedure that estimates the rare-event probability. In this case, the quality of the outcome becomes the variance of the estimator, which needs to be minimized [98, 99]. Similarly, OCBA can be used for determining the simulation length of a single design using regression [60]. In addition, OCBA can be applied to problems that do not employ simulation. For example, it can be used for data envelopment analysis, as shown in Wong et al. [112] and Wong [111]. In this case, the total budget represents the total data collection budget while each processor is a data collection process. Here, the allocation quantity Ni refers to the number of data points for each unknown variable. The outcome is the estimated efficiency, whose mean squared error needs to be minimized. For more details on the generalized OCBA framework, see chapter 8 of Chen and Lee [14].

9.6. Applications of OCBA

OCBA has been used in many different applications. It provides an effective way to solve operations management problems, such as combinatorial optimization problems including machine clustering [19], electronic circuit design [23], and semiconductor wafer fab scheduling [58, 59]. Chen and He [13] applied OCBA to a design problem in US air traffic management due to the high complexity of that system. For multi-objective problems, Lee et al. [73] employed MOCBA to optimally


select the non-dominated set of inventory policies for a differentiated-service inventory problem and an aircraft spare parts inventory problem. Horng et al. [56] tackled the hotel booking limits problem, while Horng et al. [57] aimed to minimize the overkills and re-probes in wafer probe testing. Pujowidianto et al. [92] applied OCBA for constrained optimization to determine the optimal hospital bed allocation. Jia [64] applied OCBA to simulation-based policy improvement, while Jia [65] considered the smoke detection problem in wireless sensor networks. Lee et al. [80] applied multi-objective OCBA to the container terminal layout design problem.

9.7. Future Research

OCBA has been shown to be very efficient, which makes it attractive for practitioners to use and for researchers to continue extending. There are several possible directions. First, the large-deviations perspective can be used to revisit some problems. In addition, there are efforts to move away from asymptotic analysis, since any implementation runs in finite time. Chen et al. [25] found that a dynamic sequential allocation, which aims to maximize the probability of correct selection after each additional allocation, performs better than a static optimal allocation computed in one stage, even under the assumption of perfect information. Other approaches sequentially optimize the allocation process, such as the work on the expected value of information by Chick et al. [35] and the knowledge-gradient works whose foundation was laid by Frazier and Powell [41]. Waeber et al. [110] proposed a framework for selecting R&S procedures based on a new performance measure that is analogous to convex risk measures. It would be a worthwhile pursuit to optimize this performance measure as the desired selection quality in the spirit of the OCBA framework.

9.8. Concluding Remarks

The OCBA framework is an optimized approach for allocating a limited simulation resource to achieve the best quality of the overall outcome. OCBA was initially developed when the author Chen was doing his doctoral studies at Harvard under Professor Ho's guidance. He continued to work on the problem as he became faculty at the University of Pennsylvania, George Mason University, and National Taiwan University. In 2002, the author Lee, who was


also a former Ph.D. student of Professor Ho, started to work on OCBA problems while at the National University of Singapore. Later, in 2007, when Lee spent his sabbatical leave at George Mason University working with Chen, they further developed OCBA and eventually wrote a book on the subject. OCBA not only provides an efficient selection rule for OO, but has also grown into an important research area in itself. Further, OCBA has been shown to be one of the top performers by Branke et al. [7] and Waeber et al. [109]. The success of this research would have been impossible without Prof. Yu-Chi Ho's vision, insight, support, and generosity. We greatly appreciate Professor Ho, and want to say "Happy Birthday".

References

[1] Andradóttir, S. (1998). Simulation Optimization. In: J. Banks (Ed.). Handbook of Simulation: Principles, Methodology, Advances, Applications, and Practice, Chapter 9, John Wiley & Sons, New York.
[2] Bartz-Beielstein, T. and Friese, M. (2011). Sequential Parameter Optimization and Optimal Computational Budget Allocation for Noisy Optimization Problems, CIOP Technical Report 02/11, Cologne University, Germany.
[3] Bartz-Beielstein, T., Blum, D., and Branke, J. (2007). Particle Swarm Optimization and Sequential Sampling in Noisy Environments. In: K. F. Doerner et al. (Ed.). Metaheuristics, Chapter 14, Springer, pp. 261–273.
[4] Bartz-Beielstein, T., Friese, M., Zaefferer, M., Naujoks, B., Flasch, O., Konen, W., and Koch, P. (2011). Noisy Optimization with Sequential Parameter Optimization and Optimal Computational Budget Allocation, Proceedings of the 13th Annual Conference Companion on Genetic and Evolutionary Computation, pp. 119–120.
[5] Bechhofer, R. E., Santner, T. J., and Goldsman, D. M. (1995). Design and Analysis of Experiments for Statistical Selection, Screening, and Multiple Comparisons, Wiley, New York.
[6] Branke, J. and Gamer, J. (2007). Efficient Sampling in Interactive Multi-criteria Selection, Proceedings of the 2007 INFORMS Simulation Society Research Workshop.
[7] Branke, J., Chick, S. E., and Schmidt, C. (2007). Selecting a Selection Procedure, Management Science, 53: 1916–1932.
[8] Brantley, M. W. (2011). Simulation-based Stochastic Optimization on Discrete Domains: Integrating Optimal Computing and Response Surfaces, PhD Thesis, George Mason University.
[9] Brantley, M. W. and Chen, C. H. (2005). A Moving Mesh Approach for Simulation Budget Allocation on Continuous Domains, Proceedings of the 2005 Winter Simulation Conference, pp. 699–707.


[10] Brantley, M. W., Lee, L. H., Chen, C. H., and Chen, A. (2008). Optimal Sampling in Design of Experiment for Simulation-based Stochastic Optimization, Proceedings of the 4th IEEE Conference on Automation and Science Engineering, pp. 388–393.
[11] Chen, C. H. (1995). An Effective Approach to Smartly Allocate Computing Budget for Discrete Event Simulation, Proceedings of the 34th IEEE Conference on Decision and Control, pp. 2598–2605.
[12] Chen, C. H. (1996). A Lower Bound for the Correct Subset-selection Probability and Its Application to Discrete Event System Simulations, IEEE Transactions on Automatic Control, 41(8): 1227–1231.
[13] Chen, C. H. and He, D. (2005). Intelligent Simulation for Alternatives Comparison and Application to Air Traffic Management, Journal of Systems Science and Systems Engineering, 14(1): 37–51.
[14] Chen, C. H. and Lee, L. H. (2011). Stochastic Simulation Optimization: An Optimal Computing Budget Allocation, World Scientific Publishing Co.
[15] Chen, C. H. and Yücesan, E. (2005). An Alternative Simulation Budget Allocation Scheme for Efficient Simulation, International Journal of Simulation and Process Modeling, 1: 49–57.
[16] Chen, C. H., Chen, H. C., and Dai, L. (1996). A Gradient Approach for Smartly Allocating Computing Budget for Discrete Event Simulation, Proceedings of the 1996 Winter Simulation Conference, pp. 398–405.
[17] Chen, H. C., Dai, L., Chen, C. H., and Yücesan, E. (1997). New Development of Optimal Computing Budget Allocation for Discrete Event Simulation, Proceedings of the 1997 Winter Simulation Conference, pp. 334–341.
[18] Chen, C. H., Yücesan, E., Yuan, Y., Chen, H. C., and Dai, L. (1998). Computing Budget Allocation for Simulation Experiments with Different System Structures, Proceedings of the 1998 Winter Simulation Conference, pp. 735–742.
[19] Chen, C. H., Wu, S. D., and Dai, L. (1999a). Ordinal Comparison of Heuristic Algorithms using Stochastic Optimization, IEEE Transactions on Robotics and Automation, 15(1): 44–56.
[20] Chen, C. H., Yücesan, E., Yuan, Y., Chen, H. C., and Dai, L. (1999b). An Asymptotic Allocation for Simultaneous Simulation Experiments, Proceedings of the 1999 Winter Simulation Conference, pp. 359–366.
[21] Chen, C. H., Lin, J., Yücesan, E., and Chick, S. E. (2000a). Simulation Budget Allocation for Further Enhancing the Efficiency of Ordinal Optimization, Journal of Discrete Event Dynamic Systems: Theory and Applications, 10: 251–270.
[22] Chen, H. C., Chen, C. H., and Yücesan, E. (2000b). Computing Efforts Allocation for Ordinal Optimization and Discrete Event Simulation, IEEE Transactions on Automatic Control, 45(5): 960–964.
[23] Chen, C. H., Donohue, K., Yücesan, E., and Lin, J. (2003a). Optimal Computing Budget Allocation for Monte Carlo Simulation with Application to Product Design, Simulation Modelling Practice and Theory, 11(1): 57–74.


[24] Chen, C. H., He, D., and Yücesan, E. (2003b). Better-than-optimal Simulation Run Allocation? Proceedings of the 2003 Winter Simulation Conference, pp. 490–495.
[25] Chen, C. H., He, D., and Fu, M. (2006). Efficient Dynamic Simulation Allocation in Ordinal Optimization, IEEE Transactions on Automatic Control, 51(12): 2005–2009.
[26] Chen, C. H., He, D., Fu, M., and Lee, L. H. (2008). Efficient Simulation Budget Allocation for Selecting an Optimal Subset, INFORMS Journal on Computing, 20(4): 579–595.
[27] Chen, C. H., Yücesan, E., Dai, L., and Chen, H. C. (2010). Efficient Computation of Optimal Budget Allocation for Discrete Event Simulation Experiment, IIE Transactions, 42(1): 60–70.
[28] Chen, C. H., Shi, L., and Lee, L. H. (2011). Stochastic Systems Simulation Optimization. Frontiers of Electrical and Electronic Engineering in China, 6(3): 468–480.
[29] Chew, E. P., Lee, L. H., Teng, S. Y., and Koh, C. H. (2009). Differentiated Service Inventory Optimization using Nested Partitions and MOCBA, Computers and Operations Research, 36(5): 1703–1710.
[30] Chick, S. and Frazier, P. (2012). Sequential Sampling with Economics of Selection Procedures, Management Science, 58(3): 550–569.
[31] Chick, S. and Gans, N. (2009). Economic Analysis of Simulation Selection Problems, Management Science, 55(3): 421–437.
[32] Chick, S. and Inoue, K. (2001a). New Two-stage and Sequential Procedures for Selecting the Best Simulated System, Operations Research, 49: 1609–1624.
[33] Chick, S. and Inoue, K. (2001b). New Procedures to Select the Best Simulated System using Common Random Numbers, Management Science, 47: 1133–1149.
[34] Chick, S. E. and Wu, Y.-Z. (2005). Selection Procedures with Frequentist Expected Opportunity Cost, Operations Research, 53(5): 867–878.
[35] Chick, S., Branke, J., and Schmidt, C. (2010). Sequential Sampling to Myopically Maximize the Expected Value of Information, INFORMS Journal on Computing, 22(1): 71–80.
[36] Crain, B., Chen, C.-H., and Shortle, J. F. (2011). Combining Simulation Allocation and Optimal Splitting for Rare-event Simulation Optimization, Proceedings of the 2011 Winter Simulation Conference, pp. 3998–4007.
[37] Dai, L. (1996). Convergence Properties of Ordinal Comparison in the Simulation of Discrete Event Dynamic Systems. Journal of Optimization Theory and Application, 91(2): 363–388.
[38] Dai, L. and Chen, C. H. (1997). Rate of Convergence for Ordinal Comparison of Dependent Simulations in Discrete Event Dynamic Systems, Journal of Optimization Theory and Applications, 94(1): 29–54.
[39] Dai, L., Chen, C. H., and Birge, J. R. (2000). Large Convergence Properties of Two-stage Stochastic Programming, Journal of Optimization Theory and Applications, 106(3): 489–510.


[40] Dudewicz, E. J. and Dalal, S. R. (1975). Allocation of Observations in Ranking and Selection with Unequal Variances, Sankhya 37B: 28–78.
[41] Frazier, P. and Powell, W. B. (2008). The Knowledge-gradient Stopping Rule for Ranking and Selection, Proceedings of the 2008 Winter Simulation Conference, pp. 305–312.
[42] Frazier, P. I. and Powell, W. B. (2011). Consistency of Sequential Bayesian Sampling Policies, SIAM Journal on Control and Optimization, 49(2): 712–731.
[43] Fu, M. C. (2002). Optimization for Simulation: Theory vs. Practice (Feature Article), INFORMS Journal on Computing, 14(3): 192–215.
[44] Fu, M. C., Hu, J. Q., Chen, C. H., and Xiong, X. (2004). Optimal Computing Budget Allocation under Correlated Sampling, Proceedings of the 2004 Winter Simulation Conference, pp. 595–603.
[45] Fu, M. C., Glover, F. W., and April, J. (2005). Simulation Optimization: A Review, New Developments, and Applications. Proceedings of the 2005 Winter Simulation Conference, pp. 83–95.
[46] Fu, M. C., Hu, J. Q., Chen, C. H., and Xiong, X. (2007). Simulation Allocation for Determining the Best Design in the Presence of Correlated Sampling, INFORMS Journal on Computing, 19(1): 101–111.
[47] Fu, M. C., Chen, C. H., and Shi, L. (2008). Some Topics for Simulation Optimization. Proceedings of the 2008 Winter Simulation Conference, pp. 27–38.
[48] Glynn, P. and Juneja, S. (2004). A Large Deviations Perspective on Ordinal Optimization, Proceedings of the 2004 Winter Simulation Conference, pp. 577–585.
[49] Goldsman, D. and Nelson, B. L. (1998). Comparing Systems via Simulation. In: J. Banks (Ed.). Handbook of Simulation: Principles, Methodology, Advances, Applications, and Practice, Chapter 8, John Wiley & Sons, New York, pp. 273–306.
[50] He, D., Chick, S. E., and Chen, C. H. (2007). The Opportunity Cost and OCBA Selection Procedures in Ordinal Optimization, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 37(4): 951–961.
[51] He, D., Lee, L. H., Chen, C. H., Fu, M., and Wasserkrug, S. (2010). Simulation Optimization using the Cross-entropy Method with Optimal Computing Budget Allocation, ACM Transactions on Modeling and Computer Simulation, 20(1): Article 4.
[52] Ho, Y. C., Cassandras, C. G., Chen, C. H., and Dai, L. (2000). Ordinal Optimization and Simulation, Journal of Operational Research Society, 51(4): 490–500.
[53] Ho, Y.-C., Zhao, Q.-C., and Jia, Q.-S. (2007). Ordinal Optimization: Soft Optimization for Hard Problems, Springer, New York.
[54] Hong, L. J. and Nelson, B. L. (2006). Discrete Optimization via Simulation using COMPASS, Operations Research, 54(1): 115–129.


[55] Hong, L. J. and Nelson, B. L. (2009). A Brief Introduction to Optimization via Simulation, Proceedings of the 2009 Winter Simulation Conference, pp. 75–85.
[56] Horng, S.-C., Yang, F.-Y., and Lin, S.-S. (2012a). Embedding Evolutionary Strategy in Ordinal Optimization for Hard Optimization Problems, Applied Mathematical Modelling, 36(8): 3753–3763.
[57] Horng, S.-C., Yang, F.-Y., and Lin, S.-S. (2012b). Applying PSO and OCBA to Minimize the Overkills and Re-probes in Wafer Probe Testing, IEEE Transactions on Semiconductor Manufacturing, 25(3): 531–540.
[58] Hsieh, B. W., Chen, C. H., and Chang, S. C. (2001). Scheduling Semiconductor Wafer Fabrication by using Ordinal Optimization-based Simulation, IEEE Transactions on Robotics and Automation, 17(5): 599–608.
[59] Hsieh, B. W., Chen, C. H., and Chang, S. C. (2007). Efficient Simulation-based Composition of Dispatching Policies by Integrating Ordinal Optimization with Design of Experiment, IEEE Transactions on Automation Science and Engineering, 4(4): 553–568.
[60] Hu, X., Lee, L. H., Chew, E. P., Morrice, D. J., and Chen, C.-H. (2012). Efficient Computing Budget Allocation for a Single Design by using Regression with Sequential Sampling Constraint, Proceedings of the 2012 Winter Simulation Conference, To Appear.
[61] Hunter, S. R. and Pasupathy, R. (2012). Optimal Sampling Laws for Stochastically Constrained Simulation Optimization on Finite Sets, INFORMS Journal on Computing, Articles in Advance, pp. 1–16.
[62] Hunter, S. R., Pujowidianto, N. A., Chen, C. H., Lee, L. H., Pasupathy, R., and Yap, C. M. (2011). Optimal Sampling Laws for Stochastically Constrained Simulation Optimization on Finite Sets: The Bivariate Normal Case, Proceedings of the 2011 Winter Simulation Conference, pp. 4294–4302.
[63] Hunter, S. R., Pujowidianto, N. A., Pasupathy, R., Lee, L. H., and Chen, C.-H. (2012). Constrained and Correlated Simulation Optimization on Finite Sets. Under Review.
[64] Jia, Q.-S. (2012a). Efficient Computing Budget Allocation for Simulation-based Policy Improvement, IEEE Transactions on Automation Science and Engineering, 9(2): 342–352.
[65] Jia, Q.-S. (2012b). Efficient Computing Budget Allocation for Simulation-based Optimization with Stochastic Simulation Time, IEEE Transactions on Automatic Control, To Appear.
[66] Jia, Q. S., Zhou, E., and Chen, C. H. (2012). Efficient Computing Budget Allocation for Finding Simplest Good Designs, IIE Transactions, To Appear.
[67] Kim, S. H. and Nelson, B. L. (2001). A Fully Sequential Procedure for Indifference-zone Selection in Simulation, ACM Transactions on Modeling and Computer Simulation, 11(3): 251–273.
[68] Kim, S. H. and Nelson, B. L. (2006). Selecting the Best System. In: S. Henderson and B. L. Nelson (Eds.). Handbook in Operations Research and Management Science: Simulation, Chapter 17, Elsevier, Amsterdam, pp. 501–534.


[69] LaPorte, G., Branke, J., and Chen, C.-H. (2012). Optimal Computing Budget Allocation in a Small Budget Environment, Proceedings of the 2012 Winter Simulation Conference, To Appear.
[70] Law, A. M. and Kelton, W. D. (2000). Simulation Modeling and Analysis. McGraw-Hill, New York.
[71] Lee, L. H., Lau, T. W. E., and Ho, Y. C. (1999). Explanation of Goal Softening in Ordinal Optimization, IEEE Transactions on Automatic Control, 44(1): 94–99.
[72] Lee, L. H., Chew, E. P., Teng, S. Y., and Goldsman, D. (2004). Optimal Computing Budget Allocation for Multi-objective Simulation Models, Proceedings of 2004 Winter Simulation Conference, pp. 586–594.
[73] Lee, L. H., Teng, S., Chew, E. P., Karimi, I. A., Chen, Y. K., Koh, C. H., Lye, K. W., and Lendermann, P. (2005). Application of Multi-objective Simulation-optimization Techniques to Inventory Management Problems, Proceedings of the 2005 Winter Simulation Conference, pp. 1684–1691.
[74] Lee, L. H., Chew, E. P., and Teng, S. (2006). Integration of Statistical Selection with Search Mechanism for Solving Multi-objective Simulation-optimization Problems, Proceedings of the 2006 Winter Simulation Conference, pp. 294–303.
[75] Lee, L. H., Chew, E. P., and Teng, S. (2007). Finding the Pareto Set for Multi-objective Simulation Models by Minimization of Expected Opportunity Cost, Proceedings of the 2007 Winter Simulation Conference, pp. 513–521.
[76] Lee, L. H., Chew, E. P., Teng, S. Y., and Chen, Y. K. (2008). Multi-objective Simulation-based Evolutionary Algorithm for An Aircraft Spare Parts Allocation Problem, European Journal of Operational Research, 189(2): 476–491.
[77] Lee, L. H., Wong, W. P., and Jaruphongsa, W. (2009). Data Collection Budget Allocation for Stochastic Data Envelopment Analysis. Proceedings of the 2009 INFORMS Simulation Society Research Workshop, pp. 71–74.
[78] Lee, L. H., Chen, C. H., Chew, E. P., Li, J., Pujowidianto, N. A., and Zhang, S. (2010a). A Review of Optimal Computing Budget Allocation Algorithms for Simulation Optimization Problem, International Journal of Operations Research, 7(2): 19–31.
[79] Lee, L. H., Chew, E. P., Teng, S. Y., and Goldsman, D. (2010b). Finding the Non-dominated Pareto Set for Multi-objective Simulation Models, IIE Transactions, 42(9): 656–674.
[80] Lee, L. H., Chew, E. P., Chua, K. H., Sun, Z., and Zhen, L. (2011a). A Simulation Optimisation Framework for Container Terminal Layout Design. In: Wang et al. (Eds.). Multi-objective Evolutionary Optimisation for Product Design and Manufacturing, Chapter 14, Springer, pp. 385–400.
[81] Lee, L. H., Chew, E. P., and Li, H. (2011b). Multi-objective COMPASS for Discrete Optimization via Simulation, Proceedings of the 2011 Winter Simulation Conference, pp. 4065–4074.
[82] Lee, L. H., Chen, C.-H., Chew, E. P., Zhang, S., Li, J., and Pujowidianto, N. A. (2012a). Some Efficient Simulation Budget Allocation Rules for Simulation Optimization Problems, International Journal of Services Operations and Informatics, To Appear.
[83] Lee, L. H., Chew, E. P., and Li, J. (2012b). Efficient Subset Selection via OCBA for Multi-objective Simulation Optimization Problems, Working Paper, Department of Industrial and Systems Engineering, National University of Singapore, Singapore.
[84] Lee, L. H., Pujowidianto, N. A., Li, L.-W., Chen, C. H., and Yap, C. M. (2012c). Approximate Simulation Budget Allocation for Selecting the Best Design in the Presence of Stochastic Constraints, IEEE Transactions on Automatic Control, 57(11): 2940–2945.
[85] Li, H., Lee, L. H., and Chew, E. P. (2012). Optimization via Gradient Oriented Polar Random Search, Proceedings of the 2012 Winter Simulation Conference, To Appear.
[86] Morrice, D. J., Brantley, M. W., and Chen, C. H. (2008). An Efficient Ranking and Selection Procedure for a Linear Transient Mean Performance Measure, Proceedings of the 2008 Winter Simulation Conference, pp. 290–296.
[87] Morrice, D. J., Brantley, M. W., and Chen, C. H. (2009). A Transient Means Ranking and Selection Procedure with Sequential Sampling Constraints, Proceedings of the 2009 Winter Simulation Conference, pp. 590–600.
[88] Nelson, B. L., Swann, J., Goldsman, D., and Song, W. (2001). Simple Procedures for Selecting the Best Simulated System when the Number of Alternatives is Large, Operations Research, 49(6): 950–963.
[89] Peng, Y., Chen, C. H., Fu, M. C., and Hu, J. Q. (2012). Efficient Simulation Resource Sharing and Allocation for Selecting the Best, IEEE Transactions on Automatic Control, To Appear.
[90] Pujowidianto, N. A., Lee, L. H., Chen, C. H., and Yap, C. M. (2009). Optimal Computing Budget Allocation for Constrained Optimization, Proceedings of the 2009 Winter Simulation Conference, pp. 584–589.
[91] Pujowidianto, N. A., Hunter, S. R., Pasupathy, R., Lee, L. H., and Chen, C. H. (2012a). Closed-form Sampling Laws for Stochastically Constrained Simulation Optimization on Large Finite Sets, Proceedings of the 2012 Winter Simulation Conference, To Appear.
[92] Pujowidianto, N. A., Lee, L. H., Yap, C. M., and Chen, C. H. (2012b). Efficient Simulation-based Comparison for Hospital Bed Allocation, Proceedings of the 2012 IIE Asian Conference, pp. 229–232.
[93] Quan, N., Yin, J., Ng, S. H., and Lee, L. H. (2012). Simulation Optimization via Kriging: A Sequential Search using Expected Improvement with Computing Budget Constraints, IIE Transactions, To Appear.
[94] Rinott, Y. (1978). On Two-stage Selection Procedures and Related Probability Inequalities, Communications in Statistics, A7: 799–811.
[95] Romero, V. J., Ayon, D. V., and Chen, C. H. (2006). Demonstration of Probabilistic Ordinal Optimization Concepts to Continuous-variable Optimization under Uncertainty, Optimization and Engineering, 7(3): 343–365.


[96] Shi, L. and Chen, C. H. (2000). A New Algorithm for Stochastic Discrete Resource Allocation Optimization, Journal of Discrete Event Dynamic Systems: Theory and Applications, 10: 271–294.
[97] Shi, L., Chen, C. H., and Yücesan, E. (1999). Simultaneous Simulation Experiments and Nested Partition for Discrete Resource Allocation in Supply Chain Management, Proceedings of the 1999 Winter Simulation Conference, pp. 395–401.
[98] Shortle, J. F. and Chen, C. H. (2008). A Preliminary Study of Optimal Splitting for Rare-event Simulation, Proceedings of the 2008 Winter Simulation Conference, pp. 266–272.
[99] Shortle, J. F., Chen, C.-H., Crain, B., Brodsky, A., and Brod, D. (2012). Optimal Splitting for Rare-event Simulation, IIE Transactions, 44(5): 352–367.
[100] Siegmund, F., Bernedixen, J., Pehrsson, L., Ng, A., and Deb, K. (2012). Reference Point-based Evolutionary Multi-objective Optimization for Industrial Systems Simulation, Proceedings of the 2012 Winter Simulation Conference, To Appear.
[101] Simpson, T. W., Booker, A. J., Ghosh, D., Giunta, A. A., Koch, P. N., and Yang, R.-J. (2004). Approximation Methods in Multidisciplinary Analysis and Optimization: A Panel Discussion, Structural and Multidisciplinary Optimization, 27(5): 302–313.
[102] Swisher, J. R., Jacobson, S. H., and Yücesan, E. (2003). Discrete-event Simulation Optimization using Ranking, Selection, and Multiple Comparison Procedures: A Survey. ACM Transactions on Modeling and Computer Simulation, 13(2): 134–154.
[103] Szechtman, R. and Yücesan, E. (2008). A New Perspective on Feasibility Determination, Proceedings of the 2008 Winter Simulation Conference, pp. 273–280.
[104] Tekin, E. and Sabuncuoglu, I. (2004). Simulation Optimization: A Comprehensive Review on Theory and Applications, IIE Transactions, 36: 1067–1081.
[105] Teng, S., Lee, L. H., and Chew, E. P. (2007). Multi-objective Ordinal Optimization for Simulation Optimization Problems, Automatica, 43(11): 1884–1895.
[106] Teng, S., Lee, L. H., and Chew, E. P. (2010). Integration of Indifference-zone with Multi-objective Computing Budget Allocation, European Journal of Operational Research, 203(2): 419–429.
[107] Trailovic, L. and Pao, L. Y. (2004). Computing Budget Allocation for Efficient Ranking and Selection of Variances with Application to Target Tracking Algorithms, IEEE Transactions on Automatic Control, 49: 58–67.
[108] Vieira Junior, H., Kienitz, K. H., and Belderrain, M. C. N. (2011). Discrete-valued, Stochastic-constrained Simulation Optimization with COMPASS, Proceedings of the 2011 Winter Simulation Conference, pp. 4191–4200.
[109] Waeber, R., Frazier, P. I., and Henderson, S. G. (2010). Performance Measures for Ranking and Selection Procedures, Proceedings of the 2010 Winter Simulation Conference, pp. 1235–1245.


[110] Waeber, R., Frazier, P. I., and Henderson, S. G. (2012). A Framework for Selecting a Selection Procedure, ACM Transactions on Modeling and Computer Simulation, 22(3): Article 16.
[111] Wong, W. P. (2011). A DCBA-DEA Methodology for Selecting Suppliers with Supply Risk, International Journal of Productivity and Quality Management, 8(3): 296–312.
[112] Wong, W. P., Jaruphongsa, W., and Lee, L. H. (2011). Budget Allocation for Effective Data Collection in Predicting an Accurate DEA Efficiency Score, IEEE Transactions on Automatic Control, 56(6): 1235–1245.
[113] Xiao, H., Lee, L. H., and Ng, K. M. (2012). Optimal Computing Budget Allocation for Complete Ranking. Under Review.
[114] Yan, S., Zhou, E., and Chen, C. H. (2012). Efficient Selection of a Set of Good Enough Designs with Complexity Preference, IEEE Transactions on Automation Science and Engineering, 9(3): 596–606.
[115] Zhang, S., Chen, P., Lee, L. H., Chew, E. P., and Chen, C. H. (2011). Simulation Optimization using the Particle Swarm Optimization with Optimal Computing Budget Allocation, Proceedings of the 2011 Winter Simulation Conference, pp. 4303–4314.
[116] Zhang, S., Lee, L. H., Chew, E. P., Chen, C. H., and Jen, H. Y. (2012a). An Improved Simulation Budget Allocation Procedure to Efficiently Select the Optimal Subset of Many Alternatives, Proceedings of 2012 IEEE International Conference on Automation Science and Engineering, pp. 226–232.
[117] Zhang, R., Song, S., and Wu, C. (2012b). A Two-stage Hybrid Particle Swarm Optimization Algorithm for the Stochastic Job Shop Scheduling Problem, Knowledge-based Systems, 27: 393–406.


Chapter 10

Nested Partitions

Weiwei Chen
GE Global Research

Leyuan Shi
University of Wisconsin-Madison

Nested Partitions (NP) is a partition- and sampling-based framework for solving large-scale optimization problems. It has been successfully applied to industrial problems in areas such as product design, supply chains, logistics, and healthcare. This chapter first reviews the research background and the connection with ordinal optimization. Then the generic NP method for solving deterministic optimization is introduced, and the global convergence of NP is proved. Some enhancements and advanced developments of NP are further presented, including a linear programming (LP) solution-based sampling method that biases the sampling probabilities using LP solutions, an extreme value-based promising index that can more accurately and robustly guide the NP moves, and hybrid NP algorithms that have proved effective in solving real-world problems. Finally, the NP method for solving stochastic optimization is developed, and its global convergence is discussed.

10.1. Overview

Optimization problems arise in many key engineering applications and business decisions. Obtaining good solutions to such problems can lead to business success by, for example, reducing operational costs, increasing revenue, or improving service levels. However, seeking an optimal or even near-optimal solution for many deterministic problems is notoriously difficult, due to the large size of the problems or structural difficulties. Adding to these difficulties, stochastic optimization problems are even more challenging because of the noise in evaluating the performance of each solution.


Extensive research has been done to explore efficient algorithms for solving large-scale deterministic optimization problems. There are generally two camps of algorithms: (1) exact algorithms, grounded in mathematical programming and dynamic programming theories; (2) heuristics, including metaheuristics, which aim to quickly find acceptable solutions. Exact algorithms have been studied for decades, and significant breakthroughs in the ability to solve large-scale optimization problems using mathematical programming have been achieved, especially for discrete optimization. Branch and bound methods [1, 2] are one category of exact methods commonly used in solving integer or mixed-integer programming. Decomposition methods are another class of methods for solving discrete optimization problems [1]. Lagrangian relaxation can be thought of as a decomposition method with respect to the constraints, since it moves one or more constraints into the objective function. Relaxation methods play a key role in the use of mathematical programming for solving discrete optimization problems [3, 4]. The Lagrangian problem is easier to solve since the complicating constraints are no longer present; furthermore, it often produces a fairly tight and hence useful bound. Dynamic programming is another class of exact algorithms, popularly used to address sequential decision making; it solves subproblems recursively using the Bellman equation [5]. These exact algorithms are very powerful nowadays because of research breakthroughs; however, they can still be very time-consuming when seeking optimal solutions for very large real-world problems, and they may be unable to handle structural difficulties such as non-linear constraints. Some approximation algorithms have been developed based on mathematical programming theories. These algorithms guarantee that the solution lies within a certain range of the optimal solution, and they can usually provide provable runtime bounds. A typical example is the family of approximation algorithms for covering problems [6, 7]. Another class of optimization algorithms are heuristic algorithms, which aim to find good solutions within an acceptable time frame. Unlike exact algorithms, heuristics do not usually make a performance guarantee, such as a bound, but they have proven to be very effective and efficient in solving many real-world problems. Some heuristic algorithms utilize domain knowledge to speed up the search, and are thus problem-dependent. On the other hand, there exists a class of metaheuristics designed to be applicable to a large variety of problems. One simple algorithm is greedy search [8], which makes the locally optimal decision at each step.


It basically sacrifices solution quality for fast computation time; all heuristics make different trade-offs between these two key measurements. Tabu search [9, 10] improves local search by prohibiting repeated visits to the same solution within a short period of time. A subcategory of the metaheuristics is evolutionary algorithms, which are typically used by the artificial intelligence and machine learning communities. These algorithms include genetic algorithms [11, 12], ant colony optimization [13, 14], particle swarm optimization [15, 16], etc. Some metaheuristics use probability distributions as rules/strategies, such as simulated annealing [17, 18], cross-entropy methods [19, 20], and model reference adaptive search [21]. Metaheuristics are designed to be general enough to apply to many real problems, and it is usually easy to further speed them up by embedding domain knowledge.

For stochastic optimization, the combinatorial explosion of the deterministic counterpart remains, while the noisy estimates of the performances cause another level of challenge. For deterministic problems, techniques such as mathematical programming exhaust the solution space and thus guarantee finding the optimal solution. This is not true for stochastic problems, because the objective function can only be estimated. Compared to the advancements in solving deterministic problems, the search techniques for stochastic problems are still in an embryonic stage, especially for problems with discrete decision variables. While industrial solvers for stochastic discrete optimization rely heavily on metaheuristic methods such as simulated annealing [22, 23], the research frontier has been advanced by methods that provide convergence guarantees. One example is the stochastic ruler [24], which converges globally but whose finite-time performance is not good. Stochastic branch and bound (SB&B) [25] uses bounding functions and partitions the solution space with the best bound. These methods guarantee global convergence, given that all feasible solutions are evaluated with an infinite number of replications. In practice, only a small fraction of the solutions can be simulated when the solution space is large; global convergence then has little practical meaning. The COMPASS method [26] is thus designed to have a practically meaningful stopping criterion, which is a local convergence criterion. Some other methods include the stochastic comparison method [27], the low-dispersion point sets method [28], etc. Worth mentioning here is ordinal optimization (OO) [29, 30], which is based on ordinal estimation of the performance measure. It has been shown that the convergence rate of the ordinal comparison can be exponential as the number of simulation


replications or samples increases, while the convergence rate of the cardinal value estimate is at most O(1/√t), where t is the number of simulation replications or the number of simulation samples. This significantly reduces the computational cost of stochastic optimization.

Nested partitions (NP) [31] is a recently developed metaheuristic framework for solving large-scale optimization problems, both deterministic and stochastic. The NP method first partitions the solution space into subregions, and draws random samples from each subregion. The quality of each subregion is then evaluated by the so-called promising index of the region. The subregion that is determined to have the best quality becomes the next most promising region, which NP partitions in the next iteration. NP also features a strategy that backtracks to superregions when appropriate to guarantee global optimality. The NP method is particularly effective for problems where the solution space can be partitioned such that good solutions tend to be clustered together, and the corresponding regions are hence natural candidates for concentrating the computational effort. It can be observed that NP is closely tied to ordinal optimization: comparing the promising index of each subregion is actually comparing the relative order of solutions sampled from each region.

The remainder of this chapter is organized as follows. In Section 10.2, the NP framework for deterministic optimization is introduced, and its global convergence is proved. Then, some enhancements and advanced developments of NP are presented in Section 10.3, including an LP solution-based sampling method, an extreme value-based promising index, and hybrid algorithms. In Section 10.4, the NP algorithm for solving stochastic optimization is developed, and its global convergence is discussed. Conclusions are drawn in Section 10.5.

10.2. Nested Partitions for Deterministic Optimization

Consider the following optimization problem:

min_{θ∈Θ} f(θ)    (10.1)

Problem (10.1) can be discrete or continuous. The solution space is denoted by Θ, and an objective function, which can be linear or non-linear, is defined on this set as f : Θ → R. We want to point out that if the problem is structured so that it is easily solved by exact algorithms, such as mathematical programming, the exact algorithms are preferred. However, if the problem


size is too large to be handled by exact algorithms, or the problem has unfavorable structures, NP provides an effective framework for solving it. It is also possible to combine the NP framework with exact algorithms for further improved performance.

10.2.1. Nested partitions framework

In each iteration of the NP algorithm, we assume that there is a region (subset) of Θ that is considered the most promising. We partition this most promising region into a fixed number M of subregions and aggregate the entire complementary region (also called the surrounding region), that is, all the feasible solutions not in the most promising region, into one region. Therefore we consider M + 1 subsets as a partition of the feasible region Θ; namely, they are disjoint and their union is equal to Θ. This is referred to as a valid partitioning scheme, and we assume for the remainder of this chapter that such a scheme is used. Each of these M + 1 regions is sampled using some random sampling scheme to generate feasible solutions that belong to that region. The performance values (objective values) of the randomly generated samples are used to calculate the promising index for each region. This index determines which region becomes the most promising region in the next iteration. If one of the subregions is found to be the best, this region becomes the next most promising region; the next most promising region is thus nested within the last. If the complementary region is found to be the best, then the algorithm backtracks to a larger region that contains the previous most promising region. This larger region becomes the next most promising region, and is then partitioned and sampled in the same fashion. If region η is a subregion of region σ, we call σ a superregion of η.

Let σ(k) denote the most promising region in the k-th iteration, and denote the depth of σ(k) by d(k). The feasible region Θ has depth 0, the subregions of Θ have depth 1, and so forth. When Θ is finite, eventually there will be regions that contain only a single solution; such singleton regions are called regions of maximum depth. If the problem is infinite, we define the maximum depth to correspond to the smallest desired sets. The maximum depth is denoted as d∗. With this notation, we describe the Generic Nested Partitions Algorithm, as distinguished from hybrid NP algorithms, in Algorithm 1 [31]. The special cases of being at minimum or maximum depth are considered separately.

Algorithm 1. Nested Partitions Algorithm (0 < d(k) < d∗)

(1) Partitioning. Partition the most promising region σ(k) into M subregions σ1(k), . . . , σM(k), and aggregate the complementary region Θ \ σ(k) into one region σM+1(k).
(2) Random Sampling. Randomly generate Nj sample solutions from each of the regions σj(k), j = 1, 2, . . . , M + 1:
$$\theta_1^j, \theta_2^j, \ldots, \theta_{N_j}^j, \quad j = 1, 2, \ldots, M+1.$$
Calculate the corresponding performance values:
$$f(\theta_1^j), f(\theta_2^j), \ldots, f(\theta_{N_j}^j), \quad j = 1, 2, \ldots, M+1.$$
(3) Calculate Promising Index. For each region σj, j = 1, 2, . . . , M + 1, calculate the promising index as the best performance value within the region:
$$I(\sigma_j(k)) = \min_{i\in\{1,2,\ldots,N_j\}} f(\theta_i^j), \quad j = 1, 2, \ldots, M+1. \qquad (10.2)$$
(4) Move. Calculate the index of the region with the best performance value:
$$\hat{j}_k = \arg\min_{j\in\{1,\ldots,M+1\}} I(\sigma_j(k)).$$
If more than one region is equally promising, the tie can be broken arbitrarily. If this index corresponds to a region that is a subregion of σ(k), that is, ĵk ≤ M, then let this be the most promising region in the next iteration: σ(k + 1) = σĵk(k). Otherwise, if the index corresponds to the complementary region, that is, ĵk = M + 1, backtrack to the superregion of the current most promising region, σ(k + 1) = σ(k − 1), or backtrack to the entire solution space:
$$\sigma(k+1) = \Theta. \qquad (10.3)$$
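
To make the loop concrete, the following is a minimal Python sketch of Algorithm 1 on a binary solution space, where a region is identified with a fixed bit prefix and M = 2; the prefix-based partitioning, the sample sizes, and the choice to always backtrack to Θ are illustrative assumptions of this sketch rather than part of the algorithm's specification.

    import random

    def nested_partitions(f, n_bits, n_samples=20, max_iter=200, seed=0):
        # A region sigma(k) is the set of bit vectors sharing a fixed prefix;
        # its two subregions fix the next bit to 0 or 1 (so M = 2).
        random.seed(seed)
        prefix = []
        for _ in range(max_iter):
            if len(prefix) == n_bits:           # maximum depth: a singleton region
                return prefix
            subregions = [prefix + [b] for b in (0, 1)]
            index = []
            for region in subregions + [None]:  # None stands for the surrounding region
                best = float("inf")
                for _ in range(n_samples):
                    if region is None:
                        x = [random.randint(0, 1) for _ in range(n_bits)]
                        if x[:len(prefix)] == prefix:
                            continue            # keep only samples outside sigma(k)
                    else:
                        x = region + [random.randint(0, 1)
                                      for _ in range(n_bits - len(region))]
                    best = min(best, f(x))      # promising index (10.2): sample minimum
                index.append(best)
            j = index.index(min(index))         # step (4): move
            prefix = subregions[j] if j < 2 else []   # or backtrack to Theta, as in (10.3)
        return prefix

    # Example: minimize the number of ones; the optimum is the all-zero vector.
    print(nested_partitions(lambda x: sum(x), n_bits=8))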

For the special case of d(k) = 0, the steps are identical except there is no complementary region. The algorithm hence generates feasible sample solutions from the subregions and in the next iteration moves to the subregion with the best promising index. For the special case of d(k) = d∗,
there are no subregions. The algorithm therefore generates feasible sample solutions from the complementary region and either backtracks or stays in the current most promising region.

10.2.2. Global convergence

The NP method described above gives us a framework that guides the search and enables global convergence [32]. The basic idea is that the sequence of regions that NP visits, denoted by $\{\sigma(k)\}_{k=1}^{\infty}$, is a Markov chain, and the regions of maximum depth are the only absorbing states; that is, $\{\sigma(k)\}_{k=1}^{\infty}$ ends up in one of these absorbing states with probability one. We begin by defining some notation.

Definition 1. Let Σ denote the set of valid regions that the NP algorithm visits given a fixed partitioning scheme, and let Σ0 denote the set of regions of maximum depth.

For discrete optimization, it is straightforward that the sequence of regions that NP visits is a Markov chain, called the NP Markov chain.

Proposition 1. The stochastic process $\{\sigma(k)\}_{k=1}^{\infty}$ is a Markov chain with Σ as its state space.

If the NP Markov chain is in a state that is not of maximum depth, it will either move to a subregion or backtrack. Therefore, the transition probability of staying in such a state is zero. We have

Proposition 2. A state η ∈ Σ is an absorbing state for the NP Markov chain $\{\sigma(k)\}_{k=1}^{\infty}$ if and only if η ∈ Σ0 and η = {θ∗}, where θ∗ is a global minimizer of the original problem.

With Propositions 1 and 2, the following convergence result can be proved.

Theorem 1. The NP method with a valid partitioning scheme for the optimization problem $\theta^* \in \arg\min_{\theta\in\Theta} f(\theta)$ converges almost surely to a global minimum in finite time; that is, there exists a K < ∞ such that with probability one,
$$\sigma(k) = \{\theta^*\}, \quad \forall k \ge K,$$
where $\theta^* \in \arg\min_{\theta\in\Theta} f(\theta)$.

For continuous optimization, we first define a local search heuristic H : Θ × Σ → Θ. For any σ ∈ Σ and θ ∈ σ, the local search H(θ, σ) converges to a point θ̃ within σ with θ as the starting point. For a point θ̃ ∈ $\mathbb{R}^n$, the region of attraction is the set A(θ̃) ⊆ $\mathbb{R}^n$ such that for any θ ∈ A(θ̃), H(θ, Θ) = θ̃. Therefore, a promising index can be defined as
$$I(\sigma) = \min_{\theta\in\sigma} f(H(\theta, \sigma)). \qquad (10.4)$$
We further define a function φ : Θ → R, which measures the distance between a point θ̃ ∈ $\mathbb{R}^n$ and the complement of its region of attraction:
$$\phi(\tilde{\theta}) = \inf_{\theta\in\Theta\setminus A(\tilde{\theta})} \|\theta - \tilde{\theta}\|.$$

Thus, we can derive the following property for continuous optimization.

Theorem 2. Assume the NP method is applied to a problem defined on a bounded subset Θ of $\mathbb{R}^n$ using the promising index defined by (10.4). Choose the maximum depth d∗ such that it satisfies
$$\phi(\theta_{opt}) > \max_{\theta_1,\theta_2\in\eta} \|\theta_1 - \theta_2\|$$
for all η ∈ Σ0. Let σopt be the region of maximum depth that contains θopt; then σopt is an absorbing state, and the NP method converges to the global optimum in finite time with probability one.

The above theorems guarantee that the NP method will eventually converge to a global optimum. However, a more important issue in real applications is its finite-time behavior, that is, the number of iterations until the NP Markov chain gets absorbed at a global optimum, denoted by Y. It is not surprising that the expected number of iterations E[Y] depends on the partitioning scheme, the sampling strategy, the promising index definition, and the backtracking strategy, as well as on the problem structure. It is almost never possible to calculate its exact value in applications. To make a reasonable estimate, we introduce another important quantity in NP called the success probability, denoted by P0. The success probability is the probability of the NP algorithm moving in the correct direction, i.e., backtracking if the global optimum is not in the current most promising region, or selecting the
correct subregion if it is in the current region. If we assume the success probability equals 1 at every iteration, the NP Markov chain will be absorbed in d∗ iterations and backtracking will never happen. Let η be any region, and let η∗ be the unique region that is the closest to the global optimum (in terms of the number of iterations) among all regions that the NP Markov chain can move to from η in the next iteration. In order to study the relationship between the convergence speed of NP and the success probability, we make the following assumptions.

Assumption 1. The transition probability from η to η∗ is a constant P0; that is, for any region η,
$$P[\sigma(k+1) = \eta^* \mid \sigma(k) = \eta] = P_0.$$

Assumption 2. The transition probability from η to any other region ξ except η∗ in the next iteration is given by
$$P[\sigma(k+1) = \xi \mid \sigma(k) = \eta] = \frac{1-P_0}{M}.$$

Basically, Assumption 1 says that P0 depends neither on the current most promising region nor on the iteration; i.e., we assume that the transitions of the algorithm are independent and identically distributed until the global optimum is identified. Assumption 2 states that the probability of moving in any given wrong direction is the same for every region. Since each region is partitioned into M subregions, the probability of moving in any given wrong direction is approximated by (1 − P0)/M. Then, we have

Theorem 3. If NP Algorithm 1 backtracks to the entire solution space using (10.3), the expected number of iterations until the NP Markov chain gets absorbed is given by
$$E[Y] = d^* + \frac{(1-P_0)(M-1)}{P_0(M-1+P_0)} \cdot \frac{P_0 - P_0^{d^*+1}}{1-P_0}\left(1 - \left(\frac{1-P_0}{M}\right)^{d^*}\right)\frac{1}{P_0}. \qquad (10.5)$$

This is still a very rough estimate, since the assumptions are not valid in most applications. However, Eq. (10.5) provides some insight into the transient behavior of NP. Generally speaking, we can expect the NP method to be efficient for a specific problem if the success probability P0 is high. If the success probability is too low, exhaustive search can be more efficient than NP. Thus, the efficiency of NP depends on being able to make a move in the correct direction with reasonably high confidence. That means
it makes sense to concentrate more computational effort on such regions than on others. For details of the proofs of the above theorems, and for more theoretical properties of the NP method, we refer to [31].

10.3. Enhancements and Advanced Developments

The generic framework of the NP algorithm has been introduced above. In this section, we discuss several enhancements and advanced developments of the NP method.

10.3.1. LP solution-based sampling

The NP method requires drawing samples from each subregion in each iteration of the algorithm. Therefore, being able to sample high-quality solutions can be critical to the success of the NP method. In successful NP applications, effective sampling methods are usually adopted, some of which are based on domain knowledge. Here, we introduce a heuristic sampling approach that can be applied to a wide range of discrete problems. Consider the following optimization problem:
$$\begin{aligned} \min\quad & c_1 x + c_2 y \\ \text{s.t.}\quad & A_1 x + A_2 y \le b, \\ & e^T y \le M, \\ & x \in \mathbb{R}^n,\ y \in \{0,1\}^m, \end{aligned} \qquad (10.6)$$
where e is the vector of all ones, c1, c2, A1, A2, M are given parameters, and M < m. The x's can be either real or integer variables, and the y's are binary variables. Because of the structure of the constraints in (10.6), we introduce the concept of partial samples. A partial sample is a solution generated by sampling only a part of the variables in a given region. For example, assume the solution space for y is denoted as (y1, . . . , ym), and the sampling region is $(y_1^*, \ldots, y_k^*, y_{k+1}, \ldots, y_m)$, where $y_1^*, \ldots, y_k^*$ are fixed. By sampling $y_{k+1}, \ldots, y_{k+j}$, 1 ≤ j < m − k, we obtain a partial sample of the form $(y_1^*, \ldots, y_k^*, \bar{y}_{k+1}, \ldots, \bar{y}_{k+j}, y_{k+j+1}, \ldots, y_m)$. The purpose of introducing partial samples is that, by reducing the number of variables in the optimization problem, the subproblem becomes smaller, with only (m − k − j) variables, and thus easier to solve to optimality using mathematical programming techniques. We intend to combine the global perspective of NP with the effectiveness of exact algorithms for solving subproblems.

The following sampling procedure generates partial samples based on the solution of the linear programming (LP) relaxation.

Algorithm 2. LP Solution-based Sampling

(1) Solve the LP relaxation of the original optimization problem, with the added constraints 0 ≤ y ≤ 1. Obtain the LP solution y∗.
(2) Calculate the sampling weights of the y variables based on the values of y∗. The sampling weight of variable yj (j = 1, . . . , m) is positively correlated with the value of yj∗.
(3) Sample a partial solution as follows: randomly select M variables from all the y's based on the sampling weights calculated in the previous step, and fix the other m − M y's to zero; the remaining problem is the subproblem associated with the partial solution.
(4) Solve each subproblem to obtain a feasible sample of the original problem.

Variants of Algorithm 2 can be designed according to different problem structures. For many problems with tight LP bounds, Algorithm 2 can be very effective at decomposing a large problem into smaller subproblems within the NP framework. In [33, 34], LP solution-based sampling is applied to a local pickup and delivery problem and a facility location problem in the intermodal industry, and it proved effective in solving these large-scale discrete optimization problems. A sketch of the procedure on a toy instance is given below.
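
For illustration, the following Python sketch walks through the four steps of Algorithm 2 on a small, made-up binary instance (with no x variables), using scipy.optimize.linprog for the LP relaxation and brute-force enumeration in place of an exact solver for the small subproblem; the instance data, the weight normalization, and the enumeration are assumptions of this sketch, not part of the algorithm's specification.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    m, M_ones = 12, 4                       # m binary y's, at most M_ones can be 1
    c = -rng.uniform(1.0, 10.0, size=m)     # maximize value <=> minimize c @ y
    A = rng.uniform(0.0, 1.0, size=(1, m))  # a single resource constraint A y <= b
    b = np.array([3.0])

    # Step (1): LP relaxation with the added bounds 0 <= y <= 1.
    y_lp = linprog(c, A_ub=A, b_ub=b, bounds=[(0.0, 1.0)] * m).x

    # Step (2): sampling weights positively correlated with the LP solution.
    w = (y_lp + 1e-6) / (y_lp + 1e-6).sum()

    # Step (3): a partial solution fixes all but M_ones randomly chosen y's to 0.
    chosen = rng.choice(m, size=M_ones, replace=False, p=w)

    # Step (4): the remaining subproblem has only M_ones free variables; here we
    # simply enumerate its 2^M_ones assignments instead of calling a MIP solver.
    best_val, best_y = -np.inf, None
    for mask in range(2 ** M_ones):
        y = np.zeros(m)
        y[chosen] = [(mask >> i) & 1 for i in range(M_ones)]
        if (A @ y <= b).all():
            val = float(-c @ y)
            if val > best_val:
                best_val, best_y = val, y
    print(best_val, best_y)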

10.3.2. Extreme value-based promising index

The promising index measures how good the quality of each subregion is, and consequently decides which subregion becomes the next most promising region. It is an important component of NP that deserves extensive investigation. The traditional promising index used in the NP method is the sample minimum/maximum of each region; another index, discussed in [35], is the sample mean of each region. Here we propose a new promising index based on extreme value theory. In solving large-scale optimization problems, global optima are generally very difficult to obtain. In other words, good solutions to any given problem instance are, in a sense, rare phenomena within the space of feasible solutions. Extreme value theory provides limiting results for large samples: given a large set of independent and identically distributed (iid) random variables, the limiting distribution of the minimum (or maximum) of the set goes to one of the
three extreme value distributions [36], with types I, II and III known as the Gumbel, Fréchet and Weibull families, respectively. For the minimization problem (10.1), the global minimizer serves as an implicit lower bound on the objective value, and hence the extreme values follow a Weibull distribution, as they are bounded below. At each iteration of the NP algorithm, for each region σj, a set of independent solutions is randomly sampled and randomly divided into ni groups, each with the same number l of samples. Temporarily dropping the subscript i for easier presentation, we have
$$\begin{matrix} \theta_{11} & \theta_{12} & \cdots & \theta_{1l} \\ \vdots & \vdots & \ddots & \vdots \\ \theta_{n1} & \theta_{n2} & \cdots & \theta_{nl}. \end{matrix}$$

For each θjk (j = 1, . . . , n, k = 1, . . . , l), its objective value is evaluated and denoted as yjk = f(θjk), and the best objective value of each group is denoted as zj. Thus, we have the following matrix of objective values of the samples:
$$\begin{matrix} y_{11} & y_{12} & \cdots & y_{1l} & \quad z_1 = \min\{y_{11}, \ldots, y_{1l}\} \\ \vdots & \vdots & \ddots & \vdots & \quad\vdots \\ y_{n1} & y_{n2} & \cdots & y_{nl} & \quad z_n = \min\{y_{n1}, \ldots, y_{nl}\}. \end{matrix}$$
Here, we call the yjk individual samples and the zj supersamples. Note that the y's and z's are actually the objective values of samples. Besides obtaining a supersample by grouping, an alternative is to perform a local search from an initial sample when an effective local search heuristic H exists. In that case, it is not necessary to draw and evaluate l samples in order to obtain one supersample: we only need to randomly draw one initial sample $\theta_j^0$ (j = 1, . . . , n) and perform the local search H to get the local optimum $\theta_j^*$, whose objective value $f(\theta_j^*)$ serves as the supersample:
$$\theta_1^0 \stackrel{H}{\longrightarrow} \theta_1^* \qquad z_1 = f(\theta_1^*)$$
$$\vdots$$
$$\theta_n^0 \stackrel{H}{\longrightarrow} \theta_n^* \qquad z_n = f(\theta_n^*).$$
Assume that the individual samples y are iid, which can be statistically verified and is usually true since the θ's are iid. Extreme value theory tells us that when the sample size l is large enough, the limiting distribution of the supersamples z follows a Weibull distribution, since the minimization
problem (10.1) is bounded from below [37]. With location parameter α, scale parameter β (> 0), and shape parameter γ (> 0), the Weibull distribution has the following cumulative distribution function:
$$F(z; \alpha, \beta, \gamma) = \begin{cases} 0 & \text{for } z < \alpha; \\ 1 - \exp\left(-\left(\dfrac{z-\alpha}{\beta}\right)^{\gamma}\right) & \text{for } z \ge \alpha. \end{cases}$$
The Weibull parameters can be fitted using maximum likelihood estimation (MLE), which has been proven to fit the Weibull distribution with three unknown parameters efficiently [38]. The MLEs of the three Weibull parameters, denoted as $\tilde{\alpha}, \tilde{\beta}, \tilde{\gamma}$, are the values that maximize the following log-likelihood function [37, 39]:
$$\ln L(z_1, \ldots, z_n; \alpha, \beta, \gamma) = n(\ln\gamma - \gamma\ln\beta) + (\gamma-1)\sum_{i=1}^{n}\ln(z_i - \alpha) - \beta^{-\gamma}\sum_{i=1}^{n}(z_i - \alpha)^{\gamma},$$
subject to the constraints $\alpha < \min_i z_i$, β > 0, γ > 0.

The MLEs of the Weibull parameters are asymptotically normal under mild conditions (e.g., γ > 2) [40]. We assume the condition is satisfied, which can be checked after fitting. Actually, it is not difficult to show that in the rare cases when the MLEs of the Weibull parameters do not exist, the extreme value-based promising index downgrades to the standard index (10.2) [41]. The MLE of α follows an asymptotic normal distribution, that is,
$$\tilde{\alpha} \sim N\!\left(\alpha^*, \frac{s^2}{n}\right),$$
where α∗ is the true value of the location parameter, s² is the asymptotic variance of the estimator, and n is the number of supersamples. In practice, the true value α∗ is usually replaced by the point estimator α̂ obtained by maximizing the above log-likelihood function given the samples, and the asymptotic variance can be obtained by inverting the observed Fisher information matrix, whose elements are the negatives of the expected values of the second partial
derivatives of the log-likelihood function:
$$V \approx -\begin{bmatrix} \dfrac{\partial^2 \ln L}{\partial\alpha^2} & \dfrac{\partial^2 \ln L}{\partial\alpha\,\partial\beta} & \dfrac{\partial^2 \ln L}{\partial\alpha\,\partial\gamma} \\ \dfrac{\partial^2 \ln L}{\partial\alpha\,\partial\beta} & \dfrac{\partial^2 \ln L}{\partial\beta^2} & \dfrac{\partial^2 \ln L}{\partial\beta\,\partial\gamma} \\ \dfrac{\partial^2 \ln L}{\partial\alpha\,\partial\gamma} & \dfrac{\partial^2 \ln L}{\partial\beta\,\partial\gamma} & \dfrac{\partial^2 \ln L}{\partial\gamma^2} \end{bmatrix}^{-1}_{\hat{\alpha},\hat{\beta},\hat{\gamma}} = \begin{bmatrix} \mathrm{Var}(\hat{\alpha}) & \mathrm{Cov}(\hat{\alpha},\hat{\beta}) & \mathrm{Cov}(\hat{\alpha},\hat{\gamma}) \\ \mathrm{Cov}(\hat{\alpha},\hat{\beta}) & \mathrm{Var}(\hat{\beta}) & \mathrm{Cov}(\hat{\beta},\hat{\gamma}) \\ \mathrm{Cov}(\hat{\alpha},\hat{\gamma}) & \mathrm{Cov}(\hat{\beta},\hat{\gamma}) & \mathrm{Var}(\hat{\gamma}) \end{bmatrix},$$
where $\hat{\alpha}, \hat{\beta}, \hat{\gamma}$ are the point estimators, and $\hat{s}^2 = \mathrm{Var}(\hat{\alpha})$ is the sample variance of the Weibull location parameter. Therefore, we have
$$\tilde{\alpha}_i \sim N\!\left(\hat{\alpha}_i, \frac{\hat{s}_i^2}{n_i}\right), \quad i = 1, \ldots, M.$$
Since the Weibull distribution is bounded below by the location parameter αi, the estimator $\tilde{\alpha}_i$ serves as a prediction of the optimal value of each subset. The value of this prediction is estimated by the mean $\hat{\alpha}_i$, and the accuracy of the prediction is estimated by the variance $\hat{s}_i^2/n_i$. Hence, the extreme value-based promising index can be defined as follows.

Definition 2. For a partition of the current most promising region into mutually exclusive subregions σ1, . . . , σM, together with the surrounding region σM+1, the promising index is calculated as
$$I(\sigma_j) = \hat{\alpha}_j, \quad j = 1, 2, \ldots, M+1.$$
The next most promising region is determined by
$$\hat{j} = \arg\min_{j\in\{1,\ldots,M+1\}} I(\sigma_j).$$

This promising index is considered more informative than other indices such as the sample minimum and the sample mean: it is an estimate of the minimum of each region that takes the distribution of the sample population into account. Numerical tests in [41, 42] show that the NP algorithm using the new promising index has a higher success probability, and hence is more accurate and robust, than with the traditional indices. A small numerical sketch of the index is given below.
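
The following Python sketch forms supersamples as group minima and fits the three-parameter Weibull by MLE using SciPy's weibull_min.fit, which plays the role of the log-likelihood maximization above; the synthetic regions and the grouping sizes are assumptions of this sketch.

    import numpy as np
    from scipy import stats

    def promising_index(y, n_groups):
        # y: objective values sampled from one region (minimization problem)
        z = np.asarray(y).reshape(n_groups, -1).min(axis=1)  # supersamples z_1..z_n
        gamma_hat, alpha_hat, beta_hat = stats.weibull_min.fit(z)
        return alpha_hat     # the fitted location parameter predicts the region optimum

    rng = np.random.default_rng(0)
    # three synthetic regions whose best achievable values differ
    regions = [rng.normal(mu, 1.0, size=200) for mu in (0.0, 0.5, 1.0)]
    index = [promising_index(y, n_groups=20) for y in regions]
    print("next most promising region:", int(np.argmin(index)))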

10.3.3. Hybrid algorithms

The NP framework provides the flexibility to incorporate other algorithms, such as domain knowledge or local search heuristics, into the search procedure.
The resulting hybrid algorithms are more efficient than either generic NP or the heuristics alone. The NP method can also be combined with math programming to produce more efficient search algorithms when the math programming approach alone has difficulty solving the problem. In this section, two hybrid algorithms, for product design and truck scheduling, are briefly reviewed.

10.3.3.1. Product design

Product design problems arise when designing new products to satisfy the preferences of expected customers. An important problem is discerning how to use the preferences of potential customers to design a new product such that the market share of the new product is maximized. This problem is very difficult to solve, especially as the product complexity increases and more attributes are introduced. In fact, it belongs to the class of NP-hard problems, and thus exact solution methodologies are infeasible for realistically sized problems. In the literature, a greedy search (GS) heuristic and a dynamic programming (DP) approach have been applied [43], and a genetic algorithm (GA) approach has also been introduced [44, 45]. In [46], four hybrid algorithms are developed to solve the problem.

• NP/GS Algorithm: The GS heuristic is incorporated into the NP framework by using it to bias the sampling distribution used in the NP algorithm. Product profiles that are "close" to the profile generated by pure GS are sampled with higher probability, and other profiles are sampled uniformly (see the sketch after this list).
• NP/DP Algorithm: Similarly to the NP/GS algorithm, the DP heuristic is incorporated in the sampling step, with the objective of biasing the sampling distribution toward product profiles that are heuristically good.
• NP/GA Algorithm: In each NP iteration, the samples drawn in each NP subregion are used as an initial population for the GA. The GA starts with this initial population of product profiles and improves the profiles iteratively over generations. The final population of the GA is then used in NP to compute the promising index of each region.
• NP/GA/GS Algorithm: It is also possible to incorporate more than one heuristic into the NP framework. While NP/GA/DP was found to have too much overhead to be efficient, NP/GA/GS is reported to produce good results.
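
A minimal sketch of the biased sampling idea behind NP/GS is shown below for binary product profiles: profiles near the greedy profile (in Hamming distance) are drawn with higher probability, while a fraction of the samples remains uniform; the flip probability and the uniform share are assumptions of this sketch.

    import random

    def biased_sample(greedy_profile, flip_prob=0.1, uniform_share=0.3):
        n = len(greedy_profile)
        if random.random() < uniform_share:
            # uniform exploration of the region
            return [random.randint(0, 1) for _ in range(n)]
        # otherwise stay close to the heuristic profile
        return [1 - b if random.random() < flip_prob else b
                for b in greedy_profile]

    random.seed(0)
    print(biased_sample([1, 0, 1, 1, 0, 0]))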

Numerical examples are used to compare the new optimization framework with existing heuristics, and the results indicate that the new method is able to produce higher-quality product profiles. Furthermore, these performance improvements were found to increase with problem size. This finding indicates that the NP optimization framework is an important addition to the product design and development process, and will be particularly useful for designing complex products that have a large number of important attributes.

10.3.3.2. Local pickup and delivery

Local pickup and delivery is an important problem in intermodal transportation. Truck/rail intermodal transportation combines the cost-effectiveness of rail with the flexibility of trucks for local transport. Although the truck portion is much shorter than the rail portion, it contributes a significant share of the total cost, largely because of higher labor costs and growth in service demand [47]. The local pickup and delivery problem (LPDP) is concerned with the optimal movement of a set of loads between customer locations and rail ramps in a local service area over a relatively short planning horizon. At the beginning of each work day, a fixed number of vehicles are positioned throughout the area. A vehicle can serve only one load at a time; after delivering a load, it moves on to the next load or becomes idle. Served loads generate revenue, but the empty movements of vehicles between the delivery location of one load and the pickup location of the next incur costs. In [33], the LPDP is formulated as a mixed integer program (MIP), and standard math programming can hardly solve instances of medium and large size. Thus, a hybrid NP and math programming (HNP-MP) algorithm is proposed. The hybrid algorithm uses NP as a framework and generates partial samples as introduced in Section 10.3.1. Given the partial samples, the remaining MIP constitutes a smaller MIP and can be solved to optimality using math programming. The computational results in [33] show that HNP-MP performs as well as CPLEX on easy cases, and outperforms CPLEX on larger problems.

10.4. Nested Partitions for Stochastic Optimization

So far, we have introduced the NP method and its advanced developments for deterministic optimization. In this section, we discuss its
applicability to problems where the objective function is noisy, for example, when it can only be evaluated as a realization of some random variables. This type of stochastic optimization problem often arises when optimizing the configuration of complex discrete event systems, such as queueing systems. The challenge of evaluating the objective exactly exacerbates the difficulty of the optimization problem. Consider the following stochastic problem:
$$\min_{\theta\in\Theta} E[f_\omega(\theta)], \qquad (10.7)$$
where ω represents the randomness in the function evaluation. For each solution θ, the objective value is subject to this randomness, and thus the expected performance is sought.

10.4.1. Nested partitions for stochastic optimization

In Section 10.2.2, it was explained that the regions of maximum depth are the absorbing states of the NP Markov chain. For NP to work for stochastic optimization, we need to keep track of the number of visits to each region of maximum depth σ ∈ Σ0. The number of visits by the k-th iteration is denoted as Nk(σ).

Definition 3. The estimate of the best solution for problem (10.7) is
$$\hat{\sigma}_{opt}(k) \in \arg\max_{\sigma\in\Sigma_0} N_k(\sigma),$$
the most frequently visited region of maximum depth by the k-th iteration, that is, the region that has most often been the most promising region.

With Definition 3, the NP algorithm for stochastic optimization can be described as follows.

Algorithm 3. Nested Partitions Algorithm for Stochastic Optimization

(0) Initializing. Let k = 0, N0(σ) = 0 for all σ ∈ Σ, σ̂opt(k) = Θ, and σ(0) = Θ.
(1) Partitioning. Unless the current most promising region σ(k) has reached the maximum depth, partition σ(k) into M subregions σ1(k), . . . , σM(k), and aggregate the surrounding region Θ \ σ(k) into one region σM+1(k). Here M can be either a fixed number or depend on the current most promising region σ(k). If σ(k) = Θ, there is no surrounding region; if σ(k) ∈ Σ0, there is no further partitioning and M = 1.
(2) Random Sampling. Randomly generate Nj sample solutions from each of the regions σj(k), j = 1, 2, . . . , M + 1:
$$\theta_1^j, \theta_2^j, \ldots, \theta_{N_j}^j, \quad j = 1, 2, \ldots, M+1.$$
Similarly, the sample size Nj can be either a fixed number or depend on the region σj(k). Then, calculate the corresponding performance values, subject to randomness:
$$f_\omega(\theta_1^j), f_\omega(\theta_2^j), \ldots, f_\omega(\theta_{N_j}^j), \quad j = 1, 2, \ldots, M+1.$$
(3) Estimate Promising Index. For each region σj, j = 1, 2, . . . , M + 1, estimate the promising index as the best observed performance value within the region:
$$\hat{I}(\sigma_j(k)) = \min_{i\in\{1,2,\ldots,N_j\}} f_\omega(\theta_i^j), \quad j = 1, 2, \ldots, M+1.$$
(4) Move. Calculate the index of the region with the best performance value:
$$\hat{j}_k = \arg\min_{j\in\{1,\ldots,M+1\}} \hat{I}(\sigma_j(k)).$$
If more than one region is equally promising, the tie can be broken arbitrarily. If this index corresponds to a region that is a subregion of σ(k), that is, ĵk ≤ M, then let this be the most promising region in the next iteration: σ(k + 1) = σĵk(k). Otherwise, if the index corresponds to the complementary region, that is, ĵk = M + 1, backtrack to the superregion of the current most promising region, σ(k + 1) = σ(k − 1), or backtrack to the entire solution space, σ(k + 1) = Θ.
(5) Update Counters. If the next most promising region σ = σ(k + 1) ∈ Σ0, let Nk+1(σ) = Nk(σ) + 1; otherwise let Nk+1(σ) = Nk(σ). If there exists σ ∈ Σ0 such that Nk+1(σ) > Nk+1(σ̂opt(k)), let σ̂opt(k + 1) = σ; otherwise let σ̂opt(k + 1) = σ̂opt(k). Let k = k + 1. Go back to Step 1.
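
A minimal Python sketch of Algorithm 3 on a noisy version of the earlier binary problem is given below, reusing the bit-prefix partitioning of the earlier sketch. For brevity, the surrounding region is only sampled at maximum depth, backtracking always returns to Θ, and the noise model and sample sizes are illustrative assumptions; the essential addition relative to Algorithm 1 is the visit counter of Step (5) and the returned estimate σ̂opt.

    import random
    from collections import Counter

    def stochastic_np(f_noisy, n_bits, n_samples=20, n_iter=500, seed=1):
        random.seed(seed)
        visits = Counter()                      # Step (5): visit counters N_k(sigma)
        prefix = []
        for _ in range(n_iter):
            if len(prefix) == n_bits:           # at maximum depth
                visits[tuple(prefix)] += 1
                stay = min(f_noisy(list(prefix)) for _ in range(n_samples))
                out = min(f_noisy([random.randint(0, 1) for _ in range(n_bits)])
                          for _ in range(n_samples))
                if out < stay:
                    prefix = []                 # backtrack to Theta
                continue
            subregions = [prefix + [b] for b in (0, 1)]
            index = [min(f_noisy(r + [random.randint(0, 1)
                                      for _ in range(n_bits - len(r))])
                         for _ in range(n_samples)) for r in subregions]
            prefix = subregions[index.index(min(index))]
        # the estimate is the most visited maximum-depth region, not sigma(k)
        return max(visits, key=visits.get)

    noisy = lambda x: sum(x) + random.gauss(0.0, 0.5)
    print(stochastic_np(noisy, n_bits=6))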

The biggest difference between Algorithm 3 and Algorithm 1 is that the most promising region σ(k), estimated from the current samples in Step 4, may not be the same as the most frequently visited region of maximum depth σ̂opt(k). In deterministic problems, these two regions coincide. However, by Definition 3, the best region for the stochastic problem is now σ̂opt(k), instead of σ(k).

10.4.2. Global convergence

Since in the k-th iteration the next most promising region σ(k + 1) depends only on the current most promising region σ(k) and the sampling information obtained in the k-th iteration, the sequence of most promising regions $\{\sigma(k)\}_{k=1}^{\infty}$ is a Markov chain with state space Σ. Denote by b(σ) the region that NP backtracks to from region σ ∈ Σ \ {Θ}; b(σ) can be either the superregion of σ or the entire solution space Θ, depending on the backtracking strategy chosen. We make the following assumption.

Assumption 3. For all σ ∈ Σ \ {Θ}, the transition probability P(σ, b(σ)) > 0.

Given Assumption 3, the following proposition can be proved.

Proposition 3. If Assumption 3 holds, the NP Markov chain has a unique stationary distribution {π(σ)}σ∈Σ.

Notice that since a unique stationary distribution exists, the average number of visits to each state converges to this distribution with probability one (w.p.1), that is,
$$\lim_{k\to\infty} \frac{N_k(\sigma)}{k} = \pi(\sigma), \quad \forall\sigma\in\Sigma, \ \text{w.p.1.}$$
Furthermore, as P(σ, σ) > 0 for some σ ∈ Σ0, the NP Markov chain is aperiodic, and hence the k-step transition probabilities converge pointwise to the stationary distribution, that is,
$$\lim_{k\to\infty} P^k(\Theta, \sigma) = \pi(\sigma), \quad \forall\sigma\in\Sigma, \ \text{w.p.1.}$$

Theorem 4. Assume that Assumption 3 holds. The estimate of the best region σ̂opt(k) converges to a maximum of the stationary distribution of the NP Markov chain, that is,
$$\lim_{k\to\infty} \hat{\sigma}_{opt}(k) \in \arg\max_{\sigma\in\Sigma_0} \pi(\sigma), \quad \text{w.p.1.}$$

Definition 4. Let σ and η be any valid regions. Then there exists some sequence of regions σ = ξ0, ξ1, . . . , ξn = η along which the Markov chain can move to get from state σ to state η. We call the shortest such sequence the shortest path from σ to η, and define κ(σ, η) to be the length of the shortest path.

Given Definition 4, we make the following assumption to enable global convergence.

Assumption 4. The set
$$S_0 = \{\xi\in\Sigma_0 : P^{\kappa(\eta,\xi)}(\eta,\xi) \ge P^{\kappa(\xi,\eta)}(\xi,\eta), \ \forall\eta\in\Sigma_0\}$$
satisfies S0 ⊆ S, that is, it is a subset of the set of global optimizers.

This assumption guarantees that the transition probability from any part of the feasible region to the global optimum is at least as large as that of going from the global optimum back to that region. Thus, we can prove the following global convergence result.

Theorem 5. Assume that NP Algorithm 3 is applied to solve the stochastic optimization problem (10.7), and that Assumptions 3 and 4 hold. Then
$$\arg\max_{\sigma\in\Sigma_0} \pi(\sigma) \subseteq S,$$
and consequently NP Algorithm 3 converges with probability one to a global optimum.

Note that Assumption 4 is somewhat difficult to verify in real problems, as it imposes conditions on the partitioning, the sampling, and the promising index. Some stronger, but more intuitive, conditions have been proposed to prove the global convergence of Algorithm 3; we refer to [48] for further reading.

10.5. Conclusions

In this chapter, we first review the NP method for solving deterministic optimization problems. The generic NP algorithm is presented, and its global convergence is proved. Then, some enhancements and advanced developments of NP are introduced. These enhancements include (1) a partial sampling and LP solution-based sampling procedure that can be effective in combining the NP framework with mathematical programming to solve many
large-scale optimization problems; and (2) an extreme value-based promising index that contains both a point estimate of the optimum of each region and the stability of that estimate. The new promising index has proved to be more informative and accurate than the traditional promising indices, such as the sample minimum/maximum or the sample average. Furthermore, the NP algorithm for solving stochastic optimization is proposed. Different from the NP algorithm for solving deterministic problems, it keeps a counter of the number of visits to the regions of maximum depth, and the region that is most frequently visited is defined as the best solution. Under some assumptions, the global convergence of the algorithm is then proved. Although some of the assumptions are difficult to guarantee in real-world problems, they provide nice theoretical guarantees for the algorithm.

Acknowledgements

The author Leyuan Shi was Prof. Ho's Ph.D. student at Harvard University, and the author Weiwei Chen was Dr. Leyuan Shi's Ph.D. student at the University of Wisconsin-Madison. The authors would like to take this opportunity to thank Prof. Ho for his guidance, support and encouragement over the years. He influenced us not only via his numerous scientific publications, but also through his talks and personal blog, which inspired and educated many young researchers. He is a great researcher, educator and mentor. Happy birthday, Prof. Ho!

References

[1] L. A. Wolsey, Integer Programming. Wiley-Interscience (1998).
[2] G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization. Wiley-Interscience (1999).
[3] M. L. Fisher, The Lagrangian relaxation method for solving integer programming problems, Management Science. 50(12), 1861–1871 (2004).
[4] C. Lemaréchal, Lagrangian relaxation. In eds. M. Jünger and D. Naddef, Computational Combinatorial Optimization, vol. 2241, Lecture Notes in Computer Science, pp. 112–156. Springer-Verlag, Berlin Heidelberg (2001).
[5] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience (2005).
[6] N. G. Hall and D. S. Hochbaum, A fast approximation algorithm for the multicovering problem, Discrete Applied Mathematics. 15(1), 35–40 (1986).
[7] D. Bertsimas and C.-P. Teo, From valid inequalities to heuristics: A unified view of primal-dual approximation algorithms in covering problems, Operations Research. 46(4), 503–514 (1998).

[8] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. MIT Press and McGraw-Hill (1990).
[9] D. Cvijović and J. Klinowski, Taboo search: An approach to the multiple minima problem, Science. 267(5198), 664–666 (1995).
[10] F. W. Glover and M. Laguna, Tabu Search. Kluwer Academic Publishers, Boston, MA (1997).
[11] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA (1989).
[12] M. Mitchell, An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA (1998).
[13] M. Dorigo, Optimization, Learning and Natural Algorithms. PhD thesis, Politecnico di Milano, Italy (1992).
[14] M. Dorigo, G. D. Caro, and L. M. Gambardella, Ant algorithms for discrete optimization, Artificial Life. 5(2), 137–172 (1999).
[15] J. Kennedy and R. Eberhart, Particle swarm optimization. In Proceedings of the 1995 IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948 (1995).
[16] R. Poli, J. Kennedy, and T. Blackwell, Particle swarm optimization, Swarm Intelligence. 1(1), 33–57 (2007).
[17] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Optimization by simulated annealing, Science. 220(4598), 671–680 (1983).
[18] P. J. M. van Laarhoven and E. H. L. Aarts, Simulated Annealing: Theory and Applications. Kluwer Academic Publishers, Norwell, MA (1987).
[19] P. T. De Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, A tutorial on the cross-entropy method, Annals of Operations Research. 134, 19–67 (2005).
[20] R. Y. Rubinstein and D. P. Kroese, The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. Springer, New York (2004).
[21] J. Hu, M. C. Fu, and S. I. Marcus, A model reference adaptive search method for global optimization, Operations Research. 55, 549–568 (2007).
[22] S. B. Gelfand and S. K. Mitter, Simulated annealing with noisy or imprecise energy measurements, Journal of Optimization Theory and Applications. 62, 49–62 (1989).
[23] M. C. Fu, Optimization for simulation: Theory vs. practice, INFORMS Journal on Computing. 14, 192–215 (2002).
[24] D. Yan and H. Mukai, Stochastic discrete optimization, SIAM Journal on Control and Optimization. 30, 594–612 (1992).
[25] V. I. Norkin, G. C. Pflug, and A. Ruszczyński, A branch and bound method for stochastic global optimization, Mathematical Programming. 83, 425–450 (1998).
[26] L. J. Hong and B. L. Nelson, Discrete optimization via simulation using COMPASS, Operations Research. 54, 115–129 (2006).
[27] W. B. Gong, Y. C. Ho, and W. Zhai, Stochastic comparison algorithm for discrete optimization with estimations, SIAM Journal on Optimization. 10(2), 384–404 (2000).

[28] S. Yakowitz, P. L'Ecuyer, and F. Vázquez-Abad, Global stochastic optimization with low-dispersion point sets, Operations Research. 48, 939–950 (2000).
[29] Y. C. Ho, R. S. Sreenivas, and P. Vakili, Ordinal optimization of DEDS, Discrete Event Dynamic Systems: Theory and Applications. 2(1), 61–88 (1992).
[30] Y. C. Ho, C. G. Cassandras, C. H. Chen, and L. Y. Dai, Ordinal optimization and simulation, Journal of the Operational Research Society. 51, 490–500 (2000).
[31] L. Shi and S. Ólafsson, Nested partitions method for global optimization, Operations Research. 48(3), 390–407 (2000).
[32] L. Shi and S. Ólafsson, Nested Partitions Optimization: Methodology and Applications. vol. 109, International Series in Operations Research & Management Science, Springer (2007).
[33] L. Pi, Y. Pan, and L. Shi, Hybrid nested partitions and mathematical programming approach and its applications, IEEE Transactions on Automation Science and Engineering. 5(4), 573–586 (2008).
[34] W. Chen, L. Pi, and L. Shi, Optimization and Logistics Challenges in the Enterprise, chapter Nested Partitions and Its Applications to the Intermodal Hub Location Problem, pp. 229–251. Springer (2009).
[35] L. Shi, S. Ólafsson, and N. Sun, New parallel randomized algorithms for the traveling salesman problem, Computers and Operations Research. 26(4), 371–394 (1999).
[36] S. Coles, An Introduction to Statistical Modeling of Extreme Values. Springer (2001).
[37] U. Derigs, Using confidence limits for the global optimum in combinatorial optimization, Operations Research. 33(5), 1024–1049 (1985).
[38] S. Kotz and S. Nadarajah, Extreme Value Distributions: Theory and Applications. Imperial College Press (2000).
[39] S. H. Zanakis and J. Kyparisis, A review of maximum likelihood estimation methods for the three-parameter Weibull distribution, Journal of Statistical Computation and Simulation. 25, 53–72 (1986).
[40] R. L. Smith and J. C. Naylor, Statistics of the three-parameter Weibull distribution, Annals of Operations Research. 9, 577–587 (1987).
[41] W. Chen, Advanced Development of Nested Partitions Theory and Applications. PhD thesis, University of Wisconsin-Madison (2010).
[42] W. Chen, S. Gao, C.-H. Chen, and L. Shi, An optimal sample allocation strategy for partition-based random search, IEEE Transactions on Automation Science and Engineering (2012), conditionally accepted.
[43] R. Kohli and R. Krishnamurti, Optimal product design using conjoint analysis: Computational complexity and algorithms, European Journal of Operational Research. 40, 186–195 (1989).
[44] P. V. Balakrishnan and V. S. Jacob, Triangulation in decision support systems: Algorithms for product design, Decision Support Systems. 14, 313–327 (1995).
[45] P. V. Balakrishnan and V. S. Jacob, Genetic algorithms for product design, Management Science. 42, 1105–1117 (1996).

[46] L. Shi, S. Ólafsson, and Q. Chen, An optimization framework for product design, Management Science. 47(12), 1681–1692 (2001).
[47] E. K. Morlok and L. N. Spasovic, Redesigning rail-truck intermodal drayage operations for enhanced service and cost performance, Transportation Research Forum. 34, 16–31 (1994).
[48] L. Shi and S. Ólafsson, Nested partitions method for stochastic optimization, Methodology and Computing in Applied Probability. 2(3), 271–291 (2000).

Chapter 11

Applications of Ordinal Optimization

Tak Wing E. Lau
Morgan Stanley, New York, NY 10019, USA

Qing-Shan Jia
Center for Intelligent and Networked Systems, Department of Automation, Tsinghua University, Beijing 100084, China

Mike Shangyu Yang
Bloom Energy Corporation, Sunnyvale, CA 94004, USA

Mei Deng
Snowkie Consulting, Holmdel, NJ 07733, USA

Michael E. Larson
Seagrass Advisors, Bellevue, WA 98004, USA

Nikos Patsis
VoiceWeb Group, GR-15124, Athens, Greece

W. David Li
CloudShield Technologies, Sunnyvale, CA 94089, USA

Xiaocang Lin
MathWorks, Inc., Natick, MA 01760, USA

Zhaohui Chen
eBay, Inc., Saratoga, CA 95070, USA

Jonathan T. Lee
US Department of Transportation, Volpe Center, Cambridge, MA 02142, USA

Hongxing Bai
Sichuan WULIANYIDA Technology Co. Ltd, Suining, Sichuan Province 629000, China

The Ordinal Optimization (OO) methodology has been tested and deployed on a range of application problems. Indeed, the idea of OO was first motivated by the need to solve time-consuming stochastic optimization problems, often encountered in industrial and commercial settings. In this chapter, we present four such examples of applying OO, including three industrial applications and a benchmark problem in team decision theory. Additional applications using OO are listed at the end of the chapter. In Section 11.1, we consider a scheduling problem in apparel manufacturing, to which conventional OO is applied. We examine in Section 11.2 how OO is used to solve a turbine blade manufacturing process optimization problem, which involves a deterministic but very complex objective (as opposed to a stochastic one). In Section 11.3, we look into a remanufacturing system performance optimization problem, in which both constrained OO and vector OO are applied. Finally, in Section 11.4, we study the Witsenhausen problem, a famous problem in team decision theory that remained unsolved for more than forty years after it was first introduced. In this last example, OO helps to narrow down the vast search space of control laws and to find a simple and near-optimal strategy.

11.1. Scheduling Problem for Apparel Manufacturing

The work in this section is based on [1, 2]. The manufacturing system considered here is characterized by the co-existence of two production lines: one line with long lead time and low cost, and the other a flexible line with
short lead time but high cost. The goal is to decide: (1) the fraction γ of total production capacity to be allocated to each individual line, and (2) the production schedules α that maximize the overall manufacturing profit and avoid stock shortage. The overall manufacturing profit is defined as the total revenue, less the material cost, the production cost, and the holding cost in finished goods inventory and in work-in-process (WIP).

We shall refer to the different types of apparel items in demand and in production as stock keeping units (SKUs). An SKU distinguishes one apparel item from another in terms of its particular style, use of fabric, and size. Suppose there are M SKUs in total. Demand is weekly with no back-ordering. Let the demand of SKU i at time t, di(t), be max{ξ(t), 0}, where ξ(t) is a Gaussian random variable with mean μi(t) and standard deviation σi(t). We shall introduce a useful concept called the coefficient of variation of SKU i, denoted by Cvi(t), which is defined as the standard deviation divided by the mean at time t, i.e., Cvi(t) = σi(t)/μi(t). Assume that Cvi(t) is constant over time, so it can be simplified to Cvi. There are three types of demand: flat demand, for which the average demand is simply a constant throughout the year; sine demand, for which demand is seasonal; and impulse demand, which is characterized by a sudden jump at the beginning of a peak sales period. There are two kinds of production lines, a quick line and a slow line, with lead times Lq and Ls, respectively; by definition, Lq < Ls. For a given γ and α, Jtotal(α, γ) denotes the total manufacturing profit over the horizon from week t = 1 to week t = Π, calculated as follows:
$$J_{total}(\alpha,\gamma) = \sum_{i=1}^{M}\sum_{t=1}^{\Pi} P_S \min(I_i(t), d_i(t)) - \sum_{i=1}^{M}\sum_{t=1}^{\Pi}\sum_{j=1}^{2} C_m u_{ij}(t) - \sum_{i=1}^{M}\sum_{t=1}^{\Pi}\sum_{j=1}^{2} C_{L_j} u_{ij}(t) - \sum_{i=1}^{M}\sum_{t=1}^{\Pi} C_I I_i(t) - \sum_{i=1}^{M}\sum_{t=1}^{\Pi} C_I W_i(t),$$
where $P_S$ is the unit sale price, $C_m$ the unit material cost, $C_{L_j}$ the unit production cost of line j, $C_I$ the unit holding cost, $u_{ij}(t)$ the production quantity of SKU i on line j at week t, and $I_i(t)$ and $W_i(t)$ the finished goods inventory and WIP of SKU i at week t.

The average weekly manufacturing profit is J(α, γ) = Jtotal(α, γ)/Π. Our objective is
$$\max_{\alpha,\gamma} E\{J(\alpha,\gamma)\}. \qquad (11.1)$$

This problem is difficult due to a number of factors: the large number of SKUs, simulation-based performance evaluation, the exponentially increasing size of the policy space, and the lack of neighborhood and gradient information. A crude single-SKU sketch of the profit evaluation is given below.
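
To fix ideas, the following is a minimal single-SKU, single-replication Python sketch of the profit evaluation, using the unit price and costs quoted later in this section; the produce-up-to-target rule, the capacity figure, and the omission of lead times and WIP are simplifying assumptions of this sketch, not the full model of [1, 2].

    import numpy as np

    def average_weekly_profit(mu, cv, weeks, gamma, target,
                              Ps=20.0, Cm=10.0, Cq=4.4, Cs=4.0, Ci=0.08,
                              capacity=400.0, seed=0):
        rng = np.random.default_rng(seed)
        inventory, profit = 0.0, 0.0
        for _ in range(weeks):
            d = max(rng.normal(mu, cv * mu), 0.0)            # truncated Gaussian demand
            u = min(max(target - inventory, 0.0), capacity)  # produce toward the target
            uq, us = gamma * u, (1.0 - gamma) * u            # quick/slow line split
            inventory += u                                   # lead times ignored here
            sales = min(inventory, d)
            profit += Ps * sales - Cm * u - Cq * uq - Cs * us - Ci * inventory
            inventory -= sales
        return profit / weeks

    print(average_weekly_profit(mu=100.0, cv=0.3, weeks=100, gamma=0.3, target=150.0))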

To apply ordinal optimization to this scheduling problem, two questions need to be addressed. First, how do we randomly sample designs from the design space? Second, what is a crude model that is computationally fast yet provides a reasonable performance estimate?

For the first question, note that each design comprises two parts: the production schedule α and the proportion γ of quick-line capacity to total capacity. Samples of γ can easily be generated using a uniform distribution. Samples of α are generated in two steps. In step one, we generate the curve of target levels τ(t) for each SKU and each time period. In particular, we let the target level of SKU i be proportional to the mean value and the standard deviation of future demand. Mathematically, this can be expressed as $\tau_i(t) = (a_1 + a_2 Cv_i)\mu_i(t + a_3)$, where μi(t) is the mean demand of SKU i at time t, Cvi is the coefficient of variation of SKU i, and a1, a2, and a3 are randomly generated constants used for all SKUs. In step two, the production schedules are arranged so that the inventory level equals the target level by the time the SKUs leave the production lines. When the production capacity is not sufficient to meet demand, the capacity is allocated "fairly" among all SKUs, such that the ratio of the inventory level to the target level is the same for each SKU after allocation.

For the second question, we develop a crude model in three steps. First, we aggregate the SKUs by their coefficients of variation: the mean demand of an aggregated SKU equals the sum of the means of the SKUs with similar Cv, and the Cv of the aggregated SKU equals that common Cv. This way, we can typically aggregate 10,000 or even 30,000 different SKUs into no more than 100 SKUs, sometimes no more than 10. Second, instead of simulating the system for thousands of weeks, we can use short simulations of only 100 weeks. Third, we can use a small number of replications (even only one).

We now summarize the application procedure of OO in the scheduling problem of the apparel manufacturing system as follows.

Step 1. Randomly generate N target levels.
Step 2. For each target level, randomly generate the capacity allocation γ between the two production lines, and use the target tracking strategy α to determine the production schedules of all the SKUs. The capacity allocation γ together with the production schedule α constitutes a design (α, γ).

Step 3. Aggregate the SKUs by their coefficients of variation, and use the crude model to estimate the performance of the sampled designs.
Step 4. Estimate the observation noise level and the problem type.
Step 5. Define the size g of the good enough subset and the required alignment level k.
Step 6. Use the UAP table to calculate the size s of the selected set.
Step 7. OO theory then ensures that, with high probability, at least k truly good enough designs are among the observed top-s designs.

A minimal sketch of this ordinal selection step is given below.
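
In the sketch below, synthetic true performances stand in for detailed simulations, and additive noise stands in for the crude-model error; in a real run the crude estimates would come from Step 3, and the alignment could only be verified by detailed simulation.

    import numpy as np

    rng = np.random.default_rng(1)
    N, g, s = 1000, 50, 10
    true_perf = rng.normal(size=N)                         # unknown in practice
    crude_est = true_perf + rng.normal(scale=0.5, size=N)  # noisy crude-model estimates
    good_enough = set(np.argsort(true_perf)[-g:].tolist()) # true top-g (profit: larger is better)
    selected = set(np.argsort(crude_est)[-s:].tolist())    # observed top-s
    print("alignment:", len(good_enough & selected))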

Now, consider the problem with 100 SKUs. For each SKU, the demand at time t is a truncated Gaussian random variable with mean μ(t) and coefficient of variation Cv. Assume that the average demand μ(t) follows a seasonal sine pattern. The ratio of the average demand in the peak season to that in the low season ranges from 3 to 7. The Cv of the SKUs ranges from 0.1 to 1.0, and the SKUs with higher Cv have lower demand than the SKUs with lower Cv; the ratio of the demand of the SKU with the highest Cv to that with the lowest Cv is 5. There are 25 weeks in each season. The lead times of the quick and regular lines are 1 week and 4 weeks, respectively. Weekly total production schedules should be maintained within ±20% of the total production capacity. The other parameter settings are as follows: the inventory holding cost per unit per week CI is $0.08; the quick-line production cost per unit Cq is $4.4; the regular-line production cost per unit Cs is $4; the material cost per unit Cm is $10; and the sale price per unit PS is $20. The good enough set G is defined as the top 5% of designs. The true performance of a design is obtained by a detailed simulation, which simulates 500 weeks of dynamics of all 100 SKUs and runs 40 replications. In the crude model, a single replication of 100 weeks of dynamics of 10 aggregated SKUs is used. Note that the time to run the crude model is roughly 1/2,000 of that of the detailed model; in particular, we have reduced the computation time from one week to several minutes. We randomly pick 1,000 designs, regard the truly top-50 designs as good enough, and apply OO to pick the observed top-s designs. The alignment level under different values of s is shown in Table 11.1.

Table 11.1. The alignment level.

    s    1    5    10    20    50    100
    k    1    4     7    11    26     38

From Table 11.1 we can see that if we only want to find one of the top-50 designs, then only the observed best design needs to be selected. Note that if we want to find the truly best design among these 1,000 designs, 1,000 detailed simulations are needed. However, using the crude model and OO, we now find one of the top-50 designs using only 1,000 crude simulations, which is only about
1,000 × 1/2,000 = 0.5 detailed simulations. This saves the total computing time by a factor of 2,000. OO can also be used to study the impact of different factors on the total profit. More details of this application can be found in [1, 2].

11.2. The Turbine Blade Manufacturing Process Optimization Problem

This section is based on [3, 4]. Peripheral blades and a central rotor are the primary parts of an airplane turbine engine. The manufacturing of the integrally-bladed rotor (IBR) is composed of cast, hot isostatic pressing, upset, heat treatment, blocker forge, near-net-shape (NNS) forge, and machining. An IBR is manufactured via extrusion, which is similar to the manufacturing of plastic parts, but with much tougher high-strength metal and with higher quality requirements on the product. As an optimization problem, such a manufacturing process is challenging due to the large number of parameter settings of all the operations and the difficulty of accurately evaluating the quality of the final product. The physical properties of the product, such as the effective strain field, the effective strain rate field, and the maximum load-stroke, are determined by the deformation process of the workpiece during manufacturing, which can only be accurately described by the finite element method (FEM). It usually takes hours, if not days, to use the FEM to simulate (calculate) the entire deformation process and accurately evaluate the quality of the turbine blade thus produced. The FEM is a deterministic but highly complex calculation. As we will see in the following, by applying ordinal optimization we are able to find a good enough parameter setting based on a computationally fast and crude model, and we reduce the computing budget by 95% compared with the brute force approach.
Each design θ in this problem is a specification of the seven parameters in the blocker forge (the initial radius, height, and temperature of the billet, the temperature of the die, the ram velocity, the ambient temperature, and the friction coefficient) and a choice of the die shape. The total design space is Θ. The cost function, denoted as J(θ), consists of accounting costs, quality loss penalties, and inspection overheads. This is accurately evaluated by the thermo-mechanical processes, fully described using the FEM. We shall use the Ohio University Forge Simulation Model (the OU model) [5] as the crude model. This model introduces the following simplifications. First, the OU model only tracks down the change of the geometry of the work piece, instead of all the physical quantities as in the FEM model. Second, instead of tracking the entire field of the continuum properties such as strain, strain rate, temperature, pressure, and grain size, the OU model divides the work piece into four parts, and calculates only the estimated average of these characteristic values in the regions with the assumption that these thermo-mechanical properties are uniform inside each region. Third, the evolution of the work piece during the forge process is simplified. In this way, the simulation is much simplified, and much faster than the FEM. A comparison study shows that it takes the FEM about 4 hours to evaluate just one design, compared to only 0.1 seconds using the above crude model — a tremendous saving in computing time. Note that the crude model is a deterministic but simple calculation. Because the true model is too complex, the deterministic errors between the two models are complex and hard to predict. Based on our discussion in Chapter 9 (Fundamentals of Ordinal Optimization), we can regard these errors as random noises, and treat the problem as if the true model is a stochastic simulation. This will be justified by the following numerical results. An estimate shows that it will take about 160 days to evaluate the performance of 1,000 designs using the FEM model. Due to this extremely lengthy computation, we only randomly pick 80 designs. Their performances are accurately evaluated which will be used to quantify the alignment level. We use the crude model to estimate the performance of these 80 designs, which are shown in Fig. 11.1. We see that the performance belongs to the neutral type. Then, we randomly select several of these 80 designs to estimate the normalized noise level, which is 0.1729, a small noise level in the UAP table. For different values of g and k, the predicted size of the selected set is denoted as sˆ1 in Table 11.2. Since the true noise level is smaller


Fig. 11.1. The observed OPC. [3]

Table 11.2. The predicted and true selected sizes.

    g               k    s*   ŝ1   ŝ2
    1 (top 1.25%)   1     3   80   29
    4 (top 5%)      1     3   13    6
                    2     5   25   10
                    3     6   37   15
                    4     9   49   20
    8 (top 10%)     1     3    6    3
                    2     5   10    5
                    3     6   15    8
                    4     9   19   10
                    5    11   24   12
                    6    13   28   14
                    7    17   33   16
                    8    19   38   19


Since the true noise level is smaller than 0.5, the values of ŝ1 might be conservative (i.e., larger than necessary), so we use linear interpolation to obtain less conservative estimates, denoted ŝ2. To be specific, when the noise level is 0 we only need to pick the observed top-k designs to cover k of the truly top-g designs, and when the noise level is 0.5 we need to pick the observed top-ŝ1 designs. We therefore set (ŝ2 − k)/(ŝ1 − k) = 0.1729/0.5 and let ŝ2 = (ŝ1 − k) × 0.1729/0.5 + k; a minimal sketch of this calculation is given at the end of this section. Table 11.2 also shows the true values s*. From Table 11.2 we can see that ŝ2 is indeed less conservative than ŝ1, and most values of ŝ2 are larger than s*. In the only exception, g = 8 and k = 7, we have ŝ2 < s*, but the two values are very close. If we want to find at least one of the top-10% designs with high probability, ŝ2 shows that only the observed top-3 designs are needed. Compared with brute force, the search region is thus reduced more than 25-fold, from 80 designs down to 3. This example shows that OO can be applied even when the objective function is not a stochastic simulation but a deterministic and highly complex calculation.
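The interpolation is easy to reproduce. Below is a minimal sketch (the function name and the rounding-up convention are ours; with rounding up, it reproduces the ŝ2 column of Table 11.2):

    import math

    def s2_interp(s1, k, noise_level):
        # Linear interpolation between the noiseless case (pick the observed
        # top-k) and the 0.5 noise level of the UAP table (pick the observed
        # top-s1), as described in the text.
        return math.ceil((s1 - k) * noise_level / 0.5 + k)

    # The g = 4 (top 5%) rows of Table 11.2:
    print([s2_interp(s1, k, 0.1729) for s1, k in [(13, 1), (25, 2), (37, 3), (49, 4)]])
    # -> [6, 10, 15, 20]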

11.3. Performance Optimization for a Remanufacturing System

This section is based on [6–8]. The goal is to choose the number of machines in the repair shop and the number of new parts to order into the inventory so that the maintenance cost is minimized while the average maintenance time of an asset does not become too long. We shall discuss two formulations of this problem, namely a constrained optimization and a bi-objective formulation, and apply constrained ordinal optimization (COO) and vector ordinal optimization (VOO), respectively.

11.3.1. Application of constrained ordinal optimization

The basic idea of a remanufacturing system is to reuse the parts (sometimes after repair) from old products to produce new products. This idea is especially useful for expensive assets such as aircraft jet engines. Consider the remanufacturing system shown in Fig. 11.2. A problematic asset (perhaps due to some unknown or random failure) is shipped to the remanufacturing system. After the asset is disassembled into parts and the parts are inspected, those still in serviceable condition are sent directly to a staging location to await reassembly into new assets, while parts needing repair are sent to the repair shop.

Fig. 11.2. Detailed model of the remanufacturing system. [8]

After repair, these parts enter the inventory system and are then assembled, together with the parts in serviceable condition, into new assets; finally, the completed asset exits the remanufacturing system. Due to the uncertainties in the arrival, waiting, and repair times, there might not be enough parts on hand for assembling a new asset. To avoid this problem of "lack of synchronization", new parts can sometimes be ordered into the inventory. The parameters we can control are the number of machines in the repair shop, C, and the number of new parts to order into the inventory, ΔI. The objective function is the average maintenance cost of an asset, denoted J(C, ΔI). The constraint is that the probability of the maintenance time exceeding a given limit T_D stays within a certain range, i.e., Pr{T(k) > T_D | η(k) ≤ T_C} < P_0, where T(k) is the maintenance time of the kth asset, η(k) is the departure time of the kth asset, the condition η(k) ≤ T_C restricts attention to assets that leave the system within the contract duration T_C, and 0 < P_0 < 1 is a given constant. In short, the simulation-based constrained optimization problem is

    min_{C, ΔI} J(C, ΔI)   s.t.   Pr{T(k) > T_D | η(k) ≤ T_C} < P_0,        (11.2)

where both the objective function and the constraint can be evaluated accurately only by detailed simulation. Accurately evaluating the performance and feasibility of a single design with the Enterprise Dynamics software takes 30 minutes, which means altogether 500 hours to evaluate 1,000 designs.
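Given simulated samples of the maintenance times T(k) and departure times η(k) for one design, checking the chance constraint in (11.2) is straightforward; here is a minimal sketch (names are ours; the expensive part is, of course, producing the samples by detailed simulation):

    import numpy as np

    def feasible(T, eta, T_D, T_C, P0):
        # Estimate Pr{T(k) > T_D | eta(k) <= T_C} from simulated samples
        # (T[k], eta[k]) of one design and check the constraint of (11.2).
        within_contract = eta <= T_C
        p_hat = np.mean(T[within_contract] > T_D)
        return p_hat < P0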

Fig. 11.3. The true performance and feasibility of 1,000 randomly sampled designs. [8]

Under the parameter setting discussed in [6], we randomly sample 1,000 designs; their true performance and feasibility are shown in Fig. 11.3. We can see that if ordinal optimization were applied directly, without any adjustment, most of the randomly sampled designs would be infeasible. We therefore apply COO to address this issue. To obtain a feasibility model, a machine learning method is used: a small number of designs are accurately evaluated and used to train the feasibility model, which can then quickly predict the feasibility of other designs. Compared with the 30-minute running time of the detailed simulation model for a single design, this feasibility model takes only 0.003 seconds to run. Its average accuracy is 0.985, meaning that if we randomly sample 1,000 designs predicted feasible by this model, on average 985 of them are truly feasible. As the crude model for performance estimation we apply blind picking, whose selected-set size is an upper bound on that of any better selection rule. Regard the top 50 feasible designs among the 1,000 randomly sampled designs in Fig. 11.3 as good enough. Using the formula in Chapter 9, we can calculate that the size s of the selected set should be 10, 16, and 39 to achieve an alignment probability of 0.5, 0.7, and 0.95, respectively.
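For intuition, in the plain (unconstrained) setting the size of a blind-picking selected set can be computed from a hypergeometric tail, as in the sketch below (our own illustration). The COO formula of Chapter 9 additionally accounts for the feasibility model, so the sizes quoted above do not follow from this sketch.

    from scipy.stats import hypergeom

    def blind_pick_size(N, g, k, ap):
        # Smallest s such that blindly picking s of N designs contains at
        # least k of the true top-g with probability >= ap; |S ∩ G| is
        # hypergeometric with population N, g "successes", and s draws.
        for s in range(k, N + 1):
            if hypergeom.sf(k - 1, N, g, s) >= ap:
                return s

    # e.g., at least one of the top-50 out of 1,000 designs:
    sizes = [blind_pick_size(1000, 50, 1, p) for p in (0.5, 0.7, 0.95)]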


For example, if we target an alignment probability of 0.95, then COO reduces the search region from 1,000 designs down to 39 randomly selected designs that are predicted to be feasible, saving the computation time roughly 25-fold.

11.3.2. Application of vector ordinal optimization

We can reformulate the constrained problem as a bi-objective optimization problem in which the first objective function J1 is the probability of the maintenance time exceeding a given threshold, and the second objective function J2 is the maintenance cost. Both objective functions can be evaluated accurately only by detailed simulation. The vector ordered performance curve (VOPC), shown in Fig. 11.4, can be estimated using the crude model; it belongs to the steep type. We estimate the noise level from 10 independent simulations of a design and find normalized noise levels of 0.1061 for J1 and 0.0078 for J2, both at the small noise level. Define the designs in the truly first two layers as good enough (14 designs in total, as shown in Fig. 11.5). For a required alignment level k, we can use the UAP-VOO table to predict the number of layers to select (denoted ŝ1) and use detailed simulation to calculate the true number of layers to select (denoted s*).

Fig. 11.4. The true VOPC. [8]

Fig. 11.5. The true performance of the 1,000 randomly sampled designs. [8]
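In VOO, the layers are nested Pareto frontiers: layer L1 is the non-dominated set of all designs, L2 is the non-dominated set of the remainder, and so on. A minimal sketch of this peeling for objectives that are all minimized (our own O(N²)-per-layer implementation):

    import numpy as np

    def pareto_layers(J):
        # Peel nested Pareto frontiers from an (N, m) array of objective
        # values (all minimized). Returns a list of index arrays: layer 1
        # is the non-dominated set, layer 2 the non-dominated set of what
        # remains, and so on.
        J = np.asarray(J, dtype=float)
        remaining = np.arange(len(J))
        layers = []
        while remaining.size > 0:
            pts = J[remaining]
            dominated = np.array([
                bool(np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1)))
                for p in pts
            ])
            layers.append(remaining[~dominated])
            remaining = remaining[dominated]
        return layers

For the bi-objective data of Fig. 11.5, pareto_layers(np.column_stack([J1, J2])) returns the layers; the good enough set above is the union of the first two true layers.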

We can also use the following method to obtain a less conservative estimate of s*. To be specific, we first perform a fast performance evaluation of all 1,000 designs with the crude model. Treating these results as the true performance and adding Gaussian observation noise with normalized noise level 0.1061 for J1 and 0.0078 for J2, we run 1,000 replications and estimate the number of observed layers that must be selected so that k good enough designs are covered with probability no less than 0.95. These values are denoted ŝ2 (a minimal sketch of this procedure follows Table 11.3). For different values of k, the values of s*, ŝ1, and ŝ2 are shown in Table 11.3; the last column also gives the total number of designs in the observed top-ŝ2 layers. We see that ŝ2 is less conservative than ŝ1 and larger than s* in most cases; in the only exception, k = 14, the difference between ŝ2 and s* is only 1. In short, VOO reduces the search region by about a factor of 10 in this example.

Table 11.3. The predicted and true values of s.

    k    s*   ŝ1   ŝ2   |L1 ∪ … ∪ L_ŝ2|
    1     1    7    1          6
    2     1   11    1          6
    3     1   15    2         14
    4     1   20    2         14
    5     2   24    2         14
    6     2   28    3         24
    7     2   32    3         24
    8     3   36    4         36
    9     3   40    5         48
    10    5   44    5         48
    11    6   48    6         62
    12    8   52    8         85
    13    9   56    9         96
    14   13   59   12        141
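A minimal sketch of this replication procedure, reusing pareto_layers from the sketch above (the function name, and the assumption that each objective has been normalized to [0, 1] so the quoted noise levels apply directly, are ours):

    import numpy as np

    rng = np.random.default_rng(0)

    def s2_mc(J_crude, good, k, noise=(0.1061, 0.0078), n_rep=1000, alpha=0.95):
        # J_crude: (N, 2) crude-model values, each objective normalized to
        # [0, 1]; good: boolean mask of the good enough designs. Returns the
        # smallest s such that the observed top-s layers cover >= k good
        # enough designs in at least a fraction alpha of the replications.
        needed = np.empty(n_rep, dtype=int)
        for r in range(n_rep):
            J_obs = J_crude + rng.normal(0.0, noise, size=J_crude.shape)
            covered, s = 0, 0
            for layer in pareto_layers(J_obs):   # from the sketch above
                s += 1
                covered += int(good[layer].sum())
                if covered >= k:
                    break
            needed[r] = s
        return int(np.ceil(np.quantile(needed, alpha)))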


11.4. Witsenhausen Problem

This section is based on [8–10]. A celebrated problem in systems and control is the so-called Witsenhausen problem [11].

It is regarded as one of the simplest two-stage Linear-Quadratic-Gaussian (LQG) control problems, except for one small detail: instead of the usual assumption of perfect memory (or recall) at all stages, the decision maker at the second stage does not possess the full knowledge available at the first stage. At stage 1, we observe the initial state x of the system and choose a control u1 = γ1(x); the new state becomes x1 = x + u1 = x + γ1(x). At stage 2, we cannot observe x1 directly; we can only observe y = x1 + v, where v is additive noise. We then choose a control u2 = γ2(y), and the system stops at the state x2 = x1 − u2. The cost function is E[k²u1² + x2²], where k² > 0 is a given constant. The challenge is to find a pair of control functions (γ1, γ2) that minimizes the cost. The tradeoff here is between the costly control γ1, which acts on the perfect information x, and the costless control γ2, which faces the noisy information y. We consider the famous benchmark case x ∼ N(0, σ²) and v ∼ N(0, 1) with σ = 5 and k = 0.2. Witsenhausen defined f(x) = x + γ1(x) and g(y) = γ2(y), and converted the problem into minimizing J(f, g). He showed that

1) for any k² > 0, the problem has an optimal solution;

2) for any k² < 0.25 and σ = k⁻¹, the optimal solution in the linear control class, with f(x) = λx and g(y) = μy, has cost J*_linear = 1 − k², with λ = μ = 0.5(1 + √(1 − 4k²));


3) there exist k and σ such that the optimal cost is less than J*_linear. The following example is given: with f_W(x) = σ sgn(x) and g_W(y) = σ tanh(σy), where sgn(·) is the sign function, the cost is J_W = 0.4042; and

4) for a given f(x) satisfying E[f(x)] = 0 and var[f(x)] ≤ 4σ², the optimal g*_f associated with the function f is

    g*_f(y) = E[f(x) φ(y − f(x))] / E[φ(y − f(x))],        (11.3)

where φ(·) is the standard Gaussian density function. The problem is thus transformed into searching for an f that minimizes J(f, g*_f). Unfortunately, to date there is no analytical method available for determining the optimal f. We know, however, that it suffices to consider only symmetric functions f and to discretize f into step functions. To find a crude model for J, two simplifications are used. First, g*_f is approximated by

    ĝ_f(y) = Σ_{i=1}^{100} f(x_i) φ(y − f(x_i)) / Σ_{i=1}^{100} φ(y − f(x_i)),        (11.4)

where x_1, …, x_100 are samples of x.

Second, only 100 replications are used to estimate J(f, ĝ_f); the resulting estimate is denoted Ĵ(f, ĝ_f). Even after discretizing f, the design space remains extremely large. One idea to address this issue is to divide the entire design space into smaller subsets and to focus the search on the more promising subsets. To be specific, for a search space Θ we first define two or more subsets (which may intersect) and find the corresponding observed performance distribution functions (PDFs). By comparing the observed PDFs, we identify the subsets that are sufficiently good, and then narrow the search down into smaller subsets. Based on this method, the numerical results identify the following properties: 1) on each interval I_i, the value of the control f lies in (−0.5σ, 2.5σ), i.e., f ∼ U(−2.5, 12.5); 2) the control f is a non-decreasing function on (−0.5σ, 2.5σ); 3) the control f is a two-value non-decreasing step function on (−0.5σ, 2.5σ). We then randomly sample 5,000 designs satisfying these properties and find the following design:

    f_DH(x) = 3.1686,  0 ≤ x < 6.41,
              9.0479,  x ≥ 6.41.        (11.5)
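To make the crude model concrete, here is a minimal numerical sketch of Ĵ(f, ĝ_f) (all names are ours; we extend the design (11.5) to x < 0 by odd symmetry, f(−x) = −f(x), consistent with the restriction to symmetric step functions):

    import numpy as np

    rng = np.random.default_rng(0)
    sigma, k = 5.0, 0.2                              # the benchmark case

    def f_dh(x):
        # The two-step design of (11.5), extended by odd symmetry.
        return np.sign(x) * np.where(np.abs(x) < 6.41, 3.1686, 9.0479)

    def phi(z):
        # Standard Gaussian density.
        return np.exp(-0.5 * z * z) / np.sqrt(2.0 * np.pi)

    def g_hat(y, f, n=100):
        # Crude model (11.4): sample-average approximation of g*_f in (11.3).
        xi = rng.normal(0.0, sigma, size=n)
        w = phi(y[:, None] - f(xi)[None, :])         # (len(y), n) weights
        return (w * f(xi)).sum(axis=1) / w.sum(axis=1)

    def j_hat(f, n_rep=100):
        # Crude estimate of J = E[k^2 u1^2 + x2^2] from n_rep replications.
        x = rng.normal(0.0, sigma, size=n_rep)       # initial state
        v = rng.normal(0.0, 1.0, size=n_rep)         # observation noise
        x1 = f(x)                                    # state after stage 1
        u1 = x1 - x                                  # stage-1 control effort
        x2 = x1 - g_hat(x1 + v, f)                   # stage-2 residual error
        return np.mean(k**2 * u1**2 + x2**2)

    print(j_hat(f_dh))   # noisy, but should land in the vicinity of J_DH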


The performance of this design is J_DH = 0.1901, which is 47% better than the best previously reported solution, J_BB = 0.3634, due to Bansal and Basar [12]. This success implies that the step function is a good representation for f. Following this representation, [10] achieved a fast and accurate computational scheme for the cost J, using numerical integration in lieu of simulation (a sketch in this spirit is given at the end of this section). It was also observed that the jump points should be located around the average of two adjacent values of f, and that additional improvement can be made by adding small segments that approximate a slight slope within each step of f. Finally, a better function f was found:

    f_LLH(x) =  0.00,   0.00 ≤ x <  0.65,
                0.05,   0.65 ≤ x <  1.95,
                0.10,   1.95 ≤ x <  3.25,
                6.40,   3.25 ≤ x <  4.58,
                6.45,   4.58 ≤ x <  5.91,
                6.50,   5.91 ≤ x <  7.24,
                6.55,   7.24 ≤ x <  8.57,
                6.60,   8.57 ≤ x <  9.90,
               13.10,   9.90 ≤ x < 11.25,
               13.15,  11.25 ≤ x < 12.60,
               13.20,  12.60 ≤ x < 13.95,
               13.25,  13.95 ≤ x < 15.30,
               13.30,  15.30 ≤ x < 16.65,
               19.90,  16.65 ≤ x.        (11.6)

The corresponding cost is J_LLH = 0.167313205338, the best solution known at that time, beating J_DH by an appreciable margin. Note that the step function is also the basis of the best solution known at the time of this writing, with cost 0.1670790 [13]. One can observe that as better values of J are achieved, the corresponding functions f become more and more complex. To achieve a balance between complexity and performance, an interesting question is to find a design that is both simple and good. Describing the discretized f by an Ordered Binary Decision Diagram (OBDD), we can estimate an upper bound on the Kolmogorov complexity of f, and we find the following function:

    f_sg(x) =   3.125,      0 ≤ x <  6.25,
                9.375,   6.25 ≤ x < 12.50,
               15.625,  12.50 ≤ x < 18.75,
               21.875,  18.75 ≤ x,        (11.7)

with performance measured at 0.1746. Compared with J_LLH, this saves the memory space more than 30-fold with only minor performance degradation (within 5%).
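Here is a minimal sketch of such a numerical-integration evaluation of J(f, g*_f) (our own construction: Gauss-Hermite quadrature in x, a trapezoidal rule in y, and arbitrary node counts; [10]'s actual scheme may differ). As a sanity check, for Witsenhausen's f_W(x) = σ sgn(x) the associated g*_f is exactly σ tanh(σy), so the routine should return approximately J_W = 0.4042.

    import numpy as np

    def j_numeric(f, sigma=5.0, k=0.2, n_x=150, y_lim=30.0, n_y=4001):
        # Evaluate J(f, g*_f) by numerical integration instead of simulation.
        # Gauss-Hermite nodes and weights for x ~ N(0, sigma^2):
        xs, wx = np.polynomial.hermite_e.hermegauss(n_x)
        xs, wx = sigma * xs, wx / wx.sum()
        fx = f(xs)
        stage1 = np.sum(wx * k**2 * (fx - xs) ** 2)    # E[k^2 (f(x) - x)^2]

        ys = np.linspace(-y_lim, y_lim, n_y)
        dens = np.exp(-0.5 * (ys[:, None] - fx[None, :]) ** 2) / np.sqrt(2 * np.pi)
        g = (dens @ (wx * fx)) / (dens @ wx)           # g*_f(y), as in (11.3)
        # E[(x1 - g(y))^2] = ∫ Σ_i wx_i (f(x_i) - g(y))^2 φ(y - f(x_i)) dy:
        integrand = ((fx[None, :] - g[:, None]) ** 2 * dens) @ wx
        stage2 = np.trapz(integrand, ys)
        return stage1 + stage2

    print(j_numeric(lambda x: 5.0 * np.sign(x)))   # ≈ 0.4042 (Witsenhausen's J_W)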


11.5. Other Applications

Ordinal optimization has been applied in many areas beyond the examples selected above. For instance, by applying ordinal optimization on parallel computers, Patsis, Chen, and Larson won the 1992 MasPar Parallel Computer Challenge Award with the highest speedup in realized performance [14]. For interested readers, a list of doctoral theses based on ordinal optimization research and applications follows: [15, 16] for methodology development, [9, 10] for team decision problems, [17] for rare-event probability problems, [1] for apparel manufacturing systems, [18–20] for financial engineering, [21] for computer communication networks, [3] for turbine blade manufacturing problems, [22] for machine learning, [8] for remanufacturing systems, [23] for wireless sensor networks, and [24] for elevator group scheduling.

Acknowledgments

These contributions would not have been possible without the guidance and support provided by Professor Y.-C. Ho throughout the years.

References

[1] L. H. Lee, Ordinal Optimization and Its Application in Apparel Manufacturing Systems. PhD thesis, Harvard University, MA, USA (1997).
[2] S. Bouhia, A Risk-based Approach to Scheduling and Ordering Production in Manufacturing Systems: Real Options on Production Capacity. PhD thesis, Harvard University, MA, USA (2004).
[3] M. S. Y. Yang, Ordinal Optimization and Its Application to Complex Deterministic Problems. PhD thesis, Harvard University, MA, USA (1998).
[4] M. S. Yang and L. H. Lee, An illustrative case study on application of learning based ordinal optimization approach to complex deterministic problem, European Journal of Operational Research. 174(1), 265–277 (2006).


[5] J. S. Gunasekera, C. E. Fischer, J. C. Malas, W. M. Mullins, M. S. Yang, and N. Glassman, The development of process models for use with global optimization of a manufacturing system. In ASME 1996 International Mechanical Engineering Congress and Exposition, Atlanta, GA (1996).
[6] C. Song, X. Guan, Q. Zhao, and Y.-C. Ho, Machine learning approach for determining feasible plans of a remanufacturing system, IEEE Transactions on Automation Science and Engineering. 2(3), 262–275 (2005).
[7] C. Song, X. Guan, Q. Zhao, and Q.-S. Jia, Planning remanufacturing systems by constrained ordinal optimization method with feasibility model. In IEEE Conference on Decision and Control and European Control Conference, pp. 4676–4681, Seville, Spain (Dec. 12–15, 2005).
[8] Q.-S. Jia, Enhanced Ordinal Optimization: A Theoretical Study and Applications. PhD thesis, Tsinghua University, Beijing, China (2006). In Chinese.
[9] M. Deng, Sampling-selection and Space-narrowing Methods for Stochastic Optimization. PhD thesis, Harvard University, MA, USA (1995).
[10] J. T. Lee, The Witsenhausen Problem: New Insights into an Old Problem. PhD thesis, Harvard University, MA, USA (2002).
[11] H. S. Witsenhausen, A counterexample in stochastic optimum control, SIAM Journal on Control. 6(1), 131–147 (1968).
[12] R. Bansal and T. Basar, Stochastic teams with nonclassical information revisited: When is an affine law optimal?, IEEE Transactions on Automatic Control. 32, 554–559 (1987).
[13] N. Li, J. R. Marden, and J. S. Shamma, Learning approaches to the Witsenhausen counterexample from a view of potential games. In Proceedings of the Joint 48th IEEE Conference on Decision and Control and the 28th Chinese Control Conference, Shanghai, China (Dec. 16–18, 2009).
[14] N. Patsis, C.-H. Chen, and M. E. Larson, SIMD parallel discrete event dynamic system simulation, IEEE Transactions on Control Systems Technology. 5(3), 30–41 (1997).
[15] C.-H. Chen, An Efficient Approach for Discrete Event System Decision Problems. PhD thesis, Harvard University, MA, USA (1994).
[16] T. W. E. Lau, Probability Models and Selection Methods for Stochastic Optimization. PhD thesis, Harvard University, MA, USA (1997).
[17] M. E. Larson, Stochastic Optimization of Rare Event Probability Problems. PhD thesis, Harvard University, MA, USA (1996).
[18] N. Patsis, Pricing American-style Exotic Options using Ordinal Optimization. PhD thesis, Harvard University, MA, USA (1997).
[19] X. C. Lin, Optimization Under Uncertainty: A New Framework and Its Applications. PhD thesis, Harvard University, MA, USA (2000).
[20] A. P. Volpe, Modeling Flexible Supply Options for Risk-adjusted Performance Evaluation. PhD thesis, Harvard University, MA, USA (2005).
[21] W. Li, Vector and Constraint Ordinal Optimization - Theory and Practice. PhD thesis, Harvard University, MA, USA (1998).
[22] Z. Chen, Machine Learning Approach Towards Automatic Target Recognition. PhD thesis, Harvard University, MA, USA (2001).


[23] H. X. Bai, Non-complete Coverage Problem in Large Scale Wireless Sensor Networks. PhD thesis, Tsinghua University, Beijing, China (2008). In Chinese.
[24] Z. Shen, Ordinal Performance Analysis of the Solution of an Optimization Algorithm. PhD thesis, Tsinghua University, Beijing, China (2009). In Chinese.