Communications and Control Engineering

Series Editors: Alberto Isidori, Roma, Italy; Jan H. van Schuppen, Amsterdam, The Netherlands; Eduardo D. Sontag, Boston, USA; Miroslav Krstic, La Jolla, USA
Communications and Control Engineering is a high-level academic monograph series publishing research in control and systems theory, control engineering and communications. It has worldwide distribution to engineers, researchers, educators (several of the titles in this series find use as advanced textbooks although that is not their primary purpose), and libraries. The series reflects the major technological and mathematical advances that have a great impact in the fields of communication and control. The range of areas to which control and systems theory is applied is broadening rapidly with particular growth being noticeable in the fields of finance and biologically inspired control. Books in this series generally pull together many related research threads in more mature areas of the subject than the highly specialised volumes of Lecture Notes in Control and Information Sciences. This series’s mathematical and control-theoretic emphasis is complemented by Advances in Industrial Control, which provides a much more applied, engineering-oriented outlook.

Indexed by SCOPUS and Engineering Index.

Publishing Ethics: Researchers should conduct their research from research proposal to publication in line with best practices and codes of conduct of relevant professional bodies and/or national and international regulatory bodies. For more details on individual ethics matters please see: https://www.springer.com/gp/authors-editors/journal-author/journal-authorhelpdesk/publishing-ethics/14214
More information about this series at https://link.springer.com/bookseries/61
Timothy L. Molloy · Jairo Inga Charaja · Sören Hohmann · Tristan Perez
Inverse Optimal Control and Inverse Noncooperative Dynamic Game Theory: A Minimum-Principle Approach
Timothy L. Molloy Department of Electrical and Electronic Engineering University of Melbourne Melbourne, VIC, Australia
Jairo Inga Charaja Department of Electrical Engineering and Information Technology Karlsruhe Institute of Technology Karlsruhe, Germany
Sören Hohmann Department of Electrical Engineering and Information Technology Karlsruhe Institute of Technology Karlsruhe, Germany
Tristan Perez The Boeing Company Boeing Defence Australia Brisbane, QLD, Australia
ISSN 0178-5354; ISSN 2197-7119 (electronic)
Communications and Control Engineering
ISBN 978-3-030-93316-6; ISBN 978-3-030-93317-3 (eBook)
https://doi.org/10.1007/978-3-030-93317-3

MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See https://www.mathworks.com/trademarks for a list of additional trademarks.

Mathematics Subject Classification: 49N45, 49K15, 49K21, 49N70

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To
My family and S.C. (T. M.)
My families in Peru and Germany (J. I. C.)
My wife and son Philipp (S. H.)
Jae and Oliver (T. P.)
Preface
This book aims to provide an introduction to selected topics within the theory of inverse problems in optimal control and noncooperative dynamic game theory. These topics have emerged relatively recently in data-driven problems that involve inferring the underlying optimality objectives of decision-makers (agents or systems) from quantitative observations of their behavior. For example, such problems have arisen in applications across systems and control, robotics, machine learning, biology, economics, and operations research, including the development of robots that mimic the behavior of human experts; the quantitative study of biological control systems; the design of advanced driver assistance technologies; the efficient inference of agent intentions; and the estimation of competitive market and economic models in economics and operations research.

The origins of this book lie in our own research exploring inverse problems in optimal control and noncooperative dynamic game theory. We noticed a sparsity of literature treating such inverse problems in their data-driven forms. Most notably, almost no work on them had appeared in leading systems and control journals prior to 2018! Despite the broad practical significance and deep (intellectual) challenges of inverse optimal control and inverse noncooperative dynamic game theory, the powerful mathematical tools and fundamental theoretical insights offered by systems and control theory had, therefore, been missing from many popular treatments. The purpose of this book is thus both to expose systems and control researchers to inverse problems (providing a springboard to open problems) and to draw broader attention to useful systems and control techniques for solving them (specifically Pontryagin’s minimum principle).

This book’s intended audience is researchers and graduate students in systems and control, robotics, and computer science. It is intended to be mostly self-contained, but previous exposure to systems and control or (dynamic) optimization would be helpful. Given the significance of the minimum principle throughout this book, we provide a background chapter with a short introduction to its use in (forward) optimal control and noncooperative dynamic game theory. In particular, we collect the scattered results on the conditions for optimal and Nash equilibrium solutions, both in discrete and continuous time.
After presenting background fundamentals, the first half of this book seeks to illuminate key concepts underlying the rapidly growing literature on inverse optimal control for linear and nonlinear dynamical systems in discrete and continuous time with continuous state and control spaces. These concepts include the formulation of different inverse optimal control problems depending on the available data as well as the proposal of techniques to solve them.

The second half of this book endeavors to generalize and extend inverse optimal control theory to inverse noncooperative dynamic game theory. Inverse problems in noncooperative dynamic game theory are concerned with computing the individual optimality objectives of competing decision-makers from data. Such inverse problems raise a host of new theoretical issues due to the information structures and (equilibrium) solution concepts unique to noncooperative dynamic games. Therefore, the book attempts to highlight both the similarities and differences between inverse optimal control and inverse noncooperative dynamic game theory.

Throughout the book, an emphasis is placed on fundamental questions and performance characterizations. For example, conditions analogous to identifiability and persistence of excitation are established under which inverse optimal control and inverse noncooperative dynamic game problems have either unique or functionally equivalent solutions.

It is hoped that this book will prove helpful and inspire future investigations of inverse optimal control and inverse noncooperative dynamic game theory.

Melbourne, Australia
Timothy L. Molloy

Heidelberg, Germany
Jairo Inga Charaja

Karlsruhe, Germany
Sören Hohmann

Brisbane, Australia
Tristan Perez
Acknowledgements
The inception of this monograph was the Linkage Project LP130100483 co-funded by Boeing and the Australian Research Council in 2013 with the participation of The University of Queensland and The University of Newcastle (Australia). In this project, the team set out to investigate how behaviors in nature, such as bird agile maneuvers, could be used to guide the design of behaviors for autonomous aircraft. This led to the initial collaboration of this monograph’s authors Dr. Perez and Dr. Molloy. The initial work was then expanded through a Fellowship that Dr. Molloy undertook at the Queensland University of Technology (QUT) supported by Boeing and the Queensland Government’s Advance Queensland Program.

The research on inverse problems at the Institute of Control Systems of the Karlsruhe Institute of Technology (KIT) started with the work on model-based design approaches for human–machine shared control systems by Dr. Michael Flad and Prof Hohmann in 2013. New results in biocybernetics revealed that human movement is well described by optimality principles, leading to dynamic games being natural candidates for modeling human–machine shared control, and raising the question of how the parameters of the human should be identified. Initially focused on human driver behavior, the problem was later addressed further by the work of Dr. Inga Charaja, partially funded by the German Research Foundation’s (DFG) research grant project “Inverse Noncooperative Dynamic Games in Automatic Control”.

The undertakings of both research groups resulted in the presentation of two similar papers at the 2017 IFAC World Congress. This was the beginning of a fruitful collaboration between the groups over a number of years and publications, culminating in the completion of this monograph.

Dr. Molloy would like to acknowledge the support of Boeing, the Queensland Government’s Department of Science, Information Technology and Innovation (DSITI), and QUT through an Advance Queensland Research Fellowship. He would like to extend a special thanks to Grace Garden for the many enriching technical discussions and collaborations, Kelly Cox and Brendan Williams for championing the Fellowship within Boeing, and Prof Jason Ford and Prof Michael Milford at QUT for their generous support.
Dr. Inga Charaja would like to acknowledge the support of the DFG and the Institute of Control Systems at KIT. In particular, he would like to give special thanks to the members of the “Cooperative Systems” research group, especially to Esther Bischoff, Philipp Karg, Florian Köpf, and Simon Rothfuß for the fruitful discussions and collaborations. Dr. Inga Charaja would also like to express his gratitude to Dr. Karl Moesgen, Dr. Gunter Diehm, and Dr. Michael Flad for their mentorship in the early stages of his academic career.

Professor Hohmann would like to acknowledge the support of the DFG and KIT, especially the research group “Cooperative Systems” of the Institute of Control Systems at KIT. Finally, he wants to thank his beloved wife, Sabine, and his lively son, Philipp, for tolerating his incessant disappearances into his (home-)office. They make both the journey and destination worthwhile.

Dr. Perez would like to acknowledge the Boeing Company and, in particular, Brendan Williams, who championed the initial linkage project within Boeing. Dr. Perez would also like to acknowledge The Australian Research Council, and Prof. Mandyam Srinivasan of the University of Queensland and his team for their outstanding experimental work with birds that ignited Dr. Perez’s interest in research in this area. Dr. Perez also acknowledges the support of The University of Newcastle and is grateful to Prof. Graham C. Goodwin for his mentorship and unfailing guidance and support during his academic career. Most importantly, a heartfelt thank you goes to Dr. Perez’s lovely wife Jae and son Oliver for their constant love and support, without which it would not have been possible to contribute to this work.
Contents

1 Introduction
   1.1 Motivation
   1.2 Inverse Optimal Control
   1.3 Inverse Noncooperative Dynamic Game Theory
   1.4 Outline of this Book
   References

2 Background and Forward Problems
   2.1 Static Optimization
      2.1.1 General Formulation
      2.1.2 Necessary Optimality Conditions
      2.1.3 Quadratic Programs
      2.1.4 Systems of Linear Equations
   2.2 Discrete-Time Optimal Control
      2.2.1 General Formulation
      2.2.2 Discrete-Time Minimum Principles
   2.3 Continuous-Time Optimal Control
      2.3.1 General Formulation
      2.3.2 Continuous-Time Minimum Principles
   2.4 Noncooperative Dynamic Games
      2.4.1 General Formulation
      2.4.2 Nash Equilibrium Solutions
      2.4.3 Nash Equilibria via Discrete-Time Minimum Principles
   2.5 Noncooperative Differential Games
      2.5.1 General Formulation
      2.5.2 Nash Equilibrium Solutions
      2.5.3 Nash Equilibria via Continuous-Time Minimum Principles
   References

3 Discrete-Time Inverse Optimal Control
   3.1 Preliminary Concepts
      3.1.1 Parameterized Discrete-Time Optimal Control Problems
      3.1.2 Parameterized Discrete-Time Minimum Principles
   3.2 Inverse Optimal Control Problems in Discrete Time
   3.3 Bilevel Methods
      3.3.1 Bilevel Method for Whole Sequences
      3.3.2 Bilevel Method for Truncated Sequences
      3.3.3 Discussion of Bilevel Methods
   3.4 Minimum-Principle Methods
      3.4.1 Methods for Whole Sequences
      3.4.2 Methods for Truncated Sequences
   3.5 Method Reformulations and Solution Results
      3.5.1 Linearly Parameterized Cost Functions
      3.5.2 Reformulations of Whole-Sequence Methods
      3.5.3 Solution Results for Whole-Sequence Methods
      3.5.4 Reformulations of Truncated-Sequence Methods
      3.5.5 Solution Results for Truncated-Sequence Methods
   3.6 Inverse Linear-Quadratic Optimal Control in Discrete Time
      3.6.1 Overview of the Approach
      3.6.2 Preliminary LQ Optimal Control Concepts
      3.6.3 Feedback-Law-Based Inverse LQ Optimal Control
      3.6.4 Estimation of Feedback Laws
      3.6.5 Inverse LQ Optimal Control Method
   3.7 Notes and Further Reading
   References

4 Continuous-Time Inverse Optimal Control
   4.1 Preliminary Concepts
      4.1.1 Parameterized Continuous-Time Optimal Control Problems
      4.1.2 Parameterized Continuous-Time Minimum Principles
   4.2 Inverse Optimal Control Problems in Continuous Time
   4.3 Bilevel Methods
      4.3.1 Bilevel Method for Whole Trajectories
      4.3.2 Bilevel Method for Truncated Trajectories
      4.3.3 Discussion of Bilevel Methods
   4.4 Minimum-Principle Methods
      4.4.1 Methods for Whole Trajectories
      4.4.2 Methods for Truncated Trajectories
   4.5 Method Reformulations and Solution Results
      4.5.1 Linearly Parameterized Cost Functionals
      4.5.2 Reformulations of Whole-Trajectory Methods
      4.5.3 Solution Results for Whole-Trajectory Methods
      4.5.4 Reformulations of Truncated-Trajectory Methods
      4.5.5 Solution Results for Truncated-Trajectory Methods
   4.6 Inverse Linear-Quadratic Optimal Control in Continuous Time
      4.6.1 Overview of Approach
      4.6.2 Preliminary LQ Optimal Control Concepts
      4.6.3 Feedback-Law-Based Inverse LQ Optimal Control
      4.6.4 Estimation of Feedback Controls
      4.6.5 Inverse LQ Optimal Control Method
   4.7 Notes and Further Reading
   References

5 Inverse Noncooperative Dynamic Games
   5.1 Preliminary Concepts
      5.1.1 Parameterized Noncooperative Dynamic Games
      5.1.2 Nash Equilibria Conditions via Minimum Principles
   5.2 Inverse Noncooperative Dynamic Game Problems
   5.3 Bilevel Methods
      5.3.1 Bilevel Method for Whole Sequences
      5.3.2 Bilevel Method for Truncated Sequences
      5.3.3 Discussion of Bilevel Methods
   5.4 Open-Loop Minimum-Principle Methods
      5.4.1 Whole-Sequence Open-Loop Methods
      5.4.2 Truncated-Sequence Open-Loop Methods
      5.4.3 Discussion of Open-Loop Minimum-Principle Methods
   5.5 Open-Loop Method Reformulations and Solution Results
      5.5.1 Linearly Parameterized Player Cost Functions
      5.5.2 Fixed-Element Parameter Sets
      5.5.3 Whole-Sequence Methods Reformulations and Results
      5.5.4 Truncated-Sequence Methods Reformulations and Results
   5.6 Challenges and Potential for Feedback Minimum-Principle Methods
   5.7 Inverse Linear-Quadratic Feedback Dynamic Games
      5.7.1 Preliminary LQ Dynamic Game Concepts
      5.7.2 Feedback-Strategy-Based Inverse Dynamic Games
      5.7.3 Estimation of Feedback Strategies
      5.7.4 Inverse LQ Dynamic Game Method
   5.8 Notes and Further Reading
   References

6 Inverse Noncooperative Differential Games
   6.1 Preliminary Concepts
      6.1.1 Parameterized Noncooperative Differential Games
      6.1.2 Nash Equilibria Conditions via Minimum Principles
   6.2 Inverse Noncooperative Differential Game Problems
   6.3 Bilevel Methods
      6.3.1 Bilevel Methods for Whole Trajectories
      6.3.2 Bilevel Methods for Truncated Trajectories
      6.3.3 Discussion of Bilevel Methods
   6.4 Open-Loop Minimum-Principle Methods
      6.4.1 Whole-Trajectory Open-Loop Methods
      6.4.2 Truncated-Trajectory Open-Loop Methods
      6.4.3 Discussion of Open-Loop Minimum-Principle Methods
   6.5 Open-Loop Method Reformulations and Solution Results
      6.5.1 Linearly Parameterized Player Cost Functionals
      6.5.2 Fixed-Element Parameter Sets
      6.5.3 Whole-Trajectory Methods Reformulations and Results
      6.5.4 Truncated-Trajectory Methods Reformulations and Results
   6.6 Challenges and Potential for Feedback Minimum-Principle Methods
   6.7 Inverse Linear-Quadratic Feedback Differential Games
      6.7.1 Preliminary LQ Differential Game Concepts
      6.7.2 Feedback-Strategy-Based Inverse Differential Games
      6.7.3 Estimation of Feedback Control Laws
      6.7.4 Inverse LQ Differential Game Method
   6.8 Notes and Further Reading
   References

7 Examples and Experimental Case Study
   7.1 Application-Inspired Example
      7.1.1 System Model
      7.1.2 Inverse Noncooperative Dynamic Game Simulations
      7.1.3 Inverse Noncooperative Differential Game Simulations
      7.1.4 Summary of Application-Inspired Illustrative Example
   7.2 Further Examples
      7.2.1 Failure Case for Soft Method
      7.2.2 Importance of SVDs for Soft Method
   7.3 Experimental Case Study: Identification of Human Behavior for Shared Control
      7.3.1 Experimental Setup
      7.3.2 Model Structure
      7.3.3 Experimental Protocol
      7.3.4 Inverse Methods for Parameter Estimation
      7.3.5 Results
      7.3.6 Discussion
   7.4 Notes and Further Reading
   References

Index
Chapter 1
Introduction
1.1 Motivation

The notion that phenomena within the natural world, including human and animal behavior, arise from the optimization of interpretable criteria has inspired the study of optimality across almost all fields of human endeavor. Studies of optimality in nature date back to antiquity, with Heron of Alexandria discovering that rays of light reflected from mirrors take those paths with the shortest lengths and least travel times [33, pp. 167–168]. Optimality now underlies our understanding of the principle of least action and Fermat’s principle of least time in physics, evolution and animal behavior in biology [42, 65, 66], human motor control in neuroscience [41, 64], and utility optimization in economics (among myriad other examples). Optimality has thus been described as “one of the oldest principles of theoretical science” [58] and “one of science’s most pervasive and flexible metaprinciples” [59].

Despite the scientific quest to discover optimality principles and underlying optimality criteria from observational data, the study of mathematical optimization has principally focused on forward problems that involve finding the best or optimal values of decision variables under given optimality criteria. Inverse problems that instead involve finding criteria under which given values of decision variables are optimal have received less attention, particularly within the optimal control branch of mathematical optimization.

Optimal control is concerned with exerting optimal causal influence on a dynamical system evolving in (discrete or continuous) time, with the variables of influence called controls and the variables to be influenced called states. The forward problem of optimal control (or simply, the optimal control problem) specifically involves finding controls that lead to a given cost functional of the states and controls being minimized subject to the constraints imposed by a given dynamical system. Optimal control thus constitutes dynamic mathematical optimization with the decision variables being controls, and their optimality depending on time and the order in which they influence the dynamical system.
Optimal control originated from the calculus of variations, and evolved significantly during the second half of the twentieth century with the celebrated work of Bellman on dynamic programming, Pontryagin on the minimum principle (which Pontryagin originally formulated as a maximum principle), and Kalman on linear-quadratic (LQ) optimal control [10, 61]. Bellman’s dynamic programming specifically led to the elegant result that the optimal controls for a dynamical system can be expressed as functions of its past states, with these functions being called optimal feedback (control) laws. In contrast, Pontryagin’s minimum principle led to a set of conditions that trajectories or sequences of controls must satisfy in order to be optimal (i.e., a set of necessary optimality conditions). Finally, Kalman showed that optimal control problems involving linear dynamical systems and cost functionals that are quadratic in the state and control variables can be solved in an efficient manner via matrix equations.

In recent years, optimal control has attracted much renewed attention due to its close relationship with reinforcement learning, which relaxes some of the (stronger) assumptions of optimal control such as having prior knowledge of the dynamical system (see, e.g., [9, 36, 37] for detailed discussions of the relationship between optimal control and reinforcement learning).

In this book, we investigate inverse optimal control problems (and their extensions in noncooperative dynamic game theory) that involve computing cost functionals under which given or measured state and control trajectories of dynamical systems are optimal. Interest in these inverse problems has grown significantly in recent years, sparked by their potential to model complex, dynamic decision-making tasks such as human navigation [5]; human arm movement [8, 62]; human pose adjustment and posture control [14, 56]; human eye movement [15]; the performance of human pilots, drivers, and operators [22, 26, 40, 43, 67, 68]; and other animal behaviors [18]. The solution of these inverse problems also raises the possibility of developing machines, robots, and autonomous agents that mimic the capabilities of human experts and highly evolved organisms [1, 2, 31, 46, 57].
1.2 Inverse Optimal Control

Rudolf Emil Kalman was the first to pose an inverse optimal control problem. In his famous 1964 paper, Kalman posed the question “When is a Linear Control System Optimal?”, and considered the problem of finding all cost functionals under which a given feedback control law is optimal for a given dynamical system [32]. Importantly, he demonstrated that this inverse problem is frequently ill-posed, with a linear feedback control law often being optimal under more than one cost functional.

Kalman [32] originally posed and solved his inverse optimal control problem under several rather restrictive assumptions including that:

1. the dynamical system is linear and time-invariant;
2. the dynamical system has a single control variable;
3. the given feedback control law is time-invariant and linear; and
4. the cost functionals considered are quadratic.

While subsequent works have focused on relaxing some of these assumptions (cf. [12, 29, 34, 47, 63]), most have remained concerned with the structural properties of optimal, mainly LQ, control problems given feedback control laws. Within systems and control engineering, inverse optimal control has only recently expanded to encompass the data-driven (inverse) problem of computing cost functionals under which given or measured state and control trajectories are optimal. Indeed, Nori and Frezza in 2004 [53] appear to have been among the first in systems and control to examine this data-driven form of inverse optimal control. Similar structural estimation and inverse reinforcement learning problems had, however, earlier been examined in economics [24, 25] and computer science [52] (albeit mostly for systems evolving in discrete time with a finite number of states and/or controls).

In its data-driven form, inverse optimal control has begun to attract the attention of control theorists equipped with the powerful tools of (nonlinear) optimal control theory. Specifically, its data-driven form has been observed to naturally lend itself to solution and analysis via Pontryagin’s minimum principle due to the principle’s focus on optimal trajectories rather than optimal feedback control laws. In this context, Chaps. 3 and 4 present a control-theoretic introduction to (data-driven) inverse optimal control in both discrete and continuous time using Pontryagin’s minimum principle.
1.3 Inverse Noncooperative Dynamic Game Theory

Game theory provides a mathematical theory of interaction between multiple rational decision-makers, called players; it is dynamic if the players interact by each exerting causal influence on a common dynamical system (in either discrete or continuous time); and it is noncooperative if the players pursue their own individual objectives, which may conflict with those of the other players. Noncooperative dynamic game theory is thus a natural extension of optimal control to settings in which the controls of a single dynamical system are divided between multiple different players, each with their own cost functional. However, unlike optimal control, the (forward) problem of finding optimal player strategies given the dynamical system and the player cost functionals is ambiguous since the notion of optimality itself ceases to be a well-defined concept.

A variety of optimality (or solution) concepts for (forward) noncooperative dynamic games have been developed by varying factors including the order in which the players make decisions, and what information the players have or believe about the other players and state of the dynamical system (a detailed discussion of solution concepts for noncooperative dynamic games is beyond the scope of this book, but is given in [7, Chap. 1]). In this book, we shall focus on Nash equilibrium solutions that arise when all players act simultaneously and
seek to minimize their individual cost functionals under the (correct) belief that all other players act likewise. A precise definition of Nash equilibria is deferred until the next chapter, but intuitively a player following their Nash equilibrium strategy has no incentive to unilaterally adopt a different strategy. Nash equilibrium solutions to (forward) noncooperative dynamic games can be analyzed and obtained using the modern tools of optimal control including Bellman’s dynamic programming, Pontryagin’s minimum principle, and Kalman’s matrix equations in the case of a linear dynamical system and quadratic player cost functionals [7].

Historically, however, noncooperative dynamic game theory evolved alongside optimal control (rather than after it), with Isaacs first introducing two-player noncooperative dynamic games in the 1950s and 1960s [28], and Starr and Ho [60] introducing N-player noncooperative dynamic games in 1969. (Isaacs was the first to extend the concept of a Nash equilibrium, proposed by John Nash [51] for static game theory, to describe the forward solution of two-player noncooperative dynamic games; Starr and Ho [60] generalized Isaacs’ work to N-player noncooperative dynamic games, and were the first to explicitly note that it was no longer obvious what should be deemed a solution.) Noncooperative dynamic game theory has since developed a rich literature and numerous applications in mathematics, economics, engineering, and biology including vehicle collision avoidance [7, 45, 49], modeling markets [17, 35], control of power systems [13], decentralized control of electric vehicles [39], vehicle formation control [23, 38], advanced driver assistance systems [19, 20, 30, 50], and modeling collision avoidance in birds [44].

In addition, recent experiments show the descriptive power of noncooperative dynamic games in modeling human–machine interaction or shared control systems [20, 27, 30, 48]. These results can be seen as a natural extension of the conjecture that human motion is governed by an optimality principle asserting the minimization of individual costs (see, e.g., [16, 54, 64]). Consequently, interactions between humans and machines (as players) modify the costs incurred by individuals, and hence the actions they respond with.

While noncooperative dynamic game theory evolved in parallel to optimal control, surprisingly little attention has been paid to its inverse problem of computing player cost functionals such that given state and player control trajectories (or feedback control laws) constitute a Nash equilibrium. Indeed, inverse noncooperative dynamic game theory appears to have only emerged within the last four decades, with most developments found in the economics literature. Notable early treatments include Fujii and Khargonekar [21] in 1988, and Carraro [11] in 1989, who both considered linear dynamical systems, quadratic player cost functionals, and given (or estimated) linear player feedback control laws (in the same spirit as Kalman’s early work on inverse optimal control). Subsequent treatments in economics have focused on (data-driven) inverse noncooperative dynamic game problems (called inverse noncooperative dynamic games) involving given state and player control trajectories, with the vast majority considering relatively simple dynamical systems in discrete time with a finite number of states and/or controls (cf. [3, 6, 55], the survey paper of [4] and references therein). More recently, the related problem of multiagent
inverse reinforcement learning has received some attention in computer science, but again mostly in discrete time.

Despite having numerous potential applications in control beyond those covered by inverse optimal control, including in multiagent systems and collaborative control, inverse noncooperative dynamic game theory has only recently been explored in its data-driven formulation using control-theoretic tools. Pontryagin’s minimum principle is thus yet to be fully explored as a tool for analyzing and solving inverse noncooperative dynamic games. In this context, Chaps. 5 and 6 generalize and extend the inverse optimal control treatments of Chaps. 3 and 4 to inverse noncooperative dynamic game theory in both discrete and continuous time using Pontryagin’s minimum principle (henceforth referred to simply as the minimum principle). Chapter 6 will specifically consider noncooperative dynamic game theory in continuous time with dynamical systems defined by differential equations. Following convention, hereon in this book we shall refer to (inverse) noncooperative dynamic games in continuous time as (inverse) noncooperative differential games, and (inverse) noncooperative dynamic games in discrete time as simply (inverse) noncooperative dynamic games.
1.4 Outline of this Book

This book is divided into seven chapters. This first chapter has served as an introduction to inverse problems in optimal control and noncooperative dynamic game theory, motivating their investigation using the minimum principle. Chapter 2 gives the necessary mathematical background on static optimization, (forward) optimal control, and dynamic games. In particular, we present optimality conditions derived from minimum principles, which lay the foundation of the presented inverse optimal control and inverse dynamic game methods of this book.

Chapters 3 and 4 address inverse optimal control problems in discrete and continuous time, respectively. The first part of each chapter formulates specific inverse problems that may arise depending on the given state and control data. Direct approaches for solving inverse optimal control problems, called bilevel methods and based on bilevel optimization, are then discussed. Motivated by the limitations of these direct methods, we use the minimum principle to develop alternative methods along with theoretical results that characterize the existence and uniqueness of inverse optimal control solutions they may yield. We complete each chapter by examining the relationship between (data-driven) inverse optimal control and the feedback-law-based problem posed by Kalman as inverse LQ optimal control.

Chapters 5 and 6 extend the inverse optimal control methods and analysis of Chaps. 3 and 4 to inverse noncooperative dynamic games and inverse noncooperative differential games. Analogous to Chaps. 3 and 4, in Chaps. 5 and 6 we pose specific inverse problems before discussing direct methods for solving them. We then use the minimum principle in the form of (necessary) conditions for Nash equilibria to formulate efficient alternative solution methods with associated theoretical
results characterizing the existence and uniqueness of the solutions they may yield. In addition, we complete each chapter by examining the specific solution of inverse LQ dynamic or differential games when player feedback laws are given rather than state and control trajectories.

Finally, Chap. 7 presents various simulation examples and an experimental case study of human driver behavior identification toward advanced driver assistance technology. The simulation examples and experimental case study serve to illustrate and compare the methods presented in the other chapters.

Each chapter in the book finishes with a section called “Notes and Further Reading”, where we give additional information to help the reader find related work or extensions of the ideas presented, with the aim of illuminating current and potential future research directions and trends.
References

1. Aghasadeghi N, Bretl T (2014) Inverse optimal control for differentially flat systems with application to locomotion modeling. In: 2014 IEEE international conference on robotics and automation (ICRA), pp 6018–6025
2. Aghasadeghi N, Long A, Bretl T (2012) Inverse optimal control for a hybrid dynamical system with impacts. In: 2012 IEEE international conference on robotics and automation (ICRA), pp 4962–4967
3. Aguirregabiria V, Mira P (2007) Sequential estimation of dynamic discrete games. Econometrica 75(1):1–53
4. Aguirregabiria V, Mira P (2010) Dynamic discrete choice structural models: a survey. J Econ 156(1):38–67
5. Albrecht S, Basili P, Glasauer S, Leibold M, Ulbrich M (2012) Modeling and analysis of human navigation with crossing interferer using inverse optimal control. In: Proceedings of the 7th Vienna international conference on mathematical modelling (MathMod)
6. Bajari P, Benkard CL, Levin J (2007) Estimating dynamic models of imperfect competition. Econometrica 75(5):1331–1370
7. Basar T, Olsder GJ (1999) Dynamic noncooperative game theory, vol 23, 2nd edn. Academic, New York
8. Berret B, Chiovetto E, Nori F, Pozzo T (2011) Evidence for composite cost functions in arm movement planning: an inverse optimal control approach. PLoS Comput Biol 7(10)
9. Bertsekas D (2019) Reinforcement learning and optimal control. Athena Scientific, Belmont
10. Bryson AE (1996) Optimal control - 1950 to 1985. IEEE Control Syst Mag 16(3):26–33
11. Carraro C, Flemming J, Giovannini A (1989) The tastes of European central bankers. In: A European central bank?: perspectives on monetary unification after ten years of the EMS. Cambridge University Press, pp 162–185
12. Casti J (1980) On the general inverse problem of optimal control theory. J Optim Theory Appl 32(4):491–497
13. Chen H, Ye R, Wang X, Lu R (2015) Cooperative control of power system load and frequency by using differential games. IEEE Trans Control Syst Technol 23(3):882–897
14. El-Hussieny H, Asker A, Salah O (2017) Learning the sit-to-stand human behavior: an inverse optimal control approach. In: 2017 13th international computer engineering conference (ICENCO), pp 112–117
15. El-Hussieny H, Ryu J (2018) Inverse discounted-based LQR algorithm for learning human movement behaviors. Appl Intell
16. Engelbrecht SE (2001) Minimum principles in motor control. J Math Psychol 45(3):497–542
17. Engwerda J (2005) LQ dynamic optimization and differential games. Wiley, West Sussex
18. Faruque IA, Muijres FT, Macfarlane KM, Kehlenbeck A, Humbert JS (2018) Identification of optimal feedback control rules from micro-quadrotor and insect flight trajectories. Biol Cybern 112(3):165–179
19. Flad M, Fröhlich L, Hohmann S (2017) Cooperative shared control driver assistance systems based on motion primitives and differential games. IEEE Trans Hum-Mach Syst 47(5):711–722
20. Flad M (2019) Differential-game-based driver assistance system for fuel-optimal driving. In: Petrosyan LA, Mazalov VV, Zenkevich NA (eds) Frontiers of dynamic games: game theory and management, St. Petersburg, 2018. Static & dynamic game theory: foundations & applications. Springer International Publishing, Cham, pp 13–36
21. Fujii T, Khargonekar PP (1988) Inverse problems in H∞ control theory and linear-quadratic differential games. In: Proceedings of the 27th IEEE conference on decision and control, vol 1, pp 26–31
22. Gote C, Flad M, Hohmann S (2014) Driver characterization & driver specific trajectory planning: an inverse optimal control approach. In: 2014 IEEE international conference on systems, man, and cybernetics (SMC), pp 3014–3021
23. Gu D (2007) A differential game approach to formation control. IEEE Trans Control Syst Technol 16(1):85–93
24. Hotz VJ, Miller RA (1993) Conditional choice probabilities and the estimation of dynamic models. Rev Econ Stud 60(3):497–529
25. Hotz VJ, Miller RA, Sanders S, Smith J (1994) A simulation estimator for dynamic models of discrete choice. Rev Econ Stud 61(2):265–289
26. Inga J, Eitel M, Flad M, Hohmann S (2018) Evaluating human behavior in manual and shared control via inverse optimization. In: 2018 IEEE international conference on systems, man, and cybernetics (SMC), pp 2699–2704
27. Inga J, Creutz A, Hohmann S (2021) Online inverse linear-quadratic differential games applied to human behavior identification in shared control. In: 2021 European control conference (ECC)
28. Isaacs R (1965) Differential games: mathematical theory with application to warfare and pursuit, control and optimisation. Dover Publications, New York
29. Jameson A, Kreindler E (1973) Inverse problem of linear optimal control. SIAM J Control 11(1):1–19
30. Ji X, Yang K, Na X, Lv C, Liu Y (2019) Shared steering torque control for lane change assistance: a stochastic game-theoretic approach. IEEE Trans Ind Electron 66(4):3093–3105
31. Johnson M, Aghasadeghi N, Bretl T (2013) Inverse optimal control for deterministic continuous-time nonlinear systems. In: 2013 IEEE 52nd annual conference on decision and control (CDC), pp 2906–2913
32. Kalman RE (1964) When is a linear control system optimal? J Basic Eng 86(1):51–60
33. Kline M (1972) Mathematical thought from ancient to modern times. Oxford University Press, Oxford
34. Kong H, Goodwin G, Seron M (2012) A revisit to inverse optimality of linear systems. Int J Control 85(10):1506–1514
35. Kossioris G, Plexousakis M, Xepapadeas A, de Zeeuw A, Mäler KG (2008) Feedback Nash equilibria for non-linear differential games in pollution control. J Econ Dyn Control 32(4):1312–1331
36. Lewis FL, Vrabie D, Vamvoudakis KG (2012) Reinforcement learning and feedback control: using natural decision methods to design optimal adaptive controllers. IEEE Control Syst Mag 32(6):76–105
37. Lewis FL, Vrabie D, Syrmos VL (2012) Optimal control, 3rd edn. Wiley, New York
38. Lin W (2014) Distributed UAV formation control using differential game approach. Aerosp Sci Technol 35:54–62
39. Ma Z, Callaway DS, Hiskens IA (2013) Decentralized charging control of large populations of plug-in electric vehicles. IEEE Trans Control Syst Technol 21(1):67–78
40. Maillot T, Serres U, Gauthier J-P, Ajami A (2013) How pilots fly: an inverse optimal control problem approach. In: 2013 IEEE 52nd annual conference on decision and control (CDC), pp 1792–1797
41. Mathis MW, Schneider S (2021) Motor control: neural correlates of optimal feedback control theory. Curr Biol 31(7):R356–R358
42. McFarland DJ (1977) Decision making in animals. Nature 269(1)
43. Menner M, Worsnop P, Zeilinger MN (2018) Predictive modeling by inverse constrained optimal control with application to human-robot co-manipulation
44. Molloy TL, Garden GS, Perez T, Schiffner I, Karmaker D, Srinivasan M (2018) An inverse differential game approach to modelling bird mid-air collision avoidance behaviours. In: 18th IFAC symposium on system identification (SYSID 2018), Stockholm, Sweden
45. Molloy TL, Perez T, Williams BP (2020) Optimal bearing-only-information strategy for unmanned aircraft collision avoidance. J Guid Control Dyn 43(10):1822–1836
46. Mombaur K, Truong A, Laumond J-P (2010) From human to humanoid locomotion–an inverse optimal control approach. Auton Robot 28(3):369–383
47. Moylan P, Anderson B (1973) Nonlinear regulator theory and an inverse optimal control problem. IEEE Trans Autom Control 18(5):460–465
48. Musić S, Hirche S (2020) Haptic shared control for human-robot collaboration: a game-theoretical approach. IFAC-PapersOnLine 53(2):10216–10222
49. Mylvaganam T, Sassano M, Astolfi A (2017) A differential game approach to multi-agent collision avoidance. IEEE Trans Autom Control 62(8):4229–4235
50. Na X, Cole DJ (2015) Game-theoretic modeling of the steering interaction between a human driver and a vehicle collision avoidance controller. IEEE Trans Hum-Mach Syst 45(1):25–38
51. Nash J (1951) Non-cooperative games. Ann Math 54(2):286–295
52. Ng AY, Russell SJ (2000) Algorithms for inverse reinforcement learning. In: ICML, pp 663–670
53. Nori F, Frezza R (2004) Linear optimal control problems and quadratic cost functions estimation. In: Proceedings of the Mediterranean conference on control and automation, vol 4
54. Nubar Y, Contini R (1961) A minimal principle in biomechanics. Bull Math Biophys 23(4):377–391
55. Pakes A, Ostrovsky M, Berry S (2007) Simple estimators for the parameters of discrete dynamic games (with entry/exit examples). Rand J Econ 38(2):373–399
56. Priess MC, Conway R, Choi J, Popovich JM, Radcliffe C (2015) Solutions to the inverse LQR problem with application to biological systems analysis. IEEE Trans Control Syst Technol 23(2):770–777
57. Puydupin-Jamin A-S, Johnson M, Bretl T (2012) A convex approach to inverse optimal control and its application to modeling human locomotion. In: 2012 IEEE international conference on robotics and automation (ICRA), pp 531–536
58. Rosen R (1967) Optimality principles in biology. Springer, Berlin
59. Schoemaker PJH (1991) The quest for optimality: a positive heuristic of science? Behav Brain Sci 14(2):205–215
60. Starr AW, Ho Y-C (1969) Nonzero-sum differential games. J Optim Theory Appl 3(3):184–206
61. Sussmann HJ, Willems JC (1997) 300 years of optimal control: from the brachystochrone to the maximum principle. IEEE Control Syst Mag 17(3):32–44
62. Sylla N, Bonnet V, Venture G, Armande N, Fraisse P (2014) Human arm optimal motion analysis in industrial screwing task. In: 5th IEEE RAS/EMBS international conference on biomedical robotics and biomechatronics, pp 964–969
63. Thau F (1967) On the inverse optimum control problem for a class of nonlinear autonomous systems. IEEE Trans Autom Control 12(6):674–681
64. Todorov E (2004) Optimality principles in sensorimotor control. Nat Neurosci 7(9):907–915
65. Tsiantis N, Balsa-Canto E, Banga JR (2018) Optimality and identification of dynamic models in systems biology: an inverse optimal control framework. Bioinformatics 34(14):2433–2440
66. Vasconcelos M, Fortes I, Kacelnik A (2017) On the structure and role of optimality models in the study of behavior. In: APA handbook of comparative psychology: perception, learning, and cognition, vol 2. American Psychological Association, Washington, DC, pp 287–307
67. Yokoyama N (2016) Inference of flight mode of aircraft including conflict resolution. In: 2016 American control conference (ACC), pp 6729–6734
68. Yokoyama N (2017) Inference of aircraft intent via inverse optimal control including second-order optimality condition. J Guid Control Dyn 41(2):349–359
Chapter 2
Background and Forward Problems
In this chapter, we briefly revisit concepts in (static) optimization, (forward) optimal control, and (forward) noncooperative dynamic game theory that will prove useful in later chapters on inverse optimal control and inverse noncooperative dynamic (and differential) game theory. Detailed treatments of these topics are provided in numerous books (e.g., [1, 3, 6, 12]), so we shall refer to these and other primary sources for rigorous mathematical proofs.
2.1 Static Optimization

Static optimization is an important precursor to optimal control and noncooperative dynamic (and differential) game theory.
2.1.1 General Formulation

Consider a real-valued cost (or objective) function $V : \mathcal{U} \to \mathbb{R}$ defined on a control-constraint set $\mathcal{U}$ that is either a subset of $\mathbb{R}^m$ or the entirety of $\mathbb{R}^m$. The static optimization problem

$$\min_{u} \ V(u) \quad \text{s.t.} \quad u \in \mathcal{U} \tag{2.1}$$
involves determining an optimal control (or decision) variable $u^* \in \mathcal{U} \subset \mathbb{R}^m$ that leads to the cost function $V$ attaining its minimum value over $\mathcal{U}$ in the sense that
$V(u^*) \le V(u)$ for all $u \in \mathcal{U}$. The value of a control variable $u^*$ that minimizes $V$ (i.e., a minimizing argument of $V$) is written as satisfying $u^* \in \arg\min_{u \in \mathcal{U}} V(u)$.
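As a concrete illustration of the distinction between the minimum value and a minimizing argument, consider the following minimal Python sketch (our illustration, not from the book; the cost function $V(u) = (u-1)^2$, the interval $\mathcal{U} = [0, 2]$, and the grid resolution are arbitrary choices), which approximates both quantities by brute-force evaluation on a grid:

```python
import numpy as np

# Example cost V(u) = (u - 1)^2 with constraint set U = [0, 2].
V = lambda u: (u - 1.0) ** 2

# Evaluate V on a dense grid over U.
U_grid = np.linspace(0.0, 2.0, 2001)
values = V(U_grid)

# Minimum value of V over U, and a minimizing argument u*.
min_value = values.min()
u_star = U_grid[values.argmin()]

print(f"min_u V(u) over U ~ {min_value:.6f}")  # ~ 0.0
print(f"u* in argmin      ~ {u_star:.6f}")     # ~ 1.0
```

A grid search is of course only practical for very small $m$; it is used here purely to make the min/argmin distinction tangible.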
An important technical concern is that (2.1) may be infeasible in the sense that no $u$ that minimizes the cost function $V$ belongs to $\mathcal{U}$. For example, (2.1) is infeasible if the set $\mathcal{U}$ arises from contradicting constraints and is thus empty (denoted by $\mathcal{U} = \emptyset$); it is also infeasible if $V$ decreases without bound on $\mathcal{U}$ (such as in the case $V(u) = u$ with $\mathcal{U} = \mathbb{R}$, where $V(u) \to -\infty$ as $u \to -\infty$). The latter example, in particular, highlights that for (2.1) to be feasible, it is necessary (though not always sufficient) for $V$ to be bounded from below on $\mathcal{U}$ by some value $\kappa \in \mathbb{R}$ in the sense that $V(u) \ge \kappa$ for all $u \in \mathcal{U}$. The greatest value of $\kappa$ that bounds $V$ from below on $\mathcal{U}$ is called the infimum of $V$ (on $\mathcal{U}$), and is written as

$$\inf_{u} \ V(u) \quad \text{s.t.} \quad u \in \mathcal{U}. \tag{2.2}$$
2.1.2 Necessary Optimality Conditions Let us define ∇u V (u) ¯ ∈ Rm as the gradient of the cost function V at u¯ ∈ Rm . That is, the gradient of the cost function V at u¯ ∈ Rm is the vector ⎡ ⎢ ⎢ ⎢ ∇u V (u) ¯ =⎢ ⎢ ⎢ ⎣
⎤
.. . ∂ V (u) u (m)
⎥ ⎥ ⎦
∂ V (u) ∂u (1) u=u¯ ⎥ ⎥ ∂ V (u) ∂u (2) u=u¯ ⎥ ⎥
u=u¯
where the components are the partial derivatives of V with respect to the components
of the vector u = u (1) u (2) · · · u (m) ∈ U evaluated at u¯ ∈ Rm . Here and throughout
the book, we use to denote the vector (or matrix) transpose.
2.1 Static Optimization
13
If the cost function V is continuously differentiable on Rm (i.e., the gradient ¯ exists and is a continuous function of u) ¯ and U is a closed and convex ∇u V (u) subset of Rm , then optimal solutions u ∗ to (2.1) lie either on the boundary of the set U (with a gradient directed outwards) or in the interior of the set U (with a zero gradient). Thus, if some u ∈ U is an optimal solution to (2.1) (i.e., if u = u ∗ ), then ∇u V (u) (u¯ − u) ≥ 0
(2.3)
for all u¯ ∈ U , which simplifies to ∇u V (u) = 0 if u is in the interior (i.e., not on the boundary) of U . Here, we use 0 to denote either the scalar number zero, or a vector (or matrix) of appropriate dimensions with all zero elements. It is important to note that u ∈ U must satisfy (2.3) in order to constitute an optimal solution to (2.1). However, if u ∈ U satisfies (2.3) we cannot, in general, conclude that it is an optimal solution to (2.1) since (2.3) is satisfied by all u ∈ U that are local (potentially non-global) minima, maxima, or inflection points of V . Thus, we say that (2.3) is a necessary, though not always sufficient, condition for u ∈ U to constitute an optimal solution to (2.1). An important special case in which (2.3) is both a necessary and sufficient condition for u ∈ U to be an optimal solution to (2.1) is when both the cost function V and constraint set U are convex.
2.1.3 Quadratic Programs A quadratic program is a static optimization problem ((2.1) or (2.2)) in which the cost function V is given by a quadratic form in the sense that V (u) =
1 u Ωu + b u 2
(2.4)
for u ∈ U where Ω ∈ Rm×m is a given real symmetric matrix (i.e., Ω = Ω ), and b ∈ Rm is a given real column vector. In this book, we shall primarily concern ourselves with the solution of unconstrained quadratic programs of the form inf u
s.t.
1 u Ωu + b u 2 u ∈ Rm .
(2.5)
The gradient of V when V is the quadratic form (2.4) is ∇u V (u) = Ωu + b. The necessary optimality condition (2.3) for u to be a solution to the unconstrained quadratic program (2.5) is thus
14
2 Background and Forward Problems
Ωu + b = 0. If Ω is positive definite (denoted by Ω 0 and meaning that u Ωu ≥ 0 for all u ∈ Rm with equality if and only if u = 0), then V is (strictly) convex and this condition becomes both necessary and sufficient for u to be an optimal solution to (2.5). Equivalently, if Ω is positive definite then u solves (2.5) if and only if u = −Ω −1 b
(2.6)
since Ω has an inverse Ω −1 when it is positive definite (i.e., Ω is invertible or nonsingular). If, however, Ω is positive semidefinite (denoted by Ω 0 and meaning that u Ωu ≥ 0 for all u ∈ Rm ), we require the following Moore–Penrose pseudoinverse and singular value decomposition (SVD) concepts to present necessary optimality conditions for (2.5). Definition 2.1 (Moore–Penrose Pseudoinverse) A matrix A+ ∈ Rn×m is the Moore– Penrose pseudoinverse (or pseudoinverse) of a matrix A ∈ Rm×n if it satisfies the four conditions: A A+ A = A A+ A A+ = A+ (A A+ ) = A A+
(2.7a) (2.7b) (2.7c)
(A+ A) = A+ A.
(2.7d)
Definition 2.2 (Singular Value Decomposition of Positive Semidefinite Matrix) For a positive semidefinite matrix Ω, the pair (U, Σ) is called a singular value decomposition (SVD) of Ω if Ω = U ΣU where Σ ∈ Rm×m is a diagonal matrix with nonnegative entries and U ∈ Rm×m . Detailed discussions of these definitions are given in [2, Chap. 1] and [7, Chap. 14]. Importantly, they lead to the following proposition characterizing the solutions to unconstrained quadratic programs of the form of (2.5) when Ω is positive semidefinite. Proposition 2.1 (Solutions to Unconstrained Quadratic Programs) Consider the unconstrained quadratic program (2.5) where Ω is positive semidefinite with Moore– Penrose pseudoinverse Ω + and with a SVD (U, Σ) such that Ω = U ΣU . If (I − ΩΩ + ) = 0, then all u ∈ Rm satisfying u = −Ω + b + U
0 z
(2.8)
for any (arbitrary) z ∈ Rm−r are optimal solutions to (2.5) where I denotes the identity matrix of appropriate dimensions, and r rank(Ω) is the matrix rank of Ω. Proof See [7, Proposition 15.2].
2.1 Static Optimization
15
We note that if rank(Ω) = m (which holds when Ω 0), then its Moore–Penrose pseudoinverse reduces to the standard inverse Ω −1 and the conditions (2.6) and (2.8) become equivalent. We shall make extensive use of Proposition 2.1 in later chapters to characterize the solutions of inverse optimal control and inverse dynamic game problems.
2.1.4 Systems of Linear Equations As suggested by the optimality conditions of (2.6) and Proposition 2.1, linear algebra results are useful in solving both quadratic programs and systems of linear equations. Solving a system of linear equations involves finding all vectors u ∈ Rm such that Au = b
(2.9)
for a given matrix A ∈ Rn×m and given vector b ∈ Rn . A system of linear equations (2.9) is said to be consistent if at least one u ∈ Rm satisfying it exists; otherwise, it is said to be inconsistent. The following concept of a generalized inverse of A is useful for determining if a system of linear equations is consistent (and for identifying all of its solutions). Definition 2.3 (Generalized Inverse) A matrix A g ∈ Rn×m is a generalized inverse of a matrix A ∈ Rm×n if it satisfies A A g A = A (i.e., the first condition (2.7a) required for a Moore–Penrose pseudoinverse). It is interesting to note that Moore–Penrose pseudoinverses are generalized inverses, however, not all generalized inverses are Moore–Penrose pseudoinverses (see [10, p. 113] for a detailed discussion). Given the definition of generalized inverses, we have the following results (found in [2, 10]) that characterize solutions to systems of linear equations. Lemma 2.1 (Lemma 2 of [2, p. 43]) Consider a matrix A ∈ Rn×m with a generalized inverse A g ∈ Rm×n (e.g., its Moore–Penrose pseudoinverse A+ ). Then, (a) A g A = I ∈ Rm×m if and only if rank(A) = m; and (b) A A g = I ∈ Rn×n if and only if rank(A) = n.
Proof See Lemma 2 of [2, p. 43].
Proposition 2.2 (Consistency and Solutions of System of Linear Equations) Consider the system of linear equations (2.9) with A g ∈ Rm×n being a generalized inverse of A (e.g., its Moore–Penrose pseudoinverse A+ ). Then (2.9) is consistent if and only if A A g b = b,
16
2 Background and Forward Problems
in which case (2.9) is solved by all u ∈ Rm satisfying u = −A g b + (I − A g A)z for any (arbitrary) z ∈ Rm .
Proof See Corollary 2 of [2, p. 53] or [10, Theorem 1].
We shall use Lemma 2.1 and Proposition 2.2 in later chapters to examine solutions to inverse optimal control and inverse dynamic game problems. We now turn our attention to (forward) optimal control problems in discrete time.
2.2 Discrete-Time Optimal Control In the last section on static optimization, our focus was on the cost function V and we introduced constraints on the control variables u only in due course. In this and the next section on optimal control, our focus shifts to first introducing constraints representing the dynamics (or time-varying properties) of the problem. These dynamics (or constraints) are often fixed by the physical laws of nature (or operational factors), while the cost function can be selected as part of a design process.
2.2.1 General Formulation In this section, let the dynamics be described by the (deterministic) system of difference equations (2.10) xk+1 = f k (xk , u k ) , x0 ∈ Rn for discrete times k ≥ 0 (as shorthand for k = 0, 1, . . . or k ∈ {0, 1, . . .}) where f k : Rn × Rm → Rn are potentially nonlinear functions, xk ∈ Rn are state vectors, x0 is the initial state, and u k ∈ U are control inputs belonging to the control-constraint set U ⊂ Rm .
2.2.1.1
Discrete-Time Finite-Horizon Optimal Control
For a given finite horizon 0 < T < ∞, let us define the real-valued cost function
VT x[0,T ] , u [0,T −1] F (x T ) +
T −1 k=0
gk (xk , u k )
(2.11)
2.2 Discrete-Time Optimal Control
17
where x[0,T ] {x0 , x1 , . . . , x T } and u [0,T −1] {u 0 , u 1 , . . . , u T −1 } are sequences of states and controls, respectively. Here, the functions gk : Rn × U → R for 0 ≤ k < T describe the stage (or running) cost associated with the states and controls at time k while the terminal cost function F : Rn → R describes the additional cost associated with the system being in state x T at the terminal time k = T . Given the discrete-time dynamics (2.10), the finite-horizon cost function (2.11), ¯ the discrete-time finite-horizon optimal control problem and the initial state x0 = x, is to find a control sequence u [0,T −1] with associated state sequence x[0,T ] that solves the (dynamic) optimization problem inf
u [0,T −1]
s.t.
2.2.1.2
VT x[0,T ] , u [0,T −1] xk+1 = f k (xk , u k ), 0 ≤ k ≤ T − 1 uk ∈ U , 0 ≤ k ≤ T − 1 x0 = x. ¯
(2.12)
Discrete-Time Infinite-Horizon Optimal Control
For an infinite horizon (which in a slight abuse of notation, we shall write as T = ∞), let us define the real-valued cost function ∞
V∞ x[0,∞] , u [0,∞] gk (xk , u k )
(2.13)
k=0
where x[0,∞] {x0 , x1 , . . .} and u [0,∞] {u 0 , u 1 , . . .} are infinite sequences of states and controls, respectively, and the functions gk : Rn × U → R describe the stage (or running) cost associated with the states and controls at times k ≥ 0. Given the discrete-time dynamics (2.10), the infinite-horizon cost function (2.13), ¯ the discrete-time infinite-horizon optimal control proband the initial state x0 = x, lem is to find controls u [0,∞] and associated states x[0,∞] that solve the (dynamic) optimization problem inf
u [0,∞]
s.t.
V∞ x[0,∞] , u [0,∞] xk+1 = f k (xk , u k ), k ≥ 0 uk ∈ U , k ≥ 0
(2.14)
x0 = x. ¯ Extra care is required to ensure that the infinite-horizon problem (2.14) is wellposed in the sense that there exists a finite optimal cost V∞ (x[0,∞] , u [0,∞] ) < ∞. Ensuring a finite optimal cost typically entails imposing conditions on the stagecost functions gk (e.g., that they incorporate discounting or averaging). A detailed discussion of stage-cost functions leading to finite-cost infinite-horizon problems can
18
2 Background and Forward Problems
be found in [3, Chap. 7]. Throughout this book, we shall only consider discrete-time infinite-horizon optimal control problems with finite optimal costs.
2.2.2 Discrete-Time Minimum Principles Necessary conditions that control sequences must satisfy in order to constitute solutions to the discrete-time optimal control problems of (2.11) and (2.14) are provided by discrete-time minimum principles. To present these discrete-time minimum principles, let us define the (discrete-time) Hamiltonian function Hk (xk , u k , λk+1 ) gk (xk , u k ) + λ k+1 f k (xk , u k )
(2.15)
where λk ∈ Rn for k ≥ 0 are costate (or adjoint) vectors. Let us assume that the control-constraint set U satisfies the following assumption. Assumption 2.1 The set U is closed and convex. Assumption 2.1 is standard in the development of discrete-time minimum principles for both finite- and infinite-horizon discrete-time optimal control problems. It was, however, omitted in many early (flawed) attempts to establish discrete-time minimum principles. A historical perspective on Assumption 2.1 is provided in the books [3, 8]. Let us also assume that the functions f k , gk , and F in (2.11) and (2.14) are continuously differentiable in their state and control arguments so that we may consider the (column) vectors ∇x Hk (xk , u k , λk+1 ) ∈ Rn and ∇u Hk (xk , u k , λk+1 ) ∈ Rm consisting of the partial derivatives of the Hamiltonian with respect to the components of xk and u k , respectively, and evaluated at xk , u k , and λk+1 . We use ∇x F(x T ) to denote the gradient of F evaluated with x T . Under these assumptions, we have the following discrete-time minimum principle established in [3] for the finite-horizon discrete-time optimal control problem (2.12). Theorem 2.1 (Discrete-Time Finite-Horizon Minimum Principle [3, Proposition 3.3.2]) Suppose that Assumption 2.1 holds and that x[0,T ] and u [0,T −1] constitute a solution to the finite-horizon discrete-time optimal control problem (2.12) for a given horizon 0 < T < ∞. Then, (i) the state sequence x[0,T ] satisfies the dynamics xk+1 = f k (xk , u k ) ¯ for 0 ≤ k ≤ T − 1 with x0 = x; (ii) there exist costates λk ∈ Rn for 0 ≤ k ≤ T satisfying the backward recursion λk = ∇x Hk (xk , u k , λk+1 )
(2.16)
2.2 Discrete-Time Optimal Control
19
for 0 ≤ k ≤ T − 1 with λT = ∇x F (x T ) ;
(2.17)
∇u Hk (xk , u k , λk+1 ) (u¯ − u k ) ≥ 0
(2.18)
and (iii) the controls u k satisfy
for all u¯ ∈ U and all 0 ≤ k ≤ T − 1. Proof See Proposition 3.3.2 of [3] or [12, Chap. 2].
Theorem 2.1 is one possible generalization of the necessary optimality condition for static optimization (2.3) to the dynamic setting of discrete-time optimal control. It is thus important to note that Theorem 2.1 provides necessary though not sufficient conditions for u [0,T −1] to constitute an optimal solution to (2.12). Indeed, the conditions of Theorem 2.1 can be satisfied by u [0,T −1] other than (global) minima. Nevertheless, Theorem 2.1 is often very useful in solving (forward) finite-horizon optimal control problems since it can be used to efficiently identify possible solutions when other methods of solving optimal control problems (e.g., dynamic programming) become intractable (cf. [12, Chaps. 2 and 3] and [3, Chap. 3]). Theorem 2.1 generalizes to other settings including infinite-horizon discrete-time optimal control (i.e., (2.14)). A key difficulty in extending Theorem 2.1 to the infinitehorizon discrete-time optimal control, however, lies in establishing a recursion for the costates analogous to (2.16). A variety of approaches have been used to overcome this difficultly, with each requiring one or more additional (potentially restrictive) assumptions on the dynamics f k . We shall follow the approach of [4] (see also [5]) and introduce the following invertibility assumption on the partial derivatives of f k with respect to xk . Specifically, let ∇x f k (xk , u k ) and ∇u f k (xk , u k ) denote the matrices of partial derivatives of f k with respect to (and evaluated at) xk and u k , respectively (if they exist). Following [4], we introduce the following assumption. Assumption 2.2 The matrix ∇x f k (xk , u k ) is invertible for all k ≥ 0. Assumption 2.2 leads to the following intuitive generalization of the finite-horizon discrete-time minimum principle of Theorem 2.1 to infinite-horizon discrete-time optimal control. The proof of this theorem (found in [4]) exploits Assumption 2.2 to convert the backward costate recursion (2.16) to a forward recursion. Weaker assumptions under which this theorem holds are discussed further in [5]. Theorem 2.2 (Discrete-Time Infinite-Horizon Minimum Principle) Suppose that the state x[0,∞] and control u [0,∞] sequences constitute a solution to the infinitehorizon discrete-time optimal control problem (2.14). If Assumptions 2.1 and 2.2 hold then,
20
2 Background and Forward Problems
(i) the state sequence x[0,∞] satisfies the dynamics xk+1 = f k (xk , u k ) ¯ for k ≥ 0 with x0 = x; (ii) there exist costates λk ∈ Rn for k ≥ 0 satisfying the backward recursion λk = ∇x Hk (xk , u k , λk+1 )
(2.19)
∇u Hk (xk , u k , λk+1 ) (u¯ − u k ) ≥ 0
(2.20)
for k ≥ 0; and (iii) the controls u k satisfy
for all u¯ ∈ U and all k ≥ 0. Proof See Theorem 2 of [4].
The discrete-time finite-horizon minimum principle of Theorem 2.1 and the discrete-time infinite-horizon minimum principle of Theorem 2.2 establish that the solutions to the finite- and infinite-horizon optimal control problems satisfy essentially the same set of necessary conditions. Both minimum principles state that (i) the solutions must satisfy the system dynamics; (ii) there exist costates λk satisfying a backward recursion; and (iii) the optimal controls u k minimize the Hamiltonian ¯ λk+1 ) over all controls u¯ ∈ U . The main difference between function Hk (xk , u, the finite- and infinite-horizon discrete-time minimum principles (aside from their assumptions) is that in the finite-horizon setting, the terminal costate λT satisfies the boundary condition (2.17) while in the infinite-horizon setting, there are no initial or boundary conditions on the costate recursion (2.19). We next introduce (forward) optimal control problems and minimum principles for dynamics described in continuous time by differential equations (instead of the discrete-time difference equations of (2.10)).
2.3 Continuous-Time Optimal Control In this section, we present continuous-time counterparts to the discrete-time optimal control problems and minimum principles considered in the previous section. We shall therefore consider time in this section to be a continuous variable t taking values in the interval [0, T ] for some (potentially infinite) horizon T > 0 (instead of a discrete variable k taking values in the nonnegative integers {0, 1, . . . , T }).1 In a further abuse of notation, we shall take [0, T ] with an infinite horizon (T = ∞) to mean the open interval [0, ∞).
1
2.3 Continuous-Time Optimal Control
21
2.3.1 General Formulation Let the dynamics (or dynamical system) we consider in this section be described by the system of first-order ordinary differential equations2 x(t) ˙ = f (t, x(t), u(t)) , x(0) = x0
(2.21)
for continuous-time t ∈ [0, T ] where x(t) ∈ Rn is the state vector of the system, x0 ∈ Rn is the initial state, u(t) ∈ U are the control inputs belonging to the (closed) control-constraint set U ⊂ Rm , and f : [0, T ] × Rn × U → Rn is a (potentially nonlinear) function. An important technical concern in continuous time is whether the dynamics (2.21) admit a unique solution, in the form of a state trajectory x : [0, T ] → Rn , given a control trajectory u : [0, T ] → U . Following [1, Theorem 5.1], we shall make the standard assumption that f is continuous in t and uniformly (globally) Lipschitz continuous in x(t) and u(t) such that (2.21) admits a unique solution for every continuous control function u : [0, T ] → U . For a given (potentially infinite) horizon T > 0, let us define the (real-valued) cost functional T g (t, x(t), u(t)) dt (2.22) VT (x, u) 0
where g : [0, T ] × Rn × U → R is a potentially time-varying function that describes the stage (or integral or instantaneous) cost associated with the states and controls at time t. Given the cost functional (2.22), the dynamics (2.21), and the initial state x0 , the continuous-time optimal control problem is to find a control function u : [0, T ] → U (and associated state trajectory x : [0, T ] → Rn ) that solves the (dynamic) optimization problem inf
VT (x, u)
s.t.
x(t) ˙ = f (t, x(t), u(t)), t ∈ [0, T ] u(t) ∈ U , t ∈ [0, T ]
u
(2.23)
x(0) = x0 . As in discrete-time optimal control, extra care is often required when the horizon T is infinite in order to ensure that there exists a finite optimal cost V∞ (x, u) < ∞. Throughout this book, we shall assume the existence of a finite optimal cost. In continuous-time optimal control, we note that there is little loss of generality in considering cost functionals of the form in (2.22) without a terminal cost dependent on the terminal state x(T ) since cost functionals with terminal costs can often be rewritten in the form of (2.22). To see this generality, consider a finite horizon T and a cost functional of the form 2
We shall use dot notation to denote derivatives with respect to time (e.g., x(t) ˙ =
d dt
x(t)).
22
2 Background and Forward Problems
V˜T (x, u) = h (x(T )) +
T
g˜ (t, x(t), u(t)) dt
0
where h : Rn → R and g˜ : [0, T ] × Rn × Rm → R are continuously differentiable terminal-cost and integral-cost functions, respectively. The fundamental theorem of calculus (see Sect. 2.7.1 of [13]) implies that we may write V˜T as V˜T (x, u) = h (x(0)) +
T
g˜ (t, x(t), u(t)) +
0
d h(x(t)) dt dt
= h (x(0)) + VT (x, u) where VT consists of functions g that appropriately combine g˜ and dtd h(x(t)). Since the initial state x(0) is given in the optimal control problem (2.23), the term h (x(0)) is fixed, and so the states x and controls u solving (2.23) with the cost functional VT also constitute an optimal solution to an optimal control problem with the same dynamics but with the cost functional V˜T .
2.3.2 Continuous-Time Minimum Principles Necessary optimality conditions for continuous-time optimal control problems are provided by minimum principles analogous to those we encountered in discrete-time optimal control. To present minimum principles for continuous-time optimal control, let us define the continuous-time Hamiltonian function H (t, x(t), u(t), λ(t), μ) μg (t, x(t), u(t)) + λ (t) f (t, x(t), u(t))
(2.24)
where μ ∈ R is a scalar constant and λ : [0, T ] → Rn is the costate (or adjoint) function. In contrast to the discrete-time minimum principle, we do not require the control-constraint set U to be convex. We do, however, require the functions f and g to be continuously differentiable in their state and control arguments so that we may consider the (column) vectors ∇x H (t, x(t), u(t), λ(t), μ) ∈ Rn and ∇u H (t, x(t), u(t), λ(t), μ) ∈ Rm consisting of the partial derivatives of the Hamiltonian with respect to the components of x(t) and u(t), respectively, and evaluated at x(t), u(t), λ(t), and μ. The following theorem provides a continuous-time minimum principle for the continuous-time optimal control problem (2.23) with a finite horizon 0 < T < ∞. Theorem 2.3 (Continuous-Time Finite-Horizon Minimum Principle [9]) Suppose that the state and control trajectories x : [0, T ] → Rn and u : [0, T ] → U constitute a solution to the continuous-time optimal control problem (2.23) for some finite horizon 0 < T < ∞. Then μ = 1;
2.3 Continuous-Time Optimal Control
23
(i) the state trajectory x : [0, T ] → Rn solves the differential equation x(t) ˙ = f (t, x(t), u(t)) for t ∈ [0, T ] with x(0) = x0 ; (ii) there exists a costate trajectory λ : [0, T ] → Rn satisfying ˙ λ(t) = −∇x H (t, x(t), u(t), λ(t), μ) for t ∈ [0, T ] with λ(T ) = 0; and (iii) the controls u : [0, T ] → U satisfy ¯ λ(t), μ) u (t) ∈ arg min H (t, x(t), u(t), u(t)∈U ¯
for all t ∈ [0, T ]. Proof See [9] and references therein.
When the continuous-time optimal control problem (2.23) has an infinite horizon T = ∞, we have the following minimum principle analogous to Theorem 2.3. Theorem 2.4 (Continuous-Time Infinite-Horizon Minimum Principle [9]) Suppose that the state and control trajectories x : [0, ∞) → Rn and u : [0, ∞) → U constitute a solution to the continuous-time optimal control problem (2.23) for an infinite horizon T = ∞. Then, (i) the state trajectory x : [0, ∞) → Rn solves the differential equation x(t) ˙ = f (t, x(t), u(t)) for t ≥ 0 with x(0) = x0 ; (ii) there exists a costate trajectory λ : [0, ∞) → Rn and real number μ ∈ R satisfying ˙ λ(t) = −∇x H (t, x(t), u(t), λ(t), μ) for t ≥ 0 with μ and λ(0) not simultaneously 0; and (iii) the controls u : [0, ∞) → U satisfy u (t) ∈ arg min H (t, x(t), u(t), ¯ λ(t), μ) u(t)∈U ¯
for all t ≥ 0.
24
Proof See [9].
2 Background and Forward Problems
The key differences between the continuous-time finite- and infinite-horizon minimum principles of Theorems 2.3 and 2.4 are that the costate trajectory λ in the infinitehorizon case lacks a terminal condition, and the constant μ can be any real number (not just μ = 1 as in the finite-horizon case). Both the finite- and infinite-horizon minimum principles, however, state that (i) the solutions must satisfy the system dynamics; (ii) there exists a costate trajectory λ solving a differential equation; and (iii) the optimal controls u(t) minimize the Hamiltonian function H (t, x(t), u(t), μ, λ(t)) at all times t in the (possibly infinite) interval [0, T ]. The minimum principles we discussed in this and the last section for discrete-time and continuous-time optimal control provide only necessary conditions for optimality (analogous to the static optimization conditions (2.3)). They may therefore be satisfied by control sequences or trajectories other than global minima. Nevertheless, they are useful for finding candidate solutions to optimal control problems with detailed procedures provided in [12, Chaps. 2 and 3] and [3, Chap. 3]. In this book, we shall explore their recent novel use in solving inverse problems in optimal control and noncooperative dynamic theory. Having briefly discussed (forward) optimal control theory, we now turn our attention to (forward) dynamic game theory.
2.4 Noncooperative Dynamic Games In this and the next section, we shall provide an overview of noncooperative dynamic game theory in which multiple players compete to optimize their own individual cost functions by selecting controls of a shared dynamical system evolving in either discrete or continuous time.3 The static optimization and optimal control problems that we considered in the first three sections of this chapter can be thought of as degenerate problems from noncooperative dynamic (and differential) game theory involving a single player.
2.4.1 General Formulation A noncooperative dynamic game involves N players, interacting through a discretetime dynamical system over a (potentially infinite) horizon T > 0 so as to optimize their individual cost functions. We shall denote the set of players as P {1, 2, . . . , N }.
3
As noted in Chap. 1, we shall refer to dynamic games in continuous time as differential games, and dynamic games in discrete time as dynamic games.
2.4 Noncooperative Dynamic Games
2.4.1.1
25
Game Dynamics
Let the dynamical system be described by the system of difference equations
xk+1 = f k xk , u 1k , . . . , u kN , x0 ∈ Rn
(2.25)
for k ≥ 0 where xk ∈ Rn are the game’s state vectors, u ik ∈ U i are player control i inputs belonging to the sets of admissible player controls U i ⊂ Rm for each player i ∈ P, and f k : Rn × U 1 × · · · × U N → Rn are (potentially nonlinear) functions. We shall use u i[0,k] {u i0 , u i1 , . . . , u ik } to denote a sequence of player i’s controls of length k ≥ 0 and x[0,k] {x0 , x1 , . . . , xk } to denote a sequence of game state vectors of length k ≥ 0.
2.4.1.2
Player Cost Functions
Each player i ∈ P seeks to use their controls u ik to optimize an individual cost function without the cooperation of the other players j ∈ P, j = i. For each player i ∈ P, let us define this individual cost function as N i VTi (x[0,T ] , u 1[0,T −1] , u 2[0,T −1] , . . . , u [0,T −1] ) F (x T ) +
T −1
gki (xk , u 1k , u 2k , . . . , u kN )
k=0
(2.26) when the horizon T is finite, and as i N V∞ (x[0,∞] , u 1[0,∞] , u 2[0,∞] , . . . , u [0,∞] )
∞
gki (xk , u 1k , u 2k , . . . , u kN )
(2.27)
k=0
when the horizon T is infinite. The real-valued functions gki : Rn × U 1 × U 2 × · · · × U N → R for k ≥ 0 describe the stage (or running) costs for player i ∈ P associated with the states and controls, while the real-valued functions F i : Rn → R describe the additional terminal cost for player i ∈ P if the game ends in the state x T (when the horizon T is finite). An important point to note is that the cost function VTi for each player i ∈ P is, in general, dependent on the controls selected by the j other players j ∈ P, j = i—both directly through the instantaneous controls u k and indirectly through the game’s state xk . Remark 2.1 In the case of a two-player game (i.e., when N = 2), if the sum of the cost functions is zero for all state and control sequences (i.e., if VT1 = −VT2 ), then the game is termed zero-sum; otherwise, it is termed nonzero-sum.
26
2 Background and Forward Problems
2.4.1.3
Information Structures and Player Strategies
The information structure of a game describes the information about the game’s state xk available to each player at each time k ≥ 0. In this book, we consider two standard information structures: 1. the open-loop information structure in which each player has access only to the initial state of the game x0 for all k ≥ 0; and 2. the feedback information structure in which each player has access to the most recent state of the game xk for all k ≥ 0. The information structure of a game dictates the strategies that players can employ. Under the open-loop information structure, a permissible strategy for player i ∈ P is a sequence of functions γ i {γki : k ≥ 0} with each function γki : Rn → U i mapping the initial state x0 to the control u ik in the sense that u ik = γki (x0 ) ∈ U i . The set of permissible open-loop strategies Γ Oi L for player i is thus the set of all sequences of functions γ i {γki : k ≥ 0} such that u ik = γki (x0 ) ∈ U i . Since all open-loop strategies are constant with respect to the state xk for k > 0, the open-loop information structure describes the situation where all players decide at k = 0 the control sequences to be applied for all k > 0 based solely on the initial state x0 . The set of permissible open-loop strategies Γ Oi L for player i is thus equivalently the set of all control sequences {u ik ∈ U i : k ≥ 0}. Under the feedback information structure, a permissible strategy for player i ∈ P is a sequence of functions γ i {γki : k ≥ 0} with each function γki : Rn → U i mapping the current state xk to the control u ik in the sense that u ik = γki (xk ) ∈ U i . Player i’s strategy set in the case of the feedback information structure, Γ Fi B , is hence the set of all sequences of functions γ i {γki : k ≥ 0} such that u ik = γki (xk ) ∈ U i . We shall often use Γ i to denote the set of permissible strategies for player i ∈ P with context making it clear whether we are considering the open-loop or feedback sets, Γ Oi L or Γ Fi B . 2.4.1.4
Summary of Formulation
The following definition summarizes the formulation of a (discrete-time) noncooperative dynamic game. Definition 2.4 (Noncooperative Dynamic Game) A noncooperative dynamic game is defined by • • • • • • •
a set of players P = {1, . . . , N }; a potentially infinite horizon T > 0 (known to the players); a discrete-time dynamical system of the form in (2.25); a set of admissible controls Ui for each player i ∈ P; a cost function VTi of the form of either (2.26) or (2.27) for each player i ∈ P; an information structure describing the state information available; and a set of permissible strategies Γi for each player i ∈ P.
2.4 Noncooperative Dynamic Games
27
Given this construction of noncooperative dynamic games, we next examine their solution.
2.4.2 Nash Equilibrium Solutions Unlike static optimization and optimal control, what constitutes an “optimal solution” to a noncooperative dynamic game is somewhat ambiguous since a variety of solution concepts arise by varying factors including the order in which players make decisions and what information the players have or believe about the potential strategies of other players and the state of the game (cf. [1, Chap. 1]). In this book, we shall focus on Nash equilibrium solutions that arise when 1. all players act simultaneously; and 2. each player seeks to minimize their individual cost function given: a. knowledge of the cost functions of the other players in the game; and b. the (correct) belief that all other players are likewise seeking to minimize their individual cost functions (under the same conditions). The definition of Nash equilibrium solutions then follows.
2.4.2.1
Definition
A Nash equilibrium of a noncooperative dynamic game is a collection of player strategies under which there is no incentive for any player to unilaterally deviate from their (Nash equilibrium) strategy. In other words, for each player under a Nash equilibrium, there is no other permissible strategy that would further reduce the value of their cost function without the other players also deviating from their current strategies. A Nash equilibrium is therefore defined mathematically as follows. Definition 2.5 (Nash Equilibrium Solution [1, Sect. 6.2]) Consider the sets of permissible player strategies {Γ i : i ∈ P}. A Nash equilibrium solution is an N -tuple of player strategies {γ i ∈ Γ i : i ∈ P} that satisfies the N inequalities VTi (γ 1 , . . . , γ i , . . . , γ N ) ≤ VTi (γ 1 , . . . , γ¯ i , . . . , γ N )
(2.28)
for all γ¯ i ∈ Γ i and all i ∈ P where, in a slight abuse of notation, we use VTi (γ 1 , . . . , γ i , . . . , γ N ) to denote the cost function VTi evaluated with states xk given by (2.25) and controls given by the strategies γ i ∈ Γ i in the sense that u ik = γki (·) for i ∈ P.
28
2 Background and Forward Problems
Definition 2.5 defines either an open-loop Nash equilibrium or a feedback Nash equilibrium depending on whether the sets of permissible player strategies {Γ i : i ∈ P} are constructed under the open-loop or feedback information structure. That is, • if we consider the sets of player strategies permissible under the open-loop information structure, i.e., {Γ i = Γ Oi L : i ∈ P}, then Definition 2.5 provides a mathematical definition of an open-loop Nash equilibrium; • if we consider the sets of player strategies permissible under the feedback information structure, i.e., {Γ i = Γ Fi B : i ∈ P}, then Definition 2.5 provides a mathematical definition of a feedback Nash equilibrium. In a dynamic game, there may exist a single Nash equilibrium, no Nash equilibria, or multiple Nash equilibria. In general, however, a Nash equilibrium cannot be uniquely associated with a set of player cost functions {VTi : i ∈ P} since multiple different player cost functions {VTi : i ∈ P} can result in the same Nash equilibrium solution or solutions. This fact will be of particular importance later in this book when we consider inverse noncooperative dynamic game problems (and mirrors Kalman’s observation in [11] regarding the ill-posedness of inverse optimal control).
2.4.2.2
Connection Between Nash Equilibria and Optimal Control
The inequalities defining Nash equilibria in (2.28) imply a close relationship between noncooperative dynamic games and the discrete-time optimal control problems we presented in Sect. 2.2. For example, for a finite horizon T < ∞ the inequalities (2.28) imply that the N -tuple of player control sequences {u i[0,T −1] : i ∈ P} constitutes an open-loop Nash equilibrium if and only if the controls solve the N coupled discretetime finite-horizon optimal control problems inf
u¯ i[0,T −1]
s.t.
N VTi (x[0,T ] , u 1[0,T −1] , . . . , u¯ i[0,T −1] , . . . , u [0,T −1] )
xk+1 = f k (xk , u 1k , . . . , u¯ ik , . . . , u kN ), 0 ≤ k ≤ T − 1 x0 ∈ Rn
(2.29)
u¯ ik ∈ U i , 0 ≤ k ≤ T − 1 for i ∈ P where u¯ i[0,T −1] {u¯ i0 , u¯ i1 , . . . , u¯ iT −1 }. Similarly, the inequalities (2.28) imply that the N -tuple of player control sequences {u i[0,T −1] : i ∈ P} with associated feedback laws {u ik = γki (xk ) : k ≥ 0, i ∈ P} constitutes a feedback Nash equilibrium if and only if the controls solve the N coupled discrete-time finite-horizon optimal control problems
2.4 Noncooperative Dynamic Games
inf
u¯ i[0,T −1]
s.t.
29
VTi (x[0,T ] , γ 1 , . . . , u¯ i[0,T −1] , . . . , γ N ) xk+1 = f k (xk , γk1 (xk ), . . . , u¯ ik , . . . , γkN (xk )), 0 ≤ k ≤ T − 1
(2.30)
x0 ∈ Rn u¯ ik ∈ U i , 0 ≤ k ≤ T − 1 for i ∈ P where, again in a slight abuse of notation, we use γ i in the arguments of VTi as shorthand for VTi (x[0,T ] , γ 1 , . . . , u¯ i[0,T −1] , . . . , γ N ) = F i (x T ) +
T −1
gki (xk , γk1 (xk ), . . . , u¯ ik , . . . , γkN (xk )).
k=0
j
From (2.29), we see that the open-loop Nash equilibrium controls u [0,T −1] of players j ∈ P, j = i are considered as given (i.e., dependent only on time and invariant with respect to the state xk and controls u¯ ik ) during the optimization over player i’s controls u¯ i[0,T −1] . In the open-loop case, player i thus essentially solves the standard optimal control problem (2.29) with the controls of the other players in j the system dynamics fixed as u [0,T −1] for j ∈ P, j = i. From (2.30), we see that j the feedback Nash equilibrium controls u [0,T −1] of players j ∈ P, j = i change during the optimization over player i’s controls u¯ i[0,T −1] via the game state xk and their Nash equilibrium feedback control laws {γ j : j ∈ P, i = j}. In the feedback case, player i thus essentially solves the standard optimal control problem (2.30) j j but with the controls of the other players introducing the functions u k = γk (xk ) into the system dynamics. Identical (infinite-horizon) discrete-time optimal control descriptions of Nash equilibria hold in the case of an infinite horizon T = ∞. In light of the connection between discrete-time optimal control and Nash equilibria in noncooperative dynamic games, we shall next show that necessary conditions for the existence of Nash equilibria are provided directly by discrete-time minimum principles.
2.4.3 Nash Equilibria via Discrete-Time Minimum Principles To present conditions for Nash equilibria derived from discrete-time minimumprinciples, let us define the player Hamiltonian functions Hki (xk , u 1k , . . . , u kN , λik+1 ) gki (xk , u 1k , . . . , u kN ) + λi k+1 f k (xk , u 1k , . . . , u kN ) (2.31) and player costate (or adjoint) vectors4 λik ∈ Rn for k ≥ 0 and i ∈ P. Let us also make the standard assumption that the game dynamics f k and the player cost func4
Consistent with our previous notation, we use λi k to denote the transpose of λik .
30
2 Background and Forward Problems
tions gki for i ∈ P are continuously differentiable in their state and control arguments (cf. [1, Sect. 6.2]) such that ∇x Hki (xk , u 1k , . . . , u kN , λik+1 ) ∈ Rn and ∇u i Hki (xk , u 1k , . . . , u ik , . . . , u kN , λik+1 ) ∈ Rm
i
denote the column vectors of partial derivatives of the Hamiltonian Hki with respect to xk and u ik , respectively, and evaluated at xk , {u ik : i ∈ P} and λik+1 . Similarly, let ∇x F i (x T ) be the gradient of F i evaluated at x T . As in discrete-time optimal control, we require the following condition on the player control sets U i . Assumption 2.3 The player control sets U i are closed and convex for all i ∈ P. Under Assumption 2.3, the discrete-time minimum principle of Theorem 2.1 implies the following necessary conditions for open-loop Nash equilibria in the case of a finite horizon T < ∞. Theorem 2.5 (Finite-Horizon Open-Loop Nash Equilibria [1, Theorem 6.1]) Suppose that Assumption 2.3 holds and that the N -tuple of player control sequences {u i[0,T −1] : i ∈ P} with associated state sequence x[0,T ] constitutes an open-loop Nash equilibrium solution to a noncooperative dynamic game with a finite horizon 0 < T < ∞. Then, (i) the state sequence x[0,T ] satisfies the game dynamics xk+1 = f k (xk , u 1k , . . . , u kN ) for 0 ≤ k ≤ T − 1 given x0 ; (ii) there exist costates λik ∈ Rn for all times 0 ≤ k ≤ T and all players i ∈ P satisfying the backward recursions λik = ∇x Hki (xk , u 1k , . . . , u kN , λik+1 )
(2.32)
for 0 ≤ k ≤ T − 1 with λiT = ∇x F i (x T );
(2.33)
∇u i Hki (xk , u 1k , . . . , u kN , λik+1 ) (u¯ − u ik ) ≥ 0
(2.34)
and (iii) the controls u ik satisfy
for all u¯ ∈ U i , all times 0 ≤ k ≤ T − 1, and all players i ∈ P. Proof See [1, Theorem 6.1]. Alternatively, note that the definition of open-loop Nash equilibria provided in (2.28) is equivalent to the system of N coupled finite-horizon discrete-time optimal control problems in (2.29). The theorem result follows by
2.4 Noncooperative Dynamic Games
31
applying the finite-horizon discrete-time minimum principle of Theorem 2.1 (with Assumption 2.3 implying Assumption 2.1) to each of these optimal control problems noting that under an open-loop Nash equilibrium, optimization over player i’s j controls is done with the controls u [0,T −1] of the other players j ∈ P, j = i being fixed. Conditions for feedback Nash equilibria also follow from the same discrete-time minimum principle. Theorem 2.6 (Finite-Horizon Feedback Nash Equilibria [1, Theorem 6.5]) Suppose that Assumption 2.3 holds and that the N -tuple of player control sequences {u i[0,T −1] : i ∈ P} with associated differentiable feedback laws {γki : u ik = γki (xk ), k ≥ 0, i ∈ P} and state sequence x[0,T ] constitutes a feedback Nash equilibrium to a noncooperative dynamic game with a finite horizon 0 < T < ∞. Then, (i) the state sequence x[0,T ] satisfies the game dynamics xk+1 = f k (xk , u 1k , . . . , u kN ) for 0 ≤ k ≤ T − 1 given x0 ; (ii) there exist costates λik ∈ Rn for 0 ≤ k ≤ T and i ∈ P satisfying the backward recursions λik = ∇x Hki (xk , γk1 (xk ), . . . , u ik , . . . , γkN (xk ), λik+1 )
(2.35)
for 0 ≤ k ≤ T − 1 with λiT = ∇x F i (x T );
(2.36)
∇u i Hki (xk , u 1k , . . . , u kN , λik+1 ) (u¯ − u ik ) ≥ 0
(2.37)
and (iii) the controls u ik satisfy
for all u¯ ∈ U i , all 0 ≤ k ≤ T − 1, and all i ∈ P. Proof See [1, Theorem 6.5]. Alternatively, note that the definition of feedback Nash equilibria provided in (2.28) is equivalent to the system of N coupled finitehorizon discrete-time optimal control problems in (2.30). Applying the finite-horizon discrete-time minimum principle of Theorem 2.1 to each of these optimal control problems yields the theorem result. Theorems 2.5 and 2.6 are essentially equivalent except for the costate equations (2.32) and (2.35). In particular, the costate equation (2.35) for feedback Nash equilibj ria involves the partial derivatives of the feedback laws {γk : k ≥ 0, j ∈ P, j = i} with respect to xk while the costate equation (2.32) for open-loop Nash equilibria does not.
32
2 Background and Forward Problems
Necessary conditions for Nash equilibria similar to Theorems 2.5 and 2.6 can be developed for the case of an infinite horizon T = ∞ using the discrete-time minimum principle of Theorem 2.2. Since these conditions are akin to Theorems 2.5 and 2.6 but without terminal constraints on the costate equations (mirroring the relationship between the finite- and infinite-horizon discrete-time minimum principles of Theorems 2.1 and 2.2), we shall omit their presentation. We next consider (continuous-time) differential games.
2.5 Noncooperative Differential Games In this section, we present noncooperative differential games as an extension of continuous-time optimal control to settings with multiple decision-makers (or as an extension of dynamic games with game dynamics governed by continuous-time differential equations). As in continuous-time optimal control, in this section we consider time to be a continuous variable t taking values in the interval [0, T ].
2.5.1 General Formulation A noncooperative differential game involves N players from the set P = {1, 2, . . . , N } interacting through a continuous-time dynamical system governed by differential equations over a potentially infinite horizon T > 0 so as to optimize their individual cost functionals without cooperation.
2.5.1.1
Game Dynamics
Let the continuous-time dynamical system be described by the system of first-order ordinary differential equations x(t) ˙ = f (t, x(t), u 1 (t), . . . , u N (t)), x(0) = x0 ∈ Rn
(2.38)
for continuous-time t ∈ [0, T ] where x(t) ∈ Rn is the game’s state vector, x0 ∈ Rn is the initial game state, u i (t) ∈ U i are player control inputs belonging to the (closed) i sets of admissible player controls U i ⊂ Rm for i ∈ P, and f : [0, T ] × Rn × U 1 × · · · × U N is a (potentially nonlinear) function. As in continuous-time optimal control, we shall ensure that the dynamics (2.38) admit a unique solution for every N -tuple of continuous player control functions {(u i : [0, T ] → U i ) : i ∈ P} by assuming that f is continuous in t and uniformly (globally) Lipschitz in x(t) and u i (t) for all i ∈ P (cf. [1, Theorem 5.1]).
2.5 Noncooperative Differential Games
2.5.1.2
33
Player Cost Functionals
Each player i ∈ P selects their controls u i in (2.25) with the aim of optimizing an individual cost functional without the cooperation of the other players j ∈ P, j = i. For each player i ∈ P, let us define this individual cost functional as VTi (x, u 1 , . . . , u N )
T
g i (t, x(t), u 1 (t), . . . , u N (t))dt
(2.39)
0
with g i : [0, T ] × Rn × U 1 × · · · × U N being a (potentially time-varying) function that describes the stage (or instantaneous, integral, or running) cost for player i associated with the state and controls at time t. As in continuous-time optimal control, there is little loss of generality in considering player cost functionals of the form in (2.39) without a terminal cost since problems with terminal costs can often be equivalently reformulated without one (cf. Sect. 2.3).
2.5.1.3
Information Structures and Player Strategies
The information structures and associated player strategy sets in noncooperative differential games are essentially equivalent to those we discussed in Sect. 2.4 for noncooperative dynamic games. We again consider 1. the open-loop information structure in which each player has access only to the initial state of the game x0 for all t ∈ [0, T ]; and 2. the feedback information structure in which each player has access to the most recent (i.e., instantaneous) state of the game x(t) for all t ∈ [0, T ]. Under the open-loop information structure, a permissible strategy for player i ∈ P is a function γ i : [0, T ] × Rn → U i that maps time and the initial state x0 to the control u i (t) in the sense that u i (t) = γ i (t, x0 ) for all t ∈ [0, T ]. Player i’s set of permissible strategies under the open-loop information structure, Γ Oi L , is thus the set of all functions γ i : [0, T ] × Rn → U i such that u i (t) = γ i (t, x0 ) for all t ∈ [0, T ]. However, the open-loop information structure describes the situation where all players decide at t = 0 the control sequences to be applied for all t > 0 based solely on the initial state x0 , and so Γ Oi L is equivalently the set of all control trajectories u i : [0, T ] → U i . Under the feedback information structure, a permissible strategy for player i ∈ P is a function γ i : [0, T ] × Rn → U i that maps time and the current state x(t) to the control u i (t) in the sense that u i (t) = γ i (t, x(t)). Player i’s strategy set in the case of a feedback information structure, Γ Fi B , is hence the set of all functions γ i : [0, T ] × Rn → U i such that u i (t) = γ i (t, x(t)) ∈ U i . We shall again use Γ i to denote the set of permissible strategies for player i ∈ P with context making it clear whether we are considering the open-loop or feedback sets Γ Oi L or Γ Fi B .
34
2 Background and Forward Problems
2.5.1.4
Summary of Formulation
The formulation of a (continuous-time) noncooperative differential game is summarized in the following definition. Definition 2.6 (Noncooperative Differential Game) A noncooperative differential game is defined by • • • • • • •
a set of players P = {1, 2, . . . , N }; a potentially infinite horizon T > 0 (known to the players); a continuous-time dynamical system of the form in (2.38); a set of admissible controls Ui for each player i ∈ P; a cost functional VTi of the form of (2.39) for each player i ∈ P; an information structure describing the state information available; and a set of permissible strategies Γi for each player i ∈ P.
In contrast to the formulation of a (discrete-time) noncooperative dynamic game in Definition 2.4, a noncooperative differential game involves a continuous-time dynamical system (leading to state and control trajectories rather than state and control sequences) and player cost functionals rather than player cost functions. The solution of noncooperative differential games, however, involves similar concepts to those found in the solution of noncooperative dynamic games, including that of Nash equilibria.
2.5.2 Nash Equilibrium Solutions Although a variety of solution concepts are possible in differential games, as in dynamic games, we focus on Nash equilibria.
2.5.2.1
Definition
The definition of a Nash equilibrium is essentially equivalent in both noncooperative differential and noncooperative dynamic games (cf. Sect. 2.4.2) and so we have the following definition of Nash equilibrium solutions to noncooperative differential games mirroring Definition 2.5. Definition 2.7 (Nash Equilibrium Solution) Consider the sets of permissible player strategies {Γ i : i ∈ P}. A Nash equilibrium solution is an N -tuple of player strategies {γ i ∈ Γ i : i ∈ P} that satisfies the N inequalities VTi (γ 1 , . . . , γ i , . . . , γ N ) ≤ VTi (γ 1 , . . . , γ¯ i , . . . , γ N ) for all γ¯ i ∈ Γ i and all i ∈ P where, in a slight abuse of notation, we use
(2.40)
2.5 Noncooperative Differential Games
35
VTi (γ 1 , . . . , γ i , . . . , γ N ) to denote the cost functional VTi evaluated with states x : [0, T ] → Rn given by (2.38) and controls given by the strategies γ i ∈ Γ i in the sense that u i (t) = γ i (t, ·) for i ∈ P. Definition 2.7 defines an open-loop Nash equilibrium if we consider the sets of player strategies permissible under the open-loop information structure, i.e., {Γ i = Γ Oi L : i ∈ P}, and a feedback Nash equilibrium if we consider the sets of player strategies permissible under the feedback information structure, i.e., {Γ i = Γ Fi B : i ∈ P}. Again, we note that there may exist a single Nash equilibrium, no Nash equilibria, or multiple Nash equilibria in a noncooperative differential game, but in general, a Nash equilibrium cannot be uniquely associated with a set of player cost functionals {VTi : i ∈ P} since multiple different player cost functionals {VTi : i ∈ P} can often result in the same Nash equilibrium solution or solutions.
2.5.2.2
Connection Between Nash Equilibria and Optimal Control
There is a close relationship between noncooperative differential games and the continuous-time optimal control problems we considered in Sect. 2.3. In particular, the inequalities defining Nash equilibria (2.40) imply that the N -tuple of player control functions {u i : i ∈ P} constitutes an open-loop Nash equilibrium if and only if they solve the N coupled continuous-time optimal control problems inf
VTi x, u 1 , . . . , u¯ i , . . . , u N
s.t.
x(t) ˙ = f (t, x(t), u 1 (t), . . . , u¯ i (t), . . . , u N (t)) t ∈ [0, T ]
u¯ i
u¯ i (t) ∈ U i , t ∈ [0, T ] x(0) = x0
(2.41)
for i ∈ P where u¯ i : [0, T ] → U i . Similarly, the inequalities (2.40) imply that the N -tuple of player control functions {u i : i ∈ P} with associated feedback strategies {u i (t) = γ i (t, x(t)) : t ∈ [0, T ], i ∈ P} constitutes a feedback Nash equilibrium if and only if they solve the N coupled continuous-time optimal control problems inf
VTi x, γ 1 , . . . , u¯ i , . . . , γ N
s.t.
x(t) ˙ = f (t, x(t), γ 1 (t, x(t)), . . . , u¯ i (t), . . . , γ N (t, x(t))) t ∈ [0, T ]
u¯ i
u¯ i (t) ∈ U i , t ∈ [0, T ] x(0) = x0 (2.42)
36
2 Background and Forward Problems
for i ∈ P where, in a slight abuse of notation, we write γ i as arguments for VTi as shorthand for
VTi x, γ 1 , . . . , u¯ i , . . . , γ N T g i (t, x(t), γ 1 (t, x(t)), . . . , u¯ i (t), . . . , γ N (t, x(t)))dt. = 0
We see from (2.41) that the open-loop Nash equilibrium controls u j of players j ∈ P, j = i are fixed (i.e., dependent only on time and invariant with respect to the state x and controls u¯ i ) during the optimization over player i’s controls u¯ i . Player i’s open-loop Nash equilibrium controls u i thus solve the standard optimal control problem (2.41) with the controls of the players in the system dynamics given by their open-loop Nash equilibrium controls u j for j ∈ P, j = i. On the other hand, from (2.42) we see that the feedback Nash equilibrium controls u j of players j ∈ P, j = i vary according to the game state x(t) via the Nash equilibrium feedback control laws {γ j : j ∈ P, i = j} during the optimization over player i’s controls u¯ i . Hence, player i’s feedback Nash equilibrium controls u i solve the standard optimal control problem (2.42) but with the controls of the other players given by u j (t) = γ j (t, x(t)) for j ∈ P, j = i acting as additional state-dependent terms in the dynamics. The connection between continuous-time optimal control and Nash equilibria in noncooperative differential games allows us to next establish necessary conditions for the existence of Nash equilibria using continuous-time minimum principles.
2.5.3 Nash Equilibria via Continuous-Time Minimum Principles To present conditions for Nash equilibria derived from continuous-time minimum principles, let us define the player Hamiltonian functions H i (t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t), λi (t), μi ) μi g i (t, x(t), u 1 (t), . . . , u N (t)) + λi (t) f (t, x(t), u 1 (t), . . . , u N (t))
(2.43)
for t ∈ [0, T ] and i ∈ P where μi ∈ R is a scalar constant and λi : [0, T ] → Rn are player costate (or adjoint) functions.5 In order to consider derivatives of the player Hamiltonian functions, let us also make the standard assumption that the functions f and g i for i ∈ P are continuously differentiable in their state and control arguments (cf. [1, Sect. 6.2]). For i ∈ P, we shall then let ∇x H i (t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t), λi (t), μi ) ∈ Rn 5
Consistent with our previous notation, we use λi (t) to denote the transpose of λi (t).
2.5 Noncooperative Differential Games
37
and ∇u i H i (t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t), λi (t), μi ) ∈ Rm
i
denote the column vectors of partial derivatives of the Hamiltonian H i with respect to x(t) and u i (t), respectively, and evaluated at x(t), {u i (t) : i ∈ P}, λi (t), and μi . In light of the relationship between continuous-time optimal control and open-loop Nash equilibria discussed in Sect. 2.5.2.2, the continuous-time minimum principle of Theorem 2.3 implies the following set of necessary conditions for the existence of open-loop Nash equilibria for a finite horizon T < ∞. Theorem 2.7 (Finite-Horizon Open-Loop Nash Equilibria [1, Theorem 6.13]) Suppose that the N -tuple of player control trajectories {(u i : [0, T ] → U i ) : i ∈ P} with associated state trajectory x : [0, T ] → Rn constitutes an open-loop Nash equilibrium solution to a noncooperative differential game with a finite horizon 0 < T < ∞. Then, μi = 1 for all i ∈ P; (i) the state trajectory x : [0, T ] → Rn satisfies the game dynamics x(t) ˙ = f (t, x(t), u 1 (t), . . . , u N (t)) for t ∈ [0, T ] with x(0) = x0 ; (ii) there exist costate trajectories λi : [0, T ] → Rn for all players i ∈ P satisfying λ˙ i (t) = −∇x H i (t, x(t), u 1 (t), . . . , u N (t), λi (t), μi )
(2.44)
for t ∈ [0, T ] with λi (T ) = 0;
(2.45)
and (iii) the controls u i : [0, T ] → U i satisfy u i (t) ∈ arg min H i (t, x(t), u 1 (t), . . . , u¯ i (t), . . . , u N (t), λi (t), μi ) u¯ i (t)∈U
(2.46)
i
for all t ∈ [0, T ] and all players i ∈ P. Proof See [1, Theorem 6.13]. Alternatively, note that the definition of open-loop Nash equilibria provided in (2.40) is equivalent to the system of N coupled finitehorizon continuous-time optimal control problems in (2.41). The theorem result follows by applying the continuous-time finite-horizon minimum principle of Theorem 2.3 to each of these optimal control problems, noting that the open-loop Nash equilibrium controls u j of players j ∈ P, j = i are fixed during optimization over player i’s controls.
38
2 Background and Forward Problems
Conditions for feedback Nash equilibria also follow from the same continuoustime minimum principle. Theorem 2.8 (Finite-Horizon Feedback Nash Equilibria [1, Theorem 6.15]) Suppose that the N -tuple of player control trajectories {(u i : [0, T ] → U i ) : i ∈ P} with associated differentiable feedback laws {(γ i : [0, T ] × Rn → U i ) : γ i (t, x(t)) = u i (t), i ∈ P} and state trajectory x : [0, T ] → Rn constitutes a feedback Nash equilibrium solution to a noncooperative differential game with a finite horizon 0 < T < ∞. Then, μi = 1 for all i ∈ P; (i) the state trajectory x : [0, T ] → Rn satisfies the game dynamics x(t) ˙ = f (t, x(t), u 1 (t), . . . , u N (t)) for t ∈ [0, T ] with x(0) = x0 ; (ii) there exist costate trajectories λi : [0, T ] → Rn for all players i ∈ P satisfying λ˙ i (t) = −∇x H i (t, x(t), γ 1 (t, x(t)), . . . , u i (t), . . . , γ N (t, x(t)), λi (t), μi ) (2.47) for t ∈ [0, T ] with λi (T ) = 0;
(2.48)
and (iii) the controls u i : [0, T ] → U i satisfy u i (t) = γ i (t, x(t)) ∈ arg min H i (t, x(t), u 1 (t), . . . , u¯ i (t), . . . , u N (t), λi (t), μi ) (2.49) u¯ i (t)∈U
i
for all t ∈ [0, T ] and all players i ∈ P. Proof See [1, Theorem 6.13]. Alternatively, recall that the definition of feedback Nash equilibria provided in (2.40) is equivalent to the system of N coupled finitehorizon continuous-time optimal control problems in (2.42). The theorem result follows by applying the continuous-time finite-horizon minimum principle of Theorem 2.3 to each of these optimal control problems. As in the case of discrete-time dynamic games, the conditions for open-loop and feedback Nash equilibria in Theorems 2.7 and 2.8 differ only via the costate equations (2.44) and (2.47). Specifically, under the feedback information structure, the costate differential equation (2.47) involves partial derivatives of the Nash equilibrium feedback control laws γ j of players j ∈ P, j = i while under the open-loop information structure, (2.44) does not. For an infinite horizon T = ∞, conditions for Nash equilibria similar to Theorems 2.7 and 2.8 can be developed using the continuous-time minimum principle of
2.5 Noncooperative Differential Games
39
Theorem 2.4. These conditions are similar to Theorems 2.7 and 2.8 but omit terminal constraints on the costate equations and involve real-valued constants μi = 1 (mirroring the continuous-time infinite-horizon minimum principle of Theorem 2.4). We shall discuss these conditions where they are required later in Chap. 6.
References 1. Basar T, Olsder GJ (1999) Dynamic noncooperative game theory, vol 23, 2nd edn. Academic, New York 2. Ben-Israel A, Greville TNE (2003) Generalized inverses: theory and applications, vol 15. Springer Science & Business Media, Berlin 3. Bertsekas DP (1995) Dynamic programming and optimal control, vol 1, 3rd edn. Athena Scientific, Belmont 4. Blot J, Chebbi H (2000) Discrete time Pontryagin principles with infinite horizon. J Math Anal Appl 246(1):265–279 5. Blot J, Hayek N (2014) Infinite-horizon optimal control in the discrete-time framework. Springer, New York 6. Engwerda J (2005) LQ dynamic optimization and differential games. Wiley, West Sussex 7. Gallier J (2013) Geometric methods and applications: for computer science and engineering, 2nd edn. Springer, New York 8. Goodwin GC, Seron MM, De Doná JA (2006) Constrained control and estimation: an optimisation approach. Springer Science & Business Media, Berlin 9. Halkin H (1974) Necessary conditions for optimal control problems with infinite horizons. Econom: J Econom Soc 267–272 10. James M (1978) The generalised inverse. Math Gaz 62(420):109–114 11. Kalman RE (1964) When is a linear control system optimal? J Basic Eng 86(1):51–60 12. Lewis FL, Vrabie D, Syrmos VL (2012) Optimal control, 3rd edn. Wiley, New York 13. Naidu DS (2002) Optimal control systems. CRC Press, Boca Raton
Chapter 3
Discrete-Time Inverse Optimal Control
In this chapter, we investigate inverse optimal control problems for discrete-time dynamical systems governed by difference equations. We pose two discrete-time inverse optimal control problems that involve computing parameters in the cost function of a discrete-time optimal control problem from observed state and control data sequences. The inverse problems differ in whether the available data sequences are whole or truncated prior to the horizon of the optimal control problem. We present and discuss methods for solving these problems based on bilevel optimization and discrete-time minimum principles. We specifically show that methods based on minimum principles reduce to solving systems of linear equations and (static) quadratic programs under a standard linear parameterization of the class of cost functions. The significance of the minimum-principle methods for discrete-time inverse optimal control reducing to straightforward optimization problems is twofold. Firstly, it allows us to establish fundamental theoretical conditions under which solutions to the discrete-time inverse optimal control problems exist, regardless of the specific method of inverse optimal control employed (whether bilevel, based on the minimum principle, or any other method). Secondly, it enables efficient practical implementation of the minimum-principle methods using standard optimization results (while implementations of bilevel methods involve the repeated numeric solution of forward discrete-time optimal control problems for candidate parameters). In the final part of this chapter, we investigate discrete-time inverse optimal control for linear dynamical systems with infinite-horizon quadratic cost functions. This investigation of linear-quadratic (LQ) inverse optimal control leads to results that mirror those established earlier in the chapter for general nonlinear systems with potentially nonquadratic cost functions. It also enables explicit connections to be made with well-known control-theoretic concepts, such as algebraic Riccati equations and system identification.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 T. L. Molloy et al., Inverse Optimal Control and Inverse Noncooperative Dynamic Game Theory, Communications and Control Engineering, https://doi.org/10.1007/978-3-030-93317-3_3
41
42
3 Discrete-Time Inverse Optimal Control
3.1 Preliminary Concepts In this section, we extend the discrete-time optimal control problems introduced in Sect. 2.2 by allowing their cost functions to depend on parameters (in addition to the states and controls). We present necessary conditions for the existence of solutions to these parameterized discrete-time optimal control problems in the form of discrete-time minimum principles. The remainder of this chapter will examine (inverse) problems concerned with computing these cost-function parameters from data using the parameterized discrete-time minimum principles.
3.1.1 Parameterized Discrete-Time Optimal Control Problems Let us begin by considering the (potentially nonlinear) difference equations xk+1 = f k (xk , u k ) , x0 ∈ Rn
(3.1)
for k ≥ 0 where f k : Rn × Rm → Rn are (potentially nonlinear) functions, xk ∈ Rn are state vectors, and u k ∈ U are controls belonging to the constraint set U ⊂ Rm . Let us also define the parameterized cost function T −1 VT x[0,T ] , u [0,T −1] , θ F (x T , θ ) + gk (xk , u k , θ ) ,
(3.2)
k=0
for finite horizons 0 < T < ∞, and the parameterized cost function ∞ V∞ x[0,∞] , u [0,∞] , θ gk (xk , u k , θ )
(3.3)
k=0
for infinite horizons T = ∞ where x[0,k] {x0 , x1 , . . . , xk } and u [0,k] {u 0 , u 1 , . . . , u k } are sequences of states and controls, respectively. The functions gk : Rn × U × Θ → R for k ≥ 0 describe the stage (or running) cost associated with the states and controls at time k in both the finite- and infinite-horizon cost functions (3.2) and (3.3). In the case of the finite-horizon cost function (3.2), the terminal cost function F : Rn × Θ → R describes the additional cost associated with finishing in the final state x T . We assume that both the stage and terminal cost functions, gk and F, belong to some known class of functions parameterized by a vector θ , with θ belonging to the parameter set Θ ⊂ Rq . For example, if gk (xk , u k , θ ) describes a quadratic stage cost in the sense that gk (xk , u k , θ ) = xk Qxk + u k Ru k , then we may define the parameters θ as the elements of the matrices Q and R.
3.1 Preliminary Concepts
43
Given the discrete-time dynamics (3.1) and the finite-horizon cost function (3.2), the parameterized discrete-time finite-horizon optimal control problem is inf
u [0,T −1]
s.t.
VT x[0,T ] , u [0,T −1] , θ xk+1 = f k (xk , u k ), 0 ≤ k ≤ T − 1 uk ∈ U , 0 ≤ k ≤ T − 1 x0 = x. ¯
(3.4)
Similarly, given the discrete-time dynamics (3.1) and the infinite-horizon cost function (3.3), the parameterized indiscrete-time finite-horizon optimal control problem is inf V∞ x[0,∞] , u [0,∞] , θ u [0,∞]
s.t.
xk+1 = f k (xk , u k ), k ≥ 0 uk ∈ U , k ≥ 0
(3.5)
x0 = x. ¯ Solutions to these discrete-time optimal control problems are sequences of states and controls that minimize the respective cost functions (3.2) and (3.3) while satisfying the dynamics (3.1) and control constraints for a given horizon T , initial state x, ¯ and parameter vector θ . Necessary conditions for the solutions to these parameterized discrete-time optimal control problems are provided by parameterized discrete-time minimum principles.
3.1.2 Parameterized Discrete-Time Minimum Principles To present discrete-time minimum principles for the parameterized discrete-time finite and infinite-horizon optimal control problems (3.4) and (3.5), let us define the parameterized Hamiltonian function Hk (xk , u k , λk+1 , θ ) gk (xk , u k , θ ) + λk+1 f k (xk , u k )
(3.6)
where λk ∈ Rn for k ≥ 0 are costate (or adjoint) vectors. Let us also introduce the following assumption on the control-constraint set U . Assumption 3.1 The set U is closed and convex. Furthermore, we assume that the functions f k , gk , and F are continuously differentiable in their state and control arguments so that we may consider the vectors ∇x Hk (xk , u k , λk+1 , θ ) ∈ Rn and ∇u Hk (xk , u k , λk+1 , θ ) ∈ Rm consisting of the partial derivatives of the Hamiltonian with respect to xk and u k , respectively (and
44
3 Discrete-Time Inverse Optimal Control
evaluated at xk , u k , λk+1 , and θ ). Similarly, ∇x F(x T , θ ) denotes the (column) vector of partial derivatives of F with respect to x T (evaluated with x T and θ ). Along the lines of Theorem 2.1, we then have the following discrete-time minimum principle for the parameterized discrete-time finite-horizon optimal control problem (3.4). Theorem 3.1 (Parameterized Finite-Horizon Minimum Principle) Suppose that Assumption 3.1 holds and that x[0,T ] and u [0,T −1] constitute a solution to the parameterized discrete-time finite-horizon optimal control problem (3.4) for a given θ ∈ Θ and 0 < T < ∞. Then, (i) the state sequence x[0,T ] satisfies the dynamics xk+1 = f k (xk , u k ) ¯ for 0 ≤ k ≤ T − 1 with x0 = x; (ii) there exist costates λk ∈ Rn for 0 ≤ k ≤ T satisfying the backward recursion λk = ∇x Hk (xk , u k , λk+1 , θ )
(3.7)
λT = ∇x F (x T , θ ) ;
(3.8)
∇u Hk (xk , u k , λk+1 , θ ) (u¯ − u k ) ≥ 0
(3.9)
for 0 ≤ k ≤ T − 1 with
and (iii) the controls u k satisfy
for all u¯ ∈ U and all 0 ≤ k ≤ T − 1. Proof See Theorem 2.1.
To introduce a minimum principle for the parameterized indiscrete-time finitehorizon optimal control problem (3.5) similar to Theorem 2.2, let us introduce the following invertibility assumption, with ∇x f k (xk , u k ) and ∇u f k (xk , u k ) denoting the matrices of partial derivatives of f k , with respect to (and evaluated at) xk and u k , respectively. Assumption 3.2 The matrix ∇x f k (xk , u k ) is invertible for all k ≥ 0. Given Assumption 3.2, we have the following discrete-time minimum principle for the parameterized discrete-time infinite-horizon optimal control problem (3.5). Theorem 3.2 (Parameterized Infinite-Horizon Minimum Principle) Suppose that the state x[0,∞] and control u [0,∞] sequences constitute a solution to the parameterized discrete-time infinite-horizon optimal control problem (3.5) for a given θ ∈ Θ. If Assumptions 3.1 and 3.2 hold then,
3.1 Preliminary Concepts
45
(i) the state sequence x[0,∞] satisfies the dynamics xk+1 = f k (xk , u k ) ¯ for k ≥ 0 with x0 = x; (ii) there exist costates λk ∈ Rn for k ≥ 0 satisfying the backward recursion λk = ∇x Hk (xk , u k , λk+1 , θ )
(3.10)
∇u Hk (xk , u k , λk+1 , θ ) (u¯ − u k ) ≥ 0
(3.11)
for k ≥ 0; and (iii) the controls u k satisfy
for all controls u¯ ∈ U and all k ≥ 0.
Proof See Theorem 2.2.
The following corollary summarizes the conditions of Theorems 3.1 and 3.2 to provide a (new) parameterized discrete-time horizon-invariant minimum principle for both parameterized discrete-time (finite- and infinite-horizon) optimal control problems (3.4) and (3.5). Corollary 3.1 (Parameterized Horizon-Invariant Minimum Principle) Suppose that the state and control sequences, x[0,] and u [0,] , constitute a truncated solution to either the infinite-horizon optimal control problem (3.5) or the finite-horizon optimal control problem (3.4) with horizon T > and parameters θ ∈ Θ. If Assumptions 3.1 and 3.2 hold then, (i) the state sequence x[0,] satisfies the dynamics xk+1 = f k (xk , u k ) ¯ for 0 ≤ k < with x0 = x; (ii) there exist costates λk ∈ Rn for 0 ≤ k ≤ + 1 satisfying the backward recursion λk = ∇x Hk (xk , u k , λk+1 , θ )
(3.12)
∇u Hk (xk , u k , λk+1 , θ ) (u¯ − u k ) ≥ 0
(3.13)
for 0 ≤ k ≤ ; and (iii) the controls u k satisfy
for all u¯ ∈ U and all 0 ≤ k ≤ .
46
3 Discrete-Time Inverse Optimal Control
Proof Under Assumption 3.1, Theorem 3.1 establishes the corollary assertions in the finite-horizon case 0 < T < ∞ where we disregard the terminal condition (3.8). Theorem 3.2 establishes the corollary assertions under Assumptions 3.1 and 3.2 in the infinite-horizon case T = ∞. The proof is complete. We shall exploit the discrete-time minimum principles of Theorems 3.1 and 3.2 (as well as their combined form in Corollary 3.1) in the remainder of this chapter to develop methods of discrete-time inverse optimal control. We next introduce the specific discrete-time inverse optimal control problems that we shall consider.
3.2 Inverse Optimal Control Problems in Discrete Time In this section, we pose the discrete-time inverse optimal control problems that we shall consider in this chapter. In each of these inverse problems, we seek to compute cost-function parameters θ such that a given pair of state and control sequences are optimal solutions to either the finite- or infinite-horizon discrete-time optimal control problems (3.4) and (3.5). We assume throughout that the dynamics f k , constraint set U , and functions gk and F are given (e.g., they have already been selected or estimated). The problems differ in whether the state and control sequences are to constitute optimal solutions in their entirety, or are to be optimal solutions that are truncated at some time < T before the (possibly infinite) horizon T . To begin, let us pose the simplest problem in which we are given state and control sequences in their entirety (i.e., data of the trajectories exists and is provided). Definition 3.1 (Whole-Sequence (WS) Problem) Consider the parameterized discrete-time finite-horizon optimal control problem (3.4) with known functions f k , gk , and F, constraint set U , and parameter set Θ. Given the sequences of states x[0,T ] and controls u [0,T −1] , the whole-sequence inverse optimal control problem is to compute parameters θ such that x[0,T ] and u [0,T −1] constitute an optimal solution to (3.4). The WS problem of Definition 3.1 is a prototypical form of (data-driven) discretetime inverse optimal control. Key to its formulation is the consideration of the finitehorizon optimal control problem (3.4) with its horizon determined by the length of the available state and control sequences. To pose an inverse optimal control problem involving the infinite-horizon optimal control problem (3.5) and the finite-horizon inverse optimal control problem (3.4) with an unknown horizon T , we will consider sequences x[0,] and u [0,] that are truncated at a (finite) time < T . Definition 3.2 (Truncated-Sequence (TS) Problem) Consider a potentially infinite horizon1 T > 0, and state and control sequences x[0,] and u [0,] with < T . If 1
We shall consider variations of the truncated-sequence problem in which the horizon T is either known, unknown, or known only to be finite or infinite.
3.2 Inverse Optimal Control Problems in Discrete Time
47
T < ∞, then the truncated-sequence inverse optimal control problem is to compute parameters θ such that x[0,] and u [0,] constitute subsequences of an optimal solution (i.e., x[0,T ] and u [0,T −1] ) to the parameterized discrete-time finite-horizon optimal control problem (3.4) with a given constraint set U , parameter set Θ, and functions f k , gk , and F. Similarly, if T = ∞, then the truncated-sequence inverse optimal control problem is to compute parameters θ such that x[0,] and u [0,] constitute subsequences of an optimal solution (i.e., x[0,∞] and u [0,∞] ) to the parameterized discrete-time infinite-horizon optimal control problem (3.5) with a given constraint set U , parameter set Θ, and functions f k and gk . The WS and TS problems of Definitions 3.1 and 3.2 are similar apart from the consideration of truncated sequences (i.e., subsequences) and a potentially infinite (and unknown) horizon T in the TS problem. We note that in many practical situations, the assumption that the functions gk and F, dynamics f k , and parameter set Θ are known is unlikely to hold. Indeed, in practice, these could be misspecified such that the given sequences fail to constitute optimal solutions to discrete-time optimal control problems for any θ ∈ Θ, in which case the problems of Definitions 3.1 and 3.2 would lack exact solutions. If the problems of Definitions 3.1 and 3.2 lack exact solutions, it is often desirable to instead view them as regression problems in which the aim is to find parameters θ such that the sequences approximately satisfy optimality conditions for discrete-time optimal control problems (e.g., minimum principles).2 In this chapter, we therefore consider both exact and approximate solutions to the discrete-time inverse optimal control problems of Definitions 3.1 and 3.2.
3.3 Bilevel Methods The first class of methods that we consider for solving the discrete-time inverse optimal control problems of Definitions 3.1 and 3.2 are termed bilevel methods, and arise by directly viewing the problems as a form of regression (or trajectory-fitting). They specifically seek to find cost-function parameters θ that minimize a measure of the error between the given sequences, and sequences predicted by solving a discretetime optimal control problem parameterized by θ . They are called bilevel methods since they involve bilevel optimization, that is, they involve two levels of (nested) optimization detailed as follows. 1. The first (or upper) level of optimization is over the cost-function parameters θ ∈ Θ with an optimization objective such as the total squared error3
2
These approximate optimality approaches have been widely advocated for in the literature of inverse optimal control and inverse optimization (see, for example, the pioneering work of [15, 23]). 3 We use · to denote the Euclidean norm.
48
3 Discrete-Time Inverse Optimal Control T
xk − xkθ 2 +
k=0
T −1
u k − u θk 2 .
(3.14)
k=0
2. The second (or lower) level of optimization involves the (forward) solution of a parameterized discrete-time optimal control problem to find the states and controls xkθ and u θk in (3.14) corresponding to candidate parameters θ . In this section, we shall detail specific bilevel methods for both the WS and TS discrete-time inverse optimal control problems. These methods will motivate our later consideration of alternative minimum-principle methods.
3.3.1 Bilevel Method for Whole Sequences The bilevel method for solving the whole-sequence discrete-time inverse optimal control problem of Definition 3.1 is the optimization problem inf
θ∈Θ
T
xk − xkθ 2 +
k=0
T −1
u k − u θk 2
(3.15)
k=0
subject to the constraint that the states xkθ for 0 ≤ k ≤ T and controls u θk for 0 ≤ k ≤ T − 1 are solutions to the parameterized discrete-time finite-horizon optimal control problem inf
u [0,T −1]
s.t.
VT x[0,T ] , u [0,T −1] , θ xk+1 = f k (xk , u k ), 0 ≤ k ≤ T − 1 uk ∈ U , 0 ≤ k ≤ T − 1 x0 = x. ¯
Due to the parameterized discrete-time finite-horizon optimal control problem in its constraints, this bilevel method is only suited to computing the parameters of discrete-time finite-horizon optimal control problems (3.4) when we are given whole sequences of states and controls.
3.3.2 Bilevel Method for Truncated Sequences The bilevel method for solving the TS problem of Definition 3.2 is the optimization problem
3.3 Bilevel Methods
49
inf
θ∈Θ
xk − xkθ 2 +
k=0
u k − u θk 2
(3.16)
k=0
subject to the constraint that the state and control sequences {xkθ : 0 ≤ k ≤ } and {u θk : 0 ≤ k ≤ } are a truncated solution to either the parameterized discrete-time finite-horizon optimal control problem inf
u [0,T −1]
s.t.
VT x[0,T ] , u [0,T −1] , θ xk+1 = f k (xk , u k ), 0 ≤ k ≤ T − 1 uk ∈ U , 0 ≤ k ≤ T − 1 x0 = x¯
in the case of a known finite horizon T < ∞, or the parameterized discrete-time infinite-horizon optimal control problem inf
u [0,∞]
s.t.
V∞ x[0,∞] , u [0,∞] , θ xk+1 = f k (xk , u k ), k ≥ 0 uk ∈ U , k ≥ 0 x0 = x¯
in the case of a known infinite horizon T = ∞.
3.3.3 Discussion of Bilevel Methods The bilevel methods of (3.15) and (3.16) are conceptually similar, except for variations of their upper-level objectives to accommodate whole or truncated sequences, and variations of the optimal control problems in their constraints to accommodate different horizons. Indeed, both methods seek to find cost-function parameters θ that minimize the total squared error between the given states and controls (e.g., xk and u k ), and states and controls (e.g., xkθ and u θk ) predicted by solving optimal control problems in their constraints (or lower level of optimization) with θ . Interestingly, we note that alternative bilevel methods can be posed with upper-level optimization objectives other than the total squared error between both the states or controls. For example, if the controls u k are not provided or unknown, then it may be convenient to penalize only the total squared error between the states, namely inf
θ∈Θ
T k=0
xk − xkθ 2 .
50
3 Discrete-Time Inverse Optimal Control
Regardless of the upper-level optimization objectives, however, bilevel methods rely on solving optimal control problems in the lower level of optimization. Bilevel methods therefore require explicit prior knowledge of the horizon T > 0, which is particularly restrictive in the case of truncated sequences (i.e., the TS problem). Furthermore, since the solution of optimal control problems is nontrivial in general, bilevel methods are usually implemented by nesting two numeric optimization routines, with the first routine searching over the parameters θ and handling the upper-level optimization, and the second routine being called by the first to solve the optimal control problems given candidate parameters θ . The computational demands of bilevel method implementations are thus often prohibitive (with examples provided in Chap. 7), and there is a need for efficient and tractable alternative methods of discrete-time inverse optimal control. In the remainder of this chapter, we shall therefore construct and analyze efficient and tractable alternative methods for solving discrete-time inverse optimal control problems by employing the discrete-time minimum principles of Sect. 3.1.2.
3.4 Minimum-Principle Methods In Sect. 3.1.2, we presented discrete-time minimum principles as providing conditions that state and control sequences must satisfy in order to constitute solutions to parameterized discrete-time optimal control problems with given cost-function parameters. In this section, we shall employ them to solve inverse optimal control problems by alternatively viewing them as providing conditions that the cost-function parameters θ must satisfy assuming that the given state and control sequences solve a parameterized discrete-time optimal control problem (with unknown cost-function parameters). We begin by developing minimum-principle methods for solving the WS problem of Definition 3.1. We then develop minimum-principle methods for the TS problem of Definition 3.2.
3.4.1 Methods for Whole Sequences To develop methods for solving the WS problem of Definition 3.1, consider the discrete-time finite-horizon optimal control problem (3.4) with horizon T , along with (arbitrary) sequences x[0,T ] and u [0,T −1] . Let us also introduce the following concept of a set of times at which the control constraints are not active (i.e., times at which the controls in the sequence u [0,T −1] are in the interior of the constraint set U ). Definition 3.3 (Inactive Control Constraint Times) Given the control sequence u [0,T −1] , we define the inactive control constraint times as the set of times
3.4 Minimum-Principle Methods
51
K {0 ≤ k < T : u k ∈ int U }
(3.17)
where u k ∈ int U denotes that the control u k is in the interior (i.e., not on the boundary) of the control-constraint set U . If x[0,T ] and u [0,T −1] constitute a solution to the finite-horizon optimal control problem (3.4) then at the inactive control constraint times (i.e., the times in K ), the minimum condition (3.9) from assertion (iii) of Theorem 3.1 is equivalent to the condition that the gradient of the Hamiltonian vanishes, namely ∇u Hk (xk , u k , λk+1 , θ ) = 0 for all k ∈ K since the controls are in the interior of U .4 For x[0,T ] and u [0,T −1] to constitute a solution to the finite-horizon optimal control problem (3.4), Theorem 3.1 thus implies that the parameters θ must be such that λk = ∇x Hk (xk , u k , λk+1 , θ )
(3.18)
λT = ∇x F (x T , θ )
(3.19)
∇u Hk (xk , u k , λk+1 , θ ) = 0
(3.20)
for 0 ≤ k ≤ T − 1 with
and
for all k ∈ K .
3.4.1.1
Constraint-Satisfaction Method for Whole Sequences
In light of the costate backward recursion (3.18) and the Hamiltonian gradient condition (3.20), we can form a constraint-satisfaction method of whole-sequence discretetime inverse optimal control that involves identifying parameters θ and the sequence of costate vectors λ[0,T ] {λ0 , λ1 , . . . , λT } such that both (3.18) and (3.20) hold for the given state and control sequences x[0,T ] and u [0,T −1] . That is, the constraintsatisfaction method of whole-sequence discrete-time inverse optimal control is the constraint-satisfaction problem
At all other times k ∈ {0, 1, . . . , T − 1} \ K , the minimum of the Hamiltonian occurs with u k on the boundary of the constraint U . Hence, ∇u Hk (xk , u k , λk+1 , θ) is potentially nonzero at times other than those in K . 4
52
3 Discrete-Time Inverse Optimal Control
inf
θ,λ[0,T ]
C λk = ∇x Hk (xk , u k , λk+1 , θ ) , 0 ≤ k ≤ T − 1
s.t.
λT = ∇x F (x T , θ ) ∇u Hk (xk , u k , λk+1 , θ ) = 0, k ∈ K
(3.21)
θ ∈Θ for any constant C ∈ R. At first glance, the complexity of solving (3.21) appears dependent on the horizon T since as T increases, so too does the length of the costate sequence λ[0,T ] . However, given θ and λT , the remaining costate variables λ[0,T −1] are determined by the backward recursion (3.18) in the constraints of (3.21), and so the optimization in (3.21) can equivalently be written as only over θ and λT . This constraint-satisfaction approach is hence potentially elegant, however, it has both practical and theoretical drawbacks related to the existence and uniqueness of solutions.
3.4.1.2
Soft Method for Whole Sequences
The main limitation of the constraint-satisfaction method (3.21) is that it may not yield any parameters if the functions gk and F, dynamics, horizon, or parameter set are misspecified (i.e., if x[0,T ] and u [0,T −1] do not satisfy the minimum-principle conditions for any θ ∈ Θ). If no solutions to the constraint-satisfaction method (3.21) exist, it is instead desirable to find parameters θ such that the sequences x[0,T ] and u [0,T −1] are approximately optimal under the finite-horizon optimal control problem (3.4) in line with the approximate optimality concept discussed in Sect. 3.2. As an alternative approach to solving the WS problem of Definition 3.1, the (hard) constraints in (3.21) may be reformulated as soft constraints (i.e., optimization objectives) leading to the soft method for whole sequences defined by the optimization problem inf
θ,λ[0,T ]
∇u Hk (xk , u k , λk+1 , θ )2
k∈K
+
T −1
λk − ∇x Hk (xk , u k , λk+1 , θ )2
(3.22)
k=0
s.t.
+ λT − ∇x F (x T , θ )2 θ ∈ Θ.
In this soft method, the cost-function parameters and costates are computed to minimize the extent to which the costate backward recursion (3.18) and the Hamiltonian gradient condition (3.20) are violated. By inspection, we see that if the constraintsatisfaction method (3.21) has feasible solutions, then these will also be the only solutions to this soft method (3.22). However, this soft method also yields parame-
3.4 Minimum-Principle Methods
53
ters when (3.21) has no feasible solutions and the state and control sequences do not constitute an exact solution to the finite-horizon optimal control problem (3.4) for any θ ∈ Θ. Indeed, existence of parameters solving (3.22) requires only that Θ is non-empty. However, as with the constraint-satisfaction method of (3.21), the solutions to (3.22) may be nonunique. It is therefore important to examine the uniqueness of solutions to (3.22). A practical limitation of the soft method (3.22) is that the number of costate variables λ[0,T ] to be optimized grows with the horizon T . In contrast to the constraintsatisfaction method of (3.21), knowing the optimizing λT provides no useful information about the optimizing λk for any k < T in the soft method (3.22) since it seeks optimizers such that the minimum-principle conditions hold approximately rather than exactly. We shall next present one additional method to strike a more delicate balance between using the minimum-principle conditions entirely as hard constraints (as in the constraint-satisfaction method of (3.21)) and as objectives (as in the soft method of (3.22)). This mixed method will avoid the number of optimization variables growing with the horizon T while still handling situations where the given state and control sequences may not constitute an exact solution to the finite-horizon optimal control problem (3.4) for any θ ∈ Θ.
3.4.1.3
Mixed Method for Whole Sequences
We previously identified that the optimization in the constraint-satisfaction method (3.21) is only over the parameters θ and the terminal costate λT due to it encoding the costate recursion (3.18) as a constraint. As a result, the number of optimization variables in (3.21) may be viewed as independent of the horizon T . In contrast, in the soft method (3.22), the number of optimization variables grows with the horizon T . The soft method (3.22) does, however, have the advantage of yielding parameters with approximate optimality properties when no feasible solution to (3.21) exists. We shall now develop a mixed method for solving the whole-sequence inverse optimal control of Definition 3.1 by using the costate recursion (3.18) as a constraint and the Hamiltonian gradient condition (3.20) as an objective. In doing so, we shall retain both the horizon-independence of the optimization in the constraintsatisfaction method and the approximate optimality properties of the soft method. This mixed method is defined as the optimization problem inf
θ,λ[0,T ]
s.t.
∇u Hk (xk , u k , λk+1 , θ )2
k∈K
λk = ∇x Hk (xk , u k , λk+1 , θ ) , 0 ≤ k ≤ T − 1 λT = ∇x F (x T , θ ) θ ∈ Θ.
(3.23)
Here, the cost-function parameters θ are computed to minimize the extent to which the Hamiltonian gradient condition (3.20) is violated, while the costates λ[0,T ] are
54
3 Discrete-Time Inverse Optimal Control
computed directly such that the costate recursion (3.18) holds. The optimization in (3.23) can therefore equivalently be written as only over θ and λT . If the constraint-satisfaction method (3.21) has feasible solutions, then these will also be the only solutions to the mixed method (3.23). However, this mixed method also yields parameters when (3.21) has no feasible solutions, and the state and control sequences do not constitute an exact solution to the finite-horizon optimal control problem (3.4) for any θ ∈ Θ. As with the soft method of (3.22), the only condition required for the mixed method to have feasible solutions is that the parameter set Θ is non-empty. Finally, we note that the constraint-satisfaction method (3.21), soft method (3.22), and mixed method (3.23) are only suited to computing the parameters of discretetime finite-horizon optimal control problems (3.4) with known horizons. We next develop methods for truncated sequences (i.e., methods for solving the TS problem of Definition 3.2) that handle both finite- and infinite-horizon optimal control problems with potentially unknown horizons.
3.4.2 Methods for Truncated Sequences To develop methods for solving the truncated-sequence problem of Definition 3.2, let us consider a (possibly infinite) horizon T > 0 and a pair of state and control sequences x[0,] and u [0,] with < T . Similar to Definition 3.3, let us define K {0 ≤ k ≤ : u k ∈ int U }
(3.24)
as the set of inactive control constraint times for the (truncated) sequence u [0,] . If x[0,] and u [0,] constitute a truncated solution to either the infinite-horizon optimal control problem (3.5) or the finite-horizon optimal control problem (3.4) with T > , then under Assumptions 3.1 and 3.2, assertion (iii) of Corollary 3.1 is equivalent to the condition that the gradient of the Hamiltonian vanishes at the times in K , that is, ∇u Hk (xk , u k , λk+1 , θ ) = 0 for all k ∈ K . For x[0,] and u [0,] to constitute a truncated solution to either the infinite-horizon optimal control problem (3.5) or the finite-horizon optimal control problem (3.4), under Assumptions 3.1 and 3.2, Corollary 3.1 thus implies that the parameters θ must be such that λk = ∇x Hk (xk , u k , λk+1 , θ ) for 0 ≤ k ≤ , and
(3.25)
3.4 Minimum-Principle Methods
55
∇u Hk (xk , u k , λk+1 , θ ) = 0
(3.26)
for all k ∈ K . The costate backward recursion (3.25) and the Hamiltonian gradient condition (3.26) are essentially equivalent to those in (3.18) and (3.20) that we used in the previous section to propose methods of inverse optimal control with whole sequences. In the following, we will therefore be able to develop constraint-satisfaction, soft, and mixed methods for solving the TS problem of Definition 3.2 that are directly analogous to the methods for whole sequences developed in the previous subsection. We note, however, that in the case of truncated sequences (and potentially infinite horizons), there is no concept of a terminal costate or terminal costate condition analogous to (3.19). This difference, while subtle, will have a profound impact on how minimum-principle methods for truncated sequences can be analyzed and implemented.
3.4.2.1
Constraint-Satisfaction Method for Truncated Sequences
In view of the costate backward recursion (3.25) and the Hamiltonian gradient condition (3.26), the constraint-satisfaction method of inverse optimal control with truncated sequences is inf
θ,λ[0,+1]
C λk = ∇x Hk (xk , u k , λk+1 , θ ) , 0 ≤ k ≤ ∇u Hk (xk , u k , λk+1 , θ ) = 0, k ∈ K
s.t.
(3.27)
θ ∈Θ for any constant C ∈ R.
3.4.2.2
Soft Method for Truncated Sequences
Similarly, the soft method of inverse optimal control with truncated sequences is inf
θ,λ[0,+1]
∇u Hk (xk , u k , λk+1 , θ )2
k∈K
+
λk − ∇x Hk (xk , u k , λk+1 , θ )2
k=0
s.t.
θ ∈ Θ.
(3.28)
56
3.4.2.3
3 Discrete-Time Inverse Optimal Control
Mixed Method for Truncated Sequences
Finally, the mixed method of inverse optimal control with truncated sequences is inf
θ,λ[0,+1]
s.t.
3.4.2.4
∇u Hk (xk , u k , λk+1 , θ )2
k∈K
λk = ∇x Hk (xk , u k , λk+1 , θ ) , 0 ≤ k ≤ θ ∈ Θ.
(3.29)
Discussion of Minimum-Principle Methods for Truncated Sequences
The key difference between the methods for truncated sequences presented here in (3.27)–(3.29) and the methods for whole sequences presented in Sect. 3.4.1 is that the truncated-sequence methods require no prior knowledge of the horizon T . Indeed, the truncated-sequence methods simply require that the given sequences are of some length < T , while the whole-sequence methods require the length of the given sequences to correspond to the horizon T . A surprising consequence of the horizon not being explicitly specified in truncated-sequence methods is that they may yield parameters θ under which the given sequences constitute solutions to both finite- and infinite-horizon optimal control problems (3.4) and (3.5) with costfunction parameters θ . The truncated-sequence methods also omit constraints and penalties associated with the terminal costate λT . Many of the other properties of the truncated-sequence methods are, however, analogous to those of the whole-sequence methods of Sect. 3.4.1. For example, the optimizations in the constraint-satisfaction and mixed methods for truncated sequences can be considered to be only over the parameters θ and the last vector λ+1 from the truncated sequence of costates λ[0,+1] , while the optimizations in the constraint-satisfaction and mixed methods for whole sequences in Sect. 3.4.1 are essentially only over the parameters θ and the terminal costate λT . Importantly, when the constraint-satisfaction method (3.27) has feasible solutions, these feasible solutions will be the only solutions to the soft (3.28) and mixed (3.29) methods. The constraint-satisfaction method (3.27) may have no feasible solutions if the functions gk and F, dynamics, or, parameter set are misspecified. In these cases, the soft (3.28) and mixed (3.29) methods will yield parameters under which the truncated sequences x[0,] and u [0,] have approximate optimality properties in the sense of satisfying the minimum-principle conditions for optimality with minimal violation. Indeed, the soft (3.28) and mixed (3.29) methods will yield parameters provided that the parameter set Θ is non-empty.
3.5 Method Reformulations and Solution Results
57
3.5 Method Reformulations and Solution Results The foremost limitation of the minimum-principle methods presented in Sect. 3.4 is that they may yield parameters θ under which the given state and control sequences only correspond to a locally (but not globally) optimal solution. This limitation is due to the discrete-time minimum principles providing necessary, but not typically sufficient, optimality conditions. This limitation is, however, shared in practice with the bilevel methods of Sect. 3.3 since their practical implementation requires the use of locally (but not globally) convergent numeric optimization routines. Importantly, the negative consequences of this limitation can be mitigated for minimum-principle methods if the parameters yielded by them can be shown to be unique, since there will then be no other parameters θ from the parameter set Θ under which the sequences could constitute an optimal solution. In this section, we shall therefore seek to develop results characterizing the existence and uniqueness of solutions to the minimumprinciple methods. Examining the existence and uniqueness of the cost-function parameters yielded by the minimum-principle methods of Sect. 3.4 is, in general, a considerable theoretical challenge since the methods all involve the solution of challenging (potentially nonconvex) constrained optimization problems. To perform this analysis, in the first part of this section we shall introduce additional structure into the cost functions, and exploit it to reformulate the methods of Sect. 3.4 as either systems of linear equations or quadratic programs. In the second part of this section, we then apply the tools of quadratic programming and linear algebra (cf. Sect. 2.1) to distill results examining the existence and uniqueness of cost-function parameters computed by the methods.
3.5.1 Linearly Parameterized Cost Functions Throughout this section, we shall employ the following assumption regarding the parameterization of the functions gk and F in the cost functions (3.2) and (3.3). Assumption 3.3 (Linearly Parameterized Cost Functions) The functions gk and F are linear combinations of known basis functions in the sense that gk (xk , u k , θ ) = θ g¯ k (xk , u k ) for k ≥ 0, and F (x T , θ ) = θ F¯ (x T ) where g¯ k : Rn × Rm → Rq and F¯ : Rn → Rq are basis functions that are continuously differentiable in each of their arguments, and θ ∈ Θ ⊂ Rq are the cost-function parameters.
58
3 Discrete-Time Inverse Optimal Control
While Assumption 3.3 is somewhat restrictive, it is ubiquitous in the literature of inverse optimal control and inverse optimization (see Sect. 3.7 and references therein). For us, the primary utility of Assumption 3.3 is that it implies that the Hamiltonian function Hk becomes linear in both the cost-function parameters θ and the costate variables λk+1 . That is, under Assumption 3.3 we have that Hk (xk , u k , λk+1 , θ ) = θ g¯ k (xk , u k ) + λk+1 f k (xk , u k ) for all k ≥ 0. It follows also that the gradients of the Hamiltonian (i.e., the Hamiltonian gradients) are linear in the parameters θ and the costate λk+1 under Assumption 3.3, namely ∇u Hk (xk , u k , λk+1 , θ ) = ∇u g¯ k (xk , u k ) θ + ∇u f k (xk , u k ) λk+1
(3.30)
∇x Hk (xk , u k , λk+1 , θ ) = ∇x g¯ k (xk , u k ) θ + ∇x f k (xk , u k ) λk+1
(3.31)
and
where ∇x g¯ k (xk , u k ) and ∇u g¯ k (xk , u k ) denote the matrices of partial derivatives of g¯ k with respect to xk and u k , respectively (and evaluated at xk and u k ). We shall ¯ T ) to denote the matrix of partial derivatives of F¯ with respect similarly use ∇x F(x to (and evaluated at) x T . The linearity of the Hamiltonian gradients in the parameters θ and costates λk+1 under Assumption 3.3 directly implies that the methods of Sect. 3.4 become convex optimization problems under Assumption 3.3 (provided that the set Θ is also convex). To see this convexity under Assumption 3.3, we simply note that the objective functions and constraints of the methods in Sect. 3.4 are all linear or quadratic in the linear Hamiltonian gradients (3.30) and (3.31). We shall next further explore the properties of the methods under Assumption 3.3 by reformulating and analyzing them as either systems of linear equations or quadratic programs.
3.5.2 Reformulations of Whole-Sequence Methods Let us first reformulate the constraint-satisfaction, mixed, and soft methods presented in Sect. 3.4.1 for the WS problem under Assumption 3.3.
3.5.2.1
Whole-Sequence Constraint-Satisfaction Method Reformulation
In the constraint-satisfaction method (3.21), the vectors λ[0,T ] are constrained to satisfying the backward recursion
3.5 Method Reformulations and Solution Results
59
λk = ∇x Hk (xk , u k , λk+1 , θ ) for 0 ≤ k ≤ T − 1 with the terminal condition λT = ∇x F (x T , θ ). In the following proposition, we show via induction that since the Hamiltonian gradient ∇x Hk (xk , u k , λk+1 , θ ) on the right-hand side of this recursion is linear in both θ and λk+1 under Assumption 3.3 (cf. (3.31)), the vectors λk in the constraint-satisfaction method of (3.21) are linear functions solely of θ . Proposition 3.1 Suppose that Assumption 3.3 holds and consider the matrices λ¯ k ∈ Rn×q satisfying the backward recursion λ¯ k = ∇x g¯ k (xk , u k ) + ∇x f k (xk , u k ) λ¯ k+1
(3.32)
for 0 ≤ k ≤ T − 1 with the terminal condition λ¯ T ∇x F¯ (x T ) .
(3.33)
λk = λ¯ k θ ∈ Rn
(3.34)
Then, the vectors
for 0 ≤ k ≤ T satisfy the backward recursion (3.7) and terminal condition (3.8) from the finite-horizon discrete-time minimum principle of Theorem 3.1. Proof For k = T , the proposition assertion (3.34) follows by multiplying (3.33) by θ and noting that ∇x F (x T , θ ) = ∇x F¯ (x T ) θ which is equivalent to the terminal condition (3.8) under Assumption 3.3. We now proceed by (backwards) mathematical induction. Suppose that the proposition assertion (3.34) holds for all T ≥ k, and consider k − 1. Using (3.7) and (3.31) together with Assumption 3.3, we then have that λk−1 = ∇x L¯ k−1 (xk−1 , u k−1 ) θ + ∇x f k−1 (xk−1 , u k−1 ) λk = ∇x L¯ k−1 (xk−1 , u k−1 ) θ + ∇x f k−1 (xk−1 , u k−1 ) λ¯ k θ = λ¯ k−1 θ where the second line follows from the induction assumption and the third line follows from the definition of (3.32). The proposition assertion (3.34) thus holds for all 0 ≤ k ≤ T by induction. The proof is complete. By recalling the linearity of the Hamiltonian gradients (3.30) and (3.31) under Assumption 3.3, Proposition 3.1 implies that the Hamiltonian gradient terms in the constraint-satisfaction method (3.21) may be rewritten as
60
3 Discrete-Time Inverse Optimal Control
∇u Hk (xk , u k , λk+1 , θ ) = ∇u g¯ k (xk , u k ) + ∇u f k (xk , u k ) λ¯ k+1 θ
(3.35)
∇x Hk (xk , u k , λk+1 , θ ) = ∇x g¯ k (xk , u k ) + ∇x f k (xk , u k ) λ¯ k+1 θ
(3.36)
and
where the coefficient matrices λ¯ k for 0 ≤ k ≤ T are given by the backward recursion (3.32) with the terminal condition (3.33). We note that the coefficient matrices λ¯ k are independent of the parameters θ , and so the Hamiltonian gradients (3.35) and (3.36) have no (explicit) dependence on the vectors λ[0,T ] . By substituting the linear expressions (3.35) and (3.36) for the Hamiltonian gradients into the constraint-satisfaction method (3.21), we see that under Assumption 3.3, (3.21) simplifies to the optimization problem inf
C
s.t.
λ¯ k = ∇x g¯ k (xk , u k ) + ∇x f k (xk , u k ) λ¯ k+1 , 0 ≤ k ≤ T − 1 λ¯ T = ∇x F¯ (x T ) ∇u g¯ k (xk , u k ) + ∇u f k (xk , u k ) λ¯ k+1 θ = 0, k ∈ K θ ∈Θ
θ
for any constant C ∈ R. The first two constraints in this optimization problem are satisfied for all θ ∈ Θ by simply solving the backward recursion (3.32) with terminal condition (3.33) for the sequence of matrices λ¯ [0,T ] {λ¯ 0 , λ¯ 1 , . . . , λ¯ T }. The third constraint is then linear in θ . Thus, the constraint-satisfaction method (3.21) can be reformulated under Assumption 3.3 as the (constrained) system of linear equations ξC θ = 0
s.t. θ ∈ Θ
(3.37)
where the coefficient matrix ξC is ξC ∇u g¯ k (xk , u k ) + ∇u f k (xk , u k ) λ¯ k+1 k∈K ∈ Rm|K
|×q
(3.38)
with λ¯ [0,T ] solving (3.32) and (3.33). Here, we use the notation [Ak ]k∈K to denote the matrix formed by stacking a subsequence of matrices with index values of k in K from the sequence of matrices A0 , A1 , . . ., e.g., ⎡ ⎢ ⎢ [Ak ]k∈{0,...,T −1} = ⎢ ⎣
A0 A1 .. . A T −1
⎤ ⎥ ⎥ ⎥. ⎦
3.5 Method Reformulations and Solution Results
3.5.2.2
61
Whole-Sequence Soft Method Reformulation
To reformulate the soft method of (3.22) under Assumption 3.3, let us recall the linear forms of the Hamiltonian gradients in (3.30) and (3.31). We are unable to exploit Proposition 3.1 to simplify these Hamiltonian gradients further due to the vectors λ[0,T ] in the soft method (3.22) not being constrained to exactly satisfy the backward recursion of the discrete-time finite-horizon minimum principle. We shall instead proceed directly from the linear Hamiltonian gradients in (3.30) and (3.31) to examine the terms in the soft method of (3.22) under Assumption 3.3. Specifically, under Assumption 3.3 the terms in the soft method of (3.22) are given by
∇u g¯ k (xk , u k ) ∇u Hk (xk , u k , λk+1 , θ )2 = θ λk+1 ∇u f k (xk , u k )
θ · ∇u g¯ k (xk , u k ) ∇u f k (xk , u k ) λk+1
(3.39)
along with ⎤ ⎡ −∇x g¯ k (xk , u k ) ⎦ λk − ∇x Hk (xk , u k , λk+1 , θ )2 = θ λk λk+1 ⎣ I −∇x f k (xk , u k )
⎡ ⎤ θ · −∇x g¯ k (xk , u k ) I −∇x f k (xk , u k ) ⎣ λk ⎦ λk+1 (3.40)
and
−∇x F¯ (x T ) θ λT − ∇x F (x T , θ )2 = θ λT . −∇x F¯ (x T ) I λT I (3.41) The terms summed in the soft method of (3.22) are thus each quadratic forms under Assumption 3.3. Since the sums of quadratic forms are also quadratic forms, the soft method of (3.22) under Assumption 3.3 may be reformulated as the (constrained) quadratic program ⎡
inf
θ,λ[0,T ]
θ λ0
⎤ θ ⎥ ⎢ ⎢ λ0 ⎥ · · · λT ξ S ⎢ . ⎥ ⎣ .. ⎦ λT
s.t. θ ∈ Θ
(3.42)
62
3 Discrete-Time Inverse Optimal Control
where ξ S is a matrix of dimension (q + (T + 1)n) × (q + (T + 1)n) containing the coefficients of the parameters θ and vectors λ[0,T ] contributed by each of the terms (3.39)–(3.41).
3.5.2.3
Whole-Sequence Mixed Method Reformulation
As in the constraint-satisfaction method (3.21), the mixed method (3.23) constrains the vectors λ[0,T ] to satisfying the backward recursion λk = ∇x Hk (xk , u k , λk+1 , θ ) for 0 ≤ k ≤ T − 1 with the terminal condition λT = ∇x F (x T , θ ). Thus, by again appealing to Proposition 3.1, we have that the vectors λk in the mixed method (3.23) are linear functions of θ under Assumption 3.3. Hence, the Hamiltonian gradients in the mixed method (3.23) are given by (3.35) and (3.36) and so under Assumption 3.3, we see that (3.23) simplifies to the optimization problem inf θ
∇u g¯ k (xk , u k ) + ∇u f k (xk , u k ) λ¯ k+1 θ 2 k∈K
s.t.
λ¯ k = ∇x g¯ k (xk , u k ) + ∇x f k (xk , u k ) λ¯ k+1 , 0 ≤ k ≤ T − 1 λ¯ T = ∇x F¯ (x T ) θ ∈ Θ.
This optimization problem is a constrained quadratic program, and so the mixed method (3.23) under Assumption 3.3 may be reformulated as inf θ
θ ξM θ
s.t. θ ∈ Θ
(3.43)
where ξ M ∈ Rq×q is the positive semidefinite matrix ξM
∇u g¯ k (xk , u k ) + ∇u f k (xk , u k ) λ¯ k+1 ∇u g¯ k (xk , u k ) + ∇u f k (xk , u k ) λ¯ k+1
k∈K
and the matrices λ¯ [0,T ] solve (3.32) and (3.33).
3.5.3 Solution Results for Whole-Sequence Methods Given the reformulations of the constraint-satisfaction, mixed, and soft methods for the WS problem as systems of linear equations or quadratic programs under Assumption 3.3, we shall now apply the tools of linear algebra and quadratic programming
3.5 Method Reformulations and Solution Results
63
(cf. Sect. 2.1) to establish results characterizing the existence and uniqueness of the cost-function parameters computed by them.
3.5.3.1
Fixed-Element Parameter Set
As the first step toward establishing solution results, we note that scaling the cost function of an optimal control problem (i.e., VT of (3.4)) by any scalar C > 0 does not change the nature of the optimizing sequences x[0,T ] and u [0,T −1] , but does scale the minimum value of the cost function. Thus, an immediate condition necessary (though not sufficient) for methods of discrete-time inverse optimal control to yield unique solutions is that parameter set Θ must not contain both θ and C θ for any scalar C > 0 and any θ . To satisfy this condition, we consider the fixed-element parameter set Θ = {θ ∈ Rq : θ(1) = 1}
(3.44)
where θ(1) denotes the first element of θ ∈ Rq . There is no loss of generality with this choice of parameter set since the ordering and scaling of the basis functions and costfunction parameters is arbitrary. Analogous results to those we establish here will also hold when the parameter set is instead constructed as the fixed-normalization set Θ = {θ ∈ Rq : θ = 1}. We also note that this choice of parameter set excludes the trivial solution θ = 0 to the discrete-time inverse optimal control problems.
3.5.3.2
Whole-Sequence Constraint-Satisfaction Method Solution Results
Under Assumption 3.3, we have seen that the constraint-satisfaction method for whole sequences (3.21) can be reformulated as the constrained system of linear equations (3.37). Incorporation of the parameter set Θ given by (3.44) into this reformulation leads to the constraint-satisfaction method (3.21) further reducing to the unconstrained system of linear equations ξ¯C θ = e¯1
(3.45)
where
¯ξC e1 ξC and e1 and e¯1 are column vectors of appropriate dimensions with 1 in their first components and zeros elsewhere. In the following theorem, we exploit this reformulation to characterize the existence and uniqueness of cost-function parameters that are yielded by the constraint-satisfaction method (3.21).
64
3 Discrete-Time Inverse Optimal Control
Theorem 3.3 (Solutions to Whole-Sequence Constraint-Satisfaction Method) Suppose that Assumption 3.3 holds and that the parameter set Θ is given by (3.44). Let ξ¯C+ be the Moore–Penrose pseudoinverse of the matrix ξ¯C . Then the constraintsatisfaction method for whole sequences (3.21) yields cost-function parameters if and only if ξ¯C ξ¯C+ e¯1 = e¯1 ,
(3.46)
and these (potentially nonunique) parameters are given by θ = ξ¯C+ e¯1 + I − ξ¯C+ ξ¯C b
(3.47)
where b ∈ Rq is any arbitrary vector. Furthermore, the cost-function parameters computed by the method are unique and given by θ = ξ¯C+ e¯1
(3.48)
if and only if ξ¯C has rank q (i.e., rank(ξ¯C ) = q) in addition to satisfying (3.46). Proof It suffices to analyze the solutions to the system of linear equations (3.45) since (3.21) reduces to (3.45) under Assumption 3.3 with Θ given by (3.44). From Proposition 2.2, we have that the system of linear equations (3.45) is consistent if and only if ξ¯C ξ¯C+ e¯1 = e¯1 . Proposition 2.2 also implies that these solutions are given by (3.47), proving the first theorem assertion. The second theorem assertion follows from (3.47) since Lemma 2.1 gives that ξ¯C+ ξ¯C = I if and only if rank(ξ¯C ) = q. The proof is complete. Under Assumption 3.3 and when the parameter set Θ is given by (3.44), Theorem 3.3 establishes that the left-identity condition (3.46) is both a necessary and sufficient condition for ensuring the existence of solutions to the optimization problem defining the constraint-satisfaction method (3.21). If the left-identity condition (3.46) fails to hold, no solutions to (3.21) will exist, and so the constraint-satisfaction method for whole sequences (3.21) will fail to yield any cost-function parameters. The rank condition of Theorem 3.3 serves a role analogous to persistence of excitation conditions that appear in parameter estimation and adaptive control since it holds when the given states and controls provide sufficient information to enable unique determination of the cost-function parameters θ (cf. [18] and references therein). Indeed, it will fail to hold when the problem is ill-posed due to a short horizon or if there are too few times during which the controls are in the interior of the constraint set U (e.g., if m|K | + 1 < q). It may also fail for degenerate dynamics and initial states that lead to uninformative sequences (e.g., sequences that remain in equilibrium points of the dynamics).
3.5 Method Reformulations and Solution Results
65
Theorem 3.3 provides deep insight into the solution of the whole-sequence inverse optimal control problem of Definition 3.1. Specifically, it implies conditions for the existence and uniqueness of exact solutions to the WS problem of Definition 3.1 with Θ given by (3.44) regardless of the specific method of inverse optimal control employed (whether bilevel, based on the minimum principle, or any other method). Indeed, recall that the constraint-satisfaction method (3.21) encodes the costate backward recursion (3.18) and the Hamiltonian gradient condition (3.20) of the discretetime finite-horizon minimum principle of Theorem 3.1 as constraints. The constraintsatisfaction method (3.21) simplifies to (3.45) under Assumption 3.3 with Θ given by (3.44). Thus, when no solution to (3.45) exists, which Theorem 3.3 implies when ξ¯C ξ¯C+ e¯1 = e¯1 , there is no θ in Θ such that the states x[0,T ] and controls u [0,T −1] satisfy the conditions of the costate backward recursion (3.18) and the Hamiltonian gradient condition (3.20) of the discrete-time finite-horizon minimum principle of Theorem 3.1. Similarly, the rank condition of Theorem 3.3 combined with (3.46) is both necessary and sufficient for ensuring that there is at most one θ in Θ that constitutes an exact solution to the whole-sequence inverse optimal control problem of Definition 3.1. We summarize these observations in the following corollary. Corollary 3.2 (Existence of Exact Solutions to Whole-Sequence Problem of Definition 3.1) Suppose that Assumption 3.3 holds and that the parameter set Θ is given by (3.44). If ξ¯C ξ¯C+ e¯1 = e¯1 then the sequence of states x[0,T ] and associated controls u [0,T −1] do not constitute a (potentially local) optimal solution to (3.4) for any θ ∈ Θ. If however, ξ¯C has rank q and ξ¯C ξ¯C+ e¯1 = e¯1 then there is at most one θ ∈ Θ such that the sequences x[0,T ] and u [0,T −1] constitute a (potentially local) optimal solution to (3.4). Proof The constraint-satisfaction method (3.21) encodes the costate backward recursion (3.18) and the Hamiltonian gradient condition (3.20) of Theorem 3.1 as constraints. The constraint-satisfaction method (3.21) simplifies to (3.45) under Assumption 3.3 with Θ given by (3.44). Thus, when no solution to (3.45) exists, which Theorem 3.3 implies when ξ¯C ξ¯C+ e¯1 = e¯1 , there is no θ in Θ such that the states x[0,T ] and controls u [0,T −1] satisfy the discrete-time finite-horizon minimum principle of Theorem 3.1. The first corollary assertion follows by noting that satisfying the conditions of Theorem 3.1 is necessary in order for x[0,T ] and u [0,T −1] to constitute a (potentially local) optimal solution to (3.4). The second corollary assertion follows similarly since if (3.45) possesses a unique solution, which Theorem 3.3 implies when ξ¯C ξ¯C+ e¯1 = e¯1 and ξC has rank q, there is only one θ in Θ such that x[0,T ] and u [0,T −1] satisfy (3.18) and (3.20) of the finitehorizon discrete-time minimum principle of Theorem 3.1. The proof is completed
66
3 Discrete-Time Inverse Optimal Control
by noting that Theorem 3.1 is necessary in order for x[0,T ] and u [0,T −1] to constitute a (potentially local) optimal solution to (3.4), though not in general sufficient. The left-identity condition (3.46) may fail to hold (implying that x[0,T ] and u [0,T −1] do not satisfy the minimum-principle conditions for any θ ∈ Θ) in practical situations ¯ dynamics f , horizon T , or parameter set Θ where the basis functions g¯ k and F, are misspecified. As discussed in Sect. 3.2, in these cases it is often desirable to instead find parameters θ such that the sequences x[0,T ] and u [0,T −1] are approximately optimal. Thus, the left-identity condition (3.46) provides a practical (and testable) condition for determining if it is feasible to solve the WS problem of Definition 3.1 exactly, or if it is necessary to resort to an approximate optimality solution from either the soft method (3.22) or mixed method (3.23). We summarize the constraint-satisfaction method for whole sequences (3.21) under Assumption 3.3 in Algorithm 3.1. Algorithm 3.1 Constraint-Satisfaction Method for Whole Sequences Input: Whole state and control sequences x[0,T ] and u [0,T −1] , dynamics f k , basis functions g¯ k and ¯ control-constraint set U , and parameter set Θ = {θ ∈ Rq : θ(1) = 1}. F, Output: Computed cost-function parameters θ. 1: Compute sequence of matrices λ¯ [0,T ] via (3.32) and (3.33). 2: Compute matrix ξC via (3.38). 3: Compute augmented matrix ξ¯C from (3.45). 4: Compute the pseudoinverse ξ¯C+ of ξ¯C . 5: if ξ¯C ξ¯C+ e¯1 = e¯1 then 6: if ξ¯C has rank q then 7: return Unique θ given by (3.48). 8: else 9: return Any θ from (3.47) with any b ∈ Rq . 10: end if 11: else 12: return No feasible exact solutions θ to the WS problem (cf. Corollary 3.2). 13: end if
3.5.3.3 Whole-Sequence Soft Method Solution Results
Under Assumption 3.3, we have seen that the soft method for whole sequences (3.22) can be reformulated as the constrained quadratic program (3.42). In the following theorem, we incorporate the parameter set (3.44) into this reformulation, and characterize the existence and uniqueness of cost-function parameters that it yields. To present this theorem, let us define
\[
\bar\xi_S \triangleq \begin{bmatrix} \xi_{S,(2,2)} & \dots & \xi_{S,(2,q+(T+1)n)} \\ \xi_{S,(3,2)} & \dots & \xi_{S,(3,q+(T+1)n)} \\ \vdots & \ddots & \vdots \\ \xi_{S,(q+(T+1)n,2)} & \dots & \xi_{S,(q+(T+1)n,q+(T+1)n)} \end{bmatrix} \tag{3.49}
\]

as the principal submatrix of $\xi_S$ in (3.42) formed by deleting its first row and column (here, $\xi_{S,(i,j)}$ is the element of $\xi_S$ in its $i$th row and $j$th column). Let us also define

\[
\nu_S \triangleq \begin{bmatrix} \xi_{S,(2,1)} & \xi_{S,(3,1)} & \dots & \xi_{S,(q+(T+1)n,1)} \end{bmatrix}^\top \tag{3.50}
\]

as the first column of $\xi_S$ with its first element deleted. Similarly, let $r_S$ be the rank of $\bar\xi_S$, let $\bar\xi_S^+$ be the pseudoinverse of $\bar\xi_S$, and let $\bar\xi_S = U_S \Sigma_S U_S^\top$ be a singular value decomposition (SVD) of $\bar\xi_S$ where $\Sigma_S \in \mathbb{R}^{(q+(T+1)n-1)\times(q+(T+1)n-1)}$ is a diagonal matrix, and

\[
U_S \triangleq \begin{bmatrix} U_S^{11} & U_S^{12} \\ U_S^{21} & U_S^{22} \end{bmatrix} \in \mathbb{R}^{(q+(T+1)n-1)\times(q+(T+1)n-1)} \tag{3.51}
\]

is a block matrix with submatrices $U_S^{11} \in \mathbb{R}^{(q-1)\times r_S}$, $U_S^{12} \in \mathbb{R}^{(q-1)\times(q+(T+1)n-1-r_S)}$, $U_S^{21} \in \mathbb{R}^{n(T+1)\times r_S}$, and $U_S^{22} \in \mathbb{R}^{n(T+1)\times(q+(T+1)n-1-r_S)}$. Finally, let us define the rectangular matrix $I_S \triangleq [I \;\; 0] \in \mathbb{R}^{q\times(q+(T+1)n)}$ where $I \in \mathbb{R}^{q\times q}$ is the identity matrix.

Theorem 3.4 (Solutions to Whole-Sequence Soft Method) Suppose that Assumption 3.3 holds and that $\Theta$ is given by (3.44). If $(I - \bar\xi_S\bar\xi_S^+)\nu_S = 0$, then all of the parameter vectors $\theta \in \Theta$ corresponding to all solutions $(\theta, \lambda_{[0,T]})$ of the soft method (3.22) are of the form

\[
\theta = I_S \eta_S \tag{3.52}
\]

where $\eta_S \triangleq [1 \;\; \bar\eta_S^\top]^\top \in \mathbb{R}^{q+(T+1)n}$ are (potentially nonunique) solutions to the quadratic program (3.42) with $\bar\eta_S \in \mathbb{R}^{q+(T+1)n-1}$ given by

\[
\bar\eta_S = -\bar\xi_S^+ \nu_S + U_S \begin{bmatrix} 0 \\ b \end{bmatrix} \tag{3.53}
\]

for any $b \in \mathbb{R}^{q+(T+1)n-1-r_S}$. Furthermore, if either $U_S^{12} = 0$ or $r_S = q + (T+1)n - 1$, then all solutions $(\theta, \lambda_{[0,T]})$ to the soft method (3.22) correspond to a single unique parameter vector given by

\[
\theta = I_S \begin{bmatrix} 1 \\ -\bar\xi_S^+ \nu_S \end{bmatrix}. \tag{3.54}
\]
Proof The soft method for whole sequences (3.22) is equivalent to (3.42) under Assumption 3.3, and so it suffices to analyze the solutions to the quadratic program (3.42) with $\Theta$ given by (3.44). Any vector $\eta_S \in \mathbb{R}^{q+(T+1)n}$ that satisfies the constraint $I_S \eta_S \in \Theta$ with $\Theta$ given by (3.44) can be written as $\eta_S = [1 \;\; \bar\eta_S^\top]^\top$ where $\bar\eta_S \in \mathbb{R}^{q+(T+1)n-1}$. Thus, for any $\eta_S \in \mathbb{R}^{q+(T+1)n}$ with $I_S \eta_S \in \Theta$ and $\Theta$ given by (3.44), the objective function in (3.42) is

\[
\eta_S^\top \xi_S \eta_S = \begin{bmatrix} 1 & \bar\eta_S^\top \end{bmatrix} \xi_S \begin{bmatrix} 1 \\ \bar\eta_S \end{bmatrix} = \xi_{S,(1,1)} + \bar\eta_S^\top \bar\xi_S \bar\eta_S + 2\bar\eta_S^\top \nu_S.
\]

All solutions $\eta_S$ to (3.42) are thus of the form $\eta_S = [1 \;\; \bar\eta_S^\top]^\top$ where $\bar\eta_S \in \mathbb{R}^{q+(T+1)n-1}$ are solutions to the unconstrained quadratic program

\[
\inf_{\bar\eta_S} \; \frac{1}{2}\bar\eta_S^\top \bar\xi_S \bar\eta_S + \bar\eta_S^\top \nu_S.
\]

Under the condition that $(I - \bar\xi_S\bar\xi_S^+)\nu_S = 0$ and by noting that $\bar\xi_S$ is symmetric positive semidefinite due to $\xi_S$ being symmetric positive semidefinite, Proposition 2.1 gives that this unconstrained quadratic program is solved by any $\bar\eta_S$ satisfying

\[
\bar\eta_S = -\bar\xi_S^+ \nu_S + U_S \begin{bmatrix} 0 \\ b \end{bmatrix}
\]

for any $b \in \mathbb{R}^{q+(T+1)n-1-r_S}$. The first theorem assertion (3.53) follows.

Now, to prove the second theorem assertion, we note that if $U_S^{12} = 0$, then the vectors $\bar\eta_S$ in the solutions $\eta_S = [1 \;\; \bar\eta_S^\top]^\top$ to the constrained quadratic program (3.42) satisfy

\[
\bar\eta_S = -\bar\xi_S^+ \nu_S + \begin{bmatrix} U_S^{11} & 0 \\ U_S^{21} & U_S^{22} \end{bmatrix} \begin{bmatrix} 0 \\ b \end{bmatrix} = -\bar\xi_S^+ \nu_S + \begin{bmatrix} 0 \\ U_S^{22} b \end{bmatrix}
\]

for any $b \in \mathbb{R}^{q+(T+1)n-1-r_S}$ where $U_S^{22} b \in \mathbb{R}^{n(T+1)}$. Clearly, if $r_S = q + n(T+1) - 1$, then we also have that $\bar\eta_S = -\bar\xi_S^+ \nu_S$. Thus, if either $U_S^{12} = 0$ or $r_S = q + n(T+1) - 1$, then the first $q-1$ components of $\bar\eta_S$ are invariant with respect to the free vector $b \in \mathbb{R}^{q+n(T+1)-1-r_S}$, and so all solutions $\eta_S = [1 \;\; \bar\eta_S^\top]^\top$ of the constrained quadratic program (3.42) satisfy

\[
I_S \eta_S = I_S \begin{bmatrix} 1 \\ -\bar\xi_S^+ \nu_S \end{bmatrix}.
\]
The second theorem assertion follows since the parameters $\theta$ from (3.22) satisfy $\theta = I_S \eta_S$. The proof is complete.

The soft method (3.22) involves an explicit search over both the cost-function parameters $\theta$ and the sequence of costate vectors $\lambda_{[0,T]}$. In Theorem 3.4, we have specifically focused on characterizing the form and uniqueness of the cost-function parameters $\theta$ yielded by (3.22) without regard for the costate vectors $\lambda_{[0,T]}$. Indeed, Theorem 3.4 establishes that the cost-function parameters yielded by (3.22) are given explicitly by (3.52), and that the conditions $U_S^{12} = 0$ and $r_S = q + (T+1)n - 1$ are each sufficient for ensuring the uniqueness of the cost-function parameters yielded by (3.22). If $r_S = q + (T+1)n - 1$ holds, then both the cost-function parameters $\theta$ and costate vectors $\lambda_{[0,T]}$ solving (3.22) will be unique since the first assertion of Theorem 3.4, specifically (3.53), implies that the free vector $b$ in the vectors $\eta_S = [1 \;\; \bar\eta_S^\top]^\top$ solving (3.42) will be zero-dimensional when $r_S = q + (T+1)n - 1$. The condition $U_S^{12} = 0$ can hold when $r_S < q + (T+1)n - 1$. In this case, all pairs $(\theta, \lambda_{[0,T]})$ solving the soft method (3.22) will share the unique parameter vector $\theta$ given by (3.54) but may not share common costate vectors.

The conditions $U_S^{12} = 0$ and $r_S = q + (T+1)n - 1$ will not hold when the inverse optimal control problem is ill-posed, for example, due to short horizons $T$, due to degenerate system dynamics, or when the sequences are uninformative (e.g., they correspond to an equilibrium of the dynamics in the sense that $x_{k+1} = x_k$ for all $k$). The conditions $U_S^{12} = 0$ and $r_S = q + (T+1)n - 1$ therefore have an intuitive interpretation as persistence of excitation conditions for the soft method (3.22) (analogous to similar concepts in parameter estimation and adaptive control). While we note that these conditions for the soft method may hold in cases where the conditions of Theorem 3.3 and Corollary 3.2 for the constraint-satisfaction method do not, in these cases the solutions to the soft method will yield parameters under which the sequences are approximately optimal rather than exactly optimal (in the sense discussed in Sect. 3.2). We summarize the soft method for whole sequences (3.22) under Assumption 3.3 in Algorithm 3.2.

Algorithm 3.2 Soft Method for Whole Sequences
Input: Whole state and control sequences $x_{[0,T]}$ and $u_{[0,T-1]}$, dynamics $f_k$, basis functions $\bar g_k$ and $\bar F$, control-constraint set $U$, and parameter set $\Theta = \{\theta \in \mathbb{R}^q : \theta(1) = 1\}$.
Output: Computed cost-function parameters $\theta$.
1: Compute the matrix $\xi_S$ via (3.42).
2: Compute the submatrix $\bar\xi_S$ from (3.49) and the vector $\nu_S$ from (3.50).
3: Compute the pseudoinverse $\bar\xi_S^+$ of $\bar\xi_S$ and check that $(I - \bar\xi_S\bar\xi_S^+)\nu_S = 0$.
4: Compute the rank $r_S$ of $\bar\xi_S$.
5: if $r_S = q + (T+1)n - 1$ then
6:   return Unique $\theta$ given by (3.54).
7: else
8:   Compute $U_S$ and $U_S^{12}$ in (3.51) via SVD of $\bar\xi_S$.
9:   if $U_S^{12} = 0$ then
10:    return Unique $\theta$ given by (3.54).
11:  else
12:    return Any $\theta$ from (3.52) with any $b \in \mathbb{R}^{q+(T+1)n-1-r_S}$.
13:  end if
14: end if
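The reduced quadratic program underlying Algorithm 3.2 can likewise be carried out with a pseudoinverse and an SVD. The sketch below assumes the matrix $\xi_S$ of (3.42) is supplied; the partitioning of $U_S$ follows (3.51), and the helper name is ours.

```python
import numpy as np

def solve_ws_soft(xi_S, q, tol=1e-9):
    """Sketch of Algorithm 3.2: recover theta from the QP matrix xi_S in (3.42)."""
    xi_bar = xi_S[1:, 1:]           # principal submatrix (3.49)
    nu = xi_S[1:, 0]                # first column, first element deleted (3.50)
    xi_bar_pinv = np.linalg.pinv(xi_bar)

    # Solvability condition of Theorem 3.4.
    if not np.allclose(xi_bar @ (xi_bar_pinv @ nu), nu, atol=tol):
        raise ValueError("(I - xi_bar xi_bar^+) nu != 0: Theorem 3.4 does not apply")

    eta_bar = -xi_bar_pinv @ nu     # minimum-norm solution of the reduced QP
    r = np.linalg.matrix_rank(xi_bar, tol=tol)
    if r < xi_bar.shape[0]:
        # Uniqueness of theta additionally requires U_S^{12} = 0, where the
        # trailing columns of U span the kernel of xi_bar, cf. (3.51).
        U, _, _ = np.linalg.svd(xi_bar)
        U12 = U[: q - 1, r:]
        unique = np.allclose(U12, 0.0, atol=tol)
    else:
        unique = True
    theta = np.concatenate(([1.0], eta_bar))[:q]  # theta = I_S eta_S, cf. (3.52)
    return theta, unique
```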
3.5.3.4 Whole-Sequence Mixed Method Solution Results

We shall now exploit the reformulation of the mixed method for whole sequences (3.23) as the constrained quadratic program (3.43) under Assumption 3.3 to characterize the existence and uniqueness of cost-function parameters that are yielded by it when the parameter set is given by (3.44). To present these results, let us define

\[
\bar\xi_M \triangleq \begin{bmatrix} \xi_{M,(2,2)} & \dots & \xi_{M,(2,q)} \\ \xi_{M,(3,2)} & \dots & \xi_{M,(3,q)} \\ \vdots & \ddots & \vdots \\ \xi_{M,(q,2)} & \dots & \xi_{M,(q,q)} \end{bmatrix} \tag{3.55}
\]

as the principal submatrix of $\xi_M$ in (3.43) formed by deleting its first row and column. Let us also define

\[
\nu_M \triangleq \begin{bmatrix} \xi_{M,(2,1)} & \xi_{M,(3,1)} & \dots & \xi_{M,(q,1)} \end{bmatrix}^\top \tag{3.56}
\]

as the first column of $\xi_M$ with its first element deleted. Similarly, let $r_M$ be the rank of $\bar\xi_M$, let $\bar\xi_M^+$ be the pseudoinverse of $\bar\xi_M$, and let $\bar\xi_M = U_M \Sigma_M U_M^\top$ be an SVD of $\bar\xi_M$ where $\Sigma_M \in \mathbb{R}^{(q-1)\times(q-1)}$ and $U_M \in \mathbb{R}^{(q-1)\times(q-1)}$.

Theorem 3.5 (Solutions to Whole-Sequence Mixed Method) Suppose that Assumption 3.3 holds and that $\Theta$ is given by (3.44). If $(I - \bar\xi_M\bar\xi_M^+)\nu_M = 0$, then all of the cost-function parameters $\theta \in \Theta$ yielded by the mixed method (3.23) satisfy

\[
\theta = \begin{bmatrix} 1 \\ \bar\theta \end{bmatrix} \tag{3.57}
\]

where the vectors $\bar\theta \in \mathbb{R}^{q-1}$ are given by

\[
\bar\theta = -\bar\xi_M^+ \nu_M + U_M \begin{bmatrix} 0 \\ b \end{bmatrix} \tag{3.58}
\]

for any $b \in \mathbb{R}^{q-1-r_M}$. Furthermore, if $r_M = q - 1$, then the mixed method (3.23) yields the unique cost-function parameters

\[
\theta = \begin{bmatrix} 1 \\ -\bar\xi_M^+ \nu_M \end{bmatrix}. \tag{3.59}
\]
Proof The mixed method for whole sequences (3.23) is equivalent to (3.43) under Assumption 3.3, and so it suffices to analyze the solutions to the quadratic program (3.43) with $\Theta$ given by (3.44). Any vector $\theta \in \Theta$ with $\Theta$ given by (3.44) can be written as $\theta = [1 \;\; \bar\theta^\top]^\top$ where $\bar\theta \in \mathbb{R}^{q-1}$. Thus, for any $\theta \in \Theta$, the quadratic objective function in (3.43) is

\[
\theta^\top \xi_M \theta = \begin{bmatrix} 1 & \bar\theta^\top \end{bmatrix} \xi_M \begin{bmatrix} 1 \\ \bar\theta \end{bmatrix} = \xi_{M,(1,1)} + \bar\theta^\top \bar\xi_M \bar\theta + 2\bar\theta^\top \nu_M.
\]

The solutions to the constrained quadratic program (3.43) are therefore of the form $\theta = [1 \;\; \bar\theta^\top]^\top$ where $\bar\theta \in \mathbb{R}^{q-1}$ are solutions to the unconstrained quadratic program

\[
\inf_{\bar\theta} \; \frac{1}{2}\bar\theta^\top \bar\xi_M \bar\theta + \bar\theta^\top \nu_M.
\]

Under the condition that $(I - \bar\xi_M\bar\xi_M^+)\nu_M = 0$ and by noting that $\bar\xi_M$ is symmetric positive semidefinite due to $\xi_M$ being symmetric positive semidefinite, Proposition 2.1 gives that this unconstrained quadratic program is solved by any $\bar\theta$ satisfying

\[
\bar\theta = -\bar\xi_M^+ \nu_M + U_M \begin{bmatrix} 0 \\ b \end{bmatrix}
\]

for any $b \in \mathbb{R}^{q-1-r_M}$. The first theorem assertion follows. The second theorem assertion (3.59) follows since the free vector is zero-dimensional when $r_M = q - 1$. The proof is complete.

From Theorem 3.5, we see that the solutions to the mixed method (3.23) have a similar form to the solutions to the soft method (3.22) established in Theorem 3.4 under Assumption 3.3. However, the mixed method (3.23) avoids the need to explicitly optimize over the costate vectors $\lambda_{[0,T]}$. The rank condition $r_M = q - 1$ for ensuring the uniqueness of cost-function parameters given by the mixed method is thus simpler (and likely to hold more generally) than the uniqueness conditions $U_S^{12} = 0$ or $r_S = q + (T+1)n - 1$ we established for the soft method (3.22) in Theorem 3.4. As with the conditions for uniqueness of solutions to the soft method, the rank condition $r_M = q - 1$ has an intuitive interpretation as a persistence of excitation condition for the mixed method and will fail to hold when the inverse optimal control problem is ill-posed due to short horizons $T$, degenerate system dynamics, or uninformative trajectories. Again, the rank condition $r_M = q - 1$ for the mixed method may hold in cases where the conditions of Theorem 3.3 and Corollary 3.2 for the constraint-satisfaction method do not. In these cases, the mixed method will yield parameters under which the sequences are approximately optimal rather than exactly optimal (in the sense discussed in Sect. 3.2). We summarize the mixed method for whole sequences (3.23) under Assumption 3.3 in Algorithm 3.3.

Algorithm 3.3 Mixed Method for Whole Sequences
Input: Whole state and control sequences $x_{[0,T]}$ and $u_{[0,T-1]}$, dynamics $f_k$, basis functions $\bar g_k$ and $\bar F$, control-constraint set $U$, and parameter set $\Theta = \{\theta \in \mathbb{R}^q : \theta(1) = 1\}$.
Output: Computed cost-function parameters $\theta$.
1: Compute the sequence of matrices $\bar\lambda_{[0,T]}$ via (3.32) and (3.33).
2: Compute the matrix $\xi_M$ in (3.43).
3: Compute the submatrix $\bar\xi_M$ from (3.55) and the vector $\nu_M$ from (3.56).
4: Compute the pseudoinverse $\bar\xi_M^+$ of $\bar\xi_M$ and check that $(I - \bar\xi_M\bar\xi_M^+)\nu_M = 0$.
5: Compute the rank $r_M$ of $\bar\xi_M$.
6: if $r_M = q - 1$ then
7:   return Unique $\theta$ given by (3.59).
8: else
9:   Compute $U_M$ via SVD of $\bar\xi_M$.
10:  return Any $\theta$ from (3.57) with any $b \in \mathbb{R}^{q-1-r_M}$.
11: end if
3.5.4 Reformulations of Truncated-Sequence Methods

Having reformulated and examined the solutions to the constraint-satisfaction, mixed, and soft methods for the WS problem of Definition 3.1 under Assumption 3.3, we shall now reformulate the constraint-satisfaction, mixed, and soft methods for solving the TS problem of Definition 3.2 under Assumption 3.3.
3.5.4.1 Truncated-Sequence Constraint-Satisfaction Method Reformulation
The constraint-satisfaction method (3.27) for truncated sequences constrains the vectors $\lambda_{[0,\ell+1]}$ to satisfy the backward recursion

\[
\lambda_k = \nabla_x H_k(x_k, u_k, \lambda_{k+1}, \theta) \tag{3.60}
\]
for $0 \le k \le \ell$. This backward recursion is from Corollary 3.1 and lacks a terminal condition. Without a terminal condition, we are unable to rewrite the vectors $\lambda_k$ as linear functions solely of the parameters $\theta$ (as we did in the whole-sequence setting of Proposition 3.1). Instead, to reformulate the constraint-satisfaction method for truncated sequences, we shall directly exploit the fact that the Hamiltonian gradients are linear in both the parameters and costates under Assumption 3.3, as shown in (3.30) and (3.31). We exploit this linearity by substituting (3.31) under Assumption 3.3 into the backward recursion (3.60), giving

\[
\begin{bmatrix} \theta \\ \lambda_k \end{bmatrix} = \begin{bmatrix} I & 0 \\ \nabla_x \bar g_k(x_k, u_k) & \nabla_x f_k(x_k, u_k) \end{bmatrix} \begin{bmatrix} \theta \\ \lambda_{k+1} \end{bmatrix}
\]

for $0 \le k \le \ell$. Under Assumption 3.3, the parameters $\theta$, final costate $\lambda_{\ell+1}$, and costate $\lambda_k$ at any time $0 \le k \le \ell$ can therefore be seen to satisfy

\[
\begin{bmatrix} \theta \\ \lambda_k \end{bmatrix} = \mathcal{G}_k \begin{bmatrix} \theta \\ \lambda_{\ell+1} \end{bmatrix}
\]

for $0 \le k \le \ell$ where $\mathcal{G}_k$ is the product matrix $\mathcal{G}_k \triangleq G_k \times G_{k+1} \times \cdots \times G_{\ell-1} \times G_\ell$ and

\[
G_k \triangleq \begin{bmatrix} I & 0 \\ \nabla_x \bar g_k(x_k, u_k) & \nabla_x f_k(x_k, u_k) \end{bmatrix}.
\]

We may thus rewrite (3.30) under Assumption 3.3 as

\[
\begin{aligned}
\nabla_u H_k(x_k, u_k, \lambda_{k+1}, \theta) &= \begin{bmatrix} \nabla_u \bar g_k(x_k, u_k) & \nabla_u f_k(x_k, u_k) \end{bmatrix} \begin{bmatrix} \theta \\ \lambda_{k+1} \end{bmatrix} \\
&= \begin{bmatrix} \nabla_u \bar g_k(x_k, u_k) & \nabla_u f_k(x_k, u_k) \end{bmatrix} \mathcal{G}_{k+1} \begin{bmatrix} \theta \\ \lambda_{\ell+1} \end{bmatrix} \\
&= W_k \begin{bmatrix} \theta \\ \lambda_{\ell+1} \end{bmatrix}
\end{aligned} \tag{3.61}
\]

for $0 \le k \le \ell$ with $W_k \triangleq [\nabla_u \bar g_k(x_k, u_k) \;\; \nabla_u f_k(x_k, u_k)]\,\mathcal{G}_{k+1}$ under the convention that $\mathcal{G}_{\ell+1} \triangleq I$. In light of (3.61), with its implicit satisfaction of the backward recursion (3.60), the constraint-satisfaction method for truncated sequences (3.27) simplifies to the optimization problem

\[
\begin{aligned}
\inf_{\theta,\, \lambda_{\ell+1}} \quad & C \\
\text{s.t.} \quad & W_k \begin{bmatrix} \theta \\ \lambda_{\ell+1} \end{bmatrix} = 0, \quad k \in K, \\
& \theta \in \Theta
\end{aligned}
\]

under Assumption 3.3 for any constant $C \in \mathbb{R}$. Equivalently, under Assumption 3.3 we may reformulate the constraint-satisfaction method of (3.27) as the problem of solving the (constrained) system of linear equations

\[
\phi_C \begin{bmatrix} \theta \\ \lambda_{\ell+1} \end{bmatrix} = 0 \quad \text{s.t.} \quad \theta \in \Theta \tag{3.62}
\]

where $\phi_C$ is the stacked matrix $\phi_C = [W_k]_{k \in K}$.
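The backward products $\mathcal{G}_k$ and the matrices $W_k$ can be accumulated in a single reverse pass over the data. The following sketch assumes the gradient matrices $\nabla_x \bar g_k$, $\nabla_x f_k$, $\nabla_u \bar g_k$, and $\nabla_u f_k$ have already been evaluated along the given sequences and are supplied as lists; the function name and the use of the convention $\mathcal{G}_{\ell+1} = I$ from (3.61) are ours.

```python
import numpy as np

def stack_phi_C(grad_x_g, grad_x_f, grad_u_g, grad_u_f, K_set, ell):
    """Sketch of the truncated-sequence reformulation (3.61)-(3.62).

    grad_x_g[k] is n x q, grad_x_f[k] is n x n, grad_u_g[k] is m x q, and
    grad_u_f[k] is m x n, all assumed precomputed along the sequences.
    """
    q = grad_x_g[0].shape[1]
    n = grad_x_f[0].shape[1]

    # Backward accumulation: G_prod[k] maps [theta; lambda_{ell+1}] to
    # [theta; lambda_k], with G_prod[ell+1] = I by convention.
    G_prod = [None] * (ell + 2)
    G_prod[ell + 1] = np.eye(q + n)
    for k in range(ell, -1, -1):
        G_k = np.block([[np.eye(q), np.zeros((q, n))],
                        [grad_x_g[k], grad_x_f[k]]])
        G_prod[k] = G_k @ G_prod[k + 1]

    # W_k expresses nabla_u H_k as a linear map of [theta; lambda_{ell+1}].
    W = {k: np.hstack([grad_u_g[k], grad_u_f[k]]) @ G_prod[k + 1] for k in K_set}
    return np.vstack([W[k] for k in sorted(K_set)])   # phi_C in (3.62)
```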
3.5.4.2 Truncated-Sequence Soft Method Reformulation
The soft method for truncated sequences (3.28) can be reformulated under Assumption 3.3 in the same way that we reformulated the soft method for whole sequences in (3.42). Specifically, under Assumption 3.3, the Hamiltonian gradients have the linear forms (3.30) and (3.31), which implies that the terms summed in the soft method (3.28) are quadratic forms. That is,

\[
\|\nabla_u H_k(x_k, u_k, \lambda_{k+1}, \theta)\|^2 = \begin{bmatrix} \theta \\ \lambda_{k+1} \end{bmatrix}^\top \begin{bmatrix} \nabla_u \bar g_k(x_k, u_k) & \nabla_u f_k(x_k, u_k) \end{bmatrix}^\top \begin{bmatrix} \nabla_u \bar g_k(x_k, u_k) & \nabla_u f_k(x_k, u_k) \end{bmatrix} \begin{bmatrix} \theta \\ \lambda_{k+1} \end{bmatrix}
\]

for $k \in K$, and

\[
\|\lambda_k - \nabla_x H_k(x_k, u_k, \lambda_{k+1}, \theta)\|^2 = \begin{bmatrix} \theta \\ \lambda_k \\ \lambda_{k+1} \end{bmatrix}^\top \begin{bmatrix} -\nabla_x \bar g_k(x_k, u_k) & I & -\nabla_x f_k(x_k, u_k) \end{bmatrix}^\top \begin{bmatrix} -\nabla_x \bar g_k(x_k, u_k) & I & -\nabla_x f_k(x_k, u_k) \end{bmatrix} \begin{bmatrix} \theta \\ \lambda_k \\ \lambda_{k+1} \end{bmatrix}
\]

for $0 \le k \le \ell$. Since sums of quadratic forms are also quadratic forms, the soft method of (3.28) under Assumption 3.3 may be reformulated as the (constrained) quadratic program

\[
\inf_{\theta,\, \lambda_{[0,\ell+1]}} \; \begin{bmatrix} \theta \\ \lambda_0 \\ \vdots \\ \lambda_{\ell+1} \end{bmatrix}^\top \phi_S \begin{bmatrix} \theta \\ \lambda_0 \\ \vdots \\ \lambda_{\ell+1} \end{bmatrix} \quad \text{s.t.} \quad \theta \in \Theta \tag{3.63}
\]

where $\phi_S$ is a matrix of appropriate dimensions containing the coefficients of the parameters $\theta$ and vectors $\lambda_{[0,\ell+1]}$ contributed by each of the terms in the soft method of (3.28).
3.5.4.3 Truncated-Sequence Mixed Method Reformulation
As in the case of the constraint-satisfaction method for truncated sequences (3.27), the mixed method for truncated sequences (3.29) constrains the vectors $\lambda_{[0,\ell+1]}$ to satisfy the backward recursion of (3.60). This backward recursion is from Corollary 3.1 and lacks a terminal condition, and so we are unable to rewrite the vectors $\lambda_k$ as linear functions solely of the parameters $\theta$ (as we did in the whole-sequence setting). Instead, we shall reformulate the mixed method for truncated sequences in the same manner as we reformulated the constraint-satisfaction method for truncated sequences. Recall from (3.61) that the linearity of the Hamiltonian gradients in both the parameters and costates under Assumption 3.3 implies that

\[
\nabla_u H_k(x_k, u_k, \lambda_{k+1}, \theta) = W_k \begin{bmatrix} \theta \\ \lambda_{\ell+1} \end{bmatrix}
\]

for $0 \le k \le \ell$. Thus, the mixed method for truncated sequences (3.29) may be reformulated under Assumption 3.3 as the quadratic program

\[
\inf_{\theta,\, \lambda_{\ell+1}} \; \begin{bmatrix} \theta \\ \lambda_{\ell+1} \end{bmatrix}^\top \phi_M \begin{bmatrix} \theta \\ \lambda_{\ell+1} \end{bmatrix} \quad \text{s.t.} \quad \theta \in \Theta \tag{3.64}
\]

where $\phi_M$ is the positive semidefinite matrix

\[
\phi_M \triangleq \sum_{k \in K} W_k^\top W_k.
\]

We note that in the reformulations of the constraint-satisfaction and mixed methods of (3.62) and (3.64), the matrices $\phi_C$ and $\phi_M$ do not depend on the variables being optimized (i.e., $\theta$ and $\lambda_{\ell+1}$).
3.5.5 Solution Results for Truncated-Sequence Methods

We now establish results characterizing the existence and uniqueness of the cost-function parameters computed by the constraint-satisfaction, mixed, and soft methods for truncated sequences by exploiting their reformulations under Assumption 3.3. As in our treatment of the methods for whole sequences, as a first step, we again note that scaling the cost function of an optimal control problem (i.e., $V_T$ or $V_\infty$) by any scalar $C > 0$ does not change the optimizing sequences. Thus, without loss of generality, we shall again consider the parameter set defined in (3.44).
3.5.5.1 Truncated-Sequence Constraint-Satisfaction Method Solution Results
Under Assumption 3.3, we have seen that the constraint-satisfaction method of (3.27) for truncated sequences may be implemented by solving the constrained system of linear equations in (3.62). By substituting the parameter set (3.44) into (3.62) for $\Theta$, this constrained system of linear equations becomes the unconstrained system of linear equations

\[
\bar\phi_C \begin{bmatrix} \theta \\ \lambda_{\ell+1} \end{bmatrix} = \bar e_1 \tag{3.65}
\]

where

\[
\bar\phi_C \triangleq \begin{bmatrix} e_1^\top \\ \phi_C \end{bmatrix} \in \mathbb{R}^{(m|K|+1)\times(q+n)}
\]

and $e_1$ and $\bar e_1$ are column vectors of appropriate dimensions with 1 in their first components and zeros elsewhere. In the following theorem, we shall analyze the solutions to this unconstrained system of linear equations in order to characterize the existence and uniqueness of solutions to the constraint-satisfaction method (3.27). To present this theorem, let us use $\bar\phi_C^+ \in \mathbb{R}^{(q+n)\times(m|K|+1)}$ to denote the pseudoinverse of the matrix $\bar\phi_C$, and let us introduce the matrix

\[
\bar U_C \triangleq \begin{bmatrix} \bar U_C^1 \\ \bar U_C^2 \end{bmatrix} = I - \bar\phi_C^+ \bar\phi_C \tag{3.66}
\]

where $\bar U_C^1 \in \mathbb{R}^{q\times(q+n)}$ and $\bar U_C^2 \in \mathbb{R}^{n\times(q+n)}$. Let us also define $\bar I_C \triangleq [I \;\; 0] \in \mathbb{R}^{q\times(q+n)}$.

Theorem 3.6 (Solutions to Truncated-Sequence Constraint-Satisfaction Method) Suppose that Assumption 3.3 holds and that the parameter set $\Theta$ is given by (3.44). Then the constraint-satisfaction method for truncated sequences (3.27) yields cost-function parameters if and only if

\[
\bar\phi_C \bar\phi_C^+ \bar e_1 = \bar e_1, \tag{3.67}
\]

and these (potentially nonunique) parameters are given by

\[
\theta = \bar I_C \big(\bar\phi_C^+ \bar e_1 + \bar U_C b\big) \tag{3.68}
\]

where $b \in \mathbb{R}^{q+n}$ is any arbitrary vector. Furthermore, if $\bar U_C^1 = 0$ or $\bar\phi_C$ has rank $\rho_C = q + n$ in addition to (3.67) holding, then the cost-function parameters computed by the method are unique and given by

\[
\theta = \bar I_C \bar\phi_C^+ \bar e_1. \tag{3.69}
\]

Proof Since (3.27) reduces to (3.65) under Assumption 3.3 with $\Theta$ given by (3.44), we shall analyze the solutions to (3.65). From Proposition 2.2, we have that the system of linear equations (3.65) is consistent if and only if $\bar\phi_C \bar\phi_C^+ \bar e_1 = \bar e_1$, and we have that these solutions are given by

\[
\begin{bmatrix} \theta \\ \lambda_{\ell+1} \end{bmatrix} = \bar\phi_C^+ \bar e_1 + \bar U_C b.
\]

Multiplying by $\bar I_C$ proves the first theorem assertion (3.68). The second theorem assertion follows from (3.68) since Lemma 2.1 gives that $\bar U_C = 0$ if and only if $\bar\phi_C$ has rank $q + n$, implying that

\[
\begin{bmatrix} \theta \\ \lambda_{\ell+1} \end{bmatrix} = \bar\phi_C^+ \bar e_1.
\]

Alternatively, when $\bar U_C^1 = 0$, the components of the product $\bar U_C b$ corresponding to those that form $\theta$ are zero, and so $\theta = \bar I_C \bar\phi_C^+ \bar e_1$. The proof is complete.

Similar to Theorem 3.3 in the case of whole sequences, Theorem 3.6 for truncated sequences establishes that the left-identity condition (3.67) is both a necessary and sufficient condition for ensuring the existence of solutions to the optimization problem defining the constraint-satisfaction method (3.27) under Assumption 3.3 and when $\Theta$ is given by (3.44). The constraint-satisfaction method for truncated sequences (3.27) will fail to yield any cost-function parameters when the left-identity condition (3.67) fails to hold. Furthermore, similar to the rank condition of Theorem 3.3, the conditions $\bar U_C^1 = 0$ and $\rho_C = q + n$ serve roles analogous to persistence of excitation conditions that appear in parameter estimation and adaptive control since when they hold the truncated state and control sequences provide sufficient information to enable unique determination of the cost-function parameters $\theta$. Finally, if $\rho_C = q + n$, then the proof of Theorem 3.6 suggests that the constraint-satisfaction
method for truncated sequences (3.27) will yield both unique cost-function parameters $\theta$ and a unique final costate vector $\lambda_{\ell+1}$. If $\rho_C < q + n$ but $\bar U_C^1 = 0$, then only the cost-function parameters $\theta$ solving (3.27) will be unique.

Theorem 3.6 implies conditions for the existence and uniqueness of exact solutions to the TS problem of Definition 3.2 regardless of the specific method of inverse optimal control employed (whether bilevel, based on the minimum principle, or any other method). Specifically, the constraint-satisfaction method (3.27) encodes the costate backward recursion (3.25) and the Hamiltonian gradient condition (3.26) of the combined finite- and infinite-horizon discrete-time minimum principle of Corollary 3.1 as constraints. Thus, when no solution to the constraint-satisfaction method (3.27) exists, which Theorem 3.6 implies occurs when $\bar\phi_C \bar\phi_C^+ \bar e_1 \neq \bar e_1$, there is no $\theta$ in $\Theta$ such that the states $x_{[0,\ell]}$ and controls $u_{[0,\ell]}$ satisfy the discrete-time minimum principle of Corollary 3.1. This observation is summarized in the following corollary.

Corollary 3.3 (Existence of Exact Solutions to Truncated-Sequence Problem of Definition 3.2) Suppose that Assumption 3.3 holds and that the parameter set $\Theta$ is given by (3.44). If $\bar\phi_C \bar\phi_C^+ \bar e_1 \neq \bar e_1$, then the sequence of states $x_{[0,\ell]}$ and associated controls $u_{[0,\ell]}$ do not constitute a (potentially local) optimal solution to either (3.4) or (3.5) for any $\theta \in \Theta$.

Proof Recall that the constraints in (3.27) encode both the conditions (3.25) and (3.26) of the discrete-time minimum principle of Corollary 3.1, and (3.27) simplifies to (3.65) under Assumption 3.3 with $\Theta$ given by (3.44). Thus, when no solution to (3.65) exists, which Theorem 3.6 implies occurs when $\bar\phi_C \bar\phi_C^+ \bar e_1 \neq \bar e_1$, there is no $\theta$ in $\Theta$ such that the states $x_{[0,\ell]}$ and controls $u_{[0,\ell]}$ satisfy the discrete-time minimum principle of Corollary 3.1. The corollary assertion follows by noting that satisfying the conditions of Corollary 3.1 is necessary in order for $x_{[0,\ell]}$ and $u_{[0,\ell]}$ to constitute an optimal solution to both (3.4) and (3.5).

Corollary 3.3 implies that the left-identity condition $\bar\phi_C \bar\phi_C^+ \bar e_1 = \bar e_1$ is necessary in order for the TS problem of Definition 3.2 to have an exact solution. In practice, the condition may not be satisfied when the basis functions $\bar g_k$, dynamics $f_k$, or parameter set $\Theta$ are misspecified, in which case it may be desirable to resort to finding an approximate optimality solution using either the soft or mixed methods of (3.28) and (3.29). We summarize the constraint-satisfaction method for truncated sequences (3.27) under Assumption 3.3 in Algorithm 3.4.

Algorithm 3.4 Constraint-Satisfaction Method for Truncated Sequences
Input: Truncated state and control sequences $x_{[0,\ell]}$ and $u_{[0,\ell]}$, dynamics $f_k$, basis functions $\bar g_k$ and $\bar F$, control-constraint set $U$, and parameter set $\Theta = \{\theta \in \mathbb{R}^q : \theta(1) = 1\}$.
Output: Computed cost-function parameters $\theta$.
1: Compute the matrix $\phi_C$ in (3.62).
2: Compute the augmented matrix $\bar\phi_C$ in (3.65).
3: Compute the pseudoinverse $\bar\phi_C^+$ of $\bar\phi_C$.
4: if $\bar\phi_C \bar\phi_C^+ \bar e_1 = \bar e_1$ then
5:   Compute the rank $\rho_C$ of $\bar\phi_C$.
6:   if $\rho_C = q + n$ then
7:     return Unique $\theta$ given by (3.69).
8:   else
9:     Compute $\bar U_C$ and $\bar U_C^1$ in (3.66).
10:    if $\bar U_C^1 = 0$ then
11:      return Unique $\theta$ given by (3.69).
12:    else
13:      return Any $\theta$ from (3.68) with any $b \in \mathbb{R}^{q+n}$.
14:    end if
15:  end if
16: else
17:  return No feasible exact solutions $\theta$ to the TS problem (cf. Corollary 3.3).
18: end if
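A NumPy sketch of the tests in Algorithm 3.4 follows, assuming $\phi_C$ has been stacked as in (3.62) (e.g., by the earlier sketch); the function name is ours.

```python
import numpy as np

def solve_ts_constraint_satisfaction(phi_C, q, n, tol=1e-9):
    """Sketch of Algorithm 3.4: solve (3.65) for theta and lambda_{ell+1}."""
    rows = phi_C.shape[0]
    phi_bar = np.vstack([np.eye(1, q + n), phi_C])   # prepend e_1^T, cf. (3.65)
    e1_bar = np.zeros(rows + 1)
    e1_bar[0] = 1.0
    phi_pinv = np.linalg.pinv(phi_bar)

    # Left-identity (existence) test of Theorem 3.6 / Corollary 3.3.
    if not np.allclose(phi_bar @ (phi_pinv @ e1_bar), e1_bar, atol=tol):
        return None, False
    sol = phi_pinv @ e1_bar          # [theta; lambda_{ell+1}], minimum norm
    theta = sol[:q]                  # theta = I_bar_C [theta; lambda], cf. (3.69)
    unique = np.linalg.matrix_rank(phi_bar, tol=tol) == q + n
    return theta, unique
```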
3.5.5.2 Truncated-Sequence Soft Method Solution Results
Let us now recall that the soft method for truncated sequences (3.28) can be reformulated as the constrained quadratic program (3.63) under Assumption 3.3.
In the following theorem, we incorporate the parameter set given by (3.44) into this reformulation, and characterize the existence and uniqueness of the cost-function parameters that are yielded by it. To present this theorem, let us define $\bar\phi_S$ as the principal submatrix of $\phi_S$ formed by deleting the first row and column of $\phi_S$. Let us also define $\mu_S$ as the first column of $\phi_S$ with its first element deleted. Similarly, let $\rho_S$ be the rank of $\bar\phi_S$, let $\bar\phi_S^+$ be the pseudoinverse of $\bar\phi_S$, and let $\bar\phi_S = \bar U_S \bar\Sigma_S \bar U_S^\top$ be an SVD of $\bar\phi_S$ where $\bar\Sigma_S \in \mathbb{R}^{(q+(\ell+2)n-1)\times(q+(\ell+2)n-1)}$ is a diagonal matrix, and

\[
\bar U_S \triangleq \begin{bmatrix} \bar U_S^{11} & \bar U_S^{12} \\ \bar U_S^{21} & \bar U_S^{22} \end{bmatrix} \in \mathbb{R}^{(q+(\ell+2)n-1)\times(q+(\ell+2)n-1)} \tag{3.70}
\]

is a block matrix with submatrices $\bar U_S^{11} \in \mathbb{R}^{(q-1)\times\rho_S}$, $\bar U_S^{12} \in \mathbb{R}^{(q-1)\times(q+(\ell+2)n-1-\rho_S)}$, $\bar U_S^{21} \in \mathbb{R}^{n(\ell+2)\times\rho_S}$, and $\bar U_S^{22} \in \mathbb{R}^{n(\ell+2)\times(q+(\ell+2)n-1-\rho_S)}$. Finally, let us define $\bar I_S \triangleq [I \;\; 0] \in \mathbb{R}^{q\times(q+(\ell+2)n)}$.

Theorem 3.7 (Solutions to Truncated-Sequence Soft Method) Suppose that Assumption 3.3 holds and that $\Theta$ is given by (3.44). If $(I - \bar\phi_S\bar\phi_S^+)\mu_S = 0$, then all of the parameter vectors $\theta \in \Theta$ corresponding to all solutions $(\theta, \lambda_{[0,\ell+1]})$ of the soft method (3.28) are of the form

\[
\theta = \bar I_S \beta_S \tag{3.71}
\]

where $\beta_S \triangleq [1 \;\; \bar\beta_S^\top]^\top \in \mathbb{R}^{q+(\ell+2)n}$ are (potentially nonunique) solutions to the quadratic program (3.63) with $\bar\beta_S \in \mathbb{R}^{q+(\ell+2)n-1}$ given by

\[
\bar\beta_S = -\bar\phi_S^+ \mu_S + \bar U_S \begin{bmatrix} 0 \\ b \end{bmatrix} \tag{3.72}
\]

for any $b \in \mathbb{R}^{q+(\ell+2)n-1-\rho_S}$. Furthermore, if either $\bar U_S^{12} = 0$ or $\rho_S = q + (\ell+2)n - 1$, then all solutions $(\theta, \lambda_{[0,\ell+1]})$ to the soft method (3.28) correspond to a single unique parameter vector given by

\[
\theta = \bar I_S \begin{bmatrix} 1 \\ -\bar\phi_S^+ \mu_S \end{bmatrix}. \tag{3.73}
\]
Proof We shall exploit the reformulation of the soft method for truncated sequences under Assumption 3.3 given in (3.63) with $\Theta$ given by (3.44). Any vector $\beta_S \in \mathbb{R}^{q+(\ell+2)n}$ that satisfies the constraint $\bar I_S \beta_S \in \Theta$ with $\Theta$ given by (3.44) can be written as $\beta_S = [1 \;\; \bar\beta_S^\top]^\top$ where $\bar\beta_S \in \mathbb{R}^{q+(\ell+2)n-1}$. Thus, for any $\beta_S \in \mathbb{R}^{q+(\ell+2)n}$ with $\bar I_S \beta_S \in \Theta$ and $\Theta$ given by (3.44), the quadratic objective function in (3.63) is

\[
\beta_S^\top \phi_S \beta_S = \begin{bmatrix} 1 & \bar\beta_S^\top \end{bmatrix} \phi_S \begin{bmatrix} 1 \\ \bar\beta_S \end{bmatrix} = \phi_{S,(1,1)} + \bar\beta_S^\top \bar\phi_S \bar\beta_S + 2\bar\beta_S^\top \mu_S.
\]

All solutions $\beta_S$ to (3.63) are thus of the form $\beta_S = [1 \;\; \bar\beta_S^\top]^\top$ where $\bar\beta_S \in \mathbb{R}^{q+(\ell+2)n-1}$ are solutions to the unconstrained quadratic program

\[
\inf_{\bar\beta_S} \; \frac{1}{2}\bar\beta_S^\top \bar\phi_S \bar\beta_S + \bar\beta_S^\top \mu_S.
\]

Under the condition that $(I - \bar\phi_S\bar\phi_S^+)\mu_S = 0$ and by noting that $\bar\phi_S$ is symmetric positive semidefinite due to $\phi_S$ being symmetric positive semidefinite, Proposition 2.1 gives that this unconstrained quadratic program is solved by any $\bar\beta_S$ satisfying

\[
\bar\beta_S = -\bar\phi_S^+ \mu_S + \bar U_S \begin{bmatrix} 0 \\ b \end{bmatrix}
\]

for any $b \in \mathbb{R}^{q+(\ell+2)n-1-\rho_S}$. The first theorem assertion (3.72) follows.

Now, to prove the second theorem assertion, we note that if $\bar U_S^{12} = 0$, then the vectors $\bar\beta_S$ in the solutions $\beta_S = [1 \;\; \bar\beta_S^\top]^\top$ to the constrained quadratic program (3.63) satisfy

\[
\bar\beta_S = -\bar\phi_S^+ \mu_S + \begin{bmatrix} \bar U_S^{11} & 0 \\ \bar U_S^{21} & \bar U_S^{22} \end{bmatrix} \begin{bmatrix} 0 \\ b \end{bmatrix} = -\bar\phi_S^+ \mu_S + \begin{bmatrix} 0 \\ \bar U_S^{22} b \end{bmatrix}
\]

for any $b \in \mathbb{R}^{q+(\ell+2)n-1-\rho_S}$ where $\bar U_S^{22} b \in \mathbb{R}^{n(\ell+2)}$. Clearly, if $\rho_S = q + n(\ell+2) - 1$, then we also have that $\bar\beta_S = -\bar\phi_S^+ \mu_S$. Thus, if either $\bar U_S^{12} = 0$ or $\rho_S = q + n(\ell+2) - 1$, then the first $q-1$ components of $\bar\beta_S$ are invariant with respect to the free vector $b \in \mathbb{R}^{q+n(\ell+2)-1-\rho_S}$, and so all solutions $\beta_S = [1 \;\; \bar\beta_S^\top]^\top$ of the constrained quadratic program (3.63) satisfy

\[
\bar I_S \beta_S = \bar I_S \begin{bmatrix} 1 \\ -\bar\phi_S^+ \mu_S \end{bmatrix}.
\]

The second theorem assertion follows since the parameters $\theta$ from (3.28) satisfy $\theta = \bar I_S \beta_S$. The proof is complete.

Theorem 3.7 describes the cost-function parameters yielded by the soft method for truncated sequences (3.28). Its assertions and proof directly mirror those of Theorem 3.4 for the whole-sequence soft method, and much of the discussion after Theorem 3.4 is therefore relevant here. In particular, Theorem 3.7 does not explicitly describe the costate vectors $\lambda_{[0,\ell+1]}$ computed by the soft method (3.28). However, by inspecting its proof, we see that if $\rho_S = q + (\ell+2)n - 1$ holds, then both the cost-function parameters $\theta$ and costate vectors $\lambda_{[0,\ell+1]}$ solving (3.28) will be unique. The condition $\bar U_S^{12} = 0$ can hold when $\rho_S < q + (\ell+2)n - 1$; however, in this case all pairs $(\theta, \lambda_{[0,\ell+1]})$ solving the soft method (3.28) will share the unique parameter vector $\theta$ given by (3.73) but may not share common costate vectors. We summarize the soft method for truncated sequences (3.28) under Assumption 3.3 in Algorithm 3.5.
Algorithm 3.5 Soft Method for Truncated Sequences
Input: Truncated state and control sequences $x_{[0,\ell]}$ and $u_{[0,\ell]}$, dynamics $f_k$, basis functions $\bar g_k$ and $\bar F$, control-constraint set $U$, and parameter set $\Theta = \{\theta \in \mathbb{R}^q : \theta(1) = 1\}$.
Output: Computed cost-function parameters $\theta$.
1: Compute the matrix $\phi_S$ via (3.63).
2: Compute the principal submatrix $\bar\phi_S$ from $\phi_S$ by deleting its first row and column.
3: Compute the vector $\mu_S$ from $\phi_S$ by extracting its first column without its first element.
4: Compute the pseudoinverse $\bar\phi_S^+$ of $\bar\phi_S$ and check that $(I - \bar\phi_S\bar\phi_S^+)\mu_S = 0$.
5: Compute the rank $\rho_S$ of $\bar\phi_S$.
6: if $\rho_S = q + (\ell+2)n - 1$ then
7:   return Unique $\theta$ given by (3.73).
8: else
9:   Compute $\bar U_S$ and $\bar U_S^{12}$ in (3.70) via SVD of $\bar\phi_S$.
10:  if $\bar U_S^{12} = 0$ then
11:    return Unique $\theta$ given by (3.73).
12:  else
13:    return Any $\theta$ from (3.71) with any $b \in \mathbb{R}^{q+(\ell+2)n-1-\rho_S}$.
14:  end if
15: end if

3.5.5.3 Truncated-Sequence Mixed Method Solution Results

We now turn our attention to the mixed method for truncated sequences (3.29). Under Assumption 3.3, the mixed method (3.29) reduces to the constrained quadratic program of (3.64). We shall exploit this reformulation to characterize the existence and uniqueness of the cost-function parameters that are yielded by the mixed method (3.29) when the parameter set is given by (3.44). Let us therefore consider (3.64) and define $\bar\phi_M$ as the principal submatrix of $\phi_M$ formed by deleting its first row and column. Let us also define $\mu_M$ as the first column of $\phi_M$ with its first element deleted, $\rho_M$ as the rank of $\bar\phi_M$, and $\bar\phi_M^+$ as the pseudoinverse of $\bar\phi_M$. Let us also let

\[
\bar\phi_M = \bar U_M \bar\Sigma_M \bar U_M^\top
\]

be an SVD of $\bar\phi_M$ where $\bar\Sigma_M \in \mathbb{R}^{(q+n-1)\times(q+n-1)}$ is a diagonal matrix, and

\[
\bar U_M = \begin{bmatrix} \bar U_M^{11} & \bar U_M^{12} \\ \bar U_M^{21} & \bar U_M^{22} \end{bmatrix} \in \mathbb{R}^{(q+n-1)\times(q+n-1)} \tag{3.74}
\]

is a block matrix with submatrices $\bar U_M^{11} \in \mathbb{R}^{(q-1)\times\rho_M}$, $\bar U_M^{12} \in \mathbb{R}^{(q-1)\times(q+n-1-\rho_M)}$, $\bar U_M^{21} \in \mathbb{R}^{n\times\rho_M}$, and $\bar U_M^{22} \in \mathbb{R}^{n\times(q+n-1-\rho_M)}$. Finally, let $\bar I_M \triangleq [I \;\; 0] \in \mathbb{R}^{q\times(q+n)}$.
Theorem 3.8 (Solutions to Truncated-Sequence Mixed Method) Suppose that Assumption 3.3 holds and that $\Theta$ is given by (3.44). If $(I - \bar\phi_M\bar\phi_M^+)\mu_M = 0$, then all of the parameter vectors $\theta \in \Theta$ corresponding to all solutions $(\theta, \lambda_{\ell+1})$ of the mixed method (3.29) are of the form

\[
\theta = \bar I_M \beta_M \tag{3.75}
\]

where $\beta_M = [1 \;\; \bar\beta_M^\top]^\top \in \mathbb{R}^{q+n}$ are (potentially nonunique) solutions to the quadratic program (3.64) with $\bar\beta_M \in \mathbb{R}^{q+n-1}$ given by

\[
\bar\beta_M = -\bar\phi_M^+ \mu_M + \bar U_M \begin{bmatrix} 0 \\ b \end{bmatrix} \tag{3.76}
\]

for any $b \in \mathbb{R}^{q+n-1-\rho_M}$. Furthermore, if either $\bar U_M^{12} = 0$ or $\rho_M = q + n - 1$, then all of the solutions $(\theta, \lambda_{\ell+1})$ to the mixed method (3.29) correspond to the single unique parameter vector $\theta \in \Theta$ given by

\[
\theta = \bar I_M \begin{bmatrix} 1 \\ -\bar\phi_M^+ \mu_M \end{bmatrix}. \tag{3.77}
\]
Proof The mixed method (3.29) reduces to the constrained quadratic program (3.64) under Assumption 3.3, and so it suffices to analyze the solutions to (3.64) with the parameter set given by (3.44). For any $\beta_M \in \mathbb{R}^{q+n}$ with $\bar I_M \beta_M \in \Theta$, we have that $\beta_M = [1 \;\; \bar\beta_M^\top]^\top$ where $\bar\beta_M \in \mathbb{R}^{q+n-1}$, and so

\[
\beta_M^\top \phi_M \beta_M = \begin{bmatrix} 1 & \bar\beta_M^\top \end{bmatrix} \phi_M \begin{bmatrix} 1 \\ \bar\beta_M \end{bmatrix} = \phi_{M,(1,1)} + \bar\beta_M^\top \bar\phi_M \bar\beta_M + 2\bar\beta_M^\top \mu_M
\]

where $\phi_{M,(1,1)}$ is the first element of $\phi_M$. All solutions to the constrained quadratic program (3.64) with $\Theta$ given by (3.44) are therefore of the form $\beta_M = [1 \;\; \bar\beta_M^\top]^\top$ where $\bar\beta_M \in \mathbb{R}^{q+n-1}$ are solutions to the unconstrained quadratic program

\[
\inf_{\bar\beta_M} \; \frac{1}{2}\bar\beta_M^\top \bar\phi_M \bar\beta_M + \bar\beta_M^\top \mu_M.
\]

Under the condition that $(I - \bar\phi_M\bar\phi_M^+)\mu_M = 0$ and by noting that $\bar\phi_M$ is symmetric positive semidefinite due to $\phi_M$ being symmetric positive semidefinite, Proposition 2.1 gives that this unconstrained quadratic program is solved by any $\bar\beta_M$ satisfying

\[
\bar\beta_M = -\bar\phi_M^+ \mu_M + \bar U_M \begin{bmatrix} 0 \\ b \end{bmatrix}
\]

for any $b \in \mathbb{R}^{q+n-1-\rho_M}$. The first theorem assertion (3.75) follows.

To prove the second theorem assertion, we note that if $\bar U_M^{12} = 0$, then

\[
\bar\beta_M = -\bar\phi_M^+ \mu_M + \begin{bmatrix} \bar U_M^{11} & 0 \\ \bar U_M^{21} & \bar U_M^{22} \end{bmatrix} \begin{bmatrix} 0 \\ b \end{bmatrix} = -\bar\phi_M^+ \mu_M + \begin{bmatrix} 0 \\ \bar U_M^{22} b \end{bmatrix}
\]

for any $b \in \mathbb{R}^{q+n-1-\rho_M}$ where $\bar U_M^{22} b \in \mathbb{R}^n$. Clearly, if $\rho_M = q + n - 1$, then we also have that $\bar\beta_M = -\bar\phi_M^+ \mu_M$. Thus, if either $\bar U_M^{12} = 0$ or $\rho_M = q + n - 1$, then the first $q-1$ components of $\bar\beta_M$ are invariant with respect to the free vector $b \in \mathbb{R}^{q+n-1-\rho_M}$, and so all solutions $\beta_M = [1 \;\; \bar\beta_M^\top]^\top$ of the constrained quadratic program (3.64) satisfy

\[
\bar I_M \beta_M = \bar I_M \begin{bmatrix} 1 \\ -\bar\phi_M^+ \mu_M \end{bmatrix}.
\]

The second theorem assertion follows since $\theta = \bar I_M \beta_M$. The proof is complete.
The second theorem assertion follows since θ = I¯M β M . The proof is complete. Theorem 3.8 describes the cost-function parameters yielded by the mixed method for truncated sequences (3.29) without consideration of the costate vectors. Its assertions and proof are similar to those of Theorems 3.4 and 3.7 for the soft methods. Aspects of the discussions after Theorems 3.4 and 3.7 are therefore also relevant here. For example, Theorem 3.8 does not explicitly describe the terminal costate vector λ+1 computed by the mixed method (3.29). However, by inspecting its proof, we see that if ρ M = q + n − 1 holds, then both the cost-function parameters θ and costate vector λ+1 solving (3.29) will be unique since the first assertion of Theorem 3.8, specifically (3.76), implies that the free vector b will be zero-dimensional when 12 = 0 can hold when ρ M = q + n − 1, however ρ M = q + n − 1. The condition U¯ M in these cases all pairs (θ, λ+1 ) solving the mixed method (3.29) will share the unique parameter vector θ given by (3.77) but may not share common costate vectors. 12 = 0 and ρ M = q + n − 1 will not hold when the inverse Again, the conditions U¯ M optimal control problem is ill-posed and they both therefore have an intuitive interpretation as persistence of excitation conditions for the mixed method (3.29). While we note that these conditions for the mixed method may hold in cases where the conditions of Theorem 3.6 and Corollary 3.3 for the constraint-satisfaction method do not, in these cases the mixed method will yield parameters under which the sequences are approximately optimal rather than exactly optimal (in the sense discussed in Sect. 3.2). We summarize the mixed method for truncated sequences (3.29) under Assumption 3.3 in Algorithm 3.6.
3.6 Inverse Linear-Quadratic Optimal Control in Discrete Time

We have so far considered methods of discrete-time inverse optimal control in this chapter for potentially nonlinear dynamics (as described by (3.1)), and for general (potentially nonquadratic) cost functions (as described by (3.2) or (3.3)). In this section, we explore an approach to solving the TS problem of Definition 3.2 that becomes tractable in the case of linear system dynamics (i.e., when $f_k(x_k, u_k) = Ax_k + Bu_k$ for matrices $A$ and $B$), quadratic basis functions (i.e., when $g(x_k, u_k, \theta) = x_k^\top Q x_k + u_k^\top R u_k$ with the matrices $Q$ and $R$ constituting the parameters $\theta$), and an infinite horizon $T = \infty$.
3.6.1 Overview of the Approach

To present the approach for inverse LQ optimal control, we note that standard LQ (forward) optimal control results (e.g., [2]) imply that the states and controls solving the (forward) infinite-horizon LQ optimal control problem satisfy the feedback relationship $u_k = -Kx_k$ for some optimal feedback (control) law taking the form of the matrix $K \in \mathbb{R}^{m\times n}$. With this observation, (infinite-horizon) discrete-time inverse LQ optimal control can be approached via a two-step process involving

1. estimation of the feedback law $K$ from given state and control sequences; and
2. a search to find cost matrices $Q$ and $R$ (i.e., parameters $\theta$) such that the estimated feedback law $K$ (and hence the given state and control sequences) constitutes an optimal solution to an infinite-horizon LQ optimal control problem.

Importantly, the first step of this approach has the potential to exploit well-known techniques for estimating feedback laws $K$ from input–output data (e.g., via linear least-squares estimation and other techniques from system identification [18]). Similarly, the second step is essentially the original notion of inverse optimal control introduced by Kalman in [14] (albeit in continuous time). Motivated by the connections between this approach and the rich literature of system identification and feedback-law-based inverse optimal control in control theory, in this section we shall formalize and treat it in detail.
3.6.2 Preliminary LQ Optimal Control Concepts

We begin by specializing concepts from Sect. 3.1 to the LQ setting.

3.6.2.1 Parameterized LQ Optimal Control in Discrete Time
Let us consider linear dynamics described by the linear difference equations

\[
x_{k+1} = Ax_k + Bu_k, \quad x_0 \in \mathbb{R}^n, \tag{3.78}
\]

for $k \ge 0$ where $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{n\times m}$ are time-invariant system matrices. Let us also consider an infinite-horizon quadratic cost function of the form

\[
V_\infty(x_{[0,\infty]}, u_{[0,\infty]}, Q, R) = \frac{1}{2}\sum_{k=0}^{\infty} \left( x_k^\top Q x_k + u_k^\top R u_k \right) \tag{3.79}
\]

where $Q \succeq 0$ and $R \succ 0$ constitute (cost) matrices that parameterize the cost function. Given the linear dynamics (3.78) and the quadratic cost function (3.79), the corresponding (forward) discrete-time infinite-horizon LQ optimal control problem is

\[
\begin{aligned}
\inf_{u_{[0,\infty]}} \quad & V_\infty\big(x_{[0,\infty]}, u_{[0,\infty]}, Q, R\big) \\
\text{s.t.} \quad & x_{k+1} = Ax_k + Bu_k, \; k \ge 0, \\
& x_0 = \bar{x}.
\end{aligned} \tag{3.80}
\]

We note that there are no constraints on the controls $u_k$. We shall next introduce an algebraic Riccati equation (which can be derived from discrete-time minimum principles) that describes necessary and sufficient conditions for the solution of this LQ optimal control problem, and which we shall use to develop inverse methods.
3.6.2.2 Necessary and Sufficient Conditions for Feedback Solutions
In order to present an algebraic Riccati equation (ARE) characterizing the solutions to (3.80), we shall restrict our attention to controls of the form

\[
u_k = -Kx_k, \tag{3.81}
\]

where the feedback law $K \in \mathbb{R}^{m\times n}$ is such that the closed-loop system matrix

\[
F \triangleq A - BK \tag{3.82}
\]

is stable, meaning that $x_{k+1} = Fx_k$ tends to zero as $k \to \infty$. In other words, we shall restrict our attention to stabilizing linear feedback laws $K$ that belong to the set

\[
\mathcal{F} = \{K \in \mathbb{R}^{m\times n} : F \text{ is stable}\}. \tag{3.83}
\]

The set $\mathcal{F}$ is non-empty if the pair $(A, B)$ is stabilizable.⁵ With this restriction, the cost function in (3.80) can be viewed and written as a function of the initial state and the feedback law, i.e., $V_\infty(K, x_0, Q, R)$, and we have the following theorem which gives necessary and sufficient conditions for optimal feedback laws.

Theorem 3.9 (Discrete-Time ARE) Consider the infinite-horizon LQ optimal control problem (3.80). A feedback law $K \in \mathcal{F}$ is optimal, i.e., $V_\infty(K, x_0, Q, R) \le V_\infty(\bar K, x_0, Q, R)$ for all $\bar K \in \mathcal{F}$ and all $x_0 \in \mathbb{R}^n$, if and only if

\[
K = \big(R + B^\top P B\big)^{-1} B^\top P A \tag{3.84}
\]

where $P \in \mathbb{R}^{n\times n}$ is a stabilizing solution⁶ to the (discrete-time) ARE

\[
P = A^\top\big(P - PB(B^\top P B + R)^{-1} B^\top P\big)A + Q. \tag{3.85}
\]

Furthermore, if $K$ is optimal, then $V_\infty(K, x_0, Q, R) = x_0^\top P x_0$.

Proof The sufficiency result is known from classical optimal control theory (e.g., see [5, Proposition 4.4.1] or [16, Theorem 16.6.4]). The necessity part can be proved using matrix differential calculus and Lyapunov theory analogously to the proof of [7, Theorem 2].

⁵ Besides stabilizability, detectability also plays a fundamental role in LQ optimal control. We refer to textbooks (e.g., [2] or [4, Sect. 5.5]) for further information on stabilizability and detectability as well as the stronger notions of controllability and observability.
⁶ That is, $P$ solves (3.85) and leads to a feedback law $K$ of the form in (3.84) belonging to $\mathcal{F}$.

Having presented the preliminary concepts of (forward) LQ optimal control problems (3.80), and the notion of an ARE as optimality conditions (cf. Theorem 3.9), we next introduce a feedback-law-based inverse LQ optimal control problem.
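Theorem 3.9 can be checked numerically with standard Riccati solvers. The following sketch uses SciPy's solve_discrete_are on an illustrative system of our choosing to compute a stabilizing $P$, the optimal feedback law (3.84), and to verify the ARE in the rewritten form (3.89) below; the example values are ours.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Example system (values are illustrative only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.diag([1.0, 0.5])   # Q positive semidefinite
R = np.array([[0.2]])     # R positive definite

# Stabilizing solution of the discrete-time ARE (3.85).
P = solve_discrete_are(A, B, Q, R)

# Optimal feedback law (3.84) and closed-loop matrix (3.82).
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
F = A - B @ K
assert np.all(np.abs(np.linalg.eigvals(F)) < 1.0)   # F is stable

# Residual of the ARE written as P = A'PA - A'PB(R + B'PB)^{-1}B'PA + Q.
residual = A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A) + Q - P
assert np.allclose(residual, 0.0, atol=1e-8)
```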
3.6.3 Feedback-Law-Based Inverse LQ Optimal Control

We now consider the second step of the two-step process described in Sect. 3.6.1 for solving inverse LQ optimal control problems. Specifically, we pose and solve the problem of inverse optimal control given a feedback law $K$ (e.g., after it has been estimated from system states and controls).
3.6.3.1 Feedback-Law-Based Problem Formulation
The feedback-law-based inverse optimal control problem is defined as follows.

Definition 3.4 (Feedback-Law-Based (FLB) Problem) Consider the parameterized discrete-time infinite-horizon LQ optimal control problem (3.80) with linear dynamics (3.78) and quadratic cost function (3.79). Given system matrices $A$ and $B$, and a feedback law $K$, the feedback-law-based discrete-time infinite-horizon inverse LQ optimal control problem is to compute the cost-function parameters (i.e., the entries of the matrices $Q$ and $R$) such that $K$ (and hence the controls given by $u_k = -Kx_k$) constitutes an optimal solution to (3.80).

To solve the FLB problem of Definition 3.4, we shall exploit the discrete-time ARE established in Theorem 3.9. Indeed, we shall interpret Theorem 3.9 as stating conditions that the unknown cost function matrices $Q$ and $R$ (and the Riccati solution $P$) must satisfy in order for the feedback law $K$ to constitute an optimal solution to (3.80). We begin by reformulating (3.85).
3.6.3.2 Reformulation of Algebraic Riccati Equation
In light of Theorem 3.9, for $K$ to constitute an optimal solution to (3.80), matrices $P$ and $R$ must exist such that $K$ is of the form given in (3.84). That is, $P$ and $R$ must exist such that

\[
\begin{aligned}
\big(R + B^\top P B\big) K &= B^\top P A \\
R K &= B^\top P (A - BK) \\
R K &= B^\top P F,
\end{aligned} \tag{3.86}
\]

where we used the definition of the closed-loop system matrix (3.82). We shall proceed by vectorizing the last equation of (3.86). In order to do so, let us define the notation $\mathrm{vec}(X) \in \mathbb{R}^{pq\times 1}$ to mean the (column) vector formed from an arbitrary matrix $X \in \mathbb{R}^{p\times q}$ by stacking its columns. Let us also use $X \otimes Y$ to denote the Kronecker product between arbitrary matrices $X$ and $Y$ (see [6] for a review of these concepts). With this notation, we obtain

\[
\big(K^\top \otimes I_m\big)\,\mathrm{vec}(R) = \big(F^\top \otimes B^\top\big)\,\mathrm{vec}(P), \tag{3.87}
\]

where we used the equivalence

\[
\mathrm{vec}(XYZ) = \big(Z^\top \otimes X\big)\,\mathrm{vec}(Y) \tag{3.88}
\]

which holds for any matrices $X$, $Y$, and $Z$ with suitable dimensions (cf. [6]).

We now turn our attention to the ARE (3.85) and the conditions on the matrix $Q$. We rewrite (3.85) as

\[
P = A^\top P A - A^\top P B \big(R + B^\top P B\big)^{-1} B^\top P A + Q \tag{3.89}
\]

which, using (3.84), leads to

\[
\begin{aligned}
P &= A^\top P (A - BK) + Q \\
P &= A^\top P F + Q
\end{aligned} \tag{3.90}
\]

and, by vectorizing the last equation of (3.90), we get

\[
\begin{aligned}
\mathrm{vec}(P) &= \big(F^\top \otimes A^\top\big)\,\mathrm{vec}(P) + \mathrm{vec}(Q) \\
\mathrm{vec}(P) &= \big(I_{n^2} - (F^\top \otimes A^\top)\big)^{-1}\mathrm{vec}(Q),
\end{aligned} \tag{3.91}
\]

provided the inverse exists. We now insert (3.91) into (3.87), which results in

\[
\big(F^\top \otimes B^\top\big)\big(I_{n^2} - (F^\top \otimes A^\top)\big)^{-1}\mathrm{vec}(Q) - \big(K^\top \otimes I_m\big)\,\mathrm{vec}(R) = 0. \tag{3.92}
\]

Therefore, we obtain

\[
\bar{W}\bar{\theta} = 0, \tag{3.93}
\]

where $\bar{W} \in \mathbb{R}^{nm\times L}$ with $L \triangleq n^2 + m^2$ is given by

\[
\bar{W} = \begin{bmatrix} \big(F^\top \otimes B^\top\big)\big(I_{n^2} - (F^\top \otimes A^\top)\big)^{-1} & -\big(K^\top \otimes I_m\big) \end{bmatrix} \tag{3.94}
\]

and $\bar{\theta} \in \mathbb{R}^L$ contains all entries of the cost function matrices, i.e.,

\[
\bar{\theta} = \begin{bmatrix} \mathrm{vec}(Q) \\ \mathrm{vec}(R) \end{bmatrix}. \tag{3.95}
\]

Importantly, the matrix $\bar{W}$ does not depend on the unknown matrices $P$, $Q$, or $R$, and can be computed with only the (given) feedback law $K$ and system matrices $A$ and $B$. Therefore, (3.93) forms a homogeneous system of linear equations that can be solved for the cost-function parameters $\bar{\theta}$ (i.e., the matrices $Q$ and $R$). Before we exploit (3.93) to present a method for solving the FLB problem of Definition 3.4, we shall first leverage it to analyze the existence of (exact) solutions to the FLB problem (in a similar vein to the more general results of Corollary 3.3).
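The matrix $\bar W$ of (3.94) is straightforward to assemble with Kronecker products. A sketch (the function name is ours):

```python
import numpy as np

def build_W_bar(A, B, K):
    """Sketch of (3.94): build W_bar from the system matrices and feedback law."""
    n, m = B.shape
    F = A - B @ K                                  # closed-loop matrix (3.82)
    M = np.linalg.inv(np.eye(n * n) - np.kron(F.T, A.T))
    left = np.kron(F.T, B.T) @ M                   # coefficient of vec(Q)
    right = -np.kron(K.T, np.eye(m))               # coefficient of vec(R)
    # For diagonal Q and R, one would additionally keep only the columns
    # matching the diagonal entries of vec(Q) and vec(R) to obtain the
    # reduced matrix W in (3.96) below.
    return np.hstack([left, right])                # W_bar in R^{nm x (n^2 + m^2)}
```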
3.6.3.3 Existence of Exact Solutions to Feedback-Law-Based Problem
The properties of the system of linear equations in (3.93), and its solutions, depend on the dimensions of $\bar{W}$, and ultimately on its rank or kernel. The dimensions of $\bar{W}$ vary with the number of states $n$ and the number of controls $m$, as well as with any assumptions on the number of unknown elements in the cost function matrices (i.e., the assumed structure of the matrices $Q$ and $R$). To analyze the dimensions of $\bar{W}$, let $q > 0$ denote the number of nonredundant parameters of the cost function (i.e., the nonredundant elements of $\bar{\theta}$).⁷ If $Q$ and $R$ are diagonal matrices (i.e., only their diagonal elements are unknown), then $q = n + m$. If $Q$ and $R$ are symmetric matrices (i.e., $Q = Q^\top$ and $R = R^\top$), then

\[
q = \frac{1}{2}\big(n(n+1) + m(m+1)\big).
\]

In the most general case of every element of $Q$ and $R$ being unknown, $q = n^2 + m^2 = L$. Hence, $q \le L$ always holds. By omitting redundant elements of $Q$ and $R$ from the parameter vector $\bar{\theta}$, the reformulation of the ARE in (3.93) can be rewritten as

\[
W\theta = 0, \tag{3.96}
\]

with the reduced parameter vector $\theta \in \mathbb{R}^q$ containing only the nonredundant and nonzero elements of $\bar{\theta}$, and the matrix $W \in \mathbb{R}^{nm\times q}$ being an appropriately modified version of $\bar{W}$ such that (3.96) holds. For example, if $\bar{\theta}$ has elements equal to zero due to diagonal cost function matrices, then these are deleted together with the corresponding columns of $\bar{W}$ in order to obtain $\theta$ and $W$ in (3.96).

From (3.96), we can see that the set of parameters $\theta$ solving the FLB problem of Definition 3.4 corresponds to the kernel of the matrix $W$, i.e.,

\[
\ker(W), \tag{3.97}
\]

with convex boundaries representing $Q \succ 0$ and $R \succ 0$. If the kernel of $W$ does not exist, then there are no feasible solutions to the FLB problem. The existence of the kernel of $W$ depends on the dimensions of $W$ (or on $\mathrm{rank}(W)$). A sufficient condition for the kernel to exist (i.e., not be the empty set $\emptyset$) is that $nm < q$ (i.e., that the system of equations (3.96) has fewer equations than unknowns). This condition always holds if the cost function matrices $Q$ and $R$ are symmetric since

\[
nm \le \frac{1}{2}\big(n^2 + m^2\big) < \frac{1}{2}\big(n(n+1) + m(m+1)\big)
\]

for all $n, m > 0$. However, if the matrices are diagonal, then combinations of $n$ and $m$ exist for which $nm \ge n + m$ and the kernel may not exist (so that no solutions to the FLB problem exist).

⁷ We note that this corresponds to choosing basis functions according to Assumption 3.3, which are additionally nonredundant and nontrivial. The concepts of nonredundant and nontrivial basis functions were not discussed in the context of Assumption 3.3 since they were not necessary for the development and analysis of the minimum-principle methods in Sect. 3.4, though they may be of practical concern.
91
We recall that the fulfillment of (3.96) (and hence the existence of the kernel of W ) demands exact optimality of K under some (unknown) infinite-horizon LQ optimal control problem of the form in (3.80). However, situations may arise in which the given control law K does not solve any infinite-horizon LQ optimal control problem of the form in (3.80), and (3.96) ceases to be of much use. We next therefore develop a method which is always solvable (independent of the values of n and m) and enables the solution of the FLB problem of Definition 3.4 in the same approximate optimality sense described in Sect. 3.2.
3.6.3.4
Feedback-Law-Based Inverse LQ Optimal Control Method
The method for solving the FLB problem of Definition 3.4 that we consider involves minimizing the violation of the conditions for LQ optimal control solutions by encoding (3.96) as an objective function. In particular, the FLB method is defined by the quadratic program min θ
s.t.
1 θ Ωθ, 2 Q0
(3.98)
R0 where Ω 2(W W ) ∈ Rq×q , and where Q and R are cost matrices that result from the parameter vector θ (as in (3.96), θ contains the nonredundant and nonzero elements of Q and R). By seeking to minimize the violation of the conditions of Theorem 3.9, the FLB method (3.98) is thus similar in spirit to the soft and mixed methods of (general nonlinear nonquadratic) inverse optimal control we presented in Sect. 3.4. We note that the matrix Q is constrained in (3.98) to be positive definite instead of positive semidefinite in order to avoid trivial solutions. Alternatively, this constraint could be relaxed to include positive semidefinite matrices Q and an additional constraint that one element of Q or R is nonzero and positive introduced to avoid trivial solutions (as in the parameter set defined in (3.44)). We now turn our attention to analyzing the solutions to (3.98).
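The quadratic program (3.98) is a standard conic program. One possible realization for symmetric $Q$ and $R$ uses the cvxpy modeling package, where we approximate the strict constraints $Q \succ 0$ and $R \succ 0$ by $Q \succeq \epsilon I$ and $R \succeq \epsilon I$ for a small $\epsilon > 0$; this numerical surrogate and the helper name are our choices, not part of the method's definition.

```python
import cvxpy as cp
import numpy as np

def solve_flb_qp(W_bar, n, m, eps=1e-6):
    """Sketch of the FLB method (3.98) for fully parameterized symmetric Q and R.

    W_bar has n^2 + m^2 columns ordered as [vec(Q); vec(R)], cf. (3.94)-(3.95).
    Note that ||W theta||^2 equals (1/2) theta' Omega theta with Omega = 2 W'W.
    """
    Q = cp.Variable((n, n), symmetric=True)
    R = cp.Variable((m, m), symmetric=True)
    theta = cp.hstack([cp.vec(Q), cp.vec(R)])       # parameter vector as in (3.95)
    objective = cp.Minimize(cp.sum_squares(W_bar @ theta))
    constraints = [Q >> eps * np.eye(n), R >> eps * np.eye(m)]
    cp.Problem(objective, constraints).solve()
    return Q.value, R.value
```

Since scaled matrices $CQ$ and $CR$ are equally optimal, the returned matrices are only meaningful up to the positive scaling factor discussed in Theorem 3.10 below.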
3.6.3.5 Solution Results
The first solution result we consider concerns the existence of solutions to (3.98).

Proposition 3.2 Under the conditions of the FLB problem of Definition 3.4, the quadratic program (3.98) is convex and at least one solution is guaranteed to exist.

Proof The constraint sets defined by $Q \succ 0$ and $R \succ 0$ are convex, and therefore, their intersection is also convex. Furthermore, the objective function in (3.98) is convex since $\Omega = 2W^\top W \succeq 0$. Hence, (3.98) is convex and therefore always has a solution.

The results of Proposition 3.2 are not surprising when solutions to (3.96) exist. However, they have significance in non-ideal settings where the given feedback law $K$ does not constitute an exact solution to any infinite-horizon LQ optimal control problem due to misspecified system matrices or misspecified cost function matrix structures. In these cases, Proposition 3.2 guarantees the existence of a solution in an approximate optimality sense.

The next result we consider concerns the uniqueness of solutions to (3.98). We note that here we consider uniqueness up to a nonzero scaling factor since, as in the general inverse optimal control case, if $K$ is an optimal feedback law under (3.80) with the cost function matrices $Q$ and $R$, then it is also optimal under (3.80) with the matrices $CQ$ and $CR$ for all arbitrary nonzero scaling factors $C > 0$.

Theorem 3.10 (Necessary and Sufficient Conditions for Unique Solutions) Consider the FLB problem of Definition 3.4 and suppose that it is solved by $Q$ and $R$ matrices with elements from the vector $\theta$. Then the set of all solutions to the proposed method of discrete-time infinite-horizon inverse LQ optimal control defined by (3.98) is

\[
\{C\theta : C > 0\} \tag{3.99}
\]

if and only if $nm \ge q - 1$ together with $\mathrm{rank}(W) = q - 1$.

Proof We first state that $nm \ge q - 1$ is a necessary condition for unique solutions since $nm < q - 1$ leads to a higher dimensional solution set according to (3.96) and (3.97). Since the ARE (3.85) is fulfilled if and only if $W\theta = 0$, or equivalently $\|W\theta\|^2 = 0$, we proceed to analyze (3.96). Under the theorem condition $\mathrm{rank}(W) = q - 1$, we have $\dim(\ker(W)) = 1$ which implies a one-dimensional solution set of the form (3.99), proving the sufficiency result. The case $\mathrm{rank}(W) < q - 1$ leads to kernels and solution sets with a dimension greater than 1 and is therefore excluded. Therefore, only the case $\mathrm{rank}(W) = q$ remains, which we analyze using (3.98). If $\mathrm{rank}(W) = q$, which is only possible if $nm \ge q$, then we obtain $\Omega \succ 0$ and thus (3.98) is strictly convex. Strict convexity leads to a unique solution of (3.98) and therefore also a unique solution of the ARE (3.85). But the latter contradicts the ill-posedness up to a constant factor which is always present in inverse optimal control problems. Hence, we conclude that $\mathrm{rank}(W) = q - 1$ is also necessary and the proof is complete.

Theorem 3.10 gives necessary and sufficient conditions for the solution set of (3.98) to be one-dimensional, i.e., for the parameters $\theta$ to be unique up to a nonzero positive factor $C$. It additionally shows how the characteristics of the optimal control problem, i.e., the number of assumed basis functions $q$ as well as the number of states $n$ and controls $m$, have an influence on the possible set of solutions of the inverse problem.
3.6.4 Estimation of Feedback Laws

In the last subsection, we developed a method for solving the feedback-law-based problem of Definition 3.4 in which a feedback law $K$ is assumed given. However, as discussed in Sect. 3.6.1, in practice (and in the truncated-sequence problem of Definition 3.2), we may only have access to observations of state and control trajectories over a time interval $[0, \ell]$. Hence, in order to utilize the FLB method (3.98), we first need to identify the feedback law $K$ from data.

For the discrete-time LQ case at hand, the (assumed) linear feedback relationship (3.81) between the states, controls, and feedback law $K$ implies that $K$ can be estimated simply by employing linear least-squares estimation techniques. That is, given (finite) sequences of states and controls $x_{[0,\ell]}$ and $u_{[0,\ell]}$, the feedback matrix can be estimated by means of solving the linear least-squares estimation problem

\[
K = \arg\min_{\bar K} \sum_{k=0}^{\ell} \big\|\bar K x_k + u_k\big\|^2 \tag{3.100}
\]

which has the closed-form solution

\[
K = -U_{[0,\ell]}^\top X_{[0,\ell]} \big(X_{[0,\ell]}^\top X_{[0,\ell]}\big)^{-1} \tag{3.101}
\]

where $X_{[0,\ell]} \in \mathbb{R}^{(\ell+1)\times n}$ and $U_{[0,\ell]} \in \mathbb{R}^{(\ell+1)\times m}$ denote the observed sequences of available state and control values arranged in matrices, respectively. Least-squares estimation theory states that the parameters (in this case, the entries of $K$) can be recovered if the persistence of excitation conditions are fulfilled [3, Sect. 2.4]. These conditions demand that the state and control trajectories are "informative" enough and are not identical to zero.
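In code, (3.100) reduces to an ordinary least-squares problem. A sketch with NumPy (the function name is ours):

```python
import numpy as np

def estimate_feedback_law(X, U):
    """Sketch of (3.100)-(3.101): least-squares fit of u_k = -K x_k.

    X holds the observed states row-wise (shape (ell+1, n)) and U the
    observed controls row-wise (shape (ell+1, m)).
    """
    # min over K of ||X K^T + U||_F^2, solved column-wise by lstsq.
    K_T, *_ = np.linalg.lstsq(X, -U, rcond=None)
    return K_T.T  # equals -U^T X (X^T X)^{-1} when X has full column rank

# Illustrative usage: generate closed-loop data x_{k+1} = (A - B K_true) x_k,
# u_k = -K_true x_k from a nonzero initial state, then call the function above.
```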
3.6.5 Inverse LQ Optimal Control Method

We summarize the two-step approach introduced in this section, namely the estimation of the feedback law and the solution of the feedback-law-based method of discrete-time infinite-horizon inverse LQ optimal control, in Algorithm 3.7.

Algorithm 3.7 Discrete-Time Infinite-Horizon Inverse LQ Optimal Control
Input: Truncated state and control sequences $x_{[0,\ell]}$ and $u_{[0,\ell]}$, system matrices $A$ and $B$.
Output: Computed cost-function parameters $\theta$ (i.e., nonredundant elements of $Q$ and $R$).
1: Estimate $K$ using least-squares (i.e., (3.100)) and determine the corresponding closed-loop system matrix $F$ with (3.82).
2: Compute $\bar W$ with (3.94).
3: Modify $\bar W$ to form $W$ so as to fulfill (3.96) with unknown parameters $\theta$.
4: Solve the quadratic program (3.98) for $\theta$.
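Pulling the pieces together, Algorithm 3.7 can be sketched end-to-end by composing the helpers outlined earlier in this section (estimate_feedback_law, build_W_bar, and solve_flb_qp, all hypothetical names of ours):

```python
def inverse_lq_optimal_control(X, U, A, B):
    """End-to-end sketch of Algorithm 3.7 (assumes the helpers sketched above)."""
    K = estimate_feedback_law(X, U)    # Step 1: least-squares fit (3.100)
    W_bar = build_W_bar(A, B, K)       # Step 2: assemble W_bar via (3.94)
    n, m = B.shape
    # Step 3: here we keep all entries of Q and R, i.e., W = W_bar in (3.96).
    return solve_flb_qp(W_bar, n, m)   # Step 4: solve the quadratic program (3.98)
```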
3.7 Notes and Further Reading

Bilevel and Other Methods: Bilevel methods of discrete-time inverse optimal control appear to have never been explicitly proposed. Similar methods do, however, have a long history in the economics literature for systems with finite numbers of
states and/or controls (cf. [9, 10, 28] and references therein). The idea of solving discrete-time inverse optimal control problems without resorting to bilevel methods and instead using Karush–Kuhn–Tucker (KKT) conditions appears to have been originally developed in [15], and extended in many publications since, including [1, 17, 24, 27, 29]. Preliminary characterizations of the existence and uniqueness of solutions yielded by these KKT approaches are provided in [15, 27, 29].

Minimum-Principle Methods: Minimum-principle methods of discrete-time inverse optimal control have emerged recently as alternatives to both bilevel and KKT methods [19–22]. Despite it being possible to derive (finite-horizon) discrete-time minimum principles from KKT conditions (cf. [8]), direct use of discrete-time minimum principles for inverse optimal control enables exploitation of control-theoretic concepts such as Hamiltonians and costate recursions [19–22]. As we have seen in this chapter, by exploiting control-theoretic concepts such as Hamiltonian functions and costate recursions, minimum-principle methods can be formulated for a variety of finite- and infinite-horizon discrete-time inverse optimal control problems. When the cost functions are linearly parameterized, we have also shown in Sect. 3.5 that minimum-principle methods can be implemented with straightforward matrix operations and have easily accessible results describing the existence and uniqueness of the cost-function parameters they yield. The application of discrete-time minimum principles has also delivered insights into the fundamental solvability of inverse optimal control problems using any inverse optimal control method (including both bilevel and minimum-principle methods), as discussed in Corollaries 3.2 and 3.3. Furthermore, as we shall see throughout this book, minimum principles provide a powerful, portable framework for solving inverse problems in both discrete and continuous time, and in both optimal control and noncooperative dynamic game theory.

Inverse Linear-Quadratic Optimal Control: Discrete-time inverse optimal control in the case of linear dynamical systems with unknown quadratic cost functions has received considerable specialized attention in the control engineering literature. The idea of estimating the feedback law and using algebraic Riccati equations as optimization constraints to identify cost function matrices was explored in [26] for the infinite-horizon discrete-time inverse LQ optimal control problem considered in Sect. 3.6. More recently, finite-horizon inverse LQ optimal control methods have been presented in [30–32] that resemble bilevel methods but exploit discrete-time minimum principles to avoid the direct solution of finite-horizon LQ optimal control problems in the lower level of optimization. Importantly, the results of [32] represent some of the first theoretical characterizations of the statistical consistency of inverse methods in the presence of noise, and [31] explores the first use of the celebrated expectation–maximization algorithm from statistical estimation for inverse optimal control. Extending the statistical results of [30–32] far beyond the LQ setting to general nonlinear dynamical systems and general nonquadratic cost functions is of paramount practical importance and theoretical significance. In the language of statistics and system identification, the results of this chapter (e.g., Corollaries 3.2 and 3.3) address problems analogous to those of identifiability and persistence of excitation that arise even in settings without noise (cf. [18] and references therein). They are thus practically important for ensuring that the given state and control sequences are sufficiently informative to yield unique solutions that are sensibly related to the true (unknown) parameters.

Additional Topics: Numerous variations and extensions of the prototypical whole-sequence and truncated-sequence discrete-time inverse optimal control problems we considered in this chapter have started attracting attention. For example, problems in which the cost functions can be abruptly time-varying have been considered in [11]; problems in which cost functions are computed via distributed optimization have been examined in [13]; and problems in which given state and control sequences must be processed sequentially (i.e., online, without storing or processing them in a batch) have been investigated in [20, 21]. Importantly, methods for solving these various problems continue to draw on ideas derived from discrete-time minimum principles, including the recovery matrix approach of [12, 25] in the case of [11, 13], and the mixed method for truncated sequences (3.29) in the case of [20, 21].
References 1. Aghasadeghi N, Long A, Bretl T (2012) Inverse optimal control for a hybrid dynamical system with impacts. In: 2012 IEEE international conference on robotics and automation (ICRA), pp 4962–4967 2. Anderson BDO, Moore JB (1990) Optimal control: linear quadratic methods. Prentice Hall, Englewood Cliffs 3. Åström KJ, Wittenmark B (1995) Adaptive control, 2nd edn. Addison-Wesley, Reading 4. Basar T, Olsder GJ (1999) Dynamic noncooperative game theory, vol 23, 2nd edn. Academic, New York 5. Bertsekas DP (1995) Dynamic programming and optimal control, vol 1, 3rd edn. Athena Scientific, Belmont 6. Brewer J (1978) Kronecker products and matrix calculus in system theory. IEEE Trans Circuits Syst 25(9):772–781 7. Engwerda JC, van den Broek WA, Schumacher JM (2000) Feedback Nash equilibria in uncertain infinite time horizon differential games. In: Proceedings of the 14th international symposium of mathematical theory of networks and systems, MTNS 2000, pp 1–6 8. Goodwin GC, Seron MM, De Doná JA (2006) Constrained control and estimation: an optimisation approach. Springer Science & Business Media, Berlin
96
3 Discrete-Time Inverse Optimal Control
9. Hotz VJ, Miller RA (1993) Conditional choice probabilities and the estimation of dynamic models. Rev Econ Stud 60(3):497–529 10. Hotz VJ, Miller RA, Sanders S, Smith J (1994) A simulation estimator for dynamic models of discrete choice. Rev Econ Stud 61(2):265–289 11. Jin W, Kuli´c D, Lin JF-S, Mou S, Hirche S (2019) Inverse optimal control for multiphase cost functions. IEEE Trans Robot 35(6):1387–1398 12. Jin W, Kuli´c D, Mou S, Hirche S (2021) Inverse optimal control from incomplete trajectory observations. Int J Robot Res 40(6–7):848–865 13. Wanxin J, Shaoshuai M (2021) Distributed inverse optimal control. Automatica 129:109658 14. Kalman RE (1964) When is a linear control system optimal? J Basic Eng 86(1):51–60 15. Keshavarz A, Wang Y, Boyd S (2011) Imputing a convex objective function. In: 2011 IEEE international symposium on intelligent control (ISIC). IEEE, pp 613–619 16. Lancaster P, Rodman L (1995) Algebraic riccati equations. Clarendon Press, Oxford 17. Lin JF-S, Bonnet V, Panchea AM, Ramdani N, Venture G, Kuli´c D (2016) Human motion segmentation using cost weights recovered from inverse optimal control. In: 2016 IEEE-RAS 16th international conference on humanoid robots (Humanoids). IEEE, pp 1107–1113 18. Ljung L (1999) System identification: theory for the user. Prentice Hall PTR, Upper Saddle River 19. Molloy TL, Ford JJ, Perez T (2018) Finite-horizon inverse optimal control for discrete-time nonlinear systems. Automatica 87:442–446 20. Molloy TL, Ford JJ, Perez T (2018) Online inverse optimal control on infinite horizons. In: 2018 IEEE conference on decision and control (CDC), pp 1663–1668 21. Molloy TL, Ford JJ, Perez T (2020) Online inverse optimal control for control-constrained discrete-time systems on finite and infinite horizons. Automatica 120:109109 22. Molloy TL, Tsai D, Ford JJ, Perez T (2016) Discrete-time inverse optimal control with partialstate information: a soft-optimality approach with constrained state estimation. In: 2016 IEEE 55th annual conference on decision and control (CDC), Las Vegas, NV 23. Mombaur K, Truong A, Laumond J-P (2010) From human to humanoid locomotion–an inverse optimal control approach. Auton Robots 28(3):369–383 24. Panchea AM, Ramdani N (2015) Towards solving inverse optimal control in a bounded-error framework. In: American control conference (ACC) 2015, pp 4910–4915 25. Parsapour M, Kuli´c D (2021) Recovery-matrix inverse optimal control for deterministic feedforward-feedback controllers. In: 2021 American control conference (ACC), pp 4765– 4770 26. Priess MC, Conway R, Choi J, Popovich JM, Radcliffe C (2015) Solutions to the inverse LQR problem with application to biological systems analysis. IEEE Trans Control Syst Technol 23(2):770–777 27. Puydupin-Jamin A-S, Johnson M, Bretl T (2012) A convex approach to inverse optimal control and its application to modeling human locomotion. In: 2012 IEEE international conference on robotics and automation (ICRA), pp 531–536 28. Rust J (1987) Optimal replacement of GMC bus engines: an empirical model of Harold Zurcher. Econometrica 55(5):999–1033 29. Yokoyama N (2017) Inference of aircraft intent via inverse optimal control including secondorder optimality condition. J Guid Control Dyn 41(2):349–359 Feb 30. Yu C, Yao L, Hao F, Jie C (2021) System identification approach for inverse optimal control of finite-horizon linear quadratic regulators. Automatica 129:109636 31. 
Zhang H, Li Y, Hu X (2019) Inverse optimal control for finite-horizon discrete-time linear quadratic regulator under noisy output. In: 2019 IEEE 58th conference on decision and control (CDC), pp 6663–6668 32. Han Z, Jack U, Hu X (2019) Inverse optimal control for discrete-time finite-horizon Linear quadratic regulators. Automatica 110:108593
Chapter 4
Continuous-Time Inverse Optimal Control
In this chapter, we investigate inverse optimal control in continuous time. The contents of this chapter mirror those of Chap. 3. Specifically, we pose inverse optimal control problems that differ in whether the given data consists of whole or truncated state and control trajectories. We then develop inverse optimal control methods based on either bilevel optimization or (continuous-time) minimum principles. There are several distinctions between inverse optimal control in continuous time and discrete time, the most notable of which is that the dynamics of the underlying systems are governed by ordinary differential equations instead of difference equations. Methods of continuous-time inverse optimal control derived from minimum principles still, however, simplify to (static) optimization problems for linear parameterizations of the cost functionals, albeit after the solution of differential equations. A distinction also arises in the ability of minimum-principle methods to solve continuous-time infinite-horizon inverse optimal control problems since continuous-time infinite-horizon minimum principles involve additional scaling factors compared to their discrete-time counterparts, complicating their usage. In the final part of this chapter, we explore continuous-time inverse optimal control for linear systems with infinite-horizon quadratic cost functionals. These infinitehorizon linear-quadratic (LQ) results enable us to draw a connection between the data-driven inverse optimal control problems we consider and the original notion of inverse optimal control introduced by Kalman in [17].
4.1 Preliminary Concepts In this section, we prepare to consider continuous-time inverse optimal control problems by introducing continuous-time (forward) optimal control problems with parameters in their cost functionals (extending the standard concepts of Sect. 2.3). © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 T. L. Molloy et al., Inverse Optimal Control and Inverse Noncooperative Dynamic Game Theory, Communications and Control Engineering, https://doi.org/10.1007/978-3-030-93317-3_4
97
98
4 Continuous-Time Inverse Optimal Control
The (inverse) problems we consider later in this chapter will be concerned with computing these cost-functional parameters.
4.1.1 Parameterized Continuous-Time Optimal Control Problems Let us consider a (possibly infinite) horizon T > 0 and dynamics described by the (potentially nonlinear) system of first-order ordinary differential equations x(t) ˙ = f (t, x(t), u(t)) , x(0) = x0
(4.1)
for continuous-time t ∈ [0, T ]. Here, x(t) ∈ Rn is the state vector of the system, x0 ∈ Rn is the initial state, u(t) ∈ U are the controls belonging to the (closed) control-constraint set U ⊂ Rm , and f is a (potentially nonlinear) function. We make the standard assumption that f is continuous in t and uniformly (globally) Lipschitz continuous in x(t) and u(t) such that (4.1) admits a unique solution for every continuous control function u : [0, T ] → U (cf. [4, Theorem 5.1]). Let us also consider the cost functional T g (t, x(t), u(t), θ ) dt (4.2) VT (x, u, θ ) 0
where g : [0, T ] × Rn × U × Θ → R is a potentially time-varying function that describes the stage (or integral or instantaneous) cost associated with the states and controls at time t. We shall further consider the functions g to belong to some known class of functions parameterized by θ belonging to the parameter set Θ ⊂ Rq . Given the dynamics (4.1) and cost functional (4.2), the parameterized continuoustime optimal control problem we consider is inf
VT (x, u, θ )
s.t.
x(t) ˙ = f (t, x(t), u(t)), t ∈ [0, T ] u(t) ∈ U , t ∈ [0, T ]
u
(4.3)
x(0) = x0 . The solutions to this continuous-time optimal control problem are state and control trajectories or functions x : [0, T ] → Rn and u : [0, T ] → Rm that minimize the cost functional (4.2) while satisfying the dynamics (4.1) and control constraints for a given (potentially infinite) horizon T , initial state x0 , and parameters θ . Necessary optimality conditions for this parameterized continuous-time optimal control problem are provided by (parameterized) continuous-time minimum principles.
4.1 Preliminary Concepts
99
4.1.2 Parameterized Continuous-Time Minimum Principles We present continuous-time minimum principles for the parameterized optimal control problem (4.3) by defining the parameterized Hamiltonian function H (t, x(t), u(t), λ(t), μ, θ ) μg (t, x(t), u(t), θ ) + λ (t) f (t, x(t), u(t)) (4.4) where μ ∈ R is a scalar constant and λ : [0, T ] → Rn is the costate (or adjoint) function. We assume that the functions f and g are continuously differentiable in their state and control arguments so that we may write ∇x H (t, x(t), u(t), λ(t), μ, θ ) and ∇u H (t, x(t), u(t), λ(t), μ, θ ) to denote the (column) vectors of partial derivatives of the Hamiltonian with respect to x(t) and u(t), respectively (evaluated at x(t), u(t), λ(t), μ, and θ ). Similar to Theorem 2.3, we then have the following continuous-time minimum principle for the parameterized continuous-time optimal control problem (4.3) with a finite horizon 0 < T < ∞. Theorem 4.1 (Parameterized Finite-Horizon Minimum Principle) Suppose that the state and control trajectories x : [0, T ] → Rn and u : [0, T ] → U constitute a solution to the continuous-time optimal control problem (4.3) for some finite horizon 0 < T < ∞. Then μ = 1; (i) the state trajectory x solves the differential equation x(t) ˙ = f (t, x(t), u(t)) for t ∈ [0, T ] with x(0) = x0 ; (ii) there exists a costate trajectory λ : [0, T ] → Rn satisfying ˙ λ(t) = −∇x H (t, x(t), u(t), λ(t), μ, θ ) for t ∈ [0, T ] with λ(T ) = 0; and (iii) the controls u satisfy ¯ λ(t), μ, θ ) u (t) ∈ arg min H (t, x(t), u(t), u(t)∈U ¯
for all t ∈ [0, T ]. Proof See Theorem 2.3.
Similarly, the minimum principle for the parameterized continuous-time optimal control problem (4.3) with an infinite horizon T = ∞ is as follows.
100
4 Continuous-Time Inverse Optimal Control
Theorem 4.2 (Parameterized Infinite-Horizon Minimum Principle) Suppose that the state and control trajectories x : [0, ∞) → Rn and u : [0, ∞) → U constitute a solution to the continuous-time optimal control problem (4.3) for an infinite horizon T = ∞. Then, (i) the state trajectory x solves the differential equation x(t) ˙ = f (t, x(t), u(t)) for t ≥ 0 with x(0) = x0 ; (ii) there exists a costate trajectory λ : [0, ∞) → Rn and real number μ ∈ R satisfying ˙ λ(t) = −∇x H (t, x(t), u(t), λ(t), μ, θ ) for t ≥ 0 with μ and λ(0) not simultaneously 0; and (iii) the controls u satisfy u (t) ∈ arg min H (t, x(t), u(t), ¯ λ(t), μ, θ ) u(t)∈U ¯
for all t ≥ 0.
Proof See Theorem 2.4 or [10].
The following corollary to Theorems 4.1 and 4.2 serves as a (new) combined minimum principle for both finite and infinite-horizon continuous-time optimal control problems. It also establishes an additional property of the Hamiltonian function H (t, x(t), u(t), λ(t), μ, θ ). Corollary 4.1 (Parameterized Horizon-Invariant Minimum Principle) Suppose that the state and control trajectories, x : [0, ] → Rn and u : [0, ] → U , constitute a (potentially) truncated solution to the optimal control problem (4.3) for a potentially infinite horizon T ≥ . Then, (i) the state trajectory x solves the differential equation x(t) ˙ = f (t, x(t), u(t)) for t ∈ [0, ] with x(0) = x0 ; (ii) there exists a costate trajectory λ : [0, ] → Rn and a real number μ satisfying ˙ λ(t) = −∇x H (t, x(t), u(t), λ(t), μ, θ ) for t ∈ [0, ] with μ and λ(0) not simultaneously 0; and (iii) the controls u satisfy
4.1 Preliminary Concepts
101
u (t) ∈ arg min H (t, x(t), u(t), ¯ λ(t), μ, θ ) u(t)∈U ¯
for all t ∈ [0, ]. Furthermore, if the dynamics f and stage costs g are time invariant, then H (t, x(t), u(t), λ(t), μ, θ ) = d
(4.5)
for t ∈ [0, ] and some constant d. Proof Theorems 4.1 and 4.2 establish the first three assertions of the corollary for finite and infinite horizon problems, respectively. That H (t, x(t), u(t), λ(t), μ, θ ) is constant for all t ∈ [0, ] when f and g are time invariant can be seen by considering the total derivative with respect to time, namely, ˙ + ∇u H u(t) ˙ + λ˙ (t) f (t, x(t), u(t)) H˙ (t, x(t), u(t), λ(t), μ, θ ) = ∇t H + ∇x H x(t)
where we use the shorthand H for H (t, x(t), u(t), λ(t), μ, θ ), and ∇t H , ∇x H and ∇u H to denote the (column) vectors of partial derivatives of H with respect to t, x(t), and u(t), respectively. Now, x(t) ˙ = f (t, x(t), u(t)) combined with the third corollary assertion gives that ∇x H + λ˙ (t) = 0, and so ˙ + [∇x H + λ˙ (t)] f (t, x(t), u(t)) H˙ (t, x(t), u(t), λ(t), μ, θ ) = ∇t H + ∇u H u(t) = ∇t H + ∇u H u(t) ˙ = ∇u H u(t) ˙ where the last line holds since ∇t H = 0 when f and g are constant with respect to time. Since the second corollary assertion implies that the controls u satisfy ¯ λ(t), μ, θ ) = 0 ∇u H (t, x(t), u(t), when u (t) is in the interior (i.e., not on the boundary) of the control-constraint set ˙ = 0. Thus, U and since u(t) ˙ = 0 otherwise, we must have that ∇u H u(t) H˙ (t, x(t), u(t), λ(t), μ, θ ) = 0 for all t ∈ [0, ] and the proof is complete.
The continuous-time minimum principles of Theorems 4.1 and 4.2 together with the combined minimum principle of Corollary 4.1 will form the basis of the methods of inverse optimal control that we shall develop in this chapter. Before developing these methods, however, we shall introduce the specific continuous-time inverse optimal control problems that we consider.
102
4 Continuous-Time Inverse Optimal Control
4.2 Inverse Optimal Control Problems in Continuous Time In this section, we introduce continuous-time inverse optimal control problems analogous to the discrete-time inverse optimal control problems defined in Sect. 3.2. In these continuous-time problems, we shall seek to compute cost-functional parameters θ such that given state and control trajectories constitute optimal solutions to the continuous-time optimal control problem (4.3). We assume throughout that the dynamics f , constraint set U , and stage cost function g are known a priori. The problems differ in whether the state and control trajectories are to constitute optimal solutions in their entirety, or are to be optimal solutions that are truncated at a time < T before the (possibly infinite) horizon T . We first pose the simplest problem in which we have access to the state and control trajectories in their entirety. Definition 4.1 (Whole-Trajectory (WT) Problem) Consider the optimal control problem (4.3) with a finite horizon T < ∞, known dynamics f , constraint set U , parameter set Θ, and parameterized function g. Given the state trajectory x : [0, T ] → Rn and the associated controls u : [0, T ] → U , the whole-trajectory (WT) inverse optimal control problem is to compute parameters θ such that x and u constitute an optimal solution to (4.3). Key to the formulation of the WT problem of Definition 4.1 is that the horizon T is finite and determined by the length of the available state and control trajectories (thus it is known). The following alternative continuous-time inverse optimal control problem handles the possibility of an unknown finite horizon or an infinite horizon by considering state and control trajectories x : [0, ] → Rn and u : [0, ] → U that are truncated at some (finite) time < T . Definition 4.2 (Truncated-Trajectory (TT) Problem) Consider the optimal control problem (4.3) with a potentially infinite horizon T > 0, known dynamics f , constraint set U , parameter set Θ, and parameterized function g. Given state and control trajectories x : [0, ] → Rn and u : [0, ] → U with < T , the truncated-trajectory (TT) inverse optimal control problem is to compute parameters θ such that x and u constitute a truncated optimal solution to (4.3) for some T < ∞ or T = ∞.1 Before considering methods for solving these continuous-time inverse optimal control problems, it is important to note that in many practical situations, the function g, dynamics f , horizon T , or parameter set Θ may be misspecified such that the given trajectories fail to constitute exact optimal solutions to continuous-time optimal control problem (4.3) for any θ ∈ Θ. As discussed in the discrete-time setting of Chap. 3, if no exact solutions to these inverse optimal control problems exist, it is often desirable to instead find parameters θ such that the given states and controls are approximately optimal. Approximate-optimality inverse optimal control approaches 1
Within the context of the TT problem in this chapter, we shall discuss cases in which the horizon T is either known, unknown, or known only to be finite or infinite.
4.2 Inverse Optimal Control Problems in Continuous Time
103
have been considered in the literature of continuous-time inverse optimal control (cf. [16]), and so in this chapter we shall consider both exact and approximate solutions to the problems of Definitions 4.1 and 4.2.
4.3 Bilevel Methods In this section, we present bilevel methods for solving the WT and TT of Definitions 4.1 and 4.2. As in the discrete-time setting of Sect. 3.3, these bilevel methods arise by viewing inverse optimal control problems as regression or trajectory-fitting problems. They also involve two levels of optimization, with the first (or upper) level of optimization being over the unknown cost-functional parameters θ with an objective function based on a measure of “closeness” between the given trajectories and trajectories computed (via a second or lower level of optimization) by solving a continuous-time optimal control problem parameterized by postulated cost-functional parameters.
4.3.1 Bilevel Method for Whole Trajectories The bilevel method for solving the whole-trajectory continuous-time inverse optimal control problem of Definition 4.1 is the optimization problem
T
inf
θ∈Θ
x(t) − x θ (t) 2 + u(t) − u θ (t) 2 dt
(4.6)
0
subject to the constraint that the state and control trajectories x θ : [0, T ] → Rn and u θ : [0, T ] → U are solutions to the continuous-time optimal control problem (4.3) parameterized by θ ∈ Θ.
4.3.2 Bilevel Method for Truncated Trajectories In a similar vein, the bilevel method for solving the truncated-trajectory continuoustime inverse optimal control problem of Definition 4.2 is the optimization problem
inf
θ∈Θ
x(t) − x θ (t) 2 + u(t) − u θ (t) 2 dt
(4.7)
0
subject to the constraint that the state and control trajectories x θ : [0, ] → Rn and u θ : [0, ] → U are truncated solutions to the continuous-time optimal control problem (4.3) parameterized by θ with either a finite or infinite horizon T > 0.
104
4 Continuous-Time Inverse Optimal Control
4.3.3 Discussion of Bilevel Methods As in discrete-time (cf. Sect. 3.3), the bilevel methods of continuous-time inverse optimal control in (4.6) and (4.7) both seek to find cost-functional parameters θ that minimize the total squared error between the given state and control trajectories (x and u), and the state and control trajectories predicted by solving a continuous-time optimal control problem parameterized by θ (x θ and u θ ). They therefore require explicit prior knowledge of the horizon T > 0, but can be posed with a variety of (upper level) optimization objectives such as the total squared error between the states, i.e.,
T
inf
θ∈Θ
x(t) − x θ (t) 2 dt.
0
The computational demands of continuous-time bilevel methods are especially burdensome compared to their discrete-time counterparts due to the additional difficulty of numerically solving continuous-time optimal control problems compared to solving discrete-time optimal control problems. We illustrate this point in more detail in the simulations and experimental case study of Chap. 7. Here, we note that the computational demands of continuous-time bilevel methods are such that papers like [26] have explicitly sought to perform the upper-level optimization over parameters θ with numeric optimization routines that minimize the number of (forward) continuous-time optimal control problems that must be solved. In this chapter, we take an alternative approach by exploiting continuous-time minimum principles.
4.4 Minimum-Principle Methods In this section, we develop methods for solving the continuous-time inverse optimal control problems of Definitions 4.1 and 4.2 by exploiting the continuous-time minimum principles of Sect. 4.1.2. We first develop methods for solving the WT problem of Definition 4.1 before developing analogous methods for solving the TT problem of Definition 4.2.
4.4.1 Methods for Whole Trajectories Let us begin the development of methods for solving the WT problem of Definition 4.1 by considering the continuous-time optimal control problem (4.3) with a finite horizon T < ∞, together with some (arbitrary) state and control trajectories, x : [0, T ] → Rn and u : [0, T ] → U . Let us also introduce the following defini-
4.4 Minimum-Principle Methods
105
tion of a finite set of select times at which the control constraints are not active in u : [0, T ] → U . Definition 4.3 (Select Inactive Control Constraint Times) Given the control trajectory u : [0, T ] → U , let K {tk ∈ [0, T ] : u(tk ) ∈ int U and 1 ≤ k ≤ |K |} be a finite set of (select) times at which the control constraints are inactive; that is, K is a finite set of times t at which the controls u(t) are in the interior (not on the boundary) of the constraint set U . We note that the control constraints may be active (i.e., u(t) may be on the boundary of U ) at the infinitely many times t ∈ [0, T ] not also in K (i.e., at t ∈ [0, T ] \ K ). Importantly, given the select inactive control constraint times K , assertion (iii) of Theorem 4.1 can be rewritten as ∇u H (t, x(t), u(t), λ(t), μ, θ ) = 0 for all t ∈ K with μ = 1. In order for x : [0, T ] → Rn and u : [0, T ] → U to constitute a solution to the optimal control problem (4.3) with finite horizon T < ∞, Theorem 4.1 hence implies that the parameters θ must be such that ∇u H (t, x(t), u(t), λ(t), μ, θ ) = 0
(4.8)
λ˙ (t) = −∇x H (t, x(t), u(t), λ(t), μ, θ )
(4.9)
for all t ∈ K with
for t ∈ [0, T ] where μ = 1 and λ(T ) = 0.
4.4.1.1
(4.10)
Constraint-Satisfaction Method for Whole Trajectories
The Hamiltonian gradient condition (4.8) and the costate differential equation (4.9) along with (4.10) lead directly to a constraint-satisfaction method for solving the WT problem of Definition 4.1. Specifically, this method is defined by the constraintsatisfaction problem
106
4 Continuous-Time Inverse Optimal Control
inf
C
s.t.
˙ λ(t) = −∇x H (t, x(t), u(t), λ(t), μ, θ ) , t ∈ [0, T ] λ(T ) = 0
θ,λ
(4.11)
∇u H (tk , x(tk ), u(tk ), λ(tk ), μ, θ ) = 0, tk ∈ K θ ∈Θ for any constant C ∈ R with μ = 1. The solution of (4.11) involves finding parameters θ and the costate function λ : [0, T ] → Rn . However, given the parameters θ , the constraints in (4.11) imply that the costate function λ is uniquely determined by solving the differential equation (4.9) with terminal condition λ(T ) = 0. The optimization in (4.11) can therefore be equivalently written as only over the costfunctional parameters θ .
4.4.1.2
Soft Method for Whole Trajectories
The optimization in the constraint-satisfaction method (4.11) may be infeasible when the dynamics f , basis functions g, horizon T , or parameter set Θ are misspecified since in this case x and u will not solve the optimal control problem (4.3) for any parameter θ ∈ Θ. As discussed in Sect. 4.2, in these misspecified cases the aim of the inverse optimal control becomes instead to find parameters θ so that the states x and controls u are approximately optimal under the optimal control problem (4.3). The use of the Hamiltonian gradient and costate conditions (4.8), (4.10), and (4.9) as (hard) constraints in the constraint-satisfaction method (4.11) however precludes its use in finding parameters that deliver this approximate optimality. To find cost-functional parameters θ so that the states x and controls u are either exactly or approximately optimal under the optimal control problem (4.3), the Hamiltonian gradient and costate conditions (4.8) and (4.9) may be used as (soft) objectives giving rise to the soft method defined by the optimization problem inf θ,λ
∇u H (tk , x(tk ), u(tk ), λ(tk ), μ, θ ) 2
tk ∈K
T
+
˙ + ∇x H (t, x(t), u(t), λ(t), μ, θ ) 2 dt
λ(t)
(4.12)
0
s.t.
θ ∈Θ
where μ = 1. The soft method (4.12) intuitively seeks costate functions λ : [0, T ] → Rn and parameters θ ∈ Θ such that the costate (4.9) and gradient conditions (4.8) hold (at least approximately) for all times t ∈ [0, T ]. The objective function of the soft method (4.12) penalizes cost-functional parameters θ ∈ Θ and costate functions λ that lead to Hamiltonian gradients ∇u H (t, x(t), u(t), λ(t), μ, θ ) with large magnitudes and Hamiltonian gradients ∇x H (t, x(t), u(t), λ(t), μ, θ ) that differ from
4.4 Minimum-Principle Methods
107
the time-derivatives λ˙ . Unlike in the constraint-satisfaction method (4.11), the optimization in the soft method (4.12) cannot be reduced to being only over θ since the functions λ are free from constraints involving θ . By encoding the Hamiltonian gradient and costate conditions (4.8) and (4.9) as optimization objectives, the soft method (4.12) will yield parameters even when there is no feasible solution to the constraint-satisfaction method (4.11). If however the constraint-satisfaction method (4.11) has feasible solutions, then these will also be the only solutions to the soft method (4.12). As in the constraint-satisfaction method of (4.11) however, the solutions to (4.12) may be nonunique. A key difficulty faced in employing the soft method (4.12) and examining its properties lies in its mix of a summation and integral in its objective function. Indeed, in (4.12) we are faced with optimizing the functions λ : [0, T ] → Rn at the sampled times tk ∈ K in the summation, and at all times t ∈ [0, T ] in the integral. To simplify the treatment of the soft method, we shall therefore introduce the following simplifying assumption (but only for consideration of the soft method). Assumption 4.1 The controls u(t) are in the interior (i.e., not on the boundary) of the control-constraint set U for all t ∈ [0, T ]. Under Assumption 4.1, assertion (iii) of Theorem 4.1 is equivalent to stating that the gradient of the Hamiltonian vanishes, namely, ∇u H (t, x(t), u(t), λ(t), μ, θ ) = 0 for all t ∈ [0, T ]. The soft method under Assumption 4.1 then becomes
T
inf θ,λ
s.t.
0
∇u H (t, x(t), u(t), λ(t), μ, θ ) 2 + λ˙ (t) + ∇x H (t, x(t), u(t), λ(t), μ, θ ) 2 dt
(4.13)
θ ∈Θ
where μ = 1. We shall examine this form of the soft method further in Sect. 4.5.
4.4.1.3
Mixed Method for Whole Trajectories
As we have just seen, employing the soft method (4.12) may be difficult due to its objective function involving both integration and summation. While Assumption 4.1 lets us develop the integral-only form of the soft method in (4.13), it is too strong to be of wide-ranging practical use since control constraints are important and active in many optimal control problems. For example, optimal control problems involving physical systems frequently have actuator constraints. In order to develop a method of whole-trajectory inverse optimal control that handles constrained controls while also offering similar approximate optimality properties to the soft method (4.12), we can formulate the Hamiltonian gradient condition
108
4 Continuous-Time Inverse Optimal Control
(4.8) into an objective (as in the soft method (4.12)) and the costate condition (4.9) as a (hard) constraint (as in the constraint-satisfaction method (4.11)). By doing so, we arrive at a mixed method of whole-trajectory continuous-time inverse optimal control defined by the optimization problem inf θ,λ
s.t.
∇u H (tk , x(tk ), u(tk ), λ(tk ), μ, θ ) 2
tk ∈K
˙ λ(t) = −∇x H (t, x(t), u(t), λ(t), μ, θ ) , t ∈ [0, T ] λ(T ) = 0
(4.14)
θ ∈Θ with μ = 1.
4.4.1.4
Discussion of Minimum-Principle Methods for Whole Trajectories
The mixed method (4.14) intuitively entails finding cost-functional parameters θ such that the Hamiltonian gradient condition (4.8) holds approximately at the times t ∈ K while the costate condition (4.9) holds exactly for all t ∈ [0, T ]. This deviation from the soft method (4.12) is enough to avoid needing the restrictive Assumption 4.1 in the context of the mixed method (4.14). In contrast to the mixed method (4.14), the constraint-satisfaction and soft methods of (4.11) and (4.12) either seek cost-functional parameters θ such that both the Hamiltonian gradient condition (4.8) and costate condition (4.9) hold exactly, or both hold approximately. Furthermore, due to the costate conditions (4.9) and (4.10) being (hard rather than soft) constraints in the mixed method (4.14), the functions λ : [0, T ] → Rn are uniquely determined for each cost-functional parameter θ ∈ Θ. The optimization in (4.14) is therefore effectively only over the parameters θ . We note that if the constraint-satisfaction method (4.11) has feasible solutions, then these will also be the only solutions to the mixed method (4.14). However, the mixed method also yields parameters when the constraint-satisfaction method (4.11) has no feasible solutions. Indeed, as with the soft method of (4.12), the only condition required for the mixed method to have feasible solutions is that the parameter set Θ be non-empty. We shall later examine the uniqueness of solutions to the mixed method (4.14), constraint-satisfaction method (4.11), and soft method (in its simplified form of (4.13)). The main limitation associated with the constraint-satisfaction method (4.11), soft method (4.12), and mixed method (4.14) is that they are all only able to compute the parameters of continuous-time optimal control problems (4.3) with known finite horizons T < ∞. We therefore next develop methods of continuous-time inverse optimal control for truncated trajectories that enable the computation of parameters for both finite and infinite-horizon optimal control problems with potentially unknown horizons.
4.4 Minimum-Principle Methods
109
4.4.2 Methods for Truncated Trajectories To develop methods for solving the TT problem of Definition 4.2, let us consider a possibly infinite horizon T > 0 and state and control trajectories x : [0, ] → Rn and u : [0, ] → U with < T . Similar to Definition 4.3, let us define K {tk ∈ [0, ] : u(tk ) ∈ int U and 1 ≤ k ≤ |K |} as a finite set of select times at which the controls in the (truncated) trajectory u : [0, ] → U are inactive. If x : [0, ] → Rn and u : [0, ] → U constitute a truncated solution to the optimal control problem (4.3) with T > then assertion (iii) of Corollary 4.1 is equivalent to the condition that the gradient of the Hamiltonian vanishes at the times t ∈ K in the sense that ∇u H (t, x(t), u(t), λ(t), μ, θ ) = 0 for all t ∈ K . For x : [0, ] → Rn and u : [0, ] → U to constitute a truncated solution to the continuous-time optimal control problem (4.3), Corollary 4.1 then implies that the parameters θ must be such that λ˙ (t) = −∇x H (t, x(t), u(t), λ(t), μ, θ )
(4.15)
for t ∈ [0, ] and some real number μ not simultaneously zero with λ(0), and ∇u H (t, x(t), u(t), λ(t), μ, θ ) = 0
(4.16)
for all t ∈ K . The costate and Hamiltonian gradient conditions (4.15) and (4.16) are analogous to the conditions of (4.8) and (4.9) that we used in the previous section for whole trajectories. In this truncated-trajectory setting however, the horizon T is unknown and potentially infinite which implies that the value of μ will also be unknown. We will develop constraint-satisfaction, soft, and mixed methods for the truncatedtrajectory problem of Definition 4.2 analogous to the methods for whole trajectories developed in the previous subsection by introducing μ as an additional unknown alongside the parameters θ and the costate function λ.
4.4.2.1
Constraint-Satisfaction Method for Truncated Trajectories
In view of the costate condition (4.15) and the Hamiltonian gradient condition (4.16), the constraint-satisfaction method of continuous-time inverse optimal control with truncated trajectories is the constraint-satisfaction problem
110
4 Continuous-Time Inverse Optimal Control
inf
C
θ,λ,μ
λ˙ (t) = −∇x H (t, x(t), u(t), λ(t), μ, θ ) , t ∈ [0, T ]
s.t.
∇u H (tk , x(tk ), u(tk ), λ(tk ), μ, θ ) = 0, tk ∈ K θ ∈Θ
(4.17)
μ∈R for any constant C ∈ R.
4.4.2.2
Soft Method for Truncated Trajectories
Similarly, the soft method of continuous-time inverse optimal control with truncated trajectories is the optimization problem
inf
θ,λ,μ
∇u H (tk , x(tk ), u(tk ), λ(tk ), μ, θ ) 2
tk ∈K
+
˙ + ∇x H (t, x(t), u(t), λ(t), μ, θ ) 2 dt
λ(t)
(4.18)
0
s.t.
θ ∈Θ μ ∈ R.
As in the whole-trajectory setting however, employing the soft method (4.18) and examining its properties is difficult in general due to the mixing of a summation and integral in its objective function. To simplify the treatment of the soft method, we shall introduce the following (potentially restrictive) assumption. Assumption 4.2 The controls u(t) are in the interior (i.e., not on the boundary) of the control-constraint set U for all t ∈ [0, ]. Under Assumption 4.2, assertion (iii) of Corollary 4.1 becomes ∇u H (t, x(t), u(t), λ(t), μ, θ ) = 0 for all t ∈ [0, ], and the soft method under Assumption 4.2 becomes inf
θ,λ,μ
s.t.
∇u H (t, x(t), u(t), λ(t), μ, θ ) 2
0
˙ + ∇x H (t, x(t), u(t), λ(t), μ, θ ) 2 dt + λ(t) θ ∈Θ μ ∈ R.
(4.19)
4.4 Minimum-Principle Methods
4.4.2.3
111
Mixed Method for Truncated Trajectories
Finally, the mixed method of continuous-time inverse optimal control with truncated trajectories is the optimization problem inf
θ,λ,μ
s.t.
∇u H (tk , x(tk ), u(tk ), λ(tk ), μ, θ ) 2
tk ∈K
˙ λ(t) = −∇x H (t, x(t), u(t), λ(t), μ, θ ) , t ∈ [0, ] θ ∈Θ
(4.20)
μ ∈ R. 4.4.2.4
Discussion of Minimum-Principle Methods for Truncated Trajectories
Unlike the methods for whole trajectories of Sect. 4.4.1, the methods for truncated trajectories require no prior knowledge of the horizon T other than that the given trajectories are of length < T . The truncated-trajectory methods also omit constraints and penalties associated with the terminal costate λ(T ). If, however, the horizon is known to be finite (i.e., that it has a potentially unknown finite value T < ∞), then the optimizations over μ in the truncated-trajectory methods can be omitted since μ = 1 for all finite horizons T < ∞ (cf. Theorem 4.1). Many of the other properties of the truncated-trajectory methods are analogous to those of the whole-trajectory methods of Sect. 4.4.1. For example, due to the costate condition (4.15) being used as a constraint in the constraint-satisfaction and mixed methods for truncated sequences, the optimizations in them are essentially only over μ, the parameters θ , and the last costate λ(). Similarly, the constraint-satisfaction method (4.17) may have no feasible solutions if the functions g and dynamics f , or parameter set are misspecified. In these cases, the soft (4.19) and mixed (4.20) methods will yield parameters such that the truncated trajectories have approximate optimality properties. We next discuss the existence and uniqueness of solutions to these truncatedtrajectory methods and the whole-trajectory methods.
4.5 Method Reformulations and Solution Results As in the discrete-time setting of Chap. 3, the continuous-time minimum-principle and bilevel methods of Sects. 4.3 and 4.4 often share the practical limitation of only finding parameters θ under which the trajectories x and u correspond to local (but not global) extremum or inflection trajectories. The negative consequences of this limitation can be minimized for the minimum-principle methods by showing that the parameters they yield are unique (since then no other parameters exist in the
112
4 Continuous-Time Inverse Optimal Control
parameter set under which the trajectories could constitute a global minimum). In this section, we thus investigate the properties of the minimum-principle methods of continuous-time inverse optimal control presented in Sect. 4.4. Specifically, we shall introduce additional structure into the parameterization of the cost functionals (4.2) and use this structure to reformulate the methods of Sect. 4.4 as either systems of linear equations or quadratic programs. These reformulations will allow us to then apply the tools of quadratic programming and linear algebra (cf. Sect. 2.1) to examine the existence and uniqueness of parameters computed by the methods.
4.5.1 Linearly Parameterized Cost Functionals We shall make use of the following assumption regarding the parameterization of the cost function g throughout this section. Assumption 4.3 (Linearly Parameterized Cost Functions) The function g is a linear combination of known basis functions g¯ : [0, T ] × Rn × U → Rq in the sense that ¯ x(t), u(t)) g(t, x(t), u(t), θ ) = θ g(t, where θ ∈ Θ ⊂ Rq are the cost-functional parameters, and the basis functions g¯ are continuously differentiable in each of their arguments. Assumption 4.3 is directly analogous to Assumption 3.3 which we employed in the discrete-time setting of Chap. 3. While it is potentially restrictive, the linear parameterization imposed by Assumption 4.3 is ubiquitous in the literature of inverse optimal control. Importantly, under Assumption 4.3, the Hamiltonian function H is linear the parameters θ and costate function λ(t), namely, ¯ x(t), u(t)) + λ (t) f (t, x(t), u(t)) H (t, x(t), u(t), λ(t), μ, θ ) = μθ g(t, for t ≥ 0. An immediate consequence of this linearity is that the Hamiltonian gradients are also linear under Assumption 4.3, namely, ∇u H (t, x(t), u(t), λ(t), μ, θ ) = ∇u g(t, ¯ x(t), u(t))θ μ + ∇u f (t, x(t), u(t))λ(t)
(4.21)
and ∇x H (t, x(t), u(t), λ(t), μ, θ ) = ∇x g(t, ¯ x(t), u(t))θ μ + ∇x f (t, x(t), u(t))λ(t)
(4.22)
where ∇x g(t, ¯ x(t), u(t)) ∈ Rn×q , ∇u g(t, ¯ x(t), u(t)) ∈ Rm×q , ∇x f (t, x(t), u(t)) ∈ n×n m×n denote the matrices of partial derivatives of g¯ R , and ∇u f (t, x(t), u(t)) ∈ R
4.5 Method Reformulations and Solution Results
113
and f with respect to x(t) and u(t), respectively (and evaluated at x(t), u(t), λ(t), and μ). Since the objective functions and constraints of all of the continuous-time inverse optimal control methods in Sect. 4.4 are either linear or quadratic in the Hamiltonian gradients, Assumption 4.3 therefore also implies that the optimization problems in the methods of Sect. 4.4 are convex (provided that the set Θ is also convex). We will now explicitly reformulate the methods under Assumption 4.3 as either systems of linear equations or quadratic programs in order to further examine their properties.
4.5.2 Reformulations of Whole-Trajectory Methods We shall first reformulate the constraint-satisfaction, soft, and mixed methods of inverse optimal control for whole trajectories presented in Sect. 4.4.1. We recall here that for the whole-trajectory methods, the constant μ takes the value μ = 1 (due to the finite-horizon minimum principle of Theorem 4.1).
4.5.2.1
Whole-Trajectory Constraint-Satisfaction Method Reformulation
To reformulate the constraint-satisfaction method of (4.11), let us recall that the unknown function λ : [0, T ] → Rn is constrained to be a solution to the differential equation λ˙ (t) = −∇x H (t, x(t), u(t), λ(t), μ, θ ) with boundary condition λ(T ) = 0. Under Assumption 4.3, this differential equation is linear in the parameters θ and function values λ(t). In the following proposition, we extend this linearity further and show that the function λ solving this differential equation can in fact be written as simply a linear function of θ . Proposition 4.1 Suppose that Assumption 4.3 holds and consider the function λ¯ : [0, T ] → Rn×q solving the differential equation ˙¯ ¯ ¯ x(t), u(t))μ − ∇x f (t, x(t), u(t))λ(t) λ(t) = −∇x g(t,
(4.23)
for t ∈ [0, T ] with the terminal condition λ¯ (T ) = 0 and μ = 1. Then, the function λ : [0, T ] → Rn given by λ(t) = λ¯ (t)θ
(4.24)
114
4 Continuous-Time Inverse Optimal Control
for t ∈ [0, T ] solves the differential equation ˙ λ(t) = −∇x H (t, x(t), u(t), λ(t), μ, θ ) with boundary condition λ(T ) = 0. Proof Multiplying the differential equation defined in (4.23) by θ gives λ˙¯ (t)θ = −∇x g(t, ¯ x(t), u(t))μθ − ∇x f (t, x(t), u(t))λ¯ (t)θ = −∇x H (t, x(t), u(t), λ¯ (t)θ, μ, θ ) for t ∈ [0, T ] where the last line follows by recalling the form of the Hamiltonian gradient (4.22) under Assumption 4.3. Similarly, multiplying the boundary condition ¯ ) = 0 by θ gives λ(T ¯ )θ = 0. Thus, the differential equation λ(T ˙ λ(t) = −∇x H (t, x(t), u(t), λ(t), μ, θ ) ¯ ¯ with λ(T ) = 0 is solved by the function λ(t) = λ(t)θ where λ(t) solves the differ¯ ential equation (4.23) with λ(T ) = 0. The proof is complete. In light of Proposition 4.1, the Hamiltonian gradients (4.21) and (4.22) can be rewritten under Assumption 4.3 as ∇u H (t, x(t), u(t), λ(t), μ, θ ) ¯ x(t), u(t))μ + ∇u f (t, x(t), u(t))λ¯ (t)]θ = [∇u g(t,
(4.25)
and ∇x H (t, x(t), u(t), λ(t), μ, θ ) ¯ ¯ x(t), u(t))μ + ∇x f (t, x(t), u(t))λ(t)]θ = [∇x g(t,
(4.26)
where λ¯ is the solution to the differential equation (4.23) with λ¯ (T ) = 0. With these linear forms of the Hamiltonian gradients, the constraint-satisfaction method (4.11) may be rewritten as the optimization problem inf θ
s.t.
C ˙¯ ¯ λ(t) = −∇x g(t, ¯ x(t), u(t))μ − ∇x f (t, x(t), u(t))λ(t), t ∈ [0, T ] ¯ )=0 λ(T ¯ k )]θ = 0, tk ∈ K [∇u g(t ¯ k , x(tk ), u(tk ))μ + ∇u f (tk , x(tk ), u(tk ))λ(t θ ∈Θ
for any constant C ∈ R with μ = 1. In this form, we see that the constraints in the constraint-satisfaction method (with the exception of the set constraint θ ∈ Θ)
4.5 Method Reformulations and Solution Results
115
are linear in the parameters under Assumption 4.3. Thus, under Assumption 4.3, the constraint-satisfaction method (4.11) may be reformulated as the problem of solving the (constrained) system of linear equations ξC θ = 0
s.t. θ ∈ Θ
(4.27)
where the coefficient matrix ξC is defined as ¯ k) ∈ Rm|K ξC ∇u g(t ¯ k , x(tk ), u(tk )) + ∇u f (tk , x(tk ), u(tk ))λ(t tk ∈K
|×q
(4.28)
n×q with the function λ¯ : [0, T ] → solving the differential equation (4.23) with R ¯λ(T ) = 0, and the notation Atk meaning the matrix formed by stacking a tk ∈K sequence of matrices At1 , At2 , . . . indexed by the times tk in the set K , i.e.,
⎡
Atk
tk ∈K
⎢ ⎢ ⎢ ⎣
At|K 4.5.2.2
⎤
At1 At2 .. .
⎥ ⎥ ⎥. ⎦ |
Whole-Trajectory Soft Method Reformulation
We shall now employ Assumption 4.3 to reformulate the soft method of (4.13) developed under Assumption 4.1. We first note that we are unable to exploit Proposition 4.1 to further simplify the Hamiltonian gradients (4.21) and (4.22) under Assumption 4.3 since the function λ : [0, T ] → Rn in the soft method (4.13) is not constrained to exactly solve the costate differential equation of the finite-horizon continuous-time minimum principle. We will therefore proceed directly from the linear Hamiltonian gradients in (4.21) and (4.22) to reformulate the soft method of (4.13) under Assumptions 4.1 and 4.3. Let us begin by defining the rectangular matrix I I 0 ∈ Rq×(q+n) where I ∈ Rq×q is the (square) identity matrix. Let us also define R = I ∈ Rn×n , B
0 ∇x g¯ (t) , and S(t) ∇x f (t) I
and Q(t) F (t)F(t) where
¯ ∇x f (t) ∇x g(t) . F(t) ¯ ∇u f (t) ∇u g(t) Here, we will use the shorthand ∇x f (t) ∈ Rn×n and ∇u f (t) ∈ Rm×n to denote the matrices of partial derivatives of the system dynamics f with respect to x(t) and u(t), ¯ ∈ respectively (and evaluated with x(t), and u(t)). Similarly, we will use ∇x g(t)
116
4 Continuous-Time Inverse Optimal Control
Rn×q , and ∇u g(t) ¯ ∈ Rm×q to denote the matrices of partial derivatives of the basis functions g¯ with respect to (and evaluated with) x(t) and u(t). Under Assumption 4.3, the linear Hamiltonian gradients in (4.21) and (4.22) imply that the integrand in the soft method (4.13) developed under Assumption 4.1 may be rewritten as
∇u H (t, x(t), u(t), λ(t), μ, θ ) 2 + λ˙ (t) + ∇x H (t, x(t), u(t), λ(t), μ, θ ) 2
λ˙ (t) + ∇x H (t, x(t), u(t), λ(t), μ, θ ) 2 = ∇u H (t, x(t), u(t), λ(t), μ, θ )
2 λ˙ (t) + ∇x g(t)θ ¯ + ∇x f (t)λ(t) = ¯ + ∇u f (t)λ(t) ∇u g(t)θ 2
∇x g(t) θ I ˙ ¯ ∇x f (t) λ = (t) + i ∇u g(t) ¯ ∇u f (t) λ(t) 0 = z (t)Q(t)z(t) + v (t)Rv(t) + 2z (t)S(t)v(t) where the second equality follows from (4.21) and (4.22) under Assumption 4.3, and the last equality follows from the definitions of Q(t), R, and S(t) together with the variable substitutions
θ ˙ z(t) and v(t) λ(t). λ(t) The constraint θ ∈ Θ in the soft method (4.13) may also be rewritten as I z(t) ∈ Θ and the (implicit) constraint in (4.13) that θ is time invariant implies that
0 θ˙ = Bv(t). z˙ (t) = ˙ = ˙ λ(t) λ(t) The optimization in the soft method (4.13) under Assumption 4.3 is thus equivalent to the LQ dynamic optimization problem inf z,v
s.t.
T
z (t)Q(t)z(t) + v (t)Rv(t) + 2z (t)S(t)v(t) dt
0
z˙ (t) = Bv(t), t ∈ [0, T ] I z(t) ∈ Θ, t ∈ [0, T ].
(4.29)
We note however that given a function v : [0, T ] → Rn together with an initial value z(0) = η ∈ Rq+n with I η ∈ Θ, we may solve the differential equation z˙ (t) = Bv(t) for the unique function z : [0, T ] → Rq+n . The optimization in (4.29) is therefore essentially only over the initial value z(0) = η ∈ Rq+n and the function v : [0, T ] → Rn , namely,
4.5 Method Reformulations and Solution Results
inf inf η
s.t.
v
T
117
z (t)Q(t)z(t) + v (t)Rv(t) + 2z (t)S(t)v(t) dt
0
z˙ (t) = Bv(t), t ∈ [0, T ]
(4.30)
z(0) = η I η ∈ Θ. For any η ∈ Rq+n , the inner optimization problem over the function v in (4.30) is a LQ optimal control problem. Since R = I is positive definite and ¯ ∇u f (t) ∇u g(t) ¯ ∇u f (t) Q(t) − S(t)R −1 S (t) = ∇u g(t) is positive semidefinite, the optimal control results of [3, Sect. 3.4] imply that for any z(0) = η ∈ Rq+n , the unique function solving the inner optimization over v in (4.30) is given by v(t) = κ(t)z(t) for all t ∈ [0, T ] where κ(t) − B P(t) + S (t) and P : [0, T ] → R(q+n)×(q+n) is the unique positive semidefinite solution to the Riccati differential equation ˙ P(t) = (P(t)B + S(t))(B P (t) + S (t)) − Q(t)
(4.31)
for t ∈ [0, T ] with terminal condition P(T ) = 0. Section 3.4 of [3] also gives that the value of the inner optimization problem over v in (4.30) is η P(0)η for any initial state z(0) = η. The optimization in (4.30) thus simplifies to the (constrained) quadratic program inf η
η P(0)η
s.t. I η ∈ Θ.
(4.32)
The cost-functional parameters yielded by the soft method (4.13) under Assumption 4.3 are then θ = I η where η solves the (constrained) quadratic program (4.32).
4.5.2.3
Whole-Trajectory Mixed Method Reformulation
Under Assumption 4.3, we may reformulate the mixed method (4.14) in a similar method to the reformulation of the constraint-satisfaction method (4.11) in (4.27). Specifically, the function λ : [0, T ] → Rn in the mixed method (4.14) is constrained to satisfy the differential equation λ˙ (t) = −∇x H (t, x(t), u(t), λ(t), μ, θ )
118
4 Continuous-Time Inverse Optimal Control
with λ(T ) = 0. Under Assumption 4.3, Proposition 4.1 gives that the function λ solving this differential equation is linear in the parameters θ . The Hamiltonian gradients are therefore given by (4.25) and (4.26), and so under Assumption 4.3 the mixed method (4.14) simplifies to the optimization problem inf θ
s.t.
2 [∇u g(t ¯ k , x(tk ), u(tk ))μ + ∇u f (tk , x(tk ), u(tk ))λ¯ (tk )]θ tk ∈K
λ˙¯ (t) = −∇x g(t, ¯ x(t), u(t))μ − ∇x f (t, x(t), u(t))λ¯ (t), t ∈ [0, T ] λ¯ (T ) = 0 θ ∈Θ
with μ = 1. This optimization problem is a constrained quadratic program, and so under Assumption 4.3, the mixed method (4.14) can be rewritten in the form inf θ
θ ξM θ
s.t. θ ∈ Θ
(4.33)
where ξ M ∈ Rq×q is the positive semidefinite matrix ξM
[∇u g(t ¯ k , x(tk ), u(tk )) + ∇u f (tk , x(tk ), u(tk ))λ¯ (tk )]
(4.34)
tk ∈K
· [∇u g(t ¯ k , x(tk ), u(tk )) + ∇u f (tk , x(tk ), u(tk ))λ¯ (tk )]
(4.35)
and the function λ¯ solves the differential equation ˙¯ ¯ ¯ x(t), u(t)) − ∇x f (t, x(t), u(t))λ(t) λ(t) = −∇x g(t, for t ∈ [0, T ] with terminal condition λ¯ (T ) = 0.
4.5.3 Solution Results for Whole-Trajectory Methods We shall now use the reformulations of the whole-trajectory constraint-satisfaction, soft, and mixed methods to establish results characterizing the existence and uniqueness of the cost-functional parameters they yield under Assumption 4.3.
4.5.3.1
Fixed-Element Parameter Set
We first note that scaling the cost functional of the optimal control problem (4.3) by any scalar C > 0 will not change the optimal trajectories x : [0, T ] → Rn and u : [0, T ] → U . Thus if the trajectories x and u solve the optimal control problem
4.5 Method Reformulations and Solution Results
119
(4.3) specified with the parameters θ = θ¯ , then they will also solve the optimal control problem specified by θ = C θ¯ for any C > 0. Inverse optimal control problems will therefore have multiple solutions in general. The zero vector θ = 0 will also be a trivial solution if θ = 0 ∈ Θ. An immediate necessary (though not sufficient) condition for any method of inverse optimal control to yield unique (nontrivial) cost-functional parameters θ is thus that the parameter set Θ must not contain both θ and C θ for any scalar C > 0 and any θ . In order to satisfy this condition (and exclude trivial solutions and ambiguous scaling), we shall consider the fixed-element parameter set given by Θ = {θ ∈ Rq : θ(1) = 1}.
(4.36)
The choice of this fixed-element parameter set is somewhat arbitrary, and we could instead consider a set with normalization constraints in the sense that Θ = {θ ∈ Rq :
θ = 1}. There is also no loss of generality with this choice of parameter set since the ordering and scaling of the basis functions is arbitrary (e.g., we could have chosen with θ(i) = a for any 1 ≤ i = q and any a > 0).
4.5.3.2
Whole-Trajectory Constraint-Satisfaction Method Solution Results
We have seen that the constraint-satisfaction method (4.11) for whole trajectories may be reformulated as the constrained system of linear equations (4.27) under Assumption 4.3. With the parameter set Θ given by (4.36), the set constraint θ ∈ Θ in (4.27) is equivalent to the linear equation e1 θ = θ(1) = 1 where e1 ∈ Rq is a column vector with 1 in its first component and zeros elsewhere. Thus, under Assumption 4.3 and with the parameter set Θ given by (4.36), the constraint-satisfaction method (4.11) reduces to the unconstrained system of linear equations ξ¯C θ = e¯1
(4.37)
where ¯ξC e1 ξC and e¯1 ∈ Rm|K |+1 is a column vector with 1 in its first component and zeros elsewhere. This reformulation of the constraint-satisfaction method (4.11) leads to the following theorem. Theorem 4.3 (Solutions to Whole-Trajectory Constraint-Satisfaction Method) Suppose that Assumption 4.3 holds and that the parameter set Θ is given by (4.36). Let ξ¯C+ be the pseudoinverse of the matrix ξ¯C . Then the constraint-satisfaction method for whole trajectories (4.11) yields cost-functional parameters if and only if
120
4 Continuous-Time Inverse Optimal Control
ξ¯C ξ¯C+ e¯1 = e¯1 ,
(4.38)
and these (potentially nonunique) parameters are given by θ = ξ¯C+ e¯1 + (I − ξ¯C+ ξ¯C )b
(4.39)
where b ∈ Rq is any arbitrary vector. Furthermore, the cost-functional parameters computed by the method (4.11) are unique and given by θ = ξ¯C+ e¯1
(4.40)
if and only if ξ¯C has rank q in addition to satisfying (4.38). Proof Since (4.11) reduces to (4.37) under Assumption 4.3 with Θ given by (4.36), we shall analyze the solutions to (4.37). Proposition 2.2 gives that (4.37) is consistent if and only if ξ¯C ξ¯C+ e¯1 = e¯1 . Proposition 2.2 also gives that these solutions are given by (4.39), proving the first theorem assertion. The second theorem assertion follows from (4.39) since Lemma 2.1 gives that ξ¯C+ ξ¯C = I if and only if ξ¯C has rank q. The proof is complete. The rank condition of Theorem 4.3 serves a role analogous to persistence of excitation conditions that appear in parameter estimation and adaptive control since it holds when the given state and control trajectories provide sufficient information to enable unique determination of the cost-functional parameters θ . It will fail to hold when the problem is ill-posed due to a short time horizon or if there are too few times during which the controls are in the interior of the constraint set U . It may also fail to hold for degenerate system dynamics and initial states that lead to uninformative trajectories. Theorem 4.3 also establishes that the left-identity condition (4.38) is a necessary and sufficient condition for the constraint-satisfaction method (4.11) to yield costfunctional parameters under Assumption 4.3 with Θ given by (4.36). If the leftidentity condition (4.38) does not hold, the constraint-satisfaction method (4.11) will fail to yield any cost-functional parameters. The left-identity condition (4.38) will fail to hold when the trajectories x and u do not constitute a solution to the optimal control problem (4.3) for any θ ∈ Θ, and so Theorem 4.3 provides deep insight into the solution of the WT problem of Definition 4.1 with any method (not just the minimumprinciple methods). Indeed, since the constraint-satisfaction method (4.11) encodes the costate differential equation (4.9) and the Hamiltonian gradient condition (4.8) of the continuous-time minimum principle of Theorem 4.1 as constraints, when (4.11) has no solutions, there are no θ in Θ such that x and u satisfy the minimum principle of Theorem 4.1 (which are necessary conditions for the solution of the optimal control problem (4.3)). Similarly, the rank condition of Theorem 4.3 combined with (4.38)
4.5 Method Reformulations and Solution Results
121
is both necessary and sufficient for ensuring that there is at most one θ in Θ that constitutes an exact solution to the WT problem of Definition 4.1. We thus have the following corollary. Corollary 4.2 (Existence of Exact Solutions to Whole-Trajectory Problem of Definition 4.1) Suppose that Assumption 4.3 holds and that the parameter set Θ is given by (4.36). If ξ¯C ξ¯C+ e¯1 = e¯1 then the state and control trajectories x : [0, T ] → Rn and u : [0, T ] → U do not constitute an optimal solution to the optimal control problem (4.3) for any θ ∈ Θ. If, however, ξ¯C ξ¯C+ e¯1 = e¯1 and ξ¯C has rank q, then there is at most one θ ∈ Θ for which the trajectories x : [0, T ] → Rn and u : [0, T ] → U constitute an optimal solution to (4.3). Proof The costate and Hamiltonian gradient conditions (4.9) and (4.8) of the continuous-time minimum principle of Theorem 4.1 are constraints in the constraintsatisfaction method (4.11), which simplifies to (4.37) under Assumption 4.3 with Θ given by (4.36). Thus, when no solution to (4.37) exists, which Theorem 4.3 implies when ξ¯C ξ¯C+ e¯1 = e¯1 , there are no θ in Θ under which x and u satisfy the continuoustime minimum principle of Theorem 4.1. The first corollary assertion follows since satisfying the conditions of Theorem 4.1 is necessary in order for x and u to constitute a potentially local optimal solution to (4.3). The second corollary assertion follows similarly since if (4.37) possesses a unique solution, which Theorem 4.3 implies when ξ¯C ξ¯C+ e¯1 = e¯1 and ξC has rank q, there is only one θ in Θ such that x and u satisfy (4.9) and (4.8) of the minimum principle (Theorem 4.1). The proof is completed by noting that Theorem 4.1 is necessary (but not sufficient) in order for x and u to constitute a potentially local optimal solution to (4.3). The left-identity condition (4.38) provides a practical condition for determining if it is feasible to solve the WT problem of Definition 4.1 exactly, or if it is necessary to resort to an approximate optimality solution involving either the soft or mixed methods of (4.13) and (4.14). We summarize the use of this condition and the constraint-satisfaction method for whole trajectories (4.11) under Assumption 4.3 in Algorithm 4.1.
Algorithm 4.1 Constraint-Satisfaction Method for Whole Trajectories
Input: Whole state and control trajectories x : [0, T] → Rⁿ and u : [0, T] → U, dynamics f, basis functions ḡ, control-constraint set U, parameter set Θ = {θ ∈ R^q : θ(1) = 1}, and set of times K.
Output: Computed cost-functional parameters θ.
1: Solve the differential equation (4.23) for λ̄ with λ̄(T) = 0.
2: Compute the matrix ξ_C via (4.28).
3: Compute the augmented matrix ξ̄_C from (4.37).
4: Compute the pseudoinverse ξ̄_C⁺ of ξ̄_C.
5: if ξ̄_C ξ̄_C⁺ ē₁ = ē₁ then
6:   if ξ̄_C has rank q then
7:     return Unique θ given by (4.40).
8:   else
9:     return Any θ from (4.39) with any b ∈ R^q.
10:  end if
11: else
12:  return No feasible exact solutions θ to the WT problem (cf. Corollary 4.2).
13: end if
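The pseudoinverse logic of Algorithm 4.1 is straightforward to realize numerically. The following is a minimal sketch of steps 4–13, assuming the augmented matrix ξ̄_C and the vector ē₁ have already been assembled from (4.28) and (4.37), and assuming (4.39) takes the standard consistent-system form θ = ξ̄_C⁺ē₁ + (I − ξ̄_C⁺ξ̄_C)b; all names are illustrative rather than part of the method.

```python
import numpy as np

def constraint_satisfaction_wt(xi_bar_C, q, tol=1e-9):
    """Decision logic of steps 4-13 of Algorithm 4.1, given the
    augmented matrix xi_bar_C assembled from (4.28) and (4.37)."""
    rows, cols = xi_bar_C.shape
    xi_pinv = np.linalg.pinv(xi_bar_C)                  # step 4
    e1 = np.zeros(rows)
    e1[0] = 1.0
    # Left-identity (consistency) condition (4.38)
    if not np.allclose(xi_bar_C @ (xi_pinv @ e1), e1, atol=tol):
        raise ValueError("No exact solution: the trajectories are not "
                         "optimal for any theta in Theta (cf. Corollary 4.2)")
    if np.linalg.matrix_rank(xi_bar_C, tol=tol) == q:
        return xi_pinv @ e1                             # unique theta, cf. (4.40)
    # Nonunique case: assuming (4.39) has the standard consistent-system
    # form theta = xi^+ e1 + (I - xi^+ xi) b, return the b = 0 solution.
    return xi_pinv @ e1
```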
4.5.3.3 Whole-Trajectory Soft Method Solution Results

Under Assumption 4.3, we previously formulated the soft method (4.13) for whole trajectories satisfying Assumption 4.1 as the problem of solving the constrained quadratic optimization problem (4.32). By now incorporating that the parameter set Θ is given by (4.36), we will next examine the existence and uniqueness of solutions to (4.13). To present these results, let us define

$$\bar{\xi}_S \triangleq \begin{bmatrix} P_{(2,2)}(0) & \cdots & P_{(2,q+n)}(0) \\ P_{(3,2)}(0) & \cdots & P_{(3,q+n)}(0) \\ \vdots & \ddots & \vdots \\ P_{(q+n,2)}(0) & \cdots & P_{(q+n,q+n)}(0) \end{bmatrix} \qquad (4.41)$$
as the principal submatrix of P(0) formed by deleting its first row and column, where P_{(i,j)}(0) denotes the element of P(0) in its ith row and jth column. Let us also define ξ̄_S⁺ and r_S as the pseudoinverse and rank of ξ̄_S, respectively, and let the first column of P(0) with its first element deleted be

$$\nu_S \triangleq \begin{bmatrix} P_{(2,1)}(0) & P_{(3,1)}(0) & \cdots & P_{(q+n,1)}(0) \end{bmatrix}^\top. \qquad (4.42)$$
Finally, let a singular value decomposition (SVD) of ξ̄_S be ξ̄_S = U_S Σ_S U_Sᵀ, where Σ_S ∈ R^{(q+n−1)×(q+n−1)} is a diagonal matrix, and

$$U_S = \begin{bmatrix} U_S^{11} & U_S^{12} \\ U_S^{21} & U_S^{22} \end{bmatrix} \in \mathbb{R}^{(q+n-1)\times(q+n-1)} \qquad (4.43)$$
is a block matrix with submatrices U_S^{11} ∈ R^{(q−1)×r_S}, U_S^{12} ∈ R^{(q−1)×(q+n−1−r_S)}, U_S^{21} ∈ R^{n×r_S}, and U_S^{22} ∈ R^{n×(q+n−1−r_S)}. A characterization of the solutions to the soft method (4.13) under Assumption 4.3 with the set Θ given by (4.36) follows.
Theorem 4.4 (Whole-Trajectory Soft Method Solutions) Suppose that Assumptions 4.1 and 4.3 hold, and that Θ is given by (4.36). If (I − ξ̄_S ξ̄_S⁺)ν_S = 0, then all of the parameter vectors θ ∈ Θ corresponding to all of the solutions (λ, θ) to the soft method (4.13) are of the form

$$\theta = I\eta \qquad (4.44)$$
where η = [1, η̄ᵀ]ᵀ ∈ R^{q+n} are (potentially nonunique) solutions to the quadratic program (4.32) with η̄ ∈ R^{q+n−1} given by

$$\bar\eta = -\bar\xi_S^+ \nu_S + U_S \begin{bmatrix} 0 \\ b \end{bmatrix} \qquad (4.45)$$
for any b ∈ R^{q+n−1−r_S}. Furthermore, if either U_S^{12} = 0 or r_S = q + n − 1, then all solutions (λ, θ) to the soft method (4.13) correspond to the single unique parameter vector θ ∈ Θ given by

$$\theta = I \begin{bmatrix} 1 \\ -\bar\xi_S^+ \nu_S \end{bmatrix}. \qquad (4.46)$$
Proof The soft method developed under Assumption 4.1 is (4.13), and may be reformulated as (4.32) under Assumption 4.3. We thus proceed by analyzing the solutions to (4.32). For any η ∈ R^{q+n} with Iη ∈ Θ and Θ given by (4.36), we have that η = [1, η̄ᵀ]ᵀ where η̄ ∈ R^{q+n−1}, and so

$$\eta^\top P(0)\eta = \begin{bmatrix} 1 & \bar\eta^\top \end{bmatrix} P(0) \begin{bmatrix} 1 \\ \bar\eta \end{bmatrix} = P_{(1,1)}(0) + \bar\eta^\top \bar\xi_S \bar\eta + 2\bar\eta^\top \nu_S.$$

All solutions η = [1, η̄ᵀ]ᵀ of the constrained quadratic program (4.32) with Θ given by (4.36) therefore have η̄ ∈ R^{q+n−1} given by the solution to the unconstrained quadratic program

$$\inf_{\bar\eta} \; \frac{1}{2}\bar\eta^\top \bar\xi_S \bar\eta + \bar\eta^\top \nu_S.$$

Under the condition that (I − ξ̄_S ξ̄_S⁺)ν_S = 0, and by noting that ξ̄_S is symmetric positive semidefinite because P(0) is symmetric positive semidefinite, Proposition 2.1 gives that this unconstrained quadratic program is solved by any η̄ satisfying
$$\bar\eta = -\bar\xi_S^+ \nu_S + U_S \begin{bmatrix} 0 \\ b \end{bmatrix}$$
for any b ∈ R^{q+n−1−r_S}. The first theorem assertion (4.45) follows. To prove the second theorem assertion, we note that if U_S^{12} = 0, then

$$\bar\eta = -\bar\xi_S^+ \nu_S + \begin{bmatrix} U_S^{11} & 0 \\ U_S^{21} & U_S^{22} \end{bmatrix} \begin{bmatrix} 0 \\ b \end{bmatrix} = -\bar\xi_S^+ \nu_S + \begin{bmatrix} 0 \\ U_S^{22} b \end{bmatrix}$$

for any b ∈ R^{q+n−1−r_S}, where U_S^{22}b ∈ Rⁿ. Clearly, if r_S = q + n − 1, then η̄ = −ξ̄_S⁺ν_S. Thus, if either U_S^{12} = 0 or r_S = q + n − 1, then the first q − 1 components of η̄ are invariant with respect to the free vector b ∈ R^{q+n−1−r_S}, and so all solutions η = [1, η̄ᵀ]ᵀ of the constrained quadratic program (4.32) satisfy

$$I\eta = I \begin{bmatrix} 1 \\ -\bar\xi_S^+ \nu_S \end{bmatrix}.$$

The second theorem assertion follows since θ = Iη. The proof is complete.
The soft method (4.13) involves optimizing over both the cost-functional parameters θ and the costate function λ : [0, T] → Rⁿ. In Theorem 4.4, we have focused on characterizing the cost-functional parameters θ yielded by the soft method (4.13) without regard for the costate function. Indeed, Theorem 4.4 establishes that the cost-functional parameters yielded by (4.13) are given by (4.44), and that the conditions U_S^{12} = 0 and r_S = q + n − 1 are each sufficient for ensuring the uniqueness of the cost-functional parameters yielded by (4.13). The conditions U_S^{12} = 0 and r_S = q + n − 1 again have an intuitive interpretation as persistence-of-excitation conditions for the soft method (4.13). While these conditions for the soft method may hold in cases where the conditions of Theorem 4.3 and Corollary 4.2 for the constraint-satisfaction method do not, in such cases the solutions to the soft method will yield parameters under which the trajectories x and u are approximately optimal rather than exactly optimal (in the sense discussed in Sect. 4.2).
From Theorem 4.4, we observe that if r_S = q + n − 1 holds, then both the cost-functional parameters θ and the costate functions λ : [0, T] → Rⁿ yielded by the soft method (4.13) are unique. To see that a unique pair (λ, θ) solves (4.13) when r_S = q + n − 1, we note that the first assertion of Theorem 4.4, specifically (4.44), implies that the vectors η = [1, η̄ᵀ]ᵀ are unique solutions to the quadratic program (4.32) if r_S = q + n − 1 because the free vector b is then zero-dimensional. By also recalling from (4.30) that
$$\eta = z(0) = \begin{bmatrix} \theta \\ \lambda(0) \end{bmatrix}$$

and that the function z(t) = [θᵀ, λᵀ(t)]ᵀ in (4.30) is unique for each initial condition λ(0), we have that the pair (λ, θ) solving the soft method (4.13) will be unique when r_S = q + n − 1. Finally, the condition U_S^{12} = 0 can hold when r_S < q + n − 1. In this case, all pairs (θ, λ) solving the soft method (4.13) will share the unique parameter vector θ given by (4.46) but may not share a common costate function λ. We summarize the soft method for whole trajectories (4.13), developed and reformulated under Assumptions 4.1 and 4.3, in Algorithm 4.2.
Algorithm 4.2 Soft Method for Whole Trajectories
Input: Whole state and control trajectories x : [0, T] → Rⁿ and u : [0, T] → U, dynamics f, basis functions ḡ, control-constraint set U, and parameter set Θ = {θ ∈ R^q : θ(1) = 1}.
Output: Computed cost-functional parameters θ.
1: Solve the Riccati differential equation (4.31) with P(T) = 0 for P(0).
2: Compute the submatrix ξ̄_S from (4.41) and the vector ν_S from (4.42).
3: Compute the pseudoinverse ξ̄_S⁺ of ξ̄_S, checking that (I − ξ̄_S ξ̄_S⁺)ν_S = 0.
4: Compute the rank r_S of ξ̄_S.
5: if r_S = q + n − 1 then
6:   return Unique θ given by (4.46).
7: else
8:   Compute U_S and U_S^{12} in (4.43) via a SVD of ξ̄_S.
9:   if U_S^{12} = 0 then
10:    return Unique θ given by (4.46).
11:  else
12:    return Any θ from (4.44) with any b ∈ R^{q+n−1−r_S}.
13:  end if
14: end if
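As a companion to Algorithm 4.2, the following is a minimal numerical sketch of its decision logic given the Riccati solution P(0), assuming numpy conventions; the same pseudoinverse-plus-SVD logic also drives the mixed method of Algorithm 4.3 below, with (ξ̄_M, ν_M) in place of (ξ̄_S, ν_S). Function and variable names are ours, not the book's.

```python
import numpy as np

def soft_method_wt(P0, q, tol=1e-9):
    """Steps 2-14 of Algorithm 4.2: recover theta from P(0) via
    (4.41)-(4.46); P0 is the (q+n) x (q+n) Riccati solution at t = 0."""
    xi_S = P0[1:, 1:]            # principal submatrix (4.41)
    nu_S = P0[1:, 0]             # first column, first element deleted (4.42)
    xi_pinv = np.linalg.pinv(xi_S)
    if not np.allclose(xi_S @ (xi_pinv @ nu_S), nu_S, atol=tol):
        raise ValueError("Condition (I - xi_S xi_S^+) nu_S = 0 fails")
    eta_bar = -xi_pinv @ nu_S    # minimum-norm solution (4.45), b = 0
    r_S = np.linalg.matrix_rank(xi_S, tol=tol)
    unique = r_S == P0.shape[0] - 1
    if not unique:
        # Check U_S^12 = 0, the other sufficient condition of Theorem 4.4
        U, _, _ = np.linalg.svd(xi_S)
        unique = np.allclose(U[:q - 1, r_S:], 0.0, atol=tol)
    theta = np.concatenate(([1.0], eta_bar[:q - 1]))   # theta = I eta (4.44)
    return theta, unique
```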
4.5.3.4 Whole-Trajectory Mixed Method Solution Results
We now examine the solution of the mixed method (4.14) when Θ is given by (4.36) through its reformulation in (4.33) under Assumption 4.3. Let us define

$$\bar\xi_M \triangleq \begin{bmatrix} \xi_{M,(2,2)} & \cdots & \xi_{M,(2,q)} \\ \xi_{M,(3,2)} & \cdots & \xi_{M,(3,q)} \\ \vdots & \ddots & \vdots \\ \xi_{M,(q,2)} & \cdots & \xi_{M,(q,q)} \end{bmatrix} \qquad (4.47)$$
as the principal submatrix of ξ_M formed by deleting its first row and column, where ξ_{M,(i,j)} denotes the element of ξ_M in the ith row and jth column. Let us also
define ξ̄_M⁺ and r_M as the pseudoinverse and rank of ξ̄_M, respectively, and let

$$\nu_M \triangleq \begin{bmatrix} \xi_{M,(2,1)} & \xi_{M,(3,1)} & \cdots & \xi_{M,(q,1)} \end{bmatrix}^\top \qquad (4.48)$$

denote the first column of ξ_M without its first element. Finally, let ξ̄_M = U_M Σ_M U_Mᵀ be a SVD of ξ̄_M, where U_M ∈ R^{(q−1)×(q−1)} and Σ_M ∈ R^{(q−1)×(q−1)}. In the following theorem, we use these definitions to characterize the existence and uniqueness of solutions to the mixed method (4.14) under Assumption 4.3 with the parameter set Θ given by (4.36).
Theorem 4.5 (Whole-Trajectory Mixed Method Solutions) Suppose that Assumption 4.3 holds and that Θ is given by (4.36). If (I − ξ̄_M ξ̄_M⁺)ν_M = 0, then the parameter vectors θ ∈ Θ solving the mixed method of (4.14) are all of the form

$$\theta = \begin{bmatrix} 1 \\ \bar\theta \end{bmatrix} \qquad (4.49)$$
where the vectors θ̄ ∈ R^{q−1} are given by

$$\bar\theta = -\bar\xi_M^+ \nu_M + U_M \begin{bmatrix} 0 \\ b \end{bmatrix} \qquad (4.50)$$
for any b ∈ R^{q−1−r_M}. Furthermore, if r_M = q − 1, then

$$\theta = \begin{bmatrix} 1 \\ -\bar\xi_M^+ \nu_M \end{bmatrix} \qquad (4.51)$$
is the unique solution to (4.14).
Proof The mixed method (4.14) reduces to the quadratic program (4.33) under Assumption 4.3, and so we analyze (4.33). Since any θ ∈ Θ can be written as θ = [1, θ̄ᵀ]ᵀ with θ̄ ∈ R^{q−1} when Θ = {θ ∈ R^q : θ(1) = 1}, we have that

$$\theta^\top \xi_M \theta = \begin{bmatrix} 1 & \bar\theta^\top \end{bmatrix} \xi_M \begin{bmatrix} 1 \\ \bar\theta \end{bmatrix} = \xi_{M,(1,1)} + \bar\theta^\top \bar\xi_M \bar\theta + 2\bar\theta^\top \nu_M,$$

where ξ_{M,(1,1)} is the first element of ξ_M (i.e., the element in its first row and first column). The constrained quadratic program (4.33) with Θ given by (4.36) is therefore solved by the vectors θ = [1, θ̄ᵀ]ᵀ where θ̄ ∈ R^{q−1} are solutions to the unconstrained quadratic program
$$\inf_{\bar\theta} \; \frac{1}{2}\bar\theta^\top \bar\xi_M \bar\theta + \bar\theta^\top \nu_M.$$
Noting that ξ̄_M is positive semidefinite because ξ_M is positive semidefinite, and since (I − ξ̄_M ξ̄_M⁺)ν_M = 0 by the theorem conditions, Proposition 2.1 gives that this unconstrained quadratic program is solved by all θ̄ of the form

$$\bar\theta = -\bar\xi_M^+ \nu_M + U_M \begin{bmatrix} 0 \\ b \end{bmatrix}$$

for any b ∈ R^{q−1−r_M}. The first theorem assertion (4.49) follows. The second theorem assertion (4.51) also follows since the free vector b ∈ R^{q−1−r_M} is zero-dimensional when r_M = q − 1. The proof is complete.
Theorem 4.5 establishes that the mixed method (4.14) yields unique parameters θ under Assumption 4.3 when the parameter set Θ is given by (4.36) and the rank condition r_M = q − 1 holds. As with the conditions for uniqueness of solutions to the constraint-satisfaction and soft methods established in Theorems 4.3 and 4.4, the rank condition r_M = q − 1 established in Theorem 4.5 has an intuitive interpretation as a persistence-of-excitation condition. The rank condition r_M = q − 1 of Theorem 4.5 for the mixed method (4.14) may, however, hold in cases where the conditions of Theorems 4.3 and 4.4 for the constraint-satisfaction and soft methods do not. Indeed, the mixed method (4.14) will always yield parameters under which the trajectories are either exactly or approximately optimal (in the sense discussed in Sect. 4.2), while the constraint-satisfaction method (4.11) is limited to exact optimality and so may fail to yield any parameters. The mixed method (4.14) also handles situations in which the controls are constrained, while the soft method (4.13) does not (due to Assumption 4.1). We summarize the mixed method for whole trajectories (4.14) under Assumption 4.3 in Algorithm 4.3.
4.5.4 Reformulations of Truncated-Trajectory Methods We now turn our attention to reformulating the truncated-trajectory methods of Sect. 4.4.2 under the linear parameterization structure imposed by Assumption 4.3.
4.5.4.1 Truncated-Trajectory Constraint-Satisfaction and Mixed Method Reformulations
In the constraint-satisfaction and mixed methods of (4.17) and (4.20), the function λ : [0, ℓ] → Rⁿ is constrained to satisfy the differential equation
Algorithm 4.3 Mixed Method for Whole Trajectories
Input: Whole state and control trajectories x : [0, T] → Rⁿ and u : [0, T] → U, dynamics f, basis functions ḡ, control-constraint set U, parameter set Θ = {θ ∈ R^q : θ(1) = 1}, and set of times K.
Output: Computed cost-functional parameters θ.
1: Solve the differential equation (4.23) for λ̄ with λ̄(T) = 0.
2: Compute the matrix ξ_M via (4.34).
3: Compute the submatrix ξ̄_M from (4.47) and the vector ν_M from (4.48).
4: Compute the rank r_M of ξ̄_M.
5: Compute the pseudoinverse ξ̄_M⁺ of ξ̄_M, checking that (I − ξ̄_M ξ̄_M⁺)ν_M = 0.
6: if r_M = q − 1 then
7:   return Unique θ given by (4.51).
8: else
9:   Compute U_M through a SVD of ξ̄_M.
10:  return Any θ from (4.49) with any b ∈ R^{q−1−r_M}.
11: end if
$$\dot\lambda(t) = -\nabla_x H(t, x(t), u(t), \lambda(t), \mu, \theta)$$

without any explicit boundary conditions on λ(ℓ) or λ(0). Without boundary conditions for λ, we are unable to employ the same strategy we used to reformulate the constraint-satisfaction and mixed methods for whole trajectories (i.e., we are unable to develop or apply a result analogous to Proposition 4.1). Indeed, options for reformulating the truncated-trajectory constraint-satisfaction and mixed methods of (4.17) and (4.20) appear limited to substituting the linear forms of the Hamiltonian gradients (4.21) and (4.22) into the methods. Thus, under Assumption 4.3 the constraint-satisfaction method (4.17) may be rewritten as

$$\begin{aligned} \inf_{\theta,\lambda,\mu} \quad & C \\ \text{s.t.} \quad & \dot\lambda(t) = -\nabla_x \bar g(t, x(t), u(t))\theta\mu - \nabla_x f(t, x(t), u(t))\lambda(t), \quad t \in [0, \ell] \\ & \nabla_u \bar g(t_k, x(t_k), u(t_k))\theta\mu + \nabla_u f(t_k, x(t_k), u(t_k))\lambda(t_k) = 0, \quad t_k \in K \\ & \theta \in \Theta \\ & \mu \in \mathbb{R} \end{aligned} \qquad (4.52)$$
for any constant C ∈ R. Similarly, under Assumption 4.3 the mixed method (4.20) may be rewritten as

$$\begin{aligned} \inf_{\theta,\lambda,\mu} \quad & \sum_{t_k \in K} \left\| \nabla_u \bar g(t_k, x(t_k), u(t_k))\theta\mu + \nabla_u f(t_k, x(t_k), u(t_k))\lambda(t_k) \right\|^2 \\ \text{s.t.} \quad & \dot\lambda(t) = -\nabla_x \bar g(t, x(t), u(t))\theta\mu - \nabla_x f(t, x(t), u(t))\lambda(t), \quad t \in [0, \ell] \\ & \theta \in \Theta, \; \mu \in \mathbb{R}. \end{aligned} \qquad (4.53)$$
From (4.52), we see that the constraint-satisfaction method (4.17) under Assumption 4.3 involves the solution of a system of linear equations and a linear ordinary differential equation. Similarly, (4.53) implies that the mixed method (4.20) under Assumption 4.3 involves the solution of a quadratic optimization problem and a linear ordinary differential equation. In both methods, however, the differential equation and the linear or quadratic programs are coupled through both the parameters θ and the functions λ, limiting the extent to which we are able to reformulate them. We shall therefore instead focus our attention on reformulating the soft method (4.19).
4.5.4.2 Truncated-Trajectory Soft Method Reformulation
We may employ Assumption 4.3 to reformulate the soft method (4.19) developed under Assumption 4.2. Let us begin by recalling the linearity of the Hamiltonian gradients in (4.21) and (4.22) under Assumption 4.3. Let us also recall the definitions and notation used in the reformulation of the whole-trajectory soft method in Sect. 4.5.2.2, and let

$$z_\ell(t) \triangleq \begin{bmatrix} \mu\theta \\ \lambda(t) \end{bmatrix}.$$

Using the same argument and notation as Sect. 4.5.2.2, the soft method (4.19) under Assumptions 4.2 and 4.3 can be rewritten as the LQ optimization problem
$$\begin{aligned} \inf_{\beta} \inf_{v} \quad & \int_0^\ell \left[ z_\ell^\top(t) Q(t) z_\ell(t) + v^\top(t) R v(t) + 2 z_\ell^\top(t) S(t) v(t) \right] dt \\ \text{s.t.} \quad & \dot z_\ell(t) = B v(t), \quad t \in [0, \ell] \\ & z_\ell(0) = \beta \\ & I\beta\mu^{-1} \in \Theta \end{aligned} \qquad (4.54)$$

where β ∈ R^{q+n}. For any β ∈ R^{q+n}, the inner optimization problem over the function v in (4.54) is a LQ optimal control problem with finite horizon ℓ. Thus, by applying the LQ optimal control results of [3, Sect. 3.4] as in Sect. 4.5.2.2, the soft method (4.19) under Assumption 4.3 is the (constrained) quadratic program
$$\inf_\beta \; \beta^\top P_\ell(0) \beta \quad \text{s.t.} \quad I\beta\mu^{-1} \in \Theta, \qquad (4.55)$$
where P_ℓ : [0, ℓ] → R^{(q+n)×(q+n)} is the unique positive semidefinite solution to the Riccati differential equation

$$\dot P_\ell(t) = (P_\ell(t)B + S(t))(B^\top P_\ell(t) + S^\top(t)) - Q(t) \qquad (4.56)$$

for t ∈ [0, ℓ] with terminal condition P_ℓ(ℓ) = 0.
If the horizon T is known to be finite, then the cost-functional parameters yielded by the soft method (4.55) under Assumption 4.3 are θ = Iβ, where β solves the (constrained) quadratic optimization problem (4.55) with μ = 1. If, however, the horizon T is infinite or not known to be finite, then the constant μ ∈ R is unknown, and so the quadratic optimization problem (4.55) must be solved without the constraint that Iβμ⁻¹ ∈ Θ. In the case of an unknown (potentially infinite) horizon T, the cost-functional parameters yielded by the soft method (4.55) under Assumption 4.3 will therefore satisfy μθ = Iβ, where β solves the quadratic optimization problem (4.55) without the constraint Iβμ⁻¹ ∈ Θ.
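For concreteness, the terminal-value problem (4.56) can be integrated numerically by flattening the matrix Riccati equation into a vector ODE. The following is a minimal sketch assuming scipy is available and that user-supplied callables return the coefficient matrices Q(t) and S(t) of the reformulation, together with the constant matrix B; the function and argument names are illustrative, not part of the method.

```python
import numpy as np
from scipy.integrate import solve_ivp

def riccati_P0(B, Q_fun, S_fun, ell):
    """Integrate the Riccati ODE (4.56) backward from P(ell) = 0 and
    return P(0). B is (dim x n_v); Q_fun(t) is (dim x dim) and
    S_fun(t) is (dim x n_v), with dim = q + n."""
    dim = B.shape[0]

    def rhs(t, p_flat):
        P = p_flat.reshape(dim, dim)
        M = P @ B + S_fun(t)
        return (M @ M.T - Q_fun(t)).ravel()   # cf. (4.56)

    # solve_ivp integrates backward when the time span is decreasing;
    # the terminal condition P(ell) = 0 becomes the "initial" value.
    sol = solve_ivp(rhs, (ell, 0.0), np.zeros(dim * dim), rtol=1e-8)
    return sol.y[:, -1].reshape(dim, dim)
```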
4.5.5 Solution Results for Truncated-Trajectory Methods
We shall now use the reformulation of the truncated-trajectory soft method to establish results characterizing the existence and uniqueness of the cost-functional parameters it yields under Assumption 4.3. Here, we omit consideration of the constraint-satisfaction and mixed methods for truncated trajectories due to the limited extent to which we are able to reformulate them in (4.52) and (4.53). Without loss of generality, we shall again consider the parameter set Θ given by (4.36) to avoid nonuniqueness and triviality due to scalings of the cost functional.
4.5.5.1 Truncated-Trajectory Soft Method Solution Results
Under Assumption 4.3, we reformulated the soft method (4.19) for truncated trajectories satisfying Assumption 4.2 as the problem of solving the constrained quadratic optimization problem (4.55). By now incorporating that the parameter set Θ is given by (4.36), we will next characterize the existence and uniqueness of solutions to (4.19). To present these results, let us define

$$\bar\phi_S \triangleq \begin{bmatrix} P_{\ell,(2,2)}(0) & \cdots & P_{\ell,(2,q+n)}(0) \\ P_{\ell,(3,2)}(0) & \cdots & P_{\ell,(3,q+n)}(0) \\ \vdots & \ddots & \vdots \\ P_{\ell,(q+n,2)}(0) & \cdots & P_{\ell,(q+n,q+n)}(0) \end{bmatrix} \qquad (4.57)$$
as the principal submatrix of P_ℓ(0) formed by deleting the first row and column of P_ℓ(0), where P_{ℓ,(i,j)}(0) denotes the element in the ith row and jth column of P_ℓ(0). Let us also define φ̄_S⁺ and ρ_S as the pseudoinverse and rank of φ̄_S, respectively, and let

$$\bar\nu_S \triangleq \begin{bmatrix} P_{\ell,(2,1)}(0) & P_{\ell,(3,1)}(0) & \cdots & P_{\ell,(q+n,1)}(0) \end{bmatrix}^\top \qquad (4.58)$$

be the first column of P_ℓ(0) without its first element. Finally, a SVD of φ̄_S is
$$\bar\phi_S = \bar U_S \bar\Sigma_S \bar U_S^\top,$$

where Σ̄_S ∈ R^{(q+n−1)×(q+n−1)} is a diagonal matrix, and

$$\bar U_S = \begin{bmatrix} \bar U_S^{11} & \bar U_S^{12} \\ \bar U_S^{21} & \bar U_S^{22} \end{bmatrix} \in \mathbb{R}^{(q+n-1)\times(q+n-1)} \qquad (4.59)$$
is a block matrix with submatrices Ū_S^{11} ∈ R^{(q−1)×ρ_S}, Ū_S^{12} ∈ R^{(q−1)×(q+n−1−ρ_S)}, Ū_S^{21} ∈ R^{n×ρ_S}, and Ū_S^{22} ∈ R^{n×(q+n−1−ρ_S)}. We characterize the existence and uniqueness of solutions to the soft method (4.19) under Assumption 4.3 with the parameter set Θ given by (4.36) when the horizon T is finite (but potentially unknown).
Theorem 4.6 (Truncated-Trajectory Soft Method Solutions) Suppose that Assumptions 4.2 and 4.3 hold, that Θ is given by (4.36), and that (I − φ̄_S φ̄_S⁺)ν̄_S = 0. If the horizon T is finite (but potentially unknown), then all of the parameter vectors θ ∈ Θ corresponding to all of the solutions (λ, θ) to the soft method (4.19) are of the form

$$\theta = I\beta \qquad (4.60)$$
where β = [1, β̄ᵀ]ᵀ ∈ R^{q+n} are (potentially nonunique) solutions to the quadratic program (4.55) with β̄ ∈ R^{q+n−1} given by

$$\bar\beta = -\bar\phi_S^+ \bar\nu_S + \bar U_S \begin{bmatrix} 0 \\ b \end{bmatrix} \qquad (4.61)$$
for any b ∈ R^{q+n−1−ρ_S}. Furthermore, if either Ū_S^{12} = 0 or ρ_S = q + n − 1, then all solutions (λ, θ) to the soft method (4.19) correspond to the single unique parameter vector θ ∈ Θ given by

$$\theta = I \begin{bmatrix} 1 \\ -\bar\phi_S^+ \bar\nu_S \end{bmatrix}. \qquad (4.62)$$
Proof The proof is essentially the same as that of Theorem 4.4 and follows from the reformulation (4.55) under Assumption 4.3 noting that μ = 1 when T is finite. The proof is complete. If the horizon T is known to be finite (albeit with a potentially unknown value), then Theorem 4.6 establishes the same solution and uniqueness results for the soft method (4.19) as Theorem 4.4 but for the truncated, rather than whole, trajectory inverse optimal control problem of Definition 4.2. The implementation of the soft method for truncated trajectories (4.19) is therefore essentially equivalent to that of the soft method for whole trajectories (4.13) summarized in Algorithm 4.2. If,
however, the horizon is unknown and potentially infinite, then inspection of the soft method (4.55) suggests that it may only be feasible to estimate the bilinear term μθ (and not the values of μ and θ individually).
4.6 Inverse Linear-Quadratic Optimal Control in Continuous Time
In this section, we consider the TT problem of Definition 4.2 in the special case of linear system dynamics (i.e., f(x(t), u(t)) = Ax(t) + Bu(t) with matrices A and B), quadratic cost functionals (i.e., g(x(t), u(t), θ) = xᵀ(t)Qx(t) + uᵀ(t)Ru(t) with matrices Q and R described by the parameters θ), and an infinite horizon T = ∞.
4.6.1 Overview of Approach The approach we shall develop to solve the TT problem of Definition 4.2 in this infinite-horizon LQ case is essentially equivalent to that described in Sect. 3.6.1 for discrete time. Specifically, by noting that standard LQ optimal control results (e.g., [3]) imply that the states and controls solving the (forward) infinite-horizon LQ optimal control problem satisfy the feedback relationship u(t) = −K x(t) for some optimal feedback law K ∈ Rm×n , the inverse LQ optimal control problem can be approached via a two-step process involving estimation of K from given state and control trajectories followed by manipulation of an algebraic Riccati equation. As we shall now show, the key distinction between this continuous-time approach and the discrete-time approach presented in Sect. 3.6 lies in the use of a continuous-time algebraic Riccati equation (rather than a discrete-time algebraic Riccati equation) to find the matrices Q and R in the second step.
4.6.2 Preliminary LQ Optimal Control Concepts Let us first specialize concepts from Sect. 4.1 to the linear-quadratic setting.
4.6.2.1 Parameterized LQ Optimal Control in Continuous Time
Consider continuous-time system dynamics described by the system of linear ordinary differential equations

$$\dot x(t) = Ax(t) + Bu(t), \quad x(0) = x_0 \in \mathbb{R}^n, \qquad (4.63)$$
where A ∈ R^{n×n} and B ∈ R^{n×m} are time-invariant system matrices. Consider also an infinite-horizon quadratic cost functional of the form

$$V_\infty(x, u, Q, R) = \frac{1}{2} \int_0^\infty \left[ x^\top(t) Q x(t) + u^\top(t) R u(t) \right] dt \qquad (4.64)$$
where Q ⪰ 0 and R ≻ 0 are positive semidefinite and positive definite matrices, respectively, which parameterize the cost functional. Given the linear dynamics (4.63) and the quadratic cost functional (4.64), the continuous-time infinite-horizon LQ optimal control problem is

$$\begin{aligned} \inf_u \quad & V_\infty(x, u, Q, R) \\ \text{s.t.} \quad & \dot x(t) = Ax(t) + Bu(t), \quad t \geq 0 \\ & x(0) = x_0. \end{aligned} \qquad (4.65)$$
As in discrete-time, necessary and sufficient optimality conditions for this LQ optimal control problem involve an algebraic Riccati equation (ARE).2
4.6.2.2 Necessary and Sufficient Conditions for Feedback Solutions
To present the continuous-time ARE for (4.65), we restrict our attention to stabilizing linear feedback control laws K ∈ R^{m×n} such that

$$u(t) = -Kx(t) \qquad (4.66)$$

and the resulting closed-loop system matrix

$$F \triangleq A - BK \qquad (4.67)$$

is stable, meaning that x(t) with ẋ(t) = Fx(t) tends to zero as t → ∞. The set of stabilizing linear feedback laws will be denoted

$$\mathcal{F} = \{K \in \mathbb{R}^{m \times n} : F \text{ is stable}\}. \qquad (4.68)$$
The set F is non-empty if the pair (A, B) is stabilizable (cf. Footnote 5 in Sect. 3.6.2.2). The cost functional (4.64) can then be written as V∞ (K , x0 , Q, R), and we have the following theorem giving necessary and sufficient conditions for optimal feedback laws.
² These conditions can be derived from continuous-time minimum principles such as Theorems 4.1 and 4.2.
Theorem 4.7 (Continuous-Time ARE [8]) Consider the continuous-time infinite-horizon LQ optimal control problem (4.65). A feedback law K ∈ F is optimal, i.e., V∞(K, x₀, Q, R) ≤ V∞(K̄, x₀, Q, R) for all K̄ ∈ F and all x₀ ∈ Rⁿ, if and only if

$$K = R^{-1} B^\top P \qquad (4.69)$$

where P ∈ R^{n×n} is a stabilizing solution³ to the continuous-time ARE

$$A^\top P + PA - PBR^{-1}B^\top P + Q = 0. \qquad (4.70)$$

Furthermore, if K is optimal then V∞(K, x₀, Q, R) = x₀ᵀPx₀.
Proof See [8, Theorem 2].
It is important to note that the continuous-time ARE presented in Theorem 4.7 differs from the discrete-time ARE presented in Theorem 3.9. Nevertheless, we shall employ it in a similar manner to solve the continuous-time inverse LQ optimal control problem that we introduce next.
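For illustration, the ARE conditions of Theorem 4.7 are easy to check numerically. The following sketch uses scipy's solve_continuous_are to compute the stabilizing solution P of (4.70) and the optimal feedback law (4.69); the double-integrator system and weights are illustrative choices of ours.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative data: a stabilizable pair (A, B) and weights Q >= 0, R > 0
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.5]])

P = solve_continuous_are(A, B, Q, R)    # stabilizing ARE solution of (4.70)
K = np.linalg.solve(R, B.T @ P)         # optimal feedback law (4.69)

# Sanity checks: the ARE residual vanishes and F = A - B K is stable
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
assert np.allclose(residual, 0.0, atol=1e-8)
assert np.all(np.linalg.eigvals(A - B @ K).real < 0)
```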
4.6.3 Feedback-Law-Based Inverse LQ Optimal Control In this subsection, we pose and solve the feedback-law-based problem of inverse optimal control of concern in the second step of the two-step process described in Sect. 4.6.1.
4.6.3.1 Feedback-Law-Based Problem Formulation
The feedback-law-based inverse optimal control problem is defined as follows.
Definition 4.4 (Feedback-Law-Based (FLB) Problem) Consider the parameterized continuous-time infinite-horizon LQ optimal control problem (4.65) with linear dynamics (4.63), quadratic cost functional (4.64), and infinite horizon T = ∞. Given system matrices A and B, and a feedback law K, the feedback-law-based continuous-time infinite-horizon inverse LQ optimal control problem is to compute the cost-functional parameters (i.e., the entries of the matrices Q and R) such that K (and hence the controls given by u(t) = −Kx(t)) constitutes an optimal solution to (4.65).
We shall examine the solution of the FLB problem of Definition 4.4 by exploiting the continuous-time ARE established in Theorem 4.7. Specifically, we shall interpret Theorem 4.7 as stating conditions that the unknown cost-function matrices Q and R (and the Riccati solution P) must satisfy in order for the feedback law K to constitute an optimal solution to (4.65). Thus, we begin by reformulating (4.70).
³ That is, P solves (4.70) and leads to a feedback law K of the form in (4.69) belonging to F.
4.6.3.2 Reformulation of the Algebraic Riccati Equation
Theorem 4.7 implies that in order for K to constitute an optimal solution to (4.65), matrices P and R must exist such that K is of the form given in (4.69). That is, P and R must exist such that

$$RK = B^\top P \quad \Longleftrightarrow \quad (K^\top \otimes I_m)\,\mathrm{vec}(R) = (I_n \otimes B^\top)\,\mathrm{vec}(P), \qquad (4.71)$$
where the second equation results from the vectorization of the first (see Sect. 3.6.3.2 for definitions of the vectorization notation and related identities). We now turn our attention to the ARE (4.70), which gives conditions on Q depending on P. We first use (4.69) to rewrite (4.70) as

$$\begin{aligned} 0 &= -PA - A^\top P - Q + PBK \\ 0 &= -PF - A^\top P - Q, \end{aligned} \qquad (4.72)$$
where the last equation results from inserting (4.67). We now vectorize the last equation to obtain

$$\mathrm{vec}(Q) = -(I_n \otimes A^\top)\,\mathrm{vec}(P) - (F^\top \otimes I_n)\,\mathrm{vec}(P) \qquad (4.73)$$

and thus

$$\mathrm{vec}(P) = -\left[(I_n \otimes A^\top) + (F^\top \otimes I_n)\right]^{-1}\mathrm{vec}(Q), \qquad (4.74)$$
provided the inverse exists. We now insert (4.74) into the last equation of (4.71), resulting in

$$-(I_n \otimes B^\top)\left[(I_n \otimes A^\top) + (F^\top \otimes I_n)\right]^{-1}\mathrm{vec}(Q) - (K^\top \otimes I_m)\,\mathrm{vec}(R) = 0. \qquad (4.75)$$
The result is an equation of the form

$$\bar W \bar\theta = 0, \qquad (4.76)$$

where W̄ ∈ R^{nm×L} (with L ≜ n² + m²) is the matrix given by

$$\bar W = \begin{bmatrix} -(I_n \otimes B^\top)\left[(I_n \otimes A^\top) + (F^\top \otimes I_n)\right]^{-1} & -(K^\top \otimes I_m) \end{bmatrix} \qquad (4.77)$$

and θ̄ ∈ R^L contains all entries of the cost-function matrices, i.e.,

$$\bar\theta = \begin{bmatrix} \mathrm{vec}(Q) \\ \mathrm{vec}(R) \end{bmatrix}. \qquad (4.78)$$
Since the matrix W̄ does not depend on the unknown matrices P, Q, or R, we can in principle solve (4.76) as a homogeneous system of linear equations for the cost-function
parameters θ¯ (i.e., the matrices Q and R). We can thus leverage (4.76) to analyze the existence of (exact) solutions to the FLB problem of Definition 4.4 in a similar manner to the results of Corollary 4.2 (which despite being for nonlinear systems and nonquadratic cost functionals, are applicable only in the case of whole trajectories).
4.6.3.3 Existence of Exact Solutions to Feedback-Law-Based Problem
To examine the existence (and uniqueness) of solutions to the homogeneous system of linear equations (4.76), let us omit redundant elements of Q and R from the parameter vector θ̄, leading to the reduced parameter vector θ ∈ R^q containing only the nonredundant and nonzero elements of θ̄. Thus, we obtain a reformulated ARE (4.76) of the form

$$W\theta = 0, \qquad (4.79)$$

with the reduced parameter vector θ ∈ R^q, and with the matrix W ∈ R^{nm×q} being an appropriately modified version of W̄ such that (4.79) holds. From (4.79), we see that the set of parameters θ solving the FLB problem of Definition 4.4 corresponds to the kernel of the matrix W, i.e.,

$$\ker(W), \qquad (4.80)$$

with convex boundaries representing Q ⪰ 0 and R ≻ 0. If the kernel of W is trivial (i.e., contains only the zero vector), then there are no feasible solutions to the FLB problem. The existence of a nontrivial kernel of W depends on the dimensions of W (or on rank(W)), with the same conditions discussed in Sect. 3.6.3.3 for the discrete-time setting applicable here.
4.6.3.4 Feedback-Law-Based Method
To avoid issues with the existence of solutions to (4.79), and similar to the soft methods we presented in Sect. 4.4, the method we consider for solving the FLB problem of Definition 4.4 is to minimize the violation of the continuous-time ARE (4.70) by solving the quadratic program

$$\begin{aligned} \min_\theta \quad & \frac{1}{2}\theta^\top \Omega \theta \\ \text{s.t.} \quad & Q \succeq 0 \\ & R \succ 0, \end{aligned} \qquad (4.81)$$
where Ω ≜ 2WᵀW ∈ R^{q×q}, and where Q and R denote the cost-functional matrices with elements from the vector θ. As in its discrete-time counterpart (3.98), the constraints in the quadratic program (4.81) are included to avoid trivial solutions. Similarly, the constraints could be relaxed to include an additional constraint that
one element of Q or R is nonzero and positive (as in the parameter set in (4.36)). We refer to Sect. 3.6.3.4 for further details.
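A minimal sketch of the FLB method follows, assembling W̄ from (4.77) with Kronecker products and solving a semidefinite-programming variant of (4.81) in cvxpy. The residual minimized is ‖W̄θ̄‖² from (4.76); the trace normalization and the small ε used to enforce R ≻ 0 are our choices for excluding the trivial solution, and an SDP-capable solver (e.g., SCS) is assumed to be installed.

```python
import numpy as np
import cvxpy as cp

def flb_inverse_lq(A, B, K, eps=1e-6):
    """Recover Q, R such that K is (approximately) LQ-optimal by
    minimizing the violation of the reformulated ARE (4.76)."""
    n, m = B.shape
    F = A - B @ K                                         # closed loop (4.67)
    # Assumes the inverse in (4.74) exists for this (A, F) pair
    T = np.linalg.inv(np.kron(np.eye(n), A.T) + np.kron(F.T, np.eye(n)))
    W_Q = -np.kron(np.eye(n), B.T) @ T                    # acts on vec(Q), cf. (4.77)
    W_R = -np.kron(K.T, np.eye(m))                        # acts on vec(R)
    Q = cp.Variable((n, n), symmetric=True)
    R = cp.Variable((m, m), symmetric=True)
    # cp.vec stacks columns, matching the vec convention of (4.78)
    resid = W_Q @ cp.vec(Q) + W_R @ cp.vec(R)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(resid)),
                      [Q >> 0,                            # Q >= 0
                       R >> eps * np.eye(m),              # R > 0
                       cp.trace(R) == m])                 # exclude theta = 0
    prob.solve()
    return Q.value, R.value
```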
4.6.3.5 Solution Results
Analogous to Sect. 3.6.3.5, we have the following result characterizing solutions to (4.81). Proposition 4.2 Under the conditions of the FLB problem of Definition 4.4, the quadratic program (4.81) is convex and a solution is guaranteed to exist. Proof Follows via the same argument as Proposition 3.2.
The significance of Proposition 4.2 is that the FLB method (4.81) will yield cost-functional parameters θ regardless of whether the given feedback law K constitutes an exact solution to any continuous-time infinite-horizon LQ optimal control problem. The next result we present concerns the uniqueness of solutions to (4.81) (up to an arbitrary nonzero scaling factor C > 0).
Theorem 4.8 (Necessary and Sufficient Conditions for Unique Solutions) Consider the FLB problem of Definition 4.4 and suppose that it is solved by Q and R matrices with elements from the vector θ. Then the set of all solutions to the proposed method of continuous-time infinite-horizon inverse LQ optimal control defined by (4.81) is

$$\{C\theta : C > 0\} \qquad (4.82)$$

if and only if nm ≥ q − 1 and rank(W) = q − 1.
Proof See the proof of Theorem 3.10.
4.6.4 Estimation of Feedback Controls
Having developed a method for solving inverse LQ optimal control given the feedback law K in the last subsection (the second step in the approach outlined in Sect. 4.6.1), we consider the issue of computing the feedback law from (sampled) state and control trajectories (the first step in the approach outlined in Sect. 4.6.1). For this purpose, we can employ a least-squares approach based on (4.66). Specifically, let us introduce the finite sequence of sampling times

$$K_\ell \triangleq \{t_k \in [0, \ell] : 1 \leq k \leq |K_\ell| \text{ and } 0 \leq t_1 < t_2 < \cdots < t_{|K_\ell|} \leq \ell\}, \qquad (4.83)$$

where [0, ℓ] is the time interval for which x(t) and u(t) are available. Let the values of the state and control trajectories at t_k be denoted by x^{[k]} and u^{[k]}, respectively. Then,
the feedback matrix can be estimated by means of solving the linear least-squares estimation problem

$$K = \arg\min_{\bar K} \sum_{k=1}^{|K_\ell|} \left\| \bar K x^{[k]} + u^{[k]} \right\|^2. \qquad (4.84)$$
The solution of (4.84) can be given in closed form as

$$K = -U_{[0,\ell]}^\top X_{[0,\ell]} \left( X_{[0,\ell]}^\top X_{[0,\ell]} \right)^{-1} \qquad (4.85)$$

where X_{[0,ℓ]} ∈ R^{|K_ℓ|×n} and U_{[0,ℓ]} ∈ R^{|K_ℓ|×m} denote the matrices formed from the sampled sequences of available state and control values, respectively.
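The closed form (4.85) amounts to a single call to a linear solver. The following is a minimal sketch, with a synthetic-data check (of our construction) that the estimator recovers a known feedback law exactly when the samples satisfy u^{[k]} = −Kx^{[k]}:

```python
import numpy as np

def estimate_feedback(X, U):
    """Closed-form least-squares estimate (4.85): K = -U^T X (X^T X)^{-1},
    with rows of X and U holding the sampled x^[k] and u^[k]."""
    return -np.linalg.solve(X.T @ X, X.T @ U).T

# Synthetic check: samples generated by u^[k] = -K_true x^[k] are recovered
rng = np.random.default_rng(0)
K_true = np.array([[1.0, 2.0]])
X = rng.standard_normal((50, 2))
U = -X @ K_true.T
assert np.allclose(estimate_feedback(X, U), K_true)
```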
4.6.5 Inverse LQ Optimal Control Method
We summarize the results of this section concerning the estimation of the feedback law and the solution of the feedback-law-based problem of inverse LQ optimal control in Algorithm 4.4. The algorithm implements the two-step approach we described at the very start of this section.
Algorithm 4.4 Method for Continuous-Time Infinite-Horizon Inverse LQ Optimal Control
Input: Truncated state and control trajectories x : [0, ℓ] → Rⁿ and u : [0, ℓ] → R^m sampled at times K_ℓ, and system matrices A and B.
Output: Computed cost-function parameters θ.
1: Estimate K using least squares (i.e., (4.84)) and determine the corresponding closed-loop system matrix F with (4.67).
2: Compute W̄ with (4.77).
3: Modify W̄ to form W so as to fulfill (4.79) with unknown parameters θ.
4: Solve the quadratic optimization problem (4.81) for θ.
4.7 Notes and Further Reading
Bilevel Methods: Bilevel methods for continuous-time inverse optimal control appear to have been originally proposed in [26], and implemented on the basis of direct numerical methods for solving the lower-level optimal control problems. Hatz et al. [11] subsequently replaced the direct solution of the optimal control problem in the lower level of bilevel methods with costate and Hamiltonian gradient conditions from continuous-time minimum principles. The work of Hatz et al. [11] appears to
have been the first in which continuous-time minimum principles were used to solve continuous-time inverse optimal control problems. Nevertheless, neither [26] nor [11] examined the existence or uniqueness of the cost-functional parameters provided by bilevel methods. It was only later in [7, 13] that minimum principles were used to examine the existence and uniqueness of solutions to inverse optimal control problems (although these results are limited to the example problems originally studied in [11, 26]).
Minimum-Principle Methods: Methods of continuous-time inverse optimal control based purely on continuous-time minimum principles were pioneered by Johnson et al. in [16] with the proposal of the soft method of inverse optimal control for whole trajectories (cf. (4.13)). The minimum-principle method of [16] was subsequently extended in [29] to explicitly handle terminal costs in the cost functional. Conditions for the existence and uniqueness of solutions to the inverse optimal control method of [16] were recently established in [25], along with the proposal of the mixed method (4.14) for problems with constrained controls. As we have seen in this chapter, the use of continuous-time minimum principles continues to lead to new, simple continuous-time inverse optimal control methods, and the results for the constraint-satisfaction method (4.11) in Theorem 4.3 and Corollary 4.2, in particular, cast new light on the feasibility of solving continuous-time inverse optimal control problems using any inverse optimal control method (including both bilevel and minimum-principle methods). Similar theoretical results with deeper insights for specific control-affine and linear systems with quadratic cost functionals have also been developed in [14, 15].
Inverse Linear-Quadratic Optimal Control: The special case of LQ continuous-time inverse optimal control has received by far the most attention in the control engineering literature, with a particular focus on feedback-law-based problems as we considered in Sect. 4.6.3. Specifically, Kalman, in his pioneering work [17], considered an infinite-horizon LQ setting with a single-input linear system, and posed the problem of finding cost-function parameters such that a linear feedback control law is optimal. Kalman's theoretical work was extended in [6, 12] with the development of algorithms to solve this feedback-law-based inverse optimal control problem. Additional solution algorithms have since been proposed in [5] based on linear matrix inequalities, and in [19] based on coprime matrix fraction descriptions of linear systems. More recently, feedback-law-based inverse LQ optimal control methods similar to that presented in Sect. 4.6.3.4 have been proposed in [24], where reformulations using the Kronecker product are applied to build semidefinite and linear programs that aim to minimize the deviation from optimality conditions. A two-step approach analogous to that in Sect. 4.6.1 for solving continuous-time infinite-horizon inverse LQ optimal control given state and control trajectories, rather than feedback laws, was presented relatively recently in [28]. Priess et al. [28] also proposed a new feedback-law method using the continuous-time algebraic Riccati equation as a hard constraint in an optimization problem. Faruque et al. [9] similarly consider the initial estimation of feedback laws, but exploit the closed-loop system matrix (4.67) instead of the feedback law (4.66), and use a Frobenius-norm-based minimization approach. Furthermore, Faruque et al.
[9] apply a constraint-satisfaction method to determine, among all possible solutions, the “maximally-diagonal” solution for Q and R. Finally, Li et al. [20, 21] consider finite-horizon inverse LQ optimal control problems, where it is assumed that a complete function of the feedback law K(t) (instead of the constant K) is known. Li et al. [20, 21] similarly evaluate the existence and the uniqueness of the solution and provide a method to calculate cost-function parameters based on the given function K(t).
Additional Topics: A variety of extensions, generalizations, and complications of the whole-trajectory and truncated-trajectory continuous-time inverse optimal control problems we investigated in this chapter have attracted considerable attention. In particular, problems in which given state and control trajectories must be processed online (i.e., sequentially, without storing or processing them in a batch) have been posed and investigated in [18, 22, 23, 30–33]. These online problems have also introduced additional complexities likely to be encountered in practice, including unknown system dynamics [18, 32], partial state information [23, 31], unknown or adversarial disturbances [22, 23, 30], and insufficient data [33]. Methods for solving these online problems for specific classes of second (and higher) order linear and nonlinear systems have been derived using the Hamilton–Jacobi–Bellman (HJB) equation, which resembles (4.5) in Corollary 4.1 but with the costate functions replaced by the partial derivatives of dynamic-programming value functions. We note that the HJB equation was earlier used in [27] to develop an offline method of inverse optimal control. Specialized inverse optimal control problems and methods for dynamical systems with particular structures have also been considered, including for hybrid systems in [2] and differentially flat systems in [1]. Despite these extensions and specializations, numerous continuous-time inverse optimal control topics remain to be thoroughly explored, including the selection of basis functions (or the avoidance of their use entirely), and a rigorous treatment of continuous-time inverse optimal control as a statistical estimation problem. We note that within any future statistical treatments of continuous-time inverse optimal control, the results of this chapter (e.g., the solution results in Sect. 4.5) could serve to address problems analogous to those of identifiability and persistence of excitation.
References 1. Aghasadeghi N, Bretl T (2014) Inverse optimal control for differentially flat systems with application to locomotion modeling. In: 2014 IEEE international conference on robotics and automation (ICRA), pp 6018–6025 2. Aghasadeghi N, Long A, Bretl T (2012) Inverse optimal control for a hybrid dynamical system with impacts. In: 2012 IEEE international conference on robotics and automation (ICRA), pp 4962–4967 3. Anderson BDO, Moore JB (1990) Optimal control: linear quadratic methods. Prentice Hall, Englewood Cliffs 4. Basar T, Olsder GJ (1999) Dynamic noncooperative game theory, vol 23, 2nd edn. Academic, New York 5. Boyd SP, El Ghaoui L, Feron E, Balakrishnan V (1994) Linear matrix inequalities in system and control theory, vol 15. SIAM
6. Casti J (1980) On the general inverse problem of optimal control theory. J Optim Theory Appl 32(4):491–497 7. Chittaro FC, Jean F, Mason P (2013) On inverse optimal control problems of human locomotion: stability and robustness of the minimizers. J Math Sci 195(3):269–287 8. Engwerda JC, van den Broek WA, Schumacher JM (2000) Feedback Nash equilibria in uncertain infinite time horizon differential games. In: Proceedings of the 14th international symposium of mathematical theory of networks and systems, MTNS 2000, pp 1–6 9. Faruque IA, Muijres FT, Macfarlane KM, Kehlenbeck A, Humbert JS (2018) Identification of optimal feedback control rules from micro-quadrotor and insect flight trajectories. Biol Cybern 112(3):165–179 10. Halkin H (1974) Necessary conditions for optimal control problems with infinite horizons. Econ: J Econ Soc 267–272 11. Hatz K, Schlöder JP, Bock HG (2012) Estimating parameters in optimal control problems. SIAM J Sci Comput 34(3):A1707–A1728 12. Jameson A, Kreindler E (1973) Inverse problem of linear optimal control. SIAM J Control 11(1):1–19 13. Jean F, Mason P, Chittaro FC (2013) Geometric modeling of the movement based on an inverse optimal control approach. In: 52nd IEEE conference on decision and control, pp 1816–1821 14. Jean F, Maslovskaya S (2019) Injectivity of the inverse optimal control problem for controlaffine systems. In: 2019 IEEE 58th conference on decision and control (CDC). IEEE, pp 511–516 15. Jean F, Maslovskaya S, Zelenko I (2017) Inverse optimal control problem: the sub-Riemannian case. IFAC-PapersOnLine 50(1):500–505. 20th IFAC World Congress 16. Johnson M, Aghasadeghi N, Bretl T (2013) Inverse optimal control for deterministic continuous-time nonlinear systems. In: 2013 IEEE 52nd annual conference on decision and control (CDC), pp 2906–2913 17. Kalman RE (1964) When is a linear control system optimal? J Basic Eng 86(1):51–60 18. Kamalapurkar R (2018) Linear inverse reinforcement learning in continuous time and space. In: 2018 annual American control conference (ACC), pp 1683–1688 19. Kong H, Goodwin G, Seron M (2012) A revisit to inverse optimality of linear systems. Int J Control 85(10):1506–1514 20. Li Y, Yao Y, Hu X (2020) Continuous-time inverse quadratic optimal control problem. Automatica 117:108977 21. Li Y, Zhang H, Yao Y, Hu X (2018) A convex optimization approach to inverse optimal control. In: 2018 37th Chinese control conference (CCC), pp 257–262 22. Lian B, Xue W, Lewis FL, Chai T (2021) Online inverse reinforcement learning for nonlinear systems with adversarial attacks. Int J Robust Nonlinear Control 31(14):6646–6667 23. Lian B, Xue W, Lewis FL, Chai T (2021) Robust inverse Q-learning for continuous-time linear systems in adversarial environments. IEEE Trans Cybern 1–13 24. Menner M, Zeilinger MN (2018) Convex formulations and algebraic solutions for linear quadratic inverse optimal control problems. In: 2018 European control conference (ECC), pp 2107–2112 25. Molloy TL, Inga J, Flad M, Ford JJ, Perez T, Hohmann S (2020) Inverse open-loop noncooperative differential games and inverse optimal control. IEEE Trans Autom Control 65(2):897–904 26. Mombaur K, Truong A, Laumond J-P (2010) From human to humanoid locomotion-an inverse optimal control approach. Auton Robot 28(3):369–383 27. Pauwels E, Henrion D, Lasserre J-B (2014) Inverse optimal control with polynomial optimization. In: 2014 IEEE 53rd annual conference on decision and control (CDC), pp 5581–5586 28. 
Priess MC, Conway R, Choi J, Popovich JM, Radcliffe C (2015) Solutions to the inverse LQR problem with application to biological systems analysis. IEEE Trans Control Syst Technol 23(2):770–777 29. Rothfuß S, Inga J, Köpf F, Flad M, Hohmann S (2017) Inverse optimal control for identification in non-cooperative differential games. In: IFAC 2017 World Congress, Toulouse, France, July 2017
30. Self R, Abudia M, Kamalapurkar R (2020) Online inverse reinforcement learning for systems with disturbances. In: 2020 American control conference (ACC), pp 1118–1123 31. Self R, Coleman K, Bai H, Kamalapurkar R (2021) Online observer-based inverse reinforcement learning. IEEE Control Syst Lett 5(6):1922–1927 32. Self R, Harlan M, Kamalapurkar R (2019) Online inverse reinforcement learning for nonlinear systems. In: 2019 IEEE conference on control technology and applications (CCTA), pp 296– 301 33. Self R, Mahmud SMN, Hareland K, Kamalapurkar R (2020) Online inverse reinforcement learning with limited data. In: 2020 59th IEEE conference on decision and control (CDC), pp 603–608
Chapter 5
Inverse Noncooperative Dynamic Games
In this chapter, we generalize and extend the discrete-time inverse optimal control methods and results of Chap. 3 to (discrete-time) inverse noncooperative dynamic games. We specifically pose two inverse noncooperative dynamic game problems that involve computing the parameters in the cost functions of multiple players in a noncooperative dynamic game from given state and control sequences. The inverse problems differ in whether the available sequences are whole or truncated prior to the horizon of the game. We then develop and discuss methods for solving these problems based on bilevel optimization and conditions for Nash equilibria derived from discrete-time minimum principles. In the context of inverse noncooperative dynamic game theory, the minimum-principle methods are particularly efficient compared to bilevel methods since they avoid the solution of (forward) noncooperative dynamic games, and enable the independent computation of cost-function parameters for each player. The variety of different information structures and solution concepts in noncooperative dynamic games also leads to subtle yet profound contrasts between inverse noncooperative dynamic game theory and discrete-time inverse optimal control. We illustrate some of these contrasts by showing that when players adopt open-loop Nash equilibrium strategies, minimum-principle methods and results for solving inverse noncooperative dynamic game problems are much the same as those in Chap. 3 for solving discrete-time inverse optimal control problems. However, when players adopt feedback Nash equilibrium strategies, minimum-principle methods and results for solving inverse noncooperative dynamic game problems are somewhat limited, although useful variations exist for the important special case of noncooperative linear-quadratic (LQ) dynamic games (i.e., games with linear dynamics and infinite-horizon quadratic cost functions).
5.1 Preliminary Concepts We begin this chapter by introducing a generalization of noncooperative dynamic games (cf. Sect. 2.4) in which the player cost functions belong to some parameterized class of functions. The remainder of this chapter will examine (inverse) problems involving the computation of the cost-function parameters from data.
5.1.1 Parameterized Noncooperative Dynamic Games
Let us consider the set of players P ≜ {1, 2, . . . , N}, a (potentially infinite) horizon T > 0, and a deterministic discrete-time dynamical system described by the system of (potentially nonlinear) difference equations

$$x_{k+1} = f_k\left(x_k, u_k^1, \ldots, u_k^N\right), \quad x_0 \in \mathbb{R}^n \qquad (5.1)$$

for k ≥ 0. Here, x_k ∈ Rⁿ are state vectors, f_k : Rⁿ × R^{m¹} × · · · × R^{m^N} → Rⁿ are (potentially nonlinear) functions, and u_k^i ∈ U^i are control inputs belonging to the sets of admissible controls U^i ⊂ R^{m^i} for i ∈ P. For each player i ∈ P, let us define a parameterized cost function

$$V_T^i\left(x_{[0,T]}, u_{[0,T-1]}^1, \ldots, u_{[0,T-1]}^N, \theta^i\right) \triangleq F^i(x_T, \theta^i) + \sum_{k=0}^{T-1} g_k^i\left(x_k, u_k^1, \ldots, u_k^N, \theta^i\right) \qquad (5.2)$$

if the horizon is finite 0 < T < ∞, or a parameterized cost function

$$V_\infty^i\left(x_{[0,\infty]}, u_{[0,\infty]}^1, \ldots, u_{[0,\infty]}^N, \theta^i\right) \triangleq \sum_{k=0}^{\infty} g_k^i\left(x_k, u_k^1, \ldots, u_k^N, \theta^i\right) \qquad (5.3)$$
if the horizon is infinite T = ∞. Here, u^i_{[0,k]} ≜ {u_0^i, u_1^i, . . . , u_k^i} denotes player i's control sequence and x_{[0,k]} ≜ {x_0, x_1, . . . , x_k} denotes the system's state sequence. The functions g_k^i : Rⁿ × U¹ × · · · × U^N × Θ^i → R for k ≥ 0 describe the stage (or running) costs associated with the states and controls for player i, while the functions F^i : Rⁿ × Θ^i → R describe the additional cost for player i associated with termination of the game in state x_T for finite T. Importantly (and in contrast to standard constructions of dynamic games), we assume that the functions g_k^i and F^i belong to some known class of functions parameterized by vectors θ^i from the parameter sets Θ^i ⊂ R^{q^i} for each player i ∈ P. We consider parameterized noncooperative dynamic games played with either the open-loop or feedback information structure as described in Sect. 2.4. Under the
open-loop information structure, player i's strategy set Γ^i is the set of all sequences of functions γ^i ≜ {γ_k^i : k ≥ 0} such that u_k^i = γ_k^i(x_0) ∈ U^i (or equivalently, the set of all control sequences {u_k^i ∈ U^i : k ≥ 0}). Under the feedback information structure, player i's strategy set Γ^i is the set of all sequences of functions γ^i ≜ {γ_k^i : k ≥ 0} such that u_k^i = γ_k^i(x_k) ∈ U^i. We assume that the players are playing for a Nash equilibrium. Taking {Γ^i : i ∈ P} as the appropriate open-loop or feedback strategy sets, the N-tuple of player strategies {γ^i ∈ Γ^i : i ∈ P} constitutes a Nash equilibrium (open-loop or feedback) for cost-function parameters {θ^i : i ∈ P} if and only if

$$V_T^i\left(\gamma^1, \ldots, \gamma^i, \ldots, \gamma^N, \theta^i\right) \leq V_T^i\left(\gamma^1, \ldots, \bar\gamma^i, \ldots, \gamma^N, \theta^i\right) \qquad (5.4)$$
for all γ̄^i ∈ Γ^i and all i ∈ P, where, in a slight abuse of notation, we use V_T^i(γ¹, . . . , γ^i, . . . , γ^N, θ^i) to denote the cost function V_T^i evaluated with states x_k given by (5.1) and controls given by the strategies γ^i ∈ Γ^i in the sense that u_k^i = γ_k^i(·) for i ∈ P.
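Although verifying (5.4) exactly requires solving each player's optimal control problem, a sampled necessary check is easy to implement: no random unilateral deviation by any player should lower that player's cost. The following is a minimal sketch for the open-loop finite-horizon case with no terminal costs; the dynamics f, stage costs g^i, trajectories, and all names are user-supplied placeholders of ours, and passing the check does not prove equilibrium.

```python
import numpy as np

def rollout_cost(f, g_i, x0, u_seqs, theta_i):
    """Evaluate player i's finite-horizon cost (5.2) (without terminal
    cost) along the trajectory generated by the dynamics (5.1) from x0;
    u_seqs is a list of N arrays, each of shape (T, m_i)."""
    x, cost = x0, 0.0
    for controls in zip(*u_seqs):               # (u_k^1, ..., u_k^N)
        cost += g_i(x, *controls, theta_i)
        x = f(x, *controls)
    return cost

def looks_like_open_loop_nash(f, g_list, x0, u_seqs, thetas,
                              trials=200, scale=0.1, seed=0):
    """Sampled necessary check of (5.4): reject if some random
    unilateral deviation strictly lowers a player's cost. The caller
    must ensure deviations remain inside U^i if controls are constrained."""
    rng = np.random.default_rng(seed)
    for i, (g_i, theta_i) in enumerate(zip(g_list, thetas)):
        base = rollout_cost(f, g_i, x0, u_seqs, theta_i)
        for _ in range(trials):
            dev = [np.array(u, copy=True) for u in u_seqs]
            dev[i] = dev[i] + scale * rng.standard_normal(dev[i].shape)
            if rollout_cost(f, g_i, x0, dev, theta_i) < base - 1e-9:
                return False
    return True
```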
5.1.2 Nash Equilibria Conditions via Minimum Principles
There is a close relationship between parameterized noncooperative dynamic games and the parameterized discrete-time optimal control problems we considered in Chap. 3. This connection is the same as in the case of standard noncooperative dynamic games (without parameters), and is discussed in detail in Sect. 2.4.2.2. Here, it will prove useful in enabling the derivation of necessary conditions for the existence of Nash equilibria in parameterized noncooperative dynamic games using the parameterized discrete-time minimum principles of Chap. 3. To present these conditions, let us define the (parameterized) player Hamiltonian functions

$$H_k^i\left(x_k, u_k^1, \ldots, u_k^N, \lambda_{k+1}^i, \theta^i\right) \triangleq g_k^i\left(x_k, u_k^1, \ldots, u_k^N, \theta^i\right) + \lambda_{k+1}^{i\top} f_k\left(x_k, u_k^1, \ldots, u_k^N\right) \qquad (5.5)$$

where λ_k^i ∈ Rⁿ for k ≥ 0 and i ∈ P are player costate (or adjoint) vectors. Let us assume that the functions f_k, g_k^i, and F^i are continuously differentiable in each of their state and control arguments so that

$$\nabla_x H_k^i\left(x_k, u_k^1, \ldots, u_k^N, \lambda_{k+1}^i, \theta^i\right) \in \mathbb{R}^n \quad \text{and} \quad \nabla_{u^i} H_k^i\left(x_k, u_k^1, \ldots, u_k^i, \ldots, u_k^N, \lambda_{k+1}^i, \theta^i\right) \in \mathbb{R}^{m^i}$$

denote the column vectors of partial derivatives of the Hamiltonian H_k^i with respect to x_k and u_k^i, respectively, evaluated at x_k, {u_k^i : i ∈ P}, λ_{k+1}^i, and θ^i. Similarly, let ∇_x F^i(x_T, θ^i) denote the column vector of partial derivatives of F^i with respect
to x_T (evaluated at x_T and θ^i). Finally, we require the following condition on the player control sets U^i.
Assumption 5.1 The player control sets U^i are closed and convex for all i ∈ P.
Under Assumption 5.1, we have the following conditions for open-loop Nash equilibria in the case of a finite horizon T < ∞ (due to the discrete-time minimum principle of Theorem 3.1 and mirroring Theorem 2.5).
Theorem 5.1 (Finite-Horizon Open-Loop Nash Equilibria) Suppose that Assumption 5.1 holds and that the N-tuple of player control sequences {u^i_{[0,T−1]} : i ∈ P} with associated state sequence x_{[0,T]} constitutes an open-loop Nash equilibrium solution to a noncooperative dynamic game with player cost-function parameters {θ^i ∈ Θ^i : i ∈ P} and a finite horizon 0 < T < ∞. Then,
(i) the state sequence x_{[0,T]} satisfies the game dynamics x_{k+1} = f_k(x_k, u_k^1, . . . , u_k^N) for 0 ≤ k ≤ T − 1 given x_0;
(ii) there exist costates λ_k^i ∈ Rⁿ for all times 0 ≤ k ≤ T and all players i ∈ P satisfying the backward recursions

$$\lambda_k^i = \nabla_x H_k^i\left(x_k, u_k^1, \ldots, u_k^N, \lambda_{k+1}^i, \theta^i\right)$$
(5.6)
for 0 ≤ k ≤ T − 1 with

$$\lambda_T^i = \nabla_x F^i(x_T, \theta^i); \qquad (5.7)$$

and
(iii) the controls u_k^i satisfy

$$\nabla_{u^i} H_k^i\left(x_k, u_k^1, \ldots, u_k^N, \lambda_{k+1}^i, \theta^i\right)^\top \left(\bar u - u_k^i\right) \geq 0 \qquad (5.8)$$

for all admissible controls ū ∈ U^i, all times 0 ≤ k ≤ T − 1, and all players i ∈ P.
Proof Follows the same argument as Theorem 2.5 with appropriate use of the parameterized discrete-time finite-horizon minimum principle of Theorem 3.1.
Combined conditions for open-loop Nash equilibria for games with finite or infinite horizons are also straightforward to establish along the same lines as Corollary 3.1. To present these combined horizon-invariant conditions for open-loop Nash equilibria, we require the following assumption, analogous to Assumption 3.2, where we use ∇_x f_k(x_k, u_k^1, . . . , u_k^N) and ∇_{u^i} f_k(x_k, u_k^1, . . . , u_k^N) to denote the matrices of partial derivatives of f_k with respect to x_k and u_k^i, respectively (evaluated at x_k and {u_k^i : i ∈ P}).
Assumption 5.2 The derivative matrix of the dynamics ∇_x f_k(x_k, u_k^1, . . . , u_k^N) is invertible for all k ≥ 0.
Our combined horizon-invariant conditions for open-loop Nash equilibria are as follows.
Theorem 5.2 (Horizon-Invariant Open-Loop Nash Equilibria) Suppose that the N-tuple of player control sequences {u^i_{[0,ℓ]} : i ∈ P} constitutes a truncated open-loop Nash equilibrium solution with associated state sequence x_{[0,ℓ]} for cost-function parameters {θ^i ∈ Θ^i : i ∈ P} and either an infinite horizon T = ∞ or a finite horizon T > ℓ. If Assumptions 5.1 and 5.2 hold, then
(i) the state sequence x_{[0,ℓ]} satisfies the game dynamics x_{k+1} = f_k(x_k, u_k^1, . . . , u_k^N) for 0 ≤ k < ℓ given x_0;
(ii) there exist costates λ_k^i ∈ Rⁿ for all times 0 ≤ k ≤ ℓ + 1 and all players i ∈ P satisfying the backward recursions

$$\lambda_k^i = \nabla_x H_k^i\left(x_k, u_k^1, \ldots, u_k^N, \lambda_{k+1}^i, \theta^i\right)$$
(5.9)
for 0 ≤ k ≤ ℓ; and
(iii) the controls u_k^i satisfy

$$\nabla_{u^i} H_k^i\left(x_k, u_k^1, \ldots, u_k^N, \lambda_{k+1}^i, \theta^i\right)^\top \left(\bar u - u_k^i\right) \geq 0$$
(5.10)
for all admissible controls ū ∈ U^i, all times 0 ≤ k ≤ ℓ, and all players i ∈ P.
Proof Under Assumption 5.1, Theorem 5.1 establishes the theorem assertions in the case of finite 0 < T < ∞, where here we disregard the terminal values λ_T^i. In the infinite-horizon case T = ∞, we note that, as in the proof of Theorem 5.1 (see also Theorem 2.5), the definition of open-loop Nash equilibria provided in (5.4) is equivalent to a system of N coupled discrete-time optimal control problems that are the parameterized infinite-horizon counterparts to (2.29). The theorem assertions then follow by applying the infinite-horizon discrete-time minimum principle of Theorem 3.2 to each of these infinite-horizon optimal control problems (with Assumption 5.1 implying Assumption 3.1 and Assumption 5.2 implying Assumption 3.2).
Turning our attention now to conditions for feedback Nash equilibria, a similar argument to that used in Theorem 5.1 for the finite-horizon T < ∞ case leads to conditions analogous to those of Theorem 5.1 but with additional complexity associated with the costate vectors λ_k^i.
Theorem 5.3 (Finite-Horizon Feedback Nash Equilibria) Suppose that Assumption 5.1 holds and that the N-tuple of player control sequences {u^i_{[0,T−1]} : i ∈ P}
with associated differentiable feedback laws {γ_k^i : u_k^i = γ_k^i(x_k), k ≥ 0, i ∈ P} and state sequence x_{[0,T]} constitutes a feedback Nash equilibrium solution for cost-function parameters {θ^i : i ∈ P} and a finite horizon 0 < T < ∞. Then,
(i) the state sequence x_{[0,T]} satisfies the game dynamics x_{k+1} = f_k(x_k, u_k^1, . . . , u_k^N) for 0 ≤ k ≤ T − 1 given x_0;
(ii) there exist costates λ_k^i ∈ Rⁿ for 0 ≤ k ≤ T and i ∈ P satisfying the backward recursions

$$\lambda_k^i = \nabla_x H_k^i\left(x_k, \gamma_k^1(x_k), \ldots, u_k^i, \ldots, \gamma_k^N(x_k), \lambda_{k+1}^i, \theta^i\right)$$
(5.11)
for 0 ≤ k ≤ T − 1 with

$$\lambda_T^i = \nabla_x F^i(x_T, \theta^i); \qquad (5.12)$$

and
(iii) the controls u_k^i satisfy

$$\nabla_{u^i} H_k^i\left(x_k, u_k^1, \ldots, u_k^N, \lambda_{k+1}^i, \theta^i\right)^\top \left(\bar u - u_k^i\right) \geq 0 \qquad (5.13)$$

for all admissible controls ū ∈ U^i, all 0 ≤ k ≤ T − 1, and all i ∈ P.
Proof Follows via the same argument as Theorem 2.6 with appropriate use of the finite-horizon discrete-time minimum principle of Theorem 3.1 for parameterized discrete-time optimal control problems.
Theorems 5.1 and 5.3 are similar; however, under the feedback information structure of Theorem 5.3, the costate recursion (5.11) involves partial derivatives of the Nash equilibrium feedback control laws γ_k^j. As we shall discuss later in this chapter, this difference in costate recursions has a profound impact on inverse noncooperative dynamic game problems. We next introduce the specific inverse noncooperative dynamic game problems that will be considered in this chapter.
5.2 Inverse Noncooperative Dynamic Game Problems The inverse noncooperative dynamic game problems that we shall consider in this chapter involve computing cost-function parameters θ i for some or all of the players in a parameterized noncooperative dynamic game such that a given collection of state and control sequences constitutes a Nash equilibrium. We first pose a problem in which the state and control sequences are given in their entirety.
5.2 Inverse Noncooperative Dynamic Game Problems
149
Definition 5.1 (Whole-Sequence (WS) Problem) Consider a parameterized noncooperative dynamic game (cf. Sect. 5.1.1) with: a finite horizon 0 < T < ∞; a given open-loop or feedback information structure; and known dynamics f k , constraint sets U i , parameter sets Θ i and functions gki and F i . Given the sequence of states x[0,T ] and associated player controls {u i[0,T −1] : i ∈ P}, the whole sequence (WS) inverse noncooperative dynamic game problem is to compute parameters {θ i ∈ Θ i : i ∈ P} of the game such that x[0,T ] and {u i[0,T −1] : i ∈ P} constitute a Nash equilibrium of the appropriate type for the given information structure. We also pose an inverse noncooperative dynamic game problem in which the horizon of the game T > 0 can differ from the length of the given state and control sequences (and potentially be infinite). Definition 5.2 (Truncated-Sequence (TS) Problem) Consider a parameterized noncooperative dynamic game (cf. Sect. 5.1.1) with: a potentially infinite horizon T > 0; a given open-loop or feedback information structure; and known dynamics f k , constraint sets U i , parameter sets Θ i , and functions gki and F i . Given the truncated sequence of states x[0,] and associated player controls {u i[0,] : i ∈ P} with < T , the truncated-sequence (TS) inverse noncooperative dynamic game problem is to compute parameters {θ i ∈ Θ i : i ∈ P} of the game such that x[0,] and {u i[0,] : i ∈ P} constitute a truncated Nash equilibrium of the appropriate type for the given information structure. The WS and TS problems of Definitions 5.1 and 5.2 are natural generalizations of the WS and TS discrete-time inverse optimal control problems of Definitions 3.1 and 3.2 to the case where there is more than one player in the game. In the case of a single player game (i.e., N = 1), the WS problems of Definitions 3.1 and 5.1 are equivalent, as are the TS problems of Definitions 3.2 and 5.2. Given the close relationship between discrete-time inverse optimal control and inverse dynamic game problems, many of the inverse dynamic game methods we shall construct and analyze in this chapter are directly analogous to those in Chap. 3. Several particular points of interest will however be worth noting in the game setting. • Firstly, the (forward) solution of noncooperative dynamic games is typically more involved than the (forward) solution of discrete-time optimal control problems. The tractability of inverse methods that involve solving noncooperative dynamic games (e.g., naive bilevel methods) is therefore of acute concern in game settings with N > 1. • Secondly, despite the definition of Nash equilibria in (5.4) involving coupling between the Nash equilibria player strategies, inverse methods may allow for the separate computation of cost-function parameters θ i for individual players without the need to compute the (unknown) parameters θ j of other players j ∈ P, j = i. Before we construct and analyze inverse noncooperative dynamic game methods, we note that as in inverse optimal control, in many practical situations misspecification can occur in the dynamics, cost functions, parameters, or horizon such that
150
5 Inverse Noncooperative Dynamic Games
the given sequences fail to constitute an exact Nash equilibrium for any player costfunction parameters {θ i ∈ Θ i : i ∈ P}. In this chapter, we shall thus also consider the inverse noncooperative dynamic game problems of Definitions 5.1 and 5.2 under an approximate solution concept in which we seek to find parameters such that the sequences approximately satisfy Nash equilibria conditions.
5.3 Bilevel Methods As in the case of inverse optimal control, bilevel methods are natural candidates for solving the inverse noncooperative dynamic game problems of Definitions 5.1 and 5.2. Naively applying the same bilevel-optimization ideas behind the bilevel methods of inverse optimal control in Sect. 3.3 suggests that bilevel methods for solving inverse noncooperative dynamic game problems should involve a first (or upper) level of optimization over all unknown player cost-function parameters, and a second (or lower) level of optimization involving the (forward) solution of a noncooperative dynamic game. Interestingly however, as we shall see in this section, there is some flexibility in needing to solve a noncooperative dynamic game in the lower level of optimization since we can alternatively exploit the relationship between Nash equilibria and discrete-time optimal control (cf. Sect. 2.4.2.2) to simply solve a discrete-time optimal control problem for a specific player of interest. In this section, we thus present bilevel methods involving either the solution of noncooperative dynamic games or alternatively discrete-time optimal control problems. These bilevel methods will motivate our later consideration of methods based on minimum-principle conditions for Nash equilibria.
5.3.1 Bilevel Method for Whole Sequences The bilevel method for solving the WS problem of Definition 5.1 is the optimization problem inf
{θ i ∈Θ i :i∈P }
T k=0
xk − xkθ 2 +
T −1
2 u ik − u iθ k
(5.14)
i∈P k=0
θ with the optimization subject to the constraint that the state sequence x[0,T ] θ θ θ {x0 , x1 , . . . , x T } and associated player control sequences iθ iθ iθ {u iθ [0,T −1] {u 0 , u 1 , . . . , u T −1 } : i ∈ P}
constitute an appropriate open-loop or feedback Nash equilibrium of a parameterized noncooperative dynamic game with cost-function parameters {θ i ∈ Θ i : i ∈ P}.
5.3 Bilevel Methods
151
This bilevel method thus contains the (forward) solution of a noncooperative dynamic games in its constraints, with the information structure (i.e., open-loop or feedback) and the finite horizon T specified a priori. A useful variation of the bilevel method of (5.14) involves noting that if the information structure is open loop, then the cost-function parameters of each individual player i ∈ P can be computed separately by solving the optimization problem inf
θ i ∈Θ i
T
xk −
xkθ 2
k=0
+
T −1
2 u ik − u iθ k
(5.15)
k=0
iθ θ subject to the state sequence x[0,T ] and control sequence u [0,T −1] constituting a solution to the discrete-time finite-horizon optimal control problem
inf
u¯ i[0,T −1]
s.t.
N i VTi (x[0,T ] , u 1[0,T −1] , . . . , u¯ i[0,T −1] , . . . , u [0,T −1] , θ )
xk+1 = f k (xk , u 1k , . . . , u¯ ik , . . . , u kN ), 0 ≤ k ≤ T − 1 x0 ∈ Rn
(5.16)
u¯ ik ∈ U i , 0 ≤ k ≤ T − 1 with player cost-function parameters θ i ∈ Θ i and the other player controls taken j as the given sequences {u [0,T −1] : j ∈ P, j = i}. This simplified bilevel method exploits the fact that player i’s open-loop Nash equilibrium controls can be found by solving the discrete-time optimal control problem given the open-loop Nash equilibrium controls of the other players j ∈ P, j = i (cf. Sect. 2.4.2.2). It can therefore be implemented in the same manner (and with the same computational complexity) as the bilevel method for discrete-time inverse optimal control in (3.15). By comparing (2.29) and (2.30), we note that a similar reduction of the bilevel method of (5.14) does not hold for the feedback information structure since the player feedback j Nash equilibrium strategies {γk : j ∈ P, i = j, 0 ≤ k ≤ T − 1} are, in general, unknown within the context of the WS problem of Definition 5.1. The bilevel method of (5.14) and its open-loop version in (5.15) are only suitable for solving inverse noncooperative dynamic game problems with finite horizons T < ∞ given whole sequences of states and controls. We shall next present a bilevel method that enables the solution of inverse noncooperative dynamic game problems with both finite and infinite horizons using truncated sequences.
5.3.2 Bilevel Method for Truncated Sequences The bilevel method for solving the TS problem of Definition 5.2 is the optimization problem
152
5 Inverse Noncooperative Dynamic Games
inf
{θ i ∈Θ i :i∈P }
xk − xkθ 2 +
k=0
2 u k − u iθ k
(5.17)
i∈P k=0
θ subject to the state sequence x[0,] and control sequences {u iθ [0,] : i ∈ P} being truncated versions of sequences that constitute an appropriate Nash equilibrium of a noncooperative dynamic game with cost-function parameters {θ i ∈ Θ i : i ∈ P}, and a given information structure and (potentially infinite) horizon T > 0. Unlike the bilevel method for whole sequences (5.14), the horizon T in the bilevel method for truncated sequences (5.17) can be infinite since the state and control sequences are only evaluated at a finite number of times < T . The bilevel method for truncated sequences of (5.17) is, however, not easily simplified when the information structure is open-loop in the same manner as the whole-sequence bilevel method (5.14) can be simplified to (5.15). Indeed, in the truncated-sequence open-loop case, the cost-function parameters of each player i ∈ P cannot easily be computed separately using a bilevel method of discrete-time inverse optimal control problem similar to (5.15) since the control sequences of the other players are not known in their entirety (preventing the solution of the optimal control problems (5.16)). This limitation of the truncated-sequence bilevel method (5.17) is noteworthy since other methods that we shall develop in this chapter will enable the parameters of individual players to be computed separately (regardless of the sequences being truncated).
5.3.3 Discussion of Bilevel Methods As in the inverse optimal control, the objective of the upper level of optimization in the bilevel methods (5.14) and (5.17) is the total squared error between the given states and controls (i.e., xk and u ik ) and the states and controls predicted by solving optimal control problems in the constraints (or lower level of optimization) with {θ i ∈ Θ i : i ∈ P}. Alternative upper-level objectives open the possibility of handling partialinformation inverse dynamic game problems in which some or all of the states xk or controls u ik are not provided or unknown. For example, the total squared error of the states, inf
{θ i ∈Θ i :i∈P }
T
xk − xkθ 2 ,
k=0
can be used as an alternative upper-level objective in (5.14) if only the states (and not the controls) are given. The main limitation of bilevel methods for solving inverse dynamic game problems is their reliance on solving dynamic games (or optimal control problems) in their constraints. They therefore require:
5.3 Bilevel Methods
153
• explicit prior knowledge of the horizon T > 0, and; • the solution of two nested optimization problems (and hence implementations nesting two numeric optimization routines) with the first optimization over the parameters {θ i ∈ Θ i : i ∈ P} and the second optimization corresponding to the solution of a noncooperative dynamic game or optimal control problem with parameters {θ i ∈ Θ i : i ∈ P}. The computational demands of many bilevel method implementations are thus significant, and explicit knowledge of the horizon T is restrictive in practice. Motivated by the need to find efficient alternatives to bilevel methods for solving inverse noncooperative dynamic game problems, in the remainder of this chapter, we shall construct and analyze inverse methods that exploit the Nash equilibria conditions of Sect. 5.1.2.
5.4 Open-Loop Minimum-Principle Methods In this section, we shall develop methods for solving the WS and TS problems of Definitions 5.1 and 5.2 under the open-loop information structure using the conditions of Sect. 5.1.2 derived from minimum principles. In later sections of this chapter, we shall use these minimum-principle methods to discuss the challenges and potentials of using similar approaches to solve the problems of Definitions 5.1 and 5.2 under the feedback information structure (i.e., when the given sequences are to constitute a feedback Nash equilibrium).
5.4.1 Whole-Sequence Open-Loop Methods Let us begin solving the WS problem of Definition 5.1 by considering a finite horizon 0 < T < ∞ along with an (arbitrary) N -tuple of player control sequences {u i[0,T −1] : i ∈ P} and a state sequence x[0,T ] . For each player i ∈ P, let us define K i 0 ≤ k < T : u ik ∈ int U i as a set of times in the horizon at which the player’s controls from the sequence u i[0,T −1] are in the interior of the constraint set U i . If x[0,T ] and {u i[0,T −1] : i ∈ P} constitute an open-loop Nash equilibrium of a (finite-horizon) noncooperative dynamic game with horizon T and player cost-function parameters {θ i : i ∈ P}, then under Assumption 5.1, assertion (iii) of Theorem 5.1 implies that the gradient of player i’s Hamiltonian vanishes at the times k ∈ K i , namely, ∇u i Hki (xk , u 1k , . . . , u kN , λik+1 , θ i ) = 0 for all k ∈ K i and all i ∈ P. Theorem 5.1 thus implies that in order for x[0,T ] and {u i[0,T −1] : i ∈ P} to constitute an open-loop Nash equilibrium, the cost-function
154
5 Inverse Noncooperative Dynamic Games
parameters θ i of each player i ∈ P must be such that λik = ∇x Hki (xk , u 1k , . . . , u kN , λik+1 , θ i )
(5.18)
for k = 0, . . . , T − 1 with λiT = ∇x F i x T , θ i
(5.19)
∇u i Hki (xk , u 1k , . . . , u kN , λik+1 , θ i ) = 0
(5.20)
and
for all k ∈ K i . For each player i ∈ P, the costate backward recursion (5.18), costate terminal condition (5.19), and Hamiltonian gradient condition (5.20) are identical to the costate backward recursion (3.18), costate terminal condition (3.19), and Hamiltonian gradient condition (3.20) of discrete-time optimal control. Recalling the proof of Theorem 5.1, this relationship between conditions is specifically due to the optimal control formulation of open-loop Nash equilibria (cf. Sect. 2.4.2.2). As a result of the equivalence in conditions for open-loop Nash equilibria and solutions of discrete-time optimal control problems, methods identical to the constraintsatisfaction, soft, and mixed methods of Sect. 3.4.1 can be developed to solve the whole-sequence inverse noncooperative dynamic game problem under the open-loop information structure. In this subsection, we shall summarize these whole-sequence inverse noncooperative dynamic game methods but we shall omit their derivations since they mirror those in Sect. 3.4.1.
5.4.1.1
Constraint-Satisfaction Method for Whole Sequences
Recalling the constraint-satisfaction method of inverse optimal control (3.21), the constraint-satisfaction method for solving the whole-sequence inverse noncooperative dynamic game problem under the open-loop information structure is defined by the N constraint-satisfaction problems inf
θ i ,λi[0,T ]
s.t.
C λik = ∇x Hki (xk , u 1k , . . . , u kN , λik+1 , θ i ), 0 ≤ k ≤ T − 1 λiT = ∇x F i x T , θ i ∇u i Hki (xk , u 1k , . . . , u kN , λik+1 , θ i ) = 0, k ∈ K θi ∈ Θi
i
(5.21)
5.4 Open-Loop Minimum-Principle Methods
155
for any constant C ∈ R and for all i ∈ P where λi[0,T ] {λi0 , λi1 , . . . , λiT }. We note that these N optimization problems are decoupled given the sequences x[0,T ] and {u i[0,T −1] : i ∈ P}. Hence, (5.21) can be solved in isolation for each player i ∈ P. 5.4.1.2
Soft Method for Whole Sequences
Recalling the soft method of inverse optimal control (3.22), the soft method for solving the whole-sequence inverse noncooperative dynamic game problem under the open-loop information structure is defined by the N optimization problems inf
θ i ,λi[0,T ]
∇u i H i (xk , u 1 , . . . , u N , λi k
k∈K
+
k
k+1 , θ
k
i
2 )
i
T −1
i λ − ∇x H i (xk , u 1 , . . . , u N , λi , θ i )2 k k k k k+1
(5.22)
k=0
2 + λiT − ∇x F i x T , θ i s.t.
θi ∈ Θi
for all i ∈ P. This soft method provides an approximate optimality solution to the WS problem of Definition 5.1 since it involves minimizing the extent to which the costate backward recursion (5.18) and the Hamiltonian gradient condition (5.20) are violated for each player i ∈ P. Again, we note that the N optimization problems in (5.22) are decoupled and so they can be solved independently for each i ∈ P.
5.4.1.3
Mixed Method for Whole Sequences
Finally, recalling the mixed method of inverse optimal control (3.23), the mixed method for the solving whole-sequence inverse noncooperative dynamic game problem under the open-loop information structure is defined by the N optimization problems inf
θ i ,λi[0,T ]
s.t.
∇u i H i (xk , u 1 , . . . , u N , λi k
k
k
k+1 , θ
i
k∈K i λik = ∇x Hki (xk , u 1k , . . . , u kN , λik+1 , θ i ), λiT = ∇x F i x T , θ i i i
2 ) k = 0, . . . , T − 1
(5.23)
θ ∈Θ
for all i ∈ P. We again see that the N optimization problems in (5.23) are decoupled and so they can be solved independently for each i ∈ P.
156
5 Inverse Noncooperative Dynamic Games
As in discrete-time inverse optimal control, perhaps the most significant limitation of the constraint-satisfaction, soft, and mixed methods is that they require a known, finite horizon 0 < T < ∞. We next develop methods that are applicable in the case of truncated sequences and potentially unknown finite and infinite horizons.
5.4.2 Truncated-Sequence Open-Loop Methods In order to develop minimum-principle methods for solving the TS problem of Definition 5.2 under the open-loop information structure, consider a (possibly infinite) horizon T > 0 and state and control sequences x[0,] and {u i[0,] : i ∈ P} with < T . For each player i ∈ P, let us define the set Ki {0 ≤ k ≤ : u ik ∈ int U i } as the times at which the player’s controls u i[0,] are in the interior of the controlconstraint set U i . Theorem 5.2 implies that in order for x[0,] and {u i[0,] : i ∈ P} to constitute a truncated open-loop Nash equilibria of a noncooperative dynamic game with either a finite or infinite horizon T > , the parameters {θ i : i ∈ P} must be such that λik = ∇x Hki xk , u 1k , . . . , u kN , λik+1 , θ i
(5.24)
for all 0 ≤ k ≤ and all players i ∈ P, and ∇u i Hki xk , u 1k , . . . , u kN , λik+1 , θ i = 0
(5.25)
for all k ∈ Ki and all players i ∈ P. The costate backward recursions (5.24) and the Hamiltonian gradient conditions (5.25) are similar to those that we used in the previous subsection to propose methods for solving the whole-sequence inverse dynamic game problem under the openloop information structure. For each individual player i ∈ P, the conditions (5.24) and (5.25) are also equivalent to the conditions (3.25) and (3.26) that we used in Sect. 3.4.2 to develop methods of discrete-time inverse optimal control with truncated sequences. As in the whole-sequence case, by inspecting the proof of Theorem 5.2, we see that this per-player equivalence in conditions is due again to the optimal control formulation of open-loop Nash equilibria (cf. Sect. 2.4.2.2). In this subsection, we shall therefore develop constraint-satisfaction, soft, and mixed methods for solving the TS problem of Definition 5.2 that are directly analogous to the methods for whole sequences developed in the previous subsection and to the inverse optimal control methods of Sect. 3.4.2. These truncated-sequence methods will differ from their whole-sequence counterparts by not incorporating the concept of terminal costates λiT or associated terminal costate conditions analo-
5.4 Open-Loop Minimum-Principle Methods
157
gous to (5.19). Due to the similarities between the truncated-sequence discrete-time inverse optimal control methods of Sect. 3.4.2 and the methods we present here, we shall summarize the methods here and refer to Sect. 3.4.2 for detailed derivations.
5.4.2.1
Constraint-Satisfaction Method for Truncated Sequences
Recalling the truncated-sequence constraint-satisfaction method of inverse optimal control (3.27), the constraint-satisfaction method for solving the truncated-sequence inverse noncooperative dynamic game problem under the open-loop information structure is defined by the N optimization problems inf
θ i ,λi[0,+1]
C λik = ∇x Hki (xk , u 1k , . . . , u kN , λik+1 , θ i ), 0 ≤ k ≤
s.t.
(5.26)
∇u i Hki (xk , u 1k , . . . , u kN , λik+1 , θ i ) = 0, k ∈ Ki θi ∈ Θi for any constant C ∈ R and all i ∈ P.
5.4.2.2
Soft Method for Truncated Sequences
Similarly, recalling the truncated-sequence soft method of inverse optimal control (3.28), the soft method for solving the truncated-sequence inverse noncooperative dynamic game problem under the open-loop information structure is defined by the N optimization problems inf
θ i ,λi[0,+1]
∇u i H i (xk , u 1 , . . . , u N , λi k
k
k
k+1 , θ
i
2 )
k∈K i
+
i λ − ∇x H i (xk , u 1 , . . . , u N , λi , θ i )2 k k k k k+1
(5.27)
k=0
s.t.
θ ∈ Θi i
for all i ∈ P.
5.4.2.3
Mixed Method for Truncated Sequences
Finally, recalling the truncated-sequence mixed method of inverse optimal control (3.29), the mixed method for solving the truncated-sequence inverse noncooperative dynamic game problem under the open-loop information structure is defined by the
158
5 Inverse Noncooperative Dynamic Games
N optimization problems inf
θ i ,λi[0,+1]
s.t.
∇u i H i (xk , u 1 , . . . , u N , λi k
k
k
k+1 , θ
i
2 )
k∈K i
λk = ∇x Hki (xk , u 1k , . . . , u kN , λik+1 , θ i ), 0 ≤ k ≤
(5.28)
θi ∈ Θi for all i ∈ P.
5.4.3 Discussion of Open-Loop Minimum-Principle Methods The key difference between the whole-sequence and truncated-sequence minimumprinciple methods of inverse dynamic games presented in Sects. 5.4.1 and 5.4.2 is that the truncated-sequence methods require no prior knowledge of the horizon length T other than that it exceeds the length of the sequences (i.e., < T ). Many of the other properties of the truncated-sequence inverse dynamic game methods are analogous to those of the whole-sequence methods. For example, both the whole-sequence and truncated-sequence constraint-satisfaction, soft, and mixed methods enable the cost-function parameters of individual players i ∈ P to be computed separately without the need to compute or know the parameters of the other players j ∈ P, j = i. This property is important and surprising since the truncated-sequence bilevel method of (5.17) requires the cost-function parameters of all players to be computed simultaneously (and with knowledge of the horizon T ). The main limitation of the bilevel and minimum-principle methods (for both whole and truncated sequences) is that they may yield cost-function parameters under which the given state and control sequences only constitute a local (not global) openloop Nash equilibrium (or even inflection sequences). Nevertheless, if the parameters yielded by the minimum-principle methods can be shown to be unique, then there will be no other parameters under which the sequences could constitute a (global) Nash equilibrium. Results characterizing the existence and uniqueness of solutions to the constraint-satisfaction, soft, and mixed methods for whole and truncated sequences are thus important and will be the focus of the next section.
5.5 Open-Loop Method Reformulations and Solution Results In this section, we reformulate and analyze the minimum-principle methods of inverse noncooperative dynamic games developed in Sect. 5.4. The results of this section mirror the discrete-time inverse optimal control results of Sect. 3.5, and so here we shall only present simplified derivations.
5.5 Open-Loop Method Reformulations and Solution Results
159
5.5.1 Linearly Parameterized Player Cost Functions We begin by introducing additional structure into the parameterization of the player cost functions. Assumption 5.3 (Linearly Parameterized Player Cost Functions) For all players i ∈ P, the functions gki and F i are linear combinations of known basis functions in the sense that gki (xk , u 1k , . . . , u kN , θ i ) = θ i g¯ ki (xk , u 1k , . . . , u kN ) for all k ≥ 0, and
F i (x T , θ i ) = θ i F¯ i (x T )
where g¯ ki : Rn × Rm × · · · × Rm → Rq and F¯ i : Rn → Rq are basis functions i that are continuously differentiable in each of their arguments, and θ i ∈ Θ i ⊂ Rq are the cost-function parameters. 1
N
i
i
Assumption 5.3 is a straightforward generalization of Assumption 3.3 that we used to develop results for minimum-principle methods of discrete-time inverse optimal control in Sect. 3.5. Under Assumption 5.3, the player Hamiltonian functions Hki are linear in their respective cost-function parameters θ i and costate variables λik in the sense that Hki (xk , u 1k , . . . , u kN , λik+1 , θ i ) = θ i g¯ ki (xk , u 1k , . . . , u kN ) + λi k+1 f k (xk , u 1k , . . . , u kN ) for i ∈ P. This linearity property also extends to the gradients of the player Hamiltonians under Assumption 5.3, namely, ∇u i Hki (xk , u 1k , . . . , u kN , λik+1 , θ i ) = ∇u i g¯ ki xk , u 1k , . . . , u kN θ i + ∇u i f k xk , u 1k , . . . , u kN λik+1
(5.29)
and ∇x Hki (xk , u 1k , . . . , u kN , λik+1 , θ i ) = ∇x g¯ ki xk , u 1k , . . . , u kN θ i + ∇x f k xk , u 1k , . . . , u kN λik+1
(5.30)
for i ∈ P where ∇x g¯ ki (xk , u 1k , . . . , u kN ) and ∇u i g¯ ki (xk , u 1k , . . . , u kN ) denote the matrices of partial derivatives of g¯ ki with respect to xk and u ik , respectively, and evaluated at xk and {u ik : i ∈ P}. We shall similarly use ∇x F¯ i (x T ) to denote the matrix of partial derivatives of F¯ i with respect to (and evaluated at) x T . Since the gradients of the player Hamiltonians are linear in the parameters θ i and costates λik+1 under Assumption 5.3, the methods of Sect. 5.4 can immediately be seen to be convex optimization problems under Assumption 5.3 (provided that the sets Θ i are also convex). By following identical arguments to those in Sect. 3.5, we
160
5 Inverse Noncooperative Dynamic Games
are thus next able to exploit (static) optimization results (cf. Sect. 2.1) to analyze the existence and uniqueness of the cost-function parameters yielded by the methods of Sect. 5.4.
5.5.2 Fixed-Element Parameter Sets We first note that scaling any of the player cost functions in a noncooperative dynamic game (i.e., VTi ) by any scalar C i > 0 does not change the nature of the open-loop Nash equilibria sequences x[0,T ] and u i[0,T −1] . Thus, an immediate condition necessary (though not sufficient) for methods of inverse noncooperative dynamic games to yield unique solutions is that parameter sets Θ i must not contain both θ i and C i θ i for any scalar C i > 0 and any θ i . For the purpose of analysis, we shall therefore let the parameter set for each player i ∈ P be i
i Θ i = {θ i ∈ Rq : θ(1) = 1}
(5.31)
i
i denotes the first element of θ i ∈ Rq . There is no loss of generality with where θ(1) these parameter sets since the order and scaling of the basis functions (and elements of θ i ) under Assumption 5.3 is arbitrary. Importantly, the parameter sets given by (5.31) also exclude the trivial solution {θ i = 0 : i ∈ P}.
5.5.3 Whole-Sequence Methods Reformulations and Results Let us consider the (open-loop) whole-sequence constraint-satisfaction, mixed, and soft methods presented in Sect. 5.4.1 under Assumption 5.3 and with parameter sets given by (5.31).
5.5.3.1
Whole-Sequence Constraint-Satisfaction Method Reformulation and Results
Under Assumption 5.3 and by following the derivation of (3.37) for each player i ∈ P, the whole-sequence constraint-satisfaction method (5.21) can be reformulated as the problem of solving the N (constrained) systems of linear equations ξCi θ i = 0
s.t.
θi ∈ Θi
(5.32)
for all i ∈ P. The coefficient matrices ξCi for i ∈ P have dimensions m i |K i | × q i and are defined as
5.5 Open-Loop Method Reformulations and Solution Results
ξCi ∇u i g¯ ki xk , u 1k , . . . , u kN + ∇u i f k xk , u 1k , . . . , u kN λ¯ ik+1 k∈K i
161
(5.33)
i with λ¯ ik ∈ Rn×q given by the backward recursions
λ¯ ik = ∇x g¯ ki xk , u 1k , . . . , u kN + ∇x f k xk , u 1k , . . . , u kN λ¯ ik+1
(5.34)
for 0 ≤ k ≤ T − 1 with λ¯ iT ∇x F¯ i (x T ) .
(5.35)
Finally, the notation [Ak ]k∈K i denotes the matrix formed by stacking the matrices Ak with values of k in K i , e.g., ⎡
A0 A1 .. .
⎢ ⎢ [Ak ]k∈{0,...,T −1} = ⎢ ⎣
⎤ ⎥ ⎥ ⎥. ⎦
A T −1 By incorporating the player parameter sets Θ i given by (5.31) into the reformulation (5.32), the constraint-satisfaction method (5.21) becomes the problem of solving the N (unconstrained) systems of linear equations ξ¯Ci θ i = e¯1
(5.36)
for all i ∈ P where e ξ¯Ci 1i ξC and e1 and e¯1 are column vectors of appropriate dimensions with 1 in their first components and zeros elsewhere. We highlight that this reformulation preserves the ability to solve for player parameters individually (i.e., the systems of linear equations for different players can be solved independently). In the following theorem, we exploit this reformulation to characterize the existence and uniqueness of costfunction parameters that are yielded by the constraint-satisfaction method (5.21). Theorem 5.4 (Solutions to Whole-Sequence Constraint-Satisfaction Method) Suppose that Assumption 5.3 holds and consider any player i ∈ P. Let the parameter set Θ i be given by (5.31) and let ξ¯Ci+ be the pseudoinverse of ξ¯Ci . Then the constraintsatisfaction method for whole sequences (5.21) yields cost-function parameters for player i ∈ P if and only if ξ¯Ci ξ¯Ci+ e¯1 = e¯1 , and these (potentially nonunique) parameters are given by
(5.37)
162
5 Inverse Noncooperative Dynamic Games
θ i = ξ¯Ci+ e¯1 + I − ξ¯Ci+ ξ¯Ci bi
(5.38)
i
where bi ∈ Rq is any arbitrary vector. Furthermore, the cost-function parameters computed by the constraint-satisfaction method are unique and given by θ i = ξ¯Ci+ e¯1
(5.39)
if and only if ξ¯Ci has rank q i in addition to satisfying (5.37). Proof Follows via the same argument as Theorem 3.3 using the reformulation in (5.36) for player i ∈ P. Theorem 5.4 suggests that in some situations it is possible for the whole-sequence constraint-satisfaction method (5.21) to yield cost-function parameters for some players but not others, and to yield unique cost-function parameters for some players but not others. The discussion after Theorem 3.3 is relevant here and provides some intuition behind the conditions that must be satisfied for each player i ∈ P in order for the player’s cost-function parameters to be computable with the constraint-satisfaction method (5.21) and in order for them to be unique. Clearly, in order for the constraintsatisfaction method (5.21) to yield unique parameters for all of the players in a noncooperative dynamic game, the conditions of Theorem 5.4 must hold for all players i ∈ P. We now have the following counterpart to Corollary 3.2 which provides general insight into the solution of the WS problem of Definition 5.1. Specifically, the following corollary summarizes conditions for the existence and uniqueness of exact solutions to the whole-sequence inverse noncooperative dynamic game problem regardless of the specific inverse method employed. Corollary 5.1 (Existence of Exact Solutions to Whole-Sequence Problem of Definition 5.1) Suppose that Assumption 5.3 holds and consider any player i ∈ P. If the parameter set Θ i is given by (5.31) and if ξ¯Ci ξ¯Ci+ e¯1 = e¯1 then the sequence of states x[0,T ] and player controls {u i[0,T −1] : i ∈ P} do not constitute a (potentially local) open-loop Nash equilibrium for any θ i ∈ Θ i . If, however, ξ¯Ci has rank q i and ξ¯Ci ξ¯Ci+ e¯1 = e¯1 then there is at most one θ i ∈ Θ i such that the sequences x[0,T ] and {u i[0,T −1] : i ∈ P} constitute a (potentially local) open-loop Nash equilibrium.
5.5 Open-Loop Method Reformulations and Solution Results
163
Proof Follows via the same argument as Corollary 3.2.
Corollary 5.1 implies that if the left-identity condition (5.37) holds for all players i ∈ P, then it is feasible to solve the whole-sequence inverse noncooperative dynamic game problem of Definition 5.1 exactly. However, if (5.37) does not hold for all players i ∈ P, then those players for which it does not hold will have to have their parameters computed via an approximate optimality approach involving either the soft or mixed methods of (5.22) and (5.23), respectively. Interestingly, we note that it may therefore be reasonable to use different methods to compute the cost-function parameters of different players. We summarize the constraint-satisfaction method for whole sequences (for each player i ∈ P) under Assumption 5.3 in Algorithm 5.1. Algorithm 5.1 Whole-Sequence Constraint-Satisfaction Method for Player i j
Input: Whole state and control sequences x[0,T ] and {u [0,T −1] : j ∈ P }, dynamics f k , basis funci tions g¯ i and F¯ i , control set U i , and parameter set Θ i = {θ i ∈ Rq : θ i = 1}. k
Output: Computed player cost-function parameters θ i . 1: Compute sequence of matrices λ¯ i[0,T ] via (5.34) and (5.35). 2: Compute matrix ξCi via (5.33). 3: Compute augmented matrix ξ¯Ci from (5.36). 4: Compute the pseudoinverse ξ¯Ci+ of ξ¯Ci . 5: if ξ¯Ci ξ¯Ci+ e¯1 = e¯1 then 6: if ξ¯Ci has rank q i then 7: return Unique θ i given by (5.39). 8: else i 9: return Any θ i from (5.38) with any bi ∈ Rq . 10: end if 11: else 12: return No feasible exact solutions θ i (cf. Corollary 5.1). 13: end if
5.5.3.2
(1)
Whole-Sequence Soft Method Reformulation and Results
Under Assumption 5.3, the soft method of (5.22) can be reformulated by directly exploiting the linear forms of the Hamiltonian gradients in (5.29) and (5.30). For any player i ∈ P, the linearity of the Hamiltonian gradients under Assumption 3.3 specifically implies that the terms in the soft method are each quadratic in the parameters θ i and costate variables λik , namely,
164
5 Inverse Noncooperative Dynamic Games
∇u i H i xk , u 1 , . . . , u N , λi , θ i 2 k k k k+1 i i ∇u i g¯ ki xk , u 1k , . . . , u kN = θ λk+1 ∇u i f k xk , u 1k , . . . , u kN θ i i N N 1 1 · ∇u i g¯ k xk , u k , . . . , u k ∇u i f k xk , u k , . . . , u k λik+1 and i λ − ∇x H i xk , u 1 , . . . , u N , λi , θ i 2 k k k k k+1 ⎤ ⎡ i N 1 i i i −∇x g¯ k xk , u k , . . . , u k ⎦ = θ λk λk+1 ⎣ I −∇x f k xk , u 1k , . . . , u kN
⎡ ⎤ θ · −∇x g¯ ki xk , u 1k , . . . , u kN I −∇x f k xk , u 1k , . . . , u kN ⎣ λik ⎦ λik+1
together with i i ¯ i λ − ∇x F i x T , θ i 2 = θ i λi −∇x F (x T ) −∇x F¯ i (x T ) I θi . T T λT I Since the sums of quadratic forms are also quadratic forms, the soft method of (5.22) under Assumption 5.3 becomes the N (constrained) quadratic programs ⎡
i i θ λ0
inf
θ i ,λi[0,T ]
⎤ θi i ⎥ ⎢ ⎢ λ0 ⎥ · · · λi T ξ Si ⎢ . ⎥ ⎣ .. ⎦
s.t.
θi ∈ Θi
(5.40)
λiT
for all i ∈ P where ξ Si are matrices of dimension (q i + (T + 1)n) × (q i + (T + 1)n) containing the coefficients of the parameters θ i and costate variables λi[0,T ] contributed by each of the terms in the soft method of (5.22). We now use the reformulation in (5.40) to characterize the existence and uniqueness of cost-function parameters that are yielded by the soft method (5.22) with the parameter sets (5.31). To present these solution results, let us define the principal submatrix of ξ Si for i ∈ P as ⎡ ⎢ ⎢ ξ¯Si ⎢ ⎣
i ξ S,(2,2) i ξ S,(3,2) .. .
i ξ S,(q i +(T +1)n,2)
⎤ i ... ξ S,(2,q i +(T +1)n) i ⎥ ... ξ S,(3,q i +(T +1)n) ⎥ ⎥ .. .. ⎦ . . i . . . ξ S,(q i +(T +1)n,q i +(T +1)n)
(5.41)
5.5 Open-Loop Method Reformulations and Solution Results
165
i i where ξ S,(, j) is the element of ξ S in its th row and jth column. Let us also define
i i i ξ S,(3,1) . . . ξ S,(q ν Si ξ S,(2,1) i +(T +1)n,q i +(T +1)n)
(5.42)
as the first column of ξ Si for i ∈ P with its first element deleted. Similarly, for i ∈ P, let r Si be the rank of ξ¯Si , let ξ¯Si+ be the pseudoinverse of ξ¯Si , and let ξ¯Si = U Si Σ Si U Si be a singular value decomposition (SVD) of ξ¯Si where Σ Si ∈ R(q +(T +1)n−1)×(q +(T +1)n−1) i
i
is a diagonal matrix, and U Si
i,11 i,12 US US (q i +(T +1)n−1)×(q i +(T +1)n−1) i,21 i,22 ∈ R US US
(5.43)
is a block matrix with submatrices U Si,11 ∈ R(q −1)×r S , U Si,21 ∈ Rn(T +1)×r S , U Si,22 ∈ i i i i i Rn(T +1)×(q +(T +1)n−1−r S ) and U Si,12 ∈ R(q −1)×(q +(T +1)n−1−r S ) . Finally, let us define i i the rectangular matrix I Si I 0 ∈ Rq ×(q +(T +1)n) where I denotes the (square) identify matrix. The main result for the whole-sequence soft method (5.22) follows. i
i
i
Theorem 5.5 (Solutions to Whole-Sequence Soft Method) Suppose that Assumption 5.3 holds and consider any player i ∈ P. If the parameter set Θ i is given by (5.31) and (I − ξ¯Si ξ¯Si+ )ν Si = 0, then all of the parameter vectors θ i ∈ Θ i corresponding to all solutions (θ i , λi[0,T ] ) of the soft method (5.22) are of the form θ i = I Si ηiS
(5.44)
i where ηiS 1 η¯ i S ∈ Rq +(T +1)n are (potentially nonunique) solutions to the i quadratic program (5.40) with η¯ iS ∈ Rq +(T +1)n−1 given by η¯ iS = −ξ¯Si+ ν Si + U Si
0 bi
(5.45)
for any bi ∈ Rq +(T +1)n−1−r S . Furthermore, if either U Si,12 = 0 or r Si = q i + (T + 1)n − 1, then all solutions (θ i , λi[0,T ] ) to the soft method (5.22) correspond to a single unique parameter vector given by i
i
θ = i
I Si
1
−ξ¯Si+ ν Si
.
(5.46)
166
5 Inverse Noncooperative Dynamic Games
Proof The proof follows that of Theorem 3.4 for each player i ∈ P using the quadratic program reformulation in (5.40) in place of (3.42). The conditions of Theorem 5.5 are discussed (as they relate to each player i ∈ P) after Theorem 3.4. We note here that it is possible for the conditions of Theorem 5.5 to hold for some players i ∈ P but not others and so the parameters computed by the soft method may be unique for some players, but not others. We summarize the soft method for whole sequences (5.22) (per player i ∈ P) under Assumption 5.3 in Algorithm 5.2. Algorithm 5.2 Whole-Sequence Soft Method for Player i j
Input: Whole state and control sequences x[0,T ] and {u [0,T −1] : j ∈ P }, dynamics f k , basis funci tions g¯ i and F¯ i , control set U i , and parameter set Θ i = {θ i ∈ Rq : θ i = 1}. k
Output: Computed cost-function parameters θ i . 1: Compute matrix ξ Si via (5.40). 2: Compute submatrix matrix ξ¯Si from (5.41) and vector ν Si from (5.42). 3: Compute the pseudoinverse ξ¯Si+ of ξ¯ Si such that (I − ξ¯Si ξ¯ Si+ )ν Si = 0. 4: Compute the rank r Si of ξ¯ Si . 5: if r Si = q i + (T + 1)n − 1 then 6: return Unique θ i given by (5.46). 7: else 8: Compute U Si and U Si,12 in (5.43) via SVD of ξ¯Si . 9: if U Si,12 = 0 then 10: return Unique θ i given by (5.46). 11: else i i 12: return Any θ i from (5.44) with any bi ∈ Rq +(T +1)n−1−r S . 13: end if 14: end if
5.5.3.3
(1)
Whole-Sequence Mixed Method Reformulation and Results
Following the derivation of (3.43) for each player i ∈ P and noting Assumption 5.3, we have that the mixed method (5.23) may be reformulated as the problem of solving the N constrained quadratic programs inf θi
i i θ i ξ M θ
s.t.
θi ∈ Θi .
(5.47)
i for all i ∈ P where ξ M ∈ Rq ×q are positive semidefinite matrices defined as i
i ξM
i
∇u i g¯ ki xk , u 1k , . . . , u kN + ∇u i f k xk , u 1k , . . . , u kN λ¯ ik+1 k∈K
i
· ∇u i g¯ ki xk , u 1k , . . . , u kN + ∇u i f k xk , u 1k , . . . , u kN λ¯ ik+1
5.5 Open-Loop Method Reformulations and Solution Results
167
and the matrices λ¯ ik ∈ Rn×q are given by the backwards recursions (5.34) with terminal conditions (5.35). In order to exploit this reformulation to characterize the existence and uniqueness of cost-function parameters that are yielded by the whole-sequence mixed method when the parameter sets are given by (5.31), for each player i ∈ P let us define the i as principal submatrix of ξ M i
⎡
i ξ¯ M
i ξ M,(2,2) i ⎢ ξ M,(3,2) ⎢ ⎢ . ⎣ .. i ξ M,(q i ,2)
⎤ i . . . ξ M,(2,q i) i ⎥ . . . ξ M,(3,q i) ⎥ ⎥. .. .. ⎦ . . i . . . ξ M,(q i ,q i )
(5.48)
i Let us also define the first column of ξ M with its first element deleted as
i i i i ξ M,(3,1) . . . ξ M,(q νM ξ M,(2,1) . i ,q i )
(5.49)
i+ i i i Similarly, let r M be the rank of ξ¯ M , let ξ¯ M be the pseudoinverse of ξ¯ M , and let i i i i ξ¯ M = UM ΣM UM i i i be a SVD of ξ¯ M where Σ M ∈ R(q −1)×(q −1) and U M ∈ R(q −1)×(q −1) . The main result characterizing the parameters yielded by the mixed method follows. i
i
i
i
Theorem 5.6 (Solutions to Whole-Sequence Mixed Method) Suppose that Assumption 5.3 holds and consider any player i ∈ P. If Θ i is given by (5.31) and (I − i ¯ i+ i ξ¯ M ξ M )ν M = 0, then all of the cost-function parameters θ i ∈ Θ i yielded by the mixed method (5.23) satisfy 1 θ i = ¯i θ
(5.50)
where the vectors θ¯ i ∈ Rq −1 are given by i
0 i+ i i ¯θ i = −ξ¯ M νM + UM i b
(5.51)
i for any bi ∈ Rq −1−r M . Furthermore, if r M = q i − 1, then the mixed method (5.23) yields the unique cost-function parameters i
i
θi =
1
i+ i νM −ξ¯ M
.
(5.52)
168
5 Inverse Noncooperative Dynamic Games
Proof The theorem is proved in the same manner as Theorem 3.5 for each player i ∈ P using the quadratic reformulation in (5.47). Theorem 5.6 is the extension of Theorem 3.5 to solving the whole-sequence inverse noncooperative dynamic game problem. The discussion of rank conditions and solutions immediately after Theorem 3.5 is therefore also applicable here. We summarize the mixed method for whole sequences (5.23) under Assumption 5.3 (per player i ∈ P) in Algorithm 5.3. Algorithm 5.3 Whole-Sequence Mixed Method for Player i ∈ P j
Input: Whole state and control sequences x[0,T ] and {u [0,T −1] : j ∈ P }, dynamics f k , basis funci tions g¯ i and F¯ i , control set U i , and parameter set Θ i = {θ i ∈ Rq : θ i = 1}. (1)
k
Output: Computed cost-function parameters θ i . 1: Compute sequence of matrices λ¯ i[0,T ] via (5.34) and (5.35). i in (5.47). 2: Compute matrix ξ M i from (5.48) and vector ν i from (5.49). 3: Compute submatrix matrix ξ¯ M M i+ i such that (I − ξ¯ i ξ¯ i+ )ν i = 0. of ξ¯ M 4: Compute the pseudoinverse ξ¯ M M M M i of ξ¯ i . 5: Compute the rank r M M i = q i − 1 then 6: if r M 7: return Unique θ i given by (5.51). 8: else i via SVD of ξ¯ i . 9: Compute U M M 10: return Any θ i from (5.50) with any bi ∈ Rq 11: end if
i −1−r i M
.
5.5.4 Truncated-Sequence Methods Reformulations and Results We now turn our attention to reformulating and analyzing the constraint-satisfaction, mixed, and soft methods presented in Sect. 5.4.2 for truncated sequences under the open-loop information structure. We shall again make use of Assumption 5.3 and parameter sets given by (5.31).
5.5.4.1
Truncated-Sequence Constraint-Satisfaction Method Reformulation and Results
By following the derivation of (3.62) for each player i ∈ P, we have that the constraint-satisfaction method (5.26) for truncated sequences under Assumption 5.3 is equivalent to solving the N (constrained) systems of linear equations
5.5 Open-Loop Method Reformulations and Solution Results
φCi
θi =0 λi+1
169
s.t. θ i ∈ Θ i
(5.53)
for all i ∈ P where φCi is the stacked matrix φCi Wki k∈K i
with Wki [∇u i g¯ ki (xk , u 1k , . . . , u kN ) ∇u i f k (xk , u 1k , . . . , u kN )]Gki
(5.54)
and Gki G ik × · · · × G i−1 × G i where I 0 . ∇x g¯ ki (xk , u 1k , . . . , u kN ) ∇x f k (xk , u 1k , . . . , u kN )
G ik
By substituting the parameter set (5.31) into (5.53) for Θ i , the constraint-satisfaction method becomes the N unconstrained systems of linear equations φ¯ Ci
θi = e¯1 λi+1
(5.55)
for all i ∈ P where i i e i ¯ φC 1i ∈ R(|K |+1)×(q +n) φC and e1 and e¯1 are column vectors of appropriate dimensions with 1 in their first components and zeros elsewhere. In the following theorem, we analyze the solutions to the constraint-satisfaction method reformulated as the system of linear equations (5.55). To present this theorem, i i let us use φ¯ Ci+ ∈ R(q +n)×(|K |+1) to denote the (Moore–Penrose) pseudoinverse of φ¯ Ci , and let us introduce the matrix i,1 U¯ i ¯ UC ¯ Ci,2 = I − φ¯ Ci+ φ¯ Ci UC
(5.56)
i i i where U¯ Ci,1 ∈ Rq ×(q +n) and U¯ Ci,2 ∈ Rn×(q +n) . Let us also define I¯Ci I 0 ∈ i i Rq ×(q +n) .
170
5 Inverse Noncooperative Dynamic Games
Theorem 5.7 (Truncated-Sequence Constraint-Satisfaction Method Solutions) Suppose that Assumption 5.3 holds and consider any player i ∈ P. If the parameter set Θ i is given by (5.31) then the constraint-satisfaction method for truncated sequences (5.26) yields player cost-function parameters if and only if φ¯ Ci φ¯ Ci+ e¯1 = e¯1 ,
(5.57)
and these (potentially nonunique) parameters are given by θ i = I¯Ci φ¯ Ci+ e¯1 + U¯ Ci bi
(5.58)
i where bi ∈ Rq +n is any arbitrary vector. Furthermore, if U¯ Ci,1 = 0 or φ¯ Ci has rank ρCi = q i + n in addition to (5.57) holding, then the player cost-function parameters computed by constraint-satisfaction method are unique and given by
θ i = I¯Ci φ¯ Ci+ e¯1 .
(5.59)
Proof The proof follows that of Theorem 3.6 for each player i ∈ P using the reofmrulation in (5.55). The proof is complete. Theorem 5.7 establishes conditions under which it is possible for the constraintsatisfaction method (5.26) to compute cost-function parameters for the player i ∈ P. Our discussion of the inverse optimal control counterpart result of Theorem 3.6 provides deeper intuition into what these conditions entail. We note however that in the setting of dynamic games, there may be situations in which the constraintsatisfaction method (5.26) yields (potentially unique) parameters for some players, but not for others. As in the whole-sequence case and the case of inverse optimal control, Theorem 5.7 also provides insight into when the TS problem of Definition 5.2 is feasible to solve exactly (via any method). We summarize this insight in the following corollary. Corollary 5.2 (Existence of Exact Solutions to Truncated-Sequence Problem of Definition 5.2) Suppose that Assumption 5.3 holds and consider any player i ∈ P. If the parameter set Θ i is given by (5.31) and if φ¯ Ci φ¯ Ci+ e¯1 = e¯1 j
then the sequence of states x[0,] and associated controls {u [0,] : j ∈ P} do not constitute a Nash equilibrium for any θ i ∈ Θ i . Proof Follows via the same argument as Corollary 3.3. φ¯ Ci φ¯ Ci+ e¯1
Corollary 5.2 suggests that = e¯1 must hold in order for the truncatedsequence inverse noncooperative dynamic game problem of Definition 5.2 to have an exact solution. If however φ¯Ci φ¯ Ci+ e¯1 = e¯1 for any player i ∈ P, then there are no costj function parameters for that player such that x[0,] and {u [0,] : j ∈ P} constitute
5.5 Open-Loop Method Reformulations and Solution Results
171
a Nash equilibrium, and so their parameters would have to be computed via an approximate optimality approach such as the soft or mixed methods. We summarize the reformulation and solution results for the truncated-sequence constraint-satisfaction method (5.26) under Assumption 5.3 for each player i ∈ P in Algorithm 5.4. Algorithm 5.4 Truncated-Sequence Constraint-Satisfaction Method for Player i j
Input: Truncated state and control sequences x[0,] and {u [0,] : j ∈ P }, dynamics f k , basis funci tions g¯ i and F¯ i , control set U i , and parameter set Θ i = {θ i ∈ Rq : θ i = 1}. (1)
k
Output: Computed cost-function parameters θ i . 1: Compute matrix φCi in (5.53). 2: Compute augmented matrix φ¯ Ci in (5.55). 3: Compute the pseudoinverse φ¯ Ci+ of φ¯ Ci . 4: if φ¯ Ci φ¯ Ci+ e¯1 = e¯1 then 5: Compute the rank ρCi of φ¯ Ci . 6: if ρCi = q i + n then 7: return Unique θ i given by (5.59). 8: else 9: Compute U¯ Ci and U¯ Ci,1 in (5.56). 10: if U¯ Ci,1 = 0 then 11: return Unique θ i given by (5.59). 12: else i 13: return Any θ i from (5.58) with any bi ∈ Rq +n . 14: end if 15: end if 16: else 17: return No feasible exact solutions θ i (cf. Corollary 5.2). 18: end if
5.5.4.2
Truncated-Sequence Soft Method Reformulation and Results
Under Assumption 5.3, the soft method for truncated sequences (5.27) can be reformulated in the same way as the soft method for whole sequences (as in (5.40)). In particular, the soft method of (5.27) under Assumption 5.3 is equivalent to the N (constrained) quadratic programs ⎡ inf
θ i ,λi[0,+1]
i i ⎢ ⎢ θ λ0 · · · λi +1 φ Si ⎢ ⎣
θi λi0 .. .
λi+1
⎤ ⎥ ⎥ ⎥ ⎦
s.t.
θi ∈ Θi
(5.60)
172
5 Inverse Noncooperative Dynamic Games
for all i ∈ P where φ Si are matrices of appropriate dimensions containing the coefficients of the parameters θ i and costate variables λi[0,+1] contributed by each of the terms in the soft method of (5.27), with these terms being, ∇u i H i xk , u 1 , . . . , u N , λi , θ i 2 k k k k+1 ∇u i g¯ ki xk , u 1k , . . . , u kN = θ i λi k+1 ∇u i f k xk , u 1k , . . . , u kN θ i · ∇u i g¯ ki xk , u 1k , . . . , u kN ∇u i f k xk , u 1k , . . . , u kN λik+1 and i λ − ∇x H i xk , u 1 , . . . , u N , λi , θ i 2 k k k k k+1 ⎤ ⎡ i N 1 i i i −∇x g¯ k xk , u k , . . . , u k ⎦ = θ λk λk+1 ⎣ I −∇x f k xk , u 1k , . . . , u kN
N
· −∇x g¯ ki xk , u 1k , . . . , u k
N
I −∇x f k xk , u 1k , . . . , u k
⎡
⎤ θ ⎣ λik ⎦ . λik+1
With the reformulation of the truncated-sequence soft method in (5.60) and the player parameter sets given by (3.44), we can now establish the existence and uniqueness of cost-function parameters that are yielded by the soft method (5.27). To present these solutions results, for i ∈ P, let us define φ¯ Si as the principal submatrix of φ Si formed by deleting the first row and column of φ Si and let μiS be the first column of φ Si with its first element deleted. For i ∈ P, let us also define ρ Si as the rank of φ¯ Si , let φ¯ Si+ be the pseudoinverse of φ¯ Si , and let φ¯ Si = U¯ Si Σ¯ Si U¯ Si i i be a SVD of φ¯ Si where Σ¯ Si ∈ R(q +(+2)n−1)×(q +(+2)n−1) is a diagonal matrix, and
i,11 i,12 i i U¯ U¯ U¯ Si ¯ Si,21 ¯ Si,22 ∈ R(q +(+2)n−1)×(q +(+2)n−1) US US
(5.61)
i i i i i is a block matrix with submatrices U¯ Si,11 ∈ R(q −1)×ρS , U¯ Si,12 ∈ R(q −1)×(q +(+2)n−1−ρS ) , i i i U¯ Si,21 ∈ Rn(+2)×ρS and U¯ Si,22 ∈ Rn(+2)×(q +(+2)n−1−ρS ) . Finally, let us define the rect i i angular matrix I¯Si I 0 ∈ Rq ×(q +(+2)n) where I denotes the square identify matrix with appropriate dimensions.
Theorem 5.8 (Solutions to Truncated-Sequence Soft Method) Suppose that Assumption 5.3 holds and consider any player i ∈ P. If Θ i is given by (5.31) and if (I − φ¯ Si φ¯ Si+ )μiS = 0, then all of the parameter vectors θ i ∈ Θ i corresponding to
5.5 Open-Loop Method Reformulations and Solution Results
173
all solutions (θ i , λi[0,+1] ) of the soft method (5.27) are of the form θ i = I¯Si β Si
(5.62)
i where β Si 1 β¯Si ∈ Rq +(+2)n are (potentially nonunique) solutions to the quadratic i program (5.60) with β¯Si ∈ Rq +(+2)n−1 given by 0 β¯Si = −φ¯ Si+ μiS + U¯ Si i b
(5.63)
i i for any bi ∈ Rq +(+2)n−1−ρS . Furthermore, if either U¯ Si,12 = 0 or ρ Si = q i + ( + 2)n − 1, then all solutions (θ i , λi[0,+1] ) to the soft method (5.27) correspond to a single unique parameter vector given by
θ = I¯Si
i
1
−φ¯ Si+ μiS
.
(5.64)
Proof Follows along the same lines as Theorem 3.7 for each player i ∈ P using the reformulation (5.60). Theorem 5.8 describes the player cost-function parameters yielded by the soft method for truncated sequences (5.27). Its assertions mirror those of Theorem 5.5 for the whole-sequence soft method, and hence much of the discussion after Theorems 5.5 and 3.4 is therefore relevant here. We summarize the reformulation and solution results for soft method for truncated sequences (5.27) under Assumption 5.3 in Algorithm 5.5 for player i ∈ P.
5.5.4.3
Truncated-Sequence Mixed Method Reformulation and Results
Finally, the mixed method for truncated sequences (5.28) for each player i ∈ P can be reformulated in the same manner as the truncated-sequence mixed method of inverse optimal control in (3.64). In particular, under Assumption 5.3, the mixed method (5.28) becomes the problem of solving the N quadratic programs: inf
θ i ,λi+1
i i i θ λ+1 φ M
θi λi+1
s.t. θ i ∈ Θ i
for all i ∈ P where φ iM is the positive semidefinite matrix φ iM
k∈K i
and Wki is defined in (5.54).
Wki Wki
(5.65)
174
5 Inverse Noncooperative Dynamic Games
Algorithm 5.5 Truncated-Sequence Soft Method for Player i j
Input: Truncated state and control sequences x[0,] and {u [0,] : j ∈ P }, dynamics f k , basis funci tions g¯ i and F¯ i , control set U i , and parameter set Θ i = {θ i ∈ Rq : θ i = 1}. (1)
k
Output: Computed cost-function parameters θ i . 1: Compute matrix φ Si via (5.60). 2: Compute principal submatrix φ¯ Si from φ Si by deleting first row and column. 3: Compute vector μiS from φ Si by extracting first column without its first element. 4: Compute the pseudoinverse φ¯ Si+ of φ¯ Si such that (I − φ¯ Si φ¯ Si+ )μiS = 0. 5: Compute the rank ρ Si of φ¯ Si . 6: if ρ Si = q i + ( + 2)n − 1 then 7: return Unique θ i given by (5.64). 8: else 9: Compute U¯ Si and U¯ Si,12 in (5.61) via SVD of φ¯ Si . 10: if U¯ Si,12 = 0 then 11: return Unique θ given by (5.64). 12: else i i 13: return Any θ i from (5.62) with any bi ∈ Rq +(+2)n−1−ρ S . 14: end if 15: end if
Given the reformulation of the mixed method in (5.65), we are now in a position to derive results characterizing its solutions. To develop the solution results, let us consider the reformulation (5.65) and for each i ∈ P, define φ¯ iM as the principal submatrix of φ iM formed by deleting the first row and column of φ iM , μiM as the first i as the rank of φ¯ iM . Let us also column of φ iM with its first element deleted, and ρ M i+ i ¯ ¯ define φ M as the pseudoinverse of φ M , and let i ¯ i ¯ i φ¯ iM = U¯ M ΣM UM i be a SVD of φ¯ iM where Σ¯ M ∈ R(q +n−1)×(q +n−1) is a diagonal matrix, and i
i,11 U¯ i U¯ M = ¯M i,21 UM
i
i,12 U¯ M (q i +n−1)×(q i +n−1) i,22 ∈ R ¯ UM
(5.66)
i,11 i,12 ∈ R(q −1)×ρ M , U¯ M ∈ R(q −1)×(q +n−1−ρ M ) , is a block matrix with submatrices U¯ M i i i i,21 i,22 ∈ Rn×ρ M and U M ∈ Rn×(q +n−1−ρ M ) . Finally, let us define the rectangular U¯ M i q i ×(q i +n) ¯ matrix I M [I 0] ∈ R . i
i
i
i
i
Theorem 5.9 (Solutions to Truncated-Sequence Mixed Method) Suppose that Assumption 5.3 holds and consider any player i ∈ P. If Θ i is given by (5.31) and i i i if (I − φ¯ iM φ¯ i+ M )μ M = 0, then all of the parameter vectors θ ∈ Θ corresponding to i i all solutions (θ , λ+1 ) of the mixed method (5.28) are of the form i θ i = I¯Mi β M
(5.67)
5.5 Open-Loop Method Reformulations and Solution Results
175
i i i where β M = 1 β¯M ∈ Rq +n are (potentially nonunique) solutions to the quadratic i i ∈ Rq +n−1 given by program (5.65) with β¯M 0 i i i ¯ = −φ¯ i+ μ + U β¯M M bi M M
(5.68)
i,12 i for any bi ∈ Rq +n−1−ρ M . Furthermore, if either U¯ M = 0 or ρ M = q i + n − 1, then i i all of the solutions (θ , λ+1 ) to the mixed method (5.28) correspond to the single unique parameter vector θ i ∈ Θ i given by i
i
θ = I¯Mi i
1
i −φ¯ i+ M μM
.
(5.69)
Proof The theorem assertions hold via the same argument as Theorem 3.8 using the reformulation in (5.65). Theorem 5.9 is the counterpart to Theorem 3.8 for the truncated-sequence mixed method of inverse noncooperative dynamic games. The discussion after Theorem 3.8 therefore provides useful intuition here into the conditions of Theorem 5.9. In the context of inverse noncooperative dynamic games, Theorem 5.9 highlights that in contrast to the bilevel method (5.17), the truncated-sequence mixed method (like the constraint-satisfaction and soft methods) can compute the parameters of individual players separately without knowledge of the horizon T (or needing to solve forward noncooperative dynamic games). We summarize the reformulation and solution results for the truncated-sequence mixed method (5.28) under Assumption 5.3 in Algorithm 5.6.
Algorithm 5.6 Truncated-Sequence Mixed Method for Player i
Input: Truncated state and control sequences x_{[0,ℓ]} and {u^j_{[0,ℓ]} : j ∈ P}, dynamics f_k, basis functions ḡ^i_k and F̄^i_k, control set U^i, and parameter set Θ^i = {θ^i ∈ R^{q^i} : θ^i_{(1)} = 1}.
Output: Computed cost-function parameters θ^i.
1: Compute matrix φ^i_M via (5.65).
2: Compute principal submatrix φ̄^i_M from φ^i_M by deleting its first row and column.
3: Compute vector μ^i_M from φ^i_M by extracting its first column without its first element.
4: Compute the pseudoinverse φ̄^{i+}_M of φ̄^i_M such that (I − φ̄^i_M φ̄^{i+}_M) μ^i_M = 0.
5: Compute the rank ρ^i_M of φ̄^i_M.
6: if ρ^i_M = q^i + n − 1 then
7:   return Unique θ^i given by (5.69).
8: else
9:   Compute Ū^i_M and Ū^{i,12}_M in (5.66) via SVD of φ̄^i_M.
10:  if Ū^{i,12}_M = 0 then
11:    return Unique θ^i given by (5.69).
12:  else
13:    return Any θ^i given by (5.67) with any b^i ∈ R^{q^i+n−1−ρ^i_M}.
14:  end if
15: end if

5.6 Challenges and Potential for Feedback Minimum-Principle Methods

In this chapter, we have so far focused only on developing minimum-principle methods for solving the inverse noncooperative dynamic game problems of Definitions 5.1 and 5.2 under the open-loop information structure. In this open-loop case, the problems of Definitions 5.1 and 5.2 collapse to N discrete-time inverse optimal control problems (one for each player in the game) analogous to those in Definitions 3.1 and 3.2. This relationship between open-loop inverse dynamic game problems and inverse optimal control problems follows from the relationship between (forward) noncooperative dynamic games and optimal control problems (cf. Sect. 2.4.2.2).

Unfortunately, the development of minimum-principle methods for solving the inverse noncooperative dynamic game problems of Definitions 5.1 and 5.2 under the feedback information structure poses challenges not encountered under the open-loop information structure. In particular, a similar reduction of the problems in Definitions 5.1 and 5.2 to N discrete-time inverse optimal control problems analogous to those in Definitions 3.1 and 3.2 is not directly possible since the discrete-time optimal control problems associated with feedback Nash equilibria (described in Sect. 2.4.2.2 and (2.30)) involve the player feedback Nash equilibrium strategies {γ^j_k : j ∈ P, j ≠ i, 0 ≤ k ≤ T − 1}. Furthermore, as shown in Theorem 5.3, the minimum-principle conditions for feedback Nash equilibria involve costate backward recursions (5.11) that depend on knowledge of the player feedback Nash equilibrium strategies {γ^j_k : j ∈ P, j ≠ i, 0 ≤ k ≤ T − 1} and their partial derivatives with respect to the states x_k. In general, it does not seem possible to simply omit the costate backward recursions and use only the Hamiltonian conditions (5.13) to develop inverse methods since there frequently arise more unknowns (both costate variables and cost-function parameters) than equations or conditions.

In principle, if the feedback Nash equilibrium strategies (but not the cost-function parameters) of the players are known, then we could pursue constraint-satisfaction, soft, and mixed methods for feedback Nash equilibria in a similar manner to Sect. 5.4. Indeed, the inverse problem becomes that of computing player cost-function parameters such that the given set of (known) player feedback strategies constitutes a feedback Nash equilibrium, which is still nontrivial. We highlight that in this form, the inverse problem becomes analogous to the original notion of inverse optimal control introduced by Kalman in [13] for the single-player case N = 1. Although the problem of computing player cost-function parameters given the player feedback strategies departs slightly from the inverse problems in Definitions 5.1 and 5.2, from a practical perspective, the assumption that the feedback strategies are known can be interpreted as requiring that sufficient state and control data is provided to first compute the player strategies before using these to
compute player cost-function parameters. In principle, inverse dynamic game problems can thus be solved under the feedback information structure in two steps by first estimating the feedback strategies γ^i_k from given state and control sequences, and then computing the player cost-function parameters θ^i. In the next section, we shall examine this two-step approach for infinite-horizon LQ feedback dynamic games.
5.7 Inverse Linear-Quadratic Feedback Dynamic Games

In this section, we explore the two-step approach discussed in the previous section for solving the TS problem of Definition 5.2 under the feedback information structure in the special case where the game dynamics are linear, the player cost functions are quadratic, and the horizon T is infinite. The two-step approach is tractable in this LQ case since:
1. the player feedback strategies are time-invariant matrices that can be estimated from state and control data using standard linear least-squares estimation; and
2. the player cost-function parameters can be computed directly from the player feedback strategies using conditions for feedback Nash equilibria derived from coupled discrete-time algebraic Riccati equations (AREs).
5.7.1 Preliminary LQ Dynamic Game Concepts

We start by specializing the concepts introduced in Sect. 5.1.
5.7.1.1 Parameterized LQ Dynamic Games

Let us consider N players P = {1, . . . , N} interacting through a discrete-time dynamical system described by the system of linear difference equations

    x_{k+1} = A x_k + ∑_{i=1}^{N} B^i u^i_k,  x_0 ∈ R^n,    (5.70)

for k ≥ 0 where A ∈ R^{n×n} and B^i ∈ R^{n×m^i} for i ∈ P denote time-invariant system matrices. For each player i ∈ P, let us also consider an infinite-horizon quadratic cost function of the form

    V^i_∞(x_{[0,∞]}, u^1_{[0,∞]}, . . . , u^N_{[0,∞]}, Q^i, R^{i1}, . . . , R^{iN}) = (1/2) ∑_{k=0}^{∞} ( x_k^⊤ Q^i x_k + ∑_{j=1}^{N} u_k^{j⊤} R^{ij} u^j_k ),    (5.71)
where Q^i ∈ R^{n×n} and R^{ij} ∈ R^{m^j×m^j} (with i, j ∈ P) are symmetric cost matrices that constitute the parameters of the cost functions. To avoid degeneracy of the game, we shall assume that the matrices R^{ii} are positive definite (i.e., R^{ii} ≻ 0 for all i ∈ P).¹

We shall consider noncooperative LQ dynamic games with a feedback information structure, and restrict our attention to linear player feedback strategies of the form

    γ^i(x_k) = u^i_k = −K^i x_k,    (5.72)

for i ∈ P where the matrices K^i ∈ R^{m^i×n} result in a stable closed-loop system in the sense that x_{k+1} = F x_k → 0 as k → ∞ where

    F ≜ A − ∑_{j=1}^{N} B^j K^j    (5.73)

is the closed-loop system matrix. In other words, we shall consider N-tuples of feedback strategies K ≜ {K^i : i ∈ P} belonging to the set

    ℱ ≜ {K = {K^i ∈ R^{m^i×n} : i ∈ P} : F is stable}.    (5.74)
The set ℱ is non-empty if (and only if) the matrix pair (A, [B^1, B^2, . . . , B^N]) is stabilizable (cf. [8]).² As in Sect. 2.4.2.2, an N-tuple of feedback strategies K ∈ ℱ constitutes a feedback Nash equilibrium solution to the infinite-horizon noncooperative LQ dynamic game with linear dynamics (5.70) and quadratic player cost functions (5.71) if and only if they solve the N coupled discrete-time LQ optimal control problems

    inf_{K̄^i}  V^i_∞(x_{[0,∞]}, u^1_{[0,∞]}, . . . , u^N_{[0,∞]}, θ̄^i)
    s.t.  x_{k+1} = A x_k + ∑_{j=1}^{N} B^j u^j_k,  x_0 ∈ R^n
          u^i_k = −K̄^i x_k
          u^j_k = −K^j x_k,  j ∈ P, j ≠ i
          {K^1, . . . , K̄^i, . . . , K^N} ∈ ℱ    (5.75)

for i ∈ P where, in a slight abuse of notation, we use θ̄^i in V^i_∞ to denote a cost-function parameter vector of player i ∈ P encoding all elements of the cost function matrices, i.e.,

    θ̄^i = [vec(Q^i)^⊤  vec(R^{i1})^⊤  · · ·  vec(R^{ii})^⊤  · · ·  vec(R^{iN})^⊤]^⊤.    (5.76)

The relationship that (5.75) establishes between discrete-time LQ optimal control (cf. Sect. 3.6) and discrete-time LQ feedback dynamic games enables us to use discrete-time AREs to characterize feedback Nash equilibria.

¹ It is also common to introduce the constraints that Q^i ⪰ 0 and R^{ij} ⪰ 0 for all i, j ∈ P, as in the one-player case of LQ optimal control. However, for the following results, these constraints are not necessary.
² As discussed in [9, p. 371], the restriction to stabilizing feedback strategies corresponds to the assumption that the players have a first priority in stabilizing the system (as a meta-objective) and that at least the possibility of some coordination between players exists. For the inverse dynamic game problem, this means that we shall solely use trajectories from stable systems (or the corresponding feedback strategies) in order to estimate cost-function parameters. In other words, equilibria which lead to unstable system dynamics are not considered.
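To fix ideas before turning to equilibrium conditions, the following is a short simulation sketch of the closed-loop game defined by (5.70), (5.72), and (5.73); the function name and interface are illustrative, not part of the formal development.

```python
import numpy as np

def simulate_feedback_game(A, B_list, K_list, x0, steps=200):
    """Simulate x_{k+1} = F x_k with u_k^i = -K^i x_k and return the
    state sequence, player control sequences, and closed-loop matrix F."""
    F = A - sum(B @ K for B, K in zip(B_list, K_list))   # closed-loop matrix (5.73)
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(F @ xs[-1])
    us = [np.array([-K @ x for x in xs[:-1]]) for K in K_list]  # strategies (5.72)
    return np.array(xs), us, F
```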
5.7.1.2 Conditions for Feedback Nash Equilibria
The following theorem establishes necessary and sufficient conditions for Nash equilibrium feedback strategies by means of coupled discrete-time AREs.

Theorem 5.10 (Coupled Discrete-Time AREs) Consider a noncooperative infinite-horizon LQ dynamic game with linear dynamics (5.70) and quadratic player cost functions (5.71). If an N-tuple of player feedback strategies K = {K^i : i ∈ P} and an N-tuple of matrices {P^i ∈ R^{n×n} : i ∈ P} satisfy the coupled discrete-time AREs

    (B^{i⊤} P^i B^i + R^{ii}) K^i = B^{i⊤} P^i A − B^{i⊤} P^i ∑_{j=1, j≠i}^{N} B^j K^j,    (5.77a)
    P^i = Q^i + ∑_{l=1}^{N} K^{l⊤} R^{il} K^l + F^⊤ P^i F    (5.77b)

for i ∈ P and K ∈ ℱ, then K is a feedback Nash equilibrium solution to the infinite-horizon LQ dynamic game. Conversely, if an N-tuple of player feedback strategies K = {K^i : i ∈ P} ∈ ℱ is a feedback Nash equilibrium solution, then the coupled AREs (5.77) admit a solution in the form of an N-tuple of matrices {P^i : i ∈ P}.

Proof The theorem is proved analogously to [8, Theorem 4] (which gives the equivalent result in continuous time). □

Given the preliminary concepts of (forward) infinite-horizon LQ dynamic games and coupled discrete-time AREs (cf. Theorem 5.10), we next introduce a feedback-strategy-based inverse LQ dynamic game problem.
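Theorem 5.10 also lends itself to a direct numerical check: given candidate strategies K^i and matrices P^i, one can evaluate the residuals of (5.77). A minimal sketch (with R[i][j] holding R^{ij}; names illustrative):

```python
import numpy as np

def coupled_are_residuals(A, B_list, K_list, P_list, Q_list, R):
    """Norms of the residuals of the coupled discrete-time AREs (5.77)."""
    N = len(B_list)
    F = A - sum(B @ K for B, K in zip(B_list, K_list))
    residuals = []
    for i in range(N):
        Bi, Pi, Ki = B_list[i], P_list[i], K_list[i]
        G = sum((B_list[j] @ K_list[j] for j in range(N) if j != i),
                np.zeros_like(A))
        r_a = (Bi.T @ Pi @ Bi + R[i][i]) @ Ki - (Bi.T @ Pi @ A - Bi.T @ Pi @ G)  # (5.77a)
        r_b = Pi - (Q_list[i]
                    + sum(K_list[l].T @ R[i][l] @ K_list[l] for l in range(N))
                    + F.T @ Pi @ F)                                              # (5.77b)
        residuals.append((np.linalg.norm(r_a), np.linalg.norm(r_b)))
    return residuals
```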
5.7.2 Feedback-Strategy-Based Inverse Dynamic Games

In this subsection, we first pose the problem of inverse LQ dynamic games given player feedback strategies (i.e., the second step in the two-step approach described in Sect. 5.6). We then develop and analyze a method of solving this problem.
5.7.2.1 Feedback-Strategy-Based Problem Formulation

The feedback-strategy-based inverse noncooperative LQ dynamic game problem is defined as follows.

Definition 5.3 (Feedback-Strategy-Based (FSB) Problem) Consider a parameterized noncooperative dynamic game with an infinite horizon T = ∞, linear dynamics (5.70), quadratic cost functions (5.71), and the feedback information structure. Then, given system matrices A and {B^i : i ∈ P}, and an N-tuple of feedback control strategies K = {K^i : i ∈ P}, the feedback-strategy-based inverse LQ dynamic game problem is to compute the cost-function parameters (i.e., the entries of the matrices Q^i and R^{ij} for one or more players i ∈ P), such that K (and hence the controls given by u^i_k = −K^i x_k, i ∈ P) constitutes a feedback Nash equilibrium solution.

The FSB problem reduces to the inverse LQ optimal control feedback-law-based (FLB) problem of Definition 3.4 when the game has a single player (i.e., when N = 1). For multiple players N > 1, we shall develop a method for solving the FSB problem analogous to that developed in Sect. 3.6.3 for inverse LQ optimal control. In particular, we shall interpret the discrete-time AREs of Theorem 5.10 as conditions that the cost matrices Q^i and R^{ij} must satisfy such that K constitutes a feedback Nash equilibrium solution.
5.7.2.2 Reformulation of the Discrete-Time Algebraic Riccati Equations

In order for K to constitute a feedback Nash equilibrium, Theorem 5.10 gives conditions that the matrices P^i, Q^i, and R^{ij} must satisfy. Hence, manipulation of the Eqs. (5.77) using vectorization operations and the Kronecker product analogously to the procedure in Sect. 3.6.3.2 leads to (5.77) being equivalent to the N systems of linear equations

    W̄^i θ̄^i = 0,    (5.78)

for i ∈ P where
    W̄^i ≜ [S^i   S^i K^1_⊗   · · ·   S^i K^i_⊗ − K^{i⊤} ⊗ I   · · ·   S^i K^N_⊗],    (5.79)
    S^i ≜ (F^⊤ ⊗ B^{i⊤})(I − F^⊤ ⊗ F^⊤)^{−1},  S^i ∈ R^{nm^i×n²},    (5.80)
    K^i_⊗ ≜ K^{i⊤} ⊗ K^{i⊤},  K^i_⊗ ∈ R^{n²×(m^i)²},    (5.81)
and the unknown parameter vector θ̄^i defined as in (5.76). Since the matrices W̄^i for i ∈ P are independent of the unknown parameters θ̄^i, the systems (5.78) can, in principle, be solved for each player i to yield solutions to the FSB problem of Definition 5.3. We can also use (5.78) to analyze the existence of (exact) solutions to the FSB problem of Definition 5.3 (similar to Corollary 5.1).
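A sketch of assembling W̄^i from (5.79)–(5.81) by Kronecker products is given below. The transpose placement follows the reconstruction of (5.79)–(5.81) above (the dimensions force it), so the sketch should be checked against the procedure of Sect. 3.6.3.2 before use; the names are illustrative.

```python
import numpy as np

def build_Wbar(A, B_list, K_list, i):
    """Assemble Wbar^i of (5.79) for (0-indexed) player i."""
    n = A.shape[0]
    F = A - sum(B @ K for B, K in zip(B_list, K_list))          # (5.73)
    S = np.kron(F.T, B_list[i].T) @ np.linalg.inv(
        np.eye(n * n) - np.kron(F.T, F.T))                       # (5.80)
    blocks = [S]                                                 # multiplies vec(Q^i)
    for j, K in enumerate(K_list):
        blk = S @ np.kron(K.T, K.T)                              # S^i K_kron^j, (5.81)
        if j == i:
            blk = blk - np.kron(K.T, np.eye(K.shape[0]))         # i-th block of (5.79)
        blocks.append(blk)                                       # multiplies vec(R^{ij})
    return np.hstack(blocks)
```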
5.7.2.3 Existence of Exact Solutions to Feedback-Strategy-Based Problem

As in the case of inverse optimal control discussed in Sect. 3.6.3.3, the existence of solutions to the system of linear equations (5.78) for each player i ∈ P depends on the properties of the matrix W̄^i and the number of nonredundant parameters in the player's cost function. To discuss these properties, let q^i denote the number of nonredundant (and nonzero) parameters of the cost function of player i ∈ P, i.e., the nonredundant elements of θ̄^i. If Q^i and R^{ij} are diagonal matrices (i.e., only their diagonal entries are unknown), then

    q^i = n + ∑_{j=1}^{N} m^j.

If Q^i and R^{ij} are symmetric matrices (i.e., Q^i = Q^{i⊤} and R^{ij} = R^{ij⊤}), then

    q^i = (n² + n)/2 + ∑_{j=1}^{N} ((m^j)² + m^j)/2.

Finally, in the most general case of every element of Q^i and R^{ij} being unknown, we have

    q^i = n² + ∑_{j=1}^{N} (m^j)².
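These three parameter counts are simple enough to encode directly; the following helper (illustrative, not from the text) returns q^i for each of the cases above.

```python
def num_cost_parameters(n, m_list, structure="symmetric"):
    """Number q^i of nonredundant cost-function parameters for player i."""
    if structure == "diagonal":
        return n + sum(m_list)
    if structure == "symmetric":
        return (n * n + n) // 2 + sum((m * m + m) // 2 for m in m_list)
    return n * n + sum(m * m for m in m_list)    # fully general matrices
```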
By omitting the redundant parameters of Q^i and R^{ij} from the parameter vector θ̄^i, (5.78) can be rewritten as

    W^i θ^i = 0,    (5.82)
with the reduced parameter vector θ^i ∈ R^{q^i} containing only nonredundant and nonzero elements and the matrix W^i ∈ R^{nm^i×q^i} which is obtained by modifying W̄^i such that (5.82) holds. From (5.82), we observe that the set of parameters θ^i solving the FSB problem of Definition 5.3 for player i ∈ P is given by ker(W^i), provided it exists. If the kernel of W^i does not exist, then there are no feasible solutions to the FSB problem for player i.

A sufficient condition for the existence of the kernel of W^i is that nm^i < q^i. This condition always holds if the cost function matrices Q^i and R^{ij} are symmetric since nm^i ≤ 0.5(n² + (m^i)²) < 0.5(n(n + 1) + ∑_{j=1}^{N} m^j(m^j + 1)) = q^i. However, in some scenarios (e.g., with diagonal matrices), combinations of n and m^i exist such that nm^i ≥ q^i, in which case ker(W^i) can fail to exist and the FSB problem can lack a solution for player i.

Clearly, the FSB problem of Definition 5.3 is not exactly solvable via (5.82) if the kernel of W^i for any i ∈ P does not exist. This situation will arise, for example, if we are given feedback strategies K that fail to constitute a feedback Nash equilibrium for any cost function matrices. We therefore next propose a quadratic programming approach so as to be able to obtain solutions (in a potentially approximate-optimality sense) for all given feedback strategies K.
5.7.2.4 Feedback-Strategy-Based Method

The method for solving the FSB problem of Definition 5.3 that we present involves minimizing the violation of the conditions for feedback Nash equilibria summarized by the systems of linear equations (5.82). Specifically, the FSB method is to solve the N quadratic programs

    min_{θ^i}  (1/2) θ^{i⊤} Ω^i θ^i
    s.t.  R^{ii} ≻ 0    (5.83)

for i ∈ P where Ω^i ≜ 2 W^{i⊤} W^i ∈ R^{q^i×q^i} and R^{ii} is the cost function matrix that results from θ^i. The constraint in (5.83) can be relaxed by replacing it with the constraint of setting one parameter of R^{ii} to be nonzero and positive (as in the parameter set in (5.31)). We note that by minimizing the violation of conditions for Nash equilibria, this method resembles the soft and mixed methods of (open-loop) inverse dynamic games we presented in Sect. 5.4. We note also that (5.83) can be used to compute the cost matrices of players independently (i.e., there is no need to compute matrices for all of the players simultaneously). We can also analyze the solutions to (5.83) using tools from linear algebra.
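Under the relaxed normalization mentioned above, (5.83) can be approached via the singular value decomposition of W^i: a unit-norm minimizer of θ^{i⊤} Ω^i θ^i is a right singular vector of W^i associated with its smallest singular value. A minimal sketch (which enforces only a sign convention, not the full constraint R^{ii} ≻ 0):

```python
import numpy as np

def fsb_solve_relaxed(W, idx_rii=0):
    """Approximate solution of (5.83): minimize ||W theta||^2 over unit-norm
    theta, then fix the sign so the parameter indexed by idx_rii (assumed to
    be a diagonal entry of R^{ii}) is positive."""
    _, _, Vt = np.linalg.svd(W)
    theta = Vt[-1]                    # singular vector of the smallest singular value
    return -theta if theta[idx_rii] < 0 else theta
```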
5.7.2.5 Solution Results
The first result regarding (5.83) establishes the existence of its solutions.

Proposition 5.1 Consider any player i ∈ P. Under the conditions of the FSB problem of Definition 5.3, the quadratic program (5.83) is convex and possesses at least one solution.

Proof The proof is the same as in Proposition 3.2. □

Proposition 5.1 guarantees the existence of a solution to the inverse LQ feedback dynamic game problem for each player i ∈ P (regardless of whether K = {K^i : i ∈ P} constitutes an exact feedback Nash equilibrium). The next result establishes conditions under which solutions to (5.83) are unique for each player i ∈ P (up to an arbitrary nonzero scaling factor C^i > 0).

Theorem 5.11 (Necessary and Sufficient Conditions for Unique Solutions) Consider the FSB problem of Definition 5.3 and suppose that it is solved for player i ∈ P by matrices Q^i and R^{ij} with elements from a vector θ^i. Then, the set of all solutions to (5.83) for i ∈ P is

    {C^i θ^i : C^i > 0}    (5.84)

if and only if nm^i ≥ q^i − 1 together with rank(W^i) = q^i − 1.

Proof The proof is analogous to that of Theorem 3.10. □
5.7.3 Estimation of Feedback Strategies

Having developed a method for solving the FSB problem of Definition 5.3 (i.e., the second step in the two-step approach described in Sect. 5.6), we now briefly discuss how the feedback strategies could be estimated from sequences of states and controls (the first step in the two-step approach described in Sect. 5.6). For this purpose, we utilize the finite sequences of the states and controls of all players

    {(x_0, u^1_0, u^2_0, . . . , u^N_0), (x_1, u^1_1, u^2_1, . . . , u^N_1), . . . , (x_ℓ, u^1_ℓ, u^2_ℓ, . . . , u^N_ℓ)}.    (5.85)

Given the feedback relationship between states and controls (i.e., (5.72)), the feedback strategy of any player i ∈ P can be estimated by solving the linear least-squares estimation problem

    K^i = arg min_{K̄^i} ∑_{k=0}^{ℓ} ‖K̄^i x_k + u^i_k‖².    (5.86)

The solution can be given in closed form as

    K^i = −U^{i⊤}_{[0,ℓ]} X_{[0,ℓ]} (X^⊤_{[0,ℓ]} X_{[0,ℓ]})^{−1}    (5.87)

where X_{[0,ℓ]} ∈ R^{(ℓ+1)×n} and U^i_{[0,ℓ]} ∈ R^{(ℓ+1)×m^i} denote the observed sequences of available state and control values arranged in matrices, respectively.
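In code, (5.86)–(5.87) amount to a single linear least-squares solve; a minimal sketch (names illustrative):

```python
import numpy as np

def estimate_feedback_gain(X, U):
    """Estimate K^i from stacked data: rows of X are x_k^T (shape (l+1, n)),
    rows of U are u_k^{iT} (shape (l+1, m^i)). Solves (5.86), i.e.,
    min_K sum_k ||K x_k + u_k^i||^2, via X @ K^T = -U in the least-squares sense."""
    Kt, *_ = np.linalg.lstsq(X, -U, rcond=None)
    return Kt.T
```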
5.7.4 Inverse LQ Dynamic Game Method

Algorithm 5.7 summarizes the results of this section concerning the solution of the truncated-sequence discrete-time inverse dynamic game problem of Definition 5.2 under the feedback information structure in the special case where the game dynamics are linear, the player cost functions are quadratic, and the horizon T is infinite.

Algorithm 5.7 Infinite-horizon inverse LQ feedback dynamic games for Player i
Input: Truncated state and control sequences x_{[0,ℓ]} and {u^j_{[0,ℓ]} : j ∈ P}, system matrices A and {B^j : j ∈ P}.
Output: Computed cost-function parameters θ^i (i.e., nonredundant elements of Q^i and R^{ij}).
1: Estimate K^j for all j ∈ P using least squares (i.e., (5.86)) and determine the corresponding closed-loop system matrix F with (5.73).
2: Compute W̄^i with (5.79).
3: Modify W̄^i to form W^i so as to fulfill (5.82) with unknown parameters θ^i.
4: Solve the quadratic optimization problem (5.83) for θ^i.
It is interesting to note that the inverse LQ dynamic game method presented in Algorithm 5.7 enables the computation of cost-function parameters for each player i ∈ P independently by solving (5.83), but requires all of the player feedback strategies K = {K j : j ∈ P} to first be estimated. We note also that Algorithm 5.7 corresponds to the inverse LQ optimal control method of Algorithm 3.7 in the case of a single player N = 1.
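Putting the pieces of this section together, a hypothetical end-to-end use of the sketches above might read as follows; reduce_to_nonredundant stands in for step 3 of Algorithm 5.7 (eliminating redundant parameters to form W^i) and is not implemented here.

```python
# Hypothetical pipeline for player i = 0 given recorded data:
#   X: (l+1, n) stacked states; U_list[j]: (l+1, m^j) stacked controls of player j.
K_list = [estimate_feedback_gain(X, U) for U in U_list]    # step 1 (via (5.86))
Wbar = build_Wbar(A, B_list, K_list, i=0)                  # step 2 (via (5.79))
W = reduce_to_nonredundant(Wbar)                           # step 3 (hypothetical helper)
theta = fsb_solve_relaxed(W)                               # step 4 (via (5.83))
```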
5.8 Notes and Further Reading

Bilevel and Other Methods: Bilevel methods for solving inverse dynamic game problems appear to have been implicitly introduced (and discarded) in the economics literature, with Bajari et al. [4] noting that "computing equilibria over and over, as would be required in a typical estimation routine, seems out of the question". A variety of alternative inverse dynamic game methods based on conditions for equilibrium solutions have thus been introduced in the economics literature, with the vast majority focused on games with dynamics governed by (stochastic) Markov
decision processes with a finite number of states and/or controls (cf. [1, 4, 21] and the survey paper of [2] and references therein). Similar treatments have also recently begun to emerge in the engineering and computer science literature (e.g., [17]). In contrast to much of this work, in this chapter we have considered games with dynamics governed by (deterministic) difference equations with continuous states and controls. We note, however, that the two-step approach described in Sect. 5.6 for solving inverse dynamic games under the feedback information structure has broad similarities to popular methods in economics (e.g., [4]).

Minimum-Principle Methods: Control-theoretic investigations of inverse noncooperative dynamic game problems using discrete-time minimum principles have only been undertaken relatively recently, beginning with the inverse two-player zero-sum dynamic game work of [23] and the inverse N-player (potentially) nonzero-sum dynamic game work of [19]. Most of the results in this chapter therefore appear new, with only preliminary versions of the mixed method for whole sequences (5.23) having previously been presented in [19, 23] without consideration of player control constraints.

Inverse Linear-Quadratic Dynamic Games: The inverse dynamic game method we presented in Sect. 5.7 for LQ dynamic games under the feedback information structure appears novel, despite discrete-time inverse optimal control (or the single-player inverse dynamic game case) for linear dynamical systems and quadratic cost functions having received considerable specialized attention in the control engineering literature. Indeed, existing results appear to be limited to the reformulation of necessary and sufficient conditions for computing quadratic player cost functions in LQ dynamic games in [7] for specific (restrictive) classes of cost matrices encountered in some economic applications.

Additional Topics: Given the close relationship between the (N-player) inverse dynamic game problems explored in this chapter and the (single-player) inverse optimal control problems explored in Chap. 3, many of the variations and extensions of inverse optimal control we discussed in Sect. 3.7 are of equal interest in inverse dynamic games. For example, probabilistic approaches to inverse (cooperative and noncooperative) dynamic games have recently been explored in [10, 11], and in [22] in conjunction with necessary optimality conditions related to discrete-time minimum principles. It is also conceivable that much of the discrete-time inverse LQ optimal control work discussed in Sect. 3.7 could be extended to the discrete-time inverse LQ dynamic game setting by exploiting coupled algebraic Riccati equations analogous to those presented in Theorem 5.10. Furthermore, under the open-loop information structure, dynamic games can be equivalently treated as (continuous) static games by using the state dynamics (5.1) to express the player cost functions (i.e., (5.2) and (5.3)) solely as functions of the initial state x_0, player control sequences {u^i_{[0,T]} : i ∈ P}, and cost-function parameters {θ^i : i ∈ P} (cf. [5, Sect. 6.2.1]). Inverse methods concerned with computing the cost functions (or utilities) of players in static games (e.g., [3, 6, 12, 14, 15]) are thus of potential use for solving inverse dynamic game problems under the open-loop information structure. We note, however, that such approaches could lead to
unwieldy expressions, especially when the horizon T is large or infinite. Thus, as has been noted in the case of (forward) dynamic games in [5, Sect. 6.2.1], it is often useful to retain the recursive state dynamics and additive structure of the player cost functions so as to exploit connections with (inverse) optimal control problems. Finally, inverse dynamic game problems for solution concepts other than Nash equilibria appear yet to be thoroughly explored, with some recent work in this direction presented in [18]. Insights from potential game theory, in which equilibrium properties of (forward) games can be analyzed using potential functions (cf. [16, 20]), also appear yet to be fully explored for solving inverse dynamic game problems.
References

1. Aguirregabiria V, Mira P (2007) Sequential estimation of dynamic discrete games. Econometrica 75(1):1–53
2. Aguirregabiria V, Mira P (2010) Dynamic discrete choice structural models: a survey. J Econ 156(1):38–67
3. Allon G, Federgruen A, Pierson M (2011) How much is a reduction of your customers' wait worth? An empirical study of the fast-food drive-thru industry based on structural estimation methods. Manuf & Serv Oper Manag 13(4):489–507
4. Bajari P, Benkard CL, Levin J (2007) Estimating dynamic models of imperfect competition. Econometrica 75(5):1331–1370
5. Basar T, Olsder GJ (1999) Dynamic noncooperative game theory, vol 23, 2nd edn. Academic, New York
6. Bertsimas D, Gupta V, Paschalidis ICh (2015) Data-driven estimation in equilibrium using inverse optimization. Math Prog 153(2):595–633
7. Carraro C, Flemming J, Giovannini A (1989) The tastes of European central bankers. In: A European central bank?: perspectives on monetary unification after ten years of the EMS, pp 162–185. Cambridge University Press, Cambridge
8. Engwerda JC, van den Broek WA, Schumacher JM (2000) Feedback Nash equilibria in uncertain infinite time horizon differential games. In: Proceedings of the 14th international symposium of mathematical theory of networks and systems, MTNS 2000, pp 1–6
9. Engwerda J (2005) LQ dynamic optimization and differential games. Wiley, West Sussex
10. Inga J, Bischoff E, Köpf F, Hohmann S (2019) Inverse dynamic games based on maximum entropy inverse reinforcement learning. arXiv:1911.07503
11. Inga Charaja JJ (2021) Inverse dynamic game methods for identification of cooperative system behavior. KIT Scientific Publishing
12. Jia R, Konstantakopoulos IC, Li B, Spanos C (2018) Poisoning attacks on data-driven utility learning in games. In: 2018 annual American control conference (ACC), pp 5774–5780. IEEE
13. Kalman RE (1964) When is a linear control system optimal? J Basic Eng 86(1):51–60
14. Konstantakopoulos IC, Ratliff LJ, Jin M, Spanos CJ, Sastry SS (2016) Inverse modeling of non-cooperative agents via mixture of utilities. In: 2016 IEEE 55th conference on decision and control (CDC), pp 6327–6334. IEEE
15. Kuleshov V, Schrijvers O (2015) Inverse game theory: learning utilities in succinct games. In: International conference on web and internet economics (WINE) 2015, pp 413–427
16. Lã QD, Chew YH, Soong B-H (2016) Potential game theory. Springer, Berlin
17. Lin X, Beling PA, Cogill R (2018) Multiagent inverse reinforcement learning for two-person zero-sum games. IEEE Trans Games 10(1):56–68
18. Maddux A (2021) Behaviour estimation in dynamic games. Master's thesis, ETH Zurich
19. Molloy TL, Ford JJ, Perez T (2017) Inverse noncooperative dynamic games. In: IFAC 2017 world congress, Toulouse, France
20. Monderer D, Shapley LS (1996) Potential games. Games Econ Behav 14(1):124–143
21. Pakes A, Ostrovsky M, Berry S (2007) Simple estimators for the parameters of discrete dynamic games (with entry/exit examples). RAND J Econ 38(2):373–399
22. Peters L, Fridovich-Keil D, Royo VR, Tomlin CJ, Stachniss C (2021) Inferring objectives in continuous dynamic games from noise-corrupted partial state observations. In: Shell DA, Toussaint M, Hsieh MA (eds) Robotics: science and systems XVII
23. Tsai D, Molloy TL, Perez T (2016) Inverse two-player zero-sum dynamic games. In: Australian control conference (AUCC), 2016, Newcastle, Australia
Chapter 6
Inverse Noncooperative Differential Games
In this chapter, we generalize and extend the continuous-time inverse optimal control methods and results of Chap. 4 to inverse noncooperative differential games. We begin by posing inverse noncooperative differential game problems that differ in whether the given state and control trajectories are whole or truncated. We then develop methods for solving these inverse problems via either bilevel optimization or conditions for Nash equilibria derived from (continuous-time) minimum principles. This chapter thus serves as the continuous-time counterpart to Chap. 5. We again note the promise of minimum-principle methods compared to bilevel methods, and that the variety of different information structures and solution concepts in noncooperative differential games leads to subtle yet profound differences between inverse noncooperative differential game theory and continuous-time inverse optimal control. The final part of this chapter provides a detailed treatment of solving inverse noncooperative linear-quadratic (LQ) differential game problems with linear dynamics and infinite-horizon quadratic cost functionals. In doing so, the original notion of inverse optimal control introduced by Kalman is extended to noncooperative differential games.
6.1 Preliminary Concepts

In this section, we introduce noncooperative differential games with parameterized player cost functionals together with associated conditions for Nash equilibria. Throughout this chapter, we shall use these parameterized games and conditions to investigate inverse noncooperative differential game problems.
6.1.1 Parameterized Noncooperative Differential Games

Consider the set of players P ≜ {1, 2, . . . , N}, a (potentially infinite) horizon T > 0, and a deterministic continuous-time dynamical system described by the (potentially nonlinear) system of first-order ordinary differential equations

    ẋ(t) = f(t, x(t), u^1(t), . . . , u^N(t)),  x(0) = x_0 ∈ R^n    (6.1)
for t ≥ 0 where x(t) ∈ R^n is the state vector of the system, x_0 ∈ R^n is the initial state, and u^i(t) ∈ U^i are control inputs belonging to the (closed) sets of admissible controls U^i ⊂ R^{m^i} for i ∈ P. Here, f is a (potentially nonlinear) function that we assume to be continuous in t and uniformly (globally) Lipschitz in each of its state and control arguments such that (6.1) admits a unique solution for all N-tuples of continuous control functions {u^i : i ∈ P} (cf. [3, Theorem 5.1]).

For each player i ∈ P, let us define a parameterized cost functional

    V^i_T(x, u^1, . . . , u^N, θ^i) ≜ ∫_0^T g^i(t, x(t), u^1(t), . . . , u^N(t), θ^i) dt    (6.2)

with g^i : [0, T] × R^n × U^1 × · · · × U^N × Θ^i → R describing the stage cost for player i associated with the state and controls at time t. In contrast to standard presentations of differential games (cf. Sect. 2.5), in this chapter we shall consider the function g^i to belong to some known class of functions parameterized by vectors θ^i from parameter sets Θ^i ⊂ R^{q^i} for each player i ∈ P.

We consider parameterized noncooperative differential games played with either the open-loop or feedback information structure as described in Sect. 2.5. In the case of the open-loop information structure, player i's strategy set Γ^i contains all functions γ^i : [0, T] × R^n → U^i such that u^i(t) = γ^i(t, x_0) ∈ U^i (or equivalently, the set of all control trajectories u^i satisfying u^i(t) ∈ U^i for all t ∈ [0, T]). Under the feedback information structure, player i's strategy set Γ^i is the set of all functions γ^i : [0, T] × R^n → U^i such that u^i(t) = γ^i(t, x(t)) ∈ U^i.

We assume that the players are playing for a Nash equilibrium. Given the appropriate open-loop or feedback strategy sets {Γ^i : i ∈ P}, the N-tuple of player strategies {γ^i ∈ Γ^i : i ∈ P} constitutes a Nash equilibrium (open-loop or feedback) for cost-functional parameters {θ^i : i ∈ P} if and only if

    V^i_T(γ^1, . . . , γ^i, . . . , γ^N, θ^i) ≤ V^i_T(γ^1, . . . , γ̄^i, . . . , γ^N, θ^i)    (6.3)

for all γ̄^i ∈ Γ^i and all i ∈ P, where here, in a slight abuse of notation, we use V^i_T(γ^1, . . . , γ^i, . . . , γ^N, θ^i) to denote the cost functional V^i_T evaluated with states x and controls u^i given by solving (6.1) with u^i(t) = γ^i(t, ·) for i ∈ P.
We next exploit the connections between continuous-time optimal control and differential games (cf. Sect. 2.5.2.2) to derive necessary conditions for the existence of Nash equilibrium solutions in parameterized differential games.
6.1.2 Nash Equilibria Conditions via Minimum Principles

Let us define the (parameterized) player Hamiltonian functions

    H^i(t, x(t), u^1(t), . . . , u^i(t), . . . , u^N(t), λ^i(t), μ^i, θ^i) ≜ μ^i g^i(t, x(t), u^1(t), . . . , u^N(t), θ^i) + λ^i(t)^⊤ f(t, x(t), u^1(t), . . . , u^N(t))    (6.4)

for t ∈ [0, T] and i ∈ P where μ^i ∈ R is a scalar constant and λ^i : [0, T] → R^n are player costate functions. For i ∈ P, we assume that the functions f and g^i are continuously differentiable in each of their state and control arguments and let

    ∇_x H^i(t, x(t), u^1(t), . . . , u^i(t), . . . , u^N(t), λ^i(t), μ^i, θ^i) ∈ R^n

and

    ∇_{u^i} H^i(t, x(t), u^1(t), . . . , u^i(t), . . . , u^N(t), λ^i(t), μ^i, θ^i) ∈ R^{m^i}

denote the column vectors of partial derivatives of the Hamiltonian H^i with respect to x(t) and u^i(t), respectively, and evaluated at x(t), {u^i(t) : i ∈ P}, λ^i(t), μ^i, and θ^i.

In the case of a finite horizon T < ∞, we have the following conditions for open-loop Nash equilibria (due to the continuous-time minimum principle of Theorem 4.1 and along the same lines as Theorem 2.7).

Theorem 6.1 (Parameterized Finite-Horizon Open-Loop Nash Equilibria) Suppose that the N-tuple of player control trajectories {(u^i : [0, T] → U^i) : i ∈ P} with associated state trajectory x : [0, T] → R^n constitutes an open-loop Nash equilibrium solution to a noncooperative differential game with player cost-functional parameters {θ^i ∈ Θ^i : i ∈ P} and a finite horizon 0 < T < ∞. Then, μ^i = 1 for all i ∈ P;
(i) the state trajectory x : [0, T] → R^n satisfies the game dynamics

    ẋ(t) = f(t, x(t), u^1(t), . . . , u^N(t))    (6.5)

for t ∈ [0, T] with x(0) = x_0;
(ii) there exist costate trajectories λ^i : [0, T] → R^n for all players i ∈ P satisfying

    λ̇^i(t) = −∇_x H^i(t, x(t), u^1(t), . . . , u^N(t), λ^i(t), μ^i, θ^i)    (6.6)

for t ∈ [0, T] with

    λ^i(T) = 0;    (6.7)

and,
(iii) the controls u^i : [0, T] → U^i satisfy

    u^i(t) ∈ arg min_{ū^i(t) ∈ U^i} H^i(t, x(t), u^1(t), . . . , ū^i(t), . . . , u^N(t), λ^i(t), μ^i, θ^i)    (6.8)

for all t ∈ [0, T] and all players i ∈ P.

Proof The proof is identical to that of Theorem 2.7 with appropriate use of the continuous-time finite-horizon minimum principle of Theorem 4.1 for parameterized continuous-time optimal control problems. □

Since there is a close relationship between continuous-time optimal control and open-loop noncooperative differential games (cf. Sect. 2.5.2.2), the continuous-time minimum principle we established in Corollary 4.1 for parameterized optimal control problems implies the following necessary conditions for the existence of open-loop Nash equilibria for potentially infinite horizons T > 0.

Theorem 6.2 (Parameterized Horizon-Invariant Open-Loop Nash Equilibria) Suppose that the N-tuple of player control trajectories {(u^i : [0, ℓ] → U^i) : i ∈ P} with associated state trajectory x : [0, ℓ] → R^n constitutes a (potentially) truncated open-loop Nash equilibrium solution to a noncooperative differential game with player cost-functional parameters {θ^i ∈ Θ^i : i ∈ P} and a potentially infinite horizon T > ℓ. Then,
(i) the state trajectory x : [0, ℓ] → R^n satisfies the game dynamics

    ẋ(t) = f(t, x(t), u^1(t), . . . , u^N(t))

for t ∈ [0, ℓ] with x(0) = x_0;
(ii) there exist costate trajectories λ^i : [0, ℓ] → R^n and real numbers μ^i for all players i ∈ P satisfying

    λ̇^i(t) = −∇_x H^i(t, x(t), u^1(t), . . . , u^N(t), λ^i(t), μ^i, θ^i)    (6.9)

for t ∈ [0, ℓ] with μ^i and λ^i(0) not simultaneously 0; and,
(iii) the controls u^i : [0, ℓ] → U^i satisfy

    u^i(t) ∈ arg min_{ū^i(t) ∈ U^i} H^i(t, x(t), u^1(t), . . . , ū^i(t), . . . , u^N(t), λ^i(t), μ^i, θ^i)    (6.10)
for all t ∈ [0, ℓ] and all players i ∈ P. Furthermore, if the game dynamics f and all of the player stage cost functions {g^i : i ∈ P} are time-invariant, then H^i(t, x(t), u^1(t), . . . , u^N(t), λ^i(t), μ^i, θ^i) = d^i for t ∈ [0, ℓ] and constants d^i for all i ∈ P.

Proof The theorem assertion follows by applying the combined continuous-time minimum principle of Corollary 4.1 to the N coupled parameterized continuous-time optimal control problems that describe an open-loop Nash equilibrium (i.e., versions of (2.41) with cost functionals parameterized by θ^i). □

We will use Theorems 6.1 and 6.2 in this chapter to develop efficient methods for solving inverse noncooperative differential game problems under the open-loop information structure. For the feedback information structure, we can employ similar arguments to those used in Theorems 6.1 and 6.2 to develop conditions for feedback Nash equilibria.

Theorem 6.3 (Parameterized Horizon-Invariant Feedback Nash Equilibria) Suppose that the N-tuple of player control trajectories {(u^i : [0, ℓ] → U^i) : i ∈ P} with associated differentiable feedback laws {(γ^i : [0, ℓ] × R^n → U^i) : γ^i(t, x(t)) = u^i(t), i ∈ P} and state trajectory x : [0, ℓ] → R^n constitutes a (potentially) truncated feedback Nash equilibrium solution to a noncooperative differential game with player cost-functional parameters {θ^i ∈ Θ^i : i ∈ P} and a potentially infinite horizon T > ℓ. Then,
(i) the state trajectory x : [0, ℓ] → R^n satisfies the game dynamics

    ẋ(t) = f(t, x(t), u^1(t), . . . , u^N(t))

for t ∈ [0, ℓ] with x(0) = x_0;
(ii) there exist costate trajectories λ^i : [0, ℓ] → R^n and real numbers μ^i for all players i ∈ P satisfying

    λ̇^i(t) = −∇_x H^i(t, x(t), γ^1(t, x(t)), . . . , u^i(t), . . . , γ^N(t, x(t)), λ^i(t), μ^i, θ^i)    (6.11)

for t ∈ [0, ℓ] with μ^i and λ^i(0) not simultaneously 0; and,
(iii) the controls u^i : [0, ℓ] → U^i satisfy

    u^i(t) = γ^i(t, x(t)) ∈ arg min_{ū^i(t) ∈ U^i} H^i(t, x(t), u^1(t), . . . , ū^i(t), . . . , u^N(t), λ^i(t), μ^i, θ^i)    (6.12)

for all t ∈ [0, ℓ] and all players i ∈ P.

Furthermore, if the game dynamics f and all of the player stage cost functions {g^i : i ∈ P} are time-invariant, and if the feedback laws {γ^i : i ∈ P} are time-invariant (i.e., γ^i(t, x̄) = γ^i(t̄, x̄) for all t, t̄ ∈ [0, ℓ] and x̄ ∈ R^n), then

    H^i(t, x(t), u^1(t), . . . , u^N(t), λ^i(t), μ^i, θ^i) = d^i    (6.13)

for all t ∈ [0, ℓ] and all i ∈ P where {d^i ∈ R : i ∈ P} are (time-invariant) constants.

Proof The definition of feedback Nash equilibria provided in (5.4) is equivalent to a system of N coupled parameterized continuous-time optimal control problems (i.e., versions of (2.42) with cost functionals parameterized by θ^i). Applying the continuous-time minimum principle of Corollary 4.1 to each of these optimal control problems yields the theorem result, with the last assertion (6.13) holding only when the feedback laws {γ^i : i ∈ P} are time-invariant since then the dynamics of the ith coupled optimal control problem, i.e., f(t, ·, γ^1(t, ·), . . . , ·, . . . , γ^N(t, ·)), are also time-invariant. □
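For intuition, the conditions of Theorem 6.1 can be checked numerically along given (discretized) trajectories when the controls are unconstrained and μ^i = 1. The following sketch evaluates the costate-equation and Hamiltonian-gradient residuals on a time grid; the gradient callables grad_x_H and grad_u_H are assumptions supplied by the user, not quantities defined in the text.

```python
import numpy as np

def openloop_nash_residuals(t, x, u_all, lam_all, grad_x_H, grad_u_H):
    """Maximum residuals of (6.6) and of the stationarity form of (6.8)
    (with mu^i = 1 and interior controls) for each player along a grid t."""
    out = []
    for i, lam in enumerate(lam_all):
        lam_dot = np.gradient(lam, t, axis=0)   # finite-difference costate rate
        r_costate = max(
            np.linalg.norm(lam_dot[k] + grad_x_H(i, t[k], x[k], u_all[k], lam[k]))
            for k in range(len(t)))
        r_gradient = max(
            np.linalg.norm(grad_u_H(i, t[k], x[k], u_all[k], lam[k]))
            for k in range(len(t)))
        out.append((r_costate, r_gradient))
    return out
```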
The conditions for open-loop and feedback Nash equilibria in Theorems 6.2 and 6.3 are identical except for the partial derivatives of the Hamiltonian functions with respect to the state x(t) in the costate differential equation (6.11) involving the Nash equilibrium feedback control laws γ^j of players j ∈ P, j ≠ i. As in the discrete-time dynamic game setting of Chap. 5, the consequences of this difference are significant for inverse noncooperative differential game problems and so we shall discuss the feedback information structure in more detail later in this chapter. We next introduce the inverse noncooperative differential game problems that we will consider in this chapter.
6.2 Inverse Noncooperative Differential Game Problems

In this chapter, we consider inverse noncooperative game problems that involve computing parameters θ^i of some or all of the player cost functionals in a parameterized noncooperative differential game (cf. Sect. 6.1.1) such that a given collection of state and control trajectories constitutes a Nash equilibrium. Our first problem assumes that the entire state and control trajectories are given, while our second problem will consider truncated trajectories.

Definition 6.1 (Whole-Trajectory (WT) Problem) Consider a parameterized noncooperative differential game (cf. Sect. 6.1.1) with: a finite horizon 0 < T < ∞; a given open-loop or feedback information structure; and, known dynamics f, constraint sets U^i, parameter sets Θ^i and functions g^i. Given the state trajectory x : [0, T] → R^n
and associated player control trajectories {(u^i : [0, T] → U^i) : i ∈ P}, the whole-trajectory (WT) inverse noncooperative differential game problem is to compute parameters {θ^i ∈ Θ^i : i ∈ P} such that x and {u^i : i ∈ P} constitute a Nash equilibrium of the appropriate type for the given information structure.

The second problem we consider involves computing player cost-functional parameters given state and control trajectories that are truncated at some time prior to the (potentially infinite) game horizon T.

Definition 6.2 (Truncated-Trajectory (TT) Problem) Consider a parameterized noncooperative differential game (cf. Sect. 6.1.1) with: a potentially infinite horizon T > 0; a given open-loop or feedback information structure; and known dynamics f, constraint sets U^i, parameter sets Θ^i, and functions g^i. Given the truncated state trajectory x : [0, ℓ] → R^n and associated player control trajectories {(u^i : [0, ℓ] → U^i) : i ∈ P} with ℓ < T, the truncated-trajectory (TT) inverse noncooperative differential game problem is to compute parameters {θ^i ∈ Θ^i : i ∈ P} such that x and {u^i : i ∈ P} constitute a truncated Nash equilibrium of the appropriate type for the given information structure.

The WT and TT problems of Definitions 6.1 and 6.2 are the continuous-time counterparts of the (discrete-time) inverse noncooperative dynamic game problems of Sect. 5.2. As in the discrete-time setting, the inverse noncooperative differential game problems extend the continuous-time inverse optimal control problems we posed in Sect. 4.2 to settings with multiple players. Indeed, in the case of a single-player game (i.e., N = 1), the WT problems of Definitions 6.1 and 4.1 are equivalent, as are the TT problems of Definitions 6.2 and 4.2. The close relationship between continuous-time inverse optimal control and inverse differential game problems means that many of the methods and results pertaining to inverse differential games we will develop in this chapter will be directly analogous to those we developed in Chap. 4 for continuous-time inverse optimal control.

As in the discrete-time (dynamic game) setting, however, there are several key points of interest worth noting in the game setting. Firstly, the tractability of inverse methods that involve solving noncooperative differential games (e.g., naive bilevel optimization methods) is of acute concern in game settings with N > 1 since the (forward) solution of noncooperative differential games is typically more involved than the (forward) solution of continuous-time optimal control problems. Secondly, methods for solving inverse noncooperative differential games will frequently allow for the separate computation of individual player cost-functional parameters θ^i without the need to compute the (unknown) parameters θ^j of other players j ∈ P, j ≠ i.

In this chapter, we shall consider the exact solution of the inverse noncooperative differential game problems of Definitions 6.1 and 6.2 as well as their approximate solution in the sense of finding parameters such that the trajectories approximately satisfy Nash equilibria conditions. As in previous chapters, this approximate solution concept is intended to address practical situations in which misspecification can occur in the dynamics, cost functions, parameters, or horizon of the game such that the given trajectories fail to constitute an exact Nash equilibrium for any player cost-function parameters {θ^i ∈ Θ^i : i ∈ P}.
6.3 Bilevel Methods

Let us first present bilevel methods for solving the inverse noncooperative differential game problems of Definitions 6.1 and 6.2. These bilevel methods mirror those we considered in previous chapters and involve searching over the player cost-functional parameters of a parameterized noncooperative differential game such that the trajectories predicted by solving the game match the given trajectories. These bilevel methods will motivate our later consideration of methods based on minimum-principle conditions for Nash equilibria.
6.3.1 Bilevel Methods for Whole Trajectories

The bilevel method for solving the WT problem of Definition 6.1 is defined by the optimization problem

    inf_{{θ^i ∈ Θ^i : i ∈ P}}  ∫_0^T ( ‖x(t) − x_θ(t)‖² + ∑_{i ∈ P} ‖u^i(t) − u^i_θ(t)‖² ) dt    (6.14)
subject to the constraint that the state and control trajectories x_θ : [0, T] → R^n and {(u^i_θ : [0, T] → U^i) : i ∈ P} constitute an appropriate open-loop or feedback Nash equilibrium of a parameterized noncooperative differential game with cost-functional parameters {θ^i ∈ Θ^i : i ∈ P}. The (forward) solution of a noncooperative differential game with a given information structure and known finite horizon T is thus nested in the constraints of (6.14).

If the information structure is open-loop, then the bilevel method of (6.14) can be modified to enable the computation of the cost-functional parameters of each player i ∈ P separately. That is, under the open-loop information structure, instead of (6.14), we can solve

    inf_{θ^i ∈ Θ^i}  ∫_0^T ( ‖x(t) − x_θ(t)‖² + ‖u^i(t) − u^i_θ(t)‖² ) dt    (6.15)
subject to the constraint that the state and control trajectories x_θ and u^i_θ constitute a solution to the continuous-time finite-horizon optimal control problem

    inf_{ū^i}  V^i_T(x, u^1, . . . , ū^i, . . . , u^N, θ^i)
    s.t.  ẋ(t) = f(t, x(t), u^1(t), . . . , ū^i(t), . . . , u^N(t)),  t ∈ [0, T]
          ū^i(t) ∈ U^i,  t ∈ [0, T]
          x(0) = x_0    (6.16)
with player cost-functional parameters θ^i ∈ Θ^i and the other player control trajectories taken as the given trajectories {u^j : j ∈ P, j ≠ i}. This simplified bilevel method exploits the close relationship between continuous-time optimal control and open-loop noncooperative differential games as described in Sect. 2.5.2.2, and can therefore be implemented in the same manner (and with the same computational complexity) as the bilevel method for continuous-time inverse optimal control in (4.6). Unfortunately, the bilevel method of (6.14) does not share a similar simplified form in the case of the feedback information structure unless the feedback Nash equilibrium strategies γ^j : [0, T] × R^n → U^j of the other players j ∈ P, j ≠ i are known.

Given that the bilevel method of (6.14) and its open-loop variant of (6.15) are only applicable when the horizon T is finite and the state and control trajectories provided are whole, we next present a bilevel method suitable for potentially infinite horizons T and truncated state and control trajectories.
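Before moving on, the following is a schematic sketch of the upper level of (6.14), with the forward game solver solve_game supplied by the user (and constituting the dominant computational cost); this is a sketch under those assumptions, not a recommended implementation.

```python
import numpy as np
from scipy.optimize import minimize

def bilevel_objective(theta, t_grid, x_data, u_data, solve_game):
    """Trajectory-matching error of (6.14) for candidate parameters theta;
    solve_game(theta, t_grid) is assumed to return equilibrium state and
    control trajectories sampled on t_grid."""
    x_th, u_th = solve_game(theta, t_grid)       # nested (lower-level) game solve
    dt = np.gradient(t_grid)                     # quadrature weights on the grid
    err = np.sum(dt * np.sum((x_data - x_th) ** 2, axis=1))
    for u, u_hat in zip(u_data, u_th):
        err += np.sum(dt * np.sum((u - u_hat) ** 2, axis=1))
    return err

# Example upper-level search (derivative-free, since the lower level is a solver):
# theta_hat = minimize(bilevel_objective, theta0,
#                      args=(t_grid, x_data, u_data, solve_game),
#                      method="Nelder-Mead").x
```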
6.3.2 Bilevel Methods for Truncated Trajectories

The bilevel method for the TT problem of Definition 6.2 is the optimization problem

    inf_{{θ^i ∈ Θ^i : i ∈ P}}  ∫_0^ℓ ( ‖x(t) − x_θ(t)‖² + ∑_{i ∈ P} ‖u^i(t) − u^i_θ(t)‖² ) dt    (6.17)

subject to the constraint that the state and control trajectories, x^θ_{[0,ℓ]} and {u^{iθ}_{[0,ℓ]} : i ∈ P}, are truncated versions of trajectories x^θ_{[0,T]} and {u^{iθ}_{[0,T]} : i ∈ P} that constitute an appropriate Nash equilibrium of a noncooperative differential game with cost-functional parameters {θ^i ∈ Θ^i : i ∈ P}, and a given information structure and horizon T > 0. Since the given state and control trajectories x_{[0,ℓ]} and {u^i_{[0,ℓ]} : i ∈ P} are truncated at some finite time ℓ < T, the bilevel method of (6.17) for truncated trajectories can be used to compute the parameters of games with infinite horizons T = ∞.

Unfortunately however, the bilevel method of (6.17) is not easily simplified when the information structure is open-loop in the same manner as the whole-trajectory bilevel method of (6.14). Indeed, since the control trajectories of the other players are not known in their entirety (due to being truncated), the cost-functional parameters of each player i ∈ P cannot easily be computed separately using a bilevel method of continuous-time inverse optimal control as in (6.15). The inability to easily compute player cost-functional parameters separately is a limitation of the truncated-trajectory bilevel method of (6.17). Later in this chapter, we shall develop alternative methods that enable the parameters of individual players to be computed separately from truncated trajectories.
6.3.3 Discussion of Bilevel Methods

The key properties of the bilevel methods we have presented in (6.14), (6.15), and (6.17) for solving inverse noncooperative differential game problems are essentially equivalent to those of the bilevel methods we presented in Chap. 5 for solving inverse noncooperative dynamic game problems, and are summarized as follows.

• The objectives of the upper-level optimizations in the bilevel methods of (6.14), (6.15), and (6.17) are the total squared error between both state and control trajectories. However, alternative upper-level objectives open the possibility of handling a variety of partial-information problems; for example, the total squared error of the state trajectories,

    inf_{{θ^i ∈ Θ^i : i ∈ P}}  ∫_0^T ‖x(t) − x_θ(t)‖² dt,

can be used as an alternative upper-level objective if only the state trajectory x (and not the control trajectories) is given.
• The bilevel methods of (6.14), (6.15), and (6.17) all require explicit prior knowledge of the horizon T > 0.
• Implementation of the bilevel methods (6.14), (6.15), and (6.17) requires the solution of two nested optimization problems (often via nesting two numeric optimization routines) with the first optimization over the parameters {θ^i ∈ Θ^i : i ∈ P} and the second optimization corresponding to the solution of a noncooperative differential game or continuous-time optimal control problem with parameters {θ^i ∈ Θ^i : i ∈ P}.

Standard implementations of the bilevel methods are thus computationally complex, and their need for explicit prior knowledge of the horizon T is often restrictive. In order to find efficient alternatives to bilevel methods for solving inverse noncooperative differential game problems, in the remainder of this chapter, we shall construct and analyze inverse methods that exploit the Nash equilibria conditions of Sect. 6.1.2.
6.4 Open-Loop Minimum-Principle Methods

In this section, we turn our attention to using the minimum-principle conditions for Nash equilibria established in Sect. 6.1.2 to develop methods for solving the inverse differential game problems of Definitions 6.1 and 6.2. Key to these developments will be interpreting the minimum-principle conditions established in Sect. 6.1.2 as conditions that player cost-functional parameters must satisfy given trajectories, instead of their usual interpretations as conditions that the trajectories must satisfy given player cost-functional parameters.
As in the treatment of inverse noncooperative dynamic games in Chap. 5, we shall first develop methods of solving the whole-trajectory and truncated-trajectory problems of Definitions 6.1 and 6.2 under the open-loop information structure (i.e., when the given trajectories are to constitute an open-loop Nash equilibrium). We shall later use these methods to discuss the challenges and potential of using similar approaches to solve the problems of Definitions 6.1 and 6.2 under the feedback information structure (i.e., when the given trajectories are to constitute a feedback Nash equilibrium).
6.4.1 Whole-Trajectory Open-Loop Methods

We develop methods for solving the whole-trajectory problem of Definition 6.1 in a similar manner to that which we used to develop the whole-trajectory continuous-time inverse optimal control methods of Sect. 4.4.1. Let us begin by considering a finite horizon 0 < T < ∞ together with (arbitrary) state and control trajectories, x : [0, T] → R^n and {(u^i : [0, T] → U^i) : i ∈ P}. For each player i ∈ P, let us define a finite set of times at which the controls in the trajectory u^i : [0, T] → U^i are in the interior (i.e., not on the boundary) of the constraint set U^i as

    K^i ≜ {t_k ∈ [0, T] : u^i(t_k) ∈ int U^i and 1 ≤ k ≤ |K^i|}.

The control constraints may therefore be active (i.e., u^i(t) may be on the boundary of U^i) at the infinitely many times t ∈ [0, T] not also in K^i (i.e., at t ∈ [0, T] \ K^i). Assertion (iii) of Theorem 6.1 can therefore be rewritten as

    ∇_{u^i} H^i(t, x(t), u^1(t), . . . , u^i(t), . . . , u^N(t), λ^i(t), μ^i, θ^i) = 0

for all t ∈ K^i with μ^i = 1. In order for the trajectories x : [0, T] → R^n and {(u^i : [0, T] → U^i) : i ∈ P} to constitute an open-loop Nash equilibrium of a (finite-horizon) noncooperative differential game with horizon T and player cost-functional parameters {θ^i : i ∈ P}, Theorem 6.1 hence implies that the parameters {θ^i : i ∈ P} must be such that

    ∇_{u^i} H^i(t, x(t), u^1(t), . . . , u^i(t), . . . , u^N(t), λ^i(t), μ^i, θ^i) = 0    (6.18)

for all t ∈ K^i with

    λ̇^i(t) = −∇_x H^i(t, x(t), u^1(t), . . . , u^i(t), . . . , u^N(t), λ^i(t), μ^i, θ^i)    (6.19)

for t ∈ [0, T] where μ^i = 1 and

    λ^i(T) = 0.    (6.20)
The costate differential equation (6.19), costate terminal condition (6.20), and Hamiltonian gradient condition (6.18) for each player i ∈ P are identical to the costate differential equation (4.9), costate terminal condition (4.10), and Hamiltonian gradient condition (4.8) of continuous-time optimal control. Recalling the proof of Theorem 6.1, this relationship between conditions is specifically due to the optimal control formulation of open-loop Nash equilibria (cf. (2.41) and Sect. 2.5.2.2). The close relationship between the conditions for open-loop Nash equilibria and solutions of continuous-time optimal control problems means that methods identical to the constraint-satisfaction, soft, and mixed methods of Sect. 4.4.1 for solving continuous-time inverse optimal control problems can be developed to solve the whole-trajectory inverse noncooperative differential game problem of Definition 6.1 under the open-loop information structure. In this subsection, we shall summarize these whole-trajectory inverse noncooperative differential game methods but we shall omit their derivations since they essentially mirror those in Sect. 4.4.1.
6.4.1.1 Constraint-Satisfaction Method for Whole Trajectories

Recalling the constraint-satisfaction method of continuous-time inverse optimal control (4.11), the constraint-satisfaction method for solving the whole-trajectory inverse noncooperative differential game problem under the open-loop information structure is defined by the N optimization problems

    inf_{θ^i, λ^i}  C^i
    s.t.  λ̇^i(t) = −∇_x H^i(t, x(t), u^1(t), . . . , u^i(t), . . . , u^N(t), λ^i(t), μ^i, θ^i),  t ∈ [0, T]
          λ^i(T) = 0
          ∇_{u^i} H^i(t_k, x(t_k), u^1(t_k), . . . , u^i(t_k), . . . , u^N(t_k), λ^i(t_k), μ^i, θ^i) = 0,  t_k ∈ K^i
          θ^i ∈ Θ^i    (6.21)

for any constant C^i ∈ R, μ^i = 1 and for all i ∈ P. We note that the N optimization problems (6.21) are decoupled given the trajectories x and {u^i : i ∈ P} and so (6.21) can be solved for each player's cost-functional parameters individually.
6.4.1.2 Soft Method for Whole Trajectories

Recalling the soft method of inverse optimal control (4.13), we shall develop the soft method for solving the whole-trajectory inverse noncooperative differential game problem under the open-loop information structure under the following assumption.
Assumption 6.1 The controls u^i(t) are in the interior (i.e., not on the boundary) of the control-constraint set U^i for all t ∈ [0, T] and all i ∈ P.

Under Assumption 6.1, the soft method for open-loop whole-trajectory inverse noncooperative differential games is defined by the N optimization problems

    inf_{θ^i, λ^i}  ∫_0^T ( ‖∇_{u^i} H^i(t, x(t), u^1(t), . . . , u^i(t), . . . , u^N(t), λ^i(t), μ^i, θ^i)‖²
                  + ‖λ̇^i(t) + ∇_x H^i(t, x(t), u^1(t), . . . , u^i(t), . . . , u^N(t), λ^i(t), μ^i, θ^i)‖² ) dt
    s.t.  θ^i ∈ Θ^i    (6.22)
with μ^i = 1 and for all i ∈ P. This soft method provides an approximate-optimality solution to the whole-trajectory inverse noncooperative differential game problem of Definition 6.1 since it involves minimizing the extent to which the costate differential equation (6.19) and the Hamiltonian gradient condition (6.18) are violated for each player i ∈ P. Again, we note that the N optimization problems in (6.22) are decoupled and so they can be solved individually.
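In practice, (6.22) is typically discretized on a time grid, with the costate trajectory λ^i represented by its grid values and optimized jointly with θ^i. A minimal sketch of the discretized objective (with μ^i = 1; the gradient callables are assumptions supplied by the user):

```python
import numpy as np

def soft_method_objective(theta, lam, t, x, u_all, grad_x_H, grad_u_H, i):
    """Discretization of the soft-method objective (6.22) for player i:
    lam holds grid values of lambda^i, and the residuals of the costate
    equation and Hamiltonian gradient are accumulated with weights dt."""
    dt = np.diff(t)
    lam_dot = np.diff(lam, axis=0) / dt[:, None]   # forward-difference costate rate
    J = 0.0
    for k in range(len(t) - 1):
        r_u = grad_u_H(i, t[k], x[k], u_all[k], lam[k], theta)
        r_x = lam_dot[k] + grad_x_H(i, t[k], x[k], u_all[k], lam[k], theta)
        J += (np.sum(r_u ** 2) + np.sum(r_x ** 2)) * dt[k]
    return J
```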
6.4.1.3 Mixed Method for Whole Trajectories

Finally, recalling the mixed method of inverse optimal control (4.14), the mixed method for solving the whole-trajectory inverse noncooperative differential game problem under the open-loop information structure is defined by the N optimization problems

    inf_{θ^i, λ^i}  ∑_{t_k ∈ K^i} ‖∇_{u^i} H^i(t_k, x(t_k), u^1(t_k), . . . , u^i(t_k), . . . , u^N(t_k), λ^i(t_k), μ^i, θ^i)‖²
    s.t.  λ̇^i(t) = −∇_x H^i(t, x(t), u^1(t), . . . , u^i(t), . . . , u^N(t), λ^i(t), μ^i, θ^i),  t ∈ [0, T]
          λ^i(T) = 0
          θ^i ∈ Θ^i    (6.23)

with μ^i = 1 and for all i ∈ P. We again see that the N optimization problems in (6.23) are decoupled and so they can be solved individually.

Perhaps the most significant limitation of the whole-trajectory constraint-satisfaction, soft, and mixed methods is that they require a known, finite game horizon 0 < T < ∞. We next develop methods that are applicable in the case of truncated trajectories and potentially unknown finite and infinite horizons.
6.4.2 Truncated-Trajectory Open-Loop Methods To develop methods for solving the truncated-trajectory problem of Definition 6.2 under the open-loop information structure, let us consider a possibly infinite horizon T > 0 and state and control trajectories x : [0, ] → Rn and {(u i : [0, ] → U i ) : i ∈ P} with < T . For each player i ∈ P, let us also define a finite set of times Ki {tk ∈ [0, ] : u i (tk ) ∈ int U i and 1 ≤ k ≤ |Ki |} at which the player’s controls u i are in the interior of the constraint set U i . For x : [0, ] → Rn and {(u i : [0, ] → U i ) : i ∈ P} to constitute a truncated openloop Nash equilibria of a noncooperative differential game with either a finite or infinite horizon T > , Theorem 6.2 then implies that the parameters {θ i : i ∈ P} must be such that λ˙ i (t) = −∇x H i t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t), λi (t), μi , θ i
(6.24)
for t ∈ [0, ] and some real number μi not simultaneously zero with λi (0), and ∇u i H i t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t), λi (t), μi , θ i = 0
(6.25)
for all t ∈ Ki and all i ∈ P. The costate (6.24) and Hamiltonian gradient conditions (6.25) are similar to those that we used in the previous subsection to propose methods for solving the wholetrajectory inverse differential game problem under the open-loop information structure. For each individual player i ∈ P, the conditions (6.24) and (6.25) are also equivalent to the conditions (4.15) and (4.16) that we used in Sect. 4.4.2 to develop methods of continuous-time inverse optimal control with truncated trajectories. As the whole-trajectory case, the proof of Theorem 6.2 suggests that this per-player equivalence in conditions is due again to the optimal control formulation of openloop Nash equilibria (cf. (2.41) and Sect. 2.5.2.2). In this subsection, we shall develop constraint-satisfaction, soft, and mixed methods for solving the truncated-trajectory inverse differential game problem of Definition 6.2 under the open-loop information structure in an analogous manner to the development of methods for whole trajectories developed in the previous subsection and to the inverse optimal control methods of Sect. 4.4.2. These truncated-trajectory methods will differ from their whole-trajectory counterparts since the horizon T is unknown and potentially infinite, which implies that the values of μi will also be unknown. We shall thus introduce {μi : i ∈ P} as additional unknowns alongside the parameters {θ i : i ∈ P}. Due to the similarities between the continuous-time inverse optimal control methods for truncated-trajectories in Sect. 4.4.2 and the inverse differential game methods for truncated trajectories that we shall present here, in this subsection we will only summarize the inverse differential game methods and refer to Sect. 4.4.2 for detailed derivations.
6.4 Open-Loop Minimum-Principle Methods
6.4.2.1
203
Constraint-Satisfaction Method for Truncated Trajectories
Recalling the truncated-trajectory constraint-satisfaction method of inverse optimal control (4.17), the constraint-satisfaction method for solving the truncated-trajectory inverse noncooperative dynamic game problem under the open-loop information structure is defined by the N optimization problems inf
θ i ,λi ,μi
s.t.
Ci λ˙ i (t) = −∇x H i t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t), λi (t), μi , θ i , t ∈ [0, T ] ∇u i H i tk , x(tk ), u 1 (tk ), . . . , u i (tk ), . . . , u N (tk ), λi (tk ), μi , θ i = 0, tk ∈ Ki θi ∈ Θi μi ∈ R
(6.26) for any constant C i ∈ R and all i ∈ P.
6.4.2.2
Soft Method for Truncated Trajectories
As in the case of the truncated-trajectory soft method of inverse optimal control (4.19), we require the following assumption to develop the truncated-trajectory soft method of inverse noncooperative differential games. Assumption 6.2 The controls u i (t) are in the interior (i.e., not on the boundary) of the control-constraint set U i for all t ∈ [0, ] and all i ∈ P. Under Assumption 6.2, the soft method for solving the truncated-trajectory inverse noncooperative dynamic game problem under the open-loop information structure is defined by the N optimization problems inf
θ i ,λi ,μi
s.t.
2
∇u i H i t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t), λi (t), μi , θ i
0
+ λ˙ i (t) + ∇x H i t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t), λi (t), μi , θ i 2 dt
θi ∈ Θi μi ∈ R
(6.27) for all i ∈ P.
204
6.4.2.3
6 Inverse Noncooperative Differential Games
Mixed Method for Truncated Trajectories
Finally, recalling the truncated-sequence mixed method of continuous-time inverse optimal control (4.20), the mixed method for solving the truncated-sequence inverse noncooperative differential game problem under the open-loop information structure is defined by the N optimization problems inf
θ i ,λi ,μi
s.t.
2
∇u i H i tk , x(tk ), u 1 (tk ), . . . , u i (tk ), . . . , u N (tk ), λi (tk ), μi , θ i
tk ∈Ki
λ˙ i (t) = −∇x H i t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t), λi (t), μi , θ i , t ∈ [0, ] θi ∈ Θi μi ∈ R
(6.28) for all i ∈ P.
6.4.3 Discussion of Open-Loop Minimum-Principle Methods The key difference between the whole-trajectory and truncated-trajectory minimumprinciple methods of inverse differential games presented in Sects. 6.4.1 and 6.4.2 is that the whole-trajectory methods require prior knowledge of the horizon length T while the truncated-trajectory methods do not. The other properties of the truncatedtrajectory methods are analogous to those of the whole-trajectory methods. Perhaps most noteworthy is that both the whole-trajectory and truncated-trajectory constraintsatisfaction, soft, and mixed methods enable the cost-functional parameters of individual players i ∈ P to be computed separately without the need to compute or know the parameters of the other players j ∈ P, j = i. This property is important since the bilevel method of (6.17) for truncated trajectories requires the cost-functional parameters of all players to computed simultaneously (and with knowledge of the horizon T ). The main limitation of the minimum-principle methods for both whole and truncated trajectories is that they may yield cost-functional parameters under which the given state and control trajectories only constitute a local (not global) open-loop Nash equilibrium. This limitation is due to the minimum-principle conditions of Sect. 6.1.2 being necessary, but not always sufficient, conditions for the existence of open-loop Nash equilibria. This limitation is, however, shared in practice with the bilevel methods of Sect. 6.3 since their implementations require the use of locally (but not always globally) convergent numeric optimization algorithms. Importantly, the consequences of this limitation for minimum-principle methods can be mitigated if the parameters yielded by the methods can be shown to be unique, since there will then be no other parameters under which the trajectories could constitute a
6.4 Open-Loop Minimum-Principle Methods
205
(global) Nash equilibrium. The next section will therefore focus on characterizing the existence and uniqueness of solutions to the constraint-satisfaction, soft, and mixed methods for whole and truncated trajectories. We shall later discuss the potential of developing minimum-principle methods to solve inverse noncooperative differential game problems under the feedback information structure.
6.5 Open-Loop Method Reformulations and Solution Results In this section, we shall reformulate the minimum-principle methods presented in the previous section for solving the whole-trajectory and truncated-trajectory inverse noncooperative differential game problems under the open-loop information structure. These reformulations will enable us to characterize the existence and uniqueness of the solutions yielded by the methods, and to simplify their practical implementation. The results of this section mirror the continuous-time inverse optimal control results of Sect. 4.5 and so here we will present simplified derivations.
6.5.1 Linearly Parameterized Player Cost Functionals Throughout this section, we shall make use of the following assumption regarding the structure of the player cost functions g i . Assumption 6.3 (Linearly Parameterized Player Cost Functions) For all players i ∈ P, the functions g i are linear combinations of known basis functions i g¯ i : [0, T ] × Rn × U 1 × · · · × U N → Rq , namely, g i (t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t), θ i ) = θ i g¯ i (t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t)) i
where θ i ∈ Θ i ⊂ Rq are the cost-functional parameters, and the basis functions g¯ i are continuously differentiable in each of their arguments. Assumption 6.3 generalizes Assumption 4.3 from the continuous-time inverse optimal control setting of Sect. 4.5 to the current setting of inverse noncooperative differential games. The importance of Assumption 6.3 is that it leads to player Hamiltonian functions H i that are linear in the player cost-functional parameters θ i and the player costate functions λi (t) in the sense that H i (t, x(t), u 1 (t), . . . , u N (t), λi (t), μi , θ i ) = μi θ i g¯ i (t, x(t), u 1 (t), . . . , u N (t)) + λi (t) f (t, x(t), u 1 (t), . . . , u N (t))
for t ≥ 0 and all i ∈ P. An immediate consequence of this linearity is that the gradients of the player Hamiltonians are also linear under Assumption 6.3, namely,
206
6 Inverse Noncooperative Differential Games
∇u i H i (t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t), λi (t), μi , θ i ) = ∇u i g¯ i (t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t))θ i μi
(6.29)
+ ∇u i f (t, x(t), u (t), . . . , u (t), . . . , u (t))λ (t) 1
i
N
i
and ∇x H i (t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t), λi (t), μi , θ i ) = ∇x g¯ i (t, x(t), u 1 (t), . . . , u i (t), . . . , u N (t))θ i μi
(6.30)
+ ∇x f (t, x(t), u (t), . . . , u (t), . . . , u (t))λ (t) 1
i
N
i
where ∇x g¯ i (t, x(t), u 1 (t), . . . , u N (t)) and ∇u i g¯ i (t, x(t), u 1 (t), . . . , u N (t)), denote the matrices of partial derivatives of g¯ i with respect to x(t) and u i (t), respectively, evaluated at x(t), {u i (t) : i ∈ P}, λi (t), and μi , and ∇x f (t, x(t), u 1 (t), . . . , u N (t)) and ∇u i f (t, x(t), u 1 (t), . . . , u N (t)) denote the matrices of partial derivatives of f with respect to x(t) and u i (t), respectively, evaluated at x(t), {u i (t) : i ∈ P}, λi (t), and μi . Since the objective functions and constraints of all of the open-loop inverse noncooperative differential game methods in Sect. 6.4 are either linear or quadratic in the player Hamiltonian gradients, Assumption 6.3 thus implies that the optimization problems in the methods of Sect. 6.4 are convex (provided that the sets Θ i are also convex). We next reformulate and analyze the properties of the methods of Sect. 6.4 as either systems of linear equations or quadratic programs via the same arguments as Sect. 4.5.
6.5.2 Fixed-Element Parameter Sets We first note that as in optimal control and discrete-time noncooperative dynamic games, scaling any of the player cost functions in a noncooperative differential game by any scalar C i > 0 does not change the nature of the open-loop Nash equilibria trajectories x and u i . A necessary condition for methods of inverse noncooperative differential games to yield unique solutions is thus that the parameter sets Θ i must not contain both θ i and C i θ i for any scalar C i > 0 and any θ i . We therefore let the parameter set for each player i ∈ P be i
i Θ i = {θ i ∈ Rq : θ(1) = 1}
(6.31)
6.5 Open-Loop Method Reformulations and Solution Results
207
i
i where θ(1) denotes the first element of θ i ∈ Rq . Since the order and scaling of the basis functions (and elements of θ i ) under Assumption 6.3 is arbitrary, there is no loss of generality with these fixed-element parameter sets. The parameter sets given by (6.31) also exclude the trivial solution of {θ i = 0 : i ∈ P}.
6.5.3 Whole-Trajectory Methods Reformulations and Results Given Assumption 6.3 and the parameter sets in (6.31), the (open-loop) wholetrajectory constraint-satisfaction, mixed, and soft methods presented in Sect. 6.4.1 can be reformulated and analyzed as in Sect. 4.5. We summarize these results here without repeating their derivations.
6.5.3.1
Whole-Trajectory Constraint-Satisfaction Method Reformulation and Results
Under Assumption 6.3 and by following the derivation of (4.27) for each player i ∈ P, the constraint-satisfaction method (6.21) can be reformulated as the problem of solving the N (constrained) systems of linear equations ξCi θ i = 0
s.t.
θi ∈ Θi
(6.32)
for all i ∈ P. The coefficient matrices ξCi for i ∈ P are defined as ξCi ∇u i g¯ i (tk , x(tk ), u 1 (tk ), . . . , u N (tk )) + ∇u i f (tk , x(tk ), u 1 (tk ), . . . , u N (tk ))λ¯ i (tk )
tk ∈K
(6.33) i
i where the functions λ¯ i : [0, T ] → Rn×q solve the differential equations
λ˙¯ i (t) = −∇x g¯ i (t, x(t), u 1 (t), . . . , u N (t)) − ∇x f (t, x(t), u 1 (t), . . . , u N (t))λ¯ i (t) (6.34) for t ∈ [0, T ] with the terminal conditions λ¯ i (T ) = 0. Here, we note that the notation [Ak ]k∈K i denotes the matrix formed by stacking a subsequence of matrices determined by values of k in K i from the sequence of matrices A0 , A1 , . . . , e.g.,
208
6 Inverse Noncooperative Differential Games
⎡
A0 A1 .. .
⎢ ⎢ [Ak ]k∈{0,...,T −1} = ⎢ ⎣
⎤ ⎥ ⎥ ⎥. ⎦
A T −1 By considering the parameter sets Θ i given by (6.31), the reformulation (6.32) becomes the problem of solving the N unconstrained systems of linear equations ξ¯Ci θ i = e¯1
(6.35)
for each player i ∈ P where ¯ξCi e1i ξC with e1 and e¯1 being column vectors of appropriate dimensions with 1 in their first components and zeros elsewhere. The reformulation in (6.35) preserves the ability to solve for player parameters individually (i.e., the systems of linear equations for different players can be solved independently). The reformulation in (6.35) also enables us to characterize the existence and uniqueness of player cost-functional parameters yielded by the constraint-satisfaction method (6.21), leading to the following theorem. Theorem 6.4 (Solutions to Whole-Trajectory Constraint-Satisfaction Method) Suppose that Assumption 6.3 holds and consider any player i ∈ P. Let the parameter set Θ i be given by (6.31) and let ξ¯Ci+ be the pseudoinverse of the matrix ξ¯Ci . Then the constraint-satisfaction method for whole trajectories (6.21) yields cost-functional parameters for player i ∈ P if and only if ξ¯Ci ξ¯Ci+ e¯1 = e¯1 ,
(6.36)
and these (potentially nonunique) parameters are given by θ i = ξ¯Ci+ e¯1 + (I − ξ¯Ci+ ξ¯Ci )bi
(6.37)
i
where bi ∈ Rq is any arbitrary vector. Furthermore, the cost-functional parameters computed by the method are unique and given by θ i = ξ¯Ci+ e¯1
(6.38)
if and only if ξ¯Ci has rank q i in addition to satisfying (6.36). Proof Follows via the same argument as Theorem 4.3 using the reformulation in (6.35) for player i ∈ P.
6.5 Open-Loop Method Reformulations and Solution Results
209
Since the conditions of Theorem 6.4 are specific to each player i ∈ P, it is possible for the whole-trajectory constraint-satisfaction method (6.21) to yield cost-functional parameters for some players but not others, and to yield unique cost-functional parameters for some players but not others. The discussion after Theorem 4.3 provides some intuition behind the conditions of Theorem 6.4 that must be satisfied in order for the constraint-satisfaction method (6.21) to yield (possibly unique) cost-functional parameters for player i ∈ P. Clearly, the conditions of Theorem 6.4 must hold for all players in order for the constraint-satisfaction method (6.21) to yield unique parameters for all of the players. Theorem 6.4 provides deeper insight into the solution of the whole-trajectory inverse noncooperative differential game problem of Definition 6.1. Specifically, the following corollary summarizes conditions for the existence and uniqueness of exact solutions to the WT problem of Definition 6.1 with Θ i given by (6.31) for i ∈ P (regardless of the specific method employed). Corollary 6.1 (Existence of Exact Solutions to Whole-Trajectory Problem of Definition 6.1) Suppose that Assumption 6.3 holds and consider any player i ∈ P. If the parameter set Θ i is given by (6.31) and if ξ¯Ci ξ¯Ci+ e¯1 = e¯1 then the state and player control trajectories, x : [0, T ] → Rn and {(u i : [0, T ] → U i ) : i ∈ P} do not constitute a (potentially local) open-loop Nash equilibrium for any θ i ∈ Θ i . If, however, ξ¯Ci has rank q i and ξ¯Ci ξ¯Ci+ e¯1 = e¯1 then there is at most one θ i ∈ Θ i such that the sequences x[0,T ] and {u i[0,T −1] : i ∈ P} constitute a (potentially local) open-loop Nash equilibrium. Proof Follows via the same argument as Corollary 4.2.
If the left-identity condition (6.36) holds for all players i ∈ P, then Corollary 6.1 implies that it is feasible to solve the whole-trajectory inverse noncooperative dynamic game problem of Definition 6.1 exactly. However, if (6.36) does not hold for all players i ∈ P, then those players for which it does not hold will have to have their parameters computed via an approximate optimality approach, such as either the soft method (6.22) or the mixed method (6.23). In some settings, it may therefore be necessary to use different methods to compute the cost-function parameters of different players. We summarize the constraint-satisfaction method (per player i ∈ P) for whole trajectories under Assumption 6.3 in Algorithm 6.1.
210
6 Inverse Noncooperative Differential Games
Algorithm 6.1 Whole-Trajectory Constraint-Satisfaction Method for Player i Input: Whole state and control trajectories x : [0, T ] → Rn and {(u j : [0, T ] → U j ) : j ∈ P }, i i = dynamics f , basis functions g¯ i , control-constraint set U i , parameter set Θ i = {θ i ∈ Rq : θ(1) i 1}, and set of times K . Output: Computed player cost-functional parameters θ i . 1: Solve the differential equation (6.34) for λ¯ i with λ¯ i (T ) = 0. 2: Compute matrix ξCi via (6.33). 3: Compute augmented matrix ξ¯Ci from (6.35). 4: Compute the pseudoinverse ξ¯Ci+ of ξ¯Ci . 5: if ξ¯Ci ξ¯Ci+ e¯1 = e¯1 then 6: if ξ¯Ci has rank q i then 7: return Unique θ i given by (6.38). 8: else i 9: return Any θ i from (6.37) with any bi ∈ Rq . 10: end if 11: else 12: return No feasible exact solutions (cf. Corollary 6.1). 13: end if
6.5.3.2
Whole-Trajectory Soft Method Reformulation and Results
Under Assumptions 6.1 and 6.3, the soft method of (6.22) can be reformulated by directly exploiting the linear forms of the Hamiltonian gradients in (6.29) and (6.30). Indeed, by following the derivation of Sect. 4.5.2.2 for each player i ∈ P, (6.22) can be reformulated as the N (constrained) quadratic programs ηi P i (0)ηi
inf ηi
s.t. I i ηi ∈ Θ i
(6.39)
for all i ∈ P. The player cost-functional parameters computed by the soft method are thus θ i = I i ηi where ηi solves (6.39) and i i I i I 0 ∈ Rq ×(q +n) . Here, the matrix P i (0) in (6.39) is the initial value of the unique positive semidefinite i i function P i : [0, T ] → R(q +n)×(q +n) solving the Riccati differential equation P˙ i (t) = (P i (t)B i + S i (t))(B i P i (t) + S i (t)) − Q i (t) for t ∈ [0, T ] with terminal condition P i (T ) = 0 and matrices Bi
0 ∇x g¯ i (t) , and S i (t) ∇x f (t) I
and Q i (t) F i (t)F i (t) where
(6.40)
6.5 Open-Loop Method Reformulations and Solution Results
F i (t)
211
∇x g¯ i (t) ∇x f (t) . ∇u g¯ i (t) ∇u i f (t)
Here, we use the shorthand ∇x f (t) ∈ Rn×n and ∇u i f (t) ∈ Rm ×n to denote the matrices of partial derivatives of the game dynamics f with respect to (and evaluated with) i i i x(t) and u(t), respectively. Similarly, ∇x g¯ i (t) ∈ Rn×q , and ∇u i g(t) ∈ Rm ×q denote i the matrices of partial derivatives of the basis functions g¯ . The reformulation in (6.39) lets us characterize the player cost-functional parameters computed by the soft method (6.22) given the parameter sets (6.31). To present this characterization, let us define, for i ∈ P, i
⎡
i P(2,2) (0) i ⎢ P(3,2) (0) ⎢ ξ¯Si ⎢ .. ⎣ . P(qi i +n,2) (0)
⎤ i . . . P(2,q i +n) (0) i . . . P(3,q i +n) (0) ⎥ ⎥ ⎥ .. .. ⎦ . . i . . . P(q i +n,q i +n) (0)
(6.41)
as the principal submatrix of P i (0) formed by deleting the first row and column of i i P i (0) where P(, j) (0) denotes the element of P (0) in the th row and jth column. i+ i Let us also define ξ¯S and r S as the pseudoinverse and rank of ξ¯Si , respectively, and let the first column of P i (0) with its first element deleted be i i (0) P(3,1) (0) . . . P(qi i +n,q i +n) (0) . ν Si P(2,1)
(6.42)
Finally, a singular value decomposition (SVD) of ξ¯Si is ξ¯Si = U Si Σ Si U Si where Σ Si ∈ R(q +n−1)×(q +n−1) is a diagonal matrix, and i
i
U Si =
i,11 i,12 i i US US ∈ R(q +n−1)×(q +n−1) U Si,21 U Si,22
(6.43)
is a block matrix with submatrices U Si,11 ∈ R(q −1)×r S , U Si,12 ∈ R(q −1)×(q +n−1−r S ) , i i i U Si,21 ∈ Rn×r S and U Si,22 ∈ Rn×(q +n−1−r S ) . The main result concerning the solution of the whole-sequence soft method (6.22) follows. i
i
i
i
i
Theorem 6.5 Consider any player i ∈ P and suppose that Assumptions 6.1 and 6.3 hold. If the parameter set Θ i is given by (6.31) and (I − ξ¯Si ξ¯Si+ )ν Si = 0, then all of the parameter vectors θ i ∈ Θ i corresponding to all of the solutions (λi , θ i ) to the soft method (6.22) are of the form θ i = I i ηi
(6.44)
212
6 Inverse Noncooperative Differential Games
i where ηi = 1 η¯ i ∈ Rq +n are (potentially nonunique) solutions to the quadratic i program (6.39) with η¯ i ∈ Rq +n−1 given by η¯ i = −ξ¯Si+ ν Si + U Si
0 bi
(6.45)
for any bi ∈ Rq +n−1−r S . Furthermore, if either U Si,12 = 0 or r Si = q i + n − 1, then all solutions (λi , θ i ) to the soft method (6.22) correspond to the single unique parameter vector θ i ∈ Θ i given by i
i
θi = I i
1
−ξ¯Si+ ν Si
.
(6.46)
Proof The proof follows that of Theorem 4.4 for each player i ∈ P using the quadratic program reformulation in (6.39) instead of (4.32). The conditions of Theorem 6.5 are discussed (as they relate to each player i ∈ P) after Theorem 4.4. We note here that it is possible for the conditions of Theorem 6.5 to hold for some players i ∈ P but not others, and so the parameters computed by the soft method may be unique for some players i ∈ P but not others. We summarize the soft method for whole trajectories (6.22) (per player i ∈ P) under Assumptions 6.1 and 6.3 in Algorithm 6.2. Algorithm 6.2 Whole-Trajectory Soft Method for Player i Input: Whole state and control trajectories x : [0, T ] → Rn and {(u j : [0, T ] → U j ) : j ∈ P }, i dynamics f , basis functions g¯ i , control-constraint set U i , and parameter set Θ i = {θ ∈ Rq : i θ(1) = 1}. Output: Computed cost-functional parameters θ i . 1: Solve Riccati equation (6.40) with P i (T ) = 0 for P i (0). 2: Compute submatrix matrix ξ¯Si from (6.41) and vector ν Si from (6.42). 3: Compute the pseudoinverse ξ¯Si+ of ξ¯ Si such that (I − ξ¯Si ξ¯ Si+ )ν Si = 0. 4: Compute the rank r Si of ξ¯ Si . 5: if r Si = q i + n − 1 then 6: return Unique θ i given by (6.46). 7: else 8: Compute U Si and U Si,12 in (6.43) via SVD of ξ¯Si . 9: if U Si,12 = 0 then 10: return Unique θ i given by (6.46). 11: else i i 12: return Any θ i from (6.44) with any bi ∈ Rq +n−1−r S . 13: end if 14: end if
6.5 Open-Loop Method Reformulations and Solution Results
6.5.3.3
213
Whole-Trajectory Mixed Method Reformulation and Results
Following the derivation of (4.33) for each player i ∈ P, and noting Assumption 6.3, the mixed method for whole trajectories (6.23) can be reformulated as the problem of solving the N constrained quadratic programs inf θi
i i θ i ξ M θ
s.t.
θi ∈ Θi
(6.47)
i ∈ Rq ×q is the positive semidefinite matrix for all i ∈ P. For i ∈ P, ξ M i
i ξM
tk ∈K
i
[∇u i g¯ i (tk , x(tk ), u 1 (tk ), . . . , u N (tk )) i
+ ∇u i f (tk , x(tk ), u 1 (tk ), . . . , u N (tk ))λ¯ i (tk )]
(6.48)
· [∇u i g¯ i (tk , x(tk ), u 1 (tk ), . . . , u N (tk )) + ∇u i f (tk , x(tk ), u 1 (tk ), . . . , u N (tk ))λ¯ i (tk )] and the function λ¯ i : [0, T ] → Rn solves the differential equation λ˙¯ i (t) = −∇x g¯ i (t, x(t), u 1 (t), . . . , u N (t)) − ∇x f (t, x(t), u 1 (t), . . . , u N (t))λ¯ i (t) (6.49) for t ∈ [0, T ] with terminal condition λ¯ i (T ) = 0. The reformulation of the mixed method in (6.47) enables a characterization of the player cost functionals it yields given the parameter sets in (6.31). To present this i characterization, for each player i ∈ P, let us define the principal submatrix of ξ M i formed by deleting the first row and column of ξ M as ⎡
i ξ¯ M
i ξ M,(2,2) i ⎢ ξ M,(3,2) ⎢ ⎢ . ⎣ .. i ξ M,(q i ,2)
⎤ i . . . ξ M,(2,q i) i ⎥ . . . ξ M,(3,q i) ⎥ ⎥ .. .. ⎦ . . i . . . ξ M,(q i ,q i )
(6.50)
i i as the principal submatrix of ξ M formed by deleting the first row and column of ξ M i i where ξ M,(, j) denotes the element of ξ M in the th row and jth column. Let us also i+ i i and r M as the pseudoinverse and rank of ξ¯ M , respectively, and let define ξ¯ M
i i i i ξ M,(3,1) . . . ξ M,(q νM ξ M,(2,1) i ,q i ) i denote the first column of ξ M without its first element. Finally, let
(6.51)
214
6 Inverse Noncooperative Differential Games i i i i ξ¯ M = UM ΣM UM
i i i i i i i be a SVD of ξ¯ M where U M ∈ R(q −1)×(q −1) and Σ M ∈ R(q −1)×(q −1) . The existence and uniqueness of solutions to the mixed method (6.23) are then characterized in the following theorem.
Theorem 6.6 Suppose that Assumption 6.3 holds and consider any player i ∈ P. If i ¯ i+ ¯ i ξ M )ξ M = 0, then the parameter vectors θ i ∈ Θ i Θ i is given by (6.31) and (I − ξ¯ M solving the mixed method (6.23) are all of the form 1 θ i = ¯i θ
(6.52)
where the vectors θ¯ i ∈ Rq −1 are given by i
0 i+ i i ¯θ i = −ξ¯ M νM + UM i b
(6.53)
i for any bi ∈ Rq −1−r M . Furthermore, if r M = q i − 1 then i
i
θi =
1
i+ i νM −ξ¯ M
(6.54)
is the unique solution to (6.23). Proof The theorem is proved in the same manner as Theorem 4.5 for each player i ∈ P using the quadratic reformulation in (6.47). Theorem 6.6 mirrors Theorem 4.5 for solving the whole-trajectory inverse noncooperative differential game problem. The discussion after Theorem 4.5 is thus also applicable here for each player i ∈ P. We summarize the mixed method for whole trajectories (6.23) under Assumption 6.3 (per player i ∈ P) in Algorithm 6.3.
6.5.4 Truncated-Trajectory Methods Reformulations and Results We now reformulate and analyze the soft method presented in Sect. 6.4.2 for truncated trajectories under the open-loop information structure. These results mirror the continuous-time inverse optimal control results of Sect. 4.5 for each player i ∈ P and so we shall omit detailed derivations. We also omit consideration of the constraintsatisfaction and mixed methods for truncated trajectories (due to the limited extent to which we were able to reformulate them in Sect. 4.5).
6.5 Open-Loop Method Reformulations and Solution Results
215
Algorithm 6.3 Whole-Trajectory Mixed Method for Player i ∈ P Input: Whole state and control trajectories x : [0, T ] → Rn and {(u j : [0, T ] → U j ) : j ∈ P }, i i = dynamics f , basis functions g¯ i , control-constraint set U i , parameter set Θ i = {θ i ∈ Rq : θ(1) i 1}, and set of times K . Output: Computed cost-functional parameters θ i . 1: Solve differential equation (6.49) for λ¯ i with λ¯ i (T ) = 0. i via (6.48). 2: Compute matrix ξ M i from (6.50) and vector ν i from (6.51). 3: Compute submatrix ξ¯ M M i i . ¯ 4: Compute rank r M of ξ M i+ i so (I − ξ¯ i ξ¯ i+ )ν i = 0. 5: Compute pseudoinverse ξ¯ M of ξ¯ M M M M i = q i − 1 then 6: if r M 7: return Unique θ i given by (6.54). 8: else i through SVD of ξ¯ i . 9: Compute U M M 10: return Any θ i from (6.52) with any bi ∈ Rq 11: end if
6.5.4.1
i −1−r i M
.
Truncated-Trajectory Soft Method Reformulation and Results
Recall the notation of Sect. 6.5.3.2. By following the reformulation of the truncatedtrajectory soft method of inverse optimal control in Sect. 4.5, the soft method for truncated trajectories of (6.27) can be reformulated under Assumptions 6.2 and 6.3 as the N (constrained) quadratic optimization problems inf βi
β i Pi (0)β i
s.t.
1 i i I β ∈ Θi , μi
(6.55)
for all i ∈ P where the matrix Pi (0) is the initial value of the unique positive i i semidefinite function Pi : [0, ] → R(q +n)×(q +n) solving the Riccati differential equation P˙i (t) = (Pi (t)B i + S i (t))(B i Pi (t) + S i (t)) − Q i (t)
(6.56)
for t ∈ [0, ] with terminal condition Pi () = 0. The player cost-functional parameters computed by the soft method are then given by θ i = I i β i where β i solves (6.55). The reformulation of the soft method for truncated trajectories in (6.55) leads us to a characterization of its solutions for a finite (but potentially unknown) horizon 0 < T < ∞ given the parameter sets in (6.31). To develop this characterization, for i ∈ P, let us define φ¯ Si as the principal submatrix of Pi (0) formed by deleting the first row and column of Pi (0). Let us also define φ¯ Si+ and ρ Si as the pseudoinverse and rank of φ¯ Si , respectively, and let ν¯ Si be the first column of Pi (0) without its first element. Finally, denote a SVD of φ¯ Si as
216
6 Inverse Noncooperative Differential Games
φ¯ Si = U¯ Si Σ¯ Si U¯ Si i i where Σ¯ Si ∈ R(q +n−1)×(q +n−1) is a diagonal matrix, and
i,11 i,12 i i U¯ U¯ i ¯ U S = ¯ Si,21 ¯ Si,22 ∈ R(q +n−1)×(q +n−1) US US
(6.57)
i i i i i is a block matrix with submatrices U¯ Si,11 ∈ R(q −1)×ρS , U¯ Si,12 ∈ R(q −1)×(q +n−1−ρS ) , i i i U¯ Si,21 ∈ Rn×ρS and U¯ Si,22 ∈ Rn×(q +n−1−ρS ) . The player cost-functional parameters yielded by the soft method (6.27) when the horizon T is finite (but potentially unknown) are described in the following theorem.
Theorem 6.7 Consider any player i ∈ P and suppose that Assumptions 6.2 and 6.3 hold. If the horizon T is finite (but potentially unknown), the parameter set Θ i is given by (6.31), and (I − φ¯ Si φ¯ Si+ )¯ν Si = 0, then all of the parameter vectors θ i ∈ Θ i corresponding to all of the solutions (λi , θ i ) to the soft method (6.27) are of the form θ i = I i βi
(6.58)
i where β i = 1 β¯ i ∈ Rq +n are (potentially nonunique) solutions to the quadratic i program (6.55) with β¯ i ∈ Rq +n−1 given by i+ i i i 0 ¯ ¯ ¯ β = −φ S ν¯ S + U S i b
(6.59)
i i for any bi ∈ Rq +n−1−ρS . Furthermore, if either U¯ Si,12 = 0 or ρ Si = q i + n − 1, then all solutions (λi , θ i ) to the soft method (6.27) correspond to the single unique parameter vector θ i ∈ Θ i given by
θi = I i
1 . −φ¯ Si+ ν¯ Si
(6.60)
Proof The proof follows that of Theorem 4.6 using the reformulation (6.55), noting that μi = 1 when T is finite. Since Theorem 6.7 is the differential game equivalent of Theorem 4.6, the discussion after Theorem 4.6 provides useful intuition about the conditions of Theorem 6.7. Importantly, in this differential game setting, Theorem 6.7 suggests that unlike the bilevel method for truncated trajectories (6.17), the soft method for truncated trajectories can compute the parameters of individual players separately without knowledge of the game horizon T and without the need to solve (forward) noncooperative differential games. The implementation of the soft method for truncated trajectories (6.27) is also essentially equivalent to that of the soft method for whole trajectories (6.22) summarized in Algorithm 6.2.
6.6 Challenges and Potential for Feedback Minimum-Principle Methods
217
6.6 Challenges and Potential for Feedback Minimum-Principle Methods This chapter has focused on developing minimum-principle methods for solving the inverse noncooperative differential game problems of Definitions 6.1 and 6.2 under the open-loop information structure (i.e., finding player cost-functional parameters such that given trajectories constitute an open-loop Nash equilibrium). In light of the close relationship between optimal control and inverse differential games under the open-loop information structure (cf. Sect. 2.5.2.2), the problems of Definitions 6.1 and 6.2 reduce to solving N continuous-time inverse optimal control problems analogous to those of Definitions 4.1 and 4.2 (one for each player in the game). The methods and results we have developed for solving inverse differential game problems in Sects. 6.4 and 6.5 thus specialize those we developed for continuous-time inverse optimal control in Chap. 4 when N = 1. For the same reasons discussed in the context of (discrete-time) case of inverse noncooperative dynamic games in Sect. 5.6, developing minimum-principle methods for solving the problems of Definitions 6.1 and 6.2 under the feedback information structure poses significant challenges. Indeed, their solution via minimum-principle methods appears to be difficult (if not impossible) without access to player feedback Nash equilibrium strategies. Given player feedback strategies, the inverse problems would, however, become computing player cost-functional parameters such that the given set of player feedback strategies constitute a feedback Nash equilibrium. This problem is still nontrivial, but becomes analogous to the original notion of inverse optimal control introduced by Kalman in [9] for the single-player case N = 1. From a practical perspective, assuming that the player feedback strategies are known can be interpreted as requiring that sufficient state and control data is available to first compute them before using them to compute player cost-functional parameters. Inverse noncooperative differential game problems under the feedback information structure thus have the potential to be solved in two steps by first estimating the feedback strategies {γ i : i ∈ P} from given state and control trajectories x and {u i : i ∈ P}, and then computing the player cost-functional parameters {θ i : i ∈ P} from the estimated feedback strategies. In the next section, in a similar vain to the optimal control results of Sects. 3.6 and 4.6 and the (discrete-time) inverse dynamic game results of Sect. 5.7, we shall show that this two-step approach is tractable for differential games with linear dynamics, quadratic cost functionals, an infinite horizon, and the feedback information structure.
6.7 Inverse Linear-Quadratic Feedback Differential Games In this section, we detail the two-step approach discussed in the previous section for solving the truncated-trajectory continuous-time inverse differential game problem of Definition 6.2 in the special case where the game dynamics are linear, the player cost
218
6 Inverse Noncooperative Differential Games
functionals are quadratic, and the horizon T is infinite. The key distinction between this LQ continuous-time approach and the LQ discrete-time approach presented in Sect. 5.7 will be shown to be the use of coupled continuous-time algebraic Riccati equations (AREs) instead of coupled discrete-time AREs in the solution of the second step of the approach (i.e., computing the player cost-functional parameters {θ i : i ∈ P}).
6.7.1 Preliminary LQ Differential Game Concepts We first specialize the concepts introduced in Sect. 6.1.
6.7.1.1
Parameterized Linear-Quadratic Differential Games
Consider a differential game consisting of N players P = {1, . . . , N } controlling a continuous-time dynamical system described by the linear first-order differential equations x(t) ˙ = Ax(t) +
N
B j u j (t), x(0) = x0 ∈ Rn
(6.61)
j=1 j
for t ∈ [0, ∞) where A ∈ Rn×n and B j ∈ Rn×m , j ∈ P, denote time-invariant system matrices. For each player i ∈ P, let us introduce an infinite-horizon quadratic cost functional of the form i V∞ (x, u 1 , . . . , u i , . . . , u N , θ¯ i ) =
1 2
∞ 0
x Qi x +
N
u j R i j u j dt,
(6.62)
j=1
where Q i and R i j for i, j ∈ P are symmetric cost matrices that parameterize the cost functionals via the vectors θ¯ i = vec(Q i ) vec(R i1 ) · · · vec(R ii ) · · · vec(R i N )
(6.63)
for i ∈ P. To avoid game degeneracy, we consider positive definite matrices R ii 0 for all i ∈ P. We shall consider noncooperative LQ differential games played with the feedback information structure and with linear player feedback strategies of the form γ i (t, x(t)) = u i (t) = −K i x(t)
(6.64)
6.7 Inverse Linear-Quadratic Feedback Differential Games
219
for i ∈ P where the matrices K i ∈ Rm ×n result in a stable closed-loop system x(t) ˙ = F x(t) with (stable) closed-loop system matrix i
F A−
N
B j K j.
(6.65)
j=1
We denote the set of all N -tuples of feedback strategies K {K i : i ∈ P} leading to stable closed-loop systems as F {K = {K i ∈ Rm ×n : i ∈ P} : F is stable}, i
(6.66)
which is non-empty if (and only if) the matrix pair (A, B 1 · · · B N ) is stabilizable.1 As in Sect. 2.5.2.2, an N -tuple of feedback strategies K ∈ F constitutes a feedback Nash equilibrium solution to the infinite-horizon (noncooperative) LQ differential game with linear dynamics (6.61) and quadratic player cost functionals (6.62) if and only if the solve the N coupled continuous-time LQ optimal control problems inf
i V∞ (x, u 1 , . . . , u N , θ¯ i )
s.t.
x(t) ˙ = Ax(t) +
K¯ i
N
B j u j (t), t ∈ [0, ∞)
j=1
u i (t) = − K¯ i x(t)
(6.67)
u j (t) = −K j x(t), j ∈ P, j = i x(0) = x0 ∈ Rn . {K 1 , . . . , K¯ i , . . . , K N } ∈ F for i ∈ P. The relationship implied by (6.67) between continuous-time LQ optimal control (cf. Sect. 4.6) and (continuous-time) LQ feedback differential games enables us to next exploit coupled continuous-time AREs to characterize feedback Nash equilibria.
6.7.1.2
Conditions for Feedback Nash Equilibria in LQ Differential Games
The following theorem presents conditions for feedback Nash equilibrium solutions of the coupled LQ optimal control problems (6.67) via coupled continuous-time AREs. 1
Recall that this restriction limits us to trajectories corresponding to a stable closed-loop system for solving the inverse LQ differential game problem (see Sect. 5.7.1.1 for a detailed explanation). Additionally, control strategies belonging to F ensure that (6.62) is finite.
220
6 Inverse Noncooperative Differential Games
Theorem 6.8 (Coupled Continuous-Time AREs [5, Theorem 4]) Consider a noncooperative infinite-horizon LQ differential game with linear dynamics (6.61) and quadratic player cost functionals (6.62). If an N -tuple of player feedback strategies K = {K i : i ∈ P} and an N -tuple of matrices {P i ∈ Rn×n : i ∈ P} satisfy the coupled continuous-time AREs Pi F + F Pi +
N
P j B j R jj
−1
Ri j R j j
−1
B j P j + Q i = 0
(6.68a)
j=1
K i = R ii
−1
B i P i
(6.68b)
for i ∈ P and K ∈ F , then K is a feedback Nash equilibrium solution to the infinitehorizon LQ differential game. Conversely, if an N -tuple of player feedback strategies K = {K i : i ∈ P} ∈ F is a feedback Nash equilibrium solution, then the coupled AREs (6.68) admits a solution in the form of an N -tuple of matrices {P i : i ∈ P}. Proof See [5].
We next use the concepts of (forward) infinite-horizon LQ differential games and coupled continuous-time AREs (cf. Theorem 6.8) to pose and solve a feedbackstrategy-based inverse LQ differential game problem.
6.7.2 Feedback-Strategy-Based Inverse Differential Games We begin this subsection by posing the problem of inverse LQ differential games given player feedback control strategies (as applicable in the second step in the twostep approach described in Sect. 6.6). We then investigate conditions and methods for solving it.
6.7.2.1
Feedback-Strategy-Based Problem Formulation
The feedback-strategy-based inverse differential game problem is defined as follows. Definition 6.3 (Feedback-Strategy-Based (FSB) Problem) Consider a parameterized noncooperative differential game with an infinite horizon T = ∞, linear dynamics (6.61), quadratic cost functionals (6.62), and the feedback information structure. Then, given system matrices A and {B i : i ∈ P}, and an N -tuple of feedback control strategies K = {K i : i ∈ P}, the feedback-strategy-based inverse LQ differential game problem is to compute the cost-functional parameters (i.e., the entries of the matrices Q i and R i j for one or more players i ∈ P, such that K (and hence the controls given by u i (t) = −K i x(t), i ∈ P) constitutes a feedback Nash equilibrium solution.
6.7 Inverse Linear-Quadratic Feedback Differential Games
221
Clearly, the FSB problem reduces to the inverse LQ optimal control feedback-lawbased (FLB) problem of Definition 4.4 when the game has a single player (i.e., when N = 1). In the following, we shall present a method for solving the FSB problem (for potentially multiple players N > 1) based on the continuous-time AREs of Theorem 6.8. This method and the results associated with it are the continuous-time counterparts to discrete-time results of Sect. 5.7.2.2.
6.7.2.2
Reformulation of the Continuous-Time Algebraic Riccati Equations
To solve the FSB problem of Definition 6.3, we note that continuous-time AREs of Theorem 6.8 can be viewed as conditions that the matrices Q i and R i j must satisfy such that the given feedback strategies K constitute a feedback Nash equilibrium. Rearranging the Eq. (6.68) using vectorisation operations and the Kronecker product analogously to the procedure in Sect. 4.6.3.2 leads to (6.68) being equivalent to the N homogeneous systems of linear equations W¯ i θ¯ i = 0,
(6.69)
for i ∈ P where W¯ i Z i Z i K ⊗1 · · · Z i K ⊗i−1 (Z i K ⊗i + K i ⊗ I p ) Z i K ⊗i+1 · · · Z i K ⊗N (6.70) with Z i (I ⊗ B i ) K ⊗i
i
−1 2 F ⊗ I + I ⊗ F ∈ Rnm i ×n n 2 ×m i2
K ⊗K ∈R i
,
(6.71) (6.72)
and where θ¯ i is as defined in (6.63). We note that the inverses involved in computing W¯ i can be shown to exists for all K belonging to the stabilizing set F (see Lemma 1 of [6]). In principle, the systems of linear equations (6.69) can be solved for the costfunctional parameters θ¯ i since the coefficient matrices W¯ i are calculable from the known system matrices and given feedback strategies. The systems of linear equations (6.69) also enable us to analyze the existence of (exact) solutions to the FSB problem of Definition 6.3 in a similar manner to the results of Corollary 6.1 (which despite being for nonlinear systems and nonquadratic cost functionals, are applicable only in the case of whole trajectories).
222
6.7.2.3
6 Inverse Noncooperative Differential Games
Existence of Exact Solutions to Feedback-Strategy-Based Problem
As in the discrete-time LQ dynamic game setting discussed in Sect. 5.7.2.3, the existence of solutions to the system of linear equations (6.69) (and hence the FSB problem of Definition 6.3) depends on the properties of the matrices W¯ i and the number of nonredundant (and nonzero) parameters in the player cost functionals. Let q i be the number of nonredundant (and nonzero) parameters in player i’s cost functional for i ∈ P. Then, as in Sect. 5.7.2.3, we may rewrite (6.69) in the form Wiθi = 0
(6.73) i
for i ∈ P with the (potentially) reduced vector θ i ∈ Rq containing only the nonredundant and nonzero elements of the matrices Q i and R i j , and with the matrix i i W i ∈ Rnm ×q being an appropriately modified version of W¯ i . The properties of (6.73) are analogous to those of (5.82) in the discrete-time case described in Sect. 5.7.2.3. In particular, the set of cost-functional parameters solving the FSB problem of Definition 6.3 for player i is given by the kernel of W i , with conditions for its existence being the same as those we discussed in Sect. 5.7.2.3.
6.7.2.4
Feedback-Strategy-Based Method
To avoid difficulties associated with the existence of the kernels of W i for i ∈ P (and exact solutions to the FSB problem of Definition 6.3), we propose computing player cost-functional parameters by minimizing the violation of the conditions for feedback Nash equilibria summarized by the systems of linear equations (6.73). Specifically, the method for solving the FSB problem of Definition 6.3 is to solve the N quadratic programs 1 i i i θ Ωθ , 2 R ii 0,
min θi
s.t.
(6.74)
for i ∈ P where Ω i 2(W i W i ) ∈ Rq ×q and R ii is the cost-functional matrix that results from θ i . As in the discrete-time counterpart (5.83), it is possible to relax the constraint in (6.74) by replacing it with the constraint of setting one parameter of R ii to be nonzero and positive (as in the parameter set in (6.31)). We note that (6.74) can be used to compute the cost-functional matrices of players independently, and we also analyze the solutions to (6.74) using tools from linear algebra (as we show next). i
i
6.7 Inverse Linear-Quadratic Feedback Differential Games
6.7.2.5
223
Solution Results
The first result regarding (6.74) establishes the existence of its solutions. Proposition 6.1 Consider any player i ∈ P. Under the conditions of the FSB problem of Definition 6.3, the quadratic program (6.74) is convex and possesses at least one solution. Proof The proof is the same as in Proposition 4.2.
As in the discrete-time inverse LQ dynamic game case, Proposition 6.1 is most useful in non-ideal settings since it guarantees that (6.74) yields at least one set of player cost-functional parameters for each particular player i ∈ P regardless of whether the given feedback strategies K constitute a (exact) feedback Nash equilibrium. The final result establishes conditions under solutions to (6.74) are unique for each player i ∈ P (up to an arbitrary nonzero scaling factor C i > 0). Theorem 6.9 (Necessary and Sufficient Conditions for Unique Solutions) Consider the FSB problem of Definition 6.3 and suppose that it is solved for player i ∈ P by matrices Q i and R i j with elements from a vector θ i . Then, the set of all solutions to (6.74) for i ∈ P is (6.75) {C i θ i : C i > 0}, if and only if n m i ≥ q i − 1 and additionally rank(W i ) = q i − 1. Proof The proof is analogous to that of Theorems 4.8 and 3.10.
6.7.3 Estimation of Feedback Control Laws In the last subsection, we have developed a method for solving the FSB problem of Definition 6.3 (i.e., the second step in the two-step approach described in Sect. 6.6). However, in practice we may only have access to state and control trajectories x : i [0, ] → Rn and {(u i : [0, ] → Rm ) : i ∈ P} over a time interval [0, ]. Hence, in order to utilize the developed FSB method, we now consider the estimation of feedback strategy matrices K = {K i : i ∈ P} from observed trajectories (i.e., the first step in the two-step approach described in Sect. 6.6). To estimate K = {K i : i ∈ P} for the N -player inverse LQ differential game at hand, we can employ linear least-squares estimation based on the feedback relation (6.64). For this purpose, let us introduce a finite sequence of sampling times K i {tk ∈ [0, ] : 1 ≤ k ≤ |K i | and 0 ≤ t1 < · · · < t|K i | ≤ }
(6.76)
for each player i ∈ P. Let the value of the state and control trajectories at tk be denoted by x [k] and u i[k] , respectively. Then, the feedback matrices for i ∈ P can
224
6 Inverse Noncooperative Differential Games
be estimated by means of K i = arg min K¯ i
k∈K
K¯ i x [k] + u i[k] 2 .
(6.77)
i
The closed-form solution is given by −1 i X [0,] X [0,] X [0,] K i = −U[0,]
(6.78)
i where X [0,] ∈ R|K |×n and U[0,] ∈ R|K |×m denote matrices containing the sampled sequence of state and control values of player i ∈ P, respectively. i
i
i
6.7.4 Inverse LQ Differential Game Method Algorithm 5.7 summarizes the results of this section concerning the solution of the truncated-trajectory continuous-time inverse differential game problem of Definition 6.2 under the feedback information structure in the special case where the game dynamics are linear, the player cost functionals are quadratic, and the horizon T is infinite. Algorithm 6.4 Infinite-horizon inverse LQ feedback differential games for Player i ∈P i
Input: Truncated state and control trajectories x : [0, ] → Rn and {(u i : [0, ] → Rm ) : i ∈ P } sampled at times Ki , and system matrices A and {B i : i ∈ P }. Output: Computed cost-functional parameters θ i (i.e., nonredundant elements of Q i and R i j ). 1: Estimate K j for all j ∈ P using least squares (i.e., (6.77)) and determine the corresponding closed-loop system matrix F with (6.65). 2: Compute W¯ i with (6.70). 3: Modify W¯ i to form W i so as to fulfill (6.73) with unknown parameters θ i . 4: Solve the quadratic optimization problem (6.74) for θ i .
Algorithm 6.4 enables the computation of cost-functional parameters for each player i ∈ P independently by solving (6.74), but requires all of the player feedback strategies K = {K j : j ∈ P} to be estimated first. We also note that by setting N = 1, Algorithm 6.4 reduces to a continuous-time inverse LQ optimal control method. The equations involved in the method with N = 1 differ slightly from those presented in Algorithm 4.4 since the Riccati equations for differential games we consider are expressed in a Lyapunov-equation form (see (6.68a) and [11, p. 299] for more details).
6.8 Notes and Further Reading
225
6.8 Notes and Further Reading Bilevel and Other Methods: Bilevel methods for solving the whole-trajectory inverse differential game problem of Definition 6.1 under the open-loop information structure appear to have been explicitly introduced in [14], and applied in a biological setting in [15]. Compared to the discrete-time inverse dynamic game setting of Chap. 5, surprisingly few methods of inverse differential games have been proposed in the economics literature, with most continuous-time treatments in economics seemingly limited to problems with state dynamics governed by continuous-time finite-state Markov processes [1] rather than differential equations. Minimum-Principle Methods: Minimum-principle methods for solving inverse differential game problems under the open-loop information structure emerged relatively recently, with the proposal of the whole-trajectory soft method (6.22) in [14], and the whole-trajectory mixed method (6.23) in [13]. Solution results for both of these methods appeared first in [13]. The soft method has since been extended in [2] to additionally deal with inequality constraints dependent on the states and controls. A minimum-principle method for solving a two-player inverse differential game problem under the feedback information structure was proposed in [16] under the assumption that the feedback strategy of one of the players is known and the cost functional of the other player is sought. An extension of this approach for solving an N -player inverse differential game problem is given in [8]. Inverse Linear-Quadratic Differential Games: The treatment of inverse LQ feedback differential games in Sect. 6.7 generalizes the inverse LQ optimal control results of Sect. 4.6 and related works (e.g., [4, 10, 12]). It thus deals with a natural N -player extension of the question “What optimization problems lead to a constant, linear control law?” first posed by Kalman in [9]. Additional Topics: Given the close relationship between inverse differential games and inverse optimal control, much of the additional work we discussed in Sect. 4.7 remains of interest, and could be extended, for inverse differential games. For example, inverse differential game problems in which given state and player control sequences must be processed online (i.e., sequentially without storing or processing them in a batch) have recently been investigated in [7], based on similar online treatments of inverse optimal control. In addition, inverse differential game problems involving solution concepts other than Nash equilibria are yet to be fully explored.
References 1. Arcidiacono P, Bayer P, Blevins JR, Ellickson PB (2016) Estimation of dynamic discrete choice models in continuous time with an application to retail competition. Rev Econ Stud 83(3):889– 931 2. Awasthi C, Lamperski A (2020) Inverse differential games with mixed inequality constraints. In: 2020 American control conference (ACC), pp 2182–2187. ISSN: 2378-5861 3. Basar T, Olsder GJ (1999) Dynamic noncooperative game theory, vol 23, 2nd edn. Academic, New York
226
6 Inverse Noncooperative Differential Games
4. Boyd SP, El Ghaoui L, Feron E, Balakrishnan V (1994) Linear matrix inequalities in system and control theory, vol 15. SIAM 5. Engwerda JC, van den Broek WA, Schumacher JM (2000) Feedback Nash equilibria in uncertain infinite time horizon differential games. In: Proceedings of the 14th international symposium of mathematical theory of networks and systems, MTNS 2000, pp 1–6 6. Inga J, Bischoff E, Molloy TL, Flad M, Hohmann S (2019) Solution sets for inverse noncooperative linear-quadratic differential games. IEEE Control Syst Lett 3(4):871–876 7. Inga J, Creutz A, Hohmann S (2021) Online inverse linear-quadratic differential games applied to human behavior identification in shared control. In: 2021 European control conference (ECC) 8. Inga Charaja JJ (2021) Inverse dynamic game methods for identification of cooperative system behavior. KIT Scientific Publishing. Publication Title: KIT Scientific Publishing 9. Kalman RE (1964) When is a linear control system optimal? J Basic Eng 86(1):51–60 10. Kong H, Goodwin G, Seron M (2012) A revisit to inverse optimality of linear systems. Int J Control 85(10):1506–1514 11. Lewis FL, Vrabie D, Syrmos VL (2012) Optimal control, 3rd edn. Wiley, New York 12. Menner M, Zeilinger MN (2018) Convex formulations and algebraic solutions for linear quadratic inverse optimal control problems. In: 2018 European control conference (ECC), pp 2107–2112 13. Molloy TL, Inga J, Flad M, Ford JJ, Perez T, Hohmann S (2020) Inverse open-loop noncooperative differential games and inverse optimal control. IEEE Trans Autom Control 65(2):897–904 14. Molloy TL, Ford JJ, Perez T (2017) Inverse noncooperative differential games. In: 2017 IEEE 56th annual conference on decision and control (CDC), Melbourne, Australia 15. Molloy TL, Garden GS, Perez T, Schiffner I, Karmaker D, Srinivasan M (2018) An inverse differential game approach to modelling bird mid-air collision avoidance behaviours. In: 18th IFAC symposium on system identification (SYSID, 2018), Stockholm, Sweden 16. Rothfuß S, Inga J, Köpf F, Flad M, Hohmann S (2017) Inverse optimal control for identification in non-cooperative differential games. In: IFAC 2017 world congress, Toulouse, France
Chapter 7
Examples and Experimental Case Study
In this book, we have presented a variety of methods and theoretical results for solving inverse optimal control and noncooperative dynamic game problems in both discrete and continuous time. In this final chapter, we illustrate and compare these methods in an application-inspired example, in two further examples, and then in an experimental case study. The application-inspired example serves to illustrate and compare the bilevel and minimum-principle methods of inverse noncooperative dynamic and differential games from Chaps. 5 and 6. By doing so under the open-loop information structure, we also implicitly illustrate and compare the inverse optimal control methods of Chap. 3 and 4. Through this example, we demonstrate that minimum-principle methods can match and exceed the performance of bilevel methods with only a tiny fraction of the computational effort. The two further examples we present later in this chapter serve to differentiate the soft and mixed minimum-principle methods of inverse optimal control, and inverse noncooperative dynamic and differential games. These examples demonstrate important failure or near-failure cases for soft methods. Together these examples highlight that employing appropriate inverse methods is crucial, even in idealized settings. The final part of this chapter is concerned with an experimental case study involving the use of inverse methods for linear-quadratic (LQ) feedback differential games (from Chap. 6) to identify human driver behavior for advanced driver assistance technology.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 T. L. Molloy et al., Inverse Optimal Control and Inverse Noncooperative Dynamic Game Theory, Communications and Control Engineering, https://doi.org/10.1007/978-3-030-93317-3_7
227
228
7 Examples and Experimental Case Study
7.1 Application-Inspired Example In this section, we present an example inspired by the use of inverse optimal control and inverse dynamic game theory to model human arm movements (e.g., [13]). We specifically compute the underlying objectives of a simulated combined human– robot system involving a robotic rehabilitation system where a human arm (player 1) is supported by an actuated arm exoskeleton (player 2). In this example, we consider both (discrete-time) methods of inverse dynamic games presented in Chap. 5, and (continuous-time) methods of inverse differential games presented in Chap. 6. The aim of this example is to illustrate some of the implementation issues and choices inherent to these methods.
7.1.1 System Model We model the human–robot system using the planar two-link robot arm model illustrated in Fig. 7.1. The parameters of the model are summarized in Table 7.1. This model has been widely used for modeling and analysis of robot arms moving in two dimensions (cf. [9]). With the coordinates defined in Fig. 7.1, the inverse dynamics1 of the two-link arm are given by [9] M(α)α¨ + C(α, α) ˙ α˙ + G(α) = τ, (7.1)
Fig. 7.1 Diagram of two-link robot arm model and coordinates
1
The phrase “inverse dynamics” is used here with its standard meaning from robotics and biomechanics as a description of the relationship between torques and arm movement—not as a technical concept from inverse optimal control or inverse dynamic game theory.
7.1 Application-Inspired Example Table 7.1 Parameters of two-link robot arm model Parameter Value m1 m2 r1
2.0 kg 1.5 kg 0.2 m
r2
0.2 m
l1 l2 I1 I2
0.4 m 0.4 m 0.047 kg.m2 0.035 kg.m2
229
Description Mass of link 1 Mass of link 2 Distance from the joint center to the center of mass for link 1 Distance from the joint center to the center of mass for link 2 Length of link 1 Length of link 2 Moment of inertia of link 1 Moment of inertia of link 2
where α = α(1) α(2) is the vector of joint angles, α˙ and α¨ denote the associated joint velocities and accelerations, respectively, and τ = τ(1) τ(2) are the joint torques.2 In addition, M(α) ∈ R2×2 , C(α, α) ˙ ∈ R2×2 , and G(α) ∈ R2 denote the inertia matrix, Coriolis matrix, and the gravity vector, respectively, and are given by a1 + 2a2 cos(α(2) ) a3 + a2 cos(α(2) ) a3 a3 + a2 cos(α(2) ) −a2 α˙ (2) sin(α(2) ) −a2 sin(α(2) )(α˙ (1) + α˙ (2) ) C(α, α) ˙ = 0 a2 α˙ (1) sin(α(2) ) b g cos(α(1) + α(2) ) + b1 g cos(α(1) ) G(α) = 2 b2 g cos(α(1) + α(2) ) M(α) =
where a1 = m 1r12 + m 2 (l12 + r22 ) + I1 + I2 a2 = m 2 l 1 r 2 a3 = m 2 r22 + I2 b1 = l1 m 2 + r1 m 1 b2 = r2 m 2 with the parameter values given in Table 7.1. To accommodate two players interacting through the two-link arm, we consider the joint torques τ ∈ R2 to be the sum of two independent torque inputs τ 1 ∈ R2 and τ 2 ∈ R2 , namely
We highlight that α and τ are functions of time t ∈ [0, ∞) but we shall omit their time arguments here for brevity (e.g., α = α(t)).
2
230
7 Examples and Experimental Case Study
τ = τ 1 + τ 2. Without loss of generality, we consider τ 1 ∈ R2 to be the torque selected by player 1 (the human) and τ 2 ∈ R2 to be the torque selected by player 2 (the actuated exoskeleton). By defining the state vector ⎤ ⎡ ⎤ α(1) x(1) (t) ⎢x(2) (t)⎥ ⎢α(2) ⎥ ⎥ ⎢ ⎥ x(t) = ⎢ ⎣x(3) (t)⎦ ⎣α˙ (1) ⎦ , α˙ (2) x(4) (t) ⎡
(7.2)
together with the player control vectors u 1 (t) τ 1 and u 2 (t) τ 2 for players 1 and 2, respectively, we obtain the continuous-time dynamics of the two-link arm model ⎤ x(3) (t) ⎥ ⎢ x(4)(t)
⎥ . x(t) ˙ =⎢ ⎦ ⎣ −1 x(3) (t) ˙ − G(α) + u 1 (t) + u 2 (t) M (α) −C(α, α) x(4) (t) ⎡
(7.3)
We shall later use these continuous-time dynamics to illustrate methods of inverse differential games presented in Chap. 6. Before doing so, however, we shall illustrate methods of inverse noncooperative dynamic games presented in Chap. 5.
7.1.2 Inverse Noncooperative Dynamic Game Simulations In order to illustrate inverse dynamic game methods from Chap. 5 using the two-link arm model (7.3), we must first compute discrete-time dynamics of the system and precompute open-loop Nash equilibrium state and control sequences to provide as (simulated) input data to the inverse methods. To compute discrete-time dynamics of the system, we linearize the continuoustime dynamics (7.3) around the equilibrium point x¯ = 0 − π2 0 0 . Then, discretizing these linearized dynamics with a sampling time of ΔT = 0.05 s (e.g., using MATLAB’s c2d routine) gives the discrete-time dynamics ⎡
xk+1
0.95 ⎢ 0.05 ⎢ =⎣ −1.79 2.03
0.04 0.91 1.37 −3.42
0.05 0 0.95 0.05
⎡ ⎤ 0 0.01 ⎢ 0.05 ⎥ ⎥ xk + ⎢ −0.02 ⎣ 0.38 0.04 ⎦ 0.91 −0.65
⎡ ⎤ −0.02 0.01 ⎢ 0.03 ⎥ ⎥ u 1 + ⎢ −0.02 −0.65 ⎦ k ⎣ 0.38 1.33 −0.65
⎤ −0.02 0.03 ⎥ ⎥ u2. −0.65 ⎦ k 1.33
(7.4)
7.1 Application-Inspired Example
231
To precompute open-loop Nash equilibrium state and control sequences to provide as input data, we consider the two players to be optimizing linearly parameterized cost functions of the form VTi (x[0,T ] , u 1[0,T −1] , u 2[0,T −1] , θ i ) =
T −1
θ i g¯ ki (xk , u 1k , u 2k )
(7.5)
k=0
for i ∈ {1, 2} over a finite horizon T = 14, with basis functions ⎤ (10x(1),k )2 ⎢(10x(2),k )2 ⎥ ⎥ g¯ ki (xk , u 1k , u 2k ) = ⎢ ⎣ (u i(1),k )2 ⎦ (u i(2),k )2 ⎡
(7.6)
for i ∈ {1, 2} where x( j),k and u ( j),k denote the jth components of xk and u k , respectively. These basis functions model the aims of both players to regulate the joint angles α(1) and α(2) toward zero, while minimizing their individual torque inputs.3 The players weigh the basis functions with the (true) parameters ⎡
⎤ 1 ⎢2⎥ ⎥ θ1 = ⎢ ⎣0.3⎦ 1
⎡
and
⎤ 1 ⎢0.8⎥ ⎥ θ2 = ⎢ ⎣0.2⎦ . 1
(7.7)
Given the dynamics (7.4) and cost functions (7.5), we compute open-loop Nash equilibrium state x[0,T ] and control sequences {u 1[0,T −1] , u 2[0,T −1] } from an initial state of x0 = π4 0 0 0 . We compute these sequences by using the dynamics (7.4) to express the states xk for all 0 ≤ k ≤ T as functions only of the control variables u 1k and u 2k for all 0 ≤ k < T (i.e., we “unwind” the recursions (7.4) describing the states). We substitute these state expressions into the player cost functions (7.5), thereby obtaining cost functions that depend only on the control variables {u 1[0,T −1] , u 2[0,T −1] }. We then find open-loop Nash equilibrium controls by searching for player controls {u 1[0,T −1] , u 2[0,T −1] } such that the (partial) derivatives of the player cost functions with respect to their associated controls are simultaneously 0, i.e., by finding {u 1[0,T −1] , u 2[0,T −1] } such that ∇u 1k VT1 = 0
and
∇u 2k VT2 = 0
(7.8)
for all 1 ≤ k < T . The associated open-loop Nash equilibrium state sequence x[0,T ] is then found by using these controls to evaluate the dynamic equations (7.4) from
We scale the first two components of the basis functions g¯ ki (xk , u 1k , u 2k ) by 102 to ensure that these terms are of the same order of magnitude as the last two components for numerical reasons.
3
232
7 Examples and Experimental Case Study
Fig. 7.2 Illustrative Example: (discrete-time) open-loop Nash equilibrium sequences for the twolink robot arm dynamic game
the initial state x0 . The open-loop Nash equilibrium sequences we compute (using MATLAB’s fsolve routine to solve (7.8)) are shown in Fig. 7.2. In the following, we examine the ability of inverse dynamic game methods from Chap. 5 to compute (or recover) the (true) parameters θ 1 and θ 2 given the precomputed open-loop Nash equilibrium state and control sequences x[0,T ] , u 1[0,T −1] , and u 2[0,T −1] , the dynamics (7.4), and the basis functions (7.6).
7.1.2.1
Bilevel Method
We first simulate the whole-sequence bilevel method of (5.14). We implement (5.14) in MATLAB, using the interior-point algorithm in fminsearch to perform the upper-level optimization over the parameters {θ 1 , θ 2 }. To implement the lower level optimization and find the open-loop Nash equilibrium state and control sequences θ 1θ 2θ x[0,T ] , u [0,T −1] , and u [0,T −1] , we use the same procedure used to precompute the 1 sequences x[0,T ] , u [0,T −1] , and u 2[0,T −1] (i.e., solving (7.8) using MATLAB’s fsolve routine). Given the precomputed state and control sequences x[0,T ] , u 1[0,T −1] , and u 2[0,T −1] , and initial parameter guesses of θ0i = 1 1 1 1 for all i ∈ {1, 2}, our implementation of the bilevel method (5.14) stops, with MATLAB’s default fminsearch stopping criteria, after 1001 iterations and 1520 function evaluations (i.e., solutions of the forward dynamic game), yielding the parameters
7.1 Application-Inspired Example
233
⎡
⎡
⎤ 1 ⎢2⎥ ⎥ θ1 = ⎢ ⎣0.3⎦ 1
and
⎤ 1 ⎢0.8⎥ ⎥ θ2 = ⎢ ⎣0.2⎦ . 1
(7.9)
These parameters are computed in approximately 3470 s.4 The parameters (7.9) computed by the bilevel method (6.14) are identical to the true parameters (7.7). However, clearly, the computational effort of the bilevel method is excessive due to its reliance on solving (forward) dynamic games for the candidate parameters selected by the upper-level optimization routine. We next illustrate the minimum-principle methods for solving inverse dynamic game problems, which avoid the solution of (forward) dynamic game problems.
7.1.2.2
Constraint-Satisfaction Method
To simulate the whole-sequence constraint-satisfaction method (5.21), we implement it in MATLAB by following Algorithm 5.1 for both players i ∈ {1, 2}. In contrast to the bilevel method (6.14), all steps in Algorithm 5.1 can be implemented with simple MATLAB commands without the need for optimization routines. In this example, the controls u 1k and u 2k are unconstrained (i.e., U 1 = U 2 = R2 ), and so we are free to select any (finite) sets of times K 1 and K 2 from {0, 1, . . . , T = 14}. Taking K 1 = K 2 = {0, 1} for simplicity, our implementation of the constraintsatisfaction method (5.21) given the precomputed sequences x[0,T ] , u 1[0,T −1] , and u 2[0,T −1] yields ⎡
1 ⎢ 10.907 ⎢ ξ¯C1 = ⎢ ⎢−27.663 ⎣ 7.6455 −22.088
⎤ 0 0 0 ⎥ −6.764 8.7377 0 ⎥ 21.317 0 −14.971⎥ ⎥ ⎦ −5.6214 11.991 0 17.22 0 −12.353
and ⎡
⎤ 1 ⎢−3.646 × 10−12 ⎥ ⎢ ⎥ −11 ⎥ ξ¯C1 ξ¯C1+ e¯1 = ⎢ ⎢−2.178 × 10−12 ⎥ ⎣ 2.6597 × 10 ⎦ 2.637 × 10−11 for player 1, and
4
All computational times in this book are for MATLAB R2021a on an Apple MacBook Air (M1, 2020) running macOS Big Sur.
234
7 Examples and Experimental Case Study
⎡
1 ⎢ 10.907 ⎢ ξ¯C2 = ⎢ ⎢−27.663 ⎣ 7.6455 −22.088
⎤ 0 0 0 −6.764 −27.478 0 ⎥ ⎥ 21.317 0 10.609⎥ ⎥ −5.6214 −15.742 0 ⎦ 17.22 0 8.3115
and ⎡
⎤ 1 ⎢−1.0631 × 10−12 ⎥ ⎢ ⎥ ¯ξC2 ξ¯C2+ e¯1 = ⎢−4.8956 × 10−12 ⎥ ⎢ ⎥ ⎣ 1.863 × 10−12 ⎦ 6.2474 × 10−12 for player 2. For player 1, we note that ξ¯C1 ξ¯C1+ e¯1 is equal to e¯1 = 1 0 0 0 0 (with a numerical tolerance of approximately 3 × 10−11 ), and the rank of ξ¯C1 is 4 = q 1 . By following the fifth and sixth steps of Algorithm 5.1, we thus have that the unique parameters computed by the constraint-satisfaction method (5.21) are ⎡
⎤ 1 ⎢2⎥ ⎥ θ 1 = ξ¯C1+ e¯1 = ⎢ ⎣0.3⎦ . 1
(7.10)
Similarly, for player 2, ξ¯C2 ξ¯C2+ e¯2 is equal to e¯1 = 1 0 0 0 0 (up to a numerical tolerance) and the rank of ξ¯C2 is 4 = q 2 . The unique parameters computed by the constraint-satisfaction method (5.21) are thus ⎡
⎤ 1 ⎢0.8⎥ ⎥ θ 2 = ξ¯C2+ e¯1 = ⎢ ⎣0.2⎦ . 1
(7.11)
Our implementation takes approximately 0.08 s to compute the parameters of both players (or around 0.04 s for each player). Clearly, the parameters computed by the constraint-satisfaction method (5.21) with the times K 1 = K 2 = {0, 1} match the true values in (7.7), unlike the bilevel method (5.14). The constraint-satisfaction method also uses only a tiny fraction of the time required by the bilevel method. By re-running the constraint-satisfaction method (5.21) with a variety of different time sets K i , we have observed that it is not at all sensitive to the specific times in K i . However, it fails to yield unique parameters when the sets K 1 and K 2 contain less than 2 (unique) elements. This mode of failure is somewhat intuitive since there i i = θ(4) = 1, and are two unique unknown elements in each vector θ i given that θ(1)
7.1 Application-Inspired Example
235
only one Hamiltonian gradient condition present in ξ¯Ci when |K i | = 1 (i.e., there are more unknowns than equations).
7.1.2.3
Soft Method
We next implement the soft method of (5.22) in MATLAB. Although we could follow Algorithm 5.2, constructing the matrices ξ S1 and ξ S2 is nontrivial since it involves factorizing terms into quadratic forms. We, therefore, depart from Algorithm 5.2 and instead solve (5.22) directly for each player by using MATLAB’s fmincon i = 1 (and a maximum number of 8000 function routine with the constraint that θ(1) evaluations and 200 iterations). Given the precomputed sequences x[0,T ] , u 1[0,T −1] , and u 2[0,T −1] , our implementation of the soft method (5.22) yields the parameters ⎡
⎡ ⎤ ⎤ 1 1 ⎢2⎥ ⎢0.8⎥ 1 2 ⎢ ⎥ ⎥ θ =⎢ ⎣0.3⎦ and θ = ⎣0.2⎦ . 1 1 Our implementation takes approximately 4 s to compute the parameters of both players (or 2 s for each player). Although the parameters computed by the soft method match the true parameters (7.7), our implementation requires more time to compute them than the constraintsatisfaction method. While this time increase is mostly due to us resorting to the direct use of MATLAB’s fmincon, we note that, in practice, the time required to construct the matrices ξ S1 and ξ S2 and create more efficient implementations is significant since the matrices ξ S1 and ξ S2 need to be individually constructed for the specific given sequences x[0,T ] , u 1[0,T −1] , and u 2[0,T −1] . In contrast, constructing the matrices required by the constraint satisfaction and mixed methods is straightforward. The soft method is also not applicable in situations with constrained controls and fails in several other situations that we shall illustrate later in Sect. 7.2.
7.1.2.4
Mixed Method
The final inverse dynamic game method that we simulate is the mixed method of (5.23) summarized in Algorithm 5.3. We implement Algorithm 5.3 in MATLAB with simple commands (i.e., without optimization routines). We again note that, in this example, we can select any (finite) sets of times K 1 and K 2 from {0, 1, . . . , 14}. Given the precomputed sequences x[0,T ] , u 1[0,T −1] , and u 1[0,T −1] , and using K 1 = K 2 = {0, 1}, our implementation of the mixed method (5.23) yields
236
7 Examples and Experimental Case Study
⎡
1 ξ¯ M
⎤ 828.32 −126.51 −531.87 ⎦ 0 = ⎣−126.51 220.13 −531.87 0 376.74
with ⎡
1 νM
⎤ −1086.8 = ⎣ 186.98 ⎦ 687
for player 1, and ⎡
2 ξ¯ M
⎤ 828.32 274.35 369.29 = ⎣274.35 1002.8 0 ⎦ 369.29 0 181.64
with ⎡
2 νM
⎤ −1086.8 = ⎣−420.05⎦ −477.07
1 2 for player 2. Since both ξ M and ξ M are full rank, the mixed method computes the unique player cost-functional parameters
⎡
⎤ 1 ⎢2⎥ ⎥ θ1 = ⎢ ⎣0.3⎦ 1
⎡
⎤ 1 ⎢0.8⎥ ⎥ and θ 2 = ⎢ ⎣0.2⎦ . 1
Our implementation takes a total of approximately 0.05 s to compute the parameters (or 0.025 s for each player). The performance of the mixed method (5.23) thus matches that of the constraintsatisfaction method (5.21), with both methods computing the true parameters (7.7) using K 1 = K 2 = {0, 1}, and both methods only requiring a small fraction of the computational times required by the bilevel (5.14) and soft (5.22) methods. The practical benefit of the mixed method is that it does not concern itself with the theoretical existence condition ξ¯Ci ξ¯Ci+ e¯1 = e¯1 , which we can only satisfy in this example with our implementation of the constraint-satisfaction method by introducing a numerical tolerance of about 3 × 10−11 for our tests of equality. As in the case of the constraint-satisfaction method (5.21), by re-running the mixed method (5.23) with a variety of different time sets K i , we have observed that it is insensitive to the specific times in K i . However, it again fails to yield unique parameters when the sets K 1 and K 2 contain less than 2 (unique) element.
7.1 Application-Inspired Example
237
Having illustrated inverse dynamic game methods from Chap. 5, we next illustrate (continuous-time) inverse noncooperative differential game methods from Chap. 6.
7.1.3 Inverse Noncooperative Differential Game Simulations To illustrate inverse differential game methods, we again open-loop Nash equilibrium trajectories to provide as (simulated) input data. For this purpose, we consider the two players to be optimizing cost functionals of the form T VTi (x, u 1 , u 2 , θ i )
=
θ i g¯ i (t, x(t), u 1 (t), u 2 (t))dt
(7.12)
0
for i ∈ {1, 2} over a finite horizon T = 1.5 s, with basis functions ⎤ (x(1) (t) − π4 )2 ⎢ (x(2) (t))2 ⎥ ⎥ g i (t, x(t), u 1 (t), u 2 (t)) = ⎢ ⎣ (u i(1) (t))2 ⎦ (u i(2) (t))2 ⎡
(7.13)
for i ∈ {1, 2}. The basis functions model the aims of both players to regulate the joint angles α(1) and α(2) toward π4 and zero, respectively, while minimizing the torques used. The players weigh these basis functions with the (true) parameters ⎡
⎤ 1 ⎢1.5⎥ ⎥ θ1 = ⎢ ⎣0.5⎦ 0.5
⎡
and
⎤ 1 ⎢1.5⎥ ⎥ θ2 = ⎢ ⎣0.1⎦ . 0.1
(7.14)
Given the nonlinear continuous-time dynamics (7.3) and cost functionals (7.12), we compute open-loop Nash equilibrium state x and control trajectories {u 1 , u 2 } π from an initial state of x0 = 0 2 0 0 . We compute these trajectories by using the Hamiltonian gradient condition (6.8) of Theorem 6.1, in the form ∇u i H i (t, x(t), u 1 (t), u 2 (t), λi (t), 1, θ i ) = 0 for i ∈ {1, 2}, to express the controls u 1 (t) and u 2 (t) in the dynamics (6.5) and costate equations (6.6) as functions of the state x(t) and costates λ1 (t) and λ2 (t). Elimination of the control variables leads to a two-point boundary value problem in terms of the states and costates that we solve numerically using MATLAB’s bvp4c routine. The resulting open-loop Nash equilibrium trajectories x, u 1 , and u 2 are shown in Fig. 7.3.
238
7 Examples and Experimental Case Study
Fig. 7.3 Illustrative Example: (continuous-time) open-loop Nash equilibrium trajectories for the two-link arm differential game
In the following, we examine the ability of inverse differential game methods from Chap. 6 to compute (or recover) the parameters θ 1 and θ 2 given the precomputed trajectories x, u 1 , and u 2 , the dynamics (7.3), and the basis functions (7.13).
7.1.3.1
Bilevel Method
We first simulate the whole-trajectory bilevel method of (6.14). We implement (6.14) in MATLAB, with the upper-level optimization over the parameters {θ 1 , θ 2 } implemented with the fminsearch numerical solver and its interior-point algorithm. The lower level optimization to find the open-loop Nash equilibrium state and control trajectories x θ , u 1θ , and u 2θ involves the same two-point boundary value problem formulation and bvp4c routine used to precompute the given trajectories x, u 1 , and u2. Given trajectories x, u 1 , and u 2 , and the initial parameters the precomputed i θ0 = 1 0.1 0.1 0.1 for all i ∈ {1, 2}, our implementation of the bilevel method (6.14) stops, with MATLAB’s default fminsearch stopping criteria, after 563 iterations and 858 function evaluations (i.e., solutions of the forward differential game), yielding the parameters ⎡
⎤ 1 ⎢1.4724⎥ ⎥ θ1 = ⎢ ⎣0.4810⎦ 1.2943
⎡
and
⎤ 1 ⎢1.4722⎥ ⎥ θ2 = ⎢ ⎣0.0982⎦ . 0.0923
(7.15)
7.1 Application-Inspired Example
239
Fig. 7.5 Illustrative Example: trajectory-error versus iterations of the bilevel method (6.14)
Sum of squared errors
Fig. 7.4 Illustrative Example: comparison of given trajectories x, u 1 , u 2 with the trajectories x θ , u 1θ , and u 2θ arising from parameters (7.15) computed by the bilevel method (6.14) 1500 1000 500 0 0
100
200
300
400
500
600
700
Iterations
The open-loop Nash equilibrium trajectories x θ , u 1θ and u 2θ that these parameters give rise to are shown in Fig. 7.4. Our implementation takes approximately 2565 s to compute the parameters. The parameters (7.15) computed by the bilevel method (6.14) clearly differ from the true parameters (7.14). However, as Fig. 7.4 shows, the difference in parameters leads only to slight differences between the state and control trajectories. By running our implementation of the bilevel method with a stricter fminsearch stopping criteria (i.e., a greater number of iterations), the parameters it computes tend to approach the true parameters. As shown in Fig. 7.5, however, the convergence of the bilevel method’s trajectory-error optimization objective in (6.14) (i.e., the sum of squared errors between the states and controls) is relatively slow with respect to the number of iterations. Increasing the number of iterations also leads to a roughly linear increase in the computational time required. We next consider the whole-trajectory constraint-satisfaction method of (6.21), which we expect to be significantly more computationally efficient due to its avoidance of two levels of numerical optimization.
240
7.1.3.2
7 Examples and Experimental Case Study
Constraint-Satisfaction Method
We implement the whole-trajectory constraint-satisfaction method (6.21) in MATLAB by following Algorithm 6.1 for each player. We use the MATLAB ode113 routine with its default tolerances to solve the differential equations (6.34) for λ¯ 1 and λ¯ 2 with terminal conditions λ¯ 1 (T ) = 0 and λ¯ 2 (T ) = 0. All other steps in Algorithm 6.1 involve simple MATLAB commands. As in the discrete-time case of this example, the controls u 1 (t) and u 2 (t) are free of constraints (i.e., U 1 = U 2 = R2 ), and so we can select any (finite) sets of times K 1 and K 2 in the interval [0, 1.5]. Taking K 1 = K 2 = {0, 0.75, 1.5} for simplicity, our implementation of the constraint-satisfaction method (6.21) given the precomputed trajectories x, u 1 , and u 2 yields ⎡
1 ⎢−531.89 ⎢ ⎢ 1745.6 ⎢ ¯ξC1 = ⎢−4.0728 ⎢ ⎢ 11.806 ⎢ ⎣ 0 0
⎤ 0 0 0 ⎥ 350.84 7.0746 0 ⎥ −1159.5 0 0.99185 ⎥ ⎥ ⎥ 2.5602 0.43423 0 ⎥ −7.8468 0 0.019923⎥ ⎥ ⎦ 0 0 0 0 0 0
and ⎡
⎤ 1 ⎢ 7.7557 × 10−6 ⎥ ⎢ ⎥ ⎢ 3.1185 × 10−6 ⎥ ⎢ ⎥ −4 ⎥ ξ¯C1 ξ¯C1+ e¯1 = ⎢ ⎢−1.2636 × 10−4 ⎥ ⎢−1.5526 × 10 ⎥ ⎢ ⎥ ⎣ ⎦ 0 0 for player 1, and ⎡
1 ⎢−531.89 ⎢ ⎢ 1745.6 ⎢ ¯ξC2 = ⎢−4.0728 ⎢ ⎢ 11.806 ⎢ ⎣ 0 0 and
⎤ 0 0 0 ⎥ 350.84 35.373 0 ⎥ −1159.5 0 4.9593 ⎥ ⎥ ⎥ 2.5602 2.1711 0 ⎥ −7.8468 0 0.099614⎥ ⎥ ⎦ 0 0 0 0 0 0
7.1 Application-Inspired Example
241
⎡
⎤ 1 ⎢ 7.7557 × 10−6 ⎥ ⎢ ⎥ ⎢ 3.1185 × 10−6 ⎥ ⎢ ⎥ −4 ⎥ ξ¯C2 ξ¯C2+ e¯1 = ⎢ ⎢−1.2636 × 10−4 ⎥ ⎢−1.5526 × 10 ⎥ ⎢ ⎥ ⎣ ⎦ 0 0 for player 2. For player 1, we note that ξ¯C1 ξ¯C1+ e¯1 is numerically close (up to 3 decimal places) to e¯1 = 1 0 0 0 0 0 0 and the rank of ξ¯C1 is 4, so following Algorithm 6.1, the unique parameters computed by the constraint-satisfaction method (6.21) are ⎡
⎤ 1 ⎢ 1.506 ⎥ ⎥ θ 1 = ξ¯C1+ e¯1 = ⎢ ⎣0.49995⎦ . 0.51602
(7.16)
Similarly, for player 2, ξ¯C2 ξ¯C2+ e¯2 is numerically close to e¯1 = 1 0 0 0 0 0 0 and the rank of ξ¯C2 is 4, so the unique parameters computed by the constraint-satisfaction method (6.21) are ⎡
⎤ 1 ⎢ 1.506 ⎥ ⎥ θ 2 = ξ¯C2+ e¯1 = ⎢ ⎣0.09999⎦ . 0.1032
(7.17)
Computing the parameters of both players with our implementation takes approximately 0.25 s (or 0.125 s for each player). The parameters computed by the constraint-satisfaction method (6.21) with the selected times K 1 = K 2 = {0, 0.75, 1.5} differ slightly from the true values of ⎡
⎤ 1 ⎢1.5⎥ ⎥ θ1 = ⎢ ⎣0.5⎦ 0.5
⎡
and
⎤ 1 ⎢1.5⎥ ⎥ θ2 = ⎢ ⎣0.1⎦ . 0.1
They are, however, much closer to the true parameters than the parameters computed by the bilevel method (7.15). The constraint-satisfaction method also computes its parameters with only a tiny fraction of the computation effort required by the bilevel method (our implementation of the constraint-satisfaction method is about 10000 times faster than our implementation of the bilevel method). The error in the parameters computed by the constraint-satisfaction method is due primarily to the choice of the times in K i , and the tolerances of the ODE solver used to solve the differential equations (6.34). Increasing the number of (unique) times
242
7 Examples and Experimental Case Study
in K i and reducing the tolerances for the ODE solver both have the potential to decrease the error in the computed parameters, but at the cost of an increase in computational effort. Indeed, by instead selecting the times K 1 = K 2 = {0, 0.5, 1, 1.5} in this example, the constraint-satisfaction method (6.21) yields the more accurate parameters ⎡
⎡ ⎤ ⎤ 1 1 ⎢ 1.5059 ⎥ ⎢ ⎥ ⎥ and θ 2 = ⎢ 1.5059 ⎥ . θ1 = ⎢ ⎣0.50045⎦ ⎣0.10009⎦ 0.50402 0.1008 By repeating this example with a variety of different time sets K i , we have observed that the conditions ξ¯C1 ξ¯C1+ e¯1 = e¯1 and ξ¯C2 ξ¯C2+ e¯1 = e¯1 appear to be grossly violated by all sets of times K 1 and K 2 with less than 3 (unique) elements. The constraint-satisfaction method (6.21) then fails to compute unique (or accurate) parameters using these sets of times. This mode of failure appears intuitive since i = 1, and less than there are three unknown elements in each vector θ i given that θ(1) i i ¯ three Hamiltonian gradient conditions present in ξC when |K | < 3 (i.e., there are more unknowns than equations).
7.1.3.3
Soft Method
To avoid selecting the sets of times K 1 and K 2 , we consider the soft method of (6.22). We implement (6.22) in MATLAB for each player by following Algorithm 6.2. We use the ode113 routine with its default tolerances to solve the Riccati differential equations (6.40) for P 1 (0) and P 2 (0) with terminal conditions P 1 (T ) = P 2 (T ) = 0. All other steps in Algorithm 6.2 are implemented with simple MATLAB commands. Given the precomputed trajectories x, u 1 , and u 2 , for player 1, our implementation of the soft method (6.22) yields ⎡ ξ¯S1 =
and
0.2521 0.16075 ⎢ −0.0093944 ⎣ 0.16107 −0.31494 −0.11001 0.78853
0.16075 6.1851 052583 0.08669 −0.17232 1.9624 −0.37222
−0.0093944 052583 0.034394 −0.015247 0.0087714 −0.015072 0.26629
0.16107 0.08669 −0.015247 0.45616 −0.16106 −1.135 −0.44663
−0.31494 −0.17232 0.0087714 −0.16106 0.48529 0.12748 −1.9925
−0.11001 1.9624 −0.015072 −1.135 0.12748 4.3394 −0.24489
0.78853 ⎤ −0.37222 0.26629 ⎥ −0.44663 ⎦ −1.9925 −0.24489 20.859
7.1 Application-Inspired Example
243
⎤ 0.019337 ⎢ 0.0068223 ⎥ ⎥ ⎢ ⎢−0.0012332⎥ ⎥ ⎢ ⎥ ν S1 = ⎢ ⎢ 0.072751 ⎥ . ⎢ −0.019774 ⎥ ⎥ ⎢ ⎣ −0.18396 ⎦ −0.1052 ⎡
The rank of ξ¯S1 is 7, which satisfies q 1 + n − 1 = 4 + 4 − 1 = 7. Following Algorithm 6.2, we thus have that the unique parameters computed by the soft-constrained method (6.22) are ⎡
⎤ 1 ⎢ 1.5014 ⎥ ⎥ θ1 = ⎢ ⎣0.50088⎦ . 0.50141 Similarly, for player 2, our implementation of the soft method (6.22) yields ⎡ ξ¯S2 =
0.2521 0.80373 ⎢ −0.046995 ⎣ 0.16107 −0.31494 −0.10999 0.78854
0.80373 154.63 0.013312 0.43346 −0.86157 9.8119 −1.8613
−0.046995 0.013312 0.85986 −0.076219 0.04388 −0.075474 1.3315
0.16107 0.43346 −0.076219 0.45616 −0.16106 −1.135 −0.44663
−0.31494 −0.86157 0.04388 −0.16106 0.4853 0.12745 −1.9925
−0.10999 9.8119 −0.075474 −1.135 0.12745 4.3395 −0.24478
0.78854 ⎤ −1.8613 1.3315 ⎥ −0.44663 ⎦ −1.9925 −0.24478 20.859
and ⎤ 0.019336 ⎢ 0.034112 ⎥ ⎥ ⎢ ⎢−0.0061633⎥ ⎥ ⎢ ⎥ ν S2 = ⎢ ⎢ 0.072751 ⎥ . ⎢ −0.019774 ⎥ ⎥ ⎢ ⎣ −0.18396 ⎦ −0.10519 ⎡
The rank of ξ¯S2 is 7 (which corresponds to q 2 + n − 1 = 4 + 4 − 1), and so following Algorithm 6.2, we have that the unique parameters computed by the soft-constrained method (6.22) are ⎡
⎤ 1 ⎢ 1.4861 ⎥ ⎥ θ2 = ⎢ ⎣0.099051⎦ . 0.09893 The time required by our implementation to compute both θ 1 and θ 2 is approximately 0.34 s (or 0.17 s for each player).
244
7 Examples and Experimental Case Study
The parameters computed by the soft method (6.22) differ slightly from the true values in (7.14), but are closer to the true parameters than the parameters computed by the bilevel method (cf. (7.15)). They are also similar to those computed by the constraint-satisfaction method (cf. (7.16) and (7.17)), without the need to (carefully) select times in sets like K i . Indeed, the error in the parameters computed by the soft method is due entirely to the selected tolerances of the ODE solver used to solve the Riccati differential equations (6.40). The error in parameters can thus be reduced (at the cost of reduced computational speed) by decreasing the ODE solver’s tolerances. Importantly, the soft method retains the same enormous computational speed advantage over the bilevel method that the constraint-satisfaction method has without requiring the (careful) selection of times in sets like K i . Unfortunately, however, the soft method is not applicable in situations with constrained controls and in several other situations that we shall illustrate later in Sect. 7.2.
7.1.3.4
Mixed Method
The final method we simulate is the mixed method of (6.23), which is summarized in Algorithm 6.3. As in the case of the constraint-satisfaction method, we use the MATLAB ode113 routine with its default tolerances to solve the differential equations (6.34) for λ¯ 1 and λ¯ 2 with terminal conditions λ¯ 1 (T ) = 0 and λ¯ 2 (T ) = 0. All other steps in Algorithm 6.3 are implemented with simple MATLAB commands. We note again that, in this example, we are free to select any (finite) sets of times K 1 and K 2 in the interval [0, 1.5]. Taking again K 1 = K 2 = {0, 0.75, 1.5}, our implementation of the mixed method (6.23) given the precomputed trajectories x, u 1 , and u 2 yields ⎡
1 ξ¯ M
⎤ 1.4675 × 106 2483.2 −1150.2 ⎦, 50.239 0 = ⎣ 2483.2 −1150.2 0 0.98417
with ⎡
1 νM
⎤ −2.2106 × 106 = ⎣ −3764.7 ⎦ 1731.6
and 2 ξ¯ M
with
⎡ ⎤ 1.4675 × 106 12416 −5750.8 ⎦ 1256 0 = ⎣ 12416 −5750.8 0 24.604
7.1 Application-Inspired Example
245
⎡
2 νM
⎤ −2.2106 × 106 = ⎣ −18823 ⎦ . 8658
1 2 and ξ M are full rank, the mixed method computes the unique player Since both ξ M cost-functional parameters
⎡
⎡ ⎤ ⎤ 1 1 ⎢ 1.506 ⎥ ⎢ 1.506 ⎥ 2 ⎢ ⎥ ⎥ θ1 = ⎢ ⎣0.49995⎦ and θ = ⎣0.09999⎦ . 0.51602 0.1032 Our implementation takes a total of approximately 0.24 s to compute the parameters (or 0.12 s for each player). The performance of this mixed method is thus essentially identical to that we observed for the constraint-satisfaction method (both in terms of the error in the parameters and the computational time). The mixed method also appears to share the same mode of failure as the constraint-satisfaction method, with us repeating 1 and this example with different sets of times K i and observing that the matrices ξ¯ M 2 1 2 ¯ξ M appear rank deficient for sets of times K and K with less than 3 (unique) elements. This similarity is perhaps not surprising since the two methods exploit the same conditions but make use of them in slightly different ways. The practical benefit of the mixed method is, however, that it does not concern itself with the theoretical existence condition ξ¯Ci ξ¯Ci+ e¯1 = e¯1 (which, even in this idealized example, we can only satisfy by introducing a numerical tolerance of 3 decimal places).
7.1.4 Summary of Application-Inspired Illustrative Example The following observations have arisen from the example considered in this section (in both the discrete-time setting of inverse noncooperative dynamic games and the continuous-time setting of inverse noncooperative differential games). • Bilevel methods are significantly more computationally complex than methods based on minimum principles (i.e., constraint-satisfaction, soft, and mixed methods)—with our bilevel implementations taking about 10000 times longer than our implementations of the minimum-principle methods. • Soft and mixed methods offer similar performance in terms of parameter errors, but soft methods are slightly more computationally expensive (and less broadly applicable because they require unconstrained controls), while mixed methods require the selection of times for the sets K i . • Practically checking the existence of (exact) solutions in constraint-satisfaction methods (i.e., the condition ξ¯Ci ξ¯Ci+ e¯1 = e¯1 ) requires a level of numerical tolerance,
246
7 Examples and Experimental Case Study
making mixed methods potentially attractive alternatives since they offer identical performance without invoking this condition. • Both mixed and constraint-satisfaction methods can fail to yield unique (or accurate) parameters when the number of (unique) times in the sets K i are less than the number of unknown elements in the parameter vectors θ i .
7.2 Further Examples As we observed in the last section, soft and mixed minimum-principle methods offer similar performance in situations where they are both applicable. Unfortunately, however, soft methods are not applicable in situations with constrained controls. Furthermore, as we shall show in this section, it is possible to construct simple examples in which soft methods fail entirely, or in which the (computationally expensive) singular value decomposition (SVD) steps in them (cf. Algorithms 3.2, 4.2, 5.2, and 6.2) become crucial for averting failure. The aim of this section is to highlight that despite the satisfactory performance of the soft methods in the previous section, in the vast majority of situations, the mixed methods are much less troublesome and the superior approach.
7.2.1 Failure Case for Soft Method The first example we consider in this section is a case in which soft methods fail to compute unique parameters. For this example, consider the continuous-time finitehorizon LQ optimal control problem inf u
s.t.
T
θ g¯ (t, x(t), u(t)) dt
0
x(t) ˙ = u(t), t ∈ [0, T ] u(t) ∈ R, t ∈ [0, T ] x(0) = 1
with T = 5 and where ⎡ ⎤ ⎡ ⎤ 1 (u(t))2 θ = ⎣2⎦ and g(t, ¯ x(t), u(t)) = ⎣ (x(t))2 ⎦ . 1 x(t)u(t)
(7.18)
7.2 Further Examples
247
Fig. 7.6 Failure Case for Soft Method: continuous-time optimal state and control trajectories
The state and control trajectories x and u that solve this optimal control problem5 are shown in Fig. 7.6. Given these state and control trajectories along with the dynamics and basis functions, we examine the ability of the soft and mixed methods of continuous-time inverse optimal control from Chap. 4 to (re)compute the true costfunctional parameters.
7.2.1.1
Failure of Soft Method
Consider the soft method (4.13) in the form of Algorithm 4.2. Solving the Riccati differential equation (4.31) yields the matrix ⎡
⎤ 0.92836 −0.82822 −0.82822 ξ¯S = ⎣−0.82822 0.99994 0.99994 ⎦ −0.82822 0.99994 0.99994 and the vector ⎡
⎤ 0.48584 ν S = ⎣−1.1718⎦ . −1.1718 The matrix ξ¯S is rank deficient (i.e., its rank is 2), and so following Algorithm 4.2, we must compute an SVD of ξ¯S . Computing an SVD of ξ¯S yields the matrix ⎡
⎤ −0.54039 −0.84141 −1.0659 × 10−8 −0.70711 ⎦ U S = ⎣ 0.59498 −0.38211 0.59498 −0.38211 0.70711 with the submatrix of interest being
5
These trajectories can be found and verified as unique using standard results for linear-quadratic optimal control problems (e.g., [2, Sect. 3.4]).
248
7 Examples and Experimental Case Study
U S12 =
−1.0659 × 10−8 . −0.70711
Clearly, U S12 = 0 and so following Algorithm 4.2, we thus have that the soft method (4.13) cannot compute unique cost-functional parameters in this example.
7.2.1.2
Success of Mixed Method
Consider now the mixed method (4.14) and its reformulation in Algorithm 4.3 with K = {0, 2.5, 5}. Solving the differential equation (4.23) and evaluating the matrix ξ M in (4.34) yields the submatrix ξ¯ M =
2.6300 0.0021287 0.00326 2.6674 × 10−6
and the vector νM =
−5.26 . −0.006526
Clearly, ξ¯ M is full rank, and so following Algorithm 4.3, we have that the mixed method (4.14) gives the unique cost-functional parameters ⎡ ⎤ 1 θ = ⎣2⎦ , 1 which match the true parameters used to generate the given trajectories. Furthermore, by repeating the application of the mixed method (4.14) with other choices of times for the set K , we have been unable to find any K ⊂ [0, 5] with greater than two (unique) elements such that the mixed method (4.14) fails to uniquely compute the true parameters.
7.2.1.3
Summary
This example illustrates that the soft method of continuous-time inverse optimal control can fail in situations where the mixed method of continuous-time inverse optimal control succeeds. Additional examples in which soft methods of discretetime inverse optimal control, inverse dynamic games, and inverse differential games fail can be constructed by simply including basis functions that cross-multiply states and controls (as in g¯ (3) (t, x(t), u(t)) = x(t)u(t) in this example).
7.2 Further Examples
249
The conditions required by mixed methods and soft methods to compute unique solutions are thus not equivalent in general, with those required by mixed methods likely to be more easily (and widely) satisfied than of those required by soft methods.
7.2.2 Importance of SVDs for Soft Method The second example we consider in this section is a case in which (potentially computationally complex) SVDs become important in the soft methods to compute unique parameters. For this example, consider two players moving in two dimensions with player i’s position vector (i ∈ {1, 2}) being
x¯ i x¯ (t) (1) i x¯(2)
i
and player i’s velocity vector being i ˙ ˙x¯ i (t) x¯(1) . i x˙¯(2) Let us then define the state of a two-player (continuous-time) differential game as the vector of positions and velocities, namely ⎡
⎤ x¯ 1 (t) ⎢x˙¯ 1 (t)⎥ ⎥ x(t) = ⎢ ⎣x¯ 2 (t)⎦ . x˙¯ 2 (t) Let the state evolve according to the kinematic equations x(t) ˙ = from an initial state of
0 2 A¯ 0 C¯ 1 x(t) + (t) + u (t) u C¯ 0 0 A¯
(7.19)
250
7 Examples and Experimental Case Study
⎤ −10 ⎢−10⎥ ⎥ ⎢ ⎢ −5 ⎥ ⎥ ⎢ ⎢ −5 ⎥ ⎥ x0 = ⎢ ⎢ 10 ⎥ ⎥ ⎢ ⎢ 0 ⎥ ⎥ ⎢ ⎣ 5 ⎦ 5 ⎡
where u i (t) ∈ R2 are the player acceleration control inputs, and the matrices 0I 0 A¯ and C¯ 00 I describe the mappings between position, velocity, and acceleration. Consider also player cost functionals of the form VTi (x, u 1 , u 2 , θ i )
T
θ i g¯ i (t, x(t), u 1 (t), u 2 (t))dt
(7.20)
0
for i ∈ {1, 2} with T = 15, parameters ⎡ ⎤ ⎡ ⎤ 1 1 ⎢1⎥ ⎢1⎥ ⎢ ⎥ ⎢ ⎥ 2 ⎥ ⎢ ⎥ θ1 = ⎢ ⎢1⎥ and θ = ⎢1⎥ , ⎣1⎦ ⎣5⎦ 5 1
(7.21)
and basis functions ⎡
⎤ x¯ 1 (t) − x¯ 2 (t) 2 i 2 ⎢ ⎥ (x¯(1) (t)) ⎢ ⎥ i 2 i 1 2 ⎢ ⎥. ( x ¯ (t)) g¯ (t, x(t), u (t), u (t)) = ⎢ (2) ⎥ i 2 ⎣ ⎦ (u (1) (t)) i 2 (u (2) (t)) The basis functions and parameters describe the objectives of players seeking to intercept each other (by minimizing the squared distance x¯ 1 (t) − x¯ 2 (t) 2 ) while reaching the origin (x¯ i (t) = 0) and minimizing accelerations. Following the procedure for finding open-loop Nash equilibria detailed in Sect. 7.1.3 (involving a two-point boundary value problem), we find trajectories x and {u 1 , u 2 } that constitute an open-loop Nash equilibrium for the two-player game with dynamics (7.19) and player cost functionals (7.20). The position components of the open-loop Nash equilibrium state trajectory x are shown in Fig. 7.7. Given the open-loop Nash equilibrium state and player control trajectories x and {u 1 , u 2 }, the
7.2 Further Examples
251
Fig. 7.7 Importance of SVDs for Soft Method: open-loop Nash equilibrium state trajectory
dynamics (7.19), and the player cost functionals (7.20), we examine the ability of the soft method (6.22) of (continuous-time) inverse differential games from Chap. 6 to recompute the true cost-functional parameters {θ 1 , θ 2 }. For player 1, following the reformulation of the soft method (6.22) in Algorithm 6.2 and solving the Riccati differential equation (6.40) yields the matrix ⎡
256.31 ⎢ 0 ⎢ ⎢ 46.876 ⎢ ⎢ 0 ⎢ ⎢ 13.545 ⎢ ⎢ 0 ⎢ ξ¯S1 = ⎢ ⎢ −4.1977 ⎢ ⎢ 0 ⎢ ⎢ 0 ⎢ ⎢ 0 ⎢ ⎣ 0 0
⎤ 0 46.876 0 13.545 0 −4.1977 0 0 0 0 0 860.93 0 22.952 0 20.073 0 −2.3504 0 0 0 0 ⎥ ⎥ 0 892.06 0 −0.40821 0 17.689 0 0 0 0 0⎥ ⎥ 22.952 0 206.46 0 −2.0723 0 11.112 0 0 0 0 ⎥ ⎥ 0 −0.40821 0 0.86603 0 −0.5 0 0 0 0 0⎥ ⎥ ⎥ 20.073 0 −2.0723 0 0.86603 0 −0.5 0 0 0 0 ⎥ ⎥ 0 17.689 0 −0.5 0 0.86603 0 0 0 0 0⎥ ⎥ −2.3504 0 11.112 0 −0.5 0 0.86603 0 0 0 0 ⎥ ⎥ 0 0 0 0 0 0 0 0 0 0 0⎥ ⎥ 0 0 0 0 0 0 0 0 0 0 0⎥ ⎥ 0 0 0 0 0 0 0 0 0 0 0⎦ 0 0 0 0 0 0 0 0 0 0 0
and the vector 633.96 ⎤ 794.1 56.608 ⎢ 28.522 ⎥ ⎢ 33.277 ⎥ ⎢ 20.198 ⎥ ⎥ ⎢ ⎢ −6.6328 ⎥ . ⎢ −3.2029 ⎥ ⎣ 0 ⎦ 0 0 0
⎡
ν S1 =
The matrix ξ¯S1 is clearly rank deficient (its rank is 8), and so following Algorithm 6.2 we must compute an SVD of ξ¯S1 . Computing an SVD of ξ¯S1 yields the matrix
252
7 Examples and Experimental Case Study ⎡
0.07297 ⎢ 0 ⎢ ⎢ 0.99715 ⎢ ⎢ 0 ⎢ ⎢ ⎢ 064 ⎢ ⎢ 0 U S1 = ⎢ ⎢ 0.01937 ⎢ ⎢ 0 ⎢ ⎢ 0 ⎢ ⎢ ⎢ 0 ⎢ ⎣ 0 0
0 −0.99912 0 −0.03486 0 −0.02320 0 0.00229 0 0 0 0
−0.99570 0 0.07248 0 −0.05350 0 0.02171 0 0 0 0 0
0 0.03437 0 −0.99784 0 0.01356 0 −0.05440 0 0 0 0
0 −0.02086 0 0.04451 0 0.76841 0 −0.63807 0 0 0 0
0.04360 0 −0.01985 0 −0.48389 0 0.87382 0 0 0 0 0
⎡
⎤ 0 0⎥ ⎥. 0⎦ 0
0.03688 0 0.00729 0 −0.87349 0 −0.48539 0 0 0 0 0
0 −0.01192 0 −0.03359 0 0.63940 0 0.76805 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1 0 0
⎤ 0 0⎥ ⎥ 0⎥ ⎥ ⎥ 0⎥ ⎥ 0⎥ ⎥ 0⎥ ⎥, 0⎥ ⎥ 0⎥ ⎥ ⎥ 1⎥ ⎥ 0⎥ ⎥ 0⎦ 0
from which we extract
U S1,12
0 ⎢0 =⎢ ⎣0 0
0 0 0 0
0 0 0 0
Since U S1,12 = 0, Algorithm 6.2 implies that the unique cost-functional parameters for player 1 given by the soft method are ⎡
1 ⎢0 ⎢ θ1 = ⎢ ⎢0 ⎣0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
⎡ ⎤ ⎤ 1 0 ⎢1⎥ 0⎥ ⎢ ⎥ ⎥ 1 ⎢ ⎥ 0⎥ ⎥ −ξ¯ 1+ ν 1 = ⎢1⎥ . S S ⎣1⎦ 0⎦ 5 0
Similarly, for player 2, following Algorithm 6.2 and solving the Riccati differential equation (6.40) gives the matrix ⎤ 814.65 0 21.941 0 0 0 0 0 −19.732 0 2.4351 0 ⎢ 0 19.135 0 7.6545 0 0 0 0 0 −0.1243 0 0.85248 ⎥ ⎥ ⎢ ⎥ ⎢ 21.941 0 227.69 0 0 0 0 0 1.975 0 −11.45 0 ⎥ ⎢ ⎢ 0 7.6545 0 260.19 0 0 0 0 0 0.80055 0 −10.319 ⎥ ⎥ ⎢ ⎥ ⎢ 0 0 0 0 0000 0 0 0 0 ⎥ ⎢ ⎥ ⎢ 0 0 0 0 0 0 0 0 0 0 0 0 ⎥ ξ¯ S2 = ⎢ ⎥ ⎢ 0 0 0 0 0000 0 0 0 0 ⎥ ⎢ ⎥ ⎢ 0 0 0 0 0 0 0 0 0 0 0 0 ⎥ ⎢ ⎥ ⎢ −19.732 0 1.975 0 0 0 0 0 0.86603 0 −0.5 0 ⎥ ⎢ ⎥ ⎢ 0 −0.1243 0 0.80055 0 0 0 0 0 0.86603 0 −0.5 ⎥ ⎢ ⎦ ⎣ 2.4351 0 −11.45 0 0 0 0 0 −0.5 0 0.86603 0 0 0.85248 0 −10.319 0 0 0 0 0 −0.5 0 0.86603 ⎡
and the vector
7.2 Further Examples
253 1192.3 ⎤ −47.698 ⎥ ⎢ 46.266 ⎥ ⎢ 13.455 ⎢ 0 ⎥ ⎢ 0 ⎥. ⎢ 0 ⎥ ⎢ 0 ⎥ ⎣ −33.277 ⎦ −20.198 6.6328 3.2029
⎡
ν S2 =
Again, the matrix ξ¯S2 is clearly rank deficient (with a rank of 8). Following Algorithm 6.2, we must therefore compute an SVD of ξ¯S2 , which yields the matrix ⎡
−0.999 ⎢ 0 ⎢ ⎢ −0.037 ⎢ ⎢ 0 ⎢ ⎢ 0 ⎢ ⎢ 0 2 US = ⎢ ⎢ 0 ⎢ ⎢ 0 ⎢ ⎢ 0.024 ⎢ ⎢ 0 ⎢ ⎣ −0.002 0
0 0.031 0 0.999 0 0 0 0 0 0.003 0 −0.040
0.037 0 −0.998 0 0 0 0 0 −0.012 0 0.051 0
0 0.998 0 −0.029 0 0 0 0 0 −0.010 0 0.064
0 −0.041 0 0.024 0 0 0 0 0 −0.855 0 0.516
0.021 0 −0.042 0 0 0 0 0 0.753 0 −0.656 0
0 0.048 0 −0.034 0 0 0 0 0 −0.518 0 −0.853
0.013 0 0.031 0 0 0 0 0 0.657 0 0.753 0
⎤ 0 0 0 0 0 0 0 0⎥ ⎥ 0 0 0 0⎥ ⎥ 0 0 0 0⎥ ⎥ −1 0 0 0⎥ ⎥ 0 0.764 −0.645 0 ⎥ ⎥. 0 0.645 0.764 0 ⎥ ⎥ 0 0 0 1⎥ ⎥ 0 0 0 0⎥ ⎥ 0 0 0 0⎥ ⎥ 0 0 0 0⎦ 0 0 0 0
Extracting ⎡
U S2,12
0 ⎢0 =⎢ ⎣0 0
0 0 0 0
0 0 0 0
⎤ 0 0⎥ ⎥ 0⎦ 0
implies that the unique cost-functional parameters for player 2 given by the soft method are ⎡ ⎤ ⎡ ⎤ 1 1000000000000 ⎢0 1 0 0 0 0 0 0 0 0 0 0 0⎥ ⎢1⎥ ⎢ ⎥ ⎢ ⎥ 1 ⎢ ⎥ ⎥ θ2 = ⎢ ⎢ 0 0 1 0 0 0 0 0 0 0 0 0 0 ⎥ −ξ¯ 2+ ν 2 = ⎢1⎥ . S S ⎣5⎦ ⎣0 0 0 1 0 0 0 0 0 0 0 0 0⎦ 1 0000100000000
7.2.2.1
Summary
The player cost-functional parameters computed by the soft method of inverse differential games (6.22) in this example match the true parameters (7.21) despite the matrices ξ¯S1 and ξ¯S2 being rank deficient (and hence failing the first test of uniqueness in Algorithm 6.2). Thus, this example highlights the importance of the (potentially computationally expensive) SVD steps in Algorithm 6.2. We note that since the
254
7 Examples and Experimental Case Study
matrices ξ¯S1 and ξ¯S2 are rank deficient, the (costate) functions λ1 and λ2 yielded by the soft method (6.22) are nonunique in this example. We also note that because the mixed method (6.23) only involves optimization over the player cost-functional parameters, in this example it yields unique player cost-functional parameters (in the same manner it has for all of the other examples we have considered in this chapter). Additional examples demonstrating the importance of the SVD steps in soft methods (including those for inverse optimal control and inverse dynamic games) are easily constructed by considering dynamics similar to those in (7.19) in which subsets of states and controls evolve independently, except for any interaction through cost functionals or cost functions. For example, a collision-avoidance game demonstrating the same effect is presented in [15, Sect. V.A]. Having shown through simulations that mixed methods are superior minimumprinciple methods for solving inverse optimal control problems and inverse noncooperative dynamic and differential game problems under the open-loop information structure, we next present a real-data experimental case study involving the solution of an inverse differential game problem under the feedback information structure using the inverse LQ feedback differential game method of Sect. 6.7.
7.3 Experimental Case Study: Identification of Human Behavior for Shared Control Shared control describes the simultaneous control of a dynamical system by one or more humans or machines in a common perception-action cycle. It has proved particularly important for the development of Advanced Driving Assistance Systems (ADAS) that control vehicle dynamics in tandem with human drivers. A key challenge in the design of systems for safe and effective shared control (including ADAS) is enabling machines to identify and adapt to the behavior of different humans. In this section, we conduct a study using real experimental data to evaluate the ability of inverse differential game methods to identify human driver behavior for shared control. Specifically, we seek to identify cost functionals for pairs of humans simultaneously acting on a system of coupled haptic steering wheels using the inverse LQ feedback differential games method of Sect. 6.7.
7.3.1 Experimental Setup The experimental setup represents a simplified vehicle lateral control scenario with two human drivers (study participants). The setup is shown diagrammatically in Fig. 7.8 and consists of four main components: • two haptically-coupled (active) steering wheels; • two monitors;
7.3 Experimental Case Study: Identification of Human Behavior for Shared Control
255
PC
Real Time System
Fig. 7.8 Diagram of experimental setup with visualization illustrated on the monitors
• a real-time dSPACE processing unit; and, • a desktop computer (PC). Figure 7.9 shows a picture of one of the driving simulators (a steering wheel and monitor) used by one of the test subjects.6 Each of the steering wheels is equipped with an incremental encoder that measures steering angles with a sampling frequency of f s = 100 Hz and a precision of 40000 increments per full rotation. In addition, each steering wheel is active due to electrical motors on their shaft. The motors are used to realize: • dynamics of a spring–damper system individually for each steering wheel; and, • a virtual haptic coupling between both steering wheels. The virtual coupling is implemented in MATLAB/Simulink 2010b using real-time environment within the real-time processing unit. This is also used to establish communication between all components. The haptic coupling consists of an automatic controller that emulates a virtual spring–damper element between both steering wheels, and thus resists nonzero angular differences between the steering wheels. The desktop computer interacts with the real-time processing unit to generate a visualization window on each monitor that provides visual feedback as to the angle of the associated steering wheel. The visualization (illustrated on the monitors depicted in Fig. 7.8) consists of a marker (green square) with a vertical position fixed at 75% of the window height, that moves horizontally on the screen according to the angle of the respective steering wheel. The range of steering wheel angles mapped onto the screen is [−180◦ , 180◦ ), where a positive angle corresponds to a counterclockwise rotation. In addition to the marker, a reference line is shown in each visualization. The reference lines scroll downwards (top-to-bottom) through the window at a constant 6
The driving simulator and experimental setup for the study is located at the Institute of Control Systems of Karlsruhe Institute of Technology (KIT).
256
7 Examples and Experimental Case Study
Fig. 7.9 Driving simulator as one part of the experimental setup (only the middle monitor was used for our experiments)
speed, with a single point crossing the entire visualization window in 2 s. The aim of the participants (humans) is to have their individual markers track the (common) reference trajectory by means of turning their corresponding steering wheel.
7.3.2 Model Structure The haptic coupling between the steering wheels means that each participant is seeking to achieve their individual objective of following the reference trajectory in the presence of another decision-maker (i.e., the other participant). This scenario is thus modeled by means of a differential game under the feedback information structure (cf. Sect. 2.5), and indeed a noncooperative differential game with unknown player cost functionals since different participants (i.e., the players) are likely to exhibit different styles or behaviors in approaching the task. The structure of this differential game model is as follows. We assume a simplified model of the experimental setup based on the assumption of an ideal coupling of the two steering wheels, leading to a single angle ϕ and angular velocity ϕ˙ of the coupled system. With this assumption, the dynamics of the system of coupled steering wheels can be described by ⎡ ⎤ ⎤ ⎤ ⎡ 1 1 cc dc − − x(t) ˙ = ⎣ Isum Isum ⎦ x(t) + ⎣ Isum ⎦ u 1 (t) + ⎣ Isum ⎦ u 2 (t) 0 0 1 0 ⎡
where the state vector is
(7.22)
7.3 Experimental Case Study: Identification of Human Behavior for Shared Control Table 7.2 Experimental system model parameters Parameter Value
Description
kg m2
Isum
0.094
cc dc
1.146 Nm/rad 0.859 Nm · s/rad
x(t)
257
Rotational inertia of the coupled steering wheels Spring constant Damping constant
ϕ(t) ˙ , ϕ(t)
and the player (participant) controls u i (t) ∈ R with i ∈ {1, 2} are the steering-wheel torques. The variable Isum denotes the sum of the moments of inertia of both steering wheels. All system parameters are given in Table 7.2. We model the player cost functionals as quadratic in the control values, and quadratic in the error between the actual states x(t) and the state reference trajectory z(t) (defined by the reference line shown to the players on their monitors). That is, ∞ i V∞
e(t) Q i e(t) + u i (t) R ii u i (t) dt, i ∈ {1, 2},
(7.23)
0
where e(t) x(t) − z(t) and the cost matrices are of the form
Q i(1) 0 Q = 0 Q i(2)
i
(7.24)
with R ii ∈ R. The state reference trajectory is specifically defined by ϕ˙ ref (t) , z(t) ϕref (t)
(7.25)
with ϕ˙ref (t) being the reference value of the steering angle velocity and ϕref (t) being the reference value of the steering angle. The reference trajectory of the steering angle ϕref (t) corresponds to the one visible on the monitor and is equal for both players. No particular reference trajectory for the steering angle velocity is specified, either visually or verbally, and so we take ϕ˙ref (t) = 0. We note that the cost functionals (7.23) define tracking optimal control problems for each player. In order to solve the inverse (and forward) LQ differential game problem, we need to transform the tracking problem into a standard LQ regulation problem. This transformation is done using the approach presented in [14]. Let us therefore introduce the extended state
258
7 Examples and Experimental Case Study
⎡
⎤ ϕ(t) ˙ X (t) ⎣ ϕ(t) ⎦ . ϕref (t) Given (7.22) and assuming a (near) constant reference (noting that ϕ˙ref (t) = 0), the dynamics of this extended state are ⎡
cc dc − ⎢ X˙ (t) = ⎣ Θsum Θsum 1 0 0 0 −
⎤
⎡ 1 ⎤ ⎡ 1 ⎤ ⎥ ⎢ ⎥ ⎥ ⎢ X (t) + ⎣ Θsum ⎦ u 1 (t) + ⎣ Θsum ⎦ u 2 (t). 0 0 0⎦ 0 0 0 0
(7.26)
Let us also introduce the transformation matrix 1 0 0 Z = 0 1 −1 such that e(t) = Z X (t), and the original cost functionals of (7.23) are equivalent to i = V∞
with
∞
exp(−γ t)X (t) Q˜ i X (t) + R ii (u i (t)) dt, 2
(7.27)
0
⎡
⎤ Q i(1) 0 0 Q˜ i Z Q i Z = ⎣ 0 Q i(2) −Q i(2) ⎦ , 0 −Q i(2) Q i(2)
(7.28)
and where we introduce a discount factor γ ∈ [0, 1) to guarantee the convergence of the cost functionals. In this case, a factor of γ = 0.01 is chosen. Now, as shown in [14], we can absorb the discounting terms in the cost functionals (7.27) into the state dynamics (7.26), so that this new problem becomes equivalent to a standard LQ differential game with undiscounted player cost functionals.
7.3.3 Experimental Protocol The reference trajectory that was selected for the experiment is a series of step functions. These represent goal-oriented or point-to-point movements, also known as reaching movements. This choice is due to the widespread consideration of such movements in studies on human motor behavior, both from a neuroscience and biological perspective [6, 11, 12] as well as from a control-theoretical perspective [1, 4]. The pair of participants performed these movements for approximately 36 s. The data used for identification corresponds to four point-to-point movements defined by
7.3 Experimental Case Study: Identification of Human Behavior for Shared Control
259
the fixed positions (120◦ , 0◦ , −120◦ , 0◦ , 120◦ ). The given trajectories for identifying the player cost matrices Q i and R ii in this experiment thus correspond to (truncated) state trajectories X (t) and control trajectories u 1 (t) and u 2 (t).
7.3.4 Inverse Methods for Parameter Estimation Given the truncated state and control trajectories, and the LQ feedback differential game model structure, we estimate the player cost matrices Q i using the feedbackstrategy-based (FSB) inverse LQ differential game method given in Algorithm 6.4. As a comparison, we also consider the bilevel method given in (6.17). We note that for the avoidance of trivial solutions, we fixed and R ii = 1 for i ∈ {1, 2}. In contrast to the simulations presented in Sects. 7.1 and 7.2, no ground truth cost-functional parameters are available in this real application. Therefore, we shall evaluate the identification results by using the estimated cost functions to generate estimated trajectories x θ , u 1θ , and u 2θ and comparing them with the measured trajectories. We shall analyze the ability of the inverse methods to compute parameters that explain the observed trajectories.
7.3.5 Results We first implemented the bilevel method of (6.17) analogously to Sect. 7.1.3.1 with the main difference being that the lower level optimization generates feedback Nash equilibrium trajectories x θ , u 1θ , and u 2θ by solving an LQ feedback differential game (via the solution of coupled algebraic Riccati equations). Given the measured experimental data, our implementation of the bilevel method stops, with the MATLAB default stopping criteria for fminsearch, after 54 iterations and 876 function evaluations (i.e., solutions of the forward LQ feedback differential game), yielding, after transforming back to the cost-functional form (7.24), the parameters Q1 =
0 0 0 31.2223
and
Q2 =
0 0 . 0 41.9623
(7.29)
Our implementation of the bilevel method takes approximately 227 s to compute these parameters. Next, we implemented the FSB method of inverse LQ differential games of Algorithm 6.4. We used the MATLAB function quadprog to solve the quadratic program (6.74) with the constraints implied by the matrix structure in (7.28). Given the measured experimental data, our implementation of this FSB method yields the cost matrices
260
7 Examples and Experimental Case Study
Fig. 7.10 Experimental case study results Table 7.3 Root-mean-squared (RMS) state and control errors for the experimental case study Method RMS error ϕ˙ ϕ u1 u2 FSB Bilevel
1.3258 1.0582
0.1992 0.1238
Q1 =
0 0 0 15.6234
1.6551 1.6465
and
Q2 =
0 0 . 0 27.3669
2.1376 2.0884
(7.30)
Our implementation of the FSB method takes approximately 0.5 s to compute these parameters (including the estimation of the feedback control laws). The (feedback) Nash equilibrium trajectories that arise from using the parameters computed by both the bilevel and FSB methods to solve the forward LQ differential game are shown in Fig. 7.10, along with the given (measured) trajectories. The root mean squared (RMS) errors between the trajectories arising from these computed parameters and the measured data are reported in Table 7.3.
7.3 Experimental Case Study: Identification of Human Behavior for Shared Control
261
7.3.6 Discussion We observe that both the parameters identified by the FSB and the bilevel method are able to generate Nash equilibrium trajectories that are able to explain the result of the human–human haptic interaction, specifically the measured state trajectories. The smaller RMS errors achieved by the bilevel method can be explained by noting that this is the criterion that it directly minimizes, whereas the FSB method relies on the assumption that the trajectories correspond (closely) to a Nash equilibrium. We note that the FSB method performs well given its (significant) reduction in computational complexity compared to the bilevel method, and its performance could be further improved by adopting a variety of the spline-based preprocessing techniques proposed for minimum-principle methods in [10] and references therein. For both methods, there are certain parts of the control trajectories that are not closely approximated (specifically around jumps in the reference trajectory). One potential source of error for both methods is the inexact model of the experimental setup, i.e., the system of virtually coupled steering wheels. In particular, it is conceivable that the model becomes inaccurate the more dynamic the interaction between the human participants, as shown for example at t = 1.75 s when the torque is changed abruptly by one the test subjects. In these cases, the assumed stiff coupling of the dynamic model (7.26) is too restrictive. In addition, the LQ differential game model and in particular the linear feedback strategies cannot describe situations that are not in line with the aim of the experiment, e.g., when one of the subjects turns the steering wheel far too early. However, in general, the overall behavior of the humans is adequately described by both methods, and the practical performance of the FSB method is greatly encouraging given its superior computational efficiency.
7.4 Notes and Further Reading

The vast majority of papers discussed in Sects. 3.7, 4.7, 5.8, and 6.8 detail simulation studies of bilevel and/or minimum-principle methods. Further detailed simulation and application studies of bilevel methods are presented in [3, 5, 17–19]. Simulation and/or experimental comparisons between bilevel methods and minimum-principle methods (specifically soft methods) are presented in [8, 10, 16]. Furthermore, [8] and [10] study and discuss a suite of practical considerations related to the use of minimum-principle methods, including specification of basis functions, noise performance, and practical techniques for preprocessing noise-corrupted trajectories (e.g., using spline interpolation). Finally, a comparison of recent emerging inverse methods within the same experimental setup as Sect. 7.3 is provided in [7].
References

1. Albrecht S, Ramírez-Amaro K, Ruiz-Ugalde F, Weikersdorfer D, Leibold M, Ulbrich M, Beetz M (2011) Imitating human reaching motions using physically inspired optimization principles. In: 2011 11th IEEE-RAS international conference on humanoid robots, pp 602–607
2. Anderson BDO, Moore JB (1990) Optimal control: linear quadratic methods. Prentice Hall, Englewood Cliffs
3. Berret B, Chiovetto E, Nori F, Pozzo T (2011) Evidence for composite cost functions in arm movement planning: an inverse optimal control approach. PLoS Comput Biol 7(10)
4. Chackochan VT, Sanguineti V (2017) Modelling collaborative strategies in physical human–human interaction. In: Ibáñez J, González-Vargas J, Azorín JM, Akay M, Pons JL (eds) Converging clinical and engineering research on neurorehabilitation II, vol 15. Springer International Publishing, Cham, pp 253–258
5. Flad M (2019) Differential-game-based driver assistance system for fuel-optimal driving. In: Petrosyan LA, Mazalov VV, Zenkevich NA (eds) Frontiers of dynamic games: game theory and management, St. Petersburg, 2018, Static & dynamic game theory: foundations & applications. Springer International Publishing, Cham, pp 13–36
6. Flash T, Henis E (1991) Arm trajectory modifications during reaching towards visual targets. J Cognit Neurosci 3(3):220–230
7. Inga J, Creutz A, Hohmann S (2021) Online inverse linear-quadratic differential games applied to human behavior identification in shared control. In: 2021 European control conference (ECC)
8. Inga Charaja JJ (2021) Inverse dynamic game methods for identification of cooperative system behavior. KIT Scientific Publishing
9. Jin W, Kulić D, Mou S, Hirche S (2021) Inverse optimal control from incomplete trajectory observations. Int J Rob Res 40(6–7):848–865
10. Johnson M, Aghasadeghi N, Bretl T (2013) Inverse optimal control for deterministic continuous-time nonlinear systems. In: 2013 IEEE 52nd annual conference on decision and control (CDC), pp 2906–2913
11. Kalaska JF (2009) From intention to action: motor cortex and the control of reaching movements. In: Sternad D (ed) Progress in motor control. Number 629 in Advances in experimental medicine and biology. Springer, US, pp 139–178
12. Krakauer JW, Mazzoni P (2011) Human sensorimotor learning: adaptation, skill, and beyond. Curr Opin Neurobiol 21(4):636–644
13. Li W, Todorov E, Liu D (2011) Inverse optimality design for biological movement systems. IFAC Proc Vol 44(1):9662–9667
14. Modares H, Lewis FL (2014) Linear quadratic tracking control of partially-unknown continuous-time systems using reinforcement learning. IEEE Trans Autom Control 59(11):3051–3056
15. Molloy TL, Inga J, Flad M, Ford JJ, Perez T, Hohmann S (2020) Inverse open-loop noncooperative differential games and inverse optimal control. IEEE Trans Autom Control 65(2):897–904
16. Molloy TL, Ford JJ, Perez T (2017) Inverse noncooperative differential games. In: 2017 IEEE 56th annual conference on decision and control (CDC), Melbourne, Australia
17. Molloy TL, Garden GS, Perez T, Schiffner I, Karmaker D, Srinivasan M (2018) An inverse differential game approach to modelling bird mid-air collision avoidance behaviours. In: 18th IFAC symposium on system identification (SYSID 2018), Stockholm, Sweden
18. Mombaur K, Truong A, Laumond J-P (2010) From human to humanoid locomotion–an inverse optimal control approach. Auton Rob 28(3):369–383
19. Oguz OS, Zhou Z, Glasauer S, Wollherr D (2018) An inverse optimal control approach to explain human arm reaching control based on multiple internal models. Sci Rep 8(1):5583
Index
A
Admissible player controls, 25, 32
Advanced driver assistance systems (ADAS), 254
Algebraic Riccati equation (ARE)
  Continuous-time, 133
  Coupled continuous-time, 219
  Coupled discrete-time, 179
  Discrete-time, 87
Approximate solution or optimality, 47, 103, 150, 195

B
Basis functions, 57, 112, 159, 205, 231
Bilevel methods, 47, 103, 150, 196
Bilevel optimization, 47
Boundary of set, 13

C
Closed-loop system matrix, 86, 133, 178, 219
Constraint-satisfaction method
  Truncated-sequence, 55
  Truncated-sequence open-loop, 157
  Truncated-trajectory, 109
  Truncated-trajectory open-loop, 203
  Whole-sequence, 51
  Whole-sequence open-loop, 154
  Whole-trajectory, 105
  Whole-trajectory open-loop, 200
Constraint-satisfaction method reformulation
  Truncated-sequence, 74
  Truncated-sequence open-loop, 168
  Truncated-trajectory, 128
  Truncated-trajectory open-loop, 214
  Whole-sequence, 60
  Whole-sequence open-loop, 160
  Whole-trajectory, 114
  Whole-trajectory open-loop, 207
Constraint-satisfaction method solution results
  Truncated-sequence, 76
  Truncated-sequence open-loop, 169
  Truncated-trajectory, 130
  Truncated-trajectory open-loop, 214
  Whole-sequence, 63
  Whole-sequence open-loop, 161
  Whole-trajectory, 119
  Whole-trajectory open-loop, 208
Continuously differentiable, 13
Continuous time, 20
Continuous-time inverse optimal control, 102
Continuous-time optimal control, 20
  Finite-horizon, 21
  Infinite-horizon, 21
  Parameterized, 98
Control-constraint set, 11, 16, 21
  Player, 25, 32
Controls, 1
Control sequence, 17
Control trajectory, 21
Convex optimization, 13
Costate function, 22
Costate vector, 18
Cost function(al), 1, 11, 16, 17, 21
  Parameterized, 42, 98, 144, 190
  Player, 25, 33
  Quadratic, 86, 133, 178, 218
Cost matrices, 86, 133, 178, 218
D
Decision-maker, 3
Diagonal matrix, 14
Difference equations, 16
Differential equations, 21
Discrete time, 16
Discrete-time inverse optimal control, 46
Discrete-time optimal control, 16
  Finite-horizon, 16
  Infinite-horizon, 17
  Parameterized, 42
Dynamical system, 1
  Continuous-time, 21
  Discrete-time, 16
Dynamic programming, 2
Dynamics. see Dynamical system

E
Empty set, 12
Exact solution or optimality, 47, 103, 150, 195

F
Feasible, 12
Feedback, 26, 85, 132
Feedback (control) law, 2, 85, 132
Feedback-law-based (FLB) method
  Continuous-time, 136
  Discrete-time, 91
Feedback-law-based (FLB) problem
  Continuous-time, 134
  Discrete-time, 88
Feedback-strategy-based (FSB) problem, 180, 220
Forward problem, 1

G
Game theory, 3
Generalized inverse, 15
Gradient, 12

H
Hamiltonian function
  Continuous-time, 22
  Discrete-time, 18
  Parameterized, 43, 99
  Parameterized Player, 145, 191
  Player, 29, 36
Hamiltonian gradient, 58, 112
  Player, 159, 206
Horizon, 16, 20
I
Identifiability, 95
Identity matrix, 14
Inactive Control Constraint Times, 50
Infeasible, 12
Infimum, 12
Information structure, 26, 33
  Feedback, 26, 33
  Open-loop, 26, 33
Interior of set, 13
Inverse linear-quadratic (LQ) feedback differential game method, 217
Inverse linear-quadratic (LQ) feedback dynamic game method, 177
Inverse linear-quadratic (LQ) optimal control approach
  Continuous-time two-step, 132
  Discrete-time two-step, 85
Inverse noncooperative differential game, 5
Inverse noncooperative dynamic game, 5
Inverse optimal control, 2
Inverse problem, 1
Inverse reinforcement learning, 3, 5
Inverse two-player zero-sum dynamic game, 185
Invertible matrix, 14

K
Karush–Kuhn–Tucker (KKT) conditions, 94
Kronecker product, 88

L
Linear-least-squares estimation, 93, 138, 183, 224
Linearly parameterized, 57, 112, 159, 205, 231
Linear ordinary differential equation, 132
Linear-quadratic (LQ) optimal control, 2
  Continuous-time, 133
  Discrete-time, 86
Lipschitz, 21

M
Maximum principle, 2
Minimizing argument, 12
Minimum, 11
Minimum principle, 2
  Continuous-time finite-horizon, 22
  Continuous-time infinite-horizon, 23
  Discrete-time finite-horizon, 18
  Discrete-time infinite-horizon, 19
  Parameterized continuous-time finite-horizon, 99
  Parameterized continuous-time horizon-invariant, 100
  Parameterized continuous-time infinite-horizon, 99
  Parameterized discrete-time finite-horizon, 44
  Parameterized discrete-time horizon-invariant, 45
  Parameterized discrete-time infinite-horizon, 44
Misspecified, 47
Mixed method
  Truncated-sequence, 56
  Truncated-sequence open-loop, 157
  Truncated-trajectory, 111
  Truncated-trajectory open-loop, 204
  Whole-sequence, 53
  Whole-sequence open-loop, 155
  Whole-trajectory, 107
  Whole-trajectory open-loop, 201
Mixed method reformulation
  Truncated-sequence, 75
  Truncated-sequence open-loop, 173
  Truncated-trajectory, 128
  Truncated-trajectory open-loop, 214
  Whole-sequence, 62
  Whole-sequence open-loop, 166
  Whole-trajectory, 118
  Whole-trajectory open-loop, 213
Mixed method solution results
  Truncated-sequence, 82
  Truncated-sequence open-loop, 174
  Truncated-trajectory, 130
  Truncated-trajectory open-loop, 214
  Whole-sequence, 69
  Whole-sequence open-loop, 167
  Whole-trajectory, 125
  Whole-trajectory open-loop, 214
Moore–Penrose pseudoinverse, 14
N
Nash equilibrium, 3, 27, 34
Nash equilibrium conditions
  Continuous-time finite-horizon feedback, 38
  Continuous-time finite-horizon open-loop, 37
  Discrete-time finite-horizon feedback, 31
  Discrete-time finite-horizon open-loop, 30
  Parameterized continuous-time finite-horizon open-loop, 191
  Parameterized continuous-time horizon-invariant feedback, 193
  Parameterized continuous-time horizon-invariant open-loop, 192
  Parameterized discrete-time finite-horizon feedback, 147
  Parameterized discrete-time finite-horizon open-loop, 146
  Parameterized discrete-time horizon-invariant open-loop, 147
Necessary condition, 13
Noncooperative, 3
Noncooperative differential game, 5, 32
  Linear-quadratic (LQ) feedback, 218
  Parameterized, 190
Noncooperative dynamic game, 3, 5, 24
  Linear-quadratic (LQ) feedback, 178
  Parameterized, 144
Noncooperative dynamic game theory, 3
Nonsingular matrix, 14
Nonzero-sum game, 25

O
Online or sequential inverse problems, 95
Open-loop, 26
Optimal control, 1
Optimality conditions, 12

P
Parameter set, 42, 98, 144, 190
  Fixed-element, 63, 118, 160, 206
Persistence of excitation, 64, 95
Player, 3
Pontryagin, 2, 5
Positive definite, 14
Positive semidefinite, 14
Potential game theory, 186
Pseudoinverse. see Moore–Penrose pseudoinverse
Q
Quadratic cost function(al). see Cost function(al), 178, 218
Quadratic form, 13
Quadratic program, 13

R
Rank, 14
Recovery matrix approach, 95
Reinforcement learning, 2
Riccati differential equation, 129
Riccati equation
  Algebraic. see Algebraic Riccati equation (ARE)
  Differential. see Riccati differential equation
Robot arm, 228

S
Select Inactive Control Constraint Times, 105
Shared control, 254
Singular value decomposition (SVD), 14
Soft method
  Truncated-sequence, 55
  Truncated-sequence open-loop, 157
  Truncated-trajectory, 110
  Truncated-trajectory open-loop, 203
  Whole-sequence, 52
  Whole-sequence open-loop, 155
  Whole-trajectory, 106
  Whole-trajectory open-loop, 201
Soft method reformulation
  Truncated-sequence, 74
  Truncated-sequence open-loop, 171
  Truncated-trajectory, 129
  Truncated-trajectory open-loop, 215
  Whole-sequence, 61
  Whole-sequence open-loop, 164
  Whole-trajectory, 117
  Whole-trajectory open-loop, 210
Soft method solution results
  Truncated-sequence, 79
  Truncated-sequence open-loop, 172
  Truncated-trajectory, 131
  Truncated-trajectory open-loop, 216
  Whole-sequence, 66
  Whole-sequence open-loop, 165
  Whole-trajectory, 121
  Whole-trajectory open-loop, 211
Solution concepts, 3
Stability, 87, 133
Stage cost, 17, 21, 25, 33
  Parameterized, 42, 57, 98, 112, 144, 159, 190, 205
States, 1
State sequence, 17
State trajectory, 21
Static optimization, 11
Strategy, 26, 33
  Feedback, 26, 33
  Open-loop, 26, 33
Structural estimation, 3
Sufficient condition, 13
Symmetric matrix, 13
System identification, 85
System matrices, 86, 133, 177, 218
System of linear equations, 15
  (In)consistent, 15
  Solutions, 15
T
Terminal cost, 17, 21, 25, 33
  Parameterized, 42, 57, 144, 159
Transpose, 12
Truncated-sequence (TS) problem, 46, 149
Truncated solution, 45
Truncated-trajectory (TT) problem, 102, 195

W
Whole-sequence (WS) problem, 46, 148
Whole-trajectory (WT) problem, 102, 194

Z
Zero-sum game, 25, 185