Advances in Industrial Control
Other titles published in this Series:

Digital Controller Implementation and Fragility
Robert S.H. Istepanian and James F. Whidborne (Eds.)

Optimisation of Industrial Processes at Supervisory Level
Doris Sáez, Aldo Cipriano and Andrzej W. Ordys

Robust Control of Diesel Ship Propulsion
Nikolaos Xiros

Hydraulic Servo-systems
Mohieddine Jelali and Andreas Kroll

Strategies for Feedback Linearisation
Freddy Garces, Victor M. Becerra, Chandrasekhar Kambhampati and Kevin Warwick

Robust Autonomous Guidance
Alberto Isidori, Lorenzo Marconi and Andrea Serrani

Dynamic Modelling of Gas Turbines
Gennady G. Kulikov and Haydn A. Thompson (Eds.)

Control of Fuel Cell Power Systems
Jay T. Pukrushpan, Anna G. Stefanopoulou and Huei Peng

Fuzzy Logic, Identification and Predictive Control
Jairo Espinosa, Joos Vandewalle and Vincent Wertz

Optimal Real-time Control of Sewer Networks
Magdalene Marinaki and Markos Papageorgiou

Process Modelling for Control
Benoît Codrons

Computational Intelligence in Time Series Forecasting
Ajoy K. Palit and Dobrivoje Popovic

Modelling and Control of mini-Flying Machines
Pedro Castillo, Rogelio Lozano and Alejandro Dzul

Rudder and Fin Ship Roll Stabilization
Tristan Perez

Hard Disk Drive Servo Systems (2nd Edition)
Ben M. Chen, Tong H. Lee, Kemao Peng and Venkatakrishnan Venkataramanan

Measurement, Control, and Communication Using IEEE 1588
John Eidson

Piezoelectric Transducers for Vibration Control and Damping
S.O. Reza Moheimani and Andrew J. Fleming

Windup in Control
Peter Hippe

Manufacturing Systems Control Design
Stjepan Bogdan, Frank L. Lewis, Zdenko Kovačić and José Mireles Jr.

Practical Grey-box Process Identification
Torsten Bohlin

Modern Supervisory and Optimal Control
Sandor A. Markon, Hajime Kita, Hiroshi Kise and Thomas Bartz-Beielstein
Publication due July 2006

Wind Turbine Control Systems
Fernando D. Bianchi, Hernán De Battista and Ricardo J. Mantz
Publication due August 2006

Soft Sensors for Monitoring and Control of Industrial Processes
Luigi Fortuna, Salvatore Graziani, Alessandro Rizzo and Maria Gabriella Xibilia
Publication due August 2006

Advanced Fuzzy Logic Technologies in Industrial Applications
Ying Bai, Hanqi Zhuang and Dali Wang (Eds.)
Publication due September 2006

Practical PID Control
Antonio Visioli
Publication due November 2006
Murad Abu-Khalaf, Jie Huang and Frank L. Lewis
Nonlinear H2/H∞ Constrained Feedback Control
A Practical Design Approach Using Neural Networks
With 47 Figures
Murad Abu-Khalaf, PhD Automation & Robotics Research Institute The University of Texas at Arlington Fort Worth, Texas USA
Jie Huang, PhD Department of Automation and Computer-aided Engineering Chinese University of Hong Kong Shatin, New Territories Hong Kong
Frank L. Lewis, PhD Automation & Robotics Research Institute The University of Texas at Arlington Fort Worth, Texas USA
British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library Library of Congress Control Number: 2006925302 Advances in Industrial Control series ISSN 1430-9491 ISBN-10: 1-84628-349-3 e-ISBN 1-84628-350-7 ISBN-13: 978-1-84628-349-9
Printed on acid-free paper
© Springer-Verlag London Limited 2006 MATLAB® and Simulink® are registered trademarks of The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098, U.S.A. http://www.mathworks.com Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers. The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use. The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made. Printed in Germany 987654321 Springer Science+Business Media springer.com
Advances in Industrial Control

Series Editors
Professor Michael J. Grimble, Professor of Industrial Systems and Director
Professor Michael A. Johnson, Professor (Emeritus) of Control Systems and Deputy Director

Industrial Control Centre, Department of Electronic and Electrical Engineering, University of Strathclyde, Graham Hills Building, 50 George Street, Glasgow G1 1QE, United Kingdom

Series Advisory Board

Professor E.F. Camacho, Escuela Superior de Ingenieros, Universidad de Sevilla, Camino de los Descubrimientos s/n, 41092 Sevilla, Spain
Professor S. Engell, Lehrstuhl für Anlagensteuerungstechnik, Fachbereich Chemietechnik, Universität Dortmund, 44221 Dortmund, Germany
Professor G. Goodwin, Department of Electrical and Computer Engineering, The University of Newcastle, Callaghan, NSW 2308, Australia
Professor T.J. Harris, Department of Chemical Engineering, Queen's University, Kingston, Ontario K7L 3N6, Canada
Professor T.H. Lee, Department of Electrical Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576
Professor Emeritus O.P. Malik, Department of Electrical and Computer Engineering, University of Calgary, 2500 University Drive NW, Calgary, Alberta T2N 1N4, Canada
Professor K.-F. Man, Electronic Engineering Department, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Professor G. Olsson, Department of Industrial Electrical Engineering and Automation, Lund Institute of Technology, Box 118, S-221 00 Lund, Sweden
Professor A. Ray, Department of Mechanical Engineering, Pennsylvania State University, 0329 Reber Building, University Park, PA 16802, USA
Professor D.E. Seborg, Chemical Engineering, 3335 Engineering II, University of California Santa Barbara, Santa Barbara, CA 93106, USA
Doctor K.K. Tan, Department of Electrical Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576
Professor Ikuo Yamamoto, Kyushu University Graduate School, Marine Technology Research and Development Program, MARITEC, Headquarters, JAMSTEC, 2-15 Natsushima, Yokosuka, Kanagawa 237-0061, Japan
To my parents, Suzan and Muhammad Samir
M. Abu-Khalaf

To Qingwei, Anne and Jane
J. Huang

To Galina
F. L. Lewis
Series Editors’ Foreword
The series Advances in Industrial Control aims to report and encourage technology transfer in control engineering. The rapid development of control technology has an impact on all areas of the control discipline: new theory, new controllers, actuators, sensors, new industrial processes, computer methods, new applications, new philosophies, new challenges. Much of this development work resides in industrial reports, feasibility study papers and the reports of advanced collaborative projects. The series offers an opportunity for researchers to present an extended exposition of such new work in all aspects of industrial control for wider and rapid dissemination.

Almost all physical systems are nonlinear, and the success of linear control techniques depends on the extent of the nonlinear system behaviour and the careful attention given to switching linear controllers through the range of nonlinear system operations. In many industrial and process-control applications, good engineering practice, linear control systems and classical PID control can give satisfactory performance because the process nonlinearity is mild and the control system performance specification is not particularly demanding; however, there are other industrial system applications where the requirement for high-performance control can only be achieved if nonlinear control design techniques are used. Thus, in some industrial and technological domains there is a strong justification for more applications of nonlinear methods.

One prevailing difficulty with nonlinear control methods is that they are not so easily understood, nor are they easy to reduce to formulaic algorithms for routine application. The abstract and often highly mathematical tools needed for nonlinear control systems design mean that there is often an "education gap" between the control theorist and the industrial applications engineer; a gap that is difficult to bridge and that prevents the widespread implementation of many nonlinear control methods.

The theorist/applications engineer "education gap" is only one aspect of the complex issues involved in the technology transfer of nonlinear control systems into industry. A second issue lies in the subject itself and involves the question of whether nonlinear control design methods are sufficiently mature actually to make the transfer to industry feasible and worthwhile. A look at the nonlinear control literature reveals many novel approaches being developed by the theorist but often
these methods are neither tractable nor feasible, nor has sufficient attention been given to the practical relevance of the techniques for industrial application. We hope through the Advances in Industrial Control series to explore these themes through suitable volumes and to try to create a corpus of monograph texts on applicable nonlinear control methods. Typically such volumes will make contributions to the range of applicable nonlinear-control-design tools, will provide reviews of industrially applicable techniques that try to unify groups of nonlinear control design methods, and will provide detailed presentations of industrial applications of nonlinear control methods and system technology.

This particular volume in Advances in Industrial Control by M. Abu-Khalaf, J. Huang and F.L. Lewis makes a contribution to increasing the range of applicable nonlinear control design tools. It starts from a very classical viewpoint: that performance can be captured by a suitably constructed cost function and that the appropriate control law emerges from the optimisation of the cost function. The difficulty is that the solution of these optimal control problems for the class of nonlinear state-space systems selected leads to intractable equations of the Hamilton–Jacobi type. The authors then propose and develop a solution route that exploits the approximation properties of various levels of complexity within nonlinear network structures. Namely, they use neural networks and exploit their "universal function approximation property" to compute tractable solutions to the posed nonlinear H2- and H∞-optimal-control problems.

Demonstrations of the methods devised are given for various numerical examples in Chapter 3; these include a nonlinear oscillator, a minimum-time control problem and a parabolic tracking system. Later in the volume, the nonlinear benchmark problem of a Rotational–Translational Actuator (RTAC) system is used to illustrate the power of the methods devised. An aerospace example using the control design for the F-16 aircraft normal acceleration regulator illustrates a high-performance output feedback control system application. Thus, the volume has an interesting set of applications examples to test the optimal control approximation techniques and demonstrate the performance enhancements possible.

This welcome entry to the Advances in Industrial Control monograph series will be of considerable interest to the academic research community, particularly those involved in developing applicable nonlinear-control-system methods. Research fellows and postgraduate students should find many items giving research inspiration or requiring further development. The industrial engineer will be able to use the volume's examples to see what the nonlinear control laws look like and by how much levels of performance can be improved by the use of nonlinear optimal control.

M.J. Grimble and M.A. Johnson
Industrial Control Centre
Glasgow, Scotland, U.K.
Preface
Modern Control Theory has revolutionized the design of control systems for aerospace systems, vehicles including automobiles and ships, industrial processes, and other highly complex systems in today's world. Modern Control Theory was introduced during the late 1950s and 1960s. Key features of Modern Control are the use of matrices, optimality design conditions, and probabilistic methods. It allows the design of control systems with guaranteed performance for multi-input/multi-output systems through the solution of formal matrix design equations. For linear state-space systems, the design equations are quadratic in form and belong to the general class known as Riccati equations. For systems in polynomial form, the design equations belong to the class known as Diophantine equations. The availability of excellent solution techniques for the Riccati and Diophantine design equations has brought forward a revolution in the design of control systems for linear systems. Moreover, mathematical analysis techniques have been effectively used to provide guaranteed performance and closed-loop stability results for these linear system controllers. This has provided confidence in modern control systems designed for linear systems, resulting in their general acceptance in communities including aerospace, process control, military systems, and vehicle systems, where performance failures can bring catastrophic disasters.

Physical systems are nonlinear. The push to extend the operating envelopes of such systems, for instance hyper-velocity and super-maneuverability performance in aerospace systems and higher data storage densities for computer hard disk drive systems, means that linear approximation techniques for controls design no longer work effectively. Therefore the design of efficient modern control systems hinges on the ability to use nonlinear system models. It is known that control systems design for general nonlinear systems can be performed by solving equations in the Hamilton–Jacobi (HJ) class. Unfortunately, control design for modern-day nonlinear systems is hampered because the HJ equations are impossible to solve exactly for general nonlinear systems.

This book presents computationally effective and rigorous methods for solving control design equations in the HJ class for nonlinear systems. The approach taken
is the approximation of the value functions of the HJ equations by nonlinear network structures such as neural networks. It is known that neural networks have many properties, some of them remarkable and none more important than the "universal function approximation property". In this book, we use neural networks to solve HJ equations to obtain nearly optimal solutions. The convergence of the solutions and the guaranteed performance properties of the controllers derived from them are rigorously shown using mathematical analysis techniques. The result of the nearly optimal solution procedures provided in this book is an extension to modern nonlinear systems of accepted and proven results like those already known for linear systems. Included are optimal controls design for nonlinear systems, H-infinity design for nonlinear systems, constrained-input controllers including minimum-time design for nonlinear systems, and other results that are essential for effective utilization of the full envelope of capabilities of modern systems.

The book is organized into eight chapters. In Chapter 1, preliminary results from four main areas are collected. These results can be thought of as the building blocks upon which the rest of the book relies. Chapter 2 introduces the policy iteration technique for constrained nonlinear optimal control systems. It is shown that one can solve the optimal control problem by iterative optimization. Chapter 3 introduces neural network training as a means to solve the iterative optimizations introduced in Chapter 2. Chapters 2 and 3 therefore introduce neural networks to the solution of optimal control problems for constrained nonlinear systems by using iterative optimization techniques based on policy iterations, dynamic programming principles, function approximation, and neural network training. In Chapter 4, the application of reinforcement learning to the zero-sum games appearing in H-infinity control is discussed. The result is an iterative optimization technique that solves the zero-sum game. Chapter 5 shows an implementation of neural networks for the solution of the iterative optimization problems in the case of zero-sum games. In Chapters 6 and 7, a systematic approach to the solution of the value function for the case of zero-sum games is shown in continuous time and discrete time, respectively. In this case, unlike the previous chapters, the solution is obtained directly without using neural networks or iterative optimizations. Chapter 8 addresses constraints on the measured output. The static output feedback problem for H-infinity control is treated, and an iterative method to solve for the static output feedback gain in the case of linear systems is presented. The work in Chapter 8 is based on collaborative research with Jyotirmay Gadewadikar, who contributed this chapter.

Simulations presented in this book are implemented using The MathWorks MATLAB software package.

Funding of the work reported by the first and third authors, mainly Chapters 1–5 and Chapter 8, was provided by the National Science Foundation through the Electrical and Communications Systems division under grant ECS-0501451, and by the Army Research Office under grant W91NF-05-1-0314. Jie Huang's work, which is limited to Chapters 6 and 7, was supported by the Hong Kong Research
Grants Council under grant CUHK 4168/03E, and by the National Natural Science Foundation of China under grant No. 60374038.
April 2006 Arlington, Texas
Murad Abu-Khalaf Jie Huang Frank L. Lewis
Contents
Mathematical Notation

1 Preliminaries and Introduction
1.1 Nonlinear Systems
1.1.1 Continuous-time Nonlinear Systems
1.1.2 Discrete-time Nonlinear Systems
1.2 Stability of Nonlinear Systems
1.2.1 Lyapunov Stability of Continuous-time Nonlinear Systems
1.2.2 Lyapunov Stability of Discrete-time Nonlinear Systems
1.3 Dissipativity of Nonlinear Systems
1.3.1 Dissipativity of Continuous-time Nonlinear Systems
1.3.2 Dissipativity of Discrete-time Nonlinear Systems
1.4 Optimal Control of Nonlinear Systems
1.4.1 Dynamic Programming and the HJB Equation
1.4.2 Discrete-time HJB Equation
1.5 Policy Iterations and Optimal Control
1.5.1 Policy Iterations and H2 Optimal Control
1.5.2 Policy Iterations and the Bounded Real Lemma
1.6 Zero-sum Games of Nonlinear Systems
1.6.1 Continuous-time Zero-sum Games: The HJI Equation
1.6.2 Linear Quadratic Zero-sum Games and H∞ Optimal Control
1.6.3 Discrete-time HJI Equation
1.7 Neural Networks and Function Approximation
1.7.1 Neural Networks
1.7.2 Function Approximation Theorems
1.8 Bibliographical Notes

2 Policy Iterations and Nonlinear H2 Constrained State Feedback Control
2.1 Introduction
2.2 Optimal Regulation of Systems with Actuator Saturation
2.3 Policy Iterations for Constrained-input Systems
2.4 Nonquadratic Performance Functionals for Minimum-time and Constrained States Control
2.4.1 Minimum-time Problems
2.4.2 Constrained States
2.5 Bibliographical Notes
3 Nearly H2 Optimal Neural Network Control for Constrained-input Systems
3.1 A Neural Network Solution to the LE(V,u)
3.2 Convergence of the Method of Least Squares to the Solution of the LE(V,u)
3.3 Convergence of the Method of Least Squares to the Solution of the HJB Equation
3.4 Algorithm for Nearly Optimal Neurocontrol Design with Saturated Controls: Introducing a Mesh in ℝ^n
3.5 Numerical Examples
3.5.1 Constrained-input Linear System
3.5.2 Nonlinear Oscillator with Constrained Input
3.5.3 Constrained State Linear System
3.5.4 Minimum-time Control
3.5.5 Parabolic Tracker
3.6 Policy Iterations Without Solving the LE(V,u)
3.7 Bibliographical Notes

4 Policy Iterations and Nonlinear H∞ Constrained State Feedback Control
4.1 Introduction
4.2 Policy Iterations and the Nonlinear Bounded Real Lemma
4.3 L2-gain of Nonlinear Control Systems with Input Saturation
4.4 The HJI Equation and the Saddle Point
4.5 Solving the HJI Equation Using Policy Iterations
4.6 Bibliographical Notes

5 Nearly H∞ Optimal Neural Network Control for Constrained-input Systems
5.1 Neural Network Representation of Policies
5.2 Stability and Convergence of Least Squares Neural Network Policy Iterations
5.3 RTAC: The Nonlinear Benchmark Problem
5.4 Bibliographical Notes

6 Taylor Series Approach to Solving HJI Equation
6.1 Introduction
6.2 Power Series Solution of HJI Equation
6.3 Explicit Expression for Hk
6.4 The Disturbance Attenuation of RTAC System
6.5 Bibliographical Notes
7 An Algorithm to Solve Discrete HJI Equations Arising from Discrete Nonlinear H∞ Control Problems
7.1 Introduction
7.2 Taylor Series Solution of Discrete Hamilton–Jacobi–Isaacs Equation
7.3 Disturbance Attenuation of Discretized RTAC System
7.4 Computer Simulation
7.5 Bibliographical Notes

8 H∞ Static Output Feedback
8.1 Introduction
8.2 Intermediate Mathematical Analysis
8.3 Coupled HJ Equations for H∞ Static Output Feedback Control
8.4 Existence of Static Output Feedback Game Theoretic Solution
8.5 Iterative Solution Algorithm
8.6 H∞ Static Output Feedback Design for F-16 Normal Acceleration Regulator
8.7 Bibliographical Notes

References
Index
Mathematical Notation
$\mathbb{R}^n$ : Euclidean n-dimensional space
$A^T$ : transpose of matrix $A$
$V(x)$ : value or cost of $x$
$V_x$ : column vector corresponding to the gradient of $V(x)$ with respect to $x$; in Chapters 5 and 6, this is a row vector
$x$ : state vector of the dynamical system
$\|x\|$ : the 2-norm of vector $x$
$x'$ : transpose of the vector $x$
$H_2$ : 2-norm on the Hardy space
$H_\infty$ : $\infty$-norm on the Hardy space
$L_2$ : 2-norm on the Lebesgue space of integrable functions
$\Omega$ : compact set of the state space
$\bar{\Omega}$ : complement of the set $\Omega$
$C^m(\Omega)$ : continuous and differentiable up to the mth degree on $\Omega$
$x \in \Omega$ : $x$ belongs to $\Omega$
$x \notin \Omega$ : $x$ does not belong to $\Omega$
$w$ : neural network weight vector
$\sigma$ : neural network activation function
$\bar{\sigma}$ : neural network activation functions vector
$\nabla V$ : column vector denoting the gradient of $V$ with respect to $x$
ARE : algebraic Riccati equation
HJ : Hamilton–Jacobi equation
HJB : Hamilton–Jacobi–Bellman equation
HJI : Hamilton–Jacobi–Isaacs equation
LE : Lyapunov equation
DOV : domain of validity
$\exists$ : there exists
$A \otimes B$ : Kronecker product of $A$ and $B$
$A \wedge B$ : $A$ and $B$
$\sup_{x \in \Omega}$ : supremum of a function with respect to $x$ on $\Omega$
$\min_u$ : minimum with respect to $u$
$\max_d$ : maximum with respect to $d$
$\langle a, b \rangle$ : inner product, the integral $\int_\Omega a(x) b(x)\,dx$ for scalar $a(x)$ and $b(x)$
1 Preliminaries and Introduction
In this chapter, basic concepts and background material related to the analysis and control of nonlinear systems are reviewed. The topics covered in this chapter are based on a variety of well-established research areas upon which the rest of this book is based. In Section 1.1, the classes of continuous-time and discrete-time dynamical nonlinear systems that appear throughout the book are described using the state-space formulation. Section 1.2 reviews the main stability results concerning nonlinear dynamical systems. Section 1.3 reviews the important notions of dissipativity and the bounded real lemma. In Section 1.4, optimal control of nonlinear dynamical systems is reviewed, and the well-known Hamilton–Jacobi–Bellman equation for continuous-time and discrete-time systems is introduced along with its relation to the $H_2$ norm. In Section 1.5, the concept of policy iterations found in the reinforcement learning literature is reviewed, and its relation to the optimal control problem is discussed in the framework of Riccati equations. In Section 1.6, zero-sum game theory is reviewed, and the well-known Hamilton–Jacobi–Isaacs equation for continuous-time and discrete-time domains is introduced along with its relation to the $H_\infty$ norm. Finally, Section 1.7 reviews the basics of neural networks and their function approximation property.
1.1 Nonlinear Systems

In this section, the continuous-time and discrete-time systems considered in this book are described. These systems are autonomous, i.e. time-invariant, and affine in the input.

1.1.1 Continuous-time Nonlinear Systems

The affine-in-input continuous-time nonlinear dynamical systems considered in this book are described by
$$\dot{x}(t) = f(x(t)) + g(x(t))\,u(t) + k(x(t))\,d(t)$$
$$y(t) = h(x(t)) \tag{1.1}$$
where $t$ is the continuous-time index, $x(t) \in \mathbb{R}^n$ is the internal state vector, $f(x) \in \mathbb{R}^n$, $g(x) \in \mathbb{R}^{n \times m_1}$ and $k(x) \in \mathbb{R}^{n \times m_2}$. $u(t) \in \mathbb{R}^{m_1}$ is the control input, and $y(t) \in \mathbb{R}^p$ is the measured system output. $d(t) \in \mathbb{R}^{m_2}$ is a disturbance input determined by the surrounding environment. Note that the dynamics of many physical systems can be described by (1.1); for instance, (1.1) may be derived from the physics of the system by using the Lagrangian or Hamiltonian dynamics. The equation $y = h(x)$ is called the output or measurement equation and represents how we choose to measure the system's variables. It depends on the type and availability of sensors. Throughout this book, it is assumed that a unique continuous-time solution exists locally for all $t \ge 0$. To guarantee this, it is assumed throughout the book that $f$, $g$, and $k$ are sufficiently smooth, or at least locally Lipschitz, to guarantee the uniqueness of local solutions.

A special and important class of the dynamical systems (1.1) is the class of linear time-invariant (LTI) systems described by

$$\dot{x} = Ax + Bu + Kd, \qquad y = Hx \tag{1.2}$$

where $A$, $B$, $K$, and $H$ are constant matrices. Hence, results applicable to (1.2) will be highlighted throughout the book.
1.1.2 Discrete-time Nonlinear Systems

The discrete-time nonlinear dynamical systems considered in this book are affine in the input and can be described by

$$x_{k+1} = f(x_k) + g_1(x_k)\,u_k + g_2(x_k)\,d_k$$
$$y_k = h(x_k) \tag{1.3}$$
where $k$ is the discrete-time index, $x_k \in \mathbb{R}^n$ is the internal state vector, $f(x) \in \mathbb{R}^n$, $g_1(x_k) \in \mathbb{R}^{n \times m_1}$ and $g_2(x_k) \in \mathbb{R}^{n \times m_2}$. $u_k \in \mathbb{R}^{m_1}$ is the control input, and $y_k \in \mathbb{R}^p$ is the measured system output. $d_k \in \mathbb{R}^{m_2}$ is a disturbance input determined by the surrounding environment.

A special and important class of (1.3) is the class of linear time-invariant systems described by

$$x_{k+1} = A x_k + B u_k + K d_k, \qquad y_k = H x_k \tag{1.4}$$

where $A$, $B$, $K$, and $H$ are constant matrices. Results tailored to (1.4) will be highlighted and emphasized throughout the book.
In both continuous-time and discrete-time systems with zero disturbance, the choice of $f(x)$ determines the stability of the unforced system, i.e. $u = 0$. Moreover, the choice of the input matrix $g(x)$ determines the controllability of the system, and the choice of the measurement matrix $h(x)$ determines the observability of the system, in other words the suitability of the measurements taken in a system.
1.2 Stability of Nonlinear Systems

In this section, we study the stability of an equilibrium point of the system with respect to changes in the initial conditions. The definitions are stated in terms of continuous-time nonlinear systems, with the understanding that discrete-time nonlinear systems admit similar definitions. Consider the following unforced (i.e. no inputs) continuous-time nonlinear dynamical system

$$\dot{x} = f(x) \tag{1.5}$$

where $x$ and $f$ are $n \times 1$ vectors. It is assumed that $f$ is Lipschitz continuous on a set $\Omega \subseteq \mathbb{R}^n$ containing the origin of the system. Under this assumption, a unique continuous-time solution $x(t)$ satisfying (1.5) exists. To discuss the stability of (1.5), the following definitions are required.
Definition 1.1 (Equilibrium Point) A vector $x_e \in \mathbb{R}^n$ is a fixed or equilibrium point of (1.5) if $f(x_e) = 0$.

Definition 1.2 In all parts of this definition, $x_e$ is an equilibrium point of (1.5) and $\|\cdot\|$ denotes a vector norm.
1. Stability: $x_e$ is stable in the sense of Lyapunov if, starting close enough to $x_e$, the state will always stay close to $x_e$ at later times. More precisely, $x_e$ is stable in the sense of Lyapunov if for any given $\varepsilon > 0$ there exists a positive constant $\delta(\varepsilon)$ such that if $\|x_0 - x_e\| < \delta(\varepsilon)$ then $\|x(t) - x_e\| < \varepsilon$.
2. Asymptotic Stability: $x_e$ is asymptotically stable if it is stable in the sense of Lyapunov and the state eventually converges to $x_e$ as time goes to infinity.
3. Domain of Attraction: a region such that asymptotic stability results for any state starting inside this region but not for states starting outside it.
4. Global Asymptotic Stability: $x_e$ is globally asymptotically stable if the equilibrium point is asymptotically stable and the corresponding region of attraction is $\mathbb{R}^n$.
An isolated equilibrium point $x_e$ can always be brought to the origin by a redefinition of coordinates; therefore, let us assume without loss of generality that the origin is an equilibrium point. In this case, if an output equation is considered, then one can relate the stability of the internal state vector $x$ to the measured output $y = h(x)$. The following definitions then become relevant.
Definition 1.3 (Zero-state Observability) System (1.5) is zero-state observable if $y(t) = 0\ \forall t \ge 0$ implies that $x(t) = 0\ \forall t \ge 0$.

Definition 1.4 (Zero-state Detectability) System (1.5) is zero-state detectable if $y(t) = 0\ \forall t \ge 0$ implies that $\lim_{t \to \infty} x(t) = 0$.
If control inputs are introduced so that system (1.5) becomes

$$\dot{x} = f(x) + g(x)u, \qquad y = h(x) \tag{1.6}$$

and when $\dot{x} = f(x)$ is not necessarily stable, then it is important to study the stabilizability and controllability of system (1.6).

Definition 1.5 (Controllability) System (1.6) is locally controllable around the origin if there exists a neighbourhood $\Omega$ of the origin such that, given any initial state $x_0 \in \Omega$, there exist a final time $T$ and a control input $u(t)$ on $[0, T]$ that drive the state from $x_0$ to the origin.

Definition 1.6 (Stabilizability) System (1.6) is locally stabilizable around the origin if there exist a neighbourhood $\Omega$ of the origin and a control input $u(t)$ that drives the state from $x_0 \in \Omega$ to the origin asymptotically.

Determining whether a nonlinear system is stable is largely based on the Lyapunov theorems discussed in what follows for both continuous-time and discrete-time nonlinear systems.
1.2.1 Lyapunov Stability of Continuous-time Nonlinear Systems

Consider the autonomous dynamical system

$$\dot{x} = f(x), \qquad y = h(x) \tag{1.7}$$

with $x \in \mathbb{R}^n$, which could represent an unforced system with $u = 0$, or a closed-loop system after the controller has been designed and specified as a function of
the state $x(t)$, i.e. $u(x) = l(x)$. In both cases, the stability of (1.7) around the origin can be determined by the following theorems.

Theorem 1.1 (Lyapunov Stability) If there exists a locally positive definite function $V(x) > 0$ such that its time derivative along the trajectories of (1.7) in some neighbourhood of the origin satisfies

$$\dot{V}(x) = V_x^T \dot{x} = V_x^T f(x) \le 0 \tag{1.8}$$

then the origin is stable in the sense of Lyapunov and $V(x)$ is called a Lyapunov function. Moreover, if in some neighbourhood of the origin

$$\dot{V}(x) = V_x^T \dot{x} = V_x^T f(x) < 0 \tag{1.9}$$

then the origin is asymptotically stable. The origin is globally stable, respectively globally asymptotically stable, if in addition (1.8), respectively (1.9), holds for all $x \in \mathbb{R}^n$ with the Lyapunov function satisfying the radial unboundedness property, i.e. $V(x) \to \infty$ as $\|x\| \to \infty$.
Theorem 1.2 Let $V(x) \ge 0$ be a solution to

$$V_x^T f(x) = -h(x)^T h(x) \tag{1.10}$$

and suppose that

$$\dot{x} = f(x), \qquad y = h(x) \tag{1.11}$$

is zero-state detectable. Then $x = 0$ is an asymptotically stable equilibrium of (1.11). If additionally (1.10) holds for all $x \in \mathbb{R}^n$ and $V(x)$ is radially unbounded, then $x = 0$ is globally asymptotically stable.

For the special case of linear time-invariant systems, the following Lyapunov theorems apply.
Theorem 1.3 (Lyapunov Theorem for Linear Systems) The system

$$\dot{x} = Ax, \qquad y = Hx \tag{1.12}$$

is stable in the sense of Lyapunov if there exist matrices $P > 0$ and $Q = H^T H \ge 0$ that satisfy the Lyapunov equation

$$A^T P + PA = -Q = -H^T H \tag{1.13}$$
Moreover, if $Q = H^T H > 0$ and there exists a $P > 0$ that solves (1.13), then system (1.12) is asymptotically stable.

One may think of $P$ in (1.13) as a cost that is the outcome of evaluating the performance functional

$$\int_0^\infty y(t)^T y(t)\, dt \tag{1.14}$$

over the state trajectories of (1.12). Hence one may write

$$\int_0^\infty x(t)^T Q\, x(t)\, dt = x_0^T \left( \int_0^\infty (e^{At})^T Q\, e^{At}\, dt \right) x_0 = x_0^T P x_0 \tag{1.15}$$
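The identity (1.15) is straightforward to verify numerically. The following MATLAB sketch is illustrative only: the matrices A and Q and the initial state x0 are hypothetical, and the Control System Toolbox function lyap is assumed to be available.

% Solve A'P + PA + Q = 0 for a stable A and compare x0'*P*x0
% with the integral cost in (1.15); all data are illustrative.
A  = [0 1; -2 -3];      % a stable matrix (assumption)
Q  = eye(2);
P  = lyap(A', Q);       % lyap(M,N) solves M*X + X*M' + N = 0
x0 = [1; -1];
J  = integral(@(t) arrayfun(@(s) ...
       (expm(A*s)*x0)'*Q*(expm(A*s)*x0), t), 0, 100);
disp([x0'*P*x0, J])     % the two values should agree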
If the linear time-invariant system

$$\dot{x} = Ax + Bu, \qquad u = Kx \tag{1.16}$$

is considered, one can define the closed-loop Lyapunov equation

$$(A + BK)^T P + P(A + BK) = -Q \tag{1.17}$$

and analyze the solutions of this Lyapunov equation to determine the stability of the closed-loop system.

There are many extensions to the results appearing in Theorem 1.3. Using the concepts of observability and detectability, the following extensions follow.
Theorem 1.4 A necessary and sufficient condition for $A$ to be strictly stable is that for any symmetric $Q > 0$, the unique solution of the linear matrix equation

$$A^T P + PA + Q = 0$$

is $P > 0$ and symmetric.

Theorem 1.5 If $Q \ge 0$ and $A$ is strictly stable, the linear matrix equation

$$A^T P + PA + Q = 0$$

has a unique solution $P$, and $P \ge 0$. Moreover, if $(Q^{1/2}, A)$ is observable, then $P > 0$.

Theorem 1.6 Suppose $P \ge 0$, $Q \ge 0$, $(Q^{1/2}, A)$ is detectable and

$$A^T P + PA + Q = 0$$

Then $A$ is strictly stable. Moreover, if $(Q^{1/2}, A)$ is observable, then $P > 0$.
1.2.2 Lyapunov Stability of Discrete-time Nonlinear Systems

Consider the autonomous discrete-time dynamical system

$$x_{k+1} = f(x_k) \tag{1.18}$$

with $x \in \mathbb{R}^n$, which could represent a closed-loop system after the controller has been designed and specified as a function of the state $x_k$. Stability of (1.18) around the origin can be determined by the following theorems.
Theorem 1.7 (Lyapunov Stability) If there exists a positive definite function $V(x) > 0$ such that the forward difference along the trajectories of (1.18) satisfies

$$\Delta V(x_k) = V(x_{k+1}) - V(x_k) = V(f(x_k)) - V(x_k) \le 0 \tag{1.19}$$

then the origin is stable in the sense of Lyapunov and $V(x_k)$ is called a Lyapunov function. Moreover, if

$$\Delta V(x_k) = V(f(x_k)) - V(x_k) < 0 \tag{1.20}$$

then the origin is asymptotically stable. The origin is globally stable, respectively globally asymptotically stable, if in addition (1.19), respectively (1.20), holds for all $x \in \mathbb{R}^n$ with the Lyapunov function satisfying the radial unboundedness property, i.e. $V(x) \to \infty$ as $\|x\| \to \infty$.

For the special case of linear time-invariant systems, the following theorem applies.
Theorem 1.8 (Lyapunov Theorem for Linear Systems) The system

$$x_{k+1} = A x_k \tag{1.21}$$

is stable in the sense of Lyapunov if there exist matrices $P > 0$, $Q \ge 0$ that satisfy the Lyapunov equation

$$P = A^T P A + Q \tag{1.22}$$
If there exists a solution such that both P and Q are positive definite, the system is asymptotically stable.
One may think of $P$ in (1.22) as a cost function that is the outcome of evaluating the performance functional

$$\sum_{k=0}^{\infty} x_k^T Q x_k \tag{1.23}$$

over the state trajectories of (1.21). Hence one may write

$$\sum_{k=0}^{\infty} x_k^T Q x_k = x_0^T \left( \sum_{k=0}^{\infty} (A^k)^T Q A^k \right) x_0 = x_0^T P x_0 \tag{1.24}$$
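The discrete-time counterpart (1.24) can be checked in the same spirit; in the minimal sketch below, the matrices are hypothetical and dlyap from the Control System Toolbox is assumed.

% Solve P = A'PA + Q for a Schur-stable A and compare x0'*P*x0
% with a truncation of the series (1.24); all data are illustrative.
A  = [0.5 0.1; 0 0.8];   % a Schur-stable matrix (assumption)
Q  = eye(2);
P  = dlyap(A', Q);       % dlyap(M,N) solves M*X*M' - X + N = 0
x0 = [1; -1];
J  = 0;  xk = x0;
for k = 0:200
    J  = J + xk'*Q*xk;
    xk = A*xk;
end
disp([x0'*P*x0, J])      % the two values should agree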
1.3 Dissipativity of Nonlinear Systems

Dissipativity, when it holds, is a property of nonlinear systems that plays an important role in the study of control systems from an input–output perspective. The focus in this section is mainly on the treatment of square-integrable signals, i.e. finite-energy signals. These are natural to work with when studying the effect of disturbances on closed-loop systems. This study requires the following function norms.
Definition 1.7 ($L_p$-norm) Given a continuous-time function $f(t): [0, \infty) \to \mathbb{R}^n$, its $L_p$ function norm $\|f(t)\|_{L_p}$ is given in terms of the vector $p$-norm $\|f(t)\|_p$ at each value of $t$ by

$$\|f(t)\|_{L_p} = \left( \int_0^\infty \|f(t)\|_p^p\, dt \right)^{1/p}$$

and if $p = \infty$,

$$\|f(t)\|_{L_\infty} = \sup_{t \ge 0} \|f(t)\|_\infty$$
Definition 1.8 ($l_p$-norm) Let $Z = \{0, 1, 2, 3, \ldots\}$ be the set of natural numbers and $f(k): Z \to \mathbb{R}^n$. The $l_p$ function norm $\|f(k)\|_{l_p}$ is given in terms of the vector $p$-norm $\|f(k)\|_p$ at each value of $k$ by

$$\|f(k)\|_{l_p} = \left( \sum_{k=0}^{\infty} \|f(k)\|_p^p \right)^{1/p}$$

and if $p = \infty$,

$$\|f(k)\|_{l_\infty} = \sup_{k \ge 0} \|f(k)\|_\infty$$
If the $L_p$-norm ($l_p$-norm) is finite, then $f(t) \in L_p$ ($f(k) \in l_p$). A continuous-time signal (discrete-time signal) with $f(t) \in L_2$ ($f(k) \in l_2$) is called a square-integrable signal or, equivalently, a finite-energy signal. Therefore, the $L_2$-norm ($l_2$-norm) is essential in the study of dissipativity and robustness of dynamical systems.
1.3.1 Dissipativity of Continuous-time Nonlinear Systems

Consider the system described by

$$\dot{x} = f(x) + k(x)d, \qquad z = h(x) \tag{1.25}$$

where $f(0) = 0$ and hence $x = 0$ is assumed to be an equilibrium point of the system. $d(t)$ is considered a disturbance input and $z(t)$ is a fictitious penalty output.
Definition 1.9 (Dissipative Systems) System (1.25) with supply rate $w(d, z)$ is said to be dissipative if there exists $V \ge 0$, called the storage function, such that

$$V(x(t_0)) + \int_{t_0}^{t_1} w(d(t), z(t))\, dt \ge V(x(t_1)) \tag{1.26}$$
Definition 1.10 ($L_2$-gain Stability) System (1.25) has an $L_2$-gain less than or equal to $\gamma$, where $\gamma \ge 0$, if

$$\|z(t)\|_{L_2} \le \gamma \|d(t)\|_{L_2} \tag{1.27}$$

for all $d(t) \in L_2$ for which the state trajectory remains within the domain of attraction of the system, with $z(t) = h(x(t))$ denoting the output of (1.25) resulting from $d$ when the initial state is $x(0) = 0$.

Dissipativity and $L_2$-gain stability are related. It can be shown that if the dynamical system (1.25) is dissipative with supply rate $w(t) = \gamma^2 \|d(t)\|^2 - \|z(t)\|^2$, then it is also $L_2$-gain stable. To see this, suppose there exists $V \ge 0$ satisfying (1.26) with $x_0 = 0$ and $w(t) = \gamma^2 \|d(t)\|^2 - \|z(t)\|^2$. Since $V(x_0) = 0$, then

$$\int_{t_0}^{t_1} w(d(t), z(t))\, dt \ge V(x_1) \ge 0 \;\Rightarrow\; \int_{t_0}^{t_1} \|z(t)\|^2\, dt \le \gamma^2 \int_{t_0}^{t_1} \|d(t)\|^2\, dt$$
It has been shown that a lower bound on the storage function V ( x ) is given by the so-called available storage. The existence of the available storage is essential in determining whether or not a system is dissipative.
Definition 1.11 (Available Storage) The available storage $V_a \ge 0$ of (1.25) is given by the following optimal control problem:

$$V_a(x) = \sup_{d(t)} \left( -\int_0^\infty w(d(t), z(t))\, dt \right)$$

The optimal maximizing policy $d^*$ associated with the available storage can be thought of as the policy for extracting the maximum energy from the system. It can be interpreted as the worst possible $L_2$ disturbance when the supply rate is given by $w(t) = \gamma^2 \|d(t)\|^2 - \|z(t)\|^2$. For a system to be dissipative, $V_a$ needs to be finite.

The available storage $V_a \ge 0$ provides a lower bound on the storage function of the dynamical system, $0 \le V_a \le V$. If $V_a \in C^1$ then it solves the following Hamilton–Jacobi equation

$$\frac{dV_a}{dx}^T f + \frac{1}{4\gamma^2} \frac{dV_a}{dx}^T k k^T \frac{dV_a}{dx} + h^T h = 0, \qquad V_a(0) = 0 \tag{1.28}$$
To find the available storage, one needs to solve an optimization problem, which can be approached by solving a variational problem as in optimal control theory. The Hamiltonian of this optimization problem is given by

$$H(x, p, d) = p^T (f + kd) + h^T h - \gamma^2 d^T d \tag{1.29}$$

This Hamiltonian is a polynomial of degree two in $d$, and has a unique maximum at

$$d^* = \frac{1}{2\gamma^2} k(x)^T p$$

given by

$$H(x, p) = p^T f(x) + \frac{1}{4\gamma^2} p^T k(x) k(x)^T p + h(x)^T h(x) \tag{1.30}$$
Setting the right-hand side of Equation (1.30) to zero and replacing $p$ with $dV/dx$, one has

$$\frac{dV}{dx}^T f(x) + \frac{1}{4\gamma^2} \frac{dV}{dx}^T k(x) k(x)^T \frac{dV}{dx} + h(x)^T h(x) = 0$$

and this is the same as Equation (1.28). It can be shown that any $V(x) \ge 0$ that solves the following Hamilton–Jacobi inequality
$$\frac{dV}{dx}^T f + \frac{1}{4\gamma^2} \frac{dV}{dx}^T k k^T \frac{dV}{dx} + h^T h \le 0, \qquad V(0) = 0 \tag{1.31}$$

is a possible storage function. The relationship between Hamilton–Jacobi equations, $L_2$-gain stability, and dissipativity of dynamical systems is discussed in the Bounded Real Lemma theorems.
Theorem 1.9 (Nonlinear Bounded Real Lemma) Consider the nonlinear time-invariant system (1.25). Suppose there is a continuously differentiable, positive semidefinite function $V(x)$ that satisfies

$$\frac{dV}{dx}^T f + \frac{1}{4\gamma^2} \frac{dV}{dx}^T k k^T \frac{dV}{dx} + h^T h \le 0$$

with $\gamma$ a positive constant. Then the system (1.25) is finite-gain $L_2$-stable and its $L_2$-gain is less than or equal to $\gamma$. If $V(x) \ge 0$ solves

$$\frac{dV}{dx}^T f + \frac{1}{4\gamma^2} \frac{dV}{dx}^T k k^T \frac{dV}{dx} + h^T h = 0$$

with

$$\dot{x} = f + \frac{1}{2\gamma^2} k k^T \frac{dV}{dx}$$

asymptotically stable, then the system is finite-gain $L_2$-stable, its $L_2$-gain is strictly less than $\gamma$, and $V(x) \ge 0$ is called the available storage. If in addition zero-state observability is assumed, then the available storage is positive definite.

Theorem 1.9 is the nonlinear analogue of the Bounded Real Lemma known for linear time-invariant systems.
Theorem 1.10 (Bounded Real Lemma) Consider the system

$$\dot{x} = Ax + Kd, \qquad z = Hx$$

Then the following statements are equivalent:

1. $\sigma(A) \subset \mathbb{C}^-$ and the $L_2$-gain is strictly less than $\gamma$.
2. The algebraic Riccati equation
$$A^T P + PA + H^T H + \frac{1}{\gamma^2} P K K^T P = 0 \tag{1.32}$$
has a unique symmetric solution $P \ge 0$ such that $\sigma\!\left(A + \frac{1}{\gamma^2} K K^T P\right) \subset \mathbb{C}^-$.
3. There exists a symmetric $X > 0$ such that
$$A^T X + XA + H^T H + \frac{1}{\gamma^2} X K K^T X < 0$$
Note that for the special case of linear systems, the $L_2$-gain can be found exactly and equals the infinity norm, $H_\infty$, of the transfer function from $d$ to $z$, given as

$$\frac{\|z\|_{L_2}^2}{\|d\|_{L_2}^2} = \frac{\frac{1}{2\pi} \int_{-\infty}^{\infty} \|H(j\omega)\, d(j\omega)\|_2^2\, d\omega}{\frac{1}{2\pi} \int_{-\infty}^{\infty} \|d(j\omega)\|_2^2\, d\omega} \le \sup_\omega \frac{\|H(j\omega)\, d(j\omega)\|_2^2}{\|d(j\omega)\|_2^2} \le \sup_\omega \|H(j\omega)\|_2^2 \tag{1.33}$$

where

$$\gamma = \sup_\omega \|H(j\omega)\|_2$$

is the lower bound on $\gamma$ for which Theorem 1.10 holds.
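For a concrete linear system, this bound can be computed directly; a minimal sketch with hypothetical matrices, assuming the Control System Toolbox:

% The L2-gain of (A, K, H) equals the H-infinity norm of H(sI - A)^{-1}K.
A = [0 1; -2 -3];  K = [0; 1];  H = [1 0];   % illustrative data
gam = norm(ss(A, K, H, 0), inf)              % sup over w of ||H(jw)||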
1.3.2 Dissipativity of Discrete-time Nonlinear Systems

Consider the following nonlinear time-invariant discrete-time system

$$x_{k+1} = f(x_k) + k(x_k)\, d_k, \qquad z_k = h(x_k) \tag{1.34}$$

where $f(0) = 0$ and hence $x = 0$ is assumed to be an equilibrium point of the system. $d_k$ is considered a disturbance input and $z_k$ is a fictitious penalty output.
Definition 1.12 (Dissipative Systems) System (1.34) with supply rate $w(z_k, d_k)$ is said to be dissipative if there exists $V \ge 0$, called the storage function, such that

$$V(x_0) + \sum_{k=0}^{N} w(z_k, d_k) \ge V(x_{N+1}) \tag{1.35}$$

The storage function in (1.35) satisfies $V(x_{k+1}) - V(x_k) \le w(z_k, d_k)$. It can be shown that any $V(x) \ge 0$ that solves the following discrete-time Hamilton–Jacobi inequality
$$V(x_k) \ge \sup_{d_k} \left\{ -w(z_k, d_k) + V(x_{k+1}) \right\} \tag{1.36}$$

is a possible storage function.
Definition 1.13 ($l_2$-gain Stability) System (1.34) has an $l_2$-gain less than or equal to $\gamma$, where $\gamma \ge 0$, if

$$\|z_k\|_{l_2} \le \gamma \|d_k\|_{l_2} \tag{1.37}$$

for all $d_k \in l_2$, with $z_k = h(x_k)$ denoting the output of (1.34) resulting from $d$ for initial state $x_0 = 0$, with the state trajectory remaining within the domain of attraction of the system.
Definition 1.14 (Available Storage) The available storage $V_a \ge 0$ of (1.34) is given by the following optimal control problem:

$$V_a(x_k) = \max_{d_k} \sum_{k=0}^{N} -w(z_k, d_k)$$

The optimal policy $d_k^*$ associated with the available storage can be thought of as the policy for extracting the maximum energy from the system. It can be interpreted as the worst possible $l_2$ disturbance when the supply rate is given by $w_k = \gamma^2 \|d_k\|^2 - \|z_k\|^2$. The relationship between the discrete-time Hamilton–Jacobi equation, $l_2$-gain stability, and dissipativity of dynamical systems is discussed in the discrete-time Bounded Real Lemma theorems.
Theorem 1.11 (Nonlinear Bounded Real Lemma) Consider the time-invariant nonlinear system (1.34). Suppose there is a positive semidefinite function $V(x_k)$ that satisfies the following discrete-time version of the Hamilton–Jacobi equation

$$V(x_k) = \max_{d_k} \left[ \|z_k\|^2 - \gamma^2 \|d_k\|^2 + V(f(x_k) + k(x_k) d_k) \right]$$

with $\gamma$ a positive constant. Then the system (1.34) is finite-gain $l_2$-stable and its $l_2$-gain is less than or equal to $\gamma$.

Theorem 1.11 is the nonlinear analogue of the Bounded Real Lemma known for discrete-time linear time-invariant systems.
Theorem 1.12 (Bounded Real Lemma) Consider the system

$$x_{k+1} = A x_k + B u_k, \qquad y_k = C x_k$$

Then the following statements are equivalent:

1. $A$ is asymptotically stable and the $l_2$-gain is strictly less than $\gamma$.
2. There exists a $P > 0$ such that
$$P > A^T P A + C^T C - A^T P B (B^T P B - \gamma^2 I)^{-1} B^T P A, \qquad 0 > B^T P B - \gamma^2 I$$
3. The algebraic Riccati equation
$$P = A^T P A + C^T C - A^T P B (B^T P B - \gamma^2 I)^{-1} B^T P A \tag{1.38}$$
has a unique symmetric solution $P \ge 0$ such that $A - B (B^T P B - \gamma^2 I)^{-1} B^T P A$ is asymptotically stable.
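Equation (1.38) is a discrete-time ARE with an indefinite weighting and can be solved numerically; the sketch below is illustrative only, assuming a recent Control System Toolbox with idare and hypothetical data.

% Solve (1.38): P = A'PA + C'C - A'PB (B'PB - gam^2 I)^{-1} B'PA.
% idare(A,B,Q,R) solves A'XA - X - A'XB (B'XB + R)^{-1} B'XA + Q = 0,
% so Q = C'C and R = -gam^2*I reproduce (1.38).
A = [0.5 0.1; 0 0.8];  B = [0; 1];  C = [1 0];  gam = 2;  % illustrative
P = idare(A, B, C'*C, -gam^2);
stable = all(abs(eig(A - B*((B'*P*B - gam^2)\(B'*P*A)))) < 1)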
1.4 Optimal Control of Nonlinear Systems

Optimal control theory involves the design of controllers that, besides meeting the required control objective, also minimize specific performance functionals. The value function of the optimal control problem satisfies Bellman's Optimality Principle, which is described below for both continuous-time and discrete-time systems.
1.4.1 Dynamic Programming and the HJB Equation

Consider the system described by

$$\dot{x} = f(x) + g(x)u \tag{1.39}$$

where $f(0) = 0$ and hence $x = 0$ is assumed to be an equilibrium point of the system. $u(t)$ is considered a control input. It is desired to find $u(t)$ such that the following infinite-horizon performance functional is minimized:

$$V(x(t)) = \int_t^\infty r(x(\tau), u(\tau))\, d\tau \tag{1.40}$$

This can be equivalently written as

$$V(x(t)) = \min_{\substack{u(t) \\ 0 \le t < \infty}} \int_t^\infty r(x(\tau), u(\tau))\, d\tau \tag{1.41}$$
which after subdividing the interval becomes

$$V(x(t)) = \min_{\substack{u(\tau) \\ t \le \tau < \infty}} \left\{ \int_t^{t+\Delta t} r(x(\tau), u(\tau))\, d\tau + \int_{t+\Delta t}^{\infty} r(x(\tau), u(\tau))\, d\tau \right\} \tag{1.42}$$

To solve this optimal control problem, Bellman's Optimality Principle requires that

$$V(x(t)) = \min_{u(t)} \left\{ \int_t^{t+\Delta t} r(x(\tau), u(\tau))\, d\tau + V(x(t + \Delta t)) \right\} \tag{1.43}$$
Taking the infinitesimal version of (1.43), under the assumption that $V(x)$ is continuously differentiable, one obtains

$$0 = \min_{u(t)} \left\{ r(x(t), u(t)) + V_x^T(x(t)) \left[ f(x) + g(x)u \right] \right\} \tag{1.44}$$

which is known as the Hamilton–Jacobi–Bellman (HJB) equation and provides a sufficient condition in optimal control theory. The optimal control law is then given as

$$u(x) = -\frac{1}{2} R^{-1} g^T \frac{dV}{dx} \tag{1.45}$$
For the special case of linear time-invariant systems

$$\dot{x} = Ax + Bu \tag{1.46}$$

with the quadratic infinite-horizon cost

$$V(x_0) = \min_{\substack{u(t) \\ 0 \le t < \infty}} \int_0^\infty \left[ x^T Q x + u^T R u \right] dt \tag{1.47}$$

the Hamilton–Jacobi–Bellman equation (1.44) becomes the algebraic Riccati equation

$$A^T P + PA + Q - P B R^{-1} B^T P = 0 \tag{1.48}$$
The value of the optimal control problem is quadratic in the state,

$$V(x) = x^T P x \tag{1.49}$$

with $P$ being the positive semidefinite solution of (1.48). The optimal controller in this case, known as the linear quadratic regulator (LQR), is given as

$$u(x) = -R^{-1} B^T P x \tag{1.50}$$
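In MATLAB, the pair (1.48) and (1.50) is returned directly by lqr; a minimal sketch with hypothetical matrices, assuming the Control System Toolbox:

% lqr returns the gain of (1.50) and the stabilizing solution of (1.48).
A = [0 1; 0 -1];  B = [0; 1];  Q = eye(2);  R = 1;  % illustrative data
[Klqr, P] = lqr(A, B, Q, R);  % P solves A'P + PA + Q - P*B*(R\B')*P = 0
% optimal policy: u = -Klqr*x, with Klqr = R\(B'*P)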
For linear systems, the optimal control problem (1.47) is related to a general class of problems known as $H_2$ optimal control. To illustrate this fact, consider the system described in Figure 1.1:

$$\dot{x} = Ax + B_1 w + B_2 u$$
$$z = C_1 x + D_{11} w + D_{12} u$$
$$y = C_2 x + D_{21} w + D_{22} u$$

[Figure 1.1: standard feedback configuration, with plant inputs $w$, $u$, outputs $z$, $y$, and feedback controller $u = \alpha(y)$]
where $u$ is the control input, $w$ is a disturbance input, $y$ is the measured output, and $z$ is the controlled output. In $H_2$ optimal control theory, one is interested in finding a controller that minimizes the 2-norm of the transfer function from $w$ to $z$:

$$\|H(s)\|_2^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} \operatorname{trace}\left[ H^T(-j\omega)\, H(j\omega) \right] d\omega = \int_0^\infty \operatorname{trace}\left[ h(t)^T h(t) \right] dt \tag{1.51}$$
This can be thought of as minimizing the response of the transfer function $H(s)$ to the impulsive input $w$. To see why (1.46) and (1.47) constitute an $H_2$ optimal control problem, note that

$$\dot{x} = Ax + Bu, \qquad x(0) = x_0 \tag{1.52}$$

can be written as

$$\dot{x} = Ax + B_1 w + B_2 u$$
$$z = C_1 x + D_{12} u \tag{1.53}$$
$$y = x$$

with
$$B_1 = x_0, \qquad B_2 = B, \qquad C_1 = \begin{bmatrix} Q^{1/2} \\ 0 \end{bmatrix}, \qquad D_{12} = \begin{bmatrix} 0 \\ R^{1/2} \end{bmatrix}$$

Moreover, $w = \delta(t)$, $x(0) = 0$, and the optimized functional is
$$\int_0^\infty z(t)^T z(t)\, dt = \int_0^\infty \left[ x^T Q x + u^T R u \right] dt \tag{1.54}$$
In this book, we will refer to the optimal control problem (1.41) as an H 2 optimal control problem.
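For a linear realization, the norm (1.51) is available directly; a minimal sketch with hypothetical matrices, assuming the Control System Toolbox:

% H2 norm of H(s) = C (sI - A)^{-1} B, the quantity defined in (1.51).
A = [0 1; -2 -3];  B = [0; 1];  C = [1 0];  % illustrative data
n2 = norm(ss(A, B, C, 0), 2)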
1.4.2 Discrete-time HJB Equation

Consider the following nonlinear time-invariant discrete-time system

$$x_{k+1} = f(x_k) + g(x_k)\, u_k \tag{1.55}$$

where $f(0) = 0$, $u_k$ is a control input, and $x = 0$ is assumed to be an equilibrium point of the system. It is desired to find $u_k$ such that the following infinite-horizon performance functional is minimized:

$$V(x_k) = \sum_{i=k}^{\infty} r(x_i, u_i) \tag{1.56}$$
This can be equivalently written as

$$V(x_k) = \min_{\substack{u_k \\ 0 \le k < \infty}} \sum_{i=k}^{\infty} r(x_i, u_i) \tag{1.57}$$

which after subdividing the interval becomes

$$V(x_k) = \min_{\substack{u_k \\ 0 \le k < \infty}} \left\{ r(x_k, u_k) + \sum_{i=k+1}^{\infty} r(x_i, u_i) \right\} \tag{1.58}$$
To solve this optimal control problem, Bellman’s Optimality Principle requires that
$$V(x_k) = \min_{u_k} \left\{ r(x_k, u_k) + V(x_{k+1}) \right\} = \min_{u_k} \left\{ r(x_k, u_k) + V(f(x_k) + g(x_k) u_k) \right\} \tag{1.59}$$
Equation (1.59) is the discrete-time Hamilton–Jacobi–Bellman equation. Unlike the case of continuous-time systems, the optimal control policy is related to the optimal cost-to-go through

$$u(x_k) = -\frac{1}{2} R^{-1} g(x_k)^T \frac{dV(x_{k+1})}{dx_{k+1}} \tag{1.60}$$
and hence it is very difficult, except in very special cases, to find a closed-form solution for $u(x_k)$ in terms of $V(x_k)$ when $u(x_k)$ is related to $V(x_{k+1})$ as shown in (1.60).

Consider the special case of linear time-invariant systems

$$x_{k+1} = A x_k + B u_k \tag{1.61}$$

with the quadratic infinite-horizon cost

$$V(x_k) = \min_{\substack{u_k \\ 0 \le k < \infty}} \sum_{i=k}^{\infty} \left[ x_i^T Q x_i + u_i^T R u_i \right] \tag{1.62}$$

In this case, Equation (1.59) becomes the algebraic Riccati equation

$$P = A^T P A + Q - A^T P B (B^T P B + R)^{-1} B^T P A \tag{1.63}$$
The value of the optimal control problem is quadratic in the state,

$$V(x_k) = x_k^T P x_k \tag{1.64}$$

with $P$ the positive semidefinite solution of (1.63). The optimal controller in this case, known as the discrete-time linear quadratic regulator (LQR), is given by

$$u_k = -(B^T P B + R)^{-1} B^T P A x_k \tag{1.65}$$
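The discrete-time pair (1.63) and (1.65) is likewise available through dlqr; a minimal sketch with hypothetical data, assuming the Control System Toolbox:

% dlqr returns the gain of (1.65) and the stabilizing solution of (1.63).
A = [1 0.1; 0 1];  B = [0; 0.1];  Q = eye(2);  R = 1;  % illustrative data
[Kd, P] = dlqr(A, B, Q, R);   % u_k = -Kd*x_k, Kd = (B'PB + R)^{-1} B'PA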
1.5 Policy Iterations and Optimal Control

Solving optimal control problems requires solving a dynamic programming problem that proceeds backward in time. Alternatively, one can solve for the optimal control policy by a sequence of policy iterations that eventually converge to the optimal control policy. This idea has its roots in the reinforcement learning literature.

In this section, policy iterations for continuous-time linear time-invariant systems are introduced. In this case, policy iterations require one to solve a sequence of linear matrix equations, i.e. Lyapunov equations, rather than solving a quadratic matrix equation, i.e. the algebraic Riccati equation. Preliminary results concerning the policy iteration technique for solving the algebraic Riccati equations (1.48) and (1.32) are discussed. These concepts are generalized later in this book to the case of nonlinear continuous-time systems to solve the Hamilton–Jacobi–Bellman equation and the Hamilton–Jacobi–Isaacs equation. Discrete-time policy iteration methods do not appear in this book.
1.5.1 Policy Iterations and H2 Optimal Control

Consider the following linear system

$$\dot{x} = Ax + Bu, \qquad u = Kx \tag{1.66}$$

with $u = Kx$ a stabilizing controller. Consider the following cost-to-go

$$V(x_0) = \int_0^\infty \left[ x^T Q x + u^T R u \right] dt \tag{1.67}$$
which, when evaluated over the trajectories of the closed-loop system (1.66), gives

$$V(x_0) = x_0^T \left[ \int_0^\infty (e^{(A+BK)t})^T \left[ Q + K^T R K \right] e^{(A+BK)t}\, dt \right] x_0 = x_0^T P x_0 \tag{1.68}$$

Taking the infinitesimal version of (1.68),

$$\dot{V}(x) = V_x^T (A + BK) x = -x^T Q x - u^T R u = -x^T (Q + K^T R K) x$$

one obtains the following closed-loop Lyapunov equation

$$0 = (A + BK)^T P + P(A + BK) + Q + K^T R K$$
If one has two different stabilizing controllers $u_1 = K_1 x$ and $u_2 = K_2 x$, then one has

$$\dot{V}_1(x) = V_{1x}^T (A + BK_1) x = -x^T Q x - u_1^T R u_1 = -x^T (Q + K_1^T R K_1) x$$

$$\dot{V}_2(x) = V_{2x}^T (A + BK_2) x = -x^T Q x - u_2^T R u_2 = -x^T (Q + K_2^T R K_2) x$$

One can differentiate $V_1(x) - V_2(x)$ over the trajectories of controller $u_2 = K_2 x$ to obtain

$$\dot{V}_1 - \dot{V}_2 = (V_{1x} - V_{2x})^T (A + BK_2) x = -(u_1^T - u_2^T) R (u_1 - u_2) - (V_{1x}^T B + 2 u_2^T R)(u_1 - u_2)$$

which implies that

$$V_1(x_0) - V_2(x_0) = \int_0^\infty \left[ (u_1^T - u_2^T) R (u_1 - u_2) + (V_{1x}^T B + 2 u_2^T R)(u_1 - u_2) \right] dt$$
Hence, one has

$$ P_1 - P_2 \;=\; \int_0^{\infty} \big( e^{(A-BK_2)t} \big)^T \big[ ( K_1 - K_2 )^T R ( K_1 - K_2 ) - ( B^T P_1 - R K_2 )^T ( K_1 - K_2 ) - ( K_1 - K_2 )^T ( B^T P_1 - R K_2 ) \big]\, e^{(A-BK_2)t}\, dt \qquad (1.69) $$

If the control policy $u_2 = -K_2 x$ is selected such that

$$ K_2 \;=\; R^{-1} B^T P_1 $$
then it can be shown that this policy is a stable policy, by noticing that $V_1(x)$ is a Lyapunov function for $u_2$:

$$ \dot{V}_1 \;=\; ( V_{1x} )^T (A - BK_2)x \;=\; -x^T Q x - ( u_1^T - u_2^T ) R ( u_1 - u_2 ) - u_2^T R u_2 $$
Theorem 1.13 Let $P_i$, $i = 0, 1, \ldots$, be the unique positive definite solution of the Lyapunov equation

$$ 0 \;=\; (A - BK_i)^T P_i + P_i (A - BK_i) + Q + K_i^T R K_i \qquad (1.70) $$

where the policies are

$$ K_i \;=\; R^{-1} B^T P_{i-1} \qquad (1.71) $$

and $K_0$ is such that $A - BK_0$ has all eigenvalues with negative real parts. Then
$$ P \;\le\; P_{i+1} \;\le\; P_i \;\le\; \cdots, \qquad i = 0, 1, \ldots $$
$$ \lim_{i \to \infty} P_i \;=\; P $$
where $P$ solves

$$ 0 \;=\; A^T P + P A + Q - P B R^{-1} B^T P $$
Proof. From (1.70), the policies given by (1.71) are stabilizing. From (1.69), one can show that $P \le P_{i+1} \le P_i \le \cdots$, $i = 0, 1, \ldots$ ∎
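The iteration (1.70)–(1.71) is essentially Kleinman's algorithm. A minimal MATLAB sketch follows; the plant data and the initial stabilizing gain K0 are illustrative assumptions, and `lyap` (Control System Toolbox) solves each Lyapunov equation (1.70).

```matlab
% Sketch of the policy iteration (1.70)-(1.71) (Kleinman's algorithm).
% Example data; K0 must make A - B*K0 Hurwitz.
A = [0 1; -1 -2];   B = [0; 1];
Q = eye(2);         R = 1;
K = [0 0];                          % A is already Hurwitz here, so K0 = 0 works

for i = 1:50
    Acl = A - B*K;
    P = lyap(Acl', Q + K'*R*K);     % solves (1.70): Acl'*P + P*Acl + Q + K'*R*K = 0
    Knew = R\(B'*P);                % policy update (1.71)
    if norm(Knew - K) < 1e-10, break; end
    K = Knew;
end
% P now approximates the stabilizing solution of the algebraic Riccati equation.
```

Note the point of the method: each iteration solves only a linear Lyapunov equation, and the quadratic Riccati equation is never attacked directly.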
1.5.2 Policy Iterations and the Bounded Real Lemma

Consider the algebraic Riccati equation (1.32) in Theorem 1.10. The next theorem shows the existence and convergence of policy iterations that solve for the stabilizing solution of (1.32).

Theorem 1.14 Assume there exists a $P \ge 0$ that solves

$$ 0 \;=\; A^T P + P A + \tfrac{1}{\gamma^2} P D D^T P + H^T H \qquad (1.72) $$

with $\sigma( A + \tfrac{1}{\gamma^2} D D^T P ) \subset \mathbb{C}^-$ and $H > 0$. Then there exists a sequence $\{ P_i,\ i = 0, 1, \ldots \}$ that solves
$$ 0 \;=\; ( A + D K_i )^T P_i + P_i ( A + D K_i ) + H^T H - \gamma^2 K_i^T K_i $$

where the policies are

$$ K_i \;=\; \tfrac{1}{\gamma^2} D^T P_{i-1} \qquad (1.73) $$

with $P_0 = 0$. Then $\forall i$, $\sigma( A + D K_i ) \subset \mathbb{C}^-$ and
$$ P_i \;\le\; P_{i+1} \;\le\; \cdots \;\le\; P, \qquad i = 0, 1, \ldots $$
$$ \lim_{i \to \infty} P_i \;=\; P \;>\; 0 $$

where $P$ solves (1.72).
Proof. Existence: Assume that there is $P_{i-1}$ such that $\sigma( A + D K_i ) \subset \mathbb{C}^-$; then one has

$$ P_i \;=\; \int_0^{\infty} \big( e^{(A + D K_i)t} \big)^T \big[ H^T H - \gamma^2 K_i^T K_i \big]\, e^{(A + D K_i)t}\, dt $$

and $P_i$ is the unique solution of
$$ ( A + D K_i )^T P_i + P_i ( A + D K_i ) \;=\; -\big( H^T H - \gamma^2 K_i^T K_i \big) \qquad (1.74) $$
Moreover, from Theorem 1.10, there exists a symmetric $X > 0$ such that

$$ \mathcal{R}(X) \;=\; A^T X + X A + H^T H + \tfrac{1}{\gamma^2} X D D^T X \;<\; 0 \qquad (1.75) $$
Equation (1.75) can be rewritten as

$$ ( A + D K_i )^T X + X ( A + D K_i ) \;=\; -\big( H^T H - \gamma^2 K_i^T K_i \big) - \tfrac{1}{\gamma^2} ( X - P_{i-1} ) D D^T ( X - P_{i-1} ) + \mathcal{R}(X) \qquad (1.76) $$
Combining (1.76) with (1.74), one has

$$ ( A + D K_i )^T ( X - P_i ) + ( X - P_i )( A + D K_i ) \;=\; -\tfrac{1}{\gamma^2} ( X - P_{i-1} ) D D^T ( X - P_{i-1} ) + \mathcal{R}(X) \qquad (1.77) $$
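For concreteness, the following MATLAB sketch implements the Lyapunov iteration of Theorem 1.14. The data A, D, H and the attenuation gamma are illustrative assumptions (A must be Hurwitz so that the first policy $K_1 = 0$ is admissible), and each step solves (1.74) with `lyap`.

```matlab
% Sketch of the bounded-real policy iteration (1.73)-(1.74).
% Example data; gamma must be large enough for (1.72) to be solvable.
A = [0 1; -2 -3];   D = [0; 1];   H = eye(2);
gam = 5;

P = zeros(2);                        % P0 = 0
for i = 1:100
    K = (1/gam^2)*(D'*P);            % policy update (1.73)
    Acl = A + D*K;
    if max(real(eig(Acl))) >= 0, error('A + D*K_i is not Hurwitz'); end
    Pnew = lyap(Acl', H'*H - gam^2*(K'*K));   % solves (1.74)
    if norm(Pnew - P, 'fro') < 1e-10, P = Pnew; break; end
    P = Pnew;
end
% P now approximates the stabilizing solution of (1.72).
```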
$$ \sigma_L(x) \;=\; \big[ \sigma_1(x)\ \sigma_2(x)\ \cdots\ \sigma_L(x) \big]^T, \qquad w \;=\; \big[ w_1\ w_2\ \cdots\ w_L \big]^T $$

are the vector activation function and the vector weight, respectively. The neural network weights are tuned to minimize the residual error in a least squares sense over a set of points within the stability region $\Omega$ of the initial stabilizing control. The least squares solution attains the lowest possible residual error with respect to the neural network weights. Replacing $V_j^i$ in $PI( V_j^i, u_j, d^i ) = 0$ with $\hat{V}_j^i$, one has
$$ PI\Big( \hat{V}_j^i(x) = \sum_{k=1}^{L} w_k \sigma_k(x),\ u_j,\ d^i \Big) \;=\; e_L(x) \qquad (5.3) $$
where $e_L(x)$ is the residual error. To find the least squares solution, the method of weighted residuals is used [31]. The weights $w_j^i$ are determined by projecting the residual error onto $de_L(x)/dw_j^i$ and setting the result to zero for all $x \in \Omega$ using the inner product, i.e.
$$ \Big\langle \frac{de_L(x)}{dw_j^i},\ e_L(x) \Big\rangle \;=\; 0 \qquad (5.4) $$

where $\langle f, g \rangle = \int_{\Omega} f g\, dx$ is a Lebesgue integral. Rearranging the resulting terms, one has
$$ w_j^i \;=\; -\big\langle \nabla\sigma_L F_j^i,\ \nabla\sigma_L F_j^i \big\rangle^{-1} \big\langle H_j^i,\ \nabla\sigma_L F_j^i \big\rangle \qquad (5.5) $$
where

$$ F_j^i \;=\; f + g u_j + k d^i $$
$$ H_j^i \;=\; h^T h + 2 \int_0^{u_j} \phi^{-1}(v)\, dv \;-\; \gamma^2 \| d^i \|^2 $$
Equation (5.5) involves a matrix inversion. The following lemma discusses the invertibility of this matrix.
Lemma 5.1 If the set $\{ \nabla\sigma_j \}_1^L$ is linearly independent, then $\{ \nabla\sigma_j^T F_j^i \}_1^L$ is also linearly independent.
Proof. This follows from the asymptotic stability of the vector field $\dot{x} = F_j^i$ shown in [3], and from [1]. ∎
Because of Lemma 5.1, the term $\langle \nabla\sigma_L F_j^i, \nabla\sigma_L F_j^i \rangle$ is guaranteed to have full rank, and thus is invertible, as long as $\dot{x} = F_j^i$ is asymptotically stable. This in turn guarantees a unique $w_j^i$ in (5.5). Having solved for the neural network weights, the disturbance policy is updated as

$$ \hat{d}^{\,i+1} \;=\; \frac{1}{2\gamma^2}\, k^T \nabla\sigma_L^T w_j^i \qquad (5.6) $$
It is important that the new dynamics $\dot{x} = f + g u_j + k \hat{d}^{\,i+1}$ be asymptotically stable in order to be able to solve for $w_j^{i+1}$ in (5.5). Theorem 5.2 in the next section discusses the asymptotic stability of $\dot{x} = f + g u_j + k \hat{d}^{\,i+1}$. Policy iteration on the disturbance requires solving iteratively between Equations (5.5) and (5.6) at each inner-loop iteration on $i$, until the sequence of neural network weights $w_j^i$ converges to some value, denoted by $w_j$. Then the control is updated using $w_j$ as
$$ \hat{u}_{j+1} \;=\; -\phi\big( \tfrac{1}{2} g^T \nabla\sigma_L^T w_j \big) \qquad (5.7) $$
in the outer-loop iteration on $j$. Finally, one can approximate the integrals needed to solve (5.5) by introducing a mesh on $\Omega$ with mesh size equal to $\Delta x$. Equation (5.5) then involves

$$ X_j^i \;=\; \big[\, \nabla\sigma_L F_j^i \big|_{x_1}\ \cdots\cdots\ \nabla\sigma_L F_j^i \big|_{x_p} \,\big]^T, \qquad Y_j^i \;=\; \big[\, H_j^i \big|_{x_1}\ \cdots\cdots\ H_j^i \big|_{x_p} \,\big]^T \qquad (5.8) $$
where $p$ in $x_p$ represents the number of mesh points and $H$ and $F$ are as shown in (5.5). The number $p$ increases as the mesh size is reduced. Therefore

$$ \big\langle \nabla\sigma_L F_j^i,\ \nabla\sigma_L F_j^i \big\rangle \;=\; \lim_{\Delta x \to 0} \big( X_j^{iT} X_j^i \big)\, \Delta x, \qquad \big\langle H_j^i,\ \nabla\sigma_L F_j^i \big\rangle \;=\; \lim_{\Delta x \to 0} \big( X_j^{iT} Y_j^i \big)\, \Delta x \qquad (5.9) $$
This implies that one can calculate $w_j^i$ as

$$ w_j^i \;=\; -\big( X_j^{iT} X_j^i \big)^{-1} \big( X_j^{iT} Y_j^i \big) \qquad (5.10) $$
An interesting observation is that Equation (5.10) is the standard least squares estimate over a mesh on $\Omega$. Note that the mesh size $\Delta x$ should be such that the number of points $p$ is greater than or equal to the order of approximation $L$; this guarantees full rank for $( X_j^{iT} X_j^i )$. There exist various ways to efficiently approximate integrals such as those appearing in (5.5). Monte Carlo integration techniques can be used, in which the mesh points are sampled stochastically instead of being selected in a deterministic fashion [30]. In any case, the numerical algorithm ultimately requires the solution of (5.10), which is a least squares computation of the neural network weights. Numerically stable routines to compute equations like (5.10) exist in several software packages, including MATLAB, which is used in the next section. A flowchart of the proposed computational algorithm is shown in Figure 5.1. This is an offline algorithm, run a priori to obtain a neural network constrained state feedback controller that is nearly L2-gain optimal.
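To make the batch computation concrete, the following MATLAB sketch codes one inner-loop step, i.e. (5.8), (5.10) and (5.6), for a generic two-state plant. The dynamics f, g, k, the current policies uj and di, the saturation model (taken here as $\phi(s) = 2\tanh(s/2)$), the polynomial basis and its Jacobian, and the mesh are all illustrative assumptions, not the data of the example in Section 5.3.

```matlab
% Sketch of one inner-loop step: build X, Y over a mesh (5.8), solve the
% least squares problem (5.10), and update the disturbance policy (5.6).
% All problem data below are placeholder assumptions.
gam = 5;
f  = @(x) [x(2); -x(1) - x(2)];          % assumed drift dynamics
g  = @(x) [0; 1];   k = @(x) [0; 1];
h  = @(x) x;                             % assumed performance output h(x)
phi_inv = @(v) 2*atanh(v/2);             % inverse of phi(s) = 2*tanh(s/2)
uj = @(x) -2*tanh(x(1) + x(2));          % assumed current control policy
di = @(x) 0;                             % assumed current disturbance policy

sigma_grad = @(x) [2*x(1) 0; x(2) x(1); 0 2*x(2)];  % Jacobian of the basis
L = 3;                                              % sigma = [x1^2; x1*x2; x2^2]

[x1g, x2g] = meshgrid(-1:0.1:1, -1:0.1:1);          % mesh on Omega
pts = [x1g(:) x2g(:)];   p = size(pts, 1);
X = zeros(p, L);   Y = zeros(p, 1);
for m = 1:p
    x = pts(m, :)';
    F = f(x) + g(x)*uj(x) + k(x)*di(x);             % F_j^i of (5.5)
    X(m, :) = (sigma_grad(x)*F)';
    Y(m) = h(x)'*h(x) + 2*integral(phi_inv, 0, uj(x)) ...
           - gam^2*(di(x)'*di(x));                  % H_j^i of (5.5)
end
w = -(X'*X) \ (X'*Y);                               % Equation (5.10)
d_next = @(x) (1/(2*gam^2)) * k(x)' * sigma_grad(x)' * w;  % Equation (5.6)
```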
[Figure 5.1 here: flowchart of the proposed algorithm. Initialization: the neurons $\sigma_L$; the number of mesh points $p$; an initial asymptotically stable control $u_0$; the numbers of policy iterations $I_1, I_2$; the neural network region of approximation $\Omega$; the state-related performance criteria $h(x), \gamma_0$; and the control-related performance criterion $R$. Starting from $j = 0$, $i = 0$, $\hat{d}^{\,0} = 0$, form $X_j^i$ and $Y_j^i$ as in (5.8), compute $w_j^i$ from (5.10), and update $\hat{d}^{\,i+1}$ by (5.6) until $i > I_1$; then update $\hat{u}_{j+1}$ by (5.7) until $j > I_2$. If the HJI equation is still solvable, reduce $\gamma$, let $u_0$ be $\hat{u}_{I_2}$, and repeat; otherwise finish.]

Figure 5.1. Flowchart of the algorithm
In this algorithm, once the policies converge for some $\gamma_1$, one may use the resulting control policy as the initial policy for new inner/outer-loop policy iterations with $\gamma_2 < \gamma_1$. The attenuation $\gamma$ is reduced until the HJI equation is no longer solvable on the desired compact set.
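A minimal sketch of this outer gamma-reduction loop is given below; `run_policy_iterations` is a hypothetical helper standing in for the inner/outer loops of Figure 5.1, and the reduction factor and stopping test are illustrative assumptions.

```matlab
% Sketch of the outer gamma-reduction loop around Figure 5.1.
% run_policy_iterations is a hypothetical helper that runs the inner/outer
% policy iterations and reports whether they converged (i.e. whether the
% HJI equation was solvable) at the given attenuation level.
gam = 10;                     % illustrative initial attenuation
u0  = @(x) -2*tanh(x(1));     % illustrative initial stabilizing control
while true
    [u_hat, solvable] = run_policy_iterations(u0, gam);
    if ~solvable
        break;                % keep the last successful attenuation level
    end
    gam_best = gam;   u_best = u_hat;   % record the achieved design
    u0  = u_hat;                        % warm-start the next round
    gam = 0.9*gam;                      % illustrative reduction factor
end
```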
5.2 Stability and Convergence of Least Squares Neural Network Policy Iterations

In this section, the stability and convergence of the policy iterations (5.5), (5.6) and (5.7) are studied. It is shown that the closed-loop dynamics resulting from the inner-loop iteration on the disturbance (5.6) are asymptotically stable as $\hat{d}^{\,i+1}$ uniformly converges to $d^{\,i+1}$. It is then shown that the updated $\hat{u}_{j+1}$ is also stabilizing. Hence, this section starts by showing convergence results for the method of least squares when neural networks are used to solve for $V_j^i$ in (5.5). Note that (5.2) is a Fourier series expansion. In this chapter, a linear-in-parameters Volterra neural network is used. This gives a power series neural network that has the important property of being differentiable, which means that it can uniformly approximate a continuous function together with all its partial derivatives up to order $m$, using the same polynomial, by differentiating the series termwise. This type of series is $m$-uniformly dense, as shown in [1]. Other $m$-uniformly dense neural networks, not necessarily based on power series, are studied in [41]. To study the convergence properties of the developed neural network algorithm, the following assumptions are required.
Assumption 5.1 It is assumed that the available storage exists and is positive definite. This is guaranteed for stabilizable dynamics and when the performance functional satisfies zero-state observability.

Assumption 5.2 The system dynamics and the performance integrands are such that the solution of $PI( V_j^i, u_j, d^i ) = 0$ is continuous and differentiable for all $i$ and $j$, therefore belonging to the Sobolev space $V \in H^{1,2}(\Omega)$ [6].
Assumption 5.3 One can choose complete coordinate elements $\{ \sigma_j \}_1^{\infty} \in H^{1,2}(\Omega)$ such that the solution $V \in H^{1,2}(\Omega)$ and $\{ \partial V/\partial x_1, \ldots, \partial V/\partial x_n \}$ can be uniformly approximated by the infinite series built from $\{ \sigma_j \}_1^{\infty}$.
Assumption 5.4 The sequence $\{ \psi_j = \mathcal{A}\sigma_j \}$ is linearly independent and complete, where

$$ \mathcal{A}\sigma_j \;=\; \Big( \frac{d\sigma_j}{dx} \Big)^T ( f + g u + k d ) $$
Assumptions 5.1–5.3 are standard in the H∞ control theory and neural network control literature. Lemma 5.1 assures the linear independence required in the fourth
assumption, while the high-order Weierstrass approximation theorem (Theorem 1.15) shows that for every $V$ and $\varepsilon$ there exist $L$ and $w_L$ such that

$$ \| \hat{V} - V \| < \varepsilon, \qquad \Big\| \frac{d\hat{V}}{dx_k} - \frac{dV}{dx_k} \Big\| < \varepsilon_k \quad \forall k $$

This implies that, as $L \to \infty$,

$$ \sup_{x \in \Omega} \big| \mathcal{A}\hat{V} - \mathcal{A}V \big| \to 0, \qquad \big\| \mathcal{A}\hat{V} - \mathcal{A}V \big\|_{L^2(\Omega)} \to 0 $$

Therefore completeness of $\{ \psi_j \}$ is established, and Assumption 5.4 is satisfied. As for the HJB equation [1], one can use the previous assumptions to conclude the uniform convergence of the least squares method in the Sobolev space $H^{1,2}(\Omega)$ [6].
Theorem 5.1 The neural network least squares approach converges uniformly, that is,

$$ \sup_{x \in \Omega} \Big| \frac{d\hat{V}_j^i}{dx} - \frac{dV_j^i}{dx} \Big| \to 0, \quad \sup_{x \in \Omega} \big| \hat{V}_j^i - V_j^i \big| \to 0, \quad \sup_{x \in \Omega} \big| \hat{d}^{\,i+1} - d^{\,i+1} \big| \to 0, \quad \sup_{x \in \Omega} \big| \hat{u}_{j+1} - u_{j+1} \big| \to 0 $$
Next, it is shown that the system $\dot{x} = f_j + k \hat{d}^{\,i+1}$ is asymptotically stable, and hence Equation (5.5) can be used to find $\hat{V}^{i+1}$.
Theorem 5.2 $\exists L_0 : \forall L \ge L_0$, the system $\dot{x} = f_j + k \hat{d}^{\,i+1}$ is asymptotically stable.
Proof. Since the system $\dot{x} = f_j + k d$ is dissipative with respect to $\gamma$, this implies [90] that there exists $P(x) > 0$ such that

$$ P_x^T f_j + h^T h + 2 \int_0^{u_j} \phi^{-1}(v)\, dv + \tfrac{1}{4\gamma^2} P_x^T k k^T P_x \;=\; -Q(x) \;<\; 0 \qquad (5.11) $$
where $\forall i$, $P(x) \ge V^i(x)$. Since

$$ ( V_x^{i+1} )^T \big( f_j + \tfrac{1}{2\gamma^2} k k^T V_x^i \big) + h^T h + 2 \int_0^{u_j} \phi^{-1}(v)\, dv - \tfrac{1}{4\gamma^2} ( V_x^i )^T k k^T V_x^i \;=\; 0 \qquad (5.12) $$

one can write the following using Equations (5.12) and (5.11):
$$ ( P_x - V_x^{i+1} )^T \big( f_j + k d^{\,i+1} \big) \;=\; P_x^T f_j + h^T h + 2 \int_0^{u_j} \phi^{-1}(v)\, dv + \tfrac{1}{4\gamma^2} P_x^T k k^T P_x - \tfrac{1}{4\gamma^2} ( P_x - V_x^i )^T k k^T ( P_x - V_x^i ) $$
$$ = \; -Q(x) - \tfrac{1}{4\gamma^2} ( P_x - V_x^i )^T k k^T ( P_x - V_x^i ) \;<\; 0 \qquad (5.13) $$

Since $\dot{x} = f_j + k d^{\,i+1}$ is asymptotically stable and the right-hand side of (5.13) is negative definite, it follows that $P(x) - V^{i+1}(x) > 0$. Using $P(x) - V^{i+1}(x) > 0$ as a Lyapunov function candidate for the dynamics $\dot{x} = f_j + k \hat{d}^{\,i+1}$, one has
$$ ( P_x - V_x^{i+1} )^T \big( f_j + \tfrac{1}{2\gamma^2} k k^T \hat{V}_x^i \big) \;=\; P_x^T f_j + h^T h + 2 \int_0^{u_j} \phi^{-1}(v)\, dv + \tfrac{1}{4\gamma^2} P_x^T k k^T P_x $$
$$ - \tfrac{1}{4\gamma^2} ( P_x - V_x^i )^T k k^T ( P_x - V_x^i ) + \tfrac{1}{2\gamma^2} ( P_x - V_x^{i+1} )^T k k^T ( \hat{V}_x^i - V_x^i ) $$
$$ \le \; -Q(x) + \tfrac{1}{2\gamma^2} ( P_x - V_x^{i+1} )^T k k^T ( \hat{V}_x^i - V_x^i ) $$

From uniform convergence of $\hat{V}^i$ to $V^i$, $\exists L_0 : \forall L \ge L_0$ such that
$$ \forall x \in \Omega, \qquad \tfrac{1}{2\gamma^2} ( P_x - V_x^{i+1} )^T k k^T ( \hat{V}_x^i - V_x^i ) \;<\; Q(x) $$
This implies that

$$ \forall x \in \Omega, \qquad ( P_x - V_x^{i+1} )^T \big( f_j + \tfrac{1}{2\gamma^2} k k^T \hat{V}_x^i \big) \;<\; 0 \qquad \blacksquare $$

Next, it is shown that the neural network policy iteration on the control, as given by (5.7), is asymptotically stabilizing and L2-gain stable for the same attenuation $\gamma$ on $\Omega$.
Theorem 5.3 $\exists L_0 : \forall L \ge L_0$, the system $\dot{x} = f + g \hat{u}_{j+1}$ is asymptotically stable.
Proof. This proof is in essence contained in Corollary 3 in [1], where the positive definiteness of $h(x)$ is utilized to show that uniform convergence of $\hat{V}_j$ to $V_j$ implies that $\exists L_0 : \forall L \ge L_0$,

$$ \forall x \in \Omega, \qquad ( V_x^j )^T \big( f + g \hat{u}_{j+1} \big) \;<\; 0 \qquad \blacksquare $$

Theorem 5.4 If $\dot{x} = f + g u_{j+1} + k d$ has L2-gain less than $\gamma$, then $\exists L_0 : \forall L \ge L_0$ such that $\dot{x} = f + g \hat{u}_{j+1} + k d$ has L2-gain less than $\gamma$.
Proof. Since $\dot{x} = f + g u_{j+1} + k d$ has L2-gain less than $\gamma$, there exists a $P(x) > 0$ such that

$$ P_x^T ( f + g u_{j+1} ) + h^T h + 2 \int_0^{u_{j+1}} \phi^{-1}(v)\, dv + \tfrac{1}{4\gamma^2} P_x^T k k^T P_x \;=\; -Q(x) \;<\; 0 $$
Hence, one can show that

$$ P_x^T ( f + g \hat{u}_{j+1} ) + h^T h + 2 \int_0^{\hat{u}_{j+1}} \phi^{-1}(v)\, dv + \tfrac{1}{4\gamma^2} P_x^T k k^T P_x \;=\; -Q(x) + P_x^T g ( \hat{u}_{j+1} - u_{j+1} ) + 2 \int_{u_{j+1}}^{\hat{u}_{j+1}} \phi^{-1}(v)\, dv $$
From uniform convergence of $\hat{u}_{j+1}$ to $u_{j+1}$, $\exists L_0 : \forall L \ge L_0$ such that

$$ \forall x \in \Omega, \qquad P_x^T g ( \hat{u}_{j+1} - u_{j+1} ) + 2 \int_{u_{j+1}}^{\hat{u}_{j+1}} \phi^{-1}(v)\, dv \;<\; Q(x) $$
This implies that

$$ \forall x \in \Omega, \qquad P_x^T ( f + g \hat{u}_{j+1} ) + h^T h + 2 \int_0^{\hat{u}_{j+1}} \phi^{-1}(v)\, dv + \tfrac{1}{4\gamma^2} P_x^T k k^T P_x \;<\; 0 \qquad \blacksquare $$
The importance of Theorem 5.4 is that it justifies solving for the available storage of the newly updated dynamics $\dot{x} = f + g \hat{u}_{j+1} + k d$. Hence, all of the preceding theorems can be used to show, by induction, the following main convergence result, upon which the algorithm proposed in Figure 5.1 is justified.

Theorem 5.5 $\exists L_0 : \forall L \ge L_0$ such that

A. For all $j$, $\dot{x} = f + g \hat{u}_{j+1} + k d$ is dissipative with L2-gain less than $\gamma$ on $\Omega$.

B. For all $j$ and $i$, $\dot{x} = f + g \hat{u}_{j+1} + k d^i$ is asymptotically stable on $\Omega$.

C. $\forall \varepsilon$, $\exists L_1 > L_0$ such that $\sup_{x \in \Omega} | \hat{u}_j - u | < \varepsilon$ and $\sup_{x \in \Omega} | \hat{V}_j^i - V | < \varepsilon$.

Proof. The proof follows directly by induction from Theorems 5.1–5.4. ∎
5.3 RTAC: The Nonlinear Benchmark Problem

In this section, we consider the disturbance attenuation problem associated with the so-called Rotational/Translational Actuator (RTAC) system depicted in Figure 5.2.
Figure 5.2. Rotational actuator to control a translational oscillator
The RTAC system is described in [25] and serves as an abstraction for studying a dual-spin spacecraft. It consists of a cart of mass M connected to a fixed wall by a linear spring of stiffness k. The cart is constrained to one-dimensional travel. The proof-mass actuator attached to the cart has mass m and moment of inertia I about its center of mass, which is located at a distance e from the point about which the proof mass rotates. Its motion occurs in a horizontal plane, so that no gravitational forces need to be considered. The equations of motion of the RTAC are derived in [25] and are repeated as follows:
$$ \ddot{\xi} + \xi \;=\; \varepsilon \big( \dot{\theta}^2 \sin\theta - \ddot{\theta} \cos\theta \big) + F $$
$$ \ddot{\theta} \;=\; -\varepsilon \ddot{\xi} \cos\theta + u \qquad (5.14) $$
where $\xi$ is the one-dimensional displacement of the cart, $\theta$ the angular position of the proof body, $F$ the disturbance, and $u$ the control input. The coupling between the translational and rotational motions is captured by the parameter $\varepsilon$, which is defined by

$$ \varepsilon \;=\; \frac{me}{\sqrt{( I + me^2 )( M + m )}} $$

where $0 < \varepsilon < 1$ reflects the eccentricity of the proof body. Letting $x = \operatorname{col}(x_1, x_2, x_3, x_4) = \operatorname{col}(\xi, \dot{\xi}, \theta, \dot{\theta})$, $d = F$, and $y = \xi$ yields the following state-space representation of (5.14):
$$ \dot{x} \;=\; f(x) + g(x)u + k(x)d, \qquad y \;=\; x_1 \qquad (5.15) $$
where

$$ f(x) \;=\; \begin{bmatrix} x_2 \\[4pt] \dfrac{ -x_1 + \varepsilon x_4^2 \sin x_3 }{ 1 - \varepsilon^2 \cos^2 x_3 } \\[8pt] x_4 \\[4pt] \dfrac{ \varepsilon \cos x_3 \,( x_1 - \varepsilon x_4^2 \sin x_3 ) }{ 1 - \varepsilon^2 \cos^2 x_3 } \end{bmatrix}, \qquad g(x) \;=\; \begin{bmatrix} 0 \\[4pt] \dfrac{ -\varepsilon \cos x_3 }{ 1 - \varepsilon^2 \cos^2 x_3 } \\[8pt] 0 \\[4pt] \dfrac{ 1 }{ 1 - \varepsilon^2 \cos^2 x_3 } \end{bmatrix}, \qquad k(x) \;=\; \begin{bmatrix} 0 \\[4pt] \dfrac{ 1 }{ 1 - \varepsilon^2 \cos^2 x_3 } \\[8pt] 0 \\[4pt] \dfrac{ -\varepsilon \cos x_3 }{ 1 - \varepsilon^2 \cos^2 x_3 } \end{bmatrix} \qquad (5.16) $$
where $1 - \varepsilon^2 \cos^2 x_3 \neq 0$ for all $x_3$, since $\varepsilon < 1$. The dynamics of this nonlinear plant pose a challenge because the rotational and translational motions are coupled, as shown. In [89] and [74], unconstrained controls were obtained to solve the L2 disturbance attenuation problem of the RTAC system based on Taylor series solutions of the HJI equation. In [74], unconstrained controllers based on the state-dependent Riccati equation (SDRE) were also obtained. The SDRE is easier to solve than the HJI equation and results in a time-varying controller that was shown to be suboptimal. In this section, a neural network constrained-input H∞ state feedback controller is computed for the RTAC shown in Figure 5.2. To our knowledge, this is the first treatment in which input constraints are explicitly considered during the design of the optimal H∞ controller that guarantees optimal disturbance attenuation. The dynamics of the nonlinear plant are given as
$$ \dot{x} \;=\; f(x) + g(x)u + k(x)d, \qquad |u| \le 2 $$
$$ z^T z \;=\; x_1^2 + 0.1 x_2^2 + 0.1 x_3^2 + 0.1 x_4^2 + u^2 \qquad (5.17) $$
$$ \varepsilon \;=\; \frac{me}{\sqrt{( I + me^2 )( M + m )}} \;=\; 0.2, \qquad \gamma^2 = 10 $$
with $f(x)$, $g(x)$ and $k(x)$ as defined in (5.16). The design procedure goes as follows.

Initial control selection. The following H∞ controller for the linear system resulting from a Jacobian linearization of (5.17) is chosen:
$$ u_0 \;=\; -2 \tanh\big( 2.4182 x_1 + 1.1650 x_2 + 0.3416 x_3 + 1.0867 x_4 \big) $$

and forced to obey the $|u| \le 2$ constraint. This is a stabilizing controller that guarantees that the L2-gain