Emergent Behavior Detection and Task Coordination for Multiagent Systems: A Distributed Estimation and Control Approach (Studies in Systems, Decision and Control, 397) 3030868923, 9783030868925

This book addresses problems in the modeling, detection, and control of emergent behaviors and task coordination in multiagent systems.


Table of contents:
Preface
Acknowledgements
Contents
List of Figures
1 Introduction
1.1 Introduction
1.2 Multiagent Systems
1.3 Emergent Behaviors in Multiagent Systems
1.3.1 Characteristics of Multiagent Systems
1.3.2 Interaction Dynamics
1.3.3 Interaction Topologies
1.4 Task Coordination in Multiagent Systems
1.5 Agent Model Examples
1.5.1 Single- and Double-Integrator Models
1.5.2 Flocking Models
1.6 Summary
References
2 Preliminaries on Matrix and System Theory
2.1 Introduction
2.2 Basics on Linear Algebra and Matrix Theory
2.3 Solutions to and Stability of Linear Systems
2.3.1 Solutions to Linear Systems
2.3.2 Stability of Linear Systems
2.4 Tools for Nonlinear System Analysis and Design
2.4.1 Lyapunov Stability
2.4.2 Nonlinear Control Design
2.5 Summary
References
3 Interaction Topologies of Multiagent Systems and Consensus Algorithms
3.1 Interaction Topologies of Multiagent Systems
3.1.1 Algebraic Graph Theory
3.1.2 Matrix Representation of Sensing/Communication Network
3.2 Basic Consensus Algorithm: Continuous-Time Case
3.3 Basic Consensus Algorithm: Discrete-Time Case
3.4 Consensus Algorithm for High-Order Linear Systems
3.4.1 Cooperative Backstepping Control
3.4.2 Cooperative Output Feedback Control
3.4.3 Cooperative Control for a Class of Linear Systems in a Canonical Form
3.5 A Discontinuous Consensus Algorithm
3.6 Summary
References
4 Emergent Behavior Detection in Multiagent Systems
4.1 Introduction
4.2 Emergent Behavior Indicators
4.3 Distributed Estimation of Time-Varying Signals
4.4 Distributed Least Squares Algorithm
4.4.1 Fundamentals on Least Squares Algorithms
4.4.2 Distributed Recursive Least Squares Algorithm
4.4.3 Distributed Iterative Least Squares Algorithm
4.5 Distributed Kalman Filtering Algorithm
4.6 Summary
References
5 Distributed Task Coordination of Multiagent Systems
5.1 Task Coordination as a Control Problem
5.2 A General Design Method for Distributed Nonlinear Control
5.2.1 General Design
5.2.2 Distributed Control of Nonholonomic Robots
5.3 Distributed Tracking Control for a Class of Uncertain Nonlinear Systems
5.3.1 A Simple Case for fi(xi)
5.3.2 A General Case for Neural Network Parameterized fi(xi)
5.3.3 The Case with Partially Unknown gi
5.3.4 The Case with Completely Unknown gi
5.3.5 Extensions
5.4 Summary
References
6 Multiagent Distributed Optimization and Reinforcement Learning Control
6.1 Introduction
6.2 Basics on Optimization and Reinforcement Learning Algorithms
6.2.1 Optimization Algorithms
6.2.2 Dynamic Programming and Reinforcement Learning
6.3 Multiagent Distributed Optimization
6.3.1 Distributed Multiagent Optimization Algorithm: Case 1
6.3.2 Distributed Multiagent Optimization Algorithm: Case 2
6.4 Multiagent Distributed Coordination Using Reinforcement Learning
6.4.1 Multiagent HJB Equation
6.4.2 Value Iteration Algorithm for Multiagent HJB
6.4.3 Q-function-Based Value Iteration
6.4.4 Extension
6.5 Summary
References
Index

Studies in Systems, Decision and Control 397

Jing Wang

Emergent Behavior Detection and Task Coordination for Multiagent Systems: A Distributed Estimation and Control Approach

Studies in Systems, Decision and Control Volume 397

Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Systems, Decision and Control” (SSDC) covers both new developments and advances, as well as the state of the art, in the various areas of broadly perceived systems, decision making and control–quickly, up to date and with a high quality. The intent is to cover the theory, applications, and perspectives on the state of the art and future developments relevant to systems, decision making, control, complex processes and related areas, as embedded in the fields of engineering, computer science, physics, economics, social and life sciences, as well as the paradigms and methodologies behind them. The series contains monographs, textbooks, lecture notes and edited volumes in systems, decision making and control spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution and exposure which enable both a wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/13304

Jing Wang

Emergent Behavior Detection and Task Coordination for Multiagent Systems: A Distributed Estimation and Control Approach

Jing Wang Department of Electrical and Computer Engineering Bradley University Peoria, IL, USA

ISSN 2198-4182 ISSN 2198-4190 (electronic) Studies in Systems, Decision and Control ISBN 978-3-030-86892-5 ISBN 978-3-030-86893-2 (eBook) https://doi.org/10.1007/978-3-030-86893-2 © Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To My Parents Wei Wang and Guihua Wen To Yulan Li, Maggie Wang, and Matthew Wang

Preface

In this book, we consider multiagent systems with linear and/or nonlinear dynamics, possibly with model and sensing/communication uncertainties. Examples include mobile sensor networks, satellite and communication networks, robotic networks (air, ground, and underwater), power grids, and others. With the ever-increasing development of distributed devices and computing power and of fast computer networks, data-driven distributed algorithms for multiagent systems have shown great potential in applications such as surveillance and reconnaissance, cooperative exploration for search and rescue missions, environmental sensing and monitoring, and cooperative transportation. A comprehensive understanding of emergent behaviors in multiagent systems is thus of paramount importance because it may further facilitate the design of optimal distributed algorithms for the resilient operation of multiagent systems. This book focuses on addressing the fundamental issues of multiagent emergent behavior detection and multiagent task coordination in order to deal with multiagent systems with nonlinear dynamics and uncertainties and to achieve autonomous and optimal task coordination. For multiagent emergent behavior detection, the objective is to develop distributed estimation algorithms for unknown interaction dynamics, interaction topologies, and certain behavior characteristic functions. For multiagent task coordination, the objective is to develop proper distributed control protocols to perform the coordinated tasks, and the coordination protocol design is closely related to the agent system dynamics and task classifications. This book provides systematic approaches and analytical solutions to address the above-mentioned problems arising from multiagent uncertainties with regard to interaction dynamics and interaction topologies. The book aims to present a comprehensive and self-contained coverage of the study of multiagent emergent behaviors and task coordination by presenting not only some recently developed distributed algorithms based on the research work of the author and his collaborators, but also some fundamentals on systems and controls in order to facilitate the technique development in the book. It brings together the necessary tools and methodologies of analysis and design for multiagent systems, from systems and controls theory to graph theory, from distributed least squares estimation to distributed Kalman filtering, from linear consensus algorithms to nonlinear and adaptive consensus algorithms, and from
distributed optimization algorithms to distributed optimal coordination algorithms using reinforcement learning. By integrating these approaches and solutions, the book provides a general framework and a number of concrete algorithms for analyzing and designing multiagent systems, and paves the way for potential applications in engineered systems such as sensor networks, microgrids, and robotic networks, to name but a few. The book puts its emphasis on rigorous solutions and provides answers to the critical issues related to distributed estimation and control, distributed optimization, and distributed optimal control using reinforcement learning. The book consists of six chapters. Chapter 1 provides an overview of the characteristics of multiagent systems and points out challenges in studying emergent behaviors of multiagent systems. Chapter 2 presents standard matrix theory and system theory. Technical topics related to matrix transformations, matrix eigenstructure, solutions to linear systems, Lyapunov stability theory, and nonlinear design tools are reviewed. These materials cover the background knowledge needed to follow the algorithm development in the remaining chapters of the book. In Chap. 3, thorough discussions on expressing the multiagent interaction topology using an algebraic graph and a sensing/communication matrix are included. Linear consensus algorithms are presented with system-solution-based convergence analysis. New designs for both backstepping-based and output-feedback-based consensus algorithms are given. Chapter 3 is the foundation for the subsequent development of distributed estimation, control, and optimization algorithms. Chapter 4 focuses on distributed estimation algorithms and presents three sets of results for emergent behavior detection in multiagent systems. First, a consensus-based algorithm is given for the distributed estimation of time-varying global signals, which serve as the indicators for emergent behaviors of multiagents as a group. A distributed estimation algorithm for the interaction topology is integrated into the overall design. Second, two distributed least squares algorithms are presented for dealing with more general emergent behavior detection scenarios. Third, a new distributed Kalman filtering algorithm is proposed for handling multiagent systems with dynamics. Chapter 5 is on the multiagent task coordination problem. We formulate it as a distributed control problem and consider multiagent systems with nonlinear dynamics and model uncertainties. A general design for distributed nonlinear coordination control is first proposed, which is applied to solve the coordination problem of nonholonomic systems. Distributed leader tracking control is then thoroughly studied by systematically presenting a number of adaptive designs for nonlinear multiagent systems with various uncertainties. Asymptotic convergence can be guaranteed in those designs. Chapter 6 studies the distributed task optimization problem in multiagent systems. First, basics of numerical optimization algorithms, dynamic programming, and reinforcement learning algorithms are reviewed. Then, distributed multiagent optimization algorithms are presented, which are based on the distributed consensus estimation of the gradient of the overall cost function. Third, a new consensus-based value iteration algorithm is proposed to solve the optimal coordination problem for a class of nonlinear multiagent systems. Neural network parameterization of the value function and reinforcement learning techniques are used in the design. Those results
may serve as a vehicle for further study on data-driven distributed reinforcement learning control of multiagent systems. In summary, the book provides a unified treatment of multiagent emergent behavior detection and task coordination using a distributed estimation and control approach. On the one hand, it is the author's intent to make the book as self-contained as possible by including some fundamentals on systems and control theory, learning, and optimization. Readers with moderate training in the fields of engineering and science will be able to follow the text with ease. On the other hand, the book covers advanced research topics in sufficient depth in terms of distributed control and optimization for multiagent systems, particularly those with uncertainties and with inherently nonlinear dynamics such as nonholonomic constraints. Many simulation examples are included for an easy grasp of the ideas, methodologies, and algorithms presented in the book. The book may serve as a reference for researchers and practicing engineers in the areas of systems and controls, robotics, and cyber-physical systems. The book can also readily be adopted as an advanced textbook for a graduate-level course or an elective for senior undergraduate students focusing on distributed estimation and control of multiagent systems.

Peoria, IL, USA
July 2021

Jing Wang

Acknowledgements

I would particularly like to thank my postdoc advisor, Zhihua Qu, who supported and guided my early research in the areas of nonlinear control and multiagent systems, and who continued to provide inspiration, encouragement, and valuable suggestions after I completed my postdoc work at the University of Central Florida. I wish to express my thanks to my research collaborators over the years for their useful suggestions and constructive comments: Brian Abbe, In Soo Ahn, Shuzhi Sam Ge, Yi Guo, Zhiwu Huang, Yufeng Lu, Morrison Obeng, Jun Peng, Khanh Pham, Zhihua Qu, Gennady Staskevich, Zhengdong Sun, Alvaro Velasquez, Xiaohe Wu, and Tianyu Yang. My thanks also go to my colleagues, students, and friends who provided support and help. I would like to acknowledge the funding support for my research work from the following agencies during the past few years: the Air Force Research Lab, the Air Force Office of Scientific Research Summer Faculty Fellowship Program (AFOSR-SFFP), the National Science Foundation (MRI), and the Illinois Space Grant Consortium. Finally, I want to thank my parents (Wei Wang and Guihua Wen), my wife (Yulan), and my kids (Maggie and Matthew) for their love and support.

Peoria, IL, USA
July 2021

Jing Wang



List of Figures

Fig. 1.1 Architecture of emergent behavior detection and control
Fig. 1.2 Consensus behaviors under different initial conditions
Fig. 1.3 Proposed multiagent task coordination architecture
Fig. 3.1 A multiagent system with limited sensing/communication
Fig. 3.2 Digraph examples
Fig. 3.3 Location of eigenvalues of L
Fig. 3.4 A spanning tree in a digraph
Fig. 3.5 A three-node digraph with a spanning tree
Fig. 3.6 A three-node strongly connected digraph
Fig. 3.7 A network of four agents
Fig. 3.8 An irreducible network
Fig. 3.9 A reducible network
Fig. 3.10 Switching sensing/communication topologies
Fig. 3.11 Consensus under the sensing/communication topology A1
Fig. 3.12 Consensus under the sensing/communication topology A2
Fig. 3.13 Consensus under the sensing/communication topology A3
Fig. 3.14 The consensus under the switching sensing/communication topologies
Fig. 3.15 Consensus under the sensing/communication topology A1
Fig. 3.16 Control inputs under the sensing/communication topology A1
Fig. 3.17 Consensus under the sensing/communication topology A2
Fig. 3.18 Control inputs under the sensing/communication topology A2
Fig. 3.19 Consensus under the sensing/communication topology A3
Fig. 3.20 Control inputs under the sensing/communication topology A3
Fig. 3.21 Consensus under the switching sensing/communication topologies
Fig. 3.22 Control inputs under the switching sensing/communication topologies
Fig. 3.23 The consensus under the sensing/communication topology A1
Fig. 3.24 Control inputs under the sensing/communication topology A1
Fig. 3.25 The consensus under the sensing/communication topology A2
Fig. 3.26 Control inputs under the sensing/communication topology A2
Fig. 3.27 The consensus under the sensing/communication topology A3
Fig. 3.28 Control inputs under the sensing/communication topology A3
Fig. 3.29 The consensus under the switching sensing/communication topologies
Fig. 3.30 Control inputs under the switching sensing/communication topologies
Fig. 3.31 The consensus under the sensing/communication topology A1
Fig. 3.32 Control inputs under the sensing/communication topology A1
Fig. 3.33 The convergence of estimation errors
Fig. 3.34 The consensus under the sensing/communication topology A1
Fig. 3.35 Control inputs under the sensing/communication topology A1
Fig. 3.36 System responses
Fig. 3.37 System responses
Fig. 4.1 Average estimates by agents
Fig. 4.2 Estimation errors
Fig. 4.3 Average estimates by agents
Fig. 4.4 Estimation errors
Fig. 4.5 Estimate of w1 by agent 1
Fig. 4.6 Estimate of w1 by agent 2
Fig. 4.7 Estimate of w1 by agent 3
Fig. 4.8 Three sensors monitoring ten agents
Fig. 4.9 Ten agents
Fig. 4.10 Norm of estimation errors for ten agents' positions by sensor 1
Fig. 4.11 Norm of estimation errors for ten agents' positions by sensor 2
Fig. 4.12 Norm of estimation errors for ten agents' positions by sensor 3
Fig. 4.13 Convergence of ŵ11^i to w11
Fig. 4.14 Convergence of ŵ12^i to w12
Fig. 4.15 Convergence of ŵ13^i to w13
Fig. 4.16 Norm of estimation errors for ten agents' positions by sensor 1
Fig. 4.17 Norm of estimation errors for ten agents' positions by sensor 2
Fig. 4.18 Norm of estimation errors for ten agents' positions by sensor 3
Fig. 4.19 Agent 1 state x11(k) and the corresponding estimates by three observers
Fig. 4.20 Agent 2 state x21(k) and the corresponding estimates by three observers
Fig. 4.21 Agent 3 state x31(k) and the corresponding estimates by three observers
Fig. 4.22 Agent 4 state x41(k) and the corresponding estimates by three observers
Fig. 4.23 Agent 5 state x51(k) and the corresponding estimates by three observers
Fig. 4.24 Norm of estimation errors
Fig. 4.25 Communication topologies: (A1: top, A2: bottom)
Fig. 4.26 Estimates of w11 by four agents
Fig. 4.27 Estimates of w12 by four agents
Fig. 4.28 Estimates of w13 by four agents
Fig. 4.29 Estimates of w14 by four agents
Fig. 5.1 Initial configurations of three robots
Fig. 5.2 Final configurations of three robots
Fig. 5.3 System response under control (5.14)
Fig. 5.4 System response under control (5.15)
Fig. 5.5 System response under control (5.18)
Fig. 5.6 System response under control (5.18)
Fig. 5.7 Control inputs
Fig. 5.8 A differential-drive robot
Fig. 5.9 A car-like robot
Fig. 5.10 System state responses xi(t)
Fig. 5.11 System state responses yi(t)
Fig. 5.12 System state responses θi(t)
Fig. 5.13 System control inputs ui1(t)
Fig. 5.14 System control inputs ui2(t)
Fig. 5.15 System state responses xi(t)
Fig. 5.16 System state responses yi(t)
Fig. 5.17 System state responses θi(t)
Fig. 5.18 Final configurations
Fig. 5.19 System state responses xi(t)
Fig. 5.20 Tracking errors x̃i(t)
Fig. 5.21 Control inputs ui(t)
Fig. 5.22 Parameter estimates α̂i
Fig. 5.23 Parameter estimates ŵi1
Fig. 5.24 Parameter estimates ŵi2
Fig. 5.25 System state responses xi(t)
Fig. 5.26 Tracking errors x̃i(t)
Fig. 5.27 Control inputs ui(t)
Fig. 5.28 Parameter estimates
Fig. 5.29 Parameter estimates
Fig. 5.30 System state responses xi(t)
Fig. 5.31 Tracking errors x̃i(t)
Fig. 5.32 Control inputs ui(t)
Fig. 5.33 Parameter estimates
Fig. 5.34 Parameter estimates
Fig. 5.35 Parameter estimates
Fig. 6.1 Estimation errors of agent 1
Fig. 6.2 Estimation errors of agent 2
Fig. 6.3 Estimation errors of agent 3
Fig. 6.4 Estimation errors of agent 4
Fig. 6.5 System responses
Fig. 6.6 Control inputs
Fig. 6.7 Parameter estimation
Fig. 6.8 Instantaneous cost value

Chapter 1

Introduction

1.1 Introduction

In this book, we study emergent behavior detection and task coordination problems for multiagent systems. Multiagent systems have been studied by researchers in many different disciplines such as biology, physics, chemistry, systems and controls, and computer science [1–9]. While multiagent systems may be defined and studied according to the domain knowledge of each discipline, in this book we define multiagent systems as a group of dynamical systems that can be mathematically modeled by differential and/or difference equations, and we take a systems and controls approach to present a number of research results for detection, estimation, modeling, and control of emergent behaviors in multiagent systems. Emergent behaviors in multiagent systems, arising from the local interactions of individuals in the group, are often seen in natural phenomena such as fish schooling, bird flocking, animal swarming, and the synchronization of fireflies [10–12]. Motivated by these fascinating phenomena, and in order to improve efficiency, flexibility, and robustness in engineering applications, recent years have witnessed the rapid development of engineered multiagent systems, as seen in robotic networks [5, 13, 14], power grids [15], computer networks [16], and sensor networks [17]. These developments have resulted, on the one hand, in much-improved computing, communication, and sensing technology and, on the other hand, in an enhanced understanding of emergent behaviors in multiagent systems, which enables researchers to analyze and design distributed estimation, control, and optimization algorithms for engineering applications such as surveillance and reconnaissance, cooperative exploration for search and rescue missions, environmental sensing and monitoring, and cooperative transportation. It is the objective of this book to collectively present a number of our research results, along with the necessary fundamentals, by focusing on an integrated model-based approach for distributed detection and control of emergent behaviors in multiagent systems.


For a single-agent system governed by dynamic models such as differential equations, its dynamic behaviors can be well characterized by the patterns of its equilibrium points and periodic orbits, as well as by their stability properties, such as stable nodes, unstable nodes, saddle points, limit cycles, and bifurcations [18]. Fruitful mathematical tools like matrix theory and Lyapunov analysis are available for carrying out the dynamic behavior analysis of such a single-agent system [19]. However, the study of dynamic behaviors of multiagent systems poses significant challenges because complex system behaviors emerge as a result of the agents' individual self-operating (sensing and actuating) capabilities as well as of the interactions among agents. A multiagent system has some unique characteristics that make it remarkably difficult to analyze simply using the standard tools in [18], even though the overall system may still be treated as a single one mathematically. For instance, the overall multiagent system may exhibit infinitely many equilibrium points, and convergence to a particular equilibrium point depends on the interaction rules among agents as well as on their interaction topologies. The question then is how to detect and analyze the rich emergent behaviors of multiagent systems induced by system dynamics and by agent interactions. That is, two dominating factors involved in the study of emergent behaviors of multiagent systems are the agents' interaction dynamics and interaction topologies. Hence, it is of paramount importance to develop reliable strategies for detecting emergent behaviors of autonomous multiagent systems by identifying, designing, and controlling interaction dynamics and interaction topologies.

Modeling Emergent Behaviors. Numerous attempts have been made to understand the collective behaviors of multiagent systems through observations, experiments, mathematical modeling, and computer simulations. In physical, chemical, and biomolecular systems, emergent behaviors, such as flocks, schools, herds, colonies, swarms, and synchronization, have been revealed through quantitative observations and experiments [2, 20, 21]. General studies show that the particle density of agents, reactions to external stimuli (e.g., resources or perturbations), boundary conditions, and decision making are key factors for the onset of spatial and temporal patterns. For example, fish schools organize loosely for food foraging and tightly to avoid predator attack. V-formations are often seen in long-distance bird flocking. In [22], a computer simulation model (boids) was designed to reproduce the swarm behaviors of animals based on a set of steering rules of cohesion, separation, and alignment. Despite the simple local interactions in the boids model, the simulated animation displays remarkable flocking behavior. In order to establish a mathematical explanation of the flocking behavior, the celebrated Vicsek model was proposed in [12], in which the alignment rule for a group of n agents was modeled as θi(k + 1) = (θi(k) + Σ_{j∈Ni(k)} θj(k))/(1 + ni(k)), where Ni(k) is the set of neighbors of agent i at time instant k and ni(k) is its cardinality. In essence, the heading direction θi(k + 1) of the ith agent at time instant k + 1 is updated to the average of its own heading and the headings of its neighboring agents. Under the condition that agents are sufficiently connected, the emergent behavior under the Vicsek model is that all agents eventually move with the same heading.
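A minimal simulation sketch of this alignment rule is given below; it is an illustration only, with an assumed sensing radius r, step size, and random initial conditions rather than values taken from any particular study.

```python
import numpy as np

def vicsek_step(pos, theta, r=1.0, speed=0.05):
    """One update of the (noise-free) Vicsek alignment rule.

    pos: (n, 2) array of agent positions; theta: (n,) array of headings.
    Each agent replaces its heading by the average of its own heading and
    the headings of all agents within sensing radius r, then moves one step.
    """
    n = len(theta)
    new_theta = np.empty(n)
    for i in range(n):
        # neighbors N_i(k): agents within radius r, excluding agent i itself
        dist = np.linalg.norm(pos - pos[i], axis=1)
        nbrs = np.where((dist < r) & (np.arange(n) != i))[0]
        # theta_i(k+1) = (theta_i(k) + sum_j theta_j(k)) / (1 + n_i(k))
        new_theta[i] = (theta[i] + theta[nbrs].sum()) / (1 + len(nbrs))
    new_pos = pos + speed * np.stack([np.cos(new_theta), np.sin(new_theta)], axis=1)
    return new_pos, new_theta

# Example: headings converge to a common value when the group stays connected.
rng = np.random.default_rng(0)
pos = rng.uniform(0, 2, size=(10, 2))
theta = rng.uniform(-0.5, 0.5, size=10)
for _ in range(200):
    pos, theta = vicsek_step(pos, theta)
print("heading spread:", theta.max() - theta.min())
```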
In the presence of intrinsic and extrinsic noises, the effect on phase transitions of the Vicsek model was studied in [23], and disordered
behaviors may result. More complicated bird flocking and formation models have been studied recently. In [24], flocking of multiagent dynamic systems was studied based on the complete boids rules under the assumption of all-to-all interaction. In [25], a nonlinear gain model is given to quantify the influence of distances between birds, and [26] studied bird flight formations using diffusion adaptation. More general swarm behavior models such as aggregation and social foraging were also analyzed in [27] by using potential function methods.

Interaction Topologies for Emergent Behaviors. It is apparent that emergent behaviors depend on the local interaction topologies among agents. The interactions among agents are usually realized through communication and sensing mechanisms, which are mostly time-varying and intermittent. By using a graph to capture the topology of the interaction network, the first systematic analysis of network connectivity was done in [28, 29], and conditions were obtained for composite undirected graphs, which need to be connected for generating global consensus behaviors. Extensions were made in [30, 31] to the cases with directed graphs, and the less restrictive conditions were stated as the existence of a spanning tree in the connectivity network or a periodically strongly connected network. Complementary to the aforementioned graph-theoretical methods, a matrix-theoretical framework was developed in [32] to deal with multiagent systems with high-order interaction dynamics. To achieve consensus behaviors, the necessary and sufficient condition is that the interaction network described by matrix sequences is sequentially complete. In [33], we studied the correlation between interaction dynamics and interaction topologies and pointed out that a sequentially complete interaction network may not guarantee the desired group behavior in the presence of discontinuous interaction dynamics and that unexpected behaviors may appear instead.

Detecting Emergent Behaviors. It is worth pointing out that analysis and detection of emergent behaviors have also been done using nonstandard model- and knowledge-based approaches, which can be generically classified into live emergence analysis methods and post-mortem emergence analysis methods [34, 35]. In the live emergence analysis, the emergent behavior is not known beforehand, while in the post-mortem analysis, the emergent property is observed and defined, and the system is analyzed to identify its causes. In [36], a grammar-based method was given for live emergence analysis, which introduces a set of rules to represent the behaviors, and two property grammars, L_whole and L_parts, are calculated correspondingly and compared for detecting emergent behaviors. In the post-mortem analysis method, events are used in [37] to define and identify the behaviors; in [38], macrovariables and microvariables are used to describe emergence. In [35], the post-mortem analysis relies on the representation of model components as macrolevel and microlevel emergent properties, and a composed model simulation is implemented to record those values for reconstructability analysis. There exist other methods for detecting and predicting emergent behaviors, such as the metric-based method [39], the semi-Boolean algebra method [40], the formal specification method [41], taxonomy-based exploration [42], and agent-based simulation [43, 44]. More recently, a feature classification method was proposed to recognize collective group behaviors based on limited information from local agents [45].
The key is the proper definition of behavior indicators such as group polarization, group momentum, and the Fiedler eigenvalue of the interaction graph.

Controlling Emergent Behaviors. Motivated by the modeling and analysis methods for emergent behaviors in natural multiagent systems, the dual problem of controlling emergent behaviors in engineered multiagent systems, such as UAVs, computer networks, and smart grids, has also been addressed extensively in the literature to gain a complete picture of detecting the emergent behaviors in multiagent systems. The problem is largely formulated as coordinating multiple agents to perform a set of tasks as a group, while individual agents may exhibit local behaviors. Despite the variety of coordination tasks and applications of multiagent systems, their global behaviors can be loosely described as the emergence of consensus or agreement behaviors with different patterns. For instance, flocking is consensus in velocities, rendezvous is consensus in positions, and formation is agreement on relative positions. In that regard, recent years have witnessed the development of numerous results on designing and controlling emergent consensus behaviors from the perspective of systems and controls. The focus has been on the design of local coordination control laws for agents. Early results were mostly obtained using heuristic approaches. Artificial intelligence methods [46] were used to explore the architecture, task allocation, map building, coordination, and control algorithms in multirobot motion systems [47, 48]. In [49, 50], standard control technique-oriented approaches were pursued with the aid of graph theory and artificial potentials for the cooperative control of multiple robots under rigid communication topologies among agents in the network. Considering the time-varying interaction among agents, local cooperative controls have been designed for first-order linear systems [28–31, 51], for the double-integrator model [52], for high-order linear systems [32, 53], for continuous nonlinear systems [54–57], and for discontinuous nonlinear systems [58–62]. In these works, system dynamics properties, such as convexity [54], the subtangentiality condition on the vector fields [55], passivity [56], and nonsmooth analysis [63, 64], have been used to facilitate the control of consensus behaviors. In the presence of uncertainties, unexpected behaviors may emerge in multiagent systems. Some work has recently appeared to deal with time delays, stochastic communication noises, bounded disturbances, and measurement errors [65–67]. For instance, for uncertain linear multiagent systems, consensus with external disturbances and model uncertainties was investigated in [68, 69]. For nonlinear multiagent systems with uncertainties, a distributed adaptive coordinated tracking control was proposed in [8].

1.2 Multiagent Systems

In this book, we consider the general multiagent systems defined by a set of dynamical equations given below:

z˙i(t) = Fi(zi(t), ui(t), t) + ΔFi(zi(t)) + wi(t),
yi(t) = Hi(zi(t)) + vi(t),    (1.1)


where i ∈ {1, . . . , n} is the index for agent i, and there are n agents in the group, zi ∈ ℝ^{ni} is the state, ui ∈ ℝ^m is the control (interaction rule) to be designed, yi ∈ ℝ^m is the output (measurement) vector, wi(t) ∈ ℝ^{ni} and vi(t) ∈ ℝ^m are Gaussian noises with zero mean, Fi(·) and Hi(·) are piecewise-continuous vector-valued functions, and the term ΔFi denotes the unknown agent dynamics and/or model uncertainties. Agent dynamics considered in (1.1) are given by first-order differential equations in continuous time. The analog of (1.1) in discrete time can be defined by a system of first-order difference equations of the following form:

zi(k + 1) = Fi(zi(k), ui(k), k) + ΔFi(zi(k), k) + wi(k),
yi(k) = Hi(zi(k), ui(k)) + vi(k),    (1.2)

where k ∈ {0, 1, 2, . . .} is the discrete-time index. In this book, we deal with this general class of multiagent dynamical systems, and the agent dynamics may assume either the continuous-time model in (1.1) or the discrete-time model in (1.2). Under this multiagent model, the distributed detection and control techniques presented in the following chapters are rigorous, theoretically sound, and practically feasible. Indeed, many practical multiagent systems, such as unmanned aerial vehicles (UAVs), are well described by the dynamical model in (1.1) or (1.2). The (desired or emergent) group behaviors of multiagents are generated through the modeling or design of local interaction rules based on sensing/communication-enabled local information exchange among agents. The interaction topology or sensing/communication structure can be represented by a digraph {V, E(t)}, where V denotes the set of n nodes and E(t) is the set of directed edges. Accordingly, the local information flow among agents may be embedded into the following n × n binary sensing/communication matrix

S(t) = [sij(t)],   sii = 1,    (1.3)

where sij = 1 if {j → i} ∈ E(t), and sij = 0 otherwise. To this end, the key problems are (i) to analyze the overall behavior of the dynamic networked multiagent systems in (1.1) through the modeling and design of different local interaction rules ui; and (ii) to analyze the unexpected/emergent behaviors of the dynamic networked multiagent systems in (1.1) by conducting the estimation of the interaction topologies S(t) and the corresponding redesign of the cooperative interaction control laws ui(t). Figure 1.1 illustrates the overall architecture for addressing these research problems. In this book, by using approaches from systems and controls theory, optimization and estimation, and adaptive learning and controls, methods for the detection and control of emergent behaviors in multiagent systems will be presented based on the analysis and design of interaction dynamics.
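As a concrete illustration of (1.3), the short sketch below assembles S(t) from a list of directed edges j → i; the four-agent ring topology used in the example is an arbitrary choice for illustration, not a topology used later in the book.

```python
import numpy as np

def sensing_matrix(n, edges):
    """Binary sensing/communication matrix S = [s_ij] of (1.3).

    edges: iterable of directed pairs (j, i) meaning "agent i receives
    information from agent j", i.e., the edge j -> i is in E(t).
    By convention s_ii = 1 for every agent.
    """
    S = np.eye(n, dtype=int)      # s_ii = 1
    for j, i in edges:
        S[i, j] = 1               # s_ij = 1 when j -> i is an edge
    return S

# A four-agent example: 0 -> 1, 1 -> 2, 2 -> 3, 3 -> 0 (a directed ring).
S = sensing_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(S)
# Row i lists the agents whose states agent i can use in its local rule u_i.
```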


Fig. 1.1 Architecture of emergent behavior detection and control

1.3 Emergent Behaviors in Multiagent Systems

1.3.1 Characteristics of Multiagent Systems

Generally speaking, multiagent systems consist of a number of individual systems in a group. The emergent behaviors of multiagent systems are the results of the local interaction rules and interaction topologies among agents. While multiagent systems may exhibit various dynamical behaviors, some common characteristics of the individual agents considered in this book include:

• Agents have decoupled dynamics.
• Each agent has the capability of sensing/communicating with its immediate environment. In other words, agents can locally share information with neighbors within their sensing/communication ranges.
• Agents can process the information gathered.
• Each agent has its own actuators for taking actions.

1.3.2 Interaction Dynamics

It is apparent that interaction dynamics are one of the determining factors for emergent behaviors in multiagent systems. Let us consider a simple case and assume that the general model given in (1.1) is feedback linearizable by omitting the unknown term ΔFi(zi). That is, the multiagent system can be ideally modeled as a group of single integrators after feedback linearization:

z˙i(t) = vi(t).    (1.4)

We may define a behavior variable xi = χi(zi) to generically describe the coordination tasks for multiagent systems, where χi : ℝ^{ni} → ℝ^{qi} is a continuous and differentiable function of zi. In general, various coordination behaviors such as consensus, rendezvous, cooperative target localization, mobile agent coverage control, distributed resource allocation, and formation control may be embedded into the definition of the function χi(zi). Based on the injection of local interaction rules vi(zi, zj), the overall system dynamics in terms of the behavior variable xi can be written as

x˙i = Σ_{j=1}^{n} αij(sij(t)(xi − xj)),    (1.5)

where αij(·) is in general a nonlinear function. To this end, the emergent system behavior can be well characterized by the steady state xi(∞). There are several challenges in analyzing the overall behavior of system (1.5). First, the interaction dynamics αij(·) are generally nonlinear, and some parameters in the model may dominate the system behaviors. For instance, consider the particle kinematics in [70], r˙i = e^{iθi}, θ˙i = ui(ri, rj, θi, θj), where ri ∈ ℝ² is the 2D position vector of the ith particle (identified with a point in the complex plane) and θi is its orientation. For ui = 0, the particle travels in a straight line; for ui = ωi with ωi being a nonzero constant, the particle traverses a circle with radius 1/|ωi|; and more interesting motion patterns emerge if the controllers use only relative phase information between neighboring particles: ui = ω0(t) + fi(θi − θj). Depending on the parameters, the emergent motion could be translational or circular. Second, for agent i, the available behavior information about its neighbors is limited; that is, the availability of xj depends on the interaction matrix S(t), which is unpredictable and may be time-varying. Third, unlike a standard stable dynamical system whose steady state is the origin, the emergent behavior of a multiagent system will be different if the agents start with different initial conditions, even under simple interaction rules and a fixed interaction topology. For instance, consider the simple system with three agents and linear interaction rules x˙i = Σ_{j=1}^{3} sij(xj − xi), i = 1, 2, 3, with xi ∈ ℝ² representing the 2D position of agent i. The interaction matrix S is connected and fixed. The resulting consensus behaviors are different, as shown in Fig. 1.2. Fourth, the situation becomes more complicated in the presence of the unknown agent dynamics ΔFi(zi), and the detected behavior of system (1.5) could be only an approximation or completely inaccurate. Estimation of ΔFi(zi) has to be taken into account together with the redesign of the local interaction rules vi.
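The three-agent example can be reproduced with a short simulation; the sketch below is illustrative only (Euler integration, a fully connected fixed topology, and arbitrary initial positions), but it shows how different initial conditions lead to different agreement points, as in Fig. 1.2.

```python
import numpy as np

def simulate_consensus(x0, S, dt=0.01, steps=2000):
    """Euler simulation of x_i' = sum_j s_ij (x_j - x_i) for 2-D positions.

    x0: (n, 2) initial positions; S: (n, n) binary sensing matrix with s_ii = 1.
    Returns the final positions, which agree when the topology is connected.
    """
    x = x0.astype(float).copy()
    n = len(x)
    for _ in range(steps):
        dx = np.zeros_like(x)
        for i in range(n):
            for j in range(n):
                if i != j and S[i, j]:
                    dx[i] += x[j] - x[i]
        x += dt * dx
    return x

S = np.ones((3, 3), dtype=int)      # fully connected, fixed topology
print(simulate_consensus(np.array([[0., 0.], [2., 1.], [1., 3.]]), S))
print(simulate_consensus(np.array([[5., 5.], [6., 2.], [9., 4.]]), S))
# The two runs converge to different agreement points determined by the
# initial conditions (here the average, since the graph is undirected).
```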

8

1 Introduction

(a) Initial condition 1

(b) Initial condition 2

Fig. 1.2 Consensus behaviors under different initial conditions

 2 agreement function like  J (x1 , . . . , xn ) = 2 si j x j − xi  , and formation behavior by J (x1 , · · · , xn ) = si j x j − xi − di j  , with di j being the offset value between agent i and agent j. For optimal deployment of agents covering a convex environment Q, the coverage behavior function can be defined as J (x1 , . . . , xn ) =

n   1 q − xi 2 φ(q)dq, 2 i Vi

where q is an arbitrary point in Q , xi ∈ Q denotes the position of ith agent, φ(q) is a weighting function of importance over Q, and Vi is Voronoi partition of Q satisfying Vi = {q ∈ Q|q − xi  ≤ q − x j , ∀ j = i}. Multiagent global behaviors can also be characterized using network n moments such xi ; variance: as centroid, variance, and orientation given as: centroid: J (x) = n1 i=1 n (xi − x¯1 )T (xi − x¯1 ), and orientation: J (x) = R(θ )(xi − x), ¯ where J (x) = N1 i=1 R(θ ) is the rotation matrix. In addition to the above behavior task functions, we mayl also look into other types of objective functions while considering interaction uncertainties. For example, if wireless communication is used as a primary means of information sharing among agents, the quality of service of communication relies on many unknown factors such as distance, bandwidth, multipath fading, noise, and 2δ −1 interferences among agents. Rate outage probability, P = 1 − e− λ xi −x j  , can be used to define the quality of communication, where δ > 0 is the data rate, and λ is the SNR. The corresponding task function is J (x1 , . . . , xn ) = f (P) for some function f (·).
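To make the discussion concrete, the short sketch below is illustrative only: the fixed connected topology S, the integration step, and the initial conditions are hypothetical choices, not taken from the book. It integrates the three-agent linear interaction rule from two different initial conditions and evaluates the quadratic disagreement function J along the way; J decays to zero in both runs, but the consensus point depends on where the agents start, which is the effect illustrated in Fig. 1.2.

```python
# Minimal simulation sketch: three agents with linear interaction rules and a
# fixed, connected interaction matrix S (a hypothetical choice). The quadratic
# disagreement function J = 0.5 * sum_{i,j} s_ij ||x_j - x_i||^2 is monitored.
import numpy as np

S = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])                 # s_ij = 1 if agent j is a neighbor of agent i

def disagreement(x):
    return 0.5 * sum(S[i, j] * np.sum((x[j] - x[i]) ** 2)
                     for i in range(3) for j in range(3))

def simulate(x0, dt=0.01, steps=2000):
    x = np.array(x0, dtype=float)         # shape (3, 2): three agents, 2D positions
    for _ in range(steps):
        xdot = np.array([sum(S[i, j] * (x[j] - x[i]) for j in range(3))
                         for i in range(3)])
        x = x + dt * xdot
    return x

for x0 in ([[0, 0], [4, 0], [2, 3]], [[-1, 5], [6, 2], [0, -4]]):
    xf = simulate(x0)
    print(xf.mean(axis=0), disagreement(xf))   # different consensus points, J near 0
```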


If a centralized approach were used to measure the behavior task function J(x_1, ..., x_n), then the detection of emergent behavior would be straightforward based on the task function. The issue, however, is that for multiagent systems only local information and measurements are available, and the exact values of the function J(x_1, ..., x_n) are unknown. In [71], a reinforcement-learning-like cooperative control method was designed based on the distributed estimation of unknown value functions. While the estimation of the behavior function J provides an indirect way to reveal the system interaction dynamics for identifying the emergent behaviors of multiagent systems, the detection of emergent behaviors may also be done by directly estimating and designing the interaction dynamics. This is important for two reasons. First, as mentioned before, under the same form of interaction rules u_i for system (1.1), a difference in control parameters may change the group behaviors. For example, a group of robots might be trapped in a local-minimum behavior due to poorly designed interaction rules u_i; congestion behaviors in computer networks may occur in a hazard situation, in which many agents send out their warning packets at the same time. Second, system (1.1) may contain unknown dynamics ΔF_i(z_i). Clearly, identification of such uncertainties will benefit the detection and control of the evolution of emergent behaviors.

The adaptive learning and distributed estimation of interaction dynamics are instrumental in the designs presented in this book. The general method can start with the analysis and design of a nominal interaction cooperative control for the ideal multiagent model given by

\dot{z}_i = F_i(z_i, v_i), \quad y_i = H_i(z_i),

that is, based on the consideration of the behavior variables x_i, the nominal control protocol for multiagent systems is of the form

v_i = v_{l_i}(z_i) + G_i(S(t))\, v_{h_i},    (1.6)

in which v_{l_i}(z_i) is the lower-level controller, G_i(S(t)) is the control gain based on the information interaction matrix S(t), and v_{h_i} is the higher-level cooperative control of the form

v_{h_i} = v_{h_i}\bigl(s_{i1}(x_1 - x_i), \ldots, s_{in}(x_n - x_i)\bigr).    (1.7)

Under the interaction cooperative control (1.6), the emergent behaviors may be studied from the following three aspects.

First, the control gains in (1.7) may play a role in the occurrence of emergent behaviors. For example, for multiagent systems with the integrator model \dot{z}_i = v_i for agent i and interaction rules of the form v_i = \sum_{j\in N_i} s_{ij}(x_j - x_i), different choices of the control gains s_{ij} (not necessarily binary) may result in different group behaviors.

Second, in multiagent systems, the emergent behavior might be dominated by informed agents, or leaders, in the group. For example, a group leader may exist in bird flocking, and observing the behavior of the informed agents may lead to the detection of the group behavior. It is therefore important to identify informed agents. On the other hand, given a network of agents with interaction capabilities, emergent group behaviors can be monitored and controlled by injecting informed agents into the network in order to provide robustness against uncertainties. For example, in the position consensus problem, the informed agent may carry the desired location information, which is a set of constant values; in the formation control problem, the informed agent knows the time-varying desired trajectory to follow. However, the challenge is that the complete behavior and system dynamics of the informed agent are mostly unknown, or at best partially known to certain agents. In Chap. 5, we will discuss an adaptive control approach to address this issue through system augmentation. That is, as the starting formalism, we assume that the emergent group behavior of the multiagent system can be modeled by an unknown informed agent described by the first-order differential equation

\dot{x}_0 = a_0 x_0 + r_0(t),    (1.8)

where a_0 is an unknown constant, and r_0(t) is a piecewise-continuous bounded function of time representing certain command signals that specify the desired behavior of the system. We will study the coordinated leader tracking behavior for the augmented system (1.1) and (1.8). The basic design idea is that if, through the careful design of the low-level control v_{l_i}(z_i) and a state transformation, system (1.1) can be converted into a form like \dot{x}_i = a_i x_i + b_i v_{h_i}, where a_i, b_i are system parameters, then a high-level adaptive interaction control law v_{h_i} of the form

v_{h_i} = f_i(x_i, \hat{a}_i, \hat{r}_0), \quad \dot{\hat{a}}_i = \sum_{j=0}^{N} \alpha_{ij}(x_j - x_0), \quad \hat{r}_0 = \phi_i^T(x_i) W_i    (1.9)

could be designed to make the individual agents converge to the behavior x_0(t). To this end, the group behaviors of the multiagent system (1.1) will be obtained through inverse state and input transformations. The issues to be addressed include how to define the corresponding state and input transformations, as well as how to design the adaptation law and the parameterized approximators (e.g., neural networks) for the unknown parameters a_i and r_0 as given in (1.9).

Third, if the interaction dynamics of a multiagent system were completely known, it would be easy to analyze the overall system behaviors by applying typical nonlinear tools such as the Lyapunov function method, passivity theory, the contraction mapping method, or singular perturbations. However, in engineered multiagent systems, the simple local interaction rules are generally designed by simplifying nonlinear dynamics into linear ones. Thus, unexpected/emergent behaviors may occur as a result of the unknown interaction dynamics of the closed-loop system. To solve this problem, we may use neural networks to estimate the unknown dynamics ΔF_i(z_i) in the design of the low-level controller v_{l_i} as necessary. Specifically, this can be done by designing an error monitoring mechanism to check the value of the error signal e_i = x_i - \frac{1}{|N_i|}\sum_{j\in N_i} x_j, which defines the difference between the behavior of agent i and the average behavior of its connected neighbors. The error e_i could be estimated using a distributed estimator of the form

\dot{e}_i = \sum_{j\in N_i} \alpha_{ij}(e_j - e_i).

If e_i becomes greater than a certain threshold value, this indicates the occurrence of unknown dynamics in agent i, and correspondingly the lower-level controller v_{l_i} can be redesigned to address the issue. To this end, emergent behaviors can be monitored and controlled through the distributed estimation algorithms presented in Chap. 4.
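As a rough illustration of this monitoring idea, the sketch below is a simplification, not the estimator design presented in Chap. 4: each agent directly compares its behavior variable with the average of its neighbors and raises a flag when the error exceeds a threshold. The topology, the threshold, and the injected deviation are made-up values.

```python
# Minimal threshold-based monitoring sketch: flag agent i when the error
# e_i = x_i - mean(x_j, j in N_i) exceeds a chosen threshold.
import numpy as np

neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # hypothetical undirected triangle
threshold = 0.5                                  # detection threshold (assumed)

def detect(x):
    """x: array of scalar behavior variables, one entry per agent."""
    flags = {}
    for i, nbrs in neighbors.items():
        e_i = x[i] - np.mean([x[j] for j in nbrs])
        flags[i] = abs(e_i) > threshold
    return flags

x = np.array([0.10, 0.12, 0.95])   # agent 2 deviates from its neighbors
print(detect(x))                   # {0: False, 1: False, 2: True}
```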

1.3.3 Interaction Topologies

Another important element in the study of emergent behaviors of multiagent systems is the development of methods for estimating and controlling the interaction topologies among agents. The interaction topology S(t) defined in (1.3) is of paramount importance in determining the emergent behavior. Under continuous interaction rules, the emergence of group consensus behavior requires connected interaction topologies [29–32]. However, in the case of discontinuous interaction rules, it has been shown in [33] that a connected interaction topology may not guarantee group consensus behavior. Moreover, even for connected interaction topologies, different structures may lead to different final consensus values. For instance, consider the following multiagent system

\dot{x}_i = \sum_{j\in N_i} (x_j - x_i),    (1.10)

with n = 4 agents under the following two interaction topologies

S_1(t) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \quad S_2(t) = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix}.

Both S_1(t) and S_2(t) are connected. Under S_1(t), system (1.10) will exhibit the consensus behavior of converging to x_1(t_0). Under S_2(t), system (1.10) will converge to the average of the initial values, that is, \frac{1}{4}\sum_{i=1}^{4} x_i(t_0). If the interaction topology is not connected, the situation becomes more complicated. For example, consider again system (1.10) with n = 4 and the two interaction matrices S(t) given by




S_1(t) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \quad S_2(t) = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}.

It can be seen that S_1(t) is connected, and the expected group behavior is that all agents converge to the first agent, corresponding to the first row of S_1(t). On the other hand, S_2(t) shows that the agents are separated into two groups; in this scenario, the emergent consensus behavior will be two clusters.

By using S(t) defined in (1.3) to describe the interaction topologies, we will need to address several challenges in order to analyze the emergent behaviors. First, the interaction matrix S(t) is in general time-varying and unpredictable. Second, in realistic multiagent systems there may be interaction link failures and limited bandwidth. Third, unexpected changes in s_{ij}(t) may appear due to information transmission delays, loss of interagent communication, or the addition or removal of agent nodes. To deal with these challenges, estimation algorithms for monitoring the change of the interaction topology and the group size could be designed. In essence, the emergent behavior of general multiagent systems is closely related to the spectral properties of the so-called Laplacian matrix defined below (to be discussed in Chap. 3)

L = D + (I - S(t)),

where I is the identity matrix and D = diag\{\sum_{j\neq i} s_{ij}\}. For an undirected interaction topology, the network connectivity and convergence speed are determined by the second smallest eigenvalue of L, the so-called Fiedler eigenvalue [72]. In [73], a decentralized power iteration approach is used to estimate the Fiedler eigenvalue and eigenvector for an undirected and connected network; however, it may not be able to handle fast-changing topologies. In [74], the connectivity of a directed network was estimated by observing the first left eigenvector of the network Laplacian.

Strategies may also be developed to classify agents in the group as informed or uninformed. Intuitively, the informed agents as well as their connectivity will influence the entire group behavior. For instance, in the leader-follower case for multiple UAVs, a single leader provides the reference motion for the whole group, while with multiple leaders the group motion is expected to stay within a region determined by the convex combination of all leaders' reference trajectories. Through the estimation of the degree distribution and the spectral analysis of the Laplacian L, the number of informed agents and the corresponding emergent behavior (e.g., group consensus or cluster consensus) may be inferred.

The group behaviors of multiagent systems are supposed to be robust against changes in group size. For example, in bird flocking, the drop-out of one bird would not change the whole group behavior. However, this is not always the case; the loss of an informed agent could be influential. Consider also the example of using n sensor nodes for distributed data acquisition. Without using a central node, each node will reach the behavior of obtaining the average value of the data acquired by the individual nodes if the distributed algorithm d_i(k+1) = d_i(k) + \sum_{j\in N_i}(d_j(k) - d_i(k)) is employed, where d_i(k) denotes the data acquired by agent i at time instant k. The steady-state behavior of the whole group is \frac{1}{n}\sum_i d_i(t_0), that is, the average of the initial measurements. Apparently, a sudden change of the group size may render this estimate inaccurate. To solve this problem, we may use the interaction topology identification method to estimate the network degree for monitoring group behaviors. In addition, we may design a distributed agent counting algorithm to monitor the group size: each agent maintains a variable denoting its count in the group, and then, using an average consensus algorithm similar to the above example, the size n can be obtained at each agent. By monitoring the group size, unexpected/emergent behaviors can be detected together with the model-based analysis. In this book, we will present a method for the distributed estimation of a key left eigenvector of the Laplacian matrix L(t), which is useful in the design of the distributed estimation algorithms in Chap. 4 and the distributed optimization algorithms in Chap. 6.
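The sketch below is a minimal illustration, not the book's algorithm: it assumes an undirected, connected topology with synchronous updates and adds a small step size to the consensus recursion for stability (the recursion in the text uses a unit step). It shows both the distributed averaging of sensor data and the consensus-based estimate of the group size n via an indicator vector; the network and data values are made up.

```python
# Distributed averaging and group-size counting over a hypothetical path graph.
import numpy as np

neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
eps = 0.2                                   # assumed step size (< 1/max degree)

def average_consensus(values, iters=300):
    d = np.array(values, dtype=float)
    for _ in range(iters):
        d = d + eps * np.array([sum(d[j] - d[i] for j in neighbors[i])
                                for i in range(len(d))])
    return d

data = [3.0, 5.0, 4.0, 8.0]
print(average_consensus(data))              # every entry approaches the average 5.0

# Group-size counting: average the indicator vector (1, 0, ..., 0); each agent
# converges to 1/n, so the group size n is recovered as the inverse.
indicator = [1.0, 0.0, 0.0, 0.0]
print(1.0 / average_consensus(indicator))   # each entry approaches n = 4
```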

1.4 Task Coordination in Multiagent Systems

To achieve multiagent system coordination, it is necessary that the agents in the group are capable of exchanging information through the sensing/communication networks. For agent i, its output and measurement vector y_i reflects its interaction with other agents in the group through communication/sensor channels. In addition, we may define a coordination variable x_i = χ_i(y_i) to generically describe the coordination tasks for multiagent systems, where χ_i : ℝ^{p_i} → ℝ^{q} is a continuous and differentiable function of y_i. By introducing x_i, various coordination tasks such as consensus, rendezvous, cooperative target localization, mobile agent coverage control, distributed resource allocation, and formation control may be embedded into the definition of the function χ_i(y_i). To this end, the multiagent task coordination addressed in this book can be generally recast as the cooperative stability issues defined below.

Definition 1.1 Multiagent systems (1.1) or (1.2) are said to be cooperative if \lim_{t\to\infty} |x_i(t) - x_j(t)| = 0 \cdot 1_q, where 1_q is the q-dimensional column vector with all its elements being 1. Multiagent systems (1.1) or (1.2) are said to be cooperatively stable (i.e., cooperative, and all the state variables of the systems are uniformly bounded) if, for some steady state x^{ss} ∈ ℝ^q, \lim_{t\to\infty} x_i(t) = x^{ss}.

As seen in Definition 1.1, the steady state x^{ss} represents the convergence value of the coordination variables x_i(t) for all agents in the group. For example, if the coordination task for multiagent systems (1.1) is to seek the average consensus, then x^{ss} = \sum_{i=1}^{n} x_i(0)/n. For multiagent systems with uncertainties and disturbances, we can further introduce the following definition to describe the robustness of networked multiagent systems.


Definition 1.2 Multiagent systems (1.1) or (1.2) are said to be robustly cooperative if

\lim_{\eta\to\infty} \sup_{t\ge\eta} \max_{i,j} |x_i(t) - x_j(t)| \le 1_q\, c_{11}(d, t),    (1.11)

where c_{11}(d, t) is some bounding function of d and t, and d denotes the upper bound of the system uncertainties. Multiagent systems (1.1) or (1.2) are said to be robustly cooperatively stable if they are robustly cooperative and, for some x^{ss} ∈ ℝ^q,

\lim_{\eta\to\infty} \sup_{t\ge\eta} |x_i(t) - x^{ss}| \le 1_q\, c_{12}(d),    (1.12)

where c_{12}(d) is a bounding function of d.

To further address the optimal task coordination problems for multiagent systems, an objective function J_i(x, u_i, t, T) for agent i may be introduced to quantify the optimal performance of the multiagent coordination. In general, the objective function J_i(x, u_i, t, T) may be defined as

J_i(x, u_i, t, T) = \psi_i(x(t+T)) + \int_{t}^{t+T} L_i(x(\tau), u_i(\tau))\,d\tau,    (1.13)

where x = [x_1^T, x_2^T, \ldots, x_n^T]^T is the stacked overall coordination variable, T denotes the finite prediction horizon, and L_i(·) and ψ_i(·) are the running and terminal cost functions, respectively. To this end, the multiagent task coordination problem can be generically described as follows.

Problem 1.1 For a network of multiagent dynamical systems (1.1) or (1.2), design cooperatively stabilizing control protocols u_i(t) of the form

u_i(t) = \alpha_i(x_i, x_{j_1}, \ldots, x_{j_l}, t),    (1.14)

while solving the following optimization problem

\min \sum_{i=1}^{n} J_i(x)    (1.15)

subject to some state constraints, where x_{j_k}, j_k ∈ N_i, are the coordination variables of the neighboring agents of agent i, and N_i is the index set of the neighboring agents of agent i.

In solving Problem 1.1, a system architecture as shown in Fig. 1.3 may be adopted, which illustrates several key elements. First, a sensing/communication model is fundamental to describe the information exchange among the multiple agents


Fig. 1.3 Proposed multiagent task coordination architecture

in the system. One question to answer is how to establish the least restrictive network controllability condition for multiagent systems to achieve the task coordination while meeting the sensing/communication capacity constraints. Second, in order to cover a broad class of practical applications, the multiagent dynamics are of paramount importance in the coordination tasks; a general class of dynamical systems with uncertainties needs to be considered. Third, the optimization model quantifying the performance and cost of the coordination task has to be dealt with. Finally, based on the integrated consideration of all the aforementioned elements, an optimal and robust coordination strategy and protocol policy may be developed.

1.5 Agent Model Examples

In this section, we present several typical model examples for studying the dynamical behaviors of multiagent systems.

1.5.1 Single- and Double-Integrator Models

The simplest agent model is the single-integrator model given below

\dot{x}_i = u_i,    (1.16)

where x_i ∈ ℝ^{n_i} denotes the state variable, u_i ∈ ℝ^{n_i} is the control input, and normally n_i = 2 for 2D particles or n_i = 3 for 3D particles. Another common model is the double-integrator model

\dot{x}_i = u_i, \quad \dot{u}_i = v_i.    (1.17)

As mentioned before, the general model in (1.1) may be converted into (1.16) or (1.17) using local state and/or input transformations. For instance, consider the following unicycle model

\dot{x} = u_1\cos\theta, \quad \dot{y} = u_1\sin\theta, \quad \dot{\theta} = u_2,    (1.18)

where x, y are the coordinates of the guidepoint of the unicycle, θ is the body orientation, u_1 is the driving velocity input, and u_2 is the steering velocity input. The model in (1.18) also represents the kinematic model of differential-drive robots by letting u_1 = \rho\frac{\omega_R + \omega_L}{2} and u_2 = \rho\frac{\omega_R - \omega_L}{2}, with ρ being the wheel radius, ω_L the left wheel angular velocity, and ω_R the right wheel angular velocity. To convert (1.18) into (1.16), let us choose a reference point along the body orientation at a distance b from the guidepoint, and define its coordinates as y_1 = x + b\cos\theta, y_2 = y + b\sin\theta. Taking the time derivative of y_1 and y_2 renders

\begin{bmatrix} \dot{y}_1 \\ \dot{y}_2 \end{bmatrix} = \begin{bmatrix} \cos\theta & -b\sin\theta \\ \sin\theta & b\cos\theta \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = T(\theta) \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}.    (1.19)

For b ≠ 0, we may set

\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = T^{-1}(\theta) \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta/b & \cos\theta/b \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}    (1.20)

and obtain

\dot{y}_1 = v_1, \quad \dot{y}_2 = v_2, \quad \dot{\theta} = \frac{v_2\cos\theta - v_1\sin\theta}{b},    (1.21)

in which the model in terms of y1 and y2 is the single-integrator one in (1.16). The unicycle model (1.18) can also be converted into the double-integrator model via dynamic feedback linearization. That is, by adding an integrator on the linear velocity input


\dot{u}_1 = a,

and taking a further derivative of \dot{x} and \dot{y} yields

\begin{bmatrix} \ddot{x} \\ \ddot{y} \end{bmatrix} = \begin{bmatrix} \cos\theta & -u_1\sin\theta \\ \sin\theta & u_1\cos\theta \end{bmatrix} \begin{bmatrix} a \\ u_2 \end{bmatrix}.

Assuming u_1(t) ≠ 0, we can set

\begin{bmatrix} a \\ u_2 \end{bmatrix} = \begin{bmatrix} \cos\theta & -u_1\sin\theta \\ \sin\theta & u_1\cos\theta \end{bmatrix}^{-1} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},

which leads to

\begin{bmatrix} \ddot{x} \\ \ddot{y} \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}.
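The static transformation (1.19)-(1.21) introduced above is easy to exercise numerically. The sketch below is illustrative only: the offset b, the gain, the goal point, and the initial pose are arbitrary choices. It steers a single unicycle by commanding its off-axis reference point as if it were a single integrator and mapping the command back through (1.20).

```python
# Drive a unicycle to a goal by controlling the reference point (y1, y2).
import numpy as np

b = 0.2                                    # offset of the reference point (b != 0)
goal = np.array([2.0, 1.0])                # assumed goal for the reference point
state = np.array([0.0, 0.0, np.pi / 4])    # (x, y, theta)
dt = 0.01

for _ in range(2000):
    x, y, th = state
    y1 = x + b * np.cos(th)
    y2 = y + b * np.sin(th)
    v = -1.0 * (np.array([y1, y2]) - goal)            # single-integrator law y' = v
    Tinv = np.array([[np.cos(th),      np.sin(th)],
                     [-np.sin(th) / b, np.cos(th) / b]])
    u1, u2 = Tinv @ v                                  # (1.20): map v back to (u1, u2)
    state = state + dt * np.array([u1 * np.cos(th), u1 * np.sin(th), u2])  # (1.18)

print(state[:2])   # the guidepoint ends up near the goal (within the offset b)
```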

Given the single- and double-integrator agent models, the interaction rules will determine the emergent behaviors of the group. Consider (1.16) under the generic interaction rule of the form

u_i = k_i \sum_{j\in N_i} (x_j - x_i),    (1.22)

where N_i = \{j : \|x_j - x_i\| \le R_i\} is the neighboring set of agent i, R_i is the sensing radius, and k_i > 0 is the control gain. The group consensus behavior relies on the initial conditions of the agents and on R_i; the control gain k_i plays no role in such a case. The interaction rule in (1.22) can be modified as follows to generate the group behavior of aggregation in formation

u_i = k_i \sum_{j\in N_i} (x_j - a_j - x_i + a_i),    (1.23)

where a_i, a_j are constants defining formation shapes. For the case of time-varying a_i, the interaction rule (1.23) can be further modified as

u_i = k_i \sum_{j\in N_i} (x_j - a_j - x_i + a_i) + \dot{a}_i.    (1.24)

For the double-integrator model in (1.17), the interaction rules rendering consensus behaviors can be similarly constructed, though the convergence analysis becomes more involved. It follows that

v_i = k_{i1} \sum_{j\in N_i} (x_j - x_i) + k_{i2} \sum_{j\in N_i} (u_j - u_i),    (1.25)

where k_{i1} and k_{i2} are some constants.
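As a quick illustration of the formation rule (1.23), the sketch below (not from the book; the square-shaped offsets a_i, the ring topology, the gain, and the random initial positions are hypothetical choices) drives four single-integrator agents so that x_i - a_i reaches consensus, i.e., the group aggregates into the desired shape up to a common translation.

```python
# Formation aggregation with rule u_i = k * sum_j s_ij ((x_j - a_j) - (x_i - a_i)).
import numpy as np

a = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)   # desired square shape
S = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])                                   # ring topology (assumed)
k, dt = 1.0, 0.01

x = np.random.default_rng(0).normal(size=(4, 2))               # random initial positions
for _ in range(3000):
    u = np.zeros_like(x)
    for i in range(4):
        for j in range(4):
            u[i] += k * S[i, j] * ((x[j] - a[j]) - (x[i] - a[i]))
    x += dt * u

print(x - a)    # rows become (nearly) identical: the agents hold the square formation
```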


1.5.2 Flocking Models

A number of models for flocking behaviors have been proposed [12, 22, 24, 25]. Vicsek's model was the first to realize flocking while focusing on the heading alignment of agents [12]. The model builds upon the discrete-time version of the unicycle model in (1.18) with u_1 being a constant, and the heading of each agent is updated toward the average of its neighbors' headings, that is,

\theta_i(k+1) = \frac{1}{n_i(k)} \sum_{j\in N_i(k)} \theta_j(k),    (1.26)

where n_i(k) = |N_i(k)|. More complex flocking models are designed based on the Boids model proposed in [22], which contains three basic rules of cohesion, separation, and alignment:
• Rule 1: Agents try to fly towards the center of mass of neighboring agents.
• Rule 2: Agents try to keep a small distance away from other objects, including other agents and obstacles.
• Rule 3: Agents try to match velocity with nearby agents.
For instance, using the double-integrator model, we may define

v_{i1} = k_{i1}\Bigl(\sum_{j\in N_i} \frac{x_j}{|N_i|} - x_i\Bigr)  (cohesion),
v_{i2} = \sum_{j\in \text{repulsion region}} k_{i2}(x_i - x_j)  (separation),
v_{i3} = k_{i3}\sum_{j\in N_i} (u_j - u_i)  (alignment),    (1.27)

and accordingly the combined interaction rule for agent i is v_i(t) = v_{i1}(t) + v_{i2}(t) + v_{i3}(t). In [75], the Couzin model was proposed, similar to (1.27), by explicitly defining three zones: a repulsion zone N_i^r = \{j : \|x_i - x_j\| \le R_r\}, an orientation zone N_i^o = \{j : R_r < \|x_i - x_j\| \le R_o\}, and an attraction zone N_i^a = \{j : R_o < \|x_i - x_j\| \le R_a\}, where 0 < R_r < R_o < R_a. There was no proof of convergence of the whole group under the interaction rules (1.27). An improved flocking algorithm was obtained in [24] with a rigorous stability proof based on the use of agent potential functions. In [25], the double-integrator model is considered with the interaction rule

v_i(t) = \sum_{j=1}^{N} \frac{K\,(u_j(t) - u_i(t))}{(\sigma^2 + \|x_i - x_j\|^2)^{\beta}},    (1.28)

1.6 Summary

19

1.6 Summary In this chapter, general dynamical models for multiagent systems considered in the book are introduced. Some challenge problems related to multiagent system emergent behavior detection and control, as well as multiagent task coordination, are analyzed, which motivate us to present main algorithms documented in this book from three aspects including distributed estimation, distributed control, and distributed optimization. Specifically, following the key elements in Fig. 1.3 as well as the objectives of multiagent emergent behavior detection and task coordination, the book is organized as follows. Chapter 2 covers the preliminaries on matrix and system theory which are instrumental for conducting the rigorous analysis on multiagent system behaviors. Chapter 3 presents the fundamental knowledge on algebraic graph theory for describing the interaction topology among agents and some basic consensus algorithms. Particularly, several new cooperative control design algorithms for high-order linear systems are included. Multiagent emergent behavior detection and distributed estimation algorithms are covered in Chap. 4, which specifically documents several our new design results in terms of distributed least squares algorithms and distributed Kalman filtering algorithms. Chapter 5 focuses on design of distributed task coordination algorithms. A general nonlinear cooperative control algorithm is proposed for nonlinear systems, followed by results on adaptive tracking control design for nonlinear systems with uncertainties. Multiagent distributed optimization and reinforcement learning control results are included in Chap. 6.

References 1. Strogatz, S.H.: Exploring complex networks. Nature 410, 268–276 (2001) 2. Newman, M.E.J.: Network an Introduction. Oxford University Press, Oxford (2010) 3. Weiss, G.: Multiagent Systems a Modern Approach to Distributed Artificial Intelligence. The MIT Press, Cambridge, Massachusetts (1999) 4. Ren, W., Beard, R.W.: Distributed Consensus in Multi-vehicle Cooperative Control. Springer, London (2008) 5. Qu, Z.: Cooperative Control of Dynamical Systems. Springer, London (2009) 6. Bullo, F., Cortés, J., Martínez, S.: Distributed Control of Robotic Networks. Applied Mathematics Series, Princeton University Press (2009). Electronically available at https:// coordinationbook.info 7. Saber, R.O., Fax, J.A., Murray, R.M.: Consensus and cooperation in networked multi-agent systems. Proc. IEEE 95, 215–233 (2007) 8. Wang, J.: Distributed coordinated tracking control for a class of uncertain multiagent systems. IEEE Trans. Autom. Control 62, 3423–3429 (2017) 9. Wang, J., Pham, K.: An approximate distributed gradient estimation method for network optimization with limited communications. IEEE Trans. SMC: Syst. 50, 5142–5151 (2020) 10. Strogatz, S.H.: SYNC: The Emerging Science of Spontaneous Order. Hyperion, New York (2003) 11. Winfree, A.T.: The Geometry of Biological Time. Springer, New York (1980) 12. Vicsek, T., Czirok, A., Jacob, E.B., Cohen, I., Shochet, O.: Novel type of phase transition in a system of self-driven particles. Phys. Rev. Lett. 75, 1226–1229 (1995)

20

1 Introduction

13. Dimarogonal, D.V., Kyriakopoulos, K.J.: On the rendezvous problem for multiple nonholonomic agents. IEEE Trans. Autom. Control 52, 916–922 (2007) 14. Wang, J., Qu, Z., Obeng, M.: A distributed cooperative steering control with application to nonholonomic robots. In: 49th IEEE Conference on Dec. and Ctrl, (Atlanta, GA), pp. 4571– 4576 (2010) 15. Siljak, D.D.: Large-Scale Dynamic Systems: Stability and Structure. North-Holland, New York (1978) 16. Srikant, R.: The Mathematics of Internet Congestion Control. Birkhauser, Boston (2004) 17. Scutari, G., Barbarossa, S., Pescosolido, L.: Distributed decision through self-synchronizing sensor networks in the presence of propagation delays and asymmetric channels. IEEE Trans. Signal Process. 56, 1667–1684 (2008) 18. Khalil, H.: Nonlinear Systems, 3rd edn. Prentice Hall, Upper Saddle River, NJ (2003) 19. Vidyasagar, M.: Nonlinear Systems Analysis. Prentice Hall, Englewood Cliffs, NJ (1978) 20. Dorfler, F., Bullo, F.: Synchronization in complex networks of phase oscillators: a survey. Automatica (2014) 21. Gazi, V., Passino, K.M.: Swarm stability and optimization. Springer, New York, NY (2011) 22. Reynolds, C.W.: Flocks, herds, and schools: a distributed behavioral model. In: Computer Graphics (ACM SIGGRAPH 87 Conference Proceedings), pp. 25–34 (1987) 23. Pimentel, J.A., Aldana, M., Huepe, C., Larralde, H.: Intrinsic and extrinsic noise effects on phase transitions of network models with application to swarming systems. IEEE Trans. Cybern. 43, 738–750 (2013) 24. Olfati-Saber, R.: Flocking for multiagent dynamic systems: algorithms and theory. IEEE Trans. Autom. Control 51, 401–419 (2006) 25. Cucker, F., Smale, S.: Emergent behavior in flocks. IEEE Trans. Autom. Control 52, 852–862 (2007) 26. Cattivelli, F.S., Sayed, A.H.: Modeling bird flight formations using diffusion adaptation. IEEE Trans. Signal Process. 59, 2038–2051 (2011) 27. Gazi, V., Passino, K.: Stability analysis of swarms. IEEE Trans. Autom. Control 48, 692–697 (2003) 28. Jadbabaie, A., Lin, J., Morse, A.: Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Trans. Autom. Control 48, 988–1001 (2003) 29. Saber, R.O., Murray, R.M.: Consensus problems in networks of agents with switching topology and time-delays. IEEE Trans. Autom. Control 49, 1520–1533 (2004) 30. Ren, W., Beard, R.W.: Consensus seeking in multiagent systems under dynamically changing interaction topologies. IEEE Trans. Autom. Control 50, 655–661 (2005) 31. Lin, Z., Brouchke, M., Francis, B.: Local control strategies for groups of mobile autonomous agents. IEEE Trans. Autom. Control 49, 622–629 (2004) 32. Qu, Z., Wang, J., Hull, R.A.: Cooperative control of dynamical systems with application to autonomous vehicles. IEEE Transa. Autom. Control 53, 894–911 (2008) 33. Wang, J., Obeng, M., Qu, Z., Yang, T., Staskevich, G., Abbe, B.: Discontinuous cooperative control for consensus of multiagent systems with switching topologies and time-delays. The 53th IEEE Conference on Dec. and Ctrl, Florence, Italy (2013) 34. Szabo, C., Teo, Y.: An integrated approach for the validation of emergence in component-based simulation models. Proceedings of the 2012 Winter Simulation Conference, (Berlin, Germany), pp. 2739–2750, 9–12 Dec 2012 35. Szabo, C., Teo, Y.: Post-mortem analysis of emergent behavior in complex simulation models. In: SIGSIM-PADS’13, (Montreal, Quebec, Canada), pp. 241–251, 19–22 May 2013 36. Kubik, A.: Towards a formalization of emergence. J. Artif. Life 9, 41–65 (2003) 37. 
Chen, C., Nagl, S.B., Clack, C.D.: Detecting and analyzing emergent behaviors in multilevel agent based simulations. Proceedings of the Summer Computer Simulation Conference (2007) 38. Seth, A.K.: Measuring emergence via nonlinear granger causality. Proceedings of the Eleventh International Conference on the Simulation and Synthesis of Living Systems, pp. 545–553 (2008)

References

21

39. Birdsey, L., Szabo, C.: An architecture for identifying emergent behavior in multiagent systems. In: Proceedings of the 13th International Conf. on Autonomous Agents and Multiagent Systems, Paris, France, pp. 1455–1456 May 5–9 2014 40. Haglich, P., Rouff, C.: Detecting emergent behaviors with semi-boolean algebra. In: AIAA Infotech, Atlanta, GA, pp. 1–6 (2010) 41. Teo, Y.M., Luong, B.L., Szabo, C.: Formalization of emergence in multiagent systems. In: SIGSIM-PADS’13, Montreal, Quebec, Canada, pp. 231–240, 19–22 May 2013 42. Gore, R., Reynolds, J.P.F.: An exploration-based taxonomy for emergent behavior analysis in simulations. In: Proceedings of the 2007 Winter Simulation Conference, Washington, DC, pp. 1232–1240, 9–12 Dec 2007 43. Parunak, H.V.D., VanderBok, R.S.: Managing emergent behavior in distributed control systems. In: ISA-Tech’97, Anaheim, CA, pp. 1–8 (1997) 44. Mogul, J.: Emergent (mis)behavior vs. complex software systems. In: Hewlett-Packard Development Company, L. P., Palo Alto, CA, pp. 1–20, 19–22 May 2006 45. Brown D.S., Goodrich, M.A.: Limited bandwidth recognition of collective behavior in bioinspired swarms. In: Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems, Paris, France, pp. 405–412, 5–9 May 2014 46. Murphy, R.R.: Introduction to AI Robotics. MIT Press, Boston, MA (2000) 47. Arai, T., Pagello, E., Parker, L.E.: Editorial: advances in multi-robot systems. IEEE Trans. Robot. Autom. 18, 655–661 (2002) 48. Parker, L.E.: Current state of the art in distributed autonomous mobile robotics. In: Parker, L.E., Bekey, G., Barhen, J. (Eds.) Distributed Autonomous Robotic Systems 4, Springer, New York, pp. 3–12 (2000) 49. Wang, P.: Navigation strategies for multiple autonomous mobile robots. In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Tsukuba, Japan, pp. 486–493 (1989) 50. Leonard, N.E., Fiorelli, E.: Virtual leaders, artificial potentials and coordinated control of groups. In: IEEE Conference on Decision and Control, (Orlando, FL), pp. 2968–2973 (2001) 51. Fax, J.A., Murray, R.M.: Information flow and cooperative control of vehicle formations. IEEE Trans. Autom. Control 49, 1465–1476 (2004) 52. Tanner, H.G., Jadbabaie, A., Pappas, G.J.: Flocking in fixed and switching networks. IEEE Trans. Autom. Control 52, 863–868 (2007) 53. Wang, J., Qu, Z., Ihlefeld, C.M., Hull, R.A.: A control-design-based solution to robotic ecology: autonomy of achieving cooperative behavior from a high-level astronaut command. Auton. Rob. 20, 97–112 (2006) 54. Moreau, L.: Stability of multiagent systems with time-dependent communication links. IEEE Trans. Autom. Control 50, 169–182 (2005) 55. Lin, Z., Francis, B., Maggiore, M.: State agreement for continuous-time coupled nonlinear systems. SIAM J. Control Optim. 46, 288–307 (2007) 56. Papachristodoulou, A., Jadbabaie, A., Munz, U.: Effects of delay in multi-agent consensus and oscillator synchronization. IEEE Trans. Automa. Control 55, 1471–1477 (2010) 57. Qu, Z.: Cooperative control of networked nonlinear systems. In: 49th IEEE Conference on Dec. and Ctrl, Atlanta, GA, pp. 3200–3207 (Dec 2010) 58. Cortes, J.: Finite-time convergent gradient flows with applications to network consensus. Automatica 42, 1993–2000 (2006) 59. Cao, Y., Ren, W., Meng, Z.: Decentralized finite-time sliding mode estimators and their applications in decentralized finite-time formation tracking. Syst. Control Lett. 59, 522–529 (2010) 60. Cao, Y. 
Ren, W.: Distributed coordinated tracking with reduced interaction via a variable structure approach. IEEE Trans. Automa. Control, 56 (2011) (in press) 61. Hui, Q.: Finite-time rendezvous algorithms for mobile autonomous agents. IEEE Trans. Autom. Control 56, 207–211 (2011) 62. Chen, G., Lewis, F.L., Xie, L.: Finite-time distributed consensus via binary control protocols. Automatica 47 (2011) (in press)



63. Shevitz, D., Paden, B.: Lyapunov stability theory of nonsmooth systems. IEEE Trans. Autom. Control 39, 1910–1994 (1994) 64. Paden, B., Sastry, S.: A calculus for computing filippov’s differential inclusion with application to the variable structure control of robot manipulators. IEEE Trans. Circuits Syst. 34, 73–82 (1987) 65. Garulli, A., Giannitrapani, A.: Analysis of consensus protocols with bounded measurement errors. Syst. Control Lett. 60, 44–52 (2011) 66. Bauso, D., Giarre, L., Pesenti, R.: Consensus for networks with unknown but bounded disturbances. Preprints (2011) 67. Li, T., Zhang, J.: Consensus conditions of multi-agent systems with time-varying topologies and stochastic communication noises. IEEE Trans. Autom. Control 55, 2043–2057 (2010) 68. Lin, P., Jia, Y.M.: Distributed robust h ∞ consensus control in directed networks of agents with time-delay. Syst. Control Lett. 57(8), 643–653 (2008) 69. Liu, Y., Jia, Y.M., Du, J.P., Yuan, S.Y.: Dynamic output feedback control for consensus of multi-agent systems: An h ∞ approach. Proceedings of the 2009 American Control Conference, pp. 4470–4475 (2009) 70. Justin, E.W., Krishnaprasad, P.S.: Equilibria and steering laws for planar formations. Syst. Control Lett. 52, 25–38 (2004) 71. Wang, J., Yang, T., Staskevich, G., Abbe, B.: Approximately adaptive neural cooperative control for nonlinear multiagent systems with performance guarantee. Int. J. Syst. Sci. 48, 909–920 (2016) 72. Fiedler, M.: Algebraic connectivity of graphs. Czechoslovak Math. J. 23, 298–305 (1973) 73. Yang, P., Freeman, R., Gordon, G., Lynch, K.: Decentralized estimation and control of graph connectivity for mobile sensor networks. Automatica 46, 390–396 (2010) 74. Qu, Z., Li, C., Lewis, F.: Cooperative control with distributed gain adaptation and connectivity estimation for directed networks. Int. J. Robust, Nonlinear Control (2012) 75. Couzin, L.D., Krause, J., Franks, N.R., Levin, S.A.: Effective leadership and decision making in animal groups on the move. Nature 433, 513–516 (2005)

Chapter 2

Preliminaries on Matrix and System Theory

2.1 Introduction

In this chapter, we briefly review some useful background materials in terms of linear algebra and matrix theory, and systems and control theory. We refer to the books [1–8] for a more complete treatment of the subjects. Matrix theory plays an important role in the rigorous analysis of the dynamical behaviors of multiagent systems, particularly for multiagent systems with linear dynamics. The interaction topologies can naturally be represented using matrices, and the solution structure of linear dynamical systems lends itself to the explicit analysis of emergent behaviors. To study multiagent systems with nonlinear dynamics, fundamental tools for nonlinear system analysis and design are instrumental. The basic concepts of Lyapunov stability theory and the fundamental nonlinear control design methods, including feedback linearization, Lyapunov redesign, and backstepping, are included in this chapter; they are useful in understanding the analysis and design of the distributed estimation and control algorithms presented in the book.

2.2 Basics on Linear Algebra and Matrix Theory

Let ℝ^n be the n-dimensional Euclidean real space spanned by a set of n orthonormal basis vectors {e_1, e_2, ..., e_n}, where e_i is the vector with all 0s except for a 1 in its ith entry. Define a vector in ℝ^n as an n-tuple of real numbers





x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},

and the zero vector

x = \begin{bmatrix} 0 & 0 & \cdots & 0 \end{bmatrix}^T = 0.

A set of vectors x_1, x_2, ..., x_m is said to be linearly dependent if there exist real numbers α_1, α_2, ..., α_m, not all zero, such that α_1 x_1 + α_2 x_2 + ··· + α_m x_m = 0. Otherwise, they are said to be linearly independent. In ℝ^n, there are at most n linearly independent vectors. Let us select a set of linearly independent vectors {q_1, q_2, ..., q_n} as the basis vectors for ℝ^n, that is, every vector in ℝ^n can be expressed as the unique linear combination of such a set, x = α_1 q_1 + α_2 q_2 + ··· + α_n q_n, where α_1, α_2, ..., α_n are real numbers, not all zero. Define the n × n square matrix Q = [q_1  q_2  ...  q_n]. Then we have

x = Q \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{bmatrix} = Q\bar{x}.

The vector \bar{x} = [α_1  α_2  ...  α_n]^T is called the representation of the vector x with respect to the basis {q_1, q_2, ..., q_n}. The vector x = [x_1  x_2  ...  x_n]^T can be viewed as the representation with respect to the basis vectors {e_1, e_2, ..., e_n}. A set of vectors x_i, i = 1, ..., m, is said to be orthonormal if

x_i^T x_j = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}
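As a tiny numerical companion (not from the book), the representation \bar{x} of a vector in a new basis can be computed by solving Q\bar{x} = x; the basis vectors and x below are arbitrary choices.

```python
# Change of basis: x = Q x_bar, so x_bar = Q^{-1} x.
import numpy as np

Q = np.array([[1.0, 1.0],
              [0.0, 2.0]])          # columns are the basis vectors q1, q2
x = np.array([3.0, 4.0])

x_bar = np.linalg.solve(Q, x)       # representation of x in the basis {q1, q2}
print(x_bar, Q @ x_bar)             # Q @ x_bar recovers x
```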

A real square n × n matrix A is called an orthogonal matrix if A^T A = I_n, where I_n is the n × n unit matrix.

Linear Algebraic Equation. For an m × n real matrix A, the range space is defined as the set of all possible linear combinations of the columns of A. The rank of A is the number of linearly independent columns of A, and rank(A) = ρ(A) ≤ min(m, n). A nonzero vector x is a null vector of A if Ax = 0. The null space of A consists of all its null vectors. The nullity is defined as the maximum number of linearly independent null vectors of A, and Nullity(A) = n − ρ(A). A square matrix A : n × n is nonsingular if rank(A) = n or, equivalently, if the determinant of A is not equal to zero (det(A) ≠ 0); A is singular if rank(A) < n (det(A) = 0). Consider the following linear algebraic equation

Ax = y,

where A is an m × n real matrix, and y : m × 1 and x : n × 1 are real vectors. A and y are given, and x is the unknown to be solved. A solution to Ax = y exists if and only if y lies in the range space of A [2]. It is easy to see that if A has full row rank m, then a solution x exists for every y. If A is an n × n nonsingular matrix, then x = A^{-1} y. The homogeneous equation Ax = 0 has nonzero solutions if and only if A is singular, or ρ(A) < n; the number of linearly independent solutions equals n − ρ(A), the nullity of A.

Similarity Transformation. Consider an n × n matrix A = [a_{ij}]. Its ith column is the representation of Ae_i with respect to the orthonormal basis vectors {e_1, e_2, ..., e_n}. Select a different set of basis vectors {q_1, q_2, ..., q_n}; then A will have a different representation \bar{A} = [\bar{a}_{ij}], whose ith column is the representation of Aq_i with respect to the basis vectors {q_1, q_2, ..., q_n}. That is, Aq_i = \bar{a}_{i1} q_1 + \bar{a}_{i2} q_2 + ··· + \bar{a}_{in} q_n = Q\bar{a}_i,

where

Q = \begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}, \quad \bar{a}_i = \begin{bmatrix} \bar{a}_{i1} \\ \bar{a}_{i2} \\ \vdots \\ \bar{a}_{in} \end{bmatrix}.

It follows that

\begin{bmatrix} Aq_1 & Aq_2 & \cdots & Aq_n \end{bmatrix} = \begin{bmatrix} Q\bar{a}_1 & Q\bar{a}_2 & \cdots & Q\bar{a}_n \end{bmatrix},

which leads to the similarity transformation

AQ = Q\bar{A} \;\Rightarrow\; \bar{A} = Q^{-1} A Q.    (2.1)

Eigenvalue and Eigenvector. Similarity transformation provides an easy way for the structural decomposition of a given matrix and facilitates the study of dynamical systems represented by matrices. Specifically, it is preferred to transform a matrix


into the diagonal or block-diagonal form (Jordan form) based on its eigenstructure. Recall that for an n × n real matrix A if there exists a nonzero vector x and a real or complex number λ such that λx = Ax then λ is an eigenvalue of A, and x is the corresponding eigenvector. It follows that the nonzero solution of (λIn − A)x = 0 requires matrix λIn − A to be singular, we obtain the characteristic equation of matrix A det(λIn − A) = 0

(2.2)

and (λ) = det(λIn − A) is defined as the characteristic polynomial of A, which is an nth order polynomial in the variable λ and has n roots λ1 , . . . , λn (not necessarily distinct). The roots of (λ) are eigenvalues of A. Apparently, a real matrix A may have complex eigenvalues (in complex conjugate pairs). Eigenvectors may appear as complex vectors as well. For distinct eigenvalues of a real matrix A, the associated eigenvectors are linearly independent. For an n × n real square matrix A, eigenvectors we discussed so far are right eigenvectors. Sometimes, we may consider the left eigenvectors as xT A = xT λ

(2.3)

AT x = λx

(2.4)

which is equivalent to

It follows from \det(\lambda I_n - A) = \det(\lambda I_n - A^T) that the left eigenvalues and the right eigenvalues of matrix A are identical; the left eigenvector of A is simply the right eigenvector of A^T.

Jordan Canonical Form. Consider a real matrix A with all distinct eigenvalues λ_1, ..., λ_n, and let q_i be the eigenvector corresponding to λ_i. It follows from Aq_i = λ_i q_i that

\bar{A} = Q^{-1} A Q = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix},

that is, A can be transformed into the diagonal form. Consider a real matrix A with nondistinct eigenvalues. Whether matrix A can be diagonalized or not depends on the number of linearly independent eigenvectors for the repeated eigenvalues. Define the algebraic multiplicity of an eigenvalue λ_i as the integer n_i associated with the factor (λ - λ_i)^{n_i} in the characteristic polynomial \Delta(\lambda) = \det(\lambda I_n - A); in other words, λ_i repeats n_i times. If for all such repeated λ_i there are correspondingly n_i linearly independent eigenvectors, then



matrix A is still diagonalizable. Otherwise, matrix A can be transformed into the block-diagonal form (Jordan canonical form) ⎡ ⎢ ⎢ A=⎢ ⎣



Jn 1 (λ1 )

Jn 2 (λ2 )

..

⎥ ⎥ ⎥ ⎦

. Jn m (λm )

with Jni (λi ) is an n i × n i Jordan block in the form of ⎤ λi 1 0 . . . 0 ⎥ ⎢ 0 λi 1 ⎥ ⎢ ⎢ .. . . .. .. ⎥ Jni (λi ) = ⎢ . ⎥ ⎥ ⎢ ⎣0 1⎦ 0 0 . . . 0 λi ⎡

(2.5)

where n 1 + · · · + n m = n, and λi , i = 1, . . . , m are eigenvalues of A. It should be noted that for an eigenvalue λi with the algebraic multiplicity n i ≥ 2, the Jordan form depends on the nullity of the matrix (λi In − A), denoted as n iJ , that is, the number of linearly independent eigenvectors associated with λi . If n iJ = 1, then the Jordan block for λi is of the form in (2.5), which we say that the order of the Jordan block is n i ; if n iJ = n i , then there are n i Jordan blocks for λi , and the order of each Jordan block is 1; if 1 < n iJ < n i , then there are n iJ Jordan blocks for λi , and the largest order the Jordan block n¯ i < n i . We may define n iJ as the geometric multiplicity of λi . Example 2.1 Consider an 4 × 4 matrix A with 2 eigenvalues of λ1 and λ2 with the algebraic multiplicity of λ1 being 3. That is (λ) = (λ − λ1 )3 (λ − λ2 ). Then the Jordan canonical form assumes one of the following form ⎤ ⎡ λ1 0 0 0 ⎢ 0 λ1 0 0 ⎥ J ⎥ A=⎢ ⎣ 0 0 λ1 0 ⎦ , n 1 = 3, n¯ 1 = 1 0 0 0 λ2 ⎡

λ1 ⎢0 A=⎢ ⎣0 0 ⎡

λ1 ⎢0 A=⎢ ⎣0 0

1 λ1 0 0

0 0 λ1 0

⎤ 0 0⎥ ⎥ , n J = 2, n¯ 1 = 2 1 0⎦ λ2

1 λ1 0 0

0 1 λ1 0

⎤ 0 0⎥ ⎥ , n J = 1, n¯ 1 = 3 1 1⎦ λ2 ♦

28

2 Preliminaries on Matrix and System Theory

Functions of a Matrix. For the Jordan block in (2.5), there is an important property called nilpotent. That is, (Jni − λi Ini )k = 0 for k ≥ n i . Transforming a square matrix A into its diagonal form is useful in solving matrix functions. For example, the solution of matrix exponential e At is instrumental in the study of the properties of linear dynamical systems. To study the functions of a square matrix, we need the Cayley–Hamilton theorem [2]. Theorem 2.1 (Cayley–Hamilton theorem) For an n × n matrix A with the characteristic polynomial (λ), we have (A) = 0. Let f (λ) be any function. The Cayley–Halmilton Theorem 2.1 provides a simple to solve for f (A) by using the equivalent polynomial of f (λ) on the spectrum of A, denoted by h(λ) = β0 + β1 λ + · · · + βn−1 λn−1 . The coefficients of h(λ) can be found by equating f (λ) = h(λ) on the spectrum of A. Then f (A) = h(A). Consider the similarity transformation A = Q AQ −1 , we further have f (A) = Qh(A)Q −1 . Example 2.2 Consider a Jordan block Jni (λi ) in (2.5) with n i = 3. Then for a function f (λ), its equivalent polynomial on the spectrum of J3 n i (λi ) is h(λ) = f (λ1 ) + f  (λ1 )(λ − λ1 ) + where f  (λ) = block, we have

df dλ

and f (2) (λ) =

d2 f dλ2

f (2) (λ1 ) (λ − λ1 )2 2!

. Noting the Nilpotent property of the Jordan ⎡

⎤ (2) f (λ1 ) f  (λ1 ) f 2!(λ1 ) f (J3 ) = h(J3 ) = ⎣ 0 f (λ1 ) f  (λ1 ) ⎦ 0 0 f (λ1 )



Positive (Semi-)Definiteness. Real symmetric matrices are very important because of their use in the construction of quadratic functions. Given an n × n real symmetric matrix P, it can be shown that all its eigenvalues are real. The scalar function x^T P x is called a quadratic form, where x is a real vector. A symmetric matrix P can be transformed into a purely diagonal form even if it has repeated eigenvalues; in other words, for a repeated eigenvalue of a symmetric matrix, its geometric multiplicity is the same as its algebraic multiplicity. It follows that P = Q^T \bar{P} Q, where Q is an orthogonal matrix and \bar{P} is diagonal, consisting of all eigenvalues of P. A symmetric matrix P is said to be positive definite (P > 0) if x^T P x > 0 for all nonzero real vectors x. If x^T P x ≥ 0 for all nonzero real vectors x, then P is positive semidefinite, or P ≥ 0. There are a number of equivalent necessary and sufficient conditions to check positive definiteness (or positive semidefiniteness); the commonly used ones are: (i) every eigenvalue of P is positive (nonnegative); and (ii) all leading principal minors of P are positive (all the principal minors of P are nonnegative).
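The two standard tests quoted above are easy to check numerically. The sketch below is a small illustration (not from the book); the symmetric matrix P is an arbitrary example.

```python
# Positive definiteness via eigenvalues and via leading principal minors.
import numpy as np

P = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])            # symmetric test matrix (assumed example)

eig_test = np.all(np.linalg.eigvalsh(P) > 0)
minor_test = all(np.linalg.det(P[:k, :k]) > 0 for k in range(1, P.shape[0] + 1))
print(eig_test, minor_test)                 # both True: P > 0
```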

2.3 Solutions to and Stability of Linear Systems


2.3 Solutions to and Stability of Linear Systems

2.3.1 Solutions to Linear Systems

Consider a typical continuous-time linear time-invariant system given by

\dot{x} = Ax + Bu, \quad y = Cx + Du,    (2.6)

where x ∈ ℝ^n is the state, u ∈ ℝ^m is the input, y ∈ ℝ^q is the output, and A, B, C, D are state matrices of appropriate dimensions. The dynamical behaviors (system responses) of (2.6) generally consist of two terms: one due to the system initial condition x(0), called the zero-input response, and the other due to the input signal u(t), called the zero-state response. The complete response of the linear system (2.6) is the superposition of the zero-input response and the zero-state response, which can be obtained using the integrating factor method as follows:

x(t) = e^{At} x(0) + \int_0^t e^{A(t-\tau)} B u(\tau)\,d\tau, \quad y(t) = C e^{At} x(0) + C\int_0^t e^{A(t-\tau)} B u(\tau)\,d\tau + Du(t).    (2.7)

The control problem for linear systems (2.6) boils down to two aspects: regulation and tracking. The baseline control design is the state feedback control of the form

u(t) = -Kx(t),    (2.8)

where K is the feedback control gain matrix, which can be obtained using the standard pole placement method. Applying (2.8) to (2.6) leads to the closed-loop system dynamics

\dot{x} = (A - BK)x.    (2.9)

The system response resulting from (2.9) is

x(t) = e^{(A-BK)(t-t_0)} x(t_0).    (2.10)

The discrete-time state-space model can be obtained by discretizing (2.6) under the assumption that u(t) is generated using a zero-order hold (ZOH) over the sample interval T. It follows that

x(k+1) = \Phi x(k) + \Gamma u(k), \quad y(k) = Cx(k) + Du(k),    (2.11)


where \Phi = e^{AT} and \Gamma = \int_0^T e^{A\eta}\,d\eta\, B. The solution to (2.11) can be obtained as

x(k) = \Phi^k x(0) + \sum_{i=0}^{k-1} \Phi^{k-1-i}\Gamma u(i), \quad y(k) = C\Phi^k x(0) + \sum_{i=0}^{k-1} C\Phi^{k-1-i}\Gamma u(i) + Du(k).    (2.12)
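A brief sketch of the ZOH discretization (2.11) is given below (not from the book): \Phi = e^{AT} and \Gamma = \int_0^T e^{A\eta}d\eta\,B are computed with the standard augmented-matrix trick, and one step of the recursion is taken. The matrices A, B and the sample time T are made-up values.

```python
# ZOH discretization of xdot = Ax + Bu and one step of the discrete-time model.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
T = 0.1

n, m = A.shape[0], B.shape[1]
M = np.zeros((n + m, n + m))
M[:n, :n], M[:n, n:] = A, B
E = expm(M * T)                       # exp of the augmented matrix [[A, B], [0, 0]] * T
Phi, Gamma = E[:n, :n], E[:n, n:]     # Phi = e^{AT}, Gamma = int_0^T e^{A eta} d eta B

# one step of (2.11) from x(0) = 0 under a unit input held over the sample interval
x1 = Phi @ np.zeros((n, 1)) + Gamma @ np.array([[1.0]])
print(Phi, Gamma, x1, sep="\n")
```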

2.3.2 Stability of Linear Systems

There are two stability notions based on the system responses. For zero-input responses, internal stability can be defined; for zero-state responses, BIBO (bounded-input bounded-output) stability can be defined. It follows from the solution in (2.7) that internal stability can be revealed from the structures of e^{At} and \Phi^k. Based on the discussions in Sect. 2.2, there are a number of methods to compute e^{At} and \Phi^k. The most straightforward way is to transform A and \Phi into their Jordan canonical forms \bar{A} and \bar{\Phi}, and then e^{At} = Q e^{\bar{A}t} Q^{-1} and \Phi^k = Q \bar{\Phi}^k Q^{-1}. In essence, the eigenvalues of A and \Phi determine the internal stability of a linear system. That is, if and only if all eigenvalues of A (all eigenvalues of \Phi) are in the open left-half complex plane (are inside the unit circle), then the linear continuous-time system (2.6) (the linear discrete-time system (2.11)) is asymptotically stable; if and only if all eigenvalues of A (all eigenvalues of \Phi) have zero or negative real parts and those with zero real parts have the same algebraic and geometric multiplicities (have magnitudes less than or equal to 1, and those with magnitude 1 have the same algebraic and geometric multiplicities), then the linear continuous-time system (2.6) (the linear discrete-time system (2.11)) is marginally stable.

Example 2.3 Consider the following two system matrices for linear continuous-time systems:

A_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -2 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -2 \end{bmatrix}.

Note that A_1 has eigenvalues 0, 0, -2, the algebraic multiplicity and the geometric multiplicity of the repeated eigenvalue 0 are both 2, and

e^{A_1 t} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{-2t} \end{bmatrix},

so A_1 is marginally stable. A_2 also has eigenvalues 0, 0, and -2, while the algebraic multiplicity of 0 is 2 and its geometric multiplicity is 1. It follows that

e^{A_2 t} = \begin{bmatrix} 1 & t & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{-2t} \end{bmatrix},



and A2 is unstable.

Example 2.4 Consider the following two system matrices for linear discrete-time systems:

\Phi_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0.5 \end{bmatrix}, \quad \Phi_2 = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0.5 \end{bmatrix}.

\Phi_1 has eigenvalues 1, 1, 0.5, the algebraic multiplicity and the geometric multiplicity of the repeated eigenvalue 1 are both 2, and

\Phi_1^k = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0.5^k \end{bmatrix},

so \Phi_1 is marginally stable. \Phi_2 also has eigenvalues 1, 1, and 0.5, while the algebraic multiplicity of 1 is 2 and its geometric multiplicity is 1. It follows that

\Phi_2^k = \begin{bmatrix} 1 & k & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0.5^k \end{bmatrix},

and \Phi_2 is unstable.



BIBO stability is defined based on zero-state responses: a system is said to be BIBO stable if the output is bounded for every bounded input. It follows from (2.7) that under zero initial conditions we have

y(t) = \int_0^t C e^{A(t-\tau)} B u(\tau)\,d\tau + Du(t).    (2.13)

Following a similar argument as that in [2], the necessary and sufficient condition for BIBO stability can be obtained from (2.13) as the boundedness of \int_0^{\infty} \|C e^{A(t-\tau)} B\|\,d\tau. For a minimal realization (A, B, C, D), the necessary and sufficient condition for BIBO stability is equivalent to all eigenvalues of A lying inside the left half of the complex plane. For discrete-time systems, it follows from (2.12) that under zero initial conditions we have

y(k) = \sum_{i=0}^{k-1} C\Phi^{k-1-i}\Gamma u(i) + Du(k).    (2.14)


The necessary and sufficient condition for BIBO stability is that \sum_{k=0}^{\infty} \|C\Phi^k\Gamma\| is bounded, which is equivalent to all eigenvalues of \Phi lying inside the unit circle of the complex plane.

2.4 Tools for Nonlinear System Analysis and Design

In this section, we present the basic results for studying stability and control design of nonlinear systems. The norm of a vector x ∈ ℝ^n, denoted by \|x\|, is any real-valued function with the properties
• \|x\| ≥ 0, ∀x ∈ ℝ^n, and \|x\| = 0 iff x = 0;
• \|x + y\| ≤ \|x\| + \|y\|, ∀x, y ∈ ℝ^n;
• \|αx\| = |α|\,\|x\|, for any α and x ∈ ℝ^n.
For a vector x = [x_1\; x_2\; \ldots\; x_n]^T ∈ ℝ^n, the p-norm is defined by

\|x\|_p = \bigl(|x_1|^p + \cdots + |x_n|^p\bigr)^{1/p}, \quad 1 \le p < \infty.

The commonly used norms are the 1-norm, the 2-norm (Euclidean norm), and the infinity norm, that is,

\|x\|_1 = \sum_{i=1}^{n} |x_i|, \quad \|x\|_2 = \sqrt{x^T x} = \Bigl(\sum_{i=1}^{n} |x_i|^2\Bigr)^{1/2}, \quad \|x\|_\infty = \max_i |x_i|.

For an m × n real matrix A, the induced norm is defined as

\|A\|_p = \sup_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_p.

Accordingly, the induced 1-norm, 2-norm, and infinity norm of A are

\|A\|_1 = \max_j \sum_{i=1}^{m} |a_{ij}| \;(\text{max column sum}), \quad \|A\|_\infty = \max_i \sum_{j=1}^{n} |a_{ij}| \;(\text{max row sum}), \quad \|A\|_2 = \bigl[\lambda_{\max}(A^T A)\bigr]^{1/2}.

The matrix norm has the following properties: \|Ax\| \le \|A\|\,\|x\|, \|A + B\| \le \|A\| + \|B\|, and \|AB\| \le \|A\|\,\|B\|.
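A quick numerical check of the induced-norm formulas (not from the book) is given below, compared against NumPy's built-in matrix norms; the matrix A is an arbitrary example.

```python
# Induced 1-, 2-, and infinity-norms of a matrix, computed from the formulas above.
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [0.5,  4.0, -1.0]])

norm1 = np.abs(A).sum(axis=0).max()                   # max column sum
norm_inf = np.abs(A).sum(axis=1).max()                # max row sum
norm2 = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))  # sqrt of lambda_max(A^T A)

print(norm1, np.linalg.norm(A, 1))
print(norm_inf, np.linalg.norm(A, np.inf))
print(norm2, np.linalg.norm(A, 2))
```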


A general nonlinear system is of the form

\dot{x} = f(x, u, t),    (2.15)

where x ∈ ℝ^n is the state of the system, u ∈ ℝ^m is the input to the system, and f : ℝ^n × ℝ^m × ℝ → ℝ^n is a nonlinear vector function, normally satisfying a Lipschitz condition, that is,

\|f(x_1, u, t) - f(x_2, u, t)\| \le \gamma \|x_1 - x_2\|,    (2.16)

where the constant γ > 0. A special class of nonlinear systems is of the affine form

\dot{x} = f(x) + g(x)u,    (2.17)

where f and g are nonlinear functions. The nonlinear system (2.15) is called autonomous if f does not explicitly depend on t. Unlike linear systems which satisfy the superposition property, the dynamical behaviors of nonlinear systems could be very complicated. A nonlinear system may have multiple equilibrium points, isolated periodic solutions, limit cycles, and finite escape time. The Lyapunov theory is fundamental in the study of nonlinear systems.

2.4.1 Lyapunov Stability

Consider a nonlinear system

\dot{x} = f(x),    (2.18)

where f is a nonlinear vector function satisfying a Lipschitz condition. The system in (2.18) represents the unforced system or the closed-loop system dynamics under a feedback control u(x). Stability can be established in terms of the equilibrium points of (2.18), which satisfy f(x_e) = 0; normally x_e = 0. The fundamental Lyapunov stability notions are introduced as follows.

Definition 2.1 The equilibrium point x_e of \dot{x} = f(x) is
• stable if ∀ε > 0, ∃δ = δ(ε) > 0 such that \|x(t)\| < ε, ∀t ≥ 0, whenever \|x(t_0)\| < δ;
• unstable if it is not stable;
• asymptotically stable if it is stable and there exists δ_1 such that \lim_{t\to\infty} x(t) = x_e whenever \|x(t_0)\| < δ_1;
• globally asymptotically stable if it is asymptotically stable with δ_1 = ∞;
• exponentially stable if

\|x(t)\| \le c\,\|x(t_0)\|\,e^{-\lambda t}, \quad ∀t \ge 0,

for some positive constants c and λ.


For an asymptotically stable equilibrium point (let it be the origin without loss of generality) of the system ẋ = f(x), where f is a locally Lipschitz function defined over a domain D ⊂ ℝⁿ, the region of attraction is defined as the set of all points x₀ ∈ D such that the solution of ẋ = f(x), x(0) = x₀ exists for all t ≥ 0 and converges to the origin as t → ∞. If the region of attraction is ℝⁿ, then the equilibrium point is said to be globally asymptotically stable. The stability analysis for (2.18) can be done using several methods. Explicitly finding the closed-form solution to (2.18) would be the easiest way, when it is possible. Without seeking the solution, Lyapunov's first method and Lyapunov's second method can be used. Lyapunov's first method, also referred to as Lyapunov's indirect method, is based on the linearization of the nonlinear system, which may at least provide a local stability result. Consider (2.18) with x = [x₁ x₂ … xₙ]ᵀ ∈ ℝⁿ, f(x) = [f₁ f₂ … fₙ]ᵀ and f(0) = 0. Define the Jacobian matrix of f(x) as

J(x) = ∂f/∂x = [∂f₁/∂x₁  ∂f₁/∂x₂  ···  ∂f₁/∂xₙ;  ∂f₂/∂x₁  ∂f₂/∂x₂  ···  ∂f₂/∂xₙ;  ⋮;  ∂fₙ/∂x₁  ∂fₙ/∂x₂  ···  ∂fₙ/∂xₙ]

and h(σ) = f(σx) with 0 ≤ σ ≤ 1. It follows that

dh/dσ = J(σx)x,   h(1) − h(0) = ∫₀¹ dh = ∫₀¹ J(σx) dσ x,

and noting h(0) = f(0) = 0, h(1) = f(x), we have

f(x) = ∫₀¹ J(σx) dσ x = [A + G(x)]x        (2.19)

where A = J(0), G(x) = ∫₀¹ [J(σx) − J(0)] dσ, and G(x) → 0 as x → 0. To this end, it follows from (2.19) that we may use the linear system ẋ = Ax to approximate the nonlinear system ẋ = f(x) in a small neighborhood of the origin.

Theorem 2.2 (Lyapunov's indirect method) The origin of ẋ = f(x) is asymptotically stable if the linearized system matrix A has all its eigenvalues in the open left-half complex plane; the origin is unstable if one or more eigenvalues of A are in the open right-half complex plane; if all eigenvalues of A are in the closed left-half complex plane, but at least one of them is on the imaginary axis, then the stability of the nonlinear system is inconclusive.


The proof of Theorem 2.2 can be found in [6]. To fully study the stability of nonlinear systems, Lyapunov's second (direct) method is instrumental, which is motivated by the principle of energy dissipation. The key is to examine the time derivative of a Lyapunov function candidate V(x) along the system trajectory.

Theorem 2.3 (Lyapunov's direct method) Let the origin be the equilibrium point of ẋ = f(x), x ∈ D ⊂ ℝⁿ. Let V(x) : D → ℝ be a continuously differentiable positive definite function. Then
• if

V̇(x) = (∂V/∂x) f(x) = [∂V/∂x₁, …, ∂V/∂xₙ] [f₁; ⋮; fₙ] ≤ 0

then x = 0 is stable;
• if

V̇ < 0, ∀x ∈ D − {0}

then x = 0 is asymptotically stable.

Searching for a Lyapunov function is generally not easy. For a linear system ẋ = Ax, the Lyapunov function candidates are normally quadratic functions of the form xᵀPx, where P is a symmetric and positive definite matrix. If P is not symmetric, it can always be rewritten as the sum of a symmetric matrix and a skew-symmetric matrix as P = (P + Pᵀ)/2 + (P − Pᵀ)/2. Note that for the skew-symmetric matrix (P − Pᵀ)/2, we have xᵀ[(P − Pᵀ)/2]x = 0, ∀x. Thus, xᵀPx = xᵀ[(P + Pᵀ)/2]x and we can always assume P is symmetric in the quadratic form of Lyapunov function candidates. It follows that for a linear system of the form ẋ = Ax,

V̇ = xᵀPẋ + ẋᵀPx = xᵀ(PA + AᵀP)x = −xᵀQx

where Q is a symmetric matrix defined by the following Lyapunov equation

PA + AᵀP = −Q

(2.20)

If Q is positive definite, then the linear system is asymptotically stable. For a general nonlinear system, the variable gradient method may provide a systematic way to construct the Lyapunov function [6, 7]. The idea is to first define the gradient in a parameterized form, ∂V/∂x = g(x), and then choose the free parameters based on the conditions ∂gᵢ/∂xⱼ = ∂gⱼ/∂xᵢ and V(x) > 0. To this end,


V(x) = ∫₀^{x₁} (∂V/∂x₁)(x₁, 0, …, 0) dx₁ + ∫₀^{x₂} (∂V/∂x₂)(x₁, x₂, …, 0) dx₂ + ··· + ∫₀^{xₙ} (∂V/∂xₙ)(x₁, x₂, …, xₙ) dxₙ

Example 2.5 Consider ẋ = −Lx with

L = [2 −1 0; 0 1 −1; −1 0 1]

L is positive definite. Let Q = L + Lᵀ; the solution to (2.20) is P = I₃, and xᵀPx is the corresponding Lyapunov function for ẋ = −Lx. ♦

In the case of V̇ ≤ 0, Theorem 2.3 can only claim Lyapunov stability. The invariant set theorems by LaSalle can be used to further draw conclusions on asymptotic stability. A set M is an invariant set with respect to ẋ = f(x) if x(0) ∈ M ⇒ x(t) ∈ M, ∀t. For example, an equilibrium point is an invariant set.

Theorem 2.4 (Local invariant set theorem) Consider ẋ = f(x) with V(x) being a continuously differentiable function and V̇ ≤ 0 for all x ∈ Ω₁. Let Ω₂ be the set of all points in Ω₁ where V̇ = 0, and M be the largest invariant set in Ω₂. Then, every solution x(t) originating in Ω₁ approaches M as t → ∞.

Example 2.6 Consider ẋ = −Lx with

L = [1 −1 0; −1 2 −1; 0 −1 1]

with the Lyapunov function candidate V(x) = 0.5xᵀx. It follows that

V̇ = −xᵀLx ≤ 0

The largest invariant set is M = {x | Lx = 0}. Note that L has rank 2; then M = {x | x = c1, c ∈ ℝ}. Therefore, x(t) → c1 as t → ∞. ♦

For a discrete-time system

x(k + 1) = f(x(k))

(2.21)


with a Lyapunov function V(x), it requires that

ΔV(x(k)) = V(x(k + 1)) − V(x(k)) ≤ 0

and the Lyapunov stability theory can be developed in parallel.
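Before moving to control design, the following sketch (ours, assuming numpy and scipy) numerically checks Examples 2.5 and 2.6: it solves the Lyapunov equation (2.20) for Example 2.5 and verifies the LaSalle-type convergence x(t) → c1 for Example 2.6.

```python
# Numerical check of Examples 2.5 and 2.6 (sketch; assumes numpy and scipy).
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, expm

# Example 2.5: P solving P A + A^T P = -Q with A = -L and Q = L + L^T should be I
L = np.array([[2., -1., 0.], [0., 1., -1.], [-1., 0., 1.]])
A = -L
Q = L + L.T
P = solve_continuous_lyapunov(A.T, -Q)   # solves A^T P + P A = -Q
print(np.round(P, 6))                    # expected: identity matrix

# Example 2.6: trajectories of x_dot = -L x converge to span{1}
L2 = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
x0 = np.array([3., -2., 5.])
xT = expm(-L2 * 50.0) @ x0               # state after a long time
print(np.round(xT, 4))                   # all entries close to mean(x0) = 2.0
```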

2.4.2 Nonlinear Control Design

The control design for linear systems can be done using frequency response methods and pole-placement-based state feedback. The fundamental idea is to make sure that the closed-loop system has the desired characteristic equation. For nonlinear systems, the situation becomes more complicated and the conventional linear design methods generally no longer work. There are fruitful results on nonlinear control design [6]. The commonly used methods include feedback linearization, Lyapunov redesign, backstepping, and so on.

Input-State Feedback Linearization. Consider a class of multi-input affine nonlinear systems given by

ẋ = f(x) + Σ_{i=1}^{m} gᵢ(x)uᵢ = f(x) + G(x)u,        (2.22)

where x ∈ ℝⁿ is the state, u ∈ ℝᵐ is the input, f(0) = 0, m < n, the entries of f(x) and G(x) are analytic functions of x, and rank G(x) = m for all x ∈ ℝⁿ. Let the pair {A, B} denote the linear time-invariant controllable canonical form of proper dimension. Then, the standard feedback linearization problem is to find a state transformation z = φ(x) ∈ ℝⁿ and a control mapping u = α(x) + β(x)v with v ∈ ℝᵐ such that the resulting transformed system is given by ż = Az + Bv. Conditions under which the nonlinear system (2.22) is feedback linearizable can be found in texts such as [9, 10]. For those systems that are not exactly feedback linearizable, the problem of partial feedback linearization was studied in [11] to transform the system into a partially linear controllable form as

ż₁ = Az₁ + Bv,  ż₂ = γ(z₁, z₂) + β(z₁, z₂)v,

where z₁ ∈ ℝᵖ and z₂ ∈ ℝ^{n−p}. As a more general extension of exact feedback linearization, the dynamic feedback linearization problem was studied in [9, 12, 13] by using the following dynamic compensator

ẇ = a(x, w) + b(x, w)v,  u = α(x, w) + β(x, w)v,


with a(0, 0) = 0 and α(0, 0) = 0. Clearly, feedback linearization is closely related to controllability and, when applicable, renders a simple solution to the stabilization problem.

Input–Output Feedback Linearization. Consider the following single-input single-output nonlinear system

ẋ = f(x) + g(x)u,  y = h(x)

(2.23)

The basic idea of input–output linearization is to repeatedly differentiate the output y until the control u appears. If the system (2.23) has relative degree r, that is, L_g L_f^i h(x) = 0 for 0 ≤ i ≤ r − 2 and L_g L_f^{r−1} h(x) ≠ 0, where L_f h(x) = (∂h/∂x) f(x) is the Lie derivative of h along f, then we have

ẏ = (∂h/∂x)[f(x) + g(x)u] = L_f h + L_g h u = L_f h
ÿ = (∂L_f h/∂x)[f(x) + g(x)u] = L_f² h + L_g L_f h u = L_f² h
⋮
y^{(r)} = (∂L_f^{r−1} h/∂x)[f(x) + g(x)u] = L_f^r h + L_g L_f^{r−1} h u

Let u = (1/(L_g L_f^{r−1} h(x)))[−L_f^r h(x) + v]; we obtain a chain of r integrators y^{(r)} = v
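As a small symbolic sketch of this procedure (ours, assuming sympy; the particular system below is an illustrative example, not one from the text), the Lie derivatives and the linearizing control can be computed directly:

```python
# Input-output linearization for a toy system (assumes sympy).
# System: x1' = x2, x2' = -x1**3 + u, output y = x1  (relative degree r = 2).
import sympy as sp

x1, x2, v = sp.symbols('x1 x2 v')
x = sp.Matrix([x1, x2])
f = sp.Matrix([x2, -x1**3])      # drift vector field
g = sp.Matrix([0, 1])            # input vector field
h = x1                           # output function

def lie(vec, fun):
    """Lie derivative of the scalar fun along the vector field vec."""
    return (sp.Matrix([fun]).jacobian(x) * vec)[0]

Lfh = lie(f, h)                  # = x2
Lf2h = lie(f, Lfh)               # = -x1**3
LgLfh = lie(g, Lfh)              # = 1 (nonzero -> relative degree 2)

u_lin = sp.simplify((-Lf2h + v) / LgLfh)   # linearizing control: y'' = v
print(Lfh, Lf2h, LgLfh, u_lin)             # x2, -x1**3, 1, x1**3 + v
```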

Lyapunov Redesign. Lyapunov redesign can be used to deal with nonlinear systems with matched uncertainties under the assumption that the bounds of the uncertainties are known. Consider

ẋ = f(t, x) + g(t, x)(u + Δg(t, x, u))

(2.24)

where x ∈ ℝⁿ, u ∈ ℝᵐ, f : ℝ₊ × D → ℝⁿ and g : ℝ₊ × D → ℝ^{n×m} are known functions, and the uncertainty Δg : ℝ₊ × D × ℝᵐ → ℝᵐ is unknown but with a known bound. The design objective is to find a state feedback control such that the closed-loop system is stable. The proposed control is of the form

u = u_N(t, x) + u_Δ(t, x)

(2.25)

in which the term u_N(t, x) is designed such that the nominal system ẋ = f(t, x) + g(t, x)u_N is asymptotically stable, and the term u_Δ(t, x) is added to handle the effect of the uncertainty Δg(t, x, u). Once an asymptotically stabilizing control u_N(t, x) is designed, which implies the existence of a Lyapunov function V(t, x) satisfying

α₁(‖x‖) ≤ V(t, x) ≤ α₂(‖x‖)


∂V/∂t + (∂V/∂x)[f(t, x) + g(t, x)u_N(t, x)] ≤ −α₃(‖x‖)

where αᵢ : ℝ₊ → ℝ₊, αᵢ(0) = 0, and αᵢ(·) is strictly increasing, the additive term u_Δ can be designed as follows. Applying (2.25) to (2.24) leads to

ẋ = f(t, x) + g(t, x)u_N(t, x) + g(t, x)[u_Δ + Δg(t, x, u)]

(2.26)

It follows that the time derivative of V along (2.26) is

V̇ = ∂V/∂t + (∂V/∂x)[f + g u_N] + (∂V/∂x) g (u_Δ + Δg)
  ≤ −α₃(‖x‖) + (∂V/∂x) g (u_Δ + Δg)        (2.27)

If u_Δ can be chosen such that (∂V/∂x) g (u_Δ + Δg) ≤ 0, then the Lyapunov redesign is done. Assume that ‖Δg(t, x, u_N + u_Δ)‖ ≤ δ(t, x) + c‖u_Δ‖, 0 ≤ c < 1. Then

(∂V/∂x) g (u_Δ + Δg) ≤ (∂V/∂x) g u_Δ + ‖(∂V/∂x) g‖ (δ(t, x) + c‖u_Δ‖)

To this end, let

u_Δ = −η [(∂V/∂x) g]ᵀ / ‖(∂V/∂x) g‖        (2.28)

where η ≥ δ(t, x)/(1 − c), we have (∂V/∂x) g (u_Δ + Δg) ≤ 0.

Remark 2.1 If Δg(t, x, u) → 0 as t → ∞ and ∫₀^∞ ‖g(t, x)Δg(t, x, u)‖ dt < ∞, then the nominal control u_N(t, x) alone will be able to ensure the asymptotic stability of system (2.24). ♦

Backstepping [14]. Backstepping is a recursive Lyapunov design method, particularly for nonlinear systems in the strict feedback form

ẋ = f₀(x) + g₀(x)z₁
ż₁ = f₁(x, z₁) + g₁(x, z₁)z₂
ż₂ = f₂(x, z₁, z₂) + g₂(x, z₁, z₂)z₃
⋮
ż_{k−1} = f_{k−1}(x, z₁, …, z_{k−1}) + g_{k−1}(x, z₁, …, z_{k−1})z_k
ż_k = f_k(x, z₁, …, z_k) + g_k(x, z₁, …, z_k)u

where gᵢ(x, z₁, …, zᵢ) ≠ 0 for 1 ≤ i ≤ k over the domain of interest. To illustrate the idea, let us consider the following second-order nonlinear system


x˙1 = x2 + f 1 (x1 )

(2.29)

x˙2 = u + f 2 (x1 , x2 )

(2.30)

where f₁ and f₂ are smooth nonlinear functions. The design consists of two steps:
• Step 1: Consider the first subsystem (2.29), and design a virtual control x₂ such that (2.29) is stabilized. Let x₂ = α(x₁) = −x₁ − f₁(x₁), which renders ẋ₁ = −x₁ and

V₁(x₁) = 0.5x₁² ⇒ V̇₁ = −x₁² < 0

• Step 2: Define the state transformation z = x₂ − α(x₁) and the Lyapunov function candidate

V = V₁ + 0.5z²

It follows that

V̇ = x₁(x₂ + f₁) + z(ẋ₂ − (∂α/∂x₁)ẋ₁)
  = x₁(z + α + f₁) + z(u + f₂ − (∂α/∂x₁)ẋ₁)
  = x₁(z − x₁) + z(u + f₂ − (∂α/∂x₁)(x₂ + f₁))

To this end, let u = −x₁ − z − f₂ + (∂α/∂x₁)(x₂ + f₁); we have

V̇ = −x₁² − z² < 0

and the closed-loop system stability can be claimed.
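The following numerical sketch (ours, assuming numpy) simulates the backstepping controller derived above for (2.29)–(2.30); the particular nonlinearities f₁ and f₂ are illustrative choices, not taken from the text.

```python
# Backstepping design for (2.29)-(2.30), simulated with a simple Euler scheme (assumes numpy).
import numpy as np

f1 = lambda x1: x1**2                    # example smooth nonlinearity (assumed)
f2 = lambda x1, x2: x1 * x2              # example smooth nonlinearity (assumed)
alpha = lambda x1: -x1 - f1(x1)          # virtual control from Step 1
dalpha = lambda x1: -1.0 - 2.0 * x1      # d(alpha)/dx1 for f1 = x1^2

def u(x1, x2):
    z = x2 - alpha(x1)
    return -x1 - z - f2(x1, x2) + dalpha(x1) * (x2 + f1(x1))

x = np.array([1.0, -0.5])
dt = 1e-3
for _ in range(20000):                   # 20 seconds of simulation
    x1, x2 = x
    x = x + dt * np.array([x2 + f1(x1), u(x1, x2) + f2(x1, x2)])
print(x)                                 # both states decay toward the origin
```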

2.5 Summary

In essence, multiagent systems are dynamical systems on interaction graphs. The basic matrix theory and system theory for dynamical systems covered in this chapter are of importance in the study of dynamical behaviors of multiagent systems. In the analysis of multiagent systems in the state space, it is common to introduce the state vector for the overall system, which normally is the stacked vector of the individual agents' state vectors. To this end, the solution-based analysis and/or Lyapunov stability theory-based analysis become handy, particularly for the case of multiagent systems with fixed sensing/communication topologies. For time-varying cases, the


situations generally are more complicated. Nonetheless, the fundamental theories and design methods introduced in this chapter may still serve as a starting point for further development.

References

1. Luenberger, D.G.: Introduction to Dynamic Systems. Wiley, Hoboken (1979)
2. Chen, C.: Linear System Theory and Design, 4th edn. Oxford University Press, New York (2013)
3. Narendra, K.S., Annaswamy, A.M.: Stable Adaptive Systems. Prentice-Hall, Englewood Cliffs, NJ (1989)
4. Sastry, S.: Nonlinear Systems: Analysis, Stability and Control. Springer-Verlag, New York (1999)
5. Vidyasagar, M.: Nonlinear Systems Analysis. Prentice Hall, Englewood Cliffs, NJ (1978)
6. Khalil, H.: Nonlinear Systems, 3rd edn. Prentice Hall, Upper Saddle River, NJ (2003)
7. Slotine, J.J., Li, W.: Applied Nonlinear Control. Prentice-Hall, Englewood Cliffs, NJ (1991)
8. Isidori, A.: Lectures in Feedback Design for Multivariable Systems. Springer, Switzerland (2017)
9. Isidori, A.: Nonlinear Control Systems, 3rd edn. Springer-Verlag, Berlin (1995)
10. Nijmeijer, H., van der Schaft, A.J.: Nonlinear Dynamical Control Systems. Springer-Verlag, New York (1990)
11. Marino, R.: On the largest feedback linearizable subsystem. Syst. Control Lett. 6, 245–251 (1986)
12. Charlet, B., Levine, J., Marino, R.: Sufficient conditions for dynamic state feedback linearization. SIAM J. Control Optim. 29, 38–57 (1991)
13. Sluis, W.M.: A necessary condition for dynamic feedback linearization. Syst. Control Lett. 21, 277–283 (1993)
14. Krstic, M., Kanellakopoulos, I., Kokotovic, P.V.: Nonlinear and Adaptive Control Design. Wiley, New York (1995)

Chapter 3

Interaction Topologies of Multiagent Systems and Consensus Algorithms

3.1 Interaction Topologies of Multiagent Systems

Fundamentally, the study of emergent behaviors of multiagent systems can be cast as the study of consensus problems. The consensus among agents relies on information exchange. In general, we assume that the information exchanges among agents are conducted through communication broadcasting or agents' sensing capabilities. Considering the limited and directed sensing/communication capability of each agent as shown in Fig. 3.1, the sensing/communication topologies among agents are generally time-varying.

3.1.1 Algebraic Graph Theory

It is natural to use a graph to represent the information flow among agents in the network. A graph consists of a set of nodes and a set of links (edges) which connect nodes, and it can be represented by G = (V, E), where V = {v₁, v₂, …, vₙ} is the set of n nodes (vertices) and E = {e_ij} ⊂ V × V is the set of directed edges. An edge from node i to node j is denoted as e_ij = (vᵢ, vⱼ), which represents the information transmission from node i to node j. In other words, e_ij is a directed link outgoing from vᵢ and incoming to vⱼ. If both e_ij ∈ E and e_ji ∈ E, then it is called an undirected link. If all links in E are undirected, the graph G is an undirected graph. Otherwise, it is a directed graph (digraph). The graph we are considering is also simple; that is, there are no self-loops for any node and no multiple edges between the same pairs of nodes. Figure 3.2 illustrates two digraph examples.


Fig. 3.1 A multiagent system with limited sensing/communication

Fig. 3.2 Digraph examples

For the graph in Fig. 3.2a, V = {v1 , v2 , v3 , v4 } and E = {e12 , e14 , e43 }. For the graph in Fig. 3.2b, V = {v1 , v2 , v3 , v4 } and E = {e12 , e23 , e34 }. The set of neighbors of node vi is defined as Ni = {v j |(v j , vi ) ∈ E}

(3.1)

that is, the set of nodes with edges incoming to vi . The in-degree of vi is the number of edges incoming to vi , and the out-degree of vi is the number of edges outgoing from vi . We use |Ni | to denote the number of neighbors of node vi , that is, the in-degree of vi . A graph is a balanced graph if for every node in the graph, its in-degree equals its out-degree. A weighted digraph is defined as a triple (V, E, W), where (V, E) is 

a digraph and W = {ai j } is a set of positive weights associated with the edges in E.


The connectivity of the graph indicates how the information is shared among agents (nodes), which is related to the connected edges (paths) in the graph. A directed path from node vᵢ to node vⱼ consists of a sequence of edges (vᵢ, v_{l1}), (v_{l1}, v_{l2}), …, (v_{lr}, vⱼ) in the set E, where v_{lk}, k = 1, …, r are some nodes on the path from vᵢ to vⱼ. A graph is said to be strongly connected if there exists a directed path between any two nodes vᵢ, vⱼ. A strongly connected undirected graph is simply called connected. A graph is said to be weakly connected if replacing all the directed edges in the graph with undirected ones renders a connected graph. A directed tree in the graph is a connected digraph where every node except the root has in-degree equal to one. A spanning tree of a digraph is a directed tree in the graph which includes all nodes of the graph. It is apparent that a graph may have multiple spanning trees.

Remark 3.1 A directed tree may contain only part of the nodes in the graph. Intuitively, the existence of a spanning tree in the graph is the necessary condition for the group of agents to generate certain group emergent behaviors. Otherwise, the group of agents will be split into multiple groups, and there is no guarantee that a common group behavior will emerge. A strongly connected graph has multiple spanning trees. A spanning tree in an undirected graph implies that the graph is connected. ♦

To study the multiagent system on a graph, it is of importance to introduce several graph matrices. We consider a weighted graph G = (V = {v₁, …, vₙ}, E, W) and associate each edge (vⱼ, vᵢ) ∈ E with a weight a_ij. For an undirected graph, a_ij = a_ji. The adjacency matrix A = [a_ij] ∈ ℝ^{n×n} is defined as a_ii = 0, ∀i ∈ {1, …, n}, a_ij > 0 if e_ji ∈ E, and otherwise a_ij = 0. For an undirected graph, a_ij = a_ji and A = Aᵀ. The degree matrix is D = diag{dᵢ} ∈ ℝ^{n×n}, where dᵢ = Σ_{j=1}^{n} a_ij. If a_ij = 1, ∀e_ji ∈ E, then dᵢ equals the in-degree of node vᵢ. The graph Laplacian L = D − A = [l_ij] is defined as

l_ij = Σ_{k=1}^{n} a_ik  if i = j,  l_ij = −a_ij  if i ≠ j        (3.2)

For the digraph in Fig. 3.2, if a_ij = 1, ∀e_ji ∈ E, then its Laplacian is given by

L = [0 0 0 0; −1 1 0 0; 0 0 1 −1; −1 0 0 1]

Note that L1 = 0, that is, all row sums of L equal zero, and L has an eigenvalue zero associated with the eigenvector 1. For an undirected graph, L is symmetric, and it is easy to show that xᵀLx = (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij (xⱼ − xᵢ)² ≥ 0, ∀x ∈ ℝⁿ. Hence, L is positive semidefinite. For general directed graphs, L is not symmetric and can be sign-indefinite.

Example 3.1 Consider two undirected graphs with the following adjacency matrices, respectively


A1 = [0 1 0 0; 1 0 0 0; 0 0 0 1; 0 0 1 0],  A2 = [0 1 0 0; 1 0 1 0; 0 1 0 1; 0 0 1 0]

It can be seen that A1 is not connected, and A2 is connected. The eigenvalues of the graph Laplacian matrix L1 are 0, 0, 2, 2, and those of L2 are 0, 0.5858, 2, 3.4142. ♦

Theorem 3.1 (Gersgorin Circle Theorem [1]) All eigenvalues of a matrix E = [e_ij] ∈ ℝ^{n×n} are located in the union of n disks

⋃_{i=1}^{n} { z ∈ ℂ : |z − e_ii| ≤ Σ_{j≠i} |e_ij| }
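The following sketch (ours, assuming numpy) computes the Laplacian eigenvalues of Example 3.1 and checks the Gersgorin bound discussed next.

```python
# Laplacian eigenvalues for Example 3.1 and the Gersgorin disk bound (assumes numpy).
import numpy as np

A1 = np.array([[0,1,0,0],[1,0,0,0],[0,0,0,1],[0,0,1,0]], dtype=float)
A2 = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

for A in (A1, A2):
    L = laplacian(A)
    eig = np.sort(np.linalg.eigvals(L).real)
    d_max = A.sum(axis=1).max()
    print(np.round(eig, 4), "within disk |z - d_max| <= d_max with d_max =", d_max)
# expected eigenvalues: [0, 0, 2, 2] and [0, 0.5858, 2, 3.4142]
```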

The Gersgorin circle theorem is useful to characterize the locations of the eigenvalues of the graph Laplacian matrix. As shown in Fig. 3.3, it can be seen that all eigenvalues of L are located within the disk {λ ∈ ℂ : |λ − d_max| ≤ d_max}, where d_max is the largest in-degree of G. It follows that Re{λᵢ} ≥ 0, and 0 = λ₁ ≤ |λ₂| ≤ ··· ≤ |λₙ|. Specifically, for an undirected graph, L is symmetric and positive semidefinite, and all its eigenvalues are real and nonnegative. That is, 0 = λ₁ ≤ λ₂ ≤ ··· ≤ λₙ. The following theorems reveal how the Laplacian matrix is linked to the connectivity of the network.

Theorem 3.2 [2] For a digraph, its Laplacian matrix L has rank n − 1, that is, λ₁ = 0 is simple (not repeated), if and only if the digraph has a spanning tree.

Theorem 3.2 states a necessary and sufficient condition for network connectivity of multiagent systems, which we may refer to as the network controllability condition without explicitly considering the dynamics of individual agents. In other words, this is the least required connectivity condition for producing certain global group behaviors in a network of distributed multiagent systems. For a digraph, if it is strongly connected, there always exists at least one spanning tree. Figure 3.4 shows such an example. For the digraph in Fig. 3.4a, a spanning tree can be identified as shown in Fig. 3.4b. For an undirected graph, the notions of the existence of a spanning tree, of being strongly connected, and of being connected are all equivalent. For a digraph with a spanning tree, since rank{L} = n − 1 by Theorem 3.2 and Re{λᵢ} ≥ 0 by Theorem 3.1, it can be further shown that for any given positive semidefinite symmetric matrix M := diag{0, M̄}, where M̄ ∈ ℝ^{(n−1)×(n−1)}, there exists a positive definite symmetric matrix P that satisfies the following matrix equation

PL + LᵀP = M

(3.3)

Fig. 3.3 Location of eigenvalues of L

Fig. 3.4 A spanning tree in a digraph

The matrix P in (3.3) can be constructively obtained. Let

P = ∫₀^∞ e^{−Lᵀt} M e^{−Lt} dt        (3.4)

Noting (3.18) and (3.19), we know that the integrand in (3.4) is a sum of terms of the form t^{k−1} e^{−λᵢt}. Therefore, the integral exists. It is easy to see that P is symmetric and positive definite. To this end, substituting (3.4) into (3.3) yields

PL + LᵀP = ∫₀^∞ e^{−Lᵀt} M e^{−Lt} L dt + ∫₀^∞ Lᵀ e^{−Lᵀt} M e^{−Lt} dt
         = −∫₀^∞ (d/dt)[e^{−Lᵀt} M e^{−Lt}] dt = −e^{−Lᵀt} M e^{−Lt} |₀^∞ = M        (3.5)

Remark 3.2 For an undirected graph, the matrix P satisfying (3.3) can simply be an identity matrix with M = 2L. ♦

Remark 3.3 For a strongly connected digraph, the matrix P satisfying (3.3) can be constructed using the left eigenvector w₁ = [w₁₁ w₁₂ … w₁ₙ]ᵀ for the eigenvalue λ₁ = 0. Let P = diag{w₁}. Noting that PL1 = 0 and 1ᵀPL = w₁ᵀL = 0, we have PL = LᵀP, that is, PL is symmetric and positive semidefinite. Thus Eq. (3.3) is satisfied with M = 2PL. ♦

Now let us consider the case of adding a nonnegative diagonal matrix to L. Define B = diag{bᵢ} ∈ ℝ^{n×n} with bᵢ ≥ 0 and at least one bᵢ > 0. We have the following theorem.

Theorem 3.3 Assume that L is the Laplacian matrix of a strongly connected digraph. Then all eigenvalues of the matrix L + B have positive real parts.

Proof Note that L = D − A. Let us define an augmented Laplacian matrix as follows:

L̂ = [0  0  …  0  0;
     −b₁  b₁ + Σ_{j=1}^{n} a_{1j}  −a_{12}  …  −a_{1n};
     −b₂  −a_{21}  b₂ + Σ_{j=1}^{n} a_{2j}  …  −a_{2n};
     ⋮
     −bₙ  −a_{n1}  …  −a_{n(n−1)}  bₙ + Σ_{j=1}^{n} a_{nj}]        (3.6)


Clearly, L̂ can be treated as the Laplacian matrix of an augmented graph obtained by adding a so-called leading agent (node 0) to the original n-node graph, with bᵢ representing the path between the leading node and node i. That is, if bᵢ > 0, there is a link e_{0i} and node i is receiving information from node 0. Since at least one bᵢ > 0, and noting that the original graph is strongly connected, we know that the augmented graph has at least one spanning tree, which implies that L̂ has rank n. To this end, note that L̂ is in the lower block-triangular form and can be rewritten as

L̂ = [0  0_{1×n}; −B1  L + B]        (3.7)

Since the zero eigenvalue of L̂ is simple and is associated with the first diagonal block, we know that the matrix L + B must be nonsingular, and together with the Gersgorin circle theorem, it can be claimed that all eigenvalues of the matrix L + B have positive real parts. ∎

Remark 3.4 It follows from Theorem 3.3 that −L − B is Hurwitz. For any given positive definite symmetric matrix M, there exists a positive definite symmetric matrix P that satisfies the following Lyapunov equation

−P(L + B) − (L + B)ᵀP = −M

♦


Remark 3.5 For a connected undirected graph, we also have that all eigenvalues of the matrix L + B have positive real parts. ♦

Remark 3.6 If the digraph associated with L has a spanning tree, there is no guarantee that L + B is nonsingular for a general B. However, if the nonzero element bᵢ is associated with the root node of the spanning tree, then all eigenvalues of the matrix L + B have positive real parts. ♦

Example 3.2 Consider the digraph in Fig. 3.5. There are three nodes and an augmented node 0. It is apparent that the graph with nodes 1, 2, and 3 has a spanning tree, and node 0 is not connected to the root node 1.

Fig. 3.5 A three-node digraph with a spanning tree

Fig. 3.6 A three-node strongly connected digraph

The Laplacian matrix L and the matrix B are

L = [0 0 0; −1 1 0; 0 −1 1],  B = [0 0 0; 0 1 0; 0 0 0],  L + B = [0 0 0; −1 2 0; 0 −1 1]

The matrix L + B has the eigenvalues 0, 1, and 2. It is singular. Now consider the digraph in Fig. 3.6, which is strongly connected.

L = [1 0 −1; −1 1 0; 0 −1 1],  B = [0 0 0; 0 1 0; 0 0 0],  L + B = [1 0 −1; −1 2 0; 0 −1 1]

The matrix L + B has the eigenvalues 0.25, 1.88 + 0.75j, and 1.88 − 0.75j. It is nonsingular. ♦
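A short numerical check of Example 3.2 (ours, assuming numpy) confirms the two eigenvalue patterns above.

```python
# Eigenvalue check for Example 3.2 (assumes numpy).
import numpy as np

B = np.diag([0., 1., 0.])

# Fig. 3.5: spanning tree rooted at node 1, leading node attached to node 2
L_tree = np.array([[0., 0., 0.], [-1., 1., 0.], [0., -1., 1.]])
print(np.round(np.linalg.eigvals(L_tree + B), 3))   # 0, 1, 2  -> singular

# Fig. 3.6: strongly connected digraph
L_sc = np.array([[1., 0., -1.], [-1., 1., 0.], [0., -1., 1.]])
print(np.round(np.linalg.eigvals(L_sc + B), 3))     # ~0.25, 1.88 +/- 0.75j -> nonsingular
```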

3.1.2 Matrix Representation of Sensing/Communication Network

In this section, we present a direct matrix method to represent the sensing/communication topologies among agents, which may lead to a natural way to deal with multiagent systems with high-order dynamics. Consider a network of n agents. Let us define the following sensing/communication matrix

S(t) = [s₁₁  s₁₂(t)  …  s₁ₙ(t);  s₂₁(t)  s₂₂  …  s₂ₙ(t);  ⋮;  sₙ₁(t)  sₙ₂(t)  …  sₙₙ]        (3.8)


Fig. 3.7 A network of four agents

with S(t) = S(t_k), ∀t ∈ [t_k, t_{k+1}), k = 0, 1, …, where s_ii ≡ 1; s_ij(t) = 1 if the jth agent is in the sensing/communication range of the ith agent at time t, and s_ij(t) = 0 otherwise. An example is given in Fig. 3.7. If there is a link from agent j to agent i, then s_ij = 1. The corresponding sensing/communication matrix is

S = [1 1 0 1; 0 1 0 0; 1 0 1 0; 0 0 1 1]

(3.9)

Remark 3.7 A connection can be established between the matrix S and the adjacency matrix A. For the graph in Fig. 3.7, we have A = S − I if a binary matrix is considered. ♦

Given the matrix sequence S(t_k), the notion of sequential completeness was introduced in [3, 4] to describe the least required condition on network connectivity for multiagent systems. A matrix E is a nonnegative matrix (positive matrix) if all its elements satisfy e_ij ≥ 0 (e_ij > 0). A matrix E is a reducible matrix if there exists a permutation matrix P such that

PᵀEP = [F₁₁  0;  F₂₁  F₂₂]

where F_ii are square and irreducible. A matrix E is irreducible if it is not reducible. A reducible matrix E can be put into the lower-triangular form. For an n × n irreducible nonnegative matrix E, the matrix (I_{n×n} + E)^{n−1} becomes a positive matrix.


Fig. 3.8 An irreducible network

Example 3.3 Consider the network in Fig. 3.8. The corresponding matrix S is

S = [1 1 0 0; 0 1 1 0; 0 0 1 1; 1 0 0 1]

which is irreducible. ♦

Example 3.4 Consider the network in Fig. 3.9. The corresponding matrix S is in the lower block-triangular form

S = [S₁₁  0  0;  S₂₁  S₂₂  0;  S₃₁  S₃₂  S₃₃]

which is reducible. The sub-blocks S₂₁, S₃₁, and S₃₂ could be nonzero. ♦

Definition 3.1 The sensing/communication matrix sequence {S(t)} is said to be sequentially lower-triangularly complete if it is sequentially lower triangular and, in every block row i of its lower-triangular canonical form, there is at least one j < i such that the corresponding block S_ij(t) is uniformly nonvanishing.

Definition 3.2 The sensing/communication matrix sequence {S(t)} is said to be sequentially complete if the sequence contains an infinite subsequence that is sequentially lower-triangularly complete.


Fig. 3.9 A reducible network

Remark 3.8 It can be seen that an irreducible sensing/communication matrix S(t) corresponds to a strongly connected digraph. For a reducible sensing/communication matrix sequence {S(t)} which is sequentially lower-triangularly complete, the corresponding graph has a spanning tree. ♦

Remark 3.9 The sequential completeness condition can be extended to include the case of the union of a finite number of consecutive sensing/communication matrices S(t). That is, a sensing/communication matrix sequence {S(t_k)}, k = 0, 1, …, is sequentially complete if there exists an infinite subsequence {S̄(t_k)} that is sequentially complete, where S̄(t_k) = ⋃_{i = Σ_{j=0}^{k−1} l_j}^{Σ_{j=0}^{k} l_j} S(t_i), k = 0, 1, …, the l_j are integers, and l₀ = 0. ♦

Example 3.5 Consider that the sensing/communication topologies for three agents are changing according to the sequence {S(t_k), k ∈ ℵ, ℵ = {1, 2, …}} defined below: S(t_k) = S₁ for k = 4η, S(t_k) = S₂ for k = 4η + 1, S(t_k) = S₃ for k = 4η + 2, and S(t_k) = S₄ for k = 4η + 3, where η ∈ ℵ,

S₁ = [1 0 0; 1 1 0; 0 0 1],  S₂ = [1 1 0; 0 1 0; 0 0 1],  S₃ = [1 0 0; 0 1 0; 1 0 1],  and  S₄ = [1 0 0; 0 1 0; 0 0 1].

The bitwise union of Sᵢ, i = 1, …, 4 is

⋃ᵢ Sᵢ = [1 1 0; 1 1 0; 1 0 1] = [S_{Λ,11}  ∅;  S_{Λ,21}  1]

(3.10)


It then follows from the structure of ⋃ᵢ Sᵢ that the corresponding sequence is lower-triangularly complete, and therefore the switching sensing/communication topology defined by (3.10) is sequentially complete. ♦
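The bitwise union in Example 3.5 can be computed directly, as in the short sketch below (ours, assuming numpy).

```python
# Element-wise OR (bitwise union) of the switching topologies in Example 3.5 (assumes numpy).
import numpy as np

S1 = np.array([[1,0,0],[1,1,0],[0,0,1]])
S2 = np.array([[1,1,0],[0,1,0],[0,0,1]])
S3 = np.array([[1,0,0],[0,1,0],[1,0,1]])
S4 = np.eye(3, dtype=int)

union = S1 | S2 | S3 | S4
print(union)
# [[1 1 0]
#  [1 1 0]
#  [1 0 1]]
# Agents 1 and 2 exchange information over the union, and agent 3 receives from
# agent 1, so the union has the lower-triangularly complete structure noted above.
```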

3.2 Basic Consensus Algorithm: Continuous-Time Case

Fundamentally, the study of emergent behaviors in multiagent systems boils down to the study of the consensus problem (or agreement problem) among agents.

Problem 3.1 (Consensus problem) Consider multiagent systems with the following system dynamics (agent i)

ẋᵢ = fᵢ(uᵢ, xᵢ),  i = 1, …, n

(3.11)

where xᵢ is the state vector, uᵢ is the input vector, and fᵢ is a smooth (or Lipschitz continuous) nonlinear function denoting the agent dynamics. Assume that there exists a sensing/communication graph G among agents. The connectivity topology of the graph may be time-varying. The general consensus control problem is to find uᵢ = Uᵢ(xᵢ, xⱼ), xⱼ ∈ Nᵢ, where Nᵢ is the neighboring set of agent i, such that all agents achieve state consensus in the sense of

lim_{t→∞} ‖xᵢ(t) − xⱼ(t)‖ = 0, ∀i, j        (3.12)

In this and the next three sections, we present a number of basic consensus algorithms for agents with linear dynamics in both the continuous-time and discrete-time cases. These lay down the foundation for dealing with more complicated multiagent systems. The simplest multiagent system is the linear system with the single-integrator model

ẋᵢ = uᵢ,  i = 1, …, n

(3.13)

It is useful to start with the first-order linear system (3.13) because nonlinear systems in (3.11) may be converted into (3.13) through feedback linearization. For some inherently nonlinear systems which are not feedback linearizable (such as mobile robots with kinematic constraints), we will present a general nonlinear consensus control algorithm in Chap. 5. Without loss of generality, we assume that (3.13) is a single-input single-output (SISO) system, that is, xᵢ ∈ ℝ and uᵢ ∈ ℝ. For multiple-input multiple-output (MIMO) systems like xᵢ ∈ ℝᵐ and uᵢ ∈ ℝᵐ, the extension is straightforward. For the multiagent system consisting of (3.13), we use a weighted digraph (V, E, W) to represent its sensing/communication topology. Let A = [a_ij] ∈ ℝ^{n×n} be the adjacency


matrix, and Nᵢ be the neighboring set of agent i. The basic consensus algorithm is of the following form

uᵢ(t) = Σ_{j∈Nᵢ} a_ij (xⱼ(t) − xᵢ(t))        (3.14)

Applying (3.14) to (3.13) leads to the overall closed-loop multiagent system dynamics x˙ = −Lx

(3.15)

where x = [x₁, …, xₙ]ᵀ is the stacked state vector for all agents in the group.

Theorem 3.4 Consider the multiagent system (3.13) under the consensus control (3.14). Assume the sensing/communication topology G = (V, E, W) among agents is fixed. The consensus problem in Problem 3.1 is solved if and only if the graph G has a spanning tree.

Proof According to Theorem 3.2, the existence of a spanning tree in G is equivalent to L having rank n − 1, with the eigenvalue λ₁ = 0 simple and Re{λᵢ} > 0, i = 2, …, m. Assume that eigenvalue λᵢ has algebraic multiplicity nᵢ, with Σ_{i=2}^{m} nᵢ = n − 1. It suffices to show the convergence by the closed-loop system solution. It follows from (3.15) that

x(t) = e^{−Lt} x(0)

(3.16)

For L ∈ ℝ^{n×n}, there exists a similarity transformation Q such that L is transformed into the Jordan form

Q⁻¹LQ = diag{0, J_{n₂}(λ₂), …, J_{n_m}(λ_m)} := L̂        (3.17)

where J_{nᵢ}(λᵢ) is an nᵢ × nᵢ Jordan block for eigenvalue λᵢ. It follows from (3.17) that

e^{−Lt} = Q diag{1, e^{J_{n₂}(−λ₂)t}, …, e^{J_{n_m}(−λ_m)t}} Q⁻¹        (3.18)

Following a similar procedure as that in Example 2.2, it can be shown that e^{J_{nᵢ}(−λᵢ)t} is of the form




e^{J_{nᵢ}(−λᵢ)t} = [e^{−λᵢt}  t e^{−λᵢt}  (t²/2!) e^{−λᵢt}  …  (t^{nᵢ−1}/(nᵢ−1)!) e^{−λᵢt};
                  0  e^{−λᵢt}  t e^{−λᵢt}  …  (t^{nᵢ−2}/(nᵢ−2)!) e^{−λᵢt};
                  0  0  e^{−λᵢt}  …  (t^{nᵢ−3}/(nᵢ−3)!) e^{−λᵢt};
                  ⋮  ⋮  ⋮  ⋱  ⋮;
                  0  …  0  0  e^{−λᵢt}]

(3.19)

It follows from (3.18) and (3.19) that e^{−Lt} is stable, and

lim_{t→∞} e^{−Lt} = v₁w₁ᵀ        (3.20)

where v₁ ∈ ℝⁿ is the first column vector of Q, and w₁ᵀ ∈ ℝ^{1×n} is the first row vector of Q⁻¹. It boils down to finding v₁ and w₁. It follows from LQ = QL̂ that (λ₁ = 0)

λ₁v₁ = Lv₁ ⇒ (λ₁I − L)v₁ = 0

That is, v₁ is the right eigenvector corresponding to the eigenvalue λ₁ = 0. Apparently, v₁ = c1 = [c c … c]ᵀ for any nonzero constant c. On the other hand, it follows from Q⁻¹L = L̂Q⁻¹ that

w₁ᵀL = λ₁w₁ᵀ ⇒ w₁ᵀ(λ₁I − L) = 0

That is, w₁ = [w₁₁ w₁₂ … w₁ₙ]ᵀ is the left eigenvector corresponding to the eigenvalue λ₁ = 0. To this end, we have from (3.16) and (3.20) that

lim_{t→∞} x(t) = v₁w₁ᵀx(0) = c1 Σ_{i=1}^{n} w₁ᵢxᵢ(0)        (3.21)

which is equivalent to

lim_{t→∞} xᵢ(t) = c Σ_{i=1}^{n} w₁ᵢxᵢ(0), ∀i        (3.22)

This completes the proof. ∎


Remark 3.10 In (3.22), the constant c can be explicitly found. It follows from QQ⁻¹ = I that w₁ᵀv₁ = 1 and c = 1/Σ_{i=1}^{n} w₁ᵢ. That is, Theorem 3.4 solves the weighted average consensus problem. ♦


Remark 3.11 The proof of Theorem 3.4 can also be done using Lyapunov's direct method [5]. According to (3.3), the Lyapunov function V(x) = (Lx)ᵀP(Lx) can be constructed using the positive definite symmetric matrix P satisfying PL + LᵀP = M for a positive semidefinite symmetric matrix M. It follows that

V̇ = −(Lx)ᵀM(Lx) ≤ 0

The invariant set is {x | Lx = 0}. From LaSalle's invariant set theorem [6], it can be shown that x converges to the null space of L, which is the space spanned by the eigenvector v₁. ♦

(3.23) (3.24)

m=0

where e−Lm (tm+1 −tm ) has the structure similar to that in (3.18). Thus, we have k−1  m=0

e−Lm (tm+1 −tm ) =

k−1    v1 (m)w1T (m) + Ξ (m) m=0

(3.25)

3 Interaction Topologies of Multiagent Systems . . .

58

where v1 (m) is the right eigenvector corresponding to eigenvalue 0 of Lm , w1 (m) is the left eigenvector corresponding to eigenvalue 0 of Lm , and Ξ (m) is the matrix due to rest eigenvalues λi of Lm and its elements are in the form of (tm − tm−1 )l e−λi (tm −tm−1 ) . It then from (3.25) that k−1 

e−Lm (tm+1 −tm ) =

m=0

k−1 

v1 (m)w1T (m) + Θ

(3.26)

m=0

where v1 (m) and w1 (m) are the corresponding right and left eigenvectors for matrix Lm , and Θ is a matrix consisting of terms like (tm − tm−1 )l e−λi (tm −tm−1 ) with all zero eigenvalues from matrices −Lm , m = 0, . . . , k − 1. It can be shown that all elements in Θ are bounded by an exponential function like k l e−λk with λ be the smallest eigenvalue with positive real part of all matrices Lm . To this end, we summarize that the consensus is achieved with k−1  v1 (m)w1T (m)x(0) lim x(t) = lim t→∞

k→∞

m=0

It should be noted that the final consensus value depends on the combination of the switching sensing/communication topologies. Example 3.6 Consider a network of five agents. The sensing/communication topologies switch according to graphs in Fig. 3.10. The corresponding adjacency matrices are ⎡

0 ⎢1 ⎢ A1 = ⎢ ⎢0 ⎣0 1

1 0 0 0 1

0 1 0 0 0

0 0 1 0 0

⎤ ⎡ 0 01 ⎢1 0 0⎥ ⎥ ⎢ ⎢ 0⎥ ⎥ , A2 = ⎢0 1 ⎣0 0 1⎦ 0 00

0 1 0 1 0

0 0 1 0 1

⎤ ⎡ 0 00 ⎢1 0 0⎥ ⎥ ⎢ ⎢ 0⎥ ⎥ , A3 = ⎢0 1 ⎣0 0 1⎦ 0 00

0 0 0 1 0

0 0 0 0 1

⎤ 0 0⎥ ⎥ 0⎥ ⎥ 0⎦ 0

It is easy to see that A1 is strongly connected, A2 is undirected and connected, and A3 has a spanning tree. Corresponding to the zero eigenvalue of the Laplacian matrices, the left eigenvectors for L1 , L2 and L3 are ⎡

w1,L1

⎤ ⎡ ⎤ ⎡ ⎤ 0.3 0.2 1 ⎢0.2⎥ ⎢0.2⎥ ⎢0⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎥ =⎢ ⎢0.2⎥ , w1,L2 = ⎢0.2⎥ , w1,L3 = ⎢0⎥ ⎣0.2⎦ ⎣0.2⎦ ⎣0⎦ 0.1 0.2 0

 T Let the initial state vector for agents be x(0) = 3 2 6 −2 −5 . We first consider the case with a fixed topology. It follows from Theorem 3.4 that if agents maintain a fixed topology A1 , the consensus value is

T w1,L x(0) 1

T w1,L 1 1

= 1.6; if agents maintain a

3.2 Basic Consensus Algorithm: Continuous-Time Case

59

Fig. 3.10 Switching sensing/communication topologies

fixed topology A2 , the consensus value is

T w1,L x(0) 2

T w1,L 1

= 0.8; if agents maintain a fixed

2

topology A3 , the consensus value is 3. The simulation results are shown in Figs. 3.11, 3.12, and 3.13, respectively. If the sensing/ communication topology switches according to the sequence {A1 , A2 , A3 , A1 , A2 , A3 , . . .}, Fig. 3.14 shows the convergence to 2.21. The basic consensus algorithm in Theorem 3.4 can be extended to produce more general multiagent behaviors. For example, to formulate some desired geometric formation patterns for the group of agents, we simply use a state transformation yi = xi − ri to model (3.13 ), where ri is a parameter specifying the desired formation. To this end, the control  ai j (x j (t) − xi (t) − r j + ri ) u i (t) = j∈Ni

is able to solve the formation control problem. Remark 3.16 The weights ai j in (3.14) normally have the impact on both the convergence speed and the final consensus value. In general, large values of ai j will render a fast convergence due to employment of a high gain control. In most cases, ai j will result in a different left eigenvector w1 and accordingly the changed final consensus value. However, for undirected graphs, there will be no change for the final consensus value because its left eigenvector is always of the form w1 = c1. ♦

3 Interaction Topologies of Multiagent Systems . . .

60

Fig. 3.11 Consensus under the sensing/communication topology A1

3.3 Basic Consensus Algorithm: Discrete-Time Case Consider the discrete-time system with the single integrator model as follows xi (k + 1) = xi (k) + T u i (k), i = 1, . . . , n

(3.27)

where k = 0, 1, . . . , are discrete-time instants, T > 0 is the step size (sampling period), xi (k) ∈  is the state of agent i, and u i (k) ∈  is the input for agent i. The model in (3.27) can be directly obtained from (3.13) using a standard discretization method with a sampling period T . Let the control u i (k) be u i (k) =



ai j (x j (k) − xi (k))

(3.28)

j∈Ni

Applying (3.28) to (3.27) leads to the overall closed-loop multiagent system dynamics x(k + 1) = x(k) − T Lx(k) = Φx(k)

(3.29)

3.3 Basic Consensus Algorithm: Discrete-Time Case

61

Fig. 3.12 Consensus under the sensing/communication topology A2

T  where x = x1 , . . . , xn is the stacked state vector for all agents in the group, and Φ := I − T L. The dynamic behavior of (3.29) relies on the eigenvalues of Φ. It follows that Φ has an eigenvalue λ1 = 1 and the corresponding right eigenvector is v1 = c1 for any nonzero constant c. To ensure that (3.29) is internally stable, the rest eigenvalues of Φ must lie inside the unit circle. Based on Gershigorin circle theorem, we know that the eigenvalues of L are located within a circle centered at the coordinate (dmax , 0) with a radius dmax . Accordingly, the eigenvalues of −T L are inside a circle {λ ∈ C||λ − T dmax | ≤ T dmax } Thus we have that the disk covering all the eigenvalues of Φ is {λ ∈ C||λ − (1 − T dmax )| ≤ T dmax } which should be covered by the unit circle in order to obtain a stable matrix Φ. To this end, we know that T has to be carefully selected to render a stable Φ. It follows that

62

3 Interaction Topologies of Multiagent Systems . . .

Fig. 3.13 Consensus under the sensing/communication topology A3

1 − 2T dmax > −1 and thus T
0 is the control gain to be designed. It follows that the closed-loop dynamics for agent i become x˙i1 = z i +



ai j (x j1 − xi1 )

(3.44)

j∈Ni

z˙ i = −κi z i

(3.45)

and the overall closed-loop multiagent system dynamics X˙ = Z − LX Z˙ = −ΛZ

(3.46) (3.47)

 T T  where X := x11 x21 . . . xn1 , Z := z 1 z 2 . . . z n , and Λ = diag{κ1 , . . . , κn }. The solutions to (3.46) and (3.47) are Z(t) = e−Λt Z(0)

(3.48)

X (t) = e−Lt X (0) +

t

e−L(t−τ ) Z(τ )dτ

(3.49)

0

It then follows from (3.48) and (3.49) that X (t) = e

−Lt

−Lt

t

X (0) + e

eLτ e−Λτ Z(0)dτ

(3.50)

0

 To this end, if κi is chosen such that κi > di , where di = j∈Ni ai j , then we know t that the term 0 eLτ e−Λτ Z(0)dτ will converge to some constant vector as t goes to infinity. Following a similar argument as that in theorem 3.4, we have lim X (t) = v1 w1T X (0) + v1 w1T Zˆss t→ ! " 1 = n 1w1T X (0) + Zˆss i=1 w1i

(3.51)

3 Interaction Topologies of Multiagent Systems . . .

68

where Zˆss = limt→∞

t

eLτ e−Λτ Z(0)dτ . The result is summarized into the following

0

theorem. Theorem 3.6 Consider the multiagent system (3.40) under the consensus control (3.43). Assume the sensing/communication topology G = (V, E, W) among agents is fixed. The control gain κi is chosen such that κi > di , where di = j∈Ni ai j . The output consensus can be achieved if and only if the graph G has a spanning tree. Proof The proof directly follows the aforementioned derivations.



Remark 3.22 If the sensing/communication topologies are time-varying while each topology has a spanning tree, it can be shown that the consensus can still be reached following a similar argument at that in Sect. 3.2 ♦ Example 3.7 Consider a network of five agents with the double-integrator dynamics. Again, assume that the sensing/communication topologies switch according to graphs  T in Fig. 3.10. The initial state vector for agents is x(0)= 3 0 2 0 6 0 −2 0 −5 0 . We first consider the case with a fixed topology. Figures 3.15 and 3.16 show the consensus under the sensing/communication topology A1 ; Figs. 3.17 and 3.18 show the consensus under the sensing/communication topology A2 ; Figs. 3.19 and 3.20 show the consensus under the sensing/communication topology A3 ; and Figs. 3.21 and 3.22 show the consensus under the switching sensing/communication topologies A1 , A2 and A3 . The proposed cooperative backstepping control in Theorem 3.6 can be extended to deal with the following chain of q integrators x˙i1 = xi2 , x˙i2 = xi3 , .. .

(3.52)

x˙iq = u i yi = xi1

The design is done in a step-by-step fashion following the chain of integrators with the use of state transformations z il = xi(l+1) − αil , l = 1, . . . , n − 1, and αil is the virtual control input to the lth integrator in the chain. To illustrate the idea, let us consider the case of q = 3 for ease of presentation. It follows that z i1 = xi2 − αi1 , z i2 = xi3 − αi2 , and let

3.4 Consensus Algorithm for High-Order Linear Systems

Fig. 3.15 Consensus under the sensing/communication topology A1

69

3 Interaction Topologies of Multiagent Systems . . .

70

Fig. 3.16 Control inputs under the sensing/communication topology A1

αi1 =



ai j (x j1 − xi1 )

(3.53)

j∈Ni

αi2 = −κi1 z i1 + α˙ i1 = −κi1 (xi3 − α˙ i1 ) + α˙ i1

(3.54)

u i = −κi2 z i2 + α˙ i2 = −κi2 xi3 + κi2 αi2 + α˙ i2 = −κi2 xi3 − κi2 κi1 (xi2 − αi1 ) + κi2 α˙ i1 − κi1 z˙ i1 + α¨ i1 = −κi2 xi3 − κi2 κi1 (xi2 − αi1 ) + κi2 α˙ i1 − κi1 xi3 + κi1 α˙ i1 + α¨ i1 = −(κi1 + κi2 )xi3 − κi2 κi1 xi2 + κi2 κi1 αi1 + (κi1 + κi1 )α˙ i1 + α¨ i1  = −(κi1 + κi2 )xi3 − κi2 κi1 xi2 + κi2 κi1 ai j (x j1 − xi1 ) +(κi1 + κi1 )

 j∈Ni

j∈Ni

ai j (x j2 − xi2 ) +



ai j (x j3 − xi3 )

j∈Ni

which result in the following closed-loop dynamics for agent i

(3.55)

3.4 Consensus Algorithm for High-Order Linear Systems

Fig. 3.17 Consensus under the sensing/communication topology A2

71

72

3 Interaction Topologies of Multiagent Systems . . .

Fig. 3.18 Control inputs under the sensing/communication topology A2

z˙ i2 = −ki2 z i2 z˙ i1 = −ki1 z i1 + z i2  ai j (x j1 − xi1 ) x˙i1 = z i1 +

(3.56) (3.57) (3.58)

j∈Ni

where ki2 ≥ ki1 > di are control gains. To this end, the overall closed-loop system dynamics for all agents become Z˙2 = −K2 Z2 Z˙1 = −K1 Z1 + Z2 X˙1 = −LX1 + Z1

(3.59) (3.60) (3.61)

T T T    where Z2 = z 12 z 22 . . . z n2 , Z1 = z 11 z 21 . . . z n1 , X1 = x11 x21 . . . xn1 , K1 = diag{κ11 , κ21 , . . . , κn1 }, and K2 = diag{κ12 , κ22 , . . . , κn2 }. Note the exponential convergence of Z1 and Z2 , the consensus of X1 can be established following a similar argument as that in Theorem 3.6. The proposed cooperative backstepping control design method in Theorem 3.6 can be applied to the discrete-time system as well. Consider the following double-

3.4 Consensus Algorithm for High-Order Linear Systems

Fig. 3.19 Consensus under the sensing/communication topology A3

73

3 Interaction Topologies of Multiagent Systems . . .

74

Fig. 3.20 Control inputs under the sensing/communication topology A3

integrator model in the discrete-time domain xi1 (k + 1) = xi1 (k) + xi2 (k) xi2 (k + 1) = xi2 (k) + u i (k)

(3.62) (3.63)

Similarly, the design starts with the state transformation z i (k) = xi2 (k) − αi (k) where αi (k) is the virtual control input defined by αi (k) = λi



(x j1 (k) − xi1 (k))

(3.64)

j∈Ni

where λi is the control gain satisfying the condition of λi < 1/dmax as that in (3.30). It follows

3.4 Consensus Algorithm for High-Order Linear Systems

Fig. 3.21 Consensus under the switching sensing/communication topologies

75

3 Interaction Topologies of Multiagent Systems . . .

76

Fig. 3.22 Control inputs under the switching sensing/communication topologies

xi1 (k + 1) = xi1 (k) + λi



(x j1 (k) − xi1 (k)) + z i (k)

(3.65)

j∈Ni

z i (k + 1) = xi2 (k) + u i (k) − αi1 (k + 1)

(3.66)

Let u i (k) be u i (k) = −xi2 (k) + i z i (k) + αi1 (k + 1)  (x j1 (k) − xi1 (k)) = −xi2 (k) + i xi2 (k) − i λi +λi



j∈Ni

(x j1 (k) − xi1 (k) + x j2 (k) − xi2 (k))

(3.67)

j∈Ni

where 0 < i < 1. Then we have z i (k + 1) = i z i (k)

(3.68)

As of result of (3.65) and (3.68), the overall closed-loop system dynamics for all agents become

3.4 Consensus Algorithm for High-Order Linear Systems

X1 (k + 1) = ΦX1 (k) + Z(k) Z(k + 1) = ΛZ(k)

77

(3.69) (3.70)

T T   where X1 = xi1 , . . . , xn1 , Z = z 1 , . . . , z n , Φ = I − diag{λ1 , . . . , λn }L, and Λ = diag{1 , . . . , n }. Proceeding forward, we readily obtain, for k > 0, X1 (k) = Φ X1 (0) + k

k−1 

Φ k−1−m Z(m)

(3.71)

m=0

Z(k) = Λk Z(0)

(3.72)

Note the exponential convergence of Z(k) and with a sufficiently small choice of  −m Φ Z(m) will converge to a constant vector. Then i , we know that the term k−1 m=0 following a similar argument as that in Theorem 3.5, the consensus of X1 can be concluded. Example 3.8 Let us revisit the multiagent system in Example 3.7 but considering the discrete-time dynamics. The same initial value setting and sensing / communication topologies in A1 , A2 , A are used. In the use of control law (3.67), i = 0.1, λi = 0.01. Simulation results under different sensing/communication topologies A1 , A2 , A3 as well as the time-varying sequences {A1 , A2 , A3 } are shown in Figs. 3.23, 3.24, 3.25, 3.26, 3.27, 3.28, 3.29 and 3.30. As expected, state consensus is achieved for all cases. ♦

3.4.2 Cooperative Output Feedback Control In this subsection, we present the cooperative consensus control algorithm using output feedback. Consider again the double-integrator model in (3.40). The objective is to design cooperative consensus control in the case of not all state elements being available for feedback. We assume that for each agent i, only the output yi is measured and xi2 has to be obtained using a state estimator. For each agent in (3.40), a simple full-order estimator is of the following form x˙ˆi = A xˆi + Bu i + H (yi − C xˆi )

(3.73)

T T   where xˆi = xˆi1 xˆi2 is the estimate of xi = xi1 xi2 , 

     01 0 A= , B= , C= 10 00 1   and H = h 1 h 2 is the estimator gain to be chosen based on the standard eigenvalue assignment method. That is, by specifying the desired location for the estimator

78

3 Interaction Topologies of Multiagent Systems . . .

Fig. 3.23 The consensus under the sensing/communication topology A1

3.4 Consensus Algorithm for High-Order Linear Systems

79

Fig. 3.24 Control inputs under the sensing/communication topology A1

error poles p1∗ and p2∗ , H can be found by matching the characteristic equation of the estimator error dynamics, det(s I − A + H C) = 0, with the desired estimator characteristic equation, (s − p1∗ )(s − p2∗ ) = 0. To this end, the cooperative backstepping control u i in (3.43) becomes u i = −κi xˆi2 + κi



ai j (x j1 − xi1 ) +

j∈Ni



ai j (xˆ j2 − xˆi2 )

(3.74)

j∈Ni

The control u i in (3.74) can be rewritten as u i = −κi xi2 + κi x˜i2 + κi +





ai j (x j1 − xi1 )

j∈Ni

ai j (x j2 − xi2 ) −

j∈Ni



ai j (x˜ j2 − x˜i2 )

(3.75)

j∈Ni

where x˜i2 = xi2 − xˆi2 . Substituting (3.75) into (3.42) leads to z˙ i = −κi z i + κi x˜i2 −

 j∈Ni

ai j (x˜ j2 − x˜i2 )

(3.76)

80

3 Interaction Topologies of Multiagent Systems . . .

Fig. 3.25 The consensus under the sensing/communication topology A2

3.4 Consensus Algorithm for High-Order Linear Systems

81

Fig. 3.26 Control inputs under the sensing/communication topology A2

It follows that the overall closed-loop multiagent system dynamics are X˙1 = Z − LX1 Z˙ = −ΛZ + ΛX˜2 + LX˜2 X˙˜ = diag{A − H C, . . . , A − H C}X˜

(3.77) (3.78) (3.79)

  T T where X1 := x11 x21 . . . xn1 , Z := z 1 z 2 . . . z n , Λ = diag{κ1 , . . . , κn }, X˜2 =  T T  x˜11 x˜22 . . . x˜n2 , and X˜ = x˜1T x˜2T . . . x˜nT . To this end, it is not difficult to conclude the exponential convergence of X˜2 and Z, and the consensus of X1 . Example 3.9 Consider again the multiagent system in Example 3.7 but using the cooperative output feedback control in (3.74). The same initial value setting is used, and assume the sensing/communication topology is given by the adjacency matrix A1 . Simulation results are shown in Figs. 3.31, 3.32 and 3.33. For the discrete-time system in (3.63), the cooperative output feedback control can be designed similarly. A deadbeat estimator may be used to have the property that the estimator error goes to zero in finite time. The deadbeat estimator is of the form

82

3 Interaction Topologies of Multiagent Systems . . .

Fig. 3.27 The consensus under the sensing/communication topology A3

3.4 Consensus Algorithm for High-Order Linear Systems

83

Fig. 3.28 Control inputs under the sensing/communication topology A3

xˆi1 (k + 1) = xˆi1 (k) + xˆi2 (k) + h 1 (xi1 (k) − xˆi1 (k)) xˆi2 (k + 1) = xˆi2 (k) + u i (k) + h 2 (xi1 (k) − xˆi1 (k))

(3.80) (3.81)

where h 1 and h 2 are the estimator gains to be chosen such that the characteristic equation of the estimator error dynamics match with the desired characteristic equation given by p2 = 0

(3.82)

It follows from (3.63), (3.80), and (3.81) that x˜i1 (k + 1) = (1 − h 1 )x˜i1 (k) + x˜i2 (k)

(3.83)

x˜i2 (k + 1) = −h 2 x˜i1 (k) + x˜i2 (k)

(3.84)

which renders the characteristic equation p 2 + (h 1 − 2) p + 1 − h 1 + h 2 = 0 Matching the coefficients of (3.82) and (3.85), we obtain

(3.85)

84

3 Interaction Topologies of Multiagent Systems . . .

Fig. 3.29 The consensus under the switching sensing/communication topologies

3.4 Consensus Algorithm for High-Order Linear Systems

85

Fig. 3.30 Control inputs under the switching sensing/communication topologies

h 1 = 2, h 2 = 1

(3.86)

To this end, together with the deadbeat estimator in (3.80) and (3.81), the cooperative control u i (k) in (3.67) becomes u i (k) = −xˆi2 (k) + i xˆi2 (k) − i λi +λi

 j∈Ni



(x j1 (k) − xi1 (k))

j∈Ni

(x j1 (k) − xi1 (k) + xˆ j2 (k) − xˆi2 (k))

(3.87)

86

3 Interaction Topologies of Multiagent Systems . . .

Fig. 3.31 The consensus under the sensing/communication topology A1

3.4 Consensus Algorithm for High-Order Linear Systems

87

Fig. 3.32 Control inputs under the sensing/communication topology A1

3.4.3 Cooperative Control for a Class of Linear Systems in a Canonical Form Consider the multiagent system with the following linear dynamics x˙i1 = −xi1 + xi2 x˙i2 = −xi2 + xi3 .. .

(3.88)

x˙iq = −xiq + u i yi = xi1

T  where i = 1, . . . , n, xi = xi1 , . . . , xiq ∈ q is the state of agent i, and yi is the output. The objective is to design cooperative control such that the output consensus can be achieved. Remark 3.23 The multiagent model in (3.88) has a special structure, which makes the cooperative output feedback control design ease. For general dynamical systems, it is possible to convert them into the canonical form in (3.88) through state and input

88

3 Interaction Topologies of Multiagent Systems . . .

Fig. 3.33 The convergence of estimation errors

3.4 Consensus Algorithm for High-Order Linear Systems

89

transformations. For example, consider the double-integrator model x˙1 = x2 , x˙2 = u. Define z 1 = x1 , z 2 = x2 + x1 , and u = −2x2 − x1 + v, we have the transformed model in the form of (3.88), that is, z˙ 1 = −z 1 + z 2 , z˙ 2 = −z 2 + v It is worth pointing out that while v can be designed using the output z 1 based on the transformed model, the original control u = −x1 − 2x2 + v is still a full state feedback one. ♦ To facilitate the control design, we use S(t) defined in (3.8) to describe the sensing/communication topology among agents. Let the cooperative control be u i (t) =

n  j=1

si j n

k=1 sik

y j := G i (t)y

(3.89)

 T where y = y1 y2 . . . yn ,   si j G i := G i1 . . . G in , G i j := n

k=1 sik

and

T  G = G 1T G 2T . . . , G nT ∈ n×n

T  Define x = x1T x2T . . . xnT ∈ nq×1 , and ⎡

−1 ⎢0 ⎢ ⎢ A = ⎢ ... ⎢ ⎣0 0



⎡ ⎤ 0 ⎥ ⎢0⎥ ⎥ ⎢ ⎥   ⎥ ⎢.⎥ ⎥ ∈ q×q , B = ⎢ .. ⎥ ∈ q×1 , C = 1 0 . . . 0 ∈ 1×q ⎥ ⎢ ⎥ ⎣0⎦ 0 . . . −1 1 ⎦ 0 0 . . . −1 1

1 −1 .. .

0 1 .. .

... ... .. .

0 0 .. .

Under control (3.89), the overall closed-loop multiagent system dynamics can be written as x˙ = (A + BGC)x = (−I + E)x

(3.90)

where A = diag{A, A, . . . , A}, B = diag{B, B, . . . , B}, C = diag{C, C, . . . , C}, and ⎤ ⎡ E11 E12 . . . E1n ⎢E21 E22 . . . E2n ⎥ ⎥ ⎢ (nq)×(nq) E =⎢ . . . . ⎥∈ ⎣ .. .. . . .. ⎦ En1 En2 . . . Enn

3 Interaction Topologies of Multiagent Systems . . .

90



with Eii =

   0 I(q−1)×(q−1) 0 0 , Ei j = G ii 0 Gi j 0

Theorem 3.7 Consider the multiagent system (3.88) under the cooperative control (3.89). Assume that the sensing/communication sequence {S(tk )}, k = 0, 1, . . . , ∞ is sequentially complete. Then the output consensus of the multiagent system (3.88) can be achieved. Proof Since S(t) = S(tk ) for tk ≤ t < tk+1 , accordingly G(t) = G(tk ) and E(t) = E(tk ) for tk ≤ t < tk+1 , the solution to (3.90) can be obtained as x(tk+1 ) = e(−I+E(tk ))(tk+1 −tk ) . . . e(−I+E(t0 ))(t1 −t0 ) x(t0 )

(3.91)

To prove the consensus, it suffices to show lim

k→∞

k  η=0

Q(k) := lim Q(k)Q(k − 1) . . . Q(0) = 1c k→∞

(3.92)

where Q(k) = e(−I+E(tk ))(tk+1 −tk ) , and c ∈ 1×nq is a constant vector. Several factors are useful. First, Q(k) is a row stochastic nonnegative matrix and diagonally positive [3]. Second, for the case of Q(k)  being in the lower-triangular form, using Theorem 3.2 in [3], the convergence of k0 Q(k) can be shown. For general Q(k), it can be shown the existence of a sequentially lower-triangular complete subsequence after   permutation which renders the convergence of k0 Q(k) [3, 9]. Remark 3.24 For more general linear systems in the form of (3.88) but with multipleinput-multiple-output, that is, yi ∈ m , u i ∈ m , as well as with different relative degree for each agent, the cooperative control design and its convergence analysis has been done in [3]. Extension was also made to deal with discrete-time linear systems in [10]. ♦ Example 3.10 Consider the double-integrator model x˙i1 = xi2 , x˙i2 = u i . Define z i1 = xi1 , z i2 = xi2 + xi1 , and u = −xi1 − 2xi2 + vi , we have the transformed model in the form of (3.88), that is, z˙ i1 = −z i1 + z i2 , z˙ i2 = −z i2 + vi The cooperative control u i is of the form u i (t) = −2xi2 − xi1 +

n  j=1

si j n

k=1 sik

x j1

We use a similar simulation setting as that in Example 3.7. Assume that the sensing/communication topology is given by A1 with the corresponding S is as

3.4 Consensus Algorithm for High-Order Linear Systems



follows

1 ⎢1 ⎢ S=⎢ ⎢0 ⎣0 1

1 1 0 0 1

0 1 1 0 0

0 0 1 1 0

91

⎤ 0 0⎥ ⎥ 0⎥ ⎥ 1⎦ 1

Simulation results are shown in Figs. 3.31, 3.34 and 3.35. The cooperative control design in (3.89) is based on the sensing/communication matrix S(t), while the cooperative controls in Sect. 3.2 are based on adjacency matrix A(t). To establish a connection of those two design methodologies, let us consider the agents with the single integrator model x˙i = u i . The control using S(t) is u i (t) = −xi +

n 

si j n

j=1

l=1 sil

xj

(3.93)

and the control using A(t) is u i (t) =

n 

ai j (x j − xi )

(3.94)

j=1

Note that u i in (3.93) can be rewritten as u i (t) =

n  j=1

si j n

l=1 sil

(x j − xi )

(3.95)

s

which is in the form of (3.94) with ai j = n i j sil . That is, the Laplacian matrix induced l=1 by S(t) is ⎡ ⎤ 1 − ns12 s1l · · · − ns1n s1l l=1 l=1 ⎢− ns21 1 · · · − ns2n s2l ⎥ ⎢ ⎥ l=1 s2l l=1 ⎢ ⎥ L=⎢ .. .. .. .. ⎥ . ⎣ ⎦ . . . s s 1 − n n1 snl − n n2 snl · · · l=1

l=1

which has its all eigenvalues inside a circle centered at (1,0) with radius one. Apparently, due to this limitation, the convergence speed of using (3.93) may be slower than that of using (3.94). However, the control in (3.93) provides a systematic way to handle systems with arbitrary relative degree and may also reduce the amount of the needed information exchange among agents. For instance, let us consider the following nonlinear multiagent system in the strict normal form with the relative degree r

92

3 Interaction Topologies of Multiagent Systems . . .

Fig. 3.34 The consensus under the sensing/communication topology A1

3.4 Consensus Algorithm for High-Order Linear Systems

93

Fig. 3.35 Control inputs under the sensing/communication topology A1

ξ˙i1 = ξi2 ξ˙i1 = ξi2 .. . ξ˙i(r −1) = ξir ξ˙ir = q(ξi ) + b(ξi )u i

(3.96)

T  where ξi = ξi1 · · · ξir is the state vector for agent i, q(ξi ) and b(ξi ) are smooth functions, and b(ξi ) = 0. Define the state and input transformations xi1 = ξi1 , k−1  l xik = Ck−1 ξi(l+1) , k = 2, . . . , r, l=0   r −1 r −1 1 l l u i = b(ξi ) − Cr −2 ξi(l+2) + Cr −1 ξi(l+1) − q(ξi ) + vi , l=0

l=0

n! where Cnm = m!(n−m)! . The system in (3.96) can then be transformed into the linear canonical form in (3.88) with the new input vi . To this end, the cooperative control is

3 Interaction Topologies of Multiagent Systems . . .

94

# r −1 r −1   1 ui = Crl −2 ξi(l+2) + Crl −1 ξi(l+1) − b(ξi ) l=0 l=0 ⎞ n  si j n −q(ξi ) + ξ j1 ⎠ s il l=1 j=1

(3.97)

Remark 3.25 The proposed control (3.97) can be readily extended to deal with the cooperative tracking problem of a constant reference input ξ0 . The control is of the form # r −1 r −1   1 ui = Crl −2 ξi(l+2) + Crl −1 ξi(l+1) − b(ξi ) l=0 l=0 ⎞ n  si j si0 n −q(ξi ) + ξ j1 + n ξ0 ⎠ (3.98) s s il il l=0 l=0 j=1 where si0 = 1 if agent i knows ξ0 , otherwise si0 = 0.



3.5 A Discontinuous Consensus Algorithm In this section, we present a discontinuous consensus algorithm for the multiagent systems in (3.13). The proposed consensus control is of the form u i (t) =

n 

αi j (t)sgn(x j (t) − xi (t))

(3.99)

j=1

where αi j (t) is a control gain to be designed based on the sensing/communication matrix S(t) (or the adjacency matrix A(t)) as well as the available available neighboring states, and sgn(·) function is defined as ⎧ ⎨ 1, z > 0 0, z = 0 sgn(z) = ⎩ −1, z < 0 One of the main benefits of using sgn(·) in the control law (3.99) is that the overall system convergence may be improved [11]. Moreover, the cost for sensor measurement and communication may be reduced due to the use of sgn(x j (t) − xl (t)) compared with that of (x j (t) − xl (t)). However, the control gain αi j may have to be redesigned. A direct use of S(t) (or the adjacency matrix A(t)) in the design may not be able to ensure the consensus even if the convergence condition on the

3.5 A Discontinuous Consensus Algorithm

95

sensor/communication topology is satisfied. The following example illustrates the point. Example 3.11 Consider a multiagent system with three agents x˙i = u i , i = 1, 2, 3, where xi ∈  and u i ∈ . The sensing/communication topology among agents are switching according to the following matrix sequence ⎡

⎤ ⎡ ⎤ ⎡ ⎤ 100 101 100 S(t3k ) = ⎣ 1 1 0 ⎦ , S(t3k+1 ) = ⎣ 0 1 0 ⎦ , S(t3k+2 ) = ⎣ 0 1 0 ⎦ , 001 001 011 where k = 0, 1, . . .. Apparently, the matrix sequence {S(t3k ), S(t3k+1 ), S(t3k+2 )} is sequentially complete. Consider the control in (3.99) with αi j =

3  j=1

si j si1 + si2 + si3

A counterexample can be readily constructed to show that such a control may no longer render the consensus. For example, let us assume that x1 (t0 ) = 1, x2 (t0 ) = 0, x3 (t0 ) = 0, and t3k+i − t3k+i−1 = 2, i = 1, 2, k = 0, 1, . . .. Direct computation shows that x1 (t0 ) = 1, x2 (t0 ) = 0, x3 (t0 ) = 0 x1 (t1 ) = 1, x2 (t1 ) = 1, x3 (t1 ) = 0 x1 (t2 ) = 0, x2 (t2 ) = 1, x3 (t2 ) = 0 x1 (t3 ) = 0, x2 (t3 ) = 1, x3 (t3 ) = 1 x1 (t4 ) = 0, x2 (t4 ) = 0, x3 (t4 ) = 1 x1 (t5 ) = 1, x2 (t5 ) = 0, x3 (t5 ) = 1 x1 (t6 ) = 1, x2 (t6 ) = 0, x3 (t6 ) = 0 ··· Clearly, the above pattern repeats, and no consensus is reached.



As shown in Example 3.11, standard network topology-based control gain design for (3.99) no longer ensures the consensus of multiagent systems, even under the most-restrictive network connectivity condition (that is, fixed and undirected communication). In order to guarantee the multiagent systems consensus with control (3.99) under the least restrictive sensing/communication condition (i.e., sequential completeness of {S(tk }), the control gain αi j has to be redesigned. For any agent i, let the nonlinear gain αi j be given by (1) if xi (tk ) = max j∈Ni x j (tk ) = min j∈Ni x j (tk ), then αi j (tk ) can be any bounded positive value. (2) if xi (tk ) ≥ max j∈Ni x j (tk ), let αi j (tk ) be selected to satisfy the inequality

3 Interaction Topologies of Multiagent Systems . . .

96

0≤



αi j (tk )
0, such that xmax (t + δ) − xmin (t + δ) ≤ λxmax (t) − xmin (t),

(3.104)

for some 0 ≤ λ < 1, where xmax (t) = max j x j (t) and xmin (t) = min j x j (t). This is made possible due to the proposed gain selection rules in (3.100), (3.101 ), and (3.102). A detailed analysis can be found in [12]. Remark 3.26 The gain selection rules in (3.100), (3.101 ), and (3.102) are a general set of sufficient conditions for the asymptotical stability of the multiagent systems in (3.13) under the discontinuous consensus control (3.99) with directed and switching sensing/communication topologies. The nonlinear gain design conditions (3.100) to (3.102) are imposed for the purpose of avoiding the possible states oscillation due to the finite time state reachability of dynamical systems driven by discontinuous functions under certain communication topologies. ♦ Remark 3.27 The nonlinear gain design conditions in (3.100), (3.101 ), and (3.102) can be easily tsatisfied since for agent i, it only requires the available neighboring state information of agent i in the design of αi j (tks ). For instance, to satisfy (3.100), one simple choice could be

3.5 A Discontinuous Consensus Algorithm

αi j (tk ) =

xi (tk ) − min j∈Ni x j (tk ) , (|Ni | + 1)ct

97

(3.105)

where |Ni | denotes the cardinality of the set Ni . Same selection can be made for satisfying the conditions (3.101) and (3.102). ♦ In what follows, we use another example to consider the situation with sensing/communication delays. In the presence of sensing/communication delays, the cooperative control in (3.99) becomes u i (t) =

n 

αi j (si j , x j (tk − τi j ))sgn(x j (t − τi j ) − xi (t)),

j=1

t ∈ [tk , tk+1 ),

(3.106)

where τi j ∈ [0, r ] are time delays incurred during transmission with r being the upper bound on latencies of information transmission over the network. In general, multiagent systems with time delays become more involved. By imposing more stringent network connectivity conditions, such as bidirectional (undirected) sensing/communication, the consensus may still be ensured. However, given the discontinuous cooperative control (3.106), consensus may not be guaranteed even under fixed and undirected communication topology as illustrated by the following example. Nonlinear piecewise constant gain αi j (·) needs to be designed to solve the problem. Example 3.12 Consider two agents with the following dynamics x˙1 = sgn(x2 (t − τ ) − x1 (t)), x˙2 = sgn(x1 (t − τ ) − x2 (t)) Let the initial conditions be x1 (0) = 0, x2 (0) = 0.1 and delay τ = 0.1. Simple analysis shows that ⎧ 0, 0 ≤ t ≤ 0.1 ⎪ ⎪ ⎪ t − 0.1, ⎪ 0.1 ≤ t ≤ 0.2 ⎪ ⎪ ⎪ ⎨ −t + 0.3, 0.2 ≤ t ≤ 0.3 x1 (t) = .. ⎪ . ⎪ ⎪ ⎪ ⎪ t − 2k−1 , 2k−1 ≤ t ≤ 2k , k≥1 ⎪ ⎪ 10 10 10 ⎩ 2k+1 2k 2k+1 −t + 10 , 10 ≤ t ≤ 10 , k ≥ 1 ⎧ 0.1, 0 ≤ t ≤ 0.1 ⎪ ⎪ ⎪ ⎪ −t + 0.2, 0.1 ≤ t ≤ 0.2 ⎪ ⎪ ⎪ ⎨ t − 0.2, 0.2 ≤ t ≤ 0.3 x2 (t) = .. ⎪ . ⎪ ⎪ ⎪ ⎪ −t + 2k , 2k−1 ≤ t ≤ 2k , k≥1 ⎪ ⎪ 10 10 10 ⎩ 2k 2k 2k+1 t − 10 , ≤ t ≤ , k≥1 10 10

(3.107)

(3.108)

3 Interaction Topologies of Multiagent Systems . . .

98 Fig. 3.36 System responses

0.12

x1 x2

0.1 0.08 0.06 0.04 0.02 0 −0.02 0

0.1

0.2

0.6 0.5 0.4 Time (sec)

0.3

0.7

0.8

0.9

1

Figure 3.36 depicts the simulation results. Clearly, there is no cooperative convergence. ♦ The following theorem provides the new design. Theorem 3.8 [12] Consider the multiagent system (3.13) under cooperative control (3.106). Assume that sensing/communication matrix sequence of {S(tk )} is sequentially complete. Let the nonlinear gain αi j be designed as follows: for any agent i, (1) if xi (tk ) = max j∈Ni x j (tk − τi j ) = min j∈Ni x j (tk − τi j ), then αi j (tk ) can be any bounded positive values. (2) if xi (tk ) ≥ max j∈Ni x j (tk − τi j ), let αi j (tk ) be selected to satisfy the inequality 0≤



αi j (tk )
0, i = 2, . . . , n. For λ1 = 0, the corresponding normalized left eigenvector w1 = [w11 , . . . , w1n ]T is positive, that is, w1i > 0, ∀i. To this end, the modified distributed estimation algorithm is of the form y˙ˆi (t) =

 j∈Ni

ai j ( yˆ j (t) − yˆi (t)) +

y˙i (t) , nw1i

(4.19)

Under (4.19), the overall system dynamics for n estimators become μ(t) , y˙ˆ (t) = −L yˆ (t) + W1−1 n

(4.20)

4.3 Distributed Estimation of Time-Varying Signals

109

where W1 = diag{w1 }. It follows that w1T y˙ˆ (t)

=

−w1T L yˆ (t)

+

μ(t) w1T W1−1 n

n =

i=1

y˙i (t)

n

(4.21)

Similarly, the disagreement dynamics can be obtained as   n −1 T  W 1 11 1 ˙ = y˙ˆ (t) − δ(t) y˙i (t) = −L yˆ (t) + − μ(t) n i=1 n n

(4.22)

Note the property of L1 = 0 and Lδ(t) = L yˆ (t), (4.22) can be rewritten as 

 11T W1−1 ˙ = −Lδ(t) + − δ(t) μ(t), n n

(4.23)

and its solution is δ(t) = e

−Lt



t δ(0) +

e

−L(t−τ )

 W1−1 11T − μ(τ )dτ n n

(4.24)

0

Since L is strongly connected, and then following a similar argument as that in Theorem 3.4, we have ⎡



1

⎢ e Jn2 (−λ2 )t ⎢ e−Lt = Q ⎢ .. ⎣ .

⎥ ⎥ −1 ⎥Q ⎦ e Jnm (−λm )t

with e Jni (−λi )t being of the form ⎡

e Jni (−λi )t

2 −λi t

e−λi t te−λi t t e2! ⎢ ⎢ 0 e−λi t te−λi t ⎢ ⎢ =⎢ 0 0 e−λi t ⎢ ⎢ . .. .. ⎣ .. . . 0 ... 0

... ... .. . .. . 0

t (ni −1) e−λi t (n i −1)! t (ni −2) e−λi t (n i −2)!



⎥ ⎥ ⎥ (n i −3) −λi t ⎥ t e ⎥ (n i −3)! ⎥ ⎥ .. ⎦ . −λi t e

To this end, note that w1T δ(t) = 0 and it is not difficult to establish that

110

4 Emergent Behavior Detection in Multiagent Systems

δ(t) ≤ γ1 e−λ2 t δ(0)    t   −1 11T  −λ2 (t−τ )  W1 +γ2 e − μ(τ ) dτ   n  n

(4.25)



0

for some constants γ1 > 0 and γ2 > 0. Note that     W −1  11T   1 − μ(τ )   n  n

≤ max i



 | y˙i (τ )/(w1i (n − 1)) − y˙ j (τ )| n

j

≤ max | y˙i (τ )/(w1i (n − 1)) − y˙ j (τ )| i, j

≤ max(max | y˙i (t)/(w1i (n − 1)) − y˙ j (t)|) t

i, j

Thus, it follows from (4.25) that δ(t) ≤ γ1 e−λ2 t δ(0) maxτ ∈[0,t] (maxi, j | y˙i (τ )/(w1i (n − 1)) − y˙ j (τ )|) (1 − e−λ2 t ) +γ2 λ2 Remark 4.2 It should be noted that in the use of (4.19), the left eigenvector w1 is needed. While this global information may be available by design, it is preferred to just use local information. In what follows, we combine a distributed estimation ♦ algorithm of w1 into (4.19) to alleviate this obstacle. Lemma 4.1 Consider n agents with a strongly connected digraph. Let w1 be the normalized left eigenvector (w1T 1 = 1) corresponding to zero eigenvalue of the Laplai i i T , wˆ 12 , . . . , wˆ 1n ] be the estimate of w1 by agent i. The cian matrix L. Let wˆ 1i = [wˆ 11 distributed estimation algorithm is given as follows. w˙ˆ 1i =



j

ai j (wˆ 1 − wˆ 1i )

(4.26)

j∈Ni

where the initial value wˆ 1i = [0, . . . , 0, 1, 0, . . . , 0]T with its ith element being 1. Then we have lim wˆ 1i (t) = w1 , ∀i

t→∞

(4.27)

Proof To prove (4.27), it suffices to show the convergence of individual elements i (t) to w1l , l = 1, . . . , n. It follows that wˆ 1l i = w˙ˆ 1l

 j∈Ni

j

i ai j (wˆ 1l − wˆ 1l )

(4.28)

4.3 Distributed Estimation of Time-Varying Signals

111

and i i = −Lwˆ ∗l w˙ˆ ∗l

(4.29)

i i i i T = [wˆ 1l , wˆ 2l , . . . , wˆ nl ] . Thus, by invoking Theorem 3.4 and noting the where wˆ ∗l l k initial values wˆ 1l = 1 and wˆ 1l = 0, k = l , we obtain i i (t) = w1T wˆ ∗l (0) = w1l lim wˆ 1l

t→∞

 To this end, for agent i, the combined distributed estimation algorithm is given by y˙ˆi (t) =



ai j ( yˆ j (t) − yˆi (t)) +

j∈Ni i = w˙ˆ 1i



y˙i (t) , i n wˆ 1i (t)

(4.30)

j

i ai j (wˆ 1i − wˆ 1i )

(4.31)

j∈Ni

Theorem 4.1 The distributed estimation algorithm in (4.30) and (4.31) renders δ(t) ≤ γ1 e−λ2 t δ(0) +γ2

     (τ ) maxτ ∈[0,t] maxi, j  (w1i +γ1y˙ei−λ − y˙ j (τ ) 2 t )(n−1) λ2

(1 − e−λ2 t ()4.32)

Proof Under (4.30), the overall system dynamics for n estimators become ˆ −1 (t) μ(t) , y˙ˆ (t) = −L yˆ (t) + W 1 n

(4.33)

ˆ 1 (t) = diag{[wˆ 1 (t), . . . , wˆ i (t), . . . , wˆ n (t)]}. Similarly, the disagreement where W 11 1n 1i dynamics can be obtained as   n ˆ −1 W 11T 1 1 ˙ ˙ = yˆ (t) − − δ(t) y˙i (t) = −L yˆ (t) + μ(t) n i=1 n n

(4.34)

Note the property of L1 = 0 and Lδ(t) = L yˆ (t), (4.34) can be rewritten as 

 T ˆ −1 W 11 1 ˙ = −Lδ(t) + − δ(t) μ(t), n n

(4.35)

112

4 Emergent Behavior Detection in Multiagent Systems

and its solution is δ(t) = e−Lt δ(0) +

t

 e−L(t−τ )

 T ˆ −1 W 11 1 − μ(τ )dτ n n

(4.36)

0

Again, it can be established that δ(t) ≤ γ1 e−λ2 t δ(0)    t  ˆ −1  11T  −λ2 (t−τ )  W1 − +γ2 e μ(τ ) dτ   n  n

(4.37)



0

for some constants γ1 > 0 and γ2 > 0. Note that the exponential convergence of ˆ 1 (t), and W     W  −1  | y˙i (τ )/((w1i + γ1 e−λ2 t )(n − 1)) − y˙ j (τ )| 11T  ˆ1  − μ(τ ) ≤ max  i  n  n n j ∞

≤ max | y˙i (τ )/((w1i + γ1 e−λ2 t )(n − 1)) − y˙ j (τ )| i, j

≤ max(max | y˙i (t)/((w1i + γ1 e−λ2 t )(n − 1)) − y˙ j (t)|) t

i, j

Thus, it follows from (4.37) that we obtain (4.32).



Example 4.2 Let us revisit Example 4.1 with the sensing/communication topology among agents being defined by the following adjacency matrix ⎡

⎤ 020 A = ⎣0 0 2⎦ 220 The left eigenvector w1 is [0.25, 0.5, 0.25]T for the Laplacian matrix L. Figure 4.3 shows that estimates yˆi (t) for y(t) by three agents, respectively. The boundedness of estimation errors y˜i (t) = yˆi (t) − y(t) can be seen in Fig. 4.4, and Figs. 4.5, 4.6 ♦ and 4.7 illustrate the estimates of w1 by three agents, respectively. For the implementation of the distributed estimation algorithm (4.5), an eventtriggered mechanism may be adopted for the purpose of reducing communication load for information transmission among agents. Define the event-triggering time sequence for agents as t0 , t1 , . . .. Let the event-triggered distributed estimation algorithm be  ai j ( yˆ j (tk ) − yˆi (tk )) + y˙i (t), t ∈ [tk , tk+1 ) (4.38) y˙ˆi (t) = j∈Ni

4.3 Distributed Estimation of Time-Varying Signals Fig. 4.3 Average estimates by agents

113

1 \hat{y}_{1} \hat{y}_{2} \hat{y}_{3}

0.8 0.6

Estimate

0.4 0.2 0 -0.2 -0.4 -0.6 -0.8 0

10

20

30

40

50

60

Time (sec)

Fig. 4.4 Estimation errors

0.05 \tilde{y}_{1} \tilde{y}_{2} \tilde{y}_{3}

Estimation error

0.04 0.03 0.02 0.01 0 -0.01 0

10

20

30

40

50

60

Time (sec)

Define the measurement error εi (t) = yˆi (tk ) − yˆi (t) with ε = [ε1 , . . . , εn ]T . It then follows from (4.38) that y˙ˆi (t) =



ai j ( yˆ j (t) − yˆi (t))

j∈Ni

+



ai j (ε j (t) − εi (t)) + y˙i (t), t ∈ [tk , tk+1 )

(4.39)

j∈Ni

The overall closed-loop system dynamics for distributed estimators become y˙ˆ (t) = −L yˆ (t) − Lε(t) + μ(t),

(4.40)

Fig. 4.5 Estimate of w1 by agent 1

4 Emergent Behavior Detection in Multiagent Systems

Estimate of Left Eigenvector by Agent 1

114

1 \hat{w}_{11} \hat{w}_{12} \hat{w}_{13}

0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0

10

20

30

40

50

60

Fig. 4.6 Estimate of w1 by agent 2

Estimate of Left Eigenvector by Agent 2

Time (sec) 1 \hat{w}_{21} \hat{w}_{22} \hat{w}_{23}

0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0

10

20

30

40

50

60

Fig. 4.7 Estimate of w1 by agent 3

Estimate of Left Eigenvector by Agent 3

Time (sec) 1 \hat{w}_{31} \hat{w}_{32} \hat{w}_{33}

0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0

10

20

30

Time (sec)

40

50

60

4.3 Distributed Estimation of Time-Varying Signals

115

and the corresponding dynamics for the disagreement vector δ(t) are

11T ˙δ(t) = −Lδ(t) − Lε(t) + I − μ(t) n

(4.41)

Its solution is δ(t) = e−Lt δ(0) −

t

e−L(t−τ ) Lε(τ )dτ

0

t +

e



−L(t−τ )

11T I− μ(τ )dτ n

(4.42)

0

and δ(t) ≤

n 

e

−λ j t

δ(0) +

j=2

+

t

j=2 0



e−λ j (t−τ ) Lε(τ )dτ

e

−λ j t

 

  11T −λ j (t−τ )   e  I − n μ(τ ) dτ t δ(0) + (n − 1)L

j=2

+

t

j=2 0

n 

n 

n 

e−λ2 (t−τ ) ε(τ )dτ

0

n t  j=2 0

 

  11T  dτ I − μ(τ ) e−λ j (t−τ )    N

(4.43)

Thus, if the event-triggering time sequence {tk } is updated such that ε(t) ≤ c0 e−α(t−tk ) , t ∈ [tk , tk+1 )

(4.44)

for some positive constants c0 and α, then δ(t) ≤

n 

e−λ j t δ(0) +

j=2

+

n  j=2 0

t

e

−λ j (t−τ )

(n − 1)L −αt (e − e−λ2 t ) λ2 − α

  T

   I − 11 μ(τ ) dτ,   n

from which the same convergence conclusion can be obtained.

(4.45)

116

4 Emergent Behavior Detection in Multiagent Systems

Remark 4.3 The even-triggering condition in (4.44) is used to determine the time instants for requesting the transmission of yˆ j (tk ) and yˆi (tk ), that is, if ε(t) ≤ c0 e−α(t−tk ) , there is no need for updating the transmission of yˆi (tk ). Otherwise, the new time instant tk+1 will be generated and accordingly εi (tk+1 ) is reset to zero which enforces (4.44). ♦ Remark 4.4 To avoid the need for global information of all εi (t) in continuously checking the condition (4.44), (4.44) can be simplified as a set of distributed conditions c2 (4.46) εi (t)2 ≤ 0 e−2α(t−tk ) , t ∈ [tk , tk+1 ), n which implies (4.44). In other words, each agent only needs to monitor its own states for triggering the signal transmission events. ♦

4.4 Distributed Least Squares Algorithm In this section, we present distributed least squares algorithms for the detection of emergent behaviors in the multiagent systems. Least squares has been extensively used in data fitting, estimation, and system identification. In the multiagent systems setting, we assume that a sensor network is deployed to monitor the multiagent system. Each sensor (detector) can take the measurements of a number of agents in its covered region. Figure 4.8 illustrates a scenario with three sensor monitoring ten agents. It is worth pointing out that the deployed sensors may be part of multiagent systems. To make distributed detection problem solvable, we further assume that every agent is covered by at least one sensor.

4.4.1 Fundamentals on Least Squares Algorithms Suppose we have a linear model given by y = Hx

(4.47)

where y = [y1 , . . . , y p ]T ∈  p×1 is the measurement vector, x = [x1 , . . . , xq ]T ∈ q×1 is an unknown vector, and H = [H1 , . . . , H p ]T is an p × q the regression matrix with HiT ∈ 1×q being the ith row vector of H . Normally, we assume that H is tall with p > q and the column of H is linearly independent, that is, rank{H } = q. Based on the discussion in Chap. 2, we know that most likely y is not in the range of H , and therefore there is no x for which H x = y. The least squares solution is then to seek the best estimate of x such that the following least squares loss function is minimized.

4.4 Distributed Least Squares Algorithm

117

10

8 6 Sensor #1 4

2 Sensor #2 0

-2 -4 Sensor #3 -6 -8 -10 -10

-8

-6

-4

-2

0

2

4

6

8

10

Fig. 4.8 Three sensor monitoring ten agents

1 1 y − H x2 = (yi − HiT x)2 2 2 i=1 p

V (x) = It follows from

∂V ∂x

(4.48)

= 0 that the least squares approximate solution xˆ of y = H x is

xˆ = (H H ) T

−1

H y= T

 p 

−1 Hi HiT

i=1

p 

Hi yi

(4.49)

i=1

By introducing a diagonal weighting matrix  = diag{[γ1 , . . . , γ p ]} and based on the modified loss function V (x) =

1 (y − H x)T (y − H x), 2

(4.50)

the weighted least squares solution is obtained as xˆ = (H  H ) T

−1

H y = T

 p  i=1

−1 Hi γi HiT

p  i=1

Hi γi yi

(4.51)

118

4 Emergent Behavior Detection in Multiagent Systems

The recursive least squares algorithm can be developed based on sequentially   p T −1 , and then available measurements [10]. Define P( p) = i=1 Hi γi Hi −1  P( p) = P( p − 1)−1 + H p γ p H pT

(4.52)

It follows from (4.51) and (4.52) that x( ˆ p) = P( p)

p 

Hi γi yi = P( p)

i=1

 p−1 

 Hi γi yi + H p γ p y p

(4.53)

i=1

and noting p−1 

Hi γi yi = P( p − 1)−1 x( ˆ p − 1) = P( p)−1 x( ˆ p − 1) − H p γ p H pT x( ˆ p − 1)

i=1

we have ˆ p − 1) + P( p)H p γ p y p x( ˆ p) = x( ˆ p − 1) − P( p)H p γ p H pT x(   = x( ˆ p − 1) + P( p)H p γ p y p − H pT x( ˆ p − 1)

(4.54)

To this end, Eqs. (4.52) and (4.54) consist of the recursive least squares algorithm. Remark 4.5 In the use of (4.52) and (4.54), one needs to ensure that P( p) is nonsingular for all p. ♦ The weighted least squares algorithm in (4.51) can be implemented iteratively based on the gradient descent method. It follows from computing the gradient of (4.50) that   x(k ˆ + 1) = x(k) ˆ − αk H T  H x(k) ˆ − H T y  p  p   T = x(k) ˆ − αk Hi γi Hi x(k) ˆ − Hi γi yi i=1

(4.55)

i=1

where αk > 0 is the learning rate. Remark 4.6 In the aforementioned least squares algorithms, a common assumption is that matrix H T H is nonsingular. If H T H is singular, the least squares solution can be computed using singular-value decomposition. ♦ Recall that for every p × q matrix H , the eigenvalues of H T H can be arranged as λ21 ≥ λ22 ≥ · · · λr2 > 0 = λr +1 = · · · = λq . Let q¯ = min{ p, q}, then λ1 ≥ λ2 ≥ · · · λr > 0 = λr +1 = · · · = λq¯ are called singular values of H . To this end, singularvalue decomposition says that H can be transformed into the form

4.4 Distributed Least Squares Algorithm

119

UTHV = S =

0 0 0

(4.56)

where U T U = UU T = I p , V T V = V V T = Iq , and S is p × q with the singular values of H on the diagonal and = diag(λ1 , . . . , λr ). To this end, it follows from the least squares loss function in (4.48) that the best estimate xˆ can be calculated using SVD as [11] xˆ = V

−1

 0 UTy 0 0

(4.57)

4.4.2 Distributed Recursive Least Squares Algorithm Suppose there are a set of n measurement models given by yi (k) = Hi (k)x, i = 1, . . . , n

(4.58)

where k = 0, 1, . . . is an integer, yi is a pi × 1 measurement vector, x = [x1 , pi ×q is the matrix relating the meax2 , . . . , xq ]T is an q × 1 unknown vector, n Hi ∈  pi ≥ q. surements to the unknowns, and p = i=1 Remark 4.7 The model in (4.58) represents well a scenario in which n sensors are used to monitor the behaviors of a network of q agents. xi denotes the state of agent i, i = 1, . . . , q. We assume that the number of sensors is far more smaller than that of agents, that is, n  q. While all agents are under the coverage of the sensor network, each sensor only takes care of the measurements of a small number of agents. In such a case, it is not difficult to validate that H T H is always nonsingular, where H = [H1T , . . . , HnT ]T . Distributed least squares algorithms are developed for each sensor to produce the estimate of x based on local information exchange among ♦ sensors. In what follows, xi is scalar. The vector case can be treated similarly. Example 4.3 Consider a case of three sensors to monitor ten agents. Sensor 1 measures agents x4 , x5 , x6 , x7 , sensor 2 covers agents x1 , x2 , x3 , and sensor 3 covers agents x7 , x8 , x9 , x10 . The corresponding measurement matrices may simply be ⎡

0 ⎢0 H1 = ⎢ ⎣0 0 ⎡

0 0 0 0

0 0 0 0

1 0 0 0

0 1 0 0

0 0 1 0

0 0 0 1

0 0 0 0

0 0 0 0

⎤ 0 0⎥ ⎥ 0⎦ 0

⎤ 1000000000 H2 = ⎣ 0 1 0 0 0 0 0 0 0 0 ⎦ 0010000000

120

4 Emergent Behavior Detection in Multiagent Systems



0 ⎢0 H3 = ⎢ ⎣0 0

0 0 0 0

0 0 0 0

0 0 0 0

0 0 0 0

0 0 0 0

1 0 0 0

⎤ 000 1 0 0⎥ ⎥ 0 1 0⎦ 001 ♦

We use the adjacency matrix A = [ai j ] ∈ n×n to capture the information exchange among sensors. Assumption 4.1 The sensing/communication topology defined by A is directed, time-invariant, and strongly connected. Problem 4.1 Let Xi (k) be the estimate of x by sensor i at discrete-time instant k. Design a distributed estimation algorithm of the form Xi (k + 1)= f i (Xi (k), ai j I j (k)) such that limt→∞ |Xi (k) − x| <  for some small constant , where I j (k) represents the information received from neighboring sensor of the ith sensor. It follows from (4.58) that the overall measurement model is obtained as y(k) = H (k)x,

(4.59)

where y(k) = [y1T (k), y2T (k), . . . , ynT (k)]T ∈  p×1 , and H (k) = [H1T (k), H2T (k), . . . , HnT (k)]T ∈  p×q Define the following least squares loss function V (x) =

1 (y − H x)T R(y − H x) 2

(4.60)

where R = diag{R1 , . . . , Rn } with Ri ∈  pi × pi is the diagonal weighting matrix with respect to sensor i. The least squares solution to (4.59) is xˆ =

 n  i=1

HiT Ri−1 Hi

−1  n 

 HiT Ri−1 yi

(4.61)

i=1

n Remark the implementation of (4.61), the computation of i=1 HiT Ri−1 Hi n 4.8 TIn −1 and i=1 Hi Ri yi requires all sensors to send their measurements matrices Hi , covariance matrices Ri , and raw measurements yi (k) to a fusion center, which may be computationally expensive and lacks robustness against possible sensor failures. A distributed estimation algorithm implemented on local sensors is preferred. ♦ The distributed least squares solution is based on the recursive least squares design procedure presented 4.4.1, as well as the distributed estimation of n inT Sect. n −1 T −1 i=1 Hi Ri Hi and i=1 Hi Ri yi . Let Xi (k) be the estimate of x by sensor i at

4.4 Distributed Least Squares Algorithm

121

n T −1 ˆ ˆ discrete-time instant n k. TLet−1Hi (k) be the estimate of i=1 Hi Ri Hi and Yi (k) be the estimate of i=1 Hi Ri yi by sensor i, respectively. Define Pi (0) = Hˆ i (0)−1 and Pi (k + 1) = [Pi (k)−1 + Hˆ i (k + 1)]−1

(4.62)

To derive the recursive algorithm, let us start with Xi (k) = Pi (k)Yˆ i (k)

(4.63)

Then given the estimates Hˆ i (k + 1), Yˆ i (k + 1), Hˆ i (k) and Yˆ i (k), we have the estimate Xi (k + 1) as Xi (k + 1) = Pi (k + 1)[Yˆ i (k) + Yˆ i (k + 1)]

(4.64)

It follows from (4.62) and (4.63) that Xi (k + 1) = Pi (k + 1)[Pi−1 (k)Xi (k) + Yˆ i (k + 1)] = Pi (k + 1)[P −1 (k + 1)Xi (k) − Hˆ i (k + 1)Xi (k) + Yˆ i (k + 1)] i

= Xi (k) + Pi (k + 1)[Yˆ i (k + 1) − Hˆ i (k + 1)Xi (k)] us present the distributed estimation algorithms for nNow Tlet −1 i=1 Hi Ri yi by sensor i. Theorem 4.2 Let the distributed estimation algorithms for n T −1 H R i yi be given by (for sensor node i) i=1 i Hˆ i (k + 1) = Hˆ i (k) +

n i=1

n i=1

(4.65) HiT Ri−1 Hi and HiT Ri−1 Hi and

1  ai j (Hˆ j (k) − Hˆ i (k)) 1 + di j∈N i

 1  T + Hi (k + 1)Ri−1 Hi (k + 1) − HiT (k)Ri−1 (k)Hi (k) (4.66) w1i 1  Yˆ i (k + 1) = Yˆ i (k) + ai j (Yˆ j (k) − Yˆ i (k)) 1 + di j∈N i

 1  T + Hi (k + 1)Ri−1 yi (k + 1) − HiT (k)Ri−1 yi (k) w1i

(4.67)

where ai j is the i jth element of the adjacency matrix A, Ni is the neighboring set of sensor node i, di = nj=1 ai j , w1 = [w11 , . . . , w1n ]T is the normalized left eigenvector of matrix F = In − (In + D)−1 L corresponding to eigenvalue λ1 = 1, D = diag{di }, and L = D − A is the Laplacian matrix. Under Assumption 4.1 and initial conditions Hˆ i (0) = w11i Hi (0) and Yˆ i (0) = w11i HiT (0)Ri−1 yi (0), we have

122

4 Emergent Behavior Detection in Multiagent Systems n 

lim Hˆ i (k) −

k→∞

HiT Ri−1 Hi  ≤ 1

(4.68)

HiT Ri−1 yi  ≤ 2

(4.69)

i=1

and n 

lim Yˆ i (k) −

k→∞

i=1

for some small positive constants 1 and 2 . Proof We show the convergence of (4.66). Same can be done for (4.67). It follows that the overall system dynamics for n estimators can be written as ˆ + 1) = (Inq − (Inq + D ⊗ Iq )−1 L ⊗ Iq )H(k) ˆ H(k  1 ⊗ Iq (H(k + 1) − H(k)), +diag wi where

(4.70)

⎡ˆ ⎡ T ⎤ ⎤ H1 (k) H1 (k)R1−1 H1 (k) −1 ⎢Hˆ 2 (k)⎥ ⎢ H T (k)R H2 (k)⎥ 2 ⎢ ⎢ 2 ⎥ ⎥ ˆ H(k) = ⎢ . ⎥ , and H(k) = ⎢ ⎥ .. ⎣ .. ⎦ ⎣ ⎦ . T −1 Hn (k)Rn Hn (k) Hˆ n (k)

It then follows that ˆ + 1) = w1T ⊗ Iq F ⊗ Iq H(k) ˆ w1T ⊗ Iq H(k  1 ⊗ Iq (H(k + 1) − H(k)) +w1T ⊗ Iq diag w1i n  ˆ + (Hi (k + 1) − Hi (k)) (4.71) = w1T ⊗ Iq H(k) j=1

Thus, if initial conditions satisfy Hˆ i (0) = ˆ w1T ⊗ Iq H(k) =

1 w1i

n 

Hi (0), then

HiT (k)Ri−1 Hi (k)

(4.72)

i=1

To this end, once the consensus of all estimators in (4.66) is reached, that is, Hˆ i (k) = Hˆ j (k), ∀i, j as k → ∞, then lim Hˆ i (k) =

k→∞

n  i=1

HiT (k)Ri−1 Hi (k).

(4.73)

4.4 Distributed Least Squares Algorithm

123



 HiT (k + 1)Ri−1 Hi (k + 1) − HiT (k)Ri−1 (k)Hi (k) in (4.66) may not render the exact consensus of Hˆ i (k) and Hˆ j (k). To further show (4.68), let us define the disagreement variable

The drift term

1 w1i

˜ ˆ H(k) = H(k) − 1n ⊗

n 

HiT (k)Ri−1 Hi (k)

i=1

It then follows from (4.70) that ˆ ˜ + 1) = F ⊗ Iq H(k) + diag H(k −1n ⊗

n 



1 w1i

⊗ Iq (H(k + 1) − H(k))

HiT (k + 1)Ri−1 Hi (k + 1)

i=1

 1 ˜ = F ⊗ Iq H(k) + diag ⊗ Iq (H(k + 1) − H(k)) w1i  n  n   −1n ⊗ HiT (k + 1)Ri−1 Hi (k + 1) − HiT (k)Ri−1 Hi (k) i=1

i=1

˜ = F ⊗ Iq H(k) + H(k + 1)

(4.74)

where 

1 ⊗ Iq (H(k + 1) − H(k)) w1i  n  n   −1 −1 T T Hi (k + 1)Ri Hi (k + 1) − Hi (k)Ri Hi (k) −1n ⊗

H(k + 1) := diag

i=1

i=1

The solution to (4.74) is ˜ ˜ H(k) = (F ⊗ Iq )k H(0) +

k−1 

(F ⊗ Iq )k−1−m H(m + 1)

(4.75)

m=0

Based on Assumption 4.1, we know that for F ∈ n×n , there exists a similarity transformation Q such that F is transformed into the Jordan diagonal form ⎡



⎢ Jn 2 (λ2 ) ⎢ Q −1 F Q = ⎢ .. ⎣ .

⎥ ⎥ ⎥ ⎦

1

Jn m (λm )

124

4 Emergent Behavior Detection in Multiagent Systems

To this end, note that (4.72) and following a similar argument as that in Theorem 3.5, it can be established that ˜ ˜ + γ2 H(k) ≤ γ1 λk2 H(0)

k−1 

λk−1−m H(m + 1) 2

(4.76)

m=0

where γ1 and γ2 are some constants, which leads to (4.84).



Remark 4.9 The algorithms in (4.66) and (4.67) are distributed, and only depend on information exchange among neighboring sensors. If HiT Ri−1 HiT and HiT Ri−1 yi are constant matrices, the asymptotical convergence can be reached in Theorem 4.2. ♦ Theorem 4.3 (Distributed recursive least squares algorithm). The algorithm in (4.62) and (4.65) together with (4.66) and (4.67) solves Problem 4.1. Proof The proof follows the derivation of recursive least squares algorithm and the results in Theorem 4.2.  In (4.66) and (4.67), the left eigenvector w1 of matrix F is needed. The following lemma provides a distributed estimation of w1 . Lemma 4.2 (Estimation of the Left Eigenvector) Consider n agents with a strongly connected digraph. Let w1 be the normalized left eigenvector (w1T 1 = 1) corresponding to eigenvalue 1 of the matrix F=In −(In + D)−1 L. Let wˆ 1i = i i i T , wˆ 12 , . . . , wˆ 1n ] be the estimate of w1 by agent i. The distributed estimation [wˆ 11 algorithm is given as follows. wˆ 1i (k + 1) = wˆ 1i (k) +

1  j ai j (wˆ 1 (k) − wˆ 1i (k)) 1 + di j∈N

(4.77)

i

where the initial value wˆ 1i (0) = [0, . . . , 0, 1, 0, . . . , 0]T with its ith element being 1. Then we have lim wˆ 1i (t) = w1 , ∀i

k→∞

(4.78)

n T ∗ 1 2 Proof Define wˆ 1i = [wˆ 1i , wˆ 1i , . . . , wˆ 1i ] which represents a stacked vector for the j ith component in vectors wˆ 1 , ∀ j. It then follows from (4.77) that ∗ ∗ (k + 1) = F wˆ 1i (k), wˆ 1i

(4.79)

Note that F1 = 1, and λ1 = 1 is an eigenvalue of F and the corresponding right eigenvector is 1. The rest of eigenvalues are in the unit circle and satisfy λ1 ≥ |λ2 | ≥ · · · ≥ |λ N |. Under the Assumption 4.1, λ1 = 1 is simple and the convergence of (4.79) can be shown following a similar argument as what shown in Theorem 3.5. That is, we have

4.4 Distributed Least Squares Algorithm

125

j

l lim wˆ 1i (k) = wˆ 1i (k), ∀ j, l.

(4.80)

k→∞

Now let us show the consensus value of wˆ ji , ∀ j. It follows ∗ ∗ ∗ (k + 1) = w1T F wˆ 1i (k) = w1T wˆ 1i (k) w1T wˆ 1i

(4.81)

∗ (k) is invariant and which says that w1T wˆ 1i ∗ ∗ (k) = w1T wˆ 1i (0) w1T wˆ 1i

(4.82)

∗ (0) = [0, . . . , 1, . . . , 0]T with its ith element being 1, and it To this end, note wˆ 1i follows from (4.80) and (4.82) that

w1i j lim wˆ 1i (k) = n

k→∞

l=1

(4.83)

w1l

 To this end, the distributed estimation algorithm in (4.77) for the left eigenvector w1 can be combined with (4.66) and (4.67) to generate a truly distributed estimation algorithm as follows Hˆ i (k + 1) = Hˆ i (k) +

1  ai j (Hˆ j (k) − Hˆ i (k)) 1 + di j∈N i

H T (k + 1)Ri−1 Hi (k + 1) HiT (k)Ri−1 Hi (k) + i − i i wˆ 1i (k + 1) wˆ 1i (k)  1 Yˆ i (k + 1) = Yˆ i (k) + ai j (Yˆ j (k) − Yˆ i (k)) 1 + di j∈N

(4.84)

i

+

HiT (k

+ 1)Ri−1 yi (k i wˆ 1i (k + 1)

+ 1)



HiT (k)Ri−1 yi (k) i wˆ 1i (k)

(4.85)

Theorem 4.4 (Distributed recursive least squares algorithm). The algorithm in (4.62) and (4.65) together with (4.84), (4.85) and (4.77) solves Problem 4.1. Proof The proof can be done by combining the derivation of recursive least squares algorithm, and the results in Lemma 4.2 and Theorem 4.2.  Remark 4.10 For ease of implementation, the algorithms in (4.84) and (4.85) can be rewritten as follows by introducing new variables Hi (k) = Hˆ i (k) −

HiT (k)Ri−1 Hi (k) i wˆ 1i (k)

126

4 Emergent Behavior Detection in Multiagent Systems

and Y i (k) = Yˆ i (k) −

HiT (k)Ri−1 yi (k) i wˆ 1i (k)

Thus, Hi (k + 1) = Hi (k) +

1  ˆ (H j (k) − Hˆ i (k)) 1 + di j∈N

(4.86)

i

H T (k)Ri−1 Hi (k) Hˆ i (k) = Hi (k) + i i wˆ 1i (k)

Y i (k + 1) = Y i (k) +

(4.87)

1  ˆ (Y j (k) − Yˆ i (k)) 1 + di j∈N

(4.88)

i

H T (k)Ri−1 yi (k) Yˆ i (k) = Y i (k) + i i wˆ 1i (k)

(4.89) ♦

To this end, the distributed recursive least squares algorithm solving Problem 4.1 can be summarized as follows. Algorithm 1 Distributed Recursive Least Squares Algorithm 1: Initialization: wˆ 1i (0), Hˆ i (0) =

HiT (0)Ri−1 Hi (0) , i (0) wˆ 1i

Yˆ i (0) =

HiT (0)Ri−1 yi (0) , i (0) wˆ 1i

Hi (0) = 0, Y i (0) = 0,

Pi (0) = I, Xˆi (0) = Pi (0)Yˆ i (0), where Xˆi is the estimate of x by sensor i. 2: while with new samples at time instant k ≥ 1 do 3: Update wˆ 1i (k) using (4.77); 4: Update Hˆ i (k) using (4.87) and (4.86); 5: Update Yˆ i (k) using (4.89) and (4.88); 6: Compute Pi (k) using (4.62); 7: Compute Xˆi (k) using (4.65) 8: end while

Example 4.4 We provide an example to illustrate the proposed distributed least squares Algorithm 1. Assume there are three sensor nodes in the sensor network for monitoring ten agents (targets) in an 2D environment. As shown in Fig. 4.9, the agents are located in a 2D region. The positions of agents 1, 2, 5, and 6 can be measured by sensor 1, agents 3, 4, 5, 8, and 9 by sensor 2, and agents 7, 9, and 10 by sensor 3. The communication topology among sensors is given by

4.4 Distributed Least Squares Algorithm

127

Fig. 4.9 Ten agents

Ten agents locations 10 6

8 6

1

4

4 5 2

2 0

8

-2

3

9

-4

7

-6 -8 10 -10 -10

-8

-6

-4

-2

0

2

4

6

8

10



⎤ 010 A = ⎣0 0 1⎦ 110 That is, sensor 1 can receive information from sensor 2, sensor 2 can receive information from sensor 3, and sensor 3 can receive information from sensor 1 and sensor 2. Apparently, the communication topology is directed and strongly connected and satisfies Assumption 4.1. The corresponding Laplacian matrices L and F are ⎡

⎤ ⎡ ⎤ 1 −1 0 0.5 0.5 0 L = ⎣ 0 1 −1 ⎦ , F = ⎣ 0 0.5 0.5 ⎦ −1 −1 2 1/3 1/3 1/3 with the left eigenvector corresponding to eigenvalue λ1 = 1 being w1 = [0.2222, 0.4444, 0.3333]T Using Algorithm 1, each sensor can estimate the positions of all agents. The performance of the algorithm is measured using the norm of position estimation errors. Figures 4.10, 4.11, and 4.12 illustrate the norm of position estimation errors for ten agents by three sensors, respectively. The convergence is apparent. The convergence of the estimation of w1 by three sensors is shown in Figs. 4.13, 4.14, and 4.15. ♦

128

4 Emergent Behavior Detection in Multiagent Systems

Fig. 4.10 Norm of estimation errors for ten agents’ positions by sensor 1

Norm of estimation error of 10 agents by sensor 1 12 10 8 6 4 2 0 0

2

4

6

8

10

12

14

16

18

20

Time (sec)

Fig. 4.11 Norm of estimation errors for ten agents’ positions by sensor 2

Norm of estimation error of 10 agents by sensor 2 12 10 8 6 4 2 0 0

2

4

6

8

10

12

14

16

18

20

Time (sec)

Remark 4.11 The proposed distributed Algorithm 1 can also be used to deal with slowly time-varying agents x(t) by introducing a discounting factor 0 < λ < 1. Accordingly, the computation of Pi (k) in (4.62) will be updated by the following equation Pi (k) = (Pi (k − 1)−1 λ + Hˆ i (k))−1

(4.90) ♦

4.4 Distributed Least Squares Algorithm Fig. 4.12 Norm of estimation errors for ten agents’ positions by sensor 3

129

Norm of estimation error of 10 agents by sensor 3 12 10 8 6 4 2 0

0

2

4

6

8

10

12

14

16

18

20

Time (sec)

Fig. 4.13 Convergence of i to w wˆ 11 11

The estimates of w by three sensors 11

1

\hat{w}_{11}^1 \hat{w}_{11}^2 \hat{w}_{11}^3

0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0

2

4

6

8

10

12

14

16

18

20

Time (sec)

4.4.3 Distributed Iterative Least Squares Algorithm Let us revisit Problem 4.1. It can be solved using a distributed version of the iterative the estimate of x by least squares algorithm presented in (4.55). Again, let Xi (k) be n HiT Ri−1 Hi and sensor i at discrete-time instant k. Let Hˆ i (k) be the estimate of i=1 n Yˆ i (k) be the estimate of i=1 HiT Ri−1 yi by sensor i, respectively. It follows from (4.60) and (4.55) that

130

4 Emergent Behavior Detection in Multiagent Systems

Fig. 4.14 Convergence of i to w wˆ 12 12

The estimates of w 12 by three sensors 1 \hat{w}_{12}^1 \hat{w}_{12}^2 \hat{w}_{12}^3

0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0

0

2

4

6

8

10

12

14

16

18

20

Time (sec)

Fig. 4.15 Convergence of i to w wˆ 13 13

The estimates of w13 by three sensors 1 \hat{w}_{13}^1 \hat{w}_{13}^2 \hat{w}_{13}^3

0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0

2

4

6

8

10

12

14

16

18

20

Time (sec)

  Xi (k + 1) = Xi (k) − αk Hˆ i (k)Xi (k) − Yˆ i (k)

(4.91)

where αk > 0 is the learning rate, Hˆ i (k) is updated using (4.87) and (4.86), and Yˆ i (k) is updated using (4.89) and (4.88). Example 4.5 Let us resolve the distributed estimation problem in Example 4.4 using the distributed iterative least squares algorithm in (4.91). The simulation settings are same as those in Example 4.4. The learning rate αk = 1/(1 + k). The convergence for estimation errors by three sensors is shown in Figs. 4.16, 4.17, and 4.18, respectively.

4.4 Distributed Least Squares Algorithm Fig. 4.16 Norm of estimation errors for ten agents’ positions by sensor 1

12

131

Norm of estimation error of 10 agents by sensor 1

10 8 6 4 2 0 0

2

4

6

8

10

12

14

16

18

20

Time (sec)

Fig. 4.17 Norm of estimation errors for ten agents’ positions by sensor 2

Norm of estimation error of 10 agents by sensor 2 12 10 8 6 4 2 0

0

2

4

6

8

10

12

14

16

18

20

Time (sec)

Compared with the results in Example 4.4, the convergence speed appears to be slower. This is due to the fact that Pi (k) in (4.65) turns out to the (near-)optimal learning rate while αk (4.91) is empirically chosen and may be not the best one. ♦

132 Fig. 4.18 Norm of estimation errors for ten agents’ positions by sensor 3

4 Emergent Behavior Detection in Multiagent Systems

Norm of estimation error of 10 agents by sensor 3 12 10 8 6 4 2 0

0

2

4

6

8

10

12

14

16

18

20

Time (sec)

4.5 Distributed Kalman Filtering Algorithm The proposed distributed least squares algorithms can be further extended to deal with multiagents with possibly unknown dynamics. Consider the detection of dynamical behaviors of a group of N agents with the following discrete dynamics xi (k + 1) = i xi (k) + i u i (k)

(4.92)

where k = 0, 1, . . . is an integer, i = 1, . . . , N , xi ∈ ni and i is a n i × n i system matrix, i is an n i × m i matrix, and u i ∈ m i is the input. Assume that there are n observers which are employed to monitor the behaviors of all agents, and each observer can only monitor the agents in its sensing range. Assume also n  N (i.e., the number of sensors is far more less than that of agents), and each observer has a measurement model of the form yi (k) = Hi (k)X + vi (k)

(4.93)

where yi is a pi × 1 measurement vector, X = [x1T , x2T , . . . , x NT ]T is an q × 1 (q = N i=1 n i ) unknown vector representing the overall states of all agents, vi is a pi × 1 white measurement noise vector with zero mean and diagonal, positive definite covariance matrix Ri (k), Hi ∈  pi ×q is the matrix relating the measurements to n pi ≥ N . The dimension parameter pi for observer i the unknowns, and p = i=1 depends on the number of agents in its measurement range. Assume that each observer can communicate wirelessly with some other observers in its sensing/communication. Again, we use the adjacency matrix A = [ai j ] ∈ n×n to capture the information exchange among observers. Assumption 4.1 is imposed for the discussion below.

4.5 Distributed Kalman Filtering Algorithm

133

The objective is to design a distributed recursive estimation algorithm so that each observer can reconstruct the state X for all agents based on information exchange among different observers. It follows from (4.92) and (4.93) that the overall agent dynamics and measurement model are X (k + 1) = X (k) + U (k) Y (k) = H (k)X (k) + V (k)

(4.94) (4.95)

where Y (k) = [y1T (k), y2T (k), . . . , ynT (k)]T ∈  p×1 ,

U (k) = [u T1 (k), u T2 (k), . . . , u TN (k)]T ∈ 

i

m i ×1

,

V (k) = [v1T (k), v2T (k), . . . , vnT (k)]T , and ⎡ ⎢ ⎢ =⎢ ⎣

1

⎤ 2

..



⎢ ⎥ ⎢ ⎥ ⎥,  = ⎢ ⎣ ⎦

. N

1

⎤ 2

..

⎥ ⎥ ⎥, ⎦

. N



⎤ H1 (k) ⎢ H2 (k) ⎥ ⎢ ⎥ H (k) = ⎢ . ⎥ ⎣ .. ⎦ Hn (k)

Remark 4.12 It should be noted that Y is the measurement vector due to observers, and not necessarily being the vector of consisting of output vectors for individual agents. In order to reconstruct X from Y based on the model in (4.94) and (4.95), the pair {, H (k)} must be observable. This naturally imposes the condition that each agent is covered by at least one of observers. ♦ Example 4.6 Consider agents with the following discrete time double integrator dynamics (N = 5)

2

1T T /2 i = (4.96) , i = T 0 1 where T is the sample period. We assume that there are three observers (n = 3) to monitor agents’ states. One possible setting is that observer 1 measures agent 1, observer 2 monitor agents 2 and 3; and observer 3 monitor agents 4 and 5. Thus,



  01000 00010 H1 = 1 0 0 0 0 ⊗ I2 , H2 = ⊗ I2 , H3 = ⊗ I2 (4.97) 00100 00001 It is not difficult to validate that the pair {, H } induced by (4.96) and (4.97) is observable. Given the same setting, H1 , H2 , and H3 could be in the form of

134

4 Emergent Behavior Detection in Multiagent Systems





  01000 00010 H1 = 1 0 0 0 0 ⊗ h, H2 = ⊗ h, H3 = ⊗h 00100 00001

(4.98)



10 where h = . In such a case, instead of measuring both states for agent xi = 00 T [xi1 , xi2 ] , the observer only measures xi1 for agent i. The pair {, H } induced by (4.96) and (4.98) is also observable. ♦ In what follows, we first give the Kalman filtering algorithm by fully using the overall agent dynamics in (4.94) and (4.95), and then the distributed solution will be derived for each observer. The Kalman filtering algorithm consists of a twostep recursive update: measurement update and time update. That is, at the time of measurement, the estimate is updated based on the a similar structure as that in the current estimator (4.4), and between measurements, the estimate is updated based the on the system model as that in (4.3). Define X (k) as the prior estimate of the state at the time instant k before using the current measurement Y (k), and Xˆ (k) as the updated estimate using X (k) and Y (k) at the time instant k. It follows Xˆ (k) = X (k) + K(k)(Y (k) − H (k)X (k)) X (k) =  Xˆ (k) + U (k)

(4.99) (4.100)

where K(k) is the observer gain, which is usually designed using the eigenvalue assignment method. In the Kalman filtering setting, K(k) is obtained based on the minimization of the least squares loss function. To this end, similarly to the recursive least squares Algorithm (4.54), we have K(k) = P(k)H T diag{Ri }

(4.101)

T } being the estimate covariance with P(k) = E{(x(k) − x(k))(x(k) ˆ − x(k)) ˆ

P(k) = [M −1 (k) + H T diag{Ri }−1 H ]−1  −1 n  −1 −1 T = M(k) + Hi (k)Ri (k)Hi (k)

(4.102)

i=1

where M(k) is the estimate covariance for x(k). ¯ It then follows from (4.100) that T M(k) = E{(x(k) − x(k))(x(k) ¯ − x(k)) ¯ } = P(k − 1)T

(4.103)

To this end, the Kalman filtering algorithm for the multiagent system in (4.94) and (4.95) can be summarized as follows: • Measurement update (at the measurement time k = 1, 2, . . . ,)

4.5 Distributed Kalman Filtering Algorithm

135

 Xˆ (k) = X (k) + P(k)

n 

HiT (k)Ri−1 (k)yi (k)

i=1



n 



HiT (k)Ri−1 (k)Hi (k)X (k)

(4.104)

i=1

• Time update (between the measurements) X (k) =  Xˆ (k − 1) + U (k − 1)

(4.105)

where P(k) is given by (4.102), M(k) is given by (4.103), and the initial con n −1 ditions can be given by P(0) = HiT (0)Ri−1 (0)Hi (0) and x(0) ˆ = P(0) i=1  n  −1 T i=1 Hi (0)Ri (0)yi (0) . Remark 4.13 Apparently, the Kalman filtering algorithm in (4.102) to (4.103) is a centralized one. A information fusion center is needed to handle information from all observers for the computation. In what follows, motivated by the distributed recursive least squares algorithm in Sect. 4.4, we propose a distributed Kalman filtering for each observer the distributed computation of terms like n enabling n algorithm −1 −1 T T H (k)R (k)H (k), H (k)R i i=1 i i=1 i i i (k)yi (k), , and U (k) through information sharing among locally connected observers. ♦ Distributed Kalman Filtering Algorithm: For the ith observer, let X i be its estimate of X at the time instant k before using the current measurement, Xˆi be its updated n ˆ HiT (k)Ri−1 (k)Hi (k), Yˆ i (k) be the estimate of X , Hi (k) be the estimate of i=1 n −1 T ˆ i (k) is the estimate of , and estimate of i=1 Hi (k)Ri (k)yi (k) by sensor i,  ˆ i (k) be the estimate of U (k). The proposed distributed Kalman filtering algorithm is as follows: • Measurement update (at the measurement time k = 1, 2, . . . ,)

 Xˆi (k) = X i (k) + Pi (k) Yˆ i (k) − Hˆ i (k)X i (k)

−1 Pi (k) = Mi (k)−1 + Hˆ i (k) H T (k)Ri−1 Hi (k) Hˆ i (k) = Hi (k) + i i wˆ 1i (k)  1 Hi (k + 1) = Hi (k) + (Hˆ j (k) − Hˆ i (k)) 1 + di j∈N

(4.106) (4.107) (4.108) (4.109)

i

H T (k)Ri−1 yi (k) Yˆ i (k) = Y i (k) + i i wˆ 1i (k) 1  ˆ Y i (k + 1) = Y i (k) + (Y j (k) − Yˆ i (k)) 1 + di j∈N i

(4.110) (4.111)

136

4 Emergent Behavior Detection in Multiagent Systems

i where wˆ 1i (k) is updated using (4.77). • Time update (between the measurements)

ˆ i (k − 1)Xˆi (k − 1) + ˆ i (k) X i (k) =  (4.112) T ˆ ˆ (4.113) Mi (k) = i (k − 1)Pi (k − 1)i (k − 1) 1  ˆ i (k) =  ˆ i (k − 1) + ˆ j (k − 1) −  ˆ i (k − 1)) (4.114)  ( 1 + di j∈N i

i u i (k) ˆ i (k) =  i (k) + i wˆ 1i (k) 1   i (k + 1) =  i (k) + (ˆ j (k) − ˆ i (k)) 1 + di j∈N

(4.115) (4.116)

i

where the initial conditions Hˆ i (0) =

1 w1i

HiT (0)Ri−1 (0)Hi (0),

1 T H (0)Ri−1 (0)yi (0), Yˆ i (0) = w1i i ⎡

0

⎢ .. ⎢ . ⎢ ⎢ ∗ 1 1 ⎢ ⎢ .. ˆ i (0) = ⎢ . w1i ⎢ ⎢ ∗ 2 ⎢ ⎢ .. ⎣ .





0 .. .



⎢ ⎥ ⎥ ⎢ ⎥ ⎥ ⎢ ⎥ ⎥ ⎢∗1 u ∗1 (0)⎥ ⎥ ⎢ ⎥ ⎥ 1 ⎢ ⎥ ˆ ⎥ .. ⎢ ⎥ , i (0) = ⎥ (4.117) . ⎥ ⎥ w1i ⎢ ⎢∗ u ∗ (0)⎥ ⎥ ⎢ 2 2 ⎥ ⎥ ⎢ ⎥ ⎥ .. ⎣ ⎦ ⎦ . 0 0

where ∗∗ and ∗∗ u ∗∗ (0) are the matrices from agents covered by the ith observer (∗∗ ∈ Ni ). Remark 4.14 In the proposed Kalman filtering algorithm, the conver n distributed gence analysis of Hˆ i (k) to i=1 HiT (k)Ri−1 (k)Hi (k) in (4.109) and (4.110) is same n as that in Sect. 4.4. For the convergence of Yˆ i (k) to i=1 HiT (k)Ri−1 (k)yi (k), same analysis can be done. ♦ Remark 4.15 Equation (4.114) is used to estimate the overall multiagent system dynamics . Similarly, this is an application of the average consensus algorithm with the scaled initial conditions in (4.117). Specifically with the use of w1i in (4.117), ˆ i (k) will converge to nj=1 w1 j  ˆ i (0) = . In the case of w1i is not known before hand, it can always be estimated in advance using (4.77). Similarly, the algorithm in (4.115) and (4.116) provides a distributed estimation of U (k) by each observer. ♦

4.5 Distributed Kalman Filtering Algorithm

137

Example 4.7 Let us consider the system model in Example 4.6. Each agent has the model in (4.96) with N = 5, and T = 0.1. Three observers (n = 3) are with corresponding measurement matrices in (4.97), and their communication agency matrix is given by ⎡ ⎤ 010 A = ⎣0 0 1⎦ 110 Clearly, as computed in Example 4.4, we have ⎡

⎤ 0.5 0.5 0 F = ⎣ 0 0.5 0.5 ⎦ 1/3 1/3 1/3 and the left eigenvector corresponding to its eigenvalue λ1 = 1 is w1 = [0.2222, 0.4444, 0.3333]T In the simulation, initial states of all agents are zero, and agents are controlled to follow square wave reference inputs ⎧ i, ⎪ ⎪ ⎨ 0, ri (k) = i, ⎪ ⎪ ⎩ 0,

1 ≤ k < 1000 1001 ≤ k < 2000 2001 ≤ k < 3000 3001 ≤ k ≤ 4000

where i = 1, . . . , 5. The corresponding control signals for five agents are u i (k) = −Kxi (k) + K



1 r (k), i = 1, . . . , 5 0 i

where control gain K is obtained using eigenvalue assignment method for i , i based on the desired eigenvalues 0.8 ± 0.25 j. R1 = 0.01I2 , R2 = 0.01I4 , and R3 = 0.01I4 . Figures 4.19, 4.20, 4.21, 4.22, and 4.23 illustrate the corresponding estimates of xi1 (k) by three observers, respectively. Furthermore, we define the norm of estimation error % & 5  2 &  Xi(2 j−1) (k)

 '   E i (k) =  Xi(2 j)(k) − xi (k) , i = 1, 2, 3 j=1

The time history of E i (k) is shown in Fig. 4.24, which validates the performance of the proposed distributed Kalman filtering algorithm. ♦

138 Fig. 4.19 Agent 1 state x11 (k) and the corresponding estimates by three observers

4 Emergent Behavior Detection in Multiagent Systems Agent state x11 and the corresponding estimates by three observers

2

Agent 1 Observer 1 Observer 2 Observer 3

1.5

1

0.5

0

-0.5 0

5

10

15

20

25

30

35

40

Time (sec)

Fig. 4.20 Agent 2 state x21 (k) and the corresponding estimates by three observers

Agent state x21 and the corresponding estimates by three observers 3 Agent 2 Observer 1 Observer 2 Observer 3

2.5 2 1.5 1 0.5 0 -0.5 0

5

10

15

20

25

30

35

40

Time (sec)

Remark 4.16 The proposed distributed leasts squares algorithm and distributed Kalman filtering algorithm may be extended to handle the unpredictable changes of the sensing/communication topologies. In such a case, the estimation of the corresponding left eigenvector under the new topology has to be redone to capture the unexpected change of link connectivity. To do so, we could redesign the estimator in (4.77) such that it periodically resets initial values to wˆ 1i (0) = [0, . . . , 1, . . . , 0]T in order to recapture the possibly changing sensing/communication topology among agents. In other words, the following eigenvector estimator can be used

4.5 Distributed Kalman Filtering Algorithm Fig. 4.21 Agent 3 state x21 (k) and the corresponding estimates by three observers

139

Agent state x 31 and the corresponding estimates by three observers 4 Agent 3 Observer 1 Observer 2 Observer 3

3.5 3 2.5 2 1.5 1 0.5 0 -0.5 -1 0

Fig. 4.22 Agent 4 state x41 (k) and the corresponding estimates by three observers

5

10

15

20

25

30

35

40

Time (sec) Agent state x41 and the corresponding

estimates by three observers 5 Agent 4 Observer 1 Observer 2 Observer 3

4 3 2 1 0 -1 0

5

10

15

20

25

30

35

40

Time (sec)

wˆ 1i (k + 1) = wˆ 1i (k) +

1  j ai j (wˆ 1 (k) − wˆ 1i (k)), 1 + di j∈N i

for k ∈ [τ ℵ, (τ + 1)ℵ) i wˆ 1 (τ ℵ) = [0, . . . , 1, . . . , 0]T where the integer τ = 0, 1, . . . , and the integer ℵ is the period of resetting.

(4.118) (4.119) ♦

140

4 Emergent Behavior Detection in Multiagent Systems

Agent state x 51 and the corresponding estimates by three observers 6 Agent 5 Observer 1 Observer 2 Observer 3

5 4 3 2 1 0 -1 0

5

10

15

20

25

30

35

40

Time (sec) Fig. 4.23 Agent 5 state x51 (k) and the corresponding estimates by three observers Norm of Estimation errors 14 Observer 1 Observer 2 Observer 3

12 10 8 6 4 2 0 0

5

10

15

20

25

Time (sec)

Fig. 4.24 Norm of estimation errors

30

35

40

4.5 Distributed Kalman Filtering Algorithm

141

Fig. 4.25 Communication topologies: (A1 : top, A2 : bottom)

Example 4.8 Assume that four agents switch their communication topologies according to the following graphs (Fig. 4.25). The corresponding adjacency matrices are ⎡

0 ⎢1 A1 = ⎢ ⎣0 1

0 0 1 0

1 0 0 0

⎡ ⎤ 0 01 ⎢0 0 0⎥ ⎥ , A2 = ⎢ ⎣0 0 1⎦ 0 10

0 1 0 0

⎤ 0 0⎥ ⎥, 1⎦ 0

and the system matrices are ⎡

⎤ 0.5000 0 0.5000 0 ⎢ 0.5000 0.5000 0 0 ⎥ ⎥, F1 = ⎢ ⎣ 0 0.3333 0.3333 0.3333 ⎦ 0.5000 0 0 0.5000

⎡ ⎢ F2 = ⎢ ⎣

⎤ 0.5000 0.5000 0 0 0 0.5000 0.5000 0 ⎥ ⎥ 0 0 0.5000 0.5000 ⎦ 0.5000 0 0 0.5000

The left eigenvectors to be estimated for F1 and F2 are ⎡

w1,F1

⎡ ⎤ ⎤ 0.3636 0.25 ⎢ 0.1818 ⎥ ⎢ 0.25 ⎥ ⎢ ⎥ ⎥ =⎢ ⎣ 0.2727 ⎦ , w1,F2 = ⎣ 0.25 ⎦ 0.1818 0.25

In the simulation, for k ∈ [0, 100), four agents assume communication topology A1 ; for k ∈ [100, 200), four agents assume communication topology A2 ; and for k ≥ 200, four agents assume communication topology A1 again. The period for

142 Fig. 4.26 Estimates of w11 by four agents

4 Emergent Behavior Detection in Multiagent Systems Estimates of w1 by four agents 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0

50

100

150

200

250

300

Time steps

Fig. 4.27 Estimates of w12 by four agents

Estimates of w2 by four agents 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0

50

100

150

200

250

300

Time steps

resetting initial values of estimators in (4.118) is ℵ = 50. The estimation results are depicted in Figs. 4.26, 4.27, 4.28, and 4.29. It can be seen from Fig. 4.26 that for k ∈ [0, 100), the estimates of w11 by four agents converge to 0.3636, which is the first component of w1,F1 ; for k ∈ [100, 200), the estimates of w11 by four agents converge to 0.25, which is the first component of w1,F2 ; and k ∈ [200, 300), the estimates of w11 by four agents converge to 0.3636, which is the first component of w1,F1 . It should also note that every 50 steps, the initial values of estimators are reset no matter whether there is a change of communication topology or not. This is needed in order to capture the possible communication changes. ♦

Fig. 4.28 Estimates of w13 by four agents

Fig. 4.29 Estimates of w14 by four agents

4.6 Summary

In this chapter, we presented several distributed estimation algorithms for emergent behavior detection and identification of interaction topologies in multiagent systems. These results follow from our recent original work in [12–15], with further extensions and improved convergence analyses. Specifically, different from those in [12–15], the distributed estimation algorithm for time-varying signals in Sect. 4.3 and the distributed least squares algorithms in Sect. 4.4 are extended to handle the case of directed and strongly connected sensing/communication topologies by introducing


the distributed estimation of a normalized left eigenvector w1, and more rigorous convergence analyses were also provided. In addition, both a distributed recursive least squares algorithm and a distributed iterative least squares algorithm were designed in Sect. 4.4. The distributed Kalman filtering algorithm in Sect. 4.5 is new, and a simulation example was provided to validate its effectiveness.

References

1. Strogatz, S.H.: Exploring complex networks. Nature 410, 268–276 (2001)
2. Newman, M.E.J.: Networks: An Introduction. Oxford University Press, Oxford (2010)
3. Weiss, G.: Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. The MIT Press, Cambridge, Massachusetts (1999)
4. Ren, W., Beard, R.W.: Distributed Consensus in Multi-vehicle Cooperative Control. Springer-Verlag, London (2008)
5. Qu, Z.: Cooperative Control of Dynamical Systems. Springer-Verlag, London (2009)
6. Bullo, F., Cortés, J., Martínez, S.: Distributed Control of Robotic Networks. Applied Mathematics Series. Princeton University Press (2009). Electronically available at http://coordinationbook.info
7. Saber, R.O., Fax, J.A., Murray, R.M.: Consensus and cooperation in networked multi-agent systems. Proc. IEEE 95, 215–233 (2007)
8. Spanos, D.P., Olfati-Saber, R., Murray, R.M.: Distributed sensor fusion using dynamic consensus. In: IFAC World Congress, Prague, Czech Republic (2005)
9. Freeman, R.A., Yang, P., Lynch, K.M.: Stability and convergence properties of dynamic consensus estimators. In: Proceedings of the IEEE International Conference on Decision and Control, pp. 338–343 (2006)
10. Astrom, K.J., Wittenmark, B.: Adaptive Control. Addison-Wesley, Reading, MA (1995)
11. Strang, G.: Linear Algebra and Learning from Data. Wellesley-Cambridge Press, Wellesley, MA (2019)
12. Wang, J., Ahn, I.S., Lu, Y., Yang, T., Staskevich, G.: A distributed estimation algorithm for collective behaviors in multiagent systems with applications to unicycle agents. Int. J. Control Autom. Syst. 15, 2829–2839 (2017)
13. Wang, J., Ahn, I.S., Lu, Y., Staskevich, G.: A new distributed algorithm for environmental monitoring by wireless sensor networks with limited communication. In: 2016 IEEE Sensors, Orlando, FL (2016)
14. Wang, J., Ahn, I.S., Lu, Y., Yang, T., Staskevich, G.: A distributed least-squares algorithm in wireless sensor networks with limited communication. In: 17th IEEE International Conference on Electro Information Technology (EIT), Lincoln, Nebraska, May 2017
15. Wang, J., Ahn, I.S., Lu, Y., Yang, T., Staskevich, G.: A distributed least-squares algorithm in wireless sensor networks with limited and unknown communications. Int. J. Handheld Comput. Res. 8, 15–36 (2017)

Chapter 5

Distributed Task Coordination of Multiagent Systems

5.1 Task Coordination as a Control Problem

Distributed control of agent behaviors and networked optimization are of paramount importance in the study of multiagent system emergent behaviors. One of the fundamental issues is how to design local and distributed control to coordinate the individual agent's behavior such that the desired group behavior emerges. In this chapter, we treat the multiagent task coordination problem as a distributed cooperative control problem by specifically focusing on a class of multiagent systems with nonlinear dynamics and/or with model uncertainties. Consider multiagent systems with the following nonlinear dynamics
$$\dot{x}_i = f_i(x_i(t), u_i(t), t), \qquad (5.1)$$

where i = 1, . . . , n, x_i(t) ∈ ℝ^q is the state, u_i ∈ ℝ^m is the control input to be designed, and f_i is piecewise continuous in t and Lipschitz in x_i on ℝ^q. The main objective of this chapter is to design a distributed control for (5.1) by using local information so as to achieve certain cooperative behaviors for the overall system, such as consensus, rendezvous, and formation control. In essence, the study of various cooperative behaviors can be recast as cooperative stability issues [1]. The cooperative stabilization problem for (5.1) can be solved using the linear consensus algorithms discussed in Sect. 3.2, under the condition that (5.1) is feedback linearizable. However, some nonlinear systems are not feedback linearizable, and a new design has to be studied. The following example shows such a case.

Example 5.1 Consider the formation stabilization problem for a group of three differential-drive mobile robots with the following kinematic model
$$\dot{x}_i = v_i\cos\theta_i, \quad \dot{y}_i = v_i\sin\theta_i, \quad \dot{\theta}_i = \omega_i \qquad (5.2)$$



Fig. 5.1 Initial configurations of three robots

where (x_i, y_i)^T ∈ ℝ² is the robot's position in the 2D plane, θ_i is the orientation angle, v_i ∈ ℝ is the driving velocity, and ω_i ∈ ℝ is the steering velocity. The control objective is to drive the robots into an equilateral triangle formation with the same orientation. Define
$$g_{i1} = \begin{bmatrix} \cos\theta_i \\ \sin\theta_i \\ 0 \end{bmatrix}, \quad g_{i2} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad f_i = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
It can be seen that the vector fields {g_{i1}, g_{i2}, [f_i, g_{i1}], [f_i, g_{i2}]} are not linearly independent. Thus, the system in (5.2) is not input-state feedback linearizable [2]. The problem may be partially and approximately solved via input/output linearization [3]. That is, as discussed in Sect. 1.5, let us choose a reference point along the body orientation at a distance b ≠ 0 from the guidepoint, and define its coordinates as
$$z_{i1} = x_i + b\cos\theta_i, \quad z_{i2} = y_i + b\sin\theta_i$$
Letting
$$\begin{bmatrix} v_i \\ \omega_i \end{bmatrix} = \begin{bmatrix} \cos\theta_i & \sin\theta_i \\ -\dfrac{\sin\theta_i}{b} & \dfrac{\cos\theta_i}{b} \end{bmatrix} \begin{bmatrix} u_{i1} \\ u_{i2} \end{bmatrix}$$
we obtain
$$\dot{z}_{i1} = u_{i1}, \quad \dot{z}_{i2} = u_{i2}, \quad \dot{\theta}_i = \frac{u_{i2}\cos\theta_i - u_{i1}\sin\theta_i}{b} \qquad (5.3)$$

in which the model in terms of z_{i1} and z_{i2} is the single-integrator one in (1.16). Let the coordinates defining the formation shape be
$$d_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad d_2 = \begin{bmatrix} 4 \\ 0 \end{bmatrix}, \quad d_3 = \begin{bmatrix} 2 \\ 2\sqrt{3} \end{bmatrix}$$

and the adjacency matrix (strongly connected) among the three robots be
$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$$
Using the linear consensus algorithm in Sect. 3.2, we have
$$u_{i1} = \sum_{j\in N_i} a_{ij}\,(z_{j1} - d_{j1} - z_{i1} + d_{i1}) \qquad (5.4)$$
$$u_{i2} = \sum_{j\in N_i} a_{ij}\,(z_{j2} - d_{j2} - z_{i2} + d_{i2}) \qquad (5.5)$$

Fig. 5.2 Final configurations of three robots

where d_i = [d_{i1}, d_{i2}]^T. Initial configurations for the three robots are [3, −2, −π/4]^T, [0, 6, π/3]^T, and [1, 8, π/6]^T, respectively. Figures 5.1 and 5.2 show the initial and

final configurations of the three robots, respectively. Apparently, the formation shape is achieved, but with no consensus on the orientation angles. This is due to the fact that in the linearized model (5.3), θ_i becomes the internal dynamics of the system, which are not controllable. ♦

In the rest of this chapter, we first present a general design method for distributed control of nonlinear multiagent systems, which will enable us to solve the full-state coordination problem of robots with kinematic constraints, and then we look into adaptive cooperative control for nonlinear multiagent systems with model uncertainties.

5.2 A General Design Method for Distributed Nonlinear Control

5.2.1 General Design

For the cooperative stabilization of nonlinear multiagent systems in (5.1), the design adopts an online replanning strategy for the desired state x_i(t_{k+1}) := x_i(k+1) along an infinite sequence of time instants {t_k}, k = 0, 1, . . ., where t_{k+1} − t_k = T with T being the sampling time. Control is then implemented in a sampled-data sense such that the state x_i(t_k) can be steered to x_i(t_{k+1}) in the finite time T. We assume that the sensing/communication network connectivity condition among agents is satisfied in terms of the existence of a spanning tree in the digraph G as defined in Sect. 3.1.1, or the sequentially complete sensing/communication matrix sequence as defined in Sect. 3.1.2. We further assume that introducing the sampling time sequence {t_k} into the design process does not breach the sensing/communication network connectivity condition. To this end, at time instant t_k, the planning for x_i(k+1) is produced based on x_i(k) and x_j(k), j ∈ N_i. Define the adjacency matrix A(k) = [a_{ij}(k)] at the time instant t_k. It follows that the planning algorithm for x_i(k+1) can simply be given by
$$x_i(k+1) = x_i(k) + \frac{1}{1+\sum_{j\in N_i} a_{ij}(k)} \sum_{j\in N_i} a_{ij}(k)\,\bigl(x_j(k) - x_i(k)\bigr) \qquad (5.6)$$


Once x_i(k+1) is available, together with a steering control u_i(t) = α_i(t), t ∈ [t_k, t_{k+1}), which drives the system state from x_i(k) to x_i(k+1) in the finite time T, we have the overall distributed feedback control of the form
$$u_i(t) = \alpha_i\bigl(x_i(k), a_{ij}(k)x_j(k), t\bigr), \quad t \in [t_k, t_{k+1}) \qquad (5.7)$$

It should be noted that function αi (·) is Lipschitz continuous in state variables, and piecewise continuous in t, and could be in open-loop form or feedback form. In the sequel, we have the following theorem.


Theorem 5.1 Consider the multiagent system (5.1) under the distributed control law (5.7). Assume that f_i(x_i, u_i, t) satisfies the following generalized Lipschitz condition in terms of x_i:
$$\|f_i(x_i, u_i, t) - f_j(x_j, u_j, t)\| \le L\,\|x_i - x_j\| \qquad (5.8)$$

where L > 0 is some Lipschitz constant. The closed-loop system is cooperatively stable if the following design conditions are satisfied.

(a) α_i(·) is Lipschitz continuous in the state variables and piecewise continuous in t. For any constant state x^{ss} ∈ ℝ^q, f_i(x^{ss}, α_i(x^{ss}, a_{ij}(k)x^{ss}, t), t) = 0 for t ∈ [t_k, t_{k+1}).
(b) For any given pair of initial and final conditions x_i(k) and x_i(k+1) on the time interval [t_k, t_{k+1}), α_i(·) is designed, in open-loop or feedback form, to transfer the ith agent from x_i(k) to x_i(k+1) in finite time.
(c) There exists some constant 0 < λ_1 ≤ 1 such that
$$\max_{i,j}\|x_i(k+1) - x_j(k+1)\| \le \lambda_1 \max_{i,j}\|x_i(k) - x_j(k)\| \qquad (5.9)$$
(d) There exists an infinite subsequence {k_v : v = 0, 1, . . .} such that
$$\max_{i,j}\|x_i(k_{v+1}) - x_j(k_{v+1})\| \le \lambda_2 \max_{i,j}\|x_i(k_v) - x_j(k_v)\| \qquad (5.10)$$

where 0 < λ_2 < 1.

Proof Condition (a) says that the control function α_i(·) needs to be designed such that any constant state x^{ss} can be an equilibrium point of the multiagent system, which is the essential feature of any cooperatively controlled multiagent system. Condition (5.9) implies that under the steering control α_i(·) the distance between x_i(k+1) and x_j(k+1) is nonincreasing over time for all i, j, and condition (5.10) further establishes the convergence x_i(k) − x_j(k) → 0 as k → ∞. This is apparent since
$$\max_{i,j}\|x_i(k) - x_j(k)\| \le \lambda_1^{k-k_v}\max_{i,j}\|x_i(k_v) - x_j(k_v)\| \le \lambda_1^{k-k_v}\lambda_2^{k_v-k_0}\max_{i,j}\|x_i(k_0) - x_j(k_0)\| \le \lambda_1^{k-k_v+k_0}\lambda_2^{k_v-k_0}\max_{i,j}\|x_i(0) - x_j(0)\| \qquad (5.11)$$

where k_v is the nearest element of the subsequence smaller than k. Now, we need to show the convergence of x_i(t) − x_j(t) for any t under the steering control α_i(·). It follows from (5.1) and (5.7) that for t ∈ [t_k, t_{k+1}),

$$x_i(t) = x_i(k) + \int_{t_k}^{t} f_i\bigl(x_i(\tau), \alpha_i(x_i(k), a_{ij}x_j(k), \tau), \tau\bigr)\,d\tau \qquad (5.12)$$


Thus, using condition (5.8), we have
$$\|x_i(t) - x_j(t)\| \le \|x_i(k) - x_j(k)\| + \int_{t_k}^{t} \bigl\| f_i\bigl(x_i(\tau), \alpha_i(x_i(k), a_{ij}x_j(k), \tau), \tau\bigr) - f_j\bigl(x_j(\tau), \alpha_j(x_j(k), a_{ji}x_i(k), \tau), \tau\bigr)\bigr\|\,d\tau \le \|x_i(k) - x_j(k)\| + \int_{t_k}^{t} L\,\|x_i(\tau) - x_j(\tau)\|\,d\tau$$

To this end, application of the Gronwall–Bellman inequality [4] results in
$$\|x_i(t) - x_j(t)\| \le \|x_i(k) - x_j(k)\|\,e^{L(t-t_k)} \le \|x_i(k) - x_j(k)\|\,e^{LT} \qquad (5.13)$$

Thus, it follows from (5.13) that since ‖x_i(k) − x_j(k)‖ converges to zero, so does ‖x_i(t) − x_j(t)‖. This completes the proof. ∎

Remark 5.1 As shown in Chap. 3, the planning algorithm in (5.6) satisfies conditions (5.9) and (5.10). The choice of α_i relies on the system dynamics. ♦

Example 5.2 Let us reconsider the consensus control of multiagent systems with the single-integrator dynamics given by ẋ_i = u_i, where x_i ∈ ℝ and u_i ∈ ℝ. The standard consensus control is of the form
$$u_i(t) = \sum_{j\in N_i} a_{ij}(t)\,\bigl(x_j(t) - x_i(t)\bigr) \qquad (5.14)$$

We may use the result in Theorem 5.1 to redesign the control. Given x_i(k+1) and x_i(k), a simple choice is α_i = (x_i(k+1) − x_i(k))/T for t ∈ [t_k, t_{k+1}). Thus, using the planning algorithm in (5.6) renders the new consensus control for t ∈ [t_k, t_{k+1})
$$u_i(t) = \frac{1}{T\bigl(1+\sum_{j\in N_i} a_{ij}(k)\bigr)} \sum_{j\in N_i} a_{ij}(k)\,\bigl(x_j(k) - x_i(k)\bigr) \qquad (5.15)$$

Apparently, the advantage of using (5.15) is that it requires less communication among agents. Simulation results for three agents are given in Figs. 5.3 and 5.4. The adjacency matrix is
$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$$
and T = 0.5.
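To make the comparison concrete, the sketch below simulates the standard consensus law (5.14) and the sampled-data redesign (5.6)/(5.15) side by side in Python. The three-agent ring topology and T = 0.5 follow the setting above, while the initial states and the integration step are assumed values used only for illustration.

```python
import numpy as np

A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)   # adjacency matrix of the three agents
x0 = np.array([3.0, 5.0, 2.0])           # assumed initial states (illustrative)
T, dt, t_end = 0.5, 0.01, 10.0           # sampling period T and integration step dt
steps_per_T = int(round(T / dt))

def simulate(sampled):
    x = x0.copy()
    x_k = x.copy()                        # states exchanged at the last sampling instant t_k
    for step in range(int(t_end / dt)):
        if sampled and step % steps_per_T == 0:
            x_k = x.copy()                # neighbors communicate only at t_k = kT
        u = np.zeros(3)
        for i in range(3):
            if sampled:                   # control (5.15): piecewise constant between samples
                u[i] = A[i] @ (x_k - x_k[i]) / (T * (1.0 + A[i].sum()))
            else:                         # control (5.14): continuous neighbor feedback
                u[i] = A[i] @ (x - x[i])
        x = x + dt * u                    # forward-Euler integration of dot{x}_i = u_i
    return x

print(simulate(sampled=False))            # states under (5.14): near consensus at t = 10 s
print(simulate(sampled=True))             # states under (5.15): near consensus with far less communication
```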



Example 5.3 Consider the consensus control of multiagent systems with the double-integrator model


Fig. 5.3 System response under control (5.14)

Fig. 5.4 System response under control (5.15)


$$\dot{x}_{i1} = x_{i2}, \quad \dot{x}_{i2} = u_i \qquad (5.16)$$
where x_i = [x_{i1}, x_{i2}]^T. Given any x_i(k) and x_i(k+1), the input
$$u_i(t) = -B^T e^{A^T(t_{k+1}-t)}\,W_c^{-1}\bigl(e^{AT}x_i(k) - x_i(k+1)\bigr), \quad t \in [t_k, t_{k+1}) \qquad (5.17)$$
will transfer x_i(k) to x_i(k+1) at time t_{k+1}, where
$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
and
$$W_c = \int_0^T e^{A\tau} B B^T e^{A^T\tau}\,d\tau = \begin{bmatrix} \dfrac{T^3}{3} & \dfrac{T^2}{2} \\ \dfrac{T^2}{2} & T \end{bmatrix}$$
To this end, following the planning algorithm in (5.6), the control u_i(t) in (5.17) becomes
$$u_i(t) = -B^T e^{A^T(t_{k+1}-t)}\,W_c^{-1}\Bigl(e^{AT}x_i(k) - x_i(k) - \frac{1}{1+\sum_{j\in N_i} a_{ij}(k)} \sum_{j\in N_i} a_{ij}(k)\bigl(x_j(k) - x_i(k)\bigr)\Bigr) \qquad (5.18)$$

Simulation results for three agents are given in Figs. 5.5, 5.6 and 5.7. The initial conditions are x_1(0) = [3, 0]^T, x_2(0) = [2, 0]^T, and x_3(0) = [6, 0]^T; T = 0.2, and the adjacency matrix is
$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$$
♦
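A small numerical sketch of the finite-time steering law (5.17) for a double-integrator agent is given below; it builds W_c in closed form, applies the open-loop input over one sampling interval, and checks that the state reaches the planned target. The target state here is an illustrative value, not one produced by the planner in the figures.

```python
import numpy as np
from scipy.linalg import expm

T = 0.2
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Wc = np.array([[T**3 / 3, T**2 / 2],
               [T**2 / 2, T]])                     # controllability Gramian over [0, T]

def steering_input(t, x_k, x_k1, t_k):
    """Open-loop steering control (5.17) on [t_k, t_k + T]."""
    lam = expm(A.T * (t_k + T - t))
    return float(-B.T @ lam @ np.linalg.solve(Wc, expm(A * T) @ x_k - x_k1))

x_k = np.array([3.0, 0.0])                         # current sampled state x_i(k)
x_k1 = np.array([2.5, -0.5])                       # assumed planned target x_i(k+1) (illustrative)
dt, x, t = 1e-4, x_k.copy(), 0.0
while t < T - 1e-12:
    u = steering_input(t, x_k, x_k1, 0.0)
    x = x + dt * (A @ x + B.flatten() * u)         # Euler integration of (5.16)
    t += dt
print(x, "->", x_k1)                               # final state is close to the planned target
```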

5.2.2 Distributed Control of Nonholonomic Robots

In this section, we show how to use the general distributed control design method in Theorem 5.1 to solve the consensus control problem of multiple nonholonomic robots. Nonholonomic robots represent a general class of underactuated robotic systems with motion constraints [5, 6], in which the kinematic constraints impose a limit on the velocity space of the robots. As a result, certain velocity maneuvers have to be properly executed in order to reach any point in the configuration space of the robot. A typical example is parallel parking a vehicle. Define the vector of generalized coordinates q = [q_1, . . . , q_n]^T ∈ ℝ^n for the robot's configuration, and the generalized velocity q̇ = [q̇_1, . . . , q̇_n]^T. The first-order kinematic constraints (k constraints) can be expressed as

5.2 A General Design Method for Distributed Nonlinear Control

153

6 x 11 x 21

5.5

x 31

5 4.5 4 3.5 3 2.5 2 0

2

4

6

8

10

12

14

16

18

20

Time (sec)

Fig. 5.5 System response under control (5.18) 3 x 12

2.5

x 22 x 32

2 1.5 1 0.5 0 -0.5 -1 -1.5 -2

0

2

4

6

8

10

Time (sec)

Fig. 5.6 System response under control (5.18)

12

14

16

18

20

154

5 Distributed Task Coordination of Multiagent Systems 250 u1

200

u2 u3

150 100 50 0 -50 -100 -150 -200 0

2

4

6

8

10

12

14

16

18

20

Time (sec)

Fig. 5.7 Control inputs

A T (q)q˙ = 0

(5.19)

where A(q) = [a1 (q), . . . , ak (q)] ∈ n×k with ai (q) being a n-dimensional vector. The constraints in (5.19) become nonholonomic constraints if (5.19) is not integrable. Otherwise, they are holonomic constraints which render the geometric limitation on the configuration space of q. It follows from (5.19) that q˙ belongs to the null space of A(q), which is formed by a set of n − k linearly independent vector fields satisfying A T (q)G(q) = 0

(5.20)

where G(q) ∈ n×m has the rank m = n − k. Accordingly, the kinematic model for the nonholonomic robot can be obtained as q˙ = G(q)u

(5.21)

where u = [u 1 , . . . , u m ]T is the input. Example 5.4 Consider the differential-drive robot in Fig. 5.8. Given the configuration vector q = [x, y, θ ]T , where (x, y) is the coordinates of the guide point (center of the robot), and θ is the orientation angle of the robot, the rolling without slipping constraint is given by x˙ sin θ − y˙ cos θ = 0

(5.22)

5.2 A General Design Method for Distributed Nonlinear Control

155

Fig. 5.8 A differential-drive robot

⎡ cos θ

 A(q) = sin θ − cos θ 0 , G(q) = ⎣ sin θ 0

It follows

⎤ 0 0⎦ 1

and the kinematic model is ⎡

⎤ ⎡ ⎤ cos θ 0 q˙ = G(q)u = ⎣ sin θ ⎦ u 1 + ⎣0⎦ u 2 0 1

(5.23)

where u = [u 1 , u 2 ]T , u 1 is driving velocity input, and u 2 is the steering velocity input. ♦ Example 5.5 Consider the car-like robot. As shown in Fig. 5.9, the guide point is the midpoint of the rear axle, and the configuration vector is q = [x, y, θ, φ]T . There are two nonholonomic constraints corresponding to the front wheel and the back wheel, respectively. 

It follows

x˙ sin θ − y˙ cos θ = 0 d (x + l cos θ ) sin(θ + φ) − dt

d (y dt

+ l sin θ ) cos(θ + φ) = 0

(5.24)

156

5 Distributed Task Coordination of Multiagent Systems

Fig. 5.9 A car-like robot

⎡ cos θ   ⎢ sin θ sin(θ + φ) − cos(θ + φ) −l cos θ 0 A(q) = , G(q) = ⎢ ⎣ tan φ sin θ − cos θ 0 0 l 0

⎤ 0 0⎥ ⎥ 0⎦ 1

and the kinematic model is ⎡

⎤ ⎡ ⎤ cos θ 0 ⎢ sin θ ⎥ ⎢0⎥ ⎥ ⎢ ⎥ q˙ = G(q)u = ⎢ ⎣ tan φ ⎦ u 1 + ⎣0⎦ u 2 l 1 0

(5.25)

where u = [u 1 , u 2 ]T , u 1 is driving velocity input of the back wheels, and u 2 is the steering velocity input of the front wheels. ♦ To facilitate the design, we further convert the model in (5.21) into its canonical form (chained form) [7]. It is shown in [8] that there exist a diffeomorphic coordinate transformation Z = T1 (q) and a control mapping v = T2 (q)u, such that (5.21) can be converted into a m-input, (m − 1)-chain, single-generator chained form given by: z˙ 1 = v1 , z˙ j,i = z j,i+1 v1 , 2 ≤ i ≤ n j − 1, 1 ≤ j ≤ m − 1 (5.26) z˙ j,n j = v j+1 , where Z = [z 1 , Z 2 , . . . , Z m ]T ∈ R n with Z j = [z j−1,2 , . . . , z j−1,n j−1 ] (2 ≤ j ≤ m) are the substates, and v = [v1 , v2 , . . . , vm ]T are the inputs. Specifically, for the two-

5.2 A General Design Method for Distributed Nonlinear Control

157

input systems as shown in Examples 5.4 and 5.5, the chained form is given below z˙ 1 = v1 z˙ 2 = v2 z˙ 3 = z 2 v1 .. .

(5.27)

z˙ n = z n−1 v1 . Example 5.6 For the differential-drive robot model in (5.23), the following local coordinate transformation and input mapping z 1 = x, z 2 = tan(θ ), z 3 = y, u1 =

v1 , u 2 = v2 cos2 (θ ) cos θ

render the chained form z˙ 1 = v1 , z˙ 2 = v2 , z˙ 3 = z 2 v1

(5.28)

For the car-like robot model in (5.25), the following local coordinate transformation and input mapping
$$z_1 = x, \quad z_2 = \frac{\tan\phi}{l\cos^3\theta}, \quad z_3 = \tan\theta, \quad z_4 = y,$$
$$u_1 = \frac{v_1}{\cos\theta}, \quad u_2 = -\frac{3\sin\theta}{l\cos^2\theta}\sin^2(\phi)\,v_1 + l\cos^3\theta\,\cos^2(\phi)\,v_2$$

render the chained form z˙ 1 = v1 , z˙ 2 = v2 , z˙ 3 = z 2 v1 , z˙ 4 = z 3 v1

(5.29) ♦
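The coordinate change in Example 5.6 is easy to exercise in code: the sketch below maps a unicycle configuration into the chained coordinates of (5.28) and maps chained-form inputs (v1, v2) back to the physical velocity commands (u1, u2). It is a minimal illustration, valid away from θ = ±π/2 where cos θ vanishes.

```python
import numpy as np

def unicycle_to_chained(x, y, theta):
    """Coordinate transformation of (5.28): (x, y, theta) -> (z1, z2, z3)."""
    return np.array([x, np.tan(theta), y])

def chained_inputs_to_unicycle(v1, v2, theta):
    """Input mapping of (5.28): chained inputs (v1, v2) -> unicycle inputs (u1, u2)."""
    u1 = v1 / np.cos(theta)              # driving velocity
    u2 = v2 * np.cos(theta) ** 2         # steering velocity
    return u1, u2

# Quick consistency check: z3' = z2 * v1 should equal y' = u1 * sin(theta).
theta, v1, v2 = 0.3, 0.7, -0.2
u1, _ = chained_inputs_to_unicycle(v1, v2, theta)
print(np.tan(theta) * v1, u1 * np.sin(theta))   # the two values agree
```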

Consider the chained form system in (5.27). The open-loop steering control can be derived to steer the system from the initial configuration Z (0) = [z 1 (0), . . . , z n (0)]T to the final configuration Z (T ) = [z 1 (T ), . . . , z n (T )]T in a finite time T . There are generally three basic steering control methods: sinusoidal steering control, piecewiseconstant steering control, and polynomial steering control [7, 9]. The sinusoidal steering control is of the form v1 (t) = c0 + c1 sin(ωt) v2 (t) = b0 + b1 cos(ωt) + · · · + bn−2 cos((n − 2)ωt)

(5.30)

158

5 Distributed Task Coordination of Multiagent Systems

which can drive the chained system (5.27) from Z(0) to Z(T) in T = 2π/ω, where ω is a given angular frequency, and the unknown coefficients c_0, c_1, b_0, . . . , b_{n−2} can be solved for by simply integrating the system equations. For example, for n = 3 with the chained system in (5.28), we have v_1(t) = c_0 + c_1 sin(ωt), v_2(t) = b_0 + b_1 cos(ωt), and direct integration leads to
$$z_1(t) = z_1(0) + c_0 t + \frac{c_1(1-\cos(\omega t))}{\omega},$$
$$z_2(t) = z_2(0) + b_0 t + \frac{b_1\sin(\omega t)}{\omega},$$
$$z_3(t) = z_3(0) + z_2(0)c_0 t + \frac{c_0 b_0}{2}t^2 + \frac{c_0 b_1}{\omega^2}(1-\cos(\omega t)) + \frac{c_1 z_2(0)}{\omega}(1-\cos(\omega t)) + \frac{c_1 b_0}{\omega^2}(\sin(\omega t) - \omega t\cos(\omega t)) + \frac{c_1 b_1}{\omega}\Bigl(\frac{t}{2} - \frac{\sin(2\omega t)}{4\omega}\Bigr)$$
Using the boundary conditions, we obtain
$$c_0 = \frac{z_1(T)-z_1(0)}{T}, \quad b_0 = \frac{z_2(T)-z_2(0)}{T}, \quad b_1 = 2b_0 + \frac{2\omega\bigl(z_3(T)-z_3(0)-z_2(0)c_0 T - c_0 b_0 T^2/2\bigr)}{c_1 T} \qquad (5.31)$$
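As a sanity check on (5.31), the sketch below computes c_0, b_0, b_1 for given boundary conditions and numerically integrates the chained system (5.28) under the sinusoidal inputs (5.30); the boundary values, T, and c_1 are arbitrary illustrative choices.

```python
import numpy as np

def sinusoidal_coefficients(z0, zT, T, c1=1.0):
    """Solve (5.31) for the n = 3 chained system (5.28) with omega = 2*pi/T."""
    w = 2 * np.pi / T
    c0 = (zT[0] - z0[0]) / T
    b0 = (zT[1] - z0[1]) / T
    b1 = 2 * b0 + 2 * w * (zT[2] - z0[2] - z0[1] * c0 * T - c0 * b0 * T**2 / 2) / (c1 * T)
    return c0, c1, b0, b1, w

def integrate_chained(z0, coeffs, T, steps=20000):
    """Euler integration of z1' = v1, z2' = v2, z3' = z2*v1 under the inputs (5.30)."""
    c0, c1, b0, b1, w = coeffs
    z, dt = np.array(z0, dtype=float), T / steps
    for k in range(steps):
        t = k * dt
        v1 = c0 + c1 * np.sin(w * t)
        v2 = b0 + b1 * np.cos(w * t)
        z += dt * np.array([v1, v2, z[1] * v1])
    return z

z0, zT, T = [0.0, 0.0, 0.0], [1.0, 0.5, 2.0], 2.0   # illustrative boundary conditions
coeffs = sinusoidal_coefficients(z0, zT, T)
print(integrate_chained(z0, coeffs, T), "->", zT)    # final state is close to Z(T)
```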

where c_1 ≠ 0 is a free parameter. The piecewise-constant steering control is based on a further partition of the time interval [0, T] into n − 1 subintervals [(k−1)T/(n−1), kT/(n−1)), where k = 1, . . . , n − 1. The control is of the form: for t ∈ [(k−1)T/(n−1), kT/(n−1)),
$$v_1(t) = \alpha_k, \quad v_2(t) = \beta_k,$$

(5.32)

where αk and βk are constants, which again can be solved by integrating system equations. Example 5.7 Consider the chained system in (5.28). The piecewise-constant steering control is given by v1 (t) = α1 , v2 (t) = β1 , t ∈ [0, T /2) v1 (t) = α2 , v2 (t) = β2 , t ∈ [T /2, T ] It follows

(5.33)

z 1 (T /2) = z 1 (0) + α1 T2 z 2 (T /2) = z 2 (0) + β1 T2 2 z 3 (T /2) = z 3 (0) + z 2 (0)α1 T2 + β1 α1 T8 z 1 (T ) = z 1 (T /2) + α2 T2 z 2 (T ) = z 2 (T /2) + β2 T2 2 z 3 (T ) = z 3 (T /2) + z 2 (T /2)α2 T2 + β2 α2 T8

For simplicity, we just let α1 = α2 = α, then using boundary conditions we obtain

5.2 A General Design Method for Distributed Nonlinear Control

α1 = α2 =    T β1 2 2 = 3αT β2 8

z 1 (T ) − z 1 (0) T T −1  2 αT 2 8

z 2 (T ) − z 2 (0) z 3 (T ) − z 3 (0) − z 2 (0)αT

159



(5.34)

♦ The polynomial steering control is of the form v1 (t) = c10 , v2 (t) = c20 + c21 t + . . . + c2(n−2) t n−2 ,

(5.35)

where c10 , c20 , c21 , . . ., and c2(n−2) are constants to be determined. It is easy to check that integrating chained system yields ⎧ z 1 (t) = z 1 (0) + c10 t ⎪ ⎪ ⎪ ⎪ c2(n−2) t n−1 c21 t 2 ⎪ ⎪ + ··· + ⎪ z 2 (t) = z 2 (0) + c20 t + ⎪ ⎨ 2 n−1 .. . ⎪ ⎪ ⎪ n−2 n−1 n−k n−k ⎪ n−2  ⎪ k!c10 c2k t n+k−1  c10 t ⎪ ⎪ + z k (0) z (t) = z (0) + ⎪ n n ⎩ (n + k − 1)! (n − k)! k=0 k=2

(5.36)

from which c10 , c20 , c21 , . . ., and c2(n−2) can be solved using the boundary conditions. Example 5.8 For the chained system in (5.29), we have ⎧ 1 (0) c10 = z1 (T )−z , ⎪ T ⎪ ⎤−1 ⎡ ⎪ ⎤ ⎡ ⎪ T2 T3 ⎪ T ⎪ c20 2 3 ⎪ ⎪ c T 2 c10 T 3 c10 T 4 ⎥ ⎪ ⎨ ⎣ c21 ⎦ = ⎢ ⎣ 102 6 4 212 5 ⎦ 2 2 c10 T 3 c10 T c10 T c22 ⎪ 24 60 ⎤ ⎡6 ⎪ ⎪ ⎪ z 2 (T ) − z 2 (0) ⎪ ⎪ ⎪ ⎦ ⎪ × ⎣ z 3 (T ) − z 3 (0) − c10 z 2 (0)T ⎪ ⎩ 2 z 2 (0)T 2 z 4 (T ) − z 4 (0) − c10 z 3 (0)T − 0.5c10 ♦ Now, let us consider the consensus control problem for multiagent systems in chained form. That is, each agent is of the form z˙ i1 = vi1 , z˙ i2 = vi2 , z˙ i3 = z i2 vi1 , .. .

z˙ in = z i(n−1) vi1 ,

(5.37)

160

5 Distributed Task Coordination of Multiagent Systems



where i = 1, . . . , q, z i = [z i1 , z i2 , . . . , z in ]T is the state, and vi = [vi1 , vi2 ]T is the control input. Using the sinusoidal steering input in (5.30), we have the following result. Theorem 5.2 Consider a multiagent system in chained form given by (5.37). Assume that the sensing/communication network connectivity condition among agents is satisfied in terms of the existence of a spanning tree in the digraph G with the adjacency matrix A = [ai j ]. Given the sampling time sequence {tk }, k = 0, 1, . . . , tk+1 − tk = T , the distributed cooperative steering control for the ith agent is given by: for t ∈ [tk , tk+1 ) k k + ci2 sin(ω(t − tk )) vi1 (t) = ci1 k k k vi2 (t) = bi1 + bi2 cos(ω(t − tk )) + bi3 cos(2ω(t − tk )) + · · · k +bi,n−1 cos((n − 2)ω(t − tk ))

(5.38) (5.39)

k k , ci2 = 0 is a constant, ci1 and bilk for l = 1, . . . , n − 1 can be obtained where ω = 2π T by integrating (5.37) under steering controls (5.38) and (5.39) and then solving the generated n linear algebra equations based on the boundary conditions z i (k) and z i (k + 1), where z i (k + 1) is specified by the following planning algorithm

z i (k + 1) = z i (k) +

1+





1 j∈Ni

ai j (k)

ai j (k)(z j (k) − z i (k))

(5.40)

j∈Ni

The resulting closed-loop system is cooperative stable. Proof The proof directly follows from theorem 5.1.



Example 5.9 To illustrate the algorithm in Theorem 5.2, we consider the consensus control of differential-drive robots given in Example 5.6. For the ith robot, x˙i = u i1 cos θi , y˙i = u i1 sin θi , θ˙i = u i2 , and cz i1 = xi , z i2 = tan θi , z i3 = yi , vi1 u i1 = , u i2 = vi2 cos2 θi , cos θi Let the control be: for t ∈ [tk , tk+1 ), k k + ci2 sin(ω(t − tk )), vi1 (t) = ci1 k k vi2 (t) = bi1 + bi2 cos(ω(t − tk )),

Following the results in (5.31) and (5.40), we obtain

5.2 A General Design Method for Distributed Nonlinear Control

161

3 x1 x2

2.5

x3

2

1.5

1

0.5

0 0

1

2

3

4

5

6

7

8

9

10

Time (sec)

Fig. 5.10 System state responses xi (t)

k ci1 =

1 T (1+ j∈N ai j (k)) i

j∈Ni

ai j (k)(z j1 (k) − z i1 (k))

= 0 any constant = T (1+ 1 ai j (k)) j∈Ni ai j (k)(z j2 (k) − z i2 (k)) j∈Ni  j∈Ni ai j (k)(z j3 (k)−z i3 (k)) k k k = 2bi1 + c2ω − z i2 (k)ci1 T− bi2 k (1+ ai j (k)) T

k ci2 k bi1

i2

j∈Ni

k k ci1 bi1 T 2



(5.41)

2

In the simulation, we consider the case of three agents with adjacency matrix given below ⎡ ⎤ 010 A = ⎣0 0 1⎦ 100 The initial configurations of three agents are [3, −2, − π4 ]T , [0, 6, π3 ]T , and [1, 8, π6 ]T , respectively. ci2 = 2, T = 0.5. Simulation results are shown in Figs. 5.10, 5.11, 5.12, 5.13 and 5.14. ♦ Example 5.9 simply illustrates the proposed distributed consensus algorithm in Theorem 5.2. With little modification, the formation stabilization problem can be solved at ease. Example 5.10 Let us revisit formation stabilization problem in Example 5.1. To achieve the formation in terms of [xi , yi ]T , and alignment in terms of θi , let us define zˆ i1 = z i1 − di1 and zˆ i3 = z i3 − di2 . Then we obtain

162

5 Distributed Task Coordination of Multiagent Systems 8 y1 y2

7

y3

6

5

4

3

2 0

1

2

3

4

5

6

7

8

9

10

Time (sec)

Fig. 5.11 System state responses yi (t) 1.2 theta 1

1

theta 2 theta 3

0.8 0.6 0.4 0.2 0 -0.2 -0.4 -0.6 -0.8

0

1

2

3

4

5

Time (sec)

Fig. 5.12 System state responses θi (t)

6

7

8

9

10

5.2 A General Design Method for Distributed Nonlinear Control

163

3 u 11 u 21

2

u 31

1

0

-1

-2

-3

-4 0

1

2

3

4

5

6

7

8

9

10

Time (sec)

Fig. 5.13 System control inputs u i1 (t) 50 u 12 u 22 u 32

0

-50

-100 0

1

2

3

4

5

Time (sec)

Fig. 5.14 System control inputs u i2 (t)

6

7

8

9

10

164

5 Distributed Task Coordination of Multiagent Systems 3 x1 x2 x3

2.5 2 1.5 1 0.5 0 -0.5 -1 -1.5 -2 0

1

2

3

4

5

6

7

8

9

10

Time (sec)

Fig. 5.15 System state responses xi (t)

z˙ˆ i1 = vi1 , z˙ i2 = vi2 , z˙ˆ i3 = z i2 vi1 Accordingly, the control coefficients in (5.41) become k ci1 =

1 T (1+ j∈N ai j (k)) i

j∈Ni

ai j (k)(ˆz j1 (k) − zˆ i1 (k))

= 0 any constant = T (1+ 1 ai j (k)) j∈Ni ai j (k)(z j2 (k) − z i2 (k)) j∈Ni  z j3 (k)−ˆz i3 (k)) j∈Ni ai j (k)(ˆ k k k − z i2 (k)ci1 T− bi2 = 2bi1 + c2ω k (1+ ai j (k)) T

k ci2 k bi1

i2

j∈Ni

k k ci1 bi1 T 2

 2

(5.42)

Use the same simulation setting as that in Example 5.1, and let T = 0.5 and ci2 = 2, we obtain the simulation results shown in Figs. 5.15, 5.16, 5.17 and 5.18. ♦ Remark 5.2 In Theorem 5.2, other steering control inputs such as piecewise constant inputs and polynomial inputs may be used. For instance, the following polynomial inputs for the differential-drive robots may be used k k k , vi2 (t) = ci2 + ci3 (t − tk ) vi1 (t) = ci1 k k k , ci2 , and ci3 are given by where constants ci1

5.2 A General Design Method for Distributed Nonlinear Control

165

9 y1 y2 y3

8

7

6

5

4

3

2 0

1

2

3

4

5

6

7

8

9

10

Time (sec)

Fig. 5.16 System state responses yi (t) 1.2 theta 1

1

theta 2 theta 3

0.8 0.6 0.4 0.2 0 -0.2 -0.4 -0.6 -0.8 0

1

2

3

4

5

Time (sec)

Fig. 5.17 System state responses θi (t)

6

7

8

9

10

166

5 Distributed Task Coordination of Multiagent Systems

Fig. 5.18 Final configurations

12 robot 1 robot 2 robot 3

10

8

6

4

2

0 -6

-4

-2

0

2

4

6

Final Configurations

k ci1 =



k ci2 k ci3

T (1 +

 =

T k T2 ci1 2



1



ai j (k)(z j1 (k) − z i1 (k)) ai j (k)) j∈N i −1 ⎡ j∈Ni ai j (k)(z j2 (k)−zi2 (k))

j∈Ni T2 2k T3 ci1 6



1+ j∈N ai j (k) i a j∈Ni i j (k)(z j3 (k)−z i3 (k)) 1+ j∈N ai j (k) i

k − ci1 z i2 (k)T

⎤ ⎦

k k However, it should be noted that in the computation of ci2 and cin , there will be a k achieving zero. singularity problem with the convergence of z i1 due to the value of ci1 To avoid this problem, we may introduce the notion of practical cooperative stability for z i1 by only considering the bounded consensus within a bounded neighborhood. In solving formation stabilization problem, this problem may be naturally avoided based on the desired formation shape. ♦

5.3 Distributed Tracking Control for a Class of Uncertain Nonlinear Systems In the coordination of multiagent behaviors, a fundamental task is to make the group of agent to track the dynamical behavior of an informed agent (leader). In this section, we present the results on the distributed coordinated tracking control for multiagent systems with model uncertainties. Specifically, we consider a class of uncertain nonlinear multiagent system governed by the following differential equations

5.3 Distributed Tracking Control for a Class of Uncertain Nonlinear Systems

x˙i = f i (xi ) + gi u i ,

167

(5.43)

where i ∈ {1, . . . , n} is the index for agent i and there are n agents in the group, xi ∈  is the state, u i ∈  is the control to be designed, and f i (xi ) is a smooth function of its argument with f (0) = 0 representing the unknown system dynamics, and gi = 0 is an unknown constant, which is referred to as the control coefficient. We assume that there is an informed agent whose dynamics are described by x˙0 = α0 x0 + r0 (t),

(5.44)

where constant α0 < 0, and r0 (t) is a piecewise-continuous bounded function of time. We further assume that α0 and r0 (t) are completely unknown to all agents, and x0 (t) may be sensed or communicated to some agents in the group. We aim to solve the distributed tracking problem of the informed leader (5.44) for uncertain multiagent systems (5.43) based on local information exchange among agents. The design of distributed control u i (t) relies on the information exchange among agents, which can be described using the adjacency matrix A = [ai j ]. For the design below, we assume that all agents have equal sensing/communication capabilities, that is, ai j = a ji and A is symmetric. Accordingly, the Laplacian matrix L is symmetric as well. We also assume that the informed agent state x0 (t) is available to at least one agent through sensing/communication detection, and this is described by a diagonal matrix B given below B = diag {bi0 } .

(5.45)

where bi0 > 0 means that agent i has the information x0 (t). As shown in Chapt. 3, we know that for any nonzero vector z = [z 1 , z 2 , . . . , z N ]T , z T Lz = 21 i, j ai j (z j − z i )2 ≥ 0. Thus L is positive semidefinite. In addition, if A is connected, it can be shown that L + B is positive definite since its all eigenvalues are positive as indicated by Gershgorin circle Theorem [10].

5.3.1 A Simple Case for f i (xi ) In this subsection, we consider a simple case in which the ith agent in (5.43) becomes x˙i = αi xi + u i

(5.46)

that is, f i (xi ) = αi xi and gi = 1, where αi is an unknown constant. For the informed agent in (5.44), we assume that constant α0 < 0 is unknown, and r0 (t) is a piecewisecontinuous bounded function of time parameterized by r0 (t) = φ T (t)w, where basis functions φ(t) = [φ1 (t), φ2 (t), . . . , φl (t)]T ∈ l are available to all agents, and parameters w = [w1 , w2 , . . . , wl ]T ∈ l are unknown constants.

168

5 Distributed Task Coordination of Multiagent Systems

Distributed Adaptive Control. For agent i, let αˆ i be the parameter estimate of αi∗ = α0 − αi , and wˆ i j be the estimate of w j . wˆ i = [wˆ i1 , . . . , wˆ il ]T . The control input for agent i is chosen to be u i = αˆ i xi + φ T (t)wˆ i

(5.47)

Defining the tracking error X˜ = [x˜1 , . . . , x˜n ]T = [x1 − x0 , . . . , xn − x0 ]T , and the parameter estimation errors α˜ i = αˆ i − αi∗ , w˜ i = wˆ i − w = [w˜ i1 , . . . , w˜ il ]T , the error equation for agent i can be derived as x˙˜i = α0 x˜i + α˜ i xi + φ T w˜ i

(5.48)

and the overall error dynamics for all agents are  Φ j w˜ ∗ j X˙˜ = α0 X˜ + X α˜ + l

(5.49)

j=1

where α˜ = [α˜ 1 , . . . , α˜ n ]T , w˜ ∗ j = [w˜ 1 j , . . . , w˜ n j ]T , X = diag[x1 , . . . , xn ], and Φ j = diag[φ j (t), . . . , φ j (t), . . . , φ j (t)]. We further define the consensus error ei = j∈Ni ai j (xi − x j ) + bi0 (xi − x0 ). Then it can be verified that (L + B) X˜ = [e1 , . . . , ei , . . . , en ]T

(5.50)

The adaptive laws for updating αˆ i (t) and wˆ i j (t) are given by α˙ˆ i = −Γα−1 xi ei i

(5.51)

w˙ˆ i j = −Γw−1 φ j ei ij

(5.52)

where i = 1, . . . , n, j = 1, . . . , l, Γai > 0, and Γwi j > 0. Stability Analysis. Let us consider a candidate for the Lyapunov function 1 1  1 ˜T Γαi (α˜ i )2 + Γw (w˜ i j )2 X (L + B) X˜ + 2 2 i=1 2 j=1 i=1 i j n

V =

l

n

(5.53)

The time derivative of V along the trajectories of (5.49), (5.51) and (5.52) is given by

5.3 Distributed Tracking Control for a Class of Uncertain Nonlinear Systems

169

  Γαi α˜ i α˙˜ i + Γwi j w˜ i j w˙˜ i j V˙ = X˜ T (L + B) X˙˜ + n



l

i=1

n

j=1 i=1

= X˜ T (L + B) ⎝α0 X˜ + X α˜ +

l  j=1



Φ j w˜ ∗ j ⎠ −

n 

α˜ i xi ei −

i=1

n l  

w˜ i j φ j ei

j=1 i=1

= α0 X˜ T (L + B) X˜ ≤ 0

(5.54)

which implies X˜ , αˆ i , wˆ i j ∈ L∞ . Also X˜ ∈ L2 and X˙ ∈ L∞ , which further implies that X˜ → 0 as t → ∞. Example 5.11 In this example, the proposed distributed adaptive control in (5.47), (5.51) and (5.52) is simulated for the following multiagent system with three agents x˙1 = 2x1 + u 1 , x˙2 = −2x2 + u 2 , x˙3 = 3x3 + u 3 and the informed agent is given by x˙0 = −x0 + r (t) with r (t) = 2 cos(t) + 3 sin(2t). The adjacency matrix A and the leader information matrix B are given by ⎡

⎤ ⎡ ⎤ 110 000 A = ⎣1 1 1⎦, B = ⎣0 1 0⎦ 011 000 Accordingly, the distributed adaptive control is as follows u i = αˆ i xi + wˆ i1 cos(t) + wˆ i2 sin(2t) with the adaptive laws ⎞ ⎛  α˙ˆ i = −Γαi xi ⎝ ai j (xi − x j ) + bi0 (xi − x0 )⎠ j

⎛ ⎞  w˙ˆ i1 = −Γwi1 cos(t) ⎝ ai j (xi − x j ) + bi0 (xi − x0 )⎠ j

⎛ ⎞  w˙ˆ i2 = −Γwi2 sin(2t) ⎝ ai j (xi − x j ) + bi0 (xi − x0 )⎠ j

Simulation results are given in Figs. 5.19, 5.20, 5.21, 5.22, 5.23 and 5.24, which illustrate the effectiveness of the proposed design.
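For completeness, a brief simulation sketch of this example is given below; it implements the control (5.47) with the adaptive laws (5.51)–(5.52) for the three agents above. The adaptation gains and integration step are assumed values, since not all simulation parameters are listed in the text.

```python
import numpy as np

alpha = np.array([2.0, -2.0, 3.0])            # true (unknown) plant parameters, used only to simulate
A = np.array([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])
b = np.array([0.0, 1.0, 0.0])                 # only agent 2 receives the leader state x0
gam_a, gam_w = 2.0, 2.0                       # assumed gains Gamma_{alpha i}, Gamma_{w ij}
dt, t_end = 1e-3, 50.0

x, x0 = np.array([0.5, -0.2, 0.3]), 0.0       # assumed initial states
a_hat, w_hat = np.zeros(3), np.zeros((3, 2))  # estimates hat{alpha}_i and hat{w}_i = [w_i1, w_i2]

for k in range(int(t_end / dt)):
    t = k * dt
    phi = np.array([np.cos(t), np.sin(2 * t)])                       # basis of r0(t)
    e = np.array([A[i] @ (x[i] - x) + b[i] * (x[i] - x0) for i in range(3)])
    u = a_hat * x + w_hat @ phi                                      # control (5.47)
    a_hat += dt * (-x * e / gam_a)                                   # adaptive law (5.51)
    w_hat += dt * (-np.outer(e, phi) / gam_w)                        # adaptive law (5.52)
    x += dt * (alpha * x + u)                                        # agent dynamics (5.46)
    x0 += dt * (-x0 + 2 * np.cos(t) + 3 * np.sin(2 * t))             # informed agent (5.44)

print(np.abs(x - x0))   # tracking errors |x_i - x0| should be small by t = 50 s
```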

170

5 Distributed Task Coordination of Multiagent Systems State responses 6 x1

5

x2 x3

4 3 2 1 0 -1 -2 -3

0

5

10

15

20

25

30

35

40

45

50

35

40

45

50

Time (sec)

Fig. 5.19 System state responses xi (t) Tracking errors \tilde{x}_i 6 \tilde{x}_{1} \tilde{x}_{2} \tilde{x}_{3}

5 4 3 2 1 0 -1 -2 -3 0

5

10

15

20

25

Time (sec)

Fig. 5.20 Tracking errors x˜i (t)

30

5.3 Distributed Tracking Control for a Class of Uncertain Nonlinear Systems

171

Control inputs 10 5 0 -5 -10 -15 -20 u1

-25

u2 u3

-30 -35

0

5

10

15

20

25

30

35

40

45

50

Time (sec)

Fig. 5.21 Control inputs u i (t) Parameter estimates \hat{\alpha}_i 4 {\hat\alpha}_{1} {\hat\alpha}_{2} {\hat\alpha}_{3}

2

0

-2

-4

-6

-8

-10 0

5

10

15

20

25

Time (sec)

Fig. 5.22 Parameter estimates αˆ i

30

35

40

45

50

172

5 Distributed Task Coordination of Multiagent Systems Parameter estimates \hat{w}_{i1} 4 \hat{w}_{11} \hat{w}_{21} \hat{w}_{31}

3

2

1

0

-1

-2 0

5

10

15

20

25

30

35

40

45

50

Time (sec)

Fig. 5.23 Parameter estimates wˆ i1 Parameter estimates \hat{w}_{i2} 4 \hat{w}_{12} \hat{w}_{22} \hat{w}_{32}

3

2

1

0

-1

-2

0

5

10

15

20

25

30

Time (sec)

Fig. 5.24 Parameter estimates wˆ i2

35

40

45

50

5.3 Distributed Tracking Control for a Class of Uncertain Nonlinear Systems

173

5.3.2 A General Case for Neural Network Parameterized f i (xi ) In this subsection, we consider a more general case in which f i (xi ) is of unknown nonlinear function. We use a linearly parameterized neural network to approximate f i (xi ), that is, f i (xi ) = ψiT (xi )θi + i

(5.55)

where basis functions ψi (xi ) = [ψi,1 , . . . , ψi,ni ] ∈ ni , θi = [θi,1 , . . . , θi,ni ]T ∈ ni are unknown constants representing weights of the neural network, and i is the neural network approximation error. Based on the universal approximation result for neural network [11], we assume that i is bounded by an unknown constant δi , that is, | i | ≤ δi . We also assume gi = 1 and the informed agent is given by (5.44). Distributed Adaptive Control. The proposed design is based on the distributed estimation of θi , δi and w. The consensus tracking error ei is defined by ei = j∈Ni ai j (x i − x j ) + bi0 (x i − x 0 ). The control for agent i is of the form u i = αˆ i xi − ψiT (xi )θˆi − sgn(ei )δˆi + φ T (t)wˆ i ,

(5.56)

where αˆ i be the estimate of α0 by agent i, θˆi is the estimate of θi , wˆ i is the estimate of w, δˆi is the estimate of δi , and sgn(x) is defined as ⎧ ⎨ +1, if x > 0 sgn(x) = 0, if x = 0 ⎩ −1, if x < 0. Define the tracking error X˜ =[x˜1 , . . . , x˜n ]T = [x1 − x0 , . . . , xn − x0 ]T , the parameter estimation errors α˜ i = αˆ i − α0 , θ˜i = θˆi − θi = [θ˜i,1 , . . . , θ˜i,ni ]T , w˜ i = wˆ i − w = [w˜ i1 , . . . , w˜ il ]T , and δ˜i = δˆi − δi . It then follows from (5.43), (5.44), and (5.56) that x˙˜i = α0 x˜i + α˜ i xi − ψiT θ˜i + φ T w˜ i + i − sgn(ei )δˆi ,

(5.57)

and the overall n system error equation can be derived as X˙˜ = α0 X˜ + X α˜ − Ψ +

l 

Φ j w˜ ∗ j + − Δ

(5.58)

j=1

where α˜ = [α˜ 1 , . . . , α˜ n ]T , Ψ = [θ˜1T ψ1 , . . . , θ˜nT ψn ]T , w˜ ∗ j = [w˜ 1 j , . . . , w˜ n j ]T , X = diag [x1 , . . . , xn ], = [ 1 , . . . , n ]T , Δ = [sgn(e1 )δˆ1 , . . . , sgn(en )δˆn ]T , and Φ j =

 diag φ j (t), . . . , φ j (t), . . . , φ j (t) , j = 1, . . . , l. The adaptive laws for αˆ i , θˆi , wˆ i ,

174

5 Distributed Task Coordination of Multiagent Systems

and δˆi are given by xi ei , α˙ˆ i = −Γα−1 i ψi ei , θˆ˙i = Γθ−1 i −1 wˆ˙ i j = −Γ φ j ei ,

(5.59) (5.60)

wi j

(5.61)

sgn(ei )ei , δ˙ˆi = Γδ−1 i

(5.62)

where i = 1, . . . , n, j = 1, . . . , l, Γαi > 0, Γθi > 0, Γwi j > 0, and Γδi > 0. Stability Analysis. Let the Lyapunov function candidate be given by 1 1 1 1 ˜T Γαi (α˜ i )2 + Γδi (δ˜i )2 + Γθ θ˜ T θ˜i X (L + B) X˜ + 2 2 i=1 2 i=1 2 i=1 i i n

V =

n

1  Γw (w˜ i j )2 2 j=1 i=1 i j n

l

+

n

(5.63)

The time derivative of V along the the solutions of (5.58), (5.59), (5.60), (5.61), and (5.62) is given by    V˙ = X˜ T (L + B) X˙˜ + Γαi α˜ i α˙˜ i + Γδi δ˜i δ˙˜i + Γθi θ˜iT θ˙˜i +

n l  

n

n

n

i=1

i=1

i=1

Γwi j w˜ i j w˙˜ i j

j=1 i=1

= α0 X˜ T (L + B) X˜ + X˜ T (L + B)X α˜ − +

l  j=1

+

n  i=1

Note that

n 

α˜ i xi ei

i=1

˜T

X (L + B)Φ j w˜ ∗ j −

n 



w˜ i j φi ei − X˜ T (L + B)Ψ

i=1

θ˜iT ψi ei + X˜ T (L + B)( − Δ) +

n 

δ˜i sgn(ei )ei

(5.64)

i=1

X˜ T (L + B) = ((L + B) X˜ )T = [e1 , . . . , ei , . . . , en ]T ,

n α˜ i xi ei , X˜ T (L + B)Φ j w˜ ∗ j = it can be readily verified that X˜ T (L + B)X α˜ = i=1 n n n ˜ i j φi ei , X˜ T (L + B)Ψ = i=1 θ˜iT ψi ei , X˜ T (L + B) = i=1 ei i , and i=1 w n X˜ T (L + B)Δ = i=1 |ei |δˆi . Thus, we obtain from (5.64) that

5.3 Distributed Tracking Control for a Class of Uncertain Nonlinear Systems

V˙ ≤ α0 X˜ T (L + B) X˜ +

n 

|ei || i | −

i=1

≤ α0 X˜ T (L + B) X˜ +

n 

n 

|ei |δˆi +

i=1

|ei |δi −

i=1

n 

n 

175

δ˜i sgn(ei )ei

i=1

|ei |δˆi +

i=1

n 

δ˜i sgn(ei )ei

i=1

= α0 X˜ T (L + B) X˜ ≤ 0.

(5.65)

To this end, it can be concluded that X˜ , αˆ i , θˆi and wˆ i j are bounded for all t, i.e., X˜ , αˆ i , θˆi , wˆ i j ∈ L∞ . Thus, since x0 (t) is bounded, we have xi (t) is bounded for all t ≥ t0 , which further means that given R > 0, there exist constant R0 , such that if |xi (t0 )| < R0 , then |xi (t)| < R, ∀t ≥ t0 . To this end, by defining compact set Ωi = {xi ∈ ||xi | ≤ max(R0 , R)}, we know that xi (t) ∈ Ωi , ∀t ≥ t0 . On the other hand, it follows from (5.65) that V˙ ≤ α0 λmin (L + B) X˜ 2 , where positive constant λmin (L + B) denotes the minimum eigenvalue of L + B. By taking integration on both sides, we have



V (∞) − V (t0 ) ≤ a0 λmin (L + B)

 X˜ (τ 2 dτ,

t0

which leads to



 X˜ (τ 2 dτ ≤

t0

1 (V (t0 ) − V (∞)) < ∞. |α0 |λmin (L + B)

That is,  X˜ ∈ L2 . Since X˙˜ , as given by Eq. (5.58), is bounded. Thus, we know that X˜ ∈ L2 L∞ . It then follows from the Barbalat’s lemma (Corollary 2.9, page 86, in [12]) that X˜ → 0 as t → ∞. Remark 5.3 In the control law (5.56), the use of signum function may result in a chattering phenomenon. A way to eliminate chattering is to replace the signum function in (5.56) and (5.62) by a high-slope saturation function given below sat(ei /ε) = where ε is a positive constant.

 ei

, if | eεi | ≤ 1, ei sgn( ε ), if | eεi | > 1 ε



Example 5.12 We revisit the problem in Example 5.11 and use the same simulation setting. However, control (5.56) is used based on the RBF neural network approximation of f i (xi ). That is, f i (xi ) = ψiT (xi )θi + i , where | i | ≤ δi , and ψi = [ψi,1 , . . . , ψi,ni ]T , with ψi, j being chosen as the commonly used Gaussian functions, which have the form

176

5 Distributed Task Coordination of Multiagent Systems

ψi, j = e−(xi −μi, j )

T

(xi −μi, j )/ηi,2 j

, j = 1, . . . , n i

where μi, j is the center of the receptive field and ηi is the width of the Gaussian function. The performance of the proposed adaptive control relies on the selection of the centers and widths of RBF. For Gaussian RBF NNs, it was shown in that the centers can be arranged on a regular lattice on n to uniformly approximate smooth functions. In the simulation, we select the widths and centers as: ηi,2 j = 0.1, ∀i, j, every neural network ψiT θi contains 11 nodes, with center μi, j ( j = 1, . . . , 11) evenly spaced in [−15, 15]. The following initial conditions and design parameters are used in the simulation: x1 (0) = [0.5, 0]T , x2 (0) = [−0.2, 0]T , x3 (0) = [0.3, 0]T , x0 (0) = 0, αˆ i (0) = 0, wˆ i1 (0) = 0, wˆ i2 (0) = 0, δˆi (0) = 0, θˆi (0) = 0, Γαi = Γwi = Γθi = 2, and Γδi = 20. Simulation results in Figs. 5.25, 5.26, 5.27, 5.28 and 5.29 validate the effectiveness of the proposed distributed adaptive control. Figures 5.25 and 5.26 show that all three agents follow the desired trajectory specified by the informed leader x0 (t). The boundedness of the corresponding control inputs is shown in Fig. 5.27. The boundedness of parameter estimates αˆ i , wˆ i , δˆi as well as NN weights θˆi  are illustrated in Figs. 5.28 and 5.29.
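For reference, a minimal construction of the Gaussian RBF regressor ψ_i(x_i) used here is sketched below, with 11 centers evenly spaced on [−15, 15] and width η² = 0.1 as in the simulation setting; the weight vector shown is only a placeholder for the estimate updated by (5.60).

```python
import numpy as np

def rbf_features(x, centers, eta_sq=0.1):
    """Gaussian RBF vector psi(x) with psi_j = exp(-(x - mu_j)^2 / eta^2)."""
    return np.exp(-(x - centers) ** 2 / eta_sq)

centers = np.linspace(-15.0, 15.0, 11)      # receptive-field centers mu_{i,j}
theta_hat = np.zeros(11)                     # NN weight estimate, updated online by the adaptive law

x = 0.7                                      # a sample scalar state x_i
psi = rbf_features(x, centers)
f_hat = psi @ theta_hat                      # current approximation of f_i(x_i)
print(psi.shape, f_hat)
```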

States

5

x1 x2 x3

4 3 2 1 0 -1 -2 -3

0

5

10

15

20

25

30

Time (sec)

Fig. 5.25 System state responses xi (t)

35

40

45

50

5.3 Distributed Tracking Control for a Class of Uncertain Nonlinear Systems

177

Tracking errors

4

x 1-x 0 x 2-x 0

3

x 3-x 0

2

1

0

-1

-2

0

5

10

15

20

25

30

35

40

45

50

Time (sec)

Fig. 5.26 Tracking errors x˜i (t) Control inputs

10

u1 u2 u3

5

0

-5

-10

-15

0

5

10

15

20

25

Time (sec)

Fig. 5.27 Control inputs u i (t)

30

35

40

45

50

178

5 Distributed Task Coordination of Multiagent Systems Adaptive estimation of a *i

5

\hat{a}_{1} \hat{a}_{2} \hat{a}_{3}

0 -5 -10 0

5

10

15

20

25

30

35

40

45

50

Time (sec) Adaptive estimation of w 1

4

\hat{w}_{11} \hat{w}_{12} \hat{w}_{13}

2 0 -2

0

5

10

15

20

25

30

35

40

45

50

Time (sec) Adaptive estimation of w 2

4

\hat{w}_{21} \hat{w}_{22} \hat{w}_{23}

2 0 -2 0

5

10

15

20

25

30

35

40

45

50

Time (sec)

Fig. 5.28 Parameter estimates Adaptive estimation of i

1

\hat\delta_{1} \hat\delta_{2} \hat\delta_{3}

0.8 0.6 0.4 0.2 0 0

5

10

15

20

25

30

35

40

45

50

Time (sec) Norm of neural network weights

1.5

||\hat\theta_1|| ||\hat\theta_2|| ||\hat\theta_3||

1

0.5

0 0

5

10

15

20

25

30

Time (sec)

Fig. 5.29 Parameter estimates

35

40

45

50

5.3 Distributed Tracking Control for a Class of Uncertain Nonlinear Systems

179

5.3.3 The Case with Partially Unknown g i The results in Sects. 5.3.1 and 5.3.2 are based on the assumption that control coefficients gi is known in the model (5.43), now let us consider the case of gi being unknown but its sign is known, and then we make a further extension to deal with the completely unknown gi in next subsection. Distributed Adaptive Control. With the partially unknown gi , the proposed distributed adaptive control for agent i is of the form   u i = sgn(gi ) αˆ i xi − gˆ i ψiT (xi )θˆi + gˆ i φ T (t)wˆ i − gˆ i sgn(ei )δˆi , where αˆ i is the estimate of αi∗ = sgn(gi ) αg0i , gˆ i is the estimate of gi∗ = define g˜ i = gˆ i − gi∗ . It then follows from (5.43), (5.44), and (5.66) that

(5.66)

sgn(gi )

x˙˜i = α0 x˜i + |gi |α˜ i xi − |gi |g˜ i ψiT θˆi − ψiT θ˜i + |gi |g˜ i φ T wˆ i +φ T w˜ i − |gi |g˜ i sgn(ei )δˆi − sgn(ei )δˆi + i ,

gi

, and

(5.67)

and the overall n system error equation can be derived as X˙˜ = α0 X˜ + Gx α˜ − Gθ g˜ − Ψ + Gw g˜ +

l 

Φ j w˜ ∗ j

j=1

−Gδ g˜ + − Δ.

(5.68)

where α˜ = [α˜ 1 , . . . , α˜ n ]T , Ψ = [θ˜1T ψ1 , . . . , θ˜nT ψn ]T , w˜ ∗ j = [w˜ 1 j , . . . , w˜ n j ]T , g˜ =

= [ 1 , . . . , n ]T , Δ = [sgn(e1 )δˆ1 , . . . , sgn(en )δˆn ]T , Gx = [g˜ 1 , . . . , g˜ n ]T , diag[|g1 |x1 , . . . , |gn |xn ],   Gθ = diag |g1 |ψ1T θˆ1 , . . . , |gi |ψiT θˆi , . . . , |gn |ψnT θˆn , 

Gw = diag |g1 |φ T wˆ 1 , . . . , |gi |φ T wˆ i , . . . , |gn |φ T wˆ n ,   Gδ = diag |g1 |sgn(e1 )δˆ1 , . . . , |gi |sgn(ei )δˆi , . . . , |gn |sgn(en )δˆn and

 Φ j = diag φ j (t), . . . , φ j (t), . . . , φ j (t) , j = 1, . . . , l.

The adaptive laws are chosen as xi ei , α˙ˆ i = −Γα−1 i −1 g˙ˆ i = Γgi (ψiT θˆi − φ T wˆ i + sgn(ei )δˆi )ei ,

(5.69) (5.70)

180

5 Distributed Task Coordination of Multiagent Systems

ψi ei , θ˙ˆi = Γθ−1 i

(5.71)

w˙ˆ i j = −Γw−1 φ j ei , ij

(5.72)

sgn(ei )ei , δ˙ˆi = Γδ−1 i

(5.73)

where i = 1, . . . , n, j = 1, . . . , l, Γαi > 0, Γgi > 0, Γθi > 0, Γwi j > 0, and Γδi > 0. Stability Analysis. Consider the Lyapunov function candidate 1 1 1 ˜T Γαi |gi |(α˜ i )2 + Γg |gi |(g˜ i )2 X (L + B) X˜ + 2 2 i=1 2 i=1 i n

V =

1 1  1 Γθi θ˜iT θ˜i + Γwi j (w˜ i j )2 + Γδ (δ˜i )2 . 2 i=1 2 j=1 i=1 2 i=1 i n

+

n

l

n

n

(5.74)

The time derivative of V along the trajectories of (5.68), (5.69), (5.70), (5.71), (5.72), and (5.73) is given by    V˙ = X˜ T (L + B) X˙˜ + Γαi |gi |α˜ i α˙˜ i + Γgi |gi |g˜ i g˙˜ i + Γθi θ˜iT θ˙˜i n

i=1

+

n l  

Γwi j w˜ i j w˙˜ i j +

n 

j=1 i=1

n

n

i=1

i=1

Γδi δ˜i δ˙˜i

i=1

= α0 X˜ T (L + B) X˜ + X˜ T (L + B)Gx a˜ −

n 

|gi |a˜ i xi ei − X˜ T (L + B)Gθ g˜

i=1

+

n 

|gi |g˜ i ψiT θˆi ei + X˜ T (L + B)Gw g˜ −

i=1

n 

|gi |g˜ i φ T wˆ i ei

i=1

− X˜ T (L + B)Gδ g˜ +

n 

|gi |g˜ i sgn(ei )δˆi ei − X˜ T (L + B)Ψ

i=1

+

n  i=1

θ˜iT ψi ei +

l 



X˜ T (L + B)Φ j w˜ ∗ j −

j=1

+ X˜ T (L + B)( − Δ) +

n 

w˜ i j φi ei

i=1 n 

Γδi δ˜iT δ˙ˆi

(5.75)

i=1

n a˜ i |gi |xi ei , X˜ T (L + Similarly, it can be readily verified that X˜ T (L + B)Gx α˜ = i=1 n n T ˆ T T ˜ B)Gθ g˜ = i=1 g˜ i |gi |ψi θi ei , X (L + B)Gw g˜ = i=1 g˜ i |gi |φ wˆ i ei , X˜ T (L + B) n n θ˜iT ψi ei , X˜ T (L + B)Φ j w˜ ∗ j = Gδ g˜ = i=1 g˜ i |gi |sgn(ei )δˆi ei , X˜ T (L + B)Ψ = i=1 n n n ˜ i j φi ei , X˜ T (L + B) = i=1 ei i , and X˜ T (L + B)Δ = i=1 |ei |δˆi . Thus, i=1 w we obtain from (5.75) that

5.3 Distributed Tracking Control for a Class of Uncertain Nonlinear Systems

V˙ ≤ α0 X˜ T (L + B) X˜ +

n  i=1

|ei |δi −

n 

|ei |δˆi +

i=1

n 

181

δ˜i sgn(ei )ei

i=1

= a0 X˜ T (L + B) X˜ ≤ 0

(5.76)

To this end, the asymptotically stability of the overall closed-loop multiagent system can be claimed following the same argument as that in Sect. 5.3.2.

5.3.4 The Case with Completely Unknown g i For the individual agent, when gi in the model (5.43) is completely unknown, Nussbaum gain technique can be used to solve its adaptive control problem [13, 14]. A typical Nussbaum gain is N (ζ ) = ζ 2 sin(ζ ). It can be seen that as ζ increases in magnitude and tends to ∞, N (ζ ) changes its sign an infinite number of times, and also supζ (N (ζ )) = +∞ and infζ (N (ζ )) = −∞ [12]. Such a property can be used in facilitating the stability analysis in dealing with the adaptive control with completely unknown gi . In this subsection, we show how to use Nussbaum gain in the design of distributed adaptive control for multiagent systems. Distributed Adaptive Control. The proposed control is of the form   u i = N (ζi ) αˆ i xi − gˆ i ψiT (xi )θˆi + gˆ i φ T (t)wˆ i − gˆ i sgn(ei )δˆi ,

(5.77)

where N (ζi ) = ζi2 (t) sin(ζi (t)),   ζ˙i (t) = αˆ i xi − gˆ i ψiT (xi )θˆi + gˆ i φ T (t)wˆ i − gˆ i sgn(ei )δˆi ei ,

(5.78) (5.79)

with ζi (0) = ζi0 ∈ . It can be seen that in (5.77), the Nussbaum gain N (ζi ) is used to replace sgn(gi ) in control (5.66). It then follows (5.43), (5.44), and (5.77) that x˙˜i = α0 x˜i − ψiT θ˜i + φ T w˜ i − sgn(ei )δˆi + i − (α0 xi − ψiT θˆi + φ T wˆ i − sgn(ei )δˆi ) (5.80) +gi N (ζi )(αˆ i xi − gˆ i ψiT θˆi + gˆ i φ T wˆ i − gˆ i sgn(ei )δˆi ), and the overall system error equation can be derived as X˙˜ = α0 X˜ − Ψ +

l  j=1

Φ j w˜ ∗ j + − Δ − Ξ ∗ + Ξ,

(5.81)

182

5 Distributed Task Coordination of Multiagent Systems

where Ξ ∗ = [Ξ1∗ , . . . , Ξn∗ ]T with Ξi∗ = α0 xi − ψiT θˆi + φ T wˆ i − sgn(ei )δˆi , and Ξ = [Ξ1 , . . . , Ξn ]T with Ξi = gi N (ζi )(αˆ i xi − gˆ i ψiT θˆi + gˆ i φ T wˆ i − gˆ i sgn(ei )δˆi ). The same adaptive laws as those in (5.69), (5.70), (5.71), (5.72), and (5.73) are used for the estimates αˆ i , gˆ i , θˆi , wˆ i j , and δˆi . Stability Analysis. Use the same Lyapunov function candidate V in (5.74), we find its time derivative along the trajectories of (5.81), (5.70), (5.71), (5.72), and (5.73) as follows V˙ = α0 X˜ T (L + B) X˜ − X˜ T (L + B)Ξ ∗ + X˜ T (L + B)Ξ −

n 

|gi |α˜ i xi ei

i=1

+

n 

|gi |g˜ i ψiT θˆi ei −

i=1

+

n 

θ˜iT ψi ei +

i=1

l 



n 

|gi |g˜ i φ T wˆ i ei +

i=1

n  i=1

X˜ T (L + B)Φ j w˜ ∗ j −

j=1

+ X˜ T (L + B)( − Δ) +

|gi |g˜ i sgn(ei )δˆi ei

n 

w˜ i j φi ei

i=1 n 

Γδi δ˜iT δ˙ˆi

(5.82)

i=1

n n n n Note that X˜ T (L + B)Ξ ∗ = i=1 α0 xi ei − i=1 ψiT θˆi ei + i=1 φ T wˆ i ei − i=1 n n sgn(ei )δˆi ei , and X˜ T (L + B)Ξ = i=1 Ξi ei = i=1 gi N (ζi )(αˆ i xi − gˆ i ψiT θˆi + ∗ T gˆ i φ wˆ i − gˆ i sgn(ei )δˆi )ei . Also, recall that α˜ i = αˆ i − αi = αˆ i − sgn(gi ) αg0i and g˜ i = sgn(g ) gˆ i − gi∗ = gˆ i − gi i , thus the following identities can be established n 

|gi |α˜ i xi ei =

i=1 n 

|gi |g˜ i ψiT θˆi ei

i=1 n  = (|gi |gˆ i ψiT θˆi ei − ψiT θˆi ei )

i=1 n 

i=1

|gi |g˜ i φ T wˆ i ei =

i=1 n  i=1

n  (|gi |αˆ i xi ei − α0 xi ei )

|gi |g˜ i sgn(ei )δˆi ei =

n 

(|gi |gˆ i φ T wˆ i ei − φ T wˆ i ei )

i=1 n  i=1

To this end, we obtain from (5.82) that

|gi |gˆ i sgn(ei )δˆi ei − sgn(ei )δˆi ei

5.3 Distributed Tracking Control for a Class of Uncertain Nonlinear Systems

V˙ ≤ α0 X˜ T (L + B) X˜ +

n 

Ξi ei −

i=1

− +

n  i=1 n 

|gi |gˆ i φ T wˆ i ei +

n 

|gi |αˆ i xi ei +

i=1

n 

n 

183

|gi |gˆ i ψiT θˆi ei

i=1

|gi |gˆ i sgn(ei )δˆi ei +

i=1

n 

|ei |δi −

n 

i=1

|ei |δˆi

i=1

δ˜i sgn(ei )ei

i=1

≤ α0 λmin

n 

x˜i2 −

i=1

+

n 

n 

  |gi | αˆ i xi − gˆ i ψiT θˆi + gˆ i φ T wˆ i − gˆ i sgn(ei )δˆi ei

i=1

Ξi ei

(5.83)

i=1

where λmin is the minimum eigenvalue of (L + B). Integrating on both sides of (5.83), and noting the expression of ζ˙i in (5.79) and the fact αi ei = gi N (ζi )ζ˙i , we obtain V (t) − a0 λmin

n



t

x˜i2 dτ

i=1 0

≤ V (0) +

n 

ζi (t)

ζi (t) n  2 gi τ sin τ dτ − |gi | dτ

i=1

Define β(ζ (t)) = V (0) +

n



i=1

i=1

ζi (0)

gi

 ζi (t) ζi (0)

τ 2 sin τ dτ −

(5.84)

ζi (0)

n i=1

|gi |

 ζi (t) ζi (0)

dτ , where ζ (t)

= [ζ1 (t), . . . , ζi (t), . . . , ζn (t)] . It follows from the direct integration that T

β(ζ (t)) = V (0) +

n 

gi (2ζi (t) sin ζi (t) − ζi2 (t) cos ζi (t) + 2 cos ζi (t))

i=1



n 

gi (2ζi (0) sin ζi (0) − ζi2 (0) cos ζi (0) + 2 cos ζi (0))

i=1



n 

|gi |(ζi (t) − ζi (0))

(5.85)

i=1

where β(0) = V (0) ≥ 0. It can be seen from (5.85) that there exists a closed, bounded interval [ζ − , ζ + ] containing ζ (0) for which both β(ζ − ) and β(ζ + ) are negative. However, it follows from (5.84) that β(ζ (t)) ≥ 0, and thus ζ (t) ∈ [ζ − , ζ + ]. That is, ζ (t) is bounded. It follows from (5.85) that β(ζ (t)) is bounded, and from (5.84) that V (t) is bounded, X˜ , αˆ i , gˆ i , θˆi , wˆ i j , and δˆi are bounded for all t. Again, it follows from (5.84) that

184

5 Distributed Task Coordination of Multiagent Systems n





0 ≤ −α0 λmin

x˜i2 dτ ≤ β(ζ (∞)) − V (∞) < ∞

(5.86)

i=1 0

or X˜ ∈ L2 . Since X˙ , as given in ( 5.81), is bounded, that is X˜ ∈ L∞ , it follows from the Barbalat’s lemma that X˜ → 0 as t → ∞. Example 5.13 In this example, we simulate the control algorithm in (5.77) on the same multiagent systems used in Example 5.12, while in this simulation, gi is completely unknown. Simulation settings are similar to those in 5.12. That is, RBF neu2 T ral networks are used to approximate f i (xi ) with ψi, j = e−(xi −μi, j ) (xi −μi, j )/ηi, j , j = 1, . . . , n i . We select the widths and centers as: ηi,2 j = 0.1, ∀i, j, every neural network ψiT θi contains 51 nodes, with center μi, j ( j = 1, . . . , 51) evenly spaced in [−15, 15]. Initial conditions and design parameters are given by: x1 (0) = [0.5, 0]T , x2 (0) = [−0.2, 0]T , x3 (0) = [0.3, 0]T , x0 (0) = 0, αˆ i (0) = 0, wˆ i1 (0) = 0, wˆ i2 (0) = 0, ζ (0) = 0, gˆi = 0, δˆi (0) = 0, θˆi (0) = 0, Γαi = Γwi = Γθi = 5, and Γδi = 100. Simulation results in Figs. 5.30, 5.31, 5.32, 5.33, 5.34 and 5.35 validate the effectiveness of the proposed distributed adaptive control with completely unknown gi . Figures 5.30 and 5.31 show that all three agents follow the desired trajectory specified by the informed leader x0 (t). The boundedness of the corresponding control inputs is shown in Fig. 5.32. The boundedness of parameter estimates αˆ i , wˆ i ,δˆi , gˆ i as well ♦ as NN weights θˆi  are illustrated in Figs. 5.33 and 5.35.

7 x1 x2 x3

6 5 4

States

3 2 1 0 -1 -2 -3

0

5

10

15

20

25

Time (sec)

Fig. 5.30 System state responses xi (t)

30

35

40

45

50

5.3 Distributed Tracking Control for a Class of Uncertain Nonlinear Systems

185

4 x 1-x 0 x 2-x 0 x 3-x 0

3

States errors

2

1

0

-1

-2

-3 0

5

10

15

20

25

30

35

40

45

50

Time (sec)

Fig. 5.31 Tracking errors x˜i (t) Control inputs

60

u1 u2 u3

40 20 0 -20 -40 -60 -80 -100 -120 0

5

10

15

20

25

30

Time (sec)

Fig. 5.32 Control inputs u i (t)

35

40

45

50

186

5 Distributed Task Coordination of Multiagent Systems Adaptive estimation of a *i

0

\hat{a}_{1} \hat{a}_{2} \hat{a}_{3}

-0.5 -1 -1.5 0

5

10

15

20

25

30

35

40

45

50

Time (sec) Adaptive estimation of w 1

2

\hat{w}_{11} \hat{w}_{12} \hat{w}_{13}

1 0 -1 0

5

10

15

20

25

30

35

40

45

50

Time (sec) Adaptive estimation of w 2

1

\hat{w}_{21} \hat{w}_{22} \hat{w}_{23}

0.5 0 -0.5

0

5

10

15

20

25

30

35

40

45

50

Time (sec)

Fig. 5.33 Parameter estimates 0 -1

1 2

-2

3

-3 -4 -5 -6

0

5

10

15

20

25

30

35

40

45

50

Time (sec) Adaptive estimation of gi

1.5

\hat{g}_{1} \hat{g}_{2} \hat{g}_{3}

1

0.5

0 0

5

10

15

20

25

30

Time (sec)

Fig. 5.34 Parameter estimates

35

40

45

50

5.3 Distributed Tracking Control for a Class of Uncertain Nonlinear Systems Adaptive estimation of 0.4

187

i

\hat\delta_{1} \hat\delta_{2} \hat\delta_{3}

0.3 0.2 0.1 0

0

5

10

15

20

25

30

35

40

45

50

Time (sec) Norm of neural network weights

1.2

||\hat\theta_1|| ||\hat\theta_2|| ||\hat\theta_3||

1 0.8 0.6 0.4 0.2 0 0

5

10

15

20

25

30

35

40

45

50

Time (sec)

Fig. 5.35 Parameter estimates

5.3.5 Extensions

The proposed distributed adaptive control algorithms for multiagent systems in Sects. 5.3.2, 5.3.3, and 5.3.4 can be further extended to more general cases.

Extension to the case with a completely unknown r_0(t). In Sect. 5.3.2, we made the assumption that in the informed leader (5.44) r_0(t) = φ^T(t)w with w being unknown, and the estimate ŵ_i is used in the proposed control (5.56) while the basis function φ(t) is assumed to be known to all agents. This assumption can be removed as well by using a neural network parameterization for r_0(t). That is, for agent i, let us assume that the neural network parameterization of r_0(t) is given by r_0(t) = (φ^i(t))^T w^i + ε_{ri}, where φ^i(t) is the basis function vector chosen by agent i, w^i is the unknown constant vector to be estimated by agent i, and ε_{ri} is the approximation error bounded by δ_{ri}. Normally, r_0(t) is a signal composed of polynomial and/or sinusoidal functions. Thus, φ^i(t) can simply be chosen from polynomial or Fourier basis functions; more generally, a popular choice is the radial basis function (RBF). For agent i, let ŵ^i be the estimate of w^i, and δ̂_{ri} be the estimate of δ_{ri}.


The control (5.56) can be modified as
$$u_i = \hat{\alpha}_i x_i - \psi_i^T(x_i)\hat{\theta}_i - \operatorname{sgn}(e_i)\hat{\delta}_i - \operatorname{sgn}(e_i)\hat{\delta}_{ri} + (\phi^i(t))^T \hat{w}^i, \qquad (5.87)$$
and the error equation in (5.57) becomes
$$\dot{\tilde{x}}_i = \alpha_0\tilde{x}_i + \tilde{\alpha}_i x_i - \psi_i^T\tilde{\theta}_i + (\phi^i)^T\tilde{w}^i + \epsilon_i - \operatorname{sgn}(e_i)\hat{\delta}_i - \epsilon_{ri} - \operatorname{sgn}(e_i)\hat{\delta}_{ri}, \qquad (5.88)$$
with an added adaptive law for δ̂_{ri} given by
$$\dot{\hat{\delta}}_{ri} = \Gamma_{\delta_{ri}}^{-1}\operatorname{sgn}(e_i)e_i, \qquad (5.89)$$

where Γ_{δ_{ri}} > 0. To this end, following the same stability analysis procedure as that in Sect. 5.3.2, it can readily be shown that X̃ → 0 as t → ∞.

Extension to High-Order Multiagent Systems. Consider second-order multiagent systems of the following form

$$\begin{cases} \dot{x}_{i1} = f_{i1}(x_{i1}) + x_{i2} \\ \dot{x}_{i2} = f_{i2}(x_{i1}, x_{i2}) + u_i, \end{cases} \qquad i = 1, \ldots, n \qquad (5.90)$$

where f_{i1}(x_{i1}) and f_{i2}(x_{i1}, x_{i2}) are unknown smooth nonlinear functions, x_i = [x_{i1}, x_{i2}]^T ∈ ℝ² is the state of agent i, and u_i ∈ ℝ is the control input of agent i. The design can be done using backstepping [15]. We assume that f_{i1} is parameterized as f_{i1}(x_{i1}) = ψ_{i1}^T(x_{i1})θ_{i1}, and we also apply a linearly parameterized neural network to approximate f_{i2}, that is, f_{i2}(x_{i1}, x_{i2}) = ψ_{i2}^T(x_{i1}, x_{i2})θ_{i2} + ε_i, where the basis functions ψ_{i1} = [ψ_{i1,1}, …, ψ_{i1,n_i}]^T ∈ ℝ^{n_i} and ψ_{i2} = [ψ_{i2,1}, …, ψ_{i2,n_i}]^T ∈ ℝ^{n_i} are known, θ_{i1} = [θ_{i1,1}, …, θ_{i1,n_i}]^T and θ_{i2} = [θ_{i2,1}, …, θ_{i2,n_i}]^T are unknown constants, and ε_i is the neural network approximation error bounded by an unknown constant δ_i. We start with the first equation of (5.90),
$$\dot{x}_{i1} = f_{i1}(x_{i1}) + x_{i2}, \qquad (5.91)$$

and take x_{i2} as a virtual control input. Define the intermediate control function
$$\rho_i(x_{i1}, \hat{\alpha}_i, \hat{\theta}_{i1}, \hat{w}_i, t) = \hat{\alpha}_i x_{i1} - \psi_{i1}^T(x_{i1})\hat{\theta}_{i1} + \phi^T(t)\hat{w}_i \qquad (5.92)$$

and new state variables
$$z_{i1} = x_{i1} - x_0, \qquad z_{i2} = x_{i2} - \rho_i, \qquad (5.93)$$

where αˆ i is the estimate of α0 , θˆi1 is the estimate of θi1 , and θ˜i1 = θˆi1 − θi1 , we obtain

$$\dot{z}_{i1} = \alpha_0 z_{i1} + z_{i2} + \tilde{\alpha}_i x_{i1} - \psi_{i1}^T(x_{i1})\tilde{\theta}_{i1} + \phi^T\tilde{w}_i, \qquad (5.94)$$
$$\dot{z}_{i2} = \psi_{i2}^T(x_{i1}, x_{i2})\theta_{i2} + \epsilon_i + u_i - \frac{\partial\rho_i}{\partial x_{i1}}\dot{x}_{i1} - \frac{\partial\rho_i}{\partial\hat{\alpha}_i}\dot{\hat{\alpha}}_i - \frac{\partial\rho_i}{\partial\hat{\theta}_{i1}}\dot{\hat{\theta}}_{i1} - \frac{\partial\rho_i}{\partial\hat{w}_i}\dot{\hat{w}}_i - \frac{\partial\rho_i}{\partial t} \qquad (5.95)$$

Note that
$$\dot{x}_{i1} = \dot{z}_{i1} + \dot{x}_0 = \alpha_0 x_{i1} + z_{i2} + \tilde{\alpha}_i x_{i1} - \psi_{i1}^T(x_{i1})\tilde{\theta}_{i1} + \phi^T\hat{w}_i,$$
thus (5.95) can be rewritten as
$$\dot{z}_{i2} = \psi_{i2}^T(x_{i1}, x_{i2})\theta_{i2} + \epsilon_i + u_i - \frac{\partial\rho_i}{\partial x_{i1}}\Big[\alpha_0 x_{i1} + z_{i2} + \tilde{\alpha}_i x_{i1} - \psi_{i1}^T(x_{i1})\tilde{\theta}_{i1} + \phi^T\hat{w}_i\Big] - \frac{\partial\rho_i}{\partial\hat{\alpha}_i}\dot{\hat{\alpha}}_i - \frac{\partial\rho_i}{\partial\hat{\theta}_{i1}}\dot{\hat{\theta}}_{i1} - \frac{\partial\rho_i}{\partial\hat{w}_i}\dot{\hat{w}}_i - \frac{\partial\rho_i}{\partial t}. \qquad (5.96)$$
Define
$$e_{i1} = \sum_{j\in N_i} a_{ij}(x_{i1} - x_{j1}) + b_{i0}(x_{i1} - x_0),$$
and let the control input for agent i be chosen as
$$u_i = -e_{i1} - z_{i2} - \psi_{i2}^T\hat{\theta}_{i2} + \frac{\partial\rho_i}{\partial x_{i1}}\Big[\hat{\alpha}_i x_{i1} + z_{i2} + \phi^T\hat{w}_i\Big] + \frac{\partial\rho_i}{\partial\hat{\alpha}_i}\dot{\hat{\alpha}}_i + \frac{\partial\rho_i}{\partial\hat{\theta}_{i1}}\dot{\hat{\theta}}_{i1} + \frac{\partial\rho_i}{\partial\hat{w}_i}\dot{\hat{w}}_i + \frac{\partial\rho_i}{\partial t} - \operatorname{sgn}(z_{i2})\hat{\delta}_i. \qquad (5.97)$$

where δ̂_i is the estimate of δ_i. Substituting (5.97) into (5.96) renders
$$\dot{z}_{i2} = -e_{i1} - z_{i2} + \epsilon_i - \psi_{i2}^T\tilde{\theta}_{i2} + \frac{\partial\rho_i}{\partial x_{i1}}\psi_{i1}^T\tilde{\theta}_{i1} - \operatorname{sgn}(z_{i2})\hat{\delta}_i, \qquad (5.98)$$

where θ̃_{i2} = θ̂_{i2} − θ_{i2}. Define Z_1 = [z_{11}, …, z_{i1}, …, z_{n1}]^T and Z_2 = [z_{12}, …, z_{i2}, …, z_{n2}]^T. It then follows from (5.94) and (5.98) that the overall multiagent system dynamics can be derived as
$$\dot{Z}_1 = \alpha_0 Z_1 + Z_2 + X_1\tilde{\alpha} + \sum_{j=1}^{l}\Phi_j\tilde{w}_{*j} - \Psi_1 \qquad (5.99)$$
$$\dot{Z}_2 = -E_1 - Z_2 + \epsilon - \Psi_2 + \Psi_3 - \Delta \qquad (5.100)$$


where α̃ = [α̃_1, …, α̃_n]^T, w̃_{*j} = [w̃_{1j}, …, w̃_{nj}]^T, X_1 = diag[x_{11}, …, x_{i1}, …, x_{n1}], E_1 = [e_{11}, …, e_{n1}]^T, ε = [ε_1, …, ε_n]^T, Ψ_1 = [ψ_{11}^T θ̃_{11}, …, ψ_{n1}^T θ̃_{n1}]^T, Ψ_2 = [ψ_{12}^T θ̃_{12}, …, ψ_{n2}^T θ̃_{n2}]^T, Ψ_3 = [(∂ρ_1/∂x_{11}) ψ_{11}^T θ̃_{11}, …, (∂ρ_n/∂x_{n1}) ψ_{n1}^T θ̃_{n1}]^T, and Δ = [sgn(z_{12})δ̂_1, …, sgn(z_{n2})δ̂_n]^T. The adaptive laws are given by
$$\dot{\hat{\alpha}}_i = -\Gamma_{\alpha_i}^{-1} x_{i1} e_{i1}, \qquad (5.101)$$
$$\dot{\hat{w}}_{ij} = -\Gamma_{w_{ij}}^{-1} \phi_j e_{i1}, \quad j = 1, \ldots, l, \qquad (5.102)$$
$$\dot{\hat{\theta}}_{i1} = \Gamma_{\theta_{i1}}^{-1}\Big[\psi_{i1} e_{i1} - \frac{\partial\rho_i}{\partial x_{i1}}\psi_{i1} z_{i2}\Big], \qquad (5.103)$$
$$\dot{\hat{\theta}}_{i2} = \Gamma_{\theta_{i2}}^{-1}\psi_{i2} z_{i2}, \qquad (5.104)$$
$$\dot{\hat{\delta}}_i = \Gamma_{\delta_i}^{-1}\operatorname{sgn}(z_{i2}) z_{i2}, \qquad (5.105)$$
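As an indication of how these adaptive laws could be advanced inside a simulation loop, the following sketch applies one forward-Euler step of (5.101)–(5.105) for a single agent. The function name, the dictionary keys, and the assumption of a scalar first state are illustrative choices, not part of the design itself.

```python
import numpy as np

def adaptive_law_step(est, sig, gains, dt=1e-3):
    """One forward-Euler step of the adaptive laws (5.101)-(5.105) for one agent.

    est   : dict of current estimates alpha_hat, w_hat (l,), theta1_hat (n_i,),
            theta2_hat (n_i,), delta_hat
    sig   : dict of current signals x_i1, e_i1, z_i2, psi1 (n_i,), psi2 (n_i,),
            phi (l,), drho_dx (scalar partial of rho_i w.r.t. x_i1)
    gains : dict of positive adaptation gains Gamma_*
    """
    est = dict(est)
    est["alpha_hat"]  += dt * (-sig["x_i1"] * sig["e_i1"] / gains["alpha"])
    est["w_hat"]      += dt * (-sig["phi"] * sig["e_i1"] / gains["w"])
    est["theta1_hat"] += dt * ((sig["psi1"] * sig["e_i1"]
                                - sig["drho_dx"] * sig["psi1"] * sig["z_i2"]) / gains["theta1"])
    est["theta2_hat"] += dt * (sig["psi2"] * sig["z_i2"] / gains["theta2"])
    est["delta_hat"]  += dt * (np.sign(sig["z_i2"]) * sig["z_i2"] / gains["delta"])
    return est
```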

where Γ_{α_i}, Γ_{w_{ij}}, Γ_{θ_{i1}}, Γ_{θ_{i2}}, and Γ_{δ_i} are some positive constants. The closed-loop system stability can be summarized in the following theorem.

Theorem 5.3 Consider the multiagent system in (5.90). If the sensing/communication topology A is connected, and B has at least one nonzero entry, then the distributed adaptive tracking control in (5.92) and (5.97) with the adaptive laws in (5.101)–(5.105) guarantees the boundedness of all signals of the closed-loop system and achieves asymptotic tracking of the informed leader.

Proof Consider the Lyapunov function candidate
$$V = \frac{1}{2}Z_1^T(L+B)Z_1 + \frac{1}{2}Z_2^T Z_2 + \frac{1}{2}\sum_{i=1}^{n}\Gamma_{\alpha_i}\tilde{\alpha}_i^2 + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{l}\Gamma_{w_{ij}}\tilde{w}_{ij}^2 + \frac{1}{2}\sum_{i=1}^{n}\Gamma_{\theta_{i1}}\tilde{\theta}_{i1}^T\tilde{\theta}_{i1} + \frac{1}{2}\sum_{i=1}^{n}\Gamma_{\theta_{i2}}\tilde{\theta}_{i2}^T\tilde{\theta}_{i2} + \frac{1}{2}\sum_{i=1}^{n}\Gamma_{\delta_i}\tilde{\delta}_i^2.$$

The time derivative of V along the trajectories of (5.99) and (5.100) is given by
$$\dot{V} = Z_1^T(L+B)\Big[\alpha_0 Z_1 + Z_2 + X_1\tilde{\alpha} + \sum_{j=1}^{l}\Phi_j\tilde{w}_{*j} - \Psi_1\Big] + Z_2^T\big[-E_1 - Z_2 + \epsilon - \Psi_2 + \Psi_3 - \Delta\big]$$
$$\qquad - \sum_{i=1}^{n}\tilde{\alpha}_i x_{i1} e_{i1} - \sum_{j=1}^{l}\sum_{i=1}^{n}\tilde{w}_{ij}\phi_j e_{i1} + \sum_{i=1}^{n}\tilde{\theta}_{i1}^T\Big(\psi_{i1} e_{i1} - \frac{\partial\rho_i}{\partial x_{i1}}\psi_{i1} z_{i2}\Big) + \sum_{i=1}^{n}\tilde{\theta}_{i2}^T\psi_{i2} z_{i2} + \sum_{i=1}^{n}\tilde{\delta}_i|z_{i2}|$$
$$\le \alpha_0 Z_1^T(L+B)Z_1 - Z_2^T Z_2 \le 0$$

which implies that Z_1, Z_2, α̂_i, ŵ_{ij}, θ̂_{i1}, θ̂_{i2}, and δ̂_i are bounded. Similarly, we can show that Z_1, Z_2 ∈ L₂ and Ż_1, Ż_2 ∈ L∞. It then follows from Barbalat's lemma that Z_1, Z_2 → 0 as t → ∞. This completes the proof. □

Remark 5.4 In (5.90), f_{i1} could also be parameterized using a neural network. However, the asymptotic stability result may then not be obtained. A further extension can be made to a more general class of nonlinear systems in strict-feedback form:
$$\begin{cases} \dot{x}_{i1} = \psi_{i1}(x_{i1})^T\theta_{i1} + x_{i2} \\ \dot{x}_{i2} = \psi_{i2}(x_{i1}, x_{i2})^T\theta_{i2} + x_{i3} \\ \quad\vdots \\ \dot{x}_{im} = \psi_{im}(x_{i1}, \ldots, x_{im})^T\theta_{im} + u_i \end{cases} \qquad i = 1, \ldots, n \qquad (5.106)$$
where θ_{i1}, …, θ_{im} are unknown constants. □

5.4 Summary

In this chapter, we presented several distributed task coordination algorithms for multiagent systems with nonlinear dynamics and model uncertainties. By formulating the task coordination problem as a cooperative stabilization or cooperative tracking control problem, Lyapunov stability theory-based nonlinear and adaptive design and analysis methods become handy in dealing with nonlinear multiagent systems [12, 15]. In recent years, many results have been obtained on Lyapunov-based cooperative control design for multiagent systems [1, 16–28]. In [18], the consensus stabilization problem was addressed for multiagent systems with unknown nonlinear dynamics and undirected information exchange. Consensus tracking was solved for first-order nonlinear systems with unknown nonlinear functions in [16] under directed communication topologies; the proposed adaptive laws required global information on the left eigenvector of the graph Laplacian. The extension was made to a class of high-order nonlinear systems in a Brunovsky form in [22, 23]. The result in [21] considered time-varying communication for the adaptive design of first-order nonlinear systems. In [17], an adaptive tracking design was given for multiple uncertain mechanical systems. For a class of strict-feedback nonlinear systems with parametric uncertainties, adaptive consensus tracking was designed using the backstepping technique [19, 20]. In [1], a cooperative steering control was designed. In [25, 26], adaptive cooperative tracking controls were designed using the method of model reference adaptive control. For multiagent systems with unknown control gains, some recent results on adaptive cooperative control were obtained using Nussbaum-type functions [24, 29, 30].

The distributed task coordination algorithms presented in this chapter follow from our original work in [1, 24–26] with some new extensions. The general design method of distributed nonlinear control in Sect. 5.2 can handle more general nonlinear systems than those in [1]. The presented distributed tracking control design in Sect. 5.3 is systematic and covers several more cases than those in [24–26], including standard linearly parameterized uncertainties, neural network parameterized uncertainties, partially unknown control gains, and completely unknown control gains. Further extensions were also made to deal with a completely unknown input signal in the informed agent, and with high-order nonlinear systems in the strict-feedback form. Numerous simulation examples were included to illustrate the effectiveness of the proposed controls. More importantly, the unified results on the design of distributed nonlinear and adaptive control may pave the way for real applications in task coordination of multiagent systems with uncertainties and nonlinear dynamics.

References

1. Wang, J., Qu, Z., Obeng, M.: A distributed cooperative steering control with application to nonholonomic robots. In: 49th IEEE Conference on Decision and Control, pp. 4571–4576, Atlanta, GA, Dec 2010
2. Isidori, A.: Nonlinear Control Systems, 3rd edn. Springer-Verlag, Berlin (1995)
3. Luca, A., Oriolo, G., Samson, C.: Feedback control of a nonholonomic car-like robot. In: Laumond, J.P. (ed.) Robot Motion Planning and Control, pp. 171–253 (1998)
4. Khalil, H.: Nonlinear Systems, 3rd edn. Prentice Hall, Upper Saddle River, NJ (2003)
5. Laumond, J.-P.: Robot Motion Planning and Control. Springer-Verlag, London (1998)
6. Qu, Z., Wang, J., Plaisted, C.E.: A new analytical solution to mobile robot trajectory generation in the presence of moving obstacles. IEEE Trans. Robot. 20, 978–993 (2004)
7. Murray, R.M., Sastry, S.S.: Nonholonomic motion planning: steering using sinusoids. IEEE Trans. Autom. Control 38, 700–716 (1993)
8. Walsh, G.C., Bushnell, L.G.: Stabilization of multiple input chained form control systems. Syst. Control Lett. 227–234 (1995)
9. Sastry, S.: Nonlinear Systems: Analysis, Stability and Control. Springer-Verlag, New York (1999)
10. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1985)
11. Sanner, R.M., Slotine, J.E.: Gaussian networks for direct adaptive control. IEEE Trans. Neural Netw. 3, 837–863 (1992)
12. Narendra, K.S., Annaswamy, A.M.: Stable Adaptive Systems. Prentice-Hall, Englewood Cliffs, NJ (1989)
13. Nussbaum, R.D.: Some remarks on the conjecture in parameter adaptive control. Syst. Control Lett. 3, 243–246 (1983)
14. Ge, S.S., Wang, J.: Robust adaptive tracking for time-varying uncertain nonlinear systems with unknown control coefficients. IEEE Trans. Autom. Control 48, 1463–1469 (2003)
15. Krstic, M., Kanellakopoulos, I., Kokotovic, P.V.: Nonlinear and Adaptive Control Design. Wiley, New York (1995)
16. Das, A., Lewis, F.L.: Distributed adaptive control for synchronization of unknown nonlinear networked systems. Automatica 46, 2014–2021 (2010)
17. Dong, W.: On consensus algorithms of multiple uncertain mechanical systems with a reference trajectory. Automatica 47, 348–355 (2011)
18. Hou, Z., Cheng, L., Tan, M.: Decentralized robust adaptive control for the multiagent system consensus problem using neural networks. IEEE Trans. Syst. Man Cybern. 39, 636–647 (2009)
19. Wang, W., Huang, J., Wen, C., Fan, H.: Distributed adaptive control for consensus tracking with application to formation control of nonholonomic mobile robots. Automatica 50, 1254–1263 (2014)
20. Yoo, S.J.: Distributed consensus tracking for multiple uncertain nonlinear strict-feedback systems under a directed graph. IEEE Trans. Neural Netw. Learn. Syst. 24, 666–672 (2013)
21. Yu, H., Xia, X.: Adaptive consensus of multi agents in networks with jointly connected topologies. Automatica 48, 1783–1790 (2012)
22. Zhang, H., Lewis, F.L.: Adaptive cooperative tracking control of higher-order nonlinear systems with unknown dynamics. Automatica 48, 1432–1439 (2012)
23. Zhang, H., Lewis, F.L., Qu, Z.: Lyapunov, adaptive, and optimal design techniques for cooperative systems on directed communication graphs. IEEE Trans. Ind. Electron. 59, 3026–3041 (2012)
24. Wang, J.: Adaptive cooperative control for a class of uncertain multiagent systems. In: 2017 American Control Conference, Seattle, WA (2017)
25. Wang, J.: Distributed adaptive tracking control for a class of uncertain multiagent systems. In: 2016 American Control Conference, Boston, MA, July 2016
26. Wang, J.: Distributed coordinated tracking control for a class of uncertain multiagent systems. IEEE Trans. Autom. Control 62, 3423–3429 (2017)
27. Wang, J., Yang, T., Staskevich, G., Abbe, B.: Approximately adaptive neural cooperative control for nonlinear multiagent systems with performance guarantee. Int. J. Syst. Sci. 48, 909–920 (2016)
28. Qu, Z.: Cooperative Control of Dynamical Systems. Springer-Verlag, London (2009)
29. Chen, W., Li, X., Ren, W., Wen, C.: Adaptive consensus of multiagent systems with unknown identical control directions based on a novel Nussbaum-type function. IEEE Trans. Autom. Control 59, 1887–1892 (2014)
30. Ding, Z.: Adaptive consensus output regulation of a class of nonlinear systems with unknown high-frequency gain. Automatica 51, 348–355 (2015)

Chapter 6

Multiagent Distributed Optimization and Reinforcement Learning Control

6.1 Introduction

This chapter studies two problems for optimizing the dynamical behaviors of multiagent systems. The first is the multiagent distributed optimization problem, which searches for the optimal solution minimizing a cost function defined over all agents. The second is the multiagent distributed optimal control problem, which addresses the multiagent coordination problem by designing a distributed optimal control law for each agent while achieving a prescribed global optimal performance. Generally speaking, the first is a static optimization problem while the second is a dynamic optimization problem. Nonetheless, the two problems share a common characteristic: both seek distributed solutions computed by the individual agents. In this chapter, we formulate the multiagent optimization problem as addressing a general type of convex network cost function, and the multiagent optimal control problem as making all systems achieve consensus tracking while minimizing individual sensing/communication topology dependent cost functions. Then we present several distributed solutions to those problems.

6.2 Basics on Optimization and Reinforcement Learning Algorithms

6.2.1 Optimization Algorithms

A typical optimization problem considers the minimization of a cost function f(x): ℝⁿ → ℝ over all possible values of x, where x = [x_1, …, x_n]^T ∈ ℝⁿ is the decision vector and f(x) is generally a convex and differentiable function. The optimal value of x is denoted as x*. That is,
$$x^* = \arg\min_x f(x) \qquad (6.1)$$


Based on the well-known optimality condition, x* can be found from the equation below
$$\nabla f(x) = 0 \qquad (6.2)$$
where ∇f(x) = ∂f(x)/∂x = [∂f(x)/∂x_1, …, ∂f(x)/∂x_n]^T is the gradient. While the analytical solution to (6.2) may not be tractable due to the complexity of the cost function, numerical optimization solutions are normally pursued. Two commonly used iterative algorithms are the gradient descent algorithm and Newton's method. Both methods start the minimization process with some initial value x(0), and then take iterative steps to generate x(k+1) from x(k) such that the condition
$$f(x(k+1)) \le f(x(k)) \qquad (6.3)$$

is satisfied over the iteration steps k = 0, 1, …. This successive approximation process stops once some stopping criterion is satisfied; a commonly used one is that ∇f(x(k)) is sufficiently close to zero.

Gradient Descent Algorithm. The gradient descent algorithm is based on the linear (first-order) approximation of f(x) given below
$$f(x(k+1)) \approx f(x(k)) + \nabla f(x(k))^T[x(k+1) - x(k)] \qquad (6.4)$$
and then constructs the iteration of x(k) following the direction of the negative gradient in the form of
$$x(k+1) = x(k) - \alpha_k\nabla f(x(k)) \qquad (6.5)$$
where α_k > 0 is the learning rate. The convergence of (6.5) can be justified by noting that substituting (6.5) into (6.4) leads to
$$f(x(k+1)) \approx f(x(k)) - \alpha_k\|\nabla f(x(k))\|^2 \le f(x(k)) \qquad (6.6)$$
which approximately satisfies the condition in (6.3). While (6.6) holds for any positive constant α_k in terms of the linear approximation of f(x), the minimization of the original f(x) relies on a proper choice of α_k, which is usually selected as a small positive constant. Once α_k is chosen, the condition f(x(k) − α_k∇f(x(k))) ≤ f(x(k)) may be verified for convergence; otherwise, a new value of α_k can be selected. More generally, a diminishing α_k satisfying the following conditions can be used to ensure the convergence of the gradient descent algorithm (6.5) [1]:
$$\alpha_k \to 0 \ \text{as}\ k \to \infty, \qquad \sum_{k=0}^{\infty}\alpha_k = \infty, \qquad \sum_{k=0}^{\infty}\alpha_k^2 < \infty \qquad (6.7)$$
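A minimal sketch of the iteration (6.5) with the stopping criterion mentioned above is given below; the quadratic test cost and all parameter values are invented for illustration and are not from the book.

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, tol=1e-8, max_iter=10000):
    """Minimize f by the iteration (6.5): x(k+1) = x(k) - alpha * grad f(x(k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:      # stop when the gradient is close to zero
            break
        x = x - alpha * g
    return x

# Illustrative quadratic cost f(x) = 0.5 x^T P x + b^T x (assumed data)
P = np.array([[2.0, 0.3], [0.3, 1.0]])
b = np.array([1.0, -2.0])
x_star = gradient_descent(lambda x: P @ x + b, x0=[0.0, 0.0], alpha=0.2)
print(x_star, "vs closed form", -np.linalg.solve(P, b))
```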


Newton’s Method. The Newton’s method is based on the quadratic (second-order) approximation of f (x) given below f (x(k + 1)) ≈ f (x(k)) + ∇ f (x(k))T [x(k + 1) − x(k)] 1 + [x(k + 1) − x(k)]T ∇ 2 f (x(k))[x(k + 1) − x(k)] 2

(6.8)

where ∇ 2 f (x(k)) is the n × n symmetric Hessian matrix of second-order derivatives, that is, ⎡ ∂2 f ⎤ 2 ∂2 f · · · ∂ x∂1 ∂fxn ∂ x1 ∂ x1 ∂ x1 ∂ x2 ⎢ ∂2 f ∂2 f ∂2 f ⎥ ⎢ ∂ x2 ∂ x1 ∂ x2 ∂ x2 · · · ∂ x2 ∂ xn ⎥ 2 ⎥ ∇ f (x(k)) = ⎢ .. . . . ⎥ ⎢ .. . .. ⎦ ⎣ . . 2 ∂2 f ∂2 f · · · ∂ x∂n ∂fxn ∂ xn ∂ x1 ∂ xn ∂ x2 The approximated f (x(k + 1)) in (6.8) reaches the minimum if the gradient of the right-hand side of (6.8) becomes zero, that is, ∇ 2 f (x(k))x(k + 1) = ∇ 2 f (x(k))x(k) − ∇ f (x(k))

(6.9)

In the case of matrix ∇ 2 f (x(k)) being invertible, we have from (6.9) that x(k + 1) = x(k) − [∇ 2 f (x(k))]−1 ∇ f (x(k))

(6.10)

Apparently, the Network’s method in (6.10) can be treated as a special case of the gradient descent algorithm in (6.5) with the learning rate αk = [∇ 2 f (x(k))]−1 . The gradient descent algorithm and the Newton’s method can also be used to solve the constrained optimization problem given below after using some techniques for handling constraints. minimize f (x) subject to h(x) = 0

(6.11)

where h(x) is the constraint imposed on x. One possible way is to use the penalty function method [2] to solve the unconstrained approximating problem minimize f (x) + K h 2 (x)

(6.12)

for some large positive constant K . The term K h 2 (x) is the penalty function, and for sufficiently large K , the problem in (6.12) is almost equivalent to that in (6.11).
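The following sketch combines the Newton update (6.10) with the penalty formulation (6.12). The caller supplies the gradient and Hessian of the penalized cost, and the equality-constrained test problem is invented for illustration.

```python
import numpy as np

def newton_penalty(grad, hess, x0, K=1e3, tol=1e-10, max_iter=100):
    """Newton iteration (6.10) applied to the penalized cost f(x) + K h(x)^2 of (6.12)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x, K)
        if np.linalg.norm(g) < tol:
            break
        x = x - np.linalg.solve(hess(x, K), g)   # x(k+1) = x(k) - H^{-1} grad
    return x

# Illustrative problem (assumed data): minimize ||x||^2 subject to x1 + x2 - 1 = 0
grad = lambda x, K: 2 * x + 2 * K * (x[0] + x[1] - 1) * np.ones(2)
hess = lambda x, K: 2 * np.eye(2) + 2 * K * np.ones((2, 2))
print(newton_penalty(grad, hess, x0=[5.0, -3.0]))   # approaches [0.5, 0.5] for large K
```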


6.2.2 Dynamic Programming and Reinforcement Learning

For dynamical systems, reinforcement learning has been used to solve the optimal control problem mainly using data along the trajectory of the system. The underlying idea comes from the dynamic programming method based on Bellman's principle of optimality [3, 4]. Consider the standard discrete-time system
$$x(k+1) = f(x(k), u(k)) \qquad (6.13)$$
with an initial condition x(0) = x_0, control constraints u(k) ∈ U, and an objective function
$$J(k) = \sum_{m=k}^{\infty} l(x(m), u(m)) \qquad (6.14)$$

where l(x(m), u(m)) is the immediate cost of using u(m) at state x(m). The optimal control problem is to design u(k) such that the objective function is minimized. Such a problem can be solved using either calculus of variations or dynamic programming [5]. It follows from (6.14) that
$$J(k) = l(x(k), u(k)) + J(k+1) \qquad (6.15)$$

Given a state value x at the time step k, define V(x, k) as the optimal (minimum) cost function starting at x at time k. To solve the optimal control problem and to determine the optimal cost functions, a backward method can be used based on the principle of optimality. Assume V(x, k+1) is known for every state x; the question becomes to find V(x, k) for all x. Given x(k) = x and using u(k) = u, the total objective value is
$$J(k) = l(x, u) + V(f(x, u), k+1)$$
The optimal cost from x at k is then
$$V(x, k) = \min_{u\in U}\big(l(x, u) + V(f(x, u), k+1)\big) \qquad (6.16)$$

and the optimal control is u ∗ (k) = argminu(k) (l(x, u) + V ( f (x, u), k + 1))

(6.17)

Example 6.1 The standard linear quadratic regulation problem can be solved using the recursive expression in (6.16). Consider the linear discrete-time system x(k + 1) = Ax(k) + Bu(k)

(6.18)

with
$$J = \sum_{k=0}^{\infty}\big[x(k)^T Q x(k) + u^T(k) R u(k)\big]$$

where Q is a symmetric positive semidefinite matrix and R is a symmetric positive definite matrix. Let us assume the optimal cost at the time step k is of the form V(x, k) = x^T P x, where P is a matrix to be determined. It then follows from (6.16) that
$$x^T P x = \min_u\big(x^T Q x + u^T R u + (x^T A^T + u^T B^T) P (Ax + Bu)\big) \qquad (6.19)$$

The minimum with respect to u is found by solving
$$u^T R + u^T B^T P B + x^T A^T P B = 0$$
which renders the control
$$u = -(R + B^T P B)^{-1} B^T P A x \qquad (6.20)$$
Substituting (6.20) into (6.19) leads to the discrete-time algebraic Riccati equation
$$A^T P A + Q - P - A^T P B (R + B^T P B)^{-1} B^T P A = 0 \qquad (6.21)$$
♦
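As a numerical illustration, the Riccati equation (6.21) can be solved by repeated substitution, after which the gain of (6.20) follows directly. The double-integrator data below are assumed for the sketch, and the pair (A, B) is taken to be stabilizable so that the iteration converges.

```python
import numpy as np

def dare_by_iteration(A, B, Q, R, iters=500):
    """Solve the discrete-time algebraic Riccati equation (6.21) by repeated substitution."""
    P = np.copy(Q)
    for _ in range(iters):
        S = R + B.T @ P @ B
        P = A.T @ P @ A + Q - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # gain of (6.20), u = -K x
    return P, K

# Illustrative double-integrator data (assumed, not from the book)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
P, K = dare_by_iteration(A, B, Q=np.eye(2), R=np.array([[1.0]]))
print("P =", P, "\nK =", K)
```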

Remark 6.1 For the linear dynamics given in (6.18), the optimal cost function is parameterized as V(x, k) = x^T P x, which reduces the search to finding P by solving the ARE in (6.21). For the general nonlinear dynamics in (6.13), as shown in the following example, the problem may become intractable, and numerical and approximate solutions may have to be pursued. □

Example 6.2 Consider the nonlinear discrete-time system
$$x(k+1) = f(x(k)) + g(x(k))u(k)$$
with
$$J = \sum_{k=0}^{\infty}\big[x(k)^T Q x(k) + u^T(k) R u(k)\big]$$

It follows from (6.17) that u*(k) satisfies
$$\frac{\partial\big(x(k)^T Q x(k) + u^T(k) R u(k) + V(x(k+1), k+1)\big)}{\partial u(k)} = 0 \qquad (6.22)$$


which leads to
$$u^T(k) R + \left[\frac{\partial V(x(k+1), k+1)}{\partial x(k+1)}\right]^T g(x(k)) = 0$$
and
$$u^*(k) = -R^{-1} g^T \frac{\partial V(x(k+1), k+1)}{\partial x(k+1)} \qquad (6.23)$$
Substituting (6.23) into (6.16), we obtain the Hamilton–Jacobi–Bellman (HJB) equation
$$V(x(k), k) = 0.5\left[\frac{\partial V(x(k+1), k+1)}{\partial x(k+1)}\right]^T g R^{-1} g^T \left[\frac{\partial V(x(k+1), k+1)}{\partial x(k+1)}\right] + 0.5\, x(k)^T Q x(k) + V(f(x, u), k+1) \qquad (6.24)$$

Apparently, the analytic solution to (6.24) is generally difficult to obtain. In what follows, we present several numerical and approximate solutions.

Two general iteration methods have been developed to solve the Bellman optimality equation given in (6.16). One is the value iteration method, and the other is the policy iteration method [6, 7]. In both methods, value functions can be represented using either the state value function (the so-called V-function) or the state–action value function (Q-function). In addition, both methods can take the form of either model-based iteration (dynamic programming-based iteration) or model-free iteration (reinforcement learning-based iteration).

Value iteration algorithm. Let us consider the state value function defined by V_l(x(k)) with state x(k) at the iteration step l = 0, 1, …. The value iteration algorithm starts with the initialization by selecting an admissible control policy h_0(x(k)) and an initial value V_0(x(k)). The algorithm proceeds with the following two steps until convergence:
• Value update step: At step l = 0, 1, …, determine the value of the current policy h_l(x(k)) using the following Bellman equation
$$V_{l+1}(x(k)) = l(x(k), h_l(x(k))) + V_l(x(k+1)) \qquad (6.25)$$
• Control policy improvement step: Update the control policy using the following equation
$$h_{l+1}(x(k)) = \arg\min_{h(x(k))}\big\{l(x(k), h(x(k))) + V_{l+1}(x(k+1))\big\} \qquad (6.26)$$
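For a finite (tabular) state–action space, the value-update/policy-improvement loop of (6.25)–(6.26) can be written compactly as below. The finite MDP model, the discount factor, and the toy transition data are assumptions of this sketch; the text itself works with undiscounted costs and general state spaces.

```python
import numpy as np

def value_iteration(P, cost, gamma=0.95, tol=1e-8):
    """Tabular value iteration following the value-update / policy-improvement steps.

    P[a] is the (nS x nS) transition matrix under action a; cost[s, a] is the stage cost.
    Returns the converged value function V and a greedy (improved) policy.
    """
    nS, nA = cost.shape
    V = np.zeros(nS)
    while True:
        Q = cost + gamma * np.stack([P[a] @ V for a in range(nA)], axis=1)  # value update
        V_new = Q.min(axis=1)                                               # greedy backup
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    policy = Q.argmin(axis=1)                                               # policy improvement
    return V, policy

# Tiny illustrative 2-state, 2-action example (data invented for the sketch)
P = [np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([[0.5, 0.5], [0.6, 0.4]])]
cost = np.array([[1.0, 2.0], [4.0, 0.5]])
print(value_iteration(P, cost))
```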

Remark 6.2 To implement the value iteration algorithm in (6.25) and (6.26), x(k+1) and l(x(k), u(k)) could be obtained by using the system model in the model-based case, or by observation in the model-free case. In addition, the algorithm may be implemented offline or online. Offline implementation requires storing control values for every state, while online implementation can be done by properly interleaving the iteration steps with the system dynamics steps. Apparently, if the state space and/or control space has a very large or infinite number of possible values, value functions and control policies will need to be represented approximately (e.g., using neural networks) in order to make the solutions tractable. □

Now let us consider the case of using the state–action value function. Given the control policy u = h(x) and the state–action pair (x, u), the corresponding cost function is defined using the following Q-function
$$Q_h(x, u) = \sum_{k=0}^{\infty} l(x(k), u(k)) \qquad (6.27)$$
with x(0) = x, u(0) = u, x(k+1) = f(x(k), u(k)) for k ≥ 0, and u(k) = h(x(k)) for k ≥ 1. Accordingly, the Bellman equation in terms of the Q-function can be derived as
$$Q_h(x, u) = l(x, u) + Q_h(f(x, u), h(x)) \qquad (6.28)$$

and the optimal Q-function at the state–action pair (x, u) is then
$$Q^*(x, u) = \min_h Q_h(x, u) \qquad (6.29)$$
with the optimal control policy at state x
$$h^*(x) = \arg\min_u Q^*(x, u) \qquad (6.30)$$
It follows from (6.28) and (6.29) that we obtain the following Bellman optimality equation
$$Q^*(x, u) = l(x, u) + \min_{u'} Q^*(f(x, u), u') \qquad (6.31)$$
To this end, the value iteration algorithm in (6.25) and (6.26) can be rewritten using the Q-function as follows. For every (x, u), do
$$Q_{l+1}(x, u) = l(x, u) + \min_{u'} Q_l(f(x, u), u') \qquad (6.32)$$

with the iteration step l = 0, 1, … and the initial value Q_0 = 0. As a result, Q*(x, u) = lim_{l→∞} Q_l(x, u), and the iteration is run until convergence. The optimal control policy is then given by (6.30).

Remark 6.3 The value iteration in (6.32) is a model-based algorithm. To make it model free, the temporal difference method can be employed [3]. For example, an online value iteration algorithm using the Q-function is given below

$$Q_{k+1}(x(k), u(k)) = Q_k(x(k), u(k)) + \alpha_k\Big[l(x(k), u(k)) + \min_{u'} Q_k(x(k+1), u') - Q_k(x(k), u(k))\Big] \qquad (6.33)$$
where α_k is the learning rate, and the term l(x(k), u(k)) + min_{u'} Q_k(x(k+1), u') − Q_k(x(k), u(k)) is the temporal difference.

Policy iteration algorithm. The policy iteration algorithm starts with the initialization by selecting an admissible control policy h_0(x(k)) and an initial value V_0(x(k)). The algorithm proceeds with the following two steps until convergence:
• Policy evaluation step: At step l = 0, 1, …, determine the value of the current policy h_l(x(k)) using the following Bellman equation
$$V_{l+1}(x(k)) = l(x(k), h_l(x(k))) + V_{l+1}(x(k+1)) \qquad (6.34)$$
• Control policy improvement step: Determine an improved control policy using the following equation
$$h_{l+1}(x(k)) = \arg\min_{h(x(k))}\big\{l(x(k), h(x(k))) + V_{l+1}(x(k+1))\big\} \qquad (6.35)$$
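A tabular sketch of the model-free temporal-difference update (6.33) is shown below. The environment interface step(s, a), the ε-greedy exploration, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def q_learning(step, nS, nA, alpha=0.1, eps=0.1, steps=1000, seed=0):
    """Model-free temporal-difference update of (6.33) on a tabular Q-function.

    step(s, a) -> (s_next, stage_cost) is the assumed environment interface.
    Costs are minimized, so the greedy action is argmin over Q.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((nS, nA))
    s = 0
    for _ in range(steps):
        a = int(rng.integers(nA)) if rng.random() < eps else int(np.argmin(Q[s]))
        s_next, cost = step(s, a)
        td = cost + np.min(Q[s_next]) - Q[s, a]      # temporal difference of (6.33)
        Q[s, a] += alpha * td
        s = s_next
    return Q
```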

Remark 6.4 It should be noted that the control policy improvement step is the same in the value iteration algorithm and the policy iteration algorithm. □

Example 6.3 Let us reconsider the linear discrete-time system (6.18) in Example 6.1. Define the control policy u(k) = h(x(k)) = −Kx(k), where K is the control gain matrix to be determined, and let the cost function be V(x(k)) = x(k)^T P x(k). The Bellman equation for the LQR is
$$x(k)^T P x(k) = x(k)^T Q x(k) + u(k)^T R u(k) + x(k+1)^T P x(k+1) \qquad (6.36)$$
Using the control policy u(k) = −Kx(k), equation (6.36) leads to the Lyapunov equation
$$Q + K^T R K + (A - BK)^T P (A - BK) - P = 0 \qquad (6.37)$$
and
$$K = (R + B^T P B)^{-1} B^T P A \qquad (6.38)$$
To this end, it follows from (6.37) and (6.38) that the value iteration algorithm is given by
$$P_{l+1} = Q + K_l^T R K_l + (A - BK_l)^T P_l (A - BK_l) \qquad (6.39)$$
$$K_{l+1} = (R + B^T P_{l+1} B)^{-1} B^T P_{l+1} A \qquad (6.40)$$

203

and the policy iteration algorithm is given by Pl+1 = Q + K lT R K l + (A − B K l )T Pl+1 (A − B K l ) K l+1 = (R + B T Pl+1 B)−1 B T Pl+1 A

(6.41) (6.42) 

6.3 Multiagent Distributed Optimization Consider a network of n agents. Let xi ∈  be the variable associated with agent i, x = [x1 , . . . , xn ]T , and f i (x): n →  be a smooth convex function that is available only to agent i. The multiagent optimization problem is formulated as finding the optimal solution x ∗ = [x1∗ , x2∗ , . . . , xn∗ ]T to the following overall cost function 

n f i (x) minimize i=1 subject to x ∈ n

(6.43)

It follows from (6.43) that x ∗ = argminx

n

f i (x)

(6.44)

i=1

and its solution can be obtained using the following gradient descent method with the convergence of x(k) to x ∗ . x(k + 1) = x(k) − α∇

n

fl (x(k))

(6.45)

l=1

where x(k) = [x1 (k), x2 (k), . . . , xn (k)]T , α is a small, positive step length, and the gradient is defined by ∇

n

fl (x) =





n l=1

∂ x1

fl (x)

···



n l=1

∂ xn

fl (x)

T

l=1

n Remark 6.5 The iteration algorithm in (6.45) relies on the global term ∇ l=1 fl (x(k)). It is more realistic to solve the problem in a distributed way by individual agents using available local information. In what follows, we present two results: n ∂ l=1 fl (x) One is based on the distributed estimation of , and the other is based on the ∂ xi n  distributed estimation of ∇ l=1 fl (x(k)).

204

6 Multiagent Distributed Optimization and Reinforcement Learning Control

6.3.1 Distributed Multiagent Optimization Algorithm: Case 1 In this case, we assume that agent i aims to find xi∗ to the problem in (6.44). Let xi (k) be the estimate of xi∗ by agent i at step k. It follows from (6.45) that  n ∂ fl (x)  xi (k + 1) = xi (k) − α  ∂ xi  l=1

(6.46) x=x(k)

The design of a distributed multiagent optimization algorithm boils down the disn ∂ l=1 fl (x) tributed estimation of the term by individual agents. As introduced in ∂ xi Chap. 3, the information exchange among agents is captured by the adjacency matrix A = [ai j ], which is further assumed to be directed, time-invariant,  and strongly connected. The degree matrix D = diag([d1 , . . . , dn ]) with di = nj=1 ai j , i = 1, . . . , n. It follows that the corresponding Laplacian matrix L = D − A has rank n − 1 and    has a nonrepeated zero eigenvalue [8]. Define F = I − (I + D)−1 L , with I being an n × n identify matrix. Define 1 = [1, . . . , 1]T . Note that F1 = 1 and that λ1 = 1 is an eigenvalue of F, the corresponding right eigenvector of which is 1. The remaining eigenvalues are in the unit circle and satisfy λ1 ≥ |λ2 | ≥ . . . ≥ |λn |. It follows from the discussion in Chap. 3 that λ1 = 1 is simple, and the corresponding normalized left eigenvector w1 = [w11 , w12 , . . . , w1n ]T (i.e., w1T F = w1T λ1 = w1T ) with the property that w1i > 0, ∀i = 1, . . . , n. The proposed distributed multiagent optimization algorithm is of the form xi (k + 1) = xi (k) − αk Pˆii (k)

(6.47)

in which αk is an appropriately chosen step length, and Pˆii (k) is given by Pˆii (k) = Pˆii (k − 1) +

  1 ai j Pˆ ji (k − 1) − Pˆii (k − 1) 1 + di j∈N i

1 + [ pii (k) − pii (k − 1)] w1i

(6.48)

∂ f (x(k)) , p ji (k) := j∂ xi for j = i, Pˆ ji (k) represents of the estiwhere pii (k) := ∂ fi ∂(x(k)) xi n mate of l=1 pli (k) by agent j, j = 1, . . . , n, which is given by (for j = i)

Pˆ ji (k) = Pˆ ji (k − 1) +

  1 a jl Pˆli (k − 1) − Pˆ ji (k − 1) 1 + d j l∈N j

 1  p ji (k) − p ji (k − 1) + w1 j

(6.49)

Remark 6.6 It follows from (6.48) and (6.49) that each agent j maintains the estin ∂ fl (x) mation of n gradient terms l=1 , i = 1, . . . , n. That is, Pˆ ji is the estimate ∂ xi

6.3 Multiagent Distributed Optimization

n

205

∂ fl (x) , ∂ xi

i = 1, . . . , n, by agent j. Apparently, the algorithm in (6.49) is a distributed one since it only needs its own estimate Pˆ ji , its own value p ji , and the estimate Pˆli transmitted from its neighbors l ∈ N j . 

of

l=1

Convergence Analysis. Let us consider convergence analysis for the estimate of n n ∂ fl (x(k)) . Its estimate by agent j is given by Pˆ ji , j = 1, . . . , n. l=1 pli (k) = l=1 ∂ xi Define the overall vectors ⎡ ˆ ⎤ ⎡ ⎤ P1i p1i ⎢ ˆ ⎥ ⎢ ⎥  ⎢ P2i ⎥  ⎢ p2i ⎥ Pˆi = ⎢ . ⎥ , pi = ⎢ . ⎥ ⎣ .. ⎦ ⎣ .. ⎦ pni Pˆni

(6.50)

It then follows from (6.49) and (6.50) that the overall estimator dynamics are Pˆi (k + 1) = F Pˆi (k) + diag



 1 1 ,..., [ pi (k + 1) − pi (k)] w11 w1n

(6.51)

It suffices to show that lim | Pˆii (k) −

n

k→∞

pli (k)| ≤ i

(6.52)

l=1

where i is a small constant. Multiplying w1T on both sides of (6.51) leads to w1T Pˆi (k + 1) = w1T F Pˆi (k) + w1T diag



 1 1 ,..., [ pi (k + 1) − pi (k)] w11 w1n

= w1T Pˆi (k) + 1T [ pi (k + 1) − pi (k)]

(6.53)

which is equivalent to w1T Pˆi (k + 1) − w1T Pˆi (k) = 1T pi (k + 1) − 1T pi (k) Thus, if the initial values of the estimators are selected as Pˆ ji (0) = 1, . . . , n, it then follows from (6.54) that w1T Pˆi (k) = 1T pi (k) =

n l=1

pli (k) =

n ∂ fl (x(k)) l=1

∂ xi

, ∀k

(6.54) p ji (0) ,∀j w1 j

=

(6.55)

Define a disagreement vector δi (k) = Pˆi (k) − 1w T Pˆi (k),

(6.56)

206

6 Multiagent Distributed Optimization and Reinforcement Learning Control

It then follows from (6.55) that δi (k) = Pˆi (k) − 11T pi (k)

(6.57)

Thus, using (6.51), we have δi (k + 1) = Pˆi (k + 1) − 11T pi (k + 1)   1 1 ··· = F Pˆi (k) + diag [ pi (k + 1) − pi (k)] − 11T pi (k + 1) w11 w1n 

 1 1 − 11T [ pi (k + 1) − pi (k)] (6.58) ··· = Fδi (k) + diag w11 w1n Taking the z−transform of both sides of (6.58), we obtain −1

δi (z) = (z I − F)

 

1 1 T diag − 11 (z − 1) pi (z) ··· w11 w1n

(6.59)

Note that all the poles of the proper rational transfer function matrix −1

(z I − F)

 

1 1 T diag − 11 (z − 1) ··· w11 w1n

lie inside the unit circle of the z-plane, and system (6.59) is BIBO stable. Hence (6.52) holds. Specifically, using the final value theorem, we have  

1 1 − 11T pi (z) ··· lim δi (k) = lim (z − 1)2 (z I − F)−1 diag z→1 k→∞ w11 w1n from which we can further see that if pi (k) → 0, δi (k) → 0 as well. To this end, note that the gradient iteration in (6.47) can be rewritten as xi (k + 1) = xi (k) − αk

n l=1

 pli (k) − αk

Pˆi (k) −

n

 pli (k)

(6.60)

l=1

which can be treated as gradient descent algorithm with an additional  the exact n pli (k) . Thus, by properly selecting the step bounded error term αk Pˆi (k) − l=1 size αk such that ∞ (i) to vanish, and  k=1 αk = ∞, which is needed to forces the gradient  n pli (k) to van(ii) limk→∞ αk → 0, which forces the error term αk Pˆi (k) − l=1 ish, the convergence of (6.47) together with (6.48) and (6.49) follows.

6.3 Multiagent Distributed Optimization

207

Remark 6.7 Note that in (6.48) and (6.49), the left eigenvector w is needed by individual agents. To avoid the use of this global information, the distributed estimation of w by each agent can be integrated into the distributed multiagent optimization algorithm in (6.47). That is, an adaptive version of (6.49) as given below can be used. Pˆ ji (k) = Pˆ ji (k − 1) +

  1 a jl Pˆli (k − 1) − Pˆ ji (k − 1) 1 + d j l∈N j

+

1  j wˆ 1 j

p ji (k) − p ji (k − 1)

j

j



(6.61)

j

j

j

where wˆ 1 j (k) is the jth element of wˆ 1 = [wˆ 11 , . . . , wˆ 1n ]T , and wˆ 1 is the estimate of w1 by agent j, which is given follows j

j

wˆ 1 (k + 1) = wˆ 1 (k) +

  1 j a jl wˆ 1l (k) − wˆ 1 (k) 1 + d j l∈N

(6.62)

j

j

where the initial value wˆ 1 (0) = [0, . . . , 0, 1, 0, . . . , 0]T with its jth element being 1. The convergence of (6.62) has been shown in Lemma 4.2. It should be noted that the j convergence of (6.47) will not be altered with the use of wˆ 1 j (k) in (6.61) due to the j

exponential convergence of wˆ 1 as well as the proper selection of the step size αk . 

6.3.2 Distributed Multiagent Optimization Algorithm: Case 2 In this case, we consider the following optimization problem over networks 

n minimize i=1 f i (x) subject to x ∈ m

(6.63)

in which n agents wish to determine an optimal value x ∗ ∈ m , while each agent i only knows its own cost f i (x). This setup is slightly different from the problem in (6.43), where agent i aims to determine its own associated optimal value xi∗ . Nonetheless, the distributed multiagent optimization algorithm to (6.63) can be developed following a similar procedure. It follows that the gradient descent algorithm to find x ∗ is n  ∂ i=1 f i (x)  x(k + 1) = x(k) − αk (6.64)  ∂x x=x(k) ∂

n

f (x)

i=1 i A distributed algorithm can be obtained based on the estimation of . Let ∂x i m ∗ x (k) ∈  be the estimate of optimal value x by agent i at the iteration step k,

208

6 Multiagent Distributed Optimization and Reinforcement Learning Control

n i pi (k) = ∂ fi (x∂ x(k)) , and Pˆi (k) be the estimate of l=1 pl (k) by agent i. The proposed distributed optimization algorithm for agent i is of the form x i (k + 1) = x i (k) − αk Pˆi (k)  1 ˆ P j (k) − Pˆi (k) Pˆi (k + 1) = Pˆi (k) + 1 + di j∈N

(6.65)

i

1 [ pi (k + 1) − pi (k)] + i wˆ 1i (k)

(6.66)

i (k) is given by (6.62). where αk is an appropriately chosen step length, wˆ 1i The convergence analysis of the algorithm in (6.65) and (6.66) is similar to that in Sect. 6.3.1. It can be shown that x i (k) → x ∗ as k → ∞ for all i. More generally, the distributed multiagent optimization algorithm can also be applied to the following constrained optimization problem



n f i (x) minimize i=1 subject to h i (x) = 0, i = 1, . . . , n

(6.67)

That is, with the aid of the penalty function method [2], we may convert the problem in (6.67) into an unconstrained approximation problem as follows minimize

n

[ f i (x) + K h i2 (x)]

(6.68)

i=1

where K is some large, positive constant. Similarly, to minimize the constraint region

n i=1

f i (x) over

Ω = {x : gi (x) ≥ 0, i = 1, 2, . . . , n}, the unconstrained approximation problem can be formulated as finding the minimum of  n  1 f i (x) + γ gi (x) i=1 over the interior of the region Ω, where γ is a small, positive constant. Example 6.4 Consider the distributed optimization problem in (6.63) with x ∈ 2 , n = 4, and the individual cost function for the agents is fi (x) = x T Pi x + biT x, where  P1 = 

   0.2 0.1 0.4 0.1 , P2 = , 0.1 0.2 0.2 0.4

   0.3 0.1 0.5 0.1 , P4 = , P3 = 0.1 0.2 0.1 0.2

6.4 Multiagent Distributed Coordination Using Reinforcement Learning

209

T and b1 = [1, 8]T , b2 = [1, 1]T , b3  = [3, 1]T , and  b4 = [5, 1] . The optimal Tvalue T −1 can be computed using −( i Pi + i Pi ) ( i bi ) = [−2.1086, −4.5511] . We simulate the distributed multiagent optimization algorithm in (6.65). Assume that the adjacency matrix among agents is



0 ⎢ 200 A=⎢ ⎣ 200 200

200 0 200 0

200 200 0 200

⎤ 0 0 ⎥ ⎥ 200 ⎦ 0

The corresponding degree matrix, Laplacian matrix, and matrix F are ⎡

400 ⎢ 0 D=⎢ ⎣ 0 0 and

0 400 0 0

0 0 600 0

⎤ ⎡ ⎤ 0 400 −200 −200 0 ⎢ ⎥ 0 ⎥ ⎥ , L = D − A = ⎢ −200 400 −200 0 ⎥ ⎦ ⎣ 0 −200 −200 600 −200 ⎦ 400 −200 0 −200 400 ⎡

⎤ 0.0025 0.4988 0.4988 0 ⎢ 0.4988 0.0025 0.4988 0 ⎥ ⎥ F = I − (I + D)−1 L = ⎢ ⎣ 0.3328 0.3328 0.0017 0.3328 ⎦ 0.4988 0 0.4988 0.0025

and its left eigenvector, corresponding to an eigenvalue of 1, is  T w1 = 0.2964 0.2593 0.3331 0.1111 The estimation of w1 is carried out using (6.62). The initial values of the estimates made by the four agents are set to be zero. Figures 6.1, 6.2, 6.3, and 6.4 show the convergence of the estimation errors produced by the four agents. 

6.4 Multiagent Distributed Coordination Using Reinforcement Learning Consider a group of n agents, and each exhibits dynamical behaviors governed by the following discrete-time dynamic model xi (k + 1) = f i (xi (k)) + gi (xi (k))u i (k)

(6.69)

where i = 1, . . . , n, xi (k) ∈ q is the state vector of the agent i at the discrete time step k, u i (k) ∈ m is the control input to agent i, and f i : q → q and gi : q → m are differentiable in their arguments with f i (0) = 0.

210

6 Multiagent Distributed Optimization and Reinforcement Learning Control

Fig. 6.1 Estimation errors of agent 1

Estimation errors by agent 1

5

0

-5

-10

-15

-20

-25

x 1-x *1 x 2-x *2

-30 0

50

100

150

200

250

Time steps

Fig. 6.2 Estimation errors of agent 2

Estimation errors by agent 2

5

0

-5

-10

x 1-x *1 x 2-x *2

-15 0

50

100

150

200

250

Time steps

We consider the multiagent optimal coordination control problem for multiagent systems in (6.69). Through local information exchange, the group of agents aims to reach to a common target cooperatively while minimizing certain performance cost defined by all agents. The design can be straightforwardly applied to the multiagent systems with continuous-time dynamics with the use of a standard discretization procedure. Similarly, as given in Sect. 6.3.1, the adjacency matrix A = [ai j ] ∈ n×n is used to describe the sensing / communication topology among agents, which is assumed to be directed, time-invariant, and strongly connected. The target agent or called informed agent (leader) has dynamics given by x0 (k + 1) = f 0 (x0 (k), u 0 (k)),

(6.70)

6.4 Multiagent Distributed Coordination Using Reinforcement Learning Fig. 6.3 Estimation errors of agent 3

211

Estimation errors by agent 3 5

0

-5

-10

-15 x 1-x *1 x 2-x *2

-20 0

50

100

150

200

250

Time steps

Fig. 6.4 Estimation errors of agent 4

Estimation errors by agent 4

5

0

-5

-10

-15 x 1-x *1 x 2-x *2

-20

0

50

100

150

200

250

Time steps

where x0 ∈ q , u 0 ∈ m , and (6.70) is BIBO stable. The control objective can be explicitly stated as designing u i (k) such that lim xi (k) − x0 (k) = 0

k→∞

(6.71)

We also assume that the informed agent state x0 (k) is available to at least one agent by sensing/communication detection, and use a diagonal matrix B to describe the availability of x0 (k) to all agents, that is, B = diag{bi0 }

212

6 Multiagent Distributed Optimization and Reinforcement Learning Control

with bi0 > 0 if agent i can access x0 (k). At the time step k, upon applying control action u i (k), the agent i’s state is transferred to xi (k + 1) from xi (k) according to the system dynamics in (6.69). Then agent i’s controller receives a reward according to a cost function defined below ri (k + 1) = ρi (xi (k), ai j x j (k), bi0 x0 (k), u i (k))

(6.72)

The control action u i (k) is generated according to its control policy function u i (k) = μi (xi (k), ai j x j (k), bi0 x0 (k))

(6.73)

The individual long-term cost for agent i can be defined as follows: Ji (xi (k), ai j x j (k), bi0 x0 (k)) =



γil−k ρi (xi (l), ai j x j (l), bi0 x0 (k), u i (l))

l=k

where γi ∈[0, 1] is the discount factor for agent i. To shorten the notation, we simply use Ji (k) for Ji (xi (k), ai j x j (k), bi0 x0 (k)). The overall cost for all agents is then given by n Ji (k) (6.74) J (k) = i=1

For the multiagent coordination problem, we simply choose ρi as ρi (k) =



(xi (k) − x j (k))T Γi (xi (k) − x j (k))

j∈Ni

+bi0 (xi (k) − x0 (k))T Γi (xi (k) − x0 (k)) + u i (k)T Λi u i (k)

(6.75)

where Γi and Λi are symmetric and positive definite matrices. In the defined cost function ρi , the first term is used to measure the closeness of the states between agents i and j, while the second term is to measure the control effort exerted by agent i. We let the discount factor γi = 1 just like in the standard optimal control problem. To this end, the cost functional for agent i is given by Ji (k) =

∞ l=k

⎡ ⎣



(xi (l) − x j (l))T Γi (xi (l) − x j (l))

j∈Ni

+bi0 (xi (l) − x0 (l))T Γi (xi (l) − x0 (l)) + u i (l)T Λi u i (l)



The control objective for each agent i is to solve the following problem.

(6.76)

6.4 Multiagent Distributed Coordination Using Reinforcement Learning

213

Problem 6.1 Find a distributed optimal control policy μi∗ for agent i such that the overall performance function (6.74) for all agents can be minimized while achieving (6.71). Assumption 6.1 There exist admissible cooperative controls u i (k) for multiagent systems (6.69). Remark 6.8 The control is admissible if the closed-loop system is stabilizable. For cooperative control of (6.69), this implies that for agent i, there exists local distributed cooperative control of the form u i (xi , ai j x j ) such that (6.71) can be achieved. Assumption 6.1 is imposed to consider the nonlinear systems (6.69) which can be controlled cooperatively. For instance, given a feedback linearizable system of the form of x˙i = f i (xi ) + u i [9], a simple admissible cooperative control policy could be u i (t) = − f i (xi ) +

n

ai j (x j − xi ),

(6.77)

j=1

which renders (6.71) under assumption that the adjacency matrix A is connected. In this section, we present a distributed reinforcement learning control approach to address problem 6.1. 

6.4.1 Multiagent HJB Equation For the multiagent system (6.69), define the overall state vector x(k) = [x1T (k), . . . , xnT (k)]T and the overall control vector u(k) = [u 1T (k), . . ., u nT (k)]T . It follows from (6.74) and (6.76) that we have J (k) =

n i=1

⎡ ⎣



(xi (k) − x j (k))T Γi (xi (k) − x j (k))

j∈Ni

+bi0 (xi (k) − x0 (k))T Γi (xi (k) − x0 (k)) + u i (k)T Λi u i (k) +J (k + 1)

 (6.78)

Let V (k) be the optimal value function starting at x(k) at time step k. Then according to the Bellman optimality principle, we have

214

6 Multiagent Distributed Optimization and Reinforcement Learning Control

 V (k) = min u(k)

+

n 

u i (k)T Λi u i (k) + bi0 (xi (k) − x0 (k))T Γi (xi (k) − x0 (k))

i=1





(xi (k) − x j (k))T Γi (xi (k) − x j (k))⎦ + V (k + 1)

j∈Ni

and the optimal control policy  u ∗ (k) = argminu(k)

⎫ ⎬ (6.79)



n  u i (k)T Λi u i (k) i=1

+bi0 (xi (k) − x0 (k))T Γi (xi (k) − x0 (k)) +



⎫ ⎬



(xi (k) − x j (k))T Γi (xi (k) − x j (k))⎦ + V (k + 1)



j∈Ni

(6.80)

It then follows from (6.80) that the optimal cooperative control can be obtained by setting the gradient of the right-hand side of (6.79) with respect to u(k) equal to zero, that is, ∂ V (k + 1) = 0, ∀i (6.81) 2Λi u i (k) + ∂u i (k) n Note that V (k) can be rewritten as V (k) = i=1 Vi (k) with Vi (k) being the optimal value function for agent i. Apparently, Vi (k) must be a function of xi (k) and x j (k), j ∈ Ni . Thus, it follows from (6.81) that the resulting optimal cooperative control is of the form ⎡ ⎤ ∂ V j (k + 1) ∂ V (k + 1) 1 i ⎦ u i∗ (k) = − Λi−1 giT ⎣ + (6.82) 2 ∂ xi (k + 1) ∂ x (k + 1) i j∈N i

Substituting (6.82) into (6.79) leads to the following multiagent HJB equation i

⎧⎡

⎤T ∂ V (k + 1) 1 ∂ V (k + 1) j i ⎣ ⎦ × + Vi (k) = ⎪ 4 ∂ x (k + 1) ∂ x (k + 1) i i ⎩ i=1 j∈N n ⎪ ⎨

i



⎤ ∂ V j (k + 1) ∂ V (k + 1) i ⎦ + gi Λi−1 giT × ⎣ ∂ xi (k + 1) ∂ x (k + 1) i j∈N i

+bi0 (xi (k) − x0 (k)) Γi (xi (k) − x0 (k)) T

+



(xi (k) − x j (k))T Γi (xi (k) − x j (k))

j∈Ni

+

i

Vi (k + 1)

⎫ ⎬ ⎭ (6.83)

6.4 Multiagent Distributed Coordination Using Reinforcement Learning

215

Apparently, once the multiagent HJB equation (6.83) is solved, the optimal cooperative control u i (k) can be obtained from (6.82). However, equation (6.83) is a partial differential equation for the optimal value function Vi (k). It is generally difficult to solve this equation analytically. In addition, even if its solution is obtained, it may not be implementable by individual agents because of the need of global information from all agents. In what follows, we present an approximated value iteration algorithm using a reinforcement learning approach.

6.4.2 Value Iteration Algorithm for Multiagent HJB The proposed value iteration algorithm for numerically solving (6.83) consists of two steps: value update and policy improvement. Theorem 6.1 Consider the nonlinear multiagent system in (6.69). For any integer l ≥ 0, given an admissible control u 0 (k), if a sequence of pairs {V l+1 , u l+1 } is generated by the following two steps: • Value update: (Vi0 (k) = 0) i

⎧⎡ ⎤T n ⎪ ⎨ 1 ∂ V l (k + 1) ∂ V l (k + 1) j i ⎣ ⎦ + Vil+1 (k) = ⎪ 4 ∂ x (k + 1) ∂ x (k + 1) i i ⎩ i=1 j∈Ni ⎡ ⎤ l l ∂ V (k + 1) ∂ V (k + 1) j ⎦ + gi Λi−1 giT ⎣ i ∂ xi (k + 1) ∂ x (k + 1) i j∈N i

+bi0 (xi (k) − x0 (k))T Γi (xi (k) − x0 (k)) +



(xi (k) − x j (k))T Γi (xi (k) − x j (k))

j∈Ni

+



⎫ ⎬ ⎭

Vil (k + 1)

(6.84)

i

• Policy improvement ⎡

u l+1 i (k) =

∂ V l+1 (k + 1) 1 − Λi−1 giT ⎣ i 2 ∂ xi (k + 1)

+

∂ V jl+1 (k + 1) j∈Ni

∂ xi (k + 1)

⎤ ⎦

(6.85)

then the corresponding value function V l satisfies V l+1 (k) ≥ V l (k)

(6.86)

216

6 Multiagent Distributed Optimization and Reinforcement Learning Control

and lim V l (k) = V (k)

(6.87)

l→∞

Proof The proof of (6.86) can be done by induction. Define  ¯l

V (k) =

n 

u li (k)T Λi u li (k) + bi0 (xi (k) − x0 (k))T Γi (xi (k) − x0 (k))

i=1

+



⎤ (xi (k) − x j (k))T Γi (xi (k) − x j (k))⎦ + V¯ l−1 (k + 1)

j∈Ni

⎫ ⎬ ⎭

(6.88)

with V¯ 0 (k) = 0. It is apparent that V 1 (k) ≥ V¯ 0 (k). Assume that V l (k) ≥ V¯ l−1 (k). Then it follows from (6.88 ) and (6.84) that V l+1 (k) − V¯ l (k) = V l (k + 1) − V¯ l−1 (k + 1) ≥ 0

(6.89)

On the other hand, based on the definitions of V l (k) and V¯ l (k), we know that V l (k) satisfies the Bellman optimality equation and is the optimal value function at time instant k, which implies V l (k) ≤ V¯ l (k). Combining with (6.89) leads to (6.86). The convergence of V l (k) to V (k) can be shown by noting that the sequence {V l (k)} generated by (6.84) is upper bounded and therefore has a limit, which is the solution to (6.83).  Apparently, the value iteration algorithm in Theorem 6.1 requires the storage of value functions for every state. In order to facilitate its implementation in dealing with the continuous state space in the problem setting of this section, we use the following neural network to parameterize the value function for for agent i. Vi (k) = ΦiT (x¯i )θi∗ + ωi,l (x¯i ), ∀x¯i ∈ Ωi

(6.90)

where Ωi a compact set, x¯i = [bi0 x0T , ai1 x1T , . . . , xiT , . . . , ain xnT ]T , li ≥ 1 is the number of neural nodes, θi∗ ∈ R li unknown constant parameter vector, Φi (x¯i ) = [φi1 , φi2 , . . . , φili ]T is the known basis function vector, and ωi,l (x¯i ) denotes the approximation error. There are a number of different choices of basis functions in the use of neural networks. We impose a condition on the basis functions Φi such that the mapping defined by ΦiT θi∗ is nonexpansive. That is, the following inequality is satisfied: ΦiT θi − ΦiT θi ≤ θi − θi .

(6.91)

One commonly used basis function is the Gaussian function ensuring the inequality in (6.91), which has the form

6.4 Multiagent Distributed Coordination Using Reinforcement Learning

φi j (x¯i ) = exp



 j∈Ni

x j − xi − μi j 1n 2 − bi0 xi − x0 2 ηi2j

217

! ,

where j = 1, 2, ..., li , μi j is the center of the receptive field and ηi j is the width of the Gaussian function. Remark 6.9 The optimal weight vector θi∗ in (6.90) is an “artificial” quantity required only for analytical purposes. Typically, θi∗ is chosen as the value of θi that minimizes ωi (x¯i ) for all x¯i ∈ Ωi , i.e., θi∗

 := arg min

" sup |Vi (k) −

θi,l ∈R li

x¯i ∈Ωi

θiT Φi (x¯i )|



According to the universal approximation Theorem [10, 11], approximation error ωi (x¯i ) must be bounded upon having the expression of (6.90). The following assumption on the approximation error is thus in order. Assumption 6.2 Over a compact region Ωi |ωi (x¯i )| ≤ δi∗ ∀x¯i ∈ Ωi , i = 1, . . . , N

(6.92)

where δi∗ ≥ 0 is an unknown bound. Remark 6.10 It is worth pointing out that the approximation error ωi (x¯i ) will converge to zero as li → ∞ according to the universal approximation Theorem [10, 11],  which implies ω¯ i (t) → 0 as li → ∞. Remark 6.11 Several other types of function approximators could be used to parameterize value function Vi with certain modifications, such as, the radial basis function (RBF) neural networks [11], high-order neural networks [10], and fuzzy systems [12]. The key is to pick the appropriate basis functions Φi such that the nonexpansion condition in (6.91) is satisfied.  In Theorem 6.1, the iteration step l and the time step k are generally different. Note that the iteration mapping from Vil (k) to Vil+1 (k + 1) defined in (6.84) is a contraction mapping and there is a fixed point Vi (k) as proved in Theorem 6.1. Thus, we may simply let l = k for the ease of online implementation [3]. To this end, together with the linear neural network parameterization of (6.90) and let θi (k) be the estimate of θi∗ at time instant k, equation (6.84) becomes n (ΦiT (x¯i (k))θi∗ + ωi (x¯i (k)) = ρi (k) i

+

i

i=1

(ΦiT (x¯i (k

+ 1))θi∗ + ωi (x¯i (k + 1))

(6.93)

218

6 Multiagent Distributed Optimization and Reinforcement Learning Control

 where we recall ρi (k) = j∈Ni (xi (k) − x j (k))T Γi (xi (k) − x j (k)) + bi0 (xi (k) − x0 (k))T Γi (xi (k) − x0 (k)) + u i (k)T Λi u i (k). Again, directly solving for θi∗ would be difficult, and therefore we derive a gradient-based learning for updating θ (k) by minimizing the squared value error defined below ¯ (k)) = E(θ



Vi (k) −

i



!2 ΦiT (x¯i (k))θi (k)

(6.94)

i

It follows that α(k) ¯ (k))  E(θ 2 ! T = θ (k) + α(k) Vi (k) − Φi (x¯i (k))θi (k) Φ(x(k)) (6.95)

θ (k + 1) = θ (k) −

i

i

 T where θ (k) = θ1 (k)T , . . . , θn (k)T , Φ(x(k)) = [Φ1T (x¯1 (k)), . . . , ΦnT (x¯n (k))]T , and α(k) is the learning rate satisfying the following conditions: ∞ ∞ 2 (i) k=1 α(k) = ∞ and k=1 α (k) < ∞. and (ii) limk→∞ α(k) → 0.  Apparently, the optimal value i Vi (k) in (6.95) is unknown. Motivated by the temporal difference method in reinforcement learning [3], we replace it by its pre n ρi (k) + i ΦiT (x¯i (k + 1))θi (k). To this end, dicted value using θ (k), that is, i=1 the adaptive law for θ (k) becomes θ (k + 1) = θ (k) + α(k)

n

ρi (k) +

i=1





!



ΦiT (x¯i (k + 1))θi (k)

i

ΦiT (x¯i (k))θi (k) Φ(x(k))

(6.96)

i

and particularly for each agent i, (6.96) reduces to θi (k + 1) = θi (k) + α(k)

n i=1



n

ρi (k) + !

n

ΦiT (x¯i (k + 1))θi (k)

i=1

ΦiT (x¯i (k))θi (k) Φi (x¯i (k))

(6.97)

i=1

The corresponding control in (6.85) becomes ⎤ ⎡ ∂Φ j (k + 1) 1 −1 T ⎣ ∂Φi (k + 1) u i (k) = − Λi gi θi (k) + θ j (k)⎦ (6.98) 2 ∂ xi (k + 1) ∂ xi (k + 1) j∈N i

6.4 Multiagent Distributed Coordination Using Reinforcement Learning

219

Remark 6.12 The algorithm in (6.97) can be treated as an approximate temporal difference algorithm because of the use of Σ_{i=1}^n ρ_i(k) + Σ_i Φ_i^T(x̄_i(k+1))θ_i(k) in place of the true optimal value function Σ_i V_i(k).

Remark 6.13 The conditions on the learning rate α(k) are standard in gradient-based learning algorithms and ensure the convergence of the estimate θ_i(k) to θ_i^*. Specifically, condition (i) is needed to force the gradient to vanish as k → ∞, and condition (ii) is needed to force error terms like α(k)ε(k) to vanish, where ε(k) represents the lumped error due to the neural network approximation error ω_i as well as the error between Σ_i V_i(k) and Σ_{i=1}^n ρ_i(k) + Σ_i Φ_i^T(x̄_i(k+1))θ_i(k).

Remark 6.14 In (6.97), the terms Σ_i ρ_i(k), Σ_i Φ_i^T(x̄_i(k))θ_i(k), and Σ_i Φ_i^T(x̄_i(k+1))θ_i(k) depend on all agents and are global information which may not be available to agent i. What is available to agent i are ρ_i(k) and ρ_j(k) for j ∈ N_i, Φ_i^T(x̄_i(k+1))θ_i(k) and Φ_j^T(x̄_j(k+1))θ_j(k) for j ∈ N_i, and Φ_i^T(x̄_i(k))θ_i(k) and Φ_j^T(x̄_j(k))θ_j(k) for j ∈ N_i. We propose the following consensus algorithms to further estimate the three terms Σ_i ρ_i(k), Σ_i Φ_i^T(x̄_i(k))θ_i(k), and Σ_i Φ_i^T(x̄_i(k+1))θ_i(k).

For agent i, let us define R_i(k) as the estimate of Σ_i ρ_i(k), Υ_i as the estimate of Σ_i Φ_i^T(x̄_i(k+1))θ_i(k), and Ψ_i as the estimate of Σ_i Φ_i^T(x̄_i(k))θ_i(k). The proposed consensus algorithms are of the form

$$
R_i(k) = R_i(k-1) + \frac{1}{1+d_i}\sum_{j\in N_i} a_{ij}\big(R_j(k-1)-R_i(k-1)\big) + \frac{1}{w_i}\big[\rho_i(k)-\rho_i(k-1)\big] \tag{6.99}
$$

$$
\Upsilon_i(k) = \Upsilon_i(k-1) + \frac{1}{1+d_i}\sum_{j\in N_i} a_{ij}\big(\Upsilon_j(k-1)-\Upsilon_i(k-1)\big) + \frac{1}{w_i}\big[\Phi_i^T(\bar{x}_i(k+1))\theta_i(k)-\Phi_i^T(\bar{x}_i(k))\theta_i(k-1)\big] \tag{6.100}
$$

$$
\Psi_i(k) = \Psi_i(k-1) + \frac{1}{1+d_i}\sum_{j\in N_i} a_{ij}\big(\Psi_j(k-1)-\Psi_i(k-1)\big) + \frac{1}{w_i}\big[\Phi_i^T(\bar{x}_i(k))\theta_i(k)-\Phi_i^T(\bar{x}_i(k-1))\theta_i(k-1)\big] \tag{6.101}
$$

where w_i is the ith component of the left eigenvector associated with eigenvalue 1 of the matrix F = I − (I + D)^{-1}L, with I being the n × n identity matrix. That is, define 1 = [1, ..., 1]^T. Note that F1 = 1, so λ_1 = 1 is an eigenvalue of F whose corresponding right eigenvector is 1. The remaining eigenvalues lie in the unit circle and satisfy λ_1 ≥ |λ_2| ≥ ··· ≥ |λ_n|. Under the assumption that the sensing/communication topology is strongly connected, we know that λ_1 = 1 is simple, and the corresponding normalized left eigenvector w_1 = [w_11, w_12, ..., w_1n]^T (i.e., w_1^T F = w_1^T λ_1 = w_1^T) has the property that w_1i > 0 for all i = 1, ..., n. To this end, the adaptive law for updating the neural network weights in (6.97) becomes

$$
\theta_i(k+1) = \theta_i(k) + \alpha(k)\big[R_i(k) + \Upsilon_i(k) - \Psi_i(k)\big]\Phi_i(\bar{x}_i(k)) \tag{6.102}
$$
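The following sketch shows one way to implement the estimators (6.99)-(6.101) and the distributed weight update (6.102). The three-agent line graph, the signals being tracked, and all numerical values are illustrative assumptions rather than the book's example; the estimators are written in the equivalent matrix form R(k) = FR(k−1) + diag(1/w_1i)·[innovation].

```python
# Sketch of the consensus estimators (6.99)-(6.101); illustrative graph and data.
import numpy as np

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])                     # undirected, connected topology (assumed)
d = A.sum(axis=1)
L = np.diag(d) - A
F = np.eye(3) - np.linalg.inv(np.eye(3) + np.diag(d)) @ L      # F = I - (I+D)^{-1} L

# Normalized left eigenvector w1 of F for eigenvalue 1 (w1^T F = w1^T, entries > 0).
vals, vecs = np.linalg.eig(F.T)
w1 = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
w1 = w1 / w1.sum()

def estimator_step(E_prev, new, old):
    """Shared form of (6.99)-(6.101): consensus mixing plus a local innovation."""
    return F @ E_prev + (new - old) / w1         # matrix form of the per-agent update

rng = np.random.default_rng(0)
rho_old = rng.random(3)
R = rho_old / w1                                 # initialization suggested by (6.107)
for k in range(10):                              # signals vary for a few steps ...
    rho_new = rho_old + 0.05 * rng.standard_normal(3)
    R = estimator_step(R, rho_new, rho_old)
    rho_old = rho_new
for k in range(40):                              # ... then hold, so each R_i -> sum_l rho_l
    R = estimator_step(R, rho_old, rho_old)
print(R, rho_old.sum())                          # local estimates approach the network sum
# Given R_i, Upsilon_i, Psi_i, agent i would then update its weights as in (6.102):
# theta_i += alpha_k * (R_i + Upsilon_i - Psi_i) * Phi_i(xbar_i(k))
```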

The proposed distributed reinforcement learning coordination algorithm can be summarized as follows.

Algorithm 6.1 Consensus-Based Reinforcement Learning Control Algorithm with Linear Parameterization of the Value Function (for agent i)
Input: system dynamics f_i, g_i, basis functions, learning rate α(k), and network topology parameter w_i
1. Initialize θ_i(0), R_i(0), Ψ_i(0), Υ_i(0).
2. Measure the initial state x_i(0) and the neighbors' states x_j(0).
for time step k = 0, 1, 2, ... do
3. Update u_i(k) using (6.98).
4. Apply u_i(k), measure the next state x_i(k+1), and obtain x_j(k+1), ρ_i(k+1), ρ_j(k+1), Υ_j(k), Υ_i(k), Ψ_i(k+1), Ψ_i(k), R_i(k), and R_j(k).
5. Update θ_i(k+1) using (6.102), (6.99), (6.100), and (6.101).
end for

To this end, we have the following main theorem.

Theorem 6.2 Consider the nonlinear multiagent system in (6.69) under Assumptions 6.1 and 6.2. The proposed Algorithm 6.1 approximately solves Problem 6.1.

Proof It has been shown in Theorem 6.1 that fundamentally the value iteration algorithm is a contraction mapping, and the use of the nonexpansive linear neural network parameterization does not change this property. Thus, the proof boils down to showing the convergence of the gradient-based algorithm in (6.102). To proceed, we first show the convergence of the consensus estimation algorithms in (6.99), (6.100), and (6.101); that is, the boundedness of the estimation errors |R_i(k) − Σ_i ρ_i(k)|, |Υ_i(k) − Σ_i Φ_i^T(x̄_i(k+1))θ_i(k)|, and |Ψ_i(k) − Σ_i Φ_i^T(x̄_i(k))θ_i(k)| will be proved. Then we establish the convergence of the gradient-based algorithm in (6.102).

It suffices to show the boundedness of |R_i(k) − Σ_i ρ_i(k)|; the proofs for the boundedness of |Υ_i(k) − Σ_i Φ_i^T(x̄_i(k+1))θ_i(k)| and |Ψ_i(k) − Σ_i Φ_i^T(x̄_i(k))θ_i(k)| can be done similarly. Define the overall vectors

$$
R = \begin{bmatrix} R_1 \\ R_2 \\ \vdots \\ R_n \end{bmatrix}, \qquad \chi = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_n \end{bmatrix} \tag{6.103}
$$


It follows from (6.99) and (6.103) that the overall estimator dynamics are

$$
R(k+1) = F R(k) + \mathrm{diag}\Big(\frac{1}{w_{11}},\ldots,\frac{1}{w_{1n}}\Big)\big[\chi(k+1)-\chi(k)\big] \tag{6.104}
$$

Thus, it follows from (6.104) that

$$
w_1^T R(k+1) = w_1^T F R(k) + w_1^T\,\mathrm{diag}\Big(\frac{1}{w_{11}},\ldots,\frac{1}{w_{1n}}\Big)\big[\chi(k+1)-\chi(k)\big] = w_1^T R(k) + \mathbf{1}^T\big[\chi(k+1)-\chi(k)\big] \tag{6.105}
$$

which is equivalent to

$$
w_1^T R(k+1) - w_1^T R(k) = \mathbf{1}^T\chi(k+1) - \mathbf{1}^T\chi(k) \tag{6.106}
$$

Thus, if the initial values of the estimators are selected as R_i(0) = ρ_i(0)/w_{1i}, ∀i = 1, ..., n, it then follows from (6.106) that

$$
w_1^T R(k) = \mathbf{1}^T\chi(k) = \sum_{l=1}^{n}\rho_l(k), \quad \forall k \tag{6.107}
$$
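As a quick numeric check of the invariant (6.107), the sketch below initializes R_i(0) = ρ_i(0)/w_{1i} and verifies that w_1^T R(k) equals Σ_l ρ_l(k) at every step. The complete graph and the random ρ trajectory are arbitrary illustrative choices.

```python
# Numeric verification of (6.106)-(6.107) on an illustrative three-agent graph.
import numpy as np

A = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])   # complete graph on 3 nodes
d = A.sum(axis=1)
F = np.eye(3) - np.linalg.inv(np.eye(3) + np.diag(d)) @ (np.diag(d) - A)
vals, vecs = np.linalg.eig(F.T)
w1 = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))]); w1 /= w1.sum()

rng = np.random.default_rng(1)
rho = rng.random(3)
R = rho / w1                                               # initialization from (6.107)
for k in range(20):
    rho_next = rho + rng.standard_normal(3)                # arbitrary time-varying signals
    R = F @ R + (rho_next - rho) / w1                      # estimator (6.99) in matrix form
    assert np.isclose(w1 @ R, rho_next.sum())              # invariant (6.107) holds for all k
    rho = rho_next
print("invariant holds")
```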

Now let us show that

$$
\lim_{k\to\infty}\Big|R_i(k) - \sum_{l=1}^{n}\rho_l(k)\Big| \le \epsilon_i \tag{6.108}
$$

where ε_i is a bounded constant. Taking the z-transform of both sides of (6.104) leads to

$$
R(z) = (zI - F)^{-1}\,\mathrm{diag}\Big(\frac{1}{w_{11}},\ldots,\frac{1}{w_{1n}}\Big)(z-1)\chi(z) \tag{6.109}
$$

It can be seen that the transfer function matrix (zI − F)^{-1} diag(1/w_11, ..., 1/w_1n)(z − 1) is proper and rational and that all its poles lie inside the unit circle of the z-plane; thus, (6.109) is BIBO stable. Define a disagreement vector

$$
\delta(k) = R(k) - \mathbf{1}\,w_1^T R(k) \tag{6.110}
$$

It then follows from w_1^T R(k) = 1^T χ(k) and (6.110) that

$$
\delta(k) = R(k) - \mathbf{1}\,\mathbf{1}^T\chi(k) \tag{6.111}
$$

Thus, using (6.104), we have


$$
\delta(k+1) = R(k+1) - \mathbf{1}\mathbf{1}^T\chi(k+1) = F R(k) + \mathrm{diag}\Big(\frac{1}{w_{11}},\ldots,\frac{1}{w_{1n}}\Big)\big[\chi(k+1)-\chi(k)\big] - \mathbf{1}\mathbf{1}^T\chi(k+1) = F\delta(k) + \Big[\mathrm{diag}\Big(\frac{1}{w_{11}},\ldots,\frac{1}{w_{1n}}\Big) - \mathbf{1}\mathbf{1}^T\Big]\big[\chi(k+1)-\chi(k)\big] \tag{6.112}
$$

Taking the z-transform of both sides of (6.112), we obtain

$$
\delta(z) = (zI - F)^{-1}\Big[\mathrm{diag}\Big(\frac{1}{w_{11}},\ldots,\frac{1}{w_{1n}}\Big) - \mathbf{1}\mathbf{1}^T\Big](z-1)\chi(z) \tag{6.113}
$$

Again, noting that all the poles of the proper rational transfer function matrix (zI − F)^{-1}[diag(1/w_11, ..., 1/w_1n) − 11^T](z − 1) lie inside the unit circle of the z-plane, system (6.113) can be considered BIBO stable. In other words, the disagreement vector δ(k) is bounded, and accordingly, (6.108) holds.

Now let us show the convergence of (6.102). It follows from (6.102), (6.99), (6.100), and (6.101) that the gradient iteration in (6.102) can be rewritten as

$$
\theta_i(k+1) = \theta_i(k) + \alpha(k)\big(\mathcal{Y}(k) + \varepsilon_i(k)\big)\Phi_i(\bar{x}_i(k)) \tag{6.114}
$$

where 𝒴(k) = Σ_{i=1}^n ρ_i(k) + Σ_i Φ_i^T(x̄_i(k+1))θ_i(k) − Σ_i Φ_i^T(x̄_i(k))θ_i(k), and ε_i(k) is the lumped bounded error due to the use of Υ_i(k), Ψ_i(k), and R_i(k). Thus, equation (6.114) can be treated as the exact gradient descent algorithm with an additional error term α(k)ε_i(k). Hence, due to the conditions on the learning rate α(k), the convergence of (6.102) follows.

Remark 6.15 It should be noted that in the online implementation of the control policy u_i(k) in (6.98), we will generally need the states x_i(k+1) and x_j(k+1) because of the appearance of the gradients ∂Φ_i(k+1)/∂x_i(k+1) and ∂Φ_j(k+1)/∂x_i(k+1). For some dynamical systems, the closed form of u_i(k) may be solved from (6.98). However, for the general case, the data x_i(k+1) and x_j(k+1) could be generated by directly using the system dynamics x_i(k+1) = f_i(x_i(k)) + g_i(x_i(k))ū_i with some random stable action ū_i. This serves as the online exploration process in the reinforcement learning algorithm. With x_i(k+1) and x_j(k+1), the u_i(k) obtained in (6.98) is then exploited to drive the dynamical system. To further address this issue, we may use another neural network to parameterize the control policy u_i(k) and redesign the control algorithm under the actor-critic framework.

Remark 6.16 Theorem 6.2 provides a control design for solving the distributed tracking problem as defined in Problem 6.1. The algorithm can be easily adapted to solve the consensus control problem.

6.4.3 Q-function-Based Value Iteration

To fully solve the issue of model dependence in the proposed Algorithm 6.1, it is instrumental to use the state-action value function (Q-function). That is, let the optimal Q-function be

$$
Q^*(x(k), u(k)) = \sum_{i=1}^{n}\Big[\sum_{j\in N_i}(x_i(k)-x_j(k))^T\Gamma_i(x_i(k)-x_j(k)) + b_{i0}(x_i(k)-x_0(k))^T\Gamma_i(x_i(k)-x_0(k)) + u_i(k)^T\Lambda_i u_i(k)\Big] + V(k+1) \tag{6.115}
$$

Then

$$
V(k) = \min_{u(k)} Q^*(x(k), u(k)) \tag{6.116}
$$

and

$$
u^*(k) = \arg\min_{u(k)} Q^*(x(k), u(k)) \tag{6.117}
$$

The Bellman optimality equation for the Q-function is

$$
Q^*(x(k), u(k)) = \sum_{i=1}^{n}\Big[\sum_{j\in N_i}(x_i(k)-x_j(k))^T\Gamma_i(x_i(k)-x_j(k)) + b_{i0}(x_i(k)-x_0(k))^T\Gamma_i(x_i(k)-x_0(k)) + u_i(k)^T\Lambda_i u_i(k)\Big] + \min_{u(k+1)} Q^*(x(k+1), u(k+1)) \tag{6.118}
$$

The corresponding value iteration algorithm using the Q-function becomes

• Value update (with Q_i^0(x̄_i(k), u_i(k)) = 0):

$$
\sum_{i} Q_i^{l+1}(\bar{x}_i(k), u_i(k)) = \sum_{i=1}^{n}\Big\{u_i(k)^T\Lambda_i u_i(k) + b_{i0}(x_i(k)-x_0(k))^T\Gamma_i(x_i(k)-x_0(k)) + \sum_{j\in N_i}(x_i(k)-x_j(k))^T\Gamma_i(x_i(k)-x_j(k))\Big\} + \sum_{i}\min_{u_i} Q_i^{l}(\bar{x}_i(k+1), u_i) \tag{6.119}
$$


• Policy improvement:

$$
u_i^{l+1}(k) = \arg\min_{u_i} Q_i^{l+1}(\bar{x}_i(k), u_i) \tag{6.120}
$$
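To make the value-update/policy-improvement pair concrete, the sketch below runs a toy, discretized Q-function value iteration in the spirit of (6.119)-(6.120) for a single agent tracking a leader at the origin with integrator dynamics. The grids, weights, and dynamics are illustrative assumptions; the multiagent coupling terms are omitted for brevity.

```python
# Toy tabular Q-function value iteration (single agent, leader at the origin).
import numpy as np

xs = np.linspace(-2, 2, 41)                 # state grid
us = np.linspace(-1, 1, 21)                 # action grid
gamma_w, lam_w = 1.0, 0.1                   # tracking and control-effort weights (assumed)
Q = np.zeros((len(xs), len(us)))            # Q^0 = 0

def nearest(grid, v):
    return int(np.argmin(np.abs(grid - v)))

for l in range(200):                        # value-update sweeps, cf. (6.119)
    Q_new = np.empty_like(Q)
    for a, x in enumerate(xs):
        for b, u in enumerate(us):
            x_next = np.clip(x + u, xs[0], xs[-1])          # integrator step
            cost = gamma_w * x ** 2 + lam_w * u ** 2        # stage cost (leader at 0)
            Q_new[a, b] = cost + Q[nearest(xs, x_next)].min()
    if np.max(np.abs(Q_new - Q)) < 1e-6:
        break
    Q = Q_new

policy = us[np.argmin(Q, axis=1)]           # policy improvement, cf. (6.120)
print(policy[nearest(xs, 1.0)])             # greedy action at x = 1.0 drives x toward 0
```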

To this end, by using the neural network parameterization for the Q-function of agent i as

$$
\hat{Q}_i(x(k), u(k)) = \Phi_i^T(\bar{x}_i(k), u_i(k))\,\theta_i(k) \tag{6.121}
$$

and following a derivation similar to that in Sect. 6.4.2, we obtain

$$
\theta_i(k+1) = \theta_i(k) + \alpha(k)\Big(\sum_{i=1}^{n}\rho_i(k) + \sum_{i=1}^{n}\min_{u_i}\big[\Phi_i^T(\bar{x}_i(k+1), u_i)\theta_i(k)\big] - \sum_{i=1}^{n}\Phi_i^T(\bar{x}_i(k), u_i(k))\theta_i(k)\Big)\Phi_i(\bar{x}_i(k), u_i(k)) \tag{6.122}
$$

and

$$
u_i(k) = \arg\min_{u_i} \Phi_i^T(\bar{x}_i(k), u_i)\,\theta_i(k) \tag{6.123}
$$

It follows from (6.123) that u_i(k) can be explicitly computed, without knowing the system dynamics, from

$$
\frac{\partial\big[\Phi_i^T(\bar{x}_i(k), u_i)\,\theta_i(k)\big]}{\partial u_i} = 0 \tag{6.124}
$$

Example 6.5 Assume u_i ∈ ℝ and θ_i = [θ_i1, θ_i2, θ_i3]^T ∈ ℝ³. Let us simply choose Φ_i as

$$
\Phi_i = \begin{bmatrix} e^{-\sum_{j\in N_i}\|x_i - x_j\|^2/\eta_1^2}\,\sum_{j\in N_i}\|x_i - x_j\|^2 \\[4pt] e^{-\sum_{j\in N_i}\|x_i - x_j\|^2/\eta_2^2}\,\sum_{j\in N_i}\sum_{l=1}^{n}(x_{jl} - x_{il})\,u_i \\[4pt] u_i^2 \end{bmatrix} \tag{6.125}
$$

It then follows from (6.124) and (6.125) that

$$
u_i(k) = \frac{\theta_{i2}(k)\, e^{-\sum_{j\in N_i}\|x_i - x_j\|^2/\eta_2^2}}{2\,\theta_{i3}(k)}\;\sum_{j\in N_i}\sum_{l=1}^{n}(x_{jl} - x_{il}) \tag{6.126}
$$
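The sketch below works through the closed-form minimizer of Example 6.5 numerically: since the basis (6.125) is quadratic in u_i, setting the derivative (6.124) to zero gives an explicit control, which is cross-checked against a brute-force grid search. The states, weights, and η values are illustrative assumptions, and the sign convention follows the basis exactly as coded below.

```python
# Closed-form minimizer for a quadratic-in-u basis (Example 6.5), checked by grid search.
import numpy as np

eta1, eta2 = 1.0, 1.0
theta = np.array([0.5, 0.8, 0.4])                     # theta_i = [theta_i1, theta_i2, theta_i3]
x_i = np.array([1.0, 0.5])
neighbors = [np.array([0.2, 0.1]), np.array([-0.3, 0.6])]

dist2 = sum(np.sum((x_i - xj) ** 2) for xj in neighbors)       # sum_j ||x_i - x_j||^2
c2 = np.exp(-dist2 / eta2 ** 2) * sum(np.sum(xj - x_i) for xj in neighbors)

def phi(u):
    """Basis of the form [c1, c2*u, u^2], with c1, c2 independent of u."""
    c1 = np.exp(-dist2 / eta1 ** 2) * dist2
    return np.array([c1, c2 * u, u ** 2])

# d(phi(u)^T theta)/du = theta_2*c2 + 2*theta_3*u = 0 gives the explicit minimizer (cf. (6.126)):
u_star = -theta[1] * c2 / (2.0 * theta[2])
# Cross-check against a brute-force search over u:
grid = np.linspace(-5, 5, 10001)
u_grid = grid[np.argmin([phi(u) @ theta for u in grid])]
print(u_star, u_grid)                                 # the two values agree
```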




Similarly, let R_i(k) be the estimate of Σ_{i=1}^n ρ_i(k), Ψ_i(k) be the estimate of Σ_{i=1}^n Φ_i^T(x̄_i(k), u_i(k))θ_i(k), and Υ_i(k) be the estimate of Σ_{i=1}^n min_{u_i}[Φ_i^T(x̄_i(k+1), u_i)θ_i(k)]. Then we have the following distributed updating algorithm for θ_i(k):

$$
\theta_i(k+1) = \theta_i(k) + \alpha(k)\big[R_i(k) + \Upsilon_i(k) - \Psi_i(k)\big]\Phi_i(\bar{x}_i(k)) \tag{6.127}
$$

with

$$
R_i(k) = R_i(k-1) + \frac{1}{1+d_i}\sum_{j\in N_i} a_{ij}\big(R_j(k-1)-R_i(k-1)\big) + \frac{1}{w_{1i}}\big[\rho_i(k)-\rho_i(k-1)\big] \tag{6.128}
$$

$$
\Upsilon_i(k) = \Upsilon_i(k-1) + \frac{1}{1+d_i}\sum_{j\in N_i} a_{ij}\big(\Upsilon_j(k-1)-\Upsilon_i(k-1)\big) + \frac{1}{w_{1i}}\Big[\min_{u_i}\big(\Phi_i^T(\bar{x}_i(k+1), u_i)\theta_i(k)\big) - \Phi_i^T(\bar{x}_i(k), u_i(k))\theta_i(k-1)\Big] \tag{6.129}
$$

$$
\Psi_i(k) = \Psi_i(k-1) + \frac{1}{1+d_i}\sum_{j\in N_i} a_{ij}\big(\Psi_j(k-1)-\Psi_i(k-1)\big) + \frac{1}{w_{1i}}\big[\Phi_i^T(\bar{x}_i(k), u_i(k))\theta_i(k) - \Phi_i^T(\bar{x}_i(k-1), u_i(k-1))\theta_i(k-1)\big] \tag{6.130}
$$

To this end, we have the following Q-function-based approximate value iteration algorithm for distributed coordination of multiagent systems.

Remark 6.17 The benefit of using the Q-function is that no system dynamics information is needed in the control law (6.123). In addition, the data x̄_i(k+1) in the adaptive law (6.128) can be obtained through online measurement due to the use of u_i(k). Again, there is no need to use the system dynamics to compute those quantities numerically.

Remark 6.18 The operator min_{u_i} in (6.129) can be replaced with the explicit expression for u_i, which is a function of x̄_i(k+1) and θ_i(k).


Algorithm 6.2 Consensus-Based Reinforcement Learning Control Algorithm with Linear Parameterization of the Q-Function (for agent i)
Input: neural network basis functions, learning rate α(k), and network topology parameter w_i
1. Initialize θ_i(0), R_i(0), Ψ_i(0), Υ_i(0).
2. Measure the initial state x_i(0) and the neighbors' states x_j(0).
for time step k = 0, 1, 2, ... do
3. Update u_i(k) using (6.123).
4. Apply u_i(k), measure the next state x_i(k+1), and obtain x_j(k+1), ρ_i(k+1), ρ_j(k+1), Υ_j(k), Υ_i(k), Ψ_i(k+1), Ψ_i(k), R_i(k), and R_j(k).
5. Update θ_i(k+1) using (6.127), (6.128), (6.129), and (6.130).
end for

6.4.4 Extension

In the proposed Algorithms 6.1 and 6.2, the left eigenvector associated with eigenvalue 1 of the matrix F has been used. This is in general not a problem in the control of engineered multiagent systems, for which the communication topology may be deployed by design beforehand. However, to further enhance the robustness of the overall system against intermittent communication topology changes, an online estimation of the corresponding eigenvector would be beneficial. In what follows, we make a further extension and integrate a distributed eigenvector estimation module into Algorithms 6.1 and 6.2. Let ŵ_1^i = [ŵ_11^i, ŵ_12^i, ..., ŵ_1n^i]^T be the estimate of w_1 = [w_11, ..., w_1n]^T by agent i. The proposed estimation algorithm is of the form

$$
\hat{w}_1^i(k+1) = \hat{w}_1^i(k) + \frac{1}{1+d_i}\sum_{j\in N_i}\big(\hat{w}_1^j(k) - \hat{w}_1^i(k)\big) \tag{6.131}
$$

where the initial value is ŵ_1^i(0) = [0, ..., 1, ..., 0]^T, that is, the ith component ŵ_1i^i(0) = 1 and the rest are zero. The exponential convergence of ŵ_1^i(k) to w_1 was shown in Lemma 4.2. To this end, the value w_i in (6.99), (6.100), (6.101), (6.128), (6.129), and (6.130) can be replaced by ŵ_1i^i(k), respectively. This renders Algorithms 6.1 and 6.2 truly distributed.

Remark 6.19 With the use of ŵ_1i^i(k) in Algorithms 6.1 and 6.2, the overall convergence of (6.97) and (6.102) can still be ensured due to the exponential convergence of (6.131). The use of ŵ_1i^i(k) would just contribute an additional error term in (6.60), which can be taken care of due to the conditions on the learning rate α(k).
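A small sketch of the eigenvector estimation (6.131) is given below: every agent starts from its own unit vector and repeatedly averages with its neighbors, and each local estimate converges to the normalized left eigenvector w_1 of F. The three-agent chain used here happens to be the same topology as in Example 6.6 below, so the limit is approximately [0.2857, 0.4286, 0.2857].

```python
# Distributed left-eigenvector estimation per (6.131); illustrative three-agent chain.
import numpy as np

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
d = A.sum(axis=1)
n = len(d)
F = np.eye(n) - np.linalg.inv(np.eye(n) + np.diag(d)) @ (np.diag(d) - A)

W = np.eye(n)                       # row i = agent i's estimate, initialized to e_i
for k in range(60):
    W_next = W.copy()
    for i in range(n):
        mix = sum(A[i, j] * (W[j] - W[i]) for j in range(n))   # neighbor averaging, (6.131)
        W_next[i] = W[i] + mix / (1.0 + d[i])
    W = W_next

# Reference: normalized left eigenvector of F for eigenvalue 1.
vals, vecs = np.linalg.eig(F.T)
w1 = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))]); w1 = w1 / w1.sum()
print(W.round(4))                   # every row is (approximately) w1
print(w1.round(4))                  # ~ [0.2857, 0.4286, 0.2857] for this graph
```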

Example 6.6 In this example, we simulate Algorithm 6.1 by addressing the consensus control problem. We assume that there is no leader agent in the group. Consider three agents with the simple integrator model

$$
x_i(k+1) = x_i(k) + u_i(k)
$$


where x_i ∈ ℝ and u_i ∈ ℝ for i = 1, 2, 3. Assume that the sensing/communication topology among the agents is given by the adjacency matrix

$$
A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
$$

The corresponding left eigenvector w_1 of the matrix F is w_1 = [0.2857, 0.4286, 0.2857]^T. The cost functions ρ_i(k) are ρ_1(k) = (x_2(k) − x_1(k))² + u_1²(k), ρ_2(k) = (x_2(k) − x_1(k))² + (x_3(k) − x_2(k))² + u_2²(k), and ρ_3(k) = (x_2(k) − x_3(k))² + u_3²(k). We use a single-node neural network to approximate each value function V_i. The corresponding basis functions are Φ_1 = exp(−(x_1 − x_2)²), Φ_2 = exp(−(x_2 − x_3)² − (x_2 − x_1)²), and Φ_3 = exp(−(x_3 − x_2)²) for the value functions V_1, V_2, and V_3, respectively. The control u_i(k) is updated using (6.98), and θ_i(k+1) is updated using (6.102), (6.99), (6.100), and (6.101). The initial states are x_1(0) = 0.8, x_2(0) = 0.5, and x_3(0) = 1.5, and the learning rate is α_k = 0.001/k. Figure 6.5 shows the system responses, and it is clear that state consensus is achieved. The distributed cooperative control inputs are illustrated in Fig. 6.6, and Fig. 6.8 displays the instantaneous cost values Σ_{i=1}^{3} V_i(k). The estimates of the neural network weights are shown in Fig. 6.7. It can be seen that the proposed distributed approximate value iteration cooperative control provides an automated way to select the control gains in the control laws with a performance guarantee.
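The skeleton below sketches the data flow of the simulation loop in Example 6.6 (measure, compute u_i from (6.98), apply, update the consensus estimators, update θ_i). For brevity the basis gradients in (6.98) are evaluated at the current state rather than x(k+1) (cf. Remark 6.15), and g_i = Λ_i = 1 and the initial weights are assumed; reproducing the exact trajectories of Figs. 6.5-6.8 would require the book's tuning, which is not attempted here.

```python
# Structural sketch of the Example 6.6 simulation loop; gains and tuning are illustrative.
import numpy as np

A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
d = A.sum(axis=1)
F = np.eye(3) - np.linalg.inv(np.eye(3) + np.diag(d)) @ (np.diag(d) - A)
vals, vecs = np.linalg.eig(F.T)
w1 = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))]); w1 /= w1.sum()   # ~[0.2857, 0.4286, 0.2857]

Phi = [lambda x: np.exp(-(x[0] - x[1]) ** 2),
       lambda x: np.exp(-(x[1] - x[2]) ** 2 - (x[1] - x[0]) ** 2),
       lambda x: np.exp(-(x[2] - x[1]) ** 2)]
nbrs = [[1], [0, 2], [1]]

def dPhi(i, wrt, x, eps=1e-6):
    e = np.zeros(3); e[wrt] = eps
    return (Phi[i](x + e) - Phi[i](x - e)) / (2 * eps)      # numerical basis gradient

def rho(x, u):                                              # stage costs of Example 6.6
    return np.array([(x[1] - x[0]) ** 2 + u[0] ** 2,
                     (x[1] - x[0]) ** 2 + (x[2] - x[1]) ** 2 + u[1] ** 2,
                     (x[1] - x[2]) ** 2 + u[2] ** 2])

x = np.array([0.8, 0.5, 1.5])
theta = np.full(3, 0.01)                                    # assumed initial weights
R = np.zeros(3); Ups = np.zeros(3); Psi = np.zeros(3)
rho_old = np.zeros(3); ups_old = np.zeros(3); psi_old = np.zeros(3)
for k in range(1, 201):
    u = np.array([-0.5 * (dPhi(i, i, x) * theta[i]
                          + sum(dPhi(j, i, x) * theta[j] for j in nbrs[i]))
                  for i in range(3)])                       # control (6.98), simplified
    x_next = x + u                                          # integrator dynamics
    rho_new = rho(x, u)
    psi_new = np.array([Phi[i](x) * theta[i] for i in range(3)])
    ups_new = np.array([Phi[i](x_next) * theta[i] for i in range(3)])
    R   = F @ R   + (rho_new - rho_old) / w1                # estimator (6.99)
    Ups = F @ Ups + (ups_new - ups_old) / w1                # estimator (6.100)
    Psi = F @ Psi + (psi_new - psi_old) / w1                # estimator (6.101)
    theta = theta + (0.001 / k) * (R + Ups - Psi) \
            * np.array([Phi[i](x) for i in range(3)])       # weight update (6.102)
    rho_old, ups_old, psi_old = rho_new, ups_new, psi_new
    x = x_next
print(x.round(3), theta.round(4))                           # inspect final states and weights
```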

Fig. 6.5 System responses

Fig. 6.6 Control inputs

Fig. 6.7 Parameter estimation

Fig. 6.8 Instantaneous cost value

6.5 Summary

In this chapter, the problems of distributed optimization and distributed optimal task coordination for multiagent systems were studied. There are many real applications related to the multiagent optimization problem, such as the deployment of sensor networks [13], resource allocation [14, 15], virtual network embedding [16-20], software-defined networks [21], and cognitive radio networks [14, 22-24]. For the multiagent optimal task coordination problem, there are also examples in practice, such as coverage control of multiple mobile robots [25] and communication network congestion control, routing, and scheduling [26, 27]. Recent years have seen significant developments on distributed optimization algorithms using distributed cooperative control and consensus algorithms [14, 22, 23, 28-35].

For instance, an unconstrained multiagent optimization problem was studied in [31], where the cost function is defined as Σ_{i=1}^{n} f_i(x), with f_i: ℝ^n → ℝ representing the convex cost function of agent i and x ∈ ℝ^n the decision vector to be minimized. An extension to multiagent constrained optimization was made in [33]. In [35], an asynchronous broadcast-based method was designed for distributed optimization, and a distributed gossip algorithm was proposed in [28]. Optimal control for dynamical systems is well developed and can be solved using either the calculus of variations or dynamic programming [5]. Nonetheless, the analytical, closed-form solution to the Hamilton-Jacobi-Bellman (HJB) equation arising in nonlinear optimal control design is usually difficult to obtain [7]. This becomes even more challenging in the study of optimal control for multiagent systems. Reinforcement learning, or approximate dynamic programming, has been applied to solve optimal control problems for dynamical systems [3, 7]. For example, a policy iteration algorithm was designed in [36] to solve the HJB equation, and the Galerkin approximation method was applied in [37]. Neural networks are commonly used in the reinforcement learning control of dynamical systems to parameterize value functions and/or control policies [6, 38-41]. There is also some work on multiagent optimal control [42-44].

The results in this chapter cover not only the basics on numerical optimization algorithms and reinforcement learning algorithms, but also our recent solutions to multiagent distributed optimization and multiagent optimal control problems. The result in Sect. 6.3 follows from our original work in [45] with a consolidated convergence analysis. In our work [46], optimal cooperative control was designed for a class of nonlinear multiagent systems in the continuous-time domain, and an approximate policy iteration algorithm was proposed based on least squares estimation. In contrast, the result in Sect. 6.4 considers nonlinear multiagent systems in the discrete-time domain. Moreover, the coordination control for each agent was designed based on the minimization of a new overall cost function which is the sum of the cost functions of the individual agents. A new distributed value iteration algorithm was derived based on the consensus estimation of the overall cost function. Both V-function- and Q-function-based value iteration algorithms were designed. These results may serve as a vehicle for future study on data-driven distributed reinforcement learning coordination algorithms for multiagent systems. In particular, the use of deep neural networks [47] in the parameterization of value functions may further facilitate the proposed value iteration algorithms.

References 1. Watt, J., Borhani, R., Katsaggelos, A.K.: Machine Learning Refined. Cambridge University Press, Cambridge, United Kingdom (2016) 2. Luenberger, D.G.: Optimization by Vector Space Methods. John Wiley & Sons Inc, New York, NY (1969) 3. Sutton, R.S., Barto, A.G.: Reinforcement Learning an Introduction, 2nd edn. The MIT Press, Cambridge, Massachusetts (2018) 4. Bellman, R.E., Dreyfus, S.E.: Applied Dynamic Programming. Princeton University Press, Princeton, NJ (1962) 5. Luenberger, D.G.: Introduction to Dynamic Systems. John Wiley & Sons, Inc. (1979) 6. Busoniu, L., Babuska, R., Schutter, B.D., Ernst, D.: Reinforcement Learning and Dynamic Programming Using Function Approximators, 3rd edn. CRC Press, Inc., Boca Raton, FL 7. Lewis, F.L., Vrabie, D.L., Syrmos, V.L.: Optimal Control. Wiley. Hoboken, NJ (2012) 8. Qu, Z.: Cooperative Control of Dynamical Systems. Springer-Verlag, London (2009) 9. Khalil, H.: Nonlinear Systems, 3rd edn. Prentice Hall, Upper Saddle River, NJ 10. Kosmatopoulos, E.B., Polycarpou, M.M., Christodoulou, M.A., Ioannou, P.A.: High-order neural network structures for identification of dynamical systems. IEEE Trans. Neural Netw. 6, 422–431 (1995) 11. Sanner, R.M., Slotine, J.E.: Gaussian networks for direct adaptive control. IEEE Trans. Neural Networks 3, 837–863 (1992) 12. Wang, L.X.: Adaptive Fuzzy Systems and Control: Design and Analysis. Prentice-Hall, Englewood Cliffs, NJ (1994) 13. Caicedo-Nunez, C.H., Zefran, M.: Distributed task assignment in mobile sensor networks. IEEE Trans. Autom. Control 56, 2485–2489 (2011) 14. Lorenzo, P., Barbarossa, S.: Swarming algorithms for distributed radio resource allocation. IEEE Signal Process. Mag. 144–154 (2013) 15. Aztiria, A., Augusto, J., Basagoiti, R., Izaguirre, A., Cook, D.: Learning frequent behaviors of the users in intelligent environments. IEEE Trans. Syst. Man Cybern. Syst. 43, 1265 – 1278 (2013) 16. Zhang, Z., Cheng, X., Su, S., Wang, Y., Shuang, K., Luo, Y.: A unified enhanced particle swarm optimization-based virtual network embedding algorithm. Int. J. Commun. Syst. 26, 1054–1073 (2013) 17. Mijumbi, R., Serrat, J., Gorricho, J.-L., Boutaba, R.: A path generation approach to embedding of virtual networks. IEEE Trans. Netw. Service Manage. 12, 334–347 (2015) 18. Haeri, S., Trajkovic, L.: Virtual network embedding via monte Carlo tree search. IEEE Trans. Cybern. (2017) 19. Yu, M., Yi, Y., Rexford, J., Chiang, M.: Rethinking virtual network embedding: substrate support for path splitting and migration. ACM SIGCOMM CCR 38, 17–29 (2008) 20. Chowdhury, M., Rahman, M., Boutaba, R.: Vineyard: virtual network embedding algorithms with coordinated node and link mapping. IEEE Trans. Netw. 20, 206–219 (2012)



21. Kreutz, D., Ramos, F.M.V., Verissimo, P.E., Rothenberg, C., Azodolmolky, S., Uhlig, S.: Software-defined networking: a comprehensive survey. Proc. IEEE 103, 14–76 (2015) 22. Lorenzo, P.D., Barbarossa, S., Sayed, A.H.: Decentralized resource assignment in cognitive networks based on swarming mechanisms over random graphs. IEEE Trans. Signal Process. 60, 3755–3769 (2012) 23. Zhang, W., Guo, Y., Liu, H., Chen, Y., Wang, Z., Mitola III, J.: Distributed consensus-based weight design for cooperative spectrum sensing. IEEE Trans. Parallel Distrib. Syst. 26, 54–64 (2015) 24. Pham, K.: Assured satellite communication: a minimal-cost-variance system controller paradigm. In: 2016 American Control Conference, pp. 6555–6561. Boston, MA, July 2016 25. Bullo, F., Cortés, J., Martínez, S.: Distributed Control of Robotic Networks. Applied Mathematics Series. Princeton University Press, 2009. Electronically available at http://coordinationbook. info 26. Shakkottai, S., Srikant, R.: Network optimization and control. Foundations Trends Netw. 2, 271–379 (2007) 27. Low, S.H.: A duality model of TCP and queue management algorithms. IEEE Trans. Netw. 11, 525–536 (2003) 28. Lu, J., Tang, C.Y., Regier, P.R., Bow, T.D.: Gossip algorithms for convex consensus optimization over networks. IEEE Trans. Autom. Control 56, 2917–2923 (2011) 29. Scaglione, A., Goeckel, D.L., Laneman, J.N.: Cooperative communications in mobile ad hoc networks. IEEE Signal Process. Mag. 18–29 (2006) 30. Das, A., Mesbahi, M.: Distributed linear parameter estimation over wireless sensor networks. IEEE Trans. Aerosp. Electron. Syst. 45, 1293–1306 (2009) 31. Nedic, A., Ozdaglar, A.: Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 54, 48–61 (2009) 32. Wang, J., Elia, N.: Control approach to distributed optimization. In: Forty-Eighth Annual Allerton Conference, pp. 557–561. Allerton House, IL, Sept 29–Oct 1 (2010) 33. Nedic, A., Ozdaglar, A., Parrilo, P.A.: Constrained consensus and optimization in multiagent networks. IEEE Trans Autom. Control 55, 922–938 (2010) 34. Lobel, I., Ozdaglar, A.: Distributed subgradient methods for convex optimization over random networks. IEEE Trans. Autom. Control 56, 1291–1306 (2011) 35. Nedic, A.: Asynchronous broadcast-based convex optimization over a network. IEEE Trans. Autom. Control 56, 1337–1351 (2011) 36. Saridis, G.N., Lee, C.G.: An approximation theory of optimal control for trainable manipulators. IEEE Trans. Syst. Man Cybern. 9, 152–159 (1979) 37. Beard, R.W., Saridis, G.N., Wen, J.T.: Galerkin approximations of the generalzied hamiltonjacobi-bellman equation. Automatica 33, 2159–2177 (1997) 38. Si, J., Wang, Y.: On-line learning control by association and reinforcement. IEEE Trans. Neural Netw. 12, 264–276 (2001) 39. Vamvoudakis, K.G., Lewis, F.L.: Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 46, 878–888 (2010) 40. Zhang, H., Liu, D., Luo, Y., Wang, D.: Adaptive Dynamic Programming and Control. Springer, London (2013) 41. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont, Mass (1996) 42. Semsar-Kazerooni, E., Khorasani, K.: Optimal consensus algorithms for cooperative team of agents subject to partial information. Automatica 44, 2766–2777 (2008) 43. Cao, Y., Ren, W.: Optimal linear consensus algorithms: an LQR perspective. IEEE Trans. Syst. Man Cybern. Part B Cybern. 40, 819–830 (2010) 44. 
Qu, Z., Simaan, M.: An analytic solution to the optimal design of information structure and cooperative control in networked systems. In: 51st IEEE Conference on Decision and Control, pp. 4015-4022. Maui, HI, Dec 2012 45. Wang, J., Pham, K.: An approximate distributed gradient estimation method for network optimization with limited communications. IEEE Trans. SMC Syst. 50, 5142-5151 (2020)



46. Wang, J., Yang, T., Staskevich, G., Abbe, B.: Approximately adaptive neural cooperative control for nonlinear multiagent systems with performance guarantee. Int. J. Syst. Sci. 48, 909–920 (2016) 47. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www. deeplearningbook.org

Index

A Algebraic multiplicity, 26 Algebraic Riccati equation, 199 Approximate temporal difference, 219 Asymptotically stable, 30 B Backstepping, 39 Bellman optimality equation, 200, 201 BIBO stability, 30, 31 Boids model, 18 C Car-like robot, 155 Cayley-Hamilton theorem, 28 Chained form, 156 Characteristic polynomial, 26 Consensus algorithm backstepping, 66 canonical form, 87 continuous-time case , 54–58 discontinuous, 94 discrete-time case , 60–66 high-order linear systems , 66–94 output feedback, 77 Consensus protocol, see consensus algorithm Constrained optimization, 208 Continuous-time linear time-invariant system, 29 Cooperative, 13 Cooperative control, see consensus algorithm backstepping, 66

Cooperatively stable, 13 Cooperative stabilization, 145 Coordination variable, 13 Couzin model, 18

D Differential drive mobile robot, 145, 154 Discrete-time state-space model, 29 Distributed adaptive control, 168, 173, 179, 181 Distributed control, 145 Distributed estimation , 104–107 Distributed estimation algorithm event-triggered, 112 left eigenvector, 110, 124 strongly connected, 107 undirected, 104 Distributed kalman filtering algorithm, 132– 136, 136 Distributed least squares iterative, 129 recursive, 119 Distributed least squares algorithm, 116–132 Distributed multiagent optimization algorithm, 204, 207 Distributed nonlinear control , 148–150 Distributed optimal control, 195, 212 Distributed optimization, 195 Distributed reinforcement learning coordination algorithm, 220 Distributed tracking control , 166–191 Double-integrator model, 16 Dynamic program policy iteration, 202



value iteration, 200 Dynamic programming, 198

E Eigenvalue and eigenvector, 25 Emergent behavior indicator, 102–104 Emergent behaviors, 2 Equilibrium point, 33 Estimator current, 101 prediction, 101

F Flocking model, 18

G Geometric multiplicity, 27 Gersgorin circle theorem, 46 Graph balanced, 44 connected, 45 digraph, 44 directed, 43 directed tree, 45 spanning tree, 45, 46 strongly connected, 45 undirected, 43 weakly connected, 45 weighted digraph, 44 Graph matrix adjacency, 45 degree, 45 Laplacian, 45, 48

H Hamilton-Jacobi-Bellman (HJB) equation, 200

I Induced norm, 32 Input-output feedback linearization, 38 Input-state feedback linearization, 37 Interaction dynamics, 6 Interaction rule, 17 Interaction topology, 11, 43 Internal stability, 30 Invariant set theorem, 36

Index J Jacobian matrix, 34 Jordan canonical form, 26

L Least squares algorithm basic, 116 recursive, 118 weighted, 117 Left eigenvector, 26 Linear algebra, 23 Linear algebraic equation, 25 Linearly independent vector, 24 Linear quadratic regulation, 198 Lipschitz condition, 33 Lyapunov equation, 35, 49 Lyapunov redesign, 38 Lyapunov’s first method, 34 Lyapunov’s indirect method, 34 Lyapunov’s second (direct) method, 35 Lyapunov stability asymptotically stable, 33 exponentially stable, 33 globally asymptotically stable, 33 stable, 33 unstable, 33

M Marginally stable, 30 Matrix irreducible, 51 nonnegative, 51 positive, 51 reducible, 51 Matrix exponential, 28 Matrix theory, 23 Multiagent distributed optimization , 203– 208 Multiagent HJB equation, 214 Multiagent optimal coordination control, 209 Multiagent system, 4, 145, 166, 209 high-order, 188

N Neural network, 173, 216 Nilpotent, 28 Nonholonomic constraint, 154 Nonholonomic robot, 152 Nonsingular, 25 Norm of a vector, 32

Index Nullity, 25 Nussbaum gain, 181 O Optimal control, 198 Optimal cost, 198 Optimization algorithm gradient descent, 196 newton’s method, 197 Orthogonal matrix, 24 Orthonormal, 24 P Penalty function method, 208 Positive definite, 28 Positive semidefinite, 28 Q Q-function based value iteration, 223 Quadratic form, 28 R Range space, 25 Rank, 25 Reinforcement learning, 198, 200, 215 Right eigenvector, 26 S Sensing/communication matrix, 50

235 sequentially complete, 52 sequentially lower-triangularly complete, 52 Similarity transformation, 25 Single-integrator model, 15 Singular-value decomposition, 118 State feedback control, 29 Steering control piecewise-constant, 158 polynomial, 159 sinusoidal, 157

T Task coordination, 13, 145

U Unicycle model, 16

V Value function Q-function, 200, 201 V -function, 200 Value iteration algorithm for multiagent HJB, 215–217 Vicsek’s model, 18

Z Zero-input response, 29 Zero-state response, 29