137 12 11MB
English Pages 371 Year 2023
Advances in Industrial Control
Andrea L’Afflitto Gokhan Inalhan Hyo-Sang Shin Editors
Control of Autonomous Aerial Vehicles Advances in Autopilot Design for Civilian UAVs
Advances in Industrial Control Series Editor Michael J. Grimble, Industrial Control Centre, University of Strathclyde, Glasgow, UK Editorial Board Graham Goodwin, School of Electrical Engineering and Computing, University of Newcastle, Callaghan, NSW, Australia Thomas J. Harris, Department of Chemical Engineering, Queen’s University, Kingston, ON, Canada Tong Heng Lee , Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore Om P. Malik, Schulich School of Engineering, University of Calgary, Calgary, AB, Canada Kim-Fung Man, City University Hong Kong, Kowloon, Hong Kong Gustaf Olsson, Department of Industrial Electrical Engineering and Automation, Lund Institute of Technology, Lund, Sweden Asok Ray, Department of Mechanical Engineering, Pennsylvania State University, University Park, PA, USA Sebastian Engell, Lehrstuhl für Systemdynamik und Prozessführung, Technische Universität Dortmund, Dortmund, Germany Ikuo Yamamoto, Graduate School of Engineering, University of Nagasaki, Nagasaki, Japan
Advances in Industrial Control is a series of monographs and contributed titles focusing on the applications of advanced and novel control methods within applied settings. This series has worldwide distribution to engineers, researchers and libraries. The series promotes the exchange of information between academia and industry, to which end the books all demonstrate some theoretical aspect of an advanced or new control method and show how it can be applied either in a pilot plant or in some real industrial situation. The books are distinguished by the combination of the type of theory used and the type of application exemplified. Note that “industrial” here has a very broad interpretation; it applies not merely to the processes employed in industrial plants but to systems such as avionics and automotive brakes and drivetrain. This series complements the theoretical and more mathematical approach of Communications and Control Engineering. Indexed by SCOPUS and Engineering Index. Proposals for this series, composed of a proposal form (please ask the in-house editor below), a draft Contents, at least two sample chapters and an author cv (with a synopsis of the whole project, if possible) can be submitted to either of the: Series Editor Professor Michael J. Grimble: Department of Electronic and Electrical Engineering, Royal College Building, 204 George Street, Glasgow G1 1XW, United Kingdom; e-mail: [email protected] or the In-house Editor Mr. Oliver Jackson: Springer London, 4 Crinan Street, London, N1 9XW, United Kingdom; e-mail: [email protected] Proposals are peer-reviewed. Publishing Ethics Researchers should conduct their research from research proposal to publication in line with best practices and codes of conduct of relevant professional bodies and/or national and international regulatory bodies. For more details on individual ethics matters please see: https://www.springer.com/gp/authors-editors/journal-author/journal-author-hel pdesk/publishing-ethics/14214
Andrea L’Afflitto · Gokhan Inalhan · Hyo-Sang Shin Editors
Control of Autonomous Aerial Vehicles Advances in Autopilot Design for Civilian UAVs
Editors Andrea L’Afflitto Department of Industrial and Systems Engineering Virginia Polytechnic Institute and State University Blacksburg, VA, USA
Gokhan Inalhan School of Aerospace, Transport and Manufacturing Cranfield University Cranfield, Bedfordshire, UK
Hyo-Sang Shin School of Aerospace, Transport and Manufacturing Cranfield University Cranfield, Bedfordshire, UK
ISSN 1430-9491 ISSN 2193-1577 (electronic) Advances in Industrial Control ISBN 978-3-031-39766-0 ISBN 978-3-031-39767-7 (eBook) https://doi.org/10.1007/978-3-031-39767-7 Mathematics Subject Classification: 93C10, 93C40, 49-02, 49N10 MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See https://www.mathworks. com/trademarks for a list of additional trademarks. Mathematica is a registered trademark of Wolfram Research, Inc. See https://www.wolfram.com/legal/ trademarks for a list of additional trademarks. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland Paper in this product is recyclable.
The editors wish to dedicate this work to Anh, Paolo, and Serap for their loving support over the two years needed to complete this work.
Series Editor’s Foreword
Control engineering is viewed rather differently by researchers who produce general theories and engineers who must design, calibrate, implement, and maintain real control systems. Researchers very often develop algorithms for control problems with a well-defined mathematical basis, while engineers may have more immediate concerns over the limitations of equipment, quality of control, safety of the system, security, and potential downtime. The monograph series Advances in Industrial Control (AIC) attempts to bridge this divide by encouraging the consideration of advanced control techniques when they offer genuine real-world benefits. The rapid development of new control theory, techniques, and technology has an impact on all areas of engineering. This series focuses on applications that may stimulate the development of new more general control paradigms. This is desirable if the different aspects of the “control design” problem are to be explored with the same dedication that “analysis and synthesis” problems have received. The series enables researchers to introduce new ideas motivated by challenging problems in the applications of interest. It raises awareness of the various benefits that advanced control can provide while also explaining the problems and challenges that can arise. This entry in the AIC monograph series involves the control of autonomous aerial vehicles, which is an important topic of growing significance. It is an edited text with a large number of contributors providing a wide-ranging overview of the subject. After a brief introductory Chap. 1, attention turns to one of the most useful topics in advanced control, namely, model predictive control in Chap. 2. In fact, in this case, nonlinear model predictive control (NMPC) is required. The NMPC approach is applied to the line-tracking problem. The unmanned aerial vehicle (UAV) dynamics model is first introduced and an NMPC design is described for line tracking where the UAVs are modeled as point masses. The NMPC solution generates the reference paths for UAVs in real time. Chapter 3 shows how model predictive control can be applied to the design of the reference trajectories for multi-rotor UAVs in, for example, rather hazardous environments. Optimal trajectory planning is of course interesting for aerospace and other application areas. Chapter 4 provides a differential dynamic programming approach for solving the computational guidance problem, with a different way vii
viii
Series Editor’s Foreword
of handling constraints. This autonomous vehicle optimization problem involves a flexible final time that can be useful. Chapter 5 is relevant to UAVs and other applications. It includes a unified hybrid control approach to the maximum endurance of turbojets, turboprops, turbofans, and all-electric aircraft. The “endurance” in this context is the maximum length of time that a UAV or aircraft can spend in cruising flight. Chapter 6 is concerned with modern challenges in applications for UAV navigation systems. Filtering methods are introduced for UAVs with onboard gas detection sensors including the useful “particle filtering” methods. Chapter 7 involves resilient estimation and safe planning for UAVs in Global-Positioning-System-denied environments. In this case, the “unscented Kalman filter” with sliding window outputs is used. A safety-constrained control approach is described that includes a detector, a resilient state estimator, and a robust controller. Chapter 8 considers the core problem of the design of the trajectory-tracking controller for an agile quadrotor. It has position- and attitude-tracking controllers in a cascaded structure with incremental nonlinear dynamic inversion. That is, it involves a traditional inner attitude control loop and outer position control loop. An incremental backstepping controller is presented in Chap. 9. Uncertainties in systems are always a problem in applications and this controller was shown to be superior to the more classical solution and was less reliant on information about the controlled plant. Chapter 10 has a different approach to the control of fixed-wing UAVs that uses the well-established adaptive dynamic programming. It provides robustness to uncertainties and allows a user-defined level of performance (represented in terms of a cost function) to be achieved. Chapter 11 discusses ways in which adaptive control can be used to control the various systems of autonomous aircraft. A distributed adaptive control approach is taken to the cooperative output regulation problem for a class of heterogeneous uncertain multi-agent systems. This is also of interest for many other application problems. Chapter 12 returns the text to considering more practical issues involving the interaction of an aerial manipulator with the environment. An aerial device with tilting propellers is controlled using a super-twisting sliding mode controller. The interaction force can be controlled for inspection tasks and other purposes. The use of hardware-in-the-loop simulation that can considerably reduce potential commissioning problems in systems is also discussed. This text is very ambitious in providing an insight into a very specific application area while at the same time introducing methods that are useful in many other applications. It includes detailed modeling information and theoretical control solutions while also covering rather practical aspects of the design and implementation process. The monograph should therefore be helpful to many aerospace-related students, engineers, and researchers in dynamic systems, estimation, and control. Glasgow, UK June 2023
Michael J. Grimble
Foreword
The design of autopilots for small autonomous Unmanned Aerial Vehicles (UAVs) is a research topic that is evolving rapidly. In the past decade, cohorts of researchers and practitioners have designed algorithms, electronic components, and mechanical components to allow these relatively simple vehicles to perform highly complex tasks in real time while relying exclusively on onboard resources. Although the state of the art has advanced in an impressive manner, the problem of designing autonomous UAVs able to support people in their daily tasks at home and work is still far from being solved. The literature on the design of autopilots for autonomous UAVs is vast and articulated and includes contributions by researchers with a diverse set of backgrounds. On one hand, the numerous journal and conference papers in this area are usually very detailed and tackle highly technical aspects of the problem, and are generally written for specialists. On the other hand, textbooks and monographs tend to provide a more accessible exposition to the autopilot design problem, but rarely capture the state of the art on the numerous aspects of the autopilot design problem, serving instead as educational primers. As a result, there is a gap in the literature between textbooks and journal or conference papers that challenges the researcher’s ability to produce a clear, coherent, and updated picture of the state of the art. As such, this edited book provides a welcome addition to the current literature by filling this need from a control-theoretic perspective. The range of topics addressed allows interested readers to learn more about relevant research problems and some novel and clever solutions to those problems. The collective experience and the academic diversity of the authors who have contributed to this work are impressive and guarantee both technical breadth and depth. The systematic organization of this work allows less experienced readers to orient themselves to the relevant research issues, while each chapter provides numerous references for a deeper comprehension of the problems addressed. Perhaps, the autopilot design problem for autonomous vehicles will be considered as closed in the coming decades. At a minimum, this vibrant academic community will almost definitely produce impressive scientific discoveries inspired by and applied to this challenging problem. While awaiting these results, this edited ix
x
Foreword
book provides an up-to-date and valuable reference for researchers, students, and practitioners in this field. March 2023
Prof. Randal W. Beard Brigham Young University, Provo USA
Preface
The idea of this contributed book sparked throughout a conversation among us editors about the state of the art in control theory and its applications to aerospace systems in general and unmanned systems in particular. With our research activities, our roles on the editorial board of the IEEE Transactions of Aerospace and Electronic Systems (Dr. Inalhan serves as the Editor-in-Chief, Dr. L’Affitto serves as the senior editor of the Autonomous Systems track, and Dr. Shin serves as an associate editor within the same track), and our active involvement in professional societies such as AIAA and IEEE (for instance, Dr. Inalhan is the Chair of the IEEE Committee on Aerospace Control, Dr. L’Affitto has been a long-standing member of the same committee, and Dr. Shin has been a member of the IFAC TC on Aerospace for many years), we believed to have a unique perspective on the rapidly expanding scientific literature in this area of great interest to academicians, commercial entities, and simple amateurs. We also felt that, despite the nourished scientific literature on control systems theory for autonomous aerospace systems, our community has been lacking a book that provides both breadth and depth on the state of the art. Indeed, we wanted to create a book that any graduate student in control theory with some competencies in dynamics and control of aerospace systems could use as a starting point for their research and become aware of some key trending research topics. Furthermore, we wanted to create a book that could have been used as a primary or complementary textbook for some graduate special topic classes on autopilot design for aerospace systems or closely related topics. The aerospace controls community is populated by world-class researchers who had truly game-changing impacts both in control theory and in aerospace applications. Furthermore, this community is nourishing several rising stars who already contributed in a significant manner to advance control theory and realize autonomous aerospace systems, which would have been considered visionary a decade ago. Thus, we reached out to some of these outstanding figures asking for a contribution presenting both some of their latest results and some detailed discussion on the state of the art in their sub-disciplines. This edited book provides in a single volume a snapshot of the state of the art in the field of control theory applied to the design of autonomous UAVs employed in a xi
xii
Preface
variety of civilian applications. Its homogeneous structure allows the reader to seamlessly transition from results in guidance, navigation, and control of UAVs, according to the canonical classification of the main components of a UAV’s autopilot. The contributing authors duly presented detailed literature reviews, conveyed their arguments in a systematic way with the help of diagrams, plots, and algorithms, and showcased the applicability of their results by means of flight tests and numerical simulations whose results are discussed in great detail. In several cases, the authors made available their codes and their simulation or flight test results on their research websites or open-access repositories. Sharing these datasets allows the interested reader to better appreciate the proposed results and use them as benchmarks for their own works. No edited book on the control of UAVs can be omni-comprehensive. Thus, in the first and last chapters of this work, we recommend numerous textbooks and very recent survey papers on the design of autopilots for UAVs and other aerial robots to further enhance the readers’ learning experience. Finally, each chapter has been edited to employ some common language and guarantee a good level of consistency throughout this work. The growing demand for trustworthy autonomy in general and reliable, userfriendly autonomous aerial systems to be employed in numerous and diverse applications, which range from agriculture to surveillance, last-mile payload delivery, and people transportation across cities, assures that this field will continue expanding in the next decades, and, perhaps, will become a classical ever-lasting area of study. We do hope that this editorial effort will serve current and future generations of researchers and practitioners to thrive in autonomous aerospace control systems design. Blacksburg, VA, USA Cranfield, Bedfordshire, UK Cranfield, Bedfordshire, UK May 2023
Andrea L’Afflitto Gokhan Inalhan Hyo-Sang Shin
Contents
1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Andrea L’Afflitto, Gokhan Inalhan, and Hyo-Sang Shin
2
Nonlinear Model Predictive Controller-Based Line Tracking Guidance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . On Park, Hyo-Sang Shin, and Antonios Tsourdos
7
Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Julius A. Marshall, Paul Binder, and Andrea L’Afflitto
31
An Improved Differential Dynamic Programming Approach for Computational Guidance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiaobo Zheng, Shaoming He, and Defu Lin
77
3
4
1
5
A Unified Hybrid Control Approach to Maximum Endurance of Aircraft . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Emily Oelberg and Luis Rodrigues
6
Information-Theoretic Autonomous Source Search and Estimation of Mobile Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Minkyu Park, Seulbi An, Hongro Jang, and Hyondong Oh
7
Resilient Estimation and Safe Planning for UAVs in GPS-Denied Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 Wenbin Wan, Hunmin Kim, Naira Hovakimyan, Petros Voulgaris, and Lui R. Sha
8
Incremental Nonlinear Dynamic Inversion-Based Trajectory Tracking Controller for an Agile Quadrotor . . . . . . . . . . . . . . . . . . . . . 195 Emre Saldiran and Gokhan Inalhan
9
Incremental Control to Reduce Model Dependency of Classical Nonlinear Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231 Byoung-Ju Jeon, Hyo-Sang Shin, and Antonios Tsourdos xiii
xiv
Contents
10 Adaptive Dynamic Programming for Flight Control . . . . . . . . . . . . . . 269 Erik-Jan van Kampen and Bo Sun 11 A Distributed Adaptive Control Approach to Cooperative Output Regulation of a Class of Heterogeneous Uncertain Multi-agent Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293 Selahattin Burak Sarsılmaz, Ahmet Taha Koru, Tansel Yucelen, and Behçet Açıkme¸se 12 Aerial Manipulator Interaction with the Environment . . . . . . . . . . . . 319 Santos M. Orozco-Soto, Eugenio Cuniato, Jonathan Cacace, Mario Selvaggio, Fabio Ruggiero, Vincenzo Lippiello, and Bruno Siciliano 13 Conclusion and Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349 Andrea L’Afflitto, Gokhan Inalhan, and Hyo-Sang Shin Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
Contributors
Seulbi An Ulsan National Institute of Science and Technology, Ulsan, South Korea Behçet Açıkme¸se University of Washington, Seattle, WA, USA Paul Binder Virginia Tech, Blacksburg, VA, USA Jonathan Cacace Università di Napoli “Federico II”, Naples, Italy Eugenio Cuniato ETH Zurich, Zurich, Switzerland Shaoming He Beijing Institute of Technology, Beijing, China Naira Hovakimyan University of Illinois at Urbana-Champaign, Urbana, IL, USA Gokhan Inalhan Cranfield University, Cranfield, United Kingdom Hongro Jang Ulsan National Institute of Science and Technology, Ulsan, South Korea Byoung-Ju Jeon Cranfield University, Cranfield, United Kingdom Erik-Jan van Kampen Delft University of Technology, Delft, Netherlands Hunmin Kim Mercer University, Macon, USA Ahmet Taha Koru University of Texas at Arlington, Arlington, TX, USA Defu Lin Beijing Institute of Technology, Beijing, China Vincenzo Lippiello Università di Napoli “Federico II”, Naples, Italy Andrea L’Afflitto Virginia Tech, Blacksburg, VA, USA Julius A. Marshall Virginia Tech, Blacksburg, VA, USA Emily Oelberg Concordia University, Montreal, Canada Hyondong Oh Ulsan National Institute of Science and Technology, Ulsan, South Korea
xv
xvi
Contributors
Santos M. Orozco-Soto Università di Napoli “Federico II”, Naples, Italy Minkyu Park Ulsan National Institute of Science and Technology, Ulsan, South Korea On Park Cranfield University, Cranfield, United Kingdom Luis Rodrigues Concordia University, Montreal, Canada Fabio Ruggiero Università di Napoli “Federico II”, Naples, Italy Emre Saldiran Cranfield University, Bedford, United Kingdom Selahattin Burak Sarsılmaz Utah State University, Logan, UT, USA Mario Selvaggio Università di Napoli “Federico II”, Naples, Italy Lui R. Sha University of Illinois at Urbana-Champaign, Urbana, IL, USA Hyo-Sang Shin Cranfield University, Cranfield, United Kingdom Bruno Siciliano Università di Napoli “Federico II”, Naples, Italy Bo Sun Delft University of Technology, Delft, Netherlands Antonios Tsourdos Cranfield University, Cranfield, United Kingdom Petros Voulgaris University of Nevada, Reno, Reno, USA Wenbin Wan University of New Mexico, Albuquerque, USA Tansel Yucelen University of South Florida, Tampa, FL, USA Xiaobo Zheng Beijing Institute of Technology, Beijing, China
Abbreviations
AD AD-HDP ADP ALT ANN AOA ARE BKS CFD CoM CSG CT CUSUM DDP DHP DoF DT EMF ESC EsC ETC FFT-CDDP FMS FSF GCS GDHP GMM GPS HDP HIL
Action-Dependent Action-Dependent Heuristic Dynamic Programming Adaptive Dynamic Programming Attacker Location Tracker Artificial Neural Network Angle of Attack Algebraic Riccati Equation BacKStepping controller Computational Fluid Dynamics Center of Mass Command State Generator Continuous Time CUmulative SUM Differential Dynamic Programming Dual Heuristic Programming Degree of Freedom Discrete Time Electromotive Force Electronic Speed Controller Escape Controller Event-Triggered Control Flexible Final-Time-Constrained Differential Dynamic Programming Flight Management System Full-State Feedback Ground Control Station Global Dual Heuristic Programming Gaussian Mixture Model Global Positioning System Heuristic Dynamic Programming Hardware-In-the-Loop xvii
xviii
HJB HOC IBKS ICC IM IMU INDI LMI LoS LPV LQ LQR LQT LTI MIMO MPC MST NDI NED NMPC OLHP PD PE PID PLOS PLT PNG PO RH RL RLS RMSE ROS RRT SITL SMC SR STD STSMC SVG UAV UKF XGDHP ZOH
Abbreviations
Hamilton–Jacobi–Bellman Hybrid Optimal Control Incremental BacKStepping controller Individual Chance Constraint Incremental Model(-based) Inertial Measurement Unit Incremental Nonlinear Dynamic Inversion Linear Matrix Inequality Line of Sight Linear Parameter-Varying Linear Quadratic Linear Quadratic Regulator Linear Quadratic Tracking Linear Time-Invariant Multi-Input Multi-Output Model Predictive Control Mean Search Time Nonlinear Dynamic Inversion North-East-Down Nonlinear Model Predictive Control Open Left Half Complex Plane Proportional–Derivative Persistent Excitation Proportional–Integral–Derivative Pursuit and Line of Sight Pursues Location Tracker Proportional Navigation Guidance Partial Observability Receding Horizon Reinforcement Learning Recursive Least Squares Root Mean Square Error Robot Operating System Rapidly exploring Random Tree Software-In-The-Loop Sliding Mode Control Success Rate Standard Deviation Super-Twisting Sliding Mode Control Scheduling Variable Generator Unmanned Aerial Vehicle Unscented Kalman Filter Explainable Global Dual Heuristic Programming Zero-Order Holder
Chapter 1
Introduction Andrea L’Afflitto, Gokhan Inalhan, and Hyo-Sang Shin
1.1 The Role of Control Theory in the Design of Autonomous UAVs In the past decade, aeronautical engineering has evolved very rapidly toward directions that, perhaps, would have been unexpected by most practitioners in the second half of the 1900s. Nowadays, a strong trending topic in aeronautical engineering involves autonomous unmanned aerial vehicles (UAVs), also commonly known as “drones” by the general public and the press. Although these vehicles have been initially developed for military applications, civilian use of UAVs has become preponderant [1–3]. Furthermore, both the increasing miniaturization and the stunning improvement of the performances of electronic components, such as single-board computers, optical sensors, inertial measurement units, and antennas, have enabled the design of UAVs that can serve a very broad range of applications. Examples of such applications include payload delivery both outdoors for commercial applications and indoors to streamline the logistics of manufacturing plants [4]. Furthermore, UAVs have been employed to monitor crop fields [5, 6], oil and gas industries and pipelines [7], wind farms [8], construction sites [9], and industrial plants [10], to name a few. Recently, UAVs have been equipped with robotic arms, some of which are as complex as human limbs, to install sensors in secluded or high-altitude locations, take measurements either by contact or through contactless devices, and sample A. L’Afflitto (B) Virginia Tech, Blacksburg, USA e-mail: [email protected] G. Inalhan · H.-S. Shin Cranfield University, Cranfield, United Kingdom e-mail: [email protected] H.-S. Shin e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 A. L’Afflitto et al. (eds.), Control of Autonomous Aerial Vehicles, Advances in Industrial Control, https://doi.org/10.1007/978-3-031-39767-7_1
1
2
A. L’Afflitto et al.
rocks, soil, water, and harvest fruits [11, 12]. More recently, the use of UAVs is being thoroughly investigated to support and further enhance the performance of 5G and other wireless networks [13, 14]. Finally, UAVs have transcended Earthly applications since 2021, when the NASA multi-rotor UAV “Ingenuity” made its way in the Mars atmosphere [15]. Although UAVs, and in particular multi-rotor UAVs, such as quadcopters, are an established reality and numerous firms worldwide have been profitably commercializing autonomous UAVs, the systematic use of vehicles in numerous industrial activities is still to be realized fully. The reasons for this state of the art are multiple and range from the current regulations on the use of UAVs outdoors [16], which are still particularly strict to minimize the risks associated with the interference of UAVs with national airspace systems, the complexity of controlling UAVs manually in secluded and cluttered environments, and the lack of dedicated infrastructures for precise navigation, which were instrumental to determine the success of classical aviation. These challenges can not be solved with state-of-the-art technology and, hence, scientific research in the areas of perception, wireless communication, power storage, signal processing, data storage, optimization, and control theory needs to be performed before satisfactory levels of autonomy in small UAVs are achieved. Control theory has provided the ideal framework to solve problems concerning the guidance, navigation, and control of autonomous UAVs. In particular, optimal control theory and differential game theory have played pivotal roles in designing path and trajectory planners able to account for the UAV’s dynamics, environmental constraints, and structural limits given by the flight envelope and the saturation of motors [17–23]. Results in linear robust control, stochastic optimal control, and variable structure control have been inspired by and applied to estimation problems for UAVs, including state estimation from limited numbers of sensors, reliable state estimation from faulty sensors, fault detection and isolation, and source localization to name a few of the most relevant ones [1, pp. 381–490], [24], [25, Ch. 8]. Several frameworks, including H2 and H∞ control, backstepping control, variable structure control, adaptive control, and hybrid systems control, to name some of the most relevant and broadest ones, have been essential to design control laws able to guarantee satisfactory levels of robustness despite uncertainties, faults, failures, external disturbances, and unmodeled payload dynamics [26, 27]. More recently, innovative learning-based methods have been merged with classical linear and nonlinear control techniques to produce innovative autopilots for UAVs resilient to external disturbances and able to adapt themselves to new scenarios and unexpected circumstances [28, 29].
1.2 Outline of the Book According to a classical approach to the aerospace control problem, the contributions of this edited book have been organized in three macro-areas, namely, guidance, navigation, and control. However, as clearly shown by several works presented in
1 Introduction
3
this book, this canonical classification of the components of aircraft’s autopilots is becoming increasingly evanescent for UAVs employed in complex missions, which require extensive and rapid exchange of information among the guidance, navigation, and control subsystems. Chapters 2–5 propose frameworks based on optimal control theory applied to the design of guidance systems of aerospace vehicles. In particular, Chap. 2 presents an original optimization algorithm based on the nonlinear model predictive control framework to generate reference paths for UAVs in real time. Chapter 3 presents a guidance system for quadcopter UAVs to be employed by first responders and other emergency units to collect information on unknown and potentially hazardous environments. This guidance system includes an original algorithm for coverage of potentially contested areas, an optimization-based path planner, an original collision avoidance algorithm, and a trajectory planner based on the model predictive control algorithm and applied to the vehicle’s feedback-linearized equations of motion. The control inputs produced by the model predictive control algorithm are then employed to directly control the quadcopter. Chapter 4 revisits the differential dynamic programming framework by dynamically optimizing the final time via the first-order optimality condition of the value function and using the augmented Lagrangian method to address nonlinear constraints. The resultant algorithm, named flexible final-time-constrained differential dynamic programming, is applied to the trajectory planning problem for fixed-wing UAVs. Chapter 5 presents original results based on the optimal control framework to maximize the endurance of turbojet, turboprop, turbofan, and all-electric aircraft, including UAVs and other autonomous aircraft employed for urban air mobility. Chapters 6 and 7 address two challenging classes of navigation-related problems for autonomous UAVs. In particular, Chap. 6 provides an original approach to employ autonomous UAVs and reconstruct both the source and the release rate of chemical, biological, or radiological substances dispersed in the atmosphere. By addressing information-theoretic decision-making strategies and information sharing strategies among multiple UAVs, cooperation, and sensor fusion strategies, Chap. 6 establishes strong ties between the guidance and navigation problems. As clearly shown in Chap. 7, UAVs provide extraordinary examples of cyber-physical systems, that is, systems built from, and dependent upon, the seamless integration of computation and physical components. In this chapter, the authors provide original solutions to the problem of navigating UAVs through environments, where the GPS signal is spoofed. By proposing a system that includes an attack detector, a resilient state estimator, a robust controller, an attacker location tracker, and an escape controller, Chap. 7 provides an additional example of the increasingly strong connections between guidance, navigation, and control of UAVs. With the design of a control system for agile quadcopters based on the incremental nonlinear dynamic inversion, Chap. 8 marks the beginning of the last part of this contributed book. A key element of distinction of this chapter lies in the fact that, as shown by means of numerical simulations and flight tests, the proposed controller guarantees satisfactory results in real time despite modeling uncertainties. Chapter 9 provides original results within the contexts of incremental backstepping control
4
A. L’Afflitto et al.
and incremental nonlinear dynamic inversion that, leveraging data on the UAV’s state derivative and control input measurements, guarantee satisfactory results despite the absence of explicit model information on the UAV. Chapter 10 discussed a novel approach to adaptive dynamic programming that reduces the curse of dimensionality, and is applied to the control problem for fixed-wing UAVs. Chapter 11 proposes a distributed adaptive control approach to the cooperative output regulation problem for a class of heterogeneous uncertain multi-agent systems over general directed graphs. This approach breaks the original problem into a cooperative linear output regulation problem and control problem addressed by means of the model reference adaptive control approach for each agent. Finally, Chap. 12 presents an innovative control law based on the super-twisting slide mode framework for a tilt-rotor UAV equipped with a robotic arm and able to interact with the environment and human operators. Chapter 13 concludes this book by providing the editors’ perspective on future research on control systems for autonomous UAVs.
References 1. Valavanis K, Vachtsevanos G (2014) Handbook of unmanned aerial vehicles. Springer, Berlin, Germany 2. Shakhatreh H, Sawalmeh AH, Al-Fuqaha A, Dou Z, Almaita E, Khalil I, Othman NS, Khreishah A, Guizani M (2019) Unmanned aerial vehicles (UAVs): a survey on civil applications and key research challenges. IEEE Access, vol 7, pp 48 572–48 634 3. Marshall JA, Sun W, L’Afflitto A (2021) A survey of guidance, navigation, and control systems for autonomous multi-rotor small unmanned aerial systems. Ann Rev Control 52:390–427 4. Villa DKD, Brandão AS, Sarcinelli-Filho M (2020) A survey on load transportation using multirotor UAVs. J Intell Robot Syst 98(2):267–296 5. Chebrolu N, Läbe T, Stachniss C (2018) Robust long-term registration of UAV images of crop fields for precision agriculture. IEEE Robot Autom Lett 3(4):3097–3104 6. Furukawa F, Maruyama K, Saito YK, Kaneko M (2020) Corn height estimation using UAV for yield prediction and crop monitoring. Springer, Cham, pp 51–69 7. Qiu R, Liang Y (2020) A novel approach for two-stage UAV path planning in pipeline network inspection, ser. International pipeline conference, vol 3, 09 2020, v003T04A011 8. Car M, Markovic L, Ivanovic A, Orsag M, Bogdan S (2020) Autonomous wind-turbine blade inspection using LiDAR-equipped unmanned aerial vehicle. IEEE Access, vol 8, pp 131 380– 131 387 9. Asadi K, Kalkunte Suresh A, Ender A, Gotad S, Maniyar S, Anand S, Noghabaei M, Han K, Lobaton E, Wu T (2020) An integrated UGV-UAV system for construction site data collection. Automation in construction, vol 112, p 103068 10. Liu H, Chen Q, Pan N, Sun Y, An Y, Pan D (2022) UAV stocktaking task-planning for industrial warehouses based on the improved hybrid differential evolution algorithm. IEEE Trans Ind Inform 18(1):582–591 11. Bonyan Khamseh H, Janabi-Sharifi F, Abdessameud A (2018) Aerial manipulation: a literature survey. Robot Auton Syst 107:221–235 12. Meng X, He Y, Han J (2020) Survey on aerial manipulator: system, modeling, and control. Robotica 38(7):1288–1317 13. Zhang L, Zhao H, Hou S, Zhao Z, Xu H, Wu X, Wu Q, Zhang R (2019) A survey on 5G millimeter wave communications for UAV-assisted wireless networks. IEEE Access, vol 7, pp 117 460–117 504
References
5
14. Shahzadi R, Ali M, Khan HZ, Naeem M (2021) UAV assisted 5G and beyond wireless networks: a survey. J Netw Comput Appl 189:103114 15. Red planet selfie [the big picture]. IEEE Spectrum, vol 58, no 6, pp 14–15 (2021) 16. US code of federal regulations, https://www.ecfr.gov/current/title-14/chapter-I/subchapter-F/ part-107, June 2016, Title 14, Chapter I, Subchapter F, Part 107 17. Ben-Asher J (2010) Optimal control theory with aerospace applications. American Institute of Aeronautics and Astronautics, Reston, VA 18. Longuski J, Guzmán J, Prussing J (2013) Optimal control with aerospace applications. Springer, New York, NY 19. Faruqi F (2017) Differential game theory with applications to missiles and autonomous systems guidance. Wiley, Boston, MA 20. Goerzen C, Kong Z, Mettler B (2009) A survey of motion planning algorithms from the perspective of autonomous UAV guidance. J Intell Robot Syst 57(1):65 21. Aggarwal S, Kumar N (2020) Path planning techniques for unmanned aerial vehicles: a review, solutions, and challenges. Comput Commun 149:270–299 22. Indu, Singh R (2020) Trajectory planning and optimization for UAV communication: a review. J Discrete Math Sci Crypt 23(2):475–483 23. Cabreira TM, Brisolara LB, Ferreira PR Jr (2019) Survey on coverage path planning with unmanned aerial vehicles. Drones, vol 3, no 1 24. Merhav S (1998) Aerospace sensor systems and applications. Springer, New York, NY 25. Beard R, McLain T (2012) Small unmanned aircraft: theory and practice. Princeton University Press, Princeton, NJ 26. L’Afflitto A, Anderson RB, Mohammadi K (2018) An introduction to nonlinear robust control for unmanned quadrotor aircraft. IEEE Control Syst Mag 38(3):102–121 27. Kim J, Gadsden SA, Wilkerson SA (2020) A comprehensive survey of control strategies for autonomous quadrotors. Can J Electr Comput Eng 43(1):3–16 28. Carrio A, Sampedro C, Rodriguez-Ramos A, Campoy P (2017) A review of deep learning methods and applications for unmanned aerial vehicles. J Sens 3296874 29. Fraga-Lamas P, Ramos L, Mondéjar-Guerra V, Fernández-Caramés TM (2019) A review on IoT deep learning UAV systems for autonomous obstacle detection and collision avoidance. Remote Sens 11(18)
Chapter 2
Nonlinear Model Predictive Controller-Based Line Tracking Guidance On Park, Hyo-Sang Shin , and Antonios Tsourdos
2.1 Introduction Optimal control is a branch of modern control theory that is widely used in aerospace engineering from designing a feedback control law to a guidance law, and numerous of its applications and variations have also been developed over the past few decades. The classic optimal control problem is solved offline with complete knowledge of the system dynamics in order to minimize a given performance index over a finite or an infinite horizon [1–4]. In general, it is difficult to find analytical solutions to optimal control problems. Notable classes of optimal control problems for which there exist analytical closed-form solutions involve linear dynamical systems with quadratic performance indices or, alternatively, local models and performance index approximated by a Taylor series expansion. The linear quadratic regulator (LQR) methodology is a representative example of linear quadratic (LQ) optimal control problems regulating state trajectories of the system with minimum control input [3, 4]. The controller gain can be obtained by solving the associated quadratic cost defined in terms of a system state and control input. The resulting control input is a type of state feedback scheme, whose optimal gain is calculated from the solution of the Algebraic Riccati Equation (ARE). In addition to LQR, the linear quadratic tracker (LQT) problem presented in a causal form can also be formulated by means of a quadratic cost that is a function of the tracking error and control input. The standard LQT problem requires information on the command generator dynamics for the feedforward input to formulate the tracking O. Park · H.-S. Shin (B) · A. Tsourdos Cranfield University, Cranfield, United Kingdom e-mail: [email protected] O. Park e-mail: [email protected] A. Tsourdos e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 A. L’Afflitto et al. (eds.), Control of Autonomous Aerial Vehicles, Advances in Industrial Control, https://doi.org/10.1007/978-3-031-39767-7_2
7
8
O. Park et al.
problem in an LQ structure. Model predictive control (MPC), also referred to as a receding horizon control or online trajectory optimization, on the other hand, enables to accommodate constraints on both the state and the control based on the optimal control theory [5, 6]. The basic concept of the MPC is to propagate the future state trajectory from the current time step and optimize the control input sequence satisfying the system constraint and minimizing the predefined performance index over a finite horizon, which is an open-loop control problem. This optimization process yields a finite control sequence and the first control input is then applied to the system. After the sequence has been computed, the horizon is shifted and the optimization process is repeated again; this framework is clearly different from the classical LQ control problem whose solution is solved offline. The major advantage of MPC over LQ is that the control input is computed by solving an “online” optimal control problem at each sampling time and allows to account for constraint conditions in the performance index, which convert the original constrained optimal control problem into an unconstrained optimal control problem. Furthermore, it can be applied to both regulating and tracking problems and be expanded to the nonlinear control framework, known as Nonlinear MPC or NMPC [5–7]. For these reasons, MPC has gained considerable popularity in control theory despite the fact that intensive computation power may be required. The guidance problem is as important as the flight control problem for the success of a flight mission in all of its stages, including path following, waypoint navigation, target interception, and rendezvous. The track guidance mimics a spatial geometry of a first-order system response with a proportional control law [8] and proportional navigation guidance (PNG) forms a classical proportional–integral–derivative (PID) control law [9]. Potential fields construct a vector field around the tracking path and provide a flight direction [10]. Within the context of guidance, the optimal guidance problem, which leverages classical and modern approaches to optimal control, plays a pivotal role. For instance, a classical application of the LQR problem to optimal path planning was proposed in [11] by considering the vehicle’s linearized dynamics. Numerous alternative optimal guidance problems have been solved thus far, considering constraints on the terminal stage including, but not limited to, constraints on the impact angle or impact time while intercepting a target with zero miss distance [12– 14]. NMPC has also been utilized to enable formation flying and stand-off tracking for multiple UAVs [15–17]. In addition to optimal guidance, adaptive guidance has also been applied to expand the potential of adaptive control and enable a systematic approach and automatic adjustment procedures to autonomous applications in real time. The adaptive technique provides the capability to maintain a desired level of system performance when the dynamics parameters are changed or unknown, or in the presence of time-varying external disturbances. Adaptive guidance algorithms may consist in employing an adaptive control law to augment a linear or a nonlinear baseline controller in the UAV’s inner loop and, hence, adapt the closed-loop dynamics to uncertain properties [18–20]. Adaptive laws can be seen as an implementation of the tuning procedure from the collected data of the system in real time in order to tune the controller for achieving the desired performance [21, 22]. Besides, guidance and control laws can
2 Nonlinear Model Predictive Controller-Based Line Tracking Guidance
9
be integrated to take into account the reference trajectory tracking and control loop with an adaptive law [23, 24]. This chapter presents an optimal and adaptive guidance algorithm based on the NMPC architecture to solve a line tracking problem for UAVs. Specifically, adaptive weightings are applied in the performance index of the NMPC, which is a function of the relative position and heading error from the flight path. In order to address the line tracking problem, the error dynamics with respect to the tracking line are formulated, and the input sequence is generated. The performance of the proposed algorithm is verified by numerical simulations and is compared to other guidance methodologies. This chapter is organized as follows. In Sect. 2.2, the dynamics of the point mass model are derived, and the relative dynamics are then introduced to be utilized in the NMPC. Section 2.3 describes an optimal and adaptive line tracking guidance based on the NMPC approach with the numerical optimization algorithm. Section 2.4 shows the results of the numerical simulations, including comparisons with the benchmark algorithms. Lastly, the conclusions are drawn in Sect. 2.5.
2.2 Problem Formulation 2.2.1 Preliminaries This section introduces some preliminary backgrounds of the relative kinematics model for the line tracking guidance law design. Before designing a guidance algorithm, the following assumptions, which are widely accepted in the guidance law design for UAV waypoint guidance, facilitate the design of the optimal guidance law in the following sections. Assumption 2.1 The UAV is modeled as a point mass [16, 25, 26]. Assumption 2.2 A low-level controller has been implemented, such as a stabilityaugmented system (SAS) or a controllability-augmented system (CAS), for holding the heading and velocity of the UAV. Assumption 2.3 The tracking line is fixed in a two-dimensional and horizontal plane (a stationary flight path). Assumption 2.4 The wind has a constant velocity with a fixed direction angle at the same altitude. Note that the above assumptions are widely accepted in the guidance law design for UAV waypoint guidance.
10
O. Park et al.
2.2.2 Unmanned Aerial Vehicle Dynamics Model The three degrees-of-freedom (DOF) UAV model is given in a two-dimensional plane with a wind environment based on the principles of kinematics [16, 25]: x˙ = f c (x, u) ⎤ ⎡ v cos ψ + vw cos ψw ⎢ v sin ψ + vw sin ψw ⎥ ⎥ ⎢ ⎥ ⎢ r =⎢ ⎥, ⎥ ⎢ − 1 v + 1 uv τv τv ⎦ ⎣ − τ1r r +
1 τr
(2.1)
ur
where the components of x [x, y, ψ, v, r ]T ∈ Rn denote the position vector, the heading angle (−π ≤ ψ ≤ π ) with respect to the inertial coordinate, and the velocity and yaw rate of the UAV. The actuator is modeled in the control input as a first-order system with time constants τv and τr . The control input is given by u [u v , u r ]T ∈ Rm , whose components are the velocity and angular rate commands, respectively. The inertial coordinate system of the UAV is described in Fig. 2.1. Suppose that there is a constant wind vw with a fixed direction angle ψw expressed in the inertial
Fig. 2.1 Geometry of the kinematics model in the horizontal plane
2 Nonlinear Model Predictive Controller-Based Line Tracking Guidance
11
reference frame. Solid lines represent vectors and dashed lines denote the projected vector onto the corresponding axis, for example, vwI X = vw cos ψ,
(2.2a)
= vw sin ψ.
(2.2b)
I vwY
The total velocity is given by the sum of the velocity of the UAV and wind by vt = v + vw B 2 ) , = (v + vwBX )2 + (vwY
(2.3)
where vwB X,Y denote the components of the wind vector of the body axis rotated from the inertial coordinate component, and B
vw X cos ψ sin ψ vwI X = . B I vwY − sin ψ cos ψ vwY
(2.4)
Thus, the dynamics (2.1) can be rewritten using the total velocity vector as x˙ = f c (x, u) ⎤ ⎡ vt cos (ψ + ψdrift ) ⎢ vt sin (ψ + ψdrift ) ⎥ ⎥ ⎢ ⎥ ⎢ r =⎢ ⎥, ⎢ − 1 v + 1 uv ⎥ τ τ ⎦ ⎣ v v − τ1r r +
1 τr
(2.5)
ur
B v where ψdrift sin−1 vwYt denotes the difference angle between the actual travel direction of the UAV and the heading caused by wind. The constraints on the control inputs are given by (2.6) |u u , u r | < u r max . The discrete-time equations of motion can be obtained from the continuous-time dynamics (2.1) by Euler integration as xk+1 = f d (xk , uk ) = xk + f c (xk , uk )Ts ,
(2.7)
where f c (xk , uk ) captures the continuous-time equations of motion, and Ts denotes a sampling time. Thus, the discretized equations of motion can be obtained as
12
O. Park et al.
xk+1 = f d (xk , uk ) ⎡ ⎤ ⎡v cos ψ + v cos ψ ⎤ k k w w xk vk sin ψk + vw sin ψw ⎥ ⎢ yk ⎥ ⎢ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ rk ⎥+⎢ ψ =⎢ ⎥ Ts , k ⎢ ⎥ ⎢ 1 1 ⎥ ⎣vu k ⎦ ⎣ − τv vk + τv u v k ⎦ 1 1 rk − τr rk + τr u r k
(2.8)
or, equivalently, as ⎤ ⎡v cos (ψ + ψ )⎤ tk k drift xk vt k sin (ψk + ψdrift ) ⎥ ⎢ yk ⎥ ⎢ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ r ⎥ k ψ + =⎢ ⎢ ⎥ Ts . ⎢ k⎥ ⎢ 1 1 ⎥ − v + u ⎣vu k ⎦ ⎣ τv k τv v k ⎦ 1 1 rk − rk + u r k ⎡
xk+1
τr
(2.9)
τr
2.2.3 Relative Kinematics Equations Let us consider the line tracking problem shown in Fig. 2.2. The superscript I denotes the inertial coordinate with respect to the inertial coordinate, and tr denotes the new coordinate system with respect to the tracking line generated by w p1 and w p2 , where the goal point is w p2 with the direction angle of the tracking line ψwp = y −y tan−1 x pp2 −x pp1 . The relative position vector with respect to the tracking line coordinate 2 1 can be calculated by a plane rotation, where the inertial coordinate system is rotated into the new tracking coordinate system, tr, by a right-handed rotation about the z-axis of the direction angle ψwp as follows:
cos ψwp sin ψwp x − x p2 xtr = . ytr − sin ψwp cos ψwp y − y p2
(2.10)
From the line tracking geometry shown in Fig. 2.2, we deduce that the dynamics of the UAV relative to the tracking line are given by ⎡
⎤ ⎡ tr ⎤ ⎡ tr ⎤ tr vt x vx + vwx x˙tr tr ⎦ ⎣ y˙tr ⎦ = ⎣vttry ⎦ = ⎣vtry + vwy Δψ˙ r r ⎡ ⎤ v cos Δψ + vw cos Δψw = ⎣ v sin Δψ + vw sin Δψw ⎦ r
2 Nonlinear Model Predictive Controller-Based Line Tracking Guidance
13
Fig. 2.2 Geometry between the tracking line and the UAV in the horizontal plane
⎡
⎤ vt cos (Δψ + ψdrift ) = ⎣ vt sin (Δψ + ψdrift ) ⎦ , r
(2.11)
T T
tr tr T
, vwy denote the projected components onto where vttrx , vttry , vxtr , vtry , and vwx the tr-coordinate system and are shown in Fig. 2.2 by the dashed line with the heading error Δψ = ψ − ψwp and wind heading error Δψw = ψw − ψwp (−π ≤ Δψ, Δψw ≤ π ). Similar to (2.3), the total velocity can be computed from the sum of the projected velocity components as vt =
tr )2 + (v tr + v tr )2 . (vxtr + vwx y wy
(2.12)
Since the tracking line is fixed, the rate of the heading angle of the tracking line is zero, that is, ψ˙ wp = 0, the rate of heading error can be obtained from the third equation of (2.11). Thus, the kinematic model defined in the previous subsection can be reformulated with respect to the tracking line as x˙ tr = f c (xtr , u) ⎡ ⎤ vt cos (Δψ + ψdrift ) ⎢ vt sin (Δψ + ψdrift ) ⎥ ⎢ ⎥ ⎢ ⎥ r =⎢ ⎥, ⎢ − 1 v + 1 uv ⎥ τv τv ⎣ ⎦ 1 1 − τr r + τr u r
(2.13a)
(2.13b)
14
O. Park et al.
where the components of xtr [xtr , ytr , Δψ, v, r ]T denote the relative position vector, heading error, velocity, and yaw rate of the UAV, respectively. The discretized relative dynamics can also be obtained from the continuous-time relative dynamics (2.13), through Euler integration xtrk+1 = f d (xtrk , u k ) ⎡ ⎤ ⎡v cos (Δψ + ψ )⎤ tk k drift xtr k sin + ψ v (Δψ )⎥ ⎢ ytr k ⎥ ⎢ t k drift k ⎥ ⎢ ⎥ ⎢ ⎢ ⎥ r ⎢ ⎥ = ⎢Δψk ⎥ + ⎢ ⎥ Ts . 1 1 ⎥ v + u − ⎣ vk ⎦ ⎢ τv τv v ⎣ ⎦ 1 1 rk − τr r + τr u r
(2.14a)
(2.14b)
It can also be extended to the three-dimensional plane based on the two-dimensional relative kinematics. The basic equations of motion for the point mass model can be derived from the geometric relation shown in Fig. 2.3 as x˙ = f c (x, u),
Fig. 2.3 Geometry between the tracking line and the UAV in the plane
(2.15a)
2 Nonlinear Model Predictive Controller-Based Line Tracking Guidance
⎡ ⎤ ⎡ ⎤ vk cos θk cos ψk + vw cos θw cos ψw x˙ ⎢ ⎥ ⎢ y˙ ⎥ ⎢ vk cos θk sin ψk + vw cos θw sin ψw ⎥ ⎥ ⎢ ⎥ ⎢ −vk sin θk − vw sin θw ⎥ ⎢ z˙ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ qk ⎥ ⎢ θ˙ ⎥ ⎢ ⎥, ⎢ ⎥=⎢ rk ⎥ ⎢ψ˙ ⎥ ⎢ 1 1 ⎥ ⎢ ⎥ ⎢ − τv vk + τv u v ⎥ ⎢ v˙ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ 1 1 ⎥ ⎣ q˙ ⎦ ⎢ − q + u k q τq τq ⎣ ⎦ r˙ 1 1 − τr rk + τr u r
15
(2.15b)
where the components of x = [x, y, z, θ, ψ, v, q, r ]T ∈ Rn denote the position vector, pitch angle (−π/2 ≤ θ ≤ π/2), heading angle (−π ≤ ψ ≤ π ) with respect to the inertial coordinate, velocity, pitch rate, and yaw rate of the UAV, respectively, and u q denote the angular rate command with a first-order system with a time constant τq . The resultant dynamics show that the vertical state variables are added in the two-dimensional dynamics based on (2.1). Similar to the derivation procedure of the relative dynamics from (2.7) to (2.13), the relative dynamics with respect to the 3D tracking line can also be obtained by x˙ tr = f c (x, u), ⎡ ⎤ ⎡ ⎤ vt k cos(θk + θdrift ) cos(ψk + ψdrift ) x˙tr ⎢ ⎥ ⎢ y˙tr ⎥ ⎢ vt k cos(θk + θdrift ) sin(ψk + ψdrift ) ⎥ ⎢ ⎥ ⎢ ⎥ sin(θ + θ ) −v t k drift k ⎢ z˙ tr ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ q k ⎢ θ˙ ⎥ ⎢ ⎥ ⎢ ⎥=⎢ ⎥, r k ⎢ψ˙ ⎥ ⎢ ⎥ 1 1 ⎢ ⎥ ⎢ ⎥ − v + u k v τv τv ⎢ v˙ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 1 1 ⎥ ⎣ q˙ ⎦ ⎢ − q + u k q τ τ q q ⎣ ⎦ r˙ − τ1r rk + τ1r u r
(2.16a)
(2.16b)
where θ θ − θwp and ψ ψ − ψwp denote the pitch and heading angle error, respectively, and θdrift θt k − θk and ψdrift ψt k − ψk denote the difference pitch and heading angle between the total velocity and the UAV direction, respectively. Thus, the discretized relative dynamics can be derived by (2.14a). In this chapter, a two-dimensional line tracking guidance problem will be considered for comparison reasons, because the benchmark algorithms to be compared are derived based on the plane, which is common in guidance problems.
16
O. Park et al.
2.3 Nonlinear Model Prediction Control for Line Tracking Guidance 2.3.1 Performance Index and Constraint Condition Consistently with prior works [3, 15, 16], the performance index of the optimal control problem considered in this chapter is given by −1 k+N Jk = φk+N ytrk+N + L i ytri , ui ,
(2.17a)
i=k
s.t. xtri+1 = f d xtri , ui ,
(2.17b)
|u v,r i | ≤ u v,r max ,
(2.17c)
where 1 φk+N ytrk+N ytrTk+N Sk+N ytrk+N , 2 1 T i L ytri , ui ytri Q i ytri + uiT Ri ui , 2
(2.18a) (2.18b)
φk+N ytrk+N denotes a function of the final-time cost, and L i (ytri , ui ) denotes a time-varying cost of the output ytr and the control input u at the horizon time step i ∈ [k, k + N ]. The matrices Sk+N , Q i , and Ri characterize the final-time cost; the output ytr ; and the control input u, respectively. The main objective of this optimal control problem is to follow the reference line generated by two points described in Fig. 2.2, which can be considered as an
NMPC problem to find a control input sequence Uk u k , u k+1 , . . . , u k+N −1 |k by regulating both the relative position vector [xtr , ytr ]T and the heading error Δψ with yaw rate r while minimizing the performance index (2.17a). Thus, this line tracking problem is solved by designing a regulator for the reformulated kinematics (2.13). The output of the relative dynamics to be regulated is given by ytr k = h d (xtr k , utr k ) ⎤ ⎡ ⎡ ⎤ x 1 0 0 0 0 ⎢ tr k ⎥ ⎢0 1 0 0 0⎥ ⎢ ytr k ⎥ ⎥ ⎥⎢ =⎢ ⎣0 0 1 0 0⎦ ⎢Δψk ⎥ ⎣ vu k ⎦ 00001 rk = Cd xtr k .
(2.19)
2 Nonlinear Model Predictive Controller-Based Line Tracking Guidance
17
2.3.2 Numerical Optimization Algorithm for NMPC The augmented performance index can be derived from the equality and inequality conditions as −1 k+N T L i ytri , ui + λi+1 Ja k = φk+N ytrk+N + f d xtri , ui − xtri+1 i=k
1 1 + lr i μv Sv2 (u vi ) + lr i μr Sr2 (u r i ) , 2 2
(2.20)
where λ denotes a Lagrange multiplier used to account for the dynamic constraint. The control input saturation (2.6) can be formulated as Sv (u v ) =
|u v −
u v max | 2 u v max 2
Sr (u r ) =
−
u v max 2
≤ 0,
(2.21a)
|u r | − u r max ≤0 u r max
(2.21b)
with penalty parameters μv and μr . The parameters lv and lr account for the inequality constraint when the constraint condition on the control input lv,r i =
0, Sv,r i ≤ 0, 1, Sv,r i > 0
(2.22)
is not satisfied. To derive the optimal control law, define the Hamiltonian function 1 1 T f d xtri , ui + lvi μv Sv2 (u vi ) + lr i μr Sr2 (u r i ) . (2.23) Hi L i ytri , ui + λi+1 2 2 Thus, with some manipulations [3], the augmented performance index can be rewritten by using the Hamiltonian function as k+N −1 Hi + λiT xtri . Ja k = φk+N ytrk+N + Hk − λTk+N xtrk+N + i=k+1
The variation of the augmented performance index is given by
d Ja k
∂φk+N = − λk+N ∂xtrk+N
T
dxtrk+N +
∂Hk ∂xtrk
T
dxtrk +
∂Hk ∂uk
T duk
(2.24)
18
O. Park et al.
+
k+N −1
i=k+1
∂Hi − λi ∂xtri
T dxtri +
∂Hi ∂ui
T dui +
∂Hi − xtri−1 ∂λi
T
dλi . (2.25)
Thus, the necessary conditions for the constrained minimum are given by ∂Hi ∂λi = f d xtri , ui .
xtri+1 =
(2.26)
The costate equation is given by λi =
∂Hi ∂xtri
(2.27a)
= CdT QCd xtri +
∂ f T d λi+1 ∂xtri
(2.27b)
with the final-time condition given by
∂φk+N − λk+N dxtrk+N , ∂xtrk+N
(2.28)
where the Jacobian matrix in (2.27) can be computed as ⎡
∂ fd ∂xtri
1 ⎢0 ⎢ ⎢ = ⎢0 ⎢ ⎣0 0
0 −vt i sin (Δψi + ψdrift ) Ts 1 vt i cos (Δψi + ψdrift ) Ts 0 1 0 0 0 0
vi +vw x vt i vi +vw x vt i
⎤ cos (Δψi + ψdrift ) Ts 0 ⎥ sin (Δψi + ψdrift ) Ts 0 ⎥ ⎥ 0 Ts ⎥. ⎥ 1 − τ1v T s 0 ⎦ 0 1 − τ1r T s (2.29)
The stationarity condition is given by 0=
∂Hi ∂ui
= Rui +
∂ fd ∂ui
T λi+1 +
μu v l S sgn u vi − u v2max (u v max /2)2 v v (u vi ) μu r l S sgn (u r i ) , u r max 2 r r (u r i )
where the Jacobian in (2.30) can also be computed as
,
(2.30)
2 Nonlinear Model Predictive Controller-Based Line Tracking Guidance
⎡
0 ⎢ 0 ⎢ ∂ fd =⎢ ⎢ 10 ∂ui ⎣ τ Ts v 0
and sgn (a)
⎤ 0 0 ⎥ ⎥ 0 ⎥ ⎥ 0 ⎦ 1 T τr s
19
(2.31)
1, a ≥ 0, −1, a < 0,
(2.32)
which denotes the sign function. It follows from final-time condition (2.28) that λk+N =
∂φk+N ∂xtrk+N
= CdT Sk+N Cd xtrk+N ,
(2.33)
where λi can be calculated from the final-time condition i = k + N with (2.33) to initial time i = k by backward iteration using (2.27). Thus, the variation of Ja is given by k+N −1 ∂Hi T dui + λTk xk . (2.34) d Ja k = ∂u i i=k To minimize the Hamiltonian function Hi , dui can be determined by using a gradient descent method [1, 3] as ∂Hi , (2.35) dui = −Δi dui where Δ denotes a step size. Thus, the decreasing variation is given d Ja k = −
k+N −1
Δ
i=k
∂Hi ∂ui
T
∂Hi dui
+ λTk xk ,
(2.36)
and the control input is updated as uinew
=
uiold
− Δi
∂Hi dui
.
(2.37)
The is repeated until a control input sequence uk:k+N−1 =
numerical optimization u k , u k+1 , ..., u k+N −1 |k is found at each time step k, and the first control input is then implemented on the dynamics as a control input at the current time step. The computed control input sequence is updated to the initial sequence of the control input as ui=1:N −2 = ui=2:N −1 with ui=N −1 = u N −1 for the numerical optimization at the
20
O. Park et al.
next optimization loop. The numerical optimization process is shown in Algorithm 2.1, which is repeated until the predefined tolerant condition is satisfied. Algorithm 2.1 The Numerical Optimization in the NMPC old 1: Initialize Numerical Optimization: ui=1:N −1 = 0m×N 2: for k = 1, 2, . . . do 3: iter = 1, J0 = 10100 4: while |J | > do 5: for do 6: State Prediction using the Eqn. (2.14a) and (2.19) 7: Compute Jk using the Eqn. (2.20) 8: Backward Computation, λi , using the Eqn. (2.27) from (2.33) i 9: Compute ∂H ∂ui , Eqn. (2.30) 10: end for 11: if J = Jiter − Jiter−1 ≤ 0 then 12: uinew = uiold − i ∂H ∂ui 13: else 14: reduce i 15: end if 16: iter = iter + 1 17: end while new 18: uk = ui=1 19: Dynamics Update, Eqn. (2.7) 20: Control Input Sequence Update old new new 21: ui=1:N −1 = ui=2:N −1 , ui=N −1 22: end for
for i=1:N-1 for i=1:N for i=N-1:-1:1 for i=1:N-1
2.4 Numerical Simulations In this section, the nonlinear simulation for the line tracking guidance is conducted to verify the feasibility and benefits of the NMPC approach. The numerical simulation considers two cases, namely, the first model assumes a two-dimensional point mass model with a constant velocity model and the second model is (2.5) defined in the previous section to show the performance of the designed NMPC guidance.
2.4.1 LQR Guidance As described in the NMPC guidance, the LQ optimal control theory can also be applied in the guidance problem [11]. Recall the relative velocity, y˙tr , from (2.13). If we consider the small heading angle error, Δψ = ψ − ψwp ≈ 0, y˙tr can be rewritten as y˙tr = v sin Δψ ≈ vΔψ.
(2.38)
2 Nonlinear Model Predictive Controller-Based Line Tracking Guidance
21
Then, differentiating (2.38) yields y¨tr = vΔψ˙ = vr,
(2.39)
where v is constant. Thus, the linearized kinematics can be obtained from (2.38) and (2.39) with actuator dynamics ⎤⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ 01 0 0 ytr ytr d ⎣ ⎦ ⎣ y˙tr = 0 0 v ⎦ ⎣ y˙tr ⎦ + ⎣ 0 ⎦ u r dt r 1 00−1 r τr
τr
= Axlqr + Bu lqr ,
(2.40)
where xlqr [ytr , y˙tr , r ]T ∈ Rn lqr and u lqr u r ∈ Rm lqr . The quadratic performance index for the infinite horizon is given by 1 ∞ T J= xlqr Q lqr xlqr + u Tlqr Rlqr u lqr dt (2.41) 2 t0 and the optimal control can be computed as −1 T u lqr = −Rlqr B P xlqr
= −K lqr xlqr ,
(2.42)
where P denotes the solution of the algebraic Riccati equation −1 T 0 = AT P + P A + Q lqr − P B Rlqr B P.
(2.43)
2.4.2 Track Guidance The track guidance algorithm was proposed and further developed by Lee et al. in [8]. This algorithm is the spatial version of the first-order dynamical system that is able to generate a path without overshooting. Track guidance uses a desired distance κ from the relative position xtr of the tracking line generated by two waypoints w p1 and w p2 as described in Fig. 2.4. From the geometry of the relative coordinate, we deduce that ytr y˙tr , (2.44) = tan Δψ = x˙tr κ where relative components can be computed from (2.10) and (2.11). Thus, the error vector can be defined by E tr κ y˙tr − ytr x˙tr (2.45) and the yaw rate command can be expressed as a proportional feedback control law
22
O. Park et al.
(a) Geometry of the track guidance
(b) Example of the track guidance
Fig. 2.4 Track guidance in the horizontal plane
rc = K r E tr = K r (κ y˙tr − ytr x˙tr ) ,
(2.46)
where K r denotes a proportional gain.
2.4.3 Nonlinear Guidance The nonlinear guidance approach was proposed by Park in [27], and it is a lateral guidance law to track a desired flight path and has better performance at tracking a curved line. A designed parameter in the nonlinear guidance law is the relative distance L 1 from the UAV to the reference point on the flight path shown in Fig. 2.5. The lateral acceleration command is given by
(a) Geometry of the nonlinear guidance Fig. 2.5 Nonlinear guidance in the horizontal plane
(b) Example of nonlinear guidance
2 Nonlinear Model Predictive Controller-Based Line Tracking Guidance
as cmd = 2
V2 sin η, L1
23
(2.47)
where η denotes the angle between the velocity vector of the UAV and the reference point and L 1 is the distance between the UAV and the reference point on the flight path. Assuming that η is small, we obtain that sin η ≈ η = η1 + η2 ,
(2.48)
ytr y˙tr . Substituting (2.48) into (2.47) leads to the acceleration and η2 ≈ L1 V command which forms a proportional–derivative (PD) controller of the cross-track distance where η1 ≈
V2 sin η L1 V V y˙tr + ≈2 ytr , L1 L1
as cmd = 2
(2.49)
and the guidance law is derived from the relation of the centrifugal force rcmd =
as . V
(2.50)
2.4.4 Pursuit and Line of Sight-Based Path Following The pursuit and line of sight (PLOS) guidance law is derived from missile guidance by combining the pure pursuit guidance and LOS guidance [28]. The pursuit guidance law generates the command to make the interceptor toward the waypoint, whereas the LOS guidance law aligns the interceptor on the LOS line between the reference point and the interceptor. Thus, the result of the combined guidance law enables the UAV to follow a flight path. The guidance law of the PLOS is given as acmd k1 (ψLOS − ψ) + k2 ytr ,
(2.51)
where L 1 denotes the distance between the UAV and the reference point on the tracking line and the LOS angle is defined as ψLOS ψwp + η1 = ψwp + sin−1
−ytr L1
(2.52)
24
O. Park et al.
(a) Geometry of PLOS guidance
(b) Example of PLOS guidance
Fig. 2.6 PLOS guidance in the horizontal plane
from the relationship of the centrifugal force (2.50) and the guidance law, rcmd is derived (Fig. 2.6).
2.4.5 Comparison with Other Guidance Algorithms Two numerical simulations of the line tracking problem are conducted with the benchmark guidance algorithms [29]. The first simulation considers a square trajectory generated by four waypoints as a stationary flight path and the UAV travels from WP1 to WP4. The second simulation addresses the “X” shape nonlinear trajectory with a wind condition (vw = 5 m/s, ψw = −120◦ ). This simulation adopts a fly-by waypoint using turn anticipation to reduce overshoot of the next flight path, that is, if the distance between the current goal point and the UAV is smaller than the boundary radius Rbnd , then the next waypoint becomes as a goal point and the UAV should turn to the next course prior to reaching the current waypoint. Two models are considered for each NMPC. The first model is a point mass model with constant velocity x˙ = vt cos (ψ + ψdrift ) ,
(2.53a)
y˙ = vt sin (ψ + ψdrift ) , ˙ ψ = r, 1 1 r˙ = − r + u r , τr τr
(2.53b) (2.53c) (2.53d)
where v = 20 m/s, vt is given by (2.3), the nonlinear dynamics are given by xk+1 = f d (xk , uk ) with x ∈ Rn 1 and u ∈ Rm 1 , like (2.7), which is a usual assumption in the guidance algorithm and the second is derived in the (2.1) in which NMPC will utilize both of the two models to compare the performance of the NMPC denoted
2 Nonlinear Model Predictive Controller-Based Line Tracking Guidance Table 2.1 Initial conditions and parameters Parameter Value Time step (or sampling time) Ts Prediction time Rbnd (xini , yini ) ψini rini v τv , τr xtr max , ytr max ψmax vmax rmax μv , μr Δi
LQR guidance: Q lqr , Rlqr Track guidance: κ, K r Nonlinear, PLOS guidance: L 1 PLOS guidance: (k1 , k2 )
25
Unit
0.1
s
1 (equivalent to 10 steps) 20 (70, 0) 0 0 20 1/3 (300, 100) 180 20 1 10−2 0.3 10−3 In×n , Im×m 100, −0.0005 50 (20, 0.2)
s m m ◦ ◦ /s
m/s s m ◦
m/s rad/s N/A N/A N/A N/A m, N/A m N/A
by NMPC1 and NMPC2, respectively. The initial conditions of the simulation and parameters are summarized in Table 2.1. The weighting matrices Q and R for the NMPC are selected as time-varying functions of the relative state vector as Q i = Γ Q diag Ri1 = Γ R 1 Ri2 = Γ R 2
xtri
,
ytr i
xtr max ytr max
,
Δψi ri , ψmax rmax
ur i , u r max
u vi ur i , diag , u v max u r max
,
(2.54a) (2.54b) (2.54c)
where Γ Q = diag 1, 105 , 1, 1 , Γ R 1 = 10, and Γ R 2 = diag 10−2 , 102 are gain matrices and the superscript in R j , j = 1, 2, denotes the model case of the NMPC. The weighting for the final time is set by Sk+N = diag 1, 106 , 1, 1 . The simulation results are illustrated in Figs. 2.7 and 2.8 and given in Tables 2.2 and 2.3. First, all the tested guidance algorithms show a certain level of regulating performance in the straight line tracking where the cross-track error is converged to the zero as described in Fig. 2.7. The optimal controller-based guidance algorithms show
26
O. Park et al.
Fig. 2.7 The result of the waypoint line tracking guidance simulation
(a) Trajectories of the guidance algorithms
(b) Time history of the relative states and inputs
Fig. 2.8 The result of the waypoint line tracking guidance simulation with a wind condition Table 2.2 Simulation1: performance comparison with the bench mark algorithms Parameter NMPC1 NMPC2 LQR Track NL ytr mean, m ytr overshoot, m u r mean, ◦ /s Simulation time, s
0.20 4.62 2.97 60.4
0.06 2.64 2.94 61.2
1.03 4.20 3.00 60.0
4.50 14.27 2.98 60.6
5.00 19.90 2.89 62.2
PLOS 2.50 12.35 3.98 60.5
a better result in the cross-track error ytr and its overshoot ytr max compared with the linear controller-based algorithms although the consumption of the control input of the heading rate is similar while the NMPC2 takes more simulation time comparing with the other algorithms because the velocity is decelerated in order to minimize the cross-track error at each intersection point and then accelerated to make the xtr small.
2 Nonlinear Model Predictive Controller-Based Line Tracking Guidance
27
Table 2.3 Simulation2: performance comparison with the bench mark algorithms Parameter NMPC1 NMPC2 LQR Track NL ytr mean, m ytr overshoot, m u r mean, ◦ /s Simulation time, s
2.51 25.62 1.84 75.9
1.34 2.70 1.82 76.1
2.60 35.06 1.76 78.8
5.29 29.52 1.82 76.6
5.25 40.62 1.73 80.0
PLOS 2.67 26.80 1.85 78.6
The second simulation considers a more highly nonlinear condition, where the variation of the trajectory direction angle Δψwp = |ψwp i+1 − ψwp i | is greater than 90◦ with a constant wind condition, which is described in Fig. 2.8. At the beginning of the simulation, the UAV tends to be drifted due to the wind, however, aligned with the flight path in a few seconds. When the UAV turns at the waypoints 1 and 2, the wind acts as a headwind yielding a lower ground speed that reduces the overshoot of the cross-track error ytr . However, this wind becomes a tailwind at waypoint 3, which can cause a greater cross-track error when the UAV turns to waypoint 4 as it accelerates the UAV speed. Although the control input consumption of the NMPCs is slightly higher than that of other algorithms, the mean cross-track error and its maximum cross-track error are lower than others as can be seen in Table 2.3. Furthermore, the advantage of the NMPC can be found at the intersection point where the flight path is changed to the new path as the UAV reaches the predetermined distance Rbnd ≤ xtr 2 + ytr 2 . The direction of the new flight path is drastically changed due to the acute intersection angle, which affects on the linear controller-based guidance law. As the guidance laws of the LQR, track, nonlinear, and PLOS are derived from the linear assumption with a small difference angle between the trajectory and the UAV, it can deteriorate the performance of the guidance law when the UAV turns to the new goal point. Meanwhile, the NMPC guidance optimizes its control input sequence considering the propagated states and control input over the given prediction horizon. It is expected that NMPC guidance law can improve line tracking accuracy in a nonlinear environment.
2.5 Conclusion and Future Work This chapter described a novel optimal control approach to the guidance problem for a kinematic UAV model in the two-dimensional space and summarized other benchmark guidance algorithms briefly. The proposed optimal framework relies on a linear control strategy derived from a kinematic model for line tracking. The nonlinear model predictive control technique and numerical optimization process with a point mass model were discussed to design a line tracking guidance law. The benefit of this approach is to optimize a control input subject to physical constraints. Furthermore, the adopted architecture can handle the control and guidance problem simultaneously as an integrated system and expand to design a velocity controller as well. Numerical
28
O. Park et al.
simulations compared the performance of the nonlinear optimal guidance law with benchmark algorithms, which contributes toward optimal guidance in terms of path following and control energy consumption. Future work involves extending this work to include the UAV’s longitudinal dynamics and, hence, ease the implementation of the proposed approach to actual vehicles. Having discussed how a nonlinear model predictive control framework can be applied to the path planning problem for UAVs modeled as point masses, the next chapter shows how the model predictive control framework can be applied to design reference trajectories for multi-rotor UAVs employed in potentially hazardous environments. In the framework discussed in the next chapter, the UAVs are considered rigid bodies, and their nonlinear dynamics are feedback-linearized first to enable aggressive maneuvers while minimizing some quadratic cost function of the vehicle’s state and control input.
References 1. Bryson AE, Ho Y-C (1975) Applied optimal control: optimization, estimation, and control. Routledge 2. Kirk DE (2004) Optimal control theory: an introduction. Courier Corporation 3. Lewis FL, Vrabie D, Syrmos VL (2012) Optimal control. John Wiley & Sons 4. Naidu DS (2002) Optimal control systems. CRC Press, Boca Raton, FL 5. Rawlings JB, Mayne DQ, Diehl M (2017) Model predictive control: theory, computation, and design, vol 2. Nob Hill Publishing Madison, WI 6. Pannek LGJ, Grüne L (2011) Nonlinear model predictive control theory and algorithms. Springer, Berlin, Germany 7. Mayne DQ (2014) Model predictive control: Recent developments and future promise. Automatica 50(12):2967–2986 8. Lee D, Kim S, Suk J (2018) Formation flight of unmanned aerial vehicles using track guidance. Aerosp Sci Technol 76:412–420 9. Zarchan P (2012) Tactical and strategic missile guidance. American Institute of Aeronautics and Astronautics, Preston, VA 10. Sigurd K, How J (2003) UAV trajectory design using total field collision avoidance. Guidance, navigation, and control conference and exhibit. AIAA, Austin, TX, pp 1–17. https://doi.org/ 10.2514/6.2003-5728 11. Ratnoo A, Sujit P, Kothari M (2011) Adaptive optimal path following for high wind flights. IFAC Proceed 44(1):985–990 12. Ryoo C-K, Cho H, Tahk M-J (2005) Optimal guidance laws with terminal impact angle constraint. J Guidance Control Dyn 28(4):724–732 13. Kim T-H, Lee C-H, Jeon I-S, Tahk M-J (2013) Augmented polynomial guidance with impact time and angle constraints. IEEE Trans Aerosp Electron Syst 49(4):2806–2817 14. Lee C-H, Kim T-H, Tahk M-J, Whang I-H (2013) Polynomial guidance laws considering terminal impact angle and acceleration constraints. IEEE Trans Aerosp Electron Syst 49(1):74– 92 15. Shin H-S, Thak M-J, Kim H-J (2011) Nonlinear model predictive control for multiple uavs formation using passive sensing. Int J Aeronaut Space Sci 12(1):16–23 16. Kim S, Oh H, Tsourdos A (2013) Nonlinear model predictive coordinated standoff tracking of a moving ground vehicle. J Guidance Control Dyn 36(2):557–566
2 Nonlinear Model Predictive Controller-Based Line Tracking Guidance
29
17. Chai R, Savvaris A, Tsourdos A, Chai S, Xia Y (2018) Optimal tracking guidance for aeroassisted spacecraft reconnaissance mission based on receding horizon control. IEEE Trans Aerosp Electron Syst 54(4):1575–1588 18. Johnson EN, Calise AJ, Corban JE (2001) Adaptive guidance and control for autonomous launch vehicles. In: Aerospace conference proceedings, vol. 6. IEEE, pp 2669–2682 19. Prabhakar N, Painter A, Prazenica R, Balas M (2018) Trajectory-driven adaptive control of autonomous unmanned aerial vehicles with disturbance accommodation. J Guidance Control Dyn 41(9):1976–1989 20. Kaminer I, Pascoal A, Xargay E, Hovakimyan N, Cao C, Dobrokhodov V (2010) Path following for small unmanned aerial vehicles using l1 adaptive augmentation of commercial autopilots. J Guidance Control Dyn 33(2):550–564 21. Lewis FL, Vrabie D, Vamvoudakis KG (2012) Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers. IEEE Control Syst Mag 32(6):76–105 22. Rubí B, Morcegol B, Peréz R (2019) Adaptive nonlinear guidance law using neural networks applied to a quadrotor. In: International conference on control and automation. IEEE, pp 1626– 1631 23. Chwa D, Choi JY (2003) Adaptive nonlinear guidance law considering control loop dynamics. IEEE Trans Aerosp Electron Syst 39(4):1134–1143 24. Khankalantary S, Rezaee Ahvanouee H, Mohammadkhani H (2021) l1 adaptive integrated guidance and control for flexible hypersonic flight vehicle in the presence of dynamic uncertainties. Proceed Inst Mech Eng Part I: J Syst Control Eng 235(8):1521–1531 25. Niculescu M (2001) Lateral track control law for aerosonde UAV. In: Aerospace sciences meeting and exhibit. AIAA, p 16 26. Beard RW, McLain TW (2012) Small unmanned aircraft: Theory and practice. Princeton University Press 27. Park S, Deyst J, How J (2004) A new nonlinear guidance logic for trajectory tracking. In: Guidance, navigation, and control conference and exhibit, p 4900 28. Kothari M, Postlethwaite I (2013) A probabilistically robust path planning algorithm for uavs using rapidly-exploring random trees. J Intell Robot Syst 71(2):231–253 29. Sujit P, Saripalli S, Sousa JB (2013) An evaluation of UAV path following algorithms. In: European control conference. IEEE, pp 3332–3337
On Park received his BSc and MSc degrees in Aerospace Engineering from Chungnam National University in 2014 and 2016, respectively. He is currently a Ph.D. student in Aerospace Engineering at the School of Aerospace, Transport, and Manufacturing, Cranfield University. His interests include optimization and machine learning in UAV guidance and control. Hyo-Sang Shin received his BSc in Aerospace Engineering from Pusan National University in 2004 and gained an MSc on flight dynamics, guidance, and control in Aerospace Engineering from KAIST and a Ph.D. on Cooperative Missile Guidance from Cranfield University in 2006 and 2011, respectively. He is currently a Professor of Guidance, Control, and Navigation Systems and Head of the Autonomous and Intelligent Systems Group at Cranfield University. His current research interests include target tracking, data-driven guidance and control, and distributed control of multiple agent systems. Antonios Tsourdos obtained an MEng in Electronic Control and Systems Engineering from the University of Sheffield (1995), an MSc in Systems Engineering from Cardiff University (1996), and a Ph.D. in Nonlinear Robust Missile Autopilot Design and Analysis from Cranfield University (1999). He is a Professor of Control Engineering at Cranfield University and was appointed Head of the Centre for Cyber-Physical Systems in 2013. He was a member of Team Stella, the winning team for the UK MoD Grand Challenge (2008) and the IET Innovation Award (Category Team, 2009).
Chapter 3
Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage Julius A. Marshall, Paul Binder, and Andrea L’Afflitto
3.1 Introduction In recent years, the use of small multi-rotor unmanned aerial vehicles (UAVs), such as quadcopters, has become particularly appealing to first responders, law enforcement agencies, and other emergency teams to collect information with high spatial and temporal resolutions about unknown and potentially dangerous environments. The complexity of employing UAVs in these tasks is substantially determined by the need for these vehicles to operate in complex environments where external sources of information, such as the Global Positioning System (GPS) or alternative global navigation satellite systems, either underperform or are unavailable. This chapter presents an original guidance system for autonomous multi-rotor UAVs, such as quadcopters, equipped with forward-facing cameras to create maps of unknown, potentially hostile, GPS-denied environments, while flying at very low altitudes. Several unique features distinguish the proposed system. Firstly, to operate in dangerous environments, this guidance system allows the aircraft to implement several tactics to minimize its exposure to potential threats. In this chapter, these tactics include coasting obstacles so that potential threats may only detect and intercept the UAV from limited directions. Furthermore, the UAV coasts obstacles to reduce the ability of systems based on sound reflection to localize the UAV. If no obstacle can be used to shelter the UAV, then the vehicle accelerates toward its next goal point to minimize the time spent in unsheltered areas. The proposed guidance system also allows the implementation of alternative tactics, such as maximizing flight time J. A. Marshall · P. Binder · A. L’Afflitto (B) Virginia Tech, Blacksburg, VA, USA e-mail: [email protected] J. A. Marshall e-mail: [email protected] P. Binder e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 A. L’Afflitto et al. (eds.), Control of Autonomous Aerial Vehicles, Advances in Industrial Control, https://doi.org/10.1007/978-3-031-39767-7_3
31
32
J. A. Marshall et al.
in already explored areas. An additional unique feature of the proposed guidance system is that it can operate with any vision-based navigation system, such as the one presented in [1, 2], tasked with localizing the UAV relative to the environment and reconstructing the environment in the form of voxel maps. In the voxel maps employed in this work, the size of all voxels is user-defined and each voxel captures whether the corresponding portion of the environment is still unexplored or, with some user-defined confidence level, explored and either occupied by some obstacle or unoccupied. Finally, the proposed guidance system implements an explore-thenexploit approach [3], since its path planner a suitable strategy to cover a user-defined set, and its trajectory planner enables the mapping process by exploiting the vehicle’s dynamics. However, a key difference between the proposed guidance system for mapping and the majority of existing ones based on the explore-then-exploit technique [4] is that, in this work, the map is created by employing forward-facing, and not down-facing, cameras. The proposed path planner includes both an original algorithm to select multiple goal points for the UAV to visit in order to cover the environment and an optimizationbased algorithm to produce sequences of waypoints connecting consecutive goal points. The goal point selection algorithm partitions the set to be covered in parallelepipeds of user-defined size and containing user-defined numbers of occupied voxels, and then sets the next goal point as the barycenter of the nearest partition, the barycenter of the largest partition, or a convex combination thereof. Thus, the proposed goal point selection algorithm allows the user to choose a more systematic approach or greedier approach, according to needs. For instance, a greedier approach may imply a longer flight time, which, in the presence of potential threats to the UAV, may result in an unsafe approach. The lifelong planning A∗ (L P A∗ ) algorithm underlying the proposed path planner generates a sequence of waypoints for the UAV to connect consecutive goal points; the L P A∗ algorithm has been chosen for its ability to rapidly generate paths [5]. Furthermore, in the proposed framework, this search algorithm can be seamlessly replaced by alternative algorithms operating over graphs such as D ∗ or D ∗ lite [6, 7]. In order to induce tactical behaviors, the cost-to-come function characterizing the proposed path planner includes a weighing function that steers the UAV toward obstacles to seek shelter. This weighing function is tunable by means of user-defined parameters that define both the maximum distance and the strength of the attraction exerted by obstacles on the UAV. Remarkably, alternative or additional criteria to generate more cautious trajectories, such as maximizing the distance traveled in already covered areas, can be implemented in the proposed path planning system by modifying the weighing function. A fast model predictive control framework allows the proposed guidance system to generate real-time reference trajectories that are compatible with the UAV’s dynamics and interpolate the waypoints produced by the path planner with user-defined precision. The cost function underlying the proposed trajectory planner captures the effort needed to control the UAV, the cost to reach the next waypoint, and the tactical advantage for the UAV to coast obstacles to seek shelter; user-defined parameters allow to tune the attractive effect of the obstacles’ set on the UAV. The proposed guidance system accounts for the UAV’s nonlinear dynamics, collision avoidance
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
33
constraints, attitude constraints, and saturation constraints on each propeller’s thrust force. In particular, the UAV’s dynamics are captured by the UAV’s discrete-time output-feedback linearized equations of motion; output-feedback linearization allows to exploit the UAV’s dynamics while leveraging a quadratic programming framework to solve the model predictive control problem in real time [2]. Collision avoidance constraints generated by an original algorithm in the form of convex sets that include the UAV, exclude all detected obstacle points, and are sufficiently large to produce reference trajectories over long time horizons. Constraints on the UAV’s yaw angle are imposed by means of inequalities, and constraints on the UAV’s pitch and roll angles are imposed by means of barrier functions embedded in the cost function. Finally, the boundary conditions underlying the proposed trajectory planner allow the user to define how closely waypoints need to be followed and the UAV’s acceleration, while traversing less safe areas. The literature on the coverage problem for UAVs employed in agriculture, mining, and civil engineering, has grown rapidly in the past few years [4, 8–10]. However, the problem of covering unknown and potentially hostile environments by using UAVs is still under-explored. Among the few works in this area, it is worthwhile to recall [11], where a modified logistic map and a modulo tactic are used to generate unpredictable motion profiles [12], where a nonlinear programming-based approach is employed to design paths for teams of UAVs for continuous coverage and sustained situational awareness, and [13], where a path planner based on genetic algorithms was proposed for heterogeneous fleets of UAVs flying at low altitudes in dynamic and potentially dangerous environments. However, it is worthwhile to recall that both [12, 13] rely on a priori information about the threats to the UAV and, hence, are not recommended for completely unknown environments. Furthermore, [11–13] propose path planners only. However, not accounting for the UAV’s dynamics reduces the ability to implement these systems on UAVs operating in cluttered environments, and, to succeed in these missions, fast trajectory planners and collision avoidance algorithms need to be included. For its ability to operate in environments, where the nature and the location of threats are unknown, and the integration of a fast path planner and a fast trajectory planner, the proposed guidance system advances the state of the art in the design of UAVs involved in mapping of unknown, contested areas. This chapter is structured as follows. In Sect. 3.2, the notation is outlined. Successively, in Sect. 3.3, we present the path planner underlying the proposed guidance system for tactical coverage. In Sect. 3.4, we present the trajectory planning algorithm underlying the proposed guidance system. Section 3.5 presents the results of two sets of numerical simulations aimed at showing the applicability of the proposed approach and its ability to efficiently cover large areas both in a tactical and in a reckless manner. Finally, in Sect. 3.6, we draw conclusions and recommend future work directions.
34
J. A. Marshall et al.
3.2 Definitions and Notation Let N denote the set of positive integers, R the set of real numbers, Rn the set of n × 1 real column vectors, Rn×m the set of n × m real matrices, Rn+ the positive orthant of n Rn , and R+ the nonnegative orthant of Rn . The interior of the set S ⊂ Rn is denoted ˚ the boundary of S ⊂ Rn is denoted by ∂E, and the closure of S is denoted by S. by S, Given x, y ∈ Rn , if each component of x is larger, not smaller, not larger, or smaller than the corresponding component of y, then we write x >> y, x ≥≥ y, x ≤≤ y, or x 0 centered at x ∈ Rn is denoted by Bρ (x). Given the symmetric, negative-definite matrix P ∈ R3×3 , r ∈ R3 , and c < 0, E(P, r, c) {w ∈ R3 : (w − r )T P(w − r ) − c ≥ 0} captures a closed ellipsoid. The transpose of B ∈ Rn×m is denoted by B T . The ith element of the canonical basis of Rn is denoted by ei,n [0, . . . , 1, . . . , 0]T , the zero vector in Rn is denoted by 0n , the zero n × m matrix in Rn×m is denoted by 0n×m , and the identity matrix in Rn×n is denoted by 1n . The diagonal matrix formed by ai ∈ R, matrix i = 1, . . . , p, is denoted by A = diag(a1 , . . . , a p ), and the block-diagonal formed by Mi ∈ Rni ×ni is denoted by M = blockdiag M1 , . . . , M p . The distance between the point x ∈ Rn and the set S is denoted by d2 (x, S) [14, p. 16]. We write · for the Euclidean vector norm and the corresponding equi-induced matrix norm [15, Definition 9.4.1]. The saturation function sat : R → [−1, 1] is defined so that if x ∈ [−1, 1], then sat(x) = x, if x > 1, then sat(x) = 1, and if x < −1, then sat(x) = −1. The firstand second-time derivatives of the smooth function y : [0, ∞) → Rn are denoted by y˙ (t), t ≥ 0, and y¨ (t), and the kth-order derivative of y(·), k ≥ 3, is denoted by y (k) (·). User-defined parameters are denoted by μa ∈ R, a ∈ {1, . . . , 24}. All vectors are expressed with respect to the orthonormal inertial reference frame I {O; X, Y, Z } centered at O and with axes X, Y, Z ∈ R3 defined so that the gravitational force acts along the −Z axis.
3.3 Path Planning System for Tactical Coverage 3.3.1 Overview The proposed guidance system computes both a reference path and a reference trajectory for a UAV quadcopter so that its camera-based navigation system maps a user-defined connected set, that is, classifies the voxels covering this set as occupied or unoccupied, and reconstructs a map of the obstacles in this set. To meet this goal, the UAV’s reference path is computed iteratively by identifying a sequence of goal points, and then by finding a sequence of waypoints joining consecutive goal points. In particular, at the beginning of the mission, all voxels covering the set to be
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
35
mapped are considered as unexplored, and as the set is being mapped, unexplored voxels are grouped into partitions, that is, parallelepipeds whose minimum edge is user-defined. At each iteration, the UAV’s goal point is chosen as the barycenter of the partition containing the largest number of unexplored voxels, or as the barycenter of the nearest partition containing a user-defined minimum number of unexplored voxels, or as a convex combination of these two points. Setting the barycenter of the partition containing the largest number of unexplored voxels as the goal point for a given iteration not only allows to classify a large number of voxels but also enforces a reckless strategy since the UAV may be drawn toward unsafe environments. Conversely, setting the barycenter of the nearest partition not only allows the classification of a small number of voxels but also enables a more cautious tactic by preventing the UAV from quickly reaching far, unexplored areas. Having set the goal point, the reference path is computed as the solution of an optimization problem that captures the UAV’s need to reach the goal point while enabling additional tactical strategies to exhibit a tactical behavior; examples of such strategies include coasting the obstacles’ set or flying faster when sufficiently far from the obstacles’ set. The proposed path planner does not account for the UAV’s dynamics, but models the vehicle as a point mass able to move across adjacent unoccupied voxels. The UAV’s dynamics and, hence, the dynamics of the onboard cameras’ focal axis, is accounted for by the trajectory planner presented in Sect. 3.4 below. Although the proposed path planner is designed to lead the UAV through the voxels containing the goal points, the proposed coverage strategy does not require meeting this objective. The voxel containing a goal point merely needs to be detected by the onboard navigation system and classified either as occupied or as unoccupied. Attempting to classify all voxels in the user-defined connected set in a systematic manner, as it occurs for numerous coverage algorithms [4, 16–18], may induce a reckless behavior in the UAV.
3.3.2 Notation A voxel is a rectangular parallelepiped of user-defined dimensions, and let the connected set V ⊂ R3 denote the union of n V ∈ N congruent voxels that cover the set to be mapped. Each voxel in V is denoted by its center, whose position is captured by rˆ p ∈ V, p ∈ {1, . . . , n V }. The explored indicator function χexplored : R3 → {0, 1} is defined so that, if the voxel centered in rˆ p ∈ V is unexplored, then χexplored (ˆr p ) = 0, and if the voxel centered in rˆ p ∈ V is explored, then χexplored (ˆr p ) = 1. Furthermore, let Vunexplored {ˆr p ∈ V : χexplored (ˆr p ) = 0, p = 1, . . . , n V } and Vexplored V \ Vunexplored capture the unexplored subset of V and the explored subset of V, respectively. The occupied indicator function χoccupied : R3 → {0, 1} is defined so that, if the voxel centered in rˆ p ∈ V, p ∈ {1, . . . , n V }, is unoccupied, then χoccupied (ˆr p ) = 0, and if the voxel centered in rˆ p ∈ V is occupied, then χoccupied (ˆr p ) = 1. Furthermore, let Vfree {ˆr p ∈ V : χoccupied (ˆr p ) = 0, p = 1, . . . , n V } and Voccupied V \ Vfree
36
J. A. Marshall et al.
capture the unoccupied subset of V and the occupied subset of V, respectively. The obstacles’ set O Voccupied ∩ Vexplored is defined as those voxels that are both occupied and explored; the cardinality of O is denoted by n O . Remark 3.1 The notions of occupied and explored voxels are unrelated. Indeed, at a given time instant, an occupied voxel may have not been explored by the UAV’s navigation system. In this chapter, all unexplored voxels are designated as unoccupied until they are explored by the UAV’s navigation system. This choice is motivated by the need for the path planning algorithm to return complete paths between any two unoccupied voxels, regardless of whether these voxels have already been explored or not. ˆ a , a ∈ {1, . . . , n P }, The set V is partitioned in n P rectangular parallelepipeds P n P ˆ ˆ ˆ ˆ such that Pa is the union of voxels, a=1 Pa = V, and Pa ∩ Pb = {∅} for all a, b ∈ ˆ a , a ∈ {1, . . . , n P } is denoted by {1, . . . , n P } such that a = b. The barycenter of P ˆ ˆ 0. rˆPˆ a ∈ Pa . Lastly, the smallest parallelepiped containing V is denoted by P ˆ i , i = 0, . . . , n P , is The subset of the voxel map contained in parallelipiped P ˆ ˆ denoted by Vi ⊆ Pi . The set of explored voxels in Pi , i = 0, . . . , n P , is defined ˆ i is defined as as Vexplored,Pˆ i Vexplored ∩ Vi , the set of unexplored voxels in P ˆ V ˆ Vunexplored ∩ Vi , the set of unoccupied voxels in Pi is defined as unexplored,Pi
ˆ i is defined as V Vfree,Pˆ i Vfree ∩ Vi , and the set of occupied voxels in P ˆi occupied,P ˆ i , i = 0 . . . , nP, O ∩ Vi ; for details, see Fig. 3.1. The number of explored voxels in P ˆ i is denoted by is denoted by n explored,Pˆ i , the number of occupied voxels in P ˆ i is denoted by n n occupied,Pˆ i , the number of unexplored voxels in P ˆ i , and unexplored,P the length of the smallest edge of Pi Pˆ i ,min .
3.3.3 Goal Points Selection Algorithm As discussed in Sect. 3.3.1, the UAV’s reference path is computed iteratively by identifying a sequence of goal points determined by executing Algorithm 3.1. The ˆ 0 ⊆ V in n P rectangular parallelepipeds by first step of this algorithm is to partition P employing Algorithm 3.2, which is discussed in detail in the following. Next, at each ˆ ˆ nP iteration of Algorithm 3.1, we find the partitions P1 , . . . , Pn˜ P ⊆ P1 , . . . , P n
Pi ≥ 1 − μ1 , i = 1, . . . , n˜ P , where μ1 ∈ (0, 1) captures the usersuch that unexplored, n Pi defined threshold on the ratio of explored voxels over the total number of voxels ˆ 0 . Finally, the ˆ i ; thus, P1 , . . . , Pn˜ P are sufficiently unexplored partitions of P in P , q = 0, . . . , n , is defined as the barycenter of the partition UAV’s goal point rˆ g P,q P ∈ P1 , . . . , Pn˜ P that is accessible to the UAV and contains the point
μ2 rˆPprox + (1 − μ2 )ˆrPmax ,
(3.1)
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
37
Occupied unexplored voxel Unoccupied explored voxel Occupied explored voxel
Pˆ0
Voccupied Vunexplored
Obstacle
O = Voccupied ∩ Vexplored Vexplored,Pˆ1
Pˆ8
Pˆ1 Vunexplored,Pˆ1 Vunexplored,Pˆ3
Pˆ3
Pˆ2
ˆ i , i = 1, . . . 8. Occupied unexplored voxels Voccupied ∩ Fig. 3.1 A voxel map and its partitions P Vunexplored are marked in gray, and occupied explored voxels O are marked in gray with a pink boundary. Free unexplored voxels fill the remaining space. The sets Vunexplored,Pˆ 1 and Vunexplored,Pˆ 3 ˆ 1 and P ˆ 3 . The sets V are shown to illustrate the unexplored subsets of the parallelepipeds P ˆ ˆ 1 and P ˆ 8 , respectively and Vunexplored,Pˆ 8 illustrate explored and unexplored subsets of P
explored,P1
where μ2 ∈ [0, 1] is user-defined, rˆPprox ∈ Pprox denotes the barycenter of Pprox , Pprox ∈ P1 , . . . , Pn˜ P denotes the partition that is closest to the UAV along its direction of motion, rˆPmax ∈ Pmax denotes the barycenter of Pmax , and Pmax ∈ P1 , . . . , Pn˜ P denotes the partition containing the largest number of voxels. In this chapter, the condition whereby the barycenter of P is accessible to the UAV is verified by applying a greedy A∗ algorithm [19, pp. 604–608]. Partitions and goal points are recomputed as soon as the UAV’s navigation system maps the voxel containing rˆ P . If there does not exist a partition whose ratio of explored voxels is smaller than μ1 , then the user-defined set V is considered as mapped. ˆ 0 ⊆ V is partitioned in n P rectangular parAt each iteration of Algorithm 3.1, P allelepipeds according to Algorithm 3.2. In Algorithm 3.2, if two sets of conditions ˆ 0 or any of its partitions are divided into smaller parallelepipeds, are verified, then P ˆ 0 . The first set of these conditions whose aspect ratio is the same as the aspect ratio of P ˆ i, requires that both the ratio of explored voxels over the total number of voxels in P
38
J. A. Marshall et al.
Algorithm 3.1 Generate goal points rˆ P,q , q = 0, . . . , n g , for the UAV 1: Set q = 0 ˜ i s.t. 2: while ∃P
n explored,P˜
i < μ1 do ˆ 0 in P ˆ 1, . . . , P ˆ n by executing Algorithm 3.2 Partition P P n ˆ n s.t. unexplored,Pi ≥ 1 − μ1 , i ∈ {1, . . . , n P } ˆ 1, . . . , P Find P1 , . . . , Pn˜ P ⊆ P P n
n P˜
i
3: 4:
Pi
5: Compute rˆPprox , rˆPmax 6: Determine the goal point as the barycenter of the parallelepiped containing (3.1) 7: while Navigation system has not mapped rˆP,q ˜ do 8: wait 9: end while 10: Increment q = q + 1 11: end while
ˆ0 Algorithm 3.2 Octree iterative algorithm to partition P ˆ 0 = minimum bounding box, n P = 1 1: Initialize P 2: Find n Pˆ 0 , n explored,O P0 ,min P0 , n occupied,O P0 , O 3: if
n explored,O P0 n Pˆ
< μ1 then
0
ˆ 0) 4: Divide(P 5: end if ˆi Divide P ˆ n +1 , . . . , P ˆ n +8 } 6: Compute vertices of child parallelipipeds {P P P 7: Compute n Pˆ j , n explored,Pˆ j , n occupied,Pˆ j , Pˆ j ,min , j ∈ {n P + 1, . . . , n P + 8} 8: n P = n P + 8 9: for j = (n P − 7):n P do 10: 11:
if
n explored,Pˆ n Pˆ
j
< μ1 and
Pˆ
j
if n explored,Pˆ j ≥ μ4 or
ˆ j) 12: Divide(P 13: end if 14: end if 15: end for
j ,min
≥ μ3 2 n occupied,Pˆ j
n Pˆ
then ≥ μ5 then
j
i = 0 . . . , n P , is smaller than the user-defined parameter μ1 ∈ (0, 1) and the length ˆ i is larger than 2μ3 , where μ3 > 0 is user-defined. The second of the smallest side of P ˆ i , i = 0 . . . , n P , is set of conditions requires that the number of explored voxels in P larger than the user-defined parameter μ4 ∈ N or the ratio of occupied voxels over ˆ i is larger than the user-defined parameter μ5 ∈ (0, 1). the total number of voxels in P ˆ 0 that are larger than some user-defined Algorithm 3.2 produces partitions of P threshold and contain sufficiently many explored voxels, sufficiently many occupied voxels, or sufficiently many unexplored voxels. Setting larger values of μ1 and smaller values of μ3 , μ4 , and μ5 Algorithm 3.2 produces smaller partitions containing primarily unexplored voxels that are at the edge of explored areas, larger partitions containing primarily unexplored voxels that are located away from explored areas,
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
39
and larger partitions containing primarily explored voxels. Thus, it is expected that larger values of μ1 and smaller values of μ3 , μ4 , and μ5 imply longer flight times, since the UAV must collect more information about the environment. Setting larger values of μ2 ∈ [0, 1], Algorithms 3.1 and 3.2 enable a systematic approach to cover ˆ 0 , whereas, by setting smaller values of μ2 ∈ [0, 1], Algorithms 3.1 and 3.2 enable a P greedy approach to the coverage problem. Therefore, it is expected that larger values of μ1 and smaller values of μ2 , . . . , μ5 imply longer flight times since the UAV is ˆ 0 more greedily. tasked to cover smaller partitions of P
3.3.3.1
Study on the Computational Cost of Algorithm 3.2
In this section, we present the results of numerical simulations aimed at analyzing the performance of Algorithm 3.2 as a function of the user-defined parameters μ1 , μ3 , μ4 , and μ5 . These simulations have been performed on a computer hosting an i711800H 2.3 GHz CPU, 16 GB RAM, and operates Ubuntu 18.04. This single-board computer is used also for the software-in-the-loop simulations presented in Sect. 3.5 and future flight tests. To perform these numerical simulations, 10 randomly generated obstacle sets Voccupied are employed, and, for each obstacle set, 10 sets of explored voxels Vexplored are considered. Each obstacle set is generated by incrementally placing obstacles in the shape of cubes of 1 m side, starting from an empty map, whose volume is 2400 m3 , until 30% of the map is occupied. Given an obstacle set, each set of explored voxels is generated as the set of voxels contained in a sphere of 4 m radius and ˆ 0. centered at one of 10 pre-determined locations, which are equally spaced in P This approach allows to simulate maps that, similar to maps produced by UAVs at the beginning of a coverage mission, contain explored voxels that are clustered ˆ 0 . A uniformly distributed random number and occupy approximately 10% of P generator is used to select the locations of the obstacles. A volume of 2400 m3 was chosen because it is sufficiently representative of common industrial or office spaces, where the proposed guidance system for scouting unknown environments could be employed. Placing a cap on the density of obstacles is necessary since at higher densities, path planners similar to the one proposed in this chapter become unsuccessful on voxel maps, whose obstacles are scattered [2]. ˆ 0 conSince the first condition of Algorithm 3.2 to further divide a partition of P cerns the pair (μ1 , μ3 ) and its second condition concerns the pair (μ4 , μ5 ), the proposed numerical study is performed by varying (μ1 , μ3 ) and (μ4 , μ5 ) separately. The user-defined parameter μ1 is varied over the interval [0.30, 0.90], since, in the proposed simulation setup, if μ1 < 0.30, then the result of Algorithm 3.2, in general, is not refined enough to be useful. Furthermore, if μ1 < 0.10, then Algorithm 3.2 does not produce any partition. If μ1 is less than or equal to 0.10 (since the initial sphere leave 10% of the map explored), and if μ1 > 0.90, then Algorithm 3.2 produces partitions containing a small number of voxels. The parameter μ3 is spanned over the interval [0.20, 2.40], since the length of the smallest side of a voxel is 0.20 m and, considering the scopes of this chapter, at least two voxels should be contained
40
J. A. Marshall et al.
in a partition. Additionally, 2.40/2 m is sufficiently representative of the width of some space to be explored by the UAV. The parameter μ4 is spanned over the set {4, . . . , 160}, since the size of the UAV employed in this research is approximately equivalent to 4 voxels, and the field of view of the cameras employed in this research allows to explore a cone, whose diameter is approximately as large as 160 voxels. Finally, μ5 is spanned over the interval [0.00, 0.30], since it is statistically impossible for a small UAV, which employs a guidance system similar to the proposed one, to find a viable path in a space occupied by 30% of randomly generated voxels. To reduce the computational effort involved with the proposed set of simulations, not all admissible pairs of (μ1 , μ3 ) are tested, but only the pairs (μ1 , μ3 ) ∈ S1 , where S1 {(μ1 ,μ3 ) = (0.35 + 0.05i, 0.20 + 0.20i), i = 0, . . . , 11} is an ordered set, since the computational time of Algorithm 3.2 increases for increasing values of μ1 and decreasing values of μ3 , whereas it is unclear a priori whether the computational cost of Algorithm 3.2 is supposed to increase for increasing values of μ1 and increasing values of μ3 or not. Similarly, not all admissible pairs of (μ4 , μ5 ) are tested, but only the pairs (μ4 , μ5 ) ∈ S2 {(μ4 , μ5 ) = (160 − 12i, 0.01 + 0.022i), i = 0, . . . , 13}, since the computational time of Algorithm 3.2 increases for decreasing values of μ4 and decreasing values of μ5 , whereas it is unclear a priori whether the computational cost of Algorithm 3.2 is supposed to increase for decreasing values of μ4 and increasing values of μ5 or not. The boxplots in Fig. 3.2 capture some of the statistical data of the 1000 simulations performed to test Algorithm 3.2 as functions of (μ1 , μ3 ) ∈ S1 , averaged over all (μ4 , μ5 ) ∈ S2 . From these plots, it appears that, varying the pair (μ1 , μ3 ) over S1 , the average computational time varies nonlinearly, captured by the best-fitting curve p(τ ) = 0.0002τ 7 − 0.0116τ 6 + 0.2317τ 5 − 2.1865τ 4 + 10.0547τ 3 − 22.3011τ 2 + 36.8072τ + 220.3555,
τ ∈ [1, 12],
where p(1) approximates the computational time of Algorithm 3.2 for (μ1 , μ3 ) = (0.35, 0.20) and p(12) approximates the computational time of Algorithm 3.2 for (μ1 , μ3 ) = (0.90, 2.40), the average interquartile range is 1 ms with maximum deviance of 1 ms, and the average length of the whiskers is 4 ms with maximum deviance of 2 ms. The boxplots in Fig. 3.3 capture some of the statistical data of the performed simulations as functions of (μ4 , μ5 ) ∈ S2 , averaged over all (μ1 , μ3 ) ∈ S1 . These plots show that varying the pair (μ4 , μ5 ) over S2 , the computational time remains substantially constant and approximately equal to 291 ms with maximum deviance of 3 ms, the interquartile range is in the order of 30 ms with maximum deviance of 1 ms, and the average length of the whiskers is 101 ms with maximum deviance of 3 ms. From this analysis, we deduce that the computational time of Algorithm 3.2 increases for both increasing values of μ1 and μ3 and increasing values of μ1 and decreasing values of μ3 . Therefore, μ1 has a stronger impact than μ3 on the computational time for Algorithm 3.2. From this analysis, we also deduce that the computational time of Algorithm 3.2 remains constant for decreasing values of μ4
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
41
310 300
Computational time [ms]
290 280 270 260 250 240 230 220 210 )
20
,0.
35
(0.
)
40
,0.
40
(0.
)
60
,0.
45
(0.
)
80
,0.
50
(0.
)
00
,1.
55
(0.
)
20
,1.
60
(0.
(
1
,
3
)
40
,1.
65
(0.
)
60
,1.
70
(0.
)
80
,1.
75
(0.
)
00
,2.
80
(0.
)
20
,2.
85
(0.
)
40
,2.
90
(0.
)
Fig. 3.2 Boxplots capturing statistical data on the computational time of Algorithm 3.2 as a function of (μ1 , μ3 ) ∈ S1 , averaged over (μ4 , μ5 ) ∈ S2 . As both μ1 and μ3 increase, the computational time increases of Algorithm 3.2 increases. A seventh-order polynomial over the interval [1, 12] best interpolates the average computational times
and increasing values of μ5 and increases for decreasing values of μ4 and decreasing values of μ5 . Therefore, to limit the computational cost of executing Algorithm 3.2, it is recommended to employ smaller values of μ1 , larger values of μ3 , smaller values ˆ 0 is divided in larger partitions of μ4 , and larger values of μ5 . However, in this case, P and all partitions contain a larger number of both explored and unexplored voxels. Alternatively, larger values of μ1 and smaller values of μ3 , μ4 , and μ5 imply higher computational costs, but produce partitions that contain either mostly explored voxels or mostly unexplored voxels. In this work, the proposed guidance system is implemented on an Intel NUC single-board computer, which is sufficiently fast to execute an implementation of Algorithm 3.2 coded in C++ with low computational times, while producing partitions that contain either mostly explored voxels or mostly unexplored voxels. Thus, we set (μ1 , μ3 , μ4 , μ5 ) = (0.90, 1.00, 16, 0.28), and tested the performance of Algorithm 3.2 with in three sets of software-in-the-loop simulations on this singleboard computer. These simulations were performed on the voxel map shown in the left-most plot of Fig. 3.4. This map captures an open space, high-bay area with few scattered obstacles on the north side, and low-ceiling offices on the south side. In the first simulation, we set n explored,Pˆ 0 = 63319, in the second simulation, we set n explored,Pˆ 0 = 97697, and in the third simulation, we set n explored,Pˆ 0 = 152394. To collect statistically relevant data, each simulation was performed 10 times. In the first, second, and third columns on the right of Fig. 3.4, the explored voxels are indicated by purple dots, the boundaries of the partitions produced by Algorithm 3.2 are denoted by red lines, and the unexplored set of voxels Vunexplored are unmarked. In average, the first set of simulation results was achieved in 213 ms with
42
J. A. Marshall et al. 310 300
Computational time [ms]
290 280 270 260 250 240 230 220 210 )
.01
0,0
(16
)
.03
8,0
(14
)
.05
6,0
(13
)
.08
4,0
(12
)
.10
2,0
(11
)
.12
0,0
(10
)
14
,0.
(88
(
4
,
)
17
,0.
(76
5
)
19
,0.
(64
)
21
,0.
(52
)
23
,0.
(40
)
26
,0.
(28
)
28
,0.
(16
0)
0.3
(4,
)
Fig. 3.3 Boxplots capturing statistical data on the computational time of Algorithm 3.2 as a function of (μ4 , μ5 ) ∈ S2 , averaged over (μ1 , μ3 ) ∈ S2 . Each set of results shows similar computational time, median, and variance
Fig. 3.4 Voxel map employed to evaluate the performance of Algorithm 3.2 and graphical representation of three sets of simulations. The occupied voxels are denoted by black dots, the explored voxels are indicated by purple dots, the boundaries of the partitions produced by Algorithm 3.2 are denoted by red lines, and the unexplored set of voxels Vunexplored are unmarked. By setting (μ1 , μ3 , μ4 , μ5 ) = (0.90, 1.00, 16, 0.28), Algorithm 3.2 produces partitions that cluster explored voxels into large parallelepipeds, partitions that cluster unexplored voxels into large parallelepipeds, and smaller partitions on the border of the explored set
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
43
a standard deviation of 3 ms, the second set of simulation results was achieved in 372 ms with a standard deviation of 9 ms, and the third set of simulation results was achieved in 617 ms with a standard deviation of 11 ms; these results show that Algorithm 3.2 can be executed in real time on a multi-rotor UAV. These simulations also show that, regardless of the size Vexplored , Algorithm 3.2 produces partitions that cluster explored voxels into large parallelepipeds, partitions that cluster unexplored voxels into large parallelepipeds, and smaller partitions on the border of the explored set. This is an ideal result, since it enables a greedy behavior by setting μ2 in (3.1) closer to 0 and, hence, allowing to select goal points for the UAV in areas that are large and mostly unexplored, while ignoring large and mostly explored areas, or a systematic behavior by setting μ2 closer to 1.
3.3.4 Numerical Solution of the Path Planning Problem n
g Given the sequence of goal points {ˆr P,q }q=0 , the proposed path planning algorithm w generates the UAV’s reference path as the sequence {ˆrk }nk=0 ⊂ R3 \ O such that rˆ0 = w rk }nk=0 minimizes the cost rˆ P,q , q ∈ {0, . . . , n g − 1}, and rˆn w = rˆ P,q+1 . The sequence {ˆ function
f k,q gk + h k,q ,
k ∈ {1, . . . , n w },
q ∈ {0, . . . , n g − 1},
(3.2)
where gk
k
κ(d2 (ˆr p , O))d2 (ˆr p , rˆ p−1 )
(3.3)
p=1
denotes the cost-to-come function, h k,q μ8 d2 (ˆrk , rˆ P,q+1 )
(3.4)
denotes the heuristic function,
κ(α)
1) , α ∈ [μ6 , μ7 ], μ8 + 0.5(1 − μ8 ) 1 + cos 2π(α−μ μ7 −μ6 1 α ∈ [0, μ6 ) ∪ (μ7 , ∞),
(3.5)
denotes the weighing function, and μ7 > μ6 > 0 and μ8 ∈ [0, 1) are user-defined parameters. The cost function (3.2) is the weighted sum of the length of the UAV’s reference path, namely, (3.3), and an underestimate of the Euclidean distance between the voxel occupied by the UAV and the goal point rˆ P,q , q ∈ {1, . . . , n g }, namely, (3.4). The weighing function κ(·) is continuous, its minimum is equal to μ8 and is attained at α = (μ6 + μ7 )/2, and it encourages tactical behaviors by rewarding paths
44
J. A. Marshall et al. 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1
0
1
2
3
4
5
6
7
8
9
10
Fig. 3.5 Plot of the weighing function κ(·) as a function of α > 0 with (μ6 , μ7 , μ8 ) = (1, 5, 0.1), (μ6 , μ7 , μ8 ) = (2, 6, 0.3), and (μ6 , μ7 , μ8 ) = (1, 5, 0.1). This weighing function is used to encourage the proposed guidance system to search for paths close to the obstacles set, so that the UAV may exploit obstacles for shelter
that are closer to the obstacles’ set O. Thus, it follows from (3.3) that for larger values of μ8 , the obstacles’ attractive effect is less enhanced, and the UAV exhibits more reckless behavior. For all α ∈ [μ6 , μ7 ], it holds that κ(α) < 1 and hence, [μ6 , μ7 ] captures the region of influence of the obstacles set. If the UAV occupies a voxel that is either closer than μ6 to O or further from O than μ7 , then the weighing function is equal to unity, and the cost-to-come function reduces to the cost-to-come function employed in classical heuristics-based path planning algorithms. Figure 3.5 shows a graphical representation of κ(·) for multiple values of μ6 , μ7 , and μ8 . Assuming that the UAV is able to move to any adjacent unoccupied voxel, the w that minimizes (3.2) can be computed by applying any optireference path {ˆrk }nk=0 mization algorithm based on the use of a heuristic function over a graph, for instance, the A∗ [19, Appendix C] or the L P A∗ algorithm [5]. Since μ8 scales the distance between the UAV’s position and the next goal point rˆ P,q , q ∈ {1, . . . , n g }, in (3.4), and κ(α) ≥ μ8 for all α ≥ 0, it follows from the triangle inequality that the heuristic function h k,q is consistent and hence, the proposed path planning subsystem does not search voxels that were already visited in previous iterations of the heuristics-based w that minimizes (3.2) will be the search algorithm, and the reference path {ˆrk }nk=0 terminating path. An alternative weighing function for tactical path planning has been proposed in [2]. A key difference between (3.5) and the weighing function in [2] lies in the fact that the user is now able to set explicitly the width of the region of influence of the obstacles’ set. Detailed discussions on the role of weighing functions, such as (3.5),
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
45
in designing path planner for tactical vehicles, while employing heuristics-based search algorithms, are presented in [2, 20].
3.3.5 Numerical Solver Selection In this section, we discuss the selection process for the optimization algorithm underlying the proposed path planner. To this goal, two algorithms were considered, namely, the A∗ algorithm [19, Appendix C] and the L P A∗ algorithm [5]. An advantage of L P A∗ over A∗ lies in the fact that, if part of the search graph varies over time, then L P A∗ leverages those parts of the graph that remain unchanged, whereas A∗ must be re-initiated each time the search graph varies. Thus, L P A∗ is computationally more efficient than A∗ . This feature of L P A∗ is highly relevant for applications such as those considered in this chapter, wherein the obstacles’ set is updated as the UAV proceeds in its coverage task, and the path planning process must be reiterated as the voxel map evolves. To verify the computational advantage of L P A∗ over A∗ , these algorithms have been tested by means of four sets of 650 software-in-the-loop simulations performed on an Intel i7 processor with 8 cores, whose maximum frequency is 4.30 GHz, and which executes a Linux operating system. Two sets of simulations are aimed at testing the computational efficiency of L P A∗ over A∗ , while seeking tactical paths, and two sets of simulations are aimed at testing the computational efficiency of L P A∗ over A∗ , while seeking reckless paths. In particular, tactical paths have been tested by seeking paths that minimize the cost function (3.2) with μ6 = 4, μ7 = 8, and μ8 = 0.45, and reckless paths have been tested by seeking paths that minimize the cost function (3.2) with μ6 = 3, μ7 = 7, and μ8 = 0.75. In all simulations, the UAV reference path connects a given initial position to a given final position, and must traverse the voxel map of some office space captured by employing the camera-based navigation system presented in [2]; this voxel map is shown in Figure 3.4. In order to challenge the optimization algorithm’s speed in dynamic environments, an incremental number of obstacles is introduced along the planned paths over the course of each simulation. These obstacles are placed so that their distance is not smaller than one UAV length and, hence, considering the size of the voxel map and the size of the UAV, a maximum of 12 obstacles have been introduced. Fixed the number of obstacles to be introduced in the voxel map along the UAV’s path, 50 simulations have been performed to assess the computational speed of each optimization algorithm. The results of the simulations obtained by employing the A∗ algorithm and the L P A∗ algorithm with a tactical parameter set are shown in Figs. 3.6 and 3.7, respectively, and the results of the simulations obtained by employing the A∗ algorithm and the L P A∗ algorithm with a tactical parameter set are shown in Figs. 3.8 and 3.9, respectively. It appears from Figs. 3.6 and 3.8 that the average computational time for the A∗ algorithm remains substantially constant with the number of obstacles introduced into the voxel map. Additionally, it appears from Figs. 3.6, 3.7, 3.8 and 3.9 that, if no obstacle or one obstacle is introduced along the UAV’s path, then
46
J. A. Marshall et al.
87
Computational time [ms]
86 85 84 83 82 81 80
0
1
2
3
4
5
6
7
8
9
10
11
12
Number of obstacles
Fig. 3.6 Statistical data on the computational time of the A∗ optimization algorithm as a function of the number of obstacles introduced along the UAV’s path, while seeking tactical paths. The red line indicates the average computational time over the 50 simulations, fixed the number of obstacles, the blue box indicates the 25–75% range, the black lines indicate the non-outlier minimum and maximum, and the red dots indicate outliers. The average computational time required to produce a path is approximately constant with the number of obstacles and is 80.93 ms with average standard deviation of 0.34 ms
the average computational time for the L P A∗ algorithm is higher than the average computational time for the A∗ algorithm. However, for more than one obstacle, the average computational time of L P A∗ is lower than the average computational time of A∗ . In light of these results, the L P A∗ algorithm has been employed for the proposed path planner. Finally, the average computational time of the L P A∗ algorithm appears to increase with the number of obstacles being introduced. However, introducing obstacles at a distance smaller than one UAV length is not relevant to the scopes of this research.
3.3.6 Safety Measures The objective of Algorithm 3.1 is to deduce goals rˆP,q ˜ , q = 0, . . . , n g , which lead the UAV’s navigation system to under-explored areas of the voxel map V. In some cases, rˆP,q ˜ lies in an area that the navigation system can not observe from the UAV’s current position rk (iΔT ), irrespectively of the yaw angle ψk (iΔT ) and pitch angle θk (iΔT ), due to the bounds on the cameras’ field-of-view captured by ψmax and the maximum pitch angle θmax . As a result, the planned path may attempt to traverse unexplored,
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
47
90 80
Computational time [ms]
70 60 50 40 30 20 10 0 0
1
2
3
4
5
6
7
8
9
10
11
12
Number of obstacles
Fig. 3.7 Statistical data on the computational time of the the L P A∗ optimization algorithm as a function of the number of obstacles introduced along the UAV’s path while seeking tactical paths. The average computational time required to produce a path is approximately constant with the number of obstacles and is 15.45 ms with average standard deviation of 26.86 ms. If no obstacle or only one obstacle is introduced along the UAV’s path, then, on average, then L P A∗ requires more time than A∗ to recalculate the vehicle’s path. If more than one obstacle is introduced along the UAV’s path, then, on average, the computational time is less than the computational time required by A∗
unobservable voxels that are occupied and, hence, jeopardizes the UAV’s safety; see Fig. 3.10 for a schematic representation of this case. In this section, we present a novel approach to address the challenge of planning a path while accounting for unobservable occupied voxels. This approach, which is captured by Algorithm 3.3 and is schematically represented by Fig. 3.11, consists in penalizing the path that leads the vehicle toward unobservable voxels. Given rˆk ∈ V \ O, the navigation system’s field-of-view ψmax , and the maximum pitch angle θmax , the set of unobservable voxels is approximated by the closed cone K h,θblind rˆk ⊂ R3 , whose apex is at rˆk , whose axis is given by the UAV’s yaw axis, and whose semiaperture is θblind π−ψ2 max − θmax , and whose height is h > 0. Given a goal point rˆP,q ˜ , q = 0, . . . , n g , firstly, we determine if rˆP,q ∈ K h,θblind rˆk . If so, then we define the ˜ closed right circular cylinder Cdanger Crdanger ,h danger ⊂ V \ O centered in the UAV, whose axis is given by the UAV’s yaw axis, and whose semi-height is given by h danger > 0. Thus, we perform ray tracing across the voxels contained in the circle Corigins Crdanger ,0 ⊂ V \ O contained in the plane orthogonal to the UAV’s yaw axis and containing the center of the UAV, where rdanger > 0 denotes the circle’s radius, and along the direction
rˆk −ˆrP,q ˜ . ˆrk −ˆrP,q ˜
If any ray intersects Vunexplored , then the Euclidean
48
J. A. Marshall et al. 64
63.5
Computational time [ms]
63
62.5
62
61.5
61
60.5 0
1
2
3
4
5
6
7
8
9
10
11
12
Number of obstacles
Fig. 3.8 Statistical data on the computational time of the A∗ optimization algorithm as a function of the number of obstacles introduced along the UAV’s path while seeking reckless paths. The average computational time required to produce a path is approximately constant with the number of obstacles and is 62.11 ms with average standard deviation of 0.37 ms
distance between any two points in rˆ p , rˆq ∈ Cdanger when evaluating (3.2) is replaced with (3.6) d2 (ˆr p , rˆq , M) = (ˆr p − rˆq )T diag(1, 1, M)(ˆr p − rˆq ), where M ≥ 1 denotes a user-defined bias set arbitrarily large to discourage any verand not penaltical motion within Cdanger . To penalize positive changes in altitude ize negative ones, M in (3.6) can be set as M = max max sign(ˆ r p,3 − rˆq,3 ), 1 , sign(ˆr p,3 − rˆq,3 )M1 where M1 > 1 is arbitrarily large. Alternatively, to penalize negative changes positive ones, M in (3.6) can be set as in altitude and not penalize M = max max −sign(ˆr p,3 − rˆq,3 ), 1 , −sign(ˆr p,3 − rˆq,3 )M1 . If a ray intersects O, then ray tracing along that particular ray is terminated. Next, the path planning problem is solved by employing the solver outlined in Sect. 3.3.4. Once the search algorithm leaves Cdanger , the process above is repeated is generated; in this case, the original distance function until a new goal rˆP,q+1 ˜ d2 (·, ·, M) is restored, that is, M is set to one. To validate Algorithm 3.3, the proposed path planning system is executed in three simulations involving the same environment, namely, a two-story building with an access door per floor; see Fig. 3.12. In all simulations, we consider the same initial position, namely, rk (iΔT ) = [17.0, 17.0, 1.4]T , which is on the first floor of the simT ulated map, and the same goal point, namely, rˆ P,q ˜ = [17.0, 17.0, 4.6] , which is on
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
49
60
Computational time [ms]
50
40
30
20
10
0 0
1
2
3
4
5
6
7
8
9
10
11
12
Number of obstacles
Fig. 3.9 Statistical data on the computational time of the the L P A∗ optimization algorithm as a function of the number of obstacles introduced along the UAV’s path while seeking tactical paths. The average computational time required to produce a path is approximately constant with the number of obstacles and is 11.45 ms with average standard deviation of 17.34 ms. Similar to the search for tactical paths, if no obstacle or only one obstacle is introduced along the UAV’s path, then, on average, then L P A∗ requires more time than A∗ to recalculate the vehicle’s path. If more than one obstacle is introduced along the UAV’s path, then, on average, the computational time is less than the computational time required by A∗
Algorithm 3.3 Evaluate safety of the goal point rˆP,q ˜ and bias the cost function (3.2) 1: if
rˆk −ˆrP,q ˜ ˆrk −ˆrP,q ˜
· (ˆrk,3 − rˆP,q,3 )e3,3 ≥ cos θblind then ˜
2: Compute Corigins 3: RayTracing(Corigins ) 4: if Ray tracing encountered Vunexplored then 5: ∀(ˆr p , rˆq ) ∈ Cdanger × Cdanger , p = q, rˆ p,3 = rˆq,3 use (3.6) to compute (3.3). 6: end if 7: end if
the second floor. Thus, we observe the outcome of the proposed path planner with and without Algorithm 3.3, while assuming that the voxels of the floor between the start and goal points are occupied and unexplored; see the left image in Fig. 3.12. Thus, we observe the outcome of the proposed path planner without Algorithm 3.3 assuming that the voxels of the floor between the start and goal points are occupied and explored; see the right image in Fig. 3.12. It is apparent how, while employing Algorithm 3.3, the proposed path planner avoids the unexplored, occupied, and unobservable, and hence, unsafe voxels between the start and goal point. Furthermore, employing Algorithm 3.3, the proposed planner produces a path similar to the
50
J. A. Marshall et al.
Fig. 3.10 Schematic representation of a challenging scenario. In the left image, the UAV enters a room through a doorway, and, there is a large cluster of unexplored and occupied voxels (marked in blue) in the ceiling of the room. If the goal point rˆ P,q ˜ generated by Algorithm 3.1 is very close w may intersect the cluster of unexplored to the UAV’s vertical axis, then the planned path {ˆrk }nk=0 voxels in the ceiling, since, in this work, unexplored voxels are considered unoccupied. Because of the cameras’ limited field of view, the UAV’s constraints on the attitude, which are imposed by the trajectory planner (see Sect. 3.4 below), and the geometry of the room, there may be a set of voxels that is impossible to observe for any yaw and pitch angles (marked in green in the right image). If this set of unobservable voxels intersects the set of unexplored voxels, then this scenario may jeopardize the UAV’s safety
path obtained assuming that the entire voxel map is explored. Conversely, without Algorithm 3.3 and assuming that the voxels between the start and goal points are unexplored and unobservable, the UAV attempts to traverse the floor between the first and second floors.
3.4 Trajectory Planning System for Tactical Coverage 3.4.1 Overview In order to exploit the UAV’s nonlinear dynamics while traversing the waypoints outlined by the path planner, the proposed guidance system employs a trajectory planner. In particular, reference trajectories for the UAV’s position and yaw angle
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
51
Fig. 3.11 Graphical representation of Algorithm 3.3 to assess the safety of the goal point rˆ P,q ˜ and avoid traversing unobservable, occupied voxels. The first step is illustrated in the top left image, in which we deduce if rˆ P,q lies in the set of unobservable voxels. Next, in the top right ˜ image, the second step requires performing several ray tracings in the voxel map to determine if there are unexplored voxels in the direction of rˆ P,q ˜ relative to the UAV’s position rk (iΔT ). If any ray intersects an unexplored voxel, then there is a potential for a collision with an unobservable occupied voxel. Next, if any ray intersects Vunexplored , then the Euclidean distance of traversing from rˆ p ∈ Cdanger \ O to rˆq ∈ Cdanger \ O, p = q, is modified according to (3.6); see bottom left image. Finally, Algorithm 3.3 is repeated until rk (iΔT ) ∈ / Cdanger ; see bottom right image. The changes to the transition costs persist until the goal is updated
52
J. A. Marshall et al.
Fig. 3.12 Results of three simulations involving the proposed path planner. In all simulations, the UAV is tasked with reaching a point directly above its initial position. However, a set of occupied voxels lies between the start and goal points. The simulation results in the left image show the UAV’s path employing Algorithm 3.3 and not employing this algorithm, assuming that the voxels between the start and goal points are unobservable. In the first case, the UAV avoids the unobservable occupied voxels and safely reaches the goal point. In the second point, the UAV attempts to traverse the unobservable occupied voxels. The simulation results in the right image show the UAV’s path employing assuming that all voxels are explored. In this case, the UAV’s path is similar to the path obtained assuming that the voxels between the start and goal points are unobservable and employing Algorithm 3.3
are computed as solutions of a linear–quadratic optimal control problem applied to the feedback-linearized equations of motion of the UAV. The optimal control problem’s time horizon is user-defined, and the cost function captures the UAV’s competing needs of reaching the next waypoints and coasting the obstacles’ set to seek shelter from potential threats. The underlying equality constraints capture both the UAV’s output-feedback linearized, discretized equations of motion and the to interpolate the sequence of waypoints given by the UAV’s need nw between the goal point rˆ reference path rˆk k=0 P,q , q ∈ {0, . . . , n g − 1}, and rˆ P,q+1 . The inequality constraints capture the UAV’s need of avoiding obstacles, maintaining the next goal point in the cameras’ fields of view at all times, verifying the bounds on the Euler angles, and assuring that each propeller’s thrust force is nonnegative. Lastly, the boundary conditions to the optimal trajectory planning problem capture the need to reach the next waypoint and orient the onboard cameras’ focal axes toward the next waypoint. The proposed trajectory planning system has several distinctive features. The linear–quadratic optimal control problem underlying the proposed trajectory planner is solved numerically as a quadratic optimization problem by employing the model predictive control methodology. The feedback-linearized equations of motion allow to exploit the vehicle’s nonlinear dynamics and map areas above and below the UAV by rotating the cameras, rather than varying the UAV’s altitude. Collision avoidance constraints are captured by convex sets with affine boundaries, which are identified by an original algorithm that allows to plan reference trajectories over sufficiently long time horizons. Finally, applying the approach presented in [2], fast solutions
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
53
to the trajectory planning problem are obtained by employing logarithmic barrier functions to embed inequality constraints in the cost function and exploiting the block-tridiagonal structure of the matrices that capture the quadratic cost function and the constraint equations. A key difference between the trajectory planning algorithm presented in this chapter and the trajectory planning algorithm presented in [2] lays in the fact that, at each time step, the proposed trajectory planner accounts for a user-defined number of waypoints ahead of the UAV, whereas the trajectory planner discussed in [2] only accounts for the next waypoint. This approach allows the proposed guidance system to trajectories that account for the UAV’s tactical needs over larger distances. Additionally, the proposed trajectory planner allows to choose how closely these waypoints need to be followed, whereas the trajectory planner in [2] requires the UAV to traverse all waypoints. As shown in [20], accounting for one waypoint at the time and requiring to traverse all waypoints, it is particularly difficult to plan trajectories that are both tactical and sufficiently short. Being able to coarsely follow waypoints allows to set the path planner the UAV is drawn to the next goal point through the shortest path, and the trajectory planner so that a more tactical behavior is enforced by coasting the obstacles’ set more closely. Compared to the collision avoidance algorithm in [2], the proposed collision avoidance algorithm allows to compute larger convex sets, where the reference trajectory can be computed and hence, allows to investigate a larger search space. Finally, in [2], the MPC algorithm is applied irrespectively of whether the reference trajectory traverses explored or unexplored voxels, whereas, in this work, the UAV’s reference trajectory is not recomputed at each time step over those segments that traverse explored voxels; this features allows to further reduce the trajectory planner’s computational time.
3.4.2 Notation Time is denoted by t ≥ 0, and we assume that the UAV is able to fly from rˆk , k ∈ {0, . . . , n w − 1}, to rˆk+1 in n t ΔT time units, where both n t ∈ N and ΔT > 0 are user-defined. In general, both n t and ΔT are different for each pair of consecutive waypoints. However, the functional dependency of these quantities on k ∈ {0, . . . , n w − 1} is omitted for simplicity of exposition. The UAV’s mass is denoted by m > 0 and the gravitational acceleration is denoted by g > 0. Assuming that the UAV’s roll, pitch, and yaw axes are principal axes of inertia, the UAV’s matrix of inertia is denoted by I ∈ R3×3 , which is diagonal and positive-definite. The UAV’s position is captured by rk : [0, n t ΔT ] → R3 \ O, inertial reference frame I. k ∈ {0, . . . , n w − 1}, expressed in a conveniently located The UAV’s roll angle is denoted by φk : [0, n t ΔT ] → − π2 , π2 , k ∈ {0, . . . , n w − 1}, the UAV’s pitch angle is denoted by θk : [0, n t ΔT ] → − π2 , π2 , the UAV’s yaw angle is denoted by ψk : [0, n t ΔT ] → [0, 2π ), the UAV’s velocity with respect to
54
J. A. Marshall et al.
I is denoted by vk : [0, n t ΔT ] → R3 , the UAV’s angular velocity with respect to I is denoted by ωk : [0, n t ΔT ] → R3 , and the the UAV’s state vector is denoted T by xk ( jΔT ) rkT ( jΔT ), φ( jΔT ), θ ( jΔT ), ψ( jΔT ), vkT ( jΔT ), ωkT ( jΔT ) , j∈ {i, . . . , n t }, i ∈ {0, . . . , n t }, k ∈ {0, . . . , n w − 1}. Let η1,k : [0, n t ΔT ] → R, k ∈ {0, . . . , n w − 1}, denote the total thrust force’s virtual control input, the total thrust force produced by the UAV’s propellers is defined as u 1,k (·) such that u 1,k (t) = [1, 0] δk (t), t ∈ [0, n t ΔT ], and u 1,k (0) u 1,0,k ˙δk (t) = 0 1 δk (t) + 0 η1,k (t), , t ∈ [0, n t ΔT ]. = u 1,f,k u 1,k (n t ΔT ) 00 1 (3.7) The roll moment produced by the UAV’s propellers is denoted by u 2,k (·), the pitch moment produced by the UAV’s propellers is denoted by u 3,k (·), and the yaw moment produced by the UAV’s propellers is denoted by u 4,k (·). The UAV’s con T trol input is defined as u k (t) u 1,k (t), u 2,k (t), u 3,k (t), u 4,k (t) , t ∈ [0, n t ΔT ], k ∈ {0, . . . , n w − 1}, and the vector of thrust forces produced by each propeller is defined as Tk (t) MT,u u k (t),
t ∈ [0, n t ΔT ],
(3.8)
where the ith component of Tk (·), i = 1, . . . , 4, namely, Ti,k : [0, n t ΔT ] → [0, ∞), denotes ⎡the thrust force⎤ produced by the ith propeller, 1 0 2l −1 −cT−1 1 ⎢1 −2l −1 0 cT−1 ⎥ ⎥, l > 0 denotes the distance of the propellers MT,u ⎢ −1 4 ⎣1 0 −2l −cT−1 ⎦ 1 2l −1 0 cT−1 from the vehicle’s barycenter, and cT > 0 denotes the propellers’ drag coefficient [21].
3.4.3 Output-Feedback Linearized Equations of Motion Neglecting the aerodynamic drag, the inertial counter-torque, and the gyroscopic effect [21], which are small for Class I aircraft such as those considered in this chapter, the UAV’s continuous-time equations of motion are given by
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage r˙k (t) = vk (t), v˙k (t) = m
−1
rk (0) = rinit,k ,
t ∈ [0, n t νs,k ΔT ],
k ∈ {0, . . . , n w − 1},
55
(3.9a)
R(φk (t), θk (t), ψk (t))[0, 0, u 1,k (t)] − [0, 0, g] , rk (n t νs,k ΔT ) = rend,k , T
T
⎡ ⎤ ⎤ ⎡ ⎤ φ˙ k (t) φk (0) φ0,k ⎣ θk (0) ⎦ = ⎣ θ0,k ⎦ , ⎣ θ˙k (t) ⎦ = Γ −1 (φk (t), θk (t))ωk (t), ψ0,k ψk (0) ψ˙ k (t) ⎞ ⎡ ⎛⎡ ⎤ ⎤ ⎤ ⎡ φk (n t νs,k ΔT ) u 2,k (t) φf,k ⎣ θk (n t νs,k ΔT ) ⎦ = ⎣ θf,k ⎦ , ω˙ k (t) = I −1 ⎝⎣u 3,k (t)⎦ − ωk× (t)I ωk (t)⎠ , ψf,k u 4,k (t) ψk (n t νs,k ΔT ) ⎡
(3.9b) (3.9c)
(3.9d)
where ⎡
⎤⎡ ⎤ ⎤⎡ cos ψk − sin ψk 0 1 0 0 cos θk 0 sin θk 1 0 ⎦ ⎣0 cos φk − sin φk ⎦ , R(φk , θk , ψk ) ⎣ sin ψk cos ψk 0⎦ ⎣ 0 0 sin φk cos φk 0 0 1 − sin θk 0 cos θk π π π π × − , × [0, 2π ), (φk , θk , ψk ) ∈ − , 2 2 2 2 (3.10) captures the UAV’s attitude relative to the inertial reference frame I, ⎤ 1 0 − sin θk Γ (φk , θk ) ⎣0 cos φk cos θk sin φk ⎦ , 0 − sin φk cos θk cos φk ⎡
(3.11)
and νs,k ∈ {1, . . . , n w − k} denotes the user-defined path stride; we recall that Γ (φk , θk ), k ∈ {0, . . . , n w − 1}, is invertible for all (φk , θk ) ∈ − π2 , π2 × − π2 , π2 [22, Chap. 2]. Detailed discussions on the path stride are provided in Sects. 3.4.6 and 3.4.7 below, and a discussion on the boundary conditions rinit,k and rend,k ∈ R3 is provided in Sect. 3.4.7. The initial conditions φ0,k , θ0,k , and ψ0,k , k ∈ {0, . . . , n w − 1}, are user-defined. Employing (3.9a) and (3.9b) to produce reference trajectories, φ0,k , θ0,k , and ψ0,k , k ∈ {0, . . . , n w − 1}, are set as the UAV’s angular position at the time the MPC algorithm is initiated. The endpoint conditions for (3.9c) and (3.9d) are set so that φf,k , k ∈ {0, . . . , n w − 1}, θf,k , and ψf,k are the Euler angles of the 3-2-1 rotation sequence underlying the rotation matrix T Rf,k nˆ x,k , nˆ y,k , nˆ z,k ,
(3.12)
T T where nˆ x,k ˆrk+νs,k −ˆrk −1 rˆk+νs,k −ˆrk , nˆ y,k nˆ × ˆ× x,k [0, εk , g] /n x,k [0, εk , g] , nˆ z,k nˆ × ˆ y,k , and εk is such that if nˆ x,k = [0, 0, g]T , then εk = 0, and if nˆ x,k = x,k n T [0, 0, g] , then εk = 0 is small and user-defined. The rotation matrix Rf,k , k ∈ {0, . . . , n w − 1}, captures the attitude of a UAV, whose roll axis intersects both rˆk and rˆk+νs,k . This matrix is well-defined since the L P A∗ algorithm underlying the
56
J. A. Marshall et al.
proposed path planner does not generate two identical consecutive waypoints, that is, rˆk = rˆk+1 , for all k ∈ {0, . . . , n w − 1}. Moreover, nˆ y,k , k ∈ {0, . . . , n w − 1}, has been designed so that nˆ x,k and [0, εk , g]T are not collinear, and hence Rf,k is always a rotation matrix. Quadcopter UAVs are underactuated [21], since only four of their six degrees of freedom can be controlled directly. In this chapter, we are interested in regulating the UAV’s position rk (·), k ∈ {0, . . . , n w − 1}, and its yaw angle ψk (·) since the cameras’ T focal axes is aligned with the UAV’s roll axis. To this goal, setting rkT (t), ψk (t) , t ∈ [0, n t νs,k ΔT ], k ∈ {0, . . . , n w − 1}, as the measured output, and applying Proposition 5.1.2 of [23], we verify that the dynamical system given by (3.9) and (3.7) has vector relative degree {4, 4, 4, 2}, and (4) rk (t) = f (rk (t), φk (t), θk (t), ψk (t), ωk (t), u 1,k (t)) ψ¨ k (t) ⎡
⎤ η1,k (t) ⎢u 2,k (t)⎥ ⎥ + G −1 (rk (t), φk (t), θk (t), ψk (t), u 1,k (t)) ⎢ ⎣u 3,k (t)⎦ , u 4,k (t) r r¨ rk (0) r¨k (0) = init,k , = init,k , rend,k r¨end,k rk (n t νs,k ΔT ) r¨k (n t νs,k ΔT ) ψk (0) ψ0,k = , t ∈ [0, n t νs,k ΔT ], (3.13) ψf,k ψk (n t νs,k ΔT )
where ⎡ ⎢ ⎢ ⎢ G −1 (rk , φk , θk , ψk , u 1,k ) ⎢ ⎢ ⎢ ⎣
msφk sψk + mcφk cψk sθk mcφk sψk sθk − mcψk sφk mcφk cθk I1 m(cφk sψk −cψk sφk sθk ) u 1,k
+sψk sφk sθk ) k sφk − I1 m(cφk cψuk1,k − I1 mcθ u 1,k
0 0
I2 mcψk cθk u 1,k
I2 mcθk sψk u 1,k
k − I2umsθ 1,k
0
k cθk sφk − I3 mcψ u 1,k cφk
k sψk sφk − I3 mcθ u 1,k cφk
I3 msθk sφk u 1,k cφk
I3 mcθk cφk
⎤ ⎥ ⎥ ⎥ ⎥, ⎥ ⎥ ⎦
(3.14) cα = cos α, α ∈ R, sα = sin α, I = diag(I1 , I2 , I3 ), and f : R3 × − π2 , π2 × π π an expression for f (·, ·, ·, ·, ·, ·) − 2 , 2 × [0, 2π ) × R3 × R → R4 ; u 2 cos φ
k 1,k is omitted for brevity. It holds that det G(rk , φk , θk , ψk , u 1,k ) = m 3 det(I , ) cos θk (rk , φk , θk , ψk , u 1,k ) ∈ R3 × − π2 , π2 × − π2 , π2 × [0, 2π ) × (0, ∞), and hence, G(·, ·, ·, ·, ·, ·) is invertible if and only if u 1 = 0 since φk ∈ − π2 , π2 . The constraints on the UAV’s roll and pitch angles and the control input, namely, the conditions whereby φk (t) ∈ − π2 , π2 , t ∈ [0, n t νs,k ΔT ], k ∈ {0, . . . , n w − 1}, θk (t) ∈ π π − 2 , 2 , and u 1,k (t) > 0, are enforced by the proposed trajectory planning algorithm; for details, see Sect. 3.4.6 below. It follows from (3.9) that for the UAV’s attitude to be captured by the triplet (φ0,k , θ0,k , ψ0,k ) at t = 0 and the triplet (φf,k , θf,k , ψf,k ) at t = n t νs,k ΔT , it must
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
57
hold that r¨init,k = m −1 R(φ0,k , θ0,k , ψ0,k )[0, 0, u 1,0,k ]T − [0, 0, g]T , k ∈ {0, . . . , n w − 1}, (3.15) r¨end,k = m −1 R(φf,k , θf,k , ψf,k )[0, 0, u 1,f,k ]T − [0, 0, g]T .
(3.16)
Thus, given r¨init,k , r¨end,k ∈ R3 as user-defined, we deduce u 1,0,k , u 1,f,k ∈ R, namely, the boundary conditions of (3.7), from (3.15) and (3.16). The boundary conditions on the UAV’s acceleration r¨init,k and r¨end,k ∈ R3 are provided in Sect. 3.4.7 below. Next, let ζ (rk , φk , θk , ψk , ωk , u 1,k , λk ) G(rk , φk , θk , ψk , ωk , u 1,k ) − f (rk , φk , θk , ψk , ωk , u 1,k ) ! " Br Ar,0 rk (t) + Ar,1 r˙k (t) + Ar,2 r¨k (t) + Ar,3 rk(3) (t) λ , + Bψ k Aψ,0 ψk (t) + Aψ,1 ψ˙ k (t) π π π π × − , × [0, 2π) × R3 × R × R4 , (rk , φk , θk , ψk , ωk , u 1,k , λk ) ∈ R3 × − , 2 2 2 2 (3.17) k ∈ {0, . . . , n w − 1}, +
where ⎡
03×3 ⎢03×3 A˜ r ⎢ ⎣03×3 Ar,0
13 03×3 03×3 Ar,1
03×3 13 03×3 Ar,2
⎤ 03×3 03×3 ⎥ ⎥ ∈ R12×12 , 13 ⎦ Ar,3
A˜ ψ
0 1 ∈ R2×2 , (3.18) Aψ,0 Aψ,1
are Hurwitz, 09×4 ˜ ∈ R12×4 , Br Br
B˜ ψ
0 ∈ R2×4 Bψ
(3.19)
and the pairs ( A˜ r , B˜ r ) and ( A˜ ψ , B˜ ψ ) are controllable. If
η1,k (t), u 2,k (t), u 3,k (t), u 4,k (t)
T
= ζk (t), t ∈ [0, n t νs,k ΔT ], k ∈ {0, . . . , n w − 1},
(3.20) where ζk (t) denotes ζ (rk (t), φk (t), θk (t), ψk (t), ωk (t), u 1,k (t), λk (t)) for brevity, then the UAV’s equations of motion (3.9) are output-feedback linearized and ˜ k (t) + Bλ ˜ k (t), χ˙ k (t) = Aχ
t ∈ [0, n t νs,k ΔT ], T
(3.21)
where χk (t) rkT (t), r˙kT (t), r¨kT (t), rk(3)T (t), ψk (t), ψ˙ k (t) , A˜ blockdiag( A˜ r , A˜ ψ ) T and B˜ B˜ rT , B˜ ψT . Thus, it follows from (3.21) that the discrete-time, linearized,
58
J. A. Marshall et al.
zero-order hold [24], output-feedback linearized equations of motion of a quadcopter UAV are given by χk (( j + 1)ΔT ) = Aχk ( jΔT ) + Bλk ( jΔT ), j ∈ {i, . . . , n t νs,k − 1},
i ∈ {0, . . . , n t νs,k − 1},
k ∈ {0, . . . , n w − 1},
(3.22) # ΔT ˜ ˜ ˜ the boundary conditions for (3.21) and where A = e AΔT and B = 0 e Aσ dσ B; (3.22) are the same boundary conditions as for (3.13). Equation (3.22) serves as an equality constraint for the model predictive control algorithm employed to outline tactical reference trajectories for UAVs. Given λk ( jΔT ), j ∈ {i, . . . , n t νs,k − 1}, i ∈ {0, . . . , n t νs,k − 1}, k ∈ {0, . . . , n w − 1}, the total thrust force’s virtual control input η1,k ( jΔT ), the roll moment produced by the UAV’s propellers u 2,k ( jΔT ), the pitch moment u 3,k ( jΔT ), and the yaw moment u 4,k ( jΔT ) are deduced from (3.20) at t = jΔT employing rk ( jΔT ), φk ( jΔT ), θk ( jΔT ), ψk ( jΔT ), and their derivatives estimated by the UAV’s navigation system. Then, the total thrust force is computed as u 1,k ( jΔT ) = [1, 0]δk ( jΔT ), j ∈ {i, . . . , n t − 1}, i ∈ {0, . . . , n t − 1}, k ∈ {0, . . . , n w − 1}, where
1 ΔT 2 T 11 δk (( j + 1)ΔT ) = δ ( jΔT ) + e ζ ( jΔT ), 01 k 2 2ΔT 1,4
(3.23)
is the discrete-time zero-order hold counterpart of (3.7). Finally, the thrust force produced by each propeller is computed by applying (3.8).
3.4.4 Collision Avoidance Constraints The constraint set, which contains the UAV, excludes obstacle points, and is sufficiently large to contain viable reference trajectories, is captured by Fr,k (iΔT )rk (iΔT ) ≤≤ fr,k (iΔT ),
i ∈ {0, . . . , n t νs,k },
k ∈ {0, . . . , n w − 1},
(3.24) where the Fr,k : R → Rn coll ×3 is such that ⎤T cos (μ9 + 2μ9 (q − 1)) cos (μ10 + 2μ10 (q − 1)) T F (iΔT ) ⎣ sin (μ9 + 2μ9 (q − 1)) cos (μ10 + 2μ10 (q − 1)) ⎦ , eq,n coll r,k (−1)q sin (μ10 + 2μ10 (q − 1)) ⎡
q ∈ {1, . . . , n coll }, (3.25)
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
59
μ9 > 0 and μ10 > 0 are user-defined and such that n coll 16π 2 /(μ9 μ10 ) is an even integer and denotes the number of collision avoidance constraints, fr,k : R → Rn coll is such that $ T T f (iΔT ) min min eq,n F (iΔT ) p, eq,n r,k coll coll r,k p∈Sq ∩Oc,i,k % T min eq,n F (iΔT ) p (iΔT ) , r,k s coll s∈{1,...,n coll }\{q}
(3.26) Sq ⊂ R3 is a right square spherical pyramid, whose apex is centered in the UAV, whose bisector is given by (3.25), and whose sides have user-defined length, T (iΔT )e T ps (iΔT ) arg min p∈Ss ∩Oc,i,k Fr,k s,n coll es,n coll Fr,k (iΔT ) p,
s ∈ {1, . . . , n coll },
(3.27) and Oc,i,k ⊆ O captures a set of obstacles’ points that are more likely to be impacted by the UAV. In practice, the convex constraint set with affine boundaries captured by (3.24) is produced by partitioning a sphere centered in the UAV in n coll right square spherical pyramids, and constructing hyperplanes orthogonal to the bisector of each pyramid [25]. The use of the affine inequalities (3.24) to capture collision avoidance constraints is essential to guarantee fast numerical solutions of the proposed trajectory planning problem. To deduce Oc,i,k , we propose an approach that we named the Bubble Bath algorithm. This original method selects those obstacle points that are within some userdefined radius from the UAV’s position, are contained in the half-space defined by the UAV’s velocity vector, and are directly accessible to the UAV. In the Bubble Bath algorithm, first, we define the closed set H ρ (rk (iΔT )) x ∈ Bρ (rk (iΔT )) ∩ O : x T vk (iΔT ) ≥ 0 , i ∈ {0, . . . , n t νs,k },
k ∈ {0, . . . , n w − 1}, (3.28)
where ρ max{vk (iΔT ), v}iΔT and v > 0 is user-defined. This closed set captures all obstacle points that are within a distance ρ from the UAV’s position at t = iΔT and are included in the closed half-space determined by the UAV’s velocity vector. It follows from the definition of ρ that the size of Oc,i,k increases with the UAV’s velocity; the parameter v captures a user-defined safety margin. Next, we compute the closed parent ellipsoid E (Pk (iΔT ), rk (iΔT ), ck (iΔT )/μ11 ), i ∈ {0, . . . , n t νs,k }, k ∈ {0, . . . , n w − 1}, where Pk (iΔT ) ∈ R3×3 and ck (iΔT ) ∈ R are solutions of the quadratic programming problem with cost function min ck (iΔT )
(3.29)
60
J. A. Marshall et al.
and constraints g1 (Pk (iΔT ), rk (iΔT ), wα , ck (iΔT )) ≥ 0, g2 (Pk (iΔT ), rk (iΔT ), wβ , ck (iΔT )) ≤ 0,
α ∈ {1, . . . , n UAV }, β ∈ {1, . . . , n H },
g3 (Pk (iΔT )) ≤ 03×3 , g4 (ck (iΔT )) < 0,
(3.30a) (3.30b) (3.30c) (3.30d)
g1 (P, r, w, c) (w − r )T P(w − r ) − c, (P, r, w, c) ∈ R3×3 × R3 × R3 × R, T g2 (P, r, w, c) (w − r ) P(w − r ) − c, g3 (P) P + 13 , g4 (c) c, μ11 > 1 is a user-defined safety margin, and n H denotes the cardinality of H ρ (rk (iΔT )); conditions (3.30c) and (3.30d) guarantee that E (Pk (iΔT ), rk (iΔT ), ck (iΔT )/μ11 ) is an ellipsoid, and (3.30a) and (3.30b) guarantee that the parent ellipsoid includes all the UAV’s points wα , α ∈ {1, . . . , n UAV }, and excludes all points of H ρ (rk (iΔT )). Successively, for all i ∈ {0, . . . , n t νs,k } and k ∈ {0, . . . , n w − 1}, we compute closed children ellipsoids, E( P˜k,l (iΔT ), r˜k,l (iΔT ), c˜k,l (iΔT )), l ∈ {1, . . . , }, where is user-defined, P˜k,l (iΔT ) ∈ R3×3 and c˜k,l (iΔT ) ∈ R are solutions of the quadratic programming problem with cost function min c˜k,l (iΔT )
(3.31)
and constraints given by (3.30b)–(3.30d) with Pk (·) replaced by P˜k,l (·), rk (·) replaced by r˜k,l (·), & r˜k,l (iΔT )
⎡ ⎤ cos γl cos ηl ck (iΔT )Pk−1 (iΔT ) ⎣ sin γl cos ηl ⎦ + rk (iΔT ), μ11 sin η
(3.32)
l
ck (·) replaced by c˜k,l (·), and γl , ηl ∈ R are user-defined. Thus, children ellipsoids exclude all obstacle points in H ρ (rk (iΔT )), and it follows from (3.32) that children ellipsoids are Finally, we define −1/2 1/2 Oc,i,k r ∈ H ρ (rk (iΔT )) : ck (iΔT )Pk (iΔT )(r − rk (iΔT )) ≤ 1 + μ12 −1/2 1/2 ∪l=1 r ∈ H ρ (rk (iΔT )) : c˜k,l (iΔT ) P˜k,l (iΔT )(r − r˜k,l (iΔT )) ≤ 1 + μ12 , i ∈ {0, . . . , n t νs,k },
k ∈ {0, . . . , n w − 1},
(3.33)
where μ12 > 0 denotes a user-defined safety margin. In practice, the subset of obstacle points Oc,i,k is given by the set of all points that are within a distance smaller than or equal to μ12 from any of the children ellipsoids or E (Pk (iΔT ), rk (iΔT ), ck (iΔT )); note that it follows from (3.30b) that ∂E (Pk (iΔT ), rk (iΔT ), ck (iΔT )) comprises at least one point of the obstacles’ set O. This technique has been named Bubble Bath algorithm because the children ellipsoids are centered at the parent ellipsoid and cover unoccupied voxels surrounding the UAV and hence, mimic the behavior of bubbles in a bathtub. The larger the num-
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
61
Fig. 3.13 Schematic representation of the proposed approach to generate affine constraint sets from the set of occupied voxels O. Firstly, the parent ellipsoid, which contains the UAV and excludes all obstacle points, is created. Successively, multiple children ellipsoids, which are centered on the boundary of the parent ellipsoid, are generated. Then, the subset of obstacle points Oc,i,k are computed as the set of points of O within some user-defined distance from the parent or any of the children ellipsoids. Finally, the affine set is computed as the subset of the state space that contains the UAV and excludes all points of Oc,i,k
ber of children ellipsoids, the more accurate is this technique in identifying occupied voxels that could be impacted by the UAV. Figure 3.13 provides a graphical representation of the proposed approach to identifying the subset Oc,i,k of the obstacles’ set O for some i ∈ {0, . . . , n t νs,k } and k ∈ {0, . . . , n w − 1}, and create the affine constraint set captured by (3.24). Although the proposed technique applies to three-dimensional problems, a two-dimensional example has been shown for clarity of visualization.
3.4.4.1
Study on the Computational Cost of the Bubble Bath Algorithm
To assess the efficiency of the Bubble Bath algorithm and compare its performance with the iterative regional inflation by SDP (IRIS) algorithm [26–28], the safe flight corridor (SFC) algorithm [29], and the minimum distance collision avoidance (MDCA) algorithm [2], we consider as a test case a two-story house comprising multiple rooms on both floors and a set of stairs; see Fig. 3.14 for top-down views of both floors. Within this map, we consider 10 paths, each departing from a different room on the first floor, converging at the bottom of the stairs, proceeding to the second floor parallel along the stairs, and reaching the same point on the second floor. For each path, the distance between waypoints is 25 voxels; the shortest path includes 175 waypoints, and the longest path includes 400 waypoints. Each of the 4 algorithms being tested is executed 50 times along each path on a personal computer equipped with an Intel 8Core i7, 4.5 GHz processor, a 32 Gb DDR4 memory, and MATLAB® 2021b.
62
J. A. Marshall et al.
Fig. 3.14 Paths used for testing the computational efficiency of the Bubble Bath algorithm. The beginning of each path is marked with the numbers of waypoints Table 3.1 Coefficients of the first-order linear regressions relating path length and computation times Algorithm Slope Offset Bubble Bath MDCA SFC IRIS
8.669 3.374 19.684 12.537
64.354 12.920 3.693 16.510
As shown in Fig. 3.15, MDCA produces the lowest computational time, and the Bubble Bath algorithm produces the second-best computational time. Furthermore, for all algorithms, the computational time increases with the number of points in the path. From an analysis of the data of a first-order regression for each algorithm, it is apparent that MDCA grows the least with the number of waypoints, and Bubble Bath has the second slowest increasing computational time per number of waypoints; for details, see Table 3.1. Figure 3.16 shows statistical information about the volumes of the sets produced by the Bubble Bath algorithm, MDCA, SFC, and IRIS. Although SFC and IRIS produce the largest average and the largest minimum constraint sets, the average maximum constraint set volume is similar across the four algorithms. Finally, Fig. 3.17 shows that, except for IRIS, all algorithms produce constraint sets, whose volumes’ standard deviations are similar. Finally, it is worthwhile remarking that SFC is able to produce unbounded sets, but in a relatively small number. In conclusion, for consistency of generated volumes and computational speed, MDCA and Bubble Bath are preferable. If the volume of the constraint set is more important than the computational time, then IRIS would be a preferable choice. Finally, the Bubble Bath algorithm has the most consistent volume generated and
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
63
200 150 100 50
200 150 100 50
200 150 100 50
200 150 100 50 175
200
225
250
275
300
325
350
375
400
Fig. 3.15 Average computational times of the Bubble Bath algorithm, the MDCA, SFC, and IRIS along each path
computational time throughout testing, making it worthy of consideration for applications where predictability is a top priority.
3.4.5 Constraints on the UAV’s Attitude As discussed in Sect. 3.4.3, the output-feedback linearized equations of motion (3.22) yield for φk ( jΔT ) ∈ − π2 , π2 , j ∈ {i, . . . , n t νs,k }, i ∈ {0, . . . , n t νs,k }, k ∈ {0, . . . , n w − 1}, and θk ( jΔT ) ∈ − π2 , π2 . Furthermore, the user may want to impose constraints on the yaw angle so that ψk ( jΔT ) ∈ (−ψmax , ψmax ), j ∈ {i, . . . , n t νs,k }, i ∈ {0, . . . , n t νs,k }, k ∈ {0, . . . , n w − 1}, where ψmax > 0. For instance, by setting ψmax = π2 , the user imposes that the UAV’s roll axis always points in the direction of motion. In this chapter, the constraints on the yaw angle are imposed by means of the inequality constraints −ψk ( jΔT ) ≤ −ψf,k + ψmax , j ∈ {i, . . . , n t νs,k }, i ∈ {0, . . . , n t νs,k }, k ∈ {0, . . . , n w − 1}, (3.34a) ψk ( jΔT ) ≤ ψf,k + ψmax ,
(3.34b)
64
J. A. Marshall et al.
50 40 30 20 10
2.5 2 1.5 1 0.5 0 Bubble Bath
MDCA
SFC
IRIS
Fig. 3.16 Boxplots of some characteristics of the volumes of the collision avoidance sets produced by the Bubble Bath algorithm, MDCA, SFC, and IRIS 50 40 30 20 10
2.5 2 1.5 1 0.5 0 Bubble Bath
MDCA
SFC
IRIS
Fig. 3.17 Standard deviations of the volumes generated by each algorithm, the number of unbounded constraint sets per path, and the number of obstacles contained along each of the 10 paths
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
65
or, alternatively, by Fψ (iΔT )ψk ( jΔT ) ≤≤ f ψ,k (iΔT ), i ∈ {0, . . . , n t νs,k }, k ∈ {0, . . . , n w − 1}, (3.35) where Fψ (iΔT ) [−1, 1]T , T f ψ,k (iΔT ) ψmax − ψf,k , ψmax + ψf,k .
(3.36a) (3.36b)
As discussed in Sect. 3.4.6 below, the constraints on the pitch and roll angles are imposed by means of barrier functions.
3.4.6 Cost Function Definition The proposed trajectory planner produces reference trajectories that minimize a cost function that encourages four objectives, namely, following waypoints produced by the path planner, reducing the control effort, encouraging tactical or reckless behaviors, and discouraging the UAV’s pitch and roll angles from approaching the boundaries of the interval (−π/2, π/2). A cost function that meets the first three objectives is given by J˜[rk (0), ψk (0), λk (·)] k+νs,k −1
k+νs,k −1 n t −1 l−k νdis f (rl (n t ΔT ), ψl (n t ΔT )) +
l=k
l=k
i=0
l−k ˜ νdis (rl (iΔT ), ψl (iΔT ), λl (iΔT )),
k ∈ {0, . . . , n w − 1},
(3.37)
where νdis ∈ (0, 1), T r˜ r˜ ˜ (rk , ψk , λk ) k R˜ k k + qψ (ψk − ψf,k )2 + q˜rT r˜k + q˜λT λk , λk λk (rk , ψk , λk ) ∈ R3 × R × R4 , T f (rk , ψk ) rk − rˆk+νs,k Rr,f rk − rˆk+νs,k + qψ (ψk − ψf,k )2 ,
(3.38a) (3.38b)
R˜ r R˜ r,λ ˜ , Rr ∈ R3×3 is symmetric, R˜ r,λ ∈ R3×4 , and Rλ ∈ R4×4 is positiveT Rλ R˜ r,λ definite and such that R˜ k
T Rλ−1 R˜ r,λ > 0, R˜ r − 2 R˜ r,λ
(3.39)
66
J. A. Marshall et al.
q˜r ∈ R3 , q˜λ ∈ R4 , r˜k (iΔT ) μ13 rk (iΔT ) − rˆk+νs,k + (1 − μ13 ) f sat μ14 (ˆrk − rOc,i,k ) rk (iΔT ) − rOc,i,k , i ∈ {0, . . . , n t νs,k − 1},
(3.40)
sat (w) , w w ∈ Rn , rOc,i,k d2 (ˆrk , Oc,i,k ), and Rr,f ∈ R3×3 is symmetric and nonnegativedefinite, and qψ > 0. By the Schur complement condition on the positive-definiteness of block-matrices [30, pp. 7–8], (3.39) implies that R˜ is symmetric and positivedefinite. As discussed in Sect. 3.4.7 below, the path stride νs,k , k ∈ {0, . . . , n w − 1}, allows to set boundary conditions for the trajectory planning problem at the waypoint rˆk+νs,k . Thus, the reference trajectory does not traverse the waypoints rˆk+1 , . . . , rˆk+νs,k −1 . However, for all k ∈ {0, . . . , n w − 1}, the need to outline a reference trajectory that approximates the waypoints rˆk+1 , . . . , rˆk+νs,k −1 , is captured by the cost function (3.37). l−k , k ∈ {0, . . . , n w − 1}, in (3.37) is a discount factor applied to The term νdis the objective function so that, while the UAV approaches rˆk+1 , the influence of rˆk+2 , . . . , rˆk+νs,k on the cost function is less marked. Furthermore, if l is considerably larger than k, then the reference trajectory is more likely to intersect the set of unexplored voxels Vunexplored . Therefore, the waypoints rˆk+2 , . . . , rˆk+νs,k should contribute less to the overall cost, since confidence in the feasibility of these trajectories is lower. The Mayer’s function (3.38b) captures the UAV’s need to reach the waypoint rˆk+νs,k , k ∈ {0, . . . , n w − 1}, and point its roll axis toward this waypoint. The first term on the right-hand side of (3.40) captures the UAV’s distance from rˆk+νs,k , k ∈ {0, . . . , n w − 1}, and the second term on the right-hand side of (3.40) captures the UAV’s distance from the obstacles’ set O. The Lagrangian function (3.38a) captures the UAV’s competing needs of reaching the waypoint rˆk+νs,k , k ∈ {0, . . . , n w − 1}, and coasting the obstacles’ set, while pointing the onboard cameras toward rˆk+νs,k . Indeed, if μ13 = 1, then r˜k (iΔT ) = rk (iΔT ) − rˆk+νs,k , i ∈ {0, . . . , n t νs,k − 1}, k ∈ {0, . . . , n w − 1}, and minimizing (3.37) induces a reckless behavior in the UAV, since its only goal is to reach the waypoint, and if μ13 ∈ (0, 1), then coasting the obstacles’ set becomes a higher priority. The function f sat (·) in (3.40) reduces the attractive effect of obstacles at a distance from the waypoint rˆk , k ∈ {0, . . . , n w − 1}, that is larger than μ−1 14 . Indeed, f sat (w) → 1 as w → 0, f sat (μ14 w) = 1 for all w ∈ Bμ−1 (0 ), f (μ w) < 1 for all w ∈ Rn \ Bμ−1 (0n ), and limw→∞ f sat (μ14 w) = 0. n sat 14 14 14 Detailed discussions on the role of the weighing matrices R˜ r , R˜ r,λ , Rλ , the weighing term qψ , and the user-defined parameters μ13 and μ14 in designing trajectory planners for tactical vehicles based on the model predictive control framework are presented in [2, 20]. μ13 ∈ (0, 1] and μ14 > 0 are user-defined, rk (·) verifies (3.22), f sat (w)
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
67
To recast (3.37) as an explicit function of both the UAV’s position rk (·), k ∈ {0, . . . , n w − 1}, and λk (·), we substitute (3.40) in (3.38a), and note that minimizing (3.37) is equivalent to minimizing k+νs,k −1
l−k ˆ νdis f (rl (n t ΔT ), ψl (n t ΔT )) +
l=k
k+νs,k −1 n t −1
l=k
i=0
l−k νdis (rl (iΔT ),ψl (iΔT ), λl (iΔT )),
k ∈{0, . . . , n w − 1}, (3.41)
where T T T ˆ k , ψk , λk ) rk Rk rk + qψ (ψk − ψf,k )2 + qr,k (τ )rk + qλ,k (τ )λk , (r λk λk Rk
(rk , ψk , λk ) ∈ R3 × R × R4 ,
Rr,k Rr,λ,k , T Rr,λ,k Rλ
(3.42a) (3.42b)
Rr,k (τ ) 1 + μ13 (1 − f sat μ14 (ˆrk − rOc,i,k )) R˜ r , (3.42c) Rr,λ,k (τ ) 1 + μ13 (1 − f sat μ14 (ˆrk − rOc,i,k )) R˜ r,λ , (3.42d) qr,k (τ ) 1 + μ13 (1 − f sat μ14 (ˆrk − rOc,i,k )) q˜r − 2 R˜ r μ13rˆk+1 +(1 − μ13 ) f sat μ14 (ˆrk − rOc,i,k ) rOc,i,k , (3.42e) T qλ,k (τ ) q˜λ − 2 R˜ r,λ μ13rˆk+1 + (1 − μ13 ) f sat μ14 (ˆrk − rOc,i,k ) rOc,i,k , (3.42f) Next, we modify (3.41) to meet the fourth design objective, namely, constraining the UAV’s pitch and roll angles. Employing a model predictive control framework to minimize (3.41), in this chapter the UAV’s reference trajectories and the corresponding control inputs are computed iteratively at each time step over the discrete time horizon {iΔT, . . . , n t νs,k ΔT }, i ∈ {0, . . . , n t νs,k − 1}, as minimizers of J [iΔT, rk (iΔT ), ψk (iΔT ), λk (·)] k+νs,k −1
l−k νdis f (rk (n t ΔT ), ψk (n t ΔT )) +
l=k
n
t −1
( jΔT, rk ( jΔT ), ψk ( jΔT ), λk ( jΔT ))
j=i k+νs,k −1 n t −1
+
l=k+1
i=0
l−k νdis (iΔT, rl (iΔT ), ψl (iΔT ), λl (iΔT )),
i ∈ {0, . . . , n t νs,k − 1},
where
k ∈ {0, . . . , n w − 1},
(3.43)
68
J. A. Marshall et al.
Table 3.2 Parameters needed to define the boundary conditions for (3.13) for k ∈ {0, . . . , n w − 1}
i =0
rinit,k r¨init,k rend,k r¨end,k rinit,k r¨init,k rend,k r¨end,k
i ∈ {1, . . . , n t νs,k − 1}
= rk (n t νs,k ΔT ), = r¨k (n t νs,k ΔT ), = rˆk+νs,k , = aˆ k+νs,k = rk (iΔT ), = r¨k (iΔT ), = rˆk+νs,k , = aˆ k+νs,k
−1 ˆ jΔT, rk , ψk , λk ), ( jΔT, rk , ψk , λk ) gbarrier,k ( jΔT )(
( j, rk , ψk , λk ) ∈ {i, . . . , n t − 1} × R3 × R × R4 , gbarrier,k ( jΔT ) =
2 φmax
·
− φk2 ( jΔT )
4 ' p=1
2 θmax
− θk2 ( jΔT )
T p,k ( jΔT ) − Tmin
4 '
Tmax − T p,k ( jΔT ) ,
(3.44)
(3.45)
p=1
φmax ∈ 0, π2 and θmax ∈ 0, π2 are user-defined. The role of the penalty function gbarrier,k (·) is that if φk ( jΔT ) ∈ (−φmax , φmax ), θk ( jΔT ) ∈ (−θmax , θmax ), T p,k ( jΔT ) > Tmin , p = 1, . . . , 4, and T p,k ( jΔT ) < Tmax , then the model predictive control algorithm underlying the proposed trajectory planning subsystem penalizes n t νs,k , k ∈ {0, . . . , n w }, that all admissible solutions {rk ( jΔT ), ψk ( jΔT ), λk ( jΔT )} j=i n t νs,k induce a decreasing subsequence of gbarrier,k ( jΔT ) j=i . The second term on the right-hand side of (3.43) includes a summation that recedes as the UAV approaches n t ΔT , and a fixed summation for path segments l > k. It follows from (3.13), (3.17), and (3.21) that the UAV’s roll and pitch angles are not part of the state vector χk (·), k ∈ {0, . . . , n w − 1}, and their dynamics are T unobservable in the measured output rkT (·), ψk (·) . Therefore, both φk ( jΔT ), j ∈ {i, . . . , n t νs,k }, k ∈ {0, . . . , n w − 1}, and θk ( jΔT ) are deduced from (3.9c) and (3.9d) with control input given by (3.17) and λk ( jΔT ) given by the model predictive control law. It is also worthwhile to note that constraints on φk (·), k ∈ {0, . . . , n w − 1}, and θk (·) are imposed through the barrier function gbarrier (·) Lagrangian function. Similarly, it follows from (3.20), (3.23), and (3.8) that each propeller’s thrust force T p,k (·) is a nonlinear, time-varying function of λk (·) and hence, constraints on the positivity of each propeller’s thrust force must be imposed through gbarrier (·). The user-defined parameter Tmin > 0 captures the fact that the majority of commercial-off-the-shelf motors and propellers for quadcopters do not allow to produce arbitrarily small thrust force. The user-defined parameter Tmax > 0 captures the saturation constraint for each propeller.
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
69
3.4.7 Boundary Conditions The boundary conditions on the UAV’s translational dynamic equations (3.13) or, alternatively, (3.21) and (3.22), are given in Table 3.2. The conditions on the UAV’s initial position and acceleration namely, rinit,k and r¨init,k , ensure that the UAV’s reference position and acceleration are equal to the UAV’s position and acceleration when the MPC algorithm is initialized, respectively. The conditions on the UAV’s final position, namely, rend,k , ensures that the reference trajectory traverses the waypoint rˆk+νs,k , k ∈ {0, . . . , n w − 1}. The condition on the UAV’s final acceleration are defined so that aˆ k+νs,k aˆ d2 (ˆrk+νs,k , O) dˆk ,
k ∈ {0, . . . , n w − 1},
(3.46)
where
a(α) ˆ
⎧ ⎪ ⎨0,
μ17 (α ⎪ μ16 −μ15
⎩ μ17 ,
dˆk μ18
− μ15 ),
if α ∈ [0, μ15 ], if α ∈ [μ15 , μ16 ], otherwise,
rˆk+νs,k +1 − rˆk+νs,k rˆk+νs,k − rˆk+νs,k −1 + (1 − μ18 ) , ˆrk+νs,k − rˆk+νs,k −1 ˆrk+νs,k +1 − rˆk+νs,k
(3.47a)
(3.47b)
and μ15 , μ16 , μ17 > 0 and μ18 ∈ [0, 1] are user-defined. If d2 (ˆrk+νs,k , O) ≤ μ15 , k ∈ {0, . . . , n w − 1}, that is, if the UAV is sheltered by an obstacle at a distance smaller than μ15 , then ¨rend,k = 0, since it is desirable for the UAV not to accelerate while operating in safer areas. Alternatively, if d2 (ˆrk+νs,k , O) > μ16 , k ∈ {0, . . . , n w − 1}, that is, if the UAV is too far from any obstacle to be considered sheltered, then ¨rend,k = μ17 , since it is desirable that the UAV rapidly traverses unsafer areas. Finally, if d2 (ˆrk+νs,k , O) ∈ [μ15 , μ16 ], k ∈ {0, . . . , n w − 1}, then ¨rend,k is set to increase as a linear function of d2 (ˆrk+νs,k , O) from 0 to μ17 . If μ18 = 1, then r¨end,k , k ∈ {0, . . . , n w − 1}, is set to point in the direction joining the waypoint rˆk+νs,k , which is set as the endpoint for the trajectory, and rˆk+νs,k −1 . Alternatively, if μ18 = 0, then r¨end,k , k ∈ {0, . . . , n w − 1}, is set to point in the direction joining the waypoint rˆk+νs,k , which is set as the endpoint for the trajectory, and rˆk+νs,k +1 . Setting μ18 = 1 allows to align the endpoint condition on the acceleration with the direction the reference trajectory would follow by interpolating all waypoints, that is, by setting νs,k = 1. This option may induce overly aggressive maneuvers in the presence of sharp turns in the reference path. Setting μ18 = 0 allows to align the endpoint condition on the acceleration with the direction the reference trajectory would follow by interpolating all waypoints after rˆk+νs,k has been reached. This option may induce less aggressive maneuvers in the presence of sharp turns in the reference path. For brevity, Table 3.2 does not include the initial conditions at the beginning of the mission and the endpoint conditions at the end of the mission. In this study, the UAV is designed to start and end its mission with zero acceleration.
70
J. A. Marshall et al.
3.4.8 Numerical Solution of the Trajectory Planning Problem In this chapter, the pair (χk ( jΔT ), λk ( jΔT )) is computed for each j ∈ {i, . . . , n t − 1}, i ∈ {0, . . . , n t − 1}, and for all k ∈ {0, . . . , n w − 1}, as minimizers of the cost function (3.43) subject to (3.22), (3.24), and (3.35) by applying the framework presented in [2]. A difference with the work presented in [2] is that in this work, the cost function underlying the trajectory planner is a function of time, since the roll angle φk (·), k ∈ {0, . . . , n w − 1}, the pitch angle θk (·), and the thrust force produced by the pth propeller T p,k (·), p = 1, . . . , 4, are not a part of the state vector χk (·). Thus, the quadratic programming algorithm used to solve the trajectory planning problem needs to be updated at each time step.
3.5 Numerical Simulations Results In this section, we present and analyze the results from two hardware-in-the-loop simulations aimed at showing the features and the applicability of the proposed guidance system. In particular, Sect. 3.5.1 below shows the results of the proposed system while requiring a reckless trajectory to be followed. Successively, Sect. 3.5.2 shows the results of the proposed system while requiring a tactical trajectory to be followed. The results of the simulations discussed in the following as well as the optimization framework presented thus far can be also found at [31].
3.5.1 Reckless Guidance System Simulation The results from the first simulation, in which the guidance system’s parameters are set to instill a reckless behavior, are shown in Fig. 3.18. Solid black lines indicate the boundaries of obstacles, the green star indicates the UAV’s initial position r0 (0) = [1.0, 1.0, 1.0]T m, the red line indicates the UAV’s position rk (iΔT ), the blue, purple, and green arrows indicate the orientation of the UAV’s roll, pitch, and yaw axes, respectively. The pink and yellow contour plot represents the projection of the explored set Vexplored onto the horizontal plane. Yellow shades indicate that a large percentage of voxels have been detected by the simulated camera, whereas pink shades indicate that a small percentage of voxels have been detected by the simulated camera. At the beginning of the mission, the simulated camera gathers information for T Algorithm 3.1, which generates the first goal point rˆP,0 ˜ = [4.6, 1.6, 1.8] m, a reference path and a reference trajectory are outlined, while avoiding the detected obstacles. Once the UAV rounds the corner near [7.0, 5.0, 1.0]T m, the first goal is detected at t = 19.86 s, triggering Algorithm 3.1 to execute and create a new goal point. This T process is iterated 8 times when the final goal point rˆP,8 ˜ = [13.8, 3.8, 1.0] m was
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
71
Fig. 3.18 Results from a flight simulation with a reckless parameter set. Solid black lines indicate the boundaries of obstacles, the green star indicates the UAV’s start point r0 (0), the red line indicates the UAV’s trajectory, the blue star indicates the UAV’s final goal point rˆP,8 ˜ . The filled contour plot indicates the sum of the explored voxels at that a given location in the horizontal plane. Bright yellow indicates that every voxel has been detected by the simulated camera, and bright pink indicates that no voxel has been detected by the simulated camera. The coverage mission is completed in 262.70 s after having explored 97.17% of the total number of voxels. On average, 1028 voxels per second are explored, which corresponds to 0.34% of the total number of voxels per second
detected at t = 262.70 s, and the mission is completed after having explored 97.17% of the total number of voxels. On average, 538.30 voxels per second are explored, which corresponds to 0.18% of the total number of voxels per second. It is apparent how the UAV’s trajectory traverses the open areas, and, hence, exposes the vehicle to potential threats. The average distance from the obstacles’ set is 1.97 m, and the standard deviation of the distance from the obstacles’ set is 1.22 m. The distance traveled by the UAV is 95.717 m, approximately 10 times the distance . The average flight speed of 0.40 ms with a standard deviation between r0 (0) and rˆP,12 ˜ m of 0.51 s . The vehicle’s maximum pitch angle θk (·) and maximum roll angle φk (·) were 8.39◦ and 8.82◦ , respectively, satisfying the constraints imposed by (3.45).
72
J. A. Marshall et al.
Fig. 3.19 Results from a flight simulation with a tactical parameter set. The coverage mission is completed in 216.12 s after having explored 96.65% of the total number of voxels. On average, 383.00 voxels per second are explored, which corresponds to 0.11% of the total number of voxels per second
3.5.2 Tactical Guidance System Simulation The results from the second simulation, in which the guidance system’s parameters are set to instill a tactical behavior, are shown in Fig. 3.19. As for the first simulation, the UAV starts at r0 (0) = [1.0, 1.0, 1.0]T m. Once the simulation starts, the simulated camera gathers information for Algorithm 3.1, which determines the first T goal point rˆP,0 ˜ = [15.0, 5.0, 4.6] m. The first goal point is on the second floor, and, due to the unknown nature of the environment, the tactical path aims to intersect the ceiling of the first floor. As the vehicle follows the path, the ceiling’s occupied voxels are detected, and the collision avoidance algorithm inhibits the vehicle from traveling any further. Thus, the UAV generates a new path, and, eventually, the staircase is discovered and used to reach the second floor. Once the vehicle reaches the second floor, the planned path coasts the walls nearby as it approaches the goal, enabling a tactical behavior. When the UAV rounds the corner of the obstacle near T [10.0, 5.0, 1.0]T m, the first goal point rˆP,0 ˜ = [8.6, 5.4, 0.6] m is detected by the simulated camera at t = 34.80 s, triggering Algorithm 3.1 to execute and create a new goal point. By the end of the simulation, 13 goal points are generated. The = [14.6, 16.0, 5.0]T m, is generated at t = 301.47 s, final goal point, namely, rˆP,12 ˜
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
73
and detected at t = 308.93 s. The percentage of the map explored is 96.51%, with approximately 3.30% being undetectable by the simulated camera sensor. The number of voxels explored per second is 383.00, which is approximately 0.11% of the total number of voxels in the map. Observing Fig. 3.19, it is apparent how the UAV’s trajectory exploits the obstacles’ set to seek shelter and reduce its exposure. An animation of this trajectory shows how the UAV travels more rapidly across areas away from the obstacles’ set. The average distance from the obstacles’ set is 1.22 m, which is 0.75 m less than the average distance from the obstacles’ set obtained in the reckless simulation, and the standard deviation of the distance from the obstacles’ set is 1.01 m. The tactical guidance system enabled the vehicle to exploit shelter in areas that are not densely packed with obstacles. The distance traveled by the UAV is 244.05 m, which is 148.33 m longer than the distance traveled by the UAV with a reckless behavior. This . The flight distance is approximately 11 times the distance between r0 (0) and rˆP,12 ˜ time is 216.12 s, 66.02 s longer than the reckless counterpart. This flight time yields an average flight speed of 0.28 ms with a standard deviation of 0.38 ms . The lower average flight speed can be attributed to the fact that, while coasting obstacles, the UAV is programmed to fly at a lower speed. The vehicle’s maximum pitch angle θk (·) and maximum roll angle φk (·) are 8.60◦ and 8.42◦ , respectively, satisfying the constraints imposed by (3.45).
3.6 Conclusion and Future Work This chapter presented the first guidance system for autonomous multi-rotor UAVs employed to cover unknown areas while implementing several tactics to minimize the risk of exposure to unknown, potential threats. These tactics include exploring obstacles to seek shelter, proceeding more slowly in safer areas, such as in proximity of obstacles, and accelerating, while traversing more hazardous areas. The extent these strategies need to be pursued can be tuned by user-defined parameters. This guidance system comprises an optimization-based path planner that is suitable for cluttered, dynamic environments. An original goal selection algorithm allows the proposed path planner to prioritize the need for a systematic coverage or the need for covering largely unexplored, according to the user’s preference. The proposed guidance system also includes an optimal control-based trajectory planner that interpolates the waypoints produced by the path planner, while accounting for the UAV’s nonlinear dynamics. User-defined constraints on the maximum pitch, roll, and yaw angles are imposed by means of inequality constraints and barrier functions. This trajectory planner also allows the user to choose how closely the reference path needs to be followed. Thus, the proposed guidance system allows the user to request reckless reference paths, which guarantee shorter distances and flight times, and reference trajectories that, while not following the reference path too closely, guarantee sufficiently high levels of cautiousness. An original algorithm, named the bubble bath
74
J. A. Marshall et al.
algorithm, allows to define convex constraint sets in voxel maps, which define the search space for the trajectory planner. In the near term, future work directions include verifying the computational advantage of the proposed path planner, which is based on the L P A∗ algorithm over alternative search algorithms such as A∗ . Additionally, the performance of the proposed Bubble Bath algorithm will be tested against state-of-the-art algorithms such as IRIS (Iterative Regional Inflation by Semidefinite programming) [26–28] and SFC (Safe Flight Corridors) [29]. Finally, flight tests will validate the proposed numerical simulation results. A key point in the proposed guidance system revolves around optimal trajectory planning, which, after having feedback-linearized the UAV’s equations of motion, has been cast as a linear–quadratic optimization problem to produce solutions in real time. The next chapter discusses an alternative approach to the optimal trajectory planning problem, which allows to account for nonlinear differential equations in the trajectory planning process. Acknowledgements This work was supported in part by DARPA under Grant no. D18AP00069.
References 1. Chien WY (2020) Stereo-camera occupancy grid mapping. Master’s thesis, Aerospace Engineering 2. Marshall JA, Anderson RB, Chien W-Y, Johnson EN, L’Afflitto A (2021) A guidance system for tactical autonomous unmanned aerial vehicles. J Intell & Robot Syst 103(4):1–36 3. Peng C, Isler V (2019) Adaptive view planning for aerial 3d reconstruction. In: International conference on robotics and automation. IEEE, pp 2981–2987 4. Zhou X, Yi Z, Liu Y, Huang K, Huang H (2020) Survey on path and view planning for UAVs. Virtual Real & Intell Hardw 2(1):56–69 5. Koenig S, Likhachev M, Furcy D (2004) Lifelong planning A∗ . Artif Intell 155(1):93–146 6. Koenig S, Likhachev M (2002) D ∗ lite. In: National conference on artificial intelligence, vol 15. AAAI, Alberta, Canada (2002), pp 476–483 7. Koenig S, Likhachev M (2005) Fast replanning for navigation in unknown terrain. Trans Robot 21(3):354–363 8. Cabreira TM, Brisolara LB, Ferreira PR Jr (2019) Survey on coverage path planning with unmanned aerial vehicles. Drones 3(1):4 9. Shakhatreh H, Sawalmeh AH, Al-Fuqaha A, Dou Z, Almaita E, Khalil I, Othman NS, Khreishah A, Guizani M (2019) Unmanned aerial vehicles (UAVs): a survey on civil applications and key research challenges. IEEE Access 7:48 572–48 634 10. Martz J, Al-Sabban W, Smith RN (2020) Survey of unmanned subterranean exploration, navigation, and localisation. IET Cyber-Syst Robot 2(1):1–13 11. Moysis L, Petavratzis E, Volos C, Nistazakis H, Stouboulos I, Valavanis K (2020) A chaotic path planning method for 3D area coverage using modified logistic map and a modulo tactic. In: International conference on unmanned aircraft systems. IEEE, pp 220–227 12. Wallar A, Plaku E, Sofge DA (2014) A planner for autonomous risk-sensitive coverage (PARCov) by a team of unmanned aerial vehicles. In: IEEE symposium on swarm intelligence. IEEE, pp 1–7 13. He P, Dai S (2013) Stealth coverage multi-path corridors planning for UAV fleet. In: International conference on mechatronic sciences, electric engineering and computer. IEEE, pp 2922–2926
3 Autonomous Multi-rotor Unmanned Aerial Vehicles for Tactical Coverage
75
14. Kreyszig E (1989) Introductory functional analysis with applications. Wiley, New York, NY 15. Bernstein DS (2009) Matrix mathematics: theory, facts, and formulas, 2nd edn. Princeton University Press, Princeton, NJ 16. Vasquez-Gomez JI, Gomez-Castaneda C, De Cote EM, Herrera-Lozada JC (2016) Multirotor UAV coverage planning under wind conditions. In: International conference on mechatronics, electronics and automotive engineering. IEEE, pp 32–37 17. Gramajo G, Shankar P (2017) An efficient energy constraint based UAV path planning for search and coverage. Int J Aerosp Eng 2017:1–13 18. Li T, Wang C, de Silva CW et al (2019) Coverage sampling planner for UAV-enabled environmental exploration and field mapping. In: International conference on intelligent robots and systems. IEEE, pp 2509–2516 19. Latombe J-C (2012) Robot motion planning, vol 14. Springer, Berlin, Germany 20. Amrite S (2021) A Taguchi-based approach to tune bio-inspired guidance system for tactical UAVs. Master’s thesis, Mechanical Engineering 21. L’Afflitto A, Anderson RB, Mohammadi K (2018) An introduction to nonlinear robust control for unmanned quadrotor aircraft. IEEE Control Syst Mag 38(3):102–121 22. L’Afflitto A (2017) A mathematical perspective on flight dynamics and control. Springer, London, UK 23. Isidori A (1995) Nonlinear control systems. Springer, New York, NY 24. Tsai JS-H, Huang C-C, Guo S-M, Shieh L-S (2011) Continuous to discrete model conversion for the system with a singular system matrix based on matrix sign function. Appl Math Modell 35(8):3893–3904 25. Kavan L, Kolingerova I, Zara J (2006) Fast approximation of convex hull. In: International conference on advances in computer science and technology, Anaheim, CA, pp 101–104 26. Alonso-Mora J, Baker S, Rus D (2017) Multi-robot formation control and object transport in dynamic environments via constrained optimization. Int J Robot Res 36(9):1000–1021 27. Deits R, Tedrake R (2015) Computing large convex regions of obstacle-free space through semidefinite programming. In: Algorithmic foundations of robotics XI. Springer, pp 109–124 28. Deits R, Tedrake R (2015) Efficient mixed-integer planning for UAVs in cluttered environments. In: International conference on robotics and automation. IEEE, pp 42–49 29. Liu S, Watterson M, Mohta K, Sun K, Bhattacharya S, Taylor CJ, Kumar V (2017) Planning dynamically feasible trajectories for quadrotors using safe flight corridors in 3-D complex environments. IEEE Robot Autom Lett 2(3):1688–1695 30. Boyd S, El Ghaoui L, Feron E, Balakrishnan V (1994) Linear matrix inequalities in system and control theory. SIAM, Philadelphia, PA 31. Marshall JA, Binder P, L’Afflitto A, Tactical coverage of environments using multi-rotor UAVs. YouTube. https://youtu.be/NjulQs-cTTo Julius A. Marshall received a B.S. degree in Aerospace Engineering from the University of Oklahoma in 2018. Since 2018, he has been a part of the Advanced Control Systems Lab under Dr. L’Afflitto’s advisement. He is pursuing a Ph.D. degree in Industrial and Systems Engineering at Virginia Tech, where he is specializing in robotics. His current research interests include robust control, optimal control, guidance and navigation, and adaptive control for autonomous shipboard landing. In 2019, he received the SMART Scholar Award. Paul Binder received a B.S. in Aerospace Engineering from the Pennsylvania State University in 2021. He joined the Advanced Control Systems Lab at Virginia Tech under the advisement of Dr. L’Afflitto in the fall of 2021. He is currently pursuing a Master’s degree in Aerospace Engineering at Virginia Tech, where he is specializing in autonomous aircraft. His current research interests include object detection, controls, and guidance and navigation.
76
J. A. Marshall et al.
Andrea L’Afflitto received a B.S. degree in Aerospace Engineering and an M.S. degree in Aerospace Engineering and Astronautics from the University of Napoli “Federico II,” Italy, in 2004 and 2006, respectively, an M.S. degree in Mathematics from Virginia Tech in 2010, and a Ph.D. in Aerospace Engineering from Georgia Tech in 2015. He is an Associate Professor with the Grado Department of Industrial and Systems Engineering at Virginia Tech and holds affiliate positions with the Department of Aerospace and Ocean Engineering, the Department of Mechanical Engineering, and the National Security Institute. His current research interests include nonlinear robust control, optimal control, and control of unmanned aerial systems. In 2018, he received the DARPA Young Faculty Award. He is a Senior Editor for the IEEE Transactions of Aerospace and Electronic Systems and is a member of the IEEE Aerospace Controls Technical Committee.
Chapter 4
An Improved Differential Dynamic Programming Approach for Computational Guidance Xiaobo Zheng, Shaoming He, and Defu Lin
4.1 Introduction Optimal guidance and its variants have been widely used for unmanned aerial vehicle (UAV) trajectory planning due to their systematicity, well-posedness of the design process, and ability to satisfy certain terminal constraints [1–4]. Optimal guidance laws bring in the philosophy of trajectory shaping by optimizing a meaningful cost function and meeting certain boundary conditions. With the increasing complexity of application scenarios, however, real-world guidance problems for UAV systems will be characterized by numerous practical constraints and highly time-varying, nonlinear dynamics. Therefore, classical closed-form optimal guidance laws, which rely on approximated models with linearization, unrealistic assumptions, and an offline design fashion, are no longer appealing to solve future real-world guidance problems. Indeed, for instance, the authors in Chap. 3 rely on a feedback-linearization baseline controller before applying their fast, optimal trajectory planning approach to the UAV’s dynamics. In recent years, thanks to the rapid development of embedded computational capability, there has been increasing attention to the development of computational guidance algorithms [5, 6]. Unlike classical optimal guidance laws, computational guidance algorithms generate the guidance command relying extensively on onboard computers and, therefore, do not require an analytic solution of specific guidance laws. In essence, the computational guidance problem can be considered as a nonX. Zheng · S. He (B) · D. Lin Beijing Institute of Technology, Beijing, China e-mail: [email protected] X. Zheng e-mail: [email protected] D. Lin e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 A. L’Afflitto et al. (eds.), Control of Autonomous Aerial Vehicles, Advances in Industrial Control, https://doi.org/10.1007/978-3-031-39767-7_4
77
78
X. Zheng et al.
linear constrained trajectory optimization problem [7–11], i.e., finding a time history of guidance commands that optimizes a performance metric while verifying certain nonlinear constraints. In general, finding analytical solutions to real-world trajectory optimization problems is intractable due to its inherent nonlinear nature. Over the past few years, many different kinds of numerical algorithms have been reported to solve nonlinear trajectory optimization problems, e.g., pseudospectral methods [12–14], sequential quadratic programming [15–17], sequential convex programming [18–20], and differential dynamic programming [21–23]. The authors in [8] provide a well-organized description of several numerical optimization algorithms, while highlighting their advantages and mathematical nature. In recent years, differential dynamic programming (DDP) has become a prevailing method in computational guidance for aerospace vehicles due to its fast convergence characteristics [24–27]. DDP origins from classical dynamic programming (DP) that leverages the Bellman optimality principle to recursively update the solution until convergence [28]. However, DDP overcomes some of the shortcomings of DP, i.e., the curse of dimensionality, by a second-order approximation of the value function to decompose the original problem into a succession of subproblems of smaller dimensions [29]. More specifically, DDP utilizes second-order Taylor series to approximate the value function in a neighborhood of the current solution and uses the first-order optimality condition to recursively update the optimal increment of the control sequence until it converges to the local optimum. This key feature of DDP allows linear complexity in the length of the prediction horizon and ensures theoretical second-order convergence rate [30], similar to Newton’s method. This means that DDP and its variants enable real-time applications, which is of paramount importance for fast-moving vehicles such as UAVs. Another interesting property that makes DDP an attractive method in the context of trajectory optimization is that it provides locally optimal feedback controllers, which can be helpful when the flight trajectory is perturbed away from the nominal one. The main issue of DDP lies in that it requires an initial setting of the final time and cannot directly handle nonlinear constraints. Consequently, the applicability of DDP is limited for autonomous vehicles employed in problems of practical interest. Indeed, on one hand, selecting a fixed final time is a heuristic process, and it is generally intractable to find the optimal final time by hand. On the other hand, autonomous vehicles are usually subject to physical limits, e.g., control saturation, and trajectory limits, e.g., no-fly zone. Hence, it is meaningful and necessary to revisit the original DDP to address these two major shortcomings. Unlike DDP, the treatment of free terminal time in other numerical optimization algorithms has been reported in some recent publications. For instance, the authors in [31] integrate the terminal time increment into the change of the state variables and formulate a composite cost function that comprises the terminal time and control increments for a model predictive static programming solver. The first-order optimality condition is then leveraged to derive the analytical expression of the terminal time increment to the final-time update. This concept was also utilized in [15] to solve the problem of optimal free-time obstacle avoidance constrained guidance using sequential quadratic programming. Driven by the ongoing needs in trajectory
4 An Improved Differential Dynamic Programming Approach …
79
optimization, there have also been several initial investigations on constrained DDP reported in the literature. Popular techniques in this front include quadratic programming [32, 33] and the augmented Lagrangian penalty function method [34–36]. However, the treatment in these works is not mathematically rigorous due to the lack of detailed derivation of the variation of the value function in consideration of the nonlinear constraints. Additionally, none of the existing works consider the problem of dynamically optimizing the final time in DDP implementation. Motivated by the aforementioned observations, this chapter revisits the original DDP and proposes a new algorithm, namely, the flexible final-time-constrained differential dynamic programming (FFT-CDDP). This algorithm utilizes a Taylor expansion on the value function with respect to the final time and uses the first-order optimality condition to obtain the optimal final-time variation in a neighborhood of the nominal trajectory at each iteration. The nonlinear state and control constraints are handled by the augmented Lagrangian method. A detailed mathematical proof shows how to modify the state-action value function. Hence, the proposed FFTCDDP algorithm provides strong potential in practical applications for autonomous vehicles. A three-dimensional constrained UAV guidance problem is solved by using the proposed FFT-CDDP algorithm. This problem encompasses all the main points of the development in this chapter, i.e., free final time and nonlinear no-fly zone constraints. The proposed method works well and reliably in solving the optimal guidance problem, including all the cases where the conventional GPOPS algorithm fails to provide optimal solutions. A comparative analysis demonstrates that the proposed algorithm provides better solutions with much higher computational efficiency.
4.2 Backgrounds and Preliminaries In this section, we first introduce a general trajectory optimization problem for autonomous vehicles and then review the original DDP method for completeness of the chapter.
4.2.1 General Trajectory Optimization Problem We assume that the dynamics of an autonomous vehicle are captured by the generic nonlinear differential equation x˙ = F(x, u), x (t0 ) = x 0 ,
(4.1)
where x ∈ Rn and u ∈ Rm denote the state and control vectors, respectively. The function F : Rn × Rm → Rn is a nonlinear and continuously differentiable in x and u. The state of the considered dynamical system at the initial time instant t0 is denoted as x 0 .
80
X. Zheng et al.
DDP and its variants are formulated in a discrete-time manner and, hence, the continuous-time system (4.1) needs to be discretized to allow the application of DDP. By using either the simple Euler discretization or the Runge–Kutta method, (4.1) can be discretized as (4.2) x k+1 = f (x k , uk ) , where k = {0, 1, . . . , N } denotes the discrete-time index, f is a nonlinear function that describes the system’s dynamics, and the variables with subscript k represent the corresponding value at time instant tk . A cost or objective function that has been utilized in trajectory optimization usually consists of a terminal constraint and a running cost, i.e., J
N −1
l(x k , uk ) + φ(x N , t N ),
(4.3)
k=0
where φ(x N , t N ) denotes the terminal constraint and l(x k , uk ) denotes the running cost. The final time t N ≥ t0 is assumed to be free in this chapter. Due to physical constraints, the control input of an autonomous vehicle is usually bounded by a component-wise inequality such as umin ≤ uk ≤ umax ,
(4.4)
where umin and umax ∈ Rm denote the lower and upper bounds of the control input, respectively. The trajectory of the vehicle is also subject to nonlinear path constraints, e.g., no-fly zones, which can be captured by g (x k , uk ) ≤ 0 p ,
(4.5)
where g : Rn × Rm → R p is a nonlinear function. In summary, the constrained trajectory optimization problem considered in this chapter, which is denoted as CTO1 , has the following form: CTO1 : Find N −1 u∗k = min l(x k , uk ) + φ(x N , t N ) (4.6) uk
k=0
subject to x k+1 = f (x k , uk ) , g (x k , uk ) ≤ 0 p ,
(4.7a) (4.7b)
umin ≤ uk ≤ umax , t N is free.
(4.7c) (4.7d)
4 An Improved Differential Dynamic Programming Approach …
81
It is assumed that problem CTO1 has a solution. However, finding a solution analytically is in general intractable due to the inherent nonlinearity, and, hence, numerical algorithms are usually leveraged to solve this problem.
4.2.2 Differential Dynamic Programming DDP is an approximate DP algorithm that has been well-recognized for solving nonlinear optimization problems. The original DDP cannot handle nonlinear constraints and requires a fixed final time in optimization. Hence, DDP solves a fixed-time constraint-free optimization problem, which is denoted as CTO2 . CTO2 : Find N −1 u∗k = min l(x k , uk ) + φ(x N ) (4.8) uk
k=0
subject to x k+1 = f (x k , uk ) , t N is fixed.
(4.9a) (4.9b)
Let U k = {uk , uk+1 , . . . , u N −1 } be a partial control sequence. The cost-to-go Jk at time instant tk is defined as Jk =
N −1
l (x i , ui ) + φ (x N ) .
(4.10)
i=k
DDP leverages the concept of value function V (x k ), which is defined as the optimal cost-to-go V (x k ) min Jk Uk
=
N −1
l x i , ui∗ + φ (x N ) .
(4.11)
i=k
The key idea of DDP is to convert the problem of optimizing the entire control sequence U 0 into the problem of optimizing a single control input uk backward sequentially. This principle is called the Bellman optimality condition, which can be mathematically formulated as V (x k ) min l (x k , uk ) + V (x k+1 ) . uk
(4.12)
82
X. Zheng et al.
Thus, define Q(x k , uk ) l (x k , uk ) + V (x k+1 ) as the state-action value function. Then, (4.12) can be rewritten as V (x k ) = min Q (x k , uk ) .
(4.13)
uk
The issue associated with (4.13) is that the system dynamics is generally nonlinear and, hence, finding optimal uk using (4.13) is intractable in practice. To address this problem, DDP leverages second-order Taylor series to approximate both the value function and the action-value function around a nominal trajectory defined by (x k , uk ), as 1 V (x k + δx k ) ≈ V (x k ) + VxT (x k )δx k + δx Tk Vx x (x k ) δx k , (4.14a) 2 T δx k T , Q Q Q (x k + δx k , uk + δuk ) ≈ Q (x k , uk ) + x u δuk T 1 δx k Q x x Q xu δx k , (4.14b) + Q ux Q uu δuk 2 δuk where the gradient vectors and Hessian matrices of the action-value function Q (x k , uk ) can be readily obtained by the definition of Q (x k , uk ) as Q x = l x (x k , uk ) + f xT (x k , uk ) Vx (x k+1 ) , Qu = Qxx =
l u (x k , uk ) + f uT (x k , uk ) Vx (x k+1 ) , l x x (x k , uk ) + f xT (x k , uk ) Vx x (x k+1 )
(4.15a) (4.15b) f x (x k , uk )
+ Vx (x k+1 ) f x x (x k , uk ) , Q uu = l uu (x k , uk ) +
f uT
(x k , uk ) Vx x (x k+1 ) f u (x k , uk ) + Vx (x k+1 ) f uu (x k , uk ) ,
Q ux = l ux (x k , uk ) +
f uT
(x k , uk ) Vx x (x k+1 ) f x (x k , uk ) + Vx (x k+1 ) f ux (x k , uk ) ,
(4.15c) (4.15d) (4.15e)
where (·) x denotes the partial derivative of (·) with respect to x. Note that computing the second-order derivatives of the dynamics, i.e., f x x , f xu , f uu , is computationally expensive and, therefore, DDP ignores these terms. In the following derivations, for brevity, we omit the independent variables inside the brackets of (4.15), and denote the next time step V ≡ V (x k+1 ). With this in mind, (4.15) can be further reduced to
4 An Improved Differential Dynamic Programming Approach …
83
Q x = l x + f xT Vx x ,
(4.16a)
l u + f uT Vx x , l x x + f xT Vx x l uu + f uT Vx x l xu + f xT Vx x l ux + f uT Vx x
(4.16b)
Qu = Qxx = Q uu = Q xu = Q ux =
fx,
(4.16c)
fu,
(4.16d)
fu,
(4.16e)
fx.
(4.16f)
Since (4.14b) is a quadratic function with respect to control perturbation δuk , the optimal control correction δu∗k can be derived by using the first-order optimality condition (4.17) ∇δuk Q (x k + δx k , uk + δuk ) = 0. Solving (4.17) gives the optimal solution δu∗k = −Q −1 uu (Q u + Q ux δx k ) ,
(4.18)
−1 which includes a feedforward term −Q −1 uu Q u and a feedback term −Q uu Q ux δx. Substituting the optimal control correction (4.18) back into (4.14b) and grouping the corresponding terms with the same order of δx yields
1 T −1 Q Q Qu, 2 u uu Vx (x k ) = Q x − Q xu Q −1 uu Q u , V (x k ) = Q (x k , uk ) −
Vx x (x k ) = Q x x −
Q xu Q −1 uu Q ux .
(4.19a) (4.19b) (4.19c)
Given an initial solution guess of the control sequence U 0 , DDP initializes the terminal value function V (x N ) as V (x N ) = φ(x N ) and runs a backward process from k = N to k = 0 to sequentially find the optimal control correction δuk . Hence, the one-step control update is given by uk = uk + δuk
ˆ k − x k − Q −1 = uk − Q −1 uu Q ux x uu Q u .
Once the backward pass is completed, a forward pass is triggered to find a new nominal trajectory by substituting the updated control sequence into the system dynamics as (4.20) x k+1 = f (x k , uk ) . DDP runs the backward pass and the forward pass in a recursive way until the convergence condition is met. A typical termination criterion is that the difference
84
X. Zheng et al.
of cost function between two iterations is smaller than a lower threshold, i.e., if |Ji − Ji−1 | < ,
(4.21)
where Ji denotes the cost function evaluated at the ith iteration of DDP and > 0 is a small constant, then we terminate the iteration process.
4.3 Flexible Final-Time-Constrained Differential Dynamic Programming This section presents the details of the proposed FFT-CDDP algorithm with rigorous mathematical derivation. We first introduce how to dynamically optimize the final time in a recursive way and then the nonlinear constraints are handled by the augmented Lagrangian approach.
4.3.1 Flexible Final-Time Differential Dynamic Programming This subsection revisits the original DDP into a free final-time framework without any constraints. The optimization problem considered in this subsection is denoted as CTO3 . CTO3 : Find N −1 u∗k = min l(x k , uk ) + φ(x N , t N ) (4.22) uk
k=0
subject to x k+1 = f (x k , uk ) , t N is free.
(4.23a) (4.23b)
Unlike problem CTO2 , where the final time t N is fixed, the variation of the terminal value function of problem CTO3 includes both the variation of the terminal state and the variation of the terminal time. The key idea to solve the free final-time problem is to update both the control history and the final time by determining their increments at every iteration. Similar to DDP, we utilize a second-order Taylor series to approximate the terminal value function around the nominal trajectory, denoted by (x N , t N ), as
4 An Improved Differential Dynamic Programming Approach …
V (x N + δx N , t N + δt N ) ≈ V (x N , t N ) + Vx N δx N + Vt N δt N 1 1 + δx TN Vx N x N δx N + Vt N t N δt N2 2 2 1 1 + δx TN Vx N t N δt N + δt N Vt N x N δx N . 2 2
85
(4.24)
Notice that the value function does not explicitly contain the final time t N . Hence, we leverage the “chain rule” to compute the gradients and Hessian of the terminal value function as ∂V d x N ∂ x N dt N = VxTN F(x N , u N ),
Vt N = Vx N t N = =
= Vt N t N = = = =
(4.25a)
VtTN x N
∂V ∂ ∂ x N ∂t N Vx N x N F(x N , u N ) + Vx N Fx N ,
∂V ∂ ∂t N ∂t N
∂V d x N ∂ ∂t N ∂ x N dt N
∂V d x N dxN ∂ ∂V ∂ + ∂t N ∂ x N dt N ∂ x N ∂t N dt N
∂V d x N ∂ ∂V ∂ + F(x N , u N ), ∂ x N ∂t N dt N ∂ x N ∂t N
(4.25b)
(4.25c)
where Fx N denotes the gradient of the state dynamics function at time instant t N . Note that updating u N does not have any significance since u N is not experienced by the system. Thus, we assume that u N = u N −1 during the implementation. Since x˙ N = F(x N , u N ) does not contain the terminal time t N explicitly, we have ∂ (F(x N , u N )) = 0. ∂t N Substituting (4.26) into (4.25c) gives Vt N t N
∂V d x N ∂ = ∂ x N ∂t N dt
∂V d x N d x N ∂ = ∂ x N ∂ x N dt dt
∂V d x N dx N ∂V ∂ dx N ∂ + = ∂xN ∂xN dt ∂xN ∂xN dt dt
(4.26)
86
X. Zheng et al.
= Vx N x N F(x N , u N ) + Vx N Fx N F(x N , u N ).
(4.27)
By the definition of the value function, we have V (x N , t N ) = φ(x N , t N ), Vx N (x N , t N ) = φ x N (x N , t N ),
(4.28a) (4.28b)
Vx N x N (x N , t N ) = φ x N x N (x N , t N ).
(4.28c)
The objective of this section is to find an optimal final time δt N∗ to exclude the requirement of manually setting a final time for the implementation of DDP. To this end, we define the variation of the terminal value function ΔV (x N , t N ) = V (x N + δx N , t N + δt N ) − V (x N , t N ). Per definition, the optimal final time δt N∗ satisfies the condition δt N∗ = minΔV (x N , t N ) . (4.29) δt N
Using (4.24), the variation of the terminal value function can be determined as 1 1 ΔV (x N , , t N ) = Vx N δx N + Vt N δt N + δx TN Vx N x N δx N + Vt N t N δt N2 2 2 1 T 1 + δx N Vx N t N δt N + δt N Vt N x N δx N , (4.30) 2 2 which is a quadratic function with respect to the final-time perturbation δt N and hence the optimal final time δt N∗ can be derived by using the first-order optimality condition as ∇δt N V (x N , t N ) = Vt N + δx N T Vx N t N + δt N Vt N t N = 0. Solving (4.31) for δt N gives the optimal final-time correction as δt N∗ = −Vt−1 Vt N + Vt N x N δx N , N tN
(4.31)
(4.32)
which indicates that the optimal final-time correction δt N∗ is composed of a feedforVt N and a feedback term −Vt−1 Vt N x N δx N . The terms Vt N , Vt N x N , ward term −Vt−1 N tN N tN and Vt N t N can be obtained by using (4.25a), (4.25b), and (4.31). Substituting the expression of δt N∗ , i.e., (4.32), back into the second-order expansion of V (x N , t N ), i.e., (4.24), and grouping the zero-order, first-order, and secondorder terms of δx N give 1 V (x N + δx N , t N + δt N ) ≈ V (x N , t N ) − Vt N Vt−1 Vt N N tN 2 T + Vx N − VtTN x N Vt−1 Vt N δx N N tN 1 + δx TN Vx N x N − VtTN x N Vt−1 Vt N x N δx N , N tN 2
(4.33)
4 An Improved Differential Dynamic Programming Approach …
87
which subsequently give the updated gradient and Hessian of the value function at the time instant t N as Vx (x N ) = VxTN − VtTN x N Vt−1 Vt N , N tN
(4.34a)
Vt N x N . Vx x (x N ) = Vx N x N − VtTN x N Vt−1 N tN
(4.34b)
Notice that the action-value function at the time instant t N −1 is given by Q (x N −1 , u N −1 ) = l (x N −1 , u N −1 ) + V (x N , t N ).
(4.35)
Then, substituting (4.34) into (4.16) gives the gradients and Hessian matrices of the action-value function at time instant t N −1 as T Q x,N −1 = l x,N −1 + f x,N −1 Vx (x N ) ,
Q u,N −1 = Q x x,N −1 = Q uu,N −1 =
T l u,N −1 + f u,N −1 Vx (x N ) , T l x x,N −1 + f x,N −1 Vx x (x N ) T l uu,N −1 + f u,N −1 Vx x (x N )
Q xu,N −1 = l xu,N −1 + Q ux,N −1 = l ux,N −1 +
T f x,N −1 Vx x T f u,N −1 Vx x
(4.36a) (4.36b) f x,N −1 ,
(4.36c)
f u,N −1 ,
(4.36d)
(x N ) f u,N −1 ,
(4.36e)
(x N ) f x,N −1 ;
(4.36f)
the terms with subscripts N − 1 and N are evaluated at the time instant t N −1 and t N , respectively. Since the update of the final time only influences the action-value function at time instant t N −1 , the gradients and Hessian matrices of the action-value function at other time instants are identical as (4.16). Remark 4.1 With a poor initial guess of the final time, DDP might diverge from the local optimum. To ensure numerical stability, we limit the update of the final time within a bounded region as t N = min{max{t N + δt N∗ , tmin }, tmax },
(4.37)
where tmin > 0 and tmax > 0 capture the lower and the upper bounds of the final time, respectively. Remark 4.2 With the variation of the final time t N , we adjust the maximum value of the discrete-time instant, i.e., N , accordingly as N = N + δN , δ N = δt N∗ /Δt , (4.38) where Δt denotes the sampling step during the discretization process. If δ N ≤ 0, then we simply truncate the nominal trajectory. If δ N > 0, then we expand the nominal trajectory as x N +k = f (x N +k−1 , u N +k−1 ) , where u N +k−1 = 0m is utilized for simplicity.
k = 1, 2, · · · , δ N ,
(4.39)
88
X. Zheng et al.
4.3.2 FFT-CDDP Algorithm This subsection extends the flexible final-time DDP to consider nonlinear state and control constraints, thereby solving problem CTO1 . We first utilize the Lagrangian multiplier method [34, 37, 38] to cope with the path constraints g (x k , uk ). Successively, we use a control projection approach to handle the control limit constraint umin ≤ uk ≤ umax . By augmenting the inequality constraint into the original cost function (4.3), we can subsequently obtain the augmented cost function J=
N −1
l (x k , uk ) + φ(x N , t N ) +
k=0
+
N −1
λTk g (x k , uk )
k=0
1 2
N −1
g T (x k , uk ) Γ k g (x k , uk ) ,
(4.40)
k=0
where λk and Γ k denote the Lagrange multiplier vector and a diagonal penalty j j matrix, respectively. The jth elements of λk and Γ k , denoted as λk and μk , are such that
> 0, g j (x k , uk ) > 0, j λk = (4.41a) 0, otherwise,
> 0, g j (x k , uk ) > 0, j (4.41b) μk = 0, otherwise, where g j (x k , uk ) denotes the ith element of g (x k , uk ). With the augmented cost function (4.40), the value function and action-value function are modified as Vˆ (x k ) =
N −1 lˆ x i , ui∗ + φ(x N , t N ),
(4.42)
i=k
Qˆ (x k , uk ) = lˆ (x k , uk ) + Vˆ (x k+1 ) ,
(4.43)
where lˆ (x k , uk ) is defined as 1 lˆ (x k , uk ) = l (x k , uk ) + λTk g (x k , uk ) + g T (x k , uk ) Γ k g (x k , uk ) . 2
(4.44)
4 An Improved Differential Dynamic Programming Approach …
89
Similar to the original DDP, we consider a small perturbation around the nominal trajectory and use a second-order Taylor series to approximate the value function as well as the action-value function as 1 Vˆ (x k + δx k ) ≈ Vˆ (x k ) + Vˆ x (x k )T δx k + δx Tk Vˆ x x (x k ) δx k , (4.45a) 2 T T δx k Qˆ (x k + δx k , uk + δuk ) ≈ Qˆ (x k , uk ) + Qˆ x Qˆ u δuk T 1 δx k Qˆ x x Qˆ xu δx k , (4.45b) + δuk Qˆ ux Qˆ uu 2 δuk where the gradients and Hessian matrices of the modified action-value function can be readily obtained by definition as Qˆ x = lˆx + f xT Vˆ x x ,
(4.46a)
Qˆ u =
(4.46b)
Qˆ x x = Qˆ uu = Qˆ xu = Qˆ ux =
lˆu + f uT Vˆ x x , lˆx x + f xT Vˆ x x lˆuu + f uT Vˆ x x lˆxu + f xT Vˆ x x lˆux + f uT Vˆ x x
fx,
(4.46c)
fu,
(4.46d)
fu,
(4.46e)
fx,
(4.46f)
where we omit the independent variables in the brackets for notational simplicity. Notice that, since Vˆ (x N ) = V (x N ) = φ(x N , t N ), we can utilize the results presented in the previous section to update final time t N and value function Vˆ (x N ). Similar to (4.36), the updated value function at time instant t N is then leveraged to calculate the gradients and Hessian matrices of modified original action-value function at the time instant t N −1 as T ˆ Qˆ x,N −1 = lˆx,N −1 + f x,N −1 Vx (x N ) ,
(4.47a)
T ˆ Qˆ u,N −1 = lˆu,N −1 + f u,N −1 Vx (x N ) ,
(4.47b)
Qˆ x x,N −1 = lˆx x,N −1 +
T ˆ f x,N −1 Vx x T f u,N −1 Vˆ x x
(x N ) f x,N −1 ,
(4.47c)
(x N ) f u,N −1 ,
(4.47d)
T ˆ Qˆ xu,N −1 = lˆxu,N −1 + f x,N −1 Vx x (x N ) f u,N −1 ,
(4.47e)
T ˆ Qˆ ux,N −1 = lˆux,N −1 + f u,N −1 Vx x (x N ) f x,N −1 .
(4.47f)
Qˆ uu,N −1 = lˆuu,N −1 +
Replacing the gradients and Hessian matrices of the original action-value function in (4.20) with their modifications derived in (4.46) gives the optimal control correction as
90
X. Zheng et al.
uk = uk + δuk
ˆ ˆ ˆ k − x k − Qˆ −1 = uk − Qˆ −1 uu Q ux x uu Q u .
(4.48)
To handle the constraints on the control input (4.7c), we project the controls into a feasible box so that uk = min(max(uk + δuk , umin ), umax ).
(4.49)
Similarly, the gradients and Hessian matrices of the original action-value function in (4.19) should be replaced with their modifications derived in (4.46), which yields the update of the modified value function and its derivatives as 1 ˆ Vˆ (x k ) = Qˆ (x k , uk ) − Qˆ Tu Qˆ −1 uu Q u , 2 ˆ Vˆ x (x k ) = Qˆ x − Qˆ xu Qˆ −1 uu Q u , ˆ Vˆ x x (x k ) = Qˆ x x − Qˆ xu Qˆ −1 uu Q ux .
(4.50a) (4.50b) (4.50c)
The results presented thus far are summarized by Algorithm 4.1. According to this algorithm, the backward process of the FFT-CDDP is split into two parts, namely, a solution update at the time instant t N −1 and other time instants. The difference between these two parts follows from the update of the final time t N . The algorithm is terminated at the ith iteration once |Ji − Ji−1 | ≤ ,
(4.51)
where denotes a small positive constant. Remark 4.3 The proposed FFT-CDDP algorithm requires to invert the matrix Vt N t N and Qˆ uu , as shown in (4.34) and (4.48). To guarantee numerical stability, one can leverage the regularization technique to ensure the positive-definiteness of both Vt N t N and Qˆ uu as Vt N t N = Vt N t N + ρv , Qˆ uu = Qˆ uu + ρq Im ,
(4.52) (4.53)
where ρv and ρq are small positive regularization constants. Remark 4.4 As in typical applications of the Lagrangian multiplier, we gradually increase the constraint penalty to ensure that the solution converges to the optimal solution, located in the feasible region governed by the path constraints. Therefore, we update the jth element of Γ k at each iteration as j
j
j
μk,i = αμk,i−1 , μk,0 = μ,
(4.54)
4 An Improved Differential Dynamic Programming Approach … j
91
j
where μk,i denotes the value of μk at the ith iteration of FFT-CDDP and α > 1 quantifies the updating rate. Equation (4.54) is only utilized when the jth constraint j is violated. Otherwise, we set μk,i = 0. Remark 4.5 The proposed FFT-CDDP algorithm utilizes the information from the dynamic model. In practical applications, the vehicle dynamics might deviate from the simulator model due to the existence of model uncertainties. To mitigate the issue of discrepancies between the simulation and the real world, we can model the physical and path constraints as safety constraints by a set of probabilistic chance constraints [39, 40]. Extension of the proposed FFT-CDDP to the problems with chance constraints is subject to our future study.
Algorithm 4.1 Flexible final-time-constrained differential dynamic programming 1: Initialize Γ k , λk , U 0 , t N 2: for k = 0, ..., N − 1 do 3: x k+1 = f (x k , uk ) 4: end for 5: J0 ← (4.40) 6: while |Ji − Ji−1 | ≥ do 7: Vˆ N = φ(x N , t N ), Vˆ x N = φ x (x N , t N ), Vˆ x N x N = φ xx (x N , t N ) 8: Vˆt N ← (4.25a), Vˆt N x N ← (4.25b), Vˆt N t N ← (4.3.1) 9: for k = N − 1, ..., 0 do 10: if k = N − 1 then 11: Vˆ x,N , Vˆ x,N ← (4.34) 12: Calculate the gradients and Hessian matrices of Qˆ (x N −1 , u N −1 ) using Eq. (4.47) 13: else 14: Calculate the gradients and Hessian matrices of Qˆ (x k , uk ) using Eq. (4.46) 15: end if 16: uk ← (4.48) 17: uk = min(max(uk + δuk , umin ), umax ) 18: Update of the modified value function and its derivatives using Eq. (4.50) 19: end for j j 20: μk,i = αμk,i−1 21: for k = 0, ..., N − 1 do 22: x k+1 = f (x k , uk ) 23: end for 24: δt N ← (4.32) 25: t N = min(max(t N + δt N , tmin ), tmax ) 26: δ N = δt N∗ /Δt 27: N = N + δ N 28: if δ N > 0 then 29: for k = 1, ..., δ N do 30: u N +k−1 = 0m 31: xˆ N +k = f (x N +k−1 , u N +k−1 ) 32: end for 33: else 34: Truncate the nominal trajectory 35: end if 36: end while
92
X. Zheng et al.
Table 4.1 Conditions of the considered scenario Parameter Value UAV initial position [x M0 , y M0 , z M0 ]T UAV initial velocity VM0 UAV initial flight path angle γ0 UAV initial azimuth angle ψ0 Target position [x T , yT , z T ]T Desired terminal flight path angle γd Reference area S Drag coefficient Cd Acceleration bound amax Center of the no-fly cylinder zone [xc , yc ]T Radius of the no-fly cylinder zone r Bounds of time length of the mission duration [tmin , tmax ]
[5000, 5000, 5000]T m 300 m/s 0◦ 180◦ [0, 0, 0]T m −60◦ 0.0325 m2 0.0169 30 m/s2 [1500, 2000]T m and [2800, 4800]T m 500 m [15, 55] m
4.4 Illustrative Numerical Example To verify the effectiveness of the proposed FFT-CDDP algorithm, a three-dimensional arrival-angle-constrained UAV guidance problem with path constraints and control limits is studied in this section. A comparison with GPOPS [41], an off-the-shelf tool, is also presented to demonstrate the advantage of the proposed approach. All simulations are conducted in a personal computer environment with an i7-8570H processor and 8 GB RAM memory.
4.4.1 Mission Scenario We consider the UAV’s translational dynamics and neglect any delays in the autopilot. Then, the three-dimensional kinematical model of the UAV can be described as x˙U = VU cos ψ cos γ,
(4.55a)
y˙U = VU sin ψ cos γ, z˙ U = VU sin γ, D V˙U = − − g sin γ, m az + g cos γ γ˙ = − , VU ay , ψ˙ = VU cos γ
(4.55b) (4.55c) (4.55d) (4.55e) (4.55f)
4 An Improved Differential Dynamic Programming Approach …
93
where [xU , yU , zU ]T denotes the three-dimensional position vector of the UAV in the inertial reference frame. The flight path angle and the azimuth angle are denoted by γ and ψ, respectively [27]. The magnitude of the UAV’s velocity is denoted by VU , and g and m denote the gravitational acceleration and the mass of the UAV, respectively. The lateral accelerations in the horizontal and vertical planes are captured by a y and az , respectively. The aerodynamic drag is modeled as D=
1 2 ρV SCd , 2 U
(4.56)
where S denotes the reference area and Cd denotes the drag coefficient; the model of air density ρ is deduced from [42]. Since the lateral accelerations are limited by physical constraints, we assume that a y and az are bounded by a y ≤ amax , |az | ≤ amax ,
(4.57)
where amax > 0 is a constant, determining the maneuverability of the UAV. The UAV is required to arrive at a target position, located at [x T , yT , z T ]T with desired arrival angles γd and ψd . The nonlinear path constraints in the considered application example are two cylindrical no-fly zones. The cylindrical constraints only affect the UAV trajectory in the x − y-plane and can be mathematically expressed as (4.58) (xU − xc )2 + (yU − yc )2 ≤ r 2 , where [xc , yc ]T denotes the position of the center coordinate of the cylinder in the x − y-plane, and r denotes its radius. We also assume that the time of the mission is constrained by tmin ≤ t N ≤ tmax . The conditions of the considered scenario are summarized in Table 4.1.
4.4.2 Simulation Setup The state and input vectors of this problem are defined as x = [xU , yU , zU , VU , γ, ψ]T , T u = a y , az .
(4.59a) (4.59b)
Euler’s method is used to discretize the dynamical model, and a quadratic cost function is defined as
94
X. Zheng et al.
Table 4.2 Design parameters of the FFT-CDDP algorithm ρv
ρq
α
μ
λ
R
Ws
WN
0
0
1.1
1
0.004
10−4
0.1I2
diag(2 × 10−5 I3 , 0, 10−7 I2 )
diag(400I3 , 0, 0.0005I2 )
J=
1 (x N − x d )T W N (x N − x d ) 2 N −1 1 T 1 T u Ruk + (x k − x d ) Ws (x k − x d ) , + 2 k 2 k=0
(4.60)
where x d = [x T , yT , z T , free, γd , ψd ]T denotes the desired terminal state vector, and W N , R, and Ws are weighting matrices of the cost function. We note from (4.60) that the running cost includes an energy penalty term and a trajectory shaping term. To implement the proposed FFT-CDDP, we leverage a constant Lagrangian mulj tiplier vector, i.e., the jth element of λk is defined as λk = λ with λ being a positive j constant if the jth no-fly zone constraint is violated. Otherwise, we set λk = 0. The parameters required to implement the proposed algorithm are summarized in Table 4.2. Remark 4.6 In order to solve the guidance problem formulated using FFT-CDDP, we have two different classes of controlling parameters to be tuned, namely, algorithm related and problem related. For algorithm-related parameters, we need to tune ρv , ρq , α, μ, and . The first two of these parameters are used to ensure the positiveness of two matrices. They can be tuned by applying the Levenberg–Marquardt approach. The parameters α and μ are leveraged to ensure the satisfaction of the path constraints. According to (4.54), we can fix μ and tune α only. Typically, bigger values of α provide faster convergence speed but might also generate numerical instability. Hence, tuning α is a trade-off between convergence speed and numerical stability. Finally, determines the optimization performance and computational efficiency: a smaller value ensures accurate convergence performance but a higher computational burden. For problem-related parameters (the UAV guidance problem considered in the chapter), we need to tune W N , R, and Ws . The first matrix is related to the terminal constraint and the other two matrices are used to shape the running cost. We first fix the energy weighting matrix R and then tune W N and Ws . Since we aim to control the UAV to arrive at the target position with hard terminal constraint, the magnitudes of the elements of matrix W N are much bigger than those of matrix Ws . Notice that the weighting matrices W N and Ws are tuned by considering the scale difference of the state variables.
4 An Improved Differential Dynamic Programming Approach …
95
4.4.3 Performance of FFT-CDDP Under Various Conditions This subsection evaluates the performance of the proposed FFT-CDDP algorithm in the considered UAV guidance application scenario with different desired terminal azimuth angles ψd = 225◦ , 240◦ , 260◦ . The nominal trajectory is initialized with an uncontrolled trajectory, i.e., the initial guess with a y = az = 0, and the final time is initialized as t N = 35s in all simulations. The simulation results, including the UAV trajectory, the convergence of the final time, the time history of flight path angles, and the commanded acceleration, are presented in Fig. 4.1. The results indicate that the proposed FFT-CDDP algorithm allows to optimize the trajectory that enables accurate terminal position and terminal arrival angle control. The UAV trajectory results also reveal that the proposed algorithm is able to satisfy the no-fly zone constraint and the UAV trajectory is very close to the edge of the no-fly zone. This can be attributed to the fact that cost function (4.60) penalizes the energy consumption. The recorded time durations of three different cases are 28.18s, 27.4s, and 27.8s, respectively. It follows from Fig. 4.1d
(a) UAV trajectory
(b) Convergence of the final time
0
40
-20
20
-40
0
-60
-20 0
5
10
15
20
25
30
260
0
5
10
15
20
25
30
0
5
10
15
20
25
30
10
240
5
220 0
200 180
-5 0
5
10
15
20
25
30
(c) Time history of flight path angles Fig. 4.1 Simulation results of different cases
(d) Commanded acceleration
96
X. Zheng et al.
that the generated control commands are smooth and within the upper bound during the flight. The acceleration command in the horizontal plane during the initial phase leverages the maximum allowable maneuverability to avoid obstacles. This provides unique information on how the proposed algorithm guides the UAV to arrive at the target position without collision with obstacles. Considering the fact the FFT-CDDP algorithm is initialized with an uncontrolled trajectory, we can conclude that the proposed algorithm is robust against the initial solution guess.
4.4.4 Comparison with Traditional Optimization Methods In order to demonstrate the advantage of the proposed FFT-CDDP algorithm, this subsection provides a comparative analysis with the off-the-shelf software GPOPS [41]. The simulation scenario is the same as the previous subsection with ψd = 225◦ . We consider two different kinds of initial solutions in the simulations. One solution is the uncontrolled trajectory with t N = 35 s, and the other solution is the initial guess generated by proportional navigation guidance (PNG) [43] with t N = 25.6 s. In order to ensure the fairness of the comparison, we utilize almost the same termination conditions for the two methods, which are listed in Table 4.3. Notice that the termination condition for the proposed algorithm is more strict due to fewer maximum allowable iterations. The comparison results with two different initial solution guesses are presented in Figs. 4.2 and 4.3. The results indicate both algorithms generate feasible solutions that satisfy both path and terminal constraints for both scenarios. However, GPOPS generates a solution that requires a much higher energy consumption with an uncontrolled initial solution guess, compared with the proposed algorithm. This means that the performance of GPOPS is sensitive to the initial solution guess. As a comparison, the performance of the proposed FFT-CDDP algorithm is consistent across different initial solution guesses. The detailed comparison results in terms of computational efficiency and optimization performance are summarized in Table 4.4. From this table, we observe that the proposed FFT-CDDP algorithm provides a smaller value of the cost function than GPOPS for the scenario with PNG initial solution guess. In addition, the FFT-CDDP algorithm generates the solution much faster than GPOPS. Hence, the proposed approach provides a promising way to solve constrained trajectory optimization problems with flexible final time.
Table 4.3 Termination conditions of the two methods Metric Proposed Stop tolerance 10−4 Maximum allowable iterations 300
GPOPS 10−4 500
4 An Improved Differential Dynamic Programming Approach …
97
Fig. 4.2 Comparison results for the scenario with uncontrolled initial guess
(a) UAV trajectory 0 -20 -40 -60 0
5
10
15
20
25
30
0
5
10
15
20
25
30
250
200
150
(b) Time history of flight path angles 40 20 0 -20 -40 0
5
10
15
20
25
30
0
5
10
15
20
25
30
40 20 0 -20 -40
(c) Commanded acceleration
98
X. Zheng et al.
Fig. 4.3 Comparison results for the scenario with PNG initial guess
(a) UAV trajectory 0 -20 -40 -60 0
5
10
15
20
25
30
0
5
10
15
20
25
30
260 240 220 200 180
(b) Time history of flight path angles 40 20 0 -20 -40 0
5
10
15
20
25
30
0
5
10
15
20
25
30
30 20 10 0 -10
(c) Commanded acceleration
4 An Improved Differential Dynamic Programming Approach … Table 4.4 Comparison results of FFT-CDDP and GPOPS Method Initial guess Terminal time FFT-CDDP
GPOPS
Uncontrolled trajectory with t N = 35 s PNG with t N = 25.6 s Uncontrolled trajectory with t N = 35 s PNG with t N = 25.6 s
99
Cost
Computational time
28.18 s
618.39
1.20 s
28.18 s 28.94 s
861.617 22948.7
1.24 s 26.28 s
26.06 s
16614.4
27.0 s
4.4.5 Monte Carlo Simulation Study To better illustrate the robustness of the proposed FFT-CDDP algorithm against initial solution guess, 500 Monte Carlo simulations are performed with random initial solution guesses. The simulation settings are the same as the previous subsection with ψd = 225◦ . The acceleration commands of the initial solution guess at every time instants are randomly sampled from a uniform distribution, ranging from the minimum and maximum available control inputs, e.g., u ∼ U (umin , umax ). The distribution of the number of DDP iterations is presented in Fig. 4.4a. According to the recorded results, the solutions of 403 simulations converge to the optimal result within 83 iterations. This verifies the robustness of the proposed FFT-CDDP algorithm against a random initial solution guess, given the fixed termination condition. We also observe that the proposed solver generates a result that converges to the optimal solution in about 1.2 s for these 403 random sample runs. This also demonstrates the computational efficiency of the proposed algorithm. The distribution of the optimal final time is shown in Fig. 4.4b, and one can note that the resulting final time is 28.18 s, which is consistent with the results obtained in the previous section. Notice that GPOPS is unable to proceed in scenarios with random initial solution samples and, hence, the Monte Carlo results of GPOPS solutions are omitted here.
4.5 Conclusion and Future Work This chapter proposes a new FFT-CDDP algorithm to solve the constrained trajectory optimization problem with flexible final time for autonomous vehicles. The proposed method revisits the original DDP by dynamically adjusting the final time and actively handling nonlinear path constraints as well as control limits. Detailed mathematical derivation of the optimal correction is presented to support the implementation. The strengths and effectiveness of the proposed method are demonstrated in a three-dimensional arrival-angle-control UAV guidance problem. A comparison with traditional optimization methods reveals that the proposed algorithm provides better performance in terms of trajectory optimization and computational efficiency.
100
X. Zheng et al.
(a) Distribution of the number of DDP iterations
(b) Distribution of the optimal terminal time Fig. 4.4 Monte Carlo simulation results
The proposed method can be applied to a wider range of applications rather than a certain type of autonomous vehicle or just the aerospace field. Therefore, the scope of application of FFT-CDDP is not limited to the presented example but to a class of constrained trajectory optimization problems that require the adjustment of the final time. Future research includes extending the proposed algorithm into a distributed manner to solve the swarm optimization problem. Whereas this chapter addressed a solution to optimal guidance problems in general, the next chapter discusses a highly relevant open-loop optimal control problem.
4 An Improved Differential Dynamic Programming Approach …
101
Specifically, the next chapter addresses the problem of assuring maximum endurance for multiple kinds of aircraft, including all-electric ones, within a unified framework.
References 1. Ryoo C-K, Cho H, Tahk M-J (2005) Optimal guidance laws with terminal impact angle constraint. J Guidance Control Dyn 28(4):724–732 2. Ryoo C-K, Cho H, Tahk M-J (2006) Time-to-go weighted optimal guidance with impact angle constraints. IEEE Trans Control Syst Technol 14(3):483–492 3. He S, Lee C-H (2018) Optimality of error dynamics in missile guidance problems. J Guidance Control Dyn 41(7):1624–1633 4. He S, Lee C-H, Shin H-S, Tsourdos A (2019) Minimum-effort waypoint-following guidance. J Guidance Control Dyn 5. Lu P (2017) Introducing computational guidance and control. J Guidance Control Dyn 40(2):193–193 6. Tsiotras P, Mesbahi M (2017) Toward an algorithmic control theory. J Guidance Control Dyn 40(2):194–196 7. He S, Shin H-S, Tsourdos A (2019) Trajectory optimization for target localization with bearingonly measurement. IEEE Trans Robot 35(3):653–668 8. Chai R, Savvaris A, Tsourdos A, Chai S, Xia Y (2019) A review of optimization techniques in spacecraft flight trajectory design. Progr Aerosp Sci 109:1–15 9. Wang Z, Grant MJ (2017) Constrained trajectory optimization for planetary entry via sequential convex programming. J Guidance Control Dyn 40(10):2603–2615 10. Hong H, Maity A, Holzapfel F, Tang S (2019) Adaptive trajectory generation based on real-time estimated parameters for impaired aircraft landing. Int J Syst Sci 50(15):2733–2751 11. He S, Shin H-S, Tsourdos A (2020) Trajectory optimization for multitarget tracking using joint probabilistic data association filter. J Guidance Control Dyn 43(1):170–178 12. Chai R, Savvaris A, Tsourdos A (2017) Violation learning differential evolution-based hpadaptive pseudospectral method for trajectory optimization of space maneuver vehicle. IEEE Trans Aerosp Electron Syst 53(4):2031–2044 13. Tang G, Jiang F, Li J (2018) Fuel-optimal low-thrust trajectory optimization using indirect method and successive convex programming. IEEE Trans Aerosp Electron Syst 54(4):2053– 2066 14. Laad D, Elango P, Mohan R (2020) Fourier pseudospectral method for trajectory optimization with stability requirements. J Guidance Control Dyn 43(11):2073–2090 15. Hong H, Maity A, Holzapfel F (2021) Free final-time constrained sequential quadratic programming-based flight vehicle guidance. J Guidance Control Dyn 44(1):181–189 16. Zhang J, Ma K, Meng G, Tian S (2015) Spacecraft maneuvers via singularity-avoidance of control moment gyros based on dual-mode model predictive control. IEEE Trans Aerosp Electron Syst 51(4):2546–2559 17. Li J, Li C, Zhang Y (2019) Entry trajectory optimization with virtual motion camouflage principle. IEEE Trans Aerosp Electron Syst 56(4):2527–2536 18. Morgan D, Chung S-J, Hadaegh FY (2014) Model predictive control of swarms of spacecraft using sequential convex programming. J Guidance Control Dyn 37(6):1725–1740 19. Augugliaro F, Schoellig AP, D’Andrea R (2012) Generation of collision-free trajectories for a quadrocopter fleet: A sequential convex programming approach. In: International conference on Intelligent Robots and Systems. IEEE, pp 1917–1922 20. Deligiannis A, Amin M, Lambotharan S, Fabrizio G (2018) Optimum sparse subarray design for multitask receivers. IEEE Trans Aerosp Electron Syst 55(2):939–950 21. Aziz JD, Scheeres DJ, Lantoine G (2019) Hybrid differential dynamic programming in the circular restricted three-body problem. J Guidance Control Dyn 42(5):963–975
102
X. Zheng et al.
22. Sun W, Pan Y, Lim J, Theodorou EA, Tsiotras P (2018) Min-max differential dynamic programming: Continuous and discrete time formulations. J Guidance Control Dyn 41(12):2568–2580 23. Ozaki N, Campagnola S, Funase R, Yam CH (2018) Stochastic differential dynamic programming with unscented transform for low-thrust trajectory design. J Guidance Control Dyn 41(2):377–387 24. Morimoto J, Atkeson CG (2002) Minimax differential dynamic programming: An application to robust biped walking 25. Li W, Todorov E (2004) Iterative linear quadratic regulator design for nonlinear biological movement systems. In: ICINCO. Citeseer, pp 222–229 26. Tassa Y, Erez T, Smart WD (2007) Receding horizon differential dynamic programming. In: NIPS. Citeseer, pp 1465–1472 27. He S, Shin H-S, Tsourdos A (2019) Computational guidance using sparse gauss-hermite quadrature differential dynamic programming. Symp Autom Control Aerosp 52(12):13–18 28. Bertsekas DP (2019) Reinforcement learning and optimal control. Athena Scientific Belmont, MA 29. Mayne D (1966) A second-order gradient method for determining optimal trajectories of nonlinear discrete-time systems. Int J Control 3(1):85–95 30. Manchester Z, Kuindersma S (2016) Derivative-free trajectory optimization with unscented dynamic programming. In: Conference on decision and control. IEEE, pp 3642–3647 31. Maity A, Padhi R, Mallaram S, Rao GM, Manickavasagam M (2016) A robust and high precision optimal explicit guidance scheme for solid motor propelled launch vehicles with thrust and drag uncertainty. Int J Syst Sci 47(13):3078–3097 32. Tassa Y, Mansard N, Todorov E (2014) Control-limited differential dynamic programming. In: International conference on robotics and automation. IEEE, pp 1168–1175 33. Xie Z, Liu CK, Hauser K (2017) Differential dynamic programming with nonlinear constraints. In: International conference on robotics and automation. IEEE, pp 695–702 34. Plancher B, Manchester Z, Kuindersma S (2017) Constrained unscented dynamic programming. In: International conference on intelligent robots and systems. IEEE, pp 5674–5680 35. Sleiman J-P, Farshidian F, Hutter M (2021) Constraint handling in continuous-time ddp-based model predictive control. ArXiv preprint arXiv:2101.06067 36. Li H, Wensing PM (2020) Hybrid systems differential dynamic programming for whole-body motion planning of legged robots. IEEE Robot Autom Lett 5(4):5448–5455 37. Nocedal J, Wright S (2006) Numerical optimization. Springer Science & Business Media 38. Farshidian F, Neunert M, Winkler AW, Rey G, Buchli J (2017) An efficient optimal planning and control framework for quadrupedal locomotion. In: International conference on robotics and automation. IEEE, pp 93–100 39. Alcan G, Kyrki V (2022) Differential dynamic programming with nonlinear safety constraints under system uncertainties. IEEE Robot Autom Lett 7(2):1760–1767 40. Celik O, Abdulsamad H, Peters J (2019) Chance-constrained trajectory optimization for nonlinear systems with unknown stochastic dynamics. In: International conference on intelligent robots and systems. IEEE, pp 6828–6833 41. Patterson MA, Rao AV (2014) GPOPS-II: A Matlab software for solving multiple-phase optimal control problems using hp-adaptive gaussian quadrature collocation methods and sparse nonlinear programming. ACM Trans Math Soft 41(1):1–37 42. Kee P, Dong L, Siong C (1998) Near optimal midcourse guidance law for flight vehicle. In: Aerospace sciences meeting and exhibit, p 583 43. Imado F, Kuroda T, Tahk M-J (1998) A new missile guidance algorithm against a maneuvering target. In: Guidance, navigation, and control conference and exhibit, pp 145–153
4 An Improved Differential Dynamic Programming Approach …
103
Xiaobo Zheng received a B.Sc. degree in Aircraft Design and Engineering from the Beijing Institute of Technology, Beijing, China, in 2018. He is currently a Ph.D. student in the School of Aerospace, Beijing Institute of Technology. His research interests include aircraft guidance and control, trajectory optimization algorithm, multi-UAV trajectory generation, and formation control. Shaoming He received the B.Sc. degree and the M.Sc. degree in Aerospace Engineering from Beijing Institute of Technology, Beijing, China, in 2013 and 2016, respectively, and the Ph.D. degree in Aerospace Engineering from Cranfield University, Cranfield, U.K., in 2019. He is currently a Professor with the School of Aerospace Engineering, Beijing Institute of Technology, and also a recognized teaching staff with the School of Aerospace, Transport, and Manufacturing, Cranfield University. His research interests include aerospace guidance, multitarget tracking, and trajectory optimization. Defu Lin received a Ph.D. degree from the School of Aerospace Engineering, Beijing Institute of Technology, in 2004. He is currently a Professor with the School of Aerospace Engineering, Beijing Institute of Technology, and the Director of the Beijing Key Laboratory of UAV Autonomous Control, Ministry of Education, Beijing. His research interests include the overall design of flying vehicles, flight vehicle guidance and control, aerodynamics, and autonomous UAV control technique.
Chapter 5
A Unified Hybrid Control Approach to Maximum Endurance of Aircraft Emily Oelberg and Luis Rodrigues
5.1 Introduction Of all the fields that were mentioned in Chap. 1, aircraft performance optimization is one of fast-growing interest. The performance of an aircraft can be optimized with respect to a number of different factors depending on the application. Some common aircraft performance optimization problems include maximum range, economy mode (minimum direct operating cost), minimum time, and maximum endurance. Commercial aircraft typically fly in economy mode, which is a trade-off problem between the costs associated with fuel and time aloft. However, there are specific instances in which a pilot may be required to loiter for an indefinite amount of time, e.g., in cases of airport congestion. In fact, some airports are so congested that a short period of holding is expected as a means of maximizing runway usage [1]. In these situations, the aircraft should be flown to maximize its endurance. Additionally, unmanned aerial vehicles are gaining popularity in a number of areas which include applications that require long endurance capabilities. Some examples of current applications include search and rescue [2], inspection of power lines [3], border patrol [4], crop surveillance [5], and weather monitoring [6]. This chapter addresses the maximum endurance problem. The maximum endurance objective can be achieved by solving an optimal control problem. Optimal control theory has been developed starting in the 1950s as an extension of calculus of variations [7–11]. In particular, optimal control has been used in aircraft performance optimization problems for over 50 years [12, 13]. The main objective of this chapter is to solve the problem of maximum cruise endurance over the climb, cruise, and descent phases of flight using a hybrid optimal control (HOC) framework. A comprehensive overview of the HOC framework is provided in [14] and [7] and the references therein. E. Oelberg · L. Rodrigues (B) Concordia University, Montreal, Canada e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 A. L’Afflitto et al. (eds.), Control of Autonomous Aerial Vehicles, Advances in Industrial Control, https://doi.org/10.1007/978-3-031-39767-7_5
105
106
E. Oelberg and L. Rodrigues
Optimal control theory has been applied to various aircraft performance optimization problems. Often, approaches based on optimal control theory are used for flight management system (FMS) algorithms. The FMS is an onboard computer used in commercial aircraft to perform a range of tasks such as flight planning, navigation, and performance optimization. Several papers have been published dealing with the optimization of FMS algorithms, including the economy mode problem for jet aircraft in cruise [15–19]. A comparison of minimum fuel optimization techniques for jet aircraft was provided in [20] for level-cruise flights with fixed arrival time. However, only numerical techniques were included in this comparison. The minimum fuel problem was solved for hybrid electric vehicles in [21] using Pontryagin’s Maximum Principle. However, only numerical results were provided. An optimal control framework was proposed for maximum endurance of jet aircraft in [12], and an expression for the endurance of a turbofan was formulated in [22]. However, in both cases, no optimal solution was provided. In [23], the approximate endurance time was derived from the maximum range solution for jet aircraft, but the proposed method assumes constant speed and is only proposed as a rough estimate. The maximum endurance problem was formulated and solved for a turbojet using an optimal control framework in [24] for the cruise portion of the flight, and the authors provided an analytical state feedback solution. Similarly, an analytical state feedback solution to the maximum endurance problem for all-electric aircraft was proposed in [25, 26]. The maximum endurance problem of a turboprop in cruise was formulated and solved in [27] assuming constant airspeed. An expression for the endurance of a turboprop in cruise was formulated in [22], but no optimal solution was given, and the effect of wind was not considered. The theory of hybrid optimal control has seen several applications. For example, a hybrid optimal control framework was used in the control of an electric vehicle in [28] and for launch vehicles mission planning in [29]. In [30], hybrid optimal control was used to formulate the climb, cruise, and descent phases of flight. However, the hybrid optimal control problem was then reformulated as a classical optimal control problem and no analytical solution was given. A fixed-range minimum fuel problem for jet aircraft was formulated using a hybrid optimal control framework in [31] including climb, cruise, and descent. In this work, numerical results were also provided. The different phases of flight are modeled using a hybrid system in [32]. However, to the best of the authors’ knowledge, a hybrid optimal control framework has not been applied to the maximum endurance problem. Periodic cruise trajectories were developed in [33, 34] in order to maximize endurance in cruise. However, in these papers, the aircraft model assumes that the weight is constant and that the engine power can be set to zero. Furthermore, the proposed periodic trajectory requires the aircraft to be constantly climbing or descending between some maximum and minimum altitudes, which is not practical for commercial flights as aircraft are required to maintain a given flight level determined by air traffic control. The climb and descent portions of a flight were studied for the economy mode problem of a turbojet in [35]. The minimum fuel problem for the descent of a jet aircraft with fixed arrival time was presented in [36]. The top of the descent location was estimated for idle-thrust descent trajectories in [37].
5 A Unified Hybrid Control Approach to Maximum Endurance of Aircraft
107
Furthermore, hybrid optimal control has been used in aircraft trajectory optimization. A unified approach was used to solve the maximum endurance and maximum range problem for propeller and jet aircraft in cruise in [38]. In [39], the specific energy is used as an independent variable to minimize the direct operating cost of an aircraft with a fixed range over climb, cruise, and descent. Genetic algorithms were applied to optimize 3D flight profiles of aircraft with a model including wind in [40]. The advantage of this methodology is that the wind model that is used in the simulations can be more complex. However, no analytical solutions are provided. In [41], the wind effect was incorporated as a final penalty in the economy mode problem. In [42], measured flight data was used to develop a modified Bréguet equation to incorporate the effect of wind on maximum range in cruise. The minimum fuel problem for jet aircraft was formulated with constant head and tailwinds in cruise and climb/descent separately in [43]. However, only numerical solutions were provided. The effect of the horizontal headwind or tailwind on maximum endurance and maximum range of a turbojet appears in [44] without proofs. To the best of the authors’ knowledge, the problem of maximum endurance for fixed-wing aircraft has not been solved using a hybrid optimal control approach within a unified energy-depletion model. Several papers have provided numerical solutions. However, the advantage of the unified framework proposed in this chapter is that it addresses turbojet, turbofan, turboprop, and all-electric aircraft and, in many cases, it provides analytical solutions. The main contributions of this chapter are essentially three. First, a unification of different aircraft power plant configuration models into a single energy-depletion model is proposed. Second, a hybrid optimal control framework is applied to solve the cruise maximum endurance problem over climb, cruise, and descent. Third, analytical solutions will be derived for the climb, cruise, and descent for the turbojet, turboprop, and all-electric aircraft. The chapter starts by stating the assumptions and describing the mathematical models. Then, the maximum endurance problem is stated and solved for each aircraft configuration. After presenting and discussing the simulation results, this chapter ends by drawing some conclusions and outlining future work directions.
5.2 Assumptions and Models Throughout this chapter, the following assumptions are made: Assumption 5.1 The aircraft flies below the drag divergence Mach number. It is worthwhile recalling that the divergence Mach number is the Mach number beyond which the drag increases dramatically due to the formation of shock waves. Assumption 5.2 The aircraft experiences no lateral movement or forces. Assumption 5.3 The aircraft follows a flight path, which includes climb (γ > 0), level cruise (γ = 0), and descent (γ < 0) segments.
108
E. Oelberg and L. Rodrigues
Assumption 5.4 The aircraft is in steady (unaccelerated) flight. Assumption 5.5 The flight path angle γ and the angle of attack α are small such that cos γ ≈ 1 and sin γ ≈ γ . Similar assumptions hold true for the angle of attack α. Assumption 5.6 Specific fuel consumption is constant for turbojet and turboprop aircraft and depends only on airspeed for turbofan aircraft. Furthermore, the altitude effect on specific fuel consumption is small. Assumption 5.7 The battery is modeled as having negligible internal resistance and constant output voltage. It will be assumed that the electromotive force does not depend on the battery temperature and that the capacity does not depend on the amplitude of the current. Assumption 5.8 Only the headwind and tailwind will be considered (vertical and cross winds are ignored). Assumption 5.9 The thrust force is bounded, that is, Tidle ≤ T ≤ Tmax (h). Most of these assumptions are fairly standard. Future work directions involve an explicit dependence of fuel consumption on altitude and a dependence of the battery voltage on internal resistance and temperature. Although changes in the stated assumptions may have as a consequence the absence of analytical solutions, they would certainly extend the potential applications of the proposed methodology.
5.2.1 Battery Dynamics Using Assumption 5.7, in the following, the battery dynamics are derived for an all-electric aircraft. The required power is given by P = T v,
(5.1)
where T denotes the thrust and v the airspeed. The charge rate of change of an ideal battery is Q˙ = −i, (5.2) where i denotes the current. According to Ohm’s law, the current is such that i=
P Pe = , U ηU
(5.3)
where Pe denotes the electric power, U the battery voltage, and η the efficiency, which takes into account the conversion from electric to mechanical power and the propeller efficiency. Substituting (5.1) into (5.3), and then (5.3) into (5.2) yields
5 A Unified Hybrid Control Approach to Maximum Endurance of Aircraft Table 5.1 Aircraft energy vector fields Engine type f (0) , [J/s] Turbojet Turbofan Turboprop All-electric
eS FC T eS FC T eS FC T v 1 ηTv
109
f (1) , [J/s]
Fuel flow coefficient, b
eS FC D(W, v, ρ) eS FC D(W, v, ρ) eS FC D(W, v, ρ)v 1 η D(W, v, ρ)v
b=0 b>0 b=0 N/A
Tv . Q˙ = − ηU
(5.4)
5.2.2 Aircraft Performance Using Assumptions 5.1 and 5.4, the aerodynamic drag force is modeled as [18] D=
2C D2 W 2 1 ρ SC D0 v2 + , 2 ρ Sv2
(5.5)
where ρ denotes the air density, S the surface area, W is the weight, and C Di coefficients of drag, i ∈ {0, 2}. From Assumption 5.6, the fuel consumption is modeled as v (5.6) S FC = a + b , c where a > 0 and b is given in Table 5.1 for each type of aircraft. Using Assumption 5.5, the position dynamics are x˙ = v + ω, (5.7) where ω denotes the velocity of the wind. From Assumptions 5.4 and 5.5, it follows that T − D − W γ = 0. (5.8) Using Assumption 5.5, the altitude dynamics are h˙ = vγ .
(5.9)
Thus, substituting γ from (5.8) into (5.9) yields v(T − D) . h˙ = W
(5.10)
110
E. Oelberg and L. Rodrigues
5.2.3 Energy-Depletion Dynamics Four types of aircraft are modeled in this chapter, namely, turbojet, turbofan, turboprop, and all-electric aircraft. The main difference between these aircraft is the way in which they generate power. Turbojet, turbofan, and turboprop aircraft all burn fuel to generate energy, and, therefore, their weight decreases during flight as fuel is consumed. On the other hand, all-electric aircraft maintain a constant weight during flight, and the energy expended comes from a stored battery charge. In order to model the dynamics of these four aircraft using the same framework, the weight and charge dynamics will be converted into energy dynamics. This section will derive the conversion from fuel and charge to energy, respectively, so that the maximum endurance optimal control problem can be formulated more generally.
5.2.3.1
Fuel to Energy Conversion
Aviation fuel is a class of fuel used to power aircraft engines. An important property of fuel is its heating value. The heating value of fuel is constant and represents the energy stored per kilogram of fuel. Typical values for jet fuel range from 40,000 kJ/kg to 43,000 kJ/kg [45]. The weight dynamics of fuel-powered aircraft can therefore be converted to energy dynamics by multiplying the weight dynamics by the heating value e and dividing by the gravitational acceleration g. The time rate of change of energy for a fuel-powered aircraft is captured by e E˙ = W˙ fuel = e˜ W˙ fuel , g
(5.11)
where e˜ e/g. Similarly, the weight boundary conditions can be converted to energy boundary conditions using the following relationship: E=
5.2.3.2
e Wfuel = eW ˜ fuel . g
(5.12)
Charge to Energy Conversion
The energy stored in a battery can be quantified as E = QU.
(5.13)
Thus, using Assumption 5.7, the rate of change of energy is ˙ E˙ = QU. Replacing (5.4) into (5.14) yields for all-electric aircraft
(5.14)
5 A Unified Hybrid Control Approach to Maximum Endurance of Aircraft
111
Fig. 5.1 Hybrid automaton
Tv E˙ = − , η
(5.15)
which has the units of a time rate of change of energy [J/s]. Furthermore, the boundary conditions of the system can be converted from charge to energy using the relationship (5.13).
5.2.4 Hybrid Optimal Control Framework In the following, the maximum endurance problem is modeled using a hybrid optimal control framework. The hybrid system consists of control inputs (v and T ), continuous states (x, W , and h), and one discrete state that can be climb, cruise, or descent. A hybrid automaton of the switched system for the three phases of flight is given in Fig. 5.1. The superscript (q) denotes the discrete state of the system. From Assumption 5.3, we deduce that the aircraft will experience two switches, namely, from climb to cruise and then from cruise to descent. The discrete states are defined as ⎧ ⎪ t 0 ≤ t < tc , ⎨0 for climb (5.16) q 1 for cruise tc ≤ t < td , ⎪ ⎩ 2 for descent td ≤ t ≤ t f , where t0 denotes the initial time, tc the time at the top of the climb, td the time at the top of the descent, and t f the final time. The dynamics in altitude for the switched system are therefore given by ˙h = 0 v(T −D) W
if tc ≤ t ≤ td , otherwise.
(5.17)
The equations that relate fuel to energy and charge to energy from Sect. 5.2.3 will be used to generalize the dynamics of the system in terms of energy per unit of time. From Assumption 5.3, and using the hybrid optimal control framework,
112
E. Oelberg and L. Rodrigues
Fig. 5.2 The three phases of flight
− f (1) (W, v, ρ) ˙ E= − f (0) (T, v, ρ)
if tc ≤ t ≤ td , otherwise,
(5.18)
where the functions f (0) and f (1) satisfy Assumption 5.10 and will depend on the type of aircraft, as per Table 5.1. Assumption 5.10 The function f (1) (W, v, ρ) is twice continuously differentiable, is positive, has a bounded derivative, and is strictly convex with respect to the airspeed v for v > 0. The function f (0) (T, v, ρ) is twice continuously differentiable, is positive, has a bounded derivative, is linear in T , and its derivative with respect to T is positive.
5.3 Maximum Endurance In this section, the maximum endurance problem is formulated over the climb, cruise, and descent phases of flight using the hybrid optimal control framework described in [14]. The phase of flight (discrete state) is defined in (5.16), and the different phases of flight are shown in Fig. 5.2. The energy and altitude dynamics are given by (5.18) and (5.17), respectively. The performance index is defined as J ∗ max
td
dt
(5.19)
tc
such that the endurance time in cruise is maximized. The aircraft will switch to cruise once it reaches the desired cruising altitude, as defined by the switching manifold ψ (1) = h(tc ) − h c = 0.
(5.20)
The aircraft will then switch to descent once it has a certain amount of stored energy remaining, as defined by the switching manifold ψ (2) = E(td ) − E d = 0.
(5.21)
5 A Unified Hybrid Control Approach to Maximum Endurance of Aircraft
113
The hybrid optimal control problem is therefore formulated as ∗
J =
max
v,T,tc ,td ,t f
s.t. x˙ = v + ω, 0 h˙ =
v(T −D) W
td
dt,
(5.22b) if tc ≤ t ≤ td , otherwise,
− f (1) (W, v, ρ) ˙ E= − f (0) (T, v, ρ) t = tc ,
(5.22a)
tc
if tc ≤ t ≤ td , otherwise,
when ψ (1) = h(t) − h c = 0, (2)
(5.22c) (5.22d) (5.22e)
when ψ = E(t) − E d = 0, t = td , E(t0 ) = E 0 , E(td ) = E d , E(t f ) = E f ,
(5.22f) (5.22g)
h(t0 ) = h 0 , h(t) = h c for tc ≤ t ≤ td , v > 0, Tidle ≤ T ≤ Tmax (h),
(5.22h) (5.22i) (5.22j)
D=
1 2C D2 W 2 ρ SC D0 v2 + . 2 ρ Sv2
(5.22k)
5.4 Maximum Endurance Solution This section provides a general solution to the maximum endurance problem (5.22). This solution is captured by the following result. Theorem 5.1 Consider the hybrid optimal control problem (5.22). For each stage of flight, the maximum endurance airspeed is given by ⎧ ∗(0) v is the solution of ⎪ ⎪ ⎪ (0) ⎪ f (T,v,ρ) ⎪ (T − D − v Dv ) − f v(0) (T, v, ρ) = 0 ⎪ v(T −D) ⎪ ⎪ ⎪ ⎪ for t0 ≤ t < tc , ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ∗(1) ⎪ ⎪ is the solution of ⎨v ∗ (1) v = f v (W, v, ρ) = 0 ⎪ ⎪ ⎪ for tc ≤ t < td , ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ v∗(2) is the solution of ⎪ ⎪ ⎪ f (0) (T,v,ρ) ⎪ ⎪ (T − D − v Dv ) − f v(0) (T, v, ρ) = 0 ⎪ ⎪ ⎩ v(T −D) for td ≤ t ≤ t f ,
(5.23)
114
E. Oelberg and L. Rodrigues
the optimal thrust profile is given by ⎧ ∗(0) ⎪ = Tmax (h) ⎨T
∗ ∗(1) T = T = D W, v∗(1) , ρ ⎪ ⎩ ∗(2) = Tidle T
for t0 ≤ t < tc , for tc ≤ t < td , for td ≤ t ≤ t f ,
(5.24)
and the maximum endurance is given by J∗ =
td
dt = −
tc
Ed Ec
1 d E. f (1) (W, v∗(1) , ρ)
(5.25)
where f v(0) and f v(1) are, respectively, the partial derivatives of f (0) and f (1) with respect to v. Proof In climb, the Hamiltonian function is given by H (0) = −JE∗ f (0) (T, v, ρ) + Jh∗
v(T − D) + Jx∗ (v + ω) W
for t0 ≤ t < tc , (5.26)
and in cruise, the Hamiltonian function is given by H (1) = 1 − JE∗ f (1) (W, v, ρ) + Jx∗ (v + ω)
for tc ≤ t < td .
(5.27)
Finally, in descent, the Hamiltonian function is H (2) = −JE∗ f (0) (T, v, ρ) + Jh∗
v(T − D) + Jx∗ (v + ω) W
for td ≤ t ≤ t f .
(5.28) At the switching times tc (climb to cruise) and td (cruise to descent), the following necessary conditions must hold. At the top of the climb, the conditions are [7]
∂ψ (1) , = + ∂ E(tc )
∂ψ (1) Jh∗ (tc− ) = Jh∗ (tc+ ) + νh(1) , ∂h(tc )
∂ψ (1) Jx∗ (tc− ) = Jx∗ (tc+ ) + νx(1) , ∂ x(tc )
JE∗ (tc− )
JE∗ (tc+ )
ν E(1)
and, at the top of the descent, the conditions are
(5.29a) (5.29b) (5.29c)
5 A Unified Hybrid Control Approach to Maximum Endurance of Aircraft
∂ψ (2) , = + ∂ E(td )
∂ψ (2) Jh∗ (td − ) = Jh∗ (td + ) + νh(2) , ∂h(td )
∂ψ (2) Jx∗ (td − ) = Jx∗ (td + ) + νx(2) , ∂ x(td )
JE∗ (td − )
JE∗ (td + )
ν E(2)
115
(5.30a) (5.30b) (5.30c)
where ν E(1) , νh(1) , νx(1) , ν E(2) , νh(2) , and νx(2) are scalars. Evaluating the partial derivatives in (5.29) and (5.30) yields ∂ψ (1) = 0, ∂ E(tc ) ∂ψ (2) = 1, ∂ E(td )
∂ψ (1) ∂ψ (1) = 1, = 0, ∂h(tc ) ∂ x(tc ) ∂ψ (2) ∂ψ (2) = 0, = 0. ∂h(td ) ∂ x(td )
(5.31a) (5.31b)
Therefore, at the switching times tc and td , the following relations must hold for the costates JE∗ (tc− ) = JE∗ (tc+ ),
(5.32a)
Jh∗ (tc+ ) + νh(1) , Jx∗ (tc+ ),
(5.32b)
JE∗ (td − ) = JE∗ (td + ) + ν E(2) , Jh∗ (td − ) = Jh∗ (td + ),
(5.33a) (5.33b)
Jx∗ (td − ) = Jx∗ (td + ).
(5.33c)
Jh∗ (tc− ) Jx∗ (tc− )
= =
(5.32c)
and
From (5.32), we deduce that there is a jump discontinuity in Jh∗ at tc if νh(1) = 0 and JE∗ is continuous at tc . Similarly, from (5.33), we deduce that there is a jump discontinuity in JE∗ at td if ν E(2) = 0, but Jh∗ is continuous at td . The costate Jx∗ experiences no jump discontinuities. The dynamics in the costate Jx∗ are defined by Hamilton’s equations as ⎧ ∂ H (0) ⎪ ⎨− ∂ x for t0 ≤ t < tc , (1) (5.34) J˙x∗ = − ∂ H∂ x for tc ≤ t < td , ⎪ ⎩ ∂ H (2) − ∂ x for td ≤ t ≤ t f . Thus, from (5.26)–(5.28), we deduce that the Hamiltonian function does not depend on x at any flight phase. Therefore, the costate Jx∗ is constant because J˙x∗ = 0
for t0 ≤ t ≤ t f .
(5.35)
116
E. Oelberg and L. Rodrigues
Furthermore, since the final position is free, it follows from the transversality condition [7] that Jx∗ (t f ) = 0. This fact together with (5.32), (5.33), and (5.35) implies that (5.36) Jx∗ (t) = 0 for t0 ≤ t ≤ t f . Therefore, the Hamiltonian functions in climb, cruise, and descent are simplified to H (0) = −JE∗ f (0) (T, v, ρ) + Jh∗
v(T − D) W
H (1) = 1 − JE∗ f (1) (W, v, ρ) H (2) = −JE∗ f (0) (T, v, ρ) + Jh∗
v(T − D) W
for t0 ≤ t < tc ,
(5.37a)
for tc ≤ t < td ,
(5.37b)
for td ≤ t ≤ t f .
(5.37c)
The necessary conditions for the Hamiltonian functions at tc and td are given by [7] ∂ψ (1) , ∂tc ∂ψ (2) H (1) (td − ) = H (2) (td + ) − ν H(2) , ∂td H (0) (tc− ) = H (1) (tc+ ) − ν H(1)
(5.38a) (5.38b)
where ν H(1) and ν H(2) are constant scalars, and the partial derivatives in (5.38) are ∂ψ (1) = 0, ∂tc which implies that
and
∂ψ (2) = 0, ∂td
(5.39)
H (0) (tc− ) = H (1) (tc+ )
(5.40)
H (1) (td − ) = H (2) (td + ).
(5.41)
In other words, from (5.40) and (5.41), the Hamiltonian is continuous. Moreover, since the Hamiltonian function does not explicitly depend on time and the final time is free, it holds that H (0) (t) = H (1) (t) = H (2) (t) = 0, ∀t ∈ [t0 , t f ].
(5.42)
The rest of the proof will be broken down into four sections, namely, the cruise, the climb, the descent, and the computation of optimal endurance.
5 A Unified Hybrid Control Approach to Maximum Endurance of Aircraft
117
5.4.1 Cruise First, it will be shown that JE∗ is positive in cruise, as this will be necessary for determining the optimal airspeed profile. In cruise, the Hamiltonian function is given by (5.37b). From (5.42), the costate JE∗ is given by JE∗ =
1 (W, v, ρ), tc ≤ t < td . f (1)
(5.43)
Since f (1) is positive by Assumption 5.10, the costate JE∗ is also positive in cruise. The necessary condition in v is Hv(1) = −JE∗ f v(1) (W, v, ρ) = 0,
(5.44)
where f v(1) denotes the partial derivative of f (1) with respect to v. Since JE∗ > 0, it holds that (5.45) f v(1) (W, v, ρ) = 0. Therefore, the optimal airspeed profile in cruise can be found by solving for v∗(1) in the necessary condition f v(1) (W, v∗(1) , ρ) = 0. From Assumption 5.4, it follows that the optimal thrust profile is T ∗(1) = D(W, v∗(1) , ρ).
(5.46)
5.4.2 Climb First, it will be shown that the costate JE∗ is positive in climb. This condition will be necessary to determine the optimal airspeed and thrust profiles later. In climb, the Hamiltonian function is given by (5.37a). From (5.42), the costates Jh∗ and JE∗ can be related by f (0) (T, v, ρ)W , t0 ≤ t < tc . (5.47) Jh∗ = JE∗ v(T − D) It follows from (5.8) that, in climb, T > D with W > 0 and γ > 0. Furthermore, from (5.32) and (5.43), at the top of the climb, the costate JE∗ is given by JE∗ (tc− ) = JE∗ (tc+ ) =
1 > 0. (1) f (W, v, ρ) t=tc+
The dynamics of the costate JE∗ in climb are
(5.48)
118
E. Oelberg and L. Rodrigues (0)
∂H , J˙E∗ = − ∂E which is equivalent to
t < tc ,
∂ H (0) dW , J˙E∗ = − ∂W d E
(5.49)
t < tc .
(5.50)
From the energy model in Sect. 5.2.3, the conversion from weight to energy for fuel-burning aircraft is E = eWfuel = e(W − W f ), (5.51) where the weight of fuel available is captured by W − W f and W f denotes the weight of the fuselage plus passengers and luggage. Solving for W yields W =
E + Wf. e
(5.52)
Therefore, W E is WE =
dW 1 = >0 dE e
for fuel-burning aircraft,
(5.53)
and is a constant. In the case of all-electric aircraft, there is no fuel weight since the energy comes from the battery, and, hence, WE = 0
for all-electric aircraft.
(5.54)
Therefore, for all aircraft, we have that dW ≥ 0. dE
(5.55)
Next, the first partial derivative in (5.50) is ∂ H (0) = −Jh∗ v ∂W
T−D DW + W2 W
.
(5.56)
Substituting (5.47) into (5.56) yields
∂ H (0) DW 1 = −JE∗ f (0) (T, v, ρ) + . ∂W W T−D
(5.57)
Now, replacing (5.57) into (5.50) yields J˙E∗ = JE∗ f (0) (T, v, ρ)
DW 1 + W T−D
dW . dE
(5.58)
5 A Unified Hybrid Control Approach to Maximum Endurance of Aircraft
119
The differential in (5.58) is separable and can be integrated as
1 d J∗ = JE∗ E
f
(0)
DW 1 + (T, v, ρ) W T−D
dW dt + k1 , dE
(5.59)
where k1 is a constant of integration. Integrating the left-hand side of (5.59) yields
ln JE∗ − k1 =
f (0) (T, v, ρ)
DW 1 + W T−D
dW dt. dE
(5.60)
Solving for JE∗ yields JE∗ = k2 e
f (0) (T,v,ρ)
1 W
D
W + T −D
dW dE
dt
,
(5.61)
where k2 = ek1 is a positive constant. The costate JE∗ is therefore positive. Next, the optimal airspeed profile will be derived. From (5.37a), the necessary condition in v during climb is Hv(0) = −JE∗ f v(0) +
Jh∗ (T − D − v Dv ) = 0. W
(5.62)
Substituting (5.47) into (5.62) yields Hv(0) = −JE∗ f v(0) + JE∗
f (0) (T, v, ρ) (T − D − v Dv ) = 0, v(T − D)
(5.63)
which can be rearranged as Hv(0) = JE∗
f (0) (T, v, ρ) (T − D − v Dv ) − f v(0) v(T − D)
= 0.
(5.64)
It follows from (5.61) with T > D and (5.8) with W > 0 and γ > 0 that JE∗ > 0. Thus, f (0) (T, v, ρ) (T − D − v Dv ) − f v(0) = 0. (5.65) v(T − D) From Assumption 5.10
f (0) (T, v, ρ) = f T(0) (v, ρ)T.
(5.66)
Hence, replacing (5.66) into (5.37a) and rearranging the terms yield v vD . H (0) (T ) = Jh∗ − JE∗ f T(0) (v, ρ) T − Jh∗ W W Substituting Jh∗ from (5.47) into the first term yields
(5.67)
120
E. Oelberg and L. Rodrigues
H (0) (T ) =
vD JE∗ f T(0) D T − Jh∗ , T−D W
(5.68)
where the arguments for f T(0) were omitted for simplicity. Defining a switching function ζ as JE∗ (5.69) ζ f (0) D, T−D T the optimal thrust is given by
T = arg max H
(0)
(T ) = arg max ζ T −
vD Jh∗ W
.
(5.70)
Since T is bounded according to Assumption 5.9, the optimal thrust profile depends on the sign of ζ . If ζ is positive, then the optimal thrust profile will be Tmax (h). Alternatively, if ζ is negative, then the optimal thrust profile will be Tidle . SinceJE∗ is strictly positive from (5.61), f T(0) is strictly positive from Assumption 5.10, and D is strictly positive from (5.5), ζ will never be zero. The sign of ζ depends on T − D as ζ > 0 for T > D, ζ < 0 for T < D.
(5.71)
Therefore, in climb, when T > D, ζ is positive. Since ζ is positive in climb, the optimal thrust in climb is T ∗(0) = Tmax (h). (5.72)
5.4.3 Descent The descent portion of flight closely resembles the climb portion of flight since the Hamiltonian functions are identical. First, it will be shown that the costate JE∗ is positive during descent since this condition will again be necessary to determine the optimal airspeed and thrust profiles later. The costate dynamics J˙E∗ in descent are identical to those in climb given by (5.58) since, in descent, the Hamiltonian function (5.37c) is identical to the Hamiltonian function in climb given by (5.37a). Moreover, it follows that JE∗ is equal to (5.61). The costate JE∗ is therefore positive during descent. Since the Hamiltonian function in descent (5.37c) is identical to the Hamiltonian in climb (5.37a), the necessary condition in v is identical to (5.65). The optimal thrust profile in descent is still obtained by (5.70) with T < D. Therefore, in descent, ζ < 0 and (5.73) T ∗(2) = Tidle .
5 A Unified Hybrid Control Approach to Maximum Endurance of Aircraft
121
5.4.4 Optimal Endurance The maximum endurance can be found by solving the differential equation dE E˙ = = − f (1) (W, v∗(1) , ρ) dt
(5.74)
from the top of the climb to the top of the descent and evaluated at v = v∗(1) . Since f (1) > 0 by Assumption 5.10, the total endurance is J∗ =
td
dt = −
tc
Ed Ec
1 f (1) (W, v∗(1) , ρ)
dE.
(5.75)
Remark 5.1 The expression for maximum endurance (5.75) depends on E c and E d . The energy at the top of the descent E d is known and the energy at the top of the climb E c can be related to the cruising altitude h c through the set of differential equations for climb given by
dE = − f (0) T ∗(0) , v∗(0) , dt v∗(0) (T ∗(0) − D) dh = , dt W
(5.76a) (5.76b)
which can be rearranged as
f (0) T ∗(0) , v∗(0) W dE = − ∗(0) ∗(0) , dh v (T − D)
(5.77)
where T > D in climb by Assumption 5.3 and (5.8). In general, (5.77) is not necessarily separable. Therefore, analytical solutions may be difficult to find. However, it can be integrated numerically with initial conditions E(t0 ) = E 0 and h(t0 ) = h 0 and final altitude h(tc ) = h c to solve for the energy at the top of the climb E(tc ) = E c . The energy at the top of the climb E c can then be substituted into the integral (5.75) to solve for the endurance time.
5.5 Aircraft Configurations The results of Sect. 5.4 can be applied to specific engine configurations using the energy-depletion dynamics given for each aircraft in Table 5.1. Only two aircraft configurations will be solved in detail, namely, the turbofan and the turboprop. The analytical solutions for the turbojet and all-electric aircraft are also provided in less
122
E. Oelberg and L. Rodrigues
detail. The optimal thrust profile is identical for all aircraft and is given by (5.24). This section provides the optimal airspeed profile for the climb, cruise, and descent for each aircraft.
5.5.1 Turbofan Aircraft The maximum endurance problem (5.22) can be solved for a turbofan following the procedure outlined in Sect. 5.4. From the results of Theorem 5.1, the optimal thrust profile is given by (5.24). Using Table 5.1, the model (5.6) for the specific fuel consumption, and the optimal thrust curve in the necessary condition (5.65) yields
e a + b vc T eb (T − D − v Dv ) − T = 0. v(T − D) c
(5.78)
Thus, multiplying both sides by v(T − D)/T e yields a (T − D − v Dv ) +
bv bv bv (T − D) − v Dv − (T − D) = 0, c c c
(5.79)
which, for a = 0, simplifies to
bv v Dv = 0. T − D− 1+ ac
(5.80)
Using the drag expression (5.5) and its derivative gives 2C D2 W 2 b b 4C D2 W 2 3 = 0. − ρ SC D0 v3 + T − ρ SC D0 v2 + 2 2 ρ Sv ac ac ρ Sv
(5.81)
Multiplying by −v2 and rearranging (5.81) yield pt f (v) = 0 where pt f (v) is the fifth-order polynomial 2C D2 W 2 3 b 4C D2 W 2 b ρ SC D0 v5 + ρ SC D0 v4 − T v2 − v− . ac 2 ac ρ S ρS
(5.82)
Applying Descartes’ rule of signs, (5.82) will have one positive real root. The positive real root of (5.82) corresponds to the optimal airspeed profile in climb for T = Tmax (h) and the optimal airspeed profile in descent for T = Tidle . Since there is no analytical expression for finding the roots of a quintic polynomial in the same form as (5.82), numerical solutions will be provided. Using Table 5.1, the model (5.6) for the specific fuel consumption, and the optimal thrust curve in the necessary condition (5.45) for cruise yields
5 A Unified Hybrid Control Approach to Maximum Endurance of Aircraft
4W 2 C D2 3 bC D0 ρ S 5 2bW 2 C D2 v + C D0 ρ Sv4 − v− = 0. 2 ac acρ S ρS
123
(5.83)
The solution for the cruise portion of a flight also requires finding the root of a fifth-order polynomial.
5.5.2 Turboprop Aircraft Using Table 5.1, the model (5.6) with b = 0 for the specific fuel consumption, and the optimal thrust curve in (5.65) yields v eaT (T − D − v Dv ) − 1 = 0, v(T − D)
(5.84)
which from eaT = 0 is equivalent to v(T − D − v Dv ) − v(T − D) = 0.
(5.85)
Dividing (5.85) by v and simplifying gives v Dv = 0.
(5.86)
Using the drag expression (5.5) and its derivative and multiplying by v yields the following bi-quadratic equation: (ρ S)2 C D0 v4 − 4C D2 W 2 = 0.
(5.87)
The optimal airspeed is the positive real solution of (5.87). Since the necessary condition is the same both in climb and in descent, and since (5.87) does not depend on T , the optimal airspeed profile will be the same both in climb and in descent, and is given by 2W C D2 v∗(0) = v∗(2) = . (5.88) ρ S C D0 Using Table 5.1, the model (5.6) with b = 0 for the specific fuel consumption, and the optimal thrust curve in (5.45) yields 3 2C D2 W 2 ρ SC D0 v4 − = 0. 2 ρS
(5.89)
Solving for v yields the maximum endurance airspeed for a turboprop in cruise
124
E. Oelberg and L. Rodrigues
v∗(1)
2W C D2 = , ρ S 3C D0
(5.90)
which is a classical result.
5.5.3 Turbojet Aircraft Using Table 5.1 for the turbojet and following the procedure in Sect. 5.4 yields the maximum endurance cruise speed
v∗(1)
2W C D2 = , ρ S C D0
(5.91)
which is a classical result. Using the same procedure as in Sect. 5.4, the optimal airspeed profile in climb is v
∗(0)
=
Tmax +
2 + 12W 2 C C Tmax D0 D2 , 3ρ SC D0
(5.92)
which corresponds to the maximum rate of climb as expected. In descent, the optimal airspeed profile is
v∗(2)
T + T 2 + 12W 2 C C D0 D2 idle idle = , 3ρ SC D0
(5.93)
which, as expected, corresponds to the minimum rate of descent.
5.5.4 All-Electric Aircraft Using Table 5.1 for all-electric aircraft, following the same procedure as in Sect. 5.4 yields the maximum endurance cruise speed
v∗(1)
2W C D2 = , ρ S 3C D0
(5.94)
5 A Unified Hybrid Control Approach to Maximum Endurance of Aircraft Table 5.2 Summary of aircraft parameters Turbofan Aircraft Wing area, S Zero lift drag coefficient, C D0 Induced drag coefficient, C D2 Specific fuel consumption, S FC Fuel flow coefficient, a Fuel flow coefficient, b
Boeing 737 125 m2 0.020 0.055 N/A 1.0737 × 10−5 1/s 0.9045
125
Turboprop King air 350 26.75 m2 0.0185 0.0263 1.5174 × 10−6 1/m N/A N/A
which confirms the result of [26]. Using the same procedure as Sect. 5.4, the optimal airspeed profile in climb is identical to the optimal airspeed profile in descent, and is given by 2W C D2 v∗(0) = v∗(2) = . (5.95) ρ S C D0
5.6 Examples 5.6.1 Aircraft Parameters This section provides simulation results for two aircraft configurations, namely, turbofan and turboprop. A summary of the aircraft parameters used for the simulations is provided in Table 5.2. The data for the Boeing 737 is taken from [24, 46]. The data for the King Air 350 is taken from [19, 47].
5.6.2 Turbofan Aircraft The climb portion was simulated using a desired cruising altitude of 11,000 m. Figure 5.3 shows the optimal airspeed and thrust profiles in climb, and Fig. 5.4 shows the weight and altitude trajectories. In order to validate that the optimal thrust in climb is the maximum thrust, a comparison was made for different thrust profiles. Since maximum thrust depends on altitude, the maximum thrust profile was multiplied by a constant k to scale down the thrust profile with k ∈ [0.55, 1]. Figure 5.5 illustrates the effect of a decreased thrust profile on the time to reach the desired cruising altitude. Going from left to right, the curves show decreasing values of k. For k = 1, the aircraft reaches the desired cruising altitude in the shortest amount of time, thus maximizing the cruising endurance. For k < 0.9, the aircraft does not reach the desired altitude of
126
E. Oelberg and L. Rodrigues
Fig. 5.3 Turbofan optimal climb airspeed and thrust profiles
11,000 m. For values of k < 0.55, the aircraft simply ran out of fuel before reaching a constant cruising altitude.
5.6.3 Turboprop Aircraft Since the turboprop’s endurance is very sensitive to the cruising altitude, two cruising altitudes were simulated, namely, 3,000 m and 11,000 m. Flying at the optimal airspeed and thrust profiles for climb, the total endurance time in cruise was 7.33 hours for a cruising altitude of 3,000 m and 2.28 hours for a desired cruising altitude of 11,000 m. As expected, the higher cruising altitude caused a large reduction in endurance time for two reasons. Specifically, the turboprop is more efficient at lower altitudes and more fuel was burned during climb to reach the higher altitude. Figure 5.6 shows the optimal airspeed and thrust profiles in climb and Figure 5.7 shows the weight and altitude trajectories. In order to validate that the optimal thrust in the climb is the maximum thrust, a comparison was made for different thrust profiles. Since the maximum thrust depends on the altitude, the maximum thrust profile was multiplied by a constant k to scale
5 A Unified Hybrid Control Approach to Maximum Endurance of Aircraft
Fig. 5.4 Turbofan weight and altitude climb trajectories
Fig. 5.5 Turbofan climb altitude profiles for scaled thrust input
127
128
E. Oelberg and L. Rodrigues
(a) Cruising Altitude of 3,000 m
(b) Cruising Altitude of 11,000 m
Fig. 5.6 Turboprop optimal climb airspeed and thrust profiles
(a) Cruising Altitude of 3,000 m
(b) Cruising Altitude of 11,000 m
Fig. 5.7 Turboprop weight and altitude climb trajectories
down the thrust profile. For a cruising altitude of 3,000 m, we chose k ∈ [0.1, 1]. For 11,000 m, we chose the parameter k ∈ [0.55, 1]. Figure 5.8 illustrates the effect of a decreased thrust profile on the time to reach the desired cruising altitude, and Fig. 5.9 shows the effect of the scaled-down thrust profile on total cruise endurance. For a higher cruising altitude, the thrust effect was more significant than for a lower cruising altitude, whereas for a cruising altitude of 3,000 m, the effect of scaling down the thrust was relatively small for a decrease of up to 50% of maximum thrust.
5.7 Summary The optimal flight strategy for each aircraft can be summarized as follows:
5 A Unified Hybrid Control Approach to Maximum Endurance of Aircraft
(a) Cruising Altitude of 3,000 m
129
(b) Cruising Altitude of 11,000 m
Fig. 5.8 Turboprop climb altitude profiles for scaled thrust input
• All aircraft configurations should be flown at maximum thrust in climb; idle thrust in descent; and at optimal airspeed in climb, cruise, and descent. • For all aircraft configurations, the climb performance greatly impacts the cruise endurance. However, for turbojet and turbofan aircraft, it is possible to slightly reduce the thrust in climb while achieving near-optimal cruise endurance. • To maximize the endurance of turboprop and all-electric aircraft, they should fly at the lowest possible cruising altitude. • The headwind and tailwind do not affect the optimal airspeed or thrust profiles, However, the final position and ground speed will depend on the wind speed. Furthermore, from the analytical solutions provided in Sect. 5.5, there may be discontinuities in the optimal airspeed at the top of the climb tc and at the top of the descent td . It is important to note that the optimal solutions are not the real values of airspeed but rather the target airspeed that a flight management system determines. The aircraft pilot or autopilot would need to control the speed to match the target airspeed. Similarly, the optimal thrust profile may not be continuous at switching times. The results obtained in this chapter are consistent with current airline practices. The FAA suggests using maximum thrust as a possible climb option and suggests using the minimum positive value of thrust in descent [48].
5.8 Conclusion and Future Work This chapter covered one of the important fields discussed in Chap. 1 which is the optimal endurance of large manned or unmanned aircraft. In particular, a unified energy-depletion model was developed and used to solve the maximum endurance problem of fixed-wing aircraft. The problem was solved using a hybrid optimal control framework. A distinctive advantage of the unified energy model is that the solution framework can be applied to both fuel-burning and all-electric aircraft. Fur-
130
Fig. 5.9 Turboprop endurance versus climb thrust profile
E. Oelberg and L. Rodrigues
5 A Unified Hybrid Control Approach to Maximum Endurance of Aircraft
131
thermore, the use of a hybrid optimal framework allowed the development of analytical solutions for turbojet, turboprop, and all-electric aircraft. Analytical solutions are advantageous because they provide insight into the physical properties of the problem. They also have a faster implementation in real time when compared with numerical solutions. In the case of the turbofan aircraft, no analytical solution was obtained since the solution corresponds to the positive real root of a fifth-order polynomial. Future work directions involve extensions of this research to autonomous and semi-autonomous urban air mobility systems to be employed to transport people and goods across cities and from cities to local airports. With this chapter, we conclude the first section of this contributed book aimed at surveying guidance systems for UAVs. The next chapter introduces the section dedicated to modern challenges for UAV navigation systems.
References 1. A matter of time: air traffic delay in Europe, vol 2, 2007. [Online]. Available: https://www. eurocontrol.int/sites/default/files/publication/files/tat2-air-traffic-delay-europe-2007.pdf 2. Tomic T, Schmid K, Lutz P, Domel A, Kassecker M, Mair E, Grixa IL, Ruess F, Suppa M, Burschka D (2012) Toward a fully autonomous uav: Research platform for indoor and outdoor urban search and rescue. IEEE Robot Autom Mag 19(3):46–56 3. MIR Innovation (2018) (subsidiary of Hydro-Quebec). Inspection and maintenance solutions for power transmission systems. [Online]. Available: http://mir-innovation.hydroquebec.com/ f-innovation/en/transmission-solutions-uav.html 4. U.S. Department of Homeland Security (2017) CBP to test the operational use of small unmanned aircraft systems in 3 U.S. border patrol sectors. Available: https://www.cbp. gov/newsroom/national-media-release/cbp-test-operational-use-small-unmanned-aircraftsystems-3-us-border 5. Reinecke M, Prinsloo T (2017) The influence of drone monitoring on crop health and harvest size. In: Conference on next generation computing applications (NextComp). IEEE, pp 5–10 6. National Aeronautics and Space Administration (2017) Drones see second life mapping hurricanes–NASA’s HS3 Program. [Online]. Available: https://www.nasa.gov/content/dronessee-second-life-mapping-hurricanes-nasa-s-hs3-program/ 7. Bryson AE, Ho Y-C (1975) Applied optimal control: optimization, estimation, and control. Taylor & Francis, New York, NY 8. Stengel RF (1994) Optimal control and estimation. Dover, New York, NY 9. Liberzon D (2012) Calculus of variations and optimal control theory: a concise introduction. Princeton University Press, Princeton, NJ 10. Athans M, Falb PL (1966) Optimal control: an introduction to the theory and its applications. McGraw-Hill, Mineola, NY 11. Sussman HJ, Willems JC (1997) 300 years of optimal control: From the brachystoschrone to the maximum principle. IEEE Control Syst 17(3):32–44 12. Miele A (1963) A survey of the problem of optimizing flight paths of aircraft and missiles. University of California Press, Berkeley, CA, pp 3–32 13. Miele A (1962) The calculus of variations in applied aerodynamics and flight mechanics. In: Leitmann G (ed) Optimization techniques, ser. Mathematics in science and engineering, vol 5. Elsevier, pp 99–170 14. Pakniyat A (2016) Optimal control of deterministic and stochastic hybrid systems: Theory and applications. Ph.D. dissertation, McGill University, Montreal, Quebec, Canada
132
E. Oelberg and L. Rodrigues
15. Sorensen JA, Morello SA, Erzberger H (1979) Application of trajectory optimization principles to minimize aircraft operating costs. IEEE conference on decision and control 2:415–421 16. Burrows JW (1983) Fuel-optimal aircraft trajectories with fixed arrival times. J Guidance Control Dyn 6(1):14–19 17. Vilardaga S, Prats X (2015) Operating cost sensitivity to required time of arrival commands to ensure separation in optimal aircraft 4D trajectories. Transp Res Part C-Emer Technol 61:75–86 18. Villarroel J, Rodrigues L (2016) Optimal control framework for cruise economy mode of flight management systems. J Guidance Control Dyn 39(5):1022–1033 19. Botros A (2017) Minimizing direct operating cost for turbojet and turboprop aircraft in cruise. Master’s thesis, Concordia University, Montreal, Quebec, Canada 20. García-Heras J, Soler M, Sáez FJ (2014) A comparison of optimal control methods for minimum fuel cruise at constant altitude and course with fixed arrival time. In: Procedia engineering, vol 80, pp 231–244. International Symposium on Aircraft Airworthiness 21. Kim N, Cha S, Peng H (2011) Optimal control of hybrid electric vehicles based on Pontryagin’s minimum principle. IEEE Trans Control Syst Technol 19(5):1279–1287 22. Bert CW (1999) Range and endurance of turboprop, turbofan, or piston-propeller aircraft having wings with or without camber. Aircraft Design 2(4):183–190 23. Raymer DP (2004) Approximate method of deriving loiter time from range. J Aircraft 41(4):938–940 24. Villarroel J (2015) An optimal control framework for flight management systems. Master’s thesis, Concordia University, Montreal, Canada 25. Kaptsov M (2017) An energy efficient optimal control framework for general purpose flight management systems. Master’s thesis, Concordia University, Montreal, Canada 26. Kaptsov M, Rodrigues L (2018) Electric aircraft flight management systems: Economy mode and maximum endurance. J Guidance Control Dyn 41(1):288–293 27. Bert CW (1981) Prediction of range and endurance of jet aircraft at constant altitude. J Aircraft 18(10):890–892 28. Pakniyat A, Caines PE (2017) Hybrid optimal control of an electric vehicle with a dual-planetary transmission. Nonlinear Anal: Hybrid Syst 25:263–282 29. Ross IM, D’Souza CN (2005) Hybrid optimal control framework for mission planning. J Guidance Control Dyn 28(4):686–697 30. Soler M, Olivares A, Staffetti E (2010) Hybrid optimal control approach to commercial aircraft trajectory planning. Journal of Guidance, Control Dyn 33(3):985–991 31. Franco A, Rivas D (2015) Optimization of multiphase aircraft trajectories using hybrid optimal control. J Guidance Control Dyn 38(3):452–467 32. Soler M, Olivares A, Staffetti E, Zapata D (2012) Framework for aircraft trajectory planning toward an efficient air traffic management. J Aircraft 49(1):341–348 33. Sachs G, Christodoulou T (1986) Endurance increase by cyclic control. J Guidance Control Dyn 9(1):58–63 34. Menon PK, Sweriduk GD, Bowers AH (2007) Study of near-optimal endurance-maximizing periodic cruise trajectories. J Aircraft 44(2):393–398 35. Villarroel J, Rodrigues L (2016) An optimal control framework for the climb and descent economy modes of flight management systems. IEEE Trans Aerosp Electron Syst 52(3):1227– 1240 36. Zhao Y, Tsiotras P (2013) Analysis of energy-optimal aircraft landing operation trajectories. J Guidance Control Dyn 36(3):833–845 37. Stell L (2011) Prediction of top of descent location for idle-thrust descents. In: USA/Europe air traffic management R&D seminar, Berlin, Germany 38. Rodrigues L (2018) A unified optimal control approach for maximum endurance and maximum range. IEEE Trans Aerosp Electron Syst 54(1):385–391 39. Erzberger H, Lee H (1980) Constrained optimum trajectories with specified range. J Guidance Control 3(1):78–85 40. Patron RSF, Botez RM, Labour D (2014) Flight trajectory optimization through genetic algorithms coupling vertical and lateral profiles. In: International mechanical engineering congress and exposition
5 A Unified Hybrid Control Approach to Maximum Endurance of Aircraft
133
41. Franco A, Rivas D (2011) Minimum-cost cruise at constant altitude of commercial aircraft including wind effects. J Guidance Control Dyn 34(4):1253–1260 42. Randle WE, Hall CA, Vera-Morales M (2011) Improved range equation based on aircraft flight data. J Aircraft 48(4):1291–1298 43. Chakravarty A (1985) Four-dimensional fuel-optimal guidance in the presence of winds. J Guidance Control Dyn 8(1):16–22 44. Hale FJ, Steiger AR (1979) Effects of wind on aircraft cruise performance. J Aircraft 16(6):382– 387 45. El-Sayed A (2016) Fundamentals of aircraft and rocket propulsion. Springer, London, UK 46. Longmuir M, Ahmed NA (2009) Commercial aircraft exterior cleaning optimization. J Aircraft 46(1):284–290 47. King air 350i. [Online]. Available: http://beechcraft.txtav.com/en/king-air-350i 48. FAA (2016) Pilot’s handbook of aeronautical knowledge (FAA-H-8083-25B). Aircraft Performance
Emily Oelberg earned a Bachelor of Mechanical Engineering degree at Concordia University in 2016 and finished her Master of Applied Science in Electrical Engineering in 2018. Since finishing her master’s, Emily has worked in the flight simulation industry. Her research interests include optimal control theory and its aerospace applications, specifically flight management system algorithm optimization. Luis Rodrigues is a Professor at the Department of Electrical Engineering of Concordia University and the Founder of the Hybrid Control Systems Laboratory. He was the Director of Education at the Concordia Institute of Aerospace Design & Innovation from 2018 to 2020. He received his Ph.D. from the Department of Aeronautics and Astronautics of Stanford University in June 2002. During the last 2 years of his Ph.D. research, he was a visiting student at the Department of Aeronautics and Astronautics of MIT. He received his Masters’s and his “Licenciatura” degrees from the Technical University of Lisboa, Portugal. Prior to joining Concordia, Dr. Rodrigues worked as a consultant for speech recognition at Eliza Corporation, USA, and as a Project Manager for flight simulation applications at Ydreams, Portugal. At Concordia University, he leads a research group on optimization and control of aerospace systems with a focus on air vehicles and mobility. He is the author and co-author of 120 publications including journal articles, conference chapters, and a book titled “Piecewise Affine Control: Continuous Time, Sampled Data, and Networked Systems”. Professor Rodrigues is a senior member of the Institute of Electrical and Electronics Engineers (IEEE), a lifetime member of the American Institute of Aeronautics and Astronautics (AIAA), and a Professional Engineer registered in the Professional Engineers of Ontario (PEO). He is an Associate Editor of the IEEE Transactions on Aerospace and Electronic Systems.
Chapter 6
Information-Theoretic Autonomous Source Search and Estimation of Mobile Sensors Minkyu Park, Seulbi An, Hongro Jang, and Hyondong Oh
6.1 Introduction In recent years, the potential threat of accidental release of harmful chemical, biological, radiological, or nuclear substances into the atmosphere has significantly increased [1]. Prompt and accurate prediction of the spread and deposition of hazardous materials is critical to enable operators to implement appropriate mitigation strategies and allow people to evacuate from affected areas. However, since the concentration of substance dispersed in the atmosphere is highly nonlinear and affected by unstable wind disturbances (i.e., turbulence), sensor measurements are generally noisy, sporadic, and fluctuating. Thus, efficient and effective sensor data collection and accurate estimation of the source term, including the emission source origin and release strength, have become a challenging problem of high relevance. The use of pre-installed static sensor networks is a traditional source term estimation method for high-risk areas around chemical factories or power plants [2]. However, it has a limited response area, and only a fixed resolution of sensing information collection is possible. Manual sensor measurement collection is also popular for estimating hazardous material emission events that occur in an unexpected area. However, since sensor measurements are obtained using hand-held devices, they are time-consuming and may expose inspection personnel to risks. Autonomous search strategies using unmanned mobile sensors have received considerable attention over M. Park · S. An · H. Jang · H. Oh (B) Ulsan National Institute of Science and Technology, Ulsan, South Korea e-mail: [email protected] M. Park e-mail: [email protected] S. An e-mail: [email protected] H. Jang e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 A. L’Afflitto et al. (eds.), Control of Autonomous Aerial Vehicles, Advances in Industrial Control, https://doi.org/10.1007/978-3-031-39767-7_6
135
136
M. Park et al.
the past decade, which can allow human operators to be positioned away from hazardous areas. To this goal, unmanned aerial vehicles (UAVs) are frequently used as mobile sensors owing to their flexibility and versatility [3–6]. Search path planning and exploration algorithms are crucial for efficient source term estimation. Various search strategies for source term estimation that use mobile sensors have been proposed, which can be largely categorized into pre-planned path [7–10], reactive search [11–15], and cognitive search [16–19] approaches. Each approach has slightly different perspectives on solving source term estimation problems. The pre-planned approach generally focuses on estimation, and, hence, these algorithms do not consider generating efficient search paths. Neumann et al. [7] proposed gas dispersion mapping with wind estimation using a quadrotor UAV. Hutchinson et al. [8] conducted flight experiments to estimate the source term using a particle filter using a UAV following a pre-planned path. Gao et al. [10] employed a mobile sensor that followed a pre-planned spiral path and proposed a source term estimation method using a peak-suppressed particle filter method. The reactive search approaches to select the next sampling path according to the current sensor data. Many reactive search strategies are inspired by the movements of animals or insects; therefore, these strategies are often referred to as bio-inspired search strategies. Vogas et al. [11] proposed search algorithms that determine the future path by combining spiral and zigzag motions adaptively. Russell et al. [14] compared various bio-inspired source search algorithms such as the behavior of E. coli, silkworm moth, and dung beetle, and implemented these strategies in a mobile robot. Pyk et al. [13] proposed a surge-cast motion-based source search strategy while avoiding collision detected from a fisheye lens camera. In this approach, many studies assumed that the search is terminated when the mobile agent arrives within a certain distance from the source. This process is difficult to be applied to an actual source term estimation problem because the agent itself cannot know whether it has actually reached the source or not. To address this issue, some studies conducted source term estimation while performing a bio-inspired source search strategy; however, the search paths were still determined according to a reactive approach without using the estimated source term information. Neumann et al. [12] proposed a source term estimation method using a particle filter while a mobile sensor adopted surge-cast, zigzag, and pseudo-gradient-based search algorithms. Bourne et al. [15] proposed a source term estimation strategy for multi-mobile sensor networks that used particle filters while employing a surge-cast movement strategy. The cognitive search approaches estimate the source term and simultaneously determine the next sampling location. In particular, deciding the best sampling location is strongly related to the real-time estimation of the source term in these approaches. Many cognitive search strategies select an optimal sensing location by maximizing a certain utility function, which is related to the currently estimated source term. This chapter presents the cognitive search methods using information theory-based utility functions that can calculate the expected information gain. The information-theoretic source search strategies are known to be beneficial in a real turbulent gas dispersion situation in the atmosphere, which makes the measurements
6 Information-Theoretic Autonomous Source Search …
137
Fig. 6.1 Overall architecture of the information theory-based cognitive source search and source term estimation in a hazardous gas release scenario: (1) a UAV, which is equipped with a gas sensor, obtains concentration measurements; (2) the UAV estimates the source term using sensor measurements and known information; (3) the estimated source term is used to evaluate the future information gain based on the information theory and making the search decision; and (4) the UAV moves to the next desired sampling location by a position controller and cooperates with neighbors
sporadic and noisy [11, 20]. Both the illustrative hazardous gas release scenario and the overall architecture of the information-theoretic cognitive search method for source term estimation are shown in Fig. 6.1. One of the popular information-theoretic cognitive search strategies is Infotaxis, which was proposed by Vergassola et al. [20]. This algorithm demonstrates that a mobile agent can search a source origin without gradients. In this algorithm, the agent moves to reduce the entropy of the source term where the entropy is defined as the uncertainty of a source term distribution. Infotaxis is modified by replacing the estimation method from a grid-based discrete estimator with a particle filter, which can estimate not only the source origin but also other source parameters such as the release strength [21]. Hutchinson et al. [22] proposed another information-theoretic cognitive strategy named Entrotaxis. This algorithm is based on maximum entropy sampling, which uses the fact that collecting measurements at a highly uncertain location (i.e., a location of a high entropy) is likely to lower the uncertainty of the target distribution. Park et al. [18] proposed Gaussian mixture model (GMM) Infotaxis by adopting not only a particle filter to estimate the source term in a continuous domain but also GMM to select the next sensing location in the continuous domain. UAVs are able to navigate in an obstacle-free environment in most cases as well as to avoid obstacles in obstacle-rich urban environments, indoors, and in areas affected by severe accidents. Lu et al. [23] adopted a potential function to avoid obstacles while selecting a control action by minimizing the entropy of the source term. Zhao et al. [24] also proposed an Infotaxis-based source term estimation strategy that marked some visited areas as forbidden areas to allow an agent to passively leave from
138
M. Park et al.
dangerous areas. However, many studies considered a one-step myopic reward, which could lead to an inefficient search path in obstacle-rich environments. To improve the source term estimation performance in a complex environment by considering multiple-step ahead information, Park et al. [25] adopted the receding-horizon (RH) path-planning approach, and An et al. [26] utilized the rapidly exploring random tree (RRT) method. An additional solution to the collision-free trajectory planning problem for UAVs operating in unknown environments is provided in Chap. 3. The information-theoretic cognitive search framework is extended to various source search strategies for multiple UAVs. Masson et al. [27] extended Infotaxis to a multi-agent algorithm. However, it assumed that the next sensing positions of all agents are selected by a centralized decision-maker, which requires a high communication load. Ristic et al. [28] proposed a decentralized information-theoretic source term estimation strategy for multiple agents by adopting consensus-based formulation control. It allows more flexible operation of multiple agents but limits them to move in the same direction while gathered closely around. Park et al. [19] proposed cooperative Infotaxis that adopts a game-theoretic and negotiation approach to determine a group search decision in a decentralized manner for multiple UAVs. The negotiation approach allows each agent to move to its own best sensing location by considering the decision of the other group members, which improves the exploration and group search performance. Karpas et al. [29] proposed a source term estimation strategy for a multi-mobile sensor network by adopting the Kullback– Leibler divergence as a social utility function, which is named Socialtaxis. Then, improved Socialtaxis was proposed by Jang et al. [30]. Improved Socialtaxis involved a new information-theoretic utility function incorporating Socialtaxis with an additional exploitation term and the Rao–Blackwellized particle filter-based multi-sensor fusion. This chapter introduces the aforementioned recent source term estimation algorithms with the required relevant techniques such as particle filter and informationtheoretic measures. Demonstrations of the search performance using numerical simulations and flight experiments are also presented. The overall flow scheme of the information-theoretic approach for source term estimation is shown in Fig. 6.2. In particular, the main contribution of this chapter can be summarized as follows: • gas dispersion models and gas detection sensor models are introduced. Those models can be used to simulate a gas leakage accident situation or estimate the source term with Bayes’ inference; • the estimation methods based on Bayes’ inference are introduced. In particular, the sampling-based approaches, known as particle filter and Rao–Blackwellized particle filter, are introduced. The difference between each filter is introduced, and the implementation of the source term estimation is proposed in this chapter; • since the information theory-based source term estimation approaches are one of the most recent studies, and it is proven that they show good performance in turbulent gas dispersion situations, several information-theoretic measures for source term estimation are introduced;
6 Information-Theoretic Autonomous Source Search …
139
Fig. 6.2 The flow scheme of the information-theoretic source term estimation approach: (1) the UAV collects sensor data at the current location; (2) the agent estimates the source term; (3) the estimated source term is used to evaluate the utility function, which consists of information-theoretic measures; and (4) single UAV or multi-UAV network moves to the desired next sampling location
• four information-theoretic source term estimation algorithms for single UAV are introduced, and three algorithms for multiple UAVs are introduced in this chapter; and • several evaluation scenarios, such as the computational fluid dynamics (CFD) simulation, real gas dispersion dataset, and real flight experiment, are discussed, and the search performance of each algorithm is verified in various scenarios. This chapter is structured as follows. Section 6.2 presents the list of symbols. Section 6.3 presents the problem descriptions, including the gas dispersion and mobile-sensor models. Section 6.4 discusses some estimation methods based on Bayesian inference, such as the Bayes and particle filters. Section 6.5 introduces information-theoretic measures, which are used to design the utility function for search strategies. Sections 6.6 and 6.7 introduce the information-theoretic source term estimation strategies for single UAV and multiple UAVs, respectively, with demonstrations of the search performance from numerical simulations and flight experiments. Finally, conclusions and future work are provided in Sect. 6.8.
140
M. Park et al.
6.2 Notation In the following, we list and discuss some of the most commonly used terms in this chapter: r [x, y, z]T rˆ [x, y]T q θ [x, y, q]T θˆ [x, y]T = rˆ p [x, y, z]T μc V ζ H τ cP cG c ∈ [c P , cG ] P(·), Q(·) P(x|y) w(·) Np H (Θ) D K L (P||Q) I (Θ; Φ) U (·) u T ψ Π T, MT, Σ T Π M Σ ρ Nm Mn β
Three-dimensional source location [m] Two-dimensional source location [m] Gas release strength [g/s] Source term (source location and release strength) Source term (source location) Sensing location of mobile sensor [m] Mean gas concentration calculation function Mean wind velocity [m/s] Diffusivity of the gas plume [m2 /s] Stacking height of the plume (Gaussian plume model) [m] Lifetime of the emitted gas particle (Isotropic plume model) [s] Measured number of encountered particles (Poisson sensor model) Measured gas concentration (Gaussian sensor model) [g/m3 ] Gas sensor measurement Probability distribution Conditional probability distribution (x given y) Weight of sample (i.e., particle) for particle filter Number of sample (i.e., particle) for particle filter Entropy of θ (information theory) Kullback–Leibler divergence between P and Q Mutual information for random variables θ and φ Utility function for optimal path planning Control action for mobile sensor (waypoint) Gaussian mixture model (GMM) parameters Set of mixing coefficient for GMM Set of the mean for GMM Set of covariance for GMM Discount factor of receding horizon approach Number of measurement sampling for receding horizon approach Set of neighbor of the agent n Balancing parameter between social and individual action for Improved Socialtaxis
6.3 Problem Description This section explains the basic concept of the source term estimation problem including the mathematical formulation of the source and mobile-sensor models. Several types of sources are available, which generate an invisible field such as gas, radiation, and magnetic fields, and this study mainly focuses on the gas emission source. Note that switching the type of source is not difficult by changing the source model. Estimating the source term (i.e., the parameters of the gas dispersion model) and
6 Information-Theoretic Autonomous Source Search …
141
forecasting the influenced area are the objectives of the source term estimation problem. Thus, a mobile agent is assumed to be equipped with a gas sensor that measures hazardous gas concentration. The dispersion models of the released gas into the atmosphere are first introduced, and the mobile-sensor models are described afterward.
6.3.1 Gas Dispersion Model The expected gas concentration can be calculated using various gas dispersion models, ranging from the analytical gas dispersion models [20, 31] to complex CFD models [32]. The computational burden and forecasting accuracy are considered to select an appropriate dispersion model in each source term estimation study. Since the lightweight onboard computers of autonomous aerospace systems such as UAVs have limited computational power, this chapter utilizes two analytical gas dispersion models that have a low computational burden, namely, Gaussian [31] and isotopic [20] plume models. These gas dispersion models assume that gas dispersion has already reached a steady state, which means that the expected concentration only changes relative to the location. The parameter set used in the dispersion model is called a source term θ , which includes mean wind speed and diffusivity among others. Since each gas dispersion model has a different source term, some common and essential elements (i.e., two-dimensional source location and release strength) are used as elements of the source term, that is, θ [xs , ys , qs ]T .
6.3.1.1
Gaussian Plume Model
The Gaussian plume model assumes that the gas is continuously released from a 3 stationary point source located at rs [xs , ys , z s ]T ∈ R+ with a release strength qs ∈ R+ [31]. The mean gas concentration at each sensing location p[x, y, z]T can be calculated as (z − H )2 (z + H )2 qs Δy 2 exp + exp , exp − 2 μc (p, θ ) = 2π V ζ y ζz 2ζ y 2ζz2 2ζz2 (6.1) where θ represents source term, V denotes the mean wind speed, and H denotes the stacking height of the emitted gas. It is assumed that the wind blows in the x direction, and, hence, y and z represent the horizontal and vertical crosswind directions, respectively. The crosswind distance from the source in the y direction is expressed as Δy = y − ys . Turbulent diffusivity parameters ζ y and ζz are determined by the wind conditions, as described in [33].
142
6.3.1.2
M. Park et al.
Isotropic Plume Model
The isotropic plume model [20] calculates the mean gas concentration of a turbulent plume in the atmosphere by assuming that the plume consists of gas particles. The plume model assumes that gas particles are continuously released from a stationary point source located at rs with the fixed release strength qs and the expected mean concentration μc at the sensing location p are calculated as −(y − ys )V |p − rs | qs exp exp − , μc (p, θ ) = 4π ζ |p − rs | 2ζ λ
where λ=
ζτ 1+
V 2τ 4ζ
.
(6.2)
(6.3)
It is assumed that the wind blows in the −y direction in this model. Thus, x and z represent the crosswind directions, respectively. Isotropic effective diffusivity ζ influences the diffusion in the crosswind direction when the gas is advected by the mean wind velocity V . Finally, it is assumed that the gas particle can be maintained for the lifetime τ in this plume model.
6.3.2 Mobile-Sensor Models It is assumed that the mobile sensing agent can hover in the air like a multi-rotor UAV. The sensor installed in the UAV is modeled using probability distribution functions because the sensor measurement is a stochastic process in nature. In recent studies, the Poisson and Gaussian distributions have been used as gas sensor models to represent discrete and continuous noisy sensors, respectively. 6.3.2.1
Poisson Distribution-Based Sensor Model
This sensor model is often called a particle encounter sensor model because the random variable represents the number of particles encountered by the sensor during the sensing time. The probability of a sensor measurement c P ∈ Z+ measured at p, which is influenced by the source term θ can be expressed in conditional probability as P (c P |θ ) =
(μc (p, θ )Δt)c P exp −μc (p, θ )Δt , cP !
(6.4)
where μc (p, θ ) ∈ R+ denotes the modeled mean number of encountered particles given by (6.1) or, alternatively, (6.2), and c P ! denotes the product of the first c P positive integers; additional details on conditional probability can be found in [34].
6 Information-Theoretic Autonomous Source Search …
6.3.2.2
143
Gaussian Distribution-Based Sensor Model
Many real gas sensors measure the continuous gas concentration, instead of the number of encountered particles. The Gaussian probability model is a conservative choice to represent a continuous random variable [8, 35–37]. The probability of obtaining a stochastic sensor value cG ∈ R+ can be calculated as cG − μc (p, θ ) 2 1 P(cG |θ ) = N(cG ; μc (p, θ ), σ ) = √ exp − , √ σ 2π 2σ
(6.5)
where μc (p, θ ) denotes the modeled mean concentration. Illustrative maps of the expected sensor observation using the aforementioned plume and sensor models are shown in Fig. 6.3. The puff that contains a 10 mg/m3 concentration is assumed as one particle by the Poisson sensor (i.e., particle encounter sensor) model.
Gaussian sensor
Poisson sensor
Gaussian plume
Isotropic plume
Fig. 6.3 Illustrative map of the expected observation using different plume and sensor models. The employed source parameters are xs = 204 m, ys = 210 m, and qs = 2 g/s. The wind direction and velocity are 220◦ and 2 m/s, respectively
144
M. Park et al.
6.4 Source Term Estimation Methods The set of parameters for forecasting the source dispersion, such as the source location, emission strength, and wind speed and direction, are called source terms. Although each gas dispersion model has a different source term set, some essential parameters, such as source location and release strength, are included as fundamental elements in almost all models. In addition, estimating the source location and strength is crucial for undertaking appropriate mitigation strategies; other parameters are assumed to be obtained using meteorological data [15, 21, 22, 38]. Thus, the vector of a two-dimensional (2D) source location and the release strength are used 3 as source terms, i.e., θ [xs ys qs ]T ∈ R+ . There are two different approaches to calculating the posterior probability distribution, namely, the Frequentist and the Bayesian approach. The Bayesian approach is an estimation method that utilizes prior knowledge such as the aforementioned gas dispersion and sensing models. The Frequentist approach uses observations only [39]. The Bayes filter method updates its posterior distribution recursively using a Bayesian approach when the new measurement is obtained. Since known gas dispersion models and sensor models exist and the mobile sensor takes measurements repeatedly, the Bayes filter is adopted to estimate the source term in this study. If calculating the exact posterior probability distribution is difficult, then the samplingbased approach, known as particle filter, can be used. The particle filter approximates the posterior function with samples. As the particle filter often requires high computational power, the Rao–Blackwellized particle filter is proposed to reduce the computational burden using a decoupling method.
6.4.1 Bayes Filter The Bayes filter is a statistical estimation method that utilizes the Bayes’ theorem to calculate the posterior probability of a target (i.e., source term in this study). Bayes’ theorem allows calculating the posterior probability of the source term P (θ |c) using the observation model P (c|θ ), the prior knowledge of the source term P (θ ), and the evidence P (c) as P (θ |c) =
P (c|θ ) P (θ ) . P (c)
(6.6)
Since gas information is sequentially acquired over time, the posterior probability at each time can be calculated. All collected measurements along the sensing
6 Information-Theoretic Autonomous Source Search …
145
T trajectory c1:k = c1 (p1 ), · · · , ck (pk ) are required to exactly calculate the posterior probability. Note that the term c can denote c P , cG , or other expected mean concentration calculated using a gas dispersion model. Markov assumption is often adopted to reduce the computational burden. It is assumed that the currently estimated probability P (θk |ck ) implies all previous information and each measurement is assumed independent. Thus, the prior probability at the kth time step, namely, P (θk |ck−1 ), is determined using the propagation of the posterior probability at the (k − 1)th time step, i.e., P (θk−1 |ck−1 ), with the transition probability from state θk−1 to θk , P (θk |θk−1 ). This approach is called recursive Bayesian estimation, also known as the Bayes filter [34]. The newly updated probability of the source term using the Bayes filter is calculated as P (θk |ck ) =
P (ck |θk ) P (θk |ck−1 ) , P (ck |ck−1 )
(6.7)
where P (θk |ck−1 ) =
P (θk |θk−1 ) P (θk−1 |ck−1 ) dθk−1 .
(6.8)
The observation model P (ck |θk ) can be replaced by (6.4), (6.5), or alternative sensor models. Since the evidence probability p (ck |ck−1 ) is not a function of the random variable θk , it can be assumed as a constant coefficient and calculated using marginalization as
P (ck |ck−1 ) =
P (ck |θk ) P (θk |ck−1 ) dθk−1 ;
(6.9)
more details on the marginalization can be found in [34]. The Bayes filter becomes equal to the Kalman filter under the assumption that the observation and transition functions are linear and the random variables are normally distributed [34]. Discrete Bayes filters such as the grid-based Bayes or particle filter can be adopted whenever the target function is highly nonlinear and non-Gaussian. In the next section, the Monte Carlo random sampling-based Bayes filter, also known as the particle filter, is introduced.
6.4.2 Particle Filter The estimation accuracy of the grid-based Bayes filter depends on the grid size. If the grid size is reduced to increase the estimation performance, then the computational burden and the number of wasted grids that have no meaningful information, increase. Thus, a sampling-based Bayes filter is often employed to approximate a highly nonlinear non-Gaussian posterior probability distribution by overcoming the limitations of the grid-based Bayes filter. The continuous random variables are
146
M. Park et al.
T approximated using sampled potential source terms θ (i) 1≤i≤N p , where N p denotes the total number of particles. Theoretically, samples should be taken from the target distribution. However, since taking a sample from an unknown target distribution is impossible, particles are drawn from the proposed distribution with associated weight w(i) . The approximation of the posterior distribution at the kth time step can be expressed as Np wk(i) δ(θk − θk(i) ), (6.10) P (θk |ck ) ≈ i=1
where δ(·) denotes the Dirac Delta function. Since the particle filter underlies the Markov assumption, the probability distribution P(θk |ck−1 ) expressed in (6.8) is used as the proposed distribution for the kth time step. The associated weight represents the likelihood of each particle θ (i) , and can be expressed as wk(i) = P(θk(i) |ck ).
(6.11)
To reduce the computational burden, it is assumed that the source term is time(i) ) = 1, or equivainvariant, which means that the transition probability P(θk(i) |θk−1 (i) (i) lently θk = θk−1 [19, 21, 22]. Thus, the source term prediction function in (6.8) can be expressed as (i) (i) |ck−1 ) = wk−1 . (6.12) P(θk(i) |ck−1 ) = P(θk−1 Then, similar to (6.7), the weight of each particle can be updated as (i) P(ck |θk(i) ) · wk−1 (i) ∝ P(ck |θk(i) ) · wk−1 . wk(i) = P(θk(i) |ck ) = N p (i) (i) P(c |θ ) · w k k−1 i=1 k−1
(6.13)
Note that since the denominator of (6.13) is not a function of θ , it is simply a constant normalization coefficient, as described in (6.9). The particle filter often suffers from a degeneracy problem, which means that only a few particles have nonzero weights. To avoid this problem, a resampling method is used with the effective number n e f f = N p 1 (i) 2 [40]. The effective number i=1 (wk
)
indicates the dispersion or concentration of a distribution. If the effective number is lower than a threshold (i.e., weights are concentrated to a few particles), then the resampling algorithm duplicates the high-weight particles and deletes the near-zeroweight particles. Additionally, to increase the particle diversity after resampling, some studies employ the Markov chain Monte Carlo methods [18, 22, 40].
6 Information-Theoretic Autonomous Source Search …
147
6.4.3 Rao–Blackwellized Particle Filter The Rao–Blackwellized particle filter [41] can be used to reduce the computational burden of the particle filter. This algorithm assumes that some random variables are independent of one another. In the source term estimation problem, the gas release strength and the source location can be considered independent of each other. In this case, the posterior probability can be factorized using the chain rule as P(rk , qk |ck ) = P(qk |rk , ck ) · P(rk |ck ).
(6.14)
The first probability term on the right-hand side of (6.14) is assumed to be a gamma distribution for the source term estimation as [41] qk(κ−1) e−qk /η , ηκ Γ (κ)
P(qk |rk , ck ) = G(qk ; κ, η) =
(6.15)
where κ = κ0 +
Nv
ck,n ,
(6.16a)
n=1
η=
1 + η0
η0 Nv n=1
ρ(rk,n )
,
ρ(rk,n ) = μc (pk,n , θk )/qk,n ,
(6.16b) (6.16c)
κ0 and η0 denote hyper-parameters of the prior probability of the release strength, P(qk ), which is also assumed as the gamma distribution G(qk ; κ0 , η0 ), and Γ (·) denotes a gamma function. The second probability term in (6.14) can be expressed using the particle filter as P (rk |ck ) ≈
Np
wk(i) δ(rk − rk(i) ).
(6.17)
i=1
Similar to (6.13), Bayes’ theorem is adopted to update the particle filter weight as wk(i) ≈ P(rk(i) |ck ) = ∝
P(ck |rk(i) )P(rk(i) |ck−1 ) N p (i) (i) i=1 P(ck |r k )P(r k |ck−1 ) (i) P(ck |rk(i) )wk−1 .
(6.18a) (6.18b) (6.18c)
However, the likelihood P(ck |rk(i) ) is unknown for this weight update. To compute this term, both Bayes’ theorem and (6.15) are used as
148
M. Park et al.
P(qk |rk(i) , ck ) =
P(ck |rk(i) , qk )P(qk |rk(i) ) P(ck |rk(i) )
.
(6.19)
Since qk and rk are independent, it holds that P(qk |rk(i) ) = P(qk ), which is the prior distribution of the release strength, which is assumed to be the gamma distribution G(qk ; κ0 , η0 ) as mentioned earlier. Then, by re-arranging (6.19), the likelihood P(ck |rk(i) ) can be obtained by using the gamma distribution and the sensor measurement model, (e.g., (6.4)), as P(ck |rk(i) ) = =
P(ck |rk(i) , qk )P(qk ) P(qk |rk(i) , ck ) P(c|θk(i) )G(qk ; κ0 , η0 ) . G(qk ; κ, η)
(6.20)
6.5 Information-Theoretic Measures This section introduces the information-theoretic measures for autonomous source term estimation for mobile sensors. In particular, we present entropy, Kullback– Leibler divergence, and mutual information. The entropy from the information theory indicates the average “information” or “uncertainty” of the target probability distribution. This is a basic information-theoretic measure, which is needed to calculate other information-theoretic quantities, and, for this reason, it is computed first. Next, the Kullback–Leibler divergence D K L , also known as relative entropy, is introduced. The Kullback–Leibler divergence is another basic measure for calculating how a probability distribution (P) is different from a reference probability distribution (Q) for the same random variable θ . Finally, mutual information is introduced in this section. The mutual information is a special type of D K L between the joint and product probabilities of the two random variables.
6.5.1 Entropy Entropy is a statistical measure that indicates the uncertainty of the probability distribution. The entropy H can be calculated as H (Θ) = −
P(θ ) log P(θ ) = EΘ − log P(θ ) ,
(6.21)
θ∈Θ
where P(θ ) indicates the target probability mass function, i.e., the source term probability function. Entropy can be considered as an expectation of − log P(θ ), and
6 Information-Theoretic Autonomous Source Search …
149
it is often called Shannon entropy after Shannon’s seminal work [42]. The binary logarithm log2 is used to calculate the Shannon entropy, and the natural logarithm ln and the decimal logarithm log10 are simply constant multiples of one another. In contrast to the variance, which calculates the uncertainty according to how far a sampled random variable is from the mean, the entropy uses probability values to calculate its uncertainty [43]. Thus, the entropy does not need to consider the shape of the probability distribution; hence, it enjoys the advantage of easily calculating the uncertainty in multi-modal and arbitrary non-Gaussian distributions.
6.5.2 Kullback–Leibler Divergence The Kullback–Leibler divergence between probability mass functions P(θ ) and Q(θ ) is defined as P(θ ) . (6.22) P(θ ) log D K L (P||Q) Q(θ ) θ∈Θ This equation is often used to represent how different two probability functions are for the same random variable. The Kullback–Leibler divergence is proven to be non-negative and zero if and only if P = Q for any arbitrary distributions [43].
6.5.3 Mutual Information Mutual information I (θ ; φ) denotes one special case of D K L between the joint probability mass function P(θ, φ) and the product probability mass function P(θ )P(φ) for two random variables θ and φ. Furthermore, the mutual information is nonnegative and zero if and only if θ and φ are independent. The mutual information for θ and φ is defined as I (Θ; Φ)
P(θ, φ) log
θ∈Θ,φ∈Φ
P(θ, φ) . P(θ )P(φ)
The mutual information can be written in the entropy form as
I (Θ; Φ) =
P(θ, φ) log
P(θ, φ) P(θ )P(φ)
P(θ, φ) log
P(θ |φ) P(θ )
θ∈Θ,φ∈Φ
=
θ∈Θ,φ∈Φ
=−
θ∈Θ,φ∈Φ
P(θ, φ) log P(θ ) +
θ∈Θ,φ∈Φ
P(θ, φ) log P(θ |φ)
(6.23)
150
M. Park et al.
=−
P(θ ) log P(θ ) +
θ∈Θ
= H (Θ) −
P(φ)P(θ |φ) log P(θ |φ)
θ∈Θ,φ∈Φ
P(φ)H (Θ|Φ = φ)
φ∈Φ
= H (Θ) − EΦ [H (Θ|Φ = φ)] = H (Θ) − H (Θ|Φ).
(6.24)
Equation (6.24) shows that the mutual information for Θ and Φ represents the entropy difference between the entropy of θ and that of θ , conditioned on φ. Therefore, it indicates how much the uncertainty of the probability of the random variable θ decreases as a new variable φ is obtained. For the source term estimation, new measurements or measurements from neighbors can be employed as φ when the source term is θ . More detailed implementations using these information-theoretic concepts are introduced in the following sections.
6.6 Source Term Estimation Using a Single UAV This section introduces the information-theoretic search strategies for source term estimation using a single autonomous UAV. Infotaxis, which is one of the most basic cognitive source term estimation algorithms for a mobile sensor, uses informationtheoretic measures, and some variations and expansions of the Infotaxis are introduced [19–22]. This section first introduces the basic idea of Infotaxis, which is the maximization of the expected reduction in the entropy where a mobile agent moves to reduce the entropy (i.e., uncertainty) of a source term. Then, the improved Infotaxis method by selecting more effective sampling locations using Gaussian mixture model clustering [18] is introduced. Lastly, the Infotaxis variants that use the receding horizon approach and RRT are introduced.
6.6.1 Infotaxis Infotaxis [20] is the utility-based optimal search approach where the next best sensing position is determined by maximizing its utility value. The utility function of this strategy involves the expected reduction in the entropy of the source term. Note that the original Infotaxis adopts a grid-based Bayes filter to estimate only 2D source location rˆ [x, y]T and the grid-based movement model. Thus, the source term is θˆ = rˆ . The mobile agent selects the best sampling location among the neighboring grid cells by maximizing the utility function U (u k ) as
6 Information-Theoretic Autonomous Source Search …
u ∗k = arg max [U (u k )] , u k ∈u
151
(6.25)
where U (u k ) = P(θˆ = pk+1 ) H (Θˆ k ) − 1 − P(θˆ = pk+1 ) EC H Θˆ k+1 |Ck+1 = ck+1 (u k ) − H (Θˆ k ) . (6.26) This utility function is balanced between exploitation (first term) and exploration (second term) using the source-existence probability at the next sampling position with a choice of action u k ∈ u[↑, ↓, ←, →]T . In the second term, EC H Θˆ k+1 |Ck+1 = ck+1 (u k )) represents the expectation of the entropy given a random future measurement ck+1 , which is obtained at the next sampling position determined by theaction u k . This equation is the same as the definition of the ˆ conditional entropy H Θk+1 |Ck+1 . Thus, the utility function is expressed as U (u k ) = P(θˆ = pk+1 ) H (Θˆ k ) − 1 − P(θˆ = pk+1 ) H Θˆ k+1 |Ck+1 − H (Θˆ k ) .
(6.27) In addition, the second term of the utility function is precisely the mutual information between the source term θˆ and future observation ck+1 with the assumption that the source term is time-invariant (i.e., θˆk = θˆk+1 ) as mentioned earlier [44]. Thus, the utility function can also be expressed as U (u k ) = P(θˆ = pk+1 ) H (Θˆ k ) + 1 − P(θˆ = pk+1 ) I (Θˆ k ; Ck+1 ).
(6.28)
Ristic et al. [21] implemented the particle filter to the Infotaxis scheme and the modified utility function into a simpler form with similar search performance. The particle filter could easily expand the dimension of a target random variable. Therefore, not only the source location but also the source release strength q could be conT sidered in the source term in this algorithm; thus, θ θˆ T , q = [x, y, q]T . Because the first term in (6.28) is meaningful only when the source is located at the next sampling position, only the second term with the assumption P(θˆ = pk+1 ) = 0 is used to simplify the utility function as U (u k ) = I (Θk ; Ck+1 ) = H (Θk ) − H (Θk+1 |Ck+1 ).
(6.29)
The source term is estimated using the particle filter, and each particle θ (i) represents a random variable with associated weight w(i) . These weights approximate the source term probability distribution as expressed in (6.17). Thus, the entropy of the source term, which is approximated from (6.21) using the particle filter at the kth time step, is calculated as
152
M. Park et al.
Algorithm 6.1 Infotaxis k=0 SEARCH = ON while SEARCH == ON do k =k+1 ck ← read a new sensor measurement Calculate P(θk |c1:k ) using (6.17) for all u k ∈ u do Calculate pˆ k+1 Calculate μc (pˆ k+1 , θ) using (6.2) for ck+1 = 1 : cmax do Calculate P(θk+1 |c1:k+1 ) using (6.17) end for Calculate U (u k ) using (6.29) end for Select u ∗k using (6.25) Update pk+1 if Source Found then SEARCH = OFF end if end while
H (Θk ) =
Np
wk(i) log wk(i) .
(6.30)
i=1
The performance of the source term estimation using (6.29), also known as Infotaxis II utility, is demonstrated to be similar to that of the original Infotaxis with a less computational burden [21]. The pseudo-code of Infotaxis is shown in Algorithm 6.1.
6.6.2 Entrotaxis Entrotaxis [22] is another cognitive search strategy that uses the particle filter to estimate the source term. This strategy uses the entropy of the future measurement as a utility function instead of the entropy of the source term. The maximum entropy sampling principle [45] is used to guide the UAV in Entrotaxis. This requires obtaining the sensor measurement at the location with the high measurement entropy, which indicates the location where sufficient information to predict the quantity of the measurement is not given. The utility function of Entrotaxis is expressed as c max
U (u k ) = H (Ck+1 ) =
P(ck+1 (u k )|c1:k ) log P(ck+1 (u k )|c1:k ).
ck+1 =0
The optimal decision u ∗k is selected by maximizing the entropy, i.e.,
(6.31)
6 Information-Theoretic Autonomous Source Search …
u ∗k = arg max [U (u k )] . u k ∈u
153
(6.32)
The probability of the future measurement P(ck+1 (u k )|c1:k ) by selecting the decision u k can be calculated using the marginal probability similar to (6.9) with the particle filter as Np P(ck+1 (u k )|θk(i) )P(θk(i) |c1:k ). (6.33) P(ck+1 (u k )|c1:k ) = i=1
The measurement probability given the source term P(ck+1 (u k )|θk(i) ) can be replaced with sensor models such as (6.4), and the probability of the source term approximated using the particle filter P(θk(i) |c1:k ) can be represented by the corresponding weight wk(i) .
6.6.3 Infotaxis Using Gaussian Mixture Model The Infotaxis strategy requires the computation of the utility function for all neighboring sensing location candidates to determine the optimal decision. Thus, the computational burden of the Infotaxis algorithm increases as the number of next-sampling location candidates increases. Although Ristic et al. [21] improved the estimation method a the particle filter that allows estimating the source term in a continuous domain, the mobile sensor still moves on the grid cells. A method that selects better sensing location candidates in a continuous search domain using the Gaussian mixture model (GMM) instead of the grid action set was proposed in [18]. This strategy is called GMM-Infotaxis, and it uses the particle filter to estimate the source term and clusters the particles using the GMM. The next sampling location is selected toward the mean of each Gaussian distribution in the GMM with a fixed movement action distance as shown in Fig. 6.4. The GMM is a probability function comprising several Gaussian distributions as P(θ |ψ) =
L
πl N(θ ; m i , Σi ),
(6.34)
l=1
T where ψ Π T , M T , Σ T denote GMM parameters and include a set of mixing coefficient, mean, and covariance, that is, Π [π1 , π2 , . . . , π L]T , M [m 1 , m 2 , . . . , m L ]T , and Σ[Σ1 , Σ2 , . . . , Σ L ]T , respectively. The number of Gaussian distributions in the GMM L is a user parameter that is set to 3 in this chapter. Note that this GMM-Infotaxis strategy is activated only when the variance of particles is less than a threshold (i.e., particles are converged to some extent); otherwise, the original Infotaxis is used.
154
M. Park et al.
Fig. 6.4 The next sampling location candidates of GMM-Infotaxis Table 6.1 The results of 200 Monte Carlo simulations of GMM-Infotaxis and Infotaxis for different values of the release rate q. SR denotes the success rate and MST denotes the mean search time d qs 0.2 0.5 1 5 10 1m
5m
Infotaxis SR [%] MST [steps] GMM-Infotaxis SR [%] MST [steps] Infotaxis SR [%] MST [steps] GMM-Infotaxis SR [%] MST [steps]
84.5 159.3
99.0 112.4
100 76.1
100 45.5
97.0 44.8
88.0 151.4
100 104.3
100 74.2
99.5 41.1
99.5 43.1
21.0 164.5
41.0 130.8
65.5 108.2
94.5 62.1
95.5 32.7
92.0 140.2
99.5 79.2
99.5 47.0
99.5 18.4
99.5 14.5
Table 6.1 shows a comparison of the search performance of Infotaxis and GMMInfotaxis in terms of estimation accuracy, success rate (SR), and mean search time (MST). The search performance of GMM-Infotaxis is superior to that of Infotaxis regardless of the step size of the movement and release strength qs . The superior performance of GMM-Infotaxis is particularly evident in the case of the 5 m-step size; Infotaixs cannot properly estimate the source term, and the search time becomes longer at a low release strength because only a few grid cells have meaningful measurements in a sparse grid compared with a low background concentration. In contrast, as GMMInfotaxis can flexibly select the next sampling position in a continuous domain rather
6 Information-Theoretic Autonomous Source Search …
155
Fig. 6.5 Snapshot and setup of the outdoor flight experiment Table 6.2 The average performance comparison between Infotaxis and GMM-Infotaxis of the outdoor flight experiments Estimation error (RMSE) Location x Location y Release Mean search time [steps] [m] [m] rate [mg/s] Infotaxis GMM-Infotaxis
2.14 0.756
3.66 0.776
1824 486.5
44.7 36.8
than the grid cell, it can obtain more useful sensor measurement, which reduces the search time and increases the search accuracy even at low release strength conditions. Flight experiments demonstrated that the search performance of GMM-Infotaxis is superior to that of the Infotaxis strategy even in uncontrolled real environments. The experimental setup and snapshot of the flight experiment are shown in Fig. 6.5. The source term estimation errors and search time of the GMM-Infotaxis are lower than those of the Infotaxis regardless of the wind condition. The average result over ten experiments for each is shown in Table 6.2. The representative experimental result of GMM-Infotaxis are shown in Fig. 6.6. Figure 6.6a, b show that GMM-Infotaxis can select the next best sampling location in the continuous domain, which is significantly useful around the source because the UAV can approach closer to the source where it can select the next sampling location in the continuous domain instead of just moving in the grid locations.
6.6.4 Infotaxis Variants in Obstacle-Rich Environments Hazardous gas release accidents can often occur in obstacle-rich environments such as urban and industrial areas. Although a vehicle is used, obstacles such as high buildings and factory chimneys exist in these environments. The improved Infotaxis-based source term estimation algorithm for UAVs in an obstacle-rich environment, which is called RH-Infotaxis, is thus proposed [25]. RH-Infotaxis considers the long-term
156
M. Park et al.
(a) 21th time step
(b) 31th time step
Fig. 6.6 Representative GMM-Infotaxis results from the flight experiment. Both a and b show the results of the search and estimation in different time steps
search path (i.e., multiple time steps) for the source term estimation and chooses the optimal path that maximizes the expectation of the possible utility functions along the path. Since computing all possible utility functions is computationally very expensive, the randomly sampled future measurement and binary sensor approximation, which contains only detection and non-detection cases, are used. The utility function is calculated as ( path) I (u k:k+K −1 )
( path)
Nm ( path,m) Ik+n−1 = ρ , Nm n=1 m=1 K
n
(6.35)
where I (u k:k+K −1 ) denotes the expected gain of information when an agent follows the ( path)th for K time steps in the future, the belief ratio 0 ≤ ρ ≤ 1 is introduced as a discount factor, and Nm represents the number of measurement sampling. The local utility function for the (k + n − 1)th time step on the selected path with mth ( path,m) sampling case is expressed as Ik+n−1 , and the utility function of the Infotaxis is adopted as a local utility function. After choosing the optimal path, the agent follows only the first step in the path. Simulation results in an obstacle-rich environment using Infotaxis and RH-Infotaxis are shown in Fig. 6.7. It is assumed that 2D LiDAR is equipped on the UAV and the agent can build an obstacle map along the trajectory. Infotaxis is a greedy algorithm, which means that the mobile agent considers only the next sampling position. This characteristic creates a failure when the agent makes a detour that is currently inefficient but can obtain considerable source information in
6 Information-Theoretic Autonomous Source Search …
(a) Infotaxis
157
(b) RH-Infotaxis
Fig. 6.7 Results of the illustrative runs of Infotaxis and RH-Infotaxis
(a) Infotaxis
(b) RRT-Infotaxis
Fig. 6.8 Results of illustrative runs of infotaxis and RRT-Infotaxis
the future. RH-Infotaxis considers several steps forward, and the potential information gain that can be obtained in the future influences the current movement. Thus, it can easily avoid obstacles and find a successful search path in an obstacle-rich environment. A path planning approach for source term estimation similar to RH-Infotaxis but considers a continuous search domain similar to GMM-Infotaxis by adopting an RRT was proposed in [26]. This algorithm is called RRT-Infotaxis. The representative simulation results of Infotaxis and RRT-Infotaxis in the same obstacle environment are shown in Fig. 6.8. Similar to the results of GMM-Infotaxis, RRT-Infotaxis can estimate the source term faster than Infotaxis with four predefined movements.
158
M. Park et al.
6.7 Source Term Estimation Using Multiple UAVs This section introduces source term estimation strategies for multiple UAVs. In particular, this study focuses on a decentralized multi-UAV autonomous search method. First, cooperative Infotaxis [19], which uses the negotiation approach from the game theory to find a global sub-optimum in a decentralized manner, is introduced. Second, Socialtaxis [29], which utilizes a modified utility function with Kullback–Leibler divergence (D K L ) in (6.22) is presented. Lastly, improved Socialtaxis, which is inspired by the utility function of Socialtaxis but improves the exploitation property, is proposed.
6.7.1 Cooperative Infotaxis Two different levels of coordination methods for the source term estimation using a decentralized multi-mobile sensor network are proposed, called coordinated and cooperative Infotaxis [19]. Each UAV estimates the source term using the particle filter. Since the UAVs simultaneously search the source and share their current measurement via wireless network, the particle filter is modified by multiplying all likelihoods as (i) (i), (i) Nv = Πm=1 P(c(k,m) |θ(k,n) ) · w(k−1,n) , (6.36) w˜ (k,n) where the un-normalized ith particle weight of the nth UAV at the kth time step is the (i) . The likelihoods for updating the particle filter of the nth numerator in (6.13), w˜ (k,n) UAV are calculated using the current estimation result of the nth UAV with group measurement ck,1:Nv . The total number of vehicles is denoted as Nv . The Infotaxis utility function is adopted as the utility function, and the problem is considered as a potential game [46]. The difference between coordinated and cooperative Infotaxis is defined as to whether each mobile agent negotiates with other mobile agents when determining its next best sampling location. The agents determine their own next sampling location to increase the group search performance by considering their neighbor’s decisions through negotiation in cooperative Infotaxis. Meanwhile, they make their own decision without considering the other group members in coordinated Infotaxis. The flow in cooperative Infotaxis is shown in Fig. 6.9. The search performance comparison is performed using 100 Monte Carlo simulations for each algorithm and different numbers of mobile sensors that use real gas dispersion dataset [47] collected in the vicinity of Kincaid power plant that is located in Illinois, USA. Independent Infotaxis with multiple agents is used as the baseline approach to demonstrate the advantage of coordinated Infotaxis. Since the source term estimation accuracy and efficiency (i.e., search time) are the most important factors in the response to hazardous material incidents, the success rate, mean search
6 Information-Theoretic Autonomous Source Search …
159
Fig. 6.9 System flow of cooperative Infotaxis Table 6.3 Performance comparison of independent Infotaxis, Coordinated Infotaxis, and Cooperative Infotaxis with different numbers of mobile agents over 100 Monte Carlo simulations Nv 1 2 3 4 5 6 Infotaxis SR [%] MST [steps] Coordinated Infotaxis SR [%] MST [steps] Cooperative Infotaxis SR [%] MST [steps]
7 144
4 182
2 176
0 –
0 –
0 –
6 154
13 133
30 113
62 131
63 144
64 155
6 146
12 115
29 110
61 103
83 103
82 101
time, and standard deviation of the mean search time are compared, as shown in Table 6.3. Independent Infotaxis is below 10% SR while those of coordinated and cooperative Infotaxis reach 64% and 82%, respectively. For cooperative Infotaxis, the MST is reduced from 146 to 101 steps when six mobile agents are used. The illustrative run using the cooperative Infotaxis strategy for source term estimation in the Kincaid power plant dataset is shown in Fig. 6.10.
160
M. Park et al.
Fig. 6.10 Illustrative run using Cooperative Infotaxis for source term estimation with the Kincaid power plant dataset
6.7.2 Socialtaxis Karpas et al. [29] proposed the Socialtaxis for source term estimation using multimobile sensors by adopting the Kullback–Leibler divergence as a social utility function. The proposed utility function is expressed as U (u (k,n) ) = H(k+1,n) (Θ(k+1,n) |C(u (k,n) )) − β
D K L (Pn ||Pm ),
(6.37)
m∈Mn
where β denotes a user parameter for balancing between social and individual movement, Mn denotes a set of neighbors of the nth mobile agent, and D K L (Pn ||Pm ) denotes the Kullback–Leibler divergence between the source term probabilities estimated by the nth and mth agents. The first term is defined as a conditional entropy, which is a part of (6.29). Since the current entropy of source term H(k,n) (Θ(k,n) ) in (6.29) is not a function of control action u (k,n) , it can be ignored in determining u (k,n) . The second term encourages exploration by guiding each agent to move in the direction of resulting in different source term distributions. In other words, each agent moves to estimate its probability distribution of the source term to be different from that of the others; this makes the search paths of the agents not overlap, thereby reducing the uncertainty of the environment.
6 Information-Theoretic Autonomous Source Search …
161
6.7.3 Improved Socialtaxis In this section, the improved source term estimation strategy, inspired by Socialtaxis [29] and cooperative Infotaxis [19], is introduced. The Rao–Blackwellized particle filter is adopted, which is described in Sect. 6.4.3, instead of the particle filter in order to reduce the computational burden by assuming that gas release strength and source location are independent. Similar to cooperative Infotaxis [19], sensor fusion is performed in this algorithm. The calculation of the likelihood expressed in (6.20) is then modified as P(c|ˆr ) =
Nv P(cn |θn )G(q; κ0 , η0 ) Πn=1 , G(q; κ, η)
(6.38)
where the source term includes source location and release strength (i.e., θ = [ˆr T , q]), while rˆ = [x, y]T denotes 2D source location. To calculate the D K L between the estimated source term probabilities of its own and that of the neighbors, each agent approximates its distribution as Gaussian and shares its mean and standard deviation. In addition, as the utility function of Socialtaxis is biased for exploration, a distancebased exploitation term is added to improve its exploitation capability as U (u (k,n) ) = tr(σn )−1 ||μn − pn || + H(k+1,n) (Θ(k+1,n) |C(u (k,n) )) − β
D K L (Pn ||Pm ),
m∈Mn
(6.39) where μn and σn denote the mean and standard deviation of the approximated Gaussian distribution of the nth agent, respectively. The inverse of the trace of standard deviation matrix tr(σm )−1 is used as a balancing coefficient. If the standard deviation is large (i.e., the source term is quite uncertain), then the effect of the distance term decreases, whereas, if the standard deviation is small, then its influence increases. The search performance comparisons among improved Socialtaxis, Socialtaxis, and coordinated Infotaxis are shown in Table. 6.4. In this approach, it is assumed that each agent decides its own next sampling location without negotiating its decisions to avoid excessive computation and communication burden. Thus, the search performance of improved Socialtaxis is compared with that of coordinated Infotaxis instead of cooperative Infotaxis. Since improved Socialtaxis focuses more on exploitation than exploration, its SR is smaller than those of the other two algorithms; however, it has the lowest MST in all conducted simulations.
6.8 Conclusions and Future Work This chapter presented autonomous source term estimation methods using UAVs equipped with onboard gas detection sensors. To estimate the nonlinear and nonGaussian gas dispersion using gas concentration measurements, Bayes’ inference
162
M. Park et al.
Table 6.4 Performance comparison among Improved Socialtaxis, Socialtaxis, and Coordinated Infotaxis with different numbers of mobile agents averaged over 500 Monte Carlo simulations Nv 1 2 3 4 5 Improved Socialtaxis SR [%] MST [steps] Socialtaxis SR [%] MST [steps] Coordinated Infotaxis SR [%] MST [steps]
74 156
92 125
92 95
89 82
91 69
91 170
98 153
99 142
99 129
99 123
91 170
86 129
92 105
93 94
92 84
estimation methods, namely, Bayes’ filter, particle filter, and Rao–Backwellized particle filter are introduced. Information measures that could represent the estimation uncertainty and expected information gain were then introduced. Entropy is one of the most basic information measures that can indicate the uncertainty of probability distribution. The Kullback–Leibler divergence captures the difference between two distributions of the same random variable. Lastly, mutual information measures the amount of reduction in uncertainty about one random variable, given the knowledge of another. In the source term estimation study, it is often used to calculate the reduction in uncertainty about source term distribution given sensor measurements (single agent strategies) or neighbor’s information (multi-agent strategies). Source search strategies were introduced not only for a single UAV but also for multiple UAVs in this chapter. Infotaxis was introduced, which balances the exploitation and exploration using entropy and mutual information. The modified algorithms that used the particle filter and GMM to improve the search performance by considering the estimation and selecting the next sampling position in continuous domains were subsequently presented. cooperative Infotaxis, which adopts a gametheoretic coordination approach for a multi-UAV system with sensor fusion, was introduced. Additional utility values utilizing social factors among UAVs to improve the search performance of the multi-UAV system were introduced. Finally, search performance comparisons of the algorithms were presented in various gas release simulations and flight experiments. Several factors need to be considered in future research that can improve the source term estimation performance or expand its capability. In the source term estimation problem, many studies utilized analytical gas dispersion models to forecast the dispersion. However, these models do not reflect the real gas behavior when the atmosphere condition (e.g., wind, temperature, and pressure) is unstable or dynamic. Thus, more realistic gas dispersion models can improve the search and estimation performance. In addition, the proposed GMM-Infotaxis and RRT-Infotaxis consider
6 Information-Theoretic Autonomous Source Search …
163
a few sampled positions only for the next sampling position rather than considering infinitely many continuous actions. Although the search performance could be further improved by considering the infinite number of feasible actions, this might be computationally too expensive while using a realistic gas dispersion model and considering a large number of next sensing locations to select the optimal decision. Deep learning can be a solution to this problem; a deep neural network can imitate a realistic gas dispersion CFD model in real time [48], and deep reinforcement learning has been shown to find an appropriate policy within a continuous action domain [49]. This chapter demonstrated that using multiple UAVs can dramatically increase the source term estimation performance. In order to utilize a large number of UAVs (i.e., a swarm of UAVs), each agent may need to consider a limited number of neighboring agents only rather than all neighbors to avoid an excessive computational burden and more sophisticated distributed estimation and optimization would be required. Besides, although the proposed algorithms are validated in real experiments, there remain many areas that need to be further developed to support missions in relevant applications. Among these areas of improvement, it is worthwhile recalling that more agile and robust flight control technologies might be required. Furthermore, appropriate small and lightweight gas sensors for aerial platforms need to be developed. Finally, the flight endurance of UAVs should be improved. A key problem in UAV-related applications, such as those considered in this chapter, concerns determining the location of the vehicle relative to both its environment and a global reference frame. The vehicle’s location relative to the environment can be produced by employing cameras, as discussed in Chap. 3, and Lidar, as discussed in this chapter. GPS antennas provide overly common tools for determining a UAV’s location in a global reference frame. However, the GPS signal may suffer from both intentional and unintentional distortions, which may jeopardize the ability to complete a given mission successfully. The next chapter proposes novel techniques to overcome, and, in some cases, overcome these issues. Acknowledgements This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2020R1A6A1A03040570) and the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (2023R1A2C2003130).
References 1. Hutchinson M, Oh H, Chen W-H (2017) A review of source term estimation methods for atmospheric dispersion events using static or mobile sensors. Inf Fus 36:130–148 2. Singh SK, Rani R (2014) A least-squares inversion technique for identification of a point release: Application to fusion field trials 2007. Atmos Environ 92:104–117 3. Sujit P, Ghose D (2004) Search using multiple UAVs with flight time constraints. IEEE Trans Aerosp Electron Syst 40(2):491–509 4. Erdos D, Erdos A, Watkins SE (2013) An experimental uav system for search and rescue challenge. IEEE Aerosp Electron Syst Mag 28(5):32–37
164
M. Park et al.
5. Esmailifar SM, Saghafi F (2015) Moving target localization by cooperation of multiple flying vehicles. IEEE Trans Aerosp Electron Syst 51(1):739–746 6. Boström-Rost P, Axehill D, Hendeby G (2021) Sensor management for search and track using the poisson multi-bernoulli mixture filter. IEEE Trans Aerosp Electron Syst 7. Neumann PP, Asadi S, Lilienthal AJ, Bartholmai M, Schiller JH (2012) Autonomous gassensitive microdrone: Wind vector estimation and gas distribution mapping. IEEE Robot Autom Mag 19(1):50–61 8. Hutchinson M, Liu C, Chen W-H (2019) Source term estimation of a hazardous airborne release using an unmanned aerial vehicle. J Field Robot 36(4):797–817 9. Hutchinson M, Ladosz P, Liu C, Chen W-H (2019) Experimental assessment of plume mapping using point measurements from unmanned vehicles. In: 2019 international conference on robotics and autonomous 10. Gao W, Wang W, Zhu H, Huang G, Wu D, Du Z (2018) Robust radiation sources localization based on the peak suppressed particle filter for mixed multi-modal environments. Sensors 18(11):3784 11. Voges N, Chaffiol A, Lucas P, Martinez D (2014) Reactive searching and Infotaxis in odor source localization. PLOS Comput Biol 10(10):1–13 12. Neumann PP, Hernandez Bennetts V, Lilienthal AJ, Bartholmai M, Schiller JH (2013) Gas source localization with a micro-drone using bio-inspired and particle filter-based algorithms. Adv Robot 27(9):725–738 13. Pyk P, i Badia SB, Bernardet U, Knüsel P, Carlsson M, Gu J, Chanie E, Hansson BS, Pearce TC, Verschure PF, (2006) An artificial moth: Chemical source localization using a robot based neuronal model of moth optomotor anemotactic search. Autonom Robot 20(3):197–213 14. Russell RA, Bab-Hadiashar A, Shepherd RL, Wallace GG (2003) A comparison of reactive robot chemotaxis algorithms. Robot Autonom Syst 45(2):83–97 15. Bourne JR, Pardyjak ER, Leang KK (2019) Coordinated Bayesian-based bioinspired plume source term estimation and source seeking for mobile robots. IEEE Trans Robot 35(4):967–986 16. Li J-G, Meng Q-H, Wang Y, Zeng M (2011) Odor source localization using a mobile robot in outdoor airflow environments with a particle filter algorithm. Autonom Robot 30(3):281–292 17. Hoffmann GM, Tomlin CJ (2010) Mobile sensor network control using mutual information methods and particle filters. IEEE Trans Autom Control 55(1):32–47 18. Park M, An S, Seo J, Oh H (2021) Autonomous source search for uavs using gaussian mixture model-based infotaxis: Algorithm and flight experiments. IEEE Trans Aerosp Electron Syst 57(6):4238–4254 19. Park M, Oh H (2020) Cooperative information-driven source search and estimation for multiple agents. Inf Fus 54:72–84 20. Vergassola M, Villermaux E, Shraiman BI (2007) Infotaxis as a strategy for searching without gradients. Nature 445(7126):406 21. Ristic B, Skvortsov A, Gunatilaka A (2016) A study of cognitive strategies for an autonomous search. Inf Fus 28:1–9 22. Hutchinson M, Oh H, Chen W-H (2018) Entrotaxis as a strategy for autonomous search and source reconstruction in turbulent conditions. Inf Fus 42:179–189 23. Lu Q, He Y, Wang J (2014) Localization of unknown odor source based on shannon’s entropy using multiple mobile robots. In: IECON 2014-40th annual conference of the IEEE industrial electronics society. IEEE, pp 2798–2803 24. Zhao Y, Chen B, Zhu Z, Chen F, Wang Y, Ji Y (2020) Searching the diffusive source in an unknown obstructed environment by cognitive strategies with forbidden areas. In: Building and environment, p 107349 25. Park M, Ladosz P, Kim J, Oh H (2022) Receding horizon-based infotaxis with random sampling for source search and estimation in complex environments. IEEE Trans Aerosp Electron Syst 26. An S, Park M, Oh H (2022) Receding-horizon rrt-infotaxis for autonomous source search in urban environments. Aerosp Sci Technol 120:107276 27. Masson J, Bechet MB, Vergassola M (2009) Chasing information to search in random environments. J Phys A: Math Theoret 42(43):1–14
6 Information-Theoretic Autonomous Source Search …
165
28. Ristic B, Gilliam C, Moran W, Palmer JL (2020) Decentralised multi-platform search for a hazardous source in a turbulent flow. Inf Fus 58:13–23 29. Karpas ED, Shklarsh A, Schneidman E (2017) Information socialtaxis and efficient collective behavior emerging in groups of information-seeking agents. Proceed Nat Acad Sci 114(22):5589–5594 30. Jang H, Park M, Oh H (2021) Improved socialtaxis for information-theoretic source search using cooperative multiple agents in turbulent environments. In: Asia-pacific international symposium 31. Wang Y, Huang H, Huang L, Ristic B (2017) Evaluation of Bayesian source estimation methods with prairie grass observations and gaussian plume model: A comparison of likelihood functions and distance measures. Atmos Environ 152:519–530 32. Monroy J, Hernandez-Bennetts V, Fan H, Lilienthal A, Gonzalez-Jimenez J (2017) Gaden: A 3D gas dispersion simulator for mobile robot olfaction in realistic environments. Sensors 17(7):1479 33. Senocak I, Hengartner NW, Short MB, Daniel WB (2008) Stochastic event reconstruction of atmospheric contaminant dispersion using Bayesian inference. Atmosp Environ 42(33):7718– 7727 34. Thrun S, Burgard W, Fox D (2005) Probabilistic robotics. MIT press, Cambridge, MA 35. Efthimiou GC, Kovalets IV, Venetsanos A, Andronopoulos S, Argyropoulos CD, Kakosimos K (2017) An optimized inverse modelling method for determining the location and strength of a point source releasing airborne material in urban environment. Atmos Environ 170:118–129 36. Jaynes ET (2003) Probability theory: The logic of science. Cambridge University Press, Cambridge, MA 37. Yee E (2017) Automated computational inference engine for Bayesian source reconstruction: Application to some detections/non-detections made in the CTBT international monitoring system. Appl Math Sci 11(32):1581–1618 38. Hajieghrary H, Hsieh MA, Schwartz IB (2016) Multi-agent search for source localization in a turbulent medium. Phys Lett A 380(20):1698–1705 39. Samaniego FJ (2010) A comparison of the Bayesian and frequentist approaches to estimation, vol 24. Springer 40. Ristic B, Arulampalam S, Gordon N (2003) Beyond the Kalman filter: Particle filters for tracking applications. Artech house 41. Ristic B, Gunatilaka A, Wang Y (2017) Rao-blackwell dimension reduction applied to hazardous source parameter estimation. Sig Process 132:177–182 42. Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27(3):379–423 43. Cover TM, Thomas JA (2006) Elements of information theory. John Wiley & Sons, Hoboken, NJ 44. Beyme S (2014) Autonomous, wireless sensor network-assisted target search and mapping. Ph.D. dissertation, University of British Columbia 45. Sebastiani P, Wynn HP (2000) Maximum entropy sampling and optimal Bayesian experimental design. J Roy Stat Soc: Ser B (Stat Methodol) 62(1):145–157 46. Marden JR, Arslan G, Shamma JS (2009) Joint strategy fictitious play with inertia for potential games. IEEE Trans Autom Control 54(2):208–220 47. Carruthers D, Edmunds H, Ellis K, McHugh C, Davies B, Thomson D (1995) The atmospheric dispersion modelling system (ADMS): Comparisons with data from the kincaid experiment. Int J Environ Pollut 5(4–6):382–400 48. Na J, Jeon K, Lee WB (2018) Toxic gas release modeling for real-time analysis using variational autoencoder with convolutional neural networks. Chem Eng Sci 181:68–78 49. Duan Y, Chen X, Houthooft R, Schulman J, Abbeel P (2016) Benchmarking deep reinforcement learning for continuous control. In: International conference on machine learning. PMLR, pp 1329–1338
166
M. Park et al.
Minkyu Park received the B.Eng. degree in Mechanical System Design and Manufacturing, from the Ulsan National Institute of Science and Technology (UNIST), South Korea, in 2016. He is currently working toward a Ph.D. degree at the Autonomous Systems Laboratory, UNIST. Dr. Park’s research interests include autonomous search, sensor management, Bayesian estimation, information theory, and mobile robot applications for environmental monitoring. Seulbi An received the B.Eng. degree in Mechanical and Aerospace Engineering from the Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea, in 2018, where she is currently working toward the Master’s degree in the field of autonomous unmanned vehicles. Ms. An’s research interests include sequential Monte Carlo methods, estimation, samplingbased path planning, and informative path planning. Hongro Jang received the B.Eng. degree in Mechanical and Aerospace Engineering from UNIST, Ulsan, South Korea, in 2021. She is currently working toward a Ph.D. degree at the Autonomous Systems Laboratory, UNIST. Ms. Jang’s research interests include information theory, sequential Monte Carlo methods, Bayesian estimation, informative path planning, and cooperative search strategy. Hyondong Oh received the B.Sc. and M.Sc. degrees in Aerospace Engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2004 and 2010, respectively, and the Ph.D. degree in Autonomous Surveillance and Target Tracking Guidance of Multiple UAVs from Cranfield University, Cranfield, U.K., in 2013. Dr. Oh was a Lecturer in Autonomous Unmanned Vehicles with Loughborough University, U.K., from 2014 to 2016. He is currently an Associate Professor with the Department of Mechanical Engineering, Ulsan National Institute of Science and Technology, Uslan, South Korea. His research interests include autonomy and decision-making, cooperative control, path planning, nonlinear guidance and control, and estimation and sensor/information fusion for unmanned systems.
Chapter 7
Resilient Estimation and Safe Planning for UAVs in GPS-Denied Environments Wenbin Wan, Hunmin Kim, Naira Hovakimyan, Petros Voulgaris, and Lui R. Sha
7.1 Introduction Unmanned aerial vehicles (UAVs) have been used worldwide for commercial, civilian, and educational applications over the decades. The mechanical simplicity and agile maneuverability appeal to many practical applications, such as cargo transportation [1], aerial photography [2], and agricultural farming [3], and other cooperative tasks. The most widely used sensor for UAVs is the Global Positioning System (GPS), which offers accurate and reliable state measurements. However, GPS receivers are vulnerable to various types of distortions, such as blocking, jamming, and spoofing [4]. The Vulnerability Assessment Team at the Los Alamos National Laboratory has demonstrated that civilian GPS spoofing distortions can be easily implemented by using a GPS simulator [5]. Furthermore, GPS is more vulnerable when its signal strength is weak. Due to various applications of UAVs, the operating environment becomes diverse as well, where GPS signals are weak or even W. Wan (B) University of New Mexico, Albuquerque, USA e-mail: [email protected] H. Kim Mercer University, Macon, USA e-mail: [email protected] N. Hovakimyan · L. R. Sha University of Illinois at Urbana-Champaign, Urbana, IL, USA e-mail: [email protected] L. R. Sha e-mail: [email protected] P. Voulgaris University of Nevada, Reno, Reno, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 A. L’Afflitto et al. (eds.), Control of Autonomous Aerial Vehicles, Advances in Industrial Control, https://doi.org/10.1007/978-3-031-39767-7_7
167
168
W. Wan et al.
denied due to other structures such as skyscrapers, elevated highways, bridges, and mountains. One of the most effective GPS spoofing detection techniques is to analyze raw antenna signals or utilize multi-antenna receiver systems. The GPS spoofing can be detected by checking whether the default radiation pattern is changed in [6]. A multi-antenna receiver system was used to detect GPS spoofing by monitoring the angle-of-arrival of the spoofing attempts in [7]. As an extension of this work, GPS spoofing mitigation has also been investigated where an array of antennas is utilized to obtain genuine GPS signals by spatial filtering [8–10]. However, those solutions require hardware modifications or low-level computing modules and assume that only single-antenna spoofing systems are utilized. Furthermore, the solutions are infeasible if multi-antenna spoofing devices are available [11]. In cyber-security literature, GPS spoofing has been described as malicious signal injections to the genuine sensor output [12]. Anomaly detection against malicious signal injection has been widely studied over the last few years. The detection problem has been formulated as an 0 /∞ optimization problem, which is NP-hard in [13, 14]. The fundamental limitations of structural detectability and graph-theoretical detectability for linear time-invariant systems have been studied in [15]. Based on the properties found, centralized and distributed detectors have been provided. The detection problem has been formulated as a resilient estimation problem of constrained state and unknown input in [16, 17]. A switching mode resilient detection and estimation framework for GPS spoofing has been studied in [18]. Anomaly detection using multiple GPS signals by checking cross-correlation was introduced in [19]. In [12], the maximum deviations of the state were identified due to the sensor distortions while remaining stealthy due to the detection. A secure control framework for networked control systems was designed in [20] to analyze the resource-limited adversaries. We notice that existing emergency control architectures focus on switching control from a high-performance controller to a robust high-assurance controller in the presence of distortions [21]. These architectures can efficiently handle a class of distortions but cannot address the fundamental problem due to limited sensor availability. This chapter addresses safety problems induced by limited sensor availability due to GPS spoofing distortions. We formulate the sensor drift problem as an increasing variance of state estimation to quantify the sensor drift and introduce the escape time under which the state estimation error remains within a tolerable error with high confidence. Next, we develop a novel safety-constrained control framework that adapts the UAV at a path re-planning level to support resilient state estimation against GPS spoofing distortions. In the presence of GPS spoofing distortion, the pursuer location tracker (PLT) tracks the pursuer’s location and estimates the output power of the spoofing device by unscented Kalman filter (UKF) with sliding window outputs. The estimates are then used in the escape controller (EsC) that drives the UAVs away from the effective range of the spoofing device within the escape time to avoid intolerable sensor drift. The proposed framework has also been applied to the timecritical coordination task for a multi-UAV system by adapting the UAV(s) at a path replanning level to support resilient state estimation against GPS spoofing distortions.
7 Resilient Estimation and Safe Planning for UAVs in GPS-Denied Environments
169
Preliminary results have been published in [18, 22, 23]. The current chapter provides detailed derivations of the UKF with sliding window outputs algorithm and presents three types of controller design with different safety constraints. The rest of the chapter is organized as follows. Section 7.2 introduces the system model and problem statement. Section 7.3 presents the high-level idea of the proposed framework, detailed explanations of each component, and an illustrative example. In Sect. 7.4, an extended framework for the multi-UAV systems is designed for performing the time-critical coordination task. Finally, Sect. 7.5 draws conclusions and outlines future work directions.
7.2 Problem Formulation System Model. Consider the discrete-time dynamic system model xk = Axk−1 + Bu k−1 + wk−1 , ykG ykI
= C xk + dk + G
=
ykS =
(7.1a)
vkG ,
(7.1b)
C (xk − xk−1 ) + vkI , S ηk C d(x a ,xk )2 + vkS , under the distortion, k I
η S + vkS ,
(7.1c) (7.1d)
otherwise,
where xk ∈ Rn denotes the state. The system matrices A, B, C G , C I , and C S are known and bounded with appropriate dimensions. There are three types of outputs available, namely, ykG ∈ Rm G captures the GPS measurement, which may be corrupted by unknown GPS spoofing signal dk ∈ Rm G ; the signal dk is injected by the pursuer when the UAV is in the effective range of the spoofing device. The output ykI ∈ Rm I is the inertial measurement unit (IMU) measurement, which returns a noisy measurement of the state difference. Finally, the output ykS ∈ Rm S represents the GPS signal strength. The defender is unaware of xka and ηk , where xka ∈ Rn is the unknown pursuer location, and ηk ∈ Rm S captures the nominal power of the spoofing device If the GPS signal is under distortion, then ykS is an inverse function of the distance between the pursuer and the UAV. The function d(a, b) measures the distance between a and b. If the UAV receives genuine GPS signals, then this output represents the genuine GPS signal strength η S . We assume that the pursuer can inject any signal dk into ykG . The noise signals wk , vkG , vkI , and vkS are independent and identically random variableswith zero distributed Gaussian means and covariances E wk wkT = Σ w ≥ 0, E vkG (vkG )T = Σ G > 0, E vkI (vkI )T = Σ I > 0, and E vkS (vkS )T = Σ S > 0, respectively. Remark 7.1 The sensor measurement ykI represents any relative sensor measurement, such as velocity measurement by a camera. In this chapter, we use IMU for illustration purposes.
170
W. Wan et al.
Remark 7.2 The signal strength output ykS in (7.1d) is derived by the GPS signal attenuation due to free-space path loss. The Friis transmission equation is given by Pr = Pt G t G r
λ2 , (4πr )2
where Pt and Pr denote the transmit power and the receive power, respectively, G t and G r denote the transmit antenna gain and receive antenna gain, respectively, r denotes the distance between two antennas, and λ denotes the wavelength [24]. We λ 2 ) as the output matrix C S , G t Pt as the nominal power of the spoofing write G r ( 4π device ηk , and r as the distance between the pursuer and the UAV d(xka , xk ). Problem Statement. Given the system (7.1) with sensor measurements (7.1b)– (7.1d), the defender aims to detect the GPS spoofing distortion, achieve resilient state estimation when considering the limited sensor availability, and complete the global mission securely.
7.3 Main Results To address the given problem, we propose the safety-constrained control framework captured by Fig. 7.1, which consists of a detector, a resilient state estimator, a robust controller, a PLT, and an EsC. The proposed safety-constrained control framework drives the UAV outside the effective range of the spoofing device. In the following, we present each module of the proposed framework.
Fig. 7.1 A safety-constrained control framework consisting of a detector, a resilient state estimator, a robust controller, a PLT, and an EsC
7 Resilient Estimation and Safe Planning for UAVs in GPS-Denied Environments
171
Robust Control Mode. The robust controller is a complex controller that operates the UAV to the destination in the presence of noise, but without the presence of distortions. Any robust control technique can be implemented in this module. Emergency Control Mode. PLT is designed for tracking the location of the pursuer and estimating the output power of the spoofing device by applying UKF with sliding window outputs. EsC is a model predictive controller (MPC)-based design that drives the UAV out of the effective range of the spoofing device based on the estimation of the pursuer location obtained by PLT. Resilient Monitor and Decision Logic. The resilient state estimator is developed based on a Kalman-filter-like state estimator. The detector is designed by the χ 2 -based anomaly detection algorithm. Based on the previous estimation from the resilient state estimator, the Boolean output (dotted-dashed line in Fig. 7.1) of the detector determines (i) whether the GPS measurement should be used for the state estimation; and (ii) the switching rule between two control modes: the robust control mode and the emergency control mode. PLT and EsC adapt the UAV at a path re-planning level for safe operation. In the following, each subsection describes a new resilience measure (escape time) and the details of the corresponding component.
7.3.1 Escape Time: A New Resilience Measure This subsection introduces a new resilience measure, the escape time. It has been proven by Theorem 4.2 of [18] that if GPS signals are compromised for a long time, then the state estimation becomes less trustful. Therefore, the UAV should escape from the GPS spoofing device at a certain time before the estimation becomes unreliable. The definition of the escape time is provided as follows for completeness. Definition 7.1 (Escape time) The escape time k esc ≥ 0 is the time difference between the attack time k a and the first time instance when the estimation error xk − xˆk may not be in the tolerable error distance ζ ∈ Rn with the significance α, i.e., k esc = arg min k − ka , k≥k a
s.t. ζ Pk−1 ζ < χd2f (α), T
where Pk E (xk − xˆk )(xk − xˆk )T denotes the error covariance, and χd2f (α) is the χ 2 value with degree of freedom d f and statistical significance level α. The escape time calculation is presented in Algorithm 7.1. Given the attack time k a , the state estimation errors may not remain in the tolerable region with the predetermined confidence α after the escape time k esc .
172
W. Wan et al.
Algorithm 7.1 Escape time calculation Require:k a , α, d f , ζ ; Ensure:k esc ; 1: k = k a ; 2: while ζ T Pk−1 ζ > χd2 f (α) do 3: k = k + 1; 4: end while 5: k esc = k − k a ;
7.3.2 Resilient State Estimation The following Kalman-filter-like state estimator is used to estimate the current state. The state estimate xˆk can be obtained by xˆk = xˆk− + K kG (ykG − C G xˆk− ) + K kI ykI − C I (xˆk− − xˆk−1 ) ,
(7.2)
where xˆk− A xˆk−1 + Bu k−1 denotes the a priori state estimate. The state estimation error covariance Pk is given by Pk = (A − K k C A + K k DC)Pk−1 (A − K k C A + K k DC)T + (I − K k C)Σw (I − K k C)T + K k Σ y K kT ,
(7.3)
G 00 C ΣG 0 , Σ and D where K k K kG K kI , C . The optiy CI 0 ΣI 0I mal gain K k is obtained by minimizing the trace of Pk , i.e., min tr (Pk ).
(7.4)
K k = (A Pk−1 (C A − DC)T + Σw C T ) −1 · (C A − DC)Pk−1 (C A − DC)T + CΣw C T + Σ y .
(7.5)
Kk
The solution of (7.4) is given by
In [18], it has been shown that the covariance in (7.3) is bounded when the GPS signal is available. If the GPS is denied, and only the relative sensor ykI is available, then the covariance is strictly increasing and unbounded in time. That is, the sensor drift problem can be formulated as instability of the covariance matrix.
7 Resilient Estimation and Safe Planning for UAVs in GPS-Denied Environments
173
7.3.3 Detection The defender implements both an estimator and a χ 2 detector to estimate the state and detect the GPS spoofing distortion. The χ 2 test is widely used in detection for stochastic systems [25, 26]. Given a sample of Gaussian random variable σˆ k with unknown mean σk and known covariance Σk , the χ 2 test provides statistical evidence of whether σk = 0 or not. In particular, the sample σˆ k is normalized by σˆ kT Σk−1 σˆ k , and we compare the normalized value with the value χd2f (α). We reject the null hypothesis H0 : σk = 0, if σˆ kT Σk−1 σˆ k > χd2f (α), and accept the alternative hypothesis H1 : σk = 0, i.e., there is significant statistical evidence that σk is non-zero. Otherwise, we accept the null hypothesis H0 , i.e., there is no significant evidence that σk is non-zero. To be specific, we detect the GPS spoofing distortions by χ 2 test H0 : dk = 0;
H1 : dk = 0,
(7.6)
using the CUSUM (cumulative sum) algorithm, which is widely used in detection research [27–29]. Since dk = ykG − C G xk − vkG , given the previous state estimate xˆk−1 , we estimate the vector by comparing the sensor output and the output prediction dˆk = ykG − C G (A xˆk−1 + Bu k−1 ).
(7.7)
Note that the current estimate xˆk should not be used for the prediction because it is correlated with the current output; i.e., E[xˆk (ykG )T ] = 0. Due to the Gaussian noises wk and vk injected into the linear system in (7.1), the states follow Gaussian distribution since any finite linear combination of Gaussian distributions is also Gaussian. Similarly, dˆk is Gaussian as well, and, thus, the use of χ 2 test (7.6) is justified. In particular, the χ 2 test compares the normalized vector estimate dˆkT (Pkd )−1 dˆk with χd2f (α) Accept H0 , if dˆkT (Pkd )−1 dˆk ≤ χd2f (α),
(7.8a)
Accept H1 , if dˆkT (Pkd )−1 dˆk > χd2f (α),
(7.8b)
where Pkd E[(dk − dˆk )(dk − dˆk )T ] = C G (A Pk−1 AT + Σw )(C G )T + ΣG , and χd2f (α) is the threshold found in the chi-square table. To reduce false positives/negatives due to noise, we use the test (7.8) in a cumulative form. The proposed χ 2 CUSUM detector is characterized by the detector state Sk ∈ R+ :
174
W. Wan et al.
Fig. 7.2 Resilient state estimator. GPS and IMU measurements are used in estimator one (Est. 1). Estimator two (Est. 2) only uses the IMU measurement. Est. 1 is used to estimate the state by (7.2) for the detection as in (7.7). When GPS is free of distortions, Est. 1 is also used to estimate the state for the control since the GPS measurement is trustful. In the presence of GPS distortion, Est. 2 is used for the control
Sk = δSk−1 + (dˆk )T (Pkd )−1 dˆk , S0 = 0,
(7.9)
where 0 < δ < 1 is the pre-determined forgetting factor. At each time step k, the CUSUM detector (7.9) is used to update the detector state Sk and detect the distortion. The detector will (i) update the estimated state xˆk and the error covariance Pk in (7.3) with K kG = 0 and (ii) switch the controller to EsC, if Sk >
∞ i=0
If Sk
0, respectively. Prediction. Given the previous state estimate xˆk−1 and system model (7.12), the current state can be predicted as xˆk|k−1 = Ak−1 xˆk−1 . Its error covariance matrix is Pk|k−1 E[(xk − xˆk|k−1 )(xk − xˆk|k−1 )T ] = Ak−1 Pk−1 ATk−1 + Σw , where Pk−1 E[(xk−1 − xˆk−1 )(xk−1 − xˆk−1 )T ] denotes the state estimation error covariance matrix. Sigma Points Generation. We define a sigma points array Xk xˆk|k−1 ±
T n Pk|k−1
i
, i = 1, . . . , n ,
T where n Pk|k−1 denotes the matrix square root such that n Pk|k−1 n Pk|k−1 = n Pk|k−1 , and the matrix operator (·)i gives the ith row of the matrix. Measurement Update. Given the sliding window size M, the nonlinear measurement equation f (·) is used to transform the sigma points into predicted measurement vectors yˆki = f (Xik ), i i = f (A−1 yˆk−1 k−1 Xk ), .. . −N +1 i i yˆk−N +1 = f (Ak−1 Xk ).
T i Defining ˆyik yˆki , . . . , yˆk−M+1 , the approximated mean of the measurements is y¯ k
2n i=0
i
Wki yˆ k ,
7 Resilient Estimation and Safe Planning for UAVs in GPS-Denied Environments
177
where Wki denotes a weighting coefficient. By taking the measurement noise into account, the estimated covariance of the predicted measurements is given by y Pk
2n
i
i
Wki ( yˆ k − y¯ k )( yˆ k − y¯ k )T + Σ v ,
i=0
where Σ v = diag{Σv , . . . , Σv } is a diagonal matrix. The cross-covariance between the state prediction and predicted measurements is given by 2n xy i Pk = Wki (Xik − xˆk|k−1 )( yˆ k − y¯ k )T , i=0
where Xik denotes the ith element in Xk . The measurement yk [yk , . . . , yk−M+1 ]T is used to update the prediction xˆk|k−1 as xˆk = xˆk|k−1 + K k ( yk − y¯ k ). The covariance matrix of the state estimation error is given by x y T xy y − P k K kT + K k P k K kT . Pk = Pk|k−1 − K k P k The gain matrix K k is chosen by minimizing the trace norm of Pk , i.e., min K k tr (Pk ). xy y The solution of the program is given by K k = P k ( P k )−1 . Note that the prediction step does not need unscented transformation because the dynamic system (7.12a) is linear. The UKF with sliding window outputs algorithm is summarized in Algorithm 7.2.
7.3.5 Controller Design: EsC In the presence of a GPS spoofing distortion, the variance of the state estimation errors Pk in (7.3) is strictly increasing and unbounded in time (Theorem 4.2 [18]). The goal of EsC is to drive the UAV outside of the effective range of the spoofing device within the escape time so that the state estimation error remains within the tolerable region with a predetermined probability. In particular, EsC is designed to drive the UAV outside of the effective range of the spoofing device within the escape time. Given the estimates of UAV state xˆk and pursuer state xˆ ka with their covariances, the problem can be formulated as a finite-horizon constrained MPC problem as the following program:
178
W. Wan et al.
Algorithm 7.2 UKF with sliding window outputs Prediction 1: xˆk|k−1 = Ak−1 xˆk−1 ; 2: Pk|k−1 = Ak−1 Pk−1 ATk−1 + Σw ; Sigma points generation 3: Xk = {xˆk|k−1 ± ( n Pk|k−1 )iT }, i ∈ {1, . . . , n}; Measurement Update 4: for i = 1 : 2n do 5: −M+1 i T i i i yˆ ik [ yˆki , yˆk−1 , . . . , yˆk−M+1 ]T = [ f (Xik ), f (A−1 k−1 Xk ), . . . , f (Ak−1 Xk )] ;
6: end for 2n 7: y¯ k = i=0 Wki yˆ ik ; y 2n 8: P k = i=0 Wki ( yˆ ik − y¯ k )( yˆ ik − y¯ k )T + Σ v ; xy 2n Wki (Xik − xˆk|k−1 )( yˆ ik − y¯ k )T ; 9: P k = i=0 10: K k = P k ( P k )−1 ; xy
y
11: xˆk = xˆk|k−1 + K k ( yk − y¯ k ); y
12: Pk = Pk|k−1 − K k P k K kT ;
min u
a +N k
T Q i xˆ˜i+1 + u iT Ri u i , x˜ˆi+1
i=k a
s.t. xˆi+1 = A xˆi + Bu i , d(xˆkaa +k esc , xˆk a +k esc ) − reffect > 0, h(xˆi , u i ) ≤ 0 for i = k a , k a + 1, . . . , k a + N ,
(7.13a) (7.13b)
where N ≥ k esc denotes the prediction horizon, xˆ˜i captures the difference between goal the state estimation and the goal state at time index i, i.e., xˆ˜i xˆi − xi , Q i and Ri a are symmetric positive definite weight matrices, and xˆi denotes the estimate of the pursuer location. The value reffect denotes an upper bound of the effective range of the spoofing device. The constraint (7.13a) implies that EsC should drive the UAV outside of the effective range of the spoofing device. The inequality (7.13b) captures any nonlinear constraint on the state estimation xˆi (e.g., velocity) and the control input u i (e.g., acceleration). Remark 7.4 The upper bound of the effective range reffect can be assumed to be known. Due to hardware constraints, the output power/nominal strength of the spoofing device ηk is bounded, and ηk also can be estimated by PLT. The output power determines the effective range of the spoofing device, and reffect can be computed as reffect = arg max g(r ), r
where g(r ) C S
ηk > ηS . r2
7 Resilient Estimation and Safe Planning for UAVs in GPS-Denied Environments
179
There are two key challenges in Program 7.13. First, the states and the pursuer location are unknown, and their estimates xˆi and xˆia are subject to stochastic noise. Moreover, we cannot guarantee that constraint (7.13a) is always feasible, i.e., Program 7.13 may not have a solution. To address these two challenges, we introduce two programs for EsC in the following sections.
7.3.6 Controller Design: EsC with Tube Since the constraint (7.13a) captures the safety-critical constraint, we can reformulate it as a conservative constraint such that EsC should drive the UAV outside of the effective range of the spoofing device with probability γ by the single individual chance constraint (ICC): P [d(xkaa +k esc ), xk a +k esc ] − reffect > 0 > γ .
(7.14)
Then, Program 7.13 becomes a new stochastic MPC with ICC. The chance constraints can be handled by constraint backoffs, which originate in linear MPC with additive stochastic noise [32]. However, we consider nonlinear constraints in Program 7.13, which makes the backoff intractable to compute. In [33], the tube is constructed based on sublevel sets of the incremental Lyapunov function by online predicted tube size, and then it is used to ensure robust constraint satisfaction by tightening the nonlinear state and input constraints. In [34], this is extended to allow for ICCs and stochastic uncertainty. Similar to [33, 34], the stochastic MPC with ICC can be formulated as the following program: min u
a k +N
T Q i xˆ˜i+1 + u iT Ri u i , x˜ˆi+1
i=k a
s.t. xˆi+1 = A xˆi + Bu i , d(xˆkaa +k esc , xˆk a +k esc ) − reffect > s(Pk a +k esc , Pka , γ ), h(xˆi , u i ) ≤ 0 for i = k a , k a + 1, . . . , k a + N ,
(7.15a) (7.15b)
where Pk a +k esc denotes the UAV state covariance at escape time, and Pka denotes the pursuer state covariance. The function s(·) is the probabilistic tube size that can be seen as a margin to fulfill the second constraint in (7.13a). To provide the theoretical guarantees on the capability of Program 7.15 and the equivalence between the stochastic MPC with ICC and Program 7.15, we use the results from [33, 34]. Since the newly formulated MPC with ICC (7.14) is the standard nonlinear stochastic MPC problem, Assumptions in [34] can be verified. Theorem 7.1 Under Assumptions 1–4, 6, and 9 in [34], if Program 7.15 is feasible at t = k a , then it is recursively feasible. Furthermore, the constraints (7.13b) and (7.14) are satisfied, and the origin is practically asymptotically stable for the resulting
180
W. Wan et al.
closed-loop system. The impact of the hard constraint (7.15a) is equivalent to the nonlinear ICCs (7.14). Proof See proofs of Theorem 1 in [33] and Theorems 8 and 10 in [34].
From Theorem 7.1, we conclude that as long as Program 7.15 is feasible at the time of distortion k a , we are able to guarantee that the UAV can escape within the escape time in probability. However, Program 7.15 may not be feasible in some cases. To address this issue, we introduce a program with a soft constraint in the subsequent section.
7.3.7 Controller Design: EsC with Potential Function The hard constraint (7.15a) can be replaced by the repulsive potential function [35] as a high penalty in the cost function which is active only after the escape time k a + k esc . The repulsive potential function Ur ep (D) is defined as Ur ep (D)
1 β 2
1 D
−
2
1 reffect
0
if D ≤ reffect , if D > reffect ,
which can be constructed based on the distance between the location of the pursuer and the location of the UAV, D d(xˆkaa +k esc , xˆk a +k esc ). The scaling parameter β is a large constant, which represents a penalty when the constraint has not been fulfilled. Utilizing the soft constraint, we reformulate the MPC problem as min u
a +N k
T Q i xˆ˜i+1 + u iT Ri u i + xˆ˜i+1
a k +N
Ur ep (Di ),
(7.16a)
i=k a +k esc
i=k a
s.t. xˆi+1 = A xˆi + Bu i ,
(7.16b)
h(xˆi , u i ) ≤ 0 for i = k , k + 1, . . . , k + N . a
a
a
(7.16c)
Remark 7.5 Comparing to the use of the repulsive potential function Ur ep in the collision avoidance literature [36–38], he proposed application of the repulsive potential function in Program 7.16 has two differences. Firstly, the repulsive potential function is known before the collision happens in collision avoidance literature, while we can only deduce the repulsive potential function Ur ep after the collision happens, i.e., only after the UAV has entered the effective range of the spoofing device. Secondly, the repulsive potential function Ur ep is only counted in the cost function of Program 7.16 after the escape time.
7 Resilient Estimation and Safe Planning for UAVs in GPS-Denied Environments
181
7.3.8 Illustrative Example In the following, we present a numerical example aimed at showing the applicability and the effectiveness of the proposed framework. In this example, the UAV is moving from the start position with the coordinates at (0, 0) to the target position (300, 300) while implementing a proportional–derivative (PD) tracking controller based on the state estimate from (7.2). If a GPS distortion happens, then the state estimate will be no longer trustworthy. After the GPS measurement is turned off, the only available relative state measurement causes the sensor drift problem [18]. The UAV will switch the control mode from the robust control mode to the emergency control mode when the distortion is detected, using EsC to escape away from the pursuer within the escape time. We solve the problem with EsC with Potential Function described in Program 7.16. The online computation is done using Julia, and EsC is implemented by using JuMP [39] package with Ipopt solver. UAV Model. We use a double integrator UAV dynamics under the GPS spoofing distortion as in [40]. The discrete-time state vector xk considers planar position y y y and velocity at time step k, i.e., xk = [rkx , rk , vkx , vk ]T , where rkx and rk denote the y x horizontal position coordinates, respectively, and vk and vk denote the corresponding velocity coordinates. We consider the acceleration of UAV as the control input u k = y [u kx , u k ]T . Weassume that both the state constraints and control input constraints y
y
are given by (vkx )2 + (vk )2 ≤ 5 and (u kx )2 + (u k )2 ≤ 2. With a sampling time of 0.1s, the double integrator model is discretized into the following matrices: ⎡
1 ⎢0 A=⎢ ⎣0 0
0 1 0 0
0.1 0 1 0
⎤ 0 0.1⎥ ⎥, 0⎦ 1
⎡
0 ⎢0 B=⎢ ⎣0.1 0
⎤ 0 0⎥ ⎥, 0⎦ 0.1
and the outputs ykG , ykI and ykS denote the position measurements from the GPS, the velocity measurements from the IMU, and the GPS signal strength measurements, respectively, with the output matrices
1000 0010 I C = , C = , C S = 1. 0100 0001 G
The covariance matrices of the sensing and disturbance noises are chosen as Σw = 0.1 · I , ΣG = I , Σ I = 0.01 · I , and Σ S = I . GPS Spoofing Distortion and Distortion Signal Estimation. The GPS distortion happens whenever the UAV is in the effective range of the spoofing device. In this distortion scenario, the distortion signal is d = [10, 10]T . The location of the pursuer and the nominal power of the spoofing device are xka = [100, 100]T and ηk = 200, respectively, which are both unknown to the UAV. The estimation obtained by (7.7) is shown in Fig. 7.4.
182
W. Wan et al.
Fig. 7.4 Distortion signal estimation. The UAV stays in the effective range of the spoofing device from time step 231 to 356
Fig. 7.5 Detection. The detector state Sk is defined in (7.9) of the CUSUM detector. The threshold 2 is χ f (α) calculated by log d1−δ with α = 0.01 and δ = 0.15
Detection. Using the estimated distortion signal to calculate the detector state Sk by (7.9), the detector finds the distortion using the normalized vector as shown in Fig. 7.5. In Fig. 7.5, there are abnormal high detector state values, which imply that there is a distortion. The statistical significance of the distortion is tested using the CUSUM detector described in (7.10) with the significance α at 1%. Pursuer State Estimation. If the GPS distortion is detected, then the UAV first estimates the pursuer state xka by using Algorithm 7.2 with window size M = 5; the estimation result is shown in Fig. 7.6. The estimated location and the estimated nominal power quickly converge to the true values. The estimates are drifting when the UAV remains in a GPS-denied environment. After obtaining an estimate of the pursuer state, EsC is used to escape away from the effective range of the spoofing device. Trajectory Generation. Program 7.16 with the prediction horizon N = k esc + 40 and the scaling parameter β = 50000 is used to generate the estimated and true trajectories of the simulated scenario shown in Fig. 7.7. As shown in Fig. 7.8, the state estimation error xk − xˆk is increasing when the UAV is in the effective range
7 Resilient Estimation and Safe Planning for UAVs in GPS-Denied Environments
183
Fig. 7.6 Pursuer state estimation
Fig. 7.7 Estimated and true trajectories of the simulated scenario. The pursuer is located at (100, 100) with reffect = 30, which is displayed as the light blue circle
of the spoofing device, and the error is bounded by the tolerable error distance ζ = 3 corresponding to k esc = 125. Figure 7.9 shows how the proposed control framework performs in different cases, where reffect ∈ {10, 30, 50, 70}. Regardless of the size of reffect , the UAV will escape the effective range within the escape time.
7.4 Extension to Time Coordination Tasks for Multi-UAV Systems 7.4.1 Multi-agent Network Let xi,k ∈ Rn , i = 1, . . . , Na , denote the state of the ith agent associated with dynamic system model (7.1), where Na denotes the total number of agents. Graph theory can provide the natural abstractions for how information is shared between
184
W. Wan et al.
Fig. 7.8 Bounded estimation error xk − xˆk
Fig. 7.9 Trajectories with different effective ranges. In a, the UAV can pass the pursuer without changing the direction or even its speed since reffect is small enough. From b–d, the UAV drives away from the effective range within the escape time and tries to get as close to the global goal as possible
agents in a network [41]. An undirected graph G = (V, E) consists of a set of nodes V = {1, 2, . . . , Na }, which corresponds to the different agents, and a set of edges E ⊂ V × V , which relates to a set of unordered pairs of agents. In particular, (i, j), ( j, i) ∈ E if and only if there exists a communication channel between agents i and j. The neighborhood N(i) ⊆ V of the agent i will be understood as the set { j ∈ V | (i, j) ∈ E}.
7.4.2 Path Following Consensus Each agent i ∈ V has a desired trajectory gi : si,k → Rn s that is parameterized by the coordination state variable si,k ∈ [0, 1] as shown in Fig. 7.10. The dimension n s is usually 2 for 2D missions or 3 for 3D missions.
7 Resilient Estimation and Safe Planning for UAVs in GPS-Denied Environments
185
Fig. 7.10 Illustration of the path following consensus. The goal of the multi-agent system is for all agents to reach the desired goal state simultaneously. For the agent i at time k, the virtual target/predetermined desired state is gi (si,k ) and the true state is xi,k . The error between the virtual target gi (si,k ) and the true state xi,k (marked in red) is to be minimized. The pursuer is on the path of the agent j and the effective spoofing area is displayed as the light blue circle. When the distortion is detected, the agent j will be re-planning the trajectory so that the state estimation errors remain in the tolerable region, while the other agents will adjust their coordination states accordingly to achieve time coordination
At time k, gi (si,k ) denotes the virtual target that the agent i follows at that time, i.e., agent i pursues to minimize the error gi (si,k ) − xi,k , which is marked in red in Fig. 7.10. The state si,k can be seen as a normalized length of the trajectory. The agents also desire to achieve a consensus of the coordination state variable si,k − s j,k → 0 as k → ∞ ∀i, j ∈ V, so that the virtual targets of the agents arrive at the destination at the same time. The agent i knows the coordination state si,k as well as the coordination states s j,k for neighboring agents j ∈ N(i).
7.4.3 Time Coordination Task Given a multi-agent network consisting of number of Na agents described in (7.1), the agent i, where i = 1, . . . , Na , aims to follow its desired trajectory gi (·) with a reference rate ρ, i.e.,
186
W. Wan et al.
gi (si,k ) − xi,k → 0 as k → ∞,
(7.17a)
si,k+1 − si,k → ρ as k → ∞,
(7.17b)
and to achieve time coordination, i.e., si,k − s j,k → 0 as k → ∞
(7.18)
for all i, j ∈ V , and for all k ≥ 0. In the meanwhile, each agent aims to detect the GPS spoofing distortion, obtain the resilient state estimation when considering the limited sensor availability, and complete the path-following mission securely.
7.4.4 Consensus Protocol Consider the coordination state of the consensus network model si,k+1 = si,k + z i,k ,
(7.19)
where z i,k ≥ 0 denotes the control input for the coordination state of the agent i at time index k. To solve the path following consensus problem in (7.17) and (7.18), we propose to design the control input z i,k as ⎧ ⎨
z i,k = max −ke |gi (si,k ) − xi,k | − ks ⎩
j∈N(i)
⎫ ⎬
(si,k − s j,k ) + ρ + 1distorted zˆ i,k , 0 , ⎭ (7.20)
where ke > 0 and ks > 0 denote coordination control gains, and the reference rate ρ is the desired rate of progress that is a constant. The first term, −ke |gi (si,k ) − xi,k |, indicates that if there is a tracking error, then the agent reduces the coordination speed. The second term, −ks j∈N(i) (si,k − s j,k ), captures the consensus term, which reduces errors between the local coordination state with those of the neighbors. The third term, ρ, denotes the desired rate if there is no tracking error and no coordination error. The last term, zˆ i,k = ke |gi (si,k ) − xi,k |, drives the virtual target away from the spoofing device even when the UAV detours the planned trajectory. The indicator function 1i,distorted is defined as 1i,distorted
1, if a distortion is detected by agent i, 0, otherwise.
Moreover, if −ke gi (si,k ) − xi,k − ks j∈N(i) (si,k − s j,k ) + ρ + 1i,distorted zˆ i,k is less than zero, then the virtual target chooses to stay at the current state rather than go backward.
7 Resilient Estimation and Safe Planning for UAVs in GPS-Denied Environments
187
( j)
Table 7.1 Bézier curve control points Pi i/( j)
(0)
(1)
(2)
(3)
1 2 3
[0 , 0]T [200, 0]T [400, 0]T
[100, 100]T [100, 100]T [450, 150]T
[10, 300]T [250, 200]T [300, 300]T
[190, 400]T [200, 400]T [210, 400]T
400 Agent 1 Agent 2 Agent 3
400 300
Agent 1 Agent 2 Agent 3
y- axis
y- axis
300
200
200 100 100 0 0 0
100
200
x- axis
300
400
(a) Trajectories of the three agents in dashed lines generated by Bézier curves (7.21) using the control points summarized in Table 7.1
0
100
200
x- axis
300
400
(b) Path following trajectories of three agents are plotted in solid lines. Every three hex points connected by two dotted lines indicate the locations of three agents at the same coordination state
Fig. 7.11 Trajectory generation and time coordination
7.4.5 Illustrative Example Trajectory Generation and Time Coordination for Multi-UAV Systems. The nominal trajectories of a three-UAV system gi (si,k ), where i ∈ {1, 2, 3}, are generated by cubic Bézier curves [42] 2 3 Pi(2) + si,k Pi(3) , gi (si,k ) (1 − si,k )3 Pi(0) + 3(1 − si,k )2 si,k Pi(1) + 3(1 − si,k )si,k (7.21) ( j) where si,k ∈ [0, 1] denotes the coordination state and Pi , where j ∈ {0, 1, 2, 3}, are the control points for the agent i; the control points for this example are listed in Table 7.1. Figure 7.11a shows the trajectories generated by (7.21), and the Bézier curve control points for each agent are marked with colored dots. Agent i aims to follow the trajectory starting from point Pi(0) and plans to arrive at the destination point Pi(3) simultaneously. To achieve these goals, the time coordination controller proposed in (7.20) is used to update the consensus network in (7.19); then a PD tracking controller is used to track the virtual target generated by the coordination state in (7.19).
188
W. Wan et al.
The parameters used in (7.20) and the PD controller were ρ=
1 , ke = 0.005, ks = 0.005, k p = 0.05 , kd = 0.315, 1200
where k p and kd denote the proportional gain and the derivative gain, respectively. Figure 7.11b shows the path following and time coordination results. A series of locations of the three agents are plotted by the hex points. Their connections by the dotted lines indicate that they have the same coordination states. We observe that the time coordination and PD control are well-designed, and all of the agents arrived at the goal destination simultaneously. In the Presence of GPS Spoofing Distortion. The GPS distortion happens when the UAV is in the effective range of the spoofing device. In this scenario, the distortion signal is dk = [10, 10]T and the effective range of the spoofing device is reffect = 30. The location of the pursuer is xka = [200, 200]T , which is unknown to the UAV until it is inside the effective range of the spoofing device. The estimation obtained by (7.7) is shown in Fig. 7.12a. The detector state Sk can be obtained by using the estimated distortion signal as in (7.9). The abnormal high detector state values shown in Fig. 7.12b imply that there is a distortion. Statistical significance of the distortion is tested using the CUSUM detector described in (7.10) with the significance α at 1%. The threshold is calculated by log
χd2f (α) 1−δ
with α = 0.01 and δ = 0.15.
EsC in Program 7.16 with prediction horizon N = k esc + 50 and scaling parameter β = 10000 is used to generate the new trajectory for safety operation. Figure 7.13 shows the trajectory of the simulated scenario. EsC drives the distorted UAV away from the effective range of the spoofing device; time coordination is achieved and all of the agents arrive at the destination points simultaneously. Figure 7.14 presents how the proposed control framework performs in different cases, where reffect ∈ {15, 50, 60, 70}. Regardless of the size of reffect , the UAV will escape the effective range within the escape time and achieve time coordination. In Fig. 7.14a, the distorted UAV can pass the pursuer without changing the direction or even its speed, since reffect is small enough. From Fig. 7.14b–d, the UAV drives away from the effective range within the escape time and tries to get back to the assigned trajectory.
7.5 Conclusion and Future Work In this chapter, we presented a secure safety-constrained control framework that adapts the UAV at a path re-planning level to support resilient state estimation against GPS spoofing distortions. A resilient state estimator has been designed, and the χ 2 CUSUM algorithm is used for detection. In the presence of a GPS spoofing distortion, the state estimation suffers from increasing variance due to the limited sensor availability. In this case, using the robust controller may keep the UAV within the effective range of the spoofing device after the estimation errors are not in the tolerable region. The large estimation error will give rise to safety problems. PLT is
7 Resilient Estimation and Safe Planning for UAVs in GPS-Denied Environments
189
(a) Distortion signal estimation
(b) Detection Fig. 7.12 Distortion estimation and detection
developed to track the pursuer’s location and estimate the spoofing device’s effective range by using UKF with sliding window outputs. Then, EsC is used to escape away from the effective range of the spoofing device within the escape time to solve this safety problem. The proposed framework has been extended for the multiUAV systems that perform the time-critical coordination task. Illustrative numerical examples are given to demonstrate the results. While the results presented in this chapter attempt to achieve safe operation for UAVs in GPS-denied environments, many open questions are left for further research and development. For example, when the UAV is under GPS spoofing distortion, the
190
W. Wan et al. 400
y- axis
300
200
100
0 0
100
200
300
400
x- axis Fig. 7.13 Trajectory in the presence of the distortion. The pursuer is located at [200, 200]T with reffect = 30, which is displayed as the light blue circle
Fig. 7.14 Trajectories when pursuer is located at [200, 200]T with different effective ranges
feasibility of Program 7.13 and 7.15 depends on how far the UAV is away from the boundary of the effective range of the spoofing device. When they are not feasible, Program 7.16 can still generate a solution such that the UAV can escape with a minimum time. The drawback is that Program 7.16 could violate the escape time constraint when the pursuer has an aggressive strategy. The possible extensions and future directions are (i) developing an emergency landing controller when safety constraints are not feasible, (ii) investigating the conditions when UKF with sliding window algorithm could fail, and (iii) exploiting multi-UAV-based group strategy in GPS-denied environments. UAVs do not only suffer from exogenous disturbances such as those discussed in this chapter. Existing autopilots for UAVs underlie relatively simple and sometimes
7 Resilient Estimation and Safe Planning for UAVs in GPS-Denied Environments
191
idealistic, dynamical models that, in problems of practical interest, are insufficient to forecast the vehicle’s dynamics and, hence, may lead to the design of insufficient control systems. The next chapter marks the beginning of the third part of this contributed book which is dedicated to contemporary control problems for autonomous UAVs. That chapter presents a novel control system for multi-rotor UAVs based on incremental nonlinear dynamic inversion (INDI) that is proven to be robust against unmodeled aerodynamic forces. Acknowledgements This work has been supported by the National Science Foundation (ECCS1739732, and CMMI-1663460).
References 1. Maza I, Kondak K, Bernard M, Ollero A (2009) Multi-UAV cooperation and control for load transportation and deployment. In: Selected papers from the 2nd international symposium on UAVs, Reno, Nevada, USA. Springer, pp 417–449 2. Hardin PJ, Jackson MW (2005) An unmanned aerial vehicle for rangeland photography. Rangeland Ecol Manag 58(4):439–442 3. Candiago S, Remondino F, De Giglio M, Dubbini M, Gattelli M (2015) Evaluating multispectral images and vegetation indices for precision farming applications from UAV images. Remote Sens 7(4):4026–4047 4. Warner JS, Johnston RG (2003) GPS spoofing countermeasures. Homeland Secur J 25(2):19– 27 5. Warner JS, Johnston RG (2002) A simple demonstration that the global positioning system (GPS) is vulnerable to spoofing. J Secur Adm 25(2):19–27 6. McMilin E, De Lorenzo DS, Walter T, Lee TH, Enge P (2014) Single antenna GPS spoof detection that is simple, static, instantaneous and backwards compatible for aerial applications. In: International technical meeting of the satellite division of the institute of navigation. Institute of Navigation, Tampa, FL, pp 2233–2242 7. Montgomery PY, Humphreys TE, Ledvina BM (2009) Receiver-autonomous spoofing detection: experimental results of a multi-antenna receiver defense against a portable civil GPS spoofer. In: International technical meeting of the institute of navigation, pp 124–130 8. Magiera J, Katulski R (2015) Detection and mitigation of GPS spoofing based on antenna array processing. J Appl Res Technol 13(1):45–57 9. Chen Y-H (2012) A study of geometry and commercial off-the-shelf (COTS) antennas for controlled reception pattern antenna (CRPA) arrays. In: ION GNSS. The Institute of Navigation, pp 907–914 10. Chen Y-H, Lo S, Akos DM, De Lorenzo DS, Enge P (2013) Validation of a controlled reception pattern antenna (CRPA) receiver built from inexpensive general-purpose elements during several live-jamming test campaigns. In: International technical meeting. The Institute of Navigation, San Diego, CA, pp 154–163 11. Jansen K, Pöpper C (2017) Advancing attacker models of satellite-based localization systems: the case of multi-device attackers. In: Conference on security and privacy in wireless and mobile networks. Association of Computer Machinery, Boston, MA, pp 156–159 12. Mo Y, Garone E, Casavola A, Sinopoli B (2010) False data injection attacks against state estimation in wireless sensor networks. In: Conference on decision and control. IEEE, Atlanta, GA, pp 5967–5972 13. Fawzi H, Tabuada P, Diggavi S (2014) Secure estimation and control for cyber-physical systems under adversarial attacks. IEEE Trans Autom Control 59(6):1454–1467
192
W. Wan et al.
14. Pajic M, Weimer J, Bezzo N, Tabuada P, Sokolsky O, Lee I, Pappas GJ (2014) Robustness of attack-resilient state estimators. In: International conference on cyber-physical systems. ACM/IEEE, Berlin, Germany, pp 163–174 15. Pasqualetti F, Dörfler F, Bullo F (2013) Attack detection and identification in cyber-physical systems. IEEE Trans Autom Control 58(11):2715–2729 16. Wan W, Kim H, Hovakimyan N, Voulgaris PG (2019) Attack-resilient estimation for linear discrete-time stochastic systems with input and state constraints. In: Conference on decision and control, Nice, France, pp 5107–5112 17. Wan W, Kim H, Hovakimyan N, Voulgaris P (2022) Constrained attack-resilient estimation of stochastic cyber-physical systems. Soc Sci Res Netw 1–12 18. Yoon H-J, Wan W, Kim H, Hovakimyan N, Sha L, Voulgaris PG (2019) Towards resilient UAV: escape time in GPS denied environment with sensor drift. IFAC-PapersOnLine 52(12):423–428 19. Psiaki ML, O’Hanlon BW, Bhatti JA, Shepard DP, Humphreys TE (2013) GPS spoofing detection via dual-receiver correlation of military signals. IEEE Trans Aerosp Electron Syst 49(4):2250–2267 20. Teixeira A, Shames I, Sandberg H, Johansson KH (2015) A secure control framework for resource-limited adversaries. Automatica 51:135–148 21. Wang X, Hovakimyan N, Sha L (2018) RSimplex: a robust control architecture for cyber and physical failures. Trans Cyber-Phys Syst 2(4):27 22. Wan W, Kim H, Hovakimyan N, Sha L, Voulgaris PG (2020) A safety constrained control framework for UAVs in GPS denied environment. In: Conference on decision and control. IEEE, Jeju Island, Republic of Korea, pp 214–219 23. Wan W, Kim H, Cheng Y, Hovakimyan N, Voulgaris P, Sha L (2021) Safety constrained multiUAV time coordination: a bi-level control framework in GPS denied environment. In: Aviation forum. Virtual Event, AIAA, p 2463 24. Friis HT (1946) A note on a simple transmission formula. Proc IRE 34(5):254–256 May 25. Teixeira A, Amin S, Sandberg H, Johansson KH, Sastry SS (2010) Cyber security analysis of state estimators in electric power systems. In: Conference on decision and control. IEEE, Atlanta, GA, pp 5991–5998 26. Mo Y, Chabukswar R, Sinopoli B (2014) Detecting integrity attacks on SCADA systems. IEEE Trans Control Syst Technol 22(4):1396–1407 27. Page ES (1954) Continuous inspection schemes. Biometrika 41(1/2):100–115 28. Barnard GA (1959) Control charts and stochastic processes. J R Stat Soc Ser B (Methodol), pp 239–271 29. Lai TL (1995) Sequential changepoint detection in quality control and dynamical systems. J R Stat Soc Ser B (Methodol), pp 613–658 30. Julier SJ, Uhlmann JK (1997) New extension of the Kalman filter to nonlinear systems. In: Signal processing, sensor fusion, and target recognition VI, vol 3068. International Society for Optics and Photonics, Orlando, FL, pp 182–193 31. Wan EA, Van Der Merwe R (2000) The unscented Kalman filter for nonlinear estimation. In: Adaptive systems for signal processing, communications, and control symposium. IEEE, Lake Louise, AB, pp 153–158 32. Van Hessem D, Bosgra O (2002) A conic reformulation of model predictive control including bounded and stochastic disturbances under state and input constraints. In: Conference on decision and control, vol 4. IEEE, Las Vegas, NV, pp 4643–4648 33. Köhler J, Soloperto R, Müller MA, Allgöwer F (2020) A computationally efficient robust model predictive control framework for uncertain nonlinear systems. IEEE Trans Autom Control 66(2):794–801 34. Schlüter H, Allgöwer F (2020) A constraint-tightening approach to nonlinear stochastic model predictive control under general bounded disturbances. In: World congress, vol 53, no 2. IFAC, pp 7130–7135 35. Ge SS, Cui YJ (2000) New potential functions for mobile robot path planning. IEEE Trans Robot Autom 16(5):615–620
7 Resilient Estimation and Safe Planning for UAVs in GPS-Denied Environments
193
36. Olfati-Saber R (2006) Flocking for multi-agent dynamic systems: algorithms and theory. IEEE Trans Autom Control 51(3):401–420 37. Choset HM, Hutchinson S, Lynch KM, Kantor G, Burgard W, Kavraki LE, Thrun S (2005) Principles of robot motion: theory, algorithms, and implementation. MIT Press, Boston, MA 38. Wolf MT, Burdick JW (2008) Artificial potential functions for highway driving with collision avoidance. In: IEEE international conference on robotics and automation, pp 3731–3736 39. Dunning I, Huchette J, Lubin M (2017) Jump: a modeling language for mathematical optimization. SIAM Rev 59(2):295–320 40. Kerns AJ, Shepard DP, Bhatti JA, Humphreys TE (2014) Unmanned aircraft capture and control via GPS spoofing. J Field Robot 31(4):617–636 41. Mesbahi M, Egerstedt M (2010) Graph theoretic methods in multiagent networks, vol 33. Princeton University Press, Princeton, NJ 42. Bartels RH, Beatty JC, Barsky BA (1995) An introduction to splines for use in computer graphics and geometric modelling. Morgan Kaufmann Wenbin Wan is an Assistant Professor in the Department of Mechanical Engineering at the University of New Mexico (UNM). He received his Ph.D. in mechanical engineering and M.S. in applied mathematics from the University of Illinois Urbana-Champaign (UIUC). He earned a B.S. in mechanical engineering from the University of Missouri-Columbia. He was appointed as MechSE Teaching Fellow at UIUC and named CPS Rising Star by the University of Virginia. His research interests are in control and estimation, optimization, machine learning, and cyberphysical systems and their safety-critical applications. Hunmin Kim received a Ph.D. degree in Electrical Engineering and Computer Science from Pennsylvania State University (2018), and a Bachelor’s degree in Mechanical Engineering from Pusan National University, Korea (2012). He is currently an Assistant Professor of Electrical and Computer Engineering at Mercer University. He worked as a Postdoctoral Research Associate in The Department of Mechanical Science and Engineering at the University of Illinois at UrbanaChampaign. Naira Hovakimyan received her M.S. degree in Applied Mathematics from Yerevan State University in Armenia. She got her Ph.D. in Physics and Mathematics from the Institute of Applied Mathematics of the Russian Academy of Sciences in Moscow. She is currently a W. Grafton and Lillian B. Wilkins Professor of Mechanical Science and Engineering at UIUC. She has coauthored 2 books, 11 patents, and more than 450 refereed publications. She is the 2011 recipient of the AIAA Mechanics and Control of Flight Award, the 2015 recipient of the SWE Achievement Award, the 2017 recipient of the IEEE CSS Award for Technical Excellence in Aerospace Controls, and the 2019 recipient of the AIAA Pendray Aerospace Literature Award. In 2014, she was awarded the Humboldt prize for her lifetime achievements. She is a Fellow of AIAA and IEEE. Petros G. Voulgaris received the Diploma in Mechanical Engineering from the National Technical University of Athens, Greece, and the S.M. and Ph.D. degrees in Aeronautics and Astronautics from the Massachusetts Institute of Technology. He is currently the Victor LaMar Lockhart Professor and Chair of Mechanical Engineering at the University of Nevada, Reno. Before joining UNR in 2020 and since 1991, he was a faculty in Aerospace Engineering at the University of Illinois at Urbana-Champaign. He is a Fellow of IEEE. Lui Sha graduated with Ph.D. from CMU in 1985. He worked at the Software Engineering Institute from 1986 to 1998. He joined UIUC in 1998 as a Full Professor. Currently, he is a Donald B. Gillies Chair Professor of the Computer Science Department and Daniel C. Drucker Eminent Faculty at UIUC’s College of Engineering. He is a Fellow of IEEE and ACM. He was a member of the National Academic of Science’s Committee on Certifiably Dependable Software Systems and a member of the NASA Advisory Council.
Chapter 8
Incremental Nonlinear Dynamic Inversion-Based Trajectory Tracking Controller for an Agile Quadrotor Design, Analysis, and Flight Tests Results Emre Saldiran and Gokhan Inalhan
8.1 Introduction In the last decades, the interest in unmanned aerial vehicles (UAVs) has increased considerably. One of the most popular types of these UAVs is the quadrotor. This kind of vehicle became a popular choice due to its much simpler mechanical design compared to the conventional helicopter, easily accessible off-the-shelf components, and open-source autopilot solution that provides fully autonomous flight, such as ArduPilot and PX4 [1, 2]. The quadrotors’ vertical takeoff and landing ability allow them to operate in tighter spatially constrained environments. The quadrotors gained attention both in academia as a research tool and in many industries, including cargo delivery [3], film making [4], precision agriculture [5], topological surveying [6], search and rescue [7], and building inspection [8]. Quadrotors must operate reliably in unmodeled environments. With the projected increase in the quadrotor flight every day [9], designing a robust control law becomes even more important. The robust controller would allow the vehicle to operate more precisely under harsh environmental conditions. Quadrotors experience high levels of disturbance while performing fast flight maneuvers. Shorter flight times, which require a higher flight speed, are crucial for cost reduction in the quadrotor application. In recent years, the control of UAVs at high speed became more prominent. One of the popular research areas where high-speed motion is needed is autonomous drone racing. Autonomous drone racing introduces challenging objectives for navigation, planning, and control. Some of the challenges were mentioned in these studies [10]. Autonomous drone racing requires performing extreme maneuvers through E. Saldiran · G. Inalhan (B) Cranfield University, Bedford, United Kingdom e-mail: [email protected] E. Saldiran e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 A. L’Afflitto et al. (eds.), Control of Autonomous Aerial Vehicles, Advances in Industrial Control, https://doi.org/10.1007/978-3-031-39767-7_8
195
196
E. Saldiran and G. Inalhan
a cluttered environment, e.g., both high speed and high acceleration. Achieving these demanding maneuvers without colliding with obstacles requires good trajectory tracking. During the high-speed forward flight phase, aerodynamic forces and moments acting on the UAV change considerably compared to the hover condition. Thus, deployed controllers must compensate for these effects during the tracking of high-speed trajectories. A controller that takes into account aerodynamic forces and moments was shown to improve the tracking performance. Developing a controller for high-speed trajectories requires high-fidelity aerodynamic modeling of UAVs in forward flight conditions. However, obtaining a good model for very high-speed forward flight is a laborious process and requires multiple outdoor flight tests in calm weather. In the literature, trajectory tracking performance has been improved either by incorporating linear lumped aerodynamic drag parameters directly into the controller or by optimizing the controller gains with high-fidelity parametric models. In [11–13], it was shown that the inclusion of lumped linear drag force has reduced the trajectory tracking error. However, these trajectories were tracked with a relatively low speed of 5 m/s. Recently, [14] showed that identifying and incorporating the drag effects that occur during extreme conditions (e.g., vertical, high-speed horizontal, and high angular acceleration) into the controller can reduce the tracking error significantly. While adding drag-related terms into the controller has been shown to decrease the tracking error, these methods still require multiple flight tests for the identification of aerodynamic drag parameters. Also, the identified parameters need to be revised for each UAV, to achieve better tracking performance. In [15, 16] and in our previous work [17], it is shown that the speed damping derivatives and speed stability derivatives change significantly with increasing speed and linear controller gains are optimized using high fidelity parametric model for a given handling quality criteria. The trajectory tracking performance obtained by applying these gains was evaluated rigorously, and, although satisfactory results were obtained, the modelbased approach requires multiple flight tests and calm weather conditions to perform high-speed forward flight modeling. A new control method named incremental nonlinear dynamic inversion (INDI) has been proven to be robust to unmodeled aerodynamic effects. Incremental nonlinear dynamic inversion, also known as the sensor-based control method, utilizes sensor measurements to minimize model dependency. This method has been applied to helicopters [18], fixed-wing aircraft [19, 20], and quadrotors [21, 22]. In [23, 24], the ability of an INDI-based system to control a quadrotor with rotor failure was demonstrated. These results, however, are intended to regulate the quadrotor around the hover conditions, not for trajectory tracking. In this chapter, the INDI controller is introduced in a cascaded fashion for position and attitude tracking. The INDI controller is derived for the rotational and translational dynamics separately. The model-related parameters are separated from the controller. This new formulation allows an intuitive and structured way to tune and implement the INDI controller. A new approach to applying INDI to the yaw
8 Incremental Nonlinear Dynamic Inversion …
197
axis rotational dynamics is introduced. This approach is novel because it simplifies the nonlinear control allocation equation by eliminating the nonlinear rotor acceleration term. This approach further simplifies torque calculations for the actuator state feedback by removing the rotor inertia and the motor’s time constant parameters. The stability of this approach is investigated with closed-loop analysis and flight tests. The designed controller is implemented on a racer quadrotor platform running ArduPilot open-source autopilot software. The software-in-the-loop (SITL) tests are conducted before the flight test to verify the successful implementation of the controller. The outdoor flight tests are performed to evaluate and compare trajectory tracking performance with a classical proportional–integral–derivative (PID) controller. The proposed control law had a 50% lower tracking error for the circle trajectory with 10 m/s maximum speed and a 70% lower tracking error for the lemniscate trajectory with 11 m/s maximum speed. To evaluate the attitude hold and attitude tracking performances, the proposed control law is tested at the vehicle’s physical limit, which is a maximum speed of 30 m/s and a pitch angle of 60◦ . The overshoot from this pitch angle to the level flight is 70% smaller for the proposed INDI controller due to its better disturbance rejection capabilities. The source code is publicly available on GitHub [25]. This chapter is structured as follows. In Sect. 8.3, the experimental platform, quadrotor model, and coordinate system notation are introduced. In Sect. 8.4, the INDI control law is derived for the rotational and translational dynamics, and the attitude and position control system architecture is described. Furthermore, the new formulation for the missing rotor rotational acceleration term is derived. The reference, tracking, and robustness of the INDI controller are evaluated through closedloop analysis. In Sect. 8.5, experimental results from outdoor flight tests for trajectory tracking and attitude hold are given. Finally, in Sect. 8.6, we draw conclusions and outline future work directions.
198
E. Saldiran and G. Inalhan
8.2 Notation
p, q, r φ, θ, ψ σˆ Δ α Cd Cdm Cm Ct Ctr d F1 , F2 , F3 , F4 f ref g Ir Iv Hrot (s) Htrans (s) Ht (s) Hy (s) K1 K2 K3 Kq , Kw Kpos , Kvel m M(s) q r s R BI
: Components of angular velocity vector in body frame [rad/s] : Vehicle roll, pitch, and yaw angle [rad] : Scaled throttle command : Parametric uncertainty scaling factor : Angular acceleration vector in body frame[rad/s2 ] : Body frame drag coefficient [N/(m/s)2 ] : Drag moment coefficient[N · m/(rad/s)2 ] : Propeller moment coefficient, [N/(rad/s)2 ] : Propeller thrust coefficient, [N/(rad/s)2 ] : Scaled throttle to the rotation speed [rad/s] : Distance from the rotor center to the vehicle center [m] : Individual motor thrust [N] : Specific thrust vector [m/s2 ] : Gravitational acceleration [m/s2 ] : Rotor moment of inertia [kg · m2 ] : Diagonal vehicle moment of inertia matrix [kg · m2 ] : Angular acceleration and estimated torque filter transfer function : Inertial acceleration and estimated specific thrust filter transfer function : Specific thrust reference filter transfer function : Yaw axis torque reference filter transfer function : Linearized roll/pitch axis control effectiveness [kg · m2 /(rad · s)] : Linearized yaw axis control effectiveness [kg · m2 /(rad · s)] : Linearized yaw axis acceleration control effectiveness [kg · m2 ] : Attitude and angular rate controller gain vector : Position and velocity controller gain vectors : Vehicle total mass [kg] : Motor dynamic transfer function : Quaternion vector : Position vector in inertial frame : Laplace operator : Rotation matrix from the body to the inertial frame
Ttot τd τf τm τ ref τt τy vB ω
: Motors collective thrust [N] : Input disturbance moment vector [N · m] : Angular acceleration and estimated torque filter time constant [s] : Motor time constant [s] : Reference torque vector [N · m] : Specific thrust reference filter time constant [s] : Yaw axis torque reference filter time constant [s] : Velocity vector in body frame, m/s : Motor rotational speed vector [rad/s] : Angular velocity vector in body frame [rad/s]
8 Incremental Nonlinear Dynamic Inversion …
199
8.3 Quadrotor Model This section introduces the experimental quadrotor platform and the mathematical model to describe the vehicle’s dynamics. The lumped aerodynamic effect is considered during modeling to increase simulation fidelity. Although the INDI-based controller allows compensating for any unconsidered aerodynamic effect, a highfidelity model is still required for controller verification.
8.3.1 Experimental Platform An agile maneuvering quadrotor that is capable of reaching speed up to 30 m/s is used for demonstration. The quadrotor’s high thrust-to-weight ratio allows it to reach high speed and evaluate the INDI-based controller under unmodeled aerodynamic disturbances.
8.3.2 Coordinate Frames and Equations of Motion The coordinate frames and free-body diagram of the quadrotor platform are shown in Fig. 8.1. The inertial frame is defined as north–east–down (NED). The body frame is attached to the vehicle’s center of mass with the x-axis pointing toward the vehicle nose, the y-axis pointing toward the right, and the z-axis pointing toward the down as shown in Fig. 8.1. Vehicle orientation represented using a quaternion q = [qw , qx , q y , qz ]T , where qw denotes the scalar part and [qx , q y , qz ] the vector part. Rotation from the body frame to the inertial NED frame is defined by the
Fig. 8.1 The experimental platform body coordinate frame and motor order
200
E. Saldiran and G. Inalhan
rotation matrix ⎤ ⎡ 2 qw + qx2 − q y2 − q z2 2 qx q y − qw qz 2 qx qz + qw q y R BI = ⎣ 2 qx q y + qw qz qw2 − qx2 + q y2 − q z2 2 q y qz − qw qx ⎦ , 2 q x q z − qw q y 2 q y qz + qw qx qw2 − qx2 − q y2 + qz2
(8.1)
where ⊗ denotes the Hamiltonian product. The vehicle rotational dynamics are captured by 1 q ⊗ ω, 2 Iv ω˙ + ω × Iv ω = Mprop () + Mgyro (ω, ) + Maero (ω) + Mext , q˙ =
(8.2a) (8.2b)
where ω = [ p, q, r ]T denotes the vehicle’s angular velocity, Iv denotes the diagonal matrix of inertia. The moment due to the angular velocity of the propellers and the distance of each propeller to the center of mass are given by ⎡ ⎤ √ ⎤⎞ ⎡ ⎛⎡ ⎤ Ω2 Ct d/√2 −1 −1 1 1 ⎢ 12 ⎥ Ω2 ⎥ Mprop () diag ⎝⎣Ct d/ 2⎦⎠ ⎣ 1 −1 −1 1 ⎦ ⎢ ⎣Ω32 ⎦ , 1 −1 1 −1 Cm Ω42
(8.3)
where d denotes the distance from the center of mass to the rotor in and Cm denotes the moment coefficient of the propeller. The gyroscopic moment due to a change in the axis of rotation and a change in the rotation speed of the rotor is given by ⎡ ⎤ ⎡ ⎤ ⎤ Ω˙ ⎡ ⎤ Ω 0 0 0 0 ⎢ ˙ 1⎥ q −q q −q ⎢ 1 ⎥ Ω2 ⎥ ⎣ ⎦ ⎢Ω2 ⎥ Mgyro (ω, ) Ir ⎣ 0 0 0 0 ⎦ ⎢ ⎣Ω˙ 3 ⎦ + Ir − p p − p p ⎣Ω3 ⎦ , (8.4) 1 −1 1 −1 0 0 0 0 Ω4 Ω˙ 4 ⎡
where Ir denotes the rotor moment of inertia in [kg · m2 ]. The drag moment due to the angular velocity of the vehicle Maero (ω) −Cdm ω|ω|, where Cdm denotes the drag moment coefficient vector.
(8.5)
8 Incremental Nonlinear Dynamic Inversion …
201
The vehicle’s translational dynamics are captured by r˙ I = v I ,
(8.6a)
m v˙ + mω × v = Fgrav (q) + Fprop () + Faero (v ) + Fext , B
B
B
(8.6b)
T T where r I = r x , r y , r z and v I = vx , v y , vz denote the position and velocity of the vehicle in the inertial frame, and v B = R IB v I denotes the body frame velocity. The force due to gravitational acceleration ⎡
⎤ 0 Fgrav (q) R(q) IB ⎣ 0 ⎦ , mg
(8.7)
where m denotes the mass and g the gravitational acceleration. The force due to propellers’ angular velocity ⎡
⎤ 0 Fprop () ⎣ 0 ⎦ , −Ttot
(8.8)
where Ttot (Ω) Ct
4
Ωi2
(8.9)
i=1
denotes the total thrust and Ct denotes the thrust coefficient of the propeller. The drag force due to the vehicle airspeed Faero (v B ) −Cd v B |v B |,
(8.10)
where Cd denotes the body frame drag coefficient. The actuator is modeled as a first-order system Ω 1 , Ωc τm s + 1
(8.11)
Ωc Ctr σˆ ,
(8.12)
where
Ctr denotes the coefficient for scaled throttle command to rotor speed, τm denotes the time constant of the motor, and σˆ denotes the scaled throttle command. The commanded speed and actual rotation speed of the rotor are denoted by Ωc and Ω, respectively. The list of model parameters is given in Table 8.1.
202
E. Saldiran and G. Inalhan
Table 8.1 The list of model parameters obtained from ground tests or literature Symbol Value Unit Description m Ix x I yy Izz Ct Cm Cd Cdm τm Ctr Ir g
0.5 3.544 · 10−3 3.544 · 10−3 6.935 · 10−3 1.459 · 10−6 1.629 · 10−8 1.536 · 10−2 2.832 · 10−3 0.043 2917.4 6.73 · 10−6 9.80665
kg [kg · m2 ] [kg · m2 ] [kg · m2 ] [N/(rad/s)2 ] [N · m/(rad/s)2 ] [N/(m/s)2 ] [N · m/(rad/s)2 ] [s] [(rad/s)−1 ] [kg · m2 ] [m/s2 ]
Mass Moment of inertia in x-axis Moment of inertia in y-axis Moment of inertia in z-axis Propeller thrust coefficient Propeller moment coefficient Body frame drag coefficient Drag moment coefficient Motor dynamic time constant Scaled throttle to the rotation speed Moment of inertia of rotor Gravitational acceleration
8.4 Design and Analysis In this section, an INDI control law is formulated for the rotational and translational dynamics of a quadcopter UAV. Attitude and angular velocity control laws are designed around the INDI controller. The robustness of the proposed method is investigated through closed-loop analysis. Finally, a novel yaw INDI formulation, which requires fewer model parameters, is introduced.
8.4.1 Rotational Dynamics INDI Formulation The derivations of an INDI control law using the Taylor series expansion are presented in [21, 22]. In this section, an INDI control law is derived for the rotational dynamics. Despite existing results, a simplification in the moment due to rotor acceleration is introduced. Thus, our solution eliminates a nonlinear inversion term in control effectiveness. Rearranging (8.2), we obtain that ω˙ = Iv−1 [Maero (ω) + Mext − ω × Iv ω] + Iv−1 Mprop () + Mgyro (ω, ) . (8.13) The contribution of gyroscopic moment Mgyro (ω, ) in body x-y axes is an order of magnitude smaller than the moment due to propeller Mprop (Ω). However, the moment due to rotor acceleration has a lead effect on the body z-axis dynamics [17, 26]. A reduced version of Mgyro (ω, ) with only z-axis-related terms is given by
8 Incremental Nonlinear Dynamic Inversion …
203
⎡ ⎤ 4 0 Mgyro,red () = Ir (−1)i+1 Ω˙ i ⎣0⎦ . 1 i=1
(8.14)
Despite existing results [21, 22], the control allocation calculation is separated from the control law. The control input-related terms M prop (Ω) and Mgyro,red () are expressed as a single input named torque reference τ ref . This value is used as an input to the control allocation algorithm. The torque reference is given by ⎡
τ ref
⎤ τref,x ⎣τref,y ⎦ = Mprop () + Mgyro,red (). τref,z
(8.15)
Thus, rewriting Eq. (8.13), we obtain that ω˙ = Iv−1 [Maero (ω) + Mext − ω × Iv ω] + Iv−1 τ ref , ω˙ = f(ω) + g(ω)τ ref .
(8.16) (8.17)
In (8.17), state-dependent terms and input-dependent terms are separated. Applying the Taylor series expansion to (8.17) around the current point (ω0 , τ 0 ), which is denoted by the subscript 0, we obtain ω˙ ≈ f(ω0 ) + g(ω0 )τ0 ∂ f(ω) + g(ω)τ ref (ω − ω0 ) + ∂ω 0 ∂ + f(ω) + g(ω)τ ref (τ ref − τ 0 ). ∂τ ref 0
(8.18)
After simplification, the approximated rotational dynamic becomes ω˙ ≈ ω˙ 0 + g(ω)(τ ref − τ 0 ), α = α0 +
I −1 v (τ ref
− τ 0 ),
(8.19a) (8.19b)
where α denotes the angular acceleration. The INDI control law is given by τ ref = I v (α ref − α 0 ) + τ 0 ,
(8.20)
where τ ref denotes the torque reference used as an input to the control allocation, α ref denotes the reference angular acceleration, α 0 denotes the vehicle angular acceleration estimation or measurement, and τ 0 denotes the torque calculated from the measured propeller angular velocity. At each time step, the error between the angular acceleration reference and angular acceleration measurement is computed, and the difference is added to the torque calculated from the propeller angular velocity.
204
E. Saldiran and G. Inalhan
This control law requires both the angular acceleration and the rotor’s angular velocity measurement. Most of the commercial-off-the-shelf affordable IMUs only measure the angular velocity. The angular acceleration can be obtained via the backward Euler differentiation, the predictive filter [19], or the state estimation method [27]. In this study, the backward Euler differentiation method is chosen due to its simplicity and ease of implementation. A first-order low-pass filter is used to attenuate the noise due to numerical differentiation. A block diagram of Eq. (8.20) with the low-pass filter is given in Fig. 8.2, where the low-pass filtered measurements are denoted by the subscript f . The moment estimation block uses the same relation from the rotor speed to the torque calculation given in (8.3).
8.4.2 Closed-Loop Analysis of the INDI Rotational Dynamics Before deploying a new controller, the stability of the closed-loop system employing the INDI control law captured by Fig. 8.2 is proven. The result is attained by analyzing the vehicle’s response around the hover flight condition by linearizing control allocation and angular acceleration dynamic around the hover flight condition. The tracking, input disturbance responses, and uncertainty analysis are presented. Performance degradation due to missing rotational acceleration term in control allocation calculation of the yaw axis is also investigated.
8.4.2.1
Roll Axis Analysis
The closed-loop analysis is only done on the roll axis since the rotational dynamics of the roll and pitch axis are identical. Linearizing the rotational dynamics of the x-axis (8.16) around the hover flight condition, we obtain that
Fig. 8.2 Rotational dynamic INDI controller. The block diagram of rotational dynamic INDI controller from the angular acceleration reference α ref to the angular acceleration output ω˙
8 Incremental Nonlinear Dynamic Inversion …
205
Fig. 8.3 Linearized Rotational Dynamic INDI Controller. The block diagram of linearized rotational dynamics INDI controller in the body x-axis
Ix x p˙ = 2Ct l y Ωh [−(−Ω − Ωh ) − (−Ω − Ωh ) + (Ω − Ωh ) + (Ω − Ωh )], (8.21) where I x x αx = K 1 Ω = τx ,
(8.22)
K 1 8Ct l y Ωh denotes the linearized roll axis control effectiveness, αx denotes the body x-axis rotational acceleration, and τx denotoes the total moment in the body x-axis. The INDI controller for linearized dynamic Eq. (8.21) is given by Ωref =
Ix x αref − α f,x + Ω f , K1
(8.23)
where α f,x and Ω f denote the filtered rotational acceleration and rotor rotation speed, respectively, and αref denotes the angular acceleration reference in the body x-axis. A block diagram of the control law is given in Fig. 8.3. The transfer functions Hrot (s) captures the low-pass filter, M(s) represents the motor dynamics, and τd denotes the input disturbance moment. The closed-loop transfer function of the system from the angular acceleration reference αref,x to the angular acceleration output αx for the filter Hrot (s) with the same cutoff frequency is reduced to αx = M(s). αref,x
(8.24)
For the same cutoff frequency, the reference tracking in Fig. 8.3 reduces to only motor dynamics [21]. This result shows the importance of having the same filter
206
E. Saldiran and G. Inalhan
cutoff frequency for both the rotor speed and the angular acceleration feedback. Different cutoff frequencies create a phase shift between rotational acceleration and rotor rotation speed measurement which reduces the phase margin. The transfer function from the disturbance τd,x to the angular acceleration αx for the same filter cutoff frequency is give by αx = Ix−1 x (1 − M(s)Hrot (s)). τd,x
(8.25)
This result shows that the INDI controller disturbance rejection bandwidth is determined by the motor and filter dynamics. So far, all the analyses have been performed under the assumption that we have perfect knowledge of the model parameters. Further analyses are performed to assess the controller sensitivity to the parameter uncertainty. These uncertainties could be due to a change in the control effectiveness at high-speed forward flight. The INDI control law block diagram in Fig. 8.3 for the linearized roll axis dynamics is updated with the new input function Ix x I¯x x , (8.26) =Δ ¯ K 1 K1 where Δ denotes the scaling applied to the input function in (8.23). With these changes, the closed-loop transfer function of the system from the reference angular acceleration αref,x to the angular acceleration output αx becomes ΔM(s) αx . = αref,x 1 + M(s)Hrot (s)(Δ − 1)
(8.27)
The step response of (8.27) for different values of Δ is given in Fig. 8.4. It can be seen that the INDI control law is robust to perturbations from the nominal model parameters The closed-loop response to the 5 times larger and smaller parametric uncertainty remains stable. As the perturbation increases, the response becomes less damped, and a slight overshoot occurs. Underestimating the model parameters in the INDI control law results in an overdamped response compared to overestimating it. Expanding (8.27) as Δτ f s + Δ αx , = αref,x τm τ f s 2 + (τm + τ f )s + Δ
(8.28)
gives an insight into INDI control response under parametric uncertainty with changing filter cutoff frequency, where τm denotes the motor time constant and τ f denotes the filter time constant. As the filter frequency increases, the first term at the numerator and denominator of (8.28) becomes much smaller, and the transfer function response approximates the first-order system. This relation shows that the INDI controller robustness to the parametric uncertainty improves with increasing cutoff frequencies.
8 Incremental Nonlinear Dynamic Inversion … Fig. 8.4 Parametric Uncertain System Step Response. Step response of linearized roll axis rotational dynamic INDI controller for different parametric uncertainty levels. The blue line represents the perfect model response which is also the motor dynamic, and the yellow and orange lines represent 5 and 0.2 times scaled uncertainty level response, respectively
1.2 1 0.8 0.6 0.4 0.2 0 0
8.4.2.2
207
0.2
0.4
0.6
0.8
1
Yaw Axis Analysis
The z-axis of the rotational dynamic equation in (8.16) is linearized around hover flight condition as Izz r˙ = 2Cm Ωh [(Ω − Ωh ) − (−Ω − Ωh ) + (Ω − Ωh ) − (−Ω − Ωh )] d(Ω − Ωh ) ∂(−Ω − Ωh ) ∂(Ω − Ωh ) ∂(−Ω − Ωh ) − + − , + Ir dt ∂t ∂t ∂t (8.29) where Izz αz = K 2 Ω + K 3 Ω˙ = τz ,
(8.30)
K 2 8Cm Ωh denotes the linearized yaw axis control effectiveness, K 3 4Ir denotes the linearized yaw axis acceleration control effectiveness, αz captures the body z-axis rotational acceleration, and τz denotes the total moment in the body z-axis. In the literature, the moment produced by the rotor speed change is taken into account in the control allocation algorithm either via nonlinear relation in Eq. (8.16) or by the numeric derivative calculation [21]. Even the difference between the sampling period of the control loop and motor time constant is shown to result in a high-frequency oscillation in the yaw axis [28]. It is known that the effect of rotor rotational acceleration presents itself as the lead term in the yaw axis dynamic, and it is required for capturing the vehicle dynamic during system identification [17, 26]. However, identifying the rotor moment of inertia and motor time constant requires a more rigorous measurement setup compared to the thrust and moment coefficients. INDI control law is developed for the linearized yaw dynamic without the linearized yaw axis acceleration control effectiveness K 3 term
208
E. Saldiran and G. Inalhan
Fig. 8.5 Linearized rotational dynamic INDI controller. The block diagram of linearized rotational dynamic INDI controller in the body z-axis. The rotor acceleration term is neglected and instead Hy,1 or Hy,2 low-pass filter placed to compensate for it
Ωref =
Izz αref,z − α f,z + Ω f . K2
(8.31)
A block diagram of the new control law is given in Fig. 8.5. Effect of first-order low pass filter Hy,1 (s) and Hy,2 (s) on the yaw axis INDI controller stability is investigated individually. The closed-loop transfer function from the angular acceleration reference αref,z to the angular acceleration output αz is written as follows. Initially, yaw output filter Hy,1 (s) = Hy,2 (s) = 1 is assumed to be ideal.
K3s K2
αz = αref,z 1+
+ 1 M(s)
K3s K2
M(s)H (s)
.
(8.32)
The closed-loop response of the yaw axis INDI control law without the rotor acceleration term in the control allocation amplifies the high-frequency reference as shown in the bode plot of the loop transfer function in Fig. 8.6. Although the closed-loop system is stable, the current response would result in erratic behavior under noisy measurement. This result is different from the roll axis response where closed-loop INDI control law reduced to the motor dynamic. The closed-loop response of the system is improved by placing a first-order low-pass filter. The response with each filter is analyzed individually. Placing the low pass filter before the input αref,z would also attenuate the high-frequency reference command, but the system would amplify the high-frequency noise from the angular acceleration measurement. The effects of these filters on the loop transfer function are compared in Fig. 8.6. The first-order low-pass filter with a 3 Hz cutoff frequency is placed at two different locations in the loop transfer function line in Fig. 8.6. Using either Hy,1 or Hy,2 resulted in a desired −20 dB rolloff at a high frequency. Both responses have a similar cross-over frequency of 110 rad/s. However, the loop transfer function with Hy,2 has a lower phase margin compared to Hy,1 and the same low-frequency gain as the original system. The yaw axis does not affect the trajectory tracking performance of the quadrotor
8 Incremental Nonlinear Dynamic Inversion …
209
60 40 20 0 -20 -1 10
10
0
10
1
10
2
10
3
10
0
10
1
10
2
10
3
0
-50
-100 -1 10
Fig. 8.6 Loop transfer function bode diagram. The z-axis linearized rotational dynamic INDI controller loop transfer function Bode diagram with different filter placement. The blue line shows the response of the controller with no filter. Neglecting the rotor acceleration term results in high dB at increasing frequency. Either Hy,1 or Hy,2 is placed to compensate for the missing dynamic. Both the orange and the yellow responses have the same crossover frequency and −20dB roll-off at a high frequency. The design with filter Hy,1 is chosen due to a smaller magnitude at low frequency and higher phase margin
and often has a relatively slow dynamic compared to the roll/pitch axis. It is also desirable to have a small amplitude at a low frequency to reduce actuator usage. For these reasons, the low-pass filter Hy,1 is decided to be used, which will be denoted as Hy (s). Updated version of Eq. (8.32) is K3s + 1 Hy,1 (s)M(s) K2 αz = . αref,z 1 + KK32s Hy,1 (s)M(s)H (s)
(8.33)
Similar to the analysis performed on the roll axis INDI control law, the parametric uncertainty analysis is made for the yaw axis INDI control law. A block diagram of INDI control law for linearized yaw axis dynamic is updated with new input function, where Δ is the scaling applied to the INDI control law in Eq. (8.31). Ix x I¯x x . =Δ ¯ K 1 K1
(8.34)
210
E. Saldiran and G. Inalhan 1.2 1 0.8 0.6 0.4 0.2 0 0
0.2
0.4
0.6
0.8
1
Fig. 8.7 Parametric uncertain system step response. Step responses of the linearized yaw axis rotational dynamic INDI controller for different parametric uncertainty levels. The blue line represents the perfect model response, and the yellow and orange lines represent 5 and 0.2 times the scaled uncertainty level response, respectively. Fixed the uncertainty level, the response degrades more than it does for the roll axis
A simplified version of the closed-loop transfer function of the yaw INDI controller is ΔHy (s)M(s)(K 3 s + K 2 ) αz , = αref,z K 2 + Hy (s)M(s)H (s)(Δ(K 3 s + K 2 ) − K 2 )
(8.35)
where the transfer function Hy (s) and Hrot (s) are first-order low-pass filters with a cutoff frequency of 3 Hz and 40 Hz, respectively. Step response of Eq. (8.35) for different Δ values is given in Fig. 8.7. Similar to the roll axis, underestimation of model parameters results in an overdamped response. Yaw INDI controller response has a higher overshoot than roll INDI controller for the same uncertainty range. The overshoot in the closed-loop response in Fig. 8.7 decreases as the measurement filter cutoff frequency increases or Hy (s) filter frequency decreases, which suggests that a more robust response can be achieved in exchange for the better tracking error with decreasing yaw output filter cutoff value. The rotor moment of inertia and motor time constant are not used in the controller. However, the change in these model parameters can result in instability. In order to assess the robust stability for unmodeled parametric uncertainty, the Eq. (8.35) is expanded ΔK 3 τ f s 2 + (ΔK 3 + ΔK 2 τ f )s + ΔK 2 αz , = 3 v K 2 τ f τm τ y s + K 2 (τ y τ f + τ y τm + τ f τm )s 2 + K 2 (τ f + τm + τ y + ΔK 3 )s + ΔK 2
(8.36)
8 Incremental Nonlinear Dynamic Inversion …
211
Table 8.2 Parameter bounds for yaw axis INDI control law Symbol Lower bound Upper bound Δ Ir τm τf
0.2 1.35e-6 2.87e-2 2.7e-3
5 3e-5 6.45e-2 6e-3
Function Uncertainty level Z-axis moment of inertia Motor time constant Filter time constant
where τ y , τm , τ f are the time constant of transfer functions Hy (s), M(s), H (s), respectively. All the parameters given in Eq. (8.36) are greater than zero. The numerator of the transfer function in Eq. (8.36) is a second-order polynomial and all of its coefficients are positive. This ensures that the parametric uncertain yaw INDI controller is not a non-minimum phase system. The denominator of the transfer function has multi-linear coefficients; however, they can be grouped to define them as interval coefficients without any approximation, since all of the parameters are positive. The bounds of parameters are defined as follows. Recall from the Eq. (8.29) that K 2 and K 3 are defined as K 2 = 8Cm Ωh ,
K 3 = 4Ir .
(8.37)
The lower and upper bounds of the model parameters, including the uncertainty scaling Δ are given in Table 8.2. The bounds are selected as five times and one over five times the current value of the rotor inertia Ir . The motor and the filter time constant’s bounds are selected as 1.5 times and 1/1.5 times the current value. For a third-order interval polynomial family parametric uncertainty is p(s, q) = q0 + q1 s + q2 s 2 + q3 s 3 (qi ∈ [qi− , qi+ ], qi− > 0),
(8.38)
where qi − and qi+ represent the lower and upper uncertainty bound of parameters. A third-order polynomial family is robustly stable if the following polynomial is stable [29, 30]: (8.39) p(s, q)+− = q0+ + q1− s + q2− s 2 + q3+ s 3 . The polynomial in Eq. (8.36) is over-bounded and the resultant polynomial with new lower and upper bounds is given in Table 8.3. The candidate polynomial is written using the bound from Table 8.3. p(s, q)+− = 6.516e − 4 + 1.1e − 5s + 2.268e − 7s 2 + 2.678e − 9s 3 .
(8.40)
The stability of the resultant polynomial can be checked by using the Hurwitz theorem [31]. It can be seen that, according to the Hurwitz theorem, this polynomial is stable. We can conclude that the yaw axis INDI controller in Eq. (8.35) under the parameter bounds given in Table 8.2 is robustly stable.
212
E. Saldiran and G. Inalhan
Table 8.3 Interval polynomial parameter bounds for yaw axis INDI control law Symbol Lower bound Upper bound q0 q1 q2 q3
2.61e-5 1.1e-5 2.27e-7 5.29e-10
6.5e-4 1.62e-5 5.38e-7 2.678e-9
Fig. 8.8 Attitude and Angular Rate Controller Diagram. The attitude and angular rate controller are designed around the rotational dynamic INDI controller. The angular acceleration loop is closed with P-term controllers since the INDI controller reduces the nonlinear vehicle dynamic to the actuator dynamic which has a first-order linear dynamic. Reference attitude is given as quaternion
8.4.3 Attitude Controller The attitude controller plays an important role in the quadrotor flight. Degraded performance in attitude control/attitude hold could result in poor trajectory tracking and position hold performance. It is important to have a controller with a good disturbance rejection capability and robustness to the parameter uncertainty. The integrator term is used in the PID controller to compensate for the disturbance and uncertainty. If another design metric is not taken into account, a high integrator gain could result in an overshoot in the system response. The INDI control law given in (8.20) has an integral-like behavior. It accounts for the disturbance or model imperfection by incrementing the last control action at each time step. This behavior is different from the classical PID controller, where disturbance is eliminated slowly with an integrator in a first-order fashion. The INDI controller’s integral-like property eliminates the need for an integrator term in the attitude and the angular velocity controller. The attitude and angular rate controller are developed with a single P gain term in a cascaded fashion as shown in Fig. 8.8. K q and K w represent attitude and angular rate controller gain vector, respectively.
8.4.3.1
Attitude Error Calculation
The attitude controller loop is closed around the angular rate controller. The kinematic relation between angular rate and attitude is given in the modeling section. The relation between angle error and angular rate can be linearized by using the quater-
8 Incremental Nonlinear Dynamic Inversion …
213
nion notation for the attitude kinematic. The roll/pitch and yaw attitude dynamic is assumed to be uncoupled. Calculating attitude error separately for roll/pitch and yaw angle results in a shorting thrust vector direction correction, thus better trajectory tracking performance [32, 33]. Let the target attitude quaternion be qref and the current attitude quaternion be q. The total attitude error quaternion qe in the current attitude frame that rotates the current attitude to the target attitude is qref = q ⊗ qe .
(8.41)
This quaternion error equation is expanded as roll/pitch error and yaw error. Firstly, correction in the roll/pitch axis is calculated, after aligning the current thrust vector direction with the reference thrust vector, we correct for the yaw error. qref = q ⊗ qe,r p ⊗ qe,y .
(8.42)
The roll/pitch error correction is calculated by obtaining the shortest distance from the current attitude body frame z-axis to the reference attitude frame z-axis. Derivation of attitude error with quaternion is given in existing literature [34–36]. This derivation is omitted here for briefness. Compared to the Euler angle representation, the quaternion does not suffer from the singularity at ±π/2 radians pitch angle. Another advantage of the quaternion error calculation is that the relation between attitude error and its corresponding axis-angle representation in the body frame is linear. This allows us to design a linear controller from attitude to the angular rate loop without compensating for nonlinear kinematics introduced with Euler representation.
8.4.4 Translational Dynamics INDI Formulation In this section, the INDI control law for the translational dynamics of the quadrotor is developed. The translation dynamic equation is rearranged to separate the actuator term and state-related term. Recall the body frame translational dynamic Eq. (8.6b) v˙ B = m −1 Fgrav (q) + Faero (v B ) + Fext − mω × v B + m −1 Fprop (Ω).
(8.43)
The quadrotor’s thrust vector direction in the inertial frame is determined by its current attitude. To generate the attitude reference term in translational dynamic Eq. (8.13), the force equation is rotated from the body frame to the inertial frame by multiplying both sides with the rotation matrix R(q)IB . The translational dynamic equation in the inertial frame is ⎡ ⎤ 0 v˙ I = ⎣0 ⎦ + m −1 R(q) IB Faero (v B ) + Fext + m −1 R(q) IB Fprop (Ω), g
(8.44)
214
E. Saldiran and G. Inalhan
where the last term m −1 R(q) IB Fprop (Ω) represent the thrust vector in inertial frame. The rotor rotation speed-related term is abstracted by defining it as fref . This value is used for the total thrust and reference attitude calculation. The specific thrust reference fref is ⎤ ⎡ fref,x f ref = ⎣fref,y ⎦ = m −1 R(q) IB F prop (). (8.45) fref,z Substituting this into Eq. (8.44), we obtain ⎡ ⎤ 0 v˙I = ⎣0 ⎦ + m −1 R(q) IB Faero (v B ) + Fext + fref , g ˙ I v = f(v I , q) + f . ref
(8.46a) (8.46b)
The state-dependent terms and input-dependent terms are separated. Applying the Taylor series expansion to (8.46) around the current point (v0I , q0 , f0 ), denoted by subscript 0, we obtain a I ≈ f0 (v0 , q0 ) + f 0 ∂ I I I f(v + , q) + f ref (v − v0 ) ∂v I 0 ∂ I f(v , q) + f ref (q − q0 ) + ∂q 0 ∂ I f(v , q) + f ref (f ref − f 0 ). + ∂f ref 0
(8.47)
After simplification, approximated rotational dynamic reduces to a I = a0I + (f ref − f 0 ).
(8.48)
and INDI control law becomes I − a0 + f 0 , f ref = aref
(8.49)
I denotes the reference acceleration and a denotes the vehicle’s acceleration where aref in the inertial frame, f ref denotes the reference specific thrust used in the control allocation to obtain total motor command, a0 denotes the measured acceleration in the inertial frame, f 0 denotes the specific thrust calculated from the propeller’s angular velocity. At each time step, the error between the translational acceleration reference and translational acceleration measurement is computed, and the difference is added to the current specific thrust calculated from the propeller’s angular velocity. Note that the gravity-corrected acceleration in the inertial frame is used for acceleration
8 Incremental Nonlinear Dynamic Inversion …
215
Fig. 8.9 Translational Dynamic INDI Controller. The block diagram of translational dynamic INDI I to the acceleration output a I controller from inertial acceleration reference aref
measurement as
⎡ ⎤ 0 a I = R(q) IB a B + ⎣0 ⎦ . g
(8.50)
Similar to the rotational dynamic, the noisy accelerometer measurement is filtered with a first-order low-pass filter Htrans . The low-pass filters Htrans and Hrot are used for translational and rotational INDI controllers, respectively. In the literature, translational and rotational INDI controllers use the same filter for the rotor rotation speed, inertial acceleration, and angular acceleration measurement. This approach limits rotational dynamic INDI controller performance to the outer loop. A different filter frequency might be required for the quadrotor with a distinct noise profile. Although sensor measurements are filtered, the accelerometer is affected by mechanical imperfections in the frame. These imperfections can be due to an unbalanced propeller, unbalanced rotor, or frame resonance. To prevent the transfer of noisy signal to the inner attitude controller and the motor, a first-order low-pass filter is placed at the translation INDI controller output as shown in Fig. 8.9. In Fig. 8.9, the attitude reference block is responsible for generating the reference quaternion for the attitude controller. The yaw angle reference is given separately to the attitude controller. Although it is not shown, the current yaw angle is used at the attitude reference block instead of the reference yaw angle.
8.4.4.1
Attitude Reference Calculation
The reference attitude quaternion from the inertial to the body frame is constructed using the specific thrust reference in the inertial frame and the vehicle’s current yaw angle. Attitude reference is assumed to be smaller than 90◦ roll or pitch angle for simplicity. In the literature, the whole rotation space representation is studied [37, 38]. The relationship between the z-axis in the attitude reference frame and the specific thrust in the inertial frame is
216
E. Saldiran and G. Inalhan I eˆref,z =−
f ref . f ref
(8.51)
The current yaw angle is used to construct a vector that is perpendicular to the heading direction and lies in the horizontal plane of the inertial frame. ⎡
⎤ − sin ψ eˆ yI = ⎣ cos ψ ⎦ . 0
(8.52)
The remaining x and y axes are I eˆref,x =
I eˆref,y =
I eˆ yI × eˆref,z I eˆ yI × eˆref,z
,
I I × eˆref,x eˆref,z I I eˆref,z × eˆref,x
(8.53a) .
(8.53b)
Finally, the attitude reference matrix is given by T I I I I R(q)ref = eˆref,x , eˆref,y , eˆref,z .
(8.54)
8.4.5 Position and Velocity Controllers The position and velocity controllers are responsible for tracking a position command or holding the position under severe disturbance. The quadrotor experiences significant aerodynamic drag during high-velocity flight. The position and velocity controller without high integrator gain would lag behind the reference position during these high-speed flights. A classical PID controller compensates for the aerodynamic drag mainly by the high integrator term. However, the INDI control law given in (8.49) compensates for the external aerodynamic drag forces or disturbances by applying incremental correction over the previous actuator command. The kinematic relation between the position and the velocity is a single integrator as given in Sect. 8.4. The position and velocity controllers are designed with a single P gain term in a cascaded fashion as given in Fig. 8.10. It is important to avoid dynamic interaction between the inner attitude control loop and the outer position control loop. According to the guidelines outlined in [39], the crossover frequency of the outer loop should be 3–5 times smaller than the crossover frequency of the inner loop as shown by ωc,outer =
1 1 ωc,inner . − 3 5
(8.55)
8 Incremental Nonlinear Dynamic Inversion …
217
Fig. 8.10 Position and velocity controller diagram. The position and velocity controllers are designed around the translational dynamic INDI controller. The inertial acceleration loop is closed with P term controllers. INDI controller reduces the nonlinear vehicle dynamic to the inner loop attitude control dynamic for the horizontal axis and motor dynamic for the vertical axis
8.4.6 Control Allocation The control allocation is a process of determining the actuator command for the given torque and force reference [40]. The control allocation algorithm used in the quadrotor has to allocate body moment/torque and body collective thrust to the four actuator commands for the given configuration of the vehicle. The novel solution is given in Sect. 8.4.2.2, where the nonlinear rotor acceleration term control allocation is eliminated. We obtain a relationship from τ ref and Ttot to individual motor commands by combining Eqs. (8.8) and (8.3) as ⎤ ⎡ −1 F1 ⎢ F2 ⎥ 1 ⎢−1 ⎢ ⎥= ⎢ ⎣ F3 ⎦ 4 ⎣ 1 F4 1 ⎡
1 −1 −1 1
1 −1 1 −1
⎤⎞ ⎡ ⎤ ⎤ ⎛⎡ √ τref,x 1 √2/d ⎥⎟ ⎢ ⎥ ⎜⎢ 1⎥ ⎥ diag ⎜⎢ 2/d ⎥⎟ ⎢ τref,y ⎥ , ⎝⎣Ct /Cm ⎦⎠ ⎣ τref,z ⎦ 1⎦ Tr e f,tot 1 1
(8.56a)
where Fi , i ∈ [1, 2, 3, 4] denotes the thrust produced by the individual motor, τref,x , τref,y , τref,z denote the reference torque/moment obtained from the rotational INDI controller, and Tr e f,tot denotes the reference total thrust from the translational INDI controller. This operation defines a map from the thrust per motor to the reference body torques and total thrust. The current formulation of the control allocation does not take into account the moment due to rotor acceleration. This gyroscopic moment requires knowledge of the rotor moment of the inertia and motor time constant, which requires more elaborate ground testing to obtain.
8.5 Implementation and Testing In this section, the implementation and testing details of the INDI control law for the inner and outer loops are given. To reduce the development cost of control systems, design through SITL simulations was performed. After successfully verifying
218
E. Saldiran and G. Inalhan
the control law in the SITL simulation, flight tests were performed to validate the controller’s performance. The proposed INDI control design is implemented into an ArduPilot autopilot [1]. ArduPilot is an open-source autopilot software with multiple features that can be used for control system testing. These features include but are not limited to a variety of off-the-shelf flight controller support tools, real-time data monitoring, high-quality state estimation for feedback, and bidirectional communication with the electronic speed controllers (ESCs) which drive the motors. A series of experiments were performed to evaluate the INDI controller’s performance in the time domain. The step response of the attitude and position controller is compared with the result from a linear analysis to assess the validity of the assumption underlying the control system’s development. The circle and lemniscate-shaped trajectories were tracked at a high speed to demonstrate the controller’s robustness to aerodynamic drag forces and moments. The controllers’ attitude tracking and attitude hold performance were assessed by performing high-speed forward flight tests, in which the vehicle experienced a significant pitching moment. For a fair comparison, the ArduPilot PID controller is tuned using its autotune feature.
8.5.1 Implementation and SITL Simulation The INDI controller was implemented into the ArduPilot autopilot as a separate library. The legacy controller was used as a backup solution to minimize development costs. A switching mechanism between the legacy and the proposed controller was enforced to avoid any conflict during operation. Up to 21 parameters were used to define the controller gains and model-related parameters. While the ArduPilot controllers require 15 parameters for the inner attitude controller, 7 for the horizontal position controller, and 11 for the vertical position controller, making it a total of 33. To make a fair comparison, only the PID controller-related parameters are counted. Initial tests were conducted in the SITL environment to verify successful implementation. The communications between the compiled ArduPilot executable and Simulink® were made via UDP protocol by using the already available ArduPilotSimulink® interface. The mission planner ground control station (GCS) software was used to interact with autopilot. The SITL simulation architecture is given in Fig. 8.11. The communication rate between the vehicle model in Simulink® and Ardupilot was 1000 Hz, while the autopilot control loop frequency was 500 Hz. The autopilot executable and vehicle model were lockstep in soft real time.
8 Incremental Nonlinear Dynamic Inversion …
219
Fig. 8.11 SITL communication architecture. The vehicle’s model running in Simulink® and ArduPilot executable was in lockstep at a 1 kHz communication rate. The pilot command was delivered from a physical transmitter to autopilot via ground control station
Fig. 8.12 Test platform components. The quadrotor components are listed for reference. The ESCs have a rotor rotation speed sensing capability which provides actuator state feedback used in the INDI controller
8.5.2 Experimental Platform The flight tests were performed with an agile maneuvering quadrotor platform. The general view of the experimental platform is given in Fig. 8.12. The components of the quadrotor are described in Table 8.4. The INDI control law requires actuator feedback in its formulation. However, some studies utilized the actuator model for the actuator feedback [41], which contradicts the nature of the INDI control law due to model-dependent design. Some of the most recent commercial off-the-shelf ESCs provide telemetry information, namely,
220
E. Saldiran and G. Inalhan
Table 8.4 List of the components used in the experimental platform Name Manufacturer Model Flight controller GNSS Module Motor ESC Propeller Battery
mRobotic ProfiCNC EMAX Hobbywing Dalprop ProFuse
Pixracer R15 Here+ RS2205 2600KV Xrotor-30A-micro T5045 Tri-bladed 4S 1300mAh 75C
the rotation speed, current, voltage, and temperature measurement per motor with the development of new firmware and protocols such as UAVCAN or DSHOT. The rotation speed of the motor can be obtained without installing a sensor per motor, such as an optical encoder sensor as done in [28]. The rotation speed is measured by the ESCs using the back electromotive force (EMF) zero-crossing detection method [42]. The electrical and optical rotation speed value is compared in the static thrust test experiment to assess the reliability of the electrical angular speed measurement obtained via the back EMF method. The electrical rotation speed measurements matched 99.9% with the optical rotation speed measurement. In the experimental platform, the angular velocity of the motors is obtained from the telemetry information provided by the ESC. The ESCs execute the BLHeli32 32.7 firmware. Angular velocity information is transferred from the ESCs to the autopilot either via a separate telemetry line or by the same line used for sending the motor command. The bidirectional Dshot capability is added to the ArduPilot, so a second solution is used for the ESC telemetry transmission. The autopilot was configured to use the DShot600 protocol. The DShot is a new low latency, high-speed digital communication protocol between the flight controller and ESCs [43].
8.5.3 ArduPilot Control Allocation In these experiments, ArduPilot control allocation, also known as a mixer, is used. The ArduPilot model-free control architecture takes scaled torque and scaled total thrust command as an input and generates scaled thrust force per motor. This relationship is given by ⎡ ˆ ⎤ ⎡ F1 −1 ⎢ Fˆ2 ⎥ ⎢−1 ⎢ ⎥=⎢ ⎣ Fˆ3 ⎦ ⎣ 1 1 Fˆ4
1 −1 −1 1
1 −1 1 −1
⎤ ⎤ ⎛⎡ ⎤⎞ ⎡ τˆref,x 1 0.5 ⎥ ⎜⎢ ⎥⎟ ⎢ 1⎥ ⎥ diag ⎜⎢0.5⎥⎟ ⎢ τˆref,y ⎥ , ⎝⎣0.5⎦⎠ ⎣ τˆref,z ⎦ 1⎦ 1 1 Tˆr e f,tot
(8.57)
8 Incremental Nonlinear Dynamic Inversion …
221
where Fˆi ∈ [0, 1], i ∈ (1, 2, 3, 4) denotes the scaled actuator command, τˆref,x , τˆref,y , τˆref,z ∈ [−1, 1] denote the scaled torque references in the body frame generated from the rotational dynamic INDI formulation, and Tˆr e f,tot denotes the scaled total thrust in the body frame generated from the translational dynamic INDI formulation. We divide both sides of (8.56) by Ct Cr2 , which is the maximum thrust a motor can generate. We rearrange the constant terms so that the control allocation matrix resembles the ArduPilot implementation. ⎡ ˆ ⎤ ⎡ F1 −1 ⎢ Fˆ2 ⎥ ⎢−1 ⎢ ⎥=⎢ ⎣ Fˆ3 ⎦ ⎣ 1 1 Fˆ4
1 −1 −1 1
1 −1 1 −1
⎛⎡
√
⎤⎞
2 ⎡ 2 1 0.5 √t Cr ⎥⎟ τref,x ⎜⎢ 2dC ⎜⎢ ⎥⎟ ⎜⎢ 2 ⎥⎟ ⎢ 1⎥ ⎥ diag ⎜⎢0.5⎥⎟ diag ⎜⎢ 2dCt Cr2 ⎥⎟ ⎢ τref,y ⎜⎢ 1 ⎥⎟ ⎣ τref,z ⎦ ⎝ ⎣ ⎦ ⎠ 1 0.5 ⎝⎣ 2Cm Cr2 ⎦⎠ Ttot 1 1 1 4Ct Cr2
⎤
⎛⎡
⎤⎞
⎤ ⎥ ⎥. ⎦ (8.58)
Thus, we scale the torque and total thrust by multiplying them by ⎛⎡
√
⎤⎞
2 2 √t Cr ⎥⎟ ⎜⎢ 2dC ⎜⎢ 2 2 ⎥⎟ ⎢ 2dC C ⎥⎟ diag ⎜ ⎜⎢ 1t r ⎥⎟ . ⎝⎣ 2Cm Cr2 ⎦⎠ 1 4Ct Cr2
(8.59)
After scaling, we use (8.60) to find a relationship from the scaled thrust force to the scaled motor command, namely, Fˆ = (1 − a)σˆ + a σˆ 2 ,
(8.60)
where Fˆ denotes the scaled thrust force, σˆ denotes the scaled motor command, and a = 0.53 denotes the approximated model parameter for the UAV employed in the proposed experiments.
8.5.4 Time Domain Validation Before evaluating tracking performance, the proposed controller is validated through flight tests. The inner attitude and the outer position control loop are validated with doublet signal tracking performance. The ArduPilot attitude reference shaping algorithm was turned off for these experiments. To validate the attitude controller stability and response, a doublet signal with a 30◦ amplitude and 4 s period was applied as a reference roll angle. For the position controller, a doublet signal with 1 m amplitude and 10 s period is applied as a reference
222
E. Saldiran and G. Inalhan
40
1.5
30
1
20 0.5
10 0
0
-10
-30 -40
-0.5
Reference INDI Flight Test Legacy Flight Test INDI Linear
-20
0
1
2
3
Reference INDI Flight Test Legacy Flight Test INDI Linear
-1
4
5
6
7
8
9
10
-1.5
(a) Roll doublet response
0
5
10
15
(b) x-axis position doublet response
Fig. 8.13 Time domain validation. The attitude and position controller response of the proposed method is validated by comparing their doublet response from the flight test with the linearized controller. a shows the attitude controller response, whereas b shows the position controller response Table 8.5 The proposed controller attitude and position controller gain. Measurement and output filter cutoff frequency are also given Name Symbol Value Attitude controller gain vector Angular rate controller gain vector Rotational acceleration filter Hrot cutoff frequency Yaw output filter Hy cutoff frequency [Hz] Position controller gain vector Velocity controller gain vector Inertial acceleration filter Htrans cutoff frequency [Hz] Specific thrust reference filter Ht cutoff frequency [Hz]
Kq Kω – – Kpos Kvel –
[7.5, 7.5, 6] [30, 30, 24] 30 3 [1, 1, 1] [4, 4, 4] 20
–
3
position in the x-axis of the inertial reference frame. These flight tests were performed for both the INDI and the legacy controller under calm weather conditions. The attitude and position tracking response of the proposed controller, legacy controller, and linear approximation of the proposed controller are given in Fig. 8.13a, b, respectively. The controller gains used in this experiment are given in Table 8.5. The gains were selected to give a critically damped response. It can be seen from Fig. 8.13a, b that the proposed INDI controller flight tests and linearized INDI controller response overlap with both for rise time and settling time.
8 Incremental Nonlinear Dynamic Inversion …
223
8.5.5 Trajectory Tracking The trajectory tracking performance of the INDI controller is evaluated by flying two different trajectories. The circle and lemniscate-shaped trajectories were traversed at different speeds to assess the INDI controller’s robustness and disturbance rejection performance with increasing speed. Trajectory tracking flights were flown with both the INDI controller and autotuned ArduPilot legacy PID controller for comparison. Both circle and lemniscate-shaped trajectories are defined by three parameters, namely, w(t) =
wmax (t/tr )2 , 0 ≤ t ≤ tr , t > tr , wmax , ∞
θ=
w(t)dt,
(8.61a) (8.61b)
0
where rt denotes the trajectory radius, wmax the maximum frequency in radians per second, and tr the ramp-up time. The ramp-up time is used to smooth discontinuities in the velocity and acceleration terms at the start. The trajectory tracking was started after 5 s passed in GUIDED mode. The altitude reference remained constant for all of the experiments. The trajectory position derivatives, velocity, and acceleration were also fed into the controller as a feedforward. The circle trajectory reference position, velocity, and acceleration are produced by T rref = rt sin θ, −rt (cos θ − 1) , T vref = wrt cos θ, wrt sin θ , T aref = −w 2 rt sin θ, w 2 rt cos θ .
(8.62a) (8.62b) (8.62c)
The circle with rt = 15 m was tracked with four different speeds, namely, [4, 7, 10, 15] m/s which correspond to the different maximum frequency wmax . The fastest test, namely, the one at 15 m/s was not conducted for the legacy controller due to safety reasons. A bird’s-eye view of the circle trajectory track with a 10 m/s tangential speed is given in Fig. 8.14a. The two controllers appear to follow a similar track. However, Fig. 8.15a shows that the INDI controller’s mean position error norm is around 0.32 m, whereas the legacy controller is 1.22 m. This result is achieved despite having similar scaled total thrust output. The lemniscate-shaped trajectory reference position, velocity, and acceleration are produced using
224
E. Saldiran and G. Inalhan
35
20 Reference Proposed INDI Legacy PID
30 25
10
20
5
15
0
10
-5
5
-10
0
-15
-5 -20
-15
-10
-5
0
5
10
15
Reference Proposed INDI Legacy PID
15
-20
20
-5
0
5
10
15
20
25
30
35
(b) Lemniscate-shaped trajectory track
(a) Circle trajectory track
Fig. 8.14 Circle and lemniscate-shaped trajectories. Circle and lemniscate-shaped trajectories were tracked with both the proposed INDI controller and legacy controller. The vehicle reached a speed of 10 m/s for the circular trajectory and 11.74 m/s for the lemniscate-shaped trajectory. Although the trajectories may seem to overlap, the proposed controller achieved more than 50% smaller tracking error 1
1
Proposed INDI Legacy PID
0.5
Proposed INDI Legacy PID
0.5 0
0 0
5
10
15
0
5
10
15
0
5
10
15
0
5
10
15
0
5
10
15
10.5 10 10
5 0
5
10
15
14 12
12
10
10 0
5
10
15
0.2 0.15 0.15 0.1 0.1 0
5
10
(a) Circle trajectory results
15
(b) Lemniscate-shaped trajectory results
Fig. 8.15 Circle and lemniscate-shaped trajectories. The proposed INDI controller (blue) and the legacy PID controller (red) trajectory tracking results are compared. The mean position error norm is reduced from 1.22 m to 0.32 m for the circle trajectory and from 0.78 m to 0.21 m for the lemniscate-shaped trajectory. a shows the circle trajectory results whereas b shows the lemniscateshaped trajectory results
8 Incremental Nonlinear Dynamic Inversion …
225
1.1 INDI Circle INDI Lemniscate Legacy Circle Legacy Lemniscate
1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1
4
6
8
10
12
14
16
Fig. 8.16 Position error versus speed. The change in the mean of position error norm for different trajectories and controllers with increasing speed is shown. The proposed INDI controller reduced error by more than 50% for all of the trajectories and speeds. The result demonstrates the proposed controller disturbance rejection performance
T rref = −rt (cos θ − 1), rt sin 2θ , T vref = wrt sin θ, 2wrt cos 2θ , T aref = w 2 rt cos θ, −4w 2 rt sin 2θ .
(8.63a) (8.63b) (8.63c)
The lemniscate-shape trajectory parameters are set as rt = 15 m, wmax = 0.35, and vmax = 11.74 m/s. A bird’s-eye view of the lemniscate-shape trajectory is given in Fig. 8.14b. Notice that the legacy controller track is not on course during the long diagonal segment of the flight. The tracking error is noticeable when the vehicle is flying from the southwest to the northeast direction, where the wind was blowing toward the opposite direction during the flight. The legacy controller position error increases with the vehicle speed as shown in Fig. 8.15b. However, the INDI controller maintained a smaller position error with increasing speed. The mean position error is nearly 70% smaller for the INDI controller where it is reduced from 0.78 m to 0.21 m. A comparison of the Euclidean norm of the position error with increasing maximum velocity is shown in Fig. 8.16. The INDI controller outperforms the legacy controller significantly for all of the tests. The INDI controller tracking error increases slower than the legacy controller for both circle and lemniscate-shaped trajectories. Further reduction in INDI controller can be achieved by increasing the loop frequency from 500 Hz to 1000 Hz, tighter tuning position and attitude controller, and increasing filter cutoff frequency.
226
E. Saldiran and G. Inalhan
90 60 30 0 -30 -60 -90
90 60 30 0 -30 -60 -90
Pitch Ref. Pitch
0
2
4
6
8
10
12
30
30
20
20
10
10
0
Pitch Ref. Pitch
0
2
4
6
8
10
12
0
2
4
6
8
10
12
0
2
4
6
8
10
12
0 0
2
4
6
8
10
12
1
1
0.5
0.5
0
0 0
2
4
6
8
10
12
Fig. 8.17 Attitude hold and tracking results. The legacy (left) and INDI (right) controller attitude tracking and hold performance tested at fast-forward flight reaching up to 30 m/s and −60◦ pitch angle. The INDI controller returned to the level flight faster and with a smaller overshoot
8.5.6 Attitude Hold in High Forward Speed Flight A flight test scenario is created to push the vehicle to its physical limit, to unveil the advantages of the INDI controller. In this experiment, the vehicle is commanded to pitch and reach a steady-state speed of approximately 30 m/s. The goal of this experiment is to evaluate attitude tracking and attitude hold performance of controllers during fast-forward flight. The vehicle experiences high moment drag during the forward flight due to a change in the inflow angle. This results in a change in the control effectiveness value, which is measured using a static ground test. The flight tests were conducted in ALTHOLD mode, whereby the autopilot only tries to maintain a constant altitude while allowing roll, pitch, and yaw attitude to be controlled by the pilot. In the experiment, the vehicle was commanded to a −60◦ pitch angle. After reaching steady-state speed, the vehicle was commanded to return to a pitch angle of 0◦ . Finally, the pitch angle of 60◦ is commanded to return to the takeoff location. This procedure is shown in Fig. 8.17. Attitude tracking and hold results for the legacy and the INDI controller are given on the left and right sides of Fig. 8.17, respectively. Although the attitude tracking performance did not deteriorate with increasing speed for both controllers, the legacy controller uses a higher scaled total thrust command to maintain its attitude. This could be due to the low stability margin of the legacy controller at forward flight. The advantage of the INDI controller becomes evident when the vehicle is commanded to return to the level flight between 4–6 s. The legacy controller has an overshoot of 60◦ and shows low damping oscillation. After the airspeed decreases below 15 m/s, the legacy controller
8 Incremental Nonlinear Dynamic Inversion …
227
displays a high damping ratio closed-loop response. Overall, the legacy controller converges to the level flight in 2 s. However, the INDI controller has a single overshoot of 15◦ and converges to the level flight within 0.2 s.
8.6 Conclusion and Future Work In this chapter, the control of an agile maneuvering quadrotor using INDI is demonstrated both for the inner attitude control loop and outer position control loop of a quadcopter UAV. The disturbance rejection and tracking capability of the INDI control law are shown with closed-loop analysis and outdoor flight tests. The versatile control law development is a complex process and requires the control law to be robust to different vehicle configurations under severe weather conditions. To speed up the testing of the proposed controller for others, the INDI control law is implemented into the open-source ArduPilot project, and the source code is shared publicly. A less model-dependent solution is proposed for the yaw axis INDI control formulation. Despite existing results, this new formulation does not require the motor moment of inertia information and does still inherit the robustness and disturbance rejection properties of the INDI controller. This approach also eliminates the rotor acceleration term from the control allocation and reduces the number of model parameters required. Reference tracking, disturbance rejection, and robustness to the parametric uncertainty properties of the derived INDI controller are studied through closed-loop response analysis. The cascaded and model-free control architecture is presented to allow easy integration of the proposed controller. The cascaded architecture also enables using the INDI controller in different flight modes. The lack of an integrator term allows the proposed controller to be switched in and out of different flight modes without resetting the integrator. The bump-less transfer between different flight modes is demonstrated with outdoor flight tests. The circle and lemniscate-shaped trajectories are flown to demonstrate the trajectory tracking and disturbance rejection capability of the INDI controller. The track performance is compared with the legacy ArduPilot controller. The INDI controller was demonstrated to have more than 50% smaller tracking error, without performing any in-flight tuning. The nonlinear dynamic inversion assumption of the proposed INDI control law is tested by performing a fast-forward flight with a speed reaching up to 30 m/s. Attitude tracking and attitude hold performance of the INDI controller are shown to be similar to the hover flight assumption. The legacy controller has nearly 90% overshoots, whereas the proposed controller has only 30% overshoots. Although the proposed INDI controller outperforms the PID controller, there are still multiple areas of research that need to be tackled. The INDI controller requires less model information compared to the nonincremental dynamic inversion, however, the control effectiveness matrix must be
228
E. Saldiran and G. Inalhan
known up to a certain confidence level for satisfactory results. An adaptive control scheme can be developed to correct the mismatch between the assumed control effectiveness matrix and the real vehicle. The proposed controller is tested only in the time domain. The frequency-domain analysis can be made to validate the loop gain crossover frequency, and sensitivity margin of the controller in the linear analysis similar to the studies performed for non-incremental dynamic inversion control law [44, 45]. The INDI formulation takes incremental steps to eliminate any disturbance or model mismatch. Although the error is corrected at each time step, the use of a lowpassed filter delays the propagation of the error. A reinforcement learning (RL)-based control law can be developed to work cooperatively alongside the INDI controller. This RL-based controller could take reference trajectory, vehicle state, actuator state, and output reference to the control allocation algorithm. This new control law could gain a predictive feature, as the error is predicted and the feedforward term is given by the RL-based controller. The next chapter further explores the incremental control framework by presenting an incremental backstepping controller. As for the incremental dynamic inversion results presented in this chapter, the incremental backstepping controller will be proven to be superior to that of its classical counterpart while relying on less information on the controlled system.
References 1. Ardupilot (2020) Available online at http://ardupilot.org/ (Last accessed May 30, 2020) 2. Meier L, Honegger D, Pollefeys M (2015) PX4: A node-based multithreaded open source robotics framework for deeply embedded platforms. International conference on robotics and automation. IEEE, Seattle, WA, pp 6235–6240 3. Frachtenberg E (2019) Practical drone delivery. Computer 52(12):53–57 4. Galvane Q, Lino C, Christie M, Fleureau J, Servant F, o.-l. Tariolle F, Guillotel P, (2018) Directing cinematographic drones. ACM Trans Graph (TOG) 37(3):1–18 5. Mogili UR, Deepak B (2018) Review on application of drone systems in precision agriculture. Procedia Comput Sci 133:502–509 6. Madawalagama S, Munasinghe N, Dampegama S, Samarakoon L (2016) Low-cost aerial mapping with consumer-grade drones. Asian conference on remote sensing. Colombo, Sri Lanka, pp 1–8 7. Herekoglu O, Hasanzade M, Saldiran E, Cetin A, Ozgur I, Kucukoglu AG, Ustun MB, Yuksek B, Yeniceri R, Koyuncu E et al (2019) Flight testing of a multiple UAV RF emission and vision-based target localization method. Scitech. AIAA, San Diego, CA, pp 1–18 8. Seo J, Duque L, Wacker J (2018) Drone-enabled bridge inspection methodology and application. Autom Construct 94:112–126 9. Brief L (2011) Growth opportunity in global uav market. Las Colinas, USA 10. Foehn P, Brescianini D, Kaufmann E, Cieslewski T, Gehrig M, Muglikar M, Scaramuzza D (2020) Alphapilot: Autonomous drone racing. ArXiv preprint arXiv:2005.12813 11. Faessler M, Franchi A, Scaramuzza D (2017) Differential flatness of dynamics subject to rotor drag for accurate tracking of high-speed trajectories. IEEE Robot Autom Lett 3(2):620–626 12. Svacha J, Mohta K, Kumar V (2017) Improving quadrotor trajectory tracking by compensating for aerodynamic effects. International conference on unmanned aircraft systems. IEEE, Miami, FL, pp 860–866
8 Incremental Nonlinear Dynamic Inversion …
229
13. Kai J-M, Allibert G, Hua M-D, Hamel T (2017) Nonlinear feedback control of quadrotors exploiting first-order drag effects. IFAC-PapersOnLine 50(1):8189–8195 14. Rigter M, Morrell B, Reid RG, Merewether GB, Tzanetos T, Rajur V, Wong K, Matthies LH (2019) An autonomous quadrotor system for robust high-speed flight through cluttered environments without GPS. International conference on intelligent robots and systems. IEEE, Macau, China, pp 5227–5234 15. Tobias E, Sanders FC, Tischler M (2018) Full-envelope stitched simulation model of a quadcopter using stitch. In: International annual forum. American Helicopter Society, Phoenix, AZ 16. Ivler CM, Goerzen CL (2018) Control design for tracking of scaled MTE trajectories on an IRIS+ quadcopter. In: International annual forum. American Helicopter Society, Phoenix, AZ 17. Yuksek B, Saldiran E, Cetin A, Yeniceri R, Inalhan G (2020) System identification and modelbased flight control system design for an agile maneuvering platform. In: Scitech. AIAA, Orlando, FL, p 1835 18. Simplício P, Pavel M, Van Kampen E, Chu Q (2013) An acceleration measurements-based approach for helicopter nonlinear flight control using incremental nonlinear dynamic inversion. Control Eng Pract 21(8):1065–1077 19. Sieberling S, Chu Q, Mulder J (2010) Robust flight control using incremental nonlinear dynamic inversion and angular acceleration prediction. J Guidance Control Dyn 33(6):1732–1742 20. van’t Veld R, Van Kampen E-J, Chu QP, (2018) Stability and robustness analysis and improvements for incremental nonlinear dynamic inversion control. Guidance, navigation, and control conference. AIAA, Kissimmee, FL, pp 1–17 21. Smeur EJ, Chu Q, de Croon GC (2016) Adaptive incremental nonlinear dynamic inversion for attitude control of micro air vehicles. J Guidance Control Dyn 39(3):450–461 22. Smeur EJ, de Croon GC, Chu Q (2018) Cascaded incremental nonlinear dynamic inversion for mav disturbance rejection. Control Eng Pract 73:79–90 23. Sun S, Sijbers L, Wang X, de Visser C (2018) High-speed flight of quadrotor despite loss of single rotor. IEEE Robot Autom Lett 3(4):3201–3207 24. Sun S, Wang X, Chu Q, Visser CD (2021) Incremental nonlinear fault-tolerant control of a quadrotor with complete loss of two opposing rotors. IEEE Trans Robot 37(1):116–130 25. Saldiran E (2021) Indi source code. Available online at https://github.com/esaldiran/ardupilot/ tree/indi-control (Last accessed Dec 24, 2021) 26. Gong A, Sanders FC, Hess RA, Tischler MB (2019) System identification and full flightenvelope model stitching of a package-delivery octocopter. Scitech. AIAA, San Diego, CA, pp 1–18 27. Han J, He Y, Xu W (2007) Angular acceleration estimation and feedback control: An experimental investigation. Mechatronics 17(9):524–532 28. Tal E, Karaman S (2021) Accurate tracking of aggressive quadrotor trajectories using incremental nonlinear dynamic inversion and differential flatness. IEEE Trans Control Syst Technol 29(3):1203–1218 29. Kharitonov VL (1978) The asymptotic stability of the equilibrium state of a family of systems of linear differential equations. Differentsial’nye Uravneniya 14(11):2086–2088 30. Anderson B, Jury E, Mansour M (1987) On robust Hurwitz polynomials. IEEE Trans Autom Control 32(10):909–913 31. Hurwitz A (1895) Über die Bedingungen, unter welchen eine Gleichung nur Wurzeln mit negativen reellen Theilen besitzt. Mathematische Annalen 46(2):273–284 32. Brescianini D, D’Andrea R (2018) Tilt-prioritized quadrocopter attitude control. IEEE Trans Control Syst Technol 28(2):376–387 33. Yu Y, Yang S, Wang M, Li C, Li Z (2015) High performance full attitude control of a on S O(3). IEEE international conference on robotics and automation. IEEE, Seattle, WA, pp 1698–1703 34. Brescianini D, Hehn M, D’Andrea R (2013) Nonlinear quadrocopter attitude control: Technical report. Tech, Rep, ETH Zurich 35. Sola J (2017) Quaternion kinematics for the error-state Kalman filter. ArXiv preprint arXiv:1711.02508
230
E. Saldiran and G. Inalhan
36. Sommer H, Gilitschenski I, Bloesch M, Weiss S, Siegwart R, Nieto J (2018) Why and how to avoid the flipped quaternion multiplication. Aerospace 5(3):72 37. Morrell B, Rigter M, Merewether G, Reid R, Thakker R, Tzanetos T, Rajur V, Chamitoff G (2018) Differential flatness transformations for aggressive flight. IEEE international conference on robotics and automation. IEEE, Brisbane, Australia, pp 5204–5210 38. Faessler M, Fontana F, Forster C, Scaramuzza D (2015) Automatic re-initialization and failure recovery for aggressive flight with a monocular vision-based. International conference on robotics and automation. IEEE, Seattle, WA, pp 1722–1729 39. Tischler MB, Berger T, Ivler CM, Mansur MH, Cheung KK, Soong JY (2017) Practical methods for aircraft and rotorcraft flight control design: An optimization-based approach. American Institute of Aeronautics and Astronautics, Preston, VA 40. Johansen TA, Fossen TI (2013) Control allocation-survey. Automatica 49(5):1087–1103 41. Wang Z, Zhao J, Cai Z, Wang Y, Liu N (2021) Onboard actuator model-based incremental nonlinear dynamic inversion for quadrotor attitude control: Method and application. Chinese J Aeronaut 34(11):216–227 42. Bangura M, et al (2017) Aerodynamics and control of quadrotors 43. Liang O (2020) Dshot esc protocol. Available online at https://oscarliang.com/dshot/ (Last accessed May 30, 2020) 44. Saetti U, Horn JF, Lakhmani S, Lagoa C, Berger T (2018) Design of dynamic inversion and explicit model following control laws for quadrotor inner and outer loops. In: International annual forum. American Helicopter Society, Phoenix, AZ 45. Saetti U, Horn JF, Lakhmani S, Lagoa C, Berger TF (2020) Design of dynamic inversion and explicit model following flight control laws for quadrotor uas. J Am Helicopter Soc 65(3):1–16 Emre Saldiran received his B.Sc. degree in Astronautical Engineering and M.Sc. degree in Aeronautical and Astronautical Engineering from Istanbul Technical University (ITU) in 2017 and 2021, respectively. Between 2017 and 2021, he worked as a Research Assistant at the ITU Aerospace Research Center(ARC). Currently, he is a Ph.D. student at Cranfield University. His current research areas are autonomous systems, flight control system design, modeling, and certification of AI-enabled airworthy systems. Gokhan Inalhan received his B.Sc. degree in Aeronautical Engineering from Istanbul Technical University in 1997 and the M.Sc. and Ph.D. degrees in Aeronautics and Astronautics from Stanford Universit, in 1998 and 2004, respectively. In 2003, he received his Ph.D. Minor from Stanford University on Engineering Economics and Operations Research (currently Management Science and Engineering). Between 2004 and 2006, he worked as a Postdoctoral Associate at the Massachusetts Institute of Technology. During this period, he led the Communication and Navigation Group in the MIT-Draper Laboratory NASA CER project. He has served as the Director of Controls and Avionics Laboratory (2006–2016) and Director General of Aerospace Research Centre (2016–2019) at Istanbul Technical University. He is currently the BAE Systems Chair, Professor of Autonomous Systems and Artificial Intelligence, and Deputy Head of the Centre for Autonomous and Cyber-Physical Systems at Cranfield University. He is the Chair of the IEEE Technical Committee on Aerospace Controls, Associate Editor-in-Chief of the IEEE Transactions on Aerospace and Electronic Systems, and Senior Editor for Autonomous Systems. He is a lifetime member and Associate Fellow of AIAA.
Chapter 9
Incremental Control to Reduce Model Dependency of Classical Nonlinear Control Byoung-Ju Jeon, Hyo-Sang Shin, and Antonios Tsourdos
9.1 Introduction Nonlinear control algorithms have been widely and successfully utilized due to their performance compared to linear control methods since they are able to handle large nonlinearities [1]. Although large nonlinearities are handled by explicitly utilizing full model information, as can be observed in well-known nonlinear control algorithms such as backstepping (BKS) controller [2] and nonlinear dynamic inversion (NDI) controllers [3], one of the disadvantages of classical nonlinear control algorithms is their closed-loop stability and performance is sensitive to model uncertainties. To reduce the model dependency of classical nonlinear control algorithms, incremental control algorithms, such as the incremental backstepping (IBKS) controller [4] and the incremental nonlinear dynamic inversion (INDI) controller, [5] were proposed. Despite the classical control systems, incremental control algorithms additionally utilize the state derivatives and control input measurements to replace required knowledge on a model except for control effectiveness. Since model information is implicitly included in these measurements, incremental control algorithms do not rely on explicit model information, except for control effectiveness, for their implementation. Hence, incremental control can be considered as in between sensor-based and model-based approaches. This leads to the necessity of understanding the closed-loop
B.-J. Jeon · H.-S. Shin (B) · A. Tsourdos Cranfield University, Cranfield, United Kingdom e-mail: [email protected] B.-J. Jeon e-mail: [email protected] A. Tsourdos e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 A. L’Afflitto et al. (eds.), Control of Autonomous Aerial Vehicles, Advances in Industrial Control, https://doi.org/10.1007/978-3-031-39767-7_9
231
232
B.-J. Jeon et al.
characteristics of incremental control under model uncertainties and measurement disturbances, such as biases, noises, or delays. There have been some studies on the effects of model uncertainties or measurement disturbances on the closed-loop system with incremental control algorithms. In [4], numerical simulations are performed to investigate the closed-loop characteristics with IBKS under model uncertainties. In [6], a condition for the state error to be globally ultimately bounded under model uncertainties is analyzed for closed-loop systems with INDI; this condition is mathematically rigorous but slightly difficult to understand its physical meaning. In [7], where flight test results with INDI are provided, it is stated that synchronization between delayed measurements on the state derivatives and the control surface deflection angle is essential for a successful flight test with INDI. In this chapter, an incremental controller is designed and its closed-loop characteristics are suggested considering model uncertainties and measurement disturbances. IBKS design for the 6-degrees-of-freedom (DoF) dynamics of an unmanned aerial vehicle (UAV) is provided as a representative example, and this controller’s performance is compared with the performance of a BKS controller to investigate the differences between incremental and classical nonlinear control. The closed-loop analysis with IBKS is performed in three different cases to investigate individual and integrated effects of the model uncertainties and the measurement disturbances on the closed-loop system with incremental control. In the first case, which involves model uncertainties and ideal measurements, the closed-loop system’s stability and performance with IBKS are analyzed from its closed-loop transfer function, and the results obtained applying IBKS are compared to those with BKS, as in [8]. In the second case, the effects of biases in the additional measurements and the model uncertainties on the closed-loop characteristics with IBKS are investigated analytically, as in [9]. In the last case, delays in the additional measurements and the model uncertainties are considered, and the closed-loop characteristics with IBKS are identified and analyzed from the analysis results obtained by applying the framework to the representative examples, as in [10]. The closed-loop analysis with IBKS in the above three cases is conducted by analyzing the short-period mode dynamics for simplicity, providing a critical understanding of incremental control.
9.2 Design of an Incremental Control System Incremental control algorithms, such as IBKS [4] and INDI [5], are proposed to reduce the dependency on the model of classical nonlinear controllers such as BKS [2] and nonlinear dynamic inversion [3]. Both BKS and NDI ultimately regulate the closed-loop error dynamics by canceling undesired nonlinearities. One of the main differences between BKS and NDI is that BKS considers transient responses and NDI disregards interconnections between the loops under the time-scale separation assumption. In this chapter, IBKS for a 6-DoF unmanned aerial vehicle (UAV) is
9 Incremental Control to Reduce Model Dependency of Classical …
233
introduced as an example of incremental control, and BKS is proposed as a benchmark for IBKS.
9.2.1 Dynamics of a 6-DoF UAV The dynamics of a 6-DoF UAV is given by [11] x˙ 1 = f1 + f1g + g1 + g1a x2 + h1 u, x˙ 2 = f2 + f2a x2 + g2 u,
(9.1a) (9.1b)
where x1
f1
f1 g
g1
g1a
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ δe α p ⎣β ⎦ , x2 ⎣ q ⎦ , u ⎣δa ⎦ , (9.2a) φ r δr ⎤ ⎡ −(sin α/ cos β)[T + C x (α)q¯ S] + (cos α/ cos β)C z (α)q¯ S 1 ⎣ − cos α sin β[T + C x (α)q¯ S] + cos βC y (β)q¯ S − sin α sin βC z (α, β)q¯ S ⎦ , (9.2b) mV 0 ⎤ ⎡ (1/ cos β)(sin α sin θ + cos α cos φ cos θ ) g ⎣ cos α sin β sin θ + cos β cos θ sin φ − sin α sin β cos φ cos θ ⎦ , (9.2c) V 0 ⎤ ⎡ − cos α tan β 1 − sin α tan β ⎣ sin α 0 − cos α ⎦ , (9.2d) 1 sin φ tan θ cos φ tan θ ⎤ ⎡ 0 0 −(sin α/ cos β)C xq (α)c¯ + (cos α/ cos β)C zq (α)c¯ ρS ⎣ cos βC y p (α)b − cos α sin βC xq (α)c¯ − sin α − sin βC zq (α)c¯ cos βC yr (α)b⎦ , 4m 0 0 0
(9.2e)
⎤ 0 0 −(sin α/ cos β)C xδe (α) + (cos α/ cos β)C z δe (α, β) ρV S ⎣ − cos α sin βC xδe (α) − sin α sin βC z δe (α, β) h1 cos βC yδa (β) cos βC yδr (β)⎦ , 2m 0 0 0 ⎡
⎡
⎤
⎡
⎤
(9.2f)
I2 pq + I1 qr I3 Cl (α, β)q¯ Sb + I4 Cn (α, β)q¯ Sb ⎦, I7 Cm (α)q¯ S c¯ f2 ⎣ I5 pr − I6 ( p 2 − r 2 )⎦ + ⎣ (9.2g) I4 Cl (α, β)q¯ Sb + I9 Cn (α, β)q¯ Sb −I2 qr + I8 pq ⎤ ⎡ 0 I3 Clr (α)b + I4 Cnr (α)b I C (α)b + I4 Cn p (α)b ρV S ⎢ 3 l p ⎥ 0 0 I7 Cm q (α)c¯ f2a (9.2h) ⎦, ⎣ 4 I4 Cl p (α)b + I9 Cn p (α)b 0 I4 Clr (α)b + I9 Cnr (α)b ⎡ ⎤ 0 I3 Clδa (α, β)b + I4 Cn δa (α, β)b I3 Clδr (α, β)b + I4 Cn δr (α, β)b ⎢ ⎥ ¯ 0 0 g2 q¯ S ⎣ I7 Cm δe (α)(c) ⎦, 0 I4 Clδa (α, β)b + I9 Cn δa (α, β)b I4 Clδr (α, β)b + I9 Cn δr (α, β)b
(9.2i)
234
B.-J. Jeon et al.
x1 denotes the state vector of the outer loop; α, β, and φ denote the angle of attack, the sideslip angle, and the roll angle of the UAV, respectively; x2 denotes the state vector of the inner loop; p, q, and r denote the roll, pitch, and yaw rates with respect to the body-fixed reference frame, respectively; u denotes the control input; δe , δa , and δr denote the elevator, the aileron, and the rudder deflection angles, respectively; θ denotes the pitch angle of the UAV; T and V denote the thrust and the velocity of the UAV; S, b, c, ¯ and m denote the wing area, the wing span, the wing mean aerodynamic chord, and the mass of the UAV, respectively; ρ denotes the air density; g denotes the gravitational acceleration; q¯ = 0.5ρV 2 denotes the dynamic pressure; I[·] denotes the generic moment of inertia-related terms of the UAV; and C(·) denotes the generic non-dimensional aerodynamic coefficient of the UAV. Under the assumption that the fin surfaces are pure moment generators, the 6-DoF UAV dynamics in (9.1a) becomes strict feedback, that is, x˙ 1 = f1 + f1g + g1 + g1a x2 , x˙ 2 = f2 + f2a x2 + g2 u.
(9.3a) (9.3b)
9.2.2 Benchmark: Backstepping Control Based on Lyapunov stability, a BKS controller guarantees the asymptotic stability of the equilibrium point of the closed-loop system. For the outer loop controller design, the Lyapunov candidate function V1 is defined as V1
1 T z z1 , 2 1
(9.4)
where z1 x1 − x1c and the subscript c denotes a command on the state vector; note that V1 is positive definite. The time derivative of V1 is obtained from (9.3) and (9.4) with V˙1 = z1T z˙ 1 = z1T
f1 + f1g + g1 + g1a x2 − x˙ 1c .
(9.5)
To guarantee that V˙1 < 0 except at the equilibrium point z1 = 0, the command on x2 , which serves as a pseudoinput of the outer loop, is given by −1 x2c = g1 + g1a −C1 z1 − f1 + f1g + x˙ 1c ,
(9.6)
where C1 > 0. The Lyapunov candidate function for the inner loop controller design is given by V2
1 T 1 z1 z1 + z2T z2 , 2 2
(9.7)
9 Incremental Control to Reduce Model Dependency of Classical …
235
where z2 x2 − x2c denotes the state error of the inner loop. Note that V2 > 0 except T
at the equilibrium point z1T z2T = 0. The assumption that x1 has converged to its desired value z1 = 0 is not valid for the inner loop controller design since the inner loop dynamics are faster than the outer loop dynamics. Hence, z1 as well as z2 is considered in the inner loop controller design. As a result, the time-scale separation assumption is significantly relaxed. Thus, V˙2 is obtained from (9.3) and (9.7) by V˙2 = z1T z˙ 1 + z2T z˙ 2
= z1 f1 + f1g + g1 + g1a x2 + h1 − x˙ 1c + z2 f2 + f2a x2 + g2 u − x˙ 2c . (9.8) The command on the control input vector is given by
uc g2−1 −C2 z2 − g1 + g1a z1 − f2 + f2a x2 + x˙ 2c ,
(9.9)
where C1 > 0 and C2 > 0 denote design parameters, so that V˙2 < 0 except for the T
equilibrium point z1T , z2T = 0. Since only the estimated values of the model information are available in the controller design phase, BKS for the 6-DoF UAV dynamics follows from (9.6) and (9.9) as
−1 −C1 z1 − fˆ1 + fˆ1g + x˙ 1c , x2c gˆ 1 + gˆ 1a
uc gˆ 2−1 −C2 z2 − gˆ 1 + gˆ 1a z1 − fˆ2 + fˆ2a x2 + x˙ 2c ,
(9.10a) (9.10b)
where fˆ1 , fˆ1g , gˆ 1 , gˆ 1a , fˆ2 , fˆ2a , and gˆ 2 denote f1 , f1g , g1 , g1a , f2 , f2a , and g2 calculated with the estimated values of the aerodynamic coefficients of UAV, respectively.
9.2.3 Incremental Backstepping Control IBKS for the 6-DoF UAV dynamics is designed as follows. First, IBKS is applied to the inner loop controller design. Then, as can be seen from the previous studies on IBKS [4, 12], the outer loop controller is designed with BKS since there exist other practical ways to compensate the model information. For example, general measurements like a normal force and a velocity can directly substitute for aerodynamic forces [12]. Hence, x2c is given by (9.6). Taking the first-order Taylor series expansion for (9.1a) at x = x0 and u = u0 , we deduce that
236
B.-J. Jeon et al.
x˙ 2 f2 (x0 ) + f2a (x0 ) x20 + g2 (x0 ) u0 ∂
f2 (x) + f2a (x) x2 + g2 (x) u x=x0 , u=u0 (x − x0 ) + ∂x ∂
g2 (x) u x=x0 , u=u0 (u − u0 ) , + ∂u
(9.11)
T where x x1T , x2T , and x0 and u0 denote the current values of x and u. Equation (9.11) can be rewritten as x˙ 2 x˙ 2,0 + A2,0 Δx + B2,0 Δu,
(9.12)
Δx x − x0 , Δu u − u0 , ∂
f2 (x) + f2a (x) x2 + g2 (x) u x=x0 , u=u0 , A2,0 ∂x ∂
g2 (x) u x=x0 , u=u0 . B2,0 ∂u
(9.13)
where
(9.14) (9.15)
In (9.12), the increment in state, Δx, can be ignored [5] since it has much smaller effects than the increment in input, Δu. This approximation becomes acceptable as the control surface deflection angles directly and instantly affect the aerodynamic moments, whereas the effects of Δx on the aerodynamic moments are not direct. Hence, compared to Δu, the effects of Δx become negligible in (9.12), especially when a sampling time is small enough. As a result, the dynamics of x2 can be rewritten as the incremental dynamics x˙ 2 x˙ 2,0 + B2,0 Δu.
(9.16)
In a similar way, with the inner loop controller design in Sect. 9.2.3, the Lyapunov candidate function V2 is given by (9.7). Hence, V2 > 0 except at the equilibrium point
T T T z1 z2 = 0, and it follows from (9.7) and (9.16) that V˙2 = z1T z˙ 1 + z2T z˙ 2
= z1 f1 + f1g + g1 + g1a x2 + h1 − x˙ 1c + z2 x˙ 2,0 + B2,0 Δu − x˙ 2c . (9.17) Thus, the control input command to guarantee that V˙2 < 0 except at the equilibrium T
point z1T z2T = 0 is given by ˙ 2,0 + x˙ 2c + u0 . uc B−1 2,0 −C2 z2 − g1 + g1a z1 − x
(9.18)
9 Incremental Control to Reduce Model Dependency of Classical …
237
T
Note that the equilibrium point of the closed-loop system with IBKS, z1T z2T = 0, is guaranteed to be asymptotically stable since (9.18) is derived based on Lyapunov’s stability theorem. Because only the estimated values of the model information are available in the controller design phase, IBKS for the 6-DoF aircraft dynamics follows from (9.6) and (9.18) as
−1 −C1 z1 − fˆ1 + fˆ1g + x˙ 1c , x2c = gˆ 1 + gˆ 1a ˆ 1 + gˆ 1a z1 − x˙ 2,0 + x˙ 2c + u0 , uc , = Bˆ −1 2,0 −C2 z2 − g
(9.19) (9.20)
ˆ 2,0 denote f1 , f1g , g1 , g1a , f2 , f2a , and B2,0 calculated where fˆ1 , fˆ1g , gˆ 1 , gˆ 1a , fˆ2 , fˆ2a , and B with the estimated values of the aerodynamic coefficients of aircraft, respectively. It follows from (9.9) that the entire model information is required to implement BKS. On the other hand, observing (9.18), we deduce that the model information related to the system matrix, namely, fˆ2 and fˆ2a , is not required to implement IBKS, since the measurements on the state derivatives and the control inputs, x˙ 2,0 and u0 , are utilized. Hence, compared to BKS, model dependency is reduced in IBKS.
9.3 Closed-Loop Characteristics with Incremental Control In this section, the closed-loop system with incremental control is analyzed and its characteristics are discussed. For simplicity, the short-period mode dynamics of a fixed-wing UAV is considered, and it is captured by [13] α˙ = Z α∗ (M, α) α + q + Z δ∗ (M, α) δ, q˙ =
Mα∗
(M, α) α +
Mq∗
(M, α) q +
Mδ∗
(9.21a) (M, α) δ,
(9.21b)
where q¯ S 1 C Z α (M, α) , (9.22a) m U0 q¯ S 1 C Z δ (M, α) Z δ∗ (M, α) , (9.22b) m U0 q¯ S c¯ q¯ S c¯2 q¯ S 1 C Z α (M, α) Mα∗ (M, α) C Mα (M, α) + C Mα˙ (M, α) , (9.22c) Iy 2I y U0 m U0 Z α∗ (M, α)
Mq∗ (M, α)
q¯ S c¯2 q¯ S c¯2 C Mq (M, α) + C M (M, α) , 2I y U0 2I y U0 α˙
(9.22d)
Mδ∗ (M, α)
q¯ S c¯ q¯ S c¯2 q¯ S C Z δ (M, α) C Mδ (M, α) + C (M, α) 2 Mα˙ Iy m 2I y U0
(9.22e)
238
B.-J. Jeon et al.
denote the aerodynamic derivatives, and C(·) denote the non-dimensional aerodynamic coefficients. Equation (9.21) captures a nonlinear system, which can be expressed as a linear parameter-varying (LPV) system whose parameters vary with the state variables. To design BKS and IBKS, the short-period longitudinal dynamics of a fixed-wing UAV is reformulated in strict feedback form by assuming that a fin surface is a pure moment generator; note that this assumption is reasonable for most aircraft, often made in flight controller design, since C Z δ is generally small enough to be neglected [13]. In this case, it holds that α˙ = Z α∗ α + q, q˙ = Mα∗ α + Mq∗ q + Mδ∗ δ.
(9.23a) (9.23b)
A BKS controller for (9.23) is derived in [8] by proceeding as in Sect. 9.2.3, and it is given by qc = −C1 z 1 − Zˆ α∗ α + α˙ c ,
1 δ= −C2 z 2 − z 1 − Mˆ α∗ α − Mˆ q∗ q + q˙c , Mˆ δ∗
(9.24a) (9.24b)
ˆ denote the estimated values of the aerodynamic coefficients (·), z 1 α − where (·) αc and z 2 q − qc denote the state errors, and C1 and C2 denote positive design parameters. An IBKS controller for (9.23) is given by [8, 10] qc = −C1 z 1 − Zˆ α∗ α + α˙ c , δ = δ0 + Δδ 1 = (−C2 z 2 − z 1 − q˙0 + q˙c ) + δ0 . ˆ Mδ∗
(9.25a)
(9.25b)
Note that the inner loop controller in (9.25) is designed with the incremental dynamics of q, which is given by q˙ q˙0 + Mδ∗ Δδ.
(9.26)
Furthermore, note that IBKS is applied to the inner loop controller design and the outer loop controller is designed with BKS since there exist other practical ways to compensate Zˆ α∗ .
9 Incremental Control to Reduce Model Dependency of Classical …
239
9.3.1 Closed-Loop Analysis Under Model Uncertainty 9.3.1.1
Closed-Loop Transfer Function
IBKS and BKS without Model Uncertainties In this section, closed-loop transfer functions with IBKS and BKS are derived without considering model uncertainties. Note that this closed-loop analysis is performed in a piece-wise way to utilize the existing analysis framework for linear time-invariant (LTI) system as in [14]. To derive the closed-loop transfer function with IBKS, the additional measurements q˙0 and δ0 are modeled as q˙0 = Mα∗ α + Mq∗ q + Mδ∗ δ0 ,
(9.27a)
δ0 = δ(t − Δt),
(9.27b)
where Δt denotes the step size, and the measurement of the state derivative q˙0 is modeled considering (9.23). Note that the model information Mα∗ and Mq∗ are implicitly included in q˙0 . Under the assumption that an actuator is ideal, the measurement on control surface deflection δ0 can be regarded as a control command generated in the previous step. Using (9.25) with (9.21) and (9.27) under the assumptions that the estimates on ˆ are identical with their actual values (·), and αc is the aerodynamic coefficients (·) constant, δ can be rearranged as follows: δ=−
1 1 Mδ∗ 1 δ (t − Δt) , ν α − ν q + C + 1) α + 1 − (C 2,α 2,q 1 2 c Mδ∗ Mδ∗ Mδ∗ Mδ∗ (9.28)
where ν2,α C1 + Z α∗ C2 + Z α∗ + Mα∗ + 1, ν2,q C1 + C2 + Mq∗ + Z α∗ .
(9.29a) (9.29b)
Applying the Laplace transform to (9.28) and arranging the equation with respect to δ, we obtain that 1 1 ν2,α (s) − φ(s) ν2,q (s) X(s) + δ(s) = − φ(s)
1 (C1 C2 + 1) αc (s), φ(s)
(9.30)
where φ(s) = Mδ∗ 1 − e−Δts + Mδ∗ e−Δts . Thus, (9.23) can be expressed as
(9.31)
240
B.-J. Jeon et al.
x˙ = Ax + Bu,
(9.32a)
y = Cx,
(9.32b)
Z α∗ 1 0 where x = [α, q] , u = δ, A = ,B= ,C= 10 . Mα∗ Mq∗ Mδ∗ Applying the Laplace transform to (9.32) and substituting δ(s) into (9.30), the closed-loop system with the IBKS can be obtained as
T
sX(s) = A2 (s)X(s) + B2 (s)αc (s), Y = C(s)X(s),
(9.33a) (9.33b)
where a2,11 (s) a2,12 (s) , A2 (s) = a2,21 (s) a2,22 (s) Z α∗ 1 = , Mδ∗ Mδ∗ Mα∗ − φ(s) ν2,α (s) Mq∗ − φ(s) ν2,q (s) 0 ∗ , B2 (s) = Mδ C + 1) φ(s) (C 1 2 C(s) = [1, 0].
(9.34a) (9.34b) (9.34c) (9.34d)
Thus, it follows from (9.33) that the transfer function for the closed-loop system with IBKS is given by α(s) = C(s) (sI − A2 (s))−1 B2 (s) αc (s) M∗
δ a2,12 φ(s) (C1 C2 + 1) = 2 s − a2,11 + a2,22 s + a2,11 a2,22 − a2,12 a2,21 T (s) = 2 , s + 2ζ ωn s + ωn2
(9.35)
where M∗ Mδ∗ T (s) = a2,12 (s) δ (C1 C2 + 1) = (C1 C2 + 1) , φ(s) φ(s) 2ζ ωn = − a2,11 (s) + a2,22 (s)
M∗ = − Z α∗ + Mq∗ − δ C1 + C2 + Mq∗ + Zˆ α∗ , φ(s) ωn2 = a2,11 (s)a2,22 (s) − a2,12 (s)a2,21 (s) Mδ M∗ ∗ ∗ Z α Mq − Mα∗ + = 1− δ (C1 C2 + 1) . φ(s) φ(s)
(9.36a)
(9.36b)
(9.36c)
9 Incremental Control to Reduce Model Dependency of Classical …
241
If the control input command is generated, transmitted, and applied fast enough, then Δt can be assumed to be 0 and, hence, it holds that TN α(s) = 2 , αc (s) s + 2ζ N ωn N s + ωn2 N
(9.37)
where TN = C1 C2 + 1, 2ζ N ωn N = C1 + C2 , ωn2 N
= C1 C2 + 1,
(9.38a) (9.38b) (9.38c)
ζ N and ωn N represent the damping ratio and the natural frequency for the closed-loop systems with IBKS without model uncertainties. Without model uncertainties, the closed-loop transfer function with BKS [8] can be obtained by substituting (9.24) into (9.32), and this transfer function is the same as (9.37). Remark 9.1 If there is no model uncertainty, the closed-loop transfer functions with IBKS and BKS become identical when the measurements are ideal and the control system is fast enough. Remark 9.1 indicates that, without model uncertainty, the closed-loop systems with IBKS and BKS behave identically. The difference between IBKS and BKS for the short-period mode dynamics is that q˙0 and δ0 are utilized in IBKS, whereas BKS requires Mˆ α∗ and Mˆ q∗ . This fact implies that, if there is no model uncertainty, that is, if Mˆ α∗ = Mα∗ and Mˆ q∗ = Mq∗ , then the terms related to Mˆ α∗ and Mˆ q∗ in BKS are replaced with q˙0 and δ0 in IBKS. IBKS and BKS Under Model Uncertainties In the following, closed-loop transfer functions with IBKS and BKS are derived considering model uncertainties. Note that this closed-loop analysis is performed in a piece-wise way to utilize the existing analysis framework for LTI system, as in [14]. For the derivation of the closed-loop transfer function with IBKS, we rearrange δ by using (9.25) with (9.23) and (9.27) under the assumption of constant αc and obtain 1 1 Mδ∗ δ=− ν2,q q + (C1 C2 + 1) αc + 1 − ∗ δ (t − Δt) , Mˆ δ∗ Mˆ δ∗ Mˆ δ
(9.39)
242
B.-J. Jeon et al.
where
C1 + Zˆ α∗ C2 + Z α∗ + Mα∗ + 1 ,
= C1 + C2 + Mq∗ + Zˆ α∗ .
ν2,α =
(9.40a)
ν2,q
(9.40b)
Applying the Laplace transform to (9.39) and rewriting it with respect to δ, we deduce that 1 1 ν2,α (s) − φ(s) ν2,q (s) X(s) + δ(s) = − φ(s)
1 (C1 C2 + 1) αc (s), φ(s)
(9.41)
where φ(s) = Mˆ δ∗ 1 − e−τ s + Mδ∗ e−τ s .
(9.42)
Applying the Laplace transform to (9.32), and δ(s) in (9.41) is substituted into that equation, the closed-loop system with the IBKS is given by sX(s) = A2 (s)X(s) + B2 (s)αc (s),
(9.43a)
Y = C(s)X(s),
(9.43b)
where A2 (s) = B2 (s) =
Mα∗ −
Z α∗
Mδ∗ ν (s) φ(s) 2,α
1
Mq∗ −
0 , Mδ∗ C + 1) φ(s) (C 1 2
Mδ∗ ν (s) φ(s) 2,q
C(s) = [1, 0] .
,
(9.44a) (9.44b) (9.44c)
Thus, it follows from (9.43) that the transfer function for the closed-loop system with IBKS is given by α(s) = C(s) (sI − A2 (s))−1 B2 (s) αc (s) DΔ,2 (s) , = 2 s + NΔ,2 (s)s + NΔ,2 (s)
(9.45)
9 Incremental Control to Reduce Model Dependency of Classical …
243
where Mδ∗ (9.46a) (C1 C2 + 1) , φ(s)
M∗ (9.46b) NΔ,2 (s) − Z α∗ + Mq∗ − δ C1 + C2 + Mq∗ + Zˆ α∗ , φ(s)
M∗ M∗ ∗ ∗ Z α Mq − Mα∗ + δ C2 Zˆ α∗ − Z α∗ + (C1 C2 + 1) , NΔ,2 (s) 1 − δ φ(s) φ(s) (9.46c) φ(s) = Mˆ δ∗ 1 − e−Δts + Mδ∗ e−Δts . (9.46d)
DΔ,2 (s)
The transfer function with IBKS can be simplified as (9.47) by applying the assumption Δt 0 for analysis purposes, and this assumption is reasonable due to improved computation power and reduced transmission time in recent avionics systems. TΔ,2 α(s) = 2 , αc (s) s + 2ζΔ,2 ωn Δ,2 s + ωn2Δ,2
(9.47)
where TΔ,2 = C1 C2 + 1,
Zˆ α∗
Z α∗
, −
= (C1 C2 + 1) + C2 Zˆ α∗ − Z α∗ ,
2ζΔ,2 ωn Δ,2 = (C1 + C2 ) + ωn2Δ,2
(9.48a) (9.48b) (9.48c)
ζΔ,2 and ωn Δ,2 denote the damping ratio and natural frequency for the closed-loop system with IBKS under the model uncertainties, respectively. Under the assumption of constant αc , it follows from (9.24) and (9.23) that
1 1 1 δ= − ν1,α , − ν1,q x + (C1 C2 + 1) αc , ∗ ∗ ˆ ˆ ˆ Mδ Mδ Mδ∗
(9.49)
where
C1 + Zˆ α∗ C2 + Z α∗ + Mˆ α∗ + 1 ,
= C1 + C2 + Mˆ q∗ + Zˆ α∗ .
ν1,α =
(9.50a)
ν1,q
(9.50b)
Thus, substituting (9.49) into (9.32), the transfer function for the closed-loop system with BKS is given by α(s) TΔ,1 = 2 , αc (s) s + 2ζΔ,1 ωn Δ,1 s + ωn2Δ,1
(9.51)
244
B.-J. Jeon et al.
Table 9.1 Case description Case Aerodynamic coefficient estimates Mˆ α∗ = Mα∗ (1 + Δ Mα∗ ) Mˆ q∗ = Mq∗ 1 ∗ ∗ ˆ Mq = Mq (1 + Δ M ∗ ) Mˆ α∗ = Mα∗ 2 q
3 4
Zˆ α∗ = Z α∗ (1 + Δ Z α∗ ) Mˆ ∗ = M ∗ (1 + Δ M ∗ ) δ
δ
δ
Mˆ α∗ = Mα∗ Mˆ α∗ = Mα∗
Zˆ α∗ = Z α∗ Zˆ α∗ = Z α∗ Mˆ q∗ = Mq∗ Mˆ q∗ = Mq∗
Mˆ δ∗ = Mδ∗ Mˆ δ∗ = Mδ∗ Mˆ δ∗ = Mδ∗ Zˆ α∗ = Z α∗
Table 9.2 Case study—parameters for transfer function with BKS Case
T Δ,1
2ζ Δ,1 ω n Δ,1
ω 2n Δ,1
1 2
TN TN
2ζ N ωn N 2ζ N ωn N + Mq∗ Δ Mq∗
ωn2 N + Mα∗ Δ Mα∗ ωn2 N − Z α∗ Mq∗ Δ Mq∗
3
TN
4
TN 1+Δ M ∗ δ
2ζ N ωn N + Z α∗ Δ Z α∗
Z α∗ + Mq∗ Δ Mδ∗ 2ζ N ωn N − 1 + Δ Mδ∗ 1 + Δ Mδ∗
ωn2 N + C2 Z α∗ Δ Z α∗
Z α∗ Mq∗ − Mα∗ Δ Mδ∗ ωn2 N + 1 + Δ Mδ∗ 1 + Δ Mδ∗
where Mδ∗ (C1 C2 + 1) , Mˆ δ∗
Mδ∗ ∗ ∗ ∗ ∗ ˆ ˆ C 1 + C 2 + Mq + Z α , = − Z α + Mq − Mˆ δ∗
Mδ∗ ∗ ∗ ∗ ∗ ∗ ∗ Z α Mˆ q − Mˆ α = Z α Mq − M α − Mˆ δ∗
M∗ + δ C2 Zˆ α∗ − Z α∗ + (C1 C2 + 1) , Mˆ δ∗
TΔ,1 = 2ζΔ,1 ωn Δ,1 ωn2Δ,1
(9.52a) (9.52b)
(9.52c)
ζΔ,1 and ωn Δ,1 capture the damping ratio and natural frequency for the closed-loop system with BKS under the model uncertainties, respectively. Since the study of the effects of each model uncertainty on the closed-loop system with BKS can provide better a understanding of the advantages of IBKS from its reduced model dependency, for each case in Table 9.1, the closed-loop transfer function with BKS (9.43) is simplified to T Δ,1 α(s) = 2 . αc (s) s + 2ζ Δ,1 ω n Δ,1 s + ω 2n Δ,1
(9.53)
The perturbed parameters T Δ,1 , 2ζ Δ,1 ω n Δ,1 , and ω 2n Δ,1 in (9.53) from the nominal parameters TN , 2ζ N ωn N , and ωn2 N in (9.37) are provided in Table 9.2.
9 Incremental Control to Reduce Model Dependency of Classical …
9.3.1.2
245
Stability and Performance
IBKS and BKS without Model Uncertainties In this section, the closed-loop stability and performance of closed-loop systems controlled by IBKS and BKS are investigated. It follows from (9.37) that, without model uncertainties, the poles of the closed-loop system with BKS and IBKS, p N ,1 and p N ,2 , are given by p N ,1 = p N ,2 =
− (C1 + C2 ) ±
(C1 − C2 )2 − 4 . 2
(9.54)
From (9.54), the following considerations can be made: Remark 9.2 If there is no model uncertainty, then the poles are always located in the left half-plane with the positive design parameters, which means that the closed-loop systems with BKS and IBKS always become stable. Remark 9.3 If there is no model uncertainty, then the poles can be expressed as functions of the design parameters only, which implies that the system characteristics become uniform in the entire flight envelope. The steady-state error is given by ess = αc − lim α(t) = αc − lim sα(s), t→∞
s→0
(9.55)
and a step input αc = K in the time domain is expressed in the frequency domain as αc (s) = Ks . By applying (9.55) to (9.37) with αc = K and αc (s) = Ks , we deduce that the steady-state errors with BKS and IBKS for the nominal case, ess N ,1 and ess N ,2 , are given by ess N ,1 = ess N ,2 = 0.
(9.56)
Remark 9.4 It follows from (9.56) that if there is no model uncertainty, the steadystate errors with BKS and IBKS are zero. From the closed-loop transfer functions with BKS and IBKS, various performance metrics can be discussed. For example, a settling time, whereby the magnitude of the state error is reduced within 5%, can be approximated by
ts N =
where
⎧ 3.2 ⎪ ⎪ , if 0 < ζ N , < 0.69, ⎪ ⎪ ⎨ ζ N ωn N ⎪ ⎪ 4.5 ⎪ ⎪ ζ N , if ζ N > 0.69, ⎩ ωn N
(9.57)
246
B.-J. Jeon et al.
C1 C2 + 1, C1 + C2 . ζN = √ 2 C1 C2 + 1
ωn N =
(9.58a) (9.58b)
It appears from (9.57) and (9.58) that the settling time ts N is determined only by the design parameters, C1 and C2 . IBKS and BKS Under Model Uncertainties In the following, for each closed-loop system with IBKS and BKS, a condition to maintain stability under the model uncertainties is investigated, along with some performance metrics such as a steady-state error. In particular, the closed-loop transfer function with BKS in (9.43) is analyzed for a comparative study with IBKS’ performance as follows. Because the model uncertainties are considered, not only the design parameters, but also the aerodynamic coefficients and their estimates with the uncertainties have impacts on the closed-loop system. For a damped system, stability is normally guaranteed. Therefore, since 2ζΔ,1 ωn Δ,1 > 0 and ωn2Δ,1 > 0, and assuming that both Mδ∗ and Mˆ δ∗ have the same sign, a condition GΔ,1 to maintain stability for the closed-loop system with BKS under the model uncertainties is given by (9.59a) GΔ,1 = {C1 , C2 ∈ R>0 : Cond. 1 & Cond. 2} , ∗ ∗ Mˆ δ ∗ Mˆ δ ∗ ˆ∗ M − Mˆ q∗ , Cond. 1 : C1 + C2 > (9.59b) ∗ Zα − Zα + Mδ Mδ∗ q
Mˆ ∗ Cond. 2 : C1 + Zˆ α∗ − Z α∗ C2 > − δ∗ Z α∗ Mq∗ − Mα∗ + Z α∗ Mˆ q∗ − Mˆ α∗ − 1. Mδ (9.59c) Remarkably, this stability condition can be interpreted as constraints on the design parameters for BKS under the model uncertainties. It follows from (9.43) and (9.55) with αc = K and αc (s) = Ks that the steady-state error essΔ ,1 with BKS controller is given by essΔ ,1 = K
Mˆ δ∗ η Mδ∗ 3 , Mˆ δ∗ η2 + M ∗ η3 δ
η2 + η1 +
(9.60)
where η1 C1 C2 + 1,
η2 C2 Zˆ α∗ − Z α∗ ,
∗ ∗ ∗ M η3 Z α Mq − Mα∗ − δ Z α∗ Mˆ q∗ − Mˆ α∗ . Mˆ δ∗
(9.61a) (9.61b) (9.61c)
9 Incremental Control to Reduce Model Dependency of Classical …
247
Table 9.3 Case study: parameters for stability conditions with BKS Case κ1,1 κ2,1 1 2 3 4
−Mα∗ Δ Mα∗ Z α∗ Mq∗ Δ Mq∗
0 −Mq∗ Δ Mq∗
−Z α∗ Δ Z α∗
Z α∗ + Mq∗ Δ Mδ∗
Table 9.4 Case study: parameters for steady-state error with BKS
−C2 Z α∗ Δ Z α∗
− Z α∗ Mq∗ − Mα∗ Δ Mδ∗
Case
η 2,1
1 2 3 4
Mα∗ −Z α∗ Mq∗ C2 Z α∗ Z α∗ Mq∗ − Mα∗
The effects of each model uncertainty on the closed-loop system with BKS are investigated by analyzing (9.53). For each case, using the parameters 2ζ Δ,1 ω n Δ,1 and ω 2n Δ,1 in Table 9.2, the stability condition can be simplified as = {C1 , C2 ∈ R>0 : Cond. 1 & Cond. 2} , GΔ,1
(9.62a)
Cond. 1 : C1 + C2 > κ1,1 , Cond. 2 : C1 C2 + 1 > κ2,1 ,
(9.62b) (9.62c)
with Table 9.3. The parameters κ1,1 and κ2,1 , which are related to the feasible boundary of the design parameters, appear to be the perturbation term of 2ζ Δ,1 ω n Δ,1 , and ω 2n Δ,1 with the opposite sign. By comparing the cases, a more simplified structure for the steady-state error equation is given by e ssΔ ,1 = K
η 2,1 Δ(·) , η1 + η 2,1 Δ(·)
(9.63)
where η1 is the same as in (9.60), defined only by the design parameters. For each case, the weight factor to the model uncertainty η 2,1 only changes, and it is listed in Table 9.4. As can be seen in Tables 9.3 and 9.4, κ2,1 = −η 2,1 Δ(·) . Note that |essΔ ,1 | increases with |Δ(·) |. The closed-loop transfer function with IBKS in (9.45) is analyzed to investigate the stability and the performance of the closed-loop system with IBKS under model uncertainties. Since the model uncertainties are considered, it is observed in (9.45) that not only C1 and C2 , but also the aerodynamic derivatives and their estimates with the uncertainties have impacts on the closed-loop system with IBKS, unlike (9.37). Comparing to (9.43) with BKS, there are two main differences in (9.45) with IBKS.
248
B.-J. Jeon et al.
First, true values for the aerodynamic coefficients in the inner loop, not the estimated ones, appear in (9.45). Second, the effect of the control effectiveness estimate Mˆ δ∗ resides in φ(s). An interesting observation from (9.47) is that, under the Δt 0 assumption, the effects of Mδ∗ and Mˆ δ∗ with the uncertainty are vanished in the closed-loop system with IBKS, while they still remain in the closed-loop system with BKS as shown in (9.43). Remark 9.5 If the control command is calculated, transmitted, and reflected fast enough to the real control surface deflection, the closed-loop system with IBKS becomes insensitive to the uncertainty in Mˆ δ∗ , although this information is still required to implement IBKS. It follows from 2ζΔ,2 ωn Δ,2 > 0 and ωn2Δ,2 > 0 with (9.47) that a condition GΔ,2 to maintain stability for the closed-loop system with IBKS under the model uncertainties is given by GΔ,2 = {C1 , C2 ∈ R>0 : Cond. 1 & Cond. 2} ,
Cond. 1 : C1 + C2 > − Zˆ α∗ − Z α∗ ,
Cond. 2 : C1 + Zˆ α∗ − Z α∗ C2 > −1.
(9.64a) (9.64b) (9.64c)
This stability condition can be interpreted as a constraint on the design parameters for IBKS under the model uncertainties. Note that the stability condition is affected only by the uncertainty on Zˆ α∗ required for the outer loop controller design with BKS. It follows from (9.45) and (9.55) with αc = K and αc (s) = Ks that a steady-state error essΔ ,2 with IBKS can be given by essΔ ,2 = K
η2 , η1 + η2
(9.65)
where η1 C1 C2 + 1,
η2 C2 Zˆ α∗ − Z α∗ .
(9.66a) (9.66b)
Unlike the nominal case, the steady-state error exists due to the model uncertainties. Comparing to (9.60) for BKS, there exist the same η1 and η2 in (9.65) for IBKS. However, there is no η3 in (9.65) which is mainly related to the model information for the inner loop, unlike BKS. The steady-state error occurs only when there exists uncertainty on Zˆ α∗ required for the outer loop controller design with BKS. IBKS for the inner loop utilizes the additional measurements q˙0 and δ0 . In the q˙0 measurement obtained in flight, the effects of true Mα∗ and Mq∗ are included implicitly. Hence, model information about Mα∗ and Mq∗ is not required for IBKS, unlike BKS.
9 Incremental Control to Reduce Model Dependency of Classical … Table 9.5 Simulation parameters
Parameter Value 1.5◦ 1.5 [−0.75, −0.5, −0.25, 0, 0.25, 0.5, 0.75, 1]
αc C1 , C2 Δ(·)
Table 9.6 Steady-state error for system with BKS, ess,1 Case Δ(·) −0.75 −0.5 −0.25 0 0.25 1 2 3 4
0.7843 0.9606 0.6068 –
249
0.6332 0.8143 0.4676 –
0.4013 0.5588 0.2770 −35.0820
0 0 0 0
−0.8632 −2.1919 −0.4391 0.7343
0.5
0.75
1
−3.8281 – −1.2418 0.9859
– – −3.1749 1.1131
– – −14.4279 1.1898
This implies that the closed-loop system with IBKS is not affected by the uncertainties in Mˆ α∗ and Mˆ q∗ , while the stability and the steady-state error of the closed-loop system with BKS are affected by those uncertainties as shown in (9.62) with Table 9.3 and (9.63) with Table 9.4. In the q˙0 measurement, information related to the control effectiveness Mδ∗ is also included. If there is no uncertainty in Mˆ δ∗ utilized to implement IBKS, then the effect of Mδ∗ in the q˙0 measurement can be totally compensated with the δ0 measurement in IBKS. If there is any uncertainty in Mˆ δ∗ , then they cannot be fully canceled out, resulting in φ(s) for the closed-loop system with IBKS. However, if Δt 0, then φ(s) reduces to Mδ∗ , even if Mˆ δ∗ has uncertainty. Hence, the closed-loop system with IBKS becomes robust to the uncertainty in Mˆ δ∗ utilized to implement IBKS, if the control command is calculated, transmitted, and reflected fast enough to the real control surface deflection. In summary, uncertainties on all aerodynamic coefficients except Zˆ α∗ do not affect the closed-loop system with IBKS when the computation and transmission of the control command are fast enough. Note that Zˆ α∗ is required for the outer loop controller designed with BKS. 9.3.1.3
Simulation Results
In the following, we present the results of some simulations to verify the analysis results on the closed-loop characteristics with IBKS, and conduct a comparative study between IBKS and BKS. Since a piece-wise approach is considered for the analysis of the LPV system, simulations are conducted with the short-period mode dynamics at several flight conditions. As an example, the simulation results when altitude is 7.62 km and U0 = 185.93 m/s are suggested, and the corresponding aerodynamic coefficients are Z α∗ = −1.96, Z δ∗ = 0, Mα∗ = −4.75, Mq∗ = −3.93, and Mδ∗ = −26.68. Simulation parameters, such as the angle of attack command, the design parameters, and the level of the model uncertainties, are suggested in Table 9.5. The level
250
B.-J. Jeon et al.
of the model uncertainty Δ(·) indicates how much percentage of error exists in the aerodynamic coefficient estimate compared to the actual aerodynamic coefficient (·). Initial values for the state variables α and q are set to be 0◦ and 0◦ /s. Small enough Δt is applied as 0.001 s for the simulation. With BKS, simulations are performed for cases 1, 2, 3, and 4, respectively. With IBKS, simulations are performed for cases 3 and 4 (Table 9.6). Figure 9.1 shows that the closed-loop system with BKS becomes unstable if Δ Mα∗ ∈ {0.75, 1}, and Δ Mq∗ ∈ {0.5, 0.75, 1} can result in unstable behaviors of the closed-loop system with BKS. Also, the magnitudes of the steady-state errors |ess,1 | become larger as the magnitudes of Δ Mα∗ and Δ Mq∗ increase. These observations imply that the uncertainties on Mˆ α∗ and Mˆ q∗ significantly affect the stability and the performance of the closed-loop system with BKS, while the closed-loop system with IBKS is not affected by those uncertainties. It can be seen from Fig. 9.1 that Δ Mδ∗ ∈ {−0.5, −0.75} can make the closed-loop system with BKS unstable. The magnitudes of the steady-state errors |ess,1 | become larger as the magnitude of Δ Mδ∗ increases. On the contrary, Fig. 9.2 shows that the closed-loop system with IBKS is insensitive to the model uncertainty Δ Mδ∗ in Mˆ δ∗ with τ = 0.001, which is set to be small enough for the simulation. There is no change in the location of poles depending on the variation of Δ Mδ∗ , so the closed-loop system with IBKS is stable all the time. Consequently, ωn , ζ , and ts remain the same, and ess,2 does not exist. These observations coincide with Remark 9.5.
9.3.2 Closed-Loop Analysis Under Measurement Bias and Model Uncertainty 9.3.2.1
Closed-Loop System
Since the additional δ0 and q˙0 measurements are utilized for IBKS instead of the model information on Mα∗ and Mq∗ , as it occurs for BKS, it is essential to understand how disturbances of these additional measurements, such as biases, noises, and delays, along with the model uncertainties, affect the closed-loop system with IBKS. In this section, the closed-loop analysis with IBKS is performed considering biases on δ0 and q˙0 measurements, bq˙0 and bδ0 . Specifically, q˙0 = q˙0,true + bq˙0 ,
(9.67a)
δ0 = δ0,true + bδ0 ,
(9.67b)
q˙0,true = Mα∗ α + Mq∗ q + Mδ∗ δ0,true ,
(9.68a)
δ0,true = δ(t − Δt).
(9.68b)
where
9 Incremental Control to Reduce Model Dependency of Classical …
Fig. 9.1 Closed-loop system response with BKS under model uncertainty
251
252
B.-J. Jeon et al.
Fig. 9.2 Closed-loop system response with IBKS under model uncertainty
By substituting (9.67) into (9.25), we obtain that 1 1 1 ν2,α α − ν2,q q + (C1 C2 + 1) αc ∗ ∗ Mˆ δ Mˆ δ Mˆ δ∗ Mδ∗ 1 + 1− bq˙0 + bδ0 , δ (t − Δt) − ∗ ˆ ˆ Mδ Mδ∗
δ=−
(9.69)
where
C1 + Zˆ α∗ C2 + Z α∗ + Mα∗ + 1 ,
νq C1 + C2 + Mq∗ + Zˆ α∗ .
να
(9.70a) (9.70b)
The closed-loop analysis with IBKS under measurement biases and model uncertainty is performed in a piece-wise way to utilize the existing analysis framework for LTI system, as in [14]. Applying Laplace transform to (9.69), and rewriting this equation with respect to δ, we obtain that
9 Incremental Control to Reduce Model Dependency of Classical …
253
1 1 να (s) − φ(s) νq (s) X(s) δ(s) = − φ(s) +
1 1 Mˆ ∗ bq˙0 + δ bδ0 , (C1 C2 + 1) αc (s) − φ(s) φ(s) φ(s)
(9.71)
where φ(s) = Mˆ δ∗ 1 − e−Δts + Mδ∗ e−Δts .
(9.72)
If the Laplace transform is applied to (9.32) and δ(s) in (9.71) is substituted into that equation, then the closed-loop system with IBKS is given by sX(s) = A(s)X(s) + B(s)αc (s) + D(s)b(s),
(9.73a)
Y = C(s)X(s),
(9.73b)
where A(s) = B(s) =
Mα∗ −
Z α∗
Mδ∗ ν (s) φ(s) 2,α
1
Mq∗ −
0 , Mδ∗ C + 1) φ(s) (C 1 2
Mδ∗ ν (s) φ(s) 2,q
,
C(s) = [1, 0] , 0 0 D(s) = Mδ∗ Mδ∗ Mˆ δ∗ , − φ(s) φ(s) b (s) . b(s) = q˙0 bδ0 (s)
(9.74a) (9.74b) (9.74c) (9.74d) (9.74e)
From (9.73), we deduce that α(s) = C(s) {sI − A(s)}−1 {B(s)αc (s) + D(s)b(s)} 1 = 2 s − a2,11 + a2,22 s + a2,11 a2,22 − a2,12 a2,21 Mδ∗ Mδ∗ Mδ∗ Mˆ δ∗ · bq˙ (s) + bδ (s) . (C1 C2 + 1) αc (s) − φ(s) φ(s) 0 φ(s) 0
(9.75)
Assuming that Δt 0, which is reasonable for enhanced computation power and reduced transmission time in recent avionics systems, we deduce that α(s) =
s2
T (s) , + 2ζ ωn s + ωn2
(9.76)
254
B.-J. Jeon et al.
where T (s) = (C1 C2 + 1) αc (s) − bq˙0 (s) + Mˆ δ∗ bδ0 (s),
2ζ ωn = (C1 + C2 ) + Zˆ α∗ − Z α∗ ,
ωn2 = (C1 C2 + 1) + C2 Zˆ α∗ − Z α∗ ,
(9.77a) (9.77b) (9.77c)
ζ and ωn capture the damping ratio and the natural frequency for the closed-loop system with IBKS under the measurement biases and the model uncertainty, respectively.
9.3.2.2
Stability and Performance
In the following, the stability and the performance of closed-loop systems with IBKS are investigated by analyzing (9.76). Since stability is normally guaranteed for a damped system, 2ζ ωn > 0, and ωn2 > 0, a condition G to maintain the closed-loop stability with IBKS under measurement biases and model uncertainties is given by G = {C1 , C2 ∈ R>0 : Cond. 1 & Cond. 2} , Cond. 1 : Cond. 2 :
C1 + C2 > −Z α∗ Δ Z α∗ , C1 C2 + C2 Z α∗ Δ Z α∗ > −1.
(9.78a) (9.78b) (9.78c)
If αc , bq˙0 , and bδ0 are constant, then they can be expressed in the frequency domain as b b αc (s) = αsc , bq˙0 (s) = sq˙0 , and bδ0 (s) = sδ0 , respectively. Thus, it follows from (9.76) that α(s) =
(C1 C2 + 1) αc − bq˙0 + Mˆ δ∗ bδ0 1 . s 2 + 2ζ ωn s + ωn2 s
(9.79)
Furthermore, it follows from (9.55) and (9.79) that the steady-state error with IBKS under measurement biases and model uncertainties can be computed as ess =
η2 1 η3 αc + bq˙0 + bδ , η1 + η2 η1 + η2 η1 + η2 0
(9.80)
where η1 C1 C2 + 1,
η2 = C2 Zˆ α − Z α ,
η3 − Mˆ δ∗ .
(9.81)
From the closed-loop analysis with IBKS under the measurement biases and the model uncertainty suggested above, two key remarks can be made. Note that those observations can be further understood by comparing them to the observations
9 Incremental Control to Reduce Model Dependency of Classical …
255
provided in Sect. 9.3.1.2. where closed-loop analysis with IBKS was carried out only considering model uncertainties under the assumption of perfect measurements. Remark 9.6 Constant biases on the state derivative and the control input measurements do not affect the stability of the closed-loop system with IBKS. Remark 9.7 If constant biases on the additional measurements are considered, the uncertainty on the control effectiveness starts to affect the closed-loop system with IBKS, especially on the steady-state error. It follows from (9.76) that neither bq˙0 nor bδ0 affect the characteristic equation, which is the same as the one in Sect. 9.3.1.2 without considering biases. Hence, the condition for absolute stability of the closed-loop system with IBKS under the measurement biases and the model uncertainty (9.82) becomes the same as the one in Sect. 9.3.1.2 to maintain stability just under the model uncertainties. One of the important characteristics obtained from the analysis in Sect. 9.3.1.2 is that if the control system is fast enough, then the closed-loop system with IBKS is robust with respect to Δ Mδ∗ in Mˆ δ∗ , even though Mˆ δ∗ is required to implement IBKS. However, if bq˙0 and bδ0 are considered, then Δ Mδ∗ starts to have an impact on the steady-state error as it can be seen especially in the term related to bδ0 in (9.80). Note that the first term in (9.80) is identical to the one in Sect. 9.3.1.2, and bq˙0 and bδ0 additionally cause the second and the third terms.
9.3.2.3
Simulation Results
Simulations are carried out to verify the analysis results on the closed-loop characteristics with IBKS under the measurement biases and the model uncertainty. Since a piece-wise approach is considered for the analysis on the LPV system, simulations are conducted with the short-period mode dynamics at several flight conditions. As an example, simulation results when altitude is 7.62km and U0 = 185.93m/s are suggested, and the corresponding aerodynamic coefficients are Z α∗ = −1.96, Z δ∗ = 0, Mα∗ = −4.75, Mq∗ = −3.93 and Mδ∗ = −26.68. Simulation parameters, such as the angle of attack command, design parameters, level of model uncertainty in control effectiveness information, and measurement biases, are suggested in Table 9.7. The initial values for α and q are 0◦ and 0◦ /s. The time step is set so that Δt = 0.001s, and Δ Z α∗ is assumed to be 0 to focus on the effects of biases in the additional measurements and the uncertainty in the control effectiveness on the closed-loop system with IBKS. Closed-loop system responses with IBKS under bq˙0 and Δ Mδ∗ are suggested in Fig. 9.3, and the predicted steady-state errors under them are summarized in Table 9.8. For the cases where bδ0 and Δ Mδ∗ are considered, simulation results are shown in Fig. 9.4 and the predicted steady-state errors are suggested in Table 9.9. It is observed from Figs. 9.3 and 9.4 that, for all the simulation cases, the closed-loop system with IBKS under the measurement biases and the model uncertainties is stable and the rising, and the settling times are shown to be the same. This coincides with Remark
256
B.-J. Jeon et al.
Table 9.7 Simulation parameters Parameter
Value
αc C1 , C2 Δ Mˆ ∗
1.5◦ 1.5 [−0.25, 0, 0.25]
bq˙0 , bδ0
[−0.1, 0.1]
δ
Fig. 9.3 Closed-loop system response with bq˙0 Table 9.8 Predicted steady-state error ess by (9.43) with bq˙0 Δ Mδ∗
bq˙0 −0.1◦ /s
−0.25 0 0.25
−0.0308◦ −0.0308◦ −0.0308◦
Fig. 9.4 Closed-loop system response with bδ0
0.1◦ /s 0.0308◦ 0.0308◦ 0.0308◦
9 Incremental Control to Reduce Model Dependency of Classical … Table 9.9 Predicted steady-state error ess by (9.28) with bδ0
257
Δ Mδ∗
bδ0 −0.1◦
0.1◦
−0.25 0 0.25
−0.6158◦ −0.8211◦ −1.0263◦
0.6158◦ 0.8211◦ 1.0263◦
9.6 and the observation from (9.76) that ζ and ωn of the closed-loop system with IBKS are not affected by bq˙0 , bδ0 , and Δ Mδ∗ . Also, it is shown in Fig. 9.4 and Table 9.9 that the steady-state error is affected by Δ Mδ∗ , which coincides with Remark 9.7. The steady-state errors observed from Figs. 9.3 and 9.4 are the same as the steady-state errors predicted by (9.80) suggested in Tables 9.8 and 9.9.
9.3.3 Closed-Loop Analysis Under Measurement Delay and Model Uncertainty 9.3.3.1
Closed-Loop Analysis Framework
Since the additional measurements q˙0 and δ0 are utilized for IBKS instead of model information Mα∗ and Mq∗ compared to BKS, it is essential to understand how the delays in these additional measurements affect the closed-loop characteristics with IBKS. In the following, the analysis framework is introduced to investigate the closedloop stability with IBKS under measurement delays and model uncertainty, and this framework consists of the following three steps. The first step is to derive the closed-loop transfer function with IBKS under the measurement delays along with the model uncertainties. Delays on q˙0 and δ0 measurements are mainly considered for the closed-loop analysis with IBKS, and modelings of these delayed measurements can be captured by δ0 = δ (t − τδ ) , q˙0 = q˙ t − τq˙ = Mα∗ α t − τq˙ + Mq∗ q t − τq˙ + Mδ∗ δ t − τq˙ .
(9.82a) (9.82b)
Since this section focuses on IBKS, the model uncertainty Δ Mδ∗ on Mˆ δ∗ is mainly considered. Using (9.25) with (9.82) under the assumption of constant a αc (i.e., α˙ c = α¨ c = 0) and Δ Z α∗ = 0, we obtain that Mq∗ 1 1 Mα∗ 1 να α − νq q + α t − τq˙ − q t − τq˙ (C 1 C 2 + 1) αc − Mˆ δ∗ Mˆ δ∗ Mˆ δ∗ Mˆ δ∗ Mˆ δ∗ M∗ + δ (t − τδ ) − δ δ t − τq˙ , (9.83) ∗ ˆ Mδ
δ=−
258
B.-J. Jeon et al.
where
C1 + Z α∗ C2 + Z α∗ + 1 , νq = C1 + C2 + Z α∗ .
να =
(9.84a) (9.84b)
The closed-loop analysis with IBKS with the measurement delays and model uncertainties is performed in a piece-wise way to utilize the existing analysis framework for LTI system, as in [14]. Applying the Laplace transform to (9.83), we obtain that 1 1 δ(s) = − Mˆ ∗ φ1 (s) μα (s) − Mˆ ∗ φ1 (s) μq (s) X(s) + δ
δ
1 (C1 C2 + 1) αc (s), ∗ ˆ Mδ φ1 (s) (9.85)
where φ1 (s) = 1 − e−τδ s +
Mδ∗ −τq˙ s e , Mˆ δ∗
(9.86a)
μα (s) = να + Mα∗ e−τq˙ s , μq (s) = νq + Mq∗ e−τq˙ s .
(9.86b) (9.86c)
If Laplace transform is applied to (9.32) and accounting for (9.85), the closed-loop system with IBKS under measurement delays along with model uncertainty is given by sX(s) = A(s)X(s) + B(s)αc (s), Y = C(s)X(s),
(9.87a) (9.87b)
where A(s) = B(s) =
Mα∗ − Mδ∗ Mˆ δ∗ φ1 (s)
C(s) = [1, 0] .
Z α∗
Mδ∗ μ (s) Mˆ δ∗ φ1 (s) α
Mq∗ −
0 , (C1 C2 + 1)
1
Mδ∗ μ (s) Mˆ δ∗ φ1 (s) q
,
(9.88a) (9.88b) (9.88c)
From (9.87), we deduce that the closed-loop transfer function with IBKS with measurement delays along with model uncertainties can be derived as
9 Incremental Control to Reduce Model Dependency of Classical …
259
α(s) = C(s) (sI − A(s))−1 B(s) αc (s) =
Mδ∗ (C1 C2 Mˆ δ∗
+ 1)
φ1 (s)s 2 + φ2 (s)s + φ3 (s)
,
(9.89)
where Mδ∗ −τq˙ s e , Mˆ δ∗ M∗ φ2 (s) = −(Z α∗ + Mq∗ ) 1 − e−τδ s + δ C1 + C2 + Z α∗ − Z α∗ e−τq˙ s , ∗ ˆ Mδ M∗ φ3 (s) = (Z α∗ Mq∗ − Mα∗ ) 1 − e−τδ s + δ (C1 C2 + 1) . Mˆ δ∗ φ1 (s) = 1 − e−τδ s +
(9.90a) (9.90b) (9.90c)
The closed-loop stability with IBKS under measurement delays along with model uncertainty can be examined by searching unstable poles from its characteristic equation in (9.89). The biggest challenge to obtain poles is that there exist exponential functions from the considered measurement delays. If the exponential functions are approximated as in [15–18], then it is difficult to deduce the feasible range of delay for the valid approximation. In addition, [18] addresses that the stability characteristics obtained with Taylor series expansion can be highly sensitive to the order of the approximation. The analysis framework proposed in this section is based on a numerical approach without approximation. The second step is to reformulate the characteristic equation for numerical pole search. Letting s = a + j b, where a, b ∈ R, and applying Euler’s formula, φ1 , φ2 , and φ3 in (9.89) can be rewritten as φ1 = (1 − xδ + Ω xq˙ ) + j (yδ − Ω yq˙ )
φ2 = −(Z α∗ + Mq∗ )(1 − xδ ) + Ω(C1 + C2 + Z α∗ − Z α∗ xq˙ )
+ j −(Z α∗ + Mq∗ )yδ + Ω Z α∗ yq˙
φ3 = (Z α∗ Mq∗ − Mα∗ )(1 − xδ ) + Ω(C1 C2 + 1) + j (Z α∗ Mq∗ − Mα∗ )yδ ,
(9.91a) (9.91b) (9.91c)
where xδ = e−τδ a cos(τδ b), −τq˙ a
(9.92a)
xq˙ = e cos(τq˙ b), yδ = e−τδ a sin(τδ b),
(9.92b) (9.92c)
yq˙ = e−τq˙ a sin(τq˙ b), M∗ Ω= δ. Mˆ δ∗
(9.92d) (9.92e)
260
B.-J. Jeon et al.
Thus, by substituting (9.91) and s = a + j b, the characteristic equation can be rearranged as φ1 s 2 + φ2 s + φ3 = Re(a, b) + j Im(a, b) = 0,
(9.93)
where
Re(a, b) = (a 2 − b2 ) 1 − e−τδ a cos(τδ b) − Ωe−τq˙ a cos(τq˙ b) − 2ab e−τδ a sin(τδ b) − Ωe−τq˙ a sin(τq˙ b)
+ a −(Z α∗ + Mq∗ ) + Ω(C1 + C2 + Z α∗ ) + Mq∗ e−τδ a cos(τδ b) +Z α∗ e−τδ a cos(τδ b) − Ωe−τq˙ a cos(τq˙ b))
! + b Mq∗ e−τδ a sin(τδ b) + Z α∗ e−τδ a sin(τδ b) − Ωe−τq˙ a sin(τq˙ b)
+ (Z α∗ Mq∗ − Mα∗ )(1 − e−τδ a cos(τδ b)) + Ω(C1 C2 + 1) , (9.94a)
Im(a, b) = (a 2 − b2 ) e−τδ a sin(τδ b) − Ωe−τq˙ a sin(τq˙ b)
+ 2ab 1 − e−τδ a cos(τδ b) − Ωe−τq˙ a cos(τq˙ b)
− a Mq∗ e−τδ a sin(τδ b) + Z α∗ e−τδ a sin(τδ b) − Ωe−τq˙ a sin(τq˙ b)
+ b −(Z α∗ + Mq∗ ) + Ω(C1 + C2 + Z α∗ ) + Mq∗ e−τδ a cos(τδ b) +Z α∗ e−τδ a cos(τδ b) − Ωe−τq˙ a cos(τq˙ b) + (Z α∗ Mq∗ − Mα∗ )e−τδ a sin(τδ b). (9.94b) The final step is to find poles by searching a and b so that Re(a, b) = 0 and Im(a, b) = 0 in (9.93). This is performed by solving the system of these nonlinear equations with a widely known and utilized numerical method [19]. The closed-loop system with IBKS is unstable if there exists at least one solution with a > 0.
9.3.3.2
Stability Analysis Results and Simulations
In the following, we present simulation results to demonstrate some characteristics of closed-loop systems controlled by IBKS under measurement delays along with model uncertainties. These results are obtained by applying the analysis framework in Sect. 9.3.3.1. Since a piece-wise approach is considered for the analysis of the LPV system, simulations are conducted with the short-period mode dynamics in several flight conditions. Illustrative examples of the analysis and the simulation results are provided; four different types of aircraft models are considered and their aerodynamic coefficients at certain flight conditions in Table 9.10 are utilized. Both the design parameters C1 and C2 are set to be 1.5. An extensive range of τq˙ and τδ together with Δ Mδ∗ is introduced to obtain critical insights about the effects of measurement delays along with a model uncertainty on the closed-loop system with IBKS. The presented cases are with τq˙ and τδ from 0 s to 0.1 s with 0.01 s increment and from 0.1 s to 0.2 s with 0.02 s increment, that is, {0 : 0.01 : 0.1} ∪ {0.1 : 0.02 :
9 Incremental Control to Reduce Model Dependency of Classical … Table 9.10 Aerodynamic coefficients of aircraft Parameters Airplane A Airplane B h (km) U0 (m/s) Z α∗ Mα∗ Mq∗ Mδ∗
7.6200 185.9280 −1.9626 −4.7488 −3.9326 −26.6845
1.5240 67.0865 −0.8222 −17.1690 −6.8791 −35.2513
261
Airplane C
Airplane D
1.5240 103.6320 −2.4660 −23.8147 −5.8557 −28.4270
6.0960 205.1304 −0.5249 −1.2473 −0.6474 −1.6937
0.2} s along with Δ Mδ∗ = − 50, −35, −20, 0, 25, 100, 200, 300 %. Initial values for the state variables α and q, and the α-command αc are set to be 0◦ , 0◦ /s, and 1.5◦ , respectively. In Fig. 9.5, τq˙ and τδ are shown to be stable for each Δ Mδ∗ . Note that the empty red, blue, black, and green markers in Fig. 9.5 denote the framework results of aircraft A, B, C, and D, respectively. Figure 9.5 shows that the closed-loop system with IBKS is stable if τq˙ = kτδ , where k ≤ 0 is bounded from above by kmax , which varies with Δ Mˆ ∗ . The values of kmax shown in Fig. 9.5 are summarized in Table 9.11. Analysis of δ Fig. 9.5 and Table 9.11 leads to the following observations. For Δ Mˆ ∗ = 0, kmax = 1, δ indicating that the closed-loop system with IBKS is stable if there is no delay on the measurement of q˙0 , that is, τq˙ = 0, or when the additional measurements q˙0 and δ0 are synchronized with the same delay, that is, τq˙ = τδ . If Mˆ δ∗ is underestimated, that is, Δ Mˆ ∗ < 0, then kmax decreases, resulting in reduced number of stable points. On δ contrary, if Mˆ δ∗ is overestimated, that is, Δ Mˆ ∗ > 0, then kmax increases, resulting in δ increased number of stable points. The simulation results coincide with the framework results, as can be seen in Fig. 9.5. Note that the simulation results are given with the filled pink, cyan, gray, and light green markers for each aircraft A, B, C, and D in Fig. 9.5. For some representative cases with Aircraft A, time responses are presented in Fig. 9.6. It is shown in Fig. 9.6 that the closed-loop system with IBKS is unstable if τq˙ and τδ do not satisfy τq˙ = kτδ (k ≤ kmax ) even with small deviation. To better appreciate the stability and robustness of the closed-loop systems, gain margins under τq˙ and τδ together with Δ Mˆ ∗ are examined through simulations. For δ illustrative purposes, a part of the results with Aircraft A under τδ = {0 : 0.01 : 0.05} s are given in Fig. 9.7, but the trend of all results is the same. Figure 9.7 indicates that the gain margin increases as Δ Mˆ ∗ increases under the same τq˙ and τδ . δ For the same Δ Mˆ ∗ , the gain margin decreases when k approaches to its upper bound δ kmax . The important findings from the analysis results can be summarized as Remarks 9.8 and 9.9.
262
B.-J. Jeon et al.
Fig. 9.5 Relationships between τq˙ and τδ for system stability under Δ Mˆ ∗ δ
9 Incremental Control to Reduce Model Dependency of Classical …
263
Table 9.11 kmax for each Δ Mˆ ∗ δ
Δ Mˆ ∗
kmax Airplane A
Airplane B
Airplane C
Airplane D
−0.5 −0.35 −0.2 0 0.25 1 2 3
0 1 1 1 2 3 5 6
0 1 1 1 2 3 5 6
0 1 1 1 2 3 5 6
0 1 1 1 2 3 4 5
δ
Fig. 9.6 Time response graphs for Aircraft A
264
B.-J. Jeon et al.
M M M
80 M
70
M M
60
M
50
M
*
= -0.5
*
= -0.35
*
= -0.2
*
=0
*
= 0.25
*
=1
*
=2
*
=3
40 30 20 0.05
10 0 0.2
0.15
0.1
0.05
0
k=0 k=1 k=2 k=3 k=4 k=5 k=6
0
Fig. 9.7 Gain margin (GM) for stable closed-loop system under τq˙ and τδ together with Δ Mˆ ∗ δ
Remark 9.8 The closed-loop system controlled by IBKS is stable only if τq˙ = kτδ with a non-negative integer k which has kmax as an upper bound. Otherwise, the closed-loop system controlled by IBKS becomes unstable. Remark 9.9 The upper bound kmax becomes smaller as the model uncertainty on control effectiveness information Δ Mˆ ∗ decreases. δ
Remarks 9.8 and 9.9 can be understood as follows. Measurements of q˙0 have critical influences on the closed-loop system with IBKS because it contains the model information about Mα∗ and Mq∗ to be replaced. If there is no delay on this measurement, that is, τq˙ = 0 with k = 0, then the system with IBKS becomes stable. If the q˙0 signal is delayed, then the relationship between delays on q˙0 and δ0 becomes important for the system stability for the following reason. Due to the considered delays on q˙0 and δ0 , trigonometric functions whose frequencies are τq˙ and τδ appear in the characteristic equation (9.93). The differences between trigonometric terms with frequencies of τq˙ and τδ are repeatedly shown in (9.93) and they have significant impacts to the closed-loop system stability with IBKS. When τq˙ = kτδ with a positive integer k, these differences shows a periodic pattern with the frequency of τδ like the case without τq˙ , resulting in the stable closed-loop response. The parameter k is bounced from above by kmax , which is determined by the model uncertainty on control effectiveness information Δ Mˆ ∗ , and it can be explained as follows. The δ
9 Incremental Control to Reduce Model Dependency of Classical …
265
parameter k indicates how many times the cycle of the trigonometric term with τq˙ is repeated during one period of that with τδ . The estimated parameter Δ Mˆ ∗ affects the δ amplitude of the trigonometric term with τq˙ , as can be seen in (9.93). The maximum magnitude of the differences between trigonometric terms with frequencies of τq˙ and τδ can be more amplified as k increases, and Δ Mˆ ∗ has an impact on the magnitude of δ this amplification. This implies that there exists an upper bound kmax which makes the amplification to be within the range where the system is stable and this kmax is affected by Δ Mˆ ∗ . The reason why kmax decreases as Δ Mˆ ∗ gets smaller can be explained δ δ with loop gain point of view. In the derived control command (9.25), there exists a reciprocal of Mˆ δ∗ . This implies that a loop gain becomes smaller with overestimated Mˆ δ∗ (i.e., Δ Mˆ ∗ > 0), while it grows with underestimated if Δ Mˆ ∗ < 0. Thus, the closedδ δ loop system becomes less robust against the defects as Δ Mˆ ∗ decreases, resulting in δ a smaller kmax . Instead, the time domain response of the system becomes faster as Δ Mˆ ∗ gets smaller since the loop gain increases. δ The proposed comparative study has been conducted to facilitate the critical understanding about individual and integrated effects of measurement delays and model uncertainties to the closed-loop system with IBKS. If model uncertainties are considered and measurement delays are not considered, then Sect. 9.3.1.2 shows that the closed-loop system with IBKS is not affected by any model uncertainty even in Mˆ δ∗ and is always stable with uniform performance. When the closed-loop system with IBKS is under both measurement delays and model uncertainties, the relationship between τq˙ and τδ for the system stability is provided, which is affected by Δ Mˆ ∗ . δ The framework result with Δ Mˆ ∗ = 0 indicates the cases wherein τq˙ and τδ are only δ considered. In this case, the system is stable only if τq˙ = 0 or τq˙ = τδ (i.e., kmax = 1). If Δ Mˆ ∗ is additionally considered, the number of stable points tends to decrease with δ 0 ≤ kmax ≤ 1 for underestimated Mˆ δ∗ and increase with kmax ≥ 1 for overestimated Mˆ δ∗ .
9.4 Conclusion and Future Work In this chapter, IBKS is designed as a representative example for the incremental controller, and it is compared with BKS to show that the model dependency is reduced by utilizing the state derivative and the control input measurements. Also, the closed-loop analysis with IBKS under the model uncertainties and the measurement disturbances is conducted to investigate the individual and the integrated effects of the model uncertainties and the measurement disturbances on the closed-loop stability and performance with the incremental control. If the model uncertainties do not exist and the measurements are ideal, the closedloop transfer functions with IBKS and BKS become the same, and their closed-loop characteristics are affected only by the design parameters when the electronic system and the actuators are fast enough. If there are model uncertainties and no measurement disturbances, the following property of IBKS is observed; unlike BKS, the closed-
266
B.-J. Jeon et al.
loop stability and performance with IBKS are still not affected by the uncertainties on any model information including control effectiveness if the control input is calculated, transmitted, and reflected fast enough to the actual control surface. Unlike the case when the measurements are assumed to be perfect, the model uncertainty in control effectiveness information starts to have an impact on a steadystate error when the biases in the additional measurements are considered. These biases on the additional measurements cause a steady-state error, but they do not have any impact on the characteristic equation. Hence, the closed-loop stability with IBKS is not affected by the biases in the additional measurements. When the delays in the additional measurements exist along with the model uncertainties, the closed-loop stability with IBKS is highly dependent on these delays and starts to be affected by the model uncertainty in control effectiveness information. The closed-loop stability with IBKS under the measurement delays and the model uncertainties is guaranteed only when the delays on the state derivatives and the control input measurements satisfy a certain relationship which is dependent on the model uncertainty in the control effectiveness information. The following topics related to incremental control could be studied further in future studies. The relationship required for the closed-loop stability with incremental control under the measurement delays and the model uncertainties is difficult to be satisfied in practice. Hence, a practical approach to minimize the impact of the measurement delays on its closed-loop stability could be a topic for future study. The effects of other measurement disturbances, such as noise, on the closed-loop stability and performance with incremental control can be investigated further in the future. The next chapter presents a different approach to the control of fixed-wing UAVs, which is not based on incremental control, as shown in both this and the previous chapter, but is based on adaptive dynamic programming. In addition to assuring robustness to uncertainties, this approach allows to enforce some user-defined level of performance, which is captured in the form of a cost function.
References 1. Slotine J-JE, Li W et al (1991) Applied nonlinear control, vol 199, no 1. Englewood Cliffs, NJ, Prentice Hall 2. Kokotovic PV (1992) The joy of feedback: nonlinear and adaptive. IEEE Control Syst Mag 12(3):7–17 3. Lane SH, Stengel RF (1988) Flight control design using non-linear inverse dynamics. Automatica 24(4):471–483 4. Acquatella P, van Kampen E-J, Chu QP (2013) Incremental backstepping for robust nonlinear flight control. Proc EuroGNC 2013:1444–1463 5. Sieberling S, Chu Q, Mulder J (2010) Robust flight control using incremental nonlinear dynamic inversion and angular acceleration prediction. J Guidance Control Dyn 33(6):1732–1742 6. Wang X, Van Kampen E-J, Chu Q, Lu P (2019) Stability analysis for incremental nonlinear dynamic inversion control. J Guidance Control Dyn 42(5):1116–1129
9 Incremental Control to Reduce Model Dependency of Classical …
267
7. Grondman F, Looye G, Kuchar RO, Chu QP, Van Kampen E-J (2018) Design and flight testing of incremental nonlinear dynamic inversion-based control laws for a passenger aircraft. In: Guidance, navigation, and control conference, pp 1–25 8. Jeon B-J, Seo M-G, Shin H-S, Tsourdos A (2021) Understandings of incremental backstepping controller considering measurement delay with model uncertainty. Aerosp Sci Technol 109:106408 9. Jeon B-J, Seo M-G, Shin H-S, Tsourdos A (2019) Closed-loop analysis with incremental backstepping controller considering measurement bias. IFAC-PapersOnLine 52(12):405–410 10. Jeon B-J, Seo M-G, Shin H-S, Tsourdos A (2021) Understandings of incremental backstepping controller considering measurement delay with model uncertainty. Aerosp Sci Technol 109:106408 11. Lee T, Kim Y (2001) Nonlinear adaptive flight control using backstepping and neural networks controller. J Guidance Control Dyn 24(4):675–682 12. van Gils P, Van Kampen E-J, de Visser CC, Chu QP (2016) Adaptive incremental backstepping flight control for a high-performance aircraft with uncertainties. In: Guidance, navigation, and control conference, pp 1–26 13. McLean D (1990) Automatic flight control systems. Prentice Hall, New York 14. Lee C-H, Jun B-E, Lee J-I (2016) Connections between linear and nonlinear missile autopilots via three-loop topology. J Guidance Control Dyn 39(6):1426–1432 15. Driver R, Sasser D, Slater M (1973) The equation x’(t) = ax (t) + bx (t −τ ) with “small” delay. Am Math Mon 80(9):990–995 16. Guillouzic S, L’Heureux I, Longtin A (1999) Small delay approximation of stochastic delay differential equations. Phys Rev E 59(4):3970 17. Bahill A (1983) A simple adaptive smith-predictor for controlling time-delay systems: a tutorial. IEEE Control Syst Mag 3(2):16–22 18. Insperger T (2015) On the approximation of delayed systems by Taylor series expansion. J Comput Nonlinear Dyn 10(2):024503 19. Powell MJ (1968) A Fortran subroutine for solving systems of nonlinear algebraic equations. Tech. Rep, Atomic Energy Research Establishment, Harwell, England (United Kingdom) Byoung-Ju Jeon received the B.S. and M.S. degrees in Aerospace Engineering from KAIST, South Korea, in 2012 and 2014, and the Ph.D. degree in Aerospace Engineering from Cranfield University, U.K., in 2020. From 2014 to 2017, she was a Researcher at the Agency for Defense Development, South Korea, and worked on control law design for UAV. She is currently a Research Fellow in UAV navigation at Cranfield University. Her current research interests include adaptive control and sensor fusion for autonomous systems, and artificial intelligence for navigation and control. Hyo-Sang Shin received his BSc in Aerospace Engineering from Pusan National University in 2004 and gained an MSc in flight dynamics, guidance, and control in Aerospace Engineering from KAIST and a PhD on Cooperative Missile Guidance from Cranfield University in 2006 and 2011, respectively. He is currently a Professor of Guidance, Control, and Navigation Systems and Head of Autonomous and Intelligent Systems Group at Cranfield University. His current research interests include target tracking, data-driven guidance and control, and distributed control of multiple agent systems. Antonios Tsourdos obtained an MEng in Electronic Control and Systems Engineering from the University of Sheffield (1995), an MSc in Systems Engineering from Cardiff University (1996), and a Ph.D. in Nonlinear Robust Missile Autopilot Design and Analysis from Cranfield University (1999). He is a Professor of Control Engineering at Cranfield University and was appointed Head of the Centre for Cyber-Physical Systems in 2013. He was a member of Team Stella, the winning team for the UK MoD Grand Challenge (2008) and the IET Innovation Award (Category Team, 2009).
Chapter 10
Adaptive Dynamic Programming for Flight Control Erik-Jan van Kampen and Bo Sun
10.1 Introduction to Adaptive Dynamic Programming Optimality is a property that a control system should always pursue. A generic way to solve optimal control problems is involving the Hamilton–Jacobi–Bellman (HJB) equation. However, for nonlinear systems, it is, in general, impossible to find an analytical solution to the HJB equation [1, 2]. Nevertheless, adaptive dynamic programming (ADP) offers a promising tool to attain satisfying numerical solutions by incorporating artificial neural networks (ANNs). As its name suggests, ADP not only inherits the capability of the classic dynamic programming for pursuing optimality but also empowers the controller to be adaptive. Through iterations between policy improvement and policy evaluation, ADP can address optimal control problems in a numerical way. Wang et al. [3] provided a comprehensive survey on ADP and also discusses the many names under which these methods are known, such as Approximate Dynamic Programming, Neuro-Dynamic Programming, Adaptive Critic Designs, and RL. Although there are differences in the meanings of these terms, they have all at some point been used to describe the same class of methods. In this chapter, the term ADP will be used, but we consider it a form of reinforcement learning (RL) because ADP essentially maximizes returns [4, 5]. With the development of technology, the complexity of aircraft has largely increased. For example, making aviation sustainable has become a popular topic in recent days. To achieve this goal, wings with morphing functions [6] or large aspect ratios [7] are usually designed, which, however, involves more states that introduce nonlinearities, uncertainties, and constraints. An equilibrium for different E.-J. van Kampen (B) · B. Sun Delft University of Technology, Delft, Netherlands e-mail: [email protected] B. Sun e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 A. L’Afflitto et al. (eds.), Control of Autonomous Aerial Vehicles, Advances in Industrial Control, https://doi.org/10.1007/978-3-031-39767-7_10
269
270
E.-J. van Kampen and B. Sun
demands should be attained, which becomes an optimal control problem. With its power of approximation and interaction, ADP is promising to overcome these new challenges. This section presents some of the basic derivations of ADP, which will be used as a starting point for derivations of more advanced ADP methods in Sect. 10.3. Consider a generic continuous, nonlinear, affine in the control system x˙ (t) = f (x (t)) + g (x (t)) u (t) ,
(10.1)
where x (t) ∈ Rn denotes the state vector, u (t) ∈ Ωu ⊂ Rm the control vector, and f and g are differentiable functions describing the effect of the state and input and the state derivatives. Based on optimal control theory, the utility function U is introduced to account for the cost of states and inputs U (x (t) , u (t)) Q (x (t)) + u T (t) Ru (t) .
(10.2)
The infinite-horizon cost function related to this utility function should be minimized, leading to the optimal cost function ∞
∗
U (x (τ ) , u (τ )) dτ .
J (x) min
u∈A(Ω)
(10.3)
t
Bellman’s optimality principle states that the admissible control that achieves this optimal value is the one that sets the Hamiltonian of system (10.1) H (x, u (x) , ∇ J (x)) U (x, u (x)) + (∇ J (x))T [ f (x) + g (x) u (x)]
(10.4)
to zero, that is, 1 u ∗ (x) = arg minu∈A(Ω) H x, u (x) , ∇ J ∗ (x) = − R −1 g T (x) ∇ J ∗ (x) . (10.5) 2 The cost or value function J is approximated by the critic, often in the form of an artificial neural network (ANN). Instead of the analytical expression for the control action, as given in (10.5), an actor ANN is often used to approximate the control law and to reduce model dependency. Together with a model of the plant that is to be controlled, both the actor and the critic form the basic framework for ADP, as shown in Fig. 10.1. The structure of the remainder of this chapter is as follows. First, some early applications of ADP to flight control will be presented in Sect. 10.2. The main contributions of this chapter are presented in Sect. 10.3, which will describe four recent developments in ADP applied to flight control, namely, the use of incremental models (Sect. 10.3.1), the analytical approach to ADP (XGDHP) (Sect. 10.3.2), how
10 Adaptive Dynamic Programming for Flight Control
271
Fig. 10.1 Basic actor–critic ADP framework: action-dependent heuristic dynamic programming (AD-HDP)
to deal with input constraints (Sect. 10.3.3), and an event-triggered ADP control (Sect. 10.3.4). The chapter is concluded by Sect. 10.4.
10.2 Early Applications of ADP to Flight Control This section will show some of the early applications of ADP methods to flight control tasks. These applications form the foundation of the recent developments that will be shown in Sect. 10.3. One of the early works on applications of ADP to flight control is a paper by Enns and Si on Neuro-Dynamic Programming applied to helicopter flight control [8]. Although they do not use the name ADP themselves, and opt for Neuro-Dynamic to stress the use of artificial neural networks as function approximators, it is a classic AD-HDP approach, in which the actor output is fed into the critic for easier back-propagation through the critic to update the actor weights. This is a model-free approach, which does not require a model of the plant in the control loop. A block diagram of their approach is shown in Fig. 10.2. The controller
Fig. 10.2 Action-dependent heuristic dynamic programming architecture for helicopter control. For additional details on this architecture, see [8]
272
E.-J. van Kampen and B. Sun
Fig. 10.3 Dual heuristic dynamic programming architecture for fixed-wing aircraft control. The scheduling variable generator (SVG) produces the significance scheduling vector and the command state generator (CSG) provides secondary state elements that are compatible with the reference. For additional details on this architecture, see [9]
is applied to stabilizing tasks, i.e., tasks where the aim is to drive a set of states to zero. The controller is trained offline, with success rates varying between 78 and 96%, depending on the task. After training, the weights of a successful run are frozen and an online evaluation takes place. Another key publication in the history of ADP for flight control is the work by Ferrari and Stengel on the application of ADP for control of a business jet-type aircraft model [9]. The work from Ferrari and Stengel is different from the work from Enns and Si in several important aspects. First of all, they use a more advanced form of ADP, Dual Heuristic Dynamic Programming, in which the critic network approximates the derivative of the value function with respect to the state λ, instead of the value function itself, see Fig. 10.3. Secondly, Ferrari and Stengel do not have the action dependency in the critic, and, hence, model information, in the form of transition matrices, is needed to update the actor network. Another difference with previous work is that the controller is not just used for stabilization, but also for tracking reference signals. A final example of some of the early work on ADP for flight control can be found in [10]. In this paper, a direct comparison is made between action-dependent and action-independent heuristic dynamic programming applied to a simulation model of the F-16 fighter aircraft. By removing the action dependency, i.e., by cutting the back-propagation path through the critic to the actor, an approximated plant model is required, as in the work from Ferrari and Stengel. The results in [10] show that the success rate for convergence was nearly doubled by removing the action dependency. It also shows an improvement in tracking performance for the action-independent HDP approach. The three examples of early ADP applied to flight control, which
10 Adaptive Dynamic Programming for Flight Control
273
were shown in this section, have shaped a lot of the later developments, some of which are discussed in Sect. 10.3.
10.3 Recent Developments in ADP This section describes four recent developments in ADP applied to flight control. The use of incremental models and the analytical approach to ADP will be introduced in Sects. 10.3.1 and 10.3.2, respectively. Then, the improvement of the recent ADP in dealing with input constraints and incorporating event-triggered control (ETC) will be presented in Sects. 10.3.3 and 10.3.4, respectively.
10.3.1 Incremental Model ADP can be model free if it is action dependent, which means the control signals are also introduced as the inputs of the critic network [11]. Nevertheless, to achieve a model-free application, an alternative is building a third module to approximate the plant dynamics, and ANNs are often regarded as the first choice [10]. The authors in [10] show that this three-network structure outperforms the action-dependent approach if rewards only depend on system states. Although ANNs can approximate the nonlinear function with arbitrary precision, many samples are required before the weights converge for online identification of complex plant dynamics, such as aerospace systems, which can be dangerous especially at the start of training because the critic and actor networks are then trained based on the incorrect model. For these complex systems, offline training is normally involved to obtain a primary model and it often remains constant in applications [10], which, however, cannot achieve adaptive control when facing unforeseen uncertainties and sudden disturbances in realistic application [12]. Modern processors work in a discrete way, leading to discrete measurements and computations. With the assumption of sufficiently high sampling frequency and relatively slow time-varying dynamics, one can represent a continuous nonlinear plant with a discrete incremental model and retain high enough precision [13]. By taking the first-order Taylor series expansion of (10.1) around time t0 and omitting higher order terms, the system is linearized approximately as follows: x(t) ˙ ≈ x(t ˙ 0 ) + F(x(t0 ), u(t0 ))(x(t) − x(t0 )) + G(x(t0 ), u(t0 ))(u(t) − u(t0 )), (10.6)
274
E.-J. van Kampen and B. Sun
where ∂ f (x(t), u(t)) F(x(t0 ), u(t0 )) = x(t0 ),u(t0 ) , ∂ x(t) ∂ f (x(t), u(t)) G(x(t0 ), u(t0 )) = x(t0 ),u(t0 ) , ∂u(t)
(10.7a) (10.7b)
F(x(t0 ), u(t0 )) ∈ Rn×n denotes the system matrix and G(x(t0 ), u(t0 )) ∈ Rn×m denotes the control effectiveness matrix. Assuming the states and state derivatives of the system are measurable, i.e., Δx(t), ˙ Δx(t) and Δu(t) are measurable, the following incremental model can be used to describe the above system: Δx(t) ˙ F(x(t0 ), u(t0 ))Δx(t) + G(x(t0 ), u(t0 ))Δu(t).
(10.8)
With a constant, high sampling frequency, i.e., if the sampling time Δt is sufficiently small, then the plant model can be written approximately in the discrete form xt+1 − xt ≈ Ft−1 (xt − xt−1 ) + G t−1 (u t − u t−1 ), (10.9) Δt ∈ Rn×n denotes the system transition matrix and where Ft−1 ∂ f ∂(x,u) x x ,u t−1 t−1 (x,u) ∈ Rn×m denotes the input distribution matrix at the time step G t−1 ∂ f ∂u xt−1 ,u t−1
t − 1 for the discretized system. From (10.9), the following incremental form of the new discrete nonlinear system can be obtained as follows: Δxt+1 ≈ Ft−1 ΔtΔxt + G t−1 ΔtΔu t .
(10.10)
This way, the continuous nonlinear global plant is simplified into a linear incremental dynamic equation. The resulting local plant model can be identified online by the recursive least squares (RLS) technique, to take advantage of its adaptability to cope with time variations in the regression parameters and fast convergence speed, so as to avoid training a complex ANN [13]. Although some information is omitted, such as state variation-related nonlinear terms and higher order terms in their Taylor series expansion, with the identified Fˆt−1 and Gˆ t−1 matrix, the next system state can be predicted by (10.11) xˆt+1 = xt + Fˆt−1 ΔtΔxt + Gˆ t−1 ΔtΔu t . The combination of the incremental model with HDP and DHP leads to IM-HDP [15] and IM-DHP [13, 14, 16–18], respectively. The structural diagram of IM-DHP is illustrated in Fig. 10.4 [14], where the discrete-time (DT) state xt is represented by st . Both IM-HDP and IM-DHP have successfully been applied to a variety of aerospace systems, including launch vehicles, satellites, airplanes, etc. In particular,
10 Adaptive Dynamic Programming for Flight Control
275
Fig. 10.4 Flowchart of the information during a single complete time step of the IM-DHP learning framework. Solid lines capture feedforward information, while dashed lines indicate feedback update paths For additional details on this architecture, see [14]
Fig. 10.5 The PH-LAB research aircraft operated by Delft University of Technology. The Cessna Citation 550 is a CS-25 certified aircraft
the application of IM-DHP in Cessna Citation 550 (shown in Fig. 10.5) has been gaining attention [16, 18], and, therefore, is selected to demonstrate the applicability of the incremental model-based ADP.
276
E.-J. van Kampen and B. Sun
Fig. 10.6 General layout of the flight controller. The inner control loop consists of the reinforcement learning agent and provides body rate control. The outer loop consists of PID controllers converting the desired altitude and bank angle to body rate references for the inner control loop. For additional details on this architecture, see [18]
The adaptive learning framework is applied to the pitch and roll rate, augmented with an outer control loop, as illustrated in Fig. 10.6. Rate control exhibits the lowest learning complexity due to the direct dynamic relation between the angular rates and control surfaces. The outer control loop consists of conventional proportional– integral–derivative (PID) controllers and provides a higher level control interface, which enables reference tracking of an altitude and roll angle profile. Under the assumption of a symmetric aircraft, a decoupled design is employed for longitudinal and lateral learning controllers. Furthermore, the aircraft’s yaw damper is disabled to provide the agent with full control authority over the control surfaces. The engine’s thrust setting is controlled by an internal airspeed controller. The simulation model is run with a sampling frequency of 50 Hz. The online learning simulation is conducted at the trimmed condition where the true airspeed is 90 m/s and the altitude is 2 km. More detailed settings can be found in [18]. It is noteworthy that persistent excitation (PE) is essential to both state-space exploration in the learning process and dynamic excitation in the system identification process of the incremental model [12]. Exponentially decaying, sinusoidal excitation is applied to the elevator and ailerons to excite the system during the initial training phase. As the agent learns to track the dynamic reference signals, the excitation on the elevator and ailerons is reduced. As illustrated in Fig. 10.7, both the longitudinal and lateral controllers are able to follow the reference signals after less than 30 s of training.
10.3.2 XGDHP The main character of the (X)GDHP technique is that it makes use of both the cost function and its derivative information to update the critic network. The structural diagram of the present XGDHP implementation that is combined with IM is depicted in Fig. 10.8.
10 Adaptive Dynamic Programming for Flight Control
277
Fig. 10.7 Online training procedures of longitudinal (left) and lateral (right) controller starting at the trimmed operation condition. Reference and excitation signals are illustrated by black dashed and blue dotted lines, respectively. For additional details, see [18]
Fig. 10.8 The architecture of the IM-XGDHP algorithm, where solid lines represent the feedforward flow of signals, dashed lines are back-propagation pathways, and the thick arrow represents the weight transmission. For additional details on this architecture, see [19]
278
E.-J. van Kampen and B. Sun
Fig. 10.9 Traditional straightforward critic network
As depicted in Fig. 10.9, for the conventional straightforward GDHP technique, the critic network outputs the approximation of cost function and its derivatives simultaneously [20], whose description is given by
T T wˆ c2,J Jˆ(xt ) = σ wˆ c1 xt , ˆ t) wˆ c2,λ λ(x
(10.12)
where wˆ c1 ∈ Rn×lc , wˆ c2,J ∈ Rlc , and wˆ c2,λ ∈ Rlc ×n , respectively, denote the estimation of the ideal weights wc1 ∈ Rn×lc wc2,J ∈ Rlc , and wc2,λ ∈ Rlc ×n , and σ (·) is the activation function in the hidden layer. In this structure, two kinds of outputs share the same inputs and hidden layer but have different pathways between the hidden layer and the output layer. However, this sharing does not make any physical sense but only makes them strongly coupled, which is undesirable. Indeed, in general, the weights update processes for two kinds of outputs should be relatively independent. ˆ t ) approximated in Besides, due to the inevitable approximation error, Jˆ(xt ) and λ(x this way cannot exactly provide the derivative relationship, which is called suffering from the inconsistency error [12]. Therefore, inspired by [12, 19], a novel XGDHP technique that takes the advantage of explicit analytical calculations is developed. The architecture of the critic network of the XGDHP technique is illustrated in Fig. 10.10, where the critic network only approximates the cost function as T T σ (wˆ c1 xt ), Jˆ(xt ) = wˆ c2
(10.13)
where wˆ c2 ∈ Rlc denotes the estimation of the ideal weight matrix wc2 ∈ Rlc . By taking the explicit analytical calculations, we obtain that ˆ t) = λ(x
∂ Jˆ(xt ) T = wc1 wˆ c2 σ (wc1 xt ) , ∂ xt
where denotes the Hadamard product.
(10.14)
10 Adaptive Dynamic Programming for Flight Control
279
Fig. 10.10 Critic network of the XGDHP technique. For additional details, see [12]
XGDHP makes use of the cost function and its derivative information, and, therefore, the critic network is expected to minimize the performance measure 1 2 1 T + (1 − β) ec2,t ec2,t , E c,t = β ec1,t 2 2
(10.15)
where ec1,t = Jˆ(xt ) − U (xt , u(xt )) − γ Jˆ(xˆt+1 ), ec2,t =
∂[ Jˆ(xt ) − U (xt , u(xt )) − γ Jˆ(xˆt+1 )] , ∂ xt
(10.16a) (10.16b)
and β ∈ [0, 1]. If β = 1, then XGDHP becomes pure HDP, and if β = 0, then the weight matrix is tuned merely based on the computed derivatives λˆ (xt ) and, consequently, XGDHP is equivalent to DHP [19]. For precise calculations, the partial derivative of u(xt ) with respect to xt should also be into consideration in the critic network updating procedure. Thus, define U (xt , u(xt )) as in (10.2), in which Q(xt ) = x T Qx. By applying the chain rule, it follows from (10.16b) that ∂u(xt ) ˆ t ) − 2Qxt − ec2,t = λ(x · 2Ru(xt ) ∂ xt ∂u(xt ) ∂ xˆsk +1 ∂ xˆsk +1 λˆ (xsk +1 ), + −γ ∂ xt ∂ xt ∂u(xt )
(10.17)
where ∂u(xt )/∂ xt is computed with the facilitation of the actor network, and ∂ xˆsk +1 /∂ xt and ∂ xˆsk +1 /∂u(xt ) are computed through the model network. Given a learning rate ηc > 0, the weight updating algorithm is given by Δwˆ c2 = −ηc
∂ E c,t ∂ E c,t , Δwˆ c1 = −ηc , ∂ wˆ c2 ∂ wˆ c1
(10.18)
280
E.-J. van Kampen and B. Sun
and ˆ t) ∂ E c,t ∂ Jˆ(xt ) ∂ λ(x =β ec1,t + (1 − β) ec2,t , ∂ wˆ c2 ∂ wˆ c2 ∂ wˆ c2 ˆ t) ∂ E c,t ∂ Jˆ(xt ) ∂ λ(x =β ec1,t + (1 − β) ec2,t , ∂ wˆ c1 ∂ wˆ c1 ∂ wˆ c1
(10.19a) (10.19b)
ˆ t )/∂ wˆ c1 are the second-order mixed gradients of the ˆ t )/∂ wˆ c2 and ∂ λ(x where ∂ λ(x ˆ cost function J (xt ), and are computed as ∂ λˆ (xt ) Te
ˆ T x ), ec2,t = (wˆ c1 (10.20a) c2,t ) σ (w c1 t ∂ wˆ c2
∂ λˆ (xt ) T x ))T − x w T T x ) σ (w T x ) T, ec2,t = ec2,t (wˆ c2 σ (wˆ c1 ˆ c2 σ (wˆ c1 ˆ c1 t t ˆ c1 x t w t t ∂ wˆ c1
(10.20b) respectively [21]. The computation method in (10.20) is simpler than the approach presented in [12, 19, 22], where the Kronecker product and tensor operations are involved, and matrix dimensionality transformation is required. Through mathematical derivation, it can be found that (10.20) is equivalent to the method developed in [12, 19, 22]. The above introduction is presented based on the stabilization control problem. The IM-XGDHP approach can also be applied to tracking control problems such as the angle of attack (AOA) tracking problem of the F-16 Fighting Falcon aircraft [12]. In this case, the input to the networks and the identifier is the tracking error instead of the system state xt . The AOA reference signal varies around the trimmed condition, namely, 2.6638◦ . The partial observability (PO) condition is taken into consideration, i.e., the pitch rate q is not directly measurable. To handle this PO problem, an augmented IM is adopted by extending IM with previous controls and observations. All simulations are performed with a sampling frequency of 1 kHz and using Euler’s method. In order to achieve the PE condition, a 3211 disturbance signal [12, 13] is introduced as the exploration signal at the initial exploration stage. Furthermore, it is noted that the learning process is performed totally online. More detailed settings can be found in [19]. Although the initial condition can have an impact on the controller, the IM-XGDHP-PO approach can deal with a wide range of initial states within [−10, 15]◦ without loss of precision. As presented in Fig. 10.11, the AOA can track the given reference signal α ref in less than 2 s for all initial conditions using the IM-XGDHP-PO approach, which is indicative of its competent adaptability and robustness. Nevertheless, only when the task is successfully performed, the results presented above make sense. Random factors, such as initial weights of ANNs and measurement noises, can have impacts on performance and, occasionally, even trigger divergence and failure. A concept of success ratio is therefore introduced as a performance index [19] to show the robustness of the IM-XGDHP-PO by comparing it with XGDHP-
10 Adaptive Dynamic Programming for Flight Control
281
Fig. 10.11 Online AOA tracking control with different initial states using the IM-XGDHP-PO approach Table 10.1 Success ratio comparison for different initial states with 1000 times of Monte Carlo simulation α0 [◦ ] −10 −5 0 2.6638a 5 10 15 IM-XGDHP- 100% FSF IM-XGDHP- 92.7% PO XGDHP-PO 25.7% a AOA
64.5%
78.5%
100%
100%
99.1%
99.4%
46.3%
73.5%
100%
99.3%
99.2%
99.6%
38.2%
45.3%
51.2%
44.3%
41.4%
50.5%
value starting at the trimmed condition
PO using a model network and IM-XGDHP with full state feedback (FSF). One thousand Monte Carlo simulations are executed with seven different initial AOAs and equal reference to evaluate the robustness of these approaches toward initial tracking errors. The results regarding success ratio are illustrated in Table 10.1. Both IM-XGDHP-FSF and IM-XGDHP-PO have a success ratio of 100% starting at the trimmed condition, which implies these approaches are stable for this tracking control problem. Thanks to the fast convergence and accurate approximation of IM, both IM approaches outperform the XGDHP-PO which uses a model network to approximate system dynamics. However, it should also be noted that in multiple cases the success ratio is not 100%, which is mainly owing to the fact that it is arduous to accomplish optimal PE condition due to the circular argument between PE condition, accurate system information, and stable control policy [12]. Nevertheless, there is still a prospect of full success and the results presented in Table 10.1 are obtained based on current settings. The development of various aspects can benefit the stability and improve the success ratio, such as sensor precision, exploration signal, parameter initialization, and learning rates.
282
E.-J. van Kampen and B. Sun
Fig. 10.12 The structure of the actor network, where wˆ a1 , wˆ a2 , and wˆ a3 are the weight matrices to be updated. It is assumed that there are 4 system states and 3 control inputs, and the number of neurons in both hidden layers is 10
10.3.3 Input Constraints Saturation constraints commonly exist in aerospace systems [22], but are rarely tackled in the traditional optimal control because of their nonlinear nature. By incorporating ANNs, ADP acquires a more powerful generalization capability and can deal with input constraints as well. For DT systems, the actor–critic scheme is adopted, and an actor network is required. A bounding layer can be added to the actor output to improve the actor network such that the outputted control command can be bounded by the activation function of the bounding layer [12, 19] as shown in Fig. 10.12. However, for continuous-time (CT) systems, by solving the HJB equation, the single critic network architecture is able to perform ADP with lower computational cost and eliminate the approximation error introduced by the actor network [23]. Different from the actor–critic architecture, where the input saturation constraints are addressed by the bounded actor neurons, the single critic network structure ordinarily utilizes a non-quadratic cost function, such that the control inputs derived from the solution of the HJB equation can be bounded by a hyperbolic tangent function [23]. For the CT system (10.1), assume that u(x(t)) ∈ Ωu and Ωu = {u|u ∈ Rm , |u i | < u b , i = 1, . . . , m}, where u b denotes the control saturating bound. For simplicity, the input constraints are assumed identical and symmetric for each input element in this chapter. Different and asymmetric input constraints are introduced in [22] and [21], respectively. To deal with input constraints, a new infinite-horizon cost function
10 Adaptive Dynamic Programming for Flight Control
283
is defined for system (10.1) as
∞
J (x) =
x T Qx + Y (u)dτ,
(10.21)
t
where Q ∈ Rn×n is positive semi-definite and is set to be a diagonal matrix in this chapter, and Y (u) is a positive semi-definite integrand function utilized to handle control input constraints. We choose U (x, u(x)) = x T Qx + Y (u) as the utility function, and U (x, u(x)) satisfies U (x, u) ≥ 0 and U (0, 0) = 0. Therefore, we define
u
Y (u) 2u b
tanh
−T
(υ/u b )Rdυ = 2u b
0
m i=1
ui
tanh−T (υi /u b )ri dυi , (10.22)
0
where tanh−T (·) stands for (tanh−1 (·))T , and tanh−1 (·) denotes the inverse hyperbolic tangent function, which is a monotonic odd function; R = diag([r1 , · · · , rm ]) ∈ Rm×m is a positive-definite weight matrix, where diag(·) reshapes the vector to a diagonal matrix; remarkably Y (u) ≥ 0, and Y (u) = 0 only if u = 0. The Hamiltonian and the optimal cost function J ∗ (xt ) still take forms as (10.4) and (10.3), respectively. Applying the first-order optimality condition ∂ H (x, u(x), ∇ J (x)) /∂u(x) = 0, we deduce that the corresponding optimal feedback control solution is given by u ∗ (x) = arg minu(x)∈Ωu H (x, u(x), ∇ J ∗ (x)) = −u b tanh(D ∗ ),
(10.23)
where tanh(·) denotes the hyperbolic tangent function, and D ∗ is given by D∗ =
1 −1 T R g (x)∇ J ∗ (x). 2u b
(10.24)
The control input u ∗ is bounded by u b , and the non-quadratic cost (10.22) evaluated at u ∗ is given by Y (u ∗ (x)) = u b ∇ J ∗T (x)g(x) tanh(D ∗ ) + u 2b R ln 1 − tanh2 (D ∗ ) ,
(10.25)
where ∇ J ∗T (x) denotes (∇ J ∗ (x))T and R = [r1 , · · · , rm ]T . Substituting (10.23) and (10.24) into the HJB equation produces 0 = x T Qx + u 2b R ln(1 − tanh2 (D ∗ )) + ∇ J ∗T (x) f (x)
(10.26)
with J ∗ (0) = 0 that leads to H (x, u ∗ (x), ∇ J ∗ (x)) = 0. Since the system is CT, only a single critic network is required to approximate the cost function J (x), and the
284
E.-J. van Kampen and B. Sun
bounded approximate optimal control can directly be obtained through (10.23) and (10.24) by substituting the approximated cost function Jˆ(x).
10.3.4 Event-Triggered Control Although time-based ADP approaches provide a mature and normative solution to nonlinear optimal control problems, to enhance the resource utilization and reduce the computational burden, ETC has been evolved as an alternate control paradigm and acquired more attentions in recent days [21]. Originating from networked control systems [25], ETC originally aims to deal with the limitation of communication bandwidth [21, 26]. A cross fertilization of ETC and ADP produces event-triggered ADP, which has successfully been implemented for optimal stabilization of both discretetime aerospace systems [21] and CT aerospace systems [27]. The key attribute of the event-triggered mechanism lies in that the control signals are updated only when a certain condition is triggered [27]. Therefore, designing a sound triggering condition is the principal task of ETC. Considering the event-triggered scheme, we define a sequence of triggering instants {sk }∞ k=0 , where sk satisfies sk < sk+1 with k ∈ N. The output of the sampleddata module is x(sk ) xk for all t ∈ [sk , sk+1 ). Subsequently, we define the gap function using the event error: ek (t) = xk − x, ∀t ∈ [sk , sk+1 ).
(10.27)
We denote ek (t) briefly by ek hereafter. Every time when a certain triggering condition is satisfied, the event-triggered state vector is updated and the event error ek is reset to zero. At every triggering instant (instead of time instant), the state feedback control law u(x(sk )) = u(xk ) is accordingly updated. By introducing a zero-order holder (ZOH), the control sequence {u(xk )}∞ k=0 actually turns to be a piece-wise signal that remains constant during the time interval [sk , sk+1 ), ∀k ∈ N. Based on the control signal u(xk ), system (10.1) takes the form: x˙ = f (x) + g(x)u(x + ek ), ∀t ∈ [sk , sk+1 ).
(10.28)
For system (10.1), with the infinite-horizon cost function represented by (10.21), we define a triggering condition as ek 2 > eThr 2 ,
(10.29)
where eThr denotes the threshold to be determined. The event is said to be triggered if (10.29) is satisfied. The methods to design a threshold for CT systems and DT systems are different. For CT systems, the triggering threshold is designed to incorporate the ADP
10 Adaptive Dynamic Programming for Flight Control
285
Fig. 10.13 Simple diagram of the ETC scheme incorporating the DT ADP algorithm
controller, which means the control policy is always computed although the control input is updated based on the triggering condition. For CT systems adopting ETC methods, the inter-execution time can be zero, resulting in the accumulation of event times. This is the infamous Zeno phenomenon that must be avoided in the controller design [27]. The main benefit to involve ETC is to decrease the communication cost between the controller and the controlled system. On the contrary, for DT systems the design of the triggering threshold can be separated from the common control law. As presented in Fig. 10.13, only when an event is triggered will the ADP algorithm be activated and the control policy is computed. Therefore, not only the communication cost is decreased, but also the computational load is saved for DT systems. Since the triggering condition is determined independently for DT ADP, it can have a common form. By assuming that there exists a positive constant C ∈ (0, 0.5) such that (10.30) f (xt , μ(et + xt )) ≤ Cxt + Cet , where et ≤ xt , the triggering condition for DT systems can be defined as [21] et > eThr = C
1 − (2C)t−sk xsk . 1 − 2C
(10.31)
The effectiveness of the event-triggered ADP has been verified in an aeroelastic system as an example for CT systems [27]. With the wide usage of composite materials, high aspect-ratio aircraft wing can suffer from aeroelastic instability phenomena, including the LCOs [28, 29]. If not suppressed by active control, LCOs can lead to structural failure and even flight accidents [30]. The schematic of an aeroelastic wing section controlled by a single trailing-edge flap is illustrated in Fig. 10.14 [24], where c.m. is the abbreviation of center of mass. It has two degrees of freedom: the plunge displacement h and the pitch angle θ . In this problem, it is assumed in the undisturbed case that the freestream is along the airfoil chord, and thus pitch angle θ is equal to
286
E.-J. van Kampen and B. Sun
Fig. 10.14 A two-degree-of-freedom aeroelastic system with one control surface. For additional details on this mechanical analogy between fluttering of a wing and the connection of a wing with a translational and torsional spring, see [24]
AOA α. The task of the controller is to stabilize the system with input constraints in the ETC scheme and the detailed settings can be found in [27]. For presenting the advantage of the ETC scheme, a time-based approach is adopted for comparison, whose settings are exactly same as the proposed intelligent critic control approach except for the event-triggered scheme, i.e., the time-based control approach updates the control input at each time instant. The simulation is conducted in an online manner, which means that the control policy improves in a closed-loop way. It can be observed from Fig. 10.15 that the convergence of the weight vector occurs around 1s, which illustrates the feasibility of online learning. The inter-execution time is depicted in Fig. 10.16. It is worth mentioning that 800 samples are utilized by the time-based controller, whereas the proposed event-triggered approach only requires 366 samples. Therefore, the event-triggered method reduces the control updates in the learning process up to 54.25%, and thus improves the resource utilization. Figures 10.17 and 10.18 present the aeroelastic system states trajectory divided into plunge and pitch motion, respectively. It can be observed that, although the event-triggered controller utilizes fewer data samples, the state variables eventually converge to a small vicinity of zero without deteriorating the converge rate. Figure 10.19 presents the control command directly generated by the controller, u (βc ). The developed event-triggered approach has an overall comparable curve to the time-based approach. Due to the event-triggered mechanism, the control command signal is step-wise. Nevertheless, the control command signal has to go through an actuator and the real deflection is adequately smooth for the wing surface control. Furthermore, the control command (incorporating exploration noise) is bounded by the pre-designed saturation constraints, i.e., |u| < u b . Therefore, it can be concluded that the control input constraints problem has been overcome.
10 Adaptive Dynamic Programming for Flight Control
287
Fig. 10.15 Convergence process of the critic weights
Fig. 10.16 Evolution of the inter-execution time
The phase portraits of plunge motions are illustrated in Fig. 10.20. It appears that the trajectories of the proposed method and the open-loop simulation almost coincide at the beginning. This phenomenon is due to the collective effect caused by LCOs and the initial unlearned policy and disappears quickly as the weight vector updates. Then, all states are stabilized to a small vicinity of the equilibrium point.
288
Fig. 10.17 Evolution of the plunge motion states
Fig. 10.18 Evolution of the pitch motion states
E.-J. van Kampen and B. Sun
10 Adaptive Dynamic Programming for Flight Control
289
Fig. 10.19 Control command generated by the controller
Fig. 10.20 Phase portrait of the plunge motion states
The simulation results collectively verify the feasibility and the effectiveness of the event-triggered constrained-input ADP control approach.
290
E.-J. van Kampen and B. Sun
10.4 Conclusions and Future Work This chapter provided an overview of how adaptive dynamic programming has been applied to flight control in the past 20 years. The introduction of incremental plant models in the ADP loop has allowed fast online learning and enabled the design of providing fault-tolerant flight control systems. Many developments have been made to increase the convergence success rate and performance of ADP controllers, for example, by using more advanced structures such as XGDHP (explainable global dual heuristic dynamic programming). There are still several challenges that have to be addressed before ADP can find more widespread applications in flight control. For this reason, considerable effort is continuing in developing ADP controllers with guaranteed converge properties. These guarantees will improve safety at the cost of some flexibility in the learning strategy. An additional aspect that needs improvement concerns the learning efficiency, i.e., how many samples of data are required to learn a good policy. For low-level tasks, such as pitch rate or pitch angle control, the efficiency is high enough to allow for fast online learning. However, increasing the complexity of the control task, by adding more states and actions, can cause the learning to be too slow for pure online learning, meaning the aircraft can lose control before a recovering policy is learned. In these situations, a combination of offline and online learning can be used, in order to provide some baseline control performance while the online agent is trained. Having addressed a control system for a single autonomous aircraft, such as ADP, which is at the intersection between optimal control, adaptive control, and RL, the next chapter will discuss how adaptive control can be successfully employed to control heterogeneous systems of autonomous aircraft.
References 1. Zhao J, Na J, Gao G (2020) Adaptive dynamic programming based robust control of nonlinear systems with unmatched uncertainties. Neurocomputing 395:56–65 2. Wang D, Ha M, Qiao J (2020) Self-learning optimal regulation for discrete-time nonlinear systems under event-driven formulation. IEEE Trans Autom Control 65(3):1272–1279 3. Wang D, He H, Liu D (2017) Adaptive critic nonlinear robust control: a survey. IEEE Trans Cybern 47(10):3429–3451 4. Kiumarsi B, Vamvoudakis KG, Modares H, Lewis FL (2018) Optimal and autonomous control using reinforcement learning: a survey. IEEE Trans Neural Netw Learn Syst 29(6):2042–2062 5. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction, 2nd edn. MIT Press, Cambridge, MA 6. Sun B, Mkhoyan T, van Kampen E-J, De Breuker R, Wang X (2022) Vision-based nonlinear incremental control for a morphing wing with mechanical imperfections. Trans Aerosp Electron Syst 58(6):5506–5518 7. Ruiz Garcia A, Vos R, de Visser C (2020) Aerodynamic model identification of the flying V from wind tunnel data. In: Aviation forum, Orlando, FL, pp 1–18 8. Enns R, Si J (2000) Neuro-dynamic programming applied to helicopter flight control. In: Guidance, navigation, and control conference and exhibit. AIAA, Orlando, FL, pp 1–11
10 Adaptive Dynamic Programming for Flight Control
291
9. Ferrari S, Stengel RF (2004) Online adaptive critic flight control. J Guidance Control Dyn 27(5):777–786 10. van Kampen E-J, Chu Q, Mulder J (2006) Online adaptive critic flight control using approximated plant dynamics. In: International conference on machine learning and cybernetics. IEEE, pp 256–261 11. Abouheaf M, Gueaieb W, Lewis F (2018) Model-free gradient-based adaptive learning controller for an unmanned flexible wing aircraft. Robotics 7(4):1–22 12. Sun B, van Kampen E-J (2020) Incremental model-based global dual heuristic programming with explicit analytical calculations applied to flight control. Eng Appl Artif Intell 89:103425 13. Zhou Y, van Kampen E-J, Chu Q (2018) Incremental model based online dual heuristic programming for nonlinear adaptive control. Control Eng Pract 73:13–25 14. Helder B, van Kampen E-J, Pavel M (2021) Online adaptive helicopter control using incremental dual heuristic programming. In: Scitech Forum. AIAA, Orlando, FL, pp 1–18 15. Zhou Y, van Kampen E-J, Chu Q (2020) Incremental model based online heuristic dynamic programming for nonlinear adaptive tracking control with partial observability. Aerosp Sci Technol 105:1–14 16. Lee JH, van Kampen E-J (2021) Online reinforcement learning for fixed-wing aircraft longitudinal control. In: Scitech Forum. AIAA, Orlando, FL, pp 1–20 17. Li H, Sun L, Tan W, Jia B, Liu X (2020) Switching flight control for incremental model-based dual heuristic dynamic programming. J Guidance Control Dyn 43(7):1352–1358 18. Heyer S, Kroezen D, van Kampen E-J (2020) Online adaptive incremental reinforcement learning flight control for a CS-25 class aircraft. In: Scitech Forum. AIAA, pp 1–20 19. Sun B, van Kampen E-J (2021) Intelligent adaptive optimal control using incremental modelbased global dual heuristic programming subject to partial observability. Appl Soft Comput 103:107153 20. Wang D, Liu D, Wei Q, Zhao D, Jin N (2012) Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming. Automatica 48(8):1825–1832 21. Sun B, van Kampen E-J (2022) Event-triggered constrained control using explainable global dual heuristic programming for nonlinear discrete-time systems. Neurocomputing 468:452– 463 22. Sun B, van Kampen E-J (2021) Reinforcement-learning-based adaptive optimal flight control with output feedback and input constraints. J Guidance Control Dyn 44(9):1685–1691 23. Heydari A, Balakrishnan SN (2013) Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics. IEEE Trans Neural Netw Learn Syst 24(1):145–157 24. Lhachemi H, Chu Y, Saussié D, Zhu G (2017) Flutter suppression for underactuated aeroelastic wing section: nonlinear gain-scheduling approach. J Guidance Control Dyn 40(8):2102–2109 25. Li L, Song L, Li T, Fu J (2021) Event-triggered output regulation for networked flight control system based on an asynchronous switched system approach. IEEE Trans Syst Man Cybern Syst 51(12):7675–7684 26. Wang J, Wang P, Ma X (2020) Adaptive event-triggered control for quadrotor aircraft with output constraints. Aerosp Sci Technol 105:1–13 27. Sun B, Wang X, van Kampen E-J (2022) Event-triggered intelligent critic control with input constraints applied to a nonlinear aeroelastic system. Aerosp Sci Technol 120:1–11 28. Li D, Xiang J, Guo S (2011) Adaptive control of a nonlinear aeroelastic system. Aerosp Sci Technol 15(5):343–352 29. Sanches L, Guimarães TA, Marques FD (2019) Aeroelastic tailoring of nonlinear typical section using the method of multiple scales to predict post-flutter stable LCOs. Aerosp Sci Technol 90:157–168 30. Lee KW, Singh SN (2018) Robust finite-time continuous control of an unsteady aeroelastic system. J Guidance Control Dyn 41(4):978–986
292
E.-J. van Kampen and B. Sun
Erik-Jan van Kampen received his BSc, MSc, and PhD degrees from Delft University of Technology in 2004, 2006, and 2010, respectively. He is currently an Assistant Professor in the Department of Control and Operations, Delft University of Technology, Delft, Netherlands. His research interests are intelligent flight control, reinforcement learning and incremental nonlinear control. Bo Sun received a B.S. degree and the M.S. degree in aerospace engineering from Northwestern Polytechnical University, Xi’an, China, 2016 and 2019, respectively. He is currently pursuing a Ph.D. degree with the Department of Control and Operations, Delft University of Technology, Delft, Netherlands. He has a broad interest in adaptive dynamic programming, reinforcement learning, and aerospace engineering.
Chapter 11
A Distributed Adaptive Control Approach to Cooperative Output Regulation of a Class of Heterogeneous Uncertain Multi-agent Systems Selahattin Burak Sarsılmaz, Ahmet Taha Koru, Tansel Yucelen, and Behçet Açıkme¸se
11.1 Introduction The output regulation problem has been one of the main topics in control theory since the 1970s (for example, see [1–5]). On the other hand, the cooperative output regulation problem, which can be regarded as an extension of the conventional output regulation problem to multi-agent systems, has attracted attention during the last decade (for instance, see [6–28]). The goal of this relatively recent problem is to design a distributed control law that enables output synchronization for a network of agents to a class of reference inputs while preserving closed-loop stability. Reference inputs are generated by an exosystem, which can be seen as the leader. The main theoretical difficulty here lies in the lack of central authority. In other words, each agent can share information with only their neighbors. The cooperative output regulation problem for multi-agent systems with heterogeneous (in dynamics and dimension) multi-input multi-output (MIMO) linear timeinvariant (LTI) agent dynamics over general directed graphs has been studied in [7, 11, 17, 18, 26–28]. Distributed control laws used to solve this problem are predicated on the following two approaches of linear output regulation, namely feedforward and S. B. Sarsılmaz (B) Utah State University, Logan, UT, USA e-mail: [email protected] A. T. Koru University of Texas at Arlington, Arlington, TX, USA e-mail: [email protected] T. Yucelen University of South Florida, Tampa, FL, USA e-mail: [email protected] B. Açıkme¸se University of Washington, Seattle, WA, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 A. L’Afflitto et al. (eds.), Control of Autonomous Aerial Vehicles, Advances in Industrial Control, https://doi.org/10.1007/978-3-031-39767-7_11
293
294
S. B. Sarsılmaz et al.
p-copy internal model. The former is adopted by [7, 17, 18], and the feedforward gain of each agent relies on the solution of the regulator equations; thus, this methodology is known to be not robust to parameter uncertainties. The latter employed by [11, 26–28], however, is robust with respect to small variations of the plant parameters. The salient outcomes of [7, 11, 17, 18, 26, 28] are the agent-wise local design methods that pave the way for independent control gain synthesis of each agent and hence scalability. The design method in [28] is also capable of assigning the eigenvalues of the overall closed-loop system to a user-defined region. This feature makes the optimization of the transient response of the multi-agent system possible, for instance, the minimal decay rate and minimal damping ratio maximization. The global design method in [27] is less conservative than the agent-wise local results in [11, 26, 28]. Although it can also optimize the transient response of overall closed-loop system, it depends on the multi-agent system scale. In the presence of linearly parameterized matched system uncertainties, which are large-scale uncertainties, distributed controllers of the aforementioned studies [7, 11, 17, 18, 26–28] cannot directly address cooperative output regulation. To handle these matched system uncertainties, adaptive control techniques have been utilized in [29– 31] for the leader-following consensus problem of multi-agent systems with identical nominal LTI agent dynamics subject to different uncertainties, where the leaderfollowing consensus problem can be viewed as a special case of the cooperative output regulation problem [7]. When the heterogeneity in networks stems not only from different uncertainties but also from nonidentical nominal agent dynamics with different state dimensions, the distributed controllers in [30, 31] are not feasible. To the authors’ knowledge, the approach in [29] has not been generalized to this case either. The authors of [12–14, 16, 21, 24] have provided distributed controllers for various classes of heterogeneous uncertain nonlinear multi-agent systems, but these controllers are restricted to the cooperative output regulation problem for networks of single-input single-output systems. The proposed neuro-adaptive controller in [32] applies to MIMO agent dynamics in Brunovsky form, where the agents have the same state dimension. Just recently, the authors of [33] have considered networks of MIMO systems in lower triangular form. In this chapter, we focus on the cooperative output regulation problem of multiagent systems with heterogeneous (in dynamics and dimension) nominal MIMO LTI agent dynamics subject to different matching uncertainties under general directed communication graphs. This problem is first addressed in [19] by the distributed observer-based feedforward approach in [7] and a certainty equivalent adaptive controller. We, on the other hand, extend the field of application of the internal modelbased distributed control approach in [11, 26–28] to this problem. The approach given in this chapter is twofold. First, building on the existing solution to the cooperative linear output regulation problem given in [26, 28], a distributed reference model is constructed to solve the problem as if there were no uncertainties. Then, a distributed dynamic state feedback control law is proposed to drive the output of the uncertain multi-agent system to the output of this reference model while keeping all the state variables of the closed-loop system bounded.
11 A Distributed Adaptive Control Approach to Cooperative …
295
The key elements of the proposed distributed control law are the distributed reference model characterizing the desired closed-loop response, an adaptive law (that is, parameter adaptation rule) handling the uncertainties, and a decoupling virtual tracking error (that is, information exchange) whose role is to achieve a level of decoupling between the distributed reference model and the adaptive controller in analysis and synthesis. A preliminary version of this distributed control law has been presented in [34] by the authors, and its performance has been demonstrated in [35] with an experimental study on a heterogeneous multi-agent system composed of two cart-inverted pendulums and a cart. With the current version, we now have the following capabilities. First, we can define not only the desired boundedness and steady-state properties but also the transient performance of the distributed reference model. Second, a larger class of reference inputs can be asymptotically tracked by the heterogeneous uncertain multi-agent system. Finally, the uncertainties in the control effectiveness of the agents are also taken care of. As highlighted in [7, 36], the cooperative output regulation problem offers a unifying framework to handle cooperative control problems such as consensus and formation of multi-agent systems (for example, see [37, 38]). It is also illustrated in [19, 36] that the energy management for electrical energy storage systems and leader-following formation of nonholonomic mobile robots can be formulated as the cooperative output regulation problem. The numerical section of this chapter considers a heterogeneous multi-vehicle system composed of four unmanned aerial vehicles (UAVs) and one unmanned ground vehicle (UGV). Based on the proposed distributed control law, it is demonstrated that they are synchronized to a circular trajectory while monitoring different altitudes.
11.2 Notation and Preliminaries We begin with the standard notation used in this chapter. Specifically, In denotes the n × n identity matrix, 1n denotes the n × 1 vector of all ones, diag(A1 , . . . , An ) denotes a block-diagonal matrix with matrix entries A1 , . . . , An on its diagonal blocks, and ⊗ denotes the Kronecker product of matrices. We write ρ(·) for the spectral radius of a square matrix, spec(·) for the spectrum of a square matrix, tr(·) for the trace of a square matrix, λmin (·) for the smallest eigenvalue of a symmetric matrix, and · 2 for the Euclidean norm of a vector. A Hermitian positive-definite (negative-definite) matrix X is denoted by X > 0 (< 0). Now, we continue with the basic graph-theoretical notations and definitions. A nonempty directed graph G without self-loops is a pair (N, E), where N = 1, . . . , N is a nonempty finite set of nodes and E ⊆ (i, j) | i, j ∈ N, i = j is a set of edges. Let G = (N, E) be a nonempty directed graph without self-loops. In the graph G, a directed walk of length ≥ 1 is a sequence of nodes i 1 , . . . , i +1 such that (i k , i k+1 ) ∈ E for k = 1, . . . , . A directed walk of length 0 consists of only one node. A directed path in G is a directed walk in which all nodes are distinct. A directed cycle in G is a walk of length ≥ 2 where only the first and last nodes are
296
S. B. Sarsılmaz et al.
the same. The graph G is called an acyclic graph if it contains no directed cycle. In the graph G, node j is said to be reachable from node i if there is a directed path starting at node i and ending at node j. The directed graph G is said to contain a directed spanning tree if there exists a node i such that every node in N is reachable from node i. Next, the p-copy internal model concept of square matrices given in Definition 1.22 and Remark 1.24 of [5] is tailored for this chapter. Definition 11.1 Given any square matrix A0 , a pair of matrices (G 1 , G 2 ) is said to incorporate a p-copy internal model of the matrix A0 if G 1 = diag(β1 , . . . , β p ), G 2 = diag(σ1 , . . . , σ p ), where for = 1, . . . , p, β is a square matrix and σ is a column vector such that the minimal polynomial of A0 is equal to the characteristic polynomial of β and the pair (β , σ ) is controllable. Finally, we recall an eigenvalue clustering-based definition of matrix stability and the definition of the linear matrix inequality (LMI) regions from Sect. II.A and Sect. II.B of [39], respectively. Definition 11.2 Let D be a region in the open left half complex plane (OLHP). A square matrix A is called D-stable if all its eigenvalues lie in D. Definition 11.3 A subset D of the complex plane is called an LMI region if there exist a symmetric matrix L ∈ Rq×q and a matrix M ∈ Rq×q such that D = {z ∈ C | L + z M + z¯ M T < 0}, where z¯ denotes the complex conjugate of z.
11.3 Problem Formulation In this chapter, we consider a heterogeneous uncertain multi-agent system in the form x˙i = Ai xi + Bi Λi u i + Bi Δi (xi ), yi = Ci xi , i = 1, . . . , N ,
(11.1a) (11.1b)
where xi ∈ Rni is the state, u i ∈ Rm i is the control input, and yi ∈ R p is the output of the ith subsystem. While Ai ∈ Rni ×ni , Bi ∈ Rni ×m i , and Ci ∈ R p×ni are known constant matrices, Λi ∈ Rm i ×m i is an unknown constant control effectiveness matrix and Δi : Rni → Rm i is a state-dependent uncertainty. Note that a considerable set of aerial vehicles can be modeled as in (11.1) (for example, see [40–42] and Sect. 11.6). We also consider an exosystem in the form
11 A Distributed Adaptive Control Approach to Cooperative …
297
ω˙ = A0 ω,
(11.2a)
y0 = Sω,
(11.2b)
with the state ω ∈ Rn 0 and the output y0 ∈ R p , which is the reference input to be tracked by the heterogeneous uncertain multi-agent system in (11.1). The matrix A0 ∈ Rn 0 ×n 0 is constant, its minimal polynomial is known, and S ∈ R p×n 0 is an unknown constant matrix. Moreover, the tracking (that is, regulation) error for each subsystem is defined as ei = yi − y0 . Now, we make assumptions about the uncertainties of the multi-agent system and the class of reference inputs that can be generated by the exosystem. Assumption 11.1 The matrix Λi is symmetric positive definite for i = 1, . . . , N . Assumption 11.2 The state-dependent uncertainty in (11.1) is linearly parameterized as Δi (xi ) = WiT σi (xi ), for all xi ∈ Rni , where Wi ∈ Rsi ×m i is an unknown constant matrix and σi : Rni → Rsi is a known basis function whose components are real-valued locally Lipschitz functions of xi . Assumption 11.3 All the eigenvalues of A0 are semisimple (that is, A0 is diagonalizable) with zero real parts. Remark 11.1 Assumptions 11.1 and 11.2 are standard in the adaptive control literature (for example, see [43] and Chap. 10 of [41]), and so is Assumption 11.3 in the output regulation literature (for instance, see [13] and Chap. 7 of [5]). Assumption 11.1 ensures that the uncertainties due to Λi and Δi (xi ) are matched. Hence, the ith subsystem in (11.1) would be linearized by a state feedback if Λi and Δi (xi ) were known. Under Assumption 11.3, for any ω(0) ∈ Rn 0 , the state ω(t) of the exosystem in (11.2) is bounded for t ≥ 0. In conjunction with the local Lipschitzness property of the basis function σi (xi ) in Assumption 11.2, the boundedness of ω(t) guarantees that the closed-loop system of interest in this chapter is well-posed. Assumption 11.3 also tells us that the exosystem can generate constant and sinusoidal signals and their combinations to be tracked by the multi-agent system in (11.1). In the context of cooperative output regulation (for example, see [18, 23, 24, 26]), the system (11.1) and the exosystem (11.2) together are regarded as a multi-agent system of N + 1 agents, where the exosystem is the leader and all the subsystems of (11.1) are the followers. To model the information exchange between N distinct follower agents, we define a time-invariant (that is, static) directed graph G = (N, E) without self-loops as N = 1, . . . , N , where node i ∈ N corresponds to the ith follower, and for each j, i ∈ N, we put ( j, i) ∈ E if, and only if, the ith follower has access to the information of the jth follower. The weighted adjacency matrix A = [ai j ] ∈ R N ×N of the graph G is defined by the rule that for each j, i ∈ N, ai j > 0 if ( j, i) ∈ E and ai j = 0 otherwise. Let Ni = j | ( j, i) ∈ E denote the
298
S. B. Sarsılmaz et al.
set of neighbors of the ith follower. The weighted in-degree di of the ith follower is defined as di = j∈Ni ai j . The leader is included in the information exchange model by augmenting the graph G. Specifically, let G = (N, E) be an augmented directed graph with N = {0} ∪ N and E = E ∪ E , where E ⊆ (0, i) | i ∈ N . Here, the node 0 corresponds to the leader and for each i ∈ N, we put (0, i) ∈ E if, and only if, the ith follower has access to the information of the leader. For each i ∈ N, the pinning gain ki > 0 if (0, i) ∈ E and ki = 0 otherwise. In practice, the information of the leader is only available to a small subset of the followers. We make the following assumption on the graph G. Assumption 11.4 The augmented directed graph G contains a directed spanning tree. Remark 11.2 For time-invariant networks, Assumption 11.4 is the weakest assumption on the graph connectivity to address leader-following consensus and cooperative output regulation problems (for instance, see Remark 3.2 in [44]). By definition of the graph G, Assumption 11.4 holds if, and only if, every node in N is reachable from the node 0. Figure 11.1 depicts two examples of augmented directed graphs satisfying Assumption 11.4. Readers are referred to Remark 3.3 in [44] for five classes of graphs covered by Assumption 11.4. The weighted adjacency matrix A has encoded the adjacency relationships in the directed graph G. As in [26], define F = diag (d1 + k1 )−1 , . . . , (d N + k N )−1 . Under Assumption 11.4, F is well defined since di + ki > 0 for all i ∈ N. Consequently, the matrix F A encodes the adjacency relationships in the augmented directed graph G. Note that F normalizes A in the sense that the row sum of row i is equal to 1 if ki = 0 and less than 1 otherwise. Example 11.1 Consider the graphs in Fig. 11.1. Let all the nonzero weights ai j and ki be 1. Then, the graphs admit the following matrix representations:
Fig. 11.1 Two examples of augmented directed graphs satisfying Assumption 11.4. Circles and arrows are drawn to represent the nodes and edges, respectively. While graph G1 contains cycles, graph G2 is acyclic
11 A Distributed Adaptive Control Approach to Cooperative …
⎡
0 ⎢0.5 ⎢ F1 A1 = ⎢ ⎢0.5 ⎣0.5 0.5
0 0 0.5 0 0
0 0.5 0 0 0
0 0 0 0 0.5
⎤ ⎡ 0 0 ⎢1 0⎥ ⎥ ⎢ ⎢ 0⎥ ⎥ , F2 A2 = ⎢0.5 ⎦ ⎣1 0.5 0 0.5
299
0 0 0.5 0 0
0 0 0 0 0
0 0 0 0 0.5
⎤ 0 0⎥ ⎥ 0⎥ ⎥, 0⎦ 0
where ρ(F1 A1 ) = 0.5 and ρ(F2 A2 ) = 0. Remark 11.3 It has been demonstrated in [26–28] that the spectral radius of F A plays a crucial role in p-copy internal model-based distributed control analysis and synthesis. For example, the disturbance attenuation level to be satisfied by each follower is determined by ρ(F A) in [26]. Under Assumption 11.4, 0 ≤ ρ(F A) < 1 and ρ(F A) = 0 if, and only if, the augmented directed graph G is acyclic (see Lemma 4.2 in [26] and Lemma 5 in [28]). A control law that relies on the information exchange modeled by the directed graph G is called a distributed control law. The closed-loop system is defined by (11.1) and the distributed controller. We are now ready to state the considered problem in this chapter. Problem 11.1 Given the multi-agent system composed of (11.1) and (11.2), which satisfy Assumptions 11.1–11.3, and the corresponding graph G under Assumption 11.4, find a distributed dynamic state feedback control law such that for any ω(0) ∈ Rn 0 , xi (0) ∈ Rni , i = 1, . . . , N , and the initial state of the distributed controller, all the state variables of the closed-loop system are bounded for t ≥ 0 and limt→∞ ei (t) = 0 for all i ∈ N. In the following two sections, we present agent-wise local methods for designing a distributed reference model and then introduce a distributed control law utilizing this reference model to solve Problem 11.1.
11.4 Distributed Reference Model If Δi (xi ) = 0 for all xi ∈ Rni and Λi = Im i in (11.1), then Problem 11.1 could be immediately solved by the p-copy internal model-based distributed dynamic state feedback studied in [26–28]. Due to these system uncertainties, the existing distributed control law is not directly applicable. Yet, we use it to propose a distributed reference model that satisfies the desired properties of the problem. To this end, let us first define a virtual tracking error ev,i for each i ∈ N as ev,i =
1 ai j (yr,i − yr, j ) + ki (yr,i − y0 ) , di + ki j∈N
(11.3)
i
where yr,i ∈ R p is the output of the ith follower in the distributed reference model introduced now by
300
S. B. Sarsılmaz et al.
x˙r,i = (Ai + Bi K 1i )xr,i + Bi K 2i z r,i ,
(11.4a)
z˙ r,i = G 1i z r,i + G 2i ev,i , yr,i = Ci xr,i , i = 1, . . . , N ,
(11.4b) (11.4c)
where xr,i ∈ Rni and z r,i ∈ Rn z are the states of the distributed reference model for the ith follower. The matrices K 1i ∈ Rm i ×ni , K 2i ∈ Rm i ×n z , G 1i ∈ Rn z ×n z , and G 2i ∈ Rn z × p are constant control gains to be designed. Furthermore, the regulated error of the distributed reference model for the ith follower is defined as er,i = yr,i − y0 . Since er,i − er, j = yr,i − yr, j , (11.3) can be equivalently written as ev,i = er,i −
1 ai j er, j . di + ki j∈N
(11.5)
i
Next, we specify the following assumption on the pair (G 1i , G 2i ). Assumption 11.5 The pair (G 1i , G 2i ) incorporates a p-copy internal model of A0 for all i ∈ N. Remark 11.4 Assumption 11.5 is standard in the linear output regulation literature for internal model-based control (for example, see Chap. 1 of [5]). Definition 11.1 tells us how to construct a pair (G 1i , G 2i ) satisfying Assumption 11.5. Let us consider a special case. If A0 = 0, then the exosystem in (11.2) can generate constant signals of arbitrary magnitudes. The pair (0, I p ) incorporates a p-copy internal model of 0. Hence, (11.4b) becomes a distributed integrator. Internal stabilization of the distributed reference model leads to an ideal system characterizing the desired closed-loop response for the multi-agent system in (11.1). To show this, we define the following matrices and vectors: Φ = diag(Φ1 , . . . , Φ N ), Φ = A, B, C; K = diag(K 1 , . . . , K N ), = 1, 2; G = diag(G 1 , . . . , G N ), l = T T T , . . . , αr,N ] , α = x, z, e; ηr = 1, 2; Sa = I N ⊗ S; W = (I N − F A) ⊗ I p ; αr = [αr,1 T T T [xr , z r ] ; and ωa = 1 N ⊗ ω. Considering (11.4), (11.5), and er,i = Ci xr,i − Sω, we arrive at the dynamics in the compact form η˙ r = Ag ηr + Bg ωa , er = Cg ηr + Dg ωa ,
(11.6a) (11.6b)
where A 0 A + B K1 B K2 B K K , = + Ag = G 2 WC G 1 G 2 WC G 1 0 1 2 Kg
Bg =
Ao
0 , Cg = [C 0] , −G 2 WSa
Bo
Dg = −Sa .
11 A Distributed Adaptive Control Approach to Cooperative …
301
Proposition 11.1 Let Assumptions 11.3–11.5 hold. If Ag is Hurwitz, then, for any ω(0) ∈ Rn 0 , xr,i (0) ∈ Rni , z r,i (0) ∈ Rn z , i = 1, . . . , N , all the state variables of the distributed reference model in (11.4) are bounded for t ≥ 0 and limt→∞ er,i (t) = 0 for all i ∈ N. Proof Let ω(0) ∈ Rn 0 be given. The solution of (11.2a) is ω(t) = et A0 ω(0). Under Assumption 11.3, et A0 is bounded for t ≥ 0. Hence, ω(t) is bounded for t ≥ 0, and so is ωa (t). Since Ag is Hurwitz, the linear system (11.6), with ωa viewed as the input, is input-to-state stable (for example, see [45] and Chap. 4.9 in [46]). This, together with the boundedness Nof ωa (t) on [0, ∞), implies that for any initial state ¯ nz , where n¯ = i=1 n i , ηr (t) is bounded on [0, ∞). Finally, Theorem ηr (0) ∈ Rn+N ¯ nz , limt→∞ er (t) = 0 since Assumptions 4.1 in [26] ensures that for any ηr (0) ∈ Rn+N 11.3–11.5 hold and Ag is Hurwitz. Remark 11.5 By virtue of Proposition 11.1, the cooperative linear output regulation problem of the distributed reference model has been converted into the stabilization problem of designing K g such that Ag is Hurwitz. Since the transient response of the distributed reference model is determined by the eigenvalues of Ag , a constraint on the eigenvalue locations can be incorporated into the stabilization problem as in Sect. II of [39]: Given the LMI region D in the OLHP, design K g such that Ag is D-stable. Note that numerous elements of K g equal to zero since K 1 and K 2 are block-diagonal matrices. Due to the structure of K g , its synthesis predicated on the pair (Ao , Bo ) leads to a nonconvex optimization problem in general. On the other hand, through structured Lyapunov inequalities, convex formulations of the control gain synthesis are provided in Theorem 1 of [27] and Proposition 1 of [28]. They are referred to as global design methods since a multi-agent system is regarded as one system. These structured Lyapunov-based global design methods are less conservative for synthesizing the control gains K 1i and K 2i of the followers than the agent-wise local ones in [26, 28]. However, the global design depends highly on the communication graph and multi-agent system scale. For a detailed comparison of the agent-wise local and global design methods, readers are referred to [27, 28]. Remark 11.6 It is shown in [39] that a wide variety of eigenvalue clustering regions, such as shifted half-planes, conic sectors, disks, and their intersections, are LMI regions. Eigenvalue clustering regions help control designers define desired transient responses for closed-loop systems. For example, if the closed-loop eigenvalues of an LTI system lie in the intersection of the user-defined shifted half-plane and conic sector, then both the decay rate and damping ratio of the closed-loop system are lower bounded by the desired bound. The resulting regional eigenvalue assignment problems for a single system have been addressed by LMI-based control synthesis for LMI regions in [39, 47]. Recall that the goal of the distributed reference model introduced in this section is to define the desired closed-loop response for the uncertain multi-agent system in (11.1). Under mild assumptions, it is shown in Proposition 11.1 that the desired boundedness and steady-state properties of the distributed reference model are achieved when Ag is Hurwitz. It is also of practical importance to define a desired level of transient performance for the multi-agent system. To have
302
S. B. Sarsılmaz et al.
this flexibility in the distributed reference model design, the subsequent discussions will focus on D-stability of Ag for any LMI region D in the OLHP. In this chapter, we employ the agent-wise local design methods of [28] for the design of the control gains K 1i and K 2i of the followers, and hence K g . Thus, we can To present these methods, we design K 1i and K 2i for each follower independently. 1 T T T , z r,i ] ∈ Rni +n z and μi = di +k define ξr,i = [xr,i j∈Ni ai j er, j . Considering (11.4), i (11.5), and er,i = Ci xr,i − Sω also yield the following dynamics: ξ˙r,i = Af,i ξr,i + Bf,i (μi + Sω), er,i = Cf,i ξr,i − Sω, i = 1, . . . , N ,
(11.7a) (11.7b)
where 0 Ai Ai + Bi K 1i Bi K 2i B = = + i K 1i K 2i , G 2i Ci G 1i 0 G 2i Ci G 1i K f,i
Af,i
Bf,i =
Ao,i
Bo,i
0 , Cf,i = [Ci 0] . −G 2i
For each i ∈ N, the transfer matrix from μi to er,i is gf,i (s) = Cf,i (s Ini +n z − Af,i )−1 Bf,i .
(11.8)
We now state Theorem 1 of [28], which studies the relationship between the Dstability of Ag and the D-stability of all Af,i for the special case wherein ρ(F A) = 0 (see Remark 11.3). Lemma 11.1 Let D be an LMI region in the OLHP, where L and M are the matrices associated with the LMI region D as in Definition 11.3. Let Assumption 11.4 hold with ρ(F A) = 0. Then, the following statements are equivalent: (i) The matrix Ag is D-stable. (ii) The matrix Af,i is D-stable for all i ∈ N. (iii) There exists a symmetric positive-definite matrix Pi ∈ R(ni +n z )×(ni +n z ) such that L ⊗ Pi + M T ⊗ (ATf,i Pi ) + M ⊗ (Pi Af,i ) < 0,
(11.9)
for all i ∈ N. Remark 11.7 For acyclic augmented directed graphs satisfying Assumption 11.4, the D-stability of Ag reduces to the following problem: For each i ∈ N, design K f,i such that Af,i is D-stable. Note that, unlike K g , the matrix K f,i does not have any structure. Consequently, there exists a K f,i such that Af,i is D-stable for every LMI region D in the OLHP if, and only if, the pair (Ao,i , Bo,i ) is controllable. In the case that the pair (Ao,i , Bo,i ) is stabilizable, for an LMI region D in the OLHP, there
11 A Distributed Adaptive Control Approach to Cooperative …
303
exists a K f,i such that Af,i is D-stable if, and only if, the region D contains all the uncontrollable modes of the pair (Ao,i , Bo,i ). We know from Lemma 1.26 in [5] that under Assumption 11.5, the pair (Ao,i , Bo,i ) is controllable (stabilizable) if the pair (Ai , Bi ) is controllable (stabilizable) and rank
Ai − λIni Bi = n i + p, Ci 0
for all λ ∈ spec(A0 ). Remark 11.8 For a given K f,i , the D-stability of Af,i is characterized by the existence of a Pi > 0 satisfying the inequality (11.9) due to Theorem 2.2 in [39]. When L = 0 and M = 1, the LMI region D coincides with the OLHP, and (11.9) reduces to the well-known Lyapunov matrix inequality (for example, see Theorem 8.2 in [48]) ATf,i Pi + Pi Af,i < 0.
(11.10)
While the inequality (11.9) is an LMI for the free parameter Pi , it is not jointly convex in K f,i and Pi . As discussed in Remark 2 of [28], the dual of (11.9), together with a simple change of variable (for example, see Sect. 7.2.1 in [49]), overcomes this issue. Specifically, if there exist a symmetric positive-definite matrix Q i ∈ R(ni +n z )×(ni +n z ) and a matrix Yi ∈ Rm i ×(ni +n z ) such that T D = L ⊗ Q i + M T ⊗ (Q i ATo,i + YiT Bo,i ) + M ⊗ (Ao,i Q i + Bo,i Yi ) < 0, M (11.11) and if the control gain K f,i is recovered as K f,i = Yi Q i−1 , then Af,i is D-stable. When L = 0 and M = 1, the LMI (11.11) reduces to the well-known LMI (for example, see Eq. (7.9) in [49]) T + Bo,i Yi < 0. Q i ATo,i + Ao,i Q i + YiT Bo,i
(11.12)
Note that the stabilizability of the pair (Ao,i , Bo,i ) is equivalent to the feasibility of the LMI (11.12). Lemma 11.1 provides necessary and sufficient conditions for the D-stability of Ag in terms of the subsystems in (11.7). However, for the case wherein ρ(F A) > 0, there are only sufficient conditions available in terms of the subsystems in (11.7). Specifically, Theorem 2 in [28] shows that a robust D-stability condition of Af,i for all i ∈ N implies the D-stability of Ag . The converse is not true. When D is the OLHP, we also know from Theorem 4.2 in [26] that the Hurwitzness of Af,i , together with a disturbance attenuation condition predicated on the transfer function (11.8) and ρ(F A), for all i ∈ N implies the Hurwitzness of Ag . The converse is not true. We now state Corollary 1 of [28], which is based on Theorem 2 in [28] and is the counterpart of the LMI (11.11) for the case wherein ρ(F A) > 0. Lemma 11.2 Let D be an LMI region in the OLHP, where L and M are the matrices associated with the LMI region D as in Definition 11.3. Let M be factorized as M =
304
S. B. Sarsılmaz et al.
M1T M2 with M1 ∈ Rr ×q and M2 ∈ Rr ×q . Let Assumption 11.4 hold with ρ(F A) > 0. If there exist symmetric positive-definite matrices Q i ∈ R(ni +n z )×(ni +n z ) , Z ∈ Rr ×r , and a matrix Yi ∈ Rm i ×(ni +n z ) such that ⎤ T D M (M1T Z ) ⊗ Bf,i M2T ⊗ (Q i Cf,i ) T ⎦ < 0, ⎣ (Z M1 ) ⊗ Bf,i (−γ Z ) ⊗ I p 0 0 (−γ Z ) ⊗ I p M2 ⊗ (Cf,i Q i ) ⎡
(11.13)
D is defined in (11.11), and if the for all i ∈ N, where 1 < γ ≤ 1/ρ(F A) and M control gain K f,i is recovered as K f,i = Yi Q i−1 for all i ∈ N, then the matrix Ag is D-stable. Remark 11.9 Since the matrix Z in (11.13) is common to all followers, the LMI (11.13) needs to be solved for the free parameters Q i , Z , Yi , and i ∈ N simultaneously. As discussed in Remark 6 of [28], the sufficient condition for the D-stability of Ag in Lemma 11.2 becomes agent-wise local if one of the following two approaches is taken: (i) Set Z = Ir ; (ii) For one of the followers, solve the LMI (11.13) for the free parameters Q i , Z , and Yi . Then, fix Z and solve the LMI (11.13) for the other follower agents independently. Remark 11.10 A few notes regarding the design of the control gains K 1i and K 2i are in order. If ρ(F A) = 0, we use the LMI (11.11) to compute them. Otherwise, we utilize the LMI (11.13). These agent-wise local design methods enforce Af,i to be Hurwitz for all i ∈ N. However, that Af,i is Hurwitz for all i ∈ N is not necessary for Ag being Hurwitz when ρ(F A) > 0 (see Appendix A of [26]). Nonetheless, the distributed control law presented in the next section relies on the assumption that Af,i is Hurwitz for all i ∈ N.
11.5 Distributed Control Law In this section, we consider the system uncertainties in (11.1), which satisfy Assumptions 11.1 and 11.2. In particular, we propose a distributed control law that includes the distributed reference model presented in the previous section and a direct adaptive controller to handle the system uncertainties. This distributed control law will provide a level of decoupling between the distributed reference model and the adaptive controller in analysis and synthesis through new information exchange between agents. For this purpose, let us first define a decoupling virtual tracking error e˜v,i for each i ∈ N as e˜v,i =
1 ai j (yi − yr, j ) + ki (yi − y0 ) = ei − μi . di + ki j∈N i
(11.14)
11 A Distributed Adaptive Control Approach to Cooperative …
305
The distributed dynamic state feedback control law proposed in this chapter consists of (11.3), (11.4), (11.14), and u i = u n,i + u ad,i , u n,i = K 1i xi + K 2i z i = K f,i ξi , T a,i u ad,i = −W σa,i (ξi ), z˙ i = G 1i z i + G 2i e˜v,i , ˙ = Γ σ (ξ )ξ˜ T P B , i = 1, . . . , N , W a,i i a,i i i f,i o,i
(11.15a) (11.15b) (11.15c) (11.15d) (11.15e)
where u n,i and u ad,i are the nominal (that is, baseline) and adaptive components of the control input for the ith follower, respectively. In (11.15), z i ∈ Rn z , ξi = [xiT , z iT ]T ∈ Rni +n z , ξ˜i = ξi − ξr,i , σa,i (ξi ) = [u Tn,i , σiT (xi )]T ∈ Rm i +si is the aggregated basis a,i ∈ R(m i +si )×m i is the estimate of the aggregated unknown matrix function, and W T = [Im i − Λi−1 , Λi−1 WiT ]. FurtherWa,i with the standard adaptive law, where Wa,i more, Γi ∈ R(m i +si )×(m i +si ) is a constant adaptive gain matrix and Pf,i ∈ R(ni +n z )×(ni +n z ) is a constant matrix. Properties of the matrices Γi and Pf,i will be specified later in Theorem 11.1. Remark 11.11 The distributed reference model constructed in the previous section defines the desired closed-loop response for the uncertain multi-agent system in (11.1) with the aid of the quadruple (K 1i , K 2i , G 1i , G 2i ) of each follower. The nominal controller in (11.15) makes use of this quadruple and the decoupling virtual tracking error in (11.14). It is expected to operate as designed when there are no uncertainties. The adaptive controller in (11.15) augments the nominal controller to recover the desired performance when the multi-agent system is subject to uncertainties. The adaptive law in (11.15e) is standard in the adaptive control literature (for example, see Chap. 10 of [41] and Chap. 5 of [50]). Now, we are ready to present the main result of this chapter. Theorem 11.1 Let Assumptions 11.1–11.5 hold. Under the assumption that Af,i is Hurwitz for all i ∈ N, let Pf,i be a symmetric positive-definite matrix satisfying the Lyapunov matrix inequality (11.10) for all i ∈ N. Let Γi be symmetric positive definite for all i ∈ N. If Ag is Hurwitz, then the distributed control law given by (11.3), (11.4), (11.14), and (11.15) solves Problem 11.1. Proof Let ω(0) ∈ Rn 0 and the initial state of the closed-loop system (that is, ξr,i (0) ∈ a,i (0) ∈ R(m i +si )×m i , i = 1, . . . , N ) be given. Under the Rni +n z , ξi (0) ∈ Rni +n z , and W conditions of the theorem, ξr,i (t) is bounded on [0, ∞) and limt→∞ er,i (t) = 0 for all i ∈ N by Proposition 11.1. To complete the proof, we need to show the boundedness a,i (t) on [0, ∞) and the convergence of the regulation error ei (t) to of ξi (t) and W zero for all i ∈ N. a,i and e˜i = yi − yr,i for all i ∈ N. Note that a,i = Wa,i − W To that end, define W e˜i = ei − er,i and ei = Ci xi − Sω for all i ∈ N. Under Assumptions 11.1 and 11.2, the dynamical system
306
S. B. Sarsılmaz et al.
ξ˙i = Ao,i ξi + Bo,i (Λi u i + WiT σi (xi )) + Bf,i (μi + Sω) T = Af,i ξi + Bo,i Λi (Wa,i σa,i (ξi ) + u ad,i ) + Bf,i (μi + Sω) T a,i σa,i (ξi ) + Bf,i (μi + Sω), = Af,i ξi + Bo,i Λi W (11.16a)
˙ = −Γ σ (ξ )ξ˜ T P B , W a,i i a,i i i f,i o,i ei = Cf,i ξi − Sω, i = 1, . . . , N
(11.16b) (11.16c)
arises from (11.1), (11.14), and (11.15). By (11.7) and (11.16), we arrive at the error equation given by T a,i σa,i (ξ˜i + ξr,i (t)), ξ˙˜i = Af,i ξ˜i + Bo,i Λi W
˙ = −Γ σ (ξ˜ + ξ (t))ξ˜ T P B , W a,i i a,i i r,i i f,i o,i e˜i = Cf,i ξ˜i , i = 1, . . . , N .
(11.17a) (11.17b) (11.17c)
Now, fix i ∈ N. First, in order to guarantee the boundedness of ξi (t) and a,i (t) are bounded on a,i (t) on [0, ∞), it suffices to show that ξ˜i (t) and W W [0, ∞) because ξr,i (t) is bounded on [0, ∞) and Wa,i is constant. Second, it is enough to show limt→∞ e˜i (t) = 0 to prove limt→∞ ei (t) = 0 since ei = e˜i + er,i and limt→∞ er,i (t) = 0. Thus, it is enough to show limt→∞ ξ˜i (t) = 0 by (11.17c). a,i ) = (0, 0) is an equilibrium point of the ith subsystem Note that the origin (ξ˜i , W in (11.17a) and (11.17b). To conclude the desired results, we follow the steps in the proof of Theorem 8.4 in [46], which is an invariance-like theorem. Consider the following continuously differentiable Lyapunov function candidate: T a,i ) = ξ˜iT Pf,i ξ˜i + tr Λi W a,i a,i . Γi−1 W Vi (ξ˜i , W
(11.18)
Since Pf,i , Λi , and Γi are symmetric positive-definite matrices, one can show that a,i ) > 0 for all (ξ˜i , W a,i ) = (0, 0), and Vi (ξ˜i , W a,i ) is radially Vi (0, 0) = 0, Vi (ξ˜i , W ˜ radial unboundedness of Vi (ξi , Wa,i ) implies that the set Ωc = unbounded. The a,i ) ∈ Rni +n z × R(m i +si )×m i | Vi (ξ˜i , W a,i ) ≤ c is compact for every c > 0 (ξ˜i , W a,i (0)) ∈ Ωc . The derivative of Vi along the and we can choose c so that (ξ˜i (0), W trajectories of the ith subsystem in (11.17a) and (11.17b) is given by T a,i ) = ξ˜iT ATf,i Pf,i + Pf,i Af,i ξ˜i + 2ξ˜iT Pf,i Bo,i Λi W a,i σa,i (ξ˜i + ξr,i (t)) V˙i (t, ξ˜i , W T T a,i σa,i (ξ˜i + ξr,i (t))ξ˜i Pf,i Bo,i . − 2tr Λi W (11.19) Let Q f,i = − ATf,i Pf,i + Pf,i Af,i . Since Pf,i is a symmetric positive-definite matrix satisfying the Lyapunov matrix inequality (11.10), Q f,i is a symmetric positivedefinite matrix. For any two codimensional column vectors a and b, a T b = tr(ba T ). The vector trace identity and the positive-definiteness of Q f,i yield a,i ) = −ξ˜iT Q f,i ξ˜i ≤ 0, V˙i (t, ξ˜i , W
(11.20)
11 A Distributed Adaptive Control Approach to Cooperative …
307
a,i ) ∈ Rni +n z × R(m i +si )×m i . for all t ≥ 0 and (ξ˜i , W Under Assumption 11.2, it can be shown that the right-hand side function of a,i ). It is also continuous in t since (11.17a) and (11.17b) is locally Lipschitz in (ξ˜i , W a,i (t)) ≤ Vi (ξ˜i (0), W a,i (0)) ≤ c for all ξr,i is continuous in t. As V˙i ≤ 0, Vi (ξ˜i (t), W a,i (t)) ∈ Ωc for all t ≥ 0. Since Ωc is a compact set, ξ˜i (t) and t ≥ 0. Thus, (ξ˜i (t), W a,i (t) are bounded on [0, ∞), as desired, and uniquely defined for all t ≥ 0 by W Theorem 3.3 in [46]. Next, define Ui (ξ˜i ) = ξ˜iT Q f,i ξ˜i . Note that Ui (0) = 0 and Ui (ξ˜i ) > 0 for all ξ˜i = 0. Integrating (11.20) from 0 to any t > 0 and rearranging the terms, we have t 0≤
a,i (0)) − Vi (ξ˜i (t), W a,i (t)) Ui (ξ˜i (τ ))dτ = Vi (ξ˜i (0), W
0
a,i (0)) < ∞, ≤ Vi (ξ˜i (0), W
(11.21)
˜ where the second inequality t follows from the fact that Vi (ξi (t), Wa,i (t)) ≥ 0 for all ˜ ˜ (ξi , Wa,i ). Let h i (t) = 0 Ui (ξi (τ ))dτ . Since h i (t) is monotonically increasing on a,i (0)), limt→∞ h i (t) exists and is finite. [0, ∞) and bounded above by Vi (ξ˜i (0), W The boundedness of ξr,i (t) and ξ˜i (t) on [0, ∞) has ensured the boundedness of ξi (t) on [0, ∞). The components of the aggregated basis function σa,i are continuous functions of ξi because the components of σi are locally Lipschitz functions of xi and u n,i is a linear function of ξi . In conjunction with the boundedness of ξi (t), this implies that σa,i (ξi (t)) = σa,i (ξ˜i (t) + ξr,i (t)) is bounded on [0, ∞). a,i (t), and σa,i (ξ˜i (t) + ξr,i (t)) are bounded on [0, ∞), we conclude Since ξ˜i (t), W from (11.17a) that ξ˙˜i (t) is bounded on [0, ∞). Therefore, ξ˜i (t) is uniformly continuous Moreover, since Ui (ξ˜i ) is continuous in ξ˜i on the compact set on [0,n∞). i +n z a,i ) ∈ Ωc for some W a,i ∈ R(m i +si )×m i , which is the pro˜ | (ξ˜i , W φc = (ξi ∈ R jection of the compact set Ωc onto one of its coordinates, Ui (ξ˜i ) is uniformly continuous on the set φc . Consequently, Ui (ξ˜i (t)) is uniformly continuous in t on [0, ∞) by being the composition of uniformly continuous functions Ui (ξ˜i ) and ξ˜i (t) because ξ˜i (t) ∈ φc for all t ≥ 0. t We have shown that limt→∞ 0 Ui (ξ˜i (τ ))dτ exists and is finite, and Ui (ξ˜i (t)) is uniformly continuous on [0, ∞). Thus, we can apply Barbalat’s lemma (for example, see Lemma 8.2 in [46]) and conclude that limt→∞ Ui (ξ˜i (t)) = 0, which in turn implies that limt→∞ ξ˜i (t) = 0 because of the inequality λmin (Q f,i )ξ˜i (t)22 ≤ Ui (ξ˜i (t)), where λmin (Q f,i ) > 0. Since i ∈ N was arbitrary, the proof of the theorem is complete. Remark 11.12 As summarized in Remark 11.10, not only Ag but also Af,i is Hurwitz for all i ∈ N when we design the control gains K 1i and K 2i for the followers by the agent-wise local design methods in the previous section. One can also use the global design methods in Theorem 1 of [27] or Proposition 1 of [28] to synthesize the control gains. However, a little extra care is required with the following point. It is still unknown whether the structured Lyapunov-based global design method ensures
308
S. B. Sarsılmaz et al.
that Af,i is Hurwitz for all i ∈ N (see Remark 6 in [27]). Since the adaptive law in (11.15e) employs a symmetric positive-definite Pf,i satisfying the Lyapunov matrix inequality in (11.10) for each follower and it plays a crucial role in the Lyapunov analysis of Theorem 11.1, the Hurwitzness of the resulting Af,i with the global design method should be checked for all i ∈ N. Remark 11.13 The proof of Theorem 11.1 shows how we break Problem 11.1 into two problems, namely, (i) cooperative linear output regulation of the distributed reference model, which has been converted into a stabilization problem by Proposition 11.1 (see Remark 11.5); (ii) model reference adaptive control of each follower. Specifically, after the first problem is addressed in the proof, the decoupling virtual tracking error in (11.14) allows us to arrive at the dynamics in (11.17). Each subsystem in (11.17) is a nonlinear nonautonomous system with the external driving input ξr,i (t). This is in the form of a commonly studied closed-loop error dynamics of a direct model reference adaptive control system (for instance, see [43], Chap. 10 of [41], Chap. 5 of [50], and Example 8.12 in [46]). We, therefore, have conducted the Lyapunov analysis of each subsystem in (11.17) independently. In short, once the distributed reference model has the desired characteristics of Problem 11.1, the problem for each follower reduces to solving its own model reference adaptive control problem by virtue of the decoupling virtual tracking error. Remark 11.14 For a single system, the standard adaptive law in (11.15e) is known to encounter robustness issues when the system is subject to additional uncertainties such as unstructured state-dependent uncertainties and unknown bounded timevarying external disturbances (for instance, see Chap. 8 of [50]). Unboundedness of the state of the adaptive law (that is, adaptive parameters) is one of the issues and is referred to as parameter drift. To overcome the robustness issues stemming from the uncertainties different than the ones considered in Assumptions 11.1 and 11.2, many modifications to the standard adaptive law have been developed in the adaptive control literature (for example, see Chap. 11 in [41] and Chap. 9 in [50]). Several modifications, such as σ -modification and projection operator, can be incorporated into the proposed distributed control law in this chapter owing to its decoupling feature explained in Remark 11.13. We echo the opinion of the authors of [41], whereby in real-life applications, control designers should consider at least the following two modifications: (i) dead-zone modification to prevent adaptive parameters from drifting due to noise; (ii) projection operator to ensure that the adaptive parameters do not leave a predefined compact set.
11.6 Illustrative Numerical Example We consider a multi-vehicle system consisting of four quadrotor UAVs and one UGV. The vehicles cooperatively follow a circular trajectory in the x y−plane. To accomplish the goal, all the vehicles execute the distributed dynamic state feedback control law given by (11.3), (11.4), (11.14), and (11.15) for the x−axis and y−axis
11 A Distributed Adaptive Control Approach to Cooperative …
309
Fig. 11.2 The inertial frame and the body-fixed frame of the ith UAV
separately. The augmented directed graph G1 in Fig. 11.1 describes the information flow among the vehicles. We set each nonzero ai j and ki to 1. The altitude of each UAV is regulated around different predetermined values with a static state feedback control law to avoid crashes. Similarly, headings of all the vehicles are regulated around 0◦ . Using ZYX Euler angles to describe the attitude of ith body-fixed frame with respect to the inertial frame in Fig. 11.2, the nonlinear quadrotor UAV dynamics can be obtained from Eq. (2) in [51] as m i p¨x,i = − [sin(φi ) sin(ψi ) + cos(φi ) sin(θi ) cos(ψi )] f i ,
(11.22a)
m i p¨y,i = − [cos(φi ) sin(θi ) sin(ψi ) − sin(φi ) cos(ψi )] f i , m i p¨z,i = − cos(φi ) cos(θi ) f i + m i g, Jxx,i ω˙ x,i = Jyy,i − Jzz,i ωy,i ωz,i + τx,i , Jyy,i ω˙ y,i = Jzz,i − Jxx,i ωx,i ωz,i + τy,i , Jzz,i ω˙ z,i = Jxx,i − Jyy,i ωx,i ωy,i + τz,i , i = 1, 2, 3, 4,
(11.22b) (11.22c) (11.22d) (11.22e) (11.22f)
where g ∈ R is the gravity constant, m i ∈ R is the mass, Ji = diag(Jxx,i , Jyy,i , Jzz,i ) ∈ T R3×3 is the matrix of inertia, pi = px,i , py,i , pz,i ∈ R3 is the position in the iner T tial frame, Θi = [φi , θi , ψi ]T ∈ R3 is the Euler angles, ωi = ωx,i , ωy,i , ωz,i ∈ R3 is the angular velocity in the ith body-fixed frame, f i ∈ R is the thrust input, and T τi = τx,i , τy,i , τz,i ∈ R3 is the torque input for the ith UAV. We model actuator uncertainties or failures by τx,i = cx,i u x,i , τy,i = cy,i u y,i , τz,i = cz,i u zi , and f i = cf,i u f,i for some unknown positive constants cx,i , cy,i , cz,i , cf,i ∈ R. The kinematic equation φ˙ i = ωx,i + sin(φi ) tan(θi )ωy,i + cos(φi ) tan(θi )ωz,i , θ˙i = cos(φi )ωy,i − sin(φi )ωz,i , ψ˙ i = sin(φi ) sec(θi )ωy,i + cos(φi ) sec(θi )ωz,i , i = 1, 2, 3, 4,
(11.23a) (11.23b) (11.23c)
describes the relationship between ωi and Θ˙ i when θi = ±π/2 (for example, see Eq. (1.77) in [52]).
310
S. B. Sarsılmaz et al.
The linearization of the nonlinear dynamics in (11.22) and (11.23) around Θi = 0, ωi = 0, f i = m i g, and τi = 0 yields the decoupled LTI dynamics for the position in the x−axis as ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎡ 0 p˙ x,i 01 0 0 px,i ⎢ v˙x,i ⎥ ⎢ 0 0 −g 0 ⎥ ⎢ vx,i ⎥ ⎢ 0 ⎥ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎢ (11.24) ⎣ θ˙i ⎦ = ⎣ 0 0 0 1 ⎦ ⎣ θi ⎦ + ⎣ 0 ⎦ cy,i u y,i , −1 Jyy,i ωy,i 00 0 0 ω˙ y,i the position in the y−axis as ⎤ ⎡ p˙ y,i 0 ⎢ v˙y,i ⎥ ⎢ 0 ⎥ ⎢ ⎢ ⎣ φ˙ i ⎦ = ⎣ 0 0 ω˙ x,i ⎡
1 0 0 0
0 g 0 0
⎤ ⎡ ⎤ ⎤⎡ 0 0 py,i ⎥ ⎢ ⎥ ⎢ 0⎥ ⎥ ⎢ vy,i ⎥ + ⎢ 0 ⎥ cx,i u x,i , ⎦ ⎣ ⎦ ⎣ 0 ⎦ φi 1 −1 ωx,i 0 Jxx,i
(11.25)
the position in the z−axis as
p˙ z,i v˙z,i
=
01 00
pz,i vz,i
+
0 −1 cf,i u f,i , −m i
and the heading as
ψ˙ i ω˙ z,i
01 = 00
ψi ωz,i
+
0
−1 Jzz,i
cz,i u z,i ,
for i = 1, 2, 3, 4, where vx,i = p˙ x,i , vy,i = p˙ y,i , and vz,i = p˙ z,i . The last agent is an omnidirectional mobile robot with decoupled single integrator dynamics given by p˙ x,i = cx,i u x,i , p˙ y,i = cy,i u y,i , ψ˙ i = cz,i u z,i , for i = 5 (for example, see [53]). The parameters of vehicles can be seen in Table 11.1. We set u f,i = −1.5( prz,i − pz,i ) + 0.5vz,i for the altitude regulation, where prz,i is the predetermined altitude, for i = 1, 2, 3, 4, and u z,i = −2.8ψi − 0.2ωz,i for the heading regulation for i = 1, . . . , 5. An exosystem with the system matrix A0 =
0 2π f −2π f 0
can generate the circular trajectory with a frequency f = 1/20 Hertz. The output matrices of the exosystem are S = [1, 0] and S = [0, 1] for the x−axis and y−axis,
11 A Distributed Adaptive Control Approach to Cooperative … Table 11.1 Parameters of vehicles Vehicle m i (kg) Ji (kg · m2 ) 1 2, 3, 4 5
1.0 0.5 –
10−3
× diag(2.5, 2.2, 3.0) 10−3 × diag(1.5, 1.5, 2.0) –
311
cx,i
cy,i
cz,i
cf,i
0.4 0.2 0.1
0.4 0.2 0.1
1.0 1.0 1.0
1.0 1.0 –
respectively. The pair (G 1i , G 2i ) given by G 1i =
0 1 0 = , G , 2i −4π 2 f 2 0 1
incorporates a 1-copy internal model of A0 for i = 1, . . . , 5 for each axis. An LMI region with
00 L = L c (θ ) = , 00
sin(θ ) cos(θ ) M = Mc (θ ) = − cos(θ ) sin(θ )
corresponds to a conic sector in the OLHP where 0 < θ < π/2. This region provides a lower bound for the damping ratio of the distributed reference model (minimal damping ratio). An LMI region with L = L s (α) = 2α,
M = Ms (α) = 1
for α ≥ 0 corresponds to a shifted half-plane in the OLHP. This region provides a lower bound for the decay rate of the distributed reference model (minimal decay rate). An LMI region with L = L d (c, r ) =
−r −c , −c −r
M = Md (c, r ) =
01 , 00
corresponds to a disk centered at c on the real axis with a radius r . This region provides an upper bound for the natural frequency of the distributed reference model (maximal natural frequency). Furthermore, it avoids the existence of eigenvalues with large magnitude when the disk is located at c = 0. Consequently, it prevents high control gains and high-frequency oscillations. We choose the intersection of these LMI regions to synthesize control gains for each axis as L = diag(Lc (θ ), Ls (α), Ld (c, r)) and M = diag(Mc (θ ), Ms (α), Md (c, r)) with θ = 60◦ , α = 0.1, r = 5, and c = 0. We decompose M = M1T M2 with the compact singular value decomposition. We now solve the LMI (11.13) as in item (i) of Remark 11.9 to compute Q i , Yi , and recover K f,i . The resulting x−axis control gains
312
S. B. Sarsılmaz et al.
K 1i K 1i K 1i K 2i K 2i K 2i
= 0.1221 0.0680 −0.1915 −0.0301 , = 0.0832 0.0464 −0.1307 −0.0206 , = −4.3911, = −0.0006 0.0753 , = −0.0004 0.0513 , = −0.2709 −4.0205 ,
i i i i i i
= 1, = 2, 3, 4, = 5, = 1, = 2, 3, 4, = 5,
and y−axis control gains K 1i K 1i K 1i K 2i K 2i K 2i
= −0.1387 −0.0772 −0.2177 −0.0343 , = −0.0832 −0.0463 −0.1306 −0.0206 , = −4.3911, = 0.0007 −0.0856 , = 0.0004 −0.0513 , = −0.2709 −4.0205 ,
i i i i i i
= 1, = 2, 3, 4, = 5, = 1, = 2, 3, 4, =5
assign the eigenvalues of Ag to the specified LMI region for each axis (see Fig. 11.3). For the adaptation parameters, we set Γi = 0.5 and Pf,i = Q i−1 . It is verified that this Pf,i satisfies the inequality (11.10).
Fig. 11.3 Eigenvalue locations of the matrix Ag for the x−axis at left and y−axis at right. The green shaded area is the LMI region that is the intersection of the conic sector with θ = 60◦ , the shifted half-plane with α = 0.1, and the disk with r = 5 and c = 0
11 A Distributed Adaptive Control Approach to Cooperative …
313
Fig. 11.4 Output response of the distributed reference model and the uncertain linear model
We consider the control effectiveness uncertainty in Assumption 11.1 due to unknown parameters cx,i and cy,i . However, in this example, we do not consider a state-dependent uncertainty as in Assumption 11.2 since the system and input matrices of the linearized dynamics in (11.24) and (11.25) are assumed to be known. The simulation results in Fig. 11.4 present the output response of the distributed reference model in (11.4) and the linear system in (11.24) and (11.25), where u x,i and u y,i utilize the control input in (11.15a). As is shown in the corresponding figure, the vehicles recover their reference model performances with the proposed distributed control law. Figure 11.5 illustrates the resulting trajectories in 3D when the control is also applied to the nonlinear model in (11.22) and (11.23).
11.7 Conclusion and Future Work The cooperative output regulation problem for a class of heterogeneous uncertain multi-agent systems was studied in this chapter. A distributed dynamic state feedback control law was proposed. This involves a distributed reference model characterizing the desired closed-loop response, an adaptive law handling the uncertainties, and an information exchange providing a level of decoupling between the distributed reference model and the adaptive controller. A numerical example demonstrated its application to a multi-vehicle system. It is noted that the distributed adaptive control approach in this chapter does not consider several additional practical limitations, such as output feedback, switching in communication graphs, and time delays in information exchange. Therefore, extensions of the approach to address these limitations would be well worth an exploration. Another research direction can be the civilian applications of the developed
314
S. B. Sarsılmaz et al.
Fig. 11.5 3D trajectories for the multi-vehicle system, where pz,i is multiplied by −1 to reverse the z−axis of the coordinate system in Fig. 11.2
distributed control law, for example, cooperatively monitoring fires and pipelines with multiple UAVs. Having addressed how uncertainties in the coordination of heterogeneous agent groups are successfully handled by employing a distributed adaptive control approach, the next chapter considers the problem of an aerial manipulator interacting with the environment in the presence of uncertainties. The approach of the current chapter may be valuable for the case when one extends the results of the next chapter to the coordination of multiple cooperative aerial manipulators. Acknowledgements This work was supported in part by the National Science Foundation under Grant CMMI-1657637 and in part by the Air Force Office of Scientific Research under Grant FA9550-20-1-0053.
References 1. Davison EJ (1976) The robust control of a servomechanism problem for linear time-invariant multivariable systems. IEEE Trans Autom Control 21(1):25–34 2. Francis BA, Wonham WM (1976) The internal model principle of control theory. Automatica 12(5):457–465 3. Francis BA (1977) The linear multivariable regulator problem. SIAM J Control Optim 15(3):486–505 4. Byrnes CI, Delli Priscoli F, Isidori A (1997) Output regulation of uncertain nonlinear systems. Birkhäuser 5. Huang J (2004) Nonlinear output regulation: theory and applications. SIAM
11 A Distributed Adaptive Control Approach to Cooperative …
315
6. Wang X, Hong Y, Huang J, Jiang Z-P (2010) A distributed control approach to a robust output regulation problem for multi-agent linear systems. IEEE Trans Autom Control 55(12):2891– 2895 7. Su Y, Huang J (2012) Cooperative output regulation of linear multi-agent systems. IEEE Trans Autom Control 57(4):1062–1066 8. Hong Y, Wang X, Jiang Z-P (2013) Distributed output regulation of leader-follower multi-agent systems. Int J Robust Nonlinear Control 23(1):48–66 9. Su Y, Hong Y, Huang J (2013) A general result on the robust cooperative output regulation for linear uncertain multi-agent systems. IEEE Trans Autom Control 58(5):1275–1279 10. Ding Z (2013) Consensus output regulation of a class of heterogeneous nonlinear systems. IEEE Trans Autom Control 58(10):2648–2653 11. Huang C, Ye X (2014) Cooperative output regulation of heterogeneous multi-agent systems: an H∞ criterion. IEEE Trans Autom Control 59(1):267–273 12. Ding Z (2015) Adaptive consensus output regulation of a class of nonlinear systems with unknown high-frequency gain. Automatica 51:348–355 13. Su Y, Huang J (2015) Cooperative global robust output regulation for nonlinear uncertain multi-agent systems in lower triangular form. IEEE Trans Autom Control 60(9):2378–2389 14. Xu D, Wang X, Hong Y, Jiang Z-P (2016) Global robust distributed output consensus of multiagent nonlinear systems: an internal model approach. Syst Control Lett 87:64–69 15. Li Z, Chen MZ, Ding Z (2016) Distributed adaptive controllers for cooperative output regulation of heterogeneous agents over directed graphs. Automatica 68:179–183 16. Bechlioulis CP, Rovithakis GA (2017) Decentralized robust synchronization of unknown high order nonlinear multi-agent systems with prescribed transient and steady state performance. IEEE Trans Autom Control 62(1):123–134 17. Lu M, Liu L (2017) Cooperative output regulation of linear multi-agent systems by a novel distributed dynamic compensator. IEEE Trans Autom Control 62(12):6481–6488 18. Cai H, Lewis FL, Hu G, Huang J (2017) The adaptive distributed observer approach to the cooperative output regulation of linear multi-agent systems. Automatica 75:299–305 19. Cai H, Hu G (2017) Cooperative output regulation for a class of nonlinear heterogeneous multiagent systems. In: 2017 29th Chinese control and decision conference (CCDC), pp 921–928 20. Liang H, Zhang H (2019) Cooperative tracking control and regulation for a class of multi-agent systems. Springer 21. Wang L, Wen C, Su H (2019) Cooperative semiglobal robust output regulation of nonintrospective nonlinear agents with partial normal form and state-dependent high-frequency gain. IEEE Trans Control Netw Syst 6(1):388–402 22. Yang R, Zhang H, Feng G, Yan H, Wang Z (2019) Robust cooperative output regulation of multi-agent systems via adaptive event-triggered control. Automatica 102:129–136 23. Huang B, Meng Z (2019) Cooperative output regulation for a group of nonlinear systems with limited information exchange: a distributed observer approach. Syst Control Lett 128:46–55 24. Liu T, Huang J (2019) Cooperative robust output regulation for a class of nonlinear multi-agent systems subject to a nonlinear leader system. Automatica 108:108501 25. Basu H, Yoon SY (2020) Robust cooperative output regulation under exosystem detectability constraint. Int J Control 93(5):1102–1114 26. Sarsilmaz SB, Yucelen T (2021) A distributed control approach for heterogeneous linear multiagent systems. Int J Control 94(5):1402–1414 27. Koru AT, Sarsılmaz SB, Yucelen T, Johnson EN (2021) Cooperative output regulation of heterogeneous multiagent systems: a global distributed control synthesis approach. IEEE Trans Autom Control 66(9):4289–4296 28. Koru AT, Sarsılmaz SB, Yucelen T, Muse JA, Lewis FL, Açıkme¸se B (2023) Regional eigenvalue assignment in cooperative linear output regulation. IEEE Trans Autom Control 68(7):4265–4272 29. Yucelen T, Johnson EN (2013) Control of multivehicle systems in the presence of uncertain dynamics. Int J Control 86(9):1540–1553
316
S. B. Sarsılmaz et al.
30. Peng Z, Wang D, Zhang H, Sun G (2014) Distributed neural network control for adaptive synchronization of uncertain dynamical multiagent systems. IEEE Trans Neural Netw Learn Syst 25(8):1508–1519 31. Li Z, Duan Z, Lewis FL (2014) Distributed robust consensus control of multi-agent systems with heterogeneous matching uncertainties. Automatica 50(3):883–889 32. Hashim HA, El-Ferik S, Lewis FL (2019) Neuro-adaptive cooperative tracking control with prescribed performance of unknown higher-order nonlinear multi-agent systems. Int J Control 92(2):445–460 33. Katsoukis I, Rovithakis GA (2022) A low complexity robust output synchronization protocol with prescribed performance for high-order heterogeneous uncertain mimo nonlinear multiagent systems. IEEE Trans Autom Control 67(6):3128–3133 34. Sarsilmaz SB, Yucelen T, Oswald T (2018) A distributed adaptive control approach for heterogeneous uncertain multiagent systems. In: 2018 AIAA guidance, navigation, and control conference 35. Yildirim E, Sarsilmaz SB, Yucelen T (2019) Application of a distributed adaptive control approach to a heterogeneous multiagent mechanical platform. In: AIAA Scitech 2019 forum 36. Su Y, Huang J (2012) Cooperative output regulation with application to multi-agent consensus under switching network. IEEE Trans Syst Man Cybern Part B (Cybern) 42(3):864–875 37. Ren W, Beard RW (2008) Distributed consensus in multi-vehicle cooperative control. Springer 38. Mesbahi M, Egerstedt M (2010) Graph theoretic methods in multiagent networks. Princeton University Press 39. Chilali M, Gahinet P (1996) H∞ design with pole placement constraints: an LMI approach. IEEE Trans Autom Control 41(3):358–367 40. Dydek ZT, Annaswamy AM, Lavretsky E (2013) Adaptive control of quadrotor UAVs: a design trade study with flight evaluations. IEEE Trans Control Syst Technol 21(4):1400–1406 41. Lavretsky E, Wise KA (2013) Robust and adaptive control with aerospace applications. Springer 42. Anderson RB, Marshall JA, L’Afflitto A (2021) Constrained robust model reference adaptive control of a tilt-rotor quadcopter pulling an unmodeled cart. IEEE Trans Aerosp Electron Syst 57(1):39–54 43. Yucelen T (2019) Model reference adaptive control. Wiley encyclopedia of electrical and electronics engineering 44. Lewis FL, Zhang H, Hengster-Movric K, Das A (2014) Cooperative control of multi-agent systems: optimal and adaptive design approaches. Springer 45. Sontag ED (1989) Smooth stabilization implies coprime factorization. IEEE Trans Autom Control 34(4):435–443 46. Khalil HK (2002) Nonlinear systems. Prentice Hall 47. Chilali M, Gahinet P, Apkarian P (1999) Robust pole placement in LMI regions. IEEE Trans Autom Control 44(12):2257–2270 48. Hespanha JP (2009) Linear systems theory. Princeton University Press 49. Boyd S, El Ghaoui L, Feron E, Balakrishnan V (1994) Linear matrix inequalities in system and control theory. SIAM 50. Nguyen NT (2018) Model-reference adaptive control: a primer. Springer 51. Bouabdallah S, Murrieri P, Siegwart R (2004) Design and control of an indoor micro quadrotor. In: IEEE international conference on robotics and automation. Proceedings. ICRA’04, vol 5, pp 4393–4398 52. L’Afflitto A (2017) A mathematical perspective on flight dynamics and control. Springer 53. Pin FG, Killough SM (1994) A new family of omnidirectional and holonomic wheeled platforms for mobile robots. IEEE Trans Robot Autom 10(4):480–489
11 A Distributed Adaptive Control Approach to Cooperative …
317
Selahattin Burak Sarsılmaz received the B.S. and M.S. degrees in Aerospace Engineering from Middle East Technical University in 2013 and 2016, respectively. He received the M.A. degree in Mathematics and the Ph.D. degree in Mechanical Engineering from the University of South Florida in 2020. He is an Assistant Professor in the Electrical and Computer Engineering Department at Utah State University. His research interests include distributed control of multi-agent systems, geometric approach and optimization-based control, adaptive control, and trajectory planning. Ahmet Taha Koru received the B.Sc. and M.Sc. degrees in Electrical and Electronics Engineering from Bilkent University, Ankara, Turkey, respectively, in 2009 and 2012, and the Ph.D. degree in Control and Automation Engineering from Yıldız Technical University, Istanbul, Turkey, in 2017. He is currently a Postdoctoral Research Associate at the University of Texas at Arlington Research Institute. His research interests include optimal control, multi-agent systems, autonomous aerial vehicles, delay systems, switching systems, haptic interfaces, and medical robotics. Tansel Yucelen received the Ph.D. degree in Aerospace Engineering from the Georgia Institute of Technology. He has currently been an Associate Professor in the Department of Mechanical Engineering and the Director of the Laboratory for Autonomy, Control, Information, and Systems at the University of South Florida. His research interests include adaptive and robust control; distributed estimation and control; resilient and secure robotics, autonomous vehicles, and cyber-physical systems; and large-scale and modular systems. Behçet Açıkme¸se is a Professor at University of Washington (UW), Seattle. He received his Ph.D. in Aerospace Engineering from Purdue University. Before joining UW, he was a Senior Technologist at JPL and a Lecturer at Caltech, and later a faculty member at the University of Texas at Austin. At JPL, he developed control algorithms for planetary landing, spacecraft formation flying, and asteroid and comet sample return missions. He developed the “flyaway” control algorithms for Mars Science Laboratory (MSL) and Mars 2020 missions, which landed on Mars in 2012 and 2021. His research interests include optimization and its applications in control theory, Markov decision processes, nonlinear and robust control, and game theory. He is a Fellow of AIAA and IEEE.
Chapter 12
Aerial Manipulator Interaction with the Environment Santos M. Orozco-Soto, Eugenio Cuniato, Jonathan Cacace, Mario Selvaggio, Fabio Ruggiero, Vincenzo Lippiello, and Bruno Siciliano
12.1 Introduction Aerial manipulators are nowadays more and more employed in maintenance and construction tasks [1], specifically requiring manipulation capabilities. Their strength resides in the agility these systems offer, as they are not limited by distance and height as much as human workers usually are, but also in accurate and powerful manipulation capabilities. Aerial manipulators comprise an unmanned aerial vehicle (UAV) with either a gripper (or a tool) or a several degree-of-freedom (DoF) robotic manipulator(s) onboard. The latter case provides versatility and dexterousness. Recent states of the art regarding aerial manipulation are discussed in [2, 3]. The UAVs usually employed in aerial manipulators are multi-rotor vehicles. These have conventional designs with parallel-axes rotors having the property of being underactuated and strongly coupled systems. Designs with parallel-axes rotors are divided according to the propulsion S. M. Orozco-Soto · J. Cacace · M. Selvaggio · F. Ruggiero (B) · V. Lippiello · B. Siciliano Università di Napoli “Federico II”, Naples, Italy e-mail: [email protected] J. Cacace e-mail: [email protected] M. Selvaggio e-mail: [email protected] V. Lippiello e-mail: [email protected] B. Siciliano e-mail: [email protected] E. Cuniato ETH Zurich, Zurich, Switzerland e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 A. L’Afflitto et al. (eds.), Control of Autonomous Aerial Vehicles, Advances in Industrial Control, https://doi.org/10.1007/978-3-031-39767-7_12
319
320
S. M. Orozco-Soto et al.
configuration. The most common designs incorporate single propulsion units (quadcopter, hexacopter, octocopter) or coaxial propulsion units [4]. Non-parallel-axes rotor designs with tilted or tilting rotors can instead overcome the inherent underactuated property of parallel-axes rotor designs [4], but are challenging from the control engineering perspective. Another consideration for designing the control system of an aerial manipulator concerns interactions with the environment, including other UAVs, aerial manipulators, or humans. Indeed, aerial manipulators can help humans with their daily activities, especially in tasks wherein human judgment or physical intervention is still necessary. As the human workspace becomes more and more crowded with aerial robots, it becomes of paramount importance to understand how the interaction between the two is established, especially in terms of safety for the human operators. This chapter investigates the control design for an aerial manipulator interacting with the surroundings. In the first part, the aerial manipulator should deliver a sustained force to the environment (i.e., a wall). A decentralized control approach is presented for sustained force delivery tasks, in which the UAV is independently controlled from the arm. Since such a robotic arm is equipped with position-source actuators, the main scope of the chapter’s first part is dealing with the omnidirectional flight control of the UAV and exploiting this property to deliver lateral force upon a vertical flat surface. A second-order sliding mode control approach is implemented for trajectory tracking of the omnidirectional UAV, which switches to a “transparent” force regulator since it is provided with tilting propellers. In the second part of this chapter, the aerial manipulator deals with safe interaction with humans. In this regard, the use of a hardware-in-the-loop simulator for human cooperation has different advantages, namely, (i) it may play the role of a training interface work workers since it allows for physical interaction without the intrinsic danger of rotary-wing platforms; (ii) it allows the introduction of software safety layers, like a predetermined bounding box for the manipulator; (iii) it lends itself to developing and testing both autonomous and human–aerial manipulator interaction control strategies. Both parts of this chapter rely on a simulation environment endowed with a physics engine to assess the control effectiveness, which represents a step beyond traditional numerical simulation approaches since the results are closer to the actual behavior of the real platform. The work presented in this chapter is split into two parts. The first part details designing and implementing a model-free robust control technique for an aerial manipulator compound by a tilting (non-parallel-axes) quadrotor and a 6-degreeof-freedom (DoF) manipulator onboard. The experiments showing the performance of the proposed controller were performed within the Gazebo simulator, a widely employed robot simulator with physics engines, and consisted of flight trajectory tracking and horizontal force delivery. The second part is related to developing a hardware-in-the-loop simulation system for interaction tasks. Such a setup is compounded by a simulated standard parallel-axes quadrotor with a 6-DoF arm onboard, a hardware interface to enable the force-based interaction task, and a middleware. Two experiments show the success of the proposed system, namely, a human–UAV interaction task and the installation of a bird diverter into an electrical wire.
12 Aerial Manipulator Interaction with the Environment
321
12.2 Preliminaries and Notation In this section, general notation is introduced. Local symbols for each part are discussed in the related sections. Let x(t) = [x1 (t), . . . , xn (t)]T ∈ Rn denote a vector. The following operator is defined as in [5] ⎤ ⎡ |x1 (t)|q sign(x1 (t)) ⎥ ⎢ .. x(t)q = ⎣ (12.1) ⎦, . |xn (t)|q sign(xn (t))
where q ∈ R denotes a constant equal for each component xi ∈ R, with i = 1, . . . ,N, of the vector x. Its time derivative is given by d x(t)q = q J x(t)q−1 x˙ = q diag |xi (t)|q−1 x˙ , (12.2) dt where J x(t)q−1 ∈ Rn×n denotes the Jacobian of x(t)q . Notice that (12.1) and (12.2) are continuous functions as long as q = 0. The identity matrix of proper dimensions is represented by the symbol In ∈ Rn×n . The zero matrix of proper dimensions is given by On×m ∈ Rn×m . The zero vector is denoted by 0n ∈ R N . The vector ei ∈ R3 denotes the ith canonical basis vector. The matrix denoting the rotation of an angle x ∈ R about each standard basis vector is denoted by Rei (x) ∈ S O(3), with i = 1, 2, 3. The symbol S(x) ∈ R3×3 denotes the skew-symmetric operator of a generic vector x ∈ R3 . For brevity, the functions cos x and sin x, with x ∈ R, will be captured by cx and sx , respectively. Cartesian norms are employed in this chapter, and they are denoted with the symbol · . The position and orientation of the UAV are defined through the frame Σb attached to the body center of mass (CoM), whereas the inertial frame is represented by Σw . The position of Σb in Σw and its attitude are denoted by pb = [x, y, z]T ∈ R3 and Rb ∈ S O(3), respectively. The attitude can also be expressed in a minimum way through the roll-pitch-yaw Euler angles family, ob = [φ, θ, ψ]T ∈ R3 . The translational and angular velocities of Σb in Σw are denoted by p˙ b ∈ R3 and ωb ∈ R3 , T
respectively. The pose of the UAV can be stacked in the vector ξ = pTb , obT ∈ R6 . The position, velocity, and acceleration of the onboard manipulator joints are given by q ∈ Rn j , q˙ ∈ Rn j , and q¨ ∈ Rn j , respectively, with n j > 0 the number of joints. The inertia matrix of the manipulator is denoted by M(q) ∈ Rn j ×n j . The manipulator’s joints are controlled through the torque input vector τ q ∈ Rn j . As for the UAV, a distinction is made. In the first part of this chapter, a quadrotor with tilting propellers T
is considered. Therefore, the control input vector is utb = u x , u y , u z , u φ , u θ , u ψ ∈ R6 , expressed in Σb . In the second part of this chapter, the quadrotor cannot tilt its propeller, resulting in the classic flat design quadrotor whose control input vector is
322
S. M. Orozco-Soto et al.
Table 12.1 Symbols used to represent the parameters of the UAV Symbol Meaning m>0 g>0 l>0 kf > 0 km > 0 km σ = >0 kf
Total mass of the UAV Gravity acceleration Length of the boom from the center of the airframe to the rotor Thrust coefficient Drag coefficient
ςi ϑi
Orientation of the i-th propeller with respect to the airframe Spin sense of the i-th propeller, 1 if clockwise and −1 if counterclockwise For the specific case it ϑ1 = −1, ϑ2 = 1, ϑ3 = −1, and ϑ4 = 1 Spin velocity of the i-th propeller Tilting angle of the i-th propeller with respect to the airframe horizontal plane
i ∈ R αi ∈ R
Thrust–drag ratio coefficient
T T ubf = u T , τ b ∈ R4 , with u T > 0 the total thrust and τ b ∈ R3 the control torques acting around the Σb axes and expressed in Σb . Table 12.1 contains the intrinsic parameters of the UAV.
12.3 Part I: Super-Twisting Sliding Mode Controller for an Omnidirectional Aerial Manipulator 12.3.1 Related Work and Contribution Sliding mode control (SMC) is a powerful tool for controlling disturbed uncertain systems, but it is also well known for its chattering effect as the main drawback, at least for mechanical systems. On the other hand, several techniques deal with this effect, such as replacing the discontinuous function for a continuous one or the use of higher order sliding mode techniques [6]. In this context, the super-twisting sliding mode control (STSMC), a second-order SMC type, offers a suitable alternative with the following advantages: (i) the chattering effect is significantly reduced by replacing the discontinuous control for a continuous one; (ii) the chattering is hidden behind an integral action; (iii) such an integral action increases the robustness. These advantages have been recently exploited by researchers for controlling quadrotor UAVs with both numerical simulation and successful experimental results [7, 8]. For the case of an omnidirectional multi-rotor, second-order SMC has also been applied in simulation for tilting quadrotors and tilted hexarotors [9–11]. Nonetheless, these authors do not take into account the allocation, which is critical for mapping the computed six-dimensional control wrench to the thrust that the tilting rotors must
12 Aerial Manipulator Interaction with the Environment
323
supply. Therefore, a more reliable result was presented in [12], where a model-based integral SMC is used. Regarding aerial manipulators, high-order SMCs have also been implemented. Recent research results have successfully tested numerically both terminal sliding mode control and STSMC for non-tilting quadrotor-based aerial manipulators [13– 15], which are also the closest results to those presented in this work. The contributions presented in this part of the chapter with respect to the current state of the art are the following: (i) this chapter deals with the control of an omnidirectional quadrotor with actively tilting propellers and a 6-DoF manipulator onboard, differently from traditional quadrotor-based aerial manipulators; (ii) the proposed controller is an STSMC for full pose tracking, taking into account the allocation matrix, that is missing in the related literature; (iii) the presented controller is extended to a hybrid pose/force regulation for sustained lateral force delivery, which is also a novelty to the latest results; (iv) the delivered force is controlled and sustained for at least 60 s, which is a long-time exerted force to the authors’ best knowledge.
12.3.2 Omnidirectional Manipulator Dynamics The generalized aerial manipulator dynamics can be found in [16]. In this part of the chapter, we consider the manipulator actuated by servomotors and controlled in position. Therefore, we consider all the disturbances from the manipulator to the UAV lumped in the gravity, Coriolis, and centrifugal terms. Thus, for the design of the controller, we start with the following mathematical model: −1 03×3 03×3 b ˙ ¨ξ = mI3 03×3 ¯ ˙ , Rb ut − ξ − ϕ(g, q, q) 03×3 Mb (ob ) 03×3 C(ob , o˙ b )
(12.3)
with Mb (ob ) ∈ R3×3 the symmetric and positive-definite (provided that θ = ±π/2 [17]) inertia matrix of the UAV’s angular part expressed as a function ¯ b = blockdiag(Rb , Rb ) ∈ R6×6 , C(ob , o˙ b ) the of the roll-pitch-yaw Euler angles, R ˙ ∈ R6 the vector containing both Coriolis matrix of the UAV system, and ϕ(g, q, q) the gravity and the lumped disturbance effects from the manipulator. Breaking down the control inputs mapping yields ⎤ u x (cψ cθ ) + u y (cψ sθ sφ − sψ cφ ) + u z (cψ sθ cφ + sψ sφ ) ⎢u x (sψ cθ ) + u y (sψ sθ sφ + cψ cφ ) + u z (sψ sθ cφ − cψ sφ )⎥ ⎥ ⎢ ⎥ ⎢ −u x (sθ ) + u y (cθ sφ ) + u z (cθ cφ ) b ⎥. ⎢ ¯ Rb ut = ⎢ ⎥ u φ ⎥ ⎢ ⎦ ⎣ uθ uψ ⎡
(12.4)
324
S. M. Orozco-Soto et al.
Fig. 12.1 Tilting propellers kinematic parameters
The control cross terms can be considered as matched disturbances affecting the same channels of the non-crossed control inputs as expressed below: ⎤ ⎡ ⎤ u y (cψ sθ sφ − sψ cφ ) + u z (cψ sθ cφ + sψ sφ ) u x (cψ cθ ) ⎢u y (sψ sθ sφ + cψ cφ )⎥ ⎢ ⎥ u x (sψ cθ ) + u z (sψ sθ cφ − cψ sφ ) ⎥ ⎢ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ u z (cθ cφ ) −u x (sθ ) + u y (cθ sφ ) b ⎢ ⎥ ⎢ ⎥ ¯ +⎢ Rb ut = ⎢ ⎥ ⎥ uφ 0 ⎥ ⎢ ⎢ ⎥ ⎣ ⎦ ⎣ ⎦ uθ 0 uψ 0 ⎡
= Γ (ob )utb + λ(u x , u y , u z , ob ), where
(12.5)
Γ (ob ) = blockdiag diag cψ cθ , sψ sθ sφ + cψ cφ , cθ cφ , I3 ,
and λ(u x , u y , u z , ob ) ∈ R6 is a disturbance-like vector containing control cross terms. For the quadrotor with tilting propellers considered in this part of the chapter, refer to Fig. 12.1. Each rotor is located at a distance l with an orientation ς with respect to Σb . Each propeller can be commanded to tilt of an angle α about its x R -axis. Hence, the six-dimensional wrench supplied by the propellers is given by [18] = k f 4
4 2 i=1 i Re3 (ςi )Re1 (αi )e3 . 2 2 i=1 lRe3 (ςi ) × i Re3 (ςi )Re1 (αi )e3 − ϑi σ i Re3 (ςi )Re1 (αi )e3 (12.6) The allocation problem is to find the rotors speed i and the angles αi to supply the wrench utb [19]. The thrust supplied by each rotor is compound by utb
fib
=
b f i,x 2 sin(αi ) y ∈ R2 , = k f i b cos(αi ) f i,z
12 Aerial Manipulator Interaction with the Environment
325
with i = 1, . . . , 4. Notice that for αi = 0, i = 1, . . . , 4, the thrust is the same as a parallel-axes quadrotor. Furthermore, ςi determines if the quadrotor is “X” or “+” type. The control thrust of each rotor can be retrieved as follows: u T,i =
b b f i,x y + f i,z > 0. 2
2
(12.7)
Consequently, the angular velocity of each rotor is computed through the following expression: u T,i i = , (12.8) kf and the tilting angle of each rotor is obtained through b b αi = arctan2 f i,z , | f i,x y| ,
(12.9)
with i = 1, . . . , 4 in all the above expressions. Then, splitting the static parameters from (12.6), the following allocation matrix is proposed: ⎤ 0 0 0 0 sς1 sς2 sς3 sς4 ⎢ 0 0 0 0 −cς1 −cς2 −cς3 −cς4 ⎥ ⎥ ⎢ ⎥ ⎢ 1 1 1 1 0 0 0 0 ⎥ Λ=⎢ ⎢ lsς1 lsς2 lsς3 lsς4 sς1 ϑ1 σ sς2 ϑ2 σ sς3 ϑ3 σ sς4 ϑ4 σ ⎥ . ⎥ ⎢ ⎣ −lcς1 −lcς2 −lcς3 −lcς4 −cς1 ϑ1 σ −cς2 ϑ2 σ −cς3 ϑ3 σ −cς4 ϑ4 σ ⎦ −l −l −l −l −ϑ1 σ −ϑ2 σ −ϑ3 σ −ϑ4 σ (12.10) Notice that the right pseudo-inverse of such an allocation matrix exists, Λ† = T −1 T Λ . Therefore, the mapping from the control wrench utb to the actuators Λ Λ signals is
T T T T T f b = f1b , f2b , f3b , f4b = Λ† utb ∈ R8 . (12.11) ⎡
12.3.3 Control Design Consider the omnidirectional quadrotor dynamics (12.3) with split control cross terms as in (12.5). A control problem for this system is to lead the pose of the UAV to the desired value, despite the disturbances caused by the onboard manipulator motion and by the control cross terms. From a mathematically viewpoint, the problem is described as follows:
326
S. M. Orozco-Soto et al.
lim ξ˜ → 06 ,
t→∞
lim ξ˜˙ → 06 ,
t→∞
where ξ˜ = ξ d − ξ ∈ R6 denotes the pose error, with ξ d ∈ R6 indicating the desired d pose that should be at least twice differentiable and ξ˙˜ = ξ˙ − ξ˙ ∈ R6 is the velocity d pose error, with ξ˙ ∈ R6 indicating the desired velocity of the pose vector. For the d following, ξ¨ ∈ R6 is the desired acceleration of the UAV’s pose. The following controller is proposed to lead the UAV to the target trajectory utb = Γ (ob )−1
0 mI3 03×3 03×3 d K1 s1/2 + K2 w + Kv ξ˙˜ + ξ¨ + 3×3 ξ˙ , 03×3 Mb (ob ) 03×3 C(ob , o˙ b )
(12.12a) ˙ = sign(s), w
(12.12b)
where Kv , K1 , K2 ∈ R6×6 denote constant gains and s denotes the following sliding surface: s = Bξ˜ + ξ˙˜ , (12.13) with B ∈ R6×6 also constant. In order to deliver the desired force for a long time (e.g., more than 30 s), the tracking controller is extended so that the behavior of the aerial manipulator is similar to that of a flying tool able to exert a six-dimensional wrench [20]. In this case, the experiments are only performed to deliver sustained force along the x B -axis of Σb ; see Fig. 12.1. Thus, the proposed controller (12.12) can be extended for force regulation as follows: 03×3 b −1 mI3 Kv ξ˙˜ + I6 − S1 K1 s1/2 + I6 − S1 S2 K2 w ut = Γ (ob ) 03×3 Mb (ob ) 0 03×3 3×3 ˜ + ξ˙ , (12.14a) +sat S1 K f f 03×3 C(ob , o˙ b ) ˙ = sign(s), w
(12.14b)
where f˜ f d − f m ∈ R6 denotes the error between the desired, f d , and the measured wrench, f m , at the tip of the end-effector; K f ∈ R6×6 denotes a constant gain matrix; and S1 , S2 ∈ R6×6 denote two switching matrices choosing whether to use the force regulation and the integral action or not, respectively.
12 Aerial Manipulator Interaction with the Environment
327
Fig. 12.2 Onboard manipulator kinematic parameters
12.3.4 Validation in the Gazebo Simulator In order to assess the proposed STSMC, two sets of simulations were performed in Gazebo, namely, (i) trajectory tracking and (ii) flying to a target pose and applying a sustained force. The STSMC was programmed in the C++ language and executed as an ROS node communicating with Gazebo to obtain the odometry and joint states from the model. A different node commands both the manipulator and the UAV set points. The aerial manipulator used for this work is shown in Figs. 12.1 and 12.2. It consists of a quadrotor UAV whose propellers are actuated by a servomotor to rotate about their corresponding x R -axis. The onboard manipulator is also actuated by servomotors and controlled in position. The joint configuration can be appreciated in Fig. 12.2, and the related kinematics is expressed in Σb . The weight of the UAV is 9.8 kg, and the arm weighs about 1.7 kg. The first case study to assess the proposed STSMC given by (12.12) consists of commanding the UAV to track the following trajectory in Cartesian space for 90 s: ⎡ ⎤ ⎡ ⎤ ⎡ 2 ⎤ r cos(at) −ar sin(at) −a r cos(at) pdb = ⎣ r sin(at) ⎦ , p˙ db = ⎣ ar cos(at) ⎦ , p¨ db = ⎣ −a 2 r sin(at) ⎦ , (12.15) 1 + 0.1t 0.1 0 where r = 0.4 m denotes the radius of the circle described mathematically above and a = 0.25 denotes the period of the sinusoidal signals. The desired orientation is obd = 03 . Figure 12.3 shows the behavior of the controlled states. During the first 25 s of the test, the UAV takes. Then, the manipulator is moved to a home setup. Finally, the UAV leads to a target pose close to the beginning of the trajectory. After this process, the trajectory is commanded for 90 s. Figure 12.3 shows the UAV’s position and orientation, where it can be appreciated the satisfactory tracking of the reference trajectory. The average norms of the error vectors are p˜ b = 0.0086 m and ˜ob = 0.1352◦ . Figure 12.4 shows the motion of the aerial
328
S. M. Orozco-Soto et al.
Fig. 12.3 Behavior of the controlled states during the tracking experiment. Top line: the three components of the UAV’s position, pb . Bottom line: the three components of the UAV’s attitude, ob . - - - Reference — Measured states
Fig. 12.4 Motion of the aerial manipulator in the Cartesian space. - - - Reference trajectory — Measured trajectory
manipulator in the Cartesian space, where it can be observed that the trajectory is correctly tracked. A video of the carried out simulation is available online.1 In the second case study, the aerial manipulator is commanded to the target Cartesian position pdb = [0.7, 0.5, 1.8]T m, to apply the desired force of 2.2 N for at least 60 s. The force is measured using a Gazebo contact sensor plugin at the tip of the 1
https://youtu.be/pJLFsVOek7M.
12 Aerial Manipulator Interaction with the Environment
329
Fig. 12.5 Behavior of the controlled states during the flight and force regulation experiment. Top line: the three components of the UAV’s position, pb . Bottom line: the three components of the UAV’s attitude, ob . - - - Reference — Measured states
end-effector of the arm, pointed in Fig. 12.2 as the frame Σe : x E , y E , z E . The experiment also is compound by the following stages: (i) take-off, (ii) arm setup (40 s), (iii) approach to the point (120 s), (iv) contact phase (20 s), and (v) force regulation (150 s). The final stage involves both reaching and then holding the desired force. The sample time for the force regulation is of 1 s. Figure 12.5 shows the six controlled states of the UAV. The x B -axis is not well regulated; however, a higher reference has been commanded to perform the contact before the force regulation. Nevertheless, the other degrees of freedom are precisely controlled. The results of the force control phase are displayed in Fig. 12.6, where it can be appreciated that the force is successfully regulated to 2.2 N for more than 60 s. The steady-state force ˜ = 0.000012 N. The video of this simulation is available online.2 error norm is F
12.4 Part II: Hardware-in-the-Loop Simulator for a Physical Human–Aerial Manipulator Cooperation The developed hardware-in-the-loop (HIL) simulator architecture consists of three main parts: (i) a model-based simulation of an unmanned aerial manipulator (UAM), composed of parallel-axes quadrotor equipped with a 6-DoF robotic arm; (ii) a hardware interface to enable a force-based interaction with the simulated robotic model while rendering the motion effects of the UAM floating base; and (iii) a bilateral software communication interface connecting the hardware with the simulated model. All these components are integrated through the Robot Operating System (ROS) framework, as shown in Fig. 12.7, with specifically designed modules for each of them. 2
https://youtu.be/Ks261f6orMw.
330
S. M. Orozco-Soto et al.
Fig. 12.6 Controlled force along the x B -axis of Σb . - - - Reference — Measured force
Fig. 12.7 Conceptual scheme of the simulator architecture
As the hardware interface interacts with the human operator or the environment, all the exchanged forces, measured by a force sensor, are applied to the simulated UAM. The simulation is carried out in Gazebo. The position of the floating base, affected by the interaction forces, is then fed back to the hardware interface, which adjusts its position consequently. This exchange of information between the real and simulated worlds happens simultaneously using standard ROS messages. Although the hardware and simulation controllers have been tailored for the specific experiments, the proposed architecture is general enough to allow the deployment of different control strategies, which the users might develop according to their needs. The code is indeed freely released.3
3
https://github.com/prisma-lab/HIL_airmanip.
12 Aerial Manipulator Interaction with the Environment
331
12.4.1 Related Work and Contribution So far, most of the literature regarding human–drone interaction relies on communication through speech [21], gestures [22], brain–computer interfaces [23], and multimodal interaction [24, 25]. An exhaustive overview can be found in [26]. A few works indeed deal with the problem of close physical cooperation between drones and humans [27]. Safe-to-touch UAVs have been considered in [28, 29], but the first example of a hardware-in-the-loop simulator for human–UAV interaction was devised in [30]. Here, the human can command a UAV by exchanging forces measured by buttons pushed at a contact point. The interaction forces are then used in an admittance control scheme to modify the vehicle reference trajectory. For similar purposes, an admittance control scheme was also employed in [31]. Whereas in these last works the interaction happened using pushing actions, in [32] a tethered interaction was investigated. Here, a UAV pulls a human along the desired path, while the pulling force is used as an indirect communication channel. This chapter’s part extends what has been already presented in [33] by adding the stability proof.
12.4.2 Simulations The simulated quadrotor was implemented using the plugins and functionalities offered by the RotorS library [34]. It is modeled as a rigid body in space actuated by four propellers with parallel axes. The classical parallel-axes quadrotor dynamic model is m p¨ b = mge3 − u T Rb e3 + fext , ˙ b = Rb S(ωbb ), R
(12.16b)
Ib ω˙ bb
(12.16c)
=
−S(ωbb )Ib ωbb
+τ + b
τ bext ,
(12.16a)
where Ib ∈ R3×3 denotes the constant inertia matrix of the UAV in Σb , ωbb = RbT ωb ∈ R3 denotes the rotation velocity vector of Σb with respect to itself, fext ∈ R3 and τ bext ∈ R3 represent the external forces and torques disturbances acting on the quadrotor, respectively. These last TERMS account for unmodeled terms (e.g., aerodynamic disturbances), arm movements, and human–UAM interaction forces. The allocation problem for the parallel-axes quadrotor can be seen as before with αi = 0, ∀i = 1, . . . 4. The total thrust supplied by each rotor is thus only the scalar f i , z b = k f i2 > 0. The mapping from ubf to f bf ∈ R4 is given by f bf = Λ f ubf , with allocation matrix [35] ⎡ ⎤ 1 1 1 1 ⎢ 0 l 0 −l ⎥ ⎥ Λf = ⎢ (12.17) ⎣−l 0 l 0 ⎦ . σ −σ σ −σ
332
S. M. Orozco-Soto et al.
The actuated joints of the 6-DoF arm attached to the quadrotor are configured as an anthropomorphic arm with a spherical wrist. Denote with Σe the manipulator’s end-effector frame while its base frame coincides with Σb . The direct kinematics from Σb to Σe is described by Reb (q) pbe (q) , = 1 03T
Abe
(12.18)
where Reb ∈ S O(3) and pbe ∈ R3 denote the rotation matrix and the position of Σe with respect to Σb , respectively.
12.4.3 Hardware Setup The hardware interface connected to the simulation allows the user to interact with the simulated environment through the measured contact forces. At the same time, the user receives haptic feedback related to the movements of the simulated UAM, also taking into account the position of the floating base. Notice that it is unnecessary to have a user interacting with the hardware interface. Indeed, the interaction forces can also come from the environment while testing autonomous control strategies for physical interaction. The architecture is independent of the specific hardware, which can be any device as long as its position in the space can be commanded (e.g., manipulators and/or haptic interfaces). In this work, the hardware interface consists of a 7-DoF KUKA IIWA manipulator equipped with an ATI Mini45 force/torque sensor at the end-effector. Also, since the hardware interface mimics the simulated UAM in operational space coordinates, the two kinematic chains can be structurally different if a suitable inverse kinematics algorithm is adopted. To perform a safe human–robot interaction, the dynamics of the hardware interface are given by an admittance controller. Let Σb denote the base frame of the hardware side. Let pbe ∈ R3 denote the position of the hardware end-effector frame, Σe , with respect to Σb , and let pbe ,c ∈ R3 denote the position of the compliant frame, Σe ,c , with respect to Σb . The hardware-side admittance is captured by Md x¨˜ + K D x˙˜ + K P x˜ = hh + hq ,
(12.19)
where Md ∈ R6×6 , K D ∈ R6×6 , and K P ∈ R6×6 denote the apparent mass, damping,
T T T ∈ R6 denotes the operaand stiffness matrices, respectively, while x˜ = x˜ p , x˜ o
tional space error between Σe and Σe ,c . In particular, x˜ p = pbe − pbe ,c ∈ R3 denotes the position error, whereas the orientation error x˜ o ∈ R3 is given by the vector part of the quaternion representing the rotation between the hardware compliant frame and the hardware end-effector frame, expressed by Ree ,c ∈ S O(3). The right-hand side of (12.19) represents the wrench applied to the system given by (i) hh ∈ R6 , the interaction wrench exerted by the human operator at the hardware end-effector Σe
12 Aerial Manipulator Interaction with the Environment
333
expressed in Σb ; (ii) hq ∈ R6 , and the haptic feedback from the simulated UAM detailed in the next section.
12.4.4 Communication Interface The wrench he ∈ R6 applied in simulation at the UAM’s end-effector can be computed as he = AdTew AdTwb AdTb e hh ,where AdT12 ∈ R6×6 denotes the adjoint transformation matrix between two generic frames, Σ1 and Σ2 [36]. Notice that the transformation matrix T w b should be chosen by the user to connect the real hardware base frame to the simulated world. Regarding the simulation feedback to the hardware interface, the displacement e p ∈ R3 between the commanded and the actual position of the quadrotor’s CoM can be fed back onto the hardware interface to emulate the effects of the floating base displacements. In particular, this contribution can be seen as an additional wrench, playing the role of haptic feedback for the human operator, given by Md e¨ p + K D e˙ p + K P ep = hq ,
(12.20)
where ep = AdTb w e p denotes the quadrotor’s displacement transformed from Σw into Σb .
12.4.5 System Controllers Here, we present the hardware and the simulation controllers later used in the two case studies. Although the simulator itself is not strictly related to the type of controller, in our case, a decentralized controller [2] has been implemented on the simulated UAM, employing a geometric position tracking controller for the quadrotor and a Cartesian variable admittance controller for the arm. Also, the quadrotor’s position tracking is enhanced with a momentum-based external wrench estimator [17] to compensate for the arm dynamics, the interaction forces, and other unmodeled disturbances. Because of the under-actuation of the system, a hierarchical approach is followed to control both the position, pb , and the attitude, Rb , of the quadrotor. From this point of view, the geometric tracking controller in S E(3) [37] is implemented on the quadrotor. The outer position loop tracking errors are e p = pb − pdb and ev = p˙ b − p˙ db , where d pb ∈ R3 denotes the desired position of the Σb ’s origin in Σw , obtained from an external trajectory planner. Let Rb,d = [xb,d , yb,d , zb,d ] ∈ S O(3) denote the desired rotation matrix, where xb,d ∈ R3 is given from the trajectory planner. The tracking T Rb − RbT Rb,d )∨ and errors of the inner attitude loop are given by e R = 0.5 (Rb,d b,d b,d b T 3 eω = ωb − Rb Rb,d ωb,d , where ωb,d ∈ R is the desired body rotation velocity in
334
S. M. Orozco-Soto et al.
Σb and ∨ : R3×3 → R3 is a map performing the inverse of the skew-symmetric operator. Theorem 12.1 In the absence of external disturbances, or in the case they are negligible, the necessary thrust, u T , the desired body axis, zb,d ∈ R 3 , and the attitude control law, τ b , that asymptotically bring to zero the outer position loop and the inner attitude loop tracking errors be computed as u T = (K p e p + Kv ev + mge3 − m p¨ b,d )T Rb e3 , zb,d = −
m p¨ db
−K p e p − Kv ev − mge3 + , − K p e p − Kv ev − mge3 + m p¨ b,d
(12.21a) (12.21b)
where K p ∈ R3×3 and Kv ∈ R3×3 denote symmetric positive-definite gain matrices, and T ˙ b,d τ b = −K R e R − Kω eω + S(ωbb ) Ib ωbb − Ib (S(ωbb ) RbT Rb,d ωb,d b,d − Rb Rb,d ω b,d ), (12.22) with K R ∈ R3×3 and Kω ∈ R3×3 symmetric positive-definite gain matrices.
Proof This result is proven in detail in [37].
Because there might be significant external disturbances in our case, this control scheme needs an external wrench estimator to maintain good performance levels in trajectory tracking. In this work, the estimator presented in [17] has been used. Due to the limited payload capabilities of aerial platforms, the arm of a UAM is typically actuated by position- or velocity-controlled joints (e.g., servo motors) [2]. In this case, admittance control can guarantee compliance and safety during the interaction with both environment and humans. However, as human and environment interaction tasks require different compliance choices, a variable-gain admittance control is used. In the operational space, the admittance-controlled manipulator dynamics are given by (12.23) Md x¨˜ + K D x˙˜ + K P x˜ = heb , T
where x˜ = x˜ Tp , x˜ oT ∈ R6 denotes the operational space error between the desired end-effector frame and the compliant frame, in the simulation side, and heb ∈ R6 is the wrench measured at the simulated end-effector, expressed in Σb . In particular, x˜ p = pbe − pbe,c ∈ R3 denotes the position error, whereas the orientation error x˜ o ∈ R3 is given by the vector part of the quaternion representing the rotation between the simulated compliant frame and the simulated end-effector frame, expressed by e ∈ S O(3). Re,c The dynamics in (12.23) are passive4 with respect to the power port (heb , x˙˜ ). Indeed, choosing as a storage function 4
For additional details on the passivity concept, see [38, 39].
12 Aerial Manipulator Interaction with the Environment
335
1 1 V (˜x, x˙˜ ) = x˙˜ T Md x˙˜ + x˜ T K P x˜ , 2 2
(12.24)
V˙ = x˙˜ T Md x¨˜ + x˙˜ T K P x˜ = x˙˜ T heb − x˙˜ T K D x˙˜ ≤ x˙˜ T heb .
(12.25)
its time derivative is
The same passivity argument can be extended to the hardware side (see (12.19)) with respect to the interaction force hh . By defining x¯ = x˜ − ep , Eq. (12.19) can be rewritten as Md x¨¯ + K D x˙¯ + K P x¯ = hh , which is passive with respect to the power port (hh , x˙¯ ) with storage function V (¯x, x˙¯ ) =
1 1 ˙T x¯ Md x˙¯ + x¯ T K P x¯ . 2 2
(12.26)
However, if the admittance gains are time-variant, functions in (12.24) and (12.26) are no longer valid. Indeed, (12.25) becomes
1 ˙ T ˙ ˙ ˙ P x˜ x˜ Md x˜ + x˜ T K V˙ = x˙˜ T Md x¨˜ + x˙˜ T K P x˜ + 2
1 T T ˙ d x˙˜ + x˜ T K ˙ P x˜ , x˙˜ T M = x˙˜ he − x˙˜ K D x˙˜ + 2
(12.27)
and passivity is guaranteed only if
1 ˙ T ˙ ˙ ˙ P x˜ . x˜ Md x˜ + x˜ T K x˙˜ T K D x˙˜ ≥ 2
(12.28)
Notice that the same reasoning can be applied to (12.26). To guarantee passivity despite (12.28), an energy tank can be employed [40, 41]. The tank dynamics are 1 ϕ z˙ = Pd − γ w, where ϕ ∈ R and γ ∈ R denote two parameters, Pd = x˙˜ T K D x˙˜ ≥ z z 1 ˙ ˙ ˜+ 0 denotes the power dissipated by the admittance system (12.23), w = x˙˜ T M dx 2 T ˙ x˜ K P x˜ denotes the tank input. Let us define the tank’s storage function as T (z) = 0.5z 2 . To guarantee passivity, let
0, if T ≥ T & w ≤ 0, ι, otherwise, ⎧ ⎨1, if w > 0, T −T ι= 1 ⎩ 1 − cos π , otherwise, 2 T −T ϕd , if T < T ϕ= 0, otherwise,
γ =
(12.29a)
(12.29b)
(12.29c)
336
S. M. Orozco-Soto et al.
with ϕd ≤ 1 representing the amount of the dissipated energy redirected to the tank, while T ∈ R and T ∈ R denote the upper and lower energy limits of the tank, respectively. ˙ P are substituted by ιM ˙ d and ιK ˙ P , respectively, ˙ d and K If the desired values of M then the overall storage function is given by V˙ + T˙ = x˙˜ T Md x¨˜ + x˙˜ T K P x˜ + ιw + ϕ Pd − γ w = x˜˙ T he − x˙˜ T K D x˙˜ + ιw + ϕ Pd − γ w = x˙˜ T he − (1 − ϕ)Pd + (ι − γ )w ≤ x˙˜ T he ,
(12.30)
where the passivity of the arm-plus-tank system is always verified despite the sign of w. The energy tank partially stores the energy dissipated by the admittance dynamics, releasing it later if necessary. By injecting controlled amounts of energy into the system, the tank allows performing actions that would typically violate the system passivity, like changing the admittance virtual stiffness or mass. Since the arm and the quadrotor are controlled separately, using an external wrench estimator to compensate for the arm movements, among the various disturbances, can enhance the stability properties of the quadrotor controller. A momentum-based estimator is used [42], where the external wrench estimation is given by ηˆ ext =
T T T , τˆ bext ∈ R6 . fˆext Lemma 12.1 To effectively perform the compensation in the quadrotor control law, the force estimation, fˆext , is added to the term K p e p + Kv ev + mge3 − m p¨ b,d in (12.21), while the torque estimation, τˆ bext , is added to (12.22). The estimation of the external wrench is performed through the estimator ηˆ ext (t) = K 1
−ηˆ ext (σ ) + K 2 κ(σ ) 0 t u T Rb e3 − mge3 ˆ + η − (σ ) dσ dσ , ext τ b − s(ωbb )I b ωbb 0 t
(12.31)
where κ ∈ R6 denotes the generalized momentum of the system (12.16) defined as mI3 03×3 p˙ b , and the matrices K1 ∈ R6×6 and K2 ∈ R6×6 are symmetric κ= 03×3 Ib ωbb positive-definite gains. Proof The geometric flight controller for the UAV and the admittance-controlled arm have been proven to possess stability properties in many past works. However, their combination, especially with a momentum-based external wrench estimator, is not guaranteed to maintain the same properties. This subsection aims to demonstrate that the dynamics of the UAM are marginally stable and bounded in the presence of unknown and non-vanishing (but still bounded)
12 Aerial Manipulator Interaction with the Environment
337
external force disturbances. In the following, only the boundedness of the linear dynamics will be proven for the sake of brevity, but notice that the effects of the angular dynamics are still taken into account in the UAV dynamics. In this regard, consider both the manipulator and the UAV commanded to be still in the air at the desired position. The admittance-controlled dynamics of the arm with respect to the UAV base frame and the dynamics of the UAV base frame with respect to the world frame can be written, respectively, as Me p¨ be + De p˙ be + Ke p˜ be = fh , Mb p¨ b + Db p˙ b + Kb p˜ b = δ + e F ,
(12.32a) (12.32b)
where δ ∈ R3 captures the nonlinear coupling dynamics given by the under-actuation of the UAV, fh ∈ R 3 captures the human interaction forces, and e F ∈ R 3 captures the residual of external forces acting on the UAV not instantaneously compensated by the external wrench estimator in (12.31). Fusing together the two dynamics in (12.32) yields the arm dynamics into the world frame, Σw , whose expression is Me p¨ e + De p˙ e + Ke p˜ e = fh + Me Mb−1 (δ + e F ) + (De − Me Mb−1 Db )p˙ b + (Ke − Me Mb−1 Kb )p˜ b = fh + M(δ + e F ) + Dp˙ b + Kp˜ b .
(12.33)
Starting from (12.33), the stability proof is split into two parts: first, the UAV dynamics will be proven to be bounded; then, the manipulator’s dynamic into the world frame will be established to be bounded by exploiting the first result. Consider the non-perturbed UAV dynamics α b (¯xb ) = Mb p¨ b + Db p˙ b + Kb p˜ b = 03 ,
(12.34)
T where x˙¯ b = p˙ Tb , p˜ Tb . Since Mb , Db , and Kb are symmetric positive-definite matrices, the closed-loop dynamics is asymptotically stable and it can be rewritten in the form x˙¯ b = Ab x¯ b , where Ab ∈ R6×6 can be easily extracted from (12.34). From the proven stability of the previous closed-loop system, it follows that the 1 Lyapunov function Vb (t, x¯ b ) = x¯ bT Pb x¯ b exists and the following Lyapunov equation 2 Pb Ab + ATb Pb = −Qb
(12.35)
is satisfied with a unique symmetric and positive-definite matrix Pb ∈ R6×6 given any symmetric and positive-definite Qb ∈ R6×6 . Then, it is easy to show that there exist four scalars γ1 = λPb , γ2 = λ¯ Pb , γ3 = λQb , and γ4 = 2λ¯ Pb such as to satisfy the inequalities
338
S. M. Orozco-Soto et al.
γ1 ¯xb 2 ≤ Vb (¯xb ) ≤ γ2 ¯xb 2 , ∂ Vb ∂ Vb ˙ + x¯ b ≤ −γ3 ¯xb 2 , ∂t ∂ x¯ b ∂ Vb 2 ∂ x¯ ≤ γ4 ¯xb , b
(12.36a) (12.36b) (12.36c)
where λX and λ¯ X denote the minimum and maximum eigenvalues of a generic matrix X, respectively. So far, we have shown a few properties of the non-perturbed UAV dynamics in (12.34), but these actually have a possibly non-vanishing perturbation on the right side given by gb (¯xb , t) = δ + e F . By recalling the properties and the bounds of the nonlinear coupling term δ (see [37] for the details), it can be shown that the perturbation term is bounded by √ gb (¯xb , t) ≤ δ + e F ≤ α( 2 max{λ¯ Kb , λ¯ Db }¯xb + B) + B1 = Γ1 ¯xb + Γ2 ,
(12.37)
with α < 1, −mge3 + m p¨ b,d < B and a scalar B > 0. By Lemma 5.2 in [43], if the inequality Γ1
0 such that ¯xe (t) ≤ ξe e−ρe (t−t0 ) ¯xe (t0 ), ¯xe (t) ≤ B5 , ∀t ≥ te , with ξe =
∀t0 ≤ t ≤ te ,
(1 − )λ Q e 2Δe λ¯ Pe λ¯ Pe , ρe = , B5 = ξe , < 1. ¯ λPe λQe 2λPe
(12.42a) (12.42b)
(12.43)
From (12.42), the boundedness of the manipulator dynamics yields ¯xe (t) ≤ B6 = max{ξe ¯xe (t0 ) , B5 },
∀t ≥ te .
(12.44)
In the previous subsection, we have demonstrated that the UAV dynamics and, consequently, the whole UAM dynamics are bound to external non-vanishing forces. However, the entire proof is based on the validity of the inequality (12.38). Since it is not straightforward to demonstrate it theoretically, a numerical approach is presented here. In particular, it can be shown from (12.34) and (12.35) that verifying (12.38) reduces to solving the system ⎡
−Mb−1 Db −Mb−1 Kb
Ab = ⎣
I3×3
⎤ ⎦,
(12.45a)
03×3
Pb Ab + ATb Pb = −Qb ,
(12.45b)
√ " λQ λP ! α 2 max λ¯ Kb , λ¯ Db < b 2 b , 2λ¯ Pb
(12.45c)
as a function of the controller parameters, the system mass, and the initial attitude error α of the geometric controller defined in [37]. The symbolic computations to verify the existence of bounds have been performed in Mathematica [44]. To provide an answer in a reasonable time, the search for solutions has been restricted to the subset in which Qb = I 3 . Notice that this is not a restrictive hypothesis: indeed, the existence of a solution in this subset implies the presence of a solution in the larger subset given by all the symmetric and positivedefinite matrices Qb . The result of the computations is visible in Fig. 12.8. The plots show the maximum initial attitude error tracking value for the system’s state to be bounded as a function of the UAM mass and its controller parameters. It is interesting
340
S. M. Orozco-Soto et al.
Fig. 12.8 Limits on the initial attitude tracking error for the existence of bounds on the UAM state. The limits are function of the UAM mass as well as the UAV controller gains
to notice that, regardless of the value of the external interaction forces that will be applied to the system, the response of the UAM will always be bounded as long as the force is bounded and our system parameters respect the limits in the graph. Also, by employing a trajectory planner for the UAM, the initial attitude error will usually be very low, ideally zero, making it easy to fall in the cases depicted in the graphs. Moreover, as stated earlier, this is a result obtained in the particular case of Qb = I 3 . It is likely that by considering other values for Qb , the parameter set that allows the boundedness will enlarge even more.
12 Aerial Manipulator Interaction with the Environment
341
Fig. 12.9 Aerial base position displacement e p along axes x (blue), y (red), and z (orange) during human interaction experiment
12.4.6 Case Study The effectiveness of the proposed system is evaluated along with two case studies: (i) a collaborative experiment where the human operator attaches a tool to the robot end-effector and (ii) an autonomous bird diverter installation task. The simulations have been performed on a standard Ubuntu 18.04 distribution with ROS Melodic, running at 200 Hz. The first case study considers the collaboration task between a human operator and the UAM through the hardware interface. In this context, the aerial platform awaits in mid-air until the operator approaches. The interaction phase starts with the admittance-controlled hardware interface to mount a tool on the manipulator gripper. While in use, the hardware interface provides a compliant behavior and transfers all the measured forces to the simulated UAM. When the aerial platform recognizes that the tool is mounted on the hardware interface and the operator has finished the interaction with it, the experiment ends. The interaction between the human operator and the simulation framework is demonstrated in Fig. 12.9 by the motion of the aerial manipulator e p subject to the forces generated on the hardware side. As stated above, these positions are fed back to the hardware to provide the operator with a realistic interaction feeling. The human–drone interaction forces at the arm’s tip hh , expressed in the world frame, are shown in Fig. 12.10. After a few seconds from the beginning of the experiment, the operator grabs the hardware manipulator, and the interaction lasts for about 30 s until he successfully mounts the tool. At this point, the tool weight force of about 1.4 N is the only one applied at the manipulator’s end-effector. The admittance gains are diagonal matrices whose variation is shown in Fig. 12.11. The arm starts with low gains to improve the comfort of the human operator and increases the system’s safety. If human contact is detected (around 10 s), then the virtual stiffness and the mass gains are increased to aid the tool’s placement process. The increased gains violate the arm passivity, which can be guaranteed by employing an energy tank. The tank partially discharges as in Fig. 12.12, consequently delaying the increase in
342
S. M. Orozco-Soto et al.
Fig. 12.10 Human–hardware contact forces hbe along axes x (blue), y (red), and z (orange) during human interaction experiment
Fig. 12.11 Quadrotor arm admittance gains K P (red), K D (blue), and Md (orange) during human interaction experiment: desired (dashed) and actual profiles (solid)
Fig. 12.12 Energy T inside the tank during human interaction experiment
the admittance gains. However, later in the experiment, the tank recharges thanks to the energy dissipated from the human interaction. This second case study aims at installing a bird diverter on an aerial power line by impacting the line with sufficient force. The arm admittance controller gains are
12 Aerial Manipulator Interaction with the Environment
343
Fig. 12.13 Diverter installation impact forces hbe along axes x (blue), y (red), and z (orange) during the bird diverter installation experiment
Fig. 12.14 Aerial base position displacement e p in Σw along axes x (blue), y (red), and z (orange) during the bird diverter installation experiment
increased with respect to the human–UAM interaction phase to provide rigidity. Only the gains along the front direction are kept low to ease the diverter installation and attenuate the impact effects on the floating base. In the beginning, the aerial manipulator is in free flight approaching the installation point. When this point is reached, the quadrotor is commanded to be still in position while the arm is positioned under the aerial cable and rapidly rises to hook the diverter. Because of the impact forces, shown in Fig. 12.13, right after the diverter was hooked, the quadrotor undergoes a displacement of about 0.1 m along the x-axis in Σw , which is recovered by the flight controller as shown in Fig. 12.14.
12.5 Conclusion and Future Work In its first part, this chapter has seen the implementation of the STSMC technique to an aerial manipulator platform. The proposed control technique is a hybrid position tracking/force regulation approach, considering the entire manipulator system
344
S. M. Orozco-Soto et al.
as a flying end-effector. Two case studies were performed within the Gazebo simulation environment to assess the STSMC: (i) tracking the desired trajectory and (ii) reaching a target position to apply lateral force for a long time. Both case studies showed successful tracking and force regulation results, yielding significantly small error norms, which proved the effectiveness of the proposed controller for omnidirectional aerial manipulator platforms. Future work will be devoted to implementing it on a real platform and testing the interaction with the environment with higher values of forces. The second part of this chapter presented a general framework for HIL simulation of human–aerial manipulator collaboration. In the beginning, the simulated aerial manipulator and the hardware interface were presented. These two components were then connected through a communication interface to implement the force and position feedback between the simulation and the environment. The overall architecture’s effectiveness was evaluated in two case studies: (i) a collaborative task with a human operator and (ii) an autonomous bird diverter installation task. We demonstrated the possibility of performing human–aerial manipulator interaction during these experiments without endangering the operator. Additionally, the stability proof of the chosen architecture was proposed by employing both symbolic and numerical tools. Future work will further stress the HIL approach with other tasks and validate the possibility of using it to train human operators through a comprehensive humansubjects study. Besides, the proposed architecture can be tested with other hardware interfaces to stress its limits. Teleoperation may also be inserted with proper handling of the delay, thanks to the energy tank background. Acknowledgements The research leading to these results has been supported by both the AERIALCORE project, European Union’s Horizon 2020 research and innovation program, under Grant Agreement No 871479, and the AERO-TRAIN project, European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 953454. The authors are solely responsible for its content.
References 1. Cacace J, Orozco-Soto SM, Suarez A, Caballero A, Orsag M, Bogdan S, Vasiljevic G, Ebeid E, Rodriguez JAA, Ollero A (2021) Safe local aerial manipulation for the installation of devices on power lines: aerial-core first year results and designs. Appl Sci 11(13):6220 2. Ruggiero F, Lippiello V, Ollero A (2018) Aerial manipulation: a literature review. IEEE Robot Autom Lett 3(3):1957–1964 3. Ollero A, Tognon M, Suarez A, Lee D, Franchi A (2022) Past, present, and future of aerial robotics manipulators. IEEE Trans Robot 38(1):626–645 4. Kotarski D, Krznar M, Piljek P, Simunic N (2017) Experimental identification and characterization of multirotor UAV propulsion. J Phys Conf Ser 870:012003 5. Goel A, Swarup A (2017) MIMO uncertain nonlinear system control via adaptive high-order super twisting sliding mode and its application to robotic manipulator. J Control Autom Electr Syst 28(1):36–49
12 Aerial Manipulator Interaction with the Environment
345
6. Shtessel YB, Moreno JA, Plestan F, Fridman LM, Poznyak AS (2010) Super-twisting adaptive sliding mode control: a Lyapunov design. In: Conference on decision and control. IEEE, pp 5109–5113 7. Alqaisi W, Kali Y, Ghommam J, Saad M, Nerguizian V (2020) Position and attitude tracking of uncertain quadrotor unmanned aerial vehicles based on non-singular terminal super-twisting algorithm. Proc Inst Mech Eng Part I J Syst Control Eng 234(3):396–408 8. Ghadiri H, Emami M, Khodadadi H (2021) Adaptive super-twisting non-singular terminal sliding mode control for tracking of quadrotor with bounded disturbances. Aerosp Sci Technol 112:106616 9. Nguyen NP, Kim W, Moon J (2018) Observer-based super-twisting sliding mode control with fuzzy variable gains and its application to overactuated quadrotors. In: Conference on decision and control. IEEE, pp 5993–5998 10. Nguyen NP, Kim W, Moon J (2019) Super-twisting observer-based sliding mode control with fuzzy variable gains and its applications to fully-actuated hexarotors. J Franklin Inst 356(8):4270–4303 11. Ji R, Ma J, Ge SS, Ji R (2020) Adaptive second-order sliding mode control for a tilting quadcopter with input saturations. IFAC-PapersOnLine 53(2):3910–3915 12. Yi S, Watanabe K, Nagai I (2021) Anti-disturbance control of a quadrotor manipulator with tiltable rotors based on integral sliding mode control. Artif Life Robot 1–10 13. Riache S, Kidouche M, Rezoug A (2019) Adaptive robust nonsingular terminal sliding mode design controller for quadrotor aerial manipulator. TELKOMNIKA Telecommun Comput Electron Control 17(3):1501–1512 14. Kuchwa-Dube C, Pedro JO (2018) Altitude and attitude tracking of a quadrotor-based aerial manipulator using super twisting sliding mode control. In: Proceedings of the 6th international conference on control, mechatronics and automation, pp 65–69 15. Kuchwa-Dube C, Pedro JO (2019) Quadrotor-based aerial manipulator altitude and attitude tracking using adaptive super-twisting sliding mode control. In: International conference on unmanned aircraft systems. IEEE, pp 144–151 16. Lippiello V, Ruggiero F (2012) Cartesian impedance control of a UAV with a robotic arm. In: International IFAC symposium on robot control, pp 704–709 17. Ruggiero F, Cacace J, Sadeghian H, Lippiello V (2014) Impedance control of VToL UAVs with a momentum-based external generalized forces estimator. In: IEEE international conference on robotics and automation, pp 2093–2099 18. Invernizzi D, Giurato M, Gattazzo P, Lovera M (2018) Full pose tracking for a tilt-arm quadrotor UAV. In: Conference on control technology and applications. IEEE, pp 159–164 19. Kamel M, Verling S, Elkhatib O, Sprecher C, Wulkop P, Taylor Z, Siegwart R, Gilitschenski I (2018) The voliro omniorientational hexacopter: an agile and maneuverable tiltable-rotor aerial vehicle. IEEE Robot Autom Mag 25(4):34–44 20. Ryll M, Muscio G, Pierri F, Cataldi E, Antonelli G, Caccavale F, Bicego D, Franchi A (2019) 6d interaction control with aerial robots: the flying end-effector paradigm. Int J Robot Res 38(9):1045–1062 21. Cacace J, Finzi A, Lippiello V, Furci M, Mimmo N, Marconi L (2016) A control architecture for multiple drones operated via multimodal interaction in search rescue mission. In: IEEE international symposium on safety, security, and rescue robotics, pp 233–239 22. Medeiros ACS, Ratsamee P, Uranishi Y, Mashita T, Takemura H (2020) Human-drone interaction: using pointing gesture to define a target object. In: Kurosu M (ed) Human-computer interaction. Multimodal and natural interaction. Springer, Cham, pp 688–705 23. Tezza D, Garcia S, Hossain T, Andujar M (2019) Brain eRacing: an exploratory study on virtual brain-controlled drones. In: Chen J, Fragomeni G (ed) Virtual, augmented and mixed reality. Applications and case studies (Lecture notes in computer science), vol 11575. Springer, Cham, pp 150–162 24. Cacace J, Finzi A, Lippiello V (2017) A robust multimodal fusion framework for command interpretation in human-robot cooperation. In: 2017 26th IEEE international symposium on robot and human interactive communication, pp 372–377
346
S. M. Orozco-Soto et al.
25. Jane LE, Ilene LE, Landay JA, Cauchard JR (2017) Drone & Wo: cultural influences on humandrone interaction techniques. In: CHI conference on human factors in computing systems, pp 6794–6799 26. Tezza D, Andujar M (2019) The state-of-the-art of human-drone interaction: a survey. IEEE Access, vol 7, pp 167 438–167 454 27. Selvaggio M, Cognetti M, Nikolaidis S, Ivaldi S, Siciliano B (2021) Autonomy in physical human-robot interaction: a brief survey. IEEE Robot Autom Lett 6(4):7989–7996 28. Abtahi P, Zhao DY, E JL, Landay JA (2017) Drone near me: exploring touch-based humandrone interaction. Proc ACM Interactive Mobile Wearable Ubiquitous Technol 1(3) 29. Lieser M, Schwanecke U, Berdux J (2021) Tactile human-quadrotor interaction: Metrodrone. In: Proceedings of the fifteenth international conference on tangible, embedded, and embodied interaction, pp 1–6 30. Rajappa S, Bülthoff H, Stegagno P (2017) Design and implementation of a novel architecture for physical human-UAV interaction. Int J Robot Res 36(5–7):800–819 31. Augugliaro F, D’Andrea R (2013) Admittance control for physical human-quadcopter interaction. In: European control conference, pp 1805–1810 32. Tognon M, Alami R, Siciliano B (2021) Physical human-robot interaction with a tethered aerial vehicle: application to a force-based human guiding problem. IEEE Trans Robot 1–12 33. Cuniato E, Cacace J, Selvaggio M, Ruggiero F, Lippiello V (2021) A hardware-in-the-loop simulator for physical human-aerial manipulator cooperation. In: International conference on advanced robotics 34. Furrer F, Burri M, Achtelik R, Aand Sigwart M (2016) RotorS—a modular Gazebo MAV simulator framework. In: Robot operating system (ROS). Studies in computational intelligence, vol 625. Springer, Cham 35. Madani T, Benallegue A (2006) Backstepping control for a quadrotor helicopter. In: International conference on intelligent robots and systems, pp 3255–3260 36. Murray R, Li Z, Sastry S (2017) A mathematical introduction to robotic manipulation. CRC Press, Boca Raton, FL 37. Lee T, Leok M, McClamroch N (2010) Geometric tracking control of a quadrotor UAV on S E(3). In: Conference on decision and control, pp 5420–5425 38. Slotine J, Li W (1987) On the adaptive control of robot manipulators. Int J Robot Res 6(3):49–59 39. Berghuis H, Nijmeijer H (1993) A passivity approach to controller-observer design for robots. IEEE Trans Robot Autom 9:740–754 40. Ferraguti F, Secchi C, Fantuzzi C (2013) A tank-based approach to impedance control with variable stiffness. In: IEEE international conference on robotics and automation, pp 4948–4953 41. Selvaggio M, Robuffo Giordano P, Ficuciello F, Siciliano B (2019) Passive task-prioritized shared-control teleoperation with haptic guidance. In: 2019 IEEE international conference on robotics and automation, pp 430–436 42. Ruggiero F, Trujillo M, Cano R, Ascorbe H, Viguria A, Peréz C, Lippiello V, Ollero A, Siciliano B (2015) A multilayer control for multirotor UAVs equipped with a servo robot arm. In: IEEE international conference on robotics and automation, pp 4014–4020 43. Khalil HK (2002) Nonlinear systems, 3rd edn. Prentice-Hall, Upper Saddle River, NJ 44. W. R. Inc. (2021) Mathematica, Version 12.3.1, Champaign, IL. [Online]. https://www. wolfram.com/mathematica
12 Aerial Manipulator Interaction with the Environment
347
Santos M. Orozco-Soto Santos Miguel Orozco Soto received the Ph. D. degree in Automatic Control from the Research and Advanced Studies Center of the National Polytechnic Institute, Mexico City, in 2020. His research consisted of developing control systems for multiple robotic platforms. He currently holds a postdoctoral position at the Consorzio CREATE in Naples, Italy, where he carries out scientific research regarding the control of aerial manipulators. His research interests are automatic control and its applications. Eugenio Cuniato received an M.Sc. degree in automation engineering from the University of Naples “Federico II” in 2021. He focused on linear and nonlinear control of robotic systems during his studies, especially manipulators and aerial vehicles. He is now pursuing a Ph.D. degree at the Autonomous Systems Lab at ETH Zurich. His main area of research regards the control of aerial manipulators for safe aerial physical interaction. Jonathan Cacace received the Master’s degree (magna cum laude) in computer science from the University of Naples “Federico II” in 2012 and the Ph.D. degree in Robotics in 2016 from the same institution. Currently, he is an Assistant Professor at the University of Naples “Federico II”. He is involved in several research projects in Human–Robot Interaction in Industry 4.0, autonomous control of UAVs for inspection and maintenance, and robotic manipulation. Mario Selvaggio received the Ph.D. degree in information technology and electrical engineering from the University of Naples “Federico II”, Naples, Italy, in 2020. He has been Visiting Student with IRISA, INRIA Rennes, Rennes, France, in 2017 and 2018; Visiting Student with the University of California Santa Barbara, Santa Barbara, California, US, in 2019. Since 2020, he has been working as a Postdoctoral Researcher in the Electrical Engineering and Information Technology department of the University of Naples “Federico II”, Naples, Italy. He has published more than 35 peer-reviewed journal articles, conference papers, and book chapters. Fabio Ruggiero received the Ph.D. in Computer and Automation Engineering from the University of Naples “Federico II” in 2010. He spent 7 months at Northwestern University as a Visiting Scholar from September 2009 to March 2010. After several postdoctoral positions from 2011 to 2016, he is an Associate Professor at the University of Naples “Federico II”. His research interests are control strategies for dexterous, dual-hand and non-prehensile robotic manipulation, aerial robots, aerial manipulators, and legged robots. He has co-authored about 70 journal papers, book chapters, and conference papers. Vincenzo Lippiello received his Laurea degree in Electronic Engineering and the Ph.D. degree in Information Engineering from the University of Naples “Federico II” in 2000 and 2004, respectively. He is a Professor of Automatic Control in the Department of Electrical Engineering and Information Technology, University of Naples “Federico II”. His research interests include visual servoing of robot manipulators, aerial robotics, hybrid visual/force control, adaptive control, grasping and manipulation, and visual object tracking and reconstruction. He has published more than 150 journal and conference papers and book chapters. Bruno Siciliano is the Director of ICAROS Center and Coordinator of PRISMA Lab at University of Naples “Federico II”. He is Honorary Professor at Óbuda University, where he holds the Kalman Chair. Fellow of the scientific societies IEEE, ASME, IFAC, he received numerous international prizes and awards, and he was President of the IEEE Robotics and Automation Society from 2008 to 2009. He has delivered more than 150 keynotes and has published more than 300 papers and 7 books. His book “Robotics” is among the most adopted academic texts worldwide, while his edited volume “Springer Handbook of Robotics” received the highest recognition for scientific publishing: 2008 PROSE Award for Excellence in Physical Sciences & Mathematics. His research team got more than 20 projects funded by the European Union in the last 15 years, including an Advanced Grant from the European Research Council.
Chapter 13
Conclusion and Future Research Andrea L’Afflitto, Gokhan Inalhan, and Hyo-Sang Shin
13.1 Conclusion After a brief introduction in Chap. 1 to the role of control theory in the design of autopilots for autonomous UAVs, Chaps. 2–12 of this book provided snapshots of the state of the art in control theory and its applications to the design of autonomous fixedwing and multi-rotor aircraft. Although this work was not meant to be exhaustive, we are confident that it provided the readers with sufficient breadth and depth to appreciate the stunning abilities of UAVs for civilian applications and their potential in the near future. Furthermore, for their tutorial style and detailed discussions, each of these chapters has provided the readers with solid starting points for their own literature reviews and for their own lines of research. Revolving around optimal control theory and differential game theory, Chaps. 2–5 discussed modern solutions to the guidance problem for UAVs, that is, the problem of autonomously defining reference paths and trajectories that meet userdefined performance criteria while meeting the mission objectives. Chapters 6 and 7 addressed two key problems in modern navigation for UAVs, namely, following plumes of material dispersed in the atmosphere and how to employ GPS antenna in UAV navigation, while avoiding common problems related to the misuse of the lack of availability of this technology at all times. Finally, Chaps. 8–12 showed multiple nonlinear control techniques and their applications to autonomous UAVs. Despite classical techniques, those discussed in the third part of this book show how the A. L’Afflitto (B) Virginia Tech, Blacksburg, VA, USA e-mail: [email protected] G. Inalhan · H.-S. Shin Cranfield University, Cranfield, United Kingdom e-mail: [email protected] H.-S. Shin e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 A. L’Afflitto et al. (eds.), Control of Autonomous Aerial Vehicles, Advances in Industrial Control, https://doi.org/10.1007/978-3-031-39767-7_13
349
350
A. L’Afflitto et al.
nonlinear nature of UAVs’ dynamics can be leveraged to perform complex tasks despite uncertainties and external disturbances.
13.2 Future of Research on Control Systems for UAVs Although fundamental knowledge developed to design, produce, and employ manned aircraft has been instrumental to jump-start research in UAV-related areas, the transfer of knowledge from research in unmanned aerial systems to manned aerial vehicles has been considerably less preponderant. The reasons for this imbalance are to be sought not only in the fact that UAVs have been investigated for a shorter time span, but also in the tight regulations constraining manned aircraft for civilian use. As a consequence, in the past two decades, research and practice in classical aeronautical engineering and UAVs proceeded substantially in parallel. More recently, considerable attention has been placed by FAA (Federal Aviation Administration) and NASA (National Aeronautics and Space Administration) in the areas of urban air mobility and advanced air mobility. With these efforts, considerable momentum is being generated to design, produce, and implement aerial vehicles that are autonomous; able to transport a limited number of passengers or goods; and operating in and across urban and sub-urban environments, while being perfectly integrated with the National Airspace System and exploiting existing cyber-physical infrastructures such as air traffic control services and helicopter pads. Aircraft that meet the demanding requirements of the urban air mobility and the advanced air mobility paradigms will be the result of considerable effort at the intersection of classical research in aeronautical engineering and autonomous UAVs. Some of the results presented in this book, such as those presented in Chaps. 4 and 6, directly contribute to this futuristic direction. Other results, such as those presented in Chaps. 2, 3, and 5, present viable solutions to the guidance problem that can play pivotal roles in the design of autopilots for semi-autonomous aircraft to be employed in urban environments. Similarly, Chaps. 7–10 provide outstanding tools to exploit the dynamics of aerial vehicles and guarantee high levels of performance despite considerable uncertainties, disturbances, and possible faults and failures, and assuring the highest safety standards is indispensable to realizing urban air mobility and advanced air mobility concepts. As a final remark, it is worthwhile noting that there is a considerable unbalance between academic research on autonomous UAVs and industrial practices involving aerial autonomy. Whereas experiments in controlled, indoor, or outdoor environments show the effectiveness and the applicability of the theoretical results being developed in academia, there is a compelling need for advanced verification and validation methods to certify the effectiveness of novel guidance, navigation, and control methods; assess their potential and limitation; and quantify the associated risks and benefits. Without such results, the so-called “valley of death”, that is, the gap between academic research and industrial practices, will continue widening.
Index
Symbols A∗ , 37 D ∗ , 32 lite, 32 L P A∗ , 44
A Adaptive control, 304 Advanced air mobility, 350 Aerial manipulator, 319
B Backstepping control, 231, 234 Battery dynamics, 108 Bayes filter, 144 Bézier curve, 187
C Camera field of view, 46 Climb, 117 Cooperative output regulation, 293 Costate equations, 18, 90 Covariance, 153 equation, 172 Coverage problem, 33 Cruise, 117 Cyber-security, 168
D Descent, 120 Directed graph, 297 Drag force, 93, 109, 201
Drag moment, 200 Dynamic equations, 54, 92, 169, 200, 201, 233, 323, 331 feedback linearized, 56 Dynamic programming adaptive, 273 differential, 78, 81 flexible final time constrained, 79, 84
E Entropy, 137, 148 Entrotaxis, 137, 152 Event-triggered control, 273, 284 Example constrained guidance, 92 exerting forces, 327, 341 GPS spoofing avoidance, 187 human–aerial robot interaction, 341 lateral-directional control, 271 longitudinal control, 271 multi-vehicle system control, 308 nonlinear guidance, 22 position estimation, 181 pursuit and line of sight law, 23 reckless guidance, 70 source identification, 155 tactical guidance, 72 track guidance, 21 trajectory tracking, 217, 249, 255, 327 turbofan aircraft, 125 turboprop aircraft, 126 Explainable global dual heuristic dynamic programming, 276 Explore-then-exploit, 32
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 A. L’Afflitto et al. (eds.), Control of Autonomous Aerial Vehicles, Advances in Industrial Control, https://doi.org/10.1007/978-3-031-39767-7
351
352 F Fixed-wing aircraft, 92, 233, 271 all electric, 124 turbofan, 122 turbofprop, 123 turbojet, 124 Flight management system, 106
G Goal point selection, 36 GPS, 31, 167 denied, 31, 168 spoofing, 167 detection, 168 distorsion, 168 Gyroscopic moment, 200 reduced, 202
H Hamiltonian function, 17, 114, 270 Heuristic-based problem, 43
I Incremental backstepping control, 231, 235 Incremental nonlinear dynamic inversion, 196, 202, 231 Inertial measurement unit, 169, 204 Infotaxis, 137, 150 cooperative, 158 rapidly-exploring random tree, 157 receding horizon, 155
J Jacobian matrix, 18
K Kalman filter, 145, 171, 172 unscented, 168, 174 Kinematic equations, 54, 200, 201, 233, 323, 331 planar, 10 relative, 12
L Lagrange multiplier, 17, 88 Linear matrix inequality region, 296 Linear quadratic, 7 regulator, 7
Index tracker, 7 Lyapunov algebraic equation, 337 function, 234, 306, 337 inequality, 303
M Model predictive control, 8, 32, 171, 177, 179, 180 nonlinear, 8 stochastic, 179
O Optimal control, 7 boundary conditions, 69, 113 constraint, 16, 58, 63, 80, 113 cost function, 16, 65, 80, 270 augmented, 17 barrier, 67 maximum endurance, 105, 112 safety, 65 cost to go, 81 hybrid, 105, 111 performance index, 112 problem, 16, 113 Optimal endurance, 121 Optimization cost to come, 43 Heuristic function, 43 weighing function, 43
P Particle filter, 136, 145 Rao–Blackwellized, 138, 147 Path planning, 16, 43, 150, 158 Persistence of excitation, 276 Plume model Gaussian, 141 isotropic, 142 Proportional derivative control, 181 Proportional integral derivative control, 8, 197, 216, 276 Proportional navigation guidance, 8 Pursuit and line of sight law, 23
Q Quadcopter, 34, 155, 181, 199 system of, 187, 308
Index R Rapidly-exploring random tree, 150, 157 Receding horizon, 8, 150, 155
S Sensor Model, 142 Gauss, 143 Poisson, 142 Sliding mode control second-order, 320 super-twisting, 322 surface, 326 Socialtaxis, 138, 160 improved, 138, 161 Source cognitive search, 136 estimation, 136 pre-planned search, 136 reactive search, 136 Spanning tree, 296 Stability analysis, 204, 245, 254, 260, 304, 325, 333
353 Storage function, 334
T Thrust force, 54, 201 realization, 54, 217 Trajectory planning, 50, 79, 112, 182
U Unmanned ground vehicle, 308 Urban air mobility, 350
V Voxel, 32 explored, 35 occupied, 34 unexplored, 35 unoccupied, 34
Z Zeno phenomenon, 285