English Pages 163 [164] Year 2023
Particle Acceleration and Detection
Zheqiao Geng Stefan Simrock
Intelligent Beam Control in Accelerators
Particle Acceleration and Detection Series Editors Alexander Chao, SLAC, Stanford University, Menlo Park, CA, USA Katsunobu Oide, KEK, High Energy Accelerator Research Organization, Tsukuba, Japan Werner Riegler, Detector Group, CERN, Geneva, Switzerland Vladimir Shiltsev, Accelerator Physics Center, Fermi National Accelerator Lab, Batavia, IL, USA Frank Zimmermann, BE Department, ABP Group, CERN, Geneva, Switzerland
The series “Particle Acceleration and Detection” is devoted to monograph texts dealing with all aspects of particle acceleration and detection research and advanced teaching. The scope also includes topics such as beam physics and instrumentation as well as applications. Presentations should strongly emphasize the underlying physical and engineering sciences. Of particular interest are – contributions which relate fundamental research to new applications beyond the immediate realm of the original field of research – contributions which connect fundamental research in the aforementioned fields to fundamental research in related physical or engineering sciences – concise accounts of newly emerging important topics that are embedded in a broader framework in order to provide quick but readable access of very new material to a larger audience The books forming this collection will be of importance to graduate students and active researchers alike.
Zheqiao Geng RF Section Accelerator Technology Department Large Research Facilities Division Paul Scherrer Institut Villigen PSI, Switzerland
Stefan Simrock SCOP, SCOD, Controls Division ITER Organization St. Paul-lez-Durance, France
ISSN 1611-1052 ISSN 2365-0877 (electronic) Particle Acceleration and Detection ISBN 978-3-031-28596-7 ISBN 978-3-031-28597-4 (eBook) https://doi.org/10.1007/978-3-031-28597-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
For Yingxin and Xintian. Zheqiao Geng For Josef and Maria. Stefan Simrock
Preface
Operating a particle accelerator to provide high-quality particle or photon beams is challenging due to the large number of complex subsystems with many dependencies and the large number of parameters to be controlled. An automated and intelligent beam control system is essential for successful beam operation. The primary tasks of beam control include beam setup, stabilization, and optimization. Beam setup establishes the desired beam parameters, beam stabilization stabilizes the beam parameters against disturbances, and beam optimization promotes the accelerator’s performance. A general beam control system consists of many parts, such as device controllers (e.g., low-level RF, magnet controllers, beam diagnostic controllers), beam feedback controllers, physics applications, beam optimizers, typically designed by different teams. Integrating these parts requires a wide range of cross-domain knowledge, such as control theory, optimization, beam physics, systems/software engineering. The primary goal of this book is to provide an overview of the basic architecture and algorithms of beam controls. It covers the topics of static-system feedback control, blackbox optimization, and machine learning applications in feedback and optimization. We limit our discussions to fundamental concepts, basic methodologies, and simple algorithms without diving deeply into the theory. This helps the readers capture the essential relationship between these algorithms without getting lost in math. Optimization and machine learning are hot topics in accelerator controls. Many advanced algorithms are being developed with excellent results. This book cannot cover all these developments, for which the reader is referred to further literature. However, the basic algorithms covered in this book are a good basis for understanding, applying, and developing advanced algorithms. This book is aimed at accelerator control scientists and engineers at the postgraduate level. 
They should have basic knowledge of beam dynamics, college mathematics (calculus, linear algebra, probability theory), and control theory. The book is written with minimum math and provides many methods and results that can be applied directly in engineering practice. The book consists of four chapters. Chapter 1 establishes a layered architecture for beam control. It helps clarify the intrinsic relationship between different parts of a beam control system. Chapter 2 discusses the beam feedback control, focusing on the
design of static controllers based on beam response matrices. Chapter 3 introduces several widely used blackbox optimization algorithms that have proven successful in accelerator beam optimization. In Chap. 4, we provide an overview of machine learning and summarize its applications in accelerator beam feedback and optimization. Many attempts are being made to apply different machine learning algorithms to accelerators, but we discuss only several of the most successful ones, such as neural networks and Gaussian processes. Finally, the basic concepts of reinforcement learning are introduced with a demonstration of designing linear optimal controllers. We want to thank Sergey Tomin (European XFEL) for providing some materials used in the book. We would also like to thank Alexander Wu Chao (SLAC/Stanford University) for proposing this book to Springer, which has greatly motivated us to finish this book. Finally, we thank Springer staff, in particular Hisako Niko and Padma Subbaiyan, for their help and support.
Villigen PSI, Switzerland St. Paul-lez-Durance, France
Zheqiao Geng Stefan Simrock
Contents
1 Introduction
  1.1 Overview of Beam Controls
    1.1.1 Beam Control Tasks
    1.1.2 Beam Control Methods
  1.2 Beam Control System
    1.2.1 Hierarchy of Beam Control System
    1.2.2 Beam Device Layer
    1.2.3 Instrumentation Layer
    1.2.4 Global Control Layer
    1.2.5 Global Optimization Layer
    1.2.6 Role of Machine Learning
    1.2.7 SwissFEL Two-Bunch Operation
  References
2 Beam Feedback Control
  2.1 Beam Feedback Control Overview
  2.2 Beam Feedback Control Analysis
    2.2.1 Plant Characteristics
    2.2.2 Static and Dynamical Controllers
    2.2.3 Local and Global Control Loops
  2.3 Beam Response Matrix
    2.3.1 Response Matrix Identification
    2.3.2 Singular Value Decomposition
    2.3.3 Response Matrix Uncertainties
  2.4 Static Linear Feedback Controller Design
    2.4.1 Difficulties in Response Matrix Inversion
    2.4.2 Matrix Inversion with SVD
    2.4.3 Matrix Inversion with Least-Square Method
    2.4.4 Robust Control Design
    2.4.5 SwissFEL Bunch2 Feedback Control
  2.5 Further Reading and Outlook
  References
3 Beam Optimization
  3.1 Beam Optimization Overview
    3.1.1 Optimization Problems in Beam Controls
    3.1.2 Formulation of Optimization Problems
    3.1.3 Noise in Online Optimization Problems
  3.2 Optimization Algorithms
    3.2.1 Overview of Optimization Algorithms
    3.2.2 Test Function
    3.2.3 Spontaneous Correlation Optimization
    3.2.4 Random Walk Optimization
    3.2.5 Robust Conjugate Direction Search
    3.2.6 Genetic Algorithm
    3.2.7 Particle Swarm Optimization
    3.2.8 Comparison of Optimization Algorithms
  3.3 Beam Optimization Examples and Tools
    3.3.1 Practical Considerations
    3.3.2 FEL Optimization with SCO
    3.3.3 Operating Point Changing
    3.3.4 Optimization Software Tools
  3.4 Further Reading and Outlook
  References
4 Machine Learning for Beam Controls
  4.1 Introduction to Machine Learning
    4.1.1 Machine Learning Algorithms
    4.1.2 Machine Learning Models
    4.1.3 Machine Learning Workflow
    4.1.4 Machine Learning Processes
  4.2 Accelerator Modeling with Machine Learning
    4.2.1 Neural Network Regression Model
    4.2.2 Gaussian Process Regression Model
  4.3 Applications of Machine Learning Models in Beam Controls
    4.3.1 Surrogate Models of Beam Responses
    4.3.2 Response Matrix Estimation with Neural Network Surrogate Models
    4.3.3 Beam Optimization with Neural Network Surrogate Models
    4.3.4 Feedforward Control with Neural Network Surrogate Models
    4.3.5 Beam Optimization with GP Surrogate Models
  4.4 Feedback Control with Reinforcement Learning
    4.4.1 Introduction to Reinforcement Learning
    4.4.2 Feedback Controller Design with Natural Actor-Critic Algorithm
    4.4.3 Example: RF Cavity Controller Design
    4.4.4 Example: Static Feedback Controller Design
  4.5 Further Reading and Outlook
  References
Index
Abbreviations
AC: Alternating-Current
ADRC: Active Disturbance Rejection Control
AI: Artificial Intelligence
BAM: Bunch Arrival Time Monitor
BC: Bunch Compressor
BCM: Bunch Compression Monitor
BFB: Beam Feedback
BPM: Beam Position Monitor
BSDAQ: Beam Synchronous Data Acquisition
CDR: Coherent Diffraction Radiation
CW: Continuous-Wave
DAC: Digital-to-Analog Converter
DC: Direct-Current
DMD: Dynamic Mode Decomposition
EA: Evolutionary Algorithm
EI: Expected Improvement
EPICS: Experimental Physics and Industrial Control System
ETFE: Empirical Transfer Function Estimate
FEL: Free Electron Laser
FIR: Finite Impulse Response
FPGA: Field Programmable Gate Array
GA: Genetic Algorithm
GP: Gaussian Process
GP-LCB: Gaussian Process Lower Confidence Bound
GPO: Gaussian Process Optimization
GPU: Graphics Processing Unit
HPRF: High-Power Radio Frequency
HV: High Voltage
I/O: Input/Output
ICT: Integrating Current Transformer
IIR: Infinite Impulse Response
IMC: Internal Model Control
LH: Laser Heater
LLRF: Low-Level Radio Frequency
LQG: Linear Quadratic Gaussian
LQR: Linear Quadratic Regulator
LTI: Linear Time-Invariant
MAE: Mean Absolute Error
MC: Monte-Carlo
MG-GPO: Multi-Generation Gaussian Process Optimization
MIMO: Multiple-Input Multiple-Output
MISO: Multiple-Input Single-Output
ML: Machine Learning
MLE: Maximum Likelihood Estimation
MOPSO: Multi-Objective Particle Swarm Optimization
MSE: Mean Square Error
MVN: Multivariate Normal Distribution
NAC: Natural Actor-Critic
NN: Neural Network
NSGA-II: Non-dominated Sorting Genetic Algorithm II
PBIG: Photon-Beam-Intensity Gas-monitor
PBPG: Photon-Beam-Position Gas-monitor
PCA: Principal Component Analysis
PDF: Probability Distribution Function
PI: Probability of Improvement
PID: Proportional-Integral-Derivative
PSO: Particle Swarm Optimization
PSSS: Photon Single-Shot Spectrometer
RBF: Radial-Basis Function
RCDS: Robust Conjugate Direction Search
RF: Radio Frequency
RL: Reinforcement Learning
RMS: Root Mean Square
RWO: Random Walk Optimization
SASE: Self-Amplified Spontaneous Emission
SCO: Spontaneous Correlation Optimization
SISO: Single-Input Single-Output
SRM: Synchrotron Radiation Monitor
SVD: Singular Value Decomposition
SVM: Support Vector Machine
SW: Standing-Wave
SwissFEL: Swiss Free-Electron Laser, an FEL machine in Switzerland
TD: Temporal Difference
TESLA: TeV-Energy Superconducting Linear Accelerator
TW: Traveling-Wave
Chapter 1
Introduction
Abstract
Particle accelerators are widely used for high-energy physics study, medical treatment, and photon beam generation in synchrotron radiation and free-electron laser machines. Optimal and stable particle beams are essential for achieving the physics goals. Beam control algorithms and subsystems are critical in designing and operating an accelerator. This chapter gives an overview of the concepts and architecture of beam controls. We introduce a four-layer architecture as a guideline for developing the subsystems related to beam controls, such as beam device control, beam feedback, and beam optimization. We also highlight the role of machine learning in beam controls. Finally, we briefly introduce the two-bunch operation of SwissFEL since many examples in this book are based on it.
1.1 Overview of Beam Controls
Accelerators produce beams of charged particles, such as electrons, protons, and heavy ions, which may generate secondary products like photons, neutrons, and muons. Charged particle beams are characterized by beam energy, bunch length, beam current, emittance, beam position, bunch arrival time, etc. In contrast, a photon beam is typically described by its wavelength, spectral bandwidth, pulse energy, etc. Different accelerators impose different requirements on the values and stability of these parameters, which drive the accelerator design and guide the development of beam controls. The term beam control represents the activities and entities to detect and manipulate the beam parameters. The accelerator subsystems related to beam controls are within the scope of a super system named beam control system. This book discusses the general concepts, algorithms, and structures of beam controls. We will use electron Linac-based free electron laser (FEL) machines as examples (Altarelli et al. 2006; Stohr et al. 2011; Milne et al. 2017). Of course, the results are also applicable in other types of accelerators like storage rings, proton Linacs, or cyclotrons.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Geng and S. Simrock, Intelligent Beam Control in Accelerators, Particle Acceleration and Detection, https://doi.org/10.1007/978-3-031-28597-4_1
1.1.1 Beam Control Tasks
The primary tasks of beam controls include beam setup, optimization, and stabilization. All these tasks observe the beam parameters and adjust the settings of different accelerator subsystems. For an FEL machine based on electron Linacs, subsystems affecting the beam include the radio frequency (RF) Gun laser generating electron bunches, the RF system producing accelerating or deflecting electric fields, and various magnets for focusing the beam or guiding the beam orbit. In most cases, the same subsystem parameters are adjusted for setting up, optimizing, and stabilizing a given beam parameter. The choice of subsystems and parameters is based on their sensitivity for controlling the beam parameters.
Beam Setup is a basic task of beam operation for establishing beams with desired parameters (Loos et al. 2008; Schietinger et al. 2016). It is executed in two scenarios: setting up the beam from scratch when starting up the accelerator or adjusting the beam parameters to different setpoints. The latter scenario is also called operating point changing. A stable machine state, including a set of particular values of the beam parameters and the corresponding subsystem settings, is defined as an operating point. Here are several examples of beam setup tasks for an FEL machine:
• Set up desired bunch charge by adjusting the RF Gun laser intensity.
• Set up desired beam energy by adjusting the RF amplitudes and phases.
• Set up desired bunch compression by adjusting the bunch compressor chicane and the RF amplitudes and phases.
• Set up desired beam optics (i.e., Twiss function) by adjusting the quadrupoles.
• Set up desired photon wavelength by adjusting the undulator gaps or electron beam energy.
For setting up the secondary beam (e.g., FEL photon) parameters, the particle beam parameters (e.g., beam energy) may be used as knobs, which are further controlled by the corresponding accelerator subsystems (e.g., RF system).
Beam Optimization improves the beam performance after the accelerator is set up with desired beam parameters (Huang 2020). It fine-tunes the beam toward better quality by minimizing a cost, that is, a performance criterion defined as a function of the beam parameters (the cost function). For example, we need to accomplish the following optimization tasks for an FEL machine:
• Minimize the electron bunch energy spread at the RF Gun exit.
• Minimize the beam emittance at the RF Gun exit.
• Minimize the beam loss in Linacs, bunch compressors, and undulators.
• Maximize the overall photon energy in an FEL pulse (i.e., FEL pulse energy).
These are complex optimization tasks that require adjusting multiple subsystems. For example, to maximize the FEL pulse energy, most of the subsystems mentioned above, such as the RF Gun laser, RF stations, and magnets, may require adjustment. The magnets are adjusted either directly (e.g., Gun solenoid and bunch compressor
dipoles) or indirectly by changing the setpoints of the beam optics or orbits. Typically, the optics or orbit feedback will automatically set the corresponding magnets (e.g., quadrupoles or orbit correctors). Some beam setup tasks, especially the operating point changing tasks, can also be solved with the optimization methods. In this case, the cost is defined as the difference between the beam parameters and their desired values.
Beam Stabilization is required for maintaining the beam quality against drifts (Steinhagen 2016). After being optimized, beam parameters may be driven away from their optimal values by external disturbances. In an FEL machine, examples of disturbances include:
• Fluctuations of RF Gun laser intensity, arrival time, and transverse orbit.
• Fluctuations of RF field amplitude and phase in accelerating cavities.
• Fluctuations of magnet drive currents.
• Phase drift introduced by the timing and synchronization system.
• Mechanical vibrations caused by cooling pumps coupled to RF cavities.
• Coupling of disturbances of alternating-current (AC) mains (50 or 60 Hz) into electronics.
These disturbances cause fluctuations in beam parameters. Slow fluctuations are often denoted as drift, and fast ones are called jitter. Drift and jitter are relative concepts without concrete boundaries. Typically, slower fluctuations that can be controlled by beam feedback are defined as drifts, mainly caused by temperature, pressure, or humidity changes. Jitter is caused by fast disturbances like electrical noise or mechanical vibrations and is often not controllable by feedback (Simrock and Geng 2022). Here we list some examples of beam stabilization tasks:
• Stabilize the bunch charge by adjusting the RF Gun laser intensity.
• Stabilize the bunch arrival time at the bunch compressor exit by adjusting the upstream RF phases.
• Stabilize the beam energy and bunch length by adjusting the amplitudes and phases of the RF stations upstream of the bunch compressor.
• Stabilize the beam orbit by adjusting the drive currents of corrector magnets.
• Stabilize the FEL photon wavelength by adjusting the electron beam energy entering the undulator.
Stabilizing the beam also includes recovering the beam parameters after a fault in the accelerator that may stop the beam temporarily. The fault may be caused by subsystem failures, such as RF breakdown, inoperable motors, defective magnet power supplies, or by operators.
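The distinction between drift and jitter can be illustrated numerically. The following is a minimal sketch, assuming a recorded time series of one beam parameter and a moving-average filter whose window stands in for the feedback bandwidth. The function name `split_drift_jitter` and the window length are hypothetical choices for illustration; since drift and jitter have no concrete boundary, the split depends on that choice.

```python
import numpy as np

def split_drift_jitter(samples, window):
    """Split a beam-parameter time series into slow and fast parts.

    The moving-average window is an assumed stand-in for the feedback
    bandwidth; it is not a value taken from any real machine.
    """
    samples = np.asarray(samples, dtype=float)
    kernel = np.ones(window) / window
    # Pad with edge values so the filtered series keeps the input length.
    pad_left = window // 2
    pad_right = window - 1 - pad_left
    padded = np.pad(samples, (pad_left, pad_right), mode="edge")
    drift = np.convolve(padded, kernel, mode="valid")  # slow component
    jitter = samples - drift                           # fast residual
    return drift, jitter

# Synthetic example: a slow sinusoidal drift plus fast random jitter.
t = np.arange(2000)
rng = np.random.default_rng(0)
signal = 0.5 * np.sin(2 * np.pi * t / 2000) + 0.05 * rng.standard_normal(t.size)
drift, jitter = split_drift_jitter(signal, window=101)
print("RMS of jitter part:", np.std(jitter))
```

In this picture, beam feedback acts on the slow component, while the fast residual has to be reduced at its source.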
1.1.2 Beam Control Methods
Here we briefly summarize the beam control methods that will be discussed in more detail in the remaining chapters of this book.
1.1.2.1 Methods for Beam Stabilization
Let’s first look at the methods for beam stabilization. To suppress the beam parameter drifts, we apply feedback control. Closed-loop bandwidth is an important criterion for evaluating the feedback control performance, determining the fastest disturbance that the loop can suppress. The achievable closed-loop bandwidth is determined by the loop delay, gain margin, beam detector noise, etc. It must be limited to avoid loop instability or transferring too much detector noise to the beam. Therefore, feedback can only reduce disturbances below a certain frequency limited by the loop delay, gain margin, and phase margin. High-frequency disturbances result in fast beam jitter that is difficult to control. The essential method for reducing beam jitter is to adopt stable components in the accelerator. For example, low-noise RF power sources, quiet cooling pumps, and low-noise magnet power supplies are vital for obtaining intrinsic beam stability. Though high-frequency disturbances often cannot be suppressed by feedback, some may be controllable by feedforward. Feedforward control can reject repetitive or periodic (e.g., sinusoidal) disturbances.
Beam feedback control will be discussed in Chap. 2. Considering the characteristics of the beam feedback loops, we will highlight the feedback control of multiple-input multiple-output (MIMO) static systems. The core step of controller design is to identify and invert a beam response matrix. We introduce several simple but robust methods for inverting ill-conditioned matrices, such as singular value decomposition (SVD), least square with regularization, and robust control.
Chapter 4 discusses the applications of machine learning (ML) to beam controls. If we build a global surrogate model (e.g., a neural network regression model) of the accelerator, we can estimate the response matrix for a new operating point. It helps reconfigure the beam feedback loops for fast operating point changing. This topic is discussed in Sect. 4.3.2.
We also apply ML methods for feedforward control in Sect. 4.3.4. Furthermore, we introduce the reinforcement learning (RL) method for implementing feedback controllers with data. It is an alternative to the traditional model-based controller design methods like loop shaping, internal model control (IMC), and robust control (Skogestad and Postlethwaite 2005). The deep RL method based on deep neural networks may also control nonlinear systems. The applications of RL are discussed in Sect. 4.4.
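The idea of inverting a response matrix for static feedback can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the design procedure of Chap. 2: the plant is assumed static and linear, y = R u + d, the inverse is a truncated-SVD pseudo-inverse, and the controller is a simple integrator. The names `truncated_pinv` and `run_feedback`, the relative tolerance, and the gain are all hypothetical.

```python
import numpy as np

def truncated_pinv(R, rel_tol=1e-2):
    """Pseudo-inverse of a response matrix, dropping small singular values.

    Singular values below rel_tol times the largest one are discarded so
    that an ill-conditioned R does not produce huge actuator corrections.
    """
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    keep = s > rel_tol * s[0]
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return Vt.T @ np.diag(s_inv) @ U.T

def run_feedback(R, y_target, disturbance, gain=0.5, n_steps=30):
    """Integral feedback on an assumed static plant y = R u + d."""
    R_inv = truncated_pinv(R)
    u = np.zeros(R.shape[1])
    for _ in range(n_steps):
        y = R @ u + disturbance                  # measure beam parameters
        u = u + gain * R_inv @ (y_target - y)    # correct actuator settings
    return u, R @ u + disturbance

# Toy 2x2 example with made-up numbers.
R = np.array([[1.0, 0.2], [0.1, 0.8]])
u, y = run_feedback(R, y_target=np.zeros(2), disturbance=np.array([0.3, -0.1]))
print("residual error:", np.abs(y).max())
```

A gain below one trades convergence speed for lower sensitivity to detector noise and to errors in the identified response matrix.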
1.1.2.2 Methods for Beam Optimization
Beam optimization employs particular algorithms to achieve the objectives of minimizing or maximizing some quantities derived from the beam parameters (Huang 2020). Since the precise model between the inputs and the resulting beam parameters is often unknown, direct search or heuristic algorithms are typically adopted for optimization. Chapter 3 covers the topics of online beam optimization. We will introduce several widely used algorithms, such as spontaneous correlation optimization (SCO), random walk optimization (RWO), robust conjugate direction search (RCDS), genetic algorithm (GA), and particle swarm optimization (PSO). Machine learning also enhances the performance of beam optimization. First, directly optimizing the physical system may be time-consuming. By applying the optimization algorithms to the accelerator surrogate model instead, we can quickly obtain a set of quasi-optimal inputs. With these quasi-optimal inputs as the starting points, optimization of the physical system can be significantly accelerated. We will discuss these applications of surrogate models in Sect. 4.3.3. Second, we also introduce Bayesian optimization based on the Gaussian process models of the machine. It is a direct application of machine learning to optimizations. See Sect. 4.3.5.
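To make the notion of a direct search concrete, here is a minimal sketch of a random-walk style optimizer that needs only cost evaluations and no model of the machine. It is a generic accept-if-better scheme written for illustration and is not claimed to match the RWO algorithm presented in Chap. 3. The function name `random_walk_minimize`, the step size, and the toy quadratic cost are assumptions.

```python
import numpy as np

def random_walk_minimize(cost, x0, step=0.1, n_iter=500, seed=0):
    """Minimize a blackbox cost by accepting random steps that improve it."""
    rng = np.random.default_rng(seed)
    x_best = np.asarray(x0, dtype=float)
    f_best = cost(x_best)
    for _ in range(n_iter):
        # Propose a random perturbation of the current best inputs.
        x_try = x_best + step * rng.standard_normal(x_best.shape)
        f_try = cost(x_try)
        if f_try < f_best:  # keep the trial only if the cost decreases
            x_best, f_best = x_try, f_try
    return x_best, f_best

# Toy cost: squared distance of two "knobs" from a made-up optimum.
optimum = np.array([1.0, -2.0])
cost = lambda x: float(np.sum((x - optimum) ** 2))
x_best, f_best = random_walk_minimize(cost, x0=np.zeros(2))
print("best inputs:", x_best, "cost:", f_best)
```

On a real machine each cost evaluation is a noisy beam measurement, which is one reason why starting from quasi-optimal inputs found on a surrogate model can save considerable time.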
1.1.2.3 Methods for Beam Setup
For setting up the beam parameters or changing their operating points, we can adopt the following methods:
(a) Determine the required settings of the concerned subsystems (e.g., RF system, laser system, and magnets) by inverting the accelerator models.
(b) Vary the setpoints of the beam feedback loops controlling the beam parameters to be changed.
(c) Determine the required subsystem settings with optimization.
Method (a) needs a valid inverse model of the machine, either a physics model or an empirical model such as a neural network mapping the outputs to the inputs. If such a model exists, the setup can be swift. Method (b) requires the presence of beam feedback. Since linear feedback is based on response matrices around particular operating points, this method cannot make significant setpoint changes if the system is nonlinear. The typical method for changing beam parameters combines methods (a) and (b). The subsystem settings for the new setpoints are first determined by inverting the physics model, which is often available from the beam dynamics simulation. The results may not be accurate but are close to the desired operating point. Then, the beam feedback loops are reconfigured and closed to bring the beam parameters precisely to the desired values. An alternative to method (a) is method (c) based on optimization, which is more helpful when the machine model is unavailable. Applying optimization algorithms directly to the physical system is often too slow for operating point changing during user operation. However, a surrogate model of the machine that predicts the beam parameters for large-range inputs can accelerate the optimization process and help achieve fast operating point changing.
This book will not discuss the beam setup methods based on physics models. They are covered in textbooks on accelerator physics (Wiedemann 2015). We focus on empirical models identified from the input–output data of the accelerator. The outputs are beam parameters and the inputs are accelerator subsystem settings.
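The combination of model inversion and feedback described above can be sketched as follows. This is a toy illustration under explicit assumptions: `toy_machine` is a made-up, weakly nonlinear stand-in for the real accelerator, `TOY_MODEL` is an imperfect linear model of it, and `change_operating_point` is a hypothetical helper name. None of these come from the book.

```python
import numpy as np

def change_operating_point(machine, model, y_target, gain=0.5, n_steps=40):
    """Coarse setup by model inversion, then fine correction by feedback."""
    y_target = np.asarray(y_target, dtype=float)
    # Step 1: invert the (imperfect) linear model for coarse settings.
    u = np.linalg.solve(model, y_target)
    y_coarse = machine(u)
    # Step 2: close a feedback loop that removes the residual error.
    for _ in range(n_steps):
        error = y_target - machine(u)
        u = u + gain * np.linalg.solve(model, error)
    return u, y_coarse, machine(u)

def toy_machine(u):
    """Made-up machine response with a small nonlinearity."""
    return np.array([2.0 * u[0] + 0.1 * u[1] ** 2,
                     0.5 * u[0] + 1.5 * u[1]])

# Linear model that ignores the nonlinear term of the toy machine.
TOY_MODEL = np.array([[2.0, 0.0], [0.5, 1.5]])

u, y_coarse, y_final = change_operating_point(toy_machine, TOY_MODEL, [4.0, 3.0])
print("after model inversion:", y_coarse)
print("after feedback:       ", y_final)
```

The first step lands close to the target but not on it because the model is inexact; the feedback iterations then bring the outputs to the desired values.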
1.2 Beam Control System
This section discusses the beam control system. We introduce a hierarchical architecture consisting of four layers and summarize the functional and performance requirements of each layer.
1.2.1 Hierarchy of Beam Control System
An accelerator consists of many subsystems, such as high-power radio frequency (HPRF), low-level radio frequency (LLRF), Gun laser, magnets (e.g., dipoles in bunch compressors, orbit correctors, quadrupoles, and sextupoles), power supplies, undulators, cooling system, vacuum system, and beam diagnostics. Some are directly related to beam controls, either measuring the beam parameters or affecting them as actuators for beam controls. The beam control system is an aggregation of all these subsystems. As depicted in Fig. 1.1, we allocate the subsystems of the beam control system into four layers, according to how close they are to the beam:
Fig. 1.1 A 4-layer architecture of accelerator beam controls
The Beam Device Layer collects devices that produce, transmit, accelerate, and diagnose the particle beams. These devices, denoted as beam devices, are essential for an accelerator. Their design and development are driven by beam physics for achieving the desired beam quality (e.g., beam current, energy, bunch length, repetition rate, etc.). Beam devices should be appropriately configured and stabilized for accurate and stable beam operation. Errors in devices manipulating the beam result in deviations in beam parameters. For example, Gun laser power error induces bunch charge deviations, RF amplitude and phase errors introduce variations in beam energy and bunch length, and magnet field errors deform the beam orbit or mismatch the beam. On the other hand, errors in devices diagnosing the beam cause errors in beam parameters if the measurement results are used as inputs for feedback or optimization. Therefore, control instruments are needed to configure, monitor, and stabilize the beam devices.
The Instrumentation Layer consists of control instruments for beam devices. These instruments are responsible for automatically setting up, diagnosing, optimizing, and stabilizing the beam devices. The control system designers introduce this layer with more flexibility in choosing the technologies for the design and implementation. See Sect. 1.2.3 for details.
The Global Control Layer coordinates the instrumentation-layer subsystems for successful beam operation. This layer includes all physics applications for beam setup and all beam feedback control loops. Section 1.2.4 offers more details.
The Global Optimization Layer is the top layer aiming to maximize the accelerator’s performance. Here we run high-level optimization tools. The other three layers with high robustness and automation are the basis for efficient optimization. Some details can be found in Sect. 1.2.5.
After an accelerator facility is constructed, the beam device layer is typically fixed, including its architecture and characteristics. For example, after a superconducting cavity is installed, its sensitivity to mechanical vibrations does not change much under a particular cryomodule design. We have more flexibility in designing the other three layers. This book focuses more on the global control and global optimization layers. Discussing the subsystems in the instrumentation layer requires different domain knowledge. Many books on accelerator instrumentation, including the one on LLRF systems that we published recently, are now available (Simrock and Geng 2022; Strehl 2006). Two other critical topics for beam controls are automation and machine learning. They should be considered in all three upper layers for improving the performance of the subsystems of beam controls.
In summary, the beam device layer is the plant to be controlled, and the instrumentation layer is a basis for beam operation. The global control layer establishes proper and stable beam operation, and the global optimization layer enhances the beam performance. Automation is critical for the successful operation of all three upper layers, and machine learning helps improve the automation level and control performance further.
1 Introduction
1.2.2 Beam Device Layer
Beam devices related to beam controls are categorized into beam actuators and beam detectors. Beam actuators manipulate the beam parameters via their outputs, usually in the form of electromagnetic fields. Beam detectors pick up the beam-induced electrical or optical signals for detecting the beam parameters. When we talk about a beam device, we include not only the parts that directly interact with the beam but also the driving components (e.g., power supplies, power transmission lines, etc.). For example, the entire HPRF station is viewed as a beam device (beam actuator), including the RF amplifier, klystron, high-voltage modulator, waveguides, and cavities. In this example, only the cavities interact with the beam, and all other parts generate the required electromagnetic fields in the cavities.
In an FEL machine, some other beam actuators in addition to the RF system are listed as follows:
• RF Gun laser produces electron bunches via laser (i.e., electromagnetic wave) fields interacting with the RF Gun photocathode (Vicario et al. 2010; Winkelmann et al. 2019).
• Magnets, such as solenoids, dipoles, quadrupoles, etc., guide or focus the electron beam via magnetic fields.
• Undulators produce FEL photon beams by modulating the electron beam orbit with spatially periodic magnetic fields (Howells and Kincaid 1993).
Beam detectors are installed around the beam transmission path to pick up the beam-induced electrical or optical signals. Some beam detectors intercept the beam and thus are invasive to beam operation. For example, screens or wire scanners for transverse beam profile measurement, or RF deflector cavities for bunch length measurement, necessarily disturb the beam. Beam feedback and optimization therefore rely on non-invasive beam detectors that do not affect the beam.
Some typical non-invasive electron beam detectors in an FEL machine are listed below:
• Integrating current transformer (ICT) detects the electrical signal induced by each electron bunch to measure its bunch charge (Belohrad et al. 2014).
• Beam position monitor (BPM) picks up the bunch-induced electrical signals with multiple probes or resonance cavities for measuring the bunch’s transverse position. A BPM can also provide the bunch charge information with a lower resolution compared to the ICT. In a dispersive region, the beam position can also be calibrated as a measurement of the beam energy.
• Bunch arrival time monitor (BAM) compares the electrical signal induced by each bunch to a stable timing reference (as an RF or optical signal) and obtains the relative fluctuations of bunch arrival time (Viti et al. 2017).
• Synchrotron radiation monitor (SRM) detects the synchrotron radiation (optical) signals emitted by the electron bunches passing through a dipole of the bunch compressor chicane. The SRM measures the transverse beam profile and thus monitors the energy chirp of the electron beam during compression (Gerth 2007).
• Bunch compression monitor (BCM) measures the bunch length after a bunch compressor. One of the BCM realizations detects and integrates the spectral intensity of the coherent diffraction radiation (CDR) emitted by the electrons within a specific frequency range (Frei et al. 2013).
Examples of non-invasive photon beam detectors (Juranic et al. 2018) include:
• Gas-based detector measures the photon flux and photon beam position. It is also known as the photon-beam-intensity gas-monitor and the photon-beam-position gas-monitor (PBIG/PBPG).
• Pulse arrival and length monitor uses a THz streak camera to measure the arrival time and pulse length of an FEL beam.
• Photon single-shot spectrometer (PSSS) measures the spectrum of each photon pulse and thus derives the central wavelength and bandwidth of the FEL.
1.2.3 Instrumentation Layer
Subsystems in the instrumentation layer are also called device controllers because they control the beam devices.
1.2.3.1 Categories of Device Controllers
Device controllers are categorized based on the beam devices they control:
Open-loop Beam Actuator Controllers implement local input/output (I/O) controllers to control the beam actuators in open loops. The outputs of such beam actuators are not regulated by feedback but only determined by the control parameter settings. In an FEL machine, the following beam actuators are controlled by open-loop controllers:
• RF Gun laser. Some of its parameters are controlled in open loop, such as the laser delay, laser spot size, pulse shape, etc.
• Magnets. Though the drive currents for magnet coils are typically controlled by feedback, there is no direct control of the magnetic fields. Therefore, magnets are usually viewed as open-loop beam actuators.
• Undulators. They are adjusted by changing the gap distances without directly controlling the magnetic fields.
Open-loop beam actuators have no control of drifts. Therefore, their operating environment should be stabilized with small temperature and humidity fluctuations. Otherwise, active compensations, such as beam feedback or continuous online optimization, should be considered to overcome the drifts.
Closed-loop Beam Actuator Controllers use local feedback controllers to control the beam actuators in closed loops. The outputs of such beam actuators are regulated by feedback, and are adjusted by changing the feedback-loop setpoints. Examples of closed-loop beam actuators include:
• RF Gun laser. Some of its parameters are controlled in closed loops, such as the laser intensity and the laser transverse position on the RF Gun cathode.
• RF system. The amplitude and phase of the RF fields in the cavities are controlled by the LLRF feedback loops.
Beam feedback loops with closed-loop actuators form cascaded loops, which should be designed carefully to avoid instability. A basic guideline is that the inner loops provide fast control while the outer loops implement slow feedback.
Beam Diagnostic Controllers receive the electrical or optical signals picked up by the beam detectors and interpret them as measurements of beam parameters (Schlott et al. 2015). A beam detector (for picking up beam-induced signals) and the corresponding beam diagnostic controller (for interpreting the signals as beam measurements) are tightly connected, and together they are often denoted as a beam diagnostic device.
1.2.3.2 Requirements to Device Controllers
The device controllers should provide some common functions, including:
(a) Configure and optimize the beam devices
This is a collection of domain-specific functional requirements for device controllers. Beam actuators should be configured to produce correct outputs, and beam detector signals should be interpreted as accurate measurements of the beam parameters. The concrete contents of these functional requirements depend on the physical principle of the beam device. Therefore, domain expertise is necessary for developing device controllers. For example, the LLRF controller for an RF station should set up the klystron high voltage, RF pulse shape, and cavity resonance frequency for operating the RF cavity properly. We also need to determine the optimal parameters for maximizing the performance of a beam device. For example, when operating an RF cavity, we need to maximize its stability and minimize the required RF drive power and breakdown rate. Device control optimization is typically done manually using the beam device’s physics model or by scanning its parameters. The optimization algorithms introduced in Chaps. 3 and 4 can be used for optimization problems with many input parameters or complex objectives. For example, automatic optimizers are needed for maximizing the energy gain and power efficiency of a cryomodule containing multiple superconducting cavities (e.g., determining the voltages of different cavities to avoid quenches, thus maximizing the total energy gain).
(b) Stabilize the beam device outputs
This requirement is for the control of closed-loop beam actuators. For example, the LLRF controller stabilizes the RF fields in a cavity with the RF and resonance control loops. The RF Gun laser controller implements a pointing feedback loop to stabilize the laser transverse position on the RF Gun cathode (Alverson et al. 2017).
(c) Calibrate the beam device outputs to physical units and convert the user setting from physical units to values acceptable by the device
The device controller must provide easily understandable interfaces for higher-level users. For example, we can calibrate the drive current of a dispersive dipole magnet against the energy of the beam that passes through its center. When setting up the magnet for a different beam energy, we can directly set the desired beam energy in the magnet controller, which then determines and sets the proper drive current using the calibration factor.
(d) Exception detection and handling
When operating a beam device, the exceptions raised by the device or the device controller itself should be detected and handled automatically. This is important to avoid device damage and improve the machine’s robustness, reliability, and availability. For example, the RF drive power should be cut or reduced after a cavity breakdown or quench to avoid damage. The operation of the beam device should be automatically recovered after the fault is resolved.
(e) Beam synchronous data acquisition (BSDAQ)
Reading and archiving the data of the beam actuators and beam diagnostics for every bunch is vital for making correlations. The data should be correctly timestamped so that the actuation and measurement of each bunch can be identified. BSDAQ is helpful for troubleshooting and is a basis for applying machine learning to beam controls. Machine learning relies on complete and comprehensive data from the inputs and outputs of the systems to be controlled.
(f) Automation
For any device controller, the basic automation procedures that should be implemented include the startup/stop procedure and the fault recovery procedure.
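Requirement (c) can be illustrated with a minimal sketch. It assumes a purely linear relation between the dipole drive current and the beam energy; the class name, the calibration factor, and the current limit are hypothetical values chosen for illustration, not parameters of any real magnet controller. The sketch also rejects abnormal settings, in the spirit of the robustness requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DipoleEnergyCalibration:
    """Hypothetical linear calibration between beam energy and dipole current."""

    amps_per_mev: float     # assumed calibration factor (A/MeV)
    max_current_a: float    # assumed power-supply limit (A)

    def energy_to_current(self, energy_mev: float) -> float:
        """Convert a user setting in MeV to a drive current in A."""
        if energy_mev <= 0.0:
            raise ValueError("beam energy must be positive")
        current = self.amps_per_mev * energy_mev
        if current > self.max_current_a:
            # Reject abnormal settings instead of forwarding them to the device.
            raise ValueError("requested energy exceeds the current limit")
        return current

    def current_to_energy(self, current_a: float) -> float:
        """Convert a read-back current in A to the matched beam energy in MeV."""
        return current_a / self.amps_per_mev


# Illustrative numbers only.
cal = DipoleEnergyCalibration(amps_per_mev=0.5, max_current_a=200.0)
current = cal.energy_to_current(300.0)    # drive current for a 300 MeV beam
energy = cal.current_to_energy(current)   # read-back in physical units
```

A real calibration is usually not a single factor (saturation and hysteresis matter), so a lookup table or fitted curve would replace the linear relation assumed here.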
The key performance requirements for device controllers are reliability, robustness, and repeatability, which are critical for successfully operating an accelerator. The failure rates of device controllers should be minimized for successful beam setup, optimization, and user experiments. In an FEL machine, some user experiments need continuous photon beams for hours, and any interruption in between may cause the experiments to fail. Robustness is also a fundamental requirement. Abnormal user settings should be detected and rejected, and comprehensive exception detection and proper handling should be a priority. For example, a glitch in the clock of the field programmable gate array (FPGA) should not drive the FPGA to a dead state and cause the controller to fail. Good repeatability is another crucial factor in improving the availability of the device. After power cycling or rebooting, a device controller should be able to quickly restore its outputs exactly. For example, we should guarantee that the RF field amplitude and phase are completely repeatable after a reboot or power cycle of the LLRF controller (Geng 2020). Experience shows that the accelerator is nearly inoperable without repeatability because establishing the beam operation again from messy RF phases is challenging and may take hours. Such a delay is unacceptable during user operation whenever the LLRF controller has to be rebooted.
Note that this section only highlights the common functional and performance requirements of device controllers. A detailed requirements analysis is necessary for designing and implementing each specific one. In this book, the subsystems in the beam device layer and instrumentation layer are not our focus, and we only provide a brief overview of them in these two subsections to complete the picture of beam controls. When describing the beam feedback and optimization in the remaining chapters, we often use the terms physical system, input, and output. The term “physical system” is a combination of the beam devices and their controllers related to the feedback or optimization. The term “input” corresponds to the settings of beam actuators, either as control parameter settings for open-loop beam actuators or as setpoints for closed-loop beam actuators. The term “output” stands for the results of beam diagnostics, i.e., the measurements of beam parameters.
1.2.4 Global Control Layer
The global control layer interacts with multiple subsystems in the instrumentation layer to accomplish global beam controls, including beam setup and stabilization. We use physics applications to perform beam setup and implement beam feedback controllers to stabilize the beam.
Physics Applications collect the global procedures for operating the accelerator, including the tools for beam setup, calibration, and complex (invasive) beam parameter measurements. Some examples of the typical physics applications in an FEL machine are listed as follows:
• Set up the electron bunch parameters at the RF Gun exit (e.g., desired bunch charge and energy, and minimum energy spread).
• Set up the electron bunch parameters at the bunch compressor exit (e.g., desired beam energy, bunch length, and longitudinal profile).
• Beam energy management, including changing the beam energy, updating the magnet settings for the new energy, and balancing the energy gains of different RF stations to avoid breakdowns. It also includes the function of compensating for the failed cavities by automatically increasing the energy gain of other cavities or applying a standby cavity.
• Change the machine operating point, like setting the beam for a different energy, current, bunch rate, compression parameters, or FEL wavelength.
• Calibrate the accelerating voltage and beam phase of all RF stations.
• Measure the beam’s longitudinal phase space and slice parameters (e.g., bunch length and longitudinal emittance) with RF deflecting cavities.
• Measure the transverse emittance of the beam.
• Set up the FEL photon parameters (e.g., photon wavelength, bandwidth, and pulse duration).
The physics applications are typically developed by beam physicists or operators for daily beam operation.
Beam Feedback Controllers implement feedback loops for stabilizing the beam parameters. The loops read the measurements of beam parameters from beam diagnostics and process them in feedback controllers. The outputs of the feedback controllers are used to adjust the beam actuators via either the local I/O controllers or the local feedback controllers. Examples of beam feedback loops (Lonza et al. 2010; Fairley et al. 2011) include:
• Bunch charge feedback.
• Bunch arrival time feedback.
• Longitudinal feedback at bunch compressors (beam energy and bunch length).
• Beam orbit feedback.
• FEL pointing feedback.
Beam feedback controllers also implement the functions of identifying the system models (e.g., beam response matrices), configuring the feedback loops, and optimizing the feedback parameters. The algorithms introduced in Chaps. 3 and 4 can also be used for optimizing the feedback loops. The most important performance requirement for the physics applications is the level of automation. Successful beam operation relies on automating these global procedures, including the procedures for beam recovery after faults (e.g., RF interlock trip or temporary beam loss), beam operating point changing, and machine startup after a long-term shutdown. Like the instrumentation-layer subsystems, beam feedback controllers should also focus on reliability, robustness, and repeatability. Beam feedback loops are essential elements for the successful beam operation. Specifically, they should implement comprehensive exception handling (e.g., automatically open the loops when the beam stops and resume after the beam is back) for reliable continuous operation.
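The functions described above (identifying a beam response matrix, running a feedback loop, and opening the loop when the beam stops) can be sketched in a few lines. This is only an illustration under stated assumptions: the plant is a hypothetical static 2-input, 2-output system with made-up numbers, the response matrix is identified by simple finite differences, and the controller is a plain integral controller. The function names are invented for this sketch and do not come from any accelerator control framework.

```python
from typing import Callable, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def identify_response(measure: Callable[[Vector], Vector],
                      u0: Vector, delta: float) -> Matrix:
    """Estimate R[i][j] = dy_i/du_j by stepping one actuator at a time."""
    y0 = measure(u0)
    response = [[0.0] * len(u0) for _ in y0]
    for j in range(len(u0)):
        u = list(u0)
        u[j] += delta
        y = measure(u)
        for i in range(len(y0)):
            response[i][j] = (y[i] - y0[i]) / delta
    return response


def invert_2x2(m: Matrix) -> Matrix:
    """Invert a 2x2 matrix; raise if it is (nearly) singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("response matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def feedback_step(u: Vector, y: Optional[Vector], setpoint: Vector,
                  r_inv: Matrix, gain: float) -> Vector:
    """One integral-feedback update; hold the actuators if there is no beam."""
    if y is None:
        return list(u)
    error = [s - yi for s, yi in zip(setpoint, y)]
    correction = [sum(r_inv[i][j] * error[j] for j in range(len(error)))
                  for i in range(len(u))]
    return [ui + gain * ci for ui, ci in zip(u, correction)]


# Hypothetical static plant y = R u + d, used only to exercise the loop.
TRUE_R = [[2.0, 0.5], [-1.0, 1.5]]
DRIFT = [0.3, -0.2]


def plant(u: Vector) -> Vector:
    return [sum(TRUE_R[i][j] * u[j] for j in range(2)) + DRIFT[i]
            for i in range(2)]


r_inv = invert_2x2(identify_response(plant, [0.0, 0.0], delta=0.1))
u = [0.0, 0.0]
for pulse in range(50):
    beam_present = not 10 <= pulse < 15    # simulated beam interruption
    y = plant(u) if beam_present else None
    u = feedback_step(u, y, setpoint=[0.0, 0.0], r_inv=r_inv, gain=0.5)
residual = plant(u)                        # remaining error after the loop
```

Holding the actuator settings while the beam is absent keeps the integrator from winding up on invalid measurements, so the loop resumes from its last good state once the beam returns.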
1.2.5 Global Optimization Layer
The global optimization layer implements beam optimizers to enhance the beam quality. The beam optimizers read the particle or secondary (e.g., photon) beam parameters and determine the optimal settings for the beam actuators and beam feedback loops, aiming at maximizing or minimizing some criteria. Beam optimizers are sometimes implemented as a part of physics applications because they share the same inputs and outputs. Therefore, the requirements for physics applications also apply to beam optimizers.
1.2.6 Role of Machine Learning
Machine learning (ML) (Hastie et al. 2009; Murphy 2012; Géron 2019) has been introduced to the accelerator community in recent years. Many studies are being carried out to apply ML methods in accelerator design and control. People build surrogate models of the accelerator to predict the results of invasive beam diagnostics without disturbing the beam. This usage of surrogate models is also known as virtual diagnostics (Emma et al. 2021). Gaussian process (GP) regression models are introduced to perform Bayesian optimization. Some studies train classification models for quench detection in superconducting cavities. This book discusses several selected machine-learning applications that are beneficial for beam controls, including:
• Surrogate models of the beam control system. We can predict the beam response to specific inputs using a surrogate model. Since ML surrogate models can often fit the responses of nonlinear systems, they can be used in much larger input ranges than linear models. We will demonstrate some applications of such surrogate models, such as identifying beam response matrices at different operating points, accelerating online beam optimization, and implementing feedforward controllers.
• Bayesian optimization based on GP models. Bayesian optimization uses the real-time data from the machine and thus usually performs better than the black-box optimization algorithms introduced in Chap. 3.
• Reinforcement learning (RL) for feedback controller synthesis. RL provides an alternative philosophy and methodology for implementing feedback controllers. We will demonstrate using it to synthesize optimal controllers for solving the linear quadratic Gaussian (LQG) problems.
ML concepts and their applications in beam controls are discussed in Chap. 4. Beam control experts are continuously discovering new applications of machine learning.
Nevertheless, we offer some hints below for implementing ML-based algorithms in beam control systems:
(a) Machine learning is a powerful tool. However, it requires a large and comprehensive data set to train the model, and it introduces overheads for designing the data acquisition and model training tools. Therefore, for solving a practical problem, we still prefer applying traditional methods (e.g., linear system modeling, classical control theory, deterministically programmed automation tools, etc.) if they are sufficient.
(b) During the early design stages, we should focus more on the accuracy, reliability, robustness, repeatability, and automation of the subsystems in the instrumentation and global control layers. Only with reliable operation of the machine do the optimization and ML methods make sense. Of course, we can consider designing a mature data acquisition and archiving system in advance. Complete and high-quality data is the basis for any ML algorithm.
Finally, ML models cannot replace the physics models of the system. A primary advantage of ML models is that they only rely on the system input–output data without needing to know the underlying physics. This simplifies the beam control system design in some cases. However, we still need to deeply understand the principles and theories of each part of the beam control system. Only with comprehensive knowledge of the system can we understand the system behavior, judge the system performance, and configure the ML models more efficiently.
1.2.7 SwissFEL Two-Bunch Operation
Many examples in this book are based on experiments carried out at SwissFEL for controlling the second bunch. Here is a brief introduction to SwissFEL and its two-bunch operation mode.
SwissFEL (Milne et al. 2017) is an FEL machine based on a normal-conducting Linac (see Fig. 1.2). It employs an S-band (2998.8 MHz) RF Gun with a 2.6-cell standing-wave (SW) cavity to generate electron bunches. The Booster1 consists of two S-band traveling-wave (TW) structures, each powered by a separate klystron. A laser heater (LH) is used to mitigate the micro-bunching instability in the bunch compressors. The Booster2 consists of two S-band and one X-band (11.9952 GHz) RF stations, each driving two TW structures. The Booster2 S-band RF stations operate off-crest to generate the required energy chirp for bunch compression in the first bunch compressor (BC), BC1. The X-band RF station works at a decelerating phase to linearize the energy distribution along the electron bunch for optimal compression performance. After BC1, the Linac1 with 9 C-band (5712 MHz) RF stations ramps the beam energy and generates the necessary energy chirp for the second bunch compressor, BC2. Each C-band RF station drives four TW structures. The Linac2 consists of four C-band RF stations, which boost the beam energy to 3 GeV to feed the soft X-ray line “Athos” and the Linac3 of the hard X-ray line “Aramis”. In the Aramis beamline, 13 C-band RF stations ramp the electron beam energy up to 5.8 GeV before injection into the undulators. The RF stations operate in the pulsed mode at a pulse repetition rate of up to 100 Hz.
SwissFEL’s two-bunch operation mode simultaneously supplies the Aramis and Athos beamlines with electron bunches at up to 100 Hz. It requires accelerating two electron bunches (bunch1 and bunch2) separated by 28 ns in the same RF pulse.
At the switchyard, bunch2 is kicked to the Athos beamline by a resonant kicker, and bunch1 feeds the Linac3 of Aramis.
Fig. 1.2 Layout of SwissFEL linear accelerator. Labeled elements: Gun, Booster1 (S-band), Laser Heater, Booster2 (S-band and X-band), BC1, Linac1 (C-band, 36x), BC2, Linac2 (C-band, 16x), Switch Yard, Athos Linac (C-band, 4x), Athos Undulators, Linac3 (C-band, 52x), Collimator, Aramis Undulators, plus an S-band and a C-band deflector
The two-bunch mode imposes strict requirements on the LLRF system, which must be able to adjust the accelerating voltage and phase of either bunch independently. Typically, we set up and optimize the beam operation by adjusting the amplitude and phase of the entire RF pulse referring to bunch1. Afterward, we manipulate the RF inputs (e.g., envelope shape of the input RF pulse) to adjust the accelerating voltage and phase for bunch2 without disturbing bunch1. To do this, we introduce a step in the envelope of the RF pulse fed to the SW cavities (RF Gun) or TW structures (injector S-/X-band and Linac1 C-band). Figure 1.3 illustrates the envelope of an input RF pulse and explains the principle of bunch2 adjustment for TW structures. The energy in the input RF pulse establishes electromagnetic fields in TW structures for beam acceleration. In Fig. 1.3, the input RF energy between the time points A and B (called time window AB, or simply AB) accelerates bunch1. Since bunch2 arrives 28 ns later, it is accelerated by the RF energy in CD, which is delayed by 28 ns compared to AB. Note that the lengths of AB and CD are equal to the filling time of the TW structure. It can be seen that the RF energy in BD only accelerates bunch2. Therefore, we can generate a step in amplitude and phase starting from the time point B to manipulate bunch2 without disturbing bunch1. Figure 1.3b shows the accelerating vector (representing the accelerating voltage and phase) for bunch1, where $E_{AB}$ indicates that the vector is induced by the RF energy in AB. As illustrated in Fig. 1.3c, the RF pulse step starting from B can adjust the accelerating vector for bunch2, which is the sum of the two vectors induced by $E_{CB}$ and $E_{BD}$, respectively. Note that the vector induced by $E_{BD}$ falls in the shaded disk depending on the step settings. Since the RF pulse step is applied after bunch1, it only affects bunch2.
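The vector sum described above can be written out as a short phasor calculation. The sketch below is a deliberately simplified illustration, not the model used to produce Table 1.1: it assumes that every part of the filling-time window contributes equally to the accelerating vector, so that the window BD carries a weight equal to the bunch spacing divided by the filling time. Real SW cavities and TW structures do not satisfy this assumption. The function name and the normalization of the bunch1 vector are hypothetical.

```python
import cmath
import math


def bunch2_vector(v1: complex, fill_time_ns: float, spacing_ns: float,
                  step_ratio: float, step_phase_deg: float) -> complex:
    """Bunch2 accelerating vector for a step applied at time point B.

    Assumption: each part of the filling-time window contributes equally
    to the accelerating vector, which real structures do not satisfy.
    """
    if not 0.0 < spacing_ns < fill_time_ns:
        raise ValueError("bunch spacing must be shorter than the filling time")
    weight_bd = spacing_ns / fill_time_ns
    step = step_ratio * cmath.exp(1j * math.radians(step_phase_deg))
    common = (1.0 - weight_bd) * v1     # induced by the RF energy in CB
    stepped = weight_bd * step * v1     # induced by the RF energy in BD
    return common + stepped


v1 = 1.0 + 0.0j                         # bunch1 vector, normalized
v2_no_step = bunch2_vector(v1, 920.0, 28.0, 1.0, 0.0)
v2_zero = bunch2_vector(v1, 920.0, 28.0, 0.0, 0.0)
v2_rot = bunch2_vector(v1, 920.0, 28.0, 1.0, 60.0)
voltage_change_pct = 100.0 * (abs(v2_zero) / abs(v1) - 1.0)
phase_change_deg = math.degrees(cmath.phase(v2_rot / v1))
```

With a 920 ns filling time and 28 ns spacing, a step ratio of 0 lowers the bunch2 voltage by about 3% in this model, which is larger than the −1.7% listed for the S-band TW structure in Table 1.1. The equal-weight assumption is therefore too crude for quantitative use, and the sketch should be read only as an illustration of how the common and stepped vectors add.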
The short spacing (28 ns) between the two bunches limits the bunch2 tuning range. The maximum accelerating voltage and phase changes for bunch2 are determined not only by the step settings but also by the time constant of the SW cavity or TW structure. An RF pulse step is characterized by its step ratio and step phase, which are the scale factor and phase shift applied to the initial amplitude and phase, respectively. The tuning ranges of bunch2 accelerating voltage and phase with a step ratio within [0, 1] and a step phase within [−60°, 60°] are summarized in Table 1.1.
Fig. 1.3 LLRF strategy to adjust bunch2 with a step in the amplitude and phase of the input RF pulse envelope. Only the TW structure case is shown. (a) Envelope of the RF pulse fed to the traveling-wave structure, with the bunch1 acceleration window AB and the bunch2 acceleration window CD delayed by 28 ns; (b) bunch1 accelerating vector induced by $E_{AB}$; (c) bunch2 accelerating vector, the sum of the common accelerating vector induced by $E_{CB}$ and the accelerating vector induced by $E_{BD}$ in the 28 ns step
Table 1.1 Accelerating voltage and phase tuning ranges for bunch2
RF station type | Frequency (GHz) | Filling time (ns) | Acc. voltage range | Acc. phase range
SW gun | 2.9988 | 440 | −1.8 to 0.9% | ±1.3°
S-band TW | 2.9988 | 920 | −1.7 to 0.0% | ±0.8°
C-band TW | 5.7120 | 320 | −4.8 to −2.9% | ±0.9°
X-band TW | 11.9952 | 100 | −21.6 to 0.0% | ±11.5°
Table 1.1 indicates that the RF pulse step typically reduces the accelerating voltage. Therefore, before applying the step, we should optimize the RF pulse delay such that bunch2 receives more energy gain, which helps to locate the accelerating voltage tuning range symmetrically around zero. The RF pulse steps should be further optimized after the successful transmission of bunch2 in the presence of bunch1.
Typically, we regulate the bunch2 parameters using a separate feedback loop actuating on the RF pulse steps. The beam actuators and detectors for controlling bunch2 are defined in Fig. 1.4. Note that bunch1 is regulated by a separate feedback loop adjusting the amplitude and phase of the entire RF pulse. Bunch1 feedback brings disturbance to bunch2, and therefore, the bunch2 feedback loop also mitigates the crosstalk from the control of bunch1.
Fig. 1.4 Actuators and detectors for SwissFEL bunch2 control and optimization
The actuator inputs and the detector outputs for bunch2 control and optimization can be summarized in vectors
$$u = [\,r_{bst1}\ \ p_{bst1}\ \ r_{bst2}\ \ p_{bst2}\ \ r_X\ \ p_X\ \ r_{L1}\ \ p_{L1}\,]^T, \qquad y = [\,E_{LH}\ \ E_{BC1}\ \ L_{BC1}\ \ E_{BC2}\ \ L_{BC2}\,]^T. \qquad (1.1)$$
The elements of the input vector are summarized as follows:
• $r_{bst1}$, $p_{bst1}$: RF pulse step ratio and step phase of Booster1 stations.
• $r_{bst2}$, $p_{bst2}$: RF pulse step ratio and step phase of Booster2 stations.
• $r_X$, $p_X$: RF pulse step ratio and step phase of the X-band station.
• $r_{L1}$, $p_{L1}$: RF pulse step ratio and step phase of Linac1 stations.
The elements of the output vector are:
• $E_{LH}$: bunch2 energy deviation at the laser heater, measured by a BPM installed after the second dipole of the chicane.
• $E_{BC1}$: bunch2 energy deviation at BC1, measured by a BPM after the first dipole of the chicane.
• $L_{BC1}$: bunch2 bunch length at BC1, measured by a BCM.
• $E_{BC2}$: bunch2 energy deviation at BC2, measured by a BPM after the first dipole of the chicane.
• $L_{BC2}$: bunch2 bunch length at BC2, measured by a CDR detector.
Note that we will directly use the BPM readings to represent the energy deviations ($E_{LH}$, $E_{BC1}$, and $E_{BC2}$). To calculate the actual absolute energy deviation, we need to know the dispersion ($\eta$) and the nominal beam energy ($E_0$) at the BPM. Then the absolute energy deviation can be calculated with the formula $E = E_0 x/\eta$, where $x$ is the BPM reading. For a particular operating point of the beam, it implies that $E \propto x$. The bunch lengths, $L_{BC1}$ and $L_{BC2}$, are represented by the raw values with arbitrary units produced by the BCM and CDR, which are proportional to the actual bunch length. In later chapters of this book, we will revisit these parameters frequently. We will demonstrate some of the feedback, optimization, and machine learning algorithms using the experiments applied to the bunch2 of SwissFEL.
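The conversion from a BPM reading to an absolute energy deviation can be written as a one-line function. The sketch below simply evaluates $E = E_0 x/\eta$; the function name and the numbers (a 0.1 mm offset, 0.4 m dispersion, and a 300 MeV nominal energy) are hypothetical and chosen only to show the units involved.

```python
def energy_deviation(bpm_reading_m: float, dispersion_m: float,
                     nominal_energy_mev: float) -> float:
    """Return E = E0 * x / eta, in the same unit as the nominal energy."""
    if dispersion_m == 0.0:
        raise ValueError("dispersion must be non-zero at the BPM")
    return nominal_energy_mev * bpm_reading_m / dispersion_m


# Hypothetical numbers: 0.1 mm offset, 0.4 m dispersion, 300 MeV beam.
delta_e_mev = energy_deviation(1.0e-4, 0.4, 300.0)    # roughly 0.075 MeV
doubled = energy_deviation(2.0e-4, 0.4, 300.0)        # scales linearly with x
```

Because the result is linear in the BPM reading at a fixed operating point, a feedback loop can act on the raw reading directly and leave the scaling to the calibration.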
References
M. Altarelli, R. Brinkmann, M. Chergui et al., The European X-ray Free-Electron Laser technical design report. DESY 2006-097 (2006)
S. Alverson, D. Anderson, S. Gilevich, LCLS-II injector laser system, in Proceedings of the ICALEPCS2017 Conference, Barcelona, Spain, 8–13 Oct 2017 (2017)
D. Belohrad, M. Krupa, J. Bergoz et al., A new integrating current transformer for the LHC, in Proceedings of the IBIC2014 Conference, Monterey, CA, USA, 14–18 Sep 2014 (2014)
C. Emma, A. Edelen, A. Hanuka et al., Virtual diagnostic suite for electron beam prediction and control at FACET-II. Information 12(2), 61 (2021). https://doi.org/10.3390/info12020061
D. Fairley, K. Kim, K. Luchini et al., Beam based feedback for the Linac Coherent light source, in Proceedings of the ICALEPCS2011 Conference, Grenoble, France, 10–14 Oct 2011 (2011)
F. Frei, I. Gorgisyan, B. Smit et al., Development of electron bunch compression monitors for SwissFEL, in Proceedings of the IBIC2013 Conference, Oxford, UK, 16–19 Sep 2013 (2013)
Z. Geng, Robustness issues of timing and synchronization for free electron lasers. Nuclear Inst. Methods Phys. Res. A 963, 163738 (2020). https://doi.org/10.1016/j.nima.2020.163738
A. Géron, Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd edn. (O’Reilly Media, Sebastopol, 2019)
C. Gerth, Synchrotron radiation monitor for energy spectrum measurements in the bunch compressor at FLASH, in Proceedings of the DIPAC2007 Conference, Venice, Italy, 20–23 May 2007 (2007)
T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. (Springer, New York, 2009)
M.R. Howells, B.M. Kincaid, The properties of undulator radiation. LBL-34751 (1993). https://cds.cern.ch/record/260372/files/P00021955.pdf. Accessed 06 Aug 2022
X. Huang, Beam-Based Correction and Optimization for Accelerators (CRC Press, Boca Raton, 2020)
P. Juranic, J. Rehanek, C.A. Arrell et al., SwissFEL Aramis beamline photon diagnostics. J. Synchrotron Rad. 25, 1238–1248 (2018). https://doi.org/10.1107/S1600577518005775
M. Lonza, S. Cleva, D. Mitri et al., Beam-based feedbacks for the FERMI@Elettra free electron laser, in Proceedings of the IPAC10 Conference, Kyoto, Japan, 23–28 May 2010 (2010)
H. Loos, R. Akre, A. Brachmann et al., Commissioning of the LCLS linac, in Proceedings of LINAC08 Conference, Victoria, BC, Canada, 29 Sep–3 Oct 2008 (2008)
C.J. Milne, T. Schietinger, M. Aiba et al., SwissFEL: the Swiss X-ray free electron laser. Appl. Sci. 7(7), 720 (2017). https://doi.org/10.3390/app7070720
K.P. Murphy, Machine Learning: A Probabilistic Perspective (MIT Press, Cambridge, 2012)
T. Schietinger, M. Pedrozzi, M. Aiba et al., Commissioning Experience and Beam Physics Measurements at the SwissFEL Injector Test Facility. Phys. Rev. Accel. Beams 19, 100702 (2016). https://doi.org/10.1103/PhysRevAccelBeams.19.100702
V. Schlott, V. Arsov, M. Baldinger et al., Overview and status of SwissFEL diagnostics, in Proceedings of the IBIC2015 Conference, Melbourne, Australia, 13–17 Sep 2015 (2015)
S. Simrock, Z. Geng, Low-Level Radio Frequency Systems (Springer, Cham, 2022)
S. Skogestad, I. Postlethwaite, Multivariable Feedback Control: Analysis and Design, 2nd edn. (Wiley, New York, 2005)
R. Steinhagen, Feedback control for particle accelerators. Tutorial Presented in the PCaPAC2016 Conference (2016). https://pages.cnpem.br/pcapac2016/tutorial-feedback-control-for-particleaccelerators. Accessed 05 Aug 2022
J. Stohr, Z. Huang, P. Emma et al., Linac Coherent Light Source II (LCLS-II) conceptual design report (2011). SLAC-I-060-003-000-00
P. Strehl, Beam Instrumentation and Diagnostics (Springer, Berlin, 2006)
C. Vicario, R. Ganter, F.L. Pimpec et al., Photocathode drive laser for SwissFEL, in Proceedings of FEL2010 Conference, Malmö, Sweden, 23–27 Aug 2010 (2010)
M. Viti, M.K. Czwalinna, H. Dinter et al., The bunch arrival time monitor at FLASH and European XFEL, in Proceedings of the ICALEPCS2017 Conference, Barcelona, Spain, 8–13 Oct 2017 (2017)
H. Wiedemann, Particle Accelerator Physics, 4th edn. (Springer, Cham, 2015)
L. Winkelmann, A. Choudhuri, H. Chu et al., The European XFEL photocathode laser, in Proceedings of the FEL2019 Conference, Hamburg, Germany, 26–30 Aug 2019 (2019)
Chapter 2
Beam Feedback Control
Abstract Particle accelerators must provide stable beams for achieving their physics goals. Stabilizing the beam against external disturbances is a critical task of the beam control system. Typically, we introduce feedback control for mitigating the effects of slow disturbances like temperature or humidity changes and power supply drifts. In this chapter, we present an overview of the beam feedback control in accelerators, introduce a generic controller structure, and discuss several methods for feedback controller design. We highlight the control of multiple-input multiple-output static systems and introduce several controller design methods based on singular value decomposition, least-square with regularization, and robust control.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Geng and S. Simrock, Intelligent Beam Control in Accelerators, Particle Acceleration and Detection, https://doi.org/10.1007/978-3-031-28597-4_2

2.1 Beam Feedback Control Overview

Feedback control is a fundamental method for stabilizing the beam. It suppresses the effects of external disturbances and internal variations of system dynamics (i.e., perturbations) and tracks the setpoint changes. Efficient control requires the frequencies of disturbances, perturbations, and setpoint changes to be within the closed-loop bandwidth. We have listed several typical disturbances degrading the beam quality in Sect. 1.1.1. Perturbations are system model uncertainties, such as the errors in an identified beam response matrix compared to the actual one. Desired time-varying setpoint changes also impose requirements on the dynamical behavior of a feedback loop. For example, the energy feedback controller should track the setpoint ramping by automatically adjusting the radio frequency (RF) and magnet settings. A feedback controller should include an integrator if we want to remove the steady-state errors in beam parameters.

The abstract block diagram of a beam feedback (BFB) loop is depicted in Fig. 2.1. For a BFB controller, the plant to be controlled includes the open-loop or closed-loop beam actuator controllers, beam actuators, beam detectors, beam diagnostic controllers, and the beam itself. These components are in the beam device layer and instrumentation layer of the beam control system, as described in Sects. 1.2.2 and 1.2.3. The plant inputs are the control settings of beam actuators (e.g., control parameter settings of open-loop actuators or setpoints of closed-loop actuators), and the outputs are the measurements of beam parameters. The beam is stabilized based on these measurements; therefore, accurate and low-noise beam detectors are necessary. This book does not cover the topics of beam parameter measurements, and interested readers can refer to the book by Strehl (2006).

Fig. 2.1 Abstract block diagram of a beam feedback loop

The core functions of BFB controllers include:

(a) System identification. Identify the accelerator system model describing the transfer function between the control settings of beam actuators and the beam parameters. The model could be a static response matrix or a dynamical transfer function (matrix), depending on the plant’s properties.
(b) Disturbance rejection. Reduce the effects of disturbances (coupled into the system from external sources) and perturbations (variations of system transfer functions).
(c) Reference tracking, also denoted as setpoint tracking or command tracking. Control the beam parameters to follow the time-varying setpoints.

These are basic functional requirements for BFB controllers. Many other requirements are not listed here, such as configuring the loops (selecting detectors and actuators), optimizing the feedback parameters (e.g., gain), and comprehensive exception detection and handling. In practice, a complete requirements analysis is needed to design and implement a BFB controller successfully. This chapter assumes that the beam setpoints are given and focuses only on realizing the feedback control. The setpoints are typically provided by operators, higher-level feedback controllers, or beam optimizers (see Chaps. 3 and 4).
2.2 Beam Feedback Control Analysis

Before discussing concrete algorithms for BFB control, we analyze the characteristics of the accelerator beam control system.
2.2.1 Plant Characteristics

The plant to be controlled shown in Fig. 2.1 has the following characteristics that may influence the design of BFB controllers:

(a) Dynamics in beam detectors and beam diagnostic controllers

Most beam detectors and diagnostic controllers for feedback control can measure every bunch. For example, we can obtain the measurements of bunch charge, arrival time, energy, bunch length, and position for each bunch after it passes through. Therefore, we often neglect the dynamics in beam detectors and diagnostic controllers and normalize their transfer functions to unity. The following discussions will assume that the exact beam parameters are observed at the plant outputs and neglect the dynamics of the beam detectors and diagnostic controllers.

(b) Dynamics in beam actuators and their controllers

Some beam actuators (e.g., fast kicker magnets) can manipulate every bunch, and their dynamics can be neglected. We use constant gains (for single-output devices) or constant matrices (for multiple-output devices) to model such fast beam actuators. On the contrary, some beam actuators have larger time constants, and their control setting changes result in slow output variations and affect multiple bunches. We must consider the dynamics of such beam actuators within BFB loops. Examples of dynamical beam actuators include the RF cavity and the orbit corrector magnet. Of course, if the bunch repetition rate is low, such as in a pulsed Linac, the RF cavities or corrector magnets can reach steady states between adjacent bunches. In this case, their dynamics can also be neglected, and they can be treated as static beam actuators. The actuators of a BFB loop are often similar to each other. For example, the orbit feedback actuators are all corrector magnets, and the longitudinal feedback actuators are similar RF stations. These actuators have similar dynamical behaviors.

(c) Transfer function between beam actuators’ outputs and beam parameters

In a single-pass accelerator section (e.g., Linac or beam transmission line, through which each bunch passes once), the transfer function between the beam actuators’ outputs and the beam parameters is static. It can be described by a constant matrix named beam response matrix. In a multiple-pass accelerator, such as a storage ring, the transfer function is dynamical (i.e., includes system dynamics) because the effects of beam actuators on the same bunch accumulate in multiple turns. For example, in a storage ring, the transfer functions between the RF amplitude/phase and the beam energy/arrival time are second-order differential equations related to the synchrotron oscillation (Simrock and Geng 2022). However, if the dynamics of the transfer function are much faster than the feedback update rate, we can still approximate the transfer function with a static response matrix. One example is the storage-ring closed-orbit control. After adjusting the corrector magnets, the orbit controller waits until the closed orbit reaches a new steady state before making the subsequent control.
In this case, we can use a constant response matrix to describe the transfer function between the settings of corrector magnets and the beam positions.

(d) Multiple-input multiple-output (MIMO) system

Most BFB controllers receive multiple beam parameters and adjust multiple beam actuators. If both the beam actuators and the beam responses are dynamical, we should design the feedback controller following the general MIMO control design methods (Skogestad and Postlethwaite 2005). However, if the transfer function between the beam actuators’ outputs and the beam parameters is static, the controller design can be significantly simplified. We will focus on this case.

(e) Discrete control

Though beam actuators accept continuous inputs, beam detectors can only provide discrete measurements for a bunched beam. Therefore, BFB controllers are often discrete with a maximum sampling frequency equal to the bunch repetition rate. For example, we use the discrete pulse-to-pulse feedback to control a pulsed Linac that accelerates a single bunch in each RF pulse. Specifically, if the controller only suppresses the slow drifts of beam parameters, the closed-loop bandwidth may be much smaller than the bunch repetition rate. In this case, we often average the measurements of many bunches as inputs to the controller to improve the signal-to-noise ratio and design the feedback controllers using continuous models.

Figure 2.2 is an example of BFB plants. It is for the longitudinal control of Linac-based free electron laser (FEL) machines. The plant inputs are the amplitude and phase setpoints of k RF stations, and the outputs are the beam energy and bunch length. The RF amplifiers and cavities correspond to the beam actuators in Fig. 2.1, and the RF controllers correspond to the beam actuator controllers. We have neglected the beam detectors and beam diagnostic controllers, assuming they provide accurate measurements. Let’s analyze the transfer function of the plant.
The inputs and outputs are defined as vectors:

$$\mathbf{u} = \begin{bmatrix} \Delta A_{SP1} & \Delta\phi_{SP1} & \cdots & \Delta A_{SPk} & \Delta\phi_{SPk} \end{bmatrix}^T \in \mathbb{R}^{2k}, \quad \mathbf{y} = \begin{bmatrix} \Delta E & \Delta L \end{bmatrix}^T \in \mathbb{R}^{2}, \tag{2.1}$$

where $A$ and $\phi$ stand for amplitude and phase, respectively. We use $\Delta$ to denote the deviations of the inputs and outputs with respect to a particular operating point. Since the plant is highly nonlinear, we can only build a linear model in a small range around each operating point. The RF closed loops are dynamical and described by a transfer function $G(s)$ for both amplitude and phase. The transfer function between the amplitudes and phases of the cavity fields and the beam parameters is described by a constant matrix $\mathbf{R} \in \mathbb{R}^{2 \times 2k}$. Therefore, the transfer function of the plant in the form of Laplace transforms ($s$ is the complex frequency) is written as
Fig. 2.2 Plant to be controlled for a longitudinal feedback loop of Linac-based FEL machines. Details of RF feedback loops (e.g., RF actuators and detectors) are not shown
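$$\underbrace{\begin{bmatrix} \Delta E(s) \\ \Delta L(s) \end{bmatrix}}_{\mathbf{y}(s)} = \underbrace{\begin{bmatrix} R_{11} & R_{12} & \cdots & R_{1,2k} \\ R_{21} & R_{22} & \cdots & R_{2,2k} \end{bmatrix}}_{\mathbf{R}} G(s)\,\mathbf{I}_{2k} \underbrace{\begin{bmatrix} \Delta A_{SP1}(s) \\ \Delta\phi_{SP1}(s) \\ \vdots \\ \Delta A_{SPk}(s) \\ \Delta\phi_{SPk}(s) \end{bmatrix}}_{\mathbf{u}(s)}, \tag{2.2}$$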
where $\mathbf{I}_{2k}$ is the $2k$-dimensional unit matrix. Here we have written the inputs and outputs as Laplace transforms. Equation (2.2) simplifies to

$$\mathbf{y}(s) = G(s)\,\mathbf{R}\,\mathbf{u}(s). \tag{2.3}$$
Note that R is a constant real-valued matrix (i.e., response matrix), and G(s) is a single-input single-output (SISO) transfer function. We will limit our discussions to the plants with beam actuators having the same dynamics. It is the case in most orbit control loops with identical corrector magnets and in the loop of Fig. 2.2 with similar cavities and RF controllers. This particular format of transfer functions can significantly simplify the controller design.
2.2.2 Static and Dynamical Controllers

The model (2.3) describes a MIMO system with dynamics only in the beam actuators. The beam parameters depend on multiple actuators through constant gains, i.e., without dynamics. Therefore, we can design a controller consisting of two parts to control the static and dynamical parts of the plant separately (see Fig. 2.3). We assume the plant has $n$ inputs and $m$ outputs, where $n$ and $m$ are natural numbers with $m \le n$ (reasons given below). The plant outputs $\mathbf{y} \in \mathbb{R}^m$ are compared with the setpoints $\mathbf{r} \in \mathbb{R}^m$, resulting in the errors $\mathbf{e} \in \mathbb{R}^m$. We design separate SISO dynamical controllers $K_{di}(s)$ ($i = 1, 2, \ldots, m$) to regulate the dynamical response of each beam parameter. The dynamical controllers are stacked into an $m$-by-$m$ diagonal matrix $\mathbf{K}_d(s) = \mathrm{diag}(K_{d1}(s), K_{d2}(s), \ldots, K_{dm}(s))$. A MIMO static controller, $\mathbf{K}_s \in \mathbb{R}^{n \times m}$, which is a constant matrix, converts the dynamical controller outputs $\mathbf{u}' \in \mathbb{R}^m$ into the plant inputs $\mathbf{u} \in \mathbb{R}^n$. In effect, $\mathbf{K}_s$ acts as an inverse of the beam response matrix $\mathbf{R} \in \mathbb{R}^{m \times n}$. In summary, we design a static controller to decouple the MIMO plant into $m$ independent loops regulated by separate dynamical controllers. Each loop controls one beam parameter. The design methods for SISO dynamical controllers have been discussed in many textbooks, such as Skogestad and Postlethwaite (2005). We only focus on the static controller design. Since we often cannot invert $\mathbf{R}$ directly (e.g., $\mathbf{R}$ is not a square matrix or is close to singular), we express the static controller via

$$\mathbf{R}\mathbf{K}_s = \mathbf{I}_m, \tag{2.4}$$
where $\mathbf{I}_m$ is the $m$-dimensional unit matrix. Therefore, $\mathbf{K}_s$ normalizes the plant into $m$ independent SISO channels with the same dynamics, as shown in Fig. 2.3.

Fig. 2.3 Separate implementation of static and dynamical controllers

The numbers of inputs and outputs are of concern for BFB loops. If the number of outputs is larger than the number of inputs (i.e., $m > n$), there will be output directions (i.e., directions in output space, each representing a 1-dimensional subspace) that cannot be controlled by the inputs. When the desired outputs line up with these uncontrollable directions, they are not achievable by any combinations of inputs. In other words, the controller cannot suppress the disturbances that generate output errors in these uncontrollable directions. We typically require the following conditions to be satisfied for implementing effective BFB controllers:

(a) The number of inputs ($n$) and outputs ($m$) should satisfy $m \le n$.
(b) The beam response matrix $\mathbf{R}$ has a full row rank, i.e., $\mathrm{rank}(\mathbf{R}) = m$.

These conditions guarantee that by manipulating the inputs $\mathbf{u}$ (without limits), we can achieve any desired outputs $\mathbf{y}$. The first condition enables us to affect any output directions, and the second makes the outputs form an $m$-dimensional linear space, implying that any output values are achievable.

If we control the SwissFEL bunch2 longitudinal beam parameters using feedback, the control loop in Fig. 1.4 can be elaborated into Fig. 2.4. The inputs and outputs of the plant to be controlled are given by (1.1) with $m = 5$ and $n = 8$. SwissFEL operates in the pulsed mode. The feedback controller corrects the errors in the previous pulse by adjusting the inputs for the next pulse. The dynamics of the RF stations are not relevant because, in each pulse, the beam is accelerated after the traveling-wave structures are fully filled (i.e., in steady-state). Therefore, the plant model for pulse-to-pulse feedback is static and can be fully described by a response matrix. The plant for SwissFEL bunch2 is given by

$$\Delta\mathbf{y} = \mathbf{R}\,\Delta\mathbf{u}, \quad \text{where} \quad \mathbf{u} = \begin{bmatrix} r_{bst1} & p_{bst1} & r_{bst2} & p_{bst2} & r_X & p_X & r_{L1} & p_{L1} \end{bmatrix}^T, \quad \mathbf{y} = \begin{bmatrix} \Delta E_{LH} & \Delta E_{BC1} & L_{BC1} & \Delta E_{BC2} & L_{BC2} \end{bmatrix}^T. \tag{2.5}$$
Here we use $\Delta$ before $\mathbf{u}$ and $\mathbf{y}$ to represent the incremental inputs and outputs around a particular operating point. The transfer function between $\mathbf{u}$ and $\mathbf{y}$ is strongly nonlinear, and we can only linearize it as (2.5) in a small range around the operating point. We introduce a static controller $\mathbf{K}_s$ to invert $\mathbf{R}$, converting the integrated errors of 5 beam parameters back into the corresponding corrections of 8 actuators. The dynamical controller of this example is a discrete pulse-to-pulse integral controller for all beam parameters. Its z-transform transfer function is

$$K_d(z) = \frac{g}{z - 1}, \tag{2.6}$$

where $g$ is a gain satisfying $0 < g < 2$. In practice, we often set a smaller gain (e.g., $g = 0.1$) to avoid amplifying high-frequency beam jitter while maintaining the capability of suppressing slow drifts. Later we will use this feedback loop to demonstrate the design methods for the static controller $\mathbf{K}_s$. When designing a static controller, we first identify the response matrix and then invert it approximately.
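The pulse-to-pulse integral loop of (2.6) combined with a static controller can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the one-output, two-input response matrix `R`, the constant disturbance, and the helper names `matvec` and `simulate_loop` are hypothetical and not taken from SwissFEL; the static controller is chosen here as the minimum-norm right inverse, which is one possible way to satisfy (2.4).

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def simulate_loop(R, Ks, setpoint, disturbance, g, n_pulses):
    """Return the error vector observed at each pulse of the closed loop."""
    u_prime = [0.0] * len(setpoint)  # integrator states, one per beam parameter
    errors = []
    for _ in range(n_pulses):
        u = matvec(Ks, u_prime)                                  # static controller
        y = [a + d for a, d in zip(matvec(R, u), disturbance)]   # static plant + drift
        e = [r - a for r, a in zip(setpoint, y)]
        errors.append(e)
        # K_d(z) = g / (z - 1): accumulate g * e for the next pulse
        u_prime = [s + g * ei for s, ei in zip(u_prime, e)]
    return errors


# Hypothetical plant with 1 beam parameter and 2 actuators (illustrative values).
R = [[1.0, 0.5]]
# Minimum-norm right inverse R^T (R R^T)^-1, so that R Ks = 1 as in (2.4).
rrt = sum(x * x for x in R[0])
Ks = [[x / rrt] for x in R[0]]

stable = simulate_loop(R, Ks, setpoint=[0.0], disturbance=[1.0], g=0.1, n_pulses=50)
unstable = simulate_loop(R, Ks, setpoint=[0.0], disturbance=[1.0], g=2.5, n_pulses=50)
```

With an exact static controller, the error of this toy loop is multiplied by $(1 - g)$ at every pulse, so it decays only when $|1 - g| < 1$, which is the condition $0 < g < 2$ stated above; the run with $g = 2.5$ diverges.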
Fig. 2.4 Feedback control of SwissFEL bunch2 longitudinal beam parameters
2.2.3 Local and Global Control Loops

Practical BFB controllers can be implemented either in one global loop or in multiple local loops. For example, Fig. 2.4 is a global loop that simultaneously stabilizes multiple beam parameters distributed in the accelerator. Alternatively, we may split a global loop into smaller loops controlling local beam parameters. The global loop in Fig. 2.4 can be divided into three local loops:

1. Local loop for stabilizing $\Delta E_{LH}$ by manipulating $r_{bst1}$ and $p_{bst1}$.
2. Local loop for stabilizing $\Delta E_{BC1}$ and $L_{BC1}$ with $r_{bst2}$, $p_{bst2}$, $r_X$ and $p_X$.
3. Local loop for stabilizing $\Delta E_{BC2}$ and $L_{BC2}$ with $r_{L1}$ and $p_{L1}$.

Another typical example of global and local feedback loops is the beam orbit control system in Fig. 2.5. Both configurations are widely used in practice. The BFB system should allow configuring the loops freely during operation, such as splitting a global loop into several local ones, merging local loops into a global one, or modifying the included actuators and detectors in an existing loop.

Fig. 2.5 Beam orbit control with a global feedback loop or N local feedback loops

A global loop needs to invert a large response matrix (e.g., a hundred-by-hundred matrix for the orbit feedback in a storage ring), which can be very challenging. We will discuss the difficulties in inverting an ill-conditioned matrix in Sect. 2.4.1. The primary benefit of the global loop is that the effects of upstream beam actuators on downstream beam parameters are automatically considered in the global response matrix. In contrast, local loops are often easier to design due to their lower-dimensional response matrices, which are less problematic to invert than a large ill-conditioned matrix. Local loops also allow the design of each controller independently with different control algorithms and configurations. When one of the local loops fails, the beam operation may continue with other loops in operation. Therefore, local feedback loops are preferred in many practical BFB systems.

One major issue of local loops is the coupling between different loops. The beam actuator adjustments issued by the upstream loops also affect the downstream beam parameters, causing coupling between loops. This problem can be mitigated by optimizing the closed-loop bandwidths of the local loops. For example, if the downstream loops have larger closed-loop bandwidth, the disturbances coupled from the upstream loops can be suppressed. However, if the coupled disturbances fall into particular frequency ranges, they may be amplified by the downstream loops (Skogestad and Postlethwaite 2005). Therefore, the closed-loop bandwidths of different local loops should be optimized carefully to avoid increasing the beam jitter. In some accelerators, people implement fast communications between local loops to remove the coupling with feedforward corrections (Himel et al. 1993). This method can be adopted when the coupling cannot be mitigated by optimizing the loop bandwidths.
2.3 Beam Response Matrix

The plant to be controlled (see Fig. 2.1) is typically nonlinear. At a particular operating point, we linearize the plant response in a small input range and model it as a response matrix R. This implies that R differs between operating points. This section discusses the identification methods, decomposition, and uncertainties of the response matrices.
2.3.1 Response Matrix Identification

Beam dynamics simulation can estimate the response matrix before the beam is available. When designing an accelerator, we determine the nominal beam parameters and the required inputs (e.g., RF amplitude and phase, magnet strength, etc.) with simulation. Using this nominal case as a reference, we can slightly change each input and simulate the changes in all beam parameters. With these results, we can calculate the sensitivity of each beam parameter to each input and obtain the response matrix.

Here we focus on empirical methods to identify the response matrix $\mathbf{R}$, assuming the beam parameters are measurable. The simplest method is similar to that based on beam dynamics simulation. We scan the control setting of each beam actuator in a small range around the operating point (i.e., vary the control setting linearly in multiple steps) and observe the changes in all concerned beam parameters. The sensitivity of a beam parameter to the beam actuator (i.e., an element of $\mathbf{R}$) can be calculated with linear fitting. Therefore, $\mathbf{R}$ is also called the beam sensitivity matrix. At each scan step, we must wait long enough for the beam to reach steady state before reading the beam parameters. This guarantees that $\mathbf{R}$ models the static beam response. The scan step size should be large enough to cause beam parameter changes larger than the resolution of beam detectors. We will use $\mathbf{R}_{id}$ to denote the identified response matrix, which contains errors compared to the actual one, $\mathbf{R}$.

Figure 2.6 shows the results of SwissFEL bunch2 response matrix identification. The inputs are scanned around an operating point defined by the nominal inputs $\mathbf{u}_0$ and nominal outputs $\mathbf{y}_0$. We scan each actuator in a small range to ensure linear responses. For example, the third column of Fig. 2.6 shows the beam responses when scanning $r_{bst2}$ with the other actuators unchanged. We notice that some responses in Fig. 2.6 already show nonlinearity, causing errors in $\mathbf{R}_{id}$. The calculated response matrix is

$$\mathbf{R}_{id} = \begin{bmatrix} 0.339 & 0.107 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0.877 & -0.213 & 1.542 & -0.107 & -1.188 & -0.149 & 0 & 0 \\ -0.260 & -0.776 & -0.465 & -0.201 & -0.035 & 0.529 & 0 & 0 \\ -0.661 & 0.207 & -1.590 & 0.083 & 1.207 & -0.075 & -1.123 & 3.044 \\ 0.400 & -0.274 & 0.723 & -0.047 & -0.785 & 0.230 & 0.857 & -0.546 \end{bmatrix}. \tag{2.7}$$
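The scan-and-fit procedure described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: `measure` stands for whatever routine sets the actuators, waits for steady state, and reads the beam parameters, and the two-input `toy_plant` with its coefficients is a hypothetical stand-in for the accelerator.

```python
def fit_slope(x, y):
    """Least-squares slope of y versus x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def identify_by_scan(measure, u0, scan_range, n_steps):
    """Scan one actuator at a time around u0; return the m-by-n sensitivity matrix."""
    n = len(u0)
    m = len(measure(u0))
    offsets = [-scan_range + 2.0 * scan_range * k / (n_steps - 1) for k in range(n_steps)]
    R = [[0.0] * n for _ in range(m)]
    for j in range(n):
        responses = []
        for du in offsets:
            u = list(u0)
            u[j] += du                      # vary actuator j only
            responses.append(measure(u))    # real machine: wait for steady state first
        for i in range(m):
            R[i][j] = fit_slope(offsets, [resp[i] for resp in responses])
    return R


def toy_plant(u):
    """Hypothetical linear plant used only to exercise the code."""
    return [2.0 * u[0] - 1.0 * u[1], 0.5 * u[1]]


R_id = identify_by_scan(toy_plant, u0=[0.3, -0.2], scan_range=0.05, n_steps=11)
```

For this noise-free linear toy plant the fitted slopes reproduce the true sensitivities; on a real machine, nonlinearity within the scan range and detector noise make the fitted matrix only an approximation of the local response.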
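The linear-scanning method above scans one actuator each time, which might be slow if the system has many actuators. Another method is to adjust the actuators simultaneously with random variations and fit the response matrix with the least-square algorithm. Assume we produce $N$ random input vectors $\mathbf{u}_i$ and the corresponding outputs are $\mathbf{y}_i$ with $i = 1, 2, \ldots, N$. The incremental inputs and outputs compared to the operating point are denoted as $\Delta\mathbf{u}_i = \mathbf{u}_i - \mathbf{u}_0$ and $\Delta\mathbf{y}_i = \mathbf{y}_i - \mathbf{y}_0$, respectively. We should have $\Delta\mathbf{y}_i = \mathbf{R}\Delta\mathbf{u}_i$. Therefore, given the $\Delta\mathbf{u}_i$ and the measured $\Delta\mathbf{y}_i$, $\mathbf{R}$ can be estimated by solving the following least-square problem

$$\mathbf{R} = \arg\min_{\mathbf{M} \in \mathbb{R}^{m \times n}} \|\tilde{\mathbf{y}} - \mathbf{M}\tilde{\mathbf{u}}\|_F, \quad \text{where} \quad \tilde{\mathbf{u}} = \begin{bmatrix} \Delta\mathbf{u}_1 & \Delta\mathbf{u}_2 & \ldots & \Delta\mathbf{u}_N \end{bmatrix}, \quad \tilde{\mathbf{y}} = \begin{bmatrix} \Delta\mathbf{y}_1 & \Delta\mathbf{y}_2 & \ldots & \Delta\mathbf{y}_N \end{bmatrix}. \tag{2.8}$$

The Frobenius norm can be replaced by the 1- or 2-norm for similar performance. This is a convex optimization problem and can be solved by a Matlab tool named cvx. The code for the SwissFEL bunch2 response matrix identification is:

```matlab
% y: m-by-N stacked output increments, u: n-by-N stacked input increments
cvx_begin
    variable M(m, n)                    % response matrix to be identified
    minimize(norm(y - M*u, 'fro'))      % Frobenius-norm fitting error of (2.8)
    % structural zeros: downstream actuators do not affect upstream parameters
    M(1, 3:end) == 0
    M(2, 7:end) == 0
    M(3, 7:end) == 0
cvx_end
```

Here y and u correspond to the stacked output and input matrices $\tilde{\mathbf{y}}$ and $\tilde{\mathbf{u}}$ in (2.8). From the SwissFEL layout, we find the zero-valued elements of $\mathbf{R}$, considering that the upstream beam parameters are not affected by the downstream actuators. These elements are set to zero as the constraints of this optimization problem. Figure 2.7 shows the first 20 random incremental inputs for this least-square-based method. The identified response matrix is given below: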
Fig. 2.6 SwissFEL bunch2 response matrix identification by scanning the actuators. The operating point is defined by $\mathbf{u}_0 = [0.89, 20, 0.4, 40.88, 0.26, -1.14, 0.68, -49.79]^T$ and $\mathbf{y}_0 = [0, 0, 163.3, 0.46, 757]^T$. The scan ranges of the step ratio and step phase are ±0.2 and ±5°, respectively, around the operating point. Each actuator is scanned in 30 steps. The valid range of step ratio is [0, 1] and the valid range of step phase is [−60°, 60°]
$$\mathbf{R}_{id,ls} = \begin{bmatrix} 0.371 & 0.124 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0.471 & -0.291 & 1.391 & -0.205 & -1.004 & -0.236 & 0 & 0 \\ -0.228 & -0.634 & -0.433 & -0.144 & -0.107 & 0.580 & 0 & 0 \\ -0.618 & 0.207 & -1.580 & 0.114 & 1.124 & 0.094 & -0.849 & 3.297 \\ 0.133 & -0.123 & 0.432 & -0.137 & -0.424 & -0.067 & 0.487 & -0.544 \end{bmatrix}. \tag{2.9}$$
It is slightly different from that identified by linear scanning. The linear-scanning method takes longer but is more robust against measurement noise; the least-square-based method is faster but can be sensitive to noise and introduce more variance into the identified response matrix. We often repeat the identification several times and average the results. The response matrix uncertainty is discussed in Sect. 2.3.3.

To validate the identified response matrix, we apply some new random variations to the actuators and measure the beam parameter deviations. We also use the two identified response matrices in (2.7) and (2.9) to predict the beam parameter changes. The measurements and predictions are compared in Fig. 2.8. They match quite well. The residual errors among the measurements and the two predictions imply that both identified response matrices contain uncertainties compared to the actual beam response.

The response matrices (2.7) and (2.9) use radians as the unit of the input phases. The values of $L_{BC1}$ and $L_{BC2}$ are normalized by dividing by their operating-point values, 163.3 and 757, respectively. This scaling keeps the inputs within a common range and the outputs within a common range, which improves the conditioning of the response matrix for precise inversion. The matrix inversion is discussed in Sect. 2.4.1.
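For readers without access to cvx, the unconstrained version of the least-square fit (2.8) has a closed-form solution through the normal equations, $\mathbf{M} = \tilde{\mathbf{y}}\tilde{\mathbf{u}}^T(\tilde{\mathbf{u}}\tilde{\mathbf{u}}^T)^{-1}$. The sketch below illustrates it in plain Python under simplifying assumptions that are not part of the original text: only two actuators (so the 2-by-2 Gram matrix can be inverted by hand), no structural zero constraints, and a hypothetical plant `R_true` with synthetic noise.

```python
import random


def identify_least_squares(dU, dY):
    """Fit M minimizing sum_i ||dy_i - M du_i||^2 for plants with exactly 2 inputs.

    dU: list of N input increments (length 2 each)
    dY: list of N output increments (length m each)
    """
    # Gram matrix A = sum_i du_i du_i^T (symmetric 2x2)
    a11 = sum(u[0] * u[0] for u in dU)
    a12 = sum(u[0] * u[1] for u in dU)
    a22 = sum(u[1] * u[1] for u in dU)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("inputs do not excite both actuators independently")
    inv = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]
    M = []
    for i in range(len(dY[0])):
        # b = sum_i dy_i[i] * du_i, i.e. row i of (Y U^T)
        b0 = sum(y[i] * u[0] for u, y in zip(dU, dY))
        b1 = sum(y[i] * u[1] for u, y in zip(dU, dY))
        M.append([b0 * inv[0][0] + b1 * inv[1][0], b0 * inv[0][1] + b1 * inv[1][1]])
    return M


# Synthetic data from a hypothetical plant with small measurement noise.
rng = random.Random(0)
R_true = [[2.0, -1.0], [0.0, 0.5]]
dU = [[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in range(200)]
dY = [[sum(r * u for r, u in zip(row, du)) + rng.gauss(0.0, 0.001) for row in R_true]
      for du in dU]
R_ls = identify_least_squares(dU, dY)
```

Minimizing the Frobenius norm and minimizing its square give the same minimizer, so this matches (2.8) without constraints. Plants with more actuators, or with zero constraints such as those used in the cvx code, would need a general linear-algebra or convex-optimization solver instead of the hand-written 2-by-2 inverse.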
Fig. 2.7 Random variations (offset to operating point) of actuators for least-square-based identification. The step ratio variation RMS is 0.1 and the step phase variation RMS is 2°
Fig. 2.8 Validation of the response matrices. We compare the response-matrix predictions with the actual measurements for new random inputs
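2.3.2 Singular Value Decomposition

Singular value decomposition (SVD) is a powerful tool for the BFB design. The SVD of a real matrix $\mathbf{R} \in \mathbb{R}^{m \times n}$ ($m \le n$, and $\mathrm{rank}(\mathbf{R}) = m$) is given by

$$\mathbf{R} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T = \underbrace{\begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_m \end{bmatrix}}_{\mathbf{U}} \underbrace{\begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \mathbf{0} \\ & & \sigma_m & \end{bmatrix}}_{\boldsymbol{\Sigma}} \underbrace{\begin{bmatrix} \mathbf{v}_1^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix}}_{\mathbf{V}^T}, \tag{2.10}$$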
where $\mathbf{U} \in \mathbb{R}^{m \times m}$ and $\mathbf{V} \in \mathbb{R}^{n \times n}$ are orthogonal matrices with $\mathbf{U}^T\mathbf{U} = \mathbf{U}\mathbf{U}^T = \mathbf{I}_m$ and $\mathbf{V}^T\mathbf{V} = \mathbf{V}\mathbf{V}^T = \mathbf{I}_n$. We use $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_m$ to denote the column vectors of $\mathbf{U}$ and $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ to denote the column vectors of $\mathbf{V}$. The diagonal elements of the leading $m \times m$ block of $\boldsymbol{\Sigma} \in \mathbb{R}^{m \times n}$ are the singular values of $\mathbf{R}$, sorted as $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_m > 0$.

The matrix $\mathbf{R}$ maps the vectors in the input space into the output space. To simplify the discussion, we use input direction and output direction to represent the directions of vectors in the input and output spaces, respectively. SVD represents the gain of $\mathbf{R}$ as a function of input directions. $\mathbf{R}$ maps the input direction $\mathbf{v}_i$ to the output direction $\mathbf{u}_i$ with the magnitude scaled by $\sigma_i$ ($i = 1, 2, \ldots, m$). Therefore, the singular values represent the gains of $\mathbf{R}$ in different input directions. Note that when $m < n$, the inputs in the directions $\mathbf{v}_{m+1}, \ldots, \mathbf{v}_n$ produce zero output since we have only $m$ singular values. This implies that there are combinations of inputs that do not affect the outputs because their effects cancel each other. On the other hand, if we have more outputs than inputs, i.e., $m > n$, there will be output directions ($\mathbf{u}_{n+1}, \ldots, \mathbf{u}_m$) that cannot be affected by any inputs. This is why we usually require $m \le n$ for BFB systems.

Figure 2.9 shows the geometric meaning of SVD. A matrix $\mathbf{G}$ maps the vectors on a unit circle (left) to an ellipse (right). $\mathbf{G}$ and its SVD are given by
Fig. 2.9 Geometric explanation of the input and output directions and singular values
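$$\mathbf{G} = \begin{bmatrix} 1.5 & 0 \\ -1.1 & 1.1 \end{bmatrix} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T, \quad \text{where} \quad \mathbf{U} = \begin{bmatrix} -0.69 & 0.73 \\ 0.73 & 0.69 \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} 2.00 & 0 \\ 0 & 0.83 \end{bmatrix}, \quad \mathbf{V} = \begin{bmatrix} -0.92 & 0.40 \\ 0.40 & 0.92 \end{bmatrix}. \tag{2.11}$$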
The unit-length input vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are the column vectors of $\mathbf{V}$ and are mapped to the major and minor axes of the output ellipse. The lengths of $\mathbf{G}\mathbf{v}_1$ and $\mathbf{G}\mathbf{v}_2$ equal the singular values (i.e., 2.00 and 0.83, respectively), and their directions are the same as the column vectors of $\mathbf{U}$. For comparison, we also show the eigenvectors of $\mathbf{G}$, $\mathbf{w}_1$ and $\mathbf{w}_2$. They do not change their directions after being mapped to the outputs. Each singular value represents a “mode” of the response matrix, which has a pair of characteristic input and output directions, and the singular value is the gain between them. When evaluating the output of a general input vector, we can project it onto the input directions of different modes and then combine the corresponding outputs.
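The numbers in (2.11) can be reproduced without a linear-algebra library, because for a 2-by-2 matrix the singular values are the square roots of the eigenvalues of $\mathbf{G}^T\mathbf{G}$, which have a closed form. The sketch below is an illustration only; the helper name `svd_2x2_leading` is hypothetical, and it assumes the off-diagonal entry of $\mathbf{G}^T\mathbf{G}$ is nonzero, which holds for this example.

```python
import math


def svd_2x2_leading(G):
    """Return (sigma1, sigma2, u1, v1) for a real 2x2 matrix G.

    Assumes G^T G has a nonzero off-diagonal entry (true for the example below).
    """
    (a, b), (c, d) = G
    # Entries of the symmetric matrix G^T G = [[p, q], [q, r]].
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    mean, radius = (p + r) / 2.0, math.hypot((p - r) / 2.0, q)
    sigma1 = math.sqrt(mean + radius)
    sigma2 = math.sqrt(max(mean - radius, 0.0))
    # Eigenvector of G^T G for eigenvalue sigma1^2 gives the input direction v1.
    vx, vy = q, sigma1 ** 2 - p
    norm = math.hypot(vx, vy)
    v1 = (vx / norm, vy / norm)
    # G maps v1 to sigma1 * u1, which defines the output direction u1.
    gx, gy = a * v1[0] + b * v1[1], c * v1[0] + d * v1[1]
    u1 = (gx / sigma1, gy / sigma1)
    return sigma1, sigma2, u1, v1


G = [[1.5, 0.0], [-1.1, 1.1]]
sigma1, sigma2, u1, v1 = svd_2x2_leading(G)
print(round(sigma1, 2), round(sigma2, 2))
```

The computed singular values and the first input and output directions agree with (2.11) to two decimals. Singular vectors are only defined up to a common sign flip of $\mathbf{u}_i$ and $\mathbf{v}_i$, so a general SVD routine may return both with opposite signs.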
2.3.3 Response Matrix Uncertainties

The identified response matrix $\mathbf{R}_{id}$ has errors compared to the actual one $\mathbf{R}$, degrading the performance of the static controller derived from $\mathbf{R}_{id}$ according to (2.4). The uncertainties may come from the errors in beam actuators and detectors or the operating point drifts in the presence of nonlinearity. We introduce a linear relation to evaluate the size of the uncertainties of $\mathbf{R}_{id}$:

$$\mathbf{R} = \mathbf{R}_{id}(\mathbf{I}_n + \gamma\boldsymbol{\Delta}), \tag{2.12}$$
where $\gamma > 0$, $\mathbf{I}_n$ is a unit matrix, and $\boldsymbol{\Delta} \in \mathbb{R}^{n \times n}$ is a random perturbation matrix satisfying $\|\boldsymbol{\Delta}\|_2 \le 1$. For generality, we do not assume any structure or concrete values of $\boldsymbol{\Delta}$. The only information is that its 2-norm is bounded, i.e., not larger than 1. The 2-norm of a matrix $\mathbf{M}$ is defined by its largest singular value, denoted as

$$\|\mathbf{M}\|_2 = \bar{\sigma}(\mathbf{M}). \tag{2.13}$$

Precise evaluation of the relative uncertainty size $\gamma$ is complex (Gayadeen et al. 2015). Here we only provide a simple method to estimate a lower bound of $\gamma$ using the response matrices determined through system identification (i.e., identified response matrices). At a particular operating point, we can repeat the response matrix identification multiple times and denote the results as $\mathbf{R}_{id,k}$ with $k = 1, 2, \ldots, K$. We assume that $\mathbf{R}_{id,k}$ has no bias compared to the actual response matrix $\mathbf{R}$, only random variations. Therefore, we can average $\mathbf{R}_{id,k}$ to estimate $\mathbf{R}$:

$$\mathbf{R} \approx \frac{1}{K}\sum_{k=1}^{K}\mathbf{R}_{id,k}. \tag{2.14}$$

For each identified response matrix, we rewrite (2.12) as

$$\mathbf{R} = \mathbf{R}_{id,k}(\mathbf{I}_n + \gamma_k\boldsymbol{\Delta}) = \mathbf{R}_{id,k} + \gamma_k\mathbf{R}_{id,k}\boldsymbol{\Delta}. \tag{2.15}$$

Then, we have $\mathbf{R} - \mathbf{R}_{id,k} = \gamma_k\mathbf{R}_{id,k}\boldsymbol{\Delta}$. Applying the 2-norm to both sides, we obtain

$$\|\mathbf{R} - \mathbf{R}_{id,k}\|_2 = \gamma_k\|\mathbf{R}_{id,k}\boldsymbol{\Delta}\|_2 \le \gamma_k\|\mathbf{R}_{id,k}\|_2\|\boldsymbol{\Delta}\|_2 \le \gamma_k\|\mathbf{R}_{id,k}\|_2. \tag{2.16}$$

Here we have used the inequality $\|\mathbf{A}\mathbf{B}\| \le \|\mathbf{A}\|\|\mathbf{B}\|$ for matrix norms. For the 2-norm, equality occurs when the output direction of $\bar{\sigma}(\mathbf{B})$ aligns with the input direction of $\bar{\sigma}(\mathbf{A})$. From (2.16), the bound on $\gamma_k$ is given by

$$\gamma_k \ge \|\mathbf{R} - \mathbf{R}_{id,k}\|_2 \,/\, \|\mathbf{R}_{id,k}\|_2. \tag{2.17}$$

Finally, the magnitude of the uncertainty of the response matrix takes the maximum value of $\gamma_k$:

$$\gamma \ge \max\{\gamma_1, \gamma_2, \ldots, \gamma_K\}. \tag{2.18}$$
We applied this method to evaluate the uncertainty of the SwissFEL bunch2 response matrices (2.7) and (2.9) and obtained γ ≥ 0.1. It means that the uncertainty of the identified response matrices may exceed 10%. This is a rough estimate because we only measured the response matrix twice. Multiple identifications are required for a more precise estimate. The uncertainty information is useful for the robust control design of static controllers. See Sect. 2.4.4.
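The estimate (2.14) to (2.18) amounts to a few norm evaluations. The sketch below walks through it for two hypothetical 2-by-2 identified matrices (not the SwissFEL matrices, which are 5-by-8 and would require a general SVD routine for the 2-norm). The helper names `norm2` and `estimate_gamma` are illustrative assumptions.

```python
import math


def norm2(M):
    """2-norm (largest singular value) of a real 2x2 matrix."""
    (a, b), (c, d) = M
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d   # entries of M^T M
    return math.sqrt((p + r) / 2.0 + math.hypot((p - r) / 2.0, q))


def estimate_gamma(R_ids):
    """Lower bound of gamma from K identified 2x2 response matrices."""
    K = len(R_ids)
    # (2.14): average the identified matrices to estimate the actual one
    R_mean = [[sum(Rk[i][j] for Rk in R_ids) / K for j in range(2)] for i in range(2)]
    gammas = []
    for Rk in R_ids:
        diff = [[R_mean[i][j] - Rk[i][j] for j in range(2)] for i in range(2)]
        gammas.append(norm2(diff) / norm2(Rk))   # (2.17)
    return max(gammas), gammas                   # (2.18)


# Two hypothetical identification results at the same operating point.
R_id_1 = [[1.0, 0.0], [0.0, 1.0]]
R_id_2 = [[1.2, 0.0], [0.0, 1.0]]
gamma, gammas = estimate_gamma([R_id_1, R_id_2])
```

For these toy matrices the mean differs from each identification by 0.1 in one element, giving bounds of 0.1 and about 0.083, so the estimate is $\gamma \ge 0.1$. With only two identifications the average in (2.14) is itself uncertain, which is why the text calls such a result a rough estimate.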
2 Beam Feedback Control
2.4 Static Linear Feedback Controller Design
With the estimated response matrix $R_{\mathrm{id}}$, we implement the static controller according to (2.4). It requires the inversion of $R_{\mathrm{id}}$, which is often difficult if $R_{\mathrm{id}}$ is ill conditioned. In this section, we introduce several useful methods for inverting an ill-conditioned matrix.
2.4.1 Difficulties in Response Matrix Inversion
If a matrix has a large condition number for inversion (condition number for short), a small perturbation of the matrix may lead to significant inversion errors. The 2-norm condition number of a general matrix M is defined as
$$\kappa(M) = \|M\|_2\,\|M^{-1}\|_2 = \bar{\sigma}(M)/\underline{\sigma}(M),$$
(2.19)
where $\bar{\sigma}(M)$ and $\underline{\sigma}(M)$ are the largest and smallest singular values of M, respectively. M is said to be ill conditioned if its condition number is greater than a few. Inverting an ill-conditioned matrix is usually problematic. For any matrix norm, the following inequality holds (Horn and Johnson 2013):
$$\frac{\|(M+E)^{-1} - M^{-1}\|}{\|M^{-1}\|} \le \frac{\kappa(M)}{1 - \|M^{-1}E\|}\,\frac{\|E\|}{\|M\|},$$
(2.20)
where E is a perturbation matrix of M, and the bound requires $\|M^{-1}E\| < 1$. Note that if the norm in (2.20) is not the 2-norm, the condition number should be defined using the same type of norm. For a particular size of E, a larger condition number leads to more significant errors in the matrix inverse. Let us look at an example. Assume M = [1 2; 2 3] and E = [0.05 0; 0 0], which perturbs the first element of M by 5%. We can calculate that κ(M) = 17.9, indicating that M is ill conditioned. The left side of (2.20) is calculated to be 0.18 and the right side 0.26. These results imply a large relative error (close to 20%) in the inverse of M if it is perturbed by E. Here we write down their inverses for comparison: $M^{-1}$ = [−3 2; 2 −1] and $(M+E)^{-1}$ = [−3.53 2.35; 2.35 −1.24]. Applying (2.20) to the uncertainty model (2.12) of the response matrix, we obtain
$$\frac{\|R^{-1} - R_{\mathrm{id}}^{-1}\|_2}{\|R_{\mathrm{id}}^{-1}\|_2} \le \frac{\gamma}{1-\gamma}\,\kappa(R_{\mathrm{id}}).$$
(2.21)
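The 2 × 2 example used above to illustrate (2.19) and (2.20) can be checked numerically. The following sketch only re-evaluates the quantities quoted in the text; the helper name `norm2` is introduced here for brevity.

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [2.0, 3.0]])
E = np.array([[0.05, 0.0],
              [0.0, 0.0]])          # perturbs the first element of M by 5%


def norm2(A):
    """Matrix 2-norm, i.e., the largest singular value."""
    return np.linalg.norm(A, 2)


M_inv = np.linalg.inv(M)
ME_inv = np.linalg.inv(M + E)

kappa = norm2(M) * norm2(M_inv)                                  # (2.19)
lhs = norm2(ME_inv - M_inv) / norm2(M_inv)                       # left side of (2.20)
rhs = kappa / (1.0 - norm2(M_inv @ E)) * norm2(E) / norm2(M)     # right side of (2.20)

print(f"kappa = {kappa:.1f}, lhs = {lhs:.2f}, rhs = {rhs:.2f}")
```

The printed values should agree with κ(M) = 17.9, 0.18 and 0.26 given in the example.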
Therefore, to mitigate the error of $R_{\mathrm{id}}^{-1}$ relative to the actual response matrix inverse $R^{-1}$, we must reduce the uncertainty and condition number of $R_{\mathrm{id}}$. Let us use the SwissFEL bunch2 response matrices (2.7) and (2.9) as an example. We assume that (2.9) is the actual response matrix, i.e., $R = R_{\mathrm{id,ls}}$, and $R_{\mathrm{id}}$ is given by (2.7). The relative error of the matrix inverse is calculated to be about 0.5 following the left-hand side of (2.21), where we have used the SVD-based method (Sect. 2.4.2) to calculate the matrix inverses. The uncertainty size of $R_{\mathrm{id}}$ with respect to R is estimated with (2.17) as γ ≥ 0.2. Here we take γ = 0.2. Then, the right-hand side of (2.21) is calculated to be around 3.5. Therefore, the relation (2.21) is satisfied. It provides an upper bound for the relative error of the matrix inverse. In this example, the matrix-inverse relative error may exceed 100% for some identified response matrices due to the large condition number of the plant ($\kappa(R_{\mathrm{id}}) \approx 14.2$). Note that we have estimated the uncertainties of (2.7) and (2.9) to be γ ≥ 0.1 in Sect. 2.3.3, which differs from the estimate here. The reason is that here we assumed (2.9) as the actual response matrix instead of using their average as we did in Sect. 2.3.3.
The condition number of a response matrix represents the ratio of the maximum and minimum gains of the plant in different input/output directions. Controlling an ill-conditioned plant is difficult. More control actuation is required to mitigate the output errors in low-gain directions, possibly saturating the beam actuators. The static controller (i.e., response matrix inverse) may have significant errors and cannot decouple the MIMO plant effectively, leaving coupling between the dynamical controllers of different beam parameters. In this case, the dynamical controller must be designed as a general MIMO controller; if we still design the controllers separately according to the statements in Sect. 2.2.2, stability issues may arise. Note that the units of the plant’s inputs and outputs affect the gains between them and thus the condition number of the response matrix.
Here are some basic guidelines for selecting the units and scales for the plant’s inputs and outputs:
(a) Use the same unit or units with similar scales for the inputs and outputs
If the inputs and outputs of a plant have uniform units, the response matrix has a relatively smaller condition number. One example is the plant of beam orbit control. Its inputs (corrector magnet control settings) have the same unit, and so do the outputs (BPM readings). For longitudinal control, we define the units as follows:
• Use radian for phases and relative value ΔA/A for amplitudes.
• Convert the bunch arrival time (in femtosecond) into RF phase in radian.
• Use relative values with respect to the operating point for beam energy (ΔE/E), bunch length (ΔL/L) and bunch charge (ΔQ/Q).
(b) Scale the inputs and outputs with respect to their limits
The largest values of the inputs and outputs for beam control are determined by the dynamic ranges of the beam actuators and detectors. They are also limited by the requirements for efficient beam transmission. We use the plant in Fig. 2.2 as an example. On the input side, the RF amplitude is limited by the maximum cavity voltage achievable by the RF station. On the output side, the beam must fall into the valid ranges of the beam energy and bunch length detectors. Furthermore, to achieve successful beam transmission and desired beam parameters, the RF amplitudes and phases are limited to particular ranges. We can scale the plant’s inputs and outputs using their limits to normalize them into a standard range [0, 1]. For example, the RF amplitudes can be normalized into new input variables as
$$\tilde{A}_{\mathrm{SP}i} = \frac{A_{\mathrm{SP}i} - A_{\mathrm{SP}i,\min}}{A_{\mathrm{SP}i,\max} - A_{\mathrm{SP}i,\min}} \in [0, 1], \quad i = 1, 2, \ldots, k,$$
(2.22)
where $A_{\mathrm{SP}i,\min}$ and $A_{\mathrm{SP}i,\max}$ are the lower and upper limits of the amplitude setpoint of the ith RF station. The phases and the outputs (beam energy and bunch length) can be scaled in the same way. In addition to choosing proper units, scaling of the plant’s inputs and outputs may further reduce the condition number of the response matrix. More discussions of scaling a dynamical plant’s inputs, outputs and transfer functions can be found in the reference (Skogestad and Postlethwaite 2005).
We use the SwissFEL bunch2 response matrix (2.7) as an example. Its condition number is about 14.2. As mentioned in Sect. 2.3.1, we followed guideline (a) above when selecting the units of the inputs and outputs. As a comparison, we change the unit of the step phases to degree and use the unscaled values of $L_{\mathrm{BC1}}$ and $L_{\mathrm{BC2}}$, resulting in a response matrix with a condition number of $3.2 \times 10^4$. We also scale the inputs with their limits following guideline (b) above. The step ratios are limited to [0, 1], and the step phases to [−60°, 60°]. The scaling further reduces the condition number of the response matrix to around 12.5. Therefore, we must select the correct units for the plant inputs and outputs to reduce the condition number of its response matrix. The scaling using the limits is typically optional.
Most beam-control plants are ill conditioned and challenging to control. We must make tradeoffs between stability and control performance. We will introduce several practical methods for designing static controllers for such ill-conditioned response matrices in the presence of uncertainties.
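The effect of units on the condition number can be illustrated with a toy example. The 2 × 2 matrix below is hypothetical (it is not the SwissFEL response matrix), and the helper names `cond2` and `normalize` are assumptions of this sketch. Changing the unit of one input from radian to degree rescales the corresponding column of the response matrix, and `normalize` implements the limit-based scaling of (2.22).

```python
import numpy as np


def cond2(R):
    """2-norm condition number: ratio of largest to smallest singular value."""
    s = np.linalg.svd(R, compute_uv=False)
    return s[0] / s[-1]


def normalize(x, x_min, x_max):
    """Map x in [x_min, x_max] to [0, 1], as in (2.22)."""
    return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)


# Hypothetical plant: inputs are (relative amplitude, phase in radian).
R_rad = np.array([[1.0, -0.6],
                  [0.8, 1.5]])

# Same plant with the phase input in degree: u_rad = (pi/180) * u_deg,
# so the phase column of the response matrix is multiplied by pi/180.
R_deg = R_rad @ np.diag([1.0, np.pi / 180.0])

print(f"cond (radian) = {cond2(R_rad):.2f}")
print(f"cond (degree) = {cond2(R_deg):.2f}")

# Limit-based scaling of step phases restricted to [-60 deg, 60 deg].
phase_norm = normalize([-60.0, 0.0, 60.0], -60.0, 60.0)
```

In this toy case the degree-based matrix is far worse conditioned than the radian-based one, which mirrors the trend reported for (2.7).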
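2.4.2 Matrix Inversion with SVD
The SVD of a response matrix $R_{\mathrm{id}} \in \mathbb{R}^{m\times n}$ (m ≤ n, and rank($R_{\mathrm{id}}$) = m) is $R_{\mathrm{id}} = U\Sigma V^T$ given by (2.10). The pseudo-inverse of $R_{\mathrm{id}}$ can be calculated as
$$R_{\mathrm{id}}^{-1} = V\tilde{\Sigma}^{-1}U^T = \underbrace{\begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix}}_{V}\; \underbrace{\begin{bmatrix} \sigma_1^{-1} & & \\ & \ddots & \\ & & \sigma_m^{-1} \\ & 0 & \end{bmatrix}}_{\tilde{\Sigma}^{-1}}\; \underbrace{\begin{bmatrix} u_1^T \\ \vdots \\ u_m^T \end{bmatrix}}_{U^T} \in \mathbb{R}^{n\times m},$$
(2.23)
where $\tilde{\Sigma}^{-1} \in \mathbb{R}^{n\times m}$ is the equivalent inverse of Σ. We can prove that $R_{\mathrm{id}}R_{\mathrm{id}}^{-1} = I_m$. Therefore, we obtain a design of the static controller according to (2.4) as
$$K_s \approx R_{\mathrm{id}}^{-1} = V\tilde{\Sigma}^{-1}U^T.$$
(2.24)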
Here we use $R_{\mathrm{id}}$ as an approximation of the actual response matrix R. If $R_{\mathrm{id}}$ is not a square matrix, i.e., m < n, the product $R_{\mathrm{id}}^{-1}R_{\mathrm{id}}$ is usually not equal to $I_n$. The difference between $R_{\mathrm{id}}R_{\mathrm{id}}^{-1}$ and $R_{\mathrm{id}}^{-1}R_{\mathrm{id}}$ is explained as follows:
(a) $R_{\mathrm{id}}R_{\mathrm{id}}^{-1} = I_m$ describes the normalized plant in Fig. 2.3. It means that the beam parameters in y can be directly achieved by setting their desired values as the equivalent inputs u' for the normalized plant. Here we neglected the plant dynamics, i.e., assuming G(s) = 1.
(b) Assume the plant inputs have a deviation Δu resulting in beam parameter errors e = r − y. We use $R_{\mathrm{id}}^{-1}$ to calculate the required input corrections to compensate for e. The relation $R_{\mathrm{id}}^{-1}R_{\mathrm{id}} \ne I_n$ implies that the calculated corrections may not equal −Δu. This is because the plant has more inputs than outputs, and different input combinations may result in the same outputs.
When used as a static controller, $R_{\mathrm{id}}^{-1}$ estimates the required corrections of beam actuators using the beam parameter errors. If $R_{\mathrm{id}}$ is ill conditioned, its singular values differ significantly. Therefore, the beam parameter errors in certain directions may be scaled by the inverses of small singular values, resulting in dramatic corrections of beam actuators. This may saturate the beam actuators and cause instability. For controlling such ill-conditioned plants, some applications replace the inverses of the smaller singular values with zero. This method is called singular-value truncation; it only corrects the beam parameter errors in the directions of the remaining singular values. It can avoid saturating the beam actuators and make the feedback loop stable. Note that this method cannot compensate for the beam parameter errors in the directions of the truncated singular values. Singular-value truncation is widely used in beam orbit controls because the orbit errors align mainly with the directions of the first several most significant singular values (Mirza et al. 2019).
However, this method is unsuitable for longitudinal feedback, which has only a few beam parameters to control and must suppress the errors in all directions. We will compare the SwissFEL bunch2 control results with and without singular-value truncation in Sect. 2.4.5.
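A minimal NumPy sketch of the SVD-based inverse (2.23) with optional singular-value truncation is given below. The function name `svd_inverse`, its `n_keep` argument and the example matrix are assumptions made for illustration.

```python
import numpy as np


def svd_inverse(R_id, n_keep=None):
    """Pseudo-inverse V * Sigma~^-1 * U^T of an m x n matrix (m <= n), as in (2.23).

    If n_keep is given, only the n_keep largest singular values are inverted
    and the inverses of the remaining ones are set to zero (truncation).
    """
    U, s, Vt = np.linalg.svd(R_id, full_matrices=False)   # s sorted descending
    s_inv = 1.0 / s
    if n_keep is not None:
        s_inv[n_keep:] = 0.0
    return Vt.T @ np.diag(s_inv) @ U.T                    # shape (n, m)


# Hypothetical ill-conditioned 2 x 3 response matrix (nearly parallel rows).
R_id = np.array([[1.0, 0.9, 0.5],
                 [1.0, 1.0, 0.5]])

K_full = svd_inverse(R_id)             # no truncation
K_trunc = svd_inverse(R_id, n_keep=1)  # discard the smallest singular value

print("R_id K_full  =\n", np.round(R_id @ K_full, 6))    # identity I_m
print("K_full R_id  =\n", np.round(K_full @ R_id, 3))    # not I_n for m < n
print("||K_full||2  =", np.linalg.norm(K_full, 2))
print("||K_trunc||2 =", np.linalg.norm(K_trunc, 2))
```

The truncated controller has a much smaller gain, at the price of leaving the discarded output direction uncorrected.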
2.4.3 Matrix Inversion with Least-Square Method
The response-matrix inversion problem can be converted to an optimization problem. The input and output deviations satisfy Δy = RΔu. If R and Δy are known, Δu can be obtained by solving the optimization problem
$$\Delta u = \arg\min_{x\in\mathbb{R}^n} \left\{ \|\Delta y - Rx\|_2^2 + \lambda\|x\|_2^2 \right\}.$$
(2.25)
Here λ > 0 is a regularization factor that limits the magnitude of Δu by adding an extra penalty to it (Gayadeen et al. 2017). Equation (2.25) has an analytical solution given by
$$\Delta u = \left(R^T R + \lambda I_n\right)^{-1} R^T \Delta y.$$
(2.26)
Then, we obtain another solution for the static controller as
$$K_s \approx \left(R_{\mathrm{id}}^T R_{\mathrm{id}} + \lambda I_n\right)^{-1} R_{\mathrm{id}}^T.$$
(2.27)
Since the actual R is unknown, we use $R_{\mathrm{id}}$ to approximate it. The regularization factor λ plays an essential role in this algorithm. It avoids excessively large beam actuator corrections but keeps all beam parameter directions under control. Therefore, this method often performs better for longitudinal feedback than the SVD-based method with singular-value truncation. A larger λ results in a more robust static controller, which can handle response matrices with larger uncertainties or condition numbers. The cost is that the feedback loop’s response may become slower, i.e., with a smaller closed-loop bandwidth. If m < n, $R_{\mathrm{id}}^T R_{\mathrm{id}} \in \mathbb{R}^{n\times n}$ does not have full rank. In this case, we must have λ > 0 so that the inversion in (2.27) is feasible. We will demonstrate the SwissFEL bunch2 control results for different values of λ in Sect. 2.4.5.
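The regularized solution (2.27) can be sketched as follows. The function name `ls_inverse` and the 2 × 4 example matrix are hypothetical, and the linear system is solved directly rather than forming the explicit inverse, which is a numerical choice of this sketch.

```python
import numpy as np


def ls_inverse(R_id, lam):
    """Static controller (R^T R + lam * I_n)^-1 R^T as in (2.27); needs lam > 0 if m < n."""
    n = R_id.shape[1]
    return np.linalg.solve(R_id.T @ R_id + lam * np.eye(n), R_id.T)   # shape (n, m)


# Hypothetical 2 x 4 response matrix (more inputs than outputs).
R_id = np.array([[1.0, 0.5, 0.0, 0.2],
                 [0.0, 1.0, 0.5, 0.1]])

for lam in (0.01, 0.1, 0.7):
    K_s = ls_inverse(R_id, lam)
    gain = np.linalg.norm(K_s, 2)                          # largest actuator gain
    err = np.linalg.norm(R_id @ K_s - np.eye(2), 2)        # deviation from decoupling
    print(f"lambda = {lam:4.2f}: ||K_s||2 = {gain:.3f}, ||R K_s - I||2 = {err:.3f}")
```

A larger λ lowers the controller gain while `R_id @ K_s` moves further away from the identity, which reflects the tradeoff between robustness and loop speed described above.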
2.4.4 Robust Control Design
Control theory tells us that a feedback controller can represent the inverse transfer function of the plant. This provides another method to estimate the response matrix inverse. Consider the loop in Fig. 2.10, where we introduce a matrix $M \in \mathbb{R}^{n\times m}$ and connect it with the plant in a positive feedback loop. The plant’s actual response matrix is $R \in \mathbb{R}^{m\times n}$ with m ≤ n, and rank(R) = m. With a proper M, we obtain a unity closed-loop transfer matrix $I_m$ from r to y, implying perfect reference tracking. That is, we have
$$RM(I_m - RM)^{-1} = I_m.$$
(2.28)
Compared to (2.4), we obtain another design of the static controller as
$$K_s \approx M(I_m - R_{\mathrm{id}}M)^{-1} =: Q.$$
(2.29)
Fig. 2.10 Feedback loop to generate equivalent inverse of the plant’s response matrix
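As a quick numerical check of (2.28) and (2.29): multiplying (2.28) from the right by $(I_m - RM)$ gives $RM = I_m/2$, so one solution is M equal to half of a right inverse of R. The sketch below assumes a hypothetical, exactly known R (so that $R_{\mathrm{id}} = R$) and uses the Moore-Penrose pseudo-inverse as the right inverse.

```python
import numpy as np

# Hypothetical plant with m = 2 outputs and n = 4 inputs, assumed exactly known.
R = np.array([[1.0, 0.5, 0.0, 0.2],
              [0.0, 1.0, 0.5, 0.1]])
m = R.shape[0]

# One solution of (2.28): R M = I_m / 2.
M = 0.5 * np.linalg.pinv(R)

loop_inv = np.linalg.inv(np.eye(m) - R @ M)
closed_loop = R @ M @ loop_inv     # left side of (2.28)
Q = M @ loop_inv                   # (2.29) with R_id = R

print(np.round(closed_loop, 6))    # identity: perfect reference tracking
```

With this choice, Q coincides with the pseudo-inverse of R, which illustrates why Q can serve as the static controller.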
Fig. 2.11 Extended feedback loop with the modeling of response matrix uncertainties
We use $R_{\mathrm{id}}$ to approximate the unknown actual R and define a new matrix Q for later usage. A feasible inversion requires $\det(I_m - R_{\mathrm{id}}M) \ne 0$. The feedback loop above converts the problem of inverting R into solving for M. Since R is unknown, we use its approximation $R_{\mathrm{id}}$ with an explicit model of its uncertainty. The value of r is not relevant, so we set it to zero. Then Fig. 2.10 can be extended to Fig. 2.11. We used the response-matrix uncertainty model (2.12) with γ replaced by $W_u$. The design will find an optimal control matrix M to mitigate the disturbance w added to the plant’s outputs. Good disturbance-rejection capability also implies excellent reference tracking performance. Both are determined by the sensitivity function of the closed loop. We define several weighting factors (as scalar values) to guide the design of M:
• $W_u$ is the weight of the uncertainty. Typically, we set $W_u \ge \gamma$ with γ estimated by (2.18). With a larger $W_u$, the design of M (and further $K_s$) is robust to larger uncertainties, but the closed-loop (Fig. 2.3) response might be slower with the same dynamical controller.
• $W_d$ defines the size of the disturbance. A larger $W_d$ requires M to respond faster, and the resulting $K_s$ is less robust to uncertainties.
• $W_e$ defines the allowed residual errors for disturbance rejection. A larger $W_e$ requires more disturbance suppression if the maximum magnitude of e is fixed and therefore requires stronger control actions from M.
These factors are tuned empirically when synthesizing M, making tradeoffs between robust stability and robust performance (Skogestad and Postlethwaite 2005). Robust stability requires the closed loop to be stable for all possible perturbation matrices with $\|\Delta\|_2 \le 1$, and the robust-performance goal is to achieve $\|e\|_2 \le 1$ for all $\|w\|_2 \le 1$ and all $\|\Delta\|_2 \le 1$. We will use the robust control design method (Smith and Packard 1996; Rezaeizadeh et al. 2016; Gayadeen et al. 2017) to synthesize the control matrix M.
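First, we convert the loop in Fig. 2.11 into a general plant P (see Fig. 2.12), which is written as
$$\begin{bmatrix} z \\ e \\ y \end{bmatrix} = P \begin{bmatrix} v \\ w \\ u \end{bmatrix}, \quad \text{where } P = \begin{bmatrix} 0 & 0 & W_u I_n \\ W_e R_{\mathrm{id}} & W_e W_d I_m & W_e R_{\mathrm{id}} \\ R_{\mathrm{id}} & W_d I_m & R_{\mathrm{id}} \end{bmatrix}.$$
(2.30)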
Fig. 2.12 General plant diagram for robust control design
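Note that $u, z, v \in \mathbb{R}^n$ and $y, e, w \in \mathbb{R}^m$. When closing the lower loop with M, u and y become internal variables and do not appear in the closed-loop transfer function. Therefore, we obtain
$$\begin{bmatrix} z \\ e \end{bmatrix} = N \begin{bmatrix} v \\ w \end{bmatrix}, \quad \text{with } N = \Psi + UQV, \text{ where}$$
$$\Psi = \begin{bmatrix} 0 & 0 \\ W_e R_{\mathrm{id}} & W_e W_d I_m \end{bmatrix}, \quad U = \begin{bmatrix} W_u I_n \\ W_e R_{\mathrm{id}} \end{bmatrix}, \quad V = \begin{bmatrix} R_{\mathrm{id}} & W_d I_m \end{bmatrix}.$$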
(2.31)
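As a consistency check of (2.30) and (2.31), the closed-loop matrix N can be compared with a direct solution of the loop equations read from the rows of P. In the sketch below, the random matrices, the weights and the name `Psi` for the constant term of N are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 3
R_id = rng.standard_normal((m, n))        # hypothetical identified response matrix
M = 0.1 * rng.standard_normal((n, m))     # any M with I_m - R_id M invertible
Wu, Wd, We = 0.5, 1.0, 0.4                # scalar weights

# Closed-loop matrix from (2.29) and (2.31).
Q = M @ np.linalg.inv(np.eye(m) - R_id @ M)
Psi = np.block([[np.zeros((n, n)), np.zeros((n, m))],
                [We * R_id, We * Wd * np.eye(m)]])
U = np.vstack([Wu * np.eye(n), We * R_id])
V = np.hstack([R_id, Wd * np.eye(m)])
N = Psi + U @ Q @ V

# Direct solution of the loop: y = R_id (u + v) + Wd w with u = M y.
v = rng.standard_normal(n)
w = rng.standard_normal(m)
y = np.linalg.solve(np.eye(m) - R_id @ M, R_id @ v + Wd * w)
u = M @ y
z = Wu * u            # first row of P
e = We * y            # second row of P

vw = np.concatenate([v, w])
ze = np.concatenate([z, e])
print("max deviation:", np.max(np.abs(N @ vw - ze)))
```

The deviation should be at the level of floating-point rounding, confirming that u and y have been eliminated consistently.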
Here N is the closed-loop transfer matrix, which has bounded uncertainties due to the presence of the perturbation matrix Δ. The matrix Q has been defined in (2.29) and $K_s \approx Q$. Therefore, the design problem changes to finding an optimal Q to minimize the structured singular value of N perturbed by $\|\Delta\|_2 \le 1$. The structured singular value of N is the upper bound of the maximum singular values of all possible N (N has infinitely many possible values due to the presence of Δ). It means that we search for a realization of Q to provide maximum disturbance rejection. Here the word “maximum” stands for the average performance considering all possible uncertainties. This is analogous to the H∞-design criteria for the dynamical system case. To facilitate minimizing N’s structured singular value, we define a set of matrices that commute with the perturbation Δ:
$$\mathcal{Z} = \{\operatorname{diag}(zI_n, I_m) \mid z \in \mathbb{R},\ z > 0\}.$$
(2.32)
Then, we need to solve the following optimization problem
$$\inf_{Q\in\mathbb{R}^{n\times m},\ Z\in\mathcal{Z}} \bar{\sigma}\left(Z(\Psi + UQV)Z^{-1}\right),$$
(2.33)
which looks for an optimal matrix Q and an optimal scalar z to minimize the structured singular value of N. The matrix Z ensures the minimization of the upper bound of the maximum singular values of all possible N considering the plant’s uncertainties. For a given z, (2.33) can be written in the form $\inf_Q \bar{\sigma}(\Psi_c + U_c Q V_c)$ with the subscript “c” standing for constant matrices. It is the constant-matrix Davis-Kahan-Weinberger problem and can be solved with the Matlab command ruqvsol. We solve the problem (2.33) with the constant-matrix D-K iteration method, which consists of the following steps:
1. Set $Z_0 = I_{n+m}$ and i = 0.
2. Solve
$$Q_i = \arg\inf_{Q\in\mathbb{R}^{n\times m}} \bar{\sigma}\left(Z_i(\Psi + UQV)Z_i^{-1}\right).$$
(2.34)
3. Solve the upper bound of the maximum singular value
$$Z_{i+1} = \arg\inf_{Z\in\mathcal{Z}} \bar{\sigma}\left(Z(\Psi + UQ_iV)Z^{-1}\right).$$
(2.35)
4. Set i = i + 1 and go to step 2.
Typically, we should choose the number of iterations by checking the performance of the resulting $K_s \approx Q$ in a feedback loop controlling the perturbed response matrix. As we have mentioned, the three weighting factors can be adjusted to tune the performance of the resulting Q. This method will also be demonstrated with the SwissFEL bunch2 feedback in the next subsection.
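The iteration steps above can be sketched in Python. This is not the ruqvsol-based implementation: as a stand-in, the Q-step uses a generic derivative-free optimizer (SciPy's Nelder-Mead) and the Z-step a bounded scalar search over log z, which is only practical for small matrices. The function names, the example matrix, the search bounds and the name `Psi` for the constant term of N are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar


def scaled_norm(Psi, U, V, Q, z, n, m):
    """Largest singular value of Z (Psi + U Q V) Z^-1 with Z = diag(z I_n, I_m)."""
    d = np.r_[z * np.ones(n), np.ones(m)]
    N = Psi + U @ Q @ V
    return np.linalg.norm((d[:, None] * N) / d[None, :], 2)


def dk_iteration(R_id, Wu, Wd, We, n_iter=3):
    """Simplified constant-matrix D-K iteration for (2.33)."""
    m, n = R_id.shape
    Psi = np.block([[np.zeros((n, n)), np.zeros((n, m))],
                    [We * R_id, We * Wd * np.eye(m)]])
    U = np.vstack([Wu * np.eye(n), We * R_id])
    V = np.hstack([R_id, Wd * np.eye(m)])

    Q, z = np.zeros((n, m)), 1.0                       # step 1: Z_0 = I
    history = [scaled_norm(Psi, U, V, Q, z, n, m)]
    for _ in range(n_iter):
        # Step 2 (Q-step): generic numerical minimization over Q for fixed z.
        res = minimize(
            lambda q: scaled_norm(Psi, U, V, q.reshape(n, m), z, n, m),
            Q.ravel(), method="Nelder-Mead",
            options={"maxiter": 4000, "xatol": 1e-6, "fatol": 1e-9})
        if res.fun <= history[-1]:                     # keep only improvements
            Q = res.x.reshape(n, m)
        # Step 3 (Z-step): scalar search over t = log(z) for fixed Q.
        res_z = minimize_scalar(
            lambda t: scaled_norm(Psi, U, V, Q, np.exp(t), n, m),
            bounds=(-5.0, 5.0), method="bounded")
        if res_z.fun <= scaled_norm(Psi, U, V, Q, z, n, m):
            z = float(np.exp(res_z.x))
        history.append(scaled_norm(Psi, U, V, Q, z, n, m))   # step 4: next i
    return Q, z, history


# Hypothetical 2 x 3 response matrix and the weights used for case5 in Sect. 2.4.5.
R_id = np.array([[1.0, 0.9, 0.5],
                 [0.2, 1.0, 0.4]])
Q, z, history = dk_iteration(R_id, Wu=0.5, Wd=1.0, We=0.4)
print("objective per iteration:", np.round(history, 4))
```

The recorded objective is non-increasing by construction because each step only accepts improvements. For realistic problem sizes, a dedicated solver for the Q-step is preferable to a generic optimizer.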
2.4.5 SwissFEL Bunch2 Feedback Control
We applied the algorithms discussed in this chapter to the pulse-to-pulse longitudinal feedback control for the bunch2 of SwissFEL. The loop is depicted in Fig. 2.4. It operates at 100 Hz. The gain of the dynamical controller $K_d(z)$ was set to g = 1. Based on the identified response matrix (2.7), we designed the static controller with different algorithms and configurations. They are summarized as follows:
(a) Case1: SVD-based design without singular-value truncation. See Eq. (2.23).
(b) Case2: SVD-based design with singular-value truncation. The singular values of $R_{\mathrm{id}}$ are 4.32, 1.78, 1.09, 0.56 and 0.30. We discarded the last two.
(c) Case3: least-square-based design with λ = 0. See Eq. (2.27).
(d) Case4: least-square-based design with λ = 0.7.
(e) Case5: robust control design with $W_u$ = 0.5, $W_d$ = 1, and $W_e$ = 0.4. We assumed a large response matrix uncertainty (50%) for more robustness.
These five static controllers were tested by tracking the same step changes in the setpoints of the beam parameters. The responses of the beam parameters are depicted in Fig. 2.13, and the corresponding changes of inputs (RF pulse steps) are shown in Fig. 2.14. Below is some analysis of the results:
• The case1 controller (labeled with “SVD no truncation”) corresponds to the direct inverse of $R_{\mathrm{id}}$. It works but with large oscillations in both outputs and inputs, implying too small gain or phase margins.
• The case2 controller (labeled with “SVD with truncation”) results in acceptable responses for $\Delta E_{\mathrm{BC1}}$, $L_{\mathrm{BC1}}$, and $\Delta E_{\mathrm{BC2}}$, but bad performance for $\Delta E_{\mathrm{LH}}$ and $L_{\mathrm{BC2}}$. This is due to the singular-value truncation that leaves some output directions uncontrolled.
• The case3 controller (labeled with “Regul. λ = 0”) is unstable and saturates the inputs. The step ratios are in the range [0, 1], and the step phases are in the range [−60°, 60°]. We do not show the full range of step phases in Fig. 2.14.
• The case4 controller (labeled with “Regul. λ = 0.7”) and the case5 controller (labeled with “Robust control”) both work well with excellent tracking performance for all beam parameters. Some beam parameters (e.g., $\Delta E_{\mathrm{LH}}$) have slower responses, and some have slight overshoots (e.g., $L_{\mathrm{BC2}}$). In principle, their transient responses can be further optimized by implementing different dynamical controllers for the corresponding beam parameters.
Fig. 2.13 Step responses of SwissFEL bunch2 parameters with different static controllers
Fig. 2.14 Settings of RF pulse steps (actuators) for the step response test
2.5 Further Reading and Outlook
Beam feedback control is critical for the successful operation of accelerators. Many studies have been conducted, and many practical systems are in operation in different accelerator facilities. We provide some more references in addition to those given in previous sections (Steinhagen 2007; Fairley et al. 2009; Tian et al. 2015; Rezaeizadeh 2016; Dinter 2018). This chapter only discusses the basic concepts, architecture, and algorithms of beam feedback control; here, we give an outlook on future R&D.
In addition to inverting ill-conditioned response matrices, another difficulty in designing beam feedback controllers comes from the drifts or nonlinearity of the system. Drifts result in a time-varying response matrix, and nonlinearity makes the response matrix dependent on the operating point. Both may lead to performance degradation or even instability. The least-square method with regularization (Sect. 2.4.3) and the robust control method (Sect. 2.4.4) can partly mitigate the effects of drifts and nonlinearity. Here we provide several other possibilities to handle this problem.
In Chap. 4 of this book, we build a neural-network surrogate model of the accelerator. When changing the beam to a different operating point, we estimate the response matrix based on the surrogate model and reconfigure the feedback controller using the matrix-inversion methods given in this chapter. This strategy helps make fast changes between different operating points of a nonlinear system. Alternatively, one may identify the beam response matrices at different operating points in advance, and then update the controller configuration during run-time based on the current operating point. This method is known as gain scheduling. Adaptive control is another method to adapt the response matrix or the feedback controller to the drifts or operating point changes. In the 1990s, applying neural networks to nonlinear system control was a hot topic. The studies focused on adaptive control (Chen and Khalil 1992; Mazumdar 1995; Widrow and Plett 1997; Ge et al. 2002) and internal model control (IMC) (Nahas et al. 1992; Rivals and Personnaz 1996). Accelerator beam feedback control may benefit from these studies if applicable and needed.
Chapter 4 also introduces a data-driven method for designing feedback controllers: reinforcement learning (RL). It synthesizes feedback controllers using the input–output data of the system without needing a system model (i.e., response matrix). Using the RL method, we will demonstrate implementing a static controller for SwissFEL bunch2. The attractive point of RL is that by introducing deep neural networks (i.e., deep RL), it may be able to implement a single controller that works for an extensive range of operating points of a nonlinear system. We will give more information in Chap. 4.
Nowadays, computers are becoming more and more powerful for real-time processing. Many feedback systems perform real-time optimizations at each timestep (i.e., control step) of the discrete control to identify system models or produce control actions. In Amin Rezaeizadeh’s thesis (Rezaeizadeh 2016), many control algorithms based on real-time optimizations are applied to accelerator beam controls. For example, Chap. 5 of the thesis discusses the beam energy feedback that determines the amplitudes, phases, or modulator high voltages of multiple RF stations using optimizations at each control step. Chapter 6 of the thesis introduces a real-time optimization-based algorithm (adaptive control) for refining the beam response matrix and determining the input corrections at each control step. This algorithm helps gradually adapt the beam response matrix to the accelerator’s drifts or the operating point changes. The optimization problems in such algorithms are typically convex and can be solved efficiently. Feedback based on real-time (convex) optimizations is an attractive direction for future beam feedback systems.
References
F.C. Chen, H.K. Khalil, Adaptive control of nonlinear systems using neural networks. Int. J. Control 55(6), 1299–1317 (1992). https://doi.org/10.1080/00207179208934286
H. Dinter, Longitudinal diagnostics for beam-based intra bunch-train feedback at FLASH and the European XFEL. Ph.D. Thesis, Hamburg University (2018)
D. Fairley, S. Allison, S. Chevtsov et al., Beam based feedback for the Linac coherent light source, in Proceedings of ICALEPCS2009, Kobe, Japan, 12–16 Oct 2009 (2009)
S. Gayadeen, M.T. Heron, G. Rehm, Uncertainty modeling of response matrix, in Proceedings of ICALEPCS2015 Conference, Melbourne, Australia, 17–23 Oct 2015 (2015)
S. Gayadeen, S.R. Duncan, G. Rehm, Optimal control of perturbed static systems for synchrotron electron beam stabilization. IFAC PapersOnLine 50(1), 9967–9972 (2017)
S.S. Ge, C.C. Hang, T.H. Lee et al., Stable Adaptive Neural Network Control (Springer, New York, 2002)
T. Himel, S. Allison, P. Grossberg et al., Adaptive cascaded beam-based feedback at the SLC, in Proceedings of the PAC93 Conference, Washington DC, USA, 17–20 May 1993 (1993)
R.A. Horn, C.R. Johnson, Matrix Analysis, 2nd edn. (Cambridge University Press, New York, 2013)
S.K. Mazumdar, Adaptive control of nonlinear systems using neural networks, Ph.D. Thesis, The University of Adelaide (1995)
S.H. Mirza, R. Singh, P. Forck et al., Closed orbit correction at synchrotrons for symmetric and near-symmetric lattices. Phys. Rev. Accel. Beams 22, 072804 (2019). https://doi.org/10.1103/PhysRevAccelBeams.22.072804
E.P. Nahas, M.A. Henson, D.E. Seborg, Nonlinear internal model control strategy for neural network models. Comput. Chem. Eng. 16(12), 1039–1057 (1992). https://doi.org/10.1016/0098-1354(92)80022-2
A. Rezaeizadeh, Automatic control strategies for the Swiss free electron laser, Ph.D. thesis, Eidgenössische Technische Hochschule Zürich (2016)
A. Rezaeizadeh, T. Schilcher, R.S. Smith, Robust H∞-based control design for the beam injector facility, in Proceedings of the 2016 European Control Conference, Aalborg, Denmark, 29 June–1 July 2016 (2016)
I. Rivals, L. Personnaz, Internal model control using neural networks, in Proceedings of the IEEE International Symposium on Industrial Electronics, Warsaw, Poland, 17–20 June 1996 (1996)
S. Simrock, Z. Geng, Low-Level Radio Frequency Systems (Springer, Cham, 2022)
S. Skogestad, I. Postlethwaite, Multivariable Feedback Control: Analysis and Design, 2nd edn. (Wiley, New York, 2005)
R. Smith, A. Packard, Optimal control of perturbed linear static systems. IEEE Trans. Auto Control 41(4), 579–584 (1996). https://doi.org/10.1109/9.489279
R. Steinhagen, LHC beam stability and feedback control, Ph.D. Thesis, Rheinisch-Westfälischen Technischen Hochschule (2007)
P. Strehl, Beam Instrumentation and Diagnostics (Springer, Berlin, 2006)
Y. Tian, K. Ha, L. Yu et al., NSLS-II fast orbit feedback system, in Proceedings of ICALEPCS2015 Conference, Melbourne, Australia, 17–23 Oct 2015 (2015)
B. Widrow, G.L. Plett, Nonlinear adaptive inverse control, in Proceedings of the 36th Conference on Decision & Control, San Diego, CA, USA, 10–12 Dec 1997 (1997)
Chapter 3
Beam Optimization
Abstract Reliable beam feedback is essential for operating an accelerator with stable charged particle beams. Furthermore, one must determine the setpoints of these charged particle beam feedback loops for optimal parameters of the machine’s final products (e.g., free electron laser (FEL) photon beam). This leads to the need for the beam optimization processes discussed in this chapter. More generally, optimizing the operation of an accelerator also includes setting up the accelerator subsystems for maximum performance in terms of availability, stability, power efficiency, robustness, and so on. In this chapter, we will introduce several widely used optimization algorithms focusing on online beam optimizations. The test results of these algorithms for optimizing the bunch2 parameters of SwissFEL will be demonstrated.
3.1 Beam Optimization Overview
3.1.1 Optimization Problems in Beam Controls
Optimization problems exist throughout the design and operation phases of particle accelerators. When designing a new accelerator, the configuration of magnets (i.e., beam optics), radio frequency (RF) frequency, and even the machine size, should be optimized to achieve the desired beam quality. We will not discuss the design optimization process here. This chapter focuses on the optimization problems in operating an existing accelerator. Several examples of these problems are given below:
• Maximize the FEL pulse energy in an FEL machine.
• Maximize the beam injection efficiency of a storage ring.
• Maximize the beam lifetime in a storage ring.
• Minimize the beam loss in a linear or circular accelerator.
• Minimize the beam emittance.
• Minimize the overall RF breakdown rate of a linear accelerator.
• Maximize the energy efficiency of a high-power RF amplifier.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Geng and S. Simrock, Intelligent Beam Control in Accelerators, Particle Acceleration and Detection, https://doi.org/10.1007/978-3-031-28597-4_3
3 Beam Optimization
An optimization process searches for the required control settings of beam actuators for achieving the objectives. For example, to maximize the FEL pulse energy, we tune the RF Gun amplitude and phase, the Gun solenoid current, the bunch compression, the strength of quadrupoles, the beam orbit, etc. Traditionally, optimization is done manually by operators if the number of knobs is small. However, for optimizing the state-of-the-art accelerators, there may be tens or hundreds of parameters to tune, so the manual tuning is either too slow or not practical. Therefore, we design software tools, named optimizers, to automate the optimization process (Huang 2020). As shown in the 4-layer control strategy (Fig. 1.1) of particle accelerators, optimizers are at the highest control layer, aiming at providing satisfactory final products (e.g., photons) to the users. In this layered model, the global optimization determines the setpoints of beam feedback loops. If the beam feedback loops are unavailable, the optimizer can also tune the instrumentation-layer components directly. In principle, optimization is used in far broader aspects, wherever the tuning of some knobs for achieving specific goals is required.
3.1.2 Formulation of Optimization Problems
A beam optimization problem can be generally formulated as
$$u^* = \arg\min_{u} J(y), \quad \text{where } y = G(u), \quad \text{subject to } u \in \Omega.$$
(3.1)
Here u ∈ Rn is a vector of n separate inputs of the system. The inputs are adjusted as knobs to affect the system outputs that form a vector y ∈ Rm . Similar to Chap. 2, we still assume m ≤ n so that the desired system outputs are always achievable by manipulating the inputs. We use G to denote the actual system response function, which can be approximated by a mathematical model either derived from physics principles or identified empirically. If such a model does not exist, then G is viewed as a blackbox. We assume the system to be optimized is static: the output y is only determined by the instant input u and is independent of the historical input changes. The optimization of a dynamical system is discussed in Sect. 4.4. The term system is a generic concept to denote any entity that accepts inputs and produces outputs. The function J (y) is called an objective function (or cost function) that evaluates the system outputs. We assume that J is a given function with a known expression (y is the independent variable). An optimization process obtains an optimal input vector u∗ (i.e., the solution of the optimization problem) that minimizes J. We will always “minimize” the cost function. If the problem needs to maximize J, we can convert it to the form (3.1) by minimizing J ' = −J . In a physical system, each input is limited to a specific range. The input range of the vector u covers a volume in the n-dimensional parameter space and is denoted as Ω. The optimization problems in particle accelerators can be categorized into two types according to the format of their objectives:
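The formulation (3.1) can be illustrated with a small bounded minimization. In the sketch below, the response function `G`, the cost function `J` with its target values, and the box limits standing in for Ω are all hypothetical. A known smooth `G` is assumed here, so a gradient-based solver from SciPy can be used; this is not how a blackbox online optimizer would proceed.

```python
import numpy as np
from scipy.optimize import minimize


def G(u):
    """Hypothetical static system response y = G(u) with n = 2 inputs, m = 2 outputs."""
    return np.array([u[0] + 0.5 * u[1] ** 2,
                     np.sin(u[1]) + 0.2 * u[0]])


def J(y, y_target=np.array([1.0, 0.5])):
    """Cost function with a known expression: squared distance to target outputs."""
    return float(np.sum((y - y_target) ** 2))


# Omega: each input is limited to a specific range (box constraints).
omega = [(-2.0, 2.0), (-1.0, 1.0)]

u0 = np.zeros(2)
res = minimize(lambda u: J(G(u)), x0=u0, bounds=omega, method="L-BFGS-B")
u_opt = res.x

print("u* =", np.round(u_opt, 3), " J =", J(G(u_opt)))
```

A maximization problem fits the same pattern by minimizing −J instead of J.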
1. Open-objective optimization. For such problems, the objectives are minimizing some quantities. It is not defined how small the quantity should be, and the optimizer just does its best.
2. Operating point changing. For such problems, we define concrete target values for system outputs, and the optimizer should control the outputs as close to the target values as possible.

The same methods solve both types of problems; the only difference is the definition of the cost function. We will demonstrate the operating point changing using the SwissFEL bunch2 operation as an example (Sect. 3.3.3).

Depending on the number of objectives, optimization problems can also be classified as single-objective or multi-objective optimizations. If the cost function J returns a vector (i.e., with multiple objectives), it must be evaluated with a special sorting method. For example, if we want to maximize the FEL pulse energy and minimize the beam loss simultaneously, we must define what "optimal" means because these two objectives may conflict. A non-dominated sorting algorithm has been developed to deal with multi-objective optimizations (Deb et al. 2002); it is briefly discussed in Sect. 3.2.6.3.

We can also categorize the optimization problems by the response function G. If its format (e.g., analytical model) is known, (3.1) is called a model-based or white-box optimization problem, solved by offline optimization algorithms based on simulation. Offline optimization is typically faster and safer regarding beam loss or interlock trips (e.g., no worry about RF breakdowns). However, for most beam optimization problems, G is unknown and nonlinear or even not smooth. Such problems are modeless or blackbox optimization problems. They must be solved by online optimization algorithms, which manipulate the system inputs according to the measurements of the system outputs. Additional requirements arise for efficient online optimizers.
First, they must handle the noise in the system inputs and outputs. Noise can mislead the search for optimal solutions, as discussed in Sect. 3.1.3. Second, online optimizers should be robust against faults, such as interlock trips of RF stations or unexpected beam stops. Finally, online optimizers should be sample efficient. They must converge with few cost function evaluations, which might be time-consuming or costly for physical systems. Furthermore, online optimizers should converge fast enough to track the drifts in system dynamics. Fortunately, most accelerator subsystems are reproducible, i.e., the drifts of the system parameters are much slower than the convergence of the optimization process.
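The two problem types of this section differ only in how the cost function is written. A minimal Python sketch is given below. The function names and the sum-of-squared-errors form used for operating point changing are assumptions made for illustration; the text does not prescribe a specific expression.

```python
def open_objective_cost(pulse_energy):
    """Open-objective problem: maximize a quantity by minimizing its negative (J' = -J)."""
    return -pulse_energy


def operating_point_cost(outputs, targets):
    """Operating point changing: penalize the distance of the outputs from their targets.

    The sum of squared errors is one common choice and is assumed here.
    """
    return sum((y - t) ** 2 for y, t in zip(outputs, targets))
```

Either function can be handed to the same optimizer, which is why the two problem types share their solution methods.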
3.1.3 Noise in Online Optimization Problems

The major difference between online and offline optimizations is the noise in the system. An online optimization problem is formulated graphically in Fig. 3.1. The control settings (i.e., the commands adjusting the system inputs), $u_{set}$, are determined by the optimizer. The actual system input vector is $u = u_{set} + \Delta u_{jit}$, where $\Delta u_{jit}$ is the actuation noise introduced by the actuators. We use the vector $y$ to denote the actual system outputs, which are measured by detectors that add measurement noise $\Delta y_{mea}$. The measurement results are therefore noisy and given by $y_{mea} = y + \Delta y_{mea}$. Since the cost function is derived from $y_{mea}$, the performance evaluation will be affected by the measurement noise. Some optimization algorithms also need to know the actual inputs to the system. Then the actual input $u$ is measured, resulting in a noisy measurement $u_{mea} = u + \Delta u_{mea}$. The noise terms $\Delta u_{jit}$, $\Delta u_{mea}$, and $\Delta y_{mea}$ may limit the optimization performance and must be considered explicitly. Compared to Fig. 2.1, the components between $u_{set}$ and $y_{mea}$ in Fig. 3.1 correspond to the "Plant" block in Fig. 2.1.

Fig. 3.1 Formulation of an online optimization problem (the actuators and detectors are not shown)

The online optimizer obtains a solution $u_{set}^*$, which contains randomness due to the noise. We still use $u^*$ to represent the actual optimal solution. The bias and variance of $u_{set}^*$ are criteria to evaluate the optimizer's performance, defined as

$$\begin{aligned}
\mathrm{bias}\left(u_{set}^*\right) &= u^* - E\left[u_{set}^*\right], \\
\mathrm{var}\left(u_{set}^*\right) &= E\left[\left(u_{set}^* - E\left[u_{set}^*\right]\right)\left(u_{set}^* - E\left[u_{set}^*\right]\right)^T\right].
\end{aligned} \tag{3.2}$$
Here we use E[·] to denote the mathematical expectation. The bias represents the deviation of the mean value of $u_{set}^*$ from the actual optimal solution. Typically, we expect a small bias. The variance describes how far the resulting $u_{set}^*$ scatters from its mean value if we repeat the optimization process multiple times. Though we also want a small variance, it is less significant than the bias in some cases because we can repeat the optimization several times and average the resulting optimal solutions to reduce the noise effects. Of course, if the optimization process is expensive or time-consuming, multiple executions may not be practical, and the variance is also critical. The second equation of (3.2) returns the covariance matrix of the random vector $u_{set}^*$. If the components of $u_{set}^*$, each corresponding to an input, are uncorrelated, $\mathrm{var}(u_{set}^*)$ is a diagonal matrix. The noise terms in Fig. 3.1 affect both the bias and the variance. The effects of noise are algorithm dependent and must be handled properly, as discussed in Sect. 3.2.

Table 3.1 shows an example of online beam optimization problems, explaining the terms in Fig. 3.1. In this example, we adjust the beam orbit in the undulators to maximize the FEL pulse energy, which is measured by a gas-based detector (see Sect. 1.2.2). The beam orbit is represented by n beam positions at different locations
in the undulator region. The beam positions are measured by BPMs and controlled by an orbit feedback loop actuating on multiple corrector magnets.

Table 3.1 Example of an online optimization problem

| Items | Notation | Meaning in the example |
|---|---|---|
| Control settings | $u_{set} \in \mathbb{R}^n$ | Setpoints of beam positions in undulator region |
| Actuation noise | $\Delta u_{jit} \in \mathbb{R}^n$ | Beam position jitter as orbit feedback residual error |
| Actual input | $u \in \mathbb{R}^n$ | Actual beam positions in undulator region |
| Input meas. noise | $\Delta u_{mea} \in \mathbb{R}^n$ | BPM noise |
| Measured input | $u_{mea} \in \mathbb{R}^n$ | BPM measured beam positions |
| Actual output | $y \in \mathbb{R}$ | Actual FEL pulse energy |
| Output meas. noise | $\Delta y_{mea} \in \mathbb{R}$ | Gas-based detector noise |
| Measured output | $y_{mea} \in \mathbb{R}$ | Gas-based detector measured FEL pulse energy |
| Cost function | $J \in \mathbb{R}$ | $J(y_{mea}) = -y_{mea}$ |
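The criteria in (3.2) can be estimated empirically by repeating an optimization several times. The plain-Python sketch below replaces the expectation by a sample average over the runs (dividing by the number of runs). The function name and the requirement that the true optimum is known, as is the case for a test function, are assumptions of this illustration.

```python
def bias_and_covariance(solutions, u_opt):
    """Estimate the bias and covariance of optimizer solutions as in (3.2).

    solutions: list of solution vectors obtained from repeated optimization runs
    u_opt:     known true optimum, e.g., of a test function
    """
    k = len(solutions)
    n = len(u_opt)
    # Sample mean of the solutions, standing in for E[u_set*].
    mean = [sum(s[i] for s in solutions) / k for i in range(n)]
    bias = [u_opt[i] - mean[i] for i in range(n)]
    # Sample covariance matrix; off-diagonal terms vanish for uncorrelated inputs.
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in solutions) / k
            for j in range(n)] for i in range(n)]
    return bias, cov
```

For two runs ending at [1, 2] and [3, 2] with the optimum at the origin, the function returns a bias of [-2, -2] and a covariance matrix whose only nonzero entry is the variance of the first input.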
3.2 Optimization Algorithms

3.2.1 Overview of Optimization Algorithms

The algorithm of an optimizer depends strongly on the response function G and the cost function J. In an ideal case, an accurate model of G is known and its inverse $G^{-1}$ exists, so the required inputs for the desired outputs can be calculated with $G^{-1}$. This quickly solves the operating point changing problems. One example is the beam matching to the designed optics (i.e., Twiss functions). Since the model describing the relationship between quadrupole strengths and Twiss functions is typically accurate, the quadrupole strengths can be determined with the measured beam parameters and the model inverse. Even if the model is inaccurate, this method can determine the quasi-optimal inputs as a starting point for further optimization or feedback control, as discussed in Sect. 1.1.2.3.

Most practical control problems are not invertible. However, if the analytical expression of G is known and J's derivative with respect to u exists, many derivative-based algorithms, such as the gradient descent algorithm (first-order method) and Newton's method (second-order method) (Kochenderfer and Wheeler 2019), can be adopted. Practical engineering problems like beam optimizations are often blackbox problems, for which G is unknown and J is often non-differentiable. Derivative-free algorithms (Audet and Hare 2017; Xi et al. 2020) are needed to solve them. Two categories of algorithms have been developed for blackbox optimization:

(a) Direct search algorithms (Lewis et al. 2000), such as pattern search, the Nelder-Mead simplex method (Huang 2018), extremum seeking (Ariyur and Krstic 2004; Scheinker and Krstic 2017), and the methods with adaptive search-direction sets like Rosenbrock's method and Powell's method. These methods are deterministic algorithms, searching for optimal solutions following predefined strategies.
(b) Heuristic algorithms (Lee and El-Sharkawi 2008), such as random search (Li and Rhinehart 1998), swarm intelligence (Engelbrecht 2007), tabu search, simulated annealing, and genetic algorithms. These algorithms are stochastic algorithms, searching via randomly selected trial solutions. They search for optimal solutions based on experience and some simple rules and are often more efficient than exact algorithms for engineering problems. They cannot guarantee finding the absolute best solution but are still very useful in practice.

Deterministic algorithms are efficient for finding local optima, whereas stochastic algorithms have a larger probability of finding the global optimum but are often less efficient. Of course, a deterministic algorithm can be executed many times with randomly selected starting points in search of the global optimum. Since each starting point converges to the nearby local optimum, we can compare the solution of each local optimum and choose the best one. This technique is called the multi-start method (Marti et al. 2013).

Optimization algorithms are usually executed iteratively in a loop. The loop is typically terminated if the optimal solution is found, the solution stops improving, or the maximum number of iterations is reached. These conditions are known as the termination conditions of an optimization algorithm. In this section, we introduce several blackbox optimization algorithms widely used in accelerator beam controls. The spontaneous correlation optimization (SCO) and random walk optimization (RWO) methods are the simplest but practical algorithms for online beam optimization.
The robust conjugate direction search (RCDS) algorithm extends Powell’s method with a 1-dimensional optimizer robust against noise. It can also be used for online optimization. The SCO, RWO and RCDS algorithms typically search for the local optima. As examples of stochastic algorithms, we introduce two evolutionary algorithms: genetic algorithm (GA) and particle swarm optimization (PSO). They are population-based algorithms. A population is a set of trial solutions evolving in iterations. GA and PSO search in large ranges of system inputs and often require many evaluations. Aggressive input changes may cause beam loss or interlock trips, and too many evaluations in the physical system take much time. Therefore, GA and PSO are typically not suitable for online optimization. However, if we can establish a surrogate model of the physical system (see Sect. 4.3.3), we may apply GA or PSO to the model and benefit from their capability of finding the global optimum. In Chap. 4, after introducing the machine learning models, we will discuss the Gaussian process optimization (GPO), another candidate for online beam optimization (see Sect. 4.3.5).
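The multi-start method mentioned in this overview can be sketched in a few lines of Python. The local optimizer used here is a basic compass (pattern) search chosen only for brevity; its step sizes, the number of starts, the seed, and all function names are assumptions of this illustration. A multimodal cost (the 2-dimensional Rastrigin function of Sect. 3.2.2) serves as the test case.

```python
import math
import random


def cost(u):
    """Multimodal test cost (Rastrigin function)."""
    return 10.0 * len(u) + sum(x * x - 10.0 * math.cos(2.0 * math.pi * x) for x in u)


def compass_search(f, u0, step=0.25, min_step=1e-4, lo=-10.0, hi=10.0):
    """Deterministic local search: try +/- step on each input, halve the step on failure."""
    u, j = list(u0), f(u0)
    while step > min_step:
        improved = False
        for i in range(len(u)):
            for sign in (1.0, -1.0):
                trial = list(u)
                trial[i] = min(max(u[i] + sign * step, lo), hi)  # respect the input range
                j_trial = f(trial)
                if j_trial < j:
                    u, j, improved = trial, j_trial, True
        if not improved:
            step *= 0.5
    return u, j


def multi_start(f, n_starts=20, lo=-10.0, hi=10.0, seed=1):
    """Run the local search from random starting points and keep the best local optimum."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        start = [rng.uniform(lo, hi), rng.uniform(lo, hi)]
        results.append(compass_search(f, start, lo=lo, hi=hi))
    return min(results, key=lambda r: r[1]), results
```

Each start ends in some local minimum, and only the comparison across starts gives a chance of finding the global one. The sketch is meant for a simulated cost; on a physical machine, widely scattered starting points may cause beam loss.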
Fig. 3.2 Value of the 2-dimensional Rastrigin function
3.2.2 Test Function

The Rastrigin function is a non-convex function often used to test the performance of optimization algorithms. For better visualization, we perform our test with the 2-dimensional Rastrigin function given by

$$y(x_1, x_2) = 20 + x_1^2 + x_2^2 - 10\cos(2\pi x_1) - 10\cos(2\pi x_2), \tag{3.3}$$

where $x_1$ and $x_2$ are the inputs, presented as an input vector $u = [x_1\ x_2]^T$. The output y has a pattern as in Fig. 3.2. It has a global minimum at $[0\ 0]^T$, where $y(0, 0) = 0$. The local minima are approximately at $[m\ n]^T$, where m and n are integers, not simultaneously zero. Considering the input jitter $\Delta u_{jit} = [\Delta x_{1jit}\ \Delta x_{2jit}]^T$ and the measurement noise $\Delta y_{mea}$, we write the cost function as

$$J = y\left(x_{1set} + \Delta x_{1jit},\ x_{2set} + \Delta x_{2jit}\right) + \Delta y_{mea} \tag{3.4}$$

with $u_{set} = [x_{1set}\ x_{2set}]^T$ the input setting values. In this problem, we constrain the input range to $-10 \le x_{1set}, x_{2set} \le 10$.
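The test function (3.3) and the noisy cost (3.4) translate directly into code. In the sketch below, the function names, the default noise levels, and the choice of Gaussian noise are assumptions made for illustration; the text specifies the noise only by its RMS value.

```python
import math
import random


def rastrigin(x1, x2):
    """2-dimensional Rastrigin function, Eq. (3.3)."""
    return (20.0 + x1 ** 2 + x2 ** 2
            - 10.0 * math.cos(2.0 * math.pi * x1)
            - 10.0 * math.cos(2.0 * math.pi * x2))


def noisy_cost(x1_set, x2_set, jit_rms=0.01, mea_rms=0.5, rng=random):
    """Cost of Eq. (3.4): jitter acts on the inputs, measurement noise on the output."""
    x1 = x1_set + rng.gauss(0.0, jit_rms)
    x2 = x2_set + rng.gauss(0.0, jit_rms)
    return rastrigin(x1, x2) + rng.gauss(0.0, mea_rms)
```

With both noise levels set to zero, `noisy_cost` reduces to the plain Rastrigin function, which is convenient for checking an optimizer before noise is switched on.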
3.2.3 Spontaneous Correlation Optimization

Jitter in the system inputs causes correlated jitter in the system outputs. If the input deviation is small, the relation between the input and output variations is approximately linear. In a physical system, the inputs are produced by actuators introducing spontaneous noise (i.e., naturally existing noise, such as thermal noise, voltage or current jitter added by electronics, etc.). The spontaneous noise is typically small and
yields linear correlations between the input and output variations. In certain conditions, such as when the cost function gradient is not zero, such linear correlations can guide the system inputs towards the optimum. This method is called spontaneous correlation optimization, as described in Table 3.2.

Table 3.2 SCO algorithm

Initialize
  Define the number of data points N and the fraction η (with 0 < η < 1, e.g., 0.1) for selecting best performing data points. Set the initial system input vector $u_{set}^{(0)}$
Repeat (for iterations: t = 0, 1, 2, …)
  1. Read N synchronous measurements of the inputs and outputs and calculate the costs. Store the results of inputs and costs in records: $\left(u_{mea,i}^{(t)}, J_i^{(t)}\right)$, i = 1, 2, …, N
  2. Sort the N records above and find the ηN records with the smallest J values: $\left(u_{mea,j}^{*(t)}, J_j^{*(t)}\right)$, j = 1, 2, …, ηN, where * denotes the "good" points
  3. Determine and set the new inputs as $u_{set}^{(t+1)} = \frac{1}{\eta N} \sum_{j=1}^{\eta N} u_{mea,j}^{*(t)}$
End (if the termination conditions are satisfied)

The principle of SCO is simple. At the operating points where the cost function gradient is nonzero, the input jitter may increase the cost in one direction and decrease it in the other. Therefore, the actual inputs resulting in smaller costs are closer to the optimal solution. They can be used to update the input settings. We apply the SCO algorithm to our test function (3.3), and the results are shown in Fig. 3.3. In this example, the optimization is started from two different initial points: $x_{1set}^{(0)} = 0.4$, $x_{2set}^{(0)} = 0.4$ (see plots a, b, and c) and $x_{1set}^{(0)} = -1.35$, $x_{2set}^{(0)} = 2.66$ (see plots d, e, and f). The optimization is executed four times for each starting point, each with a different combination of noise levels. We also demonstrate the correlation between the input and cost variations in an SCO iteration in Fig. 3.4. It indicates how the inputs are adapted: the circled inputs yield smaller costs and are averaged to update the input settings.

SCO requires measuring the system inputs and outputs synchronously. We must collect many input–output pairs in a short period (e.g., for each pulse in a pulsed machine or each bunch in a continuous-wave (CW) machine), or the optimization may be too slow. Unlike other algorithms, SCO requires measuring the inputs explicitly. Figure 3.3 implies that SCO converges to a local minimum near the starting point because it is driven by the cost function gradient. After reaching a local minimum, the gradient becomes zero, and the input–output jitter correlation vanishes. With the multi-start technique, SCO can also search for the global minimum. SCO converges to an unbiased solution.
SCO only stops once the linear correlation between the input/output variations becomes zero. It guarantees convergence if the cost function gradient is nonzero, even with significant input jitter or input/output measurement noises. The input and output measurement noises reduce the convergence speed and increase the variance of the solution. These measurement noises result in a weaker correlation between the input/output variations and reduce the capability to select "good" solutions. For example, Fig. 3.3 shows that case 3 (with larger output measurement noise) converges slower than case 1 and has larger input and cost fluctuations. The input measurement noise can be mitigated by increasing the number of averaged records ηN. SCO requires larger input jitter for a stronger correlation between the input/output variations; the jitter-induced variations should be much larger than the measurement noises. If the spontaneous input jitter is too small, we can consider injecting noise (i.e., artificial jitter) actively into the system inputs (Gaio and Lonza 2015). If the system has multiple inputs, noise can be injected into each channel separately to avoid large deviations from the operating point. If an SCO optimizer relies only on spontaneous noise or operates with a small artificial jitter negligible for the beam users, it is suitable for online optimization.

Fig. 3.3 SCO results with the test function. a–c Convergence of inputs and cost for different noise levels with a starting point $[0.4\ 0.4]^T$. The inputs converge to $[0\ 0]^T$ with y = 0; d–f with a starting point $[-1.35\ 2.66]^T$. The inputs converge to $[-1\ 3]^T$ with y = 10
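The SCO steps of Table 3.2 can be tried out on the test function with a short simulation. The sketch below is an illustration under stated assumptions: Gaussian jitter and measurement noise, the listed default noise levels, a fixed number of iterations as the termination condition, and a fixed random seed. None of these settings are taken from the runs shown in Fig. 3.3.

```python
import math
import random


def rastrigin(x1, x2):
    return (20.0 + x1 ** 2 + x2 ** 2
            - 10.0 * math.cos(2.0 * math.pi * x1)
            - 10.0 * math.cos(2.0 * math.pi * x2))


def sco(u_set, n_iter=200, n_points=100, eta=0.1,
        jit_rms=0.01, umea_rms=0.01, ymea_rms=0.05, seed=0):
    """Spontaneous correlation optimization following the steps of Table 3.2."""
    rng = random.Random(seed)
    n_keep = max(1, int(eta * n_points))
    u_set = list(u_set)
    for _ in range(n_iter):
        records = []
        for _ in range(n_points):
            # Actual inputs differ from the settings by the spontaneous jitter.
            u = [x + rng.gauss(0.0, jit_rms) for x in u_set]
            # Synchronous noisy measurements of the inputs and of the cost.
            u_mea = [x + rng.gauss(0.0, umea_rms) for x in u]
            cost = rastrigin(u[0], u[1]) + rng.gauss(0.0, ymea_rms)
            records.append((cost, u_mea))
        # Keep the eta*N records with the smallest cost and average their inputs.
        records.sort(key=lambda r: r[0])
        best = records[:n_keep]
        u_set = [sum(r[1][i] for r in best) / n_keep for i in range(len(u_set))]
    return u_set
```

Calling `sco([0.4, 0.4])` moves the settings towards the minimum at the origin. No step size appears anywhere in the code: the size of each update is set by the jitter itself, which is why a very small jitter makes SCO slow.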
Fig. 3.4 Correlation of the inputs and costs for an iteration of SCO applied to the test function. The input setting is x1set = x2set = 0.4, and the noise terms are Δx1 jit = Δx2 jit = 0.01 RMS, Δx1mea = Δx2mea = 0.01 RMS, and Δymea = 0.5 RMS
3.2.4 Random Walk Optimization

The random walk optimization algorithm (Aiba et al. 2012) tunes the system inputs by emulating the behavior of a human operator. An operator typically adjusts the system inputs one by one when manually optimizing the beam. The operator changes the setting of each input in a random direction by a small step and observes the beam quality. If the beam quality improves, the change is kept, and the operator continues with the next input; if not, the input setting is restored to the previous value. This procedure repeats for all inputs until the beam quality is satisfactory or stops improving. The RWO algorithm is described in Table 3.3.

Table 3.3 RWO algorithm

Initialize
  • Define the random walk step size for each input and the number of averages for the system output measurements. One may also define a preferred sequence of adjusting different inputs
  • Measure the initial system outputs, calculate the cost, and assign it to $J_{min}$
Repeat (for iterations: t = 0, 1, 2, …)
  1. Repeat (for i = 1, 2, …, n, where i is the index of the system input)
     (a) Vary the ith input by adding or subtracting a specified step size. The operation is chosen randomly with 50% probability for "+" or "–"
     (b) Measure the system outputs and calculate the cost J
     (c) If $J < J_{min}$, update $J_{min} = J$; else remove the input change made in step (a)
End (if the termination conditions are satisfied)
The step size must be large enough so that the output variations exceed the measurement resolution. We may adapt the step size with the optimization progress. A smaller step size is preferred when close to the optimum point to avoid the inputs oscillating around the optimal solution. In this case, we need to average the outputs for a longer period to increase the measurement resolution. More averages slow down the convergence, so we must make a trade-off. If there are many input channels, the algorithm in Table 3.3 might be too slow. In this case, one can generate random steps in all inputs simultaneously and then determine whether to keep or discard these changes following the same rule. RWO is a simple optimizer that does not require synchronous measurements of the system inputs and outputs. Therefore, it is usually the first choice for automating the optimization. The RWO algorithm is applied to the test function (3.3), and the results are depicted in Fig. 3.5.

Fig. 3.5 RWO results with the test function. The random walk step size is 0.02 for both $x_1$ and $x_2$. a–c Convergence of inputs and cost for different noise levels with a starting point $[0.4\ 0.4]^T$. The inputs converge to $[0\ 0]^T$ with y = 0; d–f with a starting point $[-1.35\ 2.66]^T$. The inputs converge to $[-1\ 3]^T$ with y = 10
Like SCO, RWO converges to a local minimum near the starting point, and the multi-start technique can help search for the global minimum. The input jitter results in slower convergence. If the random walk step size is small compared to the input jitter, the outputs will be dominated by the jitter instead of the random walk. This leads to wrong decisions for keeping or discarding the input changes, resulting in oscillations in the convergence trajectory of the inputs. In Fig. 3.5, cases 1 and 3 have smaller input jitter (0.01 RMS) than the random walk step size (0.02), resulting in faster convergence than cases 2 and 4. Generally, the random walk step size must be larger than the input jitter RMS value. The output measurement noise causes bias in the solution. If the random-walk-induced output variation is smaller than the measurement noise, it is hard to judge whether the outputs improve or not. This is more serious close to the local minimum, where the gradient is small, resulting in smaller output variations for the same input change. In this case, the solution either stops changing or oscillates around a biased value. Figure 3.5 shows that the bias of case 3 (with larger measurement noise) is larger than that of case 1. To mitigate the measurement noise, we can either increase the random walk step size or average more measurements. RWO is suitable for online optimization. With a small random walk step size and more averages of the measurements, RWO can be executed during user operations to optimize the beam quality continuously, compensating for slow drifts.
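The RWO loop of Table 3.3 is short enough to write out in full. The following sketch applies it to the test function; Gaussian noise, the default parameters, the fixed iteration count used as the termination condition, and the helper name `measure_cost` are assumptions of this illustration, and the input range limits are not enforced.

```python
import math
import random


def rastrigin(u):
    return 10.0 * len(u) + sum(x * x - 10.0 * math.cos(2.0 * math.pi * x) for x in u)


def measure_cost(u_set, rng, jit_rms, mea_rms, n_avg):
    """Average n_avg noisy cost measurements taken at the same input settings."""
    total = 0.0
    for _ in range(n_avg):
        u = [x + rng.gauss(0.0, jit_rms) for x in u_set]
        total += rastrigin(u) + rng.gauss(0.0, mea_rms)
    return total / n_avg


def rwo(u_set, step=0.02, n_iter=300, jit_rms=0.0, mea_rms=0.0, n_avg=1, seed=0):
    """Random walk optimization following the steps of Table 3.3."""
    rng = random.Random(seed)
    u_set = list(u_set)
    j_min = measure_cost(u_set, rng, jit_rms, mea_rms, n_avg)
    for _ in range(n_iter):
        for i in range(len(u_set)):
            previous = u_set[i]
            # (a) Step the i-th input up or down with equal probability.
            u_set[i] = previous + rng.choice((step, -step))
            # (b) Measure the cost at the trial settings.
            j = measure_cost(u_set, rng, jit_rms, mea_rms, n_avg)
            # (c) Keep the change only if the cost improved.
            if j < j_min:
                j_min = j
            else:
                u_set[i] = previous
    return u_set, j_min
```

Running `rwo([0.4, 0.4])` without noise walks both inputs to the minimum at the origin. Setting `mea_rms` above the cost change produced by one step reproduces the stalling described above, because a single lucky low reading of $J_{min}$ blocks all later changes; increasing `n_avg` or `step` relieves it.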
3.2.5 Robust Conjugate Direction Search

RWO varies the system inputs one by one in small steps. Alternatively, we can scan each input in a broader range in multiple steps and search for the minimum. Furthermore, we can scan along a general direction in the input parameter space, leading to a general line optimizer. Starting from an initial input vector $u_0$, we select trial solutions $u(\lambda) = u_0 + \lambda d$ with different $\lambda \in \mathbb{R}$, where d is a unit vector in the input parameter space. These trial solutions are on a hyper-line passing through $u_0$ along the direction d and should bracket the minimum point in this direction. Then, we evaluate the costs of these trial solutions by applying them to the physical system or by simulation. This line optimization problem can be solved using the golden section search method (Kochenderfer and Wheeler 2019) or by fitting a parabola.

The direction d has a significant impact on the performance of a line optimizer. In the simplest case, we scan each input separately (see Fig. 3.6a). In this example, the system has two inputs, $x_1$ and $x_2$, and the cost function has a local minimum at the center of the ellipses representing the contours of the cost function. Suppose there is a long corridor between the starting point (point 0) and the minimum point. Scanning each individual input (along $d_1$ and $d_2$) leads to a step-like convergence trajectory (see Fig. 3.6a), resulting in slow convergence. Note that the optimal solution obtained from the last scan is used as the starting point of the new scan. In an ideal case, if we know the conjugate directions of the input parameter space, the optimal solution of a quadratic cost function can be found with at most n line-optimization scans. Here n is the dimension of the input parameter space, i.e., the number of independent inputs. This situation is depicted in Fig. 3.6b, where two conjugate directions, $d_{C1}$ and $d_{C2}$, are identified. The major benefit of searching along conjugate directions is that the scanning in one direction does not affect the optimal position in other directions. It can avoid the step-like trajectories of Fig. 3.6a, resulting in much faster convergence. Any two conjugate directions of the input parameter space, denoted as v and w, satisfy $v^T H w = 0$, where $H \in \mathbb{R}^{n \times n}$ is the Hessian matrix of the cost function. The cost function J is a function of the system input vector u. Therefore, H is formed by the second-order derivatives as $H = \left[\partial^2 J / \left(\partial u_i \partial u_j\right)\right]_{i,j=1,2,\ldots,n}$, where $u_i$ and $u_j$ are the individual inputs of the system.

Fig. 3.6 Convergence trajectories for different line optimization strategies. a Search along each individual input; b search along conjugate directions; c search with Powell's method

For blackbox optimization, the conjugate directions are usually unknown. In this case, Powell's method, as described in Table 3.4, can be used to approximate the conjugate directions (Kochenderfer and Wheeler 2019; Huang 2020). Powell's method is demonstrated in Fig. 3.6c with two inputs. To simplify the discussion, we denote the numbered points (i.e., solutions found by line searches) as $p_k$ (k = 0, 1, 2, …). In this example, the starting point is $u_0^{(0)} = p_0$ and the starting directions are $D^{(0)} = [d_1, d_2]$. In iteration t = 0, we make line optimizations along $d_1$ and $d_2$ and obtain solutions $p_1$ and $p_2$, respectively. The direction with the larger cost drop is $d_2^{(0)} = d_2$ and the iteration-final solution is $u_2^{(0)} = p_2$. The costs satisfy $J_{ext}^{(0)} = J(2p_2 - p_0) < J(p_0)$, so we replace the direction $d_2$ with the direction $d_{avg}^{(0)} = d_3$ pointing from $p_0$ to $p_2$. The line optimization result along $d_3$, denoted as $p_3$, is then used as the new starting point of the next iteration. Therefore, for iteration t = 1, the updated direction list is $D^{(1)} = [d_1, d_3]$ and the new starting point is $u_0^{(1)} = p_3$.
With similar analysis, we obtain $D^{(2)} = [d_3, d_4]$ and $u_0^{(2)} = p_6$ for iteration t = 2. The iterations repeat until the local minimum is found or the maximum number of iterations is reached. In step 4 of Powell's method, we replace $d_l^{(t)}$ with $d_{avg}^{(t)}$ because the latter is likely a more efficient search direction and has a significant overlap with $d_l^{(t)}$.

Powell's method is more efficient than scanning individual inputs, as in Fig. 3.6. For a quadratic cost function, Powell's method can obtain n conjugate directions and find the local minimum after n iterations, where n is the number of inputs of the system. For more complex cost functions, Powell's method converges in a finite number of iterations if the cost function is close to quadratic near its extremum. Note that the conjugate direction set is not unique. For example, the two obtained conjugate directions ($d_3$ and $d_4$) in Fig. 3.6c differ from those in Fig. 3.6b. If the number of inputs is large, it may take too long to construct all conjugate directions. Therefore, we should provide an initial estimate of conjugate directions whenever possible. For example, if a system model exists, we can estimate its Hessian matrix via simulation and use its eigenvectors as the initial conjugate directions.

Table 3.4 Powell's method

Initialize
  Select a starting point of the system input vector $u_0^{(0)}$ and define n starting directions $D^{(0)} = \left[d_i^{(0)}, i = 1, \ldots, n\right]$, often initialized as the directions of individual inputs. Here we use $[d_1, d_2, \ldots]$ to represent a sequence of vectors
Repeat (for iterations: t = 0, 1, 2, …)
  1. Starting from $u_0^{(t)}$, make line optimizations along the directions in $D^{(t)}$ sequentially. A subsequent scan should start from the optimal solution of the last scan. We denote the direction with the largest cost drop as $d_l^{(t)}$ (1 ≤ l ≤ n) and remember the iteration-final solution after all these n scans as $u_n^{(t)}$
  2. Evaluate the cost at an extension point as $J_{ext}^{(t)} = J\left(2u_n^{(t)} - u_0^{(t)}\right)$
  3. Initialize the starting point and searching directions of the next iteration as $u_0^{(t+1)} = u_n^{(t)}$ and $D^{(t+1)} = D^{(t)}$
  4. If $J_{ext}^{(t)} < J\left(u_0^{(t)}\right)$, execute the steps below:
     (a) Append the average direction $d_{avg}^{(t)} = \left(u_n^{(t)} - u_0^{(t)}\right) / \left\|u_n^{(t)} - u_0^{(t)}\right\|$ to $D^{(t+1)}$ and remove $d_l^{(t)}$ from $D^{(t+1)}$
     (b) Make a line optimization along $d_{avg}^{(t)}$ and use the optimal solution as $u_0^{(t+1)}$
End (if the termination conditions are satisfied)

The robust conjugate direction search algorithm proposed by Huang et al. (2013) is based on Powell's method. It implements an enhanced line optimizer robust to the noise in physical systems. The line optimizer needs to find the correct scan range to bracket the minimum point, which may fail in the presence of noise if we only compare the instant cost without considering its variance. The cost is a random variable if there is input jitter or output measurement noise. RCDS evaluates the variance of the cost explicitly. The variance helps determine the line optimization scan range. At the boundaries of the scan range, the costs should be higher by several (e.g., 3) standard deviations compared to the minimum cost of the sampled trial solutions. We apply the RCDS algorithm to the test function (3.3) and show the results in Fig. 3.7. This test used the Matlab-based realization of the RCDS algorithm (https://github.com/SPEAR3-ML/RCDS).
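The steps of Table 3.4 can be sketched in Python with a golden section search as the line optimizer. This is a noise-free illustration of Powell's method only, not the RCDS code referenced above. The fixed scan range `span`, the tolerances, and all function names are assumptions, and the line search presumes that the cost is unimodal within the scan range.

```python
import math


def golden_section(f, a, b, tol=1e-6):
    """Minimize a unimodal 1-D function f on [a, b] by golden section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while abs(b - a) > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def line_minimize(cost, u, d, span):
    """Line optimization of u + lam * d over lam in [-span, span]."""
    lam = golden_section(lambda t: cost([ui + t * di for ui, di in zip(u, d)]), -span, span)
    return [ui + lam * di for ui, di in zip(u, d)]


def powell(cost, u0, n_iter=10, span=2.0, tol=1e-6):
    """Powell's method following the steps of Table 3.4."""
    n = len(u0)
    dirs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    u_start = list(u0)
    for _ in range(n_iter):
        u, drops = list(u_start), []
        for d in dirs:  # step 1: sequential line optimizations
            j_before = cost(u)
            u = line_minimize(cost, u, d, span)
            drops.append(j_before - cost(u))
        step = [a - b for a, b in zip(u, u_start)]
        norm = math.sqrt(sum(s * s for s in step))
        if norm < tol:  # the solution stopped moving
            return u
        j_ext = cost([2.0 * a - b for a, b in zip(u, u_start)])  # step 2
        next_start = u                                           # step 3
        if j_ext < cost(u_start):                                # step 4
            d_avg = [s / norm for s in step]
            dirs.pop(drops.index(max(drops)))
            dirs.append(d_avg)
            next_start = line_minimize(cost, u, d_avg, span)
        u_start = next_start
    return u_start
```

For the quadratic cost $J = x_1^2 + x_1 x_2 + 1.5 x_2^2$ started at $[1\ 1]^T$, the extension test fails in the first iteration and succeeds in the second, after which the line search along the new direction lands on the minimum at the origin.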
Fig. 3.7 RCDS results with the test function. The first 100 evaluations are used to estimate the variance of the cost J. a–c Convergence of inputs and cost for different noise levels with a starting point [0.4 0.4]T ; d–f with a starting point [−1.35 2.66]T
Figure 3.7 indicates that RCDS converges to one of the local minima, but not necessarily to the one nearest to the starting point. The robust line optimizer can properly handle the input jitter and output measurement noise, resulting in unbiased solutions. Because the cost value J fluctuates, we should choose the inputs resulting in the smallest J value as the final solution. The search range in early iterations (evaluations no. 100–200 in Fig. 3.7) is relatively large for identifying the conjugate directions. However, such large-range searching can be avoided if we can estimate the initial conjugate directions. In addition, we can add constraints to the input ranges to avoid beam loss or interlock trips. Therefore, RCDS can also be used for online optimization. RCDS has been widely used in accelerator operations, demonstrating excellent performance (Huang and Safranek 2015; Ji et al. 2015; Liuzzo et al. 2016).
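The idea of a noise-aware scan range can be illustrated with a small bracketing routine: the range along a search direction is expanded until the cost at each end exceeds the smallest sampled cost by a few standard deviations of the cost noise. The sketch below is a simplified illustration of that criterion and is not taken from the RCDS implementation. The function name, the step growth rule, and the assumption that the noise level `sigma` has been estimated beforehand are ours.

```python
def bracket_minimum(f, sigma, step=0.01, growth=2.0, n_sigma=3.0, max_expand=20):
    """Expand a scan range along one direction until both ends are clearly worse.

    f:     cost as a function of the line parameter lambda (may be noisy)
    sigma: standard deviation of the cost noise, assumed to be known
    """
    samples = [(0.0, f(0.0))]
    ends = []
    for sign in (1.0, -1.0):
        lam, h = 0.0, step
        for _ in range(max_expand):
            lam += sign * h
            j = f(lam)
            samples.append((lam, j))
            j_min = min(s[1] for s in samples)
            # Stop once this end exceeds the best sampled cost by n_sigma * sigma.
            if j > j_min + n_sigma * sigma:
                break
            h *= growth
        ends.append(lam)
    return min(ends), max(ends), samples
```

The returned samples can then be passed to a parabola fit or another line optimizer. Comparing against `n_sigma * sigma` instead of the raw cost keeps a single noisy reading from ending the expansion too early.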
3.2.6 Genetic Algorithm

The genetic algorithm performs random searches for the global optimal solution in the parameter space. It is a stochastic algorithm and belongs to the larger class of evolutionary algorithms (EAs), which are efficient heuristic search methods based on Darwinian evolution theory. The typical EA process is depicted in Fig. 3.8. It initializes several random trial solutions and evolves them in an evolution loop until the termination criteria are met. The evolution loop consists of several typical steps. First, the costs of the trial solutions are evaluated, and the results are used to guide the selection of better performing trial solutions. In the reproduction phase, the selected solutions (named parents) produce new trial solutions (called offspring or children) that inherit the advantages of parents and explore new solutions. The evolution loop terminates once the termination conditions are satisfied. In EAs, the set of trial solutions is called a population, and each solution is called an individual. As a special case of EA, GA has all the features of EA mentioned above. The basic genetic algorithm (Deb et al. 2002; Engelbrecht 2007; Kochenderfer and Wheeler 2019) is described in Table 3.5.

Fig. 3.8 Typical steps of an evolutionary algorithm (blocks: population initialization, evaluation, selection, reproduction, termination)

Table 3.5 Genetic algorithm

Initialize
  Initialize a population with N random trial solutions and evaluate their costs. Store the results in a set $P^{(0)} = \left\{\left(u_i^{(0)}, J\left(u_i^{(0)}\right)\right), i = 1, 2, \ldots, N\right\}$ with J defined in (3.1)
Repeat (for iterations: t = 0, 1, 2, …)
  1. Select individuals from the population $P^{(t)}$ as parents and produce M offspring with crossover or mutation. The offspring (denoted as v) and their costs are collected in a set $Q^{(t)} = \left\{\left(v_j^{(t)}, J\left(v_j^{(t)}\right)\right), j = 1, 2, \ldots, M\right\}$
  2. Merge the parents and offspring into a combined set $R^{(t)} = P^{(t)} \cup Q^{(t)}$
  3. Sort the individuals in $R^{(t)}$ according to their costs
  4. Select the N best performing (i.e., with smaller costs) individuals in $R^{(t)}$ and define them as the population of the next generation, denoted as $P^{(t+1)}$. Here we reject the M worse performing solutions in $R^{(t)}$ (i.e., population reduction)
End (if the termination conditions are satisfied)
Compared to Fig. 3.8, step 1 of the GA loop corresponds to the “reproduction” and “evaluation” blocks, and steps 2–4 correspond to the “selection” block. Step 1 also needs a selection algorithm to choose individuals from P (t) for producing offspring. Therefore, the “selection” block also covers part of step 1. We will discuss the algorithms of some GA steps in the remaining part of this subsection.
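The loop of Table 3.5 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used for the results in this chapter: the function name `genetic_algorithm`, its parameters, the uniform parent choice, the single-factor arithmetic crossover, and the fixed Gaussian mutation step `sigma` are all hypothetical simplifications.

```python
import random


def genetic_algorithm(cost, n_inputs, pop_size=20, n_offspring=20,
                      n_generations=50, bounds=(-1.0, 1.0), sigma=0.05, seed=0):
    """Minimize cost(u) with the basic GA loop of Table 3.5 (illustrative)."""
    rng = random.Random(seed)
    lo, hi = bounds

    # Initialize: N random trial solutions and their costs, i.e., the set P(0).
    population = []
    for _init in range(pop_size):
        u = [rng.uniform(lo, hi) for _ in range(n_inputs)]
        population.append((u, cost(u)))

    for _generation in range(n_generations):
        # Step 1: produce M offspring and evaluate them, i.e., the set Q(t).
        # Assumption: parents are picked uniformly; a cost-aware selection
        # would replace rng.sample here.
        offspring = []
        for _child in range(n_offspring):
            parent1, parent2 = rng.sample(population, 2)
            alpha = rng.random()
            child = [alpha * a + (1.0 - alpha) * b + rng.gauss(0.0, sigma)
                     for a, b in zip(parent1[0], parent2[0])]
            child = [min(hi, max(lo, x)) for x in child]  # keep inputs in range
            offspring.append((child, cost(child)))

        # Step 2: merge parents and offspring, R(t) = P(t) U Q(t).
        merged = population + offspring
        # Step 3: sort by cost (smaller is better).
        merged.sort(key=lambda individual: individual[1])
        # Step 4: keep the N best individuals as P(t+1).
        population = merged[:pop_size]

    # Return the best (input vector, cost) pair of the final population.
    return min(population, key=lambda individual: individual[1])
```

Because steps 2 to 4 always keep the best individuals of the merged set, the best cost in the population cannot increase from one generation to the next in this sketch.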
3.2.6.1 Selection of Parents for Reproduction
In step 1, we obtain M offspring from the population $P = \{(\mathbf{u}_i, J(\mathbf{u}_i)),\ i = 1, \ldots, N\}$ through crossover or mutation operations. Here we omit the generation number to simplify the formulas. Each crossover operation requires two parents, and each mutation needs one. There are many different methods for selecting parents. The simplest one is to choose randomly with uniform probability, which often performs poorly since the parents' costs are not considered. We prefer selecting individuals with better performance while still keeping some probability of selecting worse performing individuals for exploration. Therefore, the parent selection method should be stochastic, assigning larger selection probabilities to better performing individuals. Such algorithms include roulette wheel selection, stochastic tournament, expected value selection, etc. Here we introduce the roulette wheel selection method (Engelbrecht 2007; Kochenderfer and Wheeler 2019), which creates a discrete probability distribution based on the costs of the parents. We use the softmax function to define a probability distribution for the individuals in P:

$$p(\mathbf{u}_i) = \frac{e^{-\beta J(\mathbf{u}_i)}}{\sum_{j=1}^{N} e^{-\beta J(\mathbf{u}_j)}}, \quad i = 1, 2, \ldots, N, \tag{3.5}$$

where $p(\mathbf{u}_i)$ is the probability of selecting $\mathbf{u}_i$ as a parent. The probabilities satisfy $0 \le p(\mathbf{u}_i) \le 1$ and $\sum_{i=1}^{N} p(\mathbf{u}_i) = 1$. The minus sign in the exponential function yields a larger selection probability for an individual with a smaller cost. A hyperparameter, β ≥ 0, is introduced to scale the cost function $J(\mathbf{u}_i)$ for fine-tuning of the distribution. For example, with β = 0, we obtain a uniform distribution, whereas, with β → ∞, we can only select the best performing solution. In practice, β is a finite positive number determined by maximizing the performance of GA. When selecting a parent, we sample the distribution (3.5). First, we calculate the cumulative probability by accumulating the probabilities of the individuals in P. See Fig. 3.9, where each line segment represents the selection probability of an individual. Then, we generate a random number r drawn from the uniform distribution on (0, 1). The parent individual is selected by checking in which segment r falls. Its index is determined by

$$l = \begin{cases} 1, & r \le p(\mathbf{u}_1) \\ k, & \sum_{i=1}^{k-1} p(\mathbf{u}_i) < r \le \sum_{i=1}^{k} p(\mathbf{u}_i),\ 1 < k \le N \end{cases} \tag{3.6}$$
Fig. 3.9 Roulette wheel selection method for parent selection. Individuals with smaller costs occupy longer segments and are more likely to be chosen
which chooses ul as a parent. For example, the “Sample 1” in Fig. 3.9 will select u2 and the “Sample 2” will choose u N −1 , respectively. The selection method above only works for single-objective problems with J (ui ) a real number. Other selection methods like the binary tournament method should be adopted for a multi-objective problem whose cost function returns a real vector. Interested readers can refer to the article (Deb et al. 2002).
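The sampling procedure of (3.5) and (3.6) can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, indices are 0-based (unlike the 1-based index l in (3.6)), and the minimum cost is subtracted before exponentiation purely for numerical stability, which leaves the probabilities in (3.5) unchanged.

```python
import math
import random


def selection_probabilities(costs, beta):
    """Softmax selection probabilities of Eq. (3.5) for a list of costs."""
    # Subtracting the minimum cost rescales numerator and denominator by the
    # same factor, so the probabilities are identical but overflow is avoided.
    j_min = min(costs)
    weights = [math.exp(-beta * (j - j_min)) for j in costs]
    total = sum(weights)
    return [w / total for w in weights]


def roulette_wheel_select(costs, beta, rng=random):
    """Return the 0-based index of the selected parent, following Eq. (3.6)."""
    probs = selection_probabilities(costs, beta)
    r = rng.random()  # uniform random number in [0, 1)
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if r <= cumulative:  # r falls into the segment of this individual
            return index
    return len(probs) - 1  # guard against floating-point round-off
```

With `beta = 0` every individual gets the probability 1/N, and a very large `beta` concentrates nearly all probability on the individual with the smallest cost, in line with the two limits discussed above.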
3.2.6.2 Crossover and Mutation
Crossover and mutation are two reproduction operations for generating new trial solutions named offspring. In GA, each trial solution is also called a chromosome, and its components are named genes. For accelerator beam optimization, a chromosome is a combination of the system inputs (e.g., n beam position settings in the undulator region of an FEL), and a gene is the setting of a single input. Crossover recombines the chromosomes of two parents and produces two offspring, which inherit the advantages of both parents and may perform better. To avoid being trapped in local minima, we also need mutation, which randomly varies some of the genes and makes the algorithm more explorative. Mutation can be applied to either the parents or the offspring. There are many crossover methods. If the chromosomes are coded with binary values (i.e., each gene is a binary number), the one-point crossover, two-point crossover, or uniform crossover methods (Engelbrecht 2007) can be adopted. These methods select genes in the chromosomes of the two parents and exchange their binary values. Here we focus on continuous chromosomes with each gene being a floating-point number, which is the case of beam optimization problems. For continuous chromosomes, the arithmetic crossover method is often applied, producing two offspring using linear combinations of the two parents. Here we introduce one of the implementations. Assume $\mathbf{u}_1 = [u_{1i}]^T_{i=1,\ldots,n} := [u_{11}\ u_{12}\ \ldots\ u_{1n}]^T$ (we use this format to present a vector for simplicity) and $\mathbf{u}_2 = [u_{2i}]^T_{i=1,\ldots,n}$ are the chromosomes of two selected parents and we cross them over to generate two offspring $\mathbf{v}_1 = [v_{1i}]^T_{i=1,\ldots,n}$ and $\mathbf{v}_2 = [v_{2i}]^T_{i=1,\ldots,n}$. For arithmetic crossover, we generate a random vector $\boldsymbol{\alpha} = [\alpha_i]^T_{i=1,\ldots,n}$ of size n with each element drawn from the uniform distribution on (0, 1). Then the offspring are calculated as
$$v_{1i} = \alpha_i u_{1i} + (1 - \alpha_i) u_{2i}, \quad v_{2i} = \alpha_i u_{2i} + (1 - \alpha_i) u_{1i}, \quad i = 1, \ldots, n. \tag{3.7}$$
To improve the exploration capability of crossover, we may draw the random number $\alpha_i$ from a uniform distribution over a slightly wider range, (−γ, 1 + γ), where γ is a positive number satisfying γ ≪ 1. In practice, one can implement different crossover functions and randomly select them during operation. Mutation is an essential operation for further improving the exploration capability of GA, providing more opportunities for finding the global minimum. To mutate a chromosome, which is either an offspring produced by crossover or directly a parent, we make changes to some randomly selected genes. Here we give a simple mutation method as an example. Assume we are given an initial chromosome $\mathbf{u} = [u_i]^T_{i=1,\ldots,n}$ and we want to mutate it to $\mathbf{v} = [v_i]^T_{i=1,\ldots,n}$. First, we generate a random vector $\boldsymbol{\eta} = [\eta_i]^T_{i=1,\ldots,n}$ with each element drawn from a Bernoulli distribution. The value of $\eta_i$ is 0 or 1, and the probability of being 1 is μ (a hyperparameter called mutation rate). If $\eta_i = 1$, the gene $u_i$ will be mutated by adding a random offset:

$$v_i = \begin{cases} u_i, & \eta_i = 0 \\ u_i + \delta, & \eta_i = 1 \end{cases}, \quad i = 1, \ldots, n, \tag{3.8}$$
where δ is a random number drawn from either a uniform distribution on (−ε, ε) or a Gaussian distribution N(0, σ²). Both ε and σ are hyperparameters called mutation steps. Mutation helps find the global minimum. However, too many or too large mutations (i.e., with a high mutation rate or large mutation step) may slow down the convergence. In practice, we often reduce the mutation rate and mutation step as the GA iterations progress. At the early stage of optimization, we make more and larger mutations to explore a larger volume of the input parameter space. When approaching the global minimum, fewer and smaller mutations are made to accelerate the convergence. The mutation algorithm introduced here is the simplest one. Many other mutation algorithms have been developed, and interested readers can refer to (Engelbrecht 2007).
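The arithmetic crossover (3.7) and the Bernoulli mutation (3.8) can be sketched as below. The function names and argument names are hypothetical, and the Gaussian variant of the offset δ is chosen here merely as one of the two options mentioned above.

```python
import random


def arithmetic_crossover(u1, u2, gamma=0.0, rng=random):
    """Produce two offspring from parents u1 and u2 following Eq. (3.7).

    gamma > 0 widens the range of alpha_i to (-gamma, 1 + gamma).
    """
    v1, v2 = [], []
    for a, b in zip(u1, u2):
        alpha = rng.uniform(-gamma, 1.0 + gamma)  # one alpha_i per gene
        v1.append(alpha * a + (1.0 - alpha) * b)
        v2.append(alpha * b + (1.0 - alpha) * a)
    return v1, v2


def mutate(u, rate, sigma, rng=random):
    """Mutate chromosome u following Eq. (3.8) with a Gaussian offset.

    Each gene is changed with probability `rate` (the mutation rate mu);
    `sigma` is the mutation step.
    """
    v = []
    for gene in u:
        if rng.random() < rate:       # eta_i = 1: add a random offset delta
            v.append(gene + rng.gauss(0.0, sigma))
        else:                         # eta_i = 0: keep the gene unchanged
            v.append(gene)
    return v
```

A property that follows directly from (3.7) is that the two offspring sum to the same value as the two parents gene by gene, so crossover alone redistributes the parents' genes without shifting their mean. Reducing `rate` and `sigma` over the iterations would implement the schedule described above.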
3.2.6.3 Sorting and Population Reduction
Steps 3 and 4 of GA sort the combined population $R^{(t)}$ and reduce its size to obtain $P^{(t+1)}$. Sorting the solutions of a single-objective problem is simple. However, we must use a special sorting algorithm, non-dominated sorting (Deb et al. 2002; Bao et al. 2017), to judge the cost function results of multi-objective problems. In this case, the cost function returns a vector, $\mathbf{J}(\mathbf{u}) = [J_i(\mathbf{u})]^T_{i=1,\ldots,n_J}$, where $n_J$ is the number of objectives and $\mathbf{u}$ is the trial solution (i.e., system input vector). A trial solution $\mathbf{u}_a$ is said to be better than $\mathbf{u}_b$ if the following conditions are satisfied:

$$J_{ai} \le J_{bi},\ \forall i \in \{1, \ldots, n_J\}, \qquad J_{aj} < J_{bj},\ \exists j \in \{1, \ldots, n_J\}, \tag{3.9}$$

where $\mathbf{J}_a = [J_{ai}]^T_{i=1,\ldots,n_J}$ and $\mathbf{J}_b = [J_{bi}]^T_{i=1,\ldots,n_J}$ are the costs of $\mathbf{u}_a$ and $\mathbf{u}_b$, respectively. It indicates that $\mathbf{J}_a$ is no worse than $\mathbf{J}_b$ for all objectives and is strictly better for at least one objective. In this case, we say that the solution $\mathbf{u}_a$ dominates $\mathbf{u}_b$. In a population of trial solutions, if one solution is not dominated by any others, we say it is on the Pareto front (Kochenderfer and Wheeler 2019) of the population. These frontier solutions are candidates for the final solution of the optimization problem and should be chosen with trade-offs among different objectives. They do not dominate each other. Therefore, if we choose a different frontier solution that is better in one objective, it must be worse in some other objectives.
To understand the concepts of domination and Pareto front, we study an example problem with two objectives: $\mathbf{J}(\mathbf{u}) = [J_1(\mathbf{u})\ J_2(\mathbf{u})]^T$ (see Fig. 3.10). The cost function outputs $\mathbf{J}_a$, $\mathbf{J}_b$, $\mathbf{J}_c$, and $\mathbf{J}_d$ correspond to the trial solutions $\mathbf{u}_a$, $\mathbf{u}_b$, $\mathbf{u}_c$ and $\mathbf{u}_d$, respectively. We can see that $\mathbf{u}_a$, $\mathbf{u}_b$ and $\mathbf{u}_c$ are on the Pareto front because there are no other solutions dominating them. All solutions on the Pareto front are candidates for the final solution. We must make trade-offs between different objectives. For example, if we choose $\mathbf{u}_b$ as the final solution instead of $\mathbf{u}_a$, the performance represented by $J_2$ (e.g., beam loss) is improved, but that represented by $J_1$ (e.g., FEL pulse energy) degrades. Our goal is to minimize the cost. The solutions $\mathbf{u}_b$ and $\mathbf{u}_c$ dominate the solution $\mathbf{u}_d$ according to (3.9). Actually, any solution whose cost function output falls into the shaded region (including the boundary) dominates $\mathbf{u}_d$.
The non-dominated sorting algorithm identifies the Pareto fronts of different levels for the trial solutions in $R^{(t)}$. The sorting and population reduction algorithm is described in Table 3.6. Table 3.6 explains the basic principle of the non-dominated sorting algorithm. It is not optimal in practice since solutions that are not in the first Pareto front are compared with each other repeatedly. A more efficient implementation can be found in the non-dominated sorting genetic algorithm II (NSGA-II) (Deb et al. 2002). For step 3b, NSGA-II selects the subset $P'_{Pa,F}$ from $P_{Pa,F}$ using the so-called crowding distance as a criterion.
Fig. 3.10 Concepts of domination and Pareto front
Table 3.6 Non-dominated sorting and population reduction algorithm for $R^{(t)}$
Initialize
  Initialize the remaining population as $P_{rm} = R^{(t)}$
Repeat (for front index: F = 1, 2, 3, …)
  1. Compare the individuals in $P_{rm}$ and find those not dominated by any others. Collect them in a new set $P_{Pa,F}$, which forms the level-F Pareto front
  2. Remove the individuals in $P_{Pa,F}$ from $P_{rm}$, that is, $P_{rm} \leftarrow P_{rm} - P_{Pa,F}$
  3. Calculate the total number of individuals in all identified Pareto-front sets: $K = \sum_{f=1}^{F} \mathrm{numel}(P_{Pa,f})$, where "numel()" returns the number of elements in a set. Take different actions based on the value of K:
    (a) If K = N: assign $P^{(t+1)} = \bigcup_{f=1}^{F} P_{Pa,f}$ and terminate the loop
    (b) If K > N: assign $P^{(t+1)} = \left(\bigcup_{f=1}^{F-1} P_{Pa,f}\right) \cup P'_{Pa,F}$, where $P'_{Pa,F}$ is a subset of $P_{Pa,F}$ chosen so that the size of $P^{(t+1)}$ is N. Then terminate the loop
    (c) Else: continue the iteration
End
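The domination test (3.9) and the front-by-front reduction of Table 3.6 can be sketched as follows. This is a minimal illustration of the basic (not the efficient NSGA-II) procedure; the function names are hypothetical, and in case (b) the last front is simply truncated, which is only a placeholder for the crowding-distance criterion.

```python
def dominates(ja, jb):
    """True if cost vector ja dominates jb according to Eq. (3.9)."""
    no_worse = all(a <= b for a, b in zip(ja, jb))
    strictly_better = any(a < b for a, b in zip(ja, jb))
    return no_worse and strictly_better


def pareto_fronts(costs):
    """Split the indices of `costs` into Pareto fronts of level 1, 2, ..."""
    remaining = list(range(len(costs)))
    fronts = []
    while remaining:
        # Step 1: individuals not dominated by any other remaining individual.
        front = [i for i in remaining
                 if not any(dominates(costs[k], costs[i])
                            for k in remaining if k != i)]
        fronts.append(front)
        # Step 2: remove the identified front from the remaining population.
        remaining = [i for i in remaining if i not in front]
    return fronts


def reduce_population(costs, n_keep):
    """Return the indices of the n_keep individuals kept for P(t+1)."""
    kept = []
    for front in pareto_fronts(costs):
        if len(kept) + len(front) <= n_keep:
            kept.extend(front)  # the whole front fits (cases a and c)
        else:
            # Case (b): keep only part of the last front. Plain truncation is
            # an assumption standing in for the crowding-distance selection.
            kept.extend(front[:n_keep - len(kept)])
        if len(kept) == n_keep:
            break
    return kept
```

For instance, with the hypothetical two-objective costs `[(1, 4), (2, 2), (4, 1), (5, 3)]`, the first three points do not dominate each other and form the level-1 front, while the last point is dominated by the second and third, mirroring the situation sketched in Fig. 3.10.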
3.2.6.4 Test Function Optimization with GA
We apply GA to the test function (3.3) and show the results in Fig. 3.11. This test used the GA implementation in the Global Optimization Toolbox of Matlab. Since GA initializes its population randomly, the starting point of the inputs is no longer as important as it is for SCO, RWO and RCDS. Referring to the simulation results in Fig. 3.11, we can summarize the features of GA as follows:
(a) GA does large-range searches in the system input parameter space. It has more opportunity to find the global minimum (as shown in the results) but may cause beam loss due to aggressive input changes.
(b) GA evaluates the cost function many times, especially for high-dimension problems with many inputs, which often require a large population evolving for many generations. Therefore, GA typically requires significant computation power and takes a long time to converge.
(c) GA is sensitive to the noise in the system. The measurement noise may mislead the sorting and selection processes by introducing errors in the cost function evaluations. It slows down the convergence or even makes the algorithm fail.
These features make GA unsuitable for online optimization, which requires smoother parameter-space search, fewer evaluations (i.e., to be sample efficient), and more robustness against noise. Instead, GA has been widely used for accelerator design optimization mainly based on simulation. In addition, parallel computing can accelerate the GA process if the cost function is evaluated via simulation. If a surrogate model of the physical system exists, we can apply GA to the surrogate model, and then use the result as a starting point for further online optimization. This method significantly accelerates the overall optimization process. We will discuss such surrogate model-based optimization in Sect. 4.3.3.
Fig. 3.11 GA results with the test function
3.2.7 Particle Swarm Optimization
Particle swarm optimization (PSO) (Engelbrecht 2007; Kochenderfer and Wheeler 2019) is another widely used stochastic algorithm. It is also an evolutionary algorithm. Unlike GA, PSO does not create new trial solutions. Instead, each trial solution is assigned a velocity vector, which points to the update direction and defines the step size for updating itself. In PSO, the set of trial solutions is called a swarm, and each trial solution is named a particle. The value of a particle is called a position in the input parameter space. PSO simulates the social behaviors of animals like fish and birds. In an animal swarm, each one adjusts its moving direction by observing the movements of others. Even though the intelligence of a single animal is limited, a large swarm can show high intelligence for successfully finding food or escaping enemies. This is the key concept of swarm intelligence. In PSO, each particle learns from others, adjusting its moving direction and distance referring to the performance (i.e., costs) of itself and other particles. PSO uses simple rules to update the particle positions with fewer hyperparameters to tune. It is very popular and useful in various fields like complex system design, operational parameter optimization, and machine learning.
The update law of a particle is depicted in Fig. 3.12. In a swarm of N particles, the ith (i = 1, 2, …, N) particle is fully described by two n-dimensional vectors: the position $\mathbf{u}_i^{(t)}$ and the velocity $\mathbf{v}_i^{(t)}$, where t is the iteration number and n is the number of inputs. The particle's movement is influenced by its current velocity $\mathbf{v}_i^{(t)}$, its personal best position $\mathbf{p}_i^{(t)}$ found since the initial iteration, and the global best position $\mathbf{g}^{(t)}$ found by the swarm of particles since the start of optimization. The velocity and position update law of each particle is
Fig. 3.12 Update law of the particle position and velocity
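$$\begin{cases} \mathbf{v}_i^{(t+1)} = w \mathbf{v}_i^{(t)} + c_1 \mathbf{R}_1 \left(\mathbf{p}_i^{(t)} - \mathbf{u}_i^{(t)}\right) + c_2 \mathbf{R}_2 \left(\mathbf{g}^{(t)} - \mathbf{u}_i^{(t)}\right), \\ \mathbf{u}_i^{(t+1)} = \mathbf{u}_i^{(t)} + \mathbf{v}_i^{(t+1)}, \end{cases} \quad i = 1, 2, \ldots, N, \tag{3.10}$$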
where $\mathbf{R}_1$ and $\mathbf{R}_2$ are two diagonal matrices with their diagonal elements drawn from the uniform distribution on (0, 1). The three terms on the right-hand side of the velocity equation are the inertia, cognitive, and social components, respectively. To prevent the particle positions from changing too fast, velocity clamping is often implemented, which clamps each velocity component with an upper limit, $v_{max,j}$, j = 1, 2, …, n. The hyperparameters in (3.10) are described as follows:
• w is an inertia weight controlling the exploration and exploitation abilities of the swarm. It controls how much the memory of the previous moving direction affects the new velocity. A large w increases the diversity of the particles and improves the exploration capability, whereas a smaller w promotes local exploitation. Typically, w is in the range from 0 to 1. One possible compromise is to vary w with the progress of optimization. We can initialize w with a larger value when starting the optimization and reduce it gradually for later iterations. This method encourages the particles to explore in early iterations and to exploit more in later iterations. The simplest method is to multiply w by a factor 0 < η < 1 after each iteration, as in Table 3.7. Other methods to update w can be found in (Engelbrecht 2007).
• $c_1$ and $c_2$ are acceleration coefficients controlling the stochastic influence of the cognitive and social components on the new velocity. $c_1$ and $c_2$ express how confident a particle is in itself and its neighbors, respectively. If $c_1 \gg c_2$, each particle searches locally and independently without referring to the others, resulting in excessive wandering around its personal best position. On the other hand, if $c_2 \gg c_1$, all particles are quickly attracted to the global best position, which may be problematic for rough multimodal search spaces. Usually, we set $c_1 \approx c_2$ with static values like 2. More discussions about setting the values of $c_1$ and $c_2$ can be found in (Engelbrecht 2007).
Table 3.7 describes the PSO algorithm for single-objective optimizations. It can be naturally extended to multi-objective problems by using non-dominated sorting for selecting the personal and global best solutions (Coello and Lechuga 2002).
Table 3.7 PSO algorithm
Initialize
  (a) Initialize a swarm of N random particles $\mathbf{u}_i^{(0)}$, evaluate their costs $J(\mathbf{u}_i^{(0)})$, and initialize their velocity as $\mathbf{v}_i^{(0)} = \mathbf{0}$, where i = 1, 2, …, N
  (b) Initialize the personal best position of each particle as $\mathbf{p}_i^{(0)} = \mathbf{u}_i^{(0)}$, i = 1, 2, …, N
  (c) Initialize the global best position as $\mathbf{g}^{(0)} = \arg\min_{\mathbf{p}_i^{(0)}} \{J(\mathbf{p}_i^{(0)}),\ i = 1, 2, \ldots, N\}$
  (d) Set the parameters $w^{(0)}$, $c_1$ and $c_2$, and define the damping factor η
Repeat (for iterations: t = 0, 1, 2, …)
  1. Repeat for all particles with the index i = 1, 2, …, N:
    (a) Calculate the new velocity $\mathbf{v}_i^{(t+1)}$ and new position $\mathbf{u}_i^{(t+1)}$ with Eq. (3.10)
    (b) Evaluate the cost of the new position, $J(\mathbf{u}_i^{(t+1)})$
    (c) If $J(\mathbf{u}_i^{(t+1)}) \ge J(\mathbf{p}_i^{(t)})$, set $\mathbf{p}_i^{(t+1)} = \mathbf{p}_i^{(t)}$ and $\mathbf{g}^{(t+1)} = \mathbf{g}^{(t)}$; otherwise:
      (i) Update the particle's personal best position as $\mathbf{p}_i^{(t+1)} = \mathbf{u}_i^{(t+1)}$
      (ii) If $J(\mathbf{p}_i^{(t+1)}) < J(\mathbf{g}^{(t)})$, update the global best position as $\mathbf{g}^{(t+1)} = \mathbf{p}_i^{(t+1)}$
  2. Damp the inertia coefficient: $w^{(t+1)} = \eta w^{(t)}$
End (if the termination conditions are satisfied)
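Table 3.7 together with the update law (3.10) can be sketched in Python as below. This is an illustrative sketch under stated assumptions rather than the Matlab toolbox implementation used for the results in this section: the function name `pso`, the default parameter values, the symmetric velocity clamp of magnitude `v_max`, and the clamping of positions to `bounds` are all hypothetical choices.

```python
import random


def pso(cost, n_inputs, n_particles=20, n_iterations=100, w=0.9, c1=2.0,
        c2=2.0, eta=0.98, bounds=(-1.0, 1.0), v_max=0.5, seed=0):
    """Minimize cost(u) following the structure of Table 3.7 (illustrative)."""
    rng = random.Random(seed)
    lo, hi = bounds

    # Initialize (a)-(c): random positions, zero velocities, personal/global best.
    pos = [[rng.uniform(lo, hi) for _ in range(n_inputs)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_inputs for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    best = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[best][:], pbest_cost[best]

    for _iteration in range(n_iterations):
        for i in range(n_particles):
            for j in range(n_inputs):
                # Independent random factors per component correspond to the
                # diagonal elements of R1 and R2 in Eq. (3.10).
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))   # velocity clamping
                pos[i][j] = max(lo, min(hi, pos[i][j] + vel[i][j]))
            j_new = cost(pos[i])
            if j_new < pbest_cost[i]:                    # step 1(c)(i)
                pbest[i], pbest_cost[i] = pos[i][:], j_new
                if j_new < gbest_cost:                   # step 1(c)(ii)
                    gbest, gbest_cost = pos[i][:], j_new
        w *= eta                                         # step 2: damp inertia
    return gbest, gbest_cost
```

As in Table 3.7, the global best is updated inside the particle loop, so particles later in the loop already see an improvement found earlier in the same iteration.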
The PSO algorithm is applied to the test function (3.3) and the results are depicted in Fig. 3.13. This test has used the PSO implementation in the Global Optimization Toolbox of Matlab. Similar to GA, the starting point of system inputs is not important for PSO. Referring to the simulation results in Fig. 3.13, we can summarize the features of PSO as follows:
Fig. 3.13 PSO results with the test function
(a) PSO does large-range searches in the system input parameter space. It has more opportunity to find the global minimum (as shown in the results) but may cause beam loss due to aggressive input changes. (b) PSO needs many evaluations of the cost function. It may converge slowly if the cost function evaluation is time consuming. However, PSO is often more efficient than the GA method. As seen in Figs. 3.13 and 3.11, PSO requires fewer evaluations than GA for the test problem (3.3). (c) PSO is sensitive to the noise in the system. The measurement noise may mislead the selection of the personal and global best solutions and move the particle in the wrong direction. It slows down the convergence or makes the algorithm fail. These features make PSO unsuitable for online optimization. Similar to GA, PSO has been widely used for accelerator design optimizations. It also supports parallel computation if the cost function evaluation is based on simulation. Furthermore, we can also apply PSO to surrogate models of physical systems to accelerate the overall optimization process. See Sect. 4.3.3 for more details and examples.
3.2.8 Comparison of Optimization Algorithms
Table 3.8 briefly compares the five optimization algorithms introduced in this section. One should choose proper algorithms based on the problems to be solved. Typically, we start from simple algorithms like SCO or RWO and only consider complex ones when they are insufficient. More optimization algorithms can be found in the reference articles provided in Sect. 3.2.1.
Table 3.8 Comparison of the discussed optimization algorithms
Algorithm                    SCO            RWO            RCDS           GA               PSO
Category                     Deterministic  Deterministic  Deterministic  Stochastic       Stochastic
Online                       Yes            Yes            Yes            No               No
Local/global optimization    Local          Local          Local          Global           Global
Bias with noise              Unbiased       Biased         Unbiased       /                /
Number of objectives         Single         Single         Single         Single/multiple  Single/multiple
Parallel computing support   No             No             No             Yes              Yes
3.3 Beam Optimization Examples and Tools
This section gives several examples of online beam optimizations with the introduced algorithms. We also summarize some practical considerations for optimizing physical systems. At the end, a survey of optimization software tools is included.
3.3.1 Practical Considerations
When applying optimization algorithms to physical systems, some practical issues should be taken care of to ensure performance, summarized as follows:
(a) Scaling of system inputs and outputs
In Sect. 2.4.1, we have introduced several methods to reduce the condition number of a response matrix. In optimization problems, a system may not be described by response matrices. However, we can still call it ill-conditioned or singular if its outputs are sensitive to input errors. Choosing uniform units for the system inputs/outputs is essential to reduce the system's singularity. It balances all inputs to similar levels and approximately equalizes the magnitudes of all outputs. Otherwise, if one input is relatively small, a tiny error in the input vector may cause significant relative errors in this input, possibly resulting in dramatic output errors. On the other hand, if one output is much larger than others, its value will dominate the cost, and perhaps only this one converges. Another method to scale the system inputs and outputs is to normalize their ranges to [0, 1] using their limits. See Sect. 2.4.1 for details.
The constraints of beam optimization problems are often defined as upper and lower limits of the input and output parameters. The input limits guarantee successful beam acceleration and avoid beam loss or interlock trips. An optimization algorithm must ensure that the system inputs and outputs are within the specified ranges. The inputs should be clamped if they are over the limits. A trial solution must obtain a higher cost if it violates the input or output limits, which guides the search away from boundaries.
(b) Conversion of multiple objectives to a single objective
As discussed in Sect. 3.2.6, multi-objective problems can be solved by the algorithms based on non-dominated sorting. In practice, multiple objectives can be combined into a single objective. Suppose the cost function returns a vector $\mathbf{J} = [J_i]^T_{i=1,\ldots,n_J}$ with $n_J$ objectives. It can be converted into two different single objectives as

$$J^a = \sum_{i=1}^{n_J} w_i J_i, \qquad J^b = \sqrt{\sum_{i=1}^{n_J} w_i J_i^2}, \tag{3.11}$$
where $J^a$ is for open-objective optimization, and $J^b$ is often used for operating point changing problems. The positive weight factors $w_i$ determine the relative importance of each objective and influence the optimization results significantly. A larger $w_i$ imposes more penalty on the corresponding objective, which then gets higher priority in the optimization. The weights are hyperparameters and should be set empirically for different problems. Operating point changing problems minimize the absolute difference between each beam parameter $y_i$ and its target value $y_{dest,i}$. The cost function typically has the form $J_i = |y_{dest,i} - y_i|$.
Given a multi-objective problem, we often first try converting it to a single-objective problem and checking the performance. This is because solving a single-objective problem is often more straightforward. Of course, multi-objective optimization is still needed if better performance is expected or the conversion to a single objective is not feasible. For example, if our goal is to maximize the FEL pulse energy and minimize the beam loss, it is difficult to convert them into a single objective, and multi-objective optimization algorithms should be adopted.
(c) Handling of system faults
Online optimizers must handle system faults, such as interlock trips, beam stops by the machine protection system, or abnormal measurement points caused by malfunctions of beam detectors. For example, suppose the bunch length exceeds the dynamic range of the bunch length detector, or the sensor is affected by the interference from a power supply glitch. In that case, the measurement results may be corrupted. The optimizer should stop its execution if a fault happens and automatically resume after the fault is resolved. Abnormal measurement points should be excluded from the cost function evaluation to avoid misleading the search for optimal solutions.
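The two conversions in (3.11), together with the clamping and limit-penalty ideas from item (a), can be sketched as follows. All function names are hypothetical, and the linear penalty proportional to the total limit violation is only one possible way to assign a higher cost to a violating trial solution.

```python
import math


def weighted_sum(costs, weights):
    """J^a of Eq. (3.11): weighted sum of the individual objectives."""
    return sum(w * j for w, j in zip(weights, costs))


def weighted_root_sum_square(costs, weights):
    """J^b of Eq. (3.11): square root of the weighted sum of squares."""
    return math.sqrt(sum(w * j * j for w, j in zip(weights, costs)))


def clamp_inputs(u, lower, upper):
    """Clamp each input to its [lower, upper] limits."""
    return [min(hi, max(lo, x)) for x, lo, hi in zip(u, lower, upper)]


def penalized_cost(j, y, y_lower, y_upper, penalty=1e3):
    """Add a penalty (an assumed linear form) when outputs violate limits."""
    violation = sum(max(lo - v, 0.0) + max(v - hi, 0.0)
                    for v, lo, hi in zip(y, y_lower, y_upper))
    return j + penalty * violation
```

For example, `weighted_root_sum_square([3.0, 4.0], [1.0, 1.0])` gives 5.0, i.e., the Euclidean norm of the objective vector when all weights equal 1.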
3.3.2 FEL Optimization with SCO
In an FEL machine, maximizing the overall photon energy in an FEL pulse is one of the most important tasks when setting up the machine. Photon users require different FEL wavelengths, bandwidths, pulse widths, pulse repetition rates, pulse energies, etc. Typically, we achieve these parameters using the inverse physics model of the machine, determining the required electron beam parameters (e.g., bunch charge, beam energy, bunch length, beam orbit, etc.). The electron beam parameters are then adjusted by setting the setpoints of the corresponding beam feedback loops or by directly setting the subsystem knobs. The initial FEL setup is typically performed manually, which is often sufficient to achieve the desired FEL wavelength, bandwidth, pulse width, etc. However, improving the FEL pulse energy is tricky because it is sensitive to random errors in the machine due to the stochastic nature of the self-amplified spontaneous emission (SASE) FEL process. In this case, inverting the physics model and manual adjustment are often insufficient. An automated optimizer is needed (Scheinker et al. 2019; Duris et al. 2020; Kirschner et al. 2022). Typical knobs for improving the FEL pulse energy include (the list is not exhaustive):
• RF Gun laser profile and delay
• RF Gun amplitude and phase
• RF Gun solenoid current
• bunch length at the bunch compressor exit
• seeding laser alignment (if the FEL is seeded)
• beam orbit in the undulators
• beam optics before/in the undulators
• FEL phase shifter gaps between undulators
As shown in Fig. 1.1, the FEL optimizer may adjust the control settings of open-loop devices (e.g., laser delay, Gun solenoid current, FEL phase shifter), vary the local feedback setpoints of closed-loop devices (e.g., Gun amplitude and phase), or change the beam feedback setpoints (e.g., bunch length and beam orbit).
SCO has proven efficient for optimizing the FEL pulse energy. It requires beam-synchronous data acquisition and strong correlations between the knobs and objectives. One successful application of SCO is maximizing the FEL pulse energy by adapting the beam orbit in undulators (Gaio and Lonza 2015). The SCO optimizer reads the beam orbit in undulators and the FEL pulse energy for the same bunch. After collecting the data of many bunches, SCO selects the orbits resulting in larger FEL energies and updates the orbit feedback setpoints. If the spontaneous orbit fluctuation is weak, artificial jitter can be added to improve the correlation. Alternatively, we can increase the orbit feedback gain to increase the beam orbit jitter. If we want the optimization process to be transparent to user experiments, we prefer using the spontaneous correlation or limiting the magnitude of the artificial jitter.
Figure 3.14 shows some test results at the European XFEL for optimizing the FEL pulse energy (Tomin et al. 2019). The SCO algorithm (also called adaptive feedback) is used. At the first iteration, the data of 100 bunches are recorded, including the FEL pulse energies and the beam orbits in the undulators. See plots a and b. Then, 20 records with larger FEL pulse energies (marked by circles in plot b) are identified, and the corresponding orbits are highlighted as thick-orange lines in plot a. These identified orbits are averaged (the thick-red line in plot c) to update the orbit feedback setpoints. It increases the FEL pulse energy, as seen from the measurements with pulse numbers over 100 in plot d. This process repeats until the FEL pulse energy stops improving. As mentioned above, the FEL pulse energy may be further increased by injecting artificial jitter into the beam orbit.
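One SCO iteration as described for the European XFEL test (record many bunches, keep the records with the largest pulse energies, average their orbits) can be sketched as follows. The function name and data layout are hypothetical: `orbits[k]` is assumed to hold the orbit readings of bunch k along the undulators, and `energies[k]` the FEL pulse energy of the same bunch.

```python
def sco_orbit_update(orbits, energies, n_best=20):
    """Return new orbit setpoints as the average of the best-performing orbits.

    orbits:   list of per-bunch orbit readings (one list of positions per bunch)
    energies: FEL pulse energy measured for the same bunches
    n_best:   number of records with the largest pulse energies to average
    """
    # Rank the bunch indices by pulse energy, largest first.
    ranked = sorted(range(len(energies)), key=lambda k: energies[k], reverse=True)
    best = ranked[:n_best]
    n_positions = len(orbits[0])
    # Average the selected orbits position by position ("golden orbit").
    return [sum(orbits[k][j] for k in best) / len(best)
            for j in range(n_positions)]
```

With 100 recorded bunches and `n_best=20`, this reproduces the selection described for Fig. 3.14; the returned list would then be written to the orbit feedback setpoints before the next batch of bunches is recorded.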
Fig. 3.14 SCO results at European XFEL for maximizing FEL pulse energy by adjusting the orbit setpoints in undulators (Courtesy of S. Tomin). a Orbits of 100 electron bunches; b FEL pulse energies of 100 bunches; c golden orbit found by the first iteration of SCO; d FEL pulse energy increase after applying the golden orbit as the new orbit feedback setpoints
3.3.3 Operating Point Changing
One of the primary tasks of beam control is setting up the desired beam parameters, including varying them to other operating points during operation. For example, we may adjust the electron beam energy for setting up a different FEL wavelength during user experiments. The beam setup methods, including optimizations for operating point changing, are summarized in Sect. 1.1.2.3. In this subsection, we demonstrate changing the bunch2 parameters of SwissFEL using the RWO and RCDS algorithms. The SwissFEL two-bunch operation is introduced in Sect. 1.3. The bunch2 parameters and the inputs for tuning them are defined in Eq. (1.1). Suppose bunch2 is at an initial operating point described by the input $\mathbf{u}_0$ and output $\mathbf{y}_0$. The goal is to determine a new input $\mathbf{u}^*$ to achieve the desired output $\mathbf{y}_{dest}$. We make the change via optimization, and it is a multi-objective problem. Here we use the technique introduced in Sect. 3.3.1 to convert it to a single-objective problem. We define a cost function according to $J^b$ in (3.11) as

$$J = \|\mathbf{y}_{dest} - \mathbf{y}\|_2, \quad \mathbf{y} = G(\mathbf{u}), \tag{3.12}$$
where G is the bunch2 response function, which is unknown. Compared to $J^b$ in (3.11), we have set all the weights to $w_i = 1$. The search ranges of the step ratios and step phases are [0.1, 1] and [−65°, 65°], respectively. The test results at SwissFEL are depicted in Figs. 3.15, 3.16, 3.17 and 3.18. RWO and RCDS manage to drive the bunch2 parameters from their initial values to the target values labeled as "Goal" in the plots. RWO converges within 20 min, but the bias of $\Delta E_{LH}$ is nonzero. RCDS achieves all beam parameter goals, but the convergence is slower, taking more than 30 min. The reason is the high dimension of the input parameter space (with 8 inputs). RCDS requires many cost function evaluations to obtain all the 8 conjugate directions. Evaluating the cost, i.e., checking the bunch2 parameters for particular inputs, is time-consuming. The optimizer is implemented in Matlab and executes slowly. It sets the RF pulse steps and reads the bunch2 parameters via EPICS (Experimental Physics and Industrial Control System) channel access at around 4 Hz. We average the measurements to reduce the noise effects, further slowing down the process. The system inputs determined by RWO and RCDS are shown in Figs. 3.16 and 3.18, respectively. Since we have more inputs than outputs (8 inputs and 5 outputs), the two cases settle down at different inputs, but both achieve the same outputs. In Sect. 4.3.5, we apply another machine-learning-based optimization algorithm, the multi-generation Gaussian process optimization (MG-GPO), to the same operating point changing task for the bunch2 of SwissFEL. It performs better than RWO and RCDS, benefiting from the Gaussian-process surrogate model of the system established during the optimization process.
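The evaluation of the cost (3.12) with measurement averaging can be sketched as follows. This is not the Matlab optimizer used at SwissFEL: `read_outputs` is a hypothetical callable standing in for applying the inputs and reading back the (already scaled) beam parameters through the control system, and the number of averaged readings `n_avg` is an assumed parameter.

```python
import math


def operating_point_cost(u, y_dest, read_outputs, n_avg=5):
    """Cost of Eq. (3.12) using the mean of n_avg repeated measurements.

    read_outputs(u) is a hypothetical function that applies the inputs u
    and returns one measurement of the output vector y.
    """
    sums = [0.0] * len(y_dest)
    for _ in range(n_avg):
        y = read_outputs(u)
        for i, value in enumerate(y):
            sums[i] += value
    y_mean = [s / n_avg for s in sums]  # averaging reduces the noise effects
    # Euclidean distance between the target and the averaged outputs.
    return math.sqrt(sum((d - m) ** 2 for d, m in zip(y_dest, y_mean)))
```

Each cost evaluation then takes `n_avg` readings, which illustrates why averaging lengthens the optimization when the readout rate is limited to a few hertz.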
Fig. 3.15 Convergence of bunch2 parameters for operating point changing with RWO. The initial output is $y_0 = [-0.02\ \ 0.17\ \ 122\ \ 0.23\ \ 670]^T$ and the target value is $y_{dest} = [-0.69\ \ {-2.05}\ \ 223\ \ 3.53\ \ 578]^T$. The displayed values are unscaled
Fig. 3.16 RF pulse step settings for changing the bunch2 operating points with RWO
Fig. 3.17 Convergence of bunch2 parameters for operating point changing with RCDS. The initial and target outputs are the same as in Fig. 3.15
3.3.4 Optimization Software Tools
Many software tools have been developed to solve optimization problems. We give a brief summary in this subsection. Matlab provides two toolboxes related to optimization. One is the Optimization Toolbox for solving classical optimization problems, such as linear programming, quadratic programming, and least-squares problems. The other is the Global Optimization Toolbox, which implements several blackbox optimization algorithms, such as pattern
Fig. 3.18 RF pulse step settings for changing the bunch2 operating points with RCDS
search, GA, PSO, simulated annealing, etc. We have used the GA and PSO from this toolbox in Sects. 3.2.6 and 3.2.7.
If the system response function G and its domain set Ω are known and convex (Boyd and Vandenberghe 2004), the problem can be solved efficiently using the software tools CVX (in Matlab, http://cvxr.com/cvx/) and CVXOPT (in Python, https://cvxopt.org/).
For Python programmers, the optimize module of the SciPy package (https://www.scipy.org/) is widely used for minimizing (or maximizing) objective functions, possibly subject to constraints. It includes solvers for nonlinear problems, linear programming, constrained and nonlinear least-squares, root finding, curve fitting, etc. It implements many algorithms, including the Nelder-Mead simplex method, Powell’s method, and gradient-based algorithms.
The accelerator community has also developed open-source software for solving optimization problems in accelerator design and beam control. Here we list several well-known ones:
• RCDS: designed by Xiaobiao Huang and his team. The code is downloadable from GitHub: https://github.com/SPEAR3-ML/RCDS.
• MG-GPO: designed by Xiaobiao Huang and his team. The code is downloadable from GitHub: https://github.com/SPEAR3-ML/MG-GPO.
• Multi-Objective Particle Swarm Optimization (MOPSO): designed by Xiaobiao Huang and his team. The code is downloadable from GitHub: https://github.com/SPEAR3-ML/MOPSO.
• Ocelot Optimizer (Agapov et al. 2014): a platform for automated optimization of accelerator performance. It is an open-source project developed by physicists from the European XFEL, DESY and SLAC. The supported algorithms include the Nelder-Mead simplex method, Bayesian optimization with Gaussian
process, RCDS, and extremum seeking. The source code can be downloaded from GitHub: https://github.com/ocelot-collab/optimizer.
This list keeps growing as new accelerator optimization problems keep emerging and impose new requirements on the optimization tools.
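As a minimal sketch of how the operating-point cost (3.12) could be handed to the SciPy optimize module, the following Python example minimizes J with the Nelder-Mead simplex method. The response function G, its coefficients, the three-input/two-output dimensions, and the values of u0 and y_dest are illustrative assumptions, not SwissFEL quantities. On a real machine, evaluating G would mean setting the inputs and reading back the averaged beam measurements.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-in for the unknown response y = G(u).
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, -1.0]])

def G(u):
    """Toy, mildly nonlinear response with 3 inputs and 2 outputs."""
    return A @ u + 0.05 * np.array([u[0] ** 2, u[1] * u[2]])

y_dest = np.array([0.5, -0.3])   # desired output (illustrative values)
u0 = np.array([0.1, 0.1, 0.1])   # initial input (illustrative values)

def cost(u):
    """J = ||y_dest - G(u)||^2 with all weights equal to 1, as in (3.12)."""
    return float(np.sum((y_dest - G(u)) ** 2))

# Derivative-free minimization; only cost evaluations are required.
result = minimize(cost, u0, method="Nelder-Mead",
                  options={"xatol": 1e-8, "fatol": 1e-12, "maxiter": 5000})
u_star = result.x
print("u* =", u_star, " J(u*) =", result.fun)
```

Because this toy problem has more inputs than outputs, the minimizer is not unique, and a different starting point generally ends at a different input with the same outputs.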
3.4 Further Reading and Outlook
Many optimization algorithms have been developed to solve different problems. The book by Kochenderfer and Wheeler (2019) introduces many practical algorithms for the design of engineering systems. In this chapter, we discussed only several algorithms that are widely used in accelerator beam controls. Another method, based on extremum seeking, is also commonly adopted in accelerator facilities. We will not discuss it in this book due to the space limit. Interested readers can refer to the articles (Schuster et al. 2005; Fujii et al. 2021; Scheinker et al. 2019, 2022). Some other works on accelerator online optimization can be found in the articles (Tomin et al. 2017; Wu et al. 2017; Bergan 2020). In Chap. 4, we will use a neural-network surrogate model of the machine to accelerate the online beam optimization process. We will also introduce a machine-learning-based method, Bayesian optimization, as another online optimization algorithm.
We have demonstrated using optimization algorithms to maximize the beam performance during setup (e.g., maximizing the FEL pulse energy) and to change the beam operating points. There are many other use cases of optimization, such as
• Realize slow feedback by running optimization continuously. The drifts in an accelerator can be compensated for by running optimization continuously during the beam operation (Scheinker et al. 2020). In this use case, the optimizer should vary the system inputs only in small steps when searching for the optimum to avoid disturbing the nominal beam operation.
• Optimize the parameters of beam feedback controllers, e.g., the gains of a proportional-integral-derivative (PID) controller. The paper by DeBoon et al. (2019) discusses optimizing the controller parameters of the active disturbance rejection control (ADRC).
• Optimize the global machine operation strategy, such as minimizing the overall downtime of the RF system of a Linac, which needs to balance the power settings of different RF stations. This is a strategy optimization based on the data observed over a long period (e.g., several weeks or months).
Mature implementation is critical for applying optimization algorithms to the daily operation of accelerators. The optimizer should detect and handle exceptions explicitly for robust operation. It should allow selecting the inputs to be manipulated and changing the objectives during run time. An intelligent optimizer may automatically select proper inputs for adjustments from many possible ones. The inputs should be selected according to their impact on the objectives. We prefer using fewer inputs
in the optimizer to speed up convergence. For example, RCDS needs at least n iterations (n being the number of inputs) to find the minimum. Therefore, with a smaller n (i.e., fewer inputs that are still sufficient to reach the minimum), RCDS converges faster. Furthermore, we can reduce the number of inputs using a dimensionality reduction technique from machine learning, such as an autoencoder. We then manipulate the encoded inputs (i.e., the latent space vector, which has fewer variables than the original input vector) in the optimization, and the decoder determines the actual inputs. An optimizer should also be easy to adapt to different systems, and a general implementation framework helps (Zhang et al. 2021). To accelerate the optimization process, parallel computing with graphics processing units (GPUs) or field programmable gate arrays (FPGAs) should be considered whenever possible (Lalwani et al. 2019).
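The idea of optimizing in a reduced input space can be sketched as follows. The example is only an illustration under several assumptions: it uses a linear decoder obtained from principal component analysis as a simple stand-in for a trained autoencoder, a synthetic archive of input settings, a toy cost function, and a basic accept-if-better random search rather than RCDS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic archive (hypothetical): 200 past settings of 8 inputs that
# effectively vary along only 2 underlying directions, plus small noise.
directions = rng.normal(size=(2, 8))
archive = rng.normal(size=(200, 2)) @ directions + 0.01 * rng.normal(size=(200, 8))

# Linear decoder from PCA, a stand-in for the decoder of an autoencoder.
mean = archive.mean(axis=0)
_, _, vt = np.linalg.svd(archive - mean, full_matrices=False)
components = vt[:2]                      # 2 latent variables instead of 8 inputs

def decode(z):
    """Map a latent vector z (length 2) to the actual inputs u (length 8)."""
    return mean + z @ components

# Toy cost (hypothetical): distance of the inputs from a setting that is
# reachable through the decoder. A real cost would come from beam measurements.
u_good = decode(np.array([1.5, -0.8]))

def cost(u):
    return float(np.sum((u - u_good) ** 2))

# Accept-if-better random search over the 2 latent variables only.
z = np.zeros(2)
initial_cost = best_cost = cost(decode(z))
for _ in range(300):
    z_trial = z + 0.2 * rng.normal(size=2)
    trial_cost = cost(decode(z_trial))
    if trial_cost < best_cost:
        z, best_cost = z_trial, trial_cost

u_final = decode(z)                      # actual inputs determined by the decoder
print(f"cost: {initial_cost:.3f} -> {best_cost:.5f}")
```

The optimizer here only ever sees two variables, while all eight inputs are still set through the decoder. Whether a low-dimensional latent space is sufficient to reach the optimum must be checked for each system.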
References
I. Agapov, G. Geloni, S. Tomin et al., OCELOT: a software framework for synchrotron light source and FEL studies. Nucl. Instrum. Methods Phys. Res. A 768, 151–156 (2014). https://doi.org/10.1016/j.nima.2014.09.057
M. Aiba, M. Boege, N. Milas et al., Random walk optimization in accelerators: vertical emittance tuning at SLS, in Proceedings of IPAC2012 Conference, New Orleans, Louisiana, USA, 20–25 May 2012 (2012)
K. Ariyur, M. Krstic, Real-Time Optimization by Extremum-Seeking Control (Wiley, New York, 2004)
C. Audet, W. Hare, Derivative-Free and Blackbox Optimization (Springer, Cham, 2017)
C. Bao, L. Xu, E.D. Goodman et al., A novel non-dominated sorting algorithm for evolutionary multi-objective optimization. J. Comput. Sci. 23, 31–43 (2017). https://doi.org/10.1016/j.jocs.2017.09.015
W.F. Bergan, Dimension reduction for online optimization of particle accelerators. Ph.D. Thesis, Cornell University (2020)
S. Boyd, L. Vandenberghe, Convex Optimization (Cambridge University Press, Cambridge, 2004)
C.A. Coello, M.S. Lechuga, MOPSO: a proposal for multiple objective particle swarm optimization, in Proceedings of CEC’02 Conference, Honolulu, HI, USA, 12–17 May 2002 (2002)
K. Deb, A. Pratap, S. Agarwal et al., A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002). https://doi.org/10.1109/4235.996017
B. DeBoon, B. Kent, M. Lachi et al., Multi-objective gain optimizer for an active disturbance rejection controller, in Proceedings of the GlobalSIP2019 Conference, Ottawa, Canada, 11–14 Nov 2019 (2019)
J. Duris, D. Kennedy, A. Hanuka et al., Bayesian optimization of a free-electron laser. Phys. Rev. Lett. 124, 124801 (2020). https://doi.org/10.1103/PhysRevLett.124.124801
A.P. Engelbrecht, Computational Intelligence: An Introduction, 2nd edn. (Wiley, Atrium, 2007)
H. Fujii, A. Scheinker, A. Uchiyama et al., Extremum seeking control for the optimization of heavy ion beam transportation, in Proceedings of the 18th Annual Meeting of Particle Accelerator Society of Japan, QST-Takasaki, Japan, 9–12 Aug 2021 (2021)
G. Gaio, M. Lonza, Automatic FEL optimization at FERMI, in Proceedings of ICALEPCS2015 Conference, Melbourne, Australia, 17–23 Oct 2015 (2015)
X. Huang, Robust simplex algorithm for online optimization. Phys. Rev. Accel. Beams 21, 104601 (2018). https://doi.org/10.1103/PhysRevAccelBeams.21.104601
X. Huang, Beam-Based Correction and Optimization for Accelerators (CRC Press, Boca Raton, 2020)
X. Huang, J. Safranek, Online optimization of storage ring nonlinear beam dynamics. Phys. Rev. ST Accel. Beams 18, 084001 (2015). https://doi.org/10.1103/PhysRevSTAB.18.084001
X. Huang, J. Corbett, J. Safranek et al., An algorithm for online optimization of accelerators. Nucl. Instrum. Methods Phys. Res. A 726, 77–83 (2013). https://doi.org/10.1016/j.nima.2013.05.046
H. Ji, Y. Jiao, S. Wang et al., Feasibility study of online tuning of the luminosity in a circular collider with the robust conjugate direction search method. Chin. Phys. C 39(12), 127006 (2015). https://doi.org/10.1088/1674-1137/39/12/127006
J. Kirschner, M. Mutny, A. Krause et al., Tuning particle accelerators with safety constraints using Bayesian optimization. Phys. Rev. Accel. Beams 25, 062802 (2022). https://doi.org/10.1103/PhysRevAccelBeams.25.062802
M. Kochenderfer, T.A. Wheeler, Algorithms for Optimization (The MIT Press, Cambridge, 2019)
S. Lalwani, H. Sharma, S.C. Satapathy et al., A survey on parallel particle swarm optimization algorithms. Arab. J. Sci. Eng. 44, 2899–2923 (2019). https://doi.org/10.1007/s13369-018-03713-6
K.Y. Lee, M.A. El-Sharkawi (eds.), Modern Heuristic Optimization Techniques: Theory and Applications to Power Systems (Wiley-IEEE Press, Hoboken, 2008)
R.M. Lewis, V. Torczon, M.W. Trosset, Direct search methods: then and now. J. Comput. Appl. Math. 124(1–2), 191–207 (2000). https://doi.org/10.1016/S0377-0427(00)00423-4
J. Li, R.R. Rhinehart, Heuristic random optimization. Comput. Chem. Eng. 22(3), 427–444 (1998). https://doi.org/10.1016/S0098-1354(97)00005-7
S.M. Liuzzo, N. Carmignani, L. Farvacque et al., RCDS optimizations for the ESRF storage ring, in Proceedings of IPAC2016 Conference, Busan, Korea, 8–13 May 2016 (2016)
R. Marti, M. Resende, C. Ribeiro, Multi-start methods for combinatorial optimization. Eur. J. Oper. Res. 226(1), 1–8 (2013). https://doi.org/10.1016/j.ejor.2012.10.012
A. Scheinker, M. Krstic, Model-Free Stabilization by Extremum Seeking (Springer, Cham, 2017)
A. Scheinker, D. Bohler, S. Tomin et al., Model-independent tuning for maximizing free electron laser pulse energy. Phys. Rev. Accel. Beams 22, 082802 (2019). https://doi.org/10.1103/PhysRevAccelBeams.22.082802
A. Scheinker, S. Hirlaender, F.M. Velotti et al., Online multi-objective particle accelerator optimization of the AWAKE electron beam line for simultaneous emittance and orbit control. AIP Adv. 10, 055320 (2020). https://doi.org/10.1063/5.0003423
A. Scheinker, E. Huang, C. Taylor, Extremum seeking-based control system for particle accelerator beam loss minimization. IEEE Trans. Control Syst. Technol. 30(5), 2261–2268 (2022). https://doi.org/10.1109/TCST.2021.3136133
E. Schuster, C.K. Allen, M. Krstic, Optimized beam matching using extremum seeking, in Proceedings of 2005 Particle Accelerator Conference, Knoxville, Tennessee, 16–20 May 2005 (2005)
S. Tomin, G. Geloni, I. Agapov et al., On-line optimization of European XFEL with OCELOT, in Proceedings of ICALEPCS2017 Conference, Barcelona, Spain, 8–13 Oct 2017 (2017)
S. Tomin, G. Geloni, M. Scholz, FEL optimization: from model-free to model-dependent approaches and ML prospects. Paper presented in FEL2019 conference, Hamburg, Germany, 26–30 Aug 2019 (2019). https://accelconf.web.cern.ch/fel2019/talks/thd03_talk.pdf. Accessed 20 Aug 2022
J. Wu, K. Fang, X. Huang et al., Recent on-line taper optimization on LCLS, in Proceedings of FEL2017 Conference, Santa Fe, NM, USA, 20–25 Aug 2017 (2017)
M. Xi, W. Sun, J. Chen, Survey of derivative-free optimization. Numer. Algebra Control Optim. 10(4), 537–555 (2020). https://doi.org/10.3934/naco.2020050
Z. Zhang, X. Huang, M. Song, Teeport: break the wall between the optimization algorithms and problems. Front. Big Data 4, 734650 (2021). https://doi.org/10.3389/fdata.2021.734650
Chapter 4
Machine Learning for Beam Controls
Abstract Machine learning is a class of data-driven methodologies to identify system models, make predictions, or determine control actions. Machine learning methods are attractive since less domain knowledge is required when applying them to accelerator design and operation. This chapter presents an overview of the machine learning algorithms for accelerator beam controls. After a brief introduction, we discuss building surrogate models for accelerator subsystems and beam responses based on the input–output data. The neural network and Gaussian process regression models are emphasized. These surrogate models are beneficial for adapting beam feedback for different operating points, implementing feedforward control, and accelerating beam optimization processes. The concepts and algorithms of reinforcement learning are then introduced and used to solve linear quadratic Gaussian problems. In the end, we also summarize the machine learning applications in particle accelerators beyond beam controls.
4.1 Introduction to Machine Learning
4.1.1 Machine Learning Algorithms
Machine learning (Hastie et al. 2009; Murphy 2012; Géron 2019) is gaining popularity in almost all domains of science to deal with the problem of extracting patterns from data. The patterns are represented in the form of models used to solve different predictive tasks, such as forecasting, imputing missing data, detecting anomalies, classifying, ranking, decision making, etc. A machine learning algorithm represents the model structure and the method to obtain the model parameters. Machine learning is being increasingly used in particle accelerators. It has been successfully used for modeling the accelerator input–output relations, optimizing the accelerator design and operation, predicting/classifying the machine faults, etc. This subsection gives a brief overview of machine learning. From Sect. 4.2, we will focus on several machine learning algorithms applied successfully to beam controls (i.e., beam setup, optimization, or stabilization, see Chap. 1).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Z. Geng and S. Simrock, Intelligent Beam Control in Accelerators, Particle Acceleration and Detection, https://doi.org/10.1007/978-3-031-28597-4_4
Machine learning is a part of artificial intelligence (AI). Machine learning algorithms learn the internal principles and logical relations within the sampled data, known as training data, and build models to make predictions or decisions for new data without being explicitly programmed. For example, we can create a regression model for a radio frequency (RF) amplifier using the measurements of its input and output powers and then predict the output power for a new input with the model. Based on the functions of their derived models, machine learning algorithms can be categorized as follows:
• Regression: For a given set of input features (e.g., klystron input power and high voltage), predict the target numeric outputs (e.g., klystron output power). Typical regression algorithms include linear regression, polynomial regression, logistic regression, support vector regression, decision tree regression, random forest regression, Ridge regression and Lasso regression. Here “feature” means a parameter plus its value (e.g., “high voltage” = 200 kV).
• Classification: Predict the class label for a given set of input features. One example is to classify whether a superconducting cavity quenches or not based on the sampled waveforms of the forward, reflected, and cavity probe RF signals. Typical algorithms for classification include k-nearest neighbors, decision tree, random forest, support vector machine (SVM), naive Bayes, gradient boosting, and logistic regression. Logistic regression is often used for classification, with its output interpreted as a probability of belonging to a given class (e.g., 20% chance of quench).
• Clustering: Group instances of data into clusters containing similar characteristics. It can also be used to identify the hidden relationships in a dataset that are difficult to find by simple observations. For example, in a pulsed free electron laser (FEL) machine, the beam parameters are affected by the phases of the alternating-current (AC) mains if the machine repetition rate (e.g., 100 Hz) is higher than the AC frequency (e.g., 50 Hz). Clustering algorithms help find this dependency more efficiently. Typical clustering algorithms include k-means, DBSCAN, and hierarchical cluster analysis (HCA).
• Visualization and dimensionality reduction: Reduce high-dimensional data to 2D or 3D for display while preserving the primary structures in the data. Dimensionality reduction also removes the redundancy in the data (e.g., some components in an input vector may be correlated) and simplifies the machine learning models (which then handle fewer input features). Typical algorithms include principal component analysis (PCA), locally linear embedding (LLE), and t-distributed stochastic neighbor embedding (t-SNE).
• Anomaly detection: Detect abnormal or unusual observations in data, such as identifying abnormal FEL pulses to exclude them from experiment data analysis. Anomaly detection overlaps with the classification category if the training data is labeled (supervised learning). Typical anomaly detection algorithms based on data without labels (unsupervised learning) include self-organizing maps (SOMs), one-class SVM, and isolation forest.
• Association rule learning: Discover relations between data attributes by digging into large amounts of data. For example, running an association rule algorithm on the FEL data may reveal that the FEL pulse energy drops are possibly related to the RF Gun amplitude drifts. Typical association rule learning algorithms include the Apriori algorithm, equivalence class transformation (Eclat) algorithm, and frequent pattern (F-P) growth algorithm. For accelerator beam controls, these algorithms are potentially helpful in identifying the sources of beam quality degradation or failure.
The artificial neural network, or simply neural network (NN), is one of the most important structures of machine learning algorithms. It can be used to implement algorithms of all the above categories and is the basis of deep learning. Deep learning architectures such as deep neural networks, deep belief networks, deep reinforcement learning, recurrent neural networks, and convolutional neural networks have been widely used in fields such as computer vision, natural language processing, and medical image analysis. In particle accelerators, deep learning is also successfully used for virtual diagnostics, surrogate models, and beam controls. We will cover some of these topics in this chapter and provide references to related articles. The relations between AI, machine learning, and deep learning are depicted in Fig. 4.1.
Another criterion to categorize the machine learning algorithms is whether or not their models are trained with human supervision, resulting in the following categories:
(a) Supervised learning
The training data includes the input features and their desired outputs called labels. Learning aims to find a general model mapping the inputs to outputs. Most regression and classification algorithms fall into this category. The primary use cases of machine learning for beam controls are based on the surrogate models of the accelerator subsystems and beam responses. These surrogate models are identified with supervised learning algorithms since the input–output data of the accelerator are typically available. We discuss the identification and applications of surrogate models in Sects. 4.2 and 4.3.
Fig. 4.1 Relations between AI, machine learning and deep learning
[Fig. 4.1 diagram labels, shown as nested sets: Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning]
(b) Unsupervised learning
The training data has no labels for the input features, and the algorithm must discover the hidden patterns in the data independently. The clustering, visualization and dimensionality reduction, anomaly detection, and association rule learning algorithms belong to unsupervised learning. This book will not discuss unsupervised learning because it is less relevant for beam controls.
(c) Reinforcement learning
In reinforcement learning, the learning algorithm trains an agent by interacting with a dynamical environment. The agent contains a policy that determines optimal actions based on the state of the environment to maximize the returned cumulative rewards. Unlike supervised/unsupervised learning, reinforcement learning results in an optimal control policy instead of data models. We will give more details on reinforcement learning in Sect. 4.4.
There are other criteria to classify machine learning algorithms. Depending on whether or not they learn incrementally during operation, they can be categorized as online learning or batch learning algorithms. The primary goal of machine learning is to make predictions for new data (i.e., generalization). If a machine learning algorithm makes predictions by comparing new data points to known data points, it is called an instance-based learning algorithm. Otherwise, the algorithm detects patterns in the training data and builds a predictive model. It is then called a model-based learning algorithm.
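To make the RF amplifier regression example and the distinction between model-based and instance-based learning concrete, the following Python sketch predicts the output power for a new input in both ways. The saturation curve, the noise level, the units, and all numerical values are assumptions used only to generate synthetic training data; a cubic polynomial and a k-nearest-neighbors average are chosen here merely as simple representatives of the two families.

```python
import numpy as np

rng = np.random.default_rng(0)

def amplifier(p_in):
    """Hypothetical saturating amplifier curve, used only to generate data."""
    return 100.0 * np.tanh(0.15 * p_in)

# Supervised training data: input features with their labels (outputs).
p_in = np.linspace(0.0, 10.0, 60)                           # input power (W)
p_out = amplifier(p_in) + rng.normal(0.0, 1.0, p_in.size)   # output power (kW)

# Model-based learning: fit the parameters of a cubic polynomial once,
# then predict from the model without consulting the training data again.
coeffs = np.polyfit(p_in, p_out, deg=3)

def predict_model(p_new):
    return float(np.polyval(coeffs, p_new))

# Instance-based learning: keep the training data and average the labels
# of the k known points closest to the new input.
def predict_knn(p_new, k=5):
    nearest = np.argsort(np.abs(p_in - p_new))[:k]
    return float(np.mean(p_out[nearest]))

p_new = 4.2
print(predict_model(p_new), predict_knn(p_new), float(amplifier(p_new)))
```

Both predictors interpolate well inside the range covered by the training data. Neither should be trusted outside that range, which is why the training data must cover the operating points of interest.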
4.1.2 Machine Learning Models
Mathematical models are used in many scientific and engineering domains to approximate the properties and behaviors of the real world. When working on particle accelerators, physicists build physics models, control engineers identify control models, and here we introduce machine learning models for the accelerator subsystems and beam responses (called systems for simplicity). These models can describe the input–output relations of the systems but with different levels of detail and domain knowledge. Physics models are built based on the systems’ principles and are often called white-box models. They are typically described by linear or nonlinear differential or static equations. The parameters of these equations usually have clear physical meanings. For example, the physics model of an RF cavity is described with the following differential equations (Simrock and Geng 2022)

$$\begin{aligned}
&\dot{v}_C(t) + \left[\omega_{1/2} - j\Delta\omega(t)\right] v_C(t) = \omega_{1/2} R_L i_C(t), \\
&\Delta\ddot{\omega}_m(t) + \frac{\omega_m}{Q_m}\Delta\dot{\omega}_m(t) + \omega_m^2 \Delta\omega_m(t) = -K_m \omega_m^2 |v_C(t)|^2, \\
&\Delta\omega(t) = \sum_{m=1}^{M} \Delta\omega_m(t).
\end{aligned} \tag{4.1}$$
It describes the electrical model between the cavity drive current $i_C(t)$ and cavity voltage $v_C(t)$ and the mechanical model between the cavity voltage and detuning. The parameters in the equations have clear physical meanings. For example, $\omega_{1/2}$ is the half-bandwidth and $\Delta\omega(t)$ is the time-varying detuning of the cavity. The model parameters are typically given by the physics design or the lab measurements (e.g., measuring the cavity bandwidth with a network analyzer). In some cases, we also need to estimate the parameters using the input–output data measured from the system. This is a typical grey-box system identification problem since the model structure is known, and we only estimate the unknown parameters. Building or identifying physics models requires much domain knowledge and is usually done by domain experts.
Control models of the systems are often required to design feedback controllers. Here we are more interested in the input–output relations of a system than in its physical parameters. Therefore, the system can be viewed as a blackbox, and we use general functions to fit its behaviors mapping the inputs to the outputs. For the same RF cavity, we have

$$\dot{x}(t) = f(t, x(t), i_C(t)), \quad v_C(t) = h(t, x(t), i_C(t)), \tag{4.2}$$
where $x(t)$ is the state vector, and $f$ and $h$ are general functions. If $f$ and $h$ are linear functions, (4.2) reduces to a linear state-space equation (Dorf and Bishop 2010). This blackbox model can describe the system’s dynamical behaviors just as (4.1) does, but the physical meanings of $f$ and $h$ are not obvious. In control theory, identifying such blackbox dynamical models based on the input–output data is an important task, known as blackbox system identification (Ljung 1998). Typically, we identify linear control models around an operating point of a system, especially when the system is strongly nonlinear. In this case, the input–output data covers only a small operating range to keep the response approximately linear. Control models play an essential role in analyzing and designing feedback systems.
Machine learning models are far more general for describing a system than the physics or control models. A regression model can describe the system’s input–output relation including both static and dynamical behaviors. Moreover, as mentioned in Sect. 4.1.1, machine learning models are used in wider applications. For example, in addition to modeling the cavity’s dynamical behaviors with a neural network (e.g., Fig. 4.6b or a recurrent neural network), a classification model can detect the cavity quench. Machine learning interprets the natural world in a statistical way. It assumes the system’s inputs and outputs are random variables, and their appearance in pairs follows certain joint distributions. Therefore, machine learning models are statistical models representing the distributions P(y|x) (conditional probability distribution for supervised learning) or P(x) (for unsupervised learning). Here x denotes the input features to the machine learning algorithms, and y denotes the outputs (i.e., labels for supervised learning). Like the control models, the physical meanings of the machine learning models are not obvious. Machine learning models are often
used to predict the nonlinear responses of a system over a wide operating range. Therefore, the input–output training data should cover the wide range of operating points of interest to the users.
Identifying the control or machine learning models requires less domain knowledge than identifying the physics models. However, a deep understanding of the principles of the target system helps determine the model structure and evaluate the model performance. System identification and machine learning both use data but follow different methodologies. Many system identification methods interpret the data in the frequency domain (e.g., the empirical transfer function estimate (ETFE) method), and the most critical hyperparameter is the order of the dynamical system. In comparison, machine learning mainly focuses on finding the patterns and correlations in static data. As a common point, both system identification and machine learning adopt the parameter estimation methods of statistics, e.g., maximum likelihood estimation (MLE), and the parameter estimation is often converted to (convex) optimization problems.
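As a small illustration of a physics model and of grey-box identification, the following Python sketch integrates the electrical part of (4.1) for a constant drive current and a constant detuning, and then estimates the half-bandwidth from the simulated decay of the cavity voltage. The mechanical model is omitted, and the parameter values and the function name simulate_cavity are assumptions chosen for illustration, not values of a specific cavity.

```python
import cmath
import math

def simulate_cavity(i_c, w_half, detuning, r_l, dt, n_steps, v0=0j):
    """Cavity voltage for a constant drive current and constant detuning.

    Uses the exact solution of
        v' + (w_half - 1j*detuning) * v = w_half * r_l * i_c
    over each time step dt.
    """
    a = w_half - 1j * detuning
    v_inf = w_half * r_l * i_c / a          # steady-state voltage
    decay = cmath.exp(-a * dt)
    trace = [v0]
    for _ in range(n_steps):
        trace.append(v_inf + (trace[-1] - v_inf) * decay)
    return trace

# Hypothetical, normalized parameters.
w_half = 2 * math.pi * 200.0      # half-bandwidth (rad/s)
detuning = 2 * math.pi * 100.0    # constant detuning (rad/s)
r_l, i_c, dt = 1.0, 1.0, 1e-5

# Filling: the voltage approaches w_half * r_l * i_c / (w_half - 1j*detuning).
fill = simulate_cavity(i_c, w_half, detuning, r_l, dt, n_steps=2000)
v_steady = w_half * r_l * i_c / (w_half - 1j * detuning)

# Decay with the drive off: |v| falls as exp(-w_half * t). Since the model
# structure is known, only the parameter w_half is estimated from the data.
n_decay = 100
decay_trace = simulate_cavity(0.0, w_half, detuning, r_l, dt, n_decay, v0=fill[-1])
w_half_est = math.log(abs(decay_trace[0]) / abs(decay_trace[-1])) / (n_decay * dt)
print(abs(fill[-1] - v_steady), w_half_est / (2 * math.pi))
```

With measured rather than simulated data, the same estimate would be affected by noise and by a time-varying detuning, so a least-squares fit over the whole decay would usually be preferred to the two-point formula used here.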
4.1.3 Machine Learning Workflow
Building a machine learning model follows the typical steps shown in Fig. 4.2 (Géron 2019). We typically start by collecting and analyzing requirements to understand the goals, scope, and constraints of the task. The output of this step is a requirements specification document. An early decision should be made on whether to employ machine learning algorithms or other strategies based on physics models, (linear) control models, or modeless methods. For example, to detect quenches of superconducting cavities, we must decide whether to use machine learning algorithms (e.g., a neural-network classifier) or simply to calculate and check the cavity quality factor based on the physics model. If machine learning is chosen, we collect data from the system according to whether supervised, unsupervised, or reinforcement learning methods can potentially be used. For example, to predict the system output for a given input, supervised learning should be used, and every training data point must have a label.
For data preparation, the raw data needs to be cleaned up and transformed to fit the target model better. Then, we split the cleaned and transformed data set into a training set and a test set. The test set is put aside for testing the final model later, and we use the training set to train the machine learning model. Typically, we further split the training set into several subsets (also called folds), some of which are used to train the model, and the others (also called validation sets) are used to validate the trained model. The training-validation steps are often iterated to optimize the hyperparameters (e.g., the number of neural network layers, the polynomial order of a polynomial regression, etc.). Note that a hyperparameter is a parameter of the learning algorithm (not of the model). Typically, we train several models of different types with the same data set, because different machine learning algorithms may solve the same problem. We often try
Fig. 4.2 Machine learning model development workflow. Only the main steps are shown. The rounded boxes are action steps, the dotted arrows are control flows and the solid arrows are object flows
[Fig. 4.2 diagram labels. Action steps: Requirements analysis, Data collection, Data preparation, Training set splitting, Model selection, Model training, Hyperparameter tuning, Model validation, Final model test. Objects: Req. spec., Raw data, Training set, Test set, Training set folds, Model structure, Models & valid. sets, Final model]
several algorithms and select the best performing one. For example, for a regression task, we can train a linear regression model, a random forest regression model, and a neural-network regression model for comparison. For each model, we run the training-validation iterations to optimize the hyperparameters. Then, the best performing model is selected as the final model and re-trained with the entire training set (including the data in the validation sets). This final model is evaluated with the test set obtained in the earlier step. If the final model is not satisfactory, we may choose different model types or even repeat the procedure, starting by collecting new data that are more representative of the desired working scenarios of the model. For example, when predicting the klystron output power at a higher input power, we must update the training data to cover the extended operating points. Note that these iterations are not shown in Fig. 4.2.
If the final model is satisfactory, we deploy it (already trained) to production. Supervised and unsupervised learning produce models that make predictions on new inputs. In contrast, reinforcement learning yields policies to regulate the system outputs for setpoint tracking or disturbance rejection, just as feedback controllers do. After deployment, we need to maintain the model, such as observing its performance, updating the model with new data (either training a new model with batch learning or updating the existing one with online learning), or extending it with more functions. The deployment and maintenance steps are not depicted in Fig. 4.2.
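The splitting, training-validation, and final test steps described in this subsection can be sketched in Python as follows. The data set is synthetic, and the choices of a polynomial regression model, 5 folds, a 20% test set, and candidate orders 1 to 6 are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data set (hypothetical): one input feature, one noisy output.
x = rng.uniform(0.0, 3.0, size=100)
y = np.sin(x) + 0.05 * rng.normal(size=x.size)

# 1) Put a test set aside; it is used only once, for the final model.
idx = rng.permutation(x.size)
test_idx, train_idx = idx[:20], idx[20:]

# 2) Split the training set into k folds for the training-validation steps.
k = 5
folds = np.array_split(train_idx, k)

def validation_mse(order):
    """Mean validation error of a polynomial of the given order over k folds."""
    errors = []
    for i in range(k):
        val = folds[i]
        trn = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[trn], y[trn], order)
        errors.append(np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2))
    return float(np.mean(errors))

# 3) Hyperparameter tuning: choose the order with the lowest validation error.
scores = {order: validation_mse(order) for order in range(1, 7)}
best_order = min(scores, key=scores.get)

# 4) Re-train on the entire training set and evaluate once on the test set.
final_coeffs = np.polyfit(x[train_idx], y[train_idx], best_order)
test_mse = float(np.mean((np.polyval(final_coeffs, x[test_idx]) - y[test_idx]) ** 2))
print(f"best order = {best_order}, test MSE = {test_mse:.4f}")
```

The polynomial order plays the role of the hyperparameter, while the polynomial coefficients are the model parameters. The test set never influences the choice of the order, so the test error remains an unbiased check of the final model.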
4 Machine Learning for Beam Controls
4.1.4 Machine Learning Processes
We provide some details about the steps (i.e., processes) in Fig. 4.2 as follows:
(a) Data collection
The data required by machine learning can be collected from different sources, such as archived history, live systems, or simulations. Before the physical system is available, simulation based on physics models can provide the data needed to pre-study the machine learning algorithms. This helps determine the model structure or evaluate whether machine learning is feasible at all. The quantity and quality of data are critical for successful machine learning applications. First, many algorithms require a lot of data, so we must collect sufficient training data. Second, the training data must represent the new cases to which we want to generalize the model. For example, a regression model that predicts the klystron output power should be trained with sufficiently large ranges of inputs (i.e., input power and high voltage) so that the model is accurate at all possible operating points. Third, high-quality data with fewer errors, fewer outliers, and less noise are preferred so that the learning algorithm can better detect the underlying patterns. If the data quality is low, data cleaning is needed as described in the "data preparation" part. Finally, we should collect (or select) data relevant to the problem to be solved. The learning algorithms only work well if the training data contain enough relevant features and not many irrelevant ones. For example, for a regression model predicting the FEL wavelength, the input features should include the electron beam energy and undulator gaps; otherwise the model will hardly work.
(b) Data preparation
The raw data may not be optimal for model training. To enhance the learning efficiency and extract as much information as possible from the data, we often prepare the data with the following processes:
• Align the data in time. The data for the same event (e.g., an FEL pulse and the corresponding RF amplitudes and phases) should be collected in the same record (i.e., data point). This requires aligning the timestamps of the data read from different places of the machine.
• Clean up the data. First, we may discard the data points that are clearly outliers. Second, if some data points are corrupted with some features missing, we need to decide whether to ignore these data points or fill in the missing values. For example, suppose we want to train a regression model to predict the electron beam parameters (e.g., beam energy, bunch length, etc.) using the amplitudes and phases of multiple RF stations, and the data of one RF station were lost for a period due to a network problem. We may discard all data records from this period or fill in the missing data using the amplitudes and phases of this RF station before and after the network fault.
4.1 Introduction to Machine Learning
• Transform the data. We may discard irrelevant input features or combine multiple features to construct a new one with stronger correlations with the outputs. This can also be done automatically with the PCA algorithm, which extracts the principal components of the input features, resulting in a lower-dimensional input vector. Text or category features may need to be converted to numbers, as required by many machine learning algorithms. Mathematical transformations (e.g., logarithm, cosine, or sine) help simplify the model structure. For example, when building a regression model for an exponential response, a logarithm transformation of the output (i.e., label) enables the use of a simple linear regression model. Another important transformation is the normalization of the data, called feature scaling. Machine learning algorithms typically require the numerical input features to have similar scales for good performance. The discussions about scaling in Sects. 2.4.1 and 3.3.1 also apply to machine learning applications.
(c) Model selection and training
When developing a machine learning system, we may try several algorithms, compare their performance, and select one for deep tuning. Simpler algorithms are preferred if their performance is satisfactory. In accelerator controls, the neural network is a successful model structure, which is discussed in Sect. 4.2.1. Training a machine learning model means determining its model parameters based on the training data, which requires solving an optimization problem of minimizing a cost function (also called loss function). Typical cost functions for regression problems include the mean square error (MSE) and mean absolute error (MAE). For classification problems, the cross-entropy (also called negative log-likelihood) cost function is frequently used. There are many other cost functions used to train different machine learning models. We prefer constructing convex cost functions that can be minimized more efficiently with gradient-based optimization algorithms. As discussed in Chap. 3, solving non-convex problems requires derivative-free optimization algorithms that are much less efficient.
When selecting models to solve machine learning problems, we should avoid both overfitting and underfitting. Overfitting means the model performs well on the training data but does not generalize well, i.e., it performs poorly on the validation or test set. It happens when the model is too complex relative to the amount and quality of the training data. To mitigate overfitting, we may simplify the model (e.g., select a simpler model, reduce the number of input features, or constrain the model parameters with regularization), gather more training data, or improve the data quality (e.g., reduce the noise). Underfitting means the model performs poorly on both the training data and the validation or test set. It occurs when the model is too simple to learn the underlying patterns of the data. In this case, we need to choose a more powerful model, feed better features to the learning algorithm, or reduce the constraints on the model.
(d) Model validation and hyperparameter tuning
Model validation evaluates the performance of a trained model by applying it to the validation set. For example, to validate a regression model, we predict the outputs for the input features in the validation set and check the MSE between the predictions and the given labels. The K-fold cross-validation algorithm is widely used in practice. We randomly split the training set into K distinct folds. Then, we train and evaluate the model K times, each time picking a different fold for evaluation and training the model with the other folds. The K evaluation results give the score of the model as a mean value, together with a variance indicating the uncertainty of the score. Cross-validation trains the model multiple times, which could be difficult if the training is too time-consuming. The validation results are often the criteria to tune the hyperparameters, such as the regularization factor and the neural network structure (e.g., number of layers and number of neurons per layer). Hyperparameter tuning is an optimization problem aiming at minimizing the validation cost (i.e., loss). Machine learning software tools (e.g., scikit-learn) provide basic algorithms for hyperparameter tuning, such as line search, grid search, or randomized search. Bayesian optimization (see Sect. 4.3.5) can be used when we have multiple (but fewer than several tens of) hyperparameters to tune and the basic algorithms become infeasible. Furthermore, the optimization algorithms introduced in Chap. 3 can also be adopted here.
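The steps in (b) to (d) can be illustrated with a small sketch. It assumes synthetic data with an exponential response (all numbers are made up), applies the logarithm transformation to the label, scales the features, and tunes the regularization factor of a linear model by grid search with K-fold cross-validation. The helper names (`fit_ridge`, `kfold_mse`) are hypothetical, and NumPy is used instead of a machine learning library to keep every step visible.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Linear least squares with regularization factor lam (bias not penalized)."""
    A = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a bias column
    R = lam * np.eye(A.shape[1])
    R[0, 0] = 0.0
    return np.linalg.solve(A.T @ A + R, A.T @ y)

def predict(theta, X):
    return np.hstack([np.ones((X.shape[0], 1)), X]) @ theta

def kfold_mse(X, y, lam, K=5, seed=0):
    """Mean and variance of the validation MSE over K folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, K)
    scores = []
    for k in range(K):
        val = folds[k]
        trn = np.concatenate([folds[j] for j in range(K) if j != k])
        # Feature scaling: statistics are taken from the training folds only
        mu, sd = X[trn].mean(axis=0), X[trn].std(axis=0)
        theta = fit_ridge((X[trn] - mu) / sd, y[trn], lam)
        err = y[val] - predict(theta, (X[val] - mu) / sd)
        scores.append(np.mean(err ** 2))
    return np.mean(scores), np.var(scores)

# Synthetic (hypothetical) data: exponential response, inputs with very different scales
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 1000, 200)])
y = np.exp(1.0 + 2.0 * X[:, 0] + 0.001 * X[:, 1]) * np.exp(0.01 * rng.standard_normal(200))
y_log = np.log(y)   # logarithm transformation of the label makes the problem linear

# Grid search over the regularization factor (a hyperparameter)
grid = [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0]
results = {lam: kfold_mse(X, y_log, lam) for lam in grid}
best_lam = min(results, key=lambda lam: results[lam][0])
print(best_lam, results[best_lam])
```

In this sketch a too-large regularization factor leads to underfitting, which shows up as a larger validation MSE. Libraries such as scikit-learn offer ready-made equivalents of these steps.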
4.2 Accelerator Modeling with Machine Learning
The most successful applications of machine learning in accelerator controls are based on surrogate models of the machine. A surrogate model is a mathematical model describing the system input–output relations approximately. It can replace the actual system when we study the system behavior under different inputs and disturbances. Surrogate models are also beneficial for developing controllers or performing optimizations, especially when the actual system is not available (e.g., not constructed or in user operation). All three types of models introduced in Sect. 4.1.2 can serve as surrogate models. The physics models are often not precise compared to the actual responses. The control models are usually linear and only valid around a particular operating point. In comparison, machine learning models can model the large-range responses of the system accurately if trained with proper data. In beam controls, machine learning surrogate models are mainly used in the following cases:
1. Study the beam responses to new inputs. This is useful if we want to run the machine in a new mode never tried before. The surrogate model can predict the response, which helps find potential problems before applying the inputs to the actual machine.
2. Replace the invasive beam detectors. This is also called virtual diagnostics. Many accurate beam diagnostics are invasive to beam operations. One example is the longitudinal profile measurement of electron bunches after a bunch compressor with an RF deflector cavity and a screen. If we train a surrogate model to map the noninvasive measurements (e.g., RF amplitude and phase, gun solenoid current, bunch charge) to the longitudinal profile on the screen, the longitudinal profile can be predicted during normal operation when the RF deflector cavity is disabled (it must be enabled when collecting the training data).
3. Implement feedback/feedforward controllers with inverse surrogate models. As discussed in Chap. 2, static controllers are derived by inverting the beam response matrices. If we train an inverse surrogate model to map the system outputs to inputs, it can be used as the static controller shown in Fig. 2.3. With this inverse surrogate model, the beam setup also becomes easy because the required inputs for desired outputs can be calculated directly.
4. Accelerate the optimization process. Beam optimization requires evaluating a cost function many times. This is time-consuming and risks beam loss if the evaluation is done directly in the physical system. Optimizing a surrogate model yields a quasi-optimal solution that can be the starting point of online beam optimization applied to the physical system. This accelerates the overall optimization process. Furthermore, surrogate models allow using global optimization algorithms like the genetic algorithm (GA) or the particle swarm optimization (PSO) algorithm, which are more likely to find the global optima. This topic will be discussed in Sects. 4.3.3 and 4.3.5.
5. Estimate the beam response matrix at a new operating point. Linear beam feedback controllers must be re-configured at a different operating point because the beam response matrix might differ. This is due to the nonlinearity of the beam responses. With a surrogate model, the beam response matrix at the new operating point can be estimated and used to update the static controller. This is helpful when there is no time to re-measure the beam response matrix after changing the operating point. See Sect. 4.3.2 for details.
6. Implement feedforward controllers. We could also train a predictive model to map the disturbances (assumed measurable) to the beam errors. This model can be used to implement a feedforward controller to mitigate the effects of disturbances, as discussed in Sect. 4.3.4.
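Case 5 can be sketched as follows. The surrogate here is a hypothetical toy function (amplitude and phase mapped to two outputs), standing in for a trained model, and the response matrix is estimated by central finite differences, which is one possible approach and not necessarily the method of Sect. 4.3.2. The pseudo-inverse then plays the role of a static controller for small corrections.

```python
import numpy as np

def response_matrix(model, x0, dx=1e-4):
    """Estimate R[i, j] = d y_i / d x_j at the operating point x0 (central differences)."""
    x0 = np.asarray(x0, dtype=float)
    y0 = np.asarray(model(x0))
    R = np.zeros((y0.size, x0.size))
    for j in range(x0.size):
        step = np.zeros_like(x0)
        step[j] = dx
        R[:, j] = (np.asarray(model(x0 + step)) - np.asarray(model(x0 - step))) / (2 * dx)
    return R

def toy_surrogate(x):
    """Hypothetical nonlinear surrogate: inputs (amplitude, phase), two outputs."""
    a, phi = x
    return np.array([a * np.cos(phi), a * np.sin(phi)])

x_op = np.array([2.0, 0.3])              # new operating point (made-up values)
R = response_matrix(toy_surrogate, x_op)

# Small desired output change and the input correction from the inverted response matrix
dy = np.array([0.01, -0.02])
dx_corr = np.linalg.pinv(R) @ dy
y_new = toy_surrogate(x_op + dx_corr)
print(R)
print(y_new - toy_surrogate(x_op))       # close to dy for small corrections
```

Because the response is nonlinear, the estimated matrix is only valid near `x_op` and has to be re-evaluated after moving to another operating point.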
Surrogate models are regression models. This section introduces two successful machine learning structures for building surrogate models of accelerator subsystems and beam responses: the neural network (model-based learning) and the Gaussian process (instance-based learning). We introduce their basic concepts and architectures. In the next section, we apply the surrogate models to accelerator beam control and optimization, mainly covering cases 4–6 above.
4.2.1 Neural Network Regression Model
Neural networks are important building blocks of modern machine learning. Their applications can be found in all aspects of machine learning, especially after deep learning gained popularity in the 2010s. In this subsection, we use neural networks to model the input–output relations of the accelerator subsystems and beam responses. These input–output relations are typically nonlinear but smooth, i.e., small input changes only produce small output deviations. Neural networks are attractive for modeling such smooth nonlinear functions. Used in this way, the neural network models are regression models. Typically, we train the neural network regression model using the data measured from the physical system and then use it to predict the system outputs for new inputs.
4.2.1.1 Introduction to Neural Networks
First, let us briefly review neural networks and their working principles. The basic elements of neural networks are neurons, as depicted in Fig. 4.3. A neuron accepts multiple inputs, $x_1, x_2, \ldots, x_n$, and produces an output with an activation function applied to the biased linear combination of the inputs. The weighting parameters $\theta_1, \theta_2, \ldots, \theta_n$ and the bias $\theta_0$ are determined with the training data. Activation functions are typically nonlinear, allowing the network to fit nonlinear systems. Figure 4.4 shows several widely used activation functions given by
$$
\mathrm{sigmoid}(u) = \frac{1}{1 + e^{-u}}, \quad \mathrm{linear}(u) = u, \quad \tanh(u) = \frac{e^{u} - e^{-u}}{e^{u} + e^{-u}}. \tag{4.3}
$$
Fig. 4.3 Model of a neuron
Fig. 4.4 Example of typical activation functions (sigmoid, linear, and tanh; output y versus input u)
Interconnecting multiple neurons and organizing them in layers results in a neural network. See Fig. 4.5 for an example. We often define an input layer that only passes the inputs to the hidden layers without computations. A hidden layer performs intermediate calculations and passes the results to the next hidden layer or the output layer. The output layer produces the final outputs. The numbers of neurons in the input and output layers should be consistent with the physical system to be modeled. To achieve good performance, we should choose a proper number of hidden layers and a proper number of neurons in each hidden layer. These two numbers are hyperparameters of a neural network. If we apply deep learning, the layer types (e.g., fully connected layer, convolution layer, recurrent layer, etc.) should also be determined. Here we limit our discussions to ordinary neural networks (i.e., multilayer perceptron, MLP).
When using neural networks as regression models, such as those for accelerator subsystems and beam responses, the hidden-layer neurons can use either the sigmoid or the tanh (hyperbolic tangent) activation function. They introduce nonlinearity in the network to model the nonlinear physical systems. A regression network often employs the linear activation function in the output layer. In principle, neural networks can fit any smooth (nonlinear) function given the flexibility to choose arbitrary numbers of hidden layers and neurons per layer (Hornik 1991).
Let us look at the example in Fig. 4.5. Given an input $x$ and the parameters $\boldsymbol{\theta} = [\theta_0\ \theta_1\ \ldots\ \theta_6]^T$, the outputs (including intermediate results) are
$$
\begin{aligned}
u_{11} &= \theta_0 + \theta_1 x, \quad z_{11} = \frac{1}{1 + e^{-u_{11}}}, \quad u_{12} = \theta_2 + \theta_3 x, \quad z_{12} = \frac{1}{1 + e^{-u_{12}}}, \\
u_{21} &= \theta_4 + \theta_5 z_{11} + \theta_6 z_{12}, \quad \hat{y} = u_{21}.
\end{aligned} \tag{4.4}
$$
Fig. 4.5 A simple example neural network. It models a single-input (x) single-output (y) system. There is a hidden layer containing 2 neurons with the sigmoid activation function. The linear activation function is used in the output-layer neuron
Fig. 4.6 Neural network models for a static systems and b dynamical systems. The first part of the subscript represents the index of inputs and outputs, and the second part is the time index
[Fig. 4.6 diagram labels. Panel a: inputs $x_{1,k}, x_{2,k}, \ldots, x_{n,k}$ feed the neural network, which produces the outputs $y_{1,k}, y_{2,k}, \ldots, y_{m,k}$. Panel b: inputs $x_{i,k}, x_{i,k-1}, \ldots, x_{i,k-l}$ for $i = 1, \ldots, n$ feed the neural network, which produces the outputs $y_{1,k}, y_{2,k}, \ldots, y_{m,k}$.]
The effects of each parameter are summarized as follows:
• $\theta_0$ and $\theta_2$ shift the sigmoid function waveforms of $z_{11}$ and $z_{12}$ along the x-axis.
• $\theta_1$ and $\theta_3$ scale the x-axis of the sigmoid function waveforms of $z_{11}$ and $z_{12}$.
• $\theta_4$ is the mean value of the output.
• $\theta_5$ and $\theta_6$ scale the magnitudes of the waveforms of $z_{11}$ and $z_{12}$.
Therefore, the neural network output is the sum of two manipulated (shifted in x, scaled in x and magnitude) sigmoid functions and an offset. If the hidden layer includes many neurons, the output will be the sum of many manipulated sigmoid functions, making it possible for the network to fit complex nonlinear functions.
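As a minimal sketch of (4.4), the network of Fig. 4.5 can be evaluated directly. The parameter values below are arbitrary and chosen only so that the two sigmoid functions are shifted to different places along the x-axis. The resulting output is then a two-step staircase.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, theta):
    """Evaluate (4.4): returns y_hat and the hidden-layer outputs z11, z12."""
    t0, t1, t2, t3, t4, t5, t6 = theta
    z11 = sigmoid(t0 + t1 * x)          # first hidden neuron
    z12 = sigmoid(t2 + t3 * x)          # second hidden neuron
    y_hat = t4 + t5 * z11 + t6 * z12    # linear output neuron
    return y_hat, z11, z12

# Arbitrary illustrative parameters (not from any measurement):
# z11 switches around x = -theta0/theta1 = -2, z12 around x = -theta2/theta3 = +2
theta = np.array([10.0, 5.0, -10.0, 5.0, 0.0, 1.0, 1.0])
x = np.linspace(-6.0, 6.0, 121)
y_hat, z11, z12 = forward(x, theta)
print(y_hat[0], y_hat[60], y_hat[-1])   # approximately 0, 1, and 2
```

Changing `theta[0]` or `theta[2]` moves the steps along x, `theta[1]` and `theta[3]` change their steepness, and `theta[5]` and `theta[6]` change their heights, in line with the list above.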
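4.2.1.2 Training of Neural Networks
Many algorithms have been developed to train neural networks. Here we introduce the traditional back propagation method. To determine the unknown parameters $\boldsymbol{\theta} = [\theta_0\ \theta_1\ \ldots]^T$ in a predefined network structure, we collect training data from the physical system. The training data is a set of input–output pairs $\{x_i, y_i,\ i = 1, 2, \ldots, N\}$, where $N$ is the number of points in the set. Note that we limit our formulas to single-input single-output (SISO) neural networks to highlight the basic concepts and avoid getting lost in mathematics. For regression problems, the training goal is to find $\boldsymbol{\theta}^*$ that minimizes the MSE cost:
$$
J(\boldsymbol{\theta}) = \frac{1}{N} \sum_{i=1}^{N} \left[ y_i - \hat{y}_i(x_i, \boldsymbol{\theta}) \right]^2, \tag{4.5}
$$
where $\hat{y}_i$ is the network output for $x_i$ with a given $\boldsymbol{\theta}$. The optimal $\boldsymbol{\theta}^*$ can be determined using the gradient-descent algorithm
$$
\boldsymbol{\theta}_{j+1} = \boldsymbol{\theta}_j - \alpha \left. \nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) \right|_{\boldsymbol{\theta} = \boldsymbol{\theta}_j}, \tag{4.6}
$$
where $j = 0, 1, 2, \ldots$ is the iteration index, $\alpha$ is a positive number controlling the convergence speed, and $\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta})$ is the cost function gradient with respect to $\boldsymbol{\theta}$. Typically, we initialize the parameter vector $\boldsymbol{\theta}_0$ (the value of $\boldsymbol{\theta}$ at iteration $j = 0$) as a non-zero random vector. Since the network structure is known, the gradient can be calculated analytically as
$$
\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \left[ \partial/\partial\theta_0 \ \ \partial/\partial\theta_1 \ \cdots \right]^T J(\boldsymbol{\theta}) = -\frac{2}{N} \sum_{i=1}^{N} \left[ y_i - \hat{y}_i(x_i, \boldsymbol{\theta}) \right] \nabla_{\boldsymbol{\theta}} \hat{y}_i(x_i, \boldsymbol{\theta}). \tag{4.7}
$$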
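The gradient of $\hat{y}_i(x_i, \boldsymbol{\theta})$ can be calculated using the intermediate results. For example, the derivatives of $\hat{y}$ with respect to $\theta_5$ and $\theta_3$ for Fig. 4.5 can be written as
$$
\frac{\partial \hat{y}}{\partial \theta_5} = \frac{\partial \hat{y}}{\partial u_{21}} \frac{\partial u_{21}}{\partial \theta_5} = z_{11}, \quad
\frac{\partial \hat{y}}{\partial \theta_3} = \frac{\partial \hat{y}}{\partial u_{21}} \frac{\partial u_{21}}{\partial z_{12}} \frac{\partial z_{12}}{\partial u_{12}} \frac{\partial u_{12}}{\partial \theta_3} = \theta_6 z_{12} (1 - z_{12}) x. \tag{4.8}
$$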
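The factor $z_{12}(1 - z_{12})$ in (4.8) follows from the derivative of the sigmoid function in (4.3). As a short supporting step, with $z = \mathrm{sigmoid}(u)$:

$$
\frac{\mathrm{d}z}{\mathrm{d}u} = \frac{e^{-u}}{\left(1 + e^{-u}\right)^2} = \frac{1}{1 + e^{-u}} \cdot \frac{e^{-u}}{1 + e^{-u}} = z(1 - z).
$$

Combining this with $\partial \hat{y}/\partial u_{21} = 1$, $\partial u_{21}/\partial z_{12} = \theta_6$, and $\partial u_{12}/\partial \theta_3 = x$ from (4.4) gives the second expression in (4.8).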
The network training consists of several steps. First, we calculate the network outputs and intermediate results with the existing $\boldsymbol{\theta}$, as in (4.4). Then, we compute the gradient using (4.8) and (4.7). Finally, $\boldsymbol{\theta}$ is updated according to (4.6). Since the derivative propagates from the outputs back to the inputs, this algorithm is called back propagation. Many software packages, like MATLAB, TensorFlow, PyTorch, and scikit-learn, have already implemented mature algorithms to train neural networks.
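The three steps can be sketched for the network of Fig. 4.5 as follows. The training data (a tanh curve), the step size, and the number of iterations are assumptions made for illustration only, and the plain gradient-descent loop is far simpler than the training algorithms in the software packages mentioned above.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, th):
    """Network of Fig. 4.5, Eq. (4.4); returns y_hat and the hidden outputs."""
    z11 = sigmoid(th[0] + th[1] * x)
    z12 = sigmoid(th[2] + th[3] * x)
    return th[4] + th[5] * z11 + th[6] * z12, z11, z12

def cost(x, y, th):
    return np.mean((y - forward(x, th)[0]) ** 2)            # MSE cost (4.5)

def gradient(x, y, th):
    y_hat, z11, z12 = forward(x, th)
    d1 = th[5] * z11 * (1.0 - z11)     # d y_hat / d u11
    d2 = th[6] * z12 * (1.0 - z12)     # d y_hat / d u12
    # Rows: d y_hat / d theta_0 ... d theta_6 at every training point, cf. (4.8)
    dy = np.stack([d1, d1 * x, d2, d2 * x, np.ones_like(x), z11, z12])
    return -2.0 / x.size * dy @ (y - y_hat)                 # gradient (4.7)

# Hypothetical training data: a smooth nonlinear SISO response
x = np.linspace(-1.0, 1.0, 41)
y = np.tanh(2.0 * x)

theta0 = np.random.default_rng(0).normal(size=7)   # non-zero random initial vector
theta = theta0.copy()
alpha = 0.05                                       # assumed step size
for j in range(5000):
    theta = theta - alpha * gradient(x, y, theta)  # update (4.6)

J0, J_final = cost(x, y, theta0), cost(x, y, theta)
print(J0, J_final)
```

A useful sanity check for hand-written gradients is to compare them with finite differences of the cost function before starting the iterations.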
4.2.1.3 System Modeling with Neural Networks
When using neural networks to model static systems, whose outputs are only determined by the instant inputs, the structure in Fig. 4.6a is used. In contrast, many systems are dynamical: their outputs are also affected by historical inputs. We adopt the structure in Fig. 4.6b to model dynamical systems. The outputs at time $k$ are determined by both the instant inputs $x_{i,k}$ ($i = 1, \ldots, n$) and the previous $l$ historical inputs. Here we model the system as a discrete system with $n$ inputs and $m$ outputs. Figure 4.6b models the system as a finite impulse response (FIR) filter (Simrock and Geng 2022). The input sequence length $l$ should be long enough to cover the significant points in the system's impulse response.
We demonstrate applying neural networks to model the input–output relations of a klystron, as shown in Fig. 4.7. The RF controller stores two reference waveforms that define the in-phase (I) and quadrature (Q) components of the RF pulse. The reference waveforms are scaled by a drive factor and output via two digital-to-analog converters (DACs), modulating the RF reference signal with an RF actuator. The output is used to drive the klystron. We built two neural network models for the following purposes:
Fig. 4.7 A klystron driven by DACs and an RF actuator
(a) Predict klystron output amplitude
This corresponds to modeling a static system. Here we keep the DAC reference waveforms unchanged. The model inputs are the DAC drive factor and the modulator high voltage (HV), and the output is the klystron output amplitude averaged within the RF pulse. This model has one hidden layer with 18 neurons. The training and test results are shown in Fig. 4.8. The model predictions (data points marked with circles) agree well with the measurements. This regression model can predict the klystron output amplitude for given values of the HV and DAC drive factor.
Fig. 4.8 Neural network model for predicting klystron output amplitude. Points with “*” are training data and those with “o” are test results predicted by the model
(b) Predict klystron output waveforms
Here we model the dynamical behavior of the klystron. The model inputs are the DAC reference waveforms, drive factor, HV, and the time index within an RF pulse. We include the time index in the inputs to model the klystron's time-dependent gain caused by the intra-pulse HV ripples. The HV, DAC drive factor, and time index are static inputs to the neural network model, while the DAC reference waveforms are stacked as in Fig. 4.6b with $l = 20$. This model aims to predict the klystron output waveforms for given inputs. The model has one hidden layer with 40 neurons. The prediction of a test pulse is shown in Fig. 4.9. The left plot shows the DAC reference waveforms. The predicted amplitude and phase waveforms of the klystron output agree well with the measurements. The predictions become poor at the falling edge because the training data does not cover this part of the pulses. We will revisit the topic of neural network modeling in Sect. 4.3, applying it to beam control and optimization.
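The input stacking of Fig. 4.6b can be sketched as below for a single input signal. The helper `stack_history` and the FIR test system are hypothetical. To keep the example short and verifiable, a linear least-squares fit stands in for the neural network: in the klystron model above, the stacked rows would instead be fed to the network together with the static inputs.

```python
import numpy as np

def stack_history(x, l):
    """Rows are [x_k, x_{k-1}, ..., x_{k-l}] for k = l, ..., len(x) - 1."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x[l - d: len(x) - d] for d in range(l + 1)])

# Hypothetical dynamical system: FIR filter with a three-tap impulse response
h = np.array([0.5, 0.3, 0.2])
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = np.convolve(x, h)[:len(x)]      # y_k = sum_d h[d] * x_{k-d}

l = 2                                # history length covering the impulse response
X = stack_history(x, l)              # one row of stacked inputs per time index k
y_k = y[l:]                          # matching outputs

# Linear stand-in for the network: recovers the impulse response from the stacked data
h_est, *_ = np.linalg.lstsq(X, y_k, rcond=None)
print(X.shape, h_est)
```

If `l` is chosen shorter than the significant part of the impulse response, the stacked inputs no longer determine the output, and no model trained on them can reproduce it accurately.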
Fig. 4.9 Neural network model for predicting klystron output waveforms
4.2.2 Gaussian Process Regression Model
A neural network is a parametric model with many parameters determined by the training process. After training, the training data is discarded, and the model can predict the system outputs for new inputs. To adapt the model to drifts of the system responses, we must collect new data and train the neural network again. The training may be time-consuming for complex models, and we may have to interrupt the system operation to collect data. This subsection discusses the Gaussian process (GP) regression models (Murphy 2012; Schulz et al. 2018), which are also very efficient in modeling the accelerator subsystems and beam responses. GP models assume that the system outputs follow a multivariate Gaussian distribution. For new inputs, the outputs are predicted with a posterior distribution derived from the training data. GP models have the following features:
• GP models are non-parametric models. They make predictions using the training data directly. Therefore, a GP model can easily include newly observed data (input–output pairs) and adapt to the system drifts. This makes GP models attractive for accelerator beam controls since their primary goal is to suppress the drifts in the accelerator.
• Since the prediction result is a probability distribution, GP models can predict both the mean value and the variance (i.e., uncertainty) of the output for a given input.
4.2.2.1 Multivariate Gaussian Distribution
Let us first briefly review the multivariate Gaussian distribution (also known as the multivariate normal distribution, MVN). An n-dimensional random vector $\mathbf{x} = [x_1\ x_2\ \cdots\ x_n]^T \in \mathbb{R}^n$ consists of $n$ random variables. Each sample of $\mathbf{x}$ is a random point in the n-dimensional real space. If $\mathbf{x}$ follows a multivariate Gaussian distribution, then the joint probability density function (PDF) of the components of $\mathbf{x}$ can be written as
$$
p(\mathbf{x}) = p(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2} |\boldsymbol{\Sigma}|^{1/2}} e^{-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})}. \tag{4.9}
$$
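The PDF can be integrated over volumes of $\mathbb{R}^n$ to assign probabilities to the events that the value of $\mathbf{x}$ falls into these volumes. The vector $\boldsymbol{\mu} \in \mathbb{R}^n$ is the mean value of $\mathbf{x}$, and the positive definite symmetric matrix $\boldsymbol{\Sigma} \in \mathbb{R}^{n \times n}$ is the covariance matrix of $\mathbf{x}$. They are given by
$$
\boldsymbol{\mu} = E[\mathbf{x}], \quad \Sigma_{ij} = E\left[ (x_i - \mu_i)(x_j - \mu_j) \right], \quad i, j = 1, 2, \ldots, n, \tag{4.10}
$$
where $E[\cdot]$ represents the mathematical expectation, $\Sigma_{ij}$ is the element of $\boldsymbol{\Sigma}$ at the ith row and jth column, and $\mu_i$ and $\mu_j$ are components of $\boldsymbol{\mu}$. The diagonal element $\Sigma_{ii}$ is the variance of $x_i$, whereas $\Sigma_{ij}$ is the covariance of $x_i$ and $x_j$. Covariance characterizes the correlation of two random variables: stronger correlation results in a larger covariance magnitude. Specifically, $\Sigma_{ij} = 0$ if $x_i$ and $x_j$ are uncorrelated. To shorten the writing, we also denote the MVN distribution as
$$
\mathbf{x} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \quad \text{or} \quad p(\mathbf{x}) = N(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \tag{4.11}
$$
describing the random vector $\mathbf{x}$ with its mean value and covariance matrix.
Given a joint PDF (4.9), we are interested in the joint distribution of some components of $\mathbf{x}$ if the others are known. That is, we split the vector $\mathbf{x}$ into two sub-vectors, $\mathbf{x} = [\mathbf{x}_1^T\ \mathbf{x}_2^T]^T$ with $\mathbf{x}_1 \in \mathbb{R}^p$ and $\mathbf{x}_2 \in \mathbb{R}^{n-p}$ ($1 \le p < n$). If we know the value of $\mathbf{x}_1$, then what is the PDF of $\mathbf{x}_2$? The given joint PDF (4.9) is called a prior PDF since it is derived from experience or theory and not yet confirmed by observations. After observing (i.e., measuring) the values of the random variables in $\mathbf{x}_1$, we gain some insights into $\mathbf{x}$. With this new information, we should be able to update our knowledge about the distribution of $\mathbf{x}_2$. The PDF of $\mathbf{x}_2$ under such conditions is called a posterior PDF, which is a conditional PDF. Theorem 4.1 gives formulas to calculate the posterior PDF. We will not prove it; interested readers can refer to the book by Murphy (2012).
Theorem 4.1 Suppose
$$
\mathbf{x} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}, \begin{bmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{bmatrix} \right),
$$
where $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ are the mean vectors of $\mathbf{x}_1$ and $\mathbf{x}_2$, and $\boldsymbol{\Sigma}_{11}$, $\boldsymbol{\Sigma}_{12}$, $\boldsymbol{\Sigma}_{21}$, $\boldsymbol{\Sigma}_{22}$ are blocks of $\boldsymbol{\Sigma}$. The marginal PDFs are given by
$$
p_{\mathbf{x}_1}(\mathbf{x}_1) = N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}), \quad p_{\mathbf{x}_2}(\mathbf{x}_2) = N(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22}), \tag{4.12}
$$
and the conditional PDFs are given by
$$
\begin{aligned}
p_{\mathbf{x}_2|\mathbf{x}_1}(\mathbf{x}_2|\mathbf{x}_1) &= N\left( \boldsymbol{\mu}_2 + \boldsymbol{\Sigma}_{21} \boldsymbol{\Sigma}_{11}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_1),\ \boldsymbol{\Sigma}_{22} - \boldsymbol{\Sigma}_{21} \boldsymbol{\Sigma}_{11}^{-1} \boldsymbol{\Sigma}_{12} \right), \\
p_{\mathbf{x}_1|\mathbf{x}_2}(\mathbf{x}_1|\mathbf{x}_2) &= N\left( \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12} \boldsymbol{\Sigma}_{22}^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2),\ \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12} \boldsymbol{\Sigma}_{22}^{-1} \boldsymbol{\Sigma}_{21} \right).
\end{aligned} \tag{4.13}
$$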
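As a numerical illustration of Theorem 4.1, the sketch below evaluates the first equation of (4.13) for the two-variable example of Fig. 4.10 (mean value [1 0]ᵀ, covariance matrix [1 0.3; 0.3 0.25], observation x1 = 2.5). The function name `condition_gaussian` is hypothetical.

```python
import numpy as np

def condition_gaussian(mu, Sigma, p, x1):
    """Posterior mean and covariance of x2 given the first p components x1, per (4.13)."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    mu1, mu2 = mu[:p], mu[p:]
    S11, S12 = Sigma[:p, :p], Sigma[:p, p:]
    S21, S22 = Sigma[p:, :p], Sigma[p:, p:]
    # Solve linear systems instead of forming the inverse of S11 explicitly
    mean = mu2 + S21 @ np.linalg.solve(S11, np.atleast_1d(x1) - mu1)
    cov = S22 - S21 @ np.linalg.solve(S11, S12)
    return mean, cov

mean, cov = condition_gaussian([1.0, 0.0], [[1.0, 0.3], [0.3, 0.25]], p=1, x1=2.5)
print(mean, cov)   # mean 0.45, variance 0.16
```

Observing x1 = 2.5 shifts the mean of x2 from 0 to 0.45 and reduces its variance from 0.25 to 0.16, which is the improvement in knowledge that the theorem describes for correlated variables.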
Fig. 4.10 An example of multivariate Gaussian distribution. a Joint PDF; b marginal PDFs of x1 and x2; c posterior PDF of x2 for a given x1 = 2.5
Equation (4.13) implies that if $\mathbf{x}_1$ and $\mathbf{x}_2$ are correlated (i.e., $\boldsymbol{\Sigma}_{12} \neq 0$, $\boldsymbol{\Sigma}_{21} \neq 0$), observing one of them can improve the knowledge of the other. Theorem 4.1 is the theoretical basis of GP regression. Here we give a simple example. Assume $\mathbf{x} = [x_1\ x_2]^T$ has two random variables. Its mean value is $\boldsymbol{\mu} = [1\ 0]^T$ and its covariance matrix is $\boldsymbol{\Sigma} = [1\ 0.3;\ 0.3\ 0.25]$. Equations (4.12) and (4.13) can be used to calculate the marginal and conditional PDFs of $x_1$ and $x_2$. Here we split $\mathbf{x}$ into two sub-vectors, each containing one variable. The joint PDF of $\mathbf{x}$ is illustrated in Fig. 4.10a, and the marginal PDFs of $x_1$ and $x_2$ are depicted in Fig. 4.10b. Let us use $x_1$ as an example. Its marginal PDF is computed with the formula $p_{x_1}(x_1) = \int_{-\infty}^{\infty} p(x_1, x_2)\,\mathrm{d}x_2$, which is the probability density of $x_1$ with $x_2$ unrestricted. An example of the conditional PDF of $x_2$ (given $x_1 = 2.5$) can be found in Fig. 4.10c. Its mean value and variance are calculated with the first equation of (4.13). The conditional PDF is obtained from the joint PDF in the following way: we cut the 3D surface in Fig. 4.10a with a vertical plane parallel to the $x_2$-axis at $x_1 = 2.5$. The intersection curve between the vertical plane and the joint PDF surface has the shape of a 1D Gaussian PDF. The intersection curve is then normalized by the marginal probability density of $x_1$ at 2.5, resulting in the conditional PDF. This is a geometric explanation. The conditional PDF is actually calculated with the formula $p_{x_2|x_1}(x_2|2.5) = \left. p(x_1, x_2) \right|_{x_1 = 2.5} / p_{x_1}(2.5)$.
4.2.2.2 Gaussian Process Regression
A Gaussian process regression problem is formulated as follows: given a set of training data $D = \{(\mathbf{x}_i, y_i),\ i = 1, 2, \ldots, N\}$, where $\mathbf{x}_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$, predict the output $y_{N+1}$ for a new input vector $\mathbf{x}_{N+1}$. Here we limit our discussions to a multiple-input single-output (MISO) system. To make predictions, one can observe the distances between $\mathbf{x}_{N+1}$ and the input vectors in $D$. The output $y_{N+1}$ should be similar to $y_i$ if $\mathbf{x}_{N+1}$ is close to $\mathbf{x}_i$. This principle is based on the belief that the system response is smooth and similar input vectors produce similar output values. It is valid for most accelerator subsystems and beam responses.
In GP regression models, we stack the system output values for different input vectors into a random vector, which is assumed to follow an MVN distribution. As mentioned above, two closer input vectors result in a stronger correlation between their output values. Therefore, we can construct the covariance matrix of the output random vector according to the distances between the corresponding input vectors. Consider first the training data $D$. We stack the output values into a vector and the input vectors as columns into a matrix:
$$
\mathbf{y} = [y_1\ y_2\ \cdots\ y_N]^T, \quad \mathbf{X} = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_N]. \tag{4.14}
$$
The output vector $\mathbf{y}$ follows an MVN distribution given by
$$
\mathbf{y} \sim N(\mathbf{0}, \mathbf{K}), \quad \text{where} \quad \mathbf{K} = k(\mathbf{X}, \mathbf{X}) = \begin{bmatrix} k(\mathbf{x}_1, \mathbf{x}_1) & \cdots & k(\mathbf{x}_1, \mathbf{x}_N) \\ \vdots & \ddots & \vdots \\ k(\mathbf{x}_N, \mathbf{x}_1) & \cdots & k(\mathbf{x}_N, \mathbf{x}_N) \end{bmatrix}, \tag{4.15}
$$
where $\mathbf{K}$ is a positive definite symmetric matrix called the kernel matrix. The elements of $\mathbf{K}$ are defined by a kernel function related to the distance between the corresponding input vectors. The choice of kernel function is flexible depending on the system to be modeled. An effective kernel function should return a larger value for closer input vectors, representing a stronger correlation between their output values. The widely used radial-basis function (RBF) kernel is given by
$$
k(\mathbf{x}_i, \mathbf{x}_j) = \sigma_y^2 e^{-\frac{1}{2l^2} (\mathbf{x}_i - \mathbf{x}_j)^T (\mathbf{x}_i - \mathbf{x}_j)}. \tag{4.16}
$$
Here $\sigma_y^2$ is the variance of the output $y$ and $l$ is the length scale. Both are hyperparameters for tuning the performance of the GP regression model. A smaller $l$ makes the kernel function decay faster with increasing distance between $\mathbf{x}_i$ and $\mathbf{x}_j$, assigning strong correlations only to closer points. In this case, the model predicts the output only with the training data points close to the new input vector. The hyperparameters can be determined by trial and error using cross-validation. Algorithms have been developed to facilitate the tuning of hyperparameters. Interested readers can refer to the book by Murphy (2012). The kernel function (4.16) can satisfy most of our needs in accelerator controls. Murphy's book introduces many other kernel functions that can be considered if needed. Now we include the new input and output into the joint distribution:
$$
\begin{bmatrix} \mathbf{y} \\ y_{N+1} \end{bmatrix} \sim N\left( \begin{bmatrix} \mathbf{0} \\ 0 \end{bmatrix}, \begin{bmatrix} \mathbf{K} & \mathbf{K}_* \\ \mathbf{K}_*^T & K_{**} \end{bmatrix} \right), \quad \text{where} \quad
\begin{aligned}
\mathbf{K}_* &= k(\mathbf{X}, \mathbf{x}_{N+1}) = [k(\mathbf{x}_1, \mathbf{x}_{N+1})\ \ k(\mathbf{x}_2, \mathbf{x}_{N+1})\ \cdots\ k(\mathbf{x}_N, \mathbf{x}_{N+1})]^T, \\
K_{**} &= k(\mathbf{x}_{N+1}, \mathbf{x}_{N+1}).
\end{aligned} \tag{4.17}
$$
According to (4.13), with the given training data set $D$, we can obtain the posterior distribution of $y_{N+1}$ as
$$
p_{y_{N+1}|\mathbf{y}}(y_{N+1}|\mathbf{y}, \mathbf{X}, \mathbf{x}_{N+1}) = N\left( \mu_{y_{N+1}}, \sigma_{y_{N+1}}^2 \right), \quad \text{where} \quad \mu_{y_{N+1}} = \mathbf{K}_*^T \mathbf{K}^{-1} \mathbf{y}, \quad \sigma_{y_{N+1}}^2 = K_{**} - \mathbf{K}_*^T \mathbf{K}^{-1} \mathbf{K}_*. \tag{4.18}
$$
Equation (4.18) gives the mean value and variance of $y_{N+1}$ using the training data and the new input $\mathbf{x}_{N+1}$.
Remark 1 Equation (4.17) is for predicting the output of a single input vector. The algorithm can easily be extended to predict the output values of multiple input vectors by extending the matrices $\mathbf{K}_*$ and $K_{**}$ correspondingly.
Remark 2 If the system output $y$ contains measurement noise and its variance is known as $\sigma_{mea}^2$ (assuming zero-mean Gaussian noise), the matrix $\mathbf{K}$ in (4.17) and (4.18) should be replaced by $\mathbf{K} + \sigma_{mea}^2 \mathbf{I}_N$ (Murphy 2012), where $\mathbf{I}_N$ is an N-dimensional unit matrix. This helps increase the accuracy of prediction in the presence of measurement noise. The measurement noise can be easily characterized in practice, e.g., we can measure multiple output samples for the same input and calculate the mean and variance.
Remark 3 Every time we update the training data, $\mathbf{K}^{-1}$ should be calculated once. The matrices $\mathbf{K}_*$ and $K_{**}$ should be calculated when making predictions for new input vectors. The matrix $\mathbf{K}$ may be ill-conditioned, and directly calculating $\mathbf{K}^{-1}$ may result in large errors. In practice, we use the Cholesky decomposition ($\mathbf{K} = \mathbf{L}\mathbf{L}^T$) to calculate the inverse (Murphy 2012). In this case, we have
$$
\begin{aligned}
\mathbf{K}^{-1} \mathbf{y} &= \left( \mathbf{L}^T \right)^{-1} \mathbf{L}^{-1} \mathbf{y} = \mathbf{L}^T \backslash (\mathbf{L} \backslash \mathbf{y}), \\
\mathbf{K}_*^T \mathbf{K}^{-1} \mathbf{K}_* &= \mathbf{K}_*^T \left( \mathbf{L}^T \right)^{-1} \mathbf{L}^{-1} \mathbf{K}_* = (\mathbf{L} \backslash \mathbf{K}_*)^T (\mathbf{L} \backslash \mathbf{K}_*).
\end{aligned}
$$
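The prediction (4.18), together with Remarks 1 to 3, can be sketched as follows. The training data (a saturating curve sampled at five points), the hyperparameter values, and the function names are assumptions for illustration. `np.linalg.solve` is used for the triangular systems for simplicity, although a dedicated triangular solver would be more efficient.

```python
import numpy as np

def rbf(A, B, sigma_y=1.0, length=1.0):
    """RBF kernel (4.16) between the columns of A (n x Na) and B (n x Nb)."""
    d2 = ((A[:, :, None] - B[:, None, :]) ** 2).sum(axis=0)
    return sigma_y ** 2 * np.exp(-d2 / (2.0 * length ** 2))

def gp_predict(X, y, X_new, sigma_y=1.0, length=1.0, sigma_mea=1e-2):
    """Posterior mean and variance (4.18) for every column of X_new (Remark 1)."""
    N = X.shape[1]
    K = rbf(X, X, sigma_y, length) + sigma_mea ** 2 * np.eye(N)   # Remark 2
    L = np.linalg.cholesky(K)                                     # K = L L^T (Remark 3)
    K_s = rbf(X, X_new, sigma_y, length)                          # one column per new input
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))           # K^{-1} y = L^T \ (L \ y)
    v = np.linalg.solve(L, K_s)                                   # L \ K_*
    mean = K_s.T @ alpha
    var = sigma_y ** 2 - np.sum(v ** 2, axis=0)                   # K_** - K_*^T K^{-1} K_*
    return mean, var

# Hypothetical training data: input vectors stored as columns, as in (4.14)
X = np.array([[0.0, 0.25, 0.5, 0.75, 1.0]])
y = np.sin(2.0 * X[0])

# Predict at a training input (0.5) and far away from all training data (3.0)
X_new = np.array([[0.5, 3.0]])
mean, var = gp_predict(X, y, X_new, sigma_y=1.0, length=0.3)
print(mean, var)
```

At the training input the predicted mean is close to the measured value and the variance is small. Far from the training data, the mean returns to the prior mean of zero and the variance approaches the prior variance `sigma_y ** 2`, reflecting that the model has no information there.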
4.2.2.3 System Modeling with Gaussian Processes
The algorithm (4.18) assumes a MISO system. When modeling the input–output relations of a multiple-input multiple-output (MIMO) system, we can establish a separate GP model for each output component. As an example, we built a GP regression model for one of the gain curves (HV = 2500 V) in Fig. 4.8. The results are shown in Fig. 4.11, illustrating the models trained with different numbers of data points. The mean values and variances of the outputs are predicted with the trained model. At the training data points, the model has less uncertainty with smaller prediction variances, whereas, at other points, the variances are large since the model does not have information about those points. The prediction accuracy improves with more training data points.
Fig. 4.11 Klystron gain curve GP regression model. The shaded region illustrates the ±3σ confidence region. The models are trained with 2, 3 and 6 training data points, respectively. (Axes: Klystron out amplitude [arb.] versus DAC drive factor [arb.]; legend: confidence region, measurement, training data, estimated mean value.)
4.3 Applications of Machine Learning Models in Beam Controls

Machine learning models capable of predicting large ranges of outputs can serve as surrogate models of the accelerator. They enable exploring the system inputs extensively without worrying about beam loss or interlock trips. In addition, simulation is often much faster than evaluating the physical systems for optimization. This section discusses the applications of machine learning surrogate models in beam response matrix estimation, predictive feedforward control and optimization. Again, we will use the SwissFEL two-bunch operation as an example.
4.3.1 Surrogate Models of Beam Responses

The neural network regression model can represent a (nonlinear) system's input–output relations within a large operating range. It is an excellent candidate for modeling the global beam response. However, identifying a global neural network model is difficult due to the possible beam loss when collecting training data with large input variations. For example, if we vary the RF amplitude significantly, the beam energy may exceed the acceptance thresholds of the magnets, causing beam loss. As a trade-off, we can collect training data at multiple operating points, which are set up with physics models. At each operating point, we only vary the inputs randomly in a small range. Combining the data from multiple operating points allows training a global neural network model. This global model helps estimate the beam response matrix at a different operating point (Sect. 4.3.2), change quickly between different operating points (Sect. 4.3.3), and perform feedforward controls (Sect. 4.3.4). When changing the operating points, the model only determines the relevant inputs (e.g., RF amplitude and phase). The setup of other components (e.g., magnets) should be done with physics models; otherwise, they must be included in the inputs of the neural network model.

Fig. 4.12 Inputs to identify the surrogate model of SwissFEL bunch2 response. The step ratio (r) and step phase (p) of booster1 (bst1), booster2 (bst2), X-band (X) and Linac1 (L1) are varied by Gaussian noise at 10 different operating points. (Panels: r and p [deg] of bst1, bst2, X and L1 versus Test Id.)

For example, a neural network surrogate model was identified for the SwissFEL bunch2 response (see Sect. 1.3 for details). We collected training data at 10 different operating points with the inputs and outputs depicted in Figs. 4.12 and 4.13. At each operating point, we injected Gaussian noise into the actuators to probe the local response. The last plot in Fig. 4.13 indicates that the BC2 bunch length measurement was invalid for the first operating point. The invalid data should be excluded from the training data. This neural network model has one hidden layer with 16 neurons. The test results for the trained model are shown in Fig. 4.14, which compares the predictions and measurements for the same test inputs. The prediction error of $L_{BC2}$ is relatively large due to the strong nonlinearity and poor signal-to-noise ratio of the bunch length detector at BC2 of SwissFEL.

Similarly, GP regression models can also serve as surrogate models of beam responses if trained with the system input–output data. For example, we trained a GP model with the bunch2 data shown in Figs. 4.12 and 4.13. Figure 4.15 shows the test results of the trained model. Unlike the neural network model, the GP model predicts both the mean value and the variance of the output for a new input. A new input closer to the training data points results in a more accurate prediction, i.e., one with a smaller variance. GP models derived from large training sets are computationally expensive. Equation (4.18) implies that we need to handle large matrices (the $N \times N$ matrix $\mathbf{K}$, which must be inverted, and $\mathbf{K}_*$) if the training data consists of many points. Therefore, we typically only use GP models to model small-range beam responses around a particular operating point, which do not need many training data points (as is the case in GP model-based optimization, see Sect. 4.3.5). When modeling global beam responses covering an extensive range of operating points, neural network models are preferred.
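The data collection strategy described above (small Gaussian variations around several operating points, with invalid measurements excluded) might be organized as in the following sketch. It is a toy illustration only: `toy_plant`, its inputs and the validity check are hypothetical stand-ins and do not represent the SwissFEL interface or beam response.

```python
import math
import random


def collect_training_data(plant, operating_points, n_samples, noise_std,
                          is_valid, seed=0):
    """Probe a plant with small Gaussian input variations around each operating point.

    plant: callable mapping an input list to an output list (hypothetical
           stand-in for the accelerator).
    is_valid: callable that flags usable output vectors.
    """
    rng = random.Random(seed)
    inputs, outputs = [], []
    for u_op in operating_points:
        for _ in range(n_samples):
            u = [ui + rng.gauss(0.0, noise_std) for ui in u_op]
            y = plant(u)
            if is_valid(y):              # exclude invalid measurements
                inputs.append(u)
                outputs.append(y)
    return inputs, outputs


def toy_plant(u):
    """Toy nonlinear response; a NaN output mimics an invalid detector reading."""
    amp, phase = u
    energy = amp * math.cos(phase)
    if amp < 0.3:
        length = float("nan")
    else:
        length = 1.0 / (0.1 + abs(amp * math.sin(phase)))
    return [energy, length]


def no_nan(y):
    """Validity check used in this sketch: all outputs must be finite numbers."""
    return not any(math.isnan(v) for v in y)


if __name__ == "__main__":
    points = [[0.2, 0.1], [0.6, 0.2], [1.0, 0.3]]   # made-up operating points
    U, Y = collect_training_data(toy_plant, points, 50, 0.01, no_nan)
    print(len(U), "valid samples collected")
```

The combined lists `U` and `Y` would then be passed to whatever regression tool is used to train the global model.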
Fig. 4.13 Bunch2 measurements in response to the inputs in Fig. 4.12. ΔE is the beam energy deviation represented by beam positions, and L is the bunch length measurement in arbitrary units. The subscripts LH, BC1 and BC2 denote the parameters at the laser heater, BC1 and BC2, respectively. (Panels: ΔE_LH [mm], ΔE_BC1 [mm], L_BC1 [arb.], ΔE_BC2 [mm] and L_BC2 [arb.] versus Test Id.)
Fig. 4.14 Test of the neural network surrogate model of bunch2 response. (Panels: ΔE_LH [mm], ΔE_BC1 [mm], L_BC1 [arb.], ΔE_BC2 [mm] and L_BC2 [arb.] versus Test Id; legend: measurement, NN model prediction.)
Fig. 4.15 Test of the GP surrogate model of bunch2 response. (Panels: ΔE_LH [mm], ΔE_BC1 [mm], L_BC1 [arb.], ΔE_BC2 [mm] and L_BC2 [arb.] versus Test Id; legend: confidence region, measurement, predicted mean value.)

4.3.2 Response Matrix Estimation with Neural Network Surrogate Models

Beam feedback controllers rely on the inverse of the beam response matrix (see Chap. 2). If the beam operating point changes dramatically (e.g., with different energy or bunch compression), a new response matrix must be identified to keep the loop stable and performing satisfactorily. Determining the response matrix typically requires varying the system inputs, which is invasive to operation. Measuring the response matrix is often not allowed during critical user operation (e.g., delivering FEL photons to user experiments). In this case, surrogate models can help estimate the new response matrix. To do that, the same methods introduced in Sect. 2.3.1 can be applied to the surrogate model. The estimated response matrix may not be as accurate as that identified with the input–output data measured from the physical system. However, we can still use it to configure the beam feedback loop using the least-squares method (Sect. 2.4.3) or the robust control method (Sect. 2.4.4). These methods allow the loop to operate well with response matrix uncertainties.

For example, we identified the response matrix of SwissFEL bunch2 using the neural network surrogate model obtained in Sect. 4.3.1. The operating point $\mathbf{y}_{OP}$ and the estimated response matrix $\mathbf{R}_{OP}$ are given by (4.19). The validation of this response matrix with fresh inputs to the physical system is shown in Fig. 4.16. The prediction errors are visible, especially for $L_{BC2}$. However, the response matrix is accurate enough for configuring the beam feedback loop. Because the response matrix identification using surrogate models is very fast, this method supports quick configurations of beam feedback loops after changing the operating point without disturbing the beam operation.

$$
\mathbf{y}_{OP} = \begin{bmatrix} \Delta E_{LH} & \Delta E_{BC1} & L_{BC1} & \Delta E_{BC2} & L_{BC2} \end{bmatrix}^T = \begin{bmatrix} -0.064 & -0.065 & 140.8 & 2.24 & 620.9 \end{bmatrix}^T,
$$

$$
\mathbf{R}_{OP} = \begin{bmatrix}
0.44 & 0.14 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.82 & -0.16 & 1.41 & -0.06 & -1.24 & 0.34 & 0 & 0 \\
-0.39 & -0.93 & -0.56 & -0.27 & 2.24 & 0.90 & 0 & 0 \\
-1.16 & 0.15 & -1.35 & 0.11 & 1.07 & -0.12 & 0.21 & -0.32 \\
0.04 & 0.07 & 0.01 & -0.09 & -0.12 & 0.05 & 0.04 & -0.06
\end{bmatrix}. \tag{4.19}
$$
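As an illustration of how a response matrix can be read off a surrogate model, the sketch below linearizes a model around an operating point with central finite differences. This is an assumption made for illustration: the identification methods of Sect. 2.3.1 are not reproduced here and may differ (e.g., a least-squares fit to randomly perturbed inputs), and `toy_surrogate` is a hypothetical stand-in for a trained neural network.

```python
import math


def estimate_response_matrix(model, u_op, step=1e-3):
    """Central-difference Jacobian of a surrogate model around u_op.

    model: callable mapping an input list to an output list.
    Returns (y_op, R) with R[i][j] = d y_i / d u_j evaluated at u_op.
    """
    y_op = model(u_op)
    R = [[0.0] * len(u_op) for _ in y_op]
    for j in range(len(u_op)):
        u_plus = list(u_op)
        u_minus = list(u_op)
        u_plus[j] += step
        u_minus[j] -= step
        y_plus = model(u_plus)
        y_minus = model(u_minus)
        for i in range(len(y_op)):
            R[i][j] = (y_plus[i] - y_minus[i]) / (2.0 * step)
    return y_op, R


def toy_surrogate(u):
    """Hypothetical two-input, two-output nonlinear model."""
    return [u[0] * math.cos(u[1]), u[0] ** 2 + 0.5 * u[1]]


if __name__ == "__main__":
    y_op, R = estimate_response_matrix(toy_surrogate, [1.0, 0.2])
    print("operating point outputs:", y_op)
    for row in R:
        print(row)
```

Because only model evaluations are needed, the matrix can be recomputed for any candidate operating point without touching the beam.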
Fig. 4.16 Validation of the response matrix estimated with the neural network surrogate model. (Panels: ΔE_LH [mm], ΔE_BC1 [mm], L_BC1 [arb.], ΔE_BC2 [mm] and L_BC2 [arb.] versus Test Id; legend: measurement, NN derived resp. matrix prediction.)
4.3.3 Beam Optimization with Neural Network Surrogate Models

We have introduced several optimization algorithms in Chap. 3. Some (e.g., SCO, RWO and RCDS) can perform online optimization but typically converge slowly (taking tens of minutes to hours). Stochastic algorithms like GA and PSO explore the input parameter space aggressively. They can easily cause beam loss and are unsuitable for online optimization. Similar to Sect. 4.3.2, we can also apply optimization to the system's surrogate model (Kong et al. 2016; Edelen et al. 2020). This strategy enables the use of GA and PSO, which are better at finding the global optimum. Evaluating the optimization cost function on surrogate models is based on simulations, which often support parallel computation and converge much faster. In addition, optimizing with surrogate models avoids evaluating the physical systems too frequently, which may be time-consuming or risk beam loss.

On the other hand, surrogate models are often not very accurate, and the resulting solutions may not be optimal for the physical systems. Nevertheless, the quasi-optimal solution serves as a starting point for the online optimization applied directly to the physical system. Specifically, suppose we change the operating point of beam parameters with optimization. The quasi-optimal inputs can bring the system close to the desired operating point. Then, we update the beam feedback loop using the response matrix identified from the surrogate model, as discussed in Sect. 4.3.2. The feedback loop can be closed to eliminate the residual errors in the beam parameters. Note that we cannot change the operating point by directly varying the feedback setpoints if the beam response is strongly nonlinear. In summary, with surrogate models, the overall optimization process can be accelerated significantly. This concept is depicted in Fig. 4.17.
Fig. 4.17 Acceleration of optimization with surrogate models. (Diagram blocks. Step I: Objectives, Optimizer, Model Inputs, Surrogate Model, Model Outputs. Step II: Initial Inputs, Optimizer, System Inputs, Physical System, System Outputs.)
For example, we change the SwissFEL bunch2 operating points via optimization, as in Sect. 3.3.3. Instead of directly optimizing the RF pulse steps, we apply the optimization algorithms to the neural network surrogate model identified in Sect. 4.3.1. After the model-based optimization converges, we apply the solution to the physical system and observe its outputs. The results are shown in Fig. 4.18. The achieved system outputs are close to the desired operating point ("Goal" of each plot), indicating that the model-based optimization managed to find the (quasi-)optimal inputs. Another benefit of this strategy is that we can change the system operating points in a single step. It simplifies the setup of magnets (not included in the surrogate model) for the new operating point. We only need to adjust the magnet settings once after applying the model-based optimization solution to the physical system. If we optimize the physical system directly when changing the operating point, we must adjust the magnet settings frequently to follow the intermediate beam energy. In addition, the surrogate model can be used to estimate the conjugate directions in the input parameter space, which can accelerate the RCDS-based online optimization applied to the physical system.

The convergences of PSO and RCDS on the neural network surrogate model are shown in Figs. 4.19 and 4.20. We have chosen PSO to demonstrate the behavior of stochastic algorithms. PSO needs many iterations (i.e., cost function evaluations) to converge and produces significant output variations in early iterations. Applying PSO directly to the physical systems would therefore risk beam loss. In contrast, RCDS converges more smoothly (see Fig. 4.20). This is why RCDS can perform online optimization directly on physical systems. Model-based optimization is much faster than online optimization (Sect. 3.3.3). In Matlab simulations, PSO takes around 8 min to converge, and RCDS takes about 2.5 min. If implemented in lower-level languages (e.g., C), they can run even faster, enabling much faster optimization following the procedure in Fig. 4.17.
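The two-step procedure of Fig. 4.17 can be sketched as follows. The sketch is deliberately simplified and rests on assumptions: a plain random search stands in for the stochastic optimizers (GA, PSO) of Step I, a cautious coordinate search stands in for the online optimizer (e.g., RCDS) of Step II, and both cost functions are made-up quadratic toys in which the surrogate is slightly wrong.

```python
import random


def random_search(cost, bounds, n_eval, rng):
    """Step I stand-in: aggressive global search, acceptable only on a model."""
    best_x, best_c = None, float("inf")
    for _ in range(n_eval):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        c = cost(x)
        if c < best_c:
            best_x, best_c = x, c
    return best_x, best_c


def local_refine(cost, x0, step, n_sweeps):
    """Step II stand-in: small coordinate moves, accepted only if the cost drops."""
    x, c = list(x0), cost(x0)
    for _ in range(n_sweeps):
        for j in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[j] += delta
                c_trial = cost(trial)
                if c_trial < c:
                    x, c = trial, c_trial
        step *= 0.7                      # shrink the step for a smooth convergence
    return x, c


def surrogate_cost(x):
    """Hypothetical model-based cost (slightly inaccurate optimum)."""
    return (x[0] - 0.60) ** 2 + (x[1] + 0.30) ** 2


def physical_cost(x):
    """Hypothetical cost measured on the physical system."""
    return (x[0] - 0.62) ** 2 + (x[1] + 0.31) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    x_model, _ = random_search(surrogate_cost, [(-2.0, 2.0), (-2.0, 2.0)], 2000, rng)
    x_final, c_final = local_refine(physical_cost, x_model, 0.02, 10)
    print("model-based solution:", x_model, "cost on system:", physical_cost(x_model))
    print("after online refinement:", x_final, "cost on system:", c_final)
```

The design point is that the thousands of wide-ranging evaluations happen only on the model, while the physical system sees a few tens of small steps starting from the quasi-optimal inputs.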
Fig. 4.18 Bunch2 operating point changing via optimization applied to the neural network surrogate model. "Initial value" is the old and "goal" is the desired new operating point. "Achieved value" is the physical system output resulting from the inputs determined by the model-based optimization. Multiple algorithms were tested: RWO, GA, NSGA-II, PSO and RCDS (see Sect. 3.2 for details). (Panels: ΔE_LH [mm], ΔE_BC1 [mm], L_BC1 [arb.], ΔE_BC2 [mm] and L_BC2 [arb.] for each algorithm; legend: initial value, achieved value, goal.)
Fig. 4.19 Convergence of PSO applied to the neural network surrogate model. "Eval. output" is the model output of each iteration. The convergence takes around 8 min. (Panels: ΔE_LH [mm], ΔE_BC1 [mm], L_BC1 [arb.], ΔE_BC2 [mm] and L_BC2 [arb.] versus Num. of Eval. (×10^4); legend: eval. output, initial value, goal.)
Fig. 4.20 Convergence of RCDS applied to the neural network surrogate model. The convergence takes around 2.5 min. (Panels: ΔE_LH [mm], ΔE_BC1 [mm], L_BC1 [arb.], ΔE_BC2 [mm] and L_BC2 [arb.] versus Num. of Eval.; legend: eval. output, initial value, goal.)

4.3.4 Feedforward Control with Neural Network Surrogate Models

Beam feedback can reject the effects of low-frequency disturbances. Several factors determine the upper bound of the closed-loop bandwidth ($\omega_B$) of a feedback loop. First, the loop delay $\Delta t$ limits the achievable $\omega_B$ to $\omega_B < 1/\Delta t$ for preserving a phase margin over 30° (Skogestad and Postlethwaite 2005). Second, the components in the feedback loop may limit the maximum $\omega_B$ due to their higher-order dynamics. For example, in a cavity tuning control loop, the mechanical oscillation modes of the cavity restrict the reachable feedback bandwidth (Simrock and Geng 2022). Finally, we may reduce $\omega_B$ to avoid passing too much measurement noise to the system output.

Feedback can do little against disturbances with frequencies above $\omega_B$. In this case, we can consider feedforward control for such high-frequency disturbances. It requires that the disturbances be measurable directly or indirectly (e.g., via other measurable signals affected). The disturbances should also be stationary (e.g., with fixed frequencies) so that their effects on beam parameters are predictable from the measurements. This concept is illustrated in Fig. 4.21. The strategy in Fig. 4.21 uses feedforward to reject high-frequency disturbances and uses feedback to mitigate low-frequency (including direct-current, DC) output errors.

Fig. 4.21 Concept of feedforward control to reject disturbances with frequencies above the feedback closed-loop bandwidth. (Diagram blocks: Disturbance, Disturbance Estimation, Beam Ripple Prediction, setpoint r, Feedback Control, Response Matrix Inverse $\mathbf{R}^{-1}$, input u, Plant, output y.)

We demonstrate this concept with the control of SwissFEL bunch2 (see Fig. 4.22). When studying the beam feedback and optimization, we have used the RF pulse step ratios and phases of booster1, booster2, X-band and Linac1 to adjust the bunch2 parameters at the laser heater (LH), BC1 and BC2. Assume a disturbance is coupled into the RF Gun field, deviating the downstream beam parameters. We will mitigate this disturbance with an additional feedforward signal applied to the RF pulse steps of booster2, X-band and Linac1. To do that, we measure the Gun amplitude and phase waveforms and estimate the disturbance in terms of the equivalent step ratio ($r_{gun}$) and step phase ($p_{gun}$) applied to bunch2. Then, we train a neural network predictive model to predict the disturbance-caused bunch2 parameter deviations and subtract them from the feedback controller outputs (Meier et al. 2009a, b). Finally, the required RF pulse step settings are calculated via the response matrix inverse (i.e., the static controller). This algorithm is described as follows:

$$
\begin{cases}
\mathbf{a}_{FF,k} = \mathrm{NN}(\mathbf{d}_{k-1}, \mathbf{d}_{k-2}, \ldots, \mathbf{d}_{k-l}), \\
\mathbf{a}_{FB,k} = \mathbf{a}_{FB,k-1} + g(\mathbf{r} - \mathbf{y}_{k-1}), \\
\mathbf{u}_k = \mathbf{R}^{-1}\left( \mathbf{a}_{FB,k} - \mathbf{a}_{FF,k} \right),
\end{cases}
$$

$$
\text{where } \mathbf{d} = \begin{bmatrix} r_{gun} & p_{gun} \end{bmatrix}^T, \quad \mathbf{y} = \begin{bmatrix} \Delta E_{BC1} & L_{BC1} & \Delta E_{BC2} & L_{BC2} \end{bmatrix}^T, \quad \mathbf{u} = \begin{bmatrix} r_{bst2} & p_{bst2} & r_X & p_X & r_{L1} & p_{L1} \end{bmatrix}^T. \tag{4.20}
$$
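The structure of (4.20) can be tried out on a scalar toy loop, where index k − 1 refers to the pulse just finished and k to the upcoming pulse. Everything in this sketch is an assumption made for illustration: the plant is a static gain `R` plus an additive disturbance and a constant offset, and the neural network NN(·) is replaced by a two-tap linear predictor that happens to be exact for a single sinusoid of known frequency. It is not the predictive model used in the book.

```python
import math


def simulate(n_pulses, omega, use_fb, use_ff, g=0.3, R=2.0, c=1.0,
             r=0.0, offset=0.05):
    """Scalar toy version of (4.20).

    omega is the disturbance frequency in rad per pulse. Returns the RMS
    output error (y - r) over the second half of the run.
    """
    d = [0.1 * math.sin(omega * k) for k in range(n_pulses)]   # disturbance per pulse
    a_fb, y_prev, errors = 0.0, 0.0, []
    for k in range(2, n_pulses):
        # Stand-in for NN(d_{k-1}, ..., d_{k-l}): predict d_k from two past values.
        d_pred = 2.0 * math.cos(omega) * d[k - 1] - d[k - 2]
        a_ff = c * d_pred if use_ff else 0.0       # predicted beam deviation
        if use_fb:
            a_fb = a_fb + g * (r - y_prev)         # discrete integral feedback
        u = (a_fb - a_ff) / R                      # static controller R^{-1}
        y = R * u + c * d[k] + offset              # toy plant response
        errors.append(y - r)
        y_prev = y
    tail = errors[len(errors) // 2:]
    return math.sqrt(sum(e * e for e in tail) / len(tail))


if __name__ == "__main__":
    omega = 2.0 * math.pi * 0.15                   # 0.15 of the pulse repetition rate
    for name, fb, ff in (("No ctrl", False, False), ("FB only", True, False),
                         ("FF only", False, True), ("FB + FF", True, True)):
        print(name, simulate(200, omega, fb, ff))
```

In this toy setting, feedforward alone cancels the ripple but leaves the constant offset, while the integral feedback removes the offset, which mirrors the division of labor between the two paths described in the text.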
We use the index $k-1$ to denote the just-finished pulse and $k$ the upcoming pulse. The goal is to determine the drive signal for the upcoming pulse. The estimated disturbance is denoted as $\mathbf{d}$, and we use its $l$ previous values to predict the beam deviations of the upcoming pulse, $\mathbf{a}_{FF,k}$. The function NN(·) returns the outputs of the neural network predictive model. For feedback control, we have used the simple discrete integral control with $g$ the integral gain.

The neural network predictive model has the structure shown in Fig. 4.23. We use the disturbances estimated in $l$ previous pulses to predict the beam deviations in the next pulse. Compared to the dynamical regression model in Fig. 4.6b, the first input index of the predictive model is $k-1$ instead of $k$. To train the neural network predictive model, we produce $r_{gun}$ and $p_{gun}$ excitation signals that are superpositions of multiple sinusoidal signals (see Fig. 4.24). In this example, 10 sinusoidal signals were produced with random amplitudes, phases, and frequencies between $0.01 f_{pul}$ and $0.2 f_{pul}$, where $f_{pul}$ is the RF pulse repetition rate. The resulting beam parameter variations are also displayed. The neural network model trained with such training data can predict beam fluctuations caused by disturbances in the same frequency range. The model has one hidden layer with 10 neurons, and the number of previous pulses is selected to be $l = 10$. The model test can be found in Fig. 4.25, indicating good accuracy.

Fig. 4.22 Study of feedforward control on the second bunch of SwissFEL

Fig. 4.23 Neural network predictive model for predicting beam ripples from estimated disturbances

Fig. 4.24 Training data for the neural network predictive model. The excitation signals are the step ratio ($r_{gun}$) and step phase ($p_{gun}$) applied to the Gun RF pulse. The outputs are the energy deviation (ΔE) and bunch length (L) of bunch2 at BC1 and BC2. (Panels: r_gun, p_gun [deg], ΔE_BC1 [mm], L_BC1 [arb.], ΔE_BC2 [mm] and L_BC2 [arb.] versus Test Id.)

Fig. 4.25 Test of the neural network predictive model by comparing the beam parameter measurements and the predictions. All values are deviations from the operating point. $L_{BC1}$ and $L_{BC2}$ have been normalized by their operating points. (Panels: ΔE_BC1 [mm], L_BC1 [arb.], ΔE_BC2 [mm] and L_BC2 [arb.] versus Test Id; legend: measurement, NN prediction.)

To test the feedforward control, we varied the RF Gun amplitude and phase to emulate the disturbance, since the RF Gun of SwissFEL is stable in regular operation. The frequency of this artificial disturbance is configured by software. We tested two frequencies, $0.05 f_{pul}$ and $0.15 f_{pul}$, and the results are shown in Figs. 4.26 and 4.27. For the lower-frequency case (Fig. 4.26), feedback is quite efficient for suppressing the disturbance, and feedforward is optional. However, for the higher-frequency disturbance (Fig. 4.27), feedback has little effect while feedforward can suppress the periodic disturbance very well. Furthermore, we can see from Fig. 4.27 that the fluctuation of $\Delta E_{BC2}$ is amplified by feedback compared to the "No ctrl" case. This is due to the small phase margin in the $\Delta E_{BC2}$ output channel, requiring a smaller feedback gain. With only feedforward, the steady-state errors may be large (as seen from the figures), and feedback is still needed to reduce the steady-state and low-frequency errors.

Fig. 4.26 Feedforward control of a low-frequency ($0.05 f_{pul}$) disturbance coupled into the RF Gun. (Panels: ΔE_BC1 [mm], L_BC1 [arb.], ΔE_BC2 [mm] and L_BC2 [arb.] versus Test Id; legend: No ctrl, FB only, FF only, FB + FF, FB set point.)
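The training-data preparation for the predictive model (multi-sine excitation, then pairing the $l$ previous disturbance values with the beam deviation of the next pulse) could look like the sketch below. The amplitude range, the random seeds and the toy beam response are assumptions made for illustration; only the number of tones, the frequency range and $l = 10$ follow the text.

```python
import math
import random


def multisine(n_pulses, n_tones=10, f_lo=0.01, f_hi=0.2, max_amp=0.1, seed=1):
    """Sum of sinusoids with random amplitude, frequency and phase.

    Frequencies are expressed in units of the pulse repetition rate f_pul,
    so the sample index k counts RF pulses.
    """
    rng = random.Random(seed)
    tones = [(rng.uniform(0.0, max_amp),
              rng.uniform(f_lo, f_hi),
              rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n_tones)]
    return [sum(a * math.sin(2.0 * math.pi * f * k + ph) for a, f, ph in tones)
            for k in range(n_pulses)]


def lagged_dataset(d_series, y_series, n_lags=10):
    """Pair [d_{k-1}, ..., d_{k-l}] (flattened) with y_k for every usable pulse k."""
    features, targets = [], []
    for k in range(n_lags, len(d_series)):
        row = [v for i in range(1, n_lags + 1) for v in d_series[k - i]]
        features.append(row)
        targets.append(y_series[k])
    return features, targets


if __name__ == "__main__":
    r_gun = multisine(100, seed=1)
    p_gun = multisine(100, seed=2)
    d = [[r, p] for r, p in zip(r_gun, p_gun)]
    # Hypothetical beam response, used here only to have some targets to pair.
    y = [[0.5 * r - 0.2 * p] for r, p in d]
    X_train, Y_train = lagged_dataset(d, y, n_lags=10)
    print(len(X_train), "samples with", len(X_train[0]), "inputs each")
```

Note that the first feature of each row is the disturbance at pulse k − 1, not k, which matches the predictive (rather than regression) use of the model.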
Fig. 4.27 Feedforward control of a high-frequency ($0.15 f_{pul}$) disturbance coupled into the RF Gun. (Panels: ΔE_BC1 [mm], L_BC1 [arb.], ΔE_BC2 [mm] and L_BC2 [arb.] versus Test Id; legend: No ctrl, FB only, FF only, FB + FF, FB set point.)

4.3.5 Beam Optimization with GP Surrogate Models

In Sect. 4.3.1, we used GP surrogate models of beam responses to predict the beam parameters for new inputs. However, for optimization, we train GP surrogate models whose outputs are directly the cost function values. The model inputs are still the system inputs. Suppose we change the SwissFEL bunch2 operating points (Sect. 3.3.3) using the GP surrogate model-based optimization. In that case, the inputs to the GP model are the system inputs (i.e., RF pulse step ratios and phases), and the output is the cost $J$ defined by (3.12). If there are multiple objectives, the cost function returns a vector of cost items (i.e., cost vector). In this case, we can train a separate GP model for each cost item and use the non-dominated sorting to evaluate the cost vector (Roussel et al. 2021). These models share the same inputs. In this section, we will stay with single-objective optimization problems.
4.3.5.1 Bayesian Optimization
The GP surrogate model-based optimization belongs to the larger class of Bayesian optimization (Brochu et al. 2010), which is an iterative decision-making procedure based on prior knowledge and posterior experiences. The principle of Bayesian optimization is depicted in Fig. 4.28. For Bayesian optimization, the physical system must be static: the system outputs only rely on the instant inputs. If the physical system is dynamical, i.e., its outputs depend on both the instant and the historical inputs, then the optimizer should minimize the cumulative cost for a sequence of transitions of the system's state. Such optimization problems should be solved by dynamic programming or reinforcement learning, which will be discussed in Sect. 4.4.

Fig. 4.28 Principle of Bayesian optimization. (Diagram blocks: Prior Knowledge, Observation (accumulated), Bayesian Rule, Posterior, Decision Rule, Objectives, System Inputs, Physical System, System Outputs.)

To optimize the system outputs (i.e., minimize a cost function defined with the system outputs), we initially determine the system inputs based on the prior knowledge of the system before making any observations. For example, to minimize the bunch length after a bunch compressor, we initially determine the upstream RF settings using the physics model of the system. The prior knowledge also includes the belief that the system outputs (or the derived costs) of different input vectors follow some joint distribution, e.g., a joint Gaussian distribution (Sect. 4.2.2.2). We evaluate the initial inputs in the physical system and observe its outputs, including calculating the cost of the system outputs. Then, the following procedure is executed iteratively to optimize the system outputs:

(a) Build a surrogate model. Combining the prior knowledge (assumed joint distribution of multiple input-cost data points) and the observations (measured input-cost data points), we can calculate the posterior distribution of the cost for a given input vector. That is, for each possible input vector, we can predict the mean value and variance of the resulting cost. Therefore, the posterior is a function of inputs and serves as a surrogate model of the physical system. We still use the bunch length optimization problem as an example. The surrogate model can predict the bunch length (i.e., the cost for the problem of bunch length minimization) for different RF amplitude and phase settings and also tell the prediction uncertainty. This step uses Bayes' rule (Murphy 2012) of probability theory.

(b) Determine the next trial input vector. The surrogate model allows fast determination of the next trial input vector using simulation. There are two ways to determine the next trial input vector. First, we may choose it by minimizing the mean value of the cost predicted by the surrogate model. This is called exploitation, which evaluates the inputs close to the current optimum point. Alternatively, we can select a trial input vector where the predicted cost has a large variance. This action is exploration, aiming at reducing the uncertainty of the surrogate model via new observations. We must make a trade-off between exploitation and exploration. Too much exploitation may miss the global optimum at other unobserved input regions, whereas too much exploration may slow down the convergence. For the bunch length optimization example, we can choose the next trial RF settings close to the values that already yield a small bunch length but with larger variance. Typically, we define an acquisition function (also called a utility function) to compromise between the statistical attributes of the model prediction, $u(\mathbf{x}) = f\left( \mu_J(\mathbf{x}), \sigma_J(\mathbf{x}) \right)$, where $\mu_J(\mathbf{x})$ and $\sigma_J(\mathbf{x})$ are the mean value and standard deviation of the predicted cost $J$ for the input vector $\mathbf{x}$. The next trial input vector for the physical system is determined by minimizing $u(\mathbf{x})$.

(c) Evaluate the trial input vector in the physical system. We apply the trial input vector (e.g., RF amplitude and phase settings) determined in the last step to the physical system. The system outputs (e.g., bunch length) are observed and their cost is calculated. We add this new input-cost data point to the observation data set and go back to step (a) for a new iteration.

In step (b) above, a surrogate model is used to determine the new trial input vector. An accurate surrogate model can improve the efficiency of searching for the optimal solution. Bayesian optimization updates the surrogate model in each iteration with new observations of the physical system and keeps improving the model accuracy. With the help of surrogate models, Bayesian optimization often converges faster than the optimization algorithms introduced in Chap. 3. Bayesian optimization is often used to tune the hyperparameters of machine learning algorithms. Of course, it can also be applied to optimize the beam parameters.
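The trade-off in step (b) can be made concrete with a tiny example. The sketch below assumes one particular form of the function f, namely the mean minus a weighted standard deviation (the lower-confidence-bound form that appears as (4.21) in Sect. 4.3.5.2); the candidate inputs and their predicted means and standard deviations are made-up numbers.

```python
def pick_next(candidates, mean, std, kappa):
    """Return the candidate that minimizes u = mean - kappa * std."""
    scores = [m - kappa * s for m, s in zip(mean, std)]
    return candidates[scores.index(min(scores))]


if __name__ == "__main__":
    candidates = [0.0, 1.0, 2.0]        # hypothetical trial inputs
    mean = [0.5, 0.2, 0.6]              # predicted cost mean at each input
    std = [0.05, 0.05, 0.5]             # predicted cost standard deviation
    print("pure exploitation:", pick_next(candidates, mean, std, kappa=0.0))
    print("with exploration: ", pick_next(candidates, mean, std, kappa=2.0))
```

With a zero weight the choice falls on the input with the lowest predicted cost, whereas a larger weight favors the poorly known input, illustrating how a single parameter shifts the balance between the two behaviors.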
4.3.5.2 Gaussian Process Optimization
When the surrogate model is a GP regression model, the algorithm is named Gaussian process optimization (GPO), as described in Table 4.1.

Table 4.1 GPO algorithm
Initialize
  Determine the initial system input vector $\mathbf{x}_0$ from prior knowledge, evaluate it in the physical system, observe the system outputs, and calculate the cost $J_0$. Initialize the observation data set $D_0 = \{(\mathbf{x}_0, J_0)\}$
Repeat (for iterations: t = 0, 1, 2, …)
  1. Build a GP surrogate model $G_t$ using the observation data set $D_t$ and define an acquisition function $u(\mathbf{x}|G_t)$ that combines the attributes of the posterior distribution predicted by $G_t$ for any given $\mathbf{x}$
  2. Find $\mathbf{x}_{t+1}$ by solving the optimization problem $\mathbf{x}_{t+1} = \arg\min_{\mathbf{x}} u(\mathbf{x}|G_t)$
  3. Evaluate $\mathbf{x}_{t+1}$ in the physical system and observe the cost $J_{t+1}$. Then, update the observation data set $D_{t+1} = D_t \cup \{(\mathbf{x}_{t+1}, J_{t+1})\}$
End (if the termination conditions are satisfied)

The key part of GPO is the acquisition function $u(\mathbf{x})$. Assuming that the optimization searches for the minimum, it selects the next $\mathbf{x}$ where the GP model output has a small mean value (for exploitation) and a large variance (for exploration). Different acquisition functions have been developed, such as the probability of improvement (PI), expected improvement (EI), and lower confidence bound (LCB) functions (Brochu et al. 2010). We focus on the GP-LCB acquisition function:

$$
u(\mathbf{x}) = \mu_J(\mathbf{x}) - \kappa \sigma_J(\mathbf{x}), \tag{4.21}
$$

where $\kappa$ is a positive number balancing the exploitation and exploration. A larger $\kappa$ chooses the next trial input vector at more uncertain regions for exploration. Normally, $\kappa$ is chosen to be relatively large at the beginning and decreases with optimization iterations. This encourages the exploration of a large input range at early iterations to search for the global optimum, whereas at later iterations, $\kappa$ is reduced for smooth convergence. A simple strategy to adapt $\kappa$ is

$$
\kappa_{t+1} = \lambda \kappa_t, \tag{4.22}
$$
where $\lambda$ is a positive number smaller than 1. In step 2 of GPO, we use the acquisition function $u(\mathbf{x})$ to determine the next trial input vector that minimizes $u(\mathbf{x})$. Since the analytical expression of $u(\mathbf{x})$ is often unknown, we must use some black-box direct search algorithms like the Nelder-Mead simplex method. These algorithms may attract the GPO to a local minimum if the optimization problem is not convex. For example, Fig. 4.29a illustrates the optimization of a test cost function

$$
f(x) = \sin x + \sin(10x/3), \quad -2 \le x \le 6, \tag{4.23}
$$

whose global minimum within the given range is at $x^* = 5.146$ (with $f^* = -1.9$). Figure 4.29b indicates that the GPO algorithm settles down at a local minimum with $x = -0.549$ and $f = -1.488$.
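The quoted minima of (4.23) are easy to cross-check with a brute-force grid evaluation. The sketch below is only a sanity check of the test function; the grid resolution is an arbitrary choice and the helper name `grid_minimum` is hypothetical.

```python
import math


def f(x):
    """Test cost function (4.23)."""
    return math.sin(x) + math.sin(10.0 * x / 3.0)


def grid_minimum(lo, hi, n=8001):
    """Return the grid point in [lo, hi] with the smallest f(x)."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(xs, key=f)


if __name__ == "__main__":
    x_glob = grid_minimum(-2.0, 6.0)            # whole range of (4.23)
    x_loc = grid_minimum(-2.0, 2.0, n=4001)     # left part only
    print("global minimum near x =", x_glob, "f =", f(x_glob))
    print("local minimum near x =", x_loc, "f =", f(x_loc))
```

Restricting the search to the left part of the range recovers the local minimum at which a local search started in that region can get stuck.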
[Figure 4.29, four panels: a Final GP model of GPO and c Final GP model of MG-GPO, plotting f(x) versus x with the real value, estimated mean value, confidence region, and evaluations; b Convergence of GPO and d Convergence of MG-GPO, plotting the x value versus iteration.]
Fig. 4.29 Test of GPO and MG-GPO to minimize (4.23). GPO adopts a Matlab function fminsearch to choose the next input by minimizing u(x). MG-GPO uses GA to select the next inputs with parameters: N = 2, m = 10
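The GPO loop of Table 4.1 can be sketched on the test function (4.23) as follows. This is only an illustration under stated assumptions, not the implementation behind Fig. 4.29: it assumes a lower-confidence-bound acquisition function u(x) = μ(x) − κσ(x), a squared-exponential kernel with hand-picked hyperparameters, and it replaces fminsearch with an exhaustive search on a grid. All function names and default values are hypothetical.

```python
import numpy as np

def f(x):
    """Test cost function (4.23)."""
    return np.sin(x) + np.sin(10.0 * x / 3.0)

def rbf(a, b, length=0.5):
    """Squared-exponential kernel with unit prior variance (assumed)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, j_obs, x_new, jitter=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    Ks = rbf(x_obs, x_new)
    mean = Ks.T @ np.linalg.solve(K, j_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def gpo(x0, n_iter=25, kappa=3.0, lam=0.9):
    """GPO loop of Table 4.1 with the kappa decay rule (4.22)."""
    grid = np.linspace(-2.0, 6.0, 801)      # candidate inputs (grid search)
    x_obs = np.array([x0], dtype=float)     # observation data set D_0
    j_obs = f(x_obs)
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, j_obs, grid)   # step 1
        u = mean - kappa * std                         # assumed LCB acquisition
        x_next = grid[np.argmin(u)]                    # step 2
        x_obs = np.append(x_obs, x_next)               # step 3
        j_obs = np.append(j_obs, f(x_next))
        kappa *= lam                                   # (4.22)
    best = np.argmin(j_obs)
    return x_obs[best], j_obs[best]
```

Calling `gpo(x0=0.0)` returns the best observed input and cost. Whether the loop reaches the global minimum depends on the assumed kernel, κ, and λ.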
4.3 Applications of Machine Learning Models in Beam Controls
Table 4.2 MG-GPO algorithm based on GA

Initialize
  Determine N initial system input vectors $x_{0,i}$, i = 1, …, N from prior knowledge, evaluate them in the physical system, observe the system outputs, and calculate their costs $J_{0,i}$, i = 1, …, N. Initialize the observation data set $D_0 = \{(x_{0,i}, J_{0,i}),\ i = 1, \ldots, N\}$.
Repeat (for iterations t = 0, 1, 2, …)
  1. Build a GP surrogate model $G_t$ using the observation data set $D_t$ and define an acquisition function $u(x|G_t)$.
  2. Using the input vectors of the N best performing (i.e., lowest-cost) observations in $D_t$ as parents, produce mN (with m > 1) new trial input vectors through crossover or mutation, and evaluate them in $u(x|G_t)$.
  3. Sort the acquisition-function evaluation results of step 2 and select the N input vectors with the lowest $u(x|G_t)$ values. These input vectors are denoted as $x_{t+1,i}$, i = 1, …, N.
  4. Evaluate the input vectors $x_{t+1,i}$, i = 1, …, N in the physical system and observe their resulting costs $J_{t+1,i}$, i = 1, …, N. Then, update the observation data set $D_{t+1} = D_t \cup \{(x_{t+1,i}, J_{t+1,i}),\ i = 1, \ldots, N\}$.
End (if the termination conditions are satisfied)
4.3.5.3 Multi-generation Gaussian Process Optimization
To improve the capability of finding the global minimum, we can use evolutionary algorithms (EAs) such as GA or PSO to determine the next trial input vector. They perform better in searching for the global minimum than direct search algorithms. Instead of determining one trial input vector, we may select multiple (N with N ≥ 1) trial input vectors that tend to minimize the acquisition function u(x). This leads to the multi-generation Gaussian process optimization (MG-GPO) algorithm developed by Xiaobiao Huang (Huang 2020; Huang et al. 2021; Zhang et al. 2021), as described in Table 4.2.

In MG-GPO step 4, the observation data set $D_{t+1}$ may become too large after several iterations. To simplify the GP surrogate model, we often trim $D_{t+1}$ before updating the model. In the article by Huang et al. (2021), $D_{t+1}$ includes all of the new observations $\{(x_{t+1,i}, J_{t+1,i}),\ i = 1, \ldots, N\}$ and several better performing points in $D_t$, resulting in a smaller $D_{t+1}$ than that given in Table 4.2.

MG-GPO enhances the exploration capability of GPO and has a higher probability of finding the global optimum. Since GA or PSO is only applied to optimize u(x) in simulation, there is no risk of beam loss when searching x over a large range. We applied MG-GPO to (4.23); the results, shown in Fig. 4.29c, d, settle at the global minimum.
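Steps 2 and 3 of Table 4.2 can be sketched as a single function that breeds mN candidates from the N best observations and keeps the N candidates with the lowest acquisition values. This is a hedged illustration: the blend crossover, the Gaussian mutation, the 50/50 choice between them, the bounds, and the function name `next_generation` are assumptions, and `acquisition` stands for any callable that evaluates $u(x|G_t)$ on an array of candidates.

```python
import numpy as np

def next_generation(x_obs, j_obs, acquisition, n_parents, m,
                    sigma=0.3, bounds=(-2.0, 6.0), rng=None):
    """Return N new trial inputs (rows) chosen via the acquisition function.

    x_obs: (n, dim) evaluated inputs; j_obs: (n,) observed costs.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Step 2: the N lowest-cost observations act as parents.
    parents = x_obs[np.argsort(j_obs)[:n_parents]]
    children = []
    for _ in range(m * n_parents):
        if n_parents > 1 and rng.random() < 0.5:
            # Crossover (assumed form): random blend of two distinct parents.
            a, b = parents[rng.choice(n_parents, size=2, replace=False)]
            w = rng.random()
            child = w * a + (1.0 - w) * b
        else:
            # Mutation (assumed form): Gaussian perturbation of one parent.
            parent = parents[rng.integers(n_parents)]
            child = parent + sigma * rng.standard_normal(parents.shape[1])
        children.append(np.clip(child, *bounds))
    children = np.array(children)
    # Step 3: keep the N candidates with the lowest acquisition values.
    u = acquisition(children)
    return children[np.argsort(u)[:n_parents]]
```

Only the returned N inputs would be applied to the physical system (step 4); the mN candidates are evaluated on the surrogate model alone.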
4.3.5.4 Beam Optimization with MG-GPO
MG-GPO combines the advantages of evolutionary algorithms and GP surrogate models. EAs improve the capability of finding the global optimum, and GP surrogate models make better use of the information in beam measurements. Since EAs are only applied to the surrogate model, we avoid exploring aggressive inputs in the physical system and reduce the risk of beam loss or interlock trips. MG-GPO can be used for online optimization in addition to SCO, RWO and RCDS (Chap. 3). It has been demonstrated that MG-GPO often converges faster than RWO and RCDS when directly applied to the physical system. Of course, with the technique introduced in Sect. 4.3.3, the speed of RWO and RCDS can also be enhanced.

We applied MG-GPO to change the SwissFEL bunch2 operating points under the same conditions as in Sect. 3.3.3. The test results are illustrated in Figs. 4.30 and 4.31. Compared to the RWO results in Fig. 3.15, MG-GPO managed to achieve the goals of all beam parameters (i.e., without bias). It also demonstrated faster convergence than RCDS shown in Fig. 3.17. This experiment used the Matlab-based realization of the MG-GPO algorithm implemented by Xiaobiao Huang. Interested readers can find the open-source code on GitHub: https://github.com/SPEAR3-ML/MG-GPO.

Fig. 4.30 Convergence of bunch2 parameters for operating point changing with MG-GPO. The initial output is y0 = [0.03 0.24 100 0.17 661]T and the target value is ydest = [−0.69 −2.05 223 3.53 578]T
[Panels: ELH [mm], EBC1 [mm], LBC1 [arb.], EBC2 [mm], and LBC2 [arb.] versus time (May 21, 2021, 21:50 to 22:10), each showing the beam measurement, the initial value, and the goal.]

Fig. 4.31 RF pulse step settings for changing the bunch2 operating points with MG-GPO
[Panels: r bst1, p bst1 [deg], r bst2, p bst2 [deg], r X, p X [deg], r L1, and p L1 [deg] versus time (May 21, 2021, 21:50 to 22:10).]
4.4 Feedback Control with Reinforcement Learning

Reinforcement learning (RL) (Sutton and Barto 2018) is attractive for beam controls since its architecture matches a feedback loop well. Both observe the system outputs and decide the system inputs, aiming to maximize the performance. This section introduces the basic concepts of reinforcement learning and its application in implementing linear optimal controllers. In this case, the control policy and value functions are linear. The concepts, workflow, and methodologies discussed here can be extended to the nonlinear case of deep reinforcement learning (Sewak 2019; Zai and Brown 2020), whose policy and value functions are neural networks.
4.4.1 Introduction to Reinforcement Learning

In reinforcement learning, a software agent affects the environment with actions determined from the observed states of the environment. The state describes the environment's status completely. If the state at a time instant is known, the future states are uniquely determined by the future actions. The agent receives rewards in return as a critique of the quality of the actions. In the agent, a policy determines the action for a given observed state. The policy is trained by the reinforcement learning algorithm based on the actions, states and rewards collected from the environment. An optimal policy tends to maximize the cumulative reward during an episode, which contains multiple steps of execution of the policy. In each step, the policy observes the state, determines the action, applies the action to the environment and receives the reward, and the environment transitions to a new state. Note that the policy interacts with the environment in discrete time steps.

The term "episode" has different meanings in different problems. When playing a video game, an episode starts when we click the "start" button and ends when the game is lost or won. When controlling a superconducting cavity with the agent, an episode can be defined as lasting from the moment a step disturbance happens until the cavity voltage is stabilized at the setpoint again or is driven to an error state beyond the tolerances. The concepts of reinforcement learning are illustrated in Fig. 4.32.

The architecture in Fig. 4.32 can be mapped to a feedback loop. The environment is equivalent to the plant to be controlled, and the policy is the feedback controller. The state and action are the plant's output and input, and the reinforcement learning algorithm corresponds to the synthesizer of the controller. In control theory, we use cost (e.g., tracking error) instead of reward to evaluate the control performance.
Fig. 4.32 Concepts of reinforcement learning
[Block diagram: the agent, containing the policy and the RL algorithm, sends the action to the environment and receives the state and the reward.]
A cost can be simply defined as a negative reward.

When applying reinforcement learning to beam controls, the items in Fig. 4.32 represent the following entities:
• Environment: the physical systems in the accelerator.
• Policy: an optimal controller.
• Action: the control settings of different subsystems (e.g., RF amplitude and phase, magnet current, gun laser intensity and delay, etc.).
• State: the beam parameter measurements (e.g., beam energy, bunch length, bunch arrival time, FEL pulse energy, etc.).
• Cost (negative reward): the beam parameter errors or beam loss.

When used in beam controls, reinforcement learning synthesizes an optimal controller (i.e., control policy) maximizing the integrated performance of a dynamical system during transient responses. For example, when controlling a superconducting cavity, we want to eliminate the disturbances as fast as possible and minimize the integrated cavity voltage error during the transient period. Note that reinforcement learning requires the environment to be dynamical. The next state of the environment is determined only by the current state and the action received. In other words, we cannot directly apply reinforcement learning to static systems unless we introduce some dynamics into the system and use this extended system as the new environment (see Sect. 4.4.4).

As mentioned above, reinforcement learning optimizes the performance of dynamical systems, and its outcome is an optimal controller. In comparison, the optimization algorithms we have discussed, such as SCO, RWO, RCDS, GA, PSO, and MG-GPO, are mainly used to optimize static problems. Their outcome is a set of optimal parameters of the static system.

The policy of an agent realizes part or all of the following three functions, which are fit by the reinforcement learning algorithms based on the data collected from the environment:
(a) Policy function. It determines the optimal action (maximizing the state-value function) for a given state observation. It is often denoted as π(s), where s is the state of the environment (not the Laplace transform variable).
(b) State-value function. It is also called the value function. The value of a state s is the expected cumulative reward starting from the state s, assuming that the policy π controls the environment. Therefore, the value function, denoted as $V^\pi(s)$, is policy dependent.
(c) State-action-value function. It is also known as the Q-function. Assume a is a possible action that can be taken at the state s. The value of the state-action pair (s, a) is the expected cumulative reward if we take the action a at the state s and let the environment be controlled by the policy π from the next state on. It is usually denoted as $Q^\pi(s, a)$.

Let us check the relation between the value function and the Q-function. Suppose that at state s, the possible action is denoted as a, which may take discrete or continuous values. The policy can be written as a probability distribution π(a|s), representing the probability of taking the action a at the state s. The randomness may come from the policy itself (e.g., when we actively choose random actions to explore the environment) or from the noise in the physical systems. For example, even if we have a deterministic policy (e.g., proportional-integral control), the measurement noise in s still introduces randomness in the resulting action. With these assumptions, we have the following relationship:
$$V^\pi(s) = E_a\left[Q^\pi(s, a)\right] = \sum_a \pi(a|s)\, Q^\pi(s, a), \tag{4.24}$$
where $E_a[\cdot]$ is the mathematical expectation with respect to all possible actions. The right-hand side of (4.24) is for discrete actions; the sum should be replaced by an integral if the action is continuous.

Reinforcement learning algorithms can be categorized based on the formats of the policy function, value function, and Q-function, or on which of these functions are included. Traditional reinforcement learning implements these functions as linear functions, whereas deep reinforcement learning uses neural networks. Some algorithms only realize the Q-function and obtain the actions by directly optimizing the Q-function, as in $a^* = \pi(s) = \arg\max_a Q^\pi(s, a)$. They are called value-based algorithms. One example is the Q-learning algorithm. In contrast, policy-based algorithms, such as the policy gradient algorithm, synthesize the policy function explicitly. If an algorithm realizes both the policy and value functions, it belongs to another class called actor-critic algorithms.

This book does not discuss deep reinforcement learning, which is still far from practical usage. It is often hard to tune the neural network hyperparameters for training the agent successfully. Of course, with continued development, deep reinforcement learning may become more stable and find more practical use cases. In Sect. 4.4.2, we introduce an actor-critic algorithm based on linear policy/value/Q functions. It is successful in solving linear quadratic Gaussian (LQG) control problems. Many other learning algorithms (e.g., Q-learning) have been successfully applied to control systems design (Wang et al. 2018; Farjadnasab and Babazadeh 2022). We chose the actor-critic algorithm because it covers most of the key concepts
and basic procedures of reinforcement learning in the domain of feedback control. In Sects. 4.4.3 and 4.4.4, we apply this algorithm to synthesize optimal controllers for a dynamical system (RF cavity) and a static system (SwissFEL bunch2), respectively.
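As a small worked instance of (4.24) and of the value-based rule $a^* = \arg\max_a Q^\pi(s, a)$, consider a single state with three discrete actions. The Q-values and action probabilities below are made-up numbers chosen purely for illustration.

```python
import numpy as np

# Hypothetical Q^pi(s, a) for three discrete actions at one state s.
q_values = np.array([1.0, 3.0, 2.0])
# Hypothetical stochastic policy pi(a|s); the probabilities sum to 1.
policy = np.array([0.2, 0.5, 0.3])

# State value by (4.24): expectation of Q over the policy's action distribution.
state_value = float(policy @ q_values)

# Value-based action selection: pick the action with the largest Q-value
# (standard RL convention, where the value is a reward to be maximized).
greedy_action = int(np.argmax(q_values))
```

Here `state_value` is 0.2·1 + 0.5·3 + 0.3·2 = 2.3 and `greedy_action` is the action with index 1.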
4.4.2 Feedback Controller Design with Natural Actor-Critic Algorithm

Many studies are being conducted to apply reinforcement learning to feedback control design. Reinforcement learning synthesizes optimal feedback controllers from data, as an alternative to the traditional model-based design. The discrete state-space model of a linear time-invariant (LTI) plant is given by
$$\begin{aligned} x_{k+1} &= A x_k + B(u_k + w_k), \quad x|_{k=0} = x_0, \\ y_k &= C x_k + D(u_k + w_k) + v_k, \end{aligned} \tag{4.25}$$
where k = 0, 1, 2, … is the sample time index, and $x \in \mathbb{R}^{n_x}$, $u \in \mathbb{R}^{n_u}$, and $y \in \mathbb{R}^{n_y}$ are the state, input, and output vectors, respectively. The actuator noise $w \in \mathbb{R}^{n_u}$ (i.e., disturbances normalized to the plant input) and measurement noise $v \in \mathbb{R}^{n_y}$ (i.e., detector noise) are assumed to be zero-mean Gaussian noise. Their distributions are $w_k \sim \mathcal{N}(0, \Sigma_w)$ and $v_k \sim \mathcal{N}(0, \Sigma_v)$, where $\Sigma_w$ and $\Sigma_v$ are covariance matrices. A, B, C and D are real constant matrices. The plant should be both controllable and observable. Lack of controllability (observability) implies that more actuators (sensors) are required to control the plant. The infinite horizon LQG problem (Skogestad and Postlethwaite 2005) is formulated to find a sequence of optimal $u_k$ that brings an initial state $x_0$ to the zero state (x = 0) while minimizing the cost function
$$J = E\left[\sum_{i=0}^{\infty}\left(u_i^T R u_i + x_i^T Q x_i\right)\right], \tag{4.26}$$
where Q is a positive semi-definite symmetric matrix (denoted as Q ≥ 0) and R is a positive definite symmetric matrix (denoted as R > 0). Both Q and R are empirical weighting factors to adjust the controller design. We use E[·] to represent the mathematical expectation because both u and x are random vectors in the presence of noise. Minimizing (4.26) implies that the control policy (generating u based on x or y) should reduce the state deviation and the required drive power while bringing the initial state to zero. When the exact model of the plant is known, that is, the matrices A, B, C, D, $\Sigma_w$ and $\Sigma_v$ are given, the LQG problem can be solved explicitly. The resulting controller includes a Kalman filter (for estimating the state x from the input u and output y) and a state feedback law $u_k = -K\hat{x}_k$, where K is a real constant matrix and $\hat{x}_k$ is the state estimate. In reality, the plant model is often unknown. An alternative for
controller design is to use reinforcement learning based on the plant’s input–output data. This subsection will introduce an actor-critic learning method to design optimal controllers (Hua 2021) for LQG problems.
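For comparison with the data-driven approach, the model-based state feedback gain can be sketched as follows for a known plant. The sketch covers only the LQR part of the LQG controller (the Kalman filter is omitted and the state is assumed to be directly available), uses the undiscounted cost (4.26), and obtains the gain by iterating the discrete-time Riccati recursion. The function name `lqr_gain` and the iteration count are assumptions.

```python
import numpy as np

def lqr_gain(A, B, Q, R, n_iter=500):
    """Gain K of the state feedback law u_k = -K x_k for known A and B.

    Iterates P <- Q + A'PA - A'PB (R + B'PB)^-1 B'PA, starting from P = Q.
    """
    P = Q.copy()
    for _ in range(n_iter):
        gain = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ gain)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Scalar illustration with made-up numbers: x_{k+1} = x_k + u_k, Q = R = 1.
K, P = lqr_gain(np.array([[1.0]]), np.array([[1.0]]),
                np.array([[1.0]]), np.array([[1.0]]))
```

For this scalar case the recursion has the fixed point $P = (1 + \sqrt{5})/2$, which gives $K = P/(1 + P) \approx 0.618$ and a stable closed-loop pole at $1 - K \approx 0.382$.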
4.4.2.1 Problem Formulation
We formulate the control problem to be solved as a regulation problem, mainly for disturbance rejection. As in Fig. 4.33, the controller intends to maintain a constant system output (y = 0) against disturbances. We assume the plant is an LTI system, and its model is unknown. The input $u \in \mathbb{R}^{n_u}$ (i.e., the action in reinforcement learning) and the output $y \in \mathbb{R}^{n_y}$ are available from the data acquisition or measurements. The statistical properties of the actuator noise w and measurement noise v are also unknown. We view w and v as part of the environment. In such a noisy environment, the output y and the input u produced by the policy π are random vectors.

Compared to the standard LQG controller, the control policy π in Fig. 4.33 is equivalent to the combination of a Kalman filter and a linear quadratic regulator (LQR). The Kalman filter takes u and y as inputs to estimate the plant's state, and the LQR implements the state feedback law. Since only u and y are available now, we can define an equivalent state of the plant as
$$x_{l,k} = \left[u_{k-1}^T \cdots u_{k-l}^T \;\; y_{k-1}^T \cdots y_{k-l}^T\right]^T \in \mathbb{R}^{n_{xl}}, \quad n_{xl} = l\left(n_u + n_y\right), \tag{4.27}$$
which is constructed from l historical input–output data points of the plant. $x_{l,k}$ can represent the system's full state if l is large enough, i.e., with a sufficient number of historical data points. Typically, l is treated as a hyperparameter and tuned empirically. Note that we use the index (k−1) to denote the just finished time step and k for the upcoming one. The design problem can be formulated as follows: design a control policy π to produce an optimal action $u_k$ (system input for the next time step) based on the state $x_{l,k}$, minimizing the value function
$$V^\pi\left(x_{l,k}\right) = E\left[\sum_{i=k}^{\infty}\gamma^{i-k}\left(u_i^T R u_i + y_i^T Q y_i\right)\right], \tag{4.28}$$
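Fig. 4.33 Formulation of the regulation problem and the map to RL components
[Block diagram: in the agent, the control policy π maps the output y to the input u; in the environment, the actuator noise w is added to u at the plant input and the measurement noise v is added at the plant output to give y.]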
where γ is a discount factor with 0 < γ ≤ 1; Q ≥ 0 and R > 0 are weighting matrices. It implies that the control policy must keep the output y close to zero against disturbances with minimum usage of u. Note that we directly use the cost function (4.26) to define the value function (4.28), so it should be minimized by the policy instead of being maximized. One must distinguish this definition from the value functions (to be maximized) defined in standard reinforcement learning.
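The equivalent state (4.27) and the discounted cost inside the expectation of (4.28) can be written down directly. The sketch below assumes the input and output histories are stored as arrays with one row per time step and truncates the infinite sum to the recorded trajectory; the function names are hypothetical.

```python
import numpy as np

def equivalent_state(u_hist, y_hist, k, l):
    """Build x_{l,k} of (4.27) from the l most recent inputs and outputs.

    u_hist[i] and y_hist[i] hold u_i and y_i; k >= l is required.
    """
    past_u = [u_hist[k - j] for j in range(1, l + 1)]   # u_{k-1}, ..., u_{k-l}
    past_y = [y_hist[k - j] for j in range(1, l + 1)]   # y_{k-1}, ..., y_{k-l}
    return np.concatenate(past_u + past_y)

def discounted_cost(u_seq, y_seq, R, Q, gamma):
    """Finite-trajectory version of the sum in (4.28), starting at i = k."""
    return sum(gamma ** i * (u @ R @ u + y @ Q @ y)
               for i, (u, y) in enumerate(zip(u_seq, y_seq)))
```

With scalar signals u = 0, 1, 2, 3 and y = 10, 11, 12, 13, `equivalent_state(..., k=3, l=2)` stacks the values 2, 1, 12, 11, i.e., the newest input first, followed by the outputs in the same order.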
4.4.2.2 Value Function and Q-Function
The value function (i.e., state-value function) (4.28) is defined for any state $x_{l,k}$, representing the mean integrated cost from time k until the end of control (e.g., after the plant reaches y = 0 and u = 0) by the policy π. Therefore, $V^\pi(x_{l,k})$ is also dependent on π. The discount factor γ determines how important the future cost is. A small γ focuses more on short-term costs and puts less weight on long-term costs. The recursive form of (4.28) follows the Bellman equation:
$$V^\pi\left(x_{l,k}\right) = E\left[u_k^T R u_k + y_k^T Q y_k + \gamma V^\pi\left(x_{l,k+1}\right)\right]. \tag{4.29}$$
We can use the Monte-Carlo (MC) method to estimate the state-value function. Multiple control trajectories can be obtained from the same initial state $x_{l,k}$ using the control policy π. Each trajectory is a sequence of inputs and outputs: $u_k, y_k, u_{k+1}, y_{k+1}, \ldots, u_N, y_N$, assuming the control ends at time N. The mathematical expectation of (4.28) can be estimated with the mean value of the results calculated from these control trajectories. The MC method requires a large amount of data and is inefficient. We will use another algorithm, the temporal difference (TD) method, to estimate the value function (and the Q-function), which uses the data of two steps ($x_{l,k}$, $u_k$ and $y_k$) to update the state or state-action values iteratively.

To evaluate an action $u_k$ at the state $x_{l,k}$, we define the Q-function (i.e., state-action-value function). It represents the mean integrated cost of a control trajectory starting from $x_{l,k}$ and taking the action $u_k$ at this state, letting the policy π control the plant from time k + 1. The only difference between the Q-function and the value function is the action taken at time k: the value function uses the control policy to determine $u_k = \pi(x_{l,k})$, whereas the Q-function takes a specific $u_k$. We can write down the Bellman equation for the Q-function as:
$$Q^\pi\left(x_{l,k}, u_k\right) = u_k^T R u_k + y_k^T Q y_k + \gamma V^\pi\left(x_{l,k+1}\right). \tag{4.30}$$
Compared to (4.29), the mathematical expectation is removed because $u_k$ is deterministic for the Q-function. Although $y_k$ is still random due to the actuator and measurement noise, we also remove its expectation to simplify the calculation. It has been shown that this approximation does not affect the convergence when computing the Q-function using the TD method (Sutton and Barto 2018). The functions $V^\pi(x_{l,k})$ and $Q^\pi(x_{l,k}, u_k)$ will be used to evaluate the performance of the control policy π.
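The Bellman relations (4.29) and (4.30) can be checked on a deliberately simplified case. The sketch assumes a noise-free scalar plant $s_{k+1} = a s_k + b u_k$ whose state is measured directly ($y_k = s_k$) and a fixed linear policy $u_k = K s_k$, so the expectations drop out and the value function has the closed form $V(s) = p s^2$ with $p = (Q + R K^2)/(1 - \gamma (a + bK)^2)$. All numerical values are hypothetical.

```python
# Hypothetical scalar plant, policy gain, weights, and discount factor.
a, b, K = 0.9, 0.1, -0.5
Qw, Rw, gamma = 1.0, 0.1, 0.95

def value_closed_form(s):
    """V(s) = p s^2 for the noise-free scalar loop under u = K s."""
    p = (Qw + Rw * K ** 2) / (1.0 - gamma * (a + b * K) ** 2)
    return p * s ** 2

def value_by_rollout(s, n_steps=2000):
    """Truncated discounted sum of (4.28) along one simulated trajectory."""
    total = 0.0
    for i in range(n_steps):
        u = K * s
        total += gamma ** i * (Rw * u ** 2 + Qw * s ** 2)
        s = a * s + b * u
    return total

def q_function(s, u):
    """Bellman form (4.30): stage cost plus discounted value of the next state."""
    return Rw * u ** 2 + Qw * s ** 2 + gamma * value_closed_form(a * s + b * u)
```

In this setting `value_by_rollout(1.0)` agrees with `value_closed_form(1.0)`, and `q_function(s, K * s)` equals `value_closed_form(s)`, which is (4.29) with the policy's own action inserted.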
4.4.2.3 Strategy of Problem Solving
We follow the strategy below to design the control policy π:
1. Based on the domain knowledge of LTI systems, we define a policy function of a particular format with unknown parameters. It is called a parameterization of the control policy, denoted as $\pi_\theta$, where θ is a vector of the unknown parameters. Then the design problem is simplified to identifying θ.
2. We define a critic function to evaluate the performance of the parameterized policy $\pi_\theta$. The value function is often used as the critic function. It depends on $\pi_\theta$ and, therefore, is a function of θ. We must find the optimal θ* to minimize the critic function. θ* can be determined iteratively using the critic function's gradient with respect to θ, which is known as the policy gradient algorithm.
3. The value function is unknown. It should also be parameterized, i.e., we define a simplified format of the value function with unknown parameters.
4. We use the policy iteration method to improve $\pi_\theta$ iteratively. At each iteration, the unknown parameters of the critic function are identified from the data collected from the system inputs and outputs. Then, the policy parameters (stacked in the vector θ) are updated using the policy gradient algorithm.

The control policy π is also called an actor. Since we identify both the policy and the critic functions, the strategy above is an actor-critic algorithm in the framework of reinforcement learning.
4.4.2.4 Parameterization of Control Policy
In a model-based LQG controller (Skogestad and Postlethwaite 2005), a Kalman filter estimates the states based on $u_{k-1}$ and $y_{k-1}$, and an LQR produces $u_k$ using a state feedback law. Both the Kalman filter and the LQR are linear. Therefore, a natural assumption for the control policy $\pi_\theta$ is a linear function
$$u_k = \pi_\theta\left(x_{l,k}\right) = K x_{l,k}, \tag{4.31}$$
where $K \in \mathbb{R}^{n_u \times n_{xl}}$ is an unknown constant real matrix. The policy parameter vector θ is a column vector formed by stacking the columns of K:
$$\theta = \left[K_{11}\; K_{21} \cdots K_{n_u 1}\; K_{12}\; K_{22} \cdots\right]^T \in \mathbb{R}^{n_u n_{xl}}. \tag{4.32}$$
The actor-critic algorithm used here is an on-policy algorithm, which uses the same policy to obtain the learning data. In contrast, an off-policy algorithm uses a different policy for data collection. When collecting learning data using the policy (4.31), white noise is introduced artificially for persistent excitation, so that
$$u_k = K x_{l,k} + \xi, \tag{4.33}$$
where $\xi \in \mathbb{R}^{n_u}$ is a Gaussian white noise vector with $\xi \sim \mathcal{N}(0, \Sigma_\xi)$. Since ξ is artificially produced, its covariance matrix $\Sigma_\xi$ is known. Typically, ξ is much larger than the plant's unknown disturbances and measurement noise to improve the learning efficiency. Note that ξ is only needed for learning the policy. During regular operation, the control policy (4.31) should be used. In the presence of ξ, $u_k$ becomes a random vector with its probability density function written as
$$\pi_\theta\left(u_k|x_{l,k}\right) = \frac{1}{(2\pi)^{n_u/2}\left|\Sigma_\xi\right|^{1/2}}\, e^{-\frac{1}{2}\left(u_k - K x_{l,k}\right)^T \Sigma_\xi^{-1}\left(u_k - K x_{l,k}\right)} \tag{4.34}$$
derived from (4.33) and ξ’s distribution. It describes the probability density of uk for a given xl,k . It also implies that the mean value of uk is Kxl,k and its covariance matrix is ∑ ξ . Note that we use the notation πθ to describe both the policy function (4.31) and the probability density function (4.34).
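The exploratory policy (4.33) and the logarithm of its density (4.34) can be sketched as follows. The function names are hypothetical, and the gain K, state x, and covariance used in the short usage lines are made-up values.

```python
import numpy as np

def sample_action(K, x, cov, rng):
    """Draw u_k = K x_{l,k} + xi with xi ~ N(0, cov), as in (4.33)."""
    return rng.multivariate_normal(K @ x, cov)

def log_policy(u, K, x, cov):
    """Logarithm of the Gaussian density (4.34)."""
    d = u - K @ x
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d)
                   + len(u) * np.log(2.0 * np.pi) + logdet)

# Usage with hypothetical numbers: two inputs, three-element equivalent state.
rng = np.random.default_rng(0)
K = np.array([[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]])
x = np.array([1.0, 2.0, -1.0])
cov = np.diag([0.04, 0.09])
u = sample_action(K, x, cov, rng)
```

Working with the log-density rather than the density itself is convenient because only $\nabla_\theta \ln \pi_\theta$ appears in the policy gradient derived in the following subsection.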
4.4.2.5 Policy Gradient
To determine the unknown policy parameter vector θ, we need to construct a critic function that depends on θ to evaluate the performance of $\pi_\theta$. Recall that the control goal is to minimize the value function (4.28) with $\pi_\theta$ for any given initial state $x_{l,k}$. Therefore, $V^{\pi_\theta}(x_{l,k})$ is a perfect candidate for the critic function. As implied by the Monte-Carlo method, the value of a state can be computed by averaging the values of multiple control trajectories starting from that state. We define the control trajectory explicitly using the states and inputs of the plant:
$$\tau := \left\{x_{l,k}, u_k, x_{l,k+1}, u_{k+1}, \ldots\right\} \sim \pi_\theta. \tag{4.35}$$
Note that the output $y_k$ is contained in the state $x_{l,k+1}$. In a control trajectory, the control action $u_k$ is derived from the state $x_{l,k}$ according to (4.33). Since we have added the noise ξ, any two control trajectories starting from the same initial state $x_{l,k}$ will differ from each other. In other words, the control trajectory τ is a random process and its control actions are sampled from the distribution $\pi_\theta$ given by (4.34). For each particular trajectory τ, we can compute the value as follows:
$$V_\tau^{\pi_\theta}\left(x_{l,k}\right) = \sum_{i=k}^{\infty}\gamma^{i-k}\left(u_i^T R u_i + y_i^T Q y_i\right), \tag{4.36}$$
where the inputs and outputs are those in the trajectory τ. Then, the value function is calculated as the mean of the values of all possible instances of τ:
$$V^{\pi_\theta}\left(x_{l,k}\right) = E_\tau\left[V_\tau^{\pi_\theta}\left(x_{l,k}\right)\right] = \sum_\tau P_\tau V_\tau^{\pi_\theta}\left(x_{l,k}\right), \tag{4.37}$$
where $E_\tau[\cdot]$ is the mathematical expectation over all possible τ, and $P_\tau$ is the probability of τ, which is a function of θ and given by
$$P_\tau(\theta) = P\left(x_{l,k}\right)\pi_\theta\left(u_k|x_{l,k}\right)P\left(x_{l,k+1}|x_{l,k}, u_k\right)\pi_\theta\left(u_{k+1}|x_{l,k+1}\right)\cdots \tag{4.38}$$
Here P describes the state transition probability, which is a property of the plant and is independent of θ.

Given the value function $V^{\pi_\theta}(x_{l,k})$, the optimal θ can be obtained by solving the following optimization problem
$$\theta^* = \arg\min_\theta V^{\pi_\theta}\left(x_{l,k}\right) = \arg\min_\theta \sum_\tau P_\tau(\theta)\, V_\tau^{\pi_\theta}\left(x_{l,k}\right). \tag{4.39}$$
The gradient descent algorithm can be used to solve this problem, which computes θ* iteratively:
$$\theta_{j+1} = \theta_j - \alpha \left.\nabla_\theta V^{\pi_\theta}\left(x_{l,k}\right)\right|_{\theta=\theta_j}, \tag{4.40}$$
where $\nabla_\theta = \left[\partial/\partial\theta_1\;\; \partial/\partial\theta_2 \cdots\right]^T$ is the gradient with respect to θ, and α is a positive number to control the convergence speed. This method is known as the policy gradient algorithm (Sutton and Barto 2018). Now the key point is to calculate the gradient
$$\nabla_\theta V^{\pi_\theta}\left(x_{l,k}\right) = \sum_\tau V_\tau^{\pi_\theta}\left(x_{l,k}\right)\nabla_\theta P_\tau(\theta). \tag{4.41}$$
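Here we used the fact that $V_\tau^{\pi_\theta}(x_{l,k})$ is a scalar value for a specific trajectory and does not contain θ. The gradient term in (4.41) can be calculated as
$$\begin{aligned} \nabla_\theta P_\tau(\theta) &= P_\tau(\theta)\frac{\nabla_\theta P_\tau(\theta)}{P_\tau(\theta)} = P_\tau(\theta)\nabla_\theta \ln P_\tau(\theta) \\ &= P_\tau(\theta)\nabla_\theta\left[\ln P\left(x_{l,k}\right) + \ln \pi_\theta\left(u_k|x_{l,k}\right) + \ln P\left(x_{l,k+1}|x_{l,k}, u_k\right) + \cdots\right] \\ &= P_\tau(\theta)\sum_{i=k}^{\infty}\nabla_\theta \ln \pi_\theta\left(u_i|x_{l,i}\right), \end{aligned} \tag{4.42}$$
considering that the transition probability of the plant state is independent of θ. Then we get the formula of the policy gradient:
$$\begin{aligned} \nabla_\theta V^{\pi_\theta}\left(x_{l,k}\right) &= \sum_\tau V_\tau^{\pi_\theta}\left(x_{l,k}\right)P_\tau(\theta)\sum_{i=k}^{\infty}\nabla_\theta \ln \pi_\theta\left(u_i|x_{l,i}\right) \\ &= E_\tau\left[V_\tau^{\pi_\theta}\left(x_{l,k}\right)\sum_{i=k}^{\infty}\nabla_\theta \ln \pi_\theta\left(u_i|x_{l,i}\right)\right]. \end{aligned} \tag{4.43}$$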
The policy gradient is calculated using the gradient of the log-probability of each state-action pair $(x_{l,i}, u_i)$ weighted by the value of the trajectory. This means that if the value of the trajectory containing $(x_{l,i}, u_i)$ is small (or large), the probability of taking $u_i$ in state $x_{l,i}$ is increased (or decreased). In (4.43), the weighting factor $V_\tau^{\pi_\theta}(x_{l,k})$ is applied to all state-action pairs in the trajectory and affects their probabilities equally. This is not accurate because even if $V_\tau^{\pi_\theta}(x_{l,k})$ is small (i.e., the trajectory τ performs well), some state-action pairs in the trajectory may still perform poorly, and it is not appropriate to increase their probabilities as we do for the others. Therefore, we define an advantage function $A^{\pi_\theta}$ for each state-action pair as its weighting factor, and (4.43) is rewritten as
$$\nabla_\theta V^{\pi_\theta}\left(x_{l,k}\right) = E_\tau\left[\sum_{i=k}^{\infty}A^{\pi_\theta}\left(x_{l,i}, u_i\right)\nabla_\theta \ln \pi_\theta\left(u_i|x_{l,i}\right)\right], \tag{4.44}$$
where the advantage function is defined as
$$\begin{aligned} A^{\pi_\theta}\left(x_{l,i}, u_i\right) &= Q^{\pi_\theta}\left(x_{l,i}, u_i\right) - V^{\pi_\theta}\left(x_{l,i}\right) \\ &= u_i^T R u_i + y_i^T Q y_i + \gamma V^{\pi_\theta}\left(x_{l,i+1}\right) - V^{\pi_\theta}\left(x_{l,i}\right). \end{aligned} \tag{4.45}$$
The advantage function is defined as the unbiased Q-function of the state-action pair. It expresses the relative advantage of an action compared to other actions for the same state. By subtracting the mean of the Q-function with respect to all possible actions (i.e., the value function), we can reasonably assume that actions with a negative advantage are better than those with a positive advantage (note that with the definitions in Sect. 4.4.2.2, we expect lower value and Q-functions). According to (4.24), we have $V^{\pi_\theta}(x_{l,i}) = E_u\left[Q^{\pi_\theta}(x_{l,i}, u_i)\right]$, where $E_u[\cdot]$ is the mathematical expectation with respect to all possible u. The advantage function is unbiased, so the policy gradient (4.44) can increase or decrease the probability of taking $u_i$ in state $x_{l,i}$ in absolute terms, depending on the value of $A^{\pi_\theta}(x_{l,i}, u_i)$. In (4.44), the term $\nabla_\theta \ln \pi_\theta(u_i|x_{l,i})$ can be calculated with (4.34). To calculate the policy gradient, we still need the advantage function, which is determined by the value function of two steps (i.e., with the TD method) according to (4.45).
4.4.2.6 Parameterization of Value and Advantage Functions
In this part, we simplify the format of the value and advantage functions and parameterize them. It has been shown (Hua 2021) that if the control policy follows (4.31), the value function can be written as a function that is linear in its parameters:
$$V^{\pi_\theta}\left(x_{h,k}\right) = x_{h,k}^T P x_{h,k} + c = \left[x_{h,k}^T \otimes x_{h,k}^T \;\; 1\right] p, \tag{4.46}$$
where $P \in \mathbb{R}^{n_{xh} \times n_{xh}}$ is a constant symmetric matrix with P > 0, and c is a constant. We use h to denote the number of historical inputs and outputs that form the equivalent state (see (4.27)). It provides the flexibility to define the policy function and the value function independently. Here $n_{xh} = h\left(n_u + n_y\right)$. We have written the parameterization in vector format, where $p \in \mathbb{R}^{n_{xh} n_{xh} + 1}$ is a column vector formed by stacking the columns of P (similar to (4.32)) and the constant c. The operator ⊗ represents the Kronecker product. The Kronecker product of two matrices of arbitrary size, A ⊗ B, is a block matrix in which each block is the product of the corresponding element of A and the matrix B. For example, if A = [1 2 3] and B = [4 5], then A ⊗ B = [4 5 8 10 12 15].

The articles (Sutton et al. 1999; Peters et al. 2005) indicate that the advantage function (4.45) can be replaced by a compatible function approximation (i is the time index to match (4.45)):
$$A^{\pi_\theta}\left(x_{l,i}, u_i\right) \approx \left[\nabla_\theta \ln \pi_\theta\left(u_i|x_{l,i}\right)\right]^T \omega, \tag{4.47}$$
where $\omega \in \mathbb{R}^{n_u n_{xl}}$ is an unknown parameter vector. With some algebra, the gradient term can be calculated from (4.34) as
$$\nabla_\theta \ln \pi_\theta\left(u_i|x_{l,i}\right) = \left(x_{l,i} \otimes I_{n_u}\right)\Sigma_\xi^{-1}\left(u_i - K x_{l,i}\right), \tag{4.48}$$
where $I_{n_u}$ is the identity matrix of size $n_u$. Substituting (4.46), (4.47), and (4.48) into (4.45), we obtain an equation for the unknown parameter vectors p and ω:
$$\left(\left[x_{h,i}^T \otimes x_{h,i}^T \;\; 1\right] - \gamma\left[x_{h,i+1}^T \otimes x_{h,i+1}^T \;\; 1\right]\right)p + \left[\left(x_{l,i} \otimes I_{n_u}\right)\Sigma_\xi^{-1}\left(u_i - K x_{l,i}\right)\right]^T \omega = u_i^T R u_i + y_i^T Q y_i. \tag{4.49}$$
The input–output data of the plant can be obtained by applying the policy πθ with Gaussian noise according to (4.33). Each set of data, (ui , yi , xl,i , xh,i , xh,i+1 ), can form an equation with (4.49). If we collect enough data points, the parameter vectors p and ω can be calculated using the least-square method.
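The building blocks of the regression (4.49) can be sketched as follows: the score vector (4.48), the value-function features of (4.46), and one regression row with its target. The function names are hypothetical. Stacking such rows for many data points and solving for [p; ω] with a least-squares routine such as `numpy.linalg.lstsq` is one possible way to carry out the fit; the text does not prescribe a particular solver.

```python
import numpy as np

def score(u, K, x_l, cov):
    """(4.48): gradient of ln pi_theta(u|x_l) with respect to theta = vec(K)."""
    n_u = K.shape[0]
    kron = np.kron(x_l.reshape(-1, 1), np.eye(n_u))   # x_l (x) I_{n_u}
    return kron @ np.linalg.solve(cov, u - K @ x_l)

def value_features(x_h):
    """Row vector [x_h' (x) x_h', 1] multiplying p in (4.46)."""
    return np.append(np.kron(x_h, x_h), 1.0)

def regression_row(x_l, x_h, x_h_next, u, y, K, cov, Q, R, gamma):
    """Left-hand-side coefficients and right-hand-side target of (4.49)."""
    row = np.concatenate([
        value_features(x_h) - gamma * value_features(x_h_next),  # multiplies p
        score(u, K, x_l, cov),                                   # multiplies omega
    ])
    target = u @ R @ u + y @ Q @ y
    return row, target
```

Because P is symmetric, the columns of the feature matrix that belong to the entries $P_{ij}$ and $P_{ji}$ are identical, so the stacked system is rank deficient in p. A minimum-norm least-squares solution handles this, and ω is still identified through the score columns.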
4.4.2.7 Natural Policy Gradient Algorithm
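With the parameterized advantage function, the policy gradient can be written as

$$\nabla_\theta V^{\pi_\theta}(x_{l,k}) = E_\tau \left[ \sum_{i=k}^{\infty} \nabla_\theta \ln \pi_\theta(u_i \mid x_{l,i}) \left[ \nabla_\theta \ln \pi_\theta(u_i \mid x_{l,i}) \right]^T \right] \omega = F_\theta\, \omega. \tag{4.50}$$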
Such a gradient is not easy to compute since the expectation must be estimated over all possible control trajectories. The article by Peters et al. (2005) replaces the gradient (4.50) with the natural gradient $\tilde{\nabla}_\theta V^{\pi_\theta}(x_{l,k})$ to simplify
the calculation. This method does not follow the steepest direction in the parameter space but the steepest direction with respect to the Fisher metric:

$$\tilde{\nabla}_\theta V^{\pi_\theta}(x_{l,k}) = G^{-1}(\theta)\, \nabla_\theta V^{\pi_\theta}(x_{l,k}), \tag{4.51}$$
where $G(\theta)$ is the Fisher information matrix. The natural gradient can still guide the search for $\theta$ to the local minima of the value function. This is because $G(\theta)$ is positive definite, which guarantees that the angle between the natural gradient and the ordinary gradient is less than 90°. It has been proved in the article (Peters et al. 2005) that in our case, $F_\theta$ is the Fisher information matrix, so that the natural gradient simplifies to

$$\tilde{\nabla}_\theta V^{\pi_\theta}(x_{l,k}) = G^{-1}(\theta) F_\theta\, \omega = \omega. \tag{4.52}$$
This is an incredibly simple result. Therefore, if we manage to obtain the value of $\omega$ via the fitting of (4.49), the parameter $\theta$ can be updated following the rule below:

$$\theta_{j+1} = \theta_j - \alpha\, \omega \big|_{\theta = \theta_j}. \tag{4.53}$$
This algorithm is called the natural actor-critic (NAC) algorithm, as summarized in Table 4.3.

Table 4.3 NAC algorithm

Input: Numbers $l$ and $h$ ($l \le h$) of historical input–output points for the equivalent state definitions in the policy and value functions; weighting matrices $Q \ge 0$ and $R > 0$; discount factor $\gamma$; initial stabilizing control policy parameter vector $\theta_0$; excitation noise $\xi$; policy gradient descent speed $\alpha$

Initialize: Record the operating point of the plant (i.e., the initial input and output vectors, $u_{OP}$ and $y_{OP}$)

Repeat for iterations (i.e., episodes) $j = 0, 1, 2, \ldots$
1. Initialize the system state so that it deviates from the operating point. This can be done by introducing a small disturbance to the plant (e.g., a small deviation of the plant input or a small external disturbance; to disturb a superconducting cavity, for instance, we can generate a mechanical vibration via the piezo tuner)
2. Regulate the plant with the control policy $\pi_{\theta_j}$ with Gaussian noise according to (4.33) and collect the plant input–output data
3. Subtract the operating point input $u_{OP}$ (and output $y_{OP}$) from the plant input (and output), resulting in the incremental input and output of the plant. This step is necessary to synthesize the regulator
4. Using the incremental input and output data, solve (4.49) with the least-squares method and update the control policy to $\pi_{\theta_{j+1}}$ according to (4.53)
5. Stop the iteration if the parameter vector $\theta$ converges (i.e., $\|\theta_{j+1} - \theta_j\| < \varepsilon$ with $\varepsilon$ a small positive number) or the control performance is satisfactory (i.e., the incremental output quickly approaches 0 with a reasonable input)
End
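The outer loop of Table 4.3 can be sketched as follows. This is a skeleton under assumptions rather than a complete NAC implementation: steps 1 to 4 (excitation, data collection, operating-point subtraction, and the least-squares fit) are hidden behind a hypothetical callable `evaluate_policy` that returns ω for the current θ. To exercise the loop without a plant, the callable is replaced by the gradient of a toy quadratic cost, which is purely a stand-in.

```python
import numpy as np

def nac_outer_loop(theta0, evaluate_policy, alpha=0.1, eps=1e-6, max_iter=100):
    """Iterate theta_{j+1} = theta_j - alpha * omega, as in (4.53).

    evaluate_policy(theta) is a hypothetical callable standing in for steps 1-4
    of Table 4.3; it must return the vector omega fitted from (4.49).
    """
    theta = np.asarray(theta0, dtype=float)
    history = [theta.copy()]
    for _ in range(max_iter):
        omega = evaluate_policy(theta)
        theta_next = theta - alpha * omega
        history.append(theta_next.copy())
        converged = np.linalg.norm(theta_next - theta) < eps   # step 5
        theta = theta_next
        if converged:
            break
    return theta, np.array(history)

# Stand-in for the critic: gradient of the toy cost 0.5 * ||theta - theta_star||^2
theta_star = np.array([1.0, -2.0])
theta_final, history = nac_outer_loop(
    np.zeros(2), lambda th: th - theta_star, eps=1e-8, max_iter=500
)
```

In a real application the stopping test on θ would typically be combined with a check of the closed-loop performance, as noted in step 5.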
4.4.3 Example: RF Cavity Controller Design

In this subsection, we implement an optimal controller for an RF cavity using the NAC algorithm. In particle accelerators, RF cavities are used to accelerate charged particles. Low-level radio frequency (LLRF) systems (Simrock and Geng 2022) are employed to control the cavity fields for desired and stable beam acceleration. An LLRF feedback loop controlling the cavity usually has two different operation modes:

(a) Controlling the amplitude and phase of the cavity field to track a time-varying setpoint. This is the tracking mode, and the controller is called a tracker.
(b) Stabilizing the cavity field around a constant setpoint against disturbances. This is the regulation mode, and the controller is referred to as a regulator.

The tracking mode is applied to cavities operating in the pulsed mode or in booster storage rings, where the cavity voltage must be ramped up when boosting the beam energy. The regulation mode is mainly used to control cavities operated in the continuous-wave (CW) mode, e.g., in CW superconducting free-electron lasers (FELs) or synchrotron radiation storage rings. Most beam feedback loops (e.g., beam energy feedback, orbit feedback, etc.) operate in the regulation mode. Typically, we set up and optimize the beam parameters with the beam feedbacks turned off, then close the loops to stabilize the beam at the achieved operating point.

The NAC algorithm is used to synthesize optimal regulators. It can be considered a data-driven version of the LQG design method. Here, “optimal” means optimal with respect to Gaussian disturbances at the system input. The controller designed with the NAC algorithm may not perform well for a tracking problem. We will demonstrate this later after implementing the cavity regulator. The regulator design problem for an RF cavity is formulated in Fig. 4.34.
The nominal operating point of the cavity is defined by a feedforward $u_{FF}$ and the resulting cavity voltage $y_{op}$ with the feedback loop open. We denote this relationship as $y_{op}(s) = G_C(s) u_{FF}(s)$, where $G_C(s)$ is the transfer function of the cavity and the input and output vectors are represented as Laplace transforms. With unknown disturbances, the cavity voltage deviates from $y_{op}$, and the incremental cavity voltage $y_{inc} = y_{cav} - y_{op}$ serves as the input signal to the control policy. In this configuration, the control policy $\pi$ only stabilizes the cavity against disturbances, and the control objective is to regulate $y_{inc}$ to 0 with minimum $u_{FB}$. This matches well with the definition of a regulation problem.

As an example, we implement an optimal regulator for a TESLA cavity with the following parameters: cavity resonance frequency $f_0 = 1.3$ GHz, loaded quality factor $Q_L = 3 \times 10^6$, and a detuning of 10 times the cavity half-bandwidth. The half-bandwidth of the cavity is calculated by

$$\omega_{1/2} = \frac{\omega_0}{2 Q_L} = \frac{\pi f_0}{Q_L}. \tag{4.54}$$
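As a quick numerical check of (4.54) for the quoted parameters, the short script below evaluates the half-bandwidth, the corresponding field decay time constant (taken here as 1/ω1/2, which follows from the first-order cavity model), and the assumed detuning. The variable names are ours.

```python
import math

f0 = 1.3e9                                # cavity resonance frequency [Hz]
QL = 3e6                                  # loaded quality factor
omega_half = math.pi * f0 / QL            # half-bandwidth [rad/s], from (4.54)
f_half = omega_half / (2 * math.pi)       # half-bandwidth [Hz]
tau = 1.0 / omega_half                    # field decay time constant [s]
delta_omega = 10 * omega_half             # detuning assumed in the example [rad/s]

print(f"omega_half = {omega_half:.1f} rad/s ({f_half:.1f} Hz)")
print(f"tau = {tau * 1e6:.0f} us, detuning = {delta_omega / (2 * math.pi):.0f} Hz")
```

The half-bandwidth comes out at about 1361 rad/s (roughly 217 Hz), which corresponds to a time constant of about 735 μs.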
Fig. 4.34 Formulation of the regulation problem of an RF cavity. The actuator and measurement noise is not shown
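We design the control policy using the data obtained from simulation. The cavity behavior is modeled with the following continuous state-space equation

$$\frac{d}{dt} \begin{bmatrix} y_{cavI} \\ y_{cavQ} \end{bmatrix} = \begin{bmatrix} -\omega_{1/2} & -\Delta\omega \\ \Delta\omega & -\omega_{1/2} \end{bmatrix} \begin{bmatrix} y_{cavI} \\ y_{cavQ} \end{bmatrix} + \omega_{1/2} \begin{bmatrix} u_{cavI} \\ u_{cavQ} \end{bmatrix}, \tag{4.55}$$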
where $u_{cav} = [u_{cavI}\ u_{cavQ}]^T$ and $y_{cav} = [y_{cavI}\ y_{cavQ}]^T$. This is the time-domain representation of the cavity transfer function $G_C(s)$. The subscripts I and Q denote the in-phase and quadrature components of the signal, representing the real and imaginary parts of the complex envelopes of the cavity input–output RF signals. The detuning, $\Delta\omega = \omega_0 - \omega_{RF}$, is the difference between the cavity resonance (angular) frequency $\omega_0$ and the frequency of the drive RF, $\omega_{RF}$. As mentioned above, we assume $\Delta\omega = 10\,\omega_{1/2}$ in the simulation. From the properties of linear systems, the input ($u_{FB}$) and output ($y_{inc}$) of the environment in Fig. 4.34 also satisfy (4.55). After discretization, the environment can be written in the form of (4.25) with the variables and matrices as follows:

$$\begin{aligned} &u \leftarrow u_{FB} = \begin{bmatrix} u_{FBI} & u_{FBQ} \end{bmatrix}^T, \quad x \leftarrow y_{inc} = \begin{bmatrix} y_{incI} & y_{incQ} \end{bmatrix}^T, \quad y \leftarrow y_{inc}, \\ &A = \begin{bmatrix} 1 - T_s \omega_{1/2} & -T_s \Delta\omega \\ T_s \Delta\omega & 1 - T_s \omega_{1/2} \end{bmatrix}, \quad B = \begin{bmatrix} T_s \omega_{1/2} & 0 \\ 0 & T_s \omega_{1/2} \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad D = 0, \end{aligned} \tag{4.56}$$

where $u$, $x$, and $y$ are the general representations of the input, state, and output vectors in (4.25), which are mapped to the variables in Fig. 4.34. Here $T_s$ is the sampling time. Note that we have neglected the actuator noise $w$ and measurement noise $v$ in this study.

The settings of the NAC algorithm are as follows. The length of the historical inputs and outputs for the policy function parameterization is $l = 1$ and for the value function is $h = 4$. The discount factor is $\gamma = 1$, and the policy gradient descent speed is $\alpha = 0.1$. The initial policy parameter vector is $\theta_0 = 0$. The weighting matrices $Q$ and $R$ and the covariance matrix of the excitation noise $\xi$ are given by
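$$Q = \begin{bmatrix} 20 & 0 \\ 0 & 20 \end{bmatrix}, \quad R = \begin{bmatrix} 0.1 & 0 \\ 0 & 0.1 \end{bmatrix}, \quad \Sigma_\xi = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \tag{4.57}$$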
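The environment defined by (4.56) and the cost weights of (4.57) can be simulated directly, as in the sketch below. Several simplifications are our assumptions: the sampling time is the 10 μs used in this study, the excitation noise is omitted, and the feedback is a static gain acting on the current output, with `K_demo = -5 I` chosen by hand as a hypothetical example. It is not the policy learned by the NAC algorithm, which also uses past inputs and outputs.

```python
import numpy as np

f0, QL, Ts = 1.3e9, 3e6, 10e-6            # values quoted in the text
w_half = np.pi * f0 / QL                  # half-bandwidth (4.54) [rad/s]
dw = 10 * w_half                          # detuning used in the example [rad/s]

# Discrete matrices of (4.56)
A = np.array([[1 - Ts * w_half, -Ts * dw],
              [Ts * dw, 1 - Ts * w_half]])
B = Ts * w_half * np.eye(2)
Q, R = 20 * np.eye(2), 0.1 * np.eye(2)    # weighting matrices of (4.57)

def run(K, y0, n_steps=500):
    """Simulate y_{k+1} = A y_k + B u_k with u_k = K y_k (no excitation noise).

    Returns the output history and the accumulated cost sum(u'Ru + y'Qy).
    """
    y = np.array(y0, dtype=float)
    history, cost = [], 0.0
    for _ in range(n_steps):
        u = K @ y
        cost += u @ R @ u + y @ Q @ y
        history.append(y.copy())
        y = A @ y + B @ u
    return np.array(history), cost

y0 = [0.88, -0.59]                        # initial incremental output
open_hist, open_cost = run(np.zeros((2, 2)), y0)
K_demo = -5.0 * np.eye(2)                 # hypothetical gain, NOT the learned policy
closed_hist, closed_cost = run(K_demo, y0)

pole_radius = np.max(np.abs(np.linalg.eigvals(A)))
print(f"open-loop pole radius: {pole_radius:.5f}")
print(f"cost, open loop: {open_cost:.1f}; cost, demo gain: {closed_cost:.1f}")
```

With these numbers the open-loop poles of the discrete model lie at a radius of about 0.9957, so the model is stable but its transient decays slowly and oscillates because of the detuning. Even the hand-picked gain reduces the accumulated cost considerably, which is the quantity the NAC algorithm minimizes systematically.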
In this study, a sampling time $T_s = 10$ μs is chosen, which is much smaller than the cavity’s time constant (735 μs), so that the transient response can be resolved. After 100 iterations (i.e., episodes) of running the NAC algorithm, the parameter vector $\theta$ of the control policy converges, and its evolution during the learning process is shown in Fig. 4.35. The learned control policy (also referred to as the NAC regulator) is used to regulate the cavity with an initial output $y_{inc0} = [0.88\ {-0.59}]^T$ and with the excitation noise present. It is assumed that the operating point is $u_{FF} = 0$ and $y_{op} = 0$. The open-loop and closed-loop outputs of the cavity are compared in Fig. 4.36. The regulator shows a good ability to suppress disturbances. The achieved controller also works well for different levels of cavity detuning.

Figure 4.34 implies that by replacing $y_{op}$ with a time-varying setpoint, such as the linear ramp in Fig. 4.37, the control policy may be used for setpoint tracking. This is because the control policy always tries to keep $y_{inc} = y_{cav} - y_{op}$ close to zero, which is equivalent to keeping $y_{cav} = y_{op}$. Simulation shows that this does not work well; see the curves labeled “Closed loop (NAC)” in Fig. 4.37, which are the cavity response to a time-varying $y_{op}(t)$ under the control of the NAC regulator. The poor tracking performance is to be expected from the NAC design process, in which the reference signal (i.e., setpoint) is not considered and both the input and output of the plant are random signals. The tracking error is better explained in the frequency domain. As given by (4.31), the control policy of the NAC regulator is a constant matrix $K$, providing a constant gain to the feedback loop at all frequencies. The NAC design process focuses on adapting $K$ iteratively to mitigate the wideband Gaussian disturbances (i.e., $\xi$ in Eq. 4.33) acting on the plant input. However, such a $K$ may
Fig. 4.35 Evolution of the parameters θ (with 8 elements) during learning. Panels (1) to (8) show one element each against the iteration number (0 to 100)
Fig. 4.36 Performance of the synthesized control policy. In the open-loop case, the initial cavity voltage is damped slowly due to the large time constant of the cavity, and the oscillations arise from the significant detuning of the cavity. In the closed-loop case, the NAC regulator quickly controls the (incremental) cavity voltage to zero. Panels: $y_{incI}$ and $y_{incQ}$ versus time (0 to 5 × 10⁻³ s), open loop and closed loop
not provide sufficient low-frequency loop gain associated with the dynamics of the cavity, GC . A large low-frequency loop gain is necessary for improving the tracking performance. Specifically, if we want to remove the steady-state error completely, the loop gain at DC should approach infinity, i.e., an integrator should be included in the loop. Many studies have been carried out to solve tracking problems with reinforcement learning algorithms (Modares and Lewis 2013; Kiumarsi et al. 2014). We will not discuss these algorithms here. Designing optimal controllers for tracking problems is difficult because we can often only optimize for a specific reference signal (e.g., step, linear ramp, or sinusoidal signals). The traditional frequency-domain design methods (e.g., loop shaping, internal model control (IMC), etc.) are better suited for designing tracking controllers. If we still want to benefit from the NAC regulator but with improved tracking performance, additional control must be introduced. One choice is to implement a feedforward controller that determines an accurate feedforward signal u F F (t) for achieving the desired yop (t). This is the original idea of the configuration in Fig. 4.34, where the regulator only controls random disturbances, and the feedforward establishes the desired (time-varying) operating point. With an arbitrary desired output yop (t), an inverse of the plant model is required to determine the required feedforward in real-time. This is not easy, especially when there is a delay in the system. However, if yop (t) is repetitive, such as when the cavity operates in the pulsed mode, the required feedforward signal can be determined with adaptive algorithms (Simrock and Geng 2022). As shown in Fig. 4.37, an accurate feedforward results in excellent tracking of the setpoint. See the curves labeled by “Closed loop (NAC + FF)”. Alternatively, we consider introducing a separate integral controller to improve the tracking performance. 
The integral controller has good tracking performance for low-frequency setpoint changes, as demonstrated by the curves of “Closed loop (NAC + I FB)”. Of course, the integral gain should be optimized not to cause instability with
Fig. 4.37 Comparison of the tracking performance of NAC regulator and NAC regulator with feedforward or with additional integral control. The setpoint corresponds to $y_{op}$. Panels: $y_{cavI}$ and $y_{cavQ}$ versus time (0 to 5 × 10⁻³ s); curves: Setpoint, Open loop, Closed loop (NAC), Closed loop (NAC + FF), Closed loop (NAC + I FB)
the NAC regulator present. Typically, we use the integral control to perform low-frequency setpoint tracking and use the NAC regulator to suppress wideband disturbances. It can be seen from Fig. 4.37 that the NAC regulator suppresses random disturbances quite well in all closed-loop cases compared to the open-loop results.
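The role of the low-frequency loop gain can be illustrated with a deliberately simplified example. The sketch below is our own toy setup, not the simulation behind Fig. 4.37: it uses a scalar, on-resonance version of the discrete cavity model (4.56), a hand-picked proportional gain in place of the NAC regulator, and a unit step setpoint. With a constant gain alone, the steady-state error is 1/(1 + kp); adding an integrator removes it.

```python
import numpy as np

Ts = 10e-6                                 # sampling time [s]
w_half = np.pi * 1.3e9 / 3e6               # cavity half-bandwidth [rad/s]
a, b = 1 - Ts * w_half, Ts * w_half        # scalar, on-resonance version of (4.56)

def track_step(kp, ki, n_steps=2000, setpoint=1.0):
    """Closed-loop step response with u = kp*e + ki*sum(e); returns the final error."""
    y, integ = 0.0, 0.0
    for _ in range(n_steps):
        e = setpoint - y
        integ += e                         # discrete integrator of the error
        u = kp * e + ki * integ
        y = a * y + b * u                  # first-order plant with DC gain 1
    return setpoint - y

err_p = track_step(kp=10.0, ki=0.0)        # constant gain only
err_pi = track_step(kp=10.0, ki=0.5)       # constant gain plus integrator

print(f"final error, gain only:         {err_p:.4f}")
print(f"final error, gain + integrator: {err_pi:.2e}")
```

The gains `kp` and `ki` are illustrative values that keep this particular loop stable; in practice they would have to be tuned together with the regulator, as noted above.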
4.4.4 Example: Static Feedback Controller Design

As discussed in Chap. 2, the beam response of an accelerator is often modeled as a MIMO system. It typically consists of a static response matrix and a SISO transfer function describing the dynamics of the response. To control such a system, we split the controller into two parts, dealing with the static and dynamical parts of the beam response separately. The dynamical controller is relatively simple. In most cases, a proportional-integral-derivative (PID) controller is sufficient, which can be designed using the loop shaping or IMC methods. The static controller, which is in principle a pseudo-inverse of the response matrix, is more challenging to design. Some design methods have been discussed in Sect. 2.4.
The reinforcement learning method discussed in this chapter offers an alternative for designing optimal controllers, as demonstrated with a dynamical system (RF cavity) in the last subsection. This subsection will apply the NAC algorithm to a static system. The SwissFEL bunch2 response matrix, first given in Chap. 2 and repeated in (4.58), will be the plant to be controlled in our simulation study.

$$R_{sys} = \begin{bmatrix} 0.339 & 0.107 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0.877 & -0.213 & 1.542 & -0.107 & -1.188 & -0.149 & 0 & 0 \\ -0.260 & -0.776 & -0.465 & -0.201 & -0.035 & 0.529 & 0 & 0 \\ -0.661 & 0.207 & -1.590 & 0.083 & 1.207 & -0.075 & -1.123 & 3.044 \\ 0.400 & -0.274 & 0.723 & -0.047 & -0.785 & 0.230 & 0.857 & -0.546 \end{bmatrix}. \tag{4.58}$$
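For numerical experiments, the matrix in (4.58) can be entered as a NumPy array. The sketch below, with variable names of our choosing, checks that the matrix has full row rank, forms its Moore–Penrose pseudo-inverse with `np.linalg.pinv`, and computes the Euclidean norm of each row as a rough indication of how strongly each output responds to the inputs.

```python
import numpy as np

# Response matrix (4.58): 5 outputs (rows) by 8 inputs (columns)
R_sys = np.array([
    [ 0.339,  0.107,  0.000,  0.000,  0.000,  0.000,  0.000,  0.000],
    [ 0.877, -0.213,  1.542, -0.107, -1.188, -0.149,  0.000,  0.000],
    [-0.260, -0.776, -0.465, -0.201, -0.035,  0.529,  0.000,  0.000],
    [-0.661,  0.207, -1.590,  0.083,  1.207, -0.075, -1.123,  3.044],
    [ 0.400, -0.274,  0.723, -0.047, -0.785,  0.230,  0.857, -0.546],
])

rank = np.linalg.matrix_rank(R_sys)        # number of independently controllable outputs
R_pinv = np.linalg.pinv(R_sys)             # 8 x 5 Moore-Penrose pseudo-inverse
row_gain = np.linalg.norm(R_sys, axis=1)   # Euclidean norm of each output row

print("rank:", rank)
print("row norms:", np.round(row_gain, 3))
```

Since the matrix has full row rank, `R_sys @ R_pinv` equals the 5 × 5 unit matrix, which is the property assumed later when the closed-loop stability is discussed. The first row has the smallest norm, i.e., output 1 responds most weakly to the inputs.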
Since we only demonstrate the static controller design with simulation, we will not distinguish the physical meanings of the inputs and outputs. The inputs will be denoted as “input n” with n = 1,…,8 and the outputs as “output m” with m = 1,…,5.

The theoretical basis of reinforcement learning is the Markov process, which requires the system under control to have state transitions; that is, the system should be dynamical. To satisfy this requirement, we must introduce some dynamics to the static system, such as an integrator or a low-pass filter, as shown in Fig. 4.38. Similar to the discussion in Sect. 4.4.3, the feedforward input $u_{FF}$ and the induced output $y_{op} = R_{sys} u_{FF}$ define the operating point of the static system. We are interested in the transfer relation between $u_{FB}$ and $y_{inc} = y_{sys} - y_{op}$, which are the input and output of the environment controlled by the policy $\pi$. The control objective is to regulate $y_{inc}$ to 0 with minimum $u_{FB}$. The discrete transfer relation of the environment in Fig. 4.38 can be written as

$$x_{k+1} = g_2 I_8 x_k + g_1 I_8 u_{FB,k}, \quad y_{inc,k} = R_{sys} x_k, \tag{4.59}$$
Fig. 4.38 Formulation of the regulation problem of a static system. The actuator and measurement noise is not shown
where $k = 0, 1, 2, \ldots$ is the sampling time index and $I_8$ is an 8 × 8 unit matrix. Compared to (4.25), we have $A = g_2 I_8$, $B = g_1 I_8$, $C = R_{sys}$, and $D = 0$. Note that $x$ is the actual input vector to the static system. The first equation of (4.59) describes the dynamics introduced by $g_1$ and $g_2$. These dynamics do not belong to the system but are included in the environment controlled by the policy. The discrete transfer function of the dynamical part of the environment can be written as

$$x(z) = G(z)\, u_{FB}(z), \quad \text{where } G(z) = \frac{g_1}{z - g_2} I_8. \tag{4.60}$$
The independent variable $z$ is for the z-transform. We assume that the real-valued parameters $g_1 > 0$ and $g_2 > 0$. A stable discrete system requires that its pole should be within (or on) the z-plane unit circle, which further requires that $g_2 \le 1$ (for stable open loop) and $g_1 \le g_2 + 1$ (for stable closed loop), assuming that $R_{sys} \pi = I_5$ (i.e., the control policy $\pi$ only inverts $R_{sys}$). Note that $G(z)$ is an infinite impulse response (IIR) low-pass filter with a DC gain of $g_1 / (1 - g_2)$ and a 3-dB cut-off frequency at

$$f_{cut} = \frac{f_s}{2\pi} \arccos\left( -\frac{g_2^2 - 4 g_2 + 1}{2 g_2} \right) \tag{4.61}$$
with $f_s$ the sampling frequency. For a fixed $g_1$, a larger $g_2$ results in a higher DC gain and a smaller $f_{cut}$. Specifically, if $g_2 = 1$, $G(z)$ becomes a discrete integrator with an infinite DC gain. We synthesize the control policy $\pi$ using the NAC algorithm with the following parameters: $l = 1$, $h = 2$, $\gamma = 0.9$, $\alpha = 0.1$, $Q = I_5$ (5 × 5 unit matrix), $R = 0.01 \times I_8$, and $\Sigma_\xi = 0.001 \times I_8$. Note that $\xi$ is the excitation noise added artificially to $u_{FB}$ and must be larger than the actuator and measurement noise in the system. This is critical for the NAC algorithm to converge. In the simulation, the variance of the measurement noise added to $y_{sys}$ is $10^{-4}$, and the actuator noise is set to zero. Therefore, the fluctuations at the system output are dominated by $\xi$. The parameters of $G(z)$ are $g_1 = 1.0$ and $g_2 = 0.995$. Choosing $g_2$ smaller than 1 reduces the low-frequency loop gain and prevents the NAC algorithm from diverging. The NAC algorithm is executed for 100 iterations (i.e., episodes). In each iteration, we introduce a random initial state vector $x$ (larger than the effects of $\xi$) and run the closed loop controlled by the newest policy $\pi$ for 500 steps with $\xi$ present. The resulting input–output data are used to update the parameters of the control policy $\pi$. The NAC algorithm converges, and we apply the obtained policy to control the system in Fig. 4.38. The simulation results are shown in Fig. 4.39. In the closed-loop case, the control policy brings the system output to 0 quickly from a nonzero initial state in the presence of excitation noise. We have assumed the operating point $u_{FF} = 0$ and $y_{op} = 0$. The excitation noise is a wideband Gaussian noise added to $u_{FB}$, which is low-pass filtered by $G(z)$ before being applied to the static system input. In Fig. 4.39, the control performance of output 1 is poor, implying a small loop gain for this output channel. Another simulation is performed to check the closed-loop response to the actuator noise (see Fig. 4.40).
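The properties of the filter G(z) quoted above are easy to verify numerically. The following sketch, with function names of our choosing, evaluates the DC gain and the cut-off frequency (4.61) for g1 = 1.0 and g2 = 0.995 and confirms that the magnitude response at the computed cut-off frequency is 1/√2 of the DC gain. The sampling frequency is normalized to 1, so the cut-off frequency is expressed as a fraction of the sampling frequency.

```python
import numpy as np

def dc_gain(g1, g2):
    """DC gain g1 / (1 - g2) of the scalar filter g1 / (z - g2)."""
    return g1 / (1.0 - g2)

def cutoff_frequency(g2, fs=1.0):
    """3-dB cut-off frequency from (4.61); needs g2 >= 3 - 2*sqrt(2) for a real result."""
    return fs / (2 * np.pi) * np.arccos(-(g2**2 - 4 * g2 + 1) / (2 * g2))

def magnitude(g1, g2, f, fs=1.0):
    """Magnitude of g1 / (z - g2) evaluated on the unit circle at frequency f."""
    z = np.exp(2j * np.pi * f / fs)
    return abs(g1 / (z - g2))

g1, g2 = 1.0, 0.995
gain0 = dc_gain(g1, g2)
f_cut = cutoff_frequency(g2)
ratio = magnitude(g1, g2, f_cut) / gain0   # expected to be 1/sqrt(2)

print(f"DC gain = {gain0:.1f}, f_cut / f_s = {f_cut:.3e}, |G(f_cut)| / DC gain = {ratio:.4f}")
```

For these parameters the DC gain is 200 and the cut-off frequency is roughly 8 × 10⁻⁴ of the sampling frequency, which illustrates how strongly the excitation noise is low-pass filtered before it reaches the static system.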
We apply a Gaussian noise and a step disturbance directly to
Fig. 4.39 Test of the synthesized control policy for regulating the system output against a random initial state and a Gaussian noise added to the input of the environment, $u_{FB}$. Panels: outputs 1 to 5 versus time step (0 to 500), open loop and closed loop
the system input (added to $u_{sys}$) with the excitation noise in $u_{FB}$ removed. With the closed loop, the step disturbance is successfully eliminated (not fully for output 1), but the Gaussian noise cannot be removed. This means that the synthesized control policy is optimal regarding the Gaussian noise added to $u_{FB}$ (i.e., input of the environment) instead of that added to $u_{sys}$ (i.e., input of the static system). Output 1 in Fig. 4.40 has a steady-state error in response to the step disturbance due to the limited DC gain of $G(z)$. As mentioned before, if $g_2 = 1$, then $G(z)$ becomes a discrete integrator and has an infinite DC gain. Without re-synthesizing the control policy, we directly set $g_2 = 1$, and the steady-state error in Fig. 4.40 is successfully eliminated (not shown). This gives us a hint that the synthesized control policy together with $G(z)$ operated as a discrete integrator may work for setpoint tracking. This idea is confirmed by applying a time-varying $y_{op}(t)$ as the setpoint, as shown in Fig. 4.41, where the setpoint is successfully tracked. Output 1 responds more slowly due to the relatively small loop gain, but its steady-state error is successfully removed. The originally obtained control policy (noted as “control policy 1”) has poor transient performance, such as the large overshoots in outputs 2 and 3. We can adjust the NAC parameters to improve the transient response. In the example in Fig. 4.41, we update two parameters ($\gamma = 0.995$ and $R = I_8$) and obtain a new policy noted as “control policy 2”. The new control policy achieves better transient response, including faster response in output 1 and smaller overshoots in outputs 2 and 3. The new control policy has a larger weighting factor $R$ for $u$ in the value function (4.28), imposing more penalties on large inputs. A larger discount factor $\gamma$ makes the control policy reduce the longer-term costs and improve the overall performance around the
Fig. 4.40 Test of the synthesized control policy for regulating the system output against a step disturbance (switched on at the time step 300) and a Gaussian noise, both added to the input of the static system, $u_{sys}$. Panels: outputs 1 to 5 versus time step (0 to 1000), open loop and closed loop
stepping time. Note that when synthesizing both control policies, we have used a smaller g2 , but when using them for the setpoint tracking control, we have set g2 = 1. In principle, g1 can also be adjusted to tune the control performance. A larger g1 increases the closed-loop bandwidth so that the loop can track faster setpoint changes and suppress higher-frequency disturbances. However, it will amplify the effects of disturbances with frequencies above the closed-loop bandwidth. Therefore, we usually set a small g1 for mitigating low-frequency disturbances and use feedforward for fast setpoint tracking. At the same time, g2 is set to 1 to eliminate the steady-state errors. Similar to the static controllers discussed in Chap. 2, the control policy designed by the NAC algorithm is also a pseudo-inverse of the response matrix Rsys . Therefore, the NAC algorithm is another method for static controller design. One advantage of the NAC algorithm is that the design process can be directly applied to the physical system without identifying the response matrix. It also implies that the NAC algorithm can be executed online (if safe for beam operation) to adapt the controller to the physical system drifts.
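The effect of setting g2 = 1 on the steady-state error can be reproduced with a small simulation of (4.59). The sketch below is an idealized illustration under our own assumptions: the learned policy is replaced by a scaled pseudo-inverse of the response matrix (gain 0.1, negative feedback), noise is omitted, and a constant step of 0.1 is added to all eight system inputs at time step 300.

```python
import numpy as np

# Response matrix (4.58)
R_sys = np.array([
    [ 0.339,  0.107,  0.000,  0.000,  0.000,  0.000,  0.000,  0.000],
    [ 0.877, -0.213,  1.542, -0.107, -1.188, -0.149,  0.000,  0.000],
    [-0.260, -0.776, -0.465, -0.201, -0.035,  0.529,  0.000,  0.000],
    [-0.661,  0.207, -1.590,  0.083,  1.207, -0.075, -1.123,  3.044],
    [ 0.400, -0.274,  0.723, -0.047, -0.785,  0.230,  0.857, -0.546],
])
# Hypothetical stand-in for the control policy: scaled pseudo-inverse, NOT the NAC result
policy = -0.1 * np.linalg.pinv(R_sys)
step = 0.1 * np.ones(8)                   # step disturbance added to the system input

def final_output(g1, g2, n_steps=3000, step_at=300):
    """Run (4.59) in closed loop and return the output after n_steps samples."""
    x = np.zeros(8)                       # state of G(z), i.e., input to the static system
    d = np.zeros(8)
    for k in range(n_steps):
        if k == step_at:
            d = step
        y = R_sys @ (x + d)               # incremental output with the disturbance present
        u_fb = policy @ y
        x = g2 * x + g1 * u_fb            # dynamics introduced by g1 and g2
    return R_sys @ (x + d)

y_leaky = final_output(g1=1.0, g2=0.995)  # low-pass filter: finite DC gain
y_integ = final_output(g1=1.0, g2=1.0)    # discrete integrator: infinite DC gain

print("residual with g2 = 0.995:", np.round(y_leaky, 4))
print("residual with g2 = 1:    ", np.round(y_integ, 4))
```

With the idealized policy, the residual for g2 = 0.995 is the open-loop offset scaled by (1 − g2)/(1 − g2 + 0.1 g1) ≈ 0.048 for every output, while it vanishes for g2 = 1. The learned policy differs from a pure pseudo-inverse, so the residuals in Fig. 4.40 are not the same for all outputs.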
Fig. 4.41 Setpoint tracking test of two control policies synthesized with different settings. Control policy 1 parameters are given before, and control policy 2 has modified the values of two parameters: γ = 0.995 and $R = I_8$. Panels: outputs 1 to 5 versus time step (0 to 4000); curves: Control policy 1, Control policy 2, Setpoint
4.5 Further Reading and Outlook

Machine learning has become a hot topic in the design, control, and operation of particle accelerators. This chapter mainly focuses on the applications of machine learning in beam controls. We introduce the basic concepts, processes, and some typical algorithms of machine learning. Neural networks and Gaussian processes are highlighted since they are the most successful algorithms applied to accelerator control. We did not discuss deep learning (e.g., convolutional/recurrent neural networks or deep reinforcement learning) due to the space limit. Nevertheless, the introduced concepts and methods are also applicable to deep learning-based applications. This section reviews the R&D of machine learning in accelerators and offers some outlooks.

The most valuable application of machine learning is building surrogate models of accelerators. A comprehensive surrogate model can predict the global, nonlinear beam responses and is constructive for beam controls. It can be used to study new beam operation modes (e.g., the feasibility of a new mode with particular FEL photon
energy, bandwidth, and pulse width), configure beam feedback controllers, implement predictive feedforward controls, accelerate beam optimizations, and so on. We have covered some of these topics in this chapter. Such surrogate models are often referred to as virtual diagnostics, and many studies have been reported in the articles (Sanchez-Gonzalez et al. 2017; Emma et al. 2021; Zhu et al. 2021, 2022; Dingel et al. 2022). Surrogate models are also used to facilitate the beam dynamics design, such as in the articles (Wan et al. 2020; Wan and Jiao 2022). Many machine learning regression models (e.g., the neural network and Gaussian process regression models discussed in this chapter) can model the responses of static systems. However, building a surrogate model for a (nonlinear) dynamical system based on the input–output data is more challenging. We have discussed a possible dynamical model based on neural networks in Fig. 4.6b, which only works for systems with impulse responses that decay fast (the FIR approximation). Dynamic mode decomposition (DMD) (Kutz et al. 2016) is a general data-driven method to model dynamical systems. It was originally designed to retrieve the low-rank spatiotemporal modes of high-dimensional dynamical systems (e.g., fluid dynamics). With the kernel trick based on the Koopman theory, DMD can also be used to model nonlinear dynamical systems. With the retrieved dynamical modes, a surrogate model can be constructed to predict the future evolution of the system outputs. DMD is also an excellent tool to analyze the jitter pattern (both spatial and temporal) of high-dimensional data (e.g., RF waveforms, beam orbit with many BPM measurement points, or image data of beam profiles). An ultimate surrogate model may simulate the whole accelerator facility’s behaviors and form a virtual accelerator. The surrogate model should adapt itself to track the drifts of the accelerator by observing the live input–output data.
This can be accomplished by adopting online learning techniques, such as training neural networks with the stochastic gradient descent algorithm, or using instance-based learning algorithms (e.g., Gaussian process), which update the model with new data. The article (Scheinker 2021) describes a method to adapt the surrogate model using a correlated detector that estimates the drifts explicitly. When using the surrogate model for beam optimization, we can adapt the surrogate model and then determine the optimal inputs based on the updated surrogate model in each iteration. This forms a slow feedback controller that keeps the machine output (e.g., FEL pulse energy) optimal. Since the optimization is based on the surrogate model via simulation, it does not disturb the beam operation, and the calculation is much faster than directly optimizing the physical systems. Another interesting topic of surrogate modeling is the inverse model, which predicts the system inputs for given outputs. An inverse model of the accelerator is attractive. It predicts the required inputs for the desired outputs and is helpful for changing the beam operating points or implementing static feedback controllers (see Chap. 2). Some studies of the inverse surrogate models can be found in the references (Fliller et al. 2018; Bellotti et al. 2021; Wan et al. 2021). Bayesian optimization is another successful machine learning algorithm for accelerator controls. In addition to the references given in Sect. 4.3.5, some other studies can be found in the articles (Duris et al. 2020; Shalloo et al. 2020). The performance of Bayesian optimization can be improved by combining it with a physics model, as described
in the paper (Hanuka et al. 2021). In online beam optimization, we want to maximize the beam performance (e.g., FEL pulse energy) but do not want to cause beam losses. Therefore, the optimization is subject to some constraints. The ETH team has proposed a Bayesian optimization strategy with safety constraints, which can be found in the articles (Berkenkamp et al. 2021; Kirschner et al. 2022). This algorithm simultaneously models the system’s response (e.g., FEL pulse energy in response to machine settings) and the constraint’s response (e.g., beam loss rate in response to the same machine settings) and determines the trial inputs by compromising the performance and risk. Reinforcement learning is also a hot topic in accelerator controls (Bruchon et al. 2020; O’Shea et al. 2020). Its study in the accelerator community is still in a preliminary stage. For linear systems control, as discussed in Sect. 4.4, reinforcement learning only provides an alternative method for the controller design. The traditional model-based design methods (Kalman filter plus LQR) are sufficient. The major benefit of (deep) reinforcement learning is the potential to control nonlinear systems. This is only useful if the neural network-based policy or value function can be trained with a reasonably complex data set and a better convergence guarantee. Examples of applying deep reinforcement learning can be found in the articles (Kim et al. 2019; Kain et al. 2020; John et al. 2021). Other machine learning algorithms are also found in accelerator controls, such as fault identification and classification (Tennant et al. 2020; Edelen and Hall 2021; Gruenhagen et al. 2021). As discussed in Sect. 3.4, a mature implementation with high-degree automation and good exception handling is essential for machine learning-based applications. Machine learning algorithms require high-quality data (e.g., sufficient input–output channels, well aligned in time, etc.) to train the model. 
Therefore, a reliable data acquisition system is another pre-condition for applying machine learning algorithms. In summary, implementing machine learning-based applications for the daily operation of an accelerator will be very challenging.
References

R. Bellotti, R. Boiger, A. Adelmann, Fast, efficient and flexible particle accelerator optimization using densely connected and invertible neural networks. Information 12, 351 (2021). https://doi.org/10.3390/info12090351
F. Berkenkamp, A. Krause, A.P. Schoellig, Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics. Mach. Learn. (2021). https://doi.org/10.1007/s10994-021-06019-1
E. Brochu, V.M. Cora, N. Freitas, A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning (2010). arXiv:1012.2599v1. https://arxiv.org/abs/1012.2599. Accessed 29 Aug 2022
N. Bruchon, G. Fenu, G. Gaio et al., Basic reinforcement learning techniques to control the intensity of a seeded free-electron laser. Electronics 9, 781 (2020). https://doi.org/10.3390/electronics9050781
K. Dingel, T. Otto, L. Marder et al., Toward AI-enhanced online-characterization and shaping of ultrashort X-ray free-electron laser pulses (2022). arXiv:2108.13979. https://arxiv.org/abs/2108.13979. Accessed 30 Aug 2022
R.C. Dorf, R.H. Bishop, Modern control systems, 12th edn. (Pearson Education, London, 2010)
J. Duris, D. Kennedy, A. Hanuka et al., Bayesian optimization of a free-electron laser. Phys. Rev. Lett. 124, 124801 (2020). https://doi.org/10.1103/PhysRevLett.124.124801
A. Edelen, N. Neveu, M. Frey et al., Machine learning for orders of magnitude speedup in multiobjective optimization of particle accelerator systems. Phys. Rev. Accel. Beams 23, 044601 (2020). https://doi.org/10.1103/PhysRevAccelBeams.23.044601
J.P. Edelen, C.C. Hall, Autoencoder based analysis of RF parameters in the Fermilab low energy Linac. Information 12, 238 (2021). https://doi.org/10.3390/info12060238
C. Emma, A. Edelen, A. Hanuka et al., Virtual diagnostic suite for electron beam prediction and control at FACET-II. Information 12(2), 61 (2021). https://doi.org/10.3390/info12020061
M. Farjadnasab, M. Babazadeh, Model-free LQR design by Q-function learning. Automatica 137, 110060 (2022). https://doi.org/10.1016/j.automatica.2021.110060
R.P. Fliller, C. Gardner, P. Marino et al., Application of machine learning to minimize long term drifts in the NSLS-II Linac, in Proceedings of IPAC2018 Conference, Vancouver, BC, Canada, 29 April–4 May 2018 (2018)
A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd edn. (O’Reilly Media, Sebastopol, 2019)
A. Gruenhagen, J. Branlard, A. Eichler et al., Fault analysis of the beam acceleration control system at the European XFEL using data mining, in Proceedings of 2021 IEEE 30th Asian Test Symposium (ATS), Matsuyama, Ehime, Japan, 22–25 Nov 2021 (2021)
A. Hanuka, X. Huang, J. Shtalenkova et al., Physics model-informed Gaussian process for online optimization of particle accelerators. Phys. Rev. Accel. Beams 24, 072802 (2021). https://doi.org/10.1103/PhysRevAccelBeams.24.072802
T. Hastie, R. Tibshirani, J. Friedman, The elements of statistical learning: data mining, inference, and prediction, 2nd edn. (Springer, New York, 2009)
K. Hornik, Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2), 251–257 (1991). https://doi.org/10.1016/0893-6080(91)90009-T
C. Hua, Reinforcement Learning Aided Performance Optimization of Feedback Control Systems (Springer Vieweg, Wiesbaden, 2021)
X. Huang, Beam-Based Correction and Optimization for Accelerators (CRC Press, Boca Raton, 2020)
X. Huang, Z. Zhang, M. Song et al., Multi-objective multi-generation Gaussian process optimizer, in Proceedings of IPAC2021 Conference, Campinas, SP, Brazil, 24–28 May 2021 (2021)
J.S. John, C. Herwig, D. Kafkes et al., Real-time artificial intelligence for accelerator control: A study at the Fermilab Booster. Phys. Rev. Accel. Beams 24, 104601 (2021). https://doi.org/10.1103/PhysRevAccelBeams.24.104601
V. Kain, S. Hirlander, B. Goddard et al., Sample-efficient reinforcement learning for CERN accelerator control. Phys. Rev. Accel. Beams 23, 124801 (2020). https://doi.org/10.1103/PhysRevAccelBeams.23.124801
H. Kim, M. Ghergherehchi, S. Shin et al., The automatic frequency control based on artificial intelligence for compact particle accelerator. Rev. Sci. Instrum. 90, 074707 (2019). https://doi.org/10.1063/1.5086866
B. Kiumarsi, F.L. Lewis, H. Modares et al., Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics. Automatica 50, 1167–1175 (2014). https://doi.org/10.1016/j.automatica.2014.02.015
J. Kirschner, M. Mutny, A. Krause et al., Tuning particle accelerators with safety constraints using Bayesian optimization. Phys. Rev. Accel. Beams 25, 062802 (2022). https://doi.org/10.1103/PhysRevAccelBeams.25.062802
J.N. Kutz, S.L. Brunton, B.W. Brunton et al., Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems (SIAM, Philadelphia, 2016)
Y.B. Kong, M.G. Hur, E.J. Lee et al., Predictive ion source control using artificial neural network for RFT-30 cyclotron. Nucl. Instrum. Methods Phys. Res. A 806, 55–60 (2016). https://doi.org/10.1016/j.nima.2015.09.095
L. Ljung, System Identification: Theory for the User, 2nd edn. (Prentice Hall PTR, Upper Saddle River, 1998)
E. Meier, S.G. Biedron, G. LeBlanc et al., Development of a combined feed forward-feedback system for an electron Linac. Nucl. Instrum. Methods Phys. Res. A 609(2–3), 79–88 (2009). https://doi.org/10.1016/j.nima.2009.08.028
E. Meier, S.G. Biedron, G. LeBlanc et al., Electron beam energy and bunch length feed forward control studies using an artificial neural network at the Linac coherent light source. Nucl. Instrum. Methods Phys. Res. A 610(3), 629–635 (2009). https://doi.org/10.1016/j.nima.2009.09.048
H. Modares, F.L. Lewis, Online solution to the linear quadratic tracking problem of continuous-time systems using reinforcement learning, in Proceedings of 52nd IEEE Conference on Decision and Control, Florence, Italy, 10–13 Dec 2013 (2013)
K.P. Murphy, Machine Learning: A Probabilistic Perspective (MIT Press, Cambridge, 2012)
F.H. O’Shea, N. Bruchon, G. Gaio, Policy gradient methods for free-electron laser and terahertz source optimization and stabilization at the FERMI free-electron laser at Elettra. Phys. Rev. Accel. Beams 23, 122802 (2020). https://doi.org/10.1103/PhysRevAccelBeams.23.122802
J. Peters, S. Vijayakumar, S. Schaal, Natural actor-critic, in Machine Learning: ECML 2005, ed. by J. Gama, R. Camacho, P.B. Brazdil et al. Lecture notes in computer science, vol. 3720 (Springer, Berlin, 2005), pp. 280–291
R. Roussel, A. Hanuka, A. Edelen, Multiobjective Bayesian optimization for online accelerator tuning. Phys. Rev. Accel. Beams 24, 062801 (2021). https://doi.org/10.1103/PhysRevAccelBeams.24.062801
A. Sanchez-Gonzalez, P. Micaelli, C. Olivier et al., Accurate prediction of X-ray pulse properties from a free-electron laser using machine learning. Nat. Commun. 8, 15461 (2017). https://doi.org/10.1038/ncomms15461
A. Scheinker, Adaptive machine learning for time-varying systems: low dimensional latent space tuning. J. Instrum. 16, P10008 (2021). https://doi.org/10.1088/1748-0221/16/10/p10008
E. Schulz, M. Speekenbrink, A. Krause, A tutorial on Gaussian process regression: modelling, exploring, and exploiting functions. J. Math. Psychol. 85, 1–16 (2018). https://doi.org/10.1016/j.jmp.2018.03.001
M. Sewak, Deep Reinforcement Learning: Frontiers of Artificial Intelligence (Springer, Singapore, 2019)
R.J. Shalloo, S.J.D. Dann, J.-N. Gruse et al., Automation and control of laser wakefield accelerators using Bayesian optimization. Nat. Commun. 11, 6355 (2020). https://doi.org/10.1038/s41467-020-20245-6
S. Skogestad, I. Postlethwaite, Multivariable Feedback Control: Analysis and Design, 2nd edn. (Wiley, New York, 2005)
S. Simrock, Z. Geng, Low-Level Radio Frequency Systems (Springer, Cham, 2022)
R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction, 2nd edn. (The MIT Press, Cambridge, 2018)
R.S. Sutton, D. McAllester, S. Singh et al., Policy gradient methods for reinforcement learning with function approximation, in Proceedings of NIPS1999 Conference, Denver, CO, USA, November 29–December 4 (1999)
C. Tennant, A. Carpenter, T. Powers et al., Superconducting radio-frequency cavity fault classification using machine learning at Jefferson Laboratory. Phys. Rev. Accel. Beams 23, 114601 (2020). https://doi.org/10.1103/PhysRevAccelBeams.23.114601
J. Wan, Y. Jiao, Machine learning enabled fast evaluation of dynamic aperture for storage ring accelerators. New J. Phys. 24, 063030 (2022). https://doi.org/10.1088/1367-2630/ac77ac
J. Wan, P. Chu, Y. Jiao, Neural network-based multiobjective optimization algorithm for nonlinear beam dynamics. Phys. Rev. Accel. Beams 23, 081601 (2020). https://doi.org/10.1103/PhysRevAccelBeams.23.081601
J. Wan, Y. Jiao, J. Wu, Machine learning-based direct solver for one-to-many problems on temporal shaping of electron beams (2021). arXiv:2103.06594. https://arxiv.org/abs/2103.06594. Accessed 29 Aug 2022
Y. Wang, K. Velswamy, B. Huang, A novel approach to feedback control with deep reinforcement learning. IFAC PapersOnLine 51–58, 31–36 (2018). https://doi.org/10.1016/j.ifacol.2018.09.241
A. Zai, B. Brown, Deep Reinforcement Learning in Action (Manning Publications Co., Shelter Island, 2020)
Z. Zhang, M. Song, X. Huang, Online accelerator optimization with a machine learning-based stochastic algorithm. Mach. Learn. Sci. Technol. 2, 015014 (2021). https://doi.org/10.1088/2632-2153/abc81e
J. Zhu, Y. Chen, F. Brinker et al., High-fidelity prediction of megapixel longitudinal phase-space images of electron beams using encoder-decoder neural networks. Phys. Rev. Appl. 16, 024005 (2021). https://doi.org/10.1103/PhysRevApplied.16.024005
J. Zhu, N.M. Lockmann, M.K. Czwalinna et al., Mixed diagnostics for longitudinal properties of electron bunches in a free-electron laser. Front. Phys. 10, 903559 (2022). https://doi.org/10.3389/fphy.2022.903559
Index
A Acceleration coefficient, 71 Accelerator subsystem, 1, 2, 6, 49, 51, 85, 87, 88, 95, 97, 101, 103 Acquisition function, 118–121 Action, 41, 46, 69, 85, 88, 91, 118, 123–125, 127, 128, 130, 132 Activation function, 96, 97 Actor, 129 Actor-critic algorithm, 125, 126, 129 Adaptive feedback, 76 Advantage function, 132, 133 Agent, 88, 123–125 Arithmetic crossover, 66 Artificial intelligence, 86 Artificial neural network, 87 Automation, 7, 11, 13, 14, 146
B Back propagation, 98, 99 Batch learning, 88, 91 Bayesian optimization, 5, 14, 80, 81, 94, 117–119, 145, 146 Beam actuator, 8–13, 17, 21–25, 28, 29, 34, 37, 39, 40, 50 Beam control, 1, 2, 4, 6–8, 11, 12, 14, 37, 38, 46, 49, 54, 76, 80, 81, 85, 87, 88, 94, 95, 100, 101, 106, 123, 124, 144 Beam control system, 1, 6, 14, 15, 21, 22 Beam detector, 4, 8–10, 21–24, 30, 75, 94 Beam device, 1, 7–12 Beam device layer, 7, 8, 12, 21 Beam diagnostic controller, 10, 21, 23, 24 Beam diagnostic device, 10
Beam feedback, 1, 3–5, 7–10, 12, 13, 21, 22, 45, 46, 49, 50, 75, 76, 85, 109, 110, 112, 113, 135 Beam feedback controller, 12, 13, 45, 81, 95, 108, 145 Beam optimization, 1, 2, 5, 14, 49–54, 66, 74, 81, 85, 95, 110, 116, 121, 145, 146 Beam optimizer, 13, 22 Beam position monitor, 8 Beam response matrix, 4, 21, 23, 26, 29, 46, 95, 106, 108 Beam sensitivity matrix, 29 Beam setup, 2, 3, 5–7, 11, 12, 77, 85, 95 Beam stabilization, 3, 4 Beam synchronous data acquisition, 11 Bellman equation, 128 Bias, 35, 52, 60, 73, 78, 96, 123 Binary tournament, 66 Blackbox model, 89 Blackbox optimization, 14, 51, 53, 54, 61, 79 Blackbox system identification, 89 Bunch arrival time monitor, 8 Bunch compression monitor, 9
C Children, 64 Chromosome, 66, 67 Closed-loop beam actuator controller, 9 Cognitive component, 71 Coherent diffraction radiation, 9 Command tracking, 22 Condition number, 36–38, 40, 74
Conjugate direction, 60–63, 78, 111 Control model, 88–90, 94 Control setting, 21–23, 29, 37, 50, 51, 53, 76, 124 Convex optimization, 30 Cost, 2, 3, 40, 50–53, 55–70, 72–75, 77, 78, 93, 94, 98, 110, 111, 117–121, 123, 124, 126, 128, 142 Cost function, 2, 50–53, 55, 56, 60–62, 65–68, 70, 73, 75, 77, 78, 93, 95, 98, 110, 111, 117, 120, 126, 128 Cost vector, 117 Coupling, 3, 29, 37 Critic function, 129, 130 Cross-entropy, 93 Crossover, 64–67, 121 Cross-validation, 94, 104 Crowding distance, 68 Cvx, 30, 80 CVXOPT, 80
D Deep learning, 87, 95, 97, 144 Deep reinforcement learning, 87, 123, 125, 144, 146 Derivative-free algorithm, 53 Deterministic algorithm, 54 Device controller, 9–12 Dimensionality reduction, 82, 86, 88 Direct search algorithm, 53, 120, 121 Discount factor, 128, 134, 136, 142 Disturbance rejection, 22, 41, 42, 81, 91, 127 Domination, 68 Drift, 3, 4, 9, 21, 24, 27, 34, 45, 46, 51, 60, 81, 87, 101, 143, 145 Dynamical controller, 25–27, 37, 41, 43, 45, 139 Dynamic mode decomposition, 145 Dynamic programming, 117
E Empirical transfer function estimate, 90 Environment, 9, 88, 123–125, 127, 136, 140–142 Episode, 123, 134, 137, 141 Evolutionary algorithm, 54, 64, 70, 121 Exploitation, 71, 118–120 Exploration, 65, 67, 71, 118–121
F Feature scaling, 93 Feedback control, 4, 21–23, 27, 36, 43, 53, 114, 123, 126 Feedforward control, 4, 85, 106, 112–117, 145 Finite impulse response, 99 Fisher information matrix, 134 Fisher metric, 134 Fold, 90, 94
G Gas-based detector, 9, 52, 53 Gaussian process, 5, 14, 81, 95, 101, 103, 105, 144, 145 Gaussian process optimization, 54, 78, 119, 121 Gaussian process regression model, 85, 101, 145 Gene, 66, 67 Genetic algorithm, 5, 54, 64, 68, 95 Global control layer, 7, 12, 14 Global optimization layer, 7, 13 Global Optimization Toolbox, 69, 72, 79 Golden section search method, 60 Grey-box system identification, 89
H Heuristic algorithm, 5, 54 Hidden layer, 97, 100, 107, 115 Hyperparameter, 65, 67, 70, 71, 75, 90, 91, 94, 97, 104, 119, 125, 127
I Ill conditioned, 36, 38, 39 Individual, 60, 61, 64, 65, 69 Inertia weight, 71 Infinite impulse response, 141 Input, 5–7, 9–14, 16, 17, 21, 24–27, 29, 30, 32–34, 37–39, 43, 46, 50–63, 66, 67, 69, 70, 72–74, 77, 78, 81, 82, 86, 89–97, 99–101, 103–112, 114, 116–121, 123, 126–130, 133–137, 140–142, 145, 146 Input direction, 33–35 Input feature, 86–89, 92–94 Input layer, 97 Instance-based learning, 88, 95, 145 Instrumentation layer, 7, 9, 12, 21 Integrating current transformer, 8 Internal model control, 4, 45, 138
J Jitter, 3, 4, 27, 29, 53, 55–57, 60, 62, 76, 145
K Kalman filter, 126, 127, 129, 146 Kernel function, 104 Koopman theory, 145 Kronecker product, 133
L Label, 45, 86, 87, 89, 90, 93, 94 Least-square method, 39, 109, 133, 134 Length scale, 104 Linear quadratic Gaussian, 14, 85, 125 Linear quadratic regulator, 127 Linear state-space equation, 89 Line optimizer, 60, 62, 63 Local feedback controller, 9, 13 Local I/O controller, 13 Loss function, 93 Lower confidence bound, 119
M Machine learning, 1, 5, 7, 11, 14, 18, 70, 82, 85–90, 92–95, 106, 119, 144, 145 Machine learning model, 54, 86, 88–91, 93, 94, 106 Magnet, 2–6, 8, 9, 11, 12, 21, 23, 25, 29, 37, 49, 53, 106, 111, 124 Markov process, 140 Mathematical model, 50, 88, 94 Matrix inversion, 32, 36, 38, 39 Maximum likelihood estimation, 90 Mean absolute error, 93 Mean square error, 93 Model-based learning, 88, 95 Model-based optimization, 70, 107, 111, 112, 117 Modeless optimization, 51 Monte Carlo method, 130 Multi-generation Gaussian process optimization, 78, 121 Multi-objective optimization, 51, 75 Multi-start method, 54 Multivariate Gaussian distribution, 101 Multivariate normal distribution, 101 Mutation, 64–67, 121 Mutation rate, 67 Mutation step, 67
N NAC regulator, 137–139 Natural actor-critic algorithm, 126 Natural gradient, 134 Negative log-likelihood, 93 Neural network, 5, 45, 46, 85, 87, 89, 90, 93–100, 106, 107, 115, 123, 125, 144, 145 Neural network predictive model, 114–116 Neural network regression model, 4, 96, 106 Neural network surrogate model, 107–112 Neuron, 94, 96–98, 100, 107, 115 Non-dominated sorting, 51, 67–69, 71, 74, 117 Non-parametric model, 101
O Objective function, 50, 80 Ocelot optimizer, 80 Offline optimization, 51 Off-policy, 129 Offspring, 64–67 Online learning, 88, 91, 145 Online optimization, 9, 51–54, 57, 60, 63, 70, 73, 81, 110, 111, 122 On-policy, 129 Open-loop beam actuator controller, 9 Open-objective optimization, 51, 75 Operating point, 2, 4, 5, 12, 14, 18, 24, 27, 29–32, 34, 37, 45, 46, 56, 57, 81, 85, 89, 91, 92, 94, 95, 106, 107, 109–111, 116, 117, 122, 134, 135, 137, 138, 140, 141, 145 Operating point changing, 2–5, 13, 45, 46, 51, 53, 75, 76, 78, 79, 108, 112, 122 Optimization Toolbox, 79 Optimizer, 10, 50–54, 57, 59, 75, 76, 78, 81, 82, 117 Output, 5, 8–13, 17, 22–27, 30, 32–34, 37–39, 41, 43, 50–53, 55–60, 62, 68, 74, 77, 78, 85–101, 103–107, 109, 111–119, 121–123, 126–130, 133–138, 140–143, 145, 146 Output direction, 26, 27, 33–35, 37, 45 Output layer, 97 Overfitting, 93
P Parameterization, 129, 132, 133, 136 Pareto front, 68, 69
Particle, 1, 2, 7, 13, 49, 50, 70–73, 80, 85, 87, 88, 135, 144 Particle swarm optimization, 5, 54, 70, 95 Photon beam detector, 9 Photon single-shot spectrometer, 9 Physical system, 5, 12, 50, 51, 54, 55, 60, 62, 70, 73, 74, 92, 95–98, 106, 109–112, 117–119, 121, 122, 124, 125, 143, 145 Physics applications, 7, 12, 13 Physics model, 5, 6, 10, 14, 75, 88–90, 92, 94, 106, 107, 118, 145 Policy, 88, 123–130, 132–138, 140–143, 146 Policy-based algorithm, 125 Policy function, 124, 125, 129, 130, 133, 136 Policy gradient, 130–132, 134, 136 Policy gradient algorithm, 125, 129, 131, 133 Policy iteration, 129 Population, 54, 64, 65, 67–70 Position, 1, 9–11, 23, 24, 52, 53, 61, 66, 70–72 Posterior, 101–103, 105, 117–119 Powell’s method, 54, 61, 62, 80 Principal component analysis, 86 Prior, 102, 117, 119, 121 Probability distribution function, 101 Pulse arrival and length monitor, 9 Q Q-function, 125, 128, 132 Q-learning algorithm, 125 R Radial-basis function, 104 Random walk optimization, 5, 54, 58 Rastrigin function, 55 Reference tracking, 22, 40, 41 Regularization, 4, 21, 45, 93 Regularization factor, 40, 94 Regulation mode, 135 Regulator, 134, 135, 137, 138 Reinforcement learning, 4, 14, 46, 85, 88, 91, 117, 123–129, 138, 140, 146 Reliability, 11, 13, 14 Repeatability, 11–14 Reward, 88, 123–125 RF Gun laser, 2, 3, 8–11, 76 RF system, 2, 5, 8, 10, 81 Robust conjugate direction search, 5, 54, 60
Robust control, 4, 21, 35, 40–43, 45, 109 Robustness, 7, 11, 13, 14, 43, 49, 70 Robust performance, 41 Robust stability, 41 Roulette wheel selection, 65, 66
S Setpoint tracking, 22, 91, 137, 139, 142–144 Sigmoid, 97, 98 Single-objective optimization, 71, 117 Singular value, 33–36, 39, 42, 43 Singular value decomposition, 4, 21, 33 Singular value truncation, 39, 40, 43, 45 Social component, 71 Softmax function, 65 Solution, 40, 50–52, 54, 56, 59–68, 70, 71, 73–75, 95, 110, 111, 119 Spontaneous correlation optimization, 5, 54, 56 State-action-value function, 125, 128 State-value function, 124, 125, 128 Static controller, 26, 27, 34, 35, 37–40, 43, 44, 46, 95, 114, 139, 140, 143 Statistical model, 89 Step phase, 16, 17, 31, 32, 38, 45, 78, 107, 114, 115 Step ratio, 16–18, 31, 32, 38, 45, 78, 107, 113–115, 117 Stochastic algorithm, 54, 64, 70, 110, 111 Stochastic gradient descent, 145 Structured singular value, 42 Supervised learning, 86, 87, 89, 90 Surrogate model, 4, 5, 14, 45, 54, 70, 73, 78, 81, 85, 87, 94, 95, 106–111, 113, 116, 118, 119, 121, 144, 145 Swarm, 70–72, 80 Swarm intelligence, 54, 70 SwissFEL, 1, 15, 17, 18, 27, 30, 31, 35, 36, 38–40, 43, 44, 46, 49, 51, 77, 78, 106, 107, 109, 111, 113–115, 117, 122, 126, 140 Synchrotron radiation monitor, 8 System, 1, 3, 5–7, 13–16, 21–23, 25, 28, 30, 42, 45, 46, 50–52, 54–62, 66, 67, 69, 70, 72–75, 78, 80, 81, 85, 89–91, 93–95, 99, 101, 103–108, 110, 111, 113, 117–119, 121–124, 126, 127, 129, 134, 135, 138–143, 145, 146 System identification, 22, 35, 90
T Tanh, 97 Temporal difference method, 128 Termination condition, 54, 56, 58, 62, 64, 72, 119, 121 Test set, 90, 91, 93 Tracker, 135 Tracking mode, 135 Training data, 86–88, 90–93, 95, 96, 98, 100, 101, 103–107, 115 Training set, 90, 91, 94, 107 U Underfitting, 93 Undulator, 2, 3, 6, 8, 9, 15, 52, 53, 66, 76, 77, 92 Unsupervised learning, 86, 88, 89, 91
Utility function, 118
V Validation set, 90, 91, 93, 94 Value-based algorithm, 125 Value function, 123, 125, 127–130, 132–134, 136, 142, 146 Variance, 32, 35, 52, 56, 62, 63, 94, 101, 103–105, 107, 118, 119, 141 Virtual accelerator, 145 Virtual diagnostic, 14, 87, 94, 145
W White-box model, 88 White-box optimization, 51