Advances in Uncertainty Quantification and Optimization Under Uncertainty with Aerospace Applications: Proceedings of the 2020 UQOP International Conference (Space Technology Proceedings, 8) [1st ed. 2021] 3030805417, 9783030805418

The 2020 International Conference on Uncertainty Quantification & Optimization gathered together internationally renowned researchers developing methods in the fields of optimisation and uncertainty quantification.


English Pages 466 [448] Year 2022


Table of contents:
Preface
Contents
Part I Applications of Uncertainty in Aerospace & Engineering (ENG)
From Uncertainty Quantification to Shape Optimization: Cross-Fertilization of Methods for Dimensionality Reduction
1 Introduction
2 Design-Space Dimensionality Reduction in Shape Optimization
2.1 Geometry-Based Formulation
2.2 Physics-Informed Formulation
3 Example Application
4 Concluding Remarks
References
Cloud Uncertainty Quantification for Runback Ice Formations in Anti-Ice Electro-Thermal Ice Protection Systems
Nomenclature
1 Introduction
2 Modelling of an AI-ETIPS
2.1 Computational Model
2.2 Case of Study
3 Cloud Uncertainty Characterization
4 Uncertainty Propagation Methodologies
4.1 Monte Carlo Sampling Methods
4.2 Generalized Polynomial Chaos Expansion
5 Numerical Results
6 Concluding Remarks
References
Multi-fidelity Surrogate Assisted Design Optimisation of an Airfoil under Uncertainty Using Far-Field Drag Approximation
1 Introduction
2 Multi-fidelity Gaussian Process Regression
3 Aerodynamic Computational Chain
4 Far-Field Drag Coefficient Calculation
5 Deterministic Design Optimisation Problem
6 Probabilistic Design Optimisation Problem
7 Optimisation Pipeline
8 Results
8.1 Deterministic Optimisation
8.2 Probabilistic Optimisation
9 Conclusion
References
Scalable Dynamic Asynchronous Monte Carlo Framework Applied to Wind Engineering Problems
1 Introduction
2 Monte Carlo Methods
2.1 Monte Carlo
2.2 Asynchronous Monte Carlo
2.3 Scheduling
3 Wind Engineering Benchmark
3.1 Problem Description
3.2 Source of Uncertainty
3.3 Results
4 Conclusion
References
Multi-Objective Optimal Design and Maintenance for Systems Based on Calendar Times Using MOEA/D-DE
1 Introduction
2 Methodology and Description of the Proposed Model
2.1 Extracting Availability and Economic Cost from Functionability Profiles
2.2 Multi-Objective Optimization Approach
2.3 Building Functionability Profiles
3 The Application Case
4 Results and Discussion
5 Conclusions
References
Multi-objective Robustness Analysis of the Polymer Extrusion Process
1 Introduction
2 Robustness in Polymer Extrusion
2.1 Extrusion Process
2.2 Robustness Methodology
2.3 Multi-objective Optimization with Robustness
3 Results and Discussion
4 Conclusion
References
Quantification of Operational and Geometrical Uncertainties of a 1.5-Stage Axial Compressor with Cavity Leakage Flows
1 Motivation and Test Case Description
1.1 Geometry and Operating Regime
1.2 Uncertainty Definition
Correlated Fields at the Main Inlet
Secondary Inlets
Rotor Blade Tip Gap
2 Uncertainty Quantification Method
2.1 Scaled Sensitivity Derivatives
3 Simulation Setup and Computational Cost
4 Results and Discussion
4.1 Non-deterministic Performance Curve
4.2 Scaled Sensitivity Derivatives
5 Conclusions
References
Can Uncertainty Propagation Solve the Mysterious Case of Snoopy?
1 Introduction
2 Background
3 Methodology
3.1 Dynamics Modelling
3.2 Using the TDA Structure to Solve ODE
3.3 Performing Numerical Analysis
3.4 Propagator Implementation and Validation
3.5 Monte-Carlo Estimation
4 Results and Discussion
4.1 Performing Numerical Analysis on the Trajectory of Snoopy
4.2 Computing Snoopy's Trajectory
4.3 Estimating the Probability of Snoopy's Presence
5 Conclusions and Future Work
References
Part II Imprecise Probability, Theory and Applications (IP)
Robust Particle Filter for Space Navigation Under Epistemic Uncertainty
1 Introduction
2 Filtering Under Epistemic Uncertainty
2.1 Imprecise Formulation
2.2 Expectation Estimator
2.3 Bound Estimator
3 Test Case
3.1 Initial State Uncertainty
3.2 Observation Model and Errors
3.3 Results
4 Conclusions
References
Computing Bounds for Imprecise Continuous-Time Markov Chains Using Normal Cones
1 Introduction
2 Imprecise Markov Chains in Continuous Time
2.1 Imprecise Distributions over States
2.2 Imprecise Transition Rate Matrices
2.3 Distributions at Time t
3 Numerical Methods for Finding Lower Expectations
3.1 Lower Expectation and Transition Operators as Linear Programming Problems
3.2 Computational Approaches to Estimating Lower Expectation Functionals
4 Normal Cones of Imprecise Q-Operators
5 Norms of Q-Matrices
6 Numerical Methods for CTIMC Bounds Calculation
6.1 Matrix Exponential Method
6.2 Checking Applicability of the Matrix Exponential Method
6.3 Checking the Normal Cone Inclusion
6.4 Approximate Matrix Exponential Method
7 Error Estimation
7.1 General Error Bounds
7.2 Error Estimation for a Single Step
7.3 Error Estimation for the Uniform Grid
8 Algorithm and Examples
8.1 Parts of the Algorithm
8.2 Examples
9 Concluding Remarks
References
Simultaneous Sampling for Robust Markov Chain Monte Carlo Inference
1 Introduction
2 Markov Chain Monte Carlo
3 Simultaneous Sampling
4 Markov Chain Monte Carlo for Imprecise Models
5 Practical Implementation
6 Linear Representation for Exponential Families
7 Computer Representation of the Credal Sets
8 Credal Set Merging
9 Discussion
Reference
Computing Expected Hitting Times for Imprecise Markov Chains
1 Introduction
2 Existence of Solutions
3 A Computational Method
4 Complexity Analysis
References
Part III Robust and Reliability-Based Design Optimisation in Aerospace Engineering (RBDO)
Multi-Objective Robust Trajectory Optimization of Multi-Asteroid Fly-By Under Epistemic Uncertainty
1 Introduction
2 Problem Formulation
3 Lower Expectation
3.1 Minimizing the Expectation
3.2 Estimating the Expectation
4 Multi-Objective Optimization
4.1 Control Mapping for Dimensionality Reduction
Deterministic Control Map
Max-Min Control Map
Min-Max Control Map
4.2 Threshold Mapping
5 Asteroid Tour Test Case
6 Results
6.1 Control Map and Threshold Map
6.2 Lower Expectation
6.3 Expectation and Sampling Methods
6.4 Execution Times
7 Conclusions
References
Reliability-Based Robust Design Optimization of a Jet Engine Nacelle
1 Introduction
2 Definition of Aeronautical Optimization Under Uncertainties
2.1 Nacelle Acoustic Liner and Manufacturing Tolerances
2.2 Nacelle Acoustic Liner FEM Model
3 Adaptive Sparse Polynomial Chaos for Reliability Problems
3.1 Basic Formulation of Adaptive PCE
3.2 Adaptive Sparse Polynomial Chaos Expansion
3.3 Application of Adaptive PCE to Reliability-Based Optimization
4 Reliability-Based Optimization of the Engine Nacelle
4.1 Optimization Platform
4.2 Optimization Results
5 Conclusion
References
Bayesian Optimization for Robust Solutions Under Uncertain Input
1 Introduction
2 Literature Review
3 Problem Definition
4 Methodology
4.1 Gaussian Process
4.2 Robust Bayesian Optimization
Direct Robustness Approximation
Robust Knowledge Gradient
4.3 Stochastic Kriging
5 Experiments
5.1 Benchmark Problems
Test Functions
Experimental Setup
5.2 Results
Latin Hypercube Sampling
Stochastic Kriging
Uncontrollable Input
6 Conclusions
References
Optimization Under Uncertainty of Shock Control Bumps for Transonic Wings
1 Introduction
2 Gradient-Based Robust Design Framework
2.1 Motivation
2.2 Surrogate-Based Uncertainty Quantification
2.3 Obtaining the Gradients of the Statistics
2.4 Optimization Architecture
2.5 Application to Analytical Test Function
3 Application to the Robust Design of Shock Control Bumps: Problem Definition
3.1 Test Case
3.2 Numerical Model
3.3 Parametrization of Shock Control Bumps
3.4 Optimization Formulations
4 Results
4.1 Single-Point (Deterministic) Results
4.2 Uncertainty Quantification
4.3 Robust Results
5 Conclusions
References
Multi-Objective Design Optimisation of an Airfoil with Geometrical Uncertainties Leveraging Multi-Fidelity Gaussian Process Regression
1 Introduction
2 Design Optimisation Problem of Airfoil
3 Solvers
4 Multi-Fidelity Gaussian Process Regression
5 Uncertainty Treatment
6 Multi-Objective Optimisation Framework for Airfoil Optimisation Under Uncertainty
7 Results
8 Conclusion
References
High-Lift Devices Topology Robust Optimisation Using Machine Learning Assisted Optimisation
1 Introduction
2 Machine Learning Assisted Optimisation
2.1 Surrogate Model
2.2 Classifier
3 Quadrature Approach for Uncertainty Quantification
4 Problem Formulation
4.1 Optimisation Design Variables
4.2 High-Lift Devices Robust Optimisation Problem
Original Objective Function
Artificial Objective Function
5 Optimisation Setup
6 Results
7 Conclusions and Future Work
References
Network Resilience Optimisation of Complex Systems
1 Introduction
2 Evidence Theory as Uncertainty Framework
3 System Network Model
4 Complexity Reduction of Uncertainty Quantification
4.1 Network Decomposition
4.2 Tree-Based Exploration
4.3 Combined Method
5 Optimisation Approach
6 Resilience Framework
7 Application
8 Results
9 Conclusions
References
Gaussian Processes for CVaR Approximation in Robust Aerodynamic Shape Design
1 Introduction
2 Robust Design and CVaR Risk Function
3 Risk Function Approximation
3.1 Gaussian Processes
3.2 Training Methodology
4 Numerical Analysis Tools
5 Design Application Example
5.1 Optimisation Problem Setup
5.2 Optimisation Process and Robust Design Results
6 Conclusions
References
Part IV Uncertainty Quantification, Identification and Calibration in Aerospace Models (UQ)
Inference Methods for Gas-Surface Interaction Models: From Deterministic Approaches to Bayesian Techniques
1 Introduction
2 Plasma wind Tunnel Experiments
2.1 Heterogeneous Catalysis
2.2 Thermochemical Ablation
3 Deterministic Approaches to the Inference of Model Parameters
3.1 Heterogeneous Catalysis
3.2 Thermochemical Ablation
4 Bayesian Approaches to the Inference of Model Parameters
4.1 Bayes Theorem
4.2 Heterogeneous Catalysis
4.3 Thermochemical Ablation
5 Conclusions
References
Bayesian Adaptive Selection Under Prior Ignorance
1 Introduction
2 Model
3 Posterior Computation
3.1 Selection Indicators
3.2 Regression Coefficients
4 Illustration
4.1 Synthetic Datasets
4.2 Real Data Analysis
5 Conclusion
References
A Machine-Learning Framework for Plasma-Assisted Combustion Using Principal Component Analysis and Gaussian Process Regression
1 Introduction
2 Reactor Model and Ignition Simulations
3 PCA-Based Gaussian Process Regression
4 Results
4.1 Principal Component Analysis
4.2 Combination of PCA with Gaussian Process Regression
5 Conclusion
References
Estimating Exposure Fraction from Radiation Biomarkers: A Comparison of Frequentist and Bayesian Approaches
1 Introduction
2 Methodology
3 Simulation
4 Estimation of Exposed Fraction
5 Discussion
Appendix
References
A Review of Some Recent Advancements in Non-Ideal Compressible Fluid Dynamics
1 Introduction
2 Non-Ideal Oblique Shock Waves
3 NICFD Computational Model Accuracy Assessment
4 Bayesian Inference of Fluid Model Parameters
5 Conclusions
References
Dealing with High Dimensional Inconsistent Measurements in Inverse Problems Using Surrogate Modeling: An Approach Based on Sets and Intervals
1 Introduction
2 Identification Strategy and Outlier Detection Method
3 Results
3.1 Application with the Set-Valued Inverse Method When Measurements Are in a Small Amount
3.2 Application with the Set-Valued Inverse Method When Measurements Are in a Large Amount
4 Summary
References
Stochastic Preconditioners for Domain Decomposition Methods
1 Introduction
2 Acceleration of the Schwarz Method
3 Acceleration of Schur Complement Based Methods
4 Conclusions and Perspectives
References
Index



Space Technology Proceedings Volume 8

The Space Technology Proceedings series publishes cutting-edge volumes across space science and the aerospace industry, along with their robust applications. Explored in these conference proceedings are the state-of-the-art technologies, designs, and techniques used in spacecraft, space stations, and satellites, as well as their practical capabilities in GPS systems, remote sensing, weather forecasting, communication networks, and more. Interdisciplinary by nature, SPTP welcomes diverse contributions from experts across the hard and applied sciences, including engineering, aerospace and astronomy, earth sciences, physics, communication, and metrology. All SPTP books are published in print and electronic format, ensuring easy accessibility and wide visibility to a global audience. The books are typeset and processed by Springer Nature, providing a seamless production process and a high quality publication. To submit a proceedings proposal for this series, contact Hannah Kaufman (hannah. [email protected]).

More information about this series at https://link.springer.com/bookseries/6576

Massimiliano Vasile • Domenico Quagliarella Editors

Advances in Uncertainty Quantification and Optimization Under Uncertainty with Aerospace Applications Proceedings of the 2020 UQOP International Conference

Editors: Massimiliano Vasile, University of Strathclyde, Glasgow, UK

Domenico Quagliarella, Centro Italiano Ricerche Aerospaziali SCpA, Capua, Caserta, Italy

ISSN 1389-1766 Space Technology Proceedings ISBN 978-3-030-80541-8 ISBN 978-3-030-80542-5 (eBook) https://doi.org/10.1007/978-3-030-80542-5 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

In an expanding world with limited resources and increasing uncertainty, optimisation and uncertainty quantification become a necessity. Optimisation can turn a problem into a solution but needs to deal with the complexity of modern engineering systems and incorporate uncertainty from the start. It is generally recognised, in fact, that neglecting the impact of uncertainty on the design of any system or process can lead to unreliable design solutions. Common approaches that make use of safety margins to account for uncertainty in design and manufacturing are not adequate to fully capture the growing complexity of engineering systems and provide reliable, resilient and optimal solutions. This book collects a selection of manuscripts presented at the 2020 Conference on Uncertainty Quantification & Optimisation (UQOP 2020), held virtually on 16–19 November 2020. The conference was organised by the H2020 ETN UTOPIAE and ran jointly with the 9th International Conference on Bioinspired Optimisation Methods and Their Applications (BIOMA 2020). UQOP gathered internationally renowned researchers developing methods in the fields of optimisation and uncertainty quantification. The conference themes covered the computational, theoretical and application aspects of uncertainty treatment and optimisation under uncertainty, with a particular focus on applications in space and aerospace engineering and problems involving complex numerical models and large-scale expensive simulations. The papers published in this book contain recent advancements in one or more of the following areas:

• Robust and reliability-based optimisation
• Stochastic programming, stochastic inverse problems
• Bayesian inference
• Uncertainty quantification methods
• Sensitivity analysis
• Design of experiments
• Surrogate and reduced modelling
• High-dimensional problems


• Probability theory and imprecise probability
• Applications ranging from aerospace engineering and sciences, energy production and management, transportation, and manufacturing

The conference was divided into four thematic symposia, and the book maintains the same organisation with the following sections:

• The Uncertainty Quantification, Identification and Calibration in Aerospace Models section contains recent advances on methods for uncertainty quantification and their application to complex problems in aerospace.
• The Imprecise Probability, Theory and Applications section presents recent developments on imprecise probability theories with applications to space and aerospace engineering.
• The Robust and Reliability-Based Design Optimisation in Aerospace Engineering section collects recent advances in the field of design and optimisation under aleatory and epistemic uncertainty.
• The Applications of Uncertainty in Aerospace and Engineering section presents a number of real-world applications of uncertainty treatment in advanced industrial problems.

In this last symposium, in particular, the keynote "From Uncertainty Quantification to Shape Optimization: Cross-Fertilization of Methods for Dimensionality Reduction" highlighted how the correct management of the dimensionality of robust optimisation problems is an ingredient of fundamental importance for developing applications of engineering and industrial interest.

Massimiliano Vasile (Glasgow, UK)
Domenico Quagliarella (Capua, Caserta, Italy)

Contents

Part I Applications of Uncertainty in Aerospace & Engineering (ENG)
From Uncertainty Quantification to Shape Optimization: Cross-Fertilization of Methods for Dimensionality Reduction (Matteo Diez and Andrea Serani), p. 3
Cloud Uncertainty Quantification for Runback Ice Formations in Anti-Ice Electro-Thermal Ice Protection Systems (Bárbara Arizmendi Gutiérrez, Tommaso Bellosta, Giulio Gori, and Alberto Guardone), p. 21
Multi-fidelity Surrogate Assisted Design Optimisation of an Airfoil under Uncertainty Using Far-Field Drag Approximation (Elisa Morales, Péter Zénó Korondi, Domenico Quagliarella, Renato Tognaccini, Mariapia Marchi, Lucia Parussini, and Carlo Poloni), p. 35
Scalable Dynamic Asynchronous Monte Carlo Framework Applied to Wind Engineering Problems (Riccardo Tosi, Marc Nuñez, Brendan Keith, Jordi Pons-Prats, Barbara Wohlmuth, and Riccardo Rossi), p. 55
Multi-Objective Optimal Design and Maintenance for Systems Based on Calendar Times Using MOEA/D-DE (A. Cacereño, D. Greiner, and B. Galván), p. 69
Multi-objective Robustness Analysis of the Polymer Extrusion Process (Lino Costa and António Gaspar-Cunha), p. 85
Quantification of Operational and Geometrical Uncertainties of a 1.5-Stage Axial Compressor with Cavity Leakage Flows (Alexandre Gouttière, Dirk Wunsch, Virginie Barbieux, and Charles Hirsch), p. 97
Can Uncertainty Propagation Solve the Mysterious Case of Snoopy? (Thomas Caleb and Stéphanie Lizy-Destrez), p. 109

Part II Imprecise Probability, Theory and Applications (IP)
Robust Particle Filter for Space Navigation Under Epistemic Uncertainty (Cristian Greco and Massimiliano Vasile), p. 131
Computing Bounds for Imprecise Continuous-Time Markov Chains Using Normal Cones (Damjan Škulj), p. 151
Simultaneous Sampling for Robust Markov Chain Monte Carlo Inference (Daniel Krpelik, Louis J. M. Aslett, and Frank P. A. Coolen), p. 173
Computing Expected Hitting Times for Imprecise Markov Chains (Thomas Krak), p. 185

Part III Robust and Reliability-Based Design Optimisation in Aerospace Engineering (RBDO)
Multi-Objective Robust Trajectory Optimization of Multi-Asteroid Fly-By Under Epistemic Uncertainty (Simão da Graça Marto and Massimiliano Vasile), p. 209
Reliability-Based Robust Design Optimization of a Jet Engine Nacelle (Alberto Clarich and Rosario Russo), p. 231
Bayesian Optimization for Robust Solutions Under Uncertain Input (Hoai Phuong Le and Juergen Branke), p. 245
Optimization Under Uncertainty of Shock Control Bumps for Transonic Wings (Christian Sabater), p. 261
Multi-Objective Design Optimisation of an Airfoil with Geometrical Uncertainties Leveraging Multi-Fidelity Gaussian Process Regression (Péter Zénó Korondi, Mariapia Marchi, Lucia Parussini, Domenico Quagliarella, and Carlo Poloni), p. 281
High-Lift Devices Topology Robust Optimisation Using Machine Learning Assisted Optimisation (Lorenzo Gentile, Elisa Morales, Martin Zaefferer, Edmondo Minisci, Domenico Quagliarella, Thomas Bartz-Beielstein, and Renato Tognaccini), p. 297
Network Resilience Optimisation of Complex Systems (Gianluca Filippi and Massimiliano Vasile), p. 315
Gaussian Processes for CVaR Approximation in Robust Aerodynamic Shape Design (Elisa Morales, Domenico Quagliarella, and Renato Tognaccini), p. 327

Part IV Uncertainty Quantification, Identification and Calibration in Aerospace Models (UQ)
Inference Methods for Gas-Surface Interaction Models: From Deterministic Approaches to Bayesian Techniques (Anabel del Val, Olivier P. Le Maître, Olivier Chazot, Pietro M. Congedo, and Thierry E. Magin), p. 349
Bayesian Adaptive Selection Under Prior Ignorance (Tathagata Basu, Matthias C. M. Troffaes, and Jochen Einbeck), p. 365
A Machine-Learning Framework for Plasma-Assisted Combustion Using Principal Component Analysis and Gaussian Process Regression (Aurélie Bellemans, Mohammad Rafi Malik, Fabrizio Bisetti, and Alessandro Parente), p. 379
Estimating Exposure Fraction from Radiation Biomarkers: A Comparison of Frequentist and Bayesian Approaches (Adam Errington, Jochen Einbeck, and Jonathan Cumming), p. 393
A Review of Some Recent Advancements in Non-Ideal Compressible Fluid Dynamics (Giulio Gori, Olivier Le Maître, and Pietro M. Congedo), p. 407
Dealing with High Dimensional Inconsistent Measurements in Inverse Problems Using Surrogate Modeling: An Approach Based on Sets and Intervals (Krushna Shinde, Pierre Feissel, and Sébastien Destercke), p. 421
Stochastic Preconditioners for Domain Decomposition Methods (João F. Reis, Olivier P. Le Maître, Pietro M. Congedo, and Paul Mycek), p. 435

Index, p. 447

Part I Applications of Uncertainty in Aerospace & Engineering (ENG)

From Uncertainty Quantification to Shape Optimization: Cross-Fertilization of Methods for Dimensionality Reduction

Matteo Diez and Andrea Serani

1 Introduction

Simulation-based design (SBD) approaches have shown their maturity in successfully driving the design process of complex industrial applications subject to a variety of operating and environmental conditions. In recent years, SBD has moved to automatic SBD optimization (SBDO, e.g. [3]), embedding global optimization (GO) and uncertainty quantification (UQ) methods in the design process. In shape design, SBDO is composed of three main components: (1) a deterministic and/or stochastic physics-based solver, (2) an optimization algorithm, and (3) a shape parameterization/modification method, and is also referred to as simulation-driven design (SDD, see e.g. [12]). Despite the recent availability of high performance computing (HPC) systems, SBDO remains a theoretical, algorithmic, and technological challenge. The cost associated with the exploration of high-dimensional, large design spaces in the search for global optima is an outstanding critical issue, especially when high-fidelity, computationally expensive, and multi-disciplinary black-box tools are used for the performance analysis. High-dimensional spaces are generally more difficult and expensive to explore; nevertheless, they potentially allow for bigger improvements. Additionally, UQ of complex industrial applications is usually computationally expensive, especially if high-order statistical moments are evaluated as in robust and reliability-based design optimization. Both GO and UQ are affected by the curse of dimensionality: the computational cost dramatically increases with the problem dimension. Therefore, there exists a natural ground for cross-fertilization of UQ and GO methods specifically aimed at (or using) dimensionality reduction [14].

M. Diez () · A. Serani CNR–INM, National Research Council–Institute of Marine Engineering, Rome, Italy e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Vasile, D. Quagliarella (eds.), Advances in Uncertainty Quantification and Optimization Under Uncertainty with Aerospace Applications, Space Technology Proceedings 8, https://doi.org/10.1007/978-3-030-80542-5_1


These enable the efficient exploration of large design spaces in shape optimization, which, in turn, enables global multi-disciplinary optimization under uncertainty. Research in shape optimization has focused on shape/topology parameterization methods since, obviously, these methods have a large impact on optimization outcomes: ideally, the parameterization method should span the design variability with as few variables as possible. To this aim, a wide variety of shape modification approaches has been proposed in recent years [11, 19, 31]. Moreover, on-line and linear design-space dimensionality reduction methods have been proposed, requiring the evaluation of the objective function and/or its gradient. For instance, principal component analysis (PCA) or proper orthogonal decomposition (POD) methods have been applied for local reduced-dimensionality representations of feasible design regions [17]; a gradient-based POD/PCA approach is used in the active subspace method [15] to build low-dimensional representations of the design objective. The optimization efficiency is improved by basis rotation and/or dimensionality reduction. Nevertheless, these methods do not provide an up-front assessment of the design-space variability. Moreover, if gradients are not directly provided, their evaluation by finite differences may be inaccurate due to numerical noise affecting the solver output. Finally, these are local approaches and their extension to GO is not straightforward.

For these reasons, off-line or up-front (linear) methods have been proposed, focusing on design-space variability and dimensionality reduction. One of the first examples can be found in [18], where an orthogonal representation of supercritical airfoil sections was achieved by an iterative least-square fitting of known sections and subsequent Gram–Schmidt orthogonalization. In [32] the POD is used to reduce the design-space dimensionality in a transonic airfoil optimization. POD is used in [16] to derive an efficient reduced-dimensionality set of airfoil design parameters. Geometric data reduction by POD is used in [4] for an airfoil design, where new reduced-dimensionality design spaces are iteratively identified as the optimization progresses. Although the dimensionality reduction is embedded in an iterative procedure, geometric filtration by POD is performed in an outer loop before evolutionary optimization is performed and is based on geometry only. In [2] and [20] the Karhunen–Loève expansion (KLE) is used for representing distributed geometrical uncertainties and building a reduced-order spatial model for uncertainty quantification. A method based on the KLE has been formulated in [8] for the assessment of the shape modification variability and the definition of a reduced-dimensionality global model of the shape modification vector. KLE/PCA methods have been successfully applied for deterministic [28] and stochastic [9] hull form optimization of mono-hulls and catamarans in calm water and waves, respectively. In [6] KLE/PCA is used to assess, compare, and reduce the dimensionality of three design spaces obtained by different shape modification methods. A discussion on the industrial application of KLE/PCA methods to design-space dimensionality reduction is given in [12]. It may be noted that KLE is formulated in the continuous domain, leading to the eigenproblem of an integral operator. The discretization of the integral equation provides the eigenproblem of the autocovariance matrix of the discretized shape modification vector. This corresponds to solving the PCA of the discretized shape modification vector.


Generally, off-line POD/KLE/PCA approaches are formally equivalent, may be applied to arbitrary shape modification methods, and require no objective function/gradient evaluation, as the dimensionality reduction is based on the notion of geometric/shape variability. Interestingly, this is referred to as geometric variance and energy of the mode shapes in KLE [8] and POD [16] approaches, respectively. The shape optimization efficiency is improved by reparameterization and dimensionality reduction; the assessment of the design space and the associated shape parameterization is provided before the optimization, and even before performance analyses are performed. The assessment is based on the notion of geometric variability, making the approach fully off-line and very attractive, as no simulations are required for the dimensionality reduction. Nevertheless, the lack of physical information may become a critical issue in all those applications where small shape variations have a significant effect on the physics, such as flow separations, sonic shocks, etc. For this reason, extensions to physics-informed formulations have been proposed [24, 26, 27, 30]. Furthermore, if strongly nonlinear relationships exist between design variables, shape modification, and physical parameters, extensions to nonlinear dimensionality reduction methods may be of interest [5, 25]. The paper reviews and discusses recent methods for design-space dimensionality reduction based on the KLE. A discussion is provided on the use of geometry-based and physics-informed formulations. An example is shown and discussed for the hydrodynamic optimization of a naval destroyer.

2 Design-Space Dimensionality Reduction in Shape Optimization

2.1 Geometry-Based Formulation

Consider a geometric domain G (which identifies the initial shape) and a set of coordinates ξ ∈ G. Assume that u ∈ U is the design variable vector, which defines a shape modification vector δ (see Fig. 1). Consider the vector space of all possible square-integrable modifications of the initial shape, δ(ξ, u) ∈ L²ρ(G), where L²ρ(G) is the Hilbert space defined by the generalized inner product

$$(a, b)_\rho = \int_G \rho(\xi)\, a(\xi) \cdot b(\xi)\, d\xi \qquad (1)$$

with associated norm $\|a\|_\rho = (a, a)_\rho^{1/2}$, where ρ(ξ) ∈ R is an arbitrary weight function. Generally, ξ ∈ Rⁿ with n = 1, 2, 3, u ∈ R^M with M the number of design variables, and δ ∈ R^m with m = 1, 2, 3 (with m not necessarily equal to n). Assume that, before running the shape optimization procedure, the design problem is affected by epistemic uncertainty, the optimal design not being known a priori. Therefore, u may be given a probability density function f(u), which represents the degree of belief that the optimal design will be found in certain regions of the design space.

Fig. 1 Scheme and notation for the current formulation, showing an example for n = 1 and m = 2

The associated mean shape modification is then

$$\langle \delta \rangle = \int_U \delta(\xi, u)\, f(u)\, du \qquad (2)$$

where ⟨δ⟩ is a function of ξ, ⟨·⟩ being the ensemble average over u. The variance associated with the shape modification vector (geometric variance) is defined as

$$\sigma^2 = \left\langle \|\hat{\delta}\|_\rho^2 \right\rangle = \int_U \int_G \rho(\xi)\, \hat{\delta}(\xi, u) \cdot \hat{\delta}(\xi, u)\, f(u)\, d\xi\, du \qquad (3)$$

where δ̂(ξ, u) = δ(ξ, u) − ⟨δ⟩. The aim of the KLE is to find an optimal basis of orthonormal functions for δ̂:

$$\hat{\delta}(\xi, u) \approx \sum_{k=1}^{N} x_k(u)\, \varphi_k(\xi) \qquad (4)$$

where

$$x_k(u) = (\hat{\delta}, \varphi_k)_\rho = \int_G \rho(\xi)\, \hat{\delta}(\xi, u) \cdot \varphi_k(\xi)\, d\xi \qquad (5)$$

are the basis function components or coefficients, used hereafter as new design variables.

The optimality condition associated with the KLE refers to the geometric variance resolved by the basis functions through Eq. (4). Combining Eqs. (3)–(5) yields

$$\sigma^2 = \sum_{k=1}^{\infty} \sum_{j=1}^{\infty} \langle x_k x_j \rangle\, (\varphi_k, \varphi_j)_\rho = \sum_{j=1}^{\infty} \langle x_j^2 \rangle = \sum_{j=1}^{\infty} \left\langle (\hat{\delta}, \varphi_j)_\rho^2 \right\rangle \qquad (6)$$

The basis resolving the maximum variance is formed by the solutions ϕ of the variational problem

$$\underset{\varphi \in L^2_\rho(G)}{\text{maximize}} \quad J(\varphi_k) = \left\langle (\hat{\delta}, \varphi_k)_\rho^2 \right\rangle \quad \text{subject to} \quad (\varphi_k, \varphi_k)_\rho^2 = 1 \qquad (7)$$

which yields (e.g., [8])

$$\mathcal{L}\varphi_k(\xi) = \int_G \rho(\xi') \left\langle \hat{\delta}(\xi, u) \otimes \hat{\delta}(\xi', u) \right\rangle \varphi_k(\xi')\, d\xi' = \lambda_k \varphi_k(\xi) \qquad (8)$$

where ⊗ indicates the outer product and $\mathcal{L}$ is a self-adjoint integral operator whose eigensolutions define the optimal basis functions for the linear representation of Eq. (4). Therefore, its eigenfunctions (KL modes) $\{\varphi_k\}_{k=1}^{\infty}$ are orthogonal and form a complete basis for $L^2_\rho(G)$. Additionally, it may be proven that

$$\sigma^2 = \sum_{k=1}^{\infty} \lambda_k \qquad (9)$$

where the eigenvalues λ_k (KL values) represent the variance resolved by the associated basis function ϕ_k, through its component x_k in Eq. (4):

$$\lambda_k = \langle x_k^2 \rangle \qquad (10)$$

Finally, the solutions $\{\varphi_k\}_{k=1}^{\infty}$ of Eq. (8) are used to define the reduced-dimensionality space for the shape modification. Defining l, 0 < l ≤ 1, as the desired level of confidence for the shape modification variability, the smallest N in Eq. (4) is selected such that

$$\sum_{k=1}^{N} \lambda_k \geq l \sum_{k=1}^{\infty} \lambda_k = l\, \sigma^2 \qquad (11)$$

with λ_k ≥ λ_{k+1}.


Fig. 2 Block diagram for simulation-based shape optimization using geometry-based dimensionality reduction

It may be shown how the numerical solution of Eq. (8) via discretization of the shape domain and Monte Carlo (MC) sampling over design variables in u yields the PCA of the discretized shape modification vector. Details of equations and numerical implementation are given in [8]. The block diagram for simulation-based shape optimization using geometry-based dimensionality reduction is shown in Fig. 2.
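To make the discrete procedure concrete, the following minimal sketch (an illustration added here, not part of the original chapter) performs the PCA of a Monte Carlo sample of discretized shape modifications and selects the number of retained KL modes according to the variance criterion of Eq. (11). The sample matrix and the discrete weights (counterparts of ρ(ξ) and of the surface metric) are assumed to be supplied by the user's shape-modification routine.

```python
import numpy as np

def kle_modes(delta_samples, w, l=0.95):
    """Geometry-based KLE/PCA of discretized shape modifications.

    delta_samples : (S, n) array; each row is a flattened shape modification
                    delta(xi, u) evaluated for one MC design vector u
    w             : (n,) positive weights, discrete counterpart of rho(xi) dxi
    l             : fraction of geometric variance to retain, Eq. (11)
    Returns the retained KL values lambda_k and KL modes phi_k (as columns).
    """
    D = delta_samples - delta_samples.mean(axis=0)        # remove <delta>, Eq. (2)
    sqw = np.sqrt(w)
    B = D * sqw                                           # weighted snapshots
    C = B.T @ B / D.shape[0]                              # symmetric discrete form of Eq. (8)
    lam, psi = np.linalg.eigh(C)                          # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]
    lam, psi = lam[order], psi[:, order]
    phi = psi / sqw[:, None]                              # back to unweighted modes
    N = int(np.searchsorted(np.cumsum(lam) / lam.sum(), l)) + 1   # Eq. (11)
    return lam[:N], phi[:, :N]
```

With the retained modes, a candidate design is reconstructed as δ ≈ Σ x_k ϕ_k, so the coefficients x_k become the new, reduced set of design variables.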

2.2 Physics-Informed Formulation

Along with the shape modification vector δ, consider a distributed physical parameter vector π ∈ R^p, p = 1, …, ∞ (including, e.g., velocity, pressure, etc.) defined on a physical domain P ∈ Rⁿ and a lumped physical parameter vector θ ∈ R^q, q = 1, …, ∞ (including, e.g., resistance) on a domain Q (see Fig. 3). Note that Q has a null measure and corresponds to an arbitrary point where the lumped physical parameter vector is virtually defined. Also note that, in general, D ≡ G ∪ P ∪ Q is not simply connected. Rather, G may be a boundary of P.


Fig. 3 Domains for shape modification vector, distributed physical parameter vector, and lumped (or global) physical parameter vector in a disjoint Hilbert space

Consider now a combined geometry-based and physics-informed vector γ ∈ R^m, and for the sake of simplicity a set of coordinates ξ ∈ Rⁿ, such that

$$\gamma(\xi, u) = \begin{cases} \delta(\xi, u) & \text{if } \xi \in G \\ \pi(\xi, u) & \text{if } \xi \in P \\ \theta(\xi, u) & \text{if } \xi \in Q \end{cases} \qquad (12)$$

belongs to a disjoint Hilbert space $L^2_\rho(D)$, defined by the generalized inner product

$$(a, b)_\rho = \int_D \rho(\xi)\, a(\xi) \cdot b(\xi)\, d\xi = \int_G \rho(\xi)\, a(\xi) \cdot b(\xi)\, d\xi + \int_P \rho(\xi)\, a(\xi) \cdot b(\xi)\, d\xi + \rho(\xi_\theta)\, a(\xi_\theta) \cdot b(\xi_\theta) \qquad (13)$$

with associated norm $\|a\|_\rho = (a, a)_\rho^{1/2}$. Again, considering all possible realizations of u, the associated mean vector is

$$\langle \gamma \rangle = \int_U \gamma(\xi, u)\, f(u)\, du \qquad (14)$$

where ⟨γ⟩ is a function of ξ. The associated variance (which now considers a combined geometry-based and physics-informed design variability) equals

$$\sigma^2 = \left\langle \|\hat{\gamma}\|_\rho^2 \right\rangle = \int_U \int_D \rho(\xi)\, \hat{\gamma}(\xi, u) \cdot \hat{\gamma}(\xi, u)\, f(u)\, d\xi\, du \qquad (15)$$

where γ̂(ξ, u) = γ(ξ, u) − ⟨γ⟩ represents the physics-informed design modification vector.


Similarly to the previous case, the aim is to find an optimal basis of orthonormal functions, which will be used to construct a linear representation of γ̂:

$$\hat{\gamma}(\xi, u) \approx \sum_{k=1}^{N} x_k(u)\, \psi_k(\xi) \qquad (16)$$

where by definition [27]

$$\psi_k(\xi) = \begin{cases} \varphi_k(\xi) & \text{if } \xi \in G \\ \chi_k(\xi) & \text{if } \xi \in P \\ \nu_k(\xi) & \text{if } \xi \in Q \end{cases} \qquad (17)$$

and

$$x_k(u) = (\hat{\gamma}, \psi_k)_\rho = \int_D \rho(\xi)\, \hat{\gamma}(\xi, u) \cdot \psi_k(\xi)\, d\xi \qquad (18)$$

Similarly to Eq. (8), the solution is given by

$$\mathcal{L}\psi_k(\xi) = \int_D \rho(\xi') \left\langle \hat{\gamma}(\xi, u) \otimes \hat{\gamma}(\xi', u) \right\rangle \psi_k(\xi')\, d\xi' = \lambda_k \psi_k(\xi) \qquad (19)$$

where again $\mathcal{L}$ is a self-adjoint integral operator whose eigensolutions define the optimal basis functions for the linear representation of Eq. (16). Therefore, its eigenfunctions (KL modes) $\{\psi_k\}_{k=1}^{\infty}$ are orthogonal and form a complete basis for $L^2_\rho(D)$. After dimensionality reduction is performed, the geometric components $\{\varphi_k\}_{k=1}^{N}$ of the eigenvectors ψ_k in Eq. (17) are used for the new representation of the shape modification vector. Again, it may be shown how the numerical solution of Eq. (19) via discretization of shape and physics domains and MC sampling over design variables u yields the PCA of the discretized shape and physical parameters modification vector. Details of equations and numerical implementation may be found in [24, 26, 27]. The block diagram for simulation-based shape optimization using physics-informed dimensionality reduction is shown in Fig. 4. For the sake of computational efficiency, the procedure includes low-fidelity analysis tools in the up-front dimensionality reduction, whereas high-fidelity analysis tools are devoted to driving the design optimization loop.
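As with the geometry-based case, the discrete counterpart is a PCA, now of the combined vector of Eq. (12). The following sketch (an added illustration with placeholder inputs, not the authors' code) stacks the weighted shape, distributed, and lumped physical data evaluated by a low-fidelity solver for each Monte Carlo sample, solves the discrete eigenproblem of Eq. (19), and returns only the geometric blocks ϕ_k of the modes ψ_k, per Eq. (17). Any scaling needed to make heterogeneous quantities comparable is assumed to be folded into the weight vectors.

```python
import numpy as np

def augmented_kle_modes(delta, pi, theta, w_delta, w_pi, w_theta, l=0.95):
    """Physics-informed KLE/PCA sketch based on the combined vector of Eq. (12).

    delta : (S, n_d) MC sample of shape modifications
    pi    : (S, n_p) corresponding distributed physical parameters (low fidelity)
    theta : (S, n_q) corresponding lumped physical parameters (low fidelity)
    w_*   : positive weight vectors (discrete rho, including user-chosen scaling)
    Returns the retained KL values and the geometric blocks phi_k of the
    modes psi_k (Eq. 17), used to reparameterize the shape modification.
    """
    gamma = np.hstack([delta, pi, theta])                  # combined vector, Eq. (12)
    w = np.concatenate([w_delta, w_pi, w_theta])
    G = gamma - gamma.mean(axis=0)                         # remove <gamma>, Eq. (14)
    sqw = np.sqrt(w)
    B = G * sqw
    lam, psi = np.linalg.eigh(B.T @ B / G.shape[0])        # discrete form of Eq. (19)
    order = np.argsort(lam)[::-1]
    lam = lam[order]
    psi = psi[:, order] / sqw[:, None]                     # unweighted eigenvectors
    N = int(np.searchsorted(np.cumsum(lam) / lam.sum(), l)) + 1
    return lam[:N], psi[:delta.shape[1], :N]               # geometric block only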


Fig. 4 Block diagram for simulation-based shape optimization using physics-informed dimensionality reduction

3 Example Application

KLE/PCA is applied to the shape reparameterization and hull form optimization of the DTMB 5415 model, an open-to-public naval combatant hull widely used as an optimization benchmark in the ship hydrodynamic community [10]. The optimization problem pertains to the minimization of the total resistance in calm water at Fr = 0.28 (equivalent to 20 kn for the full-scale ship, Lpp = 142 m), subject to fixed length between perpendiculars, vessel displacement greater than or equal to the original, ±5% maximum variation of beam and draught, and a dedicated volume for the sonar in the bow dome. Hydrodynamic simulations are conducted using the code WARP (WAve Resistance Program), developed at CNR-INM. Wave resistance computations are based on the linear potential-flow theory using Dawson (double-model) linearization [7]. The frictional resistance is estimated using a flat-plate approximation, based on the local Reynolds number [21]. Details of equations, numerical implementations, and validation of the numerical solver are given in [1]. Simulations are performed with two degrees of freedom (sinkage and trim) for the demi-hull, taking advantage of symmetry about the ξ1ξ3-plane.
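As a quick consistency check (added here, not part of the original text), the quoted speed and length indeed correspond to the stated Froude number:

$$Fr = \frac{V}{\sqrt{g\, L_{pp}}} = \frac{20 \times 0.5144\ \mathrm{m/s}}{\sqrt{9.81\ \mathrm{m/s^2} \times 142\ \mathrm{m}}} \approx \frac{10.3}{37.3} \approx 0.28$$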

The computational domain for the free surface is defined within 0.5Lpp upstream, 1.5Lpp downstream, and 1Lpp sideways. For the shape optimization problem, 180 × 50 grid nodes are used for the hull, whereas 150 × 50 nodes are used for the free surface.

The original design space for the shape modification is formed by M = 10 design variables, defined by the free-form deformation (FFD) method [22]. Specifically, the demi-hull is embedded in a lattice of 5 × 3 × 3 nodes in the ξ1ξ2ξ3 reference system: only 10 nodes are active and can be moved in the beam (ξ2) direction only (an illustrative sketch of the FFD mapping is given below). Figure 5 shows an example of shape modification by a contour plot: the FFD lattice is represented by the black points, the active nodes are depicted with green circles, and the modified nodes are shown with blue diamonds.

Fig. 5 FFD shape modification example: black points are the FFD lattice nodes, green circles are the active nodes, and blue diamonds are the modified nodes in this example

Data for geometry-based KLE/PCA collects the shape modification vector δ components. Physics-informed KLE/PCA data also includes heterogeneous/distributed and lumped physical parameters. Specifically, the pressure distribution (p), the wave elevation (η), and the wave resistance coefficient (Cw) are taken into account. These physical parameters are based on even-keel WARP solutions obtained with a quite coarse panel grid. A hybrid global/local deterministic particle swarm optimization algorithm [23] is used, assuming a limited budget of function evaluations equal to 200. The optimization algorithm setup is taken from [29]. For the sake of the current example, two-degrees-of-freedom WARP solutions obtained with a quite fine panel grid are used to drive the optimization loop.

KLE/PCA is trained by sets of S = 100, 1000, and 10,000 MC samples and the variance resolved is presented in Fig. 6. Specifically, Fig. 6 shows the cumulative sum of the KL eigenvalues as a percentage of the total variance, along with the mean and 95% confidence interval evaluated by bootstrap analysis. The results are found to be convergent versus the number of samples. If at least 95% of the original variance (σ²) is desired, N = 4 reduced design variables are needed using geometry-based KLE/PCA, whereas physics-informed KLE/PCA requires N = 6 variables. The corresponding KL modes for the shape modification are shown in Figs. 7 and 8. It can be noted how the use of physical parameters affects the shape of the KL modes. The optimization is performed with the original FFD and the two reduced design spaces. The optimization convergence is shown in Fig. 9.
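The sketch below (an added illustration, not the authors' code) shows the kind of trivariate Bernstein-polynomial FFD mapping introduced in [22], specialized to a lattice whose node displacements act as design variables; the surface points are assumed to be already expressed in normalized lattice coordinates.

```python
import numpy as np
from math import comb

def bernstein(i, n, t):
    """Bernstein polynomial B_i^n(t), with t in [0, 1]."""
    return comb(n, i) * t**i * (1.0 - t)**(n - i)

def ffd_displacement(points, lattice_disp):
    """Trivariate FFD displacement in the spirit of Sederberg and Parry [22].

    points       : (P, 3) hull surface points in normalized lattice coordinates,
                   each component in [0, 1]
    lattice_disp : (L, M, N, 3) displacements assigned to the control nodes,
                   e.g. a 5 x 3 x 3 lattice where only the 10 active nodes
                   receive a nonzero displacement in the beam direction
    Returns the (P, 3) displacement field to be added to the original shape.
    """
    L, M, N, _ = lattice_disp.shape
    delta = np.zeros_like(points, dtype=float)
    for i in range(L):
        Bi = bernstein(i, L - 1, points[:, 0])
        for j in range(M):
            Bj = bernstein(j, M - 1, points[:, 1])
            for k in range(N):
                Bk = bernstein(k, N - 1, points[:, 2])
                delta += (Bi * Bj * Bk)[:, None] * lattice_disp[i, j, k]
    return delta
```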

Fig. 6 Variance resolved by the KL modes conditional to the number of MC samples (S), along with mean value and confidence interval provided by the bootstrap method. (a) Geometry-based. (b) Physics-informed

The original design space (M = 10) achieves a 7.6% reduction of the total resistance. Using the geometry-based reduced design space (N = 4) improves the algorithm convergence, achieving a better optimum with an 8.4% objective improvement. Finally, the physics-informed reduced design space (N = 6) provides the best optimum, with an 11.4% reduction of the total resistance. This means that, even if physics-informed KLE/PCA is not able to reduce the design-space dimensionality as much as the geometry-based KLE/PCA (six versus four design variables, respectively), using physical parameters in the dimensionality-reduction phase provides a more efficient and effective design space, leveraging the resulting physical significance of the KL modes.
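For completeness, the following minimal sketch (added here, using the conventions of the earlier PCA sketch) illustrates the kind of bootstrap assessment summarized in Fig. 6: the MC sample of shape modifications is resampled with replacement, the KL eigenvalues are recomputed, and percentile bands of the cumulative resolved-variance curve are returned.

```python
import numpy as np

def resolved_variance_bootstrap(delta_samples, w, n_boot=200, seed=0):
    """Bootstrap bands for the cumulative variance resolved by the KL modes.

    delta_samples, w : as in the geometry-based PCA sketch above
    Returns the mean curve and the 2.5/97.5 percentile curves (95% band).
    """
    rng = np.random.default_rng(seed)
    S = delta_samples.shape[0]
    sqw = np.sqrt(w)
    curves = []
    for _ in range(n_boot):
        idx = rng.integers(0, S, S)                       # resample with replacement
        D = delta_samples[idx] - delta_samples[idx].mean(axis=0)
        B = D * sqw
        lam = np.linalg.eigvalsh(B.T @ B / S)[::-1]       # descending KL values
        curves.append(np.cumsum(lam) / lam.sum())
    curves = np.asarray(curves)
    lo, hi = np.percentile(curves, [2.5, 97.5], axis=0)
    return curves.mean(axis=0), lo, hi
```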

Fig. 7 Geometry-based eigenvectors magnitude. (a) φ1. (b) φ2. (c) φ3. (d) φ4

Fig. 8 Physics-informed eigenvectors magnitude (shape modification component). (a) φ1. (b) φ2. (c) φ3. (d) φ4. (e) φ5. (f) φ6

Fig. 9 Optimization convergence

Fig. 10 Optimal shapes for the original and the reduced design spaces: shape modification in ξ2 direction. (a) Original FFD. (b) Geometry-based. (c) Physics-informed

Fig. 11 Comparison of the pressure field on the parent (a) and the optimal (b–d) hulls. (b) Original FFD. (c) Geometry-based. (d) Physics-informed

The optimal hull shapes, along with the contour of the shape modification, are shown in Fig. 10. It can be noted how the original FFD and geometry-based reduced design-space optima are quite similar, whereas the optimum obtained by the physics-informed reduced design space is significantly different, showing a wider bow dome. Finally, comparisons of the pressure distribution on the hull and of the wave elevation pattern produced by the parent and the optimal hull forms are given in Figs. 11 and 12, respectively. The design obtained by the physics-informed reduced design space shows a better pressure recovery towards the stern compared to the other optima, along with a significant reduction of both transverse and diverging Kelvin waves.

Fig. 12 Comparison of the wave elevation pattern produced by the parent (a) and the optimal (b–d) hull shapes. (b) Original FFD. (c) Geometry-based. (d) Physics-informed

4 Concluding Remarks

The paper reviewed and discussed recent methods for design-space dimensionality reduction in global simulation-based shape optimization, based on the KLE. The approach moves from the assumption that, before running the design optimization procedure, the design problem is affected by epistemic uncertainty, the optimal design not being known a priori. Therefore, the design variable vector becomes stochastic, opening the door to cross-fertilization with methods for stochastic processes specifically aimed at dimensionality reduction, such as the KLE. KLE is formally equivalent to POD for flows/turbulence studies and reduces to the PCA of the discretized shape (and optionally physical parameters) modification vector. The formulations discussed here span from geometry-only to physics-informed. An example has been shown and discussed for the hydrodynamic optimization of a ship hull. The methodology goes beyond the current application and may be applied in all those areas where the design performance depends on the product shape (such as aerodynamics, heat transfer, acoustics and aeroacoustics, etc.).

The geometry-based formulation has the evident advantage that no simulations are required before running the optimization. Nevertheless, no physical information is used in the analysis, and therefore no physical meaning is provided of what is physically resolved in the reduced-dimensionality design space and what is not. In other words, KL modes are not necessarily physically relevant. The physics-informed formulation overcomes this limitation by including physical parameters provided by low-fidelity solvers. The resulting reduced-dimensionality representation of the shape modification vector is more efficient and effective than that provided by the geometry-based formulation, leveraging the physical significance of the KL modes.

Nevertheless, this is achieved at a higher computational cost, as low-fidelity solvers need to be run during the design-space dimensionality-reduction phase. However, if the computational cost associated with low-fidelity solutions is orders of magnitude smaller than that associated with the high-fidelity solutions used during the optimization phase, the physics-informed formulation is still very convenient.

A third, compromise option is to use geometrical parameters that are physically relevant, such as, for instance, air/hydrofoil section global/integral parameters [33] or hull section/waterplane-area global/integral parameters [13]. This would allow for considering physically related parameters (though by geometry only) without the need for running any physical-model solver in the design-space dimensionality reduction phase, providing a reasonable compromise between geometry-based and physics-informed formulations.

It may be noted that the formulations discussed here rely on a linear dimensionality reduction approach and therefore a linear representation of the reduced-dimensionality shape modification vector (linear subspace). If strongly nonlinear relationships exist between design variables, shape modification, and physical parameters, nonlinear dimensionality reduction methods may provide a more efficient and effective representation of the reduced-dimensionality shape modification vector, as discussed in [5, 25].

In summary, theory and techniques for stochastic processes and dimensionality reduction in uncertainty modelling and quantification, such as the KLE and its variants, have demonstrated their capability of providing a rigorous and powerful framework for design-space variability assessment and dimensionality reduction in global shape optimization. This cross-fertilization of methods allows for the efficient exploration of large design spaces in shape optimization, which, in turn, enables global multi-disciplinary optimization under uncertainty.

Acknowledgments The authors are grateful to Drs. Woei-Min Lin, Elena McCarthy, and Salahuddin Ahmed of the Office of Naval Research and Office of Naval Research Global for their support through NICOP grant N62909-18-1-2033.

References 1. Bassanini, P., Bulgarelli, U., Campana, E.F., Lalli, F.: The wave resistance problem in a boundary integral formulation. Surv. Math. Ind. 4, 151–194 (1994) 2. Borzì, A., Schulz, V., Schillings, C., Von Winckel, G.: On the treatment of distributed uncertainties in PDE-constrained optimization. GAMM-Mitteilungen 33(2), 230–246 (2010) 3. Campana, E.F., Peri, D., Tahara, Y., Stern, F.: Shape optimization in ship hydrodynamics using computational fluid dynamics. Comput. Methods Appl. Mech. Eng. 196(1–3), 634–651 (2006) 4. Cinquegrana, D., Iuliano, E.: Investigation of adaptive design variables bounds in dimensionality reduction for aerodynamic shape optimization. Comput. Fluids 174, 89–109 (2018) 5. D’Agostino, D., Serani, A., Campana, E.F., Diez, M.: Deep autoencoder for off-line designspace dimensionality reduction in shape optimization. In: 2018 AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, p. 1648 (2018)


6. D’Agostino, D., Serani, A., Diez, M.: Design-space assessment and dimensionality reduction: an off-line method for shape reparameterization in simulation-based optimization. Ocean Eng. 197, 106852 (2020) 7. Dawson, C.W.: A practical computer method for solving ship-wave problems. In: Proceedings of the 2nd International Conference on Numerical Ship Hydrodynamics, Berkeley, pp. 30–38 (1977) 8. Diez, M., Campana, E.F., Stern, F.: Design-space dimensionality reduction in shape optimization by Karhunen–Loève expansion. Comput. Methods Appl. Mech. Eng. 283, 1525–1544 (2015) 9. Diez, M., Campana, E.F., Stern, F.: Stochastic optimization methods for ship resistance and operational efficiency via CFD. Struct. Multidiscip. Optim. 57(2), 735–758 (Feb 2018) 10. Grigoropoulos, G., Campana, E., Diez, M., Serani, A., Goren, O., Sariöz, K., Dani¸sman, D., Visonneau, M., Queutey, P., Abdel-Maksoud, M., et al.: Mission-based hull-form and propeller optimization of a transom stern destroyer for best performance in the sea environment. In: VII International Conference on Computational Methods in Marine Engineering MARINE2017 (2017) 11. Haftka, R.T., Grandhi, R.V.: Structural shape optimization-a survey. Comput. Methods Appl. Mech. Eng. 57(1), 91–106 (1986) 12. Harries, S., Abt, C.: Faster turn-around times for the design and optimization of functional surfaces. Ocean Eng. 193, 106470 (2019) 13. Khan S., Kaklis P., Serani A., Diez, M. (2021) Supporting Expensive Physical Models with Geometric Moment Invariants to Accelerate Sensitivity Analysis for Shape Optimisation., AIAA SciTech 2022 Forum 14. Le Maître, O., Knio, O.M.: Spectral Methods for Uncertainty Quantification: With Applications to Computational Fluid Dynamics. Springer Science & Business Media, Berlin (2010) 15. Lukaczyk, T., Palacios, F., Alonso, J.J., Constantine, P.: Active subspaces for shape optimization. In: Proceedings of the 10th AIAA Multidisciplinary Design Optimization Specialist Conference, National Harbor, MD, 13–17 Jan 2014 16. Poole, D.J., Allen, C.B., Rendall, T.C.: Metric-based mathematical derivation of efficient airfoil design variables. AIAA J. 53(5), 1349–1361 (2015) 17. Raghavan, B., Xiang, L., Breitkopf, P., Rassineux, A., Villon, P.: Towards simultaneous reduction of both input and output spaces for interactive simulation-based structural design. Comput. Methods Appl. Mech. Engrg. 265, 174–185 (2013) 18. Robinson, G., Keane, A.: Concise orthogonal representation of supercritical airfoils. J. Aircr. 38(3), 580–583 (2001) 19. Samareh, J.A.: Survey of shape parameterization techniques for high-fidelity multidisciplinary shape optimization. AIAA J. 39(5), 877–884 (2001) 20. Schillings, C., Schmidt, S., Schulz, V.: Efficient shape optimization for certain and uncertain aerodynamic design. Comput. Fluids 46(1), 78–87 (2011) 21. Schlichting, H., Gersten, K.: Boundary-Layer Theory. Springer, Berlin (2000) 22. Sederberg, T.W., Parry, S.R.: Free-form deformation of solid geometric models. ACM SIGGRAPH Comput. Graph. 20(4), 151–160 (1986) 23. Serani, A., Diez, M., Campana, E.F., Fasano, G., Peri, D., Iemma, U.: Globally convergent hybridization of particle swarm optimization using line search-based derivative-free techniques. In: Yang, X.S. (ed.) Recent Advances in Swarm Intelligence and Evolutionary Computation, Studies in Computational Intelligence, vol. 585, pp. 25–47. Springer International, Cham (2015) 24. 
Serani, A., Campana, E.F., Diez, M., Stern, F.: Towards augmented design-space exploration via combined geometry and physics based Karhunen-Loève expansion. In: 18th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference (MA&O), AVIATION 2017, Denver, 5–9 June 2017 25. Serani, A., D’Agostino, D., Campana, E.F., Diez, M., et al.: Assessing the interplay of shape and physical parameters by unsupervised nonlinear dimensionality reduction methods. J. Ship Res. 64, 313–327 (2020)


26. Serani, A., Diez, M.: Shape optimization under stochastic conditions by design-space augmented dimensionality reduction. In: 19th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference (MA&O), AVIATION 2018. Atlanta, 25–29 June 2018 27. Serani, A., Diez, M., Wackers, J., Visonneau, M., Stern, F.: Stochastic shape optimization via design-space augmented dimensionality reduction and rans computations. In: AIAA SciTech 2019 Forum, p. 2218 (2019) 28. Serani, A., Fasano, G., Liuzzi, G., Lucidi, S., Iemma, U., Campana, E.F., Stern, F., Diez, M.: Ship hydrodynamic optimization by local hybridization of deterministic derivative-free global algorithms. Appl. Ocean Res. 59, 115–128 (2016) 29. Serani, A., Leotardi, C., Iemma, U., Campana, E.F., Fasano, G., Diez, M.: Parameter selection in synchronous and asynchronous deterministic particle swarm optimization for ship hydrodynamics problems. Appl. Soft Comput. 49, 313–334 (2016) 30. Serani, A., Stern, F., Campana, E.F. et al. Hull-form stochastic optimization via computationalcost reduction methods. Eng. Comput. (2021). https://doi.org/10.1007/s00366-021-01375-x 31. Sieger, D., Menzel, S., Botsch, M.: On shape deformation techniques for simulation-based design optimization. In: Perotto, S., Formaggia, L. (eds.) New Challenges in Grid Generation and Adaptivity for Scientific Computing, pp. 281–303. Springer International Publishing, Cham (2015) 32. Toal, D.J., Bressloff, N.W., Keane, A.J., Holden, C.M.: Geometric filtration using proper orthogonal decomposition for aerodynamic design optimization. AIAA J. 48(5), 916–928 (2010) 33. Volpi, S., Diez, M., Stern, F.: Multidisciplinary design optimization of a 3d composite hydrofoil via variable accuracy architecture. In: 2018 Multidisciplinary Analysis and Optimization Conference, p. 4173 (2018)

Cloud Uncertainty Quantification for Runback Ice Formations in Anti-Ice Electro-Thermal Ice Protection Systems

Bárbara Arizmendi Gutiérrez, Tommaso Bellosta, Giulio Gori, and Alberto Guardone

B. Arizmendi Gutiérrez · T. Bellosta · A. Guardone
Department of Aerospace Science and Technology, Politecnico di Milano, Milan, Italy
e-mail: [email protected]
G. Gori
INRIA/Centre de Mathématiques Appliquées, École Polytechnique, IPP, Palaiseau, France
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Vasile, D. Quagliarella (eds.), Advances in Uncertainty Quantification and Optimization Under Uncertainty with Aerospace Applications, Space Technology Proceedings 8, https://doi.org/10.1007/978-3-030-80542-5_2

Nomenclature

μ_m  Expectancy
σ^2  Variance
β  Collection efficiency
H  IPS substratum thickness [m]
k_wall  Effective thermal conductivity
c_p  Specific heat [J kg−1 K−1]
F  Wetness fraction
δ  Thickness [m]
h  Heat transfer coefficient [W m−2 K−1]
s  Curvilinear coordinate [m]
V  Velocity [m s−1]
ψ_k  kth multivariate polynomial basis
T  Temperature [K]
i_{l−v}  Vaporization latent heat [J kg−1]
i_{l−s}  Solidification latent heat [J kg−1]
A  Control volume surface area [m2]
c  Chord [m]
ṁ(s_i)  Local mass rate [kg s−1]
ṁ'(s_i)  Local mass flux rate [kg m−2 s−1]
Ṁ  Total mass rate [kg s−1]
ξ  Vector of random variables
N  Number of samples
β_i  ith deterministic coefficient
d  Number of uncertain variables
P  Polynomial truncation degree
q  Quadrature points
AI-ETIPS  Anti-Ice Electro-Thermal Ice Protection System

Subscripts

evap  Evaporative
water  Liquid film
wall  External solid surface
ref  Reference temperature, 273.15 K
boil  Boiling temperature, 373.15 K
rec  Recovery
in  Incoming to a control volume
out  Outgoing from a control volume
f  Liquid film
imp  Impinging
ice  Freezing
∞  Free stream
0  Total

1 Introduction When an aircraft is traversing a very moist environment, that is a cloud, and the static air temperature is below the freezing point, accumulation of ice over its surfaces occurs, namely, in-flight icing. The water contained in these clouds is in a supercooled meta-stable equilibrium state i.e. droplets are liquid despite being their temperature below the water freezing point. When this meta-stable equilibrium is perturbed by impacting on a surface, the droplets totally or partially freeze [11]. Ice accretion is a major threat to the safety and performance of aircraft. Mild ice accretions might even entail significant undesirable effects. These include a drop in aerodynamic performance, decrease in controllability and malfunctioning of probes. Luckily, numerous technologies have been developed for protection against ice, these are the so-called Ice Protection Systems (IPS). They are commonly deployed in critical parts to assure their adequate performance. In case of aircraft, these include probes, windshields, engine lips and wings. IPS seek to remove or delay ice formations. De-Ice technologies operate by removing an already accreted layer of ice. Anti-Ice technologies (AI) aim to delay or avoid the formation of ice over the protected parts. In the case of thermal AI-IPS heating power is supplied such


that the impinging water can be fully or partly evaporated. The residual water is driven downstream by aerodynamic shear forces of the airflow, where it might freeze causing runback ice. Runback ice can equally compromise the performance of aircraft. All the mature IPS technologies present strengths and weaknesses and often a combination of them is deployed for full protection [1]. Electro-thermal IPS (ETIPS) is a mature technology, which exploits the Joule effect to transform electrical current into heat. A substratum with embedded resistive circuits is positioned in the protected parts. ETIPS is one preferred method due to its geometric and operational flexibility, lightweight and reliability. However, ETIPS draw large amounts of power, especially to protect large areas in AI operation. The temperature rise induced seeks to partly or totally evaporate the impinging water in the protected parts. This determines the two possible Anti-Ice regimes namely, Fully Evaporative (FE) and Running Wet (RW). In RW regime runback water likely freezes downstream. A ridge of ice commonly forms whose position and shape depends on the outside temperature, the rate of water impacting upon the surface and the thermal power supplied by the IPS. It may cause an increase in drag coefficient, reduction of the maximum lift coefficient and reduction of the stalling angle [22]. In addition, those formations are difficult to be voluntarily removed, posing a threat for aircraft. From the icing perspective, clouds are mainly characterized by three different parameters, namely, the liquid water content (LWC) which accounts for the mass of water contained per volume of cloud, the mean volume diameter (MVD), which represents a mass averaged droplet diameter and the static air temperature (SAT). These parameters depend on environmental factors, such as the presence of large masses of water on the ground, the season or the emissions of particle aerosols from the ground among other factors [15]. Moreover, their complex physical processes and dynamics entail rapid local variability of the properties that make their predictions difficult. Unfortunately, the cloud properties are not generally measured in-flight and besides they are uncontrollable. Consequently, they are uncertain. Despite that, adequate performance of AI-ETIPS must be ensured by minimizing the likelihood of occurrence of severe runback ice formations. There is a limited research effort on cloud uncertainties quantification, which was performed in recent years. Relevant existing studies focused on the probabilistic prediction of the severity category of icing encounters for uncertain environmental parameters in unprotected components. For instance, Zhang et al. [24] elaborated an analytical model to assess the ice formation rate and accounted for operational environmental uncertain parameters, which were modelled as random variables. The range of values of each severity category was considered to be uncertain as well and a class belonging function was defined. The uncertainties were propagated employing Monte Carlo sampling. Later on, in an analogous study, Feng et al. [10] analytically propagated the cloud uncertainties to quantify the probability of belonging to each of the Zhang’s severity categories. Zhang et al. [25] extended that analysis by performing a sensitivity analysis of ice severity on the environmental parameters through fuzzy-state assumption and the Profust model. However, up


to date and to the authors’ knowledge, there are no applications of Uncertainty Propagation for AI-ETIPS. In this work, a well-established methodology for uncertainty quantification, Monte Carlo sampling (MCS), is deployed into an unprecedented application, AI-ETIPS. The scope is the characterization of the uncertainties in the rates of runback ice forming downstream the protected parts caused by the operation of the AI-ETIPS in cloud uncertain parameters. Then, the statistical results are compared by those obtained by means of another well-established technique, namely, generalized Polynomial Chaos Expansion (gPCE). The goal is to assess whether accurate predictions can be elaborated at a reduced computational cost. The work is structured as follows: In Sect. 2 a brief overview of the physics and the numerical modelling for ETIPS are presented together with the case of study. Next, in Sect. 3 the stochastic modelling of the cloud uncertainties is introduced. Then, the methodologies selected for uncertainty propagation are briefly presented in Sect. 4. After, the results obtained are discussed in Sect. 5, and finally the concluding remarks are stated in Sect. 6

2 Modelling of an AI-ETIPS 2.1 Computational Model During an icing encounter, the aircraft surfaces collect water. The mass rate of impinging water m ˙ imp is dependent on the cloud LWC, the flying speed V∞ and the so-called collection efficiency β. The collection efficiency is a continuous variable that quantifies the local concentration of impinging water over the surface studied. It is dependent on the mass averaged diameter MVD, the geometry of the surface and V∞ . When the AI-ETIPS is active, part of the heat supplied is devoted to the total or partial evaporation of m ˙ imp . In case there is residual water, it leads to runback ice formations downstream in unprotected parts. With regard to thermodynamic process, apart from the heat supplied by the ETIPS, the kinetic energy of the impinging droplets is transformed into heat. Besides, due to the high speed, the viscous forces will dissipate part of the kinetic energy of the air stream into heat, that is, aerodynamic heating. It entails a raise of temperature at the surroundings of the surface, quantified by the recovery temperature Trec . Finally, the water droplets freezing release latent heat of fusion. On another note, the outgoing heat fluxes in the operation of an AI-ETIPS mainly include the evaporative and the convective losses. The convective losses are caused by the presence of a hot body in an aerodynamic field. Additionally, the impinging water droplets absorb heat such that their temperature is risen. A scheme of the process is presented in Fig. 1. The numerical model deployed in this study enables the calculation of characteristic parameters of the AI-ETIPS such as the runback ice mass rates among others. Numerous numerical models have been developed to this end, such as ANTICE [4], CANICE [16] and many others. In this work, an extension of the Icing code


Fig. 1 Schematic representation of the main heat fluxes involved in the operation of an AI-ETIPS for an airfoil. Aerodynamic, Droplet, Latent and IPS heat fluxes are labelled in black, blue, orange and red, respectively

PoliMIce [12] for Anti-Icing modelling is utilized [5]. Long and mild encounters are here considered, extending horizontally more than 17.4 nautical miles as recommended for icing certification, i.e. encounters with stratiform clouds [2]. Hence, steady state is assumed. The complete physics is decomposed into several numerical steps, loosely coupled to account for the different physical processes, thanks to the fact that the characteristic times of aerodynamics and ice accretion are well separated. First, the particle-laden flow is solved for the computation of the mass impinging rates. Then, the thermodynamic processes entailed by the presence of the AI-ETIPS are computed. The resolution of the particle-laden flow allows the computation of the mass impinging rate profile across the airfoil surface. The volume fraction of water droplets in the carrier fluid, air, is very low. Hence, it is assumed that the flow field is unaffected by the water droplets. The Eulerian inviscid flow equations are solved through the CFD code SU2 [9]. The trajectories of the water droplets within the flow field are computed a posteriori by means of PoliDrop [6], an in-house developed Lagrangian particle tracking solver. It integrates the equations of motion of the droplets immersed in the flow field. In this way, the droplet impact positions can be estimated, enabling the calculation of β. Then, the impinging mass rate is computed as:

\dot{m}_{imp} = \beta \, \mathrm{LWC} \, V_{\infty} \qquad (1)

Next, the thermodynamic calculations consist of a 1D model to solve mass and energy conservation equations on a surface discretized in control elements coinciding with the computational mesh. The viscous effects are considered by


including an empirical integral boundary layer model. The multilayered layout of the heaters is simplified as a single layer with equivalent thermal conductivity. The model equations are based on the work of Silva [19, 20] while the liquid film model is based on the work of Myers [17] adapted to the Anti-Ice problem. The mass conservation equation reads:

\frac{\partial \left[\bar{u}_f(\delta_f, s)\, \delta_f\right]}{\partial s} = \frac{\dot{m}_{imp} - \dot{m}_{evap} - \dot{m}_{ice}}{A\, \rho_{H_2O}}. \qquad (2)

From δ_f, the local liquid film thickness, the incoming and outgoing mass rates \dot{m}_{in} and \dot{m}_{out} can be retrieved. The equation of the energy conservation in the solid substratum of the IPS is presented next:

\frac{d}{ds}\left(k_{wall}\, H\, \frac{dT_{wall}}{ds}\right) - F\, h_{H_2O}\left(T_{wall} - T_{H_2O}\right) + \dot{q}_{IPS} - (1 - F)\, h_{air}\left(T_{wall} - T_{rec}\right) - \zeta\, \dot{m}_{imp}\left[i_{l-v} + c_p\left(T_{boil} - T_{imp}\right)\right] = 0. \qquad (3)

Then, a second conservation equation in the liquid film is formulated:

F A h_{air}\left(T_{rec} - T_{water}\right) + F A h_{water}\left(T_{wall} - T_{water}\right) + \dot{m}_{in}\, c_{p_{water}}\left(T_{in} - T_{ref}\right) - \dot{m}_{out}\, c_{p_{water}}\left(T_{out} - T_{ref}\right) + \dot{m}_{imp}\left[c_{p_{water}}\left(T_{\infty} - T_{ref}\right) + \tfrac{1}{2} V_{\infty}^2\right] - \dot{m}_{evap}\left[i_{l-v} + c_{p_{water}}\left(T_{water} - T_{ref}\right)\right] + \dot{m}_{ice}\left[i_{l-s} - c_{p_{water}}\left(T_{water} - T_{ref}\right)\right] = 0, \qquad (4)

where \dot{m}_{ice} corresponds to the mass rate of runback ice originally accreting on the clean profile. The main contributions to the film energy balance correspond to the evaporative, convective and supplied IPS heat fluxes. It is assumed there is no temperature gradient across the height of the liquid film. Moreover, icing and evaporation-on-impingement terms are included in the conservation equations reported in [19] to increase the generality of the model.
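For readers who want to experiment with the balance above, a minimal sketch of the film energy balance of Eq. (4) evaluated for a single control element is given below; the function, its argument names and the property values are illustrative assumptions and do not reproduce the PoliMIce anti-icing implementation.

    # Illustrative sketch (not the PoliMIce code): residual of the liquid-film
    # energy balance, Eq. (4), for one control element of the discretized surface.
    def film_energy_residual(F, A, h_air, h_water,
                             T_rec, T_wall, T_water, T_in, T_out, T_inf,
                             m_in, m_out, m_imp, m_evap, m_ice, V_inf,
                             cp_water=4186.0,   # [J kg-1 K-1], assumed property value
                             i_lv=2.26e6,       # vaporization latent heat [J kg-1], assumed
                             i_ls=3.34e5,       # solidification latent heat [J kg-1], assumed
                             T_ref=273.15):
        """Return the energy imbalance [W] of the film control volume; a converged
        film temperature T_water drives this residual to zero."""
        return (F * A * h_air * (T_rec - T_water)
                + F * A * h_water * (T_wall - T_water)
                + m_in * cp_water * (T_in - T_ref)
                - m_out * cp_water * (T_out - T_ref)
                + m_imp * (cp_water * (T_inf - T_ref) + 0.5 * V_inf**2)
                - m_evap * (i_lv + cp_water * (T_water - T_ref))
                + m_ice * (i_ls - cp_water * (T_water - T_ref)))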

2.2 Case of Study The case of study consists of an extruded NACA0012 airfoil with a chord of 0.9144 m. Seven independently controlled heaters that extend span-wise have been cast from the leading edge downstream on both the pressure and suction sides (Fig. 2).


Fig. 2 Layout of the heaters of the AI-ETIPS. Note that due to a manufacturing issue the heaters are shifted 0.019 m towards the CEG side

Table 1 Heat fluxes allocated to each heater for cases A and B. The corresponding thermal power consumption is 4815 W m−1

Heater  Heat flux [W m−2]
A       43,400
B       32,550
C       26,350
D       21,700
E       18,600
F       20,150
G       18,600

The layout considered is taken from the experimental layout deployed for a wind tunnel test campaign reported in [3]. Due to a manufacturing error, the heaters were not disposed symmetrically from the leading edge. The flight conditions are considered constant since those variables are known and controlled in-flight. The angle of attack is equal to 0°, the pressure is 90,000 Pa, and the Mach number considered is equal to 0.28. The distribution of heat fluxes across each of the seven heaters is depicted in Table 1. Two quantities of interest related to the runback ice mass rates are considered. The first one accounts for the total freezing mass rate, computed as:

\dot{M}_{ice} = \sum_{i=1}^{N_g} \dot{m}_{ice}(s_i) \qquad (5)

The second QoI measures the maximum freezing mass rate, computed as:

\dot{M}'_{ice,max} = \max_i\left[\dot{m}'_{ice}(s_i)\right] \qquad (6)
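A small sketch of how the two QoIs of Eqs. (5) and (6) can be assembled from per-element results; the array names are hypothetical placeholders for the output of the anti-icing model.

    import numpy as np

    def runback_qois(m_dot_ice, m_dot_ice_flux):
        """m_dot_ice: per-element freezing mass rates [kg/s] for one airfoil side;
        m_dot_ice_flux: per-element freezing mass flux rates [kg m-2 s-1].
        Returns (total rate, maximum flux), i.e. Eqs. (5) and (6)."""
        M_ice = np.sum(m_dot_ice)            # Eq. (5): total freezing mass rate
        M_ice_max = np.max(m_dot_ice_flux)   # Eq. (6): maximum freezing mass flux
        return M_ice, M_ice_max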

The quantities are evaluated independently for the pressure and suction airfoil sides, rendering a total of 4 quantities of interest (QoIs). It is highlighted that there are discontinuities in all the outcome spaces, which arise when, as SAT increases, the value


of T_rec reaches the freezing temperature and the residual water in RW operation suddenly stops freezing. It is pointed out that the two quantities corresponding to a particular side can simultaneously take a value of 0 when the heat flux layout across the protected parts is sufficient to evaporate all the impinging water. Otherwise, they take a value greater than 0 (RW operation). In the outcome space, just in between the two possible classes of values, the partial derivatives of the QoIs with respect to the uncertain variables are discontinuous, pointing to a non-smooth feature. On a different note, a severity scale developed by Lewis [14] classified the icing intensity in four different groups based on \dot{M}'_{ice,max} quantified in g cm−2 h−1. In this case, this scale will be adopted for the evaluation of the results. The icing rate is classified as trace if \dot{M}'_{ice,max} < 1, as light if 1 < \dot{M}'_{ice,max} < 6, as moderate if 6 < \dot{M}'_{ice,max} < 12, and as severe if \dot{M}'_{ice,max} > 12.
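For illustration, the classification implied by these thresholds can be written as a small helper, assuming the maximum freezing flux has already been converted to g cm−2 h−1:

    def lewis_severity(m_ice_max_flux):
        """Classify icing intensity from the maximum freezing flux [g cm-2 h-1]
        following the trace/light/moderate/severe thresholds quoted above."""
        if m_ice_max_flux < 1.0:
            return "trace"
        elif m_ice_max_flux < 6.0:
            return "light"
        elif m_ice_max_flux < 12.0:
            return "moderate"
        return "severe"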

3 Cloud Uncertainty Characterization From the icing perspective, the three main characteristic parameters are the LWC, the MVD and the static air temperature SAT, which imposes a threshold temperature for the ice formation. If the SAT is greater than the freezing point, the appearance of ice formations is very unlikely. Horizontally extending and mild icing encounters are considered here such as flying through a stratiform cloud. Generally, the values of LWC and MVD present low values compared to short and severe encounters with cumuliform clouds. Except for the SAT, which is available in-flight but still uncontrollable, the measurement of LWC and MVD requires specific probes, which are only deployed for test campaigns. Due to the complexity of cloud physics and the dynamic nature of the clouds, the remote prediction of these properties would be a very challenging task, and commonly, qualitative remote predictions of the icing threat are formulated [7]. To address this, the cloud properties are modelled as independent random variables. To limit the prior knowledge introduced into the problem, the random variables are characterized using uniform probability densities. This general approach is a conservative one because it gives the same probability to large and small LWC when in experimental campaigns it was found that lower LWC values were more frequent than large ones [8, 18]. Hence, the frequency of severe runback ice accretions is overestimated, but the range of values should be maintained. The boundaries of each of the variables are selected from the current regulation for inflight icing for stratiform clouds which is in Appendix C of the title 14 Code of the Federal Regulation part 25 for Continuous Maximum Events [2]. The values considered are presented in Table 2.

Table 2 Characteristic bounds of the uncertain cloud parameters

              LWC [g m−3]   MVD [μm]   SAT [°C]
Lower bound   0             10         −35
Upper bound   1             50         0

4 Uncertainty Propagation Methodologies Since the selected quantities of interest (QoI) present inherent non-smooth features, Monte Carlo sampling (MCS) techniques are chosen for the forward Uncertainty Propagation. Additionally, the ability of generalized Polynomial Chaos Expansion (gPCE) on the prediction of the low order statistics is assessed. That is because it may enable the reduction of the number of function realizations required in comparison with MCS. Next, the two techniques are introduced.

4.1 Monte Carlo Sampling Methods These techniques rely on repeated random sampling of the input variables to obtain a large set of model realizations that can be processed to estimate the statistics of a certain QoI. They are widely deployed for forward uncertainty propagation due to their simplicity and because they do not suffer from the curse of dimensionality. Furthermore, MCS techniques do not impose any limitations on the smoothness of the QoI, which is of particular interest here. Their convergence to the statistics of the QoI is guaranteed as stated by the law of large numbers. On the other hand, the main drawback is the slow convergence rate, which is proportional to 1/√N, N being the number of samples drawn [21].
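As an illustration of the sampling loop described above, the following sketch draws uniform cloud parameters within the bounds of Table 2 and estimates the mean of a QoI together with its 1/√N statistical error; the model function is a dummy stand-in for the actual AI-ETIPS computational chain.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 10_000                                  # number of Monte Carlo samples

    # Uniform cloud parameters within the bounds of Table 2 (conservative choice).
    lwc = rng.uniform(0.0, 1.0, N)              # liquid water content [g m-3]
    mvd = rng.uniform(10.0, 50.0, N)            # mean volume diameter [um]
    sat = rng.uniform(-35.0, 0.0, N)            # static air temperature [C]

    def model(lwc_i, mvd_i, sat_i):
        # Dummy stand-in for one AI-ETIPS simulation returning a runback-ice QoI.
        return max(0.0, lwc_i * (-sat_i) / 35.0)

    qoi = np.array([model(l, m, s) for l, m, s in zip(lwc, mvd, sat)])
    mean = qoi.mean()
    std_err = qoi.std(ddof=1) / np.sqrt(N)      # statistical error decays as 1/sqrt(N)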

4.2 Generalized Polynomial Chaos Expansion Generalized Polynomial Chaos Expansion (gPCE) is a spectral method to represent a stochastic QoI f (ξ ), dependent on a set of mutually independent random inputs (ξ = ξ1 , ξ2 . . . ξd ) for the forward propagation of input uncertainties. The stochastic QoI is represented on a basis of orthogonal polynomials on the random variables ξ . In this work, as uniformly distributed variables are considered U(−1,1), Legendre polynomials L(ξi ) are the appropriate choice of orthogonal polynomials. They are supported in the interval [−1,1], and their probability measure is equal to 0.5. Due to limited computational resources, the PCE must be truncated at a certain degree P . The gPCE is expressed as:


f(\xi) \approx \sum_{i=0}^{P} \beta_i\, \psi_i(\xi) \qquad (7)

The polynomial basis functions ψ_i(ξ) are predetermined for each degree. The calculation of the coefficients β_i is performed by Non-Intrusive Spectral Projection, and Gauss–Legendre quadrature formulas are deployed for their evaluation. A number of quadrature points N_q must be selected per random variable. The total number of quadrature points resulting from the full tensorization of the random variable space is equal to N_q^d. In this work, N_q is defined according to the Gauss–Legendre quadrature rule and is equal to P + 1. Among its strengths, gPCE provides an analytic representation of the quantity of interest. In this way, low order statistics of the stochastic QoI can be obtained analytically, eliminating the need for sampling. Further details on the approach can be found in the reference literature [13, 23].
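The projection step can be sketched as follows for uniform inputs mapped to [−1, 1]; the dummy response, the parameter values, and the total-degree truncation of the multivariate basis are assumptions made for illustration and do not reproduce the implementation used in this work.

    import itertools
    import numpy as np
    from numpy.polynomial.legendre import leggauss, Legendre

    d, P = 3, 3                                # 3 uniform inputs, truncation degree (illustrative)
    nq = P + 1                                 # Gauss-Legendre points per variable

    # 1D Gauss-Legendre nodes/weights on [-1, 1]; weights normalised so that the
    # uniform probability measure integrates to one (factor 1/2 per dimension).
    x1, w1 = leggauss(nq)
    w1 = w1 / 2.0

    # Tensorized quadrature grid (nq**d points) and a multivariate Legendre basis
    # restricted to total degree <= P (one common truncation choice; an assumption here).
    nodes = np.array(list(itertools.product(x1, repeat=d)))
    weights = np.prod(np.array(list(itertools.product(w1, repeat=d))), axis=1)
    multi_indices = [a for a in itertools.product(range(P + 1), repeat=d) if sum(a) <= P]

    def psi(alpha, xi):
        # Product of 1D Legendre polynomials L_{alpha_k}(xi_k).
        return np.prod([Legendre.basis(a)(x) for a, x in zip(alpha, xi)])

    def f(xi):
        # Dummy stand-in for the AI-ETIPS chain evaluated at the quadrature point xi.
        return np.sum(xi**2)

    # Non-Intrusive Spectral Projection: beta_i = E[f psi_i] / E[psi_i^2],
    # with E[L_n^2] = 1/(2n+1) for the uniform measure on [-1, 1].
    fvals = np.array([f(x) for x in nodes])
    betas = []
    for alpha in multi_indices:
        norm = np.prod([1.0 / (2 * a + 1) for a in alpha])
        proj = np.sum(weights * fvals * np.array([psi(alpha, x) for x in nodes]))
        betas.append(proj / norm)

    mean = betas[0]                            # E[f] is the zeroth coefficient
    variance = sum(b**2 * np.prod([1.0 / (2 * a + 1) for a in alpha])
                   for b, alpha in zip(betas, multi_indices) if sum(alpha) > 0)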

5 Numerical Results First, the results of the uncertainty propagation by means of MCS for both quantities of interest are presented. A total of 10,000 samples of the cloud parameter random variables and corresponding model evaluations are conducted. It is highlighted that the results obtained in both the pressure and suction sides of the airfoil present analogous trends. Here, the results obtained in the pressure side are reported. For fixed flight conditions, these values are highly dependent on the thermal power supplied employing the AI-ETIPS as well as its distribution across the protected parts. Both frequency distributions depicted in Fig. 3a, b are characterized by a Pareto type I frequency density. That is, FE or nearly FE operating regime is the most observed scenario and the frequency generally decays as the QoI increases. These occurrences account for the cases in which the SAT is sufficiently high such that the Trec over the surface is greater than the freezing temperature. Besides, it accounts for the cases for which the power layout is sufficient to fully evaporate the impinging water.  There are some features in common between M˙ ice and M˙ ice,max . In FE regime, both quantities are equal to 0. In RW operation, M˙ ice establishes the upper bound  of M˙ ice,max . Despite this, the correlation between these two quantities is low, the Pearson correlation coefficient is equal to 0.33 and variations in one of the quantities  do not necessarily entail variations in the other quantity. The value of M˙ ice,max is highly influenced by the temperature, being the lower the temperature the more  rapid the runback water will freeze and the higher M˙ ice,max . This effect is more limited in the case of M˙ ice as it is expected that all the runback water eventually


Fig. 3 Binned representation of the statistic distribution of the icing QoIs for the case of study.  (a) Presents the results obtained for M˙ ice and (b) those for M˙ ice,max

Fig. 4 Comparison of the low order statistics computed by means of gPCE and MCS. (a) Presents  the results obtained for M˙ ice and (b) those for M˙ ice,max

freezes downstream. This temperature dependence explains the diversity in trends between \dot{M}_{ice} and \dot{M}'_{ice,max}. From the Lewis severity scale reported in Sect. 2.2, on the pressure side 53% of the encounters caused traces, 6% light icing, 9% moderate icing and 30% severe icing. A 30% probability of severe icing should be a matter of concern. However, since the distributions of the cloud properties reported in Sect. 3 account conservatively for the cloud properties, the frequency of severe scenarios is greatly magnified. The ranges of \dot{M}'_{ice,max} should nevertheless be representative of the worst-case scenario, since the thresholds of the cloud properties are realistic, whereas the frequency of the most severe scenarios should in reality be greatly reduced. Next, the comparisons of the low order statistics obtained by means of gPCE and MCS are depicted in Fig. 4a, b. gPCE can deliver an estimation of μ_m and σ for each of the QoIs at a much lower computational cost than MCS in either of the cases. For P = 7, the number of function evaluations required is equal to 512 as opposed


to the 10,000 evaluations that MCS requires. The accuracy of these predictions is limited as observed. Regarding M˙ ice , the worst predictions are maintained below 6% accuracy with respect to MCS results and the majority of the predictions below  3%. In the case of M˙ ice,max , the predictions elaborated are slightly poorer, being 8% and 4% the corresponding values. To what concerns the quantitative predictions, it is observed in Fig. 4a, b that the predictions elaborated by means of gPCE present an oscillating behaviour as the truncation degree increases. Due to the spectral convergence theorem [23], one would expect that as the truncation degree increased, the low order statistic predictions would exponentially converge to a certain value. Hypothesis for the non-monotonic convergence include the use of smooth polynomials to approximate non-smooth and discontinuous functions or the position of the quadrature points with respect to the discontinuity.

6 Concluding Remarks By means of Monte Carlo Sampling techniques, the uncertain cloud properties are propagated into the runback-related quantities of interest. For the thermal power layout and the flight properties considered, the most frequent scenario is the full evaporation of the impinging water, and the QoIs describe a Pareto type I density function. Moreover, the frequency of the most severe scenarios is fortunately limited. Further studies should be conducted to understand the effects of the thermal power supply and its distribution across the heaters on the different frequency densities. Besides, the results obtained present very conservative trends due to the probabilistic distribution functions selected to describe the cloud properties. Hence, further characterization of the cloud uncertainties can lead to more realistic outcomes. gPCE provides estimations of the low order statistics of the runback ice QoIs at a significantly reduced number of model evaluations and, thus, computational cost. The selection of two quantities of interest gives different insights on the possible shapes of runback ice formations. Moreover, the use of the local maximum quantity can give further insights on the icing severity with regard to the severity scale proposed by Lewis. The propagation of the two quantities presents common features, which suggests that the reduction of one could entail the reduction of the other, although not necessarily. Concerning the quantitative predictions of the low order statistics by means of gPCE, an oscillatory behaviour is observed as the truncation degree increases. This fact penalizes this standard approach, and future practitioners should pay special attention to this issue: the computational cost entailed by large truncation degrees does not necessarily lead to improved accuracy. Many reasons can justify this behaviour, such as the use of smooth polynomials to approximate a non-smooth and discontinuous function or the position of the quadrature points with respect to the discontinuity. It is concluded that further investigations must be conducted to clarify the reasons for these oscillations.


References 1. Aviation Maintenance Technician Handbook - Airframe, vol. 2, chap. 15, pp. 15.1–15.32. U.S. Department of Transportation. Federal Aviation Administration (2012) 2. Airworthiness Standard: Transport Category Airplanes, Part 25 Appendix C. In: US Code of Federal Regulations, Title 14. Federal Aviation Administration (Issued Every Year) 3. Al-Khalil, K.M., Horvath, C., Miller, D.R., Wright, W.B.: Validation of NASA thermal ice protection computer codes. Part 3; The validation of ANTICE. In: 35th Aerospace Sciences Meeting and Exhibit, p. 51 (2001) 4. Al-Khalil, K., Keith, T., De Witt, J.: Development of an anti-icing runback model. In: 28th Aerospace Sciences Meeting, p. 759 (1990) 5. Arizmendi Gutiérrez, B., Della Noce, A., Gallia, M., Bellosta, T., Guardone, A.: Numerical simulation of a thermal ice protection system including state-of-the-art liquid film model. J. Comput. Appl. Math. 113454 (2021) 6. Bellosta, T., Parma, G., Guardone, A.: A robust 3D particle tracking solver of in-flight ice accretion using, arbitrary precision arithmetics. In: VIII International Conference on Coupled Problems in Science and Engineering (2019) 7. Bernstein, B.C., McDonough, F., Politovich, M.K., Brown, B.G., Ratvasky, T.P., Miller, D.R., Wolff, C.A., Cunning, G.: Current icing potential: algorithm description and comparison with aircraft observations. J. Appl. Meteorol. 44(7), 969–986 (2005) 8. Cober, S.G., Isaac, G.A., Strapp, J.: Aircraft icing measurements in East Coast winter storms. J. Appl. Meteorol. 34(1), 88–100 (1995) 9. Economon, T.D., Palacios, F., Copeland, S.R., Lukaczyk, T.W., Alonso, J.J.: SU2: an opensource suite for multiphysics simulation and design. AIAA J. 54(3), 828–846 (2016) 10. Feng, K., Lu, Z., Yun, W.: Aircraft icing severity analysis considering three uncertainty types. AIAA J. 57(4), 1514–1522 (2019) 11. Gent, R., Dart, N., Cansdale, J.: Aircraft icing. Philos. Trans. R. Soc. London, Ser. A Math. Phys. Eng. Sci. 358(1776), 2873–2911 (2000) 12. Gori, G., Zocca, M., Garabelli, M., Guardone, A., Quaranta, G.: PoliMIce: a simulation framework for three-dimensional ice accretion. Appl. Math. Comput. 267, 96–107 (2015) 13. Le Maître, O., Knio, O.M.: Spectral Methods for Uncertainty Quantification: With Applications to Computational Fluid Dynamics. Springer Science & Business Media, Berlin (2010) 14. Lewis, W.: A flight investigation of the meteorological conditions conducive to the formation of ice on airplanes. Tech. rep., National Aeronautics and Space Administration Moffett Field CA Ames Research (1947) 15. Miles, N.L., Verlinde, J., Clothiaux, E.E.: Cloud droplet size distributions in low-level stratiform clouds. J. Atmos. Sci. 57(2), 295–311 (2000) 16. Morency, F., Tezok, F., Paraschivoiu, I.: Anti-icing system simulation using CANICE. J. Aircr. 36(6), 999–1006 (1999) 17. Myers, T., Charpin, J., Chapman, S.: Modelling the flow and solidification of a thin liquid film on a three-dimensional surface. In: Progress in Industrial Mathematics at ECMI 2004, pp. 508–512. Springer, Berlin (2006) 18. Politovich, M.K., Bernstein, T.A.: Aircraft icing conditions in Northeast Colorado. J. Appl. Meteorol. 41(2), 118–132 (2002) 19. Silva, G., Silvares, O., de Jesus Zerbini, E.: Numerical simulation of airfoil thermal anti-ice operation, part 1: mathematical modelling. J. Aircr. 44(2), 627–633 (2007) 20. 
Silva, G., Silvares, O., Zerbini, E., Hefazi, H., Chen, H.H., Kaups, K.: Differential boundary-layer analysis and runback water flow model applied to flow around airfoils with thermal anti-ice. In: 1st AIAA Atmospheric and Space Environments Conference, p. 3967 (2009) 21. Soize, C.: Uncertainty Quantification. Springer, Berlin (2017) 22. Whalen, E., Broeren, A., Bragg, M., Lee, S.: Characteristics of runback ice accretions on airfoils and their aerodynamic effects. In: 43rd AIAA Aerospace Sciences Meeting and Exhibit, p. 1065 (2005)


23. Xiu, D.: Numerical Methods for Stochastic Computations: A Spectral Method Approach. Princeton University Press, Princeton (2010) 24. Zhang, F., Huang, Z., Yao, H., Zhai, W., Gao, T.: Icing severity forecast algorithm under both subjective and objective parameters uncertainties. Atmos. Environ. 128, 263–267 (2016) 25. Zhang, X., Lu, Z., Feng, K., Ling, C.: Reliability sensitivity based on profust model: an application to aircraft icing analysis. AIAA J. 57(12), 5390–5402 (2019)

Multi-fidelity Surrogate Assisted Design Optimisation of an Airfoil under Uncertainty Using Far-Field Drag Approximation

Elisa Morales, Péter Zénó Korondi, Domenico Quagliarella, Renato Tognaccini, Mariapia Marchi, Lucia Parussini, and Carlo Poloni

This research has been developed with the partial support of the H2020 MCSA ITN UTOPIAE grant agreement number 722734. E. Morales () Italian Aerospace Research Centre, Capua, Italy Università degli Studi di Napoli Federico II, Naples, Italy e-mail: [email protected] P. Z. Korondi · C. Poloni ESTECO, Trieste, Italy Department of Engineering and Architecture, University of Trieste, Trieste, Italy e-mail: [email protected]; [email protected] D. Quagliarella Italian Aerospace Research Centre, Capua, Italy e-mail: [email protected] R. Tognaccini Università degli Studi di Napoli Federico II, Naples, Italy e-mail: [email protected] M. Marchi ESTECO, Trieste, Italy e-mail: [email protected] L. Parussini Department of Engineering and Architecture, University of Trieste, Trieste, Italy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Vasile, D. Quagliarella (eds.), Advances in Uncertainty Quantification and Optimization Under Uncertainty with Aerospace Applications, Space Technology Proceedings 8, https://doi.org/10.1007/978-3-030-80542-5_3


1 Introduction The aerodynamics performance of real-world applications is inherently uncertain due to manufacturing errors, uncertain environmental conditions [18, 19], and other physical phenomena like icing [2]. Therefore, uncertainties must be accounted for already during the aerodynamic design of airfoils. Uncertainty-based optimisation techniques provide optimal airfoil designs that are less vulnerable to the presence of uncertainty in the operational conditions (i.e., Mach number, angle-of-attack, etc.) at which the airfoil is functioning. In such optimisation techniques, the Quantity of Interest (QoI) is a statistical measure instead of a deterministic value. The accurate calculation of a statistical measure requires numerous function evaluations, which increases the computational demand significantly. In particular, the Computational Fluid Dynamic (CFD) calculations burden the computational budget as the QoIs (a statistical measure of lift, drag, or moment coefficients) are computed by solving the Reynolds-averaged Navier– Stokes (RANS) equations numerically. In order to reduce the computational time, a multi-fidelity surrogate assisted method is adopted here. Surrogates are data-based mathematical models constructed using only a few expensive function evaluations. With their help, the aerodynamic performance of an airfoil can be predicted at a low computational cost. The accuracy of a surrogate highly depends on the size of its training data. Therefore, when the function evaluations are truly expensive, for example, a CFD simulation with very fine grid, the training data can be complemented by function evaluations of lower fidelity. The information coming from various fidelity levels can be fused together with multi-fidelity Gaussian process regression (MF-GPR). This technique was introduced by Kennedy and O’Hagan [8]. In this work, the drag coefficient (cd ) of the MH 114 [11] propeller airfoil is minimised by a multi-fidelity surrogate assisted optimisation technique. The opensource fluid dynamic solver SU2 [4] is used for calculating the cd . SU2 solves the compressible RANS equations numerically and calculates cd by integrating the stress over the body surface with the so-called near-field method. The drag coefficient cd can have different levels of fidelity by using different grid refinements. A calculation with a fine grid provides a high-fidelity cd prediction. However, fine meshes are very demanding from a computational point of view. Coarse grids are computationally cheap, but they introduce a higher proportion of spurious drag. This numerically introduced drag stems from the truncation error of the used numerical methods and the artificial dissipation of solving the RANS equations with a coarse grid. The artificial dissipation is added in the numerical schemes to boost the convergence of the flow and stabilise the scheme. Hence, the prediction of the near-field cd deteriorates. Nevertheless, there are far-field methods for the estimation of the drag force that allow the cd prediction with a level of accuracy similar to a fine grid by identifying the spurious drag sources. A review of all these methods is given in [5]. In this work, the formulation described in [13] has been implemented.


The prediction of the drag coefficient using the far-field method will be used for the low-fidelity level on a coarse grid. This procedure will allow a better estimation of the drag coefficient with respect to the near-field value computed on the same grid, thus resulting in an increased accuracy while preserving the computational cost. In addition, the near-field cd estimation obtained with a fine grid will be used for the high-fidelity level. This paper is organised as follows. Section 2 gives a brief overview of MF-GPR. The aerodynamic computational chain is detailed in Sect. 3. The far-field approach for drag estimation is explained in Sect. 4. Section 5 introduces a deterministic airfoil design problem. The airfoil design problem under uncertainty is given in Sect. 6. Specifically, the drag coefficient of a propeller blade airfoil will be minimised under geometrical and environmental constraints. The uncertainty will be introduced on the angle-of-attack modelled by a four-parameter beta distribution. We propose a multi-fidelity surrogate assisted optimisation pipeline in Sect. 7. The results of the deterministic and probabilistic optimisations are discussed in Sect. 8. The interpretation of the results is concluded in Sect. 9.

2 Multi-fidelity Gaussian Process Regression The drag coefficients obtained with the far-field approximation and SU2 are fused together into a single surrogate by multi-fidelity Gaussian process regression (MF-GPR). The recursive formulation is adapted here as proposed by Le Gratiet and Garnier [10]:

\tilde{f}_1(x) = h_1^T(x)\,\beta_1 + \tilde{\delta}_1(x), \qquad (1a)

\tilde{f}_2(x) = \rho(x)\,\tilde{f}_1(x) + h_2^T(x)\,\beta_2 + \tilde{\delta}_2(x), \qquad (1b)

\rho(x) = g^T(x)\,\beta_\rho, \qquad (1c)

where indices 1 and 2 denote the low and high-fidelity levels, respectively. The mean trend of the fidelity level is formulated as a least-squares regression hi (x)β i with the vector of regression functions hi (x) and the vector of regression coefficients β i . The local variations of the model are modelled as zero-mean Gaussian distributions with σi2 variance and incorporated into δ˜i (x) ∼ N(0, σi2 ). This recursive formulation, first, trains a standard GPR surrogate using the low-fidelity samples calculated by the far-field method. Then, the posterior of the low-fidelity GPR is combined together with the high-fidelity observations of SU2 by training an additional GPR. This recursive formulation avoids the need to construct a large covariance matrix containing the low- and high-fidelity designs as in [8]. Even if the training cost of the surrogate is negligible compared to the aerodynamic design evaluation, the reduced covariance size is advantageous as the model is frequently re-trained during the optimisation process.
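A compact sketch of the recursive construction, using scikit-learn Gaussian processes, toy data in place of the far-field/fine-grid drag values, and a constant scaling factor ρ as a simplification of g(x)^T β_ρ; it is not the implementation used in this work.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    # Toy data standing in for far-field (LF) and fine-grid near-field (HF) drag values.
    rng = np.random.default_rng(1)
    X_lf = rng.uniform(0, 1, (40, 2)); y_lf = np.sin(3 * X_lf[:, 0]) + 0.1 * X_lf[:, 1]
    X_hf = X_lf[::5];                  y_hf = 1.1 * (np.sin(3 * X_hf[:, 0]) + 0.1 * X_hf[:, 1]) + 0.05

    kernel = ConstantKernel(1.0) * RBF(length_scale=0.3)

    # Level 1: GP trained on the low-fidelity samples (Eq. (1a)).
    gp1 = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_lf, y_lf)

    # Level 2: constant scaling rho (a simplifying assumption for g(x)^T beta_rho)
    # estimated by least squares, plus a GP on the remaining discrepancy (Eq. (1b)).
    f1_at_hf = gp1.predict(X_hf)
    rho = float(np.dot(f1_at_hf, y_hf) / np.dot(f1_at_hf, f1_at_hf))
    gp2 = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_hf, y_hf - rho * f1_at_hf)

    def predict_mf(X):
        """Multi-fidelity mean prediction: rho * f1(x) + delta2(x)."""
        return rho * gp1.predict(X) + gp2.predict(X)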


3 Aerodynamic Computational Chain When aerodynamic shape design problems are faced, it is crucial to have a self-operating aerodynamic computational chain. It takes as input the design variables given by the optimiser and generates the candidate (in this case a single-component airfoil) to be evaluated, builds the computational mesh, and runs the computational fluid dynamic flow solver. Finally, the obtained performance of the candidate is provided to the optimiser. In the design problem studied here, the performance of the candidate is the airfoil drag coefficient. Moreover, for the low-fidelity runs, instead of providing directly the cd given by the CFD solver, the drag coefficient is calculated with a far-field approach and later provided to the optimiser. This pipeline is shown in Fig. 1 and explained in the following sub-sections. Note that the far-field formula is described in depth in Sect. 4. Airfoil Generation The candidate airfoils are generated using wg2aer (Footnote 1). The input for the program is a set of values of the design variables, and it modifies a specified starting airfoil (MH 114) according to some modification functions. To generate the baseline airfoil, the design variables are set equal to 0. Specifically, the airfoil is parametrised as a linear combination of the initial geometry (x0(s), y0(s)) and the applied modification functions yi(s). Thus, the airfoil is described as:

y(s) = k\left(y_0(s) + \sum_{i=1}^{n} w_i\, y_i(s)\right), \qquad x(s) = x_0(s) \qquad (2)

Fig. 1 Aerodynamic computational chain

Footnote 1: Program developed at the Italian Aerospace Research Centre (CIRA).



Fig. 2 Modified airfoils example. Baseline airfoil (solid line)

Table 1 Mesh size parameters for low- and high-fidelity simulations. (Nb: number of cells on the body surface, Nw: number of cells in the wake, Nj: number of cells in far-field direction, Ntotal: total number of cells)

                     Nb    Nw    Nj    Ntotal
Low-fidelity (LF)    96    48    48    16,384
High-fidelity (HF)   512   256   256   262,144

The airfoil shape is controlled by the design parameters w_i and by k, the scale factor used to fulfil the maximum thickness criterion. Specifically, ten design parameters are considered. This will lead to eight optimisation design variables. The first and second design parameters describe a thickness mode, so that they have the same value but opposite sign (w_2 = −w_1). In addition, the third and fourth design parameters represent a camber mode; hence both are equal (w_4 = w_3). The ranges for the design parameters are:

w_1, w_2 \in [-2, 2], \quad w_3, w_4 \in [-2, 2], \quad w_5, w_6, w_7, w_8 \in [-1, 1], \quad w_9, w_{10} \in [-0.2, 0.2] \qquad (3)

Regarding the modification functions, the first four are polynomials affecting the whole airfoil, while the rest are Hicks-Henne bump functions that have the location of the bump at different position of the airfoil chord. In Fig. 2, an example of airfoil modifications is provided. Grid Generation Construct2D is an open-source grid generator designed to create 2D grids for CFD computations on airfoils [3]. The grids are generated in Plot3D format; however, the source code has been changed to provide also the grid in SU2 format. Given the coordinates of the modified airfoil, a C-type grid is generated. The number of cells of the mesh depends on the level of fidelity that has to be run. The possible mesh sizes are provided in Table 1. The far-field is located at 500 airfoil chords. The possible computational meshes are depicted in Fig. 3.
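A toy sketch of the parametrisation of Eqs. (2)–(3): the baseline coordinates are perturbed by a linear combination of modification functions. The actual wg2aer modes are not described in detail here, so the polynomial mode and the Hicks-Henne bump locations below are placeholders.

    import numpy as np

    def hicks_henne(x, t, m=3.0):
        """Hicks-Henne bump with its maximum at chordwise location t (0 < t < 1)."""
        return np.sin(np.pi * x ** (np.log(0.5) / np.log(t))) ** m

    def deform_airfoil(x0, y0, w, bump_locations=(0.2, 0.35, 0.5, 0.65, 0.8, 0.9), k=1.0):
        """Toy version of Eq. (2): y = k * (y0 + sum_i w_i * y_i(x)), x unchanged.
        The first modification function is a crude thickness-like polynomial and
        the rest are Hicks-Henne bumps; both are illustrative placeholders."""
        modes = [x0 * (1.0 - x0)]                       # placeholder polynomial mode
        modes += [hicks_henne(x0, t) for t in bump_locations]
        y = y0.copy()
        for w_i, y_i in zip(w, modes):
            y = y + w_i * y_i
        return x0, k * y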



Fig. 3 Possible grids depending on the fidelity level. (a) Low-fidelity grid. (b) High-fidelity grid

CFD Evaluation The CFD solver used for the aerodynamic shape design optimisation problem is the open-source fluid dynamic solver SU2 [4]. Particularly, the compressible Reynolds-averaged Navier–Stokes (RANS) equations are solved. The turbulence model used is the Spalart–Allmaras (SA) [17]. Furthermore, for the spatial integration, the JST central scheme with artificial dissipation, coupled with an implicit Euler method for the pseudo-time stepping, is used.

4 Far-Field Drag Coefficient Calculation The far-field method implemented in this work was introduced in [13]. It allows the decomposition of the drag force into three components: wave, viscous, and spurious drag. Specifically, the method is based on entropy variations. The entropy drag is expressed as a volume integral, which allows the decomposition of the drag into the above components. Hence, a proper selection of each region is needed. The entropy drag is defined as:

D_s = D_w + D_v + D_{sp}, \qquad (4)

where Dw , Dv , and Dsp are the wave, viscous, and spurious contributions, respectively. Dsp is the drag source related to the entropy introduced by the truncation error and the artificial dissipation of the numerical schemes used by the Computational Fluid Dynamics flow solver. Hence, by identifying the Dsp contribution and subtracting it from Eq. (4), a prediction of the drag coefficient, with an accuracy close to fine grids, is obtained on a coarser mesh. This will imply a considerable advantage while facing aerodynamic optimisation problems since the use of coarser grids allows a significant reduction of the required computational time. This advantage for optimisation has already been shown in [6, 12].
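The correction described above amounts to retaining only the physical part of the entropy drag; a trivial sketch (with the drag bookkeeping done in drag counts) is:

    def far_field_cd(cd_wave, cd_viscous, cd_spurious):
        """Entropy-based decomposition of Eq. (4): the far-field estimate keeps the
        physical wave and viscous parts and discards the spurious (numerical) part."""
        cd_entropy = cd_wave + cd_viscous + cd_spurious   # total entropy drag
        return cd_entropy - cd_spurious                   # = cd_wave + cd_viscous

    # On the coarsest grid of Table 2 the near-field value is 165.7 dc while the
    # far-field reconstruction is 115.0 dc, i.e. roughly 50 dc of the coarse-grid
    # near-field drag can be attributed to spurious sources.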


Table 2 Mesh sizes and computed drag coefficients. Viscous test at M∞ = 0.2, Re∞ = 4.97 × 10^6, and cl = 1.0

h        Nb     Nw    Nj    Ntotal      cdnf [dc]   cdv [dc]
10.7     96     48    48    9216        165.7       115.0
8        128    64    64    16,384      137.1       107.8
4        256    128   128   65,536      120.7       106.5
2        512    256   256   262,144     113.9       104.2
1        1024   512   512   1,048,576   110.0       102.6


Furthermore, a test to verify this advantage for the design problem of the propeller blade airfoil, here studied, has been carried out. In particular, a viscous flow with working conditions M∞ = 0.2, Re∞ = 4.97 × 106 , and cl = 1.0 is performed on the MH 114 airfoil. The compressible RANS equations are solved using the SU2 flow solver with the SA turbulence model [17]. Five C-type grids of an increasing number of cells are studied. The grid size is obtained by the square root of the ratio √ between the number of cells of the finest grid and the grid under evaluation h = Nh=1 /Ni . The number of cells on the body surface (Nb ), on the wake (Nw ), in the far-field direction (Nj ), and the total number of cells (Ntotal ), as well as, the near-field value of the drag coefficient (cdnf ), and the far-field value (cdv ) are given in Table 2. Note that in this test case the only drag contribution is the viscous (Dv ), and the drag values are expressed in drag counts (1dc = 10−4 ). In Fig. 4, the comparison between near-field and far-field drag coefficients versus the grid size is given. In addition, the pressure coefficient distribution on the body surface (cp ) at the different grid refinements is also plotted.


In Fig. 4a the differences between the cp on the body surface are barely visible; hence a good local accuracy of the solution is also demonstrated when the coarsest grid is used. However, Fig. 4b shows how the near-field value of the drag coefficient converges as the grid is refined (h −→ 0). The variation on cd between the coarsest and finest mesh sizes is given by the spurious drag source introduced by the numerical method and the artificial dissipation. Contrarily, using the far-field analysis of the drag force, the spurious drag contribution is removed. Hence, a better estimation of the cd is found. Thus, the drag coefficient value for the lower fidelity of the surrogate model is improved in accuracy while keeping the same computational time.

5 Deterministic Design Optimisation Problem The shape optimisation design problem studies the minimisation of the drag coefficient (cd) of a propeller blade airfoil subjected to geometric and aerodynamic constraints. The baseline design is the Martin Hepperle MH 114 airfoil for a propeller. The flow conditions are M∞ = 0.2, Re = 4.97 × 10^6, and α = 2°. The lift coefficient of the airfoil is required to be at least one (cl ≥ 1). Geometrical constraints are imposed for obtaining realistic shapes. The percentage thickness with respect to the airfoil chord (t%) is fixed to the value of the baseline. The Leading Edge Radius (LER) and the Trailing Edge Angle (TEA) are constrained not to fall more than 10% below their baseline values. In mathematical terms, the deterministic optimisation example reads:

\begin{aligned}
\min_{w}\ & c_d(w) \\
\text{subject to: } & c_l(w) \ge 1.0 \\
& t_\%(w) = 13.05 \\
& LER(w) \ge 0.011377 \\
& TEA(w) \ge 6.0^{\circ}
\end{aligned} \qquad (5)

A penalty approach will be used to handle the constrained problem:

\min_{w \in W \subseteq \mathbb{R}^n} c_d(w) + p_{c_l}\max\left(0,\ 1 - c_l(w)\right) + p_{LER}\max\left(0,\ 0.011377 - LER(w)\right) + p_{TEA}\max\left(0,\ 6 - TEA(w)\right), \qquad (6)

where p_{c_l} = 1000, p_{LER} = 100,000, and p_{TEA} = 100. The equality constraint on the thickness is imposed by scaling the modified airfoil shape to the given value.
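Eq. (6) translates directly into a scalar merit function; a minimal sketch, with argument names chosen for illustration, is:

    def penalised_objective(cd, cl, ler, tea,
                            p_cl=1000.0, p_ler=100_000.0, p_tea=100.0):
        """Penalised objective of Eq. (6); cd and cl come from the aerodynamic chain,
        LER and TEA from the geometry module (names here are illustrative)."""
        return (cd
                + p_cl * max(0.0, 1.0 - cl)
                + p_ler * max(0.0, 0.011377 - ler)
                + p_tea * max(0.0, 6.0 - tea))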


6 Probabilistic Design Optimisation Problem In order to improve the performance of the airfoil under uncertainty, a robust optimisation must be performed. Here, the angle-of-attack is the only parameter considered uncertain, thus representing uncertainty in the environmental conditions. Particularly, the uncertainty has been modelled with a four-parameter beta distribution. The variability range is αref ± 0.25. The statistical measure chosen to solve the robust design problem is the Conditional Value-at-Risk (CVaR) measure [1, 16], or super-quantile, at a confidence level γ equal to 0.95. This type of risk measure was introduced in the financial sector, but such measures have been advantageously applied to aerodynamic design optimisation problems [14, 15].

\begin{aligned}
\min_{w}\ & CVaR^{0.95}\left[c_d(w,u)\right] \\
\text{subject to: } & CVaR^{0.95}_{loss}\left[c_l(w,u)\right] \ge 1.0 \\
& t_\%(w) = 13.05 \\
& LER(w) \ge 0.011377 \\
& TEA(w) \ge 6.0^{\circ}
\end{aligned} \qquad (7)

Therefore, the robust optimisation problem reads:

\min_{w \in W \subseteq \mathbb{R}^n} CVaR^{0.95}\left[c_d(w,u)\right] + p_{c_l}\max\left(0,\ 1 - CVaR^{0.95}_{loss}\left[c_l(w,u)\right]\right) + p_{LER}\max\left(0,\ 0.011377 - LER(w)\right) + p_{TEA}\max\left(0,\ 6 - TEA(w)\right), \qquad (8)

where CVaR^{0.95}_{loss}(c_l) = -CVaR^{0.95}(-c_l) is the loss Conditional Value-at-Risk. The random perturbations of the angle-of-attack impact only the aerodynamic force requirements (cd and cl). The geometric constraints can be evaluated for each design configuration (w) deterministically. The same penalty parameters have been used as in the deterministic case.
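A short sketch of how the CVaR terms entering Eqs. (7)–(8) can be estimated from the virtual samples of the local probabilistic models; the empirical-quantile estimator used here is one common choice and is an assumption, not necessarily the estimator used by the authors.

    import numpy as np

    def cvar(samples, gamma=0.95):
        """Empirical Conditional Value-at-Risk (super-quantile) at confidence gamma:
        the mean of the worst (1 - gamma) fraction of the samples."""
        s = np.sort(np.asarray(samples))
        var = np.quantile(s, gamma)                 # Value-at-Risk (the gamma-quantile)
        return s[s >= var].mean()

    # Robust objective pieces of Eqs. (7)-(8), assuming cd_samples and cl_samples are
    # virtual samples drawn from the local probabilistic model of one design:
    # cvar_cd = cvar(cd_samples)                    # CVaR^0.95 of the drag coefficient
    # cvar_cl = -cvar(-np.asarray(cl_samples))      # loss CVaR of the lift coefficient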

7 Optimisation Pipeline The multi-fidelity surrogate assisted design optimisation strategy of expensive problems [9] has been tailored to reduce the computational cost of the airfoil optimisation problem here studied. Computationally more economic far-field drag predictions are used to populate the training dataset as the computational budget affords only a handful of high-fidelity RANS simulations. The optimisation pipeline is presented in Fig. 5.


Fig. 5 Optimisation pipeline with multi-fidelity surrogate

Our constrained expected improvement formulation is:

cEI = E\left[\max\left(0,\ F^{*}_{obj} - F_{obj}(\tilde{c}_d, \tilde{c}_l)\right)\right] P\left(\tilde{c}_l \ge 1\right)
    = \left[\left(F^{*}_{obj} - F_{obj}(\hat{c}_d, \hat{c}_l)\right)\Phi\!\left(\frac{F^{*}_{obj} - F_{obj}(\hat{c}_d, \hat{c}_l)}{\hat{\sigma}_{c_d}}\right) + \hat{\sigma}_{c_d}\,\phi\!\left(\frac{F^{*}_{obj} - F_{obj}(\hat{c}_d, \hat{c}_l)}{\hat{\sigma}_{c_d}}\right)\right]\Phi\!\left(\frac{\hat{c}_l - 1}{\hat{\sigma}_{c_l}}\right), \qquad (9)

where \hat{\sigma}_{c_d} and \hat{\sigma}_{c_l} are the standard deviations of the drag and lift coefficient, respectively. F_{obj} is the penalised objective given by Eq. (6) and Eq. (8) for the deterministic and robust optimisation studies, respectively. The best evaluated objective value is given by F^{*}_{obj}. The Φ and φ symbols denote the cumulative distribution function and probability density function of the standard normal distribution, respectively. E is the expected value and P is the probability operator. Our optimisation method updates the multi-fidelity surrogate in every iteration with one additional design evaluation. The design whose predicted performance maximises the constrained expected improvement function given in Eq. (9) is evaluated:

w_{new} = \arg\max_{w \in W \subseteq \mathbb{R}^n} cEI. \qquad (10)
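The acquisition of Eqs. (9)–(10) can be sketched as follows; the scalar arguments are surrogate predictions at one candidate design, and the maximisation of Eq. (10) over the design space could then be carried out, for illustration, by simple random search over candidate designs.

    from scipy.stats import norm

    def constrained_ei(f_best, f_pred, sigma_cd, cl_pred, sigma_cl):
        """Constrained expected improvement of Eq. (9): expected improvement on the
        penalised objective multiplied by the probability that cl >= 1."""
        if sigma_cd <= 0.0:
            return 0.0
        z = (f_best - f_pred) / sigma_cd
        ei = (f_best - f_pred) * norm.cdf(z) + sigma_cd * norm.pdf(z)
        return ei * norm.cdf((cl_pred - 1.0) / sigma_cl)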


After finding the most promising design candidate w_new with Eq. (10), the algorithm chooses the fidelity level of the aerodynamic solver based on the Scaled Expected Variance Reduction (SEVR) measure [9]:

l = \arg\max_{l \in \{LF,\,HF\}} SEVR^{\,l}, \qquad (11)

where the SEVR is defined as:

SEVR^{LF} = \frac{\rho^2(w_{new})\,\hat{\sigma}^2_{c_d,LF}(w_{new})}{c_{LF}}, \qquad (12)

SEVR^{HF} = \frac{\rho^2(w_{new})\,\hat{\sigma}^2_{c_d,LF}(w_{new}) + \hat{\sigma}^2_{c_d,\delta_{HF}}(w_{new})}{c_{HF}}, \qquad (13)

where cLF = 1 and cH F = 10 are the costs of the low- and high-fidelity simulations, respectively. The computational chains of low- and high-fidelity have to be complemented by an additional step for computing the lift and drag coefficients CVaR risk measures. Due to the heavy computational demand, the risk measure is calculated with a surrogate-based uncertainty quantification approach. At the low-fidelity level, for each design configuration, five LF samples are used for constructing a local GPR model, while at the high-fidelity level, five LF samples and three HF samples are used for constructing a local MF-GPR model. These models can then be used to draw a statistically significant number of samples to calculate the risk measure. The number of HF samples is set to the minimum necessary for training the local probabilistic model. We arbitrarily decided to increase the number of LF samples by 20% w.r.t. the HF approximation. The constructed local probabilistic models of the baseline configuration are presented in Fig. 6. Furthermore, in Fig. 7, the convergence of the risk measure value of the aerodynamic force coefficients in relation to the number of virtual samples is depicted. Based on the CVaR convergence, in the present work, 100,000 virtual samples of the local probabilistic models are generated to calculate the CVaR values for both aerodynamic coefficients (cl and cd ). The computational costs of the fidelity levels are set according to their true computational time required for the probabilistic optimisation. The cost of running a low-fidelity CFD evaluation is considered as 1. According to the computational time, the cost for a high-fidelity is 16 times greater than the low-fidelity runs. Considering that five LF samples are needed to build the low-fidelity probabilistic model, the total cost of the LF model is 5. On the other hand, three HF and five LF evaluations are required to construct the high-fidelity probabilistic model. Thus the total cost is 53. Therefore, a 1 to 10 cost-ratio is used in this study. Note that the computational costs of training the surrogates and calculating the acquisition function are considered negligible in comparison to the CFD evaluations. Finally, the computational chain of the aerodynamic forces with the probabilistic model is shown in Fig. 8.
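The fidelity selection of Eqs. (11)–(13) reduces to a comparison of two cost-scaled variance reductions; a minimal sketch with the cost ratio quoted above is:

    def select_fidelity(rho, var_lf, var_delta_hf, c_lf=1.0, c_hf=10.0):
        """Choose the fidelity level for the next evaluation from Eqs. (11)-(13):
        the level giving the larger expected variance reduction per unit cost.
        var_lf and var_delta_hf stand for the predictive variances of the LF GP and
        of the HF discrepancy GP at the new design."""
        sevr_lf = (rho**2 * var_lf) / c_lf                      # Eq. (12)
        sevr_hf = (rho**2 * var_lf + var_delta_hf) / c_hf       # Eq. (13)
        return "HF" if sevr_hf > sevr_lf else "LF"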


Fig. 6 Local probabilistic models of the aerodynamic force coefficients of the baseline configuration. (a) Drag coefficient, cd . (b) Lift coefficient, cl

Fig. 7 Convergence of the risk measure value of the aerodynamic force coefficients. (a) Lift coefficient, cl . (b) Drag coefficient, cd

8 Results In this section, the main obtained results are shown. A two-step optimisation was carried out: firstly, a deterministic optimisation, whose results are explained in Sect. 8.1, and, later, a probabilistic one. The results of the latter are given in Sect. 8.2.

8.1 Deterministic Optimisation The deterministic design optimisation was solved with multi-fidelity and single-fidelity surrogate-based techniques and, also, with a population-based technique. A summary of all the optimisation results is presented in Table 3. Regarding the surrogate approach, the same computational budget was used for comparing the single- and multi-fidelity surrogates. In both cases, 10 high-fidelity

Fig. 8 Computational chain of the aerodynamic forces with the probabilistic model


Table 3 Comparison of multi-fidelity, single-fidelity surrogate-based, and population-based optimisation results. (The costs of a single high- and a single low-fidelity simulation are 10 and 1, respectively)

                               cd [dc]  cl    TEA [°]  LER       Samples [LF, HF]  Cost
Baseline (MH114)               117.8    1.09  6.60     0.014324  [0, 1]            10
Best design MF-GPR             112.0    1.00  6.03     0.013228  [120, 18]         300
Best design GPR                117.6    1.08  7.76     0.011973  [0, 30]           300
Best design CMA-ES (h = 10.7)  117.1    1.02  6.17     0.018025  [1800, 0]         1800
Best design CMA-ES (h = 8)     115.6    1.02  6.34     0.020900  [1800, 0]         2700

Fig. 9 Baseline and deterministic optimal airfoil comparison. MH114 (solid line), MF-GPR optimal airfoil (red solid line), and GPR optimal airfoil (green solid line). The dashed lines are the camber of each airfoil. Axes are not dependent

0.05 0 -0.05

0

0.2

0.4

0.6

0.8

1

x/c

samples were used for constructing the initial surrogate. In the case of MF-GPR, the surrogate was complemented with 100 low-fidelity samples. Therefore, to keep the same computational budget, only 20 additional high-fidelity samples were generated to complement the single-fidelity surrogate.

In Table 3, it can be observed that the optimisation using MF-GPR is able to find a better solution. This is because the computational budget was severely limited, which did not allow enough high-fidelity samples to be generated. The lack of HF samples prevents the construction of an accurate GPR model, and this handful of additional samples was not enough for the single-fidelity approach to find a sufficiently good design. By introducing low-fidelity information obtained from computationally cheaper samples, the MF-GPR could instead provide a much better approximation of the performance landscape.

From an aerodynamic point of view, the optimal airfoil of the MF-GPR approach has a lower drag coefficient because its camber line is lower than the camber lines of the other two airfoils, the MH114 and the optimal airfoil obtained with the GPR approach. This can be observed in Fig. 9. Note that, for visualisation purposes, the axes in Figs. 9 and 10 are not dependent. Specifically, by decreasing the camber while keeping the free-stream angle-of-attack (AoA), the effective AoA that the airfoil actually perceives decreases. Thus, a lower lift coefficient is obtained, which implies a reduction of the lift-induced drag and hence of the total drag.

In addition, the presented design optimisation approach is compared with a popular population-based algorithm, namely CMA-ES [7]. The optimisation was performed using only low-fidelity CFD evaluations. The evolutionary algorithm was

Fig. 10 Baseline and deterministic optimal airfoil comparison. MH114 (solid line), MF-GPR optimal airfoil (red solid line), and CMA-ES optimal airfoil using h = 8 grid size (blue solid line). The dashed lines are the camber of each airfoil. Axes are not dependent

Table 4 Comparison of prediction error of multi- and single-fidelity surrogate models. (Prediction error is defined as the arithmetic mean value of the relative error of the high-fidelity predictions during the course of optimisation)

Surrogate          c^d     c^l     F^obj    Fobj(c^d, c^l, TEA, LER)  HF iterations
MF-GPR surrogate   2.04%   0.71%   34.51%   5.43%                     8
GPR surrogate      3.11%   5.65%   17.14%   11.53%                    20

not able to find an optimal design similar to the one given by the MF-GPR approach; on the contrary, the airfoil was barely improved. Therefore, it was decided to refine the mesh to a grid size of h = 8 (instead of h = 10.7) and redo the optimisation. In this case, the algorithm found a best design configuration similar to that of the presented method (MF-GPR). The cost of a CFD evaluation on the h = 8 grid is 1.5 times that of a low-fidelity one, and 1800 evaluations were needed to perform the population-based optimisation. This implies that the computational cost is significantly higher than that of the multi-fidelity approach presented in this paper.

Therefore, the presented optimisation method offers an advantage with respect to single-fidelity surrogate-based optimisation: by adding low-fidelity samples, the performance landscape is better approximated, which results in a better allocation of computational resources. The multi-fidelity surrogate-based approach can find better airfoil designs than classical population-based optimisation and single-fidelity techniques, since the HF evaluations are performed only for promising design candidates.

Furthermore, Fig. 10 shows the comparison of the optimal airfoils obtained with the CMA-ES and MF-GPR approaches with respect to the baseline, together with the camber line of each airfoil. In this case, both optimal airfoils have a similar camber line. However, the optimal design obtained with CMA-ES has its maximum airfoil thickness at a more forward position (x/c = 0.227) than the MF-GPR best design (x/c = 0.297). This implies that its effective angle-of-attack is greater than the one perceived by the MF-GPR optimal airfoil; hence, a higher cl and, consequently, a higher cd are found.

To evaluate the quality of the produced surrogate models, the mean prediction error is calculated; the results are summarised in Table 4. The prediction error was calculated at each HF iteration by using the prediction and the true value computed at the new infill design point. It can be seen that MF-GPR predicted


Fig. 11 Prediction of the distributions for the baseline and optimal designs. (a) Lift coefficient distribution, cl . (b) Drag coefficient distribution, cd

the aerodynamic forces of the new designs significantly better. The poor design configuration found by the GPR-based optimisation is also due to the penalty-based approach employed here: the lift coefficient is not well predicted and, consequently, the algorithm wastes computations on designs that are infeasible and have high objective values. Moreover, Table 4 also shows that the objective cannot be accurately predicted by a surrogate directly. However, by predicting the aerodynamic forces independently and calculating the objective afterwards from these predictions, a noticeably more accurate estimate is obtained.
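A small sketch of how the error metric reported in Table 4 can be accumulated: at every high-fidelity infill iteration the surrogate prediction is compared with the value returned by the CFD chain at the new design, and the relative errors are averaged. The per-iteration records below are hypothetical placeholders.

```python
# Mean relative prediction error at the HF infill points (the metric of Table 4).
import numpy as np

def mean_relative_error(predicted, true):
    predicted, true = np.asarray(predicted, float), np.asarray(true, float)
    return np.mean(np.abs(predicted - true) / np.abs(true))

# Hypothetical (prediction, truth) pairs recorded at each new infill design.
cd_pred, cd_true = [0.0115, 0.0112, 0.0118], [0.0118, 0.0111, 0.0117]
cl_pred, cl_true = [1.01, 1.03, 1.00], [1.00, 1.02, 1.01]

print("mean rel. error on cd:", mean_relative_error(cd_pred, cd_true))
print("mean rel. error on cl:", mean_relative_error(cl_pred, cl_true))
```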

8.2 Probabilistic Optimisation

To improve the performance of the airfoil under uncertainty, a probabilistic optimisation was carried out in which only uncertainty on the angle-of-attack was considered. Since the advantages of using MF-GPR were already shown in the deterministic optimisation, the probabilistic optimisation was performed only with the proposed approach. The predicted lift and drag distributions of the optimal designs and the baseline airfoil are shown in Fig. 11. When the angle-of-attack is perturbed, the optimal design obtained for the deterministic problem violates the constraint imposed on the lift coefficient, as shown in Fig. 11a. The MF-GPR algorithm is able to take uncertainty into account during the optimisation; hence, it can find an optimal design which respects the imposed lift constraint. In particular, for the probabilistic design optimisation, it was required that CVaR^loss(c~l) be greater than or equal to one. Figure 11a


Table 5 Comparison of the requirements considering environmental uncertainty

Design              Objective  CVaR(c~d) [dc]  CVaR^loss(c~l)  TEA   LER       Samples [LF, HF]  Cost
Baseline (MH114)    119.07     119.07          1.063           6.60  0.014324  [5, 3]            35
Deterministic opt.  136.55     112.87          0.976           6.03  0.013228  [5, 3]            35
Probabilistic opt.  114.89     114.89          1.002           6.38  0.016937  [370, 51]         880

Fig. 12 Baseline, deterministic optimum, and robust optimum airfoil comparison. MH114 (solid line), deterministic optimum (red solid line), and probabilistic optimum (blue solid line). The dashed lines are the camber of each airfoil. Axes are not dependent

shows how the probabilistic optimum design fulfils the imposed cl constraint; consequently, this design has a higher drag. In Table 5, a comparison of the baseline airfoil with the optimal designs is given in terms of lift, drag, and shape characteristics. Note that, in order to calculate the CVaR, the MF-GPR technique requires the computation of three high-fidelity samples per design; hence, the computational budget is tripled. It is still limited, however, as its size is equal to the cost of only 90 high-fidelity simulations.

The obtained optimal airfoils are compared in Fig. 12. Both the deterministic and the probabilistic optimisation resulted in an airfoil with a smaller camber-line curvature than the baseline. The MH114 airfoil generates a lift coefficient significantly higher than the required constraint value; hence, the optimisation tends to reduce the camber curvature so that the lift is reduced, and so is the drag. Moreover, comparing the deterministic and probabilistic optimal designs, it can be seen that the probabilistic optimum has a stronger S-shaped lower side. This increases both the lift and the drag coefficients, resulting in a feasible airfoil design.

Finally, Fig. 13 presents a comparison of the pressure coefficient distribution and the friction coefficient on the body surface for the baseline, deterministic, and probabilistic optimal designs. Analysing both optimal designs, it can be observed that the deterministic optimum presents a smoother expansion rate on the upper surface of the airfoil: its maximum is reached at 30% of the chord, whereas the probabilistic optimum has its peak at 10% of the chord. Comparing the pressure coefficient distributions of the optimal airfoils, the contribution of pressure to the drag coefficient is clearly higher for the probabilistic optimum. Besides, for the


Fig. 13 Pressure coefficient (left) and friction coefficient (right) on the body surface comparison. MH114 (solid line), deterministic optimum (red solid line), and probabilistic optimum (blue solid line). (a) Pressure coefficient, cp . (b) Friction coefficient, cf

contribution of friction to the drag coefficient, a similar conclusion can be drawn. The maximum value of the skin friction coefficient is higher for the probabilistic optimum on the suction side of the airfoil and, to a lesser extent, on the pressure side. Thus, the friction drag is higher for the probabilistic solution, mainly because of its higher cf peak.

9 Conclusion

In the present work, a complete optimisation workflow is presented for expensive aerospace applications under uncertainty. The workflow is employed to find optimal airfoil designs that produce minimal drag while respecting both aerodynamic-force and geometrical constraints. The prediction of the aerodynamic forces is expensive, as the Reynolds-averaged Navier–Stokes partial differential equations have to be solved numerically. A multi-fidelity surrogate-based technique was used to decrease the computational effort: the drag and lift predictions obtained with two significantly different mesh sizes were fused together with a hierarchical Gaussian process regression technique. To increase the correlation between the high- and low-fidelity drag predictions, the spurious drag was compensated by performing the far-field drag prediction at the low-fidelity level. Our approach was compared against a classical single-fidelity surrogate-based method and an evolutionary algorithm. The results showed that the classical methods can struggle to find significantly improved designs under a limited computational budget. The highest potential of our multi-fidelity surrogate-based approach lies in solving problems under uncertainty, where the numerous probabilistic samples required can be obtained efficiently by introducing a multi-fidelity probabilistic model.


References 1. Acerbi, C., Tasche, D.: Expected shortfall: a natural coherent alternative to value at risk. Econ. Notes 31(2), 379–388 (2002). https://doi.org/10.1111/1468-0300.00091 2. Arizmendi, B., Bellosta, T., del Val, A.I., Gori, G., Prazeres, M.O., Reis, J.: On real-time management of on-board ice protection systems by means of machine learning. In: AIAA Aviation 2019 Forum, p. 3464 (2019) 3. Construct2d. https://sourceforge.net/projects/construct2d/ 4. Economon, T.D., Palacios, F., Copeland, S.R., Lukaczyk, T.W., Alonso, J.J.: SU2: an opensource suite for multiphysics simulation and design. AIAA J. 54(3), 828–846 (2016). https:// doi.org/10.2514/1.J053813 5. Fan, Y., Li, W.: Review of far-field drag decomposition methods for aircraft design. J. Aircr. 56(1), 11–21 (2019) 6. Gariepy, M., Trepanier, J.Y., Petro, E., Malouin, B., Audet, C., LeDigabel, S., Tribes, C.: Direct search airfoil optimization using far-field drag decomposition results. In: 53rd AIAA Aerospace Sciences Meeting. AIAA SciTech Forum, American Institute of Aeronautics and Astronautics, Jan 2015. https://doi.org/10.2514/6.2015-1720 7. Hansen, N., Müller, S.D., Koumoutsakos, P.: Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evol. Comput. 11(1), 1–18 (2003) 8. Kennedy, M.C., O’Hagan, A.: Predicting the output from a complex computer code when fast approximations are available. Biometrika 87(1), 1–13 (2000) 9. Korondi, P.Z., Marchi, M., Parussini, L., Poloni, C.: Multi-fidelity design optimisation strategy under uncertainty with limited computational budget. Optim. Eng. 22, 1039–1064 (2021) 10. Le Gratiet, L., Garnier, J.: Recursive co-kriging model for design of computer experiments with multiple levels of fidelity. Int. J. Uncertain. Quantif. 4(5), 365–386 (2014) 11. Martin Hepperle MH 114 for a propeller for ultralight. http://airfoiltools.com/airfoil/details? airfoil=mh114-il. Accessed 28 June 2020 12. Morales Tirado, E., Quagliarella, D., Tognaccini, R.: Airfoil optimization using far-field analysis of the drag force. In: AIAA Scitech 2019 Forum, p. 0972 (2019) 13. Paparone, L., Tognaccini, R.: Computational fluid dynamics-based drag prediction and decomposition. AIAA J. 41(9), 1647–1657 (2003) 14. Quagliarella, D.: Uncertainty Management for Robust Industrial Design in Aeronautics: Findings and Best Practice Collected During UMRIDA, a Collaborative Research Project (2013–2016) Funded by the European Union, chap. Value-at-Risk and Conditional Valueat-Risk in Optimization Under Uncertainty, pp. 541–565. Springer International Publishing, Cham (2019). https://doi.org/10.1007/978-3-319-77767-2_34 15. Quagliarella, D., Tirado, E.M., Bornaccioni, A.: Risk measures applied to robust aerodynamic shape design optimization. In: Flexible Engineering Toward Green Aircraft, pp. 153–168. Springer, Berlin (2020) 16. Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss distributions. J. Bank. Financ. 26, 1443–1471 (2002) 17. Spalart, P., Allmaras, S.: A one-equation turbulence model for aerodynamic flows. In: 30th Aerospace Sciences Meeting and Exhibit, p. 439 (1992) 18. Wang, X., Hirsch, C., Liu, Z., Kang, S., Lacor, C.: Uncertainty-based robust aerodynamic optimization of rotor blades. Int. J. Numer. Methods Eng. 94(2), 111–127 (2013) 19. Zang, T.A.: Needs and opportunities for uncertainty-based multidisciplinary design methods for aerospace vehicles. National Aeronautics and Space Administration, Langley Research Center (2002)

Scalable Dynamic Asynchronous Monte Carlo Framework Applied to Wind Engineering Problems

Riccardo Tosi, Marc Nuñez, Brendan Keith, Jordi Pons-Prats, Barbara Wohlmuth, and Riccardo Rossi

1 Introduction

Uncertainty quantification (UQ) is a field of mathematics involving many engineering and science areas. Monte Carlo (MC) is one of the most famous UQ strategies, and it is widely used since it presents many advantages: it is simple to implement, non-intrusive, does not suffer from the so-called curse of dimensionality, and converges toward the true statistical values as the number of realizations grows. However, its biggest drawback is the high computational cost of running such an algorithm. For this reason, many improvements with respect to standard Monte Carlo have been proposed; concerning hierarchical Monte Carlo methods, we refer, for example, to Multilevel Monte Carlo and Continuation Multilevel Monte Carlo [7, 12, 13, 19–21]. When running on supercomputers, other key aspects, such as the scheduling and the estimation of the number of Monte Carlo realizations, must be taken into account to improve the overall computational cost [11]. Properly dealing with these parameters should

This work has been supported by the European Commission through the H2020 Research and Innovation program under contract 800898. R. Tosi () · M. Nuñez International Centre for Numerical Methods in Engineering, Catalonia, Spain e-mail: [email protected] B. Keith · B. Wohlmuth Technische Universität München, München, Germany J. Pons-Prats · R. Rossi International Centre for Numerical Methods in Engineering, Catalonia, Spain Universitat Politècnica de Catalunya, Barcelona, Spain © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Vasile, D. Quagliarella (eds.), Advances in Uncertainty Quantification and Optimization Under Uncertainty with Aerospace Applications, Space Technology Proceedings 8, https://doi.org/10.1007/978-3-030-80542-5_4


provide a constant exploitation of the machine, without leaving the hardware idle. To this end, we developed a Monte Carlo method designed specifically to run in distributed environments. This algorithm, known as asynchronous Monte Carlo, adds one extra level of parallelism to standard Monte Carlo: between batches. Each batch owns its hierarchy, namely the number of realizations in the case of Monte Carlo. This avoids the global synchronization points that are typical of classic Monte Carlo, which leave the supercomputer idle and inefficient, and thus ensures high computational efficiency.

Computational efficiency is even more important when dealing with extremely complex and costly problems, such as those that are commonplace in wind engineering. In fact, wind engineering problems are stochastic by nature, due to the presence of wind. In order to study the effect of wind on structures, one needs to solve fluid simulations over a time window long enough to capture all possible wind scenarios [5]. In this chapter, we study a stochastic wind engineering benchmark problem with our asynchronous Monte Carlo algorithm. The framework we have developed is capable of handling both scalar quantities and time series quantities; therefore, we analyze both time-averaged and time series values of the drag force and of the base moment.

Computing statistical estimators is crucial to ensure computational efficiency. We exploit power sums to compute the expected value and central moments, since such quantities can easily be updated on the fly. Another statistical quantity estimated in this work is the Conditional Value-at-Risk (CVAR), which measures the tail of the probability density function.

We will first recall the standard Monte Carlo method and then introduce our asynchronous Monte Carlo algorithm and its scheduling system, which will be used to solve the target problem. Afterward, we will focus on the computation of statistical quantities of interest and include various considerations on the optimal time window and the number of Monte Carlo realizations required to control the associated statistical error. This work exploits the Kratos Multiphysics (Kratos) [8, 9, 17] finite element software, the hierarchical Monte Carlo library XMC [1], and the distributed computing framework PyCOMPSs [3, 14, 24].
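As a minimal illustration of the power-sum idea mentioned above (the XMC library implements this with higher-order power sums and h-statistics; the snippet only shows the principle for the first two moments), realizations can be accumulated one by one without ever storing them:

```python
# Single-pass accumulation of power sums S1 = sum(q), S2 = sum(q**2), from which
# the mean and the unbiased sample variance are recovered on the fly.
class PowerSums:
    def __init__(self):
        self.n, self.s1, self.s2 = 0, 0.0, 0.0

    def update(self, q):
        self.n += 1
        self.s1 += q
        self.s2 += q * q

    def mean(self):
        return self.s1 / self.n

    def variance(self):
        return (self.s2 - self.s1 ** 2 / self.n) / (self.n - 1)

# usage: feed realizations as soon as they finish, no storage of the full sample set
acc = PowerSums()
for q in (3.21, 3.25, 3.24, 3.22):
    acc.update(q)
print(acc.mean(), acc.variance())
```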

2 Monte Carlo Methods

In this section, we review standard Monte Carlo methods, and we briefly discuss asynchronous Monte Carlo and the scheduling framework for running in distributed environments.


2.1 Monte Carlo

Monte Carlo methods [20] estimate statistical quantities empirically using a collection of independent samples of a random variable. Such methods are generally non-intrusive and converge with respect to the number of samples N at a rate independent of the dimension of the stochastic space. For these reasons, Monte Carlo methods are very popular and often used. The well-known Monte Carlo estimator for the expected value is

E^{MC}[Q_H] = \frac{1}{N} \sum_{n=1}^{N} Q_H\big(w^{(n)}\big) ,    (1)

where Q is the output quantity of interest (QoI), N is the total number of realizations, w is the random variable, and H is a mesh discretization parameter. The mean square error of the estimator is defined as

e_{MC}^2[Q_H] = E\Big[\big(E^{MC}[Q_H] - E[Q]\big)^2\Big] = \underbrace{\big(E[Q_H] - E[Q]\big)^2}_{DE^2} + \underbrace{\frac{V[Q_H]}{N}}_{SE^2} ,    (2)

where V[Q_H] is the variance of Q_H. The variance of Q_H can be estimated as

V[Q_H] \approx V^{MC}[Q_H] = \frac{1}{N-1} \sum_{n=1}^{N} \big(Q_H(w^{(n)}) - E^{MC}[Q_H]\big)^2 ,    (3)

and the standard deviation of Q_H as

\sigma[Q_H] = \sqrt{V[Q_H]} \approx \sqrt{V^{MC}[Q_H]} .    (4)

As we can see, Eq. (2) can be split into two terms: the square of the discretization error (DE) and the square of what we will later denote as the statistical error (SE). DE is related to the space discretization we are using, while SE is related to the variance of the estimator. Note that the statistical error is proportional to N^{-1/2}, which is an MC bottleneck. Other techniques, such as Multilevel Monte Carlo or Continuation Multilevel Monte Carlo, target the discretization error to reduce the overall mean square error; we refer to [7, 12, 13] for details. Instead of assessing convergence by exploiting Eq. (2), we set a tolerance ε > 0 and a confidence 1 − φ ∈ (0, 1). Then, we want to satisfy the failure probability criterion

P\Big( \big| E^{MC}[Q_H] - E[Q_H] \big| \geq \varepsilon \Big) \leq \phi .    (5)
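A minimal sketch of this single-level procedure is given below, under assumed inputs: the deterministic solver Q_H(w) is replaced by a cheap analytic stand-in, and Eq. (5) is enforced through the usual normal-approximation confidence interval z_{1−φ/2}·SE ≤ ε, which is one common practical way of checking the failure probability criterion.

```python
# Synchronous single-level Monte Carlo with an SE-based convergence check.
import numpy as np

rng = np.random.default_rng(42)

def solve_qoi(w):
    """Placeholder for the deterministic solver Q_H(w); a cheap analytic stand-in."""
    return 3.2 + 0.3 * (w - 2.0) + 0.05 * rng.standard_normal()

eps, phi = 0.0085, 0.01    # tolerance and failure probability (confidence 1 - phi)
z = 2.576                  # normal quantile for 1 - phi/2 with phi = 0.01

samples, converged = [], False
while not converged:
    w = rng.normal(2.0, 0.02)            # random input sample w
    samples.append(solve_qoi(w))
    n = len(samples)
    if n < 10:
        continue
    mean = np.mean(samples)
    se = np.sqrt(np.var(samples, ddof=1) / n)   # statistical error
    converged = z * se <= eps

print(f"N = {n}, E^MC[Q_H] = {mean:.4f}, SE = {se:.2e}")
```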


Algorithm 1 Synchronous Monte Carlo
  Define initial hierarchy: N and H
  while Eq. (5) is not True do
    Update N
    for n = 0 : N do
      Generate w and compute Q_H
    end for
    Update statistical estimators and compute SE
    Check convergence with Eq. (5)
  end while

Similarly to the mean square error, the failure probability error can be split into statistical error (SE) and discretization error (DE). The former arises due to the finite sampling and goes to zero as N goes to infinity. The latter is present because we solve a discretized problem. DE is constant and is not a function of the number of realizations. We remark that if an analytical solution is not available, computing the discretization error is not possible for single-level MC, while it is possible to approximate it for Multilevel Monte Carlo methods. For this reason, since in this chapter we use single-level Monte Carlo, we will check convergence only by assessing the statistical error. Monte Carlo methods are highly parallelizable since all N realizations are independent and can be run in parallel. However, assessing their convergence is an intrinsically serial procedure. In fact, given N, one can run all realizations in parallel, and once all of them are finished, assess convergence with Eq. (5). In case convergence is not achieved, another MC iteration is needed. Waiting for all realizations to end before assessing convergence may cause the supercomputer to be idle for long times, and this can be extremely costly. In fact, a single realization may be slower or faster than others due to the different random variable values or a hardware malfunction. For this reason, we refer to classical Monte Carlo methods as synchronous. A draft of a general synchronous Monte Carlo algorithm is described in Algorithm 1. First, the hierarchy must be defined. Then, the realizations are run in parallel, and statistical estimators are updated on the fly. Finally, convergence is checked with Eq. (5). In case Eq. (5) is not satisfied, we iterate until convergence. To overcome the waste of computational resources due to idle times, we exploit the asynchronous Monte Carlo algorithm, which is described in the next subsection.

2.2 Asynchronous Monte Carlo

Hierarchical Monte Carlo methods, such as Monte Carlo and Multilevel Monte Carlo, present three levels of parallelism [11]:


– Between levels
– Between samples per level
– At the solver level, for each sample

We remark that hierarchical Monte Carlo methods may present more levels of different accuracies, which normally are linked with the space discretization. Classic MC is a single-level method; thus the first parallelism is not present.

In order to avoid costly idle times, a new level of parallelism is added: between batches. Each batch is independent of the others and is characterized by its own hierarchy, namely its number of levels and realizations. The key idea is to run each batch in parallel with the other batches. Once a batch of realizations finishes, statistical estimators can be updated within their corresponding batch on the fly. Once the computation of in-batch estimators is concluded, these are merged with the global estimators. After the global estimators are updated, convergence can be computed. As with synchronous Monte Carlo, a synchronization point is still present before the convergence check. The difference is that now this is parallel to the run of the other batches; therefore, the synchronization is local, and not global as before. This ensures the supercomputer is always operative, and there is no waste of resources due to a lack of simulations.

The basic algorithm is presented in Algorithm 2. First, the number of batches, B, the number of realizations, N, and the mesh discretization, H, are defined. Then, all realizations are run in parallel, and in-batch statistical estimators are updated on the fly. Finally, once all in-batch operations are finished, the global statistical estimators are updated and convergence is checked with Eq. (5). In case Eq. (5) is not satisfied, we iterate until convergence.

Algorithm 2 Asynchronous Monte Carlo
  Define initial hierarchy: B, N and H
  while Eq. (5) is not True do
    Update B and N
    for b = 0 : B do
      for n = 0 : N do
        Generate w and compute Q_H
      end for
      Update in-batch statistical estimators
    end for
    Merge in-batch statistical estimators with global estimators
    Compute SE
    Check convergence with Eq. (5)
  end while

To ease the comprehension of Algorithm 2, we report the task graph of asynchronous Monte Carlo in Fig. 1. The plot shows the connections and the dependencies between the tasks, as if the algorithm were run in a distributed environment. By looking at the graph, we observe that three batches of two realizations each are executed and that the synchronization points are local and parallel to the other batch samples.
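The schematic sketch below mimics this batch-level control flow in plain Python: each batch returns its in-batch power sums, and these are merged into the global estimators as soon as the batch finishes. In the real framework the tasks are spawned through PyCOMPSs on a supercomputer; here concurrent.futures and a dummy solver stand in for it, so only the control flow is representative.

```python
# Asynchronous-style batches: in-batch accumulation and local merging of estimators.
import numpy as np
from concurrent.futures import ProcessPoolExecutor, as_completed

def run_batch(n_realizations, seed):
    """Run one batch and return its in-batch power sums (n, sum q, sum q**2)."""
    rng = np.random.default_rng(seed)
    q = 3.2 + 0.05 * rng.standard_normal(n_realizations)   # dummy Q_H(w)
    return n_realizations, q.sum(), (q ** 2).sum()

def merge(a, b):
    return tuple(x + y for x, y in zip(a, b))

if __name__ == "__main__":
    n_batches, n_per_batch = 3, 2
    global_sums = (0, 0.0, 0.0)
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(run_batch, n_per_batch, seed=b) for b in range(n_batches)]
        # Each batch is merged as soon as it finishes (local synchronization only);
        # a convergence check on the global estimators could be placed here to
        # decide whether further batches need to be launched.
        for fut in as_completed(futures):
            global_sums = merge(global_sums, fut.result())

    n, s1, s2 = global_sums
    mean = s1 / n
    se = np.sqrt(((s2 - s1 ** 2 / n) / (n - 1)) / n)
    print(f"N = {n}, mean = {mean:.4f}, SE = {se:.3e}")
```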


Fig. 1 Graph connections of the asynchronous Monte Carlo algorithm. A total of three batches and two realizations per batch are executed

2.3 Scheduling

Properly scheduling hierarchical Monte Carlo algorithms when running in distributed environments is crucial. It is known that dynamic scheduling is the best strategy for simulations whose run-time may change, as shown in [11]. For this reason, in our work we exploit a dynamic scheduling approach. Specifically, when launching the job, we allocate a desired number of worker nodes and one master node. The scheduler operates on the master node and spawns the tasks defined in Fig. 1 on the worker nodes, where they are then executed. The scheduler uses a First In, First Out (FIFO) approach, which lets the batches launched first finish first, thus preserving the workflow designed in Algorithm 2.

3 Wind Engineering Benchmark

In this section, we first present the wind engineering benchmark we consider, then illustrate the uncertainties of the problem, and finally report the statistical analysis of the chosen quantities of interest. The supercomputer considered for the analyses is the MareNostrum 4 system, with 11.15 petaflops of peak performance, which consists of 3456 compute nodes


equipped with two Intel Xeon Platinum 8160 processors (24 cores at 2.1 GHz each). The stochastic problem we consider is first solved for a given tolerance ε and a given confidence 1 − φ. Thereafter, it is solved for a fixed computational cost, and different numbers of realizations N and time windows [0, T] are considered. The number of nodes is chosen according to the MC hierarchy to make sure a single batch of asynchronous Monte Carlo can properly fill the supercomputer. The computational cost of the asynchronous Monte Carlo run is computed as the total number of simulation hours multiplied by the number of cores exploited, and its value is approximately 35,000 CPU hours.

3.1 Problem Description

In wind engineering, research is focused on solving turbulent problems and studying the effect of wind around buildings [5, 6]. We focus on a high Reynolds number flow around a 5 m × 1 m rectangle. The problem we solve is described by the incompressible Navier–Stokes equations

\frac{\partial u}{\partial t} + u \cdot \nabla u - \nu \Delta u + \nabla p = f    on \Omega, \; t \in [0, T],
\nabla \cdot u = 0    on \Omega, \; t \in [0, T],    (6)

where u is the velocity field, p is the pressure field, ν is the kinematic viscosity, and f is the vector field of body forces. Ω refers to the problem domain, and [0, T] is the considered time window. The problem domain is shown in Fig. 2. The body is represented by arrows of four different colors, which serve to guide the reader in understanding Fig. 5. The inlet velocity is uniformly distributed along the y-axis and has an average value of 2 m/s. Slip boundary conditions are applied on the external boundaries, and no-slip boundary conditions are enforced on the rectangle body. The Reynolds number is Re = 132,719. The washout time, which is the time needed for one particle to travel from the inlet to the outlet of the domain at an average speed of 2 m/s, is 137.5 s. It is computed as

T_w = \frac{275 \, \text{m}}{2 \, \text{m/s}} = 137.5 \, \text{s} .    (7)

Therefore, we consider a burn-in time Tbt of 140 s. Tbt is the simulation time information we discard to decorrelate the flow field from initial conditions. The mesh considered to solve the problem is a solution-oriented adaptive refined discretization with respect to the average velocity field u(t, x), averaged on


Fig. 2 Domain dimensions are 275 × 150 m. The inner rectangle has size 5 × 1 m

[T_bt, T], where [0, T_bt] is the burn-in time discarded to avoid dependencies on initial conditions. The metric computed to perform the refinement is built within Kratos [16], exploiting the averaged velocity field, and the original mesh is refined using the Mmg software [10]. The final mesh has around 25,000 nodes, and a minimal size, close to the rectangle body, of 0.002 m. The chosen time step is 0.02 s, which gives a CFL of

CFL = \frac{\Delta t \, u}{h} \approx 20 .    (8)

The quantities of interest we consider are the:
1. Time-averaged drag force ⟨Fd⟩
2. Time-averaged base moment ⟨Mb⟩
3. Time-averaged pressure field ⟨p(x)⟩
4. Time series drag Fd
5. Time series base moment Mb
6. Time series pressure field p(x)

By time-averaged, we refer to quantities averaged over the interval [Tbt , T ]; thus some information is lost due to the averaging process, which damps peaks and oscillations. In other words, out of a single realization, the quantity of interest is a single scalar value. On the other hand, with time series we refer to quantities that keep all the historical information.
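The snippet below illustrates this distinction for one realization, using a synthetic signal as a stand-in for the drag-force history: the time-averaged QoI is a single scalar obtained after discarding the burn-in window, whereas the time series QoI keeps the whole history.

```python
# Time-averaged vs time series quantity of interest for a single realization.
import numpy as np

dt, T, T_bt = 0.02, 300.0, 140.0
t = np.arange(0.0, T, dt)
rng = np.random.default_rng(3)
drag_series = 3.2 + 0.3 * np.sin(0.8 * t) + 0.05 * rng.standard_normal(t.size)  # synthetic Fd(t)

keep = t >= T_bt                            # discard the burn-in part of the history
drag_time_avg = drag_series[keep].mean()    # scalar QoI <Fd> for this realization
print(f"<Fd> of this realization: {drag_time_avg:.3f}")
```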


3.2 Source of Uncertainty

Wind is the chosen uncertain parameter. Different wind models exist in the literature, and their aim is to imitate the physical behavior of wind; we refer to [2, 15] for details. For the sake of simplicity, we assume that the magnitude of the wind inlet velocity is given by a normal distribution,

u_{inlet} \sim \mathcal{N}(2.0, 0.02) .    (9)

The stochastic problem is solved using asynchronous Monte Carlo. Statistical convergence is checked for the time-averaged drag force, but, as commented above, also other physical quantities of interest are computed and analyzed.

3.3 Results

Given the stochastic conditions introduced above, we want to satisfy Eq. (5) with confidence φ = 0.99 and tolerance ε = 0.0085. We remark that the tolerance is absolute and corresponds to a relative value of around 0.2% with respect to the expected value of the time-averaged drag force. The chosen algorithm is asynchronous Monte Carlo; therefore, we neglect the discretization error. Additionally, even though multiple quantities of interest are computed, convergence is checked only for the time-averaged drag force. Given a time window of [0, 300] seconds, we satisfy Eq. (5) after running 960 realizations. We report in Figs. 3 and 4 the instantaneous velocity and pressure fields at t = 200 s for one realization.

In Table 1, we report the expected value and the standard deviation for the scalar quantities of interest. As we can readily observe, the mean values are the same, which is the expected behavior. On the other hand, we see that the standard


Fig. 3 Velocity field snapshot at t = 200 s


Fig. 4 Pressure field snapshot at t = 200 s

Table 1 Statistical analysis of time-averaged drag force ⟨Fd⟩, base moment ⟨Mb⟩, time series drag force Fd, and time series base moment Mb. Results for N = 960 and T = 300 s are provided

Q      E[·]      σ[·]
⟨Fd⟩   3.23506   0.01242
⟨Mb⟩   −0.01055  0.01449
Fd     3.23506   0.33241
Mb     −0.01055  4.55940

deviation values are different, being much smaller for the time-averaged quantities of interest. This happens because oscillations are damped by the intermediate averaging process. From the drag force of Table 1, one can estimate the drag coefficient as

C_d = \frac{F_d}{\frac{1}{2} \rho u^2 A} ,    (10)

where ρ is the fluid density, u is the speed of the rectangle body relative to the fluid, and A is the cross-sectional area. The drag coefficient we obtain is Cd = 1.320, which is consistent with the literature results [6]. The different behaviors of the time-averaged and time series quantities can also be observed by looking at the pressure field distribution around the rectangle body. We refer to Fig. 2 as a reference for understanding the plots of Fig. 5.

We have observed how to satisfy Eq. (5) for a given tolerance and confidence. However, the following question may arise: is our time window [0, T] optimal for such a problem? In order to try to answer this question, we compare simulations with different numbers of realizations N and different time windows [0, T]. To do so, we fix the computational cost to the one required by N = 960 and T = 300 s, which is around 35,000 CPU hours, and we compare the statistical errors we obtain. The statistical error (SE) is computed as follows:


Fig. 5 The upper figure shows the statistical analysis (expected values combined with standard deviation) of the time-averaged pressure field (Q = p(x)), and the lower figure the statistical analysis (expected values combined with standard deviation) of the time series pressure field (Q = p(x))

SE = \sqrt{\frac{V[Q_H]}{N}} ,    (11)

where QH is the drag force Fd , computed on the mesh with discretization parameter H . As we can see in Table 2, we obtain different results for different values of N and [0, T ]. For this reason, we believe it is possible to find a relationship between the statistical error, N , and [0, T ], in order to optimize the statistical error value. This will be the subject of a future study.


Table 2 The table reports the statistical error SE values of the time-averaged drag force ⟨Fd⟩. N and T refer to the number of wind realizations and the time window [0, T]. The burn-in time is 140 s. C is the computational cost, expressed in CPU hours

N     T    SE       C
1920  150  0.00723  35,000
960   300  0.00359  35,000
640   450  0.00389  35,000

Table 3 CVAR analysis of time-averaged drag force ⟨Fd⟩ and time series drag force Fd. Results for N = 960, T = 300 s, and α = 0.9 are provided

Q     CVAR     α
⟨Fd⟩  3.26569  0.9
Fd    4.33234  0.9

Finally, another statistical quantity one may be interested in computing is the Conditional Value-at-Risk (CVAR) [22, 23]. Let us define the α-quantile. As stated in [23], the α-quantile of a random variable Q, qα (Q), is simply FQ−1 (α), when the cumulative distribution function FQ is strictly increasing. We can now define the CVAR, following [23]. Given α ∈ (0, 1), the CVAR of Q at probability α is defined as 1 CVARα (Q) = 1−α



1

qβ (Q)dβ .

(12)

α

Looking at Table 3, we can observe the CVAR values for the time-averaged drag force and time series drag force, with α = 0.9. Similarly to what we commented before, we can observe that time-averaged values are smaller since oscillations are damped by the intermediate averaging process.
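A minimal sketch of how CVAR can be estimated empirically from a finite set of realizations, approximating the quantile integral of Eq. (12) by an average of empirical quantiles; the input samples below are synthetic placeholders for the drag-force values.

```python
# Empirical CVaR by averaging the empirical quantiles q_beta over beta in [alpha, 1).
import numpy as np

def cvar(samples, alpha=0.9, n_quantiles=200):
    betas = np.linspace(alpha, 1.0, n_quantiles, endpoint=False)
    return np.mean(np.quantile(samples, betas))

rng = np.random.default_rng(1)
fd_samples = rng.normal(3.235, 0.332, size=960)   # stand-in for time series drag values
print(f"CVaR_0.9(Fd) ~ {cvar(fd_samples):.3f}")
```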

4 Conclusion

We have introduced the asynchronous Monte Carlo method as an alternative to standard Monte Carlo that is better suited to non-intrusive uncertainty quantification on supercomputers. We then applied this method to a wind engineering benchmark problem, which is simple but already contains many challenging features of realistic three-dimensional problems. First, we have observed that time-averaged quantities damp oscillations and peaks, so they lose information about critical wind scenarios. For this reason, time series quantities should be preferred. However, dealing with time series quantities is more difficult, since it requires storing the whole time series to post-process them afterward. Storing to file may become very challenging: for example, a single file containing the time histories of both the drag force and the base moment is around 1 MB. Doing so for the pressure of three-dimensional


problems for multiple realizations may easily become extremely costly. For this reason, updating the statistics on the fly [4, 18, 19] should be preferred. In addition, different statistical quantities have been computed, namely the expected value, the standard deviation, and the Conditional Value-at-Risk (CVAR). The expected value is not sensitive to extreme values, the standard deviation takes oscillations into account, and the CVAR measures the weight of the tail of the probability density function. Finally, we have observed that a connection between the statistical error, the number of realizations, and the time window exists for time-dependent problems.

References 1. Amela, R., Ayoul-Guilmard, Q., Badia, R.M., Ganesh, S., Nobile, F., Rossi, R., Tosi, R.: ExaQUte XMC (2019). https://doi.org/10.5281/zenodo.3235832 2. Andre, M.S.: Aeroelastic modeling and simulation for the assessment of wind effects on a parabolic trough solar collector. Ph.D. thesis, Technische Universität München (2018) 3. Badia, R.M., Conejero, J., Diaz, C., Ejarque, J., Lezzi, D., Lordan, F., Ramon-Cortes, C., Sirvent, R.: COMP superscalar, an interoperable programming framework. SoftwareX 3–4, 32–36 (2015). https://doi.org/10.1016/j.softx.2015.10.004 4. Bennett, J., Grout, R., Pébay, P., Roe, D., Thompson, D.: Numerically stable, single-pass, parallel statistics algorithms. In: Proceedings - IEEE International Conference on Cluster Computing, ICCC (2009). https://doi.org/10.1109/CLUSTR.2009.5289161 5. Braun, A.L., Awruch, A.M.: Aerodynamic and aeroelastic analyses on the CAARC standard tall building model using numerical simulation. Comput. Struct. 87(9–10), 564–581 (2009). https://doi.org/10.1016/j.compstruc.2009.02.002 6. Bruno, L., Salvetti, M.V., Ricciardelli, F.: Benchmark on the aerodynamics of a rectangular 5:1 cylinder: an overview after the first four years of activity. J. Wind Eng. Ind. Aerodyn. 126, 87–106 (2014). https://doi.org/10.1016/j.jweia.2014.01.005 7. Collier, N., Haji-Ali, A.L., Nobile, F., von Schwerin, E., Tempone, R.: A continuation multilevel Monte Carlo algorithm. BIT 55(2), 399–432 (2015). https://doi.org/10.1007/s10543-0140511-3 8. Dadvand, P., Rossi, R., Gil, M., Martorell, X., Cotela, J., Juanpere, E., Idelsohn, S.R., Oñate, E.: Migration of a generic multi-physics framework to HPC environments. Comput. Fluids 80(1), 301–309 (2013). https://doi.org/10.1016/j.compfluid.2012.02.004 9. Dadvand, P., Rossi, R., Oñate, E.: An object-oriented environment for developing finite element codes for multi-disciplinary applications. Arch. Comput. Method. E. 17(3), 253–297 (2010) 10. Dapogny, C., Dobrzynski, C., Frey, P.: Three-dimensional adaptive domain remeshing, implicit domain meshing, and applications to free and moving boundary problems. J. Comput. Phys. 262, 358–378 (2014). https://doi.org/10.1016/j.jcp.2014.01.005 11. Drzisga, D., Gmeiner, B., Rüde, U., Scheichl, R., Wohlmuth, B.: Scheduling massively parallel multigrid for multilevel Monte Carlo methods. SIAM J. Sci. Comput. 39(5), S873–S897 (2017). https://doi.org/10.1137/16m1083591 12. Giles, M.B.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607–617 (2008). https://doi.org/10.1287/opre.1070.0496 13. Giles, M.B.: Multilevel Monte Carlo methods. Acta Numer. 24, 259–328 (2015). https://doi. org/10.1017/S096249291500001X 14. Lordan, F., Tejedor, E., Ejarque, J., Rafanell, R., Álvarez, J., Marozzo, F., Lezzi, D., Sirvent, R., Talia, D., Badia, R.M.: ServiceSs: an interoperable programming framework for the cloud. J. Grid. Comput. 12(1), 67–91 (2014). https://doi.org/10.1007/s10723-013-9272-5


15. Mann, J.: Wind field simulation. Probab. Eng. Mech. 13(4), 269–282 (1998). https://doi.org/ 10.1016/s0266-8920(97)00036-2 16. Mataix Ferrándiz, V.: Innovative mathematical and numerical models for studying the deformation of shells during industrial forming processes with the Finite Element Method. Ph.D. thesis, Universitat Politècnica de Catalunya (2020) 17. Mataix Ferrándiz, V., Bucher, P., Rossi, R., Cotela Dalmau, J., Maria, J., Zorrilla, R., Celigueta, M.A., Casas, G., Roig, C., Velázquez, A.C., Dadvand, P., Latorre, S., González, J.I., de Pouplana, I., Maso, M., Núñez, M., Arrufat, F., Dbaumgaertner, Chandra, B., Ghantasala, A., Armingeiser, Warnakulasuriya, S., Lluís, Gárate, J., MFusseder, Pablo, Franci, A., Gracia, L., Thomas, Sautter, K.B., Tosi, R.: KratosMultiphysics/Kratos: Release 8.0 (2020). https:// doi.org/10.5281/zenodo.3234644 18. Pébay, P., Terriberry, T.B., Kolla, H., Bennett, J.: Numerically stable, scalable formulas for parallel and online computation of higher-order multivariate central moments with arbitrary weights. Comput. Stat. 31(4), 1305–1325 (2016). https://doi.org/10.1007/s00180-015-0637-z 19. Pisaroni, M., Krumscheid, S., Nobile, F.: Quantifying uncertain system outputs via the multilevel Monte Carlo method - Part I: central moment estimation. J. Comput. Phys. (2020). https://doi.org/10.1016/j.jcp.2020.109466 20. Pisaroni, M., Nobile, F., Leyland, P.: A Continuation Multi Level Monte Carlo (C-MLMC) method for uncertainty quantification in compressible inviscid aerodynamics. Comput. Methods Appl. Mech. Engrg. 326, 20–50 (2017). https://doi.org/10.1016/j.cma.2017.07.030 21. Pons-Prats, J., Bugeda, G.: Multi-level Monte Carlo Method, pp. 291–304. Springer International Publishing, Cham (2019). https://doi.org/10.1007/978-3-319-77767-2_18 22. Rockafellar, R.T., Royset, J.O.: On buffered failure probability in design and optimization of structures. Reliab. Eng. Syst. Saf. 95(5), 499–510 (2010). https://doi.org/10.1016/j.ress.2010. 01.001 23. Rockafellar, R.T., Royset, J.O.: Engineering decisions under risk averseness. ASCE-ASME J. Risk Uncertain. Eng. Syst. Part A Civil Eng. 1(2), 04015003 (2015). https://doi.org/10.1061/ AJRUA6.0000816 24. Tejedor, E., Becerra, Y., Alomar, G., Queralt, A., Badia, R.M., Torres, J., Cortes, T., Labarta, J.: PyCOMPSs: parallel computational workflows in Python. Int. J. High Perform. C. 31(1), 66–82 (2017). https://doi.org/10.1177/1094342015594678

Multi-Objective Optimal Design and Maintenance for Systems Based on Calendar Times Using MOEA/D-DE

A. Cacereño, D. Greiner, and B. Galván

1 Introduction

Reliability (R(t)) is defined as the probability of failure-free operation under particular conditions during a certain period of time [1]. This definition leads to an interest in the time to failure, which is a continuous random variable that can be represented by a continuous probability distribution. Availability (A(t)) can be defined as the fraction of the total time in which devices or systems are able to perform their required function [2]. The main difference between these two concepts is that availability is used for repairable devices or systems [3], because it describes the process of their failures and recoveries. Whereas for reliability the quantity of interest is the time to failure, for availability it encompasses not only the time to failure but also the time to repair. Redundancy is used to improve systems' reliability and availability and also to reduce the costs of maintenance and device failures [4]. A redundancy is a component added to a subsystem of a series–parallel configuration in order to increase the number of alternative paths [5], so including redundant devices implies modifying the system design. On the other hand, an overall improvement of system reliability and/or availability is possible through preventive maintenance [6]. When a continuous operation system is not available due to a failure or a maintenance

A. Cacereño is a recipient of a contract from the Program of Training for Predoctoral Research Staff of University of Las Palmas de Gran Canaria. The authors are grateful for the support. A. Cacereño () · D. Greiner · B. Galván Instituto Universitario de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI), Universidad de Las Palmas de Gran Canaria (ULPGC), Campus Universitario de Tafira Baja, Las Palmas de Gran Canaria, Spain e-mail: [email protected]; [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Vasile, D. Quagliarella (eds.), Advances in Uncertainty Quantification and Optimization Under Uncertainty with Aerospace Applications, Space Technology Proceedings 8, https://doi.org/10.1007/978-3-030-80542-5_5


activity, it enters an unproductive phase. If a preventive maintenance activity is performed, this unproductive phase is more controlled than when repairs have to be performed because of a failure. For all the above, the availability of repairable systems can be improved if their design and their maintenance strategy are addressed together. The employment of optimization methods is suitable when complex problems have to be solved, especially when the number of potential solutions is high and finding the best solution is difficult; nevertheless, it is possible to obtain sufficiently good solutions [7]. Evolutionary Optimization Algorithms have been used to solve engineering problems [8], and specifically problems in the reliability field [9]. Typical reliability optimization problems consider conflicting objectives such as reliability/availability (to be maximized) and cost (to be minimized). Moreover, when a new system is designed, discrete-event simulation arises as a powerful modelling technique that makes it possible to analyze complex systems with a representation of their behaviour that is closer to reality. The present study covers the joint optimization of the design and the preventive maintenance strategy of technical systems using evolutionary optimization strategies coupled with discrete-event simulation. We use the MOEA/D-DE [10] method, and we compare the performance of different configurations of the algorithm. This chapter is organized as follows. Section 2 summarizes the methodology. Section 3 presents the application case. In Sect. 4, the results are shown and discussed, and finally, Sect. 5 introduces the conclusions.

2 Methodology and Description of the Proposed Model

2.1 Extracting Availability and Economic Cost from Functionability Profiles

The system availability can be characterized by its functionability profile. The concept of the functionability profile was introduced by Knezevic [11] and is defined as the inherent capacity of systems to achieve the required function under specific features when they are used as specified. From the functionability profile's point of view, the states of a repairable system fluctuate between operation and recovery over the mission time. The shape of these fluctuations is called the functionability profile because it shows the states over the mission time. The system operates until a failure or a preventive maintenance activity occurs, so the system stops due to corrective or preventive maintenance, respectively. An example of a functionability profile (for a system or device) is shown in Fig. 1; it depends on the operation times (tf1, tf2, ..., tfn) and the recovery times (tr1, tr2, ..., trn). When the functionability profile of a continuous operation system is set to logical 1, the device is considered to be operating. Conversely, when the functionability profile is set to logical 0, the device is considered to be stopped (either it is being maintained or it is being repaired after a failure).


Fig. 1 Functionability profile of a device (or system)

As previously mentioned, availability is tightly related to functionability profiles because it is characterized through the relationship between the system operation times and the total mission time, which includes the operation and recovery times. Therefore, it is possible to evaluate the instantaneous system availability (A) at the end of the mission time by using Eq. (1), which is the objective function to maximize:

A = \frac{\sum_{i=1}^{n} t_{fi}}{\sum_{i=1}^{n} t_{fi} + \sum_{j=1}^{m} t_{rj}} ,    (1)

where n is the total number of operation times, t_{fi} is the i-th operation time in hours (time to failure or time to preventive maintenance activity), m is the total number of recovery times, and t_{rj} is the j-th recovery time in hours (due to repair or preventive maintenance activity). Equation (1) is an approximation of Eq. (2), proposed by Andrews and Moss [2], who explain that availability is an important measure of the performance of repairable devices, related to the Mean Time To Failure (MTTF) and the Mean Time To Repair (MTTR):

A = \frac{MTTF}{MTTF + MTTR} .    (2)

When systems are operating, earnings are generated in relation to their availability. Conversely, when systems have to be recovered, an economic cost is incurred to return them to the operating state. In this chapter, the economic cost is a variable directly associated with the recovery times, which are related to the corrective and preventive maintenance activities; it is computed by Eq. (3), which is the objective function to minimize:

C = \sum_{i=1}^{q} c_{ci} + \sum_{j=1}^{p} c_{pj} ,    (3)


where C is the system operation cost quantified in economic units, q is the total number of corrective maintenance activities, c_{ci} is the cost due to the i-th corrective maintenance activity, p is the total number of preventive maintenance activities, and c_{pj} is the cost due to the j-th preventive maintenance activity. Maintenance activity costs depend on the respective fixed quantities per hour (corrective and preventive), so the global cost is directly related to the recovery times. Preventive maintenance activities are scheduled shutdowns, so their recovery times will be shorter and more economical than recovery times due to corrective maintenance activities.
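A minimal sketch of Eqs. (1) and (3): given the operation and recovery periods extracted from a simulated functionability profile, the availability and the operation cost follow directly. The times and the hourly costs below are illustrative values only, not those of the application case.

```python
def availability(operation_times, recovery_times):
    """Eq. (1): fraction of the mission time spent operating."""
    up, down = sum(operation_times), sum(recovery_times)
    return up / (up + down)

def operation_cost(corrective_hours, preventive_hours, cc_per_hour, cp_per_hour):
    """Eq. (3): cost of all corrective plus preventive recovery periods."""
    return (sum(h * cc_per_hour for h in corrective_hours)
            + sum(h * cp_per_hour for h in preventive_hours))

# Hypothetical profile: three operation periods, one repair and two maintenances.
tf = [4000, 3500, 2300]        # operation times [h]
tr_corrective = [36]           # repair durations [h]
tr_preventive = [12, 12]       # preventive maintenance durations [h]

A = availability(tf, tr_corrective + tr_preventive)
C = operation_cost(tr_corrective, tr_preventive, cc_per_hour=100.0, cp_per_hour=25.0)
print(f"A = {A:.5f}, C = {C:.1f} economic units")
```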

2.2 Multi-Objective Optimization Approach

The optimization method used in this chapter belongs to the Evolutionary Algorithms paradigm. It uses a population of individuals of a specific size. Each individual is a multidimensional vector, called a chromosome, representing a possible candidate solution to the problem, while the vector components are called genes or decision variables. Extended information on Evolutionary Optimization Algorithms can be found, e.g., in Ref. [7]. Under our approach, each individual of the population consists of a string of real numbers in which the system design alternatives and the periodic times to start a preventive maintenance activity for each device included in the system design are codified, as detailed later in the application case. We explore the capability of the standard solver MOEA/D [12], one of the state-of-the-art evolutionary multi-objective algorithms belonging to the selection-by-decomposition paradigm, with the Differential Evolution operator [10] to create new individuals. In the future, we intend to extend the study to cover methods based on other selection criteria, such as Pareto dominance or the Hypervolume indicator, e.g., NSGA-II [13] or SMS-EMOA [14], respectively.

2.3 Building Functionability Profiles

To optimize the system design and the preventive maintenance strategy, it is necessary to characterize both the system availability and the cost from the system functionability profile. The system functionability profile is built from the functionability profiles of the system devices, which are built by using discrete-event simulation. For this purpose, information about how to characterize the operation times to failure (TF) and the recovery times after failure (TR) is needed; these are related to the parameters of their probability density functions. The functionability profiles of the system devices are built by generating random times drawn from the respective probability density functions, both for the operating times (TF) and the recovery times (TR). To modify the functionability profiles according to the preventive maintenance activities, the operation times to preventive maintenance (TP) have to be used. This


Fig. 2 Building functionability profiles

information is supplied through each solution proposed by the Multi-Objective Evolutionary Algorithm (each individual of the population), which is used to build the device functionability profiles through discrete-event simulation. Moreover, recovery times due to preventive maintenance activities (TRP) have to be introduced by generating random times between the limits previously fixed. The process, which is shown in Fig. 2, is explained below:


1. The system mission time (life cycle) has to be decided; then the process continues for all devices.
2. The device functionability profile (FP) has to be initialized.
3. The time to start a preventive maintenance activity (TP) proposed by the Multi-Objective Evolutionary Algorithm is extracted from the individual of the population that is being evaluated, and a recovery time to develop the preventive maintenance activity (TRP) is randomly generated between the limits previously fixed.
4. Attending to the failure probability density function related to the device, an operation time to failure (TF) is randomly generated between the limits previously fixed.
5. If TP < TF, the preventive maintenance activity is performed before the failure. In this case, as many logical "ones" as TP units followed by as many logical "zeros" as TRP units are added to the device functionability profile. Each time unit represented in this way (both as a logical "one" and as a logical "zero") is equivalent to 1 h of real time.
6. If TP > TF, the failure occurs before carrying out the preventive maintenance activity. In this case, attending to the repair probability density function related to the device, the recovery time after the failure (TR) is randomly generated between the limits previously fixed. Then, as many logical "ones" as TF units followed by as many logical "zeros" as TR units are added to the device functionability profile. Each time unit represented in this way (both as a logical "one" and as a logical "zero") is equivalent to 1 h of real time.
7. Steps 4 to 6 are repeated until the end of the device mission time.
8. Steps 2 to 7 are repeated until the functionability profiles of all devices have been built.
9. After building all the functionability profiles, the system functionability profile is built according to the logic of the serial (AND) or parallel (OR) arrangement of the system devices.

Once the system functionability profile has been built, the values of the objective functions can be computed by using Eq. (1) (characterizing the system availability in relation to the time in which the system is operating and being recovered) and Eq. (3) (characterizing the system operation cost depending on the cost of the time units spent in corrective or preventive maintenance). A schematic sketch of steps 2–7 for a single device is given below.
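The sketch below implements steps 2–7 for a single device, assuming an exponential distribution for the times to failure and a (truncated-to-positive) normal distribution for the repair times, as in the application case; all parameter values and limits are illustrative placeholders rather than the ones of Table 1.

```python
# Discrete-event construction of one device's functionability profile (hourly 0/1 states).
import numpy as np

def build_profile(mission_time, tp, trp_limits, lam, tr_mu, tr_sigma, rng):
    profile = []
    while len(profile) < mission_time:
        tf = int(rng.exponential(1.0 / lam))            # sampled time to failure [h]
        if tp < tf:                                      # preventive action comes first
            up, down = tp, int(rng.uniform(*trp_limits))             # TP hours up, TRP hours down
        else:                                            # failure comes first
            up, down = tf, max(1, int(rng.normal(tr_mu, tr_sigma)))  # TF hours up, TR hours down
        profile.extend([1] * up + [0] * down)
    return profile[:mission_time]                        # truncate to the mission time

rng = np.random.default_rng(7)
fp = build_profile(mission_time=8760, tp=3000, trp_limits=(4, 12),
                   lam=1e-4, tr_mu=24, tr_sigma=4, rng=rng)
print("device availability =", sum(fp) / len(fp))
```

The system functionability profile is then obtained by combining the device profiles hour by hour according to the serial (AND) or parallel (OR) logic of the design.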

3 The Application Case

The application case consists of optimizing the design and the preventive maintenance strategy of an industrial system with respect to two conflicting objectives, availability and operation cost. The proposed methodology is applied to an industrial fluid injection system, shown in Fig. 3. The system is basically formed by cut valves (Vi) and impulsion pumps (Pi).


Fig. 3 Application case: fluid injection system

The definitions of the data used are:
– Life Cycle. System mission time, expressed in hours
– Corrective Maintenance Cost. Cost of a repair activity to recover the system after a failure, expressed in economic units per hour
– Preventive Maintenance Cost. Cost of a preventive maintenance activity, expressed in relation to the Corrective Maintenance Cost
– Pump TF min. Minimum operation time to failure for a pump without preventive maintenance, expressed in hours
– Pump TF max. Maximum operation time to failure for a pump without preventive maintenance, expressed in hours
– Pump λ. Failure rate for a pump, which follows an exponential failure distribution, expressed in 10⁻⁶ per hour
– Pump TR min. Minimum duration of a corrective maintenance activity for a pump, expressed in hours
– Pump TR max. Maximum duration of a corrective maintenance activity for a pump, expressed in hours
– Pump TR μ. Mean of the normal distribution assumed for the time to repair of a pump, expressed in hours
– Pump TR σ. Standard deviation of the normal distribution assumed for the time to repair of a pump, expressed in hours
– Pump TP min. Minimum operation time before starting a preventive maintenance activity for a pump, expressed in hours
– Pump TP max. Maximum operation time before starting a preventive maintenance activity for a pump, expressed in hours
– Pump TRP min. Minimum time to perform a preventive maintenance activity for a pump, expressed in hours
– Pump TRP max. Maximum time to perform a preventive maintenance activity for a pump, expressed in hours
– Valve TF min. Minimum operation time to failure for a valve without preventive maintenance, expressed in hours
– Valve TF max. Maximum operation time to failure for a valve without preventive maintenance, expressed in hours
– Valve λ. Failure rate for a valve, which follows an exponential failure distribution, expressed in 10⁻⁶ per hour


– Valve TR min. Minimum duration of a corrective maintenance activity for a valve, expressed in hours
– Valve TR max. Maximum duration of a corrective maintenance activity for a valve, expressed in hours
– Valve TR μ. Mean of the normal distribution assumed for the time to repair of a valve, expressed in hours
– Valve TR σ. Standard deviation of the normal distribution assumed for the time to repair of a valve, expressed in hours
– Valve TP min. Minimum operation time before starting a preventive maintenance activity for a valve, expressed in hours
– Valve TP max. Maximum operation time before starting a preventive maintenance activity for a valve, expressed in hours
– Valve TRP min. Minimum time to perform a preventive maintenance activity for a valve, expressed in hours
– Valve TRP max. Maximum time to perform a preventive maintenance activity for a valve, expressed in hours

The values used are shown in Table 1; they were obtained from specific literature [15], expert judgement (based on professional experience), or mathematical relations. The optimization objectives consist of maximizing the system availability and minimizing the operation cost due to the system unproductive phases (both when the system is being repaired and when it is being maintained). To do that:
– For the system devices, the optimum period to perform a preventive maintenance activity has to be established.
– The inclusion of the redundant devices P2 and/or V4 has to be decided by evaluating design alternatives. Including redundant devices will improve the system availability, but it will raise the system operation cost.

From the optimization point of view, Evolutionary Algorithms (EA) use a population of individuals called chromosomes that represent possible solutions to the problem. In our case, the chromosomes are formed by real number strings with 0 as minimum value and 1 as maximum value; they have to be scaled to evaluate the objective functions. They are codified as [B1 B2 T1 T2 T3 T4 T5 T6 T7], where the presence of the redundant devices P2 and V4 is defined by B1 and B2, respectively, and the optimum time to start a preventive maintenance activity for each device is represented by T1 to T7. The parameters used to configure the evolutionary process are shown in Table 2, and they are:
– Population size (N): We use population sizes of 50, 100, and 150 individuals.
– Mutation Probability (PrM): The expected number of genes mutating. The value is set to 1/(number of decision variables) = 1/9 = 0.111 for the application case.
– Mutation Distribution (disM): The distribution index of polynomial mutation. This is set to the typical value of 20 for the present application case.


Table 1 Data set for system devices

Parameter                     Value             Source
Life cycle                    700,800 h         –
Corrective maintenance cost   0.5 units         Expert judgement
Preventive maintenance cost   0.125 units       Expert judgement
Pump TF min                   1 h               Expert judgement
Pump TF max                   70,080 h          Expert judgement
Pump λ                        159.57·10⁻⁶ h⁻¹   OREDA 2009
Pump TR min                   1 h               Expert judgement
Pump TR max                   24.33 h           μ + 4σ
Pump TR μ                     11 h              OREDA 2009
Pump TR σ                     3.33 h            (μ − TR min)/3
Pump TP min                   2920 h            Expert judgement
Pump TP max                   8760 h            Expert judgement
Pump TRP min                  4 h               Expert judgement
Pump TRP max                  8 h               Expert judgement
Valve TF min                  1 h               Expert judgement
Valve TF max                  70,080 h          Expert judgement
Valve λ                       44.61·10⁻⁶ h⁻¹    OREDA 2009
Valve TR min                  1 h               Expert judgement
Valve TR max                  20.83 h           μ + 4σ
Valve TR μ                    9.5 h             OREDA 2009
Valve TR σ                    2.83 h            (μ − TR min)/3
Valve TP min                  8760 h            Expert judgement
Valve TP max                  35,040 h          Expert judgement
Valve TRP min                 1 h               Expert judgement
Valve TRP max                 3 h               Expert judgement

Table 2 Parameters for the optimization process

Method      N              PrM     disM   CR     F               δ     nr
MOEA/D-DE   50, 100, 150   0.111   20     0.90   0.4, 0.5, 0.6   0.9   1

– Crossover Rate (CR): The crossover operator has the function of mixing the genetic information among chromosomes to create new individuals. In Differential Evolution, each gene is crossed (or not) depending on a probability called the Crossover Rate. Typical values for the Crossover Rate lie between 0.1 and 1.0 [7]. For the application case, the Crossover Rate is set to 0.9 because a large CR often speeds convergence [16].
– Scale Factor (F): In Differential Evolution, the mutation operator alters the genes of the chromosome by adding a scaled difference vector, obtained from two chosen chromosomes, to a third chromosome. The difference vector is scaled using the


Scale Factor. Typical values for the Scale Factor lie between 0.4 and 0.9 [7]. For the application case, the values 0.4, 0.5, and 0.6 are tested, as shown in Table 2.
– Probability of choosing parents locally (δ): A typical value used [10, 17, 18] is 0.9.
– Replacement mechanism (nr): The replacement mechanism improves the quality of the population in terms of dominance while maintaining diversity. A high-quality offspring solution could replace most of the current solutions in its neighbourhood [17], which would decrease the diversity. The parameter nr establishes the maximum number of solutions that can be replaced by a high-quality offspring. A proposed empirical rule [18] is to set nr = 0.01 · N, where N is the population size (note that nr has to be an integer value).

Each of the 9 configurations is executed 20 times for statistical purposes, with 10,000,000 evaluations as the stopping criterion. Finally, we use the Hypervolume indicator [19] to measure the performance of the configurations of the method. A summary of its characteristics is given, e.g., in the introduction by Auger et al. [20]. As they explain, the Hypervolume indicator was proposed about a decade ago to compare the performance of multi-objective optimization algorithms. It measures the quality of a set of solutions quantitatively as the "size of the space covered". Moreover, the Hypervolume indicator accounts both for closeness to the Pareto front and for coverage of the Pareto front, implicitly rewarding a uniform spread of the samples. The software platform PlatEMO [21] (programmed in MATLAB) was used to optimize the application case. The Design and Maintenance Strategy Analysis software was developed and implemented within the platform to solve the problem described above.
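As an illustration of how F and CR act inside MOEA/D-DE, the following sketch applies the standard DE/rand/1 differential mutation and binomial crossover to the real-coded chromosome [B1 B2 T1 … T7]. The decoding threshold and scaling range are hypothetical, and the actual PlatEMO operators (including polynomial mutation) may differ in detail.

import numpy as np

def de_variation(parent, p1, p2, p3, F=0.5, CR=0.9, rng=None):
    """DE/rand/1 mutation plus binomial crossover on a real-coded chromosome in [0, 1]."""
    rng = rng or np.random.default_rng()
    mutant = np.clip(p1 + F * (p2 - p3), 0.0, 1.0)    # scaled difference vector
    cross = rng.random(parent.size) < CR               # each gene crossed with probability CR
    cross[rng.integers(parent.size)] = True            # guarantee at least one gene from the mutant
    return np.where(cross, mutant, parent)

def decode(chromosome, tp_min, tp_max):
    """Decode [B1 B2 T1..T7]: booleans for redundancy, TP times scaled to [TP min, TP max]."""
    b1, b2 = chromosome[:2] > 0.5                      # hypothetical 0.5 threshold for redundancy
    tp = tp_min + chromosome[2:] * (tp_max - tp_min)   # scale genes to preventive-maintenance times
    return b1, b2, tp

rng = np.random.default_rng(1)
pop = rng.random((100, 9))                             # N = 100 chromosomes with 9 genes
child = de_variation(pop[0], pop[3], pop[7], pop[42], F=0.5, CR=0.9, rng=rng)
use_p2, use_v4, tp_hours = decode(child, tp_min=2920.0, tp_max=35040.0)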

4 Results and Discussion

Due to the computational demands of the problem, a general purpose calculation cluster was used in the optimization process. The cluster is composed of 28 calculation nodes and one access (front-end) node. Each calculation node consists of 2 Intel Xeon E5645 Westmere-EP processors with 6 cores each and 48 GB of RAM, allowing 336 executions to run at the same time. Each execution of the method consumed an average time of 3886 min (2 days, 16 h and 45 min). The whole optimization process amounts to a sequential time of 699,480 min (approximately 1 year, 4 months, 5 days and 17 h). The relationship between method configurations (where N represents the population size and F the Scale Factor) and identifiers is shown in Table 3. The evolution of the Hypervolume average values (over the 20 executions of each configuration) versus the number of evaluations is shown in Fig. 4a, and a detail of the final evaluations is shown in Fig. 4b.


Table 3 Hypervolume indicator statistical analysis (best values per column in bold type)

Identifier  Configuration        Average   Median    Max.      Min.      St. D.   Rank
ID1         N = 50,  F = 0.4     2.2688    2.2690    2.2999    2.2462    0.0147   5.850
ID2         N = 100, F = 0.4     2.2768    2.2717    2.3261    2.2508    0.0162   4.499
ID3         N = 150, F = 0.4     2.2752    2.2771    2.3030    2.2546    0.0138   4.649
ID4         N = 50,  F = 0.5     2.2677    2.2652    2.3098    2.2474    0.0160   6.050
ID5         N = 100, F = 0.5     2.2828    2.2857    2.2364    2.2554    0.0157   3.600
ID6         N = 150, F = 0.5     2.2756    2.2759    2.3152    2.2469    0.0157   4.700
ID7         N = 50,  F = 0.6     2.2657    2.2644    2.3041    2.2307    0.0143   6.500
ID8         N = 100, F = 0.6     2.2760    2.2762    2.3110    2.2449    0.0202   4.899
ID9         N = 150, F = 0.6     2.2759    2.2752    2.3062    2.2395    0.0159   4.300

The configuration with identifier ID5 (a population of 100 individuals and an F parameter of 0.5) presents the highest Hypervolume average value at the end of the process. Box plots of the distribution of the Hypervolume values at the end of the process are shown in Fig. 5. The configuration with identifier ID5 (a population of 100 individuals and an F parameter of 0.5) presents both the highest Hypervolume median value and the highest Hypervolume minimum value; the configuration with identifier ID2 (a population of 100 individuals and an F parameter of 0.4) presents the highest Hypervolume maximum value; and the configuration with identifier ID3 (a population of 150 individuals and an F parameter of 0.4) presents the lowest standard deviation. The detailed values of the Average, Median, Minimum, Maximum, and Standard Deviation of the Hypervolume indicator are shown in Table 3. In order to establish whether one of the nine configurations performs better than any other, a statistical significance hypothesis test was conducted [22]. The average ranks computed through Friedman's test are shown in Table 3. The configuration with identifier ID5 (a population of 100 individuals and an F parameter of 0.5) presents the best average rank. On the other hand, the procedure provides a p-value of 0.0159, which implies that the null hypothesis (H0) can be rejected (p-value < 0.05).

[…]

N > N0 implies rN > 1. Moreover, rN converges toward Δtprop,real/Δteval when N → ∞. Therefore, for large sample sizes, TDA-based Monte-Carlo has a cheaper computational cost. This trend can be observed in Table 1: the three parameters Δtprop,real, Δtprop,poly, and Δteval are set to typical values, which are respectively 10⁻² s, 1 s, and 10⁻³ s.

Table 1 Values of rN for several values of N

N       rN
10      9.9·10⁻²
10²     9.1·10⁻¹
10³     5.0
10⁴     9.1
10⁵     9.9
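For illustration, a minimal sketch reproducing the rN values of Table 1 is given below. The cost model rN = (N · Δtprop,real)/(Δtprop,poly + N · Δteval) is inferred from the tabulated values and the quoted timings, so it should be read as an assumption rather than the authors' exact expression.

def r_n(n, dt_prop_real=1e-2, dt_prop_poly=1.0, dt_eval=1e-3):
    """Cost ratio between N real propagations and one TDA propagation plus N polynomial evaluations.

    Assumed model: r_N = (N * dt_prop_real) / (dt_prop_poly + N * dt_eval),
    which reproduces Table 1 for the typical timings quoted in the text.
    """
    return (n * dt_prop_real) / (dt_prop_poly + n * dt_eval)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"N = {n:>6d}  r_N = {r_n(n):.2g}")
# r_N tends to dt_prop_real / dt_eval = 10 as N grows, so TDA wins for large samples.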


Table 2 Values of k r7 for various TDA orders

Order k   k r7
5         0.43
8         0.26
12        0.17
15        0.13

Modelling SRP Uncertainties  Since the variables of the polynomials are used to evaluate the propagation of uncertainties, at least six variables are needed in order to capture uncertainties on the state vector. Moreover, uncertainties on the SRP are crucial in the case of long propagations, and they need to be modelled as well. To minimize computation time, all the uncertainties on the SRP are represented by only one variable instead of three for S, CR, and m:

[\vec{\gamma}_{SRP}] = K_{SRP} \cdot [C_R] \cdot \frac{S}{m} \cdot \frac{\vec{r}}{r^{3}}   (11)

with

\frac{\delta C_R}{C_R} = \sqrt{\left(\frac{\delta C_{SRP}}{C_{SRP}}\right)^{2} + \left(\frac{\delta S}{S}\right)^{2} + \left(\frac{\delta m}{m}\right)^{2}}   (12)



Since the dimension of the algebra kDv of order k in v variables is the binomial coefficient \binom{k+v}{v}, the dimension of the algebra modelling the uncertainties on the SRP for each source of error (v + 2 variables) can be compared with the dimension of the simplified algebra kDv with fewer variables. The ratio k rv = dim(kDv)/dim(kDv+2) is used to compare these two algebras:

{}^{k}r_{v} = \frac{1}{\left(1+\frac{k}{v+1}\right)\left(1+\frac{k}{v+2}\right)}

[…]

The probability pR(t) of Snoopy's presence in a sphere S(R), with R > 0 the radius, centered on the Earth, will be estimated at each time step t. The expression of pR(t) is

p_{R}(t) = \int_{\mathbb{R}^{3}} \mathbf{1}_{S(R)}\!\left(x_{Earth}\right) f(x)\, dx   (17)

Depending on the time t and the radius R, it becomes possible to isolate windows of reentry for Snoopy. Based on expression (16), the Monte-Carlo estimator of pR(t) is

\hat{p}_{R}^{N}(t) = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}_{S(R)}\!\left(X_{i}^{Earth}\right)   (18)

The law of large numbers ensures that

\hat{p}_{R}^{N}(t) \longrightarrow \mathbb{E}\!\left[\mathbf{1}_{S(R)}\!\left(X_{1}^{Earth}\right)\right] = p_{R}(t)   (19)

Furthermore, it is possible to estimate the relative error made by the Monte-Carlo estimator with the following sequence, see [16]: $ N (t) εR

=

N (t)) V ar(pˆ R N (t) pˆ R

$ =

1 N (t) pˆ R



−1

N

→0

(20)

In other words, the estimator of Eq. (20) offers a way to compute the error made about its estimation, based on the size of the sample N and on the estimation itself. N (t) = 0, the error estimator is not defined, but the confidence interval However, if pˆ R N (t) = 0 is, according to Hanley [10], at 99.9% for N = 2.5.104 of pˆ R     6.9 0, = 0, 2.76.10−4 N

(21)


It is now possible to estimate the probability of Snoopy’s presence and the associated error in all situations.
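A compact sketch of this estimation step is shown below: it computes the presence probability of Eq. (18), the relative error of Eq. (20), and falls back to the Hanley-style interval of Eq. (21) when no sample falls inside the sphere. The array of Earth-relative positions is a placeholder for the Monte-Carlo samples produced by evaluating the Taylor polynomials.

import numpy as np

R_SOI = 9.25e5  # km, radius of the Earth's sphere of influence quoted in the text

def presence_probability(r_earth, radius):
    """Estimate p_R(t) and its relative error from N Earth-relative position samples (N x 3, km)."""
    inside = np.linalg.norm(r_earth, axis=1) <= radius   # indicator 1_S(R)(X_i^Earth), Eq. (18)
    n = inside.size
    p_hat = inside.mean()
    if p_hat > 0.0:
        rel_err = np.sqrt((1.0 / p_hat - 1.0) / n)       # Eq. (20)
        return p_hat, rel_err, None
    return 0.0, None, (0.0, 6.9 / n)                     # 99.9% interval when p_hat = 0, Eq. (21)

# Hypothetical samples standing in for polynomial-evaluated trajectories at one epoch
rng = np.random.default_rng(2)
samples = rng.normal(scale=5 * R_SOI, size=(25_000, 3))
p_hat, rel_err, zero_interval = presence_probability(samples, 11 * R_SOI)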

4 Results and Discussion

The methodology developed in Sect. 3 will now be applied to the case of Snoopy. A numerical analysis will be performed before computing Snoopy's trajectory in the TDA. Then, the uncertainties on Snoopy's state vector and SRP will be estimated. Finally, the probability of Snoopy's presence in the Earth's SOI will be computed.

4.1 Performing Numerical Analysis on the Trajectory of Snoopy

Integration Step for Snoopy  To find the right integration step, the method described in Sect. 3.3 is used. The reference step is href = 1500 s, and the results of this analysis are shown in Fig. 4. Since the local precision of the integration core has an order of magnitude of 10⁻¹³, there is no need for an integration step delivering a greater precision than the one guaranteed by DOP853. This is why the integration step for Snoopy is set to h = 2 days = 172,800 s, so that the magnitude of the mean global relative error is similar to the local integration error.

Fig. 4 Numerical analysis of the impact of the integration step on the integration error

Table 3 Evaluating the sensitivity of Snoopy's trajectory

Variable  Mean relative error (perturbation = σ)  Mean relative error (perturbation = 10⁻⁵ y₀)
x         2.2·10⁻¹                                8.4·10⁻²
y         3.5                                     4.3
z         5.7·10⁻¹                                6.8·10⁻¹
ẋ         5.3                                     7.4
ẏ         4.7                                     1.9
ż         8.2·10⁻¹                                3.4·10⁻¹
CR        6.7·10⁻¹                                2.3·10⁻⁵

Table 4 Initial conditions of Snoopy

Coordinate  Value
x           −5.981273207875668·10⁷ km
y           −1.281441253471675·10⁸ km
z           −5.559141789418507·10⁷ km
ẋ           2.521242690627454·10¹ km·s⁻¹
ẏ           −1.202240051772716·10¹ km·s⁻¹
ż           −5.308019902031308 km·s⁻¹

Sensitivity Analysis on Snoopy's Trajectory  A first propagation is carried out to evaluate the sensitivity of the trajectory with respect to each variable, as explained in Sect. 3.3; the results are reported in Table 3. The SRP has a very low effect on the trajectory for variations of classic magnitude (10⁻⁵). However, since the value of CR is not known, its large uncertainty causes a strong dependency of the trajectory on CR. Furthermore, it appears that the impact of y, ẋ, and ẏ on the trajectory is important compared to the other variables. It means that a poor approximation of one of these three variables will have more consequences on the overall approximation than it would for x, z, and ż.

4.2 Computing Snoopy's Trajectory

In order to propagate Snoopy's trajectory, the same set of initial conditions as those found in L. Villanueva Rourera's study was used, see [18]. These are centered on the Solar System barycenter in J2000 on 1969 May 28 00:00:00 TDB (Temps Dynamique Barycentrique), see Table 4. The number of digits of these coordinates is the same as in L. Villanueva Rourera's work, which is the maximum accuracy available for double precision floats. Snoopy's trajectory was then computed with Python 3.7 running on Intel Xeon Gold 6126 CPUs at 2.6 GHz with the integration parameters of Table 5, while the parameters used to model the dynamics are listed in Table 6.


Table 5 Integration parameters for Snoopy with DOP853

Parameter                     Value
Start date                    1969 May 28 00:00:00
End date                      2016 Jan 01 00:00:00
Step                          hSnoopy = 172,800 s
Absolute tolerance position   10⁻⁷ km
Absolute tolerance velocity   10⁻¹³ km·s⁻¹
Relative tolerance            10⁻¹³
TDA order                     5

Table 6 Dynamical parameters of Snoopy

Parameter      Value
Point masses   Sun, Mercury barycenter, Venus barycenter, Earth, Moon, Mars barycenter, Jupiter barycenter, Saturn barycenter, Uranus barycenter, Neptune barycenter, Pluto barycenter
SRP            True
mSnoopy        3351.032 kg
CSRP           1
S              12.56637 m²

Computing the distance between Snoopy and the Earth is the main goal of this propagation, see Fig. 5. This trajectory corresponds to the one displayed in Fig. 1, which validates the propagator. The main potential reentry window occurs during the third approach of Snoopy to the Earth, between 3.5·10⁸ s and 4.05·10⁸ s past J2000, see Fig. 6. Indeed, Snoopy approaches the Earth closely enough to be contained in a sphere S(nRSOI) centered on the Earth, with RSOI = 9.25·10⁵ km and n < 15. This is the main zone of investigation for a potential reentry of Snoopy, and the aim will be to verify whether the uncertainties on Snoopy's initial state vector and on the SRP exerted on it can deliver a potential window of reentry in the Earth's SOI.

4.3 Estimating the Probability of Snoopy's Presence

Evaluating Initial Uncertainties  Following Algorithm 1, the behavior of the Cartesian uncertainties is observed using histograms, see Fig. 7. Based on histograms such as Fig. 7, the Cartesian uncertainties follow distributions that are modelled as normal distributions; their empirical means and standard deviations are reported in Table 7. Choosing the uncertainties on the SRP is much more arbitrary than for the state vector; these values are given in Table 8.


Fig. 5 Distance from Snoopy to the Earth

Fig. 6 Potential window of Snoopy’s reentry in the Earth’s SOI


Fig. 7 Uncertainties along the y-axis

Table 7 Mean and standard deviation of the empirical distribution of the uncertainties on the initial conditions of Snoopy

Axis          Mean           Standard deviation
δx (km)       7.5523·10¹     1.6239·10³
δy (km)       −2.1030·10¹    1.3060·10³
δz (km)       9.5244         4.6176·10²
δẋ (km·s⁻¹)   7.5406·10⁻⁵    1.6535·10⁻⁴
δẏ (km·s⁻¹)   6.1055·10⁻⁶    2.8622·10⁻⁴
δż (km·s⁻¹)   4.4094·10⁻⁵    1.2574·10⁻⁴

Table 8 Uncertainties on SRP parameters

Parameter        Value
δm/m             2.5·10⁻¹
δS/S             10⁻¹
δCSRP/CSRP       10⁻¹

According to Eq. (12), the following expression delivers the uncertainty on the SRP:

\frac{\delta C_R}{C_R} = 0.29   (22)
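For completeness, the value of Eq. (22) follows directly from inserting the entries of Table 8 into Eq. (12):

\frac{\delta C_R}{C_R} = \sqrt{\left(2.5\cdot10^{-1}\right)^{2} + \left(10^{-1}\right)^{2} + \left(10^{-1}\right)^{2}} = \sqrt{0.0825} \approx 0.29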

This value is higher than the relative uncertainties on the state vector (≈ 10⁻⁶), but it reflects the fact that the parameters governing the SRP are hard to evaluate for a spacecraft that has remained in space for a long time and has been subject to many phenomena. Indeed, debris may have struck Snoopy at any moment in its lifetime, solar radiation may have changed the reflectivity coefficient over time, and the exposed surface is not always the same. Being conservative on the uncertainty of these three parameters ensures that no potential scenario is excluded.


However, taking dynamical changes in the SRP parameters into account through uncertainties on a static parameter is a strong hypothesis. Alternatively, it could prove interesting to introduce one additional variable to model these phenomena with a dynamic law. Unlike expression (11), two variables [CR1] and [CR2] could be used instead of only [CR]. The acceleration would then be

[\vec{\gamma}_{R}] = K_{SRP} \cdot g\!\left(t, \vec{r}, [C_{R1}], [C_{R2}]\right) \cdot \frac{S}{m} \cdot \frac{\vec{r}}{r^{3}}   (23)

This modification would make it possible to consider a large variety of scenarios g depending on the time or on the state vector itself, and it could be generalized to all parameters for an arbitrarily large number of variables. This idea will be the subject of future work.

Probability of Snoopy's Presence  The probability of Snoopy's presence in several spheres around the Earth was computed. These spheres have radii that are multiples of the radius of the Earth's sphere of influence. These values are evaluated on the whole window of potential reentry highlighted in Fig. 6, and the normalized trajectory is represented by dashed lines in the following figures. Error bars, computed thanks to Eqs. (20) and (21), are displayed in Fig. 8 but are too small to be observed in practice. Snoopy enters S(11RSOI) almost surely and may enter S(10RSOI), but it never enters S(9RSOI) or any smaller sphere.

5 Conclusions and Future Work

In this chapter, a methodology to compute a large number of possible trajectories for a spacecraft was presented. This method implements Taylor Differential Algebra. Moreover, the generation of these trajectories is faster than with a classic propagator. The ability to generate a large number of trajectories makes it possible to perform Monte-Carlo estimations with a high degree of precision. This methodology was applied to the mysterious case of Snoopy, the lost lunar module of the Apollo 10 mission. The generated trajectories were used to estimate the probability of Snoopy's presence in small spheres centered on the Earth. This criterion makes it possible to determine whether or not Snoopy reentered the Earth's atmosphere. In this study, the model developed for the Solar Radiation Pressure yielded significant performance gains, as all the uncertainties on the SRP are modelled by a single variable. Moreover, the sensitivity analysis of Snoopy's polynomial trajectory highlighted the decisive role of the uncertainties on y, ẋ, ẏ, and CR. It means that a poor Taylor approximation along one of these dimensions has more consequences than along the three other dimensions.


Fig. 8 Probability of Snoopy’s presence near the Earth

Finally, it is clear that Snoopy makes remarkably close approaches to the Earth. Nevertheless, there is as yet no statistical evidence that it will enter the Earth's sphere of influence during the expected window. Therefore, a reentry of Snoopy into the Earth's atmosphere is unlikely. This must be nuanced by the fact that there is no way to determine whether the probability of Snoopy's presence in the Earth's SOI is 0 because of poor Taylor approximations, or whether this result would still be obtained by large Monte-Carlo estimations without the use of TDA. Future work on this code will be dedicated to switching from a full Python architecture to a Python interface to a compiled language, to increase performance dramatically. Indeed, while Python is flexible and makes the prototyping of a tool very simple, its versatility causes the code to be less efficient than C++ code. The expected performance will be used to perform domain splitting, in order to reduce the approximation error considerably. However, since domain splitting requires propagating several trajectories in parallel, it is very time-consuming to split the domain at a large scale. Nevertheless, the dimensions along which to perform domain splitting will be chosen thanks to the sensitivity analysis conducted in this chapter. Finally, propagating the trajectory of WT1190F could also be interesting, in order to perform combined estimations with Snoopy's trajectory.


Acknowledgments The authors would like to thank Denis Hautesserres (CNES) for his expertise in numerical analysis and numerical integration and for the very interesting discussions on the case of Snoopy. Moreover, the authors would also like to thank Lydia Villanueva Rourera and Paolo Guardabasso (ISAE-SUPAERO) for their work on the trajectory of Snoopy.

References

1. Acciarini, G., Grecoy, C., Vasile, M.: On the solution of the Fokker-Planck equation without diffusion for uncertainty propagation in orbital dynamics. In: 2020 AAS/AIAA Astrodynamics Specialist Conference (2020)
2. Adamo, D.R.: Earth departure trajectory reconstruction of Apollo program components undergoing disposal in interplanetary space (2012). http://www.aiaahouston.org
3. Armellin, R., Di Lizia, P., Bernelli-Zazzera, F., Berz, M.: Asteroid Close Encounters Characterization Using Differential Algebra: The Case of Apophis. Springer (2010)
4. Berz, M.: Modern Map Methods in Particle Beam Physics. Academic Press (1999)
5. Bignon, E., Mercier, P., Azzopardi, V., Pinéde, R.: Accurate numerical orbit propagation using polynomial algebra computational engine PACE. In: ISSFD 2015 Congress (2015)
6. Farnocchia, D., Chesley, S.R., Micheli, M., Delamere, A., Heyd, R.S., Tholen, D.J., Giorgini, J.D., Owen, W.M., Tamppari, L.K.: High precision comet trajectory estimates: the Mars flyby of C/2013 A1 (Siding Spring). Icarus (2016)
7. Folkner, W.M., Williams, J.G., Boggs, D.H., Park, R.S., Kuchynka, P.: The planetary and lunar ephemerides DE430 and DE431. Tech. rep., Jet Propulsion Laboratory, California Institute of Technology (2014)
8. Georgevic, R.M.: Mathematical model of the solar radiation force and torques acting on the components of a spacecraft. Tech. rep., Jet Propulsion Laboratory (1971)
9. Hairer, E., Nørsett, S.P., Wanner, G.: Solving Ordinary Differential Equations I: Nonstiff Problems. Springer (1993)
10. Hanley, J.A., Lippman-Hand, A.: If nothing goes wrong, is everything all right? Interpreting zero numerators. J. Am. Med. Assoc. 249(13), 1743–1745 (1983)
11. Hautesserres, D., Villanueva Rourera, L., Guardabasso, P.: Research of the history of WT1190F and that of Snoopy. Tech. rep., Centre National d'Etudes Spatiales (CNES) and Institut Supérieur de l'Aéronautique et de l'Espace (ISAE-SUPAERO) (2020)
12. Izzo, D., Biscani, F.: darioizzo/audi: Multiple Precision Differential Algebra (May 2018). https://doi.org/10.5281/zenodo.1253326
13. Massari, M., Di Lizia, P., Rasotto, M.: Nonlinear Uncertainty Propagation in Astrodynamics Using Differential Algebra and Graphics Processing Units. American Institute of Aeronautics and Astronautics (2017)
14. McKay, M.D., Beckman, R.J., Conover, W.J.: A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code. American Statistical Association and American Society for Quality (1979)
15. Ortega Absil, C., Serra, R., Riccardi, A., Vasile, M.: De-orbiting and re-entry analysis with generalised intrusive polynomial expansions. In: 67th International Astronautical Congress (2016)
16. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods. Springer (2004)
17. Vasile, M., Ortega Absil, C., Riccardi, A.: Set propagation in dynamical systems with generalised polynomial algebra and its computational complexity. Commun. Nonlinear Sci. Numer. Simul. 75, 22–49 (2019)


18. Villanueva Rourera, L., Lizy-Destrez, S., Guardabasso, P.: Snoopy's trajectory - debris identification. Tech. rep., Institut Supérieur de l'Aéronautique et de l'Espace (ISAE-SUPAERO) (2020)
19. Wittig, A., Di Lizia, P., Armellin, R., Makino, K., Bernelli-Zazzera, F., Berz, M.: Propagation of large uncertainty sets in orbital dynamics by automatic domain splitting. Celest. Mech. Dyn. Astr., Springer (2015)

Part II

Imprecise Probability, Theory and Applications (IP)

Robust Particle Filter for Space Navigation Under Epistemic Uncertainty

Cristian Greco and Massimiliano Vasile

1 Introduction

State estimation addresses the problem of estimating the state of a system given uncertain knowledge of its dynamical equations and noisy indirect observations. Bayesian approaches require the definition of a single precise uncertainty distribution for the prior, the likelihood and the dynamical transition model. The output of this precise inference is highly sensitive to the specification of the input distributions, see e.g. [1]. However, in complex applications, it may be impossible to correctly quantify precise uncertainty measures. Indeed, the specification of a precise distribution would require a perfect knowledge of all the factors that concur to the definition of such an uncertainty and an abundance of data (e.g. knowledge of the dynamics, full sensor characterisation, complete information on the source and quality of the measurements, etc.). This issue is particularly relevant in satellite collision prediction and quantification because of the shortage of accurate observations for the high number of objects in Low Earth Orbit (LEO). Nonetheless, the majority of methods for space object tracking relies on the use of parametric distributions and simplifying assumptions, e.g. Gaussianity, linearised dynamics and observation models [2]. Methods tackling more complex distributions and the dynamical system nonlinearities exist, such as the particle filter [3], but they



still employ precise and complete information on the prior, likelihood and transition distributions. A different generalisation of the estimation problem to set-valued distributions can be found in the literature. A robust estimation with p-box uncertainty models is studied in [4]. Set-valued Kalman filters have been developed in [5–7]. Convex polytopes have been used as epistemic sets in robust estimation in [8]. In [9], the authors have studied an approach based on Coherent Lower Previsions using closed convex sets of probabilities as a model of imprecision. Solution approaches are presented for two specific cases: families of Gaussian distributions with interval-valued means and linear-vacuous mixture models. Most of the methods in the literature employ closed convex sets of probability distributions as epistemic sets for prior, measurement and transition uncertainties. This research presents the Robust Particle Filter (RPF), a robust filtering approach for state estimation of dynamical systems under mixed aleatoric and epistemic uncertainty. Epistemic uncertainty [10] is used to model scenarios in which a single precise distribution cannot be specified, but rather a set of distributions is employed. This results in set-valued posterior distributions and therefore interval-valued expectations. In this work, set-valued hyperparameters are used to construct the epistemic sets. Thus, the distribution sets employed in this chapter are more general than the ones employed by most methods in the literature. This set-valued estimation replaces the traditional single point estimate, which is sensitive to the choice of prior and likelihood. This work extends previous research [11] by introducing a formal estimator for the expectation and its bounds, its derivative with respect to the epistemic parameters and a global Branch & Bound (B&B) optimisation to ensure convergence to the global optimum. Furthermore, a confidence interval is provided to robustly quantify the output accuracy. The robust filter is run on a satellite–debris collision scenario to quantify the impact probability. Robust bounds are estimated to show the sensitivity of the collision probability to different specifications of the input distributions.

2 Filtering Under Epistemic Uncertainty

The state-space model for the state estimation problem addressed in this research is a continuous–discrete one [12]. The system state evolves according to a time-continuous ordinary differential equation, whereas indirect observations yk are collected at discrete instances of time. Specifically, the state-space model is formulated as

\dot{x} = f(t, x, d)   (1a)
y_k = h(t_k, x_k, \varepsilon) \quad \text{for } k = 1, \ldots, L   (1b)

In Eq. (1a), x ∈ Rn is the system continuous state, d are static model parameters, f : R × Rn × Rd → Rn represents the functional relationship of the equations of


motion, and t is the independent time variable. If the system's initial condition x(t0) = x0 and the dynamical model parameters d were known perfectly, there would be no need for measurements, as the equations of motion could be (usually numerically) integrated to obtain the system evolution in time. However, in real-life scenarios, uncertainty is always involved in such systems, and measurements are needed to refine the state knowledge at a later time. Hence, Eq. (1b) models the measurement yk received at time tk, h is the generally nonlinear relation between the state and the observation value and ε is a generic noise affecting the measurement realisation. In this chapter, we will consider the case of uncertainty, modelled as a random variable, affecting
– the initial condition X0 ∼ p(x0),
– the static parameters D ∼ p(d), which are nuisance parameters to take into account but should not be estimated explicitly, and
– the measurement realisation Yk ∼ p(yk | xk), as a result of the noise ε, described by a conditional distribution.
Hence, the standard continuous–discrete state-space model in Eq. (1) can be reformulated in a probabilistic fashion [2] to explicitly describe the uncertain nature of the system as

X_0 \sim p(x_0)   (2a)
X_k \sim p(x_k \mid x_{k-1})   (2b)
Y_k \sim p(y_k \mid x_k) \quad \text{for } k = 1, \ldots, L   (2c)

where the transition probability p(xk | xk−1) describes the system probabilistic evolution as a Markov chain, under the Markov assumption, resulting from the uncertainty on D and process noise. The complete solution of the precise filtering problem (2) is the posterior distribution of the state conditional on the previously received observations, which, following a sequential Bayesian approach, is written as

p(x_k \mid y_{1:k}) = \frac{p(y_k \mid x_k)\, p(x_k \mid y_{1:k-1})}{p(y_k \mid y_{1:k-1})}   (3)

where p(xk | y1:k−1) is the prior distribution computed by propagating the previous posterior p(xk−1 | y1:k−1) to tk with the transition model. The full distribution is generally expensive to compute, and its complete knowledge provides greater information than needed for most practical applications. Hence, the filtering problem is typically reduced to the computation of the expectation of a generic function φ with respect to the posterior distribution as

\mathbb{E}_{p}\!\left[\phi(X_k)\right] = \int \phi(x_k)\, p(x_k \mid y_{1:k})\, dx_k   (4)


2.1 Imprecise Formulation

In the case of epistemic uncertainty, the probability distributions are not assumed to be known precisely, but they are specified within parameterised sets of probability measures. Within these imprecise sets of distributions, no judgement is made about their relative likeliness. The epistemic model considered accounts for uncertainty on the parameters of the probability distributions involved. For a generic random variable Z, the epistemic set is defined as

P_Z = \left\{ p(z; \lambda) \mid \lambda \in \Omega_\lambda \right\}   (5)

where Ωλ, the epistemic parameter domain, is a compact subset of R^{nλ}. Under epistemic uncertainty, the probabilistic continuous–discrete filtering problem is stated as

X_0 \sim p(x_0; \lambda_0) \in P_{X_0}   (6a)
X_k \sim p(x_k \mid x_{k-1}; \lambda_{x_k}) \in P_{X_k \mid X_{k-1}}   (6b)
Y_k \sim p(y_k \mid x_k; \lambda_{y_k}) \in P_{Y_k \mid X_k} \quad \text{for } k = 1, \ldots, L   (6c)

where λ0 ∈ Ωλ0, λxk ∈ Ωλxk and λyk ∈ Ωλyk are the epistemic parameters for the initial, transition and likelihood distributions, respectively. The symbol λ = [λ0, λxk, λyk], for k = 1, . . . , L, is used to indicate the collection of the three epistemic parameters, Ωλ the collection of their respective sets and pλ the corresponding densities. The posterior distribution computed by the Bayesian inference depends on the epistemic parameters, p(xk | y1:k; λ), as does the expectation

\mathbb{E}_{p_\lambda}\!\left[\phi(X_k)\right] = \int \phi(x_k)\, p(x_k \mid y_{1:k}; \lambda)\, dx_k   (7)

(8a)

    E φ Xk = max Epλ φ Xk ,

(8b)

λ∈λ λ∈λ

that is, the minimum and maximum of the expectation over the epistemic set. These lower and upper expectations express the tight bounds on the expectation of the quantity of interest as resulting from the imprecise specification of uncertainty.


This formulation allows one to model different independence models [13], e.g. λyk can be kept fixed for different k = 1, . . . , L (repetition independence) or free to change at different observation times (epistemic independence/irrelevance).

2.2 Expectation Estimator

If no specific assumption, or suitable parameterisation, is imposed on the distributions pλ, the expectation has no closed-form solution, and numerical techniques are required. In this chapter, we employ sequential importance sampling (SIS) [2] to construct an estimator θ̂(λ) for the expectation E_{pλ}[φ(Xk)].

Precomputed Sequential Importance Sampling  We employ sequential importance sampling [2] to exploit the sequential nature of the tracking problem (6) and develop a sequential estimator for inference problems in the imprecise setting. The expectation is written by introducing a proposal distribution π as

\mathbb{E}_{p_\lambda}\!\left[\phi(X_k)\right] = \int \phi(x_k)\, \frac{p(x_k \mid y_{1:k}; \lambda)}{\pi(x_k \mid y_{1:k})}\, \pi(x_k \mid y_{1:k})\, dx_k = \int \phi(x_k)\, w(x_k, \lambda)\, \pi(x_k \mid y_{1:k})\, dx_k   (9)

under the condition that π has an equal or larger support than the posterior. The function w(xk, λ) is the ratio between the target distribution and the proposal one, functioning as a weight that measures their deviation. Hence, the expectation can be computed in a Monte Carlo fashion as

\mathbb{E}_{p_\lambda}\!\left[\phi(X_k)\right] \approx \hat{\theta}(\chi_k, \lambda) = \frac{1}{N} \sum_{i=1}^{N} w^{(i)}(\lambda)\, \phi\!\left(x_k^{(i)}\right)   (10)

where χk = {x_k^{(1)}, . . . , x_k^{(N)}} is the collection of samples x_k^{(i)} drawn from the proposal, and w^{(i)}(λ) = w(x_k^{(i)}, λ). In a sequential fashion, the posterior can be decomposed as

p(x_{0:k} \mid y_{1:k}; \lambda) \propto p(y_k \mid x_k; \lambda_y)\, p(x_k \mid x_{k-1}; \lambda_x)\, p(x_{0:k-1} \mid y_{1:k-1}; \lambda)   (11)

with initial condition p(x_{0:0} | y_{1:0}; λ) = p(x0; λ0). Moreover, the proposal distribution should also be chosen as

\pi(x_{0:k} \mid y_{1:k}) = \pi(x_k \mid x_{0:k-1}, y_{1:k})\, \pi(x_{0:k-1} \mid y_{1:k-1})   (12)

with initial condition π(x0:0 | y1:0 ) = π(x0 ). With a proposal in this form, the samples can be drawn sequentially, that is, when the collection χ k at time tk needs


to be drawn, the collection of samples up to tk−1 does not need to be drawn again, and the updated collection of samples can be formed as χ0:k = {χ0:k−1, χk}. This collection describes N trajectories, each with a sequence of particles at discrete times 0, 1, . . . , k. Thus, the total number of saved samples is N(k + 1), which grows linearly with the number of observations. From Eqs. (11) and (12), the SIS weights can be computed in a sequential fashion as

w_k^{(i)}(\lambda) = \frac{p\!\left(y_k \mid x_k^{(i)}; \lambda_y\right) p\!\left(x_k^{(i)} \mid x_{k-1}^{(i)}; \lambda_x\right)}{\pi\!\left(x_k^{(i)} \mid x_{0:k-1}^{(i)}, y_{1:k}\right)}\, \hat{w}_{k-1}^{(i)}(\lambda)   (13a)

\hat{w}_k^{(i)}(\lambda) = \frac{w_k^{(i)}(\lambda)}{\sum_{j=1}^{N} w_k^{(j)}(\lambda)}   (13b)

with \hat{w}_0^{(i)}(\lambda_0) = p\!\left(x_0^{(i)}; \lambda_0\right)/\pi\!\left(x_0^{(i)}\right)/N. The weight update rule in Eq. (13a) does not include the posterior normalisation constant, as it was neglected in Eq. (11); Eq. (13b) is the weight self-normalisation, which addresses this issue. All the quantities that are independent of the epistemic parameter can be precomputed before the optimisation. Once the proposal distributions π have been chosen, the particle sampling χ0:k, the proposal density evaluations and the function φ evaluations can be computed offline. Hence, the self-normalised precomputed Sequential Importance Sampling (pSIS) estimator is

\hat{\theta}(\chi_k, \lambda) = \sum_{i=1}^{N} \hat{w}_k^{(i)}(\lambda)\, \phi\!\left(x_k^{(i)}\right), \quad \text{with } x_k^{(i)} \sim \pi\!\left(x_k \mid x_{0:k-1}^{(i)}, y_{1:k}\right)   (14)

where both the sample generation and the importance weight computation, as in Eq. (13), are processed sequentially. This estimator is asymptotically unbiased, with a bias decreasing with the number of particles as O(1/N) [14].

Estimator Derivatives  The derivatives of the pSIS estimator can be computed analytically. The derivative knowledge is extremely valuable in the bound computation (8), as it enables the use of efficient gradient descent methods to improve the exploitation stage in the optimisation. Furthermore, because the precomputed SIS works on fixed samples χ0:k, there is no noise due to sampling in the derivative information. Let us assume that we can compute the derivatives of the density functions in Eq. (6), that is, we can evaluate

\nabla_{\lambda_0}\, p(x_0; \lambda_0)   (15a)
\nabla_{\lambda_x}\, p(x_k \mid x_{k-1}; \lambda_x)   (15b)
\nabla_{\lambda_y}\, p(y_k \mid x_k; \lambda_y)   (15c)


The quantity to compute is the gradient of the estimator with respect to the epistemic parameters

\nabla_\lambda \hat{\theta} = \begin{bmatrix} \nabla_{\lambda_0} \hat{\theta} \\ \nabla_{\lambda_x} \hat{\theta} \\ \nabla_{\lambda_y} \hat{\theta} \end{bmatrix}   (16)

By the linearity of the derivative operator and the chain rule, the gradient can be computed as

\nabla_\lambda \hat{\theta} = \sum_{i=1}^{N} \phi^{(i)}\, \nabla_\lambda \hat{w}_k^{(i)}   (17)

since the precomputed function evaluations φ^{(i)} = φ(x_{0:k}^{(i)}) are independent of λ. Hence, the estimator gradient is obtained by computing the derivative of the weights. By Eq. (13) and the chain rule again, the weight derivative ∇λ ŵ_k^{(i)} can be computed from the previous weight derivative ∇λ ŵ_{k−1}^{(i)} as

\nabla_\lambda w_k^{(i)} = \begin{bmatrix} 0 \\ p\!\left(y_k \mid x_k^{(i)}; \lambda_y\right) \nabla_{\lambda_x} p\!\left(x_k^{(i)} \mid x_{k-1}^{(i)}; \lambda_x\right) \\ \nabla_{\lambda_y} p\!\left(y_k \mid x_k^{(i)}; \lambda_y\right) p\!\left(x_k^{(i)} \mid x_{k-1}^{(i)}; \lambda_x\right) \end{bmatrix} \frac{\hat{w}_{k-1}^{(i)}}{\pi_k^{(i)}} + \frac{p\!\left(y_k \mid x_k^{(i)}; \lambda_y\right) p\!\left(x_k^{(i)} \mid x_{k-1}^{(i)}; \lambda_x\right)}{\pi_k^{(i)}}\, \nabla_\lambda \hat{w}_{k-1}^{(i)}   (18a)

\nabla_\lambda \hat{w}_k^{(i)} = \sum_{j=1}^{N} \frac{\partial \hat{w}_k^{(i)}}{\partial w_k^{(j)}}\, \nabla_\lambda w_k^{(j)}   (18b)

with

\frac{\partial \hat{w}_k^{(i)}}{\partial w_k^{(j)}} = -\frac{w_k^{(i)}}{\left(\sum_{m=1}^{N} w_k^{(m)}\right)^{2}} + \frac{\delta_{ij}}{\sum_{m=1}^{N} w_k^{(m)}}   (19)

where δij = 1 if i = j, and δij = 0 otherwise. The initial conditions for the weight derivative computation are

\nabla_\lambda w_0^{(i)} = \begin{bmatrix} \nabla_{\lambda_0} p\!\left(x_0^{(i)}; \lambda_0\right)/\pi_0^{(i)} \\ 0 \\ 0 \end{bmatrix}   (20)

because the initial distribution does not depend on the epistemic parameters λx and λy .
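To fix ideas, the following sketch implements the precomputed SIS recursion of Eqs. (13)–(14) for a toy one-dimensional problem: samples and proposal densities are generated once, after which the estimator can be re-evaluated for any epistemic parameter λ at the cost of reweighting only. Distribution shapes, parameter meanings and dimensions are illustrative assumptions, not the chapter's space-navigation models.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
N, L = 5_000, 4                                   # particles and number of observations
y_obs = np.array([0.3, 0.1, -0.2, 0.4])           # synthetic measurements (illustrative)

# Offline stage: draw particle trajectories from the proposal and store densities once.
x = np.empty((L + 1, N))
x[0] = rng.normal(0.0, 1.0, N)                    # proposal for the initial state
pi_dens = [norm.pdf(x[0], 0.0, 1.0)]
for k in range(1, L + 1):
    x[k] = rng.normal(x[k - 1], 0.5)              # bootstrap-like proposal pi(x_k | x_{k-1})
    pi_dens.append(norm.pdf(x[k], x[k - 1], 0.5))
phi = x[L]                                        # quantity of interest phi(x_L) = x_L

def psis_estimate(lam):
    """Re-evaluate Eq. (14) on the precomputed samples; lam = (lam0, lamx, lamy) scale factors."""
    lam0, lamx, lamy = lam
    w_hat = norm.pdf(x[0], 0.0, lam0) / pi_dens[0] / N           # initial weights
    for k in range(1, L + 1):
        num = norm.pdf(x[k], x[k - 1], lamx) * norm.pdf(y_obs[k - 1], x[k], lamy)
        w = num / pi_dens[k] * w_hat                              # Eq. (13a)
        w_hat = w / w.sum()                                       # Eq. (13b), self-normalisation
    return np.sum(w_hat * phi)                                    # Eq. (14)

theta_nominal = psis_estimate((1.0, 0.5, 0.1))
theta_inflated = psis_estimate((1.5, 0.6, 0.15))                  # same samples, different lambda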


2.3 Bound Estimator

The robust estimation method computes the bounds in Eq. (8) by taking advantage of the efficient estimator constructed above. In the following, only the routine estimating the lower bound is discussed, as the same approach holds for the upper bound with appropriate modifications. The lower bound estimator and the epistemic parameter achieving it are denoted as

\underline{\hat{\theta}}(\chi_k) = \min_{\lambda \in \Omega_\lambda} \hat{\theta}(\chi_k, \lambda)   (21a)

\underline{\lambda}(\chi_k) = \arg\min_{\lambda \in \Omega_\lambda} \hat{\theta}(\chi_k, \lambda)   (21b)

that is, the minimum of the expectation estimator and the argument of the minimum given the set of samples χk. By employing the pSIS to evaluate the estimator for a candidate λ, the optimisation process operates on the importance weights w_k^{(i)}(λ) only. Equation (21) involves two challenges: the optimisation problem is generally multi-modal, and, even if the global minimum of the estimator is found, θ̂ could still deviate from the true sought lower bound.

Global Search  The global search for solving the optimisation problem in Eq. (21) is tackled with a B&B approach using simplexes as subdomains and a Lipschitz-based lower bound estimation. The advantage of using B&B is that asymptotic convergence to the global optimum is granted and that an estimate of the distance from it is known at each iteration. The advantage of using simplexes is that the number of branched sub-simplexes is constant with the problem dimension rather than exponential as with other domains. The Lipschitz constant can be estimated using the estimator's analytical derivatives. The algorithm is described in great detail, and its convergence proved, in [15]. Here, a summary of the Branch & Bound rules employed is provided.

Preliminary Definitions  Let S ⊂ R^n be an n-simplex with vertexes [λ0, . . . , λn]. Let L, with 0 < L < ∞, be the Lipschitz constant of θ̂ over Ωλ such that

\left|\hat{\theta}(\chi_k, \lambda_1) - \hat{\theta}(\chi_k, \lambda_2)\right| \leq L \left\|\lambda_1 - \lambda_2\right\| \quad \forall \lambda_1, \lambda_2 \in \Omega_\lambda   (22)

Bounding  For the upper bound of the minimum, the trivial bound is chosen as

ub(S) = \min_{j} \hat{\theta}(\chi_k, \lambda_j)   (23)

Robust Particle Filter for Space Navigation Under Epistemic Uncertainty

139

Because θ̂ is Lipschitz continuous, the lower bounding function lb : S → R can be defined as

lb(\lambda) = \max_{j} \left[ \hat{\theta}(\chi_k, \lambda_j) - L \left\|\lambda - \lambda_j\right\| \right]   (24)

where λj are the vertexes of the simplex. Hence, the lower bound value over the simplex is defined as

lb(S) = \min_{\lambda \in S} lb(\lambda)   (25)

The function θ̂(χk, λj) − L‖λ − λj‖ can be seen as the boundary of a hyper-cone originating from (λj, θ̂(χk, λj)). By construction, the minimum lb(S) is attained either at the common intersection of these n + 1 hyper-cones, call it λ∩, if this intersection is inside the simplex, or at the boundary of the simplex, call it λ∂, if the intersection is outside. In the latter case, the common intersection has a lower function value than the point on the boundary, that is, lb(λ∩) < lb(λ∂) [16], such that lb(λ∩) can be used as a lower bound nevertheless. The hyper-cone intersection can be found analytically as the intersection of n hyper-planes perpendicular to the lines connecting the vertexes λ1, . . . , λn to λ0 [15].

Branching  Longest Edge Bisection (LEB) is adopted as the branching rule. The generic simplex S is split into two simplexes by bisection along its longest edge, say the one connecting the vertices λ*_i and λ*_j, such that

\lambda^*_{ij} = \frac{\lambda^*_i + \lambda^*_j}{2}   (26)

is the new vertex. The two offspring simplexes S^{(1)} and S^{(2)} share the same vertices as S, except that one has λ*_{ij} in place of λ*_i, whereas the other has λ*_{ij} in place of λ*_j. This branching rule eventually creates simplexes of arbitrarily small size, and an upper bound on the number of iterations required can be estimated [15].

Lipschitz Constant Estimation  The Lipschitz constant is estimated as the maximum norm of the estimator gradient evaluated at the vertexes of the simplexes. Specifically, at a generic iteration k, the domain Ωλ is partitioned into disjoint simplexes S_k^{(i)}, such that ∪_i S_k^{(i)} = Ωλ, each with vertexes [λ_0^{(i)}, . . . , λ_n^{(i)}]. The Lipschitz constant is set to

L = \max_{i,j} \left\| \nabla_\lambda \hat{\theta}\!\left(\chi_k, \lambda_j^{(i)}\right) \right\|   (27)


Convergence  The algorithm terminates when the difference between the upper and the lower bound on the most promising simplex S* is below a given threshold δ:

ub(S^*) - lb(S^*) \leq \delta   (28)

This ensures that the global minimum lies within the interval

\underline{\hat{\theta}} \in \left[ lb(S^*),\ ub(S^*) \right]   (29)
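A simplified sketch of this simplex-based Branch & Bound loop is given below. For readability it replaces the exact hyper-cone-intersection bound of Eqs. (24)–(25) with a looser but still valid Lipschitz bound (max over vertexes minus L times the simplex diameter) and takes the Lipschitz constant as a user input rather than estimating it from Eq. (27); it is therefore a conceptual stand-in, not the algorithm of [15].

import itertools
import numpy as np

def simplex_lower_bound(theta_vals, vertices, lip):
    """Looser (but valid) Lipschitz lower bound over a simplex: max_j theta(l_j) - L*diam(S)."""
    diam = max(np.linalg.norm(a - b) for a, b in itertools.combinations(vertices, 2))
    return max(theta_vals) - lip * diam

def branch_longest_edge(vertices):
    """Longest Edge Bisection: split the simplex at the midpoint of its longest edge, Eq. (26)."""
    (i, a), (j, b) = max(itertools.combinations(enumerate(vertices), 2),
                         key=lambda pair: np.linalg.norm(pair[0][1] - pair[1][1]))
    mid = 0.5 * (a + b)
    left = [mid if k == i else v for k, v in enumerate(vertices)]
    right = [mid if k == j else v for k, v in enumerate(vertices)]
    return left, right

def branch_and_bound(theta, vertices, lip, tol=1e-4, max_iter=1_000):
    """Minimise theta over a simplex; returns the incumbent upper bound and its vertex."""
    queue = [list(map(np.asarray, vertices))]
    best_val, best_lam = np.inf, None
    for _ in range(max_iter):
        if not queue:
            break
        simplex = queue.pop(0)
        vals = [theta(v) for v in simplex]
        if min(vals) < best_val:                          # trivial upper bound, Eq. (23)
            best_val, best_lam = min(vals), simplex[int(np.argmin(vals))]
        if simplex_lower_bound(vals, simplex, lip) < best_val - tol:
            queue.extend(branch_longest_edge(simplex))    # keep refining promising simplexes
    return best_val, best_lam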

Bound Approximation  The accuracy of the lower bound estimator can be quantified with a confidence interval. Troffaes [14] proved that an estimator in the form (21) is a coherent lower expectation if θ̂(χk, λ) is a coherent expectation, which is true for the constructed estimator thanks to the self-normalisation of the weights. Confidence bounds can be constructed for the bound estimator [14]. Let χ_k^1, . . . , χ_k^{2N} be 2N sets of samples. Define the quantities

\hat{\theta}^{j} = \begin{cases} \hat{\theta}\!\left(\chi_k^{j}, \underline{\lambda}(\chi_k^{j})\right) & \text{for } j = 1, \ldots, N \\ \hat{\theta}\!\left(\chi_k^{j}, \underline{\lambda}(\chi_k^{j-N})\right) & \text{for } j = N+1, \ldots, 2N \end{cases}   (30)

with λ̲(χk) defined in Eq. (21b). For the first half, these are the classical bound estimators, whereas for the second half they require evaluating θ̂ on a set of samples χ_k^{j} with the epistemic parameter minimising the estimator for another set of samples χ_k^{j−N}. Let μ_χ^{1H} and σ_χ^{1H} be the mean and standard deviation of the first half of θ̂^j, i.e. for j = 1, . . . , N, and μ_χ^{2H} and σ_χ^{2H} those of the second half of θ̂^j, that is, for j = N + 1, . . . , 2N. Then, under the assumption that enough particles have been used to keep the bias small and bounded for all λ, the confidence interval for the confidence level 1 − α is

\left[ \mu_\chi^{1H} - t_{\alpha, N-1} \frac{\sigma_\chi^{1H}}{\sqrt{N}},\ \ \mu_\chi^{2H} + t_{\alpha, N-1} \frac{\sigma_\chi^{2H}}{\sqrt{N}} \right]   (31)

where tα,N −1 is the 1 − α two-sided critical value of the t-distribution with N − 1 degrees of freedom.
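A small sketch of this confidence-interval construction, under the assumption that a bound-estimation routine such as the pSIS-plus-optimiser described above is available as a black box, could look as follows; the two-batch pairing and the use of scipy's Student-t quantile follow Eqs. (30)–(31).

import numpy as np
from scipy.stats import t as student_t

def lower_bound_confidence_interval(sample_sets, estimate_fn, argmin_fn, alpha=0.05):
    """Confidence interval for the lower expectation bound, following Eqs. (30)-(31).

    sample_sets : list of 2N precomputed particle sets
    estimate_fn : estimate_fn(chi, lam) -> value of the pSIS estimator
    argmin_fn   : argmin_fn(chi) -> epistemic parameter minimising the estimator on chi
    """
    n = len(sample_sets) // 2
    lam_min = [argmin_fn(chi) for chi in sample_sets[:n]]
    first_half = np.array([estimate_fn(chi, lam) for chi, lam in zip(sample_sets[:n], lam_min)])
    second_half = np.array([estimate_fn(chi, lam) for chi, lam in zip(sample_sets[n:], lam_min)])
    t_crit = student_t.ppf(1.0 - alpha / 2.0, n - 1)          # two-sided critical value
    lower = first_half.mean() - t_crit * first_half.std(ddof=1) / np.sqrt(n)
    upper = second_half.mean() + t_crit * second_half.std(ddof=1) / np.sqrt(n)
    return lower, upper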

3 Test Case

This section presents the test case used to assess the performance of the developed filter. The scenario considered consists of a piece of debris on a potential collision orbit with a known operational satellite. SOCRATES (Satellite Orbital Conjunction Reports Assessing Threatening Encounters in Space) [17], an online service that provides twice-daily reports on the most likely collision events based on NORAD


Table 1 Spacecraft orbital elements at reference epoch from NORAD TLEs

NORAD ID  EPOCH [UTC]            a [km]    e [−]     i [deg]   Ω [deg]   ω [deg]   θ [deg]
35684     09-Jul-2019 22:50:30   7009.95   4.7e−3    98.23     155.74    194.40    165.72
17296     10-Jul-2019 04:07:40   6992.96   4.0e−3    82.57     40.59     211.66    203.54

two-line elements (TLEs) and the SGP4 propagator, was employed to select two space objects with a small range at their close approach. The orbital elements of these two objects are reported in Table 1. The first object is an operational satellite, NORAD ID 35684, whose ephemerides are assumed to be perfectly known. The second object, NORAD ID 17296, is a non-operational rocket body in orbit since 1987. Thus, the state of the debris is the latent variable xk. The reference Time of Closest Approach (rTCA) is 15 July 2019 at 21:54:40 UTC. We define the final time of our propagation as tF = rTCA, with tL < tF the time of the last observation before the possible collision. The motion of the body is described in Cartesian coordinates, in an Earth-centred inertial reference frame. The dynamical model in Eq. (1a) includes the following components [18]: the Earth's gravitational force derived from the EGM96 geopotential model up to degree and order 4, the atmospheric drag according to the Jacchia–Gill model, the third-body disturbances due to the Moon's and the Sun's gravitational attraction, and the solar radiation pressure with a conical shadow model for the Earth's eclipses. The quantity to compute and bound is the Probability of Collision (PoC). A collision is defined to occur when the minimum distance between the two objects is smaller than a given threshold δDCA. Hence, the collision indicator is expressed as

I_C(x_F) = \begin{cases} 1 & DCA(x_F) \leq \delta_{DCA} \\ 0 & DCA(x_F) > \delta_{DCA} \end{cases}   (32)

where DCA is the function extracting the Distance of Closest Approach (DCA), that is, the minimum of the relative position norm between the debris state realisation and the known operational satellite. To detect the correct DCA for each sample, the relative distance at tF is not used directly, because different state realisations have different Times of Closest Approach (TCA) and therefore the distance at tF is generally not the minimum one. This effect is particularly critical in this scenario because the relative velocity of the two objects is approximately 13.5 km/s, and even very small time differences yield large distance changes. Therefore, the minimum DCA for a realisation xF is obtained by computing the pericentre distance of the relative hyperbolic trajectory between the two objects [19]. From here, the probability of collision is evaluated by computing the expectation of the indicator function

PoC(\lambda) = \int I_C(x_F)\, p(x_F \mid y_{1:L}; \lambda)\, dx_F   (33)


Therefore, the goal is to compute robust bounds on the PoC as

\underline{PoC} = \min_{\lambda \in \Omega_\lambda} PoC(\lambda)   (34a)

\overline{PoC} = \max_{\lambda \in \Omega_\lambda} PoC(\lambda)   (34b)

with a specific interest in the upper bound \overline{PoC}, which represents the epistemic worst-case scenario.
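Given posterior particles and self-normalised weights from the robust filter, the PoC of Eq. (33) reduces to a weighted average of the collision indicator of Eq. (32); a minimal sketch is given below, where the per-particle DCA values are assumed to be precomputed (e.g. from the relative-hyperbola pericentre) and the 500 m threshold anticipates the value used in the results section. The brute-force grid search over epistemic multipliers stands in for the full B&B optimisation.

import numpy as np

def probability_of_collision(dca_km, weights, threshold_km=0.5):
    """Weighted expectation of the collision indicator, Eqs. (32)-(33).

    dca_km  : per-particle Distance of Closest Approach, in km (assumed precomputed)
    weights : self-normalised importance weights for a given epistemic parameter lambda
    """
    indicator = (dca_km <= threshold_km).astype(float)   # I_C(x_F)
    return float(np.dot(weights, indicator))

def poc_bounds(dca_km, weight_fn, lambda_grid):
    """Hypothetical coarse bound search: weight_fn(lam) returns normalised weights for lam."""
    values = [probability_of_collision(dca_km, weight_fn(lam)) for lam in lambda_grid]
    return min(values), max(values)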

3.1 Initial State Uncertainty

The initial distribution is constructed by imposing an uncertainty measure on the debris' TLE in Table 1. The value of such uncertainty is defined by following the guidelines of the European Space Agency (ESA) [20]. For inclinations larger than 60 deg, the 1σ uncertainty in the radial, transversal and normal components of position and velocity is reported in Table 2. A Gaussian parametric family is employed to describe the initial state uncertainty. The covariance matrix Σx0 in inertial coordinates is computed using the Jacobian of the transformation from radial, transversal and normal to Cartesian coordinates. From here, the importance initial distribution is defined as the normal distribution

\pi(x_0) = \mathcal{N}\!\left(x_0;\ \mu_{x_0},\ \Sigma_{x_0}\right)   (35)

where μx0 is the Cartesian state retrieved from the TLE. Epistemic uncertainty is introduced in the probability distribution associated with the initial conditions by using two epistemic parameters. The initial epistemic set is defined as

P_{X_0} = \big\{ p(x_0; \lambda_0) : p(x_0) = \mathcal{N}\!\left(x_0;\ \mu_{x_0},\ \tilde{\Sigma}_{x_0}\right),\ \tilde{\Sigma}_{x_0} = \mathrm{diag}\!\left( \lambda_{x_0\text{-}1}\, \Sigma_{x_0}(1{:}3, 1{:}3),\ \lambda_{x_0\text{-}2}\, \Sigma_{x_0}(4{:}6, 4{:}6) \right),\ \lambda_{x_0\text{-}1} \in [0.33^2, 1.5^2],\ \lambda_{x_0\text{-}2} \in [0.33^2, 1.5^2] \big\}   (36)

where λ0 = [λx0-1, λx0-2] is the epistemic parameter on the initial distribution, and Σx0(1:3, 1:3) and Σx0(4:6, 4:6) indicate, respectively, the position block and the

Table 2 1σ position (r) and velocity (v) uncertainty of TLEs for orbits with e < 0.1, i > 60 deg, perigee altitude ≤ 800 km, in radial (U), transversal (V) and normal (W) components

1σrU [m]  1σrV [m]  1σrW [m]  1σvU [mm/s]  1σvV [mm/s]  1σvW [mm/s]
104       556       139       559          110          148


velocity block of the covariance matrix Σx0, and the operator diag indicates a block-diagonal matrix. Therefore, the set PX0 is parameterised using two epistemic multipliers, λx0-1 and λx0-2, which scale the covariance matrix, reducing the initial uncertainty for multipliers < 1 or increasing it for multipliers > 1. The multiplier range [0.33², 1.5²] means that the standard deviations of the initial state may be reduced to 33% of, or increased by 50% over, their reference value computed from Table 2. Note that the definition and parameterisation of the family of distributions are very much dependent on the nature of the epistemic uncertainty that one is considering. In this illustrative example, we maintain the assumption of a Gaussian family of distributions. More general parameterisations are also possible; see [21] for an example of the use of Bernstein polynomials to represent generic families on a bounded support.

3.2 Observation Model and Errors

The simulated scenario involves indirect measurements of the state between the initial time and the TCA, which are employed to improve the state knowledge. The quantities measured (Eq. 1b) are the debris azimuth and elevation with respect to the equatorial plane [22], such that the ideal measurement model is expressed as

h(t_k, x_k) = \begin{bmatrix} \arctan\dfrac{x_k(2)}{x_k(1)} \\[2mm] \arcsin\dfrac{x_k(3)}{\left\| x_k(1{:}3) \right\|} \end{bmatrix}   (37)

where xk(i) indicates the i-th element of the vector xk. The simulated measurements are generated using the debris reference trajectory, i.e. with initial conditions μx0, which is one of the collision trajectories of the space debris. To mimic measurement errors, the noisy measurements are drawn from the distribution

\mathcal{N}\!\left(y_k;\ h(t_k, x_k, 0),\ \Sigma_{y_k}\right)   (38)

where h(tk, xk, 0) is the ideal azimuth and elevation observation model with zero noise, and Σyk is the diagonal covariance resulting from the standard deviations specified in Table 3. Observations are taken every eight hours between the initial epoch and the rTCA. The resulting noisy observations are indicated as ȳk.


Table 3: 1σ azimuth (az) and elevation (el) uncertainty for noisy measurements of debris

1σ_az [deg] | 1σ_el [deg]
    0.1     |     0.1

In the epistemic scenario, we assume that the observation covariance Σ_{y_k} is not known precisely due to poor sensor characterisation. The likelihood epistemic set P_{Y_k|X_k} is therefore parameterised as

P_{Y_k|X_k} = { p(ȳ_k|x_k) : p(ȳ_k|x_k) = N( h(t_k, x_k); ȳ_k, Σ̃_{y_k} ),
               Σ̃_{y_k} = diag( λ_{y_k-1} Σ_{y_k}(1,1), λ_{y_k-2} Σ_{y_k}(2,2) ),
               λ_{y_k-1} ∈ [0.33^2, 1.5^2], λ_{y_k-2} ∈ [0.33^2, 1.5^2] },   (39)

where Σ_{y_k}(1,1) and Σ_{y_k}(2,2) indicate, respectively, the azimuth and elevation variance values of the reference covariance matrix Σ_{y_k} as resulting from the standard deviations in Table 3.
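As a small illustration of the observation model of Eq. (37) and the noise model of Eq. (38), the Python sketch below evaluates azimuth and elevation for a Cartesian state and draws one noisy measurement with the Table 3 standard deviations. The state vector, the random seed and the use of arctan2 (a quadrant-safe variant of the arctangent in Eq. (37)) are assumptions made for this toy example, not part of the chapter.

```python
import numpy as np

SIGMA_AZ = np.deg2rad(0.1)   # Table 3, 1-sigma azimuth noise [rad]
SIGMA_EL = np.deg2rad(0.1)   # Table 3, 1-sigma elevation noise [rad]

def h(x):
    """Ideal azimuth/elevation observation model of Eq. (37); x = [rx, ry, rz, vx, vy, vz]."""
    az = np.arctan2(x[1], x[0])                   # arctan(x(2)/x(1)), quadrant-safe variant
    el = np.arcsin(x[2] / np.linalg.norm(x[:3]))  # arcsin(x(3)/||x(1:3)||)
    return np.array([az, el])

def noisy_measurement(x, rng):
    """Draw y_k ~ N(h(x_k), Sigma_y) as in Eq. (38)."""
    cov = np.diag([SIGMA_AZ**2, SIGMA_EL**2])
    return rng.multivariate_normal(h(x), cov)

rng = np.random.default_rng(1)
x_k = np.array([7000e3, 1200e3, 800e3, 0.0, 7.5e3, 0.0])  # hypothetical state [m, m/s]
print(noisy_measurement(x_k, rng))
```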

3.3 Results

An initial Monte Carlo uncertainty propagation is carried out without any observation by propagating 10^6 samples drawn from the distribution in Eq. (35). The resulting DCA distribution is reconstructed from the propagated samples as an Empirical Cumulative Distribution Function (ECDF), plotted in Fig. 1. For a threshold distance δ_DCA = 500 m (a conservative threshold for most satellites and debris), the collision probability compatible with π(x_0) in Eq. (35) is PoC = 0.35%.

Successively, for reference and precomputation, the precise bootstrap filter [2] is run with the prior specified in Eq. (35) and likelihoods

N( h(t_k, x_k); ȳ_k, Σ̃_{y_k} ).   (40)

The distribution resulting from the bootstrap filter with 10^6 samples is plotted in Fig. 2. By comparing Figs. 1 and 2, we can notice how the inclusion of observations leads to an increase in the probability of collision for any threshold, with the value for the 500 m threshold equal to PoC = 1.82%.
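Once the DCA samples are available, the ECDF and the collision probability at a given threshold are straightforward to compute. The following sketch uses a synthetic DCA sample in place of the propagated trajectories (the lognormal draw and its parameters are purely illustrative).

```python
import numpy as np

def ecdf(samples):
    """Return sorted sample values and the corresponding empirical CDF values."""
    x = np.sort(samples)
    F = np.arange(1, len(x) + 1) / len(x)
    return x, F

def collision_probability(dca_samples, threshold=500.0):
    """PoC = P(DCA <= threshold), estimated as the fraction of samples below the threshold."""
    return np.mean(np.asarray(dca_samples) <= threshold)

rng = np.random.default_rng(2)
dca = rng.lognormal(mean=9.0, sigma=0.8, size=10**6)  # synthetic DCA sample [m]
x, F = ecdf(dca[:1000])                               # ECDF points, e.g. for plotting
print(collision_probability(dca, 500.0))
```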


Fig. 1 ECDF for the DCA between debris and operational satellite as resulting from the uncertainty on the initial conditions given by Eq. (35)

Fig. 2 Empirical Cumulative Distribution Function for Distance of Closest Approach between debris and operational satellite as resulting from bootstrap filter with observations, with prior as in Eq. (35) and likelihood as in Eq. (40)


Fig. 3 Empirical Cumulative Distribution Function for Distance of Closest Approach between debris and operational satellite as resulting from the robust particle filter with two observations, with prior imprecise set as in Eq. (36) and likelihood imprecise set as in Eq. (39)

Indeed, since the observations are taken relatively close to the collision trajectory, the state probability mass is concentrated closer to regions which will more likely result in the collision event. Given the parameterisations for P_{X_0} and P_{Y_k|X_k} in Eqs. (36) and (39), the robust particle filter is run with the proposal prior as in Eq. (35). The transition probability is used as the importance distribution. The B&B convergence threshold is set to δ = 1e−4, and the number of particles is set to 10^5. The results of the robust particle filter run with these settings are plotted in Fig. 3. The ECDFs corresponding to the lower and upper bounds (respectively, green and red lines) are displayed together with the precise distribution of Fig. 2 (black line) and a number of ECDFs resulting from epistemic samples generated with a Halton sequence within the epistemic set (light grey lines). The impact probability is now in the range PoC ∈ [4.16e−4 %, 2.58%]. The width of this robust probability interval, which is visualised in greater detail in Fig. 4, indicates how sensitive this state estimation scenario is to different specifications of the uncertainty density functions. In particular, from Fig. 4, we can see that for the quantity of interest, i.e. the probability of DCA less than 500 m, the computed lower and upper bounds enclose all the other probabilities compatible with the distributions within the imprecise set. Indeed, as an a posteriori


Fig. 4 Collision probability interval for set threshold δDCA = 500 m, and ECDFs for DCA between debris and operational satellite as resulting from the robust particle filter with two observations, with prior imprecise set as in Eq. (36) and likelihood imprecise set as in Eq. (39)

validation of the bounds computed, the precise and the sampled distributions are all included within the lower and upper bounds for the given threshold of 500 m. Thanks to the optimisation step used to estimate the lower and upper collision probability over the epistemic family of distributions considered, the robust filter counteracts the probability dilution problem typical of satellite conjunction analyses [23]. It is important to underline that the distributions corresponding to the lower or upper bounds for δ_DCA = 500 m, labelled as lower or upper distributions (respectively, blue and red lines in the figures), do not necessarily result in lower and upper bounds for the collision probability at thresholds different from δ_DCA = 500 m. Indeed, the lower and upper distributions are not lower and upper envelopes of the ECDFs resulting from the imprecise set; they are distributions belonging to the set itself that bound the expectation of a specific quantity of interest. From Fig. 3, we can notice that the lower distribution does not yield the probability lower bound for DCA thresholds higher than roughly 2 km. Besides, we can notice that there is no distribution that is also a lower envelope for the specified imprecise set, due to the several ECDF crossings. The same holds true for the upper envelope. Finally, the confidence interval for the PoC is computed by means of Eq. (31). Specifically, the 99% interval is constructed by running 2N = 100 independent optimisations with different samples. The resulting confidence interval for PoC is

[ 2.28%, 3.77% ].   (41)


The width of this interval is connected to the reliability of the bound estimator, and it could be reduced by using a larger number of particles in the RPF.

4 Conclusions

This chapter has presented the formulation of, and developed, a robust particle filtering approach for the state estimation problem under aleatoric and epistemic uncertainty affecting the prior, likelihood or model parameters' distribution. The formulation employed allows one to tackle very general epistemic models, as there is no specific assumption on the epistemic set parameterisation. In the imprecise scenario discussed, the value of the expectation of interest is enclosed in an interval. Therefore, lower and upper expectations have been introduced as robust bounds to be computed as the solution of the imprecise filtering problem. Estimators have been presented for both the expectation and its bounds, together with a confidence interval to quantify the bound inaccuracy due to the finite number of samples employed. The robust filtering approach then employs precomputation performed with a standard filter and importance distributions to speed up the numerous subsequent filter evaluations required by the bounds computation routine. The optimisation step is tackled by a B&B solver based on the estimator Lipschitz continuity to ensure convergence to the global optimum within a set threshold. This approach has been applied to the computation of the collision probability between an operational satellite and a debris object in the LEO environment in the presence of epistemic uncertainty. The results have shown how the RPF is able to efficiently compute robust probability bounds in such a scenario. In the context of space surveillance and tracking, this accentuated probability range indicates how sensitive the collision probability computation can be to modelling assumptions on the problem uncertainty structure in the presence of observations. In general, the ability of the robust particle filter to deliver more reliable estimates with efficient computations can be critical in a variety of highly uncertain scenarios.

References

1. Golodnikov, A.N., Knopov, P.S., Pardalos, P.M., Uryasev, S.P.: Optimization in the space of distribution functions and applications in the Bayes analysis. In: Probabilistic Constrained Optimization. Springer, Boston (2000)
2. Sarkka, S.: Bayesian Filtering and Smoothing, 1st edn. Cambridge University Press, New York (2013)
3. Gordon, N.J., Salmond, D.J., Smith, A.F.M.: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEEE Proceedings F (Radar and Signal Processing). IET Digital Library (1993)
4. Wang, W.S., Orshansky, M.: Robust estimation of parametric yield under limited descriptions of uncertainty. In: Proceedings of the 2006 IEEE/ACM International Conference on Computer-Aided Design (2006)


5. Masreliez, C., Martin, R.: Robust Bayesian estimation for the linear model and robustifying the Kalman filter. IEEE Trans. Autom. Control 22(3), 361–371 (1977)
6. Morrell, D.R., Stirling, W.C.: Set-valued filtering and smoothing. IEEE Trans. Syst. Man Cybern. 21(1), 184–193 (1991)
7. Smets, P., Ristic, B.: Kalman filter and joint tracking and classification in the TBM framework. In: Proceedings of the Seventh International Conference on Information Fusion (2004)
8. Noack, B., Klumpp, V., Brunn, D., Hanebeck, U.D.: Nonlinear Bayesian estimation with convex sets of probability densities. In: 2008 11th International Conference on Information Fusion (2008)
9. Benavoli, A., Zaffalon, M., Miranda, E.: Robust filtering through coherent lower previsions. IEEE Trans. Autom. Control (2010)
10. Augustin, T., Coolen, F.P.A., De Cooman, G., Troffaes, M.: Introduction to Imprecise Probabilities. Wiley (2014)
11. Greco, C., Gentile, L., Vasile, M., Minisci, E., Bartz-Beielstein, T.: Robust particle filter for space objects tracking under severe uncertainty. In: AAS/AIAA Astrodynamics Specialist Conference (2019)
12. Jazwinski, A.H.: Stochastic Processes and Filtering Theory. Bellman R. Academic Press, New York (1970)
13. Couso, I., Moral, S., Walley, P.: Examples of independences for imprecise probabilities. In: Proceedings of the 1st International Symposium on Imprecise Probabilities and Their Applications (1999)
14. Troffaes, M.: Imprecise Monte Carlo simulation and iterative importance sampling for the estimation of lower previsions. Int. J. Approx. Reason. 101, 31–48 (2018)
15. Greco, C., Vasile, M.: Robust particle filter under mixed aleatoric and epistemic uncertainty. Manuscript in preparation (2020)
16. Casado, L.G., García, I., Tóth, B.G., Hendrix, E.M.T.: On determining the cover of a simplex by spheres centered at its vertices. J. Global Optim. 50, 645–655 (2011)
17. SOCRATES - Satellite Orbital Conjunction Reports Assessing Threatening Encounters in Space. http://celestrak.com/SOCRATES/. Accessed: 10 Jul 2019
18. Montenbruck, O., Gill, E.: Satellite Orbits: Models, Methods and Applications. Springer Science & Business Media (2000)
19. Wakker, K.F.: Fundamentals of Astrodynamics. TU Delft Library (2015)
20. Klinkrad, H., Alarcon, J.R., Sanchez, N.: Collision avoidance for operational ESA satellites. In: 4th European Conference on Space Debris (2005)
21. Vasile, M., Tardioli, C.: On the use of positive polynomials for the estimation of upper and lower expectations in orbital dynamics. In: Stardust Final Conference. Springer, Cham (2018)
22. Schutz, B., Tapley, B., Born, G.H.: Statistical Orbit Determination. Elsevier (2004)
23. Balch, M.S., Martin, R., Ferson, S.: Satellite conjunction analysis and the false confidence theorem. Proc. R. Soc. A (2019). https://doi.org/10.1098/rspa.2018.0565

Computing Bounds for Imprecise Continuous-Time Markov Chains Using Normal Cones

Damjan Škulj

1 Introduction

The theory of imprecise Markov chains in continuous time has achieved significant progress in recent years [5–7, 19], following the success of imprecise Markov chains in discrete time [2–4, 17, 18, 21]. They successfully combine the theory of stochastic processes with the ideas of imprecise probabilities [1, 22]. The theory has been employed in the analysis of optical networks [8, 13], electric grids [15, 16], and information propagation [11]. The applicability of the theory is still limited to cases with a moderate number of states, mainly because of the computational complexity.

The core of the computations with imprecise (and precise) continuous-time Markov chains is the evaluation of the Kolmogorov backward equation. It is a matrix differential equation, which in the imprecise case involves lower transition operators instead of the fixed matrices used in the precise theory. Consequently, the closed-form expressions known from the precise case are unfeasible for the imprecise model. As an alternative, numerically intensive grid methods have been developed [6, 10]. These divide the interval of interest into a large number of subintervals on which an optimization problem is solved using linear programming techniques. An alternative, hybrid approach has already been presented in [19]; that method combines the matrix exponential approach, known from the precise case, with grid techniques for the situations where the matrix exponential approach is infeasible.

The goal of the present article is to provide a computationally efficient algorithm based on the idea proposed in [19]. To make the method more suitable for practical use, we combine it with the theory of normal cones of convex sets. It allows


substituting several steps that were initially based on linear programming with computationally simpler matrix operations. The primary result proposed is a computationally efficient procedure for solving the imprecise version of the Kolmogorov backward equation. It proceeds by identifying intervals where a solution using a suitable matrix exponential produces sufficiently accurate approximations within given error bounds. We illustrate our method by two examples. In our first example, the solution that would require more than a thousand steps with the grid methods completes in only three steps with our approach. In the second example, we formally confirm, in a reasonable number of steps, the validity of a solution from a previous study, where the existing methods were reported as infeasible. The intention of this chapter, however, is to provide the theoretical basis for the method and leave the practical considerations to further research. This also includes comparison with the existing methods, as not much practical testing has been reported in the literature up to now. This chapter is structured as follows. In Sect. 2, we provide an overview of the theory of imprecise Markov chains in continuous time. In Sect. 3, essential methods are presented for calculating lower expectations with respect to imprecise probabilistic models. The convexity properties of imprecise transition rate operators and their normal cones are presented in Sect. 4, and in Sect. 5, the norms and seminorms used throughout the chapter are provided. In Sect. 6, the numerical approximation techniques are discussed and the novel approach is proposed in detail. All mentioned methods are analyzed from the point of view of errors they produce in Sect. 7. Finally, in Sect. 8, the proposed methods are merged into a working algorithm and demonstrated on two examples. For the proofs of the theorems and more technical details on the concepts, the reader is kindly referred to an extended online version of this chapter [20].

2 Imprecise Markov Chains in Continuous Time

2.1 Imprecise Distributions over States

An imprecise Markov chain in continuous time is a stochastic process with a state space X, whose elements will be denoted by k ∈ X and whose cardinality |X| is denoted by m. The states will simply be labelled by consecutive numbers 1, 2, . . . , m; the labels, however, will not have any meaning for the dynamics of the process. The process is indexed by time t ∈ [0, ∞). At every time point t, the state the process assumes is denoted by X_t, which is thus a random variable on X. As we only consider the finite state case, the measurability considerations are trivial. The distribution of X_t is assumed to be imprecisely specified and is therefore represented by an imprecise probabilistic model. The usual choices of model in the theory of imprecise probabilities are credal sets and the derived models of coherent lower and upper previsions (see e.g. [1, 12, 22]).


Briefly, a credal set M denotes a set of expectation functionals P compatible with the available information. They give rise to lower and upper expectation functionals on the space of all real-valued functions (or gambles) on X, which can be identified with R^m, where m = |X|. Given a gamble f ∈ R^m, we define its lower and upper expectation with respect to a credal set M as

E̲(f) = inf_{P∈M} P(f) = inf_{P∈M} Σ_{k∈X} P(1_{k}) f(k)   (1)

and

Ē(f) = sup_{P∈M} P(f) = sup_{P∈M} Σ_{k∈X} P(1_{k}) f(k),   (2)

respectively, where throughout this chapter 1_A denotes the indicator gamble on A, i.e. 1_A(k) = 1 if k ∈ A and 1_A(k) = 0 elsewhere.

Adding the time dimension, our analysis now translates into finding the lower expectations E̲_t(f) for a given gamble f with respect to the corresponding credal sets M_t at a given time t. This results in a real-valued map t ↦ E̲_t(f) on a required time interval, typically of the form [0, T], where 0 denotes the initial time of the process observation. The value of E̲_t(f) depends on the initial distribution, represented by an initial lower expectation E̲_0, and on the transition law, which is described in terms of imprecise transition rates, as detailed in the following section.

2.2 Imprecise Transition Rate Matrices

A continuous-time Markov process switches between states in X randomly according to some transition rates, which are described using Q-matrices, also named transition rate matrices. Each element Q_{kl}, for k ≠ l, of a transition rate matrix denotes the rate at which a process in state k moves to state l; its value is non-negative. The diagonal elements Q_{kk} are negative and denote the rate of leaving k. It follows that Q_{kk} = −Σ_{l≠k} Q_{kl}, which implies that each row of a Q-matrix sums to 0.

The imprecision in transition rates is modelled by replacing precisely given transition rate matrices with sets of them, called imprecise transition rate matrices or imprecise Q-matrices. These sets are assumed to contain the factual transition rates governing the dynamics of the system at any time t and are typically denoted by Q. Thus, at every time we merely assume that the transition rates belong to the set Q, while in the course of time they may vary arbitrarily within it. We additionally require the imprecise Q-matrices to be closed, convex, and bounded, i.e. there exists a constant M such that |Q_{kl}| ≤ M for every Q ∈ Q and k, l ∈ {1, . . . , m}. We say that Q has separately specified rows if for every collection of Q_k ∈ Q_k, for k ∈ {1, . . . , m},


there exists a matrix Q ∈ Q whose kth row is Q_k, i.e. [Qf]_k = Q_k(f). From now on, the separately specified rows property will be added to the list of standard requirements for an imprecise Q-matrix.

For an imprecise Q-matrix, the corresponding lower transition operator is defined by

Q̲f := min_{Q∈Q} Qf,   (3)

where the min is meant component-wise. However, the separately specified rows property ensures that for every f ∈ R^m, some Q_f ∈ Q exists such that Q_f f = Q̲f. Thus, the above component-wise minimum is actually attained by some product Q_f f.

2.3 Distributions at Time t

The imprecise distribution of X_t, represented by the lower expectation functional E̲_t, now satisfies the following relation [19]:

E̲_t(f) = E̲_0(T̲_t f)   (4)

for every gamble f. The lower transition operator T̲_t is obtained as the unique non-linear operator satisfying

(d/dt) T̲_t f = Q̲ T̲_t f   (5)

and the initial condition T̲_0 f = f for every gamble f. Actually, De Bock [5] showed that the above equation holds even without reference to a specific gamble f. Yet, finding a specific lower expectation is only possible for a given f, in which case both interpretations of the equation coincide.

To calculate E̲_t(f) for a given gamble f, the lower operator T̲_t does not need to be completely specified. Instead, only the vector function f_t := T̲_t f needs to be evaluated. By (5), it follows that

(d/dt) f_t = Q̲ f_t,   (6)

with the initial condition f_0 = f. It was shown in [19] that this equation has a unique solution for a lower transition rate operator.


3 Numerical Methods for Finding Lower Expectations

3.1 Lower Expectation and Transition Operators as Linear Programming Problems

The methods for finding lower expectations of the random variables X_t are based on linear programming methods. Coherent lower (or upper) previsions are often presented in the form of a finite number of assessments, which can be turned into constraints of linear programming problems. Something similar can be said for imprecise transition rates, which, as convex sets, can also be generated by imposing a finite number of linear constraints. The corresponding objective function is usually deduced from the minimizing gamble. Specifically, consider Eq. (6). The calculation of the lower transition rate Q̲f_t for a given f_t is an optimization problem, where the minimum

min_{Q_k ∈ Q_k} Q_k(f_t)   (7)

has to be obtained for every row set Q_k = {Q_k : Q ∈ Q}, k ∈ X. If the row set Q_k is represented by a finite number of constraints, the above optimization problem can be solved by linear programming techniques. Once the solution h_k = min_{Q_k ∈ Q_k} Q_k(f_t) is obtained for every k, the solutions are combined into the solution vector h, whose components are h_k, and the minimizing matrix Q, whose rows are exactly the minimizing solutions Q_k.
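A minimal sketch of the row-wise optimization of Eq. (7), assuming the row set is described by generic lower-bound constraints q·f_i ≥ l_i together with the zero-sum constraint (and that it is bounded, as required of an imprecise Q-matrix). It uses scipy.optimize.linprog; the function names are illustrative, not the author's code.

```python
import numpy as np
from scipy.optimize import linprog

def lower_rate_row(F, l_k, f):
    """
    Solve Eq. (7): minimise q . f over the row set
    Q_k = { q : q . f_i >= l_k[i] for every row f_i of F,  sum(q) = 0 }.
    Returns the minimal value and the minimising row q.
    """
    m = F.shape[1]
    # linprog expects A_ub x <= b_ub, so rewrite q . f_i >= l_i as -F q <= -l.
    res = linprog(c=f,
                  A_ub=-F, b_ub=-np.asarray(l_k),
                  A_eq=np.ones((1, m)), b_eq=[0.0],
                  bounds=[(None, None)] * m,
                  method="highs")
    return res.fun, res.x

def lower_transition_rate(F, L, f):
    """Stack the row-wise minimisers: returns the vector (lower Q)f and the minimising matrix Q."""
    vals, rows = zip(*(lower_rate_row(F, L[k], f) for k in range(L.shape[0])))
    return np.array(vals), np.vstack(rows)
```

With the constraint gambles F (as rows) and the lower bounds L of Example 1 below, lower_transition_rate(F, L, h) would return the vector Q̲h and the minimizing matrix Q.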

3.2 Computational Approaches to Estimating Lower Expectation Functionals

The most common computation involving imprecise continuous-time Markov chains is the solution of Eq. (6) for a given initial gamble f on a finite time interval [0, T]. The calculation of Q̲f_t is typically implemented as a linear programming problem. In principle, it would have to be solved for every single time point of the interval, which is clearly impossible. Hence, the exact solution is in most cases unattainable, and we have to settle for approximations.

Most of the computational approaches to finding approximate solutions proposed in the literature apply some kind of discretization of the interval [0, T]. This means constructing a sequence 0 = t_0 < t_1 < · · · < t_n = T. By the semigroup property of the lower transition operators, we then have that T̲_T = ∏_{i=1}^{n} T̲_{t_i − t_{i−1}}. The idea is now to take the differences δt = t_i − t_{i−1} sufficiently small, so that approximations of the form T̂_{δt} = (I + δt Q) or T̂_{δt} = e^{δt Q}, for some matrix Q minimizing Q f_{t_{i−1}}, are accurate enough even when the approximation errors compound. It has been shown in [6, 10, 19] that it is possible, with appropriately fine grids, to


achieve arbitrarily accurate approximations. The approximate solution f̂_T of T̲_T f is then obtained by initially setting f_0 = f and then sequentially calculating the approximations f̂_{t_i} = T̂_{t_i − t_{i−1}} f̂_{t_{i−1}}, resulting ultimately in f̂_T = f̂_{t_n}. The existing methods differ in the way the step sizes t_i − t_{i−1} are determined and how the approximate transition operators T̂_{t_i − t_{i−1}} are obtained.

Our goal is to improve the applicability of the approach presented in [19], called the approximation with the adaptive grid method. To explain the underlying idea, note that the optimization problems for finding the minima Q̲f_t for different t are all the same as far as the constraints are concerned; they merely differ in the objective functions, which correspond to f_t, a Lipschitz continuous function of t (cf. Proposition 7 in [19]). Therefore, it is legitimate to expect that the matrices Q minimizing the expression Qf_t over all Q ∈ Q would lie in a close neighborhood of each other, or even be the same, for proximate values of t. This idea is unique to our approach, as the majority of the other existing methods do not attempt to make use of the continuity of the solutions f_t.

With our method, the intervals t_i − t_{i−1} are chosen in such a way that the corresponding transition operators T̲_{t_i − t_{i−1}} can be approximated by e^{(t_i − t_{i−1})Q}, where Q is a transition rate matrix. Very often, this choice even produces the exact solution on a suitable interval, i.e. no error in addition to the initial error of f̂_{t_{i−1}} is produced. Moreover, with this method, the intervals t_i − t_{i−1} are typically allowed to be considerably wider than with the alternative techniques.
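The following sketch illustrates the uniform-grid scheme f̂_{t+δt} = (I + δt Q) f̂_t described above. To keep the sketch self-contained, the component-wise minimum is taken over a small finite family of hypothetical candidate Q-matrices; in the chapter the minimum runs over a convex polytope and would be obtained with the row-wise LP of the previous sketch.

```python
import numpy as np

def lower_Q_apply(Q_list, f):
    """Component-wise minimum of Qf over a finite family of Q-matrices (stand-in for the LP step)."""
    products = np.stack([Q @ f for Q in Q_list])   # shape (n_matrices, m)
    return products.min(axis=0)

def uniform_grid_solution(Q_list, f0, T, n):
    """Approximate f_T with the grid scheme f_{t+dt} ~= f_t + dt * (lower Q)f_t."""
    dt = T / n
    f = np.array(f0, dtype=float)
    for _ in range(n):
        f = f + dt * lower_Q_apply(Q_list, f)
    return f

# Two hypothetical 3-state Q-matrices as the candidate family.
Q1 = np.array([[-1.0, 0.6, 0.4], [0.5, -1.0, 0.5], [0.3, 0.7, -1.0]])
Q2 = np.array([[-0.8, 0.3, 0.5], [0.9, -1.2, 0.3], [0.2, 0.4, -0.6]])
print(uniform_grid_solution([Q1, Q2], f0=[1.0, 0.0, 0.0], T=1.0, n=1000))
```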

4 Normal Cones of Imprecise Q-Operators

A closed and convex set of transition rate matrices is a convex polyhedron if the set of constraints is finite and the set is non-empty and bounded. Moreover, if it additionally satisfies the separately specified rows property, it can be represented as a product of the row polyhedra Q = ×_{k=1}^{m} Q_k. Here, we restrict ourselves to the normal cones corresponding to row sets of imprecise Q-matrices. For the theory of normal cones of general convex sets, see e.g. [9].

In the case of an imprecise Q-matrix, denoted generically by Q, we assumed that it has separately specified rows, which implies that it is of the form Q = ×_{k∈X} Q_k, where each Q_k is a convex polyhedron of vectors q_k. Take a row set Q_k, which is a convex set of vectors:

Q_k = { q ∈ R^m : q 1_X = 0,  q f ≥ Q̲_k(f)  ∀ f ∈ F },

where F is a suitable set of gambles generating a set of Q-matrices. It is assumed, for instance, to contain all gambles of the form 1_{s} with the constraints Q̲_k(1_{s}) ≥ 0 for s ∈ X, s ≠ k. For every element q ∈ Q_k, the corresponding normal cone is the set of vectors

N_{Q_k}(q) = { f ∈ R^m : q f ≤ p f  ∀ p ∈ Q_k }.


Vector q can be considered as the kth row of a matrix Q ∈ Q, and its normal cone is the set of all vectors f ∈ R^m for which q = arg min_{q′∈Q_k} q′f. To simplify the notation, we will now assume that the gambles in F are enumerated by indices i ∈ I, where I is an index set; thus, F = {f_i : i ∈ I}. It follows from Proposition 14.1 in [9] that every element f of the normal cone N_{Q_k}(q) can be represented as a linear combination of the elements of F that are contained in the cone:

f = Σ_{i ∈ I_q} α_i f_i + α_0 1_X,   (8)

where I_q = {i ∈ I : q f_i = Q̲_k(f_i)}, α_i ≥ 0 for all i ∈ I_q, and α_0 is an arbitrary real constant. Here, we used the fact that the constraint q 1_X = 0 can equivalently be stated as a combination of two distinct constraints, q 1_X ≥ 0 and q(−1_X) ≥ 0, and therefore, depending on the sign of α_0, either 1_X or −1_X appears in the above linear combination with a positive coefficient. We will call the subset F_{I_q} = {f_i : i ∈ I_q} the basis of the cone N_{Q_k}(q).

5 Norms of Q-Matrices

In our analysis, we will use vector and matrix norms. For vectors f, we will use the maximum norm

‖f‖ = max_{i∈X} |f_i|,   (9)

and the corresponding operator norm for matrices

‖Q‖ = max_{1≤k≤m} Σ_{l=1}^{m} |q_{kl}|.   (10)

For every stochastic matrix P, we therefore have that ‖P‖ = 1, which implies that ‖e^{Q}‖ = 1 for every Q-matrix Q. In general, Q-matrices may have different norms, though. For a bounded closed set of vectors F, we will define

‖F‖ = max_{f∈F} ‖f‖,   (11)

and for a bounded closed set of matrices Q,

‖Q‖ = max_{Q∈Q} ‖Q‖.   (12)


The distance between two vectors f and g is defined as d(f, g) = ‖f − g‖, and the maximal distance between two elements of a set of vectors F will be called the diameter of the set, denoted by δ(F) = max_{f,g∈F} d(f, g). Additionally, we define the distance between two matrices as d(Q, R) = ‖Q − R‖, while the diameter of an imprecise Q-matrix Q is called the imprecision of Q and denoted by ι(Q) = max_{Q,R∈Q} d(Q, R). The degree of imprecision has previously been defined in [14] in the L1 metric for the case of imprecise discrete-time Markov chains.

In the literature, the variational seminorm ‖f‖_v = max f − min f is also often used and proves especially useful in the context of stochastic processes. In [6], the quantity ‖f‖_c = ½ ‖f‖_v is also used. The reason to turn from norms to the seminorm lies in the simple fact that ‖f‖_v = 0 implies that f is constant, and further that Qf = 0 for every Q-matrix Q and Tf = f for every transition operator T. Moreover, ‖Tf‖_v ≤ ‖f‖_v holds for every f ∈ R^m. The inequality ‖f‖_c ≤ ‖f‖ is also immediate.

6 Numerical Methods for CTIMC Bounds Calculation

In this section, we discuss methods for the calculation of the solutions of the differential equation (6). Let h_t be a solution of this equation with the initial value h_0. The initial value may be an approximation obtained at a previous stage or interval. It has been shown in [19] (Proposition 7) that the solution h_t is Lipschitz continuous. More precisely, the following estimate holds:

‖h_{t+Δt} − h_t‖ ≤ Δt ‖Q‖ ‖h_0‖ e^{Δt‖Q‖} = Δt ‖Q‖ ‖h_0‖ + o(Δt).   (13)

6.1 Matrix Exponential Method

Assume that the initial vector h_0 = h is given, and let Q be an extreme Q-matrix such that Qh = Q̲h. By definition, the initial vector belongs to the collection of normal cones N_{Q_k}(Q_k) for every k ∈ {1, . . . , m}. Thus, for each index k, we have an index set J ⊆ I such that F_J forms the basis of N_{Q_k}(Q_k). Moreover, it can be shown that a basis F_{J̃} of R^m exists such that h is a non-negative linear combination of elements of F_{J̃}. In our case, the basis contains either 1_X or −1_X, which are excluded from the set of gambles indexed by I. Let I_{h,k} denote the index set which, together with 1_X or −1_X, forms the required basis corresponding to the kth row. Then, we can write

h = Σ_{i ∈ I_{h,k}} α_{ki} f_i + α_{k0} 1_X,   (14)


where α_{ki} ≥ 0 for every i ∈ I_{h,k} and α_{k0} ∈ R. By these assumptions, the solution h_t of Eq. (6) can be written as a linear combination of the form (14) for every t ≥ 0, yet not necessarily with non-negative coefficients α_{ki} for t > 0.

Remark 1 In the case described above where F ∩ N_{Q_k}(Q_k) is not linearly independent, instead of the entire normal cone, we only consider its part that contains the gamble h and is positively spanned by a linearly independent subset. Note that this is only possible in the case where an extreme point happens to be an intersection of a number of hyperplanes greater than the dimension of the space containing the convex set. Thus, in principle, the cone generated by the linearly independent subset may represent only a fraction of the normal cone. In order to avoid repeating this fact, we will from now on slightly abuse terminology and call a cone spanned by a linearly independent set a normal cone. Apart from the definition, this fact does not have any other negative impact, as these subsets of the normal cones are cones as well, and they may well become normal cones if the constraints are only slightly changed.

In the general case, the vector h = h_0 would belong to the interior of a normal cone, whence the coefficients α_{ki} are all strictly positive. (Here we already assume that the cone is generated by a linearly independent set of gambles.) For a small enough time T > 0, the values of h_t may still belong to the same normal cone, whence they would satisfy Qh_t = Q̲h_t for every t ∈ [0, T]. In that case, the exact solution h_T can be found explicitly as h_T = e^{TQ} h_0.

Quite surprisingly, it has been shown in [19] that checking whether the above condition holds is possible by merely considering the solution at the end-point T. More precisely, we need to consider the partial sums of a Taylor series corresponding to the solution in the form of a matrix exponential. An implementation of this exact method was also proposed in the same paper, yet here we improve significantly on its efficiency by making use of the normal cones. To employ the exact method efficiently, it is necessary to aptly implement the following steps:

– finding a time interval [0, T] where the method is applicable, with T as large as possible, and
– verifying whether the method is applicable on a given interval [0, T] within an acceptable maximal possible error.

The second step suggests that we might have an interval where the solutions h_t do not lie exactly in the required normal cone, but sufficiently close to it, so that the error remains within acceptable bounds.


6.2 Checking Applicability of the Matrix Exponential Method

The procedure for checking the applicability of the exact method on an interval [0, T] is based on the results proposed in [19]. Consider the Taylor series corresponding to the matrix exponential,

e^{tQ} = Σ_{i=0}^{∞} (tQ)^i / i!,   (15)

and its rth partial sums

p_r(tQ) = Σ_{i=0}^{r} (tQ)^i / i!.   (16)

Theorem 1 Let Q be an arbitrary square matrix of order m, h ∈ R^m, and C ⊆ R^m a convex set such that h ∈ C. Furthermore, let p_r(tQ) be the partial sums (16). If p_r(TQ)h ∈ C for some T > 0 and every r ∈ N_0 = N ∪ {0}, then e^{tQ}h ∈ C for every t ∈ [0, T].

An approximate version of the above result holds as well.

Theorem 2 Assume the notation of Theorem 1. Suppose that ε > 0 and T > 0 exist such that for every r ∈ N_0 we can write

p_r(TQ)h = h_T^{C,r} + h_T^{E,r},

where h, h_T^{C,r} ∈ C and ‖h_T^{E,r}‖_c ≤ ε. Then, for every t ∈ [0, T],

e^{tQ}h = h_t^C + h_t^E,

where h_t^C ∈ C and ‖h_t^E‖_c ≤ ε.

6.3 Checking the Normal Cone Inclusion

In [19], verification of whether some p_r(tQ)h belongs to a normal cone C was implemented through the application of linear programming, which is computationally costly. Here, we propose a procedure that vastly reduces the number of linear programming routines that need to be executed and replaces them with faster matrix methods. Notice again that in the case where a normal cone contains a subset of F that is not linearly independent, only the cone generated by a linearly independent subset is considered.


Let M_J denote the matrix whose columns are 1_X as the first column and f_i for i ∈ J ⊂ I; here, J stands for any I_{h,k}. Equation (14) is equivalent to M_J α = h, where α denotes the vector of the coefficients α(i) for i ∈ J ∪ {0}. (We now write α(i) instead of α_i to avoid multiple indices.) Due to the assumed linear independence, M_J is invertible, and we have that α = M_J^{−1} h.

Let α_0 be the vector of coefficients such that M_J α_0 = h_0, and let p_r(t) be the rth partial sums of some power series. Furthermore, let α_r^t be such that M_J α_r^t = p_r(tQ) h_0. It is a matter of basic matrix algebra to prove that

α_r^t = p_r(t M_J^{−1} Q M_J) α_0 = p_r(t Q_J) α_0.   (17)

That is, Q_J := M_J^{−1} Q M_J is the matrix corresponding to Q in the basis {1_X} ∪ F_J. The vector p_r(tQ) h_0 lies in the cone C_J if and only if α_r^t has all components, except possibly the first one, non-negative. To avoid unnecessary calculations, one should first check whether e^{TQ_J} α_0 satisfies these requirements.

Proposition 1 Let α_0 be an m-tuple and Q_J the square matrix defined above. Denote α_r^t = p_r(tQ_J) α_0, where p_r(t) are the rth partial sum polynomials of the Taylor series of the exponential function. Suppose that α_r^T(i) ≥ 0 for every r ≥ 0 and i ∈ J. Then, α_∞^t(i) ≥ 0 for every i ∈ J and 0 ≤ t ≤ T, where α_∞^t = e^{tQ_J} α_0.

The above proposition provides a directly applicable criterion for checking whether the solution of (6) on some interval is entirely contained in the same normal cone. If the inclusion holds for all normal cones corresponding to the rows Q_k, then the exact solution of (6) is obtained as h_T = e^{TQ} h_0.
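A compact Python sketch of the coefficient test of Proposition 1, assuming the basis matrix M_J (1_X as the first column, then the f_i for i ∈ J) and the minimising matrix Q are already available. The truncation at r_max and the numerical tolerance are practical assumptions; the proposition itself requires the check for every r.

```python
import numpy as np

def coefficients_stay_nonnegative(Q, M_J, h, T, r_max=50, tol=1e-12):
    """
    Criterion of Proposition 1: with Q_J = M_J^{-1} Q M_J and alpha_0 = M_J^{-1} h,
    check that the partial sums p_r(T Q_J) alpha_0 have non-negative entries in all
    coordinates except the constant (first) one.
    """
    M_inv = np.linalg.inv(M_J)
    Q_J = M_inv @ Q @ M_J
    alpha = M_inv @ h              # coefficients of h in the basis {1_X} u F_J
    term = alpha.copy()            # current Taylor term (T Q_J)^r alpha_0 / r!
    partial = alpha.copy()         # partial sum p_r(T Q_J) alpha_0
    for r in range(1, r_max + 1):
        term = (T / r) * (Q_J @ term)
        partial = partial + term
        if np.any(partial[1:] < -tol):
            return False           # some partial sum leaves the cone
        if np.linalg.norm(term) < tol:
            break                  # series has numerically converged
    return True
```

If the check succeeds for the normal cones of all rows, the step can be taken exactly as h_T = e^{TQ} h_0 (e.g. with scipy.linalg.expm).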

6.4 Approximate Matrix Exponential Method

The solution obtained with the exponential method might sometimes not satisfy the conditions of the previous subsection exactly, and can thus, for a particular interval, partially lie outside the starting normal cone; yet the distance to it might be small enough to ensure that the error remains within the required bounds. In this subsection, we give a theoretical basis for such use.

Let J = I_{h,k} for some row index k, let an initial vector h be given, and denote by C the normal cone N_{Q_k}(Q_k). Let h_t^r = M_J α_r^t, where α_r^t are as in the previous subsection. We decompose α_r^t into (α_r^t)^+, the vector of its positive components with (α_r^t)^+(0) = α_r^t(0), and (α_r^t)^−, containing the absolute values of the negative components except for (α_r^t)^−(0) = 0. We have that α_r^t = (α_r^t)^+ − (α_r^t)^−. Hence, h_t^r = h_t^{C,r} + h_t^{E,r}, where h_t^{C,r} = M_J (α_r^t)^+ and h_t^{E,r} = −M_J (α_r^t)^−. Clearly, h_t^{C,r} ∈ C.

Theorem 3 We assume the notation used above. Let h ∈ R^m and Q ∈ Q be given such that Qh = Q̲h. Furthermore, suppose that ‖h_T^{E,r}‖_c = ‖−M_J (α_r^T)^−‖_c ≤ ε for some T > 0, ε > 0 and all J ∈ {I_{h,k} : 1 ≤ k ≤ m} and r ∈ N_0. Then, the inequality

‖Q̲[e^{tQ} h] − Q[e^{tQ} h]‖ ≤ ι(Q) ε

holds for every t ∈ [0, T].

The above theorem provides a basis for the use of the matrix exponential approximation in the case where the solution on an interval is nearly contained in the same normal cone.

7 Error Estimation

In this section, we estimate the maximal possible error of the approximation ĥ_t of the exact solution h_t of Eq. (6) when employing one of the described methods. We will assume that ĥ_t approximates h_t and that it satisfies the equation

dĥ_t/dt = Q_t ĥ_t,   (18)

where Q_t : [0, T] → Q is some piecewise constant map. We assume this property for the sake of simplicity and because all the existing methods indeed produce such functions. In fact, as far as polyhedral sets of Q-matrices are concerned, this property indeed holds, as the matrix minimizing the expression Qh is constant as long as h remains in its normal cone. Note, however, that the grid method using the linear approximation ĥ_{t_{k+1}} = (I + (t_{k+1} − t_k) Q_k) ĥ_{t_k} does not necessarily satisfy Eq. (18), yet it turns out that the error produced is of similar magnitude.

7.1 General Error Bounds

Denote by P_{Δt} the linear operator mapping h to the solution of the differential equation (6) at time t + Δt with the initial value h_t = h; we can write P_{Δt} h_t = h_{t+Δt}. Moreover, we will denote by P̂_{Δt} the operator that maps h_t to the approximation ĥ_{t+Δt}. Using an approximation method, the estimate obtained at time t + Δt is not P_{Δt} ĥ_t, but instead an approximation ĥ_{t+Δt}, which, in addition to the error E_t of the estimation at time t, contains an additional error due to the approximation method used. Let E_{Δt}^m denote the error of the method on the interval Δt, that is, E_{Δt}^m = ‖P̂_{Δt} ĥ_t − P_{Δt} ĥ_t‖. The following proposition holds.


Proposition 2 Let E_t = ‖ĥ_t − h_t‖ for every t ∈ [0, T], and let E_{Δt}^m denote the error produced by an approximation method on an interval of width Δt. Then E_{t+Δt} ≤ E_t + E_{Δt}^m, which we can rewrite as ΔE ≤ E_{Δt}^m.

The above proposition can be interpreted as an estimate of the total error that results from the error in the initial solution E_t and the error of the method E_{Δt}^m.

7.2 Error Estimation for a Single Step

Within a single approximation step, we calculate the solution ĥ_{t+Δt} based on the approximation ĥ_t. For the purpose of error estimation, we will set t = 0 and Δt = T. Moreover, we will assume that the initial solution is exact, because otherwise the initial error is merely added to the error of the method, as shown in the previous section. Thus, the initial value is set to h_0 = h.

Now assume we have an estimate of the form ĥ_t = e^{tQ} h for t ∈ [0, T], where Qh = Q̲h. Our goal is to bound the norm of the difference ĥ_T − h_T, where h_T is the exact solution of equation (6) with initial condition h_0 = h. Let E_t = ‖ĥ_t − h_t‖ represent the error of the approximation. The following theorem holds.

Theorem 4 Let h ∈ R^m be given and the matrix Q be such that Qh = Q̲h. Suppose that for some T > 0 and ε > 0 we have that ‖h_T^{E,r}‖_c = ‖M_J (α_r^T)^−‖_c ≤ ε for every r ∈ N_0. Then,

E_t ≤ (e^{‖Q‖t} − 1) (ι(Q)/‖Q‖) ε ≤ 2 (e^{‖Q‖t} − 1) ε   (19)

for every 0 ≤ t ≤ T.

An estimate for the upper bound on the error follows. Consider again the operator e^{tQ} acting on the vector h, which is by definition equal to

e^{tQ} h = h + Σ_{k=1}^{∞} ((tQ)^k / k!) h =: h + h_t^E.   (20)

Based on the maximal norm of h_t^E and applying Theorem 4, the maximal possible error of the approximation E_t is estimated to be

E_t = 2 ‖h‖_c (1 − e^{t‖Q‖} (1 − t‖Q‖)).   (21)
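A short sketch of the single-step error bounds of Eq. (19) and Eq. (21), assuming the norm ‖Q‖, the imprecision ι(Q) and the tolerance ε are supplied by the caller; the numeric values in the example (t = 0.05, ε = 1e-3) are hypothetical, while h and ‖Q‖ = 1.82 are taken from Example 1 below.

```python
import numpy as np

def error_bound_theorem4(Q_norm, iota_Q, t, eps):
    """Eq. (19): E_t <= (e^{||Q||t} - 1) * (iota(Q)/||Q||) * eps."""
    return (np.exp(Q_norm * t) - 1.0) * (iota_Q / Q_norm) * eps

def worst_case_step_error(h, Q_norm, t):
    """Eq. (21): E_t = 2 ||h||_c (1 - e^{t||Q||}(1 - t||Q||)), with ||h||_c = (max h - min h)/2."""
    h_c = 0.5 * (np.max(h) - np.min(h))
    x = t * Q_norm
    return 2.0 * h_c * (1.0 - np.exp(x) * (1.0 - x))

h = np.array([-0.7, 1.7, -1.0])                      # initial gamble of Example 1
print(error_bound_theorem4(Q_norm=1.82, iota_Q=3.0, t=0.05, eps=1e-3))
print(worst_case_step_error(h, Q_norm=1.82, t=0.05))
```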


7.3 Error Estimation for the Uniform Grid

The approximation using the uniform grid method on an interval [0, T] is obtained by dividing the interval into subintervals [t_i, t_{i+1}], where 0 = t_0 < t_1 < · · · < t_n = T. Although the differences t_{i+1} − t_i can be variable in some approaches (see e.g. [6]), we will conveniently assume that all distances are equal to δ = T/n. The total error on the interval [0, T] is bounded by the sum of the errors on the subintervals, which by (21) is equal to

E_T ≤ 2n‖h‖ (1 − e^{δ‖Q‖} (1 − δ‖Q‖)) = 2n‖h‖ (1 − e^{(T/n)‖Q‖} (1 − (T/n)‖Q‖)).   (22)

In [6], an error estimate for a uniform grid method, which uses the approximation h_{t_i} = (I + (t_i − t_{i−1}) Q) h_{t_{i−1}}, has been found to be

E_T^* = δ^2 ‖Q‖^2 Σ_{i=0}^{n−1} ‖h_{t_i}‖_c.   (23)

In the worst case, we have that ‖h_{t_i}‖_c = ‖h‖, in which case we end up with the estimate

E_T^* = n δ^2 ‖h‖ ‖Q‖^2,   (24)

which is very close to our estimate (22), especially for large n. Both our error estimate and the one found in [6] benefit from ergodicity properties, which cause the variational norm of the solution vector function to diminish.

8 Algorithm and Examples

Based on the theoretical results, we now provide an algorithm for estimating the solution of equation (6) with a given imprecise transition rate matrix Q and initial value h.

8.1 Parts of the Algorithm

We will present the version of the algorithm where only the matrix exponential method is used.


Inputs:


The following inputs to the algorithm are needed:

– a set of gambles F, given in terms of an N × m matrix whose ith row denotes a gamble f_i;
– a set of lower transition rates, represented in terms of an N × m matrix whose (i, j)th entry denotes [Q̲f_i]_j;
– a gamble h as an m-tuple;
– a time interval length T > 0; and
– a maximal allowed error E.

Outputs: The algorithm provides an approximation of h_T as an m-tuple and Er, the maximal bound on the error. Note that the calculated approximation can be more accurate than required; the requirement is that Er ≤ E.

Minimizing matrix: The matrix Q satisfying Qh = Q̲h is found using linear programming.

Identification of the normal cones: For each row k = 1, . . . , m, we identify the index set I_k = {i ∈ {1, . . . , N} : Q_k f_i = [Q̲f_i]_k}. Furthermore, we calculate
– a non-negative linear combination Σ_{i∈I_k} α_i f_i = h and
– if |I_k| > m − 1, a non-trivial linear combination Σ_{i∈I_k} β_i f_i = 0.
Based on the above combinations, a gamble f_i is eliminated. These steps are repeated until F_{I_k} becomes linearly independent. If needed, the set is completed to a basis with some of the remaining elements of the cone basis. The final output is a linearly independent set F_{I_k} and a collection of coefficients α = (α_0, . . . , α_{m−1}) for every row k. In the case where some normal cones coincide for different rows, the duplicates are removed.

Finding a feasible interval: In general, the application of the matrix exponential method on the entire interval [0, T] is infeasible. Hence, we need to find a subinterval [0, T′] where the error is within the required bounds. As by Proposition 2 the errors are sequentially added to the initial error, we require that the added part of the error E_{T′}^m is smaller than the proportional part of the maximal allowed error: Er ≤ E·T′/T. This error estimate is calculated using Theorem 4. Its estimation first requires the assessment of ε, which is obtained by applying Theorem 3 as ε = min_{J∈{I_k}} ‖M_J (α_s^T)^−‖_c. The initial estimate of the interval length is obtained, using the linear approximation e^{tQ}h ≈ h + tQh, as the maximal t such that α_0 + t Q_J α_0 ≥ 0 (see (17)). If α_0 happens to have zero elements, then the above expression may have negative coefficients even for very small values of t, in which case we simply try a minimal initial interval, specified as a parameter of the algorithm. In case the initial interval yields a too large estimated error, the interval is halved until the required error size is reached. Since the estimated error size is at most as large as with the grid methods reported in [6, 19], the process eventually produces a feasible interval.

Iterative step: Once a feasible interval length dt is found, the new initial solution is set to h_{dt} = e^{dt·Q} h. The remaining time interval then reduces to T − dt, and the maximal allowed error is updated to E − Er, where Er is the evaluated maximal error of the applied method.


Algorithm 1 Function: approximate h_T
Require: F, Q, h, T, E
Ensure: h_T, maxErr                         ▷ solution at time T, error estimate
  ts = 0, te = T                            ▷ start and end time points
  maxErr = 0
  nq = ‖Q‖, io = ι(Q)
  while ts < te do
      Q = arg min_{Q∈Q} Qh
      for k = 1, . . . , m do
          (I_k) = normalCone(h, Q_k, Q_k)               ▷ find the basis of the normal cone for the k-th row
          (I_k^i, ind_k) = reduceToIndependent(I_k, h)  ▷ reduce to an independent set and find the linear combination equal to h
      end for
      dt = min(initialInterval(I, ind), te − ts)        ▷ try an initial interval based on the linear approximation
      repeat
          ε = estimateEpsilon(I, ind)
          Err = (e^{nq·dt} − 1) · (io/nq) · ε           ▷ estimated error
          Ea = E · dt/T                                 ▷ maximal allowed error
          if Err > Ea then
              dt = dt/2
          end if
      until Err ≤ Ea
      h = e^{dt·Q} h                                    ▷ new solution
      maxErr = maxErr + Err                             ▷ total error
      E = E − Err                                       ▷ the remaining allowed error
      ts = ts + dt                                      ▷ new starting point
  end while
  return h_T = h, maxErr

Algorithm 1 illustrates the main steps of the approximation of the solution using our method.

8.2 Examples

In our first example, we demonstrate the use of the method for a case where the solution remains in a single normal cone for the entire interval.

Example 1 Let X be a set of 3 states, which we denote by 1, 2, 3. We consider a set Q of Q-matrices given by constraints of the form Q̲_i(1_A) for all non-trivial subsets A of X. As, in addition, we want to ensure that the representing gambles f all satisfy Σ_{k∈X} f_k = 0 and have norm equal to 1, we instead use the following six representing gambles:


f_1 = (−1, 1/2, 1/2),
f_2 = (1/2, −1, 1/2),
f_3 = (−1/2, −1/2, 1),
f_4 = (1/2, 1/2, −1),
f_5 = (−1/2, 1, −1/2),
f_6 = (1, −1/2, −1/2).

Let the set Q be specified via the following constraints:

L = [  0.76  −0.69   0.15  −0.24   0.60  −0.92
      −0.99   1.21   0.30  −0.39  −1.37   0.90
      −0.24  −0.54  −0.76   0.61   0.45   0.15 ].   (25)

The elements of the above matrix denote the lower bounds l_{ki} = Q̲_k(f_i). Now, Q is the set of all Q-matrices Q satisfying, for every i = 1, . . . , 6, Qf_i ≥ L_i, where L_i denotes the ith column of L.

Given an initial gamble h = (−0.7, 1.7, −1), we calculate the solution of equation (6) satisfying h_0 = h on the interval [0, 1]. We try to find as large an interval as possible on which h_t lies in the same normal cone of Q as h. The matrix Q minimizing Qh over Q is found to be

Q = [ −0.560   0.460   0.100
       0.607  −0.807   0.200
       0.147   0.360  −0.507 ].

All normal cones N_{Q_k}(Q_k) are spanned by the same set of gambles {f_4, f_5, 1_X}. Specifically, we have that h = 1.6 f_5 + 0.2 f_4. This is of course due to the fact that we restricted the space of the gambles to the set where the sum of the components of each of them is zero; we cannot expect this for all further h_t, whence the constant 1_X will in general appear in the linear combinations forming h_t. Thus, we have the initial vector of coefficients α_0 = (1.6, 0.2, 0) of h in the basis B = (f_5, f_4, 1_X).

The preliminary analysis based on the first-order Taylor approximation, as described in Sect. 8.1, suggests that the initial time interval where the matrix exponential method could be applied is the interval [0, T] with T = 0.774. To confirm this interval, all vectors p_r(TQ)h must be contained in the cone generated by non-negative linear combinations of B, except for the constant. According to the procedure described in Sect. 6.2, we find the matrix Q_J corresponding to the operator Q in the basis B, which we obtain as

Q_J = M_J^{−1} Q M_J = [ −1.267  −0.100   0
                           0.100  −0.607   0
                           0.007   0.103   0 ],

with M_J being the matrix with the elements of B as columns. Checking whether p_r(TQ)h is contained in the same cone directly translates to checking whether


α_r^T = p_r(TQ_J) α_0 has non-negative components corresponding to f_4 and f_5, that is, in the first two places. The resulting sequence of coefficients is (rounded to three decimals)

α_1^T = (0.016, 0.230, 0.024)
α_2^T = (0.791, 0.162, 0.021)
α_3^T = (0.540, 0.192, 0.021)
α_4^T = (0.601, 0.184, 0.021)
α_5^T = (0.589, 0.186, 0.021)
α_∞^T = (0.591, 0.185, 0.021).

All coefficients α_r^T for r > 5 lie in the neighborhood of the limit values α_∞^T and are certainly positive. Every partial sum p_r(TQ)h therefore belongs to the same normal cone as h, and so do all h_t for t ∈ [0, T], as follows by Theorem 1. The solution h_T = e^{T·Q} h = (−0.182, 0.704, −0.460) is therefore the exact solution of equation (6) on this interval. Two more steps, similar to this one, are needed to obtain the result h_1 = (−0.108, 0.552, −0.366).

In this example, the power of the new method is fully demonstrated. First, only three optimization steps are needed. For comparison, we estimate the required number of steps if the uniform grid method [6] were employed. By the error estimate provided in their paper,

δ^2 ‖Q‖^2 Σ_{i=0}^{n−1} ‖h_{t_i}‖_c = (1/n^2) ‖Q‖^2 Σ_{i=0}^{n−1} ‖h_{t_i}‖_c ≤ ε = 0.001

is required. The norms ‖h_{t_i}‖_c are bounded from below using the contraction nature of the transition operators, whence we can deduce that ‖h_{t_i}‖_c ≥ ‖h_1‖_c = 0.45. The norm ‖Q‖ is bounded by 1.82. Based on these estimates, the number of required iterations would be at least 1490. Applying our method does bring some additional tasks to be performed, yet these tasks in total contribute much less to the time complexity than the optimization steps. Second, knowing that the solution lies in the same normal cone guarantees not only that the result is accurate up to the maximal allowed error, but also that it is the exact solution. Using the approximate operators (I + (T/n)Q)^n, the best we can get are approximations.

Example 2 In our second example, we revisit the example in [15], Section 3.4. In this example, the states denote failures in a power network, and the transitions arise from the repair rates. The imprecise transition rate matrix there is given as a pair of lower and upper transition rate matrices:

Q_L = [  −0.98     0.32      0.32     0.19
        730.00  −1460.61     0.00     0.51
        730.00      0.00 −1460.61     0.51
          0.00    730.00   730.00 −2920.00 ],   (26)

Q_U = [  −0.83     0.37      0.37     0.24
       1460.00   −730.51     0.00     0.61
       1460.00      0.00  −730.51     0.61
          0.00   1460.00  1460.00 −1460.00 ],   (27)


where we can simply take

Q = [Q_L, Q_U] = { Q : Q_{L,k} ≤ Q_k ≤ Q_{U,k} ∀ 1 ≤ k ≤ m,  Σ_{l=1}^{m} Q_{kl} = 0 }.   (28)

In the original paper, bounds for the long-term distribution were estimated, yet without a clear idea of how to estimate the error bounds. It was observed, however, that a uniform grid with as few as 80 subintervals was sufficient to obtain a sufficiently accurate result on the interval [0, 0.02], which turned out to be sufficient for the process to reach the limit distribution. The error estimates employing the methods at hand predicted significantly larger errors than observed. The bounds for the limit distributions were found to be

π̲ = ( 9.985 × 10^{−1},  2.623 × 10^{−4},  2.623 × 10^{−4},  6.513 × 10^{−5} )^T,
π̄ = ( 9.994 × 10^{−1},  7.252 × 10^{−4},  7.252 × 10^{−4},  1.647 × 10^{−4} )^T.   (29)

To calculate the lower transition probability P̲_t({i}|j), we first find the solution h_t of (6) for h_0 = 1_{i} and take its jth component [h_t]_j. To calculate the upper probability, we take h_0 = −1_{i} and then set P̄_t({i}|j) = −[h_t]_j. For a sufficiently large time interval and a convergent chain, all components of h_t become more and more similar, and in our case they denote the limit lower and upper probabilities, respectively.

We repeated the calculations using our method, setting the maximal allowed error to 0.001 and the time interval to [0, 1], which is clearly more than sufficient to ensure convergence. The method produced identical results for the lower and upper bounds, with the number of required iterations for each value varying between 30 and 40. Our method therefore confirms the validity of the results in the original paper, which does not contain a rigorous proof.
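For box-constrained rate sets such as Eq. (28), the row-wise minimisation of Eq. (7) is a small linear program. The sketch below (an illustration with scipy.optimize.linprog, not the code used for the reported results) computes the minimising matrix for the Q_L/Q_U bounds of Eqs. (26)–(27) and an indicator gamble, the first step of the transition probability computation just described.

```python
import numpy as np
from scipy.optimize import linprog

QL = np.array([[  -0.98,     0.32,     0.32,    0.19],
               [ 730.00, -1460.61,     0.00,    0.51],
               [ 730.00,     0.00, -1460.61,    0.51],
               [   0.00,   730.00,   730.00, -2920.00]])
QU = np.array([[  -0.83,     0.37,     0.37,    0.24],
               [1460.00,  -730.51,     0.00,    0.61],
               [1460.00,     0.00,  -730.51,    0.61],
               [   0.00,  1460.00,  1460.00, -1460.00]])

def minimising_row(f, lo, hi):
    """min q . f  subject to  lo <= q <= hi (element-wise) and sum(q) = 0, as in Eq. (28)."""
    m = len(f)
    res = linprog(c=f, A_eq=np.ones((1, m)), b_eq=[0.0],
                  bounds=list(zip(lo, hi)), method="highs")
    return res.x

def minimising_Q(f):
    """Stack the row-wise minimisers into the matrix Q attaining (lower Q)f."""
    return np.vstack([minimising_row(f, QL[k], QU[k]) for k in range(QL.shape[0])])

h0 = np.array([1.0, 0.0, 0.0, 0.0])   # indicator gamble 1_{1}, used for the lower probability of state 1
print(minimising_Q(h0))
```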

9 Concluding Remarks

The method presented in this chapter provides a promising alternative to the existing methods for approximating the solutions of the imprecise generalization of the Kolmogorov backward differential equation on finite intervals. The primary achievement is that the matrix exponential approach no longer needs to be combined with the grid methods. This is predominantly thanks to the introduction of the approximate version of the exponential method and considerably improved error estimation.


As presented, our analysis is limited to finite intervals; however, with some adaptations, it could be employed for finding the limit distributions as well. A step in this direction is demonstrated in our second example, where the obtained solution is effectively the limit distribution. The convergence manifests itself in the solutions becoming close to a constant vector. Put differently, the difference to a constant tends to zero, which is taken into account by the error estimates. It is a matter of further work to formalize this into a comprehensive method for finding long-term distributions.

Acknowledgments The author acknowledges the financial support from the Slovenian Research Agency (research core funding no. P5-0168).

References

1. Augustin, T., Coolen, F.P., de Cooman, G., Troffaes, M.C.: Introduction to Imprecise Probabilities. Wiley (2014)
2. De Cooman, G., Bock, J.D., Lopatatzidis, S.: Imprecise stochastic processes in discrete time: global models, imprecise Markov chains, and ergodic theorems. Int. J. Approx. Reason. 76, 18–46 (2016). https://doi.org/10.1016/j.ijar.2016.04.009. http://www.sciencedirect.com/science/article/pii/S0888613X16300603
3. De Cooman, G., Hermans, F., Quaeghebeur, E.: Imprecise Markov chains and their limit behavior. Probab. Eng. Inf. Sci. 23(4), 597–635 (2009). https://doi.org/10.1017/S0269964809990039
4. Crossman, R.J., Škulj, D.: Imprecise Markov chains with absorption. Int. J. Approx. Reason. 51, 1085–1099 (2010). https://doi.org/10.1016/j.ijar.2010.08.008
5. De Bock, J.: The limit behaviour of imprecise continuous-time Markov chains. J. Nonlinear Sci. 27(1), 159–196 (2017)
6. Erreygers, A., De Bock, J.: Imprecise continuous-time Markov chains: efficient computational methods with guaranteed error bounds. Preprint (2017). arXiv:1702.07150
7. Erreygers, A., De Bock, J.: Computing inferences for large-scale continuous-time Markov chains by combining lumping with imprecision. In: International Conference Series on Soft Methods in Probability and Statistics, pp. 78–86. Springer (2018)
8. Erreygers, A., Rottondi, C., Verticale, G., De Bock, J.: Imprecise Markov models for scalable and robust performance evaluation of flexi-grid spectrum allocation policies. IEEE Trans. Commun. 66(11), 5401–5414 (2018)
9. Gruber, P.: Convex and Discrete Geometry. Springer, Berlin, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71133-9
10. Krak, T., De Bock, J., Siebes, A.: Imprecise continuous-time Markov chains. Int. J. Approx. Reason. 88, 452–528 (2017)
11. Liu, X., Tang, T., He, D.: Double-layer network negative public opinion information propagation modeling based on continuous-time Markov chain. Comput. J. (2020)
12. Miranda, E., de Cooman, G.: Marginal extension in the theory of coherent lower previsions. Int. J. Approx. Reason. 46(1), 188–225 (2007). https://doi.org/10.1016/j.ijar.2006.12.009
13. Rottondi, C., Erreygers, A., Verticale, G., De Bock, J.: Modelling spectrum assignment in a two-service flexi-grid optical link with imprecise continuous-time Markov chains. In: DRCN 2017 - Design of Reliable Communication Networks; 13th International Conference, pp. 1–8. VDE (2017)
14. Škulj, D.: Perturbation bounds and degree of imprecision for uniquely convergent imprecise Markov chains. Linear Algebra Appl. 533, 336–356 (2017)


15. Troffaes, M., Gledhill, J., Škulj, D., Blake, S.: Using imprecise continuous time Markov chains for assessing the reliability of power networks with common cause failure and non-immediate repair. SIPTA (2015)
16. Troffaes, M., Krak, T., Bains, H.: Two-state imprecise Markov chains for statistical modelling of two-state non-Markovian processes. In: The Eleventh International Symposium on Imprecise Probabilities: Theories and Applications, vol. 103, pp. 394–403. PMLR (2019)
17. Škulj, D.: Discrete time Markov chains with interval probabilities. Int. J. Approx. Reason. 50(8), 1314–1329 (2009). https://doi.org/10.1016/j.ijar.2009.06.007
18. Škulj, D.: A classification of invariant distributions and convergence of imprecise Markov chains. Linear Algebra Appl. 439(9), 2542–2561 (2013). https://doi.org/10.1016/j.laa.2013.07.001. https://www.sciencedirect.com/science/article/pii/S0024379513004527
19. Škulj, D.: Efficient computation of the bounds of continuous time imprecise Markov chains. Appl. Math. Comput. 250, 165–180 (2015). https://doi.org/10.1016/j.amc.2014.10.092. http://www.sciencedirect.com/science/article/pii/S0096300314014672
20. Škulj, D.: Computing bounds for imprecise continuous-time Markov chains using normal cones. Preprint (2020). arXiv:2012.01029
21. Škulj, D., Hable, R.: Coefficients of ergodicity for Markov chains with uncertain parameters. Metrika 76(1), 107–133 (2013). https://doi.org/10.1007/s00184-011-0378-0
22. Walley, P.: Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London, New York (1991)

Simultaneous Sampling for Robust Markov Chain Monte Carlo Inference

Daniel Krpelik, Louis J. M. Aslett, and Frank P. A. Coolen

1 Introduction

Computations with sets of probability distributions can be helpful for sensitivity analysis in stochastic inference [3] or when imprecise models are used as an uncertainty model [9]. We will refer to this set of probability distributions as a credal set $\mathcal{M}$. Our aim will be to evaluate or estimate expected values of function(s) $f : X \to \mathbb{R}$ over the set of distributions in $\mathcal{M}$. For coherent imprecise models, that is, those that adhere to some rationality constraints [1], the set of expected values over this set is convex. This enables us to limit our focus solely to the lower and upper bounds over this set. We will denote these as the lower and upper expected values defined in Eq. (1):

$$\underline{E}f := \inf_{P \in \mathcal{M}} \int f \, dP \qquad \text{and} \qquad \overline{E}f := \sup_{P \in \mathcal{M}} \int f \, dP. \tag{1}$$

Coherency of the stochastic model will also ensure that these bounds are conjugate, in the sense that $\underline{E}f = -\overline{E}(-f)$.

Supported by H2020-MSCA-ITN-2016 UTOPIAE, GA 722734.

D. Krpelik: Durham University, Durham, UK; VSB-TUO, Ostrava, Czech Republic; e-mail: [email protected]
L. J. M. Aslett · F. P. A. Coolen: Durham University, Durham, UK

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Vasile, D. Quagliarella (eds.), Advances in Uncertainty Quantification and Optimization Under Uncertainty with Aerospace Applications, Space Technology Proceedings 8, https://doi.org/10.1007/978-3-030-80542-5_11


Computing expected values of stochastic models is generally an intractable problem. Common practice is to estimate them, most often with Monte Carlo methods [5]. Monte Carlo methods simulate random sampling procedures and use statistical techniques to estimate population parameters, here the theoretical expected values. For imprecise models, an extension of Importance Sampling Monte Carlo can be used [7, 8]. Importance Sampling constructs a set of samples from an ancillary distribution $P_Y$ and estimates expected values based on theoretical results for changing measures. Under some mild conditions, the expected value can be obtained as $Ef(X) = \int f(x)\,P_X(dx) = \int f(y)\,\frac{dP_X}{dP_Y}(y)\,P_Y(dy)$, where $\frac{dP_X}{dP_Y}$ is the Radon–Nikodym derivative. The extension for imprecise models is achieved by constructing common independent samples from $P_Y$ and estimating the lower expected value by subsequent minimization of $Ef(X)$ through varying $\frac{dP_X}{dP_Y}$ for $P_X \in \mathcal{M}$.

We will show how a different technique, Markov Chain Monte Carlo (MCMC), can be extended for inferences with imprecise probability models. After a short revision of MCMC in Sect. 2, we show how sampling methods in general can be extended to sets of target distributions in Sect. 3, and use this extension on MCMC in Sect. 4. Section 5 describes the emerging branching process, which is further analyzed for credal sets composed of distributions from the same exponential family in Sect. 6. In Sect. 7, we show how to represent the branching process within a computer algorithm. Finally, we briefly discuss how to decrease the computational demands that the branching process imposes in Sect. 8, and the procedure overall in Sect. 9.
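As a brief illustration of this imprecise importance-sampling idea, the following is a minimal sketch (not the authors' implementation): it draws one common sample from an ancillary distribution and minimizes the weighted estimate over a grid of rate parameters for a credal set of exponential distributions. The choice of family, the parameter range, and all names are assumptions made purely for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Credal set: exponential distributions with rate lambda in [1, 2].
lambdas = np.linspace(1.0, 2.0, 51)

# Ancillary (proposal) distribution P_Y: exponential with rate 1.5.
lam_y = 1.5
y = rng.exponential(scale=1.0 / lam_y, size=100_000)  # common samples from P_Y

f = lambda x: x**2  # function of interest

def weights(lam):
    # Radon-Nikodym derivative dP_X/dP_Y for exponential densities.
    return (lam * np.exp(-lam * y)) / (lam_y * np.exp(-lam_y * y))

# Importance-sampling estimate of E_P[f] for each P in the credal set,
# all based on the same common sample y.
estimates = np.array([np.mean(weights(lam) * f(y)) for lam in lambdas])

print(estimates.min(), estimates.max())  # exact bounds are 2/lambda^2: 0.5 and 2.0
```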

2 Markov Chain Monte Carlo

MCMC [4] is a class of methods that are used when we cannot sample from the target distribution directly. It constructs and subsequently samples a Markov chain $(X_i)_{i=1}^{\infty}$ whose stationary distribution is the target distribution. These samples will not be independent, unlike in the case of Importance Sampling, but the Law of Large Numbers and a variant of the Central Limit Theorem still hold, so the chain can be used to estimate expected values of interest. The chain is characterized by an initial distribution $P_0$ for $X_0$ and a transition kernel such that $X_i \sim P_{X_i \mid X_{i-1}} = K(X_{i-1}, \cdot)$. The transition operator $K$ is constructed so that the targeted distribution is its invariant, i.e., $P = KP$. The underlying idea is that, individually, each of the samples can be viewed as a sample from the target distribution. Nevertheless, because $P_0$ is generally different from the target distribution, the chain only converges toward the target distribution asymptotically. Still, the expected value of $f : X \to \mathbb{R}$ can be estimated from a finite sample trajectory $(X_i)_{i=k}^{k+n}$ by


$$\hat{E}f = \frac{1}{n}\sum_{i=k}^{k+n} f(X_i) \tag{2}$$

for $k, n$ both large enough. In this chapter, we will adhere to the common notation and represent samples of random variables by the respective lower-case letters. We aim to assess $E_P f$.

MCMC algorithms generally proceed as follows. Let $P$ be the target distribution and $K_P$ a Markov chain transition kernel with stationary distribution $P$. The initial position $x_0$ is sampled from some distribution $P_0$, and the sample trajectory $(x_0, \dots, x_i)$ is sequentially extended by sampling $X_{i+1}$ from $K_P(x_i, \cdot)$. A variant of Markov Chain Monte Carlo is the Metropolis–Hastings procedure [4, chapter 1], which constructs $K_P$ with the aid of an ancillary proposal kernel $Q$. Let the prefix $d\cdot$ represent the densities of the respective probability measures. A new position $x_{k+1}$ is constructed as follows:

1. Sample a proposal $x' \sim Q(x_k, \cdot)$.
2. Set $x_{k+1} = x'$ with probability $a_P(x_k, x')$, and $x_{k+1} = x_k$ with probability $1 - a_P(x_k, x')$,

where $a_P(x, x') := \min\left\{1, \frac{dP(x')\,dQ(x', x)}{dP(x)\,dQ(x, x')}\right\}$. The densities will further be denoted simply by lower-case letters, i.e., $q(x, \cdot) := dQ(x, \cdot)$.
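For concreteness, the following is a minimal sketch of this Metropolis–Hastings step for a single (precise) target; the standard normal target and the Gaussian random-walk proposal are chosen purely for illustration and do not appear in the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(x):
    # Unnormalized log-density of the target P (standard normal here).
    return -0.5 * x**2

def mh_chain(x0, n_steps, step=1.0):
    """Random-walk Metropolis-Hastings: the proposal is symmetric, so the
    q-ratio in the acceptance probability cancels."""
    xs = np.empty(n_steps)
    x = x0
    for k in range(n_steps):
        x_prop = x + step * rng.normal()          # proposal x' ~ Q(x_k, .)
        log_a = log_target(x_prop) - log_target(x)
        if np.log(rng.uniform()) < log_a:         # accept with probability a_P(x_k, x')
            x = x_prop
        xs[k] = x
    return xs

samples = mh_chain(x0=0.0, n_steps=50_000)
print(samples[5_000:].mean())  # estimate of E[X], close to 0
```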

3 Simultaneous Sampling

The proposed extension of MCMC for imprecise probability models lies in sampling a set of chains targeting all the distributions in some credal set $\mathcal{M}$. We will therefore first discuss how to sample from a set of distributions simultaneously. Let $\Gamma_P : A \to X$ be a mapping that transforms some ancillary random variable $A$ with known distribution $P_A$ into a random variable with distribution $P$. Then $\Gamma_P(A) \overset{d}{=} X \sim P$, which means that for functions $f : X \to \mathbb{R}$, $E_{P_A} f(\Gamma_P(A)) = E_P f(X)$. As an example, let $P$ be an exponential distribution with rate $\lambda$. We can choose the ancillary random variable to be $A \sim \mathrm{Unif}((0,1))$ and $\Gamma_P : [0,1] \to \mathbb{R} : \Gamma_P(A) = -\frac{\ln(1-A)}{\lambda}$. This $\Gamma_P$, which is exactly the quantile function of the respective exponential variable, satisfies $\Gamma_P(A) \sim \mathrm{Exp}(\lambda)$. This procedure is commonly referred to as the inverse-transform method [5, chapter 2].

In order to extend inverse-transform sampling to sets of distributions, we define $\Gamma_{\mathcal{M}} : A \to 2^X$ as the mapping that takes the union of all $\Gamma_P$, $P \in \mathcal{M}$. Hence, $\Gamma_{\mathcal{M}}(A)$ is a non-empty random set such that, by construction, $\forall P \in \mathcal{M}$, $\Gamma_P(A) \in \Gamma_{\mathcal{M}}(A)$ a.s. For integrable functions of interest, $f : X \to \mathbb{R}$, it will therefore also hold that $\forall P \in \mathcal{M} : f(\Gamma_P(A)) \in f(\Gamma_{\mathcal{M}}(A))$. We define the expected value of $f$ over $\mathcal{M}$ by the Aumann integral [2, 6]


$$E_{\mathcal{M}} f := Ef(\Gamma_{\mathcal{M}}(A)) = \{Ef(X) : X \in S(\Gamma_{\mathcal{M}}(A))\}, \tag{3}$$

where $S(\Gamma_{\mathcal{M}}(A))$ is the set of all integrable selectors of $\Gamma_{\mathcal{M}}(A)$ [6], i.e., all random variables $X$ such that $X \in \Gamma_{\mathcal{M}}(A)$ almost surely. Clearly, by construction, $\mathcal{M} \subset S(\Gamma_{\mathcal{M}}(A))$, so

$$\forall P \in \mathcal{M} : \inf E_{\mathcal{M}} f \le E_P f, \tag{4}$$

so we can use $\inf Ef(\Gamma_{\mathcal{M}}(A))$ as a lower bound for $\inf_{P \in \mathcal{M}} Ef(X) = \underline{E}f$. Now, in the case of independent sampling, an empirical approximation of $\Gamma_{\mathcal{M}}(A)$ and Eq. (4) could be used to compute estimates of this lower bound on the expected values.
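A minimal sketch of this simultaneous inverse-transform construction, under the same illustrative assumption of a credal set of exponential distributions with rates in [1, 2] (the grid and sample sizes are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(2)

lambdas = np.linspace(1.0, 2.0, 51)        # finite representation of the credal set M
a = rng.uniform(size=20_000)               # common ancillary samples A ~ Unif((0,1))

f = lambda x: x**2

# Gamma_M(a): for each common ancillary sample, the set of transformed values,
# one per distribution in M (rows: samples, columns: distributions).
x_set = -np.log(1.0 - a)[:, None] / lambdas[None, :]

# Conservative set-based lower bound:  E[ inf f(Gamma_M(A)) ]
set_lower = f(x_set).min(axis=1).mean()
# Per-distribution estimates, whose minimum/maximum bound the lower/upper expectation.
per_dist = f(x_set).mean(axis=0)
print(set_lower, per_dist.min(), per_dist.max())   # approx. 0.5, 0.5 and 2.0
```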

4 Markov Chain Monte Carlo for Imprecise Models

Here we describe how trajectories of several Markov chains can be constructed simultaneously. The algorithm is based on an extension of the Metropolis–Hastings procedure described in Sect. 2. A general propagation step can be rewritten with the aid of ancillary random variables that will later serve a similar purpose as in Sect. 3. Given that the last position of the chain targeting distribution $P$ is $x_k$:

1. Sample ancillary $U_Q \sim \mathrm{Unif}((0,1))$.
2. Construct a proposal step $x' = \Gamma_{Q(x_k,\cdot)}(u_Q)$.
3. Sample ancillary $U_A \sim \mathrm{Unif}((0,1))$.
4. Set $x_{k+1} = x'$ if $u_A < a_P(x_k, x')$, and $x_{k+1} = x_k$ otherwise.

Here $a_P$ is the Metropolis–Hastings acceptance probability as defined in Sect. 2, and $\Gamma_P$, together with the respective $U$, represents a mapping such that $\Gamma_P(U) \sim P$ as in Sect. 3. This whole operation can be summarized as $X_{k+1} = \Gamma_{K_P(X_k,\cdot)}(U_Q, U_A)$ and applied recursively to describe the whole Markov chain via ancillary variables. Denoting by $U_0$ an ancillary variable for sampling the initial position $X_0$,

$$X_0 = \Gamma_{P_0}(U_0) \tag{5}$$
$$X_k = \Gamma_{K_P(X_{k-1},\cdot)}(U_{Q_k}, U_{A_k}). \tag{6}$$

We further use a superscript $P$ to specify the chain target distribution, thus denoting by $X_k^P$ the $k$-th position of the Metropolis–Hastings chain targeting $P$. The ancillary variables can now be utilized for simultaneous sampling. Let $\mathcal{M}$ be a credal set, a set of probability distributions on some common space $X$. Define $\mathcal{X}_k = \{\Gamma_P^k(U_{Q_k}, U_{A_k}), P \in \mathcal{M}\}$ as the set of $k$-th positions of the respective Markov chains targeting the distributions $P \in \mathcal{M}$. Since $\mathcal{X}_k$ contains each individual sample, $\inf \mathcal{X}_k \le \Gamma_P^k(U_{Q_k}, U_{A_k}) = X_k^P$,


$$\frac{1}{n}\sum_{i=k}^{k+n} \inf f(\mathcal{X}_i) \le \inf_{P\in\mathcal{M}} \frac{1}{n}\sum_{i=k}^{k+n} f\big(\Gamma_P^i(U_{Q_i}, U_{A_i})\big) \tag{7}$$
$$= \inf_{P\in\mathcal{M}} \frac{1}{n}\sum_{i=k}^{k+n} f(X_i^P) = \inf_{P\in\mathcal{M}} \hat{E}_P f. \tag{8}$$

As in Sect. 3, we can use the sampled set process as a lower estimate of the lower expected value of interest. The procedure can be visualized as follows. First, represent $\mathcal{M}$ by $M \subset \mathbb{R}^d$, which is trivial if $\mathcal{M}$ is a set of parametrized distributions. For example, $M = [1, 2] \subset \mathbb{R}$ represents a set $\mathcal{M}$ of exponential distributions with rate parameters $\lambda \in M$. In order to keep an analogue of the Markov property for the constructed chain of sets, we define random elements $Y : M \to X$. These are random mappings representing the relation between the individual distributions $P \in \mathcal{M}$ (equivalently, the parameters in $M$) and the random samples $X^P$ related to them. In the notation introduced above, $\mathcal{X}_k = Y_k(M)$ is the collection of $k$-th sampled chain positions of all the distributions in $\mathcal{M}$. Figure 1 depicts a sample of $Y_k$ (blue line), proposed moves for each of the chains (orange line), evaluated acceptance criteria (red line below), and

Fig. 1 An iteration of iMCMC. A new step $x'$ (orange) is proposed based on the current positions of all the chains (blue) in $M$, the acceptance criterion $a(x, x')$ (red) is computed, and the chains are propagated (black dashed) based on the common random variable $U$ (green horizontal line). One of the sheaves, $M_2$, is split during this step


Fig. 2 An example trajectory of a set of Markov chains. Depicted in red are 5 individual chains. The green area represents the convex closure of the sets of positions of 1000 chains. The lower envelope is used to (under)estimate $\underline{E}X$

new positions of the chains $Y_{k+1}$ (dashed black line). In the example, some chains accepted their proposals and some rejected them. An example of the evolution of the set of chains is depicted in Fig. 2. In this example, we construct chains for an imprecise normal distribution with means $\mu \in [-2, 2]$. Therefore, $M = [-2, 2]$. The set-valued estimator (the left-hand side of Eq. (7)) gives a lower bound $\hat{\underline{E}}X$ of $-2.4 < -2 = \underline{E}X$.

5 Practical Implementation

In this section, we describe how we can, in practice, simultaneously propagate even uncountably many chains. We achieve this by propagating sets of pairs $(m, x)$, where $m \subset M$ and $x \in X$. Each such pair represents that all of the Markov chains targeting $P \in m$ are, at the given step, at the same position. Denote by $MC_k = \{(m_k^1, x_k^1), (m_k^2, x_k^2), \dots\}$ the set of these pairs at step $k$. In the introduced notation, $Y_k(y) = x \iff \exists\, (m, x) \in MC_k : y \in m$. At each step, we require that $\bigcup_i m_k^i = M$ in order to propagate chains for all $P \in \mathcal{M}$. Also, omitting the details on boundaries, we require that, for each $k$, each $P \in \mathcal{M}$ lies in exactly one of the $m_k^i$. For technical reasons, we will assume that the distributions in $\mathcal{M}$ have a non-empty intersection of their supports and that all chains start from a common initial point; thus, $MC_0 = \{(M, x_0)\}$. This requirement can be bypassed. We choose a common proposal kernel $Q$ for all $P \in \mathcal{M}$. The MCMC procedure described in Sect. 4 then proceeds similarly, only jointly for each of the pairs in $MC_k$. At each step, for each of the pairs, we construct a proposal position $x_i'$ by (pseudo-random) sampling from $Q(x_k^i, \cdot)$. This will depend


solely on the last position of the set of chains, $x_k^i$, and hence be common to all $P \in m_k^i$, because we construct $X_i'$ as a transformation of some shared ancillary variable $U_Q$. The ancillary decision variable $U_A$ is again common to all chains, so one of three situations may occur. Either:

1. $\forall P \in m_k^i : u_A < a_P(x_k^i, x_i')$;
2. $\forall P \in m_k^i : u_A \ge a_P(x_k^i, x_i')$; or
3. $\exists P_1, P_2 \in m_k^i : u_A < a_{P_1}(x_k^i, x_i') \wedge u_A \ge a_{P_2}(x_k^i, x_i')$.

In cases 1 and 2, the whole pair $(m_k^i, x_{k+1}^i)$ will be included in $MC_{k+1}$. In the latter case, we will instead include two new pairs into $MC_{k+1}$, $(m_k^{i+}, x_i')$ and $(m_k^{i-}, x_k^i)$, based on which $P \in m_k^i$ accept or reject the proposal. We refer to the latter situation as branching. The method is indicated in Fig. 1. In the figure, we can see 4 distinct pairs $(m_k^i, x_k^i)$, common proposals $x_i'$ for each of the pairs, and a split occurring for $m_k^2$. $MC_{k+1}$ therefore equals $\{(m_k^1, x_1'), (m_k^{2+}, x_2'), (m_k^{2-}, x_k^2), (m_k^3, x_k^3), (m_k^4, x_k^4)\}$.
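The following is a minimal sketch of one such joint propagation step over a finite grid of parameters, written only to make the pair/branching bookkeeping concrete; the list-of-arrays representation, the normal target family, and all names are assumptions of this example rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_density(x, mu):
    # Target densities: normal with imprecise mean mu (unit variance).
    return -0.5 * (x - mu) ** 2

def propagate(pairs, step=1.0):
    """One joint step. `pairs` is a list of (params, position), where `params`
    is an array of parameter values whose chains share the same position."""
    u_q = rng.normal()      # shared ancillary variable driving the proposals
    u_a = rng.uniform()     # shared ancillary acceptance variable
    new_pairs = []
    for params, x in pairs:
        x_prop = x + step * u_q                       # common proposal for the pair
        accept_prob = np.exp(np.minimum(
            0.0, log_density(x_prop, params) - log_density(x, params)))
        accept = u_a < accept_prob                    # per-parameter decision
        if accept.all():
            new_pairs.append((params, x_prop))
        elif not accept.any():
            new_pairs.append((params, x))
        else:                                         # branching: split the pair
            new_pairs.append((params[accept], x_prop))
            new_pairs.append((params[~accept], x))
    return new_pairs

pairs = [(np.linspace(-2.0, 2.0, 1000), 0.0)]         # MC_0 = {(M, x_0)}
for _ in range(100):
    pairs = propagate(pairs)
print(len(pairs), min(x for _, x in pairs))
```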

6 Linear Representation for Exponential Families

Splitting of credal sets is generally an intractable task, since we would need not only to check uncountably many conditions, but also to represent an arbitrary partition of $m_k^i$. Nevertheless, in certain cases, this can be done efficiently. Assume that the credal set $\mathcal{M}$ is a set of distributions from the same exponential family. Their densities can therefore be represented in natural form as

$$p(x|\eta) = h(x)\,g(\eta)\exp\big(\eta T(x)\big). \tag{9}$$

Inserting Eq. (9) into the criterion for accepting proposal $x'$, we obtain

$$u_A < a_\eta(x, x') = \frac{h(x')\,g(\eta)\exp\big(\eta T(x')\big)\,q(x', x)}{h(x)\,g(\eta)\exp\big(\eta T(x)\big)\,q(x, x')} \tag{10}$$
$$= \frac{h(x')\,q(x', x)}{h(x)\,q(x, x')}\exp\big(\eta\,(T(x') - T(x))\big) \tag{11}$$
$$\ln(u_A) - \ln\frac{h(x')\,q(x', x)}{h(x)\,q(x, x')} < \eta\,(T(x') - T(x)), \tag{12}$$

which is a linear condition in $\eta$ if we represent the credal set in the space of natural parameters as $\mathcal{M} = \{P_\eta : \eta \in M\}$, where $M \subset \mathbb{R}^d$. A similar simplification may also occur for some random vectors, e.g., independent random variables. A special case consists of hierarchical models where the imprecision is in variables without parents, such as Bayesian inference with imprecise


prior distributions. Take an arbitrary, reasonable, parametric likelihood function and model the prior knowledge about its parameters with an exponential-family density $p_0(\theta|\eta)$. The prior credal set is hence $\mathcal{M}_0 = \{p_{0\eta}, \eta \in M_0\}$, where $M_0$ is the set that contains all the plausible values of the hyperparameters $\eta$. In the light of observation $x$, the Bayesian posterior is

$$p_\eta(\theta|x) = \frac{L(x|\theta)\,p_0(\theta|\eta)}{\int L(x|\theta)\,p_0(\theta|\eta)\,d\theta}. \tag{13}$$

Plugging Eq. (13) into the acceptance ratio for a proposal position (denoted $x'$ above) $\tau$:

$$a(\theta, \tau) = \frac{p_\eta(\tau|x)\,q(\tau, \theta)}{p_\eta(\theta|x)\,q(\theta, \tau)} \tag{14}$$
$$= \frac{q(\tau, \theta)}{q(\theta, \tau)}\,\frac{L(x|\tau)\,p_0(\tau|\eta)\big/\!\int L(x|\tau)\,p_0(\tau|\eta)\,d\tau}{L(x|\theta)\,p_0(\theta|\eta)\big/\!\int L(x|\theta)\,p_0(\theta|\eta)\,d\theta} \tag{15}$$
$$= l(x, \theta, \tau)\,\frac{q(\tau, \theta)}{q(\theta, \tau)}\,\frac{p_0(\tau|\eta)}{p_0(\theta|\eta)}, \tag{16}$$

where $l(x, \theta, \tau) = \frac{L(x|\tau)}{L(x|\theta)}$ is a real-valued function independent of $\eta$. Since $p_0$ is from the exponential family, Eq. (16) can be transformed into a scaled version of Eq. (11); hence, the splitting boundary remains linear, and $l(x, \theta, \tau)$ affects only the offset of the splitting hyperplane. Another advantage of the finite implementation is that $\mathcal{X}_k$ will remain finite. This enables us to evaluate $\inf f(\mathcal{X}_k)$ through comparison of all elements instead of solving an intractable optimization problem.

7 Computer Representation of the Credal Sets

Using sets of distributions from the exponential family allowed us to represent the splitting conditions by hyperplanes for a wide class of models. Nevertheless, tracking the branches themselves during the progression of the algorithm could still be intractable. In order to facilitate this task, we will represent the credal sets as sets of inequalities. This poses some restrictions on $M$, since it also needs to be represented in this way. Assume that the imprecise model allows linear splitting as described in Sect. 6. Denote by $C(M)$ a set of linear inequality constraints $c_j^T \eta > b_j$, such that $M = \{\eta : C\eta \succ b\}$, where $\succ$ represents component-wise vector dominance. For the sake of compact representation, we rewrite Eq. (12) as $b(u_A, x, x') < s(x, x')^T \eta$. At each branching occurrence, the newly created sets $m_k^+, m_k^-$ can be represented as new sets of inequalities, such that


$$C(m_k^+) = C(m_k) \cup \{s(x_k, x_k')^T \eta > b(u_A, x_k, x_k')\}, \tag{17}$$
$$C(m_k^-) = C(m_k) \cup \{-s(x_k, x_k')^T \eta > -b(u_A, x_k, x_k')\}, \tag{18}$$

while omitting the boundary cases again. Branching occurs if $\exists \eta_1, \eta_2 : s(x, x')^T \eta_1 > b(u_A, x, x') \wedge s(x, x')^T \eta_2 < b(u_A, x, x')$. This can be checked by solving two linear programs. Define

$$\underline{b} := \min_{\eta \in m}\, s(x, x')^T \eta, \tag{19}$$
$$\overline{b} := \max_{\eta \in m}\, s(x, x')^T \eta = -\min_{\eta \in m}\big({-s(x, x')^T \eta}\big). \tag{20}$$

The condition $\eta \in m$ represents the set of linear inequality constraints $C(m)$. Clearly, $\underline{b}$ and $\overline{b}$ are the achievable bounds on $s(x, x')^T \eta$ over $m$, so branching occurs iff $b(u_A, x, x') \in (\underline{b}, \overline{b})$. If $b(u_A, x, x') < \underline{b}$, then all $\eta \in m$ accept the proposal $x'$. If $b(u_A, x, x') > \overline{b}$, then all $\eta \in m$ reject it. This procedure adds a new constraint at each branching occasion, which might eventually make some of the constraints redundant. Whether a constraint is redundant and can be removed from $C(m)$ can be checked every couple of iterations in order to decrease the size of the involved linear programs. This can be done for each constraint $c_j^T \eta > b_j$ in $C(m)$ by either:

– Solving $\min_{\eta \in m} c_j^T \eta$. If the solution is greater than $b_j$, then the constraint is redundant.
– Solving a dual problem to $\min_{\eta \in m} c_j^T \eta$. If constraint $j$ is inactive, it is redundant.
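As a minimal sketch of this branching test (not the authors' code), the check of Eqs. (19)–(20) can be phrased as two small linear programs; here the constraint set is stored as a matrix/vector pair and SciPy's linprog is used, with the strict inequalities relaxed to non-strict ones as a simplifying assumption.

```python
import numpy as np
from scipy.optimize import linprog

def branching_check(s, b, C, d):
    """Decide how the pair with constraint set {eta : C @ eta <= d} reacts to a
    proposal with splitting direction s and offset b (cf. Eqs. (19)-(20)).
    Returns 'accept', 'reject', or 'branch'."""
    free = [(None, None)] * len(s)
    lo = linprog(c=s, A_ub=C, b_ub=d, bounds=free)     # b_low = min s^T eta
    up = linprog(c=-s, A_ub=C, b_ub=d, bounds=free)    # b_up  = -min (-s)^T eta
    b_low, b_up = lo.fun, -up.fun
    if b < b_low:
        return "accept"      # all eta in m accept the proposal
    if b > b_up:
        return "reject"      # all eta in m reject it
    return "branch"          # b lies strictly between the bounds: split the pair

# Example: natural-parameter box [1, 2] x [0, 3] encoded as C eta <= d.
C = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
d = np.array([-1.0, 2.0, 0.0, 3.0])
print(branching_check(s=np.array([1.0, -0.5]), b=0.2, C=C, d=d))  # -> 'branch'
```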

8 Credal Set Merging

The branching emerging from the above-mentioned procedures would produce an ever-increasing number of partitions of $M$ and become intractable for the construction of long chains. In this section, we show how to partially counter this tendency through a special choice of the proposal kernel $Q$:

$$q_m(x'|x) = (1 - \alpha)\,q_{rw}(x'|x) + \alpha\, q(x'). \tag{21}$$

If the proposal kernel is independent of the last position of the chain, it proposes the same position even for chains that have diverged, and with positive probability all of them will accept this proposal. On such an occasion, previously branched chains coalesce. This effect can also be achieved by introducing the independent proposal distribution as one component of a proposal mixture (Eq. (21)), together with the standard random-walk kernel $q_{rw}$.
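A minimal sketch of drawing from such a mixture proposal follows; the random-walk step size, the independent proposal, and the value of alpha are arbitrary choices for illustration (in the simultaneous setting, the draw would be driven by a common ancillary variable shared across all pairs).

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_mixture_proposal(x, alpha=0.1, step=1.0):
    """Draw x' from q_m(.|x) = (1 - alpha) q_rw(.|x) + alpha q(.), cf. Eq. (21)."""
    if rng.uniform() < alpha:
        return rng.normal(0.0, 3.0)   # independent component q, identical for every chain
    return x + step * rng.normal()    # random-walk component q_rw centred at x

print([round(sample_mixture_proposal(5.0), 2) for _ in range(5)])
```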


Fig. 3 An example of a typical evolution of the number of credal set partitions for a run of an ensemble model with |M| = 1000 for sampling from an imprecise normal distribution with set mean and variance; so dim(η) = 2

Occasional merging of the split branches allows us to decrease the computational demands of the procedure. If merging is allowed, we need to adjust our implementation to keep track of the individual chains, so that $\underline{E}f$ can be estimated by the minimum of the MCMC estimates over $\mathcal{M}$, as on the right-hand side of Eq. (7). In the uncountable case, an expensive post hoc analysis is required to untangle all possible trajectories of the chains and obtain theoretically tighter bounds than the set representation on the left-hand side of Eq. (7) provides. Random-set-induced bounds on $\underline{E}f$ are therefore used in practice. Nevertheless, the branching tendency for uncountable credal sets still leads to exponential growth of the number of credal set partitions. Although merging is frequent, it cannot be guaranteed that the number of partitions will remain within the limits imposed by our hardware. A pragmatic solution to this problem is to limit ourselves to calculations with finite credal sets. For an infinite one, it is possible to create a finite subset and use the resulting estimators to estimate the lower expectation over the original set. A combination of a dense finite subset of the original credal set with the above-mentioned methods can counter both the computational cost of tending to a large number of chains individually and the pathological excessive branching. The evolution of the number of branches for $|M| = 1000$ for a simple model of an imprecise normal distribution is depicted in Fig. 3. During the evolution, the number of active branches remains a fraction of the original credal set size. This indicates that similar ensemble techniques might play an essential role in a practical implementation of the algorithm.


9 Discussion

We have introduced several steps toward Markov Chain Monte Carlo methods for imprecise probabilities based on an extension of the Metropolis–Hastings algorithm. The procedure can be used to obtain conservative bounds on the extremes of MCMC estimators arising from Markov chains targeting the individual distributions in the credal set. For a class of problems, Sect. 6 showed how to design a numerically tractable representation of the set of individual chains even for uncountable credal sets. Nevertheless, we judge that the methodology is still not ready for practical use. For uncountable credal sets, the exact procedure leads to a high demand on computational resources through excessive branching of the chains. Although this tendency can be limited to some extent, as described in Sect. 8, it cannot be guaranteed that these requirements will remain bounded. A heuristic approach was proposed in Sect. 8 based on an approximation of infinite credal sets by an ensemble of distributions. Preliminary results indicate that this could limit the computational demands at the cost of introducing additional error into the estimates. No theoretical analysis of the asymptotic properties of the introduced set-valued Markov chains was provided here. As was discussed in Sect. 4, the procedure in practice constructs a Markov chain $(Y_k)$ of mappings $M \to X$ with an unspecified structure. Properties of the associated random sets $\mathcal{X}_k = Y_k(M)$ might be more accessible.

Acknowledgments The project is supported by ERDF "A Research Platform focused on Industry 4.0 and Robotics in Ostrava Agglomeration", No. CZ.02.1.01/0.0/0.0/17_049/0008425.

References

1. Augustin, T., Coolen, F.P., De Cooman, G., Troffaes, M.C. (eds.): Introduction to Imprecise Probabilities. Wiley Series in Probability and Statistics. Wiley, New York (2014)
2. Aumann, R.J.: Integrals of set-valued functions. J. Math. Anal. Appl. 12, 1–12 (1965)
3. Berger, J.O.: An overview of robust Bayesian analysis. Test 3, 5–124 (1994)
4. Brooks, S., Gelman, A., Jones, G., Meng, X.L. (eds.): Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC, New York (2011)
5. Kroese, D.P., Taimre, T., Botev, Z.I. (eds.): Handbook of Monte Carlo Methods. Wiley Series in Probability and Statistics. Wiley, New York (2011)
6. Nguyen, H.T.: An Introduction to Random Sets. Chapman and Hall/CRC, New York (2006)
7. O'Neill, B.: Importance sampling for Bayesian sensitivity analysis. Int. J. Approx. Reason. 50, 270–278 (2009)
8. Troffaes, M.C.: Imprecise Monte Carlo simulation and iterative importance sampling for the estimation of lower previsions. Int. J. Approx. Reason. 101, 31–48 (2018)
9. Walley, P.: Statistical Reasoning with Imprecise Probabilities. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis, London (1991)

Computing Expected Hitting Times for Imprecise Markov Chains

Thomas Krak

1 Introduction

We study the non-negative solutions of, and in particular a numerical method for non-negatively solving, the (possibly) non-linear system

$$h = I_{A^c} + I_{A^c} \cdot \underline{T} h, \tag{1}$$

where $A$ is a non-empty strict subset of a non-empty finite set $X$, $A^c := X \setminus A$, $\mathbb{R}^X$ is the set of all maps from $X$ to $\mathbb{R}$, $I_{A^c} \in \mathbb{R}^X$ is the indicator of $A^c$, defined as $I_{A^c}(x) := 1$ if $x \in A^c$ and $I_{A^c}(x) := 0$ otherwise, $h \in \mathbb{R}^X$ is the vector that solves the system, and $\underline{T} : \mathbb{R}^X \to \mathbb{R}^X$ is a (possibly) non-linear map that satisfies the coherence conditions [8]

C1. $\underline{T}(\alpha f) = \alpha \underline{T} f$ for all $f \in \mathbb{R}^X$ and $\alpha \in \mathbb{R}_{\ge 0}$; (non-negative homogeneity)
C2. $\underline{T} f + \underline{T} g \le \underline{T}(f + g)$ for all $f, g \in \mathbb{R}^X$; (super-additivity)
C3. $\min_{x \in X} f(x) \le \underline{T} f$ for all $f \in \mathbb{R}^X$. (lower bounds)

Note that in Eq. (1), the operation · denotes element-wise multiplication, here applied to the functions IAc and T h. The motivation for this problem comes from the study of Markov chains using the theory of imprecise probabilities [1, 22]. In this setting, X is interpreted as the

Supported by H2020-MSCA-ITN-2016 UTOPIAE, grant agreement 722734.

T. Krak: ELIS – FLip, Ghent University, Ghent, Belgium; e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Vasile, D. Quagliarella (eds.), Advances in Uncertainty Quantification and Optimization Under Uncertainty with Aerospace Applications, Space Technology Proceedings 8, https://doi.org/10.1007/978-3-030-80542-5_12


set of possible states that some dynamical system of interest can be in, and $\underline{T}$ is the lower transition operator, which represents the (imprecise) probability model for switching from any state $x$ to any state $y$ in a single time step, e.g.,

$$\big[\underline{T} I_{\{y\}}\big](x) = \underline{P}(X_{n+1} = y \mid X_n = x), \tag{2}$$

where $\underline{P}(X_{n+1} = y \mid X_n = x)$ denotes the lower probability, i.e., a tight lower bound on the probability, that the system will be in state $y$ at time $n+1$ if it is in state $x$ at time $n$. This lower bound is understood to be taken with respect to the set of stochastic processes induced by $\underline{T}$, and it is this set that is called the corresponding imprecise Markov chain. We refer to [3, 7–9, 12, 13, 15, 18, 20] for further general information on Markov chains with imprecise probabilities.

In this setting, the minimal non-negative solution $\underline{h}$ to Eq. (1) is such that $\underline{h}(x)$ can be interpreted as a tight lower bound on the expected number of steps it will take the system to move from the state $x$ to any state contained in $A$. This vector $\underline{h}$ is called the (lower) expected hitting time of $A$ for the imprecise Markov chain characterized by $\underline{T}$. A version of this characterization of the expected hitting times for imprecise Markov chains was first published by De Cooman et al. [6] and was later generalized to Markov chains under various imprecise probabilistic interpretations by Krak et al. [16].

In the special case that $\underline{T}$ is a linear map, i.e., when C2 is satisfied with equality for all $f, g \in \mathbb{R}^X$, then $\underline{T}$ can be interpreted as a $|X| \times |X|$ matrix $T$, which is called the transition matrix in the context of (classical) Markov chains. This $T$ is row-stochastic, meaning that $\sum_{y \in X} T(x, y) = 1$ for all $x \in X$ and $T(x, y) \ge 0$ for all $x, y \in X$, and it encodes the probability of moving between states in one step, i.e., $T(x, y) = P(X_{n+1} = y \mid X_n = x)$, in analogy to Eq. (2). In this case, the minimal non-negative solution $\underline{h}$ to Eq. (1) is well known to represent the expected hitting time of $A$ for the homogeneous Markov chain identified by $T$; see, e.g., [19] for details.

Before moving on, let us first consider why the problem is somewhat ill-posed.

Proposition 1 ([16]) Let $\underline{T} : \mathbb{R}^X \to \mathbb{R}^X$ be a map that satisfies the coherence conditions C1–C3. Then Eq. (1) has a (unique) minimal non-negative solution $\underline{h}$ in $(\mathbb{R} \cup \{+\infty\})^X$, where minimality means that $\underline{h}(x) \le h(x)$ for all $x \in X$ and all non-negative $h \in (\mathbb{R} \cup \{+\infty\})^X$ that solve Eq. (1).

So, under our present assumptions, the solution is not necessarily unique (although there is a unique minimal solution), and the solution can be infinite-valued. It can be shown that in particular $\underline{h}(x) = +\infty$ if and only if moving from $x$ to $A$ in a finite number of steps has upper probability zero. Conjugate to the notion of lower probabilities, this upper probability is a tight upper bound on the probability that


some event will happen. To exclude this case from our analysis, we will in the sequel assume that $\underline{T}$ also satisfies the reachability condition

R1. For all $x \in A^c$, there is some $n_x \in \mathbb{N}$ such that $\big[\underline{T}^{n_x} I_A\big](x) > 0$.

(Here and in what follows, we take $\mathbb{N}$ to be the set of natural numbers without zero.) Verifying whether a given map $\underline{T}$ satisfies R1 can be done, e.g., using the algorithm for checking lower reachability described in [5], simply replacing the map $Q$ from that work with the map $\underline{T}$. As that author notes, this algorithm takes $O(|X|^3)$ time to verify whether $\underline{T}$ satisfies R1. We refer to Sect. 4 below for details on how $\underline{T}$ may be evaluated in practice. In any case, we are now ready to state our first main result.

Proposition 2 Let $\underline{T} : \mathbb{R}^X \to \mathbb{R}^X$ be a map that satisfies the coherence conditions C1–C3 and the reachability condition R1. Then Eq. (1) has a unique solution $h$ in $\mathbb{R}^X$, and this solution is non-negative.

The proof of this result requires some setup and is given in Sect. 2. For now, we note that the unique solution $h$ in $\mathbb{R}^X$ that Proposition 2 talks about is also the (unique) minimal non-negative solution in $(\mathbb{R} \cup \{+\infty\})^X$; hence, in particular, we can apply the interpretation of $h$ as the vector of lower expected hitting times for an imprecise Markov chain, as discussed above. The following formalizes this.

Corollary 1 Let $\underline{T} : \mathbb{R}^X \to \mathbb{R}^X$ be a map that satisfies the coherence conditions C1–C3. If Eq. (1) has a unique solution $h$ in $\mathbb{R}^X$, and if $h$ is non-negative, then $h$ is the minimal non-negative solution of Eq. (1) in $(\mathbb{R} \cup \{+\infty\})^X$.

Proof Let $g \in (\mathbb{R} \cup \{+\infty\})^X$ be the (unique) minimal non-negative solution of Eq. (1), whose existence is guaranteed by Proposition 1. Then $g(x) \le h(x)$ for all $x \in X$, because $g$ is minimal and because $h$ is a non-negative solution to Eq. (1). Because $h \in \mathbb{R}^X$, this implies that $g(x) \in \mathbb{R}$ for all $x \in X$, whence $g \in \mathbb{R}^X$, and therefore it follows that $g = h$, because $h$ is the unique solution of Eq. (1) in $\mathbb{R}^X$. Hence, and because $g$ is the minimal non-negative solution of Eq. (1) in $(\mathbb{R} \cup \{+\infty\})^X$, so is $h$. □

Having established the existence of a unique real-valued solution to the system, our second main result is a numerical method for computing this solution. For comparison, in [16] the authors present an iterative method that can be directly applied as a computational tool (see Proposition 3 in Sect. 2), but which is only asymptotically exact and whose runtime scales with $\|\underline{h}\|_\infty$, making it impractical when (some of) the expected hitting times are numerically large. In contrast, the novel method that we present in Sect. 3 is independent of $\|\underline{h}\|_\infty$, and we show that it converges to the correct solution in a finite number of steps under practically realistic assumptions on $\underline{T}$ (but ignoring numerical issues with finite-precision implementations).


2 Existence of Solutions

The space $\mathbb{R}^X$ is endowed with the supremum norm, i.e., $\|f\| := \|f\|_\infty := \max_{x \in X} |f(x)|$ for all $f \in \mathbb{R}^X$. Mappings $M : f \mapsto Mf$ from $\mathbb{R}^X$ to $\mathbb{R}^X$ receive the induced operator norm $\|M\| := \sup\{\|Mf\| : f \in \mathbb{R}^X, \|f\| \le 1\}$. Such a map $M$ is called bounded if it maps (norm-)bounded sets to (norm-)bounded sets; if the map is non-negatively homogeneous (i.e., if it satisfies Property C1), then it is bounded if and only if $\|M\| < +\infty$. Note that this includes, as a special case, that linear maps are bounded if and only if their norm is bounded. An element $f \in \mathbb{R}^X$ that is identically equal to some $\mu \in \mathbb{R}$, i.e., $f(x) = \mu$ for all $x \in X$, is simply written as $\mu \in \mathbb{R}^X$. For any $f, g \in \mathbb{R}^X$, we take $f \le g$ to mean that $f(x) \le g(x)$ for all $x \in X$. For any $B \subseteq X$, we define the indicator $I_B \in \mathbb{R}^X$ of $B$, for all $x \in X$, as $I_B(x) := 1$ if $x \in B$ and $I_B(x) := 0$ otherwise.

For a given map $\underline{T}$ that satisfies C1–C3, we introduce the conjugate map $\overline{T} : \mathbb{R}^X \to \mathbb{R}^X$ that is defined, for all $f \in \mathbb{R}^X$, as $\overline{T} f := -\underline{T}(-f)$. It is easily verified that this map satisfies the (conjugate) coherence conditions

UC1. $\overline{T}(\alpha f) = \alpha \overline{T} f$ for all $f \in \mathbb{R}^X$ and $\alpha \in \mathbb{R}_{\ge 0}$; (non-negative homogeneity)
UC2. $\overline{T}(f + g) \le \overline{T} f + \overline{T} g$ for all $f, g \in \mathbb{R}^X$; (sub-additivity)
UC3. $\overline{T} f \le \max_{x \in X} f(x)$ for all $f \in \mathbb{R}^X$. (upper bounds)

This map $\overline{T}$ is known as the upper transition operator in the context of imprecise Markov chains and yields, in analogy to Eq. (2), a tight upper bound on the probability of moving between states. Note that the map $\underline{T}$ is a linear map if and only if $\underline{T} f = \overline{T} f$ for all $f \in \mathbb{R}^X$. $\overline{T}$ gives rise to the system

$$\overline{h} = I_{A^c} + I_{A^c} \cdot \overline{T}\, \overline{h}, \tag{3}$$

whose minimal non-negative solution $\overline{h}$ can be interpreted as a tight upper bound on the expected hitting times of a Markov chain with imprecise probabilities. It follows that in particular $\underline{h} \le \overline{h}$; we refer to [6, 16] for further details.

The following properties are well-known to hold for maps that satisfy C1–C3; we state them here for convenience. Reference [17] happens to contain all of these, but most of the references on imprecise Markov chains state at least some of them. Here and in what follows, for any $n \in \mathbb{N}$, we use $M^n$ to denote the $n$-fold composition of a map $M : \mathbb{R}^X \to \mathbb{R}^X$ with itself.

Lemma 1 Let $\underline{T} : \mathbb{R}^X \to \mathbb{R}^X$ be a map that satisfies C1–C3, and let $\overline{T}$ be its conjugate map. Then, for all $f, g \in \mathbb{R}^X$ and all $n \in \mathbb{N}$, it holds that

T1. $\min_{x \in X} f(x) \le \underline{T}^n f \le \overline{T}^n f \le \max_{x \in X} f(x)$; (bounds)
T2. $f \le g \Rightarrow \underline{T}^n f \le \underline{T}^n g$; (monotonicity)
T3. $\underline{T}^n(f + \mu) = \underline{T}^n f + \mu$ for all constant $\mu \in \mathbb{R}$; (constant additivity)
T4. $\|\underline{T}^n f - \underline{T}^n g\| \le \|f - g\|$. (non-expansiveness)


Corollary 2 Let $\underline{T} : \mathbb{R}^X \to \mathbb{R}^X$ be a map that satisfies C1–C3. Then $\underline{T}$ is bounded.

Proof Let $f \in \mathbb{R}^X$ be such that $\|f\| \le 1$, and fix any $x \in X$. It then follows from Property T1 and the definition of $\|f\|$ that $-\|f\| \le [\underline{T} f](x) \le \|f\|$, and hence, $|[\underline{T} f](x)| \le \|f\| \le 1$. Because this is true for any $x \in X$, it follows that $\|\underline{T} f\| \le 1$. Because this is true for any $f \in \mathbb{R}^X$ with $\|f\| \le 1$, it follows from the definition of the operator norm that $\|\underline{T}\| \le 1$, whence $\underline{T}$ is bounded. □

The following result provides an iterative method to find the minimal non-negative solution(s) to the, in general, non-linear systems (1) and (3).

Proposition 3 ([16]) Let $\underline{T} : \mathbb{R}^X \to \mathbb{R}^X$ be a map that satisfies C1–C3, and let $\overline{T}$ be its conjugate map. Define $h_0 := \overline{h}_0 := I_{A^c}$ and, for all $n \in \mathbb{N}$, let $h_n := I_{A^c} + I_{A^c} \cdot \underline{T} h_{n-1}$ and $\overline{h}_n := I_{A^c} + I_{A^c} \cdot \overline{T}\, \overline{h}_{n-1}$. Then $h_* := \lim_{n\to+\infty} h_n$ and $\overline{h}_* := \lim_{n\to+\infty} \overline{h}_n$ exist in $(\mathbb{R} \cup \{+\infty\})^X$. Moreover, $h_*$ is the minimal non-negative solution to Eq. (1), and $\overline{h}_*$ is the minimal non-negative solution to Eq. (3).

Proof This follows from [16, Proposition 10, Theorem 12, and Corollary 13]. □
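To make the iterative scheme of Proposition 3 concrete, here is a minimal numerical sketch for the lower expected hitting time. The small three-state model, the representation of the lower transition operator as a pointwise minimum over two transition matrices, and the stopping tolerance are all assumptions made purely for illustration.

```python
import numpy as np

# Toy model: states {0, 1, 2}, target set A = {2}, so A^c = {0, 1}.
# The lower transition operator is represented as a pointwise minimum over a
# finite set of row-stochastic transition matrices (cf. the dual representation).
T_matrices = [
    np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.0, 0.0, 1.0]]),
    np.array([[0.5, 0.4, 0.1],
              [0.3, 0.5, 0.2],
              [0.0, 0.0, 1.0]]),
]

def lower_T(f):
    # [T f](x) = minimum over the matrices of (M f)(x), separately for each state x.
    return np.min([M @ f for M in T_matrices], axis=0)

I_Ac = np.array([1.0, 1.0, 0.0])   # indicator of A^c

# Iteration of Proposition 3: h_n = I_Ac + I_Ac * T h_{n-1}, started at h_0 = I_Ac.
h = I_Ac.copy()
for _ in range(10_000):
    h_new = I_Ac + I_Ac * lower_T(h)
    if np.max(np.abs(h_new - h)) < 1e-12:
        break
    h = h_new

print(h)   # lower expected hitting times of A; h(2) = 0 by construction
```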

The scheme proposed in Proposition 3 can also be directly applied as a computational method, the complexity of which we will analyze and compare to our proposed method in Sect. 4.

We need the following, somewhat abstract, definition that is often encountered in the context of Markov chains with imprecise probabilities.

Definition 1 Let $\mathcal{T}$ be a set of maps from $\mathbb{R}^X$ to $\mathbb{R}^X$. For any map $T : \mathbb{R}^X \to \mathbb{R}^X$ and any $x \in X$, let $T_x : \mathbb{R}^X \to \mathbb{R} : f \mapsto [Tf](x)$. Then we say that $\mathcal{T}$ has separately specified rows if $\mathcal{T} = \big\{T : \mathbb{R}^X \to \mathbb{R}^X \,\big|\, \forall x \in X : T_x \in \mathcal{T}_x\big\}$, where, for all $x \in X$, $\mathcal{T}_x := \{T_x \mid T \in \mathcal{T}\}$.

This definition is needed to obtain the following well-known dual representation of $\underline{T}$, of which we will make extensive use throughout the remainder of this work. Because the result is well-known (see, e.g., [9, 13]), we omit the proof here.

Proposition 4 Let $\underline{T} : \mathbb{R}^X \to \mathbb{R}^X$ be a map that satisfies C1–C3. Then there is a unique non-empty, closed, and convex set $\mathcal{T}$ that has separately specified rows, such that each $T \in \mathcal{T}$ is a linear map from $\mathbb{R}^X$ to $\mathbb{R}^X$ that satisfies C1–C3, and such that, for all $f \in \mathbb{R}^X$, $\underline{T} f = \inf_{T \in \mathcal{T}} Tf$. Moreover, for all $f \in \mathbb{R}^X$, there is some $T \in \mathcal{T}$ such that $\underline{T} f = Tf$.


Corollary 3 Let $\underline{T} : \mathbb{R}^X \to \mathbb{R}^X$ be a map that satisfies C1–C3, let $\overline{T}$ be its conjugate map, and let $\mathcal{T}$ be its dual representation. Then for all $f \in \mathbb{R}^X$, it holds that $\overline{T} f = \sup_{T \in \mathcal{T}} Tf$, and $\overline{T} f = Tf$ for some $T \in \mathcal{T}$.

We note that, because the elements $T \in \mathcal{T}$ of the dual representation $\mathcal{T}$ of $\underline{T}$ are linear maps from $\mathbb{R}^X$ to $\mathbb{R}^X$ that satisfy C1–C3, the minimal non-negative solutions $h_T$ of the linear system $h_T = I_{A^c} + I_{A^c} \cdot T h_T$ exist due to Proposition 1. As established in Sect. 1, these $h_T$ can be interpreted as the vectors of expected hitting times of Markov chains parameterized by $T \in \mathcal{T}$. The next result is established in [16] and formalizes the interpretation of the solutions of the systems (1) and (3) as providing bounds on the expected hitting times for the set of Markov chains induced by the elements $T$ of $\mathcal{T}$.

Proposition 5 ([16]) Let $\underline{T} : \mathbb{R}^X \to \mathbb{R}^X$ be a map that satisfies C1–C3, let $\overline{T}$ be its conjugate map, and let $\mathcal{T}$ be its dual representation. For any $T \in \mathcal{T}$, let $h_T$ be the minimal non-negative solution of the linear system $h_T = I_{A^c} + I_{A^c} \cdot T h_T$. Let $\underline{h}$ and $\overline{h}$ be the minimal non-negative solutions of the systems (1) and (3), respectively. Then

$$\underline{h} = \inf_{T \in \mathcal{T}} h_T \qquad \text{and} \qquad \overline{h} = \sup_{T \in \mathcal{T}} h_T.$$

Moreover, there is some $T \in \mathcal{T}$ such that $\underline{h} = h_T$, and there is some (possibly different) $T \in \mathcal{T}$ such that $\overline{h} = h_T$.

Proof This follows from [16, Lemma 8, Theorem 12 and Corollary 13]. □

We are now ready to begin the analysis that shows that the reachability condition R1 is a sufficient assumption to establish Proposition 2. We start with the following result, which says that if $\underline{T}$ satisfies the reachability condition R1, then so do the elements of its dual representation $\mathcal{T}$.

Lemma 2 Let $\underline{T} : \mathbb{R}^X \to \mathbb{R}^X$ be a map that satisfies C1–C3 and the reachability condition R1, and let $\mathcal{T}$ be its dual representation. Then for all $T \in \mathcal{T}$ and all $x \in A^c$, there is some $n_x \in \mathbb{N}$ such that $\big[T^{n_x} I_A\big](x) > 0$; hence, $T$ then also satisfies R1.

Proof Fix any $T \in \mathcal{T}$ and any $x \in A^c$. Due to R1, there is some $n_x \in \mathbb{N}$ such that $[\underline{T}^{n_x} I_A](x) > 0$. Because $T \in \mathcal{T}$, it follows from Proposition 4 that

$$[T^{n_x} I_A](x) = [T\, T^{n_x - 1} I_A](x) \ge \big[\underline{T}\, T^{n_x - 1} I_A\big](x) \ge \cdots \ge [\underline{T}^{n_x} I_A](x) > 0,$$

where we repeatedly used the monotonicity property T2. □

Let us investigate this reachability property for linear maps more fully. We need the following two results.


Lemma 3 Let T : RX → RX be a linear map that satisfies C1–C3 and the reachability condition R1. Then there is some x ∈ Ac such that [T IA ](x) > 0. Proof Suppose ex absurdo that this is false. Since it follows from Property T1 that [T IA ](x) ≥ miny∈X IA (y) = 0, we must have [T IA ](x) = 0 for all x ∈ Ac . This provides the induction base for n = 1 in the following induction argument: suppose that for some n ∈ N, [T n IA ](x) = 0 for all x ∈ Ac ; we will show that then also [T n+1 IA ](x) = 0 for all x ∈ Ac . First, for any x ∈ A we have [T n IA ](x) ≤ maxy∈X IA (y) = 1 = IA (x) due to T1. Moreover, for any x ∈ Ac , we have [T n IA ](x) = 0 = IA (x) by the induction hypothesis. Hence, we have T n IA ≤ IA . It follows that, for any x ∈ Ac , [T n+1 IA ](x) = [T T n IA ](x) ≤ [T IA ](x) = 0 , using the monotonicity of T (Property T2) for the inequality, and the argument at the beginning of this proof for the final equality. Because it follows from Property T1 that [T n+1 IA ](x) ≥ miny∈X IA (y) = 0, this implies that [T n+1 IA ](x) = 0 for all x ∈ Ac . This concludes the proof of the induction step. Hence, we have established that, for all x ∈ Ac , [T n IA ](x) = 0 for all n ∈ N that, because Ac is non-empty, contradicts the assumption that T satisfies R1. Hence, our assumption must be false, and there must be some x ∈ Ac such that [T IA ](x) > 0. ( ' Lemma 4 Let T : RX → RX be a linear map that satisfies C1–C3 and the c reachability any  x ∈ A and n ∈ N with n > 1, and suppose  n  condition R1. Fix m that T IA (x) > 0 and T IA (x) = 0 for all m ∈ N with m < n. Then there is some y ∈ Ac such that [T I{y} ](x) > 0 and [T n−1 IA ](y) > 0.  Proof Because IA = z∈A I{z} , and using the linear character of T —and therefore of T n —we have    [T n I{z} ](x) , 0 < T n IA (x) = z∈A

and hence, there must be some z ∈ A such that [T n I{z} ](x) > 0. Next, we note that  for any f ∈ RX it holds that f = y∈X f (y)I{y} . Hence, expanding the product T n and using the linearity of T —and therefore of T n−1 —yields   

  T [T n−1 I{z} ](y)I{y} (x) . [T n I{z} ](x) = T T n−1 I{z} (x) = y∈X Because [T n I{z} ](x) > 0, there must be some y ∈ X such that      0 < T [T n−1 I{z} ](y)I{y} (x) = [T n−1 I{z} ](y) T I{y} (x) ,



using the linearity of T for the equality. Since both factors in this expression are clearly non-negative due to Property T1, this implies that [T I{y} ](x) > 0 and [T n−1 I{z} ](y) > 0. Because z ∈ A we have I{z} ≤ IA , and hence, this last inequality together with Property T2 implies that 0 < [T n−1 I{z} ](y) ≤ [T n−1 IA ](y) , so we see that it only remains to show that y ∈ Ac . Suppose ex absurdo that y ∈ A. Then I{y} ≤ IA , and because [T I{y} ](x) > 0, it follows from Property T2 that 0 < [T I{y} ](x) ≤ [T IA ](x) , which contradicts the assumption that [T m IA ](x) = 0 for all m < n. Hence, y ∈ Ac , which concludes the proof. ( ' In the sequel, we will be interested in some results about mappings on subspaces of RX . The following definition introduces the required notation. Definition 2 For any f ∈ RX , we denote by f |Ac the restriction of f to Ac , i.e., c the mapping f |Ac : Ac → R : x → f (x), which is an element of RA . Moreover, let M : RX → RX be a map. Then we define its restriction M|Ac to c c the subspace RA of RX , for all f ∈ RA , as

M|Ac (f ) := M(g · IAc ) |Ac

(g ∈ RX : g|Ac = f ) .




The space RA again receives the supremum norm and maps from RA to RA the induced operator norm. Note that if the map M in the previous definition is a linear map, then also its restriction M|Ac is a linear map. Hence in particular, in that case M can be interpreted as a matrix, and the restriction M|Ac can then be interpreted as the |Ac | × |Ac | submatrix of M on the coordinates Ac . Moreover, we note the following simple property: Lemma 5 Let M : RX → RX be a bounded map. Then its restriction M|Ac to c RA is also bounded. 2 2 c Proof Choose any f ∈ RA with 2f 2 ≤ 1, and let g ∈ RX be2such 2 Ac = f 2 g| 2 that 2 2 2 c and g(x) = 0 for all x ∈ A. Then, it holds that g · IA = g and g = f 2. Hence, it follows that 2 2 2 2 2 2

% 2 2 2 2M|Ac f 2 = 2 2 M(g · IAc ) %Ac 2 = 2(Mg)|Ac 2 ≤ 2Mg 2 ≤ M , 2 2 2 2 where the last inequality used that 2g 2 = 2f 2 ≤ 1.

□

We next need some elementary results from the spectral theory of bounded linear maps, in particular for linear maps from $\mathbb{R}^{A^c}$ to $\mathbb{R}^{A^c}$. The following definition aggregates concepts from [10, Chapter 7, Definitions 1.2, 3.1, 3.5].



c

Definition 3 ([10]) Let $M : \mathbb{R}^{A^c} \to \mathbb{R}^{A^c}$ be a bounded linear map. The spectrum of $M$ is the set $\sigma(M) := \{\lambda \in \mathbb{C} : (\lambda I - M) \text{ is not one-to-one}\}$, where $I$ denotes the identity map on $\mathbb{R}^{A^c}$. The spectral radius of $M$ is $\rho(M) := \sup_{\lambda \in \sigma(M)} |\lambda|$. The resolvent $R(\lambda, M)$ of $M$ is defined for all $\lambda \in \mathbb{C} \setminus \sigma(M)$ as $R(\lambda, M) := (\lambda I - M)^{-1}$.

Because $A^c$ is finite (since $X \supset A^c$ is finite), the spectrum $\sigma(M)$ of any bounded linear map from $\mathbb{R}^{A^c}$ to $\mathbb{R}^{A^c}$ is a non-empty and finite set [10, Chapter 7, Corollary 1.4]. We will need the following properties of these objects.

c

Lemma 6 ([10, Chapter 7, Lemma 3.4]) Let $M : \mathbb{R}^{A^c} \to \mathbb{R}^{A^c}$ be a bounded linear map. Then for all $\lambda \in \mathbb{C}$ with $|\lambda| > \rho(M)$, the series $\sum_{k=0}^{+\infty} \frac{1}{\lambda^{k+1}} M^k$ converges in norm to $R(\lambda, M)$.

The next property is well-known, but we had some trouble finding an easy-to-use reference; hence, we prove it explicitly below.

c

Corollary 4 Let M : RA → RA be a bounded linear map, and suppose that ρ(M) < 1. Then limn→+∞ M n = 0. 2 2 2 2 Proof Because ρ(M) < 1, we have limn→+∞ 2 nk=0 M k − R(1, M)2 = 0 by Lemma 6. Now for any n ∈ N, it holds that 2 2 2 2 2 n 2 2 2 n−1 n n−1   2 2 n 2 2 2 2 k k2 k k2 2 2M 2 = 2 = M − M M − R(1, M) + R(1, M) − M 2 2 2 2 2k=0 2 2k=0 2 k=0 k=0 2 2 2 2 2 n 2 2 2 n−1  2 k 2 2 2 k2 2 2 2 ≤2 M − R(1, M)2 + 2R(1, M) − M 2. 2k=0 2 2 2 k=0 Because both summands 2 on2 the right-hand side vanish as we take n to +∞, it follows that limn→+∞ 2M n 2 = 0, or in other words, that limn→+∞ M n = 0. ( ' c

c

Lemma 7 ([10, Chapter 7, Lemma 3.4]) Let M : RA → RA be a bounded 2 21 linear map. Then ρ(M) ≤ 2M n 2 n for all n ∈ N. The crucial observation is now that, for a linear map T that satisfies C1–C3 and c the reachability condition R1, its restriction T |Ac to RA is a bounded linear map (due to Corollary 2 and 2 Lemma 2 5) that for large enough n ∈ N, as we will see in Corollary 5, satisfies 2(T |Ac )n 2 < 1. We can therefore use Lemma 7 to establish that the spectral radius of T |Ac is less than 1, and we can then apply Lemma 6 and Corollary 4. Let us establish that these claims are indeed true. We start with the following result, which gives some basic properties of these restrictions. Lemma 8 Let T : RX → RX be a linear map that satisfies C1–C3, and let T |Ac c be its restriction to RA , as in Definition 2. Then (T |Ac )n f ≤ (T |Ac )n g for all n ∈ N



and all f, g ∈ RA with f ≤ g. Moreover, it holds that 0 ≤ (T |Ac )n 1 ≤ 1 for all n ∈ N. Proof For the first claim, we proceed by induction on n. For the induction base, let f  , g  ∈ RX be such that f  (x) := 0 and g  (x) := 0 for all x ∈ A, and f  (x) := f (x) and g  (x) := g(x) for all x ∈ Ac . Then, clearly, f  |Ac = f and g  |Ac = g, and f  ≤ g  because f ≤ g. Moreover, it holds that f  · IAc = f  and g  · IAc = g  . It follows from Definition 2 that, for any x ∈ Ac , since f  ≤ g  , it holds that [T |Ac f ](x) = [T (f  · IAc )](x) = [T (f  )](x) ≤ [T (g  )](x) = [T (g  · IAc )](x) = [T |Ac g](x) , where we used that T satisfies Property T2 for the inequality. Because this is true for any x ∈ X, it follows that T |Ac f ≤ T |Ac g. Now suppose that (T |Ac )n f ≤ (T |Ac )n g for some n ∈ N. Then also

(T |Ac )n+1 f = T |Ac (T |Ac )n f ≤ T |Ac (T |Ac )n g = (T |Ac )n+1 g , using the argument for the induction base for the inequality. For the second claim, start by noting that IAc |Ac = 1, IAc · IAc = IAc , and 0 ≤ IAc ≤ 1. Hence, it follows from Definition 2 that, for all x ∈ Ac , 0 = [T 0](x) ≤ [T IAc ](x) = [T |Ac 1](x) = [T IAc ](x) ≤ [T 1](x) = 1 , where we used the linearity of T for the first equality, Property T2 for the first inequality, Definition 2 for the second and third equalities, Property T2 for the second inequality, and Property T1 for the last equality. Hence, we have 0 ≤ T |Ac 1 ≤ 1. By repeatedly using the monotonicity property established in the first part of this proof, it follows that also for all n > 1 we have that

(T |Ac )n 1 = (T |Ac )n−1 T |Ac 1 ≤ (T |Ac )n−1 1

= (T |Ac )n−2 T |Ac 1 ≤ · · · ≤ T |Ac 1 ≤ 1 . To establish that 0 ≤ (T |Ac )n 1 for all n ∈ N, we note that the case for n = 1 was already given above. Now, for any n > 1, we have (T |Ac )n 1 = (T |Ac )n−1 (T |Ac 1) ≥ (T |Ac )n−1 0 = 0 , where for the inequality we used the property 0 ≤ T |Ac 1 together with the monotonicity property established in the first part of this proof; and where we used the linearity of (T |Ac )n−1 for the final equality. ( '



Lemma 9 Let T : RX → RX be a linear map that satisfies C1–C3 and the reachability condition R1. Then there is some n ∈ N such that for all k ∈ N with k ≥ n, it holds that [(T |Ac )k 1](x) < 1 for all x ∈ Ac . Proof Because T satisfies R1, for all x ∈ Ac there is some unique minimal nx ∈ N such that [T nx IA ](x) > 0. Let n := maxx∈X nx . We will show that [(T |Ac )k 1](x) < 1 for all x ∈ X, for all k ∈ N with k ≥ n. We proceed by induction, as follows. Fix any x ∈ Ac with nx = 1; then [T IA ](x) > 0. Because 1 = IA + IAc , and since [T 1](x) = 1 due to Property T1, it therefore follows from the fact that T is a linear map that 1 = [T 1](x) = [T IA ](x) + [T IAc ](x) > [T IAc ](x) . With g := IAc , we have g|Ac = 1 and g · IAc = IAc , so it follows from the above together with Definition 2 that [T |Ac 1](x) = [T IAc ](x) < 1 . Hence for any x ∈ Ac with nx = 1 it holds that [T |Ac 1](x) < 1. This enables the induction base for m = 1 in the following induction argument: suppose that for some m ∈ N it holds that [(T |Ac )m 1](x) < 1 for all x ∈ Ac with nx ≤ m; we will show that then also [(T |Ac )m+1 1](x) < 1 for all x ∈ Ac with nx ≤ m + 1. First consider any x ∈ Ac with nx ≤ m. By Lemma 8, we have that (T |Ac )1 ≤ 1 and therefore that [(T |Ac )m+1 1](x) = [(T |Ac )m (T |Ac 1)](x) ≤ [(T |Ac )m 1](x) < 1 , where we used the monotonicity of (T |Ac )m established in Lemma 8 for the first inequality, and the induction hypothesis that [(T |Ac )m 1](x) < 1 since nx ≤ m for the final inequality. Hence, we have [(T |Ac )m+1 1](x) ≤ [(T |Ac )m 1](x) < 1 for all x ∈ Ac with nx ≤ m. Now consider any x ∈ Ac with nx = m + 1. Because nx was chosen to be minimal, this means that [T m+1 IA ](x) > 0 and [T k IA ](x) = 0 for all k ∈ N with k ≤ m. Due to Lemma 4, and since m + 1 > 1, this implies that there is some y ∈ Ac such that [T I{y} ](x) > 0 and [T m IA ](y) > 0. This last inequality implies that ny ≤ m, since ny was chosen to be minimal. By the induction hypothesis, this implies that [(T |Ac )m 1](y) < 1. Let g ∈ RX be such that g|Ac = (T |Ac )m 1, and g(z) := 0 for all z ∈ A. Then g · IAc = g, whence it follows from Definition 2 that 

     [(T |Ac )m+1 1](x) = T |Ac (T |Ac )m 1 (x) = T (g · IAc ) (x) = T g (x) .  Next, we note that g = z∈X g(z)I{z} , and so, using the linear character of T , it     holds that T g (x) = z∈X g(z) T I{z} ](x). Focusing on the summand for z = y, we find that



    g(z) T I{z} ](x) = g(y)[T I{y} ](x) + g(z) T I{z} (x) z∈X z∈X\{y}    < [T I{y} ](x) + g(z) T I{z} (x) z∈X\{y}      ≤ [T I{y} ](x) + T I{z} (x) = 1 , T I{z} (x) = z∈X\{y} z∈X 

where the strict inequality holds because, as established above, [T I{y} ](x) > 0 and g(y) = [(T |Ac )m 1](y) < 1; where the second inequality follows because g(z) = 0 for all z ∈ A, because g(z) ≤ 1 for all z ∈ Ac due to Lemma 8, and because [T I{z} ](x) ≥ 0 for all z ∈ X because T satisfies T1; and where the final equality used that z∈X I{z} = 1 together with the fact that T is a linear map that satisfies T1. Hence, we find that [(T |Ac )m+1 1](x) < 1 for all x ∈ Ac with nx = m + 1. Because we already established that [(T |Ac )m+1 1](x) < 1 for all x ∈ Ac with nx ≤ m, we conclude that, indeed, [(T |Ac )m+1 1](x) < 1 for all x ∈ Ac with nx ≤ m+1. This concludes the proof of the induction step, and hence, our induction argument implies that, for any m ∈ N, [(T |Ac )m 1](x) < 1 for all x ∈ Ac with nx ≤ m. Because n = maxx∈Ac nx satisfies n ≥ nx for all x ∈ Ac , it follows that, as claimed, for all k ≥ n we have nx ≤ n ≤ k and hence [(T |Ac )k 1](x) < 1 for all x ∈ Ac . ( ' Corollary 5 Let T : RX → RX be a linear map that satisfies 2 C1–C3 2 and the reachability condition R1. Then there is some n ∈ N such that 2(T |Ac )n 2 < 1. Proof By Lemma 9, there is some n ∈ N such that [(T |Ac )n21](x) < 21 for all n 2 2 x ∈ Ac . Since 0 ≤ (T |Ac )n 1 due 2 to Lemma 2 that (T |Ac ) 1 < 1. It 2 28, this implies n n therefore suffices to show that 2(T |Ac )22 2 ≤ 2(T |Ac ) 12. Ac such that 2f 2 ≤ 1. Then −1 ≤ f ≤ 1 due to the Consider any f ∈ R 2 2 definition of 2f 2. Using the linear character of T |Ac (and therefore of (T |Ac )n ), it follows that −(T |Ac )n 1 = (T |Ac )n (−1) ≤ (T |Ac )n f ≤ (T |Ac )n 1 , n where the inequalities use the 2properties 2 monotonicity 2 2of (T |Ac ) established in n n Lemma 8. This implies that 2(T |Ac ) f 2 ≤ 2(T |Ac ) 12. Because this is true for 2 2 c every f ∈ RA such that 2f 2 ≤ 1, it follows that

2 2 2 2 2 2 + 2 *2 2(T |Ac )n 2 = sup 2(T |Ac )n f 2 : f ∈ RAc , 2f 2 ≤ 1 ≤ 2(T |Ac )n 12 , which concludes the proof.

□

The next result summarizes the results established above and will be crucial for the remainder of this chapter.



Lemma 10 Let T : RX → RX be a map that satisfies C1–C3 and the reachability condition R1, and let T be its dual representation. For anyT ∈ T, it holds that k limk→+∞ (T |Ac )k = 0 and, moreover, that (I − T |Ac )−1 = +∞ k=0 (T |Ac ) , where I c A is the identity map on R . Proof Fix any T ∈ T. Then it follows from Proposition 4 and Lemma 2 that T is a linear map from RX to RX that satisfies 2C1–C3 and 2 R1. By Corollary 5, this 2(T |Ac )n 2 < 1, which implies that implies that there is some n ∈ N such that 2 21 also 2(T |Ac )n 2 n < 1. Because T |Ac is a linear map (due to Definition 2) that is bounded by1 Corollary 2 and Lemma 5, Lemma 7 therefore implies that ρ(T |Ac ) ≤ 2 2 2(T |Ac )n 2 n < 1. By Corollary 4, this implies that limk→+∞ (T |Ac )k = 0 and, by Lemma 6, that +∞ k −1 ( ' k=0 (T |Ac ) = R(1, T |Ac ) = (I − T |Ac ) . Finally, we need the following result, which identifies the solution of the linear version of the system in Eq. (1). Proposition 6 Let T : RX → RX be a map that satisfies C1–C3 and the reachability condition R1, and let T be its dual representation. Then, for all T ∈ T, there exists a unique solution hT ∈ RX of the linear system hT = IAc + IAc · T hT . This hT is non-negative, and in particular, hT |A = 0 and hT |Ac = (I − T |Ac )−1 1. Proof Let hT be a vector such that hT |A := 0 and hT |Ac := (I − T |Ac )−1 1, where we note that (I − T |Ac )−1 exists by Lemma 10. Then hT ∈ RX ; specifically, hT (x) is finite for all x ∈ X, because (I − T |Ac )−1 is the inverse of a bounded linear map and thus bounded itself. It is easily verified that hT satisfies the linear system of interest; for any x ∈ A, we trivially have hT (x) = 0 = IAc (x) + IAc (x) · [T hT ](x) . Conversely, on Ac , it holds that hT |Ac = (I − T |Ac )−1 1, so, multiplying both sides with (I −T |Ac ), we obtain hT |Ac = 1+T |Ac (hT |Ac ). We note that, because hT |A = 0, it holds that hT · IAc = hT , and hence,



T |Ac (hT |Ac ) = T (hT · IAc ) |Ac = T hT |Ac , using Definition 2, and so

hT |Ac = 1 + T hT |Ac = IAc |Ac + IAc |Ac · T hT |Ac . This establishes that hT solves the linear system of interest. Let us prove that hT is non-negative. Let c := miny∈X hT (y), and let y ∈ X be such that h (y) = c. Then c ∈ R because h ∈ RX . Now suppose ex absurdo T

T

that y ∈ Ac . Then, since hT − c ≥ 0, and because, as established above, hT |Ac =

198

T. Krak

1 + T hT |Ac , it follows that

%

% hT |Ac − c = 1 + (T hT )|Ac − c = 1 + T (hT − c) %Ac ≥ 1 + T (0) %Ac = 1 , where the second equality used Property T3 and where the inequality used Property T2, which we can do because T satisfies C1–C3 due to Proposition 4; and where the final equality used the linearity of T . Because y ∈ Ac , this implies in particular that 0 = hT (y) − c ≥ 1. From this contradiction, it follows that y ∈ A. Since hT (x) = 0 for all x ∈ A, this implies that c = hT (y) = 0, which means that hT is non-negative. It remains to establish that hT is the unique solution in RX . To this end, let g ∈ RX be any real-valued solution of this system, i.e., suppose that also g = IAc + IAc · T g. Then some algebra analogous to the above yields that g|A = 0 and g|Ac = (I − T |Ac )−1 1, thus g = hT , whence the solution is unique. ( ' We can now prove our first main result, which was stated in Sect. 1. Proof of Proposition 2 Let h be the minimal non-negative solution to (1); this solution exists by Proposition 1, although it could be in (R ∪ {+∞})X . Let us first establish that in fact h ∈ RX . To this end, we note that by Proposition 5, there is some T ∈ T such that h = hT , where hT is the minimal non-negative solution of the linear system hT = IAc + IAc · T hT . Because T satisfies C1–C3 and R1, it follows from Proposition 6 that the linear system hT = IAc +IAc ·T hT has a unique solution in RX that is non-negative, which equals the minimal non-negative solution hT due to Corollary 1. Hence, hT ∈ RX , and because h = hT , it follows that also h ∈ RX . To establish that the solution of (1) is unique in RX , let hT and hS be any two solutions of (1) in RX ; without further assumptions, we will show that hT ≤ hS . First, for any x ∈ A, it holds that hT (x) = IAc (x) + IAc (x) · [T hT ](x) = 0, and, by the same argument, hS (x) = 0, so hT and hS agree on A. Next, due to Proposition 4, there are T , S ∈ T such that T hT = T hT and T hS = ShS . It also follows from Proposition 4 that T hT = T hT ≤ ShT , because S ∈ T. In turn, and using that hT (x) = 0 for all x ∈ A, and that therefore hT · IAc = hT , this implies that

T|_{A^c}(h_T|_{A^c}) = (T h_T)|_{A^c} ≤ (S h_T)|_{A^c} = S|_{A^c}(h_T|_{A^c}).

Substituting this inequality into the system (1) restricted to A^c yields

h_T|_{A^c} = 1 + T|_{A^c}(h_T|_{A^c}) ≤ 1 + S|_{A^c}(h_T|_{A^c}).

Now note that S|_{A^c} is a monotone map due to Proposition 4 and Lemma 8. Hence, and using the fact that S|_{A^c} is a linear map, expanding this inequality into the right-hand side gives, after n ∈ N expansions,

h_T|_{A^c} ≤ (S|_{A^c})^n (h_T|_{A^c}) + \sum_{k=0}^{n-1} (S|_{A^c})^k 1.

Taking limits in n, and using that \lim_{n→+∞} (S|_{A^c})^n = 0 and \sum_{k=0}^{+∞} (S|_{A^c})^k = (I − S|_{A^c})^{-1} due to Lemma 10, it follows that

h_T|_{A^c} ≤ (I − S|_{A^c})^{-1} 1 = h_S|_{A^c},

where we used Proposition 6 for the equality, which we can do because, since h_S satisfies Eq. (1) by assumption and since \underline{T} h_S = S h_S by the selection of S ∈ \mathbf{T}, it holds that h_S = I_{A^c} + I_{A^c} · S h_S. Hence, we have found that h_T|_{A^c} ≤ h_S|_{A^c}. However, we can now repeat the above argument, mutatis mutandis, to establish that also h_S|_{A^c} ≤ h_T|_{A^c}. Hence, we conclude that h_T and h_S agree on A^c. Because we already established that they agree on A, we conclude that h_T = h_S and that therefore Eq. (1) has a unique solution in R^X. Because h is a solution to Eq. (1) in R^X, we conclude that h must be the unique solution. Because h is non-negative, it follows that the unique solution in R^X is non-negative. □
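As a concrete illustration of Proposition 6 (not part of the original chapter): for a fixed T ∈ \mathbf{T}, i.e., for a precise Markov chain, h_T is obtained from a single linear solve. The sketch below uses NumPy, and the 3-state transition matrix is purely hypothetical.

```python
import numpy as np

def expected_hitting_times(T, A):
    """Expected hitting times of A for a precise Markov chain with
    transition matrix T (rows summing to 1), following Proposition 6:
    h|_A = 0 and h|_{A^c} = (I - T|_{A^c})^{-1} 1."""
    n = T.shape[0]
    Ac = [x for x in range(n) if x not in A]      # complement of A
    T_Ac = T[np.ix_(Ac, Ac)]                      # restriction T|_{A^c}
    h = np.zeros(n)
    h[Ac] = np.linalg.solve(np.eye(len(Ac)) - T_Ac, np.ones(len(Ac)))
    return h

# hypothetical 3-state chain, hitting the set A = {2}
T = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.0, 0.0, 1.0]])
print(expected_hitting_times(T, A={2}))
```

Under the reachability condition R1, the matrix I − T|_{A^c} is invertible by Lemma 10, so the solve is well defined.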

3 A Computational Method The computational method that we propose in Proposition 7 works by iterating over specific choices of the extreme points of the dual representation T of T . The next result establishes that these extreme points exist and that T (·) obtains its value in an extreme point of T; this is well-known, so we omit the proof. Lemma 11 Let T : RX → RX be a map that satisfies C1–C3, and let T be its dual representation. Then T has a non-empty set of extreme points. Moreover, for all f ∈ RX , there is an extreme point T of T such that T f = Tf . We can now state the second main result of this work. Proposition 7 Let T : RX → RX be a map that satisfies C1–C3 and the reachability condition R1, and let T be its dual representation. Let T1 ∈ T be any extreme point of T and, for all n ∈ N, let hn be the unique solution in RX of the linear system hn = IAc + IAc · Tn hn , and let Tn+1 be an extreme point of T such that T hn = Tn+1 hn . Then the sequence {hn }n∈N is non-increasing, and its limit h∗ := limn→+∞ hn is the unique solution of (1) in RX . Proof First, for any n ∈ N, the solution hn ∈ RX of the linear system hn = IAc + IAc · Tn hn exists and is non-negative due to Proposition 6, and there is an extreme point Tn+1 of T that satisfies T hn = Tn+1 hn due to Lemma 11. Hence, the sequence {hn }n∈N is well-defined and bounded below by 0. Due to Proposition 6, it holds that hn (x) = 0 for all x ∈ A and all n ∈ N. Thus, the sequence {hn }n∈N



is trivially non-increasing and convergent on A. We will next establish the same on Ac . First note that hn · IAc = hn for all n ∈ N, because hn (x) = 0 for all x ∈ A. Now fix any n ∈ N. Then Tn+1 hn = T hn ≤ Tn hn because Tn ∈ T, and therefore in particular, it holds that

T_{n+1}|_{A^c}(h_n|_{A^c}) = (T_{n+1} h_n)|_{A^c} ≤ (T_n h_n)|_{A^c} = T_n|_{A^c}(h_n|_{A^c}).

Therefore, and because h_n = I_{A^c} + I_{A^c} · T_n h_n, it holds that

h_n|_{A^c} = 1 + T_n|_{A^c}(h_n|_{A^c}) ≥ 1 + T_{n+1}|_{A^c}(h_n|_{A^c}).

Now note that T_{n+1}|_{A^c} is a monotone map due to Proposition 4 and Lemma 8. Hence, and using the fact that T_{n+1}|_{A^c} is a linear map, repeatedly expanding this inequality into the right-hand side gives, after m ∈ N expansions,

h_n|_{A^c} ≥ (T_{n+1}|_{A^c})^m (h_n|_{A^c}) + \sum_{k=0}^{m-1} (T_{n+1}|_{A^c})^k 1.

Taking limits in m, and using \lim_{m→+∞} (T_{n+1}|_{A^c})^m = 0 and \sum_{k=0}^{+∞} (T_{n+1}|_{A^c})^k = (I − T_{n+1}|_{A^c})^{-1} due to Lemma 10, it follows that

h_n|_{A^c} ≥ (I − T_{n+1}|_{A^c})^{-1} 1 = h_{n+1}|_{A^c},

where we used Proposition 6 for the equality, which we can do because, by construction, h_{n+1} ∈ R^X and h_{n+1} = I_{A^c} + I_{A^c} · T_{n+1} h_{n+1}. Thus, the sequence {h_n}_{n∈N} is non-increasing. Because we know that h_n ≥ 0 for all n ∈ N due to Proposition 6, it follows that the limit h^* := \lim_{n→+∞} h_n exists.

Let us now show that h^* solves (1). To this end, fix n ∈ N and note that

‖h^* − I_{A^c} − I_{A^c} · \underline{T} h^*‖
≤ ‖h^* − h_{n+1}‖ + ‖h_{n+1} − I_{A^c} − I_{A^c} · T_{n+1} h_{n+1}‖ + ‖I_{A^c} · T_{n+1} h_{n+1} − I_{A^c} · \underline{T} h^*‖
= ‖h^* − h_{n+1}‖ + ‖I_{A^c} · T_{n+1} h_{n+1} − I_{A^c} · \underline{T} h^*‖
≤ ‖h^* − h_{n+1}‖ + ‖T_{n+1} h_{n+1} − \underline{T} h^*‖
≤ ‖h^* − h_{n+1}‖ + ‖T_{n+1} h_{n+1} − \underline{T} h_n‖ + ‖\underline{T} h_n − \underline{T} h^*‖
≤ ‖h^* − h_{n+1}‖ + ‖T_{n+1} h_{n+1} − \underline{T} h_n‖ + ‖h_n − h^*‖
= ‖h^* − h_{n+1}‖ + ‖T_{n+1} h_{n+1} − T_{n+1} h_n‖ + ‖h_n − h^*‖
≤ ‖h^* − h_{n+1}‖ + ‖h_{n+1} − h_n‖ + ‖h_n − h^*‖,



where we used Property T4 for the final two inequalities. Taking limits in n, all summands on the right-hand side vanish, from which we conclude that h^* = I_{A^c} + I_{A^c} · \underline{T} h^*. In other words, h^* is a solution of (1). Because \underline{T} satisfies C1–C3 and R1, Eq. (1) has a unique solution h by Proposition 2, and hence h = h^*. □

We remark without proof that if, in the statement of Proposition 7, we instead take each T_{n+1} to be an extreme point of \mathbf{T} such that \overline{T} h_n = T_{n+1} h_n (with \overline{T} the upper transition operator), then the sequence instead becomes non-decreasing and converges to a limit that is the unique non-negative solution of (3) in R^X. So, a completely similar method can be used to compute upper expected hitting times. That said, in the remainder of this chapter, we will focus on the version for lower expected hitting times.
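To make the iteration of Proposition 7 concrete, the following sketch (not code from the chapter) assumes that \mathbf{T} has separately specified rows and that each row credal set \mathbf{T}_x is given by a finite list of extreme points, i.e., probability vectors, as in the experiments reported later. Convergence is detected via h_n = h_{n-1}, which is justified by Corollary 6.

```python
import numpy as np

def lower_hitting_times(row_vertices, A, tol=1e-12):
    """Sketch of the iteration from Proposition 7 (assumes R1 holds).
    row_vertices[x]: extreme points (probability vectors) of the row set T_x.
    A: set of states to hit.  Returns the lower expected hitting times h*."""
    n = len(row_vertices)
    Ac = [x for x in range(n) if x not in A]

    def solve(T):
        # unique solution of h = I_{A^c} + I_{A^c}(T h), cf. Proposition 6
        h = np.zeros(n)
        h[Ac] = np.linalg.solve(np.eye(len(Ac)) - T[np.ix_(Ac, Ac)],
                                np.ones(len(Ac)))
        return h

    def greedy_rows(h):
        # for every state, pick an extreme point of T_x minimising the
        # expectation of h; this realises T_{n+1} h_n = (lower T) h_n
        return np.array([np.asarray(V)[np.argmin(np.asarray(V) @ h)]
                         for V in row_vertices])

    T = np.array([np.asarray(V)[0] for V in row_vertices])  # T_1: any extreme point
    h_prev = None
    while True:
        h = solve(T)
        if h_prev is not None and np.max(np.abs(h - h_prev)) <= tol:
            return h        # h_n = h_{n-1}, hence h_n = h* by Corollary 6
        T = greedy_rows(h)
        h_prev = h
```

When the sets \mathbf{T}_x are instead described by linear constraints, the row-selection step becomes the linear programming problem discussed in the complexity analysis below.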

4 Complexity Analysis We will say that a sequence {h_n}_{n∈N} of vectors is strictly decreasing if h_{n+1} ≤ h_n and h_{n+1} ≠ h_n for all n ∈ N.

Corollary 6 Let \underline{T} : R^X → R^X be a map that satisfies C1–C3 and R1, and let \mathbf{T} be its dual representation. Then any sequence {h_n}_{n∈N} in R^X constructed as in Proposition 7 is either strictly decreasing everywhere, or, for some m ∈ N, is strictly decreasing for all n < m and satisfies h_k = h^* for all k ≥ m, with h^* = \lim_{n→+∞} h_n.

Proof Because we know from Proposition 7 that the sequence {h_n}_{n∈N} is non-increasing, the negation of this statement is that there is some m ∈ N such that h_m = h_{m+1} with h_m ≠ h^*. Suppose ex absurdo that this is the case. Consider the co-sequence {T_n}_{n∈N} of extreme points of \mathbf{T} that was used to construct the sequence {h_n}_{n∈N}. Let T := T_{m+1} and h := h_m = h_{m+1}. Then it holds that \underline{T} h = T h, and h is the unique solution in R^X of the linear system h = I_{A^c} + I_{A^c} · T h. Following the conditions of Proposition 7, we now construct a sequence {T'_n}_{n∈N} and a co-sequence {h'_n}_{n∈N} in R^X such that h'_n = I_{A^c} + I_{A^c} · T'_n h'_n for all n ∈ N. To this end, first set T'_1 := T. This yields h'_1 = h, and because T ∈ \mathbf{T} is an extreme point of \mathbf{T} that satisfies T h'_1 = T h = \underline{T} h = \underline{T} h'_1, we can take T'_2 := T, which yields h'_2 = h. Proceeding in this fashion, we obtain T'_n = T and h'_n = h for all n ∈ N. This sequence {h'_n}_{n∈N} satisfies the conditions of Proposition 7, but \lim_{n→+∞} h'_n = h = h_m ≠ h^*, a contradiction. □

Corollary 7 Let \underline{T} : R^X → R^X be a map that satisfies C1–C3 and R1, and let \mathbf{T} be its dual representation. Suppose that \mathbf{T} has a finite number m ∈ N of extreme points. Then any sequence {h_n}_{n∈N} in R^X constructed as in Proposition 7 satisfies h_k = h^* = \lim_{ℓ→+∞} h_ℓ for all k ≥ n, for some n ≤ m.



Proof Because \mathbf{T} has m ∈ N extreme points, it follows from Proposition 7 that the elements of the sequence {h_n}_{n∈N} can take at most m distinct values, each corresponding to an extreme point of \mathbf{T}. Hence, there is some n ≤ m such that h_n = h_{m+1}; that is, h_{m+1} must have a value that was already obtained earlier in the sequence. Now apply Corollary 6. □

This result shows that the numerical scheme proposed in Proposition 7 takes at most m iterations, where m is the number of extreme points of \mathbf{T}. In fact, it is possible to show that this bound is tight, i.e., for any m ∈ N there is a map \underline{T} whose dual representation \mathbf{T} has m extreme points, for which the method suggested in Proposition 7 takes exactly m iterations when T_1 ∈ \mathbf{T} is chosen carefully. Unfortunately, the construction that shows this tightness is fairly involved, and we must omit it here for reasons of brevity.

Let us now remark on the other computational aspects of the algorithm. At each step n ∈ N of the method, we need to solve two problems. First, given T_n ∈ \mathbf{T}, we need to compute h_n. This is equivalent to computing the expected hitting time of A for the precise Markov chain identified by T_n, so this step can be solved by any method available in the literature for the latter problem. However, Proposition 6 tells us that h_n can be obtained by solving a linear system. Hence, ignoring issues of numerical precision, the complexity of this step is at most O(|X|^ω) [14], which is the complexity of multiplying two |X| × |X| matrices; the current best estimate for this exponent is ω ≈ 2.38 [2].

Secondly, we need to find an extreme point T_{n+1} ∈ \mathbf{T} such that T_{n+1} h_n = \underline{T} h_n = \inf_{T ∈ \mathbf{T}} T h_n. The complexity of this step will depend strongly on the way \mathbf{T} is encoded. In many practical applications, however, it will be derived from a given set \mathbf{T} of linear maps satisfying C1–C3; that is, the dual representation \mathbf{T} is often given explicitly and used to describe \underline{T}, rather than the other way around. Since \mathbf{T} has separately specified rows, to identify \mathbf{T} one only has to specify the sets \mathbf{T}_x. In turn, these sets \mathbf{T}_x can, in practice, often be described by a finite number of linear inequalities; in the context of Markov chains, these inequalities represent given bounds on the transition probabilities for moving from the state x to other states. Under these conditions, each \mathbf{T}_x will be a set that is non-empty (assuming feasibility of the specified constraints), closed, and convex, with a finite number of extreme points. In this case, computing each [\underline{T} f](x) (and in particular [\underline{T} h_n](x)) reduces to solving a linear programming problem in N = |X| + c variables,² where c is the number of constraints used to specify \mathbf{T}_x. For instance, the simplex algorithm can be used to solve this problem, with the added benefit that the returned optimal solution is an extreme point of \mathbf{T}_x. Unfortunately, the complexity of the simplex algorithm depends on the pivot rule that is used, and the existence of a polynomial-time pivot

2 It is worth noting that this analysis is somewhat pessimistic, because it assumes no real structure on the underlying dynamical system; in practice, it will often be unlikely that a given state x can move to every other state y in a single step, whence Tx will live in a subspace of dimension (much) less than |X|.



rule is still an open research question [4, 21]. Nevertheless, it typically performs very well in practice; roughly speaking, probabilistic analyses typically yield an expected number of iterations on the order of O(N^{2+α}), with α > 0 depending on the specifics of the analysis. For example, [4] give a bound on the expected number of iterations of order O(|X|^2 \sqrt{\log c}\, σ^{-2} + |X|^3 \log^{3/2} c), where σ is a parameter of the probabilistic analysis. Crucially, however, we are not aware of a result that shows an (expected) runtime of the simplex algorithm that is sub-quadratic in |X|. Moreover, a very recent result [2] gives a deterministic interior-point algorithm that solves this problem (approximately) and which runs in Õ(N^ω),³ where, as before, O(N^ω) is the complexity of multiplying two N × N matrices. As that author notes, this complexity can be regarded as essentially optimal. However, because this is an interior-point method, the resulting solution is not guaranteed to be an extreme point of \mathbf{T}_x, which we require for our algorithm, so we cannot use the result here other than for illustrating the complexity of the minimization problem.

In any case, the above considerations suggest that finding T_x ∈ \mathbf{T}_x such that T_x f = [\underline{T} f](x), for a generic \mathbf{T}_x that is specified using a finite number of linear constraints, will typically take at least O(|X|^2) time. This then needs to be repeated |X| times, once for each x ∈ X. Hence, finding an extreme point T_{n+1} ∈ \mathbf{T} such that \underline{T} h_n = T_{n+1} h_n can be done in polynomial expected time, but requires at least O(|X|^3) time (in expectation).

In conclusion, the above analysis tells us that a single step of our proposed method has an expected runtime complexity of at least O(|X|^3), where the complexity is dominated by finding T_{n+1} ∈ \mathbf{T} such that \underline{T} h_n = T_{n+1} h_n, rather than by computing h_n from the previous T_n.

Let us finally compare this to the only other method in the literature of which we are aware, which is described in Proposition 3. Starting with h_0 := I_{A^c}, at each step of this method we need to compute \underline{T} h_{n-1}. Following the discussion above, and depending on the algorithm and analysis used, we can expect this to have a complexity of at least O(|X|^3). Since this was also the dominating factor in the complexity of our new algorithm, we conclude that the two methods have comparable complexity (order) per iteration. What remains is a comparison of the number of iterations they require. With h_n as in Proposition 3, it follows from [17, Lemma 42] that ‖h_n‖ ≤ n + 1 for all n ∈ N. Hence, for any ε > 0, this method will take at least (‖h‖ − ε) − 1 iterations before ‖h_n − h‖ ≤ ε. The new method from Proposition 7 is thus more efficient than the method from Proposition 3 when the number m of extreme points of \mathbf{T} is less than ‖h‖. Of course, since we do not know h to begin with, this does not provide a practically useful way to determine which method to use in practice. Moreover, because the number m of extreme points of \mathbf{T} is potentially extremely large, one may wonder whether the condition m < ‖h‖ is ever really satisfied in practice. However, a first empirical analysis suggests that the bound m on the number of iterations, although tight, is not representative of the average-case complexity of our algorithm.

3 Following [2], Õ(N) hides polylog(N) factors.
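As an illustrative sketch of this per-state minimization (not from the original chapter): when \mathbf{T}_x is described by finitely many linear constraints, [\underline{T} f](x) is the optimal value of a small linear program, and a simplex-type solver returns an extreme point of \mathbf{T}_x as required. The example below uses SciPy's dual-simplex backend; the interval bounds on the transition probabilities are placeholders for whatever constraints are actually given.

```python
import numpy as np
from scipy.optimize import linprog

def lower_row_value(f, A_ub=None, b_ub=None):
    """[lower T f](x) for one state x: minimise p . f over
    T_x = { p : p >= 0, sum(p) = 1, A_ub p <= b_ub }.
    The dual-simplex method returns a basic solution, i.e. a vertex of T_x."""
    n = len(f)
    res = linprog(c=f, A_ub=A_ub, b_ub=b_ub,
                  A_eq=np.ones((1, n)), b_eq=[1.0],
                  bounds=[(0.0, 1.0)] * n, method="highs-ds")
    return res.fun, res.x   # optimal value and the minimising extreme point

# hypothetical interval bounds 0.05 <= p_j <= 0.6 on a 3-state row
f = np.array([3.0, 1.0, 2.0])
val, vertex = lower_row_value(
    f,
    A_ub=np.vstack([np.eye(3), -np.eye(3)]),
    b_ub=np.concatenate([0.6 * np.ones(3), -0.05 * np.ones(3)]))
```

Repeating this for every x ∈ X yields \underline{T} h_n together with a corresponding extreme point T_{n+1} of \mathbf{T}.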



In particular, we performed experiments where we randomly created sets \mathbf{T} and applied the method from Proposition 7 to the resulting systems. In these experiments, we varied the size of X between 10^2 and 10^3, while keeping A a singleton throughout. For each setting, we generated the sets \mathbf{T}_x by sampling 50 points uniformly at random in the space of probability mass functions on X, and taking \mathbf{T}_x to be the convex hull of these points. Thus each \mathbf{T}_x has 50 extreme points (almost surely with respect to the uniform sampling), and the set \mathbf{T} constructed from them has 50^{|X|} extreme points. We then ran the method from Proposition 7, where we selected T_1 such that \underline{T} I_A = T_1 I_A, and the number of iterations until convergence was recorded. Using the notation from Proposition 7, we say that the method converged after n > 1 iterations when we observe h_n = h_{n-1}; this is valid by Corollary 6. This was repeated 50 times for each value of |X|. The results of these experiments are shown in Fig. 1. We see that in the majority of cases, the method converges to the correct solution in three iterations, although sometimes it takes four iterations. There were no instances where the method took more iterations to converge, although in one instance (not shown) with |X| = 10^3, the method converged in only two iterations.
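A compact sketch of this experimental setup follows; it is an illustration only, and the problem size, random seed, and tolerance are arbitrary choices rather than the chapter's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_vertices, A = 100, 50, {0}          # |X| = 100, hit the singleton A = {0}
Ac = [x for x in range(n) if x not in A]

# each row credal set = convex hull of 50 pmfs drawn uniformly from the simplex
row_vertices = [rng.dirichlet(np.ones(n), size=n_vertices) for _ in range(n)]

def solve(T):
    h = np.zeros(n)
    h[Ac] = np.linalg.solve(np.eye(len(Ac)) - T[np.ix_(Ac, Ac)], np.ones(len(Ac)))
    return h

# T_1 chosen so that (lower T) I_A = T_1 I_A: each row minimises the mass on A
ind_A = np.zeros(n)
ind_A[list(A)] = 1.0
T = np.array([V[np.argmin(V @ ind_A)] for V in row_vertices])

iterations, h_prev = 0, None
while True:
    h = solve(T)
    iterations += 1
    if h_prev is not None and np.allclose(h, h_prev):
        break
    T = np.array([V[np.argmin(V @ h)] for V in row_vertices])
    h_prev = h
print(iterations, h.max())
```

By Corollary 7, the loop terminates after at most 50^{|X|} iterations in exact arithmetic; in the experiments reported above it stopped after three or four.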


Fig. 1 The number of instances, out of 50, on which the method from Proposition 7 converged in a given number of iterations, for different sizes of the set X

Although still preliminary, we believe that these results are indicative of an average runtime complexity that is vastly more efficient than the worst case suggested by Corollary 7. In future work, we hope to examine this average case performance more thoroughly. Finally, in [16], the authors already noted the close connection between the system (1), and the equations of optimality that one encounters in the theory of Markov decision processes (MDPs) [11]. Moreover, the method that we propose in Proposition 7, although discovered independently, is reminiscent of the policy iteration algorithm for MDPs. Interestingly, the method in Proposition 3 bears a similar resemblance to the value iteration algorithm for MDPs. We hope to explore these connections more fully in future work. Acknowledgments The main results of this work were previously presented, without proof, as an abstract-and-poster contribution at ISIPTA 2019. The author wishes to express his sincere gratitude to Jasper De Bock for stimulating discussions and insightful comments during the preparation of this manuscript. He would also like to thank an anonymous reviewer for their helpful suggestions.



References
1. Augustin, T., Coolen, F.P., Cooman, G.D., Troffaes, M.C.: Introduction to Imprecise Probabilities. Wiley, New York (2014)
2. Brand, J.v.d.: A deterministic linear program solver in current matrix multiplication time. In: Symposium on Discrete Algorithms, pp. 259–278 (2020)
3. Campos, M.A., Dimuro, G.P., da Rocha Costa, A.C., Kreinovich, V.: Computing 2-step predictions for interval-valued finite stationary Markov chains (2003)
4. Dadush, D., Huiberts, S.: A friendly smoothed analysis of the simplex method. In: Proceedings of STOC 2018, pp. 390–403 (2018)
5. De Bock, J.: The limit behaviour of continuous-time imprecise Markov chains. J. Nonlinear Sci. 27(1), 159–196 (2017)
6. De Cooman, G., De Bock, J., Lopatatzidis, S.: Imprecise stochastic processes in discrete time: global models, imprecise Markov chains, and ergodic theorems. Int. J. Approx. Reason. 76, 18–46 (2016)
7. De Cooman, G., Hermans, F.: Imprecise probability trees: bridging two theories of imprecise probability. Artif. Intell. 172(11), 1400–1427 (2008)
8. De Cooman, G., Hermans, F., Antonucci, A., Zaffalon, M.: Epistemic irrelevance in credal nets: the case of imprecise Markov trees. Int. J. Approx. Reason. 51(9), 1029–1052 (2010)
9. De Cooman, G., Hermans, F., Quaeghebeur, E.: Imprecise Markov chains and their limit behavior. Probab. Eng. Inf. Sci. 23(4), 597–635 (2009)
10. Dunford, N., Schwartz, J.T.: Linear Operators, Part 1: General Theory, vol. 10. Wiley, New York (1988)
11. Feinberg, E.A., Shwartz, A.: Handbook of Markov Decision Processes: Methods and Applications, vol. 40. Springer Science & Business Media, Heidelberg (2012)
12. Hartfiel, D.J.: Markov Set-Chains. Springer, Berlin (2006)
13. Hermans, F., Škulj, D.: Stochastic processes. In: Augustin, T., Coolen, F.P., Cooman, G.D., Troffaes, M.C. (eds.) Introduction to Imprecise Probabilities, chap. 11. Wiley, New York (2014)
14. Ibarra, O.H., Moran, S., Hui, R.: A generalization of the fast LUP matrix decomposition algorithm and applications. J. Algorithms 3(1), 45–56 (1982)
15. Kozine, I.O., Utkin, L.V.: Interval-valued finite Markov chains. Reliab. Comput. 8(2), 97–113 (2002)
16. Krak, T., T'Joens, N., De Bock, J.: Hitting times and probabilities for imprecise Markov chains. In: Proceedings of ISIPTA 2019, pp. 265–275 (2019)
17. Krak, T., T'Joens, N., De Bock, J.: Hitting times and probabilities for imprecise Markov chains (2019). https://arxiv.org/abs/1905.08781
18. Lopatatzidis, S.: Robust modelling and optimisation in stochastic processes using imprecise probabilities, with an application to queueing theory. Ph.D. thesis, Ghent University (2016)
19. Norris, J.: Markov Chains. Cambridge University Press, Cambridge (1997)
20. Škulj, D.: Finite discrete time Markov chains with interval probabilities. In: Soft Methods for Integrated Uncertainty Modelling, pp. 299–306. Springer, Berlin (2006)
21. Todd, M.: The many facets of linear programming. Math. Program. 91 (2002). https://doi.org/10.1007/s101070100261
22. Walley, P.: Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London (1991)

Part III

Robust and Reliability-Based Design Optimisation in Aerospace Engineering (RBDO)

Multi-Objective Robust Trajectory Optimization of Multi-Asteroid Fly-By Under Epistemic Uncertainty

Simão da Graça Marto and Massimiliano Vasile

1 Introduction This chapter considers the problem of finding optimal trajectories subject to epistemic uncertainty in system parameters and initial conditions. In particular, it considers the case in which the uncertainty is time dependent and an optimal control law is sought that is robust against multiple realizations of a set of uncertain model parameters. This type of problem is characteristic of the preliminary design of small-class, low-cost space missions, where the uncertainty on the performance of the spacecraft is expected to be large, especially in the early stages of the system definition. When this is the case, the design of a single optimal trajectory without accounting for uncertainty could lead to solutions with a high probability of failure. A direct assessment of the robustness of the solution via Monte Carlo simulation is, however, expensive, in particular in the case of epistemic uncertainty, where no single probability distribution is known. Thus, in this chapter, we propose a method that uses Bernstein polynomials to build a representation of a family of probability distributions and to compute the lower expectation of the realization of a set of events. The trajectory is then optimized with respect to this expectation. The use of the lower expectation as cost function provides solutions that are robust against the worst-case realization of the uncertainty in the trajectory model. This chapter is particularly concerned with the development of an efficient approach to the calculation of the lower expectation and the optimization of the trajectory.

S. da Graça Marto () · M. Vasile University of Strathclyde, Glasgow, UK e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Vasile, D. Quagliarella (eds.), Advances in Uncertainty Quantification and Optimization Under Uncertainty with Aerospace Applications, Space Technology Proceedings 8, https://doi.org/10.1007/978-3-030-80542-5_13




Previous work on the subject was presented by Olympio and Izzo [13] using Taylor algebra and, more recently, by Greco et al. [7] using a set-oriented approach based on generalized polynomial algebra (GPA). In both cases, polynomial expansions were used to relate the final state to a set of uncertain variables. The work in [13] assumes uncertainty only in the initial/final states, and each control law is optimal for one realization of the uncertain variables. It thus obtains a family of control laws under the assumption that the uncertainty is aleatory. The work in [7], on the other hand, obtains a single robust feed-forward solution and considers the effect of epistemic uncertainty on the initial state. Stochastic approximation methods were proposed by Olympio [12], who uses Robbins–Monro to enforce chance constraints and Kiefer–Wolfowitz for stochastic minimization. That work applies these methods to the optimization of a control law that is required to be robust to temporary single-engine failures, where the trajectory is re-calculated following the engine recovery, thus being a feed-back control law as well. Also in this case the uncertainty is modelled as aleatory. In [14], Ozaki et al. proposed a method based on differential dynamic programming with an unscented transform for uncertainty propagation. A feed-back control law was introduced to optimize the expected value of a cost function that combines Δv with the final distance to the target. This formulation assumes that the uncertainty is Gaussian and remains Gaussian when propagated forward in time; thus it can always be described by the first two statistical moments. Following a different approach, Wachi [18] proposes a method that changes targets in the event of a mission failure. A value is attributed to each target, and its expectation is optimized using dynamic programming with a Markov decision process formulation. In most of these methods, even though the problem is not deterministic, the assumption is still that the probability density function associated with each uncertain variable is completely known. However, many researchers [11] have argued that modelling uncertainty due to lack of knowledge probabilistically is inadequate. For example, if different experts suggest different probability distributions, we could use a Bayesian approach and attribute a probability to each distribution. However, this could result in severely underestimating the probability of some negative event happening, leading to a much worse design than expected. In the work of Di Carlo et al. [5], an epistemic uncertainty formulation was proposed, using Dempster–Shafer theory of evidence. The belief in meeting a propellant mass threshold is optimized subject to a constraint on the belief in reaching the desired final state. Quantifying epistemic uncertainty using the lower expectation and Bernstein polynomials was first proposed by Vasile and Tardioli [17] using a linear optimization approach; however, the size of the linear optimization problem grew exponentially with the number of dimensions. Thus, in this chapter, we revisit the approach proposed in [17] so that the size of the optimization problem grows linearly with the number of dimensions. We also combine the use of Bernstein polynomials with the idea, proposed in [5], of using a surrogate that directly maps the decision variables into the value of the lower expectation. This mapping completely bypasses the computation of the lower expectation during the search for an optimal trajectory.



We consider a simple case of a low-thrust multi-asteroid fly-by tour with epistemic uncertainty in initial conditions, thrust modulus, and specific impulse and also obtain results for a previously introduced asteroid rendezvous problem [9] with the same type of uncertainty.

2 Problem Formulation In this chapter, we consider general problems of the following form:

min_{y ∈ Y, ν ∈ N} [−\underline{E}(h_1 < ν_1), −\underline{E}(h_2 < ν_2), …, −\underline{E}(h_m < ν_m), ν_1, ν_2, …, ν_m]^T
s.t. ẋ = b(x) + f(x, y, ξ),    (1)

where x is the state vector, y ∈ Y is a transcription for the control variable, ξ ∈ Ω is a transcription for the uncertainty, the scalar functions h_{(·)} = h_{(·)}(y, ξ) represent quantities of interest, and ν = [ν_1, ν_2, …, ν_m] ∈ N is a vector of thresholds on these quantities. In this chapter, these quantities of interest can be the propellant mass, the distance to the target, or the relative speed to the target. The lower expectation \underline{E} quantifies the uncertainty; its definition and calculation are given in Sect. 3.

In the following, the motion will be defined in non-singular equinoctial parameters; thus we have that x = [a, P_1, P_2, Q_1, Q_2, L, m]^T, where L is the true longitude. In this chapter, we consider low-thrust trajectories consisting of an ejection by a conventional launcher, followed by a number of alternating coast and thrust arcs with ion propulsion. The ejection is characterized by the departure time t_D, and the magnitude v_∞, azimuth γ, and declination δ of the hyperbolic excess velocity relative to the Earth in a heliocentric reference frame. The i-th coast arc is characterized by its length in longitude L_{OFF,i}, i.e., the difference between the longitude at the end of the arc and at the beginning. The i-th thrust arc is characterized by its length L_{ON,i}, and by the azimuth α_i and declination β_i that the spacecraft engine is pointing towards. Variables that are required to calculate a trajectory, but which are not part of the control vector, are the engine thrust at r = 1 AU, T, and the specific impulse I_sp. We also optimize the times of arrival at each target t_{T,i}. For more details on the transcription approach, including accuracy and computational cost, please refer to [3, 4, 9] and [20].

We consider the thrust and specific impulse to vary in an epistemically uncertain way, so these quantities are modelled as functions parameterized by epistemically uncertain parameters, T = T(L; T_1, …, T_{n_T}) and I_sp = I_sp(L; I_{sp,1}, …, I_{sp,n_T}). The values T_i and I_{sp,i} are the values of the thrust and specific impulse at equispaced points in the trajectory, and the value of thrust and



specific impulse for a particular true longitude L is obtained via linear interpolation. We also consider v∞ as an epistemically uncertain variable, so that the uncertain vector is ξ = [v∞ , T1 , . . . , TnT , Isp,1 . . . Isp,nI ].
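A minimal sketch of this parameterization follows; it is an illustration rather than code from the chapter, and the node values, units, and longitude span are hypothetical.

```python
import numpy as np

def thrust_profile(L, L_span, T_nodes):
    """Piecewise-linear thrust modulus T(L; T_1, ..., T_nT): the epistemically
    uncertain node values T_nodes sit at equispaced true longitudes over
    L_span and are linearly interpolated at the query longitude L."""
    L_nodes = np.linspace(L_span[0], L_span[1], len(T_nodes))
    return np.interp(L, L_nodes, T_nodes)

# hypothetical thrust nodes (in newtons) over two revolutions in true longitude
T_nodes = [0.105, 0.098, 0.110, 0.092]
print(thrust_profile(L=2.3 * np.pi, L_span=(0.0, 4.0 * np.pi), T_nodes=T_nodes))
```

The specific impulse I_sp(L; I_{sp,1}, …) can be treated in exactly the same way, with its own set of uncertain node values.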

3 Lower Expectation Before we go into the solution of problem (1), we introduce a procedure to compute the lower expectation. This is a critical point of our methodology that requires careful consideration, because the computation of the lower expectation is, in the general case, an NP-hard global optimization problem. We write the indicator function I of a generic quantity of interest h as

I(y, ξ, ν_h) = (h(y, ξ) < ν_h).    (2)

For clarity, throughout this section, we will drop the dependency of I on y and ν_h. With epistemic uncertainty, the probability distribution followed by the uncertain variables ξ is unknown. Multiple conflicting sources of information may suggest different distributions. So, combining these sources of information, a family of distributions, or p-box, P is defined so as to contain the real distribution. Thus, we quantify the lower expectation \underline{E} as the minimum expectation obtainable with distributions p ∈ P:

\underline{E}(I) = min_{p ∈ P} E(I; p),    (3)

where E(I; p) is the expectation function:

E(I; p) = ∫_Ω I(ξ) p(ξ) dξ,    (4)



and  is the space of the uncertain variables ξ . The formula in Eq. (3) requires estimating expectation and then finding its minimum, but first a family of distributions must be defined. For this purpose, we follow [17] and use Bernstein polynomials. We could have a family composed of multivariate Bernstein polynomials, which would include distributions of dependent variables, ⎧ 

⎫ ⎪ ⎪ ⎪ pm (ξ , c) = cj Bj τ (ξ ) ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎬ j∈J , (5) Pm =  ⎪ ⎪ ⎪ ⎪ : ∀c > 0 c = 1 ⎪ ⎪ j ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ j∈J



or a family of distributions of independent variables where each variable’s distribution is a uni-variate Bernstein distribution ⎧ ⎫ n ξ qk ⎪ ⎪ =  ⎪ ⎪ ⎪ ⎪ (k) ⎪ pu (ξ ; c) = ⎪ ⎪ c b (τ (ξ )) j ;qk k k ⎪ ⎪ ⎪ j ⎨ ⎬ k=1 j =0 . (6) Pu =  (k) ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ : ⎪ ∀c > 0 ⎪ cj = 1 ∀k ⎪ ⎪ ⎪ ⎪ ⎩ ⎭ j In our previous work [9], we showed that optimizing with either of these families is equivalent to using 8 7

Pj = Bj,q τ (ξ ) ∀j ∈ J ⊂ Nnξ ,

(7)

where B is a multi-variate Bernstein basis function (Eq. (8)), scaled to be a valid probability distribution (i.e., to integrate to one over the uncertainty space ), and τ is a function that maps the uncertainty space to the unit hyper-cube [0, 1]nξ , nξ being the number of uncertain variables. Bj,q (x) =

nξ =

(qk + 1)

k=1

  qk jk x (1 − x)qk −jk , jk

(8)

where the integer vector q indicates the order of the polynomial for each stochastic variable, and j is the multi-index that selects a specific function within the family. The problem of minimizing the expectation becomes one of finding the multi-variate Bernstein basis function that minimizes the problem, and we only need to compute expectations for those distributions.

3.1 Minimizing the Expectation Obtaining the lower expectation requires solving the optimization in Eq. (3). Since the family of distributions is P = Pj , this can be written as  E(I ) = min j∈J





I (ξ )Bj,q τ (ξ ) dξ ,

(9)

where we wish to find the multi-index j that corresponds to the function Bj,q that minimizes the expectation. This is an optimization problem over integer variables. We consider a local search method, a pattern search, as in our previous work [9], except here we consider more variants of this algorithm, and compare them. In this iterative method, if at iteration k we have the multi-index jk , we form a

214

S. da Graça Marto and M. Vasile

neighbourhood Nk around point jk and choose the multi-index jk+1 ∈ Nk that minimizes E(I ; Bjk+1 ). This neighbourhood is composed of multi-indexes where only one index is different from jk , hence the term pattern search. We consider two different neighbourhoods, NkB and NkS . The former considers every multi-index that only differs from jk by one index, Eq. (10). The latter only contains multi-indexes where the differing index differs by 1, Eq. (11). 8 7 NkB (jk ) = j ∈ J \ {jk } : ∃m, ∀i = m, ji = jki

(10)

7 8 NkS (jk ) = j ∈ NkB : ∃m, jm = jkm ± 1 .

(11)

These methods always find a local optimum but have no guarantee of finding the global optimum. The solution returned by these methods depend on the initial guess j0 . In [9], we proposed a greedy initialization  ∀k, j0k = arg min i



I (ξ )bi,qk (ξk )dξ .

(12)

f

Let jk 1 be the outcome of our pattern search algorithm from this starting point. Again following [9], we also investigate the option of re-starting the search from the symmetric point jf1 +1 : f +1

jk 1

f

= qk − jk 1 ∀k.

(13)

The solution obtained from this starting point is termed jf2 , and naturally we use the optimal of the two j ω2 = arg minj∈{jf1 , jf2 } E(I ; j). We compare the results of these methods with multiple restarts from random initial points, and with Matlab’s® genetic algorithm, using its default settings, i.e., a uniformly random initial population (50 if nξ ≤ 50, 200 otherwise), the scattered crossover function with 0.8 crossover fraction, and a Gaussian mutation function. Because the lower expectation is always greater than or equal to zero, whenever any of these algorithms find a zero, the search is interrupted.

3.2 Estimating the Expectation Given the difficulty of calculating the integral in Eq. (4) analytically, a quasi-Monte Carlo (qMC) approach is employed, following our previous work [9]. This is identical to a Monte Carlo approach, but instead of selecting the samples randomly, a deterministic, low-discrepancy sequence ξ (1) , . . . , ξ (N ) is used. We use the Halton sequence [8] for this purpose, transformed so that it follows whichever distribution Bj we want to estimate the expectation with.

Multi-Objective Robust Trajectory Optimization

215

The estimate of the expectation using qMC is N 1  Eˆ P (I ; p) = I (ξi ) , N

ξi ∼ p ,

(14)

i=1

where the number of samples we use is N = 5000. At each step of the pattern search algorithm, the multi-index j specifies the current estimate of which distribution Bj corresponds to the lower expectation. Our local search method consists of evaluating the expectation while varying one of the indices in j at a time. To evaluate the points in NkB , for instance, requires simulating the trajectory N · q · nξ times. There is overlap between the Bernstein basis distributions, which allows making use of some of the samples obtained to estimate E(I ; Bjk ) in the estimates of E(I ; Bjk+1 ) for jk+1 ∈ Nk (jk ). We now introduce a method based on rejection sampling, which we term qMC+RS to distinguish from the baseline method qMC. We choose samples to reuse based on the acceptance probability given as ⎛ pa (x) = min ⎝

Bjk+1 ,q (x) Bjk ,q (x)

⎞ , 1⎠ .

(15)

The resulting samples will follow the distribution 



pr (x) = min Bjk+1 ,q (x), Bjk ,q (x) .

(16)

The function pr (x) is not normalized in the 9above equation, and its integral represents the fraction of reused samples ηr = [0,1]nξ pr (x) dx. In order to obtain samples following distribution Bjk+1 , we sample a complementary distribution, given as   pc (x) = max 0, Bjk+1 ,q (x) − Bjk ,q (x) ,

(17)

where again pc (x) is not normalized in the above 9equation, and its integral represents the fraction of new samples 1 − ηr = [0,1]nξ pc (x) dx. Note that pr (x) + pc (x) = Bjk+1 ,q (x). Reused samples and newly obtained samples are combined and used to estimate the expectation. When the best element in Nk is selected, it is sampled with exclusively new samples, i.e., using qMC only, to avoid the accumulation of loss of accuracy, caused by applying rejection sampling with a non-smooth pa [19]. Only one index differs between jk and jk+1 . Let i be this index such that jki = k+1 ji . In Table 1, the value of ηr is shown for all possible pairs jki , jk+1 . i

216

S. da Graça Marto and M. Vasile

Table 1 Fraction of reusable points ηr , when jk and jk+1 only differ for the i-th index, given for each pair jki , jk+1 i

jki 0 1 2 3 4

jk+1 i 0 – 0.590 0.331 0.164 0.062

1 0.590 – 0.654 0.375 0.164

2 0.331 0.654 – 0.654 0.331

3 0.164 0.375 0.654 – 0.590

4 0.062 0.164 0.331 0.590 –

To keep their low-discrepancy properties, the samples are accepted deterministically. Let (x)k , k = 1 . . . N be our original samples following B jk ,q . We go through the individual (x)k in ascending order of (xi )k , keeping a subtotal of the values of pa ((x)k ). Whenever the subtotal exceeds 1, it is decremented and the sample is accepted. Otherwise, it is rejected.

4 Multi-Objective Optimization Due to the high number of dimensions in the control law y ∈ Y, we follow [9] in using a control mapping strategy to reduce the number of dimensions of the search space. As such, we have a control map y(z), where the proxy control variable z ∈ Z has fewer dimensions than y, as described in Sect. 4.1. As previously mentioned, the target sets are represented by a threshold ν on a metric, and thus the lower expectations are written as functions of ν, as well as y, E = E(y(z), ν). To avoid the computational burden of estimating the lower expectation and computing the control map, surrogate models E˜ are obtained that ˜ ν) ≈ E(y(z), ν), sidestep both of these computations, by approximating E(z, similarly to [5]. We use Kriging models implemented in the DACE toolbox [10]. The optimization problem (1) is thus written as   ˜ ν), ν . min −E(z, z,ν

(18)

We introduce a threshold mapping technique, wherein a proxy threshold value ν ∗ is mapped to ν, in order to focus the search on more useful regions of the search space. This process is explained in Sect. 4.2. When using threshold mapping, the optimization problem becomes   ˜ ν ∗ ), ν˜ (z, ν ∗ ) . min∗ −E(z, z,ν

(19)

These surrogate models are first trained with 100 initial training points. After this, we solve problem (18) or (19) using MACS [15], with 10 agents and 10 elements in the archive. After MACS runs, the points in its estimate of the Pareto front are

Multi-Objective Robust Trajectory Optimization

217

evaluated exactly, and these exact values are added to the surrogate model’s training set. These steps are repeated 10 times, such that in total we evaluate 100 + 10 × 10 points.

4.1 Control Mapping for Dimensionality Reduction Here the control mapping strategy in [9] is briefly explained in “Deterministic Control Map”, as well as two novel control maps in “Max-Min Control Map” and “Min-Max Control Map”.

Deterministic Control Map In this control map, proposed in [9], the search space is restricted to control laws that are optimal for a deterministic setting, targeting positions, and velocities in the vicinity of the targets. The only difference is that here we use multiple shooting, and thus the optimization variable includes the states at the beginning of each arc, F for OFF nodes. The states at a node i.e., at the nodes, xON for ON nodes and xOF i i +  and that are obtained by propagating from the previous node are written as xON i  + F xOF . i Defining the array of node states,   OF F OF F . X = x0 , xON , . . . , xON nLT , xnLT 1 , x1

(20)

Thus the control mapping y = y∗ (ξ , Dr , Dv ) is defined as the vector y that, along with X, solves the optimal control problem y∗ (ξ , Dr , Dv ), X∗ = arg min mp (y, ξ ) y,X s.t. r(X, y, ξ , (tT )r ) = rr ((tT )r ) + (Dr )r v(X, y, ξ , (tT )r ) = vr ((tT )r ) + (Dv )r if target is not fly-by only. +  ON xON = x s s

(21)

+  F F xOF = xOF s = 1, . . . , nLT s s x0 = (x0 )+ , where the r-th target position rr and velocity vr are calculated using Keplerian propagation for the target’s fly-by time (tT )r , and the spacecraft positions at fly-by

218

S. da Graça Marto and M. Vasile

r(X, y, ξ , (tT )r ) and v(X, y, ξ , (tT )r ) are calculated for (tT )r , by propagating from the immediately previous node in X. The state at departure (x0 )+ is defined by the departure conditions v∞ , γ, δ, the last two being part of y. The vectors (Dr )r and (Dv )r are the displacements in position and velocity for the r-th target. If a target is fly-by only, as in the test case in Sect. 5, the constraint on velocity is not applied and (Dv )r is not a part of z. If, on the other hand, the target is a rendezvous target, the constraints are in fact applied by equating the equinoctial elements, instead of position and velocity. This restricts the search to the set Y ∗ ⊂ Y , given as + * Y ∗ = y ∈ Y : y = y ∗ (ξ , Dr , Dv ) ∀ξ ∈ , Dr ∈ Rr , Dv ∈ Rv ,

(22)

where Rr and Rv are the 3D box sets. We also consider a further restricted search space Y − ⊂ Y ∗ ⊂ Y , where we fix the displacements to zero * + Y − = y ∈ Y : y = y ∗ (ξ , 0, 0) ∀ξ ∈  .

(23)

  For control maps Y ∗ and Y − , we have z = ξ , Dr , Dv and z = ξ , respectively.

Max-Min Control Map In order to reduce the search space even further, and remove the dependency of the computational time on the number of uncertain variables nξ , we test a new control map, where we restrict the search only to the control laws that correspond to a worstcase scenario, that is, to the uncertain vector ξ for which theoptimalcontrol law is worse, such that the proxy control variable is given as z = Dr , Dv . We define it as yB (Dr , Dv ) = y∗ (ξ M , Dr , Dv ) s.t. ξ M = arg max mp (y∗ (ξ , Dr , Dv ), ξ ).

(24)

ξ

The value of ξ M is defined through a max-min problem, a bilevel problem where only the lower level, the solution of y∗ , is subject to constraints. Similarly to the work in [16], we use a surrogate for the inner problem output, i.e., we obtain m ˜ p (ξ ) ≈ mp ( y∗ (ξ , Dr , Dv ), ξ ). However, for the specific application of solving a max-min problem, we use the fact that the inner problem objective is the same value as the outer problem; thus we solve ˜ p (ξ ). ξ M = arg max m ξ

(25)

Multi-Objective Robust Trajectory Optimization

219

We use a Kriging model, implemented in the DACE toolbox [10], for the surrogate m ˜ p . We train this model with 15 training points. Equation (25) is solved on this surrogate model using MP-AIDEA [2]. This solution is then verified by computing it without the surrogate model, and the model is refined with this value. This process of optimizing and refining is iterated 5 times. This control map restricts the search space to Y B , defined as 8 7 Y B = y ∈ Y : y = yB (Dr , Dv ) ∀Dr ∈ Rr , Dv ∈ Rv ,

(26)

where Y B ⊂ Y ∗ ⊂ Y and Rr and Rv are box sets. Because this control map does not depend on ξ , its size is decoupled from the number of uncertain variables nξ . This makes the surrogate modelling and MO optimization process more scalable.

Min-Max Control Map An additional control map is also proposed, which defines z the same way as Y B , but is formulated as a min-max problem where the constraint is applied on both the minimization and maximization subproblems: yM (Dr , Dv ) = arg min max mp (y, ξ ) y∈Y ξ ∈ s.t.

(27)

max C(y, ξ ) ≤ , ξ ∈

where the constraint C(y, ξ ) is defined as C(y, ξ ) =



r(X, y, ξ , (tT )r ) − rr ((tT )r )2 ,

(28)

r

for fly-by constraints, and as C(y, ξ ) =



2 x(y, ξ , (tT )r )j − xrj ((tT )r ) , r

(29)

j

for the rendezvous constraints, where x is the equinoctial elements, and the semimajor axis a is in astronomical units. We solve Eq. (27) using MacMinMax [6], a min-max solver with strict constraints. Because the constraints cannot be met for all ξ , the constraint threshold  ∈ R is found by MacMinMax via its iterative constraint relaxation process. This algorithm works by solving outer, inner, and constraint subproblems, for more information consult [6]. Given the simplicity of the outer and constraint subproblems, these are run using MATLAB’s fmincon-sqp ® with step and constraint tolerances

220

S. da Graça Marto and M. Vasile

set to 10−10 and 10−5 , respectively, and a maximum of 5000 function evaluations. The inner subproblem is solved using MP-AIDEA [2]. This control map defines a reduced space, 8 7 Y M = y ∈ Y : y = yM (Dr , Dv ) ∀Dr ∈ Rr , Dv ∈ Rv ,

(30)

which does not necessarily satisfy Y M ⊂ Y ∗ , unlike Y B , since there is no expectation that the resulting control satisfies the constraints for any specific values of ξ , Dr , Dv .

4.2 Threshold Mapping By defining the search space as a hyper-rectangle over the thresholds, we include a lot of “uninteresting” points where the lower expectation is either zero or one. These points are always dominated, as shown in Fig. 1. We investigate a threshold mapping strategy intended to focus the search efforts on the “interesting” region. We introduce a proxy threshold variable ν ∗ , which maps to the thresholds ν, in such a way that 0 < E(y(z), ν(ν ∗ , y(z))) < 1 .

(31)

Let h(y(z), ξ ) be our quantity of interest involved in the definition of the lower expectation, as explained in Sect. 2. Let also h(y(z)) = minξ ∈ h(y(z), ξ ) and h(y(z)) = maxξ ∈ h(y(z), ξ ). Our requirement in Eq. (31) implies h < ν < h, which can be trivially satisfied with the following mapping: Fig. 1 Abstract diagram representing the search space. When using a hyper-rectangular sample region, as we do, many of the samples fall into the regions where E = 0 or E = 1, which are shaded in this diagram. Points in these regions are always dominated. For example, point A is dominated by point B, and C by D, since they have the same value of E but with larger ν

A

=1

C D

( ),

= min

( ),

samples

B 0
0) and noiseless cases. Robust Knowledge Gradient can be applied if we can sample exactly at the desired location during optimization. Also in [10] it is assumed that while the output noise () is inherent in the measurement process and cannot be controlled, the disturbance δ can be controlled during optimization and only comes into play when the design is manufactured. However in many cases, one cannot control the input. In this chapter, we specifically consider the case where the design variables are also disturbed during optimization. Moreover, we re-visit DRA, suggest some improvements, and highlight the fact that it is very good in situations where the disturbance δ is already uncontrollable during optimization.



3 Problem Definition We aim to search for the robust maximum of a black-box function f , defined over the input domain X ⊂ RD subject to independent white noise  ∼ N(0, σ2 ) with constant variance σ2 across the input domain and a limited budget of N samples (N evaluations of the latent function f ). Additionally, the disturbance δ ∈ [x − , x + ] is distributed around each solution x ∈ X with probability density P[δ]. The objective is to maximize the following robustness function:   ∞

max F (x) = f (x + δ) +  P[δ]ddδ. (1) x∈X

− −∞

As the measurement of the method’s quality, we choose the opportunity cost, which is the difference between the maximum expected value (robust maximum) and the value of robustness function F at the solution returned by the algorithm xr . Opportunity Cost = max F (x  ) − F (xr ). x

We assume the latent function f can be approximated reasonably well by a Gaussian process (GP). More details about GPs will be discussed in Sect. 4. In each iteration, based on the information collected so far, the algorithm sequentially chooses the solution to be sampled next so that the final opportunity cost is minimized.

4 Methodology 4.1 Gaussian Process We choose a Gaussian process as surrogate model in our Bayesian optimization. A Gaussian process is a collection of random variables, any finite number of which has a joint Gaussian distribution [13]. A Gaussian process is characterized by its mean function μ0 (x) and kernel (covariance function) 0 (x, x  ), where μ0 (x) = E[f (x)], 0 (x, x  ) = E[(f (x) − μ0 (x))(f (x  ) − μ0 (x  ))]. We choose the constant mean function and widely used square-exponential kernel, i.e., μ0 (x) = μ0 ,



x − x  2 0 (x, x ) = α0 exp − 2lx2 

 .

248

H. P. Le and J. Branke

The choice of other mean functions and kernels is discussed in [5] and [13]. Given the vector of observations f (x 1:n ) = (f (x 1 ), . . . , f (x n ))T at the vector of points x 1:n = (x 1 , . . . , x n )T , where n is the number of total samples so far, the posterior mean μn and posterior covariance  n can be computed as follows:





−1 1:n f x − μ0 x 1:n + μ0 (x), μn (x) = 0 x, x 1:n 0 x 1:n , x 1:n + σ 2 In



−1

 n (x  , x) = 0 (x  , x) − 0 x  , x 1:n 0 (x 1:n , x 1:n + σ 2 In 0 x 1:n , x , where In is the identity matrix of size n and 0 (x, x 1:n ) = (0 (x, x 1 ), . . . , 0 (x, x n )), 0 (x 1:n , x) = (0 (x 1 , x), . . . , 0 (x n , x))T , 0 (x 1:n , x 1:n ) = [0 (x 1 , x 1:n ), . . . , 0 (x n , x 1:n )]. The Gram matrix ⎛

0 (x 1 , x 1 ) 0 (x 1 , x 2 ) ⎜ ⎜0 (x 2 , x 1 ) 0 (x 2 , x 2 ) 0 (x 1:n , x 1:n ) = ⎜ .. .. ⎜ . . ⎝ n 1 0 (x , x ) 0 (x n , x 2 )

⎞ . . . 0 (x 1 , x n ) ⎟ . . . 0 (x 2 , x n )⎟ ⎟ .. .. ⎟ . . ⎠ n n . . . 0 (x , x )

(2)

is also called covariance matrix and is positive semidefinite. Gaussian process can be fitted to deterministic data (when σ = 0) or stochastic data (when σ > 0). Maximum likelihood has been used for tuning the hyperparameters of the model and minimizing the model mismatch.

4.2 Robust Bayesian Optimization Direct Robustness Approximation The simplest way to apply Bayesian optimization to find the robust solutions solving (1) is to estimate F (x) by sampling over δ and applying standard acquisition functions at each sample location. We have called this method Direct Robustness Approximation (DRA) in [10]. The idea is to directly approximate the robustness function F by the GP, and each observation with random disturbance and output noise is taken as a sample of F . Every observation is averaged over k independent replications to reduce the observation noise, and the method is thus denoted DRA(k). Since for each solution k replications are needed, the total budget of function evaluations N should be divisible by k in order for it to be used up.

Bayesian Optimization for Robust Solutions Under Uncertain Input

249

Standard KG is used in the role of the acquisition function. Knowledge Gradient of a point is the expected value of the increase in the maximal values of the posterior mean if one can sample once more at that point. Given the observation at x 1 , . . . , x n , with the assumption that the next sample x n+1 will be at x, KG can be written as   KGn (x) := E max μn+1 (x  ) − max μn (x  )|x n+1 = x . x

x 

KG calculated for a discrete and finite set was introduced by Frazier in [6] as KG for correlated belief. Then Scott et al. present the KG for continuous parameters in [16] that can be approximated by the maximization over a finite subset of the input space, for instance KGn is approximated by discretizing X over a subset Xm = {x1 , . . . , xm } ⊂ X as follows: KGn (x) := E[max{μ1 + Zσ1 , . . . , μm+1 + Zσm+1 }|x n+1 = x],

(3)

where Z ∼ N(0, 1) and μi = μn (xi ) − μm+1 = μn (x) −

max

μn (x  ),

max

μn (x  ),

x  ∈Xm ∪{x}

x  ∈Xm ∪{x}

σi = σ˜ n (xi , x) = σm+1 = σ˜ m+1 =

 n (xi , x) , σ n (x)

i = 1, m,

i = 1, m,

 n (x, x) . σ n (x)

The GP model for DRA(k) would have to always allow for observation noise even if the underlying function f is deterministic, as observations are still stochastic due to the random input disturbance. The method returns the solution with best posterior mean of the approximated robustness function [10]. The flowchart in Fig. 1 summarizes method DRA(k). Because each observation is an average over multiple samples, the method is computationally expensive. However, the estimate of the solution quality can be improved if the k samples are drawn by Latin Hypercube Sampling (LHS) [11] rather than random sampling.

Robust Knowledge Gradient xrN = arg max M N (x). x

250

H. P. Le and J. Branke

Fig. 1 Flowchart for DRA(k)

The flowchart in Fig. 2 summarizes method rKG. As described in [10], Robust Knowledge Gradient adapts the standard Knowledge Gradient, where after n samples, rKGn is approximated by discretizing over a set Xm = {x1 , . . . , xm } ⊂ X as follows:   ˜ 1 , . . . , Mm+1 + Z  ˜ m+1 } , (4) rKGn (x) := E max{M1 + Z  where ˜i =  ˜ n (xi , x), 

i = 1, m,

˜ m+1 =  ˜ n (x, x),  Mi = M n (xi ) − Mm+1 = M n (x) −  M (x) = n

˜ n (x  , x) = 











max

M n (x  ),

max

M n (x  ),

x  ∈Xm ∪{x}

x  ∈Xm ∪{x}

μn (x + δ)P[δ]dδ, σ˜ n (x  + δ, x)P[δ]dδ,

i = 1, m,

x ∈ X, x  ∈ X.

Bayesian Optimization for Robust Solutions Under Uncertain Input

251

Fig. 2 Flowchart for rKG

The expectation in (4) can be computed using Algorithm 1 in [16]. We have previously derived an analytical formula for computing rKGn in case of uniformly distributed disturbance, constant prior mean μ0 , and the choice of squared-exponential kernel with D-dimensional input [10]. The approach fits a GP model over all sampled points and returns the solution with best estimated robustness performance.

4.3 Stochastic Kriging In most papers on Bayesian optimization, it is assumed that output noise is homoscedastic, i.e., the noise variance is constant across the input domain. Since the disturbance, if it cannot be avoided during optimization, leads to heteroscedastic observation noise, the assumption no longer holds. The shape of the function in the local neighborhood of a solution depends on the solution, making the resulting output variance location-dependent due to input uncertainty. Stochastic Kriging (SK) was introduced in [2] as a solution for the problem of poor metamodel fitting in such cases. SK correctly accounts for both sampling and response-surface uncertainty [2]. Thus for DRA(k), we try fitting a SK model on the set of sampled points instead of a standard GP.

252

H. P. Le and J. Branke

Table 1 Comparison of DRA(k) and rKG Method Observations Metamodels used Function learnt Acquisition function Solution returned

DRA(k) With random disturbances Either GP or SK Robustness function F Standard knowledge gradient Best posterior mean

rKG Exactly without disturbances GP Latent function f Robust knowledge gradient Best robust mean

Rather than a single observation, SK requires multiple (ni > 1) observations at each sampled location x i . The observation and intrinsic noise at each point can then be calculated as ni 1  fj (x i ), f¯(x i ) = ni j =1

σ2 (x i ) =

1 V ar(fj (x i )). ni

The formulas for computing posterior mean and covariance still have the same form, but with the adjusted component in the inverted matrix,

n



μn (x) = 0 x, x 1:n (M +   )−1 f x 1:n − μ0 x 1:n + μ0 (x),

n

 n (x  , x) = 0 (x  , x) − 0 x  , x 1:n (M +   )−1 0 x 1:n , x , n is the Gram matrix as in (2) and  = diag(σ 2 , . . . , σ 2 ). where M  n 1 The only difference between GP and SK in this approach is that the matrices of noise variance are σ In and  , respectively. In particular, in the case of 2 = · · · = σ 2 , SK becomes a GP. And thus a SK is homoscedasticity, σ1 n characterized by mean function, kernel, and a diagonal matrix of noise variances at the sampled points. We predict the noise variances at not yet observed points assuming that it can be fitted by another GP. For simplicity, this GP has the same type of mean function and kernel as of the SK. It is worth noting that noise variance is not negative; hence, we choose the rectifier of the posterior mean as the noise variance at each point. Table 1 summarizes and compares the two methods mentioned above.

5 Experiments We study the same benchmark problems as [10] in several tests:

Bayesian Optimization for Robust Solutions Under Uncertain Input

253

Fig. 3 One-dimensional test functions (a) max f1 and (b) min f2

1. One-dimensional benchmark problems with DRA(5) where 5 evaluations are defined using stratified (Latin Hypercube) sampling 2. All benchmark problems with uncertain input, especially when the input is uncontrollable 3. One-dimensional problems with DRA(5) and DRA(10) using GP and SK as surrogate models

5.1 Benchmark Problems Test Functions 1. max f1 (x) = −0.5(x + 1) sin(π x 2 ) with X = [0.1, 2.1] and = 0.15. See Fig. 3a for an illustration. This function is also considered in a two-dimensional version by simply adding up over the two dimensions:

max f3 (x1 , x2 ) = −0.5(x1 + 1) sin π x12 − 0.5(x2 + 1) sin π x22 with search space [0.1, 2.1] × [0.1, 2.1] and = (0.15, 0.15). See Fig. 11a for an illustration. 2. A function from [12] min f2 (x) = 2 sin(10e−0.2x x)e−0.25x for robust minimum over the interval [0, 10] and = 0.5. See Fig. 3b for an illustration.

254

H. P. Le and J. Branke

Fig. 4 Opportunity cost depending on the number of evaluations used for using DRA(5) with uniform random and stratified sampling. DRA(1) is also shown for comparison. (a) f1 . (b) f2

Experimental Setup The number of initial samples was 5 for f1 , 10 for f2 and f3 , and in all cases they are chosen using stratified (Latin Hypercube) sampling. The hyperparameters of the GP and SK are optimized by maximizing the marginal likelihood at each step, using the functions from TensorFlow library [1]. All test results are averaged over 100 independent runs in the one-dimensional cases and 25 runs in the two-dimensional case.

5.2 Results Latin Hypercube Sampling Figure 4 compares the opportunity costs over the total number of evaluations used when 5 evaluations are uniformly distributed within the disturbance region and when they are defined using stratified (Latin Hypercube in one-dimensional case) sampling. It is clearly visible that DRA(5) benefits significantly from stratified sampling for both functions. Note that for a single replication (DRA(1)), stratified sampling is identical to random sampling. An example of the resulting GPs when using DRA(1), DRA(5) with and without stratified sampling for simple function f1 and complicated function f2 is shown in Figs. 5 and 6, respectively.
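The difference between the two sampling schemes for the k replications of a design is small in code. The sketch below (NumPy, 1-D disturbance region assumed; illustration only) draws the k disturbed inputs either independently or via one draw per equal-width stratum, which is the 1-D Latin Hypercube used here:

```python
import numpy as np

def disturbance_samples(x, delta, k, stratified=True, seed=None):
    """Draw k disturbed inputs around design x inside [x - delta, x + delta].

    stratified=True : one uniform draw per equal-width stratum (1-D Latin Hypercube)
    stratified=False: k independent uniform draws
    """
    rng = np.random.default_rng(seed)
    if stratified:
        edges = np.linspace(-delta, delta, k + 1)
        offsets = rng.uniform(edges[:-1], edges[1:])   # one point per stratum
        rng.shuffle(offsets)
    else:
        offsets = rng.uniform(-delta, delta, size=k)
    return x + offsets

# DRA(5)-style replications around x = 1.5 with disturbance range 0.15
print(disturbance_samples(1.5, 0.15, k=5, stratified=True, seed=0))
```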

Stochastic Kriging We test with a budget of 120 evaluations for f1 and 150 evaluations for f2 . Figure 7 compares the results of DRA(10) and DRA(5) with SK and GP model and also DRA(1) (there is no SK version, as it is not possible to estimate the variance


Fig. 5 Sampled values and the resulting GPs when using DRA(1), DRA(5) with uniform sampling, and DRA(5) with stratified sampling for f1 . The red cross is the point sampled in the last iteration

Fig. 6 Sampled values and resulting GPs when using DRA(1), DRA(5) with uniform sampling and DRA(5) with stratified sampling for f2 . The red cross is the point sampled in the last iteration

Fig. 7 Opportunity cost when using stochastic Kriging. (a) f1 . (b) f2

at a location given only a single evaluation). The results are somewhat mixed: there is very little difference on the simpler function f1, and even an indication that the SK model may get stuck when only a small number of samples per solution is available (DRA(5)). For the more complex function f2, with a more widely varying variance, the SK model seems to achieve a better fit, especially with few data points. Also, DRA(1), i.e., using a single replication per evaluated solution, still seems to work best. However, the differences are small, and the algorithms with a larger number of


Fig. 8 Result of using stochastic Kriging and Gaussian process for f1 (maximized)

replications (DRA(5) and DRA(10)) have less computational overhead, since they require fewer iterations, and are also straightforward to parallelize. As we have seen above, DRA(5) and DRA(10) can furthermore benefit from stratified sampling, so they are probably preferable overall. Figures 8 and 9 illustrate the metamodels resulting from using GP and SK together with DRA(5) and DRA(10) for f1 and f2, respectively.

Uncontrollable Input If we cannot control the input disturbance during optimization, the problem becomes much harder. Figures 10 and 11b examine the impact on rKG of making the input disturbance uncontrollable. If we can at least observe the sampled location after the query, rKG still works very well on the simple test function f1, and somewhat worse on the more complicated function f2. Yet when we can neither control nor observe where we have just sampled, rKG loses its advantage over DRA(1). Note that DRA(1) randomly disturbs the input anyway, so it performs identically in all three scenarios.


Fig. 9 Result of using stochastic Kriging and Gaussian process for f2 (minimized)

Fig. 10 Opportunity cost for rKG when the location is either controllable, not controllable but observable, or uncontrollable and unobservable, and comparison with DRA(1). (a) f1 . (b) f2

6 Conclusions

In many realistic engineering problems, searching for robust solutions is vital in order to reduce the impact of possible disturbances of the decision variables. Another common issue in such problems is the inability to control the disturbances even during optimization. In this chapter, we propose the use of Latin Hypercube Sampling to improve the quality of the solutions returned by the DRA algorithm proposed in [10]. We demonstrate that a stochastic Kriging model may be beneficial if the observation



Fig. 11 (a) Original function. (b) Opportunity cost when the location is unknown in twodimensional case

noise due to the input disturbance is very heterogeneous. We also examine the case of an input that is uncontrollable even during optimization. The results show that if we can still observe the sampled location, the rKG proposed in [10] still works very well and is the method of choice in terms of speed of convergence and the number of samples used. However, if the sampled location cannot be observed, the performance of rKG deteriorates and becomes comparable to that of DRA, which is not affected by unobservable input noise. For future work, it is worth testing with higher-dimensional functions and exploring applications to real-world problems. The algorithms should also be tested with normally distributed disturbances. Acknowledgments The authors would like to thank Dr. Michael Pearce for his technical assistance in completing the experiments. We acknowledge support from EPSRC under grant EP/L015374/1 and GE as part of the Colibri project initiative.

References 1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D.G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., Zheng, X.: Tensorflow: A system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283 (2016) 2. Ankenman, B., Nelson, B., Staum, J.: Stochastic kriging for simulation metamodeling. Oper. Res. 58, 371–382 (2010) 3. Beyer, H.G., Sendhoff, B.: Robust optimization – a comprehensive survey. Comput. Methods Appl. Mech. Eng. 196(33), 3190–3218 (2007) 4. Branke, J.: Creating robust solutions by means of evolutionary algorithms. In: Eiben, A.E., Bäck, T., Schoenauer, M., Schwefel, H.P. (eds.) Parallel Problem Solving from Nature — PPSN V, pp. 119–128. Springer, Berlin (1998)


5. Frazier, P.: A tutorial on bayesian optimization (2018). https://arxiv.org/abs/1807.02811. Accessed 28 April 2020 6. Frazier, P., Powell, W., Dayanik, S.: The knowledge-gradient policy for correlated normal beliefs. INFORMS J. Comput. 21(4), 517–656 (2009) 7. Fröhlich, L.P., Klenske, E.D., Vinogradska, J., Daniel, C., Zeilinger, M.N.: Noisy-input entropy search for efficient robust bayesian optimization. ArXiv abs/2002.02820 (2020) 8. Hennig, P., Schuler, C.J.: Entropy search for information-efficient global optimization. J. Mach. Learn. Res. 13(1), 1809–1837 (2012) 9. Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive black-box functions. J. Global Optim. 13(4), 45–492 (1998) 10. Le, H.P., Branke, J.: Bayesian optimization searching for robust solutions. In: 2020 Winter Simulation Conference (WSC) IEEE, 2844–2855 (2020) 11. McKay, M., Beckman, R.J., Conover, W.J.: A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21(2), 239– 245 (1979) 12. Paenke, I., Branke, J., Jin, Y.: Efficient search for robust solutions by means of evolutionary algorithms and fitness approximation. IEEE Trans. Evolut. Comput. 10(4), 405–420 (2006) 13. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. The MIT Press, Cambridge (2006) 14. ur Rehman, S., Langelaar, M.: Expected improvement based infill sampling for global robust optimization of constrained problems. Optim. Eng. 18(3), 723–753 (2017) 15. ur Rehman, S., Langelaar, M., Keulen, F.: Efficient kriging-based robust optimization of unconstrained problems. J. Comput. Sci. 5(6), 872–881 (2014) 16. Scott, W., Frazier, P., Powell, W.: The correlated knowledge gradient for simulation optimization of continuous parameters using gaussian process regression. SIAM J. Optim. 21(3), 996–1026 (2011)

Optimization Under Uncertainty of Shock Control Bumps for Transonic Wings Christian Sabater

1 Introduction Shock control bumps (SCBs) are passive flow control devices that improve the performance of transonic wings by altering the flow around near-normal shock waves. These components are physical bumps placed in the location where the shock wave is expected to reside. Shock control bumps split the shock into weaker shocks by means of oblique or compression waves [1]. The flow is isentropically decelerated with respect to the baseline configuration where no bump is present. As a result, they are the most effective devices to reduce wave drag when applied to flow with strong stationary shock waves [2]. Shock Control bumps were first introduced in 1992 for the mitigation of wave drag [3]. Further studies took place in Europe within the EUROSHOCK II project [2] and in the USA [4] to investigate its full potential. From the 2000s, the focus has been on understanding the flow physics [5] and the realization of optimization studies [6, 7]. A more extensive overview is given in [8]. A crucial aspect that is gaining more attention is the need for robust SCBs for industrial applications [9, 10]. The robustness of SCBs, i.e., its ability to effectively reduce wave drag at different flight conditions, is of main concern as they are highly sensitive to the shock wave location [8]. At freestream velocities or lift coefficients different from the initially investigated design point, SCBs suffer from adverse effects as the shock wave is not located in the optimal location [1]. These operational uncertainties can deteriorate the performance of SCBs and make them unfeasible for real-world applications.

C. Sabater () German Aerospace Center (DLR), Institute of Aerodynamics and Flow Technology, Braunschweig, Germany © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Vasile, D. Quagliarella (eds.), Advances in Uncertainty Quantification and Optimization Under Uncertainty with Aerospace Applications, Space Technology Proceedings 8, https://doi.org/10.1007/978-3-030-80542-5_16


Currently, multi-point optimization is the most widely used approach in aerodynamic shape optimization to reduce the sensitivity of the quantity of interest, QoI, to the operating conditions [11]. Let $x \in \mathbb{R}^d$ be the design variables affecting the shape and $Y(x) \in \mathbb{R}$ the QoI. The objective function $J$ is a weighted average of the QoI evaluated at $n$ operating conditions $A_i \in \mathbb{R}^m$. The weights $w_i$ are selected according to expert knowledge and to how often the aircraft is expected to fly at a given flight condition. Through this approach, it is possible to reduce the drag at both landing and cruise, or at several Mach numbers during cruise.

$$x^* = \arg\min_x J(x) = \arg\min_x \left\{ \sum_{i=1}^{n} w_i\, Y(x, A_i) \right\} \qquad (1)$$

The main disadvantage is the strong point-optimization effect [12]: the optimum configuration strongly depends on the chosen operating conditions, and its performance is usually worse at intermediate ones, as these are completely ignored by the optimizer. In this work, we consider the operating conditions as continuous random variables $\xi \in \mathbb{R}^m$ following a given probability distribution function, PDF. In that case, the response also becomes a random variable, and the problem shifts from the optimization of the QoI towards the optimization of a statistic of the QoI. A common approach in robust design [13] focuses on the minimization of a weighted linear combination of the mean $\mu_Y$ and standard deviation $\sigma_Y$ of the original QoI:

$$x^* = \arg\min_x \{J(x)\} = \arg\min_x \left\{ w_\mu\, \mu_Y(x) + w_\sigma\, \sigma_Y(x) \right\} \qquad (2)$$

where $w_\mu$, $w_\sigma$ are the associated weights, which add up to one. For example, $w_\mu = 1$, $w_\sigma = 0$ minimizes the average performance, which is desirable from a cost-effectiveness point of view. The robust optimization formulation enhances the realistic design of SCBs by accounting for uncertainties during the design stage [14]. However, the process is computationally expensive, as the propagation of the uncertainties from the input parameters to the QoI is required at each iteration [15]. The objective of this paper is the efficient robust design of shock control bumps on a transonic 3D wing under operational uncertainties. The optimum configuration is found using a gradient-based optimization under uncertainty framework that combines the adjoint method with Gaussian Processes.

2 Gradient-Based Robust Design Framework This section introduces the gradient-based optimization under uncertainty framework that is used to efficiently obtain the optimum configuration. Additional details on the framework can be found in our previous work [16].


2.1 Motivation

The use of gradient-based optimization in aerodynamic shape design is a mature technology [17, 18]. The main reasons are the ability to handle hundreds of design parameters, the relatively fast convergence towards an optimum, and the possibility of obtaining gradients through an adjoint formulation. Gradient-based optimization is an iterative process where the next step $x_{j+1}$ is found by:

$$x_{j+1} = x_j + h\,\nabla J \qquad (3)$$

where $\nabla J$ is the gradient (search direction) of the objective function $J$, and $h$ is the step size along the search direction chosen to achieve a reduction in $J$. Obtaining the gradients through finite differences makes the process unfeasible for a large number of design parameters. The use of an adjoint formulation [19] enables the calculation of sensitivities at a cost independent of the number of design parameters. This is especially attractive when the number of cost functions is small compared to the number of design parameters. However, in contrast with gradient-free optimizers such as surrogate-based optimization, gradient-based optimization can only lead to a local optimum, and its solution depends on the initial starting location, $x_1$. Under uncertainty, the objective function becomes a random variable. Following the minimization of mean and standard deviation in Eq. (15), from a given sample $x_j$, the next one, $x_{j+1}$, is found by:

$$x_{j+1} = x_j + h\,\nabla J(x_j) = x_j + h\left( w_\mu\,\nabla\mu_Y(x_j) + w_\sigma\,\nabla\sigma_Y(x_j) \right) \qquad (4)$$

According to Eqs. (2) and (4), the statistical moments of the QoI, $\mu_Y(x)$, $\sigma_Y(x)$, and their derivatives, $\nabla\mu_Y(x)$, $\nabla\sigma_Y(x)$, are required for the gradient-based optimization. Traditionally, the First- and Second-Order Method of Moments has been used to approximate these statistics and their gradients [20, 21]. The Method of Moments relies on Taylor series expansions in the stochastic input variables. As a result, this methodology is especially effective when first-order or second-order approximations of the stochastic space are adequate. However, in the presence of shocks and other non-linear flow phenomena, and when the input uncertainties are large, the use of Taylor approximations is no longer suitable. In these cases, more sophisticated surrogate techniques are required.
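For reference, a generic first-order Method of Moments step is sketched below (not the exact scheme of [20, 21]): the mean is approximated by the QoI at the mean input, and the variance by the squared input sensitivities weighted by the input variances, with finite differences standing in for the derivatives. The callable Y and all names are illustrative assumptions:

```python
import numpy as np

def first_order_moments(Y, x, xi_mean, xi_std, h=1e-4):
    """First-order Method of Moments sketch: mu_Y ~ Y(x, mu_xi) and
    sigma_Y^2 ~ sum_k (dY/dxi_k * sigma_k)^2, derivatives by central differences."""
    xi_mean = np.asarray(xi_mean, dtype=float)
    mu = Y(x, xi_mean)
    var = 0.0
    for k in range(xi_mean.size):
        e = np.zeros_like(xi_mean)
        e[k] = h
        dY_dxi = (Y(x, xi_mean + e) - Y(x, xi_mean - e)) / (2.0 * h)
        var += (dY_dxi * xi_std[k]) ** 2
    return mu, np.sqrt(var)
```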

2.2 Surrogate-Based Uncertainty Quantification The determination of statistical moments of the QoI at a given design point xj , μY (xj ), σY (xj ) through direct Monte Carlo sampling is computationally too expensive as it requires a large number of samples. We propose a Surrogate-Based


Uncertainty Quantification (SBUQ) framework [14] that uses an approximation of the QoI in the stochastic space through a surrogate model $\hat z(\xi)$. The initial samples are evaluated in the full-order model (CFD) according to a Design of Experiments in the stochastic space consisting of Sobol sequences. Gaussian Process Regression, GPR, is the chosen surrogate model used to approximate the QoI. Gaussian Process Regression follows a probabilistic approach to surrogate modelling. An intuitive derivation is given in [22], while a more formal definition is found in [23]. The main advantage with respect to other surrogate methods is that it incorporates the confidence of the prediction into the regression result, and the modelling contains fewer assumptions regarding the shape and nature of the landscape to represent [24]. Additional infill samples are added to the surrogate following an active infill criterion. The acquisition function governs the sampling of the stochastic space [25]:

$$\xi^* = \arg\min_\xi \left\{ -\mathrm{PDF}_\xi(\xi)\, \hat s(\xi) \right\} \qquad (5)$$

Equation (5) maximizes the product between the probability distribution function of the input parameters, $\mathrm{PDF}_\xi$, and the error estimate of the GPR, $\hat s$. The optimum infill location is found by differential evolution [26]. Finally, the mean and standard deviation are integrated with a large number of Monte Carlo samples in the surrogate model at a low cost.
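A minimal sketch of this SBUQ loop is given below, using scikit-learn's Gaussian process regressor and SciPy's differential evolution as stand-ins for the chapter's implementation. The callables qoi (full-order solver), pdf (joint input density at a point) and sampler (draws from the input distribution), as well as the DoE passed in xi_doe, are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def sbuq_mean_std(qoi, xi_doe, pdf, sampler, bounds, n_infill=2, n_mc=5000, seed=0):
    """Surrogate-based UQ sketch: GPR over the stochastic space with the infill
    criterion of Eq. (5), then cheap Monte Carlo integration on the surrogate."""
    X = np.atleast_2d(np.asarray(xi_doe, dtype=float))
    y = np.array([qoi(xi) for xi in X])
    for _ in range(n_infill):
        gp = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True).fit(X, y)
        def acq(xi):
            _, s = gp.predict(np.atleast_2d(xi), return_std=True)
            return -pdf(xi) * s[0]                       # Eq. (5): maximise PDF * GPR error
        res = differential_evolution(acq, bounds, seed=seed)
        X = np.vstack([X, res.x])
        y = np.append(y, qoi(res.x))
    gp = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True).fit(X, y)
    y_mc = gp.predict(sampler(n_mc))                     # Monte Carlo on the surrogate
    return y_mc.mean(), y_mc.std()
```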

2.3 Obtaining the Gradients of the Statistics

In this subsection, we develop the formulation to obtain the gradients of the mean and standard deviation, $\nabla\mu_Y(x_j)$, $\nabla\sigma_Y(x_j)$. At a given design point, $x_j$, the deterministic gradients of the QoI with respect to the design parameters at realizations $\xi_k$, $k = 1 \ldots n$, are usually available using an adjoint formulation: $\nabla Y(x_j, \xi_k)$. From these, the gradient of the mean value at the design point,

$$\nabla\mu_Y(x_j) = \left[\left.\frac{d\mu_Y}{dx^1}\right|_{x_j}, \left.\frac{d\mu_Y}{dx^2}\right|_{x_j}, \ldots, \left.\frac{d\mu_Y}{dx^d}\right|_{x_j}\right]^T,$$

is derived. The derivative of the mean value of the QoI with respect to a given design parameter $x^i$ at any given design point $x_j$, $\left.\frac{d\mu_Y}{dx^i}\right|_{x_j}$, is obtained from:

$$\left.\frac{d\mu_Y}{dx^i}\right|_{x_j} = \frac{1}{n}\sum_{k=1}^{n} \left.\frac{dY}{dx^i}\right|_{x_j,\,\xi_k} \qquad (6)$$

where $\left.\frac{dY}{dx^i}\right|_{x_j,\,\xi_k}$ are realizations of the derivative at $n$ different uncertain locations $\xi_k$ at the design point $x_j$. Again, these derivatives are computed non-intrusively using an adjoint computation. The derivative of the standard deviation of the QoI with respect to each design parameter is also found analytically:

$$\left.\frac{d\sigma_Y}{dx^i}\right|_{x_j} = \frac{1}{n\,\sigma_Y(x_j)}\sum_{k=1}^{n} \Big(Y(x_j,\xi_k)-\mu_Y(x_j)\Big)\left(\left.\frac{dY}{dx^i}\right|_{x_j,\,\xi_k} - \left.\frac{d\mu_Y}{dx^i}\right|_{x_j}\right) \qquad (7)$$

This formulation is generic and only requires a large number of samples, $n$, evaluated in the full-order model. When dealing with expensive CFD simulations, we propose the computation of a reduced number of realizations, $n_{surr}$, and the construction of a surrogate model of the gradients for each design parameter $x^i$. In this way, the gradients of the statistics are accurately and efficiently integrated using the SBUQ approach previously introduced. The stochastic space is characterized both for the QoI, which is obtained from the primal solution of the CFD solver, and for each of the $d$ gradients of the QoI with respect to the design parameters, which are efficiently obtained by the adjoint method. As a result, $d + 1$ different surrogates are constructed at each design point: one to obtain the statistics of the primal solution and $d$ to obtain the statistics of the gradients.
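Given the primal values and the adjoint gradients at the n realisations, Eqs. (6) and (7) reduce to two array operations. A minimal NumPy sketch (names are illustrative):

```python
import numpy as np

def statistic_gradients(Y, dY):
    """Eqs. (6)-(7): gradients of the mean and standard deviation of the QoI
    from n primal values Y (shape (n,)) and n adjoint gradients dY (shape (n, d))
    evaluated at the same design point for realisations xi_1..xi_n."""
    n = Y.shape[0]
    mu = Y.mean()
    sigma = Y.std()                        # population std, consistent with the 1/n factor
    dmu = dY.mean(axis=0)                  # Eq. (6)
    dsigma = ((Y - mu)[:, None] * (dY - dmu)).sum(axis=0) / (n * sigma)   # Eq. (7)
    return dmu, dsigma
```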

2.4 Optimization Architecture

The optimization framework combines the gradients obtained by the adjoint formulation with the uncertainty quantification using GPR. As shown in Fig. 1, it consists of two levels: the gradient-based optimizer at the outer level, and the SBUQ framework at the inner one. A Sequential Least Squares Programming (SLSQP) algorithm [27] is selected as the optimizer. At any given design point, $x_j$, the optimizer requires both the statistic, such as the mean $\mu_Y(x_j)$ (blue dots), and its gradient $\nabla\mu_Y(x_j)$ w.r.t. the design parameters (green arrows). At each design point (outer iteration), the uncertainty quantification is performed in the stochastic space with the help of the surrogate (blue surface) in order to obtain the statistic of the QoI. A total of $n_{surr}$ samples (black dots) are computed in the black-box solver, with their corresponding $n_{surr}$ adjoint computations. Following Eqs. (6) and (7), a surrogate is also built for each individual dimension (green surfaces); this is used to obtain the gradient of the statistic of the QoI w.r.t. the design parameters. With this approach, both $\mu_Y$ and $\frac{d\mu_Y}{dx}$ are efficiently obtained at each iteration. The strength of the proposed method is its insensitivity to the number of design parameters: it decouples the dimensionality of the design space from the surrogate accuracy. The surrogates are built only in the stochastic space with a reduced


Fig. 1 Robust design framework using the adjoint and Gaussian Processes: Top: gradientbased optimization of the mean of the QoI. Middle: uncertainty quantification through Gaussian Processes of the QoI and each of its gradients at a given design point xj . Bottom: evaluation of primal and adjoint solutions in the full order model

number of samples. As each surrogate of the gradients is built independently for each dimension, the training time only increases linearly with the number of design parameters. This training time is negligible in comparison with the evaluation of the black-box solver. Taking this into consideration, the framework is suitable for problems with a large number of design parameters.
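The outer level can be sketched with SciPy's SLSQP: the objective and its gradient are assembled from whatever the inner SBUQ step returns. The callable uq(x), assumed here to return the statistics and their gradients, is a placeholder for the surrogate-based inner loop described above:

```python
import numpy as np
from scipy.optimize import minimize

def robust_objective_and_grad(x, w_mu, w_sigma, uq):
    """Wrapper building Eqs. (2)/(4) for the gradient-based optimizer.
    uq(x) is assumed to return (mu, sigma, dmu, dsigma) from the SBUQ step."""
    mu, sigma, dmu, dsigma = uq(x)
    J = w_mu * mu + w_sigma * sigma
    dJ = w_mu * np.asarray(dmu) + w_sigma * np.asarray(dsigma)
    return J, dJ

def robust_optimize(x0, bounds, uq, w_mu=1.0, w_sigma=0.0):
    # jac=True tells SciPy that the objective function also returns the gradient
    return minimize(robust_objective_and_grad, x0, args=(w_mu, w_sigma, uq),
                    jac=True, method="SLSQP", bounds=bounds)
```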


2.5 Application to Analytical Test Function

The framework was first applied to the minimization of the mean of the random performance function $Y$. The function is driven by the design parameter $x \in \mathbb{R}^{d=1}$ bounded between $[a, b]$, with $a = 2$ and $b = 8$:

$$Y^{d=1}(x, \xi) = 3 - 4\,e^{-4(x-4)^2} - 5.2\,e^{-4(x-6)^2} + \frac{x-a}{b-a}\,\xi^1 + \frac{b-x}{b-a}\,\xi^2 \qquad (8)$$

where $\xi^1$ and $\xi^2$ are two independent random variables with uniform and Gaussian distributions, respectively,

$$\xi^1 \sim U(0, 10), \qquad \xi^2 \sim N(1.01, 0.71) \qquad (9)$$

The gradient w.r.t. the design parameter $x$ is defined as:

$$\frac{dY^{d=1}(x, \xi)}{dx} = 32(x-4)\,e^{-4(x-4)^2} + 41.6(x-6)\,e^{-4(x-6)^2} + \frac{\xi^1 - \xi^2}{b-a} \qquad (10)$$

For an accurate characterization of the mean value, 6 DoE samples and 2 infill samples were used for the SBUQ. Figure 2 shows the samples for the 1D test case at two different starting locations: x1 = 4.9 and x1 = 7. The reference (true) mean value is also represented in grey. As expected, with a gradient-based optimizer in a multi-modal function, the optimum solution depends on the initial point. The optimum was found using six iterations.
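Equations (8)-(10) are easy to reproduce; a short sketch is given below for readers who want to regenerate the reference mean curve. Interpreting 0.71 as the standard deviation in Eq. (9) is an assumption, as the chapter does not state whether it is the variance:

```python
import numpy as np

A, B = 2.0, 8.0

def Y1(x, xi1, xi2):
    """Eq. (8): 1-D random performance function."""
    return (3.0 - 4.0 * np.exp(-4.0 * (x - 4.0) ** 2)
            - 5.2 * np.exp(-4.0 * (x - 6.0) ** 2)
            + (x - A) / (B - A) * xi1 + (B - x) / (B - A) * xi2)

def dY1_dx(x, xi1, xi2):
    """Eq. (10): analytical gradient w.r.t. the design parameter x."""
    return (32.0 * (x - 4.0) * np.exp(-4.0 * (x - 4.0) ** 2)
            + 41.6 * (x - 6.0) * np.exp(-4.0 * (x - 6.0) ** 2)
            + (xi1 - xi2) / (B - A))

rng = np.random.default_rng(0)
xi1 = rng.uniform(0.0, 10.0, 100_000)           # Eq. (9)
xi2 = rng.normal(1.01, 0.71, 100_000)           # 0.71 taken as standard deviation (assumption)
x = 4.9
print(Y1(x, xi1, xi2).mean(), dY1_dx(x, xi1, xi2).mean())   # mu_Y and its gradient at x
```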

Fig. 2 Gradient-based optimization under uncertainty of the mean value for one dimensional analytical test function. Optimization samples are represented over the true mean function for two different starting points. (a) Starting point x0 = 4.9. (b) Starting point x0 = 7


Fig. 3 Gradient-Based Optimization under Uncertainty of the mean value for 100 dimensional analytical test function with 100 Design parameters. Optimization samples are represented over a 2D section of the true mean function for two different starting points. (a) Starting point x1 = {4.9, 4.9 . . . 4.9}1×100 . (b) Starting point x1 = {7, 7 . . . 7}1×100

This test function was extended to $d = 100$ design parameters:

$$Y^{d=100}(x, \xi) = \frac{1}{100}\sum_{i=1}^{100} Y^{d=1}(x^i, \xi), \qquad \frac{dY^{d=100}(x, \xi)}{dx^i} = \frac{dY^{d=1}(x^i, \xi)}{dx^i} \qquad (11)$$

In this case, finding the optimum solution required the same order of function evaluations as in the 1D test case, as shown in Fig. 3. The samples are represented over a 2D section for x 1 and x 2 , keeping the remaining 98 parameters constant. Despite the increase in dimensionality, for the two different starting locations, the required number of iterations was 11 and 9, respectively. This comparison assumes that the cost of obtaining the gradients in both cases (using for example an adjoint formulation) is equivalent.

3 Application to the Robust Design of Shock Control Bumps: Problem Definition This section introduces the test case for the SCB optimization formulation, the numerical model and the parametrization of the bump and uncertainties.


Table 1 Operational uncertainties

   | Mean, μ | Standard deviation, σ | Minimum | Maximum
M  | 0.84    | 0.0045                | 0.825   | 0.855
CL | 0.351   | 0.0045                | 0.336   | 0.367

3.1 Test Case

The LANN wing is an aft-loaded, 3D transport-aircraft-type wing developed by NASA and Lockheed for the validation of transonic steady and unsteady aerodynamic data [28]. It is a good test case for the retrofit of shock control bumps due to the presence of a strong normal shock wave over the upper surface, which could be mitigated by such a retrofit. The QoI is the wing drag coefficient, $C_D$. The nominal flight conditions $A$ are defined by the Mach number, $M = 0.84$, and the lift coefficient, $C_L = 0.351$. Under uncertainties, these were modelled as symmetric Beta distributions, centred at the nominal conditions, following Table 1.

3.2 Numerical Model

The high-fidelity DLR flow solver TAU [29] was executed on the cluster using DLR's FlowSimulator Data Manager environment. The Reynolds-averaged Navier-Stokes equations, in conjunction with the Spalart-Allmaras turbulence model, were solved to obtain the primal solution. Convergence was considered reached when the density residual dropped below 1e-7. After the primal solution was obtained, the discrete adjoint equations [30] were solved in TAU to obtain the total derivative of the drag coefficient with respect to the design parameters $x$ [31]. When dealing with optimization at constant lift, the gradients of the drag must be corrected [32]:

$$\left.\frac{dC_D}{dx}\right|_{C_L = C_{L_0}} = \frac{\partial C_D}{\partial x} - \frac{\partial C_D}{\partial \alpha}\frac{\partial \alpha}{\partial C_L}\frac{\partial C_L}{\partial x} \qquad (12)$$

Finally, the mesh was modified to account for the bumps according to a mesh deformation tool developed by DLR using linear elasticity theory [33].
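Equation (12) is a simple post-processing of the raw adjoint outputs; the sketch below shows the correction with made-up numbers purely for illustration (function and variable names are assumptions):

```python
import numpy as np

def constant_lift_drag_gradient(dCD_dx, dCL_dx, dCD_dalpha, dCL_dalpha):
    """Eq. (12): drag sensitivity at constant lift from the raw adjoint gradients."""
    dalpha_dCL = 1.0 / dCL_dalpha
    return np.asarray(dCD_dx) - dCD_dalpha * dalpha_dCL * np.asarray(dCL_dx)

# Illustration only (the numbers are not from the chapter)
print(constant_lift_drag_gradient(dCD_dx=np.array([1e-4, -2e-4]),
                                  dCL_dx=np.array([5e-3, 1e-3]),
                                  dCD_dalpha=2e-3, dCL_dalpha=0.11))
```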

3.3 Parametrization of Shock Control Bumps

The SCB was parametrized by means of a continuous 2D bump [14], as shown in Fig. 4. Each 2D section, A-A', was characterized by five design parameters: bump starting location $X_{start}$, bump length $l_{bump}$, maximum height $h_{bump}$, maximum-height location $X_{h,bump}$, and asymmetry factor $m_{bump}$. The resulting continuous bump was obtained by interpolating each of the 2D parameters between the sections along the wingspan. This parametrization provides enough flexibility to produce different shapes; for example, if one of the sections has height 0, the continuous bump is split into two separate SCBs. A total of 40 spanwise sections were selected, leading to a total of 200 design parameters. The bump extends spanwise from η = 0.15 to η = 0.95.

Fig. 4 Parametrization of shock control bump. Left: continuous bump over the CFD mesh of the wing and SCB guides. Right: SCB parameters over a 2D section and zoom-in of the bump
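The spanwise interpolation of the five per-section parameters can be sketched as follows. The bump profile used here (a simple asymmetric sine shape) is only a stand-in, since the chapter does not give the exact profile; all names and the dictionary layout are hypothetical:

```python
import numpy as np

def bump_height(x_c, eta, sections):
    """Continuous SCB height at chordwise position x_c and spanwise station eta.

    sections: dict of arrays over spanwise stations with keys
              'eta', 'x_start', 'l', 'h', 'x_h', 'm' (the five 2-D parameters)."""
    # interpolate each 2-D parameter along the span, as described in the text
    p = {k: np.interp(eta, sections['eta'], sections[k])
         for k in ('x_start', 'l', 'h', 'x_h', 'm')}
    s = np.clip((x_c - p['x_start']) / p['l'], 0.0, 1.0)      # local bump coordinate
    crest = np.clip((p['x_h'] - p['x_start']) / p['l'], 1e-3, 1 - 1e-3)
    left = np.sin(0.5 * np.pi * s / crest)                    # rising flank
    right = np.sin(0.5 * np.pi * (1.0 - s) / (1.0 - crest))   # falling flank
    # illustrative asymmetric bump with peak height h at the crest location
    return p['h'] * np.where(s <= crest, left, right) ** (1.0 + p['m'])
```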

3.4 Optimization Formulations

Three different optimization formulations were considered. First, the single-point (deterministic) problem finds the optimum configuration that minimizes the drag coefficient, $C_D$, at the nominal operating conditions $A$:

$$x^* = \arg\min_x \left\{ C_D(x, A) \right\} \qquad (13)$$

Second, the multi-point optimization focuses on five different flight conditions:

$$x^* = \arg\min_x \left\{ \sum_{i=1}^{5} w_i\, C_D(x, A_i) \right\} \qquad (14)$$

The locations of the flight conditions $A_i$ and the values of the weights $w_i$ were selected to simulate the input PDF of the operational conditions and are shown in Fig. 5.


Fig. 5 Different treatment in operational conditions (Mach and lift coefficient): single and multipoint deal with discrete flight conditions, while the robust optimization considers a continuous probability density function

Third, the robust optimization minimizes a linear combination of the mean and standard deviation of the drag coefficient. Different combinations of weights $(w_\mu, w_\sigma)$ were selected to obtain representative robust configurations:

$$x^* = \arg\min_x \left\{ w_\mu\, \mu_{C_D}(x) + w_\sigma\, \sigma_{C_D}(x) \right\} \qquad (15)$$

Figure 5 shows the difference between the deterministic, multi-point and robust formulations in terms of the nature of the operating conditions. While in the deterministic formulation only the central (square) point was considered, in the multi-point formulation five flight conditions were evaluated. Finally, the robust formulation takes into account the continuous stochastic space following the probability distribution function.

4 Results In this section, the optimization results are shown for the different formulations.


Fig. 6 Deterministic Optimization results of continuous shock control bump. Top: pressure distribution for the wing upper surface for baseline and optimum configurations. Bottom: shock control bump topology

4.1 Single-Point (Deterministic) Results

Before engaging in the robust optimization, the deterministic results are presented. The gradient-based, single-point optimization was performed taking the clean wing as the initial configuration. Over the iterations, the shock control bump topology evolved through the modification of the 200 design parameters until the optimum shape, shown in Fig. 6, was reached. The initially continuous bump is divided into three regions. The pressure contour field over the upper surface of the wing indicates a smoother transition from lower (red) to higher pressures (yellow) for the wing with the bump, due to the weakening of the shock. The reduction in shock wave strength reduces wave drag. This can also be appreciated in Fig. 7, where the pressure coefficient is shown at different cross sections. For example, at η = 60%, the normal shock wave is replaced by two isentropic compression waves. As a result, the total drag coefficient of the wing is reduced by 12.8%.

4.2 Uncertainty Quantification When uncertainties in lift coefficient and Mach number are present, the drag becomes a random variable. The first step to design robustly is to understand the


Fig. 7 Surface pressure distribution at different cross sections for the baseline and optimum configuration. The addition of a shock control bump removes the normal shock wave over the upper surface of the wing

influence of the input uncertainties on the QoI. Figure 8 shows the contours of the stochastic space of the drag coefficient predicted by the GPR for the clean wing. A total of 12 DoE samples were used, shown by the big black dots. The small dots represent the Monte Carlo samples that were integrated on the surrogate. It can be seen that the variability in the drag is relatively large, between 180 and 230 drag counts. The same analysis was done for the deterministic optimum configuration, as represented in Fig. 9. This time, ten DoE and two infill samples were used. The contours are lighter, as the shock control bump reduces the drag also under uncertainty. However, this configuration is not robust, as the variability was not taken into consideration during the design process. This will be investigated in the next section.

4.3 Robust Results The robust optimization was performed for a different combination of weights in mean and standard deviation. As before, the starting configuration of the


Fig. 8 Contours of drag coefficient as a function of the lift coefficient and Mach number for the baseline wing (no bump). DoE samples shown as big black dots

Fig. 9 Contours of drag coefficient as a function of the lift coefficient and Mach number for the wing with the deterministic optimum SCB. DoE samples shown as big black dots, infill samples as green triangles


Fig. 10 Convergence history of robust optimization

optimization was the clean wing. For each iteration, 10 DoE and two infill samples were computed in the stochastic space following the SBUQ framework to quantify the mean and standard deviation and their gradients. Figure 10 shows the convergence history of the objective function for a combination of weights: (wμ = 0.67, wσ = 0.33). Taking into account that 12 samples were required for each iteration, a total of 216 CFD samples were evaluated for this particular optimization. The framework is therefore able to quickly converge towards the local optima with a similar number of iterations as in the single-point optimization. To summarize the optimization results, the violin plots of Fig. 11 are used. The probability distribution function of the drag is represented for each optimum configuration. On top of each PDF, black whisker plots show the quantiles, and the mean value is represented by the white dots. The deterministic and multipoint optimum are also represented for comparison with the robust ones. Even though the deterministic optimum (orange PDF) reduces the drag compared to the clean wing, it is outperformed by the multi-point optimum (green PDF). The latter configuration further reduces the PDF. However, the configuration with the lowest average performance is the robust optimum with the weights wμ = 1, wσ = 0, the red PDF. This one further reduces the mean by 0.7 drag counts and the mode (most repeated value) by two drag counts w.r.t. the multi-point optimum. If compared to the deterministic optimum, the improvement in performance is 1.7 drag counts in mean and 3.2 drag counts in the mode. Further improvements can be achieved from a probabilistic formulation with focus on the mean value. If more importance is given to the variability (standard deviation), this comes at the expense of an increase in the mean value. As shown in the violin plot for the other robust optimum, the reduction in the extent of the PDF by increasing wσ is associated with a shift of the PDF towards higher values of drag. These are two


Fig. 11 Violin plot of optima configurations. On top, shape of resulting shock control bump

conflicting objective functions. The corresponding shapes are shown on top of each PDF. It can be noted that there is an inverse correlation between the reduction in the variability of the drag and the bump height. Considering the conflicting nature of mean and standard deviation, the single-objective problem represented by Eq. (15) can be interpreted as a multi-objective optimization. The set of optimum configurations obtained with different weights can be represented as a Pareto front, as shown in Fig. 12. Note how both the deterministic optimum and the robust optimum are not part of the Pareto front. Also, it is remarkable that the gradient-based framework is able to find these non-dominated solutions. This entails that the gradients of the mean and standard deviation are obtained with high accuracy. After the different solutions are found, the designers are responsible for choosing the most suitable configuration according to their needs.

5 Conclusions Shock control bumps are effective shock control devices that are able to decrease wave drag by weakening the normal shock wave over the upper surface of the wing. To successfully retrofit shock bumps to transonic wings, these need to be robust against the variability in the operating conditions such as lift coefficient and Mach. Otherwise, their performance could seriously deteriorate with respect to nominal


Fig. 12 Mean and standard deviation of deterministic, multi-point and robust optimum configurations and corresponding Pareto front

conditions. A multi-point formulation is not enough to account for this variability, and a robust optimization is required through a probabilistic characterization of the operational conditions. The presented gradient-based design framework combines an adjoint formulation with Gaussian Processes for the characterization of the mean and standard deviation of the QoI as well as their gradients. The use of surrogate models enables a fast convergence towards the robust optima within a reduced CPU time. Most importantly, the framework is able to handle a large number of design parameters. Different optimum robust configurations have been obtained by changing the weights of the mean and standard deviation. The robust optimum with focus on the mean value outperforms the traditional multi-point optimum and is able to better mitigate the shock waves within the uncertain operational conditions. In addition, there is a clear trade-off between variability and average performance that can be displayed as a family of Pareto-optimal solutions. Future work involves the addition of more uncertainties and the use of different objective functions such as the quantile.

References 1. Ogawa, H., Babinsky, H., Pätzold, M., Lutz, T.: Shock-wave/boundary-layer interaction control using three-dimensional bumps for transonic wings. AIAA J. 46(6), 1442–1452 (2008) 2. Stanewsky, E., Délery, J., Fulker, J., de Matteis, P.: Synopsis of the project euroshock II. In: Stanewsky, E., Délery, J., Fulker, J., de Matteis, P. (eds.) Drag Reduction by Shock and Boundary Layer Control, pp. 1–124, Berlin. Springer, Berlin (2002)


3. Ashill, P.R., Fulker, J.L., Shires, J.L.: A novel technique for controlling shock strength of laminar-flow airfoil sections. In: Proceedings of the 1st European Forum on Laminar Flow Technology, pp. 175–183, Hamburg (1992) 4. McGowan, A.R.: Avst morphing project research summaries in fiscal year 2001. Technical Report nasa tm-2002-2 11769, NASA (2002) 5. Bruce, P.J.K., Babinsky, H.: Experimental study into the flow physics of three-dimensional shock control bumps. J. Aircraft 49(5), 1222–1233 (2012) 6. Lee, D.S., Periaux, J., Onate, E., Gonzalez, L.F., Qin, N.: Active transonic aerofoil design optimization using robust multiobjective evolutionary algorithms. J. Aircraft 48(3), 1084–1094 (2011) 7. Paetzold, M., Lutz, T., Kramer, E., Wagner, S.: Numerical optimization of finite shock control bumps. In: 44th AIAA Aerospace Sciences Meeting and Exhibit. American Institute of Aeronautics and Astronautics (2006) 8. Bruce, P.J.K., Colliss, S.P.: Review of research into shock control bumps. Shock Waves 25(5), 451–471 (2014) 9. Jinks, E.R., Bruce, P.J., Santer, M.J.: Adaptive shock control bumps. In: 52nd Aerospace Sciences Meeting. American Institute of Aeronautics and Astronautics (2014) 10. Nuebler, K., Lutz, T., Kraemer, E., Colliss, S., Babinsky, H.: Shock control bump robustness enhancement. In: 50th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition. American Institute of Aeronautics and Astronautics (2012) 11. Jameson, A.: Automatic design of transonic airfoils to reduce reduce the shock induced pressure drag. In: Proceedings of the 31st Israel Annual Conference on Aviation and Aeronautics, Tel Aviv, 01 (1990) 12. Huyse, L.: Free-form airfoil shape optimization under uncertainty using maximum expected value and second-order second-moment strategies. Techreport 2001-211020, NASA (2001) 13. Maruyama, D., Liu, D., Görtz, S.: An efficient aerodynamic shape optimization framework for robust design of airfoils using surrogate models. In: Proceedings of the VII European Congress on Computational Methods in Applied Sciences and Engineering (ECCOMAS Congress 2016). NTUA Greece (2016) 14. Sabater, C., Görtz, S.: An efficient bi-level surrogate approach for optimizing shock control bumps under uncertainty. In: AIAA Scitech 2019 Forum. American Institute of Aeronautics and Astronautics (2019) 15. Schuëller, G.I., Jensen, H.A.: Computational methods in optimization considering uncertainties – an overview. Comput. Methods Appl. Mech. Eng. 198(1), 2–13 (2008) 16. Sabater, C., Goertz, S.: Gradient-based aerodynamic robust optimization using the adjoint method and gaussian processes. In: EUROGEN (2019) 17. Merle, A., Stueck, A., Rempke, A.: An adjoint-based aerodynamic shape optimization strategy for trimmed aircraft with active engines. In: 35th AIAA Applied Aerodynamics Conference. American Institute of Aeronautics and Astronautics (2017) 18. Kenway, G.K., Martins, J.R.R.A.: Aerodynamic shape optimization of the CRM configuration including buffet-onset conditions. In: 54th AIAA Aerospace Sciences Meeting. American Institute of Aeronautics and Astronautics (2016) 19. Giles, M.B., Pierce, N.A.: An introduction to the adjoint approach to design. Flow, Turbulence Combust. 65(3–4), 393–415 (2000) 20. Pini, M., Cinnella, P.: Hybrid adjoint-based robust optimization approach for fluid-dynamics problems. In: 54th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference. American Institute of Aeronautics and Astronautics (2013) 21. 
Fragkos, K.B., Papoutsis-Kiachagias, E.M., Giannakoglou, K.C.: pFOSM: An efficient algorithm for aerodynamic robust design based on continuous adjoint and matrix-vector products. Comput. Fluids 181, 57–66 (2019) 22. Jones, D.R.: A taxonomy of global optimization methodsbased on response surfaces. J. Global Optim. 21(4), 345–383 (2001) 23. Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P.: Design and analysis of computer experiments. Statist. Sci. 4(4), 409–423 (1989)


24. Forrester, A.I.J., Keane, A.J.: Recent advances in surrogate-based optimization. Progress Aerosp. Sci. 45(1–3), 50–79 (2009) 25. Dwight, R., Han, Z.-H.: Efficient uncertainty quantification using gradient-enhanced kriging. In: 50th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference. American Institute of Aeronautics and Astronautics (2009) 26. Storn, R., Price, K.: Differential evolution. a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11(4), 341–359 (1997) 27. Kraft, D.: A software package for sequential quadratic programming. Technical Report DFVLR-FB 88-28, DLR German Aerospace Center – Institute of Flight Dynamics (1988) 28. Ruo, S., Malone, J., Horsten, J., HOUWINK, R.: The LANN program - an experimental and theoretical study of steady and unsteady transonic airloads on a supercritical wing. In: 16th Fluid and Plasmadynamics Conference. American Institute of Aeronautics and Astronautics (1983) 29. Gerhold, T.: Overview of the hybrid RANS code TAU. In: MEGAFLOW - Numerical Flow Simulation for Aircraft Design, pp. 81–92. Springer, Berlin (2015) 30. Dwight, R.: Efficiency improvements of rans-based analysis and optimization using implicit and adjoint methods on unstructured grids. In: DLR Deutsches Zentrum fur Luft- und Raumfahrt e.V. - Forschungsberichte (2006) 31. Brezillon, J., Dwight, R.P.: Applications of a discrete viscous adjoint method for aerodynamic shape optimisation of 3d configurations. CEAS Aeronaut. J. 3(1), 25–34 (2011) 32. Reuther, J., Jameson, A., Farmer, J., Martinelli, L., Saunders, D.: Aerodynamic shape optimization of complex aircraft configurations via an adjoint formulation. In: 34th Aerospace Sciences Meeting and Exhibit. American Institute of Aeronautics and Astronautics (1996) 33. Gerhold, T., Neumann, J.: The parallel mesh deformation of the DLR TAU-code. In: Notes on Numerical Fluid Mechanics and Multidisciplinary Design (NNFM), pp. 162–169. Springer, Berlin (2006)

Multi-Objective Design Optimisation of an Airfoil with Geometrical Uncertainties Leveraging Multi-Fidelity Gaussian Process Regression

Péter Zénó Korondi, Mariapia Marchi, Lucia Parussini, Domenico Quagliarella, and Carlo Poloni

1 Introduction Shape optimisation of an airfoil is one of the most fundamental problems in aerodynamic design optimisation. The purpose of an airfoil is to generate a pressure difference in a flow so that a force is generated. The force component perpendicular to the flow direction is called lift, and its magnitude and sense (in respect of the defined force reference frame) depend on the shape of the airfoil and on the flow conditions. Together with the lift, the presence of the airfoil in the flow inevitably generates a force component parallel to the flow direction, called drag. Most engineering applications exploit the lift, while the drag is an inevitable loss. Therefore, the shape optimisation of an airfoil aims to find the maximum lift or minimum drag design or an optimal lift-to-drag ratio such that some additional application-dependent requirements are also satisfied. In reality, solutions that optimise all objectives simultaneously are typically nonexisting. A single-objective problem might have a single global optimal solution. However, formulating a real problem as single-objective implies a decision making

P. Z. Korondi () · C. Poloni Department of Engineering and Architecture, University of Trieste, Trieste, Italy ESTECO S.p.A, Trieste, Italy e-mail: [email protected] M. Marchi ESTECO S.p.A, Trieste, Italy L. Parussini Department of Engineering and Architecture, University of Trieste, Trieste, Italy D. Quagliarella Italian Aerospace Research Centre, Capua, Italy © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Vasile, D. Quagliarella (eds.), Advances in Uncertainty Quantification and Optimization Under Uncertainty with Aerospace Applications, Space Technology Proceedings 8, https://doi.org/10.1007/978-3-030-80542-5_17


on the preferences of various requirements. By formulating our problem as a multiobjective optimisation, the preferences of various requirements are decided after a set of Pareto optimal solutions are found [3]. Consequently, in the context of multiobjective airfoil optimisation, we are searching for a set of airfoil designs which are Pareto optimal to our multi-objective problem formulation. This set is called Pareto front. When the objectives are conflicting, the Pareto set contains more than one solution that cannot be improved in any of the objectives without degrading at least one of the others. To obtain the Pareto front various algorithms exist. However, they commonly require many performance evaluations of the underlying problem. This is troublesome for the aerodynamic shape optimisation of an airfoil, as accurate computational fluid dynamics (CFD) calculations are typically expensive in computational time [14]. This issue can be tackled by employing surrogate models [13, 15]. Expensive calculations are performed for only a handful of designs. Then a statistical model is built to approximate the aerodynamics of airfoil designs which have not been evaluated by the expensive CFD code. The accuracy of the statistical model highly depends on the number of available CFD evaluations. Consequently, sparsely sampled design landscapes are hard to approximate accurately with standard surrogate techniques. In such a case, aerodynamic calculations of lower fidelity can be used to provide sufficient information for building an accurate statistical model. The information from lower fidelity calculations can be fused together with high-fidelity data by using the multi-fidelity Gaussian process regression (MF-GPR) [9, 11]. There is also another issue to take into account. Often the actual design, or operation point, and its performance are slightly different from the optimisation solutions because of manufacturing, wear off and other operational deformations, like icing and surface pollution. In practice, our design problem is affected by various uncertainty sources which affect the actual performance. This issue can be addressed with uncertainty quantification (UQ) techniques and optimisation under uncertainty methods. UQ can be used to estimate statistical measures of the design performance that can be in turn used as reliable or robust objectives of the optimisation under uncertainty problem (see e.g. the reviews [2, 19]). In our recent work, we proposed a multi-fidelity optimisation workflow for optimising expensive multi-objective problems under uncertainty [10]. In this work, we investigate the performance and applicability of our proposed workflow for the aerodynamic shape optimisation of an airfoil. The optimisation problem is presented in Sect. 2. In Sect. 3 the used aerodynamic solvers are examined. In Sect. 4 the MF-GPR method is introduced and Sect. 5 discusses on the uncertainty treatment. The proposed optimisation workflow is briefly presented in Sect. 6. Results are analysed in Sect. 7 and our conclusions are expressed in Sect. 8.


2 Design Optimisation Problem of Airfoil Conceptually, aerodynamic design optimisation can be approached in two ways: inverse and direct [20]. In inverse design optimisation, a desired pressure distribution is targeted and the optimisation algorithm seeks to find the geometrical shape which produces the targeted pressure distribution or lift force. Therefore, inverse methods are commonly applied in later design phases when target values are known. A direct method, as its name suggests, directly optimises the objective without any predefined target value. In direct aerodynamic shape optimisation, a maximum lift or minimum drag design or an optimal lift-to-drag ratio is typically desired. However, searching for an optimal lift-to-drag ratio implies a predetermined (typically equal) preference of the lift and drag requirements. In this work, therefore, the aerodynamic airfoil shape optimisation problem is formulated as a multi-objective direct design optimisation. Here, the MH114 airfoil is optimised for a high-lift propeller. The set of geometries are sought which are Pareto optimal for maximal lift (L) and minimal drag (D). As the lift and drag forces are uncertain due to the geometrical uncertainties, a reliability based multi-objective optimisation problem is considered here:

$$\min_x \; S_{95}\!\left[-C_l(x, \tilde u)\right] \qquad (1a)$$

$$\min_x \; S_{95}\!\left[\frac{C_d(x, \tilde u)}{C_{d0}}\right] \qquad (1b)$$

where S95 denotes the 95-th superpercentile which is a risk measure defined in Sect. 5. Cd0 = 0.01 is a normalisation factor which brings the drag coefficient value to the same order of magnitude as the lift coefficient. The lift coefficient (Cl ) and drag coefficient (Cd ) are:

$$C_l = \frac{L(x, \tilde u)}{\tfrac{1}{2}\rho U^2 A} \qquad (2a)$$

$$C_d = \frac{D(x, \tilde u)}{\tfrac{1}{2}\rho U^2 A} \qquad (2b)$$

ρ is the density and U is the free-stream velocity of the air. A is the reference area, which is taken as unity throughout this work. The aerodynamic coefficients are calculated at an angle of attack of 5°, Reynolds number 5·10^6 and Mach number 0.218 at standard sea-level conditions. The geometrical shape of the airfoil is defined by superposing modal shape functions on the baseline geometry of MH114. Eight modal shape functions are considered. They are shown in Fig. 1 and Table 1. The first two modes modify the thickness and the camber line of the airfoil. The remaining six modes introduce



Fig. 1 Design modes. (a) Mode 1. (b) Mode 2. (c) Mode 3. (d) Mode 4. (e) Mode 5. (f) Mode 6. (g) Mode 7. (h) Mode 8

Table 1 Design and uncertain variables

Mode   | Function type | Design variable | Uncertain variable | Physical interpretation
Mode 1 | Polynomial    | x1 | u1 | Thickness
Mode 2 | Polynomial    | x2 | u2 | Camber
Mode 3 | Hicks-Henne   | x3 | u3 | Upper LE
Mode 4 | Hicks-Henne   | x4 | u4 | Lower LE
Mode 5 | Hicks-Henne   | x5 | u5 | Upper middle
Mode 6 | Hicks-Henne   | x6 | u6 | Lower middle
Mode 7 | Hicks-Henne   | x7 | u7 | Upper TE
Mode 8 | Hicks-Henne   | x8 | u8 | Lower TE

local shape modifications of the upper and lower side of the airfoil at the leading edge (LE), mid span and trailing edge (TE), respectively. The design variables to be optimised are the scaling parameters (xi ) of the modal shape functions. Additionally, the shape of the airfoil is considered to have some uncertainties due to the manufacturing process. Therefore, each shape mode is superposed on the design shape with an uncertain scaling factor (ui ).

3 Solvers To calculate the aerodynamic forces of the airfoil, two solvers are considered: XFOIL [4] and SU2 [6]. The former is an airfoil analysis tool based on potential flow equations (panel methods). For viscous problems, a two-equation integral boundary layer formulation is coupled with the inviscid flow solution [5]. The transition criteria are calculated by the eN envelope method. XFOIL has a fairly rapid calculation time and provides sufficient accuracy for most engineering applications. The SU2 software provides a solver for the compressible Reynolds-averaged Navier–Stokes equation. The RANS equation is closed by Menter’s Shear Stress


Fig. 2 Comparison of lift and drag coefficient curves of MH114 calculated with XFOIL and SU2. (a) Lift coefficient against angle-of-attack. (b) Drag coefficient against angle-of-attack

Transport turbulence model [12], which efficiently blends the k-ω turbulence model of the near-wall region into the k-ε model of the region away from any wall. Various studies have been carried out to compare the results of these two solvers, see e.g. [1, 22]. Both solvers are able to accurately predict the aerodynamic forces of an airfoil. For the sake of this work, we will consider SU2 as the higher-fidelity solver, as it implements a more general form of the Navier-Stokes equations. The aerodynamic evaluations with XFOIL and SU2 are performed with the framework software described in [16, 17]. The modal shape function superposition is performed with wg2aer.1 The modified airfoil geometry is stored in a Selig format which can be directly processed by XFOIL. For the CFD evaluation, the modified airfoil and its surrounding domain are discretised with the open-source Gmsh software, which generates the mesh in .su2 format. Finally, SU2 performs the aerodynamic analysis of the airfoil and provides the high-fidelity drag and lift predictions. The lift and drag coefficients of the MH114 airfoil are plotted in Fig. 2a and b. The calculations are carried out at Reynolds number 5·10^6 and Mach number 0.218 with standard sea-level conditions. SU2 considers the domain around the airfoil fully turbulent. Therefore, XFOIL was also forced to operate in fully turbulent conditions by setting the transition point location at the beginning of the lower and upper airfoil sides (XTRLO and XTRUP set to 0.01). We can see that the two solvers produce similar polar trends; however, there are some deviations in the actual values. This makes the two solvers appropriate candidates for a multi-fidelity optimisation.

1 Software developed by the Italian Aerospace Research Centre (CIRA).


4 Multi-Fidelity Gaussian Process Regression

The multi-fidelity Gaussian process regression (MF-GPR) is briefly discussed in this section. This technique tailors the well-known Gaussian process regression (GPR)2 to fuse information from various fidelity sources into a single surrogate [8, 9]. We employ the recursive formulation proposed by Le Gratiet and Garnier [11]:

$$\tilde f_{LF}(x) = h_{LF}^T(x)\,\beta_{LF} + \tilde\delta_{LF}(x), \qquad \tilde f_{HF}(x) = \rho(x)\,\tilde f_{LF}(x) + h_{HF}^T(x)\,\beta_{HF} + \tilde\delta_{HF}(x), \qquad \rho(x) = g^T(x)\,\beta_\rho \qquad (3)$$

where a least-squares regression $h_i^T(x)\,\beta_i$ with $i = HF, LF$ formulates the mean trend of each fidelity level. Correspondingly, $h_i(x)$ is the vector of regression functions and $\beta_i$ is the vector of regression coefficients. The local variations of the model are incorporated into $\tilde\delta_i(x) \sim N(0, \sigma_i^2)$, modelled as zero-mean Gaussian distributions with variance $\sigma_i^2$. This MF-GPR formulation is hierarchical. The low-fidelity level is modelled by a GPR. The high-fidelity model builds an additional GPR using the posterior distribution of the low-fidelity level. A GPR problem is solved at each level without the need to construct a covariance matrix which contains the observations of all fidelity levels, as in [9]. The surrogate is frequently updated during the optimisation; hence, the smaller size of the covariance matrix can result in a significant computational speed-up.
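A minimal two-level sketch of this recursive idea is given below, using scikit-learn GPs. It simplifies Eq. (3) by taking a constant scaling factor instead of $\rho(x) = g^T(x)\,\beta_\rho$ and is not the formulation of [11] in full; class and variable names are assumptions:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

class RecursiveMFGPR:
    """Two-level recursive multi-fidelity GPR in the spirit of Eq. (3)."""

    def fit(self, X_lf, y_lf, X_hf, y_hf):
        kern = ConstantKernel() * RBF() + WhiteKernel(1e-6)
        self.gp_lf = GaussianProcessRegressor(kern, normalize_y=True).fit(X_lf, y_lf)
        lf_at_hf = self.gp_lf.predict(X_hf)
        # least-squares estimate of a constant scaling factor rho
        self.rho = float(np.dot(lf_at_hf, y_hf) / np.dot(lf_at_hf, lf_at_hf))
        resid = y_hf - self.rho * lf_at_hf               # discrepancy delta_HF
        self.gp_d = GaussianProcessRegressor(ConstantKernel() * RBF() + WhiteKernel(1e-6),
                                             normalize_y=True).fit(X_hf, resid)
        return self

    def predict(self, X):
        return self.rho * self.gp_lf.predict(X) + self.gp_d.predict(X)
```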

5 Uncertainty Treatment When simulating a flow around an airfoil only a limited number of phenomena are modelled. Therefore, the aerodynamic performance of a real airfoil might deviate from the numerical results. This motivates the construction of probabilistic models which are appended to the design optimisation workflow to predict the variations of the aerodynamic performance. Uncertainty modelling techniques are grouped as: deterministic, probabilistic and possibilistic, according to [2]. In this work, only geometrical uncertainties are considered. It is assumed that their nature is probabilistic and they can be described by a Gaussian distribution with zero mean and 0.01 standard deviation. The geometrical uncertainties are propagated through the aerodynamic solver which will result in a probabilistic aerodynamic performance.

2 Gaussian process regression is also called Wiener-Kolmogorov prediction and Kriging.


The comparison of two probability distributions is not a trivial task, since the possible realisations of a distribution are multiple [21]. Therefore, only certain properties of a distribution are compared. In this work, the 95-th superpercentiles of the Cl and Cd distributions are used for evaluating the aerodynamic performance of the airfoil. The ζ-th superpercentile of a probabilistic variable is the expected value of all possible realisations which are not smaller than the ζ-th percentile value. This measure is also commonly called superquantile or conditional Value-at-Risk. This risk measure is employed to ensure reliability in the sense that the tail of the response distribution is optimised. The robustness of the response (i.e. its sensitivity to uncertainties) is not taken into account by this risk measure. The advantages of the superpercentile measure over other risk measures for engineering applications are discussed in [18, 21]. The ζ-th superpercentile is calculated with the following equation:

$$S_\zeta = \bar q_{\zeta/100} = E\big[\tilde y \,\big|\, \tilde y \ge q_{\zeta/100}(\tilde y)\big] = \frac{1}{1 - \zeta/100} \int_{\zeta/100}^{1} q_\tau(\tilde y)\, d\tau \qquad (4)$$

where q_{\zeta/100} is the ζ-th percentile. In this work, ζ = 95. Analytical propagation of the uncertainty is not possible due to the complexity of the aerodynamic solvers. Therefore, the superpercentile values of the Cl and Cd distributions are calculated from empirical values obtained by sampling. To obtain a sufficient number of samples, surrogate-assisted uncertainty quantification is performed as in [10]. The probabilistic space is considered independent from the design space, and for each design a local GPR is built on 15 probabilistic samples. The superpercentile is then calculated with 5000 virtual samples of the local probabilistic space. These values proved adequate after a few trial-and-error checks, together with a quick validation of the probabilistic models of the low-fidelity responses, as explained in the next paragraph. The validation of a probabilistic model involving expensive aerodynamic simulations is a cumbersome process. However, we can validate the probabilistic models of the low-fidelity level at a relatively low cost. Tables 2 and 3 show the relative error of the low-fidelity risk measure predictions, S95(Cl) and S95(Cd), respectively. The tables present the results of 5 independent calculations of the lift and drag risk measures with both the proposed surrogate-based UQ technique and pure Monte Carlo sampling. The same 5000 input geometrical uncertainties of the surrogate-based UQ run were used for the Monte Carlo calculation; however, they were evaluated with the XFOIL solver. The mean relative error of the lift and drag superpercentiles is around 0.1%, which is adequate for the purpose of this study. Due to the expensive nature of the high-fidelity level, we can only assume that the prediction error of the high-fidelity risk measure values has the same magnitude as obtained at the low-fidelity level.
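The following sketch illustrates, under simplifying assumptions, the surrogate-assisted estimation of the 95-th superpercentile described above: 15 geometric perturbations drawn from N(0, 0.01²) are evaluated with a placeholder solver, a local Gaussian process is fitted in the probabilistic space, and the superpercentile is estimated from 5000 virtual samples. The function evaluate_cl is a hypothetical stand-in for the XFOIL/SU2 computational chain.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def superpercentile(samples, zeta=95.0):
    """Empirical zeta-th superpercentile (superquantile / CVaR) of a sample set."""
    q = np.percentile(samples, zeta)
    return samples[samples >= q].mean()

def evaluate_cl(design, perturbation):
    # Hypothetical placeholder for an aerodynamic evaluation of a perturbed geometry.
    return 1.3 + design.sum() * 0.01 + perturbation.sum() * 0.5

def surrogate_based_s95(design, n_dim=8, n_train=15, n_virtual=5000, sigma=0.01, seed=0):
    rng = np.random.default_rng(seed)
    # 15 probabilistic samples of the geometrical uncertainty around the nominal design.
    xi_train = rng.normal(0.0, sigma, size=(n_train, n_dim))
    y_train = np.array([evaluate_cl(design, xi) for xi in xi_train])
    # Local GPR in the probabilistic space only (design and probabilistic spaces are independent).
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=sigma), normalize_y=True).fit(xi_train, y_train)
    # 5000 virtual samples evaluated on the cheap local surrogate.
    xi_virtual = rng.normal(0.0, sigma, size=(n_virtual, n_dim))
    y_virtual = gp.predict(xi_virtual)
    return superpercentile(y_virtual, 95.0)

s95_cl = surrogate_based_s95(np.zeros(8))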


Table 2 Relative error of the predicted S95(Cl) values at the low-fidelity level

                                     Run 1     Run 2     Run 3     Run 4     Run 5
S95(Cl) with surrogate-based UQ      1.33722   1.33733   1.33682   1.33769   1.33719
S95(Cl) with pure Monte Carlo        1.33570   1.33545   1.33581   1.33599   1.33583
Relative error [%]                   0.11380   0.14078   0.07561   0.12725   0.10181
Mean relative error [%]              0.11185

Table 3 Relative error of the predicted S95(Cd) values at the low-fidelity level

                                     Run 1      Run 2      Run 3      Run 4      Run 5
S95(Cd) with surrogate-based UQ      0.013085   0.013089   0.013085   0.013091   0.013089
S95(Cd) with pure Monte Carlo        0.013098   0.013098   0.013098   0.013096   0.013097
Relative error [%]                   0.09162    0.06871    0.09925    0.03818    0.06108
Mean relative error [%]              0.07177

6 Multi-Objective Optimisation Framework for Airfoil Optimisation Under Uncertainty

This work employs the optimisation workflow proposed in [10]. The workflow embodies a multi-fidelity Bayesian optimisation for multi-objective problems, and here we employ it for the aerodynamic design optimisation of airfoils using XFOIL and SU2. The workflow can be divided into three major components: design of experiments (DoE), multi-fidelity surrogate construction and acquisition function. To initialise the optimisation workflow, a DoE technique is employed to obtain a dataset for surrogate construction. At each fidelity level, the design space is sampled by uniform Latin Hypercube Sampling (LHS). After obtaining the 95-th superpercentiles of lift and drag, two independent multi-fidelity surrogates are trained using the recursive formulation defined by Eq. (3). The two MF-GPR models are used by the acquisition function to determine which design configuration should be evaluated in the next iteration and which solver should perform the aerodynamic calculation. Since it is a multi-objective problem, the decision on the next design location is made by maximising the hypervolume improvement of the lower confidence bound of the drag and lift coefficients, following the suggestion of [7]. For the selected design configuration, the superpercentile values of the drag and lift coefficients are calculated by evaluating the corresponding probabilistic sample with XFOIL or SU2. The selection of the solver is based on the Scaled Expected Variance Reduction (SEVR) values of the fidelity levels [10]. With the new superpercentile values, the surrogate models of the lift and drag can be retrained. The surrogate is updated with new designs until the computational budget is exhausted. At the end, the set of Pareto optimal designs is presented to the decision makers. The complete optimisation workflow is depicted in Fig. 3.
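As an illustration of the acquisition step, the sketch below computes the hypervolume improvement of the lower confidence bound of a candidate design for a two-objective minimisation problem. It is a simplified stand-in for the criterion of [7]: the surrogate predictions, the reference point and the confidence factor k are hypothetical, and the SEVR-based fidelity selection is not shown.

import numpy as np

def pareto_filter(points):
    """Keep only non-dominated points (minimisation of both objectives)."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = any(np.all(q <= p) and np.any(q < p) for j, q in enumerate(pts) if j != i)
        if not dominated:
            keep.append(tuple(p))
    return sorted(set(keep))

def hypervolume_2d(front, ref):
    """Hypervolume dominated by a 2-D Pareto front with respect to a reference point."""
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(front):          # f1 ascending, f2 descending on a Pareto front
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

def hvi_of_lcb(mean, std, front, ref, k=2.0):
    """Hypervolume improvement of the lower confidence bound of a candidate design."""
    lcb = (mean[0] - k * std[0], mean[1] - k * std[1])
    base = hypervolume_2d(front, ref)
    extended = hypervolume_2d(pareto_filter(list(front) + [lcb]), ref)
    return max(extended - base, 0.0)

# Hypothetical usage with objectives (S95(-Cl), S95(Cd/Cd0)).
front = [(-1.60, 1.70), (-1.50, 1.58), (-1.35, 1.43)]
ref = (0.0, 2.5)
print(hvi_of_lcb(mean=(-1.55, 1.50), std=(0.02, 0.03), front=front, ref=ref))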


Fig. 3 Multi-objective probabilistic optimisation workflow for aerodynamic shape optimisation of an airfoil with MF-GPR

Table 4 Summary of optimisation problem

Number of objectives            2
Number of constraints           0
Number of design variables      8
Number of uncertain variables   8
Computational budget            136,500

7 Results

A brief summary of the solved optimisation problem is presented in Table 4. The problem is bi-objective and has 8 design and 8 uncertain variables. The computational budget is set to 136,500 units. The number of evaluated LF and HF samples and their cost are presented in Table 5. We assigned 300 cost units to a single evaluation of the aerodynamic forces with SU2 and 1 unit to evaluating a design with XFOIL. The cost of the fidelity levels was determined from the actual running times of the simulations on the machine used. The optimisation stopped when no further high-fidelity samples could be added to the surrogate training set. After the 435th HF simulation, only 4245 units remained in the budget, which is not sufficient for generating the 15 samples required to build the probabilistic model. Therefore, only 96.8% of the budget was used. In Fig. 4, we can see that the algorithm alternates between the fidelity levels. The alternation stems from the fact that, in regions where the expected improvement is high due to large uncertainties, the algorithm evaluates the new design with the low-fidelity solver. Following this step, high expected improvement values in the region are the result of promising performance predictions with a low level of uncertainty.


Table 5 Number of LF and HF samples and their costs

                          LF DoE        LF total       HF DoE            HF total
Aerodynamic solver        XFOIL         XFOIL          SU2               SU2
Evaluation cost           1             1              300               300
Total number of samples   450           1755           225               435
Budget spent              450 (0.3%)    1755 (1.2%)    67,500 (51.7%)    130,500 (95.6%)

Fig. 4 History of fidelity selection

Fig. 5 Comparing relative prediction errors of MF-GPR and GPR (objective 1: lift, objective 2: drag)

Therefore, the region can be sampled by high-fidelity simulation without risking wasting the budget. To investigate the advantage of MF-GPR over single-fidelity GPR, the relative prediction errors of the surrogates are calculated at every iteration in which a HF sample is generated. The classical GPR model is built at every iteration using only the HF samples available at that iteration. Overall, the MF-GPR provides a better prediction; however, in some iterations GPR can temporarily be the better predictor, as shown in Fig. 5. At each iteration, the prediction error is calculated based on a single sample. When the newly evaluated design lies in a region which can be accurately predicted by a GPR model using only HF samples, it is possible that the single-fidelity GPR model provides a slightly better prediction. However, in the majority of the iterations, MF-GPR outperforms the GPR model. In Table 6 we show the mean prediction error of the single- and multi-fidelity surrogates. Both surrogate models provide a relatively accurate prediction of the objectives overall.

Table 6 Mean prediction error

                      GPR     MF-GPR
Objective 1 (lift)    0.77%   0.31%
Objective 2 (drag)    1.51%   0.99%

Fig. 6 Correlation history of objectives

However, the single-fidelity surrogate model shows significantly larger prediction errors when the newly evaluated design lies in a region that has not been explored by a sufficient number of observations. Throughout the iterations, the correlation between the predictions at the low- and high-fidelity levels is high and steady, as shown in Fig. 6. The high correlation is expected at the beginning of the optimisation, as the high number of LF samples in the training data results in an MF-GPR model that predicts performances close to the LF observations. This correlation does not deteriorate when the model is updated with high-fidelity samples, which suggests that the initial MF-GPR, dominated by the LF samples, provides a good approximation. The obtained Pareto optimal solutions are depicted in Fig. 7 (red circles, Pareto front HF). The initial HF Pareto front obtained after the DoE (dash-dotted grey line) was significantly improved. The MF-GPR models of the objectives can provide accurate predictions; hence, most of the design locations suggested by the acquisition function are Pareto optimal. In the same figure, the Pareto front of the LF samples is shown. It might seem that the LF Pareto optimal solutions dominate the HF front. However, as Fig. 2a and b also suggest, the drag and lift forces are actually under-predicted by XFOIL. Indeed, by re-evaluating the LF Pareto optimal designs with the HF solver, the green circles (HF evaluations of the LF Pareto front) are obtained. Thus, we can conclude that the gap between the LF and HF fronts is due to the approximation error of the LF evaluations. Therefore, the introduction of the HF samples into the surrogate model construction is beneficial for obtaining an accurate Pareto front. The list of design variables and objectives of the non-dominated designs is shown in Table 7. We can see that many design variables reach the boundary of the design variable limits. The design variable of the thickness mode is set to −1 for every non-dominated design. This is expected, as thin airfoils produce significantly less drag and structural properties are not considered here. Depending on some further criteria on the propeller blade, the decision maker can choose the preferred airfoil (among the HF Pareto optimal solutions) for further analysis. For example, Table 8 lists three possible designs corresponding to the minimum drag, maximum lift-to-drag and maximum lift designs. To compare the


Fig. 7 Comparison of high and low-fidelity Pareto fronts

Table 7 List of design variables and objectives of the non-dominated designs

x1       x2       x3       x4       x5       x6       x7       x8       S95(−Cl)   S95(Cd/Cd0)
−1.000   1.000    1.000    1.000    1.000    −1.000   0.000    −1.000   −1.637     1.822
−1.000   1.000    −0.084   1.000    −0.119   −1.000   −0.526   −1.000   −1.603     1.715
−1.000   1.000    −1.000   1.000    −1.000   −0.113   −1.000   −1.000   −1.552     1.649
−1.000   0.663    −0.927   0.654    −1.000   −1.000   −0.595   −1.000   −1.536     1.608
−1.000   0.611    −0.458   −1.000   −0.659   −0.608   −0.345   −1.000   −1.501     1.578
−1.000   0.917    0.272    −1.000   −1.000   −1.000   −1.000   −0.199   −1.485     1.569
−1.000   −0.128   −1.000   0.350    −1.000   −1.000   −0.616   −0.921   −1.439     1.505
−1.000   −0.215   1.000    −1.000   −1.000   −1.000   −1.000   −1.000   −1.388     1.457
−1.000   −0.717   −1.000   −1.000   −1.000   −1.000   0.020    −1.000   −1.363     1.450
−1.000   −1.000   −0.379   1.000    −1.000   −1.000   −1.000   −1.000   −1.346     1.433
−1.000   −1.000   1.000    −1.000   0.000    1.000    −1.000   −1.000   −1.288     1.418

Table 8 Optimal designs of various criteria

                               S95(Cl)   S95(Cd)   lift-to-drag
Baseline design                1.4222    0.01646   0.864
Minimum drag design            1.2879    0.01418   0.908
Maximum lift-to-drag design    1.5360    0.01608   0.955
Maximum lift design            1.6371    0.01822   0.899


Fig. 8 Comparison of the baseline and Pareto optimal designs with uncertainty prediction. (a) Comparison of the baseline and minimum drag design with uncertainty prediction. (b) Comparison of the baseline and maximum lift-to-drag design with uncertainty prediction. (c) Comparison of the baseline and maximum lift design with uncertainty prediction

predicted probability distributions of their aerodynamic forces, the baseline and the maximum lift-to-drag design are shown in Fig. 8b. Violin plots are drawn based on the 5000 virtual samples mentioned in Sect. 5. By optimising the 95-th superpercentile of the aerodynamic forces, the tails of the aerodynamic performance distributions are optimised. Due to the strong correlation of the lift and drag forces, the optimisation of the opposite distribution tails can result in more robust solutions


Fig. 9 Comparison of Pareto optimal airfoil designs. (a) Minimum drag design. (b) Cp distribution of minimum drag design. (c) Maximum lift-to-drag design. (d) Cp distribution of maximum lift-to-drag design. (e) Maximum lift design. (f) Cp distribution of maximum lift design

(narrower spread of the distribution), as we can observe in the case of the maximum lift and maximum lift-to-drag designs. The optimisation problem defined by Eqs. (1a)–(1b) aims to optimise exclusively the drag and lift forces; other aerodynamic and structural parameters of the airfoil are neglected. This can result in airfoil designs which are sub-optimal when structural requirements and other aerodynamic parameters are considered. For example, the maximum lift-to-drag design in Fig. 9c has a very thin trailing edge, which is undesirable from a structural point of view. Nevertheless, we can see that the pressure distribution in Fig. 9b, d and f is well approximated by XFOIL. This explains the effectiveness of the multi-fidelity approach used here.


8 Conclusion

The optimisation of an aerodynamic shape is computationally expensive, all the more so when uncertainties are taken into account. This problem can be tackled by employing multi-fidelity surrogate-assisted optimisation. On the one hand, the required number of design evaluations is reduced by using statistical models which help us to evaluate only promising design candidates. On the other hand, the required number of expensive high-fidelity design evaluations is reduced by employing an MF-GPR, which can complement the information obtained from high-fidelity evaluations with low-fidelity information. In this study, we performed an aerodynamic shape optimisation under uncertainty combining information from XFOIL and the RANS solver of SU2. The multi-fidelity surrogate-assisted optimisation provided an accurate Pareto front approximation with only a limited number of high-fidelity RANS simulations. The optimal solutions found by the proposed approach display significant dominance over the baseline solution in the objective space of the reliability measures of the lift and drag.

Acknowledgments This work was partially supported by the H2020-MSCA-ITN-2016 UTOPIAE, grant agreement 722734.

References

1. Barrett, R., Ning, A.: Comparison of airfoil precomputational analysis methods for optimization of wind turbine blades. IEEE Trans. Sustainable Energy 7(3), 1081–1088 (2016)
2. Beyer, H.G., Sendhoff, B.: Robust optimization – a comprehensive survey. Comput. Methods Appl. Mech. Eng. 196(33–34), 3190–3218 (2007)
3. Deb, K.: Multi-objective optimization. In: Search Methodologies, pp. 403–449. Springer, Berlin (2014)
4. Drela, M.: Xfoil: An analysis and design system for low Reynolds number airfoils. In: Low Reynolds Number Aerodynamics, pp. 1–12. Springer, Berlin (1989)
5. Drela, M.: Xfoil 6.9 User Primer (2001). xfoil_doc.txt. Last updated 30 Nov 2001
6. Economon, T.D., Palacios, F., Copeland, S.R., Lukaczyk, T.W., Alonso, J.J.: SU2: an open-source suite for multiphysics simulation and design. AIAA J. 54(3), 828–846 (2016)
7. Emmerich, M.T., Giannakoglou, K.C., Naujoks, B.: Single- and multiobjective evolutionary optimization assisted by Gaussian random field metamodels. IEEE Trans. Evolut. Comput. 10(4), 421–439 (2006)
8. Forrester, A.I., Sóbester, A., Keane, A.J.: Multi-fidelity optimization via surrogate modelling. Proc. Roy. Soc. A Math. Phys. Eng. Sci. 463(2088), 3251–3269 (2007)
9. Kennedy, M.C., O'Hagan, A.: Predicting the output from a complex computer code when fast approximations are available. Biometrika 87(1), 1–13 (2000)
10. Korondi, P.Z., Marchi, M., Parussini, L., Poloni, C.: Multi-fidelity design optimisation strategy under uncertainty with limited computational budget. Optim. Eng. 22, 1039–1064 (2020)
11. Le Gratiet, L., Garnier, J.: Recursive co-kriging model for design of computer experiments with multiple levels of fidelity. Int. J. Uncertainty Quantif. 4(5) (2014)


12. Menter, F.R., Kuntz, M., Langtry, R.: Ten years of industrial experience with the SST turbulence model. Turbulence, Heat Mass Transf. 4(1), 625–632 (2003)
13. Poloni, C.: Hybrid GA for multi objective aerodynamic shape optimisation. In: Genetic Algorithms in Engineering and Computer Science. Wiley, Hoboken (1995)
14. Poloni, C., Pediroda, V.: GA coupled with computationally expensive simulations: tools to improve efficiency. In: Genetic Algorithms and Evolution Strategies in Engineering and Computer Science. Recent Advances and Industrial Applications, pp. 267–288 (1997)
15. Poloni, C., Giurgevich, A., Onesti, L., Pediroda, V.: Hybridization of a multi-objective genetic algorithm, a neural network and a classical optimizer for a complex design problem in fluid dynamics. Comput. Methods Appl. Mech. Eng. 186(2–4), 403–420 (2000)
16. Quagliarella, D., Diez, M.: An open-source aerodynamic framework for benchmarking multi-fidelity methods. In: AIAA Aviation 2020 Forum, p. 3179 (2020)
17. Quagliarella, D., Serani, A., Diez, M., Pisaroni, M., Leyland, P., Montagliani, L., Iemma, U., Gaul, N.J., Shin, J., Wunsch, D., et al.: Benchmarking uncertainty quantification methods using the NACA 2412 airfoil with geometrical and operational uncertainties. In: AIAA Aviation 2019 Forum, p. 3555 (2019)
18. Quagliarella, D., Tirado, E.M., Bornaccioni, A.: Risk measures applied to robust aerodynamic shape design optimization. In: Flexible Engineering Toward Green Aircraft, pp. 153–168. Springer, Berlin (2020)
19. Schuëller, G.I., Jensen, H.A.: Computational methods in optimization considering uncertainties – an overview. Comput. Methods Appl. Mech. Eng. 198(1), 2–13 (2008)
20. Song, W., Keane, A.: A study of shape parameterisation methods for airfoil optimisation. In: 10th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, p. 4482 (2004)
21. Tyrrell Rockafellar, R., Royset, J.O.: Engineering decisions under risk averseness. ASCE-ASME J. Risk Uncertainty Eng. Syst. Part A Civil Eng. 1(2), 04015003 (2015)
22. Vaithiyanathasamy, R.: CFD analysis of 2D and 3D airfoils using open source solver SU2. University of Twente, Internship Report (2017)

High-Lift Devices Topology Robust Optimisation Using Machine Learning Assisted Optimisation

Lorenzo Gentile, Elisa Morales, Martin Zaefferer, Edmondo Minisci, Domenico Quagliarella, Thomas Bartz-Beielstein and Renato Tognaccini

1 Introduction

Nowadays, the interest in the optimal design of High-Lift Devices (HLDs) has grown in the aeronautical industry [8, 21, 24]. The goal of HLDs is to increase the lift force generated by the wing during the slow phases of the aircraft flight: take-off, climbing and landing. Among all existing devices, slats and flaps are the most used for this purpose. In particular, the slat increases the angle-of-attack at



which maximum lift is found, thus delaying the stall of the wing. It is placed at the wing leading edge. Conversely, the flap is located at the trailing edge and increases the lift at low angles-of-attack [22, 23]. The design of HLDs is a challenging task [14, 15, 20]. It involves several disciplines (such as aerodynamics, structures and system integration) during the design process. Furthermore, the design optimisation involves several objectives that are usually conflicting. Moreover, the optimisation problem is a multi-point problem, because HLDs are deployed in diverse flight phases. In addition, HLD design is very challenging from a computational standpoint. The correct convergence of the Computational Fluid Dynamics (CFD) flow solver is a crucial aspect, since HLD design addresses configurations at high angles-of-attack, close to stall conditions. Usually, these configurations are characterised by separated flows, which dramatically hinder the accurate prediction of the aerodynamic performance; hence the need to use experimental data to validate the CFD results. Traditionally, a two-step method is adopted to design HLDs. Firstly, a promising configuration type (number of airfoil elements) is identified. Secondly, the positions of the elements are determined by solving an optimisation problem, often employing heuristic algorithms [14]. In realistic scenarios, it is a tremendous challenge to foresee the optimal configuration in terms of performance, weight and cost. As a consequence, this strategy likely leads to sub-optimal solutions. Recently, this a priori selection, made by experienced designers, has been avoided by using a specific optimiser that handles configurational decisions [8]. Specifically, a mixed-variable global optimiser for dynamically varying search spaces based on a genetic algorithm was used, the Structured-Chromosome Genetic Algorithm [5–8, 10]. In [8], a deterministic optimisation was performed, where the objective function was limited to the maximisation of the lift coefficient of the HLD at a given angle-of-attack, regardless of its cost, weight, drag and practical construction feasibility. The use of CFD to predict the aerodynamic performance of the airfoil is very expensive. Moreover, this kind of analysis is affected by several uncertainties. In particular, the uncertainty affecting the operational angle-of-attack of the airfoil must be considered because, as shown in [8], a small change may cause an abrupt stall of the airfoil. On the other hand, the introduction of uncertain variables can make standard optimisation unaffordable in terms of computational cost. Thus, a method to reduce the CPU time is required. The presented work is the natural extension of the research conducted in [8]. This work aims at integrating the use of surrogate models to further optimise the usage of computational resources. The general idea is to adopt the search algorithm employed in the previous work within an Efficient Global Optimisation (EGO)-like optimisation framework [16] enhanced by data-driven models. The computational cost is further reduced by means of a quadrature approach that makes the uncertainty quantification relatively inexpensive. The multi-element airfoils used in [8] are employed as baseline configurations for the optimisation under uncertainty presented in this paper. The objective of the optimisation is to design an HLD which maximises the lift coefficient considering the angle-of-attack as an uncertain parameter. Therefore, a quadrature approach [1]


is implemented to quantify the uncertainty. Hence, instead of directly maximising Cl, a statistical measure of it is maximised. Furthermore, to calculate the airfoil performance, the open-source CFD flow solver SU2 [3] is used. To properly address an aerodynamic design problem, an automatic aerodynamic computational chain is needed. In this work, the computational chain introduced in [8] is used. The paper is organised as follows: in Sect. 2 the developed process is introduced, while in Sect. 3 the quadrature method is presented. Section 4 describes the HLD optimisation problem and details the formulation and the constraint handling. Details about the optimisation settings are given in Sect. 5. Finally, Sect. 6 shows the results of the optimisation process and Sect. 7 summarises the critical aspects of this research and presents some future work.

2 Machine Learning Assisted Optimisation

To solve this optimisation problem, a method that accounts for the stochastic and expensive nature of the problem is required. Hence, a Machine Learning Assisted Optimisation approach, which stems from the EGO strategy [16], is used. Specifically, this consists of the iterative approach shown in Algorithm 1 [9].

Algorithm 1 Machine learning assisted optimisation
1:  t = 0. P(t) = SetInitialPopulation().
2:  Evaluate(P(t)) on f.
3:  while not TerminationCriterion() do
4:      Use P(t) to build a model M(t) and a classifier C(t).
5:      Define objective function F based on M(t) and C(t).
6:      P′(t + 1) = GlobalSearch(F(t)).
7:      Evaluate(P′(t + 1)) on f.
8:      P(t + 1) = Pop(t) ∪ P′(t + 1).
9:      t = t + 1.
10: end while

The first step in Algorithm 1 (Line 1) is the determination of the initial dataset that is used to train the first surrogate model M and classifier C. Once the first dataset has been created and observed (Line 2), it is used to train a surrogate model M that replicates the behaviour of the objective function, and a classifier C that distinguishes feasible from unfeasible candidates (Line 4). A composition of C and M yields the objective function that is optimised to propose a new promising solution (Lines 5 and 6). Finally, the dataset is enlarged with the newly proposed point and its observation on f (Lines 7 and 8). Further details are given in Sect. 4.2 and a representation can be found in Fig. 1.
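A minimal Python rendering of Algorithm 1 is sketched below, assuming hypothetical helper functions (set_initial_population, evaluate_on_cfd, global_search) that stand in for the DOE, the aerodynamic computational chain and the SCGA search; the surrogate and classifier are the models described in Sects. 2.1 and 2.2.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.ensemble import RandomForestClassifier

def mlao(set_initial_population, evaluate_on_cfd, global_search, budget=325):
    # Lines 1-2: initial data set and its observation on the true objective f.
    X = np.asarray(set_initial_population())
    y, flags = map(np.asarray, evaluate_on_cfd(X))      # robust lift measure and error flags
    while len(X) < budget:                               # Line 3: termination criterion
        ok = flags == 0
        model = GaussianProcessRegressor(normalize_y=True).fit(X[ok], y[ok])   # Line 4: surrogate M
        clf = RandomForestClassifier(n_estimators=200).fit(X, flags)           # Line 4: classifier C

        def F(x):                                        # Line 5: artificial objective from M and C
            x = np.atleast_2d(x)
            if clf.predict(x)[0] != 0:
                return -10.0                             # step penalty for predicted-unfeasible designs
            mean, std = model.predict(x, return_std=True)
            return float(mean[0] + 2.0 * std[0])         # optimistic score (EI could be used instead)

        x_new = global_search(F)                         # Line 6: propose a new promising design
        y_new, flag_new = evaluate_on_cfd(np.atleast_2d(x_new))   # Line 7: expensive evaluation
        X = np.vstack([X, np.atleast_2d(x_new)])                  # Line 8: enlarge the data set
        y = np.append(y, y_new); flags = np.append(flags, flag_new)
    best = int(np.argmax(np.where(flags == 0, y, -np.inf)))
    return X[best], y[best]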


Fig. 1 Machine Learning Assisted Optimisation Flowchart. First a Design of Experiment (DOE) is created and observed. The observation, objective function evaluation, consists in computing the Cl coefficient of a candidate given three different angles-of-attack through high-fidelity CFD analyses. These values are used to estimate a robust measure of lift coefficient as described in Sect. 4.2. Once the data are collected, two data-driven models are trained. One aims at mimicking the objective function, and one at identifying feasible from unfeasible solutions. A combination of these two is used as a target of the optimisation algorithm SCGA to find a new promising HLD design. The actual performance of the proposed point is evaluated and stored. If the evaluation budget is not exhausted the process is repeated. Else, the best solution found is returned as the optimal final design

2.1 Surrogate Model

To reduce the computational effort and efficiently pursue the identification of optimal solutions, optimisation is often enhanced by the use of surrogate models. These models mimic and replace the original expensive objective function, providing a cheap target for the optimisation routines. In this research, a Gaussian process model (or Kriging) [4] has been chosen as the surrogate model. An appreciated feature of Kriging is that it also provides an estimate of its own prediction uncertainty. This estimate can be used to balance exploration and exploitation by computing the Expected Improvement (EI) of candidate solutions [9, 19]. Kriging assumes that the data follow a multi-variate Gaussian distribution, where errors are spatially correlated. This is encoded within a kernel function. To account for the hierarchical nature of the search space, the Wedge kernel [13] is used. This kernel employs a mapping function: with any standard kernel k(x, x′), the mapping function can be employed such that k_wedge(x, x′) = k(h(x), h(x′)), where h(x) applies h_i to each dimension x_i. The mapping function of the Wedge kernel is

h_i(x) = \begin{cases} [\,0,\ 0\,]^T, & \text{if } \delta_i(x) = \text{false} \\ [\,\theta_{1,i} + v\,(\theta_{2,i}\cos(\rho_i) - \theta_{1,i}),\ \ v\,\theta_{2,i}\sin(\rho_i)\,]^T, & \text{otherwise,} \end{cases}        (1)

with the scaled variable value v = (x_i − l_i)/(u_i − l_i), and lower/upper bounds l_i, u_i in each dimension. From a geometrical point of view, this function maps the variable value in each dimension of the input vector x to a triangular shape (the wedge) in a two-dimensional space. If the hierarchical variable in that dimension is inactive (δ_i(x) = false), then x_i is mapped to the origin [0, 0]^T. Else, x_i is mapped to a line segment. The parameters ρ_i ∈ [0, π], θ_{1,i} ∈ R+, and θ_{2,i} ∈ R+ specify the angle and the two adjacent side lengths of a triangle (spanned by the origin and the line segment). When training a standard Kriging model, it is assumed that the uncertainty at the already sampled locations is zero. However, this does not hold for noisy problems in the presence of uncertainty. One way to account for noise is to introduce the so-called nugget effect. This essentially adds a constant value η to the diagonal of the kernel matrix. The parameter η is determined by maximum likelihood estimation. The nugget effect enables the model to regress the observed data, and hence smooth noisy observations. Furthermore, it may now produce a non-zero estimate of the uncertainty at observed locations [9].
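The sketch below implements the wedge mapping of Eq. (1) and wraps it around a standard squared-exponential kernel. It is a minimal illustration under the assumption of per-dimension parameters (theta1, theta2, rho) and an activity indicator supplied by the hierarchical encoding; it is not the authors' implementation.

import numpy as np

def wedge_map(x, active, lower, upper, theta1, theta2, rho):
    """Map each dimension of x to a 2-D point on the wedge (Eq. (1))."""
    v = (np.asarray(x) - lower) / (upper - lower)           # scaled variable value in [0, 1]
    h = np.zeros((len(x), 2))
    # Active dimensions land on a line segment between distances theta1 and theta2 from the origin.
    h[:, 0] = theta1 + v * (theta2 * np.cos(rho) - theta1)
    h[:, 1] = v * theta2 * np.sin(rho)
    h[~np.asarray(active, dtype=bool)] = 0.0                 # inactive dimensions map to the origin
    return h.ravel()

def wedge_kernel(x1, x2, active1, active2, lower, upper, theta1, theta2, rho, length=1.0):
    """k_wedge(x, x') = k(h(x), h(x')) with a squared-exponential base kernel."""
    h1 = wedge_map(x1, active1, lower, upper, theta1, theta2, rho)
    h2 = wedge_map(x2, active2, lower, upper, theta1, theta2, rho)
    return float(np.exp(-0.5 * np.sum((h1 - h2) ** 2) / length ** 2))

# Hypothetical usage on two flattened HLD chromosomes (13 variables, some inactive).
n = 13
lower, upper = -np.ones(n), np.ones(n)
theta1, theta2, rho = np.full(n, 0.5), np.full(n, 1.0), np.full(n, np.pi / 3)
k = wedge_kernel(np.zeros(n), 0.3 * np.ones(n),
                 active1=[True] * 7 + [False] * 6, active2=[True] * 13,
                 lower=lower, upper=upper, theta1=theta1, theta2=theta2, rho=rho)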

2.2 Classifier

In the developed framework, the surrogate model is fed only with solutions that did not report any error. Thus, the landscape is not compromised by artificially assigned penalty values. Further details about the types of errors that might occur are given in Sect. 4.2. In addition to the surrogate model, a classifier is used to predict and filter out candidates which may produce an error in the CFD analysis. Hence, the unfeasible regions are, in principle, detected and excluded in the new point selection step (Fig. 1). The classification method adopted is the Random Forest model [11, 12]. This type of model is an ensemble learning method for classification that operates by constructing a multitude of decision trees at training time. When predicting, the ensemble outputs the class that is the mode of the classes predicted by the individual trees.
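A minimal sketch of such a feasibility filter is given below. It uses scikit-learn's random forest in place of the R ranger package used by the authors, and assumes the error-flag labels of Sect. 4.2 (0: success, −1: meshing failure, −2: CFD divergence); the training data are hypothetical.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X_train = rng.uniform(-1, 1, size=(200, 13))                        # flattened, padded chromosomes
flags = rng.choice([0, -1, -2], size=200, p=[0.31, 0.47, 0.22])     # observed error flags (illustrative)

# Random forest trained to distinguish successful runs from meshing/convergence failures.
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, flags)

candidates = rng.uniform(-1, 1, size=(1000, 13))
predicted_feasible = candidates[clf.predict(candidates) == 0]       # kept for the acquisition step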


3 Quadrature Approach for Uncertainty Quantification

In this work, among all the possible uncertain parameters, only the uncertainty of the angle-of-attack has been considered. Classically, Monte Carlo (MC) methods [18] are used for uncertainty quantification. To accurately obtain the probabilistic distribution by using MC methods, a large number of samples is required. Considering that only one uncertain variable is introduced in this optimisation design problem, a different approach was thought to be more suitable. Particularly, the integral of the Cl coefficient over the interval of angle-of-attack [21.29°, 24°] has been adopted as the measure of interest. Indeed, it is a metric that expresses the overall quality in the whole range of the uncertain parameter. To estimate the integral, the Simpson quadrature rule [1] has been used. One derivation replaces the integrand f(x) by the quadratic polynomial (i.e. parabola) which takes the same values as f(x) at the end points a and b and the midpoint m, as follows:

P(x) = f(a)\,\frac{(x - m)(x - b)}{(a - m)(a - b)} + f(m)\,\frac{(x - a)(x - b)}{(m - a)(m - b)} + f(b)\,\frac{(x - a)(x - m)}{(b - a)(b - m)}.        (2)

One can show that

\int_a^b P(x)\,dx = \frac{b - a}{6}\left[ f(a) + 4 f\!\left(\frac{a + b}{2}\right) + f(b) \right],        (3)

which, introducing the step size h = (b − a)/2, can be rewritten as

\int_a^b P(x)\,dx = \frac{h}{3}\left[ f(a) + 4 f\!\left(\frac{a + b}{2}\right) + f(b) \right].        (4)

Therefore, the quantity of interest can be estimated by computing only three Cl values, at the extremes and at the midpoint of the considered range of the angle-of-attack.
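The following sketch applies the Simpson rule of Eq. (4) over the angle-of-attack range [21.29°, 24°]. The lift values are illustrative stand-ins for the SU2-based computational chain.

def simpson_integral(f_a, f_m, f_b, a, b):
    """Simpson quadrature of Eq. (4): integral over [a, b] from values at a, (a+b)/2 and b."""
    h = (b - a) / 2.0
    return (h / 3.0) * (f_a + 4.0 * f_m + f_b)

# Illustrative usage with made-up lift coefficients at 21.29, 22.645 and 24 degrees.
cl_values = {"a": 5.152, "m": 5.063, "b": 4.905}
c_tilde = simpson_integral(cl_values["a"], cl_values["m"], cl_values["b"], 21.29, 24.0)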

4 Problem Formulation

In the following sections, the problem formulation is briefly described. The description of the variables composing the design space is given in Sect. 4.1, while a detailed explanation of the adopted optimisation procedure is provided in Sect. 4.2.

4.1 Optimisation Design Variables

The goal of the presented study is to modify the airfoil topology of a baseline HLD through a Machine Learning Assisted Optimisation (MLAO) framework in order to improve the starting performance.


Fig. 2 Illustration of all the configurations considered in the optimisation. (a) Double slotted flap type 1 (DS1 ) configuration (red solid line). (b) Double slotted flap type 2 (DS2 ) configuration (blue solid line). (c) Double slotted flap type 3 (DS3 ) configuration (darkgreen solid line). (d) Triple slotted flap (T S) configuration (violet solid line)

The three-element airfoil McDonnell Douglas (MDA) 30P-30N [2, 17] has been adopted as the baseline. The goal of the research is to investigate the outcome of optimisation processes when configurational decision variables play a dominant role in the problem formulation. For this reason, in addition to the 30P-30N airfoil itself (hereinafter referred to as SF), four baseline configurations have been derived from it. In particular, three types of double slotted flap (DS1, DS2, and DS3) and one triple slotted flap (TS) have been generated, as described in detail in [8]. A graphical depiction of the resulting devices is given in Fig. 2. The degrees of freedom, for each HLD, are the type of flap and the settings of each flap and slat component (meaning the position and rotation in the 2D space).

Fig. 3 Hierarchical problem formulation. All the independent variables (referring to Table 1, variables 1–4) constitute Level 1. The variables dependent on variable 1 make up Level 2. Solid lines connect variables that are part of all solutions, whereas dashed lines connect variables that may be absent

In particular, this optimisation problem has two levels of hierarchy. The top level is made up of four independent variables: one nominal discrete variable Flap denotes the flap type (possible values of Flap: SF, DS1, DS2, DS3, and TS), and three continuous variables (ΔϑS, ΔXS, ΔYS) define the settings of the slat. The latter three variables represent the difference of a proposed position from the nominal one. The second level of the hierarchy collects all the variables that define the settings of the flap. They all depend on the flap type variable, whose value determines their presence or absence. The variable hierarchy is represented graphically in Fig. 3. Furthermore, Table 1 provides the description of each decision variable, the associated number in the hierarchical formulation, the variable type, the possible range of values (as possibilities or bounds), and the dependency of each variable. The most intuitive and naive way to encode the variables indicating the flap settings would be the one used for encoding the settings of the slat. However, this would imply significant difficulties from the optimisation perspective. Therefore, a new formulation was proposed in [8] and is also implemented here.
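A possible data structure for this two-level hierarchy is sketched below: the flap type decides which flap-setting variables (Table 1, numbers 5–13) are active, and inactive variables are padded when the chromosome is flattened for the data-driven models. The class and field names are hypothetical and chosen for illustration only.

from dataclasses import dataclass, field
from typing import List

N_FLAP_ELEMENTS = {"SF": 1, "DS1": 2, "DS2": 2, "DS3": 2, "TS": 3}   # flap elements per flap type

@dataclass
class HLDChromosome:
    flap: str                                                         # Level 1: nominal discrete variable (variable 1)
    slat: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])        # slat settings (variables 2-4)
    flap_settings: List[List[float]] = field(default_factory=list)            # Level 2: one (angle, dX, dY) triple per element

    def active_variables(self) -> List[float]:
        """Variables that are actually part of the design, given the flap type."""
        active = list(self.slat)
        for element in self.flap_settings[: N_FLAP_ELEMENTS[self.flap]]:
            active.extend(element)
        return active

    def flattened(self, pad_value: float = 0.0) -> List[float]:
        """Fixed-length vector of 12 continuous values, padded to the triple-slotted dimensionality."""
        vec = self.active_variables()
        return vec + [pad_value] * (3 + 3 * 3 - len(vec))

# Example: a double-slotted candidate has 3 slat + 6 flap variables, padded to 12 continuous values.
x = HLDChromosome(flap="DS1", slat=[2.0, 0.05, -0.01],
                  flap_settings=[[5.0, 0.1, 0.02], [-3.0, -0.05, 0.0]])
flat = x.flattened()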

4.2 High-Lift Devices Robust Optimisation Problem

For the present robust design optimisation problem, the working conditions of the multi-element airfoil are Mach = 0.2 and Reynolds = 5 × 10⁶ at a range of angles-of-attack α = [21.29°, 24°]. The optimisation process aims at finding the multi-element airfoil configuration which guarantees the best performance (maximum estimate of the integral of the Cl coefficient) over the full range of working conditions, by identifying the most performing flap configuration and the settings (position and rotation) of the flap elements, as well as of the slat. Hence, the aim of the design optimisation problem is to improve the stall performance of the multi-element airfoil with respect to the deterministic optimum found in [8].


Table 1 Variables of the optimisation problem

Description   N    Variable type      Possibilities              Dep
Flap          1    Nominal discrete   SF, DS1, DS2, DS3, TS      5–13

Description   N    Variable type   Lower bound   Upper bound   Dep
ΔϑS           2    Continuous      −15           15            –
ΔXS           3    Continuous      −0.1          0.1           –
ΔYS           4    Continuous      −0.025        0.025         –
Δϑ1           5    Continuous      −15           15            Flap = [SF, DS1, DS2, DS3, TS]
ΔX1           6    Continuous      −0.2          0.2           Flap = [SF, DS1, DS2, DS3, TS]
ΔY1           7    Continuous      −0.05         0.05          Flap = [SF, DS1, DS2, DS3, TS]
Δϑ2           8    Continuous      −15           15            Flap = [DS1, DS2, DS3, TS]
ΔX2           9    Continuous      −0.1          0.1           Flap = [DS1, DS2, DS3, TS]
ΔY2           10   Continuous      −0.025        0.025         Flap = [DS1, DS2, DS3, TS]
Δϑ3           11   Continuous      −15           15            Flap = TS
ΔX3           12   Continuous      −0.1          0.1           Flap = TS
ΔY3           13   Continuous      −0.025        0.025         Flap = TS

Original Objective Function

The purpose of this work is to maximise a robust measure of the lift coefficient: its integral with respect to the uncertain angle-of-attack over its range. The method presented in Sect. 3 has been used to compute a robust measure of the lift coefficient C̃l. With that said, using Eq. (4), the adopted objective function can be defined as:

\tilde{C}_l = \int_{21.29^\circ}^{24^\circ} C_l \, d\alpha \approx 0.5 \left[ C_l(21.29^\circ) + 4\, C_l\!\left(24^\circ - \frac{24^\circ - 21.29^\circ}{2}\right) + C_l(24^\circ) \right].        (5)

When dealing with CFD analyses, two errors might occur. Firstly, the generation of the computational grid might be impossible for certain HLD configurations, e.g. due to intersections between airfoil elements. Secondly, the CFD analysis might not converge. The employed convergence criterion is based on the difference between the up-to-date Cl and the mean lift coefficient over the last 1000 iterations of the CFD solver (Cl_AVG). An analysis is considered converged when this difference is lower than or equal to 0.005, i.e. |Cl − Cl_AVG| ≤ 0.005. Moreover, it is worth mentioning that Computational Fluid Dynamics analysis in near- or post-stall conditions is known to be a harsh challenge. Consequently, even in the case of fully converged solutions, the obtained results present a high margin of uncertainty. Therefore, in this study, CFD results that have not fully satisfied the imposed convergence criterion have not been used, because of the inevitable and unacceptable margin of uncertainty and imprecision they would have introduced.
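A minimal sketch of this convergence check and of the resulting error flag (see Eq. (6) below) is given next, assuming the solver exposes the Cl history as a list of per-iteration values; the function name and arguments are illustrative.

def convergence_error_flag(cl_history, window=1000, tol=0.005, mesh_ok=True):
    """Return 0 on success, -1 if the mesh could not be generated, -2 if the flow did not converge."""
    if not mesh_ok:
        return -1
    if len(cl_history) < window:
        return -2                                    # not enough iterations to assess convergence
    cl_now = cl_history[-1]
    cl_avg = sum(cl_history[-window:]) / window      # mean Cl over the last 1000 iterations
    return 0 if abs(cl_now - cl_avg) <= tol else -2

# Example: a run whose last iterations oscillate mildly around Cl = 5.0 is flagged as converged.
history = [5.0 + 0.001 * ((i % 7) - 3) for i in range(3000)]
flag = convergence_error_flag(history)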


The occurrence of one of the aforementioned CFD analysis failures is reported by the aerodynamic computational chain by means of an error flag. This is equal to −1 when the mesh is not generated, equal to −2 when the convergence of the flow field is not achieved, and equal to 0 in case of all successful evaluations. Only candidates successfully tested at all angles-of-attack return a correct estimation of C̃l. Therefore, the objective function is reformulated as follows:

Q(X) = \begin{cases} \tilde{C}_l, & \text{if } ErrorFlag = 0 \\ ErrorFlag, & \text{if } ErrorFlag \neq 0, \end{cases}        (6)

where ErrorFlag refers to the three possible values returned by the CFD analyses used to compute C̃l. Referring to Algorithm 1, it can be said that Q ≡ f.

Artificial Objective Function

In the presented framework, the search for optimal solutions relies on the application of an optimiser to an artificial function that mimics the behaviour of the original function. In traditional Surrogate Based Optimisation (SBO), this artificial function is the prediction of a surrogate model trained with all the collected data. Here, it is determined by the predictions of two distinct models. The first, the surrogate model M, aims at the prediction of C̃l. The second, the classifier C, identifies feasible and unfeasible solutions. In particular, the classifier is trained to distinguish between configurations leading to mesh creation failures, CFD divergences, and successful analyses, based on the reported ErrorFlag. To train the models, all configurations need to be mapped into a common rectangular data structure. Therefore, all the structured chromosomes have been flattened, and the ones containing fewer than the maximum number of possible variables (single and double slotted configurations) have been padded to match the dimensionality of the triple slotted configuration. Finally, using a step penalty, the artificial objective function is formulated as follows:

Q_{artificial}(X) = \begin{cases} EI_M(X), & \text{if } C(X) = 0 \\ p, & \text{if } C(X) \neq 0, \end{cases}        (7)

with p = −10 (note that it is a maximisation problem and values ∼ 1 are expected), where EI_M(X) is the EI computed from the prediction of the model M(X) and C(X) is the prediction of the classifier. Referring to Algorithm 1, Q_{artificial} ≡ F.
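The composition of Eq. (7) can be sketched as follows, assuming a fitted Gaussian process model and random forest classifier as in Sects. 2.1 and 2.2; the Expected Improvement is written here in its standard maximisation form, and all model objects are hypothetical placeholders.

import numpy as np
from scipy.stats import norm

def expected_improvement(mean, std, y_best):
    """Standard EI for a maximisation problem."""
    std = np.maximum(std, 1e-12)
    z = (mean - y_best) / std
    return (mean - y_best) * norm.cdf(z) + std * norm.pdf(z)

def q_artificial(x, model, classifier, y_best, penalty=-10.0):
    """Artificial objective of Eq. (7): EI on predicted-feasible designs, step penalty otherwise."""
    x = np.atleast_2d(x)
    if classifier.predict(x)[0] != 0:            # predicted meshing failure (-1) or divergence (-2)
        return penalty
    mean, std = model.predict(x, return_std=True)
    return float(expected_improvement(mean[0], std[0], y_best))

# Hypothetical usage: 'model' and 'clf' are the objects trained inside the MLAO loop,
# 'y_best' is the best robust lift measure observed so far.
# score = q_artificial(candidate, model, clf, y_best=4.9)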


Table 2 SCGA settings

size   tournamentSize   maxEvaluations   elitism   mutRate   probability
200    3                50,000           1         0.05      [3,1,...,1]/16

5 Optimisation Setup

As a first step of the MLAO process, an initial design is generated. In this case, an initial DOE of 200 individuals, composed of the 5 baseline configurations and 195 randomly generated configurations, has been evaluated. Then, new random individuals have been generated until a total of 200 cases with successful grid generation were obtained. The cases with failures in the grid generation are computationally inexpensive and are hence ignored in the evaluation count. The R CEGO package [26] is used to train the Kriging model, while the R ranger package [25] is used to train the classifier. The optimisation has been stopped after 325 function evaluations (grid generation errors excluded), so as to assign 60% and 40% of the total computational budget to the DOE and to the optimisation, respectively. The new point selection step is entrusted to SCGA (also available as an R package). Table 2 reports the list of the adopted SCGA settings. In particular, size indicates the population size, mutRate indicates the mutation rate, and probability indicates the probability of a design variable being selected by the operators. The remaining parameters of SCGA have been left at their default values [5].

6 Results

The most relevant information coming from the obtained results is presented and discussed here. Firstly, the convergence of the optimisation process is shown in Fig. 4. An important consideration from this figure is that achieving flow field convergence is a harsh problem for the SU2 flow solver. In particular, a considerable part of the DOE (21%) reported a convergence error in at least one simulation. Moreover, the largest part of the solutions randomly generated in the DOE (47%) proposed a multi-element airfoil that prevented the generation of the computational grid. Hence, only 31% of the cases converged in all three analyses performed. These results indicate how strongly the constraints shrink the feasible search space. It is also worth noting that, in the initial design, only one randomly generated candidate performs comparably to the baseline solutions (the first five in Fig. 4). This underlines the difficulty of the investigated optimisation problem. When the evaluation of the randomly generated points is terminated, the optimisation process starts. Intuitively, in the very first iterations the model's landscape attracts the optimiser towards the region of the best solution found so far, so the first points suggested are all characterised by the double slotted flap of type 1 (DS1) topology.


Fig. 4 Solutions evaluated during the optimisation process. The filled dots depict the true value given by the CFD analyses. The crosses and plus symbols indicate an artificial value assigned to the solutions for which it was impossible to create the grid or for which the CFD analyses did not converge. The solid black line separates the solutions composing the DOE from the ones proposed by the algorithm. The dashed red line indicates the best solution found in the initial design. Finally, the dashed blue line marks the reference solution obtained using the deterministic optimum found in [8]

Then, the concentration of points in that region and the consequent decrease of the model's uncertainty lead the search into a more exploratory optimisation phase. Configurations with different flap types alternate until a new best solution with the triple slotted flap configuration (TS) is found. From that point on, the search mainly focuses on exploiting this region of the search space. It must be highlighted that the triple slotted flap is the most challenging configuration to design, because all the variables are active. Another interesting observation regards the number of unfeasible solutions found in the optimisation process. The largest part (67%) of the suggested points has been correctly classified as feasible. Notably, only 10% of the solutions led to an error in the computational grid generation. The remaining 26% reported a convergence error in at least one simulation. These results testify to the overall high quality of the trained classifiers and show that it is relatively easy to identify configurations with unfeasible geometry, compared to the ones reporting CFD convergence anomalies. This behaviour was expected since the flow conditions to be analysed (high angles-of-attack at stall conditions) are characterised by separated flows, thus representing an arduous task for any numerical flow solver.


Fig. 5 Comparison of the triple slotted robust optimum airfoil (darkblue solid line), the double slotted deterministic optimum (darkred solid line), and the 30P-30N airfoil (black solid line)

Table 3 Comparison of the C̃l and Cl obtained for the MD 30P-30N, the baseline TS, the deterministic, and robust optimum airfoils

Airfoil                  C̃l       Cl(21.29°)   Cl(22.645°)   Cl(24°)
MD 30P-30N               4.376     4.345        4.379         4.396
Baseline TS              4.471     4.529        4.479         4.379
Deterministic optimum    4.899     5.048        4.973         4.451
Robust optimum           5.052     5.152        5.063         4.905

In Fig. 5, the obtained robust optimum airfoil is compared with the deterministic optimum found in [8] and with the baseline MD 30P-30N airfoil. The slat of the proposed airfoil has barely moved with respect to the baseline, contrary to the deterministic optimum, which has enlarged the gap with the main body. Looking at the optimal HLD resulting from this process, one can see how it outperforms the deterministic optimum found in [8] (see Table 3). Notably, this is true both for C̃l and for Cl(21.29°), which was the objective of [8]. This important achievement validates the MLAO framework and confirms the superiority of methods based on data-driven models over plain optimisation routines when very limited computational budgets are available. The lift curves of each airfoil are provided in Fig. 6 to compare their performance. Both optimum airfoils provide a higher Cl than the baseline 30P-30N airfoil. Comparing the robust optimum triple slotted airfoil and its baseline (TS) airfoil, an increase of Cl is observed over the complete range of angle-of-attack. In addition, for both airfoils the maximum lift coefficient (Clmax) is achieved at 20°, with the Clmax of the robustly optimised airfoil being 9.8% higher. Moreover, the performance of the deterministic and robust optimum airfoils must be compared. The maximum lift of the deterministic optimum airfoil is achieved at 21.29°, since this was the α at which the deterministic optimisation was performed. However, it is observed that the Cl at 21.29° of the robustly optimised airfoil is higher than that provided by the deterministically optimised airfoil.

Fig. 6 Cl vs. α curves for the 30P-30N airfoil (solid line with filled circle), TS baseline airfoil (green solid line with green filled square), deterministic optimum airfoil (red solid line with red filled triangle) found in [8], and robust optimum airfoil (blue solid line with blue filled diamond)

Thus, this demonstrates that the deterministic optimisation gave a sub-optimal solution. In particular, the robust airfoil has a maximum lift coefficient 2.7% higher. Furthermore, the C̃l values for the MD 30P-30N, the baseline TS, and the robust and deterministic optimum airfoils are given in Table 3. Regarding the measure of robustness, C̃l, the optimal airfoil found in this work improves the performance of the deterministic optimum and of the baseline MD 30P-30N by 3.1% and 15.4%, respectively. Finally, the pressure coefficient (Cp) flow field at 24° for the robust and deterministic airfoils is given in Fig. 7. The streamtraces provided for the deterministic optimum airfoil flow field show that there is a vortex over the slat of the airfoil. Although both airfoils are past the maximum lift condition at 24°, the presence of the vortex causes a larger decrease of the lift coefficient in the deterministic optimum. Hence, by performing the robust optimisation this behaviour is prevented, and the stall and post-stall performances are improved.

7 Conclusions and Future Work

In this paper, the problem investigated in [8] is further analysed considering the angle-of-attack as an uncertain parameter. To cope with the consequent growth of the computational effort, a completely novel automated Machine Learning Assisted Optimisation framework able to cope with configurational decisions has been developed. The integration of these decisions, usually taken a priori, could be an important resource for the multidisciplinary design optimisation field. The potential of the method has been shown with the application to the High-Lift Device design problem, a complex problem that presents many difficulties in itself. Among the many, the most challenging ones are the considerable computational cost and the demanding constraints.


Fig. 7 Pressure coefficient Cp flow field at 24° for the robust and deterministic optimum airfoils. The black lines are the streamtraces. (a) Robust optimum airfoil. (b) Deterministic optimum airfoil

Moreover, the numerical flow simulation capability represented a harsh limitation, caused by the critical conditions under which the airfoil operates. An automatic estimation routine, consisting of an aerodynamic computational chain based on the SU2 solver, has been adopted to cope with these problems. To overcome the limitation on the computational resources, alongside the optimiser SCGA, the presented Machine Learning Assisted Optimisation makes use of data-driven models (Kriging, Random Forest) and Simpson quadrature. The objective of the optimisation was to find the topology, for the multi-element airfoil based on the McDonnell Douglas 30P-30N airfoil, which maximises a robust measure of the lift coefficient. The employed optimisation approach has identified a solution improving the baseline performance by ∼15.4% with a moderate computational effort. For future research, many further improvements are possible. More effective stopping and convergence criteria for the CFD analysis may be used; these would enhance the entire process by increasing the percentage of successful evaluations. A different quadrature approach, able to estimate the mean and the deviation of a distribution, might be used to maximise the expected performance while keeping the deviation within a limiting interval. A rigorous analysis of the model and classifier performances should also be carried out. On the one hand, this analysis could ensure that shortcomings in these components of the Machine Learning Assisted Optimisation framework do not compromise the execution of the overall process. On the other hand, it should be investigated whether different approaches are more suitable for the presented problem. From an optimisation standpoint, the peculiarities of the SCGA operators might be further exploited by gathering the variables describing the settings of each element, as done in [6]. As a step forward, the design problem might be extended to include the shape of each element as well.


However, this would lead to a significant increase in the number of variables and in the problem complexity. Furthermore, the optimisation could turn into a multi-objective optimisation process considering different performance indicators, such as the weight or the structural complexity. In addition, the drag coefficient might be kept under control to satisfy the requirements imposed during the take-off and climbing phases.

Acknowledgments This research has been developed with the partial support of the H2020-MSCA-ITN UTOPIAE, grant agreement number 722734.

References

1. Atkinson, K.E.: An Introduction to Numerical Analysis. Wiley, Hoboken (2008)
2. Chin, V., Peters, D., Spaid, F., Mcghee, R.: Flowfield measurements about a multi-element airfoil at high Reynolds numbers. In: 23rd Fluid Dynamics, Plasmadynamics, and Lasers Conference, p. 3137 (1993)
3. Economon, T.D., Palacios, F., Copeland, S.R., Lukaczyk, T.W., Alonso, J.J.: SU2: an open-source suite for multiphysics simulation and design. AIAA J. 54(3), 828–846 (2016). https://doi.org/10.2514/1.J053813
4. Forrester, A., Sobester, A., Keane, A.: Engineering Design via Surrogate Modelling: A Practical Guide. Wiley, Hoboken (2008)
5. Gentile, L.: LorenzoGentile/SCGA: SCGA second release. Update (Jan 2020). https://doi.org/10.5281/zenodo.3627555
6. Gentile, L., Greco, C., Minisci, E., Bartz-Beielstein, T., Vasile, M.: An optimization approach for designing optimal tracking campaigns for low-resources deep-space missions. In: 70th International Astronautical Congress (2019)
7. Gentile, L., Greco, C., Minisci, E., Bartz-Beielstein, T., Vasile, M.: Structured-chromosome GA optimisation for satellite tracking. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 1955–1963 (2019)
8. Gentile, L., Morales, E., Quagliarella, D., Minisci, E., Bartz-Beielstein, T., Tognaccini, R.: High-lift devices topology optimisation using structured-chromosome genetic algorithm. In: 2020 IEEE Congress on Evolutionary Computation (CEC), pp. 497–505. IEEE, Piscataway (2020)
9. Gentile, L., Zaefferer, M., Giugliano, D., Chen, H., Bartz-Beielstein, T., Vasile, M.: Surrogate assisted optimization of particle reinforced metal matrix composites. In: Proceedings of the Genetic and Evolutionary Computation Conference (2018)
10. Greco, C., Gentile, L., Filippi, G., Minisci, E., Vasile, M., Bartz-Beielstein, T.: Autonomous generation of observation schedules for tracking satellites with structured-chromosome GA optimisation. In: 2019 IEEE Congress on Evolutionary Computation (CEC), pp. 497–505. IEEE, Piscataway (2019)
11. Ho, T.K.: Random decision forests. In: Proceedings of 3rd International Conference on Document Analysis and Recognition, vol. 1, pp. 278–282. IEEE, Piscataway (1995)
12. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 832–844 (1998)
13. Horn, D., Stork, J., Schüßler, N.J., Zaefferer, M.: Surrogates for hierarchical search spaces: the wedge-kernel and an automated analysis. In: López-Ibáñez, M. (ed.) Proceedings of the Genetic and Evolutionary Computation Conference – GECCO'19, pp. 916–924. ACM, Prague, Czech Republic (2019). https://doi.org/10.1145/3321707.3321765


14. Iannelli, P., Moens, F., Minervino, M., Ponza, R., Benini, E.: Comparison of optimization strategies for high-lift design. J. Aircraft 54(2), 642–658 (2017)
15. Iannelli, P., Quagliarella, D.: Multi-objective/multi-point shape and setting high-lift system optimization by means of genetic algorithm and 2D Navier–Stokes equations. In: EUROGEN 2011 Conference Proceedings, Capua (2011)
16. Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive black-box functions. J. Global Optim. 13(4), 455–492 (1998)
17. Klausmeyer, S.M., Lin, J.C.: Comparative Results from a CFD Challenge Over a 2D Three-Element High-Lift Airfoil (1997)
18. Metropolis, N., Ulam, S.: The Monte Carlo method. J. Amer. Statist. Assoc. 44(247), 335–341 (1949)
19. Mockus, J., Tiesis, V., Zilinskas, A.: The application of Bayesian methods for seeking the extremum. In: Towards Global Optimization 2, 117–129 (1978)
20. Moens, F., Wervaecke, C.: Multi-point optimization of shapes and settings of high-lift system by means of evolutionary algorithm and Navier–Stokes equations. Eng. Comput. 30, 601–622 (2013)
21. Rudnik, R., Geyr, H.: The European high lift project EUROLIFT II – objectives, approach, and structure. In: 25th AIAA Applied Aerodynamics Conference, p. 4296 (2007)
22. Rudolph, P.K.: High-lift systems on commercial subsonic airliners. Technical Report, National Aeronautics and Space Administration (NASA) (1996)
23. Smith, A.M.O.: High-lift aerodynamics. J. Aircraft 12(6), 501–530 (1975)
24. Van Dam, C.: The aerodynamic design of multi-element high-lift systems for transport airplanes. Progress Aerosp. Sci. 38(2), 101–144 (2002)
25. Wright, M., Ziegler, A.: ranger: a fast implementation of random forests for high dimensional data in C++ and R. J. Statist. Softw. 77(1), 1–17 (2017). https://doi.org/10.18637/jss.v077.i01
26. Zaefferer, M., Stork, J., Friese, M., Fischbach, A., Naujoks, B., Bartz-Beielstein, T.: Efficient global optimization for combinatorial problems. In: Proceedings of the 2014 Conference on Genetic and Evolutionary Computation (GECCO'14), pp. 871–878. ACM, New York (2014). http://doi.acm.org/10.1145/2576768.2598282

Network Resilience Optimisation of Complex Systems

Gianluca Filippi and Massimiliano Vasile

1 Introduction

The design of a Complex Engineered System (CEdS) is a multi-disciplinary problem, and it requires the collaboration of a large number of experts with different backgrounds. Moreover, particularly in the field of space engineering, the whole design process lasts for many years. For these reasons, lack of knowledge, conflicting opinions and subjective probability statements always heavily impact the process. By studying past accidents in order to develop new approaches for the risk reduction of CEdSs, it has been recognised [9] that there exists a common pattern that usually brings a CEdS to fail, with possibly catastrophic consequences. This pattern includes: production pressure that erodes safety margins and exposes the system to risky scenarios, the habit of taking past successes as a reason for confidence in future designs, fragmented problem solving, and problems of communication within the organisation. The aim of this paper is to propose a new system engineering approach for the design of CEdSs that goes in the direction of solving the listed problems. We use a graph representation to model the CEdS and the interaction of its subsystems and components under uncertainty [3]. This approach provides a holistic and coherent view of the entire system and design process, as well as simplifying the communication between the different actors involved. Traditional methods based on the estimation of margins and/or statistical moments cannot successfully model imprecision, which is particularly influential in the early phases of the design process.


The implementation of Imprecise Probability theories within the design methodologies represents a substantial step towards solving this problem. The Dempster–Shafer theory of evidence is applied in this paper. First, all the Quantities of Interest (QoIs) are identified and, based on them, the global indicators of performance f and functionality c are defined. For the latter, we are interested in particular in quantifying the resilience ρ. The evolution of the system's state (which quantifies the resilience) is modelled with the use of Bifurcation Theory. Indeed, it captures the continuous transitions between fully functioning and degraded states, as well as the occurrence of disruptions and shocks that perturb the system. Such a model can also easily describe qualitative (or topological) changes in the evolution of the system state due to uncertainty. The optimisation for resilience and the propagation of uncertainty within the optimisation process reduce the possibility of underestimating the risk of the possible scenarios. The proposed resilience approach combines the concepts of robustness and reliability [10]. A solution of the optimisation problem is robust if it minimises the negative effect of uncertainty on the objective function f. Correspondingly, from an engineering point of view, a system design is robust if the influence of uncertainty on the performance function described by f is minimised. Reliability is instead a quantity related to the constraint function c and to the functionality that c quantifies, and it measures the likelihood that the item will perform its intended function for a specified time interval under stated conditions.

2 Evidence Theory as Uncertainty Framework

The Theory of Evidence introduced by Dempster and Shafer is a generalisation of probability theory. The latter requires the specification of a probability space $(\mathcal{S}, \mathbb{S}, p)$: a triple with $\mathcal{S}$ the set that contains everything that could occur in the particular universe under consideration, $\mathbb{S}$ a suitably restricted set of subsets of $\mathcal{S}$, and $p$ the function that defines the probability of the elements of $\mathbb{S}$. Evidence Theory is instead defined by the triple $(\Omega, \Psi, m)$. The frame of discernment $\Omega$ is the set of all the mutually exclusive and collectively exhaustive elementary events (or hypotheses) $\theta_i$, $i = 1, \dots, |\Omega|$:

$\Omega = \{\theta_1, \theta_2, \dots, \theta_i, \dots, \theta_{|\Omega|}\}$   (1)

All the possible events (or hypotheses) could be overlapping or nested, but in the frame of discernment only the finest division of them is considered. From the frame of discernment, one can define the power set $2^\Omega = (\Omega, \cup)$ by considering all possible combinations of the elements of $\Omega$:

$\Psi = 2^\Omega = \left\{\emptyset, \{\theta_1\}, \dots, \{\theta_{|\Omega|}\}, \{\theta_1, \theta_2\}, \dots, \{\theta_1, \theta_2, \dots, \theta_i\}, \dots, \Omega\right\}$   (2)

where the generic element $\omega = \{\theta_1, \dots, \theta_j\}$ of $\Psi = 2^\Omega$ is a proposition that states the truth of only one of the events $\theta_1, \dots, \theta_j$ without specifying which one. The degree of belief, or evidence, is quantified by the basic probability assignment (bpa) that assigns a value $m \in [0, 1]$ to each subset of $\Psi$:

$m : 2^\Omega \rightarrow [0, 1]$   (3)

where the function $m$ has to satisfy the following conditions:

$m(\omega) \geq 0, \quad \forall \omega \in \Psi$   (4)

$m(\omega) = 0, \quad \forall \omega \notin \Psi$   (5)

$m(\emptyset) = 0$   (6)

$\sum_{A \in 2^\Omega} m(A) = 1$   (7)

Each subset of the power set $2^\Omega$ with a non-zero bpa is called a Focal Element (FE), and the pair $(\mathcal{F}, m)$, where $\mathcal{F}$ is the set of all FEs and $m$ the corresponding bpas, is called Body of Evidence. The Theory of Evidence requires less restrictive statements about the likelihood than general probability theory. In particular, it involves the definition of two measures: Belief and Plausibility. For a given model of the Quantity of Interest (QoI) $f$ and the target set

$A = \{x \in \Omega \mid f(x) \in \Phi\}$,   (8)

belief and plausibility are defined as

$Bel(A) = \sum_{\omega_i \subseteq A} m(\omega_i)$   (9)

$Pl(A) = \sum_{\omega_i \cap A \neq \emptyset} m(\omega_i)$   (10)
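As an illustration of Eqs. (9) and (10) (not part of the original chapter), the following Python sketch builds a small body of evidence for two uncertain variables, each defined over intervals with an assigned bpa, and computes Belief and Plausibility of a target set A = {u : f(u) ≤ ν}. The intervals, bpa values and quantity of interest f are made-up assumptions, and the per-FE extrema are taken over box corners, which is only exact for monotone f (the chapter instead performs an optimisation over each FE).

```python
import itertools
import numpy as np

# Illustrative body of evidence: two uncertain variables, each defined over
# intervals with an associated basic probability assignment (bpa).
# Intervals and bpa values below are made-up numbers, not from the chapter.
intervals = [
    [((0.0, 0.5), 0.6), ((0.5, 1.0), 0.4)],          # variable u1
    [((0.0, 1.0), 0.3), ((1.0, 2.0), 0.7)],          # variable u2
]

def focal_elements(intervals):
    """Cartesian product of the per-variable intervals: each combination is a
    focal element (FE) whose bpa is the product of the per-variable bpas."""
    for combo in itertools.product(*intervals):
        boxes = [c[0] for c in combo]
        m = np.prod([c[1] for c in combo])
        yield boxes, m

def f(u):
    # Illustrative quantity of interest (monotone, so corner sampling is exact).
    return u[0] + 0.5 * u[1]

def bel_pl(nu):
    """Belief and Plausibility of the set A = {u : f(u) <= nu}.
    Per Eqs. (9)-(10): Bel sums the bpa of FEs entirely inside A (max f <= nu),
    Pl sums the bpa of FEs intersecting A (min f <= nu)."""
    bel, pl = 0.0, 0.0
    for boxes, m in focal_elements(intervals):
        vals = [f(c) for c in itertools.product(*boxes)]
        if max(vals) <= nu:
            bel += m
        if min(vals) <= nu:
            pl += m
    return bel, pl

if __name__ == "__main__":
    for nu in (0.4, 0.9, 1.6):
        print(nu, bel_pl(nu))
```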

3 System Network Model

First, the CEdS is defined as a network where each node $i \in [1, \dots, N]$ represents a specific subsystem or discipline. For the generic $i$-th node, we then consider the following measures. Among all the specified requirements for the system under design, the most important one is defined as the performance indicator $f_i$:

$f_i\!\left(d_i^f, d_{i,c}^f, u_i^f, u_{i,c}^f, \varphi_{I,i}^f, t\right) : D_i \times U_i \times T \subseteq \mathbb{R}^{m_i + n_i + 1} \rightarrow \mathbb{R},$   (11)

while the remaining ones are included in the set of functionality indicators $c_i$:

$c_i\!\left(d_i^c, d_{i,c}^c, u_i^c, u_{i,c}^c, \varphi_{I,i}^c, t\right) : D_i \times U_i \times T \subseteq \mathbb{R}^{m_i + n_i + 1} \rightarrow \mathbb{R}^k$   (12)

In Eqs. (11) and (12), $t \in T \subset \mathbb{R}$ is the time, $d_i$ and $u_i$ are the sets of design and uncertain variables used only within node $i$, and $d_{i,c}$ and $u_{i,c}$ are the sets of variables shared between node $i$ and other nodes. For ease of notation, we will omit the superscripts $f$ and $c$, writing $d_i, d_{i,c}, u_i, u_{i,c}, \varphi_{I,i}$, keeping in mind, however, that the couplings between nodes that define $f$ in Eq. (11) and $c$ in Eq. (12) are in general different. The global network indicators $f$ and $c$ finally arise from an emergent behaviour of the complex network. Considering a network with $N$ nodes, and representing with $F$ and $C$ two general, problem-specific operators, it is:

$f(d, u) = F_{i=1}^{N}\!\left[ f_i(d_i, d_{c,i}, u_i, u_{c,i}, \varphi_{I,i}, t) \right]$   (13)

$c(d, u) = C_{i=1}^{N}\!\left[ c_i(d_i, d_{c,i}, u_i, u_{c,i}, \varphi_{I,i}, t) \right]$   (14)

Finally, based on the distinction between coupling ($u_{c,i}$) and uncoupling ($u_i$) variables and on the uncertainty framework given in Sect. 2, it is possible to distinguish the FEs that belong to a single node (whose number is $N_{FE,i}^{u}$) from the FEs that influence more than one node (whose number is $N_{FE,i}^{c}$).

4 Complexity Reduction of Uncertainty Quantification

Using the framework summarised in Sect. 2, it is then possible to quantify uncertainty with the measures of belief and plausibility. Dempster–Shafer Theory (DST) has, however, a drawback due to the high computational cost of reconstructing the curves. Indeed, an optimisation (a maximisation for belief and a minimisation for plausibility) is required for each FE, because its worst- and best-case scenarios are counted in Eqs. (9) and (10). For example, for a problem with $m$ uncertain variables, each defined over $N_k$ intervals, the complexity is:

$N_{opt} = N_{FE} = \prod_{k=1}^{m} N_k$   (15)

The number of FEs, and consequently of optimisations, therefore grows exponentially with the problem dimension. In the following, three methods are presented to reduce the cost of uncertainty quantification when DST is used. They refer to the evaluation of the Belief curve of the performance function f in Eq. (11); the generalisation to Plausibility and to the function c in Eq. (12) is, however, immediate.
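As a purely illustrative order-of-magnitude example of Eq. (15) (the numbers are not taken from the test case): with $m = 10$ uncertain variables, each defined over $N_k = 3$ intervals, $N_{opt} = 3^{10} = 59{,}049$ optimisations would be required for a single Belief curve, and adding just two more variables already raises the count to $3^{12} = 531{,}441$.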

4.1 Network Decomposition

This method exploits the properties of the network representation illustrated in Sect. 3. We further suppose that the global indicator $f$ is a linear combination of the node functions $f_i$:

$f(d, u) = \sum_{i=1}^{N} f_i(d_i, d_{c,i}, u_i, u_{c,i}, \varphi_{I,i}, t)$   (16)

an assumption that holds true in many engineering problems. The decomposition algorithm aims at decoupling the subsystems over the uncertain variables in order to optimise only over a small subset of the FEs. The procedure requires the following steps:

1. Solution of the worst-case scenario problem $\max_{u \in U} f(u)$.
2. Maximisation over the coupled variables $u_{c,i}$ and computation of $Bel_{c,i}(A)$.
3. Maximisation over the uncoupled variables $u_i$.
4. Reconstruction of the approximation $\widetilde{Bel}(A)$.

A detailed analysis of the approach can be found in [5]. The overall cost is

$N_{opt} = N_s \sum_{i=1}^{m_u} N_{FE,i}^{u} + \sum_{i=1}^{m_c} N_{FE,i}^{c}$   (17)

where $N_s$ is the number of samples taken from the combination of points in all the belief curves of the coupling variables $Bel_{c,i}(A)$, $N_{FE,i}^{u}$ is the number of FEs of the uncoupled variables affecting only node $i$, and $N_{FE,i}^{c}$ is the number of FEs of the coupling variables that are also shared by node $i$.

4.2 Tree-Based Exploration

In this approach, the whole computation of Belief proceeds by building a tree that has at its root the whole uncertainty space, with the associated global worst-case optimisation solution, and at its distal leaves the whole set of FEs, each one with an associated maximum of the quantity of interest. The heuristic that drives how the tree is built and explored is key to a rapid convergence to the correct Belief and Plausibility values. The overall procedure follows these steps:

1. Solution of the worst-case scenario problem $\bar{f}_0 = \max_{u \in S_0} f(u)$, where $S_0 := U$;
2. $s$-decomposition of the uncertain space into $S_i^1 \cup S_i^2 \cup S_i^3 \cup \dots \cup S_i^s$, following the heuristic criterion;
3. Exploration of each subdivision $S_i^k$ by the optimiser to find the worst-case scenario in that macro-FE, $\max_{u \in S_i^k} f(u)$.

Points 2 and 3 are repeated recursively, where $i$ represents the iteration step, until a predefined level of accuracy or computational cost is reached. Other details of the algorithm can be found in [1].
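A minimal Python sketch of the recursive structure described above follows; it is not the algorithm of [1], only an illustration. A generic black-box f, box-shaped subsets, a split-along-the-widest-dimension heuristic and random sampling in place of the per-subset worst-case optimiser are all assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def worst_case(f, box, n_samples=200):
    """Crude stand-in for the per-subset worst-case optimisation
    max_{u in S} f(u): random sampling inside the box."""
    lo, hi = np.array(box).T
    u = rng.uniform(lo, hi, size=(n_samples, len(lo)))
    return float(np.max([f(x) for x in u]))

def split_widest(box):
    """Heuristic s-decomposition: bisect the box along its widest dimension."""
    lo, hi = np.array(box).T
    k = int(np.argmax(hi - lo))
    mid = 0.5 * (lo[k] + hi[k])
    left, right = [list(b) for b in box], [list(b) for b in box]
    left[k][1], right[k][0] = mid, mid
    return [tuple(map(tuple, left)), tuple(map(tuple, right))]

def explore(f, box, budget, tol):
    """Recursively build the tree: the root is the whole uncertainty space,
    the leaves are subsets whose worst case is resolved within tolerance."""
    leaves, queue = [], [(box, worst_case(f, box))]
    while queue and budget > 0:
        parent, parent_max = queue.pop(0)
        children = [(c, worst_case(f, c)) for c in split_widest(parent)]
        budget -= len(children)
        for child, child_max in children:
            # Stop refining when the child's estimate is close to the parent's.
            if parent_max - child_max < tol:
                leaves.append((child, child_max))
            else:
                queue.append((child, child_max))
    return leaves + queue

if __name__ == "__main__":
    f = lambda u: np.sin(3 * u[0]) + u[1] ** 2       # illustrative QoI
    box = ((0.0, 1.0), (0.0, 1.0))                   # uncertainty space U
    for leaf, fmax in explore(f, box, budget=40, tol=0.05):
        print(leaf, round(fmax, 3))
```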

4.3 Combined Method

A combination of the two proposed methods is finally possible in order to further reduce the overall computational cost needed to evaluate the Belief curve. We assume here that only a particular value of the Belief is required, corresponding to a given threshold $\nu$: $Bel(f \leq \nu)$. The approach starts by applying the tree-based algorithm. The heuristic at point 2 in Sect. 4.2 is based on a measure of the variance of the maxima of $f$ over the FEs. In this way, the uncertain space is subdivided so as to maximise the likelihood of obtaining some of the maxima below $\nu$. Such subsets $S_i^k$ are then not decomposed any further, because their probability mass $m$ contributes entirely to $Bel(f \leq \nu)$. The process is iterated until the variance drops below a specified threshold value. At this point, the sets of coupling and uncoupling uncertain variables are updated by removing all the FEs already evaluated and all the FEs whose contribution is known to be included in the belief measure. The network decomposition of Sect. 4.1 is finally applied. The use of the tree-based algorithm during the first step of the method has two major effects. First, it reduces the number of FEs that need to be explored in the following step. Furthermore, this reduction has an important impact on the network topology: the updated graph is likely to have a reduced number of links, resulting in a non-linear reduction of the number of operations needed to decompose the system. In particular, if the updated network turns out to be disconnected, the decomposition approach can also be applied in parallel within each network component, with a further cost reduction.


5 Optimisation Approach

A general formulation of the constrained optimisation under uncertainty is

$\min_{d \in D} \phi(d) \quad \text{s.t.} \quad \gamma_j(d) \leq 0$   (18)

where $\phi(d)$ and $\gamma(d)$ represent the general quantification of uncertainty on the functions $f$ and $c$, respectively. In particular, for this quantification we choose to use DST, which translates Eq. (18) into the following:

$\max_{d \in D} Bel(f(d,u) \leq \nu_f) \quad \text{s.t.} \quad Bel(c(d,u) \leq \nu_c) > 1 - \epsilon$   (19)

and we want to solve Eq. (19) for fixed $\nu_f$, $\nu_c$ and $\epsilon$. The method presented in Sect. 4.3 is used for the approximation of Belief. To further reduce the computational cost of the design process, an Efficient Global Optimisation (EGO) approach is also applied. The interested reader can find more information about EGO in [6]. An archive of design vectors $A_d$ is first generated and, for each $\hat{d} \in A_d$, $Bel(f(\hat{d},u) \leq \nu_f)$ and $Bel(c(\hat{d},u) \leq \nu_c)$ are evaluated using the complexity reduction technique presented in Sect. 4.3:

$\hat{d} \rightarrow Bel(f(\hat{d},u) \leq \nu_f), \qquad \hat{d} \rightarrow Bel(c(\hat{d},u) \leq \nu_c)$   (20)

The acquired information is used to initialise the surrogate models $S_f$ and $S_c$ for $f$ and $c$, respectively. The following two steps are then iterated until convergence. A constrained maximisation of the surrogates $S_f$ and $S_c$ over the design space $D$ is performed:

$\max_{d \in D} S_f(d) \quad \text{s.t.} \quad S_c(d) \geq 1 - \epsilon$   (21)

The design vector $d^*$ solution of Eq. (21) is added to the archive $A_d$, $Bel(f(d^*,u) \leq \nu_f)$ and $Bel(c(d^*,u) \leq \nu_c)$ are evaluated, and the surrogates $S_f$ and $S_c$ are updated. All the design solutions in the archive $A_d$ are finally cross-checked with the approach in Sect. 4.3 and the best solution is selected.
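The archive-based loop described above can be sketched as follows. This is only a schematic illustration: the functions bel_f and bel_c stand in for the expensive evidence-based evaluations of Eq. (20), the surrogate is a simple inverse-distance model rather than the one used by the authors, and the surrogate-level problem of Eq. (21) is solved by brute-force random search.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholders (illustrative only) for Bel(f(d,u) <= nu_f) and Bel(c(d,u) <= nu_c).
def bel_f(d):
    return float(np.exp(-np.sum((d - 0.3) ** 2)))

def bel_c(d):
    return float(np.exp(-np.sum((d - 0.5) ** 2)))

def fit_surrogate(D, y):
    """Very simple inverse-distance surrogate standing in for S_f and S_c."""
    D, y = np.asarray(D), np.asarray(y)
    def predict(x):
        w = 1.0 / (np.linalg.norm(D - x, axis=1) + 1e-9)
        return float(np.sum(w * y) / np.sum(w))
    return predict

def constrained_max(S_f, S_c, eps, n_candidates=2000, dim=2):
    """Surrogate-level problem of Eq. (21): max S_f(d) s.t. S_c(d) >= 1 - eps,
    solved here by random search over the design space D = [0, 1]^dim."""
    best_d, best_v = None, -np.inf
    for d in rng.uniform(0.0, 1.0, size=(n_candidates, dim)):
        if S_c(d) >= 1.0 - eps and S_f(d) > best_v:
            best_d, best_v = d, S_f(d)
    return best_d

def ego_loop(eps=0.5, n_init=8, n_iter=10, dim=2):
    archive = list(rng.uniform(0.0, 1.0, size=(n_init, dim)))     # archive A_d
    yf = [bel_f(d) for d in archive]
    yc = [bel_c(d) for d in archive]
    for _ in range(n_iter):
        S_f, S_c = fit_surrogate(archive, yf), fit_surrogate(archive, yc)
        d_star = constrained_max(S_f, S_c, eps, dim=dim)
        if d_star is None:
            break
        # Evaluate the new candidate exactly and update archive and surrogates.
        archive.append(d_star)
        yf.append(bel_f(d_star))
        yc.append(bel_c(d_star))
    # Final cross-check: pick the best feasible archived design.
    feasible = [i for i, c in enumerate(yc) if c >= 1.0 - eps]
    i_best = max(feasible, key=lambda i: yf[i])
    return archive[i_best], yf[i_best]

if __name__ == "__main__":
    d_best, bel_best = ego_loop()
    print("best design:", d_best, "Bel_f:", bel_best)
```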


6 Resilience Framework

In the optimisation approach described in Eq. (19), the quantities $f$ and $c$ are general indicators of system performance and functionality, respectively. As previously stated, we are interested in the global resilience of the CEdS. Resilience is a functional indicator of the system and can therefore be incorporated within $c$ in Eq. (19). In analogy with Eqs. (11)–(14), the state indicator of each $i$-th node (subsystem) is defined by $x_i$:

$x_i(d_i, d_{c,i}, u_i, u_{c,i}, \varphi_{I,i}, t) : D_i \times U_i \subseteq \mathbb{R}^{m+n} \rightarrow \mathbb{R}$   (22)

which is the solution of the dynamical model given by the Ordinary Differential Equation (ODE):

$\dot{x}_i = \chi_i(x_i, \beta_i) + \sum_{j=1}^{N} a_{x,ij}\, \psi_i(x_i, x_j, \beta_{ij})$   (23)

where $a_{x,ij} \in A_x$, with $A_x$ the adjacency matrix, $\chi$ describes the self-dynamics, $\psi$ the coupled dynamics between nodes, and

$\begin{aligned}
\beta_i(d_i, d_{c,i}, u_i, u_{c,i}) &: D_i \times U_i \subseteq \mathbb{R}^{m+n} \rightarrow \mathbb{R} \\
\beta_{ij}(d_i, d_{c,i}, u_i, u_{c,i}) &: D_i \times U_i \subseteq \mathbb{R}^{m+n} \rightarrow \mathbb{R} \\
x_0 = x(d_i, d_{c,i}, u_i, u_{c,i})|_{t=t_0} &: D_i \times U_i \subseteq \mathbb{R}^{m+n} \rightarrow \mathbb{R}.
\end{aligned}$   (24)

In Eq. (23), Bifurcation Theory [2, 7] is used to properly model the state $x_i$, where $\beta$ is called the bifurcation parameter and is responsible for the possible switch of the node's state to different dynamical regimes. Smooth transitions involve a continuous change in the steady state of the system until the bifurcation value is crossed, giving rise to a second-order phase transition. Catastrophic transitions involve a discontinuity of the steady state at the bifurcation value, giving rise to first-order phase transitions. The resilience of node $i$ is defined as the cumulative quantity

$\rho_i = \int_{t_0}^{t_e} x_i \, dt$   (25)

$\rho = R_{i=1}^{N}\!\left[\rho_i\right]$   (26)

where $\rho = 1$ indicates a fully functioning system and $\rho = 0$ a system with a non-recoverable failure. This description of resilience based on the dynamics of the states $x_i$ incorporates the concept of reliability, but it is also something more. Reliability is included in the model because the global functionality given by the interactions between the $x_i$ is considered. However, the quantity $\rho$ takes into account both the risk of undesired scenarios due to uncertainty and the recovery process after a scenario has happened. We consider the risk to be a combination of the system's fragility and vulnerability, the former representing the ability of the system to avoid different uncertain scenarios and the latter the quantification of the loss on the system state. Finally, the use of bifurcation theory also allows modelling a continuous recovery after the loss, which can bring the system back to the same state it had before the shock, or to a lower or higher one. A further novelty introduced is the use of imprecision in the resilience quantification. This translates into the use of the lower likelihood measure, due to the ignorance affecting the problem.
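To make the state model of Eqs. (22)–(26) concrete, the sketch below integrates an illustrative two-node network with linear self- and coupling dynamics, a temporary shock applied through the bifurcation-like parameter β of one node, forward-Euler time stepping, and node resiliences aggregated by a simple average. All dynamics, parameters and the aggregation operator R are assumptions for illustration only, not the models used in the chapter.

```python
import numpy as np

def simulate(t0=0.0, te=10.0, dt=0.01):
    """Integrate coupled node dynamics in the spirit of Eq. (23) with forward
    Euler, and accumulate rho_i = int x_i dt (Eq. (25)), normalised here by the
    mission time so that a fully functioning node gives rho_i = 1."""
    A = np.array([[0.0, 1.0],
                  [1.0, 0.0]])          # adjacency matrix A_x (2 nodes)
    x = np.ones(2)                      # both nodes start fully functioning
    rho = np.zeros(2)

    def chi(x_i, beta_i):
        # Self-dynamics: relax towards the regime selected by beta_i.
        return -1.0 * (x_i - beta_i)

    def psi(x_i, x_j, beta_ij):
        # Coupled dynamics: weak pull towards the neighbour's state.
        return beta_ij * (x_j - x_i)

    for t in np.arange(t0, te, dt):
        # Illustrative shock on node 0 between t = 3 and t = 5: the
        # bifurcation-like parameter drops, then recovers.
        beta = np.array([0.2 if 3.0 <= t <= 5.0 else 1.0, 1.0])
        dx = np.array([
            chi(x[i], beta[i]) + sum(A[i, j] * psi(x[i], x[j], 0.1)
                                     for j in range(2))
            for i in range(2)
        ])
        x = x + dt * dx
        rho += x * dt
    rho /= (te - t0)                    # normalised node resilience
    return rho, float(np.mean(rho))     # global resilience via an average (cf. Eq. (26))

if __name__ == "__main__":
    rho_nodes, rho_global = simulate()
    print("node resilience:", rho_nodes, "global resilience:", rho_global)
```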

7 Application

The proposed method is applied to the design for resilience of an observational CubeSat in Low Earth Orbit (LEO). The goal of the mission is the detection of possible fires on the Earth within a belt centred at a latitude of 50 deg. The network representation is shown in Fig. 1. Each node has been populated with analytical models that the reader can find in [4, 8]. The optimisation problem has been formulated as:

$\max_{d \in D} Bel(\mathrm{Mass}(d, u) \leq \nu_f) \quad \text{s.t.} \quad Bel(\rho_{obdh}(d, u, t) \leq \nu_c) > 0.8$   (27)

The system performance is the overall mass $f = \mathrm{Mass} = \sum_i M_i$, which is treated as the objective function in the optimisation problem. The system functionality is the global network resilience $\rho$, which models the state of the On Board Data Handling (OBDH) node during the mission time (Fig. 1).

8 Results

The optimisation approach of Sect. 5 has been applied to the test case and the results are shown in Fig. 2. The memetic optimisation solver Multi-Population Adaptive Inflationary Differential Evolution Algorithm (MP-AIDEA) has been used, which has proven to be efficient and effective, on average, on a wide range of problems with mixed characteristics. Its parameters have been fixed as follows. The number of agents for each population $N_{pop}$ and the maximum number of function evaluations for each optimisation in the belief evaluation were set to, respectively, $N_{pop} = \max[5, n_D]$ and $n_{feval,max}^{belief} = 500\, n_U$, with $n_D$ and $n_U$ respectively the number of design and uncertain variables, while the whole process runs until convergence.


Fig. 1 Representation of the spacecraft as a complex system. The two quantities of interest are the overall mass M and the percentage of coverage PC for the payload

The dimension of the bubble for the global restart is $\delta_{global} = 0.1$, the number of populations is $n_{pop} = 2$ and the convergence threshold of Differential Evolution (DE) is $\rho = 0.25$. Figure 2a compares the effect of uncertainty on the system for two different design vector solutions. The blue curve corresponds to the worst-case optimum, while the red one to the evidence-based optimisation of Eq. (27). The former shows a better worst-case scenario; however, for the chosen threshold $\nu_f = 27.5$, the latter has a higher belief of satisfying the statement. Figure 2b refers instead to the resilience of the solution. In Eq. (27) a threshold of 0.8 has been applied to the Belief on the resilience. For such a value of belief the system is indeed able to recover after possible shocks due to uncertainty. However, the worst-case effect of the uncertainty is not recoverable, as shown by the red curve.


Fig. 2 (a) Belief curves: the red one corresponds to the solution of the min-max problem, the blue one to the design solution of the evidence-based optimisation. (b) Global network resilience for the design solution of the evidence-based optimisation: the blue curve corresponds to the threshold on the belief νc = 0.8, while the red one corresponds to the worst-case scenario on the uncertainty space

9 Conclusions

This paper presented a system engineering approach for the design optimisation of complex engineered systems. Severe uncertainty, lack of knowledge and subjective probability are important aspects to be considered during the design process. To properly model this epistemic uncertainty, the use of the Dempster–Shafer Theory of Evidence is suggested. An approach based on Efficient Global Optimisation for evidence-based design is proposed, in which a surrogate model is generated and updated during the optimisation in order to find the optimal design configuration. Three methods are further described for reducing the computational cost of uncertainty quantification with Evidence Theory. A framework for the quantification of the global network resilience as a function of the resilience of the single nodes is presented and finally integrated into the optimisation algorithm.

References

1. Absil, C.O., Vasile, M.: A variance-based estimation of the resilience indices in the preliminary design optimisation of engineering systems under epistemic uncertainty. In: EUROGEN 2017 (2017)
2. Benettin, G.: Una passeggiata tra i Sistemi Dinamici. Scuola Galileiana (2012)
3. Filippi, G., Vasile, M.: A memetic approach to the solution of constrained min-max problems. In: 2019 IEEE Congress on Evolutionary Computation (CEC 2019) - Proceedings, pp. 506–513 (2019). https://doi.org/10.1109/CEC.2019.8790124
4. Filippi, G., Vasile, M.: Global solution of constrained min-max problems with inflationary differential evolution. In: Minisci, E., Riccardi, A., Vasile, M. (eds.) Optimisation in Space Engineering (OSE). Springer, Berlin (2020)
5. Filippi, G., Vasile, M.: Introduction to evidence-based robust optimisation. In: Vasile, M. (ed.) Optimization Under Uncertainty with Applications to Aerospace Engineering. Springer Nature, Berlin (2020)
6. Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive black-box functions. J. Global Optim. 13, 455–492 (1998). https://doi.org/10.1023/a:1008306431147
7. Seydel, R.: Basic bifurcation phenomena. Computer 49(June), 1–14 (1999)
8. Wertz, J.R., Larson, W.J.: Space Mission Analysis and Design, Space Tech edn. Kluwer Academic Publishers, Dordrecht (1999)
9. Woods, D.: Creating foresight: How resilience engineering can transform NASA's approach to risky decision making. Work 4(2), 137–144 (2003)
10. Yao, W., Chen, X., Luo, W., van Tooren, M., Guo, J.: Review of uncertainty-based multidisciplinary design optimization methods for aerospace vehicles. Progress Aerosp. Sci. 47(6), 450–479 (2011). https://doi.org/10.1016/J.PAEROSCI.2011.05.001

Gaussian Processes for CVaR Approximation in Robust Aerodynamic Shape Design

Elisa Morales, Domenico Quagliarella, and Renato Tognaccini

1 Introduction

When working with numerical optimisation processes, especially for industrial applications, one inevitably comes across the need to verify that the obtained numerical solutions keep their promises when they enter the real world. This transfer presents at least two types of problems: the fidelity of the numerical model used for the performance forecast and the uncertainty about the actual conditions in which the product will have to operate. To this must be added the fact that the realisation of each industrial product involves a compromise between costs and manufacturing tolerances, which can introduce significant deviations from the nominal characteristics foreseen in the project. To these problems we try to respond with robust optimisation techniques that account for the sources of uncertainty in the design loop. The robust design approach proposed here is



based on the use of risk functions that the optimisation process must minimise. This approach is covered extensively in various sources, including [12, 14]. In particular, some risk functions have proved promising, namely value-at-risk (VaR) and conditional value-at-risk (CVaR), which originated in the field of financial engineering but have considerable potential also in other engineering fields. The disadvantage of using risk functions lies entirely in their nature as statistical estimators, in the sense that, to obtain a sufficiently significant estimate of the risk function, it is often necessary to carry out large samplings of the related quantity of interest (QoI). Therefore, if the QoI requires significant computational effort to evaluate, the total computational cost of robust optimisation can quickly become unsustainable. Consequently, limiting the number of samples necessary to obtain a reasonable estimation of the risk functions is a research topic of paramount importance. Various approaches are possible for this purpose, ranging from the introduction of sophisticated sampling strategies such as multilevel Monte Carlo [6, 8] or, more generally, importance sampling [10], to the use of advanced quadrature schemes [20], to the introduction of response surfaces or surrogate methods [9] or, again, approaches based on the use of the gradient. In this chapter, we explore the use of Gaussian processes (GP) for this purpose. Two elements characterise this work with respect to the current literature: (a) the approximation obtained with the GP is applied to evaluate risk functions such as CVaR, which give greater importance to the tails of the probability distribution compared to the first two standard statistical moments; (b) the approximation with the GPs aims at making the optimisation process more efficient and quick, and therefore we try to minimise the number of QoI evaluations with a very light training and update strategy of the GP, with the possibility of progressive refinements during the optimisation process. The application used to illustrate the potential of the method is the optimisation of the aerodynamic shape of an airfoil in the transonic regime. The QoI is the drag coefficient, $c_d$, which is calculated by solving the RANS equations. The work covers the following points: (a) a brief description of the approach to robust optimisation; (b) the use of GPs in this context; (c) the definition of the test problem and presentation of the results; (d) conclusions and considerations on subsequent developments.

2 Robust Design and CVaR Risk Function

In robust aerodynamic optimisation problems, the Quantity of Interest (QoI) is a statistical measure. The advantage of this type of problem is that it yields an optimal design less vulnerable to different sources of uncertainty (geometrical, operational, or epistemic). Here, the employed optimisation approach is based on risk measures [14], $R(Z)$, which depend on random variables $Z$ representing the uncertainty of the problem. These random variables can also depend on the design parameters $x$. Thus, the optimisation problem can be written as follows:

$\begin{aligned} \min_{x \in X \subseteq \mathbb{R}^n} \; & R_0(Z(x)) \\ \text{s. to:} \; & R_i(Z(x)) \leq c_i \quad i = 1, \dots, m \end{aligned}$   (1)

Classically, the adopted risk measures are based on the mean $\mu$, the variance $\sigma^2$ [11], or a combination of both. This kind of approach penalises any configuration far from the mean value. As an example, consider that the drag coefficient ($c_d$) has to be minimised. These measures will penalise, in the same manner, configurations that increase the drag and configurations that decrease it. However, it is of interest to penalise only the configurations leading to an increase of $c_d$, since configurations with a lower drag coefficient are desired. Therefore, risk measures that work asymmetrically shall be introduced. These are Value-at-Risk and Conditional Value-at-Risk [16, 17]. In the present work, only the latter will be studied. Let us define a random variable $Z$; the $\mathrm{CVaR}_\alpha$ can be considered as the conditional expectation of the losses that exceed $\mathrm{VaR}_\alpha$. Therefore, CVaR reads:

$\mathrm{CVaR}_\alpha = \dfrac{1}{1-\alpha}\displaystyle\int_\alpha^1 \nu^\beta \, d\beta$   (2)

where $\nu^\beta$, the VaR of $Z$ at the confidence level $\beta \in (0, 1)$, is defined as $\inf\{z \in \mathbb{R} : F_Z(z) \geq \beta\}$, that is, the inverse of the cumulative distribution function $F_Z$ of $Z$. The advantage of using CVaR instead of VaR is that the former is more sensitive to the shape of the upper tail of the cumulative distribution function (CDF). In addition, CVaR is a coherent risk measure [1]. Figure 1 shows how CVaR relates to the distribution function. The two filled areas, respectively in solid colour and in squares, are proportional to the CVaR calculated for the distribution that represents the robust optimum and for that related to the deterministic optimum. The robust optimum, characterised by a flatter distribution, is also the one with the lower CVaR.
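A minimal sketch (not from the chapter) of how VaR_α and CVaR_α are estimated from a finite sample of the QoI: the empirical α-quantile provides the VaR, and the mean of the samples at or beyond that level provides the CVaR.

```python
import numpy as np

def var_cvar(samples, alpha=0.9):
    """Empirical estimates of VaR_alpha and CVaR_alpha of a loss-like quantity
    (larger values are worse, e.g. the drag coefficient c_d)."""
    z = np.sort(np.asarray(samples, dtype=float))
    var = np.quantile(z, alpha)             # empirical VaR: alpha-quantile
    cvar = z[z >= var].mean()               # CVaR: mean of the upper tail
    return var, cvar

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cd_samples = 0.03 + 0.003 * rng.standard_normal(1000)   # illustrative c_d sample
    print(var_cvar(cd_samples, alpha=0.9))
```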

Fig. 1 Cumulative distribution function and related CVaR risk function at a given α threshold value (confidence level)

3 Risk Function Approximation

The work aims to introduce the use of surrogate models to reduce the computational load associated with the evaluation of the ECDF. However, there are some considerations regarding the accuracy and precision obtainable in estimating the parameters of a population. Before going into detail on the use of GPs for the approximation of risk functions, it is necessary to reflect on what it means to use approximation techniques for the calculation of statistical estimators. The main problem when we try to evaluate a statistical parameter is accuracy, intended as the proximity of the estimator obtained through a sample to the real population parameter. Accuracy depends on the number of samples available to calculate the statistics. To calculate it, however, it is necessary to know the actual value of the population parameter. Realistically, we can settle for considering a value obtained with a huge number of samples as the true parameter value. So we should proceed according to the following steps:

1. Compute the quantity of interest as many times as allowed by the computational budget.
2. Consider the obtained estimate as being the true value of the parameter.
3. Consider a smaller population (realistically, the smallest that one can afford in the experiments) and repeat the calculation of the estimator a high number of times with different samples (one could also randomly extract subsamples from the large initial population to avoid doing other calculations).
4. If we calculate the standard deviation or variance of the estimates obtained in this way, we get an estimate of the precision of the method, but not of the accuracy. If, instead, we replace the average value of the estimates with the true value, we get an estimate of the accuracy.

Passing to the surrogate model, very high precision can be obtained, because using huge populations costs nothing. Instead, what can happen is that there is a bias that is not easily estimable. Indeed, accuracy and bias can also be estimated when the true (or almost true) value is available. So, from this point of view, our goal is to reduce as much as possible the bias that we inevitably get using surrogate models. At the same time, it is essential not to exceed the number of samples needed to train the approximator (in our case the GPs), so as not to frustrate the advantages in terms of computational cost.

3.1 Gaussian Processes

The regression method based on Gaussian processes was used to derive an approximation of the empirical cumulative distribution function (ECDF) [19] for the QoI (the $c_d$ in these benchmark cases). After that, the statistics of interest, such as the mean and the standard deviation, are calculated from the approximation of the ECDF obtained with the GPs. The approach based on Gaussian processes is implemented here in a very simple way, without resorting to sophisticated techniques such as sparse Gaussian processes or adaptive sampling, and has, fundamentally, the role of establishing a basis of comparison for methodologies based on more sophisticated metamodels. For a thorough and exhaustive discussion of Gaussian processes we refer to the classic book of Rasmussen and Williams [15]. Let us briefly recall here that a Gaussian process defines a distribution of functions $p(f)$, with $f : X \rightarrow \mathbb{R}$, such that, by taking any finite number of random variable samples $\{z_1, \dots, z_n\} \subset Z$, the marginal distribution over that finite subset, $p(\mathbf{f}) = p(\{f(z_1), \dots, f(z_n)\})$, is a multivariate Gaussian probability distribution. The process is completely defined by specifying a mean function $\mu(z)$ and a covariance function, or kernel, $K(z_i, z_j; \theta)$, where $\theta$ is a vector of parameters that can be learned from the data to obtain regression. The covariance function used here has the following form [4, 5]:

$K(z_i, z_j; \theta) = \nu_1 \exp\!\left[-\dfrac{1}{2}\displaystyle\sum_{\ell=1}^{L} \dfrac{\left(z_i^{(\ell)} - z_j^{(\ell)}\right)^2}{r_\ell^2}\right] + \nu_2 + \delta_{ij}\, N(z_i; \theta)$   (3)

with $z^{(\ell)}$ the $\ell$-th component of vector $z$. The vector of hyperparameters is given by $\theta = \{\nu_1, \nu_2, r_1, \dots, r_L\}$ and $N$ defines the noise model.
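A compact numpy sketch of GP regression with a covariance of the form of Eq. (3) is given below: an anisotropic squared-exponential term with amplitude ν1 and length scales r_ℓ, a constant term ν2, and a diagonal noise term. Hyperparameters are fixed by hand here instead of being learned from the data, and the training data are illustrative.

```python
import numpy as np

def kernel(Zi, Zj, nu1, nu2, r):
    """Covariance of Eq. (3) without the noise term:
    nu1 * exp(-0.5 * sum_l (z_i^(l) - z_j^(l))^2 / r_l^2) + nu2,
    evaluated for all pairs of rows of Zi and Zj."""
    d2 = ((Zi[:, None, :] - Zj[None, :, :]) ** 2 / r[None, None, :] ** 2).sum(-1)
    return nu1 * np.exp(-0.5 * d2) + nu2

def gp_predict(Z_train, y_train, Z_test, nu1=1.0, nu2=0.1, r=None, noise=1e-6):
    """Standard zero-mean GP posterior mean: K_* (K + noise I)^-1 y."""
    if r is None:
        r = np.ones(Z_train.shape[1])
    K = kernel(Z_train, Z_train, nu1, nu2, r) + noise * np.eye(len(Z_train))
    K_star = kernel(Z_test, Z_train, nu1, nu2, r)
    return K_star @ np.linalg.solve(K, y_train)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.uniform(-1.0, 1.0, size=(12, 2))           # illustrative inputs
    y = np.sin(3 * Z[:, 0]) + 0.5 * Z[:, 1]            # illustrative QoI values
    Z_new = rng.uniform(-1.0, 1.0, size=(3, 2))
    print(gp_predict(Z, y, Z_new, r=np.array([0.5, 0.5])))
```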

3.2 Training Methodology

The training procedure followed to instruct the Gaussian processes to approximate risk functions is the characterising point of this work. The goal we aim to achieve is a convenient compromise between the effort required for training and the predictive ability of the Gaussian process. The starting point is a set of ECDFs calculated with a sufficiently large number of samples of the quantity of interest $q$. We will write that $q$ is a function of a vector $x \in X$ of deterministic parameters (design parameters) and a vector of random variables $u \in U$ that model the uncertainty:

$q_{ki} = q(x_k, u_i)$   (4)

For our purposes, we will choose and fix once and for all a congruous number of samples $u_i$ of the random vector, drawn according to the assigned joint distribution function, and we will always use this set when we want to calculate the distribution function of $q$ for an assigned deterministic input vector $x_k$. So, having fixed an assigned design vector $x_k$, we compute the set $Q_k = \{q(x_k, u_i),\ i = 1 \dots n\}$ obtained with the random sampling of $u$. This set of values constitutes our reference ECDF. Our goal is to build a Gaussian process capable of accurately approximating both the $q$ values of the training set and those not belonging to it. The fundamental difference compared to the classic approach is that, instead of trying to approximate the whole set of data globally, we focus on the single ECDF. In this way, we evaluate a tiny and predetermined sample of the input CDF, say $U^\sigma \subset U$, and we use this subset as a training set. We build the Gaussian process that approximates it, $\hat{q}_k = q_k^{GP}(u; x_k, U^\sigma)$, and the quality of the approximation depends on the training set $U^\sigma$, the choice of which is a combinatorial optimisation problem. If we define with

$\Delta(Q_k, \hat{q}_k)$   (5)

a function that evaluates the distance between the complete ECDF and the one obtained by the Gaussian process using the $U^\sigma$ training subset, for a given set size $m < n$, we have the following minimisation problem:

$\min_{U^\sigma \subset U} \Delta(Q_k, \hat{q}_k)$   (6)

with $U^\sigma = \{u_{\sigma_1}, \dots, u_{\sigma_m}\}$ and $\sigma_i \in \{1, \dots, n\}$ without repetitions. So it is a question of finding the $\sigma_i$ indices, each in the range $(1 \dots n)$ and without repetitions, that minimise the distance function $\Delta$. In this work, we use a genetic algorithm with an appropriate encoding of the bit string. However, the solution obtained has the disadvantage of not being general. That is, for an arbitrary ECDF $Q_p \neq Q_k$, the approximation obtained is not necessarily satisfactory. For this reason, it is necessary to expand the verification set by adding more ECDFs. In this case, the objective function becomes:

$\min_{U^\sigma \subset U} \sum_{k=1}^{T} \Delta(Q_k, \hat{q}_k)$   (7)

The critical point is that the set of random input vectors $u$ must be the same for each deterministic vector $x_k$. In this way, the optimal subset of $U^{\sigma\#}$ points corresponds to different values of $q$, because it also depends on the deterministic vector, and it will be the optimal choice for the approximation of the whole ECDF set. Therefore, once the optimal $U^{\sigma\#}$ subset is obtained, the computation of a new ECDF approximation requires the exact calculation of $q$ only for the values of this subset. So, excluding the GP training time, the computational cost of a new ECDF calculation is of the order of $O(m)$. If we increase the cardinality of the $U^\sigma$ training set, the time required to define the next optimal local GP also increases, and the combination of the GP training time increment and the $O(m)$ evaluations may render the GP usage no longer convenient. To remedy, at least partially, this drawback, an objective function can be defined which privileges the approximation of the ECDF in the areas essential for the variation of the risk function considered. In this work, the objective function introduced to this end is the weighted sum of the differences between both the ECDFs and the risk functions. The ideal size of the subset of samples used to approximate an ECDF must respond to a trade-off between the accuracy of the GP approximation and the computational cost necessary to evaluate the quantity of interest at the input values of the subset. It is certainly possible to introduce the satisfaction of this trade-off as a further goal of the training process but, at present, the size of the GP training subset is decided through empirical considerations and a process of trial and error. As a final remark, the need to keep the same sample for the input variables governing the uncertainty is certainly a disadvantage of the method, since it can introduce a bias in the approximation process. However, this problem is mitigated by the fact that the training samples, although they are all obtained with the same set of input variable samples, are relatively large, and the GP training process allows the selection of a small set of samples which, at least with regard to the original training set, minimises this bias effect.
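The subset-selection problem of Eqs. (6) and (7) can be sketched as follows. For brevity, a random search replaces the bit-string genetic algorithm used by the authors, a cheap inverse-distance interpolator stands in for the Gaussian process, and the ECDF distance Δ is taken as the maximum absolute difference between sorted samples; all of these are assumptions of the sketch, not the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def ecdf_distance(q_full, q_approx):
    """Distance between two ECDFs of the same size: max absolute difference
    between the sorted samples (an illustrative choice for Delta in Eq. (5))."""
    return float(np.max(np.abs(np.sort(q_full) - np.sort(q_approx))))

def fit_idw(U_sub, q_sub):
    """Cheap inverse-distance surrogate standing in for the Gaussian process
    q_hat = q^GP(u; x_k, U_sigma) of Sect. 3.2."""
    def predict(U):
        d = np.linalg.norm(U[:, None, :] - U_sub[None, :, :], axis=-1) + 1e-9
        w = 1.0 / d
        return (w * q_sub[None, :]).sum(1) / w.sum(1)
    return predict

def select_subset(U, Q_list, m, n_trials=500):
    """Random-search stand-in for the GA of Eqs. (6)-(7): pick the m indices
    sigma_i (no repetitions) that minimise the summed ECDF distance over all
    training ECDFs Q_k, all of which share the same input sample U."""
    n = len(U)
    best_idx, best_obj = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=m, replace=False)
        obj = 0.0
        for q_full in Q_list:                       # one ECDF per design vector x_k
            model = fit_idw(U[idx], q_full[idx])    # train on the subset only
            obj += ecdf_distance(q_full, model(U))  # compare with the full ECDF
        if obj < best_obj:
            best_idx, best_obj = idx, obj
    return best_idx, best_obj

if __name__ == "__main__":
    n, dim = 120, 14                                # e.g. 12 shape + Mach + alpha
    U = rng.uniform(-1.0, 1.0, size=(n, dim))       # shared uncertain sample
    Q_list = [np.sin(U @ rng.normal(size=dim)) for _ in range(3)]  # illustrative q(x_k, u_i)
    idx, obj = select_subset(U, Q_list, m=5)
    print("selected indices:", np.sort(idx), "objective:", round(obj, 4))
```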

4 Numerical Analysis Tools

For the aerodynamic design optimisation problem faced in this work, an autonomous aerodynamic computational chain has been developed. Firstly, it generates the candidate airfoil through wg2aer.¹ This program, starting from a set of input design variables, modifies a baseline airfoil using a set of modification functions such as Hicks–Henne, Wagner, or Legendre functions. Afterwards, an unstructured square mesh is created by means of a self-developed procedure based on the open-source grid generator Gmsh [3]. Finally, the aerodynamic solver is run. In this case, the open-source Computational Fluid Dynamics (CFD) solver SU2 [2] was used. It solves the compressible Reynolds-averaged Navier-Stokes (RANS) equations using the SA turbulence model [18].

¹ Developed at the Italian Aerospace Research Centre (CIRA).


5 Design Application Example

5.1 Optimisation Problem Setup

For the design application example, both aerodynamic and geometric constraints are used. The airfoil percentage thickness with respect to the chord is fixed at the base value ($t_\%$), while constraints on the leading edge radius (LER), the trailing edge angle (TEA), and the airfoil percentage thickness with respect to the chord at $x/c = 0.85$ (TAT) are introduced to obtain realistic shapes. Moreover, special attention is dedicated to the airfoil pitching moment coefficient $c_m$, which for BWB configurations is a critical parameter due to the absence of elevators. For this reason, two constraints on the $c_m$ coefficient are imposed to keep its value properly confined, as required by trim aspects. The $c_m$ coefficient is calculated with respect to the aerodynamic centre, and it is considered positive in the case of a "nose up" pitching moment. The deterministic optimisation problem reads:

$\begin{cases} \min_x \; c_d(x) & \\ \text{subject to:} & \\ t_\% = 16.00, & c_l = 0.1 \\ \mathrm{LER} \geq 0.00781, & c_m \geq -0.04 \\ \mathrm{TEA} \geq 22.0^\circ, & c_m \leq 0.04 \\ \mathrm{TAT} \geq 0.06658, & \mathrm{error} = 0 \end{cases}$   (8)

The penalty approach is used to obtain an unconstrained problem:

$\min_{x \in X \subseteq \mathbb{R}^n} c_d(x) + P(x)$   (9)

with

$\begin{aligned} P(x) = \; & k_1\, p^+(\mathrm{LER}, 0.00781) + k_2\, p^+(\mathrm{TEA}, 22.0^\circ) + k_3\, p^+(\mathrm{TAT}, 0.06658) \\ & + k_4\, p^+(c_m, -0.04) + k_4\, p^-(c_m, 0.04) + k_5\, p^+(\mathrm{error}, 0) \end{aligned}$   (10)

All the constraints, except those regarding the lift coefficient and the airfoil percentage thickness with respect to the chord, are treated as quadratic penalties:

$p^+(x, y) = \begin{cases} 0 & \text{if } x \geq y \\ (x - y)^2 & \text{if } x < y \end{cases} \qquad \text{and} \qquad p^-(x, y) = \begin{cases} (x - y)^2 & \text{if } x \geq y \\ 0 & \text{if } x < y \end{cases}$   (11)
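Eqs. (10) and (11) translate directly into code; in the sketch below the constraint values (LER, TEA, TAT, c_m and the error flag) are assumed to be supplied by the geometry and CFD chain and are passed in as plain numbers.

```python
def p_plus(x, y):
    """Quadratic penalty of Eq. (11): active when x falls below the bound y."""
    return 0.0 if x >= y else (x - y) ** 2

def p_minus(x, y):
    """Quadratic penalty of Eq. (11): active when x reaches or exceeds the bound y."""
    return (x - y) ** 2 if x >= y else 0.0

# Penalty weights k_1 ... k_5 as given in the text.
K = (5000.0, 10.0, 30.0, 1000.0, 1000.0)

def penalty(ler, tea_deg, tat, cm, error):
    """P(x) of Eq. (10), built from the geometric and aerodynamic constraint values."""
    k1, k2, k3, k4, k5 = K
    return (k1 * p_plus(ler, 0.00781)
            + k2 * p_plus(tea_deg, 22.0)
            + k3 * p_plus(tat, 0.06658)
            + k4 * p_plus(cm, -0.04)
            + k4 * p_minus(cm, 0.04)
            + k5 * p_plus(error, 0.0))

if __name__ == "__main__":
    # Feasible example: all constraints satisfied, penalty is zero.
    print(penalty(ler=0.009, tea_deg=23.0, tat=0.07, cm=0.01, error=0.0))
    # Violated pitching moment bound: the corresponding quadratic term activates.
    print(penalty(ler=0.009, tea_deg=23.0, tat=0.07, cm=0.06, error=0.0))
```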


Fig. 2 Comparison of the deterministic optimised airfoil shape versus the baseline airfoil

Table 1 Summary of uncertain parameter definition in the four benchmark cases

Uncertainty              Range
Mach, M                  [0.78, 0.82]
Angle of attack, α       [−0.15°, 0.15°]
Geometry, Uj             [−0.0007, 0.0007], j = 1, ..., 12

The $c_l$ and the thickness constraints are not included because they are automatically satisfied by the computational procedure, by changing the angle of attack and by rescaling the airfoil thickness to the assigned value, respectively. The numerical values chosen for the $k_i$ coefficients are: $k_1 = 5000$, $k_2 = 10$, $k_3 = 30$, $k_4 = 1000$, $k_5 = 1000$. The transformation of a constrained optimisation problem into an unconstrained one through the penalty approach is always a delicate process, as the choice of the weights of the penalisation terms profoundly changes the shape and features of the search space. After solving the deterministic optimisation, a robust one must be carried out to improve the airfoil performance under uncertainties in the airfoil shape and in the operating conditions. The baseline airfoil for the optimisation under uncertainty (the deterministic optimum) is shown, together with the initial airfoil, in Fig. 2. To account for uncertainty, 12 uniformly distributed random variables were used to represent the stochastically perturbed shape of the airfoil. Moreover, the Mach number and the Angle of Attack are considered as uncertain working conditions, and they are modelled as four-parameter beta distributions, whose density function is given by

$f(y; \eta, \theta) = \dfrac{\gamma(\eta + \theta)\, y^{\eta-1}\, (1 - y)^{\theta-1}}{\gamma(\eta)\, \gamma(\theta)}$   (12)

with shape factors $\eta$, $\theta$, and a scale and translation given by $y = (x - \mathrm{loc})/\mathrm{scale}$. The Mach number is characterised by $\eta = 2$, $\theta = 2$, $\mathrm{scale} = 0.08$, $\mathrm{loc} = 0.76$, while $\alpha$ is characterised by $\eta = 2$, $\theta = 2$, $\mathrm{scale} = 1.0$, $\mathrm{loc} = -0.5$. Table 1 reports the variation range of these uncertainties.
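For instance, the four-parameter beta distributions of Eq. (12) can be sampled with scipy.stats.beta, whose loc/scale convention matches the translation y = (x − loc)/scale; the snippet below uses the Mach and angle-of-attack parameters quoted in the text and the uniform geometry perturbations of Table 1.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

# Four-parameter beta distributions of Eq. (12): shape factors eta = theta = 2,
# loc and scale as quoted in the text.
mach_dist = beta(a=2, b=2, loc=0.76, scale=0.08)
alpha_dist = beta(a=2, b=2, loc=-0.5, scale=1.0)

mach_samples = mach_dist.rvs(size=1000, random_state=42)
alpha_samples = alpha_dist.rvs(size=1000, random_state=43)

# Geometry perturbations: 12 uniformly distributed variables in [-0.0007, 0.0007].
geom_samples = rng.uniform(-0.0007, 0.0007, size=(1000, 12))

print(mach_samples.min(), mach_samples.max())   # empirical range of the Mach sample
print(alpha_samples.mean())                     # symmetric shape: mean near the centre
```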


After introducing the random variables, the QoI is a functional. Thus, the risk function CVaR is used to map the chosen QoI into $\mathbb{R}$. It is estimated with a confidence level $\alpha$ equal to 0.9. Note that the constraints are computed only at the nominal values, because it is of interest to assess the impact of the random perturbations only on the drag force. So, the objective function of the robust optimisation problem is:

$\min_{x \in X \subseteq \mathbb{R}^n} \mathrm{CVaR}_{0.9}\!\left[ c_d(x) \right] + P(x)$   (13)

5.2 Optimisation Process and Robust Design Results

The robust optimisation method is based on an adaptive surrogate model (GP) updated in multiple cycles. Following the procedure explained in Sect. 3.2, the solution of an optimisation problem determines the training subset for the Gaussian process. In the first step, the selected points are those that minimise the distance between the ECDF of the deterministic airfoil, calculated with a sufficiently high number of CFD samples, and the ECDF predicted by the Gaussian process. This first optimal set is used to start the initial robust design optimisation loop. Subsequently, a set of promising solutions is extracted, their "exact" ECDFs are computed, and these new ECDFs feed another GP training loop. A new robust design loop is then started using the new optimal subset. The alternation between these two phases is continued until a satisfactory robust solution is found. It is worth noting that, before each new robust optimisation step, it is necessary to repeat the sample search process to build a new and, possibly, more effective GP training subset. To do this, it is obviously necessary to evaluate the quantity of interest over the whole large sample for each of the new solutions selected for training. This point is one of the most computationally intensive phases of the process.

Step 1: Preliminary GP Training. The initial step of the procedure requires the characterisation of the robustness of the deterministic solution. For this purpose, we sample the design variables that describe the uncertainty in the airfoil shape and the operating conditions with a Monte Carlo sampling. Subsequently, we build the ECDF of the quantity of interest, which is the drag coefficient ($c_d$). We used an ECDF with 120 samples for training. Each point of the ECDF required, on average, 80 minutes of elapsed time to obtain a converged solution using SU2 V6.2.0 on eight cores of a cluster equipped with Intel(R) Xeon(R) E5-2670 CPUs at 2.60 GHz. The minimisation of the distance between the 120-sample hi-fi ECDF and its approximation via the Gaussian process was performed using a simple genetic algorithm. The algorithm had a population of 120 elements that evolved over 16 generations and used a bit-string encoding with Gray code. The crossover operator is the classic one-point binary crossover with a triggering probability of 100%. Bit-mutation has a 2.4% chance of changing the state of a single bit. The string of bits of the genetic algorithm encodes the selection of five elements extracted from the hi-fi ECDF.

Fig. 3 GP preliminary training – five variables – one design point. Evolution history and GP approximation vs. ECDF training set matching

Table 2 CMA-ES parameters adopted for the first robust design optimisation step

Maximum evaluations    Population size    Initial standard deviation
801                    8                  0.02

The Gaussian process constructs a response surface using these five elements and generates an approximated ECDF with a thousand Monte Carlo samples. The objective function is the distance of this approximated ECDF from the original one. The objective function evaluation process is very fast, because the setup of a Gaussian process with five elements is rapid and the sampling of the approximating ECDF with a thousand samples is almost immediate. The history of the evolution process is shown in Fig. 3 (left side). The right side shows the visual comparison between the 120-sample ECDF (on which the Gaussian process was trained) and the ECDF approximated with the Gaussian process using the five samples selected by the optimiser.

Step 2: First Robust Optimisation Run. The first robust optimisation step used the resulting GP approximator. Each approximation of the ECDF requires five evaluations of the QoI, i.e. five fluid dynamics solver runs. Of these five runs, the first is related to the nominal conditions (without random perturbations) and is at constant $c_l$. With the settings used, the constant-$c_l$ run lasts about twice as long as those with an assigned angle of attack. Although the GP setup does not require the nominal point to be present in the training subset, this point must be calculated to evaluate the initial angle of attack on which to apply the perturbations due to uncertainty ($\Delta\alpha$). Consequently, it is useful for the computational efficiency of the whole process to force its inclusion in the GP setup subset. Twenty design variables describe the airfoil shape, and CMA-ES [7] is the algorithm chosen for the robust optimisation. The algorithm parameters for this optimisation phase are reported in Table 2. The optimisation process was stopped after 185 evaluations, at the end of the 23rd generation, due to a stagnation of the optimisation process that occurred after the hundredth evaluation. The evolution history of this first robust optimisation step is reported in Fig. 4.


Fig. 4 First optimisation run using the initial GP approximation


Fig. 5 Predicting capabilities of the first Gaussian process

The solution that the GP classified as best (# 108) plus two suboptimal solutions generated in the early stages of optimisation (# 7 and # 34) were evaluated with the pre-defined dense sampling to verify the quality of the approximation of the GP for elements that are not part of its training set. The predicting capabilities of the first Gaussian process are reported in Fig. 5. It is evident that the trained GP is unable to offer accurate CVaR predictions, as the actual ECDF is very different from that obtained using the GP approximator.


Fig. 6 ECDFs comparison on the training set for the second GP retraining step with five variables

Step 3: Gaussian Process Retraining. In this second phase of the training of the Gaussian process, the goal is to choose a subset of input elements that can guarantee a better approximation for a greater number of ECDFs. We add to the training pool the three ECDFs that were not well represented with the old approximation scheme. The objective function is now constructed by adding the distances between the high-fidelity ECDFs and the ECDFs constructed with the Gaussian process in the four training cases. Therefore, the same subset of input vectors of uncertain variables will be associated with the corresponding values of the QoI in the different training cases considered. Also in this case, the process of evaluating the objective function is very fast, since the quantities of interest were previously calculated in the evaluation phase of the high-fidelity ECDFs. The same genetic algorithm used for the first training phase was used here. However, in this step, the algorithm had a population of 240 elements that evolved over 32 generations. The one-point binary crossover had a triggering probability of 80%, and bit-mutation had a 1.2% chance of changing the state of a single bit. Figure 6 compares, for each training point, the ECDF obtained with fine Monte Carlo sampling and the one approximated with the Gaussian process. For the sake of completeness, Fig. 7 shows the comparison between the ECDFs in the case in which the training set of the Gaussian process has 10 points (and the same number of ECDFs, equal to four). A better approximation of the curves is evident. Still, it must be considered that the computational cost is doubled, since, for each candidate solution, twice the number of quantities of interest has to be evaluated.


Fig. 7 ECDFs comparison on the training set for the second GP retraining step with 10 variables


Fig. 8 Evolution history and best ECDF (true and approximate) obtained in the second robust optimisation run

Step 4: Second Robust Optimisation Run. The second robust optimisation run is performed again with CMA-ES, and the settings are identical to those of the previous run, shown in Table 2. However, the optimisation process was interrupted after 61 generations. Indeed, after an improvement of the approximate objective function of about 3%, it was considered appropriate to move to a new training phase of the Gaussian process that included the new current minimum. Figure 8 reports the evolution history of the GP-approximated CVaR together with the ECDF curves of the best solution found. The approximated distribution shows an acceptable


agreement with the sampled distribution, except for the right tail, which, however, is the most important part for the CVaR evaluation.

Step 5: Third Gaussian Process Retraining The previous robust optimisation step exposed two weaknesses in the training process. The first is that, even when a good overall agreement is obtained between the empirical distribution computed with the Gaussian process and the Monte Carlo sampled distribution, the upper tail of the distribution, which is the most critical part for a good CVaR estimate, is not always well approximated. The second becomes important when comparing distributions that are very close to each other: even with a good approximation, a small error in the CVaR estimate can overturn the order relationship. In other words, while the Monte Carlo sampling indicates, hypothetically, CVaR1 < CVaR2, the GP approximation may indicate CVaR1 > CVaR2, reversing the order relation. To remedy the first problem, a new term was introduced in the objective function, proportional to the distance between the CVaR computed with the Monte Carlo samples and the CVaR calculated with the GP; this term is added for each of the training distributions. For the second problem, a penalty was introduced in the objective function, increased for each violation of the order relation introduced by the Gaussian process with respect to the order defined by the Monte Carlo sampling. The computation of the penalty value is immediate but involves the sorting and comparison operation detailed below. Consider, for each element of the training set, the pair formed by the conditional value at risk computed from the Monte Carlo sampling (CVaR) and from the Gaussian process approximation (denoted with a hat). Reorder the set of pairs so that CVaR_i ≤ CVaR_{i-1}. The penalty value is then given by

$$
P_{\mathrm{tset}} = w \sum_{i=2}^{n} \mathbf{1}_C\!\left(\widehat{\mathrm{CVaR}}_i > \widehat{\mathrm{CVaR}}_{i-1}\right),
\tag{14}
$$

where n is the size of the training set, w is a weight constant, and $\mathbf{1}_C$ is the indicator function of the subset C, $\mathbf{1}_C(x) := 1$ if $x \in C$ and $0$ otherwise, with C the subset of true inequalities, among all possible ones, between the training-set CVaR values calculated with the Gaussian process. The genetic algorithm again had a population of 240 elements and evolved over 150 generations. The one-point binary crossover had a triggering probability of 80%, and bit-mutation had a 1.2% chance of changing the state of a single bit, as in the previous training run. The penalty term weight w was set to 0.01. Table 3 compares, for each training point, the conditional value at risk obtained with fine Monte Carlo sampling (CVaR) and the one approximated with the Gaussian process (GP CVaR). The table lists the training-set CVaR values sorted in descending order, which allows the penalty term associated with each GP CVaR to be evaluated simply by comparing its value with that of the previous table row.
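To make the two ingredients above concrete, the following sketch (Python with NumPy; variable names and the sample data are illustrative, not taken from the chapter's code) shows how an empirical CVaR at level α = 0.9 can be estimated from Monte Carlo samples of the drag coefficient, and how the order-violation penalty of Eq. (14) can be evaluated from the (CVaR, GP CVaR) pairs of the training set.

```python
import numpy as np

def empirical_cvar(samples, alpha=0.9):
    """Empirical CVaR_alpha: mean of the (1 - alpha) upper tail of the samples."""
    x = np.sort(np.asarray(samples))
    var = np.quantile(x, alpha)          # empirical value-at-risk (alpha-quantile)
    return x[x >= var].mean()            # average of the worst-case tail

def order_violation_penalty(cvar_mc, cvar_gp, w=0.01):
    """Penalty of Eq. (14): sort by the Monte Carlo CVaR in descending order,
    count violations of that order in the GP-approximated values, weight by w."""
    order = np.argsort(cvar_mc)[::-1]    # descending MC CVaR, as in Table 3
    gp_sorted = np.asarray(cvar_gp)[order]
    violations = np.sum(gp_sorted[1:] > gp_sorted[:-1])
    return w * violations

# Illustrative check with the Table 3 values (MC CVaR first, GP CVaR second):
cvar_mc = [0.03821, 0.03805, 0.03785, 0.03612, 0.03519]
cvar_gp = [0.03826, 0.03749, 0.03741, 0.03599, 0.03519]
print(order_violation_penalty(cvar_mc, cvar_gp))   # 0.0: no order violations
```

With the Table 4 pairs, the same function returns w, reflecting the single order violation reported there.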


Table 3 CVaR and GP-approximated CVaR values comparison in the training set after the third Gaussian process training step

Solution ID              Distance    CVaR      GP CVaR   Penalty term
Selection #1 – 1st run   0.0013679   0.03821   0.03826   –
Baseline                 0.0003242   0.03805   0.03749   0
Selection #2 – 1st run   0.0010019   0.03785   0.03741   0
Selection #3 – 1st run   0.0012644   0.03612   0.03599   0
Best – 2nd run           0.0006619   0.03519   0.03519   0

Fig. 9 Evolution history and best ECDF (true and approximate) obtained in the third robust optimisation run (best solution: GP CVaR = 0.0346, true CVaR = 0.0355)

Table 4 CVaR and GP-approximated CVaR values comparison between the best solutions in the second and third optimisation runs

Solution ID      Distance    CVaR      GP CVaR   Penalty term
Best – 3rd run   0.0005268   0.03550   0.03462   –
Best – 2nd run   0.0006619   0.03519   0.03519   1

Step 6: Third Robust Optimisation Run The third robust optimisation run again uses CMA-ES with the usual settings (Table 2), with the best result of run 2 as the baseline. The optimisation process was interrupted after 29 generations, after an improvement of the approximate objective function of about 1.6%. Beyond a first promising solution, which appeared to improve the objective by about 1.6%, the procedure showed no sign of a further improvement trend. This could indicate a problem in the approximation procedure and called for an additional verification step. Figure 9 reports the evolution history of the GP-approximated CVaR together with the ECDF curves of the best solution found. The comparison of the best results of run 2 and run 3 in Table 4 shows that, despite the positive training results, the GP approximator predicted an improvement that the verification phase did not confirm. Step 7: Fourth Gaussian Process Retraining The previous step of robust optimisation revealed an additional weakness in the training process; a new training phase of the Gaussian process is therefore necessary.

However, instead of introducing the sub-optimal solution of run 3 into the training set, it was decided to insert a perturbation of it, obtained by reducing the nominal angle of attack by about 0.2 degrees. In this way, the new ECDF introduced in the training set, although similar to the run 3 solution, was sufficiently distant from the other training set elements. The original ECDF was instead used in the performance verification phase of the Gaussian process. The genetic algorithm setup was slightly changed: the bit-mutation operator was replaced by the adaptive mutation operator described in [13]. The definition of the objective function is identical to that of the previous training phase. The only other difference in the algorithm setup is that the membership of the nominal condition in the training set is now imposed explicitly rather than through a penalty, which means that one of the five problem variables is constrained explicitly. Figure 10 reports, for each training point, the ECDF obtained with fine Monte Carlo sampling and the GP-approximated one after a training optimisation run of 200 generations.

Fig. 10 ECDFs comparison on the training set for the fourth GP retraining step (six panels of GP vs. true ECDFs of cd; legend CVaR pairs, GP/true: 0.0375/0.0379, 0.0358/0.0361, 0.0383/0.0382, 0.0381/0.0380, 0.0346/0.0352, 0.0361/0.0377)

Step 8: Fourth Robust Optimisation Run The last robust optimisation run is performed using the Gaussian process trained on the set of six cumulative distributions described above. The CMA-ES settings did not change, but the optimisation was continued up to the maximum budget of objective function evaluations (set at 801). Figure 11 shows an optimisation process divided into two distinct phases: a first phase in which the process is stagnant and shows no significant progress, and a second one, visible after the five-hundredth evaluation, in which the optimiser is finally able to find an exploitable search direction. The cumulative distributions of the optimal solution, also reported in Fig. 11, show an excellent agreement between the densely Monte Carlo sampled and the approximated distributions. Table 5 shows that the latest version of the approximator respects the order relationship for the optimal solution.
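As an illustration of how such a robust optimisation run can be wired together, the following sketch minimises the GP-approximated CVaR over the design variables with CMA-ES (Python, assuming the pycma package; `gp_surrogate` and its `predict_cvar` method are placeholders for the trained surrogate, not the chapter's actual code).

```python
import cma  # pycma: CMA-ES implementation (assumed available)

def approximate_cvar(design_vars, gp_surrogate):
    """Objective: GP-approximated CVaR_0.9 of the drag coefficient for a candidate shape."""
    return gp_surrogate.predict_cvar(design_vars)  # placeholder surrogate interface

def robust_optimisation(x0, gp_surrogate, sigma0=0.1, max_evals=801):
    es = cma.CMAEvolutionStrategy(x0, sigma0, {"maxfevals": max_evals})
    while not es.stop():
        candidates = es.ask()                               # sample a CMA-ES population
        fitness = [approximate_cvar(x, gp_surrogate) for x in candidates]
        es.tell(candidates, fitness)                        # update mean, step size, covariance
    return es.result.xbest, es.result.fbest                 # best design and its approximate CVaR
```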


Fig. 11 Evolution history and best ECDF (true and approximate) obtained in the fourth robust optimisation run (best solution: GP CVaR = 0.0334, true CVaR = 0.0342)

Table 5 CVaR and GP-approximated CVaR values comparison in the training set after the fourth Gaussian process training step and with the best solution found in the fourth robust optimisation run (this is the best solution obtained throughout the run sequence)

Solution ID                Distance    CVaR      GP CVaR   Penalty term
Selection #1 – 1st run     0.0029011   0.03821   0.03832   –
Baseline                   0.0012991   0.03805   0.03810   0
Selection #2 – 1st run     0.0024264   0.03785   0.03748   0
Perturbed best – 3rd run   0.0014308   0.03774   0.03608   0
Selection #3 – 1st run     0.0019234   0.03612   0.03579   0
Best – 3rd run             0.0008803   0.03550   0.03575   0
Best – 2nd run             0.0008981   0.03519   0.03460   0
Best – 4th run             0.0005761   0.03420   0.03341   0

Fig. 12 Comparison between the cumulative distributions of the training set and the one related to the optimal solution of the fourth robust optimisation run

The percentage improvement over the baseline, calculated on the high-fidelity distribution, is about 10%, comparable to the 12% predicted by the approximate method. Finally, Fig. 12 shows the difference in robustness between the optimal solution found (blue line) and the training set solutions.


6 Conclusions

In this work, we showed how an adaptive approach enables an efficient implementation of robust optimisation based on risk functions and on Gaussian processes as a strategy for constructing response surfaces. In particular, CVaR was used in a robust aerodynamic optimisation loop to improve the performance of a transonic airfoil. The key point of the procedure is an iterative approach in which the training phase of the Gaussian process alternates with the robust optimisation phase, and the results of the latter are used in the next training phase to improve the Gaussian process. A peculiarity of the training phase is that different objective functions can be chosen for the training optimisation. In the simplest approach, only the distance between the cumulative distributions calculated with fine Monte Carlo sampling and those calculated with the Gaussian process is minimised. The distance between the risk functions calculated with the two methods can then be added, and, finally, a penalty can be imposed when the Gaussian process fails to reproduce the order relationship between risk functions defined by the Monte Carlo reference solutions. At present, the switch between the various phases of the procedure is not automatic: it must be carried out manually, based on the user's empirical assessment of the progress of each step of the robust optimisation. Further developments of the methodology will consider an integrated algorithm in which the transition between the robust optimisation phase and the Gaussian process training phase takes place automatically. Despite these limitations, the presented procedure allowed the implementation of an effective and efficient robust design loop with a level of computational resources that is a fraction of that required by risk-function-based approaches that do not approximate the objective function.

References 1. Artzner, P., Delbaen, F., Eber, J.M., Heath, D.: Coherent measures of risk. Mathematical Finance 9(3), 203–228 (1999). https://doi.org/10.1111/1467-9965.00068 2. Economon, T.D., Palacios, F., Copeland, S.R., Lukaczyk, T.W., Alonso, J.J.: SU2: An opensource suite for multiphysics simulation and design. AIAA J. 54(3), 828–846 (2016). https:// doi.org/10.2514/1.J053813 3. Geuzaine, C., Remacle, J.F.: Gmsh: A 3-D finite element mesh generator with built-in pre- and post-processing facilities. Int. J. Numer. Methods Eng. 79(11), 1309–1331 (2009). https://doi. org/10.1002/nme.2579, https://onlinelibrary.wiley.com/doi/abs/10.1002/nme.2579 4. Gibbs, M.N.: Bayesian Gaussian Processes for Regression and Classification. Ph.D. thesis, University of Cambridge (1997) 5. Gibbs, M., MacKay, D.J.: Efficient Implementation of Gaussian Processes. Tech. rep., Cambridge University Engineering Department (1997) 6. Giles, M.B.: Multilevel Monte Carlo methods. Acta Numerica 24, 259–328 (2015). https://doi.org/10.1017/S096249291500001X, http://journals.cambridge.org/article_ S096249291500001X


7. Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation 9(2), 159–195 (2001) 8. Heinrich, S.: Multilevel Monte Carlo methods. In: Margenov, S., Wa´sniewski, J., Yalamov, P. (eds.) Large-Scale Scientific Computing: Third International Conference, LSSC 2001 Sozopol, Bulgaria, June 6–10, 2001 Revised Papers, pp. 58–67. Springer Berlin Heidelberg, Berlin, Heidelberg (2001). https://doi.org/10.1007/3-540-45346-6_5 9. Jeong, S., Murayama, M., Yamamoto, K.: Efficient optimization design method using kriging model. J. Aircraft 42(2), 413–420 (2005) 10. Neal, R.M.: Annealed importance sampling. Stat. Comput. 11(2), 125–139 (2001) 11. Park, G.J., Lee, T.H., Lee, K.H., Hwang, K.H.: Robust design: An overview. AIAA J. 44(1), 181–191 (Jan 2006). https://doi.org/10.2514/1.13639 12. Quagliarella, D.: Value-at-risk and conditional value-at-risk in optimization under uncertainty. In: Hirsch, C., Wunsch, D., Szumbarski, J., Łaniewski-Wołłk, L., Pons-Prats, J. (eds.) Uncertainty Management for Robust Industrial Design in Aeronautics. Springer (2019) 13. Quagliarella, D., Vicini, A.: A genetic algorithm with adaptable parameters. In: 1999 IEEE International Conference On Systems, Man, and Cybernetics. Institute of Electrical and Electronic Engineers (IEEE), Tokyo, Japan (Oct 1999) 14. Quagliarella, D., Petrone, G., Iaccarino, G.: Reliability-based design optimization with the generalized inverse distribution function. In: Greiner, D., Galván, B., Périaux, J., Gauger, N., Giannakoglou, K., Winter, G. (eds.) Advances in Evolutionary and Deterministic Methods for Design, Optimization and Control in Engineering and Sciences, Computational Methods in Applied Sciences, vol. 36, chap. 5, pp. 77–92. Springer International Publishing (2015). https:// doi.org/10.1007/978-3-319-11541-2_5. iSBN:978-3-319-11540-5 15. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. The MIT Press (2006) 16. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–41 (2000) 17. Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss distributions. J. Bank. Finance 26, 1443–1471 (2002) 18. Spalart, P., Allmaras, S.: A one-equation turbulence model for aerodynamic flows. In: 30th Aerospace Sciences Meeting and Exhibit, p. 439 (1992) 19. van der Vaart, A.: Asymptotic Statistics. Cambridge University Press (1998) 20. Witteveen, J., Doostan, A., Chantrasmi, T., Pecnik, R., Iaccarino, G.: Comparison of stochastic collocation methods for uncertainty quantification of the transonic rae 2822 airfoil. In: Proceedings of Workshop on Quantification of CFD Uncertainties (2009)

Part IV

Uncertainty Quantification, Identification and Calibration in Aerospace Models (UQ)

Inference Methods for Gas-Surface Interaction Models: From Deterministic Approaches to Bayesian Techniques

Anabel del Val, Olivier P. Le Maître, Olivier Chazot, Pietro M. Congedo and Thierry E. Magin

A. del Val: Aeronautics and Aerospace Department, von Karman Institute for Fluid Dynamics, Sint-Genesius-Rode, Belgium; Inria, Centre de Mathématiques Appliquées, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France. e-mail: [email protected]
O. P. Le Maître: CNRS, Centre de Mathématiques Appliquées, Inria, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
O. Chazot, T. E. Magin: Aeronautics and Aerospace Department, von Karman Institute for Fluid Dynamics, Sint-Genesius-Rode, Belgium
P. M. Congedo: Inria, Centre de Mathématiques Appliquées, École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. In: M. Vasile, D. Quagliarella (eds.), Advances in Uncertainty Quantification and Optimization Under Uncertainty with Aerospace Applications, Space Technology Proceedings 8, https://doi.org/10.1007/978-3-030-80542-5_21

1 Introduction

Space travel, from its beginnings in Low Earth Orbit (LEO) to the exploration of our Solar System, has led to countless scientific advancements in what is one of the most challenging undertakings of humankind. Venturing into Space requires large amounts of kinetic and potential energy to reach orbital and interplanetary velocities. All this energy is dissipated when a space vehicle enters dense planetary atmospheres [24]. The bulk of this energy is exchanged during the entry phase by converting the kinetic energy of the vehicle into thermal energy in the surrounding atmosphere through the formation of a strong bow shock ahead of the vehicle [1]. The interaction between the dissociated gas and the protection system is governed by the material behavior, which either acts as a catalyst for recombination

reactions of the atomic species in the surrounding gas mixture [9] or ablates as a consequence of the reactive gas mixture that surrounds it, injecting new species into the boundary layer [11]. Characterizing both phenomena for different atmospheric and material compositions is quite challenging due to the coupling mechanisms between material surface properties and the resulting ablation and heating rates. All these elements are also tightly coupled to the flowfield computations [30]. A key element of the modeling behind such phenomena is the heterogeneous chemical processes that are taken into account to explain the experiments. It is not always straightforward, from a priori knowledge point of view, to know which chemical processes underpin the macroscopic effects we see in thermal protection materials subjected to a reactive flow environment. All the above makes the determination of gas-surface interactions of thermal protection materials a complex task subjected to experimental and model uncertainties. The design and performance of atmospheric entry vehicles must account for these uncertain characterizations. It is relatively common when dealing with complex physical phenomena to resort to simple, non-intrusive a priori forward uncertainty propagation techniques [36]. These techniques assume a priori probability distributions for the main model parameters. Sensitivity analyses are then performed to discriminate the important ones. They also assume that the exact value is sufficiently well known and within the considered uncertainty range. These methods do not use any experimental observation to calibrate such parameters. The interest of using experimental information is that it leads to objective uncertainty levels and provides likely values rather than a priori guesses, achieving better and more reliable predictions. The calibration from experimental data focuses on a Bayesian approach that has the advantage of providing a complete objective characterization of the parameters’ uncertainty through their resulting posterior distribution. In this context, knowing how the experimental procedure is carried out is fundamental for the formulation of the inference method. Current plasma wind tunnel experiments rely heavily on the accurate characterization of free stream conditions that serve as input to Computational Fluid Dynamics (CFD) models. In the particular case of catalytic materials, this step proves critical to correctly account for the catalytic heat flux. Another important aspect of high temperature flow testing is that even though our interest lies in the heterogeneous chemical processes, it is common to have additional parameters for which direct experimental observations are not available. These parameters are needed to perform the inference but we are not explicitly interested in getting their distributions. Traditional Bayesian approaches deal with this problem by prescribing prior distributions on such parameters at the expense of some of the observations being consumed to evaluate these nuisance parameter posteriors. Consequently, it is important to remark their impact on the quality of the inference [38]. These challenges are most critical for catalytic materials. Ablative materials have a distinct behavior under such high temperature reacting flows. The fact that they undergo ablation, that is, the chemical consumption of the material, generates a visible surface recession that can be measured and gives direct evidence of the chemical processes involved in such phenomena. 
The main issue with ablative materials is that, while an overall response is measured (recession), it is difficult to isolate the different effects that occur on the surface to produce the macroscopic response. Both systems present challenges for the efficient inference of heterogeneous chemical parameters. In this chapter, we review the state-of-the-art methods for the deterministic rebuilding of such parameters, describe the proposed Bayesian inference formulations, and show how the aforementioned challenges are addressed. The Bayesian methods exploit the experimental data resulting from measurements performed by Panerai [29] and Helber [17] on two different types of materials: ceramic matrix composites and graphite, respectively. While deterministic approaches require more data and assumptions to extract the model parameters from the experimental data, Bayesian methods offer a complete characterization of their uncertainty. This work paves the way to a new design-of-experiments paradigm within the Thermal Protection System (TPS) community, where the most informative experiments will be sought. The chapter is organized as follows: Sect. 2 addresses the sets of experiments dedicated to the study of catalytic and ablative materials reviewed here. Section 3 reviews the models used for the deterministic approaches and how they are used together with the experimental data to rebuild the model parameters. Section 4 showcases the proposed Bayesian methods, and Sect. 5 discusses the conclusions and outlook.

2 Plasma Wind Tunnel Experiments

In this section, we review the different experimental procedures adopted for the study of catalytic and ablative materials at the von Karman Institute (VKI). The experimental data is consequently used to rebuild different model parameters that define the gas-surface interaction of the different materials in question.

2.1 Heterogeneous Catalysis We consider the experimental setup of the Plasmatron facility at VKI, an inductively coupled plasma (ICP) wind tunnel [5]. The plasma flow is generated by the induction of electromagnetic currents within the testing gas in the plasma torch; this process creates a high-purity plasma flow which leaves the testing chamber through the exhaust. As a simple model of TPS response, we define the catalytic coefficient γ as the ratio of the number of atoms that recombine on the material surface over the total number of atoms that hit it. We assume the same recombination probability for the nitrogen and oxygen species constituting the air plasma, leading to just one single catalytic parameter to characterize the material under atmospheric entry conditions. In a typical experiment, one sequentially exposes two probes to the plasma flow: a reference probe made of a well-known material (copper)


Fig. 1 Schematic view of the Plasmatron experimental setup

with a catalytic parameter γref, and a test probe which holds a sample of the TPS material with the unknown catalytic coefficient γTPS to be inferred. The following instruments equip the Plasmatron. For pressures, a water-cooled Pitot probe measures the dynamic pressure Pd within the plasma jet, and an absolute pressure transducer records the static pressure Ps in the Plasmatron chamber. The reference probe is a hemispherical device (25 mm radius) equipped with a water-cooled copper calorimeter at the center of its front face. The calorimeter has a cooling water system that maintains the surface temperature of the reference probe. The reference probe heat flux is deduced from the mass flow (controlled by a calibrated rotameter) circulating in the cooling system and the inlet/outlet water temperature difference measured by thermocouples as a result of the exposure to the plasma flow. For the test probes, we directly measure the emissivity ε and the surface temperature Tw. The determination of the heat flux assumes radiative equilibrium at the surface, with the relation qw = ε σ Tw^4, where σ is the Stefan-Boltzmann constant, ε is the emissivity measured with an infrared radiometer, and Tw is the wall temperature, which is measured using a pyrometer. More details on how these measuring devices work can be found in [28]. Figure 1 schematizes the Plasmatron and its instrumentation for catalytic property determination. The underlying idea of the experimental procedure is to first perform measurements of the wall temperature, heat flux and pressures Pd and Ps with the reference probe set in the plasma jet. As these measurements depend on the state of the free stream flow, in particular on the enthalpy Hδ at the boundary layer edge δ, the free stream conditions can be deduced if one knows the contribution of the surface catalysis to the heat flux. In our formulation, this is equivalent to knowing the catalytic coefficient γref of the reference probe. Then, in a second stage, the test probe is set in place of the reference probe in the plasma jet. The corresponding steady-state wall temperature Tw and emissivity ε are measured and, assuming that the free stream flow conditions have not changed, the catalytic coefficient γTPS of the test probe can be inferred.
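As a small worked example of the test-probe heat flux determination described above, the following sketch evaluates qw = ε σ Tw^4 under the radiative-equilibrium assumption (Python/NumPy; the emissivity and wall temperature are illustrative values, not measurements from the chapter).

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant [W m^-2 K^-4]

def radiative_equilibrium_heat_flux(emissivity, T_wall):
    """Wall heat flux under radiative equilibrium: q_w = eps * sigma * T_w^4."""
    return emissivity * SIGMA * T_wall**4

# Illustrative values only: eps = 0.85, T_w = 1600 K
q_w = radiative_equilibrium_heat_flux(0.85, 1600.0)
print(f"q_w = {q_w / 1e3:.1f} kW/m^2")   # roughly 316 kW/m^2 for these inputs
```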


2.2 Thermochemical Ablation

The experimental setup installed at the Plasmatron facility offers intrusive and non-intrusive (optical) measurement techniques. The instrumentation consists of a video camera, a two-color pyrometer, a broad-band radiometer, a digital camera, and a spectrometer with an Intensified Charge-Coupled Device (ICCD). The main data of interest in the investigation reported here are the surface ablation rate ṡ of graphite in a pure nitrogen plasma and measurements of the locally resolved boundary layer emission and densities ρCN of the ablation product cyanogen (CN). These measurements are directly linked to the model parameter we want to rebuild, the nitridation reaction efficiency γN^CN. The same equipment as for the study of heterogeneous catalysis is used here for the determination of the free stream conditions through heat flux measurements on a reference (copper) material. A set of other dedicated measurements is used for the characterization of the ablative material itself. The total stagnation point recession ṡ is measured using a digital camera (Nikon D5000) attached to a 400 mm lens, giving a resolution of about 0.03 mm per pixel. The test sample is preheated (cleaned and dried) by the Argon plasma used to start the Plasmatron facility. After starting the plasma on Argon gas, with the test sample in place, the test gas is switched to pure nitrogen. The stagnation point of the test sample is placed 445 mm from the torch exit. The strong radiative signature of the CN molecule allows for its easy probing by emission spectroscopy via the strong violet system emission [6, 31]. One can further benefit from the low surface recession rate of the graphite sample, contrary to air ablation, which allows several recorded spectra to be averaged per ablation test. The emission spectroscopy setup consists of an Acton Series SP-2750 spectrograph of 75 cm focal length combined with an ICCD PIMAX camera with a frame of 1024 × 1024 pixels. The two-dimensional ICCD array enables spectral measurements across the complete plasma jet in an imaged plane of 20 cm, yielding a spatial resolution of 0.195 mm. For each acquisition, the camera records a data matrix with wavelength distributed along the horizontal axis and the lateral positions of the observed plasma radius distributed along the vertical axis. The objective of the spectral measurements is the determination of the locally resolved CN emission and the experimental CN species density ρCN. As the plasma jet is observed from the side, the recorded signal is the result of the local emission integrated along the line-of-sight, projected onto the ICCD sensor. If we assume axisymmetry of the jet and an optically thin gas, the inverse Abel transformation provides the local emission coefficient at distance r from the center. Treatment of the experimental spectra is necessary prior to the conversion to local emission. More details about the experimental CN species density determination can be found in [17].
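The inverse Abel transformation mentioned above can be implemented in several ways; the following is a minimal onion-peeling sketch (Python/NumPy), assuming an axisymmetric, optically thin jet and a lateral emission profile already extracted from the ICCD data. It is only an illustrative stand-in, not the chapter's actual data-reduction procedure.

```python
import numpy as np

def onion_peeling(lateral_profile, dr):
    """Inverse Abel transform by onion peeling.

    lateral_profile: line-of-sight integrated emission I(y_i) at lateral positions
    y_i = i * dr, measured from the jet axis outwards.
    Returns the local emission coefficient assumed constant in each annular shell.
    """
    n = len(lateral_profile)
    r = np.arange(n + 1) * dr                        # shell boundaries
    L = np.zeros((n, n))                             # chord length of ray i through shell j
    for i in range(n):
        for j in range(i, n):
            L[i, j] = 2.0 * (np.sqrt(r[j + 1]**2 - r[i]**2)
                             - np.sqrt(max(r[j]**2 - r[i]**2, 0.0)))
    eps = np.zeros(n)
    for i in range(n - 1, -1, -1):                   # back-substitution from the edge inwards
        eps[i] = (lateral_profile[i] - L[i, i + 1:] @ eps[i + 1:]) / L[i, i]
    return eps
```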


3 Deterministic Approaches to the Inference of Model Parameters

In this section, we highlight the different approaches adopted in the relevant literature to rebuild model parameters deterministically and learn about gas-surface interaction phenomena through experiments. We also make the split between catalytic and ablative materials due to the different approaches to the rebuilding and models.

3.1 Heterogeneous Catalysis The catalytic activity on a material surface cannot be measured directly. We need models and simulations to bypass that lack of knowledge and use other measurable quantities to rebuild the model parameter we are seeking. Model-based numerical simulations include the catalytic recombination parameter γ in their model to account for this phenomenon on the computation or prediction of relevant quantities. These methods use experiments and models intertwined in a complex fashion. To identify the TPS catalytic properties γTPS , chemically reacting boundary layer simulations are employed in the vicinity of the stagnation point [1]. The Boundary Layer (BL) code implements a suitable order-of-magnitude simplification of the Navier-Stokes equations to compute one-dimensional boundary layer flow quantities. To solve the system, closure models for the thermodynamic and transport properties as well as the chemical production terms of the different species are needed. Transport fluxes are derived from kinetic theory using the ChapmanEnskog method for the solution of the Boltzmann equation [12, 26]. Diffusion fluxes are computed through the generalized Stefan-Maxwell equations [7, 13, 21]. For the homogeneous chemistry, the Law of Mass Action is used to compute production rates as proportional to the product of the reactant densities raised to their stoichiometric coefficients [22]. The thermodynamic properties, such as the enthalpy, are derived from statistical mechanics [1, 37] for a reacting mixture of perfect gases, assuming thermal equilibrium and chemical non-equilibrium. The dependency of the catalytic parameters comes from the boundary condition where this term accounts for the recombination efficiency at the wall. Apart from the closure models, the parabolic nature of the BL model requires the imposition of two boundary conditions: the external flow conditions at the boundary layer edge [23], and at the material surface where recombination reactions can be triggered depending on the catalytic nature of such material [14]. More details about the derivation, coordinate transformations, and numerical implementation of the BL code are available in the work of Barbante [2]. In summary, the predictive quantity of the model is the wall heat flux

$$
q_w = q_w\!\left(\gamma,\ T_w,\ P_\delta,\ H_\delta,\ \delta,\ \frac{\partial u_\delta}{\partial x},\ \frac{\partial}{\partial y}\frac{v_\delta}{\partial u_\delta/\partial x}\right),
\tag{1}
$$

which depends on the free stream conditions (subscript δ), the thickness of the boundary layer δ, the catalytic parameter of the material γ, and the surface temperature Tw. An auxiliary axisymmetric magnetohydrodynamics simulation assuming local thermodynamic equilibrium (LTE) is performed to simulate the torch and the chamber of the wind tunnel [23]. Relying on the knowledge of the operating conditions of the Plasmatron, such as electric power, injected mass flow, static pressure, and probe geometry, this 2D simulation lets us compute Non-Dimensional Parameters (NDPs) that define the momentum influx to the boundary layer (the interested reader is directed to [10]). The prediction we are seeking to match against the experimental data is now recast as

$$
q_w = q_w\!\left(\gamma,\ T_w,\ P_\delta,\ H_\delta,\ \mathrm{NDPs}\right),
\tag{2}
$$

where Pδ and Hδ are the static pressure and enthalpy at the boundary layer edge, respectively. We solve this equation iteratively over the outer edge temperature Tδ, which is directly linked to Hδ, until the numerical heat flux matches the one measured experimentally with the calorimetric probe. The procedure returns the enthalpy and velocity gradient at the boundary layer outer edge. Once the plasma enthalpy has been determined, we can run the BL code for various combinations of material catalysis and temperature to obtain a heat flux abacus, qw = qw(Tw, γ). The abacus defines a chemically reacting frame for one combination of enthalpy, pressure and model geometry. The TPS catalysis can then be determined by identifying the γ contour on which the actual experimental conditions (qw, Tw) lie.
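A minimal sketch of this two-stage deterministic rebuilding is given below (Python/NumPy; `bl_heat_flux` is a hypothetical stand-in for the boundary-layer code described above, and all numerical values are placeholders rather than values from the chapter). The first stage bisects on the edge temperature until the predicted reference-probe heat flux matches the measured one; the second stage scans γ values and picks the one whose predicted heat flux at the measured wall temperature is closest to the TPS measurement.

```python
import numpy as np

def rebuild_edge_temperature(qw_meas_ref, gamma_ref, Tw_ref, bl_heat_flux,
                             T_lo=4000.0, T_hi=12000.0, tol=1e-3):
    """Stage 1: find the edge temperature such that the BL code reproduces
    the measured reference-probe heat flux (simple bisection)."""
    while T_hi - T_lo > tol:
        T_mid = 0.5 * (T_lo + T_hi)
        if bl_heat_flux(gamma=gamma_ref, Tw=Tw_ref, T_delta=T_mid) < qw_meas_ref:
            T_lo = T_mid          # heat flux assumed monotone in edge temperature
        else:
            T_hi = T_mid
    return 0.5 * (T_lo + T_hi)

def rebuild_gamma_tps(qw_meas_tps, Tw_tps, T_delta, bl_heat_flux,
                      gammas=np.logspace(-4, 0, 200)):
    """Stage 2: build the heat-flux abacus over gamma at the measured wall
    temperature and return the gamma closest to the TPS measurement."""
    qw_pred = np.array([bl_heat_flux(gamma=g, Tw=Tw_tps, T_delta=T_delta) for g in gammas])
    return gammas[np.argmin(np.abs(qw_pred - qw_meas_tps))]
```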

3.2 Thermochemical Ablation The approach here presented is based on a full Navier-Stokes solution with boundary conditions for surface mass and energy balances. This approach has been widely used to study the ablative gas-surface interaction of TPS with the surrounding flowfield [4, 8, 35]. While the gas phase is governed by the Navier-Stokes equations, the surface is not fully simulated, but is approximately modeled. The plasma boundary layer edge condition composed of the temperature Tδ , axial velocity vδ , and species densities ρi,δ , dependent on the temperature and static pressure Ps , is fully described by the experimental-numerical plasma rebuilding procedure described previously in the context of heterogeneous catalysis. Quantities at the boundary layer edge serve as input to a 1D stagnation-line code with ablative boundary condition [34], with the goal to numerically simulate the ablation experiments. A set of balance equations are solved with respect to the conserved quantities of the gas, that is, mass, momentum


and energy, which are imposed as boundary values for the Navier-Stokes equations. For the ablation experiments considered in this work, only a mass balance equation is needed, given that we invoke the no-slip condition for the momentum equations and the wall temperature is imposed as measured. This balance is obtained by limiting the control volume of the mass conservation equation to the thin lamina representing the gas-surface interface [18, 33]. The unknown nitridation reaction efficiency is recovered from the closure model considered for the surface mass balance (see [17] for the detailed derivation):

$$
\gamma_N^{CN}(T_w) = \frac{\dot m_s\, M_{CN}}{\rho_N^w\, M_C}\left(\frac{R\, T_w}{2\pi M_N}\right)^{-1/2},
\tag{3}
$$

where ṁs is the mass loss flux of the ablated material, M denotes the molar masses of the different species, Tw is the (measured) wall temperature, and ρNw is the nitrogen density at the wall. The mass loss ṁs is directly linked to the measured recession rate ṡ through the known material density of the graphite rod, ρs = 1760 kg/m3. The unknown ρNw is computed by accounting, in a fully coupled way, for the effect of the injection of ablation products. We point out that this approach does not take into account any changes in the surface microstructure during ablation and is based solely on steady-state ablation; in addition, it is almost impossible to observe changes of the surface state in situ during ablation in the plasma wind tunnel. It is important to remark that the experimentally derived ρCN is only used to validate the rebuilding methodology, without using its information to retrieve γN^CN.
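The rebuilding of the nitridation efficiency from Eq. (3) is essentially a post-processing step once the stagnation-line solution has provided the wall nitrogen density; a hedged sketch follows (Python/NumPy; the numerical inputs are illustrative placeholders, not measured values from the experiments).

```python
import numpy as np

R_UNIV = 8.314462618                                  # universal gas constant [J mol^-1 K^-1]
M_C, M_N, M_CN = 12.011e-3, 14.007e-3, 26.018e-3      # molar masses [kg/mol]
RHO_GRAPHITE = 1760.0                                 # graphite rod density [kg/m^3], from the text

def nitridation_efficiency(recession_rate, T_wall, rho_N_wall):
    """Eq. (3): gamma_N^CN from the recession rate [m/s], wall temperature [K],
    and the wall nitrogen density [kg/m^3] given by the stagnation-line code."""
    mdot_s = RHO_GRAPHITE * recession_rate            # carbon mass loss flux [kg m^-2 s^-1]
    impingement = np.sqrt(R_UNIV * T_wall / (2.0 * np.pi * M_N))
    return (mdot_s * M_CN) / (rho_N_wall * M_C) / impingement

# Placeholder inputs, for illustration only:
print(nitridation_efficiency(recession_rate=2.0e-6, T_wall=2400.0, rho_N_wall=1.0e-3))
```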

4 Bayesian Approaches to the Inference of Model Parameters

In this section we review the Bayesian formulations proposed for inferring catalytic parameters and nitridation reaction efficiencies from the experiments depicted in Sect. 2. We also report the main results for two different sets of cases: the S1 test from Panerai [29] for ceramic matrix composite materials and the G4-G7 cases from Helber [17] for graphite ablation.

4.1 Bayes Theorem

The inference of model parameters uses the Bayes formula, which can be generally formulated as

$$
P(q \mid M) = \frac{L(M \mid q)\, P(q)}{\displaystyle \int L(M \mid q)\, P(q)\, dq},
\tag{4}
$$


where q is the generic vector of parameters, whose components are the parameters of the analysis, and M is the vector of the measured quantities used for the analysis. In the present Bayesian setting, q and M are real-valued random vectors. We denote by P(q) the prior probability distribution of the parameters, which expresses one's beliefs on possible values of q before the measurements are made available. L(M|q) is the likelihood function, that is, the probability of observing the measurements M given q. Typically, the likelihood compares the measurements with model predictions (functions of q) and relies on a noise model to account for the measurement error; a model error contribution can also be included [19]. Specifically, the comparison can be made on the raw measurements or, more generally, on some derived quantities, which is the case in our study. From the measurements M, Bayesian inference updates the prior distribution P(q) to the posterior probability distribution P(q|M). The posterior distribution in Eq. 4 is usually not known in closed form due to the complexity of the mapping q → M(q), where M(q) is the vector of model predictions obtained with the forward model; the noise and error models can also complicate calculations. Therefore, sampling strategies, such as Markov Chain Monte Carlo (MCMC) methods [25], are needed to estimate the statistics of the posterior distribution of q (e.g., mean, moments, median, and mode). In this work, we use the Adaptive Metropolis (AM) algorithm [15], an extension of the Random-Walk Metropolis (RWM) algorithm, which adapts the proposal covariance matrix using previously sampled points. Lastly, the integral in the denominator of Eq. 4 extends over the whole space of possible values of q. In practical terms, this integral is called the evidence; it is a single number that carries little meaning by itself, but it can be important when comparing different model choices.
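A minimal Adaptive Metropolis sketch in the spirit of [15] is given below (Python/NumPy; the log-posterior callable, initial point and tuning constants are generic placeholders, not the chapter's actual implementation).

```python
import numpy as np

def adaptive_metropolis(log_post, q0, n_steps=20000, adapt_start=1000, eps=1e-8, seed=0):
    """Random-walk Metropolis whose Gaussian proposal covariance is adapted
    from the chain history (Adaptive Metropolis, Haario et al.)."""
    rng = np.random.default_rng(seed)
    d = len(q0)
    scale = 2.38**2 / d                       # standard AM scaling factor
    chain = np.empty((n_steps, d))
    chain[0] = q0
    logp = log_post(q0)
    cov = np.eye(d) * 1e-2                    # initial, non-adaptive proposal covariance
    for k in range(1, n_steps):
        if k > adapt_start:                   # adapt using the samples drawn so far
            cov = scale * (np.cov(chain[:k].T) + eps * np.eye(d))
        proposal = rng.multivariate_normal(chain[k - 1], cov)
        logp_prop = log_post(proposal)
        if np.log(rng.uniform()) < logp_prop - logp:
            chain[k], logp = proposal, logp_prop      # accept
        else:
            chain[k] = chain[k - 1]                   # reject, keep current state
    return chain
```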

4.2 Heterogeneous Catalysis

For the inference of catalytic parameters, we denote the set of experimental data as M = (Ps^meas, Pd^meas, qw^ref,meas, qw^TPS,meas, Tw^ref,meas, Tw^TPS,meas). The issue with this inference is that the model predictions qw = qw(γ, Tw, Pδ, Hδ, NDPs) are not just functions of the catalytic coefficient γ, but also depend on all the inputs of the BL code. It is worth noting that the dependencies of the predictions can be recast as qw = qw(γ, Tw, Ps, Hδ, Pd), given that the non-dimensional parameters are set for each case and the predictions are barely sensitive to them for the given measurements. The edge pressure is taken as the static pressure of the chamber, and the stagnation pressure is used to derive the momentum influx to the boundary layer through the given non-dimensional parameters. As already mentioned, the pressures and wall temperatures are measured in the experiment, but only with limited precision, while the enthalpy Hδ is simply not known. In this Bayesian formulation, we propose to infer the model parameters of both materials (ref and TPS, for the reference and protection material) jointly, given that they play the same role in the inference problem and our level of prior knowledge about the two materials can safely be assumed to be the same. Consequently, there may be zero, or multiple, boundary layer edge conditions


consistent with the measurements. Since the boundary layer edge conditions cannot be completely characterized, the remaining uncertainty should be accounted for when inferring the TPS catalytic coefficient γTPS. One possibility to handle this issue is to consider the whole set of uncertain quantities, not just the quantities of interest γref and γTPS, but also the so-called nuisance parameters; in that case, we have q = (γref, γTPS, Tw^ref, Tw^TPS, Ps, Pd, Hδ) in the inference problem. The introduction of the nuisance parameters induces several difficulties related to the need to specify their prior distributions, the increased dimensionality of the inference space, and the consumption of information for the inference of the nuisance parameters; this last issue is detrimental to learning the parameters of interest. We therefore derive an alternative formulation for the joint inference of the two catalytic coefficients γ = (γref, γTPS). The proposed formulation depends only on the two catalytic coefficients and not on the other nuisance parameters, so that only the prior P(γ) is needed. Assuming independent, unbiased Gaussian measurement errors with magnitudes σ, the proposed likelihood of M reads

$$
L^{\mathrm{opt}}(M \mid \gamma) =
\exp\!\left[-\frac{\left(P_s^{\mathrm{meas}} - P_s^{\mathrm{opt}}(\gamma)\right)^2}{2\sigma_{P_s}^2}\right]
\times
\exp\!\left[-\frac{\left(P_d^{\mathrm{meas}} - P_d^{\mathrm{opt}}(\gamma)\right)^2}{2\sigma_{P_d}^2}\right]
\times
\prod_{i\in\{\mathrm{ref},\mathrm{TPS}\}}
\exp\!\left[-\frac{\left(q_w^{i,\mathrm{meas}} - q_w^{i,\mathrm{opt}}(\gamma)\right)^2}{2\sigma_{q_w}^2}
-\frac{\left(T_w^{i,\mathrm{meas}} - T_w^{i,\mathrm{opt}}(\gamma)\right)^2}{2\sigma_{T_w}^2}\right],
\tag{5}
$$

where the dependence of the optimal values on the two material properties has been made explicit for clarity. Given M and a value of the pair of catalytic coefficients, the optimal nuisance parameters and the associated heat fluxes are determined using the BL code. This optimization is performed with the Nelder-Mead algorithm [27], a gradient-free method requiring only evaluations of the BL model solution; typically, a few hundred resolutions of the BL model are needed to converge to the optimum of (5). The computational cost of the optimization prevents us from using this approach directly to draw samples of γ from their posterior distribution, which motivates the approximation of the optimal (log-)likelihood function by a Gaussian Process surrogate [32]. Figure 2 shows the posterior distributions obtained for experiment S1 reported in [29]. We can observe that the distributions of both γref and γTPS drop to small values at both ends of the range, reducing the support with respect to the proposed prior distributions. This behavior can be explained by the proposed likelihood form, which uses all the available measurements to assess the fitness of the model predictions. It is also important to notice that both distributions have well-defined peaks, around γref ≈ 0.016 and γTPS ≈ 0.01. In this framework, no assumptions are made concerning γref, which is estimated along with the protection material parameter


Fig. 2 Marginal posteriors obtained with the Bayesian formulation

with no differences in their prior knowledge. It can be suggested that a deeper experimental study can provide more insights to the behavior of the reference material and a different prior can be defined for the same analysis where differences in knowledge between the two probes can be then accounted for. The distributions of the optimal nuisance parameters can be obtained by building surrogates on these quantities as functions of γref and γTPS . This is done by using the same Gaussian process methods as for the optimal likelihood. The joint posterior of the catalytic parameters can be then propagated through such surrogates to obtain the resulting distributions of the nuisance parameters.
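A small sketch of the Gaussian-process surrogate step is given below (Python/NumPy, a basic GP regression with a squared-exponential kernel). The idea of fitting the optimal log-likelihood over (γref, γTPS) follows the description above, but the code, the training design and the hyperparameters are only an illustrative stand-in for the actual implementation based on [32].

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.1, variance=1.0):
    """Squared-exponential covariance between two sets of (gamma_ref, gamma_TPS) points."""
    d2 = ((A[:, None, :] - B[None, :, :]) / length_scale) ** 2
    return variance * np.exp(-0.5 * d2.sum(-1))

class GPLogLikelihoodSurrogate:
    """GP regression of the optimal log-likelihood log L^opt(M | gamma)."""
    def __init__(self, X, y, noise=1e-6, **kernel_kw):
        self.X, self.kernel_kw = X, kernel_kw
        K = rbf_kernel(X, X, **kernel_kw) + noise * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))

    def predict(self, X_new):
        return rbf_kernel(X_new, self.X, **self.kernel_kw) @ self.alpha  # posterior mean

# Illustrative use: train on a design of (gamma_ref, gamma_TPS) points where the
# optimal log-likelihood was evaluated with the Nelder-Mead/BL-code procedure.
gammas = np.random.default_rng(1).uniform(0.001, 0.1, size=(30, 2))   # placeholder design
loglik = -((gammas - [0.016, 0.01]) ** 2).sum(1) * 1e3                # placeholder values
surrogate = GPLogLikelihoodSurrogate(gammas, loglik, length_scale=0.05)
print(surrogate.predict(np.array([[0.016, 0.01]])))
```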

4.3 Thermochemical Ablation

For the nitridation experiments, the set of measurements is M = (Ps^meas, Pd^meas, Tw^meas, ṡ^meas, ρCN^meas). It is important to notice that, for a stagnation-line simulation in a reacting flow, the inlet boundary condition must comprise the species densities ρi,δ, the temperature Tδ, the pressure Pδ, and the velocity components


uδ, vδ, while the surface conditions are the wall temperature Tw, which closes the energy equation; we assume a no-slip condition, which closes the momentum equations, and we define surface mass balances for the pseudo-species mass equations. This mass balance needs the chemical mechanism at the surface to be specified, and values must be given to its parameters. All in all, we intend to find the flow solutions that are compatible with our experimental observations under the considered model. This means that, from the set of observations M, we want to infer the inlet conditions and the wall conditions in such a way that we are left with a population of possible flow solutions. As we work with a mixture of nine species, {e−, C+, C2, C3, CN, C, N, N+, N2}, we are left with a 15-dimensional problem if we only consider nitridation at the wall. To properly define our vector of parameters to be inferred q, we need to take into account some physical relationships. Relying on the knowledge of the operating conditions of the Plasmatron, such as electric power, injected mass flow, static pressure, and probe geometry, we can compute non-dimensional parameters that define the momentum influx to the boundary layer (the same parameters as for the catalytic case). These non-dimensional parameters, together with the dynamic pressure expression corrected for viscous effects, Pd/KH = (1/2)ρv2 [3], allow us to relate the inlet velocity components uδ, vδ to the measured dynamic pressure. The need to specify two velocity components follows from the ansatz assumed in the dimensionally reduced Navier-Stokes equations [20]. Generally, it is safe to assume thermochemical equilibrium at the edge of the boundary layer for these conditions, as already studied by Helber [16]; this assumption lets us relate the species densities to the temperature and the pressure, reducing the problem dimensionality further. In practice, these additional physical relations constrain the search for the more likely flow solutions to those with thermo-chemical equilibrium at the inlet and a particular relationship between the influx velocity components, while complying with the dynamic pressure measurements. It is important to mention that the variability of the inlet non-dimensional parameters with the operating conditions is small, as shown by Panerai [28], and they can be assumed to play a negligible role in the inference we want to carry out. In any case, it is possible to test these assumptions by removing this additional structure from the model, thereby freeing the search for flow solutions to arbitrary velocity components. At the end of the analysis, we are left with five parameters to calibrate, namely q = (Ps, Pd, Tw, Tδ, γN^CN). In the experiment, a preliminary step is carried out to measure the free stream condition of the plasma jet Tδ, which is needed for the deterministic rebuilding presented in Sect. 3.2. In this Bayesian approach, we manage to avoid that experimental step because our inference is not sensitive enough to Tδ; this is due to the fact that we do not measure heat fluxes here but instead use recession rates and CN densities. In the Bayesian approach, we can assume ignorance about Tδ and still be able to retrieve what we are looking for, γN^CN. This reduction of observations in our inference problem is an important advantage of the Bayesian approach over the deterministic rebuilding.
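The dynamic-pressure relation mentioned above gives a direct way to recover the inlet velocity magnitude; a hedged sketch follows (Python/NumPy). In practice the mixture density would come from an LTE equilibrium routine at (Ps, Tδ); here it is replaced by a placeholder perfect-gas estimate, and KH stands for the viscous correction factor of [3]. All numbers are illustrative only.

```python
import numpy as np

R_UNIV = 8.314462618   # universal gas constant [J mol^-1 K^-1]

def mixture_density_placeholder(P_s, T_delta, M_mix=0.028):
    """Placeholder perfect-gas density; a real implementation would query an
    LTE equilibrium solver for the plasma mixture composition at (P_s, T_delta)."""
    return P_s * M_mix / (R_UNIV * T_delta)

def inlet_velocity(P_d, P_s, T_delta, K_H=1.0):
    """Invert P_d / K_H = 0.5 * rho * v^2 for the inlet velocity magnitude."""
    rho = mixture_density_placeholder(P_s, T_delta)
    return np.sqrt(2.0 * P_d / (K_H * rho))

# Illustrative numbers only: P_d = 60 Pa, P_s = 1500 Pa, T_delta = 7000 K
print(f"v = {inlet_velocity(60.0, 1500.0, 7000.0):.1f} m/s")
```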


Finally, having our set of measurements and parameters defined, we propose a likelihood of the form

$$
L(M \mid P_s, P_d, T_w, T_\delta, \gamma_N^{CN}) \propto
\exp\!\left[-\frac{|\dot s^{\mathrm{meas}} - \dot s|^2}{2\sigma_{\dot s}^2}\right]
\exp\!\left[-\frac{|P_s^{\mathrm{meas}} - P_s|^2}{2\sigma_{P_s}^2}\right]
\exp\!\left[-\frac{|P_d^{\mathrm{meas}} - P_d|^2}{2\sigma_{P_d}^2}\right]
\exp\!\left[-\frac{|T_w^{\mathrm{meas}} - T_w|^2}{2\sigma_{T_w}^2}\right]
\exp\!\left[-\frac{|\rho_{CN}^{\mathrm{meas}} - \rho_{CN}|^2}{2\sigma_{\rho_{CN}}^2}\right],
\tag{6}
$$

where ṡ = ṡ(Ps, uδ, vδ, Tw, Tδ, γN^CN), ρCN = ρCN(Ps, uδ, vδ, Tw, Tδ, γN^CN), uδ = uδ(Ps, Pd, Tδ) and vδ = vδ(Ps, Pd, Tδ). Notice the dependency of ṡ and ρCN on the velocity components, which are functions of the dynamic pressure Pd, the static pressure Ps, and the temperature Tδ; Ps and Tδ are needed to compute the density of the mixture that appears in the equation relating the dynamic pressure to one of the velocity components. The inference is carried out on the constitutive variables Ps, Pd, and Tδ, while the relationship with the velocity components is computed externally and the velocity components are fed to the model to output ṡ and ρCN. It is simpler to keep the original inputs to the solver, which would allow this framework to be used as it is to assess possible discrepancies of the auxiliary axisymmetric problem (used to obtain the non-dimensional parameters) with the experiments, thereby including the velocities in the inference problem in the future. The computation of the model outputs ṡ and ρCN requires the solution of the 1D stagnation-line problem, which can be expensive to evaluate for a proper statistical characterization of the posterior distribution. To overcome this issue, ṡ and ρCN are again approximated by Gaussian Process surrogate models, whose evaluations are cheaper.

Figure 3 shows the inferences carried out using each piece of information independently and all measurements together. Overall, the calibration seems consistent: the support of the distribution obtained with all measurements is contained within the support of each of the partial calibrations, and some information gain can be seen (reduced support compared to either of the parts). Good agreement is generally found, although we can clearly see that, as the wall temperature increases (from G4 to G7), there seems to be an overprediction of ρCN that is not fully consistent with the measurements of recession rates. This is reflected in the calibration of γN^CN, as it is directly and greatly affected by both measurements. Case G6 (lower left), however, departs from this picture: the overlap of support is very small, and the distribution that combines the information of both measurements does not share most of its support with the calibration obtained using only CN densities. It is clear that, in this case, the results represent a trade-off between the two measurements and the physics must be studied further. The issue could be two-fold: either an epistemic uncertainty underlying the physical model or biased experimental observations.
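For completeness, the sketch below shows how the log of the likelihood in Eq. (6) can be assembled once surrogate predictions of ṡ and ρCN are available (Python; `sdot_surrogate`, `rho_cn_surrogate` and `edge_velocity` are hypothetical callables standing in for the trained Gaussian-process models and the dynamic-pressure relation, and the noise levels are placeholders).

```python
def log_likelihood(theta, meas, sigmas, sdot_surrogate, rho_cn_surrogate, edge_velocity):
    """log of Eq. (6), up to an additive constant.
    theta = (P_s, P_d, T_w, T_delta, gamma_N); meas and sigmas are dicts keyed by
    'sdot', 'rho_cn', 'P_s', 'P_d', 'T_w' holding measurements and noise std devs."""
    P_s, P_d, T_w, T_delta, gamma_N = theta
    u_d, v_d = edge_velocity(P_s, P_d, T_delta)     # from the dynamic-pressure relation
    preds = {
        "sdot":   sdot_surrogate(P_s, u_d, v_d, T_w, T_delta, gamma_N),
        "rho_cn": rho_cn_surrogate(P_s, u_d, v_d, T_w, T_delta, gamma_N),
        "P_s": P_s, "P_d": P_d, "T_w": T_w,
    }
    return -0.5 * sum((meas[k] - preds[k]) ** 2 / sigmas[k] ** 2 for k in preds)
```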


Fig. 3 Marginal posterior distributions of γN^CN from recession rates, CN densities, and both measurements for G4 (upper left), G5 (upper right), G6 (lower left), and G7 (lower right)

5 Conclusions

This contribution reviews selected experiments and inference methods, centered around the VKI plasma wind tunnel, for catalytic and ablative thermal protection materials. The review showcases the complex, multi-scale phenomena involved in atmospheric entry flows, whose understanding poses manifold challenges. We highlight the intricacies present in selected sets of experimental data and how different models and procedures are used to deterministically retrieve catalytic and ablation coefficients; these methods rely heavily on additional experimental data as well as strong assumptions to fill the gaps of missing knowledge. We enrich this review by including some of the latest work on Bayesian inference methods for such experimental datasets developed within the UTOPIAE network. For catalytic materials, the proposed calibration considerably improves the inference results by giving consistent and accurate posterior distributions without the need to assume the value of γref for copper; the results show a reduced support and well-defined peaks for both posteriors of γref and γTPS. For the ablative material considered, preliminary results show promise for the Bayesian method by taking into account different measurements and combining


them together. While the values of γN^CN calibrated from recession rates are consistent across the different experimental conditions explored here, the same cannot be said about γN^CN when calibrated from ρCN. Further investigations need to be conducted to establish whether this is due to an epistemic uncertainty underlying the physical model or to biased experimental observations. All in all, it is possible to say that the model parameters can be learned under the experimental conditions considered. Moreover, these methodologies open the door to a future design of experiments aimed at obtaining the most accurate coefficients.

Acknowledgments This work is fully funded by the European Commission H2020 program, through the UTOPIAE Marie Curie Innovative Training Network, H2020-MSCA-ITN-2016, Grant Agreement number 722734.

References 1. Anderson, J.D.: Hypersonic and High-Temperature Gas Dynamics, 2nd edn. AIAA Education Series (2006) 2. Barbante, P.F.: Accurate and Efficient Modelling of High Temperature Nonequilibrium Air Flows. Ph.D. thesis, ULB/VKI (2001) 3. Barker, M.: On the use of very small pitot-tubes for measuring wind velocity. Proc. Roy. Soc. A Math. Phys. Eng. Sci. 101, 435–445 (1922) 4. Bianchi, D., Nasuti, F., Martelli, E.: Navier-stokes simulations of hypersonic flows with coupled graphite ablation. J. Spacecraft Rockets 47, 554–562 (2010) 5. Bottin, B., Chazot, O., Carbonaro, M., der Haegen, V.V., Paris, S.: The vki plasmatron characteristics and performance. In: Measurement Techniques for High Temperature and Plasma Flows, RTO-EN-8, pp. 6–1, 6–24 (1999) 6. Boubert, P., Vervisch, P.: Cn spectroscopy and physicochemistry in the boundary layer of a c/sic tile in a low pressure nitrogen/carbon dioxide plasma flow. J. Chem. Phys. 112, 10482–10490 (2000) 7. Chapman, S., Cowling, T.G.: The Mathematical Theory of Non-Uniform Gases. Cambridge University Press (1970) 8. Chen, Y.K., Milos, F.: Navier-stokes solutions with finite rate ablation for planetary mission Earth reentries. J. Spacecraft Rockets 42, 961–970 (2005) 9. Chorkendorff, I., Niemantsverdriet, J.W.: Concepts of Modern Catalysis and Kinetics, 3rd edn. Wiley (2017) 10. Degrez, G., Barbante, P., de la Llave, M., Magin, T.E., Chazot, O.: Determination of the catalytic properties of TPS materials in the VKI ICP facilities. In: European Congress on Computational Methods in Applied Sciences and Engineering ECCOMAS Computational Fluid Dynamics Conference 2001 Swansea, Wales, UK, 4–7 September 2001 11. Duffa, G.: Ablative Thermal Protection Systems Modeling. AIAA Education Series (2013) 12. Ferziger, J.H., Kaper, H.G.: Mathematical Theory of Transport Processes in Gases. NorthHolland Publishing Company (1972) 13. Giovangigli, V.: Multicomponent Flow Modeling. Birkhauser (1999) 14. Goulard, R.: On catalytic recombination rates in hypersonic stagnation heat transfer. Jet Propulsion 28(11), 737–745 (1958) 15. Haario, H., Saksman, E., Tamminen, J.: An adaptive metropolis algorithm. Bernoulli 7, 223– 242 (2001)



Bayesian Adaptive Selection Under Prior Ignorance

Tathagata Basu, Matthias C. M. Troffaes, and Jochen Einbeck

1 Introduction

High-dimensional statistical modelling is a popular topic in modern statistics. These types of problems are often hard to deal with using classical methods, and we therefore often rely on regularisation. There are several well-known frequentist methods which are efficient in tackling high-dimensional problems. Tibshirani introduced the least absolute shrinkage and selection operator, or simply LASSO [11]. Fan and Li investigated asymptotic properties for variable selection and introduced SCAD [4]. Zou introduced the adaptive LASSO [12], a weighted version of the LASSO that gives asymptotically unbiased estimates. High-dimensional modelling is equally well investigated in a Bayesian context. George and McCulloch introduced stochastic search variable selection [5], which uses latent variables for the selection of predictors. Ishwaran and Rao used a continuous bimodal prior for the hyper-variances in a spike and slab model to attain sparsity [6]. Park and Casella introduced a hierarchical model using the double exponential distribution [9]. Lykou and Ntzoufras [7] proposed a double exponential distribution for the regression parameters along with Bernoulli distributed latent variables. Several other contributions exist along these lines.

This work is funded by the European Commission's H2020 programme, through the UTOPIAE Marie Curie Innovative Training Network, H2020-MSCA-ITN-2016, Grant Agreement number 722734.

T. Basu · M. C. M. Troffaes · J. Einbeck
Department of Mathematical Sciences, Durham University, Durham, UK
e-mail: [email protected]


In this article, we follow the approach of Narisetty and He [8] to attain sparsity. Moreover, we introduce an additional imprecise beta-Bernoulli prior to specify the selection probabilities of the latent variables, similar to [7]. We perform a sensitivity analysis over these sets of selection probabilities to obtain a robust Bayesian variable selection routine. The rest of the paper is organised as follows: We first define our hierarchical model in Sect. 2, followed by the posterior computation in Sect. 3. We use an orthogonal design case to show the closed-form posteriors and discuss their properties. In Sect. 4, we illustrate our results using both synthetic and real datasets and finally, we draw conclusions in Sect. 5.

2 Model

Let Y := (Y1, . . . , Yn)ᵀ denote the responses and X := (X1, . . . , Xn)ᵀ the corresponding p-dimensional predictors. Then we define a linear model in the following way:

    Y = Xβ + ε,                                                        (1)

where β := (β1, . . . , βp)ᵀ is the vector of regression coefficients and ε := (ε1, . . . , εn)ᵀ is Gaussian noise, so that εi ∼ N(0, σ²) for 1 ≤ i ≤ n. We define the following hierarchical model for linear models, so that for 1 ≤ j ≤ p,

    Y | X, β ∼ N(Xβ, σ²Iₙ)                                             (2)
    βj | γj = 1 ∼ N(0, σ²τ1²)                                          (3)
    βj | γj = 0 ∼ N(0, σ²τ0²)                                          (4)
    γj | qj ∼ Ber(qj)                                                  (5)
    qj ∼ Beta(sαj, s(1 − αj)),                                         (6)

where s > 0 is a fixed constant. The latent variables γ := (γ1, . . . , γp) in the model correspond to the spike and slab prior specification routine, where γj represents the selection of the co-variate xj. We fix a sufficiently small τ0 (1 ≫ τ0² > 0) so that βj | γj = 0 has its probability mass concentrated around zero. Therefore, the probability distribution of βj | γj = 0 represents the spike component of our prior specification. To construct the slab component, we consider τ1² to be large so that τ1 ≫ τ0. This allows the prior for βj | γj = 1 to be flat. We assume σ² to be known for ease of computation. In a more generalised setting, we may place an inverse-gamma prior on σ².
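To make the hierarchy concrete, the following Python sketch (our illustration, not code from the chapter) draws once from the prior in Eqs. (3)–(6) for given prior selection probabilities α; the numerical values of s, τ0 and τ1 are placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_from_prior(alpha, s=2.0, tau0=1e-6, tau1=10.0, sigma=1.0):
    """One draw of (beta, gamma, q) from the spike-and-slab prior, Eqs. (3)-(6)."""
    alpha = np.asarray(alpha)
    q = rng.beta(s * alpha, s * (1.0 - alpha))     # Eq. (6): q_j ~ Beta(s*alpha_j, s*(1 - alpha_j))
    gamma = rng.binomial(1, q)                     # Eq. (5): gamma_j ~ Ber(q_j)
    tau = np.where(gamma == 1, tau1, tau0)         # slab scale if selected, spike scale otherwise
    beta = rng.normal(0.0, sigma * tau)            # Eqs. (3)-(4): beta_j ~ N(0, sigma^2 tau^2)
    return beta, gamma, q

beta, gamma, q = draw_from_prior(np.full(10, 0.5))  # p = 10 co-variates, illustrative alpha
```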


We use imprecise beta priors to specify the selection probabilities q := (q1, . . . , qp). We use α := (α1, . . . , αp) to represent our prior expectation of the selection probabilities q and s to represent the concentration parameter. We consider α ∈ P, where

    P := (0, 1)ᵖ.                                                      (7)

Note that, in our model, we consider a near vacuous set for the elicitation of each αj. That is, for 1 ≫ ε > 0, αj ∈ [ε, 1 − ε]. Therefore, the prior elicitation on the total number of active co-variates lies between pε and p(1 − ε). More generally, we can consider the following:

    P := P1 × · · · × Pp.                                              (8)

Each αj assigns a prior selection probability for each co-variate.

3 Posterior Computation

Let γ := (γ1, . . . , γp) and q := (q1, . . . , qp). The joint posterior of the proposed hierarchical model can be computed in the following way:

    P(β, γ, q | Y, X) ∝ P(Y | X, β) P(β | γ) P(γ | q) P(q).            (9)

For ease of computation we will use the orthogonal design case, i.e., XᵀX = nIₚ.

3.1 Selection Indicators

Using Eq. (9), we write the posterior of γ as

    P(γ | Y, X) = ∫∫ P(β, γ, q | Y, X) dq dβ                           (10)
                ∝ ∫ P(Y | X, β) P(β | γ) [ ∫ P(γ | q) P(q) dq ] dβ.    (11)

Let fγj(βj) be the density of βj | γj as given in Eqs. (3) and (4). So,

    fγj(βj) := (1/(√(2π) σ τγj)) exp(−βj²/(2σ²τγj²)).                  (12)

Since P(γj | qj) = qj^γj (1 − qj)^(1−γj) and qj follows a Beta distribution,

    P(β | γ) ∫ P(γ | q) P(q) dq
        = ∏j ( [f1(βj)]^γj [f0(βj)]^(1−γj) ∫ qj^γj (1 − qj)^(1−γj) P(qj) dqj )   (13)
        = ∏j [αj f1(βj)]^γj [(1 − αj) f0(βj)]^(1−γj).                            (14)

Now, for the orthogonal design case, that is, when XᵀX = nIₚ, we have β̂ = XᵀY/n, where β̂ := (β̂1, . . . , β̂p) are the ordinary least squares estimates. Then,

    P(Y | X, β) = (1/√((2πσ²)ⁿ)) exp(−(1/(2σ²)) ‖Y − Xβ‖₂²)                      (15)
                = (1/√((2πσ²)ⁿ)) exp(−(1/(2σ²)) (nβᵀβ − 2nβᵀβ̂ + YᵀY)).           (16)

Then, combining Eqs. (11), (14) and (16), we have the decomposed posterior of γj such that

    P(γj | Y, X) = Mj ∫ exp(−n(βj − β̂j)²/(2σ²)) [αj f1(βj)]^γj [(1 − αj) f0(βj)]^(1−γj) dβj,   (17)

where Mj is a normalisation constant independent of γj. Then we have

    P(γj = 1 | X, Y) = Mj αj ∫ exp(−n(βj − β̂j)²/(2σ²)) f1(βj) dβj.                             (18)

Now, by completing the square, it can be shown that for k ∈ {0, 1} and j ∈ {1, · · · , p} we have

    exp(−n(βj − β̂j)²/(2σ²)) fk(βj) = wk,j (1/(√(2π) σk)) exp(−(βj − β̂k,j)²/(2σk²)),   (19)

where

    β̂k,j := nτk²β̂j/(nτk² + 1),    σk² := σ²τk²/(nτk² + 1),
    wk,j := (1/√(nτk² + 1)) exp(−nβ̂j²/(2(nσ²τk² + σ²))).

Then,

using Eq. (19), we have

    P(γj = 1 | X, Y) = Mj αj w1,j                                      (20)

and

    P(γj = 0 | X, Y) = Mj (1 − αj) w0,j.                               (21)

Therefore,

    γj | X, Y ∼ Ber( αj w1,j / (αj w1,j + (1 − αj) w0,j) ).            (22)
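These closed-form quantities are straightforward to evaluate numerically. The short Python sketch below (our illustration, not part of the chapter) computes w_{k,j} from Eq. (19) and the posterior selection probability of Eq. (22), given the least-squares estimates β̂j under the orthogonal design; σ², τ0 and τ1 are placeholder values.

```python
import numpy as np

def w_kj(beta_hat, n, sigma2, tau2):
    """Weight w_{k,j} of Eq. (19) for a component with prior scale tau_k^2."""
    return np.exp(-n * beta_hat**2 / (2.0 * sigma2 * (n * tau2 + 1.0))) \
           / np.sqrt(n * tau2 + 1.0)

def selection_probability(beta_hat, alpha, n, sigma2=1.0, tau0=1e-6, tau1=10.0):
    """Posterior selection probability P(gamma_j = 1 | X, Y) of Eq. (22)."""
    w1 = w_kj(beta_hat, n, sigma2, tau1**2)
    w0 = w_kj(beta_hat, n, sigma2, tau0**2)
    return alpha * w1 / (alpha * w1 + (1.0 - alpha) * w0)
```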

Co-variate Selection  For the co-variate selection we investigate the posterior odds of each γj. We assign a co-variate to be non-active when

    sup_{αj ∈ Pj}  P(γj = 1 | X, Y) / P(γj = 0 | X, Y)  < 1,           (23)

for j = 1, · · · , p, or equivalently, when

    sup_{αj ∈ Pj}  w1,j αj / (w0,j (1 − αj))  < 1.                     (24)

Similarly, we assign a co-variate to be active if

    inf_{αj ∈ Pj}  w1,j αj / (w0,j (1 − αj))  > 1.                     (25)

We define the rest to be indeterminate, as this depends on the prior elicitation of the selection probability.

Properties of the Posterior  The posterior odds are given by

    w1,j αj / (w0,j (1 − αj)) = (w1,j/w0,j) (1/(1 − αj) − 1).          (26)

Now, the first derivative of the posterior odds with respect to αj is given by

    (w1,j/w0,j) (1/(1 − αj)²) > 0.                                     (27)

Therefore, the posterior odds are monotone with respect to the prior selection probability αj.

Near Vacuous Set  Let 1 ≫ ε > 0. We define a near vacuous set for the prior selection probability αj, so that αj ∈ [ε, 1 − ε]. Then, because of the monotonicity of the posterior odds, we can compute the posterior odds at the lower and upper bounds of the set instead of over the whole interval. That is,

    sup_{αj ∈ [ε, 1−ε]}  w1,j αj / (w0,j (1 − αj)) = ((1 − ε)/ε) · (w1,j/w0,j)    (28)

and

    inf_{αj ∈ [ε, 1−ε]}  w1,j αj / (w0,j (1 − αj)) = (ε/(1 − ε)) · (w1,j/w0,j).   (29)
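Since the posterior odds are monotone in αj, the three-way decision of Eqs. (23)–(25) only requires the endpoint values in Eqs. (28) and (29). A possible implementation (ours; the default ε, σ², τ0 and τ1 are illustrative) is:

```python
import numpy as np

def classify(beta_hat, n, eps=0.1, sigma2=1.0, tau0=1e-6, tau1=10.0):
    """Label each co-variate as 'active', 'inactive' or 'indeterminate' using
    the bounds of the posterior odds over alpha_j in [eps, 1 - eps]."""
    w1 = np.exp(-n * beta_hat**2 / (2.0 * sigma2 * (n * tau1**2 + 1.0))) / np.sqrt(n * tau1**2 + 1.0)
    w0 = np.exp(-n * beta_hat**2 / (2.0 * sigma2 * (n * tau0**2 + 1.0))) / np.sqrt(n * tau0**2 + 1.0)
    upper = (1.0 - eps) / eps * w1 / w0          # Eq. (28): sup of the posterior odds
    lower = eps / (1.0 - eps) * w1 / w0          # Eq. (29): inf of the posterior odds
    return np.where(upper < 1.0, "inactive",
                    np.where(lower > 1.0, "active", "indeterminate"))
```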

3.2 Regression Coefficients

The joint posterior of the regression coefficients β is given by

    P(β | Y, X) = Σγ ∫ P(β, γ, q | Y, X) dq                                       (30)
                ∝ Σγ ∫ P(Y | X, β) P(β | γ) P(γ | q) P(q) dq                      (31)
                ∝ P(Y | X, β) Σγ [ P(β | γ) ∫ P(γ | q) P(q) dq ].                 (32)

From Eq. (14) we have

    P(β | γ) ∫ P(γ | q) P(q) dq = ∏j [αj f1(βj)]^γj [(1 − αj) f0(βj)]^(1−γj).     (33)

Then we can write Eq. (32) as

    P(β | Y, X) ∝ P(Y | X, β) Σγ ( ∏j [αj f1(βj)]^γj [(1 − αj) f0(βj)]^(1−γj) ).

Therefore, swapping the sum and product operations we get

    P(β | Y, X) ∝ P(Y | X, β) ∏j Σ_{γj} [αj f1(βj)]^γj [(1 − αj) f0(βj)]^(1−γj)   (34)
                ∝ P(Y | X, β) ∏j [αj f1(βj) + (1 − αj) f0(βj)].                   (35)

Now, combining Eqs. (16) and (35) we have

    P(β | Y, X) ∝ exp(−(1/(2σ²)) (nβᵀβ − 2nβᵀβ̂)) ∏j [αj f1(βj) + (1 − αj) f0(βj)]
                ∝ exp(−(n/(2σ²)) ‖β − β̂‖₂²) ∏j [αj f1(βj) + (1 − αj) f0(βj)]      (36)
                ∝ ∏j exp(−n(βj − β̂j)²/(2σ²)) [αj f1(βj) + (1 − αj) f0(βj)].       (37)

Therefore, the βj's are a posteriori independent and, for each 1 ≤ j ≤ p, we have

    P(βj | Y, X) ∝ exp(−n(βj − β̂j)²/(2σ²)) [αj f1(βj) + (1 − αj) f0(βj)].         (38)

Let Wj := αj w1,j + (1 − αj) w0,j. Then, combining Eqs. (38) and (19), we have

    βj | Y, X ∼ (αj w1,j/Wj) N(β̂1,j, σ1²) + ((1 − αj) w0,j/Wj) N(β̂0,j, σ0²).      (39)

Properties of the Posterior  To analyse the properties of the posterior, we first consider the ratio of the weights in Eq. (39). For 1 ≤ j ≤ p, the ratio of the weights is given by

    αj w1,j / ((1 − αj) w0,j).                                                     (40)

This corresponds to the posterior odds of the selection indicators. Therefore, for active co-variates this ratio becomes greater than 1 for all αj ∈ (0, 1) and N(β̂1,j, σ1²) dominates the posterior. Similarly, for non-active co-variates this ratio becomes less than 1 for all values of αj and N(β̂0,j, σ0²) dominates the posterior. An interesting case occurs when τ0 ≪ 1/n and αj ∈ [ε, 1 − ε]. Then, N(β̂1,j, σ1²) dominates the posterior if

    β̂j² > (σ²/n) ((nτ1² + 1)/(nτ1²)) [2 log((1 − ε)/ε) + log(nτ1² + 1)],          (41)

and similarly, N(β̂0,j, σ0²) dominates the posterior if

    β̂j² < (σ²/n) ((nτ1² + 1)/(nτ1²)) [2 log(ε/(1 − ε)) + log(nτ1² + 1)].          (42)

Posterior Mean and Variance  The posterior expectation of βj is given by

    E(βj | Y, X) = (αj w1,j/Wj) β̂1,j + ((1 − αj) w0,j/Wj) β̂0,j.                    (43)

Similarly, the posterior variance of βj is given by

    Var(βj | Y, X) = (αj w1,j/Wj)(σ1² + β̂1,j²) + ((1 − αj) w0,j/Wj)(σ0² + β̂0,j²)
                       − [(αj w1,j β̂1,j + (1 − αj) w0,j β̂0,j)/Wj]²                 (44)
                   = (αj w1,j σ1² + (1 − αj) w0,j σ0²)/Wj
                       + (αj w1,j β̂1,j² + (1 − αj) w0,j β̂0,j²)/Wj
                       − [(αj w1,j β̂1,j + (1 − αj) w0,j β̂0,j)/Wj]²                 (45)
                   = (αj w1,j σ1² + (1 − αj) w0,j σ0²)/Wj
                       + αj(1 − αj) w1,j w0,j (β̂1,j − β̂0,j)²/Wj².                  (46)

Therefore, we get a set of posterior variances Sj such that

    Sj = { (αj w1,j σ1² + (1 − αj) w0,j σ0²)/Wj + αj(1 − αj) w1,j w0,j (β̂1,j − β̂0,j)²/Wj² : αj ∈ (0, 1) },   (47)

where wk,j and σk are as defined before.
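For a fixed αj, the posterior in Eq. (39) is a two-component normal mixture, so Eqs. (43) and (46) are the usual mixture identities for its mean and variance. A small, self-contained sketch (ours) of these two formulas:

```python
def posterior_mean_var(alpha, w1, w0, b1, b0, s1sq, s0sq):
    """Posterior mean (Eq. 43) and variance (Eq. 46) of beta_j | Y, X,
    given the weights w_{k,j}, component means b_k = beta_hat_{k,j}
    and component variances s_ksq = sigma_k^2 defined in Eq. (19)."""
    W = alpha * w1 + (1.0 - alpha) * w0
    mean = (alpha * w1 * b1 + (1.0 - alpha) * w0 * b0) / W
    var = (alpha * w1 * s1sq + (1.0 - alpha) * w0 * s0sq) / W \
        + alpha * (1.0 - alpha) * w1 * w0 * (b1 - b0) ** 2 / W ** 2
    return mean, var
```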


4 Illustration

We analyse both synthetic datasets and a real dataset to illustrate our method.

4.1 Synthetic Datasets

We use three different synthetic datasets to showcase the performance of our method in terms of variable selection.

Synthetic Dataset 1  In this dataset, we construct an orthogonal design matrix Xi,j with 100 predictors and 100 observations. We assign the regression coefficients to be (β1, · · · , β6) := (100, 125, −80, 100, 200, −150) and βj = 0 for j > 6. We consider standard normal noise to construct the response vector yi = Σ_{j=1}^{6} Xi,j βj + εi, where εi ∼ N(0, 1) for i = 1, · · · , 100. This setting allows us to evaluate the performance of our method with only strong non-zero effects. We analyse this dataset with two different sets of α's and three different choices of τ1. We show the summary in Table 1.

Synthetic Dataset 2  In this case, we construct a design matrix similar to that of synthetic dataset 1. We assign the regression coefficients such that the first 12 βj's represent a strong effect and the next 20 βj's represent a mild effect. We set βj = 0 for j > 32. We construct the response vector in the following way: yi = Σ_{j=1}^{32} Xi,j βj + εi, where εi ∼ N(0, 1) for i = 1, · · · , 100. This type of coefficient assignment allows us to investigate both small and large effects within the model. We analyse this dataset with two different sets of α's and three different choices of τ1. We show the summary in Table 1. We observe that in this case the choice of τ1 plays an important role.

Synthetic Dataset 3  We use the third synthetic dataset to illustrate the high-dimensional case. We construct the design matrix with 100 observations and 200 predictors. We assign the first 12 regression coefficients to demonstrate large effects and the next 28 as small effects. We set the rest of the regression coefficients to zero, i.e., βj = 0 for j > 40. We construct the response vector in a similar fashion as for synthetic datasets 1 and 2. We use two different sets of weights, and three different τ1's for each set of weights. We provide the summary in Table 1.

In all three cases, we also provide a comparison of different variable selection methods. We use basad [8], blasso [9] and SSLASSO [10] along with our method. We observe that in all cases our method gives similar results to blasso. However, for blasso, we use the median values of the posteriors to identify the variables, unlike our method of computing the posterior expectation of the latent variables. We also notice that for these three synthetic datasets, fixing τ1 = 10 gives us more accurate sets of active co-variates, and the number of indeterminate variables is also smaller. We also observe that for the high-dimensional case, our method is more accurate in detecting the inactive variables, unlike in the other two datasets.
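For concreteness, Synthetic Dataset 1 above could be generated along the following lines (our sketch, not the authors' code; building the orthogonal design from a QR decomposition is just one convenient choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n = p = 100

# Orthogonal design with X^T X = n I_p, obtained from a random orthonormal basis.
Q, _ = np.linalg.qr(rng.standard_normal((n, p)))
X = np.sqrt(n) * Q

beta = np.zeros(p)
beta[:6] = [100, 125, -80, 100, 200, -150]       # strong effects only, beta_j = 0 for j > 6
y = X @ beta + rng.standard_normal(n)            # standard normal noise
beta_hat = X.T @ y / n                           # ordinary least squares under orthogonality
```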


Table 1 Summary of variable selection for three different synthetic datasets.

Dataset 1, active 6 and inactive 94
                                               True active              True inactive
Parameter setting/method                       Act   Inact   Indet      Act   Inact   Indet
α ∈ [0.1, 0.9],  τ0 = 10⁻⁶, τ1 = 1              6     0       0          0     62      32
α ∈ [0.1, 0.9],  τ0 = 10⁻⁶, τ1 = 10             6     0       0          0     86      8
α ∈ [0.1, 0.9],  τ0 = 10⁻⁶, τ1 = 100            6     0       0          0     78      16
α ∈ [0.05, 0.95], τ0 = 10⁻⁶, τ1 = 1             6     0       0          0     0       94
α ∈ [0.05, 0.95], τ0 = 10⁻⁶, τ1 = 10            6     0       0          0     69      25
α ∈ [0.05, 0.95], τ0 = 10⁻⁶, τ1 = 100           6     0       0          0     63      31
BASAD                                           6     0       –          1     93      –
BLASSO (Median)                                 6     0       –          1     93      –
SSL (Double exponential)                        6     0       –          0     94      –

Dataset 2, active 32 and inactive 68
Parameter setting/method                       Act   Inact   Indet      Act   Inact   Indet
α ∈ [0.1, 0.9],  τ0 = 10⁻⁶, τ1 = 1              12    0       20         0     56      12
α ∈ [0.1, 0.9],  τ0 = 10⁻⁶, τ1 = 5              32    0       0          0     68      0
α ∈ [0.1, 0.9],  τ0 = 10⁻⁶, τ1 = 10             32    0       0          0     68      0
α ∈ [0.3, 0.95], τ0 = 10⁻⁶, τ1 = 5              32    0       0          0     51      17
α ∈ [0.3, 0.95], τ0 = 10⁻⁶, τ1 = 10             32    0       0          0     63      5
α ∈ [0.3, 0.95], τ0 = 10⁻⁶, τ1 = 100            32    0       0          0     57      11
BASAD                                           12    20      –          0     68      –
BLASSO (Median)                                 32    0       –          1     67      –
SSL (Double exponential)                        12    20      –          0     68      –

Dataset 3, active 40 and inactive 160
Parameter setting/method                       Act   Inact   Indet      Act   Inact   Indet
α ∈ [0.1, 0.2],  τ0 = 10⁻⁶, τ1 = 1              14    26      0          0     160     0
α ∈ [0.1, 0.2],  τ0 = 10⁻⁶, τ1 = 5              16    14      10         0     160     0
α ∈ [0.1, 0.2],  τ0 = 10⁻⁶, τ1 = 10             19    0       21         0     160     0
α ∈ [0.2, 0.5],  τ0 = 10⁻⁶, τ1 = 5              22    2       16         0     159     1
α ∈ [0.2, 0.5],  τ0 = 10⁻⁶, τ1 = 10             40    0       0          0     160     0
α ∈ [0.2, 0.5],  τ0 = 10⁻⁶, τ1 = 100            17    0       23         0     102     58
BASAD                                           12    28      –          0     160     –
BLASSO (Median)                                 40    0       –          0     160     –
SSL (Double exponential)                        12    28      –          0     160     –

We also see that our method does not identify an inactive co-variate as an active co-variate. However, for the high-dimensional case, our method identifies some of the small effects as inactive for smaller values of τ1, and for larger values of τ1 it tends to identify variables as indeterminate, which can be understood from Eqs. (41) and (42).


4.2 Real Data Analysis

We investigate the Gaia dataset to illustrate our method using real data. This dataset was used for computer experiments [2, 3] prior to the launch of the European Space Agency's Gaia mission [1]. The data contain spectral information for 16 (p) wavelength bands, and four different stellar parameters. In this example, we take the stellar temperature (in Kelvin) as the response variable. The dataset contains 8286 observations which are highly correlated. We show the correlation between the co-variates in Fig. 1. We randomly sample 100 (n) of them to fit our model and 100 more to measure the prediction accuracy. We standardise the dataset for the sake of clearer interpretation. A robust Bayesian routine needs different measure(s) of accuracy as we do not have a single posterior for prediction. We introduce a new measure to evaluate prediction accuracy and call it the minimum squared error. Let

    A(α) := { j : w1,j αj / (w0,j (1 − αj)) > 1 }.                      (48)

Therefore, A(α), or simply A, denotes the set of active variables for each value of α. We define the minimum squared error by

    Minimum Squared Error = min_{α ∈ P} ‖Y − XA β̂A^post‖₂²,            (49)

where β̂A^post := E(βA | Y) is the posterior mean of βA. The sensitivity analysis also creates an indeterminacy in prediction. Therefore, we define a similar measure, called the maximum squared error, over the set of α ∈ P. We use both the minimum and maximum squared error to introduce a new measure to capture the indeterminacy, such that

    Indeterminacy = (Maximum Squared Error − Minimum Squared Error) / Maximum Squared Error.   (50)
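A grid-based approximation of these measures under the orthogonal-design closed forms might look as follows (our sketch; it scans a single common α over a grid rather than optimising over the full set P, and the default τ0, τ1 and σ² are illustrative):

```python
import numpy as np

def squared_error_summary(X, y, alphas, sigma2=1.0, tau0=1e-2, tau1=10.0):
    """Approximate the minimum/maximum squared error (Eq. 49) and the
    indeterminacy (Eq. 50) by scanning a common alpha over a grid."""
    n = X.shape[0]
    beta_hat = X.T @ y / n
    w1 = np.exp(-n * beta_hat**2 / (2 * sigma2 * (n * tau1**2 + 1))) / np.sqrt(n * tau1**2 + 1)
    w0 = np.exp(-n * beta_hat**2 / (2 * sigma2 * (n * tau0**2 + 1))) / np.sqrt(n * tau0**2 + 1)
    b1 = n * tau1**2 * beta_hat / (n * tau1**2 + 1)        # beta_hat_{1,j} from Eq. (19)
    b0 = n * tau0**2 * beta_hat / (n * tau0**2 + 1)        # beta_hat_{0,j} from Eq. (19)
    errors = []
    for a in alphas:
        active = a * w1 / ((1 - a) * w0) > 1.0             # A(alpha), Eq. (48)
        W = a * w1 + (1 - a) * w0
        post_mean = (a * w1 * b1 + (1 - a) * w0 * b0) / W  # Eq. (43)
        resid = y - X[:, active] @ post_mean[active]
        errors.append(float(resid @ resid))
    mn, mx = min(errors), max(errors)
    return mn, mx, (mx - mn) / mx                          # indeterminacy, Eq. (50)
```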

Note that, for classical methods, indeterminacy is equal to zero. Literature [3] suggests that this dataset contains 1–3 main contributory variables. Based on this information, we take two sets for α so that α ∈ [0.05, 0.2] and α ∈ [0.2, 0.4]. We use JAGS to perform our analysis which we show in Table 2. We observe that for α ∈ [0.05, 0.2], our model performs better in terms of minimum squared error as well as indeterminacy. We observe that our method identifies only one active variable (band 6) irrespective of the choice of α. We also observe that unlike the synthetic datasets, we do not have a better choice of τ1 . The higher values of τ1 result in smaller minimum squared errors. However, indeterminacy is much higher than the case where τ1 = 1. We notice that our method is in agreement with Spike and Slab lasso [10] and Bayesian Lasso [9]. For Bayesian lasso, we use the


Fig. 1 Correlation plot matrix of the Gaia dataset

posterior median of the selected variables to fit the model instead of the posterior mean. We see that basad [8] selects two active variables (bands 2 and 6). This can be related to our setting, where indeterminate co-variates contribute to a higher minimum squared error.

5 Conclusion

Bayesian variable selection is a very important topic in modern statistics. In this paper, we discuss a novel and robust Bayesian variable selection routine based on the notion of spike and slab priors. The robustness within the hierarchical model is introduced using an imprecise beta model, which allows us to incorporate prior elicitation in a more flexible way. We inspect posterior properties of the regression coefficients and selection indicators for the orthogonal design case. For the illustration of our method, we use three synthetic datasets covering different aspects of design matrices


Table 2 Comparison of different methods for the Gaia dataset.

Parameter setting/method                 Act   Inact   Indet   Min. SE   Indeterminacy
α ∈ [0.2, 0.4],  τ0 = 10⁻¹, τ1 = 1        1     11      4       8.44      0.57
α ∈ [0.2, 0.4],  τ0 = 10⁻¹, τ1 = 10       1     15      0       8.31      0.49
α ∈ [0.2, 0.4],  τ0 = 10⁻¹, τ1 = 50       1     12      3       8.26      0.56
α ∈ [0.2, 0.4],  τ0 = 10⁻², τ1 = 1        1     13      2       8.07      0.38
α ∈ [0.2, 0.4],  τ0 = 10⁻², τ1 = 10       1     13      2       8.38      0.25
α ∈ [0.2, 0.4],  τ0 = 10⁻², τ1 = 50       1     15      0       8.85      0.34
α ∈ [0.05, 0.2], τ0 = 10⁻², τ1 = 1        1     15      0       8.14      0.19
α ∈ [0.05, 0.2], τ0 = 10⁻², τ1 = 10       1     15      0       8.20      0.58
BASAD                                     2     14      –       10.83     0
BLASSO (Median)                           1     15      –       8.16      0
SSL (Double exponential)                  1     15      –       8.14      0

and a real-life dataset to evaluate the performance of our method in general settings. Under a suitable scaling parameter, our method outperforms the other methods in variable selection on the synthetic datasets. For the real dataset considered, it is in good agreement with the other methods.

References 1. ESA science & technology: Gaia. http://sci.esa.int/gaia, accessed: 2018-02-06 2. Bailer-Jones, C.A.L.: The ILIUM forward modelling algorithm for multivariate parameter estimation and its application to derive stellar parameters from Gaia spectrophotometry. Mon. Not. R. Astron. Soc. 403(1), 96–116 (2010). https://doi.org/10.1111/j.1365-2966.2009. 16125.x 3. Einbeck, J., Evers, L., Bailer-Jones, C.: Representing complex data using localized principal components with application to astronomical data. In: Gorban, A.N., Kégl, B., Wunsch, D.C., Zinovyev, A.Y. (eds.) Principal Manifolds for Data Visualization and Dimension Reduction, pp. 178–201. Springer, Berlin, Heidelberg (2008) 4. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001). https://doi.org/10.1198/016214501753382273 5. George, E.I., McCulloch, R.E.: Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88(423), 881–889 (1993). http://www.jstor.org/stable/2290777 6. Ishwaran, H., Rao, J.S.: Spike and slab variable selection: Frequentist and bayesian strategies. Ann. Stat. 33(2), 730–773 (2005). https://doi.org/10.1214/009053604000001147 7. Lykou, A., Ntzoufras, I.: On Bayesian lasso variable selection and the specification of the shrinkage parameter. Stat. Comput. 23(3), 361–390 (2013). https://doi.org/10.1007/s11222012-9316-x 8. Narisetty, N.N., He, X.: Bayesian variable selection with shrinking and diffusing priors. Ann. Stat. 42(2), 789–817 (2014). https://doi.org/10.1214/14-AOS1207 9. Park, T., Casella, G.: The Bayesian lasso. J. Am. Stat. Assoc. 103(482), 681–686 (2008). https://doi.org/10.1198/016214508000000337 10. Ro˘cková, V., George, E.I.: The spike-and-slab lasso. J. Am. Stat. Assoc. 113(521), 431–444 (2018). https://doi.org/10.1080/01621459.2016.1260469


11. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B (Stat. Methodol.) 58(1), 267–288 (1996). http://www.jstor.org/stable/2346178 12. Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101(476), 1418–1429 (2006). https://doi.org/10.1198/016214506000000735

A Machine-Learning Framework for Plasma-Assisted Combustion Using Principal Component Analysis and Gaussian Process Regression

Aurélie Bellemans, Mohammad Rafi Malik, Fabrizio Bisetti, and Alessandro Parente

1 Introduction Nonequilibrium plasma discharges show a positive effect on the ignition of combustion mixtures where conventional methods fail. With every discharge, high-energy electrons are introduced into the flow and lead to the ionization and excitation of the neutral molecules. Consequently, a sustained amount of radicals, ions, and vibrationally and electronically excited states are produced. These reactive species will initiate important chain branching reactions toward the ignition of the mixture.

Supported by the F.R.S.-FNRS.

A. Bellemans
Thermo and Fluid Dynamics (FLOW), Faculty of Engineering, Vrije Universiteit Brussel, Brussels, Belgium
Combustion and Robust Optimisation (BURN), Université Libre de Bruxelles and Vrije Universiteit Brussel, Brussels, Belgium
University of Texas at Austin, Department of Aerospace Engineering and Engineering Mechanics, Austin, TX, USA
e-mail: [email protected]

M. R. Malik · A. Parente
Ecole Polytechnique de Bruxelles, Département d'Aéro-Thermo-Mécanique, Université Libre de Bruxelles, Brussels, Belgium
Combustion and Robust Optimisation (BURN), Université Libre de Bruxelles and Vrije Universiteit Brussel, Brussels, Belgium

F. Bisetti
University of Texas at Austin, Department of Aerospace Engineering and Engineering Mechanics, Austin, TX, USA


A comprehensive overview on the benefits of using plasma discharges to enhance combustion applications is given in several review articles [14, 15]. Detailed kinetic mechanisms were developed to describe the complex interaction between the nonequilibrium plasma and the combustion chemistry [1]. Consequently, these mechanisms contain a large number of species and reactions that are active over a wide range of time scales, that is, nanoseconds for plasma reactions and milliseconds for combustion interactions. The large size of plasma-assisted combustion (PAC) mechanisms makes them impractical for their implementation in large unsteady multi-dimensional simulations. Developing a reliable surrogate model starting from the detailed physics offers a solution to alleviate the computational cost of the simulations. The objective is to represent the reaction kinetics using a reduced number of variables that reflect the behavior of the full system. Recent work has investigated various data-driven reduction techniques to obtain surrogate representations of the detailed plasma-assisted combustion physics ranging from graph-based techniques [3] to correlation-based techniques using principal component analysis [2]. A first class of techniques is given by the graph-based methods, such as the Directed Relation Graph method with Error Propagation (DRGEP) [9]. In this method, the chemical kinetics of a system is represented by a network in which each node represents a species. The nodes are interconnected by weighted edges where the weighting coefficient determines the importance of one species to another through elementary reactions. A reduced model is obtained by discarding unimportant nodes and edges in the graph. DRGEP is an acknowledged technique in the combustion community, and has been used as a standalone method or in combination with other methods for several applications [11, 13]. Recently, we have extended this technique to plasma-assisted combustion cases [3]. The novel PDRGEP method, with “P” for “plasma,” relies on new reduction targets and error metrics that conserve the energy branching in the system. The so-called energy branching is the amount of energy from the plasma discharge that is transferred to the internal states of the neutral molecules to create ions, and vibrationally and electronically excited states. P-DRGEP ensures that the effect of the plasma on the combustion chemistry is conserved throughout the reduction. Alternatively, principal component analysis (PCA) was successfully used to develop high-fidelity surrogates to study the plasma-assisted ignition of propane/argon and ethylene/air mixtures using a nanosecond pulse discharge. PCA is a data-driven reduction technique in which the original state-space is projected on a reduced set of new variables: the principal components. These components are a linear combination of the original variables and contain most of the variance in the system. In previous work on plasma-assisted combustion, the PCA-based model allowed reducing the number of variables by half from 163 species to 80 principal components without introducing any inaccuracies [2]. The objective of this chapter is to extend this research further and use a machinelearning framework combining principal component analysis with Gaussian process regression (GPR) in order to optimally reduce the dimensionality of surrogate PAC mechanisms. 
In this work, we will demonstrate how the entire state-space can be regressed as a function of a selected number of principal components using


GPR. The method will be demonstrated on a detailed zero-dimensional simulation for the plasma-assisted ignition of ethylene-air mixtures using nanosecond pulse discharges. This work is structured as follows: Sect. 2 presents the reactor model and the ignition test cases, Sect. 3 describes the machine-learning framework based on PCA and GPR, and Sect. 4 presents the results on plasma-assisted combustion test cases for ethylene-air mixtures.

2 Reactor Model and Ignition Simulations

The following ordinary differential equations describe the time evolution of a fuel-air mixture in a closed isochoric and adiabatic reactor:

    dce/dt = ωe,    dci/dt = ωi,   i ≠ e.                              (1)

The concentration of electrons is denoted by ce, and ci denotes the concentration of each species i other than electrons (i ≠ e). ωe and ωi are the molar production rates of electrons and of species i, respectively. A two-temperature model [4] is used to describe the energy evolution of the mixture: Te is the electron temperature and T indicates the temperature of all other species. The evolution equations for T and Te are

    ( Σ_{i≠e} cvi ci ) dT/dt = −Σ_{i≠e} ωi ui − Qel − Qix − Ql,        (2)

    cve ce dTe/dt = −ωe ue + Qel + Qix + Ql + QE.                      (3)

The molar internal energy of the electrons is given by ue = ue (Te ) and ui = ui (T ) for all species i. Ue = ue ce is the internal energy density for the electrons and Ui = ui ci the one for all other particles. cvi is the specific heat at constant volume of species i and cve = 3kB /2 for the electrons. In the latter, kB is the Boltzmann constant. The energy lost by the electrons through recombination processes is given by Ql , Ql =



−ue NA qk .

(4)

k∈K

K is the set of recombination reactions, qk the rate coefficient for reaction k, ue the internal energy of the electrons, and NA the Avogadro number. Qix is the inelastic energy lost by the electrons due to ionization, dissociation, and excitation processes, Qix =

 ∈L

−Eexc, NA q ,

(5)

382

A. Bellemans et al.

where L is the set of reactions involved, and Eexc,ℓ is the excitation or ionization energy. Qel describes the elastic energy exchanges,

    Qel = 3kB ( Σ_{i∈S, i≠e} νi^el me/mi ) ne (Te − T).                 (6)

mi and me are the masses of species i and the electron, and νi^el is the elastic collision frequency between species i and the electron. The mixture is excited by a pulsed discharge until ignition occurs. The power deposited by the discharge, QE, is modeled as a Gaussian source term in the electron energy equation (3):

    QE(t) = (E/(σ√(2π))) exp(−(t − μ)²/(2σ²)).                          (7)

Fig. 1 A burst of pulses produces a sustained concentration of species and radicals for a stoichiometric C2 H4 -air mixture at 800 K and 0.5 atm. Ignition is achieved after 4 discharge pulses [3]


μ is the time of peak power, σ the pulse width related to the Full-Width-Half-Max, FWHM = 2√(2 ln 2) σ ≈ 2.355σ, and E the energy density of the pulse. The pulse frequency f and energy parameters are chosen in accordance with experiments to result in an ignition within 10 to 100 μs of the first pulse [8]. The governing equations (1) and (2) are integrated with CVODE [5] in an in-house computer program (PACMAN). The program is coupled to the CHEMKIN library [7] for the calculation of thermodynamic and kinetic properties. A detailed kinetic mechanism with 163 species and 1167 reactions [1] was developed to study the plasma-assisted ignition of a mixture of fuel and air. In this work, a derived skeletal mechanism for ethylene-air consisting of 55 species and 236 reactions [3] will be considered. A burst of pulses deposits energy into the mixture until ignition is achieved, as shown in Fig. 1. The reactor is initialized with pressure p = 0.5 atm, temperatures T = Te = 600–1000 K, and a mixture of ethylene and air with equivalence ratio φ = 1. The discharge parameters are chosen to guarantee ignition within O(100 μs) of the first pulse. The values of energy density per pulse


Table 1 Configuration used for the ignition of ethylene-air mixtures using a burst of pulses

Temperature, T0 [K]               600–1000
Pressure, p0 [atm]                0.5
Equivalence ratio, φ              1
Peak power density [kW/cm³]       2000
FWHM [ns]                         15
Pulse frequency, f [kHz]          100
Number of pulses to ignition      4–8
Time to ignition, τig [μs]        20–60

employed in the study are comparable to those in experimental studies on plasma-assisted ignition and are reported in Table 1. As described in previous work [6], the energy deposited by a series of pulse discharges has a positive effect on the ignition process. This beneficial effect is mainly attributed to a kinetic enhancement in which excited particles, mostly of O2 and N2, are created through collisions with energetic electrons. These excited particles are thereafter used in quenching reactions (e.g., N2* + O2 −→ N2 + 2 O) to produce radicals that will provoke chain branching reactions. The evolution of the combustion process is evaluated by tracking the number density of CO2. The exothermic release of carbon dioxide increases abruptly during ignition. The moment in time when the rate of change in the number density of CO2 is maximum is defined as the time of ignition t*. The time to ignition is therefore defined as τig = t* − t1, where t1 is the timing of the peak discharge power during the first pulse.
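For reference, a minimal sketch (ours, not the PACMAN implementation) of the pulsed power deposition of Eq. (7) is given below, using the FWHM-to-σ conversion given above and the nominal pulse parameters of Table 1; the energy density per pulse is left as an input since only its order of magnitude is discussed here.

```python
import numpy as np

def pulse_power(t, energy_density, fwhm, t_peak):
    """Gaussian power deposition Q_E(t) of Eq. (7) for a single discharge pulse."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))   # FWHM = 2*sqrt(2 ln 2)*sigma
    return energy_density / (sigma * np.sqrt(2.0 * np.pi)) \
        * np.exp(-0.5 * ((t - t_peak) / sigma) ** 2)

def burst_power(t, energy_density, fwhm=15e-9, freq=100e3, n_pulses=4, t1=0.0):
    """Total Q_E(t) for a burst of n_pulses at repetition frequency freq (Table 1)."""
    return sum(pulse_power(t, energy_density, fwhm, t1 + k / freq)
               for k in range(n_pulses))
```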

3 PCA-Based Gaussian Process Regression

Principal component analysis is a data-driven method to reduce the dimensionality of large kinetic mechanisms by projecting a detailed system consisting of Q variables on a smaller, approximate basis with only q < Q principal components. Training data are sampled from simulations performed with the detailed kinetic mechanism. The data contain the value of all variables for each sample. In this case, the variables correspond to the molar concentrations of the species in the mechanism. The sample data are organized in a matrix C of size [n × Q], with n the number of samples in time:

    C = [ c11  ···  c1Q
          ⋮    ⋱    ⋮
          cn1  ···  cnQ ].                                             (8)

The data are preprocessed before the analysis is carried out. Outlying samples are removed, and the data are centered and scaled. An overview of various preprocess-


ing techniques for chemically reactive systems is given by Parente et al. [12]. An eigenvalue problem is solved for the covariance matrix S using the observed data to retrieve the principal components. The matrix of the right eigenvectors A, resulting from the largest eigenvalues L, corresponds to the principal components, also called scores, in the reduced representation,

    S = (1/(n − 1)) CᵀC = ALAᵀ.                                        (9)

The matrix of eigenvectors is truncated to a matrix Aq containing only the q < Q eigenvectors corresponding to the highest eigenvalues or variance, as described in Sutherland et al. [12],

    Zq = C Aq,                                                         (10)

    C̃q = Zq Aqᵀ.                                                       (11)

The principal components are a linear combination of the original variables. When inverting Eq. (10), one recovers the original data as shown in Eq. (11). The scores do not relate to the original space given by the conserved variables, which are in this case the molar concentrations, but do relate to the eigenvectors of the variables. The governing equations must be solved in this space and should be rewritten accordingly. Additionally, the species source terms are transformed to the space of principal components by using the truncated matrix of eigenvectors Aq:

    ωZ = ωC Aq.                                                        (12)
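A compact NumPy sketch of this reduction step (ours, not the PACMAN/CHEMKIN implementation) is given below; the centering and MAX scaling follow the preprocessing described above, although the exact scaling convention used here is our assumption, and the training matrix C is assumed to hold the sampled molar concentrations.

```python
import numpy as np

def pca_reduce(C, q):
    """Center/MAX-scale the samples, build the PCA basis (Eq. 9), project onto the
    leading q eigenvectors (Eq. 10) and reconstruct the state (Eq. 11)."""
    mean = C.mean(axis=0)
    scale = np.abs(C - mean).max(axis=0)            # MAX scaling of the centered data
    Cs = (C - mean) / scale
    S = Cs.T @ Cs / (Cs.shape[0] - 1)               # covariance matrix, Eq. (9)
    eigval, A = np.linalg.eigh(S)                   # eigenvalues in ascending order
    A_q = A[:, np.argsort(eigval)[::-1][:q]]        # q leading eigenvectors
    Z_q = Cs @ A_q                                  # principal component scores, Eq. (10)
    C_rec = (Z_q @ A_q.T) * scale + mean            # approximate reconstruction, Eq. (11)
    return Z_q, A_q, C_rec

# Source terms are projected with the same truncated basis, Eq. (12): omega_Z = omega_C @ A_q
```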

In order to optimally reduce the dimensionality of the PAC problem, we can explore the use of regression processes in combination with PCA in a machine-learning framework. The objective is to map the variables in the state-space (temperatures, species concentrations) and their source terms as functions of the principal component scores using a regression function,

    φ ≈ fφ(Zq),                                                        (13)

with φ the set of variables and source terms in the original state-space, fφ the regression function, and Zq the PC-basis. In this work, we propose to investigate Gaussian process based nonlinear regression as it showed promising results for combustion applications [10]. In Gaussian process regression, each variable is mapped as a Gaussian data subset. GPR is nonparametric and therefore not limited by a functional form. Instead of calculating the probability distribution function of parameters of a specific function, GPR will calculate the probability distribution over all admissible functions that fit the data. Thus, we will define a prior on the function space, calculate the posterior using the training data and compute the predictive posterior distribution on our points of interest. The Gaussian process prior (GP) can be defined as follows,

    fφ ≈ GP(m(x), K(x, x′)),                                           (14)

with x and x′ the observed variables, m(x) a mean function, and K(x, x′) a covariance function, also called the kernel. Popular kernels are the Matérn covariance function, the squared exponential function, artificial neural networks (ANN), and the piece-wise polynomial function. In this project, we will compare all those kernels and select the optimal solution for PAC test cases. Using GPR, we will map all the state variables, such as the species concentrations and temperatures, and the PC source terms, to the principal components: φ ≈ GP(Zq). In a classic PCA approach, the principal components are solved and all state variables are retrieved at each time step using the PCA matrices. In this novel machine-learning approach, we will still solve the principal components, but retrieve the original state space using regression and tabulation. Also the source terms for the principal components will be retrieved using GPR. The benefit of using this framework is that we can drastically decrease the number of components compared to a traditional application of PCA.
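As an illustration of the mapping φ ≈ GP(Zq) with a squared exponential kernel, a possible scikit-learn sketch (ours; the chapter itself uses the GPML toolbox, and the kernel settings below are arbitrary defaults) is:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def fit_state_map(Z_train, phi_train):
    """Fit one GP per state variable, mapping the q PC scores to that variable."""
    kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(Z_train.shape[1]))
    models = []
    for column in phi_train.T:            # loop over species, temperatures, PC source terms
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gp.fit(Z_train, column)
        models.append(gp)
    return models

def predict_state(models, Z_new):
    """Tabulate/evaluate the regressed state for new PC scores."""
    return np.column_stack([gp.predict(Z_new) for gp in models])
```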

4 Results

Data are assembled by combining several zero-dimensional simulations for the ignition of a stoichiometric ethylene-air mixture with initial temperatures ranging from 600 to 1000 K. The resulting test matrix consequently contains nine individual 0D trajectories. The data are sampled using two different time scales: 0.1 ns for the discharge phase and 1 ms for the combustion chemistry in between pulses. First, the mechanism is reduced using principal component analysis on the data collected from the ignition trajectories. In a next step, the entire state-space is regressed as a function of a selected number of principal components in order to further compress the reduced-order model. More specifically, PCA is combined with Gaussian process regression in a machine-learning approach using the GPML toolbox.

4.1 Principal Component Analysis First, the mechanism of 55 species will be reduced by relying solely on principal component analysis. This analysis is different from previous investigations where the original detailed mechanism of 163 species was reduced to 80 principal components [2]. In this work, we will start from the skeletal mechanism of 55 species, which already accounts for a major compression. The objective is to study whether its dimension can be reduced further using PCA. The analysis is carried out on the data containing the ensemble of 0D ignition trajectories. The data are centered and scaled using MAX scaling [12]. The algorithm retrieves 44 principal components to represent the entire state-space.


Fig. 2 Comparison of the species number densities for simulations with the detailed model (full lines) and PCA model with 44 PCs (dashed lines)

Consequently, the transport equations in the 0D model are now expressed in terms of principal components instead of species and temperatures. New simulations were performed in the PACMAN code using these 44 components as governing variables. Figure 2 shows the time evolution of O2, CO2, O, H, OH, H2O, and CO for simulations with the detailed model of 55 species and the PCA model with 44 principal components. The species are reasonably well predicted throughout the simulations compared to the simulations with the original mechanism. It can be concluded that PCA alone provides only a limited dimension reduction of the skeletal mechanism. The 55 species in the skeletal mechanism were reduced to 44 principal components. This outcome is a reasonable result, as the skeletal mechanism is already a condensed form of the original one. In order to further scale down the size of the mechanism, we will combine PCA with Gaussian process regression in the next section.

4.2 Combination of PCA with Gaussian Process Regression

The combination of PCA with GPR is the next step in the machine-learning framework for plasma-assisted combustion. As a reminder, the detailed kinetic scheme originally consists of 163 species. A first reduction from 163 to 55 species was performed in previous research [3] using the graph-based technique P-DRGEP. Next, we applied principal component analysis, which allowed a compression from 55 species to 44 PC's. The present section presents the next step in the methodology, namely the coupling of PCA with GPR. A scheme detailing each task in this process is given in Fig. 3. The entire state-space (species and temperatures) is regressed as a function of a small number of q principal components: φ ≈ GP(Z1, . . . , Zq). The number of components necessary to perform the regression, q, is determined through an

Fig. 3 Scheme illustrating the dimension reduction process combining principal component analysis and Gaussian process regression: detailed model for plasma-assisted combustion (163 species) → Plasma-DRGEP (55 species) → principal component analysis → selection of q principal components → Gaussian process regression to map the state space → new predictions, iterating q = q − 1 until an optimal PCA-GPR model is obtained

iteration process. In order to find this initial number, we have analyzed the principal component weighting factors as shown in Fig. 4. The objective is to start with a set of components that represents most of the variables (species and temperatures) throughout the simulation. PC’s 1 and 2 are dominated by the electron temperature Te and the gas temperature T , respectively. This implies that all the variables that are closely related to these temperatures could potentially be regressed in function of these 2 principal components. PC 3 has major components related to nitrogen and its vibrationally excited states. PC 4 and PC 5 are dominated by a combination of combustion and plasma species. Considering the aforementioned information, the initial number of components is set to 5 to represent the full spectrum of variables and reduced by one through every regression loop. The quality of the regression is assessed by comparing the evaluations for a test matrix against the detailed model. A minimization problem is solved to determine the hyper parameters in the kernel, that is, the mean function and the standard deviation. The squared exponential covariance function was selected for this type of application as a result from previous research on combustion test cases [10]. The mean function is set to zero. Following the aforementioned strategy, the number of principal components retained for the regressions is determined through an iteration process. Previous


Fig. 4 The weighting factors represent the contribution of the original variable to the principal component. (a) Principal component 1. (b) Principal component 2. (c) Principal component 3. (d) Principal component 4. (e) Principal component 5


Fig. 5 Gas temperature as a function of the two principal components. (a) First principal component. (b) Second principal component

work has demonstrated that GPR is more efficient for a low number of components. Combining this observation with earlier findings, the initial number of principal components is set to 5 for the regression and is decreased until a satisfactory model is obtained. A data sample of 5000 points was selected from the ensemble of observations containing the data from all ignition trajectories to train the model and conduct the GPR. A test matrix of 1000 points is generated to predict new points using the surrogate model based on PCA and GPR. The test matrix is different from the data used to train the model. In a first regression model for PAC, the data were regressed in function of an optimized number of q = 2 principal components: φ ≈ GP (Z1 , Z2 ). This implies that only two variables need to be solved in order to retrieve the entire detailed dynamics of the system. In a numerical algorithm, the code will advance the time evolution of the two principal components, and the entire state-space will be reconstructed using tables generated by the GPR model. The results presented in this work offer a preliminary study of the PCA-GPR reduction, without performing new simulations with the reactor code. The two principal components in the model are well suited to represent the detailed thermal evolution of the combustion problem. The latter is illustrated in Fig. 5 in which the temperature is plotted against the first two principal components. This result is expected as the principal component weights show major contributions of the temperature in the first two components (Fig. 4). To evaluate the accuracy of the model, key variables such as the temperature and species are predicted with GPR and compared to the data for the detailed simulations. Figure 6 compares the temperature evolution for the ensemble of ignition trajectories between the detailed simulation and the GPR model using two principal components. The GPR predictions are in good agreement with the


Fig. 6 Comparison of the gas temperature between the detailed model (blue crosses) and the model using two principal components in combination with Gaussian process regression (squared exponential kernel) (red dots)


Fig. 7 Comparison of species mass fraction between the detailed model (blue crosses) and the model using two principal components in combination with Gaussian process regression (squared exponential kernel) (red dots). (a) C2 H4 . (b) O2

detailed simulation. However, a small error of a few percent can be observed for the prediction of the end temperature. More discrepancies are observed for the reconstruction of the species mass fractions using the GPR model with two principal components. Figures 7a and b compare the evolution of the fuel (C2 H4 ) and oxidizer (O2 ), respectively, using the detailed model and the GPR predictions. The GPR results have the tendency to under- or overestimate the trajectories of the limiting test cases, that is, the cases starting from initial temperatures of 600 and 1000 K. The computed data points


Fig. 8 Comparison of N2 species mass fraction between the detailed model (blue crosses) and the model using four principal components in combination with Gaussian process regression (squared exponential kernel) (red dots)


are centered around the model predictions for the intermediate ignition cases. This observation indicates that it might be a better strategy to limit the number of trajectories to improve the accuracy of the model. From the latter observations it can be concluded that two principal components are sufficient to regress the gas temperature and the species that correlate with it. However, to reconstruct species that evolve independently and are uncorrelated with the temperature, and therefore with PC's 1 and 2, additional components should be added. This has been observed for almost half of the species in the model. For example, Fig. 8 shows the predicted species mass fraction of N2 using GPR with four principal components. Although components have been added, the reconstruction shows major discrepancies.

5 Conclusion

The present research presents a machine-learning framework for predicting plasma-assisted combustion simulations of ethylene-air mixtures using a nanosecond pulse discharge over a large range of conditions. The framework presents a combination of various techniques to optimally reduce the number of significant variables in the system. Following the results presented in this work, principal component analysis offers a first dimension reduction. However, the compression remains limited, as the 55 species were only reduced to 44 principal components. The original contribution of the present ML methodology is to combine the information maximization obtained with PCA with Gaussian process regression. Preliminary results indicate that the species and temperatures in the model can be regressed as a function of two principal components using GPR. A major model reduction is obtained as the 57 variables in the system are reduced to 2. The model using two PC's performs well for the reconstruction of the main species that correlate highly with the gas temperature, as the latter is a major component of the first two principal components. To reconstruct the


detailed evolution of species that are uncorrelated with the gas temperature, more principal components are needed to perform the regression. Future work will focus on the implementation of the PCA-GPR framework into the PACMAN code. New zero-dimensional reactor simulations will be performed with the surrogate model containing only a few principal components in combination with an efficient tabulation strategy. Acknowledgments A. Bellemans was funded by a fellowship of the F.R.S.-FNRS. F. Bisetti was sponsored in part by NSF Grant No. 190377 and DOE grant DE-EE0008874.

References 1. Adamovich, I.V., Li, T., Lempert, W.R.: Kinetic mechanism of molecular energy transfer and chemical reactions in low-temperature air-fuel plasmas. Phil. Trans. R. Soc. A 373(2048), 20140336 (2015) 2. Bellemans, A., Deak, N., Bisetti, F.: Development of skeletal kinetics mechanisms for plasmaassisted combustion via principal component analysis (Feb 20). In: Plasma Sources Science and Technology. 29, 2, 025020 (2020) 3. Bellemans, A., Kincaid, N., Deak, N., Pepiot, P., Bisetti, F.: P-DRGEP: A novel methodology for the reduction of kinetics mechanisms for plasma-assisted combustion applications. In: Proceedings of the Combustion Institute. 38, 4, p. 6631–6639, 9 p (2021) 4. Bittencourt, J.A.: Fundamentals of Plasma Physics. Springer Science & Business Media (2013) 5. Cohen, S.D., Hindmarsh, A.C., Dubois, P.F.: Cvode, a stiff/nonstiff ode solver in c. Comput. Phys. 10(2), 138–143 (1996) 6. Deak, N., Bellemans, A., Bisetti, F.: Plasma assisted ignition of methane/air and ethylene/air mixtures: efficiency at low and high pressures. Proc. Combust. Inst. 38, 6551–6558 (2021) 7. Kee, R., Rupley, F., Miller, J.: Chemkin-ii: A fortran chemical kinetics package for the analysis of gas phase chemical reactions. In: Sandia Report SAND, pp. 89–8009 (1989) 8. Lefkowitz, J.K., Guo, P., Rousso, A., Ju, Y.: Species and temperature measurements of methane oxidation in a nanosecond repetitively pulsed discharge. Phil. Trans. R. Soc. A 373(2048), 20140333 (2015) 9. Lu, T., Law, C.K.: A directed relation graph method for mechanism reduction. Proc. Combust. Inst. 30(1), 1333–1341 (2005) 10. Malik, M.R., Isaac, B.J., Coussement, A., Smith, P.J., Parente, A.: Principal component analysis coupled with nonlinear regression for chemistry reduction. Combust. Flame 187, 30– 41 (2018) 11. Niemeyer, K.E., Sung, C., Raju, M.P.: Skeletal mechanism generation for surrogate fuels using directed relation graph with error propagation and sensitivity analysis. Combust. Flame 157(9), 1760–1770 (2010) 12. Parente, A., Sutherland, J.C.: Principal component analysis of turbulent combustion data: Data pre-processing and manifold sensitivity. Combust. Flame 160(2), 340–350 (2013) 13. Pepiot-Desjardins, P., Pitsch, H.: An automatic chemical lumping method for the reduction of large chemical kinetic mechanisms. Combust. Theor. Model. 12(6), 1089–1108 (2008) 14. Starikovskaia, S.: Plasma-assisted ignition and combustion: nanosecond discharges and development of kinetic mechanisms. J. Appl. Phys. D 47(35), 353001 (2014) 15. Starikovskiy, A., Aleksandrov, N.: Plasma-assisted ignition and combustion. Prog. Energ. Combust. Sci. 39(1), 61–110 (2013)

Estimating Exposure Fraction from Radiation Biomarkers: A Comparison of Frequentist and Bayesian Approaches

Adam Errington, Jochen Einbeck, and Jonathan Cumming

1 Introduction

The major goal of space missions is to allow human exploration without exceeding a certain risk level from exposure to space radiation. Clearly, understanding human exposure to this ionising radiation in the aircraft environment is of great importance in the field of aerospace. Since the human response to ionising radiation is both individual and variable, one approach is to explore radiation effects at the cellular level. Individual radiation sensitivity can provide the basis for personalised countermeasures against key environmental factors in long-term missions. Radiation biomarkers, which try to quantify the radiation dose through the damage that has been caused at the cellular level, are necessary in order to determine the radiation sensitivity of a blood sample. Most radiation biomarkers come in the form of count data, including cytogenetic biomarkers (dicentric chromosomes, micronuclei [1]) and protein-based biomarkers such as the γ-H2AX assay [2]. The latter biomarker considers counts of foci which appear after phosphorylation of the H2AX histone following DNA double-strand breaks. While this biomarker motivates our work, the principles are applicable to other biomarkers and also beyond the field of biodosimetry.

The Poisson model is a natural choice for the analysis of count data, and has been successfully applied for the dicentric assay under full-body exposure (in which case one has, at least in principle, no overdispersion at all). Laboratory data (with known doses) are used to fit a linear or quadratic model to the measured yields (counts per cell), resulting in a ‘calibration curve’. Following exposure of an individual, the observed yield of a blood sample is equated to the calibration curve, and the dose is estimated via inverse regression [1]. While early evidence suggests that the distribution of γ-H2AX foci among the scored blood cells adheres well to the Poisson assumption, and hence can be analysed by employing methods used for the dicentric assay [2], practical γ-H2AX data sets almost always exhibit overdispersion. That is, for the γ-H2AX assay, the Poisson assumption of equidispersion (variance = mean) is usually violated, which may be caused by unobserved heterogeneity in the cell population or by aspects of the scoring procedure.

The deviation from the equidispersion property has two major implications. Firstly, it invalidates the use of Poisson models, so that more elaborate modelling techniques (such as those based on Quasi-Poisson or Negative Binomial models) need to be applied. Secondly, it complicates the detection of partial-body radiation exposures. In the case of partial exposure of individuals, their blood will contain a mixture of cells showing no radiation impact at all (structural zeros) and cells featuring a distribution of counts according to the dose of exposure. It is important to detect partial-body exposure and to quantify the fraction of exposure, as otherwise the resulting dose estimates will be incorrect, with potentially severe consequences. For the dicentric assay, where whole-body counts are usually equidispersed, any significant overdispersion serves as evidence for the presence of partial exposure. It is well established how to adjust for partial-body exposure for this biomarker through the ‘contaminated Poisson method’ [3, 4]. However, for biomarkers such as micronuclei or γ-H2AX, which are overdispersed per se, no such mechanism is yet known. In this chapter, we focus on the problem of estimating the exposure fraction and quantifying its uncertainty for such scenarios. Bayesian and frequentist techniques will be employed and compared.
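As a simple illustration of the calibration-curve approach described above, the following R sketch (not taken from the chapter) estimates a dose by inverse regression from a linear calibration curve; the coefficients are those of the whole-body curve quoted later in Sect. 3, and the foci counts are hypothetical.

# Illustration only: dose estimation by inverse regression from a linear
# calibration curve mu(D) = beta0 + beta1 * D (coefficients from Sect. 3).
beta0 <- 0.35
beta1 <- 1.48
foci_counts <- c(3, 5, 2, 6, 4)             # hypothetical scored cells
yield_hat   <- mean(foci_counts)            # observed yield (foci per cell)
dose_hat    <- (yield_hat - beta0) / beta1  # inverse regression estimate of D
dose_hat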

2 Methodology

To represent partial-body exposure in the case of an overdispersed distribution of foci counts, we require models which can handle both overdispersion and excess zero counts in the data. To account for the extra zero counts, zero-inflated models describe the data as a combination of two distributions: a distribution which takes a single value at zero and a count distribution such as the Poisson or NB1. For a sample consisting of n cells, we define Yj to be the response variable representing the observed number of foci for cell j (j = 1, ..., n). The probability mass function of the zero-inflated Poisson (ZIP) model is:

$$
P(Y_j = y_j \mid \lambda, p) =
\begin{cases}
p + (1-p)\,e^{-\lambda}, & y_j = 0\\[4pt]
(1-p)\,\dfrac{e^{-\lambda}\lambda^{y_j}}{y_j!}, & y_j > 0
\end{cases}
\qquad (1)
$$

where 0 ≤ p ≤ 1 and λ > 0, possibly depending on covariates such as dose. Here, λ refers to the mean of the underlying Poisson distribution and p is the zero-inflation parameter. As usual, p will be modelled through a logistic regression, but with the proportion of the mixture assumed to be constant:

$$
\operatorname{logit}(p) = \gamma_0. \qquad (2)
$$

The ZIP model has the properties: E(Yj ) = (1−p)λ = μ and Var(Yj ) = (1−p)λ(1+ pλ) and reduces to a Poisson when p = 0. Since Var(Yj ) ≥ μ, zero-inflation can be seen as a special form of overdispersion. In our context, for data which stem from full- or partial-body exposure, it is sensible to consider overdispersion and zero-inflation as two separately identifiable model properties. A suitable model for this purpose is the type 1 zero-inflated negative binomial (ZINB1) model with probability mass function defined by

$$
P(Y_j = y_j \mid \lambda, p, \alpha) =
\begin{cases}
p + (1-p)\,(1+\alpha)^{-\lambda/\alpha}, & y_j = 0\\[4pt]
(1-p)\,\dfrac{\Gamma\!\left(y_j + \frac{\lambda}{\alpha}\right)}{y_j!\,\Gamma\!\left(\frac{\lambda}{\alpha}\right)}\,(1+\alpha)^{-\lambda/\alpha}\left(1+\alpha^{-1}\right)^{-y_j}, & y_j > 0
\end{cases}
\qquad (3)
$$

This model shares the same mean as the ZIP, but has variance Var(Yj) = (1 − p)λ(1 + α + pλ), where α ≥ 0 is an overdispersion parameter. This variance suggests that the ZINB1 exhibits overdispersion when α > 0 or p > 0. For α = 0, the ZINB1 reduces to the ZIP. We estimate the model parameters α and p in two ways:

– Maximum Likelihood estimation. This will implicitly produce standard errors of α̂ and p̂ through the Fisher information matrix.
– Bayesian estimation. Using a uniform prior for p, a Gamma prior for α, and a prior for λ which is determined by a linear transformation of a dose prior according to the known calibration curve, the posterior distribution is computed via an MCMC Gibbs sampling algorithm. Uncertainty Quantification is based on this posterior distribution.

The exposure fraction, F = 1 − p, and the dispersion, φ = 1 + α, can be estimated via F̂ = 1 − p̂ and φ̂ = 1 + α̂, respectively, where clearly SE(F̂) = SE(p̂) and SE(φ̂) = SE(α̂). We note that F = 1 − p is a simplifying assumption, as it ignores certain effects (such as cell death) which prevent irradiated cells from being observable at the time of scoring. The validity of this assumption is investigated further in Sect. 4.
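For the maximum likelihood route, a minimal R sketch is given below (it is not the authors' implementation); it codes the ZINB1 likelihood of Eq. (3) directly, maximises it with optim, and takes Wald standard errors from the inverse observed Fisher information, so that SE(F̂) = SE(p̂) and SE(φ̂) = SE(α̂) follow immediately. The starting values are arbitrary.

zinb1_negloglik <- function(par, y) {
  p <- par[1]; lambda <- par[2]; alpha <- par[3]
  size <- lambda / alpha                 # NB1 parameterisation: mean lambda,
  prob <- 1 / (1 + alpha)                # variance lambda * (1 + alpha)
  ll0   <- log(p + (1 - p) * dnbinom(0, size = size, prob = prob))
  llpos <- log(1 - p) + dnbinom(y, size = size, prob = prob, log = TRUE)
  -sum(ifelse(y == 0, ll0, llpos))
}

fit_zinb1 <- function(y) {
  start <- c(p = 0.3, lambda = mean(y[y > 0]), alpha = 0.5)
  fit <- optim(start, zinb1_negloglik, y = y, method = "L-BFGS-B",
               lower = c(1e-6, 1e-6, 1e-6), upper = c(1 - 1e-6, Inf, Inf),
               hessian = TRUE)
  se <- sqrt(diag(solve(fit$hessian)))   # Wald SEs; fragile when alpha is near 0
  names(se) <- names(start)
  list(p = fit$par["p"], alpha = fit$par["alpha"],
       F_hat = 1 - fit$par["p"], F_se = se["p"],
       phi_hat = 1 + fit$par["alpha"], phi_se = se["alpha"])
}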

3 Simulation

To generate H2AX-type foci count samples, we make use of a whole-body calibration curve reported previously in the literature [5]:

$$
\mu = 0.35 + 1.48\,D. \qquad (4)
$$

Assuming a fixed and known dose of D ≡ 3 Gy for this simulation, n = 1000 observations were taken separately from two scenarios:

A. Poi(λ = μ = 4.79)
B. NB1(λ = μ = 4.79; φ = 2) (with ‘base’ dispersion φ = 1 + α),

providing an equidispersed (A) and an overdispersed (B) whole-body sample (see Fig. 1). In order to mimic a 50% partial exposure scenario, 1000 zeros were manually added to the above samples. The whole process was repeated 100 times. Hereafter, information regarding the dose level and fraction used to generate these data is assumed to be unknown. For Bayesian estimation, we use p ∼ U[0, 1] and α ∼ Γ(0.005, 0.01⁻¹) (in the form Γ(k, θ) such that E(Γ(k, θ)) = kθ and Var(Γ(k, θ)) = kθ², where k is the shape parameter and θ the scale parameter). Implementation of the priors for p, α and λ is provided in the Appendix.
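A minimal R sketch of this data-generating step (not the authors' code; the seed is arbitrary) is:

set.seed(1)
mu    <- 0.35 + 1.48 * 3   # calibration curve at D = 3 Gy, i.e. 4.79
n     <- 1000
alpha <- 1                 # NB1 dispersion phi = 1 + alpha = 2

scenario_A <- c(rpois(n, lambda = mu), rep(0, n))    # Poisson sample + 1000 zeros
scenario_B <- c(rnbinom(n, size = mu / alpha, prob = 1 / (1 + alpha)),
                rep(0, n))                           # NB1 sample + 1000 zeros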

Fig. 1 A comparison of the individual number of foci per cell produced in equidispersed (left) and overdispersed (right) whole-body samples of the same mean

Table 1 MLE and posterior mean estimates for the zero-inflation and dispersion parameter, based on an average of the 100 Poisson simulation runs. For reference, the posterior 2.5% and 97.5% quantiles as well as the median and mode are given in brackets (2.5% quantile, 97.5% quantile, median, mode)

True parameter values: p = 0.5000, α = 0

Frequentist estimates
  ZIP:    p̂ = 0.4998 ± 0.0113
  ZINB1:  p̂ = 0.4997 ± 0.0113;  α̂ = 0.0148 ± 0.3266

Bayesian estimates
  ZIP:    p̂ = 0.4998 ± 0.0113 (0.4778, 0.5219, 0.4998, 0.4999)
  ZINB1:  p̂ = 0.4992 ± 0.0113 (0.4771, 0.5213, 0.4992, 0.4991);  α̂ = 0.0549 ± 0.0224 (0.0309, 0.1131, 0.0486, 0.0333)

Fig. 2 A comparison of the frequentist (F-ZIP/F-ZINB1) and Bayesian (B-ZIP/B-ZINB1) distributions of the model parameters resulting from 100 Poisson simulations. The red horizontal lines at p = 0.5 and α = 0 indicate the true parameter values

Scenario A One finds from Table 1 that, for the ZIP model, the Bayesian and frequentist estimates (and their standard errors) of p are identical. It follows that an estimate for the exposed fraction, F , is found through Fˆ = 1−pˆ = 0.5002±0.0113. Estimates of exposure fraction under ZINB1 are very similar to those under ZIP. While the frequentist ZINB1 estimate obtained for α suggests equidispersion, the Bayesian confidence interval for α does not cover the true value α = 0. From Fig. 2, we see that the Bayesian versions tend to skew the estimates away from the true values, which is related to the choice of priors. We investigate this choice in further detail later in this section.

Table 2 MLE and posterior mean estimates for the zero-inflation and dispersion parameter, based on an average of the 100 NB1 simulation runs

True parameter values: p = 0.5000, α = 1.0000

Frequentist estimates
  ZIP:    p̂ = 0.5149 ± 0.0113
  ZINB1:  p̂ = 0.5005 ± 0.0118;  α̂ = 0.9849 ± 0.1096

Bayesian estimates
  ZIP:    p̂ = 0.5148 ± 0.0112 (0.4928, 0.5368, 0.5148, 0.5147)
  ZINB1:  p̂ = 0.5004 ± 0.0118 (0.4771, 0.5234, 0.5004, 0.5009);  α̂ = 0.9895 ± 0.1096 (0.7875, 1.2165, 0.9852, 0.9749)

Fig. 3 A comparison of the frequentist (F-ZIP/F-ZINB1) and Bayesian (B-ZIP/B-ZINB1) distributions of the model parameters resulting from 100 NB1 simulations. The red horizontal lines at p = 0.5 and α = 1 indicate the true parameter values

Scenario B From fitting ZIP and ZINB1 models, it is clear from the MLE estimates presented in Table 2 (ZIP: Fˆ = 0.4851±0.0113, ZINB1: Fˆ = 0.4995±0.0118) and the boxplots in Fig. 3 that the ZINB1 was able to account for overdispersion due to zero-inflation and sampling and was therefore the preferred model in estimating the exposed fraction. The corresponding fraction estimates from the Bayesian methods appear to show that the ZIP deviates the most from the true value (ZIP: Fˆ = 0.4852±0.0112, ZINB1: Fˆ = 0.4996±0.0118). The true value of α was found to be within 1 standard error in both the frequentist and Bayesian ZINB1, with the latter producing a slightly closer estimate. It appears that there is no strong preference for estimating the exposed fraction and its uncertainty utilising a Bayesian approach over the standard maximum likelihood method.

In the Bayesian framework, deciding on the most appropriate prior can often be a difficult task. Investigating the sensitivity of prior choice is therefore recommended. In Figs. 4 and 5 we compare our current α prior choice, which has mean 0.5 and a variance of 50, with (1) priors that share the same mean but altered variance and (2) prior mean centred at 2, respectively. Note that the scaling of the vertical axes is to allow comparability of the prior means between Figs. 4 and 5. It is clear from comparing boxplots (a) and (b) with (c) and (d) in both figures that the prior variance has a significant impact on the Bayesian estimate of α. Using a prior which exhibits a small variance leads to estimates which greatly deviate from the true value. On the other hand, a larger variance produces estimates which appear to be more in agreement with those obtained through maximum likelihood (more apparent for the NB1 simulated data). Furthermore, the precise value of the variance becomes irrelevant. We also notice by comparison of Figs. 4c,d with 5c,d that the posterior distribution for α remains unchanged when shifting the position of the prior mean (assuming the prior variance is not too small).
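For reference, the Gamma(k, θ) priors compared in Figs. 4 and 5 can be recovered from their stated means and variances via k = m²/v and θ = v/m; a short R helper (ours, not the chapter's) makes this explicit:

gamma_prior <- function(m, v) c(shape = m^2 / v, scale = v / m)

gamma_prior(0.5, 50)                      # the Sect. 3 prior: shape 0.005, scale 100
rbind(gamma_prior(0.5, 0.005), gamma_prior(0.5, 0.05),
      gamma_prior(0.5, 50),    gamma_prior(0.5, 500))   # the priors of Fig. 4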

4 Estimation of Exposed Fraction

In Sects. 2 and 3 it was assumed that the exposed fraction can be estimated by F = 1 − p, hence not taking into account time-dependent cell effects which may prevent an irradiated cell from being scored. Early research has suggested that, for radiation overexposure accidents where the doses are up to 0.5 Gy, the question of differences in transformation and survival between un-irradiated cells and those irradiated is not a complicating factor. However, for higher doses it has been reported that some allowance for cell death should be considered when acute partial-body irradiation is known to have occurred [6]. For the γ-H2AX biomarker, cells are usually scored after a few hours, which leaves much less time for cell death than for the dicentric biomarker, where at least 48 h need to pass until mitosis [3]. While it appears, on this basis, reasonable to assume that for the γ-H2AX biomarker the original irradiated fraction corresponds to the fraction of irradiated cells at the time of scoring, we still would like to investigate this claim further. Firstly, we recall from [3] that the corrected fraction of irradiated cells can be written as

$$
F = \frac{1-p}{1-p+ps} \qquad (5)
$$

where s = s(D) is a dose-dependent function describing the survival rate of irradiated cells. According to Lloyd and Edwards [7], this rate follows a decreasing exponential function of the dose D,

$$
s(D) = \exp(-\gamma_1 D). \qquad (6)
$$
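A direct R transcription of Eqs. (5) and (6) (a sketch, with illustrative inputs only) shows how the correction behaves; with γ1 = 0 the survival rate s is 1 and the estimate reduces to F = 1 − p.

corrected_fraction <- function(p, D, gamma1) {
  s <- exp(-gamma1 * D)            # survival rate of irradiated cells, Eq. (6)
  (1 - p) / (1 - p + p * s)        # corrected fraction, Eq. (5)
}

corrected_fraction(p = 0.5, D = 3, gamma1 = 0)      # 0.5, i.e. F = 1 - p
corrected_fraction(p = 0.5, D = 3, gamma1 = 1 / 3)  # D0 = 3 Gy: F increases to about 0.73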

Fig. 4 Estimates for αˆ based on prior information centred at mean 0.5 with variance (a) 0.005, (b) 0.05, (c) 50 and (d) 500 from Poisson (left) and NB1 simulation (right)

Fig. 5 Estimates for αˆ based on prior information centred at mean 2 with variance (a) 0.005, (b) 0.05, (c) 50 and (d) 500 from Poisson (left) and NB1 simulation (right)

In Hilali et al.'s context [3] of dicentric chromosomes, they denote γ1 = 1/D0, where D0 can be interpreted as the initial dose required to reduce the number of irradiated cells to 37% due to interphase death or mitotic delay. The range of plausible values of D0 for this biomarker has been postulated in the literature, without much justification, to be between 2.7 and 3.5 Gy [6]. From (5) it is clear that we have F = 1 − p exactly when s = 1, that is, when the survival rate is approximately 100%. From (6), this implies γ1 approaching 0 (or D0 tending to infinity). Oliveira et al. [8] demonstrated that, when modelling the proportion p via

$$
\operatorname{logit}(p) = \gamma_0 + \gamma_1 D \qquad (7)
$$

in a zero-inflated model, the constant γ1 in (6) corresponds to γ1 in (7). So, what remains to show is that, for γ-H2AX biomarker data, γ1 is not statistically different from 0. We now test this hypothesis for the γ-H2AX biomarker through manually scored H2AX calibration data [9], obtained in an in-vitro setting at Public Health England after irradiation of blood lymphocytes with 250 kVp X-rays. The data are made up of four samples, corresponding to foci-per-cell frequencies for three levels of dose (0.75 Gy, 1.5 Gy and 3 Gy) for each exposed fraction (20%, 40%, 60% and 80%). 200 cells were examined for each dose, totalling 600 cells per sample. A ZINB1 regression, with p modelled as in (7) and the mean modelled via the linear predictor μ = β0 + β1 D using the identity-link function, is fit separately to the individual samples. It is clear from the zeroes contained in the γ̂1 confidence intervals quoted in Table 3 that effects such as cell death at the time of scoring can be considered negligible, therefore providing sufficient evidence for the exposed fraction to be estimated through F̂ = 1 − p̂ in the case of a single sample. The right-hand column of the table gives the resulting confidence interval for s(D) in the case D = 1 Gy. We note also that by substituting any of the three doses from the calibration data into (6), the corresponding confidence intervals for s(D) will still encompass a value of 1, when working to two standard errors.

Table 3 95% confidence intervals for γ1 and exp(−γ̂1). Calibration data used consists of four exposure fractions for 3 dose levels (0.75 Gy, 1.5 Gy and 3 Gy)

True fraction irradiated (%)    γ̂1                     exp(−γ̂1)
20                              (−0.2225, 0.1612)       (0.8511, 1.2492)
40                              (−0.0291, 0.3419)       (0.7104, 1.0295)
60                              (−0.0490, 0.3100)       (0.7334, 1.0502)
80                              (−0.1369, 0.2466)       (0.7814, 1.1467)

5 Discussion

Monitoring exposure of astronauts to radiation was initiated as early as Project Mercury and has continued through current space missions. Space radiation consists of charged nuclei and is dynamic in both quantity and quality. Although active dosimeters have been used for monitoring space radiation, these monitors do not provide complete information on the charges and energies of all particles found in space. Even if personal dosimeters or other physical dosimetry methods are available, the intrinsic localisation of these dosimeters means that total-body exposures may be vastly under- or overestimated [10]. Also, there remains a lack of understanding of the synergistic effect of radiation and microgravity, which can lead to significant uncertainties in estimating exposure [11]. To account for such uncertainties, cytogenetic biodosimetry methods have been used extensively for assessing space radiation exposures [12, 13]. More specifically, the analysis of dicentrics (and centric ring formations) is considered “the gold standard” for cytogenetic biodosimetry, because dicentric chromosomes can be easily identified in Giemsa-stained chromosome preparations and pre-exposure background levels are very low.

Over the last decade, protein-based biomarkers such as γ-H2AX have emerged as potential alternatives to the dicentric assay due to their detection of radiation exposure in a faster, cheaper, and less labour-intensive manner. In the aerospace field, research on the suitability of the γ-H2AX biomarker for the estimation of the exposed fraction (and therefore the absorbed dose) due to space radiation still remains limited. However, there is some evidence suggesting the usefulness of γ-H2AX for explaining the effects of space travel-induced DNA damage [14, 15]. Based on its properties, the γ-H2AX biomarker may be a preferable choice for use in short-term missions, especially if a quick assessment of the contracted radiation is sought. However, both the dicentric and the γ-H2AX assay undergo time-dependent decay in blood lymphocytes, which limits these methods to the assessment of acute radiation exposures in cases where samples can be obtained fairly soon after the exposure has occurred (for the latter assay, within hours). For long-term missions, or for cases where radiation exposure is to be assessed several months after the mission, the scoring of translocations (stable chromosome aberrations), as in the FISH chromosome painting technique, appears to be more suitable, due to the frequency of cells with translocations remaining constant for decades after exposure [16].

In this chapter, we have attempted to address the issue of arriving at a fraction estimate and quantifying its uncertainty through equidispersed (Poisson) and overdispersed (NB1) simulated partial-exposure γ-H2AX data, using an external whole-body calibration curve. In both cases, fraction and dispersion parameter estimates obtained from a ZIP model (which allows purely for zero-inflation) and a ZINB1 model (which allows for both zero-inflation and overdispersion) were compared, employing frequentist and Bayesian methods. For the Poisson simulated data, it could be argued that the frequentist ZINB1 performed better than the Bayesian
ZINB1, producing estimates pˆ and αˆ closer to their true values, which can be considered a reflection of the fact that a Gamma prior with support α > 0 will implicitly skew the estimate of α towards overdispersion. However, the estimates from the NB1 data revealed that the Bayesian ZINB1 was the slightly favourable model. The estimates for p and α, based on the priors described in Sect. 2, indicate that there is no “strong” preference choosing between a frequentist and Bayesian approach for fraction estimation and uncertainty quantification. For a practitioner analysing a patient blood sample who is inexperienced with the Bayesian MCMC framework, the maximum likelihood method offers a simple (freedom of priors) and much faster alternative for obtaining a fraction estimate. In the event that some prior information regarding the exposed fraction is known, a practitioner may desire to use this information in the estimation process. For the α prior, we showed in Sect. 3 that there is flexibility in the choice of prior, assuming a reasonably large variance is used and the mean of the prior is not too small. In this chapter, a uniform prior was used for p, which assumes that no prior information is available and therefore any exposure scenario is equally likely. As an extension, and for consideration of further work, a beta prior for p could be used to give some weight towards certain exposure patterns; for instance it may be plausible to assume a priori that p is close to 0 or 1, motivating a U -shaped form for the beta prior.

Appendix

The following R code (used with package rjags) details the prior configurations used in Sect. 3. If the covariance matrix (betacov) is not available from the calibration curve (as in our case), it can be estimated from simulated data generated from that curve.

# Prior distribution for lambda
f1

Advancements in NICFD

1, the model predicts MB < MA for each possible oblique shock configuration with upstream state A1. Note that the suffix n identifies the quantity component normal to the shock front.

Fig. 2 Diagrams illustrating (a) the variation of the flow deflection angle θ with shock angle β and (b) the polar of the Mach number, where MBx = MB cos θ and MBy = MB sin θ, as computed from the polytropic van der Waals model of siloxane fluid MDM. Also shown are the curves, labeled I G, that correspond to the polytropic ideal gas case with the same upstream Mach number MA used in the computation of the non-ideal cases

In Fig. 1b, state A2, with ΓA2 < 1, is considered, showing that there exists a range of shock angles in which the Mach number increases. Due to the almost negligible entropy rise, and together with ΓB < 1, cB initially decreases with increasing shock angle, as shown in Fig. 1b, where cB < cA up to 47.9 [deg]. A pronounced local minimum is found at 38.3 [deg]. The post-shock state characterized by MB = MA occurs at 42.5 [deg], whereas the local peak for MB is found at 38.2 [deg] and corresponds to the ratio MB/MA = 1.5.

The variation of the flow deflection angle θ with the shock angle β for each of the configurations considered is shown in Fig. 2a, where it is compared with the curve labeled IG, which corresponds to the polytropic ideal gas case and, as is well known, is independent of the pre-shock thermodynamic state PA, vA [41]. The curves computed by means of the polytropic vdW model exhibit substantial differences depending on the pre-shock thermodynamic state. In particular, the maximum turning angle θmax that the flow can sustain across a planar attached shock wave varies from 9.5 [deg] for case A1 up to 50.2 [deg] for case A4. In contrast, θmax = 23.0 [deg] for the polytropic ideal gas. Finally, the same results can be conveniently presented also in terms of the shock polar for the Mach number, namely a plot of MBy = MB sin θ versus MBx = MB cos θ, as shown in Fig. 2b for each non-ideal configuration considered and for the polytropic ideal gas counterpart.

This analysis has a sound theoretical basis, owing to the fact that the simple van der Waals model is known to predict the correct qualitative behavior in the single-phase thermodynamic region close to liquid-vapor equilibrium (sufficiently far from the critical point for critical phenomena to be negligible). Moreover, the occurrence of non-ideal shock waves is confirmed for diverse substances using also state-of-the-art thermodynamic models, see [18, 43, 45].

Fig. 3 Experimental Schlieren images of experiments for the observation of oblique shocks in non-ideal supersonic flows

The experimental verification of the occurrence of non-ideal shock waves can be devised in test rigs working with fluids of high or even moderate molecular complexity. In [48, 49], steady oblique shock waves were observed for the first time in non-ideal supersonic flows of single-phase organic vapors. Oblique shock waves were observed and characterized experimentally at varying stagnation conditions in the pre-shock state, for a set of different flow deviation angles. The observations confirm the shock wave theory for two-dimensional steady flows, proving that the shock pressure ratio depends on the stagnation conditions, a purely non-ideal effect, in addition to the well-known dependence on the pre-shock Mach number, specific heat ratio and flow deviation angle typical of dilute-gas conditions. Figure 3a reports the Schlieren observation of oblique shocks propagating in a supersonic non-ideal stream. The stream is obtained using a planar converging-diverging nozzle. A backward-facing step at the nozzle throat produces symmetric shocks propagating downstream, into the divergent section. The location of the pressure probes is also indicated with green dots. Figure 3b reports instead the Schlieren observation of the shock pattern generated around a diamond-shaped airfoil profile plunged into a supersonic stream.

The non-ideal increase of the flow Mach number across oblique shocks and the dependence of the shock angle and flow deviation on the upstream thermodynamic state are arguably relevant in applications where oblique shock waves are either intentionally formed (e.g., engine intake ramps) or a by-product of the supersonic flow expansion (e.g., fish-tail shocks in turbine nozzle vanes, over/under-expanded jets from the nozzle exit). In [44] the authors investigate the relevance of non-ideal oblique shocks to renewable energy applications.

3 NICFD Computational Model Accuracy Assessment

The SU2 open-source suite [13, 29] is the reference among NICFD computational solvers. In the limited framework of ideal flows of air, the reliability of the SU2 suite has been extensively assessed [12, 13, 29, 30, 35]. Preliminary verification
of the SU2 NICFD solver implementation can be found in [16, 33, 46]. The results shown hereinafter are taken from [15, 19, 20, 48]. These works, developed in the frame of the NSHOCK and UTOPIAE projects, concern the first-ever experimental assessment of a Non-Ideal Compressible-Fluid Dynamics (NICFD) model. The experimental test set considered to carry out the assessment includes flow configurations of practical interest for renewable energy applications. More precisely, the experiments aim at reproducing the supersonic flow of MDM within turbine vanes in ORC power production systems. These typical flow configurations, involving fluid flows in mildly-to-highly non-ideal regimes, are reproduced numerically using the non-ideal solver from SU2. Results are then compared against pressure and Mach number measurements collected at the Compressible-fluid dynamics for Renewable Energy Applications (CREA) laboratory of Politecnico di Milano, using the Test-Rig for Organic VApours (TROVA) [22, 32, 36]. The TROVA is equipped with pressure probes to record the value of the static pressure at the test section side wall. Temperature and pressure are also monitored within the settling chamber, ahead of the test section. Mach number measurements are also available through the post-processing of Schlieren images. Gaussian Probability Density Function (PDF) distributions are systematically considered for all the observations.

The computational model accuracy assessment takes advantage of an Uncertainty Quantification (UQ) analysis [34] to assess the role of aleatory uncertainties on the nominal test conditions and thus to quantify the validity and robustness of SU2 predictions. Uncertainties are propagated through the solver and statistical moments of selected outputs are computed using a non-intrusive Polynomial Chaos expansion approach [24]. The output statistics, complemented with their numerical 2σ range, are compared against the experimental measurements and their associated tolerances. Several papers have addressed the problem of quantifying uncertainties in the numerical simulation of non-ideal flows. Some of these papers [3, 4, 26] specifically focused on thermodynamic models. In other works [6, 7, 14], multiple sources of uncertainty have been taken into account, both on operating conditions and on thermodynamic models. According to the conclusions presented in these works, the uncertainties related to the thermodynamic model can be neglected with respect to the ones related to the operating conditions of the experimental facility. Therefore, in the following, two sources of uncertainty were taken into account, namely the uncertainty on the measured values of the total pressure and total temperature at the inlet of the test section.

The test case and the results reported here are taken from [20]. The assessment test case considered in this review consists of a planar converging-diverging nozzle, and the experiment aims at reproducing the isentropic expansion of a siloxane MDM flow in a non-ideal regime. Further details regarding the geometry and the test-rig set-up can be found in [38] and [37]. The TROVA is a blowdown facility and, as the high-pressure reservoir empties, the flow encompasses highly non-ideal to ideal regimes. In the original work [20], five time instances (A-E) were selected during two discharges at diverse reservoir conditions, for a total of 10 different flow configurations.
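As a simplified stand-in for this workflow, the R sketch below propagates Gaussian uncertainties on the inlet total pressure and temperature by plain Monte Carlo sampling rather than by the non-intrusive polynomial chaos expansion used in the chapter; run_cfd is a hypothetical wrapper around the SU2 computation (or a surrogate of it) returning the wall pressure at the probe locations.

propagate_uq <- function(run_cfd, Pt_mean, Pt_sd, Tt_mean, Tt_sd, n = 200) {
  Pt <- rnorm(n, Pt_mean, Pt_sd)            # aleatory uncertainty on total pressure
  Tt <- rnorm(n, Tt_mean, Tt_sd)            # aleatory uncertainty on total temperature
  out <- t(mapply(run_cfd, Pt, Tt))         # n x (number of probes) matrix of outputs
  mu <- colMeans(out)
  sg <- apply(out, 2, sd)
  list(mean = mu, lower = mu - 2 * sg, upper = mu + 2 * sg)  # mean and 2-sigma band
}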

Fig. 4 Accuracy assessment for discharge 1 (left column) and 2 (right column). Experiments A (top row) and E (bottom row). The continuous lines are the CFD predictions, complemented by the uncertainty bars. Diamond marks correspond to experimental measurements

Here, retaining the same labeling, we show only the A1, E1 and the A2, E2 cases. These experiments were reproduced numerically by means of steady simulations of inviscid flows, using diverse meshes and diverse thermodynamic models. To assess the role of viscous effects, two- and three-dimensional simulations based on the Reynolds-averaged Navier-Stokes (RANS) equations were also carried out. Figure 4 shows the mean pressure trends for the considered experiments (A1-E1 in the left column and A2-E2 in the right column), complemented by ±2σ ranges resulting from the UQ analysis. On the same plots, experimental measurements (diamond marks) and their 2σ uncertainty bars are reported for comparison. The numerical solution fairly matches the experimental data in all cases: the mean pressure trend is indeed very close to the measured values. Discrepancies are generally found near the exhaust section, where the increase of the boundary layer thickness causes the flow to re-compress. Moreover, the differences reduce as the flow regime drifts from highly non-ideal to ideal, see [20].

Fig. 5 Case A2 . (a) Experimental values and error bars relative to the Mach number measure are compared against the mean solution and the numerical error bars resulting from the UQ analysis; (b) Sobol indices for static pressure related to uncertainty on both the values of total pressure and total temperature at the inlet

Schlieren images are exploited to directly measure the local value of the Mach number for test A2. Figure 5a reports the experimental measurements and the related error bars, see [39]. In the same plot, the numerical mean solution and the ±2σ interval resulting from the UQ analysis of the Mach number, for test A2, are also reported. The mean solution is well within the experimental error bars, pointing out the reliability of the predicted Mach number trend. For test A2, a sensitivity analysis based on the computation of the Sobol indices for static pressure is reported in Fig. 5b. Clearly, the analysis reveals that the inflow total pressure uncertainty utterly dominates. On the other hand, the inflow total temperature uncertainty contributes significantly to the static pressure variability in the close proximity of the discharging section.
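A hedged sketch of how such first-order Sobol indices can be estimated for the two uncertain inputs is given below; it uses the standard pick-freeze first-order estimator on a generic scalar-output model() and a sampler() of the input distribution, both hypothetical placeholders rather than the chapter's PCE-based post-processing.

sobol_first_order <- function(model, sampler, n = 1000) {
  A <- sampler(n); B <- sampler(n)               # two independent n x d input samples
  yA <- apply(A, 1, model); yB <- apply(B, 1, model)
  V  <- var(c(yA, yB))                           # total output variance
  sapply(seq_len(ncol(A)), function(i) {
    ABi <- A; ABi[, i] <- B[, i]                 # replace only the i-th input column
    yABi <- apply(ABi, 1, model)
    mean(yB * (yABi - yA)) / V                   # pick-freeze estimate of S_i
  })
}
# e.g. sampler <- function(n) cbind(Pt = rnorm(n, Pt_mean, Pt_sd), Tt = rnorm(n, Tt_mean, Tt_sd))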

4 Bayesian Inference of Fluid Model Parameters

The recent emergence of experimental facilities for the investigation of non-ideal flows suggests that a considerable amount of data will soon be available. Certainly, this calls for the development of novel and reliable tools to take advantage of the data collected in experiments. In this section, we report the description of a Bayesian framework for inferring the material-dependent parameters appearing in complex thermodynamic models for non-ideal flows. The contents presented hereinafter appear in [15, 21]. The framework is tailored to the TROVA test-rig at Politecnico di Milano [22, 32, 36] and it is intended to evaluate the potential analyses which could be carried out on NICFD flow data. In particular, the results reported hereinafter concern the inference of the
Peng-Robinson (PR) fluid model parameters [31]. The calibration data set o is relative to the non-ideal expansion of a supersonic MDM flow across a converging diverging nozzle. The reader is referred to [15, 21] for a thorough description of the experiments and for a detailed summary of test conditions. In the inference process, the unknown variables set q includes the TROVA operating conditions and the PR model parameters. The operating conditions are included in the inferential process and treated as nuisance parameters. According to this notation, the Bayes theorem reads



$$
P(\boldsymbol{q}\mid \boldsymbol{o}) \propto P(\boldsymbol{o}\mid \boldsymbol{q})\, P(\boldsymbol{q}), \qquad (3)
$$

where q = (P^t, T^t, P_cr, T_cr, ω, γ)^T collects, respectively, the inflow total pressure, the inflow total temperature, the MDM critical pressure and temperature, the MDM acentric factor and the specific heat ratio. A Gaussian likelihood function L is employed. Uniformly distributed priors P(q) ∼ U_q[q_min, q_max] are considered. The prior bounds were selected to largely encompass reference values found in the literature, or according to thermodynamic stability criteria or physical limits. Since U denotes a uniform probability distribution, the Bayes theorem ultimately reads

$$
P(\boldsymbol{q}\mid \boldsymbol{o}) \propto \prod_{j=1}^{N_e} \mathcal{L}_j \, U_{\boldsymbol{q}}, \qquad (4)
$$

where Ne is the number of experiments carried out. The resulting Probability Density Functions (PDFs) are reported in Fig. 6 (we recall that the ξj are, in order, P^t, T^t, P_cr, T_cr, ω, γ).
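The kind of sampler behind such posteriors can be sketched in a few lines of R; the random-walk Metropolis scheme below is only an illustration under the stated priors and likelihood (the chapter's actual implementation is not reproduced here), with f a hypothetical forward model mapping q to the measured quantities, and y_obs, sigma, q_min, q_max supplied by the user.

log_post <- function(q, y_obs, f, sigma, q_min, q_max) {
  if (any(q < q_min | q > q_max)) return(-Inf)             # uniform prior support
  sum(dnorm(y_obs, mean = f(q), sd = sigma, log = TRUE))   # Gaussian likelihood, Eq. (4)
}

metropolis <- function(lp, q0, prop_sd, n_iter = 5000) {
  d <- length(q0); chain <- matrix(NA, n_iter, d)
  q <- q0; cur <- lp(q)
  for (i in seq_len(n_iter)) {
    cand <- q + rnorm(d, 0, prop_sd)                       # random-walk proposal
    new  <- lp(cand)
    if (log(runif(1)) < new - cur) { q <- cand; cur <- new }
    chain[i, ] <- q
  }
  chain                                                    # posterior samples of q
}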

Fig. 6 Posterior PDFs for the PR thermodynamic model parameters. The red line is the prior, the blue line the posterior

Generally, the mass of the posteriors accumulates at the upper or the lower bounds. This reveals that the inference process is trying to explore regions of the stochastic space which lie outside the imposed prior bounds. Only the posterior of ξ1, related to the total pressure P^t at the nozzle inlet, is fully contained within the prior range. In general, the Bayesian framework fails in finding a combination of values allowing the matching between numerics and experiments. The results suggest either an epistemic uncertainty underlying the computational model of the test section or biased experimental observations. Inference based on a synthetic data set, that is, with observations extrapolated from high-fidelity simulations, reveals that the considered experiment may not be suitable for the goal of inferring the Peng-Robinson model coefficients for MDM [21]. Indeed, it is virtually shown that, despite the variety of measurements that can possibly be obtained within the TROVA, little may be potentially learned. Nevertheless, the framework returns substantial indications toward the development of future experiments. For instance, temperature measurements would be beneficial to the inference process, as their inclusion helps sharpen the posterior distributions of some unknowns. Moreover, in the considered thermodynamic conditions the problem is utterly dominated by the uncertainty over the value of the total pressure at the domain inlet. Therefore, accurately controlling the inflow conditions is crucial. Though not easily achievable in practice, test conditions closer to the saturation curve, which foster more significant non-ideal effects, yield an improved output of the inference process.

5 Conclusions

This contribution reviews some of the advancements achieved in the frame of two EU-funded projects (UTOPIAE and NSHOCK). Namely, we provide an overview of non-ideal oblique shock waves. The non-ideal effects described in Sect. 2 are relevant to many applications including, just to mention a few, engine intakes, rockets, supersonic nozzle outflows, under-expanded jets, and highly loaded turbomachinery stages. In particular, they may influence the pattern of the typical fish-tail shocks generated at the turbine stator blade trailing edge. Therefore, their understanding is of the utmost relevance for the future improvement of ORC turbine design.

In Sect. 3 we review the first-ever experimental accuracy assessment of computational models for NICFD flows. Results reveal that all the considered non-ideal flows are fairly well simulated by the SU2 NICFD CFD solver. The variability of the numerical solution is limited to very small values, pointing out the robustness and the predictive character of the numerical tool. Nevertheless, the numerical/experimental comparisons also reveal a mismatch that increases with the non-ideality of the flow regime. The mismatch suggests that the CFD model of the TROVA test-rig could be further improved.

In Sect. 4 we review a Bayesian framework for the inference of the material-dependent parameters entering complex equations of state for non-ideal flows. Results suggest that the available measurements are possibly biased or that the computational model includes some inherent uncertainties. In any case, an analysis based on a synthetic data set reveals that the available observations (Mach and pressure measurements) would not bring substantial information to the inference process. The analysis shows that temperature measurements, though difficult to obtain, would instead be beneficial.

Acknowledgments This research was partially funded by the UTOPIAE Marie Curie Innovative Training Network, H2020-MSCA-ITN-2016, Grant Agreement number 722734, and partially funded by the European Research Council under Grant ERC Consolidator 2013, project NSHOCK 617603. Numerical experiments presented in this chapter were carried out using the PlaFRIM experimental testbed, supported by Inria, CNRS (LABRI and IMB), Université de Bordeaux, Bordeaux INP and Conseil Régional d'Aquitaine (see https://www.plafrim.fr/).

References 1. Bates, J.W., Montgomery, D.C.: Some numerical studies of exotic shock wave behavior. Phys. Fluids 11(2), 462–475 (1999). https://doi.org/http://dx.doi.org/10.1063/1.869862, http:// scitation.aip.org/content/aip/journal/pof2/11/2/10.1063/1.869862 2. Bethe, H.A.: The theory of shock waves for an arbitrary equation of state. Technical paper 545, Office Sci. Res. & Dev. (1942) 3. Cinnella, P., Congedo, P., Parussini, L.: Quantification of thermodynamic uncertainties in real gas flows. Int. J. Eng. Syst. Modell. Simul. 2(1–2), 12–24 (2010) 4. Cinnella, P., Congedo, P., Pediroda, V., Parussini, L.: Sensitivity analysis of dense gas flow simulations to thermodynamic uncertainties. Phys. Fluids 23, 116101 (2011) 5. Colonna, P., der Stelt, T.P., Guardone, A.: FluidProp: A Program for the Estimation of Thermophysical Properties of Fluids. Energy Technology Section, Delft University of Technology, The Netherlands (2005) 6. Congedo, P., Corre, C., Martinez, J.M.: Shape optimization of an airfoil in a BZT flow with multiple-source uncertainties. Comput. Methods Appl. Mech. Eng. 200(1–4), 216–232 (2011). https://doi.org/10.1016/j.cma.2010.08.006, http://linkinghub.elsevier.com/retrieve/pii/ S0045782510002392 7. Congedo, P., Geraci, G., Abgrall, R., Pediroda, V., Parussini, L.: Tsi metamodels-based multiobjective robust optimization. Eng. Comput. (Swansea, Wales) 30(8), 1032–1053 (2013) 8. Cramer, M.S., Kluwick, A.: On the propagation of waves exhibiting both positive and negative nonlinearity. J. Fluid Mech. 142, 9–37 (1984) 9. Cramer, M.S., Sen, R.: Shock formation in fluids having embedded regions of negative nonlinearity. Phys. Fluids 29, 2181–2191 (1986) 10. Cramer, M.S., Sen, R.: Exact solutions for sonic shocks in van der Waals gases. Phys. Fluids 30, 377–385 (1987) 11. D’Angelo, S., Vimercati, D., Guardone, A.: A unified description of oblique waves in ideal and non-ideal steady supersonic flows around compressive and rarefactive corners. Acta Mechanics 229, 2585–2595 (2018). https://doi.org/10.1007/s00707-018-2130-6 12. Economon, T.D., Palacios, F., Copeland, S.R., Lukaczyk, T.W., Alonso, J.J.: SU2: An opensource suite for multiphysics simulation and design. AIAA J. 54(3), 828–846 (2015). https:// doi.org/10.2514/1.J053813

13. Economon, T.D., Mudigere, D., Bansal, G., Heinecke, A., Palacios, F., Park, J., Smelyanskiy, M., Alonso, J.J., Dubey, P.: Performance optimizations for scalable implicit RANS calculations with SU2. Comput. Fluids 129, 146–158 (2016). https://doi. org/http://dx.doi.org/10.1016/j.compfluid.2016.02.003, http://www.sciencedirect.com/science/ article/pii/S0045793016300214 14. Geraci, G., Congedo, P., Abgrall, R., Iaccarino, G.: High-order statistics in global sensitivity analysis: Decomposition and model reduction. Comput. Methods Appl. Mech. Eng. 301, 80–115 (2016). https://doi.org/http://dx.doi.org/10.1016/j.cma.2015.12.022, http://www. sciencedirect.com/science/article/pii/S0045782515004284 15. Gori, G.: Non-Ideal Compressible-Fluid Dynamics: Developing a Combined Perspective on Modeling, Numerics and Experiments. Ph.D. thesis, Politecnico di Milano (2019) 16. Gori, G., Guardone, A., Vitale, S., Head, A., Pini, M., Colonna, P.: Non-ideal compressiblefluid dynamics simulation with SU2: Numerical assessment of nozzle and blade flows for organic rankine cycle applications. In: 3rd International Seminar on ORC Power Systems. Brussels, Belgium (October 2015) 17. Gori, G., Vimercati, D., Guardone, A.: Non-ideal compressible-fluid effects in oblique shock waves. J. Phys. Conf. Ser. 821(1), 012003 (2017). http://stacks.iop.org/1742-6596/821/i=1/a= 012003 18. Gori, G., Vimercati, D., Guardone, A.: A numerical investigation of oblique shock waves in non-ideal compressible-fluid flows. In: 31st International Symposium on Shock Waves. Nagoya, Japan (2017) 19. Gori, G., Zocca, M., Cammi, G., Spinelli, A., Guardone, A.: Experimental assessment of the open-source SU2 CFD suite for ORC applications. Energy Procedia 129(Supplement C), 256– 263 (2017) 20. Gori, G., Zocca, M., Cammi, G., Spinelli, A., Congedo, P., Guardone, A.: Accuracy assessment of the non-ideal computational fluid dynamics model for siloxane MDM from the open-source SU2 suite. Eur. J. Mech. B/Fluids 79, 109–120 (2020). https://doi.org/https:// doi.org/10.1016/j.euromechflu.2019.08.014, http://www.sciencedirect.com/science/article/pii/ S099775461830712X 21. Gori, G., Zocca, M., Guardone, A., Maître], O.L., Congedo, P.: Bayesian inference of thermodynamic models from vapor flow experiments. Comput. Fluids 205, 104550 (2020). https://doi.org/https://doi.org/10.1016/j.compfluid.2020.104550, http://www. sciencedirect.com/science/article/pii/S0045793020301225 22. Guardone, A., Spinelli, A., Dossena, V.: Influence of molecular complexity on nozzle design for an organic vapor wind tunnel. ASME J. Eng. Gas Turb. Power 135, 042307 (2013) 23. Lambrakis, K.C., Thompson, P.A.: Existence of real fluids with a negative fundamental derivative Γ . Phys. Fluids 15(5), 933–935 (1972) 24. Le Maître, O., Knio, O.: Spectral Methods for Uncertainty Quantification. Scientific Computation, 1st edn.. Springer Netherlands (2010). https://doi.org/10.1007/978-90-481-3520-2 25. Menikoff, R., Plohr, B.J.: The Riemann problem for fluid flow of real material. Rev. Mod. Phys. 61(1), 75–130 (1989) 26. Merle, X., Cinnella, P.: Bayesian quantification of thermodynamic uncertainties in dense gas flows. Reliab. Eng. Syst. Safety 134(Supplement C), 305–323 (2015). https://doi. org/https://doi.org/10.1016/j.ress.2014.08.006, http://www.sciencedirect.com/science/article/ pii/S0951832014001999 27. Nannan, N.R., Guardone, A., Colonna, P.: Critical point anomalies include expansion shock waves. Phys. Fluids 26(2), 021701 (2014). 
https://doi.org/http://dx.doi.org/10.1063/1.4863555, http://scitation.aip.org/content/aip/journal/pof2/26/2/10.1063/1.4863555 28. Nannan, N.R., Sirianni, C., Mathijssen, T., Guardone, A., Colonna, P.: The admissibility domain of rarefaction shock waves in the near-critical vapour–liquid equilibrium region of pure typical fluids. J. Fluid Mech. 795, 241–261 (2016). https://doi.org/10.1017/jfm.2016.197 29. Palacios, F., Colonno, M.R., Aranake, A.C., Campos, A., Copeland, S.R., Economon, T.D., Lonkar, A.K., Lukaczyk, T.W., Taylor, T.W.R., Alonso, J.J.: Stanford University Unstructured (SU2 ): An open-source integrated computational environment for multi-physics simulation and


design. AIAA Paper 2013-0287 51st AIAA Aerospace Sciences Meeting and Exhibit (January 2013) 30. Palacios, F., Economon, T.D., Aranake, A., Copeland, R.S., Lonkar, A., Lukaczyk, T., Manosalvas, D.E., Naik, R.K., Padron, S., Tracey, B., Variyar, A., Alonso, J.J.: Stanford university unstructured (SU2): Analysis and design technology for turbulent flows. AIAA Paper 2014-0243 52nd Aerospace Sciences Meeting (2014) 31. Peng, D.Y., Robinson, D.B.: A new two-constant equation of state. Ind. Eng. Chem. Fundam. 15, 59–64 (1976) 32. Pini, M., Spinelli, A., Dossena, V., Gaetani, P., Casella, F.: Dynamic simulation of a test rig for organic vapours (August 7–10 2011) 33. Pini, M., Vitale, S., Colonna, P., Gori, G., Guardone, A., Economon, T., Alonso, J., Palacios, F.: SU2: The Open-Source Software for Non-ideal Compressible Flows, vol. 821, p. 012013 (2017) 34. Roy, C.J., Oberkampf, W.L.: A comprehensive framework for verification, validation, and uncertainty quantification in scientific computing. Comput. Methods Appl. Mech. Eng. 200(25), 2131–2144 (2011). https://doi.org/https://doi.org/10.1016/j.cma.2011.03.016, http:// www.sciencedirect.com/science/article/pii/S0045782511001290 35. Sanchez, R., Kline, H., Thomas, D., Variyar, A., Righi, M., Economon, D.T., Alonso, J.J., Palacios, R., Dimitriadis, G., Terrapon, V.: Assessment of the fluid-structure interaction capabilities for aeronautical applications of the open-source solver SU2. In: ECCOMAS, VII European Congress on Computational Methods in Applied Sciences and Engineering, Crete Island, Greece (2016) 36. Spinelli, A., Pini, M., Dossena, V., Gaetani, P., Casella, F.: Design, simulation, and construction of a test rig for organic vapours. ASME J. Eng. Gas Turb. Power 135, 042303 (2013) 37. Spinelli, A., Guardone, A., Cozzi, F., Carmine, M., Cheli, R., Zocca, M., Gaetani, P., Dossena, V.: Experimental observation of non-ideal nozzle flow of siloxane vapor MDM. In: 3rd International Seminar on ORC Power Systems, Brussels, Belgium, 12–14 October (2015) 38. Spinelli, A., Cozzi, F., Zocca, M., Gaetani, P., Dossena, V., Guardone, A.: Experimental investigation of a non-ideal expansion flow of siloxane vapor MDM. In: Proceedings of the ASME 2016 Turbo Expo, Soul. No. GT2016-57357 (2016) 39. Spinelli, A., Cammi, G., Zocca, M., Gallarini, S., Cozzi, F., Gaetani, P., Dossena, V., Guardone, A.: Experimental observation of non-ideal expanding flows of siloxane MDM vapor for ORC applications. Energy Procedia 129, 1125–1132 (2017) 40. Thompson, P.A.: A fundamental derivative in gasdynamics. Phys. Fluids 14(9), 1843–1849 (1971) 41. Thompson, P.A.: Compressilbe Fluid Dynamics. McGraw-Hill (1988) 42. Thompson, P.A., Lambrakis, K.C.: Negative shock waves. J. Fluid Mech. 60, 187–208 (1973) 43. Vimercati, D.: Non-Ideal Steady Supersonic Flows. Ph.D. thesis, Politecnico di Milano (2019) 44. Vimercati, D., Gori, G., Spinelli, A., Guardone, A.: Non-ideal effects on the typical trailing edge shock pattern of ORC turbine blades. Energy Procedia 129(Supplement C), 1109–1116 (2017) 45. Vimercati, D., Gori, G., Guardone, A.: Non-ideal oblique shock waves. J. Fluid Mech. 847, 266–285 (2018). https://doi.org/10.1017/jfm.2018.328 46. Vitale, S., Gori, G., Pini, M., Guardone, A., Economon, T.D., Palacios, F., Alonso, J.J., Colonna, P.: Extension of the SU2 open source CFD code to the simulation of turbulent flows of fluids modelled with complex thermophysical laws. In: 22nd AIAA Computational Fluid Dynamics Conference, No. AIAA Paper 2760 (2015) ´ 47. 
Zel'dovich, Y.B.: On the possibility of rarefaction shock waves. Zh. Eksp. Teor. Fiz. 4, 363–364 (1946) 48. Zocca, M.: Experimental Observation of Supersonic Non-ideal Compressible-Fluid Flows. Ph.D. thesis, Politecnico di Milano (2018) 49. Zocca, M., Guardone, A., Cammi, G., Cozzi, F., Spinelli, A.: Experimental observation of oblique shock waves in steady non-ideal flows. Exp. Fluids 60, 101 (2019). https://doi.org/10.1007/s00348-019-2746-x

Dealing with High Dimensional Inconsistent Measurements in Inverse Problems Using Surrogate Modeling: An Approach Based on Sets and Intervals

Krushna Shinde, Pierre Feissel, and Sébastien Destercke

1 Introduction

The characterization of material behavior requires (under heterogeneous conditions, e.g., complex loading, geometry, or material) an inverse method to identify the material parameters. The deterministic identification problem is generally highly sensitive to data quality, and one way to resolve this issue is to take the uncertainties in the data into account. While several identification methods exist in the literature, most of them use least-squares minimization or Bayesian approaches. Both of these approaches, however, can be quite sensitive to outliers [1, 2], or to aberrant measurements. In this chapter, we present a new identification strategy to solve the inverse problem, in particular when measurements are inconsistent with one another. In this strategy, we use sets [3] to model the uncertainty on the information related to measurements and parameters. We also develop some indicators of the consistency of the measurements to characterize inconsistent measurements, that is, outliers in the data. In the literature, in the context of set-based identification approaches, there are several methods, like the Q-intersection method [4], to deal with inconsistent measurements, but such approaches are not efficient when dealing
with a large number of measurements. We applied our strategy to identify the elastic parameters of an isotropic material from full-field displacement measurements and detect outliers in the measurements. We considered two cases: (1) identification from a small number of measurements, (2) identification from a large number of measurements using surrogate modeling.

2 Identification Strategy and Outlier Detection Method

We consider an inverse problem where we want to identify some parameters of a model y = f(θ) from N measurements made on the quantity y. The model f yields the relationship between the M model parameters θ ∈ R^M and the measured quantity, under given experimental conditions. We will denote by ỹ ∈ R^N the measurements made on y. In this work, we consider that the model is accurate, in the sense that any discrepancy between f(θ*), θ* being the true parameter values, and ỹ is due to some measurement errors. In this work, we will consider that our uncertainty on ỹ is provided by an interval [y], that is, a closed set of connected real values noted by $[y] = [\underline{y}, \overline{y}] = \{y \in \mathbb{R} \mid \underline{y} \le y \le \overline{y}\}$, where $\underline{y}$ and $\overline{y}$ are, respectively, the lower and the upper bounds of the interval. One advantage of intervals is that they require almost no assumption regarding the nature and source of uncertainty [5]. In particular, our uncertainty on each measurement ỹ_k will be described by such an interval, and the overall uncertainty S_y on all measurements will correspond to a hyper-cube

$$
S_y = \prod_{k=1}^{N} \,[\underline{\tilde{y}}_k, \overline{\tilde{y}}_k] \subset \mathbb{R}^N \qquad (1)
$$

where each measurement is described by its lower bound $\underline{\tilde{y}}_k$ and upper bound $\overline{\tilde{y}}_k$. Similarly, we will assume that our prior information about the parameters θ is provided by a hyper-cube S⁰_θ ⊂ R^M that provides a simple description of the physical boundaries in which θ can lie. The set-valued solution S_θ to the inverse problem can then be simply described as the set of all parameter values within S⁰_θ that are consistent with the observed uncertain measurements, that is,

$$
S_\theta = \{\theta \in S_\theta^0 \mid f(\theta) \in S_y\}. \qquad (2)
$$

Since in the current approach all measurements are considered independent of each other, due to the fact that we take their Cartesian product, computing S_θ can alternatively be written as the result of the intersection

$$
S_\theta = \bigcap_{k=1}^{N} S_\theta^k \qquad (3)
$$

423

where Skθ = {θ ∈ RM | f (θ ) ∈ [y˜k , y˜k ]}

(4)

is simply the set of parameter values consistent with the kth measurement. If all measurements are consistent, that is in our case if all intervals [y˜k , y˜k ] include the true value of the measured quantity yk , then the solution set Sθ will be nonempty, as Eq. (3) will have at least one solution. However, it is very likely that some measurements will not contain this true value, and B thatkthey will be globally inconsistent. In such a case, we will have Sθ = ∅ as N k=1 Sθ = ∅. There may be several reasons for the inconsistency of the measurements with respect to the model such as presence of measurement outliers or model error. In case of inconsistency, a way to restore consistency is to remove incompatible measurements, that is, possible outliers. To do this, our method relies on measures of consistency that we introduce now.  For any two solution sets Skθ and Skθ corresponding to y˜k and y˜k  measurement respectively, (k, k  ) ∈ {1, . . . , N }2 , we define the degree of inclusion (DOI) of one  solution set Skθ with respect to another Skθ as 

DOIkk  =

A(Skθ ∩ Skθ ) 

A(Skθ )

(5)

where A(Skθ ) corresponds to the area of the set Skθ . The DOI between two solution sets is nonsymmetric, that is, DOIkk  = DOIk  k . DOI reaches to its boundary values in the following situations as illustrated in Fig. 1. ) DOI

kk 

Fig. 1 DOI between two sets

=



1 iff Skθ ⊆ Skθ  0 iff Skθ ∩ Skθ = ∅

(6)

Furthermore, the value of $\mathrm{DOI}_{kk'}$ will always be between 0 and 1 when $A(S_\theta^k)$ is non-zero. The larger the value of the DOI between one solution set and another, the higher the possibility of $S_\theta^k$ being included in $S_\theta^{k'}$.

We now introduce a measurement-wise consistency degree from a set of measurements. By using the pairwise degrees of inclusion (DOI) of the solution sets corresponding to the measurements, we define the global degree of consistency (GDOC) of any kth measurement with respect to all other measurements as

$$
\mathrm{GDOC}(k) = \frac{\sum_{k'=1}^{N} \mathrm{DOI}_{k'k} + \sum_{k'=1}^{N} \mathrm{DOI}_{kk'}}{2N} \qquad (7)
$$

which reaches its boundary values in the following situations:

$$
\mathrm{GDOC}(k) =
\begin{cases}
1 & \text{iff } S_\theta^1 = S_\theta^2 = \cdots = S_\theta^N\\
0 & \text{iff } S_\theta^k \cap S_\theta^{k'} = \emptyset,\ \forall\, k' \in \{1, \ldots, N\}
\end{cases}
\qquad (8)
$$

The value of GDOC(k) will always be between 0 and 1. Note that the condition for GDOC = 1 is very strong, as it requires all sets to be identical. If GDOC(k) = 0, then the kth measurement is fully inconsistent with all other measurements. A high value of GDOC for the kth measurement then indicates a high consistency with most of the other measurements.

Finally, we define a global consistency measure for a group of measurements. Let $S = \{S_\theta^1, \ldots, S_\theta^k, \ldots, S_\theta^N\}$, with $S_\theta^k \subseteq \mathbb{R}^M$, be the set of solutions to the inverse problem for the measurements $\{y_1, \ldots, y_N\}$. We define the general consistency (GCONS) for any subset E ⊂ S of measurements as

$$
\mathrm{GCONS}(E) = \frac{A\!\left(\bigcap_{S_\theta^k \in E} S_\theta^k\right)}{\min_{S_\theta^k \in E} A\!\left(S_\theta^k\right)}. \qquad (9)
$$

It has the following properties: 1. It is insensitive to permutation of the sets of measurement (commutativity). 2. The value of GCONS is monotonically decreasing with the size of the set E, in the sense that for any subsets of measurements E, F, with E ⊆ F , then we have GCONS(F ) ≤ GCONS(E). It also means that the more measurement we have, the less consistent they are with one another. 3. ⎧ B ⎪ 0 iff A( Skθ ) = ∅ ⎪ ⎪ ⎨ Skθ ∈E B GCONS(E) = . ⎪ 1 iff A( Skθ ) = min A(Skθ ) ⎪ ⎪ k ⎩ S ∈E k Sθ ∈E

θ


Algorithm 1 GCONS outlier detection method
Require: S = {S_θ^1, …, S_θ^N}, GCONS_threshold   ▷ The sets S_θ^1, S_θ^2, …, S_θ^N are arranged such that GDOC(1) ≥ GDOC(2) ≥ … ≥ GDOC(N)
Ensure: Consistent set of solution sets S_new ⊆ S corresponding to consistent measurements
1: S_new = {S_θ^1, S_θ^2}                      ▷ Initial set
2: for k ← 3 to N do
3:     E_k = S_new ∪ {S_θ^k}                   ▷ S_θ^k from S
4:     if GCONS(E_k) > GCONS_threshold then
5:         Accept S_θ^k
6:         S_new = S_new ∪ {S_θ^k}             ▷ S_θ^k from S
7:     else
8:         S_new = S_new                       ▷ The kth measurement, whose solution set is S_θ^k, is discarded
9:     end if
10: end for
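For illustration, the selection loop of Algorithm 1 can be sketched in a few lines of Python, assuming each solution set S_θ^k is stored as a boolean mask over a common parameter grid (the discrete description introduced below); the helper names and the NumPy-based representation are ours, not the authors' implementation.

```python
import numpy as np

def gcons(masks):
    """General consistency (Eq. (9)): area of the intersection of the sets
    divided by the smallest individual area, estimated by counting grid points."""
    inter = np.logical_and.reduce(masks)
    smallest = min(int(m.sum()) for m in masks)
    return inter.sum() / smallest if smallest > 0 else 0.0

def gcons_outlier_detection(masks, gdoc, threshold):
    """Greedy selection of Algorithm 1: measurements are visited by decreasing
    GDOC; a measurement is kept only if adding its solution set keeps the
    global consistency above `threshold`."""
    order = np.argsort(gdoc)[::-1]                  # decreasing GDOC
    kept = [order[0], order[1]]                     # initial set (line 1)
    selected = [masks[k] for k in kept]
    for k in order[2:]:
        if gcons(selected + [masks[k]]) > threshold:
            selected.append(masks[k])               # accept the k-th measurement
            kept.append(k)
        # otherwise the k-th measurement (a possible outlier) is discarded
    return kept
```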

A good principle for choosing a subset of consistent measurements would be to search for the largest subset E (the maximal number of measurements) that has a reasonable consistency, that is, for which GCONS(E) is above some threshold. Yet such a search could be exponential in N, which can be quite large, and therefore intractable. This is why we propose a greedy algorithm (Algorithm 1) that uses the GDOC measures to find a suitable subset E. The idea is simple: starting from the most consistent measurement according to GDOC and ordering the measurements by their individual consistency, we iteratively add new measurements to E unless they bring the global consistency GCONS below a predefined threshold, that is, unless they introduce too much inconsistency.

Implementation with Discrete Description of Sets

To solve the set-valued inverse problem, we need a discrete description of the sets. There are multiple ways to represent the sets discretely, such as using boxes (the SIVIA algorithm [6]) or a grid of points. Here, we use the same description as in [7], that is, a grid of points θ_i, i ∈ {1, …, N_g}, as shown in Fig. 2a, where N_g is the number of grid points. Such a description is convenient when comparing or intersecting the sets, since the grid of points is the same for every set. Any set S_θ ⊂ S_θ^0 is then characterized through its discrete characteristic function, defined at any point θ_i ∈ S_θ^0 of the grid as shown in Eq. (10) and Fig. 2b:

$$ \chi_{S_\theta}(\theta_i) = \begin{cases} 1 & \text{if } \theta_i \in S_\theta, \\ 0 & \text{otherwise.} \end{cases} \qquad (10) $$

In the current application, a uniform grid is chosen to describe the prior parameter set S_θ^0, though this is not mandatory. In our method, each S_θ^k is then described by its discrete characteristic function, defined at any point of the grid as


$$ \chi_{S_\theta^k}(\theta_i) = \begin{cases} 1 & \text{if } \underline{\tilde{y}}_k \le f(\theta_i) \le \overline{\tilde{y}}_k, \\ 0 & \text{otherwise.} \end{cases} \qquad (11) $$

Fig. 2 Discrete description of sets. (a) Prior set (S_θ^0). (b) Characterized set (S_θ)

These discrete characteristic functions can be collected as columns of boolean values in an N_g × N matrix X, as shown in Eq. (12):

$$ X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 0 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 0 \end{bmatrix}, \qquad (12) $$

where χ_{S_θ^k}(θ_i) is the element in column k and row i.

Using the matrix X, an N × N symmetric matrix T = XᵀX can be obtained, whose components are directly proportional to the areas of the solution sets and can therefore be used as estimates of those areas:

$$ T \propto \begin{bmatrix} A(S_\theta^1) & A(S_\theta^1 \cap S_\theta^2) & \cdots & A(S_\theta^1 \cap S_\theta^k) \\ A(S_\theta^1 \cap S_\theta^2) & A(S_\theta^2) & \cdots & A(S_\theta^2 \cap S_\theta^k) \\ \vdots & \vdots & \ddots & \vdots \\ A(S_\theta^1 \cap S_\theta^k) & A(S_\theta^2 \cap S_\theta^k) & \cdots & A(S_\theta^k) \end{bmatrix}. \qquad (13) $$

Indeed, the diagonal element T_kk of T counts the number of grid points at which the kth measurement is consistent, and it is proportional to A(S_θ^k). The off-diagonal element T_kk' of T counts the number of grid points at which both the kth and k'th measurements are consistent, and it is proportional to A(S_θ^k ∩ S_θ^k'). Hence, GDOC can be computed from the matrix T for any kth measurement as

$$ \mathrm{GDOC}(k) = \frac{\displaystyle\sum_{k'=1}^{N} \frac{T_{k'k}}{T_{kk}} + \sum_{k'=1}^{N} \frac{T_{kk'}}{T_{k'k'}}}{2N}. \qquad (14) $$
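To make the discrete computation explicit, the boolean matrix X of Eq. (12), the matrix T = XᵀX of Eq. (13) and the GDOC values of Eq. (14) could be evaluated as in the following Python sketch; the callable `forward_model` and the interval bounds are placeholders for the problem-specific model, and each diagonal entry of T is assumed to be non-zero.

```python
import numpy as np

def characteristic_matrix(theta_grid, forward_model, y_low, y_up):
    """Boolean matrix X (Eq. (12)): entry (i, k) is True when the k-th interval
    measurement is satisfied at grid point theta_i (Eq. (11))."""
    # forward_model maps one grid point to the vector of N predicted measurements
    Y = np.array([forward_model(theta) for theta in theta_grid])  # shape (Ng, N)
    return (Y >= y_low) & (Y <= y_up)

def gdoc_from_T(X):
    """GDOC of every measurement (Eq. (14)) computed from T = X^T X (Eq. (13))."""
    T = X.astype(float).T @ X.astype(float)  # T[k, k'] proportional to A(S^k ∩ S^k')
    diag = np.diag(T)                        # proportional to A(S^k); assumed non-zero
    D = T / diag[:, None]                    # D[k, k'] = DOI_kk' = T_kk' / T_kk
    N = T.shape[0]
    return (D.sum(axis=1) + D.sum(axis=0)) / (2 * N)
```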


We have presented an identification strategy and an outlier detection method that use intervals to represent information about parameters and measurements. The next section is devoted to applying this strategy to a mechanical inverse problem. We consider two cases: (1) identification from a small number of measurements, and (2) identification from a large number of measurements using surrogate modeling.

3 Results

We now consider a typical mechanical inverse problem. The parameters θ correspond to the elastic Lamé parameters (λ and μ), and ỹ corresponds to full-field displacement data obtained after applying a given load, corresponding to the experimental conditions, to a material specimen: a homogeneous 2D plate under plane strain, as shown in Fig. 3a. The plate is clamped on the left side and loaded on the right side by a uniform traction f = 1000 N/m. To generate the displacement measurement data ỹ (386 measurements), exact displacement data y^Ref are simulated by a Finite Element (FE) model (193 nodes, 336 elements), as shown in Fig. 3b, using the reference values λ_0 = 1.15 · 10^5 MPa and μ_0 = 7.69 · 10^4 MPa. We also consider a possible Gaussian noise with zero mean (no systematic bias) and standard deviation σ. In the current work, σ was taken as 5% of the average of the exact displacement values; in practical cases, σ can be assumed to be deducible from the measurement technique.

Fig. 3 A homogeneous plate and its model. (a) 2D homogeneous plate. (b) FEM mesh


Fig. 4 Identification of parameters and outlier detection. (a) Empty solution set. (b) Solution set after detecting outlier

3.1 Application of the Set-Valued Inverse Method with a Small Number of Measurements

We apply the set-valued inverse method together with the outlier detection method to identify the set of elastic parameters when the data contain random noise. The measurement ỹ is created from y^Ref by adding a Gaussian white noise with standard deviation σ, and the information on the measurement ỹ is described in interval form: [ỹ − 2σ, ỹ + 2σ]. The prior information about the parameters (S_θ^0) is a uniform 2D box λ_p × μ_p with λ_p = [0.72 · 10^5, 1.90 · 10^5] MPa and μ_p = [7.2 · 10^4, 8.15 · 10^4] MPa. Figure 4a shows that the identified set (green) obtained when taking all the measurements is empty, due to inconsistency within the measurements. To obtain a non-empty solution set, we use the GCONS outlier detection method (see Algorithm 1). Figure 4b shows the feasible set (yellow) of the identified parameters obtained with the outlier detection method, with 55 of the 386 measurements removed. Note that the exact value of the parameters, shown by the red mark, is included in the solution set.
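As an illustration only, the interval measurements used above could be generated from synthetic noisy data as follows; the reference displacement vector below is a placeholder standing in for the FE solution, while the 5% noise rule and the 2σ half-width follow the text.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# placeholder standing in for the 386 exact FE displacements y_Ref
y_ref = rng.normal(scale=1e-3, size=386)

sigma = 0.05 * np.abs(y_ref).mean()                          # 5% of the mean displacement
y_tilde = y_ref + rng.normal(0.0, sigma, size=y_ref.shape)   # noisy measurements

# interval description of each measurement: [y~ - 2*sigma, y~ + 2*sigma]
y_low, y_up = y_tilde - 2.0 * sigma, y_tilde + 2.0 * sigma
```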

3.2 Application of the Set-Valued Inverse Method with a Large Number of Measurements

In practice, mechanical inverse problems often involve thousands of measurements rather than a few hundred. We now identify the elastic properties (Lamé parameters λ and μ) of the homogeneous 2D plate under plane strain shown in Fig. 3a from a large number of measurements. Figure 5 shows the 2D box of the grid of points, where each grid point represents values of the parameters λ and μ, and x denotes the position of the grid point.


Fig. 5 Description of parameters with a grid of points

In our application, we need to solve the Finite Element (FE) model at each grid point to compute the displacements corresponding to the parameters. If the inverse problem involves identification from a large number of measurements, that is, an FE model with a large number of nodes and elements, this leads to a high computational cost. To reduce this cost, that is, to avoid solving the FE problem at each grid point, we use the radial basis function (RBF) interpolation method to build a surrogate model [8]. In this method, we solve the FE problem at a few sample grid points and then use the surrogate model to compute the displacement at any grid point, which reduces the computational cost. We now describe the steps used to build the surrogate model.

1. To build the surrogate model, we initially choose at random N_s sample grid points out of the N_g grid points. We solve the FE model to compute the displacement vectors at these N_s grid points, and we store the resulting N_s displacement vectors in the matrix X as

$$ X = [y_1, \ldots, y_{N_s}]. \qquad (15) $$

The size of the matrix X is N × N_s, where N is the size of the displacement column vector y.

2. In the second step, we perform the singular value decomposition (SVD) of the matrix X,

$$ \mathrm{SVD}(X) = U \Sigma V^{\mathsf T}, \qquad (16) $$

where U is an N × N unitary matrix whose columns are the left-singular vectors, Σ is an N × N_s diagonal matrix whose diagonal entries are the singular values of X, and V is an N_s × N_s unitary matrix whose columns are the right-singular vectors. We use the left-singular vectors associated with the largest singular values as basis vectors to approximate the displacement vector y at any grid point x as

$$ y(x) = \sum_{k=1}^{N_b} \phi_k\, \alpha_k(x) = [\phi]\, \alpha, \qquad (17) $$

where [φ] is a matrix of size N × N_b, N_b is the number of basis vectors (N_b ≤ N_s), and α is the N_b × 1 vector of unknown coefficients.

3. In the next step, the idea is to find the optimal values of α. Since the displacement vector y is known at the N_s chosen sample grid points, thanks to the FE computations, we can compute the coefficients at any ith sample grid point, i = 1, …, N_s, by a simple least-squares method:

$$ \alpha(x_i) = \operatorname*{arg\,min}_{\alpha \in \mathbb{R}^{N_b}} J(\alpha) = \Big\| \sum_{k=1}^{N_b} \phi_k\, \alpha_k(x_i) - y(x_i) \Big\|_2^2. \qquad (18) $$

4. We can compute the values of the coefficients α at the sampled grid points using Eq. (18), and the displacement vector y at any sampled grid point x_i using Eq. (17). To obtain the displacement at an arbitrary grid point x, we need the values of the coefficients α at that point. For this, we use the RBF interpolation method: we compute the unknown vector α at any grid point x from the known vectors α at the N_s sample grid points. We approximate α at any grid point x by defining radial basis functions at the N_s sampled grid points,

$$ \alpha(x) = \sum_{i=1}^{N_s} w_i\, \varphi(\|x - x_i\|) = [\varphi]\,[w], \qquad (19) $$

where ϕ is a radial basis function, [ϕ] is an N_s × N_s matrix, and w is the N_s × 1 vector of unknown weights. We choose a Gaussian RBF,

$$ \varphi(\|x - x_i\|) = \exp\!\left( - \frac{\|x - x_i\|^2}{2\sigma^2} \right). $$

The unknown weights w can be computed by solving the following linear system of equations:

$$ \begin{bmatrix} \varphi(\|x_1 - x_1\|) & \varphi(\|x_2 - x_1\|) & \cdots & \varphi(\|x_{N_s} - x_1\|) \\ \varphi(\|x_1 - x_2\|) & \varphi(\|x_2 - x_2\|) & \cdots & \varphi(\|x_{N_s} - x_2\|) \\ \vdots & \vdots & \ddots & \vdots \\ \varphi(\|x_1 - x_{N_s}\|) & \varphi(\|x_2 - x_{N_s}\|) & \cdots & \varphi(\|x_{N_s} - x_{N_s}\|) \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_{N_s} \end{bmatrix} = \begin{bmatrix} \alpha(x_1) \\ \alpha(x_2) \\ \vdots \\ \alpha(x_{N_s}) \end{bmatrix} $$

We can then compute the unknown vector α at any grid point x from the weights w using Eq. (19). Knowing α, we can compute the displacement vector y at any grid point x, which avoids solving the FE problem at each grid point.
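The four steps above can be condensed into the short sketch below, using NumPy for the SVD and SciPy's RBFInterpolator for step 4. The function and variable names are ours; note also that SciPy's Gaussian kernel uses the convention exp(−(εr)²), so its shape parameter plays the role of the 1/(2σ²) width used in the text.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def build_surrogate(sample_points, snapshots, n_basis, epsilon):
    """POD/RBF surrogate of the FE displacements.

    sample_points : (Ns, 2) array of sampled (lambda, mu) grid points
    snapshots     : (N, Ns) matrix X of FE displacement vectors (Eq. (15))
    """
    # Step 2: SVD of the snapshot matrix; keep the first n_basis left-singular vectors
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    phi = U[:, :n_basis]                       # reduced basis [phi] of Eq. (17)
    # Step 3: least-squares coefficients at the sampled points (Eq. (18));
    # with orthonormal basis vectors this reduces to a projection
    alpha_samples = phi.T @ snapshots          # shape (n_basis, Ns)
    # Step 4: Gaussian RBF interpolation of the coefficients (Eq. (19))
    rbf = RBFInterpolator(sample_points, alpha_samples.T,
                          kernel="gaussian", epsilon=epsilon)

    def predict(points):
        """Approximate displacement vectors at arbitrary (lambda, mu) grid points."""
        return (phi @ rbf(points).T).T         # shape (n_points, N)

    return predict
```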


Fig. 6 Outlier detection. (a) Empty solution set. (b) Solution set after detecting outliers

Now, we apply the set-valued inverse method together with the GCONS outlier detection method (Algorithm 1) to identify the set of elastic parameters when the data contain random noise. To generate the displacement measurement data ỹ (12,096 measurements), exact displacement data y^Ref are simulated by a Finite Element (FE) model (6048 nodes, 12,094 elements), as shown in Fig. 3b, using the reference values λ_0 = 1.15 · 10^5 MPa and μ_0 = 7.69 · 10^4 MPa. The information on the measurement ỹ is again described in interval form: [ỹ − 2σ, ỹ + 2σ]. Figure 6a shows that the identified set (green) obtained when taking all the measurements is empty, due to inconsistency within the measurements. To obtain a non-empty solution set, we use our proposed approach and Algorithm 1 with GCONS_threshold set to 0.1; this low value of the threshold ensures that a sufficiently high number of measurements is retained. Figure 6b shows the feasible set (yellow) of the identified parameters obtained with the GCONS method, with 1233 measurements removed. Note that the exact value of the parameters (red mark) is included in the solution set.

We solve this identification problem using a surrogate model based on the RBF interpolation method. The total time to solve the problem is 55 min; without surrogate modeling it would have taken almost one day. In the set-valued inverse method, the computation time is divided into two parts: the first concerns solving the FE model to compute the displacement at each grid point, and the second concerns the post-processing of the large number of measurements with the GCONS outlier detection algorithm. For the first part, surrogate modeling reduces the computational time; in this application it takes 22 min of the 55 min, so the GCONS algorithm takes 33 min to process the 12,096 measurements and detect the outliers. One way to reduce the computation time of the GCONS algorithm concerns its initialization: the algorithm normally starts from the two measurements with the largest GDOC values (line 1 of Algorithm 1). When a large number of measurements is available, one can instead start from the largest possible group of measurements with high GDOC values, which reduces the time spent in the GCONS loop.


In this application, out of the 12,096 measurements, the initial set was fixed to the first 8000 measurements with the highest GDOC values. The time taken by the GCONS algorithm then reduces from 33 min to 14 min (compared with starting from the first two measurements with the highest GDOC values), and the total time to solve the identification problem becomes 22 + 14 = 36 min. This application illustrates how the identification problem can be solved with the set-valued inverse method when a large number of measurements is available, using surrogate modeling. In our current work, we also studied the same application with an even larger number of measurements, namely 37,000 full-field displacement measurements.

4 Summary

In this work, we have presented a new parameter identification strategy relying on set theory and on interval measurements. In this approach, intervals describe the uncertainty on measurements and parameters. We have introduced indicators of the consistency of measurements and used them to propose an outlier detection method. We applied this strategy to identify the elastic properties of a homogeneous isotropic material. The results showed that the identification strategy not only yields a feasible set of parameters but is also able to detect outliers in the noisy measurements. We also showed how the identification problem can be solved with the set-valued inverse method when a large number of measurements is available, using surrogate modeling.

Acknowledgments The research reported in this chapter has been supported by the project Labex MS2T, financed by the French Government through the program Investments for the Future managed by the National Agency for Research (Reference ANR-11-IDEX-0004-02).

References

1. Blais, J.A.R.: Least squares for practitioners. Math. Probl. Eng. 2010, 19 (2010)
2. Chen, Y., Breen, P., Andrew, N.L.: Impacts of outliers and mis-specification of priors on Bayesian fisheries-stock assessment. Can. J. Fish. Aquat. Sci. 57, 2293–2305 (2000). https://doi.org/10.1139/f00-208
3. Jaulin, L., Kieffer, M., Didrit, O., Walter, E.: Applied Interval Analysis. Software Engineering/Programming and Operating Systems. Springer-Verlag, London (2001). https://doi.org/10.1007/978-1-4471-0249-6
4. Sandretto, J.A., Trombettoni, G., Daney, D., Chabert, G.: Certified calibration of a cable-driven robot using interval contractor programming. In: Thomas, F., Perez Gracia, A. (eds.) Computational Kinematics. Mechanisms and Machine Science, vol. 15. Springer, Dordrecht (2014). https://doi.org/10.1007/978-94-007-7214-4-24
5. Zio, E., Pedroni, N.: Literature review of methods for representing uncertainty. Cahiers de la Sécurité Industrielle 2013-03, Foundation for an Industrial Safety Culture, Toulouse, France (ISSN 2100-3874). http://www.foncsi.org/en/ (2013)


6. Jaulin, L., Walter, E.: Set inversion via interval analysis for nonlinear bounded-error estimation. Automatica 29(4), 1053–1064 (1993). https://doi.org/10.1016/0005-1098(93)90106-4
7. Sui, L., Feissel, P., Denœux, T.: Identification of elastic properties in the belief function framework. Int. J. Approx. Reason. 101, 69–87 (2018). https://doi.org/10.1016/j.ijar.2018.06.010
8. Durantin, C., Rouxel, J., Désidéri, J., et al.: Multifidelity surrogate modeling based on radial basis functions. Struct. Multidisc. Optim. 56, 1061–1075 (2017). https://doi.org/10.1007/s00158-017-1703-7

Stochastic Preconditioners for Domain Decomposition Methods

João F. Reis, Olivier P. Le Maître, Pietro M. Congedo, and Paul Mycek

1 Introduction

Let Ω ⊂ R^N, N = 1, 2, be an open set, denote by Ω̄ its closure, and let Θ be a set of random events. We are interested in estimating statistics of the solution u of the elliptic equation

$$ \nabla \cdot \big(\kappa(\theta, x)\, \nabla u(\theta, x)\big) = f(x), \qquad \theta \in \Theta, \quad x \in \bar{\Omega}, \qquad (1a) $$
$$ u(\theta, x) = \nu_j, \qquad \theta \in \Theta, \quad x \in \partial\Omega_j,\ j = 1, \ldots, N_b, \qquad (1b) $$

where κ is the random coefficient field, f(x) is a square-integrable deterministic source, and ν_j is a piecewise-constant deterministic Dirichlet boundary condition on ∂Ω.

The work in this chapter was partially supported by the H2020-MSCA-ITN-2016 UTOPIAE project, grant agreement 722734. J. F. Reis and P. M. Congedo: Inria, Centre de Mathématiques Appliquées, École Polytechnique, Palaiseau, France (e-mail: [email protected]). O. P. Le Maître: CNRS, Centre de Mathématiques Appliquées, Inria, École Polytechnique, Palaiseau, France. P. Mycek: Cerfacs, Toulouse, France.

The parameter κ is a log-normal homogeneous field, κ(θ, x) := exp(g(θ, x)), where g is a centered and homogeneous Gaussian random field with variance σ_g² and covariance function defined as

$$ C(x, x') = \sigma_g^2 \exp\!\left( - \frac{\|x - x'\|^{\gamma}}{\gamma\, \ell_g^{\gamma}} \right), \qquad \gamma \in [1, 2]. \qquad (2) $$
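As a small illustration of this formula (with the covariance reconstructed as above and the Euclidean norm assumed), the covariance matrix of g on a set of points can be evaluated as:

```python
import numpy as np

def covariance_matrix(points, sigma_g2, ell_g, gamma):
    """Covariance of g between all pairs of points (Eq. (2)):
    C(x, x') = sigma_g^2 * exp(-||x - x'||^gamma / (gamma * ell_g^gamma))."""
    diff = points[:, None, :] - points[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    return sigma_g2 * np.exp(-(r ** gamma) / (gamma * ell_g ** gamma))
```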

Applications of Eq. (1), for instance in groundwater flow, typically involve a wide range of values of κ. Consequently, the coefficient field is usually non-smooth, which translates into a large variance σ_g², a low correlation length ℓ_g, and a smoothness parameter γ ≈ 1. We are interested in estimating E[z], where z(θ) := z(u) is a real-valued random variable derived from the solution. There are two separate classes of methods for estimating E[z]: surrogate methods and sampling methods. Due to the well-known limitations of surrogate methods for non-smooth fields, we proceed by sampling and, for the sake of simplicity, consider the particular case of Monte Carlo (MC) sampling. The MC approach amounts to computing each sampled solution u^(m) by solving the deterministic counterpart of the elliptic problem (1) corresponding to a sample of the coefficient field κ^(m). Contrary to surrogate methods, MC methods do not depend on the smoothness of the input data and are therefore preferable in this situation. However, sampling methods present a major drawback: their convergence is slow.

Each sampled solution is generally obtained by solving a linear system resulting from some Finite Element (FE) discretization. One way to reduce the cost of generating each sample is to reduce the cost of generating this system. To do so, the authors in [3, 7] construct a surrogate of the FE system, which is then solved by an iterative method. However, the accuracy of the MC estimate then depends on the accuracy of the surrogate, which, again, is limited for non-smooth data. In the present work, we keep the original system and use the surrogate of the FE system as a preconditioner to accelerate the resulting iterative scheme.

We proceed by partitioning Ω into a set of D subdomains Ω^(d), which may or may not overlap. Once this partition is done, we separate the nodes on the boundary of each subdomain from the rest of the nodes (including the ones on ∂Ω). The global FE system associated with each sample κ^(m) is reduced to a boundary-to-boundary system, denoted by

$$ [S]\, u = b_S. \qquad (3) $$

System (3) is significantly smaller than the global FE system but still too large for a direct method. In addition, the matrix [S] is generally badly conditioned, which justifies the use of a preconditioned iterative method. A classical way of preconditioning system (3) is to use deterministic preconditioners, that is, preconditioners that are sample independent. These preconditioners are formed by considering approximations of the coefficient field that do not vary from sample to sample, such as the median κ̄ or the mean E[κ]. One of the main advantages of this approach is that once the preconditioner is available, it can be re-used for any number of samples at no additional cost. However, if the constant approximation κ̄ or E[κ] is significantly far from κ, the corresponding preconditioner is poor.


This is exactly the situation for the type of non-smooth fields with which we are concerned in this work. From this observation, we propose a stochastic preconditioner: each sample of the stochastic preconditioner is adapted to the sampled parameter field of the elliptic equation. This strategy turns out to be very effective while having a very low cost per sample. Below, we review the application of stochastic preconditioners in two different contexts; these contributions include extensive comparisons between different stochastic and deterministic preconditioners.

2 Acceleration of the Schwarz Method

The first contribution [10] of this project concerns the acceleration of the additive Schwarz method (SM) using stochastic preconditioners. Consider a sample κ^(m) and the corresponding deterministic equation from (1). The additive Schwarz method [5] is a classical domain decomposition method for finding the solution of this equation. Each new iteration amounts to finding a new set of boundary values u^{k+1} by solving an independent set of local problems on each subdomain. We proved in [10] that each SM iteration can be written as

$$ u^{k+1} = [L_S]\, u^{k} + b_S, \qquad (4) $$

where [L_S] = [I] − [S], and [S] is obtained from the resolution of local problems on each overlapping subdomain; see [10] for the definitions of [L_S] and [S]. The fixed point of the sequence (4) is the solution of the system (3). The convergence of the SM is governed by the spectral radius of [L_S], denoted ρ_S. The SM always converges for the type of domains we use, so ρ_S < 1 for all samples.

Consider the "gap" between two consecutive SM iterates, g^k := u^{k+1} − u^{k}. Using (4), we have u = u^{k} + [S]^{-1} g^{k}, where u is the solution of (3). A preconditioner for the SM is a matrix [P]^{-1} ≈ [S]^{-1} that accelerates iteration (4), as described in Algorithm 1. Similarly to the SM, we proved in [10] that a Preconditioned SM (PSM) iteration is given by

$$ u^{k+1} = [L_P]\, u^{k} + [P]^{-1} b_S, \qquad (5) $$

where [L_P] := [I] − [P]^{-1}[S].

Algorithm 1 Preconditioned Schwarz iteration
1: procedure PSM-ITERATION(u^k, κ, f, [P])        ▷ Do one preconditioned Schwarz iteration
2:     g ← SM-ITERATION(u^k, κ, f) − u^k          ▷ Compute the gap
3:     u^{k+1} ← u^k + [P]^{-1} g                 ▷ Compute the preconditioned update
4:     Return u^{k+1}                             ▷ Return the updated vector u^{k+1}
5: end procedure


For each sample κ^(m), the spectral radius ρ_P of the corresponding matrix [L_P] dictates the speed of convergence of the PSM. Ideally, we would like ρ_P ≪ ρ_S; note that if ρ_P ≥ 1 the sequence diverges. The ideal preconditioner yields the solution of (3) in a single iteration, that is, [P]^{-1} = [S]^{-1}. However, this requires factorizing [S] for each sample, which would be more expensive than performing the SM with no preconditioning. In the stochastic context, it therefore makes sense to construct a stochastic operator that yields preconditioners [P] ≈ [S] for each sample κ^(m); we call this stochastic operator the stochastic preconditioner. To this end, we introduce the KL expansion [4, 6, 12] of a Gaussian field g with N_KL modes,

$$ \hat{g}_{N_{KL}} := \mu + \sum_{i=1}^{N_{KL}} \sqrt{\lambda_i}\, \phi_i(x)\, \xi_i, \qquad (6) $$

where ξ = (ξ_1, …, ξ_{N_KL}) are i.i.d. standard normal random variables. Define

$$ \hat{\kappa}_{N_{KL}} := \exp(\hat{g}_{N_{KL}}). \qquad (7) $$
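For illustration, a truncated sample of the log-normal field (7) can be generated as follows, approximating the KL modes by the eigendecomposition of the covariance matrix (2) evaluated on the spatial grid; this discretization choice is ours and is not taken from [10].

```python
import numpy as np

def kl_modes(x, sigma_g2, ell_g, gamma, n_kl):
    """Leading KL modes of g on the 1D grid x, obtained from the eigendecomposition
    of the covariance matrix of Eq. (2) (a discrete approximation of the KL problem)."""
    r = np.abs(x[:, None] - x[None, :])
    C = sigma_g2 * np.exp(-(r ** gamma) / (gamma * ell_g ** gamma))
    lam, phi = np.linalg.eigh(C)
    idx = np.argsort(lam)[::-1][:n_kl]          # keep the n_kl largest eigenvalues
    return np.clip(lam[idx], 0.0, None), phi[:, idx]

def sample_kappa(lam, phi, mu=0.0, rng=None):
    """One sample of the truncated field: kappa_hat = exp(g_hat), Eqs. (6)-(7)."""
    rng = rng or np.random.default_rng()
    xi = rng.standard_normal(lam.size)          # i.i.d. standard normal variables
    g_hat = mu + phi @ (np.sqrt(lam) * xi)      # truncated KL expansion of g
    return np.exp(g_hat)
```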

Using the KL expansion (6), we construct a KL preconditioner [Ŝ], which corresponds to the boundary-to-boundary operator for the sample κ̂_{N_KL}^(m) = κ̂_{N_KL}(ξ^(m)). In the following, we illustrate the spectral properties of [Ŝ], which influence the convergence of the sequence. The results in this section correspond to a one-dimensional problem with homogeneous Dirichlet boundary conditions. The parameters of the covariance function (2) are σ_g² = 4, c = 0.1 and γ = 2. The numerical solution is computed on a mesh with 1005 elements and 100 subdomains.

Fig. 1 The spectrum of 3,000 matrices [L_P], with different N_KL. A particular sample is highlighted in dark

Each plot in Fig. 1 represents the complex plane with the unit circle and a cloud of points. Each point represents the spectral radius of a matrix [L_P] formed using a preconditioner with N_KL modes; each plot shows 3000 sampled points, and the number of modes N_KL decreases from the top-left corner to the bottom-right corner. The case N_KL = 0 is the particular case where κ̂_{N_KL} equals the median of κ, denoted κ̄. This is a deterministic preconditioner, since κ̄ does not change from sample to sample, and it offers the advantage that, once constructed, it can be applied to as many samples as necessary at no extra cost. However, as the figure shows, a significant number of the matrices [L_P] have a spectral radius outside the unit disk, which means that the sequence (5) obtained with this preconditioner diverges. For N_KL > 0, the preconditioner changes from sample to sample. More information about κ enters [Ŝ], so the performance of the preconditioner improves; this is visible from the contraction of the cloud of spectral radii towards the origin of the complex plane. At N_KL = 8, all 3000 samples are stable, and the cloud of points continues to contract towards the origin, meaning that the convergence becomes faster.

The KL-based preconditioner described above can be very effective; however, it requires computing [Ŝ] for each sample. In order to generate each preconditioner efficiently, we can use a set of [Ŝ] samples to construct the stochastic preconditioner [S̃]. In [10], the stochastic preconditioner [S̃] is constructed as a PC expansion in the parameters ξ introduced in (6). For an accurate enough surrogate, the resulting preconditioner [S̃] = [S̃](ξ^(m)) should be close to [S] and therefore provide a good acceleration of the sequence (5). We call [S̃] the Polynomial Chaos (PC) preconditioner. The performance of this surrogate is illustrated in Fig. 2: the average spectral radius of 10,000 matrices [L_P] corresponding to the PC approach (continuous line) is compared with the corresponding curve for 10,000 matrices corresponding to the KL approach (dashed line). Once again, we see a clear decreasing trend of the average spectral radius, and the surrogate approach follows the KL-approach curve reasonably well.

The results above show that an approximation of the boundary-to-boundary operator by a PC expansion can be used as a preconditioner of an iterative scheme for solving system (3). The stochastic preconditioner [S̃] applied to the Schwarz method is very effective, provided that the number of KL modes is large enough. However, the case illustrated here has a drawback: contrary to the SM, it is not always possible to have stable samples for the PSM. In addition, even when a sample is stable, a large number of KL modes must be used to obtain effective preconditioners. This was evident from Fig. 2, where, for a small one-dimensional problem, the number of KL modes necessary to achieve an acceptable percentage of stable samples is relatively high. We expect the number of KL modes necessary for higher-dimensional problems to be even larger, which implies that we may not obtain accurate enough preconditioners. In the following, we present a two-dimensional case where both the stability issue and the need for a large N_KL are resolved.


Fig. 2 Average spectral radius of [LP ] for the KL-based and PC-based approach, over 10,000 samples

3 Acceleration of Schur Complement Based Methods

The second contribution [9] aims at mitigating the two previous issues as well as extending the analysis to a two-dimensional example. Consider a non-overlapping partition. The resulting boundary-to-boundary system (3) is the well-known Schur complement system. Applications of the Schur complement usually exploit the block structure of [S], which is given by

$$ [S] = \sum_{d=1}^{D} [R^{(d)}]^{\mathsf T}\, [S^{(d)}]\, [R^{(d)}], \qquad (8) $$

where [R^(d)] denotes the usual restriction matrix and [S^(d)] is the so-called influence matrix of subdomain Ω^(d) [3]. Each influence matrix depends only on the restriction of the coefficient field to the corresponding subdomain, κ^(d)(x) := κ(x)|_{x ∈ Ω^(d)}. The Schur complement is never explicitly assembled; instead, the action of [S] on a vector is split into local matrix-free multiplications by the [S^(d)].

Similarly to the Schwarz approach, we could devise the Schur complement corresponding to the KL expansion of κ as in (6). We would, however, require a very large number of modes to achieve decent accuracy; therefore, we switch to a local strategy. To this end, we proceed by a DD-KL approach [2], and denote by κ̂^(d)_{N_KL} the local KL expansion of κ restricted to Ω^(d), with s^(d) modes. Each local KL expansion depends on a set of local parameters ξ^(d). Following [3], we build influence matrices based on κ̂^(d)_{N_KL}, which we denote by [Ŝ^(d)].
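The matrix-free application of [S] mentioned above amounts to accumulating local products, as in this short sketch (the restriction operators and the local influence matrices are assumed to be available as arrays):

```python
import numpy as np

def schur_matvec(u_boundary, restrictions, influence_matrices):
    """Apply the Schur complement of Eq. (8) to a boundary vector without
    assembling it: accumulate the local contributions R_d^T S_d (R_d u)."""
    out = np.zeros_like(u_boundary)
    for R, Sd in zip(restrictions, influence_matrices):
        out += R.T @ (Sd @ (R @ u_boundary))   # local, mutually independent products
    return out
```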

Algorithm 2 Procedure to compute one solution sample with the FPCG method
1: procedure FPCG-SOLVE(sample κ^(m), tolerance tol, initial guess u^0)
2:     Set [S̃] = [0]                                        ▷ Initialize the preconditioner
3:     for d = 1, …, D do                                    ▷ Loop over subdomains
4:         Set the local problem and the samples ξ^(m) as in [2]
5:         Construct [S̃^(d)] from ξ^(m)
6:         Set [S̃] ← [S̃] + [R^(d)]ᵀ [S̃^(d)] [R^(d)]          ▷ Update the preconditioner
7:     end for
8:     Set [S̃]^{-1}                                          ▷ Inversion of [S̃]
9:     Set u = PCG(u^0, [S̃]^{-1}, tol)                       ▷ Do the PCG solve
10:    Return u                                               ▷ Return the solution
11: end procedure

Using the construction (8), the matrix [Ŝ] represents the Schur complement matrix based on the set of influence matrices [Ŝ^(d)]. In [3], the authors devise a stochastic matrix [S̃]^(d) based on ξ^(d), which we use as the starting point of our approach. Now, consider a fixed sample κ^(m) and the corresponding problem (3). We use a Conjugate Gradient (CG) method to solve (3), preconditioned by the matrix

$$ [\widetilde{S}] := \sum_{d=1}^{D} [R^{(d)}]^{\mathsf T}\, [\widetilde{S}^{(d)}]\, [R^{(d)}], \qquad (9) $$

where [S̃^(d)] = [S̃]^(d)(ξ^(m)). As in the Schwarz case, the approach is split into two steps: a preprocessing stage and a sampling stage. The preprocessing stage is where the [S̃]^(d) are constructed; we refer to [3, 9] for more details on this part. The sampling stage, described in Algorithm 2, is where the sampled preconditioner is assembled and the Preconditioned Conjugate Gradient (PCG) scheme is performed. For each subdomain, we start by setting the corresponding KL expansion and the set of local parameters according to [2]. Then the preconditioner [S̃] is assembled from the local contributions of each [S̃^(d)]. Once available, this preconditioner is used in the PCG scheme.

The preconditioner obtained directly from the local PC expansions of each influence matrix is called the Direct PC (DPC) preconditioner. One can prove that the DPC preconditioner is very close to the original Schur complement matrix, for some general matrix norm. However, contrary to the Schur complement matrix, the sampled preconditioners resulting from the DPC preconditioner are not guaranteed to be symmetric positive-definite (SPD). This means that the resulting PCG iterations may be unstable, which is related to one of the issues raised in the previous contribution. To fix this, we proposed a Factorized PC (FPC) preconditioner in [9] that guarantees the positiveness of [S̃]; details on the FPC construction are found in [9]. In the following, we consider a two-dimensional elliptic equation defined on the unit square and present numerical tests that illustrate the effectiveness of the FPCG


method. The parameters corresponding to the covariance (2) and to the spatial mesh are given for each case, and the size of each subdomain is roughly the same. As in the Schwarz case, we consider the median coefficient κ̄ and the corresponding deterministic preconditioner [S̄]; the PCG method preconditioned by [S̄] is called the Median PCG (MPCG) method. The performance of the stochastic preconditioner is assessed by computing the stochastic ratio

$$ \rho := \frac{\#\,\text{MPCG iterations}}{\#\,\text{DPCG or FPCG iterations}}. \qquad (10) $$

We are usually interested in E[ρ], which denotes the average acceleration of either the Direct Preconditioned Conjugate Gradient (DPCG) method or the Factorized Preconditioned Conjugate Gradient (FPCG) method with respect to the [S̄] preconditioner.

Increasing the variance makes the problem more complex, and for large σ² the median κ̄ lacks information about κ. Consequently, the preconditioner [S̄] turns out to be less effective for larger σ². We therefore expect our surrogate approach to perform significantly better than [S̄] in the context of large variance. Figure 3 shows the average acceleration over 100 samples of both the DPCG and the FPCG methods for different variances. The horizontal line indicates that the number of iterations for the stochastic preconditioner equals that for the deterministic preconditioner, that is, no acceleration. The DPCG method (dashed curve) loses its effectiveness as σ² increases. This is because, although the DPC preconditioner [S̃] is much closer to the associated [S] than [S̄] is, [S̃] is not necessarily SPD for all samples. For the samples corresponding to a non-SPD preconditioner (the "unstable" samples), the PCG still converges but takes significantly more iterations. The number of "unstable" samples increases with σ², which explains the increasing behavior of the DPCG curve. On the contrary, the FPC preconditioner (continuous curve) is guaranteed to be SPD, so there are no unstable samples and the performance of [S̃] is always better than that of [S̄]. All in all, the average acceleration curve increases because the performance of [S̄] deteriorates with σ² at a much higher rate than that of the FPCG.

In [9], we concluded that the performance of both the [S̃] and the [S̄] preconditioners increases with the number of subdomains, if all other parameters remain fixed. This is related to the amount of energy of the coefficient field retrieved by each local expansion κ̂^(d)_{N_KL}: for a fixed number of modes, the percentage of energy retrieved is larger when the corresponding subdomain is smaller. We refer to [9] for more details on how the energy of κ̂_{N_KL} affects the performance of the PCG method. We also note that increasing the number of subdomains increases the number of boundary nodes, that is, the size of the problems (3). This implies that the MPCG method will require more iterations to converge, which inevitably diminishes the benefits of using smaller subdomains discussed above. On the contrary, the number of FPCG iterations does not necessarily increase, because the preconditioner changes with the partition.


Fig. 3 Comparison of the average acceleration between FPCG and DPCG, for different σ 2 . Other parameters are: σ 2 = 1, c = 0.05, γ = 1.2, s (d) = 3, D = 100

Fig. 4 Average acceleration of the FPCG method as a function of the number D of subdomains. Other parameters are: σ 2 = 1, c = 0.05, γ = 1.2, s (d) = 4

Figure 4 illustrates the average acceleration of the FPCG method using N_KL = 4 and different numbers of subdomains. The monotonically increasing curve suggests that the gain obtained by using smaller subdomains is larger for the FPCG method than for the MPCG method. The FPCG method is also shown to be scalable with the number of subdomains, as the number of iterations consistently decreases with the number of subdomains.

When discussing Fig. 4, we noted that the increasing number of boundary nodes could imply more PCG iterations, because the size of the problem (3) is larger. Figure 5 suggests that it does not. In this numerical experiment, we use a single random variable per subdomain and study the average number of FPCG iterations over 1000 samples. The benefits of decreasing the size of the subdomains largely compensate for the larger size of the system, as the average number of iterations decreases. The construction of the preconditioner is therefore done at a minimal local cost, while the number of iterations is consistently smaller.


Fig. 5 Averaged # of FPCG iterations with D. Other parameters are: σ 2 = 1, c = 0.05, γ = 1.2, s (d) = 3

4 Conclusions and Perspectives

The idea of stochastic preconditioning has been extensively exploited in the two contributions [9, 10]. The goal of this chapter was to summarize the main results obtained in these two contributions and to motivate further investigation of this topic. The starting point of this project was the works [2, 3], in which the authors devise a stochastic operator that is sampled to obtain boundary-to-boundary systems such as (3). That surrogate, however, yields a Schur complement system corresponding to the coefficient field κ̂_{N_KL} ≈ κ; the resulting system is thus an approximation of the original one, and the accuracy of the resulting numerical solution is bound to the accuracy of the surrogate. In the two contributions summarized here, we work with the original problem and solve it with an iterative method. In [10], the iterative method is the well-known additive Schwarz method, whereas in [9] we form the Schur complement system and solve it with a PCG method. In both cases, a surrogate similar to the one devised in [3] is used to sample preconditioners and accelerate each iterative scheme; we call this surrogate the stochastic preconditioner.

Before discussing some perspective work that is currently ongoing, we summarize some additional results from [9]. The performance of the FPCG method boils down to how much energy the set of local KL expansions retrieves. The number of iterations is therefore minimal if we use as many subdomains as possible. Moreover, the cost of constructing each PC expansion consistently decreases with the number of subdomains, because it is directly related to the number of KL modes used; this is a natural consequence of the fact that, for a fixed target performance, fewer KL modes are needed as the size of the subdomains decreases. In terms of memory requirements, the FPCG method is also scalable with the number of subdomains. This again is expected, since the construction of each local PC expansion requires less memory as the number of subdomains increases.


We have so far assessed the performance of each method by the number of iterations; however, it is also important to understand how expensive each iteration is. We have seen that Algorithm 2 requires the factorization of [S̃] for each sample. The matrix [S̃] is SPD, and a Cholesky factorization of [S̃] has a cost of O(N_Γ³), where N_Γ denotes the number of boundary nodes. If the subdomains are very small, then N_Γ is close to the total number of nodes. In addition, the preconditioner does not benefit from the block structure that is characteristic of the Schur complement. Consequently, the action of [S̃] at each iteration is performed through a global operation. We would like to avoid such a global operation and instead proceed in a matrix-free fashion, similarly to what is usually done with [S]. Ideally, we should be able to devise a stochastic operator that provides [S̃]⁻¹ directly, with no need for factorization. Moreover, the action of [S̃]⁻¹ at each PCG iteration should be divided into local operations that are independent and can therefore be performed in parallel. One could consider using the [S^(d)]⁻¹ in (8) and dispensing with [S̃]⁻¹ altogether; however, [S^(d)]⁻¹ does not exist for subdomains Ω^(d) that do not share a boundary with ∂Ω (see [8] for more details), so this is not an option. To address this, we proceed with a deflation-type method to solve problem (3). In particular, we aim at developing the idea of multipreconditioning, as in [1, 11]. Each iteration of the new multipreconditioned PCG should be significantly cheaper, since no global operator is applied or inverted, thus compensating for the possibly larger number of iterations. More importantly, the preconditioning is based on pseudo-inverses of each influence matrix, which avoids any type of factorization.

References

1. Bridson, R., Greif, C.: A multipreconditioned conjugate gradient algorithm. SIAM J. Matrix Anal. Appl. 27(4), 1056–1068 (2006)
2. Contreras, A.A., Mycek, P., Le Maître, O.P., Rizzi, F., Debusschere, B., Knio, O.M.: Parallel domain decomposition strategies for stochastic elliptic equations. Part A: Local Karhunen–Loève representations. SIAM J. Sci. Comput. 40(4), C520–C546 (2018)
3. Contreras, A.A., Mycek, P., Le Maître, O.P., Rizzi, F., Debusschere, B., Knio, O.M.: Parallel domain decomposition strategies for stochastic elliptic equations. Part B: Accelerated Monte Carlo sampling with local PC expansions. SIAM J. Sci. Comput. 40(4), C547–C580 (2018)
4. Le Maître, O.P., Knio, O.M.: Spectral Methods for Uncertainty Quantification: with Applications to Computational Fluid Dynamics. Springer Science & Business Media (2010)
5. Lions, P.L.: On the Schwarz alternating method. I. In: First International Symposium on Domain Decomposition Methods for Partial Differential Equations, vol. 1, p. 42. Paris, France (1988)
6. Lévy, P., Loeve, M.: Processus Stochastiques et Mouvement Brownien. Gauthier-Villars, Paris (1965)
7. Mu, L., Zhang, G.: A domain decomposition model reduction method for linear convection-diffusion equations with random coefficients. SIAM J. Sci. Comput. 41(3), A1984–A2011 (2019)
8. Quarteroni, A., Valli, A.: Domain Decomposition Methods for Partial Differential Equations. Oxford University Press (1999)


9. Reis, J.F., Le Maître, O.P., Congedo, P.M., Mycek, P.: Stochastic preconditioning of domain decomposition methods for elliptic equations with random coefficients. Comput. Methods Appl. Mech. Eng. 381, 113845 (2021)
10. Reis, J.F., Le Maître, O.P., Congedo, P.M., Mycek, P.: Preconditioning the Schwarz method for stochastic elliptic equations (in preparation)
11. Spillane, N.: An adaptive multipreconditioned conjugate gradient algorithm. SIAM J. Sci. Comput. 38(3), A1896–A1918 (2016)
12. Wiener, N.: The homogeneous chaos. Am. J. Math. 60(4), 897–936 (1938)

Index

A Ablation, 350 Ablative materials, 350, 352 Acceleration, 112, 113 ACTRAN simulations, 244 Adaptive Metropolis (AM) algorithm, 357 Adaptive sparse polynomial chaos expansion, 235–238, 242 adaptive PCE, application of, 237–238 adaptive PCE, formulation of, 234–235 adaptive sparse polynomial chaos expansion, 235–237 application of, 237–238 formulation of, 234–235 Adiabatic reactor, 381 Adjoint formulation of optimization, 231 Aerodynamic constraints, 334 Aeronautical optimization under uncertainties nacelle acoustic liner and manufacturing tolerances, 232–233 nacelle acoustic liner FEM model, 233–234 Airfoil generation, 38–39 Airfoil optimisation aerodynamic shape optimisation, 289 design optimisation, 283–284 design variables and objectives, 292 DoE technique, 288 fidelity selection, 290 Latin Hypercube Sampling (LHS), 288 LF and HF samples, 290 mean prediction error, 290 MF-GPR, 286 multi-objective probabilistic optimisation workflow, 289

optimal designs, 292 optimisation problem, 289, 294 Pareto optimal solutions, 291 relative prediction error, 290 Scaled Expected Variance Reduction (SEVR) values, 288 solvers, 284–285 surrogate model, 288 uncertainty modelling techniques analytical propagation, 287 cumbersome process, 287 relative error, 288 Alation coefficients, 362 Analytical test function, 267–268 Anti-ice electro-thermal ice protection systems (AI-ETIPS) binned representation of statistic distribution of icing QoIs, 31 cloud uncertainty characterization, 28, 29 computational model ANTICE, 24 CANICE, 24 impinging mass rate, 25 mass conservation equation, 26 PoliMIce, 25 Fully Evaporative, 23, 30 heat fluxes, 25, 27 layout of the heaters, 27 maximum freezing mass rate, 27 NACA0012 airfoil, 26 Newton severity scale, 31 RunningWet, 23, 28, 30 total freezing mass rate, 27


448 Anti-ice electro-thermal ice protection systems (AI-ETIPS) (cont.) uncertainty propagation methodologies gPCE, 29–30 Monte Carlo sampling methods, 29, 32 Apollo 10, 110, 125 Apophis rendezvous test, 222, 224, 228 Approximate matrix exponential method, 161–162 Approximation with adaptive grid method, 156 Artificial neural networks (ANN), 385 Asteroid tour test, 221–222 Availability (A(t)), 69

B Bayesian confidence interval, 398 Bayesian estimation, 395, 396 Bayesian formulation, 359 Bayesian methods, 351 Bayesian optimization, for robust solutions under uncertain input, 245–246 evolutionary algorithms, 246 experimental setup, 254 Latin Hypercube sampling, 254 methodology direct robustness approximation, 248–249 Gaussian process, 247–248 robust Bayesian optimization, 248–252 robust knowledge gradient, 249–251 Stochastic Kriging, 251–252 one-dimensional test functions, 253 opportunity cost, 247, 254 problem definition, 247 Robust Knowledge Gradient, 246 Stochastic Kriging, 254–256 uncontrollable input, 256, 257 Bayesian posterior, 180 Belief and plausibility, 317 Bernoulli distributed latent variables, 365 Bernstein polynomials, 209, 210 Beta distribution, 368 Bifurcation theory, 322 Blade Passage Frequency (BPF), 244 Boltzmann equation, 353 Bound approximation, 140 Boundary Layer (BL) code, 353, 354 Bounded linear map, 193 Bound estimator, 138–140 Branching, 139

Index C C++, 110, 126 Cartesian product, 422 Cauchy problem, 113 CFD simulations, 100 Chebyshev interpolation, 110 Cholesky factorization, 445 Chromosome, 72 Coefficient of reflexivity, 112, 117, 124 Coherent lower previsions, 132, 152, 155 Coherent upper previsions, 152, 155 Combustion process, 383 Comet Siding Spring, 117 Complex Engineered System (CEdS), 315 belief and plausibility, 317 belief curves, 325 combined method, 320 evidence-based optimisation, 324, 325 frame of discernment, 316 network decomposition, 319 optimisation approach, 321 quantities of interest (QoIs), 316, 317 resilience framework, 322–323 system network model, 317–318 Theory of Evidence, 316–317 tree-based exploration, 319–320 Complexity analysis, 201–204 Computational efficiency, 10, 56, 337 Computational fluid dynamic (CFD), 36, 282, 350 Computational method, 199–201 Computer representation, of credal sets, 180–181 Conditional-value-at-risk (CVaR), 43, 66, 67, 328 Continuous time imprecise Markov chains in distributions at time t, 154 imprecise distributions over states, 152–153 imprecise transition rate matrices, 153–154 Control map, 222–225 Convergence, 140 Correlated fields, 98, 99, 106 Correlation-based techniques, 380 Correlation plot matrix, 376 Coupling mechanisms, 350 Credal set merging, 181–182 Credal sets, 152, 180–181 Crossover rate (CR), 77

Index CTIMC bounds calculation, numerical methods for approximate matrix exponential method, 161–162 checking applicability, matrix exponential method, 160 matrix exponential method, 158–159 normal cone inclusion, checking, 160–161 Cumulative distribution function (CDF), 233, 237 Cytogenetic biodosimetry methods, 403 Cytogenetic biomarkers, 393

D DACE tool, 110, 219 Dawson (double-model) linearization, 11 Degree of inclusion (DOI), 423, 424 De-Ice technologies, 22 Dempster–Shafer theory, 210, 316, 318 Design and Maintenance Strategy Analysis Software, 78 Design-space dimensionality reduction in shape optimization FFD shape modification, 12, 15, 16 geometry-based formulation associated mean shape modification, 6 block diagram for simulation-based shape optimization, 8 comparison of pressure field on parent, 15 comparison of wave elevation pattern produced by the parent, 16 eigenvalues, 7 eigenvectors magnitude, 14 epistemic uncertainty, 5 Hilbert space, 5 KLE, 6, 7 optimal hull shapes, 15 optimal shapes for original and reduced design spaces, 15 shape modification vector, 6 optimization convergence, 12, 14 physics-informed formulation associated variance, 9 block diagram for simulation-based shape optimization, 10, 11 comparison of pressure field on parent, 15 comparison of wave elevation pattern produced by the parent, 16 domains for shape modification vector, 9 eigenvectors magnitude, 14

449 Hilbert space, 9 linear representation, 10 lumped physical parameter vector, 8, 9 optimal shapes for original and reduced design spaces, 15 WARP, 11, 12 Deterministic control map, 217–218 Deterministic optimisation, 335 Dicentric assay, 394 Differential Evolution operator, 72 Directed Relation Graph method with Error Propagation (DRGEP), 380 Direct robustness approximation (DRA), 246, 248–249 Discrete description of sets, 425–427 Dissociation, 381 Distance of Closest Approach (DCA), 141 DOP853, 114, 120, 122 Drag coefficient, 36

E Earth’s sphere of influence (SOI), 119, 120, 122, 123, 126 Efficient global optimisation (EGO) approach, 245, 298, 321 EGM96 geopotential model, 141 Electro-thermal IPS (ETIPS), 23–27, 30 Empirical cumulative distribution function (ECDF), 144–147, 331 Engine nacelle, reliability-based optimization of OASPL, 239–241 optimization platform, 238–239 Entropy drag, 40 Entropy Search, 245 Epistemic model, 134 Epistemic uncertainty, 132, 142, 209 filtering under, 132–133 bound estimator, 138–140 expectation estimator, 135–137 imprecise formulation, 134–135 quantifying, 210 Epistemic uncertainty formulation, 210 Error estimation, imprecise Markov chains, 162 general error bounds, 162–163 for single step, 163 for uniform grid, 164 ESTECO, 239 Estimator, 116, 117 Estimator derivatives, 136–137 Ethylene-air mixtures, 391

450 Euler method, 40 Euler’s explicit method, 113 Euler’s method, 113 European Space Agency (ESA), 142 Evolutionary algorithms (EA), 76 Evolutionary Optimization Algorithms, 72 Excitation processes, 381 Expectation functionals, 153 Expected hitting times, imprecise Markov chains, 185–187 complexity analysis, 201–204 computational method, 199–201 existence of solutions, 188–199 Exponential families, 179–180 Exposed fraction dicentric biomarker, 399 γ -H2AX biomarker, 399 ZINB1 regression, 402 Exposure fraction, 395

F Far-field approach, 37, 38 Far-field drag coefficient calculation, 40–42 Feed-back control law, 210 Finite element (FE), 427, 436 First In, First Out (FIFO) approach, 60 First Order Reliability Method (FORM), 237 Flight_Condition, 238 Fluid injection system, 75 Flyover approach, 233 fmincon-sqp® , 219 Fokker–Planck Equation, 109 Four-parameter beta distribution, 335 Frame of discernment, 316 Free-form deformation (FFD) method, 12, 15, 16 Friedman’s test, 79 Fully Evaporative (FE), 23, 30 Functionability profile, 70–74

G Gaia dataset, 377 Gas-surface interactions, 350 Gaussianity, 131 Gaussian mutation, 88 Gaussian processes (GP), 247–248 covariance function, 331 CVaR risk function cumulative distribution function, 330 drag and configurations, 329 optimisation problem, 328 value-at-risk, 329

Index empirical cumulative distribution function (ECDF), 331 risk function approximation, accuracy and bias, 330 robust optimisation first robust optimisation run, 337–339 fourth GP retraining, 342–345 fourth robust optimisation run, 343–345 GP retraining, 339–340 optimisation problem setup, 334–336 preliminary GP training, 336–337 second robust optimisation run, 340–341 third GP retraining, 341–342 third robust optimisation run, 342 training methodology, 331–333 Gaussian process regression (GPR), 380 Generalized polynomial algebra (GPA), 110, 210 Generalized Polynomial Chaos Expansion (gPCE), 29–32 Generational distance (GD), 222 Geometric constraints, 334 γ -H2AX assay, 393 Global degree of consistency (GDOC), 424, 425 Global optimization (GO), 3 Global search, 138–139 Golub-Welsh algorithm, 99 Gradient-based optimization, SCBs analytical test function, 267–268 gradients of the statistics, 264–265 motivation, 262 optimization framework, 265–266 SBUQ framework, 263–264 Graph-based techniques, 380

H Hausdorff distance, 222 Hermite Polynomials, 235 Heterogeneous chemical processes, 350 Hicks-Henne bump functions, 39 Hierarchical Gaussian process regression technique, 52 High-density polyethylene, 89 High-energy electrons, 379 High-lift devices (HLDs) computational fluid dynamic (CFD) flow solver, 298 crosses and plus symbols, 308 deterministic optimisation airfoil, 298, 310, 311 flow field convergence, 307

Index machine learning assisted optimisation classification method, 301 EGO strategy, 299 surrogate models, 300–301 optimisation design variables, 302–304 setup, 307 quadrature approach, 302 robust design optimisation problem artificial objective function, 306 optimisation process, 304 original objective function, 305–306 robust optimum airfoil, 309, 311 SCGA settings, 307 High performance computing (HPC) systems, 3 Homogeneous Gaussian random field, 435 Hub leakage flow, 97 Hypervolume, 72, 78–81, 87–89, 92, 93, 288 Hypervolume average vs. evaluations evolution, 80

I Ice accretion, 22 Ice Protection Systems (IPS), 22, 23, 25, 26 Ignition process, 383 Imprecise continuous-time Markov chains, 155 Imprecise Markov chains, 151–152, 185–187 algorithm, 164–166 applicability, 151 complexity analysis, 201–204 computational method, 199–201 in continuous time distributions at time t, 154 imprecise distributions over states, 152–153 imprecise transition rate matrices, 153–154 CTIMC bounds calculation, numerical methods for approximate matrix exponential method, 161–162 checking applicability, matrix exponential method, 160 matrix exponential method, 158–159 normal cone inclusion, checking, 160–161 error estimation, 162 general error bounds, 162–163 for single step, 163 for uniform grid, 164 examples, 166–169 existence of solutions, 188–199

451 imprecise Q-operators, normal cones of, 156–157 lower expectations, numerical methods for computational approaches, 155–156 and transition operators, as linear programming problems, 155 Q-matrices, norms of, 157–158 Imprecise models, 176–178 Imprecise Q-matrices, 153 Imprecise Q-operators, 156–157 Imprecise transition rate matrices, 153–154 Individual radiation sensitivity, 393 In-flight icing, 22 Initial state uncertainty, 142–143 Integrator, 116, 117 Intel Xeon E5645Westmere-EP, 78 Intel Xeon Gold 6126 CPUs, 121 Intensified Charge-Coupled Device (ICCD), 353 Intrusive Chaos Polynomials, 231 Inverse-gamma distribution, 366 Inverse generational distance (IGD+), 222 Ionization, 381 ISAE-SUPAERO, 110, 127 Isochoric reactor, 381

J
Jet engine nacelle, reliability-based robust design optimization of, 231–232

K
Karhunen–Loève expansion (KLE), 4–7, 11–13, 15–17
k-epsilon low Reynolds Yang-Shih turbulence model, 100
Kinetic enhancement, 383
Knowledge Gradient (KG), 245, 249
Kolmogorov backward equation, 151, 169
Kolmogorov-Smirnov (K-S) test, 233

L
Laboratory data, 394
Lagrange interpolating polynomial, 99
Latin hypercube method, 118
Latin Hypercube Sampling (LHS), 249, 254, 257
Leading Edge Radius (LER), 42
Leakage flow, 97, 100, 106
Least Angle Regression (LAR) technique, 235
Legendre functions, 333
Leonardo Aircraft, 232
Leonardo proprietary semi-empiric impedance model, 233
Linear design-space dimensionality reduction methods, 4
Linear models, 366
Linear optimization approach, 210
Linear potential-flow theory, 11
Lipschitz constant estimation, 139
Lipschitz continuity, 148
Local thermodynamic equilibrium (LTE), 355
Logistic regression, 395
Longest Edge Bisection (LEB), 139
Low Earth Orbit (LEO), 131, 148
Lower expectation
  algorithms, 153
  computational approaches, 155–156
  multi-objective robust trajectory optimization, 212–213, 225–227
    estimating expectation, 214–216
    minimizing expectation, 213–214
  and transition operators, as linear programming problems, 155
Lower probability, 186
Lower reachability, 187
Lower transition operator, 154, 186
Low-thrust multi-asteroid fly-by tour, 211

M
Machine learning assisted optimisation
  classification method, 301
  EGO strategy, 299
  surrogate models, 300–301
MacMinMax, 219
Main inlet, 98
Manufacturing tolerance, 232–233
MareNostrum 4 system, 60
Markov assumption, 133
Markov Chain Monte Carlo (MCMC), 174–175, 357
  credal set merging, 181–182
  credal sets, computer representation of, 180–181
  exponential families, linear representation for, 179–180
  for imprecise models, 176–178
  practical implementation, 178–179
  simultaneous sampling, 175–176
Markov decision processes (MDPs), 204
Markov decision process formulation, 210
Martin Hepperle MH 114 airfoil, 42
Master node, 60
MATLAB, 78
Matlab's® genetic algorithm, 214
Matrix exponential method, 158–160

Maximum Likelihood estimation, 395
Max-min control map, 218–219
Mean Leave one Out Error (ErrLOO), 236
Mean Time To Failure (MTTF), 71
Mean Time To Repair (MTTR), 71
Measurement errors, 422
Metropolis-Hastings algorithm, 176, 183
Mild ice accretion, 22
Min-max control map, 219–220
Mmg software, 62
modeFRONTIER, 238
Model parameters
  Bayes theorem, 356–357
  heterogeneous catalysis, 354–355, 357–359
  thermochemical ablation, 355–356, 359–362
Modified airfoils, 39
MOEA/D-DE
  box plots of final hypervolume, 81
  clustered non-dominated front and design options, 81
  Friedman's test, 79
  functionability profiles
    building, 72–74
    extracting availability and economic cost, 70–72
  hypervolume average vs. evaluations evolution, 80
  hypervolume indicator statistical analysis, 79
  multi-objective optimization approach, 72
  optimum solutions obtained from the evolutionary process, 82
  Shaffer's test, 79
Monte Carlo methods, 109, 111, 114, 116–119, 126, 209, 214, 345, 436
  asynchronous Monte Carlo, 58–59
  CVAR, 66
  discretization error, 57, 58
  drag force, 64
  mean square error of estimator, 57
  Monte Carlo estimator, 57
  pressure field snapshot, 64
  scheduling, 60
  statistical error, 57, 58, 65–66
  synchronous Monte Carlo, 58
  velocity field snapshot, 63
  wind engineering
    CFL, 62
    domain dimensions, 62
    Reynolds number, 61
    source of uncertainty, 63
    time-averaged, 62
MSC-ACTRAN software, 238, 244
Multi-fidelity Gaussian process regression (MF-GPR), 36, 37, 282, 286
Multi-fidelity surrogate assisted design optimisation
  aerodynamic computational chain
    airfoil generation, 38–39
    CFD evaluation, 40
    grid generation, 39
  deterministic design optimisation problem, 42
  deterministic optimisation
    baseline and deterministic optimal airfoil comparison, 48
    computational chain of aerodynamic forces with probabilistic model, 47
    multi-fidelity and single-fidelity comparison, 48
    prediction error of multi- and single-fidelity surrogate models comparison, 49
    prediction of distributions for baseline and optimal designs, 50
  far-field drag coefficient calculation
    entropy drag, 40
    grid, 41
    mesh sizes and computed drag coefficients, 41
    MH 114 test, 41
  multi-fidelity Gaussian process regression, 37
  optimisation pipeline
    CFD evaluation, 45
    constrained expected improvement formulation, 44
    drag coefficient, 46
    HF samples, 45
    LF samples, 45
    lift coefficient, 46
    SEVR, 45
  probabilistic design optimisation problem, 43
  probabilistic optimisation
    baseline, deterministic optimum, and robust optimum airfoil comparison, 51
    pressure coefficient and friction coefficient, 52
Multi-objective evolutionary algorithms (MOEAs), 85
Multi-objective optimization, 72, 78, 83, 86
Multi-objective probabilistic optimisation workflow, 289
Multi-objective robust design optimization, 241

Multi-objective robustness analysis, 209–211
  asteroid tour test, 221–222
  control map and threshold map, 222–225
  dimensionality reduction, control mapping for
    deterministic control map, 217–218
    max-min control map, 218–219
    min-max control map, 219–220
    threshold mapping, 220–221
  execution times, 228
  expectation and sampling methods, 227
  final population
    in decision space versus length, 92
    decision space versus output, 91
  fitness versus robustness of initial and final populations, 93, 94
  initial and final populations
    length versus output, 90
    robustness versus output, 93
  lower expectation, 212–213, 225–227
    estimating expectation, 214–216
    minimizing expectation, 213–214
  in polymer extrusion
    extrusion process, 86–87
    robustness methodology, 87–88
    SMS-EMO algorithm, 88–89
  problem formulation, 211–212
  quantity of solutions in final population, 90
  two types, 86
Multi-Population Adaptive Inflationary Differential Evolution Algorithm (MP-AIDEA), 219, 3231
Multi-variate Bernstein basis function, 213
Mutation distribution (disM), 76
Mutatis mutandis, 199

N
Nacelle acoustic liner method, 232–234
Navier–Stokes equations, 61, 354, 356
N-body problem, 112
Near Earth asteroids (NEA), 221
Near-field method, 36
Negative nonlinearities, 408
Nelder-Mead algorithm, 358
Newton severity scale, 31
Newton's second law, 112
Non-deterministic performance curves, 101–102
Non-Dimensional Parameters (NDPs), 355
Nonequilibrium plasma, 379

Non-ideal compressible-fluid dynamics (NICFD)
  Mach number measurements, 412
  uncertainty quantification (UQ) analysis, 412
Non-ideal oblique shock waves
  experimental values and error bars, 414
  experimental verification, 411
  Mach number, 409
  output statistics, 412
  Peng-Robinson model coefficients, 415
  pre-shock thermodynamic state, 409
  pressure and Mach number measurements, 412
  RANS equations, 413
  stagnation conditions, 411
  van der Waals model, 410
Non-intrusive methods, 231
Non-intrusive Polynomial Chaos expansion approach, 412
Nonintrusive probabilistic collocation method, 99
Nonlinear dimensionality reduction methods, 5, 17
Non-linear systems, 185, 189
NSGA-II, 72
Numerical optimisation processes, 327

O
On Board Data Handling (OBDH) node, 323
Operator norm, 188
Opportunity cost, Bayesian optimization, 247, 254
OverAll Sound Pressure Level (OASPL), 233, 238–241

P
Pareto dominance, 72
Pareto front, 78, 85, 89, 216, 276, 277, 282, 290, 292, 295
Pareto optimal airfoil designs, 293, 294
p-box uncertainty models, 132
Penalty approach, 334
Plasma-assisted combustion (PAC) mechanisms, 380
Plasmatron, 351
Plasma wind tunnel experiments
  heterogeneous catalysis, 351–352
  thermochemical ablation, 353
PlatEMO, 78
Poisson assumption, 394
Poisson simulation, 397, 403
Policy iteration algorithm, 204
PoliMIce, 25
Polymer extrusion process
  extrusion process, 86–87
  multi-objective optimization algorithm, 88–89
  robustness methodology, 87–88
  SMS-EMO algorithm, 88–89
Polynomial Chaos Expansion (PCE), 231, 232, 234
Polynomial mutation, 88
Polynomial-time pivot rule, 202–203
Posterior computation
  mean and variance, 373
  properties of, 371–372
  regression coefficients, 370–372
  selection indicators, 367–370
Preconditioning system, 436
Principal component analysis, 380
  MAX scaling, 385
  PACMAN code, 392
  PCA-based Gaussian process regression
    dimension reduction process, 387
    gas temperature in function, 389, 390
    Matérn covariance function, 385
    mean function, 387
    outlying samples, 383
    plasma-assisted combustion, 386
    probability distribution, 384
    regression function, 384
    sample data, 383
    training data, 383
    weighting factors, 388
  skeletal mechanism, 385
Principal component analysis (PCA), 4, 380
Probability Density Function (PDF), 100, 109, 262, 270, 275, 276, 412, 415
Probability distribution, 366
Probability mass function, 394, 395
Probability of Collision (PoC), 141
Propagation of uncertainties
  dynamics modelling, 112–113
  Fokker–Planck Equation, 109
  Monte-Carlo estimations, 109–111, 116, 126
    modelling initial uncertainties on Snoopy, 117–118
    probability of Snoopy's presence, 118–120
  numerical analysis
    integration step, 115–116
    polynomial trajectories sensitivity, 116
  propagator implementation and validation
    comet Siding Spring, 117
    internal structure of the propagator, 116–117
  TDA structure to solve ODE
    classic Monte-Carlo simulation vs. TDA-based Monte-Carlo, 114
    implementing an ODE solver, 113–114
    modelling SRP uncertainties, 114–115
Propagator, 116, 117
Proper orthogonal decomposition (POD) methods, 4
Protein-based biomarkers, 403
Python 3.7, 121
Python (PyAudi), 110, 113, 121, 126

Q
Q-intersection method, 421
Q-matrices, 153, 157–158
Quantities of interest (QoIs), 27, 36, 328
Quasi-Monte Carlo (qMC) approach, 214, 227

R
Radiation biomarkers, 393
Random forest model, 301
Random variables, 336
Random-Walk Metropolis (RWM) algorithm, 357
Real data analysis, 375–376
Recombination processes, 381
Redundancy, 69
Regression coefficients, 370–372
Reliability (R(t)), 69, 70
Reliability-based optimization
  OASPL, 239–241
  optimization platform, 238–239
Reliability-based Optimization approach, 242
Response Surfaces, 231
Reynolds-averaged Navier–Stokes equations, 36, 40, 284, 333, 413
Reynolds-Averaged Navier–Stokes partial differential equations, 52
Reynolds number, 61
Robust Bayesian optimization, 248–249
Robust design application
  high-lift devices (HLDs)
    artificial objective function, 306
    optimisation process, 304
    original objective function, 305–306
  shock control bumps (SCBs)
    numeric model, 269
    operational uncertainties, 269
    optimization formulations, 270–271
    parametrization, 269–270
    test case, 269
Robust estimation method, 138–140
Robust filtering approach, 132
Robust knowledge gradient (rKG), 249–251
Robust optimisation techniques, 327
Robust particle filter (RPF), 131–132
  filtering under epistemic uncertainty, 132–133
    bound estimator, 138–140
    expectation estimator, 135–137
    imprecise formulation, 134–135
  Monte Carlo uncertain propagation, 144
  test case, 140–142
    initial state uncertainty, 142–143
    observation model and errors, 143–144
Root mean square errors (RMSE), 224, 226
Rotor blade tip gap, 99
Runback ice, 23
RunningWet (RW), 23, 28, 30

S
Safran Aero Boosters, 97, 106
Sampling methods, 436
SBX crossover, 88
Scaled Expected Variance Reduction (SEVR), 45, 288
Scaled sensitivity derivatives, 102, 105, 106
Scale factor, 77
Schur complement based methods
  applications of, 440
  DD-KL approach, 440
  Direct PC (DPC) preconditioner, 441
  Factorized PC (FPC) preconditioner, 441
  FPCG method, 443, 444
  median PCG (MPCG) method, 442
Schwarz Method
  boundary-to-boundary operator, 438
  KL-based preconditioner, 438
  preconditioned Schwarz iteration, 436, 437
Secondary inlets, 98
Second Order Reliability Method (SORM), 237
Selection probabilities, 367
Selig format, 285
Sensitivity analyses, 350
Separately specified rows, 153
Sequential Bayesian approach, 133
Sequential importance sampling, 135–136
Set-valued inverse method
  finite element (FE) model, 429
  GCONS outlier detection method, 428
  outlier detection, 431
  radial basis function (RBF) interpolation method, 429
  singular value decomposition (SVD), 429
Set-valued Kalman filters, 132
SGP4 propagator, 141
Shaffer's test, 79
Shock control bumps (SCBs)
  EUROSHOCK II project, 261
  gradient-based optimization
    analytical test function, 267–268
    gradients of the statistics, 264–265
    motivation, 262
    optimization framework, 265–266
    SBUQ framework, 263–264
  robust design application
    numeric model, 269
    operational uncertainties, 269
    optimization formulations, 270–271
    parametrization, 269–270
    test case, 269
  robustness of, 261
  robust optimization
    contours of drag coefficient, 274
    convergence history, 275
    Pareto front, 276
    probability distribution function, 275
    violin plot of optima configurations, 275
  single-point (deterministic) results, 272
  strong point optimization effect, 262
  uncertainty quantification, 273–274
Sideline approach, 233
Siding Spring, 117
Simplex algorithm, 202, 203
Simulation-based design (SBD), 3
Simultaneous sampling, 175–176
Single grid method, 163
SMS-EMOA algorithm, 88
Snoopy
  computing Snoopy's trajectory
    DOP853, 122
    dynamical parameters, 122
    initial conditions of, 121
  estimating the probability of Snoopy's presence, 118–120, 125
    distance from Snoopy to the Earth, 123
    mean and standard deviation of empiric distribution, 124
    potential window of Snoopy's reentry in the Earth's SOI, 123
    uncertainties along the y-axis, 124
    uncertainties on SRP parameters, 124
  future work, 125–126
  modelling initial uncertainties on, 117–118
  performing numerical analysis on trajectory of
    integration step for, 120
    sensitivity analysis on Snoopy's trajectory, 121
  probability of Snoopy's presence near Earth, 125, 126
  WT1190F, 111
Solar Radiation Pressure (SRP), 112, 114–116, 119–122, 124, 125
Sound Pressure Level (SPL), 233
Spacecraft, 116, 117
Space radiation, 403
Spalart–Allmaras (SA), 40
SPICE, 117
SpiceyPy, 112
1.5-stage axial compressor
  CFD simulations, 100
  computational cost, 100
  geometrical uncertainties
    correlated fields at main inlet, 98
    rotor blade tip gap, 99
    secondary inlets, 98–99
  geometry and operating regime, 97, 98
  non-deterministic performance curves, 101–102
  scaled sensitivity derivatives, 102, 105, 106
  uncertainty quantification method
    Golub-Welsh algorithm, 99
    nonintrusive probabilistic collocation method, 99
    scaled sensitivity derivatives, 99–100
Standard continuous–discrete state-space model, 133
State-space model, 132
Stefan-Maxwell equations, 353
Stochastic approximation methods, 210
Stochastic Kriging, 251–252, 254–256
Surrogate-based uncertainty quantification (SBUQ), 263–264
Surrogate methods, 436
Synthetic datasets, 373–374
System network model, 317–318

T
Taylor Algebra, 210
Taylor Differential Algebra (TDA), 110, 111, 113–118, 120, 122, 126
Taylor series, 160
Theory of Evidence, 316–317
Thermal Protection System (TPS) community, 351
Thermochemical ablation, 359–362
3D steady RANS equations, 100
Threshold mapping, 220–225
Time of Closest Approach (TCA), 141
Toolbox, 116, 117
Trailing Edge Angle (TEA), 42
Trajectory, 116, 117
Transition law, 153
Transition matrix, 186
Transition rate matrices, 153
Transonic airfoil performance, 345
Transport turbulence model, 285
Tree-based algorithm, 319–320
Truncated Power Series Algebra, 110
2D homogeneous plate, 427
Two-line elements (TLEs), 141

U
UMRIDA European project, 231
Uncertain measurements, 422
Uncertainties
  aeronautical optimization under
    nacelle acoustic liner and manufacturing tolerances, 232–233
    nacelle acoustic liner FEM model, 233–234
  airfoil optimisation
    analytical propagation, 287
    cumbersome process, 287
    relative error, 288
Uncertainty-based optimisation techniques, 36
Uncertainty quantification (UQ), 235, 282, 412
  complexity reduction
    combined method, 320
    network decomposition, 319
    tree-based exploration, 319–320
Unconventional phenomena, 408
Uniform grid method, 164
Uni-variate Bernstein distribution, 213
Upper expectation functionals, 153

V
Value-at-risk (VaR), 328
Value iteration algorithm, 204

W
WAve Resistance Program (WARP), 11
Wind engineering, 56
  CFL, 62
  domain dimensions, 62
  Reynolds number, 61
  source of uncertainty, 63
  time-averaged, 62
Working nodes, 60
WT1190F, 110, 111, 117

Z
Zero-inflated Poisson (ZIP) model, 394
Zero-inflation, 398