Table of Contents

Chapter 1: Bayesian Optimization Overview
    Global Optimization
        The Objective Function
        The Observation Model
    Bayesian Statistics
        Bayesian Inference
        Frequentist vs. Bayesian Approach
        Joint, Conditional, and Marginal Probabilities
        Independence
        Prior and Posterior Predictive Distributions
        Bayesian Inference: An Example
    Bayesian Optimization Workflow
        Gaussian Process
        Acquisition Function
        The Full Bayesian Optimization Loop
    Summary

Chapter 2: Gaussian Processes
    Reviewing the Gaussian Basics
        Understanding the Covariance Matrix
        Marginal and Conditional Distribution of Multivariate Gaussian
        Sampling from a Gaussian Distribution
    Gaussian Process Regression
        The Kernel Function
        Extending to Other Variables
        Learning from Noisy Observations
    Gaussian Process in Practice
        Drawing from GP Prior
        Obtaining GP Posterior with Noise-Free Observations
        Working with Noisy Observations
        Experimenting with Different Kernel Parameters
        Hyperparameter Tuning
    Summary
Chapter 3: Bayesian Decision Theory and Expected Improvement
    Optimization via Sequential Decision-Making
        Seeking the Optimal Policy
        Utility-Driven Optimization
        Multi-step Lookahead Policy
        Bellman’s Principle of Optimality
    Expected Improvement
        Deriving the Closed-Form Expression
        Implementing the Expected Improvement
        Using Bayesian Optimization Libraries
    Summary

Chapter 4: Gaussian Process Regression with GPyTorch
    Introducing GPyTorch
        The Basics of PyTorch
        Revisiting GP Regression
        Building a GP Regression Model
        Fine-Tuning the Length Scale of the Kernel Function
        Fine-Tuning the Noise Variance
    Delving into Kernel Functions
        Combining Kernel Functions
        Predicting Airline Passenger Counts
    Summary

Chapter 5: Monte Carlo Acquisition Function with Sobol Sequences and Random Restart
    Analytic Expected Improvement Using BoTorch
        Introducing Hartmann Function
        GP Surrogate with Optimized Hyperparameters
        Introducing the Analytic EI
        Optimization Using Analytic EI
        Grokking the Inner Optimization Routine
    Using MC Acquisition Function
        Using Monte Carlo Expected Improvement
    Summary

Chapter 6: Knowledge Gradient: Nested Optimization vs. One-Shot Learning
    Introducing Knowledge Gradient
        Monte Carlo Estimation
        Optimizing Using Knowledge Gradient
    One-Shot Knowledge Gradient
        Sample Average Approximation
        One-Shot Formulation of KG Using SAA
        One-Shot KG in Practice
        Optimizing the OKG Acquisition Function
    Summary

Chapter 7: Case Study: Tuning CNN Learning Rate with BoTorch
    Seeking Global Optimum of Hartmann
        Generating Initial Conditions
        Updating GP Posterior
        Creating a Monte Carlo Acquisition Function
        The Full BO Loop
    Hyperparameter Optimization for Convolutional Neural Network
        Using MNIST
        Defining CNN Architecture
        Training CNN
        Optimizing the Learning Rate
        Entering the Full BO Loop
    Summary

Index