Distributional Reinforcement Learning (Adaptive Computation and Machine Learning)
ISBN / LCCN: 9780262374019, 2022033240, 2022033241, 9780262048019, 9780262374026

The first comprehensive guide to distributional reinforcement learning, providing a new mathematical formalism for thinking about decisions from a probabilistic perspective.


English, 384 pages, 2023



Table of contents:
1 Introduction
1.1 Why Distributional Reinforcement Learning?
1.2 An Example: Kuhn Poker
1.3 How Is Distributional Reinforcement Learning Different?
1.4 Intended Audience and Organization
1.5 Bibliographical Remarks
2 The Distribution of Returns
2.1 Random Variables and Their Probability Distributions
2.2 Markov Decision Processes
2.3 The Pinball Model
2.4 The Return
2.5 The Bellman Equation
2.6 Properties of the Random Trajectory
2.7 The Random-Variable Bellman Equation
2.8 From Random Variables to Probability Distributions
2.9 Alternative Notions of the Return Distribution*
2.10 Technical Remarks
2.11 Bibliographical Remarks
2.12 Exercises
3 Learning the Return Distribution
3.1 The Monte Carlo Method
3.2 Incremental Learning
3.3 Temporal-Difference Learning
3.4 From Values to Probabilities
3.5 The Projection Step
3.6 Categorical Temporal-Difference Learning
3.7 Learning to Control
3.8 Further Considerations
3.9 Technical Remarks
3.10 Bibliographical Remarks
3.11 Exercises
4 Operators and Metrics
4.1 The Bellman Operator
4.2 Contraction Mappings
4.3 The Distributional Bellman Operator
4.4 Wasserstein Distances for Return Functions
4.5 ℓp Probability Metrics and the Cramér Distance
4.6 Sufficient Conditions for Contractivity
4.7 A Matter of Domain
4.8 Weak Convergence of Return Functions*
4.9 Random-Variable Bellman Operators*
4.10 Technical Remarks
4.11 Bibliographical Remarks
4.12 Exercises
5 Distributional Dynamic Programming
5.1 Computational Model
5.2 Representing Return-Distribution Functions
5.3 The Empirical Representation
5.4 The Normal Representation
5.5 Fixed-Size Empirical Representations
5.6 The Projection Step
5.7 Distributional Dynamic Programming
5.8 Error Due to Diffusion
5.9 Convergence of Distributional Dynamic Programming
5.10 Quality of the Distributional Approximation
5.11 Designing Distributional Dynamic Programming Algorithms
5.12 Technical Remarks
5.13 Bibliographical Remarks
5.14 Exercises
6 Incremental Algorithms
6.1 Computation and Statistical Estimation
6.2 From Operators to Incremental Algorithms
6.3 Categorical Temporal-Difference Learning
6.4 Quantile Temporal-Difference Learning
6.5 An Algorithmic Template for Theoretical Analysis
6.6 The Right Step Sizes
6.7 Overview of Convergence Analysis
6.8 Convergence of Incremental Algorithms*
6.9 Convergence of Temporal-Difference Learning*
6.10 Convergence of Categorical Temporal-Difference Learning*
6.11 Technical Remarks
6.12 Bibliographical Remarks
6.13 Exercises
7 Control
7.1 Risk-Neutral Control
7.2 Value Iteration and Q-Learning
7.3 Distributional Value Iteration
7.4 Dynamics of Distributional Optimality Operators
7.5 Dynamics in the Presence of Multiple Optimal Policies*
7.6 Risk and Risk-Sensitive Control
7.7 Challenges in Risk-Sensitive Control
7.8 Conditional Value-at-Risk*
7.9 Technical Remarks
7.10 Bibliographical Remarks
7.11 Exercises
8 Statistical Functionals
8.1 Statistical Functionals
8.2 Moments
8.3 Bellman Closedness
8.4 Statistical Functional Dynamic Programming
8.5 Relationship to Distributional Dynamic Programming
8.6 Expectile Dynamic Programming
8.7 Infinite Collections of Statistical Functionals
8.8 Moment Temporal-Difference Learning*
8.9 Technical Remarks
8.10 Bibliographical Remarks
8.11 Exercises
9 Linear Function Approximation
9.1 Function Approximation and Aliasing
9.2 Optimal Linear Value Function Approximations
9.3 A Projected Bellman Operator for Linear Value Function Approximation
9.4 Semi-Gradient Temporal-Difference Learning
9.5 Semi-Gradient Algorithms for Distributional Reinforcement Learning
9.6 An Algorithm Based on Signed Distributions*
9.7 Convergence of the Signed Algorithm*
9.8 Technical Remarks
9.9 Bibliographical Remarks
9.10 Exercises
10 Deep Reinforcement Learning
10.1 Learning with a Deep Neural Network
10.2 Distributional Reinforcement Learning with Deep Neural Networks
10.3 Implicit Parameterizations
10.4 Evaluation of Deep Reinforcement Learning Agents
10.5 How Predictions Shape State Representations
10.6 Technical Remarks
10.7 Bibliographical Remarks
10.8 Exercises
11 Two Applications and a Conclusion
11.1 Multiagent Reinforcement Learning
11.2 Computational Neuroscience
11.3 Conclusion
11.4 Bibliographical Remarks
Notation
References
Index
