vi Contents
3.7 Learning to Control . . . . . . . . . . . . . . . . . . . . . . . 69
3.8 Further Considerations . . . . . . . . . . . . . . . . . . . . . 70
3.9 Technical Remarks . . . . . . . . . . . . . . . . . . . . . . . 70
3.10 Bibliographical Remarks . . . . . . . . . . . . . . . . . . . . 71
3.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4 Operators and Metrics 77
4.1 The Bellman Operator . . . . . . . . . . . . . . . . . . . . . 78
4.2 Contraction Mappings . . . . . . . . . . . . . . . . . . . . . 79
4.3 The Distributional Bellman Operator . . . . . . . . . . . . . . 83
4.4 Wasserstein Distances for Return Functions . . . . . . . . . . 86
4.5
p
Probability Metrics and the Cramér Distance . . . . . . . . 92
4.6 Sufficient Conditions for Contractivity . . . . . . . . . . . . . 95
4.7 A Matter of Domain . . . . . . . . . . . . . . . . . . . . . . . 98
4.8 Weak Convergence of Return Functions* . . . . . . . . . . . 102
4.9 Random-Variable Bellman Operators* . . . . . . . . . . . . . 104
4.10 Technical Remarks . . . . . . . . . . . . . . . . . . . . . . . 105
4.11 Bibliographical Remarks . . . . . . . . . . . . . . . . . . . . 106
4.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5 Distributional Dynamic Programming 115
5.1 Computational Model . . . . . . . . . . . . . . . . . . . . . . 115
5.2 Representing Return-Distribution Functions . . . . . . . . . . 118
5.3 The Empirical Representation . . . . . . . . . . . . . . . . . 120
5.4 The Normal Representation . . . . . . . . . . . . . . . . . . . 125
5.5 Fixed-Size Empirical Representations . . . . . . . . . . . . . 128
5.6 The Projection Step . . . . . . . . . . . . . . . . . . . . . . . 130
5.7 Distributional Dynamic Programming . . . . . . . . . . . . . 135
5.8 Error Due to Diffusion . . . . . . . . . . . . . . . . . . . . . 138
5.9 Convergence of Distributional Dynamic Programming . . . . 141
5.10 Quality of the Distributional Approximation . . . . . . . . . . 145
5.11 Designing Distributional Dynamic Programming Algorithms . 147
5.12 Technical Remarks . . . . . . . . . . . . . . . . . . . . . . . 148
5.13 Bibliographical Remarks . . . . . . . . . . . . . . . . . . . . 154
5.14 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6 Incremental Algorithms 161
6.1 Computation and Statistical Estimation . . . . . . . . . . . . . 161
6.2 From Operators to Incremental Algorithms . . . . . . . . . . 163
6.3 Categorical Temporal-Difference Learning . . . . . . . . . . . 165
Draft version.