References
Barbeau, André. 1974. Drugs affecting movement disorders. Annual Review of
Pharmacology 14 (1): 91–113.
Bard, Nolan, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis
Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain
Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, and Michael Bowling.
2020. The Hanabi challenge: A new frontier for AI research. Artificial Intelligence
280:103216.
Barnard, Etienne. 1993. Temporal-difference methods and Markov models. IEEE
Transactions on Systems, Man, and Cybernetics 23 (2): 357–365.
Barreto, André, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van
Hasselt, and David Silver. 2017. Successor features for transfer in reinforcement learning.
In Advances in Neural Information Processing Systems.
Barth-Maron, Gabriel, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan,
Dhruva TB, Alistair Muldal, Nicolas Heess, and Timothy Lillicrap. 2018. Distributed
distributional deterministic policy gradients. In Proceedings of the International Conference
on Learning Representations.
Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. 1995. Learning to act using
real-time dynamic programming. Artificial Intelligence 72 (1): 81–138.
Barto, Andrew G., Richard S. Sutton, and Charles W. Anderson. 1983. Neuronlike
adaptive elements that can solve difficult learning control problems. IEEE Transactions
on Systems, Man, and Cybernetics 13 (5): 834–846.
Bäuerle, Nicole, and Jonathan Ott. 2011. Markov decision processes with
average-value-at-risk criteria. Mathematical Methods of Operations Research 74 (3): 361–379.
Beattie, Charles, Joel Z. Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright,
Heinrich Küttler, Andrew Lefrancq, Simon Green, Víctor Valdés, Amir Sadik, Julian
Schrittwieser, Keith Anderson, Sarah York, Max Cant, Adam Cain, Adrian Bolton,
Stephen Gaffney, Helen King, Demis Hassabis, Shane Legg, and Stig Petersen. 2016.
DeepMind Lab. arXiv preprint arXiv:1612.03801.
Bellemare, Marc G., Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C.
Machado, Subhodeep Moitra, Sameera S. Ponda, and Ziyu Wang. 2020. Autonomous
navigation of stratospheric balloons using reinforcement learning. Nature 588 (7836):
77–82.
Bellemare, Marc G., Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel
Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, and Clare Lyle. 2019a. A
geometric perspective on optimal representations for reinforcement learning. In Advances
in Neural Information Processing Systems.
Bellemare, Marc G., Will Dabney, and Rémi Munos. 2017a. A distributional perspective
on reinforcement learning. In Proceedings of the International Conference on Machine
Learning.
Bellemare, Marc G., Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji
Lakshminarayanan, Stephan Hoyer, and Rémi Munos. 2017b. The Cramer distance as a solution
to biased Wasserstein gradients. arXiv preprint arXiv:1705.10743.