Protocols and Open Problems for General Agents. J.
Artif. Intell. Res., 61, 523–562
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T.
P., Harley, T., Silver, D., & Kavukcuoglu, K. (2016).
Asynchronous Methods for Deep Reinforcement
Learning. In Proceedings of the 33rd International
Conference on Machine Learning, ICML 2016 (Vol.
48, pp. 1928–1937)
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A.,
Veness, J., Bellemare, M. G., Graves, A., Riedmiller,
M. A., Fidjeland, A., Ostrovski, G., Petersen, S.,
Beattie, C., Sadik, A., Antonoglou, I., King, H.,
Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D.
(2015). Human-level control through deep
reinforcement learning. Nature, 518(7540), 529–533
Mott, A., Zoran, D., Chrzanowski, M., Wierstra, D., &
Rezende, D. J. (2019). Towards Interpretable
Reinforcement Learning Using Attention Augmented
Agents. In Advances in Neural Information Processing
Systems 32: Annual Conference on Neural Information
Processing Systems 2019, NeurIPS 2019 (pp.
12329–12338)
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,
Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,
Antiga, L., Desmaison, A., Köpf, A., Yang, E. Z.,
DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S.,
Steiner, B., Fang, L., … Chintala, S. (2019). PyTorch:
An Imperative Style, High-Performance Deep Learning
Library. In Advances in Neural Information Processing
Systems 32: Annual Conference on Neural Information
Processing Systems 2019, NeurIPS 2019 (pp.
8024–8035)
Schulman, J., Moritz, P., Levine, S., Jordan, M. I., &
Abbeel, P. (2016). High-Dimensional Continuous
Control Using Generalized Advantage Estimation. In
4th International Conference on Learning
Representations, ICLR 2016
Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-K.,
& Woo, W. (2015). Convolutional LSTM Network: A
Machine Learning Approach for Precipitation
Nowcasting. In Advances in Neural Information
Processing Systems 28: Annual Conference on Neural
Information Processing Systems 2015 (pp. 802–810)
Sorokin, I., Seleznev, A., Pavlov, M., Fedorov, A., &
Ignateva, A. (2015). Deep Attention Recurrent
Q-Network. CoRR, abs/1512.01693.
http://arxiv.org/abs/1512.01693
Srivastava, N., Mansimov, E., & Salakhutdinov, R. (2015).
Unsupervised Learning of Video Representations using
LSTMs. In Proceedings of the 32nd International
Conference on Machine Learning, ICML 2015 (Vol.
37, pp. 843–852)
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to
Sequence Learning with Neural Networks. In
Advances in Neural Information Processing Systems
27: Annual Conference on Neural Information
Processing Systems 2014 (pp. 3104–3112)
Tang, Y., Nguyen, D., & Ha, D. (2020). Neuroevolution of
self-interpretable agents. In GECCO ’20: Genetic and
Evolutionary Computation Conference, 2020 (pp.
414–424). ACM.
Wayne, G., Hung, C.-C., Amos, D., Mirza, M., Ahuja, A.,
Grabska-Barwinska, A., Rae, J. W., Mirowski, P.,
Leibo, J. Z., Santoro, A., Gemici, M., Reynolds, M.,
Harley, T., Abramson, J., Mohamed, S., Rezende, D. J.,
Saxton, D., Cain, A., Hillier, C., … Lillicrap, T. P.
(2018). Unsupervised Predictive Memory in a
Goal-Directed Agent. CoRR, abs/1803.10760.
http://arxiv.org/abs/1803.10760