Deep Learning Policy Quantization

Jos van de Wolfshaar, Marco Wiering, Lambert Schomaker

Abstract

We introduce a novel actor-critic approach for deep reinforcement learning based on learning vector quantization. We replace the softmax operator of the policy with a more general and flexible operator that resembles the robust soft learning vector quantization (RSLVQ) algorithm. We compare our approach to the default A3C architecture on three Atari 2600 games and a simple game called Catch. The proposed algorithm outperforms the softmax architecture on Catch, whereas on the Atari games the best-performing model varies from game to game.
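As a rough illustration of the idea sketched in the abstract, the policy head can assign action probabilities through soft distances to learned prototype vectors instead of a plain softmax over logits. The sketch below is an assumption-laden simplification (prototype count, distance measure, and function names are illustrative, not taken from the paper): each prototype carries an action label, and the probability of an action is the total soft assignment to its prototypes.

```python
import numpy as np

def lvq_policy(features, prototypes, labels, n_actions):
    """RSLVQ-style policy head (illustrative sketch, not the paper's exact model).

    features:   (d,) state representation produced by the network
    prototypes: (k, d) learned prototype vectors
    labels:     (k,) action label attached to each prototype
    Negative squared Euclidean distances play the role of softmax logits.
    """
    d2 = np.sum((prototypes - features) ** 2, axis=1)
    # subtract the minimum distance before exponentiating for numerical stability
    sims = np.exp(-(d2 - d2.min()))
    probs = np.zeros(n_actions)
    for a in range(n_actions):
        # an action's probability is the summed soft assignment to its prototypes
        probs[a] = sims[labels == a].sum()
    return probs / sims.sum()
```

With a single prototype per action this reduces to a softmax over negative squared distances, which is why the construction is a strict generalization of the usual softmax policy.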

Paper Citation


in Harvard Style

van de Wolfshaar J., Wiering M. and Schomaker L. (2018). Deep Learning Policy Quantization. In Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-275-2, pages 122-130. DOI: 10.5220/0006592901220130


in Bibtex Style

@conference{icaart18,
author={Jos van de Wolfshaar and Marco Wiering and Lambert Schomaker},
title={Deep Learning Policy Quantization},
booktitle={Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2018},
pages={122-130},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006592901220130},
isbn={978-989-758-275-2},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Deep Learning Policy Quantization
SN - 978-989-758-275-2
AU - van de Wolfshaar J.
AU - Wiering M.
AU - Schomaker L.
PY - 2018
SP - 122
EP - 130
DO - 10.5220/0006592901220130