Exploration Methods for Connectionist Q-learning in Bomberman

Joseph Groot Kormelink, Madalina M. Drugan, Marco A. Wiering

2018

Abstract

In this paper, we investigate which exploration method yields the best performance in the game Bomberman. In Bomberman, the controlled agent has to kill opponents by placing bombs. The agent is represented by a multi-layer perceptron that learns to play the game with the use of Q-learning. We introduce two novel exploration strategies: Error-Driven-ε and Interval-Q, which base their explorative behavior on the temporal-difference error of Q-learning. The learning capabilities of these exploration strategies are compared to five existing methods: Random-Walk, Greedy, ε-Greedy, Diminishing ε-Greedy, and Max-Boltzmann. The results show that the methods that combine exploration with exploitation perform much better than the Random-Walk and Greedy strategies, which only select exploration or exploitation actions. Furthermore, the results show that Max-Boltzmann exploration performs best overall among the different techniques. The Error-Driven-ε exploration strategy also performs very well, but suffers from unstable learning behavior.
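To illustrate the difference between the pure strategies and the combined ones mentioned in the abstract, here is a minimal sketch of ε-Greedy and Max-Boltzmann action selection over a vector of Q-values. This is not the paper's implementation; the epsilon and temperature values are illustrative assumptions.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the greedy (highest-Q) action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def max_boltzmann(q_values, epsilon, temperature, rng):
    """Max-Boltzmann: exploit greedily with probability 1 - epsilon;
    otherwise sample an action from a Boltzmann (softmax)
    distribution over the Q-values, so better actions are
    explored more often than worse ones."""
    if rng.random() >= epsilon:
        return int(np.argmax(q_values))
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()  # subtract max for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_values), p=probs))

# Example usage with illustrative hyperparameters
rng = np.random.default_rng(0)
q = [0.1, 0.5, 0.2, 0.4]
action = max_boltzmann(q, epsilon=0.1, temperature=0.5, rng=rng)
```

Random-Walk and Greedy correspond to the two degenerate cases epsilon = 1 and epsilon = 0, respectively; Diminishing ε-Greedy decays epsilon over training.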

Paper Citation


in Harvard Style

Groot Kormelink J., Drugan M. M. and Wiering M. (2018). Exploration Methods for Connectionist Q-learning in Bomberman. In Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-275-2, pages 355-362. DOI: 10.5220/0006556403550362


in Bibtex Style

@conference{icaart18,
author={Joseph Groot Kormelink and Madalina M. Drugan and Marco A. Wiering},
title={Exploration Methods for Connectionist Q-learning in Bomberman},
booktitle={Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2018},
pages={355-362},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006556403550362},
isbn={978-989-758-275-2},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Exploration Methods for Connectionist Q-learning in Bomberman
SN - 978-989-758-275-2
AU - Groot Kormelink J.
AU - Drugan M. M.
AU - Wiering M.
PY - 2018
SP - 355
EP - 362
DO - 10.5220/0006556403550362