Adaptive Two-stage Learning Algorithm for Repeated Games

Wataru Fujita, Koichi Moriyama, Ken-ichi Fukui, Masayuki Numao

Abstract

In our society, people engage in a variety of interactions. To analyze such interactions, we model them as games and the people involved as agents equipped with reinforcement learning algorithms. Reinforcement learning algorithms are widely studied with the goal of identifying strategies that gain large payoffs in games; however, existing algorithms learn slowly because they require a large number of interactions. In this work, we constructed an algorithm that both learns quickly and maximizes payoffs in various repeated games. The proposed algorithm combines two different learning algorithms, one used in the early stage and the other in the later stage. We conducted experiments in which the proposed agents played ten kinds of games, both in self-play and against other agents. The results showed that the proposed algorithm learned more quickly than existing algorithms and gained sufficiently large payoffs in nine of the ten games.
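The two-stage design described above can be sketched as follows. This is a minimal illustration only, not the paper's actual method: it assumes an aspiration-based (satisficing) rule in the early stage and epsilon-greedy Q-value selection in the later stage, with a hypothetical fixed `switch_round` threshold; the paper's component algorithms and switching criterion may differ.

```python
import random

class TwoStageAgent:
    """Illustrative two-stage learner for a repeated game.

    Early stage: aspiration-based rule that repeats an action while
    its payoff meets the aspiration level (quick to settle).
    Later stage: epsilon-greedy choice over learned action values.
    All parameter names here are assumptions for the sketch.
    """

    def __init__(self, actions, switch_round=100, alpha=0.1, epsilon=0.1):
        self.actions = list(actions)
        self.switch_round = switch_round   # hypothetical switching point
        self.alpha = alpha                 # learning rate
        self.epsilon = epsilon             # exploration rate (later stage)
        self.round = 0
        self.q = {a: 0.0 for a in self.actions}  # later-stage action values
        self.aspiration = 0.0                    # early-stage aspiration level
        self.last_action = random.choice(self.actions)

    def act(self):
        if self.round < self.switch_round:
            # Early stage: keep repeating the current action; it is
            # revised in observe() when the payoff disappoints.
            return self.last_action
        # Later stage: epsilon-greedy over the values learned so far.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.q, key=self.q.get)

    def observe(self, action, payoff):
        self.round += 1
        if payoff < self.aspiration:
            # Unsatisfied: try a different action next round.
            self.last_action = random.choice(self.actions)
        else:
            self.last_action = action
        # Update both learners so the later stage starts informed
        # by the interactions seen during the early stage.
        self.aspiration += self.alpha * (payoff - self.aspiration)
        self.q[action] += self.alpha * (payoff - self.q[action])
```

A caller would alternate `act()` and `observe(action, payoff)` each round; because both the aspiration level and the Q-values are updated throughout, the later stage inherits what the early stage has already learned.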



Paper Citation


in Harvard Style

Fujita W., Moriyama K., Fukui K. and Numao M. (2016). Adaptive Two-stage Learning Algorithm for Repeated Games. In Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-172-4, pages 47-55. DOI: 10.5220/0005711000470055


in Bibtex Style

@conference{icaart16,
author={Wataru Fujita and Koichi Moriyama and Ken-ichi Fukui and Masayuki Numao},
title={Adaptive Two-stage Learning Algorithm for Repeated Games},
booktitle={Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2016},
pages={47-55},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005711000470055},
isbn={978-989-758-172-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Adaptive Two-stage Learning Algorithm for Repeated Games
SN - 978-989-758-172-4
AU - Fujita W.
AU - Moriyama K.
AU - Fukui K.
AU - Numao M.
PY - 2016
SP - 47
EP - 55
DO - 10.5220/0005711000470055