Unified Algorithm to Improve Reinforcement Learning in Dynamic Environments - An Instance-based Approach

Richardson Ribeiro, Fábio Favarim, Marco A. C. Barbosa, André Pinz Borges, Osmar Betazzi Dordal, Alessandro L. Koerich, Fabrício Enembreck

2012

Abstract

This paper presents an approach for speeding up the convergence of adaptive intelligent agents using reinforcement learning algorithms. Speeding up the learning of an intelligent agent is a complex task, since the choice of inadequate updating techniques may cause delays in the learning process or even induce an unexpected acceleration that causes the agent to converge to a non-satisfactory policy. We have developed a technique for estimating policies which combines instance-based learning and reinforcement learning algorithms in Markovian environments. Experimental results in dynamic environments of different dimensions have shown that the proposed technique is able to speed up the convergence of the agents while achieving optimal action policies, avoiding problems of classical reinforcement learning approaches.
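The combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' exact algorithm; it is a hypothetical example assuming the common pattern of seeding Q-values for newly encountered states from the nearest previously learned state (a 1-nearest-neighbor, instance-based estimate) rather than from zeros, so that a second learning phase (e.g., after the environment changes) converges faster. The grid size, rewards, and hyperparameters are all illustrative.

```python
import random

# Hypothetical sketch (not the paper's exact method): tabular Q-learning on a
# small deterministic grid where unseen states are initialized from the
# closest stored state's Q-values (1-nearest-neighbor over Manhattan distance).

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
SIZE = 5
GOAL = (4, 4)

def step(state, action):
    """Deterministic grid transition with a goal reward and small step cost."""
    x, y = state
    dx, dy = action
    nx = min(max(x + dx, 0), SIZE - 1)
    ny = min(max(y + dy, 0), SIZE - 1)
    reward = 1.0 if (nx, ny) == GOAL else -0.01
    return (nx, ny), reward, (nx, ny) == GOAL

def nearest_instance(state, memory):
    """Instance-based estimate: copy Q-values from the closest stored state."""
    if not memory:
        return None
    best = min(memory, key=lambda s: abs(s[0] - state[0]) + abs(s[1] - state[1]))
    return memory[best]

def init_q(Q, state, memory):
    """Seed a new state's Q-values from memory when available, else zeros."""
    if state not in Q:
        est = nearest_instance(state, memory) if memory else None
        Q[state] = list(est) if est else [0.0] * len(ACTIONS)

def q_learning(episodes, alpha=0.5, gamma=0.9, eps=0.2, memory=None):
    Q = {}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(100):
            init_q(Q, state, memory)
            if random.random() < eps:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])
            nxt, r, done = step(state, ACTIONS[a])
            init_q(Q, nxt, memory)
            Q[state][a] += alpha * (r + gamma * max(Q[nxt]) - Q[state][a])
            state = nxt
            if done:
                break
    return Q

random.seed(0)
Q1 = q_learning(300)            # learn a first policy from scratch
Q2 = q_learning(50, memory=Q1)  # warm-started phase reusing stored instances
```

Under this assumption, the second phase needs far fewer episodes because most states start with a reasonable value estimate instead of zero, which matches the abstract's claim of faster convergence without committing prematurely to a poor policy.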

References

  1. Aha, D. W., Kibler, D., and Albert, M. K. (1991). Instance-based learning algorithms. Machine Learning, 6(1):37-66.
  2. Amato, C. and Shani, G. (2010). High-level reinforcement learning in strategy games. In Proc. 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS'10), pages 75-82.
  3. Banerjee, B. and Kraemer, L. (2010). Action discovery for reinforcement learning. In Proc. 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS'10), pages 1585-1586.
  4. Bianchi, R. A. C., Ribeiro, C. H. C., and Costa, A. H. R. (2004). Heuristically accelerated q-learning: A new approach to speed up reinforcement learning. In Proc. XVII Brazilian Symposium on Artificial Intelligence, pages 245-254, São Luís, Brazil.
  5. Bianchi, R. A. C., Ribeiro, C. H. C., and Costa, A. H. R. (2008). Accelerating autonomous learning by using heuristic selection of actions. Journal of Heuristics, 14:135-168.
  6. Butz, M. (2002). State value learning with an anticipatory learning classifier system in a markov decision process. Technical report, Illinois Genetic Algorithms Laboratory.
  7. Comanici, G. and Precup, D. (2010). Optimal policy switching algorithms for reinforcement learning. In Proc. 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS'10), pages 709-714.
  8. Dimitrakiev, D., Nikolova, N., and Tenekedjiev, K. (2010). Simulation and discrete event optimization for automated decisions for in-queue flights. Int. Journal of Intelligent Systems, 25(28):460-487.
  9. Drummond, C. (2002). Accelerating reinforcement learning by composing solutions of automatically identified subtask. Journal of Artificial Intelligence Research, 16:59-104.
  10. Enembreck, F., Avila, B. C., Scalabrin, E. E., and Barthès, J. P. A. (2007). Learning drifting negotiations. Applied Artificial Intelligence, 21:861-881.
  11. Firby, R. J. (1989). Adaptive Execution in Complex Dynamic Worlds. PhD thesis, Yale University.
  12. Galván, I., Valls, J., García, M., and Isasi, P. (2011). A lazy learning approach for building classification models. Int. Journal of Intelligent Systems, 26(8):773-786.
  13. Jordan, P. R., Schvartzman, L. J., and Wellman, M. P. (2010). Strategy exploration in empirical games. In Proc. 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS'10), v. 1, pages 1131-1138, Toronto, Canada.
  14. Kittler, J., Hatef, M., Duin, R. P. W., and Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):226-239.
  15. Le, T. and Cai, C. (2010). A new feature for approximate dynamic programming traffic light controller. In Proc. 2nd International Workshop on Computational Transportation Science (IWCTS'10), pages 29-34, San Jose, CA, U.S.A.
  16. Mohammadian, M. (2006). Multi-agents systems for intelligent control of traffic signals. In Proc. International Conference on Computational Intelligence for Modelling Control and Automation and Int. Conf. on Intelligent Agents Web Technologies and Int. Commerce, page 270, Sydney, Australia.
  17. Pegoraro, R., Costa, A. H. R., and Ribeiro, C. H. C. (2001). Experience generalization for multi-agent reinforcement learning. In Proc. XXI International Conference of the Chilean Computer Science Society, pages 233-239, Punta Arenas, Chile.
  18. Pelta, D., Cruz, C., and González, J. (2009). A study on diversity and cooperation in a multiagent strategy for dynamic optimization problems. Int. Journal of Intelligent Systems, 24(18):844-861.
  19. Price, B. and Boutilier, C. (2003). Accelerating reinforcement learning through implicit imitation. Journal of Artificial Intelligence Research, 19:569-629.
  20. Ribeiro, C. H. C. (1999). A tutorial on reinforcement learning techniques. In Proc. Int. Joint Conference on Neural Networks, pages 59-61, Washington, USA.
  21. Ribeiro, R., Borges, A. P., and Enembreck, F. (2008). Interaction models for multiagent reinforcement learning. In Proc. 2008 International Conferences on Computational Intelligence for Modelling, Control and Automation; Intelligent Agents, Web Technologies and Internet Commerce; and Innovation in Software Engineering, pages 464-469, Vienna, Austria.
  22. Ribeiro, R., Borges, A. P., Ronszcka, A. F., Scalabrin, E., Avila, B. C., and Enembreck, F. (2011). Combinando modelos de interação para melhorar a coordenação em sistemas multiagente [Combining interaction models to improve coordination in multiagent systems]. Revista de Informática Teórica e Aplicada, 18:133-157.
  23. Ribeiro, R., Enembreck, F., and Koerich, A. L. (2006). A hybrid learning strategy for discovery of policies of action. In Proc. International Joint Conference X Ibero-American Artificial Intelligence Conference and XVIII Brazilian Artificial Intelligence Symposium, pages 268-277, Ribeirão Preto, Brazil.
  24. Sislak, D., Samek, J., and Pechoucek, M. (2008). Decentralized algorithms for collision avoidance in airspace. In Proc. 7th International Conference on AAMAS, pages 543-550, Estoril, Portugal.
  25. Spaan, M. T. J. and Melo, F. S. (2008). Interaction-driven markov games for decentralized multiagent planning under uncertainty. In Proc. 7th International Conference on AAMAS, pages 525-532, Estoril, Portugal.
  26. Strehl, A. L., Li, L., and Littman, M. L. (2009). Reinforcement learning in finite mdps: Pac analysis. Journal of Machine Learning Research (JMLR), 10:2413-2444.
  27. Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
  28. Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58-68.
  29. Watkins, C. J. C. H. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3/4):279-292.
  30. Zhang, C., Lesser, V., and Abdallah, S. (2010). Self-organization for coordinating decentralized reinforcement learning. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, AAMAS'10, pages 739-746. International Foundation for Autonomous Agents and Multiagent Systems.


Paper Citation


in Harvard Style

Ribeiro R., Favarim F., A. C. Barbosa M., Pinz Borges A., Betazzi Dordal O., L. Koerich A. and Enembreck F. (2012). Unified Algorithm to Improve Reinforcement Learning in Dynamic Environments - An Instance-based Approach. In Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8565-10-5, pages 229-238. DOI: 10.5220/0004000002290238


in Bibtex Style

@conference{iceis12,
author={Richardson Ribeiro and Fábio Favarim and Marco A. C. Barbosa and André Pinz Borges and Osmar Betazzi Dordal and Alessandro L. Koerich and Fabrício Enembreck},
title={Unified Algorithm to Improve Reinforcement Learning in Dynamic Environments - An Instance-based Approach},
booktitle={Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2012},
pages={229-238},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004000002290238},
isbn={978-989-8565-10-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Unified Algorithm to Improve Reinforcement Learning in Dynamic Environments - An Instance-based Approach
SN - 978-989-8565-10-5
AU - Ribeiro R.
AU - Favarim F.
AU - A. C. Barbosa M.
AU - Pinz Borges A.
AU - Betazzi Dordal O.
AU - L. Koerich A.
AU - Enembreck F.
PY - 2012
SP - 229
EP - 238
DO - 10.5220/0004000002290238