COMPLEXITY OF STOCHASTIC BRANCH AND BOUND METHODS FOR BELIEF TREE SEARCH IN BAYESIAN REINFORCEMENT LEARNING

Christos Dimitrakakis

2010

Abstract

There has been considerable recent work on Bayesian methods for reinforcement learning that achieve near-optimal online performance. The main obstacle facing such methods is that, in most problems of interest, the optimal solution involves planning in an infinitely large tree. However, it is possible to obtain stochastic lower and upper bounds on the value of each tree node. This enables us to use stochastic branch and bound algorithms to search the tree efficiently. This paper proposes such algorithms and examines their complexity in this setting.
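To make the control structure concrete, below is a minimal, hypothetical sketch in Python of a stochastic branch and bound loop over a search tree. It is not the paper's algorithm: the Node class, its toy sample_lower/sample_upper estimators, and the random expansion rule are illustrative stand-ins for belief-tree nodes and the stochastic value bounds discussed in the abstract. The sketch only shows the generic scheme: repeatedly expand the leaf with the largest estimated upper bound, and prune leaves whose estimated upper bound falls below the best estimated lower bound found so far.

    # Hedged sketch of stochastic branch and bound over a tree.
    # All classes and bound estimators here are illustrative placeholders,
    # not the paper's belief-tree construction.
    import heapq
    import random

    class Node:
        def __init__(self, depth, value=0.0):
            self.depth = depth
            self.value = value      # hidden "true" value driving the toy bound samplers
            self.children = []

        def sample_lower(self):
            # stochastic lower bound: true value minus nonnegative noise
            return self.value - random.random() / (self.depth + 1)

        def sample_upper(self):
            # stochastic upper bound: true value plus nonnegative noise
            return self.value + random.random() / (self.depth + 1)

        def expand(self, branching=2):
            # stand-in for enumerating action/observation successors of a belief node
            self.children = [Node(self.depth + 1, self.value + random.gauss(0, 0.1))
                             for _ in range(branching)]
            return self.children

    def stochastic_branch_and_bound(root, iterations=100, samples=8):
        """Expand the leaf with the largest mean upper-bound estimate;
        prune leaves whose estimate falls below the incumbent lower bound."""
        best_lower = float("-inf")
        incumbent = root
        frontier = []               # max-heap on mean upper bound (negated for heapq)

        def push(node):
            nonlocal best_lower, incumbent
            upper = sum(node.sample_upper() for _ in range(samples)) / samples
            lower = sum(node.sample_lower() for _ in range(samples)) / samples
            if lower > best_lower:  # new incumbent
                best_lower, incumbent = lower, node
            heapq.heappush(frontier, (-upper, id(node), node))

        push(root)
        for _ in range(iterations):
            if not frontier:
                break
            neg_upper, _, node = heapq.heappop(frontier)
            if -neg_upper < best_lower:   # estimated upper bound below incumbent: prune
                continue
            for child in node.expand():
                push(child)
        return incumbent

    if __name__ == "__main__":
        random.seed(0)
        best = stochastic_branch_and_bound(Node(depth=0))
        print("incumbent at depth", best.depth, "value", round(best.value, 3))

In the Bayesian reinforcement learning setting of the paper, expansion would branch on actions and observations from the current belief, and the bound samplers would stand in for whatever stochastic value bounds are available at each node. A real implementation would also refine bound estimates across iterations rather than pruning on a single batch of samples, since the bounds themselves are noisy.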



Paper Citation


in Harvard Style

Dimitrakakis C. (2010). COMPLEXITY OF STOCHASTIC BRANCH AND BOUND METHODS FOR BELIEF TREE SEARCH IN BAYESIAN REINFORCEMENT LEARNING. In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-674-021-4, pages 259-264. DOI: 10.5220/0002721402590264


in BibTeX Style

@conference{icaart10,
author={Christos Dimitrakakis},
title={COMPLEXITY OF STOCHASTIC BRANCH AND BOUND METHODS FOR BELIEF TREE SEARCH IN BAYESIAN REINFORCEMENT LEARNING},
booktitle={Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2010},
pages={259-264},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002721402590264},
isbn={978-989-674-021-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - COMPLEXITY OF STOCHASTIC BRANCH AND BOUND METHODS FOR BELIEF TREE SEARCH IN BAYESIAN REINFORCEMENT LEARNING
SN - 978-989-674-021-4
AU - Dimitrakakis C.
PY - 2010
SP - 259
EP - 264
DO - 10.5220/0002721402590264