Online Knowledge Gradient Exploration in an Unknown Environment

Saba Q. Yahyaa, Bernard Manderick

2014

Abstract

We present online kernel-based LSPI (least-squares policy iteration), an extension of offline kernel-based LSPI. Online kernel-based LSPI combines characteristics of both online LSPI and offline kernel-based LSPI to improve the convergence rate as well as the optimal-policy performance of online LSPI. It uses the knowledge gradient policy for exploration and the approximate linear dependency (ALD) based kernel sparsification method to select features automatically. We compare the optimal-policy performance of online kernel-based LSPI and online LSPI on five discrete Markov decision problems, where online kernel-based LSPI outperforms online LSPI.
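The abstract only names the two main ingredients, so the sketch below is a minimal Python illustration rather than the authors' implementation: a knowledge-gradient action-selection rule for independent normal value estimates, and the approximate linear dependency (ALD) test that decides whether a new sample is added to the kernel dictionary. The function names, the noise_var and nu parameters, and the independence assumption are illustrative; in the paper these ideas are integrated with kernel-based LSPI feature vectors rather than applied to standalone estimates.

import numpy as np
from scipy.stats import norm

def kg_action(mu, sigma, noise_var, horizon_left):
    """Online knowledge-gradient action selection for independent normal beliefs.

    mu, sigma    : posterior means and standard deviations, one entry per action
    noise_var    : assumed observation-noise variance (illustrative parameter)
    horizon_left : remaining number of decisions, weighting the value of learning
    """
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(sigma, dtype=float) ** 2
    # Standard deviation of the change in the posterior mean after one more observation.
    sigma_tilde = var / np.sqrt(var + noise_var)
    kg = np.empty_like(mu)
    for a in range(len(mu)):
        best_other = np.max(np.delete(mu, a))
        z = -abs(mu[a] - best_other) / max(sigma_tilde[a], 1e-12)
        # Knowledge-gradient factor: expected improvement of the best estimate.
        kg[a] = sigma_tilde[a] * (z * norm.cdf(z) + norm.pdf(z))
    # Online KG rule: current estimate plus the expected value of the information gained.
    return int(np.argmax(mu + horizon_left * kg))

def ald_test(K_dict, k_vec, k_xx, nu):
    """Approximate linear dependency test used for kernel sparsification.

    K_dict : kernel matrix over the current dictionary (m x m, assumed invertible)
    k_vec  : kernel values between the new sample and the dictionary elements (length m)
    k_xx   : kernel value of the new sample with itself
    nu     : sparsification threshold; the sample is added when delta > nu
    """
    c = np.linalg.solve(K_dict, k_vec)   # projection coefficients onto the dictionary
    delta = k_xx - k_vec @ c             # squared residual of that projection
    return delta > nu, c, delta

The ALD test is what keeps the feature dictionary, and with it the least-squares problem solved by LSPI, small: a new state-action sample only becomes a feature when it cannot be well approximated by the samples already stored.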



Paper Citation


in Harvard Style

Yahyaa, S. Q. and Manderick, B. (2014). Online Knowledge Gradient Exploration in an Unknown Environment. In Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-015-4, pages 5-13. DOI: 10.5220/0004718700050013


in Bibtex Style

@conference{icaart14,
author={Saba Q. Yahyaa and Bernard Manderick},
title={Online Knowledge Gradient Exploration in an Unknown Environment},
booktitle={Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2014},
pages={5-13},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004718700050013},
isbn={978-989-758-015-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Online Knowledge Gradient Exploration in an Unknown Environment
SN - 978-989-758-015-4
AU - Yahyaa S. Q.
AU - Manderick B.
PY - 2014
SP - 5
EP - 13
DO - 10.5220/0004718700050013