ON TEMPORAL DIFFERENCE ALGORITHMS FOR CONTINUOUS SYSTEMS

Alexandre Donzé

doi:10.5220/0001183700550062

2005

Abstract

This article proposes a general, intuitive and rigorous framework for designing temporal difference (TD) algorithms to solve optimal control problems in continuous time and space. Within this framework, we derive a version of the classical TD(λ) algorithm as well as a new TD algorithm that is similar but designed to be more accurate, and to converge as fast as TD(λ) does for the best values of λ, without the burden of finding these values.

Paper Citation

in Harvard Style

Donzé A. (2005). ON TEMPORAL DIFFERENCE ALGORITHMS FOR CONTINUOUS SYSTEMS. In Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, ISBN 972-8865-29-5, pages 55-62. DOI: 10.5220/0001183700550062

in Bibtex Style

@conference{icinco05,
author={Alexandre Donzé},
title={ON TEMPORAL DIFFERENCE ALGORITHMS FOR CONTINUOUS SYSTEMS},
booktitle={Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},
year={2005},
pages={55-62},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001183700550062},
isbn={972-8865-29-5},
}

in EndNote Style

TY - CONF
JO - Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO
TI - ON TEMPORAL DIFFERENCE ALGORITHMS FOR CONTINUOUS SYSTEMS
SN - 972-8865-29-5
AU - Donzé A.
PY - 2005
SP - 55
EP - 62
DO - 10.5220/0001183700550062