# ON TEMPORAL DIFFERENCE ALGORITHMS FOR CONTINUOUS SYSTEMS

### Alexandre Donzé

#### Abstract

This article proposes a general, intuitive, and rigorous framework for designing temporal difference algorithms that solve optimal control problems in continuous time and space. Within this framework, we derive a version of the classical TD(λ) algorithm, as well as a new, similar TD algorithm designed to be more accurate and to converge as fast as TD(λ) does for the best values of λ, without the burden of finding those values.

#### References

- Bellman, R. (1957). Dynamic Programming. Princeton University Press, Princeton, New Jersey.
- Coulom, R. (2002). Reinforcement Learning Using Neural Networks, with Applications to Motor Control. PhD thesis, Institut National Polytechnique de Grenoble.
- Doya, K. (1996). Temporal difference learning in continuous time and space. In Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E., editors, Advances in Neural Information Processing Systems, volume 8, pages 1073-1079. The MIT Press.
- Doya, K. (2000). Reinforcement learning in continuous time and space. Neural Computation, 12(1):219-245.
- Munos, R. (2000). A study of reinforcement learning in the continuous case by the means of viscosity solutions. Machine Learning, 40(3):265-299.
- Munos, R. and Moore, A. (1999). Variable resolution discretization for high-accuracy solutions of optimal control problems. In International Joint Conference on Artificial Intelligence.
- Rantzer, A. (2005). On relaxed dynamic programming in switching systems. IEE Proceedings special issue on Hybrid Systems. Invited paper, to appear.
- Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems 8, pages 1038-1044. MIT Press.
- Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
- Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58-68.
- Tsitsiklis, J. N. (2002). On the convergence of optimistic policy iteration. Journal of Machine Learning Research, 3:59-72.

#### Paper Citation

#### in Harvard Style

Donzé A. (2005). **ON TEMPORAL DIFFERENCE ALGORITHMS FOR CONTINUOUS SYSTEMS**. In *Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO*, ISBN 972-8865-29-5, pages 55-62. DOI: 10.5220/0001183700550062

#### in Bibtex Style

@conference{icinco05,

author={Alexandre Donzé},

title={ON TEMPORAL DIFFERENCE ALGORITHMS FOR CONTINUOUS SYSTEMS},

booktitle={Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},

year={2005},

pages={55-62},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0001183700550062},

isbn={972-8865-29-5},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO

TI - ON TEMPORAL DIFFERENCE ALGORITHMS FOR CONTINUOUS SYSTEMS

SN - 972-8865-29-5

AU - Donzé A.

PY - 2005

SP - 55

EP - 62

DO - 10.5220/0001183700550062