DIRECT GRADIENT-BASED REINFORCEMENT LEARNING FOR ROBOT BEHAVIOR LEARNING

Andres El-Fakdi, Marc Carreras, Pere Ridao

Abstract

Autonomous Underwater Vehicles (AUVs) represent a challenging control problem with complex, noisy dynamics. Nowadays, not only the continuous scientific advances in underwater robotics but also the increasing number and complexity of subsea missions call for the automation of submarine processes. This paper proposes a high-level control system for solving the action selection problem of an autonomous robot. The system is characterized by the use of Reinforcement Learning Direct Policy Search methods (RLDPS) for learning the internal state/action mapping of some behaviors. We demonstrate its feasibility in simulated experiments, using the model of our underwater robot URIS in a target-following task.
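The direct policy search idea the abstract refers to can be illustrated with a Williams-style REINFORCE update: episodes are rolled out under a stochastic policy, and the policy parameters are moved along the return-weighted gradient of the log-probabilities of the actions taken. The sketch below is a minimal, self-contained toy; the one-dimensional "heading error" task, the tabular softmax policy, and all names are illustrative assumptions, not the paper's URIS model or controller:

```python
import math
import random

random.seed(0)

# Toy abstraction of target following (illustrative, not the paper's model):
# the state is the sign of the heading error (-1 = target left, +1 = right);
# the two actions turn the vehicle left or right.
ACTIONS = (-1, +1)   # turn left / turn right
ALPHA = 0.1          # learning rate
EPISODES = 2000

def softmax_probs(theta, s):
    """Action probabilities of a tabular softmax policy pi(a | s; theta)."""
    prefs = [theta[(s, a)] for a in ACTIONS]
    m = max(prefs)                          # subtract max for stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def run_episode(theta, length=5):
    """Roll out one episode; return a list of (state, action, reward)."""
    traj = []
    for _ in range(length):
        s = random.choice((-1, +1))
        a = random.choices(ACTIONS, softmax_probs(theta, s))[0]
        r = 1.0 if a == -s else -1.0        # reward for turning toward target
        traj.append((s, a, r))
    return traj

# REINFORCE: theta <- theta + alpha * G * grad log pi(a | s; theta),
# where G is the (undiscounted) episode return.
theta = {(s, a): 0.0 for s in (-1, +1) for a in ACTIONS}
for _ in range(EPISODES):
    traj = run_episode(theta)
    G = sum(r for _, _, r in traj)
    for (s, a, _) in traj:
        probs = softmax_probs(theta, s)
        for i, b in enumerate(ACTIONS):
            # grad of log pi(a|s) w.r.t. theta[(s, b)] = 1{b = a} - pi(b|s)
            grad = (1.0 - probs[i]) if b == a else -probs[i]
            theta[(s, b)] += ALPHA * G * grad
```

After training, the policy assigns high probability to turning toward the target in both states; the tabular softmax here stands in for the function approximators (e.g. neural networks) typically used as policies in direct gradient methods.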



Paper Citation


in Harvard Style

El-Fakdi A., Carreras M. and Ridao P. (2005). DIRECT GRADIENT-BASED REINFORCEMENT LEARNING FOR ROBOT BEHAVIOR LEARNING. In Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO, ISBN 972-8865-30-9, pages 225-231. DOI: 10.5220/0001188902250231


in Bibtex Style

@conference{icinco05,
author={Andres El-Fakdi and Marc Carreras and Pere Ridao},
title={DIRECT GRADIENT-BASED REINFORCEMENT LEARNING FOR ROBOT BEHAVIOR LEARNING},
booktitle={Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO},
year={2005},
pages={225-231},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001188902250231},
isbn={972-8865-30-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO
TI - DIRECT GRADIENT-BASED REINFORCEMENT LEARNING FOR ROBOT BEHAVIOR LEARNING
SN - 972-8865-30-9
AU - El-Fakdi A.
AU - Carreras M.
AU - Ridao P.
PY - 2005
SP - 225
EP - 231
DO - 10.5220/0001188902250231