Multi-agent Polygon Formation using Reinforcement Learning

B. K. Swathi Prasad, Aditya G. Manjunath, Hariharan Ramasangu


This work details a simulation experiment and analysis of Q-learning applied to multi-agent systems. Six agents interact within the environment to form a hexagon, a square and a triangle by reaching their specific goal states. In the proposed approach, the agents form a hexagon, and the maximum dimension of this pattern is reduced to form patterns with smaller dimensions. A decentralised approach to controlling the agents via Q-learning was adopted, which reduced complexity. The agents are able to move forward, backward or sideways based on the decision taken. Finally, the Q-learning action-reward system was designed so that the agents could exploit it: they earn high rewards for correct actions and negative rewards for the opposite.
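The decentralised scheme described above can be illustrated with a minimal sketch in which each agent keeps its own Q-table and learns to reach its own goal cell. This is not the authors' implementation: the 5x5 grid, the goal cells in `HEX_GOALS`, the reward values (+10 at the goal, -1 otherwise), the hyperparameters, and all function names are assumptions chosen only for illustration.

```python
import random

# Assumed action set: forward, backward, and two sideways moves on a grid.
ACTIONS = {"forward": (0, 1), "backward": (0, -1), "left": (-1, 0), "right": (1, 0)}
GRID = 5  # assumed 5x5 workspace, not taken from the paper

# Hypothetical goal cells, one per agent, roughly on a hexagon.
HEX_GOALS = [(1, 0), (3, 0), (4, 2), (3, 4), (1, 4), (0, 2)]


def step(state, action, goal):
    """Move one cell (clamped to the grid); return (next_state, reward, done)."""
    dx, dy = ACTIONS[action]
    nxt = (min(max(state[0] + dx, 0), GRID - 1),
           min(max(state[1] + dy, 0), GRID - 1))
    done = nxt == goal
    # Assumed reward values: high for reaching the goal, negative otherwise.
    return nxt, (10.0 if done else -1.0), done


def train_agent(start, goal, episodes=500, alpha=0.5, gamma=0.9,
                epsilon=0.2, seed=0):
    """Tabular Q-learning for a single agent with its own goal state."""
    rng = random.Random(seed)
    names = list(ACTIONS)
    q = {}  # maps (state, action) to an estimated value; missing entries are 0
    for _ in range(episodes):
        state = start
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(names)  # explore
            else:
                action = max(names, key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action, goal)
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in names)
            old = q.get((state, action), 0.0)
            # Standard Q-learning update rule.
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, start, goal, max_steps=50):
    """Follow the learned greedy policy from start and record visited cells."""
    names = list(ACTIONS)
    state, path = start, [start]
    for _ in range(max_steps):
        if state == goal:
            break
        action = max(names, key=lambda a: q.get((state, a), 0.0))
        state, _, _ = step(state, action, goal)
        path.append(state)
    return path


def train_formation(start, goals):
    """Decentralised training: one independent Q-table per agent."""
    return [train_agent(start, goal, seed=i) for i, goal in enumerate(goals)]


if __name__ == "__main__":
    tables = train_formation((2, 2), HEX_GOALS)
    for q, goal in zip(tables, HEX_GOALS):
        print(goal, greedy_path(q, (2, 2), goal))
```

Because every agent learns against its own goal with its own table, no joint state or action space is needed, which is one plausible reading of why the decentralised approach reduces complexity. Forming a smaller pattern would, under the same assumptions, amount to retraining with a different list of goal cells.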



Paper Citation

in Harvard Style

Prasad B., Manjunath A. and Ramasangu H. (2017). Multi-agent Polygon Formation using Reinforcement Learning. In Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-219-6, pages 159-165. DOI: 10.5220/0006187001590165

in Bibtex Style

author={B. K. Swathi Prasad and Aditya G. Manjunath and Hariharan Ramasangu},
title={Multi-agent Polygon Formation using Reinforcement Learning},
booktitle={Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},

in EndNote Style

JO - Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Multi-agent Polygon Formation using Reinforcement Learning
SN - 978-989-758-219-6
AU - Prasad B.
AU - Manjunath A.
AU - Ramasangu H.
PY - 2017
SP - 159
EP - 165
DO - 10.5220/0006187001590165