Combining off-Policy and on-Policy Reinforcement Learning for Dynamic Control of Nonlinear Systems

Ahmed A. Hani Hazza, Simon Fabri, Marvin Bugeja, Kenneth Camilleri

2025

Abstract

This paper introduces QARSA, a novel reinforcement learning algorithm that combines the strengths of off-policy and on-policy methods, specifically Q-learning and SARSA, for the dynamic control of nonlinear systems. Designed to leverage the sample efficiency of off-policy learning while preserving the stability and lower variance of on-policy approaches, QARSA aims to offer a balanced and robust learning framework. The algorithm is evaluated on the CartPole-v1 simulation environment using the OpenAI Gym framework, with performance compared against standalone Q-learning and SARSA implementations. The comparison is based on three metrics: average reward, stability, and sample efficiency. Experimental results show that QARSA outperforms both Q-learning and SARSA, achieving higher average reward, greater stability, better sample efficiency, and more consistent learned policies. These results indicate QARSA’s effectiveness in environments where maximizing long-term performance while maintaining learning stability is crucial. The study provides insights for the design of hybrid reinforcement learning algorithms for continuous control tasks.
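The abstract does not state QARSA's update rule, so the following is only a minimal illustrative sketch of one way an off-policy (Q-learning) target and an on-policy (SARSA) target could be blended in a tabular setting. The function name `blended_td_update`, the mixing weight `beta`, and the default values of `alpha` and `gamma` are all assumptions made for illustration and are not taken from the paper.

```python
def blended_td_update(q, s, a, r, s_next, a_next, done,
                      alpha=0.1, gamma=0.99, beta=0.5):
    """One tabular TD update mixing the Q-learning and SARSA targets.

    `q` is a list of per-state action-value lists, updated in place.
    `beta` is a hypothetical mixing weight (not from the paper):
    beta=1.0 reduces to plain Q-learning, beta=0.0 to plain SARSA.
    """
    if done:
        bootstrap = 0.0  # no future value after a terminal transition
    else:
        off_policy = max(q[s_next])      # Q-learning: greedy next-state value
        on_policy = q[s_next][a_next]    # SARSA: value of the action actually chosen
        bootstrap = beta * off_policy + (1.0 - beta) * on_policy
    target = r + gamma * bootstrap
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


# Tiny illustrative table: 2 states, 2 actions.
q_table = [[0.0, 0.0], [1.0, 3.0]]
new_value = blended_td_update(q_table, s=0, a=0, r=1.0,
                              s_next=1, a_next=0, done=False)
```

With `beta=0.5` the bootstrap term here is the average of the greedy value (3.0) and the value of the sampled next action (1.0). In a CartPole-v1 setting, the continuous observations would first need to be discretized into table indices, a step this sketch leaves out.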

Paper Citation


in Harvard Style

Hani Hazza A., Fabri S., Bugeja M. and Camilleri K. (2025). Combining off-Policy and on-Policy Reinforcement Learning for Dynamic Control of Nonlinear Systems. In Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO; ISBN 978-989-758-770-2, SciTePress, pages 387-394. DOI: 10.5220/0013836700003982


in Bibtex Style

@conference{icinco25,
author={Ahmed Hani Hazza and Simon Fabri and Marvin Bugeja and Kenneth Camilleri},
title={Combining off-Policy and on-Policy Reinforcement Learning for Dynamic Control of Nonlinear Systems},
booktitle={Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},
year={2025},
pages={387-394},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013836700003982},
isbn={978-989-758-770-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO
TI - Combining off-Policy and on-Policy Reinforcement Learning for Dynamic Control of Nonlinear Systems
SN - 978-989-758-770-2
AU - Hani Hazza A.
AU - Fabri S.
AU - Bugeja M.
AU - Camilleri K.
PY - 2025
SP - 387
EP - 394
DO - 10.5220/0013836700003982
PB - SciTePress