Pre-Training Deep Q-Networks Eliminates the Need for Target Networks: An Empirical Study
Alexander Lindström, Arunselvan Ramaswamy, Karl-Johan Grinnemo
2025
Abstract
Deep Q-Learning is an important reinforcement learning algorithm for automated sequential decision-making problems. It trains a neural network, called the Deep Q-Network (DQN), to find an optimal policy. Training is highly unstable and exhibits high variance. A target network is commonly used to mitigate these problems, but it leads to longer training times, higher training-data requirements, and a much larger memory footprint. In this paper, we present a two-phase training procedure that eliminates the need for a target network. In the first, offline, phase, the DQN is pre-trained using expert actions. Unlike previous literature, which maximizes the probability of picking the expert actions, we train by minimizing the usual squared Bellman loss. In the second, online, phase, the network continues to train while interacting with an environment (simulator). We show empirically that the target network can be eliminated; that training variance is reduced and training is more stable; that, when the duration of pre-training is carefully chosen, convergence to an optimal policy during the online phase is faster; and that the quality of the final policy is at least as good as that of policies found with traditional methods.
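To make the offline phase concrete, below is a minimal sketch of pre-training a DQN by minimizing the squared Bellman loss on expert transitions while bootstrapping from the same online network (i.e., no target network). It assumes a PyTorch setup; the network sizes and all names such as bellman_pretrain_step are illustrative, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

gamma = 0.99
q_net = nn.Sequential(          # hypothetical small DQN for a 4-dim state, 2 actions
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 2),           # one Q-value per action
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def bellman_pretrain_step(states, expert_actions, rewards, next_states, dones):
    """One offline update on a batch of expert transitions (s, a, r, s', done)."""
    # Q(s, a) for the expert-chosen actions
    q_sa = q_net(states).gather(1, expert_actions.unsqueeze(1)).squeeze(1)
    # Bootstrap target computed from the online network itself (no target net),
    # detached from the graph so gradients flow only through Q(s, a)
    with torch.no_grad():
        q_next = q_net(next_states).max(dim=1).values
        target = rewards + gamma * q_next * (1.0 - dones)
    loss = F.mse_loss(q_sa, target)   # squared Bellman loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In the subsequent online phase, the same loss would be minimized on transitions collected from the environment, still bootstrapping from the online network rather than a frozen copy.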
Paper Citation
in Harvard Style
Lindström A., Ramaswamy A. and Grinnemo K. (2025). Pre-Training Deep Q-Networks Eliminates the Need for Target Networks: An Empirical Study. In Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-730-6, SciTePress, pages 437-444. DOI: 10.5220/0013374600003905
in BibTeX Style
@conference{icpram25,
author={Alexander Lindström and Arunselvan Ramaswamy and Karl-Johan Grinnemo},
title={Pre-Training Deep Q-Networks Eliminates the Need for Target Networks: An Empirical Study},
booktitle={Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2025},
pages={437-444},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013374600003905},
isbn={978-989-758-730-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Pre-Training Deep Q-Networks Eliminates the Need for Target Networks: An Empirical Study
SN - 978-989-758-730-6
AU - Lindström A.
AU - Ramaswamy A.
AU - Grinnemo K.
PY - 2025
SP - 437
EP - 444
DO - 10.5220/0013374600003905
PB - SciTePress