Pre-Training Deep Q-Networks Eliminates the Need for Target Networks: An Empirical Study
Alexander Lindström, Arunselvan Ramaswamy, Karl-Johan Grinnemo
2025
Abstract
Deep Q-Learning is an important reinforcement learning algorithm for automated sequential decision-making problems. It trains a neural network, called the Deep Q-Network (DQN), to find an optimal policy. Training is highly unstable and exhibits high variance. A target network is commonly used to mitigate these problems, but it leads to longer training times, higher training-data requirements, and a much larger memory footprint. In this paper, we present a two-phase training procedure that eliminates the need for a target network. In the first, offline, phase, the DQN is pre-trained using expert actions. Unlike previous literature, which maximizes the probability of picking the expert actions, we train by minimizing the usual squared Bellman loss. In the second, online, phase, the network continues to train while interacting with an environment (simulator). We show empirically that the target network can be eliminated; that training variance is reduced and training is more stable; that, when the duration of pre-training is carefully chosen, convergence to an optimal policy during the online phase is faster; and that the quality of the final policy is at least as good as that of policies found with traditional methods.
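To make the offline phase concrete, below is a minimal sketch of pre-training a DQN by minimizing the squared Bellman loss on expert transitions while bootstrapping from the same online network (i.e., no target network). It assumes a PyTorch setup; the network sizes and all names such as bellman_pretrain_step are illustrative, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

gamma = 0.99
q_net = nn.Sequential(          # hypothetical small DQN for a 4-dim state, 2 actions
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 2),           # one Q-value per action
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def bellman_pretrain_step(states, expert_actions, rewards, next_states, dones):
    """One offline update on a batch of expert transitions (s, a, r, s', done)."""
    # Q(s, a) for the expert-chosen actions
    q_sa = q_net(states).gather(1, expert_actions.unsqueeze(1)).squeeze(1)
    # Bootstrap target computed from the online network itself (no target net),
    # detached from the graph so gradients flow only through Q(s, a)
    with torch.no_grad():
        q_next = q_net(next_states).max(dim=1).values
        target = rewards + gamma * q_next * (1.0 - dones)
    loss = F.mse_loss(q_sa, target)   # squared Bellman loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In the subsequent online phase, the same loss would be minimized on transitions collected from the environment, still bootstrapping from the online network rather than a frozen copy.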
Paper Citation
in Harvard Style
Lindström A., Ramaswamy A. and Grinnemo K. (2025). Pre-Training Deep Q-Networks Eliminates the Need for Target Networks: An Empirical Study. In Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-730-6, SciTePress, pages 437-444. DOI: 10.5220/0013374600003905
in BibTeX Style
@conference{icpram25,
author={Alexander Lindström and Arunselvan Ramaswamy and Karl-Johan Grinnemo},
title={Pre-Training Deep Q-Networks Eliminates the Need for Target Networks: An Empirical Study},
booktitle={Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2025},
pages={437-444},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013374600003905},
isbn={978-989-758-730-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Pre-Training Deep Q-Networks Eliminates the Need for Target Networks: An Empirical Study
SN - 978-989-758-730-6
AU - Lindström A.
AU - Ramaswamy A.
AU - Grinnemo K.
PY - 2025
SP - 437
EP - 444
DO - 10.5220/0013374600003905
PB - SciTePress