Gradient Clipping in Deep Learning: A Dynamical Systems Perspective

Arunselvan Ramaswamy

2023

Abstract

Neural networks are ubiquitous components of Machine Learning (ML) algorithms. However, training them is challenging due to exploding and vanishing loss gradients. Gradient clipping has been shown to effectively combat both problems. As the name suggests, gradients are clipped in order to prevent large updates. At the same time, very small neural network weights are updated using larger step-sizes. Although clipping is widely used in practice, there is very little theory surrounding it. In this paper, we analyze two popular gradient clipping techniques: the classic norm-based gradient clipping method and the adaptive gradient clipping technique. We prove that gradient clipping ensures numerical stability with very high probability. Further, clipping-based stochastic gradient descent converges to a set of neural network weights that minimizes the average scaled training loss in a local sense. The averaging is with respect to the distribution that generated the training data. The scaling is a consequence of gradient clipping. We use tools from the theory of dynamical systems for the presented analysis.
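
To make the idea of norm-based clipping concrete, the following is a minimal sketch of one clipped gradient-descent step. It is an illustration only and not necessarily the exact formulation analyzed in the paper: the function names `clip_by_norm` and `sgd_step`, the threshold `max_norm`, and the step-size `lr` are all assumptions introduced here. The gradient is left unchanged when its Euclidean norm is at most `max_norm`, and is otherwise rescaled to have norm exactly `max_norm`.

```python
import math


def clip_by_norm(grad, max_norm):
    """Return a copy of grad whose Euclidean norm is at most max_norm.

    Hypothetical helper: rescales the whole vector, so the direction
    of the gradient is preserved and only its length changes.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def sgd_step(weights, grad, lr, max_norm):
    """One gradient-descent update using the clipped gradient."""
    clipped = clip_by_norm(grad, max_norm)
    return [w - lr * g for w, g in zip(weights, clipped)]
```

Because the clipped gradient is the original gradient multiplied by a factor in (0, 1], each step behaves like a gradient step with a rescaled step-size; this is one informal way to read the "scaled training loss" mentioned above.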


Paper Citation


in Harvard Style

Ramaswamy A. (2023). Gradient Clipping in Deep Learning: A Dynamical Systems Perspective. In Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-626-2, pages 107-114. DOI: 10.5220/0011678000003411


in Bibtex Style

@conference{icpram23,
author={Arunselvan Ramaswamy},
title={Gradient Clipping in Deep Learning: A Dynamical Systems Perspective},
booktitle={Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2023},
pages={107-114},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011678000003411},
isbn={978-989-758-626-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Gradient Clipping in Deep Learning: A Dynamical Systems Perspective
SN - 978-989-758-626-2
AU - Ramaswamy A.
PY - 2023
SP - 107
EP - 114
DO - 10.5220/0011678000003411