Gradient Clipping
See section 10.11 here.
Papers
Zhang et al 2020 - Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity
An extreme form of gradient clipping, where every component gets clipped so that only the sign of the gradient is used, is the “Manhattan-Learning rule” (see the Rprop paper). Rprop is an advancement over this.
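To make the relationship between ordinary clipping and the Manhattan-Learning rule concrete, here is a minimal sketch in plain Python. The function names (`clip_by_norm`, `clip_by_value`, `manhattan_step`) and the parameters `max_norm`, `limit`, and `step_size` are illustrative assumptions, not taken from the papers above or from any library. It assumes gradients are flat lists of floats.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm; direction is preserved."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def clip_by_value(grad, limit):
    """Clamp each component independently to the range [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grad]


def manhattan_step(grad, step_size):
    """Manhattan-style update (illustrative): fixed step size, only the sign of
    each gradient component is used; zero components produce no update."""
    return [-step_size * ((g > 0) - (g < 0)) for g in grad]


grad = [3.0, -4.0, 0.5]
clipped_norm = clip_by_norm(grad, max_norm=1.0)    # same direction, norm 1
clipped_value = clip_by_value(grad, limit=1.0)     # large components clamped
update = manhattan_step(grad, step_size=0.1)       # magnitudes discarded
```

When `limit` is smaller than every nonzero component's magnitude, `clip_by_value` returns `limit` times the sign of each component, which is the sense in which the Manhattan-Learning rule behaves as clipping where everything gets clipped.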
Blog Posts
What is gradient clipping
Related Pages
Optimizers