====== Gradient Clipping ======

See section 10.11 of the Deep Learning book, [[https://www.deeplearningbook.org/contents/rnn.html|here]].

===== Papers =====

  * [[https://arxiv.org/pdf/1905.11881.pdf|Zhang et al. 2020 - Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity]]
  * An extreme form of gradient clipping, in which every gradient component is clipped to a fixed magnitude, is the "Manhattan learning rule" (see the {{papers:rprop_paper.pdf|Rprop paper}}). Rprop is a refinement of this idea.

===== Blog Posts =====

  * [[https://towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48|What is gradient clipping]]

===== Related Pages =====

  * [[Optimizers]]
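===== Example =====

A minimal NumPy sketch of the two ideas mentioned above: clipping by global norm (rescale all gradients when their combined L2 norm exceeds a threshold) and, as a contrast, the extreme sign-only "Manhattan" update. Function names here are illustrative, not from any particular library.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Rescale a list of gradient arrays so their combined L2 norm
    is at most max_norm; gradients below the threshold pass through unchanged."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (global_norm + eps))
    return [g * scale for g in grads]

def manhattan_update(params, grads, step_size=0.01):
    """Extreme clipping: discard gradient magnitudes entirely and step
    a fixed amount in the direction of each gradient component's sign."""
    return [p - step_size * np.sign(g) for p, g in zip(params, grads)]

# Combined norm of these gradients is sqrt(9 + 16 + 144) = 13,
# so clipping to max_norm=1.0 rescales every component by ~1/13.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped = clip_by_global_norm(grads, max_norm=1.0)
```

After clipping, the global norm of ''clipped'' is 1.0 (up to the ''eps'' fudge factor), while the direction of the gradient is preserved; the Manhattan rule, by contrast, keeps only the per-component direction.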