====== Gradient Clipping ======

===== Papers =====
  * [[https://arxiv.org/pdf/1905.11881.pdf|Zhang et al 2020 - Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity]]
  * An extreme form of gradient clipping, where everything gets clipped, is the "Manhattan learning rule" (see the {{papers:rprop_paper.pdf|Rprop paper}}). Rprop is an advancement over this.
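The two ideas above can be sketched in code. Below is a minimal NumPy illustration (function names are my own, not from any specific library): standard clip-by-global-norm rescales gradients so their joint L2 norm stays below a threshold, while the "Manhattan" rule is the extreme case where only the sign of each gradient entry survives.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is <= max_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    # Scale factor is 1.0 when the norm is already small enough.
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads]

def manhattan_update(param, grad, step_size=0.01):
    """Extreme clipping: discard magnitudes, step by the sign of each entry."""
    return param - step_size * np.sign(grad)
```

Rprop refines the sign-only update by maintaining a per-parameter step size that grows while the gradient sign is stable and shrinks when it flips.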
  
===== Blog Posts =====