Gradient Clipping

See section 10.11 here.

Papers

Zhang et al. 2020 - Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity
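An extreme form of gradient clipping, in which every component gets clipped, is the “Manhattan-Learning rule” (see the Rprop paper): each weight moves by a fixed step size, and the gradient contributes only its sign. Rprop improves on this by adapting the step size of each weight separately.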

Blog Posts

What is gradient clipping

Related Pages

Optimizers