Table of Contents

Learning Rate

Overviews

Learning Rate Schedule

Papers

Warm-up

Warm-up was originally proposed to stabilize SGD training with very large batch sizes (Goyal et al., 2017; Gotmare et al., 2019; Bernstein et al., 2018; Xiao et al., 2017).
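A common form of warm-up ramps the learning rate linearly from near zero to the base value and then decays it; a minimal sketch (the cosine decay after warm-up is one popular choice, not the only one, and all names and default values here are illustrative):

```python
import math

def lr_with_warmup(step, base_lr=0.1, warmup_steps=1000, total_steps=10000):
    """Linear warm-up from ~0 to base_lr, then cosine decay back to 0.

    step: current training step (0-indexed).
    """
    if step < warmup_steps:
        # Linear ramp: avoids large, noisy early updates at big batch sizes.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In practice such a function is typically passed to a framework scheduler (e.g. a per-step multiplier) rather than called by hand.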

Automatically Setting the Learning Rate

Parameter-Free Optimization

Optimization algorithms that do not require a stepsize or other tunable hyperparameters.
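One classic construction in this family is coin betting, where the iterate is a bet sized as a fraction of accumulated "wealth" rather than a gradient step with a tuned learning rate. A minimal 1D sketch using the Krichevsky-Trofimov betting fraction, assuming gradients clipped to [-1, 1] (the function name and setup are illustrative, not from any particular library):

```python
def kt_coin_betting_minimize(grad, steps=2000):
    """Parameter-free 1D minimization via Krichevsky-Trofimov coin betting.

    There is no stepsize: each iterate x_t is a bet, a data-dependent
    fraction of the wealth accumulated so far. Returns the average iterate.
    """
    wealth = 1.0      # initial wealth; sets the scale, not a learning rate
    coin_sum = 0.0    # running sum of observed "coin" outcomes
    avg_x = 0.0       # running average of iterates
    for t in range(1, steps + 1):
        x = (coin_sum / t) * wealth         # KT betting fraction * wealth
        g = max(-1.0, min(1.0, grad(x)))    # clip gradient to [-1, 1]
        c = -g                              # coin outcome
        wealth += c * x                      # win or lose the bet
        coin_sum += c
        avg_x += (x - avg_x) / t             # online running average
    return avg_x
```

For example, minimizing (x - 3)^2 from this starting wealth drives the average iterate toward 3 without ever choosing a stepsize; the iterates themselves oscillate, which is why the averaged iterate is returned.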

Convergence Conditions

Software