  * **Plateau Learning Rate**
    * Decrease the learning rate when the objective reaches a plateau. See [[https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.ReduceLROnPlateau.html|PyTorch - ReduceLROnPlateau]]
  * **Other Schedules**
    * [[https://arxiv.org/pdf/1608.03983.pdf|Loshchilov & Hutter 2016 - SGDR: Stochastic Gradient Descent with Warm Restarts]] Used by nanoGPT
  * **Warm restarts** [[https://arxiv.org/pdf/1608.03983.pdf|Loshchilov & Hutter 2016 - SGDR: Stochastic Gradient Descent with Warm Restarts]]
  * **Batch size** [[https://arxiv.org/pdf/1711.00489.pdf|Smith et al 2017 - Don't Decay the Learning Rate, Increase the Batch Size]]
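A minimal Python sketch of the SGDR cosine schedule with warm restarts (function name and the default values are illustrative, not taken from the paper's experiments):

```python
import math

def sgdr_lr(step, lr_max=0.1, lr_min=0.001, t_0=10, t_mult=2):
    """Cosine-annealed learning rate with warm restarts (SGDR).

    Each cycle anneals the rate from lr_max down toward lr_min, then
    restarts at lr_max; the cycle length grows by t_mult after every
    restart.  All default values here are illustrative.
    """
    t_cur, t_i = step, t_0
    while t_cur >= t_i:  # locate the position within the current cycle
        t_cur -= t_i
        t_i *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_i))
```

At step 0 and at the start of each restart the rate returns to ''lr_max''; between restarts it follows the half-cosine from ''lr_max'' to ''lr_min''.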
==== Automatically Setting the Learning Rate ====
  * [[https://arxiv.org/pdf/1802.05074.pdf|Rolinek & Martius 2018 - L4: Practical Loss-based Stepsize Adaptation for Deep Learning]] Uses a linear approximation and a target loss value to pick the step size. For cross-entropy, could use 0 as the target value. Similar to LRTuner.
  * [[https://arxiv.org/pdf/1909.13371.pdf|Chandra et al 2019 - Gradient Descent: The Ultimate Optimizer]] Stacked hyper-optimizers
  * **[[https://arxiv.org/pdf/2002.10542.pdf|Loizou et al 2020 - Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence]]**
  * [[https://arxiv.org/pdf/2103.12623.pdf|Carvalho et al 2021 - Evolving Learning Rate Optimizers for Deep Neural Networks]]
  * [[https://arxiv.org/pdf/2105.14526.pdf|Iyer et al 2021 - LRTuner: A Learning Rate Tuner for Deep Neural Networks]] Uses a quadratic approximation in the direction of descent to pick the step size. Seems to work well. Similar to L4.
  * [[https://arxiv.org/pdf/2111.15317.pdf|Teng et al 2021 - AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop]]
  * **[[https://arxiv.org/pdf/2306.00144.pdf|Cutkosky et al 2023 - Mechanic: A Learning Rate Tuner]]**
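A minimal sketch of the stochastic Polyak step size in the spirit of Loizou et al 2020: step size = (batch loss minus target loss) over a scaled squared gradient norm, capped by a maximum rate. Function name, ''c'', ''max_lr'', and ''eps'' values here are illustrative choices, not the paper's tuned settings.

```python
def polyak_step_size(loss, grad_sq_norm, loss_star=0.0, c=0.5, max_lr=1.0, eps=1e-8):
    """Capped stochastic Polyak step size (sketch).

    loss_star is the optimal loss on the batch; 0 is a natural choice
    for nonnegative losses such as cross-entropy under interpolation.
    c rescales the squared gradient norm and max_lr caps the step.
    """
    return min(max_lr, (loss - loss_star) / (c * grad_sq_norm + eps))
```

When the current loss is far above the target, the step is large (up to the cap); as the loss approaches the target, the step shrinks automatically.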
  
==== Parameter-Free Optimization ====
Optimization algorithms that don't require a stepsize or other tuned hyperparameters.

  * [[https://arxiv.org/pdf/2302.12022.pdf|Ivgi et al 2023 - DoG is SGD’s Best Friend: A Parameter-Free Dynamic Step Size Schedule]]
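A minimal sketch of the Distance-over-Gradients (DoG) step-size rule in the spirit of Ivgi et al 2023: the step at time t is the maximum distance traveled from the initial point divided by the root of the accumulated squared gradient norms. The function name and the ''r_eps'' value are illustrative.

```python
import math

def dog_step_sizes(distances, grad_norms, r_eps=1e-6):
    """DoG step-size schedule (sketch).

    distances[t] = ||x_t - x_0||, grad_norms[t] = ||g_t||.  r_eps is a
    small initial "movement" constant so the very first step is nonzero.
    eta_t = max(r_eps, max_{i<=t} distances[i]) / sqrt(sum_{i<=t} grad_norms[i]^2)
    """
    etas, max_dist, g_sq_sum = [], r_eps, 0.0
    for d, g in zip(distances, grad_norms):
        max_dist = max(max_dist, d)      # running max distance from x_0
        g_sq_sum += g * g                # running sum of squared grad norms
        etas.append(max_dist / math.sqrt(g_sq_sum))
    return etas
```

No stepsize is tuned: the numerator grows as the iterates move away from the start, and the denominator grows with the gradients seen so far, together yielding an adaptive schedule.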
  
==== Convergence Conditions ====
ml/learning_rate.1686814574.txt.gz · Last modified: 2023/06/15 07:36 by 127.0.0.1
