====== ml:optimizers ======

Previous revision: ml:optimizers [2025/03/06 18:40] – [Modern Deep Learning Optimizers] jmflanig
Current revision: ml:optimizers [2025/03/26 20:02] – [Second-Order Optimizers] jmflanig

Line 79:
  * [[https://en.wikipedia.org/wiki/Limited-memory_BFGS|L-BFGS]] Highly popular for training convex ML models such as logistic regression. (See comparison [[https://dl.acm.org/doi/10.3115/1118853.1118871|Malouf 2002]])
  * Apollo: [[https://arxiv.org/pdf/2009.13586.pdf|Ma 2021 - Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization]] A diagonal quasi-Newton method
  * Sophia: [[https://arxiv.org/pdf/2305.14342|Liu et al 2023 - Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training]] Pre-conditions updates with a cheap, clipped estimate of the diagonal Hessian
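As a concrete illustration of the first entry above, here is a minimal sketch of fitting a logistic regression with SciPy's L-BFGS-B implementation. The synthetic data, seed, and tolerances are illustrative assumptions, not from this page; L-BFGS itself only requires a function that returns the loss and its gradient.

```python
import numpy as np
from scipy.optimize import minimize

# Toy binary classification data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = (X @ true_w + rng.normal(scale=0.1, size=200) > 0).astype(float)

def nll_and_grad(w):
    """Negative log-likelihood of logistic regression and its gradient."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    eps = 1e-12  # guard against log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

# L-BFGS builds a low-memory approximation to the inverse Hessian from
# recent gradient differences -- no explicit second derivatives needed.
res = minimize(nll_and_grad, np.zeros(5), jac=True, method="L-BFGS-B")
acc = np.mean((1.0 / (1.0 + np.exp(-(X @ res.x))) > 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```

Because the logistic loss is convex, the quasi-Newton curvature estimate is well behaved and convergence is typically much faster than plain gradient descent on problems like this.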
  
===== Gradient-Free Optimizers =====