
Optimizers

Survey Papers

First-Order Optimizers

Modern Deep Learning Optimizers

Provably Linearly-Convergent Optimizers

Adam, and related methods that use exponential moving averages of squared gradients (such as RMSProp, Adadelta, and Nadam), can provably fail to converge, even on one-dimensional convex problems (see Reddi et al. 2018 - On the Convergence of Adam and Beyond; follow-up here: Ward et al. 2020). The methods below attempt to improve this situation.
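The failure mode identified by Reddi et al. is that the exponential moving average of squared gradients can shrink, letting the effective step size grow on rare but informative gradients. Their proposed fix, AMSGrad, keeps a running maximum of the second-moment estimate so the denominator never decreases. A minimal NumPy sketch of one update step (hyperparameter defaults and the bias-correction placement are illustrative, not a reference implementation):

```python
import numpy as np

def amsgrad_step(w, grad, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One AMSGrad update; state holds (m, v, v_max, t)."""
    m, v, vmax, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad        # first-moment EMA (as in Adam)
    v = b2 * v + (1 - b2) * grad ** 2   # second-moment EMA (as in Adam)
    vmax = np.maximum(vmax, v)          # AMSGrad: non-decreasing denominator
    m_hat = m / (1 - b1 ** t)           # Adam-style bias correction
    w = w - lr * m_hat / (np.sqrt(vmax) + eps)
    return w, (m, v, vmax, t)

# Illustrative usage: minimize f(w) = w^2, where grad f = 2w.
w, state = np.array(1.0), (0.0, 0.0, 0.0, 0)
for _ in range(2000):
    w, state = amsgrad_step(w, 2 * w, state)
```

The only change relative to Adam is the `np.maximum` line; dropping it recovers the standard Adam update.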

Variance Reduction Techniques

Distributed Optimizers

See also Distributed Training

Learned Optimizers

Other Optimizers

Older Optimizers

For the history of SGD and related methods, see ML History - Optimization.

Second-Order Optimizers

Second-order optimizers such as Newton's method or quasi-Newton methods enjoy a much faster local convergence rate than first-order optimizers (quadratic for Newton's method, superlinear for quasi-Newton methods). See here and here.
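Quadratic convergence means the error is roughly squared each iteration, i.e. the number of correct digits doubles per step. A minimal 1-D sketch (the test function is illustrative): minimizing f(x) = e^x - 2x, whose unique minimizer is x* = ln 2, with the Newton update x ← x - f'(x)/f''(x).

```python
import math

def newton_minimize(grad, hess, x0, steps=8):
    """Newton's method for 1-D minimization: x <- x - f'(x) / f''(x)."""
    x = x0
    for _ in range(steps):
        x = x - grad(x) / hess(x)
    return x

# f(x) = exp(x) - 2x, so f'(x) = exp(x) - 2 and f''(x) = exp(x).
x = newton_minimize(lambda x: math.exp(x) - 2,   # gradient
                    lambda x: math.exp(x),       # Hessian (scalar)
                    x0=0.0)
# x is now very close to the true minimizer ln(2) ~ 0.6931.
```

A gradient-descent run on the same function would need orders of magnitude more iterations to reach comparable accuracy; the trade-off, in higher dimensions, is forming and solving against the Hessian at each step.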

Gradient-Free Optimizers

Also known as black-box optimizers. See also Hyperparameter Tuning.
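A black-box optimizer only ever evaluates the objective; it never sees gradients. One of the simplest instances is random-search hill climbing: perturb the incumbent, keep the candidate if it improves. A self-contained sketch (the Gaussian proposal, step scale, and sphere test function are all illustrative choices):

```python
import random

def random_search(f, x0, iters=2000, scale=0.5, seed=0):
    """Minimize f using only function evaluations: propose Gaussian
    perturbations of the best point seen so far, accept improvements."""
    rng = random.Random(seed)
    best, best_val = list(x0), f(x0)
    for _ in range(iters):
        cand = [b + rng.gauss(0.0, scale) for b in best]
        val = f(cand)
        if val < best_val:          # greedy: keep only improvements
            best, best_val = cand, val
    return best, best_val

# Illustrative usage: minimize the 2-D sphere function from (3, 3).
best, best_val = random_search(lambda x: sum(v * v for v in x), [3.0, 3.0])
```

More sophisticated members of this family (evolution strategies, CMA-ES, Bayesian optimization) replace the fixed Gaussian proposal with an adaptive search distribution or a surrogate model, which is why these methods also dominate hyperparameter tuning.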

ml/optimizers.txt · Last modified: 2025/03/26 20:02 by jmflanig
