ml:optimizers
Differences
This shows you the differences between two versions of the page.
| ml:optimizers [2023/07/07 20:52] – [Modern Deep Learning Optimizers] jmflanig | ml:optimizers [2025/03/26 20:02] (current) – [Second-Order Optimizers] jmflanig | ||
|---|---|---|---|
| Line 3: | Line 3: | ||
| ===== Survey Papers ===== | ===== Survey Papers ===== | ||
| * Introduction: | * Introduction: | ||
| - | * [[https:// | + | |
| - | * [[https:// | + | |
| - | * Blog post: [[https:// | + | * **[[https:// |
| - | * Blog post about Adam, AdamW, and AMSGrad: [[https:// | + | * [[https:// |
| + | | ||
| + | | ||
| + | | ||
| + | * [[https:// | ||
| + | * Blog post about Adam, AdamW, and AMSGrad: [[https:// | ||
| ===== First-Order Optimizers ===== | ===== First-Order Optimizers ===== | ||
| Line 30: | Line 35: | ||
| * [[https:// | * [[https:// | ||
| * Generalized SignSGD: [[https:// | * Generalized SignSGD: [[https:// | ||
| - | * Lion: [[https:// | + | |
| + | * **Muon**: [[https:// | ||
| + | * Background on norms: [[https:// | ||
| + | * Applied to larger-scale LLM training: [[https:// | ||
| ==== Provably Linearly-Convergent Optimizers ==== | ==== Provably Linearly-Convergent Optimizers ==== | ||
| Line 71: | Line 79: | ||
| * [[https:// | * [[https:// | ||
| * Apollo: [[https:// | * Apollo: [[https:// | ||
| + | * [[https:// | ||
| ===== Gradient-Free Optimizers ===== | ===== Gradient-Free Optimizers ===== | ||
ml/optimizers.1688763159.txt.gz · Last modified: 2023/07/07 20:52 by jmflanig