ml:optimizers
Differences
This shows you the differences between two versions of the page.
| ml:optimizers [2025/03/06 10:11] – jmflanig | ml:optimizers [2025/03/26 20:02] (current) – [Second-Order Optimizers] jmflanig | ||
|---|---|---|---|
| Line 36: | Line 36: | ||
| * Generalized SignSGD: [[https:// | * Generalized SignSGD: [[https:// | ||
| * **Lion**: [[https:// | * **Lion**: [[https:// | ||
| + | * **Muon**: [[https:// | ||
| + | * Background on norms: [[https:// | ||
| + | * Applied to larger scale LLM training: [[https:// | ||
| ==== Provably Linearly-Convergent Optimizers ==== | ==== Provably Linearly-Convergent Optimizers ==== | ||
| Line 76: | Line 79: | ||
| * [[https:// | * [[https:// | ||
| * Apollo: [[https:// | * Apollo: [[https:// | ||
| + | * [[https:// | ||
| ===== Gradient-Free Optimizers ===== | ===== Gradient-Free Optimizers ===== | ||
ml/optimizers.1741255863.txt.gz · Last modified: 2025/03/06 10:11 by jmflanig