ml:scaling_laws
Differences
This shows you the differences between two versions of the page.
| ml:scaling_laws [2024/07/12 03:34] – jmflanig | ml:scaling_laws [2025/06/01 23:09] (current) – [Related Pages] jmflanig | ||
|---|---|---|---|
| Line 6: | Line 6: | ||
| * [[https:// | * [[https:// | ||
| * **[[https:// | * **[[https:// | ||
| - | |||
| - | ==== In LLMs ==== | ||
| * [[https:// | * [[https:// | ||
| - | * **Training LLMs** | + | * [[https:// |
| - | * Large models are usually trained with scaling laws in mind (often compute-optimal for deployment rather than for training). | + | |
| - | * [[https:// | + | ==== Training LLMs ==== |
| + | * Large models are usually trained with scaling laws in mind (often compute-optimal for deployment rather than for training). | ||
| + | * [[https:// | ||
| ==== Emergent Abilities ==== | ==== Emergent Abilities ==== | ||
| + | See also [[nlp: | ||
| + | |||
| + | * GPT-3: [[https:// | ||
| + | * [[https:// | ||
| * [[https:// | * [[https:// | ||
| * **[[https:// | * **[[https:// | ||
| Line 21: | Line 25: | ||
| * [[Hyperparameter Tuning]] | * [[Hyperparameter Tuning]] | ||
| * [[nlp: | * [[nlp: | ||
| + | * [[nlp: | ||
| * [[nlp: | * [[nlp: | ||
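
The "Training LLMs" note above says large models are sized with scaling laws in mind, and often for deployment rather than for training alone. A minimal sketch of that trade-off follows. It assumes the common approximation that training compute is about 6 FLOPs per parameter per token, and a rule-of-thumb ratio of roughly 20 training tokens per parameter for training-compute-optimal models. Neither number comes from this page, and the function name `compute_optimal_split` is hypothetical.

```python
import math

# Assumption (not from this page): training compute C ≈ 6 * N * D FLOPs,
# where N is the parameter count and D is the number of training tokens.
FLOPS_PER_PARAM_TOKEN = 6.0

# Assumption (not from this page): a rough tokens-per-parameter ratio for
# training-compute-optimal models. Treat it as a tunable rule of thumb.
TOKENS_PER_PARAM = 20.0


def compute_optimal_split(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Split a FLOP budget into (parameters, tokens) for a fixed D/N ratio.

    With C = 6 * N * D and D = r * N, solving for N gives
    N = sqrt(C / (6 * r)).
    """
    if compute_flops <= 0 or tokens_per_param <= 0:
        raise ValueError("compute_flops and tokens_per_param must be positive")
    n_params = math.sqrt(
        compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param)
    )
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    budget = 1.2e20  # hypothetical FLOP budget
    for ratio in (20.0, 100.0):
        n, d = compute_optimal_split(budget, tokens_per_param=ratio)
        print(f"ratio={ratio:>5.0f}  params={n:.3e}  tokens={d:.3e}")
```

Raising `tokens_per_param` at a fixed budget yields a smaller model trained on more tokens. That is one way to read "compute-optimal for deployment": a smaller model costs less per query at inference time, at the price of a training run that is no longer compute-optimal.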
ml/scaling_laws.1720755282.txt.gz · Last modified: 2024/07/12 03:34 by jmflanig