Scaling Laws
Scaling laws relate model performance (typically loss) to scale factors such as parameter count, dataset size, and compute, and are used to pick optimal hyperparameters for large models before committing to an expensive training run.
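As a minimal sketch of the idea, a single-variable scaling law such as L(C) = A · C^(−α) can be fit by ordinary least squares in log-log space and then extrapolated to larger compute budgets. The (compute, loss) numbers below are hypothetical.

```python
# Minimal sketch: fit a power-law scaling law L(C) = A * C**(-alpha)
# by linear regression in log-log space. Data points are hypothetical.
import numpy as np

# Hypothetical (compute, loss) measurements from small training runs.
compute = np.array([1e18, 1e19, 1e20, 1e21])
loss = np.array([3.2, 2.7, 2.3, 1.95])

# log L = log A - alpha * log C, so a least-squares line in log space
# recovers both parameters.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
alpha, A = -slope, np.exp(intercept)

# Extrapolate to a larger compute budget than any observed run.
predicted_loss = A * (1e22) ** (-alpha)
```

Real scaling-law fits (e.g. Chinchilla-style L(N, D) = E + A/N^α + B/D^β) use more terms and careful data selection, but the log-space regression step is the same.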
Papers
- Ruan et al. 2024 - Observational Scaling Laws and the Predictability of Language Model Performance. Fits a sigmoid via multi-dimensional regression to predict performance across model “families” (LLaMA, GPT-3, etc.).
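The sigmoid-fitting step can be sketched as follows. This is not the paper's actual pipeline (which regresses over multiple latent capability dimensions); it is a one-dimensional toy with hypothetical capability scores and accuracies, just to show the shape of the regression.

```python
# Sketch: fit a sigmoid mapping a scalar "capability" score to downstream
# benchmark accuracy. All numbers are hypothetical, not from the paper.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, k, x0):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

# Hypothetical capability scores and benchmark accuracies across models.
capability = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
accuracy = np.array([0.05, 0.12, 0.35, 0.68, 0.88, 0.97])

# Least-squares fit of the sigmoid's slope and midpoint.
(k, x0), _ = curve_fit(sigmoid, capability, accuracy, p0=[1.0, 0.0])

# Predict accuracy for an unseen, more capable model.
pred = sigmoid(4.0, k, x0)
```

The appeal of the sigmoid form is that it saturates at 0 and 1, matching how benchmark accuracy behaves at the extremes of capability.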
Training LLMs
- Large models are usually trained with scaling laws in mind, often targeting compute-optimality for deployment (inference) rather than for training alone. See for example:
- Google 2023 - PaLM 2 Technical Report (see section 2)
Emergent Abilities
- Hu et al. 2023 - Predicting Emergent Abilities with Infinite Resolution Evaluation. Gets a very fine-grained measure of model capabilities by repeatedly sampling until the desired behavior appears a certain number of times. Can be used to predict emergent capabilities from very small models that only rarely exhibit the desired behavior.
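The "sample until k successes" idea can be sketched with a simulated model. The success rate and target count below are hypothetical; the point is that the number of attempts needed to reach k successes yields an estimate (k / attempts) of even a very small per-attempt success probability, without fixing the sample size in advance.

```python
# Sketch: estimate a tiny per-attempt success rate by sampling until a
# target number of successes is reached. The "model" is simulated with a
# hypothetical true success probability.
import random

random.seed(0)

def estimate_success_rate(true_p, k=10):
    """Draw Bernoulli(true_p) attempts until k successes; k/attempts
    estimates true_p."""
    successes, attempts = 0, 0
    while successes < k:
        attempts += 1
        if random.random() < true_p:
            successes += 1
    return k / attempts

# A rate of 0.002 would look like ~0% on a fixed 100-example eval set,
# but is measurable here.
est = estimate_success_rate(true_p=0.002, k=10)
```

The relative error of the estimate shrinks roughly as 1/sqrt(k), so raising the target success count buys resolution at the cost of more samples.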
Related Pages
ml/scaling_laws.1720746527.txt.gz · Last modified: 2024/07/12 01:08 by jmflanig