Scaling Laws
Scaling laws are used to pick optimal hyperparameters for large models.
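In the simplest form, a scaling law is a power-law fit of loss against compute (or parameters, or data), extrapolated to pick settings for a larger run. A minimal sketch, using hypothetical (compute, loss) points and a log-log linear fit; the data values and extrapolation target are made up for illustration:

```python
import numpy as np

# Hypothetical (compute, loss) observations from small training runs;
# real scaling-law fits use many runs across several orders of magnitude.
compute = np.array([1e18, 1e19, 1e20, 1e21])
loss = np.array([3.2, 2.9, 2.6, 2.35])

# Fit loss ~ a * compute^(-b) by linear regression in log-log space.
neg_b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a, b = np.exp(log_a), -neg_b

# Extrapolate the fitted curve to a larger compute budget.
predicted = a * (1e22) ** (-b)
print(b, predicted)
```

The same recipe generalizes to joint fits over parameters and tokens (as in Chinchilla-style compute-optimal analyses).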
Papers
- Ruan et al 2024 - Observational Scaling Laws and the Predictability of Language Model Performance. Fits a multi-dimensional regression (with a sigmoid) to predict performance across model “families” (LLaMA, GPT-3, etc).
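The sigmoid-fit idea above can be sketched as follows. This is a toy version, assuming a single scalar capability score per model (the paper derives such scores from multiple benchmarks); the scores and accuracies here are invented:

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, k, x0):
    """Sigmoid linking a scalar capability score to downstream accuracy."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

# Hypothetical capability scores and downstream accuracies for models
# drawn from several different families.
scores = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
accuracy = np.array([0.05, 0.12, 0.45, 0.78, 0.93, 0.98])

(k, x0), _ = curve_fit(sigmoid, scores, accuracy, p0=[1.0, 0.0])

# Predict accuracy for an unseen, more capable model.
print(sigmoid(4.0, k, x0))
```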
Training LLMs
- Large models are usually trained with scaling laws in mind (often compute-optimal for deployment/inference rather than for training). See for example:
- Google 2023 - PaLM 2 Technical Report (see section 2)
Emergent Abilities
See also Language Model - Origin of Capabilities.
- GPT-3: Brown et al 2020 - Language Models are Few-Shot Learners. GPT-3 showed emergent abilities. See for example Fig 3.10.
- Hu et al 2023 - Predicting Emergent Abilities with Infinite Resolution Evaluation. Does bootstrap resampling to get a very fine-grained measure of model capabilities, resampling until the desired behavior appears a set number of times. Can be used to predict emergent capabilities from very small models that rarely exhibit the behavior.
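The resample-until-success idea can be sketched in a few lines. This is a toy simulation, not the paper's method: the "model" is a random trial with a tiny hypothetical success probability, and the estimator keeps sampling until a fixed number of successes is seen:

```python
import random

def pass_until(trial, k=10, max_samples=10_000_000, seed=0):
    """Sample until `k` successes are observed; the estimated pass rate
    is k / (samples used). Resolution grows with the sampling budget,
    so very rare abilities become measurable."""
    rng = random.Random(seed)
    successes, n = 0, 0
    while successes < k and n < max_samples:
        n += 1
        successes += trial(rng)
    return successes / n

# Hypothetical "model": succeeds on a task with probability 1e-4.
est = pass_until(lambda rng: rng.random() < 1e-4)
print(est)  # estimate of the rare success probability
```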
Related Pages
ml/scaling_laws.txt · Last modified: 2025/06/01 23:09 by jmflanig