Scaling Laws
Scaling laws relate model performance (typically loss) to scale factors such as parameter count, dataset size, and compute, and are used to pick optimal hyperparameters for large models before committing to an expensive training run.
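As a minimal sketch of the idea, a single-variable scaling law such as L(C) = A · C^(−α) can be fit by ordinary least squares in log-log space and then extrapolated to larger compute budgets. The (compute, loss) numbers below are hypothetical.

```python
# Minimal sketch: fit a power-law scaling law L(C) = A * C**(-alpha)
# by linear regression in log-log space. Data points are hypothetical.
import numpy as np

# Hypothetical (compute, loss) measurements from small training runs.
compute = np.array([1e18, 1e19, 1e20, 1e21])
loss = np.array([3.2, 2.7, 2.3, 1.95])

# log L = log A - alpha * log C, so a least-squares line in log space
# recovers both parameters.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
alpha, A = -slope, np.exp(intercept)

# Extrapolate to a larger compute budget than any observed run.
predicted_loss = A * (1e22) ** (-alpha)
```

Real scaling-law fits (e.g. Chinchilla-style L(N, D) = E + A/N^α + B/D^β) use more terms and careful data selection, but the log-space regression step is the same.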
Papers
- Ruan et al. 2024 - Observational Scaling Laws and the Predictability of Language Model Performance. Fits a sigmoid via multi-dimensional regression to predict performance across model “families” (LLaMA, GPT-3, etc.).
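The sigmoid-fitting step can be sketched as follows. This is not the paper's actual pipeline (which regresses over multiple latent capability dimensions); it is a one-dimensional toy with hypothetical capability scores and accuracies, just to show the shape of the regression.

```python
# Sketch: fit a sigmoid mapping a scalar "capability" score to downstream
# benchmark accuracy. All numbers are hypothetical, not from the paper.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, k, x0):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

# Hypothetical capability scores and benchmark accuracies across models.
capability = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
accuracy = np.array([0.05, 0.12, 0.35, 0.68, 0.88, 0.97])

# Least-squares fit of the sigmoid's slope and midpoint.
(k, x0), _ = curve_fit(sigmoid, capability, accuracy, p0=[1.0, 0.0])

# Predict accuracy for an unseen, more capable model.
pred = sigmoid(4.0, k, x0)
```

The appeal of the sigmoid form is that it saturates at 0 and 1, matching how benchmark accuracy behaves at the extremes of capability.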
Training LLMs
- Large models are usually trained with scaling laws in mind, often targeting compute-optimality for deployment (inference) rather than for training alone. See for example:
- Google 2023 - PaLM 2 Technical Report (see section 2)
Emergent Abilities
- Hu et al. 2023 - Predicting Emergent Abilities with Infinite Resolution Evaluation. Gets a very fine-grained measure of model capabilities by repeatedly sampling until the desired behavior appears a certain number of times. Can be used to predict emergent capabilities from very small models that only rarely exhibit the desired behavior.
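The "sample until k successes" idea can be sketched with a simulated model. The success rate and target count below are hypothetical; the point is that the number of attempts needed to reach k successes yields an estimate (k / attempts) of even a very small per-attempt success probability, without fixing the sample size in advance.

```python
# Sketch: estimate a tiny per-attempt success rate by sampling until a
# target number of successes is reached. The "model" is simulated with a
# hypothetical true success probability.
import random

random.seed(0)

def estimate_success_rate(true_p, k=10):
    """Draw Bernoulli(true_p) attempts until k successes; k/attempts
    estimates true_p."""
    successes, attempts = 0, 0
    while successes < k:
        attempts += 1
        if random.random() < true_p:
            successes += 1
    return k / attempts

# A rate of 0.002 would look like ~0% on a fixed 100-example eval set,
# but is measurable here.
est = estimate_success_rate(true_p=0.002, k=10)
```

The relative error of the estimate shrinks roughly as 1/sqrt(k), so raising the target success count buys resolution at the cost of more samples.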
Related Pages
ml/scaling_laws.1720746527.txt.gz · Last modified: 2024/07/12 01:08 by jmflanig