====== Scaling Laws ======

  * [[https://arxiv.org/pdf/2207.10551.pdf|Tay et al 2022 - Scaling Laws vs Model Architectures: How Does Inductive Bias Influence Scaling?]]
  * **[[https://arxiv.org/pdf/2405.10938|Ruan et al 2024 - Observational Scaling Laws and the Predictability of Language Model Performance]]** Fits a multi-dimensional regression (a sigmoid) to predict performance across model "families" (LLaMA, GPT-3, etc.); see the sketch after this list.
  * [[https://arxiv.org/pdf/2406.19146|Porian et al 2024 - Resolving Discrepancies in Compute-Optimal Scaling of Language Models]]
  * [[https://arxiv.org/pdf/2410.11840|Choshen et al 2024 - A Hitchhiker's Guide to Scaling Law Estimation]]
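
A minimal sketch of the observational-scaling idea in Ruan et al 2024, assuming a single log-compute covariate; the paper itself regresses benchmark scores onto low-dimensional capability measures extracted across model families, and every number below is made up for illustration.

<code python>
# Fit a saturating sigmoid of benchmark accuracy vs. log10 training
# FLOPs across several (hypothetical) models, then extrapolate.
import numpy as np
from scipy.optimize import curve_fit

log_flops = np.array([20.0, 21.0, 22.0, 23.0, 24.0])  # hypothetical
accuracy  = np.array([0.05, 0.10, 0.30, 0.62, 0.85])  # hypothetical

def sigmoid(x, lo, hi, slope, mid):
    """Performance curve saturating between a floor and a ceiling."""
    return lo + (hi - lo) / (1.0 + np.exp(-slope * (x - mid)))

params, _ = curve_fit(sigmoid, log_flops, accuracy,
                      p0=[0.0, 1.0, 1.0, 22.0], maxfev=10000)

# Predict performance at a larger, unseen compute budget.
print("predicted accuracy at 1e25 FLOPs:", sigmoid(25.0, *params))
</code>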

==== Training LLMs ====
  * Large models are usually trained with scaling laws in mind (often compute-optimal for deployment rather than for training); a worked example follows this list.  See for example:
    * [[https://ai.google/static/documents/palm2techreport.pdf|Google 2023 - PaLM 2 Technical Report]] (see section 2)
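
A worked example of the compute-optimal tradeoff, using the Chinchilla approximations (Hoffmann et al 2022: training compute C ≈ 6ND, with the optimum near D ≈ 20 tokens per parameter); the budget and the "train a smaller model longer" split below are illustrative assumptions, not numbers from any report above.

<code python>
def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Split a budget C into params N and tokens D via C = 6*N*D and
    D = r*N, which gives N = sqrt(C / (6*r)) and D = r*N."""
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

C = 1e24  # illustrative training budget in FLOPs
N, D = chinchilla_optimal(C)
print(f"training-compute-optimal: N ~ {N:.2e} params, D ~ {D:.2e} tokens")

# Deployment-aware variant: halve the model and spend the same budget
# on more tokens (overtraining), so inference is cheaper per query.
N_small = N / 2.0
D_over = C / (6.0 * N_small)
print(f"overtrained for deployment: N ~ {N_small:.2e}, D ~ {D_over:.2e}")
</code>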
  
==== Emergent Abilities ====
See also [[nlp:Language Model#Origin of Capabilities|Language Model - Origin of Capabilities]].

  * GPT-3: [[https://arxiv.org/pdf/2005.14165.pdf|Brown et al 2020 - Language Models are Few-Shot Learners]] GPT-3 showed emergent abilities; see for example Fig 3.10.
  * [[https://arxiv.org/pdf/2206.07682|Wei et al 2022 - Emergent Abilities of Large Language Models]]
  * [[https://arxiv.org/pdf/2304.15004|Schaeffer et al 2023 - Are Emergent Abilities of Large Language Models a Mirage?]]
  * **[[https://arxiv.org/pdf/2310.03262|Hu et al 2023 - Predicting Emergent Abilities with Infinite Resolution Evaluation]]** Uses bootstrap resampling to get a very fine-grained measure of model capabilities, resampling until the desired behavior appears a set number of times.  Can be used to predict emergent capabilities from very small models that rarely exhibit the desired behavior; see the sketch after this list.
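
A minimal sketch of the resample-until-enough-successes idea above: for a behavior with tiny success probability p, keep sampling until k successes are observed, then estimate p from the number of draws needed.  model_passes() and its probability are hypothetical stand-ins for running a real model on a task.

<code python>
import random

def model_passes(p=0.003):
    """Hypothetical model call: task solved with small probability p."""
    return random.random() < p

def estimate_pass_rate(k=10, max_draws=1_000_000):
    """Draw until k successes; estimate p_hat = successes / draws."""
    successes = draws = 0
    while successes < k and draws < max_draws:
        draws += 1
        successes += model_passes()  # bool counts as 0/1
    return successes / draws

random.seed(0)
print(f"estimated pass rate: {estimate_pass_rate():.4f}")
</code>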

==== Related Pages ====
  * [[Hyperparameter Tuning]]
  * [[nlp:Language Model]]
  * [[nlp:Language Model#Origin of Capabilities|Language Model - Origin of Capabilities]]
  * [[nlp:pretraining#Pretraining Methodology]]