ml:theory:generalization_in_deep_learning

Differences

This shows you the differences between two versions of the page.

ml:theory:generalization_in_deep_learning [2025/03/06 10:18] – [Overviews] jmflanig
ml:theory:generalization_in_deep_learning [2025/05/29 07:00] (current) – [Grokking] jmflanig
Line 58:
 ==== Grokking ====
   * [[https://arxiv.org/pdf/2201.02177.pdf|Power et al 2022 - Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets]]
+  * [[https://arxiv.org/pdf/2505.20896|Wu et al 2025 - How Do Transformers Learn Variable Binding in Symbolic Programs?]]: "We find that the model’s final solution builds upon, rather than replaces, the heuristics learned in earlier phases. This adds nuance to the traditional narrative about “grokking”, where models are thought to discard superficial heuristics in favor of more systematic solutions. Instead, our model maintains its early-line heuristics while developing additional mechanisms to handle cases where these heuristics fail, suggesting cumulative learning where sophisticated capabilities emerge by augmenting simpler strategies."
 
 ===== Related Pages =====
   * [[ml:Regularization]]
ml/theory/generalization_in_deep_learning.1741256281.txt.gz · Last modified: 2025/03/06 10:18 by jmflanig
