====== Curriculum Learning ======
Curriculum learning (CL) trains a neural network on easier examples before introducing harder ones. The schedule that decides which examples to train on at each stage is called the curriculum, and it relies on a measure of difficulty for the data points. For example, one can train on shorter sentences before adding longer ones to the training data. CL can help the model learn features that generalize better and can speed up training. For an overview, see [[https://mila.quebec/wp-content/uploads/2019/08/2009_curriculum_icml.pdf|Bengio 2009]].
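The idea above can be sketched in a few lines. This is a minimal, illustrative sketch only (the function name and staging scheme are my own choices, not from any paper on this page): sentences are ranked by length, and the training pool grows to include harder examples at each stage.

```python
def curriculum_batches(sentences, n_stages=3):
    """Yield training pools of increasing difficulty (here: sentence length).

    At stage k (1-indexed), the pool contains the easiest k/n_stages
    fraction of the data, so early training sees only short sentences.
    """
    ranked = sorted(sentences, key=len)  # easy (short) -> hard (long)
    for stage in range(1, n_stages + 1):
        cutoff = int(len(ranked) * stage / n_stages)
        yield ranked[:cutoff]

# Toy usage: each successive pool is a superset of the previous one.
sentences = ["a b", "a b c d", "a", "a b c", "a b c d e f", "a b c d e"]
pools = list(curriculum_batches(sentences))
```

A real trainer would run some number of epochs or steps on each pool before moving to the next, as in competence-based schedules (Platanios et al 2019, linked below).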
  
===== Overviews =====
  * [[https://arxiv.org/pdf/2010.13166.pdf|Wang et al 2020 - A Comprehensive Survey on Curriculum Learning]]
  * [[https://arxiv.org/pdf/2101.10382.pdf|Soviany et al 2021 - Curriculum Learning: A Survey]]
  * [[https://www.aclweb.org/anthology/2020.acl-main.542.pdf|2020 - Curriculum Learning for Natural Language Understanding]]

===== Strategies for the Curriculum =====
  * In NLP
    * Length, word rarity
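As a hedged illustration of these two NLP difficulty measures, one possible way to score sentences (the function name and the exact scoring choices here are mine, not taken from the surveys above): length is the token count, and rarity is the mean negative log unigram frequency over the corpus, so sentences with rare words score as harder.

```python
import math
from collections import Counter

def difficulty_scores(corpus):
    """Return a (length, rarity) difficulty pair per sentence.

    length: number of whitespace tokens.
    rarity: mean -log(unigram frequency), higher = rarer words = harder.
    """
    freqs = Counter(w for s in corpus for w in s.split())
    total = sum(freqs.values())
    scores = []
    for s in corpus:
        words = s.split()
        length = len(words)
        rarity = sum(-math.log(freqs[w] / total) for w in words) / length
        scores.append((length, rarity))
    return scores
```

Either score (or a weighted combination) can then be used to sort the training data for a curriculum like the one sketched above.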
  
===== Papers =====
  * [[https://mila.quebec/wp-content/uploads/2019/08/2009_curriculum_icml.pdf|Bengio et al 2009 - Curriculum Learning]]
  * [[http://proceedings.mlr.press/v70/graves17a/graves17a.pdf|Graves 2017 - Automated Curriculum Learning for Neural Networks]]
  * [[https://arxiv.org/pdf/1707.09533.pdf|Kocmi & Bojar 2017 - Curriculum Learning and Minibatch Bucketing in Neural Machine Translation]]
  * **[[https://arxiv.org/pdf/1811.00739.pdf|Zhang et al 2018 - An Empirical Exploration of Curriculum Learning for Neural Machine Translation]]**
  * [[https://arxiv.org/pdf/1904.03626.pdf|Hacohen & Weinshall 2019 - On The Power of Curriculum Learning in Training Deep Networks]] Has some theoretical results.
  * [[https://aclanthology.org/N19-1119.pdf|Platanios et al 2019 - Competence-based Curriculum Learning for Neural Machine Translation]]
  * [[https://arxiv.org/pdf/2012.15832.pdf|Press et al 2020 - Shortformer: Better Language Modeling using Shorter Inputs]] Points out that BERT also uses curriculum learning.
  * [[https://www.aclweb.org/anthology/2020.acl-main.542.pdf|2020 - Curriculum Learning for Natural Language Understanding]]
  * [[https://arxiv.org/pdf/2102.03554.pdf|Chang et al 2021 - Does the Order of Training Samples Matter? Improving Neural Data-to-Text Generation with Curriculum Learning]]
  * [[https://arxiv.org/pdf/2108.02170.pdf|Campos et al 2021 - Curriculum Learning for Language Modeling]]
  * [[https://arxiv.org/pdf/2109.11177.pdf|Lu & Zhang 2021 - Exploiting Curriculum Learning in Unsupervised Neural Machine Translation]]
  * [[https://arxiv.org/pdf/2310.09518|Lee et al 2023 - Instruction Tuning with Human Curriculum]]
  
===== Theory =====
  * [[https://arxiv.org/pdf/1802.03796.pdf|Weinshall et al 2018 - Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks]] Theory result: shows that for SGD on a convex problem (linear regression), "the rate of convergence of an ideal curriculum learning method is monotonically increasing with the difficulty of the examples."
==== Self-Paced Learning ====
  * [[https://arxiv.org/pdf/1703.09923.pdf|2017 - On Convergence Property of Implicit Self-paced Objective]]
  * [[https://www.sciencedirect.com/science/article/abs/pii/S0020025517307521|2017 - A theoretical understanding of self-paced learning]]
===== Papers using Curriculum Learning =====
  * [[https://arxiv.org/pdf/1410.4615.pdf|2014 - Learning to Execute]]

===== Related Pages =====
  * [[Active Learning]]
  * [[nlp:Compositional Generalization]]
  
ml/curriculum_learning.txt · Last modified: 2023/06/15 07:36 (external edit)
