  * [[https://papers.nips.cc/paper/2013/file/7b5b23f4aadf9513306bcd59afb6e4c9-Paper.pdf|Adaptive Dropout]] Learns a dropout network during training ([[https://paperswithcode.com/method/adaptive-dropout|summary]])
  * [[https://arxiv.org/pdf/1606.01305.pdf|Zoneout (Krueger et al 2016)]] (for regularizing RNNs) "At each timestep, zoneout stochastically forces some hidden units to maintain their previous values"
  * [[https://arxiv.org/pdf/1909.11299.pdf|Lee et al 2019 - Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models]]
  * LayerDrop: [[https://arxiv.org/pdf/1909.11556.pdf|Fan et al 2019 - Reducing Transformer Depth on Demand with Structured Dropout]] Regularizes networks so that smaller networks of any depth can be extracted at test time without finetuning.
  * [[https://arxiv.org/pdf/2101.01761.pdf|Pham & Le 2021 - AutoDropout: Learning Dropout Patterns to Regularize Deep Networks]] Shows an improvement of 1-2 BLEU for machine translation. Downside: computationally expensive.
  * [[https://arxiv.org/pdf/2106.14448.pdf|Liang et al 2021 - R-Drop: Regularized Dropout for Neural Networks]] Reported to work across many tasks, but see the GitHub issues about reproducibility
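The zoneout update quoted above is a one-line change to a recurrent cell: instead of zeroing units the way dropout does, each unit keeps its previous value with some probability. A minimal NumPy sketch (the 50% rate and the 8-unit state are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def zoneout(h_prev, h_new, z=0.5):
    """Zoneout (Krueger et al 2016): at each timestep, each hidden unit
    is stochastically forced to keep its previous value with probability
    z; otherwise it takes the freshly computed value."""
    keep_prev = rng.random(h_new.shape) < z  # True -> reuse previous value
    return np.where(keep_prev, h_prev, h_new)

h_prev = np.zeros(8)  # hidden state from the previous timestep
h_new = np.ones(8)    # candidate state computed at this timestep
h = zoneout(h_prev, h_new, z=0.5)
# each entry of h is either the old value (0.0) or the new value (1.0)
```

At test time (as with dropout) the stochastic mask is replaced by its expectation, i.e. the identity is applied deterministically.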
  * $L_0$ regularization
    * [[https://arxiv.org/pdf/1712.01312.pdf|Louizos et al 2018 - Learning Sparse Neural Networks through L0 Regularization]]

==== Sparsity-Inducing Regularizers ====
  * **Structured Sparsity**
    * [[https://www.di.ens.fr/~fbach/STS394.pdf|Bach et al 2012 - Structured Sparsity through Convex Optimization]]
    * [[https://arxiv.org/pdf/1608.03665.pdf|Wen et al 2016 - Learning Structured Sparsity in Deep Neural Networks]]
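A common structured-sparsity penalty (group lasso, the building block used in work like Wen et al 2016) sums the L2 norm of each group of weights, which pushes entire groups (e.g. filters or neurons) to zero together rather than individual weights. A minimal sketch, treating each row of a weight matrix as one group (the grouping and `lam` value are illustrative):

```python
import numpy as np

def group_lasso_penalty(W, lam=1e-3):
    """Group lasso: lam * sum over groups of ||w_g||_2, where each
    group here is one row of W. Because the L2 norm is not squared,
    the penalty zeroes out whole rows at once."""
    return lam * np.sum(np.linalg.norm(W, axis=1))

W = np.array([[3.0, 4.0],   # group norm 5.0
              [0.0, 0.0]])  # group norm 0.0 -> this row is already pruned
print(group_lasso_penalty(W, lam=1.0))  # 5.0
```

In training this term is simply added to the task loss; rows driven to zero can then be removed to shrink the network.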
  
==== Label Smoothing ====
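Label smoothing replaces the one-hot training target with a mixture of the one-hot vector and the uniform distribution: $(1-\epsilon)\,\mathbf{1}_{y} + \epsilon / K$ for $K$ classes. A minimal sketch (the $\epsilon = 0.1$ value is a common illustrative default):

```python
import numpy as np

def smooth_labels(targets, num_classes, eps=0.1):
    """Mix one-hot targets with the uniform distribution:
    (1 - eps) * one_hot + eps / num_classes."""
    one_hot = np.eye(num_classes)[targets]
    return (1.0 - eps) * one_hot + eps / num_classes

y = smooth_labels(np.array([2]), num_classes=4, eps=0.1)
# y[0] == [0.025, 0.025, 0.925, 0.025]; each row still sums to 1
```

The smoothed targets are then used in the usual cross-entropy loss, which discourages the model from producing overconfident logits.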