====== NN Initialization ======

  * Section 8.4 in [[https://www.deeplearningbook.org/contents/optimization.html|Deep Learning Book Ch 8]]
  * Initialization section in [[https://ucsc.primo.exlibrisgroup.com/permalink/01CDL_SCR_INST/1kt68tt/alma991025070453104876|Chapter 11 of Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow (UCSC login required)]]
  * [[https://classes.soe.ucsc.edu/nlp202/Winter22/slides/nn-training.pdf#page=9|NLP 202 Winter 2022 slides]]
  
===== Papers =====
  * ADMIN: [[https://arxiv.org/pdf/2004.08249.pdf|Liu et al 2020 - Understanding the Difficulty of Training Transformers]]
    * [[https://arxiv.org/pdf/2008.07772.pdf|Liu et al 2020 - Very Deep Transformers for Neural Machine Translation]] uses ADMIN to train very deep Transformers
  * SkipInit: [[https://arxiv.org/pdf/2002.10444.pdf|De & Smith 2020 - Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks]] Very cool paper. An initialization strategy for deep residual networks that is billed as an alternative to batch normalization. Related to [[https://arxiv.org/pdf/2003.04887.pdf|ReZero]]. From the ReZero paper: "The authors find that in deep ResNets without BatchNorm, a scalar multiplier is needed to ensure convergence." (See the sketch after this list.)
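
A minimal sketch of the shared idea, assuming PyTorch: gate the residual branch with a learnable scalar initialized to zero, so every block starts out as the identity function. The ''ReZeroBlock'' name and the single-linear residual branch are illustrative, not the exact architectures from either paper.

<code python>
import torch
import torch.nn as nn

class ReZeroBlock(nn.Module):
    """Residual block gated by a scalar that starts at zero (illustrative)."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        # alpha = 0 at init: the block is exactly the identity function,
        # the property SkipInit/ReZero use to stabilize very deep networks
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return x + self.alpha * torch.relu(self.fc(x))
</code>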
===== Software Defaults =====
  * PyTorch 1.0 uses He initialization for most layers such as Linear, RNN, Conv2d, etc. (see [[https://discuss.pytorch.org/t/whats-the-default-initialization-methods-for-layers/3157/20|this post]]); a sketch of applying it explicitly is below
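
To set He initialization explicitly rather than rely on the defaults, something like the following should work (''he_init'' is a hypothetical helper name; ''nn.init.kaiming_uniform_'' is PyTorch's built-in He initializer):

<code python>
import torch.nn as nn

def he_init(module):
    # Apply He (Kaiming) initialization to weight matrices; zero the biases
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.kaiming_uniform_(module.weight, nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(he_init)  # recursively applies he_init to every submodule
</code>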
===== Tutorials =====
  * [[https://cs230.stanford.edu/section/4/|Stanford Xavier Initialization]]
  * Blog post about Glorot and He: [[https://pouannes.github.io/blog/initialization/|How to initialize deep neural networks? Xavier and Kaiming initialization]] It has good math derivations for both methods; the resulting formulas are sketched in code below
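
For reference, the variance formulas those derivations arrive at, written out as a small code sketch (the fan sizes are arbitrary example values):

<code python>
import math
import torch

fan_in, fan_out = 512, 256  # example layer sizes

# Xavier/Glorot: Var(W) = 2 / (fan_in + fan_out)
# keeps activation and gradient variance roughly constant across layers
xavier_std = math.sqrt(2.0 / (fan_in + fan_out))
W_xavier = torch.randn(fan_out, fan_in) * xavier_std

# He/Kaiming: Var(W) = 2 / fan_in
# the extra factor of 2 compensates for ReLU zeroing half its inputs
he_std = math.sqrt(2.0 / fan_in)
W_he = torch.randn(fan_out, fan_in) * he_std
</code>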

===== Related Pages =====
  * [[NN Training]]
  