====== NN Initialization ======

  * Section 8.4 in [[https://www.deeplearningbook.org/contents/optimization.html|Deep Learning Book Ch 8]]
  * Initialization section in [[https://ucsc.primo.exlibrisgroup.com/permalink/01CDL_SCR_INST/1kt68tt/alma991025070453104876|Chapter 11 of Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow (UCSC login required)]]
  * [[https://classes.soe.ucsc.edu/nlp202/Winter22/slides/nn-training.pdf#page=9|NLP 202 Winter 2022 slides]]
  
===== Papers =====
  * ADMIN: [[https://arxiv.org/pdf/2004.08249.pdf|Liu et al 2020 - Understanding the Difficulty of Training Transformers]]
    * [[https://arxiv.org/pdf/2008.07772.pdf|Liu et al 2020 - Very Deep Transformers for Neural Machine Translation]] uses ADMIN to train very deep Transformers
  * SkipInit: [[https://arxiv.org/pdf/2002.10444.pdf|De & Smith 2020 - Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks]] Very cool paper. An initialization strategy for deep residual networks that is billed as an alternative to batch normalization. Related to [[https://arxiv.org/pdf/2003.04887.pdf|ReZero]]. From the ReZero paper: "The authors find that in deep ResNets without BatchNorm, a scalar multiplier is needed to ensure convergence." (See the sketch after this list.)
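
A minimal sketch of the shared idea, assuming PyTorch: gate the residual branch with a learnable scalar initialized to zero, so every block starts out as the identity function. The ''ReZeroBlock'' name and the single-linear residual branch are illustrative, not the exact architectures from either paper.

<code python>
import torch
import torch.nn as nn

class ReZeroBlock(nn.Module):
    """Residual block gated by a scalar that starts at zero (illustrative)."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        # alpha = 0 at init: the block is exactly the identity function,
        # the property SkipInit/ReZero use to stabilize very deep networks
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return x + self.alpha * torch.relu(self.fc(x))
</code>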
===== Software Defaults =====
  * PyTorch 1.0 uses He initialization for most layers such as Linear, RNN, Conv2d, etc. (see [[https://discuss.pytorch.org/t/whats-the-default-initialization-methods-for-layers/3157/20|this post]]); a sketch of applying it explicitly is below
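
To set He initialization explicitly rather than rely on the defaults, something like the following should work (''he_init'' is a hypothetical helper name; ''nn.init.kaiming_uniform_'' is PyTorch's built-in He initializer):

<code python>
import torch.nn as nn

def he_init(module):
    # Apply He (Kaiming) initialization to weight matrices; zero the biases
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.kaiming_uniform_(module.weight, nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(he_init)  # recursively applies he_init to every submodule
</code>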
===== Tutorials =====
  * [[https://cs230.stanford.edu/section/4/|Stanford Xavier Initialization]]
  * Blog post about Glorot and He: [[https://pouannes.github.io/blog/initialization/|How to initialize deep neural networks? Xavier and Kaiming initialization]] It has good math derivations for both methods; the resulting formulas are sketched in code below
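
For reference, the variance formulas those derivations arrive at, written out as a small code sketch (the fan sizes are arbitrary example values):

<code python>
import math
import torch

fan_in, fan_out = 512, 256  # example layer sizes

# Xavier/Glorot: Var(W) = 2 / (fan_in + fan_out)
# keeps activation and gradient variance roughly constant across layers
xavier_std = math.sqrt(2.0 / (fan_in + fan_out))
W_xavier = torch.randn(fan_out, fan_in) * xavier_std

# He/Kaiming: Var(W) = 2 / fan_in
# the extra factor of 2 compensates for ReLU zeroing half its inputs
he_std = math.sqrt(2.0 / fan_in)
W_he = torch.randn(fan_out, fan_in) * he_std
</code>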

===== Related Pages =====
  * [[NN Training]]
  