Neural Network Initialization
Overviews
- Section 8.4 in Deep Learning Book Ch 8
- Initialization section in Chapter 11 of Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (UCSC login required)
Papers
- Glorot (Xavier) initialization: Glorot & Bengio 2010 - Understanding the Difficulty of Training Deep Feedforward Neural Networks (use with sigmoid activations)
- Intuition: with random initialization, a neuron with many incoming connections receives contributions that tend to cancel, so it is unlikely to saturate. BUT in a deep network, if the variance of the activations grows as you go to higher layers, those higher-layer neurons will saturate. Glorot initialization scales the weight variance so activation variance stays roughly constant across layers.
- He initialization: He et al 2015 - Delving Deep into Rectifiers (use with ReLU activations)
- ADMIN: Liu et al 2020 - Very Deep Transformers for Neural Machine Translation. An initialization scheme (ADMIN) used to train very deep Transformers.
- SkipInit: De & Smith 2020 - Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks. Very cool paper. An initialization strategy for deep residual networks that is billed as an alternative to batch normalization. Related to ReZero. From the ReZero paper: “The authors find that in deep ResNets without BatchNorm, a scalar multiplier is needed to ensure convergence.”
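The Glorot and He variance-scaling rules above can be sketched in NumPy (an illustrative sketch, not the papers' reference code; fan_in/fan_out are the layer's input/output sizes):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Glorot/Xavier: Var(W) = 2 / (fan_in + fan_out), sampled uniformly.
    A uniform on [-limit, limit] has variance limit**2 / 3, hence the sqrt(6)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

def he_normal(fan_in, fan_out, rng=np.random.default_rng(0)):
    """He/Kaiming: Var(W) = 2 / fan_in, compensating for ReLU zeroing
    half the pre-activations on average."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_out, fan_in))

# Empirical check: sample variances land near the target values.
W_g = glorot_uniform(512, 256)
W_h = he_normal(512, 256)
print(W_g.var())  # close to 2 / (512 + 256)
print(W_h.var())  # close to 2 / 512
```

This is why both schemes keep the forward (and backward) signal variance roughly constant layer to layer: the weight variance is tied to the fan of the layer rather than being a fixed constant.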
Software Defaults
- PyTorch 1.0 uses He (Kaiming) initialization by default for most layers, such as Linear, RNN, and Conv2d (see this post)
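As a concrete case, my reading of nn.Linear's reset_parameters is that it calls kaiming_uniform_ with a=sqrt(5), which algebraically works out to a uniform bound of 1/sqrt(fan_in); a NumPy sketch of that computation (an assumption worth double-checking against the PyTorch docs for your version):

```python
import numpy as np

def pytorch_linear_default(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Mimic nn.Linear's assumed default weight init:
    kaiming_uniform_ with a=sqrt(5) (leaky-ReLU negative slope).
    gain = sqrt(2 / (1 + a**2)), bound = gain * sqrt(3 / fan_in),
    which simplifies to bound = 1 / sqrt(fan_in)."""
    a = np.sqrt(5.0)
    gain = np.sqrt(2.0 / (1.0 + a**2))
    bound = gain * np.sqrt(3.0 / fan_in)  # == 1 / sqrt(fan_in)
    return rng.uniform(-bound, bound, size=(fan_out, fan_in))

W = pytorch_linear_default(512, 256)
print(np.abs(W).max())  # stays below 1 / sqrt(512)
```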
Resources
- Blog post about Glorot and He: How to initialize deep neural networks? Xavier and Kaiming initialization It has some good math derivations for the methods
Related Pages
ml/nn_initialization.txt · Last modified: 2023/06/15 07:36 by 127.0.0.1