Neural Network Initialization
- Glorot (Xavier) initialization: Glorot & Bengio 2010 - Understanding the Difficulty of Training Deep Feedforward Neural Networks (use with sigmoid activations)
- Intuition: with random initialization, a neuron with many incoming connections receives contributions that tend to cancel, so it will not saturate. But in a deep network, if the variance of the activations grows from layer to layer, neurons in the higher layers will saturate.
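A minimal NumPy sketch of the variance argument above (the layer sizes and tanh network are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(fan_in, fan_out):
    # Glorot & Bengio 2010: W ~ U(-limit, limit) with
    # limit = sqrt(6 / (fan_in + fan_out)), giving Var(W) = 2 / (fan_in + fan_out).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Push a unit-variance batch through several tanh layers; with this scaling
# the activation variance stays in a sane range instead of exploding or
# collapsing as depth grows.
x = rng.standard_normal((1000, 256))
for _ in range(5):
    x = np.tanh(x @ glorot_uniform(256, 256))
```

Repeating the loop with a much larger or smaller weight scale makes the activations saturate or vanish, which is exactly the failure mode described above.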
- He initialization: He et al 2015 - Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (use with ReLU activations)
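A NumPy sketch of the He scheme, assuming an illustrative stack of ReLU layers (not a setup from the paper). Because ReLU zeroes half the pre-activations, He et al scale the weight variance to 2/fan_in so the mean squared activation stays roughly constant with depth:

```python
import numpy as np

rng = np.random.default_rng(0)

def he_normal(fan_in, fan_out):
    # He et al 2015: W ~ N(0, 2 / fan_in), compensating for ReLU
    # discarding the negative half of each pre-activation.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# Ten ReLU layers: the mean squared activation stays near its initial value
# rather than halving at every layer (as it would with Var(W) = 1 / fan_in).
x = rng.standard_normal((1000, 512))
for _ in range(10):
    x = np.maximum(0.0, x @ he_normal(512, 512))
```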
- ADMIN initialization: Liu et al 2020 - Very Deep Transformers for Neural Machine Translation (ADMIN is used to train very deep Transformers)
Software Defaults
- PyTorch 1.0 uses He initialization by default for most layers, such as Linear, RNN, and Conv2d (see this post)
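If you want He initialization applied explicitly rather than relying on the layer defaults, a minimal sketch in PyTorch (the model architecture here is an arbitrary example):

```python
import torch.nn as nn

def init_he(module):
    # Explicit He (Kaiming) initialization via torch.nn.init; note that the
    # Linear/Conv2d default is kaiming_uniform_ with a=sqrt(5), which is a
    # related but not identical scheme.
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.kaiming_normal_(module.weight, nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 10))
model.apply(init_he)  # Module.apply recurses over all submodules
```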
Resources
- Blog post about Glorot and He initialization: How to initialize deep neural networks? Xavier and Kaiming initialization. It has good mathematical derivations for both methods.