Optimization Topics in Deep Learning
Effects on Optimization
- Batch normalization
- Makes the optimization landscape smoother: the loss is more Lipschitz and its gradients are more predictive (better β-smoothness) (Santurkar et al. 2018 - How Does Batch Normalization Help Optimization?)
- Weight normalization
- Improves the conditioning of the optimization problem (Salimans & Kingma 2016)
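A minimal NumPy sketch of the batch-normalization forward pass referenced above (training-mode batch statistics only; running statistics and the backward pass of a full implementation are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature to zero mean / unit variance across the
    mini-batch (axis 0), then apply the learned affine transform."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))
y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
# Each column of y now has (approximately) zero mean and unit variance.
```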
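The weight-normalization reparameterization itself is one line; a sketch in NumPy, with the names `v` (direction) and `g` (magnitude) following Salimans & Kingma's notation:

```python
import numpy as np

def weight_norm(v, g):
    """Weight normalization: reparameterize w = g * v / ||v||,
    decoupling the direction of the weight vector from its length."""
    return g * v / np.linalg.norm(v)

v = np.array([3.0, 4.0])      # ||v|| = 5
w = weight_norm(v, g=2.0)     # direction v/||v||, scaled to length g
# w == [1.2, 1.6], so ||w|| == g == 2
```

Gradients with respect to `g` and `v` separate the scale and direction of the weight update, which is the source of the improved conditioning.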
On Global Optimization of Neural Networks
- Du et al 2018 - Gradient Descent Provably Optimizes Over-parameterized Neural Networks “We show that as long as m is large enough and no two inputs are parallel, randomly initialized gradient descent converges to a globally optimal solution at a linear convergence rate for the quadratic loss function, for an m hidden node shallow neural network with ReLU activation and n training data.”
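An illustrative (not rigorous) NumPy experiment in the paper's setting: a width-m one-hidden-layer ReLU net with the output weights frozen at random signs, trained by full-batch gradient descent on the quadratic loss over a tiny dataset. The hyperparameters (`m = 2000`, `lr = 0.2`, 1000 steps) are ad-hoc choices for the demo, not from the paper; the loss should shrink steadily (linearly fast in the idealized analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

# n training points on the unit sphere (no two parallel, almost surely)
n, d, m = 5, 3, 2000
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.normal(size=n)

W = rng.normal(size=(m, d))           # first layer: trained
a = rng.choice([-1.0, 1.0], size=m)   # output signs: frozen, as in the paper

lr = 0.2
losses = []
for _ in range(1000):
    pre = X @ W.T                                  # (n, m) pre-activations
    f = (np.maximum(pre, 0.0) @ a) / np.sqrt(m)    # network outputs
    r = f - y                                      # residuals
    losses.append(0.5 * np.sum(r ** 2))
    # dL/dW_r = (a_r / sqrt(m)) * sum_i r_i * 1[pre_ir > 0] * x_i
    grad = ((r[:, None] * (pre > 0.0)) * a).T @ X / np.sqrt(m)
    W -= lr * grad
```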
Properties of Neural Networks
Lipschitz Constant
If the weights are unbounded, the Lipschitz constant of the network is unbounded as well (true even for logistic regression).
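For a feedforward net with 1-Lipschitz activations (e.g. ReLU), a standard upper bound on the Lipschitz constant is the product of the spectral norms of the weight matrices, which makes the dependence on weight magnitude explicit; a minimal sketch:

```python
import numpy as np

def lipschitz_upper_bound(weights):
    """Product of spectral norms: an upper bound on the Lipschitz
    constant of a feedforward net with 1-Lipschitz activations.
    Scaling any weight matrix scales the bound, so unbounded weights
    give an unbounded constant."""
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))

W1 = np.array([[2.0, 0.0], [0.0, 1.0]])   # spectral norm 2
W2 = np.array([[3.0, 0.0]])               # spectral norm 3
bound = lipschitz_upper_bound([W1, W2])   # 2 * 3 = 6
```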
Backpropagating Through Discontinuities
- Bengio et al 2013 - Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation: introduces the straight-through estimator
- REINFORCE (score-function / likelihood-ratio gradient estimator)
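A minimal NumPy sketch of the straight-through estimator for a hard threshold: the forward pass binarizes, while the backward pass pretends the nonlinearity was the identity (here with the common clip to |x| ≤ 1; the helper names are illustrative):

```python
import numpy as np

def binarize_forward(x):
    """Hard sign nonlinearity -- its true gradient is zero almost everywhere."""
    return np.where(x >= 0.0, 1.0, -1.0)

def binarize_backward_ste(x, grad_out):
    """Straight-through estimator: pass the incoming gradient through
    unchanged where |x| <= 1, zero elsewhere (clipped-identity variant)."""
    return grad_out * (np.abs(x) <= 1.0)

x = np.array([-2.0, -0.5, 0.3, 1.5])
y = binarize_forward(x)                    # [-1., -1.,  1.,  1.]
g = binarize_backward_ste(x, np.ones(4))   # [ 0.,  1.,  1.,  0.]
```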
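For comparison, a Monte Carlo sketch of the REINFORCE (score-function) estimator for a Bernoulli stochastic neuron; the toy objective `f` is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reinforce_grad(theta, f, n_samples=100_000):
    """Estimate d E[f(b)] / d theta for b ~ Bernoulli(sigmoid(theta))
    via E[ f(b) * d log p(b; theta) / d theta ], where the Bernoulli
    score is d log p / d theta = b - sigmoid(theta)."""
    p = sigmoid(theta)
    b = (rng.random(n_samples) < p).astype(float)
    return np.mean(f(b) * (b - p))

theta = 0.5
f = lambda b: 3.0 * b                 # toy objective: E[f(b)] = 3 * sigmoid(theta)
est = reinforce_grad(theta, f)
# Exact gradient for this f: 3 * sigmoid'(theta) = 3 * p * (1 - p)
true = 3.0 * sigmoid(theta) * (1.0 - sigmoid(theta))
```

Unlike the straight-through estimator, REINFORCE is unbiased but typically higher variance, which is the usual trade-off between the two.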
ml/optimization_in_deep_learning.1617272199.txt.gz · Last modified: 2023/06/15 07:36 (external edit)