Optimization Topics in Deep Learning
Effects on Optimization
- Batch normalization
- Makes the objective function and its gradients smoother (more Lipschitz) (Santurkar et al. 2018 - How Does Batch Normalization Help Optimization?; see also section 3 of De 2020)
- Weight normalization
- Improves the conditioning of the optimization problem (Salimans & Kingma 2016)
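The batch-norm transform whose smoothing effect Santurkar et al. analyze can be sketched in a few lines; this is a minimal forward-pass illustration (training-mode statistics only, no running averages), not a full layer implementation:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization over the batch axis (axis 0).

    Normalizes each feature to zero mean / unit variance, then applies a
    learnable scale (gamma) and shift (beta). Santurkar et al. argue this
    reparameterization makes the loss landscape smoother (more Lipschitz).
    """
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))  # badly scaled activations
y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
print(y.mean(axis=0).round(6))  # ~0 per feature
print(y.std(axis=0).round(2))   # ~1 per feature
```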
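Weight normalization itself is a one-line reparameterization; a minimal sketch of the forward computation from Salimans & Kingma:

```python
import numpy as np

def weight_norm(v, g):
    """Weight normalization (Salimans & Kingma 2016).

    Reparameterizes a weight vector as w = g * v / ||v||, decoupling the
    direction (v / ||v||) from the magnitude (g); gradients are then taken
    with respect to v and g, which improves conditioning.
    """
    return g * v / np.linalg.norm(v)

v = np.array([3.0, 4.0])
w = weight_norm(v, g=2.0)
print(w)                  # points along v
print(np.linalg.norm(w))  # 2.0: the norm is carried entirely by g
```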
On Global Optimization of Neural Networks
- Du et al. 2018 - Gradient Descent Provably Optimizes Over-parameterized Neural Networks “We show that as long as m is large enough and no two inputs are parallel, randomly initialized gradient descent converges to a globally optimal solution at a linear convergence rate for the quadratic loss function, for an m hidden node shallow neural network with ReLU activation and n training data.”
- Goldblum et al. 2020 - Truth or Backpropaganda? An Empirical Investigation of Deep Learning Theory Examines optimization issues in deep learning and their relation to ML theory.
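The Du et al. setting can be illustrated numerically (this is only a toy demonstration under assumed hyperparameters, not the theorem): a wide one-hidden-layer ReLU network with a fixed random output layer, trained by full-batch gradient descent on the quadratic loss over unit-norm inputs. The loss should shrink steadily toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000  # samples, input dim, hidden width (m >> n)
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit inputs; no two parallel (a.s.)
y = rng.normal(size=n)

W = rng.normal(size=(m, d))                        # trained first-layer weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed random output layer

lr = 0.1
losses = []
for _ in range(500):
    H = X @ W.T                      # pre-activations, shape (n, m)
    pred = np.maximum(H, 0.0) @ a    # shallow ReLU net output
    r = pred - y
    losses.append(0.5 * float(r @ r))          # quadratic loss
    # dL/dW[k] = sum_i r_i * a_k * 1[H_ik > 0] * x_i
    grad = ((r[:, None] * (H > 0)) * a).T @ X
    W -= lr * grad

print(losses[0], losses[-1])  # loss decreases substantially
```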
Properties of Neural Networks
Lipschitz Constant
If the weights are unbounded, the Lipschitz constant is also unbounded (true even for logistic regression).
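For logistic regression this is easy to see directly: the Lipschitz constant of x → sigmoid(w·x) is ||w|| · max|sigmoid'| = ||w||/4, attained near the decision boundary, so it grows without bound with the weights. A small numerical check:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

slopes = []
for scale in (1.0, 10.0, 100.0):
    w = scale * np.array([1.0, 0.0])
    # Two nearby points straddling the decision boundary, where the slope peaks.
    x1, x2 = np.array([1e-3, 0.0]), np.array([-1e-3, 0.0])
    slope = abs(sigmoid(w @ x1) - sigmoid(w @ x2)) / np.linalg.norm(x1 - x2)
    slopes.append(slope)
    print(scale, slope)  # observed slope approaches scale / 4
```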
Backpropagating Through Discontinuities
- Bengio et al. 2013 - Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation Introduces the straight-through estimator
- REINFORCE
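Both estimators can be compared on a toy stochastic binary neuron (this is an illustrative sketch with an assumed loss, not code from either paper): the sampling step b ~ Bernoulli(sigmoid(theta)) has no pathwise gradient, so the straight-through estimator pretends the threshold is the identity on the backward pass, while REINFORCE uses the score function. For this problem the true gradient of the expected loss is -p(1-p) = -0.25 at theta = 0, and both estimators recover it:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(b):
    # Toy loss: pushes the neuron toward b = 1.
    return (b - 1.0) ** 2

theta = 0.0
p = sigmoid(theta)
samples = rng.random(100_000) < p  # b in {0, 1}, b ~ Bernoulli(p)

# Straight-through estimator (Bengio et al. 2013): treat the discontinuous
# sampling step as identity on the backward pass, so
# dL/dtheta ≈ E[dL/db] * dp/dtheta  (pretending db/dp = 1).
ste_grad = np.mean(2.0 * (samples - 1.0)) * p * (1.0 - p)

# REINFORCE / score-function estimator:
# dL/dtheta = E[L(b) * d log P(b|theta)/dtheta], and for a Bernoulli-sigmoid
# parameterization d log P / dtheta = b - p.
reinforce_grad = np.mean(loss(samples) * (samples - p))

print(ste_grad, reinforce_grad)  # both near -0.25: increase theta to reduce loss
```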
ml/optimization_in_deep_learning.1623142275.txt.gz · Last modified: 2023/06/15 07:36 (external edit)