Optimization Topics in Deep Learning
Effects on Optimization
- Batch normalization
- Makes the optimization landscape smoother: the loss is more Lipschitz and its gradients are more predictive (better β-smoothness) (Santurkar et al. 2018 - How Does Batch Normalization Help Optimization?)
- Weight normalization
- Improves the conditioning of the optimization problem (Salimans & Kingma 2016)
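A minimal NumPy sketch of the batch-normalization forward pass referenced above (training-mode batch statistics only; running statistics and the backward pass of a full implementation are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature to zero mean / unit variance across the
    mini-batch (axis 0), then apply the learned affine transform."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))
y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
# Each column of y now has (approximately) zero mean and unit variance.
```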
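The weight-normalization reparameterization itself is one line; a sketch in NumPy, with the names `v` (direction) and `g` (magnitude) following Salimans & Kingma's notation:

```python
import numpy as np

def weight_norm(v, g):
    """Weight normalization: reparameterize w = g * v / ||v||,
    decoupling the direction of the weight vector from its length."""
    return g * v / np.linalg.norm(v)

v = np.array([3.0, 4.0])      # ||v|| = 5
w = weight_norm(v, g=2.0)     # direction v/||v||, scaled to length g
# w == [1.2, 1.6], so ||w|| == g == 2
```

Gradients with respect to `g` and `v` separate the scale and direction of the weight update, which is the source of the improved conditioning.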
On Global Optimization of Neural Networks
- Du et al 2018 - Gradient Descent Provably Optimizes Over-parameterized Neural Networks “We show that as long as m is large enough and no two inputs are parallel, randomly initialized gradient descent converges to a globally optimal solution at a linear convergence rate for the quadratic loss function, for an m hidden node shallow neural network with ReLU activation and n training data.”
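An illustrative (not rigorous) NumPy experiment in the paper's setting: a width-m one-hidden-layer ReLU net with the output weights frozen at random signs, trained by full-batch gradient descent on the quadratic loss over a tiny dataset. The hyperparameters (`m = 2000`, `lr = 0.2`, 1000 steps) are ad-hoc choices for the demo, not from the paper; the loss should shrink steadily (linearly fast in the idealized analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

# n training points on the unit sphere (no two parallel, almost surely)
n, d, m = 5, 3, 2000
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.normal(size=n)

W = rng.normal(size=(m, d))           # first layer: trained
a = rng.choice([-1.0, 1.0], size=m)   # output signs: frozen, as in the paper

lr = 0.2
losses = []
for _ in range(1000):
    pre = X @ W.T                                  # (n, m) pre-activations
    f = (np.maximum(pre, 0.0) @ a) / np.sqrt(m)    # network outputs
    r = f - y                                      # residuals
    losses.append(0.5 * np.sum(r ** 2))
    # dL/dW_r = (a_r / sqrt(m)) * sum_i r_i * 1[pre_ir > 0] * x_i
    grad = ((r[:, None] * (pre > 0.0)) * a).T @ X / np.sqrt(m)
    W -= lr * grad
```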
Properties of Neural Networks
Lipschitz Constant
If the weights are unbounded, the Lipschitz constant of the network is unbounded as well (true even for logistic regression).
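For a feedforward net with 1-Lipschitz activations (e.g. ReLU), a standard upper bound on the Lipschitz constant is the product of the spectral norms of the weight matrices, which makes the dependence on weight magnitude explicit; a minimal sketch:

```python
import numpy as np

def lipschitz_upper_bound(weights):
    """Product of spectral norms: an upper bound on the Lipschitz
    constant of a feedforward net with 1-Lipschitz activations.
    Scaling any weight matrix scales the bound, so unbounded weights
    give an unbounded constant."""
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))

W1 = np.array([[2.0, 0.0], [0.0, 1.0]])   # spectral norm 2
W2 = np.array([[3.0, 0.0]])               # spectral norm 3
bound = lipschitz_upper_bound([W1, W2])   # 2 * 3 = 6
```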
Backpropagating Through Discontinuities
- Bengio et al 2013 - Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation: introduces the straight-through estimator
- REINFORCE (score-function / likelihood-ratio gradient estimator)
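A minimal NumPy sketch of the straight-through estimator for a hard threshold: the forward pass binarizes, while the backward pass pretends the nonlinearity was the identity (here with the common clip to |x| ≤ 1; the helper names are illustrative):

```python
import numpy as np

def binarize_forward(x):
    """Hard sign nonlinearity -- its true gradient is zero almost everywhere."""
    return np.where(x >= 0.0, 1.0, -1.0)

def binarize_backward_ste(x, grad_out):
    """Straight-through estimator: pass the incoming gradient through
    unchanged where |x| <= 1, zero elsewhere (clipped-identity variant)."""
    return grad_out * (np.abs(x) <= 1.0)

x = np.array([-2.0, -0.5, 0.3, 1.5])
y = binarize_forward(x)                    # [-1., -1.,  1.,  1.]
g = binarize_backward_ste(x, np.ones(4))   # [ 0.,  1.,  1.,  0.]
```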
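For comparison, a Monte Carlo sketch of the REINFORCE (score-function) estimator for a Bernoulli stochastic neuron; the toy objective `f` is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reinforce_grad(theta, f, n_samples=100_000):
    """Estimate d E[f(b)] / d theta for b ~ Bernoulli(sigmoid(theta))
    via E[ f(b) * d log p(b; theta) / d theta ], where the Bernoulli
    score is d log p / d theta = b - sigmoid(theta)."""
    p = sigmoid(theta)
    b = (rng.random(n_samples) < p).astype(float)
    return np.mean(f(b) * (b - p))

theta = 0.5
f = lambda b: 3.0 * b                 # toy objective: E[f(b)] = 3 * sigmoid(theta)
est = reinforce_grad(theta, f)
# Exact gradient for this f: 3 * sigmoid'(theta) = 3 * p * (1 - p)
true = 3.0 * sigmoid(theta) * (1.0 - sigmoid(theta))
```

Unlike the straight-through estimator, REINFORCE is unbiased but typically higher variance, which is the usual trade-off between the two.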
ml/optimization_in_deep_learning.1617272199.txt.gz · Last modified: 2023/06/15 07:36 (external edit)