Neural Networks: Alternative Training Methods

Papers

  • Baydin et al 2022 - Gradients without Backpropagation Uses forward-mode automatic differentiation to compute a “forward gradient” (no backward pass, unlike backprop). Essentially it computes the directional derivative of the loss in a random direction. Scaling the direction by this directional derivative gives an unbiased estimate of the true gradient, which they plug into gradient descent. This has a number of important implications:
    • They could have approximated the directional derivative with finite differences by taking a small step in the random direction. Unlike forward-mode AD, this would also allow estimating the change in loss for discontinuous functions.
    • The direction doesn't have to be sampled from a standard normal - the components only need to be independent with zero mean and unit variance. They could have sampled the components from {-1,1} (a Rademacher distribution, two discrete values). This would allow them to optimize binary neural networks with their technique.
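The two variations above can be combined into a minimal sketch of forward-gradient descent: sample a Rademacher direction, estimate the directional derivative by finite differences, and scale the direction by it to get an unbiased gradient estimate. This is an illustrative toy (quadratic loss, hand-rolled finite differences), not the paper's implementation, which uses forward-mode AD with Gaussian directions:

```python
import random

random.seed(0)  # reproducibility of the random directions

def loss(w):
    # Toy quadratic loss with minimum at w = (1.0, -2.0).
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

def forward_gradient(f, w, eps=1e-4):
    # Direction with independent components from {-1, +1}
    # (the Rademacher variant noted above; the paper instead uses
    # forward-mode AD with directions sampled from a standard normal).
    v = [random.choice((-1.0, 1.0)) for _ in w]
    # Finite-difference estimate of the directional derivative
    # d ~ grad(f) . v; this only needs two loss evaluations and
    # also works when f is not differentiable everywhere.
    w_step = [wi + eps * vi for wi, vi in zip(w, v)]
    d = (f(w_step) - f(w)) / eps
    # Scaling the direction by the directional derivative gives an
    # unbiased estimate of the true gradient: E[(grad . v) v] = grad.
    return [d * vi for vi in v]

# Plain gradient descent using the forward-gradient estimate.
w = [0.0, 0.0]
lr = 0.1
for _ in range(200):
    g = forward_gradient(loss, w)
    w = [wi - lr * gi for wi, gi in zip(w, g)]
```

Each step is noisier than a true gradient step, but in expectation it follows the exact gradient, so with a small enough learning rate the iterates still converge toward the minimum.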
ml/alternative_training_methods.1652572324.txt.gz · Last modified: 2023/06/15 07:36 (external edit)