Neural Networks: Alternative Training Methods

Papers

  • Baydin et al 2022 - Gradients without Backpropagation Uses forward-mode automatic differentiation to compute a “forward gradient” (no backward pass, unlike backprop). Essentially it computes the directional derivative of the loss in a random direction. Scaling the direction by this directional derivative gives an unbiased estimate of the true gradient, which they plug into gradient descent. This has a number of important implications:
    • They could have approximated the directional derivative with finite differences by taking a small step in the random direction. Unlike forward-mode AD, this would also allow estimating the change in loss for discontinuous functions.
    • The direction doesn't have to be sampled from a standard normal - the components only need to be independent with zero mean and unit variance. They could have sampled the components from {-1,1} (a Rademacher distribution, two discrete values). This would allow them to optimize binary neural networks with their technique.
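The two variations above can be combined into a minimal sketch of forward-gradient descent: sample a Rademacher direction, estimate the directional derivative by finite differences, and scale the direction by it to get an unbiased gradient estimate. This is an illustrative toy (quadratic loss, hand-rolled finite differences), not the paper's implementation, which uses forward-mode AD with Gaussian directions:

```python
import random

random.seed(0)  # reproducibility of the random directions

def loss(w):
    # Toy quadratic loss with minimum at w = (1.0, -2.0).
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

def forward_gradient(f, w, eps=1e-4):
    # Direction with independent components from {-1, +1}
    # (the Rademacher variant noted above; the paper instead uses
    # forward-mode AD with directions sampled from a standard normal).
    v = [random.choice((-1.0, 1.0)) for _ in w]
    # Finite-difference estimate of the directional derivative
    # d ~ grad(f) . v; this only needs two loss evaluations and
    # also works when f is not differentiable everywhere.
    w_step = [wi + eps * vi for wi, vi in zip(w, v)]
    d = (f(w_step) - f(w)) / eps
    # Scaling the direction by the directional derivative gives an
    # unbiased estimate of the true gradient: E[(grad . v) v] = grad.
    return [d * vi for vi in v]

# Plain gradient descent using the forward-gradient estimate.
w = [0.0, 0.0]
lr = 0.1
for _ in range(200):
    g = forward_gradient(loss, w)
    w = [wi - lr * gi for wi, gi in zip(w, g)]
```

Each step is noisier than a true gradient step, but in expectation it follows the exact gradient, so with a small enough learning rate the iterates still converge toward the minimum.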
ml/alternative_training_methods.1652572324.txt.gz · Last modified: 2023/06/15 07:36 (external edit)