Baydin et al. 2022 - Gradients without Backpropagation (GitHub). Uses forward-mode automatic differentiation to compute a "forward gradient" (no backward pass, unlike backprop). Essentially it computes the directional derivative of the loss along a random direction; scaling that direction by the directional derivative gives an unbiased estimate of the true gradient, which they plug into stochastic gradient descent. This has a number of important implications:
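A minimal sketch of the idea using JAX's `jax.jvp` (my own illustration, not the authors' code): a single forward-mode pass gives the directional derivative along a sampled direction `v`, and multiplying `v` by that scalar yields the unbiased gradient estimate.

```python
import jax
import jax.numpy as jnp

def forward_gradient(f, theta, key):
    # Sample a random perturbation direction v ~ N(0, I).
    v = jax.random.normal(key, theta.shape)
    # One forward-mode pass computes f(theta) and the directional
    # derivative (Jacobian-vector product) d = grad f(theta) . v,
    # with no backward pass.
    _, d = jax.jvp(f, (theta,), (v,))
    # d * v is an unbiased gradient estimate:
    # E[(grad f . v) v] = grad f, since E[v v^T] = I.
    return d * v

# Usage: estimate the gradient of a simple quadratic loss
# (hypothetical example; true gradient is 2 * theta).
f = lambda theta: jnp.sum(theta ** 2)
theta = jnp.array([1.0, -2.0, 3.0])
g = forward_gradient(f, theta, jax.random.PRNGKey(0))
```

A single estimate is noisy, but averaged over SGD steps (or over many sampled directions) it tracks the true gradient.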