====== ml:alternative_training_methods ======

    * The direction doesn't have to be sampled from a random normal - the components only need to be independent. They could have sampled the components from {-1,1} (two discrete values). This would allow them to optimize [[model_compression#binarized_neural_networks|binary neural networks]] with their technique.
    * Follow-up work: [[https://arxiv.org/pdf/2209.06302.pdf|Belouze 2022 - Optimization without Backpropagation]]
  * [[https://arxiv.org/pdf/2212.13345.pdf|Hinton 2022 - The Forward-Forward Algorithm: Some Preliminary Investigations]]
  * [[https://arxiv.org/pdf/2305.17333.pdf|Malladi et al. 2023 - Fine-Tuning Language Models with Just Forward Passes]]
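The note above about sampling directions from {-1,1} can be sketched as an SPSA-style, forward-only gradient estimate. This is a minimal illustration, not code from any of the listed papers; ''spsa_grad'' and the quadratic toy loss are hypothetical names chosen for the example, and it assumes a scalar-valued loss function.

```python
import numpy as np

def spsa_grad(loss, w, eps=1e-3, rng=np.random.default_rng(0)):
    """SPSA-style gradient estimate from two forward passes, no backprop.

    The perturbation direction has independent {-1, +1} (Rademacher)
    components - independence is all the estimator needs, as noted above.
    """
    delta = rng.choice([-1.0, 1.0], size=w.shape)
    # Central difference of the loss along the random direction:
    g = (loss(w + eps * delta) - loss(w - eps * delta)) / (2 * eps)
    # For +/-1 components, 1/delta_i == delta_i, so scaling by delta
    # gives the usual SPSA per-coordinate estimate.
    return g * delta

# Toy usage: minimize a simple quadratic with forward passes only.
loss = lambda w: float(np.sum(w ** 2))
w = np.array([3.0, -2.0])
for _ in range(200):
    w -= 0.1 * spsa_grad(loss, w)
# w ends up close to the minimizer at the origin
```

Each update costs two loss evaluations regardless of the parameter count, which is why this style of estimator underlies forward-pass-only fine-tuning approaches like the Malladi et al. paper above.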
  
===== Related Pages =====
ml/alternative_training_methods · Last modified: 2023/06/15 07:36 (external edit)
