--- ml:nn_training [2022/07/29 09:48] – [Training Setups in the Literature] jmflanig
+++ ml:nn_training [2024/07/09 22:29] (current) – [Topics] jmflanig
@@ Line 19: / Line 19: @@
   * [[Regularization]]
   * [[Fine-Tuning]] and [[nlp:Pretraining]]
-  * [[NN Tricks|Misc Tricks]]
+  * **[[NN Tricks|Neural Network Tricks]]**
     * Tricks such as [[Curriculum Learning]], etc
     * [[nlp:Transformers#Training|Transformer Training Tricks]]
     * Residual connections, [[https://arxiv.org/pdf/2003.04887.pdf|ReZero]]
+    * [[https://arxiv.org/pdf/1710.03740|Mixed Precision Training]] (also [[https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html|Train With Mixed Precision - NVIDIA Docs]], see other papers as well)
   * [[Large-Scale]] and [[Distributed Training]]
@@ Line 38: / Line 39: @@
   * TODO: BART
   * [[https://aclanthology.org/2021.emnlp-main.831.pdf|Academic Budget BERT]]
+  * [[https://arxiv.org/pdf/2201.11990.pdf|Megatron-Turing NLG]]
   * [[https://arxiv.org/pdf/2204.02311.pdf|PaLM]]