--- ml:nn_training [2022/08/01 07:32] – [Topics] jmflanig
+++ ml:nn_training [2024/07/09 22:29] (current) – [Topics] jmflanig
@@ Line 23: / Line 23: @@
     * [[nlp:Transformers#Training|Transformer Training Tricks]]
     * Residual connections, [[https://arxiv.org/pdf/2003.04887.pdf|ReZero]]
+    * [[https://arxiv.org/pdf/1710.03740|Mixed Precision Training]] (also [[https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html|Train With Mixed Precision - NVIDIA Docs]], see other papers as well)
   * [[Large-Scale]] and [[Distributed Training]]
@@ Line 38: / Line 39: @@
   * TODO: BART
   * [[https://aclanthology.org/2021.emnlp-main.831.pdf|Academic Budget BERT]]
+  * [[https://arxiv.org/pdf/2201.11990.pdf|Megatron-Turing NLG]]
   * [[https://arxiv.org/pdf/2204.02311.pdf|PaLM]]