ml:nn_training

Differences

This shows you the differences between two versions of the page.

ml:nn_training [2022/08/01 07:32] – [Topics] jmflanig
ml:nn_training [2024/07/09 22:29] (current) – [Topics] jmflanig
Line 23: Line 23:
     * [[nlp:Transformers#Training|Transformer Training Tricks]]
     * Residual connections, [[https://arxiv.org/pdf/2003.04887.pdf|ReZero]]
 +    * [[https://arxiv.org/pdf/1710.03740|Mixed Precision Training]] (also [[https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html|Train With Mixed Precision - NVIDIA Docs]], see other papers as well)
   * [[Large-Scale]] and [[Distributed Training]]
  
Line 38: Line 39:
   * TODO: BART
   * [[https://aclanthology.org/2021.emnlp-main.831.pdf|Academic Budget BERT]]
 +  * [[https://arxiv.org/pdf/2201.11990.pdf|Megatron-Turing NLG]]
   * [[https://arxiv.org/pdf/2204.02311.pdf|PaLM]]
  
ml/nn_training.1659339121.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
