ml:nn_training

Differences

This shows you the differences between two versions of the page.

ml:nn_training [2022/07/29 09:48] – [Training Setups in the Literature] jmflanig
ml:nn_training [2024/07/09 22:29] (current) – [Topics] jmflanig
Line 19 (previous revision), line 19 (current):
   * [[Regularization]]
   * [[Fine-Tuning]] and [[nlp:Pretraining]]
-  * [[NN Tricks|Misc Tricks]]
+  * **[[NN Tricks|Neural Network Tricks]]**
     * Tricks such as [[Curriculum Learning]], etc
     * [[nlp:Transformers#Training|Transformer Training Tricks]]
     * Residual connections, [[https://arxiv.org/pdf/2003.04887.pdf|ReZero]]
+    * [[https://arxiv.org/pdf/1710.03740|Mixed Precision Training]] (also [[https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html|Train With Mixed Precision - NVIDIA Docs]], see other papers as well)
   * [[Large-Scale]] and [[Distributed Training]]

Line 38 (previous revision), line 39 (current):
   * TODO: BART
   * [[https://aclanthology.org/2021.emnlp-main.831.pdf|Academic Budget BERT]]
+  * [[https://arxiv.org/pdf/2201.11990.pdf|Megatron-Turing NLG]]
   * [[https://arxiv.org/pdf/2204.02311.pdf|PaLM]]

ml/nn_training.1659088090.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
