Neural Network Tricks
Overviews
- NLP 202 lecture: Training Deep Neural Networks (Winter 2022)
- Training Tricks (see NN Training)
  - Gradient clipping (Pascanu et al., 2012); sketch below
  - Overcoming Catastrophic Forgetting
  - Adjust the batch size, or use gradient accumulation (see this blog, for example) to simulate larger batch sizes; sketch below
  - Adjust epsilon in Adam; sketch below
- Fine-tuning Specific Tricks
  - NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better: before fine-tuning, adding a small amount of uniform noise to each weight matrix (scaled by the standard deviation of that matrix) can improve performance; sketch below
- Regularization Tricks (see Regularization)
  - Knowledge Distillation (can improve performance by acting as a form of regularization); sketch below
- Data Processing Tricks (see Data Preparation)
  - Subword units (BPE, WordPiece, subword regularization, BPE dropout); sharing the source and target vocabulary for subword units; sketch below
- Architecture Tricks (see NN Architectures)
  - Residual connections; sketch below
  - Weight sharing (e.g., tying input embeddings and the output projection); sketch below
  - Copy mechanism
- Seq2Seq and Generation Tricks
  - Try a different decoding method (e.g., greedy vs. beam search vs. sampling); sketch below
- Reinforcement Learning Tricks
- Efficiency Tricks
- Tricks for Edge Computing
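
The sketches below illustrate several of the tricks listed above. They are minimal, hedged examples; models, data, and hyperparameters are toy placeholders, not recommended settings.

Gradient clipping (Pascanu et al., 2012): rescale the gradients whenever their global norm exceeds a threshold, before the optimizer step. A minimal sketch, assuming PyTorch; the model and batch are placeholders.

import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

x = torch.randn(32, 10)           # toy batch
y = torch.randint(0, 2, (32,))    # toy labels

optimizer.zero_grad()
loss_fn(model(x), y).backward()
# Rescale all gradients so their global L2 norm is at most max_norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()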
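
Gradient accumulation: run several micro-batches per optimizer step so the summed gradients approximate those of a larger batch. A minimal sketch, assuming PyTorch; accum_steps and the toy data are placeholders.

import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

accum_steps = 4   # effective batch size = micro-batch size * accum_steps

optimizer.zero_grad()
for step in range(8):
    x = torch.randn(8, 10)            # micro-batch
    y = torch.randint(0, 2, (8,))
    loss = loss_fn(model(x), y) / accum_steps   # average over accumulated micro-batches
    loss.backward()                   # gradients add up in .grad across calls
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()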
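
Adjusting epsilon in Adam: epsilon is the small constant added to the second-moment denominator; raising it damps updates where the second-moment estimate is tiny, which can stabilize training. A minimal sketch, assuming PyTorch; 1e-6 is an illustrative value, not a recommendation.

import torch

model = torch.nn.Linear(10, 2)
# PyTorch's default eps is 1e-8; try a larger value such as 1e-6 if training is unstable.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, eps=1e-6)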
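
NoisyTune: perturb each pretrained parameter matrix with a small amount of uniform noise, scaled by that matrix's standard deviation, before fine-tuning. A minimal sketch of the idea, assuming PyTorch; the noisytune_ helper, the lam value, and the toy model are placeholders, not the paper's code.

import torch

def noisytune_(model, lam=0.15):
    """Add uniform noise in (-lam/2, lam/2), scaled by each parameter's std, in place."""
    with torch.no_grad():
        for p in model.parameters():
            p.add_((torch.rand_like(p) - 0.5) * lam * p.std())

model = torch.nn.Linear(10, 2)   # stand-in for a pretrained language model
noisytune_(model)                # apply once, then fine-tune as usual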
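
Knowledge distillation: train a student on a mix of the usual hard-label loss and a soft-label loss against temperature-scaled teacher outputs. A minimal sketch, assuming PyTorch; the teacher/student models, temperature T, and mixing weight alpha are placeholders.

import torch
import torch.nn.functional as F

teacher = torch.nn.Linear(10, 5)   # stand-in for a trained teacher
student = torch.nn.Linear(10, 5)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

T, alpha = 2.0, 0.5
x = torch.randn(32, 10)
y = torch.randint(0, 5, (32,))

with torch.no_grad():
    teacher_logits = teacher(x)
student_logits = student(x)

# KL divergence between temperature-softened teacher and student distributions.
kd = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * (T * T)
ce = F.cross_entropy(student_logits, y)   # usual hard-label loss
loss = alpha * kd + (1 - alpha) * ce

optimizer.zero_grad()
loss.backward()
optimizer.step()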
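
BPE dropout / subword regularization: randomly drop BPE merges at encoding time so the model sees multiple segmentations of the same text. A minimal sketch, assuming the Hugging Face tokenizers library; the toy corpus, vocab size, and dropout rate are placeholders.

from tokenizers import Tokenizer, models, pre_tokenizers, trainers

corpus = ["neural network training tricks", "gradient clipping helps training"] * 50

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]", dropout=0.1))   # 10% merge dropout
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=60, special_tokens=["[UNK]"])
tokenizer.train_from_iterator(corpus, trainer)

# With dropout > 0 the same string can segment differently on repeated calls.
for _ in range(3):
    print(tokenizer.encode("network training").tokens)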
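
Residual connections: each block adds its input back to its output (output = x + F(x)), which eases optimization of deep stacks. A minimal sketch, assuming PyTorch; the layer sizes are placeholders.

import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Feed-forward block with a skip connection followed by a LayerNorm."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        return self.norm(x + self.net(x))   # residual (skip) connection

x = torch.randn(4, 16)
print(ResidualBlock(16, 64)(x).shape)   # torch.Size([4, 16])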
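
Weight sharing: one common instance is tying the input embedding matrix to the output projection of a language model, so both layers use the same parameters. A minimal sketch, assuming PyTorch; TiedLM and its sizes are placeholders.

import torch
from torch import nn

class TiedLM(nn.Module):
    """Toy LM head whose output projection reuses the input embedding weights."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, vocab_size, bias=False)
        self.proj.weight = self.embed.weight   # both layers now share one tensor

    def forward(self, tokens):
        return self.proj(self.embed(tokens))

lm = TiedLM(vocab_size=100, dim=32)
logits = lm(torch.randint(0, 100, (4, 7)))
print(logits.shape)   # torch.Size([4, 7, 100])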
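
Trying a different decoding method: the same model can decode with greedy search, beam search, or sampling, and the choice often changes output quality noticeably. A minimal sketch, assuming Hugging Face transformers and the public "gpt2" checkpoint; the generation parameters are illustrative only.

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("Neural network training tricks include", return_tensors="pt")

greedy = model.generate(**inputs, max_new_tokens=20)                 # greedy decoding
beam = model.generate(**inputs, max_new_tokens=20, num_beams=4)      # beam search
sampled = model.generate(**inputs, max_new_tokens=20, do_sample=True,
                         top_p=0.9, temperature=0.8)                 # nucleus sampling

for out in (greedy, beam, sampled):
    print(tok.decode(out[0], skip_special_tokens=True))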
Older NN Tricks
Related Pages