Neural Network Tricks
- Training Tricks (see NN Training)
  - Gradient clipping (Pascanu et al., 2012)
  - Overcoming Catastrophic Forgetting
  - Adjust the batch size, or use gradient accumulation to simulate larger batch sizes
  - Adjust epsilon in Adam
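The clipping trick above can be sketched without any framework. This NumPy version rescales all gradient arrays so that their global L2 norm stays under a threshold; the function name and the small stabilizing constant are illustrative, not from the original page:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    # Global L2 norm across all gradient arrays
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    # Scale everything down only if the global norm exceeds max_norm
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]
```

Clipping by the global norm (rather than per-array) preserves the direction of the overall update, which is why it is the variant usually recommended for exploding gradients in RNNs.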
- Regularization Tricks (see Regularization)
  - Knowledge distillation (can improve performance, acting as a form of regularization)
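A minimal sketch of the standard distillation objective (KL divergence between temperature-softened teacher and student distributions, in the style of Hinton et al.); the function names and default temperature are assumed for illustration:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) over temperature-softened distributions,
    # scaled by T^2 so gradients stay comparable across temperatures
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()) * T * T
```

In practice this term is mixed with the ordinary cross-entropy on the hard labels; the soft teacher targets carry extra information about class similarity, which is where the regularizing effect comes from.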
- Data Processing Tricks (see Data Preparation)
  - Subword units: BPE, WordPiece, subword regularization, BPE dropout; sharing the subword vocabulary between source and target
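A toy sketch of BPE merge learning in the style of Sennrich et al.: count adjacent symbol pairs over a frequency-weighted vocabulary and repeatedly merge the most frequent pair (all names here are illustrative):

```python
import collections

def get_pair_counts(vocab):
    # vocab maps a word (tuple of symbols) to its corpus frequency
    pairs = collections.Counter()
    for word, freq in vocab.items():
        for i in range(len(word) - 1):
            pairs[(word[i], word[i + 1])] += freq
    return pairs

def merge_pair(pair, vocab):
    # Replace every occurrence of the adjacent pair with one merged symbol
    merged = pair[0] + pair[1]
    new_vocab = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(word[i])
                i += 1
        new_vocab[tuple(out)] = freq
    return new_vocab

def learn_bpe(vocab, num_merges):
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab
```

Sharing the learned merges between source and target simply means running this once over the concatenated corpora, so identical strings segment identically on both sides.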
- Architecture Tricks (see NN Architectures)
  - Residual connections
  - Weight sharing
  - Copy mechanism
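Residual connections can be illustrated in a few lines. This NumPy sketch adds the block input back onto the output of a hypothetical ReLU layer, so an identity path always exists for gradients to flow through:

```python
import numpy as np

def layer(x, W):
    # A stand-in ReLU layer; the real block could be anything
    return np.maximum(0.0, x @ W)

def residual_block(x, W):
    # Residual (skip) connection: output = input + F(input).
    # If the layer contributes nothing, the block reduces to the identity.
    return x + layer(x, W)
```

This identity path is what lets very deep stacks train: each block only has to learn a correction to its input rather than a full transformation.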
- Seq2Seq and Generation Tricks
  - Try a different decoding method
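Two common decoding alternatives, sketched over a single next-token distribution (greedy argmax versus top-k sampling; the function names are illustrative):

```python
import numpy as np

def greedy(probs):
    # Always pick the single most probable token
    return int(np.argmax(probs))

def top_k_sample(probs, k, rng):
    # Keep only the k most probable tokens, renormalize, and sample
    idx = np.argsort(probs)[-k:]
    p = probs[idx] / probs[idx].sum()
    return int(rng.choice(idx, p=p))
```

Greedy (and beam search) favor high-likelihood but repetitive output; truncated sampling such as top-k or nucleus sampling trades a little likelihood for diversity, which often matters for open-ended generation.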
- Reinforcement Learning Tricks
- Efficiency Tricks
- Tricks for Edge Computing
Older NN Tricks
Related Pages
ml/nn_tricks.1652297345.txt.gz · Last modified: 2023/06/15 07:36 (external edit)