Neural Network Tricks
- Training Tricks (see NN Training)
- Gradient clipping (Pascanu et al., 2012)
- Regularization Tricks (see Regularization)
- Knowledge Distillation (can improve performance by acting as a form of regularization)
- Data Processing Tricks (see Data Preparation)
- Subword Units (BPE, WordPiece, subword regularization, BPE dropout; a shared source and target vocabulary for subword units)
- Shared source and target embeddings
- Architecture Tricks (see NN Architectures)
- Residual connections
- Weight sharing
- Efficiency Tricks
- Tricks for Edge Computing
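Gradient clipping, listed above under Training Tricks, rescales the whole gradient when its norm exceeds a threshold, which stabilizes training when gradients explode. A minimal sketch of clipping by global L2 norm; the function name and the flat list of gradient components are illustrative, not from the linked page:

```python
import math

def clip_by_global_norm(grads, max_norm):
    # Rescale all gradient components together if their global L2 norm
    # exceeds max_norm (gradient clipping as in Pascanu et al., 2012).
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```

Clipping the whole gradient by one scale factor preserves its direction; clipping each component independently would not.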
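The knowledge-distillation entry above trains a student on the teacher's softened output distribution. A minimal sketch of the distillation loss under the usual temperature convention; the function names and the T**2 scaling convention follow Hinton et al. (2015), not the linked wiki page:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; T > 1 softens the distribution.
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy between the softened teacher and student distributions;
    # the T*T factor keeps gradient magnitudes comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return -T * T * sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

In practice this term is mixed with the ordinary cross-entropy on the hard labels.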
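Shared source and target embeddings (and weight sharing more generally) reuse one parameter matrix in several places. A toy sketch of weight tying, where the embedding matrix doubles as the output projection; the class name and the fixed toy matrix are hypothetical:

```python
class TiedEmbeddings:
    # Toy sketch: one matrix E serves both as the input embedding lookup
    # and as the output (softmax) projection, halving those parameters.
    def __init__(self, vocab_size, dim):
        # Deterministic toy weights; real models would initialize randomly.
        self.E = [[0.01 * (i + j) for j in range(dim)] for i in range(vocab_size)]

    def embed(self, token_id):
        return self.E[token_id]

    def logits(self, hidden):
        # Output projection reuses E transposed: logit_i = <hidden, E[i]>.
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.E]
```

With a shared subword vocabulary, the same trick can also tie the source and target embedding tables of a translation model.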
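Residual connections, listed above under Architecture Tricks, add a layer's input back onto its output so the layer only has to learn a residual. A minimal sketch with vectors as plain lists; the function name is illustrative:

```python
def residual_block(x, f):
    # y = x + f(x): the identity path lets gradients flow unchanged,
    # so f only needs to model the correction to x.
    return [xi + fi for xi, fi in zip(x, f(x))]
```

If f outputs all zeros, the block reduces to the identity, which is what makes very deep stacks trainable.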
Older NN Tricks
Related Pages