Ensembling
Ensembling combines several models to improve generalization performance. For example, ensembling models trained with different random seeds almost always improves performance. This technique is often used when performance is the main objective, such as in competitions like WMT. However, because ensembling often gives a large improvement, papers usually compare non-ensembled methods to other non-ensembled methods, and ensembled methods to ensembled methods. See Gehring et al. 2017 for an example of this.
Introduction: Method
For models trained separately with cross-entropy, the standard method of ensembling in NLP is to simply average the probabilities of the models at test time and predict using the averaged distribution (see the sketch after the list below). There are two standard ways to create the different models for ensembling:
- Multirun ensembling: models come from training runs with different random seeds
- Checkpoint ensembling: models come from different checkpoints of a single training run
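The following is a minimal sketch of test-time probability averaging, assuming PyTorch models that map an input batch to unnormalized logits; the function and variable names are illustrative, not from any particular library.

<code python>
import torch
import torch.nn.functional as F

def ensemble_predict(models, x):
    """Average the predictive distributions of several models and predict.

    Assumes each model returns logits of shape (batch, num_classes).
    """
    probs = [F.softmax(m(x), dim=-1) for m in models]   # per-model probability distributions
    avg = torch.stack(probs).mean(dim=0)                # average probabilities (not logits)
    return avg.argmax(dim=-1)                           # predict the highest-probability class

# Toy usage: three "models" standing in for runs with different seeds or checkpoints.
models = [torch.nn.Linear(10, 5) for _ in range(3)]
x = torch.randn(4, 10)
print(ensemble_predict(models, x))
</code>

Note that the averaging is done in probability space rather than logit space, which matches the standard practice described above for models trained with cross-entropy.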