====== Ensembling ======
Ensembling combines several models to improve generalization performance. For example, ensembling models trained with different random seeds almost always improves performance. This technique is often used when performance is the main objective, such as in competitions like [[https://machinetranslate.org/wmt|WMT]]. However, because ensembling often gives a large improvement, researchers in papers usually compare non-ensembled methods to other non-ensembled methods, and ensembled methods to ensembled methods; see, for example, [[https://arxiv.org/pdf/1705.03122.pdf|Gehring et al. 2017]].
  
==== Basic Method ====
  
For models trained with cross-entropy, //the standard method of ensembling in NLP is to average the probabilities of the models at test time and predict using this averaged probability//. There are two standard ways to create the different models for ensembling:
  * **Multirun ensembling**: models come from training runs with different random seeds.
  * **Checkpoint ensembling**: models come from different checkpoints of a single training run. This has the advantage that only a single training run is needed.

See [[https://www.amazon.com/Neural-Machine-Translation-Philipp-Koehn/dp/1108497322|Koehn 2020]], p. 148, or the [[http://mt-class.org/jhu/assets/nmt-book.pdf#page=67|pdf here]].
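The probability-averaging step above can be sketched in a few lines of Python. This is a minimal illustration under assumed names (the ''ensemble_predict'' function and the toy model distributions are hypothetical, not from any library):

```python
import numpy as np

def ensemble_predict(prob_fns, x):
    """Average the predictive distributions of several models
    and predict the class with the highest averaged probability."""
    # Each prob_fn maps an input to a probability distribution over classes.
    probs = np.stack([f(x) for f in prob_fns])  # shape: (n_models, n_classes)
    avg = probs.mean(axis=0)                    # uniform average over models
    return int(avg.argmax()), avg

# Toy "models": fixed distributions over 3 classes (hypothetical values).
m1 = lambda x: np.array([0.6, 0.3, 0.1])
m2 = lambda x: np.array([0.2, 0.5, 0.3])
m3 = lambda x: np.array([0.3, 0.4, 0.3])

pred, avg = ensemble_predict([m1, m2, m3], x=None)
```

Note that the ensemble here predicts class 1 even though ''m1'' alone would predict class 0: averaging lets the majority of models outvote a single confident but wrong model. For multirun ensembling the ''prob_fns'' would come from separately trained models; for checkpoint ensembling, from different checkpoints of one run.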
  
===== Overviews =====
ml/ensembling · Last modified: 2023/06/15 07:36
