====== Knowledge Distillation ======
Various papers related to distillation. From [[https://arxiv.org/pdf/2006.11316.pdf|Iandola 2020]]: "While the term 'knowledge distillation' was coined by Hinton et al. 2015 to describe a specific method and equation, the term 'distillation' is now used in reference to a diverse range of approaches where a 'student' network is trained to replicate a 'teacher' network."
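
As a concrete illustration of the "specific method and equation" mentioned above: Hinton et al. 2015 train the student on a weighted sum of the usual hard-label cross-entropy and a KL term between the teacher's and student's softmax outputs, both softened by a temperature T. A minimal sketch in PyTorch follows; the function name and the default values for ''temperature'' and ''alpha'' are illustrative assumptions, not taken from the paper.

<code python>
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    # Soften both output distributions with the same temperature T.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 so its gradient magnitude
    # stays comparable to the hard-label term (per the 2015 paper).
    soft_loss = F.kl_div(soft_student, soft_teacher,
                         reduction="batchmean") * temperature ** 2
    # Ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    # alpha and temperature are illustrative; tune per task.
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Toy check: random "teacher" and "student" logits for a 10-class task.
student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student, teacher, labels))
</code>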
  
===== Overviews =====
  * Section 4.2.2 of [[https://arxiv.org/pdf/2006.11316.pdf|Iandola 2020]]
  * [[https://arxiv.org/pdf/2402.13116|Xu et al. 2024 - A Survey on Knowledge Distillation of Large Language Models]]
  
===== Papers =====
  
===== Related Pages =====
  * [[Ensembling]]
  * [[Model Compression]]
  