ml:knowledge_distillation [2025/05/12 08:11] (current) – jmflanig
====== Knowledge Distillation ======
Various papers related to distillation. From [[https://arxiv.org/pdf/2006.11316.pdf|Iandola 2020]]: "While the term 'knowledge distillation' was coined by Hinton et al. 2015 to describe a specific method and equation, the term 'distillation' is now used in reference to a diverse range of approaches where a 'student' network is trained to replicate a 'teacher' network."
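Hinton et al.'s original formulation trains the student against the teacher's temperature-softened output distribution alongside the usual hard-label loss. A minimal pure-Python sketch of that idea (the toy logits, names, and the KL form of the soft-target term are illustrative assumptions, not any paper's reference code):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax: higher T softens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, true_label, T=2.0, alpha=0.5):
    """Hinton-style distillation loss (sketch):
    alpha * T^2 * KL(teacher_soft || student_soft) + (1 - alpha) * CE(student, hard label).
    KL differs from soft-target cross-entropy only by the teacher's entropy,
    which is constant with respect to the student."""
    p_t = softmax(teacher_logits, T)   # teacher's softened targets
    p_s = softmax(student_logits, T)   # student's softened predictions
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    ce = -math.log(softmax(student_logits)[true_label])  # hard-label CE at T = 1
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

When the student's logits match the teacher's, the KL term vanishes and only the hard-label term remains; raising T exposes more of the teacher's "dark knowledge" about relative class similarities.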
  
===== Overviews =====
  * Section 4.2.2 of [[https://arxiv.org/pdf/2006.11316.pdf|Iandola 2020]]
  * [[https://arxiv.org/pdf/2402.13116|Xu et al 2024 - A Survey on Knowledge Distillation of Large Language Models]]
  
===== Papers =====
  * [[https://arxiv.org/pdf/1606.07947.pdf|Kim & Rush 2016 - Sequence-Level Knowledge Distillation]] First paper applying knowledge distillation to seq2seq models.
  * Multi-step KD: [[https://arxiv.org/pdf/1902.03393.pdf|Mirzadeh et al 2019 - Improved Knowledge Distillation via Teacher Assistant]]
  * [[https://arxiv.org/pdf/2104.06457.pdf|Inaguma et al 2021 - Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation]]
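Kim & Rush's sequence-level variant trains the student with ordinary cross-entropy on the teacher's own decoded output, rather than on per-token soft distributions. A toy sketch under stated assumptions (the hand-written ''TEACHER'' table and greedy decoding in place of the paper's beam search are illustrative, not the paper's setup):

```python
import math

# Toy next-token distributions standing in for a trained teacher seq2seq
# model (all tokens and probabilities here are made-up illustrations).
TEACHER = {
    "<s>": {"the": 0.7, "a": 0.3},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def greedy_decode(model, max_len=10):
    """Decode the teacher greedily; Kim & Rush use beam search, of which
    greedy decoding is the beam-size-1 case."""
    seq, tok = [], "<s>"
    while tok != "</s>" and len(seq) < max_len:
        tok = max(model[tok], key=model[tok].get)
        seq.append(tok)
    return seq

def seq_kd_loss(student_probs, teacher_seq):
    """Sequence-level KD: plain cross-entropy of the student on the
    teacher's decoded sequence, treated as a hard target."""
    loss, prev = 0.0, "<s>"
    for tok in teacher_seq:
        loss += -math.log(student_probs[prev][tok])
        prev = tok
    return loss
```

The student thus only ever sees one "best" teacher output per source, which the paper shows is both simpler and often stronger than token-level distillation for seq2seq models.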

===== Related Pages =====
  * [[Ensembling]]
  * [[Model Compression]]
  
ml/knowledge_distillation.1630089639.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
