Knowledge Distillation
Various papers related to distillation. From Iandola 2020: “While the term 'knowledge distillation' was coined by Hinton et al. 2015 to describe a specific method and equation, the term 'distillation' is now used in reference to a diverse range of approaches where a 'student' network is trained to replicate a 'teacher' network.”
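As a concrete illustration of the specific method and equation from Hinton et al. 2015 mentioned in the quote above, here is a minimal sketch of the distillation loss (assuming PyTorch; the temperature and weighting values are illustrative defaults, not taken from the paper):

  import torch
  import torch.nn.functional as F

  def distillation_loss(student_logits, teacher_logits, labels,
                        temperature=2.0, alpha=0.5):
      """Combine a soft-target KL term (teacher -> student) with hard-label cross-entropy."""
      # Soften both distributions with the temperature T.
      soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
      log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
      # KL divergence between the softened teacher and student distributions,
      # scaled by T^2 so gradient magnitudes stay comparable as T changes.
      soft_loss = F.kl_div(log_soft_student, soft_teacher,
                           reduction="batchmean") * temperature ** 2
      # Ordinary cross-entropy against the ground-truth labels.
      hard_loss = F.cross_entropy(student_logits, labels)
      return alpha * soft_loss + (1.0 - alpha) * hard_loss

The student is trained on this combined loss while the teacher's parameters stay frozen; alpha trades off imitating the teacher's soft targets against fitting the hard labels.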
Overviews
- Section 4.2.2 of Iandola 2020
Papers
- Hinton et al. 2015 - Distilling the Knowledge in a Neural Network (The paper that introduced knowledge distillation.)
- Kim & Rush 2016 - Sequence-Level Knowledge Distillation (The first paper to apply knowledge distillation to seq2seq models.)
Related Pages