ml:model_compression

Differences

This shows you the differences between two versions of the page.

ml:model_compression [2025/05/07 18:09] – [Parameter Sharing] jmflanig
ml:model_compression [2025/05/12 09:00] (current) – [After Training] jmflanig
Line 15: Line 15:
     * [[https://arxiv.org/pdf/2003.03033.pdf|Blalock et al 2020 - What is the State of Neural Network Pruning?]]
     * [[https://arxiv.org/pdf/2102.00554|Hoefler et al 2021 - Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks]]
 +  * **Distillation**
 +    * [[https://arxiv.org/pdf/2402.13116|Xu et al 2024 - A Survey on Knowledge Distillation of Large Language Models]]
 +
  
 ===== General Papers =====
Line 47: Line 50:
   * [[https://www.aclweb.org/anthology/2020.ngt-1.4.pdf|Aji & Heafield 2020 - Compressing Neural Machine Translation Models with 4-bit Precision]]
   * [[https://arxiv.org/pdf/2101.01321.pdf|Kim et al 2020 - I-BERT: Integer-only BERT Quantization]]
 +  * **Empirical Studies**
 +    * [[https://arxiv.org/pdf/2505.02214|Zheng et al 2025 - An Empirical Study of Qwen3 Quantization]]
 +    * [[https://aclanthology.org/2024.lrec-main.461.pdf|Liu et al 2024 - Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study]]
 +    * [[https://arxiv.org/pdf/2504.04823?|Liu et al 2025 - Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models]]
  
 ==== During Training ====
ml/model_compression.1746641350.txt.gz · Last modified: 2025/05/07 18:09 by jmflanig
