====== Model Compression ======

    * [[https://arxiv.org/pdf/2003.03033.pdf|Blalock et al 2020 - What is the State of Neural Network Pruning?]]
    * [[https://arxiv.org/pdf/2102.00554|Hoefler et al 2021 - Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks]]
  * **Distillation**
    * [[https://arxiv.org/pdf/2402.13116|Xu et al 2024 - A Survey on Knowledge Distillation of Large Language Models]]
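
The survey above covers many KD variants; as a point of reference, classic soft-label distillation (Hinton et al 2015) trains the student to match the teacher's temperature-softened output distribution alongside the usual hard-label loss. A minimal PyTorch sketch, with illustrative (not paper-specific) temperature and mixing weight:

<code python>
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Soft-label distillation: KL between temperature-softened teacher
    and student distributions, mixed with hard-label cross-entropy."""
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # T^2 rescaling keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy usage: batch of 4 examples, 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
distillation_loss(student_logits, teacher_logits, labels).backward()
</code>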
  
===== General Papers =====
  * [[https://arxiv.org/pdf/2002.11794.pdf|Li et al 2020 - Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers]] Related: [[Scaling laws]]
  * [[https://arxiv.org/pdf/2002.11985|Ganesh et al 2020 - Compressing Large-Scale Transformer-Based Models: A Case Study on BERT]]
  
    * [[https://arxiv.org/pdf/2310.06694|Xia et al 2023 - Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning]]
    * [[https://arxiv.org/pdf/2402.02834|Kim et al 2024 - Shortened LLaMA: Depth Pruning for Large Language Models with Comparison of Retraining Methods]]
    * [[https://arxiv.org/pdf/2403.03853|Men et al 2024 - ShortGPT: Layers in Large Language Models are More Redundant Than You Expect]]
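The depth-pruning papers above score whole transformer blocks and delete the most redundant ones; ShortGPT, for instance, ranks layers by how little they change their input. A minimal sketch in that spirit (the cosine-based score and the keep-count below are illustrative, not the papers' exact procedure):

<code python>
import torch
import torch.nn.functional as F

def block_influence(hidden_in, hidden_out):
    """Redundancy score in the spirit of ShortGPT's Block Influence:
    1 - mean cosine similarity between a block's input and output hidden
    states. A score near 0 means the block barely transforms its input."""
    cos = F.cosine_similarity(hidden_in, hidden_out, dim=-1)
    return 1.0 - cos.mean().item()

# Toy usage: score 12 "blocks" on random hidden states (batch, seq, dim),
# then keep the 9 most influential ones (i.e. prune the 3 most redundant).
scores = []
for layer_idx in range(12):
    h_in = torch.randn(2, 16, 64)
    h_out = h_in + 0.1 * torch.randn(2, 16, 64)  # stand-in for a block's output
    scores.append((block_influence(h_in, h_out), layer_idx))
keep = sorted(idx for _, idx in sorted(scores, reverse=True)[:9])
print("layers to keep:", keep)
</code>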
  
===== Quantization =====
  * [[https://www.aclweb.org/anthology/2020.ngt-1.4.pdf|Aji & Heafield 2020 - Compressing Neural Machine Translation Models with 4-bit Precision]]
  * [[https://arxiv.org/pdf/2101.01321.pdf|Kim et al 2020 - I-BERT: Integer-only BERT Quantization]]
  * **Empirical Studies**
    * [[https://arxiv.org/pdf/2505.02214|Zheng et al 2025 - An Empirical Study of Qwen3 Quantization]]
    * [[https://aclanthology.org/2024.lrec-main.461.pdf|Liu et al 2024 - Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study]]
    * [[https://arxiv.org/pdf/2504.04823|Liu et al 2025 - Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models]]
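Most post-training schemes build on the same primitive: map float weights to low-bit integers plus a scale. A minimal sketch of per-tensor symmetric round-to-nearest quantization (the bit-width and per-tensor granularity are illustrative choices; the papers' actual schemes, e.g. I-BERT's integer-only pipeline, are considerably more involved):

<code python>
import torch

def quantize_symmetric(w, bits=4):
    """Per-tensor symmetric round-to-nearest quantization.

    Maps float weights onto integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1]
    with one shared scale, so that w ~ q * scale after dequantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax).to(torch.int8)
    return q, scale

# Toy usage: quantize one weight matrix and check the reconstruction error.
w = torch.randn(256, 256)
q, scale = quantize_symmetric(w, bits=4)
w_hat = q.float() * scale
print("mean abs error:", (w - w_hat).abs().mean().item())
</code>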
  
==== During Training ====
  
===== Parameter Sharing =====
  * HashedNets: [[https://arxiv.org/pdf/1504.04788|Chen et al 2015 - Compressing Neural Networks with the Hashing Trick]] Shares weights randomly across the network by hashing each weight position into a small pool of shared parameters; see the sketch below.
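
A minimal sketch of the idea (single hash function, no sign factor; the paper's construction is more complete): every position in the virtual weight matrix is hashed to one of K shared parameters, so memory scales with K rather than with the matrix size.

<code python>
import torch
import torch.nn.functional as F

class HashedLinear(torch.nn.Module):
    """Linear layer whose virtual (out x in) weight matrix is backed by a
    small vector of n_buckets shared parameters, HashedNets-style."""

    def __init__(self, in_features, out_features, n_buckets):
        super().__init__()
        self.shared = torch.nn.Parameter(0.01 * torch.randn(n_buckets))
        # Fixed, non-learned mapping from each virtual weight position to a
        # bucket; a seeded RNG stands in for the paper's hash function.
        gen = torch.Generator().manual_seed(0)
        idx = torch.randint(n_buckets, (out_features, in_features), generator=gen)
        self.register_buffer("idx", idx)
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        w = self.shared[self.idx]  # materialize the virtual weight matrix
        return F.linear(x, w, self.bias)

# Toy usage: 64 x 64 = 4096 virtual weights backed by 512 real parameters.
layer = HashedLinear(64, 64, n_buckets=512)
y = layer(torch.randn(8, 64))
</code>

Gradients flow through the indexing, so weight positions that collide in a bucket train one shared parameter jointly.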
  
===== Conferences and Workshops =====