====== ml:fine-tuning ======

Last modified: 2025/07/14 07:37 by jmflanig
  * [[https://arxiv.org/pdf/2205.05638.pdf|Liu et al 2022 - Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning]]
  * **[[https://arxiv.org/pdf/2306.09782|Lv et al 2023 - Full Parameter Fine-tuning for Large Language Models with Limited Resources]]**
    * [[https://arxiv.org/pdf/2310.10195|Lv et al 2023 - AdaLomo: Low-memory Optimization with Adaptive Learning Rate]]
  * [[https://arxiv.org/pdf/2403.14608|Han et al 2024 - Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey]]
  * [[https://arxiv.org/pdf/2408.13296|2024 - The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities]] Missing lots of stuff; not really the ultimate guide.
  * Gradual Fine-Tuning: [[https://arxiv.org/pdf/2103.02205.pdf|Xu et al 2021 - Gradual Fine-Tuning for Low-Resource Domain Adaptation]]
  * [[https://arxiv.org/pdf/2106.14282.pdf|Zhou & Srikumar 2021 - A Closer Look at How Fine-tuning Changes BERT]]
  * EasyAdapt: [[https://arxiv.org/pdf/2109.04711|Bai et al 2021 - Pre-train or Annotate? Domain Adaptation with a Constrained Budget]] Adapts [[https://arxiv.org/pdf/0907.1815|Daumé III 2009 - Frustratingly Easy Domain Adaptation]] to the Transformer era. Also considers the tradeoff between pre-training on in-domain data vs. annotating in-domain data under budget constraints.
  * [[https://arxiv.org/pdf/2109.05687.pdf|Xu et al 2021 - Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning]] Applies masking to fine-tune only a subset of the weights; shows this outperforms regular fine-tuning.
  * [[https://arxiv.org/pdf/2202.12024.pdf|Wu et al 2022 - NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better]] Shows that adding a small noise perturbation to the parameters before fine-tuning can improve results.
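The NoisyTune idea, as I read the paper, is to perturb each pretrained weight matrix with uniform noise scaled by that matrix's own standard deviation before fine-tuning. A minimal numpy sketch (function name, `lam`, and the toy matrices are illustrative, not the paper's code):

```python
import numpy as np

def noisy_tune(params, lam=0.15, seed=0):
    """Perturb pretrained weights before fine-tuning (NoisyTune-style).

    Each matrix W gets uniform noise in [-lam*std(W), +lam*std(W)],
    so low-variance matrices are perturbed proportionally less.
    """
    rng = np.random.default_rng(seed)
    noisy = {}
    for name, w in params.items():
        scale = lam * w.std()
        noisy[name] = w + rng.uniform(-scale, scale, size=w.shape)
    return noisy

# Toy "pretrained" weights: one varied matrix, one all-zero matrix.
params = {"attn": np.ones((4, 4)) + 0.01 * np.arange(16).reshape(4, 4),
          "ffn": np.zeros((4, 4))}
perturbed = noisy_tune(params)
# "ffn" has zero std, so it receives no noise at all.
```

Scaling by `std(W)` rather than using a fixed noise level is the key design point: it keeps the relative perturbation comparable across matrices of very different magnitudes.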
    * [[https://arxiv.org/pdf/2401.14556|2024 - Looking Right is Sometimes Right: Investigating the Capabilities of Decoder-only LLMs for Sequence Labeling]] Says "LLMs fall short of achieving state-of-the-art results in information extraction (IE) tasks, many of which are formulated as sequence labeling (SL)"
    * [[https://arxiv.org/pdf/2404.05961|LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders]] Shows that Mistral was probably pre-trained using some bi-directional attention.
    * [[https://arxiv.org/pdf/2504.06225|Zhang et al 2025 - Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation]] They seem to think they are the first to adapt pretrained decoder-only LLMs to encoder-decoder, which is incorrect.
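The decoder-only vs. bidirectional distinction these papers turn on is just the attention mask: causal models only let position i attend to positions ≤ i, while encoder-style (what LLM2Vec enables) lets every position attend everywhere. An illustrative numpy sketch, not any specific model's code:

```python
import numpy as np

def attention_mask(seq_len, causal=True):
    """Return an additive attention mask (added to scores before softmax).

    causal=True:  position i attends only to positions <= i (decoder-only).
    causal=False: all positions attend to all positions (bidirectional).
    """
    if causal:
        # Strict upper triangle (j > i) is masked out with -inf.
        return np.triu(np.full((seq_len, seq_len), -np.inf), k=1)
    return np.zeros((seq_len, seq_len))

causal = attention_mask(4)               # token 0 sees only itself
bidir = attention_mask(4, causal=False)  # every token sees all 4 tokens
```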
  
===== Parameter-Efficient Tuning (PET) =====
  * **QLoRA**: [[https://arxiv.org/pdf/2305.14314.pdf|Dettmers et al 2023 - QLoRA: Efficient Finetuning of Quantized LLMs]]
  * [[https://arxiv.org/pdf/2305.17333.pdf|Malladi et al 2023 - Fine-Tuning Language Models with Just Forward Passes]]
  * [[https://arxiv.org/pdf/2312.09979|Dou et al 2023 - LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin]] Combines LoRA with a mixture-of-experts (MoE) plugin to improve performance.
  * [[https://arxiv.org/pdf/2402.03293|Hao et al 2024 - FLORA: Low-Rank Adapters Are Secretly Gradient Compressors]]
  * [[https://arxiv.org/pdf/2403.03507.pdf|Zhao et al 2024 - GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection]] Can also be used for pre-training.
  * [[https://arxiv.org/pdf/2402.09353|Liu et al 2024 - DoRA: Weight-Decomposed Low-Rank Adaptation]] **Notes that a performance gap often still exists between PEFT and full fine-tuning** (cited by [[https://arxiv.org/pdf/2405.15525|He 2024]] for this claim).
  * [[https://arxiv.org/pdf/2405.15525|He et al 2024 - Sparse Matrix in Large Language Model Fine-tuning]]
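Several entries above (QLoRA, LoRAMoE, FLORA, DoRA) build on the LoRA idea: freeze the pretrained weight W and train only a low-rank update (α/r)·BA. A minimal numpy sketch with illustrative shapes and hyperparameters:

```python
import numpy as np

d, r = 1024, 8            # hidden size and adapter rank (illustrative)
alpha = 16                # LoRA scaling hyperparameter
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))         # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, init to 0

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B get gradients.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d))
y = lora_forward(x)

# With B initialized to zero, the adapted model starts identical to the base.
full_params = W.size          # 1024 * 1024
lora_params = A.size + B.size # 2 * 8 * 1024 -- a ~64x reduction here
```

Initializing B to zero means training starts exactly from the pretrained model, and the trainable parameter count scales with r·d instead of d², which is the memory win the LoRA-family papers exploit.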
  
