====== ml:fine-tuning ======

Last modified: 2025/07/14 07:37 (current) by jmflanig
  * [[https://arxiv.org/pdf/2205.05638.pdf|Liu et al 2022 - Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning]]
  * **[[https://arxiv.org/pdf/2306.09782|Lv et al 2023 - Full Parameter Fine-tuning for Large Language Models with Limited Resources]]**
    * [[https://arxiv.org/pdf/2310.10195|Lv et al 2023 - AdaLomo: Low-memory Optimization with Adaptive Learning Rate]]
  * [[https://arxiv.org/pdf/2403.14608|Han et al 2024 - Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey]]
    * [[https://arxiv.org/pdf/2401.14556|2024 - Looking Right is Sometimes Right: Investigating the Capabilities of Decoder-only LLMs for Sequence Labeling]] Says "LLMs fall short of achieving state-of-the-art results in information extraction (IE) tasks, many of which are formulated as sequence labeling (SL)"
    * [[https://arxiv.org/pdf/2404.05961|LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders]] Shows that Mistral was probably pre-trained with some bidirectional attention.
    * [[https://arxiv.org/pdf/2504.06225|Zhang et al 2025 - Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation]] They claim to be the first to adapt pretrained decoder-only LLMs into encoder-decoder models, which is incorrect.
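The LLM2Vec and encoder-decoder adaptation papers above rest on one structural fact: a decoder-only transformer differs from a bidirectional encoder only in the causal attention mask. A minimal numpy sketch of that difference (illustrative code, not taken from either paper):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(Q, K, V, causal=True):
    # Scaled dot-product self-attention over a single sequence.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    if causal:
        # Decoder-only: each position attends only to itself and earlier
        # positions. Dropping this mask yields bidirectional attention.
        n = scores.shape[0]
        scores = np.where(np.tril(np.ones((n, n), bool)), scores, -np.inf)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))          # 5 tokens, 8-dim states
causal_out = self_attention(X, X, X, causal=True)
bidir_out = self_attention(X, X, X, causal=False)

# Under the causal mask the first token can only attend to itself,
# so its output is just its own value vector; bidirectionally it
# mixes in information from later tokens.
assert np.allclose(causal_out[0], X[0])
assert not np.allclose(causal_out[0], bidir_out[0])
```

This is why "adapting" a decoder-only LLM into an encoder (or the encoder half of an encoder-decoder) is largely a matter of removing the mask and continuing training so the weights adjust to the new attention pattern.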
  
===== Parameter-Efficient Tuning (PET) =====
  * **QLoRA**: [[https://arxiv.org/pdf/2305.14314.pdf|Dettmers et al 2023 - QLORA: Efficient Finetuning of Quantized LLMs]]
  * [[https://arxiv.org/pdf/2305.17333.pdf|Malladi et al 2023 - Fine-Tuning Language Models with Just Forward Passes]]
  * [[https://arxiv.org/pdf/2312.09979|Dou et al 2023 - LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin]] Combines LoRA with a mixture-of-experts-style plugin to mitigate forgetting of world knowledge during fine-tuning.
  * [[https://arxiv.org/pdf/2402.03293|Hao et al 2024 - FLORA: Low-Rank Adapters Are Secretly Gradient Compressors]]
  * [[https://arxiv.org/pdf/2403.03507.pdf|Zhao et al 2024 - GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection]] Can also be used for pre-training.
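Several of the methods above (QLoRA, LoRAMoE, FLORA) build on the LoRA idea from Hu et al 2021: freeze the pretrained weight and learn only a low-rank additive update. A minimal numpy sketch of that idea (names and hyperparameters here are illustrative, not from any specific library):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4             # rank r << d_in, d_out
alpha = 8.0                            # LoRA scaling hyperparameter

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init
                                       # so training starts from W exactly

def lora_forward(x):
    # y = W x + (alpha / r) * B A x  -- only A and B receive gradients,
    # so the trainable parameter count is r * (d_in + d_out), not d_in * d_out.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B zero-initialized, the adapted layer reproduces the frozen one:
assert np.allclose(lora_forward(x), W @ x)

full_params = W.size
lora_params = A.size + B.size
print(f"trainable params: {lora_params} vs full fine-tuning: {full_params}")
```

The follow-on papers vary the ingredients: QLoRA quantizes the frozen `W` to 4 bits, LoRAMoE routes among several such adapter pairs, and FLORA/GaLore reinterpret the low-rank structure as gradient compression.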
