Fine-Tuning
This page lists fine-tuning methods such as Adapters, BitFit, and NoisyTune.

Figure from Mahabadi 2021.
See also Optimization - Instability of Fine-tuning.
- Mosbach 2020 - On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines. Advocates a simple baseline in Section 6: fine-tune with Adam (with bias correction) at a learning rate of 2e-5 for 20 epochs, linearly warming the learning rate up over the first 10% of steps and then linearly decaying it to zero.
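The warmup/decay schedule from Mosbach et al.'s baseline can be sketched as a plain learning-rate multiplier function (a minimal pure-Python sketch; the 1000-step horizon and function name are illustrative, not from the paper — in practice total steps would be 20 epochs times the steps per epoch):

```python
def linear_warmup_decay(step, total_steps, warmup_frac=0.1):
    """Learning-rate multiplier in [0, 1]: linear warmup over the first
    warmup_frac of steps, then linear decay to zero at total_steps."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

BASE_LR = 2e-5        # peak learning rate recommended by Mosbach 2020
TOTAL_STEPS = 1000    # illustrative; set from epochs * batches per epoch

schedule = [BASE_LR * linear_warmup_decay(s, TOTAL_STEPS)
            for s in range(TOTAL_STEPS)]
```

In a PyTorch setup, a multiplier function like this can be passed to `torch.optim.lr_scheduler.LambdaLR` on top of `torch.optim.Adam` (which applies bias correction by default), or one can use Hugging Face's `get_linear_schedule_with_warmup`.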
- Gradual Fine-Tuning: Xu et al 2021 - Gradual Fine-Tuning for Low-Resource Domain Adaptation