Domain Adaptation
See Wikipedia - Domain Adaptation. In NLP, domain adaptation typically becomes necessary when the training data comes from a different genre than the test data - training on newswire and testing on the medical domain, for example. Since natural language is so varied, domains often differ substantially, with different lexical items, syntactic patterns, and semantics. For a definition of what a domain is, see van der Wees 2015 - What's in a Domain? Analyzing Genre and Topic Differences in Statistical Machine Translation, or the references in Gururangan 2020.
Overviews
- 2018 - A Survey of Unsupervised Deep Domain Adaptation Unsupervised domain adaptation differs from regular domain adaptation in that you get no labeled examples in the domain of interest, only unlabeled ones.
Domain Adaptation (Outside of NLP)
Papers (General Domain Adaptation in NLP)
See also Awesome Neural Adaptation in NLP or (older) A curated list of unsupervised domain adaptation papers in NLP (not including MT).
- Daumé III 2009 - Frustratingly Easy Domain Adaptation A seminal paper, the baseline that you should always try (for linear models).
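The paper's trick, feature augmentation ("EasyAdapt"), is simple enough to sketch in a few lines. The sketch below is an illustrative reconstruction, not the author's code: each feature vector gets a shared "general" copy plus one per-domain copy that is zeroed out for all other domains.

```python
def augment(x, domain, n_domains, d):
    # EasyAdapt feature augmentation: one shared "general" block,
    # plus one block per domain (zeros for all domains but the example's own).
    out = list(x)  # general copy, shared across domains
    for k in range(n_domains):
        out += list(x) if k == domain else [0.0] * d
    return out

# Source-domain example (domain 0) vs. target-domain example (domain 1):
x = [1.0, 2.0]
src = augment(x, domain=0, n_domains=2, d=2)  # -> [1, 2, 1, 2, 0, 0]
tgt = augment(x, domain=1, n_domains=2, d=2)  # -> [1, 2, 0, 0, 1, 2]
```

Any linear classifier trained on the augmented vectors can then learn shared weights in the general block and domain-specific corrections in the per-domain blocks.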
- The obvious baseline for neural networks is to fine-tune a pre-trained network on the new domain. I don't know of any papers examining this method and the trade-offs associated with it, but there should be some.
- Kim et al 2016 - Frustratingly Easy Neural Domain Adaptation Not a very good paper: it doesn't compare against sensible baselines such as fine-tuning on the new domain.
- Ruder & Plank 2018 - Strong Baselines for Neural Semi-Supervised Learning under Domain Shift Proposes multi-task tri-training, which shares parameters across the three tri-training models and trains them jointly on one task.
- Gururangan et al 2020 - Don't Stop Pretraining: Adapt Language Models to Domains and Tasks Note: measures similarity between domains by counting n-gram overlap. Has nice references for what is a domain.
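A crude version of that n-gram-overlap similarity can be sketched as follows; this is a simplified illustration (top-k unigrams with whitespace tokenization, made-up mini-corpora), not the paper's exact procedure:

```python
from collections import Counter

def top_vocab(texts, k=10000):
    # Most frequent unigrams in a corpus (whitespace tokenization for brevity).
    counts = Counter(tok for t in texts for tok in t.lower().split())
    return {w for w, _ in counts.most_common(k)}

def vocab_overlap(corpus_a, corpus_b, k=10000):
    # Jaccard-style overlap of the two corpora's top-k vocabularies.
    va, vb = top_vocab(corpus_a, k), top_vocab(corpus_b, k)
    return len(va & vb) / len(va | vb)

news = ["the senate passed the bill", "markets rallied on the news"]
med = ["the patient passed the exam", "dosage was increased on review"]
print(vocab_overlap(news, med))  # shares only "the", "passed", "on"
```

A low overlap score suggests the target domain is far from the pretraining corpus, which is when continued (domain-adaptive) pretraining helps most.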
- Kulshreshtha et al 2021 - Back-Training excels Self-Training at Unsupervised Domain Adaptation of Question Generation and Passage Retrieval Back-training: generating noisy inputs given outputs (vs self-training: generating noisy outputs given inputs).

Table from Ramponi & Plank 2020.
Domain Adaptation in NLP Tasks
Related Pages
nlp/domain_adaptation.txt · Last modified: 2023/06/15 07:36 by 127.0.0.1