Domain Adaptation
See Wikipedia - Domain Adaptation. In NLP, domain adaptation typically becomes necessary when the training data comes from a different genre than the test data: for example, you train on newswire and test on medical text. Since natural language is so varied, domains often differ substantially in lexical items, syntactic patterns, and semantics. For a definition of what constitutes a domain, see van der Wees 2015 - What's in a Domain? Analyzing Genre and Topic Differences in Statistical Machine Translation, or the references in Gururangan 2020.
Overviews
- 2018 - A Survey of Unsupervised Deep Domain Adaptation. Unsupervised domain adaptation differs from standard domain adaptation in that no labeled examples are available in the domain of interest, only unlabeled ones.
Papers (General Domain Adaptation in NLP)
See also Awesome Neural Adaptation in NLP, a curated list of unsupervised domain adaptation papers in NLP (not including MT).
- Daumé III 2009 - Frustratingly Easy Domain Adaptation. A seminal paper; the baseline you should always try (for linear models).
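The core of Daumé's feature augmentation can be sketched in a few lines: each feature is copied into a shared version and a domain-specific version, and a standard linear classifier is then trained on the augmented features. This is a minimal illustrative sketch; the function and feature names are not from the paper.

```python
def augment(features, domain):
    """EasyAdapt-style feature augmentation (sketch of Daumé III 2009).

    Each input feature is duplicated: one copy is shared across all
    domains, and one copy is specific to the example's domain.
    `features` is a dict of feature name -> value; `domain` is a label
    such as "source" or "target".
    """
    out = {}
    for name, value in features.items():
        out[f"shared::{name}"] = value   # active for every domain
        out[f"{domain}::{name}"] = value # active only for this domain
    return out
```

The shared copies let the learner pick up weights that transfer across domains, while the domain-specific copies absorb behavior unique to one domain; the trade-off is handled automatically by the classifier's regularization.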
- The obvious baseline for neural networks is to fine-tune a pre-trained network on the new domain. I don't know of any papers examining this method and the trade-offs associated with it, but there probably is one.
- Kim et al 2016 - Frustratingly Easy Neural Domain Adaptation. Not a very good paper: it does not compare against sensible baselines such as fine-tuning on the new domain.
- Gururangan et al 2020 - Don't Stop Pretraining: Adapt Language Models to Domains and Tasks. Note: measures similarity between domains by counting n-gram overlap. Has nice references on what constitutes a domain.
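The n-gram overlap idea from the last item can be approximated as Jaccard similarity between the n-gram vocabularies of two tokenized corpora. This is a simplified sketch, assuming whitespace-tokenized documents; Gururangan et al. additionally restrict to the most frequent unigrams, which this version does not.

```python
def ngrams(tokens, n=1):
    """Set of n-grams (as tuples) in one tokenized document."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def vocab_overlap(corpus_a, corpus_b, n=1):
    """Jaccard overlap of the n-gram vocabularies of two corpora.

    Each corpus is a list of tokenized documents (lists of strings).
    Returns |A ∩ B| / |A ∪ B|, a rough proxy for domain similarity.
    """
    vocab_a = set().union(*(ngrams(doc, n) for doc in corpus_a))
    vocab_b = set().union(*(ngrams(doc, n) for doc in corpus_b))
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)
```

A high overlap between a pretraining corpus and a task corpus suggests that further in-domain pretraining will yield smaller gains than for a low-overlap pair.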
Domain Adaptation in NLP Tasks
Related Pages
nlp/domain_adaptation.1643850323.txt.gz · Last modified: 2023/06/15 07:36 (external edit)