Machine Translation

Overviews

For a reading list, see The Machine Translation Reading List.

Key Papers

System Papers

Baselines

Syntax in MT

Multilingual Translation

In multilingual translation, one system is built to translate between many language pairs (rather than just one).
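One common way to build such a system is the target-language token of Johnson et al. (2017). The training data from all language pairs is pooled, and an artificial token naming the desired output language (for example `<2es>`) is prepended to each source sentence, so a single shared model learns every direction. The sketch below shows only that data-preparation step. The function names are hypothetical, and the model itself is omitted.

```python
def add_target_tag(source: str, target_lang: str) -> str:
    """Prepend a target-language token such as <2es> to a source sentence."""
    return f"<2{target_lang}> {source}"


def build_multilingual_data(triples):
    """Merge (source, target, target_lang) triples from many language pairs
    into one training set for a single shared model (hypothetical helper)."""
    return [(add_target_tag(src, lang), tgt) for src, tgt, lang in triples]


training_data = build_multilingual_data([
    ("How are you?", "¿Cómo estás?", "es"),
    ("How are you?", "Wie geht es dir?", "de"),
    ("Wie geht es dir?", "How are you?", "en"),
])
for src, tgt in training_data:
    print(src, "=>", tgt)
```

Because the requested language is just another input token, the same encoder and decoder parameters are shared across all pairs.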

Low-Resource

Character-Level

Domain Adaptation

See also Domain Adaptation.

Pretraining

Unsupervised

Sentence Alignment

Before an MT system can be trained, the sentences in the parallel documents need to be aligned to create sentence pairs.
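Classic aligners such as Gale and Church (1993) exploit the fact that long sentences tend to translate into long sentences. They search over pairings of sentence lengths with dynamic programming. The sketch below is a simplified aligner in that spirit, not their probabilistic model. The allowed bead shapes (1-1, 2-1, 1-2, 1-0, 0-1), their fixed penalties, and the length-mismatch cost are illustrative assumptions.

```python
import math


def align_sentences(src, tgt):
    """Simplified length-based sentence aligner (in the spirit of Gale-Church).

    src, tgt: lists of sentences from a parallel document pair.
    Returns a list of beads (src_indices, tgt_indices) covering both sides.
    """
    # Allowed bead shapes (#src, #tgt) with a fixed penalty each; 1-1 is
    # cheapest. These penalty values are illustrative assumptions.
    beads = {(1, 1): 0.0, (2, 1): 1.5, (1, 2): 1.5, (1, 0): 3.0, (0, 1): 3.0}

    def length_cost(s_len, t_len):
        # Character-length mismatch, scaled down for longer spans.
        return abs(s_len - t_len) / math.sqrt(s_len + t_len + 1)

    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Fill the table in row-major order; every bead moves forward, so all of
    # a cell's predecessors are final before the cell is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in beads.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                total = cost[i][j] + penalty + length_cost(s_len, t_len)
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (i, j)
    # Trace back from (n, m) to (0, 0) to recover the best bead sequence.
    alignment, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        alignment.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return alignment[::-1]


src = ["Hello there.", "How are you?", "The cat sat on the mat."]
tgt = ["Hello there, how are you?", "Le chat était assis sur le tapis."]
for s_idx, t_idx in align_sentences(src, tgt):
    print([src[k] for k in s_idx], "<->", [tgt[k] for k in t_idx])
```

Here the first two source sentences are merged into one 2-1 bead, because their combined length closely matches the single target sentence. Real aligners also use lexical cues, such as dictionaries or sentence embeddings, rather than length alone.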

Statistical MT

See also Statistical Machine Translation. Recent papers related to SMT:

Evaluation

For an overview, see Evaluating MT Systems.

Papers

See also the metrics shared task held at WMT every year, which measures how well automatic metrics correlate with human judgments.
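The correlation mentioned above can be sketched minimally as the Pearson correlation between a metric's system-level scores and human scores for the same systems. Pearson's r is one common choice at the system level. Segment-level analyses usually use rank-based statistics such as Kendall's tau instead. The scores below are invented toy numbers, not WMT results. `pearson` is a hypothetical helper written out for clarity; Python 3.10+ also provides `statistics.correlation`.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical system-level scores for four MT systems (toy numbers only).
metric_scores = [25.1, 27.4, 30.2, 22.8]
human_scores = [0.10, 0.21, 0.35, -0.05]
print(f"Pearson r = {pearson(metric_scores, human_scores):.3f}")
```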

BLEU

Note that BLEU is a corpus-level metric: averaging BLEU scores computed sentence by sentence will not, in general, give the same result as corpus-level BLEU. Corpus-level BLEU is the standard score reported in papers.
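The difference comes from where the aggregation happens. Corpus-level BLEU pools clipped n-gram matches and lengths over all sentences before taking the geometric mean of the precisions. A sentence-level score takes that mean separately for each sentence. The sketch below is a simplified from-scratch BLEU for illustration only: it assumes one reference per sentence, whitespace tokenization, and no smoothing. For reported numbers, use a standard implementation such as SacreBLEU.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps, refs, max_n=4):
    """Corpus-level BLEU with one reference per sentence and no smoothing.

    Matches, n-gram totals, and lengths are summed over the whole corpus
    *before* precisions and the brevity penalty are computed.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clipping: a hypothesis n-gram is credited at most as many
            # times as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


hyps = ["the cat sat on the mat", "hello world"]
refs = ["the cat sat on the mat", "hello there world"]

corpus_score = corpus_bleu(hyps, refs)
sentence_avg = sum(corpus_bleu([h], [r]) for h, r in zip(hyps, refs)) / len(hyps)
print(f"corpus-level BLEU:     {corpus_score:.2f}")
print(f"average sentence BLEU: {sentence_avg:.2f}")
```

On this toy input, the corpus score is about 84.3, while the sentence average is 50.0. On its own, the second sentence scores 0: it has no matching bigram and is too short to contain any 3- or 4-grams. At corpus level, its unmatched bigram only slightly lowers the pooled bigram precision.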

Notes: To assess length effects (translations that are too short), people often also report the brevity penalty (BP) computed as part of BLEU. Most BLEU evaluation scripts include this number in their output, labeled BP.
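For reference, the standard definition from Papineni et al. (2002) is given below. Let c be the total length of the candidate translations and r the effective reference length. BP penalizes output that is shorter than the references and is 1 otherwise. BLEU multiplies BP by the geometric mean of the modified n-gram precisions p_n, usually with N = 4 and uniform weights w_n = 1/N. The worked example uses made-up lengths to show the scale of the penalty.

```latex
\mathrm{BP} =
\begin{cases}
  1 & \text{if } c > r \\
  e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
\qquad
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\Big( \sum_{n=1}^{N} w_n \log p_n \Big)

% Worked example (illustrative lengths): c = 90, r = 100
\mathrm{BP} = e^{\,1 - 100/90} = e^{-0.111\ldots} \approx 0.895
```

A BP well below 1 therefore signals that the system is producing output noticeably shorter than the references.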

Datasets

Papers About Corpus Collection

Standard Datasets

Datasets for Small-Scale Experiments

Low-Resource Datasets

Large Datasets

Software

See also Tan 2020.

Resources

People