Machine Translation
Overviews
For a reading list, see The Machine Translation Reading List
Key Papers
System Papers
- Arivazhagan et al 2019 - Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges. See the list of open problems. From Google's MT team; did they deploy this?
- Costa-jussà et al 2022 - No Language Left Behind: Scaling Human-Centered Machine Translation (blog, website, model, demo). Transformer encoder-decoder model with a sparsely gated mixture of experts; ~50B params, with distilled versions also released.
Baselines
Syntax in MT
Multilingual Translation
In multilingual translation, one system is built to translate between many language pairs (rather than just one).
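A common way to implement this (Johnson et al 2017, Google's multilingual NMT) is to train a single model on the concatenated training data of all language pairs, prepending a special token to the source side to select the target language. A minimal preprocessing sketch (the exact token format is illustrative):

```python
def tag_source(src_sentence: str, tgt_lang: str) -> str:
    """Prepend a target-language token to the source sentence, so one
    shared model can translate into many languages. The "<2xx>" token
    naming here is illustrative, not a fixed standard."""
    return f"<2{tgt_lang}> {src_sentence}"

# The shared model then sees training examples like:
tag_source("Hello world", "fr")  # "<2fr> Hello world"
```

Because the target language is just an input token, such a model can also attempt zero-shot directions it never saw paired training data for.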
Low-Resource
- Comparison of SMT vs NMT for low-resource MT
- Costa-jussà et al 2022 - No Language Left Behind: Scaling Human-Centered Machine Translation (see System Papers above)
Character-Level
Domain Adaptation
See also Domain Adaptation.
- Surveys
Pretraining
Unsupervised
Sentence Alignment
Before an MT system can be trained, the sentences in the parallel documents need to be aligned to create sentence pairs.
- Mining parallel sentences
- Some of these methods can be used to mine parallel sentences from large collections of documents
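One common mining recipe (used e.g. in the LASER line of work) embeds all sentences from both sides with a multilingual encoder and keeps mutual nearest neighbours under cosine similarity. A minimal sketch, assuming the embeddings are already computed; the function name and threshold value are ours:

```python
import numpy as np

def mine_parallel_pairs(src_emb, tgt_emb, threshold=0.8):
    """Mine candidate sentence pairs from two embedding matrices.

    src_emb: (n_src, d), tgt_emb: (n_tgt, d). Rows are assumed to be
    sentence embeddings from a multilingual encoder (e.g. LASER);
    the threshold is an arbitrary illustration, not a recommended value.
    """
    # L2-normalize rows so dot products are cosine similarities
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T  # (n_src, n_tgt) cosine similarity matrix

    pairs = []
    fwd = sim.argmax(axis=1)  # best target for each source sentence
    bwd = sim.argmax(axis=0)  # best source for each target sentence
    for i, j in enumerate(fwd):
        # keep mutual nearest neighbours above the similarity threshold
        if bwd[j] == i and sim[i, j] >= threshold:
            pairs.append((i, int(j), float(sim[i, j])))
    return pairs
```

Production systems refine this with margin-based scoring rather than a raw cosine threshold, but the mutual-nearest-neighbour idea is the core of it.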
Statistical MT
See also Statistical Machine Translation. Recent papers related to SMT:
Evaluation
For an overview, see Evaluating MT Systems.
Papers
See also the metrics task at WMT each year, which evaluates metrics by their correlation with human judgments.
BLEU
Note that BLEU is a corpus-level metric, and that averaging BLEU scores computed at the sentence level will not give the same result as corpus-level BLEU. Corpus-level BLEU is the standard one reported in papers.
Notes: To assess length effects (translations being too short), people often report the brevity penalty BP computed when calculating BLEU. Most BLEU evaluation scripts report this number as BP = min(1, e^(1 − r/c)), where c is the total length of the candidate output and r is the effective reference length.
- If you want to simulate SacreBLEU evaluation but with statistical significance, you can use the mteval-v13a.pl script to tokenize your output and references, and then use MultEval
- Compare-MT Can analyze the differences between two systems and compute statistical significance. paper
- Historical: Moses's multi-bleu.pl
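To make the corpus-level vs sentence-level distinction concrete, here is a bare-bones BLEU sketch (single reference, no smoothing; the function names and simplifications are ours — use SacreBLEU for real evaluations):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyps, refs, max_n=4):
    """Corpus-level BLEU: one reference per hypothesis, no smoothing.
    Clipped n-gram counts are pooled over the whole corpus before the
    geometric mean and brevity penalty are applied."""
    match = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            match[n - 1] += sum(min(c, r[g]) for g, c in h.items())  # clipped
            total[n - 1] += max(len(hyp) - n + 1, 0)
    if min(match) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    bp = min(1.0, math.exp(1 - ref_len / hyp_len))  # brevity penalty
    return bp * math.exp(log_prec)

hyps = [["the", "cat", "sat", "on", "the", "mat"], ["a", "dog"]]
refs = [["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "barked", "loudly"]]
corpus = bleu(hyps, refs)
avg = sum(bleu([h], [r]) for h, r in zip(hyps, refs)) / 2
# Sentence 2 has no matching 4-gram, so its sentence-level score is 0 and
# the average of sentence scores is 0.5, while corpus-level BLEU ≈ 0.72.
```

Pooling the clipped counts over the corpus is exactly why the two numbers differ: without smoothing, any sentence lacking a 4-gram match scores 0 on its own, which drags the sentence-level average down.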
Datasets
Standard Datasets
- WMT 2014 En-Fr, etc
- Nice scripts to download and preprocess: wmt16_en_de.sh
Datasets for Small-Scale Experiments
- IWSLT 2013 MT Datasets English-French (200K sentence pairs), used for example here.
- IWSLT 2014 English-German (160K sentence pairs), used for example here.
- Malagasy-English dataset (80K sentence pairs) Malagasy is a morphologically rich language (WARNING: hasn't been used in a while, no recent neural models to compare to)
Low-Resource Datasets
- Guzmán et al 2019 dataset Four language pairs: Nepali-English, Sinhala-English, Khmer-English, Pashto-English
- Malagasy-English dataset (Jeff recommends)
- LDMT MURI Data (ask Jeff for it, he has access)
- Flores-101 dataset. Paper: Goyal et al 2021. 3001 sentences translated into 101 languages
- Cherokee-English dataset Recommended (recent, 2020)
Large Datasets
Software
See also Tan 2020.
- FairSeq
- OpenNMT
- Sockeye
- Nematus
Resources
- Conferences and Workshops
- WMT (Workshop on Machine Translation, now Conference on Machine Translation)
- Books
- Wikis
- MT Research Survey Wiki Covers neural methods as well
- Bibliographies