Machine Translation
Overviews
For a reading list, see The Machine Translation Reading List
Key Papers
System Papers
- Arivazhagan et al 2019 - Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges. See the list of open problems. From Google's MT team; did they deploy this?
- Costa-jussà et al 2022 - No Language Left Behind: Scaling Human-Centered Machine Translation (blog, website, model, demo). Transformer encoder-decoder model with a sparsely gated mixture of experts; ~50B params, with distilled versions also released.
Baselines
Syntax in MT
Multilingual Translation
In multilingual translation, one system is built to translate between many language pairs (rather than just one).
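A common way to implement this (Johnson et al 2017, Google's multilingual NMT) is to train a single model on the concatenated training data of all language pairs, prepending a special token to the source side to select the target language. A minimal preprocessing sketch (the exact token format is illustrative):

```python
def tag_source(src_sentence: str, tgt_lang: str) -> str:
    """Prepend a target-language token to the source sentence, so one
    shared model can translate into many languages. The "<2xx>" token
    naming here is illustrative, not a fixed standard."""
    return f"<2{tgt_lang}> {src_sentence}"

# The shared model then sees training examples like:
tag_source("Hello world", "fr")  # "<2fr> Hello world"
```

Because the target language is just an input token, such a model can also attempt zero-shot directions it never saw paired training data for.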
Low-Resource
- Comparison of SMT vs NMT for low-resource MT
- Costa-jussà et al 2022 - No Language Left Behind: Scaling Human-Centered Machine Translation (see System Papers above)
Character-Level
Domain Adaptation
See also Domain Adaptation.
- Surveys
Pretraining
Unsupervised
Sentence Alignment
Before an MT system can be trained, the sentences in the parallel documents need to be aligned to create sentence pairs.
- Mining parallel sentences
- Some of these methods can be used to mine parallel sentences from large collections of documents
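One common mining recipe (used e.g. in the LASER line of work) embeds all sentences from both sides with a multilingual encoder and keeps mutual nearest neighbours under cosine similarity. A minimal sketch, assuming the embeddings are already computed; the function name and threshold value are ours:

```python
import numpy as np

def mine_parallel_pairs(src_emb, tgt_emb, threshold=0.8):
    """Mine candidate sentence pairs from two embedding matrices.

    src_emb: (n_src, d), tgt_emb: (n_tgt, d). Rows are assumed to be
    sentence embeddings from a multilingual encoder (e.g. LASER);
    the threshold is an arbitrary illustration, not a recommended value.
    """
    # L2-normalize rows so dot products are cosine similarities
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T  # (n_src, n_tgt) cosine similarity matrix

    pairs = []
    fwd = sim.argmax(axis=1)  # best target for each source sentence
    bwd = sim.argmax(axis=0)  # best source for each target sentence
    for i, j in enumerate(fwd):
        # keep mutual nearest neighbours above the similarity threshold
        if bwd[j] == i and sim[i, j] >= threshold:
            pairs.append((i, int(j), float(sim[i, j])))
    return pairs
```

Production systems refine this with margin-based scoring rather than a raw cosine threshold, but the mutual-nearest-neighbour idea is the core of it.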
Statistical MT
See also Statistical Machine Translation. Recent papers related to SMT:
Evaluation
For an overview, see Evaluating MT Systems.
Papers
See also the metrics task at WMT each year, which evaluates metrics by their correlation with human judgments.
BLEU
Note that BLEU is a corpus-level metric, and that averaging BLEU scores computed at the sentence level will not give the same result as corpus-level BLEU. Corpus-level BLEU is the standard one reported in papers.
Notes: To assess length effects (translations being too short), people often report the brevity penalty BP computed when calculating BLEU. Most BLEU evaluation scripts report this number as BP = min(1, e^(1 − r/c)), where c is the total length of the candidate output and r is the effective reference length.
- If you want to simulate SacreBLEU evaluation but with statistical significance, you can use the mteval-v13a.pl script to tokenize your output and references, and then use MultEval
- Compare-MT Can analyze the differences between two systems and compute statistical significance. paper
- Historical: Moses's multi-bleu.pl
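To make the corpus-level vs sentence-level distinction concrete, here is a bare-bones BLEU sketch (single reference, no smoothing; the function names and simplifications are ours — use SacreBLEU for real evaluations):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyps, refs, max_n=4):
    """Corpus-level BLEU: one reference per hypothesis, no smoothing.
    Clipped n-gram counts are pooled over the whole corpus before the
    geometric mean and brevity penalty are applied."""
    match = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            match[n - 1] += sum(min(c, r[g]) for g, c in h.items())  # clipped
            total[n - 1] += max(len(hyp) - n + 1, 0)
    if min(match) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    bp = min(1.0, math.exp(1 - ref_len / hyp_len))  # brevity penalty
    return bp * math.exp(log_prec)

hyps = [["the", "cat", "sat", "on", "the", "mat"], ["a", "dog"]]
refs = [["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "barked", "loudly"]]
corpus = bleu(hyps, refs)
avg = sum(bleu([h], [r]) for h, r in zip(hyps, refs)) / 2
# Sentence 2 has no matching 4-gram, so its sentence-level score is 0 and
# the average of sentence scores is 0.5, while corpus-level BLEU ≈ 0.72.
```

Pooling the clipped counts over the corpus is exactly why the two numbers differ: without smoothing, any sentence lacking a 4-gram match scores 0 on its own, which drags the sentence-level average down.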
Datasets
Standard Datasets
- WMT 2014 En-Fr, etc
- Nice scripts to download and preprocess: wmt16_en_de.sh
Datasets for Small-Scale Experiments
- IWSLT 2013 MT Datasets English-French (200K sentence pairs), used for example here.
- IWSLT 2014 English-German (160K sentence pairs), used for example here.
- Malagasy-English dataset (80K sentence pairs) Malagasy is a morphologically rich language (WARNING: hasn't been used in a while, no recent neural models to compare to)
Low-Resource Datasets
- Guzmán et al 2019 dataset Four language pairs: Nepali-English, Sinhala-English, Khmer-English, Pashto-English
- Malagasy-English dataset (Jeff recommends)
- LDMT MURI Data (ask Jeff for it, he has access)
- Flores-101 dataset. Paper: Goyal et al 2021. 3001 sentences translated into 101 languages
- Cherokee-English dataset Recommended (recent, 2020)
Large Datasets
Software
See also Tan 2020.
- FairSeq
- OpenNMT
- Sockeye
- Nematus
Resources
- Conferences and Workshops
- WMT (Workshop on Machine Translation, now Conference on Machine Translation)
- Books
- Wikis
- MT Research Survey Wiki Covers neural methods as well
- Bibliographies