For a reading list, see The Machine Translation Reading List.
In multilingual translation, one system is built to translate between many language pairs (rather than just one).
See also Domain Adaptation.
Before an MT system can be trained, the sentences in the parallel documents need to be aligned to create sentence pairs.
See also Statistical Machine Translation. Recent papers related to SMT:
For an overview, see Evaluating MT Systems.
See also the metrics task held at WMT every year, which measures how well automatic metrics correlate with human evaluations.
Note that BLEU is a corpus-level metric, and that averaging BLEU scores computed at the sentence level will not give the same result as corpus-level BLEU. Corpus-level BLEU is the standard one reported in papers.
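The difference can be seen in a small sketch. The code below is an illustrative, simplified BLEU (single reference, whitespace tokens, no smoothing), not a replacement for a standard evaluation script; the function names `bleu_stats`, `bleu_from_stats`, `corpus_bleu`, and `mean_sentence_bleu` and the toy sentences are assumptions made up for this example. Corpus-level BLEU pools the n-gram counts and lengths over all sentences before computing precisions and the brevity penalty, while the sentence-level average computes a score per sentence first.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_stats(hyp, ref, max_n=4):
    """Return (clipped matches, hypothesis n-gram totals, hyp length, ref length)."""
    matches, totals = [], []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        matches.append(sum((h & r).values()))  # Counter & gives clipped counts
        totals.append(sum(h.values()))
    return matches, totals, len(hyp), len(ref)


def bleu_from_stats(matches, totals, hyp_len, ref_len):
    """Unsmoothed BLEU from sufficient statistics; 0 if any order has no match."""
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / len(matches)
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)


def corpus_bleu(hyps, refs):
    """Pool statistics over the whole corpus, then compute one score."""
    stats = [bleu_stats(h, r) for h, r in zip(hyps, refs)]
    orders = range(len(stats[0][0]))
    matches = [sum(s[0][i] for s in stats) for i in orders]
    totals = [sum(s[1][i] for s in stats) for i in orders]
    return bleu_from_stats(matches, totals,
                           sum(s[2] for s in stats), sum(s[3] for s in stats))


def mean_sentence_bleu(hyps, refs):
    """Score each sentence separately, then average the scores."""
    scores = [bleu_from_stats(*bleu_stats(h, r)) for h, r in zip(hyps, refs)]
    return sum(scores) / len(scores)


# Toy data: one perfect translation and one that is too short.
hyps = ["the cat sat on the mat".split(), "a dog".split()]
refs = ["the cat sat on the mat".split(), "a dog barked loudly".split()]

print(corpus_bleu(hyps, refs))         # about 0.779
print(mean_sentence_bleu(hyps, refs))  # 0.5
```

In this toy case the short second hypothesis has no 3-grams or 4-grams, so its unsmoothed sentence-level score is 0 and it drags the average down to 0.5. At the corpus level its counts are pooled with those of the first sentence, so all n-gram precisions stay at 1 and only the brevity penalty lowers the score.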
Notes: To assess length effects (for example, translations that are too short), people often report the brevity penalty (BP), which is computed as part of BLEU. Most BLEU evaluation scripts report this number as BP = .
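For reference, the brevity penalty in the standard BLEU definition can be written as follows, where $c$ is the total length of the system output and $r$ is the effective reference length, both summed over the whole test set (the symbols $c$ and $r$ are the usual ones from the BLEU definition, not notation taken from this page):

$$
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
\exp\left(1 - \dfrac{r}{c}\right) & \text{if } c \le r
\end{cases}
$$

A BP of 1 means the output is not shorter than the reference overall. For example, an output that is 80% of the reference length ($r / c = 1.25$) gives $\mathrm{BP} = e^{-0.25} \approx 0.78$.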
See also Tan 2020.