nlp:machine_translation

Differences

This shows you the differences between two versions of the page.

nlp:machine_translation [2023/10/06 00:28] – [Low-Resource] jmflanig
nlp:machine_translation [2024/08/13 06:21] (current) – [Evaluation] jmflanig
Line 43: Line 43:
      * [[https://www.aclweb.org/anthology/2020.lrec-1.325.pdf|Duh et al 2020 - Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages]]
    * [[https://research.facebook.com/file/585831413174038/No-Language-Left-Behind--Scaling-Human-Centered-Machine-Translation.pdf|Costa-jussà et al 2022 - No Language Left Behind: Scaling Human-Centered Machine Translation]] [[https://github.com/facebookresearch/flores|dataset]] [[https://ai.facebook.com/blog/nllb-200-high-quality-machine-translation/|blog]] [[https://ai.facebook.com/research/no-language-left-behind/|website]] [[https://github.com/facebookresearch/fairseq/tree/nllb/?fbclid=IwAR1dOIBFelfGY48IJe0MgkUhJnqw3SP2y3O4VhlKs5-QM3dXuFRw4HIleZU|model]] [[https://nllb.metademolab.com/|demo]] Transformer encoder-decoder model with sparsely gated mixture of experts. 50B params, and also distilled versions.
 +  * [[https://aclanthology.org/2022.wmt-1.73.pdf|Marco & Fraser 2022 - Findings of the WMT 2022 Shared Tasks in Unsupervised MT and Very Low Resource Supervised MT]]
  
 ===== Character-Level =====
Line 60: Line 61:
  
 ===== Unsupervised =====
 +  * [[https://arxiv.org/pdf/1711.00043.pdf|Lample et al 2017 - Unsupervised Machine Translation Using Monolingual Corpora Only]]
   * [[https://arxiv.org/pdf/1906.06718.pdf|Luo et al 2019 - Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B]]
 +  * [[https://arxiv.org/pdf/1905.02450.pdf|Song et al 2019 - MASS: Masked Sequence to Sequence Pre-training for Language Generation]] [[https://github.com/microsoft/MASS|github]]
   * [[https://arxiv.org/pdf/2004.05516.pdf|Marchisio et al 2020 - When Does Unsupervised Machine Translation Work?]]
-  * [[https://arxiv.org/pdf/2106.15818.pdf|Marchisio et al 2021 - What Can Unsupervised Machine Translation Contribute to High-Resource Language Pairs?]]
+  * [[https://arxiv.org/pdf/2106.15818.pdf|Marchisio et al 2021 - On Systematic Style Differences between Unsupervised and Supervised MT and an Application for High-Resource Machine Translation]]
 +  * [[https://aclanthology.org/2022.wmt-1.73.pdf|Marco & Fraser 2022 - Findings of the WMT 2022 Shared Tasks in Unsupervised MT and Very Low Resource Supervised MT]] 
 +  * [[https://arxiv.org/pdf/2310.10385.pdf|Tan & Monz 2023 - Towards a Better Understanding of Variations in Zero-Shot Neural Machine Translation Performance]]
  
 ===== Sentence Alignment ===== ===== Sentence Alignment =====
Line 92: Line 97:
   * [[https://arxiv.org/pdf/2004.06063.pdf|Freitag et al 2020 - BLEU might be Guilty but References are not Innocent]]
   * [[https://arxiv.org/pdf/2106.15195.pdf|Marie et al 2021 - Scientific Credibility of Machine Translation Research: A Meta-Evaluation of 769 Papers]]
 +  * [[https://arxiv.org/pdf/2310.10482.pdf|Guerreiro et al 2023 - xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection]]
 +  * [[https://arxiv.org/pdf/2302.14520|Kocmi & Federmann 2023 - Large Language Models Are State-of-the-Art Evaluators of Translation Quality]]
 +  * **Evaluation of Metrics**
 +    * **[[https://aclanthology.org/2021.tacl-1.87.pdf|Freitag et al 2021 - Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation]]** Used in [[https://www2.statmt.org/wmt24/metrics-task.html|WMT]]
  
 === BLEU ===
Line 107: Line 116:
  
 ===== Datasets =====
 +
 +==== Papers About Corpus Collection ====
 +  * [[https://aclanthology.org/J03-3002.pdf|Resnik & Smith 2003 - The Web as a Parallel Corpus]] - The foundational paper about collecting parallel data from the web.
  
 ==== Standard Datasets ====
Line 150: Line 162:
 ===== People =====
   * [[https://scholar.google.com/citations?user=phgBJXYAAAAJ&hl=en|Wilker Aziz]]
 +  * [[https://scholar.google.com/citations?user=iPAX6jcAAAAJ&hl=en|Marine Carpuat]]
   * [[https://scholar.google.com/citations?user=dok0514AAAAJ&hl=en|David Chiang]]
   * [[https://scholar.google.com/citations?user=dLaR9lgAAAAJ&hl=en|Orhan Firat]]
nlp/machine_translation.1696552103.txt.gz · Last modified: 2023/10/06 00:28 by jmflanig
