====== Statistical Machine Translation ======
Papers and software for statistical machine translation, mostly for historical reference. See also [[https://en.wikipedia.org/wiki/Statistical_machine_translation|Wikipedia - Statistical Machine Translation]].

===== Overviews =====
  * {{papers:survey-staistical-machine-translation_0.pdf|Lopez 2008 - Statistical Machine Translation}} or [[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.126.2845&rep=rep1&type=pdf|Lopez 2007 - A Survey of Statistical Machine Translation]] (older, but nicer formatting)

===== IBM Models and Alignment =====
  * IBM Models
    * [[https://www.aclweb.org/anthology/C88-1016.pdf|Brown et al 1988 - A Statistical Approach to Language Translation]] Overview paper of the approach
    * [[https://www.aclweb.org/anthology/J93-2003.pdf|Brown et al 1993 - The Mathematics of Statistical Machine Translation: Parameter Estimation]] Mathematical details of IBM alignment models 1-5
    * IBM's complete system: [[https://patentimages.storage.googleapis.com/8c/d3/08/74a381b7127df4/US5477451.pdf|pdf]] [[https://patents.google.com/patent/US5477451A/en|patent]]
  * [[https://www.aclweb.org/anthology/C96-2141.pdf|Vogel et al 1996 - HMM-Based Word Alignment in Statistical Translation]]
  * [[https://dl.acm.org/doi/pdf/10.3115/976909.979664|Wang & Waibel 1997 - Decoding Algorithm in Statistical Machine Translation]]
  * ReWrite decoder: [[https://www.aclweb.org/anthology/P01-1030.pdf|Germann et al 2001 - Fast Decoding and Optimal Decoding for Machine Translation]] A* search, hill climbing, and ILP decoding for word-based models
  * [[https://www.aclweb.org/anthology/N03-1019.pdf|Kumar & Byrne 2003 - A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation]]
  * [[https://www.aclweb.org/anthology/N09-2002.pdf|Riedel & Clark 2009 - Revisiting Optimal Decoding for Machine Translation IBM Model 4]] Uses an ILP for IBM Model 4
  * Concavity of IBM Model 1: [[https://www.aclweb.org/anthology/N12-1069.pdf|Gimpel & Smith 2012]]
  * [[https://www.aclweb.org/anthology/N13-1073.pdf|Dyer et al 2013 - A Simple, Fast, and Effective Reparameterization of IBM Model 2]]
  * [[https://arxiv.org/pdf/2004.14675.pdf|Zenkel et al 2020 - End-to-End Neural Word Alignment Outperforms GIZA++]] Not conclusive, since the authors did not run an extrinsic evaluation in an SMT system. Improvements in alignment error rate (AER) often do not translate into better SMT models.

===== Phrase-Based Machine Translation (PBMT) =====
  * [[https://www.aclweb.org/anthology/N03-1017.pdf|Koehn et al 2003 - Statistical Phrase-Based Translation]]
  * [[https://www.aclweb.org/anthology/J03-1002.pdf|Och & Ney 2003 - A Systematic Comparison of Various Statistical Alignment Models]]
  * [[http://webpages.iust.ac.ir/morteza_zakeri/repo/iust_course_materials/NaturalLanguageProcessing/Project/refs/2004_pharaoh_a%20baem%20search_amta2004.pdf|Koehn 2004 - Pharaoh: a beam search decoder for phrase-based statistical machine translation models]]

===== Syntax-Based Methods =====
  * [[https://www.aclweb.org/anthology/P04-1083.pdf|Melamed 2004 - Statistical Machine Translation by Parsing]] Introduces SCFGs for SMT
  * Hiero
    * [[https://www.aclweb.org/anthology/P05-1033.pdf|Chiang 2005 - A Hierarchical Phrase-Based Model for Statistical Machine Translation]]
    * [[https://direct.mit.edu/coli/article-pdf/33/2/201/1798392/coli.2007.33.2.201.pdf|Chiang 2007 - Hierarchical Phrase-Based Translation]]
  * SAMT

===== Training =====
  * MERT: [[https://www.aclweb.org/anthology/P03-1021.pdf|Och 2003 - Minimum Error Rate Training in Statistical Machine Translation]] The standard method for training SMT systems since 2003
  * Hypergraph MERT: [[https://storage.googleapis.com/pub-tools-public-publication-data/pdf/35496.pdf|Kumar et al 2009 - Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices]]

===== Evaluation =====
  * Metrics
    * [[https://www.aclweb.org/anthology/P02-1040.pdf|Papineni et al 2002 - BLEU: a Method for Automatic Evaluation of Machine Translation]]
    * [[https://www.aclweb.org/anthology/W05-0909.pdf|Banerjee & Lavie 2005 - METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments]]
  * Methodology
    * [[https://www.aclweb.org/anthology/W04-3250.pdf|Koehn 2004 - Statistical Significance Tests for Machine Translation Evaluation]]
    * [[https://www.aclweb.org/anthology/P11-2031.pdf|Clark et al 2011 - Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability]]

===== Software =====
  * [[http://www.statmt.org/moses/giza/GIZA++.html|GIZA++]] Open source reimplementation of IBM alignment models [[https://github.com/moses-smt/giza-pp|GitHub]]
  * [[https://www.isi.edu/licensed-sw/rewrite-decoder/|ReWrite Decoder]] Open source implementation of word-based SMT
  * [[http://www.statmt.org/moses/|Moses]]
  * **cdec** [[https://github.com/redpony/cdec|GitHub]] [[https://web.archive.org/web/20150224014619/http://www.cdec-decoder.org/|Old website]] [[https://web.archive.org/web/20150220064636/http://cdec-decoder.org/index.php?title=Main_Page|Older website]]

===== Related Pages =====
  * [[Noisy Channel Model]]
  * [[Machine Translation]]