Differences

--- nlp:statistical_machine_translation [2021/04/20 08:02] – jmflanig
+++ nlp:statistical_machine_translation [2023/06/15 07:36] (current) – external edit 127.0.0.1
@@ Line 3: / Line 3: @@
 ===== Overviews =====
-  * {{papers:survey-staistical-machine-translation_0.pdf|Lopez 2008 - Statistical Machine Translation}} or [[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.126.2845&rep=rep1&type=pdf|Lopez 2007 - A Survey of Statistical Machine Translation]] (nicer formatting)
+  * {{papers:survey-staistical-machine-translation_0.pdf|Lopez 2008 - Statistical Machine Translation}} or [[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.126.2845&rep=rep1&type=pdf|Lopez 2007 - A Survey of Statistical Machine Translation]] (older, but nicer formatting)
 ===== IBM Models and Alignment =====
@@ Line 11: / Line 11: @@
     * IBM's complete system: [[https://patentimages.storage.googleapis.com/8c/d3/08/74a381b7127df4/US5477451.pdf|pdf]] [[https://patents.google.com/patent/US5477451A/en|patent]]
   * [[https://www.aclweb.org/anthology/C96-2141.pdf|Vogel et al 1996 - HMM-Based Word Alignment in Statistical Translation]]
-  * ReWrite decoder: [[https://www.aclweb.org/anthology/P01-1030.pdf|Germann et al 2001 - Fast Decoding and Optimal Decoding for Machine Translation]]
+  * [[https://dl.acm.org/doi/pdf/10.3115/976909.979664|Wang & Waibel 1997 - Decoding Algorithm in Statistical Machine Translation]]
+  * ReWrite decoder: [[https://www.aclweb.org/anthology/P01-1030.pdf|Germann et al 2001 - Fast Decoding and Optimal Decoding for Machine Translation]] A* search, hill climbing, and integer linear programming (ILP) decoding for word-based models
   * [[https://www.aclweb.org/anthology/N03-1019.pdf|Kumar & Byrne 2003 - A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation]]
+  * [[https://www.aclweb.org/anthology/N09-2002.pdf|Riedel & Clark 2009 - Revisiting Optimal Decoding for Machine Translation IBM Model 4]] Uses an ILP to decode IBM Model 4
   * Concavity of IBM model 1: [[https://www.aclweb.org/anthology/N12-1069.pdf|Gimpel & Smith 2012]]
   * [[https://www.aclweb.org/anthology/N13-1073.pdf|Dyer et al 2013 - A Simple, Fast, and Effective Reparameterization of IBM Model 2]]
+  * [[https://arxiv.org/pdf/2004.14675.pdf|Zenkel et al 2020 - End-to-End Neural Word Alignment Outperforms GIZA++]] Not conclusive, since it did not include an extrinsic evaluation in an SMT system. Improvements in alignment error rate (AER) often do not translate into better SMT models
 ===== Phrase-Based Machine Translation (PBMT) =====
@@ Line 20: / Line 23: @@
   * [[https://www.aclweb.org/anthology/J03-1002.pdf|Och & Ney 2003 - A Systematic Comparison of Various Statistical Alignment Models]]
   * [[http://webpages.iust.ac.ir/morteza_zakeri/repo/iust_course_materials/NaturalLanguageProcessing/Project/refs/2004_pharaoh_a%20baem%20search_amta2004.pdf|Koehn 2004 - Pharaoh: a beam search decoder for phrase-based statistical machine translation models]]
+===== Syntax-Based Methods =====
+  * [[https://www.aclweb.org/anthology/P04-1083.pdf|Melamed 2004 - Statistical Machine Translation by Parsing]] Introduces SCFGs for SMT
+  * Hiero
+    * [[https://www.aclweb.org/anthology/P05-1033.pdf|Chiang 2005 - A Hierarchical Phrase-Based Model for Statistical Machine Translation]]
+    * [[https://direct.mit.edu/coli/article-pdf/33/2/201/1798392/coli.2007.33.2.201.pdf|Chiang 2007 - Hierarchical Phrase-Based Translation]]
+  * SAMT
 ===== Training =====
-  * [[https://www.aclweb.org/anthology/P03-1021.pdf|Och 2003 - Minimum Error Rate Training in Statistical Machine Translation]] The standard method for training SMT systems since 2003
+  * MERT: [[https://www.aclweb.org/anthology/P03-1021.pdf|Och 2003 - Minimum Error Rate Training in Statistical Machine Translation]] The standard method for training SMT systems since 2003
+  * Hypergraph MERT: [[https://storage.googleapis.com/pub-tools-public-publication-data/pdf/35496.pdf|Kumar et al 2008 - Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices]]
 ===== Evaluation =====
   * Metrics
-    * BLEU
+    * [[https://www.aclweb.org/anthology/P02-1040.pdf|Papineni et al 2002 - BLEU: a Method for Automatic Evaluation of Machine Translation]]
     * [[https://www.aclweb.org/anthology/W05-0909.pdf|Banerjee & Lavie 2005 - METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments]]
   * Methodology
     * [[https://www.aclweb.org/anthology/W04-3250.pdf|Koehn 2004 - Statistical Significance Tests for Machine Translation Evaluation]]
+    * [[https://www.aclweb.org/anthology/P11-2031.pdf|Clark et al 2011 - Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability]]
 ===== Software =====
-  * GIZA++ Open source reimplementation of IBM alignment models
+  * [[http://www.statmt.org/moses/giza/GIZA++.html|GIZA++]] Open source reimplementation of IBM alignment models [[https://github.com/moses-smt/giza-pp|GitHub]]
   * [[https://www.isi.edu/licensed-sw/rewrite-decoder/|ReWrite Decoder]] Open source implementation of word-based SMT
   * [[http://www.statmt.org/moses/|Moses]]
-  * cdec
+  * **cdec** [[https://github.com/redpony/cdec|GitHub]] [[https://web.archive.org/web/20150224014619/http://www.cdec-decoder.org/|Old website]] [[https://web.archive.org/web/20150220064636/http://cdec-decoder.org/index.php?title=Main_Page|Older website]]
 ===== Related Pages =====