====== Structured Prediction Energy Networks ======

(Also known as SPENs.)

Summary (from [[https://www.aclweb.org/anthology/D19-1099.pdf|Lyu - 2019]]):
Structured prediction energy networks (SPENs) are trained to assign global energy scores to output structures, and gradient descent is used during inference to minimize the global energy (Belanger and McCallum, 2016). Since gradient descent involves iterative optimization, its steps can be viewed as iterative refinement. In particular, Belanger et al. (2017) build a SPEN for SRL, but for the span-based formalism, not the dependency one we consider in this work. While they improve over their baseline model, that baseline uses a multi-layer perceptron to encode local factors, so its encoding power is limited. Moreover, their refined model performs worse than their baseline in the out-of-domain setting, indicating overfitting (Belanger et al., 2017). In follow-up work, Tu and Gimpel (2018, 2019) introduce inference networks to replace gradient descent; their inference networks directly refine the output. Improvements over competitive baselines are reported on part-of-speech tagging, named entity recognition, and CCG supertagging (Tu and Gimpel, 2019). However, their inference networks distill knowledge from a tractable linear-chain conditional random field (CRF) model, so these methods do not provide direct performance gains.
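The gradient-descent inference described above can be sketched on a toy problem. This is a minimal illustration, not any paper's actual model: the energy function, its weights, and the step size are all made up for the example. It uses a continuous relaxation y ∈ [0, 1]^L of a binary output structure, a quadratic "global" energy term standing in for a learned energy network, and projected gradient descent as the iterative refinement loop.

```python
import numpy as np

# Toy SPEN-style inference sketch (all quantities hypothetical):
# E(y) = -s·y + 0.5 y^T W y, where s plays the role of local scores
# from an input encoder and W a learned global interaction term.

def energy(y, local_scores, W):
    return -local_scores @ y + 0.5 * y @ W @ y

def energy_grad(y, local_scores, W):
    # Analytic gradient of the toy energy (autodiff in a real SPEN).
    return -local_scores + W @ y

def spen_inference(local_scores, W, steps=100, lr=0.1):
    y = np.full_like(local_scores, 0.5)       # uniform starting relaxation
    for _ in range(steps):
        y = y - lr * energy_grad(y, local_scores, W)
        y = np.clip(y, 0.0, 1.0)              # project back onto [0, 1]^L
    return y

rng = np.random.default_rng(0)
local_scores = rng.normal(size=6)
A = rng.normal(size=(6, 6))
W = 0.5 * (A + A.T) + 6.0 * np.eye(6)         # symmetric, positive definite
y = spen_inference(local_scores, W)
print(np.round(y, 2))
```

Each gradient step is one "refinement" of the candidate output; a final rounding of y would recover a discrete structure. The inference-network variants of Tu and Gimpel replace this inner loop with a single forward pass of a trained network.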
More importantly, the interactions captured in these models are likely local, as they learn to mimic Markov CRFs.

===== Papers =====

  * [[https://arxiv.org/pdf/1511.06350.pdf|Belanger & McCallum 2015 - Structured Prediction Energy Networks]]
  * [[https://arxiv.org/pdf/1703.05667.pdf|Belanger et al 2017 - End-to-End Learning for Structured Prediction Energy Networks]]
  * [[https://www.aclweb.org/anthology/N18-2021.pdf|Rooshenas et al 2018 - Training Structured Prediction Energy Networks with Indirect Supervision]]
  * [[https://arxiv.org/pdf/1812.09603.pdf|Rooshenas et al 2018 - Search-Guided, Lightly-Supervised Training of Structured Prediction Energy Networks]]
  * Kevin Gimpel's papers:
    * [[https://arxiv.org/pdf/1803.03376.pdf|Tu & Gimpel 2018 - Learning Approximate Inference Networks for Structured Prediction]]
    * [[https://www.aclweb.org/anthology/N19-1335.pdf|Tu & Gimpel 2019 - Benchmarking Approximate Inference Methods for Neural Structured Prediction]]
    * [[https://www.aclweb.org/anthology/2020.spnlp-1.8.pdf|Tu et al 2020 - Improving Joint Training of Inference Networks and Structured Prediction Energy Networks]]
    * [[https://www.aclweb.org/anthology/2020.acl-main.251.pdf|Tu et al 2020 - ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation]]
  * [[https://www.aclweb.org/anthology/W19-4109.pdf|Trinh et al 2019 - Energy-Based Modelling for Dialogue State Tracking]]
  * [[https://www.aclweb.org/anthology/2020.spnlp-1.5.pdf|Trinh et al 2020 - Energy-based Neural Modelling for Large-Scale Multiple Domain Dialogue State Tracking]]
  * [[https://arxiv.org/pdf/2009.13267.pdf|Bhattacharyya et al 2020 - Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models]]

The Bhattacharyya et al. paper is interesting, but it has some flaws. First, their energy-based models (EBMs) use BERT; for a fair comparison of the merits of EBMs, they should compare the baseline against EBMs without BERT.
Second, they use reranking and do not attempt to use the ideas of SPENs to improve the decoding itself.

===== Related Pages =====

  * [[Structured Prediction]]
  * [[Integer Linear Programming]]