Table of Contents
Structured Prediction Energy Networks
(Also known as SPENs.) Summary (from Lyu, 2019):
Structured prediction energy networks (SPENs) are trained to assign global energy scores to output structures, and gradient descent is used during inference to minimize the global energy (Belanger and McCallum, 2016). Since gradient descent involves iterative optimization, its steps can be viewed as iterative refinement. In particular, Belanger et al. (2017) build a SPEN for SRL, but for the span-based formalism rather than the dependency-based one considered in this work. While they improve over their baseline, that baseline uses a multi-layer perceptron to encode local factors, so its encoding capacity is limited. Moreover, their refined model performs worse than their baseline in the out-of-domain setting, indicating overfitting (Belanger et al., 2017).
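The gradient-descent inference above can be sketched as follows. This is a minimal toy illustration, not Belanger and McCallum's actual architecture: the energy here is an assumed simple quadratic with a hand-derived gradient, and the discrete output is relaxed to a continuous vector, so each descent step plays the role of one refinement iteration.

```python
import numpy as np

def energy(y, W, b):
    # Toy quadratic energy E(y) = 0.5 * y^T W y + b^T y
    # (stand-in for a learned global energy over output structures)
    return 0.5 * y @ W @ y + b @ y

def energy_grad(y, W, b):
    # Analytic gradient of the toy energy w.r.t. the relaxed output y
    return W @ y + b

def spen_inference(W, b, dim, steps=500, lr=0.1):
    """Inference = gradient descent on the energy over the output y."""
    y = np.zeros(dim)  # relaxed (continuous) output, initialized at zero
    for _ in range(steps):
        y -= lr * energy_grad(y, W, b)  # each step is one refinement
    return y

# Usage: for a positive-definite W the minimizer is y* = -W^{-1} b.
W = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([-2.0, -4.0])
y_hat = spen_inference(W, b, dim=2)  # converges toward [1.0, 1.0]
```

In a real SPEN the energy is a neural network over (input, output) pairs and the gradient comes from automatic differentiation; the iterative structure of inference is the same.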
In follow-up work, Tu and Gimpel (2018, 2019) introduce inference networks to replace gradient descent; the inference networks directly produce the refined output. Improvements over competitive baselines are reported on part-of-speech tagging, named entity recognition, and CCG supertagging (Tu and Gimpel, 2019). However, their inference networks distill knowledge from a tractable linear-chain conditional random field (CRF) model, so these methods do not provide direct performance gains. More importantly, the interactions captured in these models are likely local, since they learn to mimic Markov CRFs.
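The idea of replacing iterative descent with an inference network can be sketched like this. The setup is my own toy assumption (a linear "network" and a quadratic energy), not Tu and Gimpel's model: the network's parameters are trained offline to minimize the energy of its outputs, so test-time inference becomes a single forward pass instead of a gradient-descent loop.

```python
import numpy as np

rng = np.random.default_rng(0)
M = np.array([[1.0, 2.0], [3.0, 4.0]])  # defines the toy energy below

def energy(x, y):
    # Toy energy whose minimizer, given input x, is y = M @ x
    d = y - M @ x
    return 0.5 * d @ d

# "Inference network" psi(x) = A @ x; train A so psi's output
# minimizes the energy, amortizing inference over inputs.
A = np.zeros((2, 2))
for _ in range(2000):
    x = rng.normal(size=2)
    y = A @ x                        # one forward pass = inference
    grad_y = y - M @ x               # dE/dy at the network's output
    A -= 0.05 * np.outer(grad_y, x)  # chain rule: dE/dA = (dE/dy) x^T
# After training, A is close to M, so psi(x) hits the energy minimum
# directly, with no per-example optimization loop.
```

The contrast with the previous sketch is the point: there, every test input paid for an optimization loop; here, the cost is shifted into training.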
Papers
- Kevin Gimpel's papers
- Bhattacharyya et al. (2020) - Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models. Interesting paper, but it has some flaws. First, the energy-based models (EBMs) use BERT; for a fair comparison of the merits of EBMs, they should compare the baseline to EBMs without BERT. Second, they use reranking and do not attempt to use the ideas of SPENs to improve decoding.