Table of Contents
Structured Prediction Energy Networks
(Also known as SPENs.) Summary (from Lyu, 2019):
Structured prediction energy networks (SPENs) are trained to assign global energy scores to output structures, and gradient descent is used during inference to minimize the global energy (Belanger and McCallum, 2016). Since gradient descent involves iterative optimization, its steps can be viewed as iterative refinement. In particular, Belanger et al. (2017) build a SPEN for SRL, but for the span-based formalism rather than the dependency-based one considered in this work. While they improve over their baseline, that baseline uses a multi-layer perceptron to encode local factors, so its encoding capacity is limited. Moreover, their refined model performs worse than their baseline in the out-of-domain setting, indicating overfitting (Belanger et al., 2017).
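The gradient-descent inference above can be sketched as follows. This is a minimal toy illustration, not Belanger and McCallum's actual architecture: the energy here is an assumed simple quadratic with a hand-derived gradient, and the discrete output is relaxed to a continuous vector, so each descent step plays the role of one refinement iteration.

```python
import numpy as np

def energy(y, W, b):
    # Toy quadratic energy E(y) = 0.5 * y^T W y + b^T y
    # (stand-in for a learned global energy over output structures)
    return 0.5 * y @ W @ y + b @ y

def energy_grad(y, W, b):
    # Analytic gradient of the toy energy w.r.t. the relaxed output y
    return W @ y + b

def spen_inference(W, b, dim, steps=500, lr=0.1):
    """Inference = gradient descent on the energy over the output y."""
    y = np.zeros(dim)  # relaxed (continuous) output, initialized at zero
    for _ in range(steps):
        y -= lr * energy_grad(y, W, b)  # each step is one refinement
    return y

# Usage: for a positive-definite W the minimizer is y* = -W^{-1} b.
W = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([-2.0, -4.0])
y_hat = spen_inference(W, b, dim=2)  # converges toward [1.0, 1.0]
```

In a real SPEN the energy is a neural network over (input, output) pairs and the gradient comes from automatic differentiation; the iterative structure of inference is the same.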
In follow-up work, Tu and Gimpel (2018, 2019) introduce inference networks to replace gradient descent; the inference networks directly produce the refined output. Improvements over competitive baselines are reported on part-of-speech tagging, named entity recognition, and CCG supertagging (Tu and Gimpel, 2019). However, their inference networks distill knowledge from a tractable linear-chain conditional random field (CRF) model, so these methods do not provide direct performance gains. More importantly, the interactions captured in these models are likely local, since they learn to mimic Markov CRFs.
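The idea of replacing iterative descent with an inference network can be sketched like this. The setup is my own toy assumption (a linear "network" and a quadratic energy), not Tu and Gimpel's model: the network's parameters are trained offline to minimize the energy of its outputs, so test-time inference becomes a single forward pass instead of a gradient-descent loop.

```python
import numpy as np

rng = np.random.default_rng(0)
M = np.array([[1.0, 2.0], [3.0, 4.0]])  # defines the toy energy below

def energy(x, y):
    # Toy energy whose minimizer, given input x, is y = M @ x
    d = y - M @ x
    return 0.5 * d @ d

# "Inference network" psi(x) = A @ x; train A so psi's output
# minimizes the energy, amortizing inference over inputs.
A = np.zeros((2, 2))
for _ in range(2000):
    x = rng.normal(size=2)
    y = A @ x                        # one forward pass = inference
    grad_y = y - M @ x               # dE/dy at the network's output
    A -= 0.05 * np.outer(grad_y, x)  # chain rule: dE/dA = (dE/dy) x^T
# After training, A is close to M, so psi(x) hits the energy minimum
# directly, with no per-example optimization loop.
```

The contrast with the previous sketch is the point: there, every test input paid for an optimization loop; here, the cost is shifted into training.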
Papers
- Kevin Gimpel's papers
- Bhattacharyya et al. (2020) - Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models. Interesting paper, but it has some flaws. First, the energy-based models (EBMs) use BERT; for a fair comparison of the merits of EBMs, they should compare the baseline to EBMs without BERT. Second, they use reranking and do not attempt to use the ideas of SPENs to improve decoding.