Structured Prediction Energy Networks
(Also known as SPENs.) Summary (from Lyu, 2019):
Structured prediction energy networks (SPENs) are trained to assign global energy scores to output structures, and gradient descent is used during inference to minimize the global energy (Belanger and McCallum, 2016). Since gradient descent involves iterative optimization, its steps can be viewed as iterative refinement. In particular, Belanger et al. (2017) build a SPEN for SRL, but for the span-based formalism, not the dependency-based one considered in this work. While they improve over their baseline model, that baseline uses a multi-layer perceptron to encode local factors, so the encoder's power is limited. Moreover, their refined model performs worse than their baseline in the out-of-domain setting, indicating overfitting (Belanger et al., 2017).
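The gradient-descent inference described above can be sketched on a toy problem. This is a minimal illustration, not the models from the cited papers: the quadratic energy `energy`, the box relaxation of the output, and all hyperparameters are assumptions chosen for clarity.

```python
import numpy as np

def energy(y, A, b):
    # Toy global energy over a relaxed output vector y in [0, 1]^n.
    # A real SPEN would compute this with a learned neural network.
    return y @ A @ y + b @ y

def energy_grad(y, A, b):
    # Analytic gradient of the toy quadratic energy.
    return (A + A.T) @ y + b

def gd_inference(A, b, n_steps=100, lr=0.05, seed=0):
    """Projected gradient descent on the energy: each step refines
    the relaxed output, which is the 'iterative refinement' view."""
    rng = np.random.default_rng(seed)
    y = rng.uniform(0.2, 0.8, size=b.shape)  # random relaxed initialization
    for _ in range(n_steps):
        y = y - lr * energy_grad(y, A, b)    # descend the global energy
        y = np.clip(y, 0.0, 1.0)             # project back into the box [0, 1]^n
    return y
```

At test time the final `y` would be rounded to a discrete structure; the key point is that inference itself is an optimization loop rather than a single forward pass.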
In follow-up work, Tu and Gimpel (2018, 2019) introduce inference networks to replace gradient descent. Their inference networks directly produce the refined output. They report improvements over competitive baselines on part-of-speech tagging, named entity recognition, and CCG supertagging (Tu and Gimpel, 2019). However, their inference networks distill knowledge from a tractable linear-chain conditional random field (CRF) model, so these methods do not provide direct performance gains. More importantly, the interactions captured in these models are likely local, as they learn to mimic Markov CRFs.
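The idea of replacing test-time gradient descent with an inference network can be sketched as follows. This is a hypothetical one-layer toy, not Tu and Gimpel's architecture: the quadratic energy, the sigmoid parameterization, and all names and hyperparameters are illustrative assumptions. The network's parameters are trained to minimize the energy of its own output, so at test time a single forward pass replaces the inner optimization loop.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def energy(y, A, b):
    # Same toy quadratic energy; a real SPEN would use a learned network.
    return y @ A @ y + b @ y

def train_inference_net(x, A, b, n_steps=200, lr=0.1, seed=0):
    """Train a one-layer inference net y = sigmoid(W @ x) so that its
    output has low energy, amortizing the inner optimization."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(b.size, x.size))
    for _ in range(n_steps):
        y = sigmoid(W @ x)
        dE_dy = (A + A.T) @ y + b        # gradient of the energy w.r.t. y
        dE_dz = dE_dy * y * (1.0 - y)    # backprop through the sigmoid
        W -= lr * np.outer(dE_dz, x)     # chain rule through the linear layer
    return W

# at test time, one forward pass produces the refined output:
# y_hat = sigmoid(W @ x)
```

Note the contrast with gradient-descent inference: the optimization cost is paid once during training, over the network's weights, rather than at every test input.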