Sequence to Sequence Models
Decoding Strategies
See also Decoding.
- Nucleus Sampling: Holtzman et al 2019 - The Curious Case of Neural Text Degeneration. Samples from the smallest set of tokens whose cumulative probability exceeds a threshold p, truncating the unreliable tail of the distribution (see the nucleus-sampling sketch after this list).
- Stahlberg & Byrne 2019 - On NMT Search Errors and Model Errors: Cat Got Your Tongue? Exact decoding for seq2seq models; shows that the highest-scoring output under the model is often the empty sequence, so beam search errors partially mask model errors. Follow-up work: Shi et al 2020 - Why Neural Machine Translation Prefers Empty Outputs
- Diverse k-Best and Lattice Decoding
- Xu & Durrett 2021 - Massive-scale Decoding for Text Generation using Lattices. Produces a lattice of diverse generated outputs.
- Constrained Decoding
- Poesia et al 2022 - Synchromesh: Reliable Code Generation from Pre-trained Language Models. They built a tool that takes an ANTLR parser and a string and returns the set of valid next-token completions (see Sect. 3.1, and the constrained-decoding sketch after this list).
- Geng et al 2023 - Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning Elegant solution. Uses Grammatical Framework to constrain the outputs.
- Parallel Decoding
- Speculative Decoding
- Overviews
- Miscellaneous Decoding Techniques
- Contrastive Decoding
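For the nucleus sampling entry above, a minimal sketch of top-p sampling over a single next-token distribution; the function name and the toy distribution are illustrative, not the authors' code.

<code python>
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample a token id from the smallest set of tokens whose
    cumulative probability reaches p (Holtzman et al 2019)."""
    if rng is None:
        rng = np.random.default_rng()
    probs = np.asarray(probs, dtype=np.float64)
    order = np.argsort(probs)[::-1]            # token ids sorted by probability, descending
    sorted_probs = probs[order]
    cumulative = np.cumsum(sorted_probs)
    cutoff = int(np.searchsorted(cumulative, p)) + 1   # minimal prefix whose mass reaches p
    nucleus_ids = order[:cutoff]
    nucleus_probs = sorted_probs[:cutoff]
    nucleus_probs = nucleus_probs / nucleus_probs.sum()  # renormalize within the nucleus
    return int(rng.choice(nucleus_ids, p=nucleus_probs))

# Toy usage: a 5-token vocabulary distribution.
print(nucleus_sample([0.5, 0.2, 0.15, 0.1, 0.05], p=0.8))
</code>

Lowering p makes generation more conservative; p = 1.0 recovers ordinary sampling from the full distribution.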
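And for the constrained decoding entries (Poesia et al 2022, Geng et al 2023), a sketch of the masking step, assuming some grammar oracle (e.g. a completion engine built from an ANTLR parser, as in Synchromesh Sect. 3.1) can tell you which tokens are valid continuations of the current prefix; the toy oracle below is made up for illustration.

<code python>
import numpy as np

def constrained_step(logits, prefix_tokens, valid_next_tokens):
    """Pick the next token greedily, but only among tokens the grammar allows.

    valid_next_tokens(prefix_tokens) stands in for a grammar oracle
    returning the set of allowed token ids for this prefix.
    """
    allowed = valid_next_tokens(prefix_tokens)
    masked = np.full_like(logits, -np.inf, dtype=np.float64)
    for tok in allowed:
        masked[tok] = logits[tok]             # keep scores only for grammar-valid tokens
    return int(np.argmax(masked))             # greedy; could also sample from softmax(masked)

# Toy grammar over token ids {0: "(", 1: ")", 2: "x"}: only well-formed prefixes allowed.
def toy_valid_next_tokens(prefix):
    depth = prefix.count(0) - prefix.count(1)
    allowed = {0, 2}                          # can always open a paren or emit "x"
    if depth > 0:
        allowed.add(1)                        # closing paren only if something is open
    return allowed

logits = np.array([0.1, 2.0, 0.5])            # model prefers ")" but the grammar forbids it here
print(constrained_step(logits, prefix_tokens=[], valid_next_tokens=toy_valid_next_tokens))
</code>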
Issues in Seq2Seq Models
Length Issues
- Shi et al 2020 - Why Neural Machine Translation Prefers Empty Outputs. If you use a different EOS token for each target length, and do not apply label smoothing to the EOS tokens, the empty-output and length problems go away (see the sketch after this list).
- Liang et al 2022 - The Implicit Length Bias of Label Smoothing on Beam Search Decoding. Introduces a method for correcting the length bias induced by label smoothing; the correction improves translation quality.
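A toy sketch of the per-length EOS idea from the Shi et al note above, under my reading of it: the vocabulary gets one EOS token per target length, and label smoothing skips EOS positions. Token names and sizes are made up.

<code python>
import numpy as np

# Hypothetical vocabulary: ordinary tokens plus one EOS token per target length.
MAX_LEN = 4
base_vocab = ["a", "b", "c"]
eos_tokens = [f"<eos_{n}>" for n in range(1, MAX_LEN + 1)]
vocab = {tok: i for i, tok in enumerate(base_vocab + eos_tokens)}
eos_ids = {vocab[t] for t in eos_tokens}

def encode_target(words):
    """Append the EOS token that encodes the sentence's length."""
    ids = [vocab[w] for w in words]
    return ids + [vocab[f"<eos_{len(words)}>"]]

def smoothed_targets(gold_ids, eps=0.1):
    """Label-smoothed targets, but with no smoothing on EOS positions,
    so the model is not pushed to put probability mass on ending early or late."""
    V = len(vocab)
    out = np.zeros((len(gold_ids), V))
    for t, g in enumerate(gold_ids):
        if g in eos_ids:
            out[t, g] = 1.0                   # keep EOS targets sharp
        else:
            out[t] = eps / (V - 1)
            out[t, g] = 1.0 - eps
    return out

ids = encode_target(["a", "b", "c"])
print(ids)                    # ends with the id of <eos_3>
print(smoothed_targets(ids))  # last row is one-hot
</code>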
Exposure Bias
See also this post (references at the bottom).
- Wang & Sennrich 2020 - On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation. Uses minimum risk training (i.e. a risk loss function), which gives a consistent improvement across models (see the sketch after this list).
- Arora et al 2022 - Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation. Shows that exposure bias leads to an accumulation of errors during generation (e.g. repetition), and that perplexity does not capture this.
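For the minimum risk training entry above, a back-of-the-envelope sketch of the risk objective (in the style of Shen et al 2016): sample candidate outputs, renormalize their model probabilities, and take the expected task cost. In a real system this runs on framework tensors so gradients flow back into the model; the numbers below are illustrative.

<code python>
import numpy as np

def mrt_loss(logprobs, costs, alpha=0.005):
    """Expected risk over a set of sampled candidates.

    logprobs: model log p(y|x) for each sampled candidate y
    costs:    task cost per candidate, e.g. 1 - sentence_BLEU(y, reference)
    alpha:    sharpness of the renormalized distribution Q (small in practice)
    """
    logprobs = np.asarray(logprobs, dtype=np.float64)
    scaled = alpha * logprobs
    q = np.exp(scaled - scaled.max())
    q = q / q.sum()                       # Q(y|x) proportional to p(y|x)^alpha over the sample set
    return float(np.dot(q, costs))        # risk = E_Q[cost]; minimized w.r.t. model parameters

# Toy example: three sampled candidates, the most probable one has the lowest cost.
print(mrt_loss(logprobs=[-2.0, -5.0, -9.0], costs=[0.1, 0.4, 0.9]))
</code>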
Scheduled Sampling
- Bengio et al 2015 - Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. Scheduled sampling tries to avoid the exposure bias of teacher forcing by feeding the model its own sampled predictions, instead of the gold tokens, with a probability that follows a schedule during training (see the sketch after this list).
- Scheduled Sampling is actually DAgger: Ross et al 2010 - A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning (see Graham Neubig's slides)
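A rough sketch of how scheduled sampling builds decoder inputs during training, assuming a single-step model_predict function; the inverse-sigmoid decay is one of the schedules from Bengio et al 2015, and the stand-in model here is a dummy.

<code python>
import math
import random

def scheduled_sampling_inputs(gold, step, model_predict, k=1000.0):
    """Build the decoder input sequence for one training example.

    With probability eps(step) the gold previous token is fed (teacher forcing);
    otherwise the model's own prediction is fed, so training-time inputs gradually
    come to resemble inference-time inputs as eps decays.
    """
    eps = k / (k + math.exp(step / k))        # inverse-sigmoid decay schedule
    inputs, prev = [], "<bos>"
    for gold_tok in gold:
        inputs.append(prev)
        pred = model_predict(inputs)          # model's guess for the current position
        prev = gold_tok if random.random() < eps else pred
    return inputs

# Toy usage with a dummy "model" that always predicts "x".
gold = ["the", "cat", "sat", "<eos>"]
print(scheduled_sampling_inputs(gold, step=0, model_predict=lambda prefix: "x"))       # ~teacher forcing
print(scheduled_sampling_inputs(gold, step=50000, model_predict=lambda prefix: "x"))   # ~model's own samples
</code>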
Sequence to Sequence Model Variants
- Noisy channel model: Yee et al 2019 - Simple and Effective Noisy Channel Modeling for Neural Machine Translation. Reranks the direct model's candidates with a channel model p(source|target) and a language model p(target); “noisy channel models can outperform a direct model by up to 3.2 BLEU” (see the reranking sketch after this list).
- Fast Variants
- Gehring et al 2017 - Convolutional Sequence to Sequence Learning 10x faster. Very strong (high BLEU score) baseline given in Edunov et al 2017.
- Non-Autoregressive, see Non-Autoregressive Seq2seq
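For the noisy channel entry above, a minimal reranking sketch: combine log p(y|x) from the direct model with log p(x|y) from a channel model and log p(y) from a language model. The weights, the toy scores, and the absence of length normalization are simplifications on my part, not Yee et al's exact scoring rule.

<code python>
def noisy_channel_rerank(candidates, lam=1.0, mu=1.0):
    """Rescore n-best candidates with a log-linear combination of
    direct model, channel model, and language model scores.

    Each candidate is (y, logp_direct, logp_channel, logp_lm), where
    logp_direct = log p(y|x), logp_channel = log p(x|y), logp_lm = log p(y).
    Real implementations typically also length-normalize the individual terms.
    """
    def score(c):
        _, direct, channel, lm = c
        return direct + lam * channel + mu * lm
    return max(candidates, key=score)[0]

# Toy usage: the direct model slightly prefers a fluent but less faithful candidate;
# the channel model p(x|y) pulls the faithful one back on top.
cands = [
    ("a cat sat down", -1.0, -6.0, -2.0),   # fluent, but explains the source poorly
    ("the cat sat",    -1.2, -3.0, -2.2),   # explains the source well
]
print(noisy_channel_rerank(cands))
</code>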
Datasets
- Standard seq2seq datasets
- WMT 2014 & 2016 (En-De and En-Fr)
- Neural abstractive summarization (Rush 2015)
- Easy dialog datasets
- Some easy semantic parsing datasets? E2E dataset?
Misc Papers
Related Pages