Sequence to Sequence Models
Decoding Strategies
See also Decoding.
- Nucleus Sampling: Holtzman et al 2019 - The Curious Case of Neural Text Degeneration. Samples from the smallest set of tokens whose cumulative probability exceeds a threshold p (see the top-p sketch after this list).
- Stahlberg & Byrne 2019 - On NMT Search Errors and Model Errors: Cat Got Your Tongue? Exact decoding method for seq2seq models. Follow-up work: Shi et al 2020 - Why Neural Machine Translation Prefers Empty Outputs
- Diverse k-Best and Lattice Decoding
- Xu & Durrett 2021 - Massive-scale Decoding for Text Generation using Lattices. Produces a lattice of diverse generated outputs.
- Constrained Decoding
- Poesia et al 2022 - Synchromesh: Reliable code generation from pre-trained language models. They created a tool that takes an ANTLR parser and a partial string and returns the set of valid next-token completions (see Section 3.1); a generic token-masking sketch follows this list.
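Below is a minimal, dependency-free sketch of nucleus (top-p) sampling as described in Holtzman et al 2019; the toy distribution and threshold are made up for illustration.

<code python>
import random

def nucleus_sample(probs, p=0.9):
    """Top-p (nucleus) sampling: sample from the smallest set of tokens
    whose cumulative probability exceeds p (Holtzman et al 2019)."""
    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, total = [], 0.0
    for i in order:
        nucleus.append(i)
        total += probs[i]
        if total >= p:          # keep the minimal prefix that reaches mass p
            break
    # Renormalize over the nucleus and sample from it.
    weights = [probs[i] / total for i in nucleus]
    return random.choices(nucleus, weights=weights, k=1)[0]

# Toy next-token distribution over a 5-token vocabulary.
print(nucleus_sample([0.5, 0.2, 0.15, 0.1, 0.05], p=0.8))
</code>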
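And a minimal sketch of the general constrained-decoding idea behind tools like Synchromesh: mask out every next token the constraint oracle rejects before picking one. The `valid_next_tokens` callback here is a hypothetical stub, not Synchromesh's actual completion-engine API.

<code python>
import math

def constrained_step(logits, vocab, valid_next_tokens):
    """One decoding step with hard constraints: set the logit of every
    token the constraint oracle rejects to -inf, then pick the best survivor."""
    allowed = valid_next_tokens()              # set of currently legal tokens
    masked = [l if tok in allowed else -math.inf
              for l, tok in zip(logits, vocab)]
    best = max(range(len(vocab)), key=lambda i: masked[i])
    return vocab[best]

# Hypothetical example: only ")" or "1" may legally follow the current prefix.
vocab = ["(", ")", "1", "+", "foo"]
logits = [2.0, 0.5, 1.0, 3.0, 4.0]             # model prefers "foo", but it's illegal
print(constrained_step(logits, vocab, lambda: {")", "1"}))   # -> "1"
</code>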
Issues in Seq2Seq Models
Length Issues
- Shi et al 2020 - Why Neural Machine Translation Prefers Empty Outputs. If you add a different EOS token for each target length and do not apply label smoothing to the EOS tokens, the empty-output and length problems go away (a toy preprocessing sketch follows this list).
- Liang et al 2022 - The Implicit Length Bias of Label Smoothing on Beam Search Decoding. Introduces a correction for the length bias that label smoothing induces in beam search, which improves translation quality.
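A toy illustration of the per-length EOS idea summarized above, as a target-side preprocessing step; the token format and the length cap are assumptions, not the exact setup from Shi et al 2020 (who also disable label smoothing on these EOS positions).

<code python>
MAX_LEN_BUCKET = 50   # assumed cap; longer targets share one bucket

def add_length_eos(target_tokens):
    """Append a length-specific EOS token instead of a single shared <eos>,
    so ending a sentence is no longer the same cheap event at every length."""
    bucket = min(len(target_tokens), MAX_LEN_BUCKET)
    return target_tokens + [f"<eos_{bucket}>"]

print(add_length_eos(["das", "ist", "gut"]))   # ['das', 'ist', 'gut', '<eos_3>']
</code>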
Exposure Bias
See also this post (references at the bottom).
- Wang & Sennrich 2020 - On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation. Uses minimum risk training (i.e. a risk loss function), which shows a consistent improvement across models; a sketch of the MRT objective follows below.
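A minimal sketch of a minimum risk training (MRT) objective over a set of sampled candidates, assuming PyTorch; `log_probs` are per-candidate sequence log-probabilities under the model, `costs` could be 1 - sentence-BLEU, and the smoothing factor `alpha` is an assumed hyperparameter, so this is illustrative rather than Wang & Sennrich's exact recipe.

<code python>
import torch

def mrt_loss(log_probs, costs, alpha=0.005):
    """Expected risk over sampled candidates: renormalize the (sharpened)
    model probabilities over the sample, then weight each candidate's cost."""
    q = torch.softmax(alpha * log_probs, dim=0)   # distribution over candidates
    return (q * costs).sum()                      # differentiable w.r.t. log_probs

# Toy example: 3 sampled translations with their costs (1 - BLEU).
log_probs = torch.tensor([-2.0, -3.5, -1.0], requires_grad=True)
costs = torch.tensor([0.4, 0.9, 0.2])
loss = mrt_loss(log_probs, costs)
loss.backward()
</code>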
Scheduled Sampling
- Bengio et al 2015 - Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. Scheduled sampling tries to avoid the exposure bias of teacher forcing by sometimes feeding the model its own predictions during training, with the mixing probability following a schedule (see the sketch after this list).
- Scheduled Sampling is actually DAGGER: Ross et al 2010 - A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning (see Graham Neubig's slides)
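A minimal sketch of the scheduled-sampling mixing rule: at each decoder position, flip a coin to feed either the gold previous token or the model's own prediction, with the teacher-forcing probability decaying over training. The inverse-sigmoid schedule and the greedy `model_step` stub are illustrative assumptions.

<code python>
import math, random

def teacher_forcing_prob(step, k=100.0):
    """Inverse-sigmoid decay: start near 1 (pure teacher forcing),
    decay toward 0 (pure model predictions) as training progresses."""
    return k / (k + math.exp(step / k))

def scheduled_sampling_inputs(gold_tokens, model_step, train_step):
    """Build decoder inputs token by token, feeding either the gold token
    or the model's own previous prediction, chosen by a coin flip."""
    eps = teacher_forcing_prob(train_step)
    inputs = ["<bos>"]
    for gold in gold_tokens[:-1]:
        prev = gold if random.random() < eps else model_step(inputs)
        inputs.append(prev)
    return inputs

# Hypothetical greedy model stub: always predicts "the".
print(scheduled_sampling_inputs(["a", "cat", "sat", "<eos>"],
                                model_step=lambda prefix: "the",
                                train_step=500))
</code>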
Sequence to Sequence Model Variants
- Noisy channel model: Yee et al 2019 - Simple and Effective Noisy Channel Modeling for Neural Machine Translation. “noisy channel models can outperform a direct model by up to 3.2 BLEU” (a re-ranking sketch follows this list).
- Fast Variants
- Gehring et al 2017 - Convolutional Sequence to Sequence Learning 10x faster. Very strong (high BLEU score) baseline given in Edunov et al 2017.
- Non-Autoregressive, see Non-Autoregressive Seq2seq
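A minimal sketch of noisy-channel re-ranking of an n-best list, combining a direct model log P(y|x), a channel model log P(x|y), and a language model log P(y); the interpolation weights and the lack of length normalization are simplifications relative to Yee et al 2019.

<code python>
def noisy_channel_rerank(candidates, lam=1.0, mu=0.3):
    """Re-rank an n-best list with the combined score
        score(y) = log P(y|x) + lam * log P(x|y) + mu * log P(y)."""
    def score(c):
        return c["direct"] + lam * c["channel"] + mu * c["lm"]
    return sorted(candidates, key=score, reverse=True)

# Toy n-best list with made-up log-probabilities.
nbest = [
    {"text": "the cat sat", "direct": -3.0, "channel": -4.0, "lm": -2.5},
    {"text": "a cat sat",   "direct": -3.2, "channel": -3.1, "lm": -2.4},
]
print(noisy_channel_rerank(nbest)[0]["text"])   # -> "a cat sat"
</code>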
Datasets
- Standard seq2seq datasets
- WMT 2014 & 2016 (En-De and En-Fr)
- Neural abstractive summarization (Rush 2015)
- Easy dialog datasets
- Some easy semantic parsing datasets? E2E dataset?
Misc Papers
Related Pages