Sequence to Sequence Models
Decoding Strategies
See also Decoding.
- Nucleus Sampling: Holtzman et al 2019 - The Curious Case of Neural Text Degeneration. Samples from the smallest set of tokens whose cumulative probability exceeds a threshold p (see the top-p sketch after this list).
- Stahlberg & Byrne 2019 - On NMT Search Errors and Model Errors: Cat Got Your Tongue? Exact decoding method for seq2seq models. Follow-up work: Shi et al 2020 - Why Neural Machine Translation Prefers Empty Outputs
- Diverse k-Best and Lattice Decoding
- Xu & Durrett 2021 - Massive-scale Decoding for Text Generation using Lattices. Produces a lattice of diverse generated outputs.
- Constrained Decoding
- Poesia et al 2022 - Synchromesh: Reliable code generation from pre-trained language models. They created a tool that takes an ANTLR parser and a partial string and returns the set of valid next-token completions (see Section 3.1); a generic token-masking sketch follows this list.
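Below is a minimal, dependency-free sketch of nucleus (top-p) sampling as described in Holtzman et al 2019; the toy distribution and threshold are made up for illustration.

<code python>
import random

def nucleus_sample(probs, p=0.9):
    """Top-p (nucleus) sampling: sample from the smallest set of tokens
    whose cumulative probability exceeds p (Holtzman et al 2019)."""
    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, total = [], 0.0
    for i in order:
        nucleus.append(i)
        total += probs[i]
        if total >= p:          # keep the minimal prefix that reaches mass p
            break
    # Renormalize over the nucleus and sample from it.
    weights = [probs[i] / total for i in nucleus]
    return random.choices(nucleus, weights=weights, k=1)[0]

# Toy next-token distribution over a 5-token vocabulary.
print(nucleus_sample([0.5, 0.2, 0.15, 0.1, 0.05], p=0.8))
</code>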
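And a minimal sketch of the general constrained-decoding idea behind tools like Synchromesh: mask out every next token the constraint oracle rejects before picking one. The `valid_next_tokens` callback here is a hypothetical stub, not Synchromesh's actual completion-engine API.

<code python>
import math

def constrained_step(logits, vocab, valid_next_tokens):
    """One decoding step with hard constraints: set the logit of every
    token the constraint oracle rejects to -inf, then pick the best survivor."""
    allowed = valid_next_tokens()              # set of currently legal tokens
    masked = [l if tok in allowed else -math.inf
              for l, tok in zip(logits, vocab)]
    best = max(range(len(vocab)), key=lambda i: masked[i])
    return vocab[best]

# Hypothetical example: only ")" or "1" may legally follow the current prefix.
vocab = ["(", ")", "1", "+", "foo"]
logits = [2.0, 0.5, 1.0, 3.0, 4.0]             # model prefers "foo", but it's illegal
print(constrained_step(logits, vocab, lambda: {")", "1"}))   # -> "1"
</code>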
Issues in Seq2Seq Models
Length Issues
- Shi et al 2020 - Why Neural Machine Translation Prefers Empty Outputs. If you add a different EOS token for each target length and do not apply label smoothing to the EOS tokens, the empty-output and length problems go away (a toy preprocessing sketch follows this list).
- Liang et al 2022 - The Implicit Length Bias of Label Smoothing on Beam Search Decoding. Introduces a correction for the length bias that label smoothing induces in beam search, which improves translation quality.
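A toy illustration of the per-length EOS idea summarized above, as a target-side preprocessing step; the token format and the length cap are assumptions, not the exact setup from Shi et al 2020 (who also disable label smoothing on these EOS positions).

<code python>
MAX_LEN_BUCKET = 50   # assumed cap; longer targets share one bucket

def add_length_eos(target_tokens):
    """Append a length-specific EOS token instead of a single shared <eos>,
    so ending a sentence is no longer the same cheap event at every length."""
    bucket = min(len(target_tokens), MAX_LEN_BUCKET)
    return target_tokens + [f"<eos_{bucket}>"]

print(add_length_eos(["das", "ist", "gut"]))   # ['das', 'ist', 'gut', '<eos_3>']
</code>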
Exposure Bias
See also this post (references at the bottom).
- Wang & Sennrich 2020 - On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation. Uses minimum risk training (i.e. a risk loss function), which shows a consistent improvement across models; a sketch of the MRT objective follows below.
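A minimal sketch of a minimum risk training (MRT) objective over a set of sampled candidates, assuming PyTorch; `log_probs` are per-candidate sequence log-probabilities under the model, `costs` could be 1 - sentence-BLEU, and the smoothing factor `alpha` is an assumed hyperparameter, so this is illustrative rather than Wang & Sennrich's exact recipe.

<code python>
import torch

def mrt_loss(log_probs, costs, alpha=0.005):
    """Expected risk over sampled candidates: renormalize the (sharpened)
    model probabilities over the sample, then weight each candidate's cost."""
    q = torch.softmax(alpha * log_probs, dim=0)   # distribution over candidates
    return (q * costs).sum()                      # differentiable w.r.t. log_probs

# Toy example: 3 sampled translations with their costs (1 - BLEU).
log_probs = torch.tensor([-2.0, -3.5, -1.0], requires_grad=True)
costs = torch.tensor([0.4, 0.9, 0.2])
loss = mrt_loss(log_probs, costs)
loss.backward()
</code>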
Scheduled Sampling
- Bengio et al 2015 - Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. Scheduled sampling tries to avoid the exposure bias of teacher forcing by sometimes feeding the model its own predictions during training, with the mixing probability following a schedule (see the sketch after this list).
- Scheduled Sampling is actually DAGGER: Ross et al 2010 - A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning (see Graham Neubig's slides)
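A minimal sketch of the scheduled-sampling mixing rule: at each decoder position, flip a coin to feed either the gold previous token or the model's own prediction, with the teacher-forcing probability decaying over training. The inverse-sigmoid schedule and the greedy `model_step` stub are illustrative assumptions.

<code python>
import math, random

def teacher_forcing_prob(step, k=100.0):
    """Inverse-sigmoid decay: start near 1 (pure teacher forcing),
    decay toward 0 (pure model predictions) as training progresses."""
    return k / (k + math.exp(step / k))

def scheduled_sampling_inputs(gold_tokens, model_step, train_step):
    """Build decoder inputs token by token, feeding either the gold token
    or the model's own previous prediction, chosen by a coin flip."""
    eps = teacher_forcing_prob(train_step)
    inputs = ["<bos>"]
    for gold in gold_tokens[:-1]:
        prev = gold if random.random() < eps else model_step(inputs)
        inputs.append(prev)
    return inputs

# Hypothetical greedy model stub: always predicts "the".
print(scheduled_sampling_inputs(["a", "cat", "sat", "<eos>"],
                                model_step=lambda prefix: "the",
                                train_step=500))
</code>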
Sequence to Sequence Model Variants
- Noisy channel model: Yee et al 2019 - Simple and Effective Noisy Channel Modeling for Neural Machine Translation. “noisy channel models can outperform a direct model by up to 3.2 BLEU” (a re-ranking sketch follows this list).
- Fast Variants
- Gehring et al 2017 - Convolutional Sequence to Sequence Learning 10x faster. Very strong (high BLEU score) baseline given in Edunov et al 2017.
- Non-Autoregressive, see Non-Autoregressive Seq2seq
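A minimal sketch of noisy-channel re-ranking of an n-best list, combining a direct model log P(y|x), a channel model log P(x|y), and a language model log P(y); the interpolation weights and the lack of length normalization are simplifications relative to Yee et al 2019.

<code python>
def noisy_channel_rerank(candidates, lam=1.0, mu=0.3):
    """Re-rank an n-best list with the combined score
        score(y) = log P(y|x) + lam * log P(x|y) + mu * log P(y)."""
    def score(c):
        return c["direct"] + lam * c["channel"] + mu * c["lm"]
    return sorted(candidates, key=score, reverse=True)

# Toy n-best list with made-up log-probabilities.
nbest = [
    {"text": "the cat sat", "direct": -3.0, "channel": -4.0, "lm": -2.5},
    {"text": "a cat sat",   "direct": -3.2, "channel": -3.1, "lm": -2.4},
]
print(noisy_channel_rerank(nbest)[0]["text"])   # -> "a cat sat"
</code>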
Datasets
- Standard seq2seq datasets
- WMT 2014 & 2016 (En-De and En-Fr)
- Neural abstractive summarization (Rush 2015)
- Easy dialog datasets
- Some easy semantic parsing datasets? E2E dataset?
Misc Papers
Related Pages