


Non-Autoregressive Sequence-to-Sequence Models

Non-autoregressive seq2seq models produce all output tokens in parallel rather than one token at a time.

Autoregressive vs Non-Autoregressive

The best definition of an autoregressive model comes from Gu 2017.

Definition: Autoregressive (Gu 2017). An autoregressive model operates one step at a time: it generates each token conditioned on the sequence of tokens previously generated.
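A minimal sketch of the difference, assuming a toy copy task. The names `decode_autoregressive`, `decode_non_autoregressive`, the toy predictors, and the `BOS`/`EOS` ids are hypothetical and only illustrate the control flow; they are not taken from Gu 2017 or any library.

```python
from typing import Callable, List

BOS, EOS = 0, 1  # hypothetical special-token ids


def decode_autoregressive(
    step: Callable[[List[int], List[int]], int],
    source: List[int],
    max_len: int = 10,
) -> List[int]:
    """Generate one token per step; each step sees all previously generated tokens."""
    prefix = [BOS]
    for _ in range(max_len):
        next_token = step(source, prefix)  # conditioned on the prefix so far
        prefix.append(next_token)
        if next_token == EOS:
            break
    return prefix[1:]  # drop BOS


def decode_non_autoregressive(
    predict_length: Callable[[List[int]], int],
    predict_position: Callable[[List[int], int], int],
    source: List[int],
) -> List[int]:
    """Predict the output length first, then fill every position independently.

    No position looks at another output token, so the per-position calls
    could run in parallel (they run in a plain loop here for simplicity).
    """
    length = predict_length(source)
    return [predict_position(source, i) for i in range(length)]


# Toy "models" for a copy task (assumption: the output is the source followed by EOS).
def toy_step(source: List[int], prefix: List[int]) -> int:
    i = len(prefix) - 1  # number of tokens generated so far
    return source[i] if i < len(source) else EOS


def toy_length(source: List[int]) -> int:
    return len(source) + 1


def toy_position(source: List[int], i: int) -> int:
    return source[i] if i < len(source) else EOS


source = [5, 6, 7]
ar_output = decode_autoregressive(toy_step, source)
nar_output = decode_non_autoregressive(toy_length, toy_position, source)
```

In this sketch the autoregressive loop needs one model call per output token, each waiting on the previous one, while the non-autoregressive version needs a length prediction plus position-wise calls that do not depend on each other.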

Examples of autoregressive models (from [[https://arxiv.org/pdf/1711.02281.pdf|Gu 2017]]) include decoders built on RNNs, LSTMs, CNNs, and Transformers.

Summary

From Zhou & Keung 2020 - Improving Non-autoregressive Neural Machine Translation with Monolingual Data:

Many non-autoregressive (NAR) translation methods have been proposed, including latent space models (Gu et al., 2017; Ma et al., 2019; Shu et al., 2019), iterative refinement methods (Lee et al., 2018; Ghazvininejad et al., 2019), and alternative loss functions (Libovicky and Helcl, 2018; Wang et al., 2019; Wei et al., 2019; Li et al., 2019; Shao et al., 2019). The decoding speedup for NAR models is typically 2-15× depending on the specific setup (e.g., the number of length candidates, number of latent samples, etc.), and NAR models can be tuned to achieve different trade-offs between time complexity and decoding quality (Gu et al., 2017; Wei et al., 2019; Ghazvininejad et al., 2019; Ma et al., 2019).

All these methods are based on transformer modules (Vaswani et al., 2017), and depend on a well-trained AR model to obtain its output translations to create targets for NAR model training.
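The dependence on a trained AR model described above can be sketched as a data-preparation step: the AR model translates each source sentence, and its outputs replace the reference translations as NAR training targets. This is a minimal sketch under stated assumptions; `build_distilled_corpus` and `toy_ar_translate` are hypothetical names, and the toy "teacher" merely uppercases its input in place of a real translation model.

```python
from typing import Callable, List, Tuple


def build_distilled_corpus(
    sources: List[str],
    ar_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each source sentence with the AR teacher's output translation.

    The resulting (source, teacher_output) pairs would serve as training
    data for the NAR model instead of (source, reference) pairs.
    """
    return [(src, ar_translate(src)) for src in sources]


def toy_ar_translate(sentence: str) -> str:
    # Stand-in for a trained AR model (assumption): real systems would decode here.
    return sentence.upper()


distilled = build_distilled_corpus(["guten tag", "danke"], toy_ar_translate)
```

With monolingual source-side data, the same function could be applied to sentences that have no reference translation at all, since only the teacher's output is needed.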

Key Papers

nlp/non-autoregressive_seq2seq.1626205779.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
