nlp:non-autoregressive_seq2seq

nlp:non-autoregressive_seq2seq [2021/07/13 20:17] – [Autoregressive vs Non-Autoregressive] jmflanig
nlp:non-autoregressive_seq2seq [2024/05/03 03:37] (current) – [Key Papers] jmflanig
Line 6: Line 6:
 generates tokens conditioned on the sequence of tokens previously generated.  In other words, it operates one step at a time. Examples of autoregressive models include RNNs, LSTMs, CNNs (masked convolution layers), and Transformers (from [[https://arxiv.org/pdf/1711.02281.pdf|Gu 2017]]).
  
-**Definition: Non-Autoregressive ([[https://arxiv.org/pdf/1711.02281.pdf|Gu 2017]]):** A non-autoregressive model removes the conditional dependence between output tokens and generates them in parallel.  For an example see equation 3 of [[https://arxiv.org/pdf/1711.02281.pdf|Gu 2017]].
+**Definition: Non-Autoregressive ([[https://arxiv.org/pdf/1711.02281.pdf|Gu 2017]]):** A non-autoregressive model removes the conditional dependence between output tokens and generates them in parallel.  See equation 3 of [[https://arxiv.org/pdf/1711.02281.pdf|Gu 2017]] for an example.
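The two definitions can be contrasted with a minimal sketch. It assumes two hypothetical toy scoring functions, ''toy_step_scores'' (conditioned on the prefix generated so far) and ''toy_position_scores'' (conditioned only on the position, with the source and the target length taken as given). Neither comes from Gu 2017; they only illustrate which information each decoder is allowed to use.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]
EOS = "</s>"


def decode_autoregressive(step_scores: Callable[[List[str]], Scores],
                          max_len: int) -> List[str]:
    """Greedy left-to-right decoding: each token is picked given the prefix."""
    prefix: List[str] = []
    for _ in range(max_len):
        scores = step_scores(prefix)          # depends on earlier outputs
        token = max(scores, key=scores.get)
        prefix.append(token)
        if token == EOS:
            break
    return prefix


def decode_non_autoregressive(position_scores: Callable[[int], Scores],
                              length: int) -> List[str]:
    """Every position is picked independently, so the loop could run in parallel."""
    output = []
    for t in range(length):
        scores = position_scores(t)           # no access to other outputs
        output.append(max(scores, key=scores.get))
    return output


# Hypothetical toy models, used only for illustration.
def toy_step_scores(prefix: List[str]) -> Scores:
    if not prefix:
        return {"a": 0.7, "b": 0.2, EOS: 0.1}
    if prefix[-1] == "a":
        return {"a": 0.1, "b": 0.8, EOS: 0.1}
    return {"a": 0.1, "b": 0.1, EOS: 0.8}


def toy_position_scores(t: int) -> Scores:
    table = [
        {"a": 0.7, "b": 0.2, EOS: 0.1},
        {"a": 0.1, "b": 0.8, EOS: 0.1},
        {"a": 0.1, "b": 0.1, EOS: 0.8},
    ]
    return table[t]


ar_output = decode_autoregressive(toy_step_scores, max_len=10)
nar_output = decode_non_autoregressive(toy_position_scores, length=3)
```

In this sketch the autoregressive decoder needs one model call per generated token, in order, while the non-autoregressive decoder's per-position calls do not depend on each other. The target length has to be supplied separately in the non-autoregressive case.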
  
-**Note:** There are also **global models** like [[ml:Conditional Random Field|Conditional Random Fields]], which are not autoregressive and not non-autoregressive by the above definitions.  Instead they perform inference (dynamic programming, MCMC, etc) to maximize a global scoring function.
+**Note:** There are also **global models** like [[ml:Conditional Random Field|Conditional Random Fields]], which are not autoregressive and not non-autoregressive by the above definitions.  Instead they perform inference (using dynamic programming, MCMC, etc) to maximize a global scoring function.
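As an illustration of inference by dynamic programming, here is a minimal Viterbi sketch for a linear-chain scoring function. The score layout is an assumption made for this example: ''emit[t][j]'' scores label ''j'' at position ''t'', ''trans[i][j]'' scores moving from label ''i'' to label ''j'', and the global score of a label sequence is the sum of both. The numbers are made up.

```python
from typing import List, Tuple


def viterbi(emit: List[List[float]],
            trans: List[List[float]]) -> Tuple[List[int], float]:
    """Return the label sequence with the highest global score, and that score."""
    n_labels = len(trans)
    best = list(emit[0])                  # best score of any path ending in each label
    back: List[List[int]] = []            # backpointers for positions 1..T-1
    for t in range(1, len(emit)):
        new_best, pointers = [], []
        for j in range(n_labels):
            i_best = max(range(n_labels), key=lambda i: best[i] + trans[i][j])
            new_best.append(best[i_best] + trans[i_best][j] + emit[t][j])
            pointers.append(i_best)
        best = new_best
        back.append(pointers)
    last = max(range(n_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(back):       # follow backpointers from the end
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Made-up scores: transitions reward staying on the same label.
emit = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
trans = [[2.0, 0.0], [0.0, 2.0]]

best_path, best_score = viterbi(emit, trans)

# Choosing each position independently ignores the transition scores.
independent = [max(range(2), key=lambda j: row[j]) for row in emit]
```

With these scores the globally best sequence is ''[0, 0, 0]'' (score 6.0), while picking each position on its own gives ''[0, 1, 0]'', which scores only 3.0 under the same global function.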
  
 ===== Summary =====
Line 36: Line 36:
  
   * [[https://arxiv.org/pdf/1711.02281.pdf|Gu et al 2017 - Non-Autoregressive Neural Machine Translation]]
 +  * [[http://proceedings.mlr.press/v80/kaiser18a/kaiser18a.pdf|Kaiser et al 2018 - Fast Decoding in Sequence Models Using Discrete Latent Variables]]
   * **[[https://arxiv.org/pdf/1904.09324.pdf|Ghazvininejad et al 2019 - Mask-Predict: Parallel Decoding of Conditional Masked Language Models]]**
   * [[https://arxiv.org/pdf/2002.07233.pdf|Lee et al 2020 - On the Discrepancy between Density Estimation and Sequence Generation]]
Line 44: Line 45:
   * **[[https://arxiv.org/pdf/2004.07437.pdf|Non-Autoregressive Machine Translation with Latent Alignments]]** From Google, uses CTC loss from Gu & Kong 2020
   * [[https://arxiv.org/pdf/2006.10369.pdf|Kasai et al 2020 - Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation]]
 +  * [[https://arxiv.org/pdf/2404.12022|Wu et al 2024 - Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration]]
 +
 +===== Papers =====
 +  * [[https://arxiv.org/pdf/2305.10427.pdf|Santilli et al 2023 - Accelerating Transformer Inference for Translation via Parallel Decoding]]
 +
nlp/non-autoregressive_seq2seq.1626207478.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
