nlp:seq2seq

nlp:seq2seq [2023/07/06 01:02] – [Decoding Strategies] jmflanig
nlp:seq2seq [2025/05/29 07:15] (current) – [Decoding Strategies] jmflanig
Line 17: Line 17:
    * [[https://arxiv.org/pdf/2109.05093.pdf|Scholak et al 2021 - PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models]]
    * [[https://arxiv.org/pdf/2201.11227.pdf|Poesia et al 2022 - Synchromesh: Reliable Code Generation from Pre-trained Language Models]] They created a tool that takes an ANTLR parser and a string and returns the set of valid next-token completions (see Sect. 3.1).
    * **[[https://arxiv.org/pdf/2305.13971|Geng et al 2023 - Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning]]** Elegant solution. Uses Grammatical Framework to constrain the outputs.
    * **[[https://arxiv.org/pdf/2403.06988|Beurer-Kellner et al 2024 - Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation]]**
  * **Parallel Decoding**
    * [[https://arxiv.org/pdf/2305.10427.pdf|Santilli et al 2023 - Accelerating Transformer Inference for Translation via Parallel Decoding]]
  * **Speculative Decoding**
    * Overviews
      * [[https://arxiv.org/pdf/2401.07851|Xia et al 2024 - Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding]]
      * [[https://arxiv.org/pdf/2405.13019|Khoshnoodi et al 2024 - A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models]]
    * [[https://arxiv.org/pdf/2211.17192|Leviathan et al 2023 - Fast Inference from Transformers via Speculative Decoding]]
    * [[https://arxiv.org/pdf/2404.11912|Sun et al 2024 - TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding]]
    * [[https://arxiv.org/pdf/2502.17421|Yang et al 2025 - LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification]]
    * [[https://arxiv.org/pdf/2505.20776|Cha et al 2025 - SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences]]
  * **Miscellaneous Decoding Techniques**
    * Contrastive Decoding
      * [[https://aclanthology.org/2024.eacl-long.155.pdf|Waldendorf et al 2024 - Contrastive Decoding Reduces Hallucinations in Large Multilingual Machine Translation Models]]
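
The constrained-decoding entries (PICARD, Synchromesh, the grammar-constrained decoding papers) broadly share one idea: at each step, work out which next tokens can still extend the output to a valid string, and mask out the rest. Below is a minimal sketch of that step, assuming a toy grammar `E -> 1 | ( E ) | E + E`, a hypothetical scorer `toy_logits` standing in for a real language model, and plain greedy search. It is an illustration only, not the implementation from any of the listed papers, which work with real parsers (e.g. ANTLR in Synchromesh) and subword vocabularies.

```python
import math
from typing import Callable, Dict, List, Set

EOS = "<eos>"


def allowed_next(prefix: List[str]) -> Set[str]:
    """Tokens that can legally follow `prefix` in the toy grammar E -> 1 | ( E ) | E + E.

    Assumes `prefix` is itself a valid prefix (true when it was built by the decoder below).
    """
    depth = 0
    expect_operand = True  # an expression must start with "1" or "("
    for tok in prefix:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        # After "(" or "+" an operand must follow; after "1" or ")" an expression is complete.
        expect_operand = tok in ("(", "+")
    if expect_operand:
        return {"1", "("}
    # A complete expression can be extended with "+", closed with ")" if a bracket is open,
    # or ended with EOS if all brackets are closed.
    return {"+", ")" if depth > 0 else EOS}


def toy_logits(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a language model's next-token scores (not a real model)."""
    return {
        "1": 2.0,
        "+": 1.0,
        "(": 1.5 if not prefix else 0.0,
        ")": 3.0,  # the "model" strongly prefers ")", which is often invalid
        EOS: 0.5 * len(prefix),  # ending becomes more attractive as the output grows
    }


def constrained_greedy(
    logits_fn: Callable[[List[str]], Dict[str, float]],
    allowed_fn: Callable[[List[str]], Set[str]],
    max_len: int = 20,
) -> List[str]:
    """Greedy decoding where tokens the grammar forbids are masked to -inf at every step."""
    prefix: List[str] = []
    for _ in range(max_len):  # hard cap; the output may be incomplete if it is reached
        scores = logits_fn(prefix)
        valid = allowed_fn(prefix)
        masked = {t: (s if t in valid else -math.inf) for t, s in scores.items()}
        best = max(masked, key=masked.get)
        if best == EOS:
            break
        prefix.append(best)
    return prefix


first = toy_logits([])
print(max(first, key=first.get))  # ")": what unconstrained greedy would pick first
print("".join(constrained_greedy(toy_logits, allowed_next)))  # 1+1
```

Here `allowed_next` plays the role of the parser-backed completion check described for Synchromesh. A real system must additionally handle subword tokens that span several grammar terminals, which this character-level toy sidesteps.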
  
===== Issues in Seq2Seq Models =====
Line 32: Line 45:
  * [[https://arxiv.org/pdf/2005.03642.pdf|Wang & Sennrich 2020 - On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation]] Uses minimum risk training (i.e. a risk loss function), which yields a consistent improvement across models.
  * [[https://arxiv.org/pdf/1905.10617.pdf|He et al 2021 - Exposure Bias versus Self-Recovery: Are Distortions Really Incremental for Autoregressive Text Generation?]] Published version [[https://aclanthology.org/2021.emnlp-main.415.pdf|here]].
  * [[https://aclanthology.org/2022.findings-acl.58.pdf|Arora et al 2022 - Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation]] Shows that exposure bias leads to an accumulation of errors during generation (e.g. repetition), and that perplexity does not capture this.
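
The Wang & Sennrich entry mentions minimum risk training. A common formulation of this sequence-level objective samples a set of candidate outputs, renormalizes their model probabilities over that set with a sharpness parameter alpha, and minimizes the expected cost (e.g. 1 - sentence BLEU) under that renormalized distribution. The sketch below only computes that quantity for given log-probabilities and costs; the function name `mrt_risk`, the default alpha, and the example numbers are illustrative assumptions, and this is not necessarily the exact variant used in the paper. In a training loop the same arithmetic would be written with tensors so that gradients flow back through the log-probabilities.

```python
import math
from typing import List


def mrt_risk(log_probs: List[float], costs: List[float], alpha: float = 1.0) -> float:
    """Expected cost of a sampled candidate set under the renormalized distribution Q.

    log_probs[i]: sequence-level model log-probability log p(y_i | x) of candidate i
    costs[i]:     task cost of candidate i, e.g. 1 - sentence-level BLEU against the reference
    alpha:        sharpness of Q; Q(y_i) is proportional to p(y_i | x) ** alpha
                  (the default of 1.0 is an arbitrary illustrative choice)
    """
    if len(log_probs) != len(costs) or not log_probs:
        raise ValueError("need one cost per candidate and at least one candidate")
    scaled = [alpha * lp for lp in log_probs]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    z = sum(weights)
    # Q is normalized over the sampled subset only, not over the full output space.
    return sum((w / z) * c for w, c in zip(weights, costs))


# Illustrative numbers only: a likely candidate with cost 0 and a less likely one with cost 1.
risk = mrt_risk([math.log(0.6), math.log(0.2)], [0.0, 1.0])
print(round(risk, 6))  # 0.25
```

Lowering the risk means shifting probability mass toward low-cost candidates, which is how this objective trains on the model's own sampled outputs rather than only on the reference.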
  
=== Scheduled Sampling ===
nlp/seq2seq.1688605365.txt.gz · Last modified: 2023/07/06 01:02 by jmflanig
