User Tools

Site Tools


nlp:tokenization

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
nlp:tokenization [2024/06/28 03:48] – [Software] jmflanignlp:tokenization [2024/07/12 03:28] (current) – [Miscellaneous Papers about Tokenization] jmflanig
Line 39: Line 39:
 ===== Miscellaneous Papers about Tokenization ===== ===== Miscellaneous Papers about Tokenization =====
   * [[https://aclanthology.org/2024.eacl-long.72.pdf|Christopoulou et al 2024 - Text-to-Code Generation with Modality-relative Pre-training]] Adds domain-specific tokens to a pretrained LM for text-to-code   * [[https://aclanthology.org/2024.eacl-long.72.pdf|Christopoulou et al 2024 - Text-to-Code Generation with Modality-relative Pre-training]] Adds domain-specific tokens to a pretrained LM for text-to-code
 +  * [[https://arxiv.org/pdf/2406.20086|Feucht et al 2024 - Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs]] Finds the implicit vocabulary in a Transformer decoder model
  
 ===== Stopwords ===== ===== Stopwords =====
nlp/tokenization.1719546516.txt.gz · Last modified: 2024/06/28 03:48 by jmflanig

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki