nlp:tokenization
Differences
This shows you the differences between two versions of the page.
| nlp:tokenization [2024/06/28 03:46] – jmflanig | nlp:tokenization [2024/07/12 03:28] (current) – [Miscellaneous Papers about Tokenization] jmflanig | ||
|---|---|---|---|
| Line 34: | Line 34: | ||
| * Gradient-based Subword Tokenization: | * Gradient-based Subword Tokenization: | ||
| - | ===== Effects and Choice of Tokenization | + | ==== Effects and Choice of Tokenization ==== |
| * [[https:// | * [[https:// | ||
| ===== Miscellaneous Papers about Tokenization ===== | ===== Miscellaneous Papers about Tokenization ===== | ||
| * [[https:// | * [[https:// | ||
| + | * [[https:// | ||
| ===== Stopwords ===== | ===== Stopwords ===== | ||
| Line 51: | Line 52: | ||
| * SentencePiece (does BPE and subword regularization): | * SentencePiece (does BPE and subword regularization): | ||
| * [[https:// | * [[https:// | ||
| + | |||
| + | ===== Related Pages ===== | ||
| + | * [[Data Preparation]] | ||
nlp/tokenization.1719546409.txt.gz · Last modified: 2024/06/28 03:46 by jmflanig