nlp:tokenization
Differences
| nlp:tokenization [2024/03/20 03:52] – jmflanig | nlp:tokenization [2024/07/12 03:28] (current) – [Miscellaneous Papers about Tokenization] jmflanig | ||
|---|---|---|---|
| Line 33: | Line 33: | ||
| * BPE Dropout [[https:// | * BPE Dropout [[https:// | ||
| * Gradient-based Subword Tokenization: | * Gradient-based Subword Tokenization: | ||
| + | |||
| + | ==== Effects and Choice of Tokenization ==== | ||
| + | * [[https:// | ||
| ===== Miscellaneous Papers about Tokenization ===== | ===== Miscellaneous Papers about Tokenization ===== | ||
| * [[https:// | * [[https:// | ||
| + | * [[https:// | ||
| ===== Stopwords ===== | ===== Stopwords ===== | ||
| Line 48: | Line 52: | ||
| * SentencePiece (does BPE and subword regularization): | * SentencePiece (does BPE and subword regularization): | ||
| * [[https:// | * [[https:// | ||
| + | |||
| + | ===== Related Pages ===== | ||
| + | * [[Data Preparation]] | ||
nlp/tokenization.1710906724.txt.gz · Last modified: 2024/03/20 03:52 by jmflanig