nlp:data_preparation
Differences
This shows you the differences between two versions of the page.
| nlp:data_preparation [2021/04/20 22:17] – [Creating a Train/Dev/Test Split] jmflanig | nlp:data_preparation [2023/06/15 07:36] (current) – external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 6: | Line 6: | ||
| * Do not use n-fold cross-validation across sentences. | * Do not use n-fold cross-validation across sentences. | ||
| * Sometimes it's a good idea to split by date, so you have train, dev, test data chronologically ordered. | * Sometimes it's a good idea to split by date, so you have train, dev, test data chronologically ordered. | ||
| + | * [[https:// | ||
| ==== Papers ==== | ==== Papers ==== | ||
| - | * [[https:// | + | * [[https:// |
| - | * [[https:// | + | * [[https:// |
| + | * * [[https:// | ||
| ===== Tokenization ===== | ===== Tokenization ===== | ||
| Line 15: | Line 18: | ||
| ===== Related Pages ===== | ===== Related Pages ===== | ||
| + | * [[ml:Data Cleaning and Validation]] | ||
| * [[Dataset Creation]] | * [[Dataset Creation]] | ||
| * [[Language Identification]] | * [[Language Identification]] | ||
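
The splitting advice in the diff above (do not cross-validate across sentences; consider splitting by date) can be sketched roughly as follows. This is a minimal illustration, not the page author's method: the function name `chronological_split`, the `date` and `sentences` fields, and the 10% dev/test fractions are all assumptions made for the example. Whole documents are assigned to a single split so that sentences from the same document never end up in both train and test.

```python
from datetime import date


def chronological_split(docs, dev_frac=0.1, test_frac=0.1):
    """Split whole documents by date: oldest go to train, newest to test.

    `docs` is assumed to be a list of dicts with a `date` key (hypothetical
    schema). Splitting at the document level keeps every sentence of a
    document inside one split.
    """
    ordered = sorted(docs, key=lambda d: d["date"])
    n = len(ordered)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    n_train = n - n_dev - n_test
    train = ordered[:n_train]
    dev = ordered[n_train:n_train + n_dev]
    test = ordered[n_train + n_dev:]
    return train, dev, test


# Toy data, deliberately given in reverse date order.
docs = [
    {"id": i, "date": date(2020, 1, 1 + i), "sentences": [f"sentence {i}"]}
    for i in reversed(range(10))
]
train, dev, test = chronological_split(docs)
```

With this layout, the dev and test sets contain only documents newer than anything in train, which approximates how a model would be used on future data.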
nlp/data_preparation.1618957039.txt.gz · Last modified: 2023/06/15 07:36 (external edit)