NLP Datasets
See also NLP Progress, Wikipedia List of datasets, nlp-datasets, and data preparation.
Language Modeling Corpora
- BNC corpus
- Gigaword
- Common Crawl
- BookCorpus (used in BERT)
Multi-Task
Multilingual
- Survey on Multilingual NLP Datasets: List of Datasets and Paper
Dialog
Semantic Parsing
Machine Translation
Question Answering
Summarization
Multimodal
Natural Language Inference
Seq2seq
Some standard seq2seq datasets.
Compositional Generalization
Commonsense Reasoning
Paraphrase
nlp/datasets.1684489465.txt.gz · Last modified: 2023/06/15 07:36 (external edit)