NLP Datasets

See also NLP Progress, Wikipedia's List of datasets, and nlp-datasets, as well as data preparation.

Language Modeling Corpora

General or Multi-Task Benchmarks

Multilingual

Dialog

Semantic Parsing

Machine Translation

Question Answering

Summarization

Multimodal

Natural Language Inference

Seq2seq

Some standard seq2seq datasets.

Compositional Generalization

Commonsense Reasoning

Paraphrase