

NLP Datasets

Language Modeling Corpora

  • British National Corpus (BNC)
  • Gigaword
  • Common Crawl
  • BookCorpus (used to pretrain BERT)

Dialog

Semantic Parsing

Machine Translation

Question Answering

Summarization

Multimodal

Natural Language Inference

Seq2seq

Standard sequence-to-sequence (seq2seq) datasets.

Compositional Generalization

Commonsense Reasoning

nlp/datasets.1614637008.txt.gz · Last modified: 2023/06/15 07:36 (external edit)