

Multilinguality

Datasets

Survey on Multilingual NLP Datasets: List of Datasets and Papers

^ Name ^ Languages ^ Tasks ^ Size ^ Machine-Translated or Annotated? ^
| MasakhaNER | Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo, Naija Pidgin, Swahili, Wolof, Yorùbá | NER | 3.5 GB | Annotated |
| TyDi QA | English, Arabic, Bengali, Finnish, Indonesian, Japanese, Kiswahili, Korean, Russian, Telugu, Thai | QA | 0.10 GB | Annotated |
| MLQA | English, Arabic, German, Spanish, Hindi, Vietnamese, Simplified Chinese | QA | 0.075 GB | Annotated |
| AmericasNLI | Aymara, Asháninka, Bribri, Guaraní, Nahuatl, Otomí, Quechua, Rarámuri, Shipibo-Konibo, Wixarika | NLI | 13.5 MB | Annotated |
| MGSM | English, Bengali, Chinese, French, German, Japanese, Russian, Spanish, Swahili, Telugu, Thai | Arithmetic Reasoning | 1.04 MB | Annotated |
| IndoNLG | Indonesian, Javanese, Sundanese | Summarization, QA, Dialogue, MT | 23.79 GB | ? |

People

Last modified: 2023/06/15 07:36 (external edit)
