Multilinguality
Datasets
Survey on Multilingual NLP Datasets
| Name | Languages | Tasks | Size | Machine-Translated or Annotated? |
|---|---|---|---|---|
| MasakhaNER | Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo, Naija Pidgin, Swahili, Wolof, Yorùbá | NER | 3.5 GB | Annotated |
| TyDi QA | English, Arabic, Bengali, Finnish, Indonesian, Japanese, Kiswahili, Korean, Russian, Telugu, Thai | QA | 100 MB | Annotated |
| MLQA | English, Arabic, German, Spanish, Hindi, Vietnamese, Simplified Chinese | QA | 75 MB | Annotated |
| AmericasNLI | Aymara, Asháninka, Bribri, Guaraní, Nahuatl, Otomí, Quechua, Rarámuri, Shipibo-Konibo, Wixarika | NLI | 13.5 MB | Annotated |
| MGSM | English, Bengali, Chinese, French, German, Japanese, Russian, Spanish, Swahili, Telugu, Thai | Arithmetic Reasoning | 1.04 MB | Annotated |
| IndoNLG | Indonesian, Javanese, Sundanese | Summarization, QA, Dialogue, MT | 23.79 GB | ? |
People
Related Pages
nlp/multilinguality.1682894973.txt.gz · Last modified: 2023/06/15 07:36 (external edit)