nlp:multilinguality

Differences

This shows you the differences between two versions of the page.

nlp:multilinguality [2023/04/30 18:57] ctoukmaj
nlp:multilinguality [2024/06/19 15:43] (current) – [People] jmflanig
Line 2:
  
   * [[http://proceedings.mlr.press/v119/hu20b/hu20b.pdf|Hu et al 2020 - XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization]]
 +  * [[https://arxiv.org/pdf/2211.15649.pdf|Yu et al 2022 - Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources]]
  
  
 ===== Datasets =====
-^ Name ^ Languages ^ Tasks ^ Size ^ Machine-Translated or Annotated? ^
-| [[https://github.com/masakhane-io/masakhane-ner|MasakhaNER]] | Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo, Naija Pidgin, Swahili, Wolof, Yorùbá | NER | 3.5 GB | Annotated |
-| [[https://arxiv.org/ftp/arxiv/papers/2003/2003.05002.pdf|TyDi QA]] | English, Arabic, Bengali, Finnish, Indonesian, Japanese, Kiswahili, Korean, Russian, Telugu, Thai | QA | 0.10 GB | Annotated |
-| [[https://aclanthology.org/2020.acl-main.653.pdf|MLQA]] | English, Arabic, German, Spanish, Hindi, Vietnamese, Simplified Chinese | QA | 0.075 GB | Annotated |
-| [[https://arxiv.org/pdf/2104.08726.pdf|AmericasNLI]] | Aymara, Asháninka, Bribri, Guaraní, Nahuatl, Otomí, Quechua, Rarámuri, Shipibo-Konibo, Wixarika | NLI | 13.5 MB | Annotated |
-| [[https://arxiv.org/pdf/2210.03057.pdf|MGSM]] | English, Bengali, Chinese, French, German, Japanese, Russian, Spanish, Swahili, Telugu, Thai | Arithmetic Reasoning | 1.04 MB | Annotated |
-| [[https://aclanthology.org/2021.emnlp-main.699.pdf|IndoNLG]] | Indonesian, Javanese, Sundanese | Summarization, QA, Dialogue, MT | 23.79 GB | ? |
 +  * [[https://multilingual-dataset-survey.github.io/|Multilingual Dataset Survey]]
  
 ===== People =====
   * [[https://scholar.google.com/citations?user=dLaR9lgAAAAJ&hl=en|Orhan Firat]]
 +  * [[https://scholar.google.com/citations?user=wlosgkoAAAAJ&hl=en|Graham Neubig]]
 ===== Related Pages =====
   * [[Cross-Lingual Transfer]]
   * [[Machine Translation]]
nlp/multilinguality.1682881032.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
