nlp:datasets

Differences

This shows you the differences between two versions of the page.

nlp:datasets [2023/11/29 21:13] – [Multi-Task] jmflanig
nlp:datasets [2023/11/29 21:14] (current) jmflanig
Line 12:
   * [[https://super.gluebenchmark.com/|SuperGLUE]]: [[https://arxiv.org/pdf/1905.00537.pdf|paper]] - A more difficult version of GLUE.
   * [[https://www.cluebenchmarks.com/en/index.html|CLUE]]: [[https://arxiv.org/pdf/2004.05986.pdf|paper]] - Like GLUE, but for Chinese
-  * [[https://arxiv.org/pdf/2009.03300.pdf|Hendrycks et al 2020 - Measuring Massive Multitask Language Understanding]] This is a popular benchmark for evaluating LLMs (for example, GPT-4). However, it has two serious issues: 1) the test set is available on the web, which means LLMs are likely contaminated, and 2) the dataset has no in-domain training data and can only be evaluated in a few-shot manner. This makes it impossible to compare to prior fine-tuned methods.
+  * [[https://github.com/hendrycks/test|MMLU]]: [[https://arxiv.org/pdf/2009.03300.pdf|Hendrycks et al 2020 - Measuring Massive Multitask Language Understanding]] This is a popular benchmark for evaluating LLMs (for example, GPT-4). However, it has two serious issues: 1) the test set is available on the web, which means LLMs are likely contaminated, and 2) the dataset has no in-domain training data and can only be evaluated in a few-shot manner. This makes it impossible to properly compare to prior fine-tuned methods.
  
 ===== Multilingual =====
nlp/datasets.1701292393.txt.gz · Last modified: 2023/11/29 21:13 by jmflanig
