Evaluation
Natural Language Output
To evaluate natural language output, researchers typically use automatic metrics such as BLEU together with human evaluation. For summarization, ROUGE is the most common automatic metric.
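As a concrete illustration, the snippet below computes both metrics with the sacrebleu and rouge_score packages (assumed to be installed; the example strings are placeholders). This is a minimal sketch, not a full evaluation pipeline.

```python
# Minimal sketch: corpus-level BLEU via sacrebleu, ROUGE via rouge_score.
# The hypothesis/reference strings are illustrative placeholders.
import sacrebleu
from rouge_score import rouge_scorer

# BLEU: hypotheses are system outputs; references is a list of
# reference streams, each aligned with the hypotheses.
hypotheses = ["the cat sat on the mat"]
references = [["the cat is on the mat"]]
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")

# ROUGE: score(target, prediction) returns precision, recall, and F1
# for each requested variant.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score("the cat is on the mat", "the cat sat on the mat")
print(f"ROUGE-L F1: {scores['rougeL'].fmeasure:.2f}")
```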
See also Generation - Evaluation, Machine Translation - Evaluation, and Dialog - Evaluation.
Papers
Evaluation with Large Language Models
Robust Evaluation
- Ribeiro et al. 2020 - Beyond Accuracy: Behavioral Testing of NLP Models with CheckList. Very good paper; it won the Best Paper award at ACL 2020. A sketch of the idea is shown below.
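To make the CheckList idea concrete, here is a minimal sketch of two of its test types, a Minimum Functionality Test (MFT) and an invariance test (INV), written in plain Python rather than with the checklist library itself. The predict function is a hypothetical placeholder for the classifier under test; in practice CheckList generates such cases at scale from templates.

```python
# Minimal sketch of CheckList-style behavioral tests for a sentiment
# classifier. `predict` is a hypothetical stand-in for the real model.
def predict(text: str) -> str:
    # Placeholder model: a real test would call the actual system.
    return "negative" if "not" in text or "bad" in text else "positive"

# Minimum Functionality Test (MFT): simple cases the model must get right.
mft_cases = [
    ("This movie is great.", "positive"),
    ("This movie is not great.", "negative"),
]
mft_failures = [(t, e) for t, e in mft_cases if predict(t) != e]

# Invariance test (INV): label-preserving perturbations (here, swapping
# a name) should not change the prediction.
inv_pairs = [
    ("Alice loved the film.", "Bob loved the film."),
]
inv_failures = [(a, b) for a, b in inv_pairs if predict(a) != predict(b)]

print(f"MFT failures: {mft_failures}")
print(f"INV failures: {inv_failures}")
```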
Related Pages
- Natural Language Output
- Generation - Evaluation
- Machine Translation - Evaluation
- Dialog - Evaluation
- Question Answering - Evaluation