Evaluation
Natural Language Output
To evaluate natural language output, researchers typically combine automatic metrics with human evaluation: BLEU is the standard automatic metric for machine translation, and ROUGE for summarization.
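As a concrete illustration, here is a minimal sketch of computing both metrics with the sacrebleu and rouge_score Python packages (assuming both are installed; the hypothesis and reference strings are made-up examples):

```python
# Minimal sketch: corpus-level BLEU via sacrebleu and sentence-level
# ROUGE via rouge_score. Example strings are hypothetical.
# Assumes: pip install sacrebleu rouge_score
import sacrebleu
from rouge_score import rouge_scorer

hypotheses = ["the cat sat on the mat"]
references = ["the cat is sitting on the mat"]

# BLEU: n-gram precision of hypotheses against reference stream(s).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")

# ROUGE: recall-oriented overlap, the standard for summarization.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(references[0], hypotheses[0])
print(f"ROUGE-1 F1: {scores['rouge1'].fmeasure:.2f}")
print(f"ROUGE-L F1: {scores['rougeL'].fmeasure:.2f}")
```

In practice BLEU is reported at the corpus level over a full test set, while ROUGE is usually averaged over per-example scores.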
See also Generation - Evaluation, Machine Translation - Evaluation, and Dialog - Evaluation.
Papers
Evaluation with Large Language Models
- Overviews
Robust Evaluation
- Ribeiro et al. 2020 - Beyond Accuracy: Behavioral Testing of NLP Models with CheckList. Very good paper; won the Best Paper Award at ACL 2020. A rough sketch of the behavioral-testing idea appears below.
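The core idea is to test model behavior directly rather than only aggregate accuracy, e.g. with invariance tests: a label-preserving perturbation should not flip the prediction. The sketch below illustrates this with a hypothetical stand-in classifier and a typo perturbation; it shows the underlying idea, not the CheckList library's API:

```python
# Sketch of a CheckList-style invariance test (INV): a label-preserving
# perturbation (here, a typo) should not change the model's prediction.
# `predict_sentiment` is a hypothetical stand-in for any classifier.
import random

def predict_sentiment(text: str) -> str:
    # Stand-in model: a trivial keyword rule, purely for illustration.
    return "positive" if "good" in text or "great" in text else "negative"

def add_typo(text: str, seed: int = 0) -> str:
    # Perturbation: swap two adjacent characters in a random word.
    # Words of length <= 3 are left unchanged.
    rng = random.Random(seed)
    words = text.split()
    i = rng.randrange(len(words))
    w = words[i]
    if len(w) > 3:
        j = rng.randrange(len(w) - 1)
        words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

examples = ["this movie was really good", "a great and moving film"]
failures = [x for x in examples
            if predict_sentiment(x) != predict_sentiment(add_typo(x))]
print(f"invariance failures: {len(failures)}/{len(examples)}")
```

Each failure is an example where a harmless typo changed the prediction, which accuracy on a clean test set would never surface.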
Related Pages
- Natural Language Output
- Generation - Evaluation
- Machine Translation - Evaluation
- Dialog - Evaluation
- Question Answering - Evaluation