Evaluation
Natural Language Output
To evaluate natural language output, researchers typically rely on automatic overlap metrics such as BLEU, or on human evaluation. For summarization, ROUGE is the most common automatic metric.
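As a minimal sketch of how these metrics are computed in practice, the following Python snippet scores an invented hypothesis against an invented reference, assuming the nltk and rouge-score packages are installed (pip install nltk rouge-score):

  # Sketch: sentence-level BLEU (NLTK) and ROUGE (Google's rouge-score package).
  # The example strings below are invented for demonstration.
  from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
  from rouge_score import rouge_scorer

  reference = "the cat sat on the mat"
  hypothesis = "a cat was sitting on the mat"

  # BLEU measures n-gram precision overlap; smoothing avoids zero scores
  # on short texts with missing higher-order n-gram matches.
  bleu = sentence_bleu(
      [reference.split()],   # list of tokenized references
      hypothesis.split(),    # tokenized hypothesis
      smoothing_function=SmoothingFunction().method1,
  )
  print(f"BLEU: {bleu:.3f}")

  # ROUGE is a recall-oriented n-gram / longest-common-subsequence overlap,
  # which is why it is the usual choice for summarization.
  scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
  scores = scorer.score(reference, hypothesis)
  print(f"ROUGE-1 F1: {scores['rouge1'].fmeasure:.3f}")
  print(f"ROUGE-L F1: {scores['rougeL'].fmeasure:.3f}")

Note that sentence-level BLEU is noisy; in evaluation campaigns, BLEU is normally reported at the corpus level, and human evaluation remains the gold standard for fluency and adequacy.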
See also Generation - Evaluation, Machine Translation - Evaluation, and Dialog - Evaluation.