nlp:evaluation

Differences

This shows you the differences between two versions of the page.

Previous revision: nlp:evaluation [2025/05/29 07:31] – [Evaluation with Large Language Models] jmflanig
Current revision: nlp:evaluation [2025/11/18 22:24] (current) – [Evaluation with Large Language Models] jmflanig
Line 13:
   * **Overviews**
     * [[https://arxiv.org/pdf/2411.15594|Gu et al 2024 - A Survey on LLM-as-a-Judge]]
+    * Blog: [[https://eugeneyan.com/writing/llm-evaluators/|2024 - Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)]]
   * [[https://arxiv.org/pdf/2305.17926|Wang et al 2023 - Large Language Models are not Fair Evaluators]]
   * [[https://arxiv.org/pdf/2305.01937.pdf|Chiang & Lee 2023 - Can Large Language Models Be an Alternative to Human Evaluation?]]
nlp/evaluation.1748503898.txt.gz · Last modified: 2025/05/29 07:31 by jmflanig
