===== Hallucination and Factivity in LLMs =====
  * **[[https://aclanthology.org/2022.acl-long.229.pdf|Lin et al 2022 - TruthfulQA: Measuring How Models Mimic Human Falsehoods]]**
  * [[https://arxiv.org/pdf/2206.04624.pdf|Lee et al 2022 - Factuality Enhanced Language Models for Open-Ended Text Generation]] - Prepends a topic prefix to sentences in factual documents so that each sentence serves as a standalone fact during pretraining (see the preprocessing sketch after this list).
  * [[https://arxiv.org/pdf/2210.09150.pdf|Si et al 2022 - Prompting GPT-3 To Be Reliable]]
  * [[https://arxiv.org/pdf/2305.13534|Zhang et al 2023 - How Language Model Hallucinations Can Snowball]]
  * **[[https://arxiv.org/pdf/2305.14251.pdf|Min et al 2023 - FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation]]**
  * [[https://arxiv.org/pdf/2309.05217.pdf|Du et al 2023 - Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis]]
  * [[https://arxiv.org/pdf/2309.11495.pdf|Dhuliawala et al 2023 - Chain-of-Verification Reduces Hallucination in Large Language Models]]
  * [[https://arxiv.org/pdf/2310.00259.pdf|Cao et al 2023 - AutoHall: Automated Hallucination Dataset Generation for Large Language Models]]
  * [[https://arxiv.org/pdf/2310.00741.pdf|Chen et al 2023 - FELM: Benchmarking Factuality Evaluation of Large Language Models]] [[https://github.com/hkust-nlp/felm|github]]
  * **[[https://arxiv.org/pdf/2311.08401.pdf|Tian et al 2023 - Fine-tuning Language Models for Factuality]]** - Uses DPO to fine-tune an LLM to produce factual outputs (see the loss sketch after this list).
  * [[https://arxiv.org/pdf/2404.05904|Hong et al 2024 - The Hallucinations Leaderboard – An Open Effort to Measure Hallucinations in Large Language Models]]
  * **[[https://arxiv.org/pdf/2405.05904|Gekhman et al 2024 - Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?]]**
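
To make the Lee et al 2022 idea concrete, here is a minimal sketch of the topic-prefix preprocessing step. The function name and the choice of document title as the topic are illustrative assumptions, not the paper's exact pipeline.

<code python>
# Hedged sketch of the "topic prefix" idea from Lee et al 2022: prepend the
# document's topic to each sentence so every sentence can stand alone as a
# fact during pretraining. Using the title as the topic is an assumption.

def add_topic_prefix(doc_title: str, sentences: list[str]) -> list[str]:
    """Prefix each sentence with the document topic."""
    return [f"{doc_title}. {sent}" for sent in sentences]

sentences = [
    "He was born in Hawaii in 1961.",
    "He served as the 44th president of the United States.",
]
for line in add_topic_prefix("Barack Obama", sentences):
    print(line)
# Barack Obama. He was born in Hawaii in 1961.
# Barack Obama. He served as the 44th president of the United States.
</code>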
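Tian et al 2023 plug automatically scored factuality preferences (e.g., via FActScore) into the standard DPO objective. Below is a minimal PyTorch sketch of that loss, assuming summed token log-probabilities for a (factual, non-factual) response pair; variable names are illustrative.

<code python>
# Hedged sketch of the DPO loss as applied to factuality preference pairs.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_factual, policy_logp_nonfactual,
             ref_logp_factual, ref_logp_nonfactual, beta=0.1):
    """Standard DPO loss: push the policy's log-ratio in favor of the
    factual response above the frozen reference model's log-ratio."""
    policy_ratio = policy_logp_factual - policy_logp_nonfactual
    ref_ratio = ref_logp_factual - ref_logp_nonfactual
    return -F.logsigmoid(beta * (policy_ratio - ref_ratio)).mean()

# Toy example: summed log-probs for one preference pair.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss)  # -logsigmoid(0.1 * (3 - 1)) ≈ 0.598
</code>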
  
Prompt to break down sentences into independent facts:
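
The prompt text itself is not visible in this revision view; below is a hypothetical reconstruction in the style of FActScore's atomic-fact decomposition prompt, with an illustrative demonstration.

<code python>
# The wiki's actual prompt is not shown in this revision; this string is a
# hypothetical reconstruction modeled on FActScore-style decomposition.
PROMPT = """Please break down the following sentence into independent facts.

Sentence: He made his acting debut in the film The Moon is the Sun's Dream (1992).
Facts:
- He made his acting debut in a film.
- He made his acting debut in The Moon is the Sun's Dream.
- The Moon is the Sun's Dream was released in 1992.

Sentence: {sentence}
Facts:"""

print(PROMPT.format(sentence="Obama was born in Hawaii in 1961."))
</code>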
===== Benchmarks =====
  * TruthfulQA: [[https://aclanthology.org/2022.acl-long.229.pdf|paper]] [[https://github.com/sylinrl/TruthfulQA|github]]
  * FActScore: [[https://arxiv.org/pdf/2305.14251.pdf|paper]] [[https://github.com/shmsw25/FActScore|github]]
  * FELM: [[https://arxiv.org/pdf/2310.00741.pdf|paper]] [[https://github.com/hkust-nlp/felm|github]]
  * LongFact: [[https://arxiv.org/pdf/2403.18802|paper]]
  
===== Related Pages =====
  * [[modality#epistemic_modality|Factivity]]
  * [[Language Model]]
  * [[ml:Trustworthy AI]]
  