====== Hallucination and Factivity ======

===== Overviews =====

  * **In Generation**
    * [[https://arxiv.org/pdf/2202.03629.pdf|Ji et al 2022 - Survey of Hallucination in Natural Language Generation]]
  * **In Large Language Models**
    * **[[https://arxiv.org/pdf/2309.01219.pdf|Zhang et al 2023 - Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models]]**
    * [[https://arxiv.org/pdf/2309.05922.pdf|Rawte et al 2023 - A Survey of Hallucination in Large Foundation Models]]
    * [[https://arxiv.org/pdf/2309.06794.pdf|Ye et al 2023 - Cognitive Mirage: A Review of Hallucinations in Large Language Models]]
    * [[https://arxiv.org/pdf/2309.16459.pdf|Andriopoulos & Pouwelse 2023 - Augmenting LLMs with Knowledge: A Survey on Hallucination Prevention]]

===== Hallucination and Factivity in LLMs =====

  * **[[https://aclanthology.org/2022.acl-long.229.pdf|Lin et al 2022 - TruthfulQA: Measuring How Models Mimic Human Falsehoods]]**
  * [[https://arxiv.org/pdf/2206.04624.pdf|Lee et al 2022 - Factuality Enhanced Language Models for Open-Ended Text Generation]] - Prepends a topic prefix to sentences in the factual documents to make each sentence serve as a standalone fact during pretraining.
  * [[https://arxiv.org/pdf/2210.09150.pdf|Si et al 2022 - Prompting GPT-3 To Be Reliable]]
  * [[https://arxiv.org/pdf/2305.13534|Zhang et al 2023 - How Language Model Hallucinations Can Snowball]]
  * **[[https://arxiv.org/pdf/2305.14251.pdf|Min et al 2023 - FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation]]**
  * [[https://arxiv.org/pdf/2309.05217.pdf|Du et al 2023 - Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis]]
  * [[https://arxiv.org/pdf/2309.11495.pdf|Dhuliawala et al 2023 - Chain-of-Verification Reduces Hallucination in Large Language Models]]
  * [[https://arxiv.org/pdf/2310.00259.pdf|Cao et al 2023 - AutoHall: Automated Hallucination Dataset Generation for Large Language Models]]
  * [[https://arxiv.org/pdf/2310.00741.pdf|Chen et al 2023 - FELM: Benchmarking Factuality Evaluation of Large Language Models]] [[https://github.com/hkust-nlp/felm|github]]
  * **[[https://arxiv.org/pdf/2311.08401.pdf|Tian et al 2023 - Fine-tuning Language Models for Factuality]]** - Uses DPO to fine-tune an LLM to produce factual outputs.
  * [[https://arxiv.org/pdf/2404.05904|Hong et al 2024 - The Hallucinations Leaderboard – An Open Effort to Measure Hallucinations in Large Language Models]]
  * **[[https://arxiv.org/pdf/2405.05904|Gekhman et al 2024 - Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?]]**

Prompt to break down sentences into independent facts (from [[https://arxiv.org/pdf/2305.14251.pdf|Min et al 2023]]):

{{media:facts-prompt-arxiv-2305.14251.png}}

==== Datasets ====

  * TruthfulQA: [[https://aclanthology.org/2022.acl-long.229.pdf|paper]] [[https://github.com/sylinrl/TruthfulQA|github]]
  * FActScore: [[https://arxiv.org/pdf/2305.14251.pdf|paper]] [[https://github.com/shmsw25/FActScore|github]]
  * FELM: [[https://arxiv.org/pdf/2310.00741.pdf|paper]] [[https://github.com/hkust-nlp/felm|github]]
  * LongFact: [[https://arxiv.org/pdf/2403.18802|paper]]

===== Related Pages =====

  * [[Automatic Fact Checking]]
  * [[modality#epistemic_modality|Factivity]]
  * [[Language Model]]
  * [[ml:Trustworthy AI]]
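The atomic-fact decomposition prompted above feeds a simple aggregate metric: FActScore-style factual precision is, roughly, the mean over responses of the fraction of each response's atomic facts that are supported by the knowledge source. A minimal sketch, assuming the decomposition and the per-fact support labels have already been produced by some upstream pipeline (this is a simplification for illustration, not the implementation from Min et al 2023):

```python
# Hedged sketch of FActScore-style factual precision (simplified from the
# definition in Min et al 2023, arXiv:2305.14251). Each response is
# represented as a list of booleans, one per atomic fact:
# True = supported by the knowledge source, False = unsupported.

def factual_precision(responses):
    """Mean, over responses, of the fraction of supported atomic facts.

    responses: list of lists of booleans (empty responses are skipped).
    """
    per_response = [sum(facts) / len(facts) for facts in responses if facts]
    return sum(per_response) / len(per_response)

# Example: one fully supported response and one half-supported response.
score = factual_precision([[True, True], [True, False]])
print(score)  # 0.75
```

How the support labels are obtained (retrieval plus an entailment or LLM judge) is where the real work lies; the scoring step itself is just this average of averages.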