nlp:explainability

Differences

This shows you the differences between two versions of the page.

nlp:explainability [2023/03/15 16:45] – [Papers] jmflanig
nlp:explainability [2025/06/01 23:17] (current) – [Related Pages] jmflanig
Line 7: Line 7:
   * [[https://arxiv.org/pdf/2012.14261.pdf|Zhang et al 2020 - A Survey on Neural Network Interpretability]]
   * [[https://arxiv.org/pdf/2010.00389.pdf|Thayaparan et al 2020 - A Survey on Explainability in Machine Reading Comprehension]]
 +  * [[https://arxiv.org/pdf/2207.13243|Rauker et al 2022 - Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks]]
  
 ==== Papers ====
Line 30: Line 31:
 ===== Explainable NLP =====
   * [[https://arxiv.org/pdf/2009.06354.pdf|Lamm et al 2020 - QED: A Framework and Dataset for Explanations in Question Answering]]
 +
 +===== Interpretability and Explainability In LLMs =====
 +  * **Overviews**
 +    * [[https://arxiv.org/pdf/2401.12874|Luo & Specia 2024 - From Understanding to Utilization: A Survey on Explainability for Large Language Models]]
 +    * [[https://arxiv.org/pdf/2402.10688|Zhao et al 2024 - Towards Uncovering How Large Language Model Works: An Explainability Perspective]] An OK paper, but it cites almost no work from before 2021 or from outside the mechanistic interpretability literature.
 +  * **Resources**
 +    * [[https://burnycoder.github.io/Landing/Contents/Exobrain/Topics/Mechanistic%20interpretability/|Paper list]]
 +  * **Papers**
 +    * [[https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html|Bills et al 2023 - Language models can explain neurons in language models]]
 +    * [[https://arxiv.org/pdf/2305.08809|Wu et al 2023 - Interpretability at Scale: Identifying Causal Mechanisms in Alpaca]]
 +    * [[https://arxiv.org/pdf/2305.19911|Foote et al 2023 - Neuron to Graph: Interpreting Language Model Neurons at Scale]]
  
 ===== Natural Language Explanations =====
 +  * For NLI, see [[nlp:entailment#Entailment - Natural Language Explanations]]
   * [[https://arxiv.org/pdf/2112.08674.pdf|Wiegreffe et al 2021 - Reframing Human-AI Collaboration for Generating Free-Text Explanations]]
   * **On out-of-domain data**
     * [[https://aclanthology.org/2021.insights-1.17.pdf|Zhou and Tan 2021 - Investigating the Effect of Natural Language Explanations on Out-of-Distribution Generalization in Few-shot NLI]]
     * [[https://aclanthology.org/2022.acl-long.477.pdf|Chrysostomou & Aletras 2022 - An Empirical Study on Explanations in Out-of-Domain Settings]] Does it on text classification
 +  * **Making it more Robust**
 +    * [[https://arxiv.org/pdf/2305.04990.pdf|Ludan et al 2023 - Explanation-based Finetuning Makes Models More Robust to Spurious Cues]]
  
 ===== Evaluating Explanations =====
Line 45: Line 60:
  
 ===== Related Pages =====
 +  * [[ml:Mechanistic Interpretability]]
   * [[ml:Neural Network Psychology]]
   * [[Probing Experiments]]
-  * [[Reasoning Chains]]
+  * [[Reasoning#Reasoning Chains|Reasoning - Reasoning Chains]]
 +  * [[ml:Trustworthy AI]]
   * [[ml:Visualizing Neural Networks]]
nlp/explainability.1678898709.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
