Differences

This shows you the differences between two versions of the page.

--- nlp:explainability [2025/03/26 23:46] – [Interpretability and Explainability In LLMs] jmflanig
+++ nlp:explainability [2025/06/01 23:17] (current) – [Related Pages] jmflanig
@@ Line 7: / Line 7: @@
   * [[https://arxiv.org/pdf/2012.14261.pdf|Zhang et al 2020 - A Survey on Neural Network Interpretability]]
   * [[https://arxiv.org/pdf/2010.00389.pdf|Thayaparan et al 2020 - A Survey on Explainability in Machine Reading Comprehension]]
+  * [[https://arxiv.org/pdf/2207.13243|Räuker et al 2022 - Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks]]
 ==== Papers ====
@@ Line 32: / Line 33: @@
 ===== Interpretability and Explainability In LLMs =====
-  * [[https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html|Bills et al 2023 - Language models can explain neurons in language models]]
+  * **Overviews**
-  * [[https://arxiv.org/pdf/2402.10688|Zhao et al 2024 - Towards Uncovering How Large Language Model Works: An Explainability Perspective]]** Good review paper
+    * [[https://arxiv.org/pdf/2401.12874|Luo & Specia 2024 - From Understanding to Utilization: A Survey on Explainability for Large Language Models]]
+    * [[https://arxiv.org/pdf/2402.10688|Zhao et al 2024 - Towards Uncovering How Large Language Model Works: An Explainability Perspective]] A reasonable overview, but it cites almost none of the work before 2021 and almost nothing outside the mechanistic interpretability literature.
+  * **Resources**
+    * [[https://burnycoder.github.io/Landing/Contents/Exobrain/Topics/Mechanistic%20interpretability/|Mechanistic interpretability paper list]]
+  * **Papers**
+    * [[https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html|Bills et al 2023 - Language models can explain neurons in language models]]
+    * [[https://arxiv.org/pdf/2305.08809|Wu et al 2023 - Interpretability at Scale: Identifying Causal Mechanisms in Alpaca]]
+    * [[https://arxiv.org/pdf/2305.19911|Foote et al 2023 - Neuron to Graph: Interpreting Language Model Neurons at Scale]]
 ===== Natural Language Explanations =====
@@ Line 56: / Line 63: @@
   * [[ml:Neural Network Psychology]]
   * [[Probing Experiments]]
-  * [[Reasoning Chains]]
+  * [[Reasoning#Reasoning Chains|Reasoning - Reasoning Chains]]
   * [[ml:Trustworthy AI]]
   * [[ml:Visualizing Neural Networks]]