ml:mechanistic_interpretability

Differences

This shows you the differences between two versions of the page.

ml:mechanistic_interpretability [2025/06/01 23:54] – [Papers] jmflanig
ml:mechanistic_interpretability [2025/06/02 11:23] (current) – [Papers] jmflanig
Line 12:
   * [[https://arxiv.org/pdf/2211.00593|Wang et al 2022 - Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small]]
   * **[[https://arxiv.org/pdf/2304.14997|Conmy et al 2023 - Towards Automated Circuit Discovery for Mechanistic Interpretability]]**
 +  * [[https://arxiv.org/pdf/2304.14767|Geva et al 2023 - Dissecting Recall of Factual Associations in Auto-Regressive Language Models]]
   * [[https://arxiv.org/pdf/2305.00586|Hanna et al 2023 - How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model]]
   * [[https://arxiv.org/pdf/2404.14349|Rajaram et al 2024 - Automatic Discovery of Visual Circuits]]
ml/mechanistic_interpretability.1748822090.txt.gz · Last modified: 2025/06/01 23:54 by jmflanig
