Mechanistic Interpretability

Mechanistic interpretability research was carried out in NLP under other names before the term was coined. See Mechanistic? for important historical context.

Overviews

Papers

See also the papers in BERTology, Neural Network Psychology, Probing Experiments, Transformers - Analysis and Interpretation.

Sparse Autoencoders

Resources

ml/mechanistic_interpretability.txt · Last modified: 2025/06/02 11:23 by jmflanig
