nlp:transformers

  * [[https://arxiv.org/pdf/2008.02217.pdf|Ramsauer et al 2020 - Hopfield Networks is All You Need]]
  * [[https://arxiv.org/pdf/2012.14913.pdf|Geva et al 2020 - Transformer Feed-Forward Layers Are Key-Value Memories]]
  * [[https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens|nostalgebraist 2020 - Interpreting GPT: The Logit Lens]] Widely used; see [[https://arxiv.org/pdf/2503.11667|LogitLens4LLMs]] for examples, and the short sketch after this list
  * [[https://arxiv.org/pdf/2310.03686|Langedijk et al 2023 - DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers]]
  * **For decoders/LLMs**
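
A minimal sketch of the logit-lens idea referenced above: take the residual-stream state after each block, pass it through the model's final layer norm and unembedding matrix, and read off what the model would predict if decoding stopped at that layer. The example below uses GPT-2 via Hugging Face ''transformers''; the model choice, the prompt, and the attribute names (''transformer.ln_f'', ''lm_head'') are illustrative assumptions, not taken from the papers listed here.

<code python>
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tok("The Eiffel Tower is in the city of", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

ln_f = model.transformer.ln_f   # GPT-2's final layer norm
unembed = model.lm_head         # unembedding matrix (tied to the input embeddings)

# out.hidden_states has n_layers + 1 entries: the embeddings, then each block's output.
# Decoding each of them "early" shows how the next-token prediction sharpens with depth.
for layer, h in enumerate(out.hidden_states):
    logits = unembed(ln_f(h[:, -1, :]))        # project the last position to vocab space
    print(f"layer {layer:2d}: {tok.decode(logits.argmax(-1))!r}")
</code>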