ml:state-space_models (last modified 2025/08/22 17:58 by jmflanig)
for Large Language Models]]
  * [[https://arxiv.org/pdf/2312.00752.pdf|Gu & Dao 2023 - Mamba: Linear-Time Sequence Modeling with Selective State Spaces]] First rejected at [[https://openreview.net/forum?id=AL1fq05o7H|ICLR]] (bad reviewing) and then accepted to COLM ([[https://openreview.net/forum?id=tEYskw1VY2#discussion|reviews]])
    * [[https://srush.github.io/annotated-mamba/hard.html|Mamba: The Hard Way]] Sasha Rush's tutorial implementation
    * [[https://github.com/sustcsonglin/mamba-triton/tree/master]]
  * [[https://arxiv.org/pdf/2401.13660.pdf|Wang et al 2024 - MambaByte: Token-free Selective State Space Model]]
  * [[https://arxiv.org/pdf/2402.19427|De et al 2024 - Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models]]
  * [[https://arxiv.org/pdf/2403.01590|Ali et al 2024 - The Hidden Attention of Mamba Models]]
  * [[https://arxiv.org/pdf/2403.19887|Lieber et al 2024 - Jamba: A Hybrid Transformer-Mamba Language Model]]
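The papers above build on the basic linear state-space recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t. A minimal NumPy sketch of a sequential scan over a diagonal SSM (illustrative parameter values; this is not Mamba's selective, input-dependent parameterization):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Scan the linear recurrence h_t = A*h_{t-1} + B*x_t, y_t = C.h_t
    over a 1-D input sequence x. A, B are per-channel vectors (diagonal
    state matrix); C projects the state to a scalar output."""
    h = np.zeros_like(A)          # hidden state, one value per channel
    ys = []
    for x_t in x:
        h = A * h + B * x_t       # elementwise update (diagonal A)
        ys.append(C @ h)          # readout
    return np.array(ys)

# Illustrative parameters: two state channels with different decay rates.
A = np.array([0.9, 0.5])
B = np.array([1.0, 1.0])
C = np.array([0.5, 0.5])
y = ssm_scan(np.ones(5), A, B, C)
print(y)                          # running response to a constant input
```

In Mamba, B, C, and the step size (hence the discretized A) are themselves functions of the input at each position, which is what makes the state space "selective"; Sasha Rush's tutorial linked above works through that version in detail.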
===== Papers =====
  * [[https://arxiv.org/pdf/2304.12776.pdf|Vardasbi et al 2023 - State Spaces Aren't Enough: Machine Translation Needs Attention]]

===== Analysis and Mechanistic Interpretability =====
  * [[https://arxiv.org/pdf/2404.03646|Sharma et al 2024 - Locating and Editing Factual Associations in Mamba]]
  * [[https://arxiv.org/pdf/2505.24244|Endy et al 2025 - Mamba Knockout for Unraveling Factual Information Flow]]

===== Theoretical Properties =====
  * [[https://arxiv.org/pdf/2404.08819|Merrill et al 2024 - The Illusion of State in State-Space Models]]
  
===== People =====