====== State-Space Models ======

===== Overviews =====

  * **Survey Papers**
    * [[https://arxiv.org/pdf/2404.16112|2024 - Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modeling: Methods, Applications, and Challenges]]
  * **Papers with Good Overviews**
    * [[https://arxiv.org/pdf/2008.07669.pdf|Gu et al 2020 - HiPPO: Recurrent Memory with Optimal Polynomial Projections]]
    * S4 model: **[[https://arxiv.org/pdf/2111.00396.pdf|Gu et al 2021 - Efficiently Modeling Long Sequences with Structured State Spaces]]** Good intro to state spaces
    * [[https://arxiv.org/pdf/2303.06349.pdf|Orvieto et al 2023 - Resurrecting Recurrent Neural Networks for Long Sequences]]
    * **[[https://arxiv.org/pdf/2312.00752.pdf|Gu & Dao 2023 - Mamba: Linear-Time Sequence Modeling with Selective State Spaces]]** Nice overview of SSMs

===== Key Papers =====

  * [[https://arxiv.org/pdf/2008.07669.pdf|Gu et al 2020 - HiPPO: Recurrent Memory with Optimal Polynomial Projections]]
  * S4 model: [[https://arxiv.org/pdf/2111.00396.pdf|Gu et al 2021 - Efficiently Modeling Long Sequences with Structured State Spaces]]
  * [[https://arxiv.org/pdf/2206.12037.pdf|Gu et al 2022 - How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections]]
  * S5 model: [[https://arxiv.org/pdf/2208.04933.pdf|Smith et al 2022 - Simplified State Space Layers for Sequence Modeling]]
  * Mega: [[https://arxiv.org/pdf/2209.10655.pdf|Ma et al 2022 - Mega: Moving Average Equipped Gated Attention]] SOTA on the Long Range Arena benchmark. Combines gated attention with an exponential moving average, a simple state-space component.
Still O(n²) runtime, however.
  * H3 model: [[https://arxiv.org/pdf/2212.14052.pdf|Fu et al 2022 - Hungry Hungry Hippos: Towards Language Modeling with State Space Models]]
  * [[https://arxiv.org/pdf/2303.06349.pdf|Orvieto et al 2023 - Resurrecting Recurrent Neural Networks for Long Sequences]] Gives a nice history
  * [[https://arxiv.org/pdf/2302.10866.pdf|Poli et al 2023 - Hyena Hierarchy: Towards Larger Convolutional Language Models]]
  * [[https://arxiv.org/pdf/2307.08621.pdf|Sun et al 2023 - Retentive Network: A Successor to Transformer for Large Language Models]]
  * [[https://arxiv.org/pdf/2312.00752.pdf|Gu & Dao 2023 - Mamba: Linear-Time Sequence Modeling with Selective State Spaces]] First rejected at [[https://openreview.net/forum?id=AL1fq05o7H|ICLR]] (bad reviewing) and then accepted to COLM ([[https://openreview.net/forum?id=tEYskw1VY2#discussion|reviews]])
  * [[https://srush.github.io/annotated-mamba/hard.html|Mamba: The Hard Way]] Sasha Rush's tutorial implementation
  * [[https://github.com/sustcsonglin/mamba-triton/tree/master]]
  * [[https://arxiv.org/pdf/2401.13660.pdf|Wang et al 2024 - MambaByte: Token-free Selective State Space Model]]
  * [[https://arxiv.org/pdf/2402.19427|De et al 2024 - Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models]]
  * [[https://arxiv.org/pdf/2403.01590|Ali et al 2024 - The Hidden Attention of Mamba Models]]
  * [[https://arxiv.org/pdf/2403.19887|Lieber et al 2024 - Jamba: A Hybrid Transformer-Mamba Language Model]]
  * [[https://arxiv.org/pdf/2405.21060|Dao & Gu 2024 - Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality]]
  * [[https://arxiv.org/pdf/2408.10189|Bick et al 2024 - Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models]]

===== Papers =====

  * [[https://arxiv.org/pdf/2304.12776.pdf|Vardasbi et al 2023 - State Spaces Aren't Enough: Machine Translation Needs Attention]]

===== Analysis and Mechanistic Interpretability =====

  * [[https://arxiv.org/pdf/2404.03646|Sharma et al 2024 - Locating and Editing Factual Associations in Mamba]]
  * [[https://arxiv.org/pdf/2505.24244|Endy et al 2025 - Mamba Knockout for Unraveling Factual Information Flow]]

===== Theoretical Properties =====

  * [[https://arxiv.org/pdf/2404.08819|Merrill et al 2024 - The Illusion of State in State-Space Models]]

===== People =====

  * [[https://scholar.google.com/citations?user=NQRw0bQAAAAJ&hl=en|Tri Dao]]
  * [[https://scholar.google.com/citations?user=DVCHv1kAAAAJ&hl=en|Albert Gu]]

===== Related Pages =====

  * [[nlp:RNN]]
  * [[nlp:Seq2seq]]
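===== Minimal Example =====

The discretized linear recurrence at the heart of these models (HiPPO, S4, and the non-selective part of Mamba) can be sketched in a few lines. This is an illustrative toy, not any paper's implementation: the random ''A'', ''B'', ''C'' matrices, the step size ''dt'', and all dimensions are placeholder choices. Real S4/Mamba layers use structured, learned parameters and compute the recurrence via a convolution or a parallel scan rather than a Python loop. The bilinear (Tustin) discretization below is the one described in the S4 paper.

```python
import numpy as np

# Toy dimensions (illustrative only): state size N, sequence length L.
N, L = 4, 10
rng = np.random.default_rng(0)

# Continuous-time SSM:  x'(t) = A x(t) + B u(t),   y(t) = C x(t)
# A is pushed toward stability (eigenvalues near -1) so the toy doesn't blow up.
A = -np.eye(N) + rng.normal(scale=0.1, size=(N, N))
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))
dt = 0.1

# Bilinear (Tustin) discretization:
#   A_bar = (I - dt/2 A)^{-1} (I + dt/2 A),   B_bar = (I - dt/2 A)^{-1} dt B
I = np.eye(N)
Ab = np.linalg.solve(I - dt / 2 * A, I + dt / 2 * A)
Bb = np.linalg.solve(I - dt / 2 * A, dt * B)

# Discrete recurrence over an input sequence u_0 .. u_{L-1}:
#   x_k = A_bar x_{k-1} + B_bar u_k,   y_k = C x_k
u = rng.normal(size=L)
x = np.zeros((N, 1))
ys = []
for k in range(L):
    x = Ab @ x + Bb * u[k]
    ys.append((C @ x).item())
```

Because ''Ab'' and ''Bb'' do not depend on the input, the loop can be unrolled into a single long convolution over ''u'' (the trick S4 exploits for parallel training); Mamba's selectivity makes them input-dependent, which is why it needs a scan instead.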