Table of Contents
State-Space Models
Overviews
Key Papers
Papers
Analysis and Mechanistic Interpretability
Theoretical Properties
People
Related Pages
State-Space Models
Overviews
Survey Papers
2024 - Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modeling: Methods, Applications, and Challenges
Papers with Good Overviews
Gu et al 2020 - HiPPO: Recurrent Memory with Optimal Polynomial Projections
S4 model:
Gu et al 2021 - Efficiently Modeling Long Sequences with Structured State Spaces
Good intro to state spaces; the basic recurrence all of these papers share is sketched after this list
Orvieto et al 2023 - Resurrecting Recurrent Neural Networks for Long Sequences
Gu & Dao 2023 - Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Nice overview of SSMs
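For orientation, everything in this section builds on the same discretized linear recurrence: x_t = A_bar x_{t-1} + B_bar u_t, y_t = C x_t. Below is a minimal NumPy sketch of that recurrence in sequential (recurrent) mode; the variable names are illustrative and not taken from any particular paper.

    import numpy as np

    def ssm_scan(A_bar, B_bar, C, u):
        """Discretized linear SSM in recurrent mode:
        x_t = A_bar @ x_{t-1} + B_bar * u_t,  y_t = C @ x_t."""
        x = np.zeros(A_bar.shape[0])   # hidden state of dimension n
        ys = []
        for u_t in u:                  # O(L) sequential scan over the input
            x = A_bar @ x + B_bar * u_t
            ys.append(C @ x)
        return np.array(ys)

    # Toy usage: a small stable system on a random length-16 input
    rng = np.random.default_rng(0)
    A_bar = 0.9 * np.eye(4)            # contractive transition, so the state decays
    B_bar = rng.standard_normal(4)
    C = rng.standard_normal(4)
    y = ssm_scan(A_bar, B_bar, C, rng.standard_normal(16))

Because (A_bar, B_bar, C) do not depend on the input, the same map also unrolls into a length-L convolution with kernel k_t = C A_bar^t B_bar; computing that kernel efficiently is the core trick that lets S4 train in parallel.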
Key Papers
Gu et al 2020 - HiPPO: Recurrent Memory with Optimal Polynomial Projections
S4 model:
Gu et al 2021 - Efficiently Modeling Long Sequences with Structured State Spaces
Gu et al 2022 - How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections
S5 model:
Smith et al 2022 - Simplified State Space Layers for Sequence Modeling
Mega:
Ma et al 2022 - Mega: Moving Average Equipped Gated Attention
SOTA on the Long Range Arena benchmark. Combines gated attention with an exponential moving average (a simple state-space component). Still O(n^2) runtime in its full-attention form, however
H3 model:
Fu et al 2022 - Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Orvieto et al 2023 - Resurrecting Recurrent Neural Networks for Long Sequences
Gives a nice history
Poli et al 2023 - Hyena Hierarchy: Towards Larger Convolutional Language Models
Sun et al 2023 - Retentive Network: A Successor to Transformer for Large Language Models
Gu & Dao 2023 - Mamba: Linear-Time Sequence Modeling with Selective State Spaces
First rejected at ICLR (bad reviewing) and then accepted to COLM (reviews). A sketch of the selective-scan recurrence is given after this list
Mamba: The Hard Way (Sasha Rush's tutorial implementation):
https://github.com/sustcsonglin/mamba-triton/tree/master
Wang et al 2024 - MambaByte: Token-free Selective State Space Model
De et al 2024 - Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Ali et al 2024 - The Hidden Attention of Mamba Models
Lieber et al 2024 - Jamba: A Hybrid Transformer-Mamba Language Model
Dao & Gu 2024 - Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Bick et al 2024 - Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
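The "selective" twist in Mamba (referenced from the entry above) is that the discretization step, and through it A_bar and B_bar, becomes a function of the current input, so the recurrence can choose what to write into its state. A toy NumPy sketch of that idea follows; the parameter names (W_delta, W_B, W_C) are illustrative, and the real model replaces this Python loop with a hardware-aware parallel scan.

    import numpy as np

    def selective_scan(u, A, W_delta, W_B, W_C):
        """Toy selective SSM over d channels with state size n.
        Unlike S4, the step size delta_t (and hence A_bar, B_bar)
        depends on the input u_t. Shapes: u is (L, d), A is (d, n)."""
        L, d = u.shape
        x = np.zeros((d, A.shape[1]))                  # per-channel state, (d, n)
        ys = np.empty((L, d))
        for t in range(L):
            delta = np.log1p(np.exp(u[t] @ W_delta))   # softplus: input-dependent step size
            A_bar = np.exp(delta[:, None] * A)         # discretized transition, per channel
            B_t = u[t] @ W_B                           # input-dependent input matrix, (n,)
            C_t = u[t] @ W_C                           # input-dependent readout, (n,)
            x = A_bar * x + (delta[:, None] * B_t[None, :]) * u[t][:, None]
            ys[t] = x @ C_t
        return ys

    # Toy usage on random data
    rng = np.random.default_rng(0)
    L, d, n = 32, 8, 4
    A = -np.exp(rng.standard_normal((d, n)))           # negative entries keep the state stable
    y = selective_scan(rng.standard_normal((L, d)), A,
                       rng.standard_normal((d, d)),
                       rng.standard_normal((d, n)),
                       rng.standard_normal((d, n)))

Since A_bar and B_bar now vary per timestep, the convolutional view from S4 no longer applies; linear time comes instead from an associative (parallel) scan over the recurrence.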
Papers
Vardasbi et al 2023 - State Spaces Aren’t Enough: Machine Translation Needs Attention
Analysis and Mechanistic Interpretability
Sharma et al 2024 - Locating and Editing Factual Associations in Mamba
Endy et al 2025 - Mamba Knockout for Unraveling Factual Information Flow
Theoretical Properties
Merrill et al 2024 - The Illusion of State in State-Space Models
People
Tri Dao
Albert Gu
Related Pages
RNN
Seq2seq