Table of Contents
State-Space Models
Overviews
Key Papers
Papers
Analysis and Mechanistic Interpretability
Theoretical Properties
People
Related Pages
State-Space Models
Overviews
Survey Papers
2024 - Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modeling: Methods, Applications, and Challenges
Papers with Good Overviews
Gu et al 2020 - HiPPO: Recurrent Memory with Optimal Polynomial Projections
S4 model:
Gu et al 2021 - Efficiently Modeling Long Sequences with Structured State Spaces
Good intro to state spaces; the basic recurrence all of these papers share is sketched after this list
Orvieto et al 2023 - Resurrecting Recurrent Neural Networks for Long Sequences
Gu & Dao 2023 - Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Nice overview of SSMs
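For orientation, everything in this section builds on the same discretized linear recurrence: x_t = A_bar x_{t-1} + B_bar u_t, y_t = C x_t. Below is a minimal NumPy sketch of that recurrence in sequential (recurrent) mode; the variable names are illustrative and not taken from any particular paper.

    import numpy as np

    def ssm_scan(A_bar, B_bar, C, u):
        """Discretized linear SSM in recurrent mode:
        x_t = A_bar @ x_{t-1} + B_bar * u_t,  y_t = C @ x_t."""
        x = np.zeros(A_bar.shape[0])   # hidden state of dimension n
        ys = []
        for u_t in u:                  # O(L) sequential scan over the input
            x = A_bar @ x + B_bar * u_t
            ys.append(C @ x)
        return np.array(ys)

    # Toy usage: a small stable system on a random length-16 input
    rng = np.random.default_rng(0)
    A_bar = 0.9 * np.eye(4)            # contractive transition, so the state decays
    B_bar = rng.standard_normal(4)
    C = rng.standard_normal(4)
    y = ssm_scan(A_bar, B_bar, C, rng.standard_normal(16))

Because (A_bar, B_bar, C) do not depend on the input, the same map also unrolls into a length-L convolution with kernel k_t = C A_bar^t B_bar; computing that kernel efficiently is the core trick that lets S4 train in parallel.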
Key Papers
Gu et al 2020 - HiPPO: Recurrent Memory with Optimal Polynomial Projections
S4 model:
Gu et al 2021 - Efficiently Modeling Long Sequences with Structured State Spaces
Gu et al 2022 - How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections
S5 model:
Smith et al 2022 - Simplified State Space Layers for Sequence Modeling
Mega:
Ma et al 2022 - Mega: Moving Average Equipped Gated Attention
SOTA on the Long Range Arena benchmark. Combines gated attention with an exponential moving average (a simple state-space component). Still O(n^2) runtime in its full-attention form, however
H3 model:
Fu et al 2022 - Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Orvieto et al 2023 - Resurrecting Recurrent Neural Networks for Long Sequences
Gives a nice history
Poli et al 2023 - Hyena Hierarchy: Towards Larger Convolutional Language Models
Sun et al 2023 - Retentive Network: A Successor to Transformer for Large Language Models
Gu & Dao 2023 - Mamba: Linear-Time Sequence Modeling with Selective State Spaces
First rejected at ICLR (bad reviewing) and then accepted to COLM (reviews). A sketch of the selective-scan recurrence is given after this list
Mamba: The Hard Way (Sasha Rush's tutorial implementation):
https://github.com/sustcsonglin/mamba-triton/tree/master
Wang et al 2024 - MambaByte: Token-free Selective State Space Model
De et al 2024 - Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Ali et al 2024 - The Hidden Attention of Mamba Models
Lieber et al 2024 - Jamba: A Hybrid Transformer-Mamba Language Model
Dao & Gu 2024 - Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Bick et al 2024 - Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
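The "selective" twist in Mamba (referenced from the entry above) is that the discretization step, and through it A_bar and B_bar, becomes a function of the current input, so the recurrence can choose what to write into its state. A toy NumPy sketch of that idea follows; the parameter names (W_delta, W_B, W_C) are illustrative, and the real model replaces this Python loop with a hardware-aware parallel scan.

    import numpy as np

    def selective_scan(u, A, W_delta, W_B, W_C):
        """Toy selective SSM over d channels with state size n.
        Unlike S4, the step size delta_t (and hence A_bar, B_bar)
        depends on the input u_t. Shapes: u is (L, d), A is (d, n)."""
        L, d = u.shape
        x = np.zeros((d, A.shape[1]))                  # per-channel state, (d, n)
        ys = np.empty((L, d))
        for t in range(L):
            delta = np.log1p(np.exp(u[t] @ W_delta))   # softplus: input-dependent step size
            A_bar = np.exp(delta[:, None] * A)         # discretized transition, per channel
            B_t = u[t] @ W_B                           # input-dependent input matrix, (n,)
            C_t = u[t] @ W_C                           # input-dependent readout, (n,)
            x = A_bar * x + (delta[:, None] * B_t[None, :]) * u[t][:, None]
            ys[t] = x @ C_t
        return ys

    # Toy usage on random data
    rng = np.random.default_rng(0)
    L, d, n = 32, 8, 4
    A = -np.exp(rng.standard_normal((d, n)))           # negative entries keep the state stable
    y = selective_scan(rng.standard_normal((L, d)), A,
                       rng.standard_normal((d, d)),
                       rng.standard_normal((d, n)),
                       rng.standard_normal((d, n)))

Since A_bar and B_bar now vary per timestep, the convolutional view from S4 no longer applies; linear time comes instead from an associative (parallel) scan over the recurrence.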
Papers
Vardasbi et al 2023 - State Spaces Aren’t Enough: Machine Translation Needs Attention
Analysis and Mechanistic Interpretability
Sharma et al 2024 - Locating and Editing Factual Associations in Mamba
Endy et al 2025 - Mamba Knockout for Unraveling Factual Information Flow
Theoretical Properties
Merrill et al 2024 - The Illusion of State in State-Space Models
People
Tri Dao
Albert Gu
Related Pages
RNN
Seq2seq