Table of Contents
Efficient Neural Networks
Overviews
Efficient Transformers
Related Pages
Efficient Neural Networks
Methods for improving the computational and memory efficiency of neural networks.
Overviews
General
Sze et al 2017 - Efficient Processing of Deep Neural Networks: A Tutorial and Survey
For LLMs
Wan et al 2023 - Efficient Large Language Models: A Survey
Zhou et al 2024 - A Survey on Efficient Inference for Large Language Models
Reasoning LLMs
Wang et al 2025 - Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models
Efficient Transformers
Pope et al 2022 - Efficiently Scaling Transformer Inference
Introduced the idea of the KV cache: storing the attention keys and values of earlier tokens so they are not recomputed at every decoding step.
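A minimal sketch of the idea, assuming single-head attention with toy random weights (all names here are illustrative, not from the paper): each decode step projects only the new token and appends its key/value to the cache, so attention over the full prefix never recomputes old projections.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector.
    # q: (d,), K and V: (t, d) stacked over cached positions.
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def decode_step(x, W_q, W_k, W_v, cache):
    # Project only the NEW token; reuse all earlier keys/values
    # from the cache instead of recomputing them.
    q, k, v = W_q @ x, W_k @ x, W_v @ x
    cache["K"].append(k)
    cache["V"].append(v)
    return attention(q, np.stack(cache["K"]), np.stack(cache["V"]))

rng = np.random.default_rng(0)
d = 8
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
cache = {"K": [], "V": []}
for _ in range(5):  # autoregressive decode loop over 5 toy tokens
    out = decode_step(rng.standard_normal(d), W_q, W_k, W_v, cache)
```

The trade-off is memory: the cache grows linearly with sequence length, which is what the eviction methods below address.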
Gim et al 2023 - Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Zhang et al 2023 - H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Evicts tokens from the KV cache, keeping only the most important ones (the heavy hitters, H2s), identified by their accumulated attention scores.
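A rough sketch of the selection rule, under the assumption (consistent with the paper's description, but simplified here) that each cached position carries an accumulated attention score and that a recency window is always retained; function and parameter names are hypothetical:

```python
import numpy as np

def h2o_keep(acc_scores, recent, budget):
    """Return the cache positions to keep: the `recent` newest positions
    plus the highest accumulated-attention ('heavy hitter') positions,
    up to `budget` total."""
    n = len(acc_scores)
    keep = set(range(max(0, n - recent), n))   # always keep the recency window
    heavy = np.argsort(acc_scores)[::-1]       # positions by accumulated score, descending
    for i in heavy:
        if len(keep) >= budget:
            break
        keep.add(int(i))
    return sorted(keep)

# Toy accumulated attention mass per cached token position.
scores = np.array([0.9, 0.1, 0.05, 0.8, 0.02, 0.3, 0.01, 0.4])
kept = h2o_keep(scores, recent=2, budget=4)
print(kept)  # [0, 3, 6, 7]: positions 6-7 are recent, 0 and 3 are heavy hitters
```

Everything outside `kept` would be dropped from the cache, bounding memory at `budget` entries regardless of sequence length.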
Related Pages
Edge Computing
GPU Deep Learning
Model Compression
Systems & ML