Efficient Neural Networks
Methods for improving the efficiency of neural networks.
Overviews
- General
- For LLMs
- Reasoning LLMs
Efficient Transformers
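- Pope et al. 2022 - Efficiently Scaling Transformer Inference. Models inference cost to choose partitioning layouts for large Transformers on TPU v4, and shows that multiquery attention shrinks the KV cache enough to support much longer contexts.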
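- Zhang et al. 2023 - H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. Evicts tokens from the KV cache, keeping only the most important ones (the heavy hitters, H2, identified by their accumulated attention scores) together with the most recent tokens.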
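A minimal single-head sketch in Python of the two ideas listed above: a KV cache, where each decoding step appends only the new token's key and value and attends over everything cached so far, and a simplified H2O-style eviction policy that caps the cache at a fixed budget by keeping the newest tokens plus the older tokens with the highest accumulated attention. The names `KVCache`, `step`, `budget`, and `recent` are hypothetical, the query/key/value projections are assumed to be computed by the caller, and this is not the H2O authors' implementation (which works on batched per-head tensors inside every layer).

```python
import math
from dataclasses import dataclass, field


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class KVCache:
    """Single-head KV cache with a simplified, H2O-style eviction policy (hypothetical sketch)."""
    budget: int  # maximum number of cached tokens (hypothetical parameter)
    recent: int  # number of newest tokens that are never evicted (hypothetical parameter)
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)
    positions: list = field(default_factory=list)   # original position of each cached token
    acc_scores: list = field(default_factory=list)  # attention each cached token has received so far

    def __post_init__(self):
        if not (self.budget >= 1 and 0 <= self.recent <= self.budget):
            raise ValueError("need budget >= 1 and 0 <= recent <= budget")

    def step(self, pos, q, k, v):
        """Decode one token: cache its k/v, attend over the cache, then evict if over budget."""
        # KV cache: past keys/values are reused; only the new token's k/v are added.
        self.keys.append(k)
        self.values.append(v)
        self.positions.append(pos)
        self.acc_scores.append(0.0)
        scale = 1.0 / math.sqrt(len(q))
        weights = softmax([dot(q, key) * scale for key in self.keys])
        # Accumulate attention mass per cached token (the heavy-hitter statistic).
        for i, w in enumerate(weights):
            self.acc_scores[i] += w
        out = [sum(w * val[d] for w, val in zip(weights, self.values))
               for d in range(len(v))]
        self._evict()
        return out

    def _evict(self):
        n = len(self.keys)
        if n <= self.budget:
            return
        # Keep the `recent` newest tokens, then fill the remaining budget with the
        # older tokens that have the largest accumulated attention (heavy hitters).
        first_recent = n - self.recent
        heavy = sorted(range(first_recent), key=lambda i: self.acc_scores[i],
                       reverse=True)[: self.budget - self.recent]
        keep = sorted(heavy) + list(range(first_recent, n))
        self.keys = [self.keys[i] for i in keep]
        self.values = [self.values[i] for i in keep]
        self.positions = [self.positions[i] for i in keep]
        self.acc_scores = [self.acc_scores[i] for i in keep]


# Usage: decode 10 toy tokens with room for only 4 cache entries.
cache = KVCache(budget=4, recent=2)
for pos in range(10):
    x = [math.sin(pos + d) for d in range(3)]  # stand-in for projected q/k/v
    out = cache.step(pos, q=x, k=x, v=x)
print(cache.positions)
```

With a budget at least as large as the sequence, nothing is evicted and each step's output equals full causal attention over all previous tokens. With a smaller budget, the cache never holds more than `budget` entries, so per-step memory and attention cost stay bounded instead of growing with sequence length, at the price of approximating attention over the evicted tokens.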
Related Pages
Last modified: 2025/05/07 06:16 by jmflanig