Efficient Neural Networks
Methods for making neural networks more efficient.
Overviews
- General
- For LLMs
- Reasoning LLMs
Efficient Transformers
- Pope et al 2022 - Efficiently Scaling Transformer Inference: introduced the idea of the KV cache, which stores the key and value projections of past tokens so they are not recomputed at every decoding step.
- Zhang et al 2023 - H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models: evicts tokens from the KV cache, keeping only the most important ones (the heavy hitters, H2s) plus the most recent tokens.
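The KV cache idea above can be sketched in a few lines: at each decoding step, only the newest token's key/value projections are computed and appended to a cache, and attention runs against the cached history. This is a minimal NumPy sketch with made-up random projections, not any library's actual implementation.

```python
import numpy as np

def attend(q, K, V):
    # Standard scaled dot-product attention for a single query.
    # q: (d,), K and V: (t, d) where t = number of cached tokens.
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d = 8
K_cache = np.empty((0, d))  # grows by one row per generated token
V_cache = np.empty((0, d))

for step in range(4):
    # Only the newest token is projected to k/v; past projections
    # are reused from the cache instead of being recomputed.
    k_new = rng.normal(size=(1, d))
    v_new = rng.normal(size=(1, d))
    K_cache = np.vstack([K_cache, k_new])
    V_cache = np.vstack([V_cache, v_new])
    q = rng.normal(size=d)
    out = attend(q, K_cache, V_cache)

print(K_cache.shape)  # cache holds one k/v pair per decoded token
```

The trade-off this illustrates is the one the later papers attack: decode-time compute per step stays flat, but cache memory grows linearly with sequence length.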
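The H2O-style eviction policy can be sketched as: keep a running score of how much attention each cached token has received, and when the cache exceeds a budget, retain the most recent tokens plus the highest-scoring older ones. This is a simplified illustration of the idea, not the paper's implementation; the function name, the `recent` window, and the demo scores are all made up here.

```python
import numpy as np

def evict_heavy_hitters(K, V, acc_scores, budget, recent=2):
    """Trim the KV cache to `budget` entries: always keep the `recent`
    newest tokens, and fill the rest with the older tokens that have
    accumulated the most attention (the heavy hitters).
    acc_scores: (t,) accumulated attention mass per cached token."""
    t = K.shape[0]
    if t <= budget:
        return K, V, acc_scores
    old = np.arange(t - recent)  # eviction candidates (non-recent tokens)
    keep_old = old[np.argsort(acc_scores[old])[-(budget - recent):]]
    keep = np.sort(np.concatenate([keep_old, np.arange(t - recent, t)]))
    return K[keep], V[keep], acc_scores[keep]

K = np.arange(12, dtype=float).reshape(6, 2)     # 6 cached tokens, d=2
V = K.copy()
acc = np.array([5.0, 1.0, 3.0, 2.0, 0.0, 0.0])   # made-up attention totals
K2, V2, acc2 = evict_heavy_hitters(K, V, acc, budget=4)
print(K2.shape)  # cache trimmed to the budget
```

With these scores, the policy keeps tokens 0 and 2 (the two heavy hitters among the older tokens) plus the two most recent tokens, dropping tokens 1 and 3.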
Related Pages
ml/efficient_nns.txt · Last modified: 2025/05/07 06:17 by jmflanig