ml:efficient_nns
Differences
| ml:efficient_nns [2025/05/07 06:16] – [Efficient Transformers] jmflanig | ml:efficient_nns [2025/05/07 06:17] (current) – [Efficient Transformers] jmflanig | ||
|---|---|---|---|
| Line 14: | Line 14: | ||
| * [[https:// | * [[https:// | ||
| * [[https:// | * [[https:// | ||
| - | * [[Zhang et al 2023 - H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models]] Removes tokens from the kv-cache, and keeps the most important ones (the heavy-hitters, | + | * [[https:// |
| ===== Related Pages ===== | ===== Related Pages ===== | ||
ml/efficient_nns.1746598612.txt.gz · Last modified: 2025/05/07 06:16 by jmflanig