
Efficient Neural Networks

Methods for making neural networks more efficient in compute, memory, and latency.

Overviews

Efficient Transformers

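Pope et al 2022 - Efficiently Scaling Transformer Inference A widely cited reference for the KV cache: the attention keys and values of earlier tokens are stored and reused at each decoding step instead of being recomputed.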
Gim et al 2023 - Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Zhang et al 2023 - H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models Evicts tokens from the KV cache, keeping recent tokens and the most important ones (the heavy hitters, H2s)
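
To make the KV cache and heavy-hitter eviction concrete, here is a minimal single-head sketch in pure Python. It is a simplified illustration, not the implementation from any of the papers above: the class name `ToyKVCache`, the `budget` and `recent` parameters, and the scoring rule (accumulated attention weight per cached token) are assumptions chosen for readability.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyKVCache:
    """Toy single-head KV cache with a simplified heavy-hitter eviction rule.

    Hypothetical illustration: `budget` is the maximum number of cached
    tokens, and the `recent` newest tokens are never evicted.
    """

    def __init__(self, budget, recent):
        assert 0 <= recent < budget
        self.budget = budget
        self.recent = recent
        self.keys = []
        self.values = []
        self.scores = []     # accumulated attention weight per cached token
        self.positions = []  # original position of each cached token

    def step(self, pos, q, k, v):
        """Cache (k, v) for the new token, then attend over the cache with q."""
        self.keys.append(k)
        self.values.append(v)
        self.scores.append(0.0)
        self.positions.append(pos)

        # Scaled dot-product attention over cached keys only; nothing for
        # earlier tokens is recomputed.
        scale = 1.0 / math.sqrt(len(q))
        logits = [scale * sum(a * b for a, b in zip(q, key)) for key in self.keys]
        weights = softmax(logits)

        # Track how much attention each cached token has received so far.
        for i, w in enumerate(weights):
            self.scores[i] += w

        out = [sum(w * val[d] for w, val in zip(weights, self.values))
               for d in range(len(v))]
        self._evict()
        return out

    def _evict(self):
        """Drop the lowest-scoring non-recent tokens until within budget."""
        while len(self.keys) > self.budget:
            n_candidates = len(self.keys) - self.recent
            victim = min(range(n_candidates), key=lambda i: self.scores[i])
            for buf in (self.keys, self.values, self.scores, self.positions):
                del buf[victim]


# Usage: token 0 has a key that every query matches strongly, so it becomes
# a heavy hitter and survives eviction while the cache stays within budget.
cache = ToyKVCache(budget=3, recent=1)
out = cache.step(0, q=[4.0, 0.0], k=[4.0, 0.0], v=[1.0, 0.0])
for pos in range(1, 6):
    out = cache.step(pos, q=[4.0, 0.0], k=[0.0, 1.0], v=[0.0, 1.0])
```

The memory cost of the cache is bounded by `budget` rather than growing with sequence length; the trade-off is that evicted tokens can no longer be attended to.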

Related Pages

ml/efficient_nns.1746598612.txt.gz · Last modified: 2025/05/07 06:16 by jmflanig