Efficient Neural Networks

Methods for improving the efficiency of neural networks.

Overviews

For LLMs

Efficient Transformers

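Pope et al 2022 - Efficiently Scaling Transformer Inference. Discusses the KV cache, which stores the attention keys and values of earlier tokens so they are not recomputed at each decoding step.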
Gim et al 2023 - Prompt Cache: Modular Attention Reuse for Low-Latency Inference
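
As a rough illustration of the KV cache idea mentioned above, here is a minimal sketch of single-head autoregressive attention with and without a cache. It is not taken from either paper. The function names (`project`, `attend`, `decode_with_cache`, `decode_without_cache`) and the plain-list weight matrices `Wq`, `Wk`, `Wv` are assumptions made for illustration only. Real implementations use batched tensors, multiple heads and layers, and preallocated cache buffers.

```python
import math

def project(x, W):
    """Return x @ W for a vector x (length d_in) and a matrix W (d_in x d_out)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]

def decode_with_cache(X, Wq, Wk, Wv):
    """Process tokens one at a time, projecting each token to a key/value once."""
    keys, values, outputs = [], [], []  # the (hypothetical) KV cache
    for x in X:
        keys.append(project(x, Wk))     # only the newest token is projected
        values.append(project(x, Wv))
        outputs.append(attend(project(x, Wq), keys, values))
    return outputs

def decode_without_cache(X, Wq, Wk, Wv):
    """Baseline: recompute every key and value for the whole prefix at each step."""
    outputs = []
    for t in range(len(X)):
        keys = [project(x, Wk) for x in X[:t + 1]]
        values = [project(x, Wv) for x in X[:t + 1]]
        outputs.append(attend(project(X[t], Wq), keys, values))
    return outputs
```

Both functions return the same outputs. The cached version performs one key projection and one value projection per token, while the baseline repeats those projections for the entire prefix at every step, so its projection cost grows quadratically with sequence length. The cost of the cache is memory that grows linearly with the number of tokens.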

Related Pages

Last modified: 2025/04/02 05:02 by jmflanig