Efficient Neural Networks

Methods for making neural networks more efficient.

Overviews

For LLMs
- Wan et al 2023 - Efficient Large Language Models: A Survey
- Zhou et al 2024 - A Survey on Efficient Inference for Large Language Models

Efficient Transformers

- Pope et al 2022 - Efficiently Scaling Transformer Inference. Analyzes partitioning strategies for inference on large Transformers, including the memory cost of the KV cache.
- Gim et al 2023 - Prompt Cache: Modular Attention Reuse for Low-Latency Inference
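The KV cache can be illustrated with a minimal sketch. During autoregressive decoding, the key and value vectors of earlier tokens are stored so that each new step only projects the newest token instead of recomputing the whole prefix. The single-head attention, the helper names (`KVCache`, `decode_step`, `full_recompute`) and the toy weights are hypothetical and chosen for illustration. This is not code from either paper.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [dot(row, x) for row in W]

def attend(q, keys, values):
    # Scaled dot-product attention for one query over all given positions.
    d = len(q)
    scores = [dot(q, k) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

class KVCache:
    """Hypothetical cache holding the key/value vectors of every token seen so far."""
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def decode_step(x, Wq, Wk, Wv, cache):
    # Project only the newest token; earlier keys/values are read from the cache.
    q, k, v = matvec(Wq, x), matvec(Wk, x), matvec(Wv, x)
    cache.append(k, v)
    return attend(q, cache.keys, cache.values)

def full_recompute(xs, Wq, Wk, Wv):
    # Baseline without a cache: re-project the whole prefix at every step (causal).
    outs = []
    for t in range(len(xs)):
        keys = [matvec(Wk, x) for x in xs[: t + 1]]
        values = [matvec(Wv, x) for x in xs[: t + 1]]
        outs.append(attend(matvec(Wq, xs[t]), keys, values))
    return outs

# Toy example: three 2-dimensional token embeddings and made-up weights.
Wq = [[0.1, 0.2], [0.0, 0.3]]
Wk = [[0.5, -0.1], [0.2, 0.4]]
Wv = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

cache = KVCache()
cached_outputs = [decode_step(x, Wq, Wk, Wv, cache) for x in tokens]
baseline_outputs = full_recompute(tokens, Wq, Wk, Wv)
```

Both paths produce the same attention outputs, but the cached path does one key/value projection per step rather than one per prefix token, at the cost of memory that grows linearly with sequence length.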

Related Pages

Last modified: 2025/04/02 04:57 by jmflanig