ml:gpu_deep_learning

ml:gpu_deep_learning [2025/05/13 18:10] – [Details of Deep Learning on GPUs] jmflanig
ml:gpu_deep_learning [2025/07/17 03:25] (current) – [Miscellaneous Transformer & GPU Papers] jmflanig
Line 102: Line 102:
   * [[https://arxiv.org/pdf/2309.06180|Kwon et al 2023 - Efficient Memory Management for Large Language Model Serving with PagedAttention]]
   * [[https://arxiv.org/pdf/2205.05198|Korthikanti et al 2022 - Reducing Activation Recomputation in Large Transformer Models]]
 +  * [[https://arxiv.org/pdf/2503.15798|Jie et al 2025 - Mixture of Lookup Experts]]
  
 ===== Customized Implementations on GPUs =====
Line 114: Line 115:
   * [[https://arxiv.org/pdf/2205.14135.pdf|Dao et al 2022 - FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness]]
   * [[https://arxiv.org/pdf/2307.08691.pdf|Dao 2023 - FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning]]
 +  * [[https://arxiv.org/pdf/2407.08608|Shah et al 2024 - FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision]]
 +  * [[https://arxiv.org/pdf/2505.22758|Nrusimha et al 2025 - FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference]] Fuses the whole model into one large kernel.
  
 ===== Resources =====
ml/gpu_deep_learning.1747159846.txt.gz · Last modified: 2025/05/13 18:10 by jmflanig
