ml:gpu_deep_learning

===== Miscellaneous Transformer & GPU Papers =====

  * [[https://arxiv.org/pdf/2309.06180|Kwon et al 2023 - Efficient Memory Management for Large Language Model Serving with PagedAttention]]
  * [[https://arxiv.org/pdf/2205.05198|Korthikanti et al 2022 - Reducing Activation Recomputation in Large Transformer Models]]
  * [[https://arxiv.org/pdf/2503.15798|Jie et al 2025 - Mixture of Lookup Experts]]
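As a reading note on the Kwon et al 2023 paper above: the core mechanism of PagedAttention is a page-table-style indirection for the KV cache, so memory is allocated in small fixed-size blocks on demand instead of one contiguous region sized for the maximum sequence length. A minimal sketch (class, names, and block size are illustrative, not vLLM's actual API):

```python
# Illustrative sketch of PagedAttention-style KV-cache paging,
# NOT vLLM's implementation. Logical token positions map to
# fixed-size physical blocks through a per-sequence block table.

BLOCK_SIZE = 4  # tokens per physical block (small, for illustration)

class PagedKVCache:
    def __init__(self):
        self.blocks = []        # physical blocks, each a list of K/V entries
        self.block_tables = {}  # seq_id -> list of physical block indices

    def append(self, seq_id, kv):
        table = self.block_tables.setdefault(seq_id, [])
        # allocate a new physical block only when the last one is full
        if not table or len(self.blocks[table[-1]]) == BLOCK_SIZE:
            self.blocks.append([])
            table.append(len(self.blocks) - 1)
        self.blocks[table[-1]].append(kv)

    def get(self, seq_id, pos):
        # translate logical position -> (block, offset), like a page table
        table = self.block_tables[seq_id]
        return self.blocks[table[pos // BLOCK_SIZE]][pos % BLOCK_SIZE]

cache = PagedKVCache()
for t in range(10):
    cache.append("seq0", ("k%d" % t, "v%d" % t))

print(cache.get("seq0", 7))                    # ('k7', 'v7')
print(len(cache.block_tables["seq0"]))         # 3 blocks for 10 tokens
```

Because sequences only hold block indices, fragmentation stays bounded by one partially filled block per sequence, which is the memory-waste argument the paper makes against contiguous preallocation.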
  
===== Customized Implementations on GPUs =====
Last modified: 2025/05/31 08:02 by jmflanig