ml:gpu_deep_learning

===== Miscellaneous Transformer & GPU Papers =====

  * [[https://arxiv.org/pdf/2309.06180|Kwon et al 2023 - Efficient Memory Management for Large Language Model Serving with PagedAttention]]
  * [[https://arxiv.org/pdf/2205.05198|Korthikanti et al 2022 - Reducing Activation Recomputation in Large Transformer Models]]
  * [[https://arxiv.org/pdf/2503.15798|Jie et al 2025 - Mixture of Lookup Experts]]
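As a reading note on the Kwon et al 2023 paper above: the core mechanism of PagedAttention is a page-table-style indirection for the KV cache, so memory is allocated in small fixed-size blocks on demand instead of one contiguous region sized for the maximum sequence length. A minimal sketch (class, names, and block size are illustrative, not vLLM's actual API):

```python
# Illustrative sketch of PagedAttention-style KV-cache paging,
# NOT vLLM's implementation. Logical token positions map to
# fixed-size physical blocks through a per-sequence block table.

BLOCK_SIZE = 4  # tokens per physical block (small, for illustration)

class PagedKVCache:
    def __init__(self):
        self.blocks = []        # physical blocks, each a list of K/V entries
        self.block_tables = {}  # seq_id -> list of physical block indices

    def append(self, seq_id, kv):
        table = self.block_tables.setdefault(seq_id, [])
        # allocate a new physical block only when the last one is full
        if not table or len(self.blocks[table[-1]]) == BLOCK_SIZE:
            self.blocks.append([])
            table.append(len(self.blocks) - 1)
        self.blocks[table[-1]].append(kv)

    def get(self, seq_id, pos):
        # translate logical position -> (block, offset), like a page table
        table = self.block_tables[seq_id]
        return self.blocks[table[pos // BLOCK_SIZE]][pos % BLOCK_SIZE]

cache = PagedKVCache()
for t in range(10):
    cache.append("seq0", ("k%d" % t, "v%d" % t))

print(cache.get("seq0", 7))                    # ('k7', 'v7')
print(len(cache.block_tables["seq0"]))         # 3 blocks for 10 tokens
```

Because sequences only hold block indices, fragmentation stays bounded by one partially filled block per sequence, which is the memory-waste argument the paper makes against contiguous preallocation.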
  
===== Customized Implementations on GPUs =====
Last modified: 2025/05/31 08:02 by jmflanig