ml:distributed_training

  * [[https://arxiv.org/pdf/2401.10241.pdf|Qi et al 2024 - Zero Bubble Pipeline Parallelism]]
  * [[https://arxiv.org/pdf/2401.02669|Lin et al 2024 - Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache]]
  * [[https://arxiv.org/pdf/2408.04093|Shyam et al 2024 - Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters]]
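
The pipeline-parallel work above builds on the basic layer-wise split of a model across devices. As a point of reference, here is a minimal sketch of naive model parallelism in PyTorch; the device names, layer sizes, and two-stage split are illustrative assumptions, not drawn from the papers.

<code python>
# Minimal sketch of naive model (layer-wise) parallelism in PyTorch.
# Sizes, device names, and the two-stage split are illustrative only.
import torch
import torch.nn as nn

class TwoStageMLP(nn.Module):
    def __init__(self, dev0="cuda:0", dev1="cuda:1"):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # Stage 1 lives on the first device, stage 2 on the second.
        self.stage1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to(dev0)
        self.stage2 = nn.Linear(4096, 10).to(dev1)

    def forward(self, x):
        # Activations are copied across devices at the stage boundary.
        h = self.stage1(x.to(self.dev0))
        return self.stage2(h.to(self.dev1))

if torch.cuda.device_count() >= 2:
    model = TwoStageMLP()
    out = model(torch.randn(8, 1024))
    print(out.shape)  # torch.Size([8, 10])
</code>

With this naive split, each device sits idle while the other computes; that idle time is the "bubble" that pipeline schedules such as Qi et al's zero-bubble approach aim to eliminate by interleaving forward and backward passes of micro-batches.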
  
===== Distributed Serving (Inference) =====