====== Mixture-of-Experts (MoE) Models ======

Resources on mixture-of-experts (MoE) models, with a focus on sparse MoE models.

===== Overviews =====

  * [[https://arxiv.org/pdf/2209.01667|Fedus et al 2022 - A Review of Sparse Expert Models in Deep Learning]]
  * **For LLMs**
    * [[https://arxiv.org/pdf/2407.06204|Cai et al 2024 - A Survey on Mixture of Experts]]

===== Foundational and Early Papers =====

  * [[https://arxiv.org/pdf/1701.06538|Shazeer et al 2017 - Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer]]

===== MoE Large Language Models =====

  * [[https://arxiv.org/pdf/2101.03961|Fedus et al 2021 - Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity]]
  * [[https://arxiv.org/pdf/2401.04088|Jiang et al 2024 - Mixtral of Experts]]
  * [[https://arxiv.org/pdf/2401.06066|Dai et al 2024 - DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models]]
  * [[https://arxiv.org/pdf/2402.01739|Xue et al 2024 - OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models]]
  * [[https://arxiv.org/pdf/2406.12034|Kang et al 2024 - Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts]]
  * [[https://arxiv.org/pdf/2406.18219|Lo et al 2024 - A Closer Look into Mixture-of-Experts in Large Language Models]]
  * [[https://arxiv.org/pdf/2410.07348|Jin et al 2024 - MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts]]
  * [[https://arxiv.org/pdf/2505.21411|Tang et al 2025 - Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity]]
  * [[https://arxiv.org/pdf/2505.22323|Guo et al 2025 - Advancing Expert Specialization for Better MoE]]

===== People =====

  * [[https://scholar.google.com/citations?user=EtliLXcAAAAJ&hl=en|Barret Zoph]]

===== Related Pages =====

  * [[Conditional Computation]]
  * [[nlp:Language Model]]
  * [[Model Compression]]
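To make the "sparse" part concrete: in a sparsely-gated MoE layer (the idea popularized by Shazeer et al 2017), a small gating network scores all experts per token, but only the top-k experts are actually evaluated, and their outputs are mixed by the renormalized gate weights. The sketch below is a minimal NumPy illustration of that routing pattern; all function and variable names are illustrative, not taken from any of the papers above, and real implementations add load-balancing losses, batching per expert, and capacity limits.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sparse_moe_layer(x, gate_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:        (tokens, d) input activations
    gate_w:   (d, n_experts) gating weights
    experts:  list of callables, each mapping a (d,) vector to a (d,) vector
    """
    logits = x @ gate_w                       # (tokens, n_experts) gate scores
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(logits[t])[-k:]      # indices of the k highest-scoring experts
        gates = softmax(logits[t, top])       # renormalize only over the selected experts
        for g, e in zip(gates, top):
            out[t] += g * experts[e](x[t])    # only these k experts run for this token
    return out

# Toy usage: 4 experts, each a fixed random linear map.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [
    (lambda v, W=rng.normal(size=(d, d)) / d: v @ W)
    for _ in range(n_experts)
]
gate_w = rng.normal(size=(d, n_experts))
x = rng.normal(size=(5, d))
y = sparse_moe_layer(x, gate_w, experts, k=2)
print(y.shape)  # (5, 8): same shape as the input, but each token touched only 2 of 4 experts
```

The key property is that compute per token scales with k, not with the total expert count, which is what lets the papers above grow parameter counts far faster than FLOPs.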