Table of Contents
Mixture of Experts (MoE) Models
Overviews
Foundational and Early Papers
MoE Large Language Models
People
Related Pages
Mixture of Experts (MoE) Models
Mixture of experts (MoE) models, with a focus on sparse MoE models.
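As context for the papers below, a minimal sketch of a sparsely-gated MoE layer with top-k routing (in the spirit of Shazeer et al 2017); the class name, layer sizes, and expert architecture are illustrative assumptions, not taken from any specific paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Sketch of a sparse MoE layer: each token is routed to its top-k experts."""

    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router producing expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model)
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        logits = self.gate(x)                                # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)                 # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: total parameters grow with num_experts, but each token only activates top_k experts.
x = torch.randn(10, 64)
y = SparseMoE()(x)
print(y.shape)  # torch.Size([10, 64])
```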
Overviews
Fedus et al 2022 - A Review of Sparse Expert Models in Deep Learning
Cai et al 2024 - A Survey on Mixture of Experts (focuses on LLMs)
Foundational and Early Papers
Shazeer et al 2017 - Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
MoE Large Language Models
Fedus et al 2021 - Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Jiang et al 2024 - Mixtral of Experts
Dai et al 2024 - DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Xue et al 2024 - OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Kang et al 2024 - Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Lo et al 2024 - A Closer Look into Mixture-of-Experts in Large Language Models
Jin et al 2024 - MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
Tang et al 2025 - Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity
Guo et al 2025 - Advancing Expert Specialization for Better MoE
People
Barret Zoph
Related Pages
Conditional Computation
Language Model
Model Compression