Table of Contents
Mixture of Experts (MoE) Models
Overviews
Foundational and Early Papers
MoE Large Language Models
People
Related Pages
Mixture of Experts (MoE) Models
Mixture of experts (MoE) models, with a focus on sparse MoE models.
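As context for the papers below, a minimal sketch of a sparsely-gated MoE layer with top-k routing (in the spirit of Shazeer et al 2017); the class name, layer sizes, and expert architecture are illustrative assumptions, not taken from any specific paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Sketch of a sparse MoE layer: each token is routed to its top-k experts."""

    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router producing expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model)
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        logits = self.gate(x)                                # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)                 # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: total parameters grow with num_experts, but each token only activates top_k experts.
x = torch.randn(10, 64)
y = SparseMoE()(x)
print(y.shape)  # torch.Size([10, 64])
```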
Overviews
Fedus et al 2022 - A Review of Sparse Expert Models in Deep Learning
Cai et al 2024 - A Survey on Mixture of Experts (focuses on LLMs)
Foundational and Early Papers
Shazeer et al 2017 - Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
MoE Large Language Models
Fedus et al 2021 - Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Jiang et al 2024 - Mixtral of Experts
Dai et al 2024 - DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Xue et al 2024 - OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Kang et al 2024 - Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Lo et al 2024 - A Closer Look into Mixture-of-Experts in Large Language Models
Jin et al 2024 - MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
Tang et al 2025 - Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity
Guo et al 2025 - Advancing Expert Specialization for Better MoE
People
Barret Zoph
Related Pages
Conditional Computation
Language Model
Model Compression