Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer • arXiv:1701.06538 • Published Jan 23, 2017
ST-MoE: Designing Stable and Transferable Sparse Expert Models • arXiv:2202.08906 • Published Feb 17, 2022
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM • arXiv:2403.07816 • Published Mar 12, 2024