Great minds think alike. (Another similar architecture and system co-design)
#7, opened by withinmiaov
Amazing work! We also address the All-to-All communication problem with a similar communication-computation overlap approach, which we call the Shortcut-connected MoE architecture. We are excited to see this kind of approach applied to real products in industry. If you are interested, our paper has more details: "Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts" https://arxiv.org/abs/2404.05019