Idea for the future - combine this with traditional merging
#2 · opened by Henk717
A possible concept to further improve this: have each expert contain a mild merge of the other experts to ensure overlap.
This would make it a bit closer to what MDEL / Aurora is doing, since they plan to have one base model that is a traditional merge, with MoEs on top for enhancement. Since I believe this architecture is a little different, we might be able to create a similar effect by merging each expert with the others at a low percentage and then using the resulting models as the experts for the MoE. A rough sketch of what that blending step could look like is below.
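A minimal sketch of the blending step, assuming the experts are plain PyTorch state dicts with matching keys; the file names and the 10% blend ratio are purely illustrative, not a recommendation:

```python
# Sketch: before assembling the MoE, blend each expert with a small fraction
# of the average of the other experts so they share some overlap.
import torch

EXPERT_PATHS = ["expert_a.pt", "expert_b.pt", "expert_c.pt"]  # hypothetical checkpoints
BLEND = 0.10  # "mild merge": keep 90% of the expert, mix in 10% of the others

experts = [torch.load(p, map_location="cpu") for p in EXPERT_PATHS]

blended = []
for i, expert in enumerate(experts):
    others = [e for j, e in enumerate(experts) if j != i]
    new_state = {}
    for name, weight in expert.items():
        # Average the corresponding tensor across the other experts,
        # then linearly interpolate this expert toward that average.
        mean_other = torch.stack([o[name] for o in others]).mean(dim=0)
        new_state[name] = (1.0 - BLEND) * weight + BLEND * mean_other
    blended.append(new_state)

for path, state in zip(EXPERT_PATHS, blended):
    torch.save(state, path.replace(".pt", "_blended.pt"))
```

The blended checkpoints would then be used in place of the originals when building the MoE.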
That's an interesting thought - I'll definitely have to try it. Thanks!