Idea for the future - combine this with traditional merging
#2 · opened by Henk717
A possible concept to further improve this: have each expert contain a mild merge of the other experts to ensure overlap.
This would make it a bit closer to what MDEL / Aurora is doing, since they plan to have one base model that is a traditional merge, with MoEs on top for enhancement. Since I believe this architecture is a little different, we might be able to create a similar effect by merging each expert with the others at a low percentage and then using the resulting models as the experts for the MoE. A rough sketch of what that blending step could look like is below.
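A minimal sketch of the blending step, assuming the experts are plain PyTorch state dicts with matching keys; the file names and the 10% blend ratio are purely illustrative, not a recommendation:

```python
# Sketch: before assembling the MoE, blend each expert with a small fraction
# of the average of the other experts so they share some overlap.
import torch

EXPERT_PATHS = ["expert_a.pt", "expert_b.pt", "expert_c.pt"]  # hypothetical checkpoints
BLEND = 0.10  # "mild merge": keep 90% of the expert, mix in 10% of the others

experts = [torch.load(p, map_location="cpu") for p in EXPERT_PATHS]

blended = []
for i, expert in enumerate(experts):
    others = [e for j, e in enumerate(experts) if j != i]
    new_state = {}
    for name, weight in expert.items():
        # Average the corresponding tensor across the other experts,
        # then linearly interpolate this expert toward that average.
        mean_other = torch.stack([o[name] for o in others]).mean(dim=0)
        new_state[name] = (1.0 - BLEND) * weight + BLEND * mean_other
    blended.append(new_state)

for path, state in zip(EXPERT_PATHS, blended):
    torch.save(state, path.replace(".pt", "_blended.pt"))
```

The blended checkpoints would then be used in place of the originals when building the MoE.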
That's an interesting thought - I'll definitely have to try it. Thanks!