Well, yes, if the models are somewhat compatible. Here is an experiment I did. I wanted to merge two of the best-performing models: mlabonne/NeuralBeagle14-7B and jeonsworld/CarbonVillain-en-10.7B-v4.
Here is my recipe:
1. Expand NeuralBeagle to 10.7B by duplicating layers, à la frankenmerge (see the config sketch after this list).
2. DPO-tune the resulting model on a high-quality preference dataset, argilla/distilabel-intel-orca-dpo-pairs (training sketch below).
3. Merge the DPO-tuned model with CarbonVillain (needs --allow-crimes in mergekit! 🔪); a merge config sketch is below as well.
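
For step 1, here is a rough sketch of a mergekit passthrough (frankenmerge) config that duplicates a band of middle layers, which is roughly how 7B Mistral-family models get upscaled to ~10.7B. The exact layer ranges and dtype are assumptions, not my verbatim config:

```python
# Step 1 (sketch): depth-upscale NeuralBeagle14-7B via a mergekit passthrough merge.
# Layer ranges below are an assumption (SOLAR-style split over the 32 base layers).
import subprocess

FRANKEN_CONFIG = """\
slices:
  - sources:
      - model: mlabonne/NeuralBeagle14-7B
        layer_range: [0, 24]
  - sources:
      - model: mlabonne/NeuralBeagle14-7B
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
"""

with open("franken_config.yml", "w") as f:
    f.write(FRANKEN_CONFIG)

# mergekit-yaml <config> <output dir>; add --cuda if a GPU is available
subprocess.run(["mergekit-yaml", "franken_config.yml", "NeuralBeagle-10.7B"], check=True)
```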
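
For step 2, a minimal DPO sketch using TRL on the argilla dataset. The hyperparameters, the prompt-column mapping, and the output paths are assumptions for illustration, and exact keyword names vary across TRL versions:

```python
# Step 2 (sketch): DPO-tune the 10.7B frankenmerge on argilla/distilabel-intel-orca-dpo-pairs.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "NeuralBeagle-10.7B"  # output dir from the frankenmerge step above
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

# DPOTrainer expects "prompt"/"chosen"/"rejected"; this dataset stores the question
# under "input" (it also has a "system" column, ignored here for brevity).
dataset = dataset.map(lambda row: {"prompt": row["input"]})
dataset = dataset.remove_columns(
    [c for c in dataset.column_names if c not in ("prompt", "chosen", "rejected")]
)

args = DPOConfig(
    output_dir="NeuralBeagle-10.7B-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
    beta=0.1,                # strength of the KL penalty against the reference model
    max_length=1024,
    max_prompt_length=512,
)

trainer = DPOTrainer(model=model, args=args, train_dataset=dataset,
                     processing_class=tokenizer)
trainer.train()
trainer.save_model()
```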
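
For step 3, a sketch of the final merge. The merge method and weights below are assumptions; --allow-crimes is the flag from the recipe that lets mergekit proceed despite config/architecture mismatches between the two models:

```python
# Step 3 (sketch): merge the DPO-tuned frankenmerge with CarbonVillain.
import subprocess

MERGE_CONFIG = """\
models:
  - model: NeuralBeagle-10.7B-dpo
    parameters:
      weight: 0.5
  - model: jeonsworld/CarbonVillain-en-10.7B-v4
    parameters:
      weight: 0.5
merge_method: linear
dtype: bfloat16
"""

with open("merge_config.yml", "w") as f:
    f.write(MERGE_CONFIG)

subprocess.run(
    ["mergekit-yaml", "merge_config.yml", "CarbonBeagle-11B", "--allow-crimes"],
    check=True,
)
```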
And here is the resulting model, CarbonBeagle-11B, which ranked at the top of the leaderboard for its size class:
vicgalle/CarbonBeagle-11B
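
If you want to try it, it loads like any other model on the Hub (the prompt and generation settings here are just an example):

```python
# Quick check of vicgalle/CarbonBeagle-11B with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("vicgalle/CarbonBeagle-11B")
model = AutoModelForCausalLM.from_pretrained(
    "vicgalle/CarbonBeagle-11B", torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Explain model merging in one sentence."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```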