mistral-goliath-12b ?
I heard Goliath 120B is at GPT-4 level on some benchmarks. Is it possible to use the same merge techniques to generate a merge of 2 Mistral models? It would be interesting to see if the same capabilities are amplified as well. Maybe a merge of 3 models would be even stronger :)
There is no publicly available 70B Mistral yet though.
The 7B models are quite strong. Maybe merging multiple small models would improve the overall result. I don't know... newbie here. But your Goliath experiment seems to indicate a valid path.
That's not how it works. Stacking three copies of the same model would just duplicate a lot of layers, which would eventually give you a garbage model unless it is fine-tuned further.
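For anyone curious what a Goliath-style layer-stacking ("passthrough") merge of two different Mistral 7B fine-tunes might look like, here is a rough Python sketch using transformers. The model names and layer ranges below are made up purely for illustration, not a tested recipe; Goliath itself was reportedly built with mergekit, which does this slicing from a config file.

```python
# Minimal sketch of a passthrough/frankenmerge: interleave decoder layers
# from two Mistral-7B fine-tunes into one taller model.
# Model names and slice ranges are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, MistralConfig, MistralForCausalLM

MODEL_A = "org/mistral-7b-finetune-a"  # hypothetical donor A
MODEL_B = "org/mistral-7b-finetune-b"  # hypothetical donor B

# Layer slices taken from each donor, in stacking order.
# Goliath interleaved overlapping ranges of its two donors; these ranges are invented.
SLICES = [
    (MODEL_A, 0, 16),
    (MODEL_B, 8, 24),
    (MODEL_A, 16, 32),
    (MODEL_B, 24, 32),
]

donors = {
    name: AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
    for name in {MODEL_A, MODEL_B}
}

# Build an empty Mistral model with enough decoder layers for the stacked slices.
total_layers = sum(end - start for _, start, end in SLICES)
config = MistralConfig.from_pretrained(MODEL_A)
config.num_hidden_layers = total_layers
merged = MistralForCausalLM(config).to(torch.bfloat16)

# Embeddings, final norm and LM head are copied from one donor.
merged.model.embed_tokens.load_state_dict(donors[MODEL_A].model.embed_tokens.state_dict())
merged.model.norm.load_state_dict(donors[MODEL_A].model.norm.state_dict())
merged.lm_head.load_state_dict(donors[MODEL_A].lm_head.state_dict())

# Copy decoder layers slice by slice into the taller stack.
target = 0
for name, start, end in SLICES:
    for i in range(start, end):
        merged.model.layers[target].load_state_dict(donors[name].model.layers[i].state_dict())
        target += 1

merged.save_pretrained("mistral-frankenmerge")
```

Note that the sketch uses two *different* fine-tunes, which is what Goliath did; duplicating the same weights, as discussed above, mostly just adds redundant layers.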