A Mixtral-8x7b-v3.1?

#64 · opened by chriss1245

Hi, I have been using Mixtral 8x7B a lot lately. I love its MoE architecture, which gives us the capacity of a much larger model at a per-token compute cost closer to a small dense model. I have been trying to replace this wonderful model with newer models such as Nemo or Llama 3.1, but none of them performs as well. I heavily use function calling (thanks to Ollama), and this model outperforms the others by a wide margin: when they fail or make mistakes in the JSON, this model keeps getting it right. It makes far fewer mistakes than Llama 3.1 8B, Mistral Nemo 12B, Gemma 9B, and other similar models.
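
For context, this is roughly the kind of tool call I mean. A minimal sketch, assuming the ollama Python client with tool-calling support; the get_weather tool, its schema, and the mixtral:8x7b tag are only illustrative here, and the exact response shape may vary between client versions.

```python
import ollama

# Hypothetical tool schema (OpenAI-style function spec), used only to illustrate
# the structured JSON the model has to produce for a tool call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

response = ollama.chat(
    model="mixtral:8x7b",  # assumes the Mixtral tag from the Ollama library
    messages=[{"role": "user", "content": "What's the weather in Madrid?"}],
    tools=tools,
)

# The model should reply with a structured tool call rather than free text;
# this is exactly the JSON that weaker models tend to get wrong.
for call in response["message"].get("tool_calls") or []:
    print(call["function"]["name"], call["function"]["arguments"])
```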

I thought this model was trained from 8 Llama 2 7B models. I wonder whether you plan to train a new Mixtral based on Llama 3.1 8B; I think the improvement could be huge.

I believe it was trained from scratch by the Mistral team. It doesn't use anything from Llama.

I see, my mistake. Still, it would be great to see a new version of Mixtral. There has clearly been a lot of progress over the last few months that could be used to improve this model. Is there any intention to train a new Mixtral?

Agreed. While it's great, it would be nice to see a new version, like they did with the regular 7B.

Only the mistral team knows if there are plans to release new Mixtral models. I just work at HF


For me, it wasn't really a question directed at anyone, just a general statement that it would be nice.
