questions
HI,
how this model differs from the Ministral 7B in terms of overall output quality ? I'm freaking out (because noob) to understand how shall I train Samantha on my own language but I have troubles understanding the starting point. for the ministral has been used the ChatML format but I did not understand if for both the ministral as well the llama the same dataset has been used (ehartford/samantha-data). I have been told to translate the samantha-1.1.json but at the same time i am wondering if:
- do i need to translate all the json/jsonl file under the folder "data" as well
- how to fine-tune Samantha using the translated dataset(s)
do you have any suggestion or some channel/ml/whatever place where could I ask ? many thanks, appreciated
It sounds like you want to fine-tune a model by training it on a new language. Just know this is computationally expensive and highly technical. Also, the bigger the model, the more resources it requires. Changing the .json is not sufficient.
There are lots of learning resources for this but it really depends on the hardware you will use. I suggest searching YouTube for an introduction to LLM training, then going from there.
Regarding these two specific models, there are quite a few differences:
- Samantha 33b is a fine-tune based on LLaMa. It was fine-tuned by ehartford by training the base LLaMa model on new inputs.
- Mistral 7b is a base model, just like LLaMa. There are now fine-tunes based on Mistral, including Mistral-Samantha.
- The other big difference is the number of parameters: 33b vs 7b. Samantha 33b is "smarter" than Mistral 7b, both in terms of the size of the model and in terms of the quality of the output.
Yes exactly, that's the idea, train one of the two Ministral or LLama with the translated dataset. Since the Ministral 7B seems to be very promising I thought to start with it, simply because it would be much more "light" to run. The most powerful GPU available to me is a RTX 3080,I could try to run the Samantha 33B if I find ( is available if i am not mistaken ) a quantized version of it.
Anyway regarding the training yes there are a lot of video, unfortunately find a good one is not easy as find it they tend to be pretty confusing and skipping important details.
i was triggered by this guy: https://www.youtube.com/watch?v=DhUsZ40jQb0 it seems that he trained samantha so be able to speak in spanish
Perhaps start by training a small model from scratch, then learning how to finetune the small model. Once you've done that, you can apply what you learned to Mistral or Samantha.
Here's a neat tutorial I found on the /r/LocalLlama subreddit: https://old.reddit.com/r/LocalLLaMA/comments/14dstqm/tutorial_train_your_own_llamacpp_miniggmlmodel/
That should be small enough to run on most systems.
Then, fine-tune your small model. Since it's small, it will be easier to run - and therefore easier to debug and learn from.
Here's a good example of fine-tuning: https://github.com/mzbac/qlora-fine-tune
Another resource for finetuning is the llama.cpp example: https://github.com/ggerganov/llama.cpp/tree/master/examples/finetune
thank you man, very appreciated. you took the time and the patience nobody else took. I will study further. cheers and take care