If I already trained a model on Mistral, do I need to start from scratch due to fine-tuning difficulties?

#62
by brando - opened

I heard rumors that there was a bug with the Mistral 7B tokenizer. I'm asking because I want to know whether I should re-train from scratch or whether using my current checkpoint is fine. What do you suggest?

https://huggingface.co/kittn/mistral-7B-v0.1-hf/discussions/1

I don't know the internal details, but I did observe that the training loss does not drop if I reuse the original checkpoint weights with the current base model and tokenizer. So I retrained my model from scratch.
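Not a statement about what actually changed in the repo, but a quick way to check on your side whether the tokenizer you originally trained against still produces the same token IDs as the current one. If the vocabulary or IDs differ, the old checkpoint's embeddings no longer line up with the new tokenizer, which would explain a loss that never drops. The revision strings below are placeholders you would need to replace with the commits you actually used:

```python
from transformers import AutoTokenizer

# Hypothetical identifiers: substitute the repo and revisions from your own training run.
repo_id = "kittn/mistral-7B-v0.1-hf"
old_revision = "OLD_COMMIT_HASH"   # revision used for the original training run
new_revision = "main"              # current revision with the fixed tokenizer

old_tok = AutoTokenizer.from_pretrained(repo_id, revision=old_revision)
new_tok = AutoTokenizer.from_pretrained(repo_id, revision=new_revision)

sample = "The quick brown fox jumps over the lazy dog."
old_ids = old_tok(sample).input_ids
new_ids = new_tok(sample).input_ids

# Differences here mean the checkpoint and the current tokenizer are misaligned.
print("vocab sizes:", len(old_tok), len(new_tok))
print("same ids for sample text:", old_ids == new_ids)
print("special tokens:", old_tok.special_tokens_map, new_tok.special_tokens_map)
```

If everything matches, reusing the existing checkpoint should be safe; if not, re-training (or at least re-initializing the embedding rows that moved) is the safer option.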

Hey @brando , @shaobaij , please see this issue which might be interesting to you: https://github.com/huggingface/transformers/issues/26498

We have managed to fine-tune Mistral in different settings, so the tokenizer settings should be correct.
