If I already trained a model on Mistral, do I need to start from scratch due to fine-tuning difficulties?

#62
by brando - opened

I heard rumors that there was a bug with the Mistral 7B tokenizer. I'm asking because I want to know whether I should re-train from scratch or whether using my current checkpoint is fine. What do you suggest?

https://huggingface.co/kittn/mistral-7B-v0.1-hf/discussions/1

I don't know the internal details, but I did observe that the training loss does not drop if I reuse the original checkpoint weights with the current base model and tokenizer. So I retrained my model from scratch.
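Not a statement about what actually changed in the repo, but a quick way to check on your side whether the tokenizer you originally trained against still produces the same token IDs as the current one. If the vocabulary or IDs differ, the old checkpoint's embeddings no longer line up with the new tokenizer, which would explain a loss that never drops. The revision strings below are placeholders you would need to replace with the commits you actually used:

```python
from transformers import AutoTokenizer

# Hypothetical identifiers: substitute the repo and revisions from your own training run.
repo_id = "kittn/mistral-7B-v0.1-hf"
old_revision = "OLD_COMMIT_HASH"   # revision used for the original training run
new_revision = "main"              # current revision with the fixed tokenizer

old_tok = AutoTokenizer.from_pretrained(repo_id, revision=old_revision)
new_tok = AutoTokenizer.from_pretrained(repo_id, revision=new_revision)

sample = "The quick brown fox jumps over the lazy dog."
old_ids = old_tok(sample).input_ids
new_ids = new_tok(sample).input_ids

# Differences here mean the checkpoint and the current tokenizer are misaligned.
print("vocab sizes:", len(old_tok), len(new_tok))
print("same ids for sample text:", old_ids == new_ids)
print("special tokens:", old_tok.special_tokens_map, new_tok.special_tokens_map)
```

If everything matches, reusing the existing checkpoint should be safe; if not, re-training (or at least re-initializing the embedding rows that moved) is the safer option.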

Hey @brando , @shaobaij , please see this issue which might be interesting to you: https://github.com/huggingface/transformers/issues/26498

We have managed to fine-tune Mistral in different settings, so the tokenizer settings should be correct.
