
Struggling to get coherent results finetuning

#1
by Fizzarolli - opened

Hi! I'm trying to do something similar to what was done here: SFTing instruct data onto the base model. However, my loss starts tremendously high (~7), and the resulting model doesn't output properly formatted or coherent instruct responses.
I also notice that in the base model the tokenizer's bos_token is set to null, while here it's set to the EOS token. Is there a reason for doing that for this SFT?
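For anyone hitting the same mismatch, here is a minimal sketch of the difference being asked about. The dicts below stand in for the relevant fields of each checkpoint's `tokenizer_config.json`; the token strings and the `align_bos` helper are assumptions for illustration, not the actual repo contents.

```python
# Illustrative stand-ins for the two tokenizer configs discussed above.
# The base model ships with bos_token = null, while the SFT checkpoint
# reuses its EOS string as BOS (token values here are assumed).
base_config = {"bos_token": None, "eos_token": "<|endoftext|>"}
sft_config = {"bos_token": "<|endoftext|>", "eos_token": "<|endoftext|>"}

def align_bos(config: dict, reference: dict) -> dict:
    """Return a copy of `config` whose bos_token matches `reference`,
    so both tokenizers prepend the same token (or none) at training time."""
    patched = dict(config)
    patched["bos_token"] = reference["bos_token"]
    return patched

patched = align_bos(base_config, sft_config)
print(patched["bos_token"])
```

If the SFT run expected a BOS token that the base tokenizer never emits, every training sequence starts in a state the base model has rarely seen, which is one plausible contributor to the very high initial loss.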
