
Struggling to get coherent results finetuning

#1
by Fizzarolli - opened

Hi! I'm trying to do something similar to what was done here: SFTing instruct data onto the base model. However, my loss starts tremendously high (~7), and the resulting model doesn't output properly formatted or coherent instruct responses.
I also notice that in the base model the tokenizer's bos_token is set to null, while here it's set to the EOS token. Is there a reason for doing that for this SFT?
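For anyone hitting the same mismatch, here is a minimal sketch of the difference being asked about. The dicts below stand in for the relevant fields of each checkpoint's `tokenizer_config.json`; the token strings and the `align_bos` helper are assumptions for illustration, not the actual repo contents.

```python
# Illustrative stand-ins for the two tokenizer configs discussed above.
# The base model ships with bos_token = null, while the SFT checkpoint
# reuses its EOS string as BOS (token values here are assumed).
base_config = {"bos_token": None, "eos_token": "<|endoftext|>"}
sft_config = {"bos_token": "<|endoftext|>", "eos_token": "<|endoftext|>"}

def align_bos(config: dict, reference: dict) -> dict:
    """Return a copy of `config` whose bos_token matches `reference`,
    so both tokenizers prepend the same token (or none) at training time."""
    patched = dict(config)
    patched["bos_token"] = reference["bos_token"]
    return patched

patched = align_bos(base_config, sft_config)
print(patched["bos_token"])
```

If the SFT run expected a BOS token that the base tokenizer never emits, every training sequence starts in a state the base model has rarely seen, which is one plausible contributor to the very high initial loss.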
