Add chat template
Note, according to the model publisher, a "</s>
" token was not used in the training, so I think this is incorrect. See the format the author described here:
https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/discussions/7
Furthermore, the author had this note about the trailing token. See the author's comment in the link above:
---------------- BEGIN COMMENT IN LINK ABOVE ------------------
I had problems making the model stop generating content. So I found the solution in this link (https://medium.com/@parikshitsaikia1619/mistral-mastery-fine-tuning-fast-inference-guide-62e163198b06)
This change before starting the training solved my problem
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
#tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token = tokenizer.unk_token <----
tokenizer.padding_side = "right" <----
---------------- END COMMENT IN LINK ABOVE ------------------
Any help is greatly appreciated. Thank you!