Fix tokenizer_config.json EOS token
#3 opened by compilade
After converting this model to GGUF for use with llama.cpp, I noticed the model kept rambling.

When checking the logs with the token ids of the output, I saw that generation continued even though token 50279 (`<|im_end|>` in this model) kept appearing. This led me to check which token was set as the EOS token in the GGUF, and it was token 0 (`<|endoftext|>` in this model).

So the EOS token was not set correctly in `tokenizer_config.json`. I changed it to `<|im_end|>`, reconverted the model to GGUF to test it, and no more rambling!
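For anyone who wants to apply the same kind of change to a local copy, here is a minimal sketch of the edit in Python. It is not the exact diff of this pull request: the helper name `set_eos_token` is hypothetical, and it assumes the file has a top-level `eos_token` entry that is either a plain string or an object with a `content` field.

```python
import json
from pathlib import Path


def set_eos_token(config_path, new_eos="<|im_end|>"):
    """Rewrite the eos_token entry of a tokenizer_config.json file in place.

    Hypothetical helper for illustration; assumes a top-level "eos_token" key.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    current = config.get("eos_token")
    if isinstance(current, dict):
        # Assumption: object-style entries keep the token text under "content".
        current["content"] = new_eos
    else:
        # Plain-string entry (or missing key): set it directly.
        config["eos_token"] = new_eos

    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config["eos_token"]
```

Calling `set_eos_token("tokenizer_config.json")` on a local checkout would rewrite the entry. The model then needs to be reconverted to GGUF for the new EOS token to take effect there.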
compilade changed pull request status to open
pansophic changed pull request status to merged