<|im_end|> token at the end of every message

by ProjectXMP

The token is at the end of every single assistant response:

[screenshot: <|im_end|> visible at the end of an assistant response]

That's one of the ChatML tokens referenced in the text:

It's also trained with ChatML tokens so there should be no EOS bleeding whatsoever.

You can set it as a stop token, but I've found the model performs better for me just using the ChatML template now.
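
For what it's worth, a minimal sketch of the stop-token approach with vLLM's offline API (the model id here is a placeholder, not the actual repo):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="org/model")  # placeholder model id

# Treat the ChatML end-of-turn marker as a stop string so it gets
# trimmed from the output even when it isn't the tokenizer's EOS.
params = SamplingParams(max_tokens=256, stop=["<|im_end|>"])

prompt = "<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```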

I have the ChatML template in place in vLLM, and the token still bleeds through.

Nothing is Real org

Hm, that's quite weird; the model was tested extensively with vLLM, and special-token bleeding never happened. It was tested with text completion rather than chat completion, though, but the jinja2 template in tokenizer_config.json does account for <|im_end|> as it should.
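
If anyone wants to double-check what the bundled template actually renders, a quick sketch (again, the model id is a placeholder):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("org/model")  # placeholder model id

messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there."},
]

# Renders the jinja2 chat template from tokenizer_config.json;
# every turn should come out wrapped in <|im_start|>...<|im_end|>.
print(tok.apply_chat_template(messages, tokenize=False))
```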

I can confirm that I am experiencing this as well when using KoboldCPP and the built-in ChatML template.

I think we should include token 15 as an eos token in the config file?
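
Assuming 15 really is the id of <|im_end|> in this vocab (worth verifying; I'm taking the number from the comment above), the change would look roughly like this:

```python
from transformers import AutoTokenizer, GenerationConfig

tok = AutoTokenizer.from_pretrained("org/model")  # placeholder model id

# Verify the id before writing it into any config.
im_end_id = tok.convert_tokens_to_ids("<|im_end|>")
print(im_end_id)  # expected to be 15 per the comment above

# eos_token_id accepts a list, so both the original EOS and
# <|im_end|> can end generation.
gen_cfg = GenerationConfig.from_pretrained("org/model")
gen_cfg.eos_token_id = [tok.eos_token_id, im_end_id]
gen_cfg.save_pretrained(".")  # writes generation_config.json
```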

I had the same problem.
The fix for me was to activate "skip special tokens" in SillyTavern.
I have no idea what I am doing or what other effects this has.
But after I activated that option the token is gone.
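
That switch maps to the skip_special_tokens flag applied at decode time; a minimal illustration with transformers (placeholder model id, and note it only strips tokens the tokenizer actually registers as special, which is relevant to the reply below):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("org/model")  # placeholder model id

ids = tok.encode("Hi there.<|im_end|>")

# With the flag off, the marker survives in the decoded text...
print(tok.decode(ids, skip_special_tokens=False))
# ...with it on, anything registered as a special token is dropped.
print(tok.decode(ids, skip_special_tokens=True))
```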

Nothing is Real org

Ah, that might just do it @Memphisto, I had it on this entire time

This is due to a fault in the training. They did not add the ChatML tokens to the special tokens, for whatever reason. So during training the model kept emitting <|im_end|> and the others as plain strings rather than as the EOS etc. tokens.
Hopefully they will learn from their mistake, because this model as a partial merge is a fine contribution.
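
For anyone fine-tuning on ChatML themselves, a hedged sketch of the generic transformers recipe for registering the markers properly (the base model id is an assumption for illustration; this is not the exact training script used here):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "mistralai/Mistral-7B-v0.1"  # assumed base model, for illustration
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Register the ChatML markers as *special* tokens so they tokenize
# atomically and can act as EOS, instead of being learned as plain
# text strings during training.
num_added = tok.add_special_tokens(
    {"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]}
)
if num_added:
    model.resize_token_embeddings(len(tok))

# Make <|im_end|> the EOS so generation stops at the end of a turn.
tok.eos_token = "<|im_end|>"
```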
