Update chat template
I know it's a bit of a pain, but could you update this model to the latest chat templates now that llama.cpp supports them?
At least you won't have to requantize everything, since I made a handy script that lets you create a new GGUF using the updated tokenizer_config.json file; see the details in the PR. :)
PS: You only have to update the first file in a split GGUF.
Yes, that would be awesome.
@CISCai
do you have an example of exactly how to call your script?
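For anyone else wondering, a plausible invocation, assuming the script is llama.cpp's gguf-new-metadata.py with its --chat-template-config option (check the PR for the exact usage; the file names here are placeholders):

python3 gguf-new-metadata.py input.gguf output.gguf --chat-template-config tokenizer_config.json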
OK, I did that. The chat template is now:
llama-cpp-server-1 | {"tid":"134473023250432","timestamp":1714571741,"level":"INFO","function":"main","line":3033,"msg":"chat template","chat_example":"<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>You are a helpful assistant<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>Hi there<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>How are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>","built_in":false}
It looks similar to the previous one, if I am not mistaken. Is there any way to verify that the update was successful?
@James3
Right now clients only support the default template; there are, however, a couple of PRs in progress:
llama.cpp: Refactor chat template API
llama-cpp-python: Support multiple chat templates - step 1
In the meantime you can check the metadata using HF's built-in GGUF inspector or the gguf-dump.py script:
python3 gguf-dump.py input.gguf
You should see a number of new metadata entries: tokenizer.chat_templates (containing tool_use and rag), tokenizer.chat_template.tool_use and tokenizer.chat_template.rag.
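You can also confirm this programmatically; here is a minimal sketch, assuming the gguf Python package (pip install gguf) and its GGUFReader API, with input.gguf as a placeholder file name:

from gguf import GGUFReader

# Open the (first) GGUF file and list every chat-template-related metadata key
reader = GGUFReader("input.gguf")
for name in reader.fields:
    if name.startswith("tokenizer.chat_template"):
        print(name)

If the update worked, the new tool_use and rag entries should show up alongside the default tokenizer.chat_template.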