Update chat template
I know it's a bit of a pain, but could you update the chat template metadata to the latest chat templates now that llama.cpp supports them?
At least you won't have to requantize everything: I made a handy script that creates a new GGUF using the updated tokenizer_config.json file. See the details in the PR. :)
Can do, although I'm not sure I'm doing the right thing. Please double-check:
I uploaded a test file at ./c4ai-command-r-v01-imat-IQ1_S.tmpl.gguf
I ran:
python3 scripts/gguf-new-metadata.py --chat-template default --chat-template-config ../c4ai-command-r-v01/tokenizer_config.json ./c4ai-command-r-v01-imat-IQ1_S.gguf c4ai-command-r-v01-imat-IQ1_S.tmpl.gguf
But gguf-dump.py only shows the added tokenizer.chat_template key with the value "default". I was expecting new entries below that for the different templates. Is this expected, or am I doing something wrong?
python3 scripts/gguf-dump.py --no-tensors --json ./c4ai-command-r-v01-imat-IQ1_S.tmpl.gguf
{"filename": "./c4ai-command-r-v01-imat-IQ1_S.tmpl.gguf", "endian": "LITTLE", "metadata": {"GGUF.version": {"index": 0, "type": "UINT32", "offset": 4, "value": 3}, "GGUF.tensor_count": {"index": 1, "type": "UINT64", "offset": 8, "value": 322}, "GGUF.kv_count": {"index": 2, "type": "UINT64", "offset": 16, "value": 24}, "general.architecture": {"index": 3, "type": "STRING", "offset": 24, "value": "command-r"}, "general.name": {"index": 4, "type": "STRING", "offset": 73, "value": "c4ai-command-r-v01"}, "command-r.block_count": {"index": 5, "type": "UINT32", "offset": 123, "value": 40}, "command-r.context_length": {"index": 6, "type": "UINT32", "offset": 160, "value": 131072}, "command-r.embedding_length": {"index": 7, "type": "UINT32", "offset": 200, "value": 8192}, "command-r.feed_forward_length": {"index": 8, "type": "UINT32", "offset": 242, "value": 22528}, "command-r.attention.head_count": {"index": 9, "type": "UINT32", "offset": 287, "value": 64}, "command-r.attention.head_count_kv": {"index": 10, "type": "UINT32", "offset": 333, "value": 64}, "command-r.rope.freq_base": {"index": 11, "type": "FLOAT32", "offset": 382, "value": 8000000.0}, "command-r.attention.layer_norm_epsilon": {"index": 12, "type": "FLOAT32", "offset": 422, "value": 9.999999747378752e-06}, "general.file_type": {"index": 13, "type": "UINT32", "offset": 476, "value": 24}, "command-r.logit_scale": {"index": 14, "type": "FLOAT32", "offset": 509, "value": 0.0625}, "command-r.rope.scaling.type": {"index": 15, "type": "STRING", "offset": 546, "value": "none"}, "tokenizer.ggml.model": {"index": 16, "type": "STRING", "offset": 597, "value": "gpt2"}, "tokenizer.ggml.tokens": {"index": 17, "type": "ARRAY", "offset": 641, "array_types": ["STRING"]}, "tokenizer.ggml.token_type": {"index": 18, "type": "ARRAY", "offset": 4813469, "array_types": ["INT32"]}, "tokenizer.ggml.merges": {"index": 19, "type": "ARRAY", "offset": 5837518, "array_types": ["STRING"]}, "tokenizer.ggml.bos_token_id": {"index": 20, "type": "UINT32", "offset": 10865265, "value": 5}, "tokenizer.ggml.eos_token_id": {"index": 21, "type": "UINT32", "offset": 10865308, "value": 255001}, "tokenizer.ggml.padding_token_id": {"index": 22, "type": "UINT32", "offset": 10865351, "value": 0}, "tokenizer.ggml.add_bos_token": {"index": 23, "type": "BOOL", "offset": 10865398, "value": true}, "tokenizer.ggml.add_eos_token": {"index": 24, "type": "BOOL", "offset": 10865439, "value": false}, "general.quantization_version": {"index": 25, "type": "UINT32", "offset": 10865480, "value": 2}, "tokenizer.chat_template": {"index": 26, "type": "STRING", "offset": 10865524, "value": "default"}}, "tensors": {}}
If this is the expected result I can run it on all files that I still have downloaded. (My model folder is 15 TB so I had to delete some older ones and would only update the ones I'm actively using)
Also, do you know whether the default template exists for all models, or do I have to check the tokenizer_config.json for each model beforehand?
It appears that you do not have the latest tokenizer_config.json. Also, you only need to provide the --chat-template-config option, not --chat-template (that one is only for when you don't have the JSON file).
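A quick way to tell whether a tokenizer_config.json is the updated one is to look at its chat_template field. The sketch below assumes that updated files store chat_template as a list of entries with "name" and "template" fields, while older files store a single string. The function names are made up for illustration, and labelling a lone string as "default" is just a convention of this sketch.

```python
import json


def templates_from_config(config):
    """Return {name: template} for the chat templates in a parsed config dict."""
    chat_template = config.get("chat_template")
    if chat_template is None:
        return {}
    if isinstance(chat_template, str):
        # Assumption: a single string is the older, single-template layout.
        return {"default": chat_template}
    # Assumption: the newer layout is a list of {"name": ..., "template": ...}.
    return {entry["name"]: entry["template"] for entry in chat_template}


def list_chat_templates(config_path):
    """Load a tokenizer_config.json and return its chat templates by name."""
    with open(config_path, encoding="utf-8") as f:
        return templates_from_config(json.load(f))
```

If list_chat_templates("../c4ai-command-r-v01/tokenizer_config.json") only returns a single entry, the file is probably the outdated one.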
With the correct file you should see 3 new metadata items (in addition to tokenizer.chat_template):
tokenizer.chat_templates
tokenizer.chat_template.tool_use
tokenizer.chat_template.rag
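To avoid eyeballing the dump, the check can be scripted. This is a minimal sketch that assumes the JSON layout shown in the gguf-dump.py --no-tensors --json output above (a top-level "metadata" object keyed by metadata name); the function name and the EXPECTED_KEYS tuple are hypothetical helpers, not part of the llama.cpp scripts.

```python
import json

# The keys named in this thread for the Command-R templates.
EXPECTED_KEYS = (
    "tokenizer.chat_template",
    "tokenizer.chat_templates",
    "tokenizer.chat_template.tool_use",
    "tokenizer.chat_template.rag",
)


def missing_template_keys(dump_json, expected=EXPECTED_KEYS):
    """Return the expected metadata keys that are absent from a JSON dump string."""
    metadata = json.loads(dump_json)["metadata"]
    return [key for key in expected if key not in metadata]
```

Feeding it the captured output of the dump command for a file should give an empty list once the metadata is correct. Other models may ship different named templates, so the expected keys would need adjusting per model.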
Hey, while requantizing for the new pre-tokenizer stuff I finally got around to looking at this again. I did indeed have an outdated tokenizer_config.json.
Sadly I only noticed after quantizing again, but a script is now running gguf-new-metadata.py on all GGUFs, and they should appear with fixed metadata one by one over the next few hours.
C4AI Command-R Plus will also be requantized and uploaded with fixed metadata, but that will take some time, as imatrix generation alone takes ~60 hours on my poor CPU-only server.
IQ1_S is already fixed, others should appear over a few hours.
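For anyone wanting to do the same batch update locally, here is a rough sketch of such a loop. It is not the script actually used here: the function names, the ".tmpl.gguf" output naming (borrowed from the test file earlier in the thread), and the directory layout are assumptions. The only command-line pieces it relies on are the ones shown above: --chat-template-config followed by the input and output GGUF paths.

```python
import subprocess
from pathlib import Path


def build_commands(model_dir, config_path, script="scripts/gguf-new-metadata.py"):
    """Build one gguf-new-metadata.py command per .gguf file in model_dir."""
    commands = []
    for src in sorted(Path(model_dir).glob("*.gguf")):
        if src.name.endswith(".tmpl.gguf"):
            continue  # skip outputs left over from an earlier run
        dst = src.with_name(src.stem + ".tmpl.gguf")
        commands.append(
            ["python3", script, "--chat-template-config", str(config_path), str(src), str(dst)]
        )
    return commands


def run_all(commands, dry_run=True):
    """Print the commands, or execute them one by one when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Running run_all(build_commands(".", "../c4ai-command-r-v01/tokenizer_config.json")) first with the default dry_run=True prints the commands so they can be checked before anything is written.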