Update chat templates
I know it's a bit of a pain, but could you update the chat template to the latest format now that llama.cpp supports it?
At least you won't have to requantize everything, as I made a handy script that lets you create a new GGUF using the updated tokenizer_config.json file; see the details in the PR. :)
PS: You only have to update the first file in a split GGUF.
@CISCai Thanks for your contribution! I rented a server to create all the quants, which I no longer have, so I would have to re-download all the quants, update them all (just the first split where they are chunked), then re-upload everything to HF. Would there be a way to create a new dummy GGUF that has just the metadata, maybe?
Unfortunately not. :(
Hopefully some day it can be the norm to just have a small GGUF containing only metadata as the first split.
For those of us at home, how can we run that gguf-new-metadata.py script with the correct settings? I'd be willing to re-upload what I've downloaded myself.
@sealad886 Great! :)
If you already have a working Python 3 environment with all the dependencies, you can just run the script like this:
python gguf-new-metadata.py input.gguf output.gguf --chat-template-config tokenizer_config.json
If you don't have an environment set up, you will need to do that first:
python3 -m venv my-venv
source my-venv/bin/activate
pip install -r requirements.txt
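For split models, only the first shard needs the new metadata (as noted above), so a tiny wrapper can pick it out before invoking the script. This is just a sketch, not part of llama.cpp: it assumes the usual `-00001-of-0000N` shard naming, a hypothetical `model-dir` directory, and that gguf-new-metadata.py is available locally.

```python
import re
import subprocess
from pathlib import Path

def first_split(paths):
    """Return the shard named '...-00001-of-...' from a list of split GGUF paths."""
    for p in paths:
        if re.search(r"-00001-of-\d+\.gguf$", p.name):
            return p
    raise FileNotFoundError("no first split found")

# Hypothetical directory; only the first shard is passed to the script.
shards = sorted(Path("model-dir").glob("*.gguf"))
# first = first_split(shards)
# subprocess.run(["python", "gguf-new-metadata.py", str(first),
#                 str(first.with_suffix(".new.gguf")),
#                 "--chat-template-config", "tokenizer_config.json"], check=True)
```

The remaining shards can be re-uploaded untouched.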
Turns out it's actually faster for me to re-quantize everything than it is to download these on my internet connection. I'm using your imatrix.dat file, and I'll run a couple of hashes on several of the files that I did download to make sure I'm not just creating completely new files... but yeah, I'll upload those soonish.
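The spot-check mentioned above can be done with a short streaming hash; a minimal sketch using Python's hashlib, with hypothetical file paths (note that whole-file hashes will only match for files whose metadata hasn't been touched yet):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB GGUFs don't need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical paths: compare a downloaded quant against a re-quantized one.
# print(sha256_of(Path("downloaded/model.Q4_K_M.gguf")))
# print(sha256_of(Path("requant/model.Q4_K_M.gguf")))
```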
Thank you for updating (and the BPE PR). :)