Update chat templates

#17
by CISCai - opened

I know it's a bit of a pain, but could you update the chat template to the latest version now that llama.cpp supports it?

At least you won't have to requantize everything, as I made a handy script that lets you create a new GGUF from an updated tokenizer_config.json file; see the details in the PR. :)

PS: You only have to update the first file in a split GGUF.
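For example (a minimal sketch with hypothetical filenames, using the --chat-template-config option shown further down):

```
# Only the first shard of a split GGUF carries the metadata,
# so only it needs to be rewritten (hypothetical filenames):
python gguf-new-metadata.py \
    model-Q4_K_M-00001-of-00003.gguf \
    model-Q4_K_M-00001-of-00003.fixed.gguf \
    --chat-template-config tokenizer_config.json
```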

@CISCai Thanks for your contribution! I rented a server to create all the quants, which I no longer have, so I would have to re-download all the quants, update them (just the first split where they are chunked), and then re-upload everything to HF. Would there be a way to create a new dummy GGUF that has just the metadata, maybe?

Unfortunately not. :(

Hopefully some day it can be the norm to just have a small GGUF containing only metadata as the first split.

For those of us at home, how can we run that gguf-new-metadata.py script with the correct settings? I'd be willing to re-upload what I've downloaded myself.

@sealad886 Great! :)

If you already have a working Python 3 environment with all the dependencies, you can just run the script like this:

```
python gguf-new-metadata.py input.gguf output.gguf --chat-template-config tokenizer_config.json
```

If you don't have an environment set up, you will need to do that first:

```
python3 -m venv my-venv
source my-venv/bin/activate
pip install -r requirements.txt
```
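For reference, the script and the requirements.txt above ship with the llama.cpp repository, so a full setup might look something like this (a sketch; the script's exact path inside the repo may vary by version):

```
# Sketch: fetch llama.cpp, set up a venv as above, and run the script
# (gguf-py/scripts/ is an assumed location; check your checkout)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
python3 -m venv my-venv
source my-venv/bin/activate
pip install -r requirements.txt
python gguf-py/scripts/gguf-new-metadata.py input.gguf output.gguf \
    --chat-template-config tokenizer_config.json
```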

Turns out it's actually faster for me to re-quantize everything than to download it all on my internet connection. I'm using your imatrix.dat file, and I'll hash several of the files I did download to make sure I'm not just creating completely new files... but yeah, I'll upload them soonish.
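A quick way to do that check (hypothetical filenames; on macOS, shasum -a 256 replaces sha256sum):

```
# Compare the originally downloaded quant against the re-created one;
# identical hashes mean identical files (hypothetical filenames):
sha256sum downloaded/model-Q4_K_M.gguf
sha256sum requantized/model-Q4_K_M.gguf
```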

Thank you for updating (and the BPE PR). :)

CISCai changed discussion status to closed
