Where to get the matching tokenizer.

#12 opened by CharlaDev

Thank you very much for the quants! I'm planning to use the Q8_0 model. But for that I need a matching tokenizer.

I'm using the AutoTokenizer.from_pretrained method. The straightforward approach, of course, is to load the tokenizer from the official Meta repo like so:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

But the repo is gated and I haven't been granted access to it.

So I decided to try another (non-gated) repo instead:

tokenizer = AutoTokenizer.from_pretrained("baseten/Meta-Llama-3.1-Instruct-tokenizer")

(link to repo)

I just want to make sure that this second approach is correct and that the tokenizer matches the Q8_0 model.

Can someone please confirm it?

@CharlaDev Yes, that repo should work fine. The original tokenizer was last updated on July 30, and I updated my models a couple of hours later; it hasn't changed since, so you should be good to go!
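If you want to double-check on your end, one quick sanity check is to compare token ids from the HF tokenizer against the tokenizer embedded in the GGUF. This is just a sketch, assuming you load the Q8_0 file with llama-cpp-python; the model path below is a placeholder for your local file:

from llama_cpp import Llama
from transformers import AutoTokenizer

hf_tok = AutoTokenizer.from_pretrained("baseten/Meta-Llama-3.1-Instruct-tokenizer")

# vocab_only=True loads only the GGUF metadata/tokenizer, not the weights
llm = Llama(model_path="Meta-Llama-3.1-8B-Instruct-Q8_0.gguf", vocab_only=True)  # placeholder path

text = "Hello, how are you today?"
hf_ids = hf_tok.encode(text, add_special_tokens=False)
gguf_ids = llm.tokenize(text.encode("utf-8"), add_bos=False)
print(hf_ids == gguf_ids)  # should print True if the two tokenizers agree

Running this on a few sample strings (including some with unicode and punctuation) should give you reasonable confidence that the tokenizers line up.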
