The tokenizer has changed, just FYI

#2 opened by bullerwins

That link leads to something... Forbidden. 404.

Hmm. Tokenizer. This Llama 3.1 70B Instruct exl2 here came with its own tokenizer. Do I have to do anything special to get it to work with Ooba?

Maybe you are gated from the repo?

[screenshot]


To make it work with ooba you need to update exllamav2; turboderp has just updated the main branch to support it. I tested yesterday with the dev branch and it was working fine (I used TabbyAPI, though).
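If you're not sure whether your install is new enough, you can quickly check which build you have (a minimal sketch; compare against whatever exllamav2 release first added 3.1 support):

```python
# Minimal sketch: print the installed exllamav2 version so you can tell
# whether you still need to update before loading a Llama 3.1 quant.
from importlib.metadata import version, PackageNotFoundError

try:
    print("exllamav2", version("exllamav2"))
except PackageNotFoundError:
    print("exllamav2 is not installed in this environment")
```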

OK, thanks.

Waiting for access, has anyone rehosted it yet?


https://huggingface.co/SillyTilly

Does that have the new tokenizer? They just gave me access, and it was only updated a few hours ago, with only some changes related to BOS and the special tokens map.


It does not have the updated tokenizer.
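One way to check a quant for yourself is to look at the fields the fix touched in its tokenizer_config.json (a minimal sketch; the path is an example, and missing fields simply print None):

```python
# Minimal sketch: inspect a local quant's tokenizer_config.json for the
# BOS settings and chat template that the 3.1 tokenizer fix touched.
import json
from pathlib import Path

cfg = json.loads(Path("Llama-3.1-70B-Instruct-exl2/tokenizer_config.json").read_text())

print("bos_token:", cfg.get("bos_token"))
print("add_bos_token:", cfg.get("add_bos_token"))
# The second template update added tool calling, so a mention of tools in
# the template is a rough tell for which revision you have.
print("template mentions tools:", "tool" in (cfg.get("chat_template") or ""))
```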

It's been a month and I just stumbled over this, too. So these Llama 3.1 EXL2 quants by @turboderp, and also those by @LoneStriker, don't have the updated tokenizer. Only @bullerwins' quants have been updated with it.

So should we consider these older quants obsolete, will they get updated, or is it actually not an issue? I'm sure most of us would prefer to run the best possible version of Llama 3.1, so what's the consensus here?


I believe it does matter. I haven't run benchmarks with exl2, but with the GGUF I did an A/B with and without the fixed tokenizer, and testing with MMLU-Pro benchmarks I got consistently better results with the fixed one.
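If you want to see the difference without a full benchmark run, you can compare what the two tokenizers actually produce for the same chat (a minimal sketch; the directory names are placeholders for an old and a fixed copy of the quant):

```python
# Minimal A/B sketch: tokenize the same chat with the old and the updated
# tokenizer and check whether the prompts the model sees diverge.
from transformers import AutoTokenizer

old = AutoTokenizer.from_pretrained("llama-3.1-70b-exl2-old")
new = AutoTokenizer.from_pretrained("llama-3.1-70b-exl2-fixed")

messages = [{"role": "user", "content": "What is 2 + 2?"}]

ids_old = old.apply_chat_template(messages, add_generation_prompt=True)
ids_new = new.apply_chat_template(messages, add_generation_prompt=True)

print("identical:", ids_old == ids_new)
print("old:", ids_old[:16])
print("new:", ids_new[:16])
```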

I actually just heard last week's ThursdAI podcast and noticed the 405B model got updated and the Nous team had to retrain. I checked, and it only affected the 405B, so 8B and 70B models with the fixed tokenizer and chat template are the best you can run. Fixing the chat template for an exl2 doesn't need a requant; just update tokenizer_config.json.

Note: the chat template has gotten two updates; my models have the first one but not the second. The second one is related to tool calling, so it won't matter if you don't use it. I'll update my models today; I can hit you up here or on Twitter if you want.

I believe that is all.

@bullerwins Thanks for the info! And yes, please let me know when you've updated, so I can update my local copies. I'd rather stay up to date now than have issues later.

Feel free to contact me on Twitter, too; I'll follow you (if I don't already) and retweet your update note. Always good to spread useful information.

I just pull the new tokenizers in and replace them. Seems to work fine.
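A minimal sketch of that approach, pulling the updated files straight into the quant's folder with huggingface_hub (the repo id is an example; use whichever repo carries the fixed tokenizer, and note the official meta-llama repos are gated):

```python
# Minimal sketch: download the updated tokenizer files into a local exl2
# quant directory, replacing the stale copies. No requant needed.
from huggingface_hub import hf_hub_download

for fname in ("tokenizer.json", "tokenizer_config.json", "special_tokens_map.json"):
    hf_hub_download(
        repo_id="meta-llama/Meta-Llama-3.1-70B-Instruct",  # example source repo
        filename=fname,
        local_dir="Llama-3.1-70B-Instruct-exl2",           # your local quant dir
    )
```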
