Why does the 72B model have a different vocab size compared with other models?

#1
by Mikasaka - opened

I found that this 72B model has a vocab size of 152064, while the other models (7B, 4B, etc.) have a vocab size of 151936. Why is it designed this way?

I also have a similar problem. For Qwen 1.8B, the reported vocab size is 151851 and the tokenizer also has 151851 tokens, but in the model weights the vocab_size is 151936. Can someone explain why it is that way? Thanks.

Qwen org

The vocabularies are actually the same. The reason the vocab sizes differ is our distributed training setup: for larger models trained across many devices, we need to pad the vocabulary.
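
To make the padding concrete, here is a minimal sketch of the general idea (an assumption about how vocab padding for tensor parallelism works, not Qwen's actual training code; the tensor-parallel size of 8 is just an example): the embedding table is split row-wise across devices, so its row count has to divide evenly.

```python
# Minimal sketch (not Qwen's training code): with tensor parallelism the
# embedding table is split row-wise across ranks, so the number of embedding
# rows is padded until it divides evenly.
def shard_rows(padded_vocab: int, tp_size: int) -> int:
    assert padded_vocab % tp_size == 0, "pad the vocab to a multiple of tp_size"
    return padded_vocab // tp_size

print(shard_rows(151936, 8))  # 18992 rows per rank (the smaller models)
print(shard_rows(152064, 8))  # 19008 rows per rank (the 72B)
# Rows above the real vocabulary are padding; the tokenizer never emits them.
```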

jklj077 changed discussion status to closed

The problem is that vLLM checks the vocab sizes, and if they don't match, speculative decoding is not enabled. If you are going to pad, then perhaps pad all models to the same vocab size.
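
For context, this is roughly the setup being attempted. The speculative-decoding keyword names below follow older vLLM releases and have been renamed since, so treat them as assumptions and check the docs for your version:

```python
from vllm import LLM, SamplingParams

# Rough sketch of target + draft speculative decoding (argument names as in
# older vLLM releases; newer versions use a different interface).
llm = LLM(
    model="Qwen/Qwen1.5-72B-Chat",               # target: vocab_size 152064
    speculative_model="Qwen/Qwen1.5-1.8B-Chat",  # draft: vocab_size 151936
    num_speculative_tokens=5,
    tensor_parallel_size=8,
)
# vLLM compares the two vocab sizes and does not enable speculative decoding
# here, because 152064 != 151936.
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
```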

Hi, did you solve this problem?

Yes and no. I modified the model to have the same vocab size. However, the vLLM speculative decoding performance is so terrible that it is not worth using.
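
For anyone trying the same workaround, one possible way to do the padding with transformers is sketched below. It assumes that matching the embedding row count is what vLLM checks; the model ID, the 152064 target, and the output path are just the values discussed in this thread:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Qwen/Qwen1.5-1.8B-Chat"
draft = AutoModelForCausalLM.from_pretrained(repo)

# Pad the input/output embeddings to the 72B's 152064 rows; this also updates
# config.vocab_size. The new rows are randomly initialized and correspond to
# tokens the tokenizer never produces, but they can still receive some
# probability mass.
draft.resize_token_embeddings(152064)

draft.save_pretrained("./qwen1.5-1.8b-padded-vocab")
AutoTokenizer.from_pretrained(repo).save_pretrained("./qwen1.5-1.8b-padded-vocab")
```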

For tokenizers in transformers, by convention, tokenizer.vocab_size as documented is the size of the base vocabulary (without the added tokens). To get the actual vocabulary size, you need to use len(tokenizer), which is 151646 for Qwen1.5 models.

The vocab_size in config.json is the number of embeddings, which can be larger than the actual vocabulary size because of optimizations for GPU computation and other considerations. 152064 is divisible by 256 and 151936 is divisible by 128.
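
A quick way to see all three numbers at once (the model ID is just an example; any Qwen1.5 checkpoint shows the same pattern):

```python
from transformers import AutoConfig, AutoTokenizer

repo = "Qwen/Qwen1.5-7B"
tok = AutoTokenizer.from_pretrained(repo)
cfg = AutoConfig.from_pretrained(repo)

print(tok.vocab_size)  # base vocabulary, without the added tokens
print(len(tok))        # actual vocabulary size: 151646
print(cfg.vocab_size)  # number of embedding rows: 151936
print(cfg.vocab_size % 128, 152064 % 256)  # 0 0 -> both padded sizes divide cleanly
```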
