Vocab_size of the model configuration is incorrect
#18
by robkirk
In the model configuration for this (and other OPT models) the vocab_size is 50272, but the tokenizer has a vocab size of 50265, which matches the original vocabulary and the one on Hugging Face. Could this be updated somehow (although I realise that could mess with checkpoints etc.)?
There's an issue on the transformers GitHub referencing the same thing.
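For reference, a minimal way to see the mismatch (this sketch assumes `facebook/opt-350m` as the checkpoint, just for illustration; the same pattern applies to the other OPT sizes):

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "facebook/opt-350m"  # assumed checkpoint, for illustration only

config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print(config.vocab_size)     # 50272 -- size declared in the model config
print(len(tokenizer))        # 50265 -- tokens actually defined by the tokenizer
print(tokenizer.vocab_size)  # 50265
```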
Hey @robkirk ,
Good question! I think you can find the answer here: https://github.com/huggingface/transformers/issues/17431#issuecomment-1224231170 (it was discussed in another GitHub issue).