Tokenizer differences from v1
#3, opened by bartowski
Is there any reason this one is missing the tokens in tokenizer_config.json (FIM, file separator, etc.) that v1 had?
Hey @bartowski, thanks for bringing this to our attention. We merged a fix for it: https://huggingface.co/google/codegemma-1.1-7b-it/discussions/4
TL;DR: the conversion scripts for the transformers-equivalent models had some issues, and some tokens went missing.
Let us know if you face any issues. 🤗
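If you want to double-check a local copy of tokenizer_config.json after pulling the fix, something like the sketch below may help. It is only an illustration: the token strings are assumed from the v1 CodeGemma documentation rather than stated in this thread, the helper names `missing_tokens` and `load_config` are made up for this example, and it assumes the usual `added_tokens_decoder` / `additional_special_tokens` layout of the config file.

```python
import json

# Assumed token names (from the v1 CodeGemma docs, not from this thread).
EXPECTED_TOKENS = [
    "<|fim_prefix|>",
    "<|fim_middle|>",
    "<|fim_suffix|>",
    "<|file_separator|>",
]


def load_config(path):
    """Parse a tokenizer_config.json file from disk (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def missing_tokens(config, expected=EXPECTED_TOKENS):
    """Return the expected tokens that do not appear in a parsed config dict."""
    found = set()

    # added_tokens_decoder maps token ids to entries carrying a "content" string.
    for entry in (config.get("added_tokens_decoder") or {}).values():
        if isinstance(entry, dict) and "content" in entry:
            found.add(entry["content"])

    # additional_special_tokens is a list of plain strings or dicts with "content".
    for tok in config.get("additional_special_tokens") or []:
        if isinstance(tok, dict):
            if "content" in tok:
                found.add(tok["content"])
        else:
            found.add(tok)

    # Preserve the order of `expected` so the report is stable.
    return [t for t in expected if t not in found]
```

Calling `missing_tokens(load_config("tokenizer_config.json"))` should return an empty list if all of the assumed tokens are present, and the names of any that are still missing otherwise.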