Tiktoken and interaction with Transformers
Support for tiktoken model files is integrated into 🤗 transformers. When a model is loaded with `from_pretrained` and its Hub repository contains a `tokenizer.model` file in tiktoken format, that file is automatically converted into our fast tokenizer.
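A tiktoken file is plain text: each line holds a base64-encoded token followed by its integer rank (the format read by tiktoken's own `load_tiktoken_bpe`). Turning it into a BPE-based fast tokenizer involves recovering merge rules from those ranks. The sketch below illustrates that idea with the standard library only; it is a simplified illustration, not the converter transformers actually ships, and `load_ranks` and `recover_merges` are hypothetical names.

```python
import base64


def load_ranks(text: str) -> dict[bytes, int]:
    """Parse tiktoken-format text: one '<base64 token> <rank>' pair per line."""
    ranks = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        token_b64, rank = line.split()
        ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks


def recover_merges(ranks: dict[bytes, int]) -> list[tuple[bytes, bytes]]:
    """For every multi-byte token, record each split into two tokens that are
    themselves in the vocabulary, then order merges by the merged token's rank."""
    merges = []
    for token, rank in ranks.items():
        if len(token) < 2:
            continue  # single bytes are base symbols, not merges
        for i in range(1, len(token)):
            left, right = token[:i], token[i:]
            if left in ranks and right in ranks:
                # sort key: merged-token rank, then the ranks of its two parts
                merges.append((rank, ranks[left], ranks[right], left, right))
    merges.sort()
    return [(left, right) for _, _, _, left, right in merges]


# Tiny hand-made vocabulary: a=0, b=1, ab=2, abb=3
sample = "\n".join(
    f"{base64.b64encode(tok).decode()} {rank}"
    for rank, tok in enumerate([b"a", b"b", b"ab", b"abb"])
)
print(recover_merges(load_ranks(sample)))
```

On this toy vocabulary the sketch yields the merges `(b"a", b"b")` and then `(b"ab", b"b")`. Note that the file only holds the mergeable ranks; the pre-tokenization regex and special tokens of a tiktoken encoding are defined separately and are not part of this sketch.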
Known models that were released with a tiktoken model file:
- gpt2
- llama3
Example usage
To load tiktoken files in transformers, make sure the `tokenizer.model` file is a tiktoken file; it will then be loaded automatically by `from_pretrained`. Here is how to load a tokenizer and a model from the same repository:
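```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
# original/tokenizer.model is a tiktoken file; it is converted to a fast tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="original")
# the model weights come from the root of the same repository
model = AutoModelForCausalLM.from_pretrained(model_id)
```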
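Because the automatic conversion only applies when `tokenizer.model` really is a tiktoken file (a SentencePiece `tokenizer.model`, by contrast, is a binary protobuf), it can help to check a downloaded file before loading it. The function below is a hypothetical heuristic, not part of transformers: it assumes the plain-text tiktoken layout of one `<base64 token> <integer rank>` pair per line and reports whether a file's bytes match that shape.

```python
import base64


def looks_like_tiktoken(data: bytes) -> bool:
    """Heuristic: True if every non-empty line is '<base64 token> <integer rank>'."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # binary content, e.g. a SentencePiece protobuf, usually fails here
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False  # an empty file is not a usable vocabulary
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            return False  # expected exactly a token and a non-negative integer rank
        try:
            base64.b64decode(parts[0], validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            return False
    return True
```

For example, you could pass it the bytes of a locally downloaded `original/tokenizer.model`. Since it only inspects the file's shape, a `True` result does not guarantee that the vocabulary itself is valid.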