Error with Tokenizer

#121
by wissamee - opened

Hello,
I'm currently fine-tuning the "Mistral-7B-Instruct-v0.1" model and I've run into an issue with the AutoTokenizer from Transformers that I haven't faced before. Here's the code I'm using:

tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",  # reduces memory usage
    add_eos_token=True,
    add_bos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token

However, I'm receiving the following error:

OSError: Can't load tokenizer for 'mistralai/Mistral-7B-Instruct-v0.1'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'mistralai/Mistral-7B-Instruct-v0.1' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.

Does anyone know how to resolve this issue?
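
For reference, the error's hint about "a local directory with the same name" is easy to rule out with a quick check (a sketch; it assumes the script's working directory is where such a shadowing folder would live):

import os

# A stray local folder named like the repo id would shadow the Hub repo
# and trigger exactly this OSError.
print(os.path.isdir("mistralai/Mistral-7B-Instruct-v0.1"))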

I am facing the same issue. Have you found a solution?

I'm not sure if it's relevant, but I'm temporarily using the "Mistral-7B-v0.1" tokenizer until a solution is found. Please keep me informed if there are any updates.
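
In case it helps, a minimal sketch of that stopgap (assuming you have access to mistralai/Mistral-7B-v0.1 and want the same tokenizer settings as in the original snippet):

# Stopgap: load the tokenizer from the base Mistral-7B-v0.1 repo instead
tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    padding_side="left",
    add_eos_token=True,
    add_bos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token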

Hi - I have the same error, but pinning flash_attn==2.5.8 gets rid of the tokenizer error and instead raises a new module import error when downloading models.

Requirements to reproduce:
flash_attn==2.5.8
transformers==4.41.2
torch==2.2.2
requests==2.31.0
mlflow==2.13.1
bitsandbytes==0.42.0
accelerate==0.31.0

Databricks 14.3 ML cluster with CUDA version 11.8
Has anyone got a fix?
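
For context, a typical model load on that stack would look roughly like this (a sketch; the 4-bit config is an assumption based on the bitsandbytes pin, and attn_implementation="flash_attention_2" is what actually exercises flash_attn==2.5.8):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed 4-bit quantization setup, since bitsandbytes is pinned above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # exercises the flash_attn pin
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate, pinned above
)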

This isn't a library error. I was facing the same issue until I realized I hadn't logged in to Hugging Face:

from huggingface_hub import login
login(token="your_access_token_here")
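
Two related notes: the token can also be supplied via the HF_TOKEN environment variable or with huggingface-cli login instead of hardcoding it, and you can confirm the session took effect with a quick check:

from huggingface_hub import whoami

# Prints your account info if the token is valid; raises an HTTP error otherwise
print(whoami())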

I'm trying to deploy the model on an AKS cluster by adding the env variable 'HF_TOKEN' to mistral-7b.yaml, but I'm still getting the error '401 Client Error: Unauthorized for url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/resolve/main/adapter_config.json'. Any advice on this? Thanks
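
A quick in-pod sanity check might help narrow this down (a sketch; it assumes Python and huggingface_hub are available inside the container):

import os
from huggingface_hub import whoami

# huggingface_hub picks up HF_TOKEN automatically when it is set in the environment
print("HF_TOKEN set:", "HF_TOKEN" in os.environ)
print(whoami())  # a 401 here means the token itself is being rejected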
