tokeniser.json and vocab files not found

#4
by ShieldHero - opened

The tokenizer requires both the vocab and tokenizer.json files, but these files are not present in the repository, so I am unable to initialise the tokenizer. Can someone please help me solve this issue?

File "run.py", line 70, in t = AutoTokenizer.from_pretrained(model_name) File "/miniconda/lib/python3.7/site-packages/transformers/models/auto/tokenization_auto.py", line 532, in from_pretrained return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs) File "/miniconda/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1780, in from_pretrained **kwargs, File "/miniconda/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1908, in _from_pretrained tokenizer = cls(*init_inputs, **init_kwargs) File "/miniconda/lib/python3.7/site-packages/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py", line 150, in __init__ **kwargs, File "/miniconda/lib/python3.7/site-packages/transformers/tokenization_utils_fast.py", line 118, in __init__ "Couldn't instantiate the backend tokenizer from one of: \n" ValueError: Couldn't instantiate the backend tokenizer from one of: (1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one. ![Screenshot 2023-02-21 at 11.44.08 AM.png](https://cdn-uploads.huggingface.co/production/uploads/1676960156163-61e58371192e9bd83bc96e92.png)
ShieldHero changed discussion title from tokenised.json and vocal files not found to tokeniser.json and vocal files not found

@joeddav, is there any solution to this?

joeddav changed discussion status to closed

Thank you @joeddav

ShieldHero changed discussion title from tokeniser.json and vocal files not found to tokeniser.json and vocab files not found
