tokeniser.json and vocab files not found
#4 · opened by ShieldHero
The tokeniser requires both the vocab and tokenizer.json files, but these files are not present in the repository. I am unable to initialise the tokeniser without them. Can someone please help me resolve this issue?
File "run.py", line 70, in
t = AutoTokenizer.from_pretrained(model_name)
File "/miniconda/lib/python3.7/site-packages/transformers/models/auto/tokenization_auto.py", line 532, in from_pretrained
return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1780, in from_pretrained
**kwargs,
File "/miniconda/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1908, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py", line 150, in __init__
**kwargs,
File "/miniconda/lib/python3.7/site-packages/transformers/tokenization_utils_fast.py", line 118, in __init__
"Couldn't instantiate the backend tokenizer from one of: \n"
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
![Screenshot 2023-02-21 at 11.44.08 AM.png](https://cdn-uploads.huggingface.co/production/uploads/1676960156163-61e58371192e9bd83bc96e92.png)
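For anyone hitting the same traceback before the repository was fixed: the error message suggests the environment is missing the `sentencepiece` package, which `transformers` needs in order to build the fast tokenizer from the SentencePiece model when no tokenizer.json is available. A minimal workaround sketch, assuming the repo does ship the SentencePiece model file; the model id below is a placeholder for whatever `model_name` holds in run.py:

```python
# pip install sentencepiece
from transformers import AutoTokenizer

model_name = "<this-repo-id>"  # placeholder: same model_name used in run.py

# With sentencepiece installed, transformers can convert the slow
# SentencePiece-based tokenizer into a fast one when tokenizer.json is absent.
tokenizer = AutoTokenizer.from_pretrained(model_name)

# If the fast conversion still fails, the slow tokenizer can be requested
# directly (it also relies on sentencepiece, but skips the conversion step).
slow_tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
```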
ShieldHero changed discussion title from "tokenised.json and vocal files not found" to "tokeniser.json and vocal files not found"
Fixed via a2a45f84b9f7216de3462a96dc0fa0d65d441f9f
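If an older snapshot of the repo is already cached locally, it may be worth forcing a fresh download so the newly committed tokenizer files are picked up. A small sketch using the standard `force_download` argument of `from_pretrained` (the repo id is again a placeholder):

```python
from transformers import AutoTokenizer

model_name = "<this-repo-id>"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name, force_download=True)
```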
joeddav changed discussion status to closed
Thank you @joeddav
ShieldHero changed discussion title from "tokeniser.json and vocal files not found" to "tokeniser.json and vocab files not found"