"Entry Not Found" for online loading, and OSError for offline loading

#3 opened by hiarcs

When I tried to load the model, I received an error:
404 Client Error: Entry Not Found for url: https://huggingface.co/cerebras/Cerebras-GPT-13B/resolve/main/pytorch_model.bin
I then downloaded all files to local folder and tried to load the model from local files, but got another OSError:
OSError: Error no file named pytorch_model.bin found in directory GPT-13B but there is a file for Flax weights. Use from_flax=True to load this model from those weights.

I have already tested both ways on the 111M model, and they both work, so I assume my environment is fine.
I copied the code directly from the model card and made no modifications.

Is there anything else I should try?
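For reference, the code I used is essentially the from_pretrained() pattern from the model card (the exact snippet is there; the local folder name below is the directory from the error message):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Online loading: this call produced the 404 "Entry Not Found" error
tokenizer = AutoTokenizer.from_pretrained("cerebras/Cerebras-GPT-13B")
model = AutoModelForCausalLM.from_pretrained("cerebras/Cerebras-GPT-13B")

# Offline loading from the downloaded files: this call raised the OSError
local_dir = "GPT-13B"
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(local_dir)
```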

Hi, thanks for sharing this issue.

I believe this model's checkpoint is sharded by the save_pretrained() method into two shard files (following the usual pytorch_model-00001-of-00002.bin naming), so you're correct: there is no https://huggingface.co/cerebras/Cerebras-GPT-13B/resolve/main/pytorch_model.bin as there is with the smaller models. You may need to load both shards and save them as one .bin file.
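If you do want a single pytorch_model.bin, here is a rough sketch of merging the shards by hand. The shard file names below are assumptions based on the usual save_pretrained() naming, so double-check them against the files in the repo, and note that this needs enough RAM to hold the full set of weights:

```python
import torch

# Load each shard's partial state dict and merge them into one dict.
state_dict = {}
for shard in ["pytorch_model-00001-of-00002.bin", "pytorch_model-00002-of-00002.bin"]:
    state_dict.update(torch.load(f"GPT-13B/{shard}", map_location="cpu"))

# Save the merged weights as a single-file checkpoint.
torch.save(state_dict, "GPT-13B/pytorch_model.bin")
```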

Passing the directory name to from_pretrained() should handle loading a checkpoint split into multiple shards. Could you check out https://huggingface.co/docs/transformers/big_models and see if it is useful?
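As a minimal sketch of what those docs describe, assuming the local folder also contains the pytorch_model.bin.index.json index file that maps parameters to shards:

```python
from transformers import AutoModelForCausalLM

# With a sharded checkpoint, from_pretrained() reads the index file and loads
# each shard automatically; no single pytorch_model.bin is needed.
model = AutoModelForCausalLM.from_pretrained("GPT-13B")
```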

Firstly, I've already downloaded both model shards and the other files as well. I passed the local folder name to from_pretrained() and received the OSError I mentioned above. This approach works for the 111M model, as well as for other models with multiple checkpoint files.
Secondly, whether a model is stored as a single file or as multiple shards, from_pretrained() is called the same way, so the code in the model card should work regardless. There is no reason for a 404 client error to occur.
