Eats up all RAM + 163GB Swap
After the clone attempt failed, I tried:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModel.from_pretrained("bigscience/bloom")
This eats all RAM and swap up to 100% after the download has finished, then the process gets killed by zsh.
idk what to do anymore to get BLOOM running :(
You can try out Petals: https://colab.research.google.com/drive/1Ervk6HPNS6AYVr3xVdQnY5a-TjjmLCdQ?usp=sharing
Without Petals, you need 176+ GB GPU memory or RAM to run BLOOM at a decent speed.
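That 176+ GB figure follows from a back-of-envelope estimate: BLOOM has roughly 176 billion parameters, and each one takes 2 bytes in bfloat16 or 1 byte in int8 (weights only; activations, the KV cache, and framework overhead come on top). A quick sketch of the arithmetic (the parameter count is approximate):

```python
# Rough memory estimate for BLOOM's weights alone.
n_params = 176_000_000_000  # ~176B parameters (approximate)

bytes_bf16 = n_params * 2   # bfloat16: 2 bytes per parameter
bytes_int8 = n_params * 1   # int8: 1 byte per parameter

print(f"bf16: ~{bytes_bf16 / 1e9:.0f} GB")  # ~352 GB
print(f"int8: ~{bytes_int8 / 1e9:.0f} GB")  # ~176 GB
```

This is why even 163 GB of swap on top of RAM is not enough to hold the full-precision checkpoint.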
Well, even if I try with 374GB of swap, zsh still kills it because the above script occupies all memory.
We could maybe add a git tag (let's call it "pytorch_only") pointing to the commit before the safetensors commit: 4d8e28c67403974b0f17a4ac5992e4ba0b0dbb6f, but I'm not sure whether this will help - cc
@julien-c
@TimeRobber
(maybe the safetensors weights will still be downloaded to the cache?)
Then you'll be able to load the model with:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModel.from_pretrained("bigscience/bloom", revision="pytorch_only")
Hmm, you can use huggingface_hub to download specific files (which I think from_pretrained already does). I think the issue is that from_pretrained also loads the weights into memory, so you need to either set "meta" as the device or offload the model to disk using accelerate.