
Eats up all RAM + 163GB Swap

#167
by LuvIsBadToTheBone - opened

After the clone attempt failed, I tried:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModel.from_pretrained("bigscience/bloom")

This eats up all RAM + swap to 100% after the download has finished, then the process gets killed by zsh.
I don't know what to do anymore to get BLOOM running :(

BigScience Workshop org

You can try out Petals: https://colab.research.google.com/drive/1Ervk6HPNS6AYVr3xVdQnY5a-TjjmLCdQ?usp=sharing

Without Petals, you need 176+ GB GPU memory or RAM to run BLOOM at a decent speed.
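
For reference, here is a minimal sketch of the Petals client along the lines of its public quickstart. The model name "bigscience/bloom-petals" and the DistributedBloomForCausalLM class come from the Petals project, not from this thread, so treat them as assumptions about the Petals API of that time:

import torch
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM  # assumed Petals client class

MODEL_NAME = "bigscience/bloom-petals"  # assumed Petals-hosted checkpoint name

tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
# Only a small client-side part of the model is loaded locally; the
# transformer blocks run remotely on the volunteer swarm, which is how
# this sidesteps the 176+ GB memory requirement.
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))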

Well, even when I try with 374 GB of swap, zsh still kills it because the above script occupies all memory.

BigScience Workshop org
edited Jan 19, 2023

We should maybe add a git tag (let's call it "pytorch_only") pointing to the commit just before the safetensors commit, 4d8e28c67403974b0f17a4ac5992e4ba0b0dbb6f, but I'm not sure this will help. cc @julien-c @TimeRobber (maybe the safetensors weights will still be downloaded to the cache?)
Then you'll be able to load the model with:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModel.from_pretrained("bigscience/bloom", revision="pytorch_only")

BigScience Workshop org

Hmm, you can use huggingface_hub to download specific files (which I think from_pretrained already does under the hood). I think the issue is that from_pretrained also loads the whole model into memory, so you need to either set meta as the device or offload it to disk using accelerate.
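
For illustration, a rough sketch of those two options, assuming recent transformers, accelerate, and huggingface_hub releases; the offload directory name is arbitrary:

import torch
from transformers import AutoConfig, AutoModel
from accelerate import init_empty_weights
from huggingface_hub import hf_hub_download

# Fetch a single file from the repo without pulling the full checkpoint.
config_path = hf_hub_download(repo_id="bigscience/bloom", filename="config.json")

# Option 1: instantiate on the meta device. No weight tensors are
# materialized, so this uses almost no RAM (but the model cannot run a
# forward pass as-is).
config = AutoConfig.from_pretrained("bigscience/bloom")
with init_empty_weights():
    empty_model = AutoModel.from_config(config)

# Option 2: let accelerate dispatch layers across GPU(s), CPU RAM, and disk.
model = AutoModel.from_pretrained(
    "bigscience/bloom",
    device_map="auto",           # fill GPUs first, then CPU RAM, then disk
    offload_folder="offload",    # arbitrary local directory for spilled weights
    torch_dtype=torch.bfloat16,  # the dtype BLOOM was trained in
)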
