Less than 0.05 Tokens/s on a 4090?

#2
by Jakxx - opened

Hey there! So I'm relatively new to this. I loaded the model into text-generation-webui, but no matter what I do, I can't get more than 0.05 tokens/s out of this model, and I have no idea why.

This is the first 30B-parameter model I've loaded, so I have no direct comparison. The 13B models I've run so far work fine, but this one goes absolutely nowhere.

30B models need around 64 GB of VRAM to run unless they're quantized to 4-bit.

Oh wait, so this isn't a 4-bit quantized model then. Okay, that makes sense. Thank you for the clarification, I should have known. It doesn't say so anywhere. ^^;
