Less than 0.05 Tokens/s on a 4090?
#2
by
Jakxx
- opened
Hey there! So I'm relatively new to this. I loaded the model into text-generation-webui, but no matter what I do, I can't get more than 0.05 tokens/s out of this model, and I have no idea why.
This is the first 30B-parameter model I've loaded, so I have no direct comparison. The 13B models I've run so far work fine, but this one goes absolutely nowhere.
30B models need roughly 64 GB of VRAM to run unless they're quantized to 4-bit.
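A rough back-of-the-envelope estimate shows why: at 16 bits per parameter, the weights alone of a 30B model won't fit in a 4090's 24 GB, so the runtime falls back to offloading layers to system RAM, which explains token rates this slow. A minimal sketch (the helper function below is just illustrative, counting weights only and ignoring KV cache, activations, and framework overhead):

```python
def weight_vram_gib(params_billion: float, bits_per_param: float) -> float:
    """Estimate VRAM (in GiB) needed just to hold the model weights."""
    total_bytes = params_billion * 1e9 * (bits_per_param / 8)
    return total_bytes / (1024 ** 3)

# 30B parameters at fp16 vs. 4-bit quantization:
fp16 = weight_vram_gib(30, 16)  # roughly 56 GiB -- far beyond a 24 GB 4090
int4 = weight_vram_gib(30, 4)   # roughly 14 GiB -- fits comfortably
print(f"fp16: {fp16:.1f} GiB, 4-bit: {int4:.1f} GiB")
```

The same arithmetic shows why 13B models ran fine: at fp16 they need about 24 GiB, right at the edge of the card, and any quantized variant fits easily.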
Oh wait, so this isn't a 4-bit quantized model then. Okay, that makes sense. Thank you for the clarification. I should have known, though it doesn't say so anywhere. ^^;