Less than 0.05 Tokens/s on a 4090?
#2
by
Jakxx
- opened
Hey there! So I'm relatively new to this. I loaded the model into text-generation-webui, but no matter what I do, I can't get more than 0.05 tokens/s out of this model, and I have no idea why.
This is the first 30B-parameter model I've loaded, so I have no direct comparison. The 13B models I've run so far work fine, but this one goes absolutely nowhere.
30B models need roughly 64 GB of VRAM to run unless they're quantized to 4-bit.
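A rough back-of-the-envelope estimate shows why: at 16 bits per parameter, the weights alone of a 30B model won't fit in a 4090's 24 GB, so the runtime falls back to offloading layers to system RAM, which explains token rates this slow. A minimal sketch (the helper function below is just illustrative, counting weights only and ignoring KV cache, activations, and framework overhead):

```python
def weight_vram_gib(params_billion: float, bits_per_param: float) -> float:
    """Estimate VRAM (in GiB) needed just to hold the model weights."""
    total_bytes = params_billion * 1e9 * (bits_per_param / 8)
    return total_bytes / (1024 ** 3)

# 30B parameters at fp16 vs. 4-bit quantization:
fp16 = weight_vram_gib(30, 16)  # roughly 56 GiB -- far beyond a 24 GB 4090
int4 = weight_vram_gib(30, 4)   # roughly 14 GiB -- fits comfortably
print(f"fp16: {fp16:.1f} GiB, 4-bit: {int4:.1f} GiB")
```

The same arithmetic shows why 13B models ran fine: at fp16 they need about 24 GiB, right at the edge of the card, and any quantized variant fits easily.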
Oh wait, so this isn't a 4-bit quantized model then. Okay, that makes sense. Thank you for the clarification. I should have known, though it doesn't say so anywhere. ^^;