Anyone else having terrible performance with this model on web UI?
#1 opened by RebornZA
...went from 4-5 tokens/sec on other 13B models to around 0.10 tokens/sec with this model.
Try using this one: https://huggingface.co/TheBloke/wizard-vicuna-13B-GPTQ
I only uploaded this one because TheBloke's quantizations usually don't work on the version of GPTQ used by the Occam fork of KoboldAI. I only tested this with KoboldAI, where I get my normal 20-25 tokens per second on my 3090. Try TheBloke's quantization instead, since his are tested with ooba.
Thanks so much, that fixed it <3
RebornZA changed discussion status to closed