Anyone else having terrible performance with this model on web UI?
#1 opened by RebornZA
...went from 4-5 tokens/sec on other 13B models to around 0.10 tokens/sec with this model.
Try using this one: https://huggingface.co/TheBloke/wizard-vicuna-13B-GPTQ
I only uploaded this one because TheBloke's quantizations usually don't work on the version of GPTQ used by the Occam fork of KoboldAI. I only tested this with KoboldAI, where I get my normal 20-25 tokens per second on my 3090. Try TheBloke's quantization instead, since his are tested with ooba.
Thanks so much, that fixed it <3
RebornZA changed discussion status to closed