Necessary hardware for Operating the 34B Model
#40, opened by blurjp
I currently use a 4090, but the inference process is extremely slow. Is it impractical to expect this model to run efficiently on just a single 4090?
Did you solve that? I have the same problem.
You can use these 2-bit versions, quantized with QuIP#. Inference is slower than usual, but they should work on a single 4090.
https://huggingface.co/KnutJaegersberg/Tess-M-34B-2bit
https://huggingface.co/KnutJaegersberg/orca-mini-70b-2bit
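For a rough sense of why the bit width matters here, the sketch below estimates the memory taken by the weights alone at a few precisions and compares it with the 24 GB of VRAM on a 4090. It is back-of-the-envelope arithmetic, not a measurement: the parameter counts are the nominal 34B and 70B, the helper name `weight_memory_gb` is made up for this example, and the KV cache, activations, and any quantization overhead are ignored, so real usage will be higher.

```python
# Rough estimate of weight memory only. Assumptions: nominal parameter
# counts (34e9, 70e9); no KV cache, activations, or quantization overhead.

VRAM_GB = 24  # RTX 4090


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, n_params in [("34B", 34e9), ("70B", 70e9)]:
        for bits in (16, 8, 4, 2):
            gb = weight_memory_gb(n_params, bits)
            verdict = "fits" if gb < VRAM_GB else "does not fit"
            print(f"{name} at {bits:>2}-bit: {gb:5.1f} GB ({verdict} in {VRAM_GB} GB)")
```

Under these assumptions, a 34B model at 16-bit needs about 68 GB for weights, so on a single 4090 part of the model would have to sit outside GPU memory, which is a likely cause of the slow inference described above. At 2-bit the same weights come to roughly 8.5 GB.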