So far the best model I've tested. Just a little slow
#15 opened by victorx98
Running on my 3090 machine. Can't wait to see the results once the community starts fine-tuning it and optimizing the speed.
I would also like to try this model. Could you give me an overview of how you conducted the test?
Use llama.cpp: https://github.com/ggerganov/llama.cpp
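If it helps, here's roughly what a test run looks like through the llama-cpp-python bindings instead of the raw CLI. This is a minimal sketch: the model path, context size, and prompt are placeholders, not the exact setup used above.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# Paths and parameters below are illustrative, not from the original post.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model-q4_0.gguf",  # hypothetical path to a quantized model file
    n_ctx=2048,        # context window size
    n_gpu_layers=-1,   # offload all layers to the GPU (e.g. the 3090 mentioned above)
)

output = llm(
    "Q: What is the capital of France? A:",
    max_tokens=64,
    stop=["Q:"],       # stop when the model starts a new question
)
print(output["choices"][0]["text"])
```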
Any quantized versions of this? Need smaller than 24GB.
The only one that fits in 24GB is Q2_K. That's already pretty extreme quantization, with significant quality degradation, just so it can (barely) chug along on a top-of-the-line consumer GPU.
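For a rough sense of why only the 2-bit quant fits, here's a back-of-the-envelope calculation. The 65B parameter count and the bits-per-weight figures are assumptions for illustration, not specifics from this thread.

```python
# Back-of-the-envelope VRAM estimate for quantized weights.
# Assumptions (not from this thread): a 65B-parameter model and approximate
# bits-per-weight figures for common llama.cpp quantization types.
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 65e9  # hypothetical parameter count
for name, bits in [("Q8_0", 8.5), ("Q4_K_M", 4.85), ("Q2_K", 2.63)]:
    print(f"{name}: ~{weight_memory_gb(n_params, bits):.1f} GB")

# Only Q2_K (~21 GB for 65B params) comes in under a 24GB card, and that is
# before the KV cache and compute buffers claim their share.
```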
How come it gives an error saying the loaded weights have a different shape than the model expects ...
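One way to debug a mismatch like that is to list the shapes actually stored in the checkpoint and compare them against the architecture config. A sketch, assuming the checkpoint is in safetensors format; the file name is a placeholder:

```python
# Print every tensor's name and shape without loading the full model,
# so you can spot which weight disagrees with the expected architecture.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:  # hypothetical file name
    for name in f.keys():
        print(name, f.get_slice(name).get_shape())
```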