So far the best model I've tested. Just a little slow
#15 opened by victorx98
Running on my 3090 machine. Can't wait to see the results once the community starts fine-tuning it and optimizing the speed.
I would also like to try this model. Could you give me an overview of how you conducted the test?
Use llama.cpp: https://github.com/ggerganov/llama.cpp
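If it helps, here's roughly what a test run looks like through the llama-cpp-python bindings instead of the raw CLI. This is a minimal sketch: the model path, context size, and prompt are placeholders, not the exact setup used above.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# Paths and parameters below are illustrative, not from the original post.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model-q4_0.gguf",  # hypothetical path to a quantized model file
    n_ctx=2048,        # context window size
    n_gpu_layers=-1,   # offload all layers to the GPU (e.g. the 3090 mentioned above)
)

output = llm(
    "Q: What is the capital of France? A:",
    max_tokens=64,
    stop=["Q:"],       # stop when the model starts a new question
)
print(output["choices"][0]["text"])
```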
Any quantized versions of this? Need smaller than 24GB.
The only one that fits in 24GB is Q2_K. That's already pretty extreme quantization, with significant quality degradation, just so it can (barely) chug along on a top-of-the-line consumer GPU.
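For a rough sense of why only the 2-bit quant fits, here's a back-of-the-envelope calculation. The 65B parameter count and the bits-per-weight figures are assumptions for illustration, not specifics from this thread.

```python
# Back-of-the-envelope VRAM estimate for quantized weights.
# Assumptions (not from this thread): a 65B-parameter model and approximate
# bits-per-weight figures for common llama.cpp quantization types.
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 65e9  # hypothetical parameter count
for name, bits in [("Q8_0", 8.5), ("Q4_K_M", 4.85), ("Q2_K", 2.63)]:
    print(f"{name}: ~{weight_memory_gb(n_params, bits):.1f} GB")

# Only Q2_K (~21 GB for 65B params) comes in under a 24GB card, and that is
# before the KV cache and compute buffers claim their share.
```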
How come it gives an error saying the loaded weights have a different shape than the model expects ...
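One way to debug a mismatch like that is to list the shapes actually stored in the checkpoint and compare them against the architecture config. A sketch, assuming the checkpoint is in safetensors format; the file name is a placeholder:

```python
# Print every tensor's name and shape without loading the full model,
# so you can spot which weight disagrees with the expected architecture.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:  # hypothetical file name
    for name in f.keys():
        print(name, f.get_slice(name).get_shape())
```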