no Q8?

#1
by GhostGate - opened

I was just wondering whether there will be any Q8 or Q6 quants.

There will be; the IQs are uploading first. My local upload speed is slow, so this will take some time due to both the size and the quantity of the files.

How would that even work at Q8? If I understand correctly, this is not a finetune but simply uses a "horror" imatrix to give the weights most relevant to horror more importance during quantization. But at Q8 (and probably Q6 too) I would expect it to have almost no effect, since all weights are preserved well anyway?
Nice idea though. I was thinking along the same lines: if one used an imatrix built from code snippets, would it better preserve the coding ability of the quantized model?
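For context, the pipeline I have in mind is roughly this, using llama.cpp's tools (just a sketch; binary names and flags vary between builds, and all file names here are placeholders):

```python
import subprocess

# Placeholder paths - substitute your own model and calibration text.
F16_MODEL = "model-f16.gguf"
CALIBRATION = "horror-calibration.txt"   # the "horror" imatrix dataset
IMATRIX_OUT = "imatrix.dat"
QUANT_OUT = "model-Q6_K-imat.gguf"

# 1) Run the calibration text through the full-precision model to collect
#    per-tensor activation statistics (the importance matrix).
subprocess.run(
    ["llama-imatrix", "-m", F16_MODEL, "-f", CALIBRATION, "-o", IMATRIX_OUT],
    check=True,
)

# 2) Quantize with the imatrix so the weights that matter most for the
#    calibration text are kept at higher effective precision.
subprocess.run(
    ["llama-quantize", "--imatrix", IMATRIX_OUT, F16_MODEL, QUANT_OUT, "Q6_K"],
    check=True,
)
```

Either way, the imatrix only influences how the weights are rounded during quantization; nothing is retrained.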

There is still an effect at Q6, but not as much.
At Q8, although it barely shows in PPL, the imatrix does affect it too.
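If you want to see the PPL side yourself, run both quants over the same held-out text (sketch only; the binary name depends on your llama.cpp build, and the file names are placeholders):

```python
import subprocess

TEST_TEXT = "held-out-sample.txt"  # any text file to score both quants on

for model in ("model-Q8_0.gguf", "model-Q8_0-imat.gguf"):
    # Prints a perplexity figure for each quant over the same text,
    # so the two numbers can be compared directly.
    subprocess.run(["llama-perplexity", "-m", model, "-f", TEST_TEXT], check=True)
```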

The effects can be verified by testing an unaltered Q6 against the imatrix Q6, with "temp=0" and a creative test prompt.
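For example, with llama-cpp-python (model file names here are placeholders; greedy decoding at temp=0 means any difference in the output comes from the quants themselves, not from sampling):

```python
from llama_cpp import Llama

PROMPT = "Write the opening paragraph of a horror story set in an abandoned lighthouse."

def generate(path: str) -> str:
    # temperature=0 makes decoding greedy and deterministic for a given build.
    llm = Llama(model_path=path, n_ctx=4096, verbose=False)
    out = llm(PROMPT, max_tokens=300, temperature=0.0)
    return out["choices"][0]["text"]

plain = generate("model-Q6_K.gguf")      # standard Q6_K
imat = generate("model-Q6_K-imat.gguf")  # imatrix Q6_K

print("--- plain Q6_K ---\n", plain)
print("--- imatrix Q6_K ---\n", imat)
```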

The Neo class datasets are far stronger than average imatrix datasets, as they are calibrated for the LLM and the imatrix process.
They are precision-formatted based on a lot of trial and error and testing
(rather than a copy/paste text-file "mess", so to speak).

That being said, I recommend IQ4XS, and to a lesser degree the Q4s and Q5s.

Neo class datasets were also used here:
https://huggingface.co/DavidAU/Command-R-01-Ultra-NEO-V1-35B-IMATRIX-GGUF

and in a number of other models at my repo as well.

Same guidance applies.

For creative use cases, I usually do not recommend Q8, as there seems to be a "drop off" or "dulling" at Q8 vs Q6/Q5KM.
This varies from model to model, based on testing a lot of models.

In terms of "horror" level: the Grand Horror series of models far exceeds Command-R Dark Horror due to their construction.

DavidAU changed discussion status to closed
