Will quant this (and others) when all the llamacpp fixes are in
#1
opened by bartowski
Since there are still pending fixes (https://github.com/ggerganov/llama.cpp/pull/8676), I'll be holding off on most Llama 3.1 quants, just so you know :)
Appreciate the heads up my dude!
I made some experimental quants with the PR pulled in; the KoboldCpp frankenfork supports models quantized with it.
I ran the BABILong 32k qa2 dataset prompts on both the broken quant and the new one: the unfixed quant just repeats tokens, while the new one at least produces sane responses at Q5. I've updated the linked repo to point to those.
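For anyone who wants to do the same side-by-side sanity check, here's a minimal sketch using llama-cpp-python (not the exact setup used above; the GGUF filenames and the prompt placeholder are assumptions): load the old and the re-made quant, feed each the same long-context prompt, and eyeball whether the output degenerates into repeated tokens.

```python
# Rough sanity check: same prompt through the broken quant and the re-made one.
# Filenames below are placeholders, not the actual repo files.
from llama_cpp import Llama

PROMPT = "..."  # paste a BABILong 32k qa2 prompt here

def sample(model_path: str) -> str:
    llm = Llama(model_path=model_path, n_ctx=32768, verbose=False)
    out = llm(PROMPT, max_tokens=128, temperature=0.0)
    return out["choices"][0]["text"]

print("old quant:", sample("old-broken-Q5_K_M.gguf"))   # hypothetical filename
print("new quant:", sample("fixed-Q5_K_M.gguf"))        # hypothetical filename
```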