k-quants are now unblocked by ggerganov/llama.cpp/pull/2001
As in the title, k-quants are now unblocked, since ggerganov/llama.cpp/pull/2001 is merged.
That's if I understood the issue correctly.
I don't think that helps with this model, as 32001 is not divisible by 64 either. And it requires a compile-time option, so people could only use them if they manually re-compiled their llama.cpp or llama-cpp-python. Also it degrades quality quite a lot.
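To make the divisibility point concrete, here is a quick sketch of the check. It assumes the default k-quant block size is 256 and that the compile-time option from PR 2001 drops it to 64; the 256 figure and the reading of 32001 as this model's vocabulary size are my assumptions, not something stated above. The helper name is made up for illustration and is not part of llama.cpp.

```python
# Hypothetical helper, not a llama.cpp function: checks whether a tensor
# dimension splits evenly into quantization blocks of a given size.
def divisible_by_block(dim: int, block_size: int) -> bool:
    return dim % block_size == 0


# 32001 is the figure from the post (assumed here to be the vocab size).
DIM = 32001

# 256 is the assumed default k-quant block size; 64 is the size
# discussed in connection with PR 2001.
for block_size in (256, 64):
    remainder = DIM % block_size
    status = "ok" if divisible_by_block(DIM, block_size) else "not divisible"
    print(f"{DIM} % {block_size} = {remainder} ({status})")
```

Either way there is a remainder of 1, so a smaller block size alone doesn't help for this dimension.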
But today we got this PR, which fixes it much more elegantly: https://github.com/ggerganov/llama.cpp/pull/2148. It doesn't require any special compilation by the user, and quality isn't materially affected. File sizes will be fractionally larger, but only by 1-2%.
I will be uploading k-quants using this new method tomorrow; I don't even need to wait for the PR to be merged.