TheBloke/WizardLM-13B-V1.1-GGML · k-quants are now unblocked by ggerganov/llama.cpp/pull/2001

I don't think that helps with this model, as 32001 is not divisible by 64 either. It also requires a compile-time option, so people could only use the k-quants if they manually recompiled their llama.cpp or llama-cpp-python. And it degrades quality quite a lot.
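To make the divisibility point concrete, here is a quick check of the arithmetic. It is only a sketch: the 32001 figure comes from this thread, the 64 is the block size mentioned above, and the 256 is my assumption for the default k-quant super-block size. The helper name `remainder_by_block` is made up for illustration and is not part of llama.cpp.

```python
# Vocabulary size discussed in this thread.
VOCAB_SIZE = 32001

# Block sizes to test: 64 is the size mentioned above; 256 is assumed
# here to be the default k-quant super-block size.
BLOCK_SIZES = (256, 64)


def remainder_by_block(dim, block_sizes):
    """Return {block_size: dim % block_size} for each block size."""
    return {block: dim % block for block in block_sizes}


remainders = remainder_by_block(VOCAB_SIZE, BLOCK_SIZES)

for block, rem in remainders.items():
    status = "divisible" if rem == 0 else f"not divisible (remainder {rem})"
    print(f"{VOCAB_SIZE} by {block}: {status}")
```

A dimension of 32000 would divide cleanly by both sizes, so the single extra vocabulary entry is what breaks it in either case.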

But today we got this PR which fixes it much more elegantly: https://github.com/ggerganov/llama.cpp/pull/2148 . That doesn't require any special compilation by the user, and quality isn't materially affected. File sizes will be fractionally larger, but only by 1-2%.

I will be uploading k-quants using this new method tomorrow; I don't even need to wait for the PR to be merged.