higher quants?

#1
by Insanelycool - opened

for such a small model it might be nice to have larger quants as an option

Owner

Most people would not call 22B small. :)

I focus exclusively on i-quants (hence the SOTA-GGUF naming), mainly because the regular quants are already well covered by many others, but also because I put significant effort into the smaller quants (testing, purpose-matched imatrix data, custom chat templates, and so on). The sole aim is to make the models as small as possible while keeping them as usable as possible, so that they can run on low-spec hardware.

However, I have started uploading the base B/F16 GGUFs to every repo, which means anyone can easily make their own quants: download the B/F16 GGUF plus the imatrix, then run the following command from llama.cpp:

./quantize --imatrix model.imatrix.dat model.bf16.gguf model.Q5_K_M.gguf q5_k_m
