metadata
license: gemma
Experimental quants of https://huggingface.co/google/gemma-2-9b-it, made according to the llama.cpp (LCPP) PR, based on b_3529 for now: https://github.com/ggerganov/llama.cpp/pull/8836
See the main post of that PR thread for the latest data.
The iMatrix I use is based on Group Merged V3, enriched with a bit of French, a bit of Serbian, and a bit of Croatian.
IQ2_S
MASTER : Gemma 2 9b It IQ2_S (with iMatrix, attn_output and attn_v in IQ3_S), quant made from BF16
Size : 2.99 GiB (2.77 BPW)
Arc-C 299 52.84280936
Arc-E 570 77.54385965
PPL 512 wikitext : 10.3868 +/- 0.07787
PR : Gemma 2 9b It IQ2_S (with iMatrix, attn_output in IQ3_XXS, and attn_v in Q4_K), quant made from BF16
Size : 3.00 GiB (2.79 BPW)
Arc-C 299 49.83277592
Arc-E 570 77.71929825
PPL 512 wikitext : 10.1303 +/- 0.07486
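The Arc-C and Arc-E lines give the number of questions (299 and 570) followed by the percentage answered correctly. As a rough sanity check, the sketch below converts a percentage back into a count of correct answers and estimates a simple binomial standard error. The helper names are hypothetical, and the error estimate assumes independent questions, which is my assumption and not something the test run reports.

```python
import math


def correct_count(percent: float, n_questions: int) -> int:
    """Recover the number of correct answers from a reported percentage."""
    return round(percent * n_questions / 100)


def binomial_stderr_points(percent: float, n_questions: int) -> float:
    """Binomial standard error, in percentage points (assumes independent questions)."""
    p = percent / 100
    return 100 * math.sqrt(p * (1 - p) / n_questions)


# Arc-C values for IQ2_S, copied from the listing above.
n_arc_c = 299
master_arc_c = 52.84280936
pr_arc_c = 49.83277592

print("MASTER correct:", correct_count(master_arc_c, n_arc_c))
print("PR correct:    ", correct_count(pr_arc_c, n_arc_c))
print("approx. stderr (points):", round(binomial_stderr_points(master_arc_c, n_arc_c), 2))
```

Under that assumption the standard error on Arc-C is close to 3 points, so a gap of a few points between MASTER and PR on 299 questions should be read with caution.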
IQ2_M
MASTER : Gemma 2 9b It IQ2_M (with iMatrix, attn_output and attn_v in IQ3_S), quant made from BF16
Size : 3.19 GiB (2.97 BPW)
Arc-C 299 56.52173913
Arc-E 570 77.01754386
PPL 512 wikitext : 9.8154 +/- 0.07324
PR init : Gemma 2 9b It IQ2_M (with iMatrix, attn_output in IQ3_XXS, and attn_v in Q4_K), quant made from BF16
Size : 3.20 GiB (2.98 BPW)
Arc-C 299 54.18060201
Arc-E 570 78.07017544
PPL 512 wikitext : 9.5734 +/- 0.07040
PR CURRENT : Gemma 2 9b It IQ2_M, quant made from BF16
Size : 3.29 GiB (3.06 BPW)
Arc-C 299 55.85284281
Arc-E 570 78.07017544
PPL 512 wikitext : 9.4128 +/- 0.06881
IQ2_XL
PR CURRENT : Gemma 2 9b It IQ2_XL, quant made from BF16
Size : 3.41 GiB (3.17 BPW)
Arc-C 299 56.18729097
Arc-E 570 78.07017544
PPL 512 wikitext : 9.3283 +/- 0.06820
IQ3_XXS
MASTER : Gemma 2 9b It IQ3_XXS (with iMatrix, attn_k in IQ2_S, and attn_v in IQ3_XXS), quant made from BF16
Size : 3.53 GiB (3.28 BPW)
Arc-C 299 56.52173913
Arc-E 570 79.12280702
PPL 512 wikitext : 9.4116 +/- 0.06982
PR : Gemma 2 9b It IQ3_XXS (with iMatrix, attn_k in IQ3_XXS, and attn_v in Q4_K), quant made from BF16
Size : 3.60 GiB (3.35 BPW)
Arc-C 299 56.18729097
Arc-E 570 78.77192982
PPL 512 wikitext : 9.2026 +/- 0.06781
IQ3_S
MASTER : Gemma 2 9b It IQ3_S (with iMatrix, attn_v in IQ3_S), quant made from BF16
Size : 4.03 GiB (3.75 BPW)
Arc-C 299 57.52508361
Arc-E 570 77.71929825
PPL 512 wikitext : 9.2100 +/- 0.06859
PR : Gemma 2 9b It IQ3_S (with iMatrix, attn_v in Q4_K), quant made from BF16
Size : 4.07 GiB (3.79 BPW)
Arc-C 299 57.19063545
Arc-E 570 78.07017544
PPL 512 wikitext : 9.0082 +/- 0.06633
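To judge whether a perplexity change between MASTER and PR is larger than the reported uncertainty, one rough approach is to compare the gap with the two "+/-" values combined in quadrature. This is only a sketch: `ppl_gap` is a hypothetical helper, and treating the two errors as independent is my assumption. Both runs use the same wikitext sample, so the errors are likely correlated and this estimate is probably conservative.

```python
import math


def ppl_gap(ppl_a: float, err_a: float, ppl_b: float, err_b: float):
    """Return (b - a, combined error), treating the two errors as independent."""
    diff = ppl_b - ppl_a
    combined = math.sqrt(err_a ** 2 + err_b ** 2)
    return diff, combined


# IQ3_S values copied from the listing above: MASTER first, then PR.
diff, combined = ppl_gap(9.2100, 0.06859, 9.0082, 0.06633)
print(f"PPL change: {diff:+.4f}, combined error: {combined:.4f}")
print(f"gap / error ratio: {abs(diff) / combined:.2f}")
```

For IQ3_S the gap is roughly twice the combined error, which suggests the PR's improvement is more than noise under this simplified reading.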
IQ3_M
MASTER : Gemma 2 9b It IQ3_M (with iMatrix, attn_output in Q4_K), quant made from BF16
Size : 4.18 GiB (3.89 BPW)
Arc-C 299 56.85618729
Arc-E 570 77.71929825
PPL 512 wikitext : 8.9697 +/- 0.06598
PR : Gemma 2 9b It IQ3_M (with iMatrix, attn_output in IQ4_XS), quant made from BF16
Size : 4.16 GiB (3.87 BPW)
Arc-C 299 57.19063545
Arc-E 570 77.71929825
PPL 512 wikitext : 8.9556 +/- 0.06586
PR rev2 : Gemma 2 9b It IQ3_M (with iMatrix, attn_output in IQ4_XS, attn_v in Q5_K), quant made from BF16
Size : 4.20 GiB (3.90 BPW)
Arc-C 299 58.52842809
Arc-E 570 77.54385965
PPL 512 wikitext : 8.9445 +/- 0.06576
PR rev3 : Gemma 2 9b It IQ3_M (with iMatrix, attn_output in IQ4_XS, attn_v in Q5_K, attn_k in IQ4_XS), quant made from BF16
Size : 4.23 GiB (3.93 BPW)
Arc-C 299 58.19397993
Arc-E 570 77.19298246
PPL 512 wikitext : 8.9082 +/- 0.06536
IQ4_XS
MASTER : Gemma 2 9b It IQ4_XS (with iMatrix), quant made from BF16
Size : 4.87 GiB (4.52 BPW)
Arc-C 299 57.52508361
Arc-E 570 78.24561404
PPL 512 wikitext : 8.8456 +/- 0.06533
FP16
MASTER : Gemma 2 9b It F16.
Size : 17.22 GiB (16.00 BPW)
Arc-C 299 59.53177258
Arc-E 570 78.77192982
PPL 512 wikitext : 8.7881 +/- 0.06533
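The "Size" lines pair a file size in GiB with a bits-per-weight (BPW) figure. A minimal sketch of how the two relate, assuming the F16 row is exactly 16 bits per weight so that it can be used to back out the parameter count. The function names are hypothetical, and because the sizes are rounded to two decimals the results only match the reported BPW to within about 0.01.

```python
GIB = 2 ** 30  # bytes per GiB


def params_from_f16(size_gib: float) -> float:
    """Estimate the weight count from the F16 file size (16 bits per weight assumed)."""
    return size_gib * GIB * 8 / 16


def bits_per_weight(size_gib: float, n_params: float) -> float:
    """Average bits per weight for a file of the given size."""
    return size_gib * GIB * 8 / n_params


n_params = params_from_f16(17.22)  # F16 size from the listing above
print(f"estimated weights: {n_params / 1e9:.2f} B")

# (name, size in GiB, reported BPW), copied from the listing above.
for name, size_gib, reported in [("IQ2_XL", 3.41, 3.17), ("IQ4_XS", 4.87, 4.52)]:
    estimate = bits_per_weight(size_gib, n_params)
    print(f"{name}: estimated {estimate:.2f} BPW, reported {reported:.2f} BPW")
```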