---
license: gemma
---

Experimental quants of https://huggingface.co/google/gemma-2-9b-it, made according to this llama.cpp (LCPP) PR (based on b_3529 for now): https://github.com/ggerganov/llama.cpp/pull/8836

The latest data is in the main post of the PR thread.

The iMatrix I use is based on Group Merged V3, enriched with a bit of French, Serbian, and Croatian text.

```
IQ2_S

MASTER : Gemma 2 9b It IQ2_S (with iMatrix, attn_output and attn_v in IQ3_S), quant made from BF16
Size : 2.99 GiB (2.77 BPW)
Arc-C 299 52.84280936
Arc-E 570 77.54385965
PPL 512 wikitext : 10.3868 +/- 0.07787

PR : Gemma 2 9b It IQ2_S (with iMatrix, attn_output in IQ3_XXS, and attn_v in Q4_K), quant made from BF16
Size : 3.00 GiB (2.79 BPW)
Arc-C 299 49.83277592
Arc-E 570 77.71929825
PPL 512 wikitext : 10.1303 +/- 0.07486

IQ2_M

MASTER : Gemma 2 9b It IQ2_M (with iMatrix, attn_output and attn_v in IQ3_S), quant made from BF16
Size : 3.19 GiB (2.97 BPW)
Arc-C 299 56.52173913
Arc-E 570 77.01754386
PPL 512 wikitext : 9.8154 +/- 0.07324

PR init : Gemma 2 9b It IQ2_M (with iMatrix, attn_output in IQ3_XXS, and attn_v in Q4_K), quant made from BF16
Size : 3.20 GiB (2.98 BPW)
Arc-C 299 54.18060201
Arc-E 570 78.07017544
PPL 512 wikitext : 9.5734 +/- 0.07040

PR CURRENT : Gemma 2 9b It IQ2_M, quant made from BF16
Size : 3.29 GiB (3.06 BPW)
Arc-C 299 55.85284281
Arc-E 570 78.07017544
PPL 512 wikitext : 9.4128 +/- 0.06881

IQ2_XL

PR CURRENT : Gemma 2 9b It IQ2_XL, quant made from BF16
Size : 3.41 GiB (3.17 BPW)
Arc-C 299 56.18729097
Arc-E 570 78.07017544
PPL 512 wikitext : 9.3283 +/- 0.06820

IQ3_XXS

MASTER : Gemma 2 9b It IQ3_XXS (with iMatrix, attn_k in IQ2_S, and attn_v in IQ3_XXS), quant made from BF16
Size : 3.53 GiB (3.28 BPW)
Arc-C 299 56.52173913
Arc-E 570 79.12280702
PPL 512 wikitext : 9.4116 +/- 0.06982

PR : Gemma 2 9b It IQ3_XXS (with iMatrix, attn_k in IQ3_XXS, and attn_v in Q4_K), quant made from BF16
Size : 3.60 GiB (3.35 BPW)
Arc-C 299 56.18729097
Arc-E 570 78.77192982
PPL 512 wikitext : 9.2026 +/- 0.06781

IQ3_S

MASTER : Gemma 2 9b It IQ3_S (with iMatrix, attn_v in IQ3_S), quant made from BF16
Size : 4.03 GiB (3.75 BPW)
Arc-C 299 57.52508361
Arc-E 570 77.71929825
PPL 512 wikitext : 9.2100 +/- 0.06859

PR : Gemma 2 9b It IQ3_S (with iMatrix, attn_v in Q4_K), quant made from BF16
Size : 4.07 GiB (3.79 BPW)
Arc-C 299 57.19063545
Arc-E 570 78.07017544
PPL 512 wikitext : 9.0082 +/- 0.06633

IQ3_M

MASTER : Gemma 2 9b It IQ3_M (with iMatrix, attn_output in Q4_K), quant made from BF16
Size : 4.18 GiB (3.89 BPW)
Arc-C 299 56.85618729
Arc-E 570 77.71929825
PPL 512 wikitext : 8.9697 +/- 0.06598

PR : Gemma 2 9b It IQ3_M (with iMatrix, attn_output in IQ4_XS), quant made from BF16
Size : 4.16 GiB (3.87 BPW)
Arc-C 299 57.19063545
Arc-E 570 77.71929825
PPL 512 wikitext : 8.9556 +/- 0.06586

PR rev2 : Gemma 2 9b It IQ3_M (with iMatrix, attn_output in IQ4_XS, attn_v in Q5_K), quant made from BF16
Size : 4.20 GiB (3.90 BPW)
Arc-C 299 58.52842809
Arc-E 570 77.54385965
PPL 512 wikitext : 8.9445 +/- 0.06576

PR rev3 : Gemma 2 9b It IQ3_M (with iMatrix, attn_output in IQ4_XS, attn_v in Q5_K, attn_k in IQ4_XS), quant made from BF16
Size : 4.23 GiB (3.93 BPW)
Arc-C 299 58.19397993
Arc-E 570 77.19298246
PPL 512 wikitext : 8.9082 +/- 0.06536

IQ4_XS

MASTER : Gemma 2 9b It IQ4_XS (with iMatrix), quant made from BF16
Size : 4.87 GiB (4.52 BPW)
Arc-C 299 57.52508361
Arc-E 570 78.24561404
PPL 512 wikitext : 8.8456 +/- 0.06533

FP16

MASTER : Gemma 2 9b It F16.
Size : 17.22 GiB (16.00 BPW)
Arc-C 299 59.53177258
Arc-E 570 78.77192982
PPL 512 wikitext : 8.7881 +/- 0.06533
```
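
The BPW (bits per weight) figures in the listing follow from file size and weight count, and the PPL values are easiest to compare as a relative increase over F16. Here is a minimal Python sketch of that arithmetic. The helper names (`params_from_size`, `bits_per_weight`, `ppl_increase_pct`) are hypothetical, and the weight count is an assumption inferred here from the F16 row (17.22 GiB at 16.00 BPW), not a figure stated above. Sizes are rounded to two decimals in the listing, so recomputed BPW values can differ by about 0.01.

```python
GIB = 1024 ** 3  # bytes per GiB


def params_from_size(size_gib: float, bpw: float) -> float:
    """Infer the number of weights from a file size and its bits per weight."""
    return size_gib * GIB * 8 / bpw


def bits_per_weight(size_gib: float, n_params: float) -> float:
    """Average bits stored per weight for a file of the given size."""
    return size_gib * GIB * 8 / n_params


def ppl_increase_pct(ppl: float, ppl_ref: float) -> float:
    """Relative perplexity increase over a reference, in percent."""
    return (ppl / ppl_ref - 1.0) * 100.0


# Assumption: weight count inferred from the F16 row of the listing.
N_PARAMS = params_from_size(17.22, 16.00)
PPL_F16 = 8.7881

# (label, size in GiB, PPL 512 wikitext), copied from the listing.
ROWS = [
    ("IQ2_S PR", 3.00, 10.1303),
    ("IQ2_M PR CURRENT", 3.29, 9.4128),
    ("IQ3_M PR rev3", 4.23, 8.9082),
    ("IQ4_XS MASTER", 4.87, 8.8456),
]

if __name__ == "__main__":
    for label, size_gib, ppl in ROWS:
        bpw = bits_per_weight(size_gib, N_PARAMS)
        delta = ppl_increase_pct(ppl, PPL_F16)
        print(f"{label:18s} {bpw:5.2f} BPW  PPL +{delta:5.2f}% vs F16")
```

Reporting perplexity as a percentage over F16 makes the cost of each quant comparable across sizes. It does not replace the Arc-C and Arc-E scores, which move within a few points and do not always follow the PPL ordering.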