license: gemma

Experimental quants of https://huggingface.co/google/gemma-2-9b-it, made according to this llama.cpp (LCPP) PR (based on b_3529 for now) : https://github.com/ggerganov/llama.cpp/pull/8836

The latest data is in the main post of that PR thread.

The iMatrix I use is based on Group Merged V3, enriched with a bit of French, Serbian, and Croatian.

IQ2_S

MASTER : Gemma 2 9b It IQ2_S (with iMatrix, attn_output and attn_v in IQ3_S), quant made from BF16
Size : 2.99 GiB (2.77 BPW)
Arc-C 299     52.84280936
Arc-E 570     77.54385965
PPL 512 wikitext : 10.3868 +/- 0.07787

PR : Gemma 2 9b It IQ2_S (with iMatrix, attn_output in IQ3_XXS, and attn_v in Q4_K), quant made from BF16
Size : 3.00 GiB (2.79 BPW)
Arc-C 299     49.83277592
Arc-E 570     77.71929825
PPL 512 wikitext : 10.1303 +/- 0.07486

IQ2_M

MASTER : Gemma 2 9b It IQ2_M (with iMatrix, attn_output and attn_v in IQ3_S), quant made from BF16
Size : 3.19 GiB (2.97 BPW)
Arc-C 299     56.52173913
Arc-E 570     77.01754386
PPL 512 wikitext : 9.8154 +/- 0.07324

PR init : Gemma 2 9b It IQ2_M (with iMatrix, attn_output in IQ3_XXS, and attn_v in Q4_K), quant made from BF16
Size : 3.20 GiB (2.98 BPW)
Arc-C 299     54.18060201
Arc-E 570     78.07017544
PPL 512 wikitext :  9.5734 +/- 0.07040

PR CURRENT : Gemma 2 9b It IQ2_M, quant made from BF16
Size : 3.29 GiB (3.06 BPW)
Arc-C 299     55.85284281
Arc-E 570     78.07017544
PPL 512 wikitext : 9.4128 +/- 0.06881

IQ2_XL

PR CURRENT : Gemma 2 9b It IQ2_XL, quant made from BF16
Size : 3.41 GiB (3.17 BPW)
Arc-C 299     56.18729097
Arc-E 570     78.07017544
PPL 512 wikitext : 9.3283 +/- 0.06820

IQ3_XXS

MASTER : Gemma 2 9b It IQ3_XXS (with iMatrix, attn_k in IQ2_S, and attn_v in IQ3_XXS), quant made from BF16
Size : 3.53 GiB (3.28 BPW)
Arc-C 299 56.52173913
Arc-E 570 79.12280702
PPL 512 wikitext : 9.4116 +/- 0.06982

PR : Gemma 2 9b It IQ3_XXS (with iMatrix, attn_k in IQ3_XXS, and attn_v in Q4_K), quant made from BF16
Size : 3.60 GiB (3.35 BPW)
Arc-C 299 56.18729097
Arc-E 570 78.77192982
PPL 512 wikitext : 9.2026 +/- 0.06781

IQ3_S

MASTER : Gemma 2 9b It IQ3_S (with iMatrix, attn_v in IQ3_S), quant made from BF16
Size : 4.03 GiB (3.75 BPW)
Arc-C 299     57.52508361
Arc-E 570     77.71929825
PPL 512 wikitext : 9.2100 +/- 0.06859

PR : Gemma 2 9b It IQ3_S (with iMatrix, attn_v in Q4_K), quant made from BF16
Size : 4.07 GiB (3.79 BPW)
Arc-C 299     57.19063545
Arc-E 570     78.07017544
PPL 512 wikitext : 9.0082 +/- 0.06633

IQ3_M

MASTER : Gemma 2 9b It IQ3_M (with iMatrix, attn_output in Q4_K), quant made from BF16
Size : 4.18 GiB (3.89 BPW)
Arc-C 299     56.85618729
Arc-E 570     77.71929825
PPL 512 wikitext : 8.9697 +/- 0.06598

PR : Gemma 2 9b It IQ3_M (with iMatrix, attn_output in IQ4_XS), quant made from BF16
Size : 4.16 GiB (3.87 BPW)
Arc-C 299     57.19063545
Arc-E 570     77.71929825
PPL 512 wikitext : 8.9556 +/- 0.06586

PR rev2 : Gemma 2 9b It IQ3_M (with iMatrix, attn_output in IQ4_XS, and attn_v in Q5_K), quant made from BF16
Size : 4.20 GiB (3.90 BPW)
Arc-C 299     58.52842809
Arc-E 570     77.54385965
PPL 512 wikitext : 8.9445 +/- 0.06576

PR rev3 : Gemma 2 9b It IQ3_M (with iMatrix, attn_output in IQ4_XS, attn_v in Q5_K, and attn_k in IQ4_XS), quant made from BF16
Size : 4.23 GiB (3.93 BPW)
Arc-C 299     58.19397993
Arc-E 570     77.19298246
PPL 512 wikitext : 8.9082 +/- 0.06536

IQ4_XS

MASTER : Gemma 2 9b It IQ4_XS (with iMatrix), quant made from BF16
Size : 4.87 GiB (4.52 BPW)
Arc-C 299     57.52508361
Arc-E 570     78.24561404
PPL 512 wikitext : 8.8456 +/- 0.06533

FP16

MASTER : Gemma 2 9b It F16.
Size : 17.22 GiB (16.00 BPW)
Arc-C 299     59.53177258
Arc-E 570     78.77192982
PPL 512 wikitext : 8.7881 +/- 0.06533
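
To read the numbers above relative to the F16 baseline, here is a minimal Python sketch. It is not part of the PR or of llama.cpp. The helper names `ppl_increase_pct` and `implied_bpw` are made up for illustration, and only a few of the results listed above are copied in. The BPW estimate assumes the file size scales linearly with bits per weight compared to the F16 file, so it can differ from the listed BPW by a rounding step.

```python
# Values copied from the results listed above (size in GiB, PPL 512 wikitext).
F16_PPL = 8.7881
F16_SIZE_GIB = 17.22

RESULTS = {
    "IQ2_S MASTER": (2.99, 10.3868),
    "IQ2_S PR": (3.00, 10.1303),
    "IQ3_M MASTER": (4.18, 8.9697),
    "IQ3_M PR rev3": (4.23, 8.9082),
}


def ppl_increase_pct(ppl: float, baseline: float = F16_PPL) -> float:
    """Relative perplexity increase over the baseline, in percent."""
    return (ppl - baseline) / baseline * 100.0


def implied_bpw(size_gib: float,
                ref_size_gib: float = F16_SIZE_GIB,
                ref_bpw: float = 16.0) -> float:
    """Approximate bits per weight, assuming size scales linearly with BPW
    relative to the F16 file (hypothetical helper, ignores metadata overhead)."""
    return size_gib / ref_size_gib * ref_bpw


for name, (size_gib, ppl) in RESULTS.items():
    print(f"{name:14s} ~{implied_bpw(size_gib):.2f} BPW  "
          f"+{ppl_increase_pct(ppl):.2f}% PPL vs F16")
```

Adding more rows to `RESULTS` from the listings above extends the comparison to the other quants.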