Update README.md
Original model: https://huggingface.co/google/gemma-2-9b-it
All quants were made using the imatrix option with the dataset from [here](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8).
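For reference, a rough sketch of how an imatrix quant like these is produced with llama.cpp (binary names and file paths here are illustrative and vary across llama.cpp versions; older builds name these tools `imatrix` and `quantize`):

```bash
# Build an importance matrix from the calibration dataset linked above
./llama-imatrix -m gemma-2-9b-it-f16.gguf -f calibration_data.txt -o gemma-2-9b-it.imatrix

# Quantize, weighting tensors by the importance matrix
./llama-quantize --imatrix gemma-2-9b-it.imatrix \
    gemma-2-9b-it-f16.gguf gemma-2-9b-it-Q4_K_M.gguf Q4_K_M
```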
## What's new
- June 21 2024: Contains the latest tokenizer fixes, which address a few oddities from the original fix; this should be the closest to correct performance yet. Also includes metadata for SWA (sliding window attention) and logit softcapping.
- July 3 2024: Updated the experimental quants to a newer method that uses Q8_0 for the embed/output weights, which yields higher quality at a much smaller size than f16 (Q8_0_L was left with f16 embed/output, since plain Q8_0 already uses Q8 for those tensors). See the sketch below.
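A minimal sketch of that method (the `--token-embedding-type`/`--output-tensor-type` flags are the same `llama-quantize` options mentioned in earlier versions of this README; filenames are illustrative, and the `gguf-dump` metadata check assumes the `gguf` Python package and llama.cpp's `gemma2.*` key names):

```bash
# "_L"/"_XL" variants override the embed/output tensors to Q8_0 at quantize time
./llama-quantize --imatrix gemma-2-9b-it.imatrix \
    --token-embedding-type q8_0 --output-tensor-type q8_0 \
    gemma-2-9b-it-f16.gguf gemma-2-9b-it-Q6_K_L.gguf Q6_K

# Optionally confirm the SWA/softcapping metadata mentioned above is present
gguf-dump gemma-2-9b-it-Q6_K_L.gguf | grep -E "sliding_window|softcapping"
```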
## Prompt format

Note that this model does not support a System prompt.
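For reference, a minimal single-turn prompt in Gemma's chat format (user/model turn markers only, no system role):

```
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
```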
## Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Description |
| -------- | ---------- | --------- | ----------- |
| [gemma-2-9b-it-Q8_0_L.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q8_1.gguf) | Q8_0_L | 10.68GB | *Experimental*, uses f16 for embed and output weights; please share any quality differences you notice. Extremely high quality, generally unneeded but the max available quant. |
| [gemma-2-9b-it-Q8_0.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q8_0.gguf) | Q8_0 | 9.82GB | Extremely high quality, generally unneeded but max available quant. |
| [gemma-2-9b-it-Q6_K_L.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q6_K_L.gguf) | Q6_K_L | 7.81GB | Uses Q8_0 for embed and output weights. Very high quality, near perfect, *recommended*. |
| [gemma-2-9b-it-Q6_K.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q6_K.gguf) | Q6_K | 7.58GB | Very high quality, near perfect, *recommended*. |
| [gemma-2-9b-it-Q5_K_L.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q5_K_L.gguf) | Q5_K_L | 6.87GB | Uses Q8_0 for embed and output weights. High quality, *recommended*. |
| [gemma-2-9b-it-Q5_K_M.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q5_K_M.gguf) | Q5_K_M | 6.64GB | High quality, *recommended*. |
| [gemma-2-9b-it-Q5_K_S.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q5_K_S.gguf) | Q5_K_S | 6.48GB | High quality, *recommended*. |
| [gemma-2-9b-it-Q4_K_L.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q4_K_L.gguf) | Q4_K_L | 5.98GB | Uses Q8_0 for embed and output weights. Good quality, uses about 4.83 bits per weight, *recommended*. |
| [gemma-2-9b-it-Q4_K_M.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q4_K_M.gguf) | Q4_K_M | 5.76GB | Good quality, uses about 4.83 bits per weight, *recommended*. |
| [gemma-2-9b-it-Q4_K_S.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q4_K_S.gguf) | Q4_K_S | 5.47GB | Slightly lower quality with more space savings, *recommended*. |
| [gemma-2-9b-it-IQ4_XS.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-IQ4_XS.gguf) | IQ4_XS | 5.18GB | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
| [gemma-2-9b-it-Q3_K_XL.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q3_K_XL.gguf) | Q3_K_XL | 5.35GB | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| [gemma-2-9b-it-Q3_K_L.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q3_K_L.gguf) | Q3_K_L | 5.13GB | Lower quality but usable, good for low RAM availability. |
| [gemma-2-9b-it-Q3_K_M.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q3_K_M.gguf) | Q3_K_M | 4.76GB | Even lower quality. |
| [gemma-2-9b-it-IQ3_M.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-IQ3_M.gguf) | IQ3_M | 4.49GB | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| [gemma-2-9b-it-Q3_K_S.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q3_K_S.gguf) | Q3_K_S | 4.33GB | Low quality, not recommended. |
| [gemma-2-9b-it-IQ3_XS.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-IQ3_XS.gguf) | IQ3_XS | 4.14GB | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| [gemma-2-9b-it-IQ3_XXS.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-IQ3_XXS.gguf) | IQ3_XXS | 3.79GB | Lower quality, new method with decent performance, comparable to Q3 quants. |
| [gemma-2-9b-it-Q2_K_L.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q2_K_L.gguf) | Q2_K_L | 4.02GB | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| [gemma-2-9b-it-Q2_K.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-Q2_K.gguf) | Q2_K | 3.80GB | Very low quality but surprisingly usable. |
| [gemma-2-9b-it-IQ2_M.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-IQ2_M.gguf) | IQ2_M | 3.43GB | Very low quality, uses SOTA techniques to also be surprisingly usable. |
| [gemma-2-9b-it-IQ2_S.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-IQ2_S.gguf) | IQ2_S | 3.21GB | Very low quality, uses SOTA techniques to be usable. |
| [gemma-2-9b-it-IQ2_XS.gguf](https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-IQ2_XS.gguf) | IQ2_XS | 3.06GB | Very low quality, uses SOTA techniques to be usable. |
## Credits
Thank you to kalomaze and Dampf for their assistance in creating the imatrix calibration dataset.
Thank you to [ZeroWw](https://huggingface.co/ZeroWw) for the inspiration to experiment with the embed/output weights.
## Downloading using huggingface-cli
First, make sure you have huggingface-cli installed:
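```bash
pip install -U "huggingface_hub[cli]"
```

Then you can target the specific file you want (the quant shown is just an example):

```bash
huggingface-cli download bartowski/gemma-2-9b-it-GGUF --include "gemma-2-9b-it-Q4_K_M.gguf" --local-dir ./
```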