The perfect score is 5.00.
| Filename | Quant type | File Size | Split | ELIZA-Tasks-100 | Nvidia 3090 | Description |
| -------- | ---------- | --------- | ----- | --------------- | ----------- | ----------- |
| [gemma-2-9b-it.f16.gguf](https://huggingface.co/ymcki/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it.f16.gguf) | f16 | 18.49GB | false | 3.75 | 31.9t/s | Full F16 weights. |
| [gemma-2-9b-it.Q8_0.gguf](https://huggingface.co/ymcki/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it.Q8_0.gguf) | Q8_0 | 9.83GB | false | 3.66 | 56.1t/s | Extremely high quality, *recommended for edge devices with 16GB RAM*. |
| [gemma-2-9b-it-imatrix.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-imatrix.Q4_0.gguf) | Q4_0 | 5.44GB | false | 3.76 | 80.6t/s | Good quality, *recommended for edge devices with 8GB RAM*. |
| [gemma-2-9b-it-imatrix.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-imatrix.Q4_0_8_8.gguf) | Q4_0_8_8 | 5.44GB | false | TBD | TBD | Good quality, *recommended for edge devices with less than 8GB RAM*. |
| [gemma-2-9b-it-imatrix.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-imatrix.Q4_0_4_8.gguf) | Q4_0_4_8 | 5.44GB | false | TBD | TBD | Good quality, *recommended for edge devices with less than 8GB RAM*. |
| [gemma-2-9b-it-imatrix.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it-imatrix.Q4_0_4_4.gguf) | Q4_0_4_4 | 5.44GB | false | TBD | TBD | Good quality, *recommended for edge devices with less than 8GB RAM*. |
| [gemma-2-9b-it.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it.Q4_0.gguf) | Q4_0 | 5.44GB | false | 3.64 | 65.1t/s | Good quality, *recommended for edge devices with 8GB RAM*. |
| [gemma-2-9b-it.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it.Q4_0_8_8.gguf) | Q4_0_8_8 | 5.44GB | false | TBD | TBD | Good quality, *recommended for edge devices with less than 8GB RAM*. |
| [gemma-2-9b-it.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it.Q4_0_4_8.gguf) | Q4_0_4_8 | 5.44GB | false | TBD | TBD | Good quality, *recommended for edge devices with less than 8GB RAM*. |
| [gemma-2-9b-it.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-9b-it-GGUF/blob/main/gemma-2-9b-it.Q4_0_4_4.gguf) | Q4_0_4_4 | 5.44GB | false | TBD | TBD | Good quality, *recommended for edge devices with less than 8GB RAM*. |

## How to check i8mm and sve support for ARM devices
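
These quants only pay off if the CPU actually has the matching extensions: in llama.cpp, Q4_0_4_8 uses the i8mm instructions, Q4_0_8_8 uses sve, and Q4_0_4_4 runs on plain NEON. On Linux (including Android via `adb shell`), a minimal check is to read the feature flags the kernel reports in `/proc/cpuinfo`:

```bash
# Print the CPU feature flags of the first core; "i8mm" and/or "sve"
# in the output means the extension is supported (no output = neither).
grep -m1 Features /proc/cpuinfo | tr ' ' '\n' | grep -E -x 'i8mm|sve'
```

If neither flag shows up, Q4_0_4_4 is the safest ARM choice since it only requires plain NEON.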
According to this [blog](https://sc-bakushu.hatenablog.com/entry/2024/04/20/050213), applying an imatrix to a low-bit quant can significantly improve its performance. The best calibration dataset for Japanese is [TFMC/imatrix-dataset-for-japanese-llm](https://huggingface.co/datasets/TFMC/imatrix-dataset-for-japanese-llm). Therefore, I also created imatrix versions of the different Q4_0 quants.
Indeed, based on my benchmarking results, the imatrix does seem to improve performance even for a model that is not optimized for Japanese: the imatrix Q4_0 scores 3.76 on ELIZA-Tasks-100 versus 3.64 for the plain Q4_0.
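
For reference, this is roughly how an imatrix quant is produced with llama.cpp's `llama-imatrix` and `llama-quantize` tools; the file names below are placeholders, not the exact commands used for this repo:

```bash
# 1. Compute the importance matrix from the f16 model, using the Japanese
#    calibration data (saved locally, e.g. as calibration.txt).
./llama-imatrix -m gemma-2-9b-it.f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize to Q4_0, letting the imatrix weight the rounding error.
./llama-quantize --imatrix imatrix.dat gemma-2-9b-it.f16.gguf \
  gemma-2-9b-it-imatrix.Q4_0.gguf Q4_0
```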
## Convert safetensors to f16 gguf
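
A minimal sketch of the conversion with llama.cpp's converter script, assuming the original safetensors model has been downloaded into a local directory named `gemma-2-9b-it`:

```bash
# Convert the downloaded HF safetensors model to a single f16 GGUF file.
python convert_hf_to_gguf.py gemma-2-9b-it \
  --outtype f16 --outfile gemma-2-9b-it.f16.gguf
```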