ymcki committed on
Commit 120c59f
1 Parent(s): 06bba61

Upload README.md

Files changed (1)
  1. README.md +15 -15
README.md CHANGED
@@ -38,18 +38,18 @@ Note that this model does not support a System prompt.
  ELIZA-Tasks-100 is a pretty standard benchmark for Japanese LLMs.
  The perfect score is 5.00. As a reference, bartowski's gemma-2-27b-it.Q6_K.gguf scores 4.04.

- | Filename | Quant type | File Size | Split | ELIZA-Tasks-100 | Nvidia 3090 | Description |
- | -------- | ---------- | --------- | ----- | --------------- | ----------- | ----------- |
- | [gemma-2-2b-jpn-it.f16.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.f16.gguf) | f16 | 5.24GB | false | 2.90 | 98t/s | Full F16 weights. |
- | [gemma-2-2b-jpn-it.Q8_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q8_0.gguf) | Q8_0 | 2.78GB | false | 3.06 | 140t/s | Extremely high quality, *recommended*. |
- | [gemma-2-2b-jpn-it-imatrix.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0.gguf) | Q4_0 | 1.63GB | false | 2.89 | 137t/s | Good quality, *recommended for edge device <8GB RAM*. |
- | [gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf) | Q4_0_8_8 | 1.63GB | false | 2.78 | 2.79t/s | Good quality, *recommended for edge device <8GB RAM*. |
- | [gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf) | Q4_0_4_8 | 1.63GB | false | 2.77 | 2.61t/s | Good quality, *recommended for edge device <8GB RAM*. |
- | [gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf) | Q4_0_4_4 | 1.63GB | false | 2.65 | 3.09t/s | Good quality, *recommended for edge device <8GB RAM*. |
- | [gemma-2-2b-jpn-it.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0.gguf) | Q4_0 | 1.63GB | false | 2.77 | 159t/s | Good quality, *recommended for edge device <8GB RAM* |
- | [gemma-2-2b-jpn-it.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_8_8.gguf) | Q4_0_8_8 | 1.63GB | false | 2.92 | 2.85t/s | Good quality, *recommended for edge device <8GB RAM* |
- | [gemma-2-2b-jpn-it.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_8.gguf) | Q4_0_4_8 | 1.63GB | false | 2.74 | 2.56t/s | Good quality, *recommended for edge device <8GB RAM* |
- | [gemma-2-2b-jpn-it.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_4.gguf) | Q4_0_4_4 | 1.63GB | false | 2.70 | 3.10t/s | Good quality, *recommended for edge device <8GB RAM*. |
+ | Filename | Quant type | File Size | ELIZA-Tasks-100 | Nvidia 3090 | Description |
+ | -------- | ---------- | --------- | --------------- | ----------- | ----------- |
+ | [gemma-2-2b-jpn-it.f16.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.f16.gguf) | f16 | 5.24GB | 2.90 | 98t/s | Full F16 weights. |
+ | [gemma-2-2b-jpn-it.Q8_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q8_0.gguf) | Q8_0 | 2.78GB | 3.06 | 140t/s | Extremely high quality, *recommended*. |
+ | [gemma-2-2b-jpn-it-imatrix.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0.gguf) | Q4_0 | 1.63GB | 2.89 | 137t/s | Good quality, *recommended for edge devices <8GB RAM*. |
+ | [gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf) | Q4_0_8_8 | 1.63GB | 2.78 | 2.79t/s | Good quality, *recommended for edge devices <8GB RAM*. |
+ | [gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf) | Q4_0_4_8 | 1.63GB | 2.77 | 2.61t/s | Good quality, *recommended for edge devices <8GB RAM*. |
+ | [gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf) | Q4_0_4_4 | 1.63GB | 2.65 | 3.09t/s | Good quality, *recommended for edge devices <8GB RAM*. |
+ | [gemma-2-2b-jpn-it.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0.gguf) | Q4_0 | 1.63GB | 2.77 | 159t/s | Good quality, *recommended for edge devices <8GB RAM*. |
+ | [gemma-2-2b-jpn-it.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_8_8.gguf) | Q4_0_8_8 | 1.63GB | 2.92 | 2.85t/s | Good quality, *recommended for edge devices <8GB RAM*. |
+ | [gemma-2-2b-jpn-it.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_8.gguf) | Q4_0_4_8 | 1.63GB | 2.74 | 2.56t/s | Good quality, *recommended for edge devices <8GB RAM*. |
+ | [gemma-2-2b-jpn-it.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_4.gguf) | Q4_0_4_4 | 1.63GB | 2.70 | 3.10t/s | Good quality, *recommended for edge devices <8GB RAM*. |
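
To try one of the files in the table, a minimal sketch could look like the following; it assumes the `huggingface_hub` and `llama-cpp-python` packages and that the repo id matches the link paths above (`ymcki/gemma-2-2b-jpn-it-GGUF`), none of which this model card itself prescribes:

```python
# Illustrative sketch: download the *recommended* Q8_0 file listed above and
# run a single Japanese prompt with llama-cpp-python.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

path = hf_hub_download(
    repo_id="ymcki/gemma-2-2b-jpn-it-GGUF",      # assumed from the link paths
    filename="gemma-2-2b-jpn-it.Q8_0.gguf",      # the *recommended* quant above
)
llm = Llama(model_path=path, n_ctx=4096)

# The model card notes that a System prompt is not supported, so only a plain
# user turn is sent here ("Where is the capital of Japan?").
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "日本の首都はどこですか？"}]
)
print(out["choices"][0]["message"]["content"])
```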

  ## How to check i8mm and sve support for ARM devices

@@ -61,7 +61,7 @@ For ARM devices without both, it is recommended to use Q4_0_4_4.

  With this support, inference speed should be faster in the order Q4_0_8_8 > Q4_0_4_8 > Q4_0_4_4 > Q4_0, without much effect on the quality of the response.

- This is a [list](https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html) of ARM devices that support different ARM instructions. Apparently, it is only a partial list. It is better you check for i8mm and sve support by yourself.
+ This is a [list](https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html) of ARM CPUs and the ARM instructions they support; another [list](https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/refs/heads/master/sbc-bench.sh) can be found in the sbc-bench script. Apparently, they cover only a limited number of ARM CPUs, so it is better to check for i8mm and sve support yourself.

  For Apple devices,
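
Since the card asks readers to verify i8mm and sve support themselves, here is a minimal sketch of such a check, assuming a Linux-based AArch64 device whose `/proc/cpuinfo` exposes a `Features` line; the helper name `recommended_quant` is illustrative and the mapping simply follows the recommendation table below:

```python
# Minimal sketch: read /proc/cpuinfo on an AArch64 Linux device (e.g. via
# Termux or adb shell on Android) and map the i8mm/sve flags to the quant
# type recommended in this README.
def recommended_quant(cpuinfo_path: str = "/proc/cpuinfo") -> str:
    features = set()
    with open(cpuinfo_path) as f:
        for line in f:
            # AArch64 kernels list supported instructions on "Features" lines.
            if line.lower().startswith("features"):
                features.update(line.split(":", 1)[1].split())
    if "i8mm" in features and "sve" in features:
        return "Q4_0_8_8"
    if "i8mm" in features:
        return "Q4_0_4_8"
    return "Q4_0_4_4"  # neither i8mm nor sve

if __name__ == "__main__":
    print("Recommended quant type:", recommended_quant())
```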
 
@@ -90,9 +90,9 @@ On the other hand, Nvidia 3090 inference speed is significantly faster for Q4_0
  | Google | Tensor | G1,G2 | No | No | Q4_0_4_4 |
  | Google | Tensor | G3,G4 | Yes | Yes | Q4_0_8_8 |
  | Samsung | Exynos | 2200,2400 | Yes | Yes | Q4_0_8_8 |
- | Mediatek | Dimensity | 9000 | Yes | Yes | Q4_0_8_8 |
+ | Mediatek | Dimensity | 9000,9000+ | Yes | Yes | Q4_0_8_8 |
  | Mediatek | Dimensity | 9300 | Yes | No | Q4_0_4_8 |
- | Qualcomm | Snapdragon | 8 Gen 1 | Yes | Yes | Q4_0_8_8 |
+ | Qualcomm | Snapdragon | 7+ Gen 2,8/8+ Gen 1 | Yes | Yes | Q4_0_8_8 |
  | Qualcomm | Snapdragon | 8 Gen 2,8 Gen 3,X Elite | Yes | No | Q4_0_4_8 |

  ## imatrix quantization
 