ymcki committed on
Commit
d45cb19
1 Parent(s): c07efbc

Upload README.md

Files changed (1)
README.md +35 -18
README.md CHANGED
@@ -20,8 +20,6 @@ widget:
20
 
21
  Original model: https://huggingface.co/google/gemma-2-2b-jpn-it
22
 
23
- Run them in [LM Studio](https://lmstudio.ai/)
24
-
25
  ## Prompt format
26
 
27
  ```
@@ -30,49 +28,68 @@ Run them in [LM Studio](https://lmstudio.ai/)
30
 
31
  ## Download a file (not the whole branch) from below:
32
 
33
  | Filename | Quant type | File Size | Split | ELIZA-Tasks-100 | Description |
34
  | -------- | ---------- | --------- | ----- | --------------- | ----------- |
35
| [gemma-2-2b-jpn-it.f16.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.f16.gguf) | f16 | 5.24GB | false | | Full F16 weights. |
36
- | [gemma-2-2b-jpn-it.Q8_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q8_0.gguf) | Q8_0 | 2.78GB | false | Extremely high quality, *recommended*. |
37
- | [gemma-2-2b-jpn-it-imatrix.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0.gguf) | Q4_0 | 1.63GB | false | Good quality, *recommended for edge device <8GB RAM*. |
38
- | [gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf) | Q4_0_8_8 | 1.63GB | false | Good quality, *recommended for edge device <8GB RAM*. |
39
- | [gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf) | Q4_0_4_8 | 1.63GB | false | Good quality, *recommended for edge device <8GB RAM*. |
40
- | [gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf) | Q4_0_4_4 | 1.63GB | false | Good quality, *recommended for edge device <8GB RAM*. |
41
- | [gemma-2-2b-jpn-it.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0.gguf) | Q4_0 | 1.63GB | false | Poor quality, *not recommended*. |
42
- | [gemma-2-2b-jpn-it.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_8_8.gguf) | Q4_0_8_8 | 1.63GB | false | Poor quality, *not recommended*. |
43
- | [gemma-2-2b-jpn-it.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_8.gguf) | Q4_0_4_8 | 1.63GB | false | Poor quality, *not recommended*. |
44
- | [gemma-2-2b-jpn-it.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_4.gguf) | Q4_0_4_4 | 1.63GB | false | Poor quality, *not recommended*. |
45
 
46
  ## How to check i8mm and sve support for ARM devices
47
 
48
- ARM i8mm support is necessary to take advantage of Q4_0_4_8 gguf. All ARM architecure >= ARMv8.6-A supports i8mm.
49
 
50
ARM sve support is necessary to take advantage of the Q4_0_8_8 gguf. sve is an optional feature introduced in ARMv8.2-A, but the majority of ARM chips don't implement it.
51
 
52
For ARM devices with neither feature, it is recommended to use Q4_0_4_4.
53
 
54
  For Apple devices,
55
 
56
  ```
57
  sysctl hw
58
  ```
59
 
60
- For ARM devices (ie most Android devices),
61
  ```
62
  cat /proc/cpuinfo
63
  ```
64
 
65
There are also Android apps that can display /proc/cpuinfo.
66
 
67
  ## Which Q4_0 model to use for ARM devices
68
  | Brand | Series | Model | i8mm | sve | Quant Type |
69
  | ----- | ------ | ----- | ---- | --- | -----------|
70
- | Qualcomm |Snapdragon | >= 7 Gen 1 | Yes | Yes | Q4_0_8_8 |
71
- | Qualcomm |Snapdragon | others | No | No | Q4_0_4_4 |
72
- | Apple | M | M1 | No | No | Q4_0_4_4 |
73
- | Apple | M | M2/M3/M4 | Yes | No | Q4_0_4_8 |
74
  | Apple | A | A4 to A14 | No | No | Q4_0_4_4 |
75
  | Apple | A | A15 to A18 | Yes | No | Q4_0_4_8 |
76
 
77
  ## Convert safetensors to f16 gguf
78
 
@@ -88,7 +105,7 @@ Make sure you have llama.cpp compiled:
88
  ./llama-quantize gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it.Q8_0.gguf q8_0
89
  ```
90
 
91
- ## Convert f16 gguf to other gguf with imatrix
92
 
93
  First, prepare imatrix from f16 gguf and c4_en_ja_imatrix.txt
94
 
 
20
 
21
  Original model: https://huggingface.co/google/gemma-2-2b-jpn-it
22
 
23
  ## Prompt format
24
 
25
  ```
 
28
 
29
  ## Download a file (not the whole branch) from below:
30
 
31
+ ELIZA-Tasks-100 is a fairly standard benchmark for Japanese LLMs.
32
+ The perfect score is 5.00. As a reference, bartowski's gemma-2-27b-it.Q6_K.gguf scores 4.04.
33
+
34
  | Filename | Quant type | File Size | Split | ELIZA-Tasks-100 | Description |
35
  | -------- | ---------- | --------- | ----- | --------------- | ----------- |
36
| [gemma-2-2b-jpn-it.f16.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.f16.gguf) | f16 | 5.24GB | false | | Full F16 weights. |
37
+ | [gemma-2-2b-jpn-it.Q8_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q8_0.gguf) | Q8_0 | 2.78GB | false | 3.06 | Extremely high quality, *recommended*. |
38
+ | [gemma-2-2b-jpn-it-imatrix.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0.gguf) | Q4_0 | 1.63GB | false | 2.89 | Good quality, *recommended for edge device <8GB RAM*. |
39
+ | [gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf) | Q4_0_8_8 | 1.63GB | false | 2.78 | Good quality, *recommended for edge device <8GB RAM*. |
40
+ | [gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf) | Q4_0_4_8 | 1.63GB | false | TBD | Good quality, *recommended for edge device <8GB RAM*. |
41
+ | [gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf) | Q4_0_4_4 | 1.63GB | false | TBD | Good quality, *recommended for edge device <8GB RAM*. |
42
+ | [gemma-2-2b-jpn-it.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0.gguf) | Q4_0 | 1.63GB | false | 2.77 | Good quality, but the imatrix version is a bit better. |
43
+ | [gemma-2-2b-jpn-it.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_8_8.gguf) | Q4_0_8_8 | 1.63GB | false | TBD | Poor quality, *not recommended*. |
44
+ | [gemma-2-2b-jpn-it.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_8.gguf) | Q4_0_4_8 | 1.63GB | false | TBD | Poor quality, *not recommended*. |
45
+ | [gemma-2-2b-jpn-it.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_4.gguf) | Q4_0_4_4 | 1.63GB | false | TBD | Poor quality, *not recommended*. |
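If you only want a single file rather than the whole branch, a minimal sketch using the huggingface_hub CLI (assuming it is installed and the repo id matches the links above):

```
# download one quant into the current directory
pip install -U "huggingface_hub[cli]"
huggingface-cli download ymcki/gemma-2-2b-jpn-it-GGUF gemma-2-2b-jpn-it.Q8_0.gguf --local-dir ./
```

Swap the filename for any of the quants in the table above.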
46
 
47
  ## How to check i8mm and sve support for ARM devices
48
 
49
+ ARM i8mm support is necessary to take advantage of the Q4_0_4_8 gguf. All ARM architectures >= ARMv8.6-A support i8mm.
50
 
51
ARM sve support is necessary to take advantage of the Q4_0_8_8 gguf. sve is an optional feature introduced in ARMv8.2-A, but the majority of ARM chips don't implement it.
52
 
53
For ARM devices with neither feature, it is recommended to use Q4_0_4_4.
54
 
55
+ With these features supported, inference speed should be faster in the order Q4_0_8_8 > Q4_0_4_8 > Q4_0_4_4 > Q4_0, without much effect on the quality of the responses.
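If you want to verify this ordering on your own hardware, llama.cpp ships a benchmark tool; the following is only a sketch (the binary path, model filenames and thread count are examples to adjust for your device):

```
# report prompt-processing (pp) and token-generation (tg) speed for each quant
./llama-bench -m gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf -t 4
./llama-bench -m gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf -t 4
```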
56
+
57
+
58
+ This is a [list](https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html) of ARM SoCs and the ARM instructions they support. It appears to be only a partial list, so it is better to check for i8mm and sve support yourself.
59
+
60
  For Apple devices,
61
 
62
  ```
63
  sysctl hw
64
  ```
65
 
66
+ For other ARM devices (i.e. most Android devices),
67
  ```
68
  cat /proc/cpuinfo
69
  ```
70
 
71
There are also Android apps that can display /proc/cpuinfo.
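For example, a quick filter for the two relevant flags might look like this (a sketch building on the commands above; on macOS the feature flags appear under the hw.optional keys, and sve will simply not show up on Apple silicon):

```
# Linux/Android: print i8mm and/or sve if they appear in the CPU feature flags
grep -m1 '^Features' /proc/cpuinfo | grep -o -w -E 'i8mm|sve'

# macOS on Apple silicon: search the hw sysctl output for the same flags
sysctl hw | grep -i -E 'i8mm|sve'
```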
72
 
73
+ On the other hand, inference speed for the imatrix ggufs on my Nvidia 3090 is 137 t/s for Q4_0 but only 2.8 t/s for Q4_0_8_8. That means on Nvidia GPUs you are better off using Q4_0.
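For completeness, a sketch of running the Q4_0 quant with full GPU offload on an Nvidia card (a CUDA build of llama.cpp and the model filename are assumptions):

```
# offload all layers to the GPU; a 2B Q4_0 model fits easily in VRAM
./llama-cli -m gemma-2-2b-jpn-it-imatrix.Q4_0.gguf -ngl 99 -p "日本の観光名所を3つ教えてください。"
```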
74
+
75
  ## Which Q4_0 model to use for ARM devices
76
  | Brand | Series | Model | i8mm | sve | Quant Type |
77
  | ----- | ------ | ----- | ---- | --- | -----------|
78
  | Apple | A | A4 to A14 | No | No | Q4_0_4_4 |
79
  | Apple | A | A15 to A18 | Yes | No | Q4_0_4_8 |
80
+ | Apple | M | M1 | No | No | Q4_0_4_4 |
81
+ | Apple | M | M2/M3/M4 | Yes | No | Q4_0_4_8 |
82
+ | Google | Tensor | G1,G2 | No | No | Q4_0_4_4 |
83
+ | Google | Tensor | G3,G4 | Yes | Yes | Q4_0_8_8 |
84
+ | Samsung | Exynos | 2200,2400 | Yes | Yes | Q4_0_8_8 |
85
+ | Mediatek | Dimensity | 9000 | Yes | Yes | Q4_0_8_8 |
86
+ | Mediatek | Dimensity | 9300 | Yes | No | Q4_0_4_8 |
87
+ | Qualcomm | Snapdragon | 8 Gen 1 | Yes | Yes | Q4_0_8_8 |
88
+ | Qualcomm | Snapdragon | 8 Gen 2, 8 Gen 3, X Elite | Yes | No | Q4_0_4_8 |
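Once you have picked the right quant from the table, a minimal sketch of running it on the device with llama.cpp's CLI (the model filename and thread count here are examples, not the author's exact setup):

```
# run the chosen quant on 4 CPU threads
./llama-cli -m gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf -t 4 -p "自己紹介をしてください。"
```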
89
+
90
+ ## imatrix quantization
91
+
92
+ According to this [blog](https://sc-bakushu.hatenablog.com/entry/2024/04/20/050213), adding an imatrix to low-bit quants can significantly improve performance. The best dataset for Japanese is [TFMC/imatrix-dataset-for-japanese-llm](https://huggingface.co/datasets/TFMC/imatrix-dataset-for-japanese-llm). Therefore, I also created imatrix versions of the different Q4_0 quants. Indeed, they significantly outperform their non-imatrix counterparts.
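As a rough sketch of what that pipeline involves (the exact commands are in the conversion sections below; the output filename imatrix.dat here is an assumption):

```
# 1. build an importance matrix from the f16 gguf and a Japanese/English calibration text
./llama-imatrix -m gemma-2-2b-jpn-it.f16.gguf -f c4_en_ja_imatrix.txt -o imatrix.dat
# 2. quantize with the importance matrix applied
./llama-quantize --imatrix imatrix.dat gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it-imatrix.Q4_0.gguf q4_0
```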
93
 
94
  ## Convert safetensors to f16 gguf
95
 
 
105
  ./llama-quantize gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it.Q8_0.gguf q8_0
106
  ```
107
 
108
+ ## Convert f16 gguf to other ggufs with imatrix
109
 
110
  First, prepare imatrix from f16 gguf and c4_en_ja_imatrix.txt
111