Original model: https://huggingface.co/google/gemma-2-2b-jpn-it
## Prompt format

```
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
```

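To try the prompt format quickly with llama.cpp's llama-cli, something like the following should work; the model file, the Japanese prompt, and the token count are just examples, and -e makes llama-cli interpret the \n escapes:
```
./llama-cli -m gemma-2-2b-jpn-it.Q8_0.gguf -e -n 256 \
  -p "<start_of_turn>user\nこんにちは、自己紹介してください。<end_of_turn>\n<start_of_turn>model\n"
```
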
## Download a file (not the whole branch) from below:

ELIZA-Tasks-100 is a fairly standard benchmark for Japanese LLMs. The perfect score is 5.00. For reference, bartowski's gemma-2-27b-it.Q6_K.gguf scores 4.04.

| Filename | Quant type | File Size | Split | ELIZA-Tasks-100 | Description |
| -------- | ---------- | --------- | ----- | --------------- | ----------- |
| [gemma-2-2b-jpn-it.f16.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.f16.gguf) | f16 | 5.24GB | false | | Full F16 weights. |
| [gemma-2-2b-jpn-it.Q8_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q8_0.gguf) | Q8_0 | 2.78GB | false | 3.06 | Extremely high quality, *recommended*. |
| [gemma-2-2b-jpn-it-imatrix.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0.gguf) | Q4_0 | 1.63GB | false | 2.89 | Good quality, *recommended for edge device <8GB RAM*. |
| [gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf) | Q4_0_8_8 | 1.63GB | false | 2.78 | Good quality, *recommended for edge device <8GB RAM*. |
| [gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf) | Q4_0_4_8 | 1.63GB | false | TBD | Good quality, *recommended for edge device <8GB RAM*. |
| [gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf) | Q4_0_4_4 | 1.63GB | false | TBD | Good quality, *recommended for edge device <8GB RAM*. |
| [gemma-2-2b-jpn-it.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0.gguf) | Q4_0 | 1.63GB | false | 2.77 | Good quality, but the imatrix version is a bit better. |
| [gemma-2-2b-jpn-it.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_8_8.gguf) | Q4_0_8_8 | 1.63GB | false | TBD | Poor quality, *not recommended*. |
| [gemma-2-2b-jpn-it.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_8.gguf) | Q4_0_4_8 | 1.63GB | false | TBD | Poor quality, *not recommended*. |
| [gemma-2-2b-jpn-it.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_4.gguf) | Q4_0_4_4 | 1.63GB | false | TBD | Poor quality, *not recommended*. |
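
If you only want a single file, one way is to fetch it from the command line with huggingface-cli, shown here for the Q8_0 gguf; substitute whichever filename you want:
```
pip install -U "huggingface_hub[cli]"
huggingface-cli download ymcki/gemma-2-2b-jpn-it-GGUF gemma-2-2b-jpn-it.Q8_0.gguf --local-dir ./
```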
## How to check i8mm and sve support for ARM devices

ARM i8mm support is necessary to take advantage of the Q4_0_4_8 gguf. All ARM architectures >= ARMv8.6-A support i8mm.

ARM sve support is necessary to take advantage of the Q4_0_8_8 gguf. sve is an optional feature introduced in ARMv8.2-A, but the majority of ARM chips do not implement it.

For ARM devices with neither i8mm nor sve, it is recommended to use Q4_0_4_4.

With this support, inference speed should be faster in the order Q4_0_8_8 > Q4_0_4_8 > Q4_0_4_4 > Q4_0, without much effect on response quality.

Here is a [list](https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html) of ARM SoCs and the ARM instructions they support. It appears to be only a partial list, so it is better to check i8mm and sve support yourself.

For Apple devices, run:
```
sysctl hw
```
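
To narrow that output down to the relevant feature flags, the same command can be piped through grep. The exact key names (for example hw.optional.arm.FEAT_I8MM on recent macOS releases) are an assumption and vary by chip and OS version:
```
sysctl hw | grep -iE 'i8mm|sve'
```
A value of 1 next to an i8mm-related key indicates support.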

For other ARM devices (i.e. most Android devices), run:
```
cat /proc/cpuinfo
```

There are also Android apps that can display /proc/cpuinfo.
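
To filter that output down to just the two flags of interest, a grep like the following may help; it prints each supported flag once, or nothing if neither is present:
```
grep -oE 'i8mm|sve' /proc/cpuinfo | sort -u
```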

On the other hand, inference speed for the imatrix ggufs on my Nvidia 3090 is 137t/s for Q4_0 but only 2.8t/s for Q4_0_8_8. That means on an Nvidia GPU you are better off using Q4_0.
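
To compare quants on your own hardware, llama.cpp's llama-bench is a convenient tool; the model paths below are illustrative:
```
./llama-bench -m gemma-2-2b-jpn-it-imatrix.Q4_0.gguf
./llama-bench -m gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf
```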
## Which Q4_0 model to use for ARM devices
| Brand | Series | Model | i8mm | sve | Quant Type |
| ----- | ------ | ----- | ---- | --- | ---------- |
| Apple | A | A4 to A14 | No | No | Q4_0_4_4 |
| Apple | A | A15 to A18 | Yes | No | Q4_0_4_8 |
| Apple | M | M1 | No | No | Q4_0_4_4 |
| Apple | M | M2/M3/M4 | Yes | No | Q4_0_4_8 |
| Google | Tensor | G1,G2 | No | No | Q4_0_4_4 |
| Google | Tensor | G3,G4 | Yes | Yes | Q4_0_8_8 |
| Samsung | Exynos | 2200,2400 | Yes | Yes | Q4_0_8_8 |
| Mediatek | Dimensity | 9000 | Yes | Yes | Q4_0_8_8 |
| Mediatek | Dimensity | 9300 | Yes | No | Q4_0_4_8 |
| Qualcomm | Snapdragon | 8 Gen 1 | Yes | Yes | Q4_0_8_8 |
| Qualcomm | Snapdragon | 8 Gen 2,8 Gen 3,X Elite | Yes | No | Q4_0_4_8 |
## imatrix quantization

According to this [blog](https://sc-bakushu.hatenablog.com/entry/2024/04/20/050213), adding an imatrix to low-bit quants can significantly improve performance. The best dataset for Japanese is [TFMC/imatrix-dataset-for-japanese-llm](https://huggingface.co/datasets/TFMC/imatrix-dataset-for-japanese-llm). Therefore, I also created imatrix versions of the different Q4_0 quants. Indeed, they significantly outperform their non-imatrix counterparts.

## Convert safetensors to f16 gguf
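
Make sure you have llama.cpp compiled. The conversion itself can be done with llama.cpp's convert_hf_to_gguf.py; the sketch below assumes the original google/gemma-2-2b-jpn-it checkpoint has been downloaded to ./gemma-2-2b-jpn-it, and the paths and flags are illustrative rather than the exact command used for these files:
```
python convert_hf_to_gguf.py ./gemma-2-2b-jpn-it --outfile gemma-2-2b-jpn-it.f16.gguf --outtype f16
```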

Then quantize the f16 gguf to Q8_0:
```
./llama-quantize gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it.Q8_0.gguf q8_0
```
## Convert f16 gguf to other ggufs with imatrix

First, prepare an imatrix from the f16 gguf and c4_en_ja_imatrix.txt.
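
A minimal sketch of this step with llama.cpp's llama-imatrix, followed by an imatrix-aware Q4_0 quantization; the file names are illustrative and may not match the exact invocation used for these ggufs:
```
./llama-imatrix -m gemma-2-2b-jpn-it.f16.gguf -f c4_en_ja_imatrix.txt -o gemma-2-2b-jpn-it.imatrix
./llama-quantize --imatrix gemma-2-2b-jpn-it.imatrix gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it-imatrix.Q4_0.gguf q4_0
```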