---
base_model: google/gemma-2-2b-jpn-it
language:
- multilingual
datasets:
- TFMC/imatrix-dataset-for-japanese-llm
library_name: transformers
license: gemma
license_link: https://ai.google.dev/gemma/terms
pipeline_tag: text-generation
tags:
- nlp
- code
quantized_by: ymcki
widget:
- messages:
  - role: user
    content: Can you provide ways to eat combinations of bananas and dragonfruits?
---
Original model: https://huggingface.co/google/gemma-2-2b-jpn-it
Run them in [LM Studio](https://lmstudio.ai/)
## Prompt format
```
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
```
## Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [gemma-2-2b-jpn-it.f16.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.f16.gguf) | f16 | 5.24GB | false | Full F16 weights. |
| [gemma-2-2b-jpn-it.Q8_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q8_0.gguf) | Q8_0 | 2.78GB | false | Extremely high quality, *recommended*. |
| [gemma-2-2b-jpn-it-imatrix.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0.gguf) | Q4_0 | 1.63GB | false | Good quality, *recommended for edge devices with <8GB RAM*. |
| [gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf) | Q4_0_8_8 | 1.63GB | false | Good quality, *recommended for edge devices with <8GB RAM*. |
| [gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf) | Q4_0_4_8 | 1.63GB | false | Good quality, *recommended for edge devices with <8GB RAM*. |
| [gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf) | Q4_0_4_4 | 1.63GB | false | Good quality, *recommended for edge devices with <8GB RAM*. |
| [gemma-2-2b-jpn-it.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0.gguf) | Q4_0 | 1.63GB | false | Poor quality, *not recommended*. |
| [gemma-2-2b-jpn-it.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_8_8.gguf) | Q4_0_8_8 | 1.63GB | false | Poor quality, *not recommended*. |
| [gemma-2-2b-jpn-it.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_8.gguf) | Q4_0_4_8 | 1.63GB | false | Poor quality, *not recommended*. |
| [gemma-2-2b-jpn-it.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_4.gguf) | Q4_0_4_4 | 1.63GB | false | Poor quality, *not recommended*. |
## How to check i8mm and sve support for ARM devices
ARM i8mm support is required to take advantage of the Q4_0_4_8 gguf. All ARM architectures from ARMv8.6-A onward support i8mm.
ARM sve support is required to take advantage of the Q4_0_8_8 gguf. sve is an optional feature available from ARMv8.2-A onward, but the majority of ARM chips don't implement it.
For ARM devices that support neither, Q4_0_4_4 is recommended.
For Apple devices,
```
sysctl hw
```
For other ARM devices (i.e. most Android devices),
```
cat /proc/cpuinfo
```
There are also Android apps that can display /proc/cpuinfo.
## Which Q4_0 model to use for ARM devices
| Brand | Series | Model | i8mm | sve | Quant Type |
| ----- | ------ | ----- | ---- | --- | -----------|
| Qualcomm |Snapdragon | >= 7 Gen 1 | Yes | Yes | Q4_0_8_8 |
| Qualcomm |Snapdragon | others | No | No | Q4_0_4_4 |
| Apple | M | M1 | No | No | Q4_0_4_4 |
| Apple | M | M2/M3/M4 | Yes | No | Q4_0_4_8 |
| Apple | A | A4 to A14 | No | No | Q4_0_4_4 |
| Apple | A | A15 to A18 | Yes | No | Q4_0_4_8 |
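On Linux-based ARM devices, the feature check and the selection rule can be combined in a few lines of shell. This is only a sketch: it assumes the kernel lists CPU features on a `Features` line in /proc/cpuinfo using the flag names `i8mm` and `sve`, and the helper names `has_flag` and `pick_quant` are made up for illustration. It does not cover Apple devices.

```shell
# has_flag FLAG FEATURES -> succeeds if FLAG appears as a whole word in FEATURES.
# Whole-word matching keeps "sve2" or "svei8mm" from counting as "sve" or "i8mm".
has_flag() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# pick_quant FEATURES -> prints the Q4_0 variant suggested by the rules above:
# sve -> Q4_0_8_8, otherwise i8mm -> Q4_0_4_8, otherwise Q4_0_4_4.
pick_quant() {
  if has_flag sve "$1"; then
    echo Q4_0_8_8
  elif has_flag i8mm "$1"; then
    echo Q4_0_4_8
  else
    echo Q4_0_4_4
  fi
}

# Only meaningful on ARM Linux/Android, where a "Features" line exists.
if [ -r /proc/cpuinfo ]; then
  features=$(grep -m1 '^Features' /proc/cpuinfo | cut -d: -f2)
  if [ -n "$features" ]; then
    pick_quant "$features"
  fi
fi
```

If the script prints nothing, no `Features` line was found, and the table above is the better guide.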
## Convert safetensors to f16 gguf
Make sure you have cloned the llama.cpp git repository:
```
python3 convert_hf_to_gguf.py gemma-2-2b-jpn-it/ --outfile gemma-2-2b-jpn-it.f16.gguf --outtype f16
```
## Convert f16 gguf to Q8_0 gguf without imatrix
Make sure you have llama.cpp compiled:
```
./llama-quantize gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it.Q8_0.gguf q8_0
```
## Convert f16 gguf to other gguf with imatrix
First, prepare the imatrix from the f16 gguf and c4_en_ja_imatrix.txt:
```
./llama-imatrix -m gemma-2-2b-jpn-it.f16.gguf -f c4_en_ja_imatrix.txt -o gemma-2-2b-jpn-it.imatrix --chunks 32
```
Then, quantize the f16 gguf with the imatrix to create the imatrix gguf:
```
./llama-quantize --imatrix gemma-2-2b-jpn-it.imatrix gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf q4_0_8_8
```
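The same command can be repeated for each imatrix quant type in the download table. The loop below is a sketch that only prints the commands. It assumes your llama.cpp build still provides all four types and accepts their lowercase names, as it does for `q4_0_8_8` above; `quantize_cmd` is a hypothetical helper name.

```shell
MODEL=gemma-2-2b-jpn-it

# quantize_cmd TYPE -> prints the llama-quantize command line for TYPE,
# following the file naming used in the download table.
quantize_cmd() {
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf './llama-quantize --imatrix %s.imatrix %s.f16.gguf %s-imatrix.%s.gguf %s\n' \
    "$MODEL" "$MODEL" "$MODEL" "$1" "$lower"
}

# Dry run: print one command per quant type.
for type in Q4_0 Q4_0_8_8 Q4_0_4_8 Q4_0_4_4; do
  quantize_cmd "$type"
done
```

To run the commands instead of printing them, pipe the loop's output to `sh` from the directory that contains `llama-quantize`.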
## Downloading using huggingface-cli
First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
```
huggingface-cli download ymcki/gemma-2-2b-jpn-it-GGUF --include "gemma-2-2b-jpn-it.Q8_0.gguf" --local-dir ./
```
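After downloading, a quick sanity check is to confirm that the file begins with the 4-byte `GGUF` magic that every GGUF file starts with. The sketch below assumes the Q8_0 file was saved in the current directory; `is_gguf` is a hypothetical helper name, and the check does not verify that the download is complete.

```shell
# is_gguf FILE -> succeeds if FILE begins with the 4-byte magic "GGUF".
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

file=gemma-2-2b-jpn-it.Q8_0.gguf
if [ -f "$file" ]; then
  if is_gguf "$file"; then
    echo "$file looks like a GGUF file"
  else
    echo "$file does not start with the GGUF magic; try downloading it again"
  fi
fi
```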
## Credits
Thank you bartowski for providing a README.md to get me started.
Thank you YoutechA320U for the ELYZA-tasks-100 auto evaluation tool.