---
base_model: google/gemma-2-2b-jpn-it
language:
  - multilingual
datasets:
  - TFMC/imatrix-dataset-for-japanese-llm
library_name: transformers
license: gemma
license_link: https://ai.google.dev/gemma/terms
pipeline_tag: text-generation
tags:
  - nlp
  - code
quantized_by: ymcki
widget:
  - messages:
      - role: user
        content: Can you provide ways to eat combinations of bananas and dragonfruits?
---

Original model: https://huggingface.co/google/gemma-2-2b-jpn-it

Run them in LM Studio

Prompt format

<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model

(Gemma 2 has no system role; put any system-style instructions in the user turn.)
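As a quick sanity check that the template is being applied as written, here is a minimal sketch of sending one prompt with llama.cpp's llama-cli (it assumes a quant from the table below is in the current directory; -e expands the \n escapes):

```sh
./llama-cli -m gemma-2-2b-jpn-it.Q8_0.gguf -e \
  -p "<start_of_turn>user\nCan you provide ways to eat combinations of bananas and dragonfruits?<end_of_turn>\n<start_of_turn>model\n"
```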

Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Split | ELYZA-tasks-100 | Description |
| -------- | ---------- | --------- | ----- | --------------- | ----------- |
| gemma-2-2b-jpn-it.f16.gguf | f16 | 5.24GB | false | | Full F16 weights. |
| gemma-2-2b-jpn-it.Q8_0.gguf | Q8_0 | 2.78GB | false | | Extremely high quality, recommended. |
| gemma-2-2b-jpn-it-imatrix.Q4_0.gguf | Q4_0 | 1.63GB | false | | Good quality, recommended for edge devices with <8GB RAM. |
| gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf | Q4_0_8_8 | 1.63GB | false | | Good quality, recommended for edge devices with <8GB RAM. |
| gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf | Q4_0_4_8 | 1.63GB | false | | Good quality, recommended for edge devices with <8GB RAM. |
| gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf | Q4_0_4_4 | 1.63GB | false | | Good quality, recommended for edge devices with <8GB RAM. |
| gemma-2-2b-jpn-it.Q4_0.gguf | Q4_0 | 1.63GB | false | | Poor quality, not recommended. |
| gemma-2-2b-jpn-it.Q4_0_8_8.gguf | Q4_0_8_8 | 1.63GB | false | | Poor quality, not recommended. |
| gemma-2-2b-jpn-it.Q4_0_4_8.gguf | Q4_0_4_8 | 1.63GB | false | | Poor quality, not recommended. |
| gemma-2-2b-jpn-it.Q4_0_4_4.gguf | Q4_0_4_4 | 1.63GB | false | | Poor quality, not recommended. |

How to check i8mm and sve support for ARM devices

ARM i8mm support is necessary to take advantage of the Q4_0_4_8 gguf. All ARM architectures >= ARMv8.6-A support i8mm.

ARM sve support is necessary to take advantage of the Q4_0_8_8 gguf. sve is an optional feature available from ARMv8.2-A onwards, but the majority of ARM chips do not implement it.

For ARM devices with neither feature, Q4_0_4_4 is recommended.

For Apple devices, run:

sysctl hw

For other ARM devices (i.e. most Android devices), run:

cat /proc/cpuinfo

There are also Android apps that can display /proc/cpuinfo.
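As a concrete sketch (the sysctl key name is my assumption of the usual hw.optional.arm.FEAT_* naming on recent macOS; fall back to the full sysctl hw listing if it is not present):

```sh
# macOS on Apple Silicon: prints 1 if the CPU implements i8mm
sysctl hw.optional.arm.FEAT_I8MM

# Linux/Android: i8mm and/or sve show up in the Features line when supported
grep -m1 Features /proc/cpuinfo | tr ' ' '\n' | grep -E '^(i8mm|sve)$'
```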

Which Q4_0 model to use for ARM devices

| Brand | Series | Model | i8mm | sve | Quant Type |
| ----- | ------ | ----- | ---- | --- | ---------- |
| Qualcomm | Snapdragon | >= 7 Gen 1 | Yes | Yes | Q4_0_8_8 |
| Qualcomm | Snapdragon | others | No | No | Q4_0_4_4 |
| Apple | M | M1 | No | No | Q4_0_4_4 |
| Apple | M | M2/M3/M4 | Yes | No | Q4_0_4_8 |
| Apple | A | A4 to A14 | No | No | Q4_0_4_4 |
| Apple | A | A15 to A18 | Yes | No | Q4_0_4_8 |

Convert safetensors to f16 gguf

Make sure you have llama.cpp cloned, then run the conversion script from its top-level directory:

python3 convert_hf_to_gguf.py gemma-2-2b-jpn-it/ --outfile gemma-2-2b-jpn-it.f16.gguf --outtype f16
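If you do not yet have llama.cpp or the original safetensors checkpoint locally, a minimal sketch of the prerequisite steps (paths are examples; gemma models are gated on Hugging Face, so a login is assumed):

```sh
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt

# accept the Gemma license on Hugging Face, then authenticate
huggingface-cli login
huggingface-cli download google/gemma-2-2b-jpn-it --local-dir gemma-2-2b-jpn-it
```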

Convert f16 gguf to Q8_0 gguf without imatrix

Make sure you have llama.cpp compiled, then run:

./llama-quantize gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it.Q8_0.gguf q8_0
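If you have not built it yet, a minimal sketch of the standard CMake build (in current llama.cpp the binaries land under build/bin; older checkouts used plain make):

```sh
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
```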

Convert f16 gguf to other gguf with imatrix

First, prepare an imatrix from the f16 gguf and the calibration text c4_en_ja_imatrix.txt:

./llama-imatrix -m gemma-2-2b-jpn-it.f16.gguf -f c4_en_ja_imatrix.txt -o gemma-2-2b-jpn-it.imatrix --chunks 32
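c4_en_ja_imatrix.txt is the English/Japanese calibration text from the TFMC/imatrix-dataset-for-japanese-llm dataset listed in the metadata above. A sketch of fetching it, assuming the filename matches what the dataset repo ships (note the --repo-type dataset flag):

```sh
huggingface-cli download TFMC/imatrix-dataset-for-japanese-llm c4_en_ja_imatrix.txt \
  --repo-type dataset --local-dir ./
```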

Then, quantize the f16 gguf with the imatrix to create an imatrix gguf:

./llama-quantize --imatrix gemma-2-2b-jpn-it.imatrix gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf q4_0_8_8
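To roughly sanity-check a quant against the f16 reference, one option is llama.cpp's perplexity tool on the same calibration text (this is only a quick check, not the ELYZA-tasks-100 evaluation referenced in the table above):

```sh
./llama-perplexity -m gemma-2-2b-jpn-it.f16.gguf -f c4_en_ja_imatrix.txt
./llama-perplexity -m gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf -f c4_en_ja_imatrix.txt
```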

Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Then, you can target the specific file you want:

huggingface-cli download ymcki/gemma-2-2b-jpn-it-GGUF --include "gemma-2-2b-jpn-it.Q8_0.gguf" --local-dir ./
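The --include flag accepts glob patterns, so you can also grab, for example, all four imatrix Q4_0 variants at once:

```sh
huggingface-cli download ymcki/gemma-2-2b-jpn-it-GGUF --include "*imatrix.Q4_0*.gguf" --local-dir ./
```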

Credits

Thank you bartowski for providing a README.md to get me started.

Thank you YoutechA320U for the ELYZA-tasks-100 auto evaluation tool.