---
base_model: google/gemma-2-2b-jpn-it
language:
  - multilingual
datasets:
  - TFMC/imatrix-dataset-for-japanese-llm
library_name: transformers
license: gemma
license_link: https://ai.google.dev/gemma/terms
pipeline_tag: text-generation
tags:
  - nlp
  - code
quantized_by: ymcki
widget:
  - messages:
      - role: user
        content: Can you provide ways to eat combinations of bananas and dragonfruits?
---

Original model: https://huggingface.co/google/gemma-2-2b-jpn-it

Run them in LM Studio

Prompt format

<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model

(Gemma 2 has no system role; put any system-style instructions in the user turn.)
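As a quick sanity check that the template is being applied as written, here is a minimal sketch of sending one prompt with llama.cpp's llama-cli (it assumes a quant from the table below is in the current directory; -e expands the \n escapes):

```sh
./llama-cli -m gemma-2-2b-jpn-it.Q8_0.gguf -e \
  -p "<start_of_turn>user\nCan you provide ways to eat combinations of bananas and dragonfruits?<end_of_turn>\n<start_of_turn>model\n"
```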

Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Split | ELYZA-tasks-100 | Description |
| -------- | ---------- | --------- | ----- | --------------- | ----------- |
| gemma-2-2b-jpn-it.f16.gguf | f16 | 5.24GB | false | | Full F16 weights. |
| gemma-2-2b-jpn-it.Q8_0.gguf | Q8_0 | 2.78GB | false | | Extremely high quality, recommended. |
| gemma-2-2b-jpn-it-imatrix.Q4_0.gguf | Q4_0 | 1.63GB | false | | Good quality, recommended for edge devices with <8GB RAM. |
| gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf | Q4_0_8_8 | 1.63GB | false | | Good quality, recommended for edge devices with <8GB RAM. |
| gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf | Q4_0_4_8 | 1.63GB | false | | Good quality, recommended for edge devices with <8GB RAM. |
| gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf | Q4_0_4_4 | 1.63GB | false | | Good quality, recommended for edge devices with <8GB RAM. |
| gemma-2-2b-jpn-it.Q4_0.gguf | Q4_0 | 1.63GB | false | | Poor quality, not recommended. |
| gemma-2-2b-jpn-it.Q4_0_8_8.gguf | Q4_0_8_8 | 1.63GB | false | | Poor quality, not recommended. |
| gemma-2-2b-jpn-it.Q4_0_4_8.gguf | Q4_0_4_8 | 1.63GB | false | | Poor quality, not recommended. |
| gemma-2-2b-jpn-it.Q4_0_4_4.gguf | Q4_0_4_4 | 1.63GB | false | | Poor quality, not recommended. |

How to check i8mm and sve support for ARM devices

ARM i8mm support is necessary to take advantage of the Q4_0_4_8 gguf. All ARM architectures >= ARMv8.6-A support i8mm.

ARM sve support is necessary to take advantage of the Q4_0_8_8 gguf. sve is an optional feature available from ARMv8.2-A onwards, but the majority of ARM chips do not implement it.

For ARM devices with neither feature, Q4_0_4_4 is recommended.

For Apple devices, run:

sysctl hw

For other ARM devices (i.e. most Android devices), run:

cat /proc/cpuinfo

There are also Android apps that can display /proc/cpuinfo.
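As a concrete sketch (the sysctl key name is my assumption of the usual hw.optional.arm.FEAT_* naming on recent macOS; fall back to the full sysctl hw listing if it is not present):

```sh
# macOS on Apple Silicon: prints 1 if the CPU implements i8mm
sysctl hw.optional.arm.FEAT_I8MM

# Linux/Android: i8mm and/or sve show up in the Features line when supported
grep -m1 Features /proc/cpuinfo | tr ' ' '\n' | grep -E '^(i8mm|sve)$'
```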

Which Q4_0 model to use for ARM devices

| Brand | Series | Model | i8mm | sve | Quant Type |
| ----- | ------ | ----- | ---- | --- | ---------- |
| Qualcomm | Snapdragon | >= 7 Gen 1 | Yes | Yes | Q4_0_8_8 |
| Qualcomm | Snapdragon | others | No | No | Q4_0_4_4 |
| Apple | M | M1 | No | No | Q4_0_4_4 |
| Apple | M | M2/M3/M4 | Yes | No | Q4_0_4_8 |
| Apple | A | A4 to A14 | No | No | Q4_0_4_4 |
| Apple | A | A15 to A18 | Yes | No | Q4_0_4_8 |

Convert safetensors to f16 gguf

Make sure you have llama.cpp cloned, then run the conversion script from its top-level directory:

python3 convert_hf_to_gguf.py gemma-2-2b-jpn-it/ --outfile gemma-2-2b-jpn-it.f16.gguf --outtype f16
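If you do not yet have llama.cpp or the original safetensors checkpoint locally, a minimal sketch of the prerequisite steps (paths are examples; gemma models are gated on Hugging Face, so a login is assumed):

```sh
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt

# accept the Gemma license on Hugging Face, then authenticate
huggingface-cli login
huggingface-cli download google/gemma-2-2b-jpn-it --local-dir gemma-2-2b-jpn-it
```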

Convert f16 gguf to Q8_0 gguf without imatrix

Make sure you have llama.cpp compiled, then run:

./llama-quantize gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it.Q8_0.gguf q8_0
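If you have not built it yet, a minimal sketch of the standard CMake build (in current llama.cpp the binaries land under build/bin; older checkouts used plain make):

```sh
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
```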

Convert f16 gguf to other gguf with imatrix

First, prepare an imatrix from the f16 gguf and the calibration text c4_en_ja_imatrix.txt:

./llama-imatrix -m gemma-2-2b-jpn-it.f16.gguf -f c4_en_ja_imatrix.txt -o gemma-2-2b-jpn-it.imatrix --chunks 32
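c4_en_ja_imatrix.txt is the English/Japanese calibration text from the TFMC/imatrix-dataset-for-japanese-llm dataset listed in the metadata above. A sketch of fetching it, assuming the filename matches what the dataset repo ships (note the --repo-type dataset flag):

```sh
huggingface-cli download TFMC/imatrix-dataset-for-japanese-llm c4_en_ja_imatrix.txt \
  --repo-type dataset --local-dir ./
```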

Then, quantize the f16 gguf with the imatrix to create an imatrix gguf:

./llama-quantize --imatrix gemma-2-2b-jpn-it.imatrix gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf q4_0_8_8
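To roughly sanity-check a quant against the f16 reference, one option is llama.cpp's perplexity tool on the same calibration text (this is only a quick check, not the ELYZA-tasks-100 evaluation referenced in the table above):

```sh
./llama-perplexity -m gemma-2-2b-jpn-it.f16.gguf -f c4_en_ja_imatrix.txt
./llama-perplexity -m gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf -f c4_en_ja_imatrix.txt
```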

Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Then, you can target the specific file you want:

huggingface-cli download ymcki/gemma-2-2b-jpn-it-GGUF --include "gemma-2-2b-jpn-it.Q8_0.gguf" --local-dir ./
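The --include flag accepts glob patterns, so you can also grab, for example, all four imatrix Q4_0 variants at once:

```sh
huggingface-cli download ymcki/gemma-2-2b-jpn-it-GGUF --include "*imatrix.Q4_0*.gguf" --local-dir ./
```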

Credits

Thank you bartowski for providing a README.md to get me started.

Thank you YoutechA320U for the ELYZA-tasks-100 auto evaluation tool.