Upload README.md

528f049 verified 17 days ago

5.14 kB

	---
	base_model: google/gemma-2-2b-jpn-it
	language:
	- multilingual
	datasets:
	- TFMC/imatrix-dataset-for-japanese-llm
	library_name: transformers
	license: gemma
	license_link: https://ai.google.dev/gemma/terms
	pipeline_tag: text-generation
	tags:
	- nlp
	- code
	quantized_by: ymcki
	widget:
	- messages:
	- role: user
	content: Can you provide ways to eat combinations of bananas and dragonfruits?
	---

	Original model: https://huggingface.co/google/gemma-2-2b-jpn-it

	Run them in [LM Studio](https://lmstudio.ai/)

	## Prompt format

	```
	<\|system\|> {system_prompt}<\|end\|><\|user\|> {prompt}<\|end\|><\|assistant\|>
	```

	## Download a file (not the whole branch) from below:

	\| Filename \| Quant type \| File Size \| Split \| ELIZA-Tasks-100 \| Description \|
	\| -------- \| ---------- \| --------- \| ----- \| --------------- \| ----------- \|
	\| [gemma-2-2b-jpn-it.f16.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.f16.gguf) \| f16 \| 5.24GB \| false \| Full F16 weights. \|
	\| [gemma-2-2b-jpn-it.Q8_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q8_0.gguf) \| Q8_0 \| 2.78GB \| false \| Extremely high quality, recommended. \|
	\| [gemma-2-2b-jpn-it-imatrix.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0.gguf) \| Q4_0 \| 1.63GB \| false \| Good quality, recommended for edge device <8GB RAM. \|
	\| [gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf) \| Q4_0_8_8 \| 1.63GB \| false \| Good quality, recommended for edge device <8GB RAM. \|
	\| [gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_8.gguf) \| Q4_0_4_8 \| 1.63GB \| false \| Good quality, recommended for edge device <8GB RAM. \|
	\| [gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it-imatrix.Q4_0_4_4.gguf) \| Q4_0_4_4 \| 1.63GB \| false \| Good quality, recommended for edge device <8GB RAM. \|
	\| [gemma-2-2b-jpn-it.Q4_0.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0.gguf) \| Q4_0 \| 1.63GB \| false \| Poor quality, not recommended. \|
	\| [gemma-2-2b-jpn-it.Q4_0_8_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_8_8.gguf) \| Q4_0_8_8 \| 1.63GB \| false \| Poor quality, not recommended. \|
	\| [gemma-2-2b-jpn-it.Q4_0_4_8.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_8.gguf) \| Q4_0_4_8 \| 1.63GB \| false \| Poor quality, not recommended. \|
	\| [gemma-2-2b-jpn-it.Q4_0_4_4.gguf](https://huggingface.co/ymcki/gemma-2-2b-jpn-it-GGUF/blob/main/gemma-2-2b-jpn-it.Q4_0_4_4.gguf) \| Q4_0_4_4 \| 1.63GB \| false \| Poor quality, not recommended. \|

	## How to check i8mm and sve support for ARM devices

	ARM i8mm support is necessary to take advantage of Q4_0_4_8 gguf. All ARM architecure >= ARMv8.6-A supports i8mm.

	ARM sve support is necessary to take advantage of Q4_0_8_8 gguf. sve is an optional feature that starts from ARMv8.2-A but majority of ARM chips doesn't implement it.

	For ARM devices without both, it is recommended to use Q4_0_4_4.

	For Apple devices,

	```
	sysctl hw
	```

	For ARM devices (ie most Android devices),
	```
	cat /proc/cpuinfo
	```

	There are also android apps that can display /proc/cpuinfo.

	## Which Q4_0 model to use for ARM devices
	\| Brand \| Series \| Model \| i8mm \| sve \| Quant Type \|
	\| ----- \| ------ \| ----- \| ---- \| --- \| -----------\|
	\| Qualcomm ｜Snapdragon \| >= 7 Gen 1 \| Yes \| Yes \| Q4_0_8_8 \|
	\| Qualcomm ｜Snapdragon \| others \| No \| No \| Q4_0_4_4 \|
	\| Apple \| M \| M1 \| No \| No \| Q4_0_4_4 \|
	\| Apple \| M \| M2/M3/M4 \| Yes \| No \| Q4_0_4_8 \|
	\| Apple \| A \| A4 to A14 \| No \| No \| Q4_0_4_4 \|
	\| Apple \| A \| A15 to A18 \| Yes \| No \| Q4_0_4_8 \|

	## Convert safetensors to f16 gguf

	Make sure you have llama.cpp git cloned:

	```
	python3 convert_hf_to_gguf.py gemma-2-2b-jpn-it/ --outfile gemma-2-2b-jpn-it.f16.gguf --outtype f16
	```

	## Convert f16 gguf to Q8_0 gguf without imatrix
	Make sure you have llama.cpp compiled:
	```
	./llama-quantize gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it.Q8_0.gguf q8_0
	```

	## Convert f16 gguf to other gguf with imatrix

	First, prepare imatrix from f16 gguf and c4_en_ja_imatrix.txt

	```
	./llama-imatrix -m gemma-2-2b-jpn-it.f16.gguf -f c4_en_ja_imatrix.txt -o gemma-2-2b-jpn-it.imatrix --chunks 32
	```

	Then, convert f16 gguf with imatrix to create imatrix gguf

	```
	./llama-quantize --imatrix gemma-2-2b-jpn-it.imatrix gemma-2-2b-jpn-it.f16.gguf gemma-2-2b-jpn-it-imatrix.Q4_0_8_8.gguf q4_0_8_8
	```

	## Downloading using huggingface-cli

	First, make sure you have hugginface-cli installed:

	```
	pip install -U "huggingface_hub[cli]"
	```

	Then, you can target the specific file you want:

	```
	huggingface-cli download ymcki/gemma-2-2b-jpn-it-GGUF --include "gemma-2-2b-jpn-it-Q8_0.gguf" --local-dir ./
	```

	## Credits

	Thank you bartowski for providing a README.md to get me started.

	Thank you YoutechA320U for the ELYZA-tasks-100 auto evaluation tool.