## About

GGUF imatrix quants of the AlexBefest/WoonaV1.2-9b model. All quants except Q6_K and Q8_0 were made with the imatrix quantization method.
## Prompt template: Gemma (recommended temperature 0.3–0.5)

```
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
```
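For illustration, a minimal Python helper (hypothetical, not part of this repo) that wraps a single user message in the template above, so the model generates its reply after the model turn marker:

```python
def format_gemma_prompt(user_message: str) -> str:
    # Gemma chat template: one user turn, then an open model turn
    # that the model completes.
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
```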
## Provided files
| Name | Quant method | Bits | Size | Min RAM required | Use case |
|---|---|---|---|---|---|
| WoonaV1.2-9b-imat-Q2_K.gguf | Q2_K [imatrix] | 2 | 3.5 GB | 5.1 GB | small, very high quality loss - not recommended, but usable (probably faster than IQ3_XXS, but worse) |
| WoonaV1.2-9b-imat-IQ3_XXS.gguf | IQ3_XXS [imatrix] | 3 | 3.5 GB | 5.1 GB | small, high quality loss |
| WoonaV1.2-9b-imat-IQ3_M.gguf | IQ3_M [imatrix] | 3 | 4.2 GB | 5.7 GB | small, high quality loss |
| WoonaV1.2-9b-imat-IQ4_XS.gguf | IQ4_XS [imatrix] | 4 | 4.8 GB | 6.3 GB | medium, slightly worse than Q4_K_M |
| WoonaV1.2-9b-imat-Q4_K_S.gguf | Q4_K_S [imatrix] | 4 | 5.1 GB | 6.7 GB | medium, balanced quality loss |
| WoonaV1.2-9b-imat-Q4_K_M.gguf | Q4_K_M [imatrix] | 4 | 5.4 GB | 6.9 GB | medium, balanced quality - recommended |
| WoonaV1.2-9b-imat-Q5_K_S.gguf | Q5_K_S [imatrix] | 5 | 6.0 GB | 7.6 GB | large, low quality loss - recommended |
| WoonaV1.2-9b-imat-Q5_K_M.gguf | Q5_K_M [imatrix] | 5 | 6.2 GB | 7.8 GB | large, very low quality loss - recommended |
| WoonaV1.2-9b-Q6_K.gguf | Q6_K [static] | 6 | 7.1 GB | 8.7 GB | very large, near-perfect quality - recommended |
| WoonaV1.2-9b-Q8_0.gguf | Q8_0 [static] | 8 | 9.2 GB | 10.8 GB | very large, extremely low quality loss |
## How to Use

- llama.cpp: the open-source framework for running GGUF models, on which the other interfaces below are built (see the Python sketch after this list).
- koboldcpp: an easy option for Windows inference; a lightweight open-source fork of llama.cpp with a simple graphical interface and many additional features.
- LM Studio: a free proprietary application built on llama.cpp with a graphical interface.
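As a concrete starting point, here is a minimal sketch using the llama-cpp-python bindings (an assumption: they are installed via `pip install llama-cpp-python`). The file name refers to the recommended Q4_K_M quant from the table above, and the temperature falls in the recommended 0.3–0.5 range; the example question is arbitrary.

```python
from llama_cpp import Llama

# Load the recommended Q4_K_M quant (path is an assumption: point it
# at wherever you downloaded the .gguf file).
llm = Llama(model_path="WoonaV1.2-9b-imat-Q4_K_M.gguf", n_ctx=4096)

# Wrap the user message in the Gemma prompt template shown above.
prompt = (
    "<start_of_turn>user\n"
    "Who is Princess Luna?<end_of_turn>\n"
    "<start_of_turn>model\n"
)

# Sample within the recommended temperature range and stop at the
# end-of-turn marker so generation does not spill into a new turn.
out = llm(prompt, max_tokens=256, temperature=0.4, stop=["<end_of_turn>"])
print(out["choices"][0]["text"])
```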