datatab committed
Commit 26a3aa0
1 Parent(s): ae39f40

Update README.md

Files changed (1)
  1. README.md +27 -27
README.md CHANGED
@@ -19,30 +19,30 @@ base_model: gordicaleksa/YugoGPT
 
 # Quant. preference
 
-```bash
-"not_quantized" : "Recommended. Fast conversion. Slow inference, big files.",
-"fast_quantized" : "Recommended. Fast conversion. OK inference, OK file size.",
-"quantized" : "Recommended. Slow conversion. Fast inference, small files.",
-"f32" : "Not recommended. Retains 100% accuracy, but super slow and memory hungry.",
-"f16" : "Fastest conversion + retains 100% accuracy. Slow and memory hungry.",
-"q8_0" : "Fast conversion. High resource use, but generally acceptable.",
-"q4_k_m" : "Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K",
-"q5_k_m" : "Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K",
-"q2_k" : "Uses Q4_K for the attention.vw and feed_forward.w2 tensors, Q2_K for the other tensors.",
-"q3_k_l" : "Uses Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K",
-"q3_k_m" : "Uses Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K",
-"q3_k_s" : "Uses Q3_K for all tensors",
-"q4_0" : "Original quant method, 4-bit.",
-"q4_1" : "Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models.",
-"q4_k_s" : "Uses Q4_K for all tensors",
-"q4_k" : "alias for q4_k_m",
-"q5_k" : "alias for q5_k_m",
-"q5_0" : "Higher accuracy, higher resource usage and slower inference.",
-"q5_1" : "Even higher accuracy, resource usage and slower inference.",
-"q5_k_s" : "Uses Q5_K for all tensors",
-"q6_k" : "Uses Q8_K for all tensors",
-"iq2_xxs" : "2.06 bpw quantization",
-"iq2_xs" : "2.31 bpw quantization",
-"iq3_xxs" : "3.06 bpw quantization",
-"q3_k_xs" : "3-bit extra small quantization"
-```
+| Key            | Description                                                                                   |
+|----------------|-----------------------------------------------------------------------------------------------|
+| not_quantized  | Recommended. Fast conversion. Slow inference, big files.                                      |
+| fast_quantized | Recommended. Fast conversion. OK inference, OK file size.                                     |
+| quantized      | Recommended. Slow conversion. Fast inference, small files.                                    |
+| f32            | Not recommended. Retains 100% accuracy, but super slow and memory-hungry.                     |
+| f16            | Fastest conversion; retains 100% accuracy. Slow and memory-hungry.                            |
+| q8_0           | Fast conversion. High resource use, but generally acceptable.                                 |
+| q4_k_m         | Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.   |
+| q5_k_m         | Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.   |
+| q2_k           | Uses Q4_K for the attention.wv and feed_forward.w2 tensors, Q2_K for the other tensors.       |
+| q3_k_l         | Uses Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K.         |
+| q3_k_m         | Uses Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K.         |
+| q3_k_s         | Uses Q3_K for all tensors.                                                                    |
+| q4_0           | Original quant method, 4-bit.                                                                 |
+| q4_1           | Higher accuracy than q4_0 but not as high as q5_0; however, quicker inference than q5 models. |
+| q4_k_s         | Uses Q4_K for all tensors.                                                                    |
+| q4_k           | Alias for q4_k_m.                                                                             |
+| q5_k           | Alias for q5_k_m.                                                                             |
+| q5_0           | Higher accuracy, higher resource usage, and slower inference.                                 |
+| q5_1           | Even higher accuracy and resource usage, and slower inference.                                |
+| q5_k_s         | Uses Q5_K for all tensors.                                                                    |
+| q6_k           | Uses Q8_K for all tensors.                                                                    |
+| iq2_xxs        | 2.06 bpw quantization.                                                                        |
+| iq2_xs         | 2.31 bpw quantization.                                                                        |
+| iq3_xxs        | 3.06 bpw quantization.                                                                        |
+| q3_k_xs        | 3-bit extra-small quantization.                                                               |
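
For reference, most of these keys name llama.cpp quantization types (the first three are convenience presets layered on top of them). Below is a minimal sketch of how such a type would be applied with llama.cpp's conversion script and quantize tool; the model, directory, and file names are illustrative placeholders, and the exact script/binary names have varied across llama.cpp versions.

```bash
# Convert the Hugging Face checkpoint to a full-precision GGUF file first.
# (Directory and output file names here are placeholders.)
python convert-hf-to-gguf.py ./YugoGPT --outtype f16 --outfile yugogpt-f16.gguf

# Re-quantize the f16 GGUF to one of the types from the table, e.g. q4_k_m.
./quantize yugogpt-f16.gguf yugogpt-q4_k_m.gguf q4_k_m
```

Quantizing from the f16 file, rather than re-quantizing an already-quantized file, keeps the accuracy loss limited to the single quantization step chosen from the table.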