---
license: apache-2.0
---
This repository contains alternative [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) quantized models in GGUF format for use with `llama.cpp`.
The models are fully compatible with the official `llama.cpp` release and can be used out of the box.

I'm careful to say "alternative" rather than "better" or "improved", as I have not put any effort into evaluating performance
differences in actual usage. Perplexity is lower compared to the "official" `llama.cpp` quantization, but perplexity is not
necessarily a good measure of real-world performance. Nevertheless, perplexity does measure quantization error, so the table below
compares the perplexities of these quantized models to the current `llama.cpp` quantization approach on Wikitext with a context length of 512 tokens.
The "Quantization Error" columns in the table are defined as `(PPL(quantized model) - PPL(fp16))/PPL(fp16)`.
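
As a quick sanity check, the error formula can be evaluated directly. Note that the fp16 perplexity is not stated in this README; the value of roughly 4.409 used below is inferred from the table and should be treated as an approximation, not an official number:

```python
def quantization_error(ppl_quantized: float, ppl_fp16: float) -> float:
    """Relative perplexity increase caused by quantization:
    (PPL(quantized model) - PPL(fp16)) / PPL(fp16)."""
    return (ppl_quantized - ppl_fp16) / ppl_fp16

# Approximate fp16 perplexity, inferred from the table (assumption).
PPL_FP16 = 4.409

# Q4_K_M with the new quants has PPL 4.4568:
print(f"{quantization_error(4.4568, PPL_FP16):.2%}")  # → 1.08%
```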

| Quantization | Model file | PPL (llama.cpp) | Quantization Error (llama.cpp) | PPL (new quants) | Quantization Error (new quants) |
|--:|--:|--:|--:|--:|--:|
| Q2_K | mixtral-instruct-8x7b-q2k.gguf | 6.8953 | 56.4% | 5.2679 | 19.5% |
| Q3_K_S | mixtral-instruct-8x7b-q3k-small.gguf | 4.7038 | 6.68% | 4.6401 | 5.24% |
| Q3_K_M | mixtral-instruct-8x7b-q3k-medium.gguf | 4.6663 | 5.83% | 4.5608 | 3.44% |
| Q4_K_S | mixtral-instruct-8x7b-q4k-small.gguf | 4.5105 | 2.30% | 4.4630 | 1.22% |
| Q4_K_M | mixtral-instruct-8x7b-q4k-medium.gguf | 4.5105 | 2.30% | 4.4568 | 1.08% |
| Q5_K_S | mixtral-instruct-8x7b-q5k-small.gguf | 4.4402 | 0.71% | 4.4277 | 0.42% |
| Q4_0 | mixtral-instruct-8x7b-q40.gguf | 4.5102 | 2.29% | 4.4908 | 1.85% |
| Q4_1 | mixtral-instruct-8x7b-q41.gguf | 4.5415 | 3.00% | 4.4612 | 1.18% |
| Q5_0 | mixtral-instruct-8x7b-q50.gguf | 4.4361 | 0.61% | 4.4297 | 0.47% |
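
Since the files are standard GGUF, usage is the same as for any other `llama.cpp` model. A minimal sketch, assuming you have built `llama.cpp` from source and downloaded one of the model files into the build directory (tool names and flags are those of the `llama.cpp` release current at the time of writing):

```shell
# Reproduce the perplexity measurement above with llama.cpp's `perplexity`
# tool on Wikitext at a context length of 512 tokens
# (assumes wikitext-2-raw has been downloaded separately).
./perplexity -m mixtral-instruct-8x7b-q4k-medium.gguf \
    -f wikitext-2-raw/wiki.test.raw -c 512

# Or run interactive generation with the `main` binary; no patches needed.
./main -m mixtral-instruct-8x7b-q4k-medium.gguf -c 512 \
    -p "[INST] Explain GGUF quantization in one sentence. [/INST]"
```

The `[INST] ... [/INST]` wrapper is the prompt format Mixtral-Instruct was trained with.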