InferenceIllusionist committed 1a6d30c (parent: 12952ce) • Update README.md
Quantized from fp32 with love. If you're using the latest version of llama.cpp you should no longer need to combine files before loading.

* Importance matrix calculated using fp16 precision model
* Calculated in 105 chunks with n_ctx=512 using groups_merged.txt
* See below for imatrix calculation arguments

```
.\llama-imatrix -m .\models\WizardLM-2-8x22b\ggml-model-f16.gguf -f .\imatrix\groups_merged.txt -o .\models\WizardLM-2-8x22b\WizardLM-2-8x22b-f16.imatrix -ngl 14 -t 24
```

For a brief rundown of iMatrix quant performance please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747)
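Once computed, an imatrix file like the one above is typically fed to llama.cpp's quantize tool via its `--imatrix` flag. A minimal sketch of that step, assuming the same local paths as the command above; the IQ2_XS quant type and output filename are illustrative choices, not taken from this card:

```shell
# Quantize the f16 model using the importance matrix produced by llama-imatrix.
# IQ2_XS and the output path are example values (assumption, not from this repo).
.\llama-quantize --imatrix .\models\WizardLM-2-8x22b\WizardLM-2-8x22b-f16.imatrix .\models\WizardLM-2-8x22b\ggml-model-f16.gguf .\models\WizardLM-2-8x22b\ggml-model-IQ2_XS.gguf IQ2_XS
```

Importance-guided quantization matters most for the very low-bit quants (IQ1/IQ2 family), which is where the PR linked below reports the largest perplexity gains.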