InferenceIllusionist committed 1a6d30c (parent: 12952ce) • Update README.md
Quantized from fp32 with love. If you're using the latest version of llama.cpp you should no longer need to combine files before loading.

* Importance matrix calculated using fp16 precision model
* Calculated in 105 chunks with n_ctx=512 using groups_merged.txt
* See below for imatrix calculation arguments

```
.\llama-imatrix -m .\models\WizardLM-2-8x22b\ggml-model-f16.gguf -f .\imatrix\groups_merged.txt -o .\models\WizardLM-2-8x22b\WizardLM-2-8x22b-f16.imatrix -ngl 14 -t 24
```

For a brief rundown of iMatrix quant performance please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747)
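Once computed, an imatrix file like the one above is typically fed to llama.cpp's quantize tool via its `--imatrix` flag. A minimal sketch of that step, assuming the same local paths as the command above; the IQ2_XS quant type and output filename are illustrative choices, not taken from this card:

```shell
# Quantize the f16 model using the importance matrix produced by llama-imatrix.
# IQ2_XS and the output path are example values (assumption, not from this repo).
.\llama-quantize --imatrix .\models\WizardLM-2-8x22b\WizardLM-2-8x22b-f16.imatrix .\models\WizardLM-2-8x22b\ggml-model-f16.gguf .\models\WizardLM-2-8x22b\ggml-model-IQ2_XS.gguf IQ2_XS
```

Importance-guided quantization matters most for the very low-bit quants (IQ1/IQ2 family), which is where the PR linked below reports the largest perplexity gains.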