InferenceIllusionist committed on
Commit
1f7461f
1 Parent(s): 84d94cd

Update README.md

Files changed (1): README.md +1 -0
README.md CHANGED
@@ -12,6 +12,7 @@ Quantized from fp16.
  * Weighted quantizations were created using fp16 GGUF and [groups_merged-enhancedV2-TurboMini.txt](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-9432658) in 234 chunks and n_ctx=512
  * This method of calculating the importance matrix showed improvements in some areas for Mistral 7b and Llama3 8b models; see the post above for details
  * The enhancedv2-turbomini file appends snippets from turboderp's calibration data to the standard groups_merged.txt file
+ * Repetition penalty 1.05-1.15 has worked well for these quants.

  For a brief rundown of iMatrix quant performance please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747)
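The workflow described above can be sketched with llama.cpp's tooling. This is a minimal, illustrative sequence, not the author's exact commands: the model file names are placeholders, and the binary names (`llama-imatrix`, `llama-quantize`, `llama-cli`) follow recent llama.cpp builds and may appear as `imatrix`, `quantize`, and `main` in older checkouts.

```shell
# Sketch of imatrix-based weighted quantization (file names are hypothetical).

# 1. Compute the importance matrix from the fp16 GGUF using the
#    enhancedV2-TurboMini calibration file, with n_ctx=512 over 234 chunks.
./llama-imatrix -m model-fp16.gguf \
    -f groups_merged-enhancedV2-TurboMini.txt \
    -c 512 --chunks 234 \
    -o model.imatrix

# 2. Produce a weighted quant guided by the matrix (IQ4_XS shown as one example type).
./llama-quantize --imatrix model.imatrix model-fp16.gguf model-IQ4_XS.gguf IQ4_XS

# 3. At inference time, apply a repetition penalty in the suggested 1.05-1.15 range.
./llama-cli -m model-IQ4_XS.gguf --repeat-penalty 1.1 -p "Hello"
```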