Update README.md
README.md CHANGED
@@ -7,13 +7,23 @@ Miqu 1 70b : a leak of Mistral Medium Alpha. Credit for this model goes to the M
 
 ---
 
-Requantizations of a Q5_K_M quant of a trending 70b model without better quant/fp16 available, this through a Q8_0 intermediary step.
+Requantizations with iMatrix (better quality than without) of a Q5_K_M quant of a trending 70b model without better quant/fp16 available, done through a Q8_0 intermediary step.
 
 Miqudev provided Q5_K_M, Q4_K_M, and Q2_K on this page : https://huggingface.co/miqudev/miqu-1-70b
 
-Here, you will find :
-
-
+Here, you will find the following quants :
+
+Full offload possible on 48GB VRAM with a huge context size :
+- Q4_K_S
+- Lower quality : Q3_K_L
+
+Full offload possible on 36GB VRAM with a variable context size (up to 7168 with Q3_K_M, for example) :
+- Q3_K_M, Q3_K_S, Q3_K_XS, IQ3_XXS SOTA (which is equivalent to a Q3_K_S with more context!)
+- Lower quality : Q2_K_S
+
+Full offload possible on 24GB VRAM with a decent context size :
+- IQ2_XS SOTA
+- Lower quality : IQ2_XXS SOTA
 
 ---
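For readers who want to reproduce a pipeline like the one the new description names (Q5_K_M upcast to Q8_0, then iMatrix-guided requantization), here is a minimal sketch driving llama.cpp's `quantize` and `imatrix` tools from Python. It is an assumption-laden illustration: the binary paths, file names, calibration text, and flag usage are mine, not the exact commands used to produce these files.

```python
import subprocess

# Sketch only: binary names, file names, calibration text, and flags are
# assumptions, not the exact commands used to produce these quants.
QUANTIZE = "./quantize"          # llama.cpp quantization tool
IMATRIX = "./imatrix"            # llama.cpp importance-matrix tool
SRC = "miqu-1-70b.q5_K_M.gguf"   # the Q5_K_M provided by miqudev
Q8 = "miqu-1-70b.q8_0.gguf"      # near-lossless intermediary step
TARGETS = ["Q4_K_S", "Q3_K_L", "Q3_K_M", "Q3_K_S", "Q3_K_XS",
           "IQ3_XXS", "Q2_K_S", "IQ2_XS", "IQ2_XXS"]

def run(args):
    print(">", " ".join(args))
    subprocess.run(args, check=True)

# 1. Upcast the Q5_K_M source to Q8_0 so every target starts from the same
#    high-precision base (--allow-requantize permits quant-to-quant).
run([QUANTIZE, "--allow-requantize", SRC, Q8, "Q8_0"])

# 2. Compute an importance matrix for the model over some calibration text.
run([IMATRIX, "-m", Q8, "-f", "calibration.txt", "-o", "imatrix.dat"])

# 3. Requantize the Q8_0 down to each target, guided by the iMatrix.
for target in TARGETS:
    run([QUANTIZE, "--allow-requantize", "--imatrix", "imatrix.dat",
         Q8, f"miqu-1-70b.{target}.gguf", target])
```

The Q8_0 detour exists because no fp16 of this model is available: starting every target from one near-lossless base presumably preserves as much as possible of what the Q5_K_M still carries.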
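As a back-of-the-envelope check on the VRAM groupings above, the sketch below estimates each quant's weight footprint from bits-per-weight figures. Only the 2.06 / 2.31 / 3.06 bpw for the IQ2/IQ3 SOTA quants are llama.cpp's advertised values; the k-quant figures and the ~2 GB headroom are rough assumptions of mine.

```python
# Back-of-the-envelope check of the VRAM groupings above, for a ~70b model.
# bpw values are approximate: the IQ2/IQ3 figures are llama.cpp's advertised
# 2.06 / 2.31 / 3.06 bpw for the SOTA quants; the k-quant mixes and the
# ~2 GB headroom for compute buffers/OS are rough assumptions.
PARAMS_B = 70  # parameters, in billions

BPW = {
    "Q4_K_S": 4.53, "Q3_K_L": 4.19, "Q3_K_M": 3.85, "Q3_K_S": 3.47,
    "Q3_K_XS": 3.30, "IQ3_XXS": 3.06, "Q2_K_S": 2.60,
    "IQ2_XS": 2.31, "IQ2_XXS": 2.06,
}

for name, bpw in sorted(BPW.items(), key=lambda kv: -kv[1]):
    weights_gb = PARAMS_B * bpw / 8  # 1e9 params * bpw bits / 8 -> GB
    fits = [vram for vram in (24, 36, 48) if weights_gb < vram - 2]
    print(f"{name:8s} ~{weights_gb:4.1f} GB of weights, "
          f"full offload plausible on {fits or 'none'} GB cards")
```

Whatever VRAM remains after the weights is what bounds the KV cache, which is why the 36GB group only reaches around 7168 context with Q3_K_M while the 48GB group has room for a huge context.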