Nexesenex/TeeZee_Kyllene-Yi-34B-v1.1-iMat.GGUF

Nexesenex committed on Mar 13

Commit 8daf3b7 • 1 Parent(s): 3de5855

Update README.md

Files changed (1)

README.md +4 -1

@@ -30,7 +30,10 @@ Full offload possible on 24GB VRAM with a big to huge context size (from 12288 w
 Full offload possible on 16GB VRAM with a decent context size : IQ3_XXS SOTA (which is equivalent to a Q3_K_S with more context!), Q2_K, Q2_K_S
-Full offload possible on 12GB VRAM with a decent context size : IQ2_XS SOTA. lower quality : IQ2_XXS SOTA, IQ1_S (prefer v2 or v3)
+Full offload possible on 12GB VRAM with a decent context size : IQ2_XS SOTA. lower quality : IQ2_XXS SOTA
+Full offload maybe possible on 8GB VRAM with a small context size : IQ1_S revision "even better" (b2404).
+All my IQ1_S quant from the 13/03/2024 will be with this new IQ1_S quantization base.
 ---