tsumeone/wizard-vicuna-13b-4bit-128g-cuda


tsumeone committed on May 4, 2023

Commit 11ab1d0 • 1 Parent(s): 51d9bc6

Update README.md

Files changed (1)

README.md +2 -0

README.md CHANGED

@@ -6,4 +6,6 @@ Quant of https://huggingface.co/junelee/wizard-vicuna-13b tested working with Oc
 Someone made a Triton quant already here, but it will not work with Occam's KoboldAI/GPTQ fork: https://huggingface.co/fbjr/wizard-vicuna-13b-4bit-128g
+Note that this model is fairly heavily censored (in my opinion) and delivers AI-moralizing responses to prompts that Vicuna 1.1 does not complain about.
 ```python llama.py ./wizard-vicuna-13b c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors 4bit-128g.safetensors```
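
For readers unfamiliar with the flags in the command above: `--wbits 4` and `--groupsize 128` request 4-bit weights, with separate quantization parameters for each group of 128 weights. The sketch below illustrates only that grouping idea, using plain round-to-nearest quantization. It is not the GPTQ algorithm that `llama.py` runs, and every function name in it is hypothetical.

```python
import random

# Illustrative sketch only: round-to-nearest group-wise quantization.
# This is NOT the GPTQ procedure; all names here are made up for the example.

def quantize_group(weights, wbits=4):
    """Map one group of floats to integer codes in [0, 2**wbits - 1]."""
    qmax = 2 ** wbits - 1                  # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0        # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize_group(codes, scale, lo):
    """Recover approximate floats from the codes and the group's parameters."""
    return [c * scale + lo for c in codes]

def quantize_row(row, wbits=4, groupsize=128):
    """Split a weight row into groups; each group gets its own scale and offset."""
    return [
        quantize_group(row[start:start + groupsize], wbits)
        for start in range(0, len(row), groupsize)
    ]

if __name__ == "__main__":
    random.seed(0)
    row = [random.gauss(0.0, 0.02) for _ in range(512)]
    groups = quantize_row(row)
    restored = [w for g in groups for w in dequantize_group(*g)]
    max_err = max(abs(a - b) for a, b in zip(row, restored))
    print(f"{len(groups)} groups, max abs error {max_err:.6f}")
```

A smaller group size means more scale/offset pairs to store but a tighter fit per group, which is the trade-off the `128g` in the file name refers to.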