# LLaMa 65B 3bit GPTQ
This is a GPTQ-format, 3-bit quantised version of LLaMa 65B, produced with GPTQ-for-LLaMa.
## How to easily download and use this model in text-generation-webui
Open the text-generation-webui UI as normal.
- Click the Model tab.
- Under Download custom model or LoRA, enter `TheBloke/LLaMa-65B-GPTQ-3bit`.
- Click Download.
- Wait until it says it's finished downloading.
- Click the Refresh icon next to Model in the top left.
- In the Model drop-down, choose the model you just downloaded: `LLaMa-65B-GPTQ-3bit`.
- If you see an error in the bottom right, ignore it - it's temporary.
- Fill out the GPTQ parameters on the right: Bits = 3, Groupsize = None, model_type = Llama.
- Click Save settings for this model in the top right.
- Click Reload the Model in the top right.
- Once it says it's loaded, click the Text Generation tab and enter a prompt!
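Alternatively, you can fetch the model files outside the UI. Here is a minimal sketch using the huggingface_hub library; the `local_dir` path is an assumption, so point it at your own text-generation-webui models folder:

```python
# Minimal sketch: download the full repo with huggingface_hub.
# The local_dir path is an assumption - adjust it to your
# text-generation-webui models directory.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/LLaMa-65B-GPTQ-3bit",
    local_dir="text-generation-webui/models/LLaMa-65B-GPTQ-3bit",
)
```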
## Provided files
### Compatible file - LLaMa-65B-GPTQ-3bit.safetensors
This will work with all versions of GPTQ-for-LLaMa, giving it maximum compatibility. It was created with the `--act-order` parameter to maximise inference quality, and with groupsize = None to minimise VRAM requirements.
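As a rough sanity check (an estimate, not a measured figure), the packed 3-bit weights alone for a 65-billion-parameter model come to about 23 GiB; actual VRAM use is higher once activations and the KV cache are included:

```python
# Back-of-the-envelope weight storage for a 3-bit quantised 65B model.
# Actual VRAM use is higher: activations, KV cache, and dequantisation
# buffers add overhead on top of the packed weights.
params = 65e9
bits_per_weight = 3
weight_bytes = params * bits_per_weight / 8
print(f"{weight_bytes / 1024**3:.1f} GiB")  # ~22.7 GiB
```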
`LLaMa-65B-GPTQ-3bit.safetensors`
- Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches.
- Works with AutoGPTQ (see the loading sketch after this list).
- Works with text-generation-webui one-click-installers.
- Parameters: Groupsize = None. Uses --act-order.
- Command used to create the GPTQ (wikitext2 is the calibration dataset; --true-sequential and --act-order improve quantisation accuracy):

```
python llama.py /workspace/models/huggyllama_llama-65b wikitext2 --wbits 3 --true-sequential --act-order --save_safetensors /workspace/llama-3bit/LLaMa-65B-GPTQ-3bit.safetensors
```
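As referenced in the list above, the file can also be loaded programmatically. Here is a minimal sketch with AutoGPTQ, assuming it is installed and the model fits in VRAM; the `model_basename`, device string, and explicit `BaseQuantizeConfig` (mirroring Bits = 3, Groupsize = None, act-order above) are assumptions, not repo-provided configuration:

```python
# Minimal sketch: load the quantised model with AutoGPTQ.
# model_basename and device are assumptions; bits/group_size/desc_act
# mirror the parameters listed above.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

repo = "TheBloke/LLaMa-65B-GPTQ-3bit"
tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    repo,
    model_basename="LLaMa-65B-GPTQ-3bit",
    use_safetensors=True,
    quantize_config=BaseQuantizeConfig(bits=3, group_size=-1, desc_act=True),
    device="cuda:0",
)

prompt = "Explain quantisation in one sentence:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```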