Update README.md
README.md
CHANGED
@@ -28,6 +28,20 @@ I have the following Vicuna 1.1 repositories available:
 * [GPTQ quantized 4bit 7B 1.1 for GPU - `safetensors` and `pt` formats](https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g)
 * [GPTQ quantized 4bit 7B 1.1 for CPU - GGML format for `llama.cpp`](https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g-GGML)
 
+## How to easily download and use this model in text-generation-webui
+
+Load text-generation-webui as you normally do.
+
+1. Click the **Model tab**.
+2. Under **Download custom model or LoRA**, enter this repo name: `TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g`.
+3. Click **Download**.
+4. Wait until it says it's finished downloading.
+5. As this is a GPTQ model, fill in the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = 128`, `model_type = Llama`.
+6. Now click the **Refresh** icon next to **Model** in the top left.
+7. In the **Model drop-down**, choose this model: `vicuna-13B-1.1-GPTQ-4bit-128g`.
+8. Click **Reload the Model** in the top right.
+9. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
+
 ## GIBBERISH OUTPUT
 
 If you get gibberish output, it is because you are using the `safetensors` file without updating GPTQ-for-LLaMa.
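For reference, the download in steps 2-4 can also be done outside the UI. A minimal Python sketch, assuming the `huggingface_hub` package is installed (only the repo name comes from the steps above; the rest is illustrative):

```python
# Minimal sketch: fetch all files from this repo, much as the webui's
# "Download" button does. Assumes `huggingface_hub` is installed.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g")
print(f"Model files are in: {local_dir}")
```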
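Since the fix for the gibberish case is updating GPTQ-for-LLaMa, here is a rough sketch of that update. The branch layout and build step are assumptions; check the [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) README for the current instructions:

```python
# Rough sketch of updating GPTQ-for-LLaMa. The build step (setup_cuda.py)
# is an assumption; verify against that repo's README before running.
import subprocess

subprocess.run(["git", "clone",
                "https://github.com/qwopqwop200/GPTQ-for-LLaMa"], check=True)
subprocess.run(["python", "setup_cuda.py", "install"],
               cwd="GPTQ-for-LLaMa", check=True)
```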
@@ -43,17 +57,18 @@ Either way, please read the instructions below carefully.
 Two model files are provided. Ideally use the `safetensors` file. Full details below:
 
 Details of the files provided:
-* `vicuna-13B-1.1-GPTQ-4bit-128g.safetensors`
-  * `safetensors` format, with improved file security, created with the latest [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) code.
-  * Command to create:
-    * `python3 llama.py vicuna-13B-1.1-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors vicuna-13B-1.1-GPTQ-4bit-128g.safetensors`
-* `vicuna-13B-1.1-GPTQ-4bit-128g.no-act-order.pt`
+* `vicuna-13B-1.1-GPTQ-4bit-128g.compat.no-act-order.pt`
   * `pt` format file, created without the `--act-order` flag.
   * This file may have slightly lower quality, but is included as it can be used without needing to compile the latest GPTQ-for-LLaMa code.
-  * It
+  * It will therefore work with one-click-installers on Windows, which include the older GPTQ-for-LLaMa code.
   * Command to create:
     * `python3 llama.py vicuna-13B-1.1-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors vicuna-13B-1.1-GPTQ-4bit-128g.no-act-order.pt`
 
+* `vicuna-13B-1.1-GPTQ-4bit-128g.latest.safetensors`
+  * `safetensors` format, with improved file security, created with the latest [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) code.
+  * Command to create:
+    * `python3 llama.py vicuna-13B-1.1-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors vicuna-13B-1.1-GPTQ-4bit-128g.safetensors`
+
 ## How to run in `text-generation-webui`
 
 File `vicuna-13B-1.1-GPTQ-4bit-128g.no-act-order.pt` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
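On the "improved file security" point in the file details above: a `.pt` checkpoint is a Python pickle, while a `safetensors` file is a plain tensor container. A minimal sketch of the difference, assuming `torch` and `safetensors` are installed and using the file names from the list:

```python
# Minimal sketch: why `safetensors` is the safer format to load.
import torch
from safetensors.torch import load_file

# A .pt checkpoint is a pickle; torch.load can execute arbitrary code
# embedded in the file, so only load .pt files from sources you trust.
pt_weights = torch.load("vicuna-13B-1.1-GPTQ-4bit-128g.compat.no-act-order.pt")

# A .safetensors file holds raw tensor data only; load_file just reads
# tensors and cannot run code, hence the improved file security.
st_weights = load_file("vicuna-13B-1.1-GPTQ-4bit-128g.latest.safetensors")
```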
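The GPTQ parameters from step 5 can also be passed on the command line when starting text-generation-webui. A sketch, treating the exact flag names as an assumption to verify with `python server.py --help`:

```python
# Sketch: start text-generation-webui with the GPTQ parameters from step 5.
# Flag names are an assumption; verify with `python server.py --help`.
import subprocess

subprocess.run([
    "python", "server.py",
    "--model", "vicuna-13B-1.1-GPTQ-4bit-128g",
    "--wbits", "4",           # Bits = 4
    "--groupsize", "128",     # Groupsize = 128
    "--model_type", "llama",  # model_type = Llama
], check=True)
```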