Update README.md
README.md CHANGED
@@ -45,18 +45,18 @@ Now that we have ExLlama, that is the recommended loader to use for these models
 
 Reminder: ExLlama does not support 3-bit models, so if you wish to try those quants, you will need to use AutoGPTQ or GPTQ-for-LLaMa.
 
-
 ## AutoGPTQ and GPTQ-for-LLaMa require the latest version of Transformers
 
-If you plan to use any of these quants with AutoGPTQ or GPTQ-for-LLaMa,
+If you plan to use any of these quants with AutoGPTQ or GPTQ-for-LLaMa, your Transformers installation needs to be using the latest GitHub code.
+
+If you're using text-generation-webui and have updated to the latest version, this is done for you automatically.
+
+If not, you can update it manually with:
 
 ```
 pip3 install git+https://github.com/huggingface/transformers
 ```
 
-If using a UI like text-generation-webui, make sure to do this in the Python environment of text-generation-webui.
-
-
 ## Repositories available
 
 * [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ)
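
To confirm the update took effect, a quick sanity check is to print the version Transformers reports in the relevant Python environment (run it inside text-generation-webui's environment if that's what you use). The `.dev0` suffix mentioned in the comment is a typical convention for source builds, not a guarantee:

```
# Print the active Transformers version. Builds installed from GitHub
# source usually carry a ".dev0" suffix (e.g. "4.32.0.dev0"), whereas
# PyPI releases report a plain version such as "4.31.0".
import transformers

print(transformers.__version__)
```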
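
Since ExLlama cannot load the 3-bit quants, here is a minimal sketch of loading the GPTQ repository linked above through AutoGPTQ instead. The parameter values and prompt are illustrative assumptions, and a 70B model will also need substantial VRAM or a multi-GPU setup not shown here:

```
# Illustrative sketch only: load a GPTQ quant with AutoGPTQ rather than
# ExLlama (required for 3-bit quants, which ExLlama does not support).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name = "TheBloke/Llama-2-70B-chat-GPTQ"  # repo from the list above

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

# use_safetensors matches how these repos are typically packaged;
# adjust the device and your chosen quant branch for your hardware.
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    device="cuda:0",
    use_safetensors=True,
)

prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```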