TheBloke committed on
Commit fae8124
1 Parent(s): b7531c5

Update README.md

Files changed (1): README.md (+14, −138)

README.md CHANGED
@@ -42,7 +42,18 @@ quantized_by: TheBloke
 <!-- description start -->
 # Description

- This repo contains GPTQ model files for [Migel Tissera's Synthia MoE v3 Mixtral 8x7B](https://huggingface.co/migtissera/Synthia-MoE-v3-Mixtral-8x7B).
+ This repo contains **EXPERIMENTAL** GPTQ model files for [Migel Tissera's Synthia MoE v3 Mixtral 8x7B](https://huggingface.co/migtissera/Synthia-MoE-v3-Mixtral-8x7B).
+
+ ## Requires AutoGPTQ PR + Transformers 4.36.0
+
+ These files were made with, and will currently only work with, this AutoGPTQ PR: https://github.com/LaaZa/AutoGPTQ/tree/Mixtral
+
+ To test, please build AutoGPTQ from source using that PR. You also need Transformers version 4.36.0, released December 11th.
+
+ Transformers support has also just arrived, via two PRs, and is expected in main Transformers + Optimum tomorrow (Dec 12th).
+
+ Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
+

 Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.

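A minimal install sketch for the new requirements above, assuming the PR branch is `Mixtral` (taken from the PR URL) and reusing the build-from-source steps this commit removes further down; the exact commands are not part of the commit:

```shell
# Build AutoGPTQ from the LaaZa Mixtral PR branch (branch name assumed from
# the PR URL above), then pin Transformers to the required 4.36.0 release.
pip3 uninstall -y auto-gptq
git clone --branch Mixtral https://github.com/LaaZa/AutoGPTQ
cd AutoGPTQ
pip3 install .
pip3 install transformers==4.36.0
```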
@@ -62,28 +73,12 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
 SYSTEM: Elaborate on the topic using a Tree of Thoughts and backtrack when necessary to construct a clear, cohesive Chain of Thought reasoning. Always answer without hesitation.
 USER: {prompt}
 ASSISTANT:
-
 ```

 <!-- prompt-template end -->



- <!-- README_GPTQ.md-compatible clients start -->
- ## Known compatible clients / servers
-
- GPTQ models are currently supported on Linux (NVidia/AMD) and Windows (NVidia only). macOS users: please use GGUF models.
-
- These GPTQ models are known to work in the following inference servers/webuis.
-
- - [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
- - [KoboldAI United](https://github.com/henk717/koboldai)
- - [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui)
- - [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
-
- This may not be a complete list; if you know of others, please let me know!
- <!-- README_GPTQ.md-compatible clients end -->
-
 <!-- README_GPTQ.md-provided-files start -->
 ## Provided files, and GPTQ parameters

@@ -189,6 +184,8 @@ Note that using Git with HF repos is strongly discouraged. It will be much slowe
 <!-- README_GPTQ.md-text-generation-webui start -->
 ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)

+ **NOTE: This will currently only work if you install Transformers 4.36.0 and the AutoGPTQ PR mentioned in the description.**
+
 Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).

 It is strongly recommended to use the text-generation-webui one-click-installers unless you're sure you know how to make a manual install.
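With those prerequisites in place, loading the model in Python should follow the same path as the example this commit removes below; a minimal sketch, assuming the PR build keeps the standard Transformers loading API:

```python
# Minimal loading sketch: assumes the AutoGPTQ Mixtral PR build and
# Transformers 4.36.0 are installed, as required above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/Synthia-MoE-v3-Mixtral-8x7B-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             revision="main")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
```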
@@ -212,127 +209,6 @@ It is strongly recommended to use the text-generation-webui one-click-installers

 <!-- README_GPTQ.md-text-generation-webui end -->

- <!-- README_GPTQ.md-use-from-tgi start -->
- ## Serving this model from Text Generation Inference (TGI)
-
- It's recommended to use TGI version 1.1.0 or later. The official Docker container is: `ghcr.io/huggingface/text-generation-inference:1.1.0`
-
- Example Docker parameters:
-
- ```shell
- --model-id TheBloke/Synthia-MoE-v3-Mixtral-8x7B-GPTQ --port 3000 --quantize gptq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
- ```
-
- Example Python code for interfacing with TGI (requires huggingface-hub 0.17.0 or later):
-
- ```shell
- pip3 install huggingface-hub
- ```
-
- ```python
- from huggingface_hub import InferenceClient
-
- endpoint_url = "https://your-endpoint-url-here"
-
- prompt = "Tell me about AI"
- prompt_template=f'''SYSTEM: Elaborate on the topic using a Tree of Thoughts and backtrack when necessary to construct a clear, cohesive Chain of Thought reasoning. Always answer without hesitation.
- USER: {prompt}
- ASSISTANT:
- '''
-
- client = InferenceClient(endpoint_url)
- response = client.text_generation(prompt,
-                                   max_new_tokens=128,
-                                   do_sample=True,
-                                   temperature=0.7,
-                                   top_p=0.95,
-                                   top_k=40,
-                                   repetition_penalty=1.1)
-
- print(f"Model output: {response}")
- ```
- <!-- README_GPTQ.md-use-from-tgi end -->
- <!-- README_GPTQ.md-use-from-python start -->
- ## Python code example: inference from this GPTQ model
-
- ### Install the necessary packages
-
- Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.
-
- ```shell
- pip3 install --upgrade transformers optimum
- # If using PyTorch 2.1 + CUDA 12.x:
- pip3 install --upgrade auto-gptq
- # or, if using PyTorch 2.1 + CUDA 11.x:
- pip3 install --upgrade auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
- ```
-
- If you are using PyTorch 2.0, you will need to install AutoGPTQ from source. Likewise if you have problems with the pre-built wheels, you should try building from source:
-
- ```shell
- pip3 uninstall -y auto-gptq
- git clone https://github.com/PanQiWei/AutoGPTQ
- cd AutoGPTQ
- git checkout v0.5.1
- pip3 install .
- ```
-
- ### Example Python code
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
-
- model_name_or_path = "TheBloke/Synthia-MoE-v3-Mixtral-8x7B-GPTQ"
- # To use a different branch, change revision
- # For example: revision="gptq-4bit-128g-actorder_True"
- model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
-                                              device_map="auto",
-                                              trust_remote_code=False,
-                                              revision="main")
-
- tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
-
- prompt = "Tell me about AI"
- prompt_template=f'''SYSTEM: Elaborate on the topic using a Tree of Thoughts and backtrack when necessary to construct a clear, cohesive Chain of Thought reasoning. Always answer without hesitation.
- USER: {prompt}
- ASSISTANT:
- '''
-
- print("\n\n*** Generate:")
-
- input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
- output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
- print(tokenizer.decode(output[0]))
-
- # Inference can also be done using transformers' pipeline
-
- print("*** Pipeline:")
- pipe = pipeline(
-     "text-generation",
-     model=model,
-     tokenizer=tokenizer,
-     max_new_tokens=512,
-     do_sample=True,
-     temperature=0.7,
-     top_p=0.95,
-     top_k=40,
-     repetition_penalty=1.1
- )
-
- print(pipe(prompt_template)[0]['generated_text'])
- ```
- <!-- README_GPTQ.md-use-from-python end -->
-
- <!-- README_GPTQ.md-compatibility start -->
- ## Compatibility
-
- The files provided are tested to work with Transformers. For non-Mistral models, AutoGPTQ can also be used directly.
-
- [ExLlama](https://github.com/turboderp/exllama) is compatible with Llama and Mistral models in 4-bit. Please see the Provided Files table above for per-file compatibility.
-
- For a list of clients/servers, please see "Known compatible clients / servers", above.
- <!-- README_GPTQ.md-compatibility end -->
-
 <!-- footer start -->
 <!-- 200823 -->
 ## Discord