---
license: llama3.2
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
base_model:
- meta-llama/Llama-3.2-1B-Instruct
pipeline_tag: text-generation
tags:
- gptqmodel
- modelcloud
- llama3.2
- instruct
- int4
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/641c13e7999935676ec7bc03/kG3NGuEqxDeXonQx7I-YU.png)

This model has been quantized using [GPTQModel](https://github.com/ModelCloud/GPTQModel).

- **bits**: 4
- **dynamic**: null
- **group_size**: 32
- **desc_act**: true
- **static_groups**: false
- **sym**: true
- **lm_head**: false
- **true_sequential**: true
- **quant_method**: "gptq"
- **checkpoint_format**: "gptq"
- **meta**:
  - **quantizer**: gptqmodel:1.1.0
  - **uri**: https://github.com/modelcloud/gptqmodel
  - **damp_percent**: 0.1
  - **damp_auto_increment**: 0.0015
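
The parameters above map onto GPTQModel's `QuantizeConfig`. The sketch below is a minimal, hypothetical reconstruction of the quantization step, not the exact script used for this release: the calibration texts are placeholders, and the field names assume the gptqmodel 1.x API.

```python
# Hypothetical reproduction sketch, not the exact release script; assumes
# the gptqmodel 1.x quantization API. Calibration texts are placeholders.
from transformers import AutoTokenizer
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

quant_config = QuantizeConfig(
    bits=4,            # 4-bit integer weights
    group_size=32,     # one scale/zero-point per group of 32 weights
    desc_act=True,     # activation-order ("act-order") quantization
    sym=True,          # symmetric quantization
    damp_percent=0.1,  # Hessian dampening factor
)

# A real run would use a few hundred representative samples.
calibration_dataset = [
    tokenizer("GPTQ calibrates each layer on sample text like this."),
    tokenizer("Use text that resembles the model's target domain."),
]

model = GPTQModel.from_pretrained(model_id, quant_config)
model.quantize(calibration_dataset)
model.save_quantized("Llama-3.2-1B-Instruct-gptqmodel-4bit")
```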

## Example:
```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel

model_name = "ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortext-v2.5"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = GPTQModel.from_quantized(model_name)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(input_ids=input_tensor.to(model.device), max_new_tokens=512)
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)

print(result)
```
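
Since the checkpoint is stored in the standard `gptq` format, it should also load through plain `transformers`. The snippet below is a sketch that assumes `optimum` plus a GPTQ kernel backend (such as `gptqmodel`) are installed, in which case `from_pretrained` picks up the checkpoint's quantization config automatically.

```python
# Alternative loading path via transformers (assumes `optimum` and a GPTQ
# kernel backend are installed; transformers reads the checkpoint's
# quantization_config on its own).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortext-v2.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
```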