---
base_model: FreedomIntelligence/AceGPT-7B-chat
inference: false
license: llama2
model_creator: FreedomIntelligence
model_name: AceGPT 7B chat
model_type: llama2
quantized_by: MohamedRashad
datasets:
- FreedomIntelligence/Arabic-Vicuna-80
- FreedomIntelligence/Arabic-AlpacaEval
- FreedomIntelligence/MMLU_Arabic
- FreedomIntelligence/EXAMs
- FreedomIntelligence/ACVA-Arabic-Cultural-Value-Alignment
language:
- en
- ar
library_name: transformers
---
<center>
<img src="https://www.halalcertificationturkey.com/wp-content/uploads/2020/02/16-1024x714.jpg">
</center>

# AceGPT 7B Chat - AWQ
- Model creator: [FreedomIntelligence](https://huggingface.co/FreedomIntelligence)
- Original model: [AceGPT 7B Chat](https://huggingface.co/FreedomIntelligence/AceGPT-7B-chat)

<!-- description start -->
## Description

This repo contains AWQ model files for [FreedomIntelligence's AceGPT 7B Chat](https://huggingface.co/FreedomIntelligence/AceGPT-7B-chat).

### About AWQ

AWQ is an efficient, accurate and fast low-bit weight quantization method, currently supporting 4-bit quantization. It offers faster Transformers-based inference than GPTQ, with quality equivalent to or better than the most commonly used GPTQ settings.
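To give an intuition for what 4-bit weight quantization means, here is a toy sketch of zero-point group quantization. It mirrors the `zero_point`, `q_group_size` and `w_bit` settings used later in this card, but it is a simplified illustration, not AWQ's actual activation-aware algorithm:

```python
import numpy as np

def quantize_group(w, bits=4):
    """Asymmetric (zero-point) quantization of one weight group to `bits` bits."""
    qmax = 2 ** bits - 1
    scale = (w.max() - w.min()) / qmax          # one scale per group
    zero = round(float(-w.min() / scale))       # one zero point per group
    q = np.clip(np.round(w / scale) + zero, 0, qmax).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    """Reconstruct approximate float weights from the quantized group."""
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=128).astype(np.float32)  # one group of 128 weights
q, scale, zero = quantize_group(w)
max_err = np.abs(dequantize(q, scale, zero) - w).max()   # roughly bounded by the scale
```

AWQ improves on this naive scheme by rescaling the weight channels that matter most for the activations before quantizing, which is where its accuracy advantage comes from.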

It is supported by:

- [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
- [vLLM](https://github.com/vllm-project/vllm) - Llama and Mistral models only
- [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
- [Transformers](https://huggingface.co/docs/transformers) version 4.35.0 and later, from any code or client that supports Transformers
- [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code

<!-- description end -->
46
+
47
+ <!-- prompt-template start -->
48
+ ## Prompt template: Unknown
49
+
50
+ ```
51
+ [INST] <<SYS>>\nأنت مساعد مفيد ومحترم وصادق. أجب دائما بأكبر قدر ممكن من المساعدة بينما تكون آمنا. يجب ألا تتضمن إجاباتك أي محتوى ضار أو غير أخلاقي أو عنصري أو جنسي أو سام أو خطير أو غير قانوني. يرجى التأكد من أن ردودك غير متحيزة اجتماعيا وإيجابية بطبيعتها.\n\nإذا كان السؤال لا معنى له أو لم يكن متماسكا من الناحية الواقعية، اشرح السبب بدلا من الإجابة على شيء غير صحيح. إذا كنت لا تعرف إجابة سؤال ما، فيرجى عدم مشاركة معلومات خاطئة.\n<</SYS>>\n\n
52
+ [INST] {prompt} [/INST]
53
+ ```
54
+ <!-- prompt-template end -->
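The template can be filled in with a small helper. The `build_prompt` function below is illustrative, not part of the original card; it reproduces the template string used in the example code further down, with the full Arabic system message shortened here for readability:

```python
# Hypothetical helper that fills in the prompt template above.
SYSTEM_MESSAGE = "أنت مساعد مفيد ومحترم وصادق."  # shortened; the full system message is shown above

def build_prompt(user_message: str, system: str = SYSTEM_MESSAGE) -> str:
    """Wrap a user message in the [INST]/<<SYS>> template shown above."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n\n[INST] {user_message} [/INST]\n"

prompt = build_prompt("ما هي عاصمة مصر؟")
```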

<!-- README_AWQ.md-use-from-python start -->
## Inference from Python code using Transformers

### Install the necessary packages

- Requires: [Transformers](https://huggingface.co/docs/transformers) 4.35.0 or later.
- Requires: [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) 0.1.6 or later.

```shell
pip3 install --upgrade "autoawq>=0.1.6" "transformers>=4.35.0"
```

Note that if you are using PyTorch 2.0.1, the above AutoAWQ command will automatically upgrade you to PyTorch 2.1.0.

If you are using CUDA 11.8 and wish to continue using PyTorch 2.0.1, instead run this command:

```shell
pip3 install https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.6/autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl
```

If you have problems installing [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) using the pre-built wheels, install it from source instead:

```shell
pip3 uninstall -y autoawq
git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip3 install .
```
84
+
85
+ ### Transformers example code (requires Transformers 4.35.0 and later)
86
+
87
+ ```python
88
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
89
+
90
+ model_name_or_path = "MohamedRashad/AceGPT-7B-chat-AWQ"
91
+
92
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
93
+ model = AutoModelForCausalLM.from_pretrained(
94
+ model_name_or_path,
95
+ low_cpu_mem_usage=True,
96
+ device_map="auto"
97
+ )
98
+
99
+ # Using the text streamer to stream output one token at a time
100
+ streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
101
+
102
+ prompt = "ما أجمل بيت شعر فى اللغة العربية ؟"
103
+ prompt_template=f'''[INST] <<SYS>>\nأنت مساعد مفيد ومحترم وصادق. أجب دائما بأكبر قدر ممكن من المساعدة بينما تكون آمنا. يجب ألا تتضمن إجاباتك أي محتوى ضار أو غير أخلاقي أو عنصري أو جنسي أو سام أو خطير أو غير قانوني. يرجى التأكد من أن ردودك غير متحيزة اجتماعيا وإيجابية بطبيعتها.\n\nإذا كان السؤال لا معنى له أو لم يكن متماسكا من الناحية الواقعية، اشرح السبب بدلا من الإجابة على شيء غير صحيح. إذا كنت لا تع��ف إجابة سؤال ما، فيرجى عدم مشاركة معلومات خاطئة.\n<</SYS>>\n\n
104
+ [INST] {prompt} [/INST]
105
+ '''
106
+
107
+ # Convert prompt to tokens
108
+ tokens = tokenizer(
109
+ prompt_template,
110
+ return_tensors='pt'
111
+ ).input_ids.cuda()
112
+
113
+ generation_params = {
114
+ "do_sample": True,
115
+ "temperature": 0.7,
116
+ "top_p": 0.95,
117
+ "top_k": 40,
118
+ "max_new_tokens": 512,
119
+ "repetition_penalty": 1.1
120
+ }
121
+
122
+ # Generate streamed output, visible one token at a time
123
+ generation_output = model.generate(
124
+ tokens,
125
+ streamer=streamer,
126
+ **generation_params
127
+ )
128
+
129
+ # Generation without a streamer, which will include the prompt in the output
130
+ generation_output = model.generate(
131
+ tokens,
132
+ **generation_params
133
+ )
134
+
135
+ # Get the tokens from the output, decode them, print them
136
+ token_output = generation_output[0]
137
+ text_output = tokenizer.decode(token_output)
138
+ print("model.generate output: ", text_output)
139
+
140
+ # Inference is also possible via Transformers' pipeline
141
+ from transformers import pipeline
142
+
143
+ pipe = pipeline(
144
+ "text-generation",
145
+ model=model,
146
+ tokenizer=tokenizer,
147
+ **generation_params
148
+ )
149
+
150
+ pipe_output = pipe(prompt_template)[0]['generated_text']
151
+ print("pipeline output: ", pipe_output)
152
+
153
+ ```
154
+ <!-- README_AWQ.md-use-from-python end -->


<!-- README_AWQ.md-provided-files start -->
## How the AWQ quantization was done

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "FreedomIntelligence/AceGPT-7B-chat"
quant_path = "AceGPT-7B-chat-AWQ"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
load_config = {
    "low_cpu_mem_usage": True,
    "device_map": "auto",
    "trust_remote_code": True,
}

# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path, **load_config)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(quant_path)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

# Push to hub
model.push_to_hub(quant_path)
tokenizer.push_to_hub(quant_path)
```
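A quick back-of-envelope calculation shows why `w_bit: 4` matters for a 7B-parameter model. These are rough figures that ignore the small overhead of group-wise scales and zero points, as well as embeddings and activation memory:

```python
params = 7e9  # ~7 billion weights

fp16_gb = params * 2 / 1e9    # 16-bit weights: 2 bytes each  -> ~14 GB
awq_gb = params * 0.5 / 1e9   # 4-bit weights: half a byte each -> ~3.5 GB

print(f"fp16: ~{fp16_gb:.1f} GB, AWQ 4-bit: ~{awq_gb:.1f} GB")
```

The roughly 4x reduction in weight memory is what lets the quantized model fit on a single consumer GPU.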

<!-- README_AWQ.md-provided-files end -->