alexmarques committed
Commit: 82d7ef8
Parent: 41bb75f

Update README.md

Files changed (1): README.md (+3 -3)
README.md CHANGED
@@ -6,7 +6,7 @@ license: apache-2.0
 license_link: https://www.apache.org/licenses/LICENSE-2.0
 ---
 
-# Qwen2-0.5B-Instruct-quantized.w8a8
+# Qwen2-72B-Instruct-quantized.w8a8
 
 ## Model Overview
 - **Model Architecture:** Qwen2
@@ -15,7 +15,7 @@ license_link: https://www.apache.org/licenses/LICENSE-2.0
 - **Model Optimizations:**
   - **Activation quantization:** INT8
   - **Weight quantization:** INT8
-- **Intended Use Cases:** Intended for commercial and research use in English. Similarly to [Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct), this models is intended for assistant-like chat.
+- **Intended Use Cases:** Intended for commercial and research use in English. Similarly to [Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct), this models is intended for assistant-like chat.
 - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
 - **Release Date:** 7/15/2024
 - **Version:** 1.0
@@ -27,7 +27,7 @@ It achieves an average score of 80.32 on the [OpenLLM](https://huggingface.co/sp
 
 ### Model Optimizations
 
-This model was obtained by quantizing the weights of [Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) to INT8 data type.
+This model was obtained by quantizing the weights of [Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct) to INT8 data type.
 This optimization reduces the number of bits used to represent weights and activations from 16 to 8, reducing GPU memory requirements (by approximately 50%) and increasing matrix-multiply compute throughput (by approximately 2x).
 Weight quantization also reduces disk size requirements by approximately 50%.
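
The paragraph touched by the last hunk describes the effect of INT8 weight and activation quantization: weights and activations go from 16-bit to 8-bit storage, roughly halving GPU memory and disk requirements and roughly doubling matrix-multiply throughput. As a purely illustrative aid, the sketch below shows per-tensor symmetric INT8 quantization of a single weight matrix in NumPy and confirms the ~2x storage reduction. It is not the calibration recipe used to produce this checkpoint, and the helper names (`quantize_int8`, `dequantize`) are hypothetical.

```python
# Illustrative only: per-tensor symmetric INT8 quantization of one weight matrix.
# This is NOT the recipe used to build this checkpoint; it just shows why storing
# 8-bit integers instead of 16-bit floats roughly halves memory and disk usage.
import numpy as np


def quantize_int8(w_fp16: np.ndarray) -> tuple[np.ndarray, float]:
    """Map an FP16 tensor to INT8 using a single symmetric scale (hypothetical helper)."""
    scale = float(np.abs(w_fp16).max()) / 127.0   # largest magnitude maps to +/-127
    q = np.clip(np.round(w_fp16 / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate FP16 tensor to gauge quantization error."""
    return (q.astype(np.float32) * scale).astype(np.float16)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.02, size=(4096, 4096)).astype(np.float16)

    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)

    print(f"fp16 storage: {w.nbytes:,} bytes")   # 2 bytes per element
    print(f"int8 storage: {q.nbytes:,} bytes")   # 1 byte per element (~50% smaller)
    err = np.abs(w.astype(np.float32) - w_hat.astype(np.float32)).max()
    print(f"max abs reconstruction error: {err:.6f}")
```

In an actual W8A8 deployment the activations are also quantized to INT8 at runtime so matrix multiplications can execute in INT8 kernels, which is typically where the approximate 2x throughput gain cited in the README comes from; production pipelines usually apply calibrated per-channel or per-token scales rather than the single per-tensor scale used in this sketch.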