Commit 82d7ef8 (parent: 41bb75f) by alexmarques: Update README.md

README.md CHANGED
@@ -6,7 +6,7 @@ license: apache-2.0
 license_link: https://www.apache.org/licenses/LICENSE-2.0
 ---
 
-# Qwen2-
+# Qwen2-72B-Instruct-quantized.w8a8
 
 ## Model Overview
 - **Model Architecture:** Qwen2
@@ -15,7 +15,7 @@ license_link: https://www.apache.org/licenses/LICENSE-2.0
 - **Model Optimizations:**
   - **Activation quantization:** INT8
   - **Weight quantization:** INT8
-- **Intended Use Cases:** Intended for commercial and research use in English. Similarly to [Qwen2-
+- **Intended Use Cases:** Intended for commercial and research use in English. Similarly to [Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct), this model is intended for assistant-like chat.
 - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
 - **Release Date:** 7/15/2024
 - **Version:** 1.0
@@ -27,7 +27,7 @@ It achieves an average score of 80.32 on the [OpenLLM](https://huggingface.co/sp
 
 ### Model Optimizations
 
-This model was obtained by quantizing the weights of [Qwen2-
+This model was obtained by quantizing the weights of [Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct) to the INT8 data type.
 This optimization reduces the number of bits used to represent weights and activations from 16 to 8, reducing GPU memory requirements (by approximately 50%) and increasing matrix-multiply compute throughput (by approximately 2x).
 Weight quantization also reduces disk size requirements by approximately 50%.
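The W8A8 idea the README describes (16-bit values mapped to 8-bit integers plus a scale, halving storage) can be sketched as below. This is an illustrative toy using symmetric per-tensor scaling with NumPy, not the actual recipe used to produce this checkpoint (real pipelines typically use per-channel scales and calibration data):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization (illustrative sketch only).

    Returns the INT8 tensor and the float scale needed to dequantize.
    """
    scale = np.abs(weights).max() / 127.0  # largest magnitude maps to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from INT8 values and the scale."""
    return q.astype(np.float32) * scale

# Hypothetical layer-sized weight matrix in FP16.
w = np.random.randn(4096, 4096).astype(np.float16)
q, scale = quantize_int8(w.astype(np.float32))

# INT8 storage uses 1 byte per value vs 2 for FP16: ~50% less memory.
print(w.nbytes // q.nbytes)  # 2
```

Each value now costs one byte instead of two, which is where the approximately 50% GPU memory and disk savings quoted above come from; the quantization error per value is bounded by half the scale.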