alexmarques committed on
Commit
50ec7e7
1 Parent(s): 101ef04

Create README.md

Files changed (1)
  1. README.md +45 -0
README.md ADDED
 
---
language:
- en
pipeline_tag: text-generation
---

# Meta-Llama-3-70B-Instruct-quantized.w8a16

## Model Overview
- **Model Architecture:** Meta-Llama-3
  - **Input:** Text
  - **Output:** Text
- **Model Optimizations:**
  - **Quantized:** INT8 weights
- **Release Date:** 7/2/2024
- **Version:** 1.0
- **Model Developers:** Neural Magic

Quantized version of [Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct).
It achieves an average score of 77.90% on the OpenLLM benchmark (version 1), whereas the unquantized model achieves 79.18%.

## Model Optimizations

This model was obtained by quantizing the weights of [Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) to the INT8 data type.
Only the weights of the linear operators within transformer blocks are quantized. Symmetric per-channel quantization is applied, in which a linear scaling per output dimension maps the INT8 and floating-point representations of the quantized weights.
[AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ) is used for quantization.
This optimization reduces the number of bits per parameter from 16 to 8, reducing the disk size and GPU memory requirements by approximately 50%.
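
As a rough sketch (not the exact recipe used for this checkpoint), a weight-only INT8 GPTQ run of this kind could be set up with AutoGPTQ as shown below. The `bits=8`, `sym=True`, and per-channel `group_size=-1` settings mirror the description above; the calibration data, `desc_act` flag, and output path are illustrative assumptions.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"
SAVE_DIR = "Meta-Llama-3-70B-Instruct-quantized.w8a16"  # assumed output path

# bits=8, sym=True, and group_size=-1 (one scale per output channel) mirror the
# card's description; desc_act and the calibration set are illustrative choices.
quantize_config = BaseQuantizeConfig(
    bits=8,
    group_size=-1,
    sym=True,
    desc_act=False,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoGPTQForCausalLM.from_pretrained(MODEL_ID, quantize_config)

# Tiny placeholder calibration set; a real run would use a few hundred
# representative samples.
calibration = [
    tokenizer(
        "Quantization reduces the memory footprint of large language models.",
        return_tensors="pt",
    )
]

model.quantize(calibration)     # runs the GPTQ algorithm layer by layer
model.save_quantized(SAVE_DIR)  # stores INT8 weights with per-channel scales
tokenizer.save_pretrained(SAVE_DIR)
```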

## Evaluation

The model was evaluated with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) using the [vLLM](https://docs.vllm.ai/en/stable/) engine.
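
As an illustration, such a run can be launched through the harness's Python API with vLLM as the backend. The repository id, tensor parallelism, and other vLLM arguments below are assumptions rather than the exact configuration used for the reported scores.

```python
from lm_eval import simple_evaluate

# Sketch of a single-task run; each Open LLM Leaderboard task would be evaluated
# with its own few-shot count (arc-c 25, hellaswag 10, mmlu 5, truthfulqa 0,
# winogrande 5, gsm8k 5). Repository id and vLLM arguments are assumptions.
results = simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=neuralmagic/Meta-Llama-3-70B-Instruct-quantized.w8a16,"
        "dtype=auto,tensor_parallel_size=4,gpu_memory_utilization=0.9"
    ),
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size="auto",
)
print(results["results"]["gsm8k"])
```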

## Accuracy

### Open LLM Leaderboard evaluation scores
| Benchmark | [Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | Meta-Llama-3-70B-Instruct-quantized.w8a16<br>(this model) |
| :------------------: | :----------------------: | :------------------------------------------------: |
| arc-c<br>25-shot | 72.44% | 71.59% |
| hellaswag<br>10-shot | 85.54% | 85.65% |
| mmlu<br>5-shot | 80.18% | 78.69% |
| truthfulqa<br>0-shot | 62.92% | 61.94% |
| winogrande<br>5-shot | 83.19% | 83.11% |
| gsm8k<br>5-shot | 90.83% | 86.43% |
| **Average<br>Accuracy** | **79.18%** | **77.90%** |
| **Recovery** | **100%** | **98.38%** |
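
The Average row is the unweighted mean of the six benchmark scores, and Recovery is the quantized average divided by the unquantized average; the short check below reproduces the figures in the table.

```python
# Reproduce the Average and Recovery rows from the per-task scores above.
baseline  = [72.44, 85.54, 80.18, 62.92, 83.19, 90.83]  # Meta-Llama-3-70B-Instruct
quantized = [71.59, 85.65, 78.69, 61.94, 83.11, 86.43]  # this model

avg_baseline  = sum(baseline) / len(baseline)       # ~79.18
avg_quantized = sum(quantized) / len(quantized)     # ~77.90
recovery      = 100 * avg_quantized / avg_baseline  # ~98.38

print(f"{avg_baseline:.2f}% {avg_quantized:.2f}% {recovery:.2f}%")
```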