Commit d881779
Parent(s): c8da55e
Update README.md

README.md CHANGED
@@ -75,9 +75,13 @@ We benchmarked the Llama 2 7B and 13B with 4-bit quantization on NVIDIA GeForce
 | Llama 2 13B | N/A | 90.7 | 115.8 |
 
 ```shell
-
-
-
+pip install nvidia-ml-py
+```
+
+```bash
+python profile_generation.py \
+ --model-path /path/to/your/model \
+ --concurrency 1 8 --prompt-tokens 0 512 --completion-tokens 2048 512
 ```
 
 ## 4-bit Weight Quantization
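The added snippet installs nvidia-ml-py before running profile_generation.py, presumably so the benchmark can sample GPU memory through NVML while generation runs. Below is a minimal sketch of that kind of measurement; the helper name `gpu_memory_used_mib` and the sampling point are illustrative assumptions, not code taken from profile_generation.py.

```python
# Sketch only (assumption): reading GPU memory via nvidia-ml-py (NVML),
# the kind of measurement a generation benchmark might record per run.
import pynvml  # provided by `pip install nvidia-ml-py`


def gpu_memory_used_mib(device_index: int = 0) -> float:
    """Return currently used memory on one GPU, in MiB."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return mem.used / (1024 ** 2)
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print(f"GPU 0 memory used: {gpu_memory_used_mib(0):.1f} MiB")
```

Judging from the flags in the diff, `--concurrency 1 8` together with the paired `--prompt-tokens 0 512` and `--completion-tokens 2048 512` values looks like a sweep over concurrency levels and prompt/completion length pairs; the script's own `--help` output is the authoritative reference for their exact semantics.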