unsubscribe committed on
Commit d881779
1 Parent(s): c8da55e

Update README.md

Files changed (1)
  1. README.md +7 -3
README.md CHANGED
@@ -75,9 +75,13 @@ We benchmarked the Llama 2 7B and 13B with 4-bit quantization on NVIDIA GeForce
  | Llama 2 13B | N/A | 90.7 | 115.8 |

  ```shell
- python benchmark/profile_generation.py \
- ./workspace \
- --concurrency 1 --input_seqlen 1 --output_seqlen 512
+ pip install nvidia-ml-py
+ ```
+
+ ```bash
+ python profile_generation.py \
+ --model-path /path/to/your/model \
+ --concurrency 1 8 --prompt-tokens 0 512 --completion-tokens 2048 512
  ```

  ## 4-bit Weight Quantization