luow-amd, haoyanli committed on
Commit
9b9a87f
1 Parent(s): f7aa269

Update README.md (#4)


- Update Readme.ME (a8581e5cb523640a67481fc4e2ec31703691e7c9)


Co-authored-by: haoyanli <[email protected]>

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -16,7 +16,7 @@ export MODEL_DIR = [local model checkpoint folder] or meta-llama/Meta-Llama-3.1-
  # single GPU
  python3 quantize_quark.py \
  --model_dir $MODEL_DIR \
- --output_dir llama31_8b_amd \
+ --output_dir Meta-Llama-3.1-8B-Instruct-FP8-KV \
  --quant_scheme w_fp8_a_fp8 \
  --kv_cache_dtype fp8 \
  --num_calib_data 128 \
@@ -25,12 +25,12 @@ python3 quantize_quark.py \
  # If model size is too large for single GPU, please use multi GPU instead.
  python3 quantize_quark.py \
  --model_dir $MODEL_DIR \
- --output_dir llama31_8b_amd \
+ --output_dir Meta-Llama-3.1-8B-Instruct-FP8-KV \
  --quant_scheme w_fp8_a_fp8 \
  --kv_cache_dtype fp8 \
  --num_calib_data 128 \
- --multi_gpu \
- --model_export quark_safetensors
+ --model_export quark_safetensors \
+ --multi_gpu
  ```
  ## Evaluation
  Quark currently uses perplexity(PPL) as the evaluation metric for accuracy loss before and after quantization.The specific PPL algorithm can be referenced in the quantize_quark.py.
@@ -70,4 +70,4 @@ Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
- limitations under the License.
+ limitations under the License.
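The README's Evaluation section says Quark measures accuracy loss with perplexity (PPL) before and after quantization, and points to `quantize_quark.py` for the exact algorithm. As a hedged illustration only, the sketch below uses the textbook definition, PPL = exp of the mean negative log-likelihood per target token. It is not the code in `quantize_quark.py`, which may differ in dataset, context length, or windowing. The helper names `perplexity` and `relative_ppl_change` are hypothetical, and the sample numbers are invented.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from natural-log probabilities of each target token.

    Textbook definition (assumed here, not taken from quantize_quark.py):
    PPL = exp(-(1/N) * sum_i log p(x_i | x_<i)).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def relative_ppl_change(ppl_before: float, ppl_after: float) -> float:
    """Fractional PPL change after quantization (0.02 means +2%).

    Hypothetical helper for comparing the original and quantized model.
    """
    if ppl_before <= 0:
        raise ValueError("ppl_before must be positive")
    return (ppl_after - ppl_before) / ppl_before


if __name__ == "__main__":
    # Invented example: a model that gives every target token probability
    # 0.25 has perplexity 4.
    uniform = [math.log(0.25)] * 8
    print(perplexity(uniform))  # approximately 4.0
    # Invented before/after values, only to show the comparison.
    print(relative_ppl_change(10.0, 10.2))  # approximately 0.02
```

Reporting the relative change, rather than the raw difference, makes the FP8 accuracy cost comparable across models whose baseline perplexities differ.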