comaniac/Meta-Llama-3-70B-Instruct-FP8-v2


comaniac committed on Jun 10

Commit 5a3007e • 1 Parent(s): c5597b6

Update README.md

Files changed (1)

README.md +2 -2


@@ -1,7 +1,7 @@
-## Llama-3-70B-Instruct-FP8-v1
+## Llama-3-70B-Instruct-FP8-v2
 * Weights and activations are per-tensor quantized to float8_e4m3.
-* Quantization with AutoFP8.
+* Quantization with AutoFP8 with the updated activation scaling factor names.
 * Calibration dataset: Ultrachat (mgoin/ultrachat_2k)
 * Samples: 1024
 * Sequence length: 4096
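
For readers unfamiliar with the phrase "per-tensor quantized to float8_e4m3" in the README, here is a minimal pure-Python sketch of the idea: one scaling factor is derived from the largest absolute value in the whole tensor, and every element is divided by it and rounded to the nearest FP8 value. This is not AutoFP8's code. The helper names (`round_to_e4m3`, `quantize_per_tensor`, `dequantize`) are hypothetical, and the sketch assumes the "fn" variant of E4M3 (largest finite value 448, saturating on overflow).

```python
import math

# Largest finite magnitude of the e4m3 "fn" variant (an assumption of this sketch).
E4M3_MAX = 448.0


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value representable in e4m3fn, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with m in [0.5, 1), so the binade exponent is e - 1.
    # Clamp to -6, the smallest normal exponent, so subnormals share one step size.
    _, e = math.frexp(mag)
    exp = max(e - 1, -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits -> 8 steps per binade
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize_per_tensor(values):
    """Quantize a flat list of floats with a single (per-tensor) scale."""
    amax = max(abs(v) for v in values)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Map quantized values back to the original range."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 2.0, 4.0, 0.013]
    q, scale = quantize_per_tensor(weights)
    print("scale:", scale)
    print("quantized:", q)
    print("dequantized:", dequantize(q, scale))
```

Because the scale is shared by the whole tensor, the largest element maps exactly onto 448 while very small elements fall into the coarse subnormal range and lose precision. That trade-off is why a calibration dataset is used to pick activation scales.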