FP8 quantized model now available! (requires only half the original model's VRAM)

#33
by mysticbeing - opened

Runs on a single H100 or A100 (80 GB): https://huggingface.co/mysticbeing/Llama-3.1-Nemotron-70B-Instruct-HF-FP8-DYNAMIC

Weight-and-activation quantization to FP8 is virtually lossless: text generated by the FP8 model is nearly indistinguishable from the unquantized model's output, and it takes close examination to spot any differences.
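For intuition, here is a minimal NumPy sketch of the per-tensor dynamic scaling idea behind FP8 (e4m3) quantization: pick a scale so the largest activation maps to the e4m3 maximum (448), round to the format's 4 significant bits, then dequantize. This is an illustrative simulation only (it ignores exponent underflow and subnormals), not the actual kernel or library implementation used for this checkpoint.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in e4m3


def round_to_e4m3(x):
    """Simulate e4m3 mantissa rounding: 3 stored mantissa bits plus the
    implicit leading bit = 4 significant bits. Exponent range limits are
    ignored for simplicity (illustrative sketch only)."""
    m, e = np.frexp(x)            # x = m * 2**e, with 0.5 <= |m| < 1
    m = np.round(m * 16.0) / 16.0  # keep 4 significant bits
    return np.ldexp(m, e)


def fp8_dynamic_quant(x):
    """Per-tensor dynamic FP8 quantization: scale so max |x| hits the
    e4m3 maximum, round, then dequantize back to high precision."""
    scale = np.abs(x).max() / FP8_E4M3_MAX
    q = round_to_e4m3(x / scale)
    return q * scale


x = np.random.default_rng(0).normal(size=1024)
xq = fp8_dynamic_quant(x)
# With 4 significant bits, per-element relative error is bounded by 1/32.
rel_err = np.abs(x - xq).max() / np.abs(x).max()
```

Because the scale is recomputed from each tensor's own max ("dynamic" quantization), no calibration dataset is needed, which is one reason the quality loss is so small in practice.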
