Can it run on A100/A800 with vLLM?

by Parkerlambert123 - opened Jul 28

Jul 28

Sorry to bother you. Can this model be run on A100 (SM80)? Some models, like Llama3.1, can be run on A100/A800 with fp8_marlin.

mgoin

Neural Magic org Jul 29

Currently we don't support FP8 MoE models on Ampere. This is because vLLM uses Triton for its FusedMoE kernel, which doesn't support the FP8 Marlin mixed-precision GEMM.

mgoin changed discussion status to closed Jul 29

traphix

Aug 28

Any update on this? Does vLLM support this now?

mgoin

Neural Magic org Aug 28

vLLM supports FP8 MoE models only on Ada Lovelace or Hopper GPUs (>= SM 89), which have hardware support for FP8.
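For anyone who wants to check their own card against the SM 89 threshold mentioned above, here is a rough sketch. The helper names `supports_fp8_moe` and `local_gpu_capability` are made up for illustration and are not part of vLLM; the local query assumes PyTorch is installed and uses `torch.cuda.get_device_capability`. It only compares compute capability and does not reproduce whatever checks vLLM performs internally.

```python
# Threshold taken from the reply above: SM 89 (Ada Lovelace) or newer.
MIN_FP8_MOE_CAPABILITY = (8, 9)


def supports_fp8_moe(capability):
    """Return True if a (major, minor) compute capability is >= SM 89.

    Hypothetical helper for illustration only; not a vLLM API.
    """
    return tuple(capability) >= MIN_FP8_MOE_CAPABILITY


def local_gpu_capability():
    """Return (major, minor) for CUDA device 0, or None if it cannot be queried."""
    try:
        import torch  # assumed to be installed; imported lazily on purpose
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    # Known compute capabilities: A100/A800 are SM 80, L40S is SM 89, H100 is SM 90.
    for name, cap in [("A100/A800", (8, 0)), ("L40S", (8, 9)), ("H100", (9, 0))]:
        print(f"{name}: SM {cap[0]}{cap[1]}, FP8 MoE threshold met: {supports_fp8_moe(cap)}")

    cap = local_gpu_capability()
    if cap is not None:
        print(f"Local GPU: SM {cap[0]}{cap[1]}, FP8 MoE threshold met: {supports_fp8_moe(cap)}")
```

Comparing `(major, minor)` tuples avoids the mistake of comparing the minor version alone, so SM 90 passes the check even though its minor version is lower than that of SM 89.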
