Can it run on A100/A800 with vLLM?
#1 by Parkerlambert123 - opened
Sorry to bother you. Can this model run on A100 (SM80)? Some models, such as Llama 3.1, can run on A100/A800 with fp8_marlin.
Currently we don't support MoE FP8 models on Ampere. This is because vLLM uses Triton for its FusedMoE kernel, which doesn't support the FP8 Marlin mixed-precision GEMM.
mgoin changed discussion status to closed
Any update on this? Does vLLM support it now?
vLLM supports FP8 MoE models only on Ada Lovelace or Hopper GPUs (>= SM 89), which have hardware support for FP8.
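If it helps, here is a minimal sketch for checking a GPU against the SM 89 threshold mentioned above before trying to load the model. The helper names (`supports_fp8_moe`, `current_gpu_supports_fp8_moe`) are made up for illustration and are not part of vLLM; the only library call assumed is PyTorch's `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple. The threshold itself is taken from this thread and may change in later vLLM releases.

```python
from typing import Tuple

# Threshold quoted in this thread: SM 89 (Ada Lovelace) or newer.
# This constant is an assumption based on the discussion, not read from vLLM.
FP8_MOE_MIN_CAPABILITY: Tuple[int, int] = (8, 9)


def supports_fp8_moe(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability meets the threshold."""
    # Tuples compare element by element, so (9, 0) >= (8, 9) and (8, 0) < (8, 9).
    return tuple(capability) >= FP8_MOE_MIN_CAPABILITY


def current_gpu_supports_fp8_moe(device_index: int = 0) -> bool:
    """Check the local GPU; needs PyTorch with CUDA available."""
    import torch  # imported lazily so the pure check above works without it

    if not torch.cuda.is_available():
        return False
    return supports_fp8_moe(torch.cuda.get_device_capability(device_index))


if __name__ == "__main__":
    # A100/A800 report (8, 0); L40/RTX 4090 report (8, 9); H100 reports (9, 0).
    for name, cap in [("A100", (8, 0)), ("L40", (8, 9)), ("H100", (9, 0))]:
        print(name, cap, supports_fp8_moe(cap))
```

On an A100 or A800 the check returns `False`, which matches the answer above: those cards are Ampere (SM 80) and fall below the threshold.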