Does this model only work on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+?

#4
by jcfasi - opened

When trying to deploy this on SageMaker using the DJLServing 0.29.0 LMI image with vLLM, I get this error:

RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+

LMI Containers Reference

DJLServing 0.29.0 LMI Image URI: 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-lmi11.0.0-cu124
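For context, here is a minimal sketch of how to check whether the instance's GPU meets the requirement from the error above (native FP8 matmul via torch._scaled_mm needs compute capability >= 8.9 on CUDA):

```python
import torch

# Query the GPU's compute capability; per the error above, native FP8
# matmul (torch._scaled_mm) requires >= 8.9 (Ada) or 9.0 (Hopper).
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")
print("Native FP8 supported:", (major, minor) >= (8, 9))
```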

Neural Magic org

Hi @jcfasi, vLLM has support for FP8 on Ampere (compute capability >= 8.0) as well! See https://docs.vllm.ai/en/latest/quantization/fp8.html
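For example, loading an FP8 checkpoint directly in vLLM on an Ampere GPU should select a compatible kernel automatically (per the docs linked above). A minimal sketch, where the model ID is a placeholder for the checkpoint you are deploying:

```python
from vllm import LLM, SamplingParams

# Minimal sketch: load an FP8-quantized checkpoint with vLLM.
# "neuralmagic/Meta-Llama-3-8B-Instruct-FP8" is a placeholder ID;
# substitute the model from this repo.
llm = LLM(model="neuralmagic/Meta-Llama-3-8B-Instruct-FP8")
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```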

If you are running into issues, please post them at https://github.com/vllm-project/vllm/
