Model inference giving 503 error

#25
by DeepTreeTeam - opened

Hello,
For more than a day now, the serverless inference for this model has kept returning a 503 ("Service Unavailable") error. Is there a specific reason the model is unavailable? When will it be back?
Thank you!

The same...

Same here. Any updates on the issue so far?
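For anyone hitting this in the meantime: a 503 from the serverless Inference API usually means the model is loading or temporarily unavailable, so a simple retry with backoff is worth trying before assuming the model is gone. Below is a minimal, hedged sketch of such a retry helper. It is generic (not an official Hugging Face API): `send` is any zero-argument callable returning a response-like object with a `status_code` attribute, and the names `call_with_retry`, `max_retries`, and `base_delay` are my own.

```python
import time


def call_with_retry(send, max_retries=4, base_delay=1.0):
    """Retry a request while it returns HTTP 503 (Service Unavailable).

    `send` is a zero-argument callable returning an object with a
    `status_code` attribute, e.g. `lambda: requests.post(url, ...)`.
    Retries up to `max_retries` times with exponential backoff, then
    returns the last response regardless of its status code.
    """
    response = send()
    for attempt in range(max_retries):
        if response.status_code != 503:
            return response
        # Back off before retrying: 503 from the serverless API often
        # just means the model is still loading on the backend.
        time.sleep(base_delay * 2 ** attempt)
        response = send()
    return response
```

With `requests` installed you would use it as `call_with_retry(lambda: requests.post(api_url, headers=headers, json=payload))` and inspect the returned response. If the 503 persists across all retries, as in this thread, the model has likely been removed from the serverless API rather than merely loading.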

This model was taken down from the inference API, but the fp8 version is available through NIM on DGX Cloud: https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8?dgx_inference=true

Cost for this is based on compute time.
