Runtime error

b/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 371, in hf_raise_for_status
    raise HfHubHTTPError(str(e), response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError: 500 Server Error: Internal Server Error for url: https://api-inference.huggingface.co/models/casperhansen/llama-3-8b-instruct-awq (Request ID: R6NxhkCtkcj_ueyKwRH0x)

Could not load model casperhansen/llama-3-8b-instruct-awq with any of the following classes: (<class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>,).

See the original errors:

while loading with LlamaForCausalLM, an error is thrown:
Traceback (most recent call last):
  File "/src/transformers/src/transformers/pipelines/base.py", line 279, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/transformers/src/transformers/modeling_utils.py", line 3016, in from_pretrained
    config.quantization_config = AutoHfQuantizer.merge_quantization_configs(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/transformers/src/transformers/quantizers/auto.py", line 145, in merge_quantization_configs
    quantization_config = AutoQuantizationConfig.from_dict(quantization_config)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/transformers/src/transformers/quantizers/auto.py", line 75, in from_dict
    return target_cls.from_dict(quantization_config_dict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/transformers/src/transformers/utils/quantization_config.py", line 90, in from_dict
    config = cls(**config_dict)
             ^^^^^^^^^^^^^^^^^^
  File "/src/transformers/src/transformers/utils/quantization_config.py", line 655, in __init__
    self.post_init()
  File "/src/transformers/src/transformers/utils/quantization_config.py", line 662, in post_init
    raise ValueError("AWQ is only available on GPU")
ValueError: AWQ is only available on GPU

Container logs: not available (the log viewer stayed on "Fetching error logs...").
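
The failure is the final ValueError: the AWQ quantization config in transformers raises "AWQ is only available on GPU" during post_init when no CUDA device is visible, so the hosted Inference API backend, which here appears to be running this model on CPU, cannot load the checkpoint at all. Below is a minimal sketch of loading the same checkpoint locally on a CUDA machine instead, assuming a recent transformers release with AWQ support and the autoawq package are installed:

```python
# Minimal sketch: load the AWQ checkpoint on a local CUDA device instead of
# the hosted Inference API. Assumes a GPU machine with recent `transformers`
# and `autoawq` installed; the model id is taken from the traceback above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "casperhansen/llama-3-8b-instruct-awq"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # AWQ kernels run in half precision
    device_map="cuda",          # AWQ refuses to initialize without a GPU
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the same ValueError appears locally, it usually means torch cannot see a CUDA device (check torch.cuda.is_available()); AWQ support also landed only in relatively recent transformers releases, so upgrading transformers and installing autoawq may be required as well.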