Hi, can someone help me debug this?
I am trying to deploy WizardCoder on SageMaker using the recommended deploy code, but I am getting the following error:

RuntimeError: found uninitialized parameters in model : ['transformer.h.0.attn.c_attn.weight', 'transformer.h.0.attn.c_proj.weight', 'transformer.h.0.mlp.c_fc.weight', 'transformer.h.0.mlp.c_proj.weight', 'transformer.h.1.attn.c_attn.weight', 'transformer.h.1.attn.c_proj.weight', 'transformer.h.1.mlp.c_fc.weight', 'transformer.h.1.mlp.c_proj.weight', 'transformer.h.2.attn.c_attn.weight', 'transformer.h.2.attn.c_proj.weight', 'transformer.h.2.mlp.c_fc.weight', 'transformer.h.2.mlp.c_proj.weight', 'transformer.h.3.attn.c_attn.weight', 'transformer.h.3.attn.c_proj.weight', 'transformer.h.3.mlp.c_fc.weight', 'transformer.h.3.mlp.c_proj.weight'........
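To see at a glance which tensors the loader could not find, the parameter names in the message can be grouped by layer. This is only a minimal sketch: `summarize_uninitialized` is a hypothetical helper (not part of SageMaker or TGI), and the sample string is just the first few names from the error above.

```python
import re
from collections import defaultdict


def summarize_uninitialized(error_text):
    """Group 'transformer.h.<layer>.<module>.weight' names by layer index."""
    pattern = re.compile(r"'transformer\.h\.(\d+)\.([\w.]+)\.weight'")
    by_layer = defaultdict(list)
    for layer, module in pattern.findall(error_text):
        # Skip repeats, since the message may list a name more than once.
        if module not in by_layer[int(layer)]:
            by_layer[int(layer)].append(module)
    return dict(by_layer)


# Shortened sample taken from the error message above.
sample = (
    "RuntimeError: found uninitialized parameters in model : "
    "['transformer.h.0.attn.c_attn.weight', 'transformer.h.0.attn.c_proj.weight', "
    "'transformer.h.0.mlp.c_fc.weight', 'transformer.h.0.mlp.c_proj.weight', "
    "'transformer.h.1.attn.c_attn.weight']"
)

summary = summarize_uninitialized(sample)
for layer, modules in sorted(summary.items()):
    print(layer, modules)
```

In the full message every attention and MLP projection is missing its plain `weight` tensor. That would be consistent with a quantized checkpoint being loaded by a code path that expects unquantized weights, since GPTQ checkpoints typically store tensors under names such as `qweight`, `qzeros` and `scales` instead.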

Recommended deploy code:

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'TheBloke/WizardCoder-15B-1.0-GPTQ',
    'SM_NUM_GPUS': json.dumps(1)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.8.2"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
    "inputs": "My name is Julien and I like to",
})

I'm afraid I have no experience with SageMaker at all, but I'm not sure this model is going to work. Firstly, it's a GPTQ model, and I'm not sure whether the SageMaker code supports those.

Text Generation Inference (TGI), the Hugging Face inference library, did recently add support for GPTQ, and maybe SageMaker is using that? But I don't believe TGI supports Starcoder models (in GPTQ anyway), and this is a Starcoder model.

This GitHub issue describes how to get my GPTQs working with Text Generation Inference. Environment variables are currently required: https://github.com/huggingface/text-generation-inference/issues/601

But I've already had it reported to me that TGI doesn't work with Starcoder, so even if SageMaker is using TGI, I wouldn't expect it to work with this specific model. Try one of my Llama GPTQs instead.
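If SageMaker is indeed running TGI, the environment variables from the GitHub issue above would presumably go into the `env` dictionary passed to `HuggingFaceModel`. The sketch below shows one way that could look. The variable names `HF_MODEL_QUANTIZE`, `GPTQ_BITS` and `GPTQ_GROUPSIZE` are assumptions that should be checked against the issue and the container documentation, the values 4 and 128 are illustrative and must match the model card, `build_hub_env` is a hypothetical helper, and the model ID is a placeholder rather than a real repository.

```python
import json


def build_hub_env(model_id, num_gpus=1, gptq_bits=None, gptq_groupsize=None):
    """Build the env dict for HuggingFaceModel; values must all be strings."""
    env = {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": json.dumps(num_gpus),
    }
    if gptq_bits is not None and gptq_groupsize is not None:
        # Assumed variable names; verify against the TGI issue before relying on them.
        env["HF_MODEL_QUANTIZE"] = "gptq"
        env["GPTQ_BITS"] = str(gptq_bits)
        env["GPTQ_GROUPSIZE"] = str(gptq_groupsize)
    return env


# Placeholder model ID; substitute a Llama GPTQ repository and its actual settings.
hub = build_hub_env("your-org/your-llama-gptq-model", gptq_bits=4, gptq_groupsize=128)
print(hub)
```

The resulting `hub` dictionary would replace the one in the deploy code, passed as `env=hub`. Whether the container version used there supports GPTQ at all is a separate question that this sketch does not answer.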

Runtime error from Sagemaker deploy