Issue: Cannot deploy on SageMaker

#15
by Feifeifly7879 - opened

Hi, I am trying to use the provided deployment code on SageMaker. I got the following error:

"The checkpoint you are trying to load has model type stablelm but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date".

Any suggestions how to fix this issue?

Can you share the code you are using, so I can help fix it?

I am using the code provided on the website under "Deploy / Amazon SageMaker". I figured out that the problem is the Transformers version. The code sets it to 4.37.0, which is too old to recognize the stablelm model type. In my test, version 4.40.2 works. However, 4.37.0 is the highest version that HuggingFaceModel supports, so I don't know how to work around it.

import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID':'stabilityai/stablelm-zephyr-3b',
    'HF_TASK':'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.37.0',
    pytorch_version='2.1.0',
    py_version='py310',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1, # number of instances
    instance_type='ml.m5.xlarge' # ec2 instance type
)

predictor.predict({
    "inputs": "Can you please let us know more details about your ",
})
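
As a rough illustration of the version constraint described in this thread, the sketch below compares a Transformers version string against the version reported as working (4.40.2). The helper names `parse_version` and `is_known_good` are hypothetical, and 4.40.2 is only the version tested here, not necessarily the first release that recognizes the stablelm model type. The parser assumes plain `X.Y.Z` strings and will fail on suffixes such as `.dev0`.

```python
def parse_version(version: str) -> tuple:
    """Turn a plain 'X.Y.Z' release string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


# Assumption: 4.40.2 is the version reported as working in this thread,
# not necessarily the minimum version that supports stablelm.
KNOWN_GOOD = "4.40.2"


def is_known_good(installed: str, known_good: str = KNOWN_GOOD) -> bool:
    """Return True if `installed` is at least the known-good version."""
    return parse_version(installed) >= parse_version(known_good)


print(is_known_good("4.37.0"))  # False: the version pinned in the snippet above
print(is_known_good("4.43.1"))  # True: the version shipped in the TGI image used later
```

Inside a running container, the installed version could be read from `transformers.__version__` and passed to the same check.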

Finally, I succeeded in deploying it on SageMaker with the following code (the key point is to use a recent version of the TGI image):

import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.model import Model
from sagemaker.predictor import Predictor

# Set up the SageMaker session
sagemaker_session = sagemaker.Session()

# Get the execution role
role = get_execution_role()

hub = {
    'HF_MODEL_ID':'stabilityai/stablelm-zephyr-3b',
    'HF_TASK':'text-generation',
    'HUGGING_FACE_HUB_TOKEN':'<token>'
}

model = Model(
    image_uri='763104351884.dkr.ecr.ap-southeast-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0',  # Use the TGI image with transformers 4.43.1
    role=role,
    sagemaker_session=sagemaker_session,
    env=hub
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.g5.xlarge',
    endpoint_name='zephyr-19',
    container_startup_health_check_timeout=400,  # 6.67 minutes for startup timeout
    container_startup_health_check_frequency=200  # Health check every 3.33 minutes
)

print("Endpoint deployed successfully.")
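
The image URI above is hardcoded for ap-southeast-1, so anyone deploying in another region would need to adjust it. Here is a minimal sketch of one way to do that with plain string handling. The helpers `swap_region` and `tgi_version` are hypothetical, and the sketch assumes the standard `<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>` layout. The registry account ID can differ in some regions, so the result should be checked against the published list of Deep Learning Container images before use.

```python
def swap_region(image_uri: str, region: str) -> str:
    """Return the same ECR image URI with its region component replaced."""
    registry, _, repo_and_tag = image_uri.partition("/")
    parts = registry.split(".")
    # Expected registry layout: <account>.dkr.ecr.<region>.amazonaws.com
    if len(parts) != 6 or parts[1:3] != ["dkr", "ecr"]:
        raise ValueError(f"Unexpected ECR registry format: {registry}")
    parts[3] = region
    return ".".join(parts) + "/" + repo_and_tag


def tgi_version(image_uri: str) -> str:
    """Extract the TGI version from the image tag, e.g. 'tgi2.2.0' -> '2.2.0'."""
    tag = image_uri.rsplit(":", 1)[1]
    for piece in tag.split("-"):
        if piece.startswith("tgi"):
            return piece[len("tgi"):]
    raise ValueError(f"No TGI version found in tag: {tag}")


uri = (
    "763104351884.dkr.ecr.ap-southeast-1.amazonaws.com/"
    "huggingface-pytorch-tgi-inference:"
    "2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0"
)
print(swap_region(uri, "us-east-1"))
print(tgi_version(uri))  # 2.2.0
```

In the deployment code, the target region could be taken from `sagemaker_session.boto_region_name` and the result passed as `image_uri` to `Model`.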
