Deploy with SageMaker LMI

#2 opened by josete89

I've tried to deploy this with SageMaker LMI, but it doesn't work. It seems the model should follow this layout:

  • compiled: NEFF files
  • checkpoint: compiled PyTorch weights
  • tokenizer...

Is it possible to get something like that? Or at least a code snippet showing how to deploy this as an endpoint? I've tried, but still no luck.
AWS Inferentia and Trainium org

Hey @josete89. What you are describing is the layout for the Optimum library. This example was originally built with Transformers because Optimum didn't yet have support for Mistral, but I saw a PR adding it go through last week. We should be able to update it to work. Reach out to me.

AWS Inferentia and Trainium org

Models compatible with optimum-neuron >= 0.0.17 have been added for several configurations.
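A minimal sketch of loading one of those precompiled checkpoints with optimum-neuron (the repo id below is a hypothetical placeholder):

from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

# hypothetical repo id; substitute the precompiled checkpoint for your configuration
model_id = "aws-neuron/mistral-neuron-example"
model = NeuronModelForCausalLM.from_pretrained(model_id)  # loads the precompiled Neuron model
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))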

I was able to compile the model seamlessly, but when I tried to deploy it:

from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
model = HuggingFaceModel(
   model_data=s3_model_uri,        # path to your model.tar.gz on s3
   role=role,                      # iam role with permissions to create an Endpoint
   transformers_version="4.34.1",  # transformers version used
   pytorch_version="1.13.1",       # pytorch version used
   py_version='py310',             # python version used
   model_server_workers=1,         # number of workers for the model server
)
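A minimal sketch of the deploy and request step that follows, assuming an inf2 instance type:

# deploy the model to a Neuron-backed endpoint (instance type is an assumption)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",
)

# send a test request to the endpoint
predictor.predict({"inputs": "What is Deep Learning?"})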

I got the following message when I sent a request:
"Pretrained model is compiled with neuronx-cc(2.12.54.0+f631c2365) newer than current compiler (2.11.0.34+c5231f848), which may cause runtime".

I guess the base image needs to be updated.

AWS Inferentia and Trainium org

@josete89 Yes, Mistral requires the new Neuron SDK 2.16. Not all the images have been updated yet.

As of today, you would need to use 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.26.0-neuronx-sdk2.16.0

That may require you to repackage your model depending on what image you were using previously. Watch for updates at https://github.com/aws/deep-learning-containers/blob/master/available_images.md#large-model-inference-containers

What image are you using to deploy now? You may be able to update that and deploy it as a custom image.

Right now I'm using "763104351884.dkr.ecr.eu-west-1.amazonaws.com/huggingface-pytorch-inference-neuronx:1.13.1-transformers4.34.1-neuronx-py310-sdk2.15.0-ubuntu20.04". I guess that's the problem :) How can I deploy it as a custom image then?

AWS Inferentia and Trainium org

You can specify a custom image by passing
image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:1.13.1-optimum0.0.16-neuronx-py310-ubuntu22.04-v1.0"
instead of the version settings; those settings are only used to look up the right image for you automatically.
You can try the image above. It is a Hugging Face Text Generation Inference image update, but I am not sure of its SDK version.
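A minimal sketch of the model definition with an explicit image, reusing the variables from the earlier snippet (when image_uri is set, the transformers/pytorch/py version arguments are not needed):

from sagemaker.huggingface.model import HuggingFaceModel

model = HuggingFaceModel(
    model_data=s3_model_uri,  # path to your model.tar.gz on S3
    role=role,                # IAM role with permissions to create an endpoint
    image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:1.13.1-optimum0.0.16-neuronx-py310-ubuntu22.04-v1.0",
    model_server_workers=1,
)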

You can also create a SageMaker-compatible image and upload it to your private ECR repository.

git clone https://github.com/huggingface/optimum-neuron
cd optimum-neuron
make neuronx-tgi-sagemaker

It takes a few extra steps, but you can pin the exact SDK version in the Dockerfile:
https://github.com/huggingface/optimum-neuron/blob/main/text-generation-inference/Dockerfile

It is fewer steps if you can wait for the SageMaker team to release an updated image.

AWS Inferentia and Trainium org

The new image with AWS Neuron SDK 2.16 and optimum-neuron 0.0.17 has been released: https://github.com/aws/deep-learning-containers/releases/tag/v1.0-hf-tgi-0.0.17-pt-1.13.1-inf-neuronx-py310

Thanks a lot @jburtoft @dacorvo! I will give it a try :)

AWS Inferentia and Trainium org

@josete89 Make sure you check out the new blog post from HF that walks you through it. No image updates needed!

https://huggingface.co/blog/text-generation-inference-on-inferentia2
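For reference, a rough sketch of that approach, deploying straight from the Hub with the TGI Neuron container; the helper version, environment values, and instance type below are assumptions to check against the blog post:

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# look up the TGI Neuron container (version is an assumption; see the blog post)
image_uri = get_huggingface_llm_image_uri("huggingface-neuronx", version="0.0.17")

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.1",  # example model id
        "HF_NUM_CORES": "2",           # Neuron cores to use (assumption)
        "HF_BATCH_SIZE": "1",          # static batch size (assumption)
        "HF_SEQUENCE_LENGTH": "4096",  # static sequence length (assumption)
        "HF_AUTO_CAST_TYPE": "fp16",   # dtype for compilation (assumption)
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",  # instance type is an assumption
)
predictor.predict({"inputs": "What is Deep Learning?"})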
