What would be the minimal SageMaker instance to deploy this model?
#7, opened by CarlosAndrea
As stated in the title, what would be the minimal SageMaker instance to deploy this model?
I'm trying it with ml.g5.24xlarge, but so far I haven't been able to deploy it. I keep running into these errors:
- "Error: ShardCannotStart"
- "TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'"
- "TypeError: unsupported operand type(s) for /: 'NoneType' and 'int' rank=5"
Hi,
The model is about 24GB, but with the additional data that passes through it at inference time, you would probably need at least 30GB of GPU memory to be safe (this is a rough guess).
People have deployed it successfully on SageMaker with 2 A10 GPUs: https://github.com/vllm-project/vllm/issues/2395. Each A10 GPU has 24GB of memory, so you'd have 48GB in total, which is enough. Alternatively, 2 L4 GPUs, which also have 24GB each, should work as well.
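To make the arithmetic above explicit, here is a minimal sketch in Python. The helper name `min_gpus` is made up for illustration, and the 30GB figure is the rough guess from this reply, not a measured requirement. It also assumes the model can be split evenly across GPUs, which depends on the serving stack.

```python
import math


def min_gpus(required_gb: float, per_gpu_gb: float) -> int:
    """Smallest number of identical GPUs whose combined memory covers required_gb.

    Hypothetical helper: assumes the model shards evenly across GPUs.
    """
    if required_gb <= 0 or per_gpu_gb <= 0:
        raise ValueError("memory sizes must be positive")
    return math.ceil(required_gb / per_gpu_gb)


# Figures taken from the reply above; SAFE_TOTAL_GB is a rough guess.
SAFE_TOTAL_GB = 30  # ~24GB of weights plus headroom for inference data
A10_GB = 24         # memory per A10 (the L4 also has 24GB)

gpus = min_gpus(SAFE_TOTAL_GB, A10_GB)
total_gb = gpus * A10_GB
print(f"GPUs needed: {gpus}, combined memory: {total_gb}GB")
```

With these numbers a single 24GB GPU falls short of the 30GB estimate, which is why two are suggested.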