Failed to run in AWS SageMaker
Hi,
I ran the script from the Deploy menu above in AWS SageMaker, but after a while it failed with an OOM error. The same issue happened when I tried ml.g5.2xlarge and ml.g5.12xlarge. Is it a problem with the AWS environment? Has anyone else hit this issue?
Thanks,
Error from CloudWatch:
torch.cuda.OutOfMemoryError: Allocation on device 0 would exceed allowed memory. (out of memory)
2023-09-06T15:39:42.570+08:00 Currently allocated : 21.13 GiB
2023-09-06T15:39:42.570+08:00 Requested : 150.00 MiB
2023-09-06T15:39:42.570+08:00 Device limit : 22.20 GiB
2023-09-06T15:39:42.570+08:00 Free (according to CUDA): 25.12 MiB
2023-09-06T15:39:45.076+08:00 PyTorch limit (set by user-supplied memory fraction) : 17179869184.00 GiB
You don't have a big enough GPU.
Thanks for your help, @teknium. I'm using ml.g5.12xlarge; isn't that enough? I can run llama2-13b from Meta on that instance.
The deploy code suggests using 2xlarge:
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)
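Note that both ml.g5.2xlarge and ml.g5.12xlarge use 24 GB A10G GPUs (one vs. four), which matches the ~22 GiB device limit in the CloudWatch log: without sharding, a single GPU still runs out of memory. As a hedged sketch (not the model card's exact snippet), the SageMaker Hugging Face LLM container can shard the model across all four GPUs of an ml.g5.12xlarge via SM_NUM_GPUS; the HF_MODEL_ID value here is a placeholder for whatever model is being deployed:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

huggingface_model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "teknium/OpenHermes-13B",  # placeholder: set to your model id
        "SM_NUM_GPUS": "4",  # ml.g5.12xlarge exposes 4 A10G GPUs; shard across all
    },
    role=role,
    image_uri=get_huggingface_llm_image_uri("huggingface"),
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=300,
)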
To fit it on a 24 GB GPU, either set device_map="auto", or pip install bitsandbytes and pass load_in_8bit=True or load_in_4bit=True. All of these are LlamaForCausalLM.from_pretrained args, i.e.
import torch
from transformers import LlamaForCausalLM

self.model = LlamaForCausalLM.from_pretrained(
    "./openhermes13b/",         # local model directory
    torch_dtype=torch.float16,  # fp16 halves memory vs fp32
    device_map="auto",          # shard layers across available GPUs (needs accelerate)
    # load_in_8bit=True,        # optional: 8-bit quantization via bitsandbytes
)
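For reference, a minimal sketch of the bitsandbytes route mentioned above, assuming bitsandbytes and accelerate are installed and the same local model path:

from transformers import LlamaForCausalLM

# 8-bit weights roughly halve memory versus fp16; load_in_4bit=True cuts it further.
model = LlamaForCausalLM.from_pretrained(
    "./openhermes13b/",
    device_map="auto",   # required so the quantized layers are placed on GPU
    load_in_8bit=True,   # or load_in_4bit=True
)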
Thanks a lot.