Failed to run in AWS SageMaker
Hi,
I ran the script from the Deploy menu above in AWS SageMaker, but after a while it failed with an OOM error. The same issue happened when I tried ml.g5.2xlarge and ml.g5.12xlarge. Is it a problem with the AWS environment? Has anyone else hit this issue?
Thanks,
Error from CloudWatch:
torch.cuda.OutOfMemoryError: Allocation on device 0 would exceed allowed memory. (out of memory)
2023-09-06T15:39:42.570+08:00 Currently allocated : 21.13 GiB
2023-09-06T15:39:42.570+08:00 Requested : 150.00 MiB
2023-09-06T15:39:42.570+08:00 Device limit : 22.20 GiB
2023-09-06T15:39:42.570+08:00 Free (according to CUDA): 25.12 MiB
2023-09-06T15:39:45.076+08:00 PyTorch limit (set by user-supplied memory fraction) : 17179869184.00 GiB
You don't have a big enough GPU.
Thanks for your help, @teknium. I'm using ml.g5.12xlarge; isn't that enough? I can run llama2-13b from Meta on that instance.
The deploy code suggests using 2xlarge:
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)
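Note that both ml.g5.2xlarge and ml.g5.12xlarge use 24 GB A10G GPUs (one vs. four), which matches the ~22 GiB device limit in the CloudWatch log: without sharding, a single GPU still runs out of memory. As a hedged sketch (not the model card's exact snippet), the SageMaker Hugging Face LLM container can shard the model across all four GPUs of an ml.g5.12xlarge via SM_NUM_GPUS; the HF_MODEL_ID value here is a placeholder for whatever model is being deployed:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

huggingface_model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "teknium/OpenHermes-13B",  # placeholder: set to your model id
        "SM_NUM_GPUS": "4",  # ml.g5.12xlarge exposes 4 A10G GPUs; shard across all
    },
    role=role,
    image_uri=get_huggingface_llm_image_uri("huggingface"),
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=300,
)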
To fit it on a 24 GB GPU, either set device_map="auto", or pip install bitsandbytes and pass load_in_8bit=True or load_in_4bit=True. All of these are LlamaForCausalLM.from_pretrained args, i.e.
import torch
from transformers import LlamaForCausalLM

self.model = LlamaForCausalLM.from_pretrained(
    "./openhermes13b/",         # local model directory
    torch_dtype=torch.float16,  # fp16 halves memory vs fp32
    device_map="auto",          # shard layers across available GPUs (needs accelerate)
    # load_in_8bit=True,        # optional: 8-bit quantization via bitsandbytes
)
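For reference, a minimal sketch of the bitsandbytes route mentioned above, assuming bitsandbytes and accelerate are installed and the same local model path:

from transformers import LlamaForCausalLM

# 8-bit weights roughly halve memory versus fp16; load_in_4bit=True cuts it further.
model = LlamaForCausalLM.from_pretrained(
    "./openhermes13b/",
    device_map="auto",   # required so the quantized layers are placed on GPU
    load_in_8bit=True,   # or load_in_4bit=True
)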
Thanks a lot.