Response code 424

#1 opened by h2ku

I got the following error message when running on an Nvidia Tesla T4 (4x GPU · 64 GB), which is basically saying OOM.

{"error":"Request failed during generation: Server error: CUDA out of memory. Tried to allocate 72.00 MiB. GPU 0 has a total capacty of 14.58 GiB of which 59.56 MiB is free. Process 17070 has 14.52 GiB memory in use. Of the allocated memory 13.84 GiB is allocated by PyTorch, and 462.28 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF","error_type":"generation"}

EDEN T&S org

@h2ku The example code in the README loads the model onto only one GPU by default.
Change the part that loads the model as follows.

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Edentns/DataVortexS-10.7B-dpo-v1.0", device_map="auto")
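Once the model is loaded with device_map="auto", you can check how the layers were distributed across the four T4s by printing the device map that transformers records on the model:

# Shows which device (cuda:0 .. cuda:3, or cpu) each module was placed on.
print(model.hf_device_map)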

I was leaving this comment as an FYI for future users.

Unfortunately, I'm working with HF Inference Endpoints, which doesn't provide a way to pass such options.
Thank you for the tip :)
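For anyone who lands on this thread later: if I'm reading the Inference Endpoints docs correctly, a custom handler.py in the model repository is one way to control how the model is loaded there. A rough sketch, where the max_new_tokens value and the response shape are my own assumptions rather than settings from this thread:

# handler.py -- rough sketch of a custom Inference Endpoints handler that
# loads the model across all visible GPUs via device_map="auto".
from typing import Any, Dict, List

from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # "path" points at the model repository checked out on the endpoint.
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(path, device_map="auto")

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # max_new_tokens=256 is an illustrative assumption, not a tuned value.
        inputs = self.tokenizer(data["inputs"], return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, max_new_tokens=256)
        text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return [{"generated_text": text}]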
