Response code 424

#1 opened by h2ku

I got the following error message when running on an Nvidia Tesla T4 (4x GPU · 64 GB), which is basically saying OOM.

{"error":"Request failed during generation: Server error: CUDA out of memory. Tried to allocate 72.00 MiB. GPU 0 has a total capacty of 14.58 GiB of which 59.56 MiB is free. Process 17070 has 14.52 GiB memory in use. Of the allocated memory 13.84 GiB is allocated by PyTorch, and 462.28 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF","error_type":"generation"}

EDEN T&S org

@h2ku The example code in the README loads the model onto only one GPU by default.
Change the part that loads the model as follows.

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Edentns/DataVortexS-10.7B-dpo-v1.0", device_map="auto")
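Once the model is loaded with device_map="auto", you can check how the layers were distributed across the four T4s by printing the device map that transformers records on the model:

# Shows which device (cuda:0 .. cuda:3, or cpu) each module was placed on.
print(model.hf_device_map)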

I was leaving this comment as an FYI for future users.

Unfortunately, I'm working with HF Inference Endpoints, which doesn't provide a way to pass such options.
Thank you for the tip :)
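For anyone who lands on this thread later: if I'm reading the Inference Endpoints docs correctly, a custom handler.py in the model repository is one way to control how the model is loaded there. A rough sketch, where the max_new_tokens value and the response shape are my own assumptions rather than settings from this thread:

# handler.py -- rough sketch of a custom Inference Endpoints handler that
# loads the model across all visible GPUs via device_map="auto".
from typing import Any, Dict, List

from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # "path" points at the model repository checked out on the endpoint.
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(path, device_map="auto")

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # max_new_tokens=256 is an illustrative assumption, not a tuned value.
        inputs = self.tokenizer(data["inputs"], return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, max_new_tokens=256)
        text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return [{"generated_text": text}]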
