CUDA out of memory
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 8.00 GiB total capacity; 7.08 GiB already allocated; 0 bytes free; 7.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
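The error message itself suggests one mitigation: setting max_split_size_mb through the PYTORCH_CUDA_ALLOC_CONF environment variable to reduce allocator fragmentation. A minimal sketch in Windows batch syntax (matching the `call` commands used in this thread); the value 128 is an illustrative starting point, not a tested setting:

```shell
:: Assumption: a 128 MiB split cap; tune per the PyTorch memory-management docs.
:: Set the variable before launching the server so the CUDA allocator picks it up.
set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
call python server.py --auto-devices --model notstoic_pygmalion-13b-4bit-128g
```

This only helps when reserved memory greatly exceeds allocated memory, as the error notes; it does not reduce the model's actual footprint.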
I'm getting this error. I tried changing the pre_layer setting within oobabooga, but with no success.
I've been able to get responses on an RTX 2060 Super 8 GB card with the following flags in ooba:
call python server.py --auto-devices --extensions api --model notstoic_pygmalion-13b-4bit-128g --model_type LLaMA --wbits 4 --groupsize 128 --no-cache --pre_layer 30
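If --pre_layer 30 still overflows 8 GiB, a lower value keeps fewer layers on the GPU and offloads more to the CPU, trading speed for VRAM. A sketch of the same command with a lower value; 20 is an illustrative guess, not a tested number:

```shell
:: Assumption: pre_layer 20 places fewer layers on the GPU than 30.
:: Generation will be slower, but peak VRAM usage drops.
call python server.py --auto-devices --extensions api --model notstoic_pygmalion-13b-4bit-128g --model_type LLaMA --wbits 4 --groupsize 128 --no-cache --pre_layer 20
```

Decreasing the value step by step (30, 25, 20, ...) until the OOM disappears is a reasonable way to find the limit for a given card.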
Not working for me