How to use this on GPUs?
I modified the model loading line as follows:
```python
from transformers import BloomForCausalLM

model = BloomForCausalLM.from_pretrained('joaoalvarenga/bloom-8bit', low_cpu_mem_usage=True, device_map="auto")
```
Using `device_map="auto"` automatically moves everything to the GPU, but when I try generating, I get the following error:
```
RuntimeError: CUDA out of memory. Tried to allocate 13.40 GiB (GPU 0; 31.75 GiB total capacity; 19.45 GiB already allocated; 11.20 GiB free; 19.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CON
```
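
For reference, this is roughly the generation call that triggers the error; a minimal sketch, assuming a tokenizer loaded from the same checkpoint (the prompt is just an example, not my exact script):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('joaoalvarenga/bloom-8bit')

# Inputs go to the device of the first model shard (cuda:0 under device_map="auto")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```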
I have 8× 32 GB GPUs, so everything should fit. Am I doing this right, or is there a specific way to do decoding on a GPU?
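
In case it matters, this is the kind of per-device cap I was expecting `device_map="auto"` to honor via `max_memory`; the limits below are guesses for my setup, not tested values:

```python
from transformers import BloomForCausalLM

# Cap what device_map="auto" may place on each of the 8 GPUs,
# leaving headroom below the 32 GB per-card capacity (guessed values)
max_memory = {i: "30GiB" for i in range(8)}
max_memory["cpu"] = "100GiB"  # optional CPU spillover, also a guess

model = BloomForCausalLM.from_pretrained(
    'joaoalvarenga/bloom-8bit',
    low_cpu_mem_usage=True,
    device_map="auto",
    max_memory=max_memory,
)
```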
Thanks in advance.