Out-of-memory when running on a single RTX 4090

#3
by loong - opened

Even with bfloat16 there isn't enough memory.

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.48 GiB. GPU 0 has a total capacity of 23.64 GiB of which 688.12 MiB is free. Process 2629 has 818.00 MiB memory in use. Including non-PyTorch memory, this process has 22.16 GiB memory in use. Of the allocated memory 21.29 GiB is allocated by PyTorch, and 429.57 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

pipe.enable_model_cpu_offload() doesn't seem to work either.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org
edited Aug 7

Try updating to the current GitHub code and diffusers; that resolved it. Total usage is 23.9 GB.

Do I need to install PyTorch 2.4.0?

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

2.2 also works; 2.2, 2.3, and 2.4 are all fine.


pipe.enable_model_cpu_offload()
It doesn't seem to take effect.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Does it still fail when run with PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True? Total usage is 23.9 GB; make sure nothing else is currently using your GPU.
At which step does it run out of memory?

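For reference, the allocator setting suggested above only takes effect if it is in the environment before PyTorch initializes CUDA, so export it in the shell before launching, or set it at the very top of the script. A minimal sketch:

```python
import os

# Must be set BEFORE `import torch` (or before any CUDA work happens);
# the allocator reads it once at CUDA initialization and ignores later changes.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```

The shell equivalent is `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python your_script.py` (script name illustrative).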

The load stage alone uses 36 GB, and setting pipe.enable_model_cpu_offload() has no effect. diffusers is version 0.30.0dev.
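A common reason enable_model_cpu_offload() appears to have no effect is that the script also calls pipe.to("cuda"), which moves every sub-model onto the GPU at load time. A minimal sketch of the intended order, using the generic DiffusionPipeline loader since the exact pipeline class and model id are not shown in the thread:

```python
def load_pipeline(model_id: str):
    """Illustrative sketch; model_id and dtype are assumptions."""
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    # Do NOT also call pipe.to("cuda"): that loads every sub-model
    # (text encoder, transformer/UNet, VAE) onto the GPU at once, so the
    # load stage alone can consume tens of GB and offloading never helps.
    # enable_model_cpu_offload() keeps weights in CPU RAM and moves each
    # sub-model to the GPU only while it runs (requires `accelerate`).
    pipe.enable_model_cpu_offload()
    return pipe
```

If memory is still tight, diffusers also offers the more aggressive (and slower) pipe.enable_sequential_cpu_offload().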

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

The load stage? How are you running the code? Maybe update to the current stable release.
