Out-of-memory when running on a single RTX 4090

#3
by loong - opened

Even with bfloat16 there isn't enough memory.

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.48 GiB. GPU 0 has a total capacity of 23.64 GiB of which 688.12 MiB is free. Process 2629 has 818.00 MiB memory in use. Including non-PyTorch memory, this process has 22.16 GiB memory in use. Of the allocated memory 21.29 GiB is allocated by PyTorch, and 429.57 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

pipe.enable_model_cpu_offload() doesn't seem to work either.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org
edited Aug 7

Try updating to the current GitHub code and diffusers; that resolved it. Total usage is 23.9 GB.

Do I need to install PyTorch 2.4.0?

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

2.2 also works; 2.2, 2.3, and 2.4 are all fine.


pipe.enable_model_cpu_offload()
It doesn't seem to take effect.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Does it still fail when run with PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True? Total usage is 23.9 GB; make sure nothing else is currently using your GPU.
At which step does it run out of memory?

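For reference, the allocator setting suggested above only takes effect if it is in the environment before PyTorch initializes CUDA, so export it in the shell before launching, or set it at the very top of the script. A minimal sketch:

```python
import os

# Must be set BEFORE `import torch` (or before any CUDA work happens);
# the allocator reads it once at CUDA initialization and ignores later changes.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```

The shell equivalent is `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python your_script.py` (script name illustrative).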

The load stage alone uses 36 GB, and setting pipe.enable_model_cpu_offload() has no effect. diffusers is version 0.30.0dev.
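A common reason enable_model_cpu_offload() appears to have no effect is that the script also calls pipe.to("cuda"), which moves every sub-model onto the GPU at load time. A minimal sketch of the intended order, using the generic DiffusionPipeline loader since the exact pipeline class and model id are not shown in the thread:

```python
def load_pipeline(model_id: str):
    """Illustrative sketch; model_id and dtype are assumptions."""
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    # Do NOT also call pipe.to("cuda"): that loads every sub-model
    # (text encoder, transformer/UNet, VAE) onto the GPU at once, so the
    # load stage alone can consume tens of GB and offloading never helps.
    # enable_model_cpu_offload() keeps weights in CPU RAM and moves each
    # sub-model to the GPU only while it runs (requires `accelerate`).
    pipe.enable_model_cpu_offload()
    return pipe
```

If memory is still tight, diffusers also offers the more aggressive (and slower) pipe.enable_sequential_cpu_offload().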

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

The load stage? How are you running the code? Maybe update to the current stable release.
