VRAM consumption when using GPU (CUDA)

#37 opened by Sunjay353

I noticed that the VRAM usage increases by around the model size when loading the model, which is expected. However, it then increases again by roughly twice the model size during inference, so total VRAM consumption is approximately three times the model size. Furthermore, this additional memory is not released after inference finishes; it is only freed when the model is unloaded. Is this normal and expected behavior?
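For reference, here is a minimal sketch of how such measurements might be taken, assuming a PyTorch backend with the `transformers` library; `model_id` is a placeholder, not a name from this thread. `memory_allocated` counts live tensors, while `memory_reserved` is what PyTorch's caching allocator holds, which is roughly what `nvidia-smi` reports as used.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def vram(tag: str) -> None:
    # allocated = memory held by live tensors; reserved = memory held by
    # PyTorch's caching allocator (roughly what nvidia-smi shows as used).
    allocated = torch.cuda.memory_allocated() / 1e9
    reserved = torch.cuda.memory_reserved() / 1e9
    print(f"{tag}: allocated={allocated:.2f} GB, reserved={reserved:.2f} GB")

model_id = "your-model-id"  # placeholder, substitute the actual model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).cuda()
vram("after load")       # roughly the size of the weights

inputs = tokenizer("Hello, world", return_tensors="pt").to("cuda")
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=256)
vram("after inference")  # reserved typically stays elevated here
```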

Yes, it's normal and expected. Transformers consume memory roughly proportional to the square of the number of tokens in the sequence (because of self-attention), so inference can need substantially more VRAM than the weights alone.
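As for the memory that is not released after inference: assuming a PyTorch backend, the activation and KV-cache tensors are freed once generation returns, but PyTorch's CUDA caching allocator keeps those pages reserved for reuse, so monitoring tools still report them as used until the process exits or the cache is cleared. A minimal sketch of returning that cached memory without unloading the model:

```python
import gc
import torch

# The cached blocks are idle after generation; clearing the cache hands
# them back to the driver while the model weights stay on the GPU.
gc.collect()              # drop lingering Python references to old tensors
torch.cuda.empty_cache()  # release unused cached blocks to the driver

print(f"reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB")
```

Note that clearing the cache is usually unnecessary; PyTorch will reuse the reserved memory for the next inference call, and repeatedly emptying the cache can slow things down.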
