CUDA out of memory without Gradio

#4
by snakelemma - opened

I can run the model locally through Gradio, but not standalone. For Gradio I use the code from https://huggingface.co/spaces/qnguyen3/nanoLLaVA.
The standalone version (using the sample code) throws "CUDA out of memory" on an NVIDIA GeForce RTX 4050 (6 GB), while through Gradio the memory is never exhausted.
The error is raised while SigLipAttention is being loaded. Any idea why less VRAM is used with Gradio?
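One common cause of this pattern (an assumption here, worth checking against the two scripts) is the load dtype: if the Gradio Space passes `torch_dtype=torch.float16` to `from_pretrained` while the standalone sample loads the default `float32`, the weights alone take twice the VRAM. A rough back-of-envelope check, assuming a model of roughly one billion parameters (the exact count for nanoLLaVA is an assumption):

```python
# Approximate VRAM needed for model weights alone, ignoring
# activations, the KV cache, and CUDA context overhead.
PARAMS = 1_050_000_000  # assumed parameter count, ~1B


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory footprint of the weights in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)


fp32 = weight_memory_gib(PARAMS, 4)  # float32: 4 bytes per parameter
fp16 = weight_memory_gib(PARAMS, 2)  # float16/bfloat16: 2 bytes per parameter

print(f"fp32 weights: {fp32:.2f} GiB")
print(f"fp16 weights: {fp16:.2f} GiB")
```

On a 6 GB card, ~3.9 GiB of fp32 weights plus activations and the CUDA context can easily overflow, while the ~2 GiB fp16 footprint fits comfortably. Comparing the `from_pretrained` arguments (dtype, `device_map`, any quantization config) between the Space code and the sample code would confirm whether this is the difference.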

