CUDA out of memory without Gradio

#4
by snakelemma - opened

I can run the model locally through Gradio, but not standalone. For Gradio I use the code from https://huggingface.co/spaces/qnguyen3/nanoLLaVA.
The standalone version (using the sample code) throws "CUDA out of memory" on an NVIDIA GeForce RTX 4050 (6 GB), while through Gradio the memory is never exhausted.
The error is raised while SigLipAttention is being loaded. Any idea why less VRAM is used with Gradio?
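One common cause of this pattern (an assumption here, worth checking against the two scripts) is the load dtype: if the Gradio Space passes `torch_dtype=torch.float16` to `from_pretrained` while the standalone sample loads the default `float32`, the weights alone take twice the VRAM. A rough back-of-envelope check, assuming a model of roughly one billion parameters (the exact count for nanoLLaVA is an assumption):

```python
# Approximate VRAM needed for model weights alone, ignoring
# activations, the KV cache, and CUDA context overhead.
PARAMS = 1_050_000_000  # assumed parameter count, ~1B


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory footprint of the weights in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)


fp32 = weight_memory_gib(PARAMS, 4)  # float32: 4 bytes per parameter
fp16 = weight_memory_gib(PARAMS, 2)  # float16/bfloat16: 2 bytes per parameter

print(f"fp32 weights: {fp32:.2f} GiB")
print(f"fp16 weights: {fp16:.2f} GiB")
```

On a 6 GB card, ~3.9 GiB of fp32 weights plus activations and the CUDA context can easily overflow, while the ~2 GiB fp16 footprint fits comfortably. Comparing the `from_pretrained` arguments (dtype, `device_map`, any quantization config) between the Space code and the sample code would confirm whether this is the difference.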

