"CUDA must not be initialized in main process" Error during inference
I see other Spaces initializing the model and moving it to device("cuda") outside of the @spaces.GPU-decorated function, which I assume runs in the main process. However, when I try to do that, I receive the following error. I've also tried initializing the model inside the @spaces.GPU function, but I get the same error.
File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 171, in gradio_handler
res = worker.res_queue.get()
File "/usr/local/lib/python3.10/multiprocessing/queues.py", line 367, in get
return _ForkingPickler.loads(res)
File "/usr/local/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 147, in rebuild_cuda_tensor
torch.cuda._lazy_init()
File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init
torch._C._cuda_init()
File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch.py", line 181, in _cuda_init_raise
raise RuntimeError(
RuntimeError: CUDA must not be initialized in the main process on Spaces with Stateless GPU environment.
You can look at this Stacktrace to find out which part of your code triggered a CUDA init
My app.py looks like this:
import gradio as gr
import spaces
import torch

torch.jit.script = lambda f: f  # Avoid script error in lambda

from t2v_metrics import VQAScore, list_all_vqascore_models

# Global model; note that device="cuda" is passed here, at import time in the main process
model_pipe = VQAScore(model="clip-flant5-xl", device="cuda")  # our recommended scoring model

@spaces.GPU(duration=20)
def generate(model_name, image, text):
    result = model_pipe(images=[image], texts=[text])  # Perform the model inference
    return result  # Return the result

demo = gr.Interface(
    fn=generate,  # function to call
    inputs=[
        gr.Dropdown(["clip-flant5-xl", "clip-flant5-xxl"], label="Model Name"),
        gr.Image(type="filepath"),
        gr.Textbox(label="Prompt"),
    ],  # define the types of inputs
    outputs="number",  # define the type of output
    title="VQAScore",  # title of the app
    description="This model evaluates the similarity between an image and a text prompt.",
)
demo.queue()
demo.launch()
Hi @zhiqiulin, I may be wrong, but my understanding is that calling .to(device) outside of the function decorated with @spaces.GPU only records which model should be moved to the GPU later; the model is not actually loaded onto the GPU at the moment you call .to(device).
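For reference, a minimal sketch of that pattern (the model name and the predict function are placeholders I picked for illustration, not from your app):

import spaces
from transformers import AutoModel

# Loaded on the CPU at import time; nothing here may touch CUDA
model = AutoModel.from_pretrained("bert-base-uncased")
model.to("cuda")  # on ZeroGPU this call is intercepted and only recorded

@spaces.GPU
def predict(text):
    # Only inside this function is the model actually on the GPU
    ...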
I don't know the implementation details of your VQAScore, but you might want to check whether there's a line that uses CUDA when instantiating it. For example, if it calls torch.load(ckpt_path, map_location="cuda"), an error will be raised because it tries to load the weights onto the GPU. You can avoid this by loading the weights onto the CPU first and then calling .to("cuda").
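Something along these lines inside VQAScore's setup code; I haven't read t2v_metrics, so ckpt_path and the stand-in model below are hypothetical:

import torch

ckpt_path = "checkpoint.pt"    # hypothetical path, for illustration only
model = torch.nn.Linear(4, 4)  # stand-in for the real model

# What likely fails on ZeroGPU: loading straight onto the GPU at import time
# state_dict = torch.load(ckpt_path, map_location="cuda")  # initializes CUDA in the main process

# Load onto the CPU instead...
state_dict = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(state_dict)
# ...then move to CUDA; outside @spaces.GPU this is only recorded, not executed
model.to("cuda")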
Given the stack trace, it looks like unpickling the result returned by the VQAScore pipe is what triggers the CUDA init. Maybe the result of the pipe needs to be moved to the CPU before it is returned from the generate function. (In transformers and diffusers, pipes always return plain / regular objects, not tensors; I don't know about t2v_metrics.)
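If that's the case, something like this modified generate from your app.py might work. This is a sketch assuming the pipe returns a single-element CUDA tensor, which I haven't verified for t2v_metrics:

import spaces
import torch

@spaces.GPU(duration=20)
def generate(model_name, image, text):
    result = model_pipe(images=[image], texts=[text])  # model_pipe as in your app.py
    # Move the result off the GPU and convert it to a plain Python float
    # before it is pickled back to the main process, which must never
    # initialize CUDA (assumes a single-element score tensor)
    if torch.is_tensor(result):
        return result.detach().cpu().item()
    return result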