"CUDA must not be initialized in main process" Error during inference

#45
by zhiqiulin
ZeroGPU Explorers org

I see other Spaces initializing the model and moving it to device("cuda") outside of the @spaces.GPU-decorated function, i.e., in what I assume is the main process. However, when I try to do that, I receive the following error. I've also tried initializing the model inside the @spaces.GPU-decorated function, but I get the same error.

File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 171, in gradio_handler
    res = worker.res_queue.get()
  File "/usr/local/lib/python3.10/multiprocessing/queues.py", line 367, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 147, in rebuild_cuda_tensor
    torch.cuda._lazy_init()
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init
    torch._C._cuda_init()
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch.py", line 181, in _cuda_init_raise
    raise RuntimeError(
RuntimeError: CUDA must not be initialized in the main process on Spaces with Stateless GPU environment.
You can look at this Stacktrace to find out which part of your code triggered a CUDA init

My app.py looks like this:

import gradio as gr
import spaces
import torch
torch.jit.script = lambda f: f  # Avoid script error in lambda

from t2v_metrics import VQAScore, list_all_vqascore_models

# Global model variable, but do not initialize or move to CUDA here
model_pipe = VQAScore(model="clip-flant5-xl", device="cuda")  # our recommended scoring model

@spaces.GPU(duration=20)
def generate(model_name, image, text):
    result = model_pipe(images=[image], texts=[text])  # Perform the model inference
    return result  # Return the result

demo = gr.Interface(
    fn=generate,  # function to call
    inputs=[gr.Dropdown(["clip-flant5-xl", "clip-flant5-xxl"], label="Model Name"), gr.Image(type="filepath"), gr.Textbox(label="Prompt")],  # define the types of inputs
    outputs="number",  # define the type of output
    title="VQAScore",  # title of the app
    description="This model evaluates the similarity between an image and a text prompt."
)

demo.queue()
demo.launch()
ZeroGPU Explorers org

Hi @zhiqiulin , I may be wrong, but my understanding is that .to(device) outside of the function decorated with @spaces.GPU simply records which model should be moved to the GPU, and that the model is not actually loaded onto the GPU when you call .to(device).
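For example, the usual ZeroGPU pattern looks something like this (a minimal sketch; the pipeline and model id are placeholders):

import spaces
from diffusers import DiffusionPipeline

# .to("cuda") here is intercepted by ZeroGPU: it only records that the
# pipeline should live on the GPU and does not initialize CUDA in the
# main process.
pipe = DiffusionPipeline.from_pretrained("some/model-id")  # placeholder id
pipe.to("cuda")

@spaces.GPU
def infer(prompt):
    # Only inside the decorated function does the pipeline actually
    # run on a GPU.
    return pipe(prompt).images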
I don't know the implementation details of your VQAScore, but you might want to check whether there's a line that uses CUDA when instantiating it. For example, if you call torch.load(ckpt_path, map_location="cuda"), an error will be raised because it tries to load the weights onto the GPU. You can avoid this error by loading the weights onto the CPU first and then calling .to("cuda").
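Concretely, the safe loading pattern looks like this (a sketch; the checkpoint path and model are placeholders):

import torch
import torch.nn as nn

model = nn.Linear(16, 1)  # placeholder model for illustration

# This would trigger the CUDA-init error on ZeroGPU, because torch.load
# tries to materialize the tensors on the GPU in the main process:
#   state_dict = torch.load("model.ckpt", map_location="cuda")

# Load onto the CPU first instead, then move the module. The .to("cuda")
# call is intercepted by ZeroGPU, so no CUDA init happens here.
state_dict = torch.load("model.ckpt", map_location="cpu")
model.load_state_dict(state_dict)
model.to("cuda")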

ZeroGPU Explorers org

Given the stack trace, it looks like unpickling the result returned by the VQAScore pipe is what triggers the CUDA init.

It may be that the result of the pipe needs to be moved to the CPU before being returned from the generate function. (In transformers and diffusers, pipes always return plain, regular objects, not tensors; I don't know about t2v_metrics.)
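If that's the case, something like this sketch should avoid the error (assuming the pipe returns a scalar tensor; untested against t2v_metrics, and model_pipe is the global from your app.py):

import torch

@spaces.GPU(duration=20)
def generate(model_name, image, text):
    result = model_pipe(images=[image], texts=[text])
    # Convert a CUDA tensor to a plain Python number before it is
    # pickled back to the main process; unpickling a CUDA tensor there
    # is what triggers the CUDA init.
    if torch.is_tensor(result):
        result = result.detach().cpu()
        result = result.item() if result.numel() == 1 else result.tolist()
    return result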
