Multiple errors associated with HF-wide Zero GPU space renovation

#104
by John6666 - opened

The following glitch occurs when the @spaces.GPU decorator is attached to a function (or method) that is not called directly from app.py.
If the decorated function is called directly from the event handler, it works.
However, it is very inconvenient and inefficient to have to separate out model loading and other processing that does not actually need the GPU.

This did not occur at least until the day before yesterday.

Will this become the intended behavior going forward?
I hope it is just a bug...
https://huggingface.co/spaces/John6666/votepurchase-multiple-model/discussions/3#66d0674e378dfa1ea4b39dcd
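
To illustrate the two call patterns as I understand the report, here is a minimal, hypothetical sketch (function names like infer_direct and infer_inner are mine, not from the actual Space):

import spaces
import gradio as gr

@spaces.GPU
def infer_direct(prompt):
    # Decorated function registered directly on the event: this works.
    return f"GPU result for {prompt}"

@spaces.GPU
def infer_inner(prompt):
    # Decorated function that is only ever called through a plain wrapper.
    return f"GPU result for {prompt}"

def outer(prompt):
    # Undecorated wrapper registered on the event: this indirect call is the pattern that started failing.
    return infer_inner(prompt)

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt")
    out = gr.Textbox(label="Output")
    gr.Button("Direct").click(fn=infer_direct, inputs=prompt, outputs=out)
    gr.Button("Indirect").click(fn=outer, inputs=prompt, outputs=out)

demo.launch()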

ZeroGPU Explorers org

Hi, thanks for the report. No, inner functions are not meant to stop working on ZeroGPU (indeed, as you mentioned, they are convenient in a lot of scenarios)

We're currently doing a migration on ZeroGPU, so you may see a degraded experience.

I can't reproduce the bug on https://hf.co/spaces/John6666/votepurchase-multiple-model though.
That doesn't mean for certain that it is due to the migration, or that there's no root cause that needs to be addressed.

Let me know if you can reproduce on your side

ZeroGPU Explorers org

I have the space duplicated and was experiencing this issue.
You should try to reproduce from this commit https://huggingface.co/spaces/John6666/votepurchase-multiple-model/commit/957b3127ca396a44ca0f3b82e1e84c5cc5c567b7

(The latest commit in the space has fixed the issue)

Let me know if you can reproduce on your side

Sorry, I only just noticed.😰
I will try to reproduce it on my side.
I'll do that later, but for now, here is an additional report.

The above issue is being addressed by HF staff and will be fixed in time, but I found an additional issue and am reporting it just in case.
It seems that if I call a class method from a Gradio event, the error occurs even more reliably.
Simply wrapping the method in a plain function didn't help either.

At first I suspected a compatibility problem with the libraries, since the first Spaces to break were all ones using the stablepy library, but another Space using the same stablepy worked after the fix.

Actual Space (crashes)

https://huggingface.co/spaces/John6666/DiffuseCraftMod

Actual Space (fixed and works)

https://huggingface.co/spaces/John6666/votepurchase-multiple-model

Code Summary (crashes when inference starts)

# app.py
import spaces
import gradio as gr
~
from stablepy import Model_Diffusers
~
class GuiSD:
~
    @spaces.GPU
    def generate_pipeline(
~
sd_gen = GuiSD()
~
        generate_button.click(
            fn=sd_gen.generate_pipeline,

Error

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 575, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 285, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "/home/user/app/app.py", line 435, in load_new_model
    self.model.load_pipe(
  File "/usr/local/lib/python3.10/site-packages/stablepy/diffusers_vanilla/model.py", line 590, in load_pipe
    self.switch_pipe_class(
  File "/usr/local/lib/python3.10/site-packages/stablepy/diffusers_vanilla/model.py", line 389, in switch_pipe_class
    self.pipe = CLASS_DIFFUSERS_TASK[class_name][tk](**model_components).to(self.device)
  File "/usr/local/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 431, in to
    module.to(device, dtype)
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2871, in to
    return super().to(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1152, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 825, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init
    torch._C._cuda_init()
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 245, in _cuda_init_raise
    raise RuntimeError(
RuntimeError: CUDA must not be initialized in the main process on Spaces with Stateless GPU environment.
You can look at this Stacktrace to find out which part of your code triggered a CUDA init

Code (failed attempt to avoid the crash)

@spaces.GPU
def generate(**kwargs):
    return sd_gen.generate_pipeline(**kwargs)

        generate_button.click(
            fn=generate,

So I managed to free up a Zero GPU Space slot and created a reproduction test Space.
It's just a broken version of the working Space above, with a 4-line tweak.
(It has been updated slightly, so it is not exactly the same as the version at the time, but the phenomenon looks the same...)

Actual (Experiment) Space (crashes)

https://huggingface.co/spaces/John6666/votepurchase-crash

@xi0v , could you please check as well?

P.S.
Seen this way, the two issues above appear to be different phenomena.

ZeroGPU Explorers org

I checked on the https://huggingface.co/spaces/John6666/votepurchase-crash space and I can confirm that the problem is present there. (Tested multiple times)

(Screenshots of the error attached.)

While debugging this bug, I found a clue to another glitch.
I don't know where to post problems with Spaces in general, so I'll write about it here before I forget.

It has been more than two weeks, I think, but there was a bug discussed on the forum where Gradio and Docker Spaces crashed at startup if they contained Examples.
However, such Spaces would sometimes start even with Examples present, and the cause was not well understood.

I have now found that the crash occurs with high probability when the Examples components contain a Python None value, especially for Checkboxes.
I suspect the underlying bug is that something above Gradio or Docker scans and manipulates the Space on its own, but if you are having trouble, try removing None from your Examples.
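
As a minimal sketch of that workaround (a toy UI with assumed component names, not the actual Space's code):

import gradio as gr

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt")
    upscale = gr.Checkbox(label="Upscale", value=False)
    # Crash-prone on Spaces at the time: a None in the row for the Checkbox column
    # gr.Examples(examples=[["1girl", None]], inputs=[prompt, upscale])
    # Workaround: give every column an explicit value instead of None
    gr.Examples(examples=[["1girl", False]], inputs=[prompt, upscale])

demo.launch()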

Somehow I managed to bypass the bug.
I will keep a copy of the source at the stage where the bug can be reproduced as a CPU Space. (There are no more free slots for Zero GPU Spaces.)
https://huggingface.co/spaces/John6666/DiffuseCraftModCrash

  • The seemingly unrelated model loading section was also a class method, so it was isolated into a module-level wrapper as well.
  • Unlike the other Space, I had forgotten to use a yield statement, so I fixed that (wrappers below; a self-contained sketch follows after this list).
@spaces.GPU
def sd_gen_load_new_model(*args):
    yield from sd_gen.load_new_model(*args)

@spaces.GPU
def sd_gen_generate_pipeline(*args):
    yield from sd_gen.generate_pipeline(*args)
  • The difference in behavior between the two Spaces is probably the difference between yield and return.
  • It seems that class methods called from Gradio events need to be isolated, even if they look unrelated at first glance.
  • Other places also reference class methods, but they don't seem to be affected unless they are called from an event.
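
A self-contained toy version of the isolation pattern above (dummy generator methods stand in for the real stablepy-backed ones, so this is a sketch rather than the Space's actual code):

import spaces
import gradio as gr

class GuiSD:
    # Placeholder methods; the real ones load models and run inference via stablepy.
    def load_new_model(self, name):
        yield f"Loading {name}..."
        yield f"Loaded {name}"

    def generate_pipeline(self, prompt):
        yield "Generating..."
        yield f"Done: {prompt}"

sd_gen = GuiSD()

# Module-level, @spaces.GPU-decorated wrappers keep the decorator off the class methods.
@spaces.GPU
def sd_gen_load_new_model(*args):
    yield from sd_gen.load_new_model(*args)

@spaces.GPU
def sd_gen_generate_pipeline(*args):
    yield from sd_gen.generate_pipeline(*args)

with gr.Blocks() as demo:
    model_name = gr.Textbox(label="Model")
    prompt = gr.Textbox(label="Prompt")
    status = gr.Textbox(label="Status")
    gr.Button("Load model").click(fn=sd_gen_load_new_model, inputs=model_name, outputs=status)
    gr.Button("Generate").click(fn=sd_gen_generate_pipeline, inputs=prompt, outputs=status)

demo.launch()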

P.S.

What I have been able to observe in the course of debugging the above:

  • The problem, which probably traces back to an outbreak last month and was identified on the forum (see below), is still ongoing and has not improved at all.
  • There are more than one or two distinct symptoms.
  • The situation in Zero GPU Spaces became even worse about two weeks ago.
  • There also seems to be a permanently-building issue, even with Docker in CPU Spaces.
  • Something is buggy with Gradio (as usual; see below).
  • This may be unrelated, but while fixing a different Zero GPU Space than the two above, I ran into a bug that crashed when Gradio's Progress was present. I have never encountered this in a CPU Space. It is not impossible that tqdm is malfunctioning.
  • Come to think of it, the Progress error above also occurred in the model load function. If so, I can't rule out a relationship with the common from_pretrained function. I had never encountered a similar error before last month.

From Forum

https://discuss.huggingface.co/t/space-is-building-permanently/99740
https://discuss.huggingface.co/t/flowise-space-stuck-on-building/103813/
https://discuss.huggingface.co/t/space-not-building-and-showing-no-logs/103770
https://discuss.huggingface.co/t/perpetually-building/104664

Gradio Bug

4.42.0
https://huggingface.co/spaces/John6666/Nymbo_Theme
4.36.1
https://huggingface.co/spaces/Nymbo/Nymbo_Theme

I am reporting this because I happened to find a Space where I could reproduce the same phenomenon as the bug above.
Compared to the two Spaces above, reproduction is a bit more tedious: you must first select a base model and then try to generate with even just 1girl as the prompt.

If no base model is selected, it works fine, as does multimodalart's original Space. The issue seems to have been triggered by applying the following commit from the original Space and putting the same code in the base model selection section of my Space.
https://huggingface.co/spaces/multimodalart/flux-lora-the-explorer/commit/c7245735ddf7dc3e13ce912767d7102b811d1203

After I rolled back the code, it works fine whether or not I choose a base model. However, I have prepared a Space for crash reproduction.
This time, since we know the code that triggered it, it may be somewhat easier to pin down.

Error

NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch. 
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 288, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1931, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1528, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 671, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 664, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 647, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 809, in gen_wrapper
    response = next(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 809, in gen_wrapper
    response = next(iterator)
  File "/home/user/app/app.py", line 222, in run_lora
    for image in image_generator:
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 356, in gradio_handler
    raise res.value
gradio.exceptions.Error: 'Inference Error: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch. '

Space (crashes when generating after selecting a base model)

https://huggingface.co/spaces/John6666/flux-lora-the-explorer-crash

Space (roll-backed and works fine)

https://huggingface.co/spaces/John6666/flux-lora-the-explorer

multimodalart's original Space (works)

https://huggingface.co/spaces/multimodalart/flux-lora-the-explorer

I noticed the bug report in this Discussion, but it is very likely that the conditions under which the bug occurs changed between yesterday morning and at least a few hours ago.
A Space that was fine is now buggy...
I don't know whether the Spaces that were broken before are fixed instead.

Same error (maybe)

https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/104#66d14ecaad293ffc4be7d0d3

Space (crashes when generating with a LoRA selected)

https://huggingface.co/spaces/John6666/flux-lora-the-explorer

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 288, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1931, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1516, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/home/user/app/app.py", line 201, in run_lora
    image = generate_image(prompt_mash, steps, seed, cfg_scale, width, height, lora_scale, cn_on, progress)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 211, in gradio_handler
    raise gr.Error("GPU task aborted")
gradio.exceptions.Error: 'GPU task aborted'

https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/104#66d60ffbf4e165e0b4b0a582
As for this bug, a restart of the Space fixed it. It seems I had stepped on a landmine of some kind. The others remain the same.

P.S.

"Not this again," please don't think that...

While trying to make the problem less likely to occur, I managed to reproduce a Progress-related error in Gradio, so I am reporting it here with the code.
I am not sure whether this is a Gradio bug or part of this series of Spaces errors, so I am reporting it here either way.

The error message itself looks like a common, elementary Python error about unpacking a return value, but the problem is that it occurs with no change to the code other than adding Progress as an argument.
I use progress=gr.Progress(track_tqdm=True) often elsewhere but rarely see errors like this. In fact, it's the first time for me.

Error reproduction procedure

  1. Port the following spaces to Zero GPU space
  2. Select one of the models in the Base Model drop-down (If you don't do this, it works fine.)
  3. Put in the appropriate prompts, press the generate button, and wait.

Summary?

def change_base_model(repo_id: str, cn_on: bool): # works well

def change_base_model(repo_id: str, cn_on: bool, progress=gr.Progress(track_tqdm=True)): # crash!

Space (crashes when run on Zero GPU)

https://huggingface.co/spaces/John6666/flux-lora-the-explorer-crash-progress

Error

Loading model: John6666/xe-pixel-flux-01-fp8-flux
Model load Error: too many values to unpack (expected 2)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/routes.py", line 763, in predict
    output = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 288, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1931, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1528, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 671, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 664, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 647, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 809, in gen_wrapper
    response = next(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 809, in gen_wrapper
    response = next(iterator)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 356, in gradio_handler
    raise res.value
gradio.exceptions.Error: 'Model load Error: too many values to unpack (expected 2)'

I have brought another sample that is ready to crash.

Incidentally, it ends up crashing even if the code never references controlnet_union after this point.
It is possible that there is simply not enough VRAM, but the error message is the same as the one in the SDXL Space above, so it is probably related to this series of problems.

Error reproduction procedure

  1. Port the following spaces to Zero GPU space.
  2. Put in the appropriate prompts, press the generate button, and wait.
#controlnet_union = FluxControlNetModel.from_pretrained(controlnet_model_union_repo, torch_dtype=dtype).to(device)
#controlnet = FluxMultiControlNetModel([controlnet_union]).to(device) # works well

controlnet_union = FluxControlNetModel.from_pretrained(controlnet_model_union_repo, torch_dtype=dtype).to(device)
controlnet = FluxMultiControlNetModel([controlnet_union]).to(device) # crash (but not here)!

Space (crashes when run on Zero GPU)

https://huggingface.co/spaces/John6666/flux-lora-the-explorer-crush3

Error

NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":830, please report a bug to PyTorch. 
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 288, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1931, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1528, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 671, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 664, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 647, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 809, in gen_wrapper
    response = next(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 809, in gen_wrapper
    response = next(iterator)
  File "/home/user/app/app.py", line 271, in run_lora
    for image in image_generator:
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 356, in gradio_handler
    raise res.value
gradio.exceptions.Error: 'Inference Error: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":830, please report a bug to PyTorch. '

I have brought the case where the bug has been hardest for me to avoid.
Most of the model loading and inference-related code is in third-party source, so it's hard to find, and even when I find it and modify it, it doesn't work.
Furthermore, this one threw up errors that were new to me within this series of errors.
It's a bit of a stretch to go modifying from_pretrained or torch's .to()...😔

Error reproduction procedure

  1. Port either of the following Spaces to a Zero GPU Space.
  2. Put in the appropriate prompts, press the generate button.

Space

https://huggingface.co/spaces/John6666/Xlabs-Gradio-error

Error

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 135, in worker_init
    torch.init(nvidia_uuid)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 354, in init
    torch.Tensor([0]).cuda()
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 300, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 321, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 217, in gradio_handler
    raise res.value
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Original Space

https://huggingface.co/spaces/DamarJati/Xlabs-Gradio

ZeroGPU Explorers org

@John6666 I'm sorry I couldn't read all your answers and detailed reports but a major issue on ZeroGPU has just been fixed.
The issue could make Spaces get stuck forever (thus leading to "GPU task aborted"), as well as prevent Gradio modals and progress bars from displaying (which seems to be linked to your original message)

To benefit from the fix, all you need to do is push a change to your Space in order to trigger an update of the spaces package (note that "Restart" or "Factory rebuild" won't trigger the update)

Thank you for your efforts to fix the bug. It looks like it was a tough one.
There may still be a Gradio 3.x bug on the server side, but that was also a CUDA error, so maybe it is fixed too?

I couldn't read all your answers and detailed reports

All of this is just to provide material for bug fixes, so if it results in a fix, that's OK. There is no particular need to read it.😀

note that "Restart" or "Factory rebuild" won't trigger the update

So we can add or remove a stray newline in some file, or tweak requirements.txt.
I can post about this myself, but it would be helpful if HF staff could publicize it via a Post or on the Forum, if possible.
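
If anyone prefers to do this from a script rather than the web UI, something like the following should work with huggingface_hub (assuming you are authenticated and that any new commit is enough to trigger the rebuild; the repo_id and file name are placeholders):

from huggingface_hub import HfApi

api = HfApi()  # uses your cached token or the HF_TOKEN environment variable

# Any commit to the Space repo triggers a rebuild, which should pull the updated spaces package.
api.upload_file(
    path_or_fileobj=b"# trigger rebuild\n",
    path_in_repo="trigger_rebuild.txt",
    repo_id="your-name/your-space",  # placeholder: replace with your own Space
    repo_type="space",
    commit_message="Trigger rebuild to pick up the updated spaces package",
)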

I have updated several Spaces and finished testing them, and the issue appears to be generally fixed.
Creating a wrapper as a workaround is almost no longer necessary.

However, the following error, which was not seen before, now appears often.
I think the 0 probably refers to the CUDA device number, but searching for the same error turns up nothing.
So it could be a newly implemented error message, or it could come from the code in the Space I based my mod on.
I will come back and report once I figure out what triggers the error.

Invalid device argument 0: did you call init?

I have figured out what triggers the error. It seems that the following line, which I added, was the problem.
It was in the model load section.
Commenting it out stopped the error itself, but this seems like fairly common code...

The CUDA error that is not the resolved over-60-seconds abort still occurs occasionally, so as you said, issues other than the main problem still seem to be there.

torch.cuda.reset_peak_memory_stats()
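
For reference, a hedged sketch of one way to guard that call, assuming the error comes from running it in ZeroGPU's CPU-only main process where CUDA is never initialized:

import gc
import torch

def clear_cache():
    # Only touch CUDA memory stats when CUDA is available and has actually been initialized,
    # i.e. inside a @spaces.GPU worker, never in the main ZeroGPU process.
    if torch.cuda.is_available() and torch.cuda.is_initialized():
        torch.cuda.empty_cache()
        torch.cuda.reset_peak_memory_stats()
    gc.collect()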

By the way, this has nothing to do with this error, but I found a complaint, or rather a request, about Zero GPU on the forum, so I'll put it here.
https://discuss.huggingface.co/t/exceeded-gpu-quota-via-api-but-fine-interactively/105699/
https://discuss.huggingface.co/t/learn-about-gpu-throttling-quota-another-stupid-guy-d/89894

Many things have been fixed by yesterday's fixes, but I will leave here pseudocode for the pattern that still aborts with high probability in the current state.
The details are not exactly the same as the actual code, but it is roughly like this.

import spaces
import torch
from diffusers import DiffusionPipeline, AutoencoderTiny

def clear_cache():
    import gc
    try:
        torch.cuda.empty_cache()
        torch.cuda.reset_peak_memory_stats() # The culprit.
        gc.collect()
    except Exception as e:
        print(e)

clear_cache.zerogpu = True

dtype = torch.bfloat16
device = "cuda" if torch.cuda.is_available() else "cpu"
base_model = "black-forest-labs/FLUX.1-schnell"
taef1 = AutoencoderTiny.from_pretrained("madebyollin/taef1", torch_dtype=dtype).to(device) # Almost never stops here.
pipe = DiffusionPipeline.from_pretrained(base_model, torch_dtype=dtype, vae=taef1).to(device) # Almost never stops here.

def load_model():
    global pipe
    base_model2 = "sayakpaul/FLUX.1-merged"
    pipe = DiffusionPipeline.from_pretrained(base_model2, torch_dtype=dtype) # Almost never stops here.
    #clear_cache() # Turning this on spits out an error, but does not stop here.

@spaces.GPU()
def infer():
    pass
    # inference code
    # Actual abort happens here.

infer() # It works.

load_model() # It works.
infer() # Abort well.

I have found that I can avoid the above problem with high probability by changing my code as follows.
It looks like the problem is not yet a big deal at the point of the code above, but it is triggered by the fact that that code calls .to("cuda"), which results in packing a large tensor at startup, a 30GB+ tensor in the case of FLUX.1.
That is what it looked like from what I observed in the logs.

The dependencies are also attached below. accelerate often looks suspicious in cases like this, but here the problem seems to occur with or without accelerate. In particular, errors related to meta tensors and torch CUDA still occur frequently.
However, it is also true that the errors change depending on whether this library is present.
Similarly, the version of diffusers did not seem to have any particular effect on the error.

I am not sure whether this tensor-packing problem is the essence of this bug, but it does seem to be at least a contributing factor.

taef1 = AutoencoderTiny.from_pretrained("madebyollin/taef1", torch_dtype=dtype)#.to(device) # Almost never stops here.
pipe = DiffusionPipeline.from_pretrained(base_model, torch_dtype=dtype, vae=taef1)#.to(device) # Almost never stops here.

requirements.txt

spaces
torch
git+https://github.com/huggingface/diffusers
transformers
peft
sentencepiece
torchvision
huggingface_hub
timm
einops
controlnet_aux
kornia
numpy
opencv-python
deepspeed
mediapipe
openai==1.37.0

https://huggingface.co/posts/cbensimon/747180194960645
Thank you for the announcement. With that many likes, I think it will gradually reach everyone who hasn't seen the post yet.

I checked whether the above might have been fixed as well, but the CUDA initialization error that tends to occur in Spaces where a model is selected and loaded after startup does not seem to be fixed.
If I add the @spaces.GPU() decorator to the load function as well, it works, so it's much better than in the early days of the problem, but the quota...🤢
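
For reference, a minimal sketch of that workaround (the repo id, duration, and function name are illustrative, not the actual Space's values):

import spaces
import torch
from diffusers import DiffusionPipeline

dtype = torch.bfloat16
pipe = None

@spaces.GPU(duration=60)  # reserving GPU time just for the load also eats into the quota
def load_new_model(repo_id: str):
    global pipe
    # Loading and moving to CUDA inside the GPU worker avoids initializing CUDA in the main process.
    pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=dtype).to("cuda")
    return f"Loaded {repo_id}"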

Perhaps tensor packing was implemented as an important part of the Zero GPU acceleration process and is incompatible with any program that replaces the contents of VRAM after startup.
I don't know the detailed implementation of Zero GPU Spaces, so I may be off the mark, but it might be useful to have a separate decorator to indicate access to VRAM without actual computation. (Something like the environment variables that multimodalart sometimes uses, or foobar.zerogpu=True, might be close to that, though...)

https://discuss.huggingface.co/t/usage-quota-exceeded/106619
Also, regarding the problem that the quota is not relaxed when logged in (despite what the error message says), I don't know whether that is by spec, a bug, or simply unimplemented, and probably nobody on the user side knows either.
And, this may be a server issue unrelated to Zero GPU, but it sometimes happens that a Space cannot be viewed from the browser when not logged in to HF.

P.S.

I have not encountered this myself, so I don't know the details, but there have been frequent reports of various upload errors on the forum, probably far more than usual.
I'll mention it here too, just in case.
In some cases the problem can be avoided by going through Colab, but there is a lot of mysterious behavior.
https://discuss.huggingface.co/t/error-uploading-model-using-website-drag-and-drop-interface/76071/9
https://discuss.huggingface.co/t/problem-bad-request-when-using-datasets-dataset-push-to-hub/106614/2

P.S.

So the login mentioned in the Zero GPU quota error was this login!
After 6 months, I finally understand...😱
https://huggingface.co/docs/hub/spaces-oauth

I have found another error pattern and will report it.
When a model is quantized at global scope and then used for inference, it crashes.
That much is perhaps understandable (?), but the error is about multiprocessing for some reason.
I have seen other Spaces crash with the same message even though they're not explicitly using multiprocessing, and this may be the cause.

from optimum.quanto import quantize, freeze, qfloat8  # optimum-quanto

pipe = DiffusionPipeline.from_pretrained(base_model, torch_dtype=dtype, vae=taef1)
quantize(pipe.text_encoder_2, weights=qfloat8)  # quantization at global scope
freeze(pipe.text_encoder_2)
~
inference() # crashes here.
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

P.S.

Removing the above code did not cure the symptoms, so I removed optimum-quanto from the dependencies, and that fixed it.
I don't know whether Quanto is doing something wrong, or whether the problem is in Quanto's dependencies rather than Quanto itself, but this is a bit of a mess.

John6666 changed discussion title from About the phenomenon of eternal stacks when called from a function within a function to Multiple errors associated with HF-wide Zero GPU space renovation
