VRAM Estimation

#3
by SkySyrup - opened

Hello,
I'd like to ask how much VRAM this model requires to run. I'm on a 3060 with 32 GB of system RAM, but that doesn't seem to be enough to complete a generation.
I suspect this is due to stable_audio_tools, and that optimizing the final output function could fix it: generation itself runs flawlessly, but it fails before returning the final output. Has anybody else with similar hardware experienced this issue? Have you gotten yours running?
For reference, here are my logs, from the example provided in README.md:

Traceback (most recent call last):
  File "/home/personontheinternet/Development/stableaudioUI/main.py", line 24, in <module>
    output = generate_diffusion_cond(
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/stable_audio_tools/inference/generation.py", line 247, in generate_diffusion_cond
    sampled = model.pretransform.decode(sampled)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/stable_audio_tools/models/pretransforms.py", line 70, in decode
    decoded = self.model.decode_audio(z, chunked=self.chunked, iterate_batch=self.iterate_batch, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/stable_audio_tools/models/autoencoders.py", line 513, in decode_audio
    return self.decode(latents, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/stable_audio_tools/models/autoencoders.py", line 334, in decode
    decoded.append(self.decoder(latents[i:i+1]))
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/stable_audio_tools/models/autoencoders.py", line 191, in forward
    return self.layers(x)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/stable_audio_tools/models/autoencoders.py", line 114, in forward
    return self.layers(x)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/stable_audio_tools/models/autoencoders.py", line 60, in forward
    x = self.layers(x)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/stable_audio_tools/models/blocks.py", line 337, in forward
    x = snake_beta(x, alpha, beta)
  File "/home/personontheinternet/Documents/miniconda3/envs/exllama/lib/python3.10/site-packages/stable_audio_tools/models/blocks.py", line 302, in snake_beta
    return x + (1.0 / (beta + 0.000000001)) * pow(torch.sin(x * alpha), 2)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB. GPU
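The traceback shows the OOM happens in the autoencoder decode, and `decode_audio` is called with `chunked` and `iterate_batch` flags, so chunked decoding may lower the peak allocation. A minimal sketch of the idea (the toy decoder below is a stand-in, not the stable_audio_tools autoencoder):

```python
import torch

def decode_in_chunks(decode, latents, chunk_size=64):
    # Decode the latent sequence in slices along the time axis so only
    # one chunk's activations are live at a time, reducing peak memory.
    parts = [decode(latents[..., i:i + chunk_size])
             for i in range(0, latents.shape[-1], chunk_size)]
    return torch.cat(parts, dim=-1)

# Toy element-wise "decoder" standing in for the real autoencoder:
# upsamples the latent sequence 2x along the last dimension.
toy_decode = lambda z: z.repeat_interleave(2, dim=-1)

z = torch.arange(8.0).reshape(1, 1, 8)
out = decode_in_chunks(toy_decode, z, chunk_size=4)
print(out.shape)  # torch.Size([1, 1, 16])
```

This only works cleanly when the decoder is (near-)local along the time axis; whether the model's pretransform exposes a switch to enable it is an assumption to check against the stable_audio_tools docs.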

Thank you for releasing this model. I honestly didn't expect it to see the light of day. I'm glad to have been proven wrong :)

It works for me with 32 GB of RAM and a 12 GB GPU (3080 Ti).

I've got an 8 GB 1070 Ti, and the issue doesn't appear to be happening for me.

My total VRAM usage is at 12.2GB, including Windows display.

I got it working on an Nvidia 1060 3 GB in ComfyUI :D

Is it super slow for anyone else? I wonder if there are ways to speed it up... GPU tested: A10G, 24 GB VRAM.

Same config (3060 with 12 GB VRAM and 32 GB RAM): it first uses about 6 GB of VRAM, then at the end jumps to 12 GB of VRAM and 2 GB of system RAM. It slows down at that point and the CPU takes over, but it does generate sound in the end.
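To pin down where that end-of-run jump happens, PyTorch's allocator statistics can be queried around the failing call. A generic sketch (the placement around `generate_diffusion_cond` is illustrative, not from the thread):

```python
import torch

def peak_vram_gib():
    # Peak bytes ever allocated by PyTorch's CUDA caching allocator,
    # converted to GiB. Returns 0.0 when no CUDA device is available.
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.max_memory_allocated() / 2**30

# Usage around the failing call (sketch):
# torch.cuda.reset_peak_memory_stats()
# output = generate_diffusion_cond(model, ...)
# print(f"peak VRAM: {peak_vram_gib():.2f} GiB")
```

Comparing the peak before and after the decode step should show whether the autoencoder decode, rather than sampling, is what pushes a 12 GB card over the edge.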

run in half-precision
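To expand on the half-precision suggestion: float16 elements take 2 bytes versus 4 for float32, so casting weights and activations roughly halves VRAM use. A minimal sketch; the commented application to the README example assumes the model object accepts a plain `.half()` cast, which is not verified here:

```python
import torch

# Demonstrate the byte-level effect of the cast on a CPU tensor.
x32 = torch.zeros(256, dtype=torch.float32)
x16 = x32.half()
print(x32.element_size(), x16.element_size())  # 4 2

# Hypothetical application to the README example (names assumed,
# not verified against stable_audio_tools):
# model = model.half().to("cuda")
# output = generate_diffusion_cond(model, ...)
```

Note that any conditioning tensors fed to the model would need to be float16 as well, or PyTorch will raise a dtype mismatch.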
