stabilityai/stable-audio-open-1.0 · Any Comfy workflow?

3blackbar

Jun 5

Would be nice to have

cjohndesign

Jun 5

+1

NickyNicky

Jun 5

gradio

ameerazam08

Jun 5

@NickyNicky
https://huggingface.co/spaces/ameerazam08/stableaudio-open-1.0

gabrielhautclocq

Jun 5

I'm not very experienced with Python, but I got it working like this:

=== requirements1.txt ===

-i https://download.pytorch.org/whl/cu121
torch
torchvision
torchaudio

=== requirements2.txt ===

einops
ninja
packaging
huggingface_hub
stable_audio_tools

=== infer.py ===

import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download model
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]

model = model.to(device)

# Set up text and timing conditioning
conditioning = [{
    "prompt": "128 BPM dnb drums loop",
    "seconds_start": 0, 
    "seconds_total": 30
}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    steps=100,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=sample_size,
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device=device,
    # Force a fixed seed: leaving it at the default raised an error on 64-bit Windows 11
    seed=123456
)

# Rearrange audio batch to a single sequence
output = rearrange(output, "b d n -> d (b n)")

# Peak normalize, clip, convert to int16, and save to file
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)
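
A note on the last step: peak normalization divides by the loudest sample, so a completely silent output would divide by zero. Below is a minimal sketch of the same normalize, clip, convert sequence with a guard for that case. It uses NumPy instead of torch purely for illustration, and the function name `to_int16_peak_normalized` is made up for this example, not part of stable_audio_tools.

```python
import numpy as np

def to_int16_peak_normalized(audio: np.ndarray) -> np.ndarray:
    """Scale so the loudest sample reaches full scale, then convert to int16."""
    audio = audio.astype(np.float32)
    peak = np.max(np.abs(audio))
    if peak == 0:
        # Silent clip: return zeros instead of dividing by zero
        return np.zeros(audio.shape, dtype=np.int16)
    scaled = np.clip(audio / peak, -1.0, 1.0) * 32767
    return scaled.astype(np.int16)

# Tiny stereo example (channels x samples); the peak magnitude is 0.5
stereo = np.array([[0.0, 0.25, -0.5],
                   [0.1, -0.2, 0.5]], dtype=np.float32)
pcm = to_int16_peak_normalized(stereo)
silent = to_int16_peak_normalized(np.zeros((2, 4), dtype=np.float32))
```

After normalization the loudest samples land at +/-32767 regardless of how quiet the raw output was, which is why the script's result always uses the full 16-bit range.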

Then run:

python -m venv venv
.\venv\Scripts\activate
pip install -r requirements1.txt
pip install -r requirements2.txt

(This will take a while.)
Then authenticate with your Hugging Face account using an access token (create one at https://huggingface.co/settings/tokens):

huggingface-cli login

(Paste your token and follow the instructions; the token is not displayed when you paste it.)

Then run:

python .\infer.py

Edit infer.py as needed.
Change prompt and seed if necessary.
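
If you want a different result on every run without hard-coding a number, one option is to draw the seed inside infer.py and pass it as `seed=`. This is only a sketch: `pick_seed` and `SEED_UPPER_BOUND` are names invented here, and the 32-bit range is an assumption about what is sensible to pass, not something taken from the library's documentation.

```python
import random

# Assumed range: non-negative values below 2**32 - 1 (upper bound exclusive)
SEED_UPPER_BOUND = 2**32 - 1

def pick_seed() -> int:
    """Return a random Python int suitable for passing as an explicit seed."""
    # random.randrange works on Python ints, so it has no 32-bit limit
    return random.randrange(SEED_UPPER_BOUND)

seed = pick_seed()
print(f"Using seed {seed}")  # note it down if you want to reproduce the clip
```

You would then replace `seed=123456` in the `generate_diffusion_cond(...)` call with `seed=seed`.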

drbaph

Jun 5

edited Jun 5

seed=123456
Edit infer.py as needed.
Change prompt and seed if necessary.

The seed doesn't work like that on Windows; you will need to change line 138 of generation.py
("myenv/Lib/site-packages/stable_audio_tools/inference/generation.py") to:
seed = 12345

Yeerchiu

Jun 7

edited Jun 7

Changing the seed line in "myenv/Lib/site-packages/stable_audio_tools/inference/generation.py"
from
seed = seed if seed != -1 else np.random.randint(0, 2**32 - 1)
to
seed = seed if seed != -1 else np.random.randint(0, 2**32 - 1, dtype=np.int64)
also worked on my device.
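
For anyone wondering why the `dtype` matters: as far as I understand, on Windows with NumPy versions before 2.0 the default integer type used by `np.random.randint` is 32-bit, so an upper bound of `2**32 - 1` does not fit and NumPy raises a `ValueError`. The sketch below reproduces this on any platform by asking for `np.int32` explicitly; the helper names are made up for the example.

```python
import numpy as np

SEED_HIGH = 2**32 - 1  # exclusive upper bound used for the random seed

def draw_seed_int32():
    # Forces the 32-bit behaviour that older NumPy builds default to on Windows
    return np.random.randint(0, SEED_HIGH, dtype=np.int32)

def draw_seed_int64() -> int:
    # 64-bit integers can represent the bound, so this works everywhere
    return int(np.random.randint(0, SEED_HIGH, dtype=np.int64))

try:
    draw_seed_int32()
    int32_failed = False
except ValueError as err:
    int32_failed = True
    print(f"int32 draw failed: {err}")

seed = draw_seed_int64()
print(f"int64 draw succeeded: {seed}")
```

Passing an explicit non-negative seed from your own script sidesteps this line entirely, since the random draw only happens when the seed is -1.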