Space: mrfakename/E2-F5-TTS

runtime error: FFT operations are only supported on MacOS 14+

by hololabs - opened 4 days ago

python3.10/site-packages/torch/functional.py", line 665, in stft
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
RuntimeError: FFT operations are only supported on MacOS 14+

I was hoping to just try this out locally on an M1 Mac with 16 GB of RAM running macOS Ventura 13.6.1.

Before I go down the rabbit hole of updating the OS, can someone (the authors?) confirm that an M1 with 16 GB would be enough to run this?

mrfakename

Owner 4 days ago

Hey @hololabs,
I've tried running it locally on an M1 Mac and it works for me. I think upgrading to macOS 14 should fix the issue.
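
A minimal sketch of a pre-flight check for this error, assuming only what the error message states (FFT operations need macOS 14 or later). The helper names `macos_major` and `fft_likely_supported` are hypothetical and not part of the Space or of PyTorch; the only library call used is the standard `platform.mac_ver()`.

```python
import platform
from typing import Optional


def macos_major(release: str) -> Optional[int]:
    """Return the major version from a release string such as '13.6.1'.

    Returns None when the string is empty or does not start with a number.
    """
    head = release.split(".")[0]
    return int(head) if head.isdigit() else None


def fft_likely_supported(release: str) -> bool:
    """Guess whether the 'macOS 14+' FFT requirement is met.

    An empty release string means we are not on macOS, so the
    macOS-specific requirement from the error message is not checked.
    """
    if not release:
        return True
    major = macos_major(release)
    return major is not None and major >= 14


if __name__ == "__main__":
    # platform.mac_ver() returns (release, versioninfo, machine);
    # release is an empty string on non-macOS systems.
    release = platform.mac_ver()[0]
    if fft_likely_supported(release):
        print("No macOS version problem detected for FFT operations.")
    else:
        print(f"macOS {release} detected: the error above suggests upgrading to 14+.")
```

On the Ventura 13.6.1 machine described above, this check would report that an upgrade is needed before `torch.stft` is called.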

hololabs

4 days ago

I did upgrade, and I can run it now.

However, the output is gibberish: a fast-talking English speaker with blurred words.

How can I troubleshoot this?

Also, how can I save a model so it doesn't have to retrain on my voice.wav over and over again?

Thank you for your time in answering, and for this excellent work.

mrfakename

Owner 4 days ago

Hi,
Would you mind sharing the text, reference audio, and generated audio?
Thanks!

hololabs

4 days ago, edited 4 days ago

Text to generate: hello mrfakename thanks for the great code

Additionally, I found I had to convert the MP3 to this specific WAV format for the Hugging Face demo to work reliably:

ffmpeg -i input.mp3 -ac 1 -ar 16000 -sample_fmt s16 -bitexact -c:a pcm_s16le output.wav

The above errors out on my local M1 Mac (16 GB), updated to the latest OS.

It works on Hugging Face, though.
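
For anyone scripting the same conversion, here is a rough Python sketch that wraps the ffmpeg command quoted above. It assumes `ffmpeg` is installed and on the PATH; the function names `build_ffmpeg_cmd` and `convert_to_wav` are made up for illustration and are not part of the Space.

```python
import subprocess
from typing import List


def build_ffmpeg_cmd(src: str, dst: str) -> List[str]:
    """Build the argument list for the mono, 16 kHz, 16-bit PCM conversion."""
    return [
        "ffmpeg",
        "-i", src,             # input file, e.g. an MP3
        "-ac", "1",            # downmix to a single (mono) channel
        "-ar", "16000",        # resample to 16 kHz
        "-sample_fmt", "s16",  # signed 16-bit samples
        "-bitexact",
        "-c:a", "pcm_s16le",   # little-endian 16-bit PCM audio codec
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> None:
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)


if __name__ == "__main__":
    convert_to_wav("input.mp3", "output.wav")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.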

mrfakename

Owner 3 days ago

Hi,
I've tested this on a local setup and it seems to work for me. I made a couple of changes to the Space that might have affected the results. Would you mind trying again with the latest updates?
Thanks!
