Spaces: TTS-AGI/TTS-Arena



Add Fish Speech #48

by lengyue233 - opened Apr 30


lengyue233

Apr 30

Hi everyone,

We are thrilled to announce that we have open sourced our new text-to-speech model, Fish Speech 1, today! You can find the model and more details on our Hugging Face blog post: https://huggingface.co/blog/lengyue233/fish-speech-1.

We have prepared two demos for you to try out:

The medium pretrain demo, which excels at general speaking, can be found at Fish Audio.
The large SFT demo, which works particularly well on ACGN content, is available on Hugging Face Space.

To better understand our model's performance, we are eager to integrate the medium pretrain model into TTS Arena for evaluation. We believe this will provide valuable insights into how Fish Speech 1 compares to other state-of-the-art TTS models. If the TTS Arena team requires any assistance or support during the integration process, we are more than happy to provide any necessary resources or guidance.

Best regards,
The Fish Audio Team

mrfakename

TTS AGI org Apr 30

Hi, congratulations on your launch!! Are there any plans to switch to an open source license?

lengyue233

May 1

Hi, Fish Speech is an open-source model. The code is available under the BSD-3-Clause license, and the model weights are released under the CC-BY-NC-SA 4.0 license.
Feel free to use it for any non-commercial purpose.

mrfakename

TTS AGI org May 1

Thanks! Are there any plans to release the weights under an open-source license (see the Open Source Definition, OSD)?

lengyue233

May 1

Currently, we don't have any plans to release the weights for commercial use.

lengyue233

May 12


edited May 12

We have a very strong release coming soon; it's close to ElevenLabs now. Some samples here:

Pendrokar

May 12

> We have a very strong release coming soon; it's close to ElevenLabs now. Some samples here:

Given that confident statement, I have to be honest here. While it is better than half of the current models in the Arena, I predict that it will score below StyleTTS and XTTS if added, and nowhere near ElevenLabs. It feels unstable: there is always a slight stutter. 😕

Of course that is for the voting public to decide.

lengyue233

Sep 16

What about now?

Pendrokar

Sep 16

> What about now?

Better. I've added Fish Speech's HF Space to the Arena fork, which, unlike this Space, uses Hugging Face Gradio Spaces to generate the audio. A few of the cached samples should be from Fish Speech (the ⚡ button).
https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena

I made, and keep updating, the fork because I find that the TTS-AGI organization is not genuine about its stated goal.

lengyue233

Sep 16


BTW, did you use some reference audio (or timbre) for Fish Speech?

Pendrokar

Sep 16

> BTW, did you use some reference audio (or timbre) for Fish Speech?

Reference audio. It is the one OpenVoice used to use on this very Space; the zero-shot TTS Spaces all use that voice.
https://huggingface.co/spaces/TTS-AGI/TTS-Arena/discussions/19#65e00cf8121aa0d0b49e8789

Multiple voices per model would be useful to avoid a biased vote, since voters start to notice the connection between a model and its voice. This would not be hard to do with zero-shot TTS.

lengyue233

Sep 16

The issue is that this reference voice lacks energy and emotion, unlike Edge TTS. We'd expect Fish Speech to mimic that flatness, since it's not a semantic-based TTS model: it mimics everything, not just timbre and some pitch/duration like XTTS or Tortoise. For best results, start with the English example in our Space.

lengyue233

Sep 20

Hi @mrfakename, we believe our current version is much better than the previous one. Would you mind giving it a try? https://huggingface.co/spaces/fishaudio/fish-speech-1

Pendrokar

Sep 23

Suddenly switching to another speaker's voice is as bad as hallucinating. This should never, ever happen. 😕

lengyue233

Sep 24

We pushed an update today; it should greatly reduce the chance of the voice changing.

mrfakename

TTS AGI org Oct 17

@lengyue233 Fish Speech is now available on the TTS Arena!

mrfakename changed discussion status to closed Oct 17

Pendrokar

Oct 17

> I made, and keep updating, the fork because I find that the TTS-AGI organization is not genuine about its stated goal.

Faith restored! Great work!
