Add Fish Speech
Hi everyone,
We are thrilled to announce that today we have open-sourced our new text-to-speech model, Fish Speech 1! You can find the model and more details in our Hugging Face blog post: https://huggingface.co/blog/lengyue233/fish-speech-1.
We have prepared two demos for you to try out:
- The medium pretrain demo, which excels at general speaking, can be found at Fish Audio.
- The large SFT demo, which works particularly well on ACGN content, is available on Hugging Face Space.
To better understand our model's performance, we would love to see the medium pretrain model integrated into TTS Arena for evaluation. We believe this will provide valuable insight into how Fish Speech 1 compares to other state-of-the-art TTS models. If the TTS Arena team needs any assistance during the integration process, we are more than happy to provide the necessary resources or guidance.
Best regards,
The Fish Audio Team
Hi, congratulations on your launch!! Are there any plans to switch to an open source license?
Hi, Fish Speech is an open-source model. The code is available under the BSD-3-Clause license, and the model weights are released under the CC-BY-NC-SA 4.0 license.
Feel free to use it for any non-commercial purposes.
Thanks! Are there any plans to release the weights under an open source license (see OSD)?
Currently, we don't have any plan to release the weights for commercial use.
We have a very strong release coming soon; it's close to ElevenLabs now. Some samples here:
Given that statement of confidence, I have to be honest here. While it is better than half of the current models in the Arena, I predict that it will score below StyleTTS and XTTS if added. It is nowhere near ElevenLabs. It feels unstable: there is always a slight stutter.
Of course that is for the voting public to decide.
What about now?
Better. I've added Fish Speech's HF Space to the Arena fork, which, unlike this Space, uses Hugging Face Gradio Spaces to generate the audio. A few of the cached samples (the ⚡ button) should be from Fish Speech.
https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena
I made, and keep updating, the fork because I find that the TTS-AGI organization is not genuine about its stated goal.
BTW, did you use some reference audio (or timbre) for Fish Speech?
Reference audio. It is the same one OpenVoice used on this very Space. The zero-shot TTS Spaces use that voice.
https://huggingface.co/spaces/TTS-AGI/TTS-Arena/discussions/19#65e00cf8121aa0d0b49e8789
Multiple voices per model would help avoid biased votes, since voters start to notice the link between a model and its voice. This would not be hard to do with zero-shot TTS.
The issue is that this reference voice lacks energy and emotion, unlike Edge TTS. We'd expect Fish Speech to mimic that flatness, since it's not a semantic-based TTS model: it should mimic everything, not just timbre and some pitch/duration like XTTS or Tortoise. For best results, start with the English example in our Space.
Hi @mrfakename, we believe our current version is much better than the previous one. Would you mind giving it a try? https://huggingface.co/spaces/fishaudio/fish-speech-1
Suddenly switching to another voice is as bad as hallucinating. This should never, ever happen.
We pushed an update today; it should greatly reduce the chance of the voice changing.
Faith restored! Great work!