An end-to-end (e2e) Voice Language Model by Fish Audio.
Vocal and background audio separator
Fast & efficient ASR, reported to outperform Whisper
F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)