inference without voice cloning
#9 · by gqd · opened
Hey
- All the examples show how to produce output with a reference speaker voice.
- Wondering if it's possible to fine-tune on a speaker voice and then run inference without passing a reference sample, to reduce latency?
Thx
gqd changed discussion title from "usage without voice cloning" to "inference without voice cloning"
Once you have calculated the latents, you can pass the same latents to every inference call thereafter, which reduces inference time.
Please check the code at https://huggingface.co/spaces/coqui/xtts/blob/main/app.py#L233
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=speaker_wav, gpt_cond_len=30, max_ref_length=60)
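For reference, here is a minimal sketch of that pattern, following the XTTS API used in the linked app.py. The checkpoint paths and `speaker.wav` are placeholders; the latent extraction is paid once, and every later call reuses the cached tensors:

```python
import torch
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the (optionally fine-tuned) XTTS checkpoint; paths are placeholders.
config = XttsConfig()
config.load_json("/path/to/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/checkpoint/", eval=True)
model.cuda()

# One-time cost: extract conditioning latents from the reference audio.
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path="speaker.wav", gpt_cond_len=30, max_ref_length=60
)

# Optionally persist them so later runs never need the reference wav at all.
torch.save(
    {"gpt_cond_latent": gpt_cond_latent, "speaker_embedding": speaker_embedding},
    "latents.pt",
)

# Every subsequent call reuses the cached latents instead of re-encoding audio.
out = model.inference(
    text="Hello, this call uses precomputed latents.",
    language="en",
    gpt_cond_latent=gpt_cond_latent,
    speaker_embedding=speaker_embedding,
)
```

A later process can `torch.load("latents.pt")` at startup and skip `get_conditioning_latents` entirely, so the reference sample never enters the serving path.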
Hey @gorkemgoknar
Is it possible to fine-tune and run inference with coqui/XTTS-v2 entirely as a single-speaker model, to remove the additional latency of using the latents?
Or wouldn't that make much of a difference compared to using the precomputed latents as you suggested?
Thx