No indication on how to use it
Hello,
With tortoise_tts you can use download_models to specify a model, but you don't provide the autoregressive, diffusion_decoder, clvp2, cvvp, or vocoder files.
How do I use your model?
Thanks in advance
You need to use a Tortoise TTS fork such as 152334H/tortoise-tts-fast, then pass the --ar-checkpoint argument and point it to the .pth file downloaded from Hugging Face. Note that this model is not very good yet.
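For anyone else landing here, a minimal sketch of how that invocation could be assembled from Python. Only the --ar-checkpoint flag comes from this thread; the script path and the --text and --voice flags are assumptions borrowed from the original tortoise-tts layout, so check them against your checkout of the fork before running anything.

```python
def build_tts_command(checkpoint, text, voice, script="tortoise/do_tts.py"):
    """Build the argument list for running the fork with a custom AR checkpoint.

    `script`, `--text` and `--voice` are assumptions (hypothetical for the fork);
    `--ar-checkpoint` is the flag named in this thread.
    """
    return [
        "python", script,
        "--text", text,
        "--voice", voice,
        "--ar-checkpoint", str(checkpoint),
    ]


if __name__ == "__main__":
    # "path/to/model.pth" is a placeholder for the .pth file you downloaded.
    cmd = build_tts_command("path/to/model.pth", "Bonjour tout le monde.", "my_voice")
    print(" ".join(cmd))
```

Once the printed command matches what the fork actually expects, it can be passed to subprocess.run(cmd, check=True) or simply pasted into a shell.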
Thank you for your answer and for your model, which I was able to use. Indeed, the audio quality is not very good yet.
My new model, V2_9750_gpt.pth, is much better at speaking, but the voice cloning will not be great. I'm running out of ideas for improving it, although I know it is possible to do better.
https://commonvoice.mozilla.org/en/datasets Here you can find 940 hours of French audio in the latest update, "Common Voice Corpus 13.0", which should normally be of very good quality (be careful to select only the validated clips). If you train a new model on this audio, I will be very curious to see the voice cloning results :)
I have already tried large datasets, one of which partly contained CommonVoice data. Testing this dataset on its own is on my list, but I don't expect much from it.
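A small sketch of what "select only validated audios" could look like in practice. It assumes the usual layout of an extracted Common Voice release (a validated.tsv file next to a clips/ directory, with path, sentence and client_id columns); verify this against the archive you download.

```python
import csv
from pathlib import Path


def load_validated_clips(corpus_dir):
    """Return (clip_path, sentence, client_id) for every row of validated.tsv.

    The file name, the clips/ directory and the column names are assumptions
    about the extracted Common Voice layout.
    """
    corpus_dir = Path(corpus_dir)
    clips = []
    with open(corpus_dir / "validated.tsv", newline="", encoding="utf-8") as f:
        # QUOTE_NONE: sentences may contain quote characters that are not CSV quoting.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            clips.append((corpus_dir / "clips" / row["path"], row["sentence"], row["client_id"]))
    return clips


def count_speakers(clips):
    """Number of distinct client_id values, a rough proxy for distinct voices."""
    return len({client_id for _, _, client_id in clips})
```

Counting distinct client_id values gives a quick check of how much speaker variety actually ends up in the training set.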
Hey Snowad, thanks for your work! Are you planning to continue to work on your models, considering 11labs are releasing their French model next month?
With this large dataset of good-quality audio from CommonVoice, the cloning should be much better. In addition, it contains more than 17,000 different voices, which should give good variety.
I don't think I will keep working on these models. The reason is that Tortoise is simply not optimized and its inference time is too long. However, some projects look promising (such as the recreation of VALL-E), and I would be the first to try to make a French version, because 11labs is way too expensive.