No indication on how to use it

by FlashNews - opened Apr 5, 2023

Apr 5, 2023

Hello,

With tortoise_tts you can use download_models to specify a model, but you don't provide the autoregressive, diffusion_decoder, clvp2, cvvp, or vocoder files.
How do I use your model?

Thanks in advance

Snowad

Owner Apr 6, 2023

You need to use a Tortoise TTS fork such as 152334H/tortoise-tts-fast, then pass the --ar-checkpoint argument and point it to the .pth file downloaded from Hugging Face. Note that this model is not very good yet.
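
For readers following along, the step above can be sketched as a small Python helper that assembles the command line. Only the --ar-checkpoint flag comes from this thread; the script path, the checkpoint location, and the --text argument in the usage example are placeholders to be checked against the fork's README, not confirmed parts of its CLI.

```python
import shlex
from pathlib import Path


def build_tts_command(checkpoint, script, extra_args=()):
    """Build the argument list for running a Tortoise fork with a custom
    autoregressive checkpoint.

    `script` and `extra_args` are supplied by the caller because this thread
    does not say which entry point or other options the fork expects.
    """
    checkpoint = Path(checkpoint)
    if checkpoint.suffix != ".pth":
        raise ValueError(f"expected a .pth checkpoint, got {checkpoint.name!r}")
    return [
        "python",
        str(script),
        "--ar-checkpoint",  # the flag named in the reply above
        str(checkpoint),
        *extra_args,
    ]


if __name__ == "__main__":
    # Hypothetical values: adjust the script path and arguments to the fork.
    command = build_tts_command(
        "checkpoints/model.pth",          # placeholder path to the downloaded file
        script="tortoise/do_tts.py",      # assumption, not confirmed in this thread
        extra_args=["--text", "Bonjour"], # assumption, not confirmed in this thread
    )
    print(shlex.join(command))
```

The helper only prints the command, so it can be inspected before it is run (for example with subprocess.run) inside the fork's checkout.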

FlashNews

Apr 11, 2023

Thank you for your answer and for your model, which I was able to use. Indeed, the audio quality is not very good yet.

Snowad

Owner Apr 11, 2023

My new model, V2_9750_gpt.pth, is much better at speaking, but the voice cloning will not be great. I'm running out of ideas for improving it, though I know it is possible to do better.

FlashNews

Apr 16, 2023

edited Apr 16, 2023

https://commonvoice.mozilla.org/en/datasets Here you can find 940 hours of French audio, normally of very good quality, in the latest release, "Common Voice Corpus 13.0" (be careful to select only the validated clips). If you train a new model on this audio, I will be very curious to see the result on voice cloning :)
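
As a rough illustration of "select only the validated clips", here is a minimal sketch. It assumes the extracted French folder contains a validated.tsv file with at least the columns client_id, path, and sentence; the folder name in the usage note is a placeholder. It also counts distinct client_id values as a proxy for the number of voices.

```python
import csv
from pathlib import Path


def load_validated_clips(corpus_dir):
    """Return (clips, speakers) read from validated.tsv in a corpus folder.

    clips is a list of (audio file name, transcript) pairs; speakers is the
    set of client_id values seen on the rows that were kept.
    """
    tsv_path = Path(corpus_dir) / "validated.tsv"
    clips = []
    speakers = set()
    with tsv_path.open(newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: transcripts may contain literal quote characters.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            sentence = (row.get("sentence") or "").strip()
            if not row.get("path") or not sentence:
                continue  # skip rows without an audio file or a transcript
            clips.append((row["path"], sentence))
            speakers.add(row.get("client_id"))
    return clips, speakers
```

Calling `load_validated_clips("cv-corpus/fr")` (placeholder path) would then give the clip list to feed into whatever training pipeline is used, and `len(speakers)` gives the number of distinct contributors among those clips.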

Snowad

Owner Apr 16, 2023

I have already tried training on large datasets, including one that contained part of CommonVoice. Testing this dataset as well is on my list, but I am not very optimistic about it.

MaxHoude

Apr 16, 2023

Hey Snowad, thanks for your work! Are you planning to continue to work on your models, considering 11labs are releasing their French model next month?

FlashNews

Apr 17, 2023

With this large dataset of good-quality audio from CommonVoice, the cloning should be much better. In addition, it contains more than 17,000 different voices, which should give good variety.

Snowad

Owner Apr 17, 2023

MaxHoude wrote: "Hey Snowad, thanks for your work! Are you planning to continue to work on your models, considering 11labs are releasing their French model next month?"

I don't think so. The reason is that Tortoise is simply not optimized and its inference time is too long. However, some projects look promising (like the recreation of VALL-E), and I would be the first to try to make a French version, because 11labs is way too expensive.
