Sentences with more emotional content give a better basis for analysing text-to-speech (TTS) quality.
Testing with these enhanced sentences will likely produce better results than the previous sentences did.
Try it yourself with some of the sentences.

KingNish changed pull request status to open

I agree that there should be a lot more sentences with commas, periods and exclamation marks, as mentioned in #13.

But the current sentences should not be removed.

Also, where do these new sentences come from? What is the source?

Thanks for the contribution, @KingNish. Are these GPT* generated? (That's okay.) I just want to make sure we have the attribution right!

The data was produced using Llama3 and Command R+ on Hugging Chat.

Still, the Harvard sentences should be kept and perhaps even prioritized. The reason is:
I finally understand those Harvard sentences and why they have so few commas. There are times when Pheme correctly pauses mid-sentence, making the sentence more comprehensible. ElevenLabs never does; it plows right through.

These are the very rare cases where Pheme outclasses ElevenLabs, although Pheme is now disabled due to poor overall performance.

Also, you should run all the non-Harvard sentences through the toxicity Python library used by TTS-Arena.
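
A minimal sketch of what that filtering pass could look like. The thread does not name the toxicity library, so the scorer here is a pluggable callable with a hypothetical placeholder (`placeholder_score` and its word list are invented for illustration); the real library's scoring function would be passed in as `score` instead. Harvard sentences are passed through unchecked, as suggested above.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical placeholder word list, for illustration only.
# A real run would use an actual toxicity model instead.
PLACEHOLDER_BLOCKLIST = {"idiot", "stupid"}


def placeholder_score(text: str) -> float:
    """Stand-in scorer: 1.0 if any blocklisted word appears, else 0.0."""
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    return 1.0 if words & PLACEHOLDER_BLOCKLIST else 0.0


def filter_sentences(
    sentences: Iterable[str],
    harvard: Set[str],
    score: Callable[[str], float] = placeholder_score,
    threshold: float = 0.5,
) -> List[str]:
    """Keep Harvard sentences as-is; drop others scoring at or above threshold."""
    kept = []
    for sentence in sentences:
        if sentence in harvard or score(sentence) < threshold:
            kept.append(sentence)
    return kept


harvard = {"The birch canoe slid on the smooth planks."}
candidates = [
    "The birch canoe slid on the smooth planks.",
    "What a lovely day!",
    "You are an idiot!",
]
clean = filter_sentences(candidates, harvard)
```

The threshold of 0.5 is an arbitrary assumption and would need tuning against whatever scores the real library returns.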

On keeping and prioritizing the Harvard sentences, I would argue the opposite. Pauses at commas, periods, and other punctuation contribute to how natural speech sounds. Where punctuation appears corresponds to where we usually pause when speaking. Having sentences without commas would just give models that don't pause an unfair advantage over the others.

Sentences with ! and ? have been appended and can be tested on the forked Space:
https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena

I have some more feedback on what makes the sentences unnatural. Most of the sentences with an exclamation mark have the form:
"<how hUmaN is feeling>, <text to speak>!"

34 sentences quote a character. They could be kept, but it is very rare for a TTS to have prompting capabilities that act out scenes (e.g. ElevenLabs and Parler). See the "Prompting" column here:
https://github.com/Vaibhavs10/open-tts-tracker?tab=readme-ov-file#capability-specifics

That is, unless the sentences are split and sent to the TTS as separate requests.
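
One possible way to do that split, as a rough sketch: separate each sentence into narration and quoted speech so that each part can be sent as its own request. The function name `split_quoted` and the segment labels are hypothetical, and the sketch assumes straight double quotes only; how each segment is then sent to a given TTS is left out, since that depends on the model's API.

```python
import re
from typing import List, Tuple

# Matches a span wrapped in straight double quotes.
QUOTE_RE = re.compile(r'"([^"]+)"')


def split_quoted(sentence: str) -> List[Tuple[str, str]]:
    """Split a sentence into ("narration" | "quote", text) segments."""
    segments = []
    pos = 0
    for match in QUOTE_RE.finditer(sentence):
        # Text before the quote is narration; trim spaces and joining commas.
        before = sentence[pos:match.start()].strip(" ,")
        if before:
            segments.append(("narration", before))
        segments.append(("quote", match.group(1).strip()))
        pos = match.end()
    # Anything left after the last quote is narration too.
    tail = sentence[pos:].strip(" ,")
    if tail:
        segments.append(("narration", tail))
    return segments


parts = split_quoted('She sighed, "I can\'t believe it!"')
```

A sentence with no quotation marks comes back as a single narration segment, so the same function can be applied to the whole sentence list.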

Added more of @KingNish's sentences to the forked Arena. I wrapped them all in double quote marks just to see how various TTS models react to them, in the hope that they stop reading the text as if reading from a book.

Another option for more varied sentences may be Mozilla's Common Voice dataset:
https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0/tree/main/transcript

Its sentences get reviewed by the community there:
https://commonvoice.mozilla.org/en/review

This would also help with adding non-English languages to the Arena.

Ready to merge
This branch is ready to get merged automatically.