better sentence
Enhanced sentences with improved emotion yield better analysis for text-to-speech (TTS) quality.
Testing with these enhanced sentences will likely produce better results than those obtained with previous sentences.
Try it yourself with some sentences.
I agree that there should be many more sentences with commas, periods, and exclamation marks, as mentioned in #13.
But the current sentences should not be removed.
Also, where do these new sentences come from? What is the source?
Still, the Harvard sentences should be kept and perhaps even prioritized. The reason is: I finally understand those Harvard sentences and why they have so few commas. There are times when Pheme correctly pauses mid-sentence, making the sentence more comprehensible. ElevenLabs never does, plows right through.
These are the super rare cases when Pheme outclasses ElevenLabs. Though now Pheme is disabled due to poor overall performance.
Also, you should run all the non-Harvard sentences through the toxicity Python library used by TTS-Arena.
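A minimal sketch of what such a screening pass could look like. The thread does not name the toxicity library, so the scorer is passed in as a plain callable; `filter_sentences`, `keyword_scorer`, and the 0.5 threshold are all hypothetical names and values chosen for illustration, and the keyword scorer is only a stand-in for a real classifier.

```python
from typing import Callable, Iterable


def filter_sentences(
    sentences: Iterable[str],
    score_toxicity: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the Arena
) -> list[str]:
    """Keep only sentences whose toxicity score is below the threshold."""
    return [s for s in sentences if score_toxicity(s) < threshold]


def keyword_scorer(sentence: str) -> float:
    """Stand-in scorer for demonstration; swap in a real toxicity model."""
    blocked = {"idiot", "stupid"}
    # Strip surrounding punctuation so "idiot," still matches "idiot".
    words = {w.strip(".,!?\"'").lower() for w in sentence.split()}
    return 1.0 if words & blocked else 0.0
```

Passing the scorer as an argument keeps the filter independent of whichever library the Arena actually uses, so the Harvard sentences can simply be left out of the input list.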
Still, the Harvard sentences should be kept and perhaps even prioritized. The reason is:
I finally understand those Harvard sentences and why they have so few commas. There are times when Pheme correctly pauses mid-sentence, making the sentence more comprehensible. ElevenLabs never does, plows right through.
I would argue the opposite. Pauses at commas, periods, and other punctuation contribute to how natural speech sounds. The places where punctuation appears correspond to where we usually pause when speaking as well. Having sentences without commas would just give models that don't pause an unfair advantage over the others.
Sentences with ! & ? have been appended and can be tested on the forked Space.
https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena
I have some more feedback on what makes the sentences unnatural. Most sentences with an exclamation mark have the form: "<how hUmaN is feeling>, <text to speak>!"
There are 34 sentences that quote a character. They could be kept, but it is very rare for a TTS to have prompting capabilities that act out scenes (ElevenLabs & Parler).
https://github.com/Vaibhavs10/open-tts-tracker?tab=readme-ov-file#capability-specifics
Prompting
Unless the sentences are split and sent to the TTS as separate requests.
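Splitting a quoting sentence into separate requests could be sketched roughly as follows. This is an illustration only: `split_quoted` is a hypothetical helper, it handles straight double quotes only, and how each segment would then be sent to a given TTS is left out because that depends on the model's API.

```python
import re

# Matches text enclosed in straight double quotes.
QUOTE_PATTERN = re.compile(r'"([^"]*)"')


def split_quoted(sentence: str) -> list[tuple[str, str]]:
    """Split a sentence into ("narration", text) and ("quote", text) segments."""
    segments = []
    position = 0
    for match in QUOTE_PATTERN.finditer(sentence):
        # Text before the quote is narration.
        narration = sentence[position:match.start()].strip()
        if narration:
            segments.append(("narration", narration))
        quote = match.group(1).strip()
        if quote:
            segments.append(("quote", quote))
        position = match.end()
    # Anything after the last quote is narration as well.
    tail = sentence[position:].strip()
    if tail:
        segments.append(("narration", tail))
    return segments
```

For example, `split_quoted('She whispered, "Stay close to me!" and ran.')` yields a narration segment, a quote segment, and a second narration segment, each of which could be synthesized on its own.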
Added more of @KingNish's sentences in the forked Arena. Wrapped them all in double quote marks just to see how various TTS models react to them, in the hope that they stop reading the text as if reading from a book.
The other option for more varied sentences may be Mozilla's Common Voice dataset:
https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0/tree/main/transcript
As they get reviewed by the community there:
https://commonvoice.mozilla.org/en/review
This would also help with adding non-English languages to the Arena.
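As a rough sketch of how reviewed transcripts could be turned into an Arena sentence list: the code below assumes the transcript files are tab-separated with a header row and a `sentence` column, which should be checked against the actual files. `load_sentences` and the 200-character limit are hypothetical choices for illustration.

```python
import csv
import io


def load_sentences(
    tsv_text: str,
    column: str = "sentence",  # assumed column name, verify against the file
    max_chars: int = 200,      # assumed length limit for Arena prompts
) -> list[str]:
    """Return unique, non-empty sentences from TSV text, in file order."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    seen = set()
    sentences = []
    for row in reader:
        text = (row.get(column) or "").strip()
        if text and len(text) <= max_chars and text not in seen:
            seen.add(text)
            sentences.append(text)
    return sentences
```

The same function would work per language, since only the input file changes.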