Text-to-Speech
Fairseq
English
audio

How do you add a 'pause' in the audio?

#5
by garland3 - opened

The model seems to ignore punctuation, which makes the audio more difficult to understand.
Maybe I just need to dive deeper into how the model is trained, but I thought someone might have a quick answer.

You can add full stops where you want pauses. It's not perfect, but it works in most cases.

I really tried to figure out how to do this, but without success.
What can I use as a full stop or EOS in this model?

Adding commas instead of full stops worked for me, which is weird. I thought it might be some Unicode error, so I ran it from PyCharm/IPython and cmd (Windows), but the effect was the same, and the HF Space shows the same issue. It pauses at commas but completely ignores full stops.
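
For anyone who wants to try the comma workaround on longer text, here is a minimal sketch. The helper name `full_stops_to_commas` is made up for illustration and is not part of the model or Fairseq. It assumes a full stop followed by whitespace marks a sentence boundary, so abbreviations such as "Dr. Smith" would also be changed.

```python
import re


def full_stops_to_commas(text: str) -> str:
    """Replace full stops inside the text with commas; keep the final one."""
    text = text.strip()
    # Only a full stop followed by whitespace is replaced, so decimals
    # such as "3.5" and the last full stop of the text are left alone.
    return re.sub(r"\.(?=\s)", ",", text)


print(full_stops_to_commas("Hello there. How are you. Fine."))
# prints: Hello there, How are you, Fine.
```

The result can then be passed to the model as usual; whether the pauses sound natural still depends on the model.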

Yeah, I also found this workaround, but sometimes the result has sighs in places where they sound unnatural (for example, at the end of a sentence).

Pretty sure this behaviour occurs because the model is trained on the LJ Speech Dataset, which only has punctuation at the end of an utterance (see the examples on the LJ Speech webpage). Putting a comma or full stop midway through a sentence takes the model out of its training configuration, and so gives unexpected results!
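
If that explanation is right, one thing that might help is to feed the model one sentence at a time and put the silence in yourself. The sketch below is only an illustration under that assumption: `split_into_utterances` and `synthesize_with_pauses` are hypothetical helpers, `synthesize` stands for whatever function you already use to turn text into a sequence of audio samples, and the 0.4 second pause is an arbitrary default.

```python
import re
from typing import Callable, List, Sequence


def split_into_utterances(text: str) -> List[str]:
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def synthesize_with_pauses(
    text: str,
    synthesize: Callable[[str], Sequence[float]],  # assumed: text -> samples
    sample_rate: int,
    pause_seconds: float = 0.4,  # arbitrary default, tune by ear
) -> List[float]:
    """Synthesize each sentence separately and join them with silence."""
    silence = [0.0] * int(sample_rate * pause_seconds)
    utterances = split_into_utterances(text)
    audio: List[float] = []
    for index, utterance in enumerate(utterances):
        audio.extend(synthesize(utterance))
        # Add silence between sentences, but not after the last one.
        if index < len(utterances) - 1:
            audio.extend(silence)
    return audio
```

Each call then sees a single utterance with punctuation only at the end, which is closer to the training data described above, and the pause length is controlled directly instead of being left to the model.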

# Add a "," after any punctuation mark to solve the problem:

def add_comma_after_punctuation(text: str) -> str:
    # Characters after which a comma should be added
    punctuation_marks = ['.', '!', '?', '(', ')', ':']

    # Go through each punctuation mark and add a comma after every occurrence
    for mark in punctuation_marks:
        text = text.replace(mark, mark + ',')

    return text
