speecht5-tts-demo

Zhenhong committed on Apr 29, 2023

Commit 26ff55d • 1 Parent(s): a46d973

Updated description

Files changed (5)

.gitattributes CHANGED

@@ -1,4 +1,3 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
@@ -32,3 +31,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.7z filter=lfs diff=lfs merge=lfs -text

.gitignore CHANGED

@@ -1,4 +1,4 @@
+.DS_Store
 *.pyc
 __pycache__/
-.DS_Store

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: SpeechT5 Speech Synthesis Demo
+title: Text-to-Speech Demo
 emoji: 👩‍🎤
 colorFrom: yellow
 colorTo: blue

app.py CHANGED

@@ -57,18 +57,13 @@ def predict(text, speaker):
     return (16000, speech)
-title = "SpeechT5: Speech Synthesis"
+title = "Text-to-Speech based on SpeechT5"
 description = """
 The <b>SpeechT5</b> model is pre-trained on text as well as speech inputs, with targets that are also a mix of text and speech.
 By pre-training on text and speech at the same time, it learns unified representations for both, resulting in improved modeling capabilities.
-SpeechT5 can be fine-tuned for different speech tasks. This space demonstrates the <b>text-to-speech</b> (TTS) checkpoint for the English language.
-See also the <a href="https://huggingface.co/spaces/Matthijs/speecht5-asr-demo">speech recognition (ASR) demo</a>
-and the <a href="https://huggingface.co/spaces/Matthijs/speecht5-vc-demo">voice conversion demo</a>.
-Refer to <a href="https://colab.research.google.com/drive/1i7I5pzBcU3WDFarDnzweIj4-sVVoIUFJ">this Colab notebook</a> to learn how to fine-tune the SpeechT5 TTS model on your own dataset or language.
+This space demonstrates the <b>text-to-speech</b> (TTS) checkpoint for the English language.
 <b>How to use:</b> Enter some English text and choose a speaker. The output is a mel spectrogram, which is converted to a mono 16 kHz waveform by the
 HiFi-GAN vocoder. Because the model always applies random dropout, each attempt will give slightly different results.
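
The description above says the vocoder produces a mono 16 kHz waveform, and `predict` returns it as a `(16000, speech)` pair. The following is a minimal, standard-library-only sketch of what such a pair represents and how it could be saved as a WAV file. It does not run the SpeechT5 model: `fake_speech` and `to_wav_bytes` are hypothetical helpers, the sine tone is only a stand-in for real vocoder output, and the assumption that samples are floats in [-1.0, 1.0] is not confirmed by the diff.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16000  # Hz, matching the (16000, speech) pair returned by predict


def fake_speech(seconds=0.5, freq=220.0):
    """Hypothetical stand-in for vocoder output: a sine tone as floats in [-1.0, 1.0]."""
    n = int(SAMPLE_RATE * seconds)
    return [0.3 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


def to_wav_bytes(sample_rate, samples):
    """Encode mono float samples as 16-bit PCM WAV bytes (assumes floats in [-1, 1])."""
    # Clip each sample, scale it to the int16 range, and pack as little-endian.
    pcm = struct.pack(
        "<%dh" % len(samples),
        *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples),
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)  # 16 kHz
        wav.writeframes(pcm)
    return buf.getvalue()


# Same shape as predict's return value: (sample_rate, samples).
sample_rate, speech = SAMPLE_RATE, fake_speech()
wav_bytes = to_wav_bytes(sample_rate, speech)
```

Writing `wav_bytes` to a file such as `out.wav` yields a playable clip whose duration is `len(speech) / sample_rate` seconds.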

requirements.txt CHANGED

@@ -1,8 +1,8 @@
 git+https://github.com/huggingface/transformers.git
 torch
 torchaudio
+sentencepiece
 soundfile
-librosa
 samplerate
-resampy
-sentencepiece
+librosa
+resampy