speechbrain
English
Tacotron2
zero-shot
multi-speaker-tts
pradnya-hf-dev committed on
Commit
e425eb9
1 Parent(s): e0781bc

Update README.md

  1. README.md +27 -0
README.md CHANGED
@@ -34,6 +34,8 @@ Please notice that we encourage you to read our tutorials and learn more about
  
  ### Perform Text-to-Speech (TTS)
  
+ The following is an example of converting text to speech using speaker voice characteristics extracted from reference speech.
+
  ```
  import torchaudio
  from speechbrain.pretrained import MSTacotron2
@@ -57,6 +59,31 @@ waveforms = hifi_gan.decode_batch(mel_outputs)
  torchaudio.save("synthesized_sample.wav", waveforms.squeeze(1).cpu(), 22050)
  ```
  
+ If you want to generate a random voice, you can use the following:
+
+ ```
+ import torchaudio
+ from speechbrain.pretrained import MSTacotron2
+ from speechbrain.pretrained import HIFIGAN
+
+ # Initialize TTS (MSTacotron2) and Vocoder (HiFi-GAN)
+ ms_tacotron2 = MSTacotron2.from_hparams(source="speechbrain/tts-mstacotron2-libritts", savedir="tmpdir_tts")
+ hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-libritts-22050Hz", savedir="tmpdir_vocoder")
+
+ # Required input
+ INPUT_TEXT = "Mary had a little lamb"
+
+ # Running the Zero-Shot Multi-Speaker Tacotron2 model to generate a mel-spectrogram
+ mel_outputs, mel_lengths, alignments = ms_tacotron2.generate_random_voice(INPUT_TEXT)
+
+ # Running Vocoder (spectrogram-to-waveform)
+ waveforms = hifi_gan.decode_batch(mel_outputs)
+
+ # Save the waveform
+ torchaudio.save("synthesized_sample.wav", waveforms.squeeze(1).cpu(), 22050)
+ ```
+
+
  If you want to generate multiple sentences in one shot, you can do it this way:
  Note: The model internally reorders the input texts in decreasing order of their lengths.
  
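The note about reordering matters when matching batch outputs back to their inputs. The following is a minimal sketch in plain Python of what "decreasing order of their lengths" means and how the original order could be restored afterwards. It is not the model's internal code: the helper names `sort_by_length_desc` and `restore_order` are hypothetical, measuring length in characters is an assumption, and the README excerpt does not say whether the model returns outputs in sorted or original order.

```python
def sort_by_length_desc(texts):
    """Return the texts sorted longest-first, plus the permutation used.

    Hypothetical helper: length is measured in characters here, which is
    an assumption about how the model compares input lengths.
    """
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]), reverse=True)
    return [texts[i] for i in order], order


def restore_order(items, order):
    """Map per-text results from sorted order back to the original order."""
    restored = [None] * len(items)
    for sorted_pos, original_pos in enumerate(order):
        restored[original_pos] = items[sorted_pos]
    return restored


texts = ["Mary had a little lamb", "Hi", "Its fleece was white as snow"]
sorted_texts, order = sort_by_length_desc(texts)

# Stand-in for per-text model outputs, produced in sorted order.
outputs = [t.upper() for t in sorted_texts]

# Back in the order the caller supplied the texts.
restored = restore_order(outputs, order)
```

Keeping the permutation alongside the sorted batch is what lets each output be paired with the sentence that produced it, regardless of how the batch was ordered internally.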