update description
infer_onnx.py +7 -7

infer_onnx.py CHANGED
@@ -37,6 +37,7 @@ model_matcha_mel= onnxruntime.InferenceSession(str(MODEL_PATH_MATCHA_MEL), sess_
 model_vocos = onnxruntime.InferenceSession(str(MODEL_PATH_VOCOS), sess_options=sess_options, providers=["CPUExecutionProvider"])
 model_matcha = onnxruntime.InferenceSession(str(MODEL_PATH_MATCHA), sess_options=sess_options, providers=["CPUExecutionProvider"])
 
+
 def vocos_inference(mel):
 
     with open(CONFIG_PATH, "r") as f:
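For context on this hunk: vocos_inference() takes a mel spectrogram produced by the Matcha model and pushes it through the Vocos ONNX session. The snippet below is a minimal, self-contained sketch of that onnxruntime call pattern only; the helper name, model path, input shape, and the assumption of a single waveform output are hypothetical and not taken from this repo (the real vocos_inference also reads CONFIG_PATH for its post-processing).

import numpy as np
import onnxruntime

def run_vocoder(onnx_path: str, mel: np.ndarray) -> np.ndarray:
    # Hypothetical helper, not from the repo: load a vocoder ONNX model and
    # run it on a (batch, n_mels, frames) mel spectrogram.
    sess_options = onnxruntime.SessionOptions()
    session = onnxruntime.InferenceSession(
        onnx_path, sess_options=sess_options, providers=["CPUExecutionProvider"]
    )
    input_name = session.get_inputs()[0].name        # avoid hard-coding the input name
    outputs = session.run(None, {input_name: mel.astype(np.float32)})
    return outputs[0]                                # assuming the first output is the audio

# Example usage (requires an actual vocoder .onnx file on disk):
# audio = run_vocoder("vocos.onnx", np.zeros((1, 80, 200), dtype=np.float32))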
@@ -88,6 +89,7 @@ def vocos_inference(mel):
 
     return y
 
+
 def tts(text:str, spk_id:int):
     sid = np.array([int(spk_id)]) if spk_id is not None else None
     text_matcha , text_lengths = process_text(0,text,"cpu")
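The tts() function in this hunk is the glue between the two sessions: encode the text, run the Matcha acoustic model to get a mel spectrogram, then vocode it. The sketch below only mirrors that call pattern with dummy stand-ins; every stand-in name and shape is made up for illustration, and the repo's process_text() behaviour is not reproduced here.

import numpy as np

def fake_process_text(text: str):
    # Stand-in for the repo's process_text(): returns token ids and their length.
    ids = np.array([[ord(c) % 100 for c in text]], dtype=np.int64)
    return ids, np.array([ids.shape[1]], dtype=np.int64)

def fake_acoustic_model(ids, lengths, sid):
    # Stand-in for the Matcha ONNX session (sid unused here): dummy (1, 80, T) mel.
    return np.zeros((1, 80, int(lengths[0]) * 5), dtype=np.float32)

def fake_vocoder(mel):
    # Stand-in for vocos_inference(): dummy waveform, ~256 samples per frame.
    return np.zeros((1, mel.shape[-1] * 256), dtype=np.float32)

def tts(text: str, spk_id: int):
    sid = np.array([int(spk_id)]) if spk_id is not None else None
    ids, lengths = fake_process_text(text)
    mel = fake_acoustic_model(ids, lengths, sid)
    return fake_vocoder(mel)

print(tts("hola món", 0).shape)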
@@ -129,20 +131,18 @@ title = """
     <div
       style="display: inline-flex; align-items: center; gap: 0.8rem; font-size: 1.75rem;"
     > <h1 style="font-weight: 900; margin-bottom: 7px; line-height: normal;">
-      TTS
+      TTS Vocoder Comparison
     </h1> </div>
 </div>
 """
 
 description = """
-
-training and inference efficiency and naturalness by introducing adversarial learning into the duration predictor. The transformer
-block was added to the normalizing flows to capture the long-term dependency when transforming the distribution.
-The synthesis quality was improved by incorporating Gaussian noise into the alignment search.
-
+
 🍵 Matcha-TTS, a new approach to non-autoregressive neural TTS, that uses conditional flow matching (similar to rectified flows) to speed up ODE-based speech synthesis
 
-
+For vocoders we use the Hifigan universal version and Vocos trained on a Catalan set of ~28 hours.
+
+Matcha was trained using the openslr69 and festcat datasets.
 """
 
 article = "Training and demo by BSC."
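The new description's claim about conditional flow matching can be made concrete: the model learns a vector field, and synthesis integrates an ODE from Gaussian noise toward a mel spectrogram in a small number of solver steps. The toy Euler loop below only illustrates that idea; the vector field is a made-up stand-in, not Matcha-TTS's network.

import numpy as np

def fake_vector_field(x, t):
    # Stand-in for the learned network v(x, t | text); here it just pulls x toward zero.
    return -x * t

def euler_sample(shape, n_steps=10, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.standard_normal(shape).astype(np.float32)   # start from Gaussian noise at t=0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * fake_vector_field(x, t)             # one Euler step of dx/dt = v(x, t)
    return x

mel = euler_sample((1, 80, 200))
print(mel.shape)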
|