Benjamin-png committed on
Commit 15f0ff3
1 Parent(s): bb5aebf

Update README.md

Files changed (1)
  1. README.md +19 -15
README.md CHANGED
@@ -21,19 +21,8 @@ You can check out the code and process used in the fine-tuning by visiting the [
 
 You can load and use the model directly from the Hugging Face model hub using either the `pipeline` API or by manually downloading the model and tokenizer.
 
- ### 1. Using the `pipeline` API
-
- ```python
- from transformers import pipeline
-
- # Load the fine-tuned model
- tts = pipeline("text-to-speech", model="Benjamin-png/swahili-mms-tts-finetuned")
-
- # Generate speech from text
- speech = tts("Habari, karibu kwenye mfumo wetu wa kusikiliza kwa Kiswahili.")
- ```
-
- ### 2. Download and Run the Model Directly
+
+ ### 1. Download and Run the Model Directly
 
 You can also download the model and tokenizer manually and run the text-to-speech pipeline without the Hugging Face `pipeline` helper. Here's how:
 
@@ -41,8 +30,8 @@ You can also download the model and tokenizer manually and run the text-to-speec
 import torch
 import numpy as np
 import scipy.io.wavfile
- from transformers import AutoTokenizer
- from vits_model import VitsModel # Assuming VitsModel is the class for this TTS model
+ from transformers import VitsModel, AutoTokenizer
+
 
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 model_name = "Benjamin-png/swahili-mms-tts-finetuned"
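
Note: the diff context shows only the top of the README's direct-usage snippet (the imports above) and its final line in the next hunk (the `scipy.io.wavfile.write(...)` call); the tokenization and inference code in between is outside this commit's context. As a rough sketch of how the new `from transformers import VitsModel, AutoTokenizer` line is typically wired up, assuming the standard `transformers` VITS API (`tokenizer(...)`, the `.waveform` output) and an illustrative output filename:

```python
import torch
import scipy.io.wavfile
from transformers import VitsModel, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name = "Benjamin-png/swahili-mms-tts-finetuned"

# Load the fine-tuned VITS checkpoint and its tokenizer from the Hub
model = VitsModel.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tokenize Swahili text and synthesise the waveform
text = "Habari, karibu kwenye mfumo wetu wa kusikiliza kwa Kiswahili."
inputs = tokenizer(text, return_tensors="pt").to(device)
with torch.no_grad():
    output = model(**inputs).waveform  # shape: (batch, num_samples)

# Write the waveform to a WAV file at the model's configured sampling rate
output_np = output.squeeze().cpu().numpy()
audio_file_path = "swahili_speech.wav"
scipy.io.wavfile.write(audio_file_path, rate=model.config.sampling_rate, data=output_np)
```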
@@ -67,6 +56,21 @@ output_np = output.squeeze().cpu().numpy()
 scipy.io.wavfile.write(audio_file_path, rate=model.config.sampling_rate, data=output_np)
 ```
 
+
+ ### 2. Using the `pipeline` API
+
+ ```python
+ from transformers import pipeline
+
+ # Load the fine-tuned model
+ tts = pipeline("text-to-speech", model="Benjamin-png/swahili-mms-tts-finetuned")
+
+ # Generate speech from text
+ speech = tts("Habari, karibu kwenye mfumo wetu wa kusikiliza kwa Kiswahili.")
+ ```
+
+
+
 ### Saving and Playing the Audio
 
 To save and play the audio, you can use the same methods mentioned above:
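
The hunk above moves the `pipeline` example below the direct-usage section, and the diff cuts off just before the README's "Saving and Playing the Audio" code. For completeness, here is a hedged sketch of writing the pipeline's return value to disk; it assumes the standard `transformers` text-to-speech pipeline output (a dict with `audio` and `sampling_rate` keys) and an illustrative filename:

```python
import scipy.io.wavfile
from transformers import pipeline

tts = pipeline("text-to-speech", model="Benjamin-png/swahili-mms-tts-finetuned")
speech = tts("Habari, karibu kwenye mfumo wetu wa kusikiliza kwa Kiswahili.")

# The pipeline returns the raw waveform plus its sampling rate
scipy.io.wavfile.write(
    "swahili_speech.wav",
    rate=speech["sampling_rate"],
    data=speech["audio"].squeeze(),
)
```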
@@ -103,7 +107,7 @@ pip install torch transformers numpy soundfile scipy pydub
 
 If you're interested in reproducing the fine-tuning process or using the model for similar purposes, you can check out the Google Colab notebook that outlines the entire process:
 
- - [Google Colab Notebook](upload file to Google Drive and provide the link here)
+ - [Google Colab Notebook](https://colab.research.google.com/drive/1dK1a814UqDnXnM5Rz6NBmk-vmhdN9M4f#scrollTo=iG6IrVva27uT)
 
 The notebook includes detailed steps on how to fine-tune the MMS model for Swahili TTS.
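
The `pip install` line in this hunk's header lists `pydub`, which the README's elided "Saving and Playing the Audio" section presumably uses for playback. A minimal sketch under that assumption, reusing the WAV file written in the examples above:

```python
from pydub import AudioSegment
from pydub.playback import play

# Load the generated WAV file and play it on the default audio device
sound = AudioSegment.from_wav("swahili_speech.wav")
play(sound)
```

Playback with `pydub` needs an audio backend (for example `simpleaudio` or `ffplay`) available on the system.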
 
 
113