Unable to train small model from scratch

#16

by RamNaamSatyaHai - opened Sep 17, 2023

Sep 17, 2023

I am using your fine-tuning script, but instead of loading the pre-trained model, I create a model from the default configuration:
from transformers import WhisperConfig, WhisperForConditionalGeneration
configuration = WhisperConfig()
model = WhisperForConditionalGeneration(configuration)
Afterward, I trained the model on the Hindi subset of the common_voice_11_0 dataset.
But the output was all gibberish, with a WER of 2360.
Can you tell me why I can't train the model this way? Creating the model from the default config should give me random weights, and after training on Hindi I would expect the model to at least output reasonable transcriptions.
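A WER above 100 is possible. Word error rate counts substitutions (S), deletions (D) and insertions (I) against the number of reference words N, i.e. WER = (S + D + I) / N, so a model that emits long runs of gibberish accumulates insertions without any upper bound. The sketch below is a minimal word-level WER computed via Levenshtein distance. It is written here as a stand-in for the `evaluate`/`jiwer` WER metric, not as that library's actual implementation, and the example strings are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level WER = (S + D + I) / N via Levenshtein distance over words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Hypothetical example: 3 reference words vs. 10 unrelated output words
# gives 3 substitutions + 7 insertions = 10 edits.
ref = "the cat sat"
hyp = " ".join(["la"] * 10)
print(f"{100 * word_error_rate(ref, hyp):.1f}")  # 333.3
```

If the script reports WER as a percentage (100 times the ratio), a value of 2360 would correspond to roughly 23.6 edits per reference word, which is consistent with long hallucinated outputs rather than a bug in the metric itself.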

Vivekup

Sep 28, 2023

I am facing the same problem: a model created with the default configuration produces nonsensical output.

sanchit-gandhi

Owner Sep 28, 2023

Hey @RamNaamSatyaHai, training from randomly initialised weights requires significantly more data to reach convergence. The Common Voice dataset simply doesn't have enough training data for you to train a model from scratch: it only has tens of hours of data, whereas you need closer to thousands of hours to train from scratch.

Instead, if you load the model with pre-trained weights, the model already has good knowledge of the ASR task, so you can fine-tune with relatively little data. This second approach is called transfer learning, and is what makes fine-tuning models on low-resource languages possible.
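To check how much audio a training split actually contains (the "tens of hours" versus "thousands of hours" above), you can sum the clip durations. The sketch below assumes each clip is available as an array of samples at Whisper's 16 kHz input rate; for a `datasets` Audio column, the per-clip length would be `len(example["audio"]["array"])`. The helper name `total_hours` is hypothetical.

```python
def total_hours(clip_num_samples, sampling_rate=16_000):
    """Return the total duration in hours of clips given their lengths in samples."""
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be positive")
    total_seconds = sum(n / sampling_rate for n in clip_num_samples)
    return total_seconds / 3600


# Hypothetical example: two 30-minute clips at 16 kHz.
print(total_hours([16_000 * 1800, 16_000 * 1800]))  # 1.0
```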

Vivekup

Sep 29, 2023

Thanks a lot @sanchit-gandhi for your input.

Vivekup

Sep 29, 2023

@sanchit-gandhi: We can certainly fine-tune the existing model, but the challenge lies in adding support for a language that, as far as I understand, Whisper does not currently support. Is it possible to add language support to Whisper, and if so, what steps would that involve?

sanchit-gandhi

Owner Sep 29, 2023

It's possible to add a new language during fine-tuning. Even in this case, it's better to fine-tune the original Whisper model, rather than train from scratch, since you'll be able to leverage the knowledge the Whisper model has in the other languages it was trained on. See this guide for an example: https://huggingface.co/learn/audio-course/chapter5/fine-tuning

Vivekup

Sep 29, 2023

Thanks @sanchit-gandhi for your input and for sharing the article.

RamNaamSatyaHai changed discussion status to closed Oct 3, 2023

RamNaamSatyaHai

Oct 17, 2023

Hey @sanchit-gandhi, I have tried the example above: https://huggingface.co/learn/audio-course/chapter5/fine-tuning
However, I want to add a new language without training it on top of one of the pre-trained languages.
Is there a way to do so?

RamNaamSatyaHai changed discussion status to open Oct 17, 2023

RamNaamSatyaHai

Oct 18, 2023

Hey @sanchit-gandhi, one more question: when I run the following code, the language code for both "Castilian Spanish" and "Spanish" is "es".
Could that be the reason the Whisper base model outputs Spanish for both dialects?
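As far as I know, Whisper's vocabulary has a single `<|es|>` language token for Spanish, and the Hugging Face Whisper tokenizer lists "castilian" as an alias that maps to "es" (in its `TO_LANGUAGE_CODE` table). The sketch below uses a small hand-written subset of such a table, an illustrative assumption rather than the library's actual code, to show that both names resolve to the same prompt token. If that holds, the language prompt alone cannot steer the model toward one dialect over the other.

```python
# Illustrative subset only (assumed to mirror the alias entries in the
# transformers Whisper tokenizer); this is NOT the full language table.
LANGUAGES = {"es": "spanish", "hi": "hindi", "en": "english"}
TO_LANGUAGE_CODE = {
    **{name: code for code, name in LANGUAGES.items()},
    "castilian": "es",  # alias: there is no separate Castilian token
}


def language_token(language: str) -> str:
    """Map a language name or code to a Whisper-style prompt token like '<|es|>'."""
    key = language.lower()
    if key in LANGUAGES:  # already a code, e.g. "es"
        code = key
    elif key in TO_LANGUAGE_CODE:  # a name or alias, e.g. "castilian"
        code = TO_LANGUAGE_CODE[key]
    else:
        raise ValueError(f"Unsupported language: {language!r}")
    return f"<|{code}|>"


print(language_token("Spanish"), language_token("Castilian"))  # <|es|> <|es|>
```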
