Unable to train small model from scratch

#16
by RamNaamSatyaHai - opened

I am using your fine-tuning script, but instead of loading the pre-trained model I create a model from the default configuration:
from transformers import WhisperConfig, WhisperForConditionalGeneration
configuration = WhisperConfig()
model = WhisperForConditionalGeneration(configuration)  # randomly initialised weights
Afterwards, I trained the model on the Hindi subset of common_voice_11_0.
But the output was all gibberish, with a WER of 2360.
Can you tell me why I can't train the model this way? Creating the model from the default config should give me random weights, so after training on Hindi I would expect the model to at least output correct transcriptions.
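A WER far above 100 is possible because insertions count as errors: WER is (substitutions + deletions + insertions) divided by the number of reference words, so a model that emits long runs of gibberish can go well past 100%. Assuming the script reports 100 × WER (as the Hugging Face Whisper fine-tuning examples do), 2360 would mean roughly 23.6 errors per reference word. The function below is a minimal pure-Python sketch of word-level WER via edit distance, for illustration only; it is not the metric implementation the script uses.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j]: minimum word edits turning ref[:i] into hyp[:j] (Levenshtein distance).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# A 3-word reference against 9 words of gibberish: 3 substitutions + 6 insertions.
print(100 * word_error_rate("the cat sat", "la la la la la la la la la"))  # 300.0
```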

I am facing the same problem: a model created with the default configuration produces nonsensical output.

Hey @RamNaamSatyaHai - training from randomly initialised weights requires significantly more data to reach convergence. The Common Voice dataset simply doesn't have enough training data to train a model from scratch: it only has tens of hours of data, whereas training from scratch needs closer to thousands of hours.

Instead, if you load the model with pre-trained weights, the model already has good knowledge of the ASR task, so you can fine-tune with relatively little data. This second approach is called transfer learning, and is what makes fine-tuning models on low-resource languages possible.
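To make the transfer-learning argument concrete, here is a toy analogy in plain Python with no Whisper involved: a linear model is "pre-trained" on plenty of data from one task, then fine-tuned on just five examples of a related task, and compared with the same model trained on those five examples from a random initialisation. All weights, data sizes and learning rates are made-up assumptions chosen only for illustration.

```python
import random


def predict(w, x):
    """Linear model: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    """Mean squared error of weights w on (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def sgd(w, data, lr=0.05, epochs=1):
    """Plain per-example gradient descent on squared error; returns new weights."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            w = [wi - lr * 2 * err * xi for wi, xi in zip(w, x)]
    return w


def make_data(w_true, n, rng):
    """Noise-free samples y = w_true . x with x drawn uniformly from [-1, 1]."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in w_true]
        data.append((x, predict(w_true, x)))
    return data


rng = random.Random(0)
w_source = [1.0, -2.0, 0.5, 3.0]  # stands in for the large pre-training task
w_target = [1.1, -1.9, 0.4, 3.2]  # related, low-resource task

source = make_data(w_source, 1000, rng)     # plenty of data
target_train = make_data(w_target, 5, rng)  # very little data
target_test = make_data(w_target, 200, rng)

pretrained = sgd([0.0] * 4, source, epochs=5)         # "pre-training"
random_init = [rng.uniform(-1, 1) for _ in range(4)]  # "from scratch"

finetuned = sgd(pretrained, target_train, epochs=2)
scratch = sgd(random_init, target_train, epochs=2)

print(f"fine-tuned from pre-trained: {mse(finetuned, target_test):.4f}")
print(f"trained from scratch:        {mse(scratch, target_test):.4f}")
```

With the same tiny training budget, the run that starts from the pre-trained weights ends much closer to the target task, mirroring the argument above at a vastly smaller scale.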

Thanks a lot @sanchit-gandhi for your input.

@sanchit-gandhi: While we can certainly fine-tune the existing model, the challenge lies in adding support for a language that, as far as I understand, Whisper does not currently support. Is it possible to extend Whisper's language support, and if so, what steps would that involve?

It's possible to add a new language during fine-tuning. Even in this case, it's better to fine-tune the original Whisper model, rather than train from scratch, since you'll be able to leverage the knowledge the Whisper model has in the other languages it was trained on. See this guide for an example: https://huggingface.co/learn/audio-course/chapter5/fine-tuning
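Some context on why the choice of language matters here: Whisper's multilingual decoder is conditioned on a short prefix of special tokens, namely <|startoftranscript|>, a language token, a task token (<|transcribe|> or <|translate|>) and, optionally, <|notimestamps|>, and its vocabulary only contains language tokens for the languages it was pre-trained on. In the simplest setup, a language without its own token is therefore fine-tuned under the token of a closely related pre-trained language. The function below is a hypothetical, stdlib-only sketch of how such a prefix is assembled; the set of codes is a tiny illustrative subset, and this is not the transformers API.

```python
# Illustrative subset of Whisper's pre-trained language codes (not the full list).
PRETRAINED_CODES = {"en", "es", "hi", "si"}


def decoder_prefix(code: str, task: str = "transcribe", timestamps: bool = False) -> list[str]:
    """Hypothetical helper: build the special-token prefix the decoder is conditioned on."""
    if code not in PRETRAINED_CODES:
        # No pre-trained token exists for this language, so the caller has to
        # fall back to the code of a related pre-trained language.
        raise ValueError(f"no pre-trained language token for {code!r}")
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task {task!r}")
    prefix = ["<|startoftranscript|>", f"<|{code}|>", f"<|{task}|>"]
    if not timestamps:
        prefix.append("<|notimestamps|>")
    return prefix


# Hindi already has its own token, so it can be used directly.
print(decoder_prefix("hi"))
```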

Thanks @sanchit-gandhi for your input and for sharing the article.

RamNaamSatyaHai changed discussion status to closed

Hey @sanchit-gandhi, I have tried the example above: https://huggingface.co/learn/audio-course/chapter5/fine-tuning
But I want to add a language without training it on top of one of the pre-trained languages.
Is there a way to do so?

RamNaamSatyaHai changed discussion status to open

Hey @sanchit-gandhi, one more question: when I run the following code, the language code for both "Castilian Spanish" and "Spanish" is "es".
Could this be the reason that the Whisper base model outputs Spanish for both dialects?
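For context on that question: to the best of my knowledge, Whisper's tokenizer (both in openai/whisper and in the transformers port) resolves language names through a name-to-code table in which "castilian" is listed as an alias of "es", next to "spanish". Both names therefore select the same <|es|> token, and there is no separate token that could distinguish the two. The sketch below reproduces only a tiny, hand-copied subset of such a table to show the lookup; the helper `language_token` is hypothetical.

```python
# Tiny hand-copied subset of a Whisper-style code -> name table (not the full list).
LANGUAGES = {"en": "english", "es": "spanish", "hi": "hindi", "ca": "catalan"}

# Extra names that map onto an existing code instead of getting their own token.
ALIASES = {"castilian": "es", "valencian": "ca"}

# Name -> code lookup: canonical names first, then aliases.
TO_LANGUAGE_CODE = {**{name: code for code, name in LANGUAGES.items()}, **ALIASES}


def language_token(name: str) -> str:
    """Hypothetical helper: turn a language name into its special token."""
    code = TO_LANGUAGE_CODE.get(name.lower())
    if code is None:
        raise ValueError(f"no language token for {name!r}")
    return f"<|{code}|>"


print(language_token("Spanish"), language_token("Castilian"))  # <|es|> <|es|>
```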
