Is it better to train Whisper large V3 or Whisper turbo?

by GALEON-AI - opened


Thank you so much for this efficient and impressive work! 😊

I'm relatively new to this type of model, and I have a few questions. I am looking to train a large Whisper model specifically tailored for healthcare-related vocabulary. After that, I plan to use it for real-time voice processing with Whisper.

In this context, do you think it would be more beneficial to directly train the large V3 turbo model, or should I train the regular large V3 version and then distill it myself for better performance? If so, could you explain why that approach might be preferable?

Thanks again for the quick implementation of the large V3 turbo model—it's much appreciated!

Best regards.

Not sure about Turbo-V3, but I did find that a fine-tuned medium.en was extremely quick, capable of near-live transcription, and produced fewer hallucinations than large-v3.


What approach do you use for live transcription? I’ve tried two methods, and one has worked better than the other.

First, I used Voice Activity Detection (VAD), where I start transcribing the audio after detecting a pause in speech. It’s not exactly real-time transcription, but it works reasonably well.
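
For anyone reading along, here is a minimal sketch of that pause-based approach. It is not the exact code I run: it uses a simple energy threshold in place of a trained VAD model, and it assumes 16 kHz mono audio already split into short frames of float samples in [-1, 1]. The names `segment_on_pauses` and `transcribe_stream`, the `transcribe` callback, and the threshold values are all hypothetical and would need tuning for real audio.

```python
import math
from typing import Callable, Iterable, Iterator, List, Sequence

Frame = Sequence[float]  # one short chunk of samples, e.g. 10 ms at 16 kHz


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one audio frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_on_pauses(
    frames: Iterable[Frame],
    threshold: float = 0.02,       # assumed energy level separating speech from silence
    min_silence_frames: int = 10,  # assumed pause length (in frames) that ends an utterance
) -> Iterator[List[Frame]]:
    """Yield one list of frames per utterance, closing it once a long pause is seen."""
    utterance: List[Frame] = []
    silent_run = 0
    for frame in frames:
        if rms(frame) >= threshold:
            utterance.append(frame)
            silent_run = 0
        elif utterance:
            # Keep short pauses inside the utterance; leading silence is skipped.
            utterance.append(frame)
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Drop the trailing silence before handing the utterance over.
                yield utterance[: len(utterance) - silent_run]
                utterance = []
                silent_run = 0
    if utterance:
        # Flush whatever is left when the stream ends.
        yield utterance[: len(utterance) - silent_run]


def transcribe_stream(
    frames: Iterable[Frame],
    transcribe: Callable[[List[Frame]], str],  # hypothetical wrapper around the ASR model
    **vad_kwargs,
) -> List[str]:
    """Run the (hypothetical) transcribe callback once per detected utterance."""
    return [transcribe(u) for u in segment_on_pauses(frames, **vad_kwargs)]
```

The trade-off is the one described above: text only appears after the speaker pauses, so latency is at least `min_silence_frames` frames plus the model's inference time, but the model always sees a complete utterance.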

Then I tried the approach described in this YouTube video: [link]. Unfortunately, I experienced a lot of issues with hallucinations in the transcription.

I’m curious to know how you handle live transcription. What methods do you find most effective?

BTW, I work with French, and I think that's why the medium model doesn't work as well for live transcription as the English-only version.

Thanks for your response!
