large-v3

by arabcoders - opened Nov 6, 2023

Nov 6, 2023 • edited Nov 6, 2023

Hello,

Thank you for your improved models. Is it possible to apply the same fine-tuning to the newly released Whisper large-v3 model? If not, could you release the steps to reproduce your results so I can try it myself?

Thank you.

P.S.: Sorry, this was meant to be posted in the /clu-ling/whisper-large-v2-japanesee-5k-steps repo.

elsayedissa

clu-ling org Nov 12, 2023

Hello,

Thank you for contacting us. I tried to attach the files I used to fine-tune large-v2 here, but I could not due to restrictions on specific files. I used the seq-to-seq script here: https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition

You can use it with your own dataset and any Whisper version.
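For readers following along, here is a minimal sketch of how that script might be pointed at large-v3. It only assembles and prints the command line; it does not start training. The flag names follow the example in the linked speech-recognition README and may differ between transformers versions. The dataset ID, the Japanese config, the output directory, and every hyperparameter value are illustrative assumptions, not the settings used for the model in this repo (the 5000 steps simply echo the "5k-steps" in the repo name). `build_finetune_command` is a hypothetical helper.

```python
import shlex


def build_finetune_command(model_id, dataset_name, dataset_config,
                           language, output_dir, max_steps=5000):
    """Assemble an argument list for run_speech_recognition_seq2seq.py.

    Hypothetical helper: all values below are placeholders to adjust,
    not the settings used by the model authors.
    """
    options = {
        "model_name_or_path": model_id,
        "dataset_name": dataset_name,
        "dataset_config_name": dataset_config,
        "language": language,
        "task": "transcribe",
        "train_split_name": "train+validation",
        "eval_split_name": "test",
        "text_column_name": "sentence",   # transcript column in Common Voice
        "max_steps": max_steps,           # assumed value
        "per_device_train_batch_size": 8, # assumed value
        "learning_rate": 1e-5,            # assumed value
        "warmup_steps": 500,              # assumed value
        "output_dir": output_dir,
    }
    cmd = ["python", "run_speech_recognition_seq2seq.py"]
    cmd += [f"--{name}={value}" for name, value in options.items()]
    # Boolean switches take no value.
    cmd += ["--do_train", "--do_eval", "--predict_with_generate", "--fp16"]
    return cmd


command = build_finetune_command(
    model_id="openai/whisper-large-v3",
    dataset_name="mozilla-foundation/common_voice_11_0",  # assumed dataset ID
    dataset_config="ja",                                  # assumed config
    language="japanese",
    output_dir="./whisper-large-v3-ja",                   # hypothetical path
)

# Print a copy-pasteable shell command; run it from the example directory.
print(shlex.join(command))
```

Switching Whisper versions should only require changing `model_id`; the rest of the command stays the same.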

Thanks.

arabcoders

Nov 13, 2023

Thank you. If I may bother you a bit more, could you go into a little more detail about your data? Is it regular audio files with subtitles, or something else? It would also be really helpful if you could expand on the commands you used. I found your model to be a considerable improvement over regular Whisper large-v2, at least for Japanese.

Thank you again.

P.S.: If the reason you aren't able to upload the data is its size or something similar, you can contact me and I can provide private hosting for the data.

elsayedissa

clu-ling org Nov 15, 2023

Thank you. The data is freely available. It is the Common Voice dataset: https://commonvoice.mozilla.org/en/datasets

You can access and download it in any language. If you want to share your email, I can send you the scripts I used.

Thanks.

arabcoders

Nov 15, 2023

Good to hear. I attempted to do it with large-v3, but it kept failing for some reason. I would really appreciate it if you could send the scripts you used. You can contact me at [email protected]

Thank you again, and sorry for taking up your time.
