nvidia/canary-1b · Transcription repeats the same word

Apr 8

Hi,

Thanks for making this model available!
I tried to implement it, and it works like a charm on audio up to 1 minute long. Unfortunately, if I try to transcribe a recording longer than 1 minute, it only transcribes the first 2-3 sentences and then repeats the word where it got stuck for the rest of the text. I have a 1-hour recording I'm trying to transcribe; if I crop it to 1 minute, the result is perfect. At 5 minutes the word repetition already appears, and the model can't even transcribe the first minute properly. I used the basic example code for English transcription. Do you have any idea how to solve this issue?
I have an NVIDIA RTX 4090 GPU with 24 GB of memory, and I only need to run inference.

Thanks,
Agi

erastorgueva-nv

NVIDIA org Apr 8

Hi, thanks for trying out the model! We have a special script for inference on longer samples here: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_multitask/speech_to_text_aed_chunked_infer.py. It should fix your issues.
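
For readers who want the general idea behind chunked inference, here is a minimal Python sketch. It is not the code of the linked NeMo script: `chunk_bounds`, `transcribe_long`, and the `transcribe_chunk` callable are hypothetical names used only for illustration, and the 30-second chunk length is an assumed value chosen to stay under the roughly 1-minute length reported to work above. It simply splits the samples into fixed-size windows, transcribes each window on its own, and joins the text.

```python
from typing import Callable, List, Sequence, Tuple


def chunk_bounds(num_samples: int, sample_rate: int,
                 chunk_seconds: float) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of consecutive, non-overlapping chunks."""
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_len = max(1, int(round(chunk_seconds * sample_rate)))
    return [(start, min(start + chunk_len, num_samples))
            for start in range(0, num_samples, chunk_len)]


def transcribe_long(samples: Sequence[float], sample_rate: int,
                    transcribe_chunk: Callable[[Sequence[float]], str],
                    chunk_seconds: float = 30.0) -> str:
    """Transcribe a long recording chunk by chunk and join the results.

    `transcribe_chunk` is a placeholder for whatever function runs the
    model on one short segment; it is an assumption, not a NeMo API.
    """
    pieces = []
    for start, end in chunk_bounds(len(samples), sample_rate, chunk_seconds):
        text = transcribe_chunk(samples[start:end]).strip()
        if text:  # skip chunks that produced no text (e.g. silence)
            pieces.append(text)
    return " ".join(pieces)
```

A hard cut like this can split a word at a chunk boundary, so a real pipeline would likely need smarter boundary handling than this sketch shows.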

AgnesG

Apr 9

Oh, perfect!
It took me a while to figure out that I needed to build the environment from source (installing from git via pip) instead of using the plain pip package, but otherwise it worked smoothly and transcribed a 1-hour recording.

AgnesG changed discussion status to closed Apr 9

xdevfaheem

Sep 22

@AgnesG can you share instructions?