How to get timestamp for this model
Thanks for the great work! When I tried to get timestamps using the following code,

```python
from omegaconf import OmegaConf, open_dict

decoding_cfg = asr_model.cfg.decoding
print(OmegaConf.to_yaml(decoding_cfg))
with open_dict(decoding_cfg):
    decoding_cfg.preserve_alignments = True
    decoding_cfg.compute_timestamps = True
    # decoding_cfg.rnnt_timestamp_type = 'word'
asr_model.change_decoding_strategy(decoding_cfg)

transcriptions = asr_model.transcribe(audios, return_hypotheses=True)
```
I got the following messages printed, and the returned transcripts do not contain timestamps:
```
Preservation of alignments was requested but TransformerAEDBeamInfer does not implement it.
...
return_hypotheses=True is currently not supported, returning text instead.
```
It seems this feature is not supported yet. Do you have plans to add it?
Hi, thanks for trying out the Canary model! It currently does not support timestamps directly, though we are looking into that.
In the meantime, assuming you want timestamps for ASR transcription, you can obtain them using NeMo Forced Aligner (NFA). If you want the timestamps based on the transcription produced by Canary, you will need to make a new NeMo manifest file with Canary's transcriptions saved in the `text` field, then run NFA as in the quickstart command. The command in the quickstart uses the model `stt_en_fastconformer_hybrid_large_pc`, which is English-only. If you want the other languages, replace `en` with `de`, `es`, or `fr`. We also have other ASR models you could use for alignment, but I would start with this one.
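For reference, the manifest step above could be sketched like this. The file paths, transcriptions, and the exact NFA command shape shown here are assumptions for illustration (check the NFA quickstart for the real arguments); the `audio_filepath`/`text` JSON-lines layout is the standard NeMo manifest format.

```python
import json

# Hypothetical audio files and the transcriptions Canary returned for them.
audio_files = ["audio1.wav", "audio2.wav"]
canary_transcriptions = ["hello world", "good morning"]

# A NeMo manifest is a JSON-lines file: one JSON object per utterance,
# with the transcription stored in the "text" field for NFA to align.
manifest_path = "canary_manifest.json"
with open(manifest_path, "w") as f:
    for path, text in zip(audio_files, canary_transcriptions):
        f.write(json.dumps({"audio_filepath": path, "text": text}) + "\n")

# Then run NFA roughly as in its quickstart (exact flags may differ):
#   python NeMo/tools/nemo_forced_aligner/align.py \
#       pretrained_name="stt_en_fastconformer_hybrid_large_pc" \
#       manifest_filepath=canary_manifest.json \
#       output_dir=nfa_output
```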
Thanks for the detailed instructions, I will give it a try.