metadata

language: id
license: apache-2.0
tags:
  - icefall
  - sherpa-ncnn
  - phoneme-recognition
  - automatic-speech-recognition
datasets:
  - mozilla-foundation/common_voice_13_0
  - indonesian-nlp/librivox-indonesia
  - google/fleurs

Sherpa-ncnn Pruned Stateless Zipformer RNN-T Streaming ID

Sherpa-ncnn Pruned Stateless Zipformer RNN-T Streaming ID is an automatic speech recognition model trained on the following datasets:

Common Voice ID
LibriVox Indonesia
FLEURS ID

Instead of being trained to predict sequences of words, this model was trained to predict sequences of phonemes, e.g. ['p', 'ə', 'r', 'b', 'u', 'a', 't', 'a', 'n', 'ɲ', 'a']. Therefore, the model's vocabulary contains the different IPA phonemes found in g2p ID.
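As a rough illustration of how a phoneme-level vocabulary is used, the sketch below maps decoded token IDs back to IPA symbols. It assumes the vocabulary is stored in the `<symbol> <id>` per-line layout commonly used for icefall `tokens.txt` files. The excerpt and its IDs are made up for this example and are not the model's actual vocabulary, and `load_tokens` and `ids_to_phonemes` are hypothetical helper names.

```python
# Hypothetical excerpt of a tokens.txt-style vocabulary ("<symbol> <id>" per line).
# These IDs are invented for illustration; the real file ships with the model.
TOKENS_TXT = """\
<blk> 0
a 1
b 2
n 3
p 4
r 5
t 6
u 7
ə 8
ɲ 9
"""


def load_tokens(text):
    """Build an id -> symbol table from tokens.txt-style content."""
    id2sym = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last whitespace so the ID is always the final field.
        sym, idx = line.rsplit(maxsplit=1)
        id2sym[int(idx)] = sym
    return id2sym


def ids_to_phonemes(ids, id2sym, blank_id=0):
    """Map decoded token IDs to phoneme symbols, skipping the blank token."""
    return [id2sym[i] for i in ids if i != blank_id]


id2sym = load_tokens(TOKENS_TXT)
hyp_ids = [4, 8, 5, 2, 7, 1, 6, 1, 3, 9, 1]  # made-up decoder output
phonemes = ids_to_phonemes(hyp_ids, id2sym)
print(phonemes)
```

With this made-up table, the IDs decode to the same phoneme sequence as the example above. Turning phonemes back into words would need a separate lexicon or post-processing step, which this sketch does not cover.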

This model was converted from the TorchScript version of Pruned Stateless Zipformer RNN-T Streaming ID to ncnn format.

Converting from TorchScript

Refer to the official instructions for conversion to ncnn, which include installing csukuangfj's ncnn fork.
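After conversion, it can help to confirm that every file a streaming transducer needs is present before loading it. The sketch below is a minimal check. It assumes the encoder, decoder, and joiner were each converted into a `.ncnn.param` / `.ncnn.bin` pair with the `*_jit_trace-pnnx` prefix commonly produced by pnnx, alongside a `tokens.txt` file. These file names are assumptions, so adjust them if your output differs. `missing_model_files` is a hypothetical helper, not part of any library.

```python
from pathlib import Path

# Assumed file names, following the usual pnnx output naming; adjust as needed.
EXPECTED_FILES = [
    f"{part}_jit_trace-pnnx.ncnn.{ext}"
    for part in ("encoder", "decoder", "joiner")
    for ext in ("param", "bin")
] + ["tokens.txt"]


def missing_model_files(model_dir):
    """Return the expected model files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_model_files(".")` returns an empty list when all seven files are in the current directory, and otherwise lists the ones still missing.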

Frameworks

k2
icefall
lhotse
sherpa-ncnn
ncnn