Wav2Vec2-XLS-R-300M-Japanese-Hiragana

A fine-tuned version of facebook/wav2vec2-xls-r-300m for Japanese speech recognition with Hiragana output, trained on JSUT, JVS, Common Voice, and an in-house dataset. Transcriptions contain no word boundaries. Audio input should be sampled at 16 kHz.
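The following is a minimal inference sketch, not an official usage recipe. It assumes the checkpoint loads with the standard `Wav2Vec2Processor` and `Wav2Vec2ForCTC` classes from `transformers`, that `torch` and `librosa` are installed, and that the repository ID is the one that appears elsewhere on this page; adjust `MODEL_ID` if the model is hosted under a different name. The function name `transcribe` is illustrative.

```python
# Assumption: repository ID copied from this page; verify before use.
MODEL_ID = "snu-nia-12/wav2vec2-xls-r-300m_nia12_phone-hiragana_japanese"

# The model expects audio sampled at 16 kHz.
TARGET_SAMPLE_RATE = 16_000


def transcribe(audio_path: str) -> str:
    """Return a Hiragana transcription (no word boundaries) for one audio file."""
    # Heavy dependencies are imported lazily so the module can be imported
    # without loading them.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # librosa.load resamples to the requested rate and returns mono float audio.
    speech, _ = librosa.load(audio_path, sr=TARGET_SAMPLE_RATE)

    inputs = processor(
        speech, sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy CTC decoding: take the most likely token at each frame,
    # then let the processor collapse repeats and blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]


# Example call (hypothetical file name):
# print(transcribe("sample.wav"))
```

Greedy decoding is used here only for simplicity; the card does not say which decoding strategy produced the reported score.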

Test Results

CER (character error rate): 9.34% on the Common Voice Japanese test set
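For readers unfamiliar with the metric, the sketch below computes a character error rate in the usual way: total character-level edit distance divided by total reference length. This card does not describe its scoring script, so the whitespace stripping and the helper names (`normalize`, `edit_distance`, `cer`) are assumptions made for illustration. Whitespace is removed because the model's output has no word boundaries.

```python
def normalize(text: str) -> str:
    """Remove all whitespace (including full-width spaces) before scoring."""
    return "".join(text.split())


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    refs = [normalize(r) for r in references]
    hyps = [normalize(h) for h in hypotheses]
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


# One substituted character out of five reference characters.
print(cer(["こんにちは"], ["こんにちわ"]))  # 0.2
```

A corpus-level ratio like this weights long utterances more heavily than averaging per-utterance CER would; which variant the reported figure uses is not stated.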

Training

Trained on JSUT, a subset of JVS, the train and validation splits of Common Voice Japanese, and an in-house Japanese dataset. Evaluated on the test split of Common Voice Japanese.

Dataset used to train snu-nia-12/wav2vec2-xls-r-300m_nia12_phone-hiragana_japanese

Evaluation results

Test CER on Common Voice Japanese (self-reported): 9.340
