This is a wav2vec 2.0 XLSR-53 model fine-tuned on a large speech corpus of Korean children aged 3 to 5 years.


language:
- ko
thumbnail: "url to a thumbnail used in social sharing"
tags:
- automatic-speech-recognition
- speech
datasets:
- AI-HUB
metrics:
- cer
model-index:
- name: xlsr-53-korean_kids_3_to_5_syl
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      type: AI-HUB
      name: AI-HUB
      split: test
    metrics:
    - type: cer
      value: 3.48
      name: Syllable Error Rate

Dataset: speech from children aged 3 to 5 years

  • Number of files (train:valid:test): 256,548:12,121:12,000
  • Transcripts were tokenized into morphemes with MeCab
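
To put the file counts above in proportion, the short sketch below computes each split's share of the corpus. The counts are taken from this card; the variable names are illustrative only and are not part of any training script.

```python
# File counts per split, as listed in this model card.
counts = {"train": 256_548, "valid": 12_121, "test": 12_000}

# Total number of files across all three splits.
total = sum(counts.values())

# Share of each split as a percentage of the whole corpus.
shares = {name: 100.0 * n / total for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name}: {n:,} files ({shares[name]:.1f}%)")
```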

Evaluation results

  • Dataset: test set from the corpus
  • Performance: syllable error rate on the test set = 3.48%
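
The card does not say how the syllable error rate was computed. The sketch below shows one common way to compute such a metric, under the assumption that it is the Levenshtein distance over Hangul syllables (one NFC code point per syllable, spaces removed) divided by the number of reference syllables. The function names `edit_distance` and `syllable_error_rate` are hypothetical and are not taken from the author's evaluation code.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def to_syllables(text):
    """Assumption: one syllable per NFC code point, with spaces dropped."""
    return list(unicodedata.normalize("NFC", text).replace(" ", ""))


def syllable_error_rate(references, hypotheses):
    """Corpus-level error rate in percent: total edits / total reference syllables."""
    total_errors = 0
    total_syllables = 0
    for ref, hyp in zip(references, hypotheses):
        ref_syl = to_syllables(ref)
        total_errors += edit_distance(ref_syl, to_syllables(hyp))
        total_syllables += len(ref_syl)
    if total_syllables == 0:
        raise ValueError("references contain no syllables")
    return 100.0 * total_errors / total_syllables


# Toy example (not from the corpus): one of six reference syllables is missing.
ser = syllable_error_rate(["엄마 사랑해요"], ["엄마 사랑해"])
print(f"{ser:.2f}%")
```

Summing errors and reference lengths over the whole test set before dividing, as done here, weights long utterances more than averaging per-utterance rates would; which of the two the reported 3.48% uses is not stated in the card.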