---
license: apache-2.0
base_model: facebook/wav2vec2-conformer-rope-large
tags:
  - generated_from_trainer
metrics:
  - wer
model-index:
  - name: wav2vec2-conformer-rope-jv-openslr
    results: []
datasets:
  - openslr/openslr
language:
  - jv
pipeline_tag: automatic-speech-recognition
---

wav2vec2-conformer-rope-jv-openslr

This model is a fine-tuned version of facebook/wav2vec2-conformer-rope-large on the OpenSLR SLR41 (Javanese) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2555
  • Wer: 0.1296
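The WER values above are the ratio of the word-level edit distance to the number of reference words. The training run most likely used a packaged metric (e.g. the `evaluate`/`jiwer` libraries); the following pure-Python sketch is only meant to show how the number is computed:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

So a WER of 0.1296 means roughly 13 word errors (substitutions, insertions, or deletions) per 100 reference words.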

Model description

The model is a fine-tuned version of wav2vec2-conformer-rope-large, specifically adapted using the OpenSLR 41 dataset, which is focused on the Javanese language domain. This adaptation enables the model to effectively recognize and process spoken Javanese, leveraging the robust capabilities of the wav2vec2-conformer-rope-large architecture combined with domain-specific training data.

Intended uses & limitations

This model is intended for transcribing spoken Javanese from audio recordings. It achieves a Word Error Rate (WER) of about 13%, indicating that while the model performs reasonably well, it still makes noticeable transcription errors. Accuracy may vary, particularly with challenging audio conditions or less common dialects. Additionally, the model requires input audio sampled at 16 kHz, which may limit its applicability to recordings at other sample rates or lower-quality audio files.
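Because the model expects 16 kHz mono audio, recordings at other rates must be resampled before inference. In practice you would use a proper resampler such as `torchaudio.functional.resample` or `librosa.resample`; the stdlib-only linear-interpolation sketch below (function name and details are illustrative, not part of this repository) just shows the idea:

```python
def resample_linear(samples, src_rate, dst_rate=16_000):
    """Naive linear-interpolation resampler. Illustrative only:
    real pipelines should use torchaudio/librosa, which apply
    proper anti-aliasing filters."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```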

Training and evaluation data

The model was trained on the OpenSLR SLR41 dataset, split into two sections (training and testing). Training ran on a single A100 GPU; the total training duration was not recorded.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 85
  • mixed_precision_training: Native AMP
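The linear scheduler with 1,000 warmup steps ramps the learning rate from 0 to 1e-4 and then decays it linearly toward zero over the remaining steps. A minimal sketch of that schedule (the 60,000 total steps is an assumption inferred from the final step in the results table, and the function itself is illustrative, not the Trainer's internal code):

```python
def linear_lr(step, base_lr=1e-4, warmup_steps=1000, total_steps=60_000):
    """Linear warmup followed by linear decay, mirroring the
    lr_scheduler_type: linear setting with 1000 warmup steps."""
    if step < warmup_steps:
        # ramp up from 0 to base_lr
        return base_lr * step / warmup_steps
    # decay from base_lr at the end of warmup to 0 at total_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```

This matches the behavior of `get_linear_schedule_with_warmup` in the Transformers library, which the Trainer uses when `lr_scheduler_type` is `linear`.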

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Wer    |
|:-------------:|:-------:|:-----:|:---------------:|:------:|
| 0.6796        | 2.8329  | 2000  | 0.5100          | 0.5010 |
| 0.4236        | 5.6657  | 4000  | 0.3792          | 0.3598 |
| 0.318         | 8.4986  | 6000  | 0.3244          | 0.2846 |
| 0.2444        | 11.3314 | 8000  | 0.3026          | 0.2674 |
| 0.1916        | 14.1643 | 10000 | 0.2682          | 0.2364 |
| 0.1588        | 16.9972 | 12000 | 0.2762          | 0.2398 |
| 0.1338        | 19.8300 | 14000 | 0.2623          | 0.2116 |
| 0.1201        | 22.6629 | 16000 | 0.2672          | 0.2081 |
| 0.1005        | 25.4958 | 18000 | 0.2596          | 0.1978 |
| 0.0921        | 28.3286 | 20000 | 0.2595          | 0.1881 |
| 0.0853        | 31.1615 | 22000 | 0.2671          | 0.1730 |
| 0.0761        | 33.9943 | 24000 | 0.2588          | 0.1744 |
| 0.0689        | 36.8272 | 26000 | 0.2490          | 0.1668 |
| 0.0646        | 39.6601 | 28000 | 0.2630          | 0.1633 |
| 0.0615        | 42.4929 | 30000 | 0.2677          | 0.1688 |
| 0.0563        | 45.3258 | 32000 | 0.2627          | 0.1585 |
| 0.0524        | 48.1586 | 34000 | 0.2497          | 0.1468 |
| 0.0511        | 50.9915 | 36000 | 0.2520          | 0.1516 |
| 0.0486        | 53.8244 | 38000 | 0.2418          | 0.1544 |
| 0.0415        | 56.6572 | 40000 | 0.2571          | 0.1489 |
| 0.0409        | 59.4901 | 42000 | 0.2687          | 0.1502 |
| 0.0361        | 62.3229 | 44000 | 0.2542          | 0.1371 |
| 0.0346        | 65.1558 | 46000 | 0.2504          | 0.1344 |
| 0.0312        | 67.9887 | 48000 | 0.2603          | 0.1337 |
| 0.0307        | 70.8215 | 50000 | 0.2641          | 0.1254 |
| 0.0305        | 73.6544 | 52000 | 0.2675          | 0.1289 |
| 0.0265        | 76.4873 | 54000 | 0.2625          | 0.1261 |
| 0.0271        | 79.3201 | 56000 | 0.2573          | 0.1268 |
| 0.0257        | 82.1530 | 58000 | 0.2571          | 0.1241 |
| 0.0247        | 84.9858 | 60000 | 0.2555          | 0.1296 |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.2.1+cu118
  • Datasets 2.20.0
  • Tokenizers 0.19.1