---
license: apache-2.0
tags:
  - generated_from_trainer
base_model: facebook/wav2vec2-large-xlsr-53
datasets:
  - common_voice_17_0
metrics:
  - wer
model-index:
  - name: xlsr-am-adap-phon
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: common_voice_17_0
          type: common_voice_17_0
          config: am
          split: validation
          args: am
        metrics:
          - type: wer
            value: 0.9302421009437833
            name: Wer
---

xlsr-am-adap-phon

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the common_voice_17_0 dataset. It achieves the following results on the evaluation set, the validation split of the Amharic (`am`) configuration:

- Loss: 2.5869
- WER: 0.9302
- CER: 0.4393
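WER and CER are both normalized Levenshtein edit distances: the number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the reference length, counted over words for WER and over characters for CER. A WER of 0.93 next to a CER of 0.44 therefore suggests the model gets many characters right but few whole words. The sketch below is a minimal pure-Python illustration; the function names are hypothetical, and the card does not say which implementation produced the numbers above.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions)."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j];
    # before any ref token is consumed, turning "" into hyp[:j] costs j insertions.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of ref token r
                curr[j - 1] + 1,         # insertion of hyp token h
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / number of reference characters."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("a b c d", "a x c")` counts one substitution and one deletion over four reference words, giving 0.5. A corpus-level score such as the one reported above is normally total edits divided by total reference words over the whole split, not the mean of per-utterance rates.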

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

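The model was fine-tuned on common_voice_17_0 and evaluated on the validation split of its Amharic (`am`) configuration. More information needed on the training split and preprocessing.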

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 100
- mixed_precision_training: Native AMP
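With `lr_scheduler_type: linear` and `lr_scheduler_warmup_steps: 500`, the learning rate ramps linearly from 0 to the peak 3e-4 over the first 500 optimizer steps, then decays linearly to 0 at the final step. `total_train_batch_size` is `train_batch_size × gradient_accumulation_steps = 16 × 2 = 32`, which implies a single device. The sketch below reproduces that schedule shape in plain Python. It assumes 1400 total optimizer steps, taken from the last step logged in the results table; the card does not state the total directly. In an actual training run, `transformers.get_linear_schedule_with_warmup` builds the equivalent schedule as a PyTorch LR scheduler.

```python
PEAK_LR = 3e-4        # learning_rate
WARMUP_STEPS = 500    # lr_scheduler_warmup_steps
TOTAL_STEPS = 1400    # assumption: last optimizer step logged in the results table

# Effective batch size per optimizer step (single device assumed).
TOTAL_TRAIN_BATCH_SIZE = 16 * 2  # train_batch_size * gradient_accumulation_steps


def linear_warmup_lr(step: int,
                     peak_lr: float = PEAK_LR,
                     warmup_steps: int = WARMUP_STEPS,
                     total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step under linear warmup + linear decay."""
    if step < warmup_steps:
        # Warmup: grow linearly from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: shrink linearly from peak_lr at warmup_steps to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 250, 500, 950, 1400):
        print(step, f"{linear_warmup_lr(step):.2e}")
```

Under the 1400-step assumption, warmup covers about the first 36% of training, and the rate is back to half its peak (1.5e-4) at step 950.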

Training results

| Training Loss | Epoch   | Step | Validation Loss | WER    | CER    |
|:-------------:|:-------:|:----:|:---------------:|:------:|:------:|
| 6.6235        | 6.8966  | 100  | 6.3103          | 1.0    | 1.0    |
| 4.227         | 13.7931 | 200  | 4.2662          | 1.0    | 1.0    |
| 4.1461        | 20.6897 | 300  | 4.1543          | 1.0    | 0.9966 |
| 4.146         | 27.5862 | 400  | 4.1716          | 1.0    | 0.9859 |
| 4.105         | 34.4828 | 500  | 4.1391          | 1.0    | 0.9740 |
| 3.5688        | 41.3793 | 600  | 3.6625          | 1.0    | 0.9749 |
| 1.5705        | 48.2759 | 700  | 2.2315          | 1.0029 | 0.5187 |
| 0.6683        | 55.1724 | 800  | 2.2517          | 0.9684 | 0.4595 |
| 0.577         | 62.0690 | 900  | 2.2995          | 0.9528 | 0.4413 |
| 0.3109        | 68.9655 | 1000 | 2.4239          | 0.9397 | 0.4575 |
| 0.2803        | 75.8621 | 1100 | 2.4491          | 0.9508 | 0.4474 |
| 0.2136        | 82.7586 | 1200 | 2.4916          | 0.9179 | 0.4323 |
| 0.3282        | 89.6552 | 1300 | 2.5652          | 0.9302 | 0.4401 |
| 0.2118        | 96.5517 | 1400 | 2.5869          | 0.9302 | 0.4393 |
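The final row (step 1400) matches the summary at the top of the card, but it is not the best logged checkpoint by WER or CER: step 1200 reaches 0.9179 WER and 0.4323 CER. Validation loss bottoms out at step 700 (2.2315) and rises afterwards while training loss keeps falling, a pattern usually read as overfitting. The sketch below encodes the logged rows and selects a checkpoint under each criterion; the names are hypothetical. It also checks that the Epoch column equals `step × 2 / 29` in every row, which is consistent with 29 micro-batches of 16 per epoch and 2 accumulation steps. That dataset size is an inference from the log, not something the card states.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    wer: float
    cer: float


# Rows copied from the "Training results" table, in the table's column order.
ROWS = [LogRow(*r) for r in [
    (6.6235, 6.8966, 100, 6.3103, 1.0, 1.0),
    (4.227, 13.7931, 200, 4.2662, 1.0, 1.0),
    (4.1461, 20.6897, 300, 4.1543, 1.0, 0.9966),
    (4.146, 27.5862, 400, 4.1716, 1.0, 0.9859),
    (4.105, 34.4828, 500, 4.1391, 1.0, 0.9740),
    (3.5688, 41.3793, 600, 3.6625, 1.0, 0.9749),
    (1.5705, 48.2759, 700, 2.2315, 1.0029, 0.5187),
    (0.6683, 55.1724, 800, 2.2517, 0.9684, 0.4595),
    (0.577, 62.0690, 900, 2.2995, 0.9528, 0.4413),
    (0.3109, 68.9655, 1000, 2.4239, 0.9397, 0.4575),
    (0.2803, 75.8621, 1100, 2.4491, 0.9508, 0.4474),
    (0.2136, 82.7586, 1200, 2.4916, 0.9179, 0.4323),
    (0.3282, 89.6552, 1300, 2.5652, 0.9302, 0.4401),
    (0.2118, 96.5517, 1400, 2.5869, 0.9302, 0.4393),
]]

# Pick the logged checkpoint that minimizes each evaluation criterion.
best_by_wer = min(ROWS, key=lambda r: r.wer)
best_by_cer = min(ROWS, key=lambda r: r.cer)
best_by_val_loss = min(ROWS, key=lambda r: r.val_loss)

# Inference, not stated in the card: epoch = step * grad_accum / batches_per_epoch.
GRAD_ACCUM = 2
ASSUMED_BATCHES_PER_EPOCH = 29
epoch_matches = all(
    abs(r.step * GRAD_ACCUM / ASSUMED_BATCHES_PER_EPOCH - r.epoch) < 1e-4
    for r in ROWS
)

print("best WER:", best_by_wer.step, best_by_wer.wer)                # best WER: 1200 0.9179
print("best CER:", best_by_cer.step, best_by_cer.cer)                # best CER: 1200 0.4323
print("best val loss:", best_by_val_loss.step, best_by_val_loss.val_loss)  # best val loss: 700 2.2315
print("epoch column consistent:", epoch_matches)                     # epoch column consistent: True
```

Note that the WER of 1.0029 at step 700 is not a logging error: insertions count as edits, so WER can exceed 1.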

Framework versions

- Transformers 4.42.0.dev0
- Pytorch 2.3.1+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1
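To compare a local environment against the versions listed above, installed package versions can be read with the standard library's `importlib.metadata`. This is a minimal sketch; the `EXPECTED` mapping and the `installed_versions` helper are hypothetical names that restate the list above. Transformers `4.42.0.dev0` is a development build installed from source, so an exact match is unlikely outside the original environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; PyTorch's distribution name is "torch".
EXPECTED = {
    "transformers": "4.42.0.dev0",
    "torch": "2.3.1+cu121",
    "datasets": "2.19.2",
    "tokenizers": "0.19.1",
}


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, have in installed_versions(EXPECTED).items():
        status = "ok" if have == EXPECTED[name] else "differs"
        print(f"{name}: expected {EXPECTED[name]}, installed {have} ({status})")
```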