metadata

library_name: transformers
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
  - generated_from_trainer
metrics:
  - accuracy
  - precision
  - recall
  - f1
model-index:
  - name: speech-emotion-recognition-with-openai-whisper-large-v3
    results: []

speech-emotion-recognition-with-openai-whisper-large-v3

This model is a fine-tuned version of openai/whisper-large-v3 on the RAVDESS, SAVEE, TESS, and URDU datasets. It achieves the following results on the evaluation set (these values match the step 3153 row of the training results):

Loss: 0.5008
Accuracy: 0.9199
Precision: 0.9230
Recall: 0.9199
F1: 0.9198
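
The card does not state how precision, recall, and F1 are averaged across classes. Recall equals accuracy in every reported row, which is what support-weighted averaging produces, so the sketch below assumes weighted averaging. The function name and the toy emotion labels are illustrative only and are not taken from the evaluation set.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)      # number of true samples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # class share of the evaluation set
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration (assumption: not real evaluation data).
y_true = ["angry", "happy", "sad", "sad", "happy", "angry"]
y_pred = ["angry", "sad", "sad", "sad", "happy", "happy"]
scores = weighted_metrics(y_true, y_pred)
```

With weighted averaging, each class recall is multiplied by its share of the samples, so the weighted sum collapses to correct predictions divided by total samples, which is accuracy.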

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

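The model was fine-tuned on the RAVDESS, SAVEE, TESS, and URDU datasets. More information is needed on the evaluation split and preprocessing.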

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 5
total_train_batch_size: 10
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 25
mixed_precision_training: Native AMP
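
Two of the values above are derived rather than set directly: total_train_batch_size is the per-step batch size multiplied by the gradient accumulation steps, and the linear scheduler with a 0.1 warmup ratio ramps the learning rate up over the first 10% of optimizer steps before decaying it to zero. The sketch below illustrates both in plain Python. It assumes a single device, and TOTAL_STEPS is a hypothetical round number, not the step count of this run.

```python
def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Samples contributing to one optimizer step."""
    return per_device * accumulation_steps * num_devices


def linear_schedule_lr(step, total_steps, warmup_steps, peak_lr=5e-5):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# train_batch_size=2 and gradient_accumulation_steps=5 from the list above;
# a single device is an assumption.
batch = effective_batch_size(per_device=2, accumulation_steps=5)

TOTAL_STEPS = 1000                 # hypothetical, for illustration only
WARMUP_STEPS = TOTAL_STEPS // 10   # lr_scheduler_warmup_ratio: 0.1

lr_start = linear_schedule_lr(0, TOTAL_STEPS, WARMUP_STEPS)
lr_peak = linear_schedule_lr(WARMUP_STEPS, TOTAL_STEPS, WARMUP_STEPS)
lr_end = linear_schedule_lr(TOTAL_STEPS, TOTAL_STEPS, WARMUP_STEPS)
```

Under these assumptions the learning rate reaches its peak of 5e-05 at the end of warmup and returns to zero at the final step.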

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision	Recall	F1
0.4948	0.9995	394	0.4911	0.8286	0.8449	0.8286	0.8302
0.6271	1.9990	788	0.5307	0.8225	0.8559	0.8225	0.8277
0.2364	2.9985	1182	0.5076	0.8692	0.8727	0.8692	0.8684
0.0156	3.9980	1576	0.5669	0.8732	0.8868	0.8732	0.8745
0.2305	5.0	1971	0.4578	0.9108	0.9142	0.9108	0.9114
0.0112	5.9995	2365	0.4701	0.9108	0.9159	0.9108	0.9114
0.0013	6.9990	2759	0.5232	0.9138	0.9204	0.9138	0.9137
0.1894	7.9985	3153	0.5008	0.9199	0.9230	0.9199	0.9198
0.0877	8.9980	3547	0.5517	0.9138	0.9152	0.9138	0.9138
0.1471	10.0	3942	0.5856	0.8895	0.9002	0.8895	0.8915
0.0026	10.9995	4336	0.8334	0.8773	0.8949	0.8773	0.8770
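
The card does not say how the reported checkpoint was chosen. As a rough check, the sketch below copies the step, validation loss, accuracy, and F1 columns from the table above and picks the best row under two possible criteria. The variable names are illustrative.

```python
# (step, validation_loss, accuracy, f1) copied from the training results table.
ROWS = [
    (394, 0.4911, 0.8286, 0.8302),
    (788, 0.5307, 0.8225, 0.8277),
    (1182, 0.5076, 0.8692, 0.8684),
    (1576, 0.5669, 0.8732, 0.8745),
    (1971, 0.4578, 0.9108, 0.9114),
    (2365, 0.4701, 0.9108, 0.9114),
    (2759, 0.5232, 0.9138, 0.9137),
    (3153, 0.5008, 0.9199, 0.9198),
    (3547, 0.5517, 0.9138, 0.9138),
    (3942, 0.5856, 0.8895, 0.8915),
    (4336, 0.8334, 0.8773, 0.8770),
]

# Highest accuracy versus lowest validation loss can select different steps.
best_by_accuracy = max(ROWS, key=lambda row: row[2])
best_by_loss = min(ROWS, key=lambda row: row[1])

print("best accuracy at step", best_by_accuracy[0])
print("lowest validation loss at step", best_by_loss[0])
```

Step 3153 has the highest accuracy and F1 and matches the headline metrics, while the lowest validation loss occurs earlier, at step 1971. This suggests, but does not confirm, that the reported checkpoint was selected on a classification metric rather than on loss.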

Framework versions

Transformers 4.44.2
Pytorch 2.4.1+cu121
Datasets 3.0.0
Tokenizers 0.19.1
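
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch that assumes the usual PyPI distribution names (torch for PyTorch) and ignores local build tags such as +cu121. The function name is illustrative.

```python
from importlib import metadata

# Versions listed in this card; the card gives PyTorch as 2.4.1+cu121.
EXPECTED = {
    "transformers": "4.44.2",
    "torch": "2.4.1",
    "datasets": "3.0.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected):
    """Map package -> (wanted, installed) for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Compare release numbers only, dropping tags such as "+cu121".
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in version_mismatches(EXPECTED).items():
        print(f"{package}: card lists {wanted}, found {installed}")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment differs from the one used for training.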