boumehdi/wav2vec2-large-xlsr-moroccan-darija

Automatic Speech Recognition

Moroccan Arabic

xlsr-fine-tuning-week


boumehdi committed on Apr 23, 2023

Commit f72f5af • 1 Parent(s): cf16445

Update README.md

Files changed (1)

README.md +18 -4


@@ -17,7 +17,7 @@ model-index:
     metrics:
        - name: Test WER
          type: wer
-         value: 44.30
+         value: 23.44
 ---
 # Wav2Vec2-Large-XLSR-53-Moroccan-Darija
@@ -61,7 +61,21 @@ Here's the output: ڭالت ليا هاد السيد هادا ما كاينش ب
 ## Evaluation & Previous works
-==================================================================================
+====================================
+-v3 (fine-tuned on 10 hours of audio + changed hyperparameters + discovered a huge bug when using the letter ا)
+**Wer**: 23.44
+**Training Loss**: 15.96
+**Validation Loss**: 33.92
+The validation loss is still high also because the validation data contains words that have never been trained before. The solution is to add more data and more hours of training.
+Further training to decrease the training Loss makes this model overfit a little bit.
+====================================
 -v2 (fine-tuned on 9 hours of audio + replaced أ and ى and إ with ا as it creates a lot of problems + tried to standardize the Moroccan Darija)
@@ -77,7 +91,7 @@ The validation loss is still high also because the validation data contains word
 Further training to decrease the training Loss makes this model overfit a little bit.
-==================================================================================
+====================================
 -v1 (fine-tuned on 6 hours of audio)
@@ -87,7 +101,7 @@ Further training to decrease the training Loss makes this model overfit a little
 **Validation Loss**: 45.24
-==================================================================================
+====================================
 ## Future Work
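
The v2 note in this diff describes replacing أ, ى and إ with ا, and the commit changes the reported Test WER from 44.30 to 23.44. The repository's own preprocessing and evaluation code is not part of this diff, so the following is only a minimal sketch of what such a normalization step and a percentage word error rate could look like. The names `normalize_alef` and `word_error_rate` are hypothetical, and it is an assumption that the model card's WER is a percentage computed over whitespace-separated words.

```python
# Hypothetical helpers for illustration; not taken from the model repository.

# Map the three letters named in the v2 note onto plain alef (U+0627).
ALEF_VARIANTS = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above
    "\u0625": "\u0627",  # alef with hamza below
    "\u0649": "\u0627",  # alef maqsura
})


def normalize_alef(text: str) -> str:
    """Replace the alef variants listed above with plain alef."""
    return text.translate(ALEF_VARIANTS)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length, as a percentage."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

If a pipeline like this were used, applying the same normalization to both the reference transcripts and the model output before scoring would keep spelling variants of alef from being counted as word errors.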