---
language: ary
metrics:
- wer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Moroccan Arabic dialect by Boumehdi
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    metrics:
    - name: Test WER
      type: wer
      value: 49.68
---
|
# Wav2Vec2-Large-XLSR-53-Moroccan-Darija |
|
|
|
**wav2vec2-large-xlsr** fine-tuned on 6 hours of labeled Darija audio.
|
|
|
I have also added three phonetic units to this model: ڭ, ڤ, and پ. For example: ڭال, ڤيديو, پودكاست.
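Adding new phonetic units amounts to extending the character-to-id mapping in the tokenizer's `vocab.json` before the CTC tokenizer is built. The sketch below uses a toy vocabulary for illustration (the real `vocab.json` shipped with this model contains the full Arabic character set); the file layout follows the standard Wav2Vec2 format of character → integer id.

```python
import json

# Toy vocabulary for illustration; a real Wav2Vec2 vocab.json maps
# every character the model can emit to an integer id.
vocab = {"[PAD]": 0, "[UNK]": 1, "|": 2, "ا": 3}

# Append the three extra phonetic units, assigning each the next free id.
for ch in ["ڭ", "ڤ", "پ"]:
    if ch not in vocab:
        vocab[ch] = len(vocab)

# Write the extended vocabulary back out (ensure_ascii=False keeps the
# Arabic characters readable in the file).
with open("vocab.json", "w", encoding="utf-8") as f:
    json.dump(vocab, f, ensure_ascii=False)
```

Note that when fine-tuning with an extended vocabulary, the model's output layer must match the new vocabulary size (in `transformers` this is typically handled by passing the new `vocab_size` when instantiating `Wav2Vec2ForCTC`).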
|
|
|
## Usage |
|
|
|
The model can be used directly (without a language model) as follows: |
|
|
|
```python
import librosa
import torch
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC, Wav2Vec2Processor

# build the tokenizer from the model's vocab.json, then load the processor and model
tokenizer = Wav2Vec2CTCTokenizer("./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
processor = Wav2Vec2Processor.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija', tokenizer=tokenizer)
model = Wav2Vec2ForCTC.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija')

# load the audio data at 16 kHz (use your own wav file here!)
input_audio, sr = librosa.load('file.wav', sr=16000)

# extract the input features
input_values = processor(input_audio, sampling_rate=16000, return_tensors="pt", padding=True).input_values

# retrieve logits (no gradients needed for inference)
with torch.no_grad():
    logits = model(input_values).logits

# take the most likely token at each time step
tokens = torch.argmax(logits, dim=-1)

# greedy CTC decode (no language model)
transcription = tokenizer.batch_decode(tokens)

# print the output
print(transcription)
```
|
|
|
Here's the output: ڭالت ليا هاد السيد هادا ما كاينش بحالو (roughly: "She told me: this man, there is no one like him").
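The `batch_decode` call above performs greedy CTC decoding: repeated ids are collapsed and blank (padding) tokens are dropped, so a blank between two identical ids is what allows genuinely doubled characters to survive. A minimal pure-Python sketch of that collapse step, assuming blank id 0:

```python
def ctc_greedy_decode(ids, blank_id=0):
    """Collapse repeated ids and drop blanks, as in greedy CTC decoding."""
    out = []
    prev = None
    for i in ids:
        # keep an id only if it differs from the previous one and is not blank
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

# The blank between the two 3s keeps both; the repeated 5 collapses to one.
print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))  # → [3, 3, 5]
```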
|
|
|
## Evaluation |
|
|
|
**WER**: 49.68
|
|
|
**Training Loss**: 9.88 |
|
|
|
**Validation Loss**: 45.24 |
|
|
|
This high validation loss is largely because Darija has no standardized orthography: the same word can be spelled in several different ways, so valid transcriptions are penalized as errors.
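For reference, WER is the word-level Levenshtein (edit) distance between reference and hypothesis, divided by the number of reference words. A minimal self-contained sketch (libraries such as `jiwer` or `evaluate` are normally used instead):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    r, h = reference.split(), hypothesis.split()
    # dynamic-programming table for edit distance over words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # inserting all remaining hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1  # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match / substitution
    return d[len(r)][len(h)] / len(r)

# one substitution (b -> x) and one deletion (d) over 4 reference words
print(wer("a b c d", "a x c"))  # → 0.5
```

Because Darija spellings vary, a hypothesis that a native speaker would accept can still count as a full word error here, which inflates the reported 49.68.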
|
|
|
## Future Work |
|
|
|
I am currently working on improving this model; an updated version will be available soon.
|
|
|
|
|
|