---
language: ary
metrics:
- wer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Moroccan Arabic dialect by Boumehdi
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    metrics:
    - name: Test WER
      type: wer
      value: 0.182171
---

# Wav2Vec2-Large-XLSR-53-Moroccan-Darija

**wav2vec2-large-xlsr-53** fine-tuned on 31 hours of labeled Darija audio:

- Each hour of audio is spoken by a different person (31 speakers in total).
- All transcriptions are produced by a single individual.
- Fine-tuning runs around the clock to improve accuracy.
- More data is added to the training set every day.
- The audio database is organized by sex, age, and region.
| Training Loss | Validation Loss | WER      |
|---------------|-----------------|----------|
| 0.041000      | 0.197497        | 0.182171 |
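The WER in the table is the standard word error rate: (substitutions + deletions + insertions) divided by the number of words in the reference transcriptions. As a minimal sketch of how you could measure it on your own labeled test pairs, assuming the third-party `jiwer` package (not part of this card) and hypothetical example strings:

```python
# minimal WER sketch; `jiwer` is an assumed third-party package and the
# reference/hypothesis strings below are hypothetical examples
from jiwer import wer

references = ["صباح الخير", "شكرا بزاف"]   # ground-truth transcriptions
hypotheses = ["صباح لخير", "شكرا بزاف"]    # model predictions

# WER = (substitutions + deletions + insertions) / reference word count
print(wer(references, hypotheses))
```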
## Usage

The model can be used directly (without a language model) as follows:

```python
import librosa
import torch
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC, Wav2Vec2Processor

# build the tokenizer from the model's vocabulary (vocab.json ships with the model repo)
tokenizer = Wav2Vec2CTCTokenizer("./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
processor = Wav2Vec2Processor.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija', tokenizer=tokenizer)
model = Wav2Vec2ForCTC.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija')

# load the audio data, resampled to the 16 kHz rate the model expects
# (use your own wav file here!)
input_audio, sr = librosa.load('file.wav', sr=16000)

# extract the input features
input_values = processor(input_audio, sampling_rate=sr, return_tensors="pt", padding=True).input_values

# retrieve logits
with torch.no_grad():
    logits = model(input_values).logits

# greedy (argmax) CTC decoding
tokens = torch.argmax(logits, dim=-1)
transcription = tokenizer.batch_decode(tokens)

# print the output
print(transcription)
```

Output:

```
قالت ليا هاد السيد هادا ما كاينش بحالو
```

An alternative using the higher-level `pipeline` API is sketched at the end of this card.

email: souregh@gmail.com

BOUMEHDI Ahmed
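As an alternative to the step-by-step code above, the same checkpoint can be driven through the `transformers` pipeline API, which bundles audio loading, resampling, feature extraction, and CTC decoding into one call. This is a sketch, not part of the original card; the audio file name is a placeholder:

```python
from transformers import pipeline

# the ASR pipeline handles feature extraction and greedy CTC decoding internally
asr = pipeline("automatic-speech-recognition", model="boumehdi/wav2vec2-large-xlsr-moroccan-darija")

# returns a dict like {"text": "..."} for the given audio file (placeholder name)
print(asr("file.wav"))
```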