---
language:
- km
license: apache-2.0
tags:
- hf-asr-leaderboard
- generated_from_trainer
datasets:
- openslr
- google/fleurs
metrics:
- wer
model-index:
- name: Whisper Small Khmer Spaced - Seanghay Yath
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Google FLEURS
      type: google/fleurs
      config: km_kh
      split: all
    metrics:
    - name: Wer
      type: wer
      value: 0.6464
---

# whisper-small-khmer

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Google FLEURS (`km_kh`) and OpenSLR (SLR42) Khmer datasets.
It achieves the following results on the evaluation set:

- Loss: 0.4657
- WER: 0.6464

## Model description

This model is fine-tuned on the Google FLEURS and OpenSLR (SLR42) datasets. Pre-converted weights are available for whisper.cpp and ONNX:

- [ggml-model.bin](https://huggingface.co/seanghay/whisper-small-khmer/blob/main/ggml-model.bin)
- [model.onnx](https://huggingface.co/seanghay/whisper-small-khmer/blob/main/model.onnx)

```python
from transformers import pipeline

# Load the fine-tuned model as an ASR pipeline
pipe = pipeline(
    task="automatic-speech-recognition",
    model="seanghay/whisper-small-khmer",
)

# Force Khmer transcription instead of language auto-detection
result = pipe(
    "audio.wav",
    generate_kwargs={"language": "<|km|>", "task": "transcribe"},
    batch_size=16,
)

print(result["text"])
```

## whisper.cpp

### 1. Transcode the input audio to 16 kHz mono PCM

```shell
ffmpeg -i audio.ogg -ar 16000 -ac 1 -c:a pcm_s16le output.wav
```

### 2. Transcribe with whisper.cpp

```shell
./main -m ggml-model.bin -f output.wav --print-colors --language km
```

## Training and evaluation data

- `training` = google/fleurs['train+validation'] + openslr['train']
- `eval` = google/fleurs['test']

A sketch of rebuilding these splits with 🤗 Datasets is given at the end of this card.

## Training procedure

This model was trained with the code from the [whisper-tiny-khmer](https://github.com/seanghay/whisper-tiny-khmer) project on GitHub, on an NVIDIA A10 (24 GB) GPU.

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 6.25e-06
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 800
- training_steps: 8000
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | WER    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.2065        | 3.37  | 1000 | 0.3403          | 0.7929 |
| 0.0446        | 6.73  | 2000 | 0.2911          | 0.6961 |
| 0.008         | 10.1  | 3000 | 0.3578          | 0.6627 |
| 0.003         | 13.47 | 4000 | 0.3982          | 0.6564 |
| 0.0012        | 16.84 | 5000 | 0.4287          | 0.6512 |
| 0.0004        | 20.2  | 6000 | 0.4499          | 0.6419 |
| 0.0001        | 23.57 | 7000 | 0.4614          | 0.6469 |
| 0.0001        | 26.94 | 8000 | 0.4657          | 0.6464 |

### Framework versions

- Transformers 4.28.0.dev0
- Pytorch 2.0.0+cu117
- Datasets 2.11.1.dev0
- Tokenizers 0.13.3
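### Rebuilding the data splits (sketch)

The split composition in "Training and evaluation data" above can be reproduced roughly as follows with 🤗 Datasets. This is a minimal sketch, not the original training code: the OpenSLR configuration name `SLR42` and the column names (`sentence`, `transcription`) are assumptions about the Hub datasets, so check them against the actual dataset cards.

```python
from datasets import load_dataset, concatenate_datasets, Audio

# Google FLEURS Khmer: train + validation go into training, test is held out
fleurs = load_dataset("google/fleurs", "km_kh")

# OpenSLR SLR42 (Khmer speech corpus) publishes a single train split
# (config name assumed from the openslr dataset card)
slr42 = load_dataset("openslr", "SLR42", split="train")

# Align the two corpora on a shared (audio, transcription) schema;
# column names here are assumptions, verify against the dataset cards
fleurs_train = concatenate_datasets([fleurs["train"], fleurs["validation"]])
fleurs_train = fleurs_train.select_columns(["audio", "transcription"])
slr42 = slr42.rename_column("sentence", "transcription").select_columns(["audio", "transcription"])

# Whisper feature extraction expects 16 kHz audio
train_data = concatenate_datasets([fleurs_train, slr42]).cast_column("audio", Audio(sampling_rate=16000))
eval_data = fleurs["test"]
```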
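### Training arguments (sketch)

For reference, the hyperparameters listed above map onto `Seq2SeqTrainingArguments` roughly as shown below. This is a reconstruction from the values reported in this card, not the original script (see the linked GitHub project); the output directory and evaluation cadence are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-khmer",  # hypothetical output path
    learning_rate=6.25e-6,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=800,
    max_steps=8000,
    fp16=True,  # "Native AMP" mixed precision
    evaluation_strategy="steps",
    eval_steps=1000,  # assumed from the 1000-step cadence in the results table
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the Trainer defaults
)
```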
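### Evaluating WER (sketch)

The WER of 0.6464 above is a fraction (about 64.6%), computed on the FLEURS `km_kh` evaluation split. Below is a minimal sketch of rescoring the model with the `evaluate` library; any text normalization applied in the original evaluation is not documented here, so exact numbers may differ.

```python
import evaluate
from datasets import load_dataset
from transformers import pipeline

wer = evaluate.load("wer")
pipe = pipeline("automatic-speech-recognition", model="seanghay/whisper-small-khmer")
test = load_dataset("google/fleurs", "km_kh", split="test")

predictions, references = [], []
for sample in test:
    # datasets yields {"array", "sampling_rate", ...}, which the pipeline accepts directly
    out = pipe(sample["audio"], generate_kwargs={"language": "<|km|>", "task": "transcribe"})
    predictions.append(out["text"])
    references.append(sample["transcription"])

print("WER:", wer.compute(predictions=predictions, references=references))
```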