Haitian Speech-to-Text Model
Overview
This repository contains a fine-tuned Whisper automatic speech recognition (ASR) model for Haitian Creole. The model is hosted on the Hugging Face Model Hub and can be loaded directly with the transformers library.
Performance
The model achieved a Word Error Rate (WER) of 0.19126 (about 19.1%), which corresponds to roughly one word-level error (substitution, deletion, or insertion) for every five words of reference transcript.
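For readers who want to check transcriptions against their own reference text, WER can be sketched as the word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the sample sentences below are illustrative assumptions, not part of this repository. Whether the reported figure was computed with the same whitespace tokenization and without text normalization is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Hypothetical helper for illustration; splits on whitespace and applies
    no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion: reference word missing
                    curr[j - 1] + 1,     # insertion: extra hypothesis word
                    prev[j - 1] + cost,  # substitution, or match when cost is 0
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder sentences: one substituted word out of five reference words.
score = word_error_rate("the quick brown fox jumps", "the quick brown dog jumps")
print(score)
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 1.0 under this definition.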
Training
The model was trained with a learning rate of 1e-5.
Usage
You can use this model directly from the Hugging Face Model Hub. Here's a simple example in Python:
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio
# load model and processor
processor = WhisperProcessor.from_pretrained("ZeeshanGeoPk/haitian-speech-to-text")
model = WhisperForConditionalGeneration.from_pretrained("ZeeshanGeoPk/haitian-speech-to-text")
# read audio files
sample_path = "path/to/audio.wav"
# load audio file using torchaudio
waveform, sample_rate = torchaudio.load(sample_path)
# resample if needed (Whisper model requires 16kHz)
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)
    sample_rate = 16000
# ensure mono channel
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)
# process audio using Whisper processor
input_features = processor(waveform.numpy(), sampling_rate=sample_rate, return_tensors="pt").input_features
# generate token ids
predicted_ids = model.generate(input_features)
# decode token ids to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)