UlutSoftLLC/whisper-small-kyrgyz

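National Commission on the State Language and Language Policy under the President of the Kyrgyz Republic (Кыргыз Республикасынын Президентине караштуу Мамлекеттик тил жана тил саясаты боюнча улуттук комиссия).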

Whisper ASR for Kyrgyz Language is an automatic speech recognition (ASR) model for the Kyrgyz language. It is based on the pre-trained Whisper model and has been fine-tuned to transcribe Kyrgyz speech accurately, taking into account the phonetic features specific to the language.

To run the model, first install:

!pip install "datasets>=2.6.1"
!pip install git+https://github.com/huggingface/transformers
!pip install librosa
!pip install "evaluate>=0.30"
!pip install jiwer
!pip install gradio==3.50.2
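
Of the packages listed above, evaluate and jiwer are commonly used to score ASR output with word error rate (WER). This model card does not show an evaluation step, so the following is only a dependency-free sketch of the metric itself, under the assumption that WER is what those packages are installed for. The function name `word_error_rate` is hypothetical and is not part of either library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: real evaluations usually normalise text
    (case, punctuation) before scoring, which this sketch does not do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

A lower value is better; 0.0 means the hypothesis matches the reference word for word.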

Linking the notebook to the Hub is straightforward: enter your Hub authentication token when prompted.

from huggingface_hub import notebook_login

notebook_login()
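
Outside a notebook there is no interactive prompt, so a script may prefer to read the token from the environment. The sketch below assumes the token is stored in an environment variable named HF_TOKEN; that name and the helpers `resolve_hub_token` and `login_to_hub` are illustrative assumptions, not something this model card specifies. It relies on `huggingface_hub.login(token=...)`, imported only when a token is actually found.

```python
import os
from typing import Optional


def resolve_hub_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored in `env_var`, or None if it is unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


def login_to_hub(env_var: str = "HF_TOKEN") -> bool:
    """Log in with the environment token; return False if no token is set."""
    token = resolve_hub_token(env_var)
    if token is None:
        return False
    # Imported lazily so the helper above stays usable without the package.
    from huggingface_hub import login

    login(token=token)
    return True
```

In a script, a single call to `login_to_hub()` before loading the model would take the place of the notebook prompt.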

With the fine-tuned model available, we can build a demo to show off its ASR capabilities. We'll use the 🤗 Transformers pipeline, which takes care of the entire ASR workflow, from pre-processing the audio inputs to decoding the model predictions. We'll build the interactive demo with Gradio, which is arguably the most straightforward way of building machine learning demos: a working demo takes only a few minutes.

Running the example below will generate a Gradio demo where we can record speech through the microphone of our computer and input it to our fine-tuned Whisper model to transcribe the corresponding text:

from transformers import pipeline
import gradio as gr

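# Load the fine-tuned checkpoint from the Hub as a ready-to-use pipeline.
pipe = pipeline(model="UlutSoftLLC/whisper-small-kyrgyz")

def transcribe(audio):
    # `audio` is the path to the recorded file, because type="filepath" is set below.
    text = pipe(audio)["text"]
    return text

# source="microphone" is the Gradio 3.x argument, matching the gradio==3.50.2 pin above.
iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs="text",
    title="Whisper Small Kyrgyz",
    description="Realtime demo for Kyrgyz speech recognition using a fine-tuned Whisper small model.",
)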

iface.launch()
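
If the demo is submitted before anything has been recorded, Gradio passes None to the function, and the pipeline call would then raise. A guarded variant of the transcribe function might look like the sketch below. The names `make_transcriber` and `fake_asr` are hypothetical; `fake_asr` is only a stand-in that mimics the `{"text": ...}` dictionary returned by the pipeline, so the sketch runs without downloading the model.

```python
from typing import Callable, Dict, Optional


def make_transcriber(
    asr: Callable[[str], Dict[str, str]],
) -> Callable[[Optional[str]], str]:
    """Wrap an ASR callable so empty submissions return a message instead of raising."""

    def transcribe(audio: Optional[str]) -> str:
        if not audio:
            # Nothing was recorded: skip the model call entirely.
            return "No audio received. Please record something first."
        # Whisper output often carries surrounding whitespace; trim it.
        return asr(audio)["text"].strip()

    return transcribe


def fake_asr(path: str) -> Dict[str, str]:
    """Stand-in for the real pipeline (assumption: same return shape)."""
    return {"text": " салам дүйнө "}


transcribe = make_transcriber(fake_asr)
```

In the real demo, `make_transcriber(pipe)` would be passed as `fn` to `gr.Interface` in place of the unguarded function.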