OpenAI Whisper Inference Endpoint example

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

For more information about the model, its license, and its limitations, see the original repository at openai/whisper.

This repository implements a custom handler for the automatic-speech-recognition task for 🤗 Inference Endpoints using OpenAI's new Whisper model. The code for the customized pipeline is in pipeline.py.

The repository also includes a notebook that shows how to create the handler.py.
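For orientation, here is a minimal sketch of what such a handler could look like. It is not the repository's handler.py. It assumes the `openai-whisper` package (`whisper.load_model`, `model.transcribe`), the Inference Endpoints custom-handler convention of an `EndpointHandler` class with `__init__(path)` and `__call__(data)`, and that the raw request body arrives as bytes under `data["inputs"]`. The extra `model` parameter and the `"base"` model size are illustrative choices; the parameter also lets you exercise the class with a stand-in model.

```python
import os
import tempfile
from typing import Any, Dict


class EndpointHandler:
    """Hypothetical custom handler sketch; not the repository's handler.py."""

    def __init__(self, path: str = "", model: Any = None):
        # `path` is the model directory passed by Inference Endpoints (unused here).
        if model is None:
            # Assumption: the openai-whisper package is installed; "base" is an
            # arbitrary model size chosen for illustration.
            import whisper
            model = whisper.load_model("base")
        self.model = model

    def __call__(self, data: Dict[str, Any]) -> Dict[str, str]:
        # Assumption: the raw request body arrives as bytes under "inputs".
        audio_bytes = data["inputs"]
        # Write the bytes to a temporary file so Whisper can decode it via ffmpeg.
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            tmp.write(audio_bytes)
            tmp_path = tmp.name
        try:
            result = self.model.transcribe(tmp_path)
        finally:
            os.remove(tmp_path)
        # Return the same shape as the expected output: {"text": ...}
        return {"text": result["text"]}
```

Loading the model once in `__init__` keeps per-request work limited to decoding and transcription.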

Request

The endpoint expects a binary audio file. Below is a cURL example and a Python example using the requests library.
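Both examples label the upload with a `Content-Type` header. If you send formats other than FLAC, a small lookup like the sketch below can pick the header. The `audio_content_type` name and the non-FLAC entries are assumptions; only `audio/x-flac` appears in this README, and whether the endpoint accepts the generic `application/octet-stream` fallback is not confirmed here.

```python
import mimetypes
import os

# Explicit types for common audio extensions. "audio/x-flac" matches the cURL
# example; the other entries are assumptions about what the endpoint accepts.
AUDIO_TYPES = {
    ".flac": "audio/x-flac",
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".ogg": "audio/ogg",
}


def audio_content_type(path: str) -> str:
    """Return a Content-Type for an audio file, preferring the explicit table."""
    ext = os.path.splitext(path)[1].lower()
    if ext in AUDIO_TYPES:
        return AUDIO_TYPES[ext]
    # Otherwise ask the standard library, which may not know every audio extension.
    guessed, _ = mimetypes.guess_type(path)
    # Fall back to a generic binary type if the extension is still unknown.
    return guessed or "application/octet-stream"
```

An explicit table avoids depending on the platform's MIME database, which does not list `.flac` on every system.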

curl

# download the sample audio file
wget https://cdn-media.huggingface.co/speech_samples/sample1.flac

# run request
curl --request POST \
  --url https://{ENDPOINT}/ \
  --header 'Content-Type: audio/x-flac' \
  --header 'Authorization: Bearer {HF_TOKEN}' \
  --data-binary '@sample1.flac'
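The same request can be built with only the Python standard library, without installing `requests`. This sketch mirrors the cURL flags (POST, bearer token, `audio/x-flac` body); the `build_request` name is hypothetical.

```python
import urllib.request


def build_request(endpoint_url: str, hf_token: str, audio_path: str,
                  content_type: str = "audio/x-flac") -> urllib.request.Request:
    """Build the same POST request as the cURL command above."""
    # Equivalent of --data-binary '@file': send the raw file bytes as the body.
    with open(audio_path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {hf_token}",
            "Content-Type": content_type,
        },
    )
```

To send it, pass the returned request to `urllib.request.urlopen()` and decode the JSON response body.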

Python

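import mimetypes

import requests

ENDPOINT_URL = ""  # URL of your Inference Endpoint
HF_TOKEN = ""  # your Hugging Face access token

def predict(path_to_audio: str) -> dict:
    # read the audio file as raw bytes
    with open(path_to_audio, "rb") as f:
        audio_bytes = f.read()
    # guess the MIME type from the file extension; fall back to the FLAC type
    # used in the cURL example if the extension is unknown
    content_type = mimetypes.guess_type(path_to_audio)[0] or "audio/x-flac"

    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": content_type,
    }
    response = requests.post(ENDPOINT_URL, headers=headers, data=audio_bytes)
    response.raise_for_status()  # surface HTTP errors instead of failing in .json()
    return response.json()

prediction = predict(path_to_audio="sample1.flac")

print(prediction)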

Expected output

{"text": " going along slushy country roads and speaking to damp audiences in draughty school rooms day after day for a fortnight. He'll have to put in an appearance at some place of worship on Sunday morning, and he can come to us immediately afterwards."}
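The transcription in the sample output begins with a space. A small helper like the one below (the `extract_text` name is hypothetical, not part of this repository) pulls the `text` field out of the JSON response body and trims surrounding whitespace.

```python
import json


def extract_text(response_body: str) -> str:
    """Return the transcription from the endpoint's JSON response, trimmed."""
    payload = json.loads(response_body)
    # The sample output starts with a space, so strip it before display.
    return payload["text"].strip()
```

With `requests`, `response.json()["text"].strip()` does the same on an already-decoded response.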