Whisper-small OpenVINO IR
Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning.
Whisper was proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. The original code repository is openai/whisper on GitHub.
Disclaimer: Content for this model card has partly been copied and pasted from this model card.
Model details
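Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model. The table below lists the architecture hyperparameters of each variant; this repository contains the whisper-small variant.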
Model Type | Parameters | n_audio_ctx | n_audio_state | n_audio_head | n_audio_layer | n_text_ctx | n_text_state | n_text_head | n_text_layer | n_mels | n_vocab |
---|---|---|---|---|---|---|---|---|---|---|---|
whisper-tiny | 39 M | 1500 | 384 | 6 | 4 | 224 | 384 | 6 | 4 | 80 | 51864 |
whisper-base | 74 M | 1500 | 512 | 8 | 6 | 224 | 512 | 8 | 6 | 80 | 51864 |
whisper-small | 244 M | 1500 | 768 | 12 | 12 | 224 | 768 | 12 | 12 | 80 | 51864 |
whisper-medium | 769 M | 1500 | 1024 | 16 | 24 | 224 | 1024 | 16 | 24 | 80 | 51864 |
whisper-large-v1 | 1550 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 32 | 80 | 51864 |
whisper-large-v2 | 1550 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 32 | 80 | 51864 |
distil-whisper-large-v2 | 756 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 2 | 80 | 51864 |
whisper-large-v3 | 1550 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 32 | 128 | 51865 |
distil-whisper-large-v3 | 756 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 2 | 128 | 51865 |
whisper-large-v3-turbo | 809 M | 1500 | 1280 | 20 | 32 | 224 | 1280 | 20 | 4 | 128 | 51865 |
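To make the table columns concrete, the following minimal sketch derives a few tensor shapes from the whisper-small row. The `WhisperDims` class and the helper functions are hypothetical names introduced for illustration only; they are not part of OpenVINO or any Whisper library. The 2x ratio between mel frames and `n_audio_ctx` is an assumption based on the standard Whisper encoder (a stride-2 convolution over a 30-second log-mel spectrogram) and is not stated in this model card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhisperDims:
    """Hypothetical container for a subset of the table columns."""
    n_mels: int
    n_audio_ctx: int
    n_audio_state: int
    n_audio_head: int
    n_text_state: int
    n_text_head: int


# Values copied from the whisper-small row of the table above.
WHISPER_SMALL = WhisperDims(
    n_mels=80,
    n_audio_ctx=1500,
    n_audio_state=768,
    n_audio_head=12,
    n_text_state=768,
    n_text_head=12,
)


def head_dim(n_state: int, n_head: int) -> int:
    """Width of one attention head: the hidden state is split evenly across heads."""
    if n_state % n_head != 0:
        raise ValueError("n_state must be divisible by n_head")
    return n_state // n_head


def mel_input_shape(dims: WhisperDims, batch: int = 1) -> tuple:
    """Encoder input shape (batch, n_mels, frames).

    Assumption: frames = 2 * n_audio_ctx, because the standard Whisper
    encoder halves the time axis with a stride-2 convolution.
    """
    return (batch, dims.n_mels, 2 * dims.n_audio_ctx)


def encoder_output_shape(dims: WhisperDims, batch: int = 1) -> tuple:
    """Encoder output shape (batch, n_audio_ctx, n_audio_state)."""
    return (batch, dims.n_audio_ctx, dims.n_audio_state)


if __name__ == "__main__":
    print("audio head dim:", head_dim(WHISPER_SMALL.n_audio_state, WHISPER_SMALL.n_audio_head))
    print("mel input shape:", mel_input_shape(WHISPER_SMALL))
    print("encoder output shape:", encoder_output_shape(WHISPER_SMALL))
```

Under these assumptions, each attention head in whisper-small is 64 wide, and the decoder cross-attends over 1500 encoder positions of width 768. The actual input and output shapes of the exported IR may differ and should be checked against the model files.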
Base model for Intel/whisper-small-openvino: openai/whisper-small