Versions:
- CUDA: 12.1
- cuDNN Version: 8.9.2.26_1.0-1_amd64
- tensorflow Version: 2.12.0
- torch Version: 2.1.0.dev20230606+cu12135
- transformers Version: 4.30.2
- accelerate Version: 0.20.3
Model Benchmarks:
RAM: 2.8 GB (Original_Model: 5.5GB)
VRAM: 1812 MB (Original_Model: 6GB)
test.wav: 23 s (Multilingual Speech i.e. English+Hindi)
- Time in seconds for Processing by each device
Device Name | float32 (Original) | float16 | CudaCores | TensorCores |
---|---|---|---|---|
3060 | 1.7 | 1.1 | 3,584 | 112 |
1660 Super | OOM | 3.3 | 1,408 | N/A |
Colab (Tesla T4) | 2.8 | 2.2 | 2,560 | 320 |
Colab (CPU) | 35 | N/A | N/A | N/A |
M1 (CPU) | - | - | - | - |
M1 (GPU -> 'mps') | - | - | - | - |
- NOTE: TensorCores are efficient in mixed-precision calculations
- CPU: torch.float16 is not supported on CPU (AMD Ryzen 5 3600 or Colab CPU)
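Since float16 did not run on the CPUs listed here, a caller may want to choose the precision from the target device. The helper below is a minimal sketch of that choice; `pick_dtype_name` is a hypothetical name and is not part of this repo, and treating every non-CUDA device (including 'mps', which has no measurements here) as float32 is a conservative assumption.

```python
def pick_dtype_name(device: str) -> str:
    """Return the name of the torch dtype to use on `device`.

    Hypothetical helper, not part of this repo.
    """
    # Half precision only on CUDA devices such as "cuda" or "cuda:0".
    if device.lower().startswith("cuda"):
        return "float16"
    # Assumption: every other device (CPU, 'mps') falls back to float32.
    return "float32"
```

With PyTorch installed, `getattr(torch, pick_dtype_name(device))` resolves the name to `torch.float16` or `torch.float32`.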
Punctuation: True
Model Error Benchmarks:
- WER: Word Error Rate
- MER: Match Error Rate
- WIL: Word Information Lost
- WIP: Word Information Preserved
- CER: Character Error Rate
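To make the five metrics concrete, here is a dependency-free sketch that derives them from the hit, substitution, deletion and insertion counts of a minimum-edit alignment. It is an illustration only: the reported numbers come from 'jiwer', the function names `align_counts` and `error_rates` are hypothetical, the reference is assumed to be non-empty, and results can differ slightly from 'jiwer' when text normalisation or alignment tie-breaking differs.

```python
from typing import Dict, Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int, int]:
    """Return (hits, substitutions, deletions, insertions) of a minimum-edit alignment."""
    # Each cell holds (cost, hits, subs, dels, ins) for ref[:i] against hyp[:j].
    prev = [(j, 0, 0, 0, j) for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        cur = [(i, 0, 0, i, 0)]
        for j, h in enumerate(hyp, start=1):
            c, H, S, D, I = prev[j - 1]
            diag = (c, H + 1, S, D, I) if r == h else (c + 1, H, S + 1, D, I)
            c, H, S, D, I = prev[j]
            dele = (c + 1, H, S, D + 1, I)
            c, H, S, D, I = cur[j - 1]
            ins = (c + 1, H, S, D, I + 1)
            # Lowest cost wins; ties go to the alignment with more hits.
            cur.append(min(diag, dele, ins, key=lambda t: (t[0], -t[1])))
        prev = cur
    return prev[-1][1:]


def error_rates(reference: str, hypothesis: str) -> Dict[str, float]:
    """Compute WER, MER, WIL, WIP and CER as fractions (multiply by 100 for percent)."""
    ref, hyp = reference.split(), hypothesis.split()
    H, S, D, I = align_counts(ref, hyp)
    wip = (H / len(ref)) * (H / len(hyp)) if hyp else 0.0
    # CER applies the same alignment to characters (spaces included).
    cH, cS, cD, cI = align_counts(reference, hypothesis)
    return {
        "wer": (S + D + I) / (H + S + D),
        "mer": (S + D + I) / (H + S + D + I),
        "wil": 1.0 - wip,
        "wip": wip,
        "cer": (cS + cD + cI) / (cH + cS + cD),
    }


scores = error_rates("the cat sat on the mat", "the cat sat on mat")
print(scores)
```

WIL and WIP always sum to 1 under these definitions, which matches the result tables (for example 66.82 + 33.17 is roughly 100).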
Hindi to Hindi (test.tsv) Common Voice 14.0
Test done on RTX 3060 on 2557 Samples
Model | WER | MER | WIL | WIP | CER |
---|---|---|---|---|---|
Original_Model (54 min) | 52.02 | 47.86 | 66.82 | 33.17 | 23.76 |
This_Model (38 min) | 54.97 | 47.86 | 66.83 | 33.16 | 30.23 |
Hindi to English (test.csv) Custom Dataset
Test done on RTX 3060 on 1000 Samples
Model | WER | MER | WIL | WIP | CER |
---|---|---|---|---|---|
Original_Model (30 min) | - | - | - | - | - |
This_Model (20 min) | - | - | - | - | - |
English (LibriSpeech -> test-clean)
Test done on RTX 3060 on __ Samples
Model | WER | MER | WIL | WIP | CER |
---|---|---|---|---|---|
Original_Model | - | - | - | - | - |
This_Model | - | - | - | - | - |
English (LibriSpeech -> test-other)
Test done on RTX 3060 on __ Samples
Model | WER | MER | WIL | WIP | CER |
---|---|---|---|---|---|
Original_Model | - | - | - | - | - |
This_Model | - | - | - | - | - |
- The 'jiwer' library is used for these calculations
Code for conversion:
Usage
This repo contains a file, __init__.py, with all the code needed to use this model.
Firstly, clone this repo and place all the files inside a folder.
Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/devasheeshG/whisper_medium_fp16_transformers
Please try this in a Jupyter notebook.
# Import the model
from whisper_medium_fp16_transformers import Model, load_audio, pad_or_trim
# Initialise the model
model = Model(
    model_name_or_path='whisper_medium_fp16_transformers',
    cuda_visible_device="0",
    device='cuda',
)
# Load the audio, then pad or trim it
audio = load_audio('whisper_medium_fp16_transformers/test.wav')
audio = pad_or_trim(audio)
# Transcribe (the first transcription takes extra time)
model.transcribe(audio)
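Because the first transcription is slower than later ones, it helps to time several calls and look at them separately. The sketch below does this with the standard library only; `time_calls` is a hypothetical helper and is not part of this repo.

```python
import time
from typing import Any, Callable, List


def time_calls(fn: Callable[[], Any], repeats: int = 3) -> List[float]:
    """Call `fn` `repeats` times and return each call's wall-clock time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations
```

With `model` and `audio` set up as in the usage snippet, `time_calls(lambda: model.transcribe(audio))` returns one duration per call; the first entry includes the warm-up cost, so the later entries are the ones to compare against the benchmark table.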
Credits
This is an fp16 version of openai/whisper-medium.